Introduction
In this lab, you will learn how to implement full-text search in PostgreSQL. Full-text search provides the capability to search natural language documents for specific words or phrases. You will start by creating an articles table and adding sample data. Then, you will add a search_vector column of type tsvector, which is a specialized data type for storing processed text. You will create an index to speed up searches and then perform queries using tsquery. Finally, you will set up a trigger to automatically update the search data whenever the source text is modified, ensuring your search index remains current.
Create a Table and Insert Sample Data
In this step, you will connect to the PostgreSQL database, create an articles table, and insert some sample data. This table will serve as the foundation for implementing and testing full-text search.
First, open a terminal and connect to the PostgreSQL interactive shell (psql) as the postgres user. All subsequent SQL commands in this lab will be run inside this shell.
sudo -u postgres psql
You will now see the PostgreSQL prompt, which looks like postgres=#.
Next, create the articles table. This table will store an ID, a title, and the main content for each article.
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title VARCHAR(255) NOT NULL,
content TEXT
);
Now, insert three sample articles into the articles table. This data will be used for your search queries in the following steps.
INSERT INTO articles (title, content) VALUES
('PostgreSQL Tutorial', 'This is a comprehensive tutorial on PostgreSQL.'),
('Full Text Search in PostgreSQL', 'Learn how to implement full text search using TSVECTOR in PostgreSQL.'),
('PostgreSQL Performance Tuning', 'Tips and tricks to improve the performance of your PostgreSQL database.');
To confirm that the data was inserted correctly, run a SELECT query to view all records in the table.
SELECT * FROM articles;
You should see the three rows you just inserted, similar to the output below:
id | title | content
----+----------------------------------+-----------------------------------------------------------------------
1 | PostgreSQL Tutorial | This is a comprehensive tutorial on PostgreSQL.
2 | Full Text Search in PostgreSQL | Learn how to implement full text search using TSVECTOR in PostgreSQL.
3 | PostgreSQL Performance Tuning | Tips and tricks to improve the performance of your PostgreSQL database.
(3 rows)
Add a TSVECTOR Column and GIN Index
To perform full-text search, you need a column of type tsvector. A tsvector value is a sorted list of distinct words (lexemes) that have been normalized to merge different variants of the same word. In this step, you will add a tsvector column and create an index on it to accelerate search queries.
First, add a new column named search_vector of type TSVECTOR to the articles table.
ALTER TABLE articles ADD COLUMN search_vector TSVECTOR;
Next, populate this new column by converting the title and content of each article into a tsvector. The to_tsvector function is used for this conversion. It processes the text by removing common words (stop words) and reducing words to their root form (stemming).
UPDATE articles
SET search_vector = to_tsvector('english', title || ' ' || content);
In this command, 'english' specifies the text search configuration, and || is the concatenation operator to combine the title and content.
To significantly speed up full-text searches, create a GIN (Generalized Inverted Index) on the search_vector column. GIN indexes are highly effective for indexing composite values like tsvector.
CREATE INDEX articles_search_idx ON articles USING GIN (search_vector);
You can verify that the column and index have been added by inspecting the table structure with the \d command.
\d articles
The output should now include the search_vector column and the articles_search_idx GIN index.
Table "public.articles"
Column | Type | Collation | Nullable | Default
---------------+------------------------+-----------+----------+-----------------------------------------
id | integer | | not null | nextval('articles_id_seq'::regclass)
title | character varying(255) | | not null |
content | text | | |
search_vector | tsvector | | |
Indexes:
"articles_pkey" PRIMARY KEY, btree (id)
"articles_search_idx" gin (search_vector)
Perform Full-Text Searches
With the tsvector column and GIN index in place, you can now perform efficient full-text searches. Queries are constructed using the tsquery data type, which represents a text query. The @@ operator is used to match a tsquery against a tsvector.
First, let's search for articles containing the word search. The to_tsquery function converts the search string into a tsquery.
SELECT title, content FROM articles WHERE search_vector @@ to_tsquery('english', 'search');
This query will return the article about "Full Text Search".
title | content
----------------------------------+-----------------------------------------------------------------------
Full Text Search in PostgreSQL | Learn how to implement full text search using TSVECTOR in PostgreSQL.
(1 row)
You can also use operators within your query. The & (AND) operator finds documents containing all specified terms. Let's search for articles that contain both PostgreSQL and performance.
SELECT title FROM articles WHERE search_vector @@ to_tsquery('english', 'PostgreSQL & performance');
This returns the article on performance tuning.
title
-------------------------------
PostgreSQL Performance Tuning
(1 row)
The | (OR) operator finds documents containing at least one of the specified terms. Let's search for articles containing either tutorial or tuning.
SELECT title FROM articles WHERE search_vector @@ to_tsquery('english', 'tutorial | tuning');
This query will return two articles.
title
-------------------------------
PostgreSQL Tutorial
PostgreSQL Performance Tuning
(2 rows)
Automate Updates with a Trigger
Currently, the search_vector column is static. If you update an article's title or content, the search_vector will not automatically reflect the change. To solve this, you can create a trigger that automatically updates the search_vector column before any INSERT or UPDATE operation on the articles table.
First, define a function that will be executed by the trigger. This function updates the search_vector for the new or updated row.
CREATE OR REPLACE FUNCTION articles_tsvector_update() RETURNS TRIGGER AS $$
BEGIN
NEW.search_vector :=
to_tsvector('pg_catalog.english', NEW.title || ' ' || NEW.content);
RETURN NEW;
END
$$ LANGUAGE plpgsql;
The NEW variable in the function refers to the row being inserted or updated.
Next, create the trigger itself. This trigger will call the articles_tsvector_update function before each row is inserted or updated.
CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON articles FOR EACH ROW EXECUTE FUNCTION articles_tsvector_update();
Now, let's test the trigger. First, insert a new article about database indexing.
INSERT INTO articles (title, content) VALUES ('Database Indexing', 'Learn about B-Tree and GIN indexes.');
Because of the trigger, the search_vector for this new article was automatically populated. You can verify this by searching for the word indexes.
SELECT title FROM articles WHERE search_vector @@ to_tsquery('indexes');
The new article should appear in the results.
title
---------------------
Database Indexing
(1 row)
This confirms that your trigger is working correctly, ensuring your full-text search index remains synchronized with your data.
Summary
In this lab, you have learned the fundamentals of implementing full-text search in PostgreSQL. You successfully created a table, populated it with data, and configured it for full-text search by adding a tsvector column. You learned how to use the to_tsvector function to process text and a GIN index to optimize search performance. You also practiced performing searches with tsquery using different operators. Finally, you implemented a trigger to automate the process of keeping the search index synchronized with your data, which is a critical step for maintaining a reliable search system.


