SQLite Full-Text Indexing

SQLiteBeginner
Practice Now

Introduction

In this lab, you will learn how to use SQLite's FTS5 extension for efficient full-text searching. You'll create FTS5 tables, insert data, perform searches, and optimize queries using MATCH clauses. This lab provides a hands-on introduction to full-text indexing in SQLite.

Create a SQLite Database and Enable FTS5

In this step, you will create a SQLite database and enable the FTS5 extension. FTS5 is a module that provides full-text search capabilities.

First, open your terminal in the LabEx VM. Your default path is /home/labex/project.

  1. Create a SQLite database:

    Create a database file named books.db and connect to it using the sqlite3 command:

    sqlite3 books.db

    This command starts the SQLite shell, where you can execute SQL commands.

  2. Enable the FTS5 extension:

    Before you can use FTS5, you need to enable it. Execute the following SQL command:

    CREATE VIRTUAL TABLE book_search USING fts5();

    If you see an error like Error: no such module: fts5, it means the FTS5 extension is not available. In most modern SQLite versions, FTS5 is included by default. If you encounter this error, ensure your SQLite version is up-to-date. If the command runs without error, the FTS5 extension is enabled.

    This command creates a virtual table named book_search using the fts5 module. Virtual tables are a feature of SQLite that allows you to extend the database with custom functionality.

  3. Define the table schema:

    Now, let's define the columns for our book_search table. We'll include columns for title, author, and content. Execute the following SQL command:

    CREATE VIRTUAL TABLE book_search USING fts5(title, author, content);

    This command creates the book_search table with the specified columns. The title, author, and content columns will be indexed by FTS5, allowing you to perform full-text searches on them.

  4. Configure the tokenizer (Optional):

    You can customize how FTS5 tokenizes text by specifying a tokenizer. The unicode61 tokenizer provides good Unicode support. To use it and remove diacritics (accents), recreate the table with the following command:

    DROP TABLE IF EXISTS book_search;
    CREATE VIRTUAL TABLE book_search USING fts5(title, author, content, tokenize="unicode61 remove_diacritics 1");

    This command first drops the existing book_search table (if it exists) and then recreates it with the unicode61 tokenizer. The remove_diacritics 1 option tells FTS5 to remove diacritics from the text during indexing.

Insert Data into the FTS5 Table

In this step, you will insert data into the book_search table. This data will be used for full-text searching in later steps.

  1. Insert book data:

    Execute the following SQL commands to insert sample book data into the book_search table:

    INSERT INTO book_search (title, author, content) VALUES (
        'The Lord of the Rings',
        'J.R.R. Tolkien',
        'A fantasy epic about hobbits, elves, and the battle against Sauron.'
    );
    
    INSERT INTO book_search (title, author, content) VALUES (
        'Pride and Prejudice',
        'Jane Austen',
        'A classic novel about love, class, and society in 19th-century England.'
    );
    
    INSERT INTO book_search (title, author, content) VALUES (
        'The Hitchhiker''s Guide to the Galaxy',
        'Douglas Adams',
        'A comedic science fiction series following the misadventures of Arthur Dent.'
    );
    
    INSERT INTO book_search (title, author, content) VALUES (
        'To Kill a Mockingbird',
        'Harper Lee',
        'A powerful story about racial injustice in the American South.'
    );
    
    INSERT INTO book_search (title, author, content) VALUES (
        '1984',
        'George Orwell',
        'A dystopian novel about totalitarianism and surveillance.'
    );

    These commands insert five rows into the book_search table, each representing a book. The title, author, and content values are provided for each book. Note the escaped single quote in 'The Hitchhiker''s Guide to the Galaxy'. Single quotes within a string must be escaped by doubling them.

  2. Verify the data insertion:

    To confirm that the data has been added correctly, run the following command:

    SELECT * FROM book_search;

    This command retrieves all rows and columns from the book_search table. You should see the data you just inserted.

    Expected Output:

    The Lord of the Rings|J.R.R. Tolkien|A fantasy epic about hobbits, elves, and the battle against Sauron.
    Pride and Prejudice|Jane Austen|A classic novel about love, class, and society in 19th-century England.
    The Hitchhiker's Guide to the Galaxy|Douglas Adams|A comedic science fiction series following the misadventures of Arthur Dent.
    To Kill a Mockingbird|Harper Lee|A powerful story about racial injustice in the American South.
    1984|George Orwell|A dystopian novel about totalitarianism and surveillance.

Perform Basic Full-Text Searches

In this step, you will perform basic full-text searches using the MATCH operator.

  1. Search for a single term:

    To find books containing the word "fantasy", execute the following SQL command:

    SELECT * FROM book_search WHERE book_search MATCH 'fantasy';

    This command searches all columns of the book_search table for the term "fantasy". The MATCH operator is used to perform the full-text search.

    Expected Output:

    The Lord of the Rings|J.R.R. Tolkien|A fantasy epic about hobbits, elves, and the battle against Sauron.
  2. Search for multiple terms:

    You can search for multiple terms by separating them with spaces. To find books containing both "love" and "society", execute the following SQL command:

    SELECT * FROM book_search WHERE book_search MATCH 'love society';

    This command searches for rows that contain both "love" and "society" in any of the indexed columns.

    Expected Output:

    Pride and Prejudice|Jane Austen|A classic novel about love, class, and society in 19th-century England.
  3. Search within a specific column:

    To search within a specific column, specify the column name before the MATCH operator. For example, to find books with "Orwell" in the author column, execute the following SQL command:

    SELECT * FROM book_search WHERE author MATCH 'Orwell';

    This command searches only the author column for the term "Orwell".

    Expected Output:

    1984|George Orwell|A dystopian novel about totalitarianism and surveillance.

Use Advanced MATCH Features

In this step, you will explore advanced features of the MATCH operator, including prefix searches and boolean operators.

  1. Prefix Search:

    Use the * wildcard to perform a prefix search. To find books containing words starting with "soci", execute the following SQL command:

    SELECT * FROM book_search WHERE book_search MATCH 'soci*';

    This command searches for terms that begin with "soci", such as "society".

    Expected Output:

    Pride and Prejudice|Jane Austen|A classic novel about love, class, and society in 19th-century England.
  2. Boolean Operators:

    FTS5 supports boolean operators like AND, OR, and NOT to create more complex search queries.

    • AND: Find documents containing both terms. The default behavior when terms are separated by spaces is AND.

      SELECT * FROM book_search WHERE book_search MATCH 'love AND society';

      This is equivalent to SELECT * FROM book_search WHERE book_search MATCH 'love society';

      Expected Output:

      Pride and Prejudice|Jane Austen|A classic novel about love, class, and society in 19th-century England.
    • OR: Find documents containing either term.

      SELECT * FROM book_search WHERE book_search MATCH 'love OR injustice';

      This command finds books containing either "love" or "injustice" (or both).

      Expected Output:

      Pride and Prejudice|Jane Austen|A classic novel about love, class, and society in 19th-century England.
      To Kill a Mockingbird|Harper Lee|A powerful story about racial injustice in the American South.
    • NOT: Exclude documents containing a specific term.

      SELECT * FROM book_search WHERE book_search MATCH 'NOT fantasy';

      This command finds books that do not contain the word "fantasy".

      Expected Output:

      Pride and Prejudice|Jane Austen|A classic novel about love, class, and society in 19th-century England.
      The Hitchhiker's Guide to the Galaxy|Douglas Adams|A comedic science fiction series following the misadventures of Arthur Dent.
      To Kill a Mockingbird|Harper Lee|A powerful story about racial injustice in the American South.
      1984|George Orwell|A dystopian novel about totalitarianism and surveillance.

Summary

In this lab, you have learned how to use SQLite's FTS5 extension for full-text searching. You created an FTS5 table, inserted data, performed basic searches using the MATCH operator, and explored advanced features like prefix searches and boolean operators. These skills provide a foundation for building powerful search capabilities into your SQLite applications.