PostgreSQL Table Partitioning

PostgreSQLBeginner
Practice Now

Introduction

In this lab, you will learn how to implement table partitioning in PostgreSQL. The goal is to divide a large table into smaller, more manageable pieces, which can significantly improve query performance and simplify data management tasks like backups or archiving.

You will begin by creating a main "parent" table designed for partitioning. Then, you will define several "child" tables, or partitions, each holding data for a specific date range. Finally, you will insert data into the parent table and observe how PostgreSQL automatically routes it to the correct partition. You will also learn how to query the partitioned table and see how PostgreSQL optimizes these queries by only accessing the relevant partitions.

Create the Parent Partitioned Table

In this step, you will create the main sales table, which will serve as the parent table for our partitions. This table defines the structure for all its partitions but will not store any data itself.

First, you need to connect to the PostgreSQL database. Open a terminal and use the following command to start the psql interactive shell as the postgres user:

sudo -u postgres psql

You should now see the PostgreSQL prompt, which looks like postgres=#. All subsequent SQL commands in this lab will be run from this prompt.

Next, create the sales table. This table will be partitioned by range on the sale_date column.

CREATE TABLE sales (
    sale_id SERIAL,
    sale_date DATE NOT NULL,
    product_id INTEGER,
    sale_amount DECIMAL(10, 2),
    PRIMARY KEY (sale_id, sale_date)
) PARTITION BY RANGE (sale_date);

Let's break down this command:

  • CREATE TABLE sales (...): Defines the columns for our sales data.
  • PRIMARY KEY (sale_id, sale_date): In a partitioned table, the primary key must include the partitioning column (sale_date).
  • PARTITION BY RANGE (sale_date): This is the key part. It declares that this table is partitioned using the RANGE method on the sale_date column.

After running the command, you should see a CREATE TABLE confirmation message.

To verify that the table was created, you can use the \d command in psql to describe the table structure.

\d sales

The output will show the table's columns and, at the bottom, confirm that it is a "Partitioned table" and lists the "Partition key".

                                       Table "public.sales"
   Column    |     Type      | Collation | Nullable |                    Default
-------------+---------------+-----------+----------+------------------------------------------------
 sale_id     | integer       |           | not null | nextval('sales_sale_id_seq'::regclass)
 sale_date   | date          |           | not null |
 product_id  | integer       |           |          |
 sale_amount | numeric(10,2) |           |          |
Partition key: RANGE (sale_date)
Indexes:
    "sales_pkey" PRIMARY KEY, btree (sale_id, sale_date)
Number of partitions: 0

Notice that the "Number of partitions" is 0. You will create the actual partitions in the next step.

Define Partitions for Date Ranges

Now that you have the parent sales table, you need to create the actual partitions where the data will be stored. Each partition will hold data for a specific date range. In this step, you will create quarterly partitions for the years 2023 and 2024.

You should still be in the psql interactive terminal.

First, create the four partitions for 2023. Each command defines a new table as a partition of sales and specifies the date range it will cover.

CREATE TABLE sales_2023_q1 PARTITION OF sales
    FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');

CREATE TABLE sales_2023_q2 PARTITION OF sales
    FOR VALUES FROM ('2023-04-01') TO ('2023-07-01');

CREATE TABLE sales_2023_q3 PARTITION OF sales
    FOR VALUES FROM ('2023-07-01') TO ('2023-10-01');

CREATE TABLE sales_2023_q4 PARTITION OF sales
    FOR VALUES FROM ('2023-10-01') TO ('2024-01-01');

The FOR VALUES FROM ... TO ... clause defines the range for each partition. The lower bound is inclusive, and the upper bound is exclusive. For example, sales_2023_q1 will store records where sale_date is from 2023-01-01 up to, but not including, 2023-04-01.

Next, create the partitions for the year 2024 using the same quarterly scheme:

CREATE TABLE sales_2024_q1 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE sales_2024_q2 PARTITION OF sales
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

CREATE TABLE sales_2024_q3 PARTITION OF sales
    FOR VALUES FROM ('2024-07-01') TO ('2024-10-01');

CREATE TABLE sales_2024_q4 PARTITION OF sales
    FOR VALUES FROM ('2024-10-01') TO ('2025-01-01');

After running each CREATE TABLE command, you will see a confirmation message.

To verify that all partitions have been created, you can list the tables in the database again.

\dt

You should now see the parent sales table and all eight partitions you just created (sales_2023_q1, sales_2023_q2, etc.) in the output.

Insert and Route Data

In this step, you will insert sample data. A key feature of partitioning is that you insert data directly into the parent table (sales), and PostgreSQL automatically routes each row to the correct partition based on the value of the partition key (sale_date).

You should still be in the psql interactive terminal.

Execute the following INSERT statement to add 16 sample sales records spanning 2023 and 2024:

INSERT INTO sales (sale_date, product_id, sale_amount) VALUES
('2023-01-15', 101, 50.00),
('2023-02-20', 102, 75.50),
('2023-04-10', 103, 100.00),
('2023-05-25', 104, 60.25),
('2023-07-01', 105, 120.00),
('2023-08-12', 106, 80.75),
('2023-10-05', 107, 90.00),
('2023-11-18', 108, 110.50),
('2024-01-22', 109, 55.00),
('2024-03-01', 110, 70.00),
('2024-04-15', 111, 95.50),
('2024-06-10', 112, 65.00),
('2024-07-08', 113, 125.00),
('2024-09-20', 114, 85.25),
('2024-10-12', 115, 95.00),
('2024-12-01', 116, 115.75);

After the command completes, you will see the output INSERT 0 16, which indicates that 16 rows were successfully inserted.

To verify that the data was routed correctly, you can query the individual partitions. For example, let's check the count of records in the first quarter of 2023:

SELECT COUNT(*) FROM sales_2023_q1;

The output should be:

 count
-------
     2
(1 row)

Now, check the count for the fourth quarter of 2024:

SELECT COUNT(*) FROM sales_2024_q4;

The output should also be 2. This confirms that PostgreSQL has placed the data into the correct underlying partition tables.

Query Data and Analyze Performance

In this final step, you will query the partitioned sales table. The main benefit of partitioning, known as "partition pruning," is that PostgreSQL's query planner is smart enough to scan only the necessary partitions, avoiding a full scan of the entire dataset.

You should still be in the psql interactive terminal.

First, run a query to retrieve all sales from the first quarter of 2023.

SELECT * FROM sales WHERE sale_date >= '2023-01-01' AND sale_date < '2023-04-01';

You will see the two records that fall within this date range. To see how PostgreSQL optimizes this, you can use the EXPLAIN command, which shows the query execution plan.

EXPLAIN SELECT * FROM sales WHERE sale_date >= '2023-01-01' AND sale_date < '2023-04-01';

The output will look something like this:

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Seq Scan on sales_2023_q1 sales  (cost=0.00..31.75 rows=7 width=28)
   Filter: ((sale_date >= '2023-01-01'::date) AND (sale_date < '2023-04-01'::date))
(2 rows)

Notice the line Seq Scan on sales_2023_q1. This proves that PostgreSQL only scanned the sales_2023_q1 partition and ignored the other seven, making the query much faster on a large dataset.

Now, let's run a more complex query to find the total sales amount for each product in 2024.

SELECT product_id, SUM(sale_amount) AS total_sales
FROM sales
WHERE sale_date >= '2024-01-01' AND sale_date < '2025-01-01'
GROUP BY product_id
ORDER BY product_id;

This query will efficiently scan only the four partitions for 2024 to calculate the result. The output will show the total sales for each product from 109 to 116.

Finally, you can exit the PostgreSQL interactive terminal by typing:

\q

You will be returned to your regular shell prompt.

Summary

In this lab, you have learned the fundamentals of table partitioning in PostgreSQL. You successfully created a parent table partitioned by a date range, defined specific partitions for different time periods, and inserted data that was automatically routed to the correct partition. Most importantly, you used the EXPLAIN command to see partition pruning in action, demonstrating how partitioning can significantly improve query performance by allowing the database to scan only a subset of the data. This is a powerful technique for managing large-scale datasets efficiently.