PostgreSQL Streaming Replication

Introduction

In this lab, you will learn how to configure PostgreSQL streaming replication. This powerful feature allows you to maintain a live, read-only copy of your primary database on a secondary server, known as a replica. The replica continuously receives and applies data changes from the primary, making it useful for high availability, load balancing read queries, and backups.

You will walk through the entire process, starting with configuring the primary server to allow replication. Then, you will create a replica server by taking a base backup. Finally, you will start the replica, test the data flow between the two servers, and see how the replica stays synchronized with the primary.

Configure the Primary Server for Replication

In this first step, you will configure the primary PostgreSQL server to prepare it for replication. This involves modifying its configuration files to allow network connections, enabling the necessary Write-Ahead Log (WAL) level, and creating a special user for replication.

1. Modify the PostgreSQL Configuration File

To prepare for replication, you need to modify the main configuration file, postgresql.conf. Editing this large file manually with a text editor like nano can be challenging and error-prone due to the large number of options. A more efficient and reliable method is to use command-line tools to view and update the specific settings you need to change. This avoids having to scroll through hundreds of lines and prevents accidental mistakes.

First, let's check the current values of the settings we need to modify: listen_addresses, wal_level, and max_wal_senders.

CONF_FILE="/etc/postgresql/14/main/postgresql.conf"
echo "--- Current Settings ---"
sudo grep -E "^#?\s*(listen_addresses|wal_level|max_wal_senders)" $CONF_FILE

You will likely see these lines commented out with a #.

Now, let's update these settings automatically. The following commands will back up the configuration file and then update each required parameter.

CONF_FILE="/etc/postgresql/14/main/postgresql.conf"
## Create a backup before making changes
sudo cp $CONF_FILE ${CONF_FILE}.bak.$(date +%s)

## Set listen_addresses to allow connections from any IP address
sudo sed -i -E "s/^[#\s]*listen_addresses\s*=.*/listen_addresses = '*'/" "$CONF_FILE"

## Set wal_level to 'replica' to enable replication logs
sudo sed -i -E "s/^[#\s]*wal_level\s*=.*/wal_level = replica/" "$CONF_FILE"

## Set the maximum number of concurrent replication connections
sudo sed -i -E "s/^[#\s]*max_wal_senders\s*=.*/max_wal_senders = 10/" "$CONF_FILE"

Finally, let's verify that the changes were applied correctly. The command below will display the new values, filtering out any commented lines.

echo "--- Verified Settings ---"
sudo grep -E "^(listen_addresses|wal_level|max_wal_senders)" $CONF_FILE

The output should show the updated, active configuration:

listen_addresses = '*'
wal_level = replica
max_wal_senders = 10

With these settings confirmed, the server is properly configured for the next steps.

2. Create a Dedicated Replication User

It is a best practice to use a dedicated user for replication instead of a superuser. Let's create a role named replicator.

Connect to PostgreSQL using the psql command-line client:

sudo -u postgres psql

Now, run the following SQL command to create the user with replication privileges and a password:

CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'replicapass';

You should see the output CREATE ROLE. Now, exit the psql client:

\q

3. Allow the Replication Connection

Next, you need to configure the server to allow the replicator user to connect from the replica server. You will do this by editing the pg_hba.conf file.

Open the file with nano:

sudo nano /etc/postgresql/14/main/pg_hba.conf

Add the following line to the end of the file. This rule specifies that the user replicator is allowed to connect to the replication pseudo-database from any IP address (0.0.0.0/0). For this lab, we use 127.0.0.1/32 since both servers are on the same machine.

host    replication     replicator      127.0.0.1/32            md5

Press Ctrl+X, Y, and Enter to save and exit.

4. Restart the Primary Server

To apply all these configuration changes, you must restart the PostgreSQL service.

sudo service postgresql restart

The primary server is now ready to accept replication connections.

Create the Replica from a Base Backup

With the primary server configured, the next step is to create the replica. The standard way to do this is by taking a "base backup" of the primary server. This creates an identical copy of the primary's data directory, which will serve as the starting point for the replica.

1. Create a Data Directory for the Replica

First, create a new directory where the replica's data will be stored. It's best practice to create this in a system location, not a user's home directory, to avoid permission issues. We'll create it as the postgres user in PostgreSQL's standard directory.

sudo -u postgres mkdir -p /var/lib/postgresql/14/replica

2. Take the Base Backup

Now, use the pg_basebackup utility to copy the data from the primary server. This command will connect to the primary as the replicator user and stream the data to the new replica directory. Because you created the directory as the postgres user, there are no permission issues to solve.

Run the following command. You will be prompted for the password for the replicator user, which is replicapass.

sudo -u postgres pg_basebackup -h localhost -p 5432 -U replicator -D /var/lib/postgresql/14/replica -P -v -R

Let's break down this command:

sudo -u postgres: Runs the command as the postgres system user, which has the necessary permissions.
pg_basebackup: The utility for taking base backups.
-h localhost -p 5432: Specifies the host and port of the primary server.
-U replicator: The username to connect with.
-D /var/lib/postgresql/14/replica: The target directory for the backup.
-P: Shows a progress report.
-v: Enables verbose mode.
-R: This is a very helpful option. It creates a standby.signal file and appends the connection settings to the postgresql.auto.conf file in the target directory, which automatically configures the new data directory as a replica.

3. Copy Configuration Files

The base backup copies the main data files, but on many Debian-based systems (like this Ubuntu environment), the configuration files (pg_hba.conf and pg_ident.conf) are stored separately in /etc/postgresql/ and are not included in the backup. You must copy them to the replica's data directory manually. After copying, you must also ensure they are owned by the postgres user so the server process can read them.

sudo cp /etc/postgresql/14/main/pg_hba.conf /var/lib/postgresql/14/replica/
sudo cp /etc/postgresql/14/main/pg_ident.conf /var/lib/postgresql/14/replica/
sudo chown postgres:postgres /var/lib/postgresql/14/replica/pg_hba.conf
sudo chown postgres:postgres /var/lib/postgresql/14/replica/pg_ident.conf

With the data files and configuration in place, the replica directory is ready for the final configuration steps.

Configure and Start the Replica Server

The base backup has prepared the replica's data directory. However, since you are running both the primary and replica servers on the same machine, they cannot both use the same network port (default is 5432). In this step, you will configure the replica to use a different port and then start it.

1. Configure the Replica's Port

Edit the postgresql.conf file inside the replica's data directory to change its listening port. Note that you need sudo because this file is now owned by the postgres user.

sudo nano /var/lib/postgresql/14/replica/postgresql.conf

Add the following line at the end of the file to set the port to 5433:

port = 5433

Press Ctrl+X, Y, and Enter to save and exit.

2. Set Data Directory Permissions

For security, PostgreSQL requires that its data directory is not accessible by other users. You must set its permissions to 700, which gives read, write, and execute permissions only to the owner (postgres).

sudo chmod 0700 /var/lib/postgresql/14/replica

3. Start the Replica Server

Now you can start the replica server. You will use pg_ctl, a standard PostgreSQL utility for controlling a database server.

Run the following command to start the replica. You must use the full path to pg_ctl because the sudo command may not know where to find it otherwise. We will also specify a log file in /tmp to make it easy to check the server's status.

sudo -u postgres /usr/lib/postgresql/14/bin/pg_ctl -D /var/lib/postgresql/14/replica -l /tmp/replica.log start

pg_ctl: The server control utility.
-D /var/lib/postgresql/14/replica: Specifies the data directory for this server instance.
-l /tmp/replica.log: Specifies the log file.
start: The action to perform.

You should see the message:

waiting for server to start.... done
server started

4. Check the Replica's Log File

To confirm that the replica started correctly and connected to the primary, view its log file with sudo since it is owned by the postgres user:

sudo cat /tmp/replica.log

Look for lines indicating that the database system is ready to accept read-only connections and that it has started streaming from the primary. You should see something similar to this:

...
LOG:  database system is ready to accept read-only connections
LOG:  started streaming WAL from primary at 0/4000000 on timeline 1
...

This confirms your replica server is up, running, and successfully connected to the primary.

Test the Replication

Now that both servers are running, it's time to test if the replication is working as expected. You will create a table on the primary server and verify that it appears on the replica. You will also confirm that the replica is read-only.

1. Check Replication Status on the Primary

First, connect to the primary server (on port 5432) and check the pg_stat_replication view. This view provides monitoring information about connected replicas.

sudo -u postgres psql -p 5432

Run this query:

SELECT client_addr, state, sync_state FROM pg_stat_replication;

The output should show your replica connected, with its state as streaming and sync_state as async or sync.

 client_addr |   state   | sync_state
-------------+-----------+------------
 127.0.0.1   | streaming | async
(1 row)

2. Create Data on the Primary

While still connected to the primary, create a new table and insert some data:

CREATE TABLE replication_test (id INT, message TEXT);
INSERT INTO replication_test VALUES (1, 'Hello from primary!');

Now, query the table to confirm the data is there:

SELECT * FROM replication_test;

You should see the row you just inserted. Exit the primary's psql shell:

\q

3. Verify Data on the Replica

Open a new terminal or use your existing one to connect to the replica server on port 5433. You will need to provide the password for the postgres user, which is labex.

psql -h localhost -p 5433 -U postgres -d postgres

Now, query the replication_test table on the replica:

SELECT * FROM replication_test;

You should see the exact same data you created on the primary. This confirms that streaming replication is working!

 id |      message
----+---------------------
  1 | Hello from primary!
(1 row)

4. Test the Read-Only Nature of the Replica

Try to insert data into the table on the replica:

INSERT INTO replication_test VALUES (2, 'Hello from replica?');

The command will fail with an error because a replica server is in read-only mode by default.

ERROR:  cannot execute INSERT in a read-only transaction

This is expected behavior and a key characteristic of a streaming replica. Exit the replica's psql shell:

\q

Clean Up the Environment

In this final step, you will shut down the servers and remove the files and directories created during the lab to return the environment to its initial state.

1. Stop the Replica Server

First, stop the replica server using pg_ctl. Remember to use the full path to the executable.

sudo -u postgres /usr/lib/postgresql/14/bin/pg_ctl -D /var/lib/postgresql/14/replica stop

You will see a confirmation message that the server has stopped.

waiting for server to shut down.... done
server stopped

2. Stop the Primary Server

Next, stop the primary server using the service command.

sudo service postgresql stop

3. Remove Replica Data and Logs

Now that the servers are stopped, you can safely remove the replica's data directory and the log file you created. Note that sudo is required as these files are owned by the postgres user.

sudo rm -rf /var/lib/postgresql/14/replica /tmp/replica.log

This completes the cleanup process. You have successfully set up, tested, and torn down a PostgreSQL streaming replication environment.

Summary

In this lab, you successfully configured PostgreSQL streaming replication from scratch. You learned how to prepare a primary server by modifying postgresql.conf and pg_hba.conf and creating a dedicated replication user. You then used pg_basebackup to create a replica and configured it to run on a separate port.

By testing the setup, you verified that data written to the primary is automatically replicated to the secondary server in near real-time. You also confirmed that a replica server is read-only, which is a fundamental aspect of its design. These skills are essential for managing robust and scalable PostgreSQL deployments.