This blog aims to assist you in understanding the initial concepts of Chapter:10 [Base Backup & Point-in-Time Recovery] from the book The Internals of PostgreSQL.
Note: Ensure that you have a thorough understanding of
Chapter 9 Part-4 and basics of PostreSQL before we proceed to Chapter 10 Part-1, as it forms the foundation for our exploration.
So, Let's Start:
Online database backup can be roughly classified into two categories: logical and physical backups.
One disadvantage of logical backups is that they can be very time-consuming. In particular, it can take a long time to make a backup of a large database, and even longer to restore the database from the backup data.
Meanwhile physical backups can be made and restored much more quickly, making them a very important and useful feature in practical systems.
In PostgreSQL, online physical full backups have been available since version 8.0.
A snapshot of a running whole database cluster (i.e., physical backup data) is known as a base backup.
Point-in-Time Recovery (PITR), which has also been available since version 8.0, is the feature to restore a database cluster to any point in time using a base backup and archive logs created by the continuous archiving feature.
- In versions 7.4 or earlier, PostgreSQL had supported only logical backups (logical full and partial backups, and data exports).
First of all, the standard procedure to make a base backup using the low-level commands is as follows:
(1) Issue the pg_backup_start command (versions 14 or earlier, pg_start_backup).
(2) Take a snapshot of the database cluster using the archiving command of your choice.
(3) Issue the pg_backup_stop command (versions 14 or earlier, pg_stop_backup).
Simple procedure for database administrators.
No need for special tools; uses common tools like cp or archiving tools.
Backup process doesn't require table locks.
Allows users to continue issuing queries during the backup.
Offers a significant advantage over other major open-source RDBMSs.
Making a base backup in PostgreSQL is depicted in figure below:
pg_backup_start (Ver.14 or earlier, pg_start_backup)
The pg_backup_start command prepares for making a base backup.
The recovery process starts from a REDO point, so the pg_backup_start command must do a checkpoint to explicitly create a REDO point at the start of making a base backup.
The third and fourth operations are the heart of this command.
The first and second operations are performed to recover a database cluster more reliably.
A backup_label file contains the following six items (versions 11 or later, seven items):
CHECKPOINT LOCATION – This is the LSN location where the checkpoint created by this command has been recorded.
START WAL LOCATION – This is not used with PITR, but used with the streaming replication, which is described in Chapter 11. It is named 'START WAL LOCATION' because the standby server in replication-mode reads this value only once at initial startup.
BACKUP METHOD – This is the method used to make this base backup.
BACKUP FROM – This shows whether this backup is taken from the primary or standby server.
START TIME – This is the timestamp when the pg_backup_start command was executed.
LABEL – This is the label specified at the pg_backup_start command.
START TIMELINE – This is the timeline that the backup started. This is for a sanity check and has been introduced in version 11.
The pg_backup_start performs the following four operations:
(1) Force the database into full-page write mode.
(2) Switch to the current WAL segment file (versions 8.4 or later).
(3) Do a checkpoint.
(4) Create a backup_label file – This file, created in the top level of the base directory, contains essential information about base backup itself, such as the checkpoint location of this checkpoint.
pg_backup_stop (Ver.14 or earlier, pg_stop_backup)
The pg_backup_stop command performs the following five operations to complete the backup:
(1) Reset to non-full-page writes mode if it has been forcibly changed by the pg_backup_start command.
(2) Write a XLOG record of backup end.
(3) Switch the WAL segment file.
(4) Create a backup history file. This file contains the contents of the backup_label file and the timestamp that the pg_backup_stop command was executed.
(5) Delete the backup_label file. The backup_label file is required for recovery from the base backup, but once copied, it is not necessary in the original database cluster.
In PITR mode, PostgreSQL replays the WAL data of the archive logs on the base backup, from the REDO point created by the pg_backup_start up to the point you want to recover.
In PostgreSQL, the point to be recovered is referred to as a recovery target.
Basic concept of PITR in PostgreSQL is depicted in figure below:
- At first, you need to set the command of the restore_command parameter, and also set the time of the recovery_target_time parameter to the point you made the mistake (in this case, 12:05 GMT) in a postgresql.conf (versions 12 or later) or recovery.conf (versions 11 or earlier).
I hope, this blog has helped you in understanding the initial concepts of Base Backup & Point-in-Time Recovery in PostreSQL.
Check out summary of Chapter : 10 Part-2
If you want to understand PostgreSQL In-Depth.