This blog aims to assist you in understanding the initial concepts of Chapter:10 [Base Backup & Point-in-Time Recovery] from the book The Internals of PostgreSQL.
Note: Ensure that you have a thorough understanding of
Chapter 9 Part-4 and basics of PostreSQL before we proceed to Chapter 10 Part-1, as it forms the foundation for our exploration.
So, Let's Start:
Introduction to chapter
Online database backup can be roughly classified into two categories: logical and physical backups.
One disadvantage of logical backups is that they can be very time-consuming. In particular, it can take a long time to make a backup of a large database, and even longer to restore the database from the backup data.
Meanwhile physical backups can be made and restored much more quickly, making them a very important and useful feature in practical systems.
In PostgreSQL, online physical full backups have been available since version 8.0.
A snapshot of a running whole database cluster (i.e., physical backup data) is known as a base backup.
Point-in-Time Recovery (PITR), which has also been available since version 8.0, is the feature to restore a database cluster to any point in time using a base backup and archive logs created by the continuous archiving feature.
Bonus Point:-
- In versions 7.4 or earlier, PostgreSQL had supported only logical backups (logical full and partial backups, and data exports).
Base Backup
First of all, the standard procedure to make a base backup using the low-level commands is as follows:
(1) Issue the pg_backup_start command (versions 14 or earlier, pg_start_backup).
(2) Take a snapshot of the database cluster using the archiving command of your choice.
(3) Issue the pg_backup_stop command (versions 14 or earlier, pg_stop_backup).
Simple procedure for database administrators.
No need for special tools; uses common tools like cp or archiving tools.
Backup process doesn't require table locks.
Allows users to continue issuing queries during the backup.
Offers a significant advantage over other major open-source RDBMSs.
Making a base backup in PostgreSQL is depicted in figure below:
pg_backup_start (Ver.14 or earlier, pg_start_backup)
The pg_backup_start command prepares for making a base backup.
The recovery process starts from a REDO point, so the pg_backup_start command must do a checkpoint to explicitly create a REDO point at the start of making a base backup.
The third and fourth operations are the heart of this command.
The first and second operations are performed to recover a database cluster more reliably.
A backup_label file contains the following six items (versions 11 or later, seven items):
CHECKPOINT LOCATION – This is the LSN location where the checkpoint created by this command has been recorded.
START WAL LOCATION – This is not used with PITR, but used with the streaming replication, which is described in Chapter 11. It is named 'START WAL LOCATION' because the standby server in replication-mode reads this value only once at initial startup.
BACKUP METHOD – This is the method used to make this base backup.
BACKUP FROM – This shows whether this backup is taken from the primary or standby server.
START TIME – This is the timestamp when the pg_backup_start command was executed.
LABEL – This is the label specified at the pg_backup_start command.
START TIMELINE – This is the timeline that the backup started. This is for a sanity check and has been introduced in version 11.
The pg_backup_start performs the following four operations:
(1) Force the database into full-page write mode.
(2) Switch to the current WAL segment file (versions 8.4 or later).
(3) Do a checkpoint.
(4) Create a backup_label file – This file, created in the top level of the base directory, contains essential information about base backup itself, such as the checkpoint location of this checkpoint.
pg_backup_stop (Ver.14 or earlier, pg_stop_backup)
The pg_backup_stop command performs the following five operations to complete the backup:
(1) Reset to non-full-page writes mode if it has been forcibly changed by the pg_backup_start command.
(2) Write a XLOG record of backup end.
(3) Switch the WAL segment file.
(4) Create a backup history file. This file contains the contents of the backup_label file and the timestamp that the pg_backup_stop command was executed.
(5) Delete the backup_label file. The backup_label file is required for recovery from the base backup, but once copied, it is not necessary in the original database cluster.
How Point-in-Time Recovery Works
In PITR mode, PostgreSQL replays the WAL data of the archive logs on the base backup, from the REDO point created by the pg_backup_start up to the point you want to recover.
In PostgreSQL, the point to be recovered is referred to as a recovery target.
Basic concept of PITR in PostgreSQL is depicted in figure below:
- At first, you need to set the command of the restore_command parameter, and also set the time of the recovery_target_time parameter to the point you made the mistake (in this case, 12:05 GMT) in a postgresql.conf (versions 12 or later) or recovery.conf (versions 11 or earlier).
I hope, this blog has helped you in understanding the initial concepts of Base Backup & Point-in-Time Recovery in PostreSQL.
Check out summary of Chapter : 10 Part-2
If you want to understand PostgreSQL In-Depth.
Top comments (0)