PostgreSQL is one of the most popular and widely used open-source database systems in the world. It supports many advanced features such as transactions, concurrency control, replication, and backup. But how does PostgreSQL ensure that these features work correctly and reliably, even in the face of system failures? The answer is WAL, which stands for Write Ahead Logging. In this blog post, I will introduce you to the concept of WAL, how it works, and how it helps PostgreSQL to write and recover data efficiently and safely.
What is WAL?
WAL is a technique that records all the changes and actions that occur in a database system in a sequential log file, before they are applied to the actual data files. The log file is called the transaction log or the WAL file, and each entry in the log file is called a WAL record or an XLOG record.
The main purpose of WAL is to ensure that no data is lost in case of a system failure, such as a power outage, a hardware malfunction, or a server crash. By keeping a history of all the changes and actions in the WAL file, PostgreSQL can recover the database to a consistent state by replaying the WAL records from the last checkpoint.
A checkpoint is a point in time when PostgreSQL writes all the modified data pages (also called dirty pages) from the shared buffer pool (the in-memory cache of data pages) to the data files on disk. A checkpoint also updates a special file called pg_control, which contains information about the current state of the database and the location of the latest checkpoint record in the WAL file.
By using WAL, PostgreSQL can avoid writing every change to disk immediately, which would be very slow and inefficient. Instead, PostgreSQL can write changes to disk in batches at checkpoints, while ensuring that the WAL file contains enough information to reconstruct the changes in case of a failure.
How does WAL work?
To illustrate how WAL works, let's see an example of inserting two tuples (rows) into a table called TABLE_A. Suppose that TABLE_A has only one page (a fixed-size unit of storage) and that PostgreSQL does not use WAL. The following figure shows what happens when we execute two INSERT statements without WAL:
- When we execute the first INSERT statement, PostgreSQL loads the TABLE_A's page from the data file into the shared buffer pool and inserts a tuple into the page. This page is not written to disk immediately; it is marked as dirty and stays in memory.
- When we execute the second INSERT statement, PostgreSQL inserts another tuple into the same page in memory. This page is still not written to disk yet.
- If a system failure occurs at this point, all the inserted data would be lost, because they are only in memory and not on disk.
To prevent data loss due to system failures, PostgreSQL uses WAL. The following figure shows what happens when we execute two INSERT statements with WAL:
- A check pointer process periodically performs checkpoints. Whenever it starts, it writes a checkpoint record to the current WAL segment file. This record contains the location of the latest REDO point, which is the starting point for database recovery.
- When we execute the first INSERT statement, PostgreSQL loads the TABLE_A's page into the shared buffer pool, inserts a tuple into the page, creates and writes a WAL record of this statement into the WAL buffer at location LSN_1 (Log Sequence Number), and updates the TABLE_A's LSN from LSN_0 to LSN_1. In this example, this WAL record is a pair of a header-data and the tuple entire.
- When this transaction commits, PostgreSQL creates and writes a WAL record of this commit action into the WAL buffer, and then writes and flushes all WAL records on the WAL buffer to the WAL segment file, from LSN_1.
- When we execute the second INSERT statement, PostgreSQL inserts another tuple into the same page in memory, creates and writes another WAL record of this statement into the WAL buffer at LSN_2, and updates the TABLE_A's LSN from LSN_1 to LSN_2.
- When this transaction commits, PostgreSQL operates in the same way as in step 3.
- If a system failure occurs at this point, even though all data in memory are lost, all changes are recorded in the WAL segment files as history data.
To recover from a system failure, PostgreSQL can use the information in the pg_control file and the WAL segment files to restore the database to a consistent state. The following figure shows how PostgreSQL recovers using WAL:
- PostgreSQL reads the pg_control file and finds the location of the latest checkpoint record in the WAL segment file.
- PostgreSQL reads the checkpoint record and finds the location of the latest REDO point in the WAL segment file.
- PostgreSQL reads and replays the WAL records from the REDO point in chronological order. For each WAL record, PostgreSQL compares its LSN with the corresponding page's LSN. If the WAL record's LSN is larger than the page's LSN, it means that the WAL record has not been applied to the page yet, so PostgreSQL applies the data portion of the WAL record to the page and updates the page's LSN. If the WAL record's LSN is smaller or equal to the page's LSN, it means that the WAL record has already been applied to the page, so PostgreSQL skips it and reads the next WAL record.
- PostgreSQL continues to replay WAL records until it reaches the end of the WAL segment files or encounters an invalid record.
By using this recovery process, PostgreSQL can restore the database to a consistent state as if all transactions had been completed before the failure.
Conclusion
In this blog post, I have explained what WAL is, how it works, and how it helps PostgreSQL to write and recover data efficiently and safely. WAL is a powerful technique that enables various features such as transactions, concurrency control, replication, and backup in PostgreSQL. By understanding how WAL works, you can better appreciate and utilize these features in your database applications.
I hope you enjoyed this blog post and learned something new. In the next blog post, I will continue to explore more details about WAL, such as its internal structure, format, and configuration. Stay tuned for more!
References
- Chapter 9: Write Ahead Logic - Wal
- Streamlined and grammar correction by ChatGPT
Top comments (0)