DEV Community

m_aamir
m_aamir

Posted on

Understanding Transaction Status and Commit Log in PostgreSQL

Transactions play a crucial role in ensuring the reliability and consistency of data in a database management system. In PostgreSQL, the statuses of transactions are stored in the Commit Log, commonly known as the clog. This article dives into the intricacies of transaction statuses, the functioning of the clog, and the maintenance involved.

Transaction Status in PostgreSQL

PostgreSQL defines four transaction states: IN_PROGRESS, COMMITTED, ABORTED, and SUB_COMMITTED. The first three statuses are self-explanatory. When a transaction is ongoing, its status is IN_PROGRESS. Similarly, when a transaction successfully completes, it is marked as COMMITTED. In contrast, if a transaction fails or is explicitly rolled back, its status is set to ABORTED. The SUB_COMMITTED status is specific to sub-transactions but won't be covered in detail here.

How the Commit Log (Clog) Functions

The Commit Log, or clog, is a portion of shared memory in PostgreSQL that holds the transaction statuses. Physically, the clog consists of one or more 8 KB pages, forming a logical array. Each index in the array corresponds to a transaction ID, and the array stores the status of each transaction.

How the clog operates

Let's consider two scenarios:

  1. Transaction T1 with txid 200 commits, causing its status to change from IN_PROGRESS to COMMITTED.
  2. Transaction T2 with txid 201 aborts, resulting in its status changing from IN_PROGRESS to ABORTED. As the transaction ID advances and the clog reaches its capacity, a new page is appended to accommodate additional transaction statuses.

To retrieve the status of a transaction, internal functions are invoked, which read the clog and provide the requested transaction's status.

Maintenance of the Commit Log

When PostgreSQL shuts down or the checkpoint process runs, the data from the clog are written into files stored under the pg_xact subdirectory. It's worth noting that prior to Version 9.6, the subdirectory was called pg_clog. These files are named 0000, 0001, and so on, with a maximum file size of 256 KB.

For instance, if the clog occupies eight pages (from the first to the eighth, totaling 64 KB), its data will be written into the file 0000 (64 KB). In another scenario, if it comprises 37 pages (296 KB), the data will be stored in both 0000 (256 KB) and 0001 (40 KB).

During startup, PostgreSQL loads the data from the pg_xact files to initialize the clog.

Since new pages are appended to the clog as it fills up, its size continuously increases. However, not all data in the clog are necessary. Regular vacuum processing is responsible for removing old data, including both clog pages and files.

Understanding Transaction Snapshots

A transaction snapshot refers to a dataset that stores information about the active status of transactions at a specific point in time for an individual transaction. Here, an active transaction refers to one that is either in progress or has not yet started.

PostgreSQL internally defines the textual representation format of transaction snapshots as '100: 100:', where '100: 100:' signifies that transactions with IDs less than 99 are inactive, while those equal to or greater than 100 are active.

Transaction snapshots are essential for maintaining data consistency in PostgreSQL's multi-version concurrency control (MVCC) system. They allow transactions to access a consistent view of the database, even if other transactions are modifying it concurrently.

Each transaction maintains its own snapshot, which is stored in memory and includes information about the transaction's start time, visibility of data, and other relevant details. The snapshot is used to determine which data versions are visible to the transaction based on the transaction's start time and the commit status of other transactions.

When a transaction reads data, it uses its snapshot to identify the appropriate data versions based on their visibility. If a data version is marked as committed and was committed before the transaction's start time, it is considered visible to the transaction. On the other hand, if a data version is marked as aborted or was committed after the transaction's start time, it is considered invisible to the transaction.

By using transaction snapshots, PostgreSQL provides a consistent and isolated view of the data for each transaction, ensuring that transactions operate on a consistent set of data even in the presence of concurrent modifications.

References:

https://www.interdb.jp/pg/pgsql05.html

Top comments (0)