DEV Community

Cover image for Summary of Chapter# 5 : "Concurrency Control" from the book "The Internals of PostgreSQL" Part-1
Vinay Kumar Talreja
Vinay Kumar Talreja

Posted on

Summary of Chapter# 5 : "Concurrency Control" from the book "The Internals of PostgreSQL" Part-1

This blog aims to assist you in understanding the initial concepts of Chapter:5 [Concurrency Control] from the book The Internals of PostgreSQL.

Note: Ensure that you have a thorough understanding of Chapter 4 and basics of PostreSQL before we proceed to Chapter 5 Part-1, as it forms the foundation for our exploration.

So, Let's Start:

Introduction to Chapter

  • Concurrency Control ensures atomicity and isolation in databases when multiple transactions run concurrently.

  • Three main concurrency control techniques are Multi-version Concurrency Control (MVCC), Strict Two-Phase Locking (S2PL), and Optimistic Concurrency Control (OCC).

  • MVCC creates new versions of data items during write operations while retaining old versions to ensure isolation.

  • MVCC's main advantage is that it allows readers and writers to operate concurrently without blocking each other.

  • PostgreSQL and some RDBMSs use a variation of MVCC called Snapshot Isolation (SI) for implementing concurrency control.

  • SI avoids anomalies such as Dirty Reads, Non-Repeatable Reads, and Phantom Reads.

  • Serializable Snapshot Isolation (SSI) was introduced in PostgreSQL version 9.1 to address serialization anomalies and provide true SERIALIZABLE isolation level.


Transaction ID

  • Whenever a transaction begins, a unique identifier, referred to as a transaction id (txid), is assigned by the transaction manager.

  • PostgreSQL's txid is a 32-bit unsigned integer.

  • If you execute the built-in txid_current() function after a transaction starts, the function returns the current txid as follows.

Image description

  • PostgreSQL reserves the following three special txids:

0 means Invalid txid.
1 means Bootstrap txid, which is only used in the initialization of the database cluster.
2 means Frozen txid.

Transaction ids in PostgreSQL is depicted in figure below:

Image description


Tuple Structure

  • Heap tuples in table pages are classified as a usual data tuple and a TOAST tuple.

  • A heap tuple comprises three parts, i.e. the HeapTupleHeaderData structure, NULL bitmap, and user data.

Tuple structure in PostgreSQL is depicted in figure below:

Image description

  • The HeapTupleHeaderData structure is defined in src/include/access/htup_details.h.

  • While the HeapTupleHeaderData structure contains seven fields, four fields are required in the subsequent sections.

  1. t_xmin holds the txid of the transaction that inserted this tuple.

  2. t_xmax holds the txid of the transaction that deleted or updated this tuple. If this tuple has not been deleted or updated, t_xmax is set to 0, which means INVALID.

  3. t_cid holds the command id (cid), which means how many SQL commands were executed before this command was executed within the current transaction beginning from 0.

  4. t_ctid holds the tuple identifier (tid) that points to itself or a new tuple. When this tuple is updated, the t_ctid of this tuple points to the new tuple; otherwise, the t_ctid points to itself.


Inserting, Deleting and Updating Tuples

Representation of tuples in PostgreSQL is depicted in figure below:

Image description

Insertion

  • With the insertion operation, a new tuple is inserted directly into a page of the target table.

Tuple insertion in PostgreSQL is depicted in figure below:

Image description

Analysis of above tuple insertion is given below:

  • t_xmin is set to 99 because this tuple is inserted by txid 99.
  • t_xmax is set to 0 because this tuple has not been deleted or updated.
  • t_cid is set to 0 because this tuple is the first tuple inserted by txid 99.
  • t_ctid is set to (0,1), which points to itself, because this is the latest tuple.

Deletion

  • In the deletion operation, the target tuple is deleted logically. The value of the txid that executes the DELETE command is set to the t_xmax of the tuple.

Tuple deletion in PostgreSQL is depicted in figure below:

Image description

Analysis of above tuple deletion is given below:

  • t_xmax is set to 111.

  • If txid 111 is committed, Tuple_1 is no longer required. Generally, unneeded tuples are referred to as dead tuples in PostgreSQL.

  • Dead tuples should eventually be removed from pages. Cleaning dead tuples is referred to as VACUUM Processing.

Update

  • In the update operation, PostgreSQL logically deletes the latest tuple and inserts a new one.

Tuple updation twice the row in PostgreSQL is depicted in figure below:

Image description

  • As with the delete operation, if txid 100 is committed, Tuple_1 and Tuple_2 will be dead tuples, and, if txid 100 is aborted, Tuple_2 and Tuple_3 will be dead tuples.

Free Space Map

  • When inserting a heap or an index tuple, PostgreSQL uses the Free Space Map (FSM) of the corresponding table or index to select the page which can be inserted it.

  • Each FSM stores the information about the free space capacity of each page within the corresponding table or index file.

pg_freespacemap in PostgreSQL is depicted in figure below:

Image description

Commit Log (Clog)

  • PostgreSQL holds the statuses of transactions in the Commit Log also known as Clog.

  • Clog is allocated to the shared memory, and is used throughout transaction processing.

Transaction Status

  • PostgreSQL defines four transaction states, i.e. IN_PROGRESS, COMMITTED, ABORTED, and SUB_COMMITTED.

The first three statuses are obvious. For example, when a transaction is in progress, its status is IN_PROGRESS, etc.

SUB_COMMITTED is for sub-transactions, and its description is omitted in this document.

How Clog Performs

How the Clog operates in PostgreSQL is depicted in figure below:

  • The Clog comprises one or more 8 KB pages in shared memory.

  • The Clog logically forms an array.

  • The indices of the array correspond to the respective transaction ids, and each item in the array holds the status of the corresponding transaction id.

Image description

  • When the current txid advances and the Clog can no longer store it, a new page is appended.

  • When the status of a transaction is needed, the internal functions are invoked. Those functions read the Clog and return the status of the requested transaction.

Maintenance of the Clog

  • When PostgreSQL shuts down or during checkpoint processes, the Clog data is written into files stored in the pg_xact (pg_clog) subdirectory.

  • Files in pg_xact are named 0000, 0001, etc., with a maximum file size of 256 KB.

  • When PostgreSQL starts up, the data stored in the pg_xact's files are loaded to initialize the clog.

  • The clog size grows continuously by appending new pages when it gets filled up.

  • Regular vacuum processing removes old data from the clog, including both pages and files.


I hope, this blog has helped you in understanding the initial concepts of Concurrency Control in PostreSQL.

Check out summary of Chapter : 5 Part-2

If you want to understand PostgreSQL In-Depth.

Top comments (0)