Concurrency control is a vital aspect of database management systems that allows multiple transactions to run concurrently while maintaining atomicity and isolation. In PostgreSQL, a popular open-source relational database management system (RDBMS), several concurrency control techniques are employed to ensure data consistency and isolation. In this blog post, we will explore these techniques and understand how PostgreSQL handles concurrency control.
MVCC is one of the concurrency control techniques utilized by PostgreSQL. It operates on the principle of creating multiple versions of data items during write operations while retaining the old versions. When a transaction reads a data item, the system selects the appropriate version to ensure isolation for that transaction. The key advantage of MVCC is that it allows concurrent reading and writing without blocking each other, unlike other techniques such as Strict Two-Phase Locking (S2PL).
🔒 Snapshot Isolation (SI)
PostgreSQL employs a variation of MVCC called Snapshot Isolation (SI). In SI, when a new data item is written, the old version is stored in a rollback segment. The new item is then directly inserted into the relevant table page. During read operations, PostgreSQL applies visibility check rules to select the appropriate version of an item for each transaction. SI prevents three anomalies defined in the ANSI SQL-92 standard, namely Dirty Reads, Non-Repeatable Reads, and Phantom Reads.
⚡ Optimistic Concurrency Control (OCC)
Another concurrency control technique used by PostgreSQL is Optimistic Concurrency Control (OCC). Unlike MVCC, which creates multiple versions of data items, OCC assumes that conflicts between transactions are rare. It allows transactions to proceed without acquiring locks initially but checks for conflicts during the commit phase. If a conflict is detected, one of the conflicting transactions is rolled back and must be retried.
While SI provides good isolation and prevents ANSI SQL-92 anomalies, it cannot achieve true serializability. To address this limitation, Serializable Snapshot Isolation (SSI) was introduced in PostgreSQL version 9.1. SSI can detect serialization anomalies, such as Write Skew and Read-only Transaction Skew, and resolve conflicts caused by such anomalies. With SSI, PostgreSQL offers a true SERIALIZABLE isolation level, ensuring strict consistency even in the presence of complex concurrency scenarios.
PostgreSQL provides three transaction isolation levels: READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. These levels determine the visibility and behavior of data during concurrent transactions. The table below summarizes the isolation levels and their characteristics in PostgreSQL.
|Isolation Level||Dirty Reads||Non-repeatable Read||Phantom Read||Serialization Anomaly|
|READ COMMITTED||Not possible||Possible||Possible||Possible|
|REPEATABLE READ||Not possible||Not possible||Not possible||Possible|
|SERIALIZABLE||Not possible||Not possible||Not possible||Not possible|
It's worth noting that in PostgreSQL versions before 9.1, the REPEATABLE READ level was labeled as 'SERIALIZABLE.' However, the introduction of SSI in version 9.1 led to the introduction of a true SERIALIZABLE level, and REPEATABLE READ was redefined.
In PostgreSQL, every transaction is assigned a unique identifier called a transaction ID (txid). It is a 32-bit unsigned integer, allowing approximately 4.2 billion transaction IDs. PostgreSQL reserves special txids: 0 for Invalid txid, 1 for Bootstrap txid (used during database initialization), and 2 for Frozen txid.
Transaction IDs in PostgreSQL can be compared with each other. Let's consider the viewpoint of txid 100. Any txids greater than 100 are considered "in the future" and are invisible from the perspective of txid 100. On the other hand, txids less than 100 are considered "in the past" and are visible to txid 100.
Due to the limited space of the txid, PostgreSQL treats it as a circular space. The previous 2.1 billion txids are classified as "in the past," and the next 2.1 billion txids are classified as "in the future."
It's important to note the txid wraparound problem, which occurs when the transaction ID space is exhausted. In practical systems, the available txid space may not be sufficient. When the txid wraparound problem occurs, it can lead to data inconsistencies and system failures.
To mitigate this problem, PostgreSQL employs various mechanisms such as autovacuum to regularly clean up dead tuples and recycle transaction IDs. By ensuring that old transaction IDs are no longer needed and recycled, the txid wraparound problem can be prevented.
In PostgreSQL, heap tuples in table pages consist of three main components: the HeapTupleHeaderData structure, NULL bitmap, and user data.
For our purposes, we'll focus on four important fields:
- t_xmin: This field holds the txid of the transaction that inserted the tuple.
- t_xmax: It stores the txid of the transaction that deleted or updated the tuple. If the tuple hasn't been modified, t_xmax is set to 0 (INVALID).
- t_cid: The command ID (cid) represents the number of SQL commands executed before the current command within the same transaction.
- t_ctid: The tuple identifier (tid) points to itself or a new tuple. It is used to identify a tuple within a table. If the tuple is updated, t_ctid points to the new tuple; otherwise, it points to itself.
Now, let's explore how tuples are inserted, deleted, and updated in PostgreSQL. We'll also touch upon the Free Space Map (FSM) used for these operations.
- Insertion: When inserting a new tuple, it is directly inserted into a page of the target table. For example, if a tuple is inserted by a transaction with txid 99, the tuple's header fields would be set accordingly.
- t_xmin: 99 (inserted by txid 99)
- t_xmax: 0 (not deleted or updated)
- t_cid: 0 (first tuple inserted by txid 99)
- t_ctid: (0,1) (points to itself)
- Deletion: To logically delete a tuple, the txid executing the DELETE command is set as the t_xmax of the tuple. This marks the tuple as deleted, and it becomes a dead tuple.
- t_xmax: 111 (deleted by txid 111)
- Update: When updating a tuple, PostgreSQL performs a logical deletion of the existing tuple and inserts a new one. The t_ctid field of the old tuple is updated to point to the new tuple.
- t_xmax: 100 (logically deleted by txid 100) t_ctid: (0,2) (points to the new tuple)
- t_xmax: 150 (logically deleted by txid 150)
- t_ctid: (0,3) (points to the new tuple)
In both update scenarios, the original tuples are logically deleted by setting their t_xmax values to the respective transaction IDs. This ensures that the updated tuples have separate entries and maintain the consistency and isolation of transactions.
📦 Free Space Map (FSM)
The Free Space Map (FSM) is a crucial component of PostgreSQL's storage management system. It keeps track of the free space available in each page of a table. The FSM is used during tuple insertion and helps determine the appropriate page to accommodate new tuples.
When a tuple is inserted, PostgreSQL consults the FSM to find a page with enough free space. If the existing page doesn't have sufficient room, the system looks for an available page with more space. Once a suitable page is found, the new tuple is inserted, and the FSM is updated to reflect the changes in free space.
Similarly, when a tuple is deleted, the space occupied by the tuple is marked as free in the FSM, allowing it to be reused for future insertions.
⚙️ Transaction Management and Tuple Visibility
Transaction management and tuple visibility are closely linked in PostgreSQL. The transaction ID (txid) plays a vital role in determining which tuples are visible to a transaction.
When a transaction starts, it gets assigned a unique txid. The transaction can see all tuples with a txid less than its own (in the past). Tuples with txids greater than the transaction's ID (in the future) are considered invisible to the transaction.
This mechanism ensures transaction isolation, as each transaction operates on a consistent snapshot of the database, only seeing tuples that were committed before its start time.
🔍 Snapshot Isolation and Multi-Version Concurrency Control (MVCC)
PostgreSQL employs the Multi-Version Concurrency Control (MVCC) technique to achieve snapshot isolation. MVCC allows concurrent transactions to operate on the database without blocking each other excessively.
Under MVCC, each transaction works with its snapshot of the database, composed of tuples that are visible to that transaction based on their transaction IDs. This approach enables a higher degree of concurrency while maintaining data integrity.
- Transaction Status 🔄 PostgreSQL defines four transaction states: • IN_PROGRESS: Represents an ongoing transaction. • COMMITTED: Indicates that a transaction has been successfully committed. • ABORTED: Denotes a transaction that has been aborted. • SUB_COMMITTED: Reserved for sub-transactions (omitted in this document). 🔍 Example: Suppose we have two transactions: • Transaction T1 with txid 200 commits, changing its status from IN_PROGRESS to COMMITTED. • Transaction T2 with txid 201 aborts, changing its status from IN_PROGRESS to ABORTED.
- How Clog Performs 📚 The Commit Log (Clog) in PostgreSQL is a logical array stored in shared memory. It consists of one or more 8 KB pages, where each page represents a transaction id (txid) and holds the status of the corresponding transaction. When the clog reaches its capacity, a new page is appended. To determine the status of a transaction, internal functions read the clog and retrieve the relevant information. 🖼️ Example: Consider a scenario where the clog contains two pages: • Page 1: Stores the status of txids 0-8191. • Page 2: Stores the status of txids 8192-16383.
- Maintenance of the Clog 💡 The data in the clog are periodically written to files stored in the pg_xact subdirectory when PostgreSQL shuts down or during checkpoint processes. These files are named 0000, 0001, and so on, with a maximum file size of 256 KB. Upon startup, PostgreSQL loads the data stored in the pg_xact files to initialize the clog. Additionally, PostgreSQL regularly performs vacuum processing to remove unnecessary old data from the clog. 📝 Note: In previous versions (before 9.6), pg_xact was called pg_clog.
- Transaction Snapshot 📸 A transaction snapshot in PostgreSQL is a dataset that stores information about the active state of transactions at a specific point in time. An active transaction refers to a transaction that is either in progress or has not yet started. PostgreSQL internally represents transaction snapshots using a textual format: 'xmin:xmax:xip_list.' For example, '100:104:100,102' indicates that txids less than 99 are not active, while txids 100, 102, and greater are active. 💻 Example: Executing the query SELECT pg_current_snapshot(); returns the transaction snapshot '100:104:100,102'. The snapshot representation breakdown: • xmin: The earliest active txid. • xmax: The first unassigned txid. • xip_list: Active txids between xmin and xmax.
Visibility check rules determine whether a tuple is visible or invisible based on the tuple's t_xmin, t_xmax, the clog, and the obtained transaction snapshot. Let's focus on the essential rules for visibility checks (ignoring sub-transactions and t_ctid discussions).
- Status of t_xmin is ABORTED • Rule 1: If t_xmin status is 'ABORTED,' the tuple is always invisible.
- Status of t_xmin is COMMITTED and t_xmax is not set • Rule 2: If t_xmin status is 'COMMITTED' and t_xmax is not set, the tuple is always visible.
- Status of t_xmin is COMMITTED and t_xmax is set • Rule 3: If t_xmin status is 'COMMITTED' and t_xmax is set, the tuple is visible if the current transaction snapshot is earlier than t_xmax. • Rule 4: If t_xmin status is 'COMMITTED' and t_xmax is set, the tuple is invisible if the current transaction snapshot is later than or equal to t_xmax.
- Status of t_xmin is IN_PROGRESS • Rule 5: If t_xmin status is 'IN_PROGRESS' and t_xmax is not set, the tuple is visible if the current transaction snapshot is earlier than the snapshot that initiated the transaction (t_xmin). • Rule 6: If t_xmin status is 'IN_PROGRESS' and t_xmax is not set, the tuple is invisible if the current transaction snapshot is later than or equal to the snapshot that initiated the transaction (t_xmin).
- Status of t_xmin is IN_PROGRESS and t_xmax is set • Rule 7: If t_xmin status is 'IN_PROGRESS' and t_xmax is set, the tuple is visible if the current transaction snapshot is earlier than t_xmax and later than or equal to t_xmin. • Rule 8: If t_xmin status is 'IN_PROGRESS' and t_xmax is set, the tuple is invisible if the current transaction snapshot is earlier than or equal to t_xmin or later than or equal to t_xmax.
- Status of t_xmin is IN_PROGRESS and t_xmax is not set, and current transaction snapshot's xmax is set • Rule 9: If t_xmin status is 'IN_PROGRESS,' t_xmax is not set, and the current transaction snapshot's xmax is set, the tuple is invisible if the current transaction snapshot is later than or equal to t_xmax.
- Status of t_xmin is IN_PROGRESS and t_xmax is set, and current transaction snapshot's xmax is set • Rule 10: If t_xmin status is 'IN_PROGRESS,' t_xmax is set, and the current transaction snapshot's xmax is set, the tuple is invisible if the current transaction snapshot is later than or equal to t_xmax. By applying these visibility check rules, PostgreSQL ensures that transactions see consistent and appropriate data according to their transaction states and the state of other concurrent transactions.
- Performing a Visibility Check Let's begin by understanding how PostgreSQL performs a visibility check. 📊 Consider the following time sequence of SQL commands: T1: Start transaction (txid 200) T2: Start transaction (txid 201) T3: Execute SELECT commands of txid 200 and 201 T4: Execute UPDATE command of txid 200 T5: Execute SELECT commands of txid 200 and 201 T6: Commit txid 200 T7: Execute SELECT command of txid 201 For simplicity, let's assume we have two transactions, txid 200 and txid 201. The isolation level of txid 200 is READ COMMITTED, while txid 201 is either READ COMMITTED or REPEATABLE READ. Now, let's examine how the visibility check works for each tuple using the SELECT commands at T3 and T5. SELECT commands of T3: At T3, the table "tbl" contains only one tuple, Tuple_1, which is visible according to Rule 6. As a result, the SELECT commands in both transactions return 'Jekyll'. Rule6(Tuple_1) ⇒ Status(t_xmin:199) = COMMITTED ∧ t_xmax = INVALID ⇒ Visible sqlCopy code testdb=# -- txid 200 testdb=# SELECT * FROM tbl; name -------- Jekyll (1 row) testdb=# -- txid 201 testdb=# SELECT * FROM tbl; name -------- Jekyll (1 row) SELECT commands of T5: Let's explore the SELECT command executed by txid 200 first. Tuple_1 is invisible as per Rule 7, while Tuple_2 is visible based on Rule 2. Consequently, this SELECT command returns 'Hyde'. Rule7(Tuple_1): Status(t_xmin:199) = COMMITTED ∧ Status(t_xmax:200) = IN_PROGRESS ∧ t_xmax:200 = current_txid:200 ⇒ Invisible Rule2(Tuple_2): Status(t_xmin:200) = IN_PROGRESS ∧ t_xmin:200 = current_txid:200 ∧ t_xmax = INVALID ⇒ Visible sqlCopy code testdb=# -- txid 200 testdb=# SELECT * FROM tbl; name ------ Hyde (1 row) On the other hand, in the SELECT command executed by txid 201, Tuple_1 is visible according to Rule 8, while Tuple_2 is invisible due to Rule 4. Thus, this SELECT command returns 'Jekyll'. Rule8(Tuple_1): Status(t_xmin:199) = COMMITTED ∧ Status(t_xmax:200) = IN_PROGRESS ∧ t_xmax:200 ≠ current_txid:201 ⇒ Visible Rule4(Tuple_2): Status(t_xmin:200) = IN_PROGRESS ∧ t_xmin:200 ≠ current_txid:201 ⇒ Invisible
testdb=# -- txid 201 testdb=# SELECT * FROM tbl; name -------- Jekyll (1 row)
PostgreSQL implements various measures to prevent anomalies and ensure data consistency. Let's examine how it addresses the three ANSI SQL-92 anomalies: Dirty Reads, Repeatable Reads, and Phantom Reads.
Dirty Reads occur when a transaction reads uncommitted data from another transaction that is still in progress. PostgreSQL prevents Dirty Reads by following the Read Committed isolation level, which ensures that a transaction only sees changes committed by other transactions.
In our example scenario, when txid 200 updates Tuple_1 at T4, txid 201 does not see the uncommitted changes in its SELECT command at T5. This behavior aligns with the Read Committed isolation level, eliminating Dirty Reads.
Repeatable Reads guarantee that a transaction sees the same snapshot of data throughout its execution, even if other transactions commit changes. PostgreSQL achieves this by using multi-version concurrency control (MVCC) and transaction IDs (txid). Each transaction has a unique txid, and PostgreSQL ensures that a transaction only sees tuples that are visible at its snapshot's txid.
In the example scenario, the SELECT command executed by txid 200 at T5 retrieves 'Hyde' as expected. Despite the concurrent update to Tuple_1 by txid 200, txid 201, executing its SELECT command at T7, still sees the initial value 'Jekyll.' This demonstrates the Repeatable Read isolation level in action.
Phantom Reads occur when a transaction sees a different set of rows during successive reads due to concurrent insertions or deletions by other transactions. PostgreSQL addresses Phantom Reads by using a combination of MVCC and predicate locking.
In our scenario, txid 201 executing its SELECT command at T7 does not encounter any Phantom Reads. Despite the concurrent insertion of Tuple_2 by txid 200, txid 201 consistently sees 'Jekyll' in its result. PostgreSQL's MVCC mechanism and predicate locking prevent the occurrence of Phantom Reads.
In PostgreSQL, there is a powerful isolation level called Serializable Snapshot Isolation (SSI) that ensures transactions are executed as if they were running serially, even in a concurrent environment. SSI is a robust mechanism that prevents serialization anomalies, such as Write-Skew, and guarantees the consistency of data. In this blog, we will explore the implementation of SSI in PostgreSQL and understand how it works with proper examples.
Serializable Snapshot Isolation is a true serializable isolation level embedded in PostgreSQL since version 9.1. It aims to provide a high level of isolation and prevent anomalies that can occur in concurrent transactions. SSI is based on the concept of a precedence graph, also known as a dependency graph or serialization graph. To understand SSI better, let's briefly go through some technical terms used in its implementation.
A precedence graph represents the order in which transactions read and write data. It captures the dependencies between transactions and helps identify potential conflicts and anomalies.
Serialization anomalies refer to situations where the outcome of concurrent transactions violates the expected serial order. One example of a serialization anomaly is Write-Skew, which we will explore in detail later.
The core strategy for implementing SSI in PostgreSQL involves detecting and resolving conflicts between transactions. If a cycle is formed in the precedence graph due to conflicting operations, it indicates a serialization anomaly. Let's consider the simplest anomaly, Write-Skew, to understand the basic strategy.
Write-Skew occurs when two transactions read a set of values and then modify them based on the assumption that the values remain unchanged. However, when both transactions commit, the modifications conflict with each other, leading to an inconsistent state.
To illustrate Write-Skew, let's consider the following schedule:
Transaction_A: Read Tuple_B Transaction_B: Read Tuple_A Transaction_A: Write Tuple_A Transaction_B: Write Tuple_B
In this scenario, Transaction_A reads Tuple_B, and Transaction_B reads Tuple_A. Then, both transactions write their respective tuples. This schedule contains two read-write conflicts (rw-conflicts), forming a cycle in the precedence graph, as shown in Figure. As a result, Write-Skew, a serialization anomaly, occurs.
To handle rw-conflicts and detect serialization anomalies, PostgreSQL follows the following strategy:
- Record all objects (tuples, pages, relations) accessed by transactions as SIREAD locks.
- Detect rw-conflicts by checking SIREAD locks whenever a heap or index tuple is written.
- Abort the transaction if a serialization anomaly is detected by checking the detected rw-conflicts.
By implementing this strategy, PostgreSQL ensures that transactions running under Serializable Snapshot Isolation are free from serialization anomalies.
PostgreSQL has implemented several functions and data structures to support the SSI mechanism. In this section, we will focus on two key data structures: SIREAD locks and rw-conflicts, which play a crucial role in detecting conflicts and ensuring serialization consistency.
SIREAD locks, also known as predicate locks, are pairs of objects and transaction IDs (txids) that store information about which objects have been accessed by which transactions. In PostgreSQL, SIREAD locks are created as part of the CheckTargetForConflictsOut function whenever a DML command is executed in SERIALIZABLE isolation level.
When a transaction performs a DML (Data Manipulation Language) command, such as INSERT, UPDATE, or DELETE, in the SERIALIZABLE isolation level, the CheckTargetForConflictsOut function is invoked. This function examines the target objects of the command and creates SIREAD locks for each object involved in the transaction.
A SIREAD lock consists of the following components:
Object Identifier: It uniquely identifies the object being accessed, such as a table or an index.
Transaction ID (txid): It represents the transaction that acquired the SIREAD lock on the object.
The SIREAD locks are stored in a data structure called the SIREAD lock table, which maintains a mapping between object identifiers and the corresponding transaction IDs.
The purpose of SIREAD locks is to track the read dependencies between transactions. When a transaction reads data from an object, it acquires a SIREAD lock on that object. This lock ensures that no other transaction can modify or delete the object until the reading transaction completes. If a conflicting write or delete operation is detected, the reading transaction will be aborted to maintain serialization consistency.
Alongside SIREAD locks, PostgreSQL uses a mechanism called rw-conflicts to detect conflicts between read and write operations. RW-conflicts represent situations where a transaction tries to read an object that is being modified by another transaction simultaneously.
When a transaction attempts to read an object, it checks for rw-conflicts by examining the SIREAD locks in the SIREAD lock table. If a transaction holds a SIREAD lock on an object, it means that the object is being modified by that transaction. In such cases, a rw-conflict is detected, and the reading transaction is aborted to ensure serialization consistency.
By combining SIREAD locks and rw-conflicts, PostgreSQL's SSI mechanism provides a robust solution for achieving serializability in multi-transaction environments. It allows concurrent execution of transactions while preventing conflicts that could lead to data inconsistencies.
It's important to note that implementing SSI in PostgreSQL requires configuring the appropriate isolation level for transactions and ensuring proper transaction management to leverage the benefits of the SSI mechanism effectively.
In conclusion, concurrency control in PostgreSQL plays a vital role in maintaining data consistency and isolation in a multi-user environment. PostgreSQL utilizes various techniques such as Multi-version Concurrency Control (MVCC), Snapshot Isolation (SI), Optimistic Concurrency Control (OCC), and Serializable Snapshot Isolation (SSI) to handle concurrent transactions effectively.
These techniques allow transactions to read and write data concurrently without blocking each other, while ensuring that anomalies such as dirty reads, non-repeatable reads, and phantom reads are prevented. PostgreSQL's transaction isolation levels, including READ COMMITTED, REPEATABLE READ, and SERIALIZABLE, offer different levels of visibility and behavior for concurrent transactions.
To manage concurrency, PostgreSQL assigns a unique transaction ID (txid) to each transaction and implements mechanisms like autovacuum to prevent the transaction ID wraparound problem. Understanding the tuple structure, insertion, deletion, and update operations, as well as the role of the Free Space Map (FSM), is crucial for comprehending concurrency control in PostgreSQL.
By effectively managing transactions, performing visibility checks, and leveraging MVCC, PostgreSQL ensures data consistency and isolation even in complex concurrency scenarios. This allows multiple users to access and modify the database concurrently without compromising the integrity of the data. PostgreSQL's robust concurrency control mechanisms make it a reliable choice for applications that require concurrent access to data while maintaining data consistency and isolation.