DEV Community

Yusen Meng

Understanding Two-Phase and Three-Phase Commit Protocols in Distributed Systems

Introduction: What problems might we encounter in a distributed system?

In a distributed system, multiple nodes must work together to complete a task. Take an e-commerce website whose services are spread across several data centers: a user places an order, the order service runs in one data center, and the inventory service runs in another. Without a coordination mechanism such as two-phase commit, two kinds of failure can occur. First, the order service creates the order, but a network problem prevents the inventory service from receiving the request to reduce stock; the inventory count is then too high, and users may buy goods that are actually sold out. Second, the inventory service reduces the stock, but the order service fails while creating the order, so stock is deducted with no corresponding order.

This is just one example of many possibilities. Without a suitable distributed transaction mechanism, such failures lead to inconsistent data, an unreliable system, and a poor shopping experience for users. To address them, we introduce protocols such as two-phase commit, which ensure that a distributed transaction takes effect either on every node or on none.
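The first failure described above can be reproduced in a few lines. This is a minimal sketch, not code from a real shop: `OrderService`, `InventoryService`, and the `network_ok` flag are hypothetical names used only to simulate a lost request.

```python
class OrderService:
    def __init__(self):
        self.orders = []

    def create(self, item, quantity):
        self.orders.append((item, quantity))


class InventoryService:
    def __init__(self, stock):
        self.stock = stock

    def reserve(self, quantity):
        self.stock -= quantity


def place_order_uncoordinated(orders, inventory, item, quantity, network_ok):
    """Two independent writes with nothing tying their outcomes together."""
    orders.create(item, quantity)      # step 1 succeeds locally
    if not network_ok:                 # simulated lost request to inventory
        return False                   # step 2 never runs, and step 1 is not undone
    inventory.reserve(quantity)
    return True


orders = OrderService()
inventory = InventoryService(stock=1)
place_order_uncoordinated(orders, inventory, "book", 1, network_ok=False)
print(len(orders.orders), inventory.stock)  # 1 1
```

The order exists, but the stock was never reduced, so the last item can be sold again.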

Two-Phase Commit: A Deep Understanding of the Core of Distributed Transactions

As distributed systems have become widespread, keeping data consistent across multiple nodes has become an important challenge. Two-phase commit (2PC) is one of the classic algorithms for the distributed transaction problem. This article analyzes how two-phase commit works, along with its advantages and disadvantages.

What is Two-Phase Commit?

Two-phase commit (2PC) is a protocol that implements transaction atomicity in a distributed system. Atomicity means that a transaction is either fully executed or not executed at all; there is no partial execution. In a distributed system, a transaction may involve multiple nodes, so a mechanism is needed to ensure that all nodes either commit the transaction or roll it back. This is the role of the two-phase commit protocol.

Basic Process of Two-Phase Commit

2PC mainly involves two roles: Coordinator and Participant.

First Phase (Preparation Phase)

  1. Voting request: The coordinator sends a transaction request to all participants, asking them whether they can commit the transaction.
  2. Voting: Each participant pre-executes the transaction and decides how to vote based on its local state. If it can commit, it records the transaction in its log and responds with "YES"; otherwise, it responds with "NO".

Second Phase (Commit/Rollback Phase)

  1. All votes are YES: If all participants respond with "YES", the coordinator sends a "commit" command to all participants.
  2. Some votes are NO: If any participant responds with "NO", or does not respond within a specified time, the coordinator sends a "rollback" command to all participants.
  3. Acknowledgement: After receiving the "commit" or "rollback" command, each participant performs the corresponding operation and sends a confirmation to the coordinator.
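The two phases above can be sketched as a small in-memory simulation. This is an illustration under simplifying assumptions, not a production implementation: the `Participant` class, its `reachable` flag (which stands in for a timeout), and the state names are all hypothetical, and real systems would persist the log and send messages over a network.

```python
from enum import Enum


class Vote(Enum):
    YES = "YES"
    NO = "NO"


class Participant:
    def __init__(self, name, can_commit=True, reachable=True):
        self.name = name
        self.can_commit = can_commit
        self.reachable = reachable   # False simulates a participant that times out
        self.log = []
        self.state = "INIT"

    def prepare(self, txn_id):
        if not self.reachable:
            raise TimeoutError(f"{self.name} did not respond")
        if not self.can_commit:
            self.state = "ABORTED"
            return Vote.NO
        self.log.append(("prepared", txn_id))   # record before voting YES
        self.state = "PREPARED"
        return Vote.YES

    def commit(self, txn_id):
        self.log.append(("committed", txn_id))
        self.state = "COMMITTED"
        return "ACK"

    def rollback(self, txn_id):
        self.log.append(("rolled_back", txn_id))
        self.state = "ABORTED"
        return "ACK"


def two_phase_commit(txn_id, participants):
    # Phase 1: collect votes; a missing response counts as NO.
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare(txn_id))
        except TimeoutError:
            votes.append(Vote.NO)

    decision = "commit" if all(v is Vote.YES for v in votes) else "rollback"

    # Phase 2: broadcast the decision to every participant we can reach.
    for p in participants:
        if not p.reachable:
            continue   # it would learn the outcome after it recovers
        if decision == "commit":
            p.commit(txn_id)
        else:
            p.rollback(txn_id)
    return decision


print(two_phase_commit("t1", [Participant("orders"), Participant("inventory")]))
# commit
print(two_phase_commit("t2", [Participant("orders"),
                              Participant("inventory", can_commit=False)]))
# rollback
```

A single "NO" vote or a single timeout is enough to roll back everyone, which is what gives the protocol its all-or-nothing behavior.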

[Figure: 2PC]

Pros and Cons

Pros:

  1. Ensures transaction atomicity in a distributed environment: In the first stage, every participant pre-executes the transaction and votes on whether it can commit. In the second stage, the transaction is committed only if all participants agreed; otherwise it is rolled back. Either way, all participants end in the same state, which is what atomicity requires.

  2. The structure is relatively simple and easy to understand: The protocol has only two stages, preparation and commit/rollback, and a single coordinator that collects the votes and makes the decision. The process is intuitive and straightforward to implement.

Cons:

  1. Performance overhead: The two-phase commit protocol requires two round trips between the coordinator and every participant: the coordinator sends the transaction request, participants return their votes, the coordinator sends the commit or rollback command, and participants confirm. These exchanges add latency, and the overhead grows in poor network environments.

  2. Single point of failure: The coordinator plays a key role in the two-phase commit protocol. If it fails or crashes after participants have voted, those participants are left in an uncertain state, not knowing whether to commit or roll back. They must wait for the coordinator to recover before they can continue, which degrades the overall performance of the system. Running more than one coordinator mitigates this: for example, in a primary/backup arrangement, a backup coordinator takes over when the primary fails. Heartbeat detection and timeouts can be used to detect coordinator failures.

  3. Blocking problem: Participants hold their resources from the moment they vote until they learn the outcome. During the first stage, the coordinator can protect itself with a timeout: if a participant does not return its vote within a specified time, or votes "NO", the coordinator rolls the transaction back so that the other participants do not wait for a long time. A participant that has already voted "YES" has no such option. If the coordinator fails or its decision is delayed, the participant can neither commit nor roll back on its own, because the coordinator may have decided either way, so it must wait. In some scenarios this blocking lasts a long time and seriously affects the performance and availability of the system. Timeouts therefore reduce blocking but do not eliminate it.
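The blocking problem comes down to what a participant is allowed to decide alone when the coordinator goes silent. The sketch below encodes that rule; the function name and the state names (`INIT`, `PREPARED`, `COMMITTED`, `ABORTED`) are assumptions chosen for illustration.

```python
def on_coordinator_timeout(participant_state):
    """What a 2PC participant may safely do on its own after a timeout."""
    if participant_state == "INIT":
        # Has not voted yet, so no promise was made: aborting locally is safe.
        return "abort"
    if participant_state == "PREPARED":
        # Voted YES: the coordinator may already have decided either way,
        # so the participant must keep its locks and wait for the outcome.
        return "block"
    # COMMITTED or ABORTED: the outcome is already known.
    return "done"


for state in ("INIT", "PREPARED", "COMMITTED"):
    print(state, "->", on_coordinator_timeout(state))
# INIT -> abort
# PREPARED -> block
# COMMITTED -> done
```

The `PREPARED` row is the window in which 2PC blocks: the participant has given up its right to abort but has not yet been told to commit.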

Three-Phase Commit

However, these strategies cannot completely solve the problems of the two-phase commit protocol. The three-phase commit protocol (3PC) was proposed to address them. 3PC adds a timeout mechanism and a pre-commit stage to 2PC, which lets it handle the single point of failure and blocking problems better.

Three-phase commit introduces a new stage, so the protocol has the following three stages:

First Phase (CanCommit Phase)

  1. Commit Inquiry: The coordinator sends a commit inquiry to all participants, asking them whether they can enter the next stage, that is, commit the transaction.
  2. Participant Decision: Each participant decides whether it can commit based on its own status, and then responds with "can" or "cannot".

Second Phase (PreCommit Phase)

  1. All participants agree: If all participants reply "can", then the coordinator decides to continue to commit and enter this stage, sending a "pre-commit" message to all participants.
  2. Participant Pre-execution: After receiving the "pre-commit" message, the participant executes the transaction operation but does not commit, and then sends a "ready to commit" message to the coordinator.

Third Phase (DoCommit Phase)

  1. All participants are ready: After the coordinator receives the "ready to commit" message from all participants, it sends them a "commit" message.
  2. Participant Commit: After receiving the "commit" message, the participant officially commits its transaction.
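  3. Transaction Abort and Timeouts: If any participant answers "cannot" or fails to respond during the first two phases, the coordinator sends an "abort" message and the transaction is aborted. Likewise, a participant that has not yet received "pre-commit" aborts if it stops hearing from the coordinator. In the standard description of 3PC, a participant that has acknowledged "pre-commit" and then times out waiting for "commit" goes ahead and commits, because reaching that state means every participant had agreed.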
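The three stages can be sketched from the coordinator's side as follows. This is a simplified, failure-free simulation: `Participant3PC`, its method names, and the state strings are hypothetical, and timeouts and networking are left out so that the message order is easy to follow.

```python
class Participant3PC:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def can_commit_query(self, txn_id):
        # Stage 1: only answer the inquiry; nothing is executed yet.
        self.state = "READY" if self.can_commit else "ABORTED"
        return self.can_commit

    def pre_commit(self, txn_id):
        # Stage 2: execute the transaction without committing it.
        self.state = "PRECOMMITTED"
        return "ready to commit"

    def do_commit(self, txn_id):
        # Stage 3: make the changes permanent.
        self.state = "COMMITTED"

    def abort(self, txn_id):
        self.state = "ABORTED"


def three_phase_commit(txn_id, participants):
    # Stage 1 (CanCommit): ask everyone; a list forces every query to run.
    answers = [p.can_commit_query(txn_id) for p in participants]
    if not all(answers):
        for p in participants:
            p.abort(txn_id)
        return "abort"

    # Stage 2 (PreCommit): everyone agreed, so ask them to pre-execute.
    acks = [p.pre_commit(txn_id) for p in participants]
    if any(ack != "ready to commit" for ack in acks):
        for p in participants:
            p.abort(txn_id)
        return "abort"

    # Stage 3 (DoCommit): all participants are ready, so commit.
    for p in participants:
        p.do_commit(txn_id)
    return "commit"


print(three_phase_commit("t1", [Participant3PC("orders"),
                                Participant3PC("inventory")]))
# commit
print(three_phase_commit("t2", [Participant3PC("orders"),
                                Participant3PC("inventory", can_commit=False)]))
# abort
```

In the second call, no participant ever reaches `PRECOMMITTED`, so nothing was executed before the transaction was rejected.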

[Figure: 3PC]

Pros:

  1. Better handles single point of failure and blocking: The pre-commit stage and the timeout mechanism mean participants are no longer stuck when the coordinator fails. A participant that has received "pre-commit" knows that every participant agreed to commit, so it can act on a timeout instead of waiting indefinitely for the coordinator.

  2. Enhances system availability and robustness: Because of the timeout mechanism, participants do not wait for a long time when another participant fails to respond or answers "cannot", which improves the availability and robustness of the system.

  3. Avoids wasted work: The CanCommit stage is only an inquiry, and participants begin executing the transaction in the pre-commit stage, after every participant has said it can commit. A transaction that would have been rejected is therefore turned away before any participant executes it, which matters most when the transaction operation takes a long time.
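The timeout mechanism is easiest to see as a table of default actions, one per participant state. The sketch below follows the commonly described 3PC rule that a participant which has already acknowledged "pre-commit" commits on timeout, while earlier states abort; the state names and the `on_timeout_3pc` helper are assumptions for illustration.

```python
# Default action a 3PC participant takes when it stops hearing from the coordinator.
TIMEOUT_DEFAULTS = {
    "INIT": "abort",           # never received the commit inquiry
    "READY": "abort",          # answered "can", but no "pre-commit" arrived
    "PRECOMMITTED": "commit",  # everyone agreed earlier, so finishing is the default
}


def on_timeout_3pc(state):
    # COMMITTED and ABORTED are final, so there is nothing left to decide.
    return TIMEOUT_DEFAULTS.get(state, "done")


for state in ("INIT", "READY", "PRECOMMITTED", "COMMITTED"):
    print(state, "->", on_timeout_3pc(state))
# INIT -> abort
# READY -> abort
# PRECOMMITTED -> commit
# COMMITTED -> done
```

Every state has a default action, so a participant never has to wait indefinitely. The trade-off is that these defaults are only safe when a timeout really means a crash rather than a slow or partitioned network.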

Cons and Solutions:

  1. Higher message overhead: The three-phase commit protocol adds a stage to the two-phase commit protocol, which means one more round of messages between the coordinator and every participant. This can hurt performance, especially in poor network environments. The overhead can be reduced, though not removed, by improving the network environment and the message transmission mechanism.

  2. Increased complexity: The new pre-commit stage and the timeout mechanism make the protocol more complex, and therefore harder to implement and maintain. An alternative is to rely on a consensus protocol such as Paxos or Raft, which is designed to tolerate node failures, for example by using it to replicate the coordinator's decision so that a single crash no longer blocks the transaction.

  3. Blocking and inconsistency are still possible: Although the timeout mechanism removes most blocking, some cases remain difficult. Under a network partition, participants on different sides can time out in different states and reach different outcomes, and if the coordinator and participants fail at the same time, the surviving nodes may be unable to decide. A fault recovery mechanism such as logging helps: when a coordinator or participant restarts, it recovers its state from the log and resumes the protocol from there.
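The message overhead can be estimated with simple arithmetic. The sketch below assumes a failure-free run with n participants and counts one request and one reply per participant in every stage, including an acknowledgement of the final command in both protocols; the function names are hypothetical.

```python
def messages_2pc(n):
    # prepare + vote, then commit/rollback + confirmation: 2 round trips
    return 4 * n


def messages_3pc(n):
    # can-commit + reply, pre-commit + ready, commit + confirmation: 3 round trips
    return 6 * n


for n in (2, 5, 10):
    print(n, messages_2pc(n), messages_3pc(n))
# 2 8 12
# 5 20 30
# 10 40 60
```

Under these assumptions 3PC sends 50% more messages than 2PC and needs one extra round trip before the transaction completes.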

Conclusion

In a distributed system, transaction processing is an important issue, and the two-phase commit (2PC) and three-phase commit (3PC) protocols are two common solutions. 2PC ensures transaction atomicity through two stages of operation, but it suffers from performance overhead, a single point of failure, and blocking. 3PC adds a pre-commit stage and a timeout mechanism, which handle the single point of failure and blocking problems better, at the cost of higher message overhead and complexity. In practice, choose the protocol based on the specific requirements and environment of the system.
