DEV Community

Sidharth
Sidharth

Posted on • Originally published at newsletter.scalablethread.com

Understanding CAP Theorem for Distributed Systems

When designing a distributed system, software engineers frequently encounter challenges related to three essential properties: Consistency (C), Availability (A), and Partition Tolerance (P). While engineers ideally aim to address all three properties in their design, the CAP theorem dictates that a system can only fulfill two of these properties simultaneously.

Three Properties

Consistency

In distributed systems, consistency in the CAP theorem ensures that after a write operation, all subsequent read operations should return the most recently written value, thereby preventing stale reads. This means that regardless of which node clients are connected to and which node the writes happen, all clients should read the same data simultaneously.

Availability

For a distributed system to be considered available, it must be able to respond to every request it receives, even if some of its nodes are failing or experiencing network partition. However, in such cases, the response may not always reflect the most recent data written to the system.

Partition Tolerance

In a distributed system, a network partition occurs when the system divides into multiple sub-networks due to communication failures. A partition-tolerant system will continue to serve requests even if communication between nodes is dropped or delayed.

Pick Any Two!

As previously mentioned, the CAP theorem states that a distributed system can only achieve two out of the three properties: consistency, availability, and partition tolerance. This means that if a node or group of nodes becomes partitioned from the rest of the system, it can either continue responding to client queries (AP), stop responding to client queries (CP), or shut down completely (CA).

AP — Availability and Partition Tolerance

In this scenario, the system prioritizes availability and partition tolerance, resulting in slightly weaker consistency. This means that during a network partition, the disconnected nodes may return outdated data to clients (hence sacrificing consistency). However, once the network delay is resolved or the partition is healed, the data on the nodes will eventually synchronize to the most recent version, achieving eventual consistency.

AP — Availability and Partition Tolerance


AP — Availability and Partition Tolerance

CP — Consistency and Partition Tolerance

In this situation, a distributed system emphasizes consistency and partition tolerance, which leads to decreased availability when there are network faults. In other words, if a network partition happens, the nodes that are disconnected will not respond to client queries (hence sacrificing availability), which contradicts the idea of availability.

CP — Consistency and Partition Tolerance


CP — Consistency and Partition Tolerance

CA — Consistency and Availability

In this situation, the system prioritizes consistency and availability, resulting in a lack of partition tolerance. During a network partition, the disconnected nodes shut down, sacrificing partition tolerance. Nevertheless, the connected nodes continue to maintain consistency and availability.

CA — Consistency and Availability


CA — Consistency and Availability


This article was originally published in The Scalable Thread newsletter. If you enjoyed reading it, please consider subscribing FREE to the newsletter to support my work and to receive the next post directly in your inbox.

Top comments (0)