Kostas Kalafatis

Originally published at dfordebugging.wordpress.com

CAP Theorem

We'd all like to have everything, but we know that's not possible. We even have a mathematical proof that tells us as much for distributed systems like the ones we build with microservice designs. You've probably heard of the CAP theorem, especially when people talk about the pros and cons of different ways to store data. At its core, it tells us that in a distributed system we have to trade off between three guarantees: Consistency, Availability and Partition Tolerance, and that we can never have all three at once.

The CAP theorem is important because it helps us decide what we will sacrifice to meet our system's needs. On a platform such as Twitter, for example, we would sacrifice consistency to achieve availability. We need to offer a highly available service to as many users as possible. Even if a tweet only becomes visible to everyone after a short delay, the system is still doing its job. The data will not be consistent across all of Twitter's servers at every instant, but the priority here is reaching as many users as possible, not showing every user identical data at every moment.

In a banking system, we would sacrifice availability to achieve consistency. We need to make sure that the data are consistent across the system, even if that means that some users might not be able to use our service.

What is the CAP Theorem?

In general, the term distributed system describes any system that is spread across two or more nodes connected over a network of any kind. A distributed system is a collection of computers that cooperate so that they appear to their users as a single unit. All of the machines share one logical state and operate concurrently.

The most significant advantage of a distributed system is its capacity to grow horizontally by adding more and more nodes to the cluster. Additionally, such systems greatly mitigate the problems of a single point of failure, adding to the system's resilience and fault tolerance.

In any distributed system with a state, we can only provide two of the following three guarantees: Consistency, Availability and Partition Tolerance.

Let's imagine a very simple distributed system. Our system is composed of two servers, Server1 and Server2. Both of these servers are keeping track of the same variable state, whose value is initially state0. Server1 and Server2 can communicate with each other and can also communicate with external clients. Here's what our system looks like:

[Figure: Server1 and Server2, each holding the value state0, connected to each other and to a client]

A client can request to write and read data, and these requests can be handled by any server. When a server receives a request, it performs some computations and then responds to the client. 
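To make the idea concrete, here is a minimal sketch of this toy system in Python. None of the names (Server, state0 and so on) come from a real framework; each server just stores a single value and answers read and write requests.

```python
# A minimal sketch of the two-server system described above.
# Each server holds a single piece of state; a client may send
# its read or write request to either server.

class Server:
    def __init__(self, name, state="state0"):
        self.name = name
        self.state = state

    def write(self, value):
        self.state = value
        return "ack"

    def read(self):
        return self.state


server1 = Server("Server1")
server2 = Server("Server2")

# Without any replication, a write handled by Server1 is invisible to Server2.
server1.write("state1")
print(server1.read())  # state1
print(server2.read())  # state0  <- the two servers already disagree
```

This already hints at the problem the next sections deal with: as soon as the two servers stop sharing updates, clients can observe different values depending on which server they talk to.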

Understanding Consistency

In a consistent system, all nodes see the same data at the same time. If we perform a read operation on a highly consistent system, we are guaranteed to get the most recent data: a read on any node returns the same value, and all users see the same data regardless of which node they hit. It is the responsibility of the node on which a write operation is performed to instantly replicate the change to all other nodes in the system.

It is important to note that when discussing consistency in terms of the CAP theorem, we are talking about linearizability. A linearizable system ensures that each operation in a concurrent system appears to take effect atomically at some point between its invocation and its response, as if there were no concurrent operations. In our two-server system, we need to ensure that a client asking for recently updated data gets the updated value, even if the update was performed by Server 1 and the read request is handled by Server 2.

To achieve this level of consistency, we need some replication functionality where changes made on Server 1 are propagated to Server 2 and vice-versa.

[Figure: a write handled by Server1 is replicated to Server2, so both servers hold the updated value]
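Continuing the toy model, here is a rough sketch of that replication step: the server that accepts the write pushes it to its peer before acknowledging the client, so a subsequent read on either server returns the new value. This is a deliberate simplification; real systems need quorum or consensus protocols to get linearizability right.

```python
# Sketch of synchronous replication between the two toy servers.
# A write is only acknowledged once the peer has applied it as well.

class ReplicatedServer:
    def __init__(self, name, state="state0"):
        self.name = name
        self.state = state
        self.peer = None  # wired up after both servers exist

    def write(self, value):
        self.state = value
        if self.peer is not None:
            self.peer.apply_replicated(value)  # propagate before acknowledging
        return "ack"

    def apply_replicated(self, value):
        self.state = value

    def read(self):
        return self.state


server1 = ReplicatedServer("Server1")
server2 = ReplicatedServer("Server2")
server1.peer, server2.peer = server2, server1

server1.write("state1")
print(server2.read())  # state1 -- a read on either server now sees the latest write
```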

Understanding Availability

In an available system, every request received by a non-failing node must result in a response. The system as a whole must remain functional even if several nodes malfunction. Unlike a consistent system, though, there is no guarantee that the response contains the most recent write.

Understanding Partition Tolerance

Network partitions are an integral part of modern distributed systems. In a partition-tolerant system, the system will continue to function even if there is a break in communication between nodes. The system should not fail, regardless of whether messages are dropped or delayed between nodes. 

Back to our example system, imagine that our servers are deployed across two separate data centres. Backing our service instance in each data centre is a database, and the two databases communicate and try to synchronize data between them. Read and write operations are performed on a local database and we use two-way replication to synchronize the data between the two nodes.

Now let's think about what happens when something fails. Imagine that a shark bites through the cable that connects the two data centres. The connection between the data centres stops working and the synchronization fails. Writes made to Server1 will not propagate to the database behind Server2 and vice versa. There are techniques to ensure that the two databases eventually become consistent again, but let's see what happens in the meantime.

Sacrificing Consistency

Let's assume that we don't shut our services down entirely. If Server1 handles a write request, the database behind Server2 will not get updated. This means that any read request sent to Server2 receives stale data as a response. In other words, our system is still available, since both servers can serve requests and keep running despite the partition, but we have lost consistency; we don't get to keep all three guarantees. This kind of system is called an AP system because it provides Availability and Partition Tolerance.

During this partition, if we keep accepting write requests, then we are accepting that in the future we will need to resynchronize the databases. The longer the partition lasts, the more difficult this resynchronization can become.

In a real system, even if we don't have a network failure between our database nodes, the replication of data is never instantaneous. After all, bandwidth is not infinite and latency is never zero. Systems that cede consistency in favour of availability and partition tolerance are said to be eventually consistent: we expect that at some point in the future all nodes will see the updated data, so we have to live with the fact that users may read stale data in the meantime.
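As an illustration, here is a hedged sketch of that AP behaviour: during a partition each server keeps accepting writes and tags them with a timestamp, and once the link comes back the servers reconcile with a naive last-write-wins rule. Real eventually consistent stores use far more careful conflict-resolution schemes (vector clocks, CRDTs and so on); the names here are purely illustrative.

```python
import time

# Sketch of an AP pair: writes are always accepted locally, and a crude
# last-write-wins merge runs once the partition heals.

class APServer:
    def __init__(self, name):
        self.name = name
        self.state = "state0"
        self.updated_at = 0.0

    def write(self, value):
        self.state = value
        self.updated_at = time.time()
        return "ack"  # always available, even if the peer is unreachable

    def read(self):
        return self.state  # may be stale during a partition

    def reconcile(self, peer):
        # Naive last-write-wins: keep whichever value was written most recently.
        if peer.updated_at > self.updated_at:
            self.state, self.updated_at = peer.state, peer.updated_at
        else:
            peer.state, peer.updated_at = self.state, self.updated_at


server1, server2 = APServer("Server1"), APServer("Server2")

# During the partition both servers still answer, but they disagree.
server1.write("state1")
print(server2.read())  # state0 -- stale, yet the system stayed available

# The partition heals and the servers resynchronize.
server1.reconcile(server2)
print(server2.read())  # state1 -- eventually consistent
```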

Sacrificing Availability

Let's now consider what happens if we want to maintain consistency. To keep consistency, the database nodes need to communicate with each other and make sure that they all hold the same data. But during a partition the nodes cannot communicate, and thus cannot coordinate to ensure consistency. Since the system cannot guarantee consistency while the nodes are unable to communicate, the only option left is to refuse to respond to requests. In other words, the system has to sacrifice availability. This kind of system is called a CP system because it provides Consistency and Partition Tolerance.

Ensuring consistency across multiple nodes is a really hard problem. Imagine that we want to read a record from a local database node. How do we know that the data are up to date? We have to go and ask another node. But we also have to make sure that the record is not updated while the read request completes. In other words, we need a transactional read across multiple database nodes to ensure consistency, and transactional reads require locks.

Distributed systems have to expect failure. Consider our transactional read across a set of consistent nodes: we lock a given record while the read is performed, complete the read, and then try to release the lock on the remote node, only to find that a partition has appeared and we can no longer reach it. Locks are hard enough to implement correctly in a single-process system, and the difficulty multiplies when we try to do it in a distributed environment.
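The CP choice can be sketched in the same toy setting: each server refuses any request it cannot coordinate with its peer, trading availability for consistency. PartitionError is just an illustrative name for "the peer is unreachable".

```python
# Sketch of a CP pair: if a server cannot reach its peer, it refuses
# the request rather than risk giving an inconsistent answer.

class PartitionError(Exception):
    """Raised when the peer cannot be reached."""


class CPServer:
    def __init__(self, name):
        self.name = name
        self.state = "state0"
        self.peer_reachable = True  # flips to False during a partition

    def _require_peer(self):
        if not self.peer_reachable:
            raise PartitionError(f"{self.name}: peer unreachable, refusing request")

    def write(self, value, peer):
        self._require_peer()   # cannot coordinate -> refuse the write
        self.state = value
        peer.state = value     # replicate before acknowledging
        return "ack"

    def read(self):
        self._require_peer()   # cannot verify freshness -> refuse the read
        return self.state


server1, server2 = CPServer("Server1"), CPServer("Server2")
server1.peer_reachable = server2.peer_reachable = False  # the shark strikes

try:
    server1.write("state1", server2)
except PartitionError as err:
    print(err)  # the system stays consistent by becoming unavailable
```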

Sacrificing Partition Tolerance

Up to this point, we have discussed an eventually consistent AP system and a consistent but hard-to-scale CP system. Since we have to choose two out of three, why not choose a CA system? In other words, why not build a linearizable, highly available system that does not tolerate partitions?

Unfortunately, in any distributed system, partitions are bound to happen, meaning that a CA database isn't a very practical choice.

Proof of the CAP Theorem

Now that we've acquainted ourselves with what consistency, availability, and partition tolerance mean in a distributed setting, we can show that a system cannot provide all three simultaneously.

I will give a visual sketch of the proof to avoid the set-theoretic formalities. It is not the formal mathematical proof of the theorem, but the argument still holds.

Assume that we have a system that is consistent, available, and partition tolerant, all at the same time. Our system would look somewhat like the following:

[Figure: the initial system, with Server 1 and Server 2 both holding State 0 and able to communicate]

Now let's cause a partition on the system. This means that Server 1 and Server 2 can't communicate anymore:

[Figure: a network partition separates Server 1 and Server 2]

Next, the client makes a write request to Server 1, which should change Server 1's state from State 0 to State 1. If Server 1 doesn't respond because of the partition, our system is not available, so Server 1 must process the request from the client and change its state to State 1.

[Figure: the client writes to Server 1, which updates its state to State 1 while Server 2 remains at State 0]

Finally, the client makes a read request to Server 2. Again, Server 2 can either not respond because of the partition, making the system unavailable, or process the request to keep the system available. But since the system is partitioned, Server 2 cannot communicate with Server 1 to update its state. The only option left for Server 2 is to respond to the client with the data it has: State 0.

[Figure: the client reads from Server 2, which responds with the stale value State 0]

Server 2 replied to the Client thus keeping the system available, but it responded with stale data, making the system inconsistent.

At the beginning of this section, we assumed that a system that was simultaneously consistent, available and partition tolerant existed. But we have just found an execution path in which that system behaves inconsistently. Thus, no such system can exist.
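If it helps, the same execution path can be replayed as a short script: once we force both servers to stay available during the partition, the read on Server 2 returns stale data, which is exactly the contradiction. This is only an illustration of the argument above, not a formal proof.

```python
# The proof's execution path, replayed on the toy system.

class Node:
    def __init__(self, name):
        self.name = name
        self.state = "State 0"


server1, server2 = Node("Server 1"), Node("Server 2")
partition = True   # step 1: the two servers can no longer communicate

# Step 2: the client writes to Server 1. Refusing would make the system
# unavailable, so Server 1 applies the write locally.
server1.state = "State 1"

# The update cannot cross the partition.
if not partition:
    server2.state = server1.state

# Step 3: the client reads from Server 2. Refusing would again make the
# system unavailable, so Server 2 answers with whatever it has.
print(server2.state)   # "State 0" -- stale data, so the system is not consistent
```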

CAP Theorem and NoSQL Databases

NoSQL databases are ideal for distributed applications. Unlike the traditional, relational SQL databases, which are vertically scalable, NoSQL databases are horizontally scalable and distributed by design.

NoSQL databases are often classified by the two CAP characteristics they support: CP databases and AP databases.

CP Databases - MongoDB

MongoDB is a popular NoSQL database (more accurately a DBMS) that stores data as Binary JSON documents. It's frequently used for big data and real-time applications. By design, MongoDB resolves network partitions by maintaining consistency, while sacrificing availability.

MongoDB is a single-master system. This means that a single primary node handles all write operations. All other nodes in the same replica set replicate the primary node's data operations and apply them to their own data set. By default, clients read only from the primary node, but they can set a read preference to read from secondary nodes instead.

[Figure: a MongoDB replica set with one primary node and several secondary nodes replicating from it]

When the primary node becomes unavailable due to a partition, the secondary nodes elect the node with the most recent changes as the new primary. The system becomes unavailable until the other secondary nodes catch up with the new primary. Because clients can't make any write requests during this election process, the data remain consistent across the entire network.
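For a concrete feel of these knobs, here is a hedged sketch using the official PyMongo driver; the connection string, database and collection names are placeholders. Writing with a majority write concern and reading from the primary keeps the strongly consistent behaviour described above, while opting into a secondary read preference trades freshness for spreading the read load.

```python
from pymongo import MongoClient, ReadPreference
from pymongo.read_concern import ReadConcern
from pymongo.write_concern import WriteConcern

# Placeholder connection string for a three-member replica set.
client = MongoClient("mongodb://node1,node2,node3/?replicaSet=rs0")
db = client.get_database("shop")

# Default behaviour: reads go to the primary, so they observe the latest
# acknowledged write. A majority write concern waits for most members.
orders = db.get_collection(
    "orders",
    write_concern=WriteConcern(w="majority"),
    read_concern=ReadConcern("majority"),
)
orders.insert_one({"order_id": 1, "status": "created"})

# Opt-in: allow reads from secondaries. This spreads the load but may
# return data that has not yet caught up with the primary.
orders_relaxed = db.get_collection(
    "orders",
    read_preference=ReadPreference.SECONDARY_PREFERRED,
)
print(orders_relaxed.find_one({"order_id": 1}))
```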

AP Databases - Apache Cassandra

Apache Cassandra is another popular NoSQL database, which stores data in a wide-column storage model. A wide-column storage model can be thought of as a two-dimensional key-value store. Unlike MongoDB, Cassandra has a masterless architecture and, as a result, has no single point of failure.

A Cassandra cluster is a collection of instances, called nodes, connected in a peer-to-peer share-nothing distributed architecture. This means that there is no master node and every node can perform all database operations and serve client requests. Data are partitioned across nodes based on their partitioning key.

Data also have a replication factor that determines how many copies should be made. The replicas are stored on different nodes.

When a client sends the request, it is first handled by the coordinator node. It is the job of the coordinator to forward the request to the nodes that contain the data for that request and to send the results back to the client.

[Figure: a Cassandra cluster in which the coordinator node forwards a client request to the replica nodes that own the data]

Cassandra's consistency model is a bit more complicated than the one used by MongoDB since Cassandra provides Tunable Consistency. This means that the client can select the level of consistency they require. We can tune the number of nodes that agree on what the latest data is for read requests, let's call this number R. We can also tune the number of nodes that agree on what the latest data is for write requests, let's call this number W. And we can configure the number of nodes in our cluster, let's call this number N.

What we can tune now is how many nodes in the cluster need to agree that the data are valid during an operation. But the more nodes we require to agree, the less available the system becomes. To guarantee that every read sees the latest write, the read and write quorums must overlap, that is, R + W > N. Since this costs availability and latency, we often require the approval of fewer nodes, so that R + W <= N. This can lead to scenarios where the data we get back are stale, which is why Cassandra is considered an eventually consistent database. Rather than maintaining strict consistency, which could really slow things down at that scale, eventual consistency enables high availability, at the cost of every copy of your data not being synced up across all replicas right away.
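Here is a hedged sketch of tunable consistency with the DataStax Python driver; the contact points, keyspace and table are placeholders. The replication factor sets N, and the per-statement consistency level sets how many replicas must answer: QUORUM reads and writes overlap (R + W > N), while ONE favours availability and latency at the risk of stale reads.

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Placeholder contact points for a three-node cluster (N = 3).
cluster = Cluster(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
session = cluster.connect()

# Replication factor 3: every row is stored on three nodes.
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS shop "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}"
)
session.set_keyspace("shop")
session.execute(
    "CREATE TABLE IF NOT EXISTS orders (order_id int PRIMARY KEY, status text)"
)

# QUORUM with N = 3 means W = 2 and R = 2, so R + W > N: the read and write
# quorums overlap and a read sees the latest acknowledged write.
write = SimpleStatement(
    "INSERT INTO orders (order_id, status) VALUES (1, 'created')",
    consistency_level=ConsistencyLevel.QUORUM,
)
read = SimpleStatement(
    "SELECT status FROM orders WHERE order_id = 1",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(write)
print(session.execute(read).one())

# With ConsistencyLevel.ONE (R + W <= N) the request is faster and more
# available, but a read may hit a replica that has not seen the write yet.
```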

PACELC vs CAP Theorem

As we discussed before, we cannot avoid partitions in a distributed system. Therefore, according to the CAP theorem, a distributed system has to choose between consistency and availability. For that reason, ACID (Atomicity, Consistency, Isolation, Durability)-based databases such as MySQL, Oracle and Microsoft SQL Server chose consistency, refusing to respond if they cannot check with their peers. On the other hand, BASE (Basically Available, Soft-state, Eventually consistent)-based databases like Cassandra and Redis chose availability, responding with local data without ensuring that they are the latest.

But the CAP theorem doesn't specify what happens when there is no network partition. The choice between consistency and availability is really only made when a partition or failure occurs; at other times, no trade-off is required. Moreover, these choices can be made many times within the same system, and they may change depending on the operation, or even on the specific data or user involved.

Furthermore, in its original definition, the CAP theorem ignores time delays, although in practice latency and partitioning are deeply connected. In practice, the CAP decision is made during a timeout: the period in which the system must decide whether to keep retrying, preserving consistency, or answer with the data it has, preserving availability.

The PACELC theorem states that in a system that replicates data:

  • If there is a Partition, a distributed system can trade off between Availability and Consistency.
  • Else, when the system is running normally in the absence of partitions, the system can trade off between Latency and Consistency.

[Figure: the PACELC decision tree: during a partition, choose between availability and consistency; otherwise, choose between latency and consistency]

The PACELC theorem was developed to address a key limitation of the CAP theorem: CAP makes no provision for performance or latency. For example, according to the CAP theorem, a database can be considered available even if a query returns a response 30 days later.

Abadi divides system operation into two modes: operation without partitioning and operation with partitioning. Accordingly, the choices for the designer of a distributed system are as follows: in the case of partitioning, one must choose between availability and consistency; otherwise, that is, during normal system operation, one must choose between reduced response time and consistency.

Conclusion

We can now see why the CAP theorem plays a major part in data management for distributed systems. These systems need to be planned out properly to provide the right balance between consistency and availability. Although it doesn't map perfectly onto most modern-day applications, CAP is still one of the most useful results to keep in mind for distributed systems. The PACELC theorem, an extension of the CAP theorem, models the needs of modern distributed applications better, but you still need a solid understanding of the CAP theorem when you design your architecture.

I hope that this post was a good learning experience for you and that you got some useful insights. I would love to hear feedback from you.

Thank you for your time.
