When reading about distributed databases, the CAP theorem often comes up: a system can provide only two of the following three properties:
- consistency
- availability
- partition tolerance
It is then usually stated that network partitions are rare but unavoidable, so most databases choose CP or AP. But what would it mean for an application to not be partition tolerant? It seems that giving up partition tolerance would sacrifice either availability (by shutting down when a partition is detected) or consistency (by letting each partition manage its own state). So can a system ever really be A&C and sacrifice only partition tolerance?
Top comments (3)
From the wiki itself.
So a better exercise would be understanding it from the following format.
In a cluster of X nodes, when Y nodes have failed or been removed, disconnected, or isolated from the cluster,
does the system choose to ensure consistency (by crashing) or availability (by serving possibly incorrect data)?
X and Y at times can be highly configurable in certain systems.
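To make the X/Y framing concrete, here is a minimal Python sketch. Everything in it is hypothetical: the names `Mode` and `can_serve` are made up for illustration, and the "strict majority" rule is just one common way a consistency-first system might decide, not how any particular database does it. A real system would typically refuse requests rather than literally crash.

```python
from enum import Enum


class Mode(Enum):
    CP = "prefer consistency"
    AP = "prefer availability"


def can_serve(total_nodes: int, unreachable_nodes: int, mode: Mode) -> bool:
    """Decide whether this side of a partition keeps serving requests.

    total_nodes is X and unreachable_nodes is Y from the framing above.
    """
    reachable = total_nodes - unreachable_nodes
    if mode is Mode.AP:
        # Availability first: answer as long as any node is up,
        # accepting that the data may be stale or divergent.
        return reachable > 0
    # Consistency first (assumed rule): answer only while a strict
    # majority of the cluster is still reachable.
    return reachable > total_nodes // 2


# With X = 5: losing 2 nodes still leaves a majority, losing 3 does not.
print(can_serve(5, 2, Mode.CP))
print(can_serve(5, 3, Mode.CP))
print(can_serve(5, 3, Mode.AP))
```

In systems where X and Y are configurable, the threshold in the last line of `can_serve` is roughly the knob being tuned.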
There is also PACELC, which goes on to refine CAP into the following paraphrase: When Partitioned, a system is either Available or Consistent (not both), else (not partitioned) the system chooses to minimize Latency or provide Consistency but not both.
The idea is that consistency between nodes requires replication, and replication takes extra time, which adds latency to every change.
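As a rough illustration of that latency cost, the sketch below compares a write that waits for every replica with one that acknowledges immediately. The function name `write_latency_ms` and the model itself are assumptions: replicas are contacted in parallel, and a synchronous write waits for the slowest one.

```python
def write_latency_ms(local_ms: float, replica_rtts_ms: list[float],
                     synchronous: bool) -> float:
    """Estimate client-visible write latency under a toy model."""
    if synchronous and replica_rtts_ms:
        # Consistency first: wait until the slowest replica has acknowledged.
        return local_ms + max(replica_rtts_ms)
    # Latency first: acknowledge after the local write and replicate later,
    # so other nodes may briefly serve older data.
    return local_ms


rtts = [5.0, 20.0]  # hypothetical round-trip times to two replicas
print(write_latency_ms(1.0, rtts, synchronous=True))
print(write_latency_ms(1.0, rtts, synchronous=False))
```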
CA is trivially satisfied by not being distributed or by tolerating divergence, although the latter case might show how calling it a "theorem" is overselling it: consistency qua "reads receive the most recent writes" can be read as implying partition-tolerance magic. And there are other instances where the simple three-options-pick-two formulation conceals more ambiguities than may be good for it.
You may find Mark Burgess' thoughts on the matter useful.