Author: Evan Weaver
Date: March 15, 2019
Originally published on the Fauna blog.
It is very difficult to find accurate information about the correctness and isolation levels offered by modern distributed databases and the operational conditions required to achieve them. Developers use different terms for the same thing, the meaning of terms varies or is ambiguous, and sometimes vendors themselves do not actually know.
At Fauna, we care a lot about accurately describing which guarantees different systems provide. This is our effort to centralize a description of which database does what. For consistency’s sake, we will use the terminology from Kyle Kingsbury’s explanation on the Jepsen site. The chart is ranked by the maximum multi-partition isolation level offered.
The data is based on statements about isolation levels from vendor documentation, white papers, and developer commentary, exclusive of aspirational marketing statements. We have tried to be neutral in the characterization of the various systems' architectural properties. Whether the system implementations uphold these guarantees is addressed elsewhere. If you haven't already, please see FaunaDB's own Jepsen results for confirmation that FaunaDB upholds its guarantees.
Before we BEGIN
In discussing transactional isolation, we frequently encounter the "worse is better" argument, which essentially goes:
- This database does what it does
- Implementing better isolation in the database is impossible or has unacceptable tradeoffs
- Implementing better isolation in the application is simple and useful
This argument also goes by "it's not a bug, it's a feature".
The pretense of low maximum isolation levels, eventual consistency, or CRDTs is that application developers are ready and willing to work through every failure and recovery condition of their distributed dataflow. But in practice, moving beyond “works on my machine” correctness testing requires an extraordinary level of investment that product teams simply cannot make.
In my experience, the implications of different isolation levels are very subtle. Pushing the burden to application developers—especially when there are a lot of distinct applications, like in a microservices architecture—is tremendously detrimental to productivity. And although tunable consistency increases flexibility, it cannot be used to paper over an isolation level that is fundamentally too weak to effectively compose.
After all, /dev/null is serializable, but not very useful as a database.
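To make the subtlety concrete, here is a minimal sketch of write skew, an anomaly that snapshot isolation permits and serializability forbids. It is a toy in-memory model, not the behavior of any particular database; the on-call scenario, the function names, and the invariant ("at least one person stays on call") are all assumptions made for illustration.

```python
# Toy model of write skew. Nothing here talks to a real database; the
# names and the on-call invariant are hypothetical.

def run_under_snapshot_isolation(db):
    """Two concurrent transactions, each reading its own snapshot."""
    snapshot_a = dict(db)  # transaction A's view, taken at start
    snapshot_b = dict(db)  # transaction B's view, taken at start
    writes_a, writes_b = {}, {}

    # Each transaction checks the invariant against its own snapshot only.
    if sum(snapshot_a.values()) >= 2:
        writes_a["alice"] = 0  # A takes alice off call
    if sum(snapshot_b.values()) >= 2:
        writes_b["bob"] = 0    # B takes bob off call

    # Snapshot isolation aborts a transaction only on a write-write
    # conflict. These write sets are disjoint, so both commit.
    result = dict(db)
    result.update(writes_a)
    if not (writes_a.keys() & writes_b.keys()):
        result.update(writes_b)
    return result


def run_serially(db):
    """The same two transactions, executed one after the other."""
    result = dict(db)
    if sum(result.values()) >= 2:
        result["alice"] = 0
    # The second transaction sees the first one's commit and backs off.
    if sum(result.values()) >= 2:
        result["bob"] = 0
    return result


on_call = {"alice": 1, "bob": 1}
after_si = run_under_snapshot_isolation(on_call)
after_serial = run_serially(on_call)
print("snapshot isolation:", after_si)
print("serializable:", after_serial)
```

In the snapshot run nobody is left on call, even though each transaction looked correct in isolation. An application on a weaker isolation level has to detect and prevent this case itself, in every code path that touches the invariant.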
Distributed Databases
Distributed databases present a unified topology and do not require operator management of replication, although some, like the Percolator systems, do require management of special nodes.
Replicated Databases
Replicated databases require operator management of primaries and secondaries and the associated replication links. Asynchronous replication can improve availability and scale read capacity, but does not offer any distributed consistency guarantees. Semi-synchronous replication further improves availability, but does not improve distributed isolation.
This is the traditional RDBMS scale-out model.
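As a rough illustration of why asynchronous replication offers no distributed consistency guarantee, the sketch below models a primary that acknowledges writes before a secondary has applied them. The `Primary` and `Secondary` classes are hypothetical stand-ins, not the API of any real replication system.

```python
# Toy model of asynchronous replication lag; all names are hypothetical.

class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered replication log

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it,
        # without waiting for any secondary.
        self.data[key] = value
        self.log.append((key, value))


class Secondary:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def catch_up(self, primary):
        # Apply whatever log entries have not been applied yet.
        for key, value in primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.log)

    def read(self, key):
        return self.data.get(key)


primary = Primary()
secondary = Secondary()

primary.write("balance", 100)
secondary.catch_up(primary)

primary.write("balance", 40)            # acknowledged to the client
stale_read = secondary.read("balance")  # secondary has not caught up yet

secondary.catch_up(primary)
fresh_read = secondary.read("balance")
print(stale_read, fresh_read)
```

Until the secondary catches up, a client reading from it sees the old balance even though the newer write has already been acknowledged. Scaling reads this way means the application has to tolerate that window.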
Conclusion
A good way to think about isolation is in terms of the breadth of potential anomalies. The lower the isolation level, the more types of anomalies can occur, and the harder it is to reason about application behavior both at steady-state and under faults. At Fauna, we encourage you to think critically about whether your current databases really guarantee the level of transactional isolation you need.