Transactions are hard. Distributed transactions are harder. Distributed transactions over the WAN are final boss hardness.
October 15, 2018
FaunaDB is a global serverless database for ubiquitous, low latency access to app data. It prioritizes data correctness and scale. Fauna is heavily integrated with GraphQL making it a natural extension for JAMstack developers wishing to incorporate a database.
The Fauna Query Language provides many built-in functions that can be used to query and modify a database. User-defined functions (UDFs) can store and run commonly used FaunaDB queries.
FaunaDB is a transactional, temporal, geographically distributed, strongly consistent, secure, multi-tenant, QoS-managed operational database. It’s implemented on the JVM for portability, and it’s relational, but not SQL. Instead, it’s queried via type-safe embedded DSLs, like LINQ.
Welcome to the Jungle (September 26, 2016)
The explosion of data at companies like Google, Facebook, and Amazon in the mid 2000s pushed traditional relational database technology to its breaking limit. To address the difficulties of sharding systems like MySQL and PostgreSQL a new group of databases were created known as NoSQL.
Google’s Bigtable was a distributed storage system implemented on top of the Paxos inspired Chubby. Before talking about Bigtable it's useful to understand Chubby. The Chubby lock service provided coarse-grained locking as well as reliable (low-volume) storage for a loosely-coupled distributed system.
Its interface is much like a distributed file system with advisory locks with a design emphasis on availability and reliability, instead of high performance.
Bigtable was a distributed storage system for managing structured data designed to scale to petabytes of data across thousands of commodity servers. HBase and Hypertable were created as open source alternatives to Bigtable.
Bigtable depended on a cluster management system for scheduling jobs, managing resources on shared machines, dealing with machine failures, and monitoring machine status. The Google SSTable file format stored Bigtable data.
An SSTable provided a persistent, ordered immutable map from keys to values, where both keys and values are arbitrary byte strings.
In the late 2000s many different experimental systems were created that attempted to meet the scaleability requirements of companies that had jumped ship to NoSQL while retaining the features developers had come to expect from traditional relational database management systems like joins and ACID transactions.
The Paxos algorithm, when presented in plain English, is very simple.
Paxos Made Simple (November 1, 2001)
In 2011, Megastore implemented it's own versions of the Paxos algorithm to show how the classic distributed consensus algorithm could be used to achieve consistency at scale.
Spinnaker was designed to run on a large cluster of commodity servers in a single datacenter. Paxos ensured that a data partition in Spinnaker was available for reads and writes as long as a majority of its replicas were alive. It's features included:
- Key-based range partitioning
- 3-way replication
- Transactional get-put API
In 2012, two papers were released that built on the foundational work of Bigtable, Dynamo, Megastore, and Spinnaker by showing how consensus algorithms could be used to build geographically replicated, consistent, ACID compliant, transactional database systems.
Spanner ultimately made a bigger splash on the scene at the time but both have proved to be highly influential in the design of databases in the late 2010s including CockroachDB, YugaByte, Vitess, TiDB, and Aurora.
Before [the Google Spanner paper], it was widely believed and also continually marketed by the NoSQL vendors that distributed ACID transactions were literally impossible.
Software Daily (March 21, 2019)
Up to this point all of these different systems were using custom implementations of Paxos as their consensus algorithm. Paxos was first described in Leslie Lamport's famous(ly complicated) 1998 paper The Part-Time Parliament. It described how a hypothetical parliament made up of voting state machines could use three phase commits to achieve global consistency.
If that last sentence doesn’t make any sense to you then you are not alone. 2013 saw the introduction of Raft in a paper titled In Search of an Understandable Consensus Algorithm. Its author considered the comprehensibility of the algorithm to be a primary concern.
The consensus algorithm manages a replicated log containing state machine commands from clients. The state machines process identical sequences of commands from the logs, so they produce the same outputs. It is called Raft because it is a bunch of logs.
Raft aimed to provide the same benefits of Paxos with a simpler implementation that could be more easily adopted by practical systems.
FaunaDB has taken the insights of Calvin and Raft and combined them with the real world experience of Evan Weaver's time scaling Twitter to create a new base protocol, the FaunaDB Distributed Transaction Protocol. It supports strictly serializable, externally consistent transactions.
In FaunaDB, data is both partitioned and replicated across machines. Each partition contains multiple records (rows), and each record may have many versions associated with it. Each version is stored separately and is annotated with the transaction identifier that wrote that version.