For applications that demand absolute correctness of data, Aerospike offers the Strong Consistency (SC) mode that guarantees no stale or dirty data is read and no committed data is lost. Aerospike's strong consistency support has been independently confirmed through Jepsen testing.
Developers building such applications should follow the following Twelve Do's of Consistency.
The scope of a transaction in Aerospike is a single request and a single record. In other words, an atomic update can only be performed on a single record. Therefore model your data such that data that must be updated in a transaction (atomically) is kept in a single record. Data modeling techniques like embedding, linking, and denormalization can be used to achieve this goal.
Per the CAP theorem, the system must make a choice between Availability and Consistency if it continues to function during a network partition. Aerospike offers both choices. A namespace (equivalent to a database or schema) in a cluster can be configured in AP (choosing Availability over Consistency) or SC (Strong Consistency, choosing Consistency over Availability) mode. All writes in SC mode are serialized and synchronously replicated to all replicas. ensuring one version and immediate consistency.
In this pattern, the generation comparison check is included in the write policy. A record's generation is its version, and this check preserves validity of a write that is dependent on a previous read. The "Check-And-Set" (CAS) equality check with read generation would fail raising generation-error if another write has incremented the generation in the meanwhile. In which case, the entire Read-Modify-Write pattern must be retried.
Uncertainty about a transaction's outcome can arise due to client, connection, and server failures. System load can lead to incomplete replication sequence before the request times out with "in-doubt" status. There is no transaction handle for the application to use to probe the status in this case. It must therefore tag a record with a unique id as part of the transaction, which it can use later to check if the transaction succeeded or failed.
5. Achieve multi-operation atomicity and only-once effect through Operate, predicate expressions, and various policies.
The Aerospike operation Operate allows multiple operations to be performed atomically on a single record. It can be combined with various policies that enable conditional execution to achieve only-once effect. Examples include predicate expressions in operate policy, insertion in map with create-only write mode, insertion in list with add-unique write flag, and so on.
An only-once write (enabled by the mechanisms described in 5 above) becomes safe to just retry on failure. A prior success will result in an "already exists" failure which indicates prior successful execution of the transaction.
7. Record the details for subsequent handling in a batch or manual process if a write's outcome cannot be resolved.
During a long duration cluster split event, the client may be unable to resolve a transaction's outcome. The client can timeout after retries but should record the details needed for external resolution such as the record key, transaction id, and write details.
There are four SC read modes to choose from: Linearizable, Session, Allow-replica, and Allow-unavailable. They all guarantee no data loss and no dirty reads, but differ in "no stale" guarantees as well as performance. A Linearizable read ensures the latest version across all clients, but it involves checking with all replicas and therefore is most expensive. Also, without additional external synchronization mechanism among clients, the version is not guaranteed to be the latest when it reaches the client. A Session read is faster as it directly reads from the master replica, and therefore recommended. In a multi-site cluster, local reads are much faster than remote reads. Since the master replica may reside at another site, the Allow-replica mode offers much better performance with no-stale guarantee practically equivalent to the Session mode, and therefore is recommended in multi-site clusters. There are no staleness guarantees with Allow-unavailable mode, but the application may judiciously leverage it when it is aware of stale data but can still derive positive value from it.
The max-retries value indicates the number of retries that the client library will perform automatically in case of a failure. Because the transaction logic is sensitive to the type of failure, a transaction failure must be handled in the application, not automatically by the client library. Therefore use the default value to turn off the automatic retries in the client library.
10. For maximum durability, commit each write to the disk on a per-transaction basis using commit-to-device setting.
With this setting, a replica flushes the write buffer to disk before acknowledging back to the master. The application on a successful write operation is certain that the update is secure on the disk at each replica, thus achieving maximum possible durability. Be aware of the performance implications of flushing each write to disk (unless using data in PMEM), and balance it with the desired durability.
11. For exactly-once multi-record (non-atomic) updates use the pattern: record atomically - post at-least-once - process only-once.
Aerospike does not support multi-record transactions. To implement exactly-once semantics for multi-record updates, record the event atomically in the first record as part of the update. Implement a process to collect the recorded event and post it for processing in the second record. At-least-once semantics can be achieved by removing the event only after successful hand-off to or execution of the subsequent step which would update another record with only-once semantics. This sequence achieves exactly-once execution of multi-record updates. The pattern is explored further in this post.
Before a write request is sent to the server, record the intent so that it can be read and retried if necessary during crash recovery. The intent is removed on successful execution as part of normal processing. During recovery, the intent list is read and retried.