DEV Community

Cover image for Isolation Levels - part III: Main concepts
Franck Pachot
Franck Pachot

Posted on • Edited on

Isolation Levels - part III: Main concepts

To dig into more detail and list the isolation levels and their characteristics, four fundamental concepts must be grasped: SQL transactions, read and write time, optimistic/pessimistic locking and explicit locking.

SQL transactions

These isolation levels, like all ACID properties, pertain to SQL databases in which transactions involve multiple read and write operations to build the transaction outcome. Even a seemingly simple task like inserting a single row into a SQL table entails reading the key to check for duplicates, maybe check some referential integrity constraints, and writing to various locations, including table rows, index entries, and transaction status. It's important to note that discussing serializable transactions is irrelevant for NoSQL databases, which confine transactions to single objects without consistent secondary indexes, foreign keys, joins, savepoints, or rollbacks. Serializable is a characteristic unique to SQL transactions, involving explicit or implicit locking, explicit locking, and serializable errors.

Read and Write time

I introduced isolation levels as the coherency of read and write operations on different states. These states are defined by a point in time, between the beginning of your transaction and the commit time. Depending on the isolation level, the read time may be the start of the transaction (or even earlier), the start of the statement, or the current time of the read itself. The latter can potentially encounter anomalies even within a single query result. To ensure that the result remains consistent, the state that is read, reflecting data as of the read-point, must persist until the end of the statement, cursor, or transaction, depending on the isolation level. In the absence of MVCC (Multi-Version Concurrency Control), the database needs to acquire locks while reading to block concurrent writes. Depending on the isolation level, these locks will be held until the end of the transaction, the cursor, or the completion of the write itself. With MVCC, the database can read from a virtual consistent read snapshot as of the read time without blocking locks, which enhances concurrency.

However, writes cannot merely update this virtual snapshot because they must be merged with the database's current state at commit time. They must write to the current state of the database and ensure that this state remains unaltered from the write-time until the end of the transaction, at the commit-time.

MVCC reads from the past, the transaction, or statement start, which suffices for read-only transactions. However, they cannot write to the future, the commit-time, and require additional mechanisms to lock and wait, at least from the write time to the commit time. In addition, to prevent anomalies, databases must ensure that reads and writes operate on consistent states, as if all read and write times were simultaneous. This is akin to virtually advancing the read time to the future by blocking others, or continuing to read from a past snapshot while detecting anomalies and handling failures.

Optimistic and Pessimistic Locking

To detect conflicts between transactions and prevent anomalies arising from these conflicts, transaction intents must be recorded. These intents may include the operation type (read or write) and its scope (a column, a row, a partition, a table), which are stored in a dedicated structure, which only needs to exist for the duration of the longest transaction. Some databases stores it in memory, with some scalability and availability limits, others, like YugabyteDB, distribute it with data and transactions do not need to be aborted in case of infrastructure failure.

When a conflict is detected, a write operation can either fail, to be retried later, or it can wait for the conflicting transaction to complete before continuing. The fail-and-retry scenario is often referred to as optimistic locking because it assumes that the likelihood of conflicting on the same dataset is low and treats it as an exceptional scenario. The wait-to-continue scenario is often termed pessimistic because conflicts on the same dataset are expected in the normal scenario. For example, the occurrence of two customers simultaneously trying to order the last item in stock is rare, and optimistic locking is used to ensure the fastest throughput for the majority of orders. However, when passengers are checking in for a flight, it may be more appropriate to employ pessimistic locking to assign seats one by one.

Explicit locking and Serializable errors

Some transaction controls are handled automatically by the database, allowing developers to focus on their business logic, while others need to be managed by the application. Ideally, developers should code and test as if there's only one user in the database, and optimistic or pessimistic locking is managed by the database to prevent anomalies. This is the objective of SQL databases and the rationale behind the Serializable isolation level. However, in certain isolation levels, to reduce the overhead of serialization, developers must take additional precautions and explicitly lock a row or table to prevent corruption in race conditions.

When a potential anomaly is detected, some optimistic locking errors, referred to as serializable errors or retryable errors, can be automatically retried by the database, relieving developers from the need to handle them explicitly. However, in specific isolation levels, the operations to retry may extend beyond the current invocation, and the database cannot automatically roll back and restart them, as it lacks awareness of what has been done on the application side. For instance, if an application, during an SQL transaction, invokes a procedure on an external system, such as sending an email, or calling a webservice, only the application can determine how to respond to errors and manage the annulation of the call and retry the whole transaction.

Pessimistic locking and Read Committed aim to avoid raising retryable errors by waiting rather than failing. However, some retryable errors cannot be entirely avoided. When multiple transactions are concurrently reading and writing to the same objects but in a different order, they may end up waiting for each other indefinitely. In such cases, deadlock detection mechanisms will intervene and terminate one of the transactions, requiring the application to retry it. Applications should be prepared to either implement a retry logic for all isolation levels (specifically for multi-statements) or take care to manage the order of SQL statements. Some Object-Relational Mapping (ORM) tools may handle statement ordering to help mitigate these issues.

Isolation levels

With these concepts in mind, we can outline the behavior of each isolation level and assess its impact on anomalies, performance, and the need for additional code. SQL databases should, at a minimum, provide Serializable and Read Committed levels because applications have the flexibility to implement retry logic or use explicit locking. Avoiding one of these options could undermine the scalability of the application.


The next posts will add more explanation for each isolation level. Your comments will help to add more details for specific databases.

Top comments (2)

Collapse
 
bhmortim profile image
Brian Mortimore

great article series. Thank you. This becomes especially challenging in distributed environments where even providing serialized transactions to all distributed data replicas becomes challenging. With Cassandra, we are using Accord to add ACID to Cassandra as a consistent distributed database that can operate at high scale and high write volume. Very curious about your thoughts when applying these principals to a distributed data environment.

Collapse
 
franckpachot profile image
Franck Pachot

Thanks for your feedback. I didn't look at Accord which looks very interesting. I'm working with YugabyteDB which uses Raft. It supports all isolation levels with the PostgreSQL API, and Snapshot Isolation with the Cassandra-like API.