“We need to build this for scale!” yells your manager in a fury. You think back to a simpler time: when your application only had a few users, your transactions executed sequentially, and you didn’t need to deal with things like hot locks and deadlocks. Time to hire a DBA.
Running a database would be easy if your app only had one user, or if your users did one thing after another, never concurrently and never in contention. In reality though, you want to get more people using what you’ve built – your business needs to grow. Increased user volume tends to mean a lot of concurrent transactions, and all those requests bring us into the realm of database locking and isolation, a topic with little publicly available information (until you know what to search for!) and an almost mystic air to it. This post will hopefully make things simpler.
Why database locking and isolation exist
A modern web application might need to handle hundreds of thousands of transactions daily, and the biggest ones deal with orders of magnitude more than that. That kind of scale can create a bunch of problems with data integrity, starting with the three big read phenomena defined by the SQL standard:
Dirty Reads
If a transaction is in the middle of updating some data and hasn’t committed yet, and another transaction is allowed to read that uncommitted data, that’s called a dirty read, and could lead to your app showing incorrect data that got rolled back.
An example of a dirty read could involve a transaction that invalidates login tokens when a user changes their password. If a second transaction reads the token after the first transaction has marked it invalid, but before the first transaction commits, that second read is dirty: the change it saw might still get rolled back.
In terms of actual SQL, here's what a dirty read might look like:
### Transaction 1 ###
SELECT user_login_token_id
FROM tokens
WHERE ...

UPDATE tokens
SET token_status = 'INVALID'
WHERE ...

### Transaction 2 ###
SELECT user_login_token_id
FROM tokens
WHERE ...
If the second SELECT statement were allowed to read the user_login_token_id after the first transaction's UPDATE but before its COMMIT, the read would be dirty: it's reading uncommitted data that could still be rolled back.
Non-Repeatable Reads
If you’ve got two consecutive reads in one transaction with a concurrent, committed update in between, those reads are going to show different results even though they’re part of the same transaction.
An example might be two writers working on a blog. Our first user starts a transaction that reads a post’s title, writes to the post, and then reads that post’s title again. If a second user changes that post’s title in the middle of the first user’s transaction, the first user is going to see different values for the title across the two reads; or in other words, a non-repeatable read.
Here's what a non-repeatable read might look like in SQL:
### Transaction 1 ###
SELECT post_title
FROM posts

SELECT post_title,
       post_content
FROM posts

### Transaction 2 ###
UPDATE posts
SET post_title = 'something_new'
WHERE ...
Phantom Reads
If a transaction reads a set of rows, and a concurrent transaction inserts or deletes rows that would have matched that read, running the same read again returns a different set of rows: that’s a phantom read.
Let’s use the same example as a non-repeatable read: if our second user adds content in between our first user’s two reads, the first read will be missing data that appears in the second read (this is actually really similar to a non-repeatable read, which is why the same example works).
In SQL, a phantom read might look like this:
### Transaction 1 ###
SELECT post_title
FROM posts

SELECT post_title,
       post_content
FROM posts

### Transaction 2 ###
INSERT INTO posts (post_title, post_content)
VALUES ('something_new', ...)
To avoid all of this nastiness, most SQL databases follow a set of principles called ACID that prioritizes transactional integrity. We wrote all about ACID and concurrency in databases here, but the TL;DR is that databases use locks that prevent data from being read or changed while a transaction is making use of it.
How database locking works
So you’ve got your database — let’s say it’s a Postgres database — up and running, and you start a transaction. How does locking actually happen? Let’s take a look at two concurrent transactions and see how locking impacts them:
- Transaction 1 (T1) wants to read a user’s password and change it to a new one (SELECT and UPDATE)
- Transaction 2 (T2) wants to read that user’s password (SELECT)
Think of the leftmost column as a time axis. When a transaction acquires a lock, other transactions basically won’t be able to interact with the locked data. Here’s how these concurrent transactions will play together:
Time | Transaction 1 | Transaction 2
---- | ------------- | -------------
1 | Selects row #1, and acquires a lock on row #1 |
2 | | Tries to select row #1, but is blocked by T1’s lock
3 | Updates row #1 and commits transaction |
4 | | Selects row #1
Notice that Transaction 1 is made up of two queries (a SELECT and an UPDATE), and even though the lock is acquired on that first query, it remains until the end of the entire transaction, when the transaction gets committed.
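To make the mechanics concrete, here's roughly what T1 could look like in Postgres. The users table and its columns are made up for illustration, and we use SELECT ... FOR UPDATE because in Postgres a plain SELECT doesn't block anyone; the FOR UPDATE clause is what actually takes the row-level lock:

BEGIN;

-- takes a row-level lock on the matching row; other writers (and other
-- FOR UPDATE readers) will wait on it until this transaction finishes
SELECT password
FROM users
WHERE id = 1
FOR UPDATE;

UPDATE users
SET password = 'new_password_hash'
WHERE id = 1;

COMMIT; -- the row lock is released here, not after the SELECT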
The most common lock is at the row level, like in our example; but locks are maintained in database memory, so having too many of them active at any point can get expensive. That means if your DBMS (database management system) is running low on memory, it might escalate a lock to a higher level to free up space. For example, if one database table has 40 different locks active on 40 different rows, the DBMS might escalate them to a single table-level lock so it can continue blocking while using less memory.
Locks can exist across larger sections of your data, and even on an entire database. System level updates might acquire a database lock, which would block all transactions on any part of a database. That kind of lock is pretty uncommon, but table level locks are used pretty often (like our escalation example above), and page or file locks (this means different things across different database setups) are also not infrequent.
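You can also take a table-level lock on purpose, say ahead of a bulk maintenance job. Here's a small sketch in Postgres, against our tokens table from earlier:

BEGIN;

-- blocks all concurrent writes to the table until this transaction ends;
-- plain reads are still allowed in EXCLUSIVE mode
LOCK TABLE tokens IN EXCLUSIVE MODE;

-- ... do the bulk work here ...

COMMIT;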
Levels of database isolation
We’ve been assuming that a lock means that no other transactions can use the locked data at all. But that’s not entirely true; in fact, if it were, your database would basically never work at even low levels of concurrency. Databases deal in different isolation levels to help avoid lock craziness, and there are four major ones:
READ_UNCOMMITTED
This is the lowest level of isolation, and it does almost nothing. It means that transactions can read data being worked with by other transactions, even if the changes aren’t committed yet. READ_UNCOMMITTED doesn’t avoid any of the three read problems that we outlined above. Let’s go back to our two-transaction example and see what would happen:
Time | Transaction 1 | Transaction 2
---- | ------------- | -------------
1 | Selects row #1 |
2 | | Selects row #1 (even though it’s currently being used by T1)
3 | Updates row #1 and commits transaction |
Transaction 1 never acquires a lock when it starts, so Transaction 2 is able to slide in there and execute its SELECT query. In this example, things turned out fine: but if T2 had run its SELECT after T1's update, and T1 then got rolled back (aborted), we’d have a dirty read on our hands.
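If you want to try this level out, MySQL is the easy place to do it; Postgres accepts the READ UNCOMMITTED syntax but quietly treats it as READ COMMITTED, so you can't actually produce a dirty read there. A quick sketch of what T2 could run, using our tokens table from earlier:

-- Transaction 2 (MySQL): opt into dirty reads for this session
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- this SELECT can now see T1's uncommitted change to token_status
SELECT user_login_token_id, token_status
FROM tokens
WHERE ...;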
READ_COMMITTED
The next level of isolation is READ_COMMITTED, which adds a little locking into the equation to avoid dirty reads. In READ_COMMITTED, transactions can only read data once writes have been committed. Let’s use our two transactions, but change up the order a bit: T2 is going to read data after T1 has written to it, but then T1 gets rolled back (for some reason).
Time | Transaction 1 | Transaction 2
---- | ------------- | -------------
1 | Selects row #1 |
2 | Updates row #1, acquires lock |
3 | | Tries to select row #1, but is blocked by T1’s lock
4 | Rolls back transaction |
5 | | Selects row #1
READ_COMMITTED helps avoid a dirty read here: if T2 had been allowed to read row #1 at time 3, that read would have been invalid, because T1 ended up getting rolled back and the data T2 read was never committed. Because of the lock acquired at time 2 (thanks, READ_COMMITTED!), everything works smoothly and T2 waits to execute its SELECT query.
REPEATABLE_READ
One of the most popular isolation levels is REPEATABLE_READ (it's the default in MySQL's InnoDB), and (surprise!) it helps avoid non-repeatable reads. This isolation level holds locks for the duration of the entire transaction, including locks acquired by SELECT queries, instead of releasing them when each statement finishes, which adds an additional layer of isolation.
If you recall from above, a non-repeatable read happens when one transaction has two SELECT queries, and another transaction is able to update a row in between in a way that makes those two SELECT queries show different results. REPEATABLE_READ isolation avoids that by having the first transaction hold the lock it acquires on its first SELECT query:
Time | Transaction 1 | Transaction 2
---- | ------------- | -------------
1 | Selects row #1, acquires lock on row #1 |
2 | | Tries to update row #1, but is blocked by T1’s lock
3 | Selects row #1 again, commits transaction |
4 | | Updates row #1
If you’re really clever, you might notice that REPEATABLE_READ still doesn’t solve for phantom reads: Transaction 1 only acquired a lock on the row it was working with (row #1), so Transaction 2 is still able to insert a new row that might show up in T1’s second SELECT query. The only way to avoid that is with a lock that covers more than a single row, which brings us to our next level of isolation.
SERIALIZABLE
This is the highest possible level of isolation, and it's a bit difficult to define: in practice, SERIALIZABLE means that from the user’s perspective, transactions appear as if they’re executing sequentially, not concurrently. In other words, isolation is so strict that transactions will only execute concurrently if the end result is the same as if they had executed sequentially.
SERIALIZABLE also uses range locks to avoid phantom reads. Range locks are a mechanism that sits in between locking a row and locking a table: if you’re running a SELECT query with a WHERE clause, a range lock covers the range of values your condition matches, including the gaps where new matching rows could be inserted, so nothing can sneak into your result set mid-transaction.
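As a sketch, here's how T1 could run at this level in Postgres. One caveat: Postgres implements SERIALIZABLE with predicate locks and commit-time checks rather than blocking range locks, so instead of waiting, one of two conflicting transactions gets aborted with a serialization error and needs to be retried:

BEGIN ISOLATION LEVEL SERIALIZABLE;

SELECT post_title
FROM posts;

-- if a concurrent transaction commits an INSERT into posts in between,
-- one of the two transactions will fail with a serialization error
SELECT post_title,
       post_content
FROM posts;

COMMIT;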
Wikipedia has a pretty great summary chart for wrapping up how these isolation levels help avoid the SQL standard read problems:

Isolation level | Dirty reads | Non-repeatable reads | Phantom reads
--------------- | ----------- | -------------------- | -------------
READ_UNCOMMITTED | possible | possible | possible
READ_COMMITTED | prevented | possible | possible
REPEATABLE_READ | prevented | prevented | possible
SERIALIZABLE | prevented | prevented | prevented
You can think of isolation levels as lock settings: they determine when locks get acquired and released, and how aggressive that process is. Defaults vary by database: MySQL's InnoDB ships with REPEATABLE_READ, while Postgres and SQL Server default to READ_COMMITTED, and it's a setting you can change if you need to (even on a per-transaction level).
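In Postgres, for example, you can set the level for a single transaction right after opening it:

BEGIN;
-- applies only to this transaction, and must run before its first query
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

SELECT post_title
FROM posts;

COMMIT;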
Problems you’ll run into with database locks
If you’ve been following along, you’ve probably gathered that locks solve a lot of problems, but they also block data access, and that might not always work out well for your application’s users. There are three big issues to worry about:
Lock Contention
If a bunch of user sessions all need the same lock concurrently, they’ll queue up behind each other instead of retrieving the data they need. A good example here is anything that gets incremented on activity, like a total monthly orders count in a metrics table. If every user order needs to increment that row, the constant locking is going to block your user requests. The cool kids also call this one a Hot Lock.
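In SQL terms, the hot lock could look something like this (the metrics table and its columns are hypothetical):

-- every order transaction runs this, so every order queues up
-- behind the row-level lock on this one counter row
UPDATE metrics
SET total_monthly_orders = total_monthly_orders + 1
WHERE month = '2024-01';

A common way out is to stop incrementing a single counter in the hot path: insert one row per order instead, and aggregate the count later.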
Long Term Blocking
Long term lock issues happen when you have some sort of transaction or job that requires a lock for a long period of time. If you’re running a batch job that needs locking access to a table or row for a while (think: 15 minutes), and that table or row is also heavily used by smaller transactions, they’re going to get stuck waiting while the job finishes.
Database Deadlocks
A deadlock happens when two transactions each hold a lock that the other one needs, so both end up waiting for the other to commit, and neither ever can. This might sound kind of funky, so let’s use our classic two-transaction example and see what can go wrong:
Time | Transaction 1 | Transaction 2
---- | ------------- | -------------
1 | Selects row #1, acquires lock on row #1 |
2 | | Selects row #2, acquires lock on row #2
3 | Tries to select row #2, blocked by T2’s lock |
4 | | Tries to select row #1, blocked by T1’s lock
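In Postgres terms, the two sessions could look like the sketch below (an accounts table is assumed here for illustration, and FOR UPDATE makes the SELECTs take row locks). Within about a second, Postgres’s deadlock detector notices the cycle and aborts one of the transactions:

-- Session 1
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;  -- time 1: locks row #1

-- Session 2
BEGIN;
SELECT * FROM accounts WHERE id = 2 FOR UPDATE;  -- time 2: locks row #2

-- Session 1
SELECT * FROM accounts WHERE id = 2 FOR UPDATE;  -- time 3: waits on T2

-- Session 2
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;  -- time 4: deadlock!
-- ERROR:  deadlock detected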
Your DBMS will usually resolve this stalemate by rolling back whichever transaction is easiest to roll back, but this is also a setting you can tweak when configuring your database. Methods and Tools wrote a great paper about this entire “locking issues” topic and how it impacts QA if you want to go deeper.
If you’re building your app on top of a SQL database, check out Retool! Retool saves you a bunch of time and hassle by letting you drag and drop components like tables, search boxes, and buttons, and connect them to SQL or JavaScript on the backend. Retool also protects you against bad writes (no accidental DROP TABLE;) through a GUI-based write interface.