As I mentioned earlier in the previous part, the READ COMMITTED isolation level does not guarantee that our read results are repeatable. Non-repeatable read and phantom read are the two phenomena that we would resolve in the next isolation level. It is called REPEATABLE READ.
Repeatable Read
Let me remind you a bit about the example that we used last time for the demonstration.
> describe account;
+---------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+-------+
| owner | varchar(50) | NO | PRI | NULL | |
| balance | int | YES | | NULL | |
+---------+-------------+------+-----+---------+-------+
2 rows in set (0.03 sec)
> select * from account;
+-------+---------+
| owner | balance |
+-------+---------+
| Mary | 210 |
| Tom | 200 |
+-------+---------+
2 rows in set (0.00 sec)
First, we open two connections to our local MySQL server. REPEATABLE READ is the default isolation level of the InnoDB engine. To verify this, we can run a command to check for the value of the @@transaction_isolation variable. In case it has been changed before, we can run the following command to make sure that the isolation level on both sessions is REPEATABLE READ.
:: BOTH TERMINALS
> set session transaction isolation level repeatable read;
Query OK, 0 rows affected (0.00 sec)
> select @@transaction_isolation;
+-------------------------+
| @@transaction_isolation |
+-------------------------+
| REPEATABLE-READ |
+-------------------------+
1 row in set (0.00 sec)
In terminal B, let's re-run the two queries (which cause us some troubles in the READ COMMITTED level).
:: TERMINAL B
> begin;
Query OK, 0 rows affected (0.00 sec)
> select * from account where owner = 'Tom';
+-------+---------+
| owner | balance |
+-------+---------+
| Tom | 200 |
+-------+---------+
1 row in set (0.00 sec)
> select * from account where balance < 220;
+-------+---------+
| owner | balance |
+-------+---------+
| Mary | 210 |
| Tom | 200 |
+-------+---------+
2 rows in set (0.00 sec)
Now let's update Tom's account in terminal A. You might be tempted to also check for the dirty read phenomena.
:: TERMINAL A
> begin;
Query OK, 0 rows affected (0.00 sec)
> update account set balance = balance + 50 where owner = 'Tom';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
:: TERMINAL B
> begin;
Query OK, 0 rows affected (0.00 sec)
> select * from account where owner = 'Tom';
+-------+---------+
| owner | balance |
+-------+---------+
| Tom | 200 |
+-------+---------+
1 row in set (0.00 sec)
> select * from account where balance < 220;
+-------+---------+
| owner | balance |
+-------+---------+
| Mary | 210 |
| Tom | 200 |
+-------+---------+
2 rows in set (0.00 sec)
So far so good, no dirty read. Let's commit transaction A and check for the result of those two queries again (in transaction B).
:: TERMINAL A
> commit;
Query OK, 0 rows affected (0.01 sec)
:: TERMINAL B
> begin;
Query OK, 0 rows affected (0.00 sec)
> select * from account where owner = 'Tom';
+-------+---------+
| owner | balance |
+-------+---------+
| Tom | 200 |
+-------+---------+
1 row in set (0.00 sec)
> select * from account where balance < 220;
+-------+---------+
| owner | balance |
+-------+---------+
| Mary | 210 |
| Tom | 200 |
+-------+---------+
2 rows in set (0.00 sec)
The results are the same as the first time we executed those two queries. We now get rid of non-repeatable read and phantom read.
Notice that if we read the standard definition of the isolation levels, we might see that REPEATABLE READ only guarantees to eliminate non-repeatable phenomena, while allowing phantom read to occur. However, some RDBMS (like MySQL and PostgreSQL) choose to also eliminate phantom read in REPEATABLE READ isolation level.
In short, with MySQL, REPEATABLE READ guarantee that our reads are consistent. However, there are a couple of things that we need to notice here.
Consistent Read
First, I want to emphasize that "our reads are consistent". It means that when we read the row for the first time in our transaction, the DBMS will take a snapshot of that row. Then, as long as we didn't make any change in that transaction, later read results of that row will stay the same as that snapshot.
However, what if we make some changes in that transaction? Let's say we subtract $50 from Mary's balance in transaction B.
:: TERMINAL B
mysql> update account set balance = balance - 50 where owner = 'Mary';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from account where balance < 220;
+-------+---------+
| owner | balance |
+-------+---------+
| Mary | 160 |
| Tom | 200 |
+-------+---------+
2 rows in set (0.00 sec)
As we can see, Mary's balance changed to $160, while Tom's balance remains at $200. We can see one interesting property of this isolation level: the SELECT statement might return a set of rows that never exists in the database.
Before transaction A, those two balance values (on-disk) should be ($210, $200). After transaction A, it should be ($210, $250). And after the update in transaction B, it should be ($160, $250). The result ($160, $200) that we see is just a combination of the real Mary's balance and the snapshot of Tom's balance.
Serialization Anomaly
So we can get the feeling of what is called "consistent read" in REPEATABLE READ. Let me ask you one more question. If we increase Tom's balance by $50 in transaction B, what would be Tom's balance after we run the SELECT statement again?
Would it be $250? Or would it be $300?
:: TERMINAL B
mysql> update account set balance = balance + 50 where owner = 'Tom';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from account where owner = 'Tom';
+-------+---------+
| owner | balance |
+-------+---------+
| Tom | 300 |
+-------+---------+
3 rows in set (0.00 sec)
It was $300. So the balance value that got increased is the actual balance value that resides on the disk. Let me summarize what's happening here:
- In transaction B, we read Tom's balance and know that he has $200.
- Then we increase his balance by $50, and when we read his balance again, he suddenly has $300!!!
Our transaction somehow still gets interfered with by other transactions.
And if you don't see that it's big trouble. Let's think about this scenario (I changed the number to emphasize how bad it can be):
- I want to subtract $50 from Tom's account if he has enough money
- I open a transaction, run a SELECT statement and realize that he has $100
- At this very moment, someone opens a transaction, check and see that Tom has $100, then he subtracts $60 from Tom's balance and commits immediately
- Since I've checked that Tom got $100, I proceed to subtract $50 from this account
- Now, his on-disk balance value is
-$50(update: -$10), which is an inconsistent state
This is a serialization anomaly. It means that transactions A & B interfere and result in an anomaly state (a state that cannot be achieved if we run A before B or B before A).
This is also the last phenomenon that we need to solve. As long as we can guarantee that there is no serialization anomaly, we can confidently say that our transactions are isolated.
The solution turns out to be very familiar: we need to apply a certain kind of locking mechanism to our read/write operations. Anyway, we will discuss the solution more in the next post, along with the last isolation level.
Top comments (3)
Now, his on-disk balance value is -$50, which is an inconsistent state
It should be $10.
100-60 = 40
40 -50 = -10
Thank you both for the correction!