In our office we have a TV with a performance dashboard, so everyone can see possible problems in the system at any time. Amongst other things we show the TTFB on this dashboard. Those two spikes are not what you want to see, when you are just about to leave the office for lunch.
What happened?
We started looking around what’s going on and found out the following in that order:
- the number of workers on the app-server rose dramatically
- but the CPU consumption did not
- the number of connections to the database reached the upper limit
- but the CPU consumption did not rise (in fact it decreased)
- and finally we start to see the following error messages in our monitoring tools:
Deadlock found when trying to get lock; try restarting transaction
- by that time the system was not responding anymore
- we checked the processlist from the database to see what it is doing
- and found loads of
REPLACE INTO
statements and several bigDELETE
statements (running since minutes) on the same table
What is so special to that table?
It’s a cache lookup table. Everytime a page gets stored in our frontend cache, we do store some meta information about that cache entry in that database table, mostly dependencies to object data. And when that object data changes (for example the article stock), we fetch all cache entries depending on that object, flush them from the frontend cache and remove the information about it from the lookup table via a simple DELETE
statement.
New entries in that lookup table are not created via INSERT
, but via a REPLACE INTO
statement. This is because it could happen, that two requests hit the same page, both are a cache miss and both try to create an entry in that lookup table. Easy solution …
The InnoDB gap lock
While digging deeper into that problem with our deadlocks, i found an article from Jervin Real from Percona about the InnoDB gap lock where he states the following:
For a non-
INSERT
write operation where theWHERE
clause does not match any row, I expected there should’ve been no locks to be held by the transaction, but I was wrong
Until the moment i read that, i expected the same. But when you read that article from Jervin and the aging bug report in the MySQL bugzilla about the gap lock, you will understand that this makes perfect sense and you start to think about a REPLACE INTO
statement being the right thing in this situation.
How we solved that problem?
We just replaced that REPLACE INTO
statement with a INSERT ON DUPLICATE KEY UPDATE
statement, which does exactly this. If the primary key is found, it updates that record, if not, the record gets inserted. This solved the problems with the deadlocks in that case.
You think that REPLACE INTO
works exactly that way? Keep on reading.
The lesser known problem with REPLACE INTO
Do you know what REPLACE INTO
really does? Let’s see:
CREATE TABLE a (
id INT(1) UNSIGNED NOT NULL PRIMARY KEY,
data VARCHAR(255) NULL
);
CREATE TABLE b (
id INT(1) UNSIGNED NOT NULL PRIMARY KEY,
fk INT(1) UNSIGNED NOT NULL,
data VARCHAR(255) NULL,
FOREIGN KEY (fk)
REFERENCES a(id)
ON DELETE CASCADE
);
INSERT INTO a VALUES (1, 'test');
INSERT INTO b VALUES (1, 1, 'test');
REPLACE INTO a VALUES (1, 'data');
What do you think will the result of a SELECT * FROM b;
look like?
Exactly, the table is empty. This is because the REPLACE INTO
first deletes the found record (which triggers the delete cascade) and then creates a new one. This might be expected behavior, depending on the situation, but to my understanding this would normally not be the intended behavior and at least this was new to me.
Top comments (0)