
Jean Jung

Why you simply don't drop a huge InnoDB table in production...

Context

Sometimes you end up with large unused tables in production and think it would be a good idea to reclaim the space they are wasting. DON'T ever do that. At least, not immediately. This is exactly what I tried to do a few days ago with a 624GB table, and I discovered it can put you offline even if no one else is using the table.
It was part of an improvement package we created to move non-transactional data, i.e. some logs and non-critical info tables, out of our transactional database. The tables had already been recreated on separate databases and their contents copied over. After verifying that no queries were touching those tables, we decided to drop them. When I issued the first DROP, it got stuck. I immediately started to look at what was going on: "Is that the correct table?", "Did it deadlock?" Nothing. And then:

[2020-03-12 10:57:04] completed in 58 s 684 ms

DROP performance impact

Image 1: DROP impact (y-axis is in milliseconds instead of seconds).

Our alerts suddenly started to blip, showing requests that had exceeded the execution-time SLA. I knew it was my mistake and immediately informed everyone I possibly could. About two minutes later the glitch was gone, our clients were advised, and I started to dig: "How the hell did that drop cause the database to stall?".
One of the first blog posts I opened showed me the reasons: Slow DROP TABLE. That post opened my mind about how critical a drop statement is. The post states that drops can be slow on ext filesystems, but I am not sure what filesystem our production server uses, since it is AWS RDS and, for the same reason, it doesn't matter, as I am not able to change it anyway. It is probably ext3 or ext4, BTW. According to the author, the most expensive part of a DROP statement is the removal of huge data files, and there is no viable solution other than changing the server filesystem.

Understanding what happened

InnoDB is the most used storage engine in MySQL databases all over the world. It has a not so simple, yet not too complex, architecture that provides a reliable transactional storage engine. To make concurrent writes safe, InnoDB organizes data in such a way that deleting rows does not immediately free space: the space occupied by a row is not released right after you delete it; instead, a delete mark is placed in the row header so that the row is ignored by subsequent reads. It makes no difference whether the table you are about to drop currently holds data or not, if it originally contained tons of rows. The only way to reclaim that unused space is via administration statements like OPTIMIZE TABLE (or the equivalent ALTER TABLE ... FORCE), which rebuild the table and end up shrinking the data and index files.
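
For illustration, you can see how much space a table occupies and how much of it InnoDB already reports as reclaimable by querying information_schema; my_schema and target_table below are placeholders for your own names, and DATA_FREE is only a rough estimate:

SELECT table_name,
       ROUND(data_length  / 1024 / 1024 / 1024, 2) AS data_gb,
       ROUND(index_length / 1024 / 1024 / 1024, 2) AS index_gb,
       ROUND(data_free    / 1024 / 1024 / 1024, 2) AS reclaimable_gb  -- estimate only
FROM information_schema.TABLES
WHERE table_schema = 'my_schema'
  AND table_name = 'target_table';
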
A DROP statement in MySQL grabs the LOCK_open mutex, causing almost all other threads to hang until the statement completes, mainly because the table needs to be removed from the shared Table Definition Cache (TDC) and this is the mutex used to serialize TDC updates. In addition to deleting the data files and removing the table from the database dictionary and from the TDC, InnoDB needs to get rid of any of the table's pages that may still be in the well-known InnoDB buffer pool. The larger the buffer pool and the data files, the longer this operation takes. In this case, the data file was 624GB and the buffer pool about 93GB, split into 8 instances of ~12GB each.
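
If you want a feel for how much of that work your own server would have to do, you can check the buffer pool configuration and, carefully, how many pages of the table are currently cached. target_table below is a placeholder, and keep in mind that querying INNODB_BUFFER_PAGE itself walks the buffer pool, so it is best avoided on a busy production instance:

-- Buffer pool size and number of instances
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_buffer_pool_instances';

-- Approximate number of this table's pages sitting in the buffer pool
SELECT COUNT(*) AS cached_pages
FROM information_schema.INNODB_BUFFER_PAGE
WHERE table_name LIKE '%target_table%';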

InnoDB Architecture Image

Image 2: InnoDB storage schema.

There is one thing that could save you if you only want to drop the table and do not care about freeing the reserved space: if innodb_file_per_table was disabled when the table was created, the table lives in the shared system tablespace, so MySQL has no huge per-table data file to delete, which can speed up the process IF AND ONLY IF your buffer pool isn't that big. Otherwise you have no option other than addressing the table size before dropping it.
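
A quick way to check which situation you are in (a sketch; the tablespace view is named INNODB_SYS_TABLESPACES on MySQL 5.7 and INNODB_TABLESPACES on 8.0, and target_table is a placeholder):

-- Whether new InnoDB tables get their own .ibd data file
SELECT @@innodb_file_per_table;

-- Does this table have its own tablespace file, and how big is it?
SELECT name, ROUND(file_size / 1024 / 1024 / 1024, 2) AS size_gb
FROM information_schema.INNODB_SYS_TABLESPACES
WHERE name LIKE '%target_table%';
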
That wasn't my case, so I needed to dig further, and then I confirmed that a TRUNCATE TABLE operation is pretty much a DROP + RECREATE, holding the lock during the whole process, so using it before dropping the table would lead to the same result.
The OPTIMIZE statement also locks the LOCK_open mutex, but for a shorter period since it is far less critical than dropping the table. From the MySQL documentation:

"As of MySQL 5.6.17, OPTIMIZE TABLE uses online DDL for regular and partitioned InnoDB tables, which reduces downtime for concurrent DML operations. The table rebuild triggered by OPTIMIZE TABLE and performed under the cover by ALTER TABLE ... FORCE is completed in place. An exclusive table lock is only taken briefly during the prepare phase and the commit phase of the operation. During the prepare phase, metadata is updated and an intermediate table is created. During the commit phase, table metadata changes are committed."

Thinking about a solution

Summing it all up, during those 58 seconds the DROP thread held LOCK_open, preventing any other thread from getting read or write access to the shared TDC. During this time, the thread was: removing any data related to the table from the buffer pool, deleting the index files, deleting the data file, deleting the .frm table structure file, removing the table definition from the database dictionary, removing the table definition from the shared cache, and then releasing the lock.
There are two operations that, depending on the size of the file, will be extremely slow: scanning the buffer pool and deleting the file itself. So how do we fix the file size if we cannot simply drop or truncate the table? The answer can be DELETE FROM + OPTIMIZE TABLE + DROP TABLE:

-- Please READ IT ALL before issuing any statement in your production database
DELETE FROM target_table;   -- hold on and check the notes about DELETE below
OPTIMIZE TABLE target_table;
DROP TABLE target_table;

The OPTIMIZE statement reorganizes the index structures and the data stored in the data file to optimize storage size, meaning it defragments the data and reclaims the unused space, shrinking the file. Although that is a key part of this possible solution, the most important thing here is the way you clean up the table before optimizing and dropping it. You cannot simply TRUNCATE it, otherwise you will see the same effect caused by a DROP operation. The best way is to delete all rows in the table; although it is also the slowest way, it is the only online way.
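
One way to keep that DELETE manageable, not spelled out in the steps above, is to remove rows in bounded batches so a single huge transaction does not blow up the undo log or replication lag. The batch size and the id primary key below are assumptions; repeat the statement (for example from a small script) until it affects zero rows:

-- Delete in bounded batches instead of one huge transaction
DELETE FROM target_table
ORDER BY id
LIMIT 10000;

-- 0 rows affected means the table is empty and ready for OPTIMIZE + DROP
SELECT ROW_COUNT();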

Testing the solution

During some tests in a separate environment we could confirm that the OPTIMIZE TABLE operation is done online, and we were very comfortable with it, so we planned a new maintenance window and chose a 1.5TB empty table to test it in production. Surprisingly, we observed the same scenario: a lot of processes with the Opening tables status in the PROCESSLIST.
I immediately killed the query, but our server was stuck for approximately another 50 seconds.
The OPTIMIZE TABLE process completed anyway, since it was already at the commit phase when the thread was killed.
The occupied space was freed, but there was no way we could issue a DROP after that. "What the hell is going on again?"
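
For the record, the symptom to watch for here was a pile-up of sessions stuck in the Opening tables state; a simple way to spot it while the DDL runs, assuming you can query information_schema.PROCESSLIST:

-- Count sessions waiting on table definitions during the DDL
SELECT state, COUNT(*) AS threads
FROM information_schema.PROCESSLIST
WHERE state = 'Opening tables'
GROUP BY state;
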
Another round of deep research, and a colleague found this GitHub thread. It states that there is practically no way to avoid a complete database stall during a DROP / OPTIMIZE statement in a high-concurrency environment. Their tests showed that the impact can be reduced if the table is empty, but the real truth is that this is currently an open bug in MySQL and it affects all versions. There isn't a solution yet, and there is no workaround.
In conclusion, if you need to DROP a huge InnoDB table, you better be prepared for some downtime.

Top comments (2)

Tom Wu

First, I will drop all indexes one by one. They could be bigger than the real data file.

Jean Jung

Please let us know, did it help?