DEV Community

Nobu
Nobu

Posted on

Deleted a table in production and lost four more table data with ON DELETE CASCADE

What happened

I was asked to investigate data in the production environment. Before the investigation, I was using MySQL Workbench to delete unnecessary data on the production DB.

When investigating the data in the production DB, I mistakenly executed delete instead of select in SQL in MySQL Workbench, and deleted an entire table.

DELETE FROM posts;
Enter fullscreen mode Exit fullscreen mode

ON DELETE CASCADE was set for the parent table, so four more table data were lost one after another.

Image description

If you would like to know more about ON DELETE CASCADE.
https://www.geeksforgeeks.org/mysql-on-delete-cascade-constraint/

Table structure (table names are given as examples and may differ slightly from the actual ones)
Screenshot 2020-11-22 12.12.55.png

Correct settings

When a user is deleted from the users table, the records in the child tables linked to that user_id are also deleted.

Image description

comments table

CONSTRAINT `comments_ibfk_1 ` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE 
Enter fullscreen mode Exit fullscreen mode

likes table

CONSTRAINT `likes_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE 
Enter fullscreen mode Exit fullscreen mode

points table

CONSTRAINT `points_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE
Enter fullscreen mode Exit fullscreen mode

posts table

CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE
Enter fullscreen mode Exit fullscreen mode

Incorrect setting

When you delete an entire child table posts table of the users table, the users records linked to post_id are also deleted.

Furthermore, the child table records linked to that user_id are also deleted in a chain reaction.
Image description

users table

CONSTRAINT `users_ibfk _1` FOREIGN KEY (`comment_id`) REFERENCES `comments` (`id`) ON DELETE CASCADE 
CONSTRAINT `users_ibfk_2` FOREIGN KEY (`post_id`) REFERENCES `posts` (`id`) ON DELETE CASCADE 
CONSTRAINT `users_ibfk_3` FOREIGN KEY (`like_id`) REFERENCES `likes` (`id`) ON DELETE CASCADE 
CONSTRAINT `users_ibfk_4` FOREIGN KEY (`point_id`) REFERENCES `points` (`id`) ON DELETE CASCADE
Enter fullscreen mode Exit fullscreen mode

comments table

CONSTRAINT `comments_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE 
Enter fullscreen mode Exit fullscreen mode

likes table

CONSTRAINT `likes_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE 
Enter fullscreen mode Exit fullscreen mode

points table

CONSTRAINT `points_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE
Enter fullscreen mode Exit fullscreen mode

posts table

CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE
Enter fullscreen mode Exit fullscreen mode

Dealing with the problem

This time, we took a backup of RDS every day, so I was able to restore it to its original data.
The problem occurred at about 10:45 that morning, and I was able to restore RDS at about 13:00. The restoration time may vary depending on the amount of data.
It happened during a meeting, so I was able to report the problem immediately.
Backups are very important.

Measures to prevent a recurrence

  • Only senior engineers or managers have editing rights (delete, etc.) to the production environment DB, and other members have read-only rights.
  • It was bad to have ON DELETE CASCADE set in the parent table at first, so I removed it from the parent table. I don't know how it was implemented, but it's clearly an anti-pattern, so I removed it.

Thoughts

Since I became an engineer, I had never made a big mistake until this incident, so I think I was pretty relaxed.

Also, during a meeting, we needed to investigate the data, and it was not good that I looked at the production database while talking.

When I executed DELETE, my mind went blank,
but afterwards my colleague encouraged me by saying, "I've had a lot of experiences like that. Don't worry about it," which made me happy.
I think that this is how we grow as engineers, step by step, even as we make mistakes, so I hope I can use this failure as a stepping stone to grow myself!

Top comments (3)

Collapse
 
nicolus profile image
Nicolus

Haha yeah, that's pretty terrible. Good job on minimizing the downtime though and hopefully not lose too much data.

A few notes that come to my mind :

  • Unless there's no way to DELETE a "post" or whatever object it is in reality, the issue was probably there in a much more insidious and hard to debug way, like users mysteriously losing their account after deleting something.
  • Another way to deal with accidental DELETEs is delayed replication : You could have one of your replicas intentionally lag an hour behind, and whenever you make a mistake like this you immediately stop replication on this server and get back the data from it (you could probably even replay replication logs manually up to just before you made a mistake). That way you lose as little data as possible, without having to take dozens of backups each day.
  • When investingating data I make a habit of always connecting to a replica (which are all in super_read_only mode), that way I'm sure I cannot modify anything and if I make a very large SELECT it doesn't impact performance on the source server. I also have the window in phpStorm/Datagrip be bright red whenever I'm connected to the source server with write access so that I (and people watching me work in a meeting) immediately know we're in dangerous territory.
Collapse
 
federicorazzoli profile image
Federico Razzoli

The way to deal with this type of mistakes is to have a delayed replica. In case of a mistake like this (which is always possible), you can take a dump of the deleted data from the replica and load it into the master.

Collapse
 
mrpercival profile image
Lawrence Cooke

We learn from our mistakes.
Backups are critical to any database, first thing you should do is build a backup system.

I doubt there is many Devs who haven't made some epic mistake, not always the same kind of mistakes but I Think we all have some terrible story to tell how something went horribly wrong for us.

You are not alone on making mistakes, it's how you come out of it that matters, and what you learn from it.