Discussion on: Fire Drills for Software Teams


The idea of doing fire drills is great. A few years ago we lost our production database: both disks on the server failed at the same time, and there weren't any recent backups. We lost about six months' worth of data. The only reason we were able to recover any of it was a backup I had made for testing a few months earlier. The year after that we had another failure (I'm not sure what the outcome of that one was). Luckily, my databases were stored on a different server at the time. I think this post contains some well-thought-out tips, and I will definitely share it with our IT team.

Matt Holford • Aug 2 '17

Thanks, Tim! Another great practice we started last summer (also based on a suggestion from CJ Rayhill) was to keep a spreadsheet of every critical system and specify how it gets backed up, whether the backups are automatic, and how old the most recent backup is (especially if it's not yet automated).

Our DevOps lead reviews this weekly, and updates a cell on the spreadsheet with the timestamp of the latest review. This has helped expose and track our practices across very different systems.
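If you want to automate part of that weekly check, here is a minimal sketch of the same idea in Python, assuming the spreadsheet has been turned into a list of records. The `BackupRecord` fields, the `stale_systems` and `weekly_review` helpers, the 7-day threshold, and the sample systems are all hypothetical illustrations, not the actual spreadsheet described above. The sketch flags systems whose latest backup is too old, lists the ones still backed up manually, and records a review timestamp like the spreadsheet cell.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupRecord:
    system: str             # name of the critical system (hypothetical)
    method: str             # how it gets backed up, as free text
    automated: bool         # whether backups run automatically
    last_backup: datetime   # timestamp of the most recent backup


def stale_systems(records, now, max_age=timedelta(days=7)):
    """Return names of systems whose latest backup is older than max_age."""
    return [r.system for r in records if now - r.last_backup > max_age]


def weekly_review(records, now, max_age=timedelta(days=7)):
    """Summarize one review; 'reviewed_at' plays the role of the spreadsheet cell."""
    return {
        "reviewed_at": now.isoformat(timespec="minutes"),
        "stale": stale_systems(records, now, max_age),
        "manual": [r.system for r in records if not r.automated],
    }


# Hypothetical sample inventory for illustration only.
now = datetime(2017, 8, 2, 9, 0)
inventory = [
    BackupRecord("production-db", "nightly dump to offsite storage", True,
                 datetime(2017, 8, 1, 2, 0)),
    BackupRecord("wiki", "manual export", False, datetime(2017, 6, 15, 12, 0)),
]
report = weekly_review(inventory, now)
print(report)  # {'reviewed_at': '2017-08-02T09:00', 'stale': ['wiki'], 'manual': ['wiki']}
```

Keeping the "manual" list separate from the "stale" list mirrors the point about tracking backups that are not yet automated: a manual backup can be fresh today and still be the one most likely to go stale.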