The protection of digital assets is a multi-million dollar industry. Whether we're talking about military, financial or scientific data, each industry has to be prepared in the event of a loss, and plan for security. They often roll out extreme measures, going as far as having their own (and doubled) dedicated electrical power lines. But what about safeguarding your friend's latest BBQ party pictures? Or your little one's first steps video? Here's how I've learned my lesson from a tragic system failure and what my current setup looks like now.
A catastrophic failure
Back in 2008, I made myself a custom NAS (Network Attached Storage) using some old computer parts, a bunch of 500 GB hard drives and a copy of FreeNAS. The OS ran off a nifty 512 MB IDE flash-based drive, and the data array was configured to use RAID5. That meant that if a drive were to be damaged, I could always put a new drive in the array and the data would rebuild itself. Note the conditional tense. That's because it worked flawlessly until we moved in 2010. And we stored the NAS in a box next to a speaker with a huge magnet for a month. And two drives failed.
I spent weeks trying to figure out how to rebuild the data I had lost. But after some time I had to face reality. It was in vain. Years of family photos and videos, an entire MP3 collection, all my video games... It was all lost, forever. My girlfriend was in tears, and my "geek pride" took a serious hit after all the time I had spent planning and building this whole system. I was using hot (live) data as my only storage, and at the time I had no backup strategy. Of course I had recovery options, hence the RAID5, but I was not prepared for such a catastrophic failure. And when it comes to computer security, you have to prepare for the worst.
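Why one failed drive is survivable in RAID5 but two are not can be sketched with XOR parity. This is a simplified illustration, not how FreeNAS lays out data on disk: the three data blocks and their contents are made up, and a real array rotates parity across all drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: one data block per drive, plus one parity block.
data = [b"PHOTOS__", b"MUSIC___", b"GAMES___"]
parity = xor_blocks(data)

# One drive lost (the second one): XOR the survivors with the parity
# block to get the missing block back.
rebuilt = xor_blocks([data[0], data[2], parity])

# Two drives lost (the second and third): the survivors only yield the
# XOR of the two missing blocks, which is not enough to separate them.
mixed = xor_blocks([data[0], parity])
```

With a single missing block, `rebuilt` equals the lost data. With two missing blocks, `mixed` is only their combination, which is why the array could not be rebuilt.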
Years later, I learned my lesson. So here is how I handle my digital life now.
My current setup
My current setup is mainly built around two things: a new NAS, which I bought rather than built, and backup software that handles how the data is copied.
The NAS I'm using now is a Synology DiskStation DS214se. It's a very simple machine, running on an 800 MHz dual core CPU, with 256 MB of RAM and two hard drive bays. I put two 2 TB Western Digital Green hard drives in there and configured a single array to run in RAID1: everything that's on one drive gets mirrored to the second one. That means that I lose half of the hypothetical storage space, but if one drive fails, I can replace it and the data will automatically rebuild itself.
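The capacity trade-off can be put into numbers with a small sketch. The function name is made up for illustration, and the drive count used for the old RAID5 array is an assumption, since the exact number of 500 GB drives is not something I'm stating here.

```python
def usable_capacity_tb(level, drive_count, drive_size_tb):
    """Return the usable capacity of an array of identical drives, in TB."""
    if level == "raid1":
        if drive_count < 2:
            raise ValueError("RAID1 needs at least two drives")
        # Every drive holds the same data, so only one drive's worth is usable.
        return drive_size_tb
    if level == "raid5":
        if drive_count < 3:
            raise ValueError("RAID5 needs at least three drives")
        # One drive's worth of space is spent on parity.
        return (drive_count - 1) * drive_size_tb
    raise ValueError(f"unsupported RAID level: {level}")


# Current setup: two 2 TB drives mirrored, so 4 TB raw but 2 TB usable.
current = usable_capacity_tb("raid1", 2, 2)

# Old setup, assuming four 500 GB drives (the count is hypothetical).
old = usable_capacity_tb("raid5", 4, 0.5)
```

Both levels survive exactly one drive failure. RAID1 simply pays more space for it when there are only two bays.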
The NAS sits on top of an APC Uninterruptible Power Supply. If power is lost in my apartment, the NAS keeps running and I can manually (and safely) shut it down either by using its physical power button (which sends a power off command) or even my phone (my router is also plugged in to the UPS, so even without power I still have internet and network access for a few minutes).
My main backup strategy is handled by an amazing piece of software called SyncBack Free. It allows me to set up various backup scenarios, called profiles. The main profile is a physical backup to an external hard drive. When I bought the Synology NAS, I got a third 2 TB drive that is now used as a backup. This is my first failsafe. This is what was lacking in my previous setup. Once the backup task is done, that drive is stored offline and off-site, so it doesn't have to suffer from electrical malfunctions. And even in the event of a fire or flood at my place, my data is safe.
SyncBack then runs two more jobs. Of all the data I lost with that old setup, the loss of family photos was the hardest to cope with. One can always replace music or movies they used to love, as there's a never-ending stream of entertainment to consume. But memories do fade away, and are impossible to retrieve. So I've decided to add another redundancy layer to my backup strategy when it comes to photos and store them online, in my Google Drive. SyncBack compares the content of the NAS folder and my Google Drive, and updates the latter with the former before performing a Cyclic Redundancy Check of each file to see if they are the same on both sides.
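The idea behind that verification step can be sketched in a few lines. This is not SyncBack's implementation, just a minimal illustration of a CRC comparison between two folders. It assumes both sides are reachable as local directories (for instance a locally synced copy of the Google Drive folder), and the function names are made up.

```python
import zlib
from pathlib import Path


def file_crc32(path, chunk_size=1 << 20):
    """Compute the CRC-32 of a file without loading it all into memory."""
    crc = 0
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def find_mismatches(source_dir, mirror_dir):
    """Return relative paths that are missing or different in the mirror."""
    source_dir, mirror_dir = Path(source_dir), Path(mirror_dir)
    mismatches = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        mirror_file = mirror_dir / relative
        if not mirror_file.is_file():
            mismatches.append(relative.as_posix())
        elif file_crc32(source_file) != file_crc32(mirror_file):
            mismatches.append(relative.as_posix())
    return mismatches
```

An empty list from `find_mismatches("/path/to/nas/Photos", "/path/to/mirror/Photos")` (both paths are placeholders) would mean every photo on the NAS has an identical copy on the other side. A CRC catches accidental corruption; it is not meant to resist deliberate tampering.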
I should note that I could use two different apps on the NAS and have it handle these two backups automatically: USB Copy and Hyper Backup. After trying out both apps in different scenarios, I've decided not to use them as they either store data in a proprietary format (Hyper Backup) or add a bunch of ._ prepended metadata files to my existing directories (USB Copy). I like the fact that if I ever need to retrieve my files outside of Synology's ecosystem, I can still use a good old cp command to get my files back.
But wait, there's more!
So my data is stored on a RAID1 array, and on an offline hard drive. And the photos are backed up online on my Google Drive. I could have stopped there but I thought that it was not enough. Thanks to my Amazon Prime subscription, I can upload an unlimited number of photos to their Amazon Drive cloud service and it won't affect my otherwise limited quota. So hey, let's take this opportunity! Another SyncBack profile backs up the content of my Photos directory to Amazon's servers. I like the fact that my data is stored with two different storage providers. Google and Amazon each have their own infrastructure, so in the event of a failure of astronomical proportions at either one of these places, I may still be safe.
But why stop there? My photos are stored in four different locations now (the NAS, the external hard drive, Google Drive and Amazon Drive). But what about the rest? My music, my documents, my family videos? Well of course they're on the NAS and the external hard drive, but I figured I needed another failsafe. Because so far my backup strategy relies on what could constitute a single point of failure: SyncBack. If the software behaves badly or one of my backup profiles is not properly configured, I may end up with nothing but a bad save in various locations. I also don't have access to the external hard drive that easily, so if I need to do a backup at any given time, I need to prepare the operation at least a day in advance.
That's why I took out a subscription to Synology C2. It is a fully integrated service that runs natively on DSM (DiskStation Manager: Synology's own operating system) and allows me to back up the whole NAS (minus my movies and TV shows, these are not important) to Synology's servers. It uses AES-256 to locally encrypt the data before sending it over the network. I've set it up so it does an automated backup every first day of the week, and then does an integrity check two days later.
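The weekly rhythm is easy to reason about with a few lines of date arithmetic. This only illustrates the schedule and has nothing to do with how DSM implements it. It assumes the first day of the week is Monday, and the function names are made up.

```python
from datetime import date, timedelta

BACKUP_WEEKDAY = 0        # Monday (assumed first day of the week)
INTEGRITY_DELAY_DAYS = 2  # the check runs two days after the backup


def next_backup(today):
    """Return the next backup day, which is today if today is a backup day."""
    days_ahead = (BACKUP_WEEKDAY - today.weekday()) % 7
    return today + timedelta(days=days_ahead)


def integrity_check_for(backup_day):
    """Return the day the integrity check follows a given backup."""
    return backup_day + timedelta(days=INTEGRITY_DELAY_DAYS)
```

For example, `next_backup(date(2024, 1, 3))` (a Wednesday) returns Monday, January 8, and `integrity_check_for` of that day lands on Wednesday, January 10. In the worst case, a file created just after a backup waits a full week before it reaches C2.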
I also considered Online's C14, as they're really cheap and you can send files over (S)FTP but unfortunately they do not support Synology.
Room for improvement
So this is what my current setup looks like now:
Each file is physically stored in up to 6 different locations, with various levels of failsafe measures.
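One way to read that count is as a map from each kind of data to the places holding a copy. The sketch below is an interpretation of the setup described above, not an official inventory. It assumes the two mirrored NAS drives count as separate physical copies and that the external drive holds everything, including movies and TV shows.

```python
NAS_AND_EXTERNAL = ["NAS drive 1", "NAS drive 2", "external drive"]

# Assumed mapping of data categories to the locations holding a copy.
LOCATIONS = {
    "photos": NAS_AND_EXTERNAL + ["Google Drive", "Amazon Drive", "Synology C2"],
    "music, documents, family videos": NAS_AND_EXTERNAL + ["Synology C2"],
    "movies and TV shows": list(NAS_AND_EXTERNAL),
}


def surviving_copies(category, lost):
    """Return the locations still holding a category after some are lost."""
    return [place for place in LOCATIONS[category] if place not in lost]


# Scenario: a fire at home destroys the NAS, so both of its drives are gone.
after_fire = surviving_copies("photos", {"NAS drive 1", "NAS drive 2"})
```

Under these assumptions, photos reach six copies, and four of them survive losing the NAS entirely. Movies and TV shows would be left with only the off-site external drive.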
Is this setup perfect? Of course not. First and foremost, it lacks automation. I still have to start each backup task (except for the C2 one) manually, and it is prone to error. I'm working with live data so the array is constantly changing, but this is a backup, not a long term cold storage. And the external hard drive I'm using has to be transported and manipulated, so that's another weak point in the system.
One thing I'll probably change soon is the model of the hard drives I'm using. WD Green drives are "fine" but they are not made for use in a NAS. So I think I'll swap them for either the WD Red or the Seagate IronWolf line, and probably take the opportunity to do a slight storage upgrade to 3 or 4 TB.
All in all, the main problem with backup strategies is that they're never perfect. Just look at what happened at GitLab a few months ago, or even the catastrophic failure that brought OVH to its knees for hours.
One cannot be fully prepared against data loss. Still, I can say that I feel somewhat confident with this strategy, and I've tried to think about every scenario (even solar flares, but they're a whole other animal). We'll see how and where my data sits in a few years.
Top comments (21)
"And we stored the NAS in a box next to a speaker with a huge magnet for a month. And two drives failed."
I promise you, that's not what killed your hard drives. :)
It is many times more likely that them just sitting in storage, after presumably running 24/7 for years, is what did them in. Magnets don't "kill" hard drives; at worst they'd wipe the data (which never happens with home/speaker magnets), not wreck the drive itself.
Anyway, good to see someone spreading the word of "an external drive is not a backup for your photos", and providing some useful info/ideas!
Ok, one last thing: "RAID is not a backup". RAID is for uptime, not backup. :)
Interesting. I never suspected the fact that just storing the hard drives could have caused that failure. I'm still suspecting the magnet in the speaker because I had faulty sectors before the drives gave way.
Ha, we'll never know. :)
And thanks for pointing that out: RAID is not a backup. That should be said over and over again.
Now the vibrations from the speaker could def bork a drive or two. Some of the drives' aggressive head parking along with some vibrations :D. Did you get SMART output from the drives?
Great article, my backup strategy is very similar to yours, except that I am using Synology Hyper Backup for the Synology->USB disk backup. I hadn't considered the proprietary aspect of this app, however I believe that if you specify the single-version task, a straight 1:1 copy of the source files is made, in a non-proprietary format.
Yeah to be honest, after using this setup for quite a while now, I'm less concerned about the "proprietary" aspect of Synology C2. I'm fairly confident that I'll be able to retrieve my data in case of emergency. The system has been flawless so far. Fingers crossed! :)
I wasn't aware of Synology C2 until I read your article. I checked a simple text file backed up with Hyper Backup (single-version mode), and it wasn't altered in any way.
Main NAS (location A) with 4x3TB WD Red NAS in a MicroServer Gen8 8GB RAM ECC with FreeNAS(raidz2) network bonded. APC 1500VA connected and auto poweroff when no battery at all.
Secondary NAS with MicroServer Gen8 and old 4x2TB WD/Seagate disks with FreeNAS(raidz2) and network bonded. APC 1500VA with auto poweroff.
Periodic snapshots of important files in location A that are synchronized nightly to location B via a VPN connection.
Small (and old, but brand new after a warranty replacement) LTO3 External SAS Ultrium Unit connected to the FreeNAS in location B, doing a daily backup of these snapshots to LTO cartridges for an out-of-office backup strategy.
No Cloud sync at all, for the moment.
Thanks for sharing!
Oooh that MicroServer is one sexy box! The problem I had with FreeNAS is that it's a software RAID, and at the time it was not without its flaws.
Sorry for the bad news today but Synology is not HW RAID either.
FreeNAS uses the disks in JBOD mode, directly attached to the system, the same way Synology does, but with ZFS vs. EXT4 in Synology Hybrid RAID.
If you get a shell on your NAS, you'll see the mdX devices created by Synology when it sets up the RAID 1 with mdadm (software RAID).
The advantages of using FreeNAS are ZFS and all the stuff behind this file system: snapshots, replication, etc.
I moved from my old setup (DS508 + DS413j) to FreeNAS thanks to ZFS.
The Synology web interface and all the apps in the ecosystem are the best value of this solution, but I still prefer an ugly interface with a lot more control over what happens with my data and disks.
You're totally right in that regard, and I suspected it was not HW RAID (maybe their top of the line products are, I don't know). But at the end of the day, I'd rather rely on a product from a company that is specialized in such solutions than on my own skills, which have proven not to be that reliable. :D
If I ever try the DIY approach again, I'm more interested in products like unRAID which seem really solid.
Thanks for your feedback!
Great article. Consumer-level backup strategies are one of those things that you don't know you got wrong until you got it wrong!
I also have multiple layers in my system. The core is a five-disk RAID10 array (striped and mirrored with a hot swap disk) made up of WD Red disks. On top of this is an instance of Seafile, an open-source cloud implementation that I've been using for several years now. What I like about using a cloud approach is that any machine that is syncing to the cloud also stores files locally, so if the cloud goes down the local files are still viable. It also is a versioning system, so I can retrieve a file from any time in the past if necessary.
The Seafile cloud is exposed to the internet, so all of my files are available whether I or anyone in the family is at home, in the office, or traveling. I have zero trust in commercial cloud providers, so I'm basically creating my own service. Nothing is stored outside my control.
The Seafile server is backed up nightly via rsync to a second NAS that is simply mirrored. I built both of these machines, so they are easy to fix if something goes wrong.
Thank you for this article. It was really very helpful. Still, I don't get why you don't run automatic backups. I took a look at the SyncBack software and saw that it supports scheduled backups. Why don't you configure it?
Hey thanks a lot for your comment and for taking the time to read my article. :)
To answer your question: I'm not using scheduled backups with SyncBack because the hard drive I'm using as a backup is not stored in my house. In order to still have access to my data in the event of a site failure (flood, fire, theft...) it is stored offline and off-site.
I am using scheduled backups thanks to Synology's C2 cloud service though. Everything is backed up automatically on Synology's cloud every week.
Thank you for your insight.
I'm building my system now so.. I think I have good ideas here. Thanks!
Great article, I've learnt the hard way myself so I feel your pain over the data loss.
One thing I'd add to your thorough write-up is to consider the disks you put into any RAID machine. I've had a pair of Seagate disks bought together fail within hours of each other (after several years of identical use). If they'd both been in a RAID 5 set I would have lost everything before the array rebuilt after the first failure.
Now I aim to source disks separately so they come from different manufacturing batches (and even go for different makes of disk if the array allows)
This is a GREAT tip. I'll definitely take that into account when I upgrade the disks to WD Red :)
Of course, the flip side to this is there are now four places your data can be stolen from :)
Very informative post. I have a friend who works with video and should probably set up a proper NAS, or at least upgrade from "copy everything important to a thumb drive once in a while" so this is helpful.
Thanks for your reply!
It's true security can be a problem for some. Then again I guess the last paragraph can apply to the security of your data as well. It's about how confident you feel.
That's the reason I encrypt the data I'm sending to C2 with AES-256. As for the other backups, I could use a Truecrypt container to be extra safe.
Your friend already has a backup strategy which is simple, it's still better than what I had with my old setup. He just needs to add a bit of redundancy to be extra safe. :)
To help automate your tasks, I would highly suggest using rsync. I use rsync to connect to several of my machines via SFTP (secured with private keys) and sync my backups in multiple places. With the Synology Task Scheduler (similar to cron jobs) I run rsync on a schedule to automate my backups. Once you have the SFTP/SSH access set up, it's super easy to just use a script under Task Scheduler.
Why WD green and not red?
Simply because at the time I was not aware of these drives (they came out a year prior to when I got the NAS), so I was a bit clueless about what to get. I just had a good experience with WD as a brand and went with what I knew. ;)