I have recently been working on some automation to regularly restore all of our live databases from AWS RDS into copies for use as a test environment. This includes some data scrubbing to remove sensitive information, real email addresses, etc, but allows testing against a production- like environment.
We have MS-SQL and Aurora MySql databases, and I have made use of the AWS APIs for this task. In my case I used the .Net APIs and built a console app to call them. While the APIs are well documented and quite easy to use, I have still found getting the right combination of calls has taken some time.
Firstly, the API call to restore a DB from a snapshot will, quite rightly, refuse to override an existing instance. However in this case we want to override the existing environment each time we restore. So I had to build that into my app.
First, call DescribeDBInstances with the name of the target instance. If an instance is found, then call DeleteDBInstance to delete it. Be aware that this call throws a DBInstanceNotFoundException if the instance does not exist, so you need to catch this and move on, as this is a perfectly valid scenario.
One thing to note with all of these calls if that the API will trigger a task in the cloud, but not wait for it to complete. So your DB instance will exist with status "Deleting" for a while. I did not find a way to subscribe to events which will tell you when this has completed, so I implemented some simple polling, calling DescribeDBInstances every 2 minutes until the instance no longer exists.
Once you are sure the target instance does not exist, you can call RestoreDBInstanceFromDBSnapshot to do the restore. This assumes you have a snapshot available, and the best bet is to have regular snapshots generated from the live DB automatically. You'll need to find the latest snapshot for the source DB, using DescribeDBSnapshots. Return all available by DB instance name, and then sort descending by date to find the latest.
Once again, you'll need to wait for the restore to complete, so again, I did some polling using DescribeDBInstances to wait until the target instance existed, with a status of "available".
Once you have a newly restored database, you may need to apply security groups, and you will probably want to change the master user password, so test and live don't have the same credentials. This can all be achieved using ModifyDBInstance. One gotcha I found is to ensure that "ApplyImmediately" is set to true, if you need to subsequently connect to the DB to complete further tasks. Otherwise AWS will wait for a maintenance window to apply the changes, and you don't want that.
Next, you can apply some data scrubbing or whatever you want to do to turn your prod data into test ready data. For this, you can simply connect to the database using whatever method you prefer, and run your scripts.
Now, this is all well and good, but what if you have databases in multiple regions? What if you have a UK database, which you want to replicate as a test environment, but all your testing happens in Australia, so you'd rather your test DB sits in Aus for better latency . Well, you can't restore a Snapshot across regions, but you can copy snapshots across regions. Simply call the CopyDBSnapshot command, connecting to the target region, passing the "DBSnapshotArn" of the snapshot in the source region that you wish to copy.
Then, once it is copied, (Check status using DescribeDBSnapshots in the target region), you can proceed as above.
So that covers single instances in MS-SQL, but what about Aurora DB clusters? It's a bit more complicated and a bit less intuitive.
First, call RestoreDBClusterFromSnapshot. The main gotcha here is that you need to specify the engine, even though the documentation implies you can leave it out. In my case, Engine = "aurora-mysql".
As above, you can poll for availability (call DescribeDBClusters) then call ModifyDBCluster to apply security groups and password changes.
Here is where it's a bit unintuitive. If you were to do this via the web dashboard, the cluster and an instance within would be restored, however the API does not do the second step. So you end up with an empty cluster, which seems useless, but the cluster contains all the info you need to spin up instances within it.
So you now need to call CreateDbInstance.
Pass in the following parameters:
DBClusterIdentifier = (target cluster),
DBInstanceIdentifier = (target instance name),
Engine = "aurora-mysql",
PubliclyAccessible = true (Note this defaults to false, even if the original you restored from was public. This caught me out and I wasted some time wondering why I couldn't connect).
So there you go. Everything you need to know to set up a restore of production AWS databases into a test environment. Simply schedule your app to run using whichever build server or scheduling tool you like, and away you go. You can restore a new environment, every night, at the start of a sprint, or whenever you wish.
Top comments (5)
I was also curious to learn about the app. I have been tasked with something similar with a few additional caveats and was interested in more information about the app itself, being as I am not a application developer. Is there a sample on Git?
Hi! As per the below comment (just now), I haven't made it available at this stage. The app is a very simple console app that includes the AWS nuget packages and calls the appropriate APIs.
I wrote this bash script based on Jeremy's procedure y'all can use as a starting point. Just be aware that the public IP changes when you blow the instance away and re-create it.
Hi Jeremy, Is code of this app available? Thanks
Hi! At this stage no. It's a pretty simple console application, include the AWS SDK packages from Nuget, parse whatever command line parameters you want to include and fire off the calls. The AWS documentation is pretty good.