DEV Community

Cover image for Cross region replication for AWS RDS MySQL
Ankur Shukla
Ankur Shukla

Posted on

Cross region replication for AWS RDS MySQL

When you work with databases, data retention and recovery become major concerns since your machine could go down. Being in cloud simply means you have offloaded infra responsibility to somewhere remote. This holds true both for provisioned (like EC2) and serverless capacity.

Importance of RTO & RPO

RTO or recovery time objective refers to the time taken from failure to recovery for DB. For instance, if it would take DBA an hour to bring DB to running state and making it available to applications, your RTO is 1 hour.
RPO or Recovery Point Objective refers to the point in time till which you can restore the data. For instance, if we take backups daily once at fixed time, in the event of a disaster, one would loose up to 24hrs of data since the last backup could be up to 24hrs old.
RTO and RPO make it easy to quantifiably define how much and how fast data can be recovered in the event of DB instance failure. With right approach, one can dramatically reduce RTO and RPO numbers, and at the same time reduce costs. Thus, Disaster Recovery or DR needs to be solved at multiple levels, often in several ways.

AWS' Relational Database Service provides several Relational DB solutions including MySQL, Postgres, Oracle Db, MS SQL Server, and MariaDb in cloud. You provision infra for these or can also use serverless offering in RDS- Aurora (currently MySQL, Postgres only).

We will be discussing everything for MySQL DBs since implementation details for replication vary based on database.

Multi-AZ FTW!!

AWS has concepts of region and Availability Zones(AZs). While regions are geographically separate locations which allow safety against regions going down, AZs are distinct locations in same region, spanning across multiple Data Centers.

When you launch an RDS DB with provisioned capacity, RDS gives an option to create a read-replica in a separate AZ. These standby replicas remain (almost) in sync with master node.
However, when you are defining RTO and RPO, having a multi AZ deployment is not sufficient since services in a region can go down too.

MySQL Replication to the rescue

Screenshot 2021-01-17 at 3.26.34 PM

There are several mechanisms to keep a replica node or backup in a separate AWS region to deal with region failures. Most of following mechanisms make use of replication capabilities of MySQL at the core. However, since Aurora Serverless v1 does not support replication, mechanisms based on replication do not work. If one has a strong replication use case, consider using provisioned RDS DB.
MySQL makes use of Binary log files(binlog) to capture database changed, called Change Data Capture aka CDC. These binlog files can be used to capture all DDL and DML for a DB, and can be used to create a replica. The multi-AZ deployment supported by RDS takes an initial snapshot and then uses CDC events generated using binlog files to keep the read replica (almost) in sync with master node. But to accurately define our RTO and RPO, we need to take care of multi-region replication.

Multi-region replication approaches

1. AWS Data Migration Service or DMS

AWS DMS allows replication from a source DB to a destination DB, as long as at least one of the source or destination is AWS managed DB. For using DMS, you need to enable binlogs for MySQL and set up a replication node which takes the CDC from source and sends data to destination node. You can find guidelines to set this up for MySQL here. The major drawback with this approach is the requirement of an additional replication node, and its maintenance.

2. Automated backups

AWS RDS natively supports taking automated backups and manual snapshots of DBs. While you can configure your RDS DB to take backups daily, copy needs to be done manually. Your first backup is always a full backup, and subsequent backups are incremental. RDS maintains metadata in these snapshots to determine whether full or incremental snapshot needs to be taken. Making use of automated backups feature, you can take backup of your DB every day, and use AWS CLI or a language sdk to set up a script to perform cross region replication. I shall be discussing this approach in detail in another post.

3. mysqldump

Since you need a homogeneous replication i.e. copy CDC for one MySQL to another, or keep backup to restore into another MySQL instance, you can use native tools such as mysqldump to perform this. With Mysqldump, you can take dump of all DBs on your source instance and restore it on destination when required. For starters, you could simply take a daily snapshot(for RPO=24hrs), copy this dump to a host in another region, and delete older dumps as required.

For Aurora Serverless,

1. Events pipeline

As I mentioned earlier, RDS Aurora serverless does not support binlogs and thus no kind of replication is supported for it. Since the core of replication is binlog generation capability, applications can emit create,Update and delete data change as a separate event with fixed schema which can be consumed to replicate and recreate original DB.

2. Stored Procedures and lambdas

One can setup stored procedures along with triggers in MySQL to capture DDL and DML changes in DB. Your stored procedure could invoke a lambda function and then usual AWS pipelines can be used to publish events. This blog discusses this approach in detail

3. Alter schema

One not so cool approach would be to keep an additional timestamp column, say 'updated_at', in all tables for which CDC is required. You could write a script to poll DB to recreate it on another machine. However, this approach can not be used to build an event source since you will be able to read stored data, not events.

Here, I have attempted to just highlight the approaches available for CRR in RDS MySQL. I shall be discussing this approach using self-managed mysql in another post where we will get into internals of using binlog.

Top comments (0)