Follower Clusters – 3 Major Use Cases for Syncing SQL & NoSQL Deployments

#postgres #mongodb #mysql #database

Follower clusters are a ScaleGrid feature that allows you to keep two independent database systems (of the same type) in sync. Unlike cloning or replication, this allows you to maintain an active, point-in-time copy of your production data. This extra cluster, known as a follower cluster, can be leveraged for multiple use cases, including analyzing, optimizing and testing your application performance for MongoDB, MySQL and PostgreSQL. In this blog post, we will cover the top three scenarios in which you can leverage follower clusters for your application.

How Do Follower Clusters Differ From Replication?

Unlike a static clone, a follower cluster re-imports data from the source on a set schedule, so it is only ever as far behind your production cluster as the time since the last sync. Here are a few critical ways in which it differs from replication:

You can control how frequently the destination system syncs from the source - once a day, once a week, or even less frequently. This helps reduce the load on the source system.
Since they are two independent systems, you have much more flexibility over the data that is synced. You can have different user credentials and even remove some data from the destination based on security requirements (note: this requires user-side scripting - it is not a built-in feature of follower clusters).
The ‘follower’ system is writable, so you can use it as a staging environment to test your application changes. This is not something you can do on a replica node.
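
To make the user-side scripting mentioned above a little more concrete, here is a minimal sketch of a scrub step you might run on the follower after each sync. Everything in it is an assumption for illustration: the `users` table, the `email` column and the `scrub_users` helper are hypothetical, and an in-memory SQLite database stands in for the follower so the example runs on its own. On a real MongoDB, MySQL or PostgreSQL follower you would issue the equivalent updates through that database's own driver.

```python
import hashlib
import sqlite3


def scrub_users(conn: sqlite3.Connection) -> int:
    """Replace e-mail addresses with stable pseudonyms; return rows changed."""
    rows = conn.execute("SELECT id, email FROM users").fetchall()
    for user_id, email in rows:
        # A truncated SHA-256 digest keeps the pseudonym stable across syncs.
        digest = hashlib.sha256(email.encode("utf-8")).hexdigest()[:12]
        conn.execute(
            "UPDATE users SET email = ? WHERE id = ?",
            (f"user-{digest}@example.invalid", user_id),
        )
    conn.commit()
    return len(rows)


# Demo: an in-memory database standing in for a freshly synced follower.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("alice@corp.test",), ("bob@corp.test",)],
)
changed = scrub_users(conn)
emails = [row[0] for row in conn.execute("SELECT email FROM users ORDER BY id")]
print(changed, emails)
```

Because every sync restores the source snapshot, a scrub step like this would need to run again after each sync. A plain hash is used here only to keep the sketch short; a keyed hash or random replacement values would be harder to reverse.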

Note: ScaleGrid implements follower clusters using storage snapshots. The feature is not available for our in-memory database offerings like hosting for Redis™*.

1. Database Dev/Test Setup

We have all been there - a supposedly well-tested piece of code is deployed in production, and then all hell breaks loose. Production workflows fail, or are so slow they are basically unusable. Engineers are woken from their beds to start a full-blown firefighting operation. A bunch of sleepless nights later, that dreaded root cause emerges.

The application behaves differently on production and engineering setups.

In other words, we tested it on “test data”. Which, as it turns out, was nothing like the production data. At all.

The obvious way to avoid this situation is to run tests on your production data. Not on actual production, of course - that would be flirting with disaster - but on a cloned copy of the production data. Concerns around privacy and data security make this impracticable in many scenarios, but where privacy requirements permit, this is the best solution. We no longer need to rely on engineers generating appropriate data sets - if it passes on test data, it will pass on production data.

That is, until test data falls so far out of sync with production that it is no longer a good approximation. And we are back at square one.

This is where follower clusters come in.

By using follower clusters, you can periodically import data from your production database into the dev/test database. And since the entire import is performed using storage snapshots, rather than a logical dump, the process is nearly instantaneous. You can schedule your imports once every 24 hours, once a week, or whatever frequency suits your particular scenario.

With your development and QA clusters set to follow the production cluster, you can rest easy. If your application passes on the test dataset, it is definitely fit to deploy in production!

2. Data Analytics

If you have worked as a DBA, you’ve probably had a conversation with your team about system performance “mysteriously” slowing down at certain times. In most cases, the culprit turns out to be an analytics job that is accessing tons of data and ends up slowing down the entire system.

As a DBaaS vendor, we have had this conversation multiple times with our customers. Here are the two options we typically suggest:

If the analytics job is running on the primary/master server, move it to a secondary/replica server.
If the analytics job is already running on a secondary node, and the performance degradation is unacceptable, we recommend moving the jobs to a dedicated analytics cluster.

Using our follower cluster feature, it’s very easy to keep an analytics cluster up-to-date with actual production data. You can create a follower schedule to sync the latest data from production just before your analytics job kicks in.

The best part? Follower syncing does not perform any database level operations - it merely restores the latest snapshot! So, there is zero impact on your production cluster.

3. Reporting

Another common use case where our customers use the follower clusters feature is for report generation. Reporting processes typically run infrequently, but access large quantities of data and take up most of a database cluster’s resources. When the performance degradation is unacceptable, we recommend our customers move the reporting workload to a new cluster.

Since reporting operations are infrequent, many of our customers prefer to leverage our pause/resume feature to ‘pause’ reporting clusters when they are not in use. This helps save massively on infrastructure costs. Typically, reporting clusters are also much “smaller” (less CPU/RAM), to help reduce costs.

After you have created a follower cluster from our UI, you could use this workflow to automate your reporting flow:

Use our resume API to resume the cluster.
Wait until the cluster is back in the running state (you can use our get-status API for this purpose).
Trigger a backup on your production cluster, if required (typically, if regular backups are scheduled on your production, you can skip this step. However, if you want your reporting to run on the latest data, this is essential).
Wait for the backup to complete.
Trigger a sync job on the follower - this finds the latest snapshot on the source cluster and restores it to the destination.
Wait for the sync job to complete.
Run your reporting tasks.
Use our pause API to pause the cluster till your next reporting job!
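
The steps above can be sketched as a small orchestration function. This is only an outline under stated assumptions, not the actual ScaleGrid API: the `ClusterClient` interface, its method names, the job IDs it returns and the `"running"` status string are all hypothetical placeholders that you would map onto the real resume, get-status, backup, sync and pause calls described in the help docs.

```python
import time
from typing import Callable, Protocol


class ClusterClient(Protocol):
    """Hypothetical interface; map each method onto the real API calls."""

    def resume(self, cluster_id: str) -> None: ...
    def pause(self, cluster_id: str) -> None: ...
    def get_status(self, cluster_id: str) -> str: ...
    def start_backup(self, cluster_id: str) -> str: ...
    def start_sync(self, follower_id: str) -> str: ...
    def job_done(self, job_id: str) -> bool: ...


def wait_until(check: Callable[[], bool], timeout_s: float = 3600.0,
               poll_s: float = 30.0) -> None:
    """Poll check() until it returns True, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while not check():
        if time.monotonic() >= deadline:
            raise TimeoutError("operation did not finish in time")
        time.sleep(poll_s)


def run_reporting_cycle(client: ClusterClient, production_id: str,
                        follower_id: str, run_reports: Callable[[], None],
                        fresh_backup: bool = True, poll_s: float = 30.0) -> None:
    """Resume the follower, sync it, run the reports, then pause it again."""
    client.resume(follower_id)
    try:
        # "running" is an assumed status value, not a documented one.
        wait_until(lambda: client.get_status(follower_id) == "running",
                   poll_s=poll_s)
        if fresh_backup:
            # Optional step: only needed when reports must see the latest data.
            backup_job = client.start_backup(production_id)
            wait_until(lambda: client.job_done(backup_job), poll_s=poll_s)
        sync_job = client.start_sync(follower_id)
        wait_until(lambda: client.job_done(sync_job), poll_s=poll_s)
        run_reports()
    finally:
        # Pause even if a step fails, so the cluster does not keep running.
        client.pause(follower_id)
```

The `try`/`finally` is a design choice of this sketch: since the point of pausing is to save on infrastructure costs, the pause call is attempted even when a sync or report fails partway through.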

Do you think follower clusters are a good fit for your particular use case? You can learn all about how to deploy and manage follower clusters for MongoDB, MySQL and PostgreSQL in our help docs!

If you are unsure about whether follower clusters are the correct solution for your use case, leave a comment or reach us at support@scalegrid.io - we are happy to discuss which feature best fits your requirements.