Meitar Bruner for Artlist

How to migrate data when refactoring a monolith to microservices (with zero downtime)

Microservices architecture is a well-known subject that is covered a lot in the tech industry.

But anyone who has ever tried breaking up a monolith has probably found that very few articles cover one of the greatest pains of moving from a monolith to microservices - data migrations.

After making some mistakes with migrations ourselves, and because we decided to do it with zero downtime, I'd like to share the strategies, tools, and best practices we've learned for this kind of migration. Let's dive in.

First, what are data migrations?

According to Wikipedia: "Data migration is the process of selecting, preparing, extracting, and transforming data and permanently transferring it from one computer storage system to another."

Let's understand the risks of migrating data:

  1. Having corrupted data (data that violates your new service's expectations).

  2. Losing data.

  3. Creating data duplications.

  4. The new data can contain mistakes.

And the risks when switching services while aiming for zero downtime:

  1. Data might return differently than expected.

  2. The new service might not handle the requested rates.

  3. New data that is created or updated won't be available in the old system anymore.

In this article, we will cover practices that lower these risks. You can probably migrate your data without following all of them, but keep in mind that applying them all will reduce your risk to a minimum.

A. Preparations

Understand the data you are going to transfer

The first step is learning the old entities you are going to transfer, the relationships between them, and how they will be represented in the new database. It's also important to understand the size (number of records and bytes) of the data you are going to transfer.

Knowing the nature of the data will save you a lot of time when creating your new API and your migration process.

If you have already designed or built your new API, you have probably done this already.

If not, pay attention to the relations and the properties of the objects: do you need all the properties/columns? Will one object from the old database be represented by 2 objects in your new database? Or vice versa: an object joined from 2 SQL tables might become one object in the new database.

There are many possibilities, so I won't try to cover them all. The main point is that you should understand the data you are about to transfer and its scale, so you can design your new service properly and know how each piece of data will map to your new database.
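
To make this concrete, here is a small, hypothetical sketch of such a mapping in TypeScript. The schemas are made up for illustration (they are not a real data model); the point is simply that one old row can turn into one or two new objects:

```typescript
// Hypothetical example: one "favorites" row in the old monolith may become
// one or two documents in the new service's database.
interface OldFavoriteRow {
  id: number;
  userId: number;
  artistId: number | null;   // the row references either an artist...
  albumId: number | null;    // ...or an album
  folderName: string | null; // folder data lived on the same row
  createdAt: Date;
}

interface NewFavorite {
  oldId: number;          // kept around - it helps with duplicate checks later
  userId: number;
  targetType: 'artist' | 'album';
  targetId: number;
  createdAt: string;      // ISO string, as the new API expects
}

interface NewFolder {
  oldId: number;
  userId: number;
  name: string;
}

function mapOldFavorite(row: OldFavoriteRow): { favorite: NewFavorite; folder?: NewFolder } {
  const favorite: NewFavorite = {
    oldId: row.id,
    userId: row.userId,
    targetType: row.artistId !== null ? 'artist' : 'album',
    targetId: (row.artistId ?? row.albumId)!, // assumes the row references one of the two
    createdAt: row.createdAt.toISOString(),
  };
  const folder = row.folderName !== null
    ? { oldId: row.id, userId: row.userId, name: row.folderName }
    : undefined;
  return { favorite, folder };
}
```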

monolithic architecture

Create the new API

An important approach I'd recommend embracing is to never access your new database without the API layer.

This is very important because writing to the database without going through the service layer, and then serving that same data, can introduce mutations and problems you want to avoid (risk #1).

For example, your new API may have a restriction for some string property to be camelCase only, which the old system did not enforce. In that case, if you create a migration without going through the API layer, the new database will be corrupted with data that violates your new API rules.

So until your API is ready, it is better not to insert new data into the new database at all.

Create your API first, ideally covered with DTO validations and integration/e2e tests, and only then move to the migration phase.
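
For example, a minimal sketch of a DTO-level validation in the new API could look like this (the names and the camelCase rule are just illustrative):

```typescript
// Illustrative DTO validation in the new API: reject payloads that violate
// the new rules (e.g. the camelCase restriction) before anything is persisted.
interface CreateFavoriteDto {
  userId: number;
  label: string; // must be camelCase in the new service
}

const CAMEL_CASE = /^[a-z][a-zA-Z0-9]*$/;

function validateCreateFavorite(dto: CreateFavoriteDto): string[] {
  const errors: string[] = [];
  if (!Number.isInteger(dto.userId) || dto.userId <= 0) {
    errors.push('userId must be a positive integer');
  }
  if (!CAMEL_CASE.test(dto.label)) {
    errors.push('label must be camelCase');
  }
  return errors;
}

// The handler rejects invalid payloads instead of letting corrupted data
// reach the new database (risk #1). Persisting always goes through the service layer.
function handleCreateFavorite(dto: CreateFavoriteDto): { status: number; body: unknown } {
  const errors = validateCreateFavorite(dto);
  if (errors.length > 0) {
    return { status: 400, body: { errors } };
  }
  // ...call the service layer to persist, never the database directly
  return { status: 201, body: { ok: true } };
}
```

In a NestJS-style setup the same idea is usually expressed as DTO classes with a validation pipe, but the principle is the same: the API layer is the only gate into the new database.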

New microservice

Try to be as incremental as possible (one feature at a time).

As we mentioned, the migration process comes with a risk, so it is very important to try and do it in small steps rather than migrating everything at once.

Consider whether you can migrate only part of the data from the old database to the new one, so that at least some endpoints of the old monolith can be replaced at the beginning.

You can lower the risk even more by starting with the feature which has the least traffic.

At Artlist, for example, we needed to migrate all the data related to users' favorite songs, artists, and albums, including the folders users created for themselves.

In that case, we decided to do it step-by-step for each object and chose to start with favorite artists, because this feature had the lowest traffic and was used the least.

Consider using third-party tools

At this point, you already know the old structure and the expected new structure of your data. You have also created the API that is connected to the new database, and decided which objects you would like to migrate to your new database first.

Before you bring out the big guns (writing large amounts of custom code), this is a reminder that for every phase ahead, it's worth searching for known, well-tested third-party tools, packages, or services that already implement what you are about to do.

Because every problem is different and stacks and cloud services vary, I'll avoid pointing to a specific solution. Just remember that many of these problems have already been solved by other developers, who may have thought about issues we haven't.

B. Get to work

Synced create/update/delete operations

Sync strategy

Now we are starting for real. The first thing we want to do is start writing to the new database and keep it always in sync with the old one. Remember we are aiming for zero downtime, so at the moment we switch to the new service and database, it must already contain all the old data.

The best way to do it is to call the new API from the old system whenever a mutation operation happens (create, update, or delete). As we mentioned above, we want to go through the API and not update the database directly, to prevent corrupted data from being transferred.

Another important consideration is where in the old system you call your new API from. Keeping the calls as close as you can to the old database (as shown in the diagrams above) lowers the risk of losing a call due to some error.

Keep in mind that now your new API will start handling requests, so it is better to verify that it can handle the rates of the old service and database.

Another great layer you can add, which lowers the risk even more, is sending the operation to a queueing component with an acknowledgment mechanism. That way you will never lose a call (risk #2) until the new API has processed it, and you can throttle calls if needed.
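
Here is a rough sketch of that sync path, assuming a generic queue client with ack/retry semantics (the kind SQS or RabbitMQ give you). All names and URLs are illustrative:

```typescript
// Rough sketch of the sync path: the old monolith publishes every mutation,
// a small worker forwards it to the new API, and the message is acknowledged
// only after the API succeeds.
type Mutation = {
  op: 'create' | 'update' | 'delete';
  entity: string;
  payload: Record<string, unknown>;
};

interface AckQueue {
  publish(msg: Mutation): Promise<void>;
  consume(handler: (msg: Mutation, ack: () => void, retry: () => void) => Promise<void>): void;
}

// In the old monolith, right next to the existing database write:
async function onOldSystemMutation(queue: AckQueue, mutation: Mutation): Promise<void> {
  await queue.publish(mutation); // as close as possible to the old DB write
}

// The worker that drains the queue and calls the new API.
function startSyncWorker(queue: AckQueue, newApiBaseUrl: string): void {
  queue.consume(async (msg, ack, retry) => {
    const res = await fetch(`${newApiBaseUrl}/${msg.entity}/${msg.op}`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(msg.payload),
    });
    if (res.ok) {
      ack();   // processed by the new API, safe to remove from the queue (risk #2)
    } else {
      retry(); // keep it on the queue; back off or throttle as needed
    }
  });
}
```

The exact queue technology matters less than the acknowledgment semantics: a message should only disappear once the new API has confirmed it.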

Create a job for history data

Good, we now have our new database populated with every new piece of data inserted by real users! Updates and deletes are also synced with the old database.

Now it's time to migrate all the data that was inserted before the operations sync.

We will need a job that can fetch data from the old database and then make an API call to insert it into the new service. Once you have this component, you can start running it to migrate all the past data.

*It is much safer to read from a replica and not directly from the production database.

Depending on the size of your data, running the job can be time-consuming and take minutes, hours, or even days in extreme cases. Also, if you already implemented the queuing mechanism, it is worth using it for this job as well: just push new operations onto the queue you already have.
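
A simplified sketch of such a backfill job, assuming a read-replica client and a publish-only view of the queue from the previous step (the interfaces, batch size, and names are illustrative):

```typescript
// Simplified backfill job: read batches from a replica of the old database
// and push them onto the existing queue, so the new API controls the pace.
interface OldFavoriteRecord {
  id: number;
  userId: number;
  artistId: number;
}

interface ReplicaDb {
  fetchFavoritesAfter(id: number, limit: number): Promise<OldFavoriteRecord[]>;
}

interface QueuePublisher {
  publish(msg: { op: 'create'; entity: string; payload: Record<string, unknown> }): Promise<void>;
}

async function backfillFavorites(replica: ReplicaDb, queue: QueuePublisher): Promise<void> {
  const BATCH_SIZE = 500;
  let cursor = 0;

  while (true) {
    // Read from the replica, never from the production primary.
    const rows = await replica.fetchFavoritesAfter(cursor, BATCH_SIZE);
    if (rows.length === 0) break;

    for (const row of rows) {
      await queue.publish({
        op: 'create',
        entity: 'favoriteArtist',
        payload: { oldId: row.id, userId: row.userId, artistId: row.artistId },
      });
    }
    cursor = rows[rows.length - 1].id; // resumable: remember where we stopped
  }
}
```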

History data migration process

Create a mechanism to prevent duplications

It is also important to create a mechanism that prevents inserting duplications in your new service.

Let's say your migration setup is broken, and your job creates 2 POST calls to the new API for every 1 object from the old database (risk #3).

A way to prevent the new API from creating duplicates is to check whether the object's unique identifier already exists in the database, and insert it only if it doesn't.

If the object is allowed to appear more than once (one-to-many, many-to-many relations), you can save the object's old ID as a property on the new database object. That lets you quickly check whether the object already exists.
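
A minimal sketch of what such an idempotent insert could look like inside the new API, assuming the new record keeps the old system's ID (the repository interface is illustrative):

```typescript
// Illustrative idempotent insert in the new API: the old ID is stored on the
// new record, so replayed messages (risk #3) don't create duplicates.
interface NewFavoriteRecord {
  id: string;
  oldId: number;
  userId: number;
  artistId: number;
}

interface FavoriteRepository {
  findByOldId(oldId: number): Promise<NewFavoriteRecord | null>;
  insert(data: Omit<NewFavoriteRecord, 'id'>): Promise<NewFavoriteRecord>;
}

async function insertFavoriteOnce(
  repo: FavoriteRepository,
  data: { oldId: number; userId: number; artistId: number },
): Promise<NewFavoriteRecord> {
  const existing = await repo.findByOldId(data.oldId);
  if (existing) {
    return existing; // already migrated - a second POST with the same oldId is a no-op
  }
  return repo.insert(data);
}
```

If your database supports it, a unique index or constraint on the old-ID field gives you the same guarantee at the storage level, in case two inserts race each other.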

C. Some more protection

Verification tests

You now have your new database populated and synced with all past and present data.

Before you do the final switch from the old monolith to your new service, consider doing some verification tests. Verification tests can be created in many ways, but there are 2 that can be very useful:

  1. Tests that validate that the databases are synced (prevent risks #1-4).

  2. Tests that validate that the end user will get the same data when you switch to the new service.

The first kind can be run as a job that goes over all the data objects in the old database and checks that the same data exists in, and is returned as expected by, the new service.
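
A sketch of what such a verification job could look like, assuming the new API exposes a lookup by the old ID (the endpoint and interfaces are illustrative):

```typescript
// Sketch of the sync-verification job: walk the old database (ideally a
// replica) and verify each object against the new service.
interface OldStore {
  fetchBatch(afterId: number, limit: number): Promise<Array<{ id: number; userId: number; artistId: number }>>;
}

async function verifySync(oldStore: OldStore, newApiBaseUrl: string): Promise<number[]> {
  const mismatches: number[] = [];
  let cursor = 0;

  while (true) {
    const rows = await oldStore.fetchBatch(cursor, 500);
    if (rows.length === 0) break;

    for (const row of rows) {
      const res = await fetch(`${newApiBaseUrl}/favorites/by-old-id/${row.id}`);
      if (!res.ok) {
        mismatches.push(row.id); // missing in the new service
        continue;
      }
      const found = (await res.json()) as { userId: number; artistId: number };
      if (found.userId !== row.userId || found.artistId !== row.artistId) {
        mismatches.push(row.id); // exists, but with different data
      }
    }
    cursor = rows[rows.length - 1].id;
  }
  return mismatches; // anything here needs investigation before the switch
}
```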

The second kind can be validated statistically with real traffic. For every GET call that fetches data from the old monolith, you can also make a call to the new service and, before or after returning the data to the consumer, send both results to a component that compares them and records whether the outcome is as expected.
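
And a rough sketch of that shadow comparison on real GET traffic. The endpoints and the reporter are made up; the important part is that the comparison never blocks or fails the real response:

```typescript
// Shadow comparison: the consumer still gets the old, trusted data, and the
// comparison against the new service happens on the side.
interface ComparisonReporter {
  record(match: boolean, details: { path: string; diff?: string }): void;
}

async function fetchJson(url: string): Promise<unknown> {
  const res = await fetch(url);
  return res.json();
}

async function getFavorites(userId: number, reporter: ComparisonReporter): Promise<unknown> {
  const oldData = await fetchJson(`https://monolith.internal/favorites/${userId}`);

  // Fire-and-forget: never block or fail the real response because of the check.
  void (async () => {
    try {
      const newData = await fetchJson(`https://favorites-service.internal/favorites/${userId}`);
      // Naive comparison for the sketch; a field-order-insensitive deep-equal is better in practice.
      const match = JSON.stringify(oldData) === JSON.stringify(newData);
      reporter.record(match, {
        path: `/favorites/${userId}`,
        diff: match ? undefined : JSON.stringify({ oldData, newData }),
      });
    } catch {
      reporter.record(false, { path: `/favorites/${userId}`, diff: 'new service call failed' });
    }
  })();

  return oldData;
}
```

In practice, you would probably aggregate these records into a match-rate metric and only do the switch once that rate is consistently high.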

Create a backup mechanism

As a developer, you know anything might happen. That's why it is worth considering a backup mechanism.

What this means is that everything we just built from the old monolith to the new API can also be built in the opposite direction (sync calls from the new service back to the old system, history sync, etc.).

That way, in case the worst happens, we will have a mechanism to roll back to the old known system.

D. Switch and monitor

Connect the consumer to the new service

We are one step away from the final switch. If you did the comparison tests, you already have the GET calls in your consumer. If not, this is the time to implement those calls to the new service, along with switching the mutation calls (create/update/delete) to use the new service.

Something worth considering at this phase is a gradual rollout mechanism, so you can test what happens to users when they are switched to the new service.
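
A minimal sketch of such a gradual rollout, using deterministic bucketing by user ID so the same user always hits the same service (the percentage and URLs are illustrative):

```typescript
// Minimal gradual-rollout sketch: deterministic bucketing by user ID, so the
// same user doesn't bounce between the old and new service on every request.
function isInRollout(userId: number, rolloutPercent: number): boolean {
  return userId % 100 < rolloutPercent;
}

async function getFavoritesForUser(userId: number): Promise<unknown> {
  const rolloutPercent = 10; // e.g. start at 10%, then 25%, 50%, 100%
  const url = isInRollout(userId, rolloutPercent)
    ? `https://favorites-service.internal/favorites/${userId}` // new service
    : `https://monolith.internal/favorites/${userId}`;         // old monolith

  const res = await fetch(url);
  return res.json();
}
```

The exact mechanism (feature-flag service, header-based routing, etc.) matters less than the bucketing being deterministic.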

Rollout 100% & monitoring

That's it, we can now switch all calls to the new service. If you have feature flags related to the rollout, you can now get rid of them.

A new service, possibly built with new technology, can raise problems we might not even think about. That's why it is important to monitor the new service and the consumers using it even more closely during this phase.

Summary

We covered the risks of migrating from one service to a new service and database with zero downtime, and the practices meant to lower those risks.

Migrations can be scary, but using the practices mentioned in this article will lower the potential risks and issues.

Even though I've described all of these practices, full disclosure: we didn't use all of them. Implementing every one of them costs resources, mainly time, which you don't always have.

In that case, analyze your situation and decide which practices you can afford and which would be less valuable.
