DEV Community

Vladyslav Len
Rethink the way you share the data between micro-services with Change-Data-Capture

How to organize and share data across micro-services is a question every developer or architect starts asking at some point. It is exactly the question I asked myself when I realized that something was wrong with the architecture I had built.

The problem we will be talking about is old and simple, and there are already a few ways to solve it. My goal in this article is to share my experience of dealing with it, and the ways to make your life easier :)
Now that we are finished with the intro, let's dive into the problem.

When you have more than one service in your system, you need to decide how you will share data across it. You have to ask this question because it's almost impossible to build the right micro-service architecture just by following manuals and common practices. In fact, I truly believe there is no such thing as a correct micro-service architecture. It can be more or less shiny, but in general, at some point you will have to break a few guidelines.
So, let's take a look at a few ways to share data across the system.

1. Direct calls to the micro-service

Probably the easiest way to get data from another micro-service. Most developers choose this approach because it's fast, easy to understand, and simple to implement. BUT there is another side to this approach: cascading failure. When one service calls another directly, an outage in the callee becomes an outage in the caller. This is something I faced for a long time, and it is exactly the reason I started looking for another way to share my data and increase the overall availability of the system.
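To make the cascading failure concrete, here is a minimal sketch in Python. Both services are hypothetical and modeled as plain functions (`users_service`, `orders_service`); in a real system the call would go over HTTP or gRPC, but the failure path would look much the same.

```python
class ServiceUnavailable(Exception):
    """Raised when a downstream service cannot be reached."""


def users_service(user_id: int, healthy: bool = True) -> dict:
    # Hypothetical downstream service that owns the user data.
    if not healthy:
        raise ServiceUnavailable("users service is down")
    return {"id": user_id, "name": "Ada"}


def orders_service(user_id: int, users_healthy: bool = True) -> dict:
    # Hypothetical upstream service. It calls users_service synchronously,
    # so any failure there propagates straight to the caller of orders_service.
    user = users_service(user_id, healthy=users_healthy)
    return {"user": user["name"], "orders": [101, 102]}
```

Calling `orders_service(1, users_healthy=False)` raises `ServiceUnavailable` even though the orders service itself is perfectly fine. Retries and circuit breakers can soften this, but the runtime dependency remains.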

2. Event Sourcing architecture

Generally, I don't mind using event sourcing while building micro-services. It's a great way to share data, since each micro-service can store only the data it needs, so there is no data duplication. But it requires developers to write more code to deal with async event handling. Basically, each time you create a new service, you need to write code to integrate it with your event bus. It also requires a bit more debugging when something goes wrong, because it's hard to determine where exactly the bug occurs.
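The integration code mentioned above might look roughly like this. It is an in-process sketch only: `EventBus` and `BillingService` are hypothetical names, and a real setup would use a message broker with asynchronous delivery rather than direct function calls.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List


class EventBus:
    """Toy in-process bus standing in for a real message broker."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Deliver the event to every handler registered for this type.
        for handler in self._handlers[event_type]:
            handler(payload)


class BillingService:
    """Hypothetical consumer that keeps only the fields it needs."""

    def __init__(self, bus: EventBus) -> None:
        self.emails: Dict[int, str] = {}
        # This wiring has to be written again for every new service.
        bus.subscribe("user_created", self.on_user_created)

    def on_user_created(self, event: dict) -> None:
        self.emails[event["id"]] = event["email"]
```

Every new service needs its own subscriptions and handlers like `on_user_created`, which is the extra code and extra debugging surface described above.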

3. Selective/Logical Data Replication

Selective or Logical Data Replication is the approach where developers don't write code to sync the data between services. They simply keep working on the service, querying the data from the database as if it belonged to the service. Consistency and data replication are guaranteed by the infrastructure. This Selective/Logical Replication is possible because of change-data-capture.
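From the service's point of view, this could look like the sketch below. It uses an in-memory SQLite database purely for illustration, and the table names (`users`, `orders`) are assumptions; the `users` table stands in for a replica that the infrastructure keeps in sync.

```python
import sqlite3

# The orders service's own database. The `users` table is assumed to be a
# replica maintained by the infrastructure, not by code in this service.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'Ada');
    INSERT INTO orders VALUES (101, 1, 25.0);
""")


def orders_with_user_names(conn: sqlite3.Connection) -> list:
    # A plain local join: no HTTP call and no event handler involved.
    rows = conn.execute(
        "SELECT o.id, u.name, o.total "
        "FROM orders o JOIN users u ON u.id = o.user_id "
        "ORDER BY o.id"
    )
    return rows.fetchall()
```

The service code contains nothing about where the `users` rows came from, which is the whole point of the approach.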

The idea of change-data-capture is simple.

[Figure: Visualisation of change data capture]

You have a database that is constantly doing something: inserting data, deleting data, and so on. What we do is "subscribe" to the log of these changes and read the stream of events happening in the database. From there we can process it downstream: read the inserted data, transform it, etc.
Plenty of databases support change-data-capture integration. Databases like Postgres and MySQL can export their change log, and you can use different tools to parse it and use the data from it.

Packages and technologies like pglogrepl and Debezium can help you build your own change-data-capture framework/layer within the infrastructure.
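As a rough illustration of what consuming such a change stream involves, here is a small Python sketch that applies change events to an in-memory replica. The event shape (`op`, `before`, `after`) is loosely modeled on Debezium's change event envelope but heavily simplified, and `apply_change` is a hypothetical helper, not part of any library.

```python
def apply_change(replica: dict, event: dict) -> None:
    """Apply one change event to a replica keyed by primary key."""
    op = event["op"]
    if op in ("c", "u"):  # create or update: take the new row state
        row = event["after"]
        replica[row["id"]] = row
    elif op == "d":  # delete: the old row state identifies what to remove
        replica.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unknown operation: {op!r}")


# A hypothetical stream of events, in the order the source database
# committed them.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "Ada"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "Bob"}},
    {"op": "u", "before": {"id": 1, "name": "Ada"}, "after": {"id": 1, "name": "Ada L."}},
    {"op": "d", "before": {"id": 2, "name": "Bob"}, "after": None},
]

users_replica: dict = {}
for event in events:
    apply_change(users_replica, event)
```

Events must be applied in commit order. Applying the update before the insert, for example, would still work here, but a delete applied too early would leave a stale row behind.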

At this point, you have probably guessed how we can use this to implement a better way to share data between micro-services. By using replication, we can build a system in which the services share data but are not coupled and do not impact one another.

[Figure: Decoupling services with replication]

It is worth mentioning that building replication for your micro-services is a complex and time-consuming process. But it has its benefits.

For example, you don't have to write code to get the data from another micro-service, as you do with Event Sourcing, which means you can create new services (and therefore grow) faster. You will also be able to debug inconsistencies more easily and, most importantly, recover the data after an outage by running an initial sync that populates all the services with the data from the source.
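One common way to structure such a recovery is to take a snapshot of the source at a known log position and then replay only the changes recorded after it. The sketch below assumes exactly that; `rebuild_replica`, the integer positions, and the tuple-based change log are all hypothetical simplifications.

```python
def rebuild_replica(snapshot_rows: list, snapshot_position: int, change_log: list) -> dict:
    """Rebuild a replica from a snapshot plus the changes that followed it.

    change_log holds (position, op, row) tuples in commit order, where op is
    "insert", "update" or "delete".
    """
    # Initial sync: start from the full snapshot of the source table.
    replica = {row["id"]: dict(row) for row in snapshot_rows}
    for position, op, row in change_log:
        if position <= snapshot_position:
            continue  # already reflected in the snapshot
        if op == "delete":
            replica.pop(row["id"], None)
        else:
            replica[row["id"]] = dict(row)
    return replica


snapshot = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
log = [
    (9, "insert", {"id": 2, "name": "Bob"}),      # before the snapshot, skipped
    (11, "update", {"id": 1, "name": "Ada L."}),  # after the snapshot, applied
    (12, "delete", {"id": 2}),                    # after the snapshot, applied
]
recovered = rebuild_replica(snapshot, snapshot_position=10, change_log=log)
```

Because the replica can always be rebuilt this way, a service that lost its copy of the data does not need any service-specific recovery code.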

At DataBrew, this is exactly what we are working on. We aim to give developers a simple way to build replication for their services, without any need to build or maintain this complex infrastructure. Please visit our website to see how we can help you tailor your data replication.
