Introduction
There are several ways to handle data in a microservice architecture. In this article, we will cover the approach of having a separate database for each microservice.
Prerequisites
This article will build upon the concepts covered in the following video by Houssem Dellai:
Architecture overview
We will use the architecture described in the video above, shown in the following diagram:
Quick overview of the architecture:
- We have an eCommerce website that consists of these two microservices:
- Catalog Service: This microservice is responsible for managing products.
- Basket Service: This microservice is responsible for managing orders.
- Both microservices are behind an API Gateway.
- Each microservice has its own database.
- In order for the Basket service to communicate with the Catalog service, we use a REST API. This is how the Basket service gets the product data from the Catalog service (see the sketch after this list).
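For illustration, here is a minimal sketch in Python of what this coupling looks like from the Basket service's side. The Catalog service URL and the `/products/{id}` endpoint are assumptions made for the example, not part of the demo:

```python
# basket_service.py -- minimal sketch of the tightly coupled approach.
# The Catalog service URL and the /products/{id} endpoint are hypothetical.
import requests

CATALOG_URL = "http://catalog-service:8080"

def get_product(product_id: str) -> dict:
    # Every basket operation that needs product data makes a synchronous
    # HTTP call to the Catalog service. If the Catalog service is down,
    # this call fails and the Basket service cannot process the order.
    response = requests.get(f"{CATALOG_URL}/products/{product_id}", timeout=2)
    response.raise_for_status()
    return response.json()

def add_to_basket(basket: dict, product_id: str, quantity: int) -> dict:
    product = get_product(product_id)
    basket.setdefault("items", []).append({
        "product_id": product_id,
        "name": product["name"],
        "price": product["price"],
        "quantity": quantity,
    })
    return basket
```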
The problem with the above architecture is that the Basket service is tightly coupled with the Catalog service. There are a few drawbacks to this architecture:
- If the Catalog service is down, the Basket service will not be able to get the product data. That means the Basket service will not be able to process orders, so it is effectively down as well.
- When scaling up, the Basket service will start generating load on the Catalog service, meaning the Catalog service will also need to be scaled up.
- Each time the Catalog service is updated, the Basket service will need to be updated as well.
To decouple the microservices, we could use a third microservice called Gateway, responsible for routing the requests to the other microservices. But with this approach, the API Gateway handles all the requests between the microservices, and it could turn into a bottleneck as it takes on a lot of responsibilities.
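To make that concern concrete, here is a minimal sketch of such a gateway in Python using Flask; the routes and downstream service URLs are hypothetical:

```python
# gateway.py -- minimal sketch of routing everything through a central Gateway.
# Service URLs and routes are hypothetical.
from flask import Flask, jsonify
import requests

app = Flask(__name__)

SERVICES = {
    "catalog": "http://catalog-service:8080",
    "basket": "http://basket-service:8080",
}

@app.route("/catalog/products/<product_id>")
def get_product(product_id):
    # Every product lookup, including the ones the Basket service needs,
    # now flows through this single process.
    resp = requests.get(f"{SERVICES['catalog']}/products/{product_id}", timeout=2)
    return jsonify(resp.json()), resp.status_code

@app.route("/basket/<user_id>")
def get_basket(user_id):
    resp = requests.get(f"{SERVICES['basket']}/baskets/{user_id}", timeout=2)
    return jsonify(resp.json()), resp.status_code

if __name__ == "__main__":
    app.run(port=8000)
```

Because every cross-service call funnels through one component, the Gateway's throughput becomes the ceiling for the whole system.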
Let's take a look at another approach to decouple the microservices.
Loosely coupled microservices
As each microservice is independent and owns its own data, we can decouple the microservices by using a materialized view that stores the aggregated data from both of them.
The following diagram shows the architecture with the materialized view:
Quick overview of the architecture above:
- We are adding a new materialized view to the Basket service, which will store the aggregated data of the Basket service and the Catalog service.
- The Catalog service is now extended to send change events to the Basket service.
- The change events could be stored in a service like Apache Kafka or RedPanda.
- The Basket service will now consume the change events and update the materialized view as the data in the Catalog service changes (see the sketch after this list).
- The materialized view will be used as a read-only data source for the Basket service.
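Here is a minimal sketch of what the event-handling side could look like, assuming kafka-python, a hypothetical `catalog-product-changes` topic, and a simplified JSON event shape. A real Basket service would write to its own database rather than an in-memory dict:

```python
# basket_change_consumer.py -- minimal sketch of the event-driven approach.
# The topic name and event fields are hypothetical.
import json
from kafka import KafkaConsumer

# Local "materialized view" of product data owned by the Basket service.
# In a real service this would be a table in the Basket database.
product_view = {}

consumer = KafkaConsumer(
    "catalog-product-changes",
    bootstrap_servers="kafka:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for event in consumer:
    change = event.value
    product_id = change["product_id"]
    if change["op"] == "delete":
        # Product removed from the catalog: drop it from the local view.
        product_view.pop(product_id, None)
    else:
        # Insert or update: keep the local copy of name/price in sync.
        product_view[product_id] = {
            "name": change["name"],
            "price": change["price"],
        }
```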
The downsides of this approach are:
- There will be a duplicate copy of the data in the aggregated view.
- The change events add a lot of complexity to the microservices.
- If the team does not already have experience with event streaming, the additional components and changes mean a steep learning curve.
Using Materialize
Materialize is a streaming database that takes data coming from different sources like Kafka, PostgreSQL, S3 buckets, and more, lets you write views that aggregate/materialize that data, and lets you query those views using pure SQL.
Unlike traditional materialized views, Materialize is designed to maintain the data in a continuous state and keeps the views incrementally updated. This means that if you have a view over constantly changing data, you can query it in real time. A normal materialized view would re-run its full query every time it needs to be refreshed, which can be very expensive.
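As a rough illustration, here is how an incrementally maintained view could be defined in Materialize from Python over psycopg2. The `products` and `basket_items` relations are assumptions for the example; they would need to exist as sources or tables in Materialize already, and how you create the sources depends on your setup (see the Materialize docs):

```python
# create_view.py -- minimal sketch of defining an incrementally maintained
# view in Materialize. Assumes Materialize is reachable on its default port
# and that "products" and "basket_items" already exist as relations.
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=6875, user="materialize", dbname="materialize"
)
conn.autocommit = True

with conn.cursor() as cur:
    # Materialize keeps this view incrementally up to date as new data
    # arrives in the underlying sources; there is no periodic full refresh.
    cur.execute("""
        CREATE MATERIALIZED VIEW basket_with_prices AS
        SELECT b.basket_id,
               b.product_id,
               b.quantity,
               p.name,
               p.price,
               b.quantity * p.price AS line_total
        FROM basket_items AS b
        JOIN products AS p ON p.id = b.product_id
    """)
```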
For a more detailed explanation of Materialize, please see the Materialize Documentation.
You can also take a look at this quick video that explains what Materialize is.
Decoupled Microservices Architecture with Materialize
Thanks to Materialize, we can decouple the microservices architecture and use a live materialized view to store the aggregated data. Materialize will keep the materialized view incrementally updated and will allow us to query the data in real-time with subsecond latency.
This eliminates the need for the event-driven plumbing described above, as well as the need to extend the Catalog service to send change events to the Basket service.
The following diagram shows the architecture with Materialize:
Quick overview of the architecture using Materialize:
- We will again use a materialized view to store the aggregated data of the Basket service and the Catalog service. This time, we will use Materialize to store the data.
- As this provides us with a real-time view of the data, we no longer need the change events: we can simply query the aggregated data in real time (see the sketch after this list).
- That way if the price of a product changes, the Basket service will be aware of the change immediately.
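Here is a minimal sketch of the Basket service reading that aggregated data, assuming the `basket_with_prices` view from the earlier sketch and a local Materialize instance on the default port:

```python
# basket_query.py -- minimal sketch of the Basket service reading the
# aggregated view directly from Materialize. Because Materialize is
# Postgres wire-compatible, a standard Postgres driver (psycopg2 here)
# is all we need. The view name matches the earlier sketch.
import psycopg2

def get_basket_with_prices(basket_id: str) -> list:
    conn = psycopg2.connect(
        host="localhost", port=6875, user="materialize", dbname="materialize"
    )
    try:
        with conn.cursor() as cur:
            # Materialize keeps the view incrementally updated, so this
            # query always reflects the current catalog prices.
            cur.execute(
                "SELECT product_id, name, price, quantity, line_total "
                "FROM basket_with_prices WHERE basket_id = %s",
                (basket_id,),
            )
            columns = [desc[0] for desc in cur.description]
            return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        conn.close()
```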
As Materialize is Postgres wire-compatible, we don't need any special third-party libraries to use it; a standard Postgres driver and plain SQL are enough, so there is virtually no learning curve.
One thing to keep in mind is that, at the time of writing, Materialize does not have persistence. This means that if you restart the service, the materialized views will need to re-aggregate the data from the sources. This feature is on our roadmap and will be available soon.
Demo
To put this into practice, you can take a look at and run the following demo:
Conclusion
Useful links:
Top comments (8)
In a microservice-based application, there can be thousands of services that need to exchange information. There's a concept in Materialize called CDC, Change Data Capture. We can take advantage of the incrementally updated materialized views and propagate the changes to some destination with an event streaming platform such as Kafka. As the database changes, we publish those changes, and the services that need to be aware can subscribe to them. This way, it makes your third illustration even more loosely coupled.
Yep, Materialize also supports CDC:
materialize.com/docs/guides/cdc-po...
You can use it with Kafka, or if Kafka is not part of your stack already, there is a direct Postgres Source available.
You should check it out 🙌