Data consistency, outbox pattern and idempotency in a microservice architecture
CAP Theorem
In the late 1990s, the computer scientist Eric Brewer first presented the CAP theorem. It states the "two out of three" concept: a distributed system can provide only two of the following three guarantees:
- Consistency: every request receives the most recent data or an error;
- Availability: every request receives a response, without the guarantee that it contains the most recent data;
- Partition tolerance: the system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
Consider a web app with a network connection between a database and a back-end application, or between different services in a microservice architecture. Since the network can always fail, such an app must be partition tolerant: it has to keep operating even after the network is partitioned. Therefore, when a partition happens, the only decision left is whether to cancel the operation to ensure consistency, or to proceed with the operation, providing availability at the risk of inconsistency.
Let me give you an example to clarify the CAP theorem. Imagine an e-commerce system in which two services are involved in finishing an order: order and catalog. The order service has to check whether the products are available by calling the catalog API before finishing the order. If the catalog API is not available at that moment for any reason (a partition happened), the system can behave in one of two ways:
[Consistency] Choose strong consistency and return an error;
[Availability] Give up strong consistency and tell the client that the request will eventually be processed.
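The two behaviours above can be sketched in a few lines. This is only an illustration in Python, not the .NET code the article refers to; `OrderService`, `check_stock`, `CatalogUnavailable` and the `prefer_availability` flag are hypothetical names made up for the example, and the catalog call is simulated with a boolean.

```python
from dataclasses import dataclass, field


class CatalogUnavailable(Exception):
    """Raised when the (hypothetical) catalog API cannot be reached."""


def check_stock(product_id: str, catalog_up: bool) -> bool:
    # Stand-in for the network call to the catalog API.
    # A real call would also report whether the product is in stock.
    if not catalog_up:
        raise CatalogUnavailable(product_id)
    return True


@dataclass
class OrderService:
    prefer_availability: bool
    pending: list = field(default_factory=list)  # orders to re-check later

    def place_order(self, product_id: str, catalog_up: bool = True) -> str:
        try:
            check_stock(product_id, catalog_up)
            return "confirmed"
        except CatalogUnavailable:
            if self.prefer_availability:
                # [Availability] accept now, reconcile once the catalog is back.
                self.pending.append(product_id)
                return "accepted"
            # [Consistency] refuse rather than risk an inconsistent order.
            raise
```

With `prefer_availability=False` the caller gets an error during a partition; with `prefer_availability=True` the order is accepted and has to be reconciled later, which is the eventual consistency trade-off.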
The pattern discussed in this article is an eventual consistency pattern. In other words, the outbox pattern gives up strong consistency in favor of availability.
Outbox pattern
The outbox pattern makes sense only for distributed systems; discussing it in a monolithic scenario makes little sense. The problem that this pattern solves is: how to reliably/atomically update the database and send messages/events?
The way the pattern solves this problem is relatively easy to understand. Basically, it can be described in a few steps:
- A service that persists data in a database also inserts the messages/events into a table (called the outbox table) as part of the local transaction;
- When a separate outbox table is not an option (for example, in some NoSQL databases), the service instead appends the messages/events to an attribute of the record being updated;
- Another process, called the Message Relay, publishes the events inserted into the database to a message broker;
- If something goes wrong, the Message Relay process retries sending the event until the configured limit is reached;
- The messages/events are stored on the consumer side too.
Edited version from: https://github.com/dotnetcore/CAP
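To make the steps above more concrete, here is a minimal sketch using Python and SQLite. It is not how CAP or any other library implements the pattern; the table layout, the `MAX_ATTEMPTS` limit, and the function names `create_outbox_schema`, `place_order` and `relay_once` are all assumptions chosen for the example, and `publish` stands in for whatever client sends a message to the broker.

```python
import json
import sqlite3
import uuid

MAX_ATTEMPTS = 3  # assumed retry limit for the relay


def create_outbox_schema(conn):
    conn.executescript(
        """
        CREATE TABLE orders (id TEXT PRIMARY KEY, product TEXT NOT NULL);
        CREATE TABLE outbox (
            id TEXT PRIMARY KEY,
            payload TEXT NOT NULL,
            attempts INTEGER NOT NULL DEFAULT 0,
            published INTEGER NOT NULL DEFAULT 0
        );
        """
    )


def place_order(conn, product):
    order_id = str(uuid.uuid4())
    event = {"type": "OrderPlaced", "order_id": order_id, "product": product}
    with conn:  # one local transaction: both rows commit, or neither does
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, product))
        conn.execute(
            "INSERT INTO outbox (id, payload) VALUES (?, ?)",
            (str(uuid.uuid4()), json.dumps(event)),
        )
    return order_id


def relay_once(conn, publish):
    """One polling pass of the message relay; returns how many events were sent."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 AND attempts < ?",
        (MAX_ATTEMPTS,),
    ).fetchall()
    sent = 0
    for message_id, payload in rows:
        try:
            publish(message_id, json.loads(payload))
        except Exception:
            # Publishing failed: count the attempt and retry on a later pass.
            with conn:
                conn.execute(
                    "UPDATE outbox SET attempts = attempts + 1 WHERE id = ?",
                    (message_id,),
                )
            continue
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (message_id,))
        sent += 1
    return sent
```

Note that the relay marks a row as published only after `publish` returns. If the process stops between those two steps, the event is sent again on the next pass, so this sketch gives at-least-once delivery rather than exactly-once.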
So the outbox pattern guarantees that data eventually becomes consistent between the services, but what if an event is consumed twice? This is where idempotency comes in.
Idempotency
In a scenario with an at-least-once delivery broker, a message could be persisted more than once in two different situations:
- The producer produces a message and sends it to the broker; the consumer stores the data in the database but doesn't return an ack in a timely manner. The broker then concludes that the message was not processed and sends it again;
- In the outbox scenario, the producer stores the message in the outbox table and sends it to the broker, but for some reason it isn't able to update the outbox table to record that the message was published. For that reason, it keeps sending the message until the outbox table has been updated.
NOTE: this could be even worse in a multiprocessing scenario, where several processes may pick up the same message concurrently.
To make your consumer idempotent, you could record in the database the ID of every message/event that has been successfully processed. When the consumer receives a new message, it is then able to detect and discard duplicates.
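A minimal sketch of that idea, again in Python with SQLite and not taken from Ziggurat or CAP: the `processed_messages` table, the `stock` table and the function names are hypothetical. The important detail is that the message ID is recorded in the same transaction as the business update, so the two can never get out of sync.

```python
import sqlite3


def create_consumer_schema(conn):
    conn.executescript(
        """
        CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY);
        CREATE TABLE stock (product TEXT PRIMARY KEY, quantity INTEGER NOT NULL);
        """
    )


def handle_order_placed(conn, message_id, product):
    """Returns True if the message was applied, False if it was a duplicate."""
    try:
        with conn:  # dedup record and business update commit together
            # The primary key rejects a message ID that was already recorded.
            conn.execute(
                "INSERT INTO processed_messages VALUES (?)", (message_id,)
            )
            conn.execute(
                "UPDATE stock SET quantity = quantity - 1 WHERE product = ?",
                (product,),
            )
    except sqlite3.IntegrityError:
        # Duplicate delivery: the transaction was rolled back, nothing changed.
        return False
    return True
```

Delivering the same message twice then decrements the stock only once; the second call simply returns `False`. Relying on the primary key rather than a prior `SELECT` also covers the case where two consumer processes receive the same message at the same time. A real handler would probably distinguish the duplicate-key error from other constraint violations, which this sketch does not.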
Conclusion
The outbox pattern is an eventual consistency pattern that favors the system's availability, but it is not a silver bullet. When using it, you should guard against messages being consumed twice, for example by making the consumer idempotent.
There are many libraries in .NET that help you implement the outbox pattern, such as MassTransit, NServiceBus and CAP. Talking about idempotency, a special mention goes to a library from a good friend that runs on top of CAP, called Ziggurat.
If you made it this far and liked the article, let me know by reacting to this post. You can also open a discussion below; I'll try to answer ASAP. In the next article, I'll show you the code with everything you need to build a system using the outbox pattern and idempotency with .NET, CAP and Ziggurat. Hope you like it!
References
CAP Playground, Just playing a bit with CAP and outbox pattern
[PT-BR] JS+, Data consistency, outbox pattern and idempotency in a microservice architecture with .NET; JS+ TechTalks #22 - Edição Lisboa
Richardson, Chris; Pattern: Idempotent Consumer
Richardson, Chris; Pattern: Transactional outbox