In a distributed system, some business transactions may span multiple services. How to implement transactions that span multiple services? — The Saga Pattern
And not just a business transaction, what are some other strategies to keep a distributed system eventually consistent? i.e. when there are multiple systems that store related data across them, what are some common strategies to keeping them consistent? — another such strategy is CDC (Change Data Capture)
Don’t confuse Saga and CDC
Saga is a substitute for a distributed transaction. An example would be: Let’s assume that our business rule (invariant) says that when a user places an order, it will only be fulfilled if the order total is within the user’s credit limit or balance (and deduct the order total from the credit limit). If the credit limit check/deduction fails, the order will not be fulfilled. In this case, the transaction spans over the order service (which manages the order) and the customer service (which manages the customer credit). You can use Saga in this case.
An example for CDC would be: Let’s assume you have a customer service that manages customer data (user-handle, email, name, etc.) and a notification service that sends out emails. The notification service caches some user data so that it does not have to query the customer service every time it has to send out an email to the customer. Now, what if the customer changes their name? That change has to be propagated to the notification service as well. Note that a transaction in this case finishes as soon as the change is committed in the customer service. But since some customer-related data is stored by the notification service as well, we need to make it consistent. You can use CDC in this case.
But there are still some problems we need to tackle in both Saga and CDC:
Guaranteed event/message delivery
Suppose you have a microservice (A) that needs to update it’s own database and publish an event to another microservice (B, using a message broker) as part of a saga step. It will probably do the following:
- Update the concerned aggregate/entity in a transaction
- Create and send an event to the message broker
Now, what if the event publishing to the messaging broker failed (let’s say because of a network issue?)
Retry capabilities
Let’s say you are using CDC. Sometimes, you may want retry capabilities in your application. What if microservice B is not able to consume an event due to a data/code issue? You spot it and quickly do a hotfix. But, you may also want to retry the events that were sent by microservice A which microservice B was not able to consume before the hotfix!
Use the Transactional Inbox/Outbox Pattern along with Saga/CDC
Transactional Inbox/Outbox pattern is a technique used in microservices architecture to ensure reliable and consistent data exchange between different services. It works by using a database table as an intermediary between the service that produces the data and the service that consumes it. The producer service inserts the data into the outbox table as part of the original business transaction (which ensures atomicity of the database update and the event insertion), and then a separate process or component reads the data from the outbox table and publishes it to a message broker or another service. The consumer service can then receive the data from the message broker, and store it in its own inbox table. This way, the data is always available for the consumer service, even if the producer service or the message broker is down or unreachable. The Transactional Inbox/Outbox pattern can help with some common challenges in microservices communication, such as atomicity, durability, scalability, and performance.
i.e. whenever you want to publish an event, you first put it into an outbox table (in the same transaction as whatever generated the event in the first place). Some event emitter polls those and sends them, giving you at-least once delivery (solves the guaranteed delivery problem). If the business trasaction generating the event gets rolled back, the event dosen’t get saved; if the emitter fails to publish, the event still sits in the DB. You can add a status to each event in the DB (PENDING, PUBLISHED). The poller will pick up PENDING events and mark them as PUBLISHED after publishing them to a message broker. Setting the status of an event back to PENDING will allow you to resend the message if needed (solves the retry capability problem). You may be tempted to skip the inbox table but it may come in handy to create idempotent consumers. In case you want to retry an event, your consumers should have the capability to discard duplicate events. If your domain model doesn’t have this capability, the inbox table will be useful here — If the event-id is already present in the inbox table, you can skip this event!
Top comments (0)