loading...
The Agile Monkeys

Saga Patterns by Example

rdoria1 profile image rdoria1 ・11 min read

Moving from Monolithic application to microservices

A very common scenario nowadays is migrating Monolith applications to a new Microservice Architecture.

We are not going to focus on explaining these architectures in this article. Instead, we are focusing on a very specific issue when it comes to dealing with Microservices: Data Consistency.

With MicroServices we are usually going to follow a One Database per service pattern. Imagine business transactions spanning across multiple services, meaning one transaction completion or failure will impact subsequent transactions. We need to maintain data consistency.

We can’t use ACID local transactions, we need to solve this issue differently.

Saga

We’ll need Saga: Sequence of local transactions.
Each local transaction will

  1. Trigger the associated action (This will generally be a DB update, but can also be something like sending an email to a customer etc..)
  2. Publish a message/event to trigger the next local transaction

If the local transaction fails, there will be a series of compensating transactions to undo preceding local transactions. Undoing things doesn’t necessarily mean deleting data in case the action is a DB transaction, but rather updating some state to FAILED.

Enough theory, let’s get to a specific business which we will use to explore the specifics of what we described and to understand Saga.

Enrollment Credit Card

Alt Text

Let’s imagine a Bank has an Enrollment to New Credit Card process. Whenever a customer wants to ask for a new Credit Card, they will enter a bunch of data through some Forms.

  • This data will get to a Card Service that will create the data needed to create Digital Credit Card on the fly (in the meantime the actual physical card gets processed and sent)

  • For this purpose it will first check the provided customer’s data is accurate and can be validated through the Identity Service (maybe this service will use a third party service like Equifax to check the customer’s identity)

  • Then, the Calculation Service will determine the credit limit for the new credit card based on internal calculations and other factors

  • Once everything is successfully completed the Card Service can finally return all the data for the new Digital Credit Card to the customer that was computed in the beginning.

Choreography Pattern

Saga comes into 2 different flavors. First, we’ll explore the simplest and most direct one: The Choreography pattern.

Use Case

The best way to describe this is to show the diagrams of the sequences of transactions happening just after the customer has entered all the required data in the form.

Alt Text

The input data will get to the Card Service that will create a new record in the Card DB. Note we have a status value, which will play a key role here. For now it’s set with VERIFYING value.

Alt Text

Once the record is successfully persisted, a new event will be streamed by Card Service and consumed by Identity Service.

Alt Text

Identity service will consume the event, then it will execute its internal logic and persist a new record in the Identity DB. The isVerified attribute will be a boolean stating if the identity was indeed verified or not.

Alt Text

Once the record is successfully persisted, a new event will be streamed and will be consumed by Card Service and Calculation Service.

Alt Text

The event consumed by Card Service will actually trigger an update of the record persisted in the first step. The status value will be updated to IDENTITY_VERIFIED. We are notifying the main Service that the identity was verified.

Alt Text

The event will also be consumed by Calculation Service. This one will execute some internal logic so that the goal is to calculate the amount Limit for the potential new Credit Card and persist this new data to a new record in the Amount DB.

Alt Text

Once the record is successfully persisted a new event with the amount limit data will be streamed to be consumed by the Card Service.

Alt Text

Finally, once the event is consumed by the Card Service we will again update the initial record with its final state. The amountLimit value will be updated and the status will change to VERIFIED. This status value means the new Credit Card Data is ready to be returned to the customer so they can start using their new Digital Credit Card.

Implementation details

Alt Text

Let’s imagine how we would implement this in the cloud, more specifically with Azure. We are showing just a small part of the implementation as we can extrapolate this to the other services.

As we can see from the diagram above, the services are implemented as Spring boot services. Whenever a new request gets to the Card Service, a new record is persisted in a Cosmos DB Card container.

The interesting part of this architecture comes when integrating Change Feed. We won’t get into too many details on how it works (the official documentation is great) but in a nutshell:

  • Enabled by default
  • It exposes the Cosmos DB logs to the end user so you can consume it and integrate it with other resources to be notified about any new insertion or update.
  • Sorted: Comes in the order of modification
  • Can be integrated with a number of resources

The easiest way to consume the change feed is the integration with Azure Functions, which also comes in handy with our purpose.

Once a new record in the Card Container is persisted, the Change feed will trigger a new Azure function with an event containing the new record. The azure function will receive this event and will do the following:

  • Deserialize the content so we get the customerId
  • Find the record associated with the given “customerId” in the Input Container. Actually, the Change Feed event received will only contain the data persisted in the Card Container. This is not enough for the next step which is the Identity Service. We will need identity data that was part of the original request. In this case we are fetching this data by directly reading a given Input container, but it might be something else such as an HTTP request from another Service.
  • HTTP request to the Identity Service
  • Wait for the Response and depending on the Response Code and Body, stream a new event through Event Hub to be consumed by the Card Service which will transition the Card Container record from VERIFYING to IDENTITY_VERIFIED.

The implementation of the Azure function that consumes the change feed is really simple. Here is a skeleton of this function implemented in Java.

package com.function;

import com.azure.cosmos.*;
import com.azure.cosmos.models.CosmosItemResponse;
import com.azure.cosmos.models.PartitionKey;
import com.microsoft.azure.functions.annotation.*;
import com.microsoft.azure.functions.*;
import com.model.Input;
import org.json.JSONObject;

import java.util.Collections;

/**
 * Azure Function with Cosmo DB Card container Trigger from Change Feed.
 */
public class CardChangeFeedTrigger {

    private CosmosContainer container;

    private static final String databaseId = "Enrollments";
    private static final String containerId = "Input";

    @FunctionName("CardChangeFeedTrigger")
    public void run(
        @CosmosDBTrigger(name = "items",
        databaseName = "Enrollments", collectionName = "Card",
        createLeaseCollectionIfNotExists = true,
        connectionStringSetting = "AzureCosmosDBConnection") String[] items,
        final ExecutionContext context) {

        setupCosmoDB();

        for (String item : items) {            

            // Read customerId from Change Feed.
            JSONObject jsonObject = new JSONObject(item);
            String customerId = (String)jsonObject.get("customerId");


            // Read Input from Cosmo Container
            CosmosItemResponse<Input> input = container.readItem(customerId, new PartitionKey(customerId), Input.class);
            context.getLogger().info("New item received " + input.getItem());

            // Now we do an HTTP CALL to the Identity Service with data customerID and data from Input Model

            // Based on the Answer 
            // 1. "IDENTITY_VERIFIED" event back to CardService if HTTP call succeeded.
            // 1. "IDENTITY_FAILED" event back to CardService if HTTP call failed.
            // 2. "IDENTITY_NOT_VERIFIED" event back to CardService if Identity not verified.
        }

    }

    private void setupCosmoDB() {
        CosmosClient client = new CosmosClientBuilder()
                .endpoint(System.getenv("COSMO_DB_HOST"))
                .key(System.getenv("COSMO_DB_MASTER_KEY"))
                .preferredRegions(Collections.singletonList(System.getenv("REGION")))
                .consistencyLevel(ConsistencyLevel.EVENTUAL)
                .buildClient();
        CosmosDatabase database = client.getDatabase(databaseId);
        container = database.getContainer(containerId);
    }
}

Advantages

  • Easiest way to implement SAGA
  • Each transaction is independent
  • It’s ideal when not dealing with many local transactions (e.g. no more than 4)

Disadvantages

  • It is difficult to track and maintain this logic if several transactions and services are involved. In the implementation example we showed above let’s analyze how this would work:

  • We introduce a new Azure Function to consume the change feed. In terms of ownership it could be tricky to define who should “own” this new function. Does it fall under the Card Service? Or under the Identity Service?

  • Also, as we saw before, the event sent through Event Hub needs to be consumed by the Card Service meaning we’ll need to add a new logic in Card Service so it becomes a consumer of this message. Imagine having this same pattern with multiple services, N services would need to send events to Card Service, it would become difficult to maintain at some point.

Orchestration Pattern

The second approach of SAGA comes with an Orchestrator. In this pattern, the coordination of all the transactions is fully abstracted into a separate service, an Orchestrator. We’ll use the same example as the one for Choreography pattern but we are adding a new Service called Verification Service.

Implementation details

Alt Text

This Verification Service will actually internally check if the customer is eligible for a new Card and check for any outstanding loan or any potential red Flag.

We added this service so we have parallels and sequential operations:

  • Once a new request is generated it will get to Card Service
  • Card Service will trigger the Orchestrator which will be in charge of coordinating the rest of the transactions
  • We have 2 verifications that need to happen and they will both be triggered in parallel, the Verification and the Identity Service.
  • Once both are successfully completed, the orchestrator will trigger the Calculation Service to compute the amountLimit.

We’ll use durable functions to implement the Orchestrator.

Durable functions

The documentation on Durable functions is very useful, but in a nutshell:

  • Extension of Azure functions
  • Serverless State machines - used for Long-running orchestrations
  • It simplifies complex transactions and coordination
  • It maintains a local state
  • It’s code only (No json schemas or config files like other orchestration systems)
  • Currently available in Python, JavaScript and C#

The following image describes the structure of the project to create an orchestrator. We implemented the project in Javascript.

Alt Text

Under the same project (that we usually create with VSCode) we’ll have 3 different kinds of Azure Functions to create a whole Orchestrator workflow. Each function has its own folder.

Orchestrator Function

The EnrollmentCreditCardOrchestrator folder will have the code of the orchestrator function itself. As seen it only contains 2 files:

  • function.json
  • index.js

The json file is a config file and as seen it’s a very small file specifying the function defined in the index.js file is an Orchestration Function (for further details, please check the documentation which is great).

The Orchestration function looks like the following:

const df = require("durable-functions");

module.exports = df.orchestrator(function* (context) {
    const request = context.df.getInput();
    context.log("Received New enrollment request. person: " + (request.person ? JSON.stringify(request.person) : undefined) + 
    " || card: " + (request.card ? JSON.stringify(request.card) : undefined) + ".");

    // Verify the request has all the mandatory fields
    verifyRequest(request);

    const tasks = [];

    const verificationTask = context.df.callActivity("VerificationActivity", request.person)
    tasks.push(verificationTask);

    const identityTask = context.df.callActivity("IdentityActivity", request.person);
    tasks.push(identityTask);

    // Wait for verificationTask and identityTask to finish
    yield context.df.Task.all(tasks);

    context.log("Verification Task " + JSON.stringify(verificationTask.result));
    context.log("Identity Task " + JSON.stringify(identityTask.result));

    const calculationRequest = {
        "creditData": verificationTask.result,
        "identity": identityTask.result 
    }

    const calculationCreditCard = yield context.df.callActivity("CalculationActivity", calculationRequest);

    return calculationCreditCard;
});

function verifyRequest(request) {
    if (!request) {
        throw new Error("A request object is required.");
    }
    if (!request.person.id) {
        throw new Error("Gobernment ID is required.");
    }
    if (!request.person.firstName) {
        throw new Error("A first name is required.");
    }
    if (!request.person.lastName) {
        throw new Error("A last name is required.");
    }
    if (!request.person.workType) {
        throw new Error("A type of work is required.");
    }
    if (!request.person.workPlace) {
        throw new Error("A work place is required.");
    }
    if (!request.person.salary) {
        throw new Error("A salary is required.");
    }
    if (!request.card.type) {
        throw new Error("A Card type is required.");
    }
    if (!request.card.brand) {
        throw new Error("A Cart brand is required.");
    }
}

The way we call the actual services is through Activities and more precisely through the context.df.callActitity function which is another kind of function we’ll define later on.

The context object is the one containing the request when triggering the orchestrator.

“VerificationActivity” and “IdentityActivity” are triggered in parallel, whereas “CalculationActivity” sequentially. How do we do this?

  • Verification and Identity activities are called without the yield word. This means they run in parallel.
  • Line 18 specifies we want to wait for both activities to be completed before moving forward
  • Once both successfully return with their respective payload, we actually call the Calculation activity synchronously, we get the result from it and then return the final result calculationCreditCard to the user who triggered the Orchestration.

Activity Function

As seen in the project structure screenshot, each activity function will have its own folder. Each folder will have its config file specifying the function in index.js is an activity function.

The index.js will look something like the following:

const axios = require("axios");
const identityURL = process.env["IDENTITY_SERVICE_URL"]
const CircularJSON = require('circular-json');

module.exports = async function (context, request) {
    try {
        const result = await axios.post(identityURL, request)
        context.log("body returned..." + CircularJSON.stringify(result))

        return result.data;
    } catch (err) {
        context.log(`Identity Service encountered an error: ${err}`);
        throw new Error(err);
    }
}

In this specific case we imagined the services are already implemented somewhere else so the only goal of the activity function would be to use some HTTP library (like axios in this case) to make the HTTP request to them. In the practice if we are dealing with a new brand service you can also implement the actual logic of this service in the Activity Function.

The activity functions and orchestration functions live in different VMs.

HTTP Trigger Function

This is the last kind of function we need, the only purpose of this function is to actually expose the orchestrator as an HTTP endpoint so it can be called from any client easily.

const df = require("durable-functions");

module.exports = async function (context, req) {
    const client = df.getClient(context);
    const instanceId = await client.startNew(req.params.functionName, undefined, req.body);

    context.log(`Started orchestration with ID = '${instanceId}'.`);

    return client.createCheckStatusResponse(context.bindingData.req, instanceId);
};

As a result, when triggering the orchestrator we’ll get a bunch of URLs as a result (Check documentation)

When calling the statusQueryGetUri we will get the current status of the orchestrator

Alt Text

Advantages

  • Easy to maintain the whole flow in a single place
  • Avoid cyclic dependencies, again everything happens in a single place.
  • If designing a smart orchestrator, the complexity of adding new transactions remains linear.

Disadvantages

  • New Service to maintain: In distributed teams pattern, adding this new service can be tricky to decide which team should own it. It contains logic and domains from all the services.
  • It’s also challenging to know a little bit of the logic of all the services, or at the least clearly understanding the return payloads from services and understanding how to react if something fails and how it affects all the services.

Discussion

pic
Editor guide
Collapse
javier_toledo profile image
Javier Toledo

This article is 💯