Current State + Last Event as an alternative to Event Sourcing

TLDR: Storing the current state of an entity including a LastEvent property (indicating the last event that happened to the entity) in combination with database change stream technology can be an alternative to an event-sourced entity approach, but of course with its own pros and cons.

Disclaimer: Please regard the content below just as sharing an approach which is most probably nothing new and not necessarily better than other approaches like Event Sourcing.

Introduction

When dealing with persistence, or the "how to store my application data in a database" question, developers usually come across two approaches: the modern and quite hyped Event Sourcing (or append-only log) approach, and the traditional insert/update approach that stores an entity's current state.

On the one hand, with Event Sourcing (ES) we have event-sourced entities stored as a list of events in a database collection/table, each event having a type (e.g. Registered, Relocated) and a few properties. This list of events serves automatically as an audit trail for the entity, or in other words ES has a built-in audit trail.

For example, a CustomerEvents collection/table may contain entries like this:

    {
      "entityId": "a0e90137-4f5e-4071-805a-27bd3e5c5858",
      "id": "0bd1774b-8777-4086-8bc8-50c9627c9952",

      "type": "Registered",

      "createdOn": "2021-01-01T10:00:01.000Z",

      "someProp1": "value1a",
      "someProp2": "value2"
    },
    {
      "entityId": "a0e90137-4f5e-4071-805a-27bd3e5c5858",
      "id": "47223F6A-B9DA-4B6F-9B21-67CFF62F11C9",

      "type": "Relocated",

      "createdOn": "2021-01-02T10:00:01.000Z",

      "someProp1": "value1b",
      "someProp3": "value3"
    }

On the other hand, with the traditional approach the entity's current state (= entity attributes with their current values) is initially inserted and then updated (multiple times), without keeping track of the events which have led to that state (an audit trail is usually an afterthought).

For example, a Customers collection/table may contain entries like this:

    {
      "id": "a0e90137-4f5e-4071-805a-27bd3e5c5858",

      "createdOn": "2021-01-01T10:00:01.000Z",

      "someProp1": "value1b",
      "someProp2": "value2",
      "someProp3": "value3"
    }

To some it may seem that the ES approach complicates the retrieval of the current state of an entity ¹. You have to either fetch all events and replay them in memory to construct an in-memory "current state" of the entity, or you have to implement a Read Model, which is asynchronously fed with the events and thus is eventually consistent with the Write Model (the latter contains only the list of events in an append-only form).
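To make the "fetch all events and replay them in memory" option concrete, here is a minimal sketch of a replay implemented as a fold. The types, fields and sample values (CustomerState, Name, Address, "Jane") are assumptions made for illustration only and are not taken from a real system.

```fsharp
// Hypothetical event and state types, loosely based on the JSON samples above.
type CustomerEvent =
    | Registered of name: string * address: string
    | Relocated of newAddress: string

type CustomerState = { Name: string; Address: string }

// Apply a single event to the state built so far.
let apply (state: CustomerState option) (event: CustomerEvent) : CustomerState option =
    match state, event with
    | None, Registered (name, address) -> Some { Name = name; Address = address }
    | Some s, Relocated newAddress -> Some { s with Address = newAddress }
    | _ -> state // ignore events that do not fit the current state

// Replaying = folding over the ordered events, starting from "no state yet".
let replay (events: CustomerEvent list) : CustomerState option =
    List.fold apply None events

let current = replay [ Registered ("Jane", "Old Street 1"); Relocated "New Street 2" ]
```

Every read of the current state pays for this fold unless a snapshot or a Read Model is introduced, which is the complication discussed here.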

ES is praised for the built-in audit trail (list of events = audit trail entries). However, usually (at least in my personal opinion) developers first take care of storing/updating the current state of an entity, and the audit trail comes second (as a result of these store/update operations). Even though I fully agree that a sequence of events is more understandable to humans, I have the feeling that mutating an entity still sounds more natural to developers in 2021 when it comes to data persistence.

The "Current State + Last Event" Approach

So what if we just inserted/updated an entity in the database, and then used the now widespread database change stream technology (e.g. MongoDB Change Streams, Azure Cosmos DB change feed, DynamoDB Streams), which exposes the changes recorded by the database, to create the list of events (= audit trail) after the fact? Well, for that you would need the event types (e.g. Registered, Relocated) ... and these you could get by storing them in an additional entity field, e.g. LastEvent.

The basic steps of this approach are:

Store the Customer entity, but set an additional property LastEvent = Registered

    {
      "id": "ab607a9e-4662-11ea-b979-eb44b02db7b9",

      "createdOn": "2021-01-01T10:00:01.000Z",

      "someProp1": "value1",
      "someProp2": "value2",
      "someProp3": "value3",

      "lastModifiedOn": "2021-01-01T10:00:01.000Z",

      "lastEvent": {
        "type": "Registered"
      }
    }

Update the Customer entity, but set an additional property LastEvent = Relocated

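    // Placeholder definitions so the sample compiles; the real types are richer.
    type CustomerId = CustomerId of string
    type Address = { Street: string; City: string }

    type CustomerEvent =
        | Registered
        | Relocated

    type Customer = {
        Id: CustomerId
        Address: Address
        LastEvent: CustomerEvent
    }

    module Customer =
        let relocate (customer: Customer) (newAddress: Address) =
            // TODO: some validation here

            { customer with
                Address = newAddress
                LastEvent = CustomerEvent.Relocated // note this
            }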

which results in the following state in the database (once the update has been performed):

```json
{
  "id": "ab607a9e-4662-11ea-b979-eb44b02db7b9",

  "createdOn": "2021-01-01T10:00:01.000Z",

  "someProp1": "value1",
  "someProp2": "value2",
  "someProp3": "value3",

  "lastModifiedOn": "2021-01-02T10:00:01.000Z",

  "lastEvent": {
    "type": "Relocated"
  }
}
```

Subscribe to the database change stream, consume the entity's current state incl. the LastEvent property, and use that LastEvent property for the audit trail entries, or for publishing external integration events between microservices for example.

    open System
    open System.Threading
    open MongoDB.Driver

    async {
        let dbClient = MongoClient("mongodb://localhost:27017")
        let db = dbClient.GetDatabase "TestDB"
        let col = db.GetCollection<Customer>("customers")

        use cancellationTokenSource = new CancellationTokenSource()
        let cancellationToken = cancellationTokenSource.Token

        // UpdateLookup makes update notifications carry the full current document
        let options = ChangeStreamOptions()
        options.FullDocument <- ChangeStreamFullDocumentOption.UpdateLookup

        use! cursor = col.WatchAsync(options, cancellationToken) |> Async.AwaitTask

        do! cursor.ForEachAsync(
                Action<ChangeStreamDocument<Customer>>(fun change ->
                    // process the change.FullDocument ...
                    // access the change.FullDocument.LastEvent ...
                    ()),
                cancellationToken)
            |> Async.AwaitTask
    }

Issues/Challenges with the Event Sourcing Approach

Personally I have always regarded ES as an extremely tempting approach, however it has a few "not so nice features" like:

Developers often find the way domain classes have to be written unnatural (state may only change via ApplyChanges so that replay can work), and it takes time to get used to persisting/replaying events (vs. "let's CRUD this entity").
ES enforces a Read Model (even for the simplest paginated Id+Name list of Customers, for example) plus eventual consistency between the Write and Read Model from the very beginning (unless you read by entity id only, or do some gymnastics around synchronously storing the current state with every event). This causes some hassle with standard validation checks that require strong consistency, like a duplicate user/email check, a negative account balance check and similar (yes, I have read quite a few posts on that from Greg Young, which go towards questioning business requirements and handling duplicates or negative balances later on etc. ... however for simpler systems going into such discussions could be overkill).
There is the topic of how easy or difficult it is to version the event schema ... I have even read academic papers identifying several different options, and honestly, none of them sounded super straightforward. Some, like Adam Dymitruk, may say that we can/must "close the books", i.e. close past periods, create snapshots and go forward from there, however ... again, not a thing I would do without thinking.

Issues/Challenges with the "Current State + Last Event" Approach

The Current State + Last Event approach eliminates the above 3 concerns, but (as usual) comes with its own overhead:

You need this additional LastEvent property added to every entity. We also keep LastModifiedOn and LastModifiedBy, so LastEvent is just another field, but the Domain Model must take care of setting it upon every operation, together with mutating some of the other properties.
You need to utilize a database technology for subscribing to collection/table changes and materializing the events from them.
When creating the audit trail you have to diff the previous entity state with the current state (a generic implementation is possible, but it is still an overhead).

Real-world Application of the "Current State + Last Event" Approach

So how do we use this approach in a real-world CQRS-based microservice architecture? Here I try to illustrate this with an example.

Diagram 1: Customer Service and its "physical services", with Command Handling and Query Handling sharing the same data model

Customer Service (logical service, responsible for registration of new customers, handling changes to their data), consisting of several physical microservices (each one a separate process or pod in Kubernetes for example):

CustomerService.CommandHandling - stores Customer entities in a customers collection/table in the database (e.g. MongoDB). Each entity has a LastEvent property.
CustomerService.QueryHandling - responsible for satisfying queries; by default these run against the original customers collection. Once we want to separate write from read concerns we redirect the queries to a Read Model (e.g. a column store database like Azure Data Explorer - see Diagram 2)
CustomerService.EventPublishing - responsible for listening to the database change stream, and publishing customer integration events on a message bus (e.g. Azure Event Hubs)
CustomerService.Auditing - responsible for listening to the database change stream, and storing audit trail entries in a CustomerAuditTrailEntries collection for example, diffing the current state received from the change stream with the last stored state, and calculating the delta. Note that Auditing is completely optional and can be added afterwards.
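As a rough sketch of what EventPublishing could do with each document received from the change stream, the function below maps the LastEvent of a customer to an integration event. The IntegrationEvent shape and the topic names are hypothetical, the Customer type is simplified to plain strings, and the actual publishing to a message bus is left out.

```fsharp
type CustomerEvent =
    | Registered
    | Relocated

// Simplified entity; only the fields needed for this sketch.
type Customer = { Id: string; LastEvent: CustomerEvent }

// Hypothetical integration event shape; not part of the original design.
type IntegrationEvent = { Topic: string; CustomerId: string }

// Pure mapping from the entity's current state to the event to publish.
let toIntegrationEvent (customer: Customer) : IntegrationEvent =
    let topic =
        match customer.LastEvent with
        | Registered -> "customer.registered"
        | Relocated -> "customer.relocated"
    { Topic = topic; CustomerId = customer.Id }

let published = toIntegrationEvent { Id = "c1"; LastEvent = Relocated }
```

Keeping the mapping pure means it can be unit tested without a database or a message bus; the change stream handler only has to call it and forward the result.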

Diagram 2: Customer Service and its "physical services", with Query Handling using a separate Read Model

How does Auditing calculate the diff and create an AuditTrailEntry? Here you have the possibility to keep the previous state of the entity and compare it in a generic way to the current state of the entity (both representations can be in JSON, or in our case BSON). The lastEvent.type is then copied over to the AuditTrailEntry, together with the list of changes between the previous and the current state of the entity.
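A minimal sketch of such a generic diff is shown below. It assumes that both states have already been flattened into a field name -> serialized value map (a real implementation would walk the JSON/BSON tree instead), and the Change and AuditTrailEntry shapes are illustrative assumptions.

```fsharp
// A flattened document: field name -> serialized value (an assumption made for brevity).
type Document = Map<string, string>

type Change = { Field: string; Before: string option; After: string option }

type AuditTrailEntry = { EventType: string; Changes: Change list }

// Collect every field whose value differs between the two states,
// including fields that were added or removed.
let diff (previous: Document) (current: Document) : Change list =
    let fieldsOf (doc: Document) = doc |> Map.toSeq |> Seq.map fst |> Set.ofSeq
    Set.union (fieldsOf previous) (fieldsOf current)
    |> Set.toList
    |> List.choose (fun field ->
        let before = Map.tryFind field previous
        let after = Map.tryFind field current
        if before = after then None
        else Some { Field = field; Before = before; After = after })

// The event type comes from lastEvent.type of the current state.
let toAuditTrailEntry (eventType: string) (previous: Document) (current: Document) =
    { EventType = eventType; Changes = diff previous current }

let previousState = Map.ofList [ "address", "Old Street 1"; "name", "Jane" ]
let currentState = Map.ofList [ "address", "New Street 2"; "name", "Jane" ]
let entry = toAuditTrailEntry "Relocated" previousState currentState
```

In this sketch the lastEvent field itself is not part of the flattened documents; otherwise it would show up as a change in nearly every entry.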

Open Topics

An open question I am still contemplating is whether a single LastEvent is enough, or if it should be replaced with an array of LastEvents ... So far we have managed to handle all requirements with a single LastEvent ... but on one or two occasions I thought that I could have had it easier with multiple finer-grained events ... The current approach is 1 command + current state => new state incl. 1 last event, but feel free to educate me in the comments why this won't work long-term ;)
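For comparison, the array variant could look roughly like the sketch below, where one command yields zero or more finer-grained events. The updateProfile command, the Email field and the ContactDetailsChanged event are hypothetical and only serve to illustrate the idea.

```fsharp
type CustomerEvent =
    | Relocated
    | ContactDetailsChanged // hypothetical finer-grained event

type Customer = { Address: string; Email: string; LastEvents: CustomerEvent list }

// One command, possibly several events: record only what actually changed.
let updateProfile (customer: Customer) (newAddress: string) (newEmail: string) : Customer =
    let events =
        [ if customer.Address <> newAddress then yield Relocated
          if customer.Email <> newEmail then yield ContactDetailsChanged ]
    { customer with Address = newAddress; Email = newEmail; LastEvents = events }

let before = { Address = "Old Street 1"; Email = "jane@example.com"; LastEvents = [] }
let after = updateProfile before "New Street 2" "jane@example.com"
```

Consumers of the change stream would then have to handle a list per change, which is part of the trade-off to weigh against the simplicity of a single LastEvent.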

Conclusion

In conclusion, the traditional "current entity state" approach combined with "last event" and database Change Stream technology seems to be quite useful: it allows for staying simple at the beginning while still being able to add Read Models or an Audit Trail ² later on, whenever these are really needed. IMHO topics like Eventual Consistency and Event Schema Versioning are not to be underestimated, and may make simple things a bit more complicated, and changes slightly slower, if not well mastered.

¹ Yes, we can discuss if the current state of the entity is really needed for processing a command (some like Adam Dymitruk for example are mentioning that aggregates are not even needed anymore), however that is a different discussion.
² Audit Trail is usually considered a consequence of changes to (so comes in 2nd place) and not the source of the current state of an entity, so the LastEvent approach aligns more naturally to that.