Event Sourcing is everywhere. Almost every architectural problem in an IT system can be solved using Event Sourcing. Is that true? Of course not! Still, the pattern is proving to be very useful in a number of situations. So when should you consider using event sourcing? In which scenarios does this pattern work best?
There are a couple of great articles on the basics of event sourcing, such as the introduction on MSDN, another one by Martin Fowler, or the documentation of Akka Persistence and EventStore. Event sourcing is a rich pattern, and as you'll see, each article focuses on a different use-case.
In this article, we'll first briefly introduce Event Sourcing, and then take a look at three possible use-cases which might prompt you to adopt Event Sourcing in your system.
An event is a fact which describes a state change that occurred to some entity. Notice the past tense! An event is something that has already happened. As such, it cannot be modified — events, just like facts, are immutable.
In a system which is based on event sourcing, the primary source of truth is the stream of events. All state is derived from this stream. This includes the "current" state of entities, which is often the basis on which the system's business logic operates.
A good introduction connecting the concepts of event sourcing, stream processing, CEP and reactive can be found in Martin Kleppmann's article.
For any entity, its current state can be derived by folding — or in other words, applying — all events relating to this entity, starting with some initial "empty" state.
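As a minimal sketch of this idea (the event and state types below are purely illustrative, not from any particular library), deriving the current state of an account entity is just a left fold of its events over an initial "empty" state:

```python
from dataclasses import dataclass
from functools import reduce

# Hypothetical events for an "account" entity -- the names are illustrative only.
@dataclass(frozen=True)
class Deposited:
    amount: int

@dataclass(frozen=True)
class Withdrawn:
    amount: int

def apply(balance: int, event) -> int:
    """One state transition: f(state, event) => state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    return balance  # unknown events leave the state unchanged

def current_state(events, initial: int = 0) -> int:
    """The current state is a fold of all events over the initial state."""
    return reduce(apply, events, initial)

events = [Deposited(100), Withdrawn(30), Deposited(5)]
print(current_state(events))  # 75
```

Note that `apply` is a pure function: the same event history always yields the same state, which is exactly what makes replaying events reliable.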
Implementing event sourcing might make sense globally in a system, but quite probably you'll want to use it selectively. Event sourcing can peacefully co-exist with other methods of handling state in a system; it's not a binary, all-or-nothing choice.
Performance might be the most-cited and well-publicised use-case for Event Sourcing. While it is true that this pattern might help in designing a high-performance system, nothing comes for free. There are still challenges that lie ahead, and tradeoffs to be made!
This might be exemplified by the fact that there are "event-sourcing rescue projects" as well:
Jimmy Bogard, 23 Jan 2020: "every Event Sourcing rescue project i've been on was also promised these things and found them not to be true (for their scenario)"
Remember, though, that "rescue projects" usually have a couple of problems, and so give a skewed picture of a technology. Reality is not that harsh!
In a system under heavy load, event sourcing can be useful thanks to two of its characteristics: separating read and write models, and natural sharding. Let's look at both of these.
Not only is each system different, but individual functionalities/endpoints within a system might see radically different usage characteristics. In a system which receives a lot of traffic, you might see a lot of writes compared to reads; or relatively few writes, but a lot of reads; or one component of the system being used much more often than others.
To properly deal with performance requirements, while maintaining sane operational costs, we might need to scale the components of the system independently. Again, depending on the traffic pattern, we might want to scale only the write-part. Or, we might want to scale only reads. Finally, we might want to scale only a small fraction of the overall read functionality that the system enables.
We might also want to scale both reads & writes. The good thing is that with event sourcing, we can again do so independently, using different approaches. It's this freedom in deciding what to scale, and how, that makes event sourcing so useful for implementing performance-related requirements.
How might this look in practice? As our primary source of truth is the stream of events, a write operation can be made as simple as appending an event to a log-like database. This might be Kafka, EventStore, Cassandra, or even SQL with a good archiving and sharding strategy: each technology has its pros and cons. An append is typically a very fast operation, compared to e.g. the typical usage of a relational database, which has to deal with locks, transactions, read and write clients etc. Hence, the system might have good overall performance.
For a more in-depth view on why writes in a database often used for event sourcing — Apache Cassandra — are so fast, take a look at Andrzej Ludwikowski's article "Cassandra writes in depth".
To implement and scale reads, we can create various views of the event stream; these are also called read models or projections. Some views might store data in a relational database, so that it can be accessed using SQL. Other views might analyse the data and feed it to Spark, Hadoop, Kafka or any other data analytics framework. Finally, read models which will be under heavy load might be created in multiple copies (to balance traffic), using either in-memory or NoSQL storage.
One formalisation of the above architecture is CQRS (Command Query Responsibility Segregation; see also the much more comprehensive e-book by Greg Young). This pattern is often used in combination with event sourcing, to implement high-performance systems. The reactive movement is also closely related, as event sourcing might directly implement the message-passing paradigm, and is one way of building responsive, resilient and elastic systems.
Where's the catch, then? Well, there are a couple of catches, in fact. First of all, in a distributed setting, appending data to a log isn't that easy: you need to make your log distributed. Again, Kafka/Cassandra/EventStore make this possible; however, whenever you start dealing with distributed data, you're introducing new operational and implementation complexity.
Second, you might encounter eventual consistency. Data that you write doesn't have to be immediately available when read, as it first needs to be written to the read models. A write is considered successful when it's appended to the log, not when all read models get updated. This flexibility in implementing the data views also brings its challenges, ranging from user experience, through request deduplication, to ensuring data consistency on the business logic level.
A related technique, which can be used in systems with a high data volume or a large number of entities whose state has to be stored, is the combination of event sourcing and the actor model. This approach is the basis of e.g. Akka Persistence.
Each actor in such a system stores data for a single entity. The data is stored in-memory. Any updates to the actor's state arrive as events. These events are persisted in a database (which might also be distributed, e.g. Cassandra), and then the actor updates its internal state based on the data in the event.
Hence, the actor's state can be entirely reconstructed from the events relating to that single entity. The longer the event history, the more costly this might be; that's why it's usually desirable to also store snapshots of the actor's state, for fast recovery and migration of actor/entity state.
What makes this architecture fast? First of all, all reads are done from memory (assuming the actor for a particular entity is materialised). Second, all writes are simple append operations. Thirdly, it's trivial to shard the data in the system, taking the entity id as the shard key.
That is, the only data routing requirement is that all events for a single entity should be directed to a single node in the cluster. We thus need a coordinator which knows which actors/entities are assigned to which nodes, and that's it! These coordinators can be clustered as well, to remove a single point of failure. That way, we can implement a data storage and processing system which scales horizontally — just by adding more computing power. You don't always need such a setup, but when you do, event sourcing might be a good choice!
The implications of an event-first approach to designing data storage systems have been studied in depth by Martin Kleppmann. A good reference in that area is his book, "Designing Data-Intensive Applications".
In a traditional database, the internal transactional (e.g. write-ahead) logs are already based on events. However, what a relational database gives us is a projection of these events, so that it's possible to conveniently work with the "current" state of the entities using SQL.
Martin's core idea of "turning the database inside-out" is to work with this log of events directly; this way we can scale writes to the log, and create projections as necessary. While this approach is more scalable and more performant, it also removes all of the goodness that relational databases give us: SQL and ACID transactions. Also, it's not that new, as Martin himself notes:
Neil Conway (@neil_conway): "Reflecting on the fact that I've built some flavor of the 'immutable, append-only log + materialized views' pattern into every non-trivial software project I've built since 2006."

Martin Kleppmann (@martinkl), 11 May 2018, in response: "Such a powerful idea; keeps getting reinvented under different names (event sourcing; lambda/kappa architecture; database inside-out/unbundled; state machine replication; etc)."
This "inside-out" approach might also be a good candidate for an asynchronous, inter-service communication pattern. By making the events (and their schema) public, each microservice can arbitrarily use the data emitted by others.
We get very loose coupling, without the need for an upfront design of e.g. an HTTP API, with pre-determined data projections that are made available. Opinions differ, however. As Oliver Libutzki notes in his article, while Domain Events (more on this later) can be used for communication between services/bounded contexts, event sourcing should be kept as a service-local implementation detail.
When looking from a data modelling perspective, event sourcing shows other benefits. Designing a system where the stream of events is the central source of truth forces you to think differently about the flow of the data through the application.
In CRUD applications based on relational databases, various forms of "events" are often added as an after-thought. As we implement our applications, we discover that otherwise separate components must be notified of a change that took place somewhere else in the system. Or, we need to update an external system to keep it in sync with our data, but for performance reasons we do it in the background. If you've worked on business applications before, the above probably sounds familiar!
Event sourcing reverses the relationship between data and events. Events are no longer secondary, derived from the changing reference data. It's the events that drive all of the business logic, and it's the read models that are derived from the event stream. Greg Young explains it in a somewhat simplified form:
Greg Young, 17 Mar 2013: "want to learn event sourcing? f(state, event) => state"
The consequences of this change are much more far-reaching than data modelling alone. The whole structure of an application written using event sourcing might be different, ranging from code organisation, through defining the boundaries of modules, to extracting whole microservices which might process a data stream independently.
Think of a simple shopping system where you can buy socks. In a traditional system, you would have a /buy/socks API endpoint. Upon invocation, this would update the SOCKS inventory table, perform an insert into the orders table, send an email and make another API call to the fulfillment service. Each modification of this process, however trivial, involves changing this core logic.

With event sourcing, we start thinking a bit differently. We no longer have an incoming request that arrives at our system and database tables to be updated. Instead, we are dealing with a UserBoughtSocks event (or with a command that creates the event).
It's something that already happened (the user clicked the big green "Buy" button), and now we must deal with it. We might have synchronous event listeners which verify that we actually have the socks in stock. We might have asynchronous event listeners which send emails, create further events for the fulfillment service, or aggregate metrics.
If we start thinking in terms of immutable events that have already happened, instead of in terms of requests and mutable database tables, we'll soon discover that our code takes a much different shape. We'll see that individual functionalities become more loosely coupled, as they are driven by incoming events, not by a central request controller. We might also see workflows start to emerge, probably somehow mirroring the business process that is happening behind the scenes.
Event sourcing gives us a new degree of flexibility, thanks to the separation of action from effect (see this interview with Martin Kleppmann). We have the flexibility of reacting to events happening in other parts of the system; we also have the flexibility of creating multiple read models.
We are no longer constrained by a one-size-fits-all data model. We can create many redundant read models, duplicating data as needed (the event stream is always the primary source of truth anyway), aggregating and indexing data as required by a particular functionality. Here, event sourcing is truly liberating.
Moreover, existing read models can be recreated and new read models can be created retrospectively, taking into account all the data that is available in the event stream — taking advantage of full information. However, make sure to test such scenarios as well!
Tomasz Jaskuλa (@tjaskula): "The hard part is to figure out what actions that user took on those false event streams are actually relevant and how to keep them in rewriting process."

Udi Dahan (@udidahan), 21 Aug 2019, in reply: "This has been one of my main complaints around the 'just replay the event stream' solution presented to many things in the event-sourcing community."
No discussion of data modelling can be complete without a mention of DDD (Domain-Driven Design). And indeed, event sourcing fits DDD like a glove. One of the central concepts of DDD is the aggregate: a defined set of data which captures an important business concept.
When modelling a system, one natural way is to define the aggregates together with the aggregate events. Such an event captures a single step of a business process, or one transition in a workflow, or part of the aggregate's lifecycle.
Defining the (many) aggregates is a discovery process, necessarily involving the business! This is done together with sketching the interactions that occur between them through events; following this methodology, we might soon discover the bounded contexts of our system. Event Storming is a related technique (and not only through its name!): both a workshop format and a design approach for modelling complex business domains.
Where's the catch? When modelling data, designing a system and most importantly evolving a system, it's important to retain a clear view of the interactions between various components. The liberty of reacting to events might need to be constrained.
With event sourcing we get looser coupling of components, but that might also end up as a tangled mess of hard-to-follow relationships. The business logic might get decentralised. Keep in mind that while sending an email is a concern that is probably completely separate from updating the inventory service, quite often business logic is a series of steps that need to be done in sequence: that's just the nature of the world. Don't try to hide that fact by artificially creating multiple events.
You might also look at other patterns used in DDD that try to address the above-mentioned issue. One is the Saga pattern, in both its choreography and orchestration variants. Another is context mapping, which helps to identify relationships between bounded contexts.
The interactions between aggregates and events (where events are produced and where they are consumed) can't always be expressed in code alone. Other means of conveying information might be needed here: diagrams, drawings, conventions. Self-documenting code is good, but documentation is sometimes still needed.
Putting both the specific DDD techniques, and the potential modelling problems aside, I think the "Data Modelling" section is best summarised by Jessica Kerr:
Event Sourcing gives you a complete, consistent model of the slice of the world modeled by your software. That's pretty attractive.
Data auditing is the process of storing historical data, or capturing the changes performed to a dataset, along with metadata such as timestamps or user identifiers. Thanks to a data audit we can find out who made a particular change, when, and in some cases where or how.
The need for a data audit may come from a couple of sources. First, it might be a regulatory requirement. If we are dealing with, for example, a financial application, we might be required to store all historical data for a certain period of time.
Audits might also be a security-related feature available to users. Whenever you browse through your recent actions, such as logins, profile changes or settings updates, you are in fact browsing an audit of your own actions.
Finally, audit data might prove to be very valuable for analytics. It can help in understanding how a system is being used and what the most common behaviours are, taking into account both the timing and quantity of actions.
There are a couple of approaches to implementing a data audit, and event sourcing provides one elegant way. Looked at from another angle, the event stream is in fact the audit trail of our application. The audit is not an after-thought that's added to the system in the last few sprints; the audit drives the application.
Beyond all of that for me, the focus tends to be on the current state based model when the system is really event centric. This may sound like a nitpicky issue but I find teams doing this look at events with less importance than those who use events as storage as well as the latter only use events they are extremely event centric.
Greg Young, "Why use Event Sourcing?"
We've seen above that event sourcing can change the way we think about data modelling, as well as our approach to system design, going as low-level as impacting code organisation. The same is true for data audits; the roles get reversed compared to other solutions. It's the audit trail from which all other data is derived.
One immediate gain from such a setup is that it's impossible to forget to audit a specific operation, as no operation can happen in an event-sourced system without an event being written to the event log. That way, we make sure that no information is lost. That's quite a good trait, given that we all work in Information Technology (IT)!
It's worth keeping in mind that the event stream itself stores only "raw data". Making this data useful will most probably involve some kind of transformation. If it's a requirement to present a user's security audit trail in an application, we should probably create a projection (a read model) which will store the appropriate data, indexed in a form that's useful for the audit view. The same goes for security or analytical audits.
Another benefit of a fully-audited/event-sourced system surfaces when we are faced with the need to debug a problem. Not only do we have additional information on exactly what changed and when, we can also re-create the system state at any point in time, by replaying events up to the problematic one. This might prove valuable when trying to reproduce an issue.
What kind of problems might we expect when using event sourcing for data auditing? First of all, we'll never be able to track everything. There are always decisions to be made as to what should be captured as an event. Any change to the data performed by the user results in an obvious event.
How about page views? That's probably too fine-grained. We would quickly run into problems with our data storage running out of space. An audit trail shouldn't be used to perform website traffic analysis. However, we might want to capture the fact that a user has viewed some sensitive data. The answers here are very application-specific.
Another group of problems relates to deleting data. Either because we want to be nice to our users, or because we want to comply with privacy regulations in the EU and US, we might need to implement a feature for removing all data belonging to a particular user. This is in obvious contradiction with the immutability of the event stream. In addition, some distributed-log storage systems make it hard or impossible to modify data that has already been written.
There are several solutions: using log compaction, periodically rewriting the event stream, or using encryption keys which are "forgotten" when a user requests that their data be deleted. Again, there are no universal solutions, and each use-case might require a slightly different approach.
Finally, in the introduction we said that we can apply event sourcing selectively, only where it is truly needed. When implementing an audit trail using events, this stops being true; event sourcing becomes much more "viral", as every place that requires auditing needs to use event sourcing.
If you're interested in other approaches to data auditing, check out our whitepaper on the subject!
Event sourcing is often discussed in combination with NoSQL databases and messaging systems, such as EventStore, Cassandra and Kafka. But does event sourcing mandate the use of these storage systems? Is it necessary to introduce the high operational cost of a distributed or "NoSQL" database, just to use event sourcing? No!
Event sourcing can just as well be implemented on top of a traditional, relational database. It's as good a fit for low-volume, internal applications as for high-performance, mission-critical systems (however, the reasons for introducing event sourcing would be radically different).
The datastore options range from using a relational database for both the events and the read model (and leveraging transactions in the process! See here for an example implementation using Scala), all the way to fully distributed databases and separate read & write models.
Above, we've presented three common scenarios where using event sourcing might be beneficial. Nothing comes for free, though! As with any other technology, it's easy to make mistakes when implementing event sourcing.
Hence, whether you are looking at event sourcing from the perspective of performance, data modelling or data auditing, keep in mind the tradeoffs that will need to be made.
However, a lot of factors speak in favour of the pattern:
- the ability to scale reads and writes independently
- natural sharding of data
- the liberty of creating arbitrary, tailored read models
- data-centric approach to system modelling
- loose coupling of business logic
- audit-driven applications
In other words, event sourcing might be your gateway to a flexible, no-information-lost, unconstrained, but audited future!