Discussion on: Redefining Event Sourcing

Ivan Fateev

Thank you for the overview. The concept is quite clear.

How do you handle versioning?

I mean, if business requirements change and there's a need to work with the data differently, how do you update the logic?

For example, if there was an event for creating an order, and later you added some more fields to the order model, will the event change as well? Or do you create a different type (version) of the event?

When you deploy a new instance of a microservice with the updated logic, will it produce the new event while old instances are still producing old events?

It's clear that you can rebuild a read model after any change. I'm just curious how the entire process/approach looks for Event Sourcing.

In traditional systems this is usually handled by migrations and backward compatibility with the n-1 version of the service.

Javier Toledo

Hi @poisonousjohn , this is a fantastic question and a crucial topic that most teams struggle with when switching to an event-driven mindset. There's certainly no one-size-fits-all solution for versioning because, depending on your stack choices, you might have different options. In my experience, I've seen two general ways to approach it, but take them with a grain of salt, because when you get down to your specific implementation, things can get blurry.

For instance, some people use Apache Kafka to distribute the events among services. In this case, they can rely on the Schema Registry to keep track of message schemas, and its schema evolution feature allows you to "upgrade" old message versions to newer ones on the fly in some situations (you probably want to avoid changes that can't be resolved this way). In this context, a new service can consume the newest version of the schema while all the other services are still emitting the old version. There's a nice article about the Schema Registry in Kafka with a much better explanation.
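
To make that idea concrete, here's a minimal sketch in plain Python (the schemas and the `upgrade` helper are purely illustrative, not the Schema Registry's actual API): adding a field with a default is the kind of backward-compatible change that keeps old events readable under the new schema.

```python
# Hypothetical Avro-style schemas for an OrderCreated event.
# Backward-compatible evolution: the new field gets a default,
# so records written with v1 can still be read as v2.

ORDER_CREATED_V1 = {
    "type": "record",
    "name": "OrderCreated",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "total", "type": "double"},
    ],
}

ORDER_CREATED_V2 = {
    "type": "record",
    "name": "OrderCreated",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "total", "type": "double"},
        # New field with a default: old events "upgrade" on the fly.
        {"name": "currency", "type": "string", "default": "USD"},
    ],
}

def upgrade(event: dict, schema: dict) -> dict:
    """Fill missing fields with schema defaults, roughly what a
    registry-backed deserializer does when reading an old record."""
    upgraded = dict(event)
    for field in schema["fields"]:
        if field["name"] not in upgraded and "default" in field:
            upgraded[field["name"]] = field["default"]
    return upgraded

old_event = {"order_id": "o-42", "total": 19.99}  # written with v1
print(upgrade(old_event, ORDER_CREATED_V2))       # read as v2
```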

I've also seen teams that treat events as immutable structures, so when they need to "change" an event, they don't really change it. They tag the original event as "deprecated," create a brand-new event, and put an event handler in the middle to "translate" events from the deprecated version to the new one (if needed). This way, existing services that were working with the old version can still work with the existing event stream, and services that use the new implementation will use the new stream without looking back at the original one. Depending on the case, it could make sense to run a process to "migrate" all the events from the original event stream to the new one, or to delete it once you realize no service is using the deprecated version.
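
A rough sketch of what such a translator could look like, assuming hypothetical `OrderCreatedV1`/`OrderCreatedV2` event shapes (the names and fields are made up for illustration):

```python
# A "translator" handler sitting between the deprecated stream and
# the new one, upcasting old events without mutating them.

def translate_order_created_v1(old: dict) -> dict:
    """Upcast a deprecated OrderCreatedV1 event to OrderCreatedV2."""
    return {
        "type": "OrderCreatedV2",
        "order_id": old["order_id"],
        # The old event stored a single amount; the new model splits
        # it into amount + currency, defaulting what v1 never captured.
        "amount": old["total"],
        "currency": "USD",
    }

def handle(event: dict, publish) -> None:
    if event["type"] == "OrderCreatedV1":  # deprecated version
        publish(translate_order_created_v1(event))
    else:                                  # already current
        publish(event)

handle({"type": "OrderCreatedV1", "order_id": "o-42", "total": 19.99},
       publish=print)
```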

In any case, when you use event sourcing, you probably want to avoid state shared by different services. Each service should have its own database and build its own projection from the events, so once you fix versioning at the event level, it's much easier to handle changes in each service's state: when things are properly designed, each service owns its state and can change it without affecting the others.
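
As a tiny illustration of what "build its own projection" means (the event shapes here are made up), a service folds the stream into a read model it fully owns, and that model can be dropped and rebuilt by replaying the events:

```python
# Minimal sketch of a service-local projection: fold the event
# stream into a read model owned only by this service.

events = [
    {"type": "OrderCreated", "order_id": "o-1", "total": 20.0},
    {"type": "OrderCreated", "order_id": "o-2", "total": 35.0},
    {"type": "OrderCancelled", "order_id": "o-1"},
]

def project(events: list[dict]) -> dict:
    """Rebuildable read model: open orders keyed by id."""
    open_orders: dict[str, float] = {}
    for e in events:
        if e["type"] == "OrderCreated":
            open_orders[e["order_id"]] = e["total"]
        elif e["type"] == "OrderCancelled":
            open_orders.pop(e["order_id"], None)
    return open_orders

print(project(events))  # {'o-2': 35.0}; drop and replay to rebuild
```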

At least this is what I've experienced/learned so far. I'd be really interested in hearing about other solutions to this challenge because I haven't found simpler/easier answers yet!

Ivan Fateev

Great, thanks. You've just described what I had in mind. Good to know.

Anyway, it sounds quite complex to support, and I'm wondering about the advantages of this architecture over, let's say, traditional approaches like a REST API + RabbitMQ for communication between services.

This article focuses mostly on implementation, not on the advantages.

Usually, Event Sourcing is really needed in systems where write operations dominate over reads, like banking systems, where there are a lot of transactions and you rarely ask for a statement.

I'm wondering what benefits other people find besides the ones described in the typical Event Sourcing examples, like real cases where this approach helped a lot.

Javier Toledo

Yes, it certainly can become complex, but the advantages really pay off in some scenarios.

Talking about an example close to me: in e-commerce applications, the go-to way to scale is caching the catalog, and that helps a lot to keep the site online on sales days, but there are situations when the number of concurrent orders is such that the database becomes a bottleneck. When an order is processed, a lot of transactional things happen under the hood: you need to check that you have enough stock (and prevent others from picking it while you're processing the order), confirm the deposit with the payment processor, trigger the warehouse processes to prepare the order, etc. In most e-commerce implementations, all these things happen synchronously while the user waits, and some user actions can be blocked waiting for others to finish, to the point that users get bored and leave, or some calls time out and fail, with the consequent loss of orders (and money).

With event sourcing, you have an append-only database that can ingest way more data, it's easier to shard the writes, and the checks and everything else can be done asynchronously (you can still manage transactions with sagas). This requires a mindset change in the way the whole system is designed, and for sure it's not "easier," but this way no orders are ever dropped.
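
A toy sketch of that write path, with an in-memory list standing in for the event store and a background worker standing in for the saga steps (all the names here are made up, not a real framework's API):

```python
# The user-facing call only appends an OrderPlaced event; stock
# checks, payment, and warehouse triggers run asynchronously off
# the stream (in a real system, as saga steps).

import queue
import threading
import time

event_store: list[dict] = []        # append-only log
notifications: queue.Queue = queue.Queue()

def place_order(order_id: str, items: list[str]) -> None:
    """Fast path: just append. The user never waits on payment/stock."""
    event = {"type": "OrderPlaced", "order_id": order_id, "items": items}
    event_store.append(event)
    notifications.put(event)

def fulfillment_worker() -> None:
    """Consumes at its own pace; a real saga would emit follow-up
    events (StockReserved, PaymentConfirmed, ...) here."""
    while True:
        event = notifications.get()
        time.sleep(0.1)  # stand-in for slow transactional work
        event_store.append({"type": "StockReserved",
                            "order_id": event["order_id"]})
        notifications.task_done()

threading.Thread(target=fulfillment_worker, daemon=True).start()
place_order("o-42", ["sku-1", "sku-2"])  # returns immediately
notifications.join()                     # wait only for the demo
print(event_store)
```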

The main advantage I see over inter-service REST APIs is the inversion of dependencies. When you use a REST API, the sender needs to understand the API contract and implement a client to consume it. When that API changes, you need to go back, find all the services that consume it, and change their client implementations; otherwise, everything fails. This can become challenging when you have more than a couple of services. In event sourcing, as I mentioned in my previous message, one service that changes the contract can coexist with other services that are using a previous version of the message, reducing this risk. Dependencies happen at the data level, not at the code level, and that's generally easier to deal with.
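
A minimal sketch of that inversion, using a made-up in-memory publish/subscribe: the producer only depends on the event's shape, and new consumers appear without any change on the producer's side.

```python
# The producer calls publish() and knows nothing about who listens;
# consumers depend on the event data, not on each other's APIs.

subscribers = []

def subscribe(handler):
    subscribers.append(handler)
    return handler

def publish(event: dict) -> None:
    """The producer's only dependency is the event shape."""
    for handler in subscribers:
        handler(event)

@subscribe
def billing(event):     # added later; producer code never changed
    if event["type"] == "OrderCreated":
        print("billing handles", event["order_id"])

@subscribe
def analytics(event):
    print("analytics sees", event["type"])

publish({"type": "OrderCreated", "order_id": "o-42"})
```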

With RabbitMQ you could deal with this dependency in a similar way, but the disadvantage I see is that RabbitMQ works with a real-time push mechanism that can easily overwhelm the consumers in some situations. I've seen it a couple of times: because of errors or sudden user peaks, the consumers can't process messages fast enough, the queues grow until the hard drive is full, and the broker stops accepting messages. Event-sourced databases tend to be easier to scale, and the messages are ingested and persisted until you can process them, so processing doesn't need to happen in real time. The consumers can go at their own pace; they don't even need to be online when events are stored, and it's easier to recover from failures because no events are dropped.
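
A sketch of that pull model (the log and offsets here are simplified stand-ins for what Kafka-style consumers do): each consumer keeps its own position in the persisted log and catches up whenever it can.

```python
# Pull-based consumption: the log persists everything, and each
# consumer tracks its own offset, so it can be slow or offline
# without the broker dropping messages or drowning in backlog.

log = [{"seq": i, "type": "OrderCreated", "order_id": f"o-{i}"}
       for i in range(5)]

class Consumer:
    def __init__(self):
        self.offset = 0  # would be durable in a real system

    def poll(self, batch: int = 2) -> list[dict]:
        """Read the next batch at the consumer's own pace."""
        events = log[self.offset:self.offset + batch]
        self.offset += len(events)
        return events

c = Consumer()
while batch := c.poll():
    print("processing", [e["seq"] for e in batch])
```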

Don't get me wrong, RabbitMQ is a fantastic piece of technology, and when the system is properly designed, taking these edge cases into account, it can definitely work. Indeed, some implementations of Event Sourcing use RabbitMQ under the hood for notification events, so they are not really "competing" technologies.

Event sourcing is not a silver bullet; it's a way to store data that has real benefits in situations like the one I described. As with every other technique, I think it shines when you don't overuse it and you combine it with other tools and techniques in a balanced way.