In working on a new product, I made a new (to me) discovery -- cyclic streams. It is actually not a new concept, but I had not run across this pattern before. I wanted to describe it quickly for the two or three of you who do event sourcing. 😀 Cyclic stream is a term I made up. If there is a pre-existing term for this, let me know.
Some workflows are cyclic: a monthly audit, or yearly training. In the past, I have represented each one as its own event stream. There is a defined beginning and end event, and a small number of events in between. This is convenient because I never get into a situation where a stream replay is slow. Then I don't have to introduce extra architectural complexity like Snapshots (which I still haven't needed).
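The one-stream-per-cycle idea can be sketched like this (the naming scheme and function name are my own illustration, not actual product code):

```python
# Hypothetical sketch: each yearly training cycle gets its own short
# stream, so a replay only ever touches a handful of events.
def yearly_training_stream(student_id: str, course_id: str, year: int) -> str:
    # One stream per (student, course, year) combination.
    return f"training-{student_id}-{course_id}-{year}"
```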
This approach has a limitation: duplicate streams can be created for the same time period. For example, one might be created through user action while a duplicate is simultaneously created by a back-end process. The end result is that a student is signed up for the same training twice. That's not so bad, since it is easy to notice in the UI. But it gets worse when the UI doesn't naturally surface duplicates, which might be the case with audits. The previous product used full consistency between the event and view stores, so race conditions were caught by a unique index on the view. But our latest product uses eventually consistent views, so I must decide what to do about this corner case.
Greg Young has a post about set validation which basically encourages the reader not to worry so much about preventing duplicates, but instead to focus on detection and correction, because duplicates are unlikely to actually happen in production. I agree with this advice in general. However, it still requires me to write extra tools for detection and correction -- more stuff to be aware of and maintain, even if it is less work than the prevention options. Could I possibly solve this problem with no extra code?
I started to wonder if I could use the same stream for all iterations of a workflow instead of making a new one each year. For example, training events for a particular student and course always go to the same training stream. I can use a deterministic UUID to calculate a consistent ID from the student and course ID.
For example: Uuid.deterministic(StudentId + CourseId). The result would be the same as combining all those individual yearly streams into one big stream.
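In Python, for instance, a name-based UUID (uuid5) gives this kind of deterministic ID. The namespace value and the ID format below are assumptions for illustration:

```python
import uuid

# Fixed namespace for training streams (an arbitrary but stable value --
# any UUID works, as long as it never changes).
TRAINING_NAMESPACE = uuid.UUID("1b671a64-40d5-491e-99b0-da01ff1f3341")

def training_stream_id(student_id: str, course_id: str) -> uuid.UUID:
    # Same student + course always yields the same stream ID.
    return uuid.uuid5(TRAINING_NAMESPACE, f"{student_id}:{course_id}")
```

Note the delimiter between the two IDs: plain concatenation would make ("ab", "c") and ("a", "bc") collide.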
This prevents duplicates, but now I'm back to needing Snapshots. Only it gets worse. The training process might change over time. (It already has in our previous product.) So the snapshotter will have to know how to deal with the peculiarities of all previous incarnations of the training process! Ugh. And all we need are the current events on the end of the stream.
Hang on then... could I just read the end of the stream? I recall from playing with EventStore that it has a way to read a stream backward. I always wondered what that could be used for, but now it starts to make sense. I could read the stream backward until I hit a marker event, such as the end result of the last training (Canceled, or whatever the terminal event was). And I think that does it!
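Here is a minimal sketch of the backward read, with events as plain dicts and assumed marker event names; a real store such as EventStore would page events from the tail rather than holding the whole stream in memory:

```python
# Terminal events that end a training cycle (the names are assumptions).
MARKER_TYPES = {"TrainingCompleted", "TrainingCanceled"}

def current_cycle(events: list[dict]) -> list[dict]:
    """Walk the stream backward, stop at the last marker event,
    and return only the events after it, in forward order."""
    recent = []
    for event in reversed(events):
        if event["type"] in MARKER_TYPES:
            break
        recent.append(event)
    recent.reverse()
    return recent
```

If no marker is found, the whole stream is returned, which is exactly right for the first cycle.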
So to sum up, the ingredients of this pattern are as follows.
- A cyclic process, such as yearly training
- A consistent ID based on the IDs of entities involved
  - Beware of using "natural keys", which can change
- Reading the stream backward until a marker event to get the "current" events
This deftly avoids the possibility of creating duplicate streams. It also requires no extra architectural pieces. Code I don't have to write (especially architectural) is my favorite.