Discussion on: Event Storage in Postgres

damiensawyer

Thanks for this. Very useful.

Your stream table has got me thinking. I worked on an event-sourced system once where they managed consistency between concurrent writers by reading all the events in a stream in a snapshot-isolated transaction, then writing the new events in the same transaction - catching the concurrency exception if someone else had written records inside the range covered by the initial select query (at least, I think that's how it worked).

I'm building a system now and was planning to do similar. The advantage over yours is that no extra table is required. The disadvantage is that you need to keep the transaction open between the reads and the writes. I'm wondering if yours would allow for greater concurrency on the db?

This is my first large Postgres app in anger (after years with SQL Server). Perhaps some more investigation is required on this.

Kasey Speakman • Edited

I'm wondering if yours would allow for greater concurrency on the db?

Yes, for sure. What you described is pessimistic concurrency, and it will allow concurrent writers (if all writers follow the same process). But it will slow down drastically as you add more concurrent writers. Generally, I prefer to do a single writer over pessimistic locking. For example, I used a built-in F# MailboxProcessor (essentially queue + worker) to process concurrent requests in a serialized fashion -- so there is no concurrency to worry about. And there is no need for a Stream table. For an internal system, you will probably never need anything else. If you need some availability guarantees, this works fine in an active-passive failover scenario.

If you need concurrent writers for scalability, then optimistic concurrency is better. It is essentially a check-and-set transaction on write only. No need to hold the transaction open while you run decision code. I have described the tables needed for optimistic concurrency, but I have conveniently (for me) left the append implementation as an exercise for the reader. I have a stored procedure for this that I may post once I have some confidence in it.
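
To give a flavor of that check-and-set, here is a minimal sketch (not the stored procedure itself; the Type/Data columns are illustrative, and it checks the Event table directly, though the article's Stream table could serve the same role). The caller passes the highest version it read before making its decision:

-- Check-and-set append (sketch). Inserts only if the stream head
-- is still where the writer left it.
INSERT INTO Event (StreamId, Version, Type, Data)
SELECT @StreamId, @ExpectedVersion + 1, @Type, @Data
 WHERE @ExpectedVersion = COALESCE(
        (SELECT MAX(Version) FROM Event WHERE StreamId = @StreamId), 0)
;
-- 0 rows inserted = another writer appended first; report a conflict.
-- A racer that slips past the check still trips the unique
-- (StreamId, Version) constraint, so the guarantee holds either way.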

And if you run into so much concurrency against the same streams that even optimistic concurrency is too slow, then you probably need to move to stateful processing.

I have a decent amount of experience in both MSSQL and Postgres (and MySQL a long time ago). MSSQL tooling and user experience is generally better. But Postgres text search is very compelling (and easy) compared to the MSSQL FTS service. Postgres JSON support is great. Postgres still annoys me in the way it handles casing, but it is my default database choice over MSSQL.

damiensawyer

Thanks for this.

On pondering it, are you sure that you need your Stream table?

The Event table has a unique constraint on (StreamId, Version). If writers are required to always increment the version to be 1 more than the highest event they read (which you may or may not be doing), then if two writers try to compete, that constraint will block the second.
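
In DDL terms (a sketch; the article may declare it differently), that's just:

-- the unique constraint in question; its backing index also makes
-- highest-version lookups fast
ALTER TABLE Event
  ADD CONSTRAINT uk_event_streamid_version UNIQUE (StreamId, Version);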

Kasey Speakman

Thanks for the comment! That is a good point. I probably would not let the writer throw an exception. But I could check the Event table (instead of the Stream table) for the highest version. Like this:

SELECT Version
  FROM Event
 WHERE StreamId = @StreamId
 ORDER BY Version DESC
 LIMIT 1
;

Unless I’m missing something, this should be an O(1) lookup due to the (StreamId, Version) index.

The Stream table with a Stream Type can still be nice for admin purposes. But it does seem even less important now.

damiensawyer • Edited

All sweet.

I can't see any reason that query wouldn't be fast.

I'm not sure that I'd have an issue with catching an exception. You'd only catch that 'specific' one (key violation or whatever) and then report it back gracefully to the caller. The alternative is doing a potentially unneeded read before every write. Catching the error and retrying is kind of like asking for forgiveness instead of permission.
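
In Postgres that 'specific' one is unique_violation (SQLSTATE 23505). As a sketch (column names illustrative), a hypothetical append function could trap exactly that and report it as a value:

-- Trap only the unique key violation; anything else (db down, etc.)
-- propagates as a real exception.
CREATE FUNCTION append_event(p_stream_id uuid, p_version int, p_data jsonb)
RETURNS boolean AS $$
BEGIN
  INSERT INTO Event (StreamId, Version, Data)
  VALUES (p_stream_id, p_version, p_data);
  RETURN true;   -- appended
EXCEPTION WHEN unique_violation THEN
  RETURN false;  -- lost the race; caller can reread and retry
END;
$$ LANGUAGE plpgsql;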

I was reading about partial indexes (postgresql.org/docs/8.0/static/ind...).

Another way you might do your original method, but without an extra table, is to add a boolean to the event table named latest and require that the StreamId is unique in a partial index over records where that is true.

CREATE UNIQUE INDEX flag_latest_version ON Event(StreamId) WHERE latest;
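
A writer would then flip the flag and append in one transaction, something like (event columns illustrative):

BEGIN;
-- retire the current head of the stream
UPDATE Event SET latest = false
 WHERE StreamId = @StreamId AND latest;
-- append the new head; if a concurrent writer already appended one,
-- this insert violates the partial unique index and the txn fails
INSERT INTO Event (StreamId, Version, Data, latest)
VALUES (@StreamId, @Version, @Data, true);
COMMIT;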

I've kind of dismissed this though in favour of just enforcing the streamid/version unique constraint. It can fail to catch a writer that writes a version more than 1 greater than the current max but, as we control the writers, we can prevent that I guess.

Anyway - it's all good fun. I'm enjoying Postgres! :-)

As a bit of a non sequitur, check out github.com/pgManage/pgManage. I've looked at a stack of PG clients but nothing's come close to SQL Server Management Studio. PGAdmin3 is dated, PGAdmin4 is slooooooow and not even any of the commercial ones have been that impressive. PGManage is great!

Kasey Speakman • Edited

Yeah, you are probably right about the exception handling. I had since added a Commit table to the design, mainly to denormalize the metadata across the different events in the same commit. And between 3 tables, I had it in my head to detect conflicts early. But there is great value in the simplicity of one table and just handling the unique key violation exception to detect a concurrency conflict. Using a multi-row insert, it should still be an all-or-nothing append operation. This has been a great conversation to clarify my thinking. Thank you.
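
For example, a commit of two events as one statement (columns illustrative; @Version is the version the writer last read):

-- One statement, all-or-nothing: a unique (StreamId, Version)
-- violation on either row rolls back the entire insert.
INSERT INTO Event (StreamId, Version, Type, Data) VALUES
  (@StreamId, @Version + 1, @Type1, @Data1),
  (@StreamId, @Version + 2, @Type2, @Data2);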

Aside: I use FP, so I typically try to avoid knowingly triggering exceptions unless things are actually exceptional (like database down). In this case, a concurrency conflict is not exceptional -- it is something I expect to happen sometimes under normal circumstances. However, triggering an exception at the database level is probably the best play and I will just convert it to a Result before handling it.

For the latest version idea, I probably would not opt for updating past records (to unset the latest flag). It would no longer be append only, and it may limit options in the future. For example, I have been considering the idea of backing up events to S3 (in addition to SQL backups), and updates would really complicate that scenario.

I appreciate your non sequitur. We have been using PGAdmin4 and have sort of found workflows that work for us. But it is not well polished in certain aspects. I will give PGManage a try. Thanks!

damiensawyer

All sweet. I hadn't considered the "append only" thing, but that's a great point. I also didn't know about FP being less tolerant of throwing exceptions.

In the C# world I know that exceptions are slow and I definitely try not to throw them during normal execution - however I would have considered a concurrency failure a good case for throwing one (and specifically catching it higher up the call stack to make retry decisions). Perhaps this comes down to personal style and a glass half full vs empty argument.

I've got the idea of using S3 or similar for events as well (how cool is immutability!!). We haven't started designing our integrators and builders yet - but when we do, even if we don't do it now, I want to at least allow for the idea of running them all as serverless functions at massive scale. At the other end of the spectrum I'm wondering if we can build the whole system to run from a wrist watch. I'm thinking Postgres et al on Docker + k8s + Android Wear. No idea if there's a watch that can do it yet, but there might be soon :-)

Kasey Speakman • Edited

re: Exceptions. FP is not necessarily less tolerant. I use F#, and it has pretty rich exception handling. But it is common in FP to represent anticipated error conditions as normal return values rather than through exceptions. Mainly because you cannot see what exception will be thrown unless you look in the code or it is documented. But you can see the return value type. In this case, I expect optimistic concurrency violations sometimes and I expect the caller to handle it. So even though the database adapter will throw a unique constraint violation exception, I will catch it and convert it to a normal return value. However, if the database is just down, I will let that exception propagate up to the caller. Hope that makes sense.

Using messaging (e.g. events) just opens up a whole world of things. It is awesome. :) If you really want your mind blown, event sourcing on the front end.

damiensawyer

Cheers for that. :-)

Simon Oulevay

There may be an edge case I have missed and I don't know about performance, but I think it may be possible to enforce that the version increment is always exactly 1 at the cost of some redundancy, by adding the following column and constraints to the table:

PreviousVersion int,
-- each event must reference an event that actually exists
-- in the same stream
FOREIGN KEY (StreamId, PreviousVersion)
  REFERENCES Event (StreamId, Version),
-- and that event must be exactly one version behind; only the
-- first event (version 1) may have no predecessor
CHECK (
  Version = 1 AND PreviousVersion IS NULL
  OR
  PreviousVersion IS NOT NULL
    AND Version - PreviousVersion = 1
)

We make each event reference the previous version of the stream with a multi-column foreign key and enforce the increment by 1 with a check constraint. The first event (version 1) is allowed to have a null previous version.
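
For instance (sketch), with that column and those constraints in place:

-- first event: version 1, no predecessor
INSERT INTO Event (StreamId, Version, PreviousVersion, Data)
VALUES (@StreamId, 1, NULL, @Data1);
-- next event must point at version 1
INSERT INTO Event (StreamId, Version, PreviousVersion, Data)
VALUES (@StreamId, 2, 1, @Data2);
-- a bugged writer skipping ahead is rejected either way:
-- PreviousVersion = 3 breaks the foreign key (no such event), and
-- PreviousVersion = 2 breaks the check (4 - 2 <> 1)
INSERT INTO Event (StreamId, Version, PreviousVersion, Data)
VALUES (@StreamId, 4, 3, @Data3);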

damiensawyer

I've only just skimmed this quickly (trying to get something done with kids screaming!)
Some quick thoughts:

  • If you're always enforcing that PreviousVersion is Version - 1, why store PreviousVersion?
  • In the most recent event sourced system I worked on, we allowed for gaps in the events. This was so we could delete events post production (arrgh!! The nerve!!). This was done after years of managing production event sourced systems and really wishing that we could do that.
  • If you did have gaps, you could have a PreviousVersion column... however, if all of the versions are incremental, do you really need it? The previous version is ascertainable through sorting the events.
  • I guess that if you allow forks in your event stream it would make sense to refer to a parent.... kind of like a directed acyclic graph... but that's not really the use case here.

If any of that sounds confusing or off, it probably is as it was a very quickly considered response :-)

Simon Oulevay

This idea of the previous version was in response to your (2-year-old) discussion with the author about concurrent writers, namely:

The Event table has a unique constraint on (StreamId, Version). If writers are required to always increment the version to be 1 more than the highest event they read (which you may or may not be doing), then if two writers try to compete, that constraint will block the second.

And:

It can fail to catch a writer that writes a version more than 1 greater than the current max but, as we control the writers, we can prevent that I guess.

Assuming there's a bug in one of the concurrent writers and it increments by 2 or more, multiple events may be written to the same stream at the same time without tripping the unique constraint.

As an intellectual exercise, I was wondering if there was a way to enforce that writers always increment the version by 1 at the database level. That's the purpose of the previous version column. The multi-column foreign key forces an event to reference another event, and the check constraint forces that event to be the previous version of the stream exactly.

If a writer is bugged and tries to increment by 2, the database will refuse to store the event (since it would fail the check constraint). It complements the protection against concurrent writers provided by the unique constraint on stream ID and version, by also protecting against misbehaved writers.

Of course that would only work if you want to implement an extremely strict event store. It would not support gaps between events or deleting events like in the systems you mention. You probably don't want to be that strict in a real-life system.

Kasey Speakman • Edited

So this brings up a classic trade-off of what should be managed through the DB and what should be managed by code. The particular trade-off for managing this constraint in the database is a performance hit. To ensure writers do the right thing, we sided with code and made a library so that they all behave consistently. And in fact I'm still searching for ways to reduce our existing indexing/locking (of Position table, see other comments) which negatively impacts write perf. Not enough to matter for our use cases, but I would like to optimize further.

I've started to see why Greg Young's Event Store made the trade-offs it did -- many of which involve moving responsibilities out of disk into service code. It doesn't have the auto-increment or position-lock problem because it uses a serialized writer, so the current position can be tracked in memory and recorded to disk when events are written. (Your client code can make concurrent writes, but inside the ES service, writes are serialized and fast.) Instead of a Stream table to record StreamType, you make the stream type part of the stream id. Then you can categorize streams after the fact based just on stream ID. Instead of a unique index on StreamId + Version, GYES uses an Event ID, which at the application level you are supposed to create deterministically from StreamId + Version. So the effect is the same -- duplicate events are prevented from being saved.

The big perf gains (especially around Position) are implausible when using Postgres because it is not optimized for this type of use case. But we still want to use Postgres, since we'd really need more people before we looked at supporting multiple DBs. And I like that it is a gentle starting point for smaller teams/projects.
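
On the deterministic Event ID point: in Postgres you could approximate that with the uuid-ossp extension, e.g. (sketch; the namespace is just an arbitrary fixed UUID):

-- Same StreamId + Version always yields the same event id, so a
-- duplicate append collides instead of being stored twice.
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
SELECT uuid_generate_v5(
         uuid_ns_url(),
         @StreamId::text || '/' || @Version::text
       ) AS EventId;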

damiensawyer

Kasey, that's really interesting about GYES. It's a product I'd really love to have a play with one day. We looked at it a few years ago when starting our system but I chickened out because "No one ever got fired buying IBM (... in this case, Postgres)".

Simon, I get what you're going for and it's interesting. It would be interesting to see how that affected write performance.

gregyoung

If you want to look at the code involved, look in StorageWriter.

github.com/EventStore/EventStore/b... is a good place to start.

damiensawyer

Thanks Greg. That's great. I'll have a look.