Discussion on: What would you use as a sortable, globally unique, ID?

May I ask what the need for sorting is?
I personally don't see many reasons to sort by id, most often what people really want is sort by creation date.

Anyway, back on topic. This is a tricky question because globally unique and predictable are two properties that directly clash with each other. A solution is taking a look at Microsoft SQL Server Sequential ID, a sortable GUID.
There are a couple of disadvantages:

They are not decentralized. If they were you would lose the anti-clashing capabilities of GUIDs.
They are predictable. Don't ever expose them to the public as they can lead to resource enumeration.

Basically, you may as well use BIGINT as your ids if you need sorting.

If you need to sort events on a distributed system, better use some other properties. Like timestamps.

rhymes • Sep 7 '19 • Edited

May I ask what the need for sorting is?

Sure, efficiency and convenience mostly. Let's say we use UUIDs: they work well enough until the IDs land somewhere far from your system. Someone decides to store events on S3 using the UUID as the file name, and suddenly you have gigabytes of events that can't be sorted unless you peek inside each file to find the timestamp.

Or you grep an event log and suddenly you have to come up with a combination of bash commands in a pipe to extract both the ID and the timestamp to sort them.

Or you want to create shards out of them to group related data on different machines. UUIDs are useless for that.

In my experience UUIDs are great, until they aren't :D

The great thing about globally unique and sortable IDs is that they carry information with them. If well designed I can even deconstruct them to extract such info.
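To make the "deconstruct them" part concrete, here is a minimal sketch in Python. The layout is hypothetical and not any particular library's format: a fixed-width hex timestamp (seconds since the Unix epoch) followed by random hex characters. The names `make_id` and `created_at_of` are made up for illustration.

```python
import secrets
from datetime import datetime, timezone

def make_id(created_at: datetime) -> str:
    """Hypothetical layout: 8 hex chars of Unix seconds + 32 random hex chars."""
    seconds = int(created_at.timestamp())
    # Fixed width keeps string order equal to time order (until the year 2106).
    return f"{seconds:08x}{secrets.token_hex(16)}"

def created_at_of(identifier: str) -> datetime:
    """Recover the creation time from the timestamp prefix alone."""
    seconds = int(identifier[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

With something like this, a plain `sort` over file names or log lines already orders events by creation second, without opening any file.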

This is a tricky question because globally unique and predictable are two properties that directly clash with each other.

Not exactly true, see the examples at the end of my comment :)

A solution is taking a look at Microsoft SQL Server Sequential ID, a sortable GUID.

This definitely wouldn't work, as you said: they are basically sequential, as the name says.

If you need to sort events on a distributed system, better use some other properties. Like timestamps.

But timestamps are not globally unique and can be duplicated.

I'll leave you with two examples of partially sortable IDs that are also random and unique:

K-Sortable Globally Unique IDs, or ksuid, is a type of ID that has an encoded 32-bit timestamp with 1-second resolution as a prefix and a 128-bit random part as a suffix. The explanation of how Segment got there is great: A brief history of the UUID
Universally Unique Lexicographically Sortable Identifier, or ulid, is an ID that uses a 48-bit timestamp with millisecond resolution and 80 bits of randomness.
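A rough sketch of the ULID-style layout (48-bit millisecond timestamp plus 80 random bits), written in Python. This is not the reference implementation of any ULID library; the function names are made up, and the only thing borrowed is the Crockford base32 alphabet, whose characters are in ascending ASCII order so that string comparison follows numeric order.

```python
import os
import time
from typing import Optional

# Crockford base32 alphabet (no I, L, O, U), in ascending ASCII order.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def encode(value: int, length: int) -> str:
    """Encode an integer as a fixed-width base32 string, most significant first."""
    chars = []
    for _ in range(length):
        chars.append(ALPHABET[value & 31])
        value >>= 5
    return "".join(reversed(chars))

def ulid_like(timestamp_ms: Optional[int] = None) -> str:
    """10 chars of timestamp (48 bits fit in 50) + 16 chars of randomness (80 bits)."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    randomness = int.from_bytes(os.urandom(10), "big")
    return encode(timestamp_ms, 10) + encode(randomness, 16)

def decode_timestamp_ms(identifier: str) -> int:
    """Read the millisecond timestamp back out of the first 10 characters."""
    value = 0
    for ch in identifier[:10]:
        value = value * 32 + ALPHABET.index(ch)
    return value
```

IDs created in different milliseconds sort by time. Two IDs created within the same millisecond are ordered by their random part, which is why these are only partially sortable.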

For example, Firebase uses something like this for their IDs: The 2 ^ 120 Ways to Ensure Unique Identifiers