Generating short, unique, user-facing IDs is a common need: Zoom URLs, before its whole security hoo-ha, looked something like https://domain.zoom.us/j/92607701764, while CoderPad (throwback to interviewing days) used links like https://coderpad.io/EZ6YP9RY.
When designing a service with user-facing IDs, e.g. a URL for sharing, we ideally want IDs that are short, sweet and memorable. But the shorter an ID is, the smaller the space of possible values, and the more likely it is that two randomly generated IDs collide, which breaks uniqueness. In theory, if the short ID generator is good (i.e. uniformly random), collisions are still rare at small scale: an 8-character ID drawn from 26 letters + 10 digits has 36⁸ (about 2.8 trillion) possible values. Throwing in special characters would increase that number further.
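To put rough numbers on that, here is a minimal sketch that generates random 8-character IDs and estimates the chance of at least one collision using the standard birthday-bound approximation. The function names are made up for illustration, and the lowercase-plus-digits alphabet is just the 36-symbol set described above.

```python
import math
import secrets
import string

# 26 letters + 10 digits = 36 symbols, as in the example above.
ALPHABET = string.ascii_lowercase + string.digits


def random_short_id(length: int = 8) -> str:
    """Return an ID whose characters are drawn uniformly from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def collision_probability(num_ids: int, length: int = 8) -> float:
    """Birthday-bound approximation: P ~= 1 - exp(-n(n-1) / (2N))."""
    space = len(ALPHABET) ** length
    return -math.expm1(-num_ids * (num_ids - 1) / (2 * space))


print(random_short_id())
print(len(ALPHABET) ** 8)  # 2821109907456
for n in (1_000, 100_000, 1_000_000):
    print(n, f"{collision_probability(n):.6f}")
```

With a million IDs in circulation, the estimate for at least one collision somewhere is already roughly 16%, which is why a purely random short ID still needs some duplicate handling behind it.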
You may be thinking: do I need to create such an ID for the items in the service I’m building? What about just using the object ID from my database (e.g. MongoDB), which is already guaranteed to be unique?
That works for sure, but suppose we want a shorter unique ID for easy peer-to-peer sharing and elegance. My product Teamo, a platform where teams can create lively group cards to build genuine workplace connections, also refers to cards by their shorter unique IDs, even though each card still has a longer object ID. Another reason could be decoupling, as explained by Redgate’s engineering blog.
Option 1 (not-so-good but works): Store short ID as an item field in DB and search for duplicates
One way to do it is to generate a short ID and store it as a field on the item in the database. Every time a new random short ID is generated, we query the database to make sure it is not already in use.
You may recognize why that is not such a good idea: every item creation now depends on an initial lookup for a duplicate short ID, which adds a database round trip (and a retry whenever there is a collision). A good improvement, if the items are ephemeral (e.g. CoderPad rooms), would be to use an in-memory store such as Redis instead of a DB.
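The check-then-insert flow can be sketched as follows. This is only an illustration: it uses an in-memory SQLite database as a stand-in for whatever store you actually use, and the `cards` table, `create_card` function and retry limit are all hypothetical.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_lowercase + string.digits


def new_short_id(length: int = 8) -> str:
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def short_id_exists(conn: sqlite3.Connection, short_id: str) -> bool:
    """The extra lookup that Option 1 pays for on every creation."""
    row = conn.execute(
        "SELECT 1 FROM cards WHERE short_id = ?", (short_id,)
    ).fetchone()
    return row is not None


def create_card(conn: sqlite3.Connection, title: str, max_attempts: int = 5) -> str:
    """Generate a short ID, check it is unused, then insert the item."""
    for _ in range(max_attempts):
        short_id = new_short_id()
        if short_id_exists(conn, short_id):
            continue  # collision: try another candidate
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO cards (short_id, title) VALUES (?, ?)",
                (short_id, title),
            )
        return short_id
    raise RuntimeError("could not find a free short ID")


conn = sqlite3.connect(":memory:")
# PRIMARY KEY is a safety net: two writers racing between the check and the
# insert would get an error rather than a silent duplicate.
conn.execute("CREATE TABLE cards (short_id TEXT PRIMARY KEY, title TEXT NOT NULL)")
print(create_card(conn, "Welcome aboard!"))
```

Note that the lookup and the insert are two separate steps, so with concurrent writers the uniqueness constraint is what actually enforces the guarantee.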
Option 2: UUIDs (guaranteed unique, but too large for our notion of short IDs)
You may be wondering… what about UUIDs? Unfortunately, UUIDs are pretty large: 128 bits, usually written as 32 hexadecimal digits. They do have wonderful guarantees: the chance of the same UUID being generated twice is negligible. Here’s what Wikipedia says:
Only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. Or, to put it another way, the probability of one duplicate would be about 50% if every person on earth owned 600 million UUIDs.
UUIDs are also great in that they work in distributed environments: they do not require coordination between different nodes and can be generated independently. How they achieve this depends on the version: a version 1 UUID combines the network (MAC) address of the host that generated it with a timestamp, while a version 4 UUID, like the example below, consists almost entirely of random bits (122 of them).
Yet a UUID is too big for our notion of a short ID. A UUID looks something like this: f4fdfd2b-d1f1-4156-86e6-533f9cf91416. Indexing is also a concern, since UUIDs take up more space than shorter keys, which can affect query performance.
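For a sense of scale, Python’s standard `uuid` module makes the size easy to see. This is just a quick illustration, not part of any particular service:

```python
import uuid

u = uuid.uuid4()  # version 4: randomly generated
print(u)
print(len(str(u)), len(u.hex), len(u.bytes) * 8)  # 36 32 128
print(u.version)  # 4

v1 = uuid.uuid1()  # version 1: built from a timestamp and a host node ID
print(v1.version)  # 1
```

That is 36 characters with hyphens (32 without), compared with the 8 characters we were aiming for.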
Option 3: Creating an independent service that generates short IDs
By creating such a service, we can avoid that initial search through the DB for duplicate short IDs. The question then becomes: how do we generate a short ID that we know for sure is not a duplicate, even when we scale this service across multiple nodes?
Let’s look at some current implementations that may give us inspiration. Twitter Snowflake, the open-source version of which is unfortunately archived, is an internal service used by Twitter for generating 64-bit unique IDs at high scale. The IDs are made up of the following components:
- Epoch timestamp in millisecond precision — 41 bits (gives us 69 years with a custom epoch)
- Machine id — 10 bits (thus allowing uniqueness even when we scale the short ID generator service over different nodes)
- Sequence number — 12 bits (A local counter per machine that rolls over every 4096)
- The extra 1 bit is reserved for future purposes.
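The layout above can be sketched in a few lines of Python. This is a simplified illustration of a Snowflake-style generator, not Twitter’s actual code: the class name, the custom epoch and the clock-handling policy are all assumptions made for the example.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # hypothetical custom epoch: 2020-01-01T00:00:00Z
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095


def _now_ms() -> int:
    return time.time_ns() // 1_000_000


class SnowflakeLikeGenerator:
    """41-bit timestamp | 10-bit machine id | 12-bit sequence."""

    def __init__(self, machine_id: int):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = _now_ms()
            if now < self._last_ms:
                # Assumed policy: refuse to issue IDs if the clock goes backwards.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = _now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - EPOCH_MS
            return (
                (timestamp << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self._sequence
            )


gen = SnowflakeLikeGenerator(machine_id=5)
print(gen.next_id())
```

Because the machine id is baked into every ID, two nodes with different machine ids can never produce the same value, and no database lookup is needed.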
I’d imagine the components would differ based on how your service is used. If you don’t expect items to be created that rapidly, perhaps the ID does not need millisecond precision; if peak traffic is not that high, you may not need that many machines (2¹⁰ = 1024 machines). Depending on your service, you could also use the name of the item, the creator of the item, the object ID or other attributes as components of the ID.
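As one illustration of trimming the components, here is a sketch of a smaller layout that fits in 8 base-36 characters. Since 2⁴¹ < 36⁸, any 41-bit number can be written in 8 such characters. The split below (30 bits of seconds, 3 bits of machine id, 8 bits of sequence) is purely a made-up example, not a recommendation.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # base-36 digits

# Hypothetical 41-bit budget: 30 + 3 + 8.
SECONDS_BITS = 30   # about 34 years of second-precision timestamps
MACHINE_BITS = 3    # up to 8 machines
SEQUENCE_BITS = 8   # up to 256 IDs per machine per second


def pack(seconds: int, machine_id: int, sequence: int) -> int:
    """Combine the three components into one 41-bit integer."""
    if not 0 <= seconds < (1 << SECONDS_BITS):
        raise ValueError("seconds out of range")
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        (seconds << (MACHINE_BITS + SEQUENCE_BITS))
        | (machine_id << SEQUENCE_BITS)
        | sequence
    )


def to_base36(n: int, width: int = 8) -> str:
    """Encode a non-negative integer in base 36, left-padded with zeros."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while n:
        n, remainder = divmod(n, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits)).rjust(width, "0")


print(to_base36(pack(seconds=0, machine_id=0, sequence=1)))  # 00000001
largest = pack((1 << SECONDS_BITS) - 1, 7, 255)
print(len(to_base36(largest)))  # 8
```

The trade-off is visible in the bit counts: fewer timestamp bits means a shorter lifetime or coarser precision, and fewer machine and sequence bits cap how many nodes and how much peak traffic the scheme can handle.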
I hope this was a fun discussion of the possible options for generating short IDs! Feel free to correct us on any inaccurate details; all suggestions are welcome.
Creating a warmer team culture and celebrating your workplace connections go a long way. Whether you’re looking to welcome a new hire, thank a summer intern or celebrate a teammate’s birthday, Teamo works for you. Check us out at teamo.team!