DEV Community

Shadi AJAM


Should We Rethink IDs? A Deep Dive into "Snowflake IDs"

Let’s Start from the Beginning: Why Do We Even Use IDs?

The short answer is "Labeling".

Since ancient times, we’ve had to label everything: animals, crops, livestock, geographic regions, even military units. As civilization grew, we began recording information on paper, collecting and storing data. Over time, this amassed into our first version of "Big Data." To manage it, we invented methods to sort and index all that paper, driven by one goal: finding information faster. This need for "organization" led us to create paper files, folders, shelves, and storage containers.

First kind of archiving

As we entered the computer era, we started using computers to store our files, shifting our basic labeling system into the realm of databases. In a database, each row (or file) gets a unique ID, usually an auto-incremented integer. This makes it easy to organize and find information quickly.

Computers era for archives

Over time, as we began using distributed servers and databases, messaging and communication between devices became even more critical. Each record or message had to be unique, requiring a way to label it individually—without any duplicates—across the entire system, regardless of the device.

Modern datacenters

End of story. Let's break down "Snowflake IDs"!

Snowflake IDs are "unique" identifiers created to solve the issue of ID duplication across distributed systems.

They were originally created by X (formerly Twitter) for tweet IDs, and variations of the idea are used by major tech companies such as Instagram, Uber, GitHub, and LinkedIn.

Snowflake ID Structure:

A Snowflake ID is a fixed-length 64-bit (8-byte) integer, of which 63 bits are usable.

It is compact and efficient to store in a database as a single 64-bit integer. This small footprint is ideal for high-performance systems, minimizing storage space while keeping identifiers unique and ordered.

Structure Breakdown:

Snowflake ID Structure


  • Unused sign bit (1 bit): always 0, so the ID stays positive when stored as a signed 64-bit integer.
  • Timestamp (41 bits): the time in milliseconds since a custom epoch.
  • Data Center/Machine ID (10 bits): identifies the machine/device that generated the ID, up to 1,024 distinct values.
  • Sequence Number (12 bits): a counter for IDs generated within the same millisecond, up to 4,096 values.
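
The layout above can be sketched with a few bit shifts. This is a minimal Python illustration, not any company's actual implementation; the names `pack` and `unpack` are made up for this example, and the timestamp is assumed to already be "milliseconds since your custom epoch".

```python
TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12

MACHINE_SHIFT = SEQUENCE_BITS                   # machine ID sits above the 12 sequence bits
TIMESTAMP_SHIFT = MACHINE_BITS + SEQUENCE_BITS  # timestamp sits above the lower 22 bits


def pack(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    """Combine the three fields into one integer; the top (sign) bit stays 0."""
    if not 0 <= timestamp_ms < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp does not fit in 41 bits")
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine_id does not fit in 10 bits")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence does not fit in 12 bits")
    return (timestamp_ms << TIMESTAMP_SHIFT) | (machine_id << MACHINE_SHIFT) | sequence


def unpack(snowflake: int) -> tuple[int, int, int]:
    """Split an ID back into (timestamp_ms, machine_id, sequence)."""
    timestamp_ms = snowflake >> TIMESTAMP_SHIFT
    machine_id = (snowflake >> MACHINE_SHIFT) & ((1 << MACHINE_BITS) - 1)
    sequence = snowflake & ((1 << SEQUENCE_BITS) - 1)
    return timestamp_ms, machine_id, sequence


example = pack(timestamp_ms=1_000, machine_id=5, sequence=7)
print(example, unpack(example))
```

Because the timestamp occupies the highest bits, the whole ID still fits in a signed 64-bit column, and the largest possible value is exactly 2^63 - 1.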

Real world examples:

  • LinkedIn uses Snowflake IDs in its article editor. Let's take my article as an example.

Snowflake ID: 7256902784527069184

Linkedin Snowflake IDs Breakdown

The table above breaks down the Snowflake ID, showing how LinkedIn structures its identifiers. The timestamp aligns exactly with the date and time I started writing this article: "October 29 at 5:40 AM".
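
If you want to check the timestamp yourself, here is a small Python sketch. It rests on one assumption that I inferred from the numbers rather than from any LinkedIn documentation: the 41 timestamp bits hold plain Unix milliseconds, with no custom epoch.

```python
from datetime import datetime, timedelta, timezone

LINKEDIN_ID = 7256902784527069184  # the article ID shown above

# Assumption: the bits above the lower 22 are Unix milliseconds (epoch offset of 0).
timestamp_ms = LINKEDIN_ID >> 22

unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
created_at = unix_epoch + timedelta(milliseconds=timestamp_ms)
print(created_at.isoformat())
```

Under that assumption the result lands on October 29, 2024 at 05:40 UTC, which matches the time quoted above.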

  • X uses Snowflake IDs as post IDs.

Snowflake ID: 1851515326581916096

Twitter X Snowflake ID Breakdown

In this approach, X (Twitter) uses a custom epoch of "1288834974657" milliseconds, which translates to "November 4, 2010, 1:42:54.657 AM" (UTC). Adding the timestamp stored in the Snowflake ID to that epoch gives "October 30, 2024, 6:43:48.005 AM" (UTC), the moment the tweet was posted.

The datacenter ID identifies the machine that generated the tweet ID, while the sequence number keeps each ID unique even when the same machine creates several within the same millisecond.
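
The same breakdown can be reproduced in a few lines of Python. This is only a sketch: it assumes the plain 41/10/12 layout described earlier and uses the epoch value quoted above; the variable names are my own.

```python
from datetime import datetime, timedelta, timezone

TWITTER_EPOCH_MS = 1288834974657  # custom epoch quoted in the article
TWEET_ID = 1851515326581916096    # the post ID shown above

elapsed_ms = TWEET_ID >> 22            # milliseconds since the custom epoch
machine_id = (TWEET_ID >> 12) & 0x3FF  # 10-bit data center/machine field
sequence = TWEET_ID & 0xFFF            # 12-bit per-millisecond counter

unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
epoch_start = unix_epoch + timedelta(milliseconds=TWITTER_EPOCH_MS)
posted_at = unix_epoch + timedelta(milliseconds=TWITTER_EPOCH_MS + elapsed_ms)

print("epoch starts:", epoch_start.isoformat())
print("posted at:   ", posted_at.isoformat())
print("machine:", machine_id, "sequence:", sequence)
```

Note that the timestamp is only meaningful once you know the epoch: the same 41 bits decode to a different date under a different epoch.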

The good, the bad and the ugly!


The Good:

  • Small Footprint: The 64-bit structure of Snowflake IDs makes them compact and efficient for storage.
  • Sortable: Snowflake IDs include a timestamp component, ensuring that IDs are roughly ordered by time.
  • Usable Components: Because the components already carry meaningful data (creation time, generating machine), that data can be read back and used by any part of the system.
  • Customizable: You can change the allocation of bits between the data center ID and the sequence number as needed. Basically, you have 22 bits (10 + 12) to divide however your system requires.
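
To see the "Sortable" point in action, here is a minimal Python sketch. The helper `make_id` and the sample values are invented for the illustration; it assumes the 41/10/12 layout from earlier.

```python
def make_id(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    """Build an ID with the 41/10/12 layout (no range checks, illustration only)."""
    return (timestamp_ms << 22) | (machine_id << 12) | sequence


# IDs arriving out of order from different machines.
events = [
    make_id(1_005, machine_id=2, sequence=0),
    make_id(1_000, machine_id=7, sequence=1),
    make_id(1_000, machine_id=7, sequence=0),
    make_id(1_003, machine_id=1, sequence=0),
]

# A plain numeric sort puts them in timestamp order, because the
# timestamp occupies the most significant bits.
ordered = sorted(events)
timestamps = [snowflake >> 22 for snowflake in ordered]
print(timestamps)
```

The ordering is only "rough": two IDs created in the same millisecond on different machines are ordered by machine ID, not by which one was really created first.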

The Bad:

  • Not Globally Unique: Snowflake IDs are unique within one distributed system but may not be globally unique across different companies/systems.
  • Limited Numbers for Components: The number of bits allocated to each component caps how many distinct values it can hold. For example, a 10-bit Data Center/Machine ID can only distinguish 1,024 machines.
  • Complex Configuration: Properly configuring and managing the allocation of bits and unique identifiers for data centers and machines can become complicated, especially in large distributed systems.
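
The trade-off behind these limits is simple arithmetic, sketched below in Python. The function name `capacity` is hypothetical, and the 69-year figure is my own calculation from the 41-bit timestamp, not something taken from a vendor document.

```python
SHARED_BITS = 22  # bits left after the sign bit and the 41-bit timestamp


def capacity(machine_bits: int) -> tuple[int, int]:
    """Return (max machines, max IDs per machine per millisecond) for a given split."""
    if not 0 <= machine_bits <= SHARED_BITS:
        raise ValueError("machine_bits must be between 0 and 22")
    sequence_bits = SHARED_BITS - machine_bits
    return 1 << machine_bits, 1 << sequence_bits


for machine_bits in (8, 10, 12, 16):
    machines, ids_per_ms = capacity(machine_bits)
    print(f"{machine_bits:>2} machine bits: {machines:>6} machines, {ids_per_ms:>6} IDs/ms each")

# 41 bits of milliseconds cover a bit under 70 years past the chosen epoch.
TIMESTAMP_YEARS = (1 << 41) / (1000 * 60 * 60 * 24 * 365.25)
print(f"timestamp range: about {TIMESTAMP_YEARS:.1f} years")
```

Every bit you give to the machine ID halves the number of IDs each machine can issue per millisecond, and the other way around.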

The Ugly:

  • Clock Drift Issues and Dependency on Accurate Timekeeping: The system relies on precise time synchronization. If a machine's clock drifts or jumps backwards, it can produce non-sequential IDs or even duplicates.
  • Potential for ID Collisions: Without careful management and synchronization, Snowflake ID generation can lead to collisions or duplicated IDs, undermining the reliability of the system.
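
Generators usually defend against both problems. The Python sketch below shows one common approach under stated assumptions: a hypothetical `SnowflakeGenerator` class, the 41/10/12 layout, the X epoch as a default, and a policy of refusing to issue IDs when the clock moves backwards. Real systems may choose to wait or fall back differently.

```python
import threading
import time


class SnowflakeGenerator:
    """Hypothetical single-process generator using the 41/10/12 layout."""

    def __init__(self, machine_id: int, epoch_ms: int = 1288834974657, clock=None):
        if not 0 <= machine_id < 1024:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.epoch_ms = epoch_ms  # assumption: reuse the X epoch from this article
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock moved backwards: refuse rather than risk a duplicate ID.
                raise RuntimeError(f"clock moved back by {self._last_ms - now} ms")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # All 4096 values used this millisecond: spin until the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return ((now - self.epoch_ms) << 22) | (self.machine_id << 12) | self._sequence


generator = SnowflakeGenerator(machine_id=1)
print(generator.next_id())
```

This guard only covers one process. Two machines configured with the same machine ID will still collide, which is why assigning those IDs carefully matters so much.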

Snowflake IDs vs. GUIDs: A Potential Replacement?

Ahhh, no, definitely not. Comparing "GUIDs" and "Snowflake IDs" is like comparing a "Sea" and a "River": yes, at the baseline both are "water", but with huge differences.

GUID vs Snowflake IDs

GUIDs are "GLOBAL IDs": they are great when you need to ensure that an exact "label" is unique across the globe.

Snowflake IDs are "SYSTEM IDs": they are great when all the resources in your own system need to recognize that "label".
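
One concrete difference is size. The short Python sketch below compares a random GUID from the standard `uuid` module with the X post ID used earlier; it only illustrates storage footprint, not uniqueness guarantees.

```python
import uuid

guid = uuid.uuid4()              # random 128-bit identifier, no coordination needed
snowflake = 1851515326581916096  # the X post ID from earlier in the article

guid_bytes = len(guid.bytes)
snowflake_bytes = (snowflake.bit_length() + 7) // 8

print(guid, "->", guid_bytes, "bytes")
print(snowflake, "->", snowflake_bytes, "bytes")
```

The GUID needs twice the space and carries no machine or sequence information, but it can be generated anywhere without assigning machine IDs first.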

Still here!? You are really interested!!!

Here are some Snowflake ID references!
