jiisanda🙆‍♂️

Posted on Mar 19

UUID or ULID: Awesomeness of Unique Identifiers!

#uniqueidentifiers #uuid #ulid #database

Welcome! In this article we are going to have the showdown of two prominent choices of unique identifiers: UUID and ULID! 😲

In the landscape of software development, generating unique identifiers has always been a crucial challenge. Whether it's managing database keys, tracking events in distributed systems, or ensuring session security, the choice of identifier can significantly impact the efficiency and performance of your application. In this showdown both parties present their strengths, the factors that could make you choose one over the other, a glimpse of their implementation, and their weaknesses. So sit back, get some popcorn 🍿, and get ready for a showdown that will empower you to make an informed decision for your next project.

Stage Setup

In software development, unique identifiers play a crucial role in ensuring data integrity, system scalability, and security. They act as unique markers for various entities within a system, like database records, distributed events, and user sessions.

Traditional auto-incrementing IDs, while simple, can become problematic at scale, leading to performance issues, collision risks, and data leakage.

If you need an explanation of any of these problems, do comment, happy to answer...

So, enter the two powerful alternatives in the world of unique identifiers: UUIDs and ULIDs!

UUIDs are the OGs of uniqueness. With their standardized format and 128-bit punch, these chads say, "I'm globally unique, baby!" 😎 as they boast their universal uniqueness across the cosmos 💫!

But then come ULIDs, the new kids who bring whole new moves to the game! ULIDs ain't just unique, they're lexicographically sortable too, baby! With a blend of timestamp sweetness and randomness, ULIDs slide into your codebase like a smooth saxophone 🎷 on a summer night.

So what's the fuss, you ask? Well, UUIDs bring tried-and-true reliability, perfect for when you need global uniqueness. But ULIDs? They are all about time-based sorting, making them the go-to identifiers wherever chronological order matters.

UUIDs (Universally Unique Identifiers)

UUIDs are different from sequential IDs. RFC 4122 says:

UUIDs are of a fixed size (128 bits) which is reasonably small compared to other alternatives. This lends itself well to sorting, ordering, and hashing of all sorts, storing in databases, simple allocation, and ease of programming in general.

Layout and Byte order of UUID

timestamp

60 bit value
UUID1: Coordinated Universal Time (UTC), counted in 100-nanosecond intervals since 00:00:00.00, 15 October 1582
UUID3 and 5: timestamp (60 bits) is constructed from a name (see algo below)
UUID4: timestamp is randomly or pseudo-randomly generated. (see algo)

clock-sequence (14 bit)

UUID1: the clock sequence helps avoid duplicates that could arise when the clock is set backwards in time or when the node ID changes.
UUID 3 and 5: 14-bit value constructed from a name (see Algo below)
UUID4: randomly or pseudo-randomly generated. (see algo)

node (48-bit)

UUID1: node field is an IEEE MAC address, usually host address.
UUID 3 and 5: Constructed from name (see algo)
UUID4: randomly or pseudo-randomly generated (see algo)
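To make the three fields above concrete, here is a minimal sketch that pulls them out of an existing UUID with Python's standard uuid module. The sample value is the DNS namespace UUID, which happens to be a version 1 UUID; the variable names are just for illustration.

```python
import uuid

# The DNS namespace ID is itself a version-1 UUID, so it makes a stable sample.
u = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")

print(f"version   : {u.version}")
print(f"timestamp : {u.time:#x} (60 bits)")
print(f"clock_seq : {u.clock_seq:#x} (14 bits)")
print(f"node      : {u.node:#014x} (48 bits)")

# The 60-bit timestamp is stored split across three fields of the UUID.
rebuilt = ((u.time_hi_version & 0x0FFF) << 48) | (u.time_mid << 32) | u.time_low
print("timestamp rebuilt from fields matches:", rebuilt == u.time)
```

The same attributes exist on every UUID object, but the timestamp and clock sequence are only meaningful as a clock reading for version 1.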

RFC 4122's "Basic Algorithm" section states an algorithm for generating UUIDs when they do not need to be generated frequently, but it had some issues, and hence different versions of UUIDs were introduced.

Let's look briefly at each and peek at its Python implementation...

UUID1 (MAC Address + timestamp)

UUID1 concatenates the 48-bit MAC Address of the "node" (computer generating the UUID), with a 60-bit timestamp. The Python implementation is as follows:



def uuid1(node=None, clock_seq=None) -> UUID:
    """Generate a UUID from a host ID, sequence number, and the current time.
    If 'node' is not given, getnode() is used to obtain the hardware
    address.  If 'clock_seq' is given, it is used as the sequence number;
    otherwise a random 14-bit sequence number is chosen."""

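    # some code (the 60-bit timestamp and the clock_seq are computed here)

    # Split the 60-bit timestamp into the three time fields
    time_low = timestamp & 0xffffffff
    time_mid = (timestamp >> 32) & 0xffff
    time_hi_version = (timestamp >> 48) & 0x0fff
    # Split the 14-bit clock sequence into its low byte and high 6 bits
    clock_seq_low = clock_seq & 0xff
    clock_seq_hi_variant = (clock_seq >> 8) & 0x3f

    return UUID(fields=(time_low, time_mid, time_hi_version,
                        clock_seq_hi_variant, clock_seq_low, node), version=1)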
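Since a version 1 UUID embeds its creation time, that time can be read back out. The following is a rough sketch using only the standard library; the helper name uuid1_datetime and the node value 0x123456789ABC are made up for this example.

```python
import uuid
from datetime import datetime, timezone

# Number of 100-ns intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def uuid1_datetime(u: uuid.UUID) -> datetime:
    """Recover the UTC creation time embedded in a version-1 UUID."""
    if u.version != 1:
        raise ValueError("only version-1 UUIDs carry a clock timestamp")
    unix_seconds = (u.time - UUID_EPOCH_OFFSET) / 10_000_000
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


# A fixed, made-up node value keeps the real MAC address out of the output.
u = uuid.uuid1(node=0x123456789ABC)
print(u, "->", uuid1_datetime(u).isoformat())
```

Passing an explicit node is also a simple way to avoid leaking the host's MAC address, which is one of the usual objections to version 1.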

UUID 3 and 5

Versions 3 and 5 are name-based UUIDs: they are derived from a namespace and a name within that namespace. Example name spaces include the Domain Name System, URLs, and reserved words in a programming language. Python's uuid module predefines the following namespace IDs:



NAMESPACE_DNS = UUID('6ba7b810-9dad-11d1-80b4-00c04fd430c8')
NAMESPACE_URL = UUID('6ba7b811-9dad-11d1-80b4-00c04fd430c8')
NAMESPACE_OID = UUID('6ba7b812-9dad-11d1-80b4-00c04fd430c8')
NAMESPACE_X500 = UUID('6ba7b814-9dad-11d1-80b4-00c04fd430c8')



def uuid3(namespace, name):
    """Generate a UUID from the MD5 hash of a namespace UUID and a name."""

    # some code    

    digest = md5(
        namespace.bytes + name,
        usedforsecurity=False
    ).digest()
    return UUID(bytes=digest[:16], version=3)

def uuid5(namespace, name):
    """Generate a UUID from the SHA-1 hash of a namespace UUID and a name."""

    # some code

    hash = sha1(namespace.bytes + name).digest()
    return UUID(bytes=hash[:16], version=5)
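To see what the version argument does to the hash, here is a hand-rolled sketch of the same idea, checked against the standard library. The function name name_based_uuid is hypothetical, and the usedforsecurity flag assumes Python 3.9 or newer.

```python
import hashlib
import uuid


def name_based_uuid(namespace: uuid.UUID, name: str, version: int) -> uuid.UUID:
    """Hand-rolled version 3 (MD5) or version 5 (SHA-1) UUID, for illustration."""
    if version not in (3, 5):
        raise ValueError("version must be 3 or 5")
    hasher = hashlib.md5 if version == 3 else hashlib.sha1
    data = namespace.bytes + name.encode("utf-8")
    raw = bytearray(hasher(data, usedforsecurity=False).digest()[:16])
    raw[6] = (raw[6] & 0x0F) | (version << 4)  # high nibble of byte 6 holds the version
    raw[8] = (raw[8] & 0x3F) | 0x80            # top two bits of byte 8 mark the RFC 4122 variant
    return uuid.UUID(bytes=bytes(raw))


manual = name_based_uuid(uuid.NAMESPACE_DNS, "python.org", 5)
builtin = uuid.uuid5(uuid.NAMESPACE_DNS, "python.org")
print(manual, manual == builtin)
```

Because the result depends only on the namespace and the name, calling it twice with the same inputs always yields the same UUID, which is the whole point of versions 3 and 5.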

UUID4

Version 4 is meant for generating UUIDs from truly random or pseudo-random numbers.



def uuid4():
    """Generate a random UUID."""
    return UUID(bytes=os.urandom(16), version=4)

Note that if none of hex, bytes, bytes_le, fields, or int is given, the UUID() constructor raises a TypeError saying that one of those arguments must be given.

That was all about UUIDs. If you want to know more, do look at RFC 4122...
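As a quick illustration of those constructor arguments, the sketch below builds the same UUID four ways and then triggers the TypeError on purpose. The sample value is arbitrary.

```python
import uuid

text = "12345678-1234-5678-1234-567812345678"

# Four equivalent ways to describe the same 128-bit value.
from_hex = uuid.UUID(hex=text)
from_int = uuid.UUID(int=from_hex.int)
from_bytes = uuid.UUID(bytes=from_hex.bytes)
from_fields = uuid.UUID(fields=from_hex.fields)

same = from_hex == from_int == from_bytes == from_fields
print("all four forms agree:", same)

try:
    uuid.UUID()  # no source argument at all
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```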

ULIDs (Universally Unique Lexicographically Sortable Identifiers)

A ULID is a 128-bit value, compatible with UUIDs, and up to 1.21e+24 (2^80) unique ULIDs can be generated per millisecond. As the name suggests, they are lexicographically sortable. ULIDs are case insensitive and contain no special characters, so they are URL safe.

Layout

In general, the structure of a ULID is as follows:



 01AN4Z07BY      79KA1307SR9X4MV3

|----------|    |----------------|
 Timestamp          Randomness
   48bits             80bits

timestamp

48 bit integer
UNIX-time in milliseconds
Won't run out of space 'til the year 10889 AD.

randomness

80 bits
Cryptographically secure source of randomness, if possible.

Sorting, Encoding, and Monotonicity

The left-most character must be sorted first, and the right-most character sorted last (lexical order). The default ASCII character set is used. For encoding Crockford's Base32 is used as shown below. This alphabet excludes I, L, O and U to avoid confusion and abuse.



0123456789ABCDEFGHJKMNPQRSTVWXYZ
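To show how the layout and the alphabet fit together, here is a minimal pure-Python sketch that packs a timestamp and randomness into 26 characters and reads the timestamp back. The function names are hypothetical and this is not the code of any particular ULID library.

```python
import os
import time

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford's Base32


def encode_ulid(timestamp_ms: int, randomness: int) -> str:
    """Pack a 48-bit timestamp and an 80-bit random value into 26 characters."""
    if not 0 <= timestamp_ms < 1 << 48:
        raise ValueError("timestamp must fit in 48 bits")
    if not 0 <= randomness < 1 << 80:
        raise ValueError("randomness must fit in 80 bits")
    value = (timestamp_ms << 80) | randomness
    # 26 characters * 5 bits = 130 bits, so the top two bits are always zero.
    return "".join(ALPHABET[(value >> shift) & 0x1F] for shift in range(125, -1, -5))


def decode_timestamp_ms(ulid: str) -> int:
    """Read the millisecond timestamp back from the first 10 characters."""
    value = 0
    for char in ulid[:10].upper():
        value = (value << 5) | ALPHABET.index(char)
    return value


def new_ulid() -> str:
    """Current time in milliseconds plus 80 bits from the OS random source."""
    now_ms = time.time_ns() // 1_000_000
    return encode_ulid(now_ms, int.from_bytes(os.urandom(10), "big"))


sample = new_ulid()
print(sample, decode_timestamp_ms(sample))
```

Since the timestamp occupies the most significant bits, comparing two of these strings character by character compares their timestamps first.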

When several ULIDs are generated within the same millisecond, a monotonic generator can still guarantee their sort order. If the same millisecond is detected, the random component is incremented by 1 in the least significant bit position (with carrying).
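The same-millisecond rule can be sketched as a small generator class. The class name MonotonicULID and its injectable clock are assumptions made for this example, treating a clock that moves backwards like "same millisecond" is a design choice of the sketch, and it is not thread-safe.

```python
import os
import time

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford's Base32


class MonotonicULID:
    """Illustrative generator: IDs from one instance always sort in creation order."""

    def __init__(self, clock=lambda: time.time_ns() // 1_000_000):
        self._clock = clock      # returns the current time in milliseconds
        self._last_ms = -1
        self._last_random = 0

    def generate(self) -> str:
        now_ms = self._clock()
        if now_ms <= self._last_ms:
            # Same millisecond (or, by assumption here, a clock that went
            # backwards): reuse the timestamp and bump the random part by one.
            self._last_random += 1
            if self._last_random >= 1 << 80:
                raise OverflowError("randomness exhausted within one millisecond")
        else:
            self._last_ms = now_ms
            self._last_random = int.from_bytes(os.urandom(10), "big")
        value = (self._last_ms << 80) | self._last_random
        return "".join(ALPHABET[(value >> s) & 0x1F] for s in range(125, -1, -5))


# Freeze the clock to force every ID into the same millisecond.
frozen = MonotonicULID(clock=lambda: 1_700_000_000_000)
ids = [frozen.generate() for _ in range(1000)]
print(ids[0], ids[1], ids == sorted(ids))
```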

Usage

You would usually create a new ULID object by calling the default constructor with no arguments. In that case it fills the timestamp part with the current datetime. To encode the object, you usually convert it to a string.

You can also create ULIDs from other values passed as arguments: from a timestamp, from a UUID, from hex or bytes, from a string, or from a datetime.

Advantages of ULID over UUIDs

Shorter string representation (26 characters in ULIDs vs 36 in UUIDs).
Sortability for efficient ordering and retrieval.
Potential performance benefits in certain scenarios, like:
- In databases that use sorted indexes, ULIDs can potentially improve query performance because new IDs follow the existing sort order of the index.
- When working with time-series data, ULIDs (which always include a timestamp component) can be stored and retrieved in chronological order without additional sorting.
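One practical consequence of the timestamp prefix is that a time window maps to a plain string range, which a sorted index can scan directly. Here is a small sketch of computing those bounds; the helper names are hypothetical and the datetimes are assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford's Base32
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_prefix(moment: datetime) -> str:
    """Encode a timezone-aware datetime as the 10-character ULID time prefix."""
    ms = (moment - EPOCH) // timedelta(milliseconds=1)  # exact integer milliseconds
    if not 0 <= ms < 1 << 48:
        raise ValueError("moment is outside the 48-bit ULID timestamp range")
    return "".join(ALPHABET[(ms >> shift) & 0x1F] for shift in range(45, -1, -5))


def ulid_bounds(start: datetime, end: datetime) -> tuple[str, str]:
    """Smallest and largest ULID strings whose timestamps fall in [start, end]."""
    return timestamp_prefix(start) + "0" * 16, timestamp_prefix(end) + "Z" * 16


lo, hi = ulid_bounds(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
)
# Any ULID created in that window satisfies lo <= ulid <= hi as a string.
print(lo, hi)
```

With bounds like these, a query over an ID column could filter with a simple range comparison instead of a separate created-at column, assuming the IDs are stored as uppercase strings.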

Choosing the Right Champion

Now that we've explored both UUIDs and ULIDs, let's help you pick the champion for your next project!

Here is a quick comparison:

Feature	UUIDs	ULIDs
Uniqueness	Practically guaranteed (collisions are extremely unlikely)	Practically guaranteed (collisions are extremely unlikely)
Sortability	No	Yes (lexicographically)
String Length	36 characters	26 characters
Performance	Generally good	Potentially better with sorted indexes/time-series data

When to choose ULIDs

Sortability is essential: ULIDs excel when you need to efficiently sort or filter your identifiers.
Performance optimization matters: In scenarios with sorted indexes or time-series data, ULIDs can potentially offer performance benefits.
Compactness is desired: The shorter string length of ULIDs can be a space-saving advantage.

When to use UUIDs

Focus on collision resistance: A version 4 UUID carries 122 random bits, against 80 random bits per millisecond for a ULID, so if minimizing collision risk is paramount, UUIDs are the established choice.
Sorting isn't a priority: If order doesn't matter for your identifiers, UUIDs function perfectly well.
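To put rough numbers on collision risk, here is a back-of-the-envelope sketch using the birthday-bound approximation. It assumes purely random generation with 122 random bits for a version 4 UUID and 80 random bits for ULIDs created within one millisecond; the function name and the figure of one million IDs are just for illustration.

```python
import math


def collision_probability(ids: int, random_bits: int) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    return -math.expm1(-ids * (ids - 1) / 2 ** (random_bits + 1))


# One million identifiers created inside the same millisecond,
# which is the worst case for a ULID.
n = 1_000_000
uuid4_p = collision_probability(n, 122)  # UUIDv4: 122 random bits
ulid_p = collision_probability(n, 80)    # ULID: 80 random bits per millisecond

print(f"UUIDv4: {uuid4_p:.3e}")
print(f"ULID  : {ulid_p:.3e}")
```

Both probabilities are tiny, so for most projects this difference matters far less than sortability or string length.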

Ultimately, the best choice depends on your specific project requirements. Weigh the importance of uniqueness, sortability, performance, and string length to make an informed decision.

Example

Let's take a look at how you can generate a UUID and a ULID in Python...



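# Generate a ULID (assumes the python-ulid package: pip install python-ulid)

from ulid import ULID

new_ulid = ULID()  # the default constructor fills in the current time plus randomness

print(f"ULID: {new_ulid}")  # Prints a 26-character string such as 01AN4Z07BY79KA1307SR9X4MV3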



# Generate a UUID (version 4 - random)

import uuid

random_uuid = uuid.uuid4()  # a distinct name avoids shadowing the uuid module

print(f"UUID: {random_uuid}")  # Output has the form xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx (y is 8, 9, a, or b)

Conclusion

This article has explored the strengths and weaknesses of two contenders in the unique identifier arena: UUIDs and ULIDs.

Key Takeaways

Both UUIDs and ULIDs make collisions extremely unlikely, a crucial aspect for data integrity and security.
UUIDs reign supreme when you prioritize collision resistance and don't require sorting capabilities.
ULIDs shine when sortability and potentially improved performance are key considerations, thanks to their lexicographic sorting and timestamp component.
The compact string representation of ULIDs (26 characters) offers a space-saving advantage compared to UUIDs (36 characters).
