Flavio Rosselli

Posted on Oct 2

Everything you need to know about UUID.

#uuid #development #javascript #database

A Universally Unique Identifier (UUID) is a 128-bit label used in computer systems to identify information uniquely. UUIDs are designed to be unique across space and time, allowing them to be generated independently without a central authority, minimising the risk of duplication.

UUIDs serve various purposes, including:

Identifying records in databases.
Tagging objects in distributed systems.
Serving as primary keys in applications where uniqueness is critical.

Real-world Use Cases

Databases: UUIDs are often used as primary keys in relational databases to ensure the unique identification of records.
Microservices: UUIDs facilitate communication between services by providing unique identifiers for requests and resources.
IoT Devices: UUIDs identify devices uniquely in a network, ensuring that data from multiple sources can be aggregated without conflicts.

Advantages and Disadvantages of Using UUIDs

Advantages:

Global Uniqueness: UUIDs are extremely unlikely to collide, making them suitable for distributed systems where multiple nodes generate identifiers independently.
No Central Authority Required: They can be generated without coordination, which simplifies operations in distributed environments.
Scalability: They work well in systems that require scaling across multiple servers or services.

Disadvantages:

Storage Size: A UUID takes 128 bits (16 bytes) in binary form, or 36 characters as text, compared with the 32 or 64 bits of a traditional integer ID. This increases storage and index size.
Performance Issues: Random UUIDs (such as version 4) land at arbitrary positions in an index instead of at the end. Combined with their size, this can make inserts and lookups slower than with sequential IDs.
User Unfriendliness: UUIDs are not easily memorable or user-friendly when presented in user interfaces.
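
To make the storage point concrete, here is a minimal sketch that packs the 36-character text form into the 16 raw bytes it represents, and back again. The helper names `uuidToBytes` and `bytesToUuid` are made up for this illustration, and the code assumes Node.js (for `Buffer`).

```javascript
// Hypothetical helpers: convert between the 36-character text form and 16 raw bytes.
const UUID_TEXT = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function uuidToBytes(uuid) {
  if (!UUID_TEXT.test(uuid)) {
    throw new TypeError("not a UUID in 8-4-4-4-12 form");
  }
  // Strip the hyphens and decode the 32 hex digits into 16 bytes.
  return Buffer.from(uuid.replace(/-/g, ""), "hex");
}

function bytesToUuid(bytes) {
  const hex = Buffer.from(bytes).toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

const text = "550e8400-e29b-41d4-a716-446655440000";
const packed = uuidToBytes(text);
console.log(text.length, packed.length); // 36 16
```

Storing the 16-byte form (or using a native UUID column type where the database offers one) avoids paying for the hyphens and the hex encoding on every row.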

The Standard

The standard representation of a UUID consists of 32 hexadecimal digits divided into five groups, separated by hyphens, following the format 8-4-4-4-12, resulting in a total of 36 characters (32 hexadecimal digits plus 4 hyphens).

The UUID format can be visualized as follows:

xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx

Where:

M indicates the UUID version.
N indicates the variant: its most significant bits tell a reader how to interpret the UUID's layout.
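
As a rough illustration of this layout, the sketch below checks a string against the 8-4-4-4-12 pattern with a regular expression. The function name `isUuidShaped` is invented here, and the check covers shape only: it does not verify that the version or variant digits hold meaningful values.

```javascript
// 8-4-4-4-12 hexadecimal digits, case-insensitive.
const UUID_SHAPE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: true when the value looks like a UUID in canonical text form.
function isUuidShaped(value) {
  return typeof value === "string" && UUID_SHAPE.test(value);
}

console.log(isUuidShaped("550e8400-e29b-41d4-a716-446655440000")); // true
console.log(isUuidShaped("550e8400e29b41d4a716446655440000"));     // false (no hyphens)
```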

Components of a UUID

The field names below come from the time-based (version 1) layout; other versions fill the same positions with different data.

TimeLow: 4 bytes (8 hex characters) representing the low field of the timestamp.
TimeMid: 2 bytes (4 hex characters) representing the middle field of the timestamp.
TimeHighAndVersion: 2 bytes (4 hex characters) that include the version number and the high field of the timestamp.
ClockSequence: 2 bytes (4 hex characters) whose most significant bits hold the variant. The remaining bits help avoid collisions, especially when multiple UUIDs are generated in quick succession or if the system clock is adjusted.
Node: 6 bytes (12 hex characters), typically representing the MAC address of the generating node.
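
The following sketch splits a version 1 UUID into these fields and rebuilds its timestamp. The function name `parseV1Fields` is hypothetical. The sample value is the version 1 test vector from RFC 9562. The sketch relies on the version 1 timestamp counting 100-nanosecond intervals since October 15, 1582.

```javascript
// Offset between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01),
// expressed in 100-nanosecond intervals.
const GREGORIAN_OFFSET = 122192928000000000n;

// Hypothetical helper: assumes a well-formed version 1 UUID string.
function parseV1Fields(uuid) {
  const [timeLow, timeMid, timeHighAndVersion, clockSequence, node] =
    uuid.toLowerCase().split("-");

  // Drop the version digit, then reassemble the 60-bit timestamp: high | mid | low.
  const timeHigh = timeHighAndVersion.slice(1);
  const timestamp = BigInt("0x" + timeHigh + timeMid + timeLow);

  // 10,000 intervals of 100 ns make one millisecond.
  const unixMillis = Number((timestamp - GREGORIAN_OFFSET) / 10000n);

  return {
    timeLow,
    timeMid,
    timeHighAndVersion,
    clockSequence, // still includes the variant bits at the top
    node,
    version: parseInt(timeHighAndVersion[0], 16),
    date: new Date(unixMillis),
  };
}

const fields = parseV1Fields("c232ab00-9414-11ec-b3c8-9f6bdeced846");
console.log(fields.version);            // 1
console.log(fields.node);               // 9f6bdeced846
console.log(fields.date.toISOString()); // 2022-02-22T19:22:22.000Z
```

`BigInt` is used because the 60-bit timestamp does not fit safely in a JavaScript number.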

Types of UUIDs

Version 1: Time-based UUIDs that use a combination of the current timestamp and the MAC address of the generating node. This version ensures uniqueness across space and time.
Version 2: Similar to version 1 but includes local domain identifiers; however, it is less commonly used due to its limitations.
Version 3: Name-based UUIDs generated using an MD5 hash of a namespace identifier and a name.
Version 4: Randomly generated UUIDs. Of the 128 bits, 122 are random and only 6 are reserved for the version and variant.
Version 5: Like version 3 but uses SHA-1 instead of MD5 for hashing. It is generally preferred over version 3, although neither hash should be relied on for security.

Variants

The variant field in a UUID determines its layout and interpretation. It is encoded in the most significant bits of N, and four variants are defined:

Variant 0 (bits 0xxx): Reserved for NCS backward compatibility.
Variant 1 (bits 10xx): The standard layout defined by RFC 4122, used for most UUIDs.
Variant 2 (bits 110x): Reserved for Microsoft backward compatibility (legacy GUIDs).
Variant 3 (bits 111x): Reserved for future definitions.

Example

For Version 4, a UUID might look like this:

550e8400-e29b-41d4-a716-446655440000

Here:

The leading 4 in the third group (41d4) indicates version 4.
The leading a in the fourth group (a716) is 1010 in binary; its top two bits, 10, identify the common "Leach-Salz" (RFC 4122) variant.
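
The same reading can be done in code. This is a minimal sketch with a hypothetical helper, `describeUuid`, which assumes its input is already a well-formed UUID string. It picks out the M and N digits and classifies the variant by its leading bits.

```javascript
// Hypothetical helper: reads the version (M) and variant (N) digits of a UUID.
function describeUuid(uuid) {
  const hex = uuid.replace(/-/g, "");
  const version = parseInt(hex[12], 16); // M: first digit of the third group
  const n = parseInt(hex[16], 16);       // N: first digit of the fourth group

  let variant;
  if ((n & 0b1000) === 0b0000) variant = "NCS (0xxx)";
  else if ((n & 0b1100) === 0b1000) variant = "RFC 4122 / Leach-Salz (10xx)";
  else if ((n & 0b1110) === 0b1100) variant = "Microsoft (110x)";
  else variant = "reserved (111x)";

  return { version, variant };
}

console.log(describeUuid("550e8400-e29b-41d4-a716-446655440000"));
// { version: 4, variant: 'RFC 4122 / Leach-Salz (10xx)' }
```

Because only the top two bits of N are fixed for the RFC 4122 variant, N can be 8, 9, a, or b in any standard UUID.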

How UUIDs are Calculated

Version 1 (Time-based):
- The timestamp is typically the number of 100-nanosecond intervals since October 15, 1582 (the date of the Gregorian calendar reform).
- The node is the MAC address of the machine generating the UUID.
- The clock sequence helps ensure uniqueness when the clock is set backwards or the node ID changes (e.g., after a system restart or a time correction).
Version 3 and Version 5 (Name-based):
- A namespace (itself a UUID, such as the predefined one for DNS names) is concatenated with a name (like a domain name, file path, or URL) and hashed.
- The first 128 bits of the hash (MD5 for version 3, SHA-1 for version 5) are then laid out as a UUID, with the version and variant bits overwritten. The same namespace and name always produce the same UUID.
Version 4 (Random-based):
- Random or pseudo-random data fills 122 of the UUID's 128 bits, ideally from a cryptographically secure generator.
- The remaining 6 bits are set to the version (0100) and the variant (10), ensuring compliance with UUID standards.
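
The name-based steps can be sketched with Node's built-in `crypto` module. This is a simplified illustration of version 5, not a replacement for a maintained library, and the function name `uuidV5` is an assumption of this sketch. The namespace constant is the predefined DNS namespace UUID.

```javascript
const crypto = require("crypto");

// Predefined namespace UUID for DNS names.
const NAMESPACE_DNS = "6ba7b810-9dad-11d1-80b4-00c04fd430c8";

// Hypothetical helper: derives a version 5 UUID from a name and a namespace UUID.
function uuidV5(name, namespace) {
  const namespaceBytes = Buffer.from(namespace.replace(/-/g, ""), "hex");
  const nameBytes = Buffer.from(name, "utf8");

  // SHA-1 yields 20 bytes; only the first 16 are kept.
  const bytes = crypto
    .createHash("sha1")
    .update(namespaceBytes)
    .update(nameBytes)
    .digest()
    .subarray(0, 16);

  bytes[6] = (bytes[6] & 0x0f) | 0x50; // version 5 in the high nibble of byte 6
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant bits 10 at the top of byte 8

  const hex = bytes.toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

const first = uuidV5("python.org", NAMESPACE_DNS);
const second = uuidV5("python.org", NAMESPACE_DNS);
console.log(first === second); // true: same namespace and name, same UUID
```

Swapping `"sha1"` for `"md5"` and `0x50` for `0x30` would give the version 3 equivalent.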

UUIDv4 Calculation Example

Step 1: Generate 128 Random Bits

Let's assume we generate the following 128-bit random value:

11001100110101101101010101111010101110110110111001011101010110110101111011010011011110100100101111001011001011011001101001110001

Step 2: Apply UUIDv4 Version and Variant

Version: Replace bits 12-15 of the third group (its first hex character, the 13th overall) with 0100 (for UUID version 4).
Original: 0101 becomes 0100 → the byte 01011101 (5d) becomes 01001101 (4d).
Variant: Replace bits 6-7 (the top two bits) of the 9th byte with 10 (for the RFC 4122 variant).
Original: 01 becomes 10 → the byte 01011110 (5e) becomes 10011110 (9e).

Step 3: Format into Hexadecimal

Convert the 128-bit binary into 5 hexadecimal groups:

32-bit group: 11001100110101101101010101111010 → ccd6d57a
16-bit group: 1011101101101110 → bb6e
16-bit group: 0100110101011011 → 4d5b (with 0100 for version 4)
16-bit group: 1001111011010011 → 9ed3 (with 10 for the variant)
48-bit group: 011110100100101111001011001011011001101001110001 → 7a4bcb2d9a71

Step 4: Combine the Groups

The final UUID would look like this:
ccd6d57a-bb6e-4d5b-9ed3-7a4bcb2d9a71
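
The four steps can also be sketched in code. The helper name `uuidV4FromBytes` is hypothetical; it takes the 16 random bytes as an argument so the bit manipulation can be checked with fixed inputs. In practice, Node.js and modern browsers already provide `crypto.randomUUID()`, which is the simpler choice.

```javascript
const crypto = require("crypto");

// Hypothetical helper: turns 16 bytes into a version 4 UUID string.
function uuidV4FromBytes(randomBytes) {
  if (randomBytes.length !== 16) {
    throw new RangeError("expected exactly 16 bytes");
  }
  const bytes = Buffer.from(randomBytes); // copy, so the caller's data is untouched

  // Step 2: overwrite the version and variant bits.
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // top 4 bits of the 7th byte become 0100
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // top 2 bits of the 9th byte become 10

  // Steps 3 and 4: hex-encode and join the five groups with hyphens.
  const hex = bytes.toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

// Step 1: 128 random bits from a cryptographically secure source.
console.log(uuidV4FromBytes(crypto.randomBytes(16)));

// Fixed inputs make the effect of the two masks visible.
console.log(uuidV4FromBytes(Buffer.alloc(16, 0x00))); // 00000000-0000-4000-8000-000000000000
console.log(uuidV4FromBytes(Buffer.alloc(16, 0xff))); // ffffffff-ffff-4fff-bfff-ffffffffffff
```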

Top comments (5)

Samuel Rouse • Oct 2

I don't think it's a standard yet, but UUIDv7 promises to solve one of the biggest issues with UUID, which is the lack of sequencing. While maintaining sufficient randomness to avoid overlap, it provides a sequence of events which can be extremely useful for partitioning data, archiving records, and providing more of a guarantee of sequence.

Mike Talbot ⭐ • Oct 3 • Edited

I base so much of my code on v5 GUIDs. Unique for the same data, so bloody helpful, cache keys, filenames for s3, and on and on.



const { v5 } = require("uuid")

const NAMESPACE = "fd671d64-7115-431e-93b0-fc518f1f9944"

function deriveGuidFrom(...data) {
    return v5(JSON.stringify(data), NAMESPACE)
}

module.exports = { deriveGuidFrom }

Then you can just do cache keys like:



     const cacheKey = deriveGuidFrom(tableName, filters, sortOrder)



     const fileName = deriveGuidFrom(fileContents)

Samuel Rouse • Oct 3

This is an interesting idea. Is a v5 faster/better than other hashing options? Is the benefit primarily that the output is a UUID?

Mike Talbot ⭐ • Oct 3

Yeah, the benefit is it makes a short key out of what would be a much longer hash, etc. It's comparable with other keys quickly. I'm always storing things in Redis using such a key, or for instance, I name my files in s3 using v5 guids, then I don't need to bother virus scanning or uploading a file that already exists.

Scott Reno • Oct 2

Interesting! I had no idea how UUIDs were generated
