A Universally Unique Identifier (UUID) is a 128-bit label used in computer systems to identify information uniquely. UUIDs are designed to be unique across space and time, allowing them to be generated independently without a central authority, minimising the risk of duplication.
UUIDs serve various purposes, including:
- Identifying records in databases.
- Tagging objects in distributed systems.
- Serving as primary keys in applications where uniqueness is critical.
Real-world Use Cases
- Databases: UUIDs are often used as primary keys in relational databases to ensure the unique identification of records.
- Microservices: UUIDs facilitate service communication by providing unique identifiers for requests and resources.
- IoT Devices: Identify devices uniquely in a network, ensuring that data from multiple sources can be aggregated without conflicts.
Advantages and Disadvantages of Using UUIDs
Advantages:
- Global Uniqueness: UUIDs are extremely unlikely to collide, making them suitable for distributed systems where multiple nodes generate identifiers independently.
- No Central Authority Required: They can be generated without coordination, which simplifies operations in distributed environments.
- Scalability: They work well in systems that require scaling across multiple servers or services.
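To put a rough number on "extremely unlikely to collide", the sketch below applies the standard birthday approximation, p ≈ 1 - exp(-n(n-1)/(2N)), to the 122 random bits of a version 4 UUID. The function name and the sample count of one billion are illustrative assumptions, not part of any UUID library.

```python
import math

def collision_probability(n: int, random_bits: int = 122) -> float:
    """Approximate chance that at least two of n random IDs collide.

    Uses the birthday approximation over a space of 2**random_bits values.
    """
    space = 2 ** random_bits
    exponent = n * (n - 1) / (2 * space)
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-exponent)

# Hypothetical workload: one billion version 4 UUIDs.
p = collision_probability(10**9)
print(f"collision probability for 1e9 UUIDs: {p:.3e}")
```

Even at that volume the estimated probability stays far below anything that matters in practice, provided the random source is of good quality.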
Disadvantages:
- Storage Size: UUIDs consume more space (128 bits, or 36 characters when stored as text) compared to traditional integer IDs (typically 32 or 64 bits), which can lead to increased storage costs.
- Performance Issues: Indexing UUIDs can degrade database performance due to their randomness and size, leading to slower query times compared to sequential IDs.
- User Unfriendliness: UUIDs are not easily memorable or user-friendly when presented in user interfaces.
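The storage cost depends heavily on how the UUID is stored. As a small illustration using Python's standard `uuid` module, the same value takes 16 bytes in binary form but 36 characters in its canonical text form; the variable names here are only for demonstration.

```python
import uuid

u = uuid.uuid4()

binary_size = len(u.bytes)   # raw 128-bit value
hex_size = len(u.hex)        # hex digits without hyphens
text_size = len(str(u))      # canonical hyphenated form

print(binary_size, hex_size, text_size)
```

Storing UUIDs in a native binary or UUID column type, where the database offers one, avoids more than doubling the size compared with a text column.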
The Standard
The standard representation of a UUID consists of 32 hexadecimal characters divided into five groups, separated by hyphens, following the format 8-4-4-4-12, resulting in a total of 36 characters (32 hexadecimal digits plus 4 hyphens).
The UUID format can be visualized as follows:
xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx
Where:
- M indicates the UUID version.
- N indicates the variant, which helps interpret the UUID's layout.
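As a minimal sketch of this layout, the following Python snippet splits 32 hexadecimal characters into the 8-4-4-4-12 groups and picks out the M and N positions. The helper names `format_uuid` and `version_and_variant_digits` are hypothetical and exist only for this illustration.

```python
import re

GROUP_SIZES = (8, 4, 4, 4, 12)  # hex characters per group

def format_uuid(hex32: str) -> str:
    """Insert hyphens into 32 hex characters to get the 8-4-4-4-12 form."""
    if not re.fullmatch(r"[0-9a-fA-F]{32}", hex32):
        raise ValueError("expected exactly 32 hexadecimal characters")
    groups, pos = [], 0
    for size in GROUP_SIZES:
        groups.append(hex32[pos:pos + size])
        pos += size
    return "-".join(groups).lower()

def version_and_variant_digits(text: str) -> tuple[str, str]:
    """Return the M and N hex digits from the canonical 36-character form."""
    return text[14], text[19]

formatted = format_uuid("550e8400e29b41d4a716446655440000")
m_digit, n_digit = version_and_variant_digits(formatted)
print(formatted, m_digit, n_digit)
```

Note that N carries the variant only in its most significant bits, so several different hex digits can indicate the same variant.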
Components of a UUID
- TimeLow: 4 bytes (8 hex characters) representing the low field of the timestamp.
- TimeMid: 2 bytes (4 hex characters) representing the middle field of the timestamp.
- TimeHighAndVersion: 2 bytes (4 hex characters) that include the version number and the high field of the timestamp.
- ClockSequence: 2 bytes (4 hex characters) used to help avoid collisions, especially when multiple UUIDs are generated in quick succession or if the system clock is adjusted.
- Node: 6 bytes (12 hex characters), typically representing the MAC address of the generating node.
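These fields can be inspected with Python's standard `uuid` module, which exposes them as attributes. The sketch below uses an arbitrary sample UUID; because the sample is not time-based, the values are just bits that happen to occupy those positions, not a real timestamp or MAC address.

```python
import uuid

u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

components = {
    "time_low": u.time_low,                          # 4 bytes
    "time_mid": u.time_mid,                          # 2 bytes
    "time_hi_version": u.time_hi_version,            # 2 bytes, includes the version
    "clock_seq_hi_variant": u.clock_seq_hi_variant,  # 1 byte, includes the variant
    "clock_seq_low": u.clock_seq_low,                # 1 byte
    "node": u.node,                                  # 6 bytes
}

for name, value in components.items():
    print(f"{name:22} 0x{value:x}")
```

The module splits the 2-byte clock sequence into a high byte (which also holds the variant bits) and a low byte.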
Types of UUIDs
Version 1: Time-based UUIDs that use a combination of the current timestamp and the MAC address of the generating node. This version ensures uniqueness across space and time.
Version 2: DCE Security UUIDs. Similar to version 1 but they embed a local domain identifier (such as a POSIX user or group ID); this version is rarely used due to its limitations.
Version 3: Name-based UUIDs generated using an MD5 hash of a namespace identifier and a name.
Version 4: Randomly generated UUIDs. Of the 128 bits, 122 are random and only 6 are reserved for the version and variant.
Version 5: Like version 3 but uses SHA-1 for hashing, a stronger hash than MD5, although neither version is meant to serve as a security mechanism.
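A short sketch with Python's standard `uuid` module shows these versions side by side. The module provides generators for versions 1, 3, 4, and 5 but none for version 2; the name "example.com" is just a placeholder.

```python
import uuid

v1 = uuid.uuid1()                                   # timestamp + node
v3 = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")  # MD5 of namespace + name
v4 = uuid.uuid4()                                   # random
v5 = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")  # SHA-1 of namespace + name

for u in (v1, v3, v4, v5):
    print(u.version, u)

# Name-based UUIDs are deterministic: same inputs, same UUID.
same_v5 = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
```

Running this twice gives different version 1 and version 4 values each time, while the version 3 and version 5 values stay the same.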
Variants
The variant field in a UUID determines its layout and interpretation. It is stored in the most significant bits of the 9th byte. The defined variants are:
- Variant 0 (bits 0xxx): Reserved for NCS backward compatibility.
- Variant 1 (bits 10xx): The standard RFC 4122 layout used for most UUIDs.
- Variant 2 (bits 110x): Reserved for Microsoft backward compatibility (legacy GUIDs).
- Variant 3 (bits 111x): Reserved for future definitions.
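The variant can be read directly from the most significant bits of the 9th byte. The sketch below does this by hand; `variant_name` is a hypothetical helper, and the labels it returns follow the bit patterns in RFC 4122 (0xxx, 10xx, 110x, 111x). Python's `uuid.UUID.variant` attribute offers the same information.

```python
import uuid

def variant_name(u: uuid.UUID) -> str:
    """Classify a UUID by the leading bits of its 9th byte (index 8)."""
    octet = u.bytes[8]
    if (octet & 0x80) == 0x00:   # 0xxx xxxx
        return "NCS (reserved)"
    if (octet & 0xC0) == 0x80:   # 10xx xxxx
        return "RFC 4122"
    if (octet & 0xE0) == 0xC0:   # 110x xxxx
        return "Microsoft (reserved)"
    return "future (reserved)"   # 111x xxxx

sample = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
print(variant_name(sample))
```

Since only the top bits matter, a variant digit of 8, 9, a, or b in the text form all indicate the RFC 4122 variant.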
Example
For Version 4, a UUID might look like this:
550e8400-e29b-41d4-a716-446655440000
Here:
- 41d4: the leading digit 4 indicates that it is a version 4 UUID.
- a716: the leading digit a (binary 1010) starts with the bits 10, which marks the common RFC 4122 ("Leach-Salz") variant.
How UUIDs are Calculated
Version 1 (Time-based):
- The timestamp is typically the number of 100-nanosecond intervals since October 15, 1582 (the date of the Gregorian calendar reform).
- The node is the MAC address of the machine generating the UUID.
- The clock sequence helps ensure uniqueness when the clock time changes (e.g., due to system restarts).
Version 3 and Version 5 (Name-based):
- A namespace (like a DNS domain) is combined with a name (like a file path or URL) and hashed.
- The hash (MD5 for version 3, SHA-1 for version 5) is then structured into a UUID format, ensuring the version and variant fields are properly set.
Version 4 (Random-based):
- Random or pseudo-random numbers are generated for 122 of the UUID's 128 bits.
- The version and variant fields are set accordingly, ensuring compliance with UUID standards.
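Two of these calculations can be sketched in a few lines of Python. The first helper rebuilds a version 5 UUID by hand (SHA-1 over the namespace bytes followed by the name, truncated to 16 bytes, then version and variant bits set) so it can be compared with the standard library's `uuid.uuid5`. The second converts a version 1 timestamp back into a calendar date. Both helper names are hypothetical, and the sketch assumes names are encoded as UTF-8.

```python
import hashlib
import uuid
from datetime import datetime, timedelta, timezone

def name_based_v5(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Build a version 5 UUID manually from a namespace and a name."""
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    raw = bytearray(digest[:16])        # keep the first 128 bits of the hash
    raw[6] = (raw[6] & 0x0F) | 0x50     # version 5 in the high nibble of byte 6
    raw[8] = (raw[8] & 0x3F) | 0x80     # RFC 4122 variant: top bits 10
    return uuid.UUID(bytes=bytes(raw))

# Start of the version 1 timestamp scale.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def v1_timestamp(u: uuid.UUID) -> datetime:
    """Convert the 60-bit count of 100 ns intervals into a datetime."""
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

manual = name_based_v5(uuid.NAMESPACE_DNS, "example.com")
builtin = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
print(manual, manual == builtin)
print(v1_timestamp(uuid.uuid1()))
```

The timestamp conversion drops the last decimal digit of the 100-nanosecond count, because `datetime` only resolves microseconds.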
UUIDv4 Calculation Example
Step 1: Generate 128 Random Bits
Let's assume we generate the following 128-bit random value, shown here already split into the five groups it will be formatted as:
11001100110101101101010101111010
1011101101101110
1100010101000101
1110110111110010
010110110101111011010011011110100100101111001011
Step 2: Apply UUIDv4 Version and Variant
Version: Replace the four most significant bits of the 7th byte (the first hex digit of the third group) with 0100 (for UUID version 4).
Original: 1100 becomes 0100
Variant: Replace the two most significant bits of the 9th byte (the start of the fourth group) with 10 (for the RFC 4122 variant).
Original: 11 becomes 10
Step 3: Format into Hexadecimal
Convert the 128-bit binary into 5 hexadecimal groups:
- 32-bit group: 11001100110101101101010101111010 → ccd6d57a
- 16-bit group: 1011101101101110 → bb6e
- 16-bit group: 0100010101000101 → 4545 (with 0100 for version 4)
- 16-bit group: 1010110111110010 → adf2 (with 10 for the variant)
- 48-bit group: 010110110101111011010011011110100100101111001011 → 5b5ed37a4bcb
Step 4: Combine the Groups
The final UUID would look like this:
ccd6d57a-bb6e-4545-adf2-5b5ed37a4bcb
Top comments (5)
I don't think it's a standard yet, but UUIDv7 promises to solve one of the biggest issues with UUID, which is the lack of sequencing. While maintaining sufficient randomness to avoid overlap, it provides a sequence of events which can be extremely useful for partitioning data, archiving records, and providing more of a guarantee of sequence.
I base so much of my code on v5 GUIDs. Unique for the same data, so bloody helpful, cache keys, filenames for s3, and on and on.
Then you can just do cache keys like:
or
This is an interesting idea. Is a v5 faster/better than other hashing options? Is the benefit primarily that the output is a UUID?
Yeah, the benefit is it makes a short key out of what would be a much longer hash, etc. It's comparable with other keys quickly. I'm always storing things in Redis using such a key, or for instance, I name my files in s3 using v5 guids, then I don't need to bother virus scanning or uploading a file that already exists.
Interesting! I had no idea how UUIDs were generated