DEV Community

Cover image for KUID: Compressed Universally Unique Identifier
Sonu Kumar
Sonu Kumar

Posted on

KUID: Compressed Universally Unique Identifier

KUID, a universally unique identifier, is essentially a shorter version of a UUID and is commonly used to identify records in various applications.

KUID

What's UUID?

A UUID (Universally Unique Identifier) is a unique identifier that is widely used across various applications and systems to identify records. It is a 128-bit number typically represented as a sequence of 32 hexadecimal digits, grouped in 5 sections, separated by hyphens, as in the following format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.

UUIDs are designed to be globally unique, meaning that it is extremely unlikely for two UUIDs to be the same, even when generated independently by different systems. This makes UUIDs useful for identifying and tracking records in distributed systems, databases, and other contexts where unique identification is necessary.

There are different variants and versions of UUIDs, each with its own way of generating unique identifiers. Some commonly used variants include UUIDv1, UUIDv4, and UUIDv5. The specific variant and version of a UUID can be determined from the format and content of the identifier.

How to generate KUID?

KUID is generated in two steps

  • Generate UUID (use any algorithm like V1,V2,V3,V4 etc)
  • Encode 128-bit number

KUID = Generate UUID and encode it in the higher base like base 32, 62 or even higher.

The length of the KUID depends on the encoding scheme employed. If the base62 encoding scheme ([0–9A-Za-z]) is used, KUID can be encoded to 22 bytes. However, if you wish to store KUID in upper case, then encoding it using the base36 encoding scheme, which utilizes characters [0–9A-Z], will result in an encoded representation that can be stored in 26 bytes. For this post, we'll use base 62 encoding.

Why should we use KUID?

KUID generates a smaller string, which results in less storage space and faster processing of many operations.

NOTE: Following statements are only valid when KUID is treated as a string, not as 128-bit number.

  • Hashing: Using KUID with its shorter length of 22 bytes reduces the amount of data that needs to be hashed. This results in fewer instructions being required to perform the hashing operation compared to UUID, which requires a hash of 36 bytes.
  • Comparison: Due to the shorter length of KUID, only 22 bytes need to be compared instead of 36 bytes, resulting in fewer instructions needed for comparison. Read/Write: Using KUID, which can be encoded in 22 bytes, results in a reduction in the memory and disk space required for read and write operations. This is in contrast to UUID, which requires 36 bytes for the same operations. With KUID, fewer instructions are needed to read or write data due to its smaller size.
  • I/O Calls: Sending a UUID over a network consumes 36 bytes of data (without considering compression), while a KUID only requires 22 bytes. Therefore, KUID is a more efficient choice for network transmission due to its smaller size.
    Database indexing: Due to the smaller size of KUID, it leads to a decrease in the size of the database index. In addition, combining KUID with another column for secondary indexing can be beneficial as databases have a limited index width (e.g., 767 bytes in MySQL) that is dependent on the encoding used for the column or table. For instance, MySQL's UTF8 encoding may limit the index width to as little as 255 bytes.

  • URL safe: Like UUID, KUID is URL safe since it only uses [0–9A-Za-z] characters to represent the data. KUID is commonly used as a unique column, which means that URLs containing KUIDs may be used to represent specific resources, such as /api/v1/products/:productId. As the KUID is included in the URL, it must be URL safe to avoid issues such as encoding errors or unintended interpretation by the web server or browser.

  • Lower Memory/Disk Usage: Using KUID results in lower memory/disk usage compared to UUID. Since KUID only requires 22 bytes, it uses less storage space than UUID, which uses 36 bytes. This can be beneficial for systems that require frequent read/write operations or have limited storage capacity. For instance, if we consider the storage of 100 million records, assuming each KUID is stored as a 22-byte string, the space consumed by KUIDs would be approximately 2.2 GB. In contrast, the space required for UUIDs would be around 3.6 GB. Therefore, using KUIDs can result in a storage saving of approximately 39%.

KUID Usage

public class Main {
    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            KUID kuid = KUID.randomKUID();
            System.out.println(kuid.toString());
        }

    }
}

Enter fullscreen mode Exit fullscreen mode

This will produce something like this

If you found this post helpful, please share and give a thumbs up.

Top comments (2)

Collapse
 
j0nimost profile image
John Nyingi

what are the chances of collisions? doesn't it increase?

Collapse
 
bookmantasty profile image
Cesar Adrian Leyva Luna

is the same uuid in other encoding