DEV Community

Frederik Van Lierde
Frederik Van Lierde

Posted on

Why you should never use an UUID as the primary key in SQL Databases

Using a UUID (Universally Unique Identifier) as a primary key in SQL databases has both advantages and disadvantages. While UUIDs offer benefits in certain scenarios, there are reasons why they might not be the best choice for a primary key:

Indexing and Performance:
UUIDs are 128 bits long, compared to 32 bits for a typical integer. This larger size can result in increased storage requirements and decreased performance, especially when dealing with large datasets.

Indexes built on UUID columns may not perform as efficiently as those on smaller data types. This is because larger keys can lead to more page reads, impacting query performance.

Readability and Debugging:
Unlike integers, UUIDs are not human-readable, which can make debugging and manual inspection of the database more challenging. Integers or other smaller data types may be more convenient for developers and database administrators.

Clustering:
UUIDs are designed to be globally unique, but they are not guaranteed to be sequential. This lack of sequential ordering can result in suboptimal disk I/O patterns, affecting the performance of certain types of queries, especially those involving range-based searches.

Fragmentation
UUIDs are often generated using a combination of timestamp and random values. This randomness can lead to higher levels of index fragmentation, impacting database performance over time.

Storage Overhead:
Storing UUIDs can lead to increased storage requirements, both in terms of disk space and memory. This can be a concern in environments where storage costs are a critical factor.

Application Complexity:
Managing UUIDs, especially their generation and uniqueness across distributed systems, can add complexity to the application logic. This complexity may not be necessary if simpler primary key types suffice for the application's requirements.

Top comments (66)

Collapse
 
vbilopav profile image
vbilopav • Edited

And how would you ensure pk uniqueness on a same entity in a distributed system?

Because you can't do it with int autoincrement.

And I don't see how declaring a primary key field as uuid default gen_random_uuid() instead of int autoincrement is any different in terms of extra complexity.

UUID exists for a reason and people use it for a reason. That doesn't mean it's good fit for every scenario. Samo goes for int autoincrement.

Collapse
 
bbkr profile image
Paweł bbkr Pabian • Edited

You can get unique autoincremented IDs in distribited systems. For example in MySQL you can configure autoincrenent increment and offset. Using those first server will generate 1, 101, 201, 301, 401, etc (increment 100 + offset 1). Second server will generate 2, 102, 202, 302, 402, etc (increment 100 + offset 2). Very simple and effective. And you can migrate those rows between servers without risk of collision.

Collapse
 
mkoehrer profile image
mkoehrer

This will not work with anonymus replicates (for example notebooks running a local database - my old company did this with MS Server Replication -

The only thing that will work in this scenario are uuids

Collapse
 
mjaggard profile image
mjaggard

That is definitely not simple. You need to know (at least approximately) how many servers you'll have, you also presumably need to either number them somehow and then you need to get your next increment value before creating the object in some cases to be truly atomic from the application user's point of view.

Thread Thread
 
bbkr profile image
Paweł bbkr Pabian

Bigint unsigned with increment 1000 solves this issue entirely. Be realistic.

Yes, numbering servers requires discipline and misconfiguration can be fatal.

As for ability to generate ID before actual insert and never hit reordering penalty - same trick can be applied. Just use sequence generator.

I think this is great topic for more advanced post. OP barely scratched the surface, because in almost all relational databases there must be PK / UK for data internal ordering and row replication. Using UUIDs has nasty consequences in large scale. I'll write my own post soon, clearing few misconceptions around PK generators.

Collapse
 
csgeek profile image
csgeek

Why wouldn't you do what basically every distributed database does? Hash the pkey and split that across the n number of nodes available.

Your approach won't scale as soon as the are not keys used then you owned for.

Thread Thread
 
bbkr profile image
Paweł bbkr Pabian

Because databases like MySQL or PostgreSQL organize data by PK internally, so using UUID or any other non incremental sequence results in huuuuuuuge performance drop on inserts.

Collapse
 
mickmister profile image
Michael Kochell

Pretty clickbaity. "Why you should never" is a bit extreme. At least explain a horror story that has scarred you personally or something that caused you to write this. I use UUIDs at work and it has worked out fine for me.

Collapse
 
randreu28 profile image
Rubén Chiquin • Edited

There's a spanish chanel who is well spoken on this subject that made a video arguing the contrary.

youtu.be/wR5b0OhbUyw?si=JeJmicrHey...

It's main argument is the fact that id's never travel from the backend to the frontend.

An argument against not using them is security. Regular id's are ordered. If you were to have an endpoint like users/:Id, everyone could know every user id just by counting (user/1, user/2, user/3...). It could be solved with some Salt, but then you get more complexity..

They also provide multiple ways to aliviate some of the pains that are generated by using UUID.

Deserves a watch! Anyway good post

Collapse
 
palalet profile image
RadekHavelka

And lastly, altho the chance of collision / two uuids being the same is very small, it still exists ...

Collapse
 
vulwsztyn profile image
Artur Mostowski

Yeah, but you should worry more about spontaneous combustion

Collapse
 
karlatt profile image
karlatt

Do you know they say you have enough unique UUID to number all the atoms of the known universe ? saying the risk of collision is existing, is like saying you can win the lottery 10 times in a row , as the probability exists to , it should be a number like 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001 chances on two billions
In fact, ANY probability can find a singularity where it can happen: I'll be happy if I win the lottery just once, and all the poor morons like me who spend money in a ticket would be happy too ... But the fact is that there are a lot more people who lose than people who win on each lottery session. So the probability you loose ten times in a row is much much bigger than my first assertion .
If you want to say something that gives structured information , you have to provide all the variables.

So how much probability do you have to have a collision against the probability of having a unique UUID ? This would be interesting and just show that the chance of collision is so small that it would be one of the biggest miracles of all times if it happens once

Collapse
 
palalet profile image
RadekHavelka

the problem isnt that the chance is LOW, the problem is that the chance IS. There is no guarantee that this will not happen if you generate two uuids, thats the point, but it is less significant than what reasons were mentioned in the article, from usability and logic point of view UUIDs makes no sense.

Collapse
 
jlous profile image
Joachim Lous

Can be viewed as a transient error, since a retry will fix it. But do make sure id generation is part of the retry.

Collapse
 
immotus profile image
immōtus

Frederik, what methodology did you use to draw these conclusions?

Without first describing the scientific method you used, all these are just unsubstantiated hypothesis (except for human readability and storage issues) and nothing else, really.

Collapse
 
richardcrawshaw profile image
Richard Crawshaw

I've worked with a couple of SQL Server databases where GUIDs had been used for clustered primary keys. The performance in those areas truly sucked: a couple of orders of magnitude worse than where ints were used.

If a GUID was deemed necessary as an ID my first choice for a design would be to split that from the clustered key. Put the GUID in a non-clustered UNIQUE index and the int /bigint PK as an auto-incremented clustered index. Depending on the other data requirements I might split that from the rest of the data, putting the real data in one or more "extension" tables using the same PK and a 1:1 relationship.

Collapse
 
peledzohar profile image
Zohar Peled

I wouldn't say never, there are situations where it's actually better to use a UUID over simple 32 bit integers, however as with any other tool, you should know what you're doing if you want to get good results.

Collapse
 
m0n0x41d profile image
Ivan Zakutnii

100% agreed.

However, we are using UUID as the PKeys, but I suppose it is suitable for microservices, at least for the time being.

Someday, it might come back to haunt us... I am almost sure it will :D

Collapse
 
hewitt profile image
Neil

As long as these are comb/sequential ID's you will have no issues even in indexing but if they are not and are not sequential then you may. I have experience first hand in this.

Collapse
 
m0n0x41d profile image
Ivan Zakutnii

Sorry, I don't get it here.

Collapse
 
tausiq2003 profile image
Tausiq Samantaray • Edited

You know what, you should write about solutions to this, instead of straight ratioing uuid. If a beginner reads this they will, yeah autoincrement is a good idea, but its not, cuz lets say if id is exposed in endpoints, yeah your DX(Developer Experience) is improved, but bad guys sitting behind you, their DX is also far better than you. They will simply sneak in and type "Hey, give me the /products/123, then /product/122 like that. So, autoincrement is a terrible idea. According to you, UUIDs are bad (but not for me). Politely asking suggest some solutions, if UUIDs and autoincrement are bad.
Thanks.

Collapse
 
frederik_vl profile image
Frederik Van Lierde

I never expose internal keys .
Why would you do that ?

Also I didn't say UUId are bad. I am using them all the time.
I just don't use then for primary keys and foreign keys

Collapse
 
srbhr profile image
Saurabh Rai

This reminded me of this video:

Collapse
 
adaptive-shield-matrix profile image
Adaptive Shield Matrix • Edited

Currently one of the most popular uuid generators
github.com/paralleldrive/cuid2

Performance loss of uuid vs integer -> is/was about 6% in real world systems in one article I read (measure your own data) -> so nearly completely negletible

Readability and Debugging: -> if you have 7-9 character long integers -> readability is same as an uuid

Clustering -> adressed by cuid2
Fragmentation -> issue is negletable, since only a minor perf loss

Application Complexity -> completely wrong
uuid massively decrease Client-Server complexity, because the id can be generated by the client/browser without having to wait for and parse the server response. This reduction in architectural complexity is more worth than any perf loss suffered by the db.

Collapse
 
fjones profile image
FJones

I would add: Locking behaviour on UUIDs can be surprising. When using UUIDs as a PK (or even unique key), partial locking can result in ranges being locked that are completely unrelated, due to the effective randomness of UUIDs. Deleting an entry from a long time ago can still result in a lock that may impact a fresh insert, for example.

Collapse
 
jlous profile image
Joachim Lous

On the upside, accessing a (usually) recent one doesn't lock al the other recent ones, where the most work is probably ongoing.

Collapse
 
anuragbhatt1805 profile image
Anurag Bhatt

I guess we can use UUID concept on mixed basis, Like for user Identification or payment id we can use uuid as it will be unique and hard to crack, where as for smaller concepts like product ID, customer ID, etc, we can use normal Smaller datatype

Collapse
 
0xf_0b1111 profile image
0xf

Insist on credible evidence to substantiate your claims, as your current discussion lacks empirical support, additionally, various statements seem baseless and lack credibility.
If you manage a substantial database and harbor concerns about UUIDs, contemplate replacing your UUID primary key with a Snowflake ID, which proves more effective than the conventional integer auto-increment.
en.m.wikipedia.org/wiki/Snowflake_ID

Note: avoid using chatgpt in future articles.

Collapse
 
mattiasnordqvist profile image
Mattias Nordqvist

I've been using snowflake IDs exclusively the last 2 years

Collapse
 
hewitt profile image
Neil

I used to think this when i was a junior developer. But this article is WRONG and should instead read ( why you should not use a non sequential or non-comb type UUID ). If you are going to use UUID/GUID's for Id fields then my recommendation is to use sequential or comb guids. We use these in our large scale SaaS solutions and they are much better in many ways than integers. They are performant ( comparable to integers for indexing ) and can be generated within the app layer or db layer and can be generated on distributed systems without clashes. Anyone reading this article should first read up on the above points because restricting yourself based on this article will really hinder your ability to build large-scale applications. integers are fine for small scale e.g a few million record tables but when you start to move to large scale and more importantly in browser record creation such as ticketing systems etc you WILL have to use UUIDs or some kind of string. At this point, Comb GUIDs become the go-to solution. @frederik_vl I urge you to amend your original article as it is factually incorrect and missleading.

Collapse
 
hewitt profile image
Neil

Further more to provide additional context to my above comment. There are some things people should be aware of regarding GUIDs that not everybody is.

  1. Only v4 guid's are cryptographically secure and can be used for ( security by obscurity URL's, such as when you use them for password resets, though this is somewhat debateable and is a business decision )
  2. Sequential/Comb GUIDs are by their nature guessable and should never be used externally for ( Security by obscurity reasons ).
  3. The size(4-byte) of GUID's are obviously larger than an integer field so if you live in 1988 then this may be a concern, otherwise it is a non-issue.

There are many articles that compare the overall performance of int's vs ( newsequentialId as it is in SQL Server or Comb GUID's) and i am not going to waste time linking through to them. What i would refer people to is the following repo that covers some of this and provides a way to generate sequential guids across different application layers ( COMB ).
github.com/richardtallent/RT.Comb