Mircea Cadariu

To UUID, or not to UUID, that is the primary key question

For #PGSQLPhriday 015, Lætitia invited us to debate the merits of using UUIDs. I found it a good opportunity to explore the topic a bit deeper and form an opinion along the way. For this purpose, I've created a small experimental setup that allowed me to try out several UUID types in PostgreSQL and observe how they're processed internally by the database. I've enjoyed doing it and learned quite a lot. I'm sharing my conclusions here in the hope that they help others.

What are UUIDs

The abbreviation stands for Universally Unique IDentifier. UUIDs are sequences of 128 bits used to uniquely identify data in computer systems. Postgres has a built-in uuid data type you can use when defining columns. This is what a UUID looks like:

a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11

Note, though, that this is the hex-encoded string representation you see when displaying them; it is not how they are stored internally.
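
For a quick feel of the data type: Postgres can generate random (version 4) UUIDs on its own with the built-in gen_random_uuid() function (no extension needed since Postgres 13). A small sketch, with a made-up table name:

select gen_random_uuid();
-- e.g. 0bd6c107-3eb8-4b4a-8e76-4ee21cbbf54b (different every run, it's random)

create table example (
  id uuid default gen_random_uuid()  -- uuid column, stored as 16 bytes internally
);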

Primary key options

When creating a database table, one of the first decisions you make is whether to use a natural or a synthetic primary key. If you go for a synthetic one, UUIDs are an option, but not the only one. Like many other software-related decisions, this one is not clear-cut, and the trade-offs involved can be a source of long debates. For contrast, let's first have a look at probably the most popular alternative.

Auto-incrementing integers

You can use auto-incrementing integers generated by the database. Postgres creates these IDs using an object called a sequence. It's essentially a single-row table that keeps track of the current value and hands out the next one.
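
In DDL this looks roughly like the following (table and column names are just examples); both the older serial/bigserial pseudo-types and the newer identity columns are backed by a sequence under the hood:

create table users (
  id bigint generated always as identity primary key,  -- backed by an implicit sequence
  name text not null
);

insert into users (name) values ('alice') returning id;  -- 1, then 2, 3, ... on subsequent inserts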

Advantages:

  • easier to work with, you can remember them
  • predictable
  • smaller size on disk

Disadvantages:

  • can expose information, like user count
  • potential bottleneck in distributed systems (centralisation)
  • need to sync sequence values when upgrading via replication (logical replication doesn't carry them over)

UUIDs

Let's have a look at UUIDs now.

Advantages:

  • more secure - you can't guess them
  • suit use-cases where centralised generation of IDs is not feasible (e.g. batch-generating new objects in the frontend without calling the backend)
  • no need to reset sequences in upgrades / migrations

Disadvantages:

  • occupy more space on disk
  • harder to work with, you can't remember them
  • randomness makes some internal operations slow
  • WAL amplification 1 2
  • can't be used directly when you need to build pagination

So, there are pros and cons on either side. This means that I can't give an absolute conclusion about what's better, because it depends on the context and use-case.

I could then very well finish the post here with the familiar conclusion: it depends. However, let's proceed and assume you find the above arguments important enough and decide to use UUIDs as a primary key in Postgres. What should you know about them? Let's start our exploration by having a look at the data structure that Postgres creates for every primary key you define.

The structure of B-tree indexes

For every primary key you define, Postgres automatically creates an index. This index is backed by a B-tree data structure. In order for B-trees to perform operations like primary key lookups and range queries very fast, the index pages are kept balanced and sorted at all times. The Postgres implementation is based on the paper by P. Lehman and S. Yao (with some modifications). Here is a figure from that paper, illustrating what a B-tree looks like. Note that, as mentioned, the implementation in Postgres is a bit different (e.g. there are internal and leaf pages, and only the latter actually point to table rows).

[Figure: B-tree structure, as illustrated in the Lehman & Yao paper]

The levels make reads faster because the number of steps needed to find and return a specific entry (in the picture, the "associated information" node) is reduced. The time complexity of the lookup becomes logarithmic instead of linear.
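
If you're curious how many levels one of your own indexes has, the pageinspect extension can show the B-tree metadata. A sketch (the index name below is a hypothetical primary key index, like the users_pkey from the earlier example):

create extension if not exists pageinspect;

-- 'level' is the height of the tree, 'root' is the block number of the root page
select root, level from bt_metap('users_pkey');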

Random vs. Time-ordered UUIDs

There are multiple types of UUIDs we can use. For example, the default UUID generation strategy in Hibernate, a popular Java ORM tool, is random. Given Postgres internals, random UUIDs are not ideal.

The B-tree structure is not so compatible with randomness, because a lot of "work" (page modifications) has to happen in order to keep the tree pages balanced and sorted with every new element we store. If the new element can be inserted in a page that still has space, all is good; if not, a page split happens, which means the parent page has to be updated as well, which can in turn trigger further splits, and so on. Basically, the new entry can go "anywhere", and the database has to accommodate that by continuously restructuring the B-tree index.
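
One way to observe this effect on a real index is the pgstattuple extension: an index filled with random keys typically ends up with a lower average leaf density (lots of half-empty pages left behind by splits) than one filled in key order. A sketch, again with a hypothetical index name:

create extension if not exists pgstattuple;

-- avg_leaf_density tends to drop well below the default 90% btree fillfactor
-- after many random inserts, while staying close to it for increasing keys
select avg_leaf_density, leaf_fragmentation from pgstatindex('users_pkey');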

Can this be avoided? Let's have a look at UUID version 7. An important difference is that it has a timestamp-based component at the beginning, which means these UUIDs have a natural ordering embedded in their structure. This is good news: the database will do less work to keep the B-tree balanced, as all new elements always go on the "right side", and most of the rest of the tree remains untouched.
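
Since the first 48 bits of a UUIDv7 encode a Unix timestamp in milliseconds, you can even recover the creation time from the value itself. Here's one way to do it in SQL; the UUID below is a made-up v7-shaped example whose timestamp corresponds to 2024-01-01:

-- take the first 12 hex digits (48 bits) and interpret them as milliseconds since the epoch
select to_timestamp(
  ('x' || replace(left('018cc251-f400-7abc-8def-0123456789ab', 13), '-', ''))::bit(48)::bigint / 1000.0
);
-- => 2024-01-01 00:00:00+00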

TSIDs are another variant in this family. Vlad Mihalcea considers them the best UUID-like option for primary keys. While also time-ordered like UUIDv7, they can be stored in an 8-byte bigint column instead of 16 bytes.

We've assembled a nice set of ID types to try out so far (random UUIDs, UUIDv7 and TSIDs). Let's go ahead and use them in our experimental setup. For this, I'm inserting about 40k rows at once into a simple table that has a UUID as primary key. In the case of the TSID, I use bigint.
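
The tables can be as simple as this (names are illustrative); the ID values are generated in the application and sent along with the inserts:

create table experiment_uuid (id uuid primary key);    -- used for both random UUIDs and UUIDv7
create table experiment_tsid (id bigint primary key);  -- used for TSIDs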

You can't yet generate all of these UUID types in Postgres directly. In Java, there are several libraries you can use to generate them. One option for UUIDv7 is java-uuid-generator, and one for TSIDs is hypersistence-tsid.

Experiment

I ran the same insert of the 40k rows 5 times for each UUID type. Before moving on to the next UUID type, I reset the statistics.

Runtime

It took roughly the same time for each. Despite having a lot more work to do to keep the B-tree balanced for the random ones, Postgres processed all the insertions very fast anyway.

We shouldn't jump to the conclusion that there's no difference among the UUID types we tried. It is better to also look at the number of pages read and written (in other words, the amount of I/O) for each, as timing is more volatile. For example, in production we might not have the whole dataset present in the shared buffers.

Amount of I/O per UUID type

I queried pg_statio_all_indexes after each of the 5 runs per UUID type in my experiment, and looked at the difference in the number of blocks for the primary key index.

I noted down the following columns after every run (a sketch of the queries follows the list):

  • idx_blks_read (number of disk blocks read)
  • idx_blks_hit (number of shared buffer hits)
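
The bookkeeping between and after runs can be done with queries along these lines (a sketch; the relation and index names are the illustrative ones from above):

-- before switching to the next UUID type: reset the statistics counters of the current database
select pg_stat_reset();

-- after each run: read the counters for the primary key index
select idx_blks_read, idx_blks_hit
from pg_statio_all_indexes
where indexrelname = 'experiment_uuid_pkey';

-- and check the index size
select pg_size_pretty(pg_relation_size('experiment_uuid_pkey'));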

Let's have a look at the results!

TSID

idx_blks_read    idx_blks_hit
124              88573
245              177552
367              266532
490              327560
611              372717

Index size: 4888 kB

UUID V7

idx_blks_read    idx_blks_hit
254              88943
513              163091
769              233844
1012             310091
1245             387387

Index size: 9960 kB

Random UUID

idx_blks_read    idx_blks_hit
234              89027
459              207315
628              340988
920              474911
1149             608634

Index size: 8968 kB

We can see that for the UUIDv7 and TSID runs, the numbers grow at a slower rate than for the random ones. Zooming in on the idx_blks_read column only (the blocks read from disk), we see that TSIDs accumulated only about half as many as the others. This is expected, due to the data type storage difference (8 vs 16 bytes). We see this proportion reflected in the total size of the index as well.

I made a simple graph of the total number of blocks (both from memory and from disk) accumulated after each run, where you can see that the growth is indeed faster for the random ones.

[Graph: total index blocks (idx_blks_read + idx_blks_hit) accumulated per run, for each UUID type]

One thing I found initially surprising when looking at the results is the number of index blocks hit in shared buffers (idx_blks_hit). When inserting ~44k rows with TSIDs as primary keys, we see ~88k in idx_blks_hit. That's roughly 2 page hits for every index entry we inserted. What exactly are these 2 buffer hits per element? I was expecting just 1 - the rightmost page of the index. Let's explore the Postgres source code to understand why.

Postgres B-tree index fastpath optimisation

The answer can be found in this comment in nbtinsert.c. The extra hit is for the root page. The other hit is for the page where the new entry is inserted, which we reach from the root page on every insert.

But note that a fastpath optimisation is mentioned (caching the rightmost leaf page). When the conditions to apply it are met, the root page is no longer read for every insert. Why is it not happening in my experiment, though?

The answer to this question is here, again in nbtinsert.c. This optimisation does not get applied for small indexes like the one I created with 44k entries. Fair enough. Let me try out a larger one, say 30 million entries, to see this optimisation in action.

create table experiment (id bigserial primary key);
insert into experiment select generate_series(1, 30000000);

I can now query pg_statio_all_indexes and expect not to see 2 x 30 million in the idx_blks_hit column. And indeed: I get just 30639018, not ~60000000. This is the effect of the fastpath optimisation: the rightmost leaf page is cached, so it's accessed directly instead of through the root.

Partitioning by timestamp-based UUIDs

A small tangent to a related topic I'd like to mention: with timestamp-based UUIDs you can set up partitioning based on the timestamp component. Why does this matter?

Partitioning is very useful because it helps achieve predictability of operations. In a way, it is the same argument as for introducing pagination in APIs. It makes reads faster because, while the table might grow indefinitely, the database will not scan it in its entirety, but only a smaller part of it (a partition). Also, if you need to insert a lot of data at once, partitioning enables constant ingestion rates. An interesting experiment demonstrating this is described in this post written by the AWS RDS team.
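
To make the tangent concrete, here is a minimal sketch of range-partitioning directly on a UUIDv7 primary key. Because the timestamp sits in the most significant bits, the values sort chronologically, so the partition bounds are simply UUIDs whose leading 48 bits encode the boundary timestamps. The literals below correspond roughly to 2024-01-01 and 2024-02-01; treat names and values as illustrative:

create table events (
  id uuid not null,
  payload text,
  primary key (id)
) partition by range (id);

create table events_2024_01 partition of events
  for values from ('018cc251-f400-0000-0000-000000000000')
              to  ('018d61f7-1800-0000-0000-000000000000');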

Conclusion

I really like the newer timestamp-based UUIDs. I didn't know about them before starting to write this post; I only knew about the random ones. I was therefore rather reserved about UUIDs when starting out, but the newer types remove many of the deficiencies, making them quite an attractive option for primary keys. As there still are trade-offs, you have to consider how important they are for your setup. I've explored the performance characteristics of the various UUID types, and the winner among them is the TSID, because it generated the least I/O in my experiments and can also be stored in 8 bytes instead of 16, which is a welcome bonus.

With cloud databases like Aurora from AWS, the pricing model can be pay per I/O consumed (the requests to read from storage when a page can't be found in the shared buffers), so applying strategies to keep I/O low also keeps costs low. This might not, however, be your biggest cost generator; you have to measure and compare. Also, even if you're not looking to reduce the bill, being aware of, and reducing, the amount of work the database has to do in the background gets you more overall throughput and better resource utilisation. If Postgres spends less time keeping the B-tree balanced, it can instead handle more of your business use-cases.

Thanks for reading!
