Franck Pachot for YugabyteDB Distributed PostgreSQL Database

Posted on Jun 19, 2023

Global Unique Constraint on a partitioned table in PostgreSQL and YugabyteDB

#yugabytedb #distributed #postgres #sql

One limitation of PostgreSQL declarative partitioning, when compared to some other databases like Oracle, is the impossibility to create global indexes. This includes the unique index that is necessary to enforce a primary key. It means that there's no built-in way to enforce the unicity of a key across all partitions, except when it includes the partition key. In this latter case, local indexes that enforce the local uniqueness are sufficient to guarantee the global uniqueness of the compound key, but some applications do not like compound primary keys.

With YugabyteDB you don't need declarative partitioning to scale because tables are split with automatic sharding to small tablets, and, at this level, all indexes are global and can enforce uniqueness. However, on top of it, you may want to use the PostgreSQL declarative partitioning for two reasons: lifecycle management (with the ability to drop old partitions) or geo-partitioning (to assign partitions to specific regions with tablespaces).

How to guarantee global uniqueness? There are two easy solutions when the application is designed to scale, and one alternative for legacy applications, which is the goal of this blog post.

The easy solutions are:

A primary key should be generated from a UUID or a Sequence, and should be immutable. Both are designed to generate unique values, with a high probability in the case of UUID or even a 100% guarantee in the case of a Sequence. You may not need an additional index.
include the partition key in the primary key. This means adding the date (for lifecycle management) or the region (for geo-partitioning) to the local identifier. Applications designed for geo-distribution should do that. Other alternatives are for legacy applications.

If for any reason you want an additional guarantee of uniqueness for the part that doesn't include the partition key, there's no other choice than having a global source of truth, and it will limit the scalability. Sequence is one of this kind, and are optimized in YugabyteDB by limiting their transactional behavior to the minimum necessary: numbers are cached and the incrementing operations are never rolled back. However, a global index must be fully ACID and a transaction inserting a new primary key then becomes a global transaction. Another approach, a global query to check all partitions, must also be a serializable global transaction.

Alternative with a global table

Here is an example of building a global unique index, as a table maintained by a trigger. In YugabyteDB, tables and indexes, are the same because a table is stored in its primary key. Then, in YugabyteDB, this solution is not only logically equivalent to a global index but also physically equivalent. In PostgreSQL this solution is limited because there's no sharding, and a table with a primary key is two structures: a B-Tree index and a Heap Table.

I've setup a cluster as in this previous post, a tree-region cluster (across the solar system just for fun) with a Docker Compose.

In short, I started docker-compose up and created the following tablespaces and table:

The customer table is geo-partitioned to earth, moon and mars. Its primary key is compound of with a generated UUID (id) and the region identifier (planet):

yugabyte=# \d+ customers

                                     Table "public.customers"
 Column | Type | Collation | Nullable |      Default      | Storage  | Stats target | Description 
--------+------+-----------+----------+-------------------+----------+--------------+-------------
 id     | uuid |           | not null | gen_random_uuid() | plain    |              | 
 planet | text |           | not null |                   | extended |              | 
 info   | text |           |          |                   | extended |              | 

Partition key: LIST (planet)

Indexes:
    "customers_pkey" PRIMARY KEY, lsm (id HASH, planet ASC)

Partitions: customers_earth FOR VALUES IN ('earth'),
            customers_mars FOR VALUES IN ('mars'),
            customers_moon FOR VALUES IN ('moon')

This is sufficient and optimal: an insert will be a local transaction, the composite primary key is guaranteed to be unique, and we would be very unlucky if seeing duplicate UUIDs.

Global Unique Index

If, for any reason, there is a need to guarantee that the UUID is globally unique, I cannot directly create a unique index:

yugabyte=# create unique index customers_unique_id
           on customers(id);

ERROR:  insufficient columns in UNIQUE constraint definition
DETAIL:  UNIQUE constraint on table "customers" lacks column "planet" which is part of the partition key.

Global Table maintained by Trigger

Here is my alternative. I create a table with the same columns as the main table's primary key, but where only the id is in the primary key:

yugabyte=# create table customers_unique_id 
           (id uuid primary key, planet text not null);

CREATE TABLE

This table is not partitioned. This is not a problem in YugabyteDB because automatic sharding applies. The only thing you have to take care is that if you partitioned for data governance reasons (to keep sensitive data in specific regions) then the information in this global table should not contain sensitive information. This should not be a problem with a UUID and a region name.

To guarantee the uniqueness of id, I don't need another column but I've added the region discriminent, planet, as this table could also be used to find the region when only the id is known. This is an alternative solution to the previous post I'm taking the example from where duplicate indexes are maintained for this purpose.

This table must be maintained automatically when rows are inserted, deleted or when the id is updated (which should not happen as it si part of the primary key, but we are talking about legacy application, so better be safe for all unexpected cases).

Here is the trigger function:

create or replace function customers_unique_id()
returns trigger as $$
declare
 rows smallint;
begin
 if tg_op in ('DELETE', 'UPDATE') then
  delete from customers_unique_id 
   where id = old.id and planet=old.planet ;
 elsif tg_op in ('INSERT', 'UPDATE') then
  insert into customers_unique_id (id,planet) 
   values (new.id, new.planet);
 end if;
 get diagnostics rows = row_count;
 if rows != 1 then 
  raise '% affected % rows (expected: 1)',tg_op, rows;
 end if;
 return new;
end;
$$ language plpgsql;

and the trigger:

create trigger customers_unique_id
 after delete or insert or update of id, planet
 on customers for each row
 execute function customers_unique_id();

Finally I initialize this table:

begin transaction;
delete from customers_unique_id;
insert into customers_unique_id select id, planet from customers;
end;

I did that within a transaction, starting with a delete, in case there were some inserts after I created the trigger. In PostgreSQL you can do all that, including the DDL, in a transaction (but you must be in serializable for this case). However, doing so will lock the table in exclusive mode for the DDL commands. For online changes, I prefer to separate DDL (short but with exclusive lock) from DML (can be long, but non blocking). YugabyteDB has an optimistic approach for DDL (no exclusive lock but applications may get a serializable error) and run DDL out of the main transaction, so this is the right way to do.

Testing DML

I try to insert the same id with different regions and check the correct behavior:

yugabyte=# insert into customers values('f00dcafe-dead-beef-face-c0ffeec0de12', 'mars', 'one');
INSERT 0 1
yugabyte=# insert into customers values('f00dcafe-dead-beef-face-c0ffeec0de12', 'mars', 'one');
ERROR:  duplicate key value violates unique constraint "customers_mars_pkey"

yugabyte=# insert into customers values('f00dcafe-dead-beef-face-c0ffeec0de12', 'moon', 'one');
ERROR:  duplicate key value violates unique constraint "customers_unique_id_pkey"

yugabyte=# delete from customers where id::text like 'f00dcafe%';
DELETE 1

yugabyte=# insert into customers values('f00dcafe-dead-beef-face-c0ffeec0de12', 'moon', 'one');
INSERT 0 1
yugabyte=#

DML that doesn't violate my business logic (id being globally unique) succeeded, the other failed.

Performance and scalability

In my lab, the docker-compose also starts a 'metrics' container that runs my YBWR script every 10 seconds to show the tablet activity. I have run the following inserting 1000 rows to the moon region:

yugabyte=# insert into customers(planet, info) select 'moon', generate_series(1,1000);
INSERT 0 1000

Before creating the trigger, this was running in Time: 75.999 ms with the following reads and writes:


  rocksdb_seek | rocksdb_next | rocksdb_insert |                     dbname / relname / tserver / tabletid / leader
 --------------+--------------+----------------+----------------------------------------------------------------------------------------
           498 |              |            166 | yugabyte customers_moon 393b02758fcf487994b666589d39e31c L yb-tserver-1.base.moon.star
           441 |              |            147 | yugabyte customers_moon 57cca7ebc5494848ad00b30a3ac88c44 L yb-tserver-1.base.moon.star
           543 |              |            181 | yugabyte customers_moon 60fd4b4984f2433385aa716572a73aca L yb-tserver-1.base.moon.star
           513 |              |            171 | yugabyte customers_moon 7a7ca74a195949baaa819775980a1eb8 L yb-tserver-1.base.moon.star
           537 |              |            179 | yugabyte customers_moon d3df57a21e35437b84f17a31b6fffd88 L yb-tserver-1.base.moon.star
           468 |              |            156 | yugabyte customers_moon e5e46ea408764a62950a6cb18954e77e L yb-tserver-1.base.moon.star
 (6 rows)

The inserts have to seek (to read if the row exists to check for duplicates in the partitioned table) and insert (write the row to the LSM-Tree) to multiple tablets. The tablets are all on the same region (moon) and, in this small lab, they are even in the same server. This is a local transaction.

I've run the same after creating the trigger:

  rocksdb_seek | rocksdb_next | rocksdb_insert |                        dbname / relname / tserver / tabletid / leader
 --------------+--------------+----------------+----------------------------------------------------------------------------------------------
           492 |              |            164 | yugabyte customers_moon 393b02758fcf487994b666589d39e31c L yb-tserver-1.base.moon.star
           558 |              |            186 | yugabyte customers_moon 57cca7ebc5494848ad00b30a3ac88c44 L yb-tserver-1.base.moon.star
           480 |              |            160 | yugabyte customers_moon 60fd4b4984f2433385aa716572a73aca L yb-tserver-1.base.moon.star
           513 |              |            171 | yugabyte customers_moon 7a7ca74a195949baaa819775980a1eb8 L yb-tserver-1.base.moon.star
           492 |              |            164 | yugabyte customers_moon d3df57a21e35437b84f17a31b6fffd88 L yb-tserver-1.base.moon.star
           465 |              |            155 | yugabyte customers_moon e5e46ea408764a62950a6cb18954e77e L yb-tserver-1.base.moon.star
           480 |              |            160 | yugabyte customers_unique_id 01a52bd802f342d9902042dbb8360ca5 L yb-tserver-2.base.mars.star
           492 |              |            164 | yugabyte customers_unique_id 13c9a37b25244d69a1782cc3591f852e L yb-tserver-1.base.moon.star
           513 |              |            171 | yugabyte customers_unique_id 6138c8e697e1483b8a1950e734ec9b85 L yb-tserver-0.base.earth.star
           492 |              |            164 | yugabyte customers_unique_id 7cee267d09f24362b4a1089b808b8db6 L yb-tserver-2.base.mars.star
           558 |              |            186 | yugabyte customers_unique_id ee3fa437841648c280b9a824d7d80ebc L yb-tserver-1.base.moon.star
           465 |              |            155 | yugabyte customers_unique_id fa9febc32ec14bfcaebdf475f5d6cb53 L yb-tserver-0.base.earth.star
 (12 rows)

This has run in Time: 778.955 ms, 10 times slower. The reads and writes are 2 times higher, because of the new table to maintain, but the major difference is that, now, multiple servers and regions are touched by the transaction. YugabyteDB implements many optimizations for single-shard, single-server and single-region transactions. In the latest case, the transaction table itself can be remote. With this global table we cannot benefit from those single-region optimizations.

To Summarize

When you want to scale a geo-distributed application you should

choose the right database. The closest to PostgreSQL is YugabyteDB which provides all SQL features on a horizontally scalable infrastructure.
design your application to run the critical services locally to one region and this means avoiding transactions that have to read and write to other regions. The SQL features that help for this design are: declarative partitioning, composite primary key, sequences, UUID generation extensions, and triggers.

That's the reason why YugabyteDB implements all those features, by re-using the PostgreSQL code when possible, and by pushing down the others to the distributed storage. The other distributed databases that do not support triggers require you to change your application and add, in addition to the business logic, the necessary code to validate the data integrity like uniqueness, and the regression tests for all race conditions on it.

DEV Community

Global Unique Constraint on a partitioned table in PostgreSQL and YugabyteDB

Alternative with a global table

Global Unique Index

Global Table maintained by Trigger

Testing DML

Performance and scalability

To Summarize

Top comments (0)

Read next

Generate ER Diagrams in MYSQL workbench

Fullstack app: Deploying the Fullstack Application on Render with Docker

Oracle SQL - Cardinality & Cost

Improving Performance of PostgreSQL queries