DEV Community

Franck Pachot for YugabyteDB

Posted on

Foreign keys referencing partitioned tables in YugabyteDB

YugabyteDB 2.17 is compatible with PostgreSQL 11.2 but soon PostgreSQL 15 will be merged and we will follow new versions. This is tracked in #9797. This was not a high-priority because not a lot of companies are using PG12 features and PG11 still has minor releases until November 2030. The PG11 features in YugabyteDB are already legion when compared to the limited set of SQL that the other Distributed SQL databases provide.

Referential Integrity and Declarative Partitioning

There's one feature that lot of people are asking for, that appeared in PostgreSQL 12: Foreign keys referencing partitioned tables. For sure, that's great improvement for PostgreSQL but does it have the same value for YugabyteDB? The main value for this feature, as described by its author, Álvaro Herrera, is that: for the first time it is possible in PostgreSQL to maintain large volumes of data while maintaining referential integrity.

Maintain large volume in YugabyteDB

In YugabyteDB, you don't need to partitioned the large tables because they are distributed, with all SQL features, like global indexes and foreign keys. Given that by default we set the maximum size per tablet to 100GB (tablet_force_split_threshold_bytes) and the maximum number of tablets per table to 5000 (max_num_tablets_for_table), we can have a 500 Terabyte table that doesn't require partitioning because each shard is still manageable.

Limits of Declarative Partitioning in PostgreSQL

On top of sharding, you may want to use PostgreSQL declarative partitioning for other reasons than the size of the table:

  • lifecycle management: you want to be able to quickly purge old data. You partition by range of data so that a simple drop partition can be used instead of bulk deletes.
  • multitenancy: you do Software as a Service and operate many tenants within the same table. You may want more control, like placing some tenants into dedicated nodes of the cluster. You partition by list of tenant ID to map them to tablespace with specific cluster placement
  • geo-distribution: you want to constrain the data placement by region for data governance or reduce latency. You partition by list of countries and map to tablespaces with regional placement and preferred leaders
  • If you have a table larger than 50 Terabytes and want to keep the tablets to 10GB maximum, you may partition by hash on top of sharding.

There's no global index in PostgreSQL

Now, in this context, let's see if you need this PG12 feature. First, you should remember that:

  • a foreign key can reference only a unique key (primary key or unique constraint)
  • a partitioned table must have the partitioning key in its primary key

This doesn't change in PostgreSQL12 or later. For example, you cannot do the following (global primary key):

create table users (
    id            bigserial primary key,
    national_id   text not null,
    country       text not null
)
partition by list ( country );

ERROR:  unique constraint on partitioned table must include all partitioning columns
DETAIL:  PRIMARY KEY constraint on table "users" lacks column "country" which is part of the partition key.

Enter fullscreen mode Exit fullscreen mode

and you cannot do reference other than the primary key:

create table users (
    id            bigserial,
    national_id   text not null,
    country       text not null,
    primary key (country, national_id)
)
partition by list ( country );

create table accounts(
 id             bigserial primary key
 ,user_id       bigint references users(id)
 ,country       text not null
)
;

ERROR:  there is no unique constraint matching given keys for referenced table "users"

alter table users add unique(id);

ERROR:  unique constraint on partitioned table must include all partitioning columns
DETAIL:  UNIQUE constraint on table "users" lacks column "country" which is part of the partition key.
Enter fullscreen mode Exit fullscreen mode

Because there is no global index with declarative partitioning, all indexes are local and cannot enforce uniqueness globally. Without uniqueness enforced globally, you cannot reference a partitioned table. Except when the partition key is included in the primary key because, in this case, the foreign key determines a unique partition, where the local index can enforce uniqueness. The new feature that appeared in PostgreSQL allows the declaration of foreign keys for this exception.

Master-detail equi-partitioning

You cannot reference a partitioned table in PostgreSQL11 but you can reference a table partition, and this is also possible in YugabyteDB.

Let's continue on my example:

create table users (
    id            bigserial,
    national_id   text not null,
    country       text not null,
    primary key (country asc, national_id asc)
)
partition by list ( country );

create table users_id partition of users for values in ('ID');
create table users_sg partition of users for values in ('SG');
create table users_in partition of users for values in ('IN');
create table users_jp partition of users for values in ('JP');
create table users_ch partition of users for values in ('CH');

create table accounts(
 id bigserial
 ,national_id   text not null
 ,country       text not null
 ,primary key (country asc, national_id asc)
)
partition by list ( country );

create table accounts_ch partition of accounts
( foreign key (country, national_id) references users_ch)
for values in ('CH');

Enter fullscreen mode Exit fullscreen mode

Yes, this works, and I can verify that the referential integrity is enforced:


yugabyte=# insert into accounts (national_id, country) values (42,'CH');
ERROR:  insert or update on table "accounts_ch" violates foreign key constraint "accounts_ch_country_fkey"
DETAIL:  Key (country, national_id)=(CH, 42) is not present in table "users_ch".

yugabyte=# insert into users (national_id, country) values (42,'CH');
INSERT 0 1

yugabyte=# insert into accounts (national_id, country) values (42,'CH');
INSERT 0 1

yugabyte=# delete from users where national_id='42';
ERROR:  update or delete on table "users_ch" violates foreign key constraint "accounts_ch_country_fkey" on table "accounts_ch"
DETAIL:  Key (country, national_id)=(CH, 42) is still referenced from table "accounts_ch".

Enter fullscreen mode Exit fullscreen mode

You may think that declaring the foreign key constraint in each partition is lot of effort, but I'm lazy and here is how I do the other partitions:

select format('
create table if not exists %I.%I partition of accounts
( foreign key (country, national_id) references %s)
%s'
,relnamespace::regnamespace
,regexp_replace(relname,'^users_','accounts_')
,relname
,pg_get_expr(c.relpartbound, c.oid, true)
)
from pg_class c
where relname ~ '^users_..$'
\gexec
Enter fullscreen mode Exit fullscreen mode

That's the power of being PostgreSQL-compatible: pg_class, pg_get_expr, format, \gexec ... all is also there in YugabyteDB.

What is the difference with the PostgreSQL 12 feature?

What was added to PG12, that is not available yet in YugabyteDB, helps when a non-partitioned table needs to reference a partitioned table, or when a partitioned table needs to reference a table with different partitioning. In all other cases, you can do it as I did before.

Now, let's look again at the cases where we use PostgreSQL declarative partitioning in YugabyteDB:

  • lifecycle management: if you purge the parent rows (for example orders) you will also purge their child rows (order_items). If you partitioned the parent by year (having the year in the primary key) you will do the same partitioning for the child.

  • multitenancy: obviously, the child table rows reference the parent for the same tenant, and then both are partitioned on the tenant_id

  • geo-distribution: if you are required to store your customers in a specific country, you probably want to store their orders, account, and details into the same country

  • huge table: if the parent table is large, the child table will be larger. You want to partition it in the same way

Those cases are qui-partitioned and the referential integrity can be enforced as I did before: from the child table partition the the parent table partition. With PostgreSQL declarative partitioning, all is local to the partition: indexes, unique constraints, foreign key constraints.

Bonus: Partition-wise join

There is another reason for having the same partition key in referential integrity: scalable performance. Don't forget that Partition-Wise join is not enabled by default (the reason is to save some planning time when not used, as explained by Laurenz Albe)

I let you admire the beauty of this execution plan:

yugabyte=# set enable_partitionwise_join=on;
SET
yugabyte=# explain (costs off) select * from users join accounts using(national_id, country);
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Append
   ->  Merge Join
         Merge Cond: ((users_ch.country = accounts_ch.country) AND (users_ch.national_id = accounts_ch.national_id))
         ->  Index Scan using users_ch_pkey on users_ch
         ->  Materialize
               ->  Index Scan using accounts_ch_pkey on accounts_ch
   ->  Merge Join
         Merge Cond: ((users_id.country = accounts_id.country) AND (users_id.national_id = accounts_id.national_id))
         ->  Index Scan using users_id_pkey on users_id
         ->  Materialize
               ->  Index Scan using accounts_id_pkey on accounts_id
   ->  Merge Join
         Merge Cond: ((users_in.country = accounts_in.country) AND (users_in.national_id = accounts_in.national_id))
         ->  Index Scan using users_in_pkey on users_in
         ->  Materialize
               ->  Index Scan using accounts_in_pkey on accounts_in
   ->  Merge Join
         Merge Cond: ((users_jp.country = accounts_jp.country) AND (users_jp.national_id = accounts_jp.national_id))
         ->  Index Scan using users_jp_pkey on users_jp
         ->  Materialize
               ->  Index Scan using accounts_jp_pkey on accounts_jp
   ->  Merge Join
         Merge Cond: ((users_sg.country = accounts_sg.country) AND (users_sg.national_id = accounts_sg.national_id))
         ->  Index Scan using users_sg_pkey on users_sg
         ->  Materialize
               ->  Index Scan using accounts_sg_pkey on accounts_sg
(26 rows)
Enter fullscreen mode Exit fullscreen mode

Joining two large tables is always expensive and the best you can do is do it with smaller chunks. In a master-detail relationship, you will join on the foreign key. Having the table partitioned on it allows to join one single partition to the other. With Hash Join or Sort Merge Join, this means a smaller chunk to hash or sort, and more changes that it fits in memory. Here, as I've defined range sharding, it doesn't even need to sort, but that's also thanks to Partition-Wise joins because all indexes are local (there's no global index with PostgreSQL declarative partitioning)

In summary

We will merge PostgreSQL 15 soon, and follow with future versions, but if you wait for it because you want the "Foreign keys referencing partitioned tables" feature that appeared in PostgreSQL 12, then I recommend that you look closer at your use case. In many cases, either you don't need partitioning (because sharding is the right way to distribute in YugabyteDB) or you partition the child tables on the same key (because the partitioning in YugabyteDB is used to group related rows). If you have a use case that doesn't fit in the same partition key design, then please comment, I'll be happy to update this post

Top comments (0)