DEV Community

Franck Pachot for YugabyteDB

Posted on • Updated on

Foreign Key validation in YugabyteDB when created in NOT VALID

If you're migrating to YugabyteDB or restoring a ysql_dump export, you run a script that creates the tables with their primary key, imports data with COPY, creates secondary indexes, and create constraints. In doing so, you have the option to change all foreign key constraint creation as NOT VALID to save time for validation. This is a good option when you're confident that there are no foreign key violations in the data you are importing.

You may still want to validate them later. This can take long and is currently not implemented in YugabyteDB (#3946). Here is a script that generates a list of all key violations for NOT VALID constraints by running an EXCEPT query between the child and parent tables.

The script is designed to run on both PostgreSQL and YugabyteDB. For YugabyteDB, there is an additional optimization to take advantage of the Distinct Pushdown. The EXCEPT clause is executed in YSQL, which is the PostgreSQL backend, using the input from the keys fetched from the parent and child distributed tablets. To avoid sending too many rows between nodes, a DISTINCT operation is performed when querying the foreign keys from the child table.

If there is a range index, it can be utilized to pushdown the DISTINCT. This requires a range scan, which is possible in the version I am using - YugabyteDB 2.19.3. To achieve this, I create a function that checks the execution plan to see if an Index Scan is possible, and returns the WHERE clause to range scan from the first value:

create or replace function add_range_for_distinct_pushdown(tablename text, columns text) returns text as $$
declare
 plan text;
begin
 -- don't add anything if already pushed down (#16771 fixed)
 execute format('explain (format xml) select distinct %s from %s',columns,tablename,columns) into plan;
 if plan like '%<Node-Type>Distinct Index Only Scan</Node-Type>%' then return ''; end if;
 -- check if getting the minimum value can use Index Scan
 execute format('explain (format xml) select %s from %s order by %s',columns,tablename,columns) into plan;
 -- if Index Scan is possible, return the where clause to start a range scan at first value
 if plan not like '%<Node-Type>Index%' then return ''; end if;
  return format('where (%s) >= (select %s from %s order by %s limit 1)',columns,columns,tablename,columns);
end;
$$ language plpgsql
;
Enter fullscreen mode Exit fullscreen mode

The function returns an additional where clause when the Index Scan is possible, or an empty text string when not.

The following query reads pg_constraint for NOT VALID constraints, gets the constraint definition to extract the list of columns for the foreign and referenced keys, and generates the query to get the violations. Those queries are run in psql by \gexec:

with fk as (
select
 '^FOREIGN KEY [(](.*)[)] REFERENCES (.*)[(](.*)[)] NOT VALID$' re
 , pg_get_constraintdef(c.oid) constraintdef
 , conrelid::regclass::text tablename
from pg_constraint c
where c.contype='f' and not c.convalidated
and conname='demo2_id_demo1_id'
) select
 format('
   select %L as _table_ , %L as _foreign_key_
    ,* from (
    select distinct %s from %s
    except
    select %s from %s
   ) keys_in_violation'
 , tablename
 , constraintdef
 , regexp_replace(constraintdef,re,'\1') --fk_cols
 , tablename
   -- this is added to do a range scan for YugabyteDB to use Distinct Pushdown
   ||' '|| add_range_for_distinct_pushdown(tablename,regexp_replace(constraintdef,re,'\1'))
 , regexp_replace(constraintdef,re,'\3') --ref_cols
 , regexp_replace(constraintdef,re,'\2') --ref_tab
) from fk
;
\timing on
\gexec

Enter fullscreen mode Exit fullscreen mode

You can direct the output to a file and search for NOT VALID [|] to find the foreign key violations, if any.


If you want to test it, here is an example that creates two tables with two violations:

create extension if not exists pgcrypto;
create extension if not exists orafce;
create table demo1 ( id1 int default 0, id2 uuid default gen_random_uuid(), name text , primary key (id1, id2)) ;
create table demo2 ( id1 int default 0, id2 uuid, seq int, name text, primary key(id1 asc, id2 asc, seq) ) ;
insert into demo1(name) select dbms_random.string('p',100) from generate_series(1,1000);
insert into demo2 select id1,id2, generate_series, dbms_random.string('p',100) from demo1, generate_series(1,1000);
delete from demo1 where id2 in(select id2 from demo1 limit 2);
alter table demo2 add constraint demo2_id_demo1_id foreign key (id1,id2) references demo1(id1,id2) not valid;

Enter fullscreen mode Exit fullscreen mode

I deliberately deleted two parent rows to show the violations. Here is the result of my script:

\gexec
                                                       format
--------------------------------------------------------------------------------------------------------------------
                                                                                                                   +
    select 'demo2' as _table_ , 'FOREIGN KEY (id1, id2) REFERENCES demo1(id1, id2) NOT VALID' as _foreign_key_     +
     ,* from (                                                                                                     +
     select distinct id1, id2 from demo2 where (id1, id2) >= (select id1, id2 from demo2 order by id1, id2 limit 1)+
     except                                                                                                        +
     select id1, id2 from demo1                                                                                    +
    ) keys_in_violation
(1 row)

Time: 41.470 ms
yugabyte=> \timing on
Timing is on.
yugabyte=> \gexec
 _table_ |                        _foreign_key_                        | id1 |                 id2
---------+-------------------------------------------------------------+-----+--------------------------------------
 demo2   | FOREIGN KEY (id1, id2) REFERENCES demo1(id1, id2) NOT VALID |   0 | 004abb5a-a193-4098-9147-0b80ce86be29
 demo2   | FOREIGN KEY (id1, id2) REFERENCES demo1(id1, id2) NOT VALID |   0 | 00138193-30f2-4d6b-a377-f50d59938fc6
(2 rows)
Enter fullscreen mode Exit fullscreen mode

Note 1: the function to add a where clause will not be needed anymore when [YSQL] distinct pushdown requires a range predicate #16771 is fixed
Note 2: distinct pushdown may be possible with hash index when [YSQL] skip scan occurs on range but not hash #11881
is fixed
Note 3: When [YSQL] Add support for validating table constraints #3946 will be fixed, with as similar execution plan, this script will not be needed.
Note 4: the query that is run is displayed (with ; before \gexec and you can explain (analyze, dist, debug) to check if the YugabyteDB pushdowns happen.

Top comments (0)