DEV Community

Cover image for Data type mismatch in referential integrity and its implications
Bolaji Wahab
Bolaji Wahab

Posted on

Data type mismatch in referential integrity and its implications

I have seen situations where different compatible types were used in referential integrity AKA foreign key constraints. When we say data types are compatible, it means there is a conversion path between the types.

The catalog pg_cast stores data type conversion paths, both built-in and user-defined but it does not represent every type conversion that the system knows how to perform. Yet, we can infer some compatible types from pg_cast.

Let us have a look at compatible types for integer and varchar

postgres=# SELECT castsource::regtype, casttarget::regtype FROM pg_cast WHERE castsource IN ('integer'::regtype, 'varchar'::regtype) ORDER BY castsource;
    castsource     |    casttarget
-------------------+-------------------
 integer           | bigint
 integer           | smallint
 integer           | real
 integer           | double precision
 integer           | numeric
 integer           | money
 integer           | boolean
 integer           | oid
 integer           | regproc
 integer           | regprocedure
 integer           | regoper
 integer           | regoperator
 integer           | regclass
 integer           | regcollation
 integer           | regtype
 integer           | regconfig
 integer           | regdictionary
 integer           | regrole
 integer           | regnamespace
 integer           | "char"
 integer           | bit
 character varying | character
 character varying | "char"
 character varying | name
 character varying | regclass
 character varying | xml
 character varying | character varying
 character varying | text
(28 rows)
Enter fullscreen mode Exit fullscreen mode

Let us assume we have a products table and another table storing orders of those products. We want to ensure that the orders table only contains orders of products that actually exist. So, we define a foreign key constraint in the orders table that references the products table:

postgres=# CREATE TABLE products (
    product_number bigint PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    product_name text,
    price numeric
);
CREATE TABLE
Enter fullscreen mode Exit fullscreen mode
postgres=# CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    product_number integer REFERENCES products (product_number), 
    quantity integer 
);
CREATE TABLE
Enter fullscreen mode Exit fullscreen mode

Everything seems to work. Let us populate the tables

postgres=# INSERT INTO products (product_name, price) VALUES ('book', 10), ('pen', 5), ('bag', 15);
INSERT 0 3
Enter fullscreen mode Exit fullscreen mode
postgres=# INSERT INTO orders (order_id, product_number, quantity) VALUES (1, 1, 1), (2, 2, 1), (3, 3, 1);
INSERT 0 3
Enter fullscreen mode Exit fullscreen mode

Also, everything works. But what happens when product_number is greater than 2147483648 (max value of integer).

postgres=# SELECT setval('products_product_number_seq', 2147483648);
   setval
------------
 2147483648
(1 row)

postgres=# INSERT INTO products (product_name, price) VALUES ('shoe', 20) RETURNING product_number;
 product_number
----------------
     2147483649
(1 row)

postgres=# INSERT into orders (order_id, product_number, quantity) VALUES (4, 2147483649, 1);
ERROR:  integer out of range
Enter fullscreen mode Exit fullscreen mode

We got integer out of range error. One way to fix this is to change orders.product_number type to bigint but we risk full table rewrite and application downtime for the duration of the change. Let us find out:

postgres=# SELECT relname, relfilenode FROM pg_class WHERE relname = 'orders';
 relname | relfilenode
---------+-------------
 orders  |       42404
(1 row)
Enter fullscreen mode Exit fullscreen mode
postgres=# ALTER TABLE orders ALTER COLUMN product_number TYPE bigint ;
ALTER TABLE
postgres=# SELECT relname, relfilenode FROM pg_class WHERE relname = 'orders';
 relname | relfilenode
---------+-------------
 orders  |       42419
Enter fullscreen mode Exit fullscreen mode

We can see relfilenode has changed, this tells us the table was rewritten. Imagine having to do this in a heavily-loaded large database.
There are other ways of adjusting the type such as using triggers and mirrored columns but that can be tasking.

Most common source of type mismatch in referential integrity happens when using types that have size/length constraint that such as smallint, integer serial, character varying(N).

So, is there a way we can catch this kind of mismatch earlier, maybe during development/review?
We can use the query below which produces nice info/message about type mismatch in referential integrity

WITH pk AS (
    SELECT CASE nc.nspname 
               WHEN 'public' THEN nc.nspname || '.' || c.conrelid::regclass::text
               ELSE c.conrelid::regclass::text
           END AS conrelid,
           c.conindid::regclass,
           string_agg(a.attname, ', ' order by a.attnum) as attnames,
           string_agg(format_type(a.atttypid, a.atttypmod), ', ' order by a.attnum) AS atttypid
      FROM pg_namespace AS nc
      JOIN pg_class AS r ON nc.oid = r.relnamespace
      JOIN pg_attribute AS a ON r.oid = a.attrelid
      JOIN pg_constraint AS c ON r.oid = c.conrelid
     WHERE a.attnum = ANY (c.conkey)
       AND NOT a.attisdropped 
       AND c.contype = 'p' 
       AND r.relkind IN ('r', 'p')
       AND nc.nspname NOT IN ('information_schema', 'pg_catalog')
  GROUP BY nc.nspname, c.conrelid::regclass::text, c.conindid::regclass
), fk AS (
    SELECT CASE nc.nspname 
               WHEN 'public' THEN nc.nspname || '.' || c.conrelid::regclass::text
               ELSE c.conrelid::regclass::text
           END AS conrelid,
           c.conindid::regclass,
           string_agg(a.attname, ', ' order by a.attnum) as attnames,
           string_agg(format_type(a.atttypid, a.atttypmod), ', ' order by a.attnum) AS atttypid
      FROM pg_namespace AS nc
      JOIN pg_class AS r ON nc.oid = r.relnamespace
      JOIN pg_attribute AS a ON r.oid = a.attrelid
      JOIN pg_constraint AS c ON r.oid = c.conrelid
      JOIN pk ON pk.conindid = c.conindid
     WHERE a.attnum = ANY (c.conkey)
       AND NOT a.attisdropped 
       AND c.contype = 'f' 
       AND r.relkind IN ('r', 'p') 
       AND nc.nspname NOT IN ('information_schema', 'pg_catalog')
  GROUP BY nc.nspname, c.conrelid::regclass::text, c.conindid::regclass
)
SELECT 'Parent ' || pk.conrelid || '.{' || pk.attnames || '} datatype {' || pk.atttypid || '} is different from child ' || fk.conrelid || '.{' || fk.attnames || '} datatype {' || fk.atttypid  || '}' AS info
  FROM pk 
  JOIN fk ON pk.conindid = fk.conindid
 WHERE pk.atttypid != fk.atttypid;
                                                                                       info
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Parent public.products.{product_number} datatype {bigint} is different from child public.orders.{product_number} datatype {integer}
Enter fullscreen mode Exit fullscreen mode

The query above can also be incorporated into some continuous integration.

Summary

It is easy to miss out the right data types when creating referential integrity and this can have impactful implications during the lifecycle of an application. Few of the implications are:

  • Application errors due to size/length constraint such as in integer and character varying(N).
  • Application downtime when setting the right data type.
  • If there is a rewrite, the rewrite will temporarily require as much as double the disk space of the table.

Key takeaways

Review the data types in referential integrity as part of your database design review.

Top comments (2)

Collapse
 
bruceoutdoors profile image
Lee Zhen Yong • Edited

Never realise you can write a query for these mismatches. Good stuff. Though I think postgres ought to implement some setting to block this behaviour.

Collapse
 
bolajiwahab profile image
Bolaji Wahab

SQL implementation allows such behavior so postgres is just following the implementation.