HAP

Posted on Feb 6, 2021 • Edited on Feb 8, 2021

Compare table structures in two schemata

#database #sql #troubleshoot #postgres

Bugs are the bane of programmers' existence. But like death and taxes... well, death, anyway*; you can't get away from them entirely. This is especially true of large application endeavors including those with multiple schemata, which is what I'll be discussing here.

So, our application is using multiple tenant schemata for our clients and the structure must match. We encountered a situation that possibly could have included checking table structures between our template schema and a customer schema. This led to the development of this query.

The query utilizes the information_schema which should be present in most RDBMS engines, so this should be fairly portable.

To use this query, you'll need to set two to three parameters (depending on the number of tables you wish to compare).

Let's now examine the query:

with check_params as (
    select 'my-schema-to-check' as check_schema,  -- put schema to check here!
           'my-known-good-schema' as main_schema, -- put the known good schema here
           null::text as check_table     -- put table name to check here (or null for all)!
),
display_table as (
select coalesce(mn.table_name, ck.table_name) as table_name,
       case when mn.table_schema is null 
                 then ck.table_name || ' <null>'
            when ck.table_schema is null
                 then 'null ' || ' <' || mn.table_name || '>'
            else mn.table_name
       end::text as disp_table_name
  from (
         select table_schema,
                table_name
           from information_schema.tables
          cross
           join check_params cp
          where table_schema = cp.main_schema
       ) as mn
  full
  join (
         select table_schema,
                table_name
           from information_schema.tables
          cross
           join check_params cp
          where table_schema = cp.check_schema
       ) as ck
    on ck.table_name = mn.table_name
),
full_table_compare as (
select coalesce(t.check_schema, c.check_schema) || ' <' || coalesce(t.main_schema, c.main_schema) || '>' as "schema_name",
       (select dt.disp_table_name from display_table dt where dt.table_name = coalesce(t.table_name, c.table_name)) as "table_name",
       case when c.column_name is distinct from t.column_name
                 then coalesce(c.column_name, 'null') || ' <' || coalesce(t.column_name, 'null') || '>'
            else t.column_name
       end::text as "column_name",
       case when c.udt_name is distinct from t.udt_name
                 then coalesce(c.udt_name, 'null') || ' <' || coalesce(t.udt_name, 'null') || '>'
            else t.udt_name
       end::text as column_type,
       case when c.character_maximum_length is distinct from t.character_maximum_length
                 then coalesce(c.character_maximum_length, 'null') || ' <' || coalesce(t.character_maximum_length, 'null') || '>'
            else t.character_maximum_length
       end::text as text_length,
       case when c.numeric_precision is distinct from t.numeric_precision
                 then coalesce(c.numeric_precision, 'null') || ' <' || coalesce(t.numeric_precision, 'null') || '>'
            else t.numeric_precision
       end::text as precision,
       case when c.numeric_scale is distinct from t.numeric_scale
                 then coalesce(c.numeric_scale, 'null') || ' <' || coalesce(t.numeric_scale, 'null') || '>'
            else t.numeric_scale
       end::text as scale,
       case when c.is_nullable is distinct from t.is_nullable
                 then coalesce(c.is_nullable, 'null') || ' <' || coalesce(t.is_nullable, 'null') || '>'
            else t.is_nullable
       end::text as is_nullable,
       case when c.column_default is distinct from t.column_default
                 then coalesce(c.column_default, 'null') || ' <' || coalesce(t.column_default, 'null') || '>'
            else t.column_default
       end::text as col_default
  from (
         -- Get info from the "good" schema
         select cl.table_schema,
                cl.table_name,
                cl.column_name,
                cl.udt_name,
                coalesce(cl.character_maximum_length::text, 'null')::text as character_maximum_length,
                coalesce(cl.numeric_precision::text, 'null')::text as numeric_precision,
                coalesce(cl.numeric_scale::text, 'null')::text as numeric_scale,
                coalesce(cl.is_nullable::text, 'null')::text as is_nullable,
                regexp_replace(cl.column_default, cp.main_schema || '\.', '', 'g') as column_default,
                cp.check_schema,
                cp.main_schema
           from information_schema.columns cl
          cross
           join check_params cp
          where cl.table_schema = cp.main_schema
            and cl.table_name = coalesce(cp.check_table, cl.table_name)
       ) as t
  full 
  join (
         -- Get info from "suspect" schema
         select cl.table_schema,
                cl.table_name,
                cl.column_name,
                cl.udt_name,
                coalesce(cl.character_maximum_length::text, 'null')::text as character_maximum_length,
                coalesce(cl.numeric_precision::text, 'null')::text as numeric_precision,
                coalesce(cl.numeric_scale::text, 'null')::text as numeric_scale,
                coalesce(cl.is_nullable::text, 'null')::text as is_nullable,
                regexp_replace(cl.column_default, cp.check_schema || '\.', '', 'g') as column_default,
                cp.check_schema,
                cp.main_schema
           from information_schema.columns cl
          cross
           join check_params cp
          where cl.table_schema = cp.check_schema
            and cl.table_name = coalesce(cp.check_table, table_name)
       ) as c
    on c.table_name = t.table_name
   and c.column_name = t.column_name

)
select schema_name, 
       table_name,
       column_name,
       column_type,
       text_length,
       precision,
       scale,
       is_nullable,
       col_default
  from full_table_compare
       -- only report the differences
 where table_name ~ '<'
    or column_name ~ '<'
    or column_type ~ '<'
    or text_length ~ '<'
    or precision ~ '<'
    or scale ~ '<'
    or is_nullable ~ '<'
    or col_default ~ '<'
 order 
    by table_name,
       column_name;

Wow.

Let's break it down.

check_params

with check_params as (
    select 'my-schema-to-check' as check_schema,  -- put schema to check here!
           'my-known-good-schema' as main_schema, -- put the known good schema here
           null::text as check_table     -- put table name to check here (or null for all)!
)

This is the CTE that will have our parameters. This was created so that we wouldn't have to attempt to try script variables and to only have to define the values once. Just replace the strings with the requisite schema names to compare all tables in both schemas. Replace the null::text for the check_table param with a table name to constrain to that one table.

display_table

display_table as (
select coalesce(mn.table_name, ck.table_name) as table_name,
       case when mn.table_schema is null 
                 then ck.table_name || ' <null>'
            when ck.table_schema is null
                 then 'null ' || ' <' || mn.table_name || '>'
            else mn.table_name
       end::text as disp_table_name
  from (
         select table_schema,
                table_name
           from information_schema.tables
          cross
           join check_params cp
          where table_schema = cp.main_schema
       ) as mn
  full
  join (
         select table_schema,
                table_name
           from information_schema.tables
          cross
           join check_params cp
          where table_schema = cp.check_schema
       ) as ck
    on ck.table_name = mn.table_name
)

This CTE is designed to help with output. Because we are comparing if a table does or does not exist as well as its column existing or not, we have an issue. The table name is replicated on the information_schema.columns record for each column. So we want to display it correctly regardless of presence/absence of the table itself or for a column mismatch.

Why a full join? We want to compare both ways.

full_table_compare

full_table_compare as (
select coalesce(t.check_schema, c.check_schema) || ' <' || coalesce(t.main_schema, c.main_schema) || '>' as "schema_name",
       (select dt.disp_table_name from display_table dt where dt.table_name = coalesce(t.table_name, c.table_name)) as "table_name",
       case when c.column_name is distinct from t.column_name
                 then coalesce(c.column_name, 'null') || ' <' || coalesce(t.column_name, 'null') || '>'
            else t.column_name
       end::text as "column_name",
       case when c.udt_name is distinct from t.udt_name
                 then coalesce(c.udt_name, 'null') || ' <' || coalesce(t.udt_name, 'null') || '>'
            else t.udt_name
       end::text as column_type,
       case when c.character_maximum_length is distinct from t.character_maximum_length
                 then coalesce(c.character_maximum_length, 'null') || ' <' || coalesce(t.character_maximum_length, 'null') || '>'
            else t.character_maximum_length
       end::text as text_length,
       case when c.numeric_precision is distinct from t.numeric_precision
                 then coalesce(c.numeric_precision, 'null') || ' <' || coalesce(t.numeric_precision, 'null') || '>'
            else t.numeric_precision
       end::text as precision,
       case when c.numeric_scale is distinct from t.numeric_scale
                 then coalesce(c.numeric_scale, 'null') || ' <' || coalesce(t.numeric_scale, 'null') || '>'
            else t.numeric_scale
       end::text as scale,
       case when c.is_nullable is distinct from t.is_nullable
                 then coalesce(c.is_nullable, 'null') || ' <' || coalesce(t.is_nullable, 'null') || '>'
            else t.is_nullable
       end::text as is_nullable,
       case when c.column_default is distinct from t.column_default
                 then coalesce(c.column_default, 'null') || ' <' || coalesce(t.column_default, 'null') || '>'
            else t.column_default
       end::text as col_default
  from (
         -- Get info from the "good" schema
         select cl.table_schema,
                cl.table_name,
                cl.column_name,
                cl.udt_name,
                coalesce(cl.character_maximum_length::text, 'null')::text as character_maximum_length,
                coalesce(cl.numeric_precision::text, 'null')::text as numeric_precision,
                coalesce(cl.numeric_scale::text, 'null')::text as numeric_scale,
                coalesce(cl.is_nullable::text, 'null')::text as is_nullable,
                regexp_replace(cl.column_default, cp.main_schema || '\.', '', 'g') as column_default,
                cp.check_schema,
                cp.main_schema
           from information_schema.columns cl
          cross
           join check_params cp
          where cl.table_schema = cp.main_schema
            and cl.table_name = coalesce(cp.check_table, cl.table_name)
       ) as t
  full 
  join (
         -- Get info from "suspect" schema
         select cl.table_schema,
                cl.table_name,
                cl.column_name,
                cl.udt_name,
                coalesce(cl.character_maximum_length::text, 'null')::text as character_maximum_length,
                coalesce(cl.numeric_precision::text, 'null')::text as numeric_precision,
                coalesce(cl.numeric_scale::text, 'null')::text as numeric_scale,
                coalesce(cl.is_nullable::text, 'null')::text as is_nullable,
                regexp_replace(cl.column_default, cp.check_schema || '\.', '', 'g') as column_default,
                cp.check_schema,
                cp.main_schema
           from information_schema.columns cl
          cross
           join check_params cp
          where cl.table_schema = cp.check_schema
            and cl.table_name = coalesce(cp.check_table, table_name)
       ) as c
    on c.table_name = t.table_name
   and c.column_name = t.column_name
)

So there's two main parts to this query.

schema information

         select cl.table_schema,
                cl.table_name,
                cl.column_name,
                cl.udt_name,
                coalesce(cl.character_maximum_length::text, 'null')::text as character_maximum_length,
                coalesce(cl.numeric_precision::text, 'null')::text as numeric_precision,
                coalesce(cl.numeric_scale::text, 'null')::text as numeric_scale,
                coalesce(cl.is_nullable::text, 'null')::text as is_nullable,
                regexp_replace(cl.column_default, cp.main_schema || '\.', '', 'g') as column_default,
                cp.check_schema,
                cp.main_schema
           from information_schema.columns cl
          cross
           join check_params cp
          where cl.table_schema = cp.main_schema
            and cl.table_name = coalesce(cp.check_table, cl.table_name)

We need to get specific information from the information_schema regarding the columns for the target table(s) in the main or good or master or template schema (whatever you with to call it). This is by far not exhaustive, but it's a good place to start for basic structure comparisons. We also need to get the same information for the target, suspect, bad, or check schema (again, whatever you with to call it).

This information is:

table_schema : Namespace of the table
table_name : Name of the table
column_name : Name of the column
udt_name : Name of the (internal) data type of the column
character_maximum_length : Len of a char or varchar column. This will be null for a text type column.
numeric_precision : Precision of a numeric/decimal column
numeric_scale : Scale of a numeric/decimal column
is_nullable : YES if it can contain null else NO
column_default : The default value for the column

Some operations are done at these queries to make the comparison logic easier such as type changes or string scrubbing.

Full joins are again used to allow for bi-directional comparisons.

comparison logic

select coalesce(t.check_schema, c.check_schema) || ' <' || coalesce(t.main_schema, c.main_schema) || '>' as "schema_name",
       (select dt.disp_table_name from display_table dt where dt.table_name = coalesce(t.table_name, c.table_name)) as "table_name",
       case when c.column_name is distinct from t.column_name
                 then coalesce(c.column_name, 'null') || ' <' || coalesce(t.column_name, 'null') || '>'
            else t.column_name
       end::text as "column_name",
       case when c.udt_name is distinct from t.udt_name
                 then coalesce(c.udt_name, 'null') || ' <' || coalesce(t.udt_name, 'null') || '>'
            else t.udt_name
       end::text as column_type,
       case when c.character_maximum_length is distinct from t.character_maximum_length
                 then coalesce(c.character_maximum_length, 'null') || ' <' || coalesce(t.character_maximum_length, 'null') || '>'
            else t.character_maximum_length
       end::text as text_length,
       case when c.numeric_precision is distinct from t.numeric_precision
                 then coalesce(c.numeric_precision, 'null') || ' <' || coalesce(t.numeric_precision, 'null') || '>'
            else t.numeric_precision
       end::text as precision,
       case when c.numeric_scale is distinct from t.numeric_scale
                 then coalesce(c.numeric_scale, 'null') || ' <' || coalesce(t.numeric_scale, 'null') || '>'
            else t.numeric_scale
       end::text as scale,
       case when c.is_nullable is distinct from t.is_nullable
                 then coalesce(c.is_nullable, 'null') || ' <' || coalesce(t.is_nullable, 'null') || '>'
            else t.is_nullable
       end::text as is_nullable,
       case when c.column_default is distinct from t.column_default
                 then coalesce(c.column_default, 'null') || ' <' || coalesce(t.column_default, 'null') || '>'
            else t.column_default
       end::text as col_default

The rubber meets the road here. This is where all of the comparisons of the records from the two schemata are done. Basically, it checks for differences and reports them in a specified manner. Any difference will be reported as
suspect_schema_value <good_schema_value> meaning that the "good" values will be surrounded by < and > characters.

If a column has a null value it will appear as null or <null> depending if the value originated from the "suspect" or "good" schema.

If the values are identical, then there will only be the plaintext value listed as no other indicators are necessary.

The use of IS DISTINCT FROM was crucial to simplify the comparisons. This is an operator that evaluates null as if it were a value instead of, well... null. So if we consider null to have a value then x IS DISTINCT FROM y would be similar to coalesce(x, 'x') != coalesce(y, 'y') and x IS NOT DISTINCT FROM y would be equivalent to coalesce(x, 'x') != coalesce(y, 'y'). Check out the document link for the official explanation.

assembly

select schema_name, 
       table_name,
       column_name,
       column_type,
       text_length,
       precision,
       scale,
       is_nullable,
       col_default
  from full_table_compare
       -- only report the differences
 where table_name ~ '<'
    or column_name ~ '<'
    or column_type ~ '<'
    or text_length ~ '<'
    or precision ~ '<'
    or scale ~ '<'
    or is_nullable ~ '<'
    or col_default ~ '<'
 order 
    by table_name,
       column_name;

This is a lot of data and, on a good day, it will be mostly if not all in sync. So we're really only interested in the differences. This final query only pulls the records from the output of the full_table_compare CTE but only those records where any column contains a < character denoting a difference. We're checking every column except schema_name as it will always have the format suspect_schema <good_schema> as an indicator of the origin of reported values.

Demo time!

Let's start by creating two schemata:

postgres=# create schema eep;
CREATE SCHEMA
postgres=# create schema opp;
CREATE SCHEMA

Now we'll create some sample tables to play with.

postgres=# create table eep.ork
postgres-# (
postgres(#     id bigserial primary key,
postgres(#     label text,
postgres(#     created_ts timestamptz,
postgres(#     data jsonb
postgres(# );
CREATE TABLE
postgres=# create table eep.oof
postgres-# (
postgres(#     uniq_label text primary key
postgres(# );
CREATE TABLE
postgres=# create table opp.ork
postgres-# (
postgres(#     id bigserial primary key,
postgres(#     label text,
postgres(#     created_ts timestamptz,
postgres(#     data jsonb
postgres(# );
CREATE TABLE
postgres=# create table opp.ah_ah
postgres-# (
postgres(#     id uuid primary key,
postgres(#     authentication_header jsonb
postgres(# );
CREATE TABLE

We'll change our query parameters to say the eep schema is "good" and opp is "suspect". We'll also set it to only look at the ork table.

with check_params as (
    select 'opp' as check_schema,  -- put schema to check here!
           'eep' as main_schema, -- put the known good schema here
           'ork'::text as check_table     -- put table name to check here (or null for all)!
),

Running the query, we see the following output:

 schema_name | table_name | column_name | column_type | text_length | precision | scale | is_nullable | col_default 
-------------+------------+-------------+-------------+-------------+-----------+-------+-------------+-------------
(0 rows)

素晴らしい! No differences!

Let's expand to all of the tables.

with check_params as (
    select 'opp' as check_schema,  -- put schema to check here!
           'eep' as main_schema, -- put the known good schema here
           null::text as check_table     -- put table name to check here (or null for all)!
),

Running the query, we now see:

 schema_name |  table_name  |         column_name          | column_type  | text_length |  precision  |    scale    | is_nullable | col_default 
-------------+--------------+------------------------------+--------------+-------------+-------------+-------------+-------------+-------------
 opp <eep>   | ah_ah <null> | authentication_header <null> | jsonb <null> | null <null> | null <null> | null <null> | YES <null>  | 
 opp <eep>   | ah_ah <null> | id <null>                    | uuid <null>  | null <null> | null <null> | null <null> | YES <null>  | 
 opp <eep>   | null  <oof>  | null <uniq_label>            | null <text>  | null <null> | null <null> | null <null> | null <NO>   | 
(3 rows)

So, let's examine the output.

First, there's the schema_name column. This shows the suspect schema followed by the good schema (surrounded by < and > chars).

Next we have the table_name column. Note there there is an ah_ah table that is in the opp schema, but not in the eep schema. Also, tere is an oof table that is in the eep schema that is not in the opp schema.

The rest of the columns show the differences at each column: type, length, precision, scale, nullable, default.

So we can tell missing tables, but how about actual differences?

Let's change opp.ork a bit.

Change the label from text to varchar(256)

postgres=# alter table opp.ork alter column label set data type varchar(256);
ALTER TABLE

Add a new column updated_ts

postgres=# alter table opp.ork add column updated_ts timestamptz;
ALTER TABLE

Just for fun, let's also alter eep.ork.

Add an audited column

postgres=# alter table eep.ork add column audited boolean not null default false;
ALTER TABLE

Now let's constrain the query back to only the ork table:

with check_params as (
    select 'opp' as check_schema,  -- put schema to check here!
           'eep' as main_schema, -- put the known good schema here
           'ork'::text as check_table     -- put table name to check here (or null for all)!
),

Now, running the query, we see:

 schema_name | table_name |    column_name    |    column_type     | text_length |  precision  |    scale    | is_nullable | col_default  
-------------+------------+-------------------+--------------------+-------------+-------------+-------------+-------------+--------------
 opp <eep>   | ork        | label             | varchar <text>     | 256 <null>  | null        | null        | YES         | 
 opp <eep>   | ork        | null <audited>    | null <bool>        | null <null> | null <null> | null <null> | null <NO>   | null <false>
 opp <eep>   | ork        | updated_ts <null> | timestamptz <null> | null <null> | null <null> | null <null> | YES <null>  | 
(3 rows)

So we've caught the extra column audited in eep.ork, the extra column updated_ts in opp.ork and the change to the common label column.

In closing, this could prove a useful tool to compare tables between schemata. This could be the basis for extending comparisons across databases on the same engine as well.

Hopefully this will prove useful and be helpful to database developers.

*I always pay my tax. Really!