Franck Pachot for YugabyteDB

Posted on Mar 16, 2023 • Edited on Jul 10, 2023

Retype a column in YugabyteDB (and PostgreSQL)

#postgres #yugabytedb #distributed #sql

You inherit a data model that does not use the best datatypes, like using int instead of bigint for a value that can be larger? Don't wait that the tables grows larger. Here is an example to show how to retype it efficiently. Usually, those operations are done in two times:

The DDL that may block the application must be short, with metadata changes only
The DML that modifies data can take longer as long as they are online, without impact on the application availability

During the intermediate phase, you have have to maintain both version of the columns. Because YugabyteDB is PostgreSQL compatible, with triggers, you may even do that without any change to the application code.

Here is an example.

Demo table

I have the following table:

create table demo ( k bigserial primary key, a int);
insert into demo(a) select generate_series(1,10000);
\d demo

yugabyte=> \d demo

                            Table "public.demo"
 Column |  Type   | Collation | Nullable |             Default
--------+---------+-----------+----------+---------------------------------
 k      | bigint  |           | not null | nextval('demo_k_seq'::regclass)
 a      | integer |           |          |
Indexes:
    "demo_pkey" PRIMARY KEY, lsm (k HASH)

Unfortunately, column a was defined as int and I want to change it to bigint to get larger values.

Add the new column

First, I'll add a new column a_new with the target datatype, bigint and a trigger to update it from a:


alter table demo add column a_new bigint;

create or replace function a_new() returns trigger as $$
 begin new.a_new := new.a; return new; end; $$ language plpgsql;

create trigger a_new_trigger
before insert or update of a on demo for each row
 execute function a_new();

This will synchronize the new column for new inserts or when a is updated but I have all existing rows to update:

yugabyte=> select * from demo limit 5;
  k   |  a   | a_new
------+------+-------
 4443 | 4443 |
  212 |  212 |
 8937 | 8937 |
 3710 | 3710 |
 8899 | 8899 |
(5 rows)

Backfill the new column

I can do it in one transaction:

update demo set a_new=a where a_new is null;

However, this will lock the rows during the statement duration and the application DML (actually only UPDATE of a and DELETE - see this
blog post to know why) may wait or timeout. It is better to do it in small transactions.

Optionally, to reduce the reads, I can create an index on what remains to update:

create index demo_a_new on demo(k) where a_new is null;

This can take long but is online, with backfill, and will help to update by small batches without re-scanning the table each time. You may choose to do the same without the index. With the index, you do less reads but more writes.

This will update by batch of 1000:

/*+ Set(yb_bnl_batch_size 1000) */
with updated as (
update demo set a_new=a where k in ( select k from demo where a_new is null limit 1000 )
returning 1 ) select count(*) as updated ,1/count(*) as fail_when_zero from updated
;
\watch 1

It will stop with ERROR: division by zero when there is no more to update. I use that to stop the \watch without more code but you can do better of course.

The most important to be sure that it is efficient is to look at the execution plan with explain (costs off, analyze, dist):

                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (actual time=116.497..116.497 rows=1 loops=1)
   CTE updated
     ->  Update on demo (actual time=84.384..116.095 rows=1000 loops=1)
           ->  YB Batched Nested Loop Join (actual time=84.325..108.839 rows=1000 loops=1)
                 Join Filter: (demo.k = "ANY_subquery".k)
                 ->  HashAggregate (actual time=3.925..4.061 rows=1000 loops=1)
                       Group Key: "ANY_subquery".k
                       ->  Subquery Scan on "ANY_subquery" (actual time=3.098..3.661 rows=1000 loops=1)
                             ->  Limit (actual time=3.093..3.495 rows=1000 loops=1)
                                   ->  Index Only Scan using demo_a_new on demo demo_1 (actual time=3.092..3.407 rows=1000 loops=1)
                                         Heap Fetches: 0
                                         Storage Index Read Requests: 1
                                         Storage Index Execution Time: 4.000 ms
                 ->  Index Scan using demo_pkey on demo (actual time=79.913..103.305 rows=1000 loops=1)
                       Index Cond: (k = ANY (ARRAY["ANY_subquery".k, $2, $3, ..., $1000]))
                       Storage Index Read Requests: 1
                       Storage Index Execution Time: 16.000 ms
   ->  CTE Scan on updated (actual time=84.387..116.390 rows=1000 loops=1)
 Planning Time: 0.902 ms
 Execution Time: 153.539 ms
 Storage Read Requests: 2
 Storage Write Requests: 1
 Storage Execution Time: 56.000 ms
 Peak Memory Usage: 121241 kB
(24 rows)

This confirms that I have only two read requests and one write request for each batch.

The index, that has no entries now, can be removed:

drop index demo_a_new;

Validate the state of the new column

I validate that a_new is synchronized even after some DML:

yugabyte=> select * from demo where k<=3;
 k | a | a_new
---+---+-------
 3 | 3 |     3
 2 | 2 |     2
 1 | 1 |     1
(3 rows)

yugabyte=> update demo set a=a*2;
UPDATE 10000

yugabyte=> select * from demo where k<=3;
 k | a | a_new
---+---+-------
 3 | 6 |     6
 2 | 4 |     4
 1 | 2 |     2
(3 rows)

yugabyte=> insert into demo values(0,42);
INSERT 0 1

yugabyte=> select * from demo where k<=3;
 k | a  | a_new
---+----+-------
 3 |  6 |     6
 2 |  4 |     4
 0 | 42 |    42
 1 |  2 |     2
(4 rows)

Switch to the new column

With no urge and no stress, we are ready for the last step: dropping a and renaming a_new to a:

alter table demo rename column a to a_old;
alter table demo rename column a_new to a;

drop function a_new cascade;
alter table demo drop column a_old;

This must be done while the application does not query the table because in YugabyteDB 2.17 we don't do transactional DDL yet (#3109). However, this is short as it updates only the table metadata. And it is done in a way where it can raise an error but not corrupt data.

Here is the table in the final state:

yugabyte=> select * from demo where k<=3;
 k | a
---+----
 3 |  6
 2 |  4
 0 | 42
 1 |  2
(4 rows)

Remarks

You don't have to use a trigger. This can be done also with two application releases:

the first one adds the column and updates both of them in the application
the second one, after the update of all existing columns, removes the old one and renames the first one

Even if, in theory, you don't need this in PostgreSQL because DDL is transactional, you still want to avoid long transactions in PostgreSQL and this is still a good alternative.

In YugabyteDB 2.19 you can simply:

yugabyte=# alter table demo alter column a type bigint;
ALTER TABLE

But you may still prefer to control when the change concerns metadata only (fast, but concurrent sessions may get a serializable error) or data (longer but with less impact on concurrent transactions).

DEV Community