
Franck Pachot for YugabyteDB


No-gap sequence in PostgreSQL and YugabyteDB

Sequences are special in a SQL database. To be scalable, they are not transactional, and as a consequence the generated values may show gaps.

Non-transactional sequences

For example, in PostgreSQL, I create a table where the ID comes from a sequence:

create sequence my_sequence;
create table demo (
 primary key(id)
 , id bigint default nextval('my_sequence')
 , value text
);

I insert two rows, but with a failed attempt that was rolled back:

begin transaction;
insert into demo(value) values('one');
commit;
begin transaction;
insert into demo(value) values('two');
rollback;
begin transaction;
insert into demo(value) values('two');
commit;

The result has a gap:

postgres=# select * from demo order by id;

 id | value
----+-------
  1 | one
  3 | two
(2 rows)

This is because the sequence increment is not transactional: it was not rolled back with the failed insert. Generally, this doesn't matter.
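This is easy to observe on the sequence itself: a value consumed by nextval() inside a transaction is not given back on rollback. A small sketch (the exact numbers depend on how many values were already consumed):

```sql
begin transaction;
select nextval('my_sequence');  -- consumes one value
rollback;
-- the rollback does not restore the sequence position:
select nextval('my_sequence');  -- returns the next value, leaving a gap
```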

Cached sequences

There's another reason for seeing gaps even when no transactions were rolled back. To avoid incrementing a single row each time we get the next value, a range of values is cached:

alter sequence my_sequence cache 100;

PostgreSQL caches it per connection, and the range of values is lost when we reconnect:

insert into demo(value) values('three');
\connect
insert into demo(value) values('four');
insert into demo(value) values('five');

Here is the result:

postgres=# select * from demo order by id;

 id  | value
-----+-------
   1 | one
   3 | two
   4 | three
 104 | four
 105 | five
(5 rows)

In YugabyteDB, the sequence cache is held on the tablet server, so you may not see this gap when reconnecting. However, YugabyteDB is elastic and resilient, allowing nodes to be added or removed, and this will also introduce gaps.

This should not be a problem. SQL sequences are provided to generate unique values, and doing so by incrementing a number is an implementation optimization. You should not expect a no-gap sequence.

Renumbering after commit

If you need no-gap numbering, you can assign it after commit, in another column:

alter table demo add column num bigint;
create unique index demo_num on demo(num asc);
alter table demo add unique using index demo_num;

I run a job that adds the no-gap numbering for the rows that don't have a number yet in this column:

update demo 
 set num=new.num
 from (
  select id
   , coalesce( (select max(num) from demo) ,0)
     + row_number()over(order by id)
     as num
  from demo where num is null order by id
 ) new where demo.id=new.id
;

You can check the execution plan to be sure that it uses an Index Scan to get the max(num). The result is:

postgres=# select * from demo order by num;
 id  | value | num
-----+-------+-----
   1 | one   |   1
   3 | two   |   2
   4 | three |   3
 104 | four  |   4
 105 | five  |   5
(5 rows)
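For the execution plan check mentioned above, something like this should show the index being used (plan output abbreviated; the exact shape can differ between PostgreSQL versions and YugabyteDB):

```sql
explain (costs off)
select max(num) from demo;
-- With the demo_num index in place, PostgreSQL typically answers this
-- with an Index Only Scan (Backward) on demo_num rather than a Seq Scan.
```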

This can be sufficient if you want a no-gap number before a specific point in time, like some end-of-month accounting. Be aware of the cost in PostgreSQL, where updating a single column copies the whole row, with WAL, bloat, and vacuum consequences. YugabyteDB doesn't have these problems.

No-gap sequence from a transactional table

If you want an immediate no-gap sequence, you need to implement it yourself from a transactional table. This is quite easy, but you must accept that multiple sessions asking for the next value of one sequence will queue to get their number, which limits scalability.

A simple way is to implement it as a table, with a function to get the next value. Here is an example where I want a no-gap sequence starting at one each year, to generate invoice numbers:

create table next_invoice_number(
 year int primary key, num int default 1
);

create function next_invoice_number(year int) returns int as $$
  insert into next_invoice_number as tab (year)
  values(next_invoice_number.year)
  on conflict (year) do
  update set num=tab.num+1
  returning tab.num
  ;
$$ language sql;

The function will increment the value and return it. If there's no record yet for the year, a new one is created. This, with serializable isolation level, will generate no-gap numbers:

create table invoices (
 primary key(year, num) 
 , year int, num int
);

insert into invoices 
 select 2023, next_invoice_number(2023) 
 from generate_series(1,10)
 returning num
;

The result:

postgres=# select * from invoices order by num;
 year | num
------+-----
 2023 |   1
 2023 |   2
 2023 |   3
 2023 |   4
 2023 |   5
 2023 |   6
 2023 |   7
 2023 |   8
 2023 |   9
 2023 |  10
(10 rows)
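Unlike the non-transactional sequence at the start of this post, a rollback here takes the counter back with it, so no gap appears. A sketch continuing from the ten rows above (values assuming no concurrent session touches the table):

```sql
begin transaction;
insert into invoices values (2023, next_invoice_number(2023));  -- would take 11
rollback;
-- the counter increment was rolled back with the insert,
-- so the next call gets the same number:
insert into invoices values (2023, next_invoice_number(2023))
returning num;  -- 11, no gap
```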

No-Gap and Scalability

With pgbench, I test the throughput when generating numbers from one session only:

\! pgbench -t 1000 -nf <(echo '
insert into invoices values (2023, next_invoice_number(2023) );
') -c 1 

pgbench (16beta2, server 15.1 (Debian 15.1-1.pgdg110+1))
transaction type: /dev/fd/63
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 1000/1000
number of failed transactions: 0 (0.000%)
latency average = 0.981 ms
initial connection time = 5.319 ms
tps = 1019.719332 (without initial connection time)

Now I can compare these 1019 transactions per second, at 1 millisecond latency, with a run with 10 concurrent clients:

\! pgbench -t 1000 -nf <(echo '
insert into invoices values (2023, next_invoice_number(2023) );
') -c 10

pgbench (16beta2, server 15.1 (Debian 15.1-1.pgdg110+1))
transaction type: /dev/fd/63
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 14.528 ms
initial connection time = 53.172 ms
tps = 688.320969 (without initial connection time)

There is a trade-off here: if you need no-gap numbering, there is no other solution than a single serialization point to generate the numbers.

Note that all these examples work the same in PostgreSQL and in YugabyteDB. I created the unique constraint on num from a pre-created unique index so that I could specify ASC for range sharding. The default in YugabyteDB would have been HASH, but there's a good chance that if you want a no-gap sequence you will also run range queries on it.
