Sequences are special in a SQL database. To be scalable, they are not transactional and, by consequence, the result may show some gaps.
Non-transactional sequences
For example, in PostgreSQL, I create a table where the ID comes from a sequence:
create sequence my_sequence;
create table demo (
primary key(id)
, id bigint default nextval('my_sequence')
, value text
);
I insert two rows, but with a failed attempt that was rolled back:
begin transaction;
insert into demo(value) values('one');
commit;
begin transaction;
insert into demo(value) values('two');
rollback;
begin transaction;
insert into demo(value) values('two');
commit;
The result has a gap:
postgres=# select * from demo order by id;
id | value
----+-------
1 | one
3 | two
(2 rows)
This is because the increment of the sequence is not transactional and was not rolled back, and generally, it doesn't matter.
Cached sequences
There's another reason for seeing gaps even when no transactions were rolled back. To avoid incrementing a single row each time we get the next value, a range of value is cached:
alter sequence my_sequence cache 100;
PostgreSQL caches it per connection, and the range of value is lost if we reconnect:
insert into demo(value) values('three');
\connect
insert into demo(value) values('four');
insert into demo(value) values('five');
Here is the result
postgres=# select * from demo order by id;
id | value
-----+-------
1 | one
3 | two
4 | three
104 | four
105 | five
(5 rows)
In YugabyteDB, the sequence is cached on the tablet server and you may not see this gap when re-connecting. However, YugabyteDB is elastic and resilient by allowing nodes to be added or removed and this will also introduce a gap.
This should not be a problem. SQL sequences are provided to generate unique values, and doing so by incrementing a number is an implementation optimization. You should not expect a no-gap sequence.
Re-numerotation after commit
If you need a no-gap numbering, you can do that after commit, with another column:
alter table demo add column num bigint;
create unique index demo_num on demo(num asc);
alter table demo add unique using index demo_num;
I run a job that adds the no-gap numbering for the rows that don't have a number yet in this column:
update demo
set num=new.num
from (
select id
, coalesce( (select max(num) from demo) ,0)
+ row_number()over(order by id)
as num
from demo where num is null order by id
) new where demo.id=new.id
;
You can check the execution plan to be sure that it uses an Index Scan to get the max(num)
. The result is:
postgres=# select * from demo order by num;
id | value | num
-----+-------+-----
1 | one | 1
3 | two | 2
4 | three | 3
104 | four | 4
105 | five | 5
(5 rows)
This can be sufficient if you want a no-gap number before a specific point in time, like some end of month accounting. Be aware of the cost of this in PostgreSQL where updating a single column copies the whole row, with WAL, bloat, and need to vacuum consequences. YugabyteDB doesn't have those problems.
No-gap sequence from a transactional table
If you want an immediate no-gap sequence, you need to implement it yourself from a transactional table. This is quite easy but you must accept that having multiple sessions asking for the next value for one sequence will have to queue to get their number and this will reach some scalability limits.
A simple way is implementing those in a table with a function to get the next value. Here is an example where I want a no-gap sequence starting at one each year, to generate invoice numbers:
create table next_invoice_number(
year int primary key, num int default 1
);
create function next_invoice_number(year int) returns int as $$
insert into next_invoice_number as tab (year)
values(next_invoice_number.year)
on conflict (year) do
update set num=tab.num+1
returning tab.num
;
$$ language sql;
The function will increment the value and return it. If there's no record yet for the year, a new one is created. This, with serializable isolation level, will generate no-gap numbers:
create table invoices (
primary key(year, num)
, year int, num int
);
insert into invoices
select 2023, next_invoice_number(2023)
from generate_series(1,10)
returning num
;
The result:
postgres=# select * from invoices order by num;
year | num
------+-----
2023 | 1
2023 | 2
2023 | 3
2023 | 4
2023 | 5
2023 | 6
2023 | 7
2023 | 8
2023 | 9
2023 | 10
(10 rows)
No-Gap and Scalability
With pgbench
, I test the throughput when generating numbers from one session only:
\! pgbench -t 1000 -nf <(echo '
insert into invoices values (2023, next_invoice_number(2023) );
') -c 1
pgbench (16beta2, server 15.1 (Debian 15.1-1.pgdg110+1))
transaction type: /dev/fd/63
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 1000/1000
number of failed transactions: 0 (0.000%)
latency average = 0.981 ms
initial connection time = 5.319 ms
tps = 1019.719332 (without initial connection time)
Now I can compare this 1019 transactions per second, at 1 millisecond latency, with a run with 10 concurrent threads:
\! pgbench -t 1000 -nf <(echo '
insert into invoices values (2023, next_invoice_number(2023) );
') -c 10
pgbench (16beta2, server 15.1 (Debian 15.1-1.pgdg110+1))
transaction type: /dev/fd/63
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 14.528 ms
initial connection time = 53.172 ms
tps = 688.320969 (without initial connection time)
There is a trade-off here. If you need no-gap numbering, there's no other solution than having a single point to generate the numbers.
Note that all those examples work the same in PostgreSQL and in YugabyteDB. I've created the unique constraint on num
by creating the unique index beforehand so that I can precise ASC for range sharding. The default would have been HASH on YugabyteDB but there's a good chance that if you want a no-gap sequence you will also do some range queries on it.
Top comments (0)