DEV Community

Dmitry Romanoff
Dmitry Romanoff

Posted on

How PostgreSQL database tables rows are allocated?

In this blog I will demonstrate how data rows are placed in PostgreSQL database tables.

I'll examine the impact of the rows allocation on explain plan of SQL queries.

In the blog I will use a hidden PostgreSQL column ctid.

Ctid is unique column for each table in PostgreSQL. It has a combination of two values: block number and tuple index within the block.

Let's create some table and populate it with data.

Then let's check how ctid is populated.

postgres=# create table some_table(a bigint, b varchar(100), c timestamp);
CREATE TABLE
postgres=# insert into some_table values(1, 'Some data', now());
INSERT 0 1
postgres=# select ctid, * from some_table;
 ctid  | a |     b     |             c
-------+---+-----------+----------------------------
 (0,1) | 1 | Some data | 2023-01-19 06:32:33.587591
(1 row)

postgres=#

postgres=# insert into some_table values(2, 'Some data 2', now());
INSERT 0 1
postgres=# select ctid, * from some_table;
 ctid  | a |      b      |             c
-------+---+-------------+----------------------------
 (0,1) | 1 | Some data   | 2023-01-19 06:32:33.587591
 (0,2) | 2 | Some data 2 | 2023-01-19 06:33:43.122062
(2 rows)

postgres=#

postgres=# insert into some_table values(3, 'Some data 3', now());
INSERT 0 1
postgres=# select ctid, * from some_table;
 ctid  | a |      b      |             c
-------+---+-------------+----------------------------
 (0,1) | 1 | Some data   | 2023-01-19 06:32:33.587591
 (0,2) | 2 | Some data 2 | 2023-01-19 06:33:43.122062
 (0,3) | 3 | Some data 3 | 2023-01-19 06:34:00.482305
(3 rows)

postgres=#
Enter fullscreen mode Exit fullscreen mode

Note, there is correlation between the number of blocks PostgreSQL operates to perform a query and the cost of a query.

postgres=# explain (analyze, buffers)  select  * from some_table;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Seq Scan on some_table  (cost=0.00..1.03 rows=3 width=27) (actual time=0.008..0.010 rows=3 loops=1)
   Buffers: shared hit=1
 Planning:
   Buffers: shared hit=11
 Planning Time: 0.140 ms
 Execution Time: 0.027 ms
(6 rows)

postgres=#
Enter fullscreen mode Exit fullscreen mode

Let's check, how 'insert', 'update', 'delete' operations affect allocation of rows in the block(s).

postgres=# insert into some_table values(4, 'Some data 4', now());
INSERT 0 1
postgres=# insert into some_table values(5, 'Some data 5', now());
INSERT 0 1
postgres=# select ctid, * from some_table;
 ctid  | a |      b      |             c
-------+---+-------------+----------------------------
 (0,1) | 1 | Some data   | 2023-01-19 06:32:33.587591
 (0,2) | 2 | Some data 2 | 2023-01-19 06:33:43.122062
 (0,3) | 3 | Some data 3 | 2023-01-19 06:34:00.482305
 (0,4) | 4 | Some data 4 | 2023-01-19 06:38:06.047129
 (0,5) | 5 | Some data 5 | 2023-01-19 06:38:13.031061
(5 rows)

postgres=#

postgres=# delete from some_table where a=1;
DELETE 1
postgres=# select ctid, * from some_table;
 ctid  | a |      b      |             c
-------+---+-------------+----------------------------
 (0,2) | 2 | Some data 2 | 2023-01-19 06:33:43.122062
 (0,3) | 3 | Some data 3 | 2023-01-19 06:34:00.482305
 (0,4) | 4 | Some data 4 | 2023-01-19 06:38:06.047129
 (0,5) | 5 | Some data 5 | 2023-01-19 06:38:13.031061
(4 rows)

postgres=#

postgres=# delete from some_table where a=2;
DELETE 1
postgres=# select ctid, * from some_table;
 ctid  | a |      b      |             c
-------+---+-------------+----------------------------
 (0,3) | 3 | Some data 3 | 2023-01-19 06:34:00.482305
 (0,4) | 4 | Some data 4 | 2023-01-19 06:38:06.047129
 (0,5) | 5 | Some data 5 | 2023-01-19 06:38:13.031061
(3 rows)

postgres=#

postgres=# insert into some_table values(6, 'Some data 6', now());
INSERT 0 1
postgres=# select ctid, * from some_table;
 ctid  | a |      b      |             c
-------+---+-------------+----------------------------
 (0,3) | 3 | Some data 3 | 2023-01-19 06:34:00.482305
 (0,4) | 4 | Some data 4 | 2023-01-19 06:38:06.047129
 (0,5) | 5 | Some data 5 | 2023-01-19 06:38:13.031061
 (0,6) | 6 | Some data 6 | 2023-01-19 06:42:32.575417
(4 rows)
Enter fullscreen mode Exit fullscreen mode

When inserting rows right after deletion it's not using the deleted slots. It's allocating more slots in the block.

postgres=# insert into some_table values(7, 'Some data 7', now());
INSERT 0 1
postgres=# select ctid, * from some_table;
 ctid  | a |      b      |             c
-------+---+-------------+----------------------------
 (0,3) | 3 | Some data 3 | 2023-01-19 06:34:00.482305
 (0,4) | 4 | Some data 4 | 2023-01-19 06:38:06.047129
 (0,5) | 5 | Some data 5 | 2023-01-19 06:38:13.031061
 (0,6) | 6 | Some data 6 | 2023-01-19 06:42:32.575417
 (0,7) | 7 | Some data 7 | 2023-01-19 06:42:46.647673
(5 rows)

postgres=#

postgres=# delete from some_table where a=7;
DELETE 1
postgres=# insert into some_table values(8, 'Some data 8', now());
INSERT 0 1
postgres=# insert into some_table values(9, 'Some data 9', now());
INSERT 0 1
postgres=# select ctid, * from some_table;
 ctid  | a |      b      |             c
-------+---+-------------+----------------------------
 (0,3) | 3 | Some data 3 | 2023-01-19 06:34:00.482305
 (0,4) | 4 | Some data 4 | 2023-01-19 06:38:06.047129
 (0,5) | 5 | Some data 5 | 2023-01-19 06:38:13.031061
 (0,6) | 6 | Some data 6 | 2023-01-19 06:42:32.575417
 (0,8) | 8 | Some data 8 | 2023-01-19 06:45:24.432032
 (0,9) | 9 | Some data 9 | 2023-01-19 06:45:33.272139
(6 rows)

postgres=#
Enter fullscreen mode Exit fullscreen mode

The similar behaviour in case of updates.

The updated rows tuples occupy new slots in the block.

They are not updated "in place".

postgres=# update some_table set b='data row 4' where a=4;
UPDATE 1
postgres=# select ctid, * from some_table;
  ctid  | a |      b      |             c
--------+---+-------------+----------------------------
 (0,3)  | 3 | Some data 3 | 2023-01-19 06:34:00.482305
 (0,5)  | 5 | Some data 5 | 2023-01-19 06:38:13.031061
 (0,6)  | 6 | Some data 6 | 2023-01-19 06:42:32.575417
 (0,8)  | 8 | Some data 8 | 2023-01-19 06:45:24.432032
 (0,9)  | 9 | Some data 9 | 2023-01-19 06:45:33.272139
 (0,10) | 4 | data row 4  | 2023-01-19 06:38:06.047129
(6 rows)

postgres=#

postgres=# update some_table set b='data row 8' where a=8;
UPDATE 1
postgres=# select ctid, * from some_table;
  ctid  | a |      b      |             c
--------+---+-------------+----------------------------
 (0,3)  | 3 | Some data 3 | 2023-01-19 06:34:00.482305
 (0,5)  | 5 | Some data 5 | 2023-01-19 06:38:13.031061
 (0,6)  | 6 | Some data 6 | 2023-01-19 06:42:32.575417
 (0,9)  | 9 | Some data 9 | 2023-01-19 06:45:33.272139
 (0,10) | 4 | data row 4  | 2023-01-19 06:38:06.047129
 (0,11) | 8 | data row 8  | 2023-01-19 06:45:24.432032
(6 rows)

postgres=#

postgres=# delete from some_table where a != 8;
DELETE 5
postgres=# select ctid, * from some_table;
  ctid  | a |     b      |             c
--------+---+------------+----------------------------
 (0,11) | 8 | data row 8 | 2023-01-19 06:45:24.432032
(1 row)

postgres=# insert into some_table values(9, 'data row 9', now());
INSERT 0 1
postgres=# select ctid, * from some_table;
  ctid  | a |     b      |             c
--------+---+------------+----------------------------
 (0,11) | 8 | data row 8 | 2023-01-19 06:45:24.432032
 (0,12) | 9 | data row 9 | 2023-01-19 06:56:14.852065
(2 rows)

postgres=#
Enter fullscreen mode Exit fullscreen mode

When we do a delete in PostgreSQL, the row (= tuple) is not immediately removed from the data file. Instead it is only marked as deleted.

Similarly for updates, which may be seen as delete + inserts in PostgreSQL.

To reclaim storage occupied by dead tuples we'll use vacuum.

postgres=# vacuum some_table;
VACUUM
postgres=# select ctid, * from some_table;
  ctid  | a |     b      |             c
--------+---+------------+----------------------------
 (0,11) | 8 | data row 8 | 2023-01-19 06:45:24.432032
 (0,12) | 9 | data row 9 | 2023-01-19 06:56:14.852065
(2 rows)

postgres=# insert into some_table values(11, 'data row 11', now());
INSERT 0 1
postgres=# insert into some_table values(12, 'data row 12', now());
INSERT 0 1
postgres=# insert into some_table values(13, 'data row 13', now());
INSERT 0 1
postgres=# insert into some_table values(14, 'data row 14', now());
INSERT 0 1
postgres=# select ctid, * from some_table;
  ctid  | a  |      b      |             c
--------+----+-------------+----------------------------
 (0,1)  | 11 | data row 11 | 2023-01-19 07:34:20.229467
 (0,2)  | 12 | data row 12 | 2023-01-19 07:34:26.4457
 (0,3)  | 13 | data row 13 | 2023-01-19 07:34:32.053483
 (0,4)  | 14 | data row 14 | 2023-01-19 07:34:37.165529
 (0,11) |  8 | data row 8  | 2023-01-19 06:45:24.432032
 (0,12) |  9 | data row 9  | 2023-01-19 06:56:14.852065
(6 rows)

postgres=#
Enter fullscreen mode Exit fullscreen mode

VACUUM FULL writes the entire content of the table into a new disk file and releases the wasted space back to OS.

This causes a table-level lock on the table and slow speeds.

VACUUM FULL should be avoided on a high load system.

postgres=# vacuum full some_table;
VACUUM
postgres=# select ctid, * from some_table;
 ctid  | a  |      b      |             c
-------+----+-------------+----------------------------
 (0,1) | 11 | data row 11 | 2023-01-19 07:34:20.229467
 (0,2) | 12 | data row 12 | 2023-01-19 07:34:26.4457
 (0,3) | 13 | data row 13 | 2023-01-19 07:34:32.053483
 (0,4) | 14 | data row 14 | 2023-01-19 07:34:37.165529
 (0,5) |  8 | data row 8  | 2023-01-19 06:45:24.432032
 (0,6) |  9 | data row 9  | 2023-01-19 06:56:14.852065
(6 rows)

postgres=#
Enter fullscreen mode Exit fullscreen mode

Let's populate the table with 100,000 rows.

To populate it I will use PostgreSQL anonymous code block:

do $$
declare
   n_of_recs bigint := 100000;
   random_varchar_length smallint;
   random_varchar varchar(100);
   random_timestamp timestamp;
   random_int bigint;   
   query text; 
begin

      for idx in 1..n_of_recs loop      

         -- some random varchar length between 1 and 100
         random_varchar_length := floor(random()*(100-1+1))+1;

         -- some random varchar
         random_varchar := array_to_string(array(select chr((ascii('a') + round(random() * 25)) :: integer) from generate_series(1,random_varchar_length)), ''); 

         -- some random int between 10 and 999999
         random_int := floor(random()*(999999-10+1))+10;

         -- some random timestamp between '1900-01-01 00:00:00' and '2024-01-01 00:00:00'
         random_timestamp := timestamp '1900-01-01 00:00:00' + random() * (timestamp '2024-01-01 00:00:00' - timestamp '1900-01-01 00:00:00');

         query := format('insert into some_table values (%s, ''%s'', ''%s'');', random_int, random_varchar, random_timestamp);

         execute query;

      end loop;

      select count(1) from some_table into n_of_recs;

      raise notice 'n_of_recs: %', n_of_recs;

end$$;

NOTICE:  n_of_recs: 100006
DO
postgres=# select count(1) from some_table;
 count
--------
 100006
(1 row)

postgres=# select ctid, * from some_table order by 1;

   ctid    |   a    |                                                  b                                                   |             c
-----------+--------+------------------------------------------------------------------------------------------------------+----------------------------
 (0,1)     |     11 | data row 11                                                                                          | 2023-01-19 07:34:20.229467
 (0,2)     |     12 | data row 12                                                                                          | 2023-01-19 07:34:26.4457
 (0,3)     |     13 | data row 13                                                                                          | 2023-01-19 07:34:32.053483
 (0,4)     |     14 | data row 14                                                                                          | 2023-01-19 07:34:37.165529
 (0,5)     |      8 | data row 8                                                                                           | 2023-01-19 06:45:24.432032
 (0,6)     |      9 | data row 9                                                                                           | 2023-01-19 06:56:14.852065
 (0,7)     | 818252 | thmbosfqspeftpfsbpeshkcgxejglyxavojcctdzbqzoclrfkovbjnqlgefdotxjghxlnpknwvsxsreceyhohep              | 1995-05-17 19:46:24.613045
 (0,8)     | 483213 | rxegngumunvomlbqpjgdjaqtkoojhntekirmgqufihfetfsgvuetxofmpkgdulh                                      | 1930-07-14 17:37:37.662741
 (0,9)     | 172634 | cstbmbqenkkkkbfllvqstxvxvbobcgpegfithcnnpnooxfkdlmsnylndfvaxstguvw                                   | 1974-02-18 06:17:12.378766
 (0,10)    | 261913 | mepyfjfettjfnqquyjeqvqpnxogedosmkruwrgsvobui                                                         | 1929-06-26 00:39:31.413016
 (0,11)    | 892834 | distnlbpekvozttjtccfjcexboulpaefklthforhjrrlilnlegffvhbqxdjvlifelhnptfaoifo                          | 1991-09-28 17:15:19.777339
 (0,12)    | 725914 | igg                                                                                                  | 2016-03-05 12:28:38.527894
 (0,13)    | 933645 | jonlmscvrwqopvxlnmoqrpi                                                                              | 1975-06-18 21:54:13.063909
 (0,14)    | 267353 | wgtabkxtfrcmyuvdmuecjeixeypgfxrrzwutcwitjrydpoantfhvcxoifyjdfwgsxkktxfs                              | 1994-07-01 21:33:42.829717
 (0,15)    | 895733 | ehihnkmalycuhs                                                                                       | 2022-01-21 10:36:00.537987
 (0,16)    | 277033 | paypawsrlagmeslnorflsrytlqzfopwrtyemrvdixym                                                          | 1912-02-26 20:28:04.74868
 (0,17)    | 381127 | ngjifvdbgpbklwyhnrqhvulunkphrxyeehssrdztsvvxjkbrddugocmmuktbpcfhegeokaslt                            | 1978-09-24 13:57:50.683133
 (0,18)    | 145364 | ilvydbrdfrukpmgmimutgkdxormwhoxbhtcipnudcwtpkwudeiaiusxwhnlqprywjijzqledgkjszcshtzvxixhapdkbcxle     | 1914-10-29 01:01:01.052877
 (0,19)    |  30557 | gaqqbvbecjxlwnygnzmuvdjmgvuuhykwniumllad                                                             | 1995-06-06 23:19:28.814136
 (0,20)    | 493356 | ubsolrjutbowqwwlhrvvlslqpfczqqylbaykkpbscyoqxtrekyjdvyleossyjawydcvfnnbtdmrlhylchvaqxorp             | 1988-07-20 13:28:00.269443
 (0,21)    | 177633 | xthnsbicblfiudwyehlqkfjzrjbwvnqoiykovujzubrbwfhrmpvfulgtrdlupbrkrd                                   | 1962-01-27 17:21:11.319584
 (0,22)    | 670676 | hpybddysqwahymmynxrgmocmsmpfwcstjwijkjjudboezmkmixlrolpbncsgcnmrttvgm                                | 2017-11-01 11:58:27.012205
 (0,23)    | 781762 | gvmifiwwghwxuedmwnhmfgy                                                                              | 1929-06-26 00:24:27.996451
 (0,24)    | 952472 | gyzxnlswcvefaqtcorbkrrwsjyenovtbdkksdmghfhveppionbagnumxghxezanupbxahbuquxhcbbnoblfndriceklcf        | 1929-04-17 18:11:11.297861
 (0,25)    |  74972 | wnuwbrymujljpymjeiqblzkffimxtndysrvfdkunuytwwihp                                                     | 2004-01-24 15:01:37.238796
 (0,26)    | 409840 | leqniseeblshgvidkikrhosfkbhvhbixiaqevibvfbfnbvlxgjctroruwxldyyvidekiinahwouiidcojvsqrbnlzpqjqxdy     | 1998-08-02 12:42:36.204115
 (0,27)    | 140003 | xuhfioyngembxqyiwdpdnicgv                                                                            | 1993-04-18 10:34:15.487991
 (0,28)    | 978583 | wbmosdusqxrxhydstsiwormyptfenwhyodbrwixeuexkjymnfoemetrfgi                                           | 2018-04-13 03:00:55.070482
 (0,29)    | 939215 | vuspibifpwjvvixviogylsrysunxbqqiukpennpohebpdqsjegwovgwcyjhyqpuhfqhjtnpawsggudyfwglypqqfxnvsftu      | 2020-12-17 12:10:54.875318
 (0,30)    | 591045 | vncnfwsyezckyxqjejcbroyqzegkxrqorzctghxhcfdvnqouebabuuzrfrbmjuxmrugveqqwejpc                         | 2019-11-27 06:43:45.933303
  ...


postgres=# select ctid, * from some_table order by 1 desc;
   ctid    |   a    |                                                  b                                                   |             c
-----------+--------+------------------------------------------------------------------------------------------------------+----------------------------
 (1217,4)  | 435200 | ewk                                                                                                  | 2000-10-21 23:24:26.672407
 (1217,3)  | 170409 | ssytrbrbnwlsdmgyzqgghhnohuzllewpreivxwfxrpduneieusgvwvznzslywemcnxtisvalx                            | 1968-01-25 14:08:59.738072
 (1217,2)  | 315386 | ngqmrbhakhhjnlrgfxdubpnuvpzdepqfjwcpxgfcjgc                                                          | 1943-06-17 17:14:48.710813
 (1217,1)  | 326906 | jbefklnrrexflwhvutbmhmqxgqsimpvahfvjjgwvehnwfyjufrgerusmefxylqbcnemfxnuireuerlxwlxhykpsmmehwcnzphkdg | 2012-04-06 13:44:31.189737
 (1216,82) | 934300 | bbllprohweckivvfrzdnwnvhebcjqcxgw                                                                    | 1928-07-29 16:52:35.02805
 (1216,81) | 933560 | gynthyyiomhnbnnrzhirhgjjcgqgdtmnpbyaouewyjovgknlkbpqavepyvgkilchrpumnywwpuyvmzdqzq                   | 1923-11-20 17:11:26.162056
 (1216,80) | 936355 | dhusjwzluqcdbclwgdpdgdsheikvvowhettixytpifwjffscjdpeixxgzvppnytqrhkcnnxzbvceffqxtysnzqzpunetcfvxno   | 1922-01-28 20:32:10.139441
 (1216,79) | 622519 | iimfugohxqysyvjulrgjwgijnomodtwhccdhcnslfuvlotboqlkprhqstkpjjryu                                     | 1911-08-18 22:04:14.869798
 (1216,78) | 558434 | cvgjqmlqrrueevidrvcteyuxnfkrfdkmdnyedmtkpmlvyiessmeemxtlusvvtcxetfgfiovxxwste                        | 1968-01-14 17:02:40.066325
 (1216,77) | 334962 | wkgnbeffhosoimlufdmsgdbwqtfpmegxhjvgskxnyebdshgipszcjljoxswclrolczsbxxxesnsgfn                       | 1939-07-18 09:47:41.385745
 (1216,76) | 626407 | lvhuqnejuxjmgbymluntbrzjdwokfezapienouflogmcdextapybsslxfltmhukro                                    | 1907-10-21 07:09:50.792027
 (1216,75) | 524222 | hmfqpkmowlwmdhflxvqxmqkulvcbndgrkthlvmjcrceytkwxkyedgwyppisccwrrimysosxpowtrdbiyvquuhayzzjayehpcvgs  | 1902-12-18 17:04:07.061058
 (1216,74) | 502952 | dhpjrxjqpelipncphakmmj                                                                               | 1926-03-25 04:17:09.702591
 (1216,73) | 585605 | jtxkdymeulhrxbsdrq                                                                                   | 1922-09-30 05:55:48.012551
 (1216,72) |  62103 | chvrjewoishldlvfjvmnbfmshmbytbat                                                                     | 1928-08-12 06:54:50.968235
 (1216,71) | 673121 | odnie                                                                                                | 2021-10-04 18:27:11.108491
 ...


postgres=# analyze some_table;
ANALYZE
postgres=# explain (buffers, analyze) select * from some_table;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Seq Scan on some_table  (cost=0.00..2218.06 rows=100006 width=67) (actual time=0.011..13.283 rows=100006 loops=1)
   Buffers: shared hit=1218
 Planning:
   Buffers: shared hit=13
 Planning Time: 0.261 ms
 Execution Time: 21.694 ms
(6 rows)

postgres=#
Enter fullscreen mode Exit fullscreen mode

Now I will delete all the rows from the table some_table.

And then insert just one record: insert into some_table values(1, 'One', now());

postgres=# delete from some_table;
DELETE 100006
postgres=# insert into some_table values(1, 'One', now());
INSERT 0 1
postgres=# select ctid, * from some_table;
   ctid    | a |  b  |             c
-----------+---+-----+---------------------------
 (1011,83) | 1 | One | 2023-01-19 07:51:07.46366
(1 row)
Enter fullscreen mode Exit fullscreen mode

Examining explain plan of the "select * from some_table" query we can see Buffers: shared hit=1012 and the cost is 1012, despite it's just 1 row to retrieve.

postgres=# explain (buffers, analyze) select * from some_table;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Seq Scan on some_table  (cost=0.00..1012.01 rows=1 width=20) (actual time=0.633..0.634 rows=1 loops=1)
   Buffers: shared hit=1012
 Planning:
   Buffers: shared hit=11
 Planning Time: 0.136 ms
 Execution Time: 0.652 ms
(6 rows)

postgres=#
Enter fullscreen mode Exit fullscreen mode

Suddenly we see that inserting more rows in the table is starting from the first block: (0,1)

postgres=# insert into some_table values(2, 'Two', now());
INSERT 0 1

postgres=# select ctid, * from some_table;
   ctid    | a |  b  |             c
-----------+---+-----+----------------------------
 (0,1)     | 2 | Two | 2023-01-19 07:55:41.581431
 (1011,83) | 1 | One | 2023-01-19 07:51:07.46366
(2 rows)

postgres=#

postgres=# insert into some_table values(3, 'Three', now());
INSERT 0 1
postgres=# select ctid, * from some_table;
   ctid    | a |   b   |             c
-----------+---+-------+----------------------------
 (0,1)     | 2 | Two   | 2023-01-19 07:55:41.581431
 (0,2)     | 3 | Three | 2023-01-19 07:57:17.164661
 (1011,83) | 1 | One   | 2023-01-19 07:51:07.46366
(3 rows)

postgres=#

postgres=# insert into some_table values(4, 'Four', now());
INSERT 0 1

postgres=# select * from some_table;
 a |   b   |             c
---+-------+----------------------------
 2 | Two   | 2023-01-19 07:55:41.581431
 3 | Three | 2023-01-19 07:57:17.164661
 4 | Four  | 2023-01-19 07:57:36.428461
 1 | One   | 2023-01-19 07:51:07.46366
(4 rows)

postgres=#
Enter fullscreen mode Exit fullscreen mode

What is caused the rows to be populated from the very first block 0 and not continuing allocate more slotes in the blocks 1011?

To answer this question let's run the following query. It checks number of inserted, updated, deleted, live, dead tuples, the last vacuum, autovacuum, analyze and autoanalyze timestamps:

select 
  schemaname, 
  relname, 
  n_tup_ins, 
  n_tup_upd, 
  n_tup_del, 
  n_live_tup, 
  n_dead_tup, 
  last_vacuum, 
  last_autovacuum, 
  last_analyze, 
  last_autoanalyze 
from 
  pg_stat_all_tables 
where 
  schemaname = 'public'
order by 
  n_dead_tup desc;

schemaname |     relname      | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup |          last_vacuum          |        last_autovacuum        |         last_analyze          |       last_autoanalyze
------------+------------------+-----------+-----------+-----------+------------+------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------
 public     | some_table       |    100018 |         2 |    100014 |          4 |          0 | 2023-01-19 07:34:01.876383+00 | 2023-01-19 07:51:30.818759+00 | 2023-01-19 07:44:07.334436+00 | 2023-01-19 07:51:30.842668+00
(3 rows)
Enter fullscreen mode Exit fullscreen mode

So the reason why the inserting more rows in the table is starting from the first block: (0,1) is autovacuum process.

PostgreSQL has an optional but highly recommended feature called autovacuum, whose purpose is to automate the execution of VACUUM and ANALYZE commands.

When enabled (on SaaS it's always enabled), autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples.

These checks use the statistics collection facility; therefore, autovacuum cannot be used unless track_counts is set to true.

In the default configuration, autovacuuming is enabled and the related configuration parameters are appropriately set.

postgres=# \d pg_settings;
               View "pg_catalog.pg_settings"
     Column      |  Type   | Collation | Nullable | Default
-----------------+---------+-----------+----------+---------
 name            | text    |           |          |
 setting         | text    |           |          |
 unit            | text    |           |          |
 category        | text    |           |          |
 short_desc      | text    |           |          |
 extra_desc      | text    |           |          |
 context         | text    |           |          |
 vartype         | text    |           |          |
 source          | text    |           |          |
 min_val         | text    |           |          |
 max_val         | text    |           |          |
 enumvals        | text[]  |           |          |
 boot_val        | text    |           |          |
 reset_val       | text    |           |          |
 sourcefile      | text    |           |          |
 sourceline      | integer |           |          |
 pending_restart | boolean |           |          |

postgres=# select name, setting from pg_settings where name like 'auto%';
                 name                  |  setting
---------------------------------------+-----------
 autovacuum                            | on
 autovacuum_analyze_scale_factor       | 0.1
 autovacuum_analyze_threshold          | 50
 autovacuum_freeze_max_age             | 200000000
 autovacuum_max_workers                | 3
 autovacuum_multixact_freeze_max_age   | 400000000
 autovacuum_naptime                    | 60
 autovacuum_vacuum_cost_delay          | 2
 autovacuum_vacuum_cost_limit          | -1
 autovacuum_vacuum_insert_scale_factor | 0.2
 autovacuum_vacuum_insert_threshold    | 1000
 autovacuum_vacuum_scale_factor        | 0.2
 autovacuum_vacuum_threshold           | 50
 autovacuum_work_mem                   | -1
(14 rows)

postgres=#
Enter fullscreen mode Exit fullscreen mode

We can use PostgreSQL documentation to get more detailed information about the parameters

https://www.postgresql.org/docs/current/runtime-config-autovacuum.html

For example:

autovacuum_vacuum_threshold (integer)
Specifies the minimum number of updated or deleted tuples needed to trigger a VACUUM in any one table. The default is 50 tuples. This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.

autovacuum_max_workers (integer)
Specifies the maximum number of autovacuum processes (other than the autovacuum launcher) that may be running at any one time. The default is three. This parameter can only be set at server start.

autovacuum_vacuum_scale_factor (floating point)
Specifies a fraction of the table size to add to autovacuum_vacuum_threshold when deciding whether to trigger a VACUUM. The default is 0.2 (20% of table size). This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.
...

To retrieve just 4 rows Postgres engine should bring/iterate 1012 blocks.

postgres=# select * from some_table;
 a |   b   |             c
---+-------+----------------------------
 2 | Two   | 2023-01-19 07:55:41.581431
 3 | Three | 2023-01-19 07:57:17.164661
 4 | Four  | 2023-01-19 07:57:36.428461
 1 | One   | 2023-01-19 07:51:07.46366
(4 rows)

postgres=# select ctid, * from some_table;
   ctid    | a |   b   |             c
-----------+---+-------+----------------------------
 (0,1)     | 2 | Two   | 2023-01-19 07:55:41.581431
 (0,2)     | 3 | Three | 2023-01-19 07:57:17.164661
 (0,3)     | 4 | Four  | 2023-01-19 07:57:36.428461
 (1011,83) | 1 | One   | 2023-01-19 07:51:07.46366
(4 rows)

postgres=# explain (analyze, buffers) select * from some_table;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Seq Scan on some_table  (cost=0.00..1012.04 rows=4 width=20) (actual time=0.013..0.652 rows=4 loops=1)
   Buffers: shared hit=1012
 Planning Time: 0.050 ms
 Execution Time: 0.670 ms
(4 rows)

postgres=#
Enter fullscreen mode Exit fullscreen mode

Vacuum (regular vacuum) is not rebuilding the table.

postgres=# vacuum analyze some_table;
VACUUM
postgres=# select ctid, * from some_table;
   ctid    | a |   b   |             c
-----------+---+-------+----------------------------
 (0,1)     | 2 | Two   | 2023-01-19 07:55:41.581431
 (0,2)     | 3 | Three | 2023-01-19 07:57:17.164661
 (0,3)     | 4 | Four  | 2023-01-19 07:57:36.428461
 (1011,83) | 1 | One   | 2023-01-19 07:51:07.46366
(4 rows)
...
postgres=# select ctid, * from some_table;
   ctid    | a |   b   |             c
-----------+---+-------+----------------------------
 (0,1)     | 2 | Two   | 2023-01-19 07:55:41.581431
 (0,2)     | 3 | Three | 2023-01-19 07:57:17.164661
 (0,3)     | 4 | Four  | 2023-01-19 07:57:36.428461
 (0,4)     | 5 | Five  | 2023-01-19 07:58:31.123455
 (1011,83) | 1 | One   | 2023-01-19 07:51:07.46366
(5 rows)
Enter fullscreen mode Exit fullscreen mode

Vacuum FULL can fix the situation.

postgres=# vacuum full some_table;
VACUUM
postgres=# select ctid, * from some_table;
 ctid  | a |   b   |             c
-------+---+-------+----------------------------
 (0,1) | 2 | Two   | 2023-01-19 07:55:41.581431
 (0,2) | 3 | Three | 2023-01-19 07:57:17.164661
 (0,3) | 4 | Four  | 2023-01-19 07:57:36.428461
 (0,4) | 1 | One   | 2023-01-19 07:51:07.46366
(4 rows)

postgres=# analyze some_table;
ANALYZE
postgres=# explain (analyze, buffers) select * from some_table;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Seq Scan on some_table  (cost=0.00..1.04 rows=4 width=20) (actual time=0.009..0.011 rows=4 loops=1)
   Buffers: shared hit=1
 Planning:
   Buffers: shared hit=9
 Planning Time: 0.120 ms
 Execution Time: 0.026 ms
(6 rows)

postgres=#
Enter fullscreen mode Exit fullscreen mode

Conclusion.

In this blog I've demonstrated how data rows are placed in PostgreSQL database tables. Also, I've examine the impact of the rows allocation on explain plan of SQL queries.

Latest comments (0)