Brian Misachi

Posted on Oct 8

Debugging Postgres

#postgres #database

Debugging Postgres can be daunting at first, due to the the size and complexity of the project. Luckily, there are tools that can help make the process bearable. One such tool is the debugger: A debugger or debugging tool is a computer program used to test and debug other programs (the "target" program) [Wikipedia].

This post is about using a debugger to achieve two things:

Exploring the Postgres codebase
Tracking bugs in Postgres

First, you'll need to setup Postgres as described here.

Exploring the Postgres codebase

When working on a large project such as Postgres, you might want to follow particular code paths in order to understand how some functionality is implemented from start to end. For example, I want to follow the function calls for insertions in the database -- Which parts of the code are called from the point of issuing am INSERT query to when I get back the query results.
Debuggers come in handy for this particular use case as compared to looking through the codebase in your editor on your own, trying to figure out which function called another.

I will be using the GNU Debugger(GDB) for this post. You can use any debugger you are most comfortable with.

Let's get started by creating a dummy database and table to insert data into

postgres=# -- Creating test database
postgres=# CREATE DATABASE testdb;
CREATE DATABASE
postgres=# -- Connect to testdb
postgres=# \c testdb
You are now connected to database "testdb" as user "postgres".
testdb=# -- Create test table for inserting to
testdb=# CREATE TABLE test_tbl(id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE
testdb=#

Next, we need to get the process id of our client process with this command

ps -faux | grep postgres

This should display something that looks like the following

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres      11  0.0  0.3 161752 18028 ?        Ss   06:50   0:00 /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data
postgres     359  0.0  0.4 162000 20628 ?        Ss   11:32   0:00  \_ postgres: checkpointer
postgres     360  0.0  0.0 161904  4368 ?        Ss   11:32   0:00  \_ postgres: background writer
postgres     361  0.0  0.1 161752  7600 ?        Ss   11:32   0:00  \_ postgres: walwriter
postgres     362  0.0  0.1 163384  6896 ?        Ss   11:32   0:00  \_ postgres: autovacuum launcher
postgres     363  0.0  0.1 163332  5360 ?        Ss   11:32   0:00  \_ postgres: logical replication launcher
postgres     373  0.0  0.3 164484 17728 ?        Ss   11:49   0:00  \_ postgres: postgres testdb [local] idle

From the above, the Postgres process we are interested in is the last one postgres testdb [local] idle with the Process ID(PID) being 373.
The other processes shown are simply Postgres backends tasked with performing various background functions e.g checkpointing data, writing WAL data from buffers to disk, vacuuming bloated tables etc -- We are not interested in these for this post.

I can now start to trace an insert statement. I will have 2 terminals open. One for running the client(psql) and another for hooking the debugger to the client process.
In the terminal for running the debugger, I will run the following command gdb <command>

gdb /usr/local/pgsql/bin/postgres

This should launch gdb using the Postgres executable. This will open up the gdb prompt on to the screen.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/pgsql/bin/postgres...
(gdb)

Attach to the PID we got earlier -- 373.

(gdb) attach 373

In the terminal running the client(psql), I will run the INSERT command:

testdb=# -- Inserting an apple :)
testdb=# INSERT INTO test_tbl(name) VALUES ('apples');

The client will then freeze as the debugger takes control.

I will switch to the debugger terminal and begin stepping through the code. First I need to determine which function to break/stop at. For this example I will use heap_insert function as the breaking point since I am doing an INSERT: so I know for sure I will need to insert data into the heap(Think of heap as disk space where table data is stored in Postgres.)

(gdb) b heap_insert
Breakpoint 1 at 0x56040e35e7b1: file heapam.c, line 1819.
(gdb) c
Continuing.

Breakpoint 1, heap_insert (relation=0x7f1a6e5ca360, tup=0x560410b68e10, cid=0, options=0, bistate=0x0) at heapam.c:1819
1819    {
(gdb) bt
#0  heap_insert (relation=0x7f1a6e5ca360, tup=0x560410b68e10, cid=0, options=0, bistate=0x0) at heapam.c:1819
#1  0x000056040e36d1b3 in heapam_tuple_insert (relation=0x7f1a6e5ca360, slot=0x560410b68d08, cid=0, options=0, bistate=0x0) at heapam_handler.c:252
#2  0x000056040e5feb8c in table_tuple_insert (rel=0x7f1a6e5ca360, slot=0x560410b68d08, cid=0, options=0, bistate=0x0) at ../../../src/include/access/tableam.h:1400
#3  0x000056040e60090d in ExecInsert (context=0x7ffd763931e0, resultRelInfo=0x560410b680b8, slot=0x560410b68d08, canSetTag=true, inserted_tuple=0x0, insert_destrel=0x0) at nodeModifyTable.c:1133
#4  0x000056040e604acd in ExecModifyTable (pstate=0x560410b67eb0) at nodeModifyTable.c:3790
#5  0x000056040e5c8a49 in ExecProcNodeFirst (node=0x560410b67eb0) at execProcnode.c:464
#6  0x000056040e5bcd58 in ExecProcNode (node=0x560410b67eb0) at ../../../src/include/executor/executor.h:273
#7  0x000056040e5bf68f in ExecutePlan (estate=0x560410b67c68, planstate=0x560410b67eb0, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=false, numberTuples=0, direction=ForwardScanDirection,
    dest=0x560410b64ee8, execute_once=true) at execMain.c:1670
#8  0x000056040e5bd2ec in standard_ExecutorRun (queryDesc=0x560410b5a038, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:365
#9  0x000056040e5bd173 in ExecutorRun (queryDesc=0x560410b5a038, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:309
#10 0x000056040e844858 in ProcessQuery (plan=0x560410b64d98, sourceText=0x560410a67bb8 "INSERT INTO test_tbl(name) VALUES ('apples');", params=0x0, queryEnv=0x0, dest=0x560410b64ee8, qc=0x7ffd76393640)
    at pquery.c:160
#11 0x000056040e8461f5 in PortalRunMulti (portal=0x560410ae4fe8, isTopLevel=true, setHoldSnapshot=false, dest=0x560410b64ee8, altdest=0x560410b64ee8, qc=0x7ffd76393640) at pquery.c:1277
#12 0x000056040e84578b in PortalRun (portal=0x560410ae4fe8, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x560410b64ee8, altdest=0x560410b64ee8, qc=0x7ffd76393640) at pquery.c:791
#13 0x000056040e83eaa8 in exec_simple_query (query_string=0x560410a67bb8 "INSERT INTO test_tbl(name) VALUES ('apples');") at postgres.c:1272
#14 0x000056040e8437d0 in PostgresMain (dbname=0x560410a9ee80 "testdb", username=0x560410a9ee68 "postgres") at postgres.c:4652
#15 0x000056040e77bf0b in BackendRun (port=0x560410a911e0) at postmaster.c:4439
#16 0x000056040e77b797 in BackendStartup (port=0x560410a911e0) at postmaster.c:4167
#17 0x000056040e777eb3 in ServerLoop () at postmaster.c:1781
#18 0x000056040e777767 in PostmasterMain (argc=3, argv=0x560410a635b0) at postmaster.c:1465
#19 0x000056040e63bb69 in main (argc=3, argv=0x560410a635b0) at main.c:198
(gdb)

Set a breakpoint with b heap_insert. Next, allow execution to continue until our breakpoint is triggered, with c. Finally, we want to know how execution got to this point(the break-point) with bt for backtrace.

Execution starts from the main function through the executor functions(Exec*) to the table and heap functions. The source files and line numbers for all call sites are also shown.
As you can see, this is a powerful tool to help find your way through complex codebases very quickly.

Tracking bugs in Postgres

When a client process(e.g psql) encounters a problem it may crash or issue a warning, even though this depends on the severity of the problem.
I will try to investigate a Segmentation fault(SIGSEGV) error.

I will be using the PG_WAL_EXT extension for this example. In the pg_xlog_records function, I will comment out the allocation for the reader object defined in XLogReaderState *xlog_reader. This will throw a segmentation error when the object(xlog_reader) is touched since there will be no memory allocated to it

// xlog_reader = XLogReaderAllocate(wal_segment_size, directory,
    //                                  XL_ROUTINE(.page_read = &read_xlog_page,
    //                                             .segment_open = wal_segment_open,
    //                                             .segment_close = wal_segment_close),
    //                                  &private);

    // if (xlog_reader == NULL)
    //     ereport(ERROR,
    //             (errcode(ERRCODE_OUT_OF_MEMORY),
    //              errmsg("out of memory"),
    //              errdetail("Failed while allocating a WAL reading processor.")));

This should trigger a segfault when the query is ran. The client connection will be restored and a crash message shown as below:

postgres=# select * from pg_xlog_records('/usr/local/pgsql/data/pg_wal/000000010000000000000001');
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=#

To debug this I will open up the debugger again with the command format gdb <command> -c <core-file> as below

gdb /usr/local/pgsql/bin/postgres -c /tmp/core.postgres.1728338434.415

/usr/local/pgsql/bin/postgres is the path to the Postgres executable and /tmp/core.postgres.1728338434.415 being the path to the core dumped file.

The output from the command above looks something like this

The first frame(highlighted) shows the file(pg_wal.c) and exact line(line 194) with the error.

This is an trivial example meant to show how to go about debugging Postgres. Other issues might not be as straight forward to debug and may require looking at multiple frames and following deep call stacks to determine the root cause of an issue with the Postgres server.

DEV Community

Debugging Postgres

Exploring the Postgres codebase

Tracking bugs in Postgres

Top comments (0)

Read next

Adding new columns - lowCalAlt_update5

Unidirectional associations for one-to-many

Aurora Limitless - Collocation (and porting PgBench)

Aurora Limitless - Resharding