Debugging Postgres can be daunting at first, due to the the size and complexity of the project. Luckily, there are tools that can help make the process bearable. One such tool is the debugger: A debugger or debugging tool is a computer program used to test and debug other programs (the "target" program) [Wikipedia].
This post is about using a debugger to achieve two things:
- Exploring the Postgres codebase
- Tracking bugs in Postgres
First, you'll need to setup Postgres as described here.
Exploring the Postgres codebase
When working on a large project such as Postgres, you might want to follow particular code paths in order to understand how some functionality is implemented from start to end. For example, I want to follow the function calls for insertions in the database -- Which parts of the code are called from the point of issuing am INSERT query to when I get back the query results.
Debuggers come in handy for this particular use case as compared to looking through the codebase in your editor on your own, trying to figure out which function called another.
I will be using the GNU Debugger(GDB) for this post. You can use any debugger you are most comfortable with.
Let's get started by creating a dummy database and table to insert data into
postgres=# -- Creating test database
postgres=# CREATE DATABASE testdb;
CREATE DATABASE
postgres=# -- Connect to testdb
postgres=# \c testdb
You are now connected to database "testdb" as user "postgres".
testdb=# -- Create test table for inserting to
testdb=# CREATE TABLE test_tbl(id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE
testdb=#
Next, we need to get the process id of our client process with this command
ps -faux | grep postgres
This should display something that looks like the following
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 11 0.0 0.3 161752 18028 ? Ss 06:50 0:00 /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data
postgres 359 0.0 0.4 162000 20628 ? Ss 11:32 0:00 \_ postgres: checkpointer
postgres 360 0.0 0.0 161904 4368 ? Ss 11:32 0:00 \_ postgres: background writer
postgres 361 0.0 0.1 161752 7600 ? Ss 11:32 0:00 \_ postgres: walwriter
postgres 362 0.0 0.1 163384 6896 ? Ss 11:32 0:00 \_ postgres: autovacuum launcher
postgres 363 0.0 0.1 163332 5360 ? Ss 11:32 0:00 \_ postgres: logical replication launcher
postgres 373 0.0 0.3 164484 17728 ? Ss 11:49 0:00 \_ postgres: postgres testdb [local] idle
From the above, the Postgres process we are interested in is the last one postgres testdb [local] idle
with the Process ID(PID) being 373.
The other processes shown are simply Postgres backends tasked with performing various background functions e.g checkpointing data, writing WAL data from buffers to disk, vacuuming bloated tables etc -- We are not interested in these for this post.
I can now start to trace an insert statement. I will have 2 terminals open. One for running the client(psql) and another for hooking the debugger to the client process.
In the terminal for running the debugger, I will run the following command gdb <command>
gdb /usr/local/pgsql/bin/postgres
This should launch gdb using the Postgres executable. This will open up the gdb prompt on to the screen.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/pgsql/bin/postgres...
(gdb)
Attach to the PID we got earlier -- 373.
(gdb) attach 373
In the terminal running the client(psql), I will run the INSERT command:
testdb=# -- Inserting an apple :)
testdb=# INSERT INTO test_tbl(name) VALUES ('apples');
The client will then freeze as the debugger takes control.
I will switch to the debugger terminal and begin stepping through the code. First I need to determine which function to break/stop at. For this example I will use heap_insert
function as the breaking point since I am doing an INSERT: so I know for sure I will need to insert data into the heap(Think of heap as disk space where table data is stored in Postgres.)
(gdb) b heap_insert
Breakpoint 1 at 0x56040e35e7b1: file heapam.c, line 1819.
(gdb) c
Continuing.
Breakpoint 1, heap_insert (relation=0x7f1a6e5ca360, tup=0x560410b68e10, cid=0, options=0, bistate=0x0) at heapam.c:1819
1819 {
(gdb) bt
#0 heap_insert (relation=0x7f1a6e5ca360, tup=0x560410b68e10, cid=0, options=0, bistate=0x0) at heapam.c:1819
#1 0x000056040e36d1b3 in heapam_tuple_insert (relation=0x7f1a6e5ca360, slot=0x560410b68d08, cid=0, options=0, bistate=0x0) at heapam_handler.c:252
#2 0x000056040e5feb8c in table_tuple_insert (rel=0x7f1a6e5ca360, slot=0x560410b68d08, cid=0, options=0, bistate=0x0) at ../../../src/include/access/tableam.h:1400
#3 0x000056040e60090d in ExecInsert (context=0x7ffd763931e0, resultRelInfo=0x560410b680b8, slot=0x560410b68d08, canSetTag=true, inserted_tuple=0x0, insert_destrel=0x0) at nodeModifyTable.c:1133
#4 0x000056040e604acd in ExecModifyTable (pstate=0x560410b67eb0) at nodeModifyTable.c:3790
#5 0x000056040e5c8a49 in ExecProcNodeFirst (node=0x560410b67eb0) at execProcnode.c:464
#6 0x000056040e5bcd58 in ExecProcNode (node=0x560410b67eb0) at ../../../src/include/executor/executor.h:273
#7 0x000056040e5bf68f in ExecutePlan (estate=0x560410b67c68, planstate=0x560410b67eb0, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=false, numberTuples=0, direction=ForwardScanDirection,
dest=0x560410b64ee8, execute_once=true) at execMain.c:1670
#8 0x000056040e5bd2ec in standard_ExecutorRun (queryDesc=0x560410b5a038, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:365
#9 0x000056040e5bd173 in ExecutorRun (queryDesc=0x560410b5a038, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:309
#10 0x000056040e844858 in ProcessQuery (plan=0x560410b64d98, sourceText=0x560410a67bb8 "INSERT INTO test_tbl(name) VALUES ('apples');", params=0x0, queryEnv=0x0, dest=0x560410b64ee8, qc=0x7ffd76393640)
at pquery.c:160
#11 0x000056040e8461f5 in PortalRunMulti (portal=0x560410ae4fe8, isTopLevel=true, setHoldSnapshot=false, dest=0x560410b64ee8, altdest=0x560410b64ee8, qc=0x7ffd76393640) at pquery.c:1277
#12 0x000056040e84578b in PortalRun (portal=0x560410ae4fe8, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x560410b64ee8, altdest=0x560410b64ee8, qc=0x7ffd76393640) at pquery.c:791
#13 0x000056040e83eaa8 in exec_simple_query (query_string=0x560410a67bb8 "INSERT INTO test_tbl(name) VALUES ('apples');") at postgres.c:1272
#14 0x000056040e8437d0 in PostgresMain (dbname=0x560410a9ee80 "testdb", username=0x560410a9ee68 "postgres") at postgres.c:4652
#15 0x000056040e77bf0b in BackendRun (port=0x560410a911e0) at postmaster.c:4439
#16 0x000056040e77b797 in BackendStartup (port=0x560410a911e0) at postmaster.c:4167
#17 0x000056040e777eb3 in ServerLoop () at postmaster.c:1781
#18 0x000056040e777767 in PostmasterMain (argc=3, argv=0x560410a635b0) at postmaster.c:1465
#19 0x000056040e63bb69 in main (argc=3, argv=0x560410a635b0) at main.c:198
(gdb)
Set a breakpoint with b heap_insert
. Next, allow execution to continue until our breakpoint is triggered, with c
. Finally, we want to know how execution got to this point(the break-point) with bt
for backtrace.
Execution starts from the main
function through the executor functions(Exec*) to the table and heap functions. The source files and line numbers for all call sites are also shown.
As you can see, this is a powerful tool to help find your way through complex codebases very quickly.
Tracking bugs in Postgres
When a client process(e.g psql) encounters a problem it may crash or issue a warning, even though this depends on the severity of the problem.
I will try to investigate a Segmentation fault(SIGSEGV) error.
I will be using the PG_WAL_EXT extension for this example. In the pg_xlog_records
function, I will comment out the allocation for the reader object defined in XLogReaderState *xlog_reader
. This will throw a segmentation error when the object(xlog_reader) is touched since there will be no memory allocated to it
// xlog_reader = XLogReaderAllocate(wal_segment_size, directory,
// XL_ROUTINE(.page_read = &read_xlog_page,
// .segment_open = wal_segment_open,
// .segment_close = wal_segment_close),
// &private);
// if (xlog_reader == NULL)
// ereport(ERROR,
// (errcode(ERRCODE_OUT_OF_MEMORY),
// errmsg("out of memory"),
// errdetail("Failed while allocating a WAL reading processor.")));
This should trigger a segfault when the query is ran. The client connection will be restored and a crash message shown as below:
postgres=# select * from pg_xlog_records('/usr/local/pgsql/data/pg_wal/000000010000000000000001');
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=#
To debug this I will open up the debugger again with the command format gdb <command> -c <core-file>
as below
gdb /usr/local/pgsql/bin/postgres -c /tmp/core.postgres.1728338434.415
/usr/local/pgsql/bin/postgres
is the path to the Postgres executable and /tmp/core.postgres.1728338434.415
being the path to the core dumped file.
The output from the command above looks something like this
The first frame(highlighted) shows the file(pg_wal.c) and exact line(line 194) with the error.
This is an trivial example meant to show how to go about debugging Postgres. Other issues might not be as straight forward to debug and may require looking at multiple frames and following deep call stacks to determine the root cause of an issue with the Postgres server.
Top comments (0)