Alex Dzyoba

Originally published at alex.dzyoba.com

How to point GDB to your sources

So, you have a binary that you or someone else developed and, surprise, it has a bug. Or you are just curious how it works. A great tool for both cases is a debugger.

You seldom want to debug at the assembly level; usually, you want to see the sources. But you often debug the program on a host other than the build host and see this really frustrating message:

$ gdb -q python3.7
Reading symbols from python3.7...done.
(gdb) l
6 ./Programs/python.c: No such file or directory.

Ouch. Everybody has been here. I’ve seen this so often, and it is so vital for sensible debugging, that I think it’s important to get into the details and understand how GDB shows source code in a debugging session.

Debug info

It all starts with debug info - special sections in the binary file produced by the compiler and used by the debugger and other handy tools.

In GCC there is the well-known -g flag for that. Most projects with some kind of build system either build with debug info by default or have a flag for it.

In the case of CPython, you have to do the following:

$ ./configure --with-pydebug
$ make -j

--with-pydebug will add -g to the GCC invocation.

This -g option generates debug sections - binary sections that are inserted into the program’s binary. These sections are usually in DWARF format. For ELF binaries these debug sections have names like .debug_*, e.g. .debug_info or .debug_loc. These debug sections are what makes the magic of debugging possible - basically, they map assembly-level instructions to the source code.

To find out whether your program has debug symbols, you can list the sections of the binary with objdump:

$ objdump -h ./python

python: file format elf64-x86-64

Sections:
Idx Name Size VMA LMA File off Algn
  0 .interp 0000001c 0000000000400238 0000000000400238 00000238 2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .note.ABI-tag 00000020 0000000000400254 0000000000400254 00000254 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
...
 25 .bss 00031f70 00000000008d9e00 00000000008d9e00 002d9dfe 2**5
                  ALLOC
 26 .comment 00000058 0000000000000000 0000000000000000 002d9dfe 2**0
                  CONTENTS, READONLY
 27 .debug_aranges 000017f0 0000000000000000 0000000000000000 002d9e56 2**0
                  CONTENTS, READONLY, DEBUGGING
 28 .debug_info 00377bac 0000000000000000 0000000000000000 002db646 2**0
                  CONTENTS, READONLY, DEBUGGING
 29 .debug_abbrev 0001fcd7 0000000000000000 0000000000000000 006531f2 2**0
                  CONTENTS, READONLY, DEBUGGING
 30 .debug_line 0008b441 0000000000000000 0000000000000000 00672ec9 2**0
                  CONTENTS, READONLY, DEBUGGING
 31 .debug_str 00031f18 0000000000000000 0000000000000000 006fe30a 2**0
                  CONTENTS, READONLY, DEBUGGING
 32 .debug_loc 0034190c 0000000000000000 0000000000000000 00730222 2**0
                  CONTENTS, READONLY, DEBUGGING
 33 .debug_ranges 00062e10 0000000000000000 0000000000000000 00a71b2e 2**0
                  CONTENTS, READONLY, DEBUGGING

or readelf:

$ readelf -S ./python
There are 38 section headers, starting at offset 0xb41840:

Section Headers:
  [Nr] Name Type Address Offset
       Size EntSize Flags Link Info Align
  [0] NULL 0000000000000000 00000000
       0000000000000000 0000000000000000 0 0 0
  [1] .interp PROGBITS 0000000000400238 00000238
       000000000000001c 0000000000000000 A 0 0 1

...

  [26] .bss NOBITS 00000000008d9e00 002d9dfe
       0000000000031f70 0000000000000000 WA 0 0 32
  [27] .comment PROGBITS 0000000000000000 002d9dfe
       0000000000000058 0000000000000001 MS 0 0 1
  [28] .debug_aranges PROGBITS 0000000000000000 002d9e56
       00000000000017f0 0000000000000000 0 0 1
  [29] .debug_info PROGBITS 0000000000000000 002db646
       0000000000377bac 0000000000000000 0 0 1
  [30] .debug_abbrev PROGBITS 0000000000000000 006531f2
       000000000001fcd7 0000000000000000 0 0 1
  [31] .debug_line PROGBITS 0000000000000000 00672ec9
       000000000008b441 0000000000000000 0 0 1
  [32] .debug_str PROGBITS 0000000000000000 006fe30a
       0000000000031f18 0000000000000001 MS 0 0 1
  [33] .debug_loc PROGBITS 0000000000000000 00730222
       000000000034190c 0000000000000000 0 0 1
  [34] .debug_ranges PROGBITS 0000000000000000 00a71b2e
       0000000000062e10 0000000000000000 0 0 1
  [35] .shstrtab STRTAB 0000000000000000 00b416d5
       0000000000000165 0000000000000000 0 0 1
  [36] .symtab SYMTAB 0000000000000000 00ad4940
       000000000003f978 0000000000000018 37 8762 8
  [37] .strtab STRTAB 0000000000000000 00b142b8
       000000000002d41d 0000000000000000 0 0 1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

As we can see, our freshly compiled Python has .debug_* sections, hence it has debug info.
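
The same check can be scripted. Here is a minimal sketch that scans the text output of objdump -h or readelf -S for .debug_* names. The function name debug_section_names and the shortened sample listing are illustrative assumptions, not part of any tool.

```python
import re


def debug_section_names(section_listing):
    """Return the sorted, de-duplicated .debug_* names found in the listing text."""
    return sorted(set(re.findall(r"\.debug_\w+", section_listing)))


# Shortened sample in the shape of `readelf -S` output.
SAMPLE = """\
  [27] .comment PROGBITS 0000000000000000 002d9dfe
  [28] .debug_aranges PROGBITS 0000000000000000 002d9e56
  [29] .debug_info PROGBITS 0000000000000000 002db646
  [31] .debug_line PROGBITS 0000000000000000 00672ec9
"""

names = debug_section_names(SAMPLE)
print(names)
print("has debug info" if names else "no debug info")
```

In practice you would pass it the captured output of the real command instead of the sample string.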

Debug info is a collection of DIEs - Debugging Information Entries. Each DIE has a tag specifying what kind of DIE it is and attributes that describe this DIE - things like variable name and line number.
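
To make the shape of a DIE more concrete, here is a rough model of it as a data structure. This is only a sketch: the DIE class and find_by_tag helper are hypothetical names, real DWARF is a compact binary encoding, and the variable name and line number below are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DIE:
    """A simplified Debugging Information Entry: a tag, attributes, and child DIEs."""
    tag: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def find_by_tag(die, tag):
    """Yield the DIE itself and all of its descendants that carry the given tag."""
    if die.tag == tag:
        yield die
    for child in die.children:
        yield from find_by_tag(child, tag)


# Illustrative values only; example_var and its line number are invented.
unit = DIE(
    tag="DW_TAG_compile_unit",
    attributes={"DW_AT_name": "./Programs/python.c"},
    children=[
        DIE("DW_TAG_variable", {"DW_AT_name": "example_var", "DW_AT_decl_line": 3}),
    ],
)

for die in find_by_tag(unit, "DW_TAG_variable"):
    print(die.attributes["DW_AT_name"], die.attributes["DW_AT_decl_line"])
```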

How GDB finds source code

To find the sources GDB parses the .debug_info section to find all DIEs with the tag DW_TAG_compile_unit. A DIE with this tag has 2 main attributes: DW_AT_comp_dir (compilation directory) and DW_AT_name (path to the source file). Combined they provide the full path to the source file for the particular compilation unit (object file).

To parse debug info you can again use objdump:

$ objdump -g ./python | vim -

and there you can see the parsed debug info:

Contents of the .debug_info section:

  Compilation Unit @ offset 0x0:
   Length: 0x222d (32-bit)
   Version: 4
   Abbrev Offset: 0x0
   Pointer Size: 8
 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <c> DW_AT_producer : (indirect string, offset: 0xb6b): GNU C99 6.3.1 20161221 (Red Hat 6.3.1-1) -mtune=generic -march=x86-64 -g -Og -std=c99
    <10> DW_AT_language : 12 (ANSI C99)
    <11> DW_AT_name : (indirect string, offset: 0x10ec): ./Programs/python.c
    <15> DW_AT_comp_dir : (indirect string, offset: 0x7a): /home/avd/dev/cpython
    <19> DW_AT_low_pc : 0x41d2f6
    <21> DW_AT_high_pc : 0x1b3
    <29> DW_AT_stmt_list : 0x0

It reads like this - for the address range from DW_AT_low_pc = 0x41d2f6 to DW_AT_low_pc + DW_AT_high_pc = 0x41d2f6 + 0x1b3 = 0x41d4a9 the source code file is ./Programs/python.c located in /home/avd/dev/cpython. Pretty straightforward.
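
The address arithmetic is easy to check in a few lines. This sketch assumes, as the sum above does, that DW_AT_high_pc is stored as an offset from DW_AT_low_pc, and it treats the resulting address as the first one past the unit (a half-open range). The helper names are hypothetical.

```python
# Values copied from the DW_TAG_compile_unit dump above.
LOW_PC = 0x41D2F6
HIGH_PC_OFFSET = 0x1B3  # assumed to be an offset from LOW_PC, not an absolute address


def unit_end(low_pc, high_pc_offset):
    """Return the first address past the compilation unit's code."""
    return low_pc + high_pc_offset


def unit_covers(address, low_pc, high_pc_offset):
    """True if the address falls inside the half-open range [low_pc, end)."""
    return low_pc <= address < unit_end(low_pc, high_pc_offset)


print(hex(unit_end(LOW_PC, HIGH_PC_OFFSET)))          # 0x41d4a9
print(unit_covers(0x41D300, LOW_PC, HIGH_PC_OFFSET))  # True
```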

So this is what happens when GDB tries to show you the source code:

  • parses .debug_info to find the DW_AT_comp_dir and DW_AT_name attributes for the current object file (range of addresses)
  • opens the file at DW_AT_comp_dir/DW_AT_name
  • shows the content of the file to you
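
The path-building step above can be sketched like this. It is a simplification under stated assumptions: the function name is hypothetical, and it only models the join of the two attributes, not GDB's full lookup logic.

```python
import posixpath


def compile_unit_source_path(comp_dir, name):
    """Combine DW_AT_comp_dir and DW_AT_name into one normalized path.

    posixpath.join returns `name` unchanged when it is already absolute,
    so an absolute DW_AT_name ignores the compilation directory.
    """
    return posixpath.normpath(posixpath.join(comp_dir, name))


# Attribute values taken from the dump shown earlier.
print(compile_unit_source_path("/home/avd/dev/cpython", "./Programs/python.c"))
# /home/avd/dev/cpython/Programs/python.c
```

If that path does not exist on the machine where you debug, you get the "No such file or directory" message.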

How to tell GDB where the sources are

So to fix our "./Programs/python.c: No such file or directory." problem we have to get the sources onto the target host (copy or git clone) and do one of the following:

1. Reconstruct the sources path

You can reconstruct the sources path on the target host, so GDB will find the source file where it expects. Stupid but it will work.

In my case, I can just do git clone https://github.com/python/cpython.git /home/avd/dev/cpython and check out the needed commit-ish.

2. Change GDB source path

You can direct GDB to the new source path right in the debug session with the directory <dir> command:

(gdb) list
6 ./Programs/python.c: No such file or directory.
(gdb) directory /usr/src/python
Source directories searched: /usr/src/python:$cdir:$cwd
(gdb) list
6 #ifdef __FreeBSD__
7 #include <fenv.h>
8 #endif
9   
10 #ifdef MS_WINDOWS
11 int
12 wmain(int argc, wchar_t **argv)
13 {
14 return Py_Main(argc, argv);
15 }

3. Set GDB substitution rule

Sometimes adding another source path is not enough if you have a complex hierarchy. In this case, you can add a substitution rule for the source path with the set substitute-path GDB command.

(gdb) list
6 ./Programs/python.c: No such file or directory.
(gdb) set substitute-path /home/avd/dev/cpython /usr/src/python
(gdb) list
6 #ifdef __FreeBSD__
7 #include <fenv.h>
8 #endif
9   
10 #ifdef MS_WINDOWS
11 int
12 wmain(int argc, wchar_t **argv)
13 {
14 return Py_Main(argc, argv);
15 }
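
Conceptually, a substitution rule is a prefix rewrite that only matches on directory boundaries, and the first matching rule wins. Here is a minimal sketch of that idea. The substitute_path function is a hypothetical model, not GDB's implementation, and for simplicity it is applied to the already joined path.

```python
def substitute_path(path, rules):
    """Apply the first matching (old_prefix, new_prefix) rule to path."""
    for old, new in rules:
        old = old.rstrip("/")
        # Match only on a directory boundary, so /usr/source
        # does not accidentally match /usr/sourceware.
        if path == old or path.startswith(old + "/"):
            return new.rstrip("/") + path[len(old):]
    return path  # no rule matched, keep the recorded path


rules = [("/home/avd/dev/cpython", "/usr/src/python")]
recorded = "/home/avd/dev/cpython/Programs/python.c"

print(substitute_path(recorded, rules))
# /usr/src/python/Programs/python.c
```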

4. Move binary to sources

You can trick GDB’s source lookup by moving the binary to the directory with the sources.

$ mv python /home/user/sources/cpython

This works because GDB looks for sources in its current working directory ($cwd) as a last resort, so you have to start GDB from that directory.
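
The search order can be modelled roughly as follows. This is a simplified sketch: the function names are hypothetical, $cdir and $cwd are expanded by hand, and GDB's real lookup tries more variants than the plain joins shown here.

```python
import os
import posixpath


def candidate_paths(source_path, name, comp_dir, cwd):
    """Expand a GDB-style source path into candidate file locations, in order."""
    special = {"$cdir": comp_dir, "$cwd": cwd}
    candidates = []
    for entry in source_path.split(":"):
        directory = special.get(entry, entry)
        candidates.append(posixpath.normpath(posixpath.join(directory, name)))
    return candidates


def first_existing(candidates, exists=os.path.exists):
    """Return the first candidate that exists, or None if none does."""
    return next((path for path in candidates if exists(path)), None)


# Default source path, with GDB started from the directory holding the sources.
for path in candidate_paths(
    "$cdir:$cwd",
    "./Programs/python.c",
    comp_dir="/home/avd/dev/cpython",
    cwd="/home/user/sources/cpython",
):
    print(path)
```

The $cwd entry comes last, which is why this trick only kicks in when the compilation directory does not exist on the debugging host.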

5. Compile with -fdebug-prefix-map

You can substitute the source path at build time with the -fdebug-prefix-map=old_path=new_path option. Here is how to do it within the CPython project:

$ make distclean # start clean
$ ./configure CFLAGS="-fdebug-prefix-map=$(pwd)=/usr/src/python" --with-pydebug
$ make -j

And now we have a new sources dir:

$ objdump -g ./python
...
 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <c> DW_AT_producer : (indirect string, offset: 0xb65): GNU C99 6.3.1 20161221 (Red Hat 6.3.1-1) -mtune=generic -march=x86-64 -g -Og -std=c99
    <10> DW_AT_language : 12 (ANSI C99)
    <11> DW_AT_name : (indirect string, offset: 0x10ff): ./Programs/python.c
    <15> DW_AT_comp_dir : (indirect string, offset: 0x558): /usr/src/python
    <19> DW_AT_low_pc : 0x41d336
    <21> DW_AT_high_pc : 0x1b3
    <29> DW_AT_stmt_list : 0x0
...

This is the most robust way to do it because you can set it to something like /usr/src/<project>, install sources there from a package and debug like a boss.
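
To confirm that the prefix map took effect in every compilation unit, you can collect the distinct DW_AT_comp_dir values from the objdump -g output. Here is a minimal sketch; the comp_dirs helper and its regular expression are assumptions based on the dump format shown above, and paths containing spaces would be cut short.

```python
import re

# Matches lines such as:
#   <15> DW_AT_comp_dir : (indirect string, offset: 0x558): /usr/src/python
COMP_DIR_RE = re.compile(
    r"DW_AT_comp_dir\s*:\s*(?:\(indirect string, offset: 0x[0-9a-f]+\):\s*)?(\S+)"
)


def comp_dirs(dump_text):
    """Return the sorted, distinct compilation directories found in the dump text."""
    return sorted(set(COMP_DIR_RE.findall(dump_text)))


# Sample lines copied from the dump above.
SAMPLE_DUMP = """\
    <11> DW_AT_name : (indirect string, offset: 0x10ff): ./Programs/python.c
    <15> DW_AT_comp_dir : (indirect string, offset: 0x558): /usr/src/python
"""

print(comp_dirs(SAMPLE_DUMP))
```

If the list contains anything other than the mapped directory, some object files were built without the flag.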

Conclusion

GDB uses debug info stored in DWARF format to find source-level info. DWARF is a pretty straightforward format - basically, it’s a tree of DIEs (Debugging Information Entries) that describe the object files of your program along with variables and functions.

There are multiple ways to help GDB find sources. The easiest ones are the directory and set substitute-path commands, though -fdebug-prefix-map is really useful.

Now that you have source-level info, go and explore something!

