DEV Community

Ahmed Farid
Undefined Reference: The Internals of Object Files and Linking

If you're a C/C++ developer, you've probably encountered this annoying linker error before:

undefined reference to 'symbol'

In this post, I explain exactly what this error means and why it occurs. We go into the details of object files and the linking process.

Note: This post assumes a Linux environment.

TL;DR

This error means one of the following:

  1. You declared a function and called it without providing its definition.
  2. You included a header file of a library, and you called a function declared in this header file, but you didn't link to the library itself.

C/C++ Compilation Steps

The compilation of a C/C++ program is carried out in these steps:

  1. Preprocessing: Takes the original C/C++ source file and produces an intermediate C/C++ source file.
  2. Compilation: Takes the intermediate C/C++ source file and produces an assembly code file.
  3. Assembly: Takes the assembly code file and produces an object file.
  4. Linking: Takes the object files and links them to produce the final executable file.
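To make the four steps concrete, here is a small C file that could be run through each stage by hand. The file name steps.c, the SQUARE macro and the square_area function are hypothetical, chosen only for illustration; the gcc options in the comments (-E, -S, -c) are the standard ones for stopping after each stage.

```c
/* steps.c: a hypothetical file for walking through the stages.
 *
 *   gcc -E steps.c -o steps.i   # 1. preprocessing: expands SQUARE
 *   gcc -S steps.i -o steps.s   # 2. compilation: emits assembly
 *   gcc -c steps.s -o steps.o   # 3. assembly: emits an object file
 *
 * Step 4 (linking) additionally needs a main function, which this
 * file deliberately does not provide.
 */
#define SQUARE(x) ((x) * (x))

/* After preprocessing, the body reads: return ((side) * (side)); */
int square_area(int side)
{
    return SQUARE(side);
}
```

Opening steps.i in a text editor shows that the macro is gone and only its expansion remains, which is all the later stages ever see.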

We'll examine a simple C program. We'll stop its compilation after step 2 "Compilation" to examine the output assembly file, and after step 3 "Assembly" to examine the output object file.

C/C++ to Assembly

I'll explain in the context of C. The same concepts apply to C++ as well.

Take a look at the following C program:

void defined_function() {}

void undefined_function();

void main()
{
    defined_function();
    undefined_function();
}

We define a function called defined_function with an empty body, we declare a function called undefined_function without defining it, and we call both functions from the main function.

Assume the program is in a file called main.c. We compile the program with gcc using the -S option to stop at the compilation step and examine the output assembly file:

gcc -S main.c

The output assembly file main.s will be something like this:

    .file   "main.c"
    .text
    .globl  defined_function
    .type   defined_function, @function
defined_function:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    nop
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   defined_function, .-defined_function
    .globl  main
    .type   main, @function
main:
.LFB1:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $0, %eax
    call    defined_function
    movl    $0, %eax
    call    undefined_function
    nop
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE1:
    .size   main, .-main
    .ident  "GCC: (GNU) 12.2.0"
    .section    .note.GNU-stack,"",@progbits

Now let's go for a quick assembly refresher before we continue. An assembly program contains a list of instructions that the CPU executes one by one. These instructions could be data movement instructions like mov, arithmetic instructions like add and sub, control transfer instructions like jmp and call, and many more.

An assembly program can also contain labels at the beginning of lines in the form label:. Writing a label at the beginning of a line like this defines it. A label is simply a name for that location in the program, which other instructions like jmp and call can reference. So, for example, jmp label means jump to the location named label.

Now let's examine the assembly program above. This program contains a lot of details that we don't need and we'll focus only on the important parts.

Notice the labels defined_function: and main:: these correspond to the defined functions in our C program. Also, notice the instructions call defined_function and call undefined_function: these correspond to the function calls in our C program. Notice also that there is no label defined for the undefined function undefined_function.

Assembly to Object File

We now compile the C file with gcc using the -c option to stop at the assembly step:

gcc -c main.c

Alternatively, we can assemble the assembly file using the GNU assembler:

as main.s -o main.o

Both approaches will produce the same output object file main.o.

Object File Format

You probably know that the assembly step transforms the assembly code to binary, but is this binary the only thing that is present in the object file? The answer is no.

The object file contains other metadata about the program, such as section information, a symbol table and relocation information. In Linux, the object file is in a format called ELF (Executable and Linkable Format). Other formats exist, such as the PE (Portable Executable) format used in Windows and an older format called a.out. In this post, we'll focus on the ELF format.

How can we view the contents of an ELF file? There is a utility in Linux called readelf that we can use. We're only interested in the symbol table now, so we use readelf on the object file and pass -s to it to view only the symbol table:

readelf -s main.o

The output is as follows:

Symbol table '.symtab' contains 6 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 .text
     3: 0000000000000000     7 FUNC    GLOBAL DEFAULT    1 defined_function
     4: 0000000000000007    27 FUNC    GLOBAL DEFAULT    1 main
     5: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND undefined_function

Let's analyze this output.

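Notice that there are symbols for defined_function and main, and their Ndx is a number, which means they are defined: the number is the index of the section that contains them (here 1, the .text section). The assembler creates a defined symbol in the symbol table for each label defined in the assembly code, apart from assembler-local labels such as .LFB0, whose names start with .L and which are not kept in the object file. Because there are lines beginning with defined_function: and main: in the assembly code, they are defined symbols in the symbol table.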

Notice also that there is a symbol for undefined_function, and its Ndx is UND which means undefined. The assembler creates an undefined symbol in the symbol table for each label referenced in instructions but not defined. Because undefined_function is referenced in an instruction (call undefined_function) and there is no line beginning with undefined_function: in the assembly code, it is an undefined symbol in the symbol table.
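The Ndx column is a rendering of the section index field stored in each symbol table entry. As a rough sketch, assuming a Linux system where glibc's <elf.h> header is available, the helpers below read that field from a 64-bit entry. The function names is_undefined and format_ndx are made up for this illustration and are not part of readelf, and only the three cases that appear in the output above are handled.

```c
#include <elf.h>    /* Elf64_Sym, SHN_UNDEF, SHN_ABS */
#include <stdio.h>  /* snprintf, size_t */

/* A symbol is undefined when its section index is SHN_UNDEF. */
int is_undefined(const Elf64_Sym *sym)
{
    return sym->st_shndx == SHN_UNDEF;
}

/* Render the section index the way the Ndx column above shows it.
 * Other reserved indices are simply printed as numbers in this sketch. */
void format_ndx(const Elf64_Sym *sym, char *buf, size_t len)
{
    if (sym->st_shndx == SHN_UNDEF)
        snprintf(buf, len, "UND");
    else if (sym->st_shndx == SHN_ABS)
        snprintf(buf, len, "ABS");
    else
        snprintf(buf, len, "%u", (unsigned)sym->st_shndx);
}
```

For the undefined_function entry, st_shndx holds SHN_UNDEF, so is_undefined returns a non-zero value; for defined_function and main it holds 1, the index of .text.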

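Also, notice that our three symbols defined_function, undefined_function and main have a Bind of GLOBAL, which means they are global symbols. This is important because the linker only matches references across files against global symbols (and weak ones, which we don't cover here); a LOCAL symbol is visible only inside the object file that defines it.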
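To see the difference between the two bindings, one can mark a function static. In the sketch below, the file name visibility.c and both function names are made up for illustration. When the file is compiled with gcc -c without optimization and inspected with readelf -s, counter_step should show a Bind of LOCAL and next_id a Bind of GLOBAL.

```c
/* visibility.c: hypothetical file contrasting LOCAL and GLOBAL symbols. */

/* static gives the function internal linkage, so other files
 * cannot reference it. */
static int counter_step(void)
{
    return 1;
}

/* No static: external linkage, so other object files may call next_id. */
int next_id(int current)
{
    return current + counter_step();
}
```

A call to counter_step from a different source file would fail at link time with the same undefined reference error, even though a function with that name exists here.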

Executables and Library Files

In the final stage of compilation, object and library files are linked together to produce an executable or library file. An executable file is a file with an entry point. A library file is a collection of object files where each object file has a collection of functions, and there is no entry point. There are two types of libraries: static and dynamic libraries. In this post, we're only interested in static libraries. In Linux, static libraries have a .a extension and are sometimes called static archives. A static library is not itself an ELF file: it is an archive in the ar format whose members are ordinary ELF object files.
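One way to tell these file kinds apart is by their leading magic bytes. ELF files (object files and executables) start with the byte 0x7f followed by the letters ELF, while the ar archives used for .a files start with the eight characters !<arch> plus a newline. The helper names below are hypothetical, and the sketch only inspects a buffer that the caller has already read from a file.

```c
#include <stddef.h>  /* size_t */
#include <string.h>  /* memcmp */

/* ELF files begin with 0x7f 'E' 'L' 'F'. The literal is split in two so
 * that the E is not parsed as part of the hexadecimal escape. */
int looks_like_elf(const unsigned char *data, size_t len)
{
    return len >= 4 && memcmp(data, "\x7f" "ELF", 4) == 0;
}

/* ar archives, the container used for static libraries, begin with
 * the 8-byte string "!<arch>\n". */
int looks_like_ar_archive(const unsigned char *data, size_t len)
{
    return len >= 8 && memcmp(data, "!<arch>\n", 8) == 0;
}
```

Reading the first eight bytes of main.o and of a .a file and passing them to these helpers should classify the former as ELF and the latter as an archive.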

Linking

The input to linking is object files and library files. When linking, the linker collects the global symbols from the input object files, along with those from the library members it pulls in. For each undefined symbol, the linker looks for a defined symbol with the same name in another input file. If every undefined symbol has a matching defined symbol, linking can proceed successfully. Otherwise, the linker issues the error undefined reference to 'symbol' for each undefined symbol that has no match.
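The matching rule described above can be modelled in a few lines of C. This is a toy model under stated assumptions, not how ld is implemented: struct symbol, has_definition and first_unresolved are names invented here, and a real linker also deals with relocations, weak symbols, archive member selection and much more.

```c
#include <stddef.h>  /* size_t, NULL */
#include <string.h>  /* strcmp */

/* Simplified stand-in for one global symbol gathered from the inputs. */
struct symbol {
    const char *name;
    int defined;  /* 1 if an input file defines it, 0 if it is only referenced */
};

/* Return 1 if any entry defines the given name, 0 otherwise. */
static int has_definition(const struct symbol *syms, size_t count,
                          const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (syms[i].defined && strcmp(syms[i].name, name) == 0)
            return 1;
    return 0;
}

/* Return the name of the first undefined symbol that has no matching
 * definition, or NULL if every reference resolves. */
const char *first_unresolved(const struct symbol *syms, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (!syms[i].defined && !has_definition(syms, count, syms[i].name))
            return syms[i].name;
    return NULL;
}
```

Feeding it the three global symbols of main.o yields undefined_function as the unresolved name; adding a fourth entry that defines undefined_function makes the function return NULL.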

In our example, if the input to the linker is only main.o, the linker will issue the following error:

undefined reference to 'undefined_function'

This is because undefined_function is an undefined symbol and there is no defined symbol with the same name.

To fix this error, we can define undefined_function in main.c, compile and link together with another C file that defines undefined_function, or link with a library that defines undefined_function.
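For instance, the second option could look like the file below. The file name other.c, the output name program and the message it prints are assumptions for illustration; only the function name has to match the declaration in main.c.

```c
/* other.c: hypothetical second source file that supplies the missing
 * definition. Build both files together, for example:
 *
 *   gcc main.c other.c -o program
 */
#include <stdio.h>

void undefined_function(void)
{
    puts("undefined_function is now defined");
}
```

With both files on the command line, gcc produces two object files and passes them to the linker, which now finds a defined global symbol named undefined_function in the second one.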

Conclusion

In summary, each defined function in an input C file will have a defined symbol in the output object file, and each declared and called but not defined function in an input C file will have an undefined symbol in the output object file. During linking, when an undefined symbol doesn't have another defined symbol with the same name, the linker issues the undefined reference error.

That's it. I hope you now really understand why this error occurs and I hope you have also gained insight into the content of object files and the linking process 🙂
