
Divyanshu Shekhar


C Program Compilation Process

In the programming world, everything starts with source code. Source code, also known as the code base, consists of a number of files written in a programming language. C source code is written to be understood by humans, not by machines, so it must be compiled to machine-level instructions before the machine can execute it. In this blog, we will go through the steps that turn C source code into a compiled binary and learn how the C Program Compilation Process works.

C Program Compilation Pipeline

Compiling a C program usually takes only a few seconds, but during this brief period the C source code enters a pipeline in which several distinct components each perform their task.

Before we proceed further, there are two rules that we should know:

C Program Compilation Rule

* Only source files are compiled
* Each file is compiled separately

The components of the C Program Compilation Pipeline are:

1. Preprocessor
2. Compiler
3. Assembler
4. Linker

Each component of the compilation pipeline accepts a certain input from the previous component and produces a certain output for the next component in the C Program Compilation Pipeline.

This process continues until the last component in the compilation pipeline generates the required output file, i.e., the binary file.

One thing we must know about the compilation pipeline is that it generates the output only if the source file passes through all of its components successfully. Even a small failure in any one component leads to a compilation or linkage failure and produces an error message.

Step 1 – Preprocessing

The first step in the C Program Compilation Pipeline is preprocessing. A source file usually includes a number of header files, and the preprocessor copies the contents of these files into the body of the C code before compilation starts.

This preprocessed code is called the translation unit (or compilation unit). A translation unit is a single logical unit of C code generated by the preprocessor and ready to be passed to the compiler.

Enough Theory, let’s have a practical look.

Extract Preprocessed Code From GCC

We can inspect the output of every component in the compilation pipeline. Let’s ask the C compiler to dump the translation unit without compiling it any further.

With gcc, the -E option (case-sensitive) dumps the translation unit of the source code.

C-Code

// Header File
#include <stdio.h>

#define Max 10

int main()
{
    printf("Hello World"); // Print Hello World
    printf("%d", Max);
    return 0;
}

Dump Translation Unit

$ gcc -E cprogram.c


# 1 "cprogram.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "cprogram.c"
.......
.......
.......
typedef __time64_t time_t;
# 435 "C:/msys64/mingw64/x86_64-w64-mingw32/include/corecrt.h" 3

typedef struct localeinfo_struct {
  pthreadlocinfo locinfo;
  pthreadmbcinfo mbcinfo;
} _locale_tstruct,*_locale_t;

typedef struct tagLC_ID {
  unsigned short wLanguage;
  unsigned short wCountry;
  unsigned short wCodePage;
} LC_ID,*LPLC_ID;

.......
.......
.......
# 1582 "C:/msys64/mingw64/x86_64-w64-mingw32/include/stdio.h" 2 3
# 3 "cprogram.c" 2


# 5 "cprogram.c"
int main()
{
    printf("Hello World");
    printf("%d", 10);
    return 0;
}

What insights do we get from the translation unit?

* All the declarations are copied from the header file into the translation unit.
* Comments are removed.
* Macro values are substituted at the places where the macros are used.

The translation unit shown above is abridged. The full output is very large because it includes the contents of the stdio.h header file (1038 lines in total in this case). To see the whole output, run this command on your development machine.

This example shows us how the preprocessor works. The preprocessor only does simple tasks, such as inclusion (copying the contents of a file) and macro expansion (text substitution).

The preprocessor is largely unaware of C syntax and rules. It uses its own simple parser, which looks for directives in the input code.

A tool called cpp, short for C Pre-Processor, can also be used to preprocess a C file. It is part of the C development bundle shipped with most flavors of UNIX.

$ cpp cprogram.c

This command prints the preprocessed code of the C program.

By convention, a preprocessed C file has the extension .i. With gcc, for example, running gcc -E cprogram.c -o cprogram.i writes the translation unit to cprogram.i. If you pass such a file to the C compiler, it skips the preprocessing step, because a file with the .i extension is assumed to have been preprocessed already and is sent directly to the compilation step.

