DEV Community

Bharathy


Journey of a .c file

TL;DR
To generate a .exe file from a .c file, the compiler goes through several phases, and temporary intermediate files (.i, .s and .o) are created along the way.


Having taught C programming to many students, I feel that knowing what happens behind the scenes when these commands run helps learners understand the language better. So, during the COVID-19 pandemic lockdown, I thought of sharing what I know with new programmers, especially those who depend on online resources.

Most beginners learning C know that compiling a program creates a .exe file. But have you ever wondered how that happens? This post walks through the step-by-step transformation of a .c file into a .exe file.

Let us first write a simple C file called welcome.c that prints a welcome message to the console.

#include <stdio.h>

int main()
{
    printf("Welcome to learning C");
    return 0;
}

I use GCC (GNU Compiler Collection) version 9.2.0. Rather than an IDE, I recommend using a code editor and running the commands yourself from the command line: it will help you understand what happens under the hood.

In the command line, run

gcc welcome.c

On Windows, this generates a file named a.exe by default (on Linux and macOS, the default name is a.out).
To give the output file a name of your choice, use the -o option.

gcc welcome.c -o welcome

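Running this command generates welcome.exe. Along the way, three intermediate files (.i, .s and .o) are involved; gcc keeps them as temporary files and deletes them afterwards. (Strictly speaking, GCC preprocesses and compiles in a single step, so a separate .i file is written only when you ask for one, for example with -E or -save-temps.)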

Converting a .c file into a .exe file involves the following phases:

  • Preprocessing 
  • Compiling
  • Assembling
  • Linking 
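As a quick summary of which file each phase produces, here is a small C sketch. The names phase, phase_extension and output_name are made up for illustration; the extensions follow the Windows naming used in this post, and output_name simply swaps the extension of the source file name, similar to the names -save-temps produces.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names, for illustration only: the four phases in order. */
enum phase { PREPROCESS, COMPILE, ASSEMBLE, LINK };

/* Extension of the file each phase produces (Windows naming). */
const char *phase_extension(enum phase p)
{
    switch (p) {
    case PREPROCESS: return ".i";   /* preprocessed C source */
    case COMPILE:    return ".s";   /* assembly code */
    case ASSEMBLE:   return ".o";   /* object file (machine code) */
    case LINK:       return ".exe"; /* executable */
    }
    return "";
}

/* Writes the output file name for phase p into out, e.g. "welcome.c" and
   PREPROCESS give "welcome.i". Returns 0 on success, -1 if out is too small. */
int output_name(const char *source, enum phase p, char *out, size_t size)
{
    const char *dot = strrchr(source, '.');
    size_t stem = dot ? (size_t)(dot - source) : strlen(source);
    int n = snprintf(out, size, "%.*s%s", (int)stem, source, phase_extension(p));
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For example, output_name("welcome.c", ASSEMBLE, buf, sizeof buf) fills buf with "welcome.o".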

Preprocessing

This is the first phase of the conversion. cpp.exe is GNU's C preprocessor, and it can be run on its own to process the .c file before compilation. It is located in the bin folder of the GCC installation.

[Figure: preprocessor, assembler and compiler in the bin folder]

cpp welcome.c -o welcome.i
  • generates the preprocessed file welcome.i
  • -o specifies the output file; here, the output of the preprocessing phase is welcome.i
  • the .i file is the first intermediate file

The .i file can also be generated with gcc's -E option, in either of the following forms:

gcc -E welcome.c -o welcome.i   
gcc -E welcome.c > welcome.i

Let us go through the welcome.c file now.

stdio.h, included in line 1 of welcome.c, is a header file that contains the function declarations (function prototypes) for the standard I/O functions. C requires an entity to be declared before it is used, so the compiler must be informed of a function's existence before the function is called.
welcome.c uses printf() to display "Welcome to learning C". The code for printf() is already compiled and available in a library, but the compiler still needs to see its declaration when it compiles welcome.c.
The #include <stdio.h> directive makes the preprocessor read the entire header and insert its contents into the source code. Instead of including the whole header, the declaration of printf() alone can be written, as below. (This version also deliberately leaves out a semicolon, to show in which phase the error is caught.)

int printf(const char *restrict format, ...);

int main()
{
    printf("Welcome to learning C")
    return 0;
}

Replacing #include directives with the contents of the header files is one of the activities in the preprocessing phase. This phase also removes comments, expands macros defined with #define, and handles conditional directives such as #if and #ifdef. There is plenty of material available on preprocessing.
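To see macro expansion and comment removal at work, a small file like the following can be run through gcc -E (for example gcc -E macros.c, where macros.c is just an assumed file name). In the output, GREETING and SQUARE(side) appear in their expanded forms and the comments are gone. The macro and function names are hypothetical.

```c
/* Hypothetical names, chosen for illustration only. */

/* Object-like macro: every GREETING is replaced by the string literal. */
#define GREETING "Welcome to learning C"

/* Function-like macro: SQUARE(side) is expanded to ((side) * (side)). */
#define SQUARE(x) ((x) * (x))

const char *greeting(void)
{
    return GREETING;
}

int area_of_square(int side)
{
    /* Comments like this one do not appear in the .i file. */
    return SQUARE(side);
}
```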

In short, the preprocessing phase produces plain C code that is ready to be compiled and writes it to a .i file. It does not detect compilation errors: the hand-written printf() version of welcome.c is missing a semicolon (a syntax error) at line 5, yet welcome.i is generated without any complaint.

[Figure: welcome.i after preprocessing is complete. The missing ; (line 9 of welcome.i) is not detected.]
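The preprocessor works on directives and text, not on C syntax. One way to see this is conditional compilation: a group guarded by #if 0 is dropped during preprocessing, so a syntax error inside it never reaches the compiler. A minimal sketch, with hypothetical function names:

```c
/* Hypothetical example: the preprocessor deletes this whole #if 0 group,
   so the compiler never sees the missing semicolon inside it. */
#if 0
int broken(void)
{
    return 1   /* missing semicolon: never compiled */
}
#endif

/* Only this function survives into the .i file. */
int survives_preprocessing(void)
{
    return 42;
}
```

Running gcc -E on this file should show that broken() is absent from the output.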

Compiling

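In this phase, the compiler checks the preprocessed code for syntax errors and translates it into assembly code.

gcc -S welcome.i

The -S option tells gcc to stop after compilation and write the equivalent assembly code to a .s file (here, welcome.s). Since the code is missing a semicolon, this step fails with a compilation error, and no .s file is produced until the error is fixed.

On successful compilation, processor-dependent assembly code is generated.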

[Figure: welcome.s, the assembly equivalent of welcome.i]

The .s file is the second intermediate file.
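For reference, here is the hand-written-declaration version of the snippet with the semicolon restored, which gets through gcc -S without errors. So that it can be called in isolation, the body of main() has been moved into a hypothetical function, print_welcome().

```c
/* Declaration written by hand instead of #include <stdio.h>, as in the
   snippet above (restrict requires C99 or later). */
int printf(const char *restrict format, ...);

/* Hypothetical wrapper for the body of main(), so it can be called and tested.
   Returns what printf() returns: the number of characters written. */
int print_welcome(void)
{
    return printf("Welcome to learning C"); /* semicolon restored */
}
```

Compiling this file with gcc -S should now produce a .s file instead of an error.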

Assembling

GNU's assembler, as.exe, takes the .s file as input and generates a file with the .o extension, known as an object file (it can also be run directly, e.g. as welcome.s -o welcome.o). The object file contains machine code, so it cannot be meaningfully read in a code editor or IDE. Calls to library functions (printf() in this example) are still unresolved: they have yet to be found in the libraries and linked to the object code. So, a .o file is not yet in executable form.

[Figure: .c to .o]

The .o file is the third intermediate file.

The .o file can also be generated with gcc's -c option, which here compiles and assembles welcome.i in one step:

gcc -c -o welcome.o welcome.i
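The idea of an unresolved call can be sketched in C. Below, compute() calls library_add() knowing only its declaration; in a real project the definition would sit in another .c file or a library, and the call would stay unresolved in this file's .o until linking. Both function names are hypothetical, and the definition is placed in the same file only so that the sketch builds on its own.

```c
/* Hypothetical function: only its declaration is needed to compile a call. */
int library_add(int a, int b);

int compute(void)
{
    /* If library_add() were defined in another file, this call would remain
       an unresolved reference in this file's .o until the linker fills it in. */
    return library_add(2, 3);
}

/* Placed here only so the sketch builds by itself; normally this definition
   would come from another object file or a library. */
int library_add(int a, int b)
{
    return a + b;
}
```

If you move library_add() into a separate file and run gcc -c on this one, tools such as nm (from GNU binutils) typically list library_add with the letter U, meaning undefined.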

Linking

gcc uses ld.exe, the GNU linker, for the linking phase.

An overview of the linker's responsibilities:

  • takes one or more object files as input
  • combines the object files, resolving symbols that one file uses and another defines
  • reports an error (a linker error) when it finds duplicate symbols (e.g. the same function, or the same initialized global variable, defined in more than one object file)
  • if some symbols (e.g. malloc) are still unresolved, searches the libraries given to it, in the order they are specified; a symbol found nowhere causes an "undefined reference" linker error
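The duplicate-symbol rule is easiest to respect by separating declarations from definitions. In the sketch below (visit_count, record_visit, twice and doubled_visits are hypothetical names), the extern line is the kind of declaration that may be repeated in many files, usually through a header, while the initialized definition must appear in exactly one object file. Giving a helper static linkage keeps its name private to one file, so it cannot clash with a same-named function elsewhere.

```c
/* Declaration: may appear in any number of files (typically via a header). */
extern int visit_count;

/* Definition: exactly one across all object files. A second
   "int visit_count = 0;" in another .c file would make the linker
   report a multiple-definition error. */
int visit_count = 0;

/* Hypothetical helper that updates the shared global. */
int record_visit(void)
{
    return ++visit_count;
}

/* static gives the name internal linkage: other .c files may define their
   own twice() without a duplicate-symbol error at link time. */
static int twice(int x)
{
    return 2 * x;
}

int doubled_visits(void)
{
    return twice(visit_count);
}
```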

At the end of the linking phase, all the different parts of the code are merged, symbols are resolved and an executable (.exe file) is generated. The final memory locations of the code and data may still be adjusted by the operating system's loader when the program is loaded for execution.

For beginners building small programs, linking is rarely an issue, so I won't dwell on these details here.

All three intermediate files can also be kept, along with the executable, by passing the -save-temps option to gcc:

gcc -o welcome.exe -save-temps welcome.c
