TL;DR
To generate a .exe file from a .c file, the compiler creates temporary intermediate files (.i, .s, .o), one at each phase of the compilation process.
Having taught C programming to many students, I have found that knowing what happens behind the scenes when a command is executed helps learners understand the language better. So, during this COVID-19 pandemic lockdown, I thought of sharing what I know with new learners of programming (those who depend on online sources).
Most beginners learning C know that compilation generates a .exe file. But have you ever wondered how it happens? This post is about the step-by-step transformation of a .c file into a .exe file.
Let us first write a simple C file called welcome.c that simply prints a welcome message in the console.
#include <stdio.h>
int main()
{
    printf("Welcome to learning C");
    return 0;
}
I use GCC (GNU Compiler Collection) version 9.2.0. Working with a code editor and the command line, rather than an IDE, makes it easier to see what happens under the hood.
In the command line, run
gcc welcome.c
This generates a file named a.exe by default.
To give the output file a name of your choice, use the -o option.
gcc welcome.c -o welcome
Running the above command generates the output file welcome.exe. During this conversion, however, three intermediate temporary files (.i, .s and .o) are generated and then deleted.
Conversion of a .c to a .exe has the following phases.
- Preprocessing
- Compiling
- Assembling
- Linking
Preprocessing
This is the first phase of the conversion. cpp.exe is GNU's preprocessor; it processes the .c file before compilation. You can find cpp.exe in the bin folder of your GCC installation.
cpp welcome.c -o welcome.i
- generates the preprocessed file welcome.i
- -o is used to specify the output file. Here the output of the preprocessing phase is welcome.i
- The .i file is the first intermediate file
The .i file can also be generated by passing the -E option to gcc, in either of the following forms:
gcc -E welcome.c -o welcome.i
gcc -E welcome.c > welcome.i
Let us go through the welcome.c file now.
stdio.h in line 1 of welcome.c is a header file that contains the function declarations (prototypes) of the standard I/O functions. C requires an entity to be declared before it is used, so the compiler must be told that a function exists before the function is called.
welcome.c uses printf() to display "Welcome to learning C". The code for printf() is available in a library (pre-compiled, already tested code), but the function still needs to be declared to the compiler when welcome.c is compiled.
With #include <stdio.h>, the entire contents of the header file are read and embedded in the source code. Instead of pulling in the whole file, the declaration of printf() alone can be written by hand, as below.
int printf(const char *restrict format, ...);
int main()
{
    printf("Welcome to learning C")
    return 0;
}
Replacing #include directives with the contents of the header file(s) is one of the activities of the preprocessing phase. This phase also removes comments, extra whitespace, and so on. Plenty of material on preprocessing is available elsewhere.
In short, the preprocessing phase retains only the statements needed for compilation and generates a .i file as output. Compilation errors are not detected in this phase: the code above is missing a semicolon after the printf() call (a syntax error), yet the file welcome.i gives no indication of this.
Compiling
In this phase the syntax errors are identified.
gcc -S welcome.i
The -S option in the above command tells the compiler to generate the equivalent assembly code. The missing semicolon is reported as a compilation error at this stage.
On successful compilation, processor-dependent assembly code is generated in a .s file.
The .s file is the second intermediate file.
Assembling
The .s file is taken as input by GNU's assembler, as.exe, which generates a file with the .o extension. The .o file is known as an object file. The object file contains machine code and hence is not readable in code editors or IDEs. The calls to library functions (printf() in this example) still remain unresolved: they are yet to be found in the libraries and linked to the object code. So .o files are not in an executable form.
The .o file is the third intermediate file.
The .o file can also be generated by
gcc -c -o welcome.o welcome.i
Linking
gcc uses ld.exe as the linker.
An overview of the linker's responsibilities:
- takes one or more object files as input
- combines object files to resolve unresolved symbols
- generates error messages (linker errors) on finding duplicate symbols (e.g. defining the same global variable in more than one object file)
- if some symbols (e.g. malloc) are still unresolved, the linker searches the libraries given to it, in the order in which they are specified
At the end of the linking phase, all the different parts of the code are merged, symbols are resolved and executable code (a .exe file) is generated. However, memory addresses are yet to be assigned for the code and data.
For beginners building small programs, linking is rarely an issue, so I will not delve further into these details.
All three intermediate files and the executable file can also be obtained in one step with the -save-temps option of gcc.
gcc -o welcome.exe -save-temps welcome.c