## DEV Community

torben feldthusen

Posted on • Updated on

# Howto turn a x86 binary executable back into C source code

• Objective: turn a x86 binary executable back into C source code.
• Understand how the compiler turns C into assembly code.
• Low-level OS structures and executable file format.

Arithmetic Instructions

``````mov eax,2 ; eax = 2
mov ebx,3 ; ebx = 3
add eax,ebx ; eax = eax + ebx
sub ebx, 2 ; ebx = ebx - 2
``````

Accessing Memory

``````mox eax, [1234] ; eax = *(int*)1234
mov ebx, 1234 ; ebx = 1234
mov eax, [ebx] ; eax = *ebx
mov [ebx], eax ; *ebx = eax

``````

Conditional Branches

``````cmp eax, 2 ; compare eax with 2
je label1 ; if(eax==2) goto label1
ja label2 ; if(eax>2) goto label2
jb label3 ; if(eax<2) goto label3
jbe label4 ; if(eax<=2) goto label4
jne label5 ; if(eax!=2) goto label5
jmp label6 ; unconditional goto label6

``````

Function calls

First calling a function:
The first operations is to save the return pointer:

``````pop esi ; save esi
Right before leaving the function:
pop esi ; restore esi
``````

Modern Compiler Architecture

C code --> Parsing --> Intermediate representation --> optimization -->
Low-level intermediate representation --> register allocation --> x86 assembly

High-level Optimizations

Inlining

For example, the function c:

``````int foo(int a, int b){
return a+b }
c = foo(a, b+1)
``````

translates to

``````c = a+b+1
``````

Loop unrolling

The loop:

``````for(i=0; i<2; i++){
a[i]=0;
}

``````

becomes

``````   a[0]=0;
a[1]=0;

``````

Loop-invariant code motion

``````The loop:
for (i = 0; i < 2; i++) {
a[i] = p + q;
}

``````

becomes:

``````
temp = p + q;
for (i = 0; i < 2; i++) {
a[i] = temp;
}

``````

Common subexpression elimination

• Objective: turn a x86 binary executable back into C source code.
• Understand how the compiler turns C into assembly code.
• Low-level OS structures and executable file format.

Arithmetic Instructions

``````mov eax,2 ; eax = 2
mov ebx,3 ; ebx = 3
add eax,ebx ; eax = eax + ebx
sub ebx, 2 ; ebx = ebx - 2
``````

Accessing Memory

``````mox eax, [1234] ; eax = *(int*)1234
mov ebx, 1234 ; ebx = 1234
mov eax, [ebx] ; eax = *ebx
mov [ebx], eax ; *ebx = eax

``````

Conditional Branches

``````cmp eax, 2 ; compare eax with 2
je label1 ; if(eax==2) goto label1
ja label2 ; if(eax>2) goto label2
jb label3 ; if(eax<2) goto label3
jbe label4 ; if(eax<=2) goto label4
jne label5 ; if(eax!=2) goto label5
jmp label6 ; unconditional goto label6

``````

Function calls

First calling a function:
The first operations is to save the return pointer:

``````pop esi ; save esi
Right before leaving the function:
pop esi ; restore esi
``````

Modern Compiler Architecture

C code --> Parsing --> Intermediate representation --> optimization -->
Low-level intermediate representation --> register allocation --> x86 assembly

High-level Optimizations

Inlining

For example, the function c:

``````int foo(int a, int b){
return a+b }
c = foo(a, b+1)
``````

translates to

``````c = a+b+1
``````

Loop unrolling

The loop:

``````for(i=0; i<2; i++){
a[i]=0;
}
``````

becomes

``````   a[0]=0;
a[1]=0;

``````

Loop-invariant code motion

``````The loop:
for (i = 0; i < 2; i++) {
a[i] = p + q;
}
``````

becomes:

``````temp = p + q;
for (i = 0; i < 2; i++) {
a[i] = temp;
}

``````

Common subexpression elimination

``````a = b + (z + 1)
p = q + (z + 1)
``````

becomes

``````temp = z + 1
a = b + z
p = q + z

``````

Constant folding and propagation

The assignments:

``````a = 3 + 5
b = a + 1
func(b)
``````

Becomes:

``````func(9)

``````

Delete unnecessary code:

``````a = 1
if (a < 0) {
printf(“ERROR!”)
}
``````

to

``````a = 1

``````

Low-Level Optimizations

Strength reduction

Codes such as:

``````y = x * 2
y = x * 15
``````

Becomes:

``````y = x + x
y = (x << 4) - x

``````

Code block reordering

Codes such as :

``````if (a < 10) goto l1
printf(“ERROR”)
goto label2
l1:
printf(“OK”)
l2:
return;
``````

Becomes:

``````if (a > 10) goto l1
printf(“OK”)
l2:
return
l1:
printf(“ERROR”)
goto l2

``````

Register allocation

• Memory access is slower than registers.
• Try to fit as many as local variables as possible in registers.
• The mapping of local variables to stack location and registers is not constant.

Instruction scheduling

Assembly code like:

``````mov eax, [esi]
mov ebx, [edi]
``````

Becomes:

``````mov eax, [esi]
mov ebx, [edi]

a = b + (z + 1)
p = q + (z + 1)
``````

becomes

``````temp = z + 1
a = b + z
p = q + z

``````

Constant folding and propagation

The assignments:

``````a = 3 + 5
b = a + 1
func(b)
``````

Becomes:

``````func(9)

``````

Delete unnecessary code:

``````a = 1
if (a < 0) {
printf(“ERROR!”)
}
``````

to

``````a = 1

``````

Low-Level Optimizations

Strength reduction

Codes such as:

``````y = x * 2
y = x * 15
``````

Becomes:

``````y = x + x
y = (x << 4) - x

``````

Code block reordering

Codes such as :

``````if (a < 10) goto l1
printf(“ERROR”)
goto label2
l1:
printf(“OK”)
l2:
return;
``````

Becomes:

``````if (a > 10) goto l1
printf(“OK”)
l2:
return
l1:
printf(“ERROR”)
goto l2

``````

Register allocation

• Memory access is slower than registers.
• Try to fit as many as local variables as possible in registers.
• The mapping of local variables to stack location and registers is not constant.

• Objective: turn a x86 binary executable back into C source code.

• Understand how the compiler turns C into assembly code.

• Low-level OS structures and executable file format.

Arithmetic Instructions

``````mov eax,2 ; eax = 2
mov ebx,3 ; ebx = 3
add eax,ebx ; eax = eax + ebx
sub ebx, 2 ; ebx = ebx - 2
``````

Accessing Memory

``````mox eax, [1234] ; eax = *(int*)1234
mov ebx, 1234 ; ebx = 1234
mov eax, [ebx] ; eax = *ebx
mov [ebx], eax ; *ebx = eax

``````

Conditional Branches

``````cmp eax, 2 ; compare eax with 2
je label1 ; if(eax==2) goto label1
ja label2 ; if(eax>2) goto label2
jb label3 ; if(eax<2) goto label3
jbe label4 ; if(eax<=2) goto label4
jne label5 ; if(eax!=2) goto label5
jmp label6 ; unconditional goto label6

``````

Function calls

First calling a function:
The first operations is to save the return pointer:

``````pop esi ; save esi
Right before leaving the function:
pop esi ; restore esi
``````

Modern Compiler Architecture

C code --> Parsing --> Intermediate representation --> optimization -->
Low-level intermediate representation --> register allocation --> x86 assembly

High-level Optimizations

Inlining

For example, the function c:

``````int foo(int a, int b){
return a+b }
c = foo(a, b+1)

``````

translates to

``````c = a+b+1
``````

Loop unrolling

The loop:

``````for(i=0; i<2; i++){
a[i]=0;
}
``````
``````becomes
a[0]=0;
a[1]=0;

``````

Loop-invariant code motion

The loop:

``````for (i = 0; i < 2; i++) {
a[i] = p + q;
}
``````

becomes:

``````temp = p + q;
for (i = 0; i < 2; i++) {
a[i] = temp;
}

``````

Common subexpression elimination

• Objective: turn a x86 binary executable back into C source code.
• Understand how the compiler turns C into assembly code.
• Low-level OS structures and executable file format.

Arithmetic Instructions

``````mov eax,2 ; eax = 2
mov ebx,3 ; ebx = 3
add eax,ebx ; eax = eax + ebx
sub ebx, 2 ; ebx = ebx - 2
``````

Accessing Memory

``````mox eax, [1234] ; eax = *(int*)1234
mov ebx, 1234 ; ebx = 1234
mov eax, [ebx] ; eax = *ebx
mov [ebx], eax ; *ebx = eax

``````

Conditional Branches

``````cmp eax, 2 ; compare eax with 2
je label1 ; if(eax==2) goto label1
ja label2 ; if(eax>2) goto label2
jb label3 ; if(eax<2) goto label3
jbe label4 ; if(eax<=2) goto label4
jne label5 ; if(eax!=2) goto label5
jmp label6 ; unconditional goto label6

``````

Function calls

First calling a function:
The first operations is to save the return pointer:

``````pop esi ; save esi
Right before leaving the function:
pop esi ; restore esi
``````

Modern Compiler Architecture

C code --> Parsing --> Intermediate representation --> optimization -->
Low-level intermediate representation --> register allocation --> x86 assembly

High-level Optimizations

Inlining

For example, the function c:

``````int foo(int a, int b){
return a+b }
c = foo(a, b+1)
``````

translates to

``````c = a+b+1
``````

Loop unrolling

The loop:

``````for(i=0; i<2; i++){
a[i]=0;
}
``````

becomes

``````   a[0]=0;
a[1]=0;

``````

Loop-invariant code motion

``````The loop:
for (i = 0; i < 2; i++) {
a[i] = p + q;
}
``````

becomes:

``````temp = p + q;
for (i = 0; i < 2; i++) {
a[i] = temp;
}

``````

Common subexpression elimination

``````a = b + (z + 1)
p = q + (z + 1)
``````

becomes

``````temp = z + 1
a = b + z
p = q + z

``````

Constant folding and propagation

The assignments:

``````a = 3 + 5
b = a + 1
func(b)
``````

Becomes:

``````func(9)

``````

Delete unnecessary code:

``````a = 1
if (a < 0) {
printf(“ERROR!”)
}
``````

to

``````a = 1

``````

Low-Level Optimizations

Strength reduction

Codes such as:

``````y = x * 2
y = x * 15
``````

Becomes:

``````y = x + x
y = (x << 4) - x

``````

Code block reordering

Codes such as :

``````if (a < 10) goto l1
printf(“ERROR”)
goto label2
l1:
printf(“OK”)
l2:
return;
``````

Becomes:

``````if (a > 10) goto l1
printf(“OK”)
l2:
return
l1:
printf(“ERROR”)
goto l2

``````

Register allocation

• Memory access is slower than registers.
• Try to fit as many as local variables as possible in registers.
• The mapping of local variables to stack location and registers is not constant.

Instruction scheduling

Assembly code like:

``````mov eax, [esi]
mov ebx, [edi]
``````

Becomes:

``````mov eax, [esi]
mov ebx, [edi]

a = b + (z + 1)
p = q + (z + 1)
``````

becomes

``````temp = z + 1
a = b + z
p = q + z

``````

Constant folding and propagation

The assignments:

``````a = 3 + 5
b = a + 1
func(b)
``````

Becomes:

``````func(9)

``````

Delete unnecessary code:

``````a = 1
if (a < 0) {
printf(“ERROR!”)
}
``````

to

``````a = 1

``````

Low-Level Optimizations

Strength reduction

Codes such as:

``````y = x * 2
y = x * 15
``````

Becomes:

``````y = x + x
y = (x << 4) - x

``````

Code block reordering

Codes such as :

``````if (a < 10) goto l1
printf(“ERROR”)
goto label2
l1:
printf(“OK”)
l2:
return;
``````

Becomes:

``````if (a > 10) goto l1
printf(“OK”)
l2:
return
l1:
printf(“ERROR”)
goto l2

``````

Register allocation

• Memory access is slower than registers.
• Try to fit as many as local variables as possible in registers.
• The mapping of local variables to stack location and registers is not constant.

Instruction scheduling

Assembly code like:

``````mov eax, [esi]
mov ebx, [edi]
``````

Becomes:

``````mov eax, [esi]
mov ebx, [edi]

``````

Instruction scheduling

Assembly code like:

``````mov eax, [esi]
mov ebx, [edi]
``````mov eax, [esi]