Introduction
I have been working with C all my professional and student life. There have been times when I had to look a little deeper to understand what was going on with a buggy program. There are many tools and techniques for examining an executable; this post covers some of them, along with a little bit of reverse engineering.
Test Code
As an example I wrote a pretty basic piece of code with some intentional inclusions. There are two global variables, msga and msgb, and two user-defined routines, allow and deny; main calls only deny. There is also one conditional call to an external program using execvp, guarded by a flag that is set to 0. The idea here is to examine the executable this program creates: find out where the code I wrote lands in the executable and what the compiler adds on top of it. Later I'll showcase some basic reverse engineering that can be done by pretending we haven't seen the code.
#include <stdio.h>
#include <unistd.h>

char *msga = "Allow";
char *msgb = "Deny";

void allow() {
    printf("%s\n", msga);
}

void deny() {
    printf("%s\n", msgb);
}

int main(int argc, char **argv) {
    deny();
    int runExternal = 0;
    if (runExternal) {
        char* lsargs[] = {"ls", "-l", NULL};
        execvp("ls", lsargs);
    }
}
While dealing with executables we'll encounter a lot of hexadecimal values. I prefer using Python to do quick arithmetic whenever the need arises.
Also, some of the output is too big to paste into the post, so I'll link it at the end.
Goal
Let's compile the program and get our a.out.
gcc examinebin.c
As we see in the code, the program basically runs deny and exits. My goal is to identify the relevant instructions inside the executable and change them so that allow is called and then ls is executed with the -a argument.
PS: The tools used to do analysis and their output are listed at the bottom of this post for reference.
Call Sequence
If we look at the objdump output, it is very neatly divided into sections and clearly labeled with symbol names. The code we are interested in is the code we wrote, but it's nice to know what everything else is.
The short version is that every C program needs a main routine, which marks the start and end of user-written code. The C runtime executes main within its framework and takes care of all static and runtime dependencies. The order of execution can be determined very easily by hooking up the executable with gdb and adding a breakpoint on every symbol defined in the .text section, plus _init and _fini. Let's see what happens.
breakpoint : _init, _start, deregister_tm_clones, register_tm_clones, __do_global_dtors_aux, frame_dummy, allow, deny, main, __libc_csu_init, __libc_csu_fini, _fini
This is the call sequence, labelled by me based on my understanding of the usual meaning of these symbols:
// Initialisation
_init (argc=1, argv=0x7fffffffdfd8, envp=0x7fffffffdfe8)
_start ()
__libc_csu_init ()
_init ()
frame_dummy ()
register_tm_clones ()
// User Code
main ()
deny () // we want to call allow and execvp here instead
// Deconstruction and finalisation
__do_global_dtors_aux ()
deregister_tm_clones ()
deregister_tm_clones ()
_fini ()
Identification
Now that we know what we don't have to explore, we can focus on the task at hand: calling allow and ls with -a. To do that we will specify our goal properly. Basically we want to:
- call allow instead of deny.
- change the runExternal flag value to non-zero.
- change "-l" to "-a" in lsargs.
To do that we have to know where these values are in binary and then change them manually without disturbing everything else.
Replace deny
The hexadecimal code calling deny, from the objdump output:
0000000000001189 <allow>:
00000000000011a3 <deny>:
00000000000011bd <main>:
11e4: e8 ba ff ff ff callq 11a3 <deny>
11e9: c7 45 dc 00 00 00 00 movl $0x0,-0x24(%rbp)
From the callq reference we know that opcode e8 takes the operand ba ff ff ff, a little-endian 32-bit value (0xffffffba) that is read as a signed offset from the address of the next instruction, 0x11e9. So it should point to 0x11a3.
offset = 0xffffffba - 0x100000000  # reinterpret as a signed 32-bit value: -0x46
deny_addr = hex(0x11e9 + offset)
print(deny_addr)  # 0x11a3
To call allow instead, we have to change 0xffffffba to a value that resolves to 0x1189.
allow_addr = 0x1189
offset = hex((allow_addr - 0x11e9) & 0xffffffff)  # wrap the negative offset to 32 bits
print(offset)
# 0xffffffa0 -> a0 ff ff ff
So all we need to do is change ba to a0 in the binary.
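The same arithmetic can be wrapped in two small helpers. This is only a sketch: the function names call_target and encode_call are made up for illustration, and it assumes the 5-byte e8 rel32 form of the call seen in the objdump output.

```python
def call_target(insn_addr, insn_bytes):
    """Return the address that an `e8 rel32` call located at insn_addr jumps to."""
    assert insn_bytes[0] == 0xE8 and len(insn_bytes) == 5
    # The operand is a signed 32-bit little-endian offset...
    rel32 = int.from_bytes(insn_bytes[1:], "little", signed=True)
    # ...measured from the address of the next instruction (insn_addr + 5).
    return insn_addr + 5 + rel32


def encode_call(insn_addr, target):
    """Build the 5 bytes of an `e8 rel32` call at insn_addr that reaches target."""
    rel32 = target - (insn_addr + 5)
    return b"\xe8" + rel32.to_bytes(4, "little", signed=True)


print(hex(call_target(0x11E4, bytes.fromhex("e8baffffff"))))  # 0x11a3 (deny)
print(encode_call(0x11E4, 0x1189).hex())                      # e8a0ffffff (allow)
```

Decoding the patched bytes again with call_target is a cheap way to double-check the edit before touching the file.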
Change runExternal
This is quite simple: all we need to do is locate the mov instruction that puts the value in the flag.
11e9: c7 45 dc 00 00 00 00 movl $0x0,-0x24(%rbp)
and change the value to any non-zero one. The value is the last four bytes of the instruction, stored least significant byte first:
00 00 00 00 -> 01 00 00 00
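To see which bytes of that instruction are the value, here is a small sketch that splits the seven bytes apart. The helper name decode_movl_rbp is hypothetical, and it only handles this one encoding (c7 45, then a one-byte displacement from %rbp, then a 32-bit immediate).

```python
def decode_movl_rbp(insn_bytes):
    """Split `c7 45 disp8 imm32` (movl $imm32, disp8(%rbp)) into (disp, imm)."""
    assert insn_bytes[:2] == b"\xc7\x45" and len(insn_bytes) == 7
    disp = int.from_bytes(insn_bytes[2:3], "little", signed=True)  # stack slot offset
    imm = int.from_bytes(insn_bytes[3:], "little", signed=True)    # value being stored
    return disp, imm


original = bytes.fromhex("c745dc00000000")
patched = original[:3] + (1).to_bytes(4, "little")

print(decode_movl_rbp(original))  # (-36, 0), i.e. -0x24(%rbp) = 0
print(decode_movl_rbp(patched))   # (-36, 1)
print(patched.hex())              # c745dc01000000
```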
Change "-l"
We basically want to change the arguments going into the execvp function call. In the assembly we can see where the callq to execvp is made, and before it there are lea instructions that load the addresses of the argument strings, which are then stored into the lsargs array on the stack or passed in a register. Since these values are hardcoded in the binary, all we need to do is find the location of -l and change it to -a.
11f6: 48 8d 05 12 0e 00 00 lea 0xe12(%rip),%rax # 200f <_IO_stdin_used+0xf>
11fd: 48 89 45 e0 mov %rax,-0x20(%rbp)
1201: 48 8d 05 0a 0e 00 00 lea 0xe0a(%rip),%rax # 2012 <_IO_stdin_used+0x12>
1208: 48 89 45 e8 mov %rax,-0x18(%rbp)
121b: 48 8d 3d ed 0d 00 00 lea 0xded(%rip),%rdi # 200f <_IO_stdin_used+0xf>
1222: e8 69 fe ff ff callq 1090 <execvp@plt>
The lea instruction calculates an effective address, which in every case here is an offset from the address of the next instruction (RIP-relative addressing). So we have three addresses, which can be calculated or read from the comments in the objdump output.
print(hex(0xe12 + 0x11fd)) # 0x200f
print(hex(0xe0a + 0x1208)) # 0x2012
print(hex(0xded + 0x1222)) # 0x200f
From the hexdump output we can clearly see that our strings are really there.
00002000: 0100 0200 416c 6c6f 7700 4465 6e79 006c ....Allow.Deny.l
00002010: 7300 2d6c 0000 0000 011b 033b 5400 0000 s.-l.......;T...
Changing the fourth byte of the second row (offset 0x2013) from 6c to 61 will turn l into a.
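The string offsets can also be worked out mechanically instead of counting bytes by eye. A rough sketch, using only the two dump rows quoted above (without the ASCII column); load_rows and string_address are illustrative names, not part of any tool.

```python
ROWS = """\
00002000: 0100 0200 416c 6c6f 7700 4465 6e79 006c
00002010: 7300 2d6c 0000 0000 011b 033b 5400 0000
"""


def load_rows(text):
    """Return (base_address, data) for consecutive 'addr: hex hex ...' rows."""
    base, data = None, bytearray()
    for row in text.splitlines():
        addr, hex_part = row.split(": ", 1)
        if base is None:
            base = int(addr, 16)
        data += bytes.fromhex(hex_part)  # fromhex skips the spaces between groups
    return base, bytes(data)


def string_address(text, needle):
    """Address of the NUL-terminated string `needle` inside the dumped rows."""
    base, data = load_rows(text)
    return base + data.index(needle + b"\0")


for name in (b"Allow", b"Deny", b"ls", b"-l"):
    print(name.decode(), hex(string_address(ROWS, name)))
# Allow 0x2004
# Deny 0x200a
# ls 0x200f
# -l 0x2012
```

The addresses for ls and -l agree with the lea targets, and the l we want to replace is the byte right after the dash, at 0x2013.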
Changes
Let's summarize and make all the necessary changes to the text output produced by the xxd utility.
Changes for allow
000011e0: 0000 0000 e8(ba) ffff ffc7 45dc 0000 0000
000011e0: 0000 0000 e8(a0) ffff ffc7 45dc 0000 0000
Changes for runExternal flag
000011e0: 0000 0000 e8ba ffff ffc7 45dc (00)00 0000
000011e0: 0000 0000 e8ba ffff ffc7 45dc (01)00 0000
Changes for l -> a
00002010: 7300 2d(6c) 0000 0000 011b 033b 5400 0000
00002010: 7300 2d(61) 0000 0000 011b 033b 5400 0000
Create new executable
Using the xxd utility:
xxd -r modified-xxd.txt > a2.out
Change permission:
chmod +x a2.out
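As an alternative to editing the xxd text by hand, the three byte changes could be applied directly from Python. This is a sketch under a few assumptions: the offsets are the ones from this particular build (where, as the xxd rows show, file offsets line up with the objdump addresses), and apply_patches and patch_file are hypothetical helper names.

```python
import os
import stat

# (file offset, expected old byte, new byte) for the three changes listed above.
# These offsets are specific to this build of a.out.
PATCHES = [
    (0x11E5, 0xBA, 0xA0),  # call deny -> call allow
    (0x11EC, 0x00, 0x01),  # runExternal = 0 -> 1
    (0x2013, 0x6C, 0x61),  # "-l" -> "-a"
]


def apply_patches(data, patches=PATCHES):
    """Return a patched copy of data, refusing to touch unexpected bytes."""
    patched = bytearray(data)
    for offset, old, new in patches:
        if patched[offset] != old:
            raise ValueError(f"unexpected byte at {offset:#x}: {patched[offset]:#04x}")
        patched[offset] = new
    return bytes(patched)


def patch_file(src="a.out", dst="a2.out"):
    """Read src, write the patched copy to dst and mark it executable."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(apply_patches(data))
    os.chmod(dst, os.stat(dst).st_mode | stat.S_IXUSR)
```

Calling patch_file() in the directory that holds a.out would produce a2.out in one step. Checking the old byte first guards against patching a binary that was compiled differently.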
Run
Well, I can tell you that it actually works, but it's better to try it yourself. The output is:
Before
Deny
After
Allow
. .. a.out a2.out
Tools
Let's review some tools that tell us about the file from the outside.
file
A utility that gives the file name, file type and other format-related information.
sum
Gets the checksum and number of blocks in the file. Once we do some reverse engineering, this output will tell us that the new executable is not the original.
ldd
Gives the list of shared objects required by the executable
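The point made under sum, that any edit changes the checksum, can be illustrated from Python as well. Note that this sketch swaps in SHA-256 from hashlib rather than reproducing the algorithm sum uses, and it compares two short byte strings taken from the dump instead of the real files.

```python
import hashlib


def digest(data):
    """Hex SHA-256 digest of a bytes object (not the algorithm `sum` uses)."""
    return hashlib.sha256(data).hexdigest()


original = bytes.fromhex("7300 2d6c 0000 0000")  # "s\0-l\0..." before the edit
patched = bytes.fromhex("7300 2d61 0000 0000")   # "s\0-a\0..." after the edit

print(digest(original) == digest(patched))  # False: a one-byte change is detected
```

For the actual executables, the same digest function could be applied to the bytes read from a.out and a2.out.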
There are some utilities that give a quick peek into the executable if in-depth examination is not something you need.
strings
Displays all printable strings in the file. It works on any file, in fact, not just executables.
nm
Lists all the symbols present in the executable file address map.
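To get a feel for what strings does, here is a rough approximation in Python. It is a sketch, not the real implementation: printable_strings is a made-up name, only plain ASCII runs are considered, and the default minimum length of 4 mirrors the default of the strings utility.

```python
import re


def printable_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII characters found in data."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


# The start of the read-only data from the hexdump earlier in the post.
rodata = bytes.fromhex("0100 0200 416c 6c6f 7700 4465 6e79 006c 7300 2d6c 00")

print(printable_strings(rodata))             # ['Allow', 'Deny']
print(printable_strings(rodata, min_len=2))  # ['Allow', 'Deny', 'ls', '-l']
```

With the default length, short strings such as ls and -l are skipped, which is worth remembering when a string you expect does not show up.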
Now comes the in-depth analysis of the executable. This includes interpreting the machine code into human-readable form and also figuring out a way to edit the file.
objdump
Using the -d option you get a disassembly of the executable sections, showing the raw bytes alongside the interpreted assembly instructions.
xxd or hexdump
These are plain tools for dealing with binary files in general, not just executables. The reading part creates a text file showing the hexadecimal value of each byte, with a printable version side by side where possible. With xxd, any changes to this output text file can be fed back to the tool using the -r option, which then creates a binary file.
I am using xxd for reading and writing the executable here.
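The dump-edit-restore round trip can be mimicked for a single row in a few lines of Python. This is only a sketch of the idea: dump_line and parse_line are hypothetical names, and the layout is an approximation of the default xxd row format, not a replacement for the tool.

```python
def dump_line(offset, chunk):
    """Format up to 16 bytes roughly like one row of xxd output."""
    hex_digits = chunk.hex()
    groups = " ".join(hex_digits[i:i + 4] for i in range(0, len(hex_digits), 4))
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    return f"{offset:08x}: {groups}  {text}"


def parse_line(line):
    """Turn a row back into (offset, bytes); the ASCII column is ignored."""
    addr, rest = line.split(": ", 1)
    hex_part = rest.split("  ", 1)[0]
    return int(addr, 16), bytes.fromhex(hex_part.replace(" ", ""))


row = bytes.fromhex("73002d6c00000000011b033b54000000")
line = dump_line(0x2010, row)
print(line)  # 00002010: 7300 2d6c 0000 0000 011b 033b 5400 0000  s.-l.......;T...

edited = line.replace("2d6c", "2d61", 1)  # the "-l" -> "-a" edit from earlier
print(parse_line(edited)[1][:4])          # b's\x00-a'
```

Only the hex column is read back here, so the printable column on the right does not need to be updated when editing.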
Feel free to send me an email for any suggestion or feedback. Follow me on twitter and github.