Challenge RE #6

Introduction

The assembly code (x86-64, Intel syntax) is as follows:

<f>:
   0:             push   rbp
   1:             mov    rbp,rsp
   4:             mov    QWORD PTR [rbp-0x8],rdi
   8:             mov    QWORD PTR [rbp-0x10],rsi
   c:             mov    rax,QWORD PTR [rbp-0x8]
  10:             movzx  eax,BYTE PTR [rax]
  13:             movsx  dx,al
  17:             mov    rax,QWORD PTR [rbp-0x10]
  1b:             mov    WORD PTR [rax],dx
  1e:             mov    rax,QWORD PTR [rbp-0x10]
  22:             movzx  eax,WORD PTR [rax]
  25:             test   ax,ax
  28:             jne    2c 
  2a:             jmp    38 
  2c:             add    QWORD PTR [rbp-0x8],0x1
  31:             add    QWORD PTR [rbp-0x10],0x2
  36:             jmp    c 
  38:             pop    rbp
  39:             ret

Analysis

Function signature

The first thing to figure out is the signature of f. It takes two arguments of 8 bytes each. Under the System V AMD64 calling convention they arrive in the rdi and rsi registers, and the prologue copies them into the stack frame at [rbp-0x8] and [rbp-0x10]. This can be seen in these instructions:

<f>:
    push   rbp
    mov    rbp,rsp
    mov    QWORD PTR [rbp-0x8],rdi
    mov    QWORD PTR [rbp-0x10],rsi
    ;; ...

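Next, look at how these two values are used. The pointer saved at [rbp-0x8] is loaded into rax and dereferenced with BYTE PTR [rax], so the first argument points to single bytes: most likely a char *, a C string. The pointer saved at [rbp-0x10] is dereferenced with WORD PTR [rax] and advanced by 2 on every iteration, so the second argument points to 16-bit values, for example a short * (or another 16-bit type such as unsigned short). That gives the signature:

void f(char *str1, short *str2)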

I put void as the return type as a placeholder for now. Nothing in the listing loads a dedicated return value into eax before ret (eax only holds the last value tested by the loop), so void turns out to be a reasonable guess.
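The operand sizes in the listing can be checked against C types. Both arguments are saved as QWORDs (8 bytes), the first pointer is read through BYTE PTR, and the second is accessed through WORD PTR. On an LP64 target such as x86-64 Linux, those sizes fit char * for the first argument and a pointer to a 16-bit type like short * for the second. As a quick sanity check, and assuming such a target, these static assertions only compile when those sizes hold:

```c
/* Sketch assuming an LP64 target such as x86-64 Linux (GCC or Clang). */
_Static_assert(sizeof(char *) == 8,  "QWORD PTR [rbp-0x8] holds an 8-byte pointer");
_Static_assert(sizeof(short *) == 8, "QWORD PTR [rbp-0x10] holds an 8-byte pointer");
_Static_assert(sizeof(short) == 2,   "WORD PTR [rax] accesses 2 bytes");

/* Prototype implied by those operand sizes. */
void f(char *str1, short *str2);
```

The BYTE read through the first pointer needs no check, since sizeof(char) is 1 by definition.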

Main logic

The main loop starts at offset c. The next three instructions

   c:             mov    rax,QWORD PTR [rbp-0x8]
  10:             movzx  eax,BYTE PTR [rax]
  13:             movsx  dx,al

load one byte (one character) from the first string and sign-extend it to a 16-bit value in dx. That 16-bit value is then stored at the current position of the second argument with a 2-byte (WORD) store, and immediately read back from memory so it can be tested:

  17:             mov    rax,QWORD PTR [rbp-0x10]
  1b:             mov    WORD PTR [rax],dx
  1e:             mov    rax,QWORD PTR [rbp-0x10]
  22:             movzx  eax,WORD PTR [rax]
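The movsx dx,al at offset 13 is a sign extension: the loaded byte is treated as signed and its top bit is copied into the upper 8 bits of the 16-bit result. In C, an implicit conversion from signed char to short does exactly that. A minimal sketch contrasting it with zero extension (the function names widen and widen_unsigned are hypothetical, chosen for illustration):

```c
/* Sign extension, as done by `movsx dx,al`:
 * the top bit of c is replicated into the high byte. */
short widen(signed char c)
{
    return c;   /* implicit signed char -> short conversion */
}

/* Zero extension, for comparison (what a movzx to 16 bits would do):
 * the high byte is always 0x00. */
unsigned short widen_unsigned(unsigned char c)
{
    return c;
}
```

With these, widen('A') is 0x0041, while widen((signed char)0xE9) is -23, whose 16-bit pattern is 0xFFE9; widen_unsigned(0xE9) gives 0x00E9.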

Next, test ax,ax checks the value that was just stored. If it is 0, the \0 terminator, jne is not taken and jmp 38 leaves the loop, so the function returns. Note that the terminator has already been written to str2 at that point. Otherwise execution continues at 2c, which advances the str1 pointer by 1 byte and the str2 pointer by 2 bytes (one 16-bit element), and then jumps back to the start of the loop at c.
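The two add instructions at 2c and 31 step the saved pointers by different amounts because C pointer arithmetic is scaled by the size of the pointed-to type: p + 1 moves 1 byte for a char * and 2 bytes for a short *. A small sketch that measures this (char_step_bytes and short_step_bytes are hypothetical helper names):

```c
#include <stddef.h>

/* Distance in bytes covered by one increment of a char pointer,
 * the C counterpart of `add QWORD PTR [rbp-0x8],0x1`. */
ptrdiff_t char_step_bytes(void)
{
    char buf[2] = {0};
    char *p = buf;
    char *next = p + 1;
    return (char *)next - (char *)p;
}

/* Same for a short pointer, the counterpart of
 * `add QWORD PTR [rbp-0x10],0x2`. */
ptrdiff_t short_step_bytes(void)
{
    short buf[2] = {0};
    short *p = buf;
    short *next = p + 1;
    return (char *)next - (char *)p;
}
```

On x86-64 these return 1 and 2, matching the constants in the listing.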

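Even before writing any C, the idea is clear: f copies str1 into str2, but each character occupies two bytes in the destination. The second byte of each pair is not skipped, though: the 16-bit store writes the character into the low byte and 0x00 into the high byte (for characters below 0x80; bytes from 0x80 up are sign-extended, so their high byte becomes 0xFF). On little-endian x86-64 the low byte comes first in memory. For example, having:

str1 = ABCD\0
str2 = 1111111111

# After f returns (bytes 41 00 42 00 43 00 44 00 00 00)

str2 = A\0B\0C\0D\0\0\0

Because the terminator is copied too, str2 must have room for 2 * (strlen(str1) + 1) bytes, 10 in this example.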
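The 0x00 after each character comes from the 16-bit store itself, since x86-64 writes the low byte first. This sketch mimics `mov WORD PTR [rax],dx` with memcpy, assuming a little-endian target; store_word is a hypothetical helper:

```c
#include <string.h>

/* Writes the 16-bit value v to mem, low byte first on a
 * little-endian target, like `mov WORD PTR [rax],dx`. */
void store_word(unsigned char *mem, short v)
{
    memcpy(mem, &v, sizeof v);   /* copies exactly sizeof(short) == 2 bytes */
}
```

Storing 'A' into a buffer of '1' bytes leaves 41 00 followed by the untouched '1' bytes, which the next iteration of the loop then overwrites.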

Let's write the code in C:

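void f(char *str1, short *str2)
{
    int i = 0;
    for (;;) {
        str2[i] = str1[i];   /* 16-bit store of the sign-extended char */
        if (str2[i] == 0)    /* test comes after the store: \0 is copied too */
            break;
        i++;                 /* 1 byte further in str1, 2 bytes in str2 */
    }
}

This can be written more compactly as:

void f(char *str1, short *str2)
{
    /* store first, then test, mirroring the loop at c */
    while ((*str2 = *str1) != 0) {
        str1++;              /* add QWORD PTR [rbp-0x8],0x1 */
        str2++;              /* add QWORD PTR [rbp-0x10],0x2, since sizeof(short) == 2 */
    }
}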
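To check the reconstruction, here is a small self-contained harness. It is a sketch: it assumes an x86-64 target where plain char is signed (matching the movsx), and the helper slots_written and the SENTINEL value are hypothetical. It pre-fills the destination with a value f can never produce, runs f, and counts how many 16-bit slots were overwritten.

```c
#include <stddef.h>

/* f as reconstructed from the listing: each char of str1 is widened
 * (sign-extended on x86-64, where plain char is signed) and stored as a
 * 16-bit value in str2; the store happens before the test, so the
 * terminating 0 is copied as well. */
void f(char *str1, short *str2)
{
    while ((*str2 = *str1) != 0) {
        str1++;
        str2++;
    }
}

/* Hypothetical marker value. f can never store it: a widened char
 * always has a high byte of 0x00 or 0xFF. */
#define SENTINEL ((short)0x7777)

/* Hypothetical test helper: fills dst[0..cap) with SENTINEL, runs f, and
 * returns how many leading 16-bit slots were overwritten. The caller must
 * guarantee cap >= strlen(src) + 1, otherwise f writes past the buffer. */
size_t slots_written(char *src, short *dst, size_t cap)
{
    size_t i, n = 0;

    for (i = 0; i < cap; i++)
        dst[i] = SENTINEL;
    f(src, dst);
    while (n < cap && dst[n] != SENTINEL)
        n++;
    return n;
}
```

For "ABCD" it reports 5 slots: four characters plus the terminator, which is why the destination needs strlen(str1) + 1 elements.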

Notes

This was quite easy compared to the other ones.

DEV Community