Gealber Morales

Posted on Sep 28, 2023 • Edited on Oct 7, 2023 • Originally published at gealber.com

Challenge RE #22

#reverseengineering #assembly #c

In this article let's dive into the #22 RE Challenge. As in previous challenges we have assembly code examples that we have to understand and infer what it's doing. Sometimes this process it's straightforward, if you understand assembly language, of course, other times it's hard as fuck like in Challenge #19...I haven't forgotten this baby, going back to you another day. Enough of dirty talk with programming challenges Gealber. Back to our goal here, the assembly code can be found in challenge.re, take a look there's a fair amount of challenges there.

Note:

I know we programmers like a lot the Ctr+C and Ctr+V combination keys, but I don't recommend you to do so with this code here. The main point of these challenges is to understand what this code does, but that doesn't mean that all the code here runs smoothly.

Analysis

The first two things that you should identify in this code, we have two functions here, f1 and f2. Given that f1 has calls to f2, we can assume that this is our main entry point. Let's start with this.

;; SIGNATURE OF F1
;; ARGS: rdi, esi, edx
;; void f1(int *arr,int start,int end);
f1:
        push    r13
        push    r12
        ;; r12d
        mov     r12d, edx
        push    rbp
        ;; rbp
        mov     rbp, rdi
        push    rbx
        ;; ebx
        mov     ebx, esi
        push    rcx

.L12:
        ;; ebx >= r12d (two numbers, then)
        ;; start >= end
        cmp     ebx, r12d
        jge     .L10

        ;; copy back original parameters
        ;; to their original registers
        ;; to pass them as argument for f2
        ;; from this we know that rdi is an array of integers as well
        ;; f2(arr, start, end)
        mov     esi, ebx
        mov     edx, r12d
        mov     rdi, rbp
        call    f2

        ;; end arg
        lea     edx, [rax-1]
        mov     r13d, eax
        ;; start arg
        mov     esi, ebx
        ;; arr
        mov     rdi, rbp
        lea     ebx, [r13+1]

        call    f1

        jmp     .L12
.L10:
        pop     rax
        pop     rbx
        pop     rbp
        pop     r12
        pop     r13
        ret

Looking at this code, you can notice first that f1 calls itself, so it's a recursive function. Also, we have a loop which stops condition it's that ebx >= r12d. Putting this into C code, we have


void f1(int *arr, int start, int end)
{
    while (start < end) {
        end = f2(arr, start, end) - 1;
        f1(arr, start, end);
    }

    return;
}

I renamed the parameters to something meaningful. This is one of the most painful things of assembly, you have just registers with weird names :).

Now f2, this is the tricky one here I won't explain the whole code only the most tricky part. The signature we already know it from f1, we will receive an array of ints with and two ints.

xor     r11d, ebx
;; arr[i+1] = r11d;
mov     DWORD PTR [rcx+4+r8], r11d
;; r11d = arr[j-1] ^ arr[i+1] ^ arr[j-1] = arr[i+1];
xor     r11d, DWORD PTR [r9]
;; arr[j-1] = arr[i+1]
mov     DWORD PTR [r9], r11d
;; arr[i+1] ^= arr[i+1];
xor     DWORD PTR [rcx+4+r8], r11d

This was a real pain because if it was with normal variables you would know what it does. Look this is the equivalent of normal C code

a ^= b
b ^= a
a ^= b

😅 not so obvious either, but if you take into consideration that XOR it's a commutative operation you can figure out that this is a swap of variables, but without using a temporary variable. Actually, it's called XOR Swap, please don't use this to swap your variables. First, this is not intuitive, and also when you provide equal variables, you will end up with values 0 on both variables. Even if you put an if statement to discard this case, will run slower than the normal temporary approach.

This is the most tricky part of this function and the two loops it has inside as well. I managed to get this into C code, would be something like this

int f2(int *arr, int start, int end)
{
    int i = start, j = end + 1, esi = start;
    while (1) {
        esi++;
        if ((esi > end) || (arr[i+1] > arr[start])) {
            while (arr[j-1] > arr[start]) {
                j--;
            }

            if (j <= esi)
                break;

            swap(arr, i+1, j);
        }

        i++;
    }

    swap(arr, start, j-1);

    return j;
}

Note:

Let me put this here again. This code it's my literal translation of the assembly code, so I might be wrong, please don't copy this as an implementation of the algorithm I think it is.

When you combine these two functions we have what seems to be a recursive sorting algorithm. If you take a look at f2 you can notice that it's quite similar to the partition process in quick sort algorithm. This is my guess.

Formal description