Challenge 5

1. Original assembly
2. Assembly notes
3. C translation

1. Original assembly

Original challenge link: Click me.

This is the original function, adapted to fit the nasm syntax.

 1: func:
 2:     cmp     rcx, rsi
 3:     ja      .L10
 4:     sub     rsi, rcx
 5:     add     rsi, 1
 6:     mov     r11, rsi
 7:     je      .L10
 8:     test    rcx, rcx
 9:     jne     .L16
10:     mov     rax, rdi
11:     ret
12: .L10:
13:     xor     eax, eax
14:     ret
15: .L16:
16:     push    rbx
17:     xor     r10d, r10d
18:     mov     r9d, 1
19: .L4:
20:     lea     rax, [rdi + r10]
21:     xor     esi, esi
22:     xor     r8d, r8d
23: .L8:
24:     movzx   ebx, BYTE [rdx + rsi]
25:     cmp     BYTE [rax + rsi], bl
26:     cmovne  r8d, r9d
27:     add     rsi, 1
28:     cmp     rsi, rcx
29:     jne     .L8
30:     test    r8d, r8d
31:     je      .L12
32:     add     r10, 1
33:     cmp     r10, r11
34:     jne     .L4
35:     xor     eax, eax
36: .L12:
37:     pop     rbx
38:     ret

2. Assembly notes

Here are some important notes before converting the code to a higher level language.

2.1. First checks

As you can see, the first thing the function does is check if rcx is greater than rsi. If it is, it returns zero. Otherwise, it subtracts rcx from rsi, adds one to rsi, and uses another je instruction to branch. There are some key concepts you need to know to understand this:

The je and jz instructions are the same. The opcode for both je rel32 and jz rel32 is 0F 84 cd.
Both of these instructions just check the ZF bit inside the EFLAGS register¹.
The cmp instruction just subtracts the second operand from the first operand, but setting the flags without saving the result in the first operand.
This flag is not only changed by the cmp instruction, but also by other instructions like add, sub, etc.

So in this case, it’s checking if rsi minus rcx plus one is zero or not. In other words, if rcx is equal to rsi+1. If it is, it returns zero.

2.2. Function parameters

We can see that the function has 4 parameters because registers rdi, rsi, rdx and rcx are being used. See challenge 3 for more information on the AMD64 System V ABI.

Another important thing to note is how we know the types of the function parameters. We know the first argument (rdi) is a pointer because right after .L4 it’s being used for indexing. We can determine the type of the pointer because:

It’s not multiplying the index by the size of the elements (e.g. i*4 for arrays of DWORDs), so we can asume the size of each element is one byte.
Right after .L8, you can see it’s moving and comparing byte pointers, so it’s not hard to see that these are char*.

2.3. Function return type

We can tell that the function returns a char* because at line 31 of the assembly code, it’s returning without changing the value of rax, and the only place it’s modified is at line 20, where it loads the address of a character inside the string.

The other place where we don’t return zero is at line 10, where we return the first parameter, which as we know is a char*.

2.4. Constructing the `for` loops

We can see that the labels .L4 and .L8 are being used as loops because of the conditional jumps at lines 29 and 34 of the assembly code. We can get the loop conditions right before the jumps.

For the outer loop, the condition is:

while (r10 != r11) {
    // ...
}

And for the inner loop, the condition is:

while (rsi != rcx) {
    // ...
}

We can tell that they are actually for loops because:

One of the variables used for the loop’s condition is initialized before the loop starts (before the label used for jumping). In the first loop, the r10 register is initialized to zero (line 17), and in the inner loop, the rsi register is also initialized to zero (line 21).
Right before checking the conditions, the loops increment those same variables (rsi on line 27 and r10 on line 32).

In my opinion, the best approach when trying to understand a label’s purpose is to look at the places where those labels are being used, and get the loop/conditional information directly from there.

Also note that the rsi variable is now being used as the iterator for the inner loop, instead of holding the old result of subtracting the arguments. This old value was stored in the r11 register on line 6. It’s important to know where the meaning/usage of the registers change.

2.5. Conditional move

After .L8, on line 25, it compares rax[rsi] with rdx[rsi], and moves r9d to r8d if they are not equal. To do this, it uses the cmovne instruction:

CMOVcc — Conditional Move

CMOVNE r32, r/m32: Move if not equal (ZF = 0).

Each of the CMOVcc instructions performs a move operation if the status flags in the EFLAGS register (CF, OF, PF, SF, and ZF) are in a specified state (or condition). […] If the condition is not satisfied, a move is not performed and execution continues with the instruction following the CMOVcc instruction.

3. C translation

From the previous challenge, I learned that it’s a good approach to first convert the assembly code into a higher level language like C, and try to figure out what the function does from there.

#include <stdint.h>
#include <stddef.h>

char* func(char* rdi, int rsi, char* rdx, int rcx) {
    if (rcx > rsi)
        /* .L10 */
        return NULL;

    rsi     = (rsi - rcx) + 1;
    int r11 = rsi;

    if (rsi == 0)
        /* .L10 */
        return NULL;

    if (rcx == 0)
        return rdi;

    /* .L16 */
    const int r9 = 1;

    for (int r10 = 0; r10 != r11; r10++) {
        /* .L4 */
        char* rax = &rdi[r10];

        int r8 = 0;
        for (int rsi = 0; rsi != rcx; rsi++) {
            /* .L8 */
            char ebx = rdx[rsi];

            if (rax[rsi] != ebx)
                r8 = r9;
        }

        if (r8 == 0)
            return rax;
    }

    return NULL;
}

After changing some variable names and simplifying:

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

char* func(char* str1, int n1, char* str2, int n2) {
    if (n2 > n1)
        /* .L10 */
        return NULL;

    n1 = (n1 - n2) + 1;

    /* NOTE: `r11` is just used to store the original `rsi` (n1), since `rsi`
     * it's going to be used as iterator for the second loop (j). Since I
     create
     * a `j` variable, I don't need this aux variable. */

    if (n1 == 0)
        /* .L10 */
        return NULL;

    if (n2 == 0)
        return str1;

    /* .L16 */
    /* NOTE: r9 is only used once and it's value (true) never changes */

    for (int i = 0; i != n1; i++) {
        /* .L4 */
        char* substring = &str1[i];

        bool failed = false;
        for (int j = 0; j != n2; j++) {
            /* .L8 */
            if (substring[j] != str2[j])
                failed = true;
        }

        if (!failed)
            return substring;
    }

    return NULL;
}

char* result = func("Hello, world!", 13, "world", 5);
printf("%p\n\"%s\"\n", result, result); /* "world!" */

As we can see, the function is used to return the first match of str2 inside str1. You also need to provide the lengths of the strings.

Note that the n1 and n2 parameters are the lengths of the strings. In the example we pass 5 instead of 6 (sizeof("world")) so "world\0" matches "world!".

Footnotes:

See Intel SDM, Vol. 1, Chapter 3.4.3 EFLAGS Register and Vol. 1, Chapter 5.1.7 Control Transfer Instructions.