Challenge 3

1. Original assembly
2. Analysis
3. C version
4. More information

1. Original assembly

Original challenge link: Click me.

This array of 32-bit integers is available to the function.

int32_t v[64] = { -1, 31, 8,  30, -1, 7,  -1, -1, 29, -1, 26, 6,  -1, -1, 2,  -1,
                  -1, 28, -1, -1, -1, 19, 25, -1, 5,  -1, 17, -1, 23, 14, 1,  -1,
                  9,  -1, -1, -1, 27, -1, 3,  -1, -1, -1, 20, -1, 18, 24, 15, 10,
                  -1, -1, 4,  -1, 21, -1, 16, 11, -1, 22, -1, 12, 13, -1, 0,  -1 };

This is the original function, adapted to fit the nasm syntax.

f:
    mov     edx, edi
    shr     edx, 1
    or      edx, edi
    mov     eax, edx
    shr     eax, 2
    or      eax, edx
    mov     edx, eax
    shr     edx, 4
    or      edx, eax
    mov     eax, edx
    shr     eax, 8
    or      eax, edx
    mov     edx, eax
    shr     edx, 16
    or      edx, eax
    imul    eax, edx, 79355661 ; 0x4BADF0D
    shr     eax, 26
    mov     eax, v[0 + rax * 4]
    ret

2. Analysis

First of all, it loads the function argument into edx.

mov     edx, edi

Keep in mind that this is a 64-bit program, so the x86-64 calling convention is different from the i386 calling convention:

AMD64 System V ABI - 3.2.3 Parameter Passing

After the argument values have been computed, they are placed either in registers or pushed on the stack. […]

Once arguments are classified, the registers get assigned (in left-to-right order) for passing as follows:

If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used.

After that, it shifts the value in the edx register (the function argument) 1 bit to the right, and OR’s it with the original unshifted value from edi.

shr     edx, 1
or      edx, edi

Next, it moves the OR’d value to eax, and shifts it again 2 bits to the right. Then it OR’s them back into eax.

mov     eax, edx
shr     eax, 2
or      eax, edx

It repeats that same process, duplicating the number of bits shifted every time:

mov     edx, eax
shr     edx, 4
or      edx, eax

mov     eax, edx
shr     eax, 8
or      eax, edx

mov     edx, eax
shr     edx, 16
or      edx, eax

Let’s have a look at what this is doing to the bits of our input:

Shifted	Hex value	Binary value
0	`0x80000000`	`0b10000000000000000000000000000000`
1	`0xC0000000`	`0b11000000000000000000000000000000`
2	`0xF0000000`	`0b11110000000000000000000000000000`
4	`0xFF000000`	`0b11111111000000000000000000000000`
8	`0xFFFF0000`	`0b11111111111111110000000000000000`
16	`0xFFFFFFFF`	`0b11111111111111111111111111111111`

Each value shows the number after shifting and ORing it with the previous value. The first value would be our input, but as you can imagine, the result ends up being the same even if there is more than one bit set on the input value. The only thing that matters from the input is the last set bit, since every bit from there will be set.

Then, it performs a signed multiplication on the last value (currently stored in edx, in our example 0xFFFFFFFF) and 79355661 (0x4BADF0D), and stores it in eax.

From felixcloutier.com:

IMUL — Signed Multiply

Performs a signed multiplication of two operands. This instruction has three forms, depending on the number of operands:

Three-operand form: This form requires a destination operand (the first operand) and two source operands (the second and the third operands). Here, the first source operand […] is multiplied by the second source operand (an immediate value). The intermediate product […] is truncated and stored in the destination operand (a general-purpose register).

Then, it shifts the eax register 26 bits to the right.

imul    eax, edx, 79355661 ; 0x4BADF0D
shr     eax, 26

The value of eax after the multiplication would be 0xFB4520F3, which would be 0x3E (62) after shifting it 26 bits to the left.

It then uses the value stored in the eax register as an index for the v array. To access the index, we multiply the position by the size of each element. In this case, 4 bytes, since the exercise specifies that they are 32-bit integers.

mov     eax, v[0 + rax * 4]
ret

So, what’s the item at position 62 in our array? Turns out it’s zero.

Let’s try to use another number to see how it changes:

Initial number:            0x000F0055  0b00000000000011110000000001010101
After shifting and ORing:  0x000FFFFF  0b00000000000011111111111111111111
After the multiplication:  0xEC1520F3  0b11101100000101010010000011110011
After the last shift:      59
Number at that index:      12

It looks like it’s calculating the position of the first set bit starting from the left (starting from zero). Or, in other words, the number of cleared bits after the first Shift + OR step.

For example, for 0x80000000 (0b10000000000000000000000000000000), the first set bit is the 32nd bit from the right, so there are 0 spaces left. For 0x000F0055 (0b00000000000011110000000001010101), it would be the 20th bit from the right, so there are 12 spaces left.

Finally, we could say that this is an algorithm for counting the leading zero bits of a double word. See More information.

3. C version

I made a similar version, but it’s not exactly a translation since I am using a for loop instead of copying and pasting the Shift + OR code 5 times.

#include <stdint.h>
#include <stdio.h>

static int32_t v[64] = { -1, 31, 8,  30, -1, 7,  -1, -1, 29, -1, 26, 6,  -1,
                         -1, 2,  -1, -1, 28, -1, -1, -1, 19, 25, -1, 5,  -1,
                         17, -1, 23, 14, 1,  -1, 9,  -1, -1, -1, 27, -1, 3,
                         -1, -1, -1, 20, -1, 18, 24, 15, 10, -1, -1, 4,  -1,
                         21, -1, 16, 11, -1, 22, -1, 12, 13, -1, 0,  -1 };

static int func(uint32_t num) {
    uint32_t tmp = num;

    for (int i = 1; i <= 16; i *= 2) {
        tmp >>= i;
        num |= tmp;
        tmp = num;
    }

    tmp *= 0x4BADF0D;
    tmp >>= 26;
    return v[tmp];
}

#define PRINT_EXPR(E) printf("%s -> %d\n", #E, E)

int main(void) {
    PRINT_EXPR(func(0x80000000)); /* 0 */
    PRINT_EXPR(func(0x000F0055)); /* 12 */
    return 0;
}

4. More information

After emailing Dennis Yurichev, the owner of the challenges.re website, I discovered that this called a find first set algorithm. From Wikipedia:

In computer software and hardware, find first set (ffs) or find first one is a bit operation that, given an unsigned machine word, designates the index or position of the least significant bit set to one in the word counting from the least significant bit position.

From Dennis’ math notes:

9.5.3 Leading zero bits counting

For example, 0x100 becomes 0x1ff, 0x1000 becomes 0x1fff, 0x20000 becomes 0x3ffff, 0x12340000 becomes 0x1fffffff. It works because all 1 bits are gradually propagated towards the lowest bit in 32-bit number, while zero bits at the left of most significant 1 bit are not touched.

It’s possible to add 1 to resulting number, so it will becomes 0x2000 or 0x20000000, but in fact, since multiplication by magic number is used, these numbers are very close to each other, so there is no error.

[…]

The magic number I found using just brute-force, so the readers will not be able to google it, for the sake of exercise.

The code is tricky after all, and the moral of the exercise is that practicing reverse engineer sometimes may just observe input/outputs to understand code’s behavior instead of diving into it.

Challenge 3

Table of Contents

1. Original assembly

2. Analysis

3. C version

4. More information