Challenge 3
Table of Contents
1. Original assembly
Original challenge link: Click me.
This array of 32-bit integers is available to the function.
int32_t v[64] = { -1, 31, 8, 30, -1, 7, -1, -1, 29, -1, 26, 6, -1, -1, 2, -1, -1, 28, -1, -1, -1, 19, 25, -1, 5, -1, 17, -1, 23, 14, 1, -1, 9, -1, -1, -1, 27, -1, 3, -1, -1, -1, 20, -1, 18, 24, 15, 10, -1, -1, 4, -1, 21, -1, 16, 11, -1, 22, -1, 12, 13, -1, 0, -1 };
This is the original function, adapted to fit the nasm syntax.
f: mov edx, edi shr edx, 1 or edx, edi mov eax, edx shr eax, 2 or eax, edx mov edx, eax shr edx, 4 or edx, eax mov eax, edx shr eax, 8 or eax, edx mov edx, eax shr edx, 16 or edx, eax imul eax, edx, 79355661 ; 0x4BADF0D shr eax, 26 mov eax, v[0 + rax * 4] ret
2. Analysis
First of all, it loads the function argument into edx
.
mov edx, edi
Keep in mind that this is a 64-bit program, so the x86-64 calling convention is different from the i386 calling convention:
AMD64 System V ABI - 3.2.3 Parameter Passing
After the argument values have been computed, they are placed either in registers or pushed on the stack. […]
Once arguments are classified, the registers get assigned (in left-to-right order) for passing as follows:
- If the class is INTEGER, the next available register of the sequence
%rdi
,%rsi
,%rdx
,%rcx
,%r8
and%r9
is used.
After that, it shifts the value in the edx
register (the function argument) 1
bit to the right, and OR’s it with the original unshifted value from edi
.
shr edx, 1 or edx, edi
Next, it moves the OR’d value to eax
, and shifts it again 2 bits to the
right. Then it OR’s them back into eax
.
mov eax, edx shr eax, 2 or eax, edx
It repeats that same process, duplicating the number of bits shifted every time:
mov edx, eax shr edx, 4 or edx, eax mov eax, edx shr eax, 8 or eax, edx mov edx, eax shr edx, 16 or edx, eax
Let’s have a look at what this is doing to the bits of our input:
Shifted | Hex value | Binary value |
---|---|---|
0 | 0x80000000 |
0b10000000000000000000000000000000 |
1 | 0xC0000000 |
0b11000000000000000000000000000000 |
2 | 0xF0000000 |
0b11110000000000000000000000000000 |
4 | 0xFF000000 |
0b11111111000000000000000000000000 |
8 | 0xFFFF0000 |
0b11111111111111110000000000000000 |
16 | 0xFFFFFFFF |
0b11111111111111111111111111111111 |
Each value shows the number after shifting and ORing it with the previous value. The first value would be our input, but as you can imagine, the result ends up being the same even if there is more than one bit set on the input value. The only thing that matters from the input is the last set bit, since every bit from there will be set.
Then, it performs a signed multiplication on the last value (currently stored
in edx
, in our example 0xFFFFFFFF
) and 79355661 (0x4BADF0D
), and stores it in
eax
.
From felixcloutier.com:
IMUL — Signed Multiply
Performs a signed multiplication of two operands. This instruction has three forms, depending on the number of operands:
- Three-operand form: This form requires a destination operand (the first operand) and two source operands (the second and the third operands). Here, the first source operand […] is multiplied by the second source operand (an immediate value). The intermediate product […] is truncated and stored in the destination operand (a general-purpose register).
Then, it shifts the eax
register 26 bits to the right.
imul eax, edx, 79355661 ; 0x4BADF0D shr eax, 26
The value of eax
after the multiplication would be 0xFB4520F3
, which would be
0x3E
(62) after shifting it 26 bits to the left.
It then uses the value stored in the eax
register as an index for the v
array. To access the index, we multiply the position by the size of each
element. In this case, 4 bytes, since the exercise specifies that they are
32-bit integers.
mov eax, v[0 + rax * 4] ret
So, what’s the item at position 62 in our array? Turns out it’s zero.
Let’s try to use another number to see how it changes:
Initial number: 0x000F0055 0b00000000000011110000000001010101 After shifting and ORing: 0x000FFFFF 0b00000000000011111111111111111111 After the multiplication: 0xEC1520F3 0b11101100000101010010000011110011 After the last shift: 59 Number at that index: 12
It looks like it’s calculating the position of the first set bit starting from the left (starting from zero). Or, in other words, the number of cleared bits after the first Shift + OR step.
For example, for 0x80000000
(0b10000000000000000000000000000000
), the first set
bit is the 32nd bit from the right, so there are 0 spaces left. For 0x000F0055
(0b00000000000011110000000001010101
), it would be the 20th bit from the right,
so there are 12 spaces left.
Finally, we could say that this is an algorithm for counting the leading zero bits of a double word. See More information.
3. C version
I made a similar version, but it’s not exactly a translation since I am using a
for
loop instead of copying and pasting the Shift + OR code 5 times.
#include <stdint.h> #include <stdio.h> static int32_t v[64] = { -1, 31, 8, 30, -1, 7, -1, -1, 29, -1, 26, 6, -1, -1, 2, -1, -1, 28, -1, -1, -1, 19, 25, -1, 5, -1, 17, -1, 23, 14, 1, -1, 9, -1, -1, -1, 27, -1, 3, -1, -1, -1, 20, -1, 18, 24, 15, 10, -1, -1, 4, -1, 21, -1, 16, 11, -1, 22, -1, 12, 13, -1, 0, -1 }; static int func(uint32_t num) { uint32_t tmp = num; for (int i = 1; i <= 16; i *= 2) { tmp >>= i; num |= tmp; tmp = num; } tmp *= 0x4BADF0D; tmp >>= 26; return v[tmp]; } #define PRINT_EXPR(E) printf("%s -> %d\n", #E, E) int main(void) { PRINT_EXPR(func(0x80000000)); /* 0 */ PRINT_EXPR(func(0x000F0055)); /* 12 */ return 0; }
4. More information
After emailing Dennis Yurichev, the owner of the challenges.re website, I discovered that this called a find first set algorithm. From Wikipedia:
In computer software and hardware, find first set (ffs) or find first one is a bit operation that, given an unsigned machine word, designates the index or position of the least significant bit set to one in the word counting from the least significant bit position.
From Dennis’ math notes:
9.5.3 Leading zero bits counting
For example,
0x100
becomes0x1ff
,0x1000
becomes0x1fff
,0x20000
becomes0x3ffff
,0x12340000
becomes0x1fffffff
. It works because all 1 bits are gradually propagated towards the lowest bit in 32-bit number, while zero bits at the left of most significant 1 bit are not touched.It’s possible to add 1 to resulting number, so it will becomes 0x2000 or 0x20000000, but in fact, since multiplication by magic number is used, these numbers are very close to each other, so there is no error.
[…]
The magic number I found using just brute-force, so the readers will not be able to google it, for the sake of exercise.
The code is tricky after all, and the moral of the exercise is that practicing reverse engineer sometimes may just observe input/outputs to understand code’s behavior instead of diving into it.