Challenge 10

1. Original assembly
2. Assembly notes
3. C translation
4. Missing subtraction
5. Why does the alignment work?

1. Original assembly

Original challenge link: Click me.

As the author says, this snippet is short but tricky.

func:
    lea     eax, [rdi - 1 + rsi]
    neg     esi
    and     eax, esi
    ret

2. Assembly notes

Remember that the lea instruction (Load Effective Address) is often used by compilers for basic arithmetic. That lea instruction is equivalent to the following.

; eax = rdi - 1 + rsi
mov eax, edi
sub eax, 1
add eax, esi

3. C translation

I had trouble figuring out this challenge because I was focusing on understanding the assembly, before trying to translate it to C. By looking at those individual instructions, it’s hard to see the relationship between them. However, if I had looked at the C code, I might have been able to figure out what it does.

int func(int a, int b) {
    return (a - 1 + b) & (-b);
}

This might look a bit more familiar. I used similar code in my libdetour hooking library for calculating the address of the page where an address is located, and then passing that to mprotect(). I had to ask Dennis Yurichev, the author of the challenges, and I felt a bit ashamed because I tried to understand the assembly directly before translating it into C code.

The function aligns a number n to a boundary, as long as that boundary is a power of 2. The original C code would be something like this:

int align2boundary(int n, int boundary) {
    return (n + boundary - 1) & ~(boundary - 1);
}

For example, if we have an address 0x123, and we want to know where the next 4KiB-page starts, we could use our align2boundary function like this:

align2boundary(0x0123, 0x1000); /* Result: 0x1000 */
align2boundary(0x2345, 0x1000); /* Result: 0x3000 */

4. Missing subtraction

However, the C translation and the real C code is not quite the same. Where did the ~(boundary - 1) go? The compiler optimized this to a neg instruction because subtracting one and then doing a bit-wise NOT is the same as negating the number.

   -1   0b1111111111111111
   -2   0b1111111111111110
   -3   0b1111111111111101
   -4   0b1111111111111100
   -5   0b1111111111111011

    4   0b0000000000000100
  4-1   0b0000000000000011
~(4-1)  0b1111111111111100

    5   0b0000000000000101
  5-1   0b0000000000000100
~(5-1)  0b1111111111111011

As you can see, ~(n-1) and (-n) is the same.

5. Why does the alignment work?

Let’s try to reduce an example first.

/* align2boundary(0x3456, 0x1000) */
0x3456 + 0x1000 - 1 & ~(0x1000 - 1)
         0x4456 - 1 & ~(0x1000 - 1)
             0x4455 & ~(0x1000 - 1)
             0x4455 & ~(0x0FFF)
             0x4455 & 0xFFFFF000
             0x4000

You might be wondering, why do we need the subtractions? Each subtraction serves a different purpose. Let’s start with the right-most subtraction, the second one. It ensures that we can use an input bigger than the boundary itself.

If you subtract one to any power of 2, you will set to 1 all the bytes to the right of the one that was set previously. For example:

   0x8  0b1000
   0x7  0b0111

  0x10  0b10000
   0xF  0b01111

0x2000  0b10000000000000
0x1FFF  0b01111111111111

If we do a bit-wise NOT to this value, we will set to 1 all the bytes to the left of the bit that was originally set, including that bit itself:

Original:     0x8  0b0001000
Subtraction:  0x7  0b0000111
NOT:         ~0x7  0b1111000

We can then do a bit-wise AND with this value for essentially discarding all bits to the left of the set bit. The following two examples will discard the lower 8 bits (0..7) of x:

x & ~(0x100 - 1)
x & ~((1 << 8) - 1)

What about the first subtraction? The left-most subtraction ensures we don’t allocate an extra “page” if we are already aligned to the boundary. For example, in the following example, the input 0x4000 is already aligned to the 0x1000 boundary, so we should return the untouched 0x4000.

0x4000 + 0x1000 & ~(0x1000 - 1)
         0x5000 & ~(0x1000 - 1)
         0x5000

However, if we subtract one:

0x4000 + 0x1000 - 1 & ~(0x1000 - 1)
         0x5000 - 1 & ~(0x1000 - 1)
             0x4FFF & ~(0x1000 - 1)
             0x4000