Challenge 7
1. Original assembly
Original challenge link: Click me.
```nasm
func:
   0:   movzx  edx, BYTE [rdi]
   3:   mov    rax, rdi
   6:   mov    rcx, rdi
   9:   test   dl, dl
   b:   je     29
   d:   nop    DWORD [rax]
  10:   lea    esi, [rdx - 0x41]
  13:   cmp    sil, 0x19
  17:   ja     1e
  19:   add    edx, 0x20
  1c:   mov    BYTE [rcx], dl
  1e:   add    rcx, 0x1
  22:   movzx  edx, BYTE [rcx]
  25:   test   dl, dl
  27:   jne    10
  29:   repz ret
```
After removing the unused labels and renaming the used ones, the listing becomes:
```nasm
func:
        movzx  edx, BYTE [rdi]
        mov    rax, rdi
        mov    rcx, rdi
        test   dl, dl
        je     .done
        nop    DWORD [rax]
.loop:
        lea    esi, [rdx - 0x41]
        cmp    sil, 0x19
        ja     .bigger_val
        add    edx, 0x20
        mov    BYTE [rcx], dl
.bigger_val:
        add    rcx, 0x1
        movzx  edx, BYTE [rcx]
        test   dl, dl
        jne    .loop
.done:
        repz ret
```
2. Assembly notes
These are some notes I considered important when translating the assembly into C code.
2.1. Function parameters
We can assume the first parameter of the function (passed in rdi) is a char*, because every memory access goes through BYTE [ptr], both when reading and when writing values.
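As a rough illustration, assuming the x86-64 System V calling convention (first argument in rdi, return value in rax), the first instruction, movzx edx, BYTE [rdi], reads one byte through the pointer and zero-extends it into a 32-bit register. The helper name load_byte_zx below is hypothetical; it only mirrors that single instruction in C.

```c
#include <stdint.h>

/* Hypothetical helper mirroring "movzx edx, BYTE [rdi]"
   (assumes the System V x86-64 ABI, where rdi holds the first argument):
   read a single byte through the pointer and zero-extend it to 32 bits. */
static uint32_t load_byte_zx(const char *rdi)
{
    return (uint32_t)(unsigned char)*rdi;
}
```

A byte with the high bit set, such as 0xC9, comes back as 0xC9 rather than as a negative number, because movzx fills the upper bits with zeros.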
2.2. Single while loop
If we translate the assembly into C literally, we can see that it checks whether *ptr is zero once, and then runs a do { ... } while (); loop. This is a common compiler transformation (often called loop inversion), so we can simplify it into a plain while () { ... } loop, which behaves the same way.
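To see why the two loop shapes are interchangeable, here is a small sketch using a hypothetical character-counting function written both ways: the guarded do-while shape the compiler emitted, and the plain while loop we simplify it to. Both return the same result for every string, including the empty one.

```c
#include <stddef.h>

/* Shape emitted by the compiler: test once, then a do-while loop. */
static size_t count_guarded(const char *p)
{
    size_t n = 0;
    if (*p == '\0')
        return n;
    do {
        n++;
        p++;
    } while (*p != '\0');
    return n;
}

/* Equivalent shape written by hand: a plain while loop. */
static size_t count_while(const char *p)
{
    size_t n = 0;
    while (*p != '\0') {
        n++;
        p++;
    }
    return n;
}
```

The initial guard is what makes them equivalent: without it, the do-while body would run once even for an empty string.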
2.3. Using lea without a pointer
After the .loop label, we load the effective address of [rdx - 0x41]. Keep in mind that at this point rdx does not hold a pointer, but the byte value that was read from the string. Compilers commonly use lea this way to perform simple arithmetic on values, since it can add or subtract in a single instruction without accessing memory or changing the flags.
One way to read this instruction is: if rdx were an address and we subtracted 0x41 from it, what address would we end up at? lea only computes that address and never reads memory, so the result is simply rdx - 0x41 (truncated to 32 bits, because the destination is esi).
Or, in simpler words: if I am at rdx and I move back 0x41 bytes, where am I?
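Here is a minimal sketch of what lea esi, [rdx - 0x41] computes, written as plain integer arithmetic in C. The function name is hypothetical. Since the destination is the 32-bit esi, the result is truncated to 32 bits; no memory is accessed.

```c
#include <stdint.h>

/* Hypothetical equivalent of "lea esi, [rdx - 0x41]":
   compute rdx - 0x41 and keep only the low 32 bits (the size of esi).
   Nothing is loaded from memory and no flags are affected. */
static uint32_t lea_esi_rdx_minus_41(uint64_t rdx)
{
    return (uint32_t)(rdx - 0x41);
}
```

For 'a' (0x61) the result is 0x20, for 'A' (0x41) it is 0, and for a character below 'A', such as '0' (0x30), the subtraction wraps around to 0xFFFFFFEF.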
2.4. Understanding 0x41
At first, 0x41 doesn't look particularly special. Converting it to decimal gives 65, which doesn't give much of a clue either. However, since we know we are dealing with a char* and we are subtracting this value from each character, we should look up what it represents in the ASCII table.
It turns out to be the letter A, which already looks interesting, since we are subtracting the first uppercase letter from each character of the input.
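We can do the same with the other hex values. Keep in mind that the comparison against 0x19 is made after subtracting 0x41, so we should look for the character at 0x41 + 0x19 = 0x5A, which is Z. We are basically checking whether the character is between A and Z.
Note that ja is an unsigned comparison: a character below A wraps around to a large value after the subtraction (for example, the digit '0' (0x30) becomes 0xEF), so this single comparison checks both bounds at once.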
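The single-comparison range check can be verified exhaustively. The sketch below (function names are hypothetical, and an ASCII character set is assumed) compares the usual two-comparison check with the subtract-then-compare-unsigned form for all 256 possible byte values.

```c
#include <stdbool.h>

/* Range check written the obvious way, with two comparisons. */
static bool is_upper_two_cmp(unsigned char ch)
{
    return ch >= 'A' && ch <= 'Z';
}

/* Form used by the assembly: subtract 'A' (0x41), then do a single unsigned
   comparison against 0x19. Bytes below 'A' wrap around to 0xBF..0xFF,
   so they fail the check just like bytes above 'Z'. */
static bool is_upper_one_cmp(unsigned char ch)
{
    unsigned char offset = (unsigned char)(ch - 0x41);
    return offset <= 0x19;
}

/* Returns true if both forms agree on every possible byte value. */
static bool forms_agree(void)
{
    for (int i = 0; i < 256; i++) {
        if (is_upper_two_cmp((unsigned char)i) != is_upper_one_cmp((unsigned char)i))
            return false;
    }
    return true;
}
```

The one-comparison form trades a second branch for a subtraction, which is why compilers tend to prefer it for contiguous ranges like A to Z.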
2.5. The SIL register
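The SIL register is not commonly seen, but it simply refers to the lowest 8 bits of ESI (and therefore of RSI). Here, cmp sil, 0x19 compares only that low byte, which matches the fact that we are working with single characters.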
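In C terms, reading SIL means that only the low byte of the 32-bit value in esi takes part in the comparison. The helper name low_byte below is hypothetical. For the digit '0' (0x30), lea leaves 0xFFFFFFEF in esi, SIL sees 0xEF, which is above 0x19, so the character is skipped.

```c
#include <stdint.h>

/* Hypothetical helper: the value seen through SIL is the low 8 bits of ESI. */
static uint8_t low_byte(uint32_t esi)
{
    return (uint8_t)(esi & 0xFFu);
}
```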
2.6. Using repz along with ret
For more information about why repz is used with ret, see Challenge 6.
3. C translation
The first version uses register names and is a more direct translation.
```c
void func(char* rcx)
{
    if (*rcx == 0)
        return;

    do {
        /* unsigned: ja after cmp sil, 0x19 is an unsigned comparison */
        unsigned char esi = *rcx - 0x41;
        if (esi <= 0x19) {
            *rcx += 0x20;
        }
        rcx++;
    } while (*rcx != 0);
}
```
If we look at the assembly, we can also see that the function returns its initial parameter (rdi), since it gets copied into rax at the start and rax is never modified afterwards. I didn't add this to the C translations because I feel it's a detail that can simply be mentioned, keeping the C code cleaner.
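If you prefer to keep that detail, here is a sketch of the translation that also returns its argument. The name func_ret is hypothetical; the original pointer is kept in a separate variable (playing the role of rax) while a second pointer (playing the role of rcx) walks the string. The offset is stored in an unsigned char to match the unsigned ja comparison.

```c
/* Hypothetical variant that keeps the return value: str plays the role of
   rax (set once from rdi, never modified) and ptr plays the role of rcx. */
static char *func_ret(char *str)
{
    char *ptr = str;

    while (*ptr != '\0') {
        /* Unsigned, to match the unsigned ja comparison in the assembly. */
        unsigned char c = (unsigned char)(*ptr - 'A');
        if (c <= 25) {
            *ptr += 32; /* 'A'..'Z' -> 'a'..'z' */
        }
        ptr++;
    }
    return str; /* mov rax, rdi at the start of the assembly */
}
```

Returning the pointer lets the call be used directly inside an expression, the same convenience strcpy offers by returning its destination.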
This is the final function after simplifying the loop and renaming the variables.
```c
void func(char* ptr)
{
    while (*ptr != '\0') {
        /* esi; unsigned, because the assembly uses an unsigned comparison (ja) */
        unsigned char c = *ptr - 'A';

        /* Between 'A' and 'Z' */
        if (c <= 25) {
            /* Convert from 'A' (0x41) to 'a' (0x61) */
            *ptr += 32;
        }
        ptr++;
    }
}
```
After looking at the code, we can determine that the function converts every uppercase ASCII letter of a string into lowercase, in place, leaving all other characters untouched.
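To double-check that conclusion, the sketch below (assuming an ASCII character set and the default "C" locale) compares the translated function with the standard tolower() for every 7-bit character. The helper matches_tolower_ascii is hypothetical; func stores the offset in an unsigned char, matching the unsigned ja comparison in the assembly.

```c
#include <ctype.h>

/* Translated function: lowercase ASCII 'A'..'Z' in place. */
static void func(char *ptr)
{
    while (*ptr != '\0') {
        unsigned char c = (unsigned char)(*ptr - 'A');
        if (c <= 25) {
            *ptr += 32;
        }
        ptr++;
    }
}

/* Hypothetical check: in the "C" locale, tolower() only changes 'A'..'Z',
   so func() should agree with it for every character from 1 to 127. */
static int matches_tolower_ascii(void)
{
    for (int ch = 1; ch < 128; ch++) {
        char buf[2] = { (char)ch, '\0' };
        func(buf);
        if ((unsigned char)buf[0] != tolower(ch)) {
            return 0;
        }
    }
    return 1;
}
```

Unlike calling tolower() on each character, func never consults the locale, so bytes outside A to Z are always left as they are.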