x86 assembly patterns
In Exploit Development and Malware Analysis, it’s not about reading assembly. It’s about understanding and directing the control flow. That is valuable. Who needs all these countless lines of instructions anyway?
- 1 Assembly references
- 2 IA 32 general purpose registers
- 3 WORD is a set of bits
- 4 Addressing - relative and indirect
- 5 Pointers - can easily be understood via basic x86 assembly
- 6 System Calls - why do reversers care about calls and Handles on Windows?
- 7 Conditional jumps
- 8 Calling convention and function reverse engineering
- 9 Identifying functions
- 10 cdecl, stdcall, fastcall - what does that mean?
- 11 Return values from functions
- 12 Summary
In case you are writing Shellcode, you’ll need the low-level controls of the assembly to create compact pieces of injection code. It needs to fit and re-direct the control flow.
In the case of Malware Analysis you usually do not have the source code, and therefore you fall back to the low-level, and you have to use a disassembler.
Like two sides of the same coin: assemble Shellcode, disassemble Malware. Whoever can do the one thing can also do the other. That’s where the value is: in creating the control flow or in reverse-engineering it. The rest is clutter.
But many beginners are impressed if they fire up IDA Pro or OllyDBG for the first time. After reading this wiki article, you won’t be impressed anymore. You will be impressive.
And the only thing left to think about is whether you want to write pop pop ret or the hex equivalent. Which is?
tl;dr: There are assembly reading strategies. There is a structure to it. Make use of it, save time, get the job done. That’s what this wiki page is about.
Assembly references
The Intel manuals are the best and most comprehensive reference I know.
This x86 disassembly wikibook is short and useful.
This wiki article focuses on patterns and reading strategies rather than concepts. If you are genuinely interested, read Chris Eagle’s IDA Pro book. I, personally, disliked Randall Hyde’s “The Art of Assembly Language” because it’s too artificial.
IA 32 general purpose registers
Not all of these are general-purpose in the strict sense, but they are the registers you will meet most often. Commonly, there are the following orientation points:
- EAX usually has return values (depending on the calling convention, of course). Often used for addition and multiplication.
- ECX is used as a counter and the this pointer in C++ (depending on the compiler, of course).
- EBP is used to reference args and vars (from the stack).
- ESI and EDI are typically the source and destination index for memory and string copy operations.
- EIP is the finger, which points to what will be executed next. That’s why shellcoders like it.
- EFLAGS holds the status flags that record the outcome of computations. Conditional jumps read them.
- CS, DS and SS select the code, data and stack segment of the process.
WORD is a set of bits
In IA-32 a register is 32 bits. That is a double-word or a DWORD.
That means EAX, EBX, ECX, EDX … (the IA-32 registers) get accessed as a DWORD. That is 4 bytes: 4 x 8 bits = 32 bits = DWORD. Simple as that. You can address them like this:
EAX = DWORD = 32 bits
AX = low WORD of EAX = 16 bits
AH = high byte of AX = 8 bits
AL = low byte of AX = 8 bits
That is relevant for the mov instructions. We will use AL. It still happens, even in x86-64. Obviously…
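The sub-register layout can be sketched with masks and shifts. This is only a model: the helper names ax, ah, al and mov_al are made up for illustration and simply mirror the register names.

```cpp
#include <cstdint>

// Model of how one 32-bit EAX value is seen through its sub-register names.
// Illustrative helpers only; they are not part of any real API.
constexpr std::uint16_t ax(std::uint32_t eax) { return static_cast<std::uint16_t>(eax & 0xFFFFu); }     // bits 0..15
constexpr std::uint8_t  ah(std::uint32_t eax) { return static_cast<std::uint8_t>((eax >> 8) & 0xFFu); } // bits 8..15
constexpr std::uint8_t  al(std::uint32_t eax) { return static_cast<std::uint8_t>(eax & 0xFFu); }        // bits 0..7

// "mov al, imm8" replaces only the lowest byte and leaves bits 8..31 untouched.
constexpr std::uint32_t mov_al(std::uint32_t eax, std::uint8_t value) {
    return (eax & 0xFFFFFF00u) | value;
}

// Compile-time checks on a sample value.
static_assert(ax(0x12345678u) == 0x5678u, "AX is the low word of EAX");
static_assert(ah(0x12345678u) == 0x56u, "AH is the high byte of AX");
static_assert(al(0x12345678u) == 0x78u, "AL is the low byte of AX");
static_assert(mov_al(0x12345678u, 0xFFu) == 0x123456FFu, "only the low byte changes");
```

The last check is the part that matters when reading a mov into AL: the upper 24 bits of EAX keep whatever was there before.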
Addressing - relative and indirect
In IDA Pro disassembly, [EBP+foo] may be something like [EBP+0x42] or [EBP-0x42]. There will always be a + because arithmetically, 7 + (-2) == 7 - 2. IDA prefers symbolic names over raw numbers in its disassembly listings. It will always be +foo even though foo is negative.
x86 uses “relative addressing”: the address is computed from a base register plus an offset. But with IDA Pro this is not a real problem.
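A small sketch of that arithmetic, assuming a 32-bit address space. The EBP value is made up; var_8 and arg_0 follow IDA’s usual naming for offsets -8 and +8, and effective_address is a hypothetical helper.

```cpp
#include <cstdint>

// Effective address of [base + disp] with a signed 32-bit displacement.
// Unsigned arithmetic wraps modulo 2^32, which matches 32-bit address math.
constexpr std::uint32_t effective_address(std::uint32_t base, std::int32_t disp) {
    return base + static_cast<std::uint32_t>(disp);
}

// Hypothetical frame pointer and two IDA-style symbolic offsets.
constexpr std::uint32_t ebp = 0x0012FF70u;
constexpr std::int32_t var_8 = -8;  // shown by IDA as [ebp+var_8]
constexpr std::int32_t arg_0 = 8;   // shown by IDA as [ebp+arg_0]

static_assert(effective_address(ebp, var_8) == 0x0012FF68u, "a negative offset lands below EBP");
static_assert(effective_address(ebp, arg_0) == 0x0012FF78u, "a positive offset lands above EBP");
```

So [ebp+var_8] is written with a plus sign, but it still addresses memory 8 bytes below EBP.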
Pointers - can easily be understood via basic x86 assembly
The magic word to understand a pointer is “indirection”.
“If you want the value of this, go there.” A pointer can point to a memory location. Accessing this memory location is called “dereferencing”. Sometimes memory structures are large, and several functions work on them. You don’t copy them around if you can avoid it.
Take a look at this super short C++ code, and its assembly after compilation:
int val = 5;
int *ptr = &val;
Sleep(*ptr);
In this case, it’s basic:
- 5 is mov’ed into val.
- Then lea loads the effective address of val into eax. In C++ this is equivalent to &val. EAX holds the address of (&) val.
- And this value in EAX is mov’ed to [ebp+ptr], which is the stack slot of the pointer variable ptr.
For the Win32 Sleep() call, the value is moved into ECX, from where it is passed as the dwMilliseconds parameter.
Further info on the calling convention and on the Windows API calls is in the next section. Here the emphasis is that pointer operations can be explained simply with assembly.
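The same pointer operations can be tried without Windows. The sketch below is a portable stand-in: Sleep() is replaced by a plain return, the function names are made up, and the assembly in the comments is only the typical shape of unoptimized 32-bit output, not a listing from the build discussed here.

```cpp
// Reads val through a pointer, mirroring the three steps described above.
constexpr int read_through_pointer() {
    int val = 5;      // mov [ebp+val], 5
    int* ptr = &val;  // lea eax, [ebp+val] ; mov [ebp+ptr], eax
    return *ptr;      // load ptr into a register, then load the value it points to
}

// Writing through the pointer changes val itself, because nothing was copied.
constexpr int write_through_pointer() {
    int val = 5;
    int* ptr = &val;
    *ptr = 7;         // store to the address held in ptr
    return val;
}

static_assert(read_through_pointer() == 5, "dereferencing yields the value of val");
static_assert(write_through_pointer() == 7, "the write went to val's memory");
```

The second function is the indirection in a nutshell: ptr holds only an address, so a store through it lands in val.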
System Calls - why do reversers care about calls and Handles on Windows?
System calls are points of interaction between the process and the OS or hardware. On Windows, you have the Windows API, which is a facade in front of the actual system calls. On Linux, you have syscall.h.
Take a look at this C++ code, which uses a WinAPI call:
#include <windows.h>

bool write_more_garbage()
{
    LPCSTR text = "Hello, world!\n";   // string literals are const
    DWORD charsWritten;                // receives the number of bytes written
    HANDLE hStdout;
    hStdout = GetStdHandle(STD_OUTPUT_HANDLE);
    WriteFile(hStdout, text, 14, &charsWritten, NULL);
    return false;
}
And in IDA Pro (with PowerPoint magic):
(sorry, typo in the slide, I know)
The parameters for the WriteFile call are push’ed onto the stack in reverse of the order given in the MSDN documentation 3.
The definition of the WinAPI call is given there:
- A Handle is a Windows concept 3. It’s used to handle system resources. A call is used to retrieve the handle, which we need.
- You see that the values are pushed in reverse order, so that the first argument ends up on top of the stack. A good reading strategy is to go to the call you are interested in and to read upwards.
- The Run Time Checks can be ignored here, but this topic is generally relevant for exploit development 1. But not today.
The WinAPI call does not use cdecl. It uses stdcall:
The difference between cdecl and stdcall is subject to another section.
The point of this section is that arguments for WinAPI calls are laid out by the calling function. The parameters are pushed onto the stack here (that can be different for other calls). Windows needs Handle objects to handle resources. It is quite common to track Handles with a debugger, to start reversing an application via its points of interaction with the system. OllyDBG can do that.
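The push order can be sketched as plain arithmetic. The sketch assumes every argument occupies one 4-byte stack slot, which holds for the five WriteFile arguments on 32-bit x86. The helper names are made up for illustration.

```cpp
// One DWORD-sized stack slot per argument (assumption, see above).
constexpr int kSlotSize = 4;

// 1-based position in the push sequence for the argument at `index`
// (0-based, in source order). Arguments are pushed right to left.
constexpr int push_sequence(int index, int arg_count) {
    return arg_count - index;
}

// Offset of that argument from ESP when the call instruction is reached.
constexpr int offset_from_esp(int index) {
    return index * kSlotSize;
}

// WriteFile(hStdout, text, 14, &charsWritten, NULL) has 5 arguments.
static_assert(push_sequence(4, 5) == 1, "NULL is pushed first");
static_assert(push_sequence(0, 5) == 5, "hStdout is pushed last");
static_assert(offset_from_esp(0) == 0, "hStdout sits on top of the stack");
static_assert(offset_from_esp(3) == 12, "&charsWritten sits at [esp+0Ch]");
```

This is why reading upwards from the call works: the push closest to the call is the first argument.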
Conditional jumps
Conditional jumps are contextual instructions: they depend on the flags set by an earlier instruction. The conditions are usually in a test or cmp instruction before a jz or ja. Conditional jumps are mostly used for loop constructs and if-then-else compound expressions.
Reading heuristic: if the jumps go to two different locations it's an OR, if they go to the same location it's an AND
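This is simple:
- If one condition of an AND compound expression fails, it can never be True. Therefore the 2nd cmp here gets skipped, and both jumps lead to the same location.
- For an OR compound expression that is different. One true condition is enough, so the first jump goes straight to the body, and the 2nd cmp is only evaluated if the first condition fails. Its jump leads to a different location.
This can speed up reading of assembly a lot.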
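The two jump patterns can be modeled in C++ with early returns standing in for the conditional jumps. This is a sketch: the conditions a == 1 and b == 2 and all names are made up, and the mnemonics in the comments show only the typical shape of unoptimized output.

```cpp
// Result of walking through one compound condition.
struct Outcome {
    bool body_taken;   // did control reach the if-body?
    int comparisons;   // how many cmp instructions were executed
};

// if (a == 1 && b == 2): both jumps target the same "skip" location.
constexpr Outcome and_pattern(int a, int b) {
    int comparisons = 0;
    ++comparisons;                             // cmp a, 1
    if (a != 1) return {false, comparisons};   // jnz skip
    ++comparisons;                             // cmp b, 2
    if (b != 2) return {false, comparisons};   // jnz skip (same target)
    return {true, comparisons};                // fall through into the body
}

// if (a == 1 || b == 2): the jumps target two different locations.
constexpr Outcome or_pattern(int a, int b) {
    int comparisons = 0;
    ++comparisons;                             // cmp a, 1
    if (a == 1) return {true, comparisons};    // jz body
    ++comparisons;                             // cmp b, 2
    if (b != 2) return {false, comparisons};   // jnz skip (different target)
    return {true, comparisons};                // fall through into the body
}

static_assert(and_pattern(0, 2).comparisons == 1, "AND stops at the first failure");
static_assert(or_pattern(1, 0).comparisons == 1, "OR stops at the first success");
```

In both cases the second comparison is conditional. What differs is where the first jump goes.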
Reading heuristic: dashed lines indicate a loop
To illustrate this, I use BinNavi. It has nicer graphs. For this level of reverse engineering, BinNavi is not typically needed, but for the looks of it, it’s worth a shot.
Example 1
You can see the local variables at the bottom. So, you don’t need to right-click like in IDA Pro.
The usage of the variable bar is tracked. There is a cmp between bar and 5h. If the jg is not taken, we follow the red line, and iterate further. That is the head of the loop.
The loop body is on the right. So EBP+BAR needs to be greater than 5 (decimal) before we stop looping over the left block (loop body).
In the loop body block is an add. 1 is added to the register, which holds the value of bar after the mov. This loop body has an unconditional jump (jmp) back to the loop head, where the condition is then re-evaluated.
Now, if bar is larger than 5, the right block is executed, where all that iteration work is undone, and bar gets set to 0 again. What a pity… now we have to do another loop example.
This is an example of a while (bar <= 5) loop. If it’s greater than 5, we stop. And then we move on with the control flow.
On the left, this is the function epilogue. Easy to spot by the retn and the mov esp, ebp.
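The loop from example 1 can be written out so that each line corresponds to one piece of the graph. This is a reconstruction from the description (cmp against 5, jg out of the loop, add 1, jmp back), not the original source. The function names and the iteration counter are added for illustration.

```cpp
// Mirrors the graph: head (cmp/jg), body (add), back edge (jmp).
constexpr int loop_iterations(int bar) {
    int iterations = 0;
    while (true) {
        if (bar > 5) break;  // cmp [ebp+bar], 5 ; jg exit
        bar = bar + 1;       // mov eax, [ebp+bar] ; add eax, 1 ; mov [ebp+bar], eax
        ++iterations;        // bookkeeping for this sketch only
    }                        // jmp head
    return iterations;
}

// The same loop the way it would be written in source code.
constexpr int loop_iterations_structured(int bar) {
    int iterations = 0;
    while (bar <= 5) {
        bar = bar + 1;
        ++iterations;
    }
    return iterations;
}

static_assert(loop_iterations(0) == 6, "bar runs through 0..5, then 6 > 5 exits");
static_assert(loop_iterations(0) == loop_iterations_structured(0), "both forms agree");
```

Note the inversion: the source says "continue while bar <= 5", the assembly says "leave if bar > 5". Compilers usually emit the negated condition at the loop head.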
Example 2
Here is another loop, as promised:
The local loop variable i is highlighted. It’s loaded in EAX. The add instruction increments the value. Nothing new here.
Unless the jz after the cmp jumps out of the loop, we will have an unconditional jmp backward.
Practically, this illustrates the difference between a while loop (example 1) and a for loop (example 2). The for loop for example 2 looks like: for (int i = bar; i != 5; i = i + 1)
Calling convention and function reverse engineering
Arguments of a function, like function(var1, var2, var3), will be handled depending on the calling convention. There are different types of calling conventions, which might be chosen depending on the optimization strategy of the compiler. Is it about getting a small binary or a fast execution?
Reading heuristic: if it's relatively positive to EBP, it's a function argument. If it's relatively negative to EBP it's a local function variable
Generally, the stack grows into lower address space.
If you see a variable that is relatively addressed to EBP, like [EBP+8] - with a positive offset - that means it’s above the frame pointer. That usually means it’s a parameter.
If you see something like [EBP-8] it is usually a local variable inside the function.
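The heuristic can be turned into a tiny classifier. It assumes a standard 32-bit frame set up with push ebp / mov ebp, esp and DWORD-sized arguments: the saved EBP is at [EBP+0], the return address at [EBP+4], and the first argument at [EBP+8]. The names Slot, classify and argument_offset are made up.

```cpp
// What an EBP-relative offset refers to in a standard 32-bit stack frame.
enum class Slot { Local, SavedEbp, ReturnAddress, Argument };

constexpr Slot classify(int offset_from_ebp) {
    return offset_from_ebp < 0  ? Slot::Local          // below EBP: local variables
         : offset_from_ebp < 4  ? Slot::SavedEbp       // [ebp+0..3]: caller's EBP
         : offset_from_ebp < 8  ? Slot::ReturnAddress  // [ebp+4..7]: pushed by call
                                : Slot::Argument;      // [ebp+8] and up: parameters
}

// Offset of the n-th argument (0-based), assuming 4 bytes per argument.
constexpr int argument_offset(int index) {
    return 8 + 4 * index;
}

static_assert(classify(8) == Slot::Argument, "[ebp+8] is the first argument");
static_assert(classify(-8) == Slot::Local, "[ebp-8] is a local variable");
static_assert(argument_offset(2) == 16, "third argument at [ebp+10h]");
```

The gap between 0 and 8 explains why the first argument is never at [EBP+4]: that slot holds the return address.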
Identifying functions
There is a function prologue and an epilogue. Functions are made up of basic blocks, which are straight-line runs of instructions with one entry and one exit. That term comes up, for example, when it’s about code coverage and binary instrumentation. The latter is a fancy word for debugging, actually. Very academic.
Reading heuristic: from `push ebp` to `retn`
You can identify a function by looking between the instructions push ebp and mov ebp, esp at the top, and mov esp, ebp, pop ebp and retn at the bottom.
The mov and pop instructions are equivalent to leave.
Note that compilers sometimes inline functions inside of each other. Then the inlined code has no prologue of its own, and you will see its body in the middle of another function. That’s a performance optimization technique.
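On the byte level the same pattern can be matched directly. The sketch below checks for the common encodings of the frame setup and teardown. The function names and the two sample byte strings are made up, and only a plain ret (C3) is handled, not the retn imm16 form (C2).

```cpp
#include <cstddef>
#include <cstdint>

// push ebp = 55 ; mov ebp, esp = 8B EC (typical for MSVC) or 89 E5 (typical for GCC).
constexpr bool starts_with_prologue(const std::uint8_t* code, std::size_t size) {
    return size >= 3 && code[0] == 0x55 &&
           ((code[1] == 0x8B && code[2] == 0xEC) ||
            (code[1] == 0x89 && code[2] == 0xE5));
}

// leave ; ret = C9 C3, or mov esp, ebp ; pop ebp ; ret = 8B E5 5D C3.
constexpr bool ends_with_epilogue(const std::uint8_t* code, std::size_t size) {
    return (size >= 2 && code[size - 2] == 0xC9 && code[size - 1] == 0xC3) ||
           (size >= 4 && code[size - 4] == 0x8B && code[size - 3] == 0xE5 &&
            code[size - 2] == 0x5D && code[size - 1] == 0xC3);
}

// push ebp ; mov ebp, esp ; xor eax, eax ; mov esp, ebp ; pop ebp ; ret
constexpr std::uint8_t msvc_style[] = {0x55, 0x8B, 0xEC, 0x33, 0xC0, 0x8B, 0xE5, 0x5D, 0xC3};
// push ebp ; mov ebp, esp ; leave ; ret
constexpr std::uint8_t gcc_style[] = {0x55, 0x89, 0xE5, 0xC9, 0xC3};

static_assert(starts_with_prologue(msvc_style, sizeof msvc_style), "55 8B EC");
static_assert(ends_with_epilogue(gcc_style, sizeof gcc_style), "C9 C3");
```

A scanner built on this will miss functions compiled without a frame pointer, so treat a match as a hint and not as proof.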
cdecl, stdcall, fastcall - what does that mean?
Take a look at the assembly for the write_more_garbage() function again:
After the WinAPI call block, which is highlighted in yellow, you can see that there are these RTC checks again. But what sticks out is that after the RTC call there are 5 pop instructions. These take 4 bytes (32 bits) off the stack, each. In an unoptimized Visual C++ build, pops at this point typically restore registers that were saved earlier. They do not remove the WinAPI arguments, because a stdcall function like WriteFile takes its arguments off the stack itself. The code was compiled without optimizations. Otherwise, there would be less of this bookkeeping. You can thank Visual C++ for that.
The write_more_garbage() function is using cdecl. IDA Pro indicates this on top of a function block.
The main function also uses cdecl, and the loop_main() function as well. The latter takes 3 arguments, which are push’ed before it’s called. Then the called function has its function prologue where ESP is mov’ed into EBP. And that is also the reason why parameters are referenced relatively positive to EBP. They are on the stack before EBP is initialized in the called function, and the stack grows into lower addresses.
As you saw, WinAPI calls make use of stdcall, and not cdecl. In practice, that means the callee cleans up the stack: the WinAPI function ends with a retn that also removes its arguments. With cdecl the caller does that, typically with an add esp after the call. In both cases the arguments are push’ed before the call. So stdcall is similar to cdecl, but after a WinAPI call you will not see the caller cleaning up the stack. That saves one instruction per call site.
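Who removes the arguments, and how many bytes, can be put into a few lines. This sketch assumes DWORD-sized arguments and the usual x86 rules: with stdcall the callee returns with retn N, with cdecl the caller adjusts ESP after the call. The type and function names are made up.

```cpp
enum class Convention { Cdecl, Stdcall };

// Bytes removed from the stack by each side once the call is over.
struct Cleanup {
    int callee_retn_operand;  // N in "retn N" at the end of the callee
    int caller_add_esp;       // N in "add esp, N" after the call
};

constexpr Cleanup cleanup(Convention convention, int dword_args) {
    return convention == Convention::Stdcall
               ? Cleanup{dword_args * 4, 0}   // callee cleans up
               : Cleanup{0, dword_args * 4};  // caller cleans up
}

// WriteFile takes 5 arguments and is stdcall: expect "retn 14h" inside it.
static_assert(cleanup(Convention::Stdcall, 5).callee_retn_operand == 20, "5 * 4 bytes");
// A cdecl function with 3 arguments: expect "add esp, 0Ch" after the call.
static_assert(cleanup(Convention::Cdecl, 3).caller_add_esp == 12, "3 * 4 bytes");
```

When reading a listing, an add esp right after a call is therefore a strong hint for cdecl, and its operand tells you the number of arguments.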
If a function uses fastcall you will see that not all parameters are push’ed. At least some of them will be mov’ed into registers because these have much faster access times. With Microsoft’s fastcall, these are ECX and EDX for the first two arguments.
C++ compilers may also use thiscall for member functions. With Visual C++, ECX will hold the this pointer, and the called function cleans up the stack, which is similar to stdcall. With GCC, the this pointer is pushed to the stack last, and the rest is similar to cdecl. But C++ reverse engineering is a difficult endeavor. So do not take this for granted.
Return values from functions
Return values will often be in EAX. That being said, there is a confusing movzx instruction used below.
This is because we don’t know the high bytes of EAX. In AL are the 8 low bits of EAX. The rest gets zero’ed out with movzx. movzx is useful for Shellcode as well. The return value of the function simply doesn’t need the entire register.
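What movzx does to a register can be modeled with a mask. For contrast the sketch also shows movsx, the sign-extending sibling. The function names are made up, and the stale value 0xCCCCCC01 is just an example of leftover upper bits.

```cpp
#include <cstdint>

// movzx eax, al: keep the low byte, clear bits 8..31.
constexpr std::uint32_t movzx_al(std::uint32_t eax) {
    return eax & 0xFFu;
}

// movsx eax, al: keep the low byte, copy its top bit into bits 8..31.
constexpr std::uint32_t movsx_al(std::uint32_t eax) {
    return (eax & 0x80u) ? ((eax & 0xFFu) | 0xFFFFFF00u) : (eax & 0xFFu);
}

// A bool result of 1 in AL, with unknown leftovers in the upper bytes of EAX.
static_assert(movzx_al(0xCCCCCC01u) == 0x00000001u, "stale upper bytes are cleared");
// The two instructions differ only when bit 7 of AL is set.
static_assert(movzx_al(0x000000FFu) == 0x000000FFu, "zero extension");
static_assert(movsx_al(0x000000FFu) == 0xFFFFFFFFu, "sign extension");
```

After the movzx, a caller can compare the full EAX against 0 or 1 without caring what was in the register before.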
I compiled the code with Visual C++, to have a real-world example. I don’t like reverse engineering tutorials or articles with examples that are too artificial. You have to teach people to spot the patterns. Like in driving school, you are being taught what a STOP sign looks like. And once you have learned that spotting, it is easy. Which doesn’t mean that everyone stops. Computers are different in that regard. For now. That’s beside my point, which is that malware analysis and exploit development have in common that it’s about patterns.
With the xor EAX is reset to 0. A simpler expression is mov eax, 0. But xor eax, eax has a shorter encoding and contains no zero bytes, which matters for Shellcode. In exploit development or code deobfuscation, you will need to work a lot with XOR.
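Both uses of XOR fit into a short sketch: zeroing a register, and the reversible single-byte encoding that shows up in encoded Shellcode and simple obfuscation. The helper names and the key 0x5A are made up. The byte arrays are the standard encodings of the two instructions.

```cpp
#include <cstdint>

// xor eax, eax: any value XORed with itself is 0.
constexpr std::uint32_t xor_self(std::uint32_t eax) {
    return eax ^ eax;
}

// Single-byte XOR encoding: applying the same key twice restores the input.
constexpr std::uint8_t xor_byte(std::uint8_t value, std::uint8_t key) {
    return static_cast<std::uint8_t>(value ^ key);
}

// Machine code for the two ways of zeroing EAX.
constexpr std::uint8_t xor_eax_eax[] = {0x31, 0xC0};                    // xor eax, eax
constexpr std::uint8_t mov_eax_0[]   = {0xB8, 0x00, 0x00, 0x00, 0x00};  // mov eax, 0

static_assert(xor_self(0xDEADBEEFu) == 0u, "the old value does not matter");
static_assert(xor_byte(0x41, 0x5A) == 0x1B, "encode");
static_assert(xor_byte(xor_byte(0x41, 0x5A), 0x5A) == 0x41, "decode with the same key");
static_assert(sizeof xor_eax_eax == 2 && sizeof mov_eax_0 == 5, "2 bytes versus 5");
```

The mov variant carries four zero bytes, which would cut the payload short wherever it is copied as a C string.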
Summary
Now we know how to read x86 assembly. It’s not crazy hard and doesn’t take a lot of time if you spot the patterns of the control flow. Everyone can do it because assembly isn’t complex. It’s just very low-level and therefore verbose.
x86 assembly is a must-know for a security engineer, who needs to deal with Malware or Shellcode. That isn’t for everyone, but at least the basics are. And this is an extremely basic summary of how it works. Working with IDA Pro and BinNavi is like driving, like navigating, through a binary. At some point, there is a familiarity with the conditions and the control flow, and then it really doesn’t matter if you fire up a debugger or only your disassembler. As long as you get the control flow and understand what you are doing. I usually do both.