Obfuscation is quite a debated topic when it comes to software development.
There are obviously tons of reasons why you should not try to obfuscate your code.
But sometimes, for some very specific part of your software, you may consider trying it.
Now there are a lot of different ways to achieve this, and in my humble opinion there's no silver bullet when it comes to obfuscation.
In the end, what you are trying to do is prevent someone else from reverse-engineering your code, and from making sense of your logic and algorithms.
In such a situation, making your code «hard to read» may not be the best solution.
No matter how complex your code is, an experienced reverse-engineer will most of the time figure out what you are doing.
In such a context, psychological warfare is often more effective than pure obfuscation.
The goal here is to wear down the attacker, and make them quit.
That being said, you'll often want to combine the psychological effect with some kind of obfuscation.
Again, lots of different ways to achieve this, depending on the language you're using, and your target platform.
But today I want to look at some techniques you can try from a machine code perspective, on x86-64 platforms.
Debuggers such as LLDB or GDB also provide disassembly.
Now what can you expect from such a tool?
Well, let's take a look at the following C program:
    #include <stdio.h>

    void foo( void );

    void foo( void )
    {
        printf( "hello, world\n" );
    }

    int main( void )
    {
        foo();

        return 0;
    }
A disassembler might give the following output:
    _foo:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 0x10
        lea     rdi, qword [ aHelloWorldn ]
        mov     al, 0x0
        call    imp___stubs__printf
        mov     dword [ rbp + var_4 ], eax
        add     rsp, 0x10
        pop     rbp
        ret

    _main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 0x10
        mov     dword [ rbp + var_4 ], 0x0
        call    _foo
        xor     eax, eax
        add     rsp, 0x10
        pop     rbp
        ret
Now even if you're not used to x86-64 assembly code, you might figure out what the program is doing pretty quickly.
main calls foo, which calls printf, with some argument.
Among the different obfuscation techniques, hiding code or trying to fool the disassembler is quite common.
The goal here is to make the disassembler program go crazy, and make it output garbage instead of the actual instructions.
You should of course not rely solely on this.
An experienced reverse-engineer will obviously run your software in a debugger, so machine-code obfuscation won't resist this.
But still, it's great if you can prevent some kind of static analysis.
Now how can we achieve this on x86-64 platforms?
Well, the x86 instruction set is quite a complex beast.
Unlike most RISC architectures, x86 instructions have a variable length (anywhere from 1 to 15 bytes).
The CPU will (very) basically read the first byte(s) of an instruction, and depending on its value will read additional bytes for the instruction operands.
As an example, in x86-64 assembly:
    xor eax, eax
    mov eax, [ edi ]
These two instructions have different lengths.
The first one is 2 bytes:
    31 C0
While the second one is 3 bytes (the leading 67 is the address-size prefix needed for the 32-bit edi address):
    67 8B 07
This means that, depending on the instruction, the CPU will expect trailing bytes for the operands.
And so will the disassembler.
This is a neat opportunity for us, as it implies we can in theory output raw bytes that correspond to valid x86 instructions and omit the operand bytes.
This way, the disassembler will expect the operands and will try to read them. It will then just miss the next instructions, and go crazy because it will then read at a wrong offset.
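To make the length issue a bit more concrete, here is a minimal sketch of a toy length decoder in C. It is not how a real disassembler works: the function name `toy_insn_length` is made up for illustration, and it only understands an optional 0x67 prefix followed by three ModRM-based opcodes (0x31, 0x89, 0x8B), which is just enough to show how the first bytes decide how many more bytes get consumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy length decoder, for illustration only (hypothetical helper).
 * Assumptions:
 *   - only an optional 0x67 prefix and the opcodes 0x31, 0x89 and 0x8B
 *     are supported (all of them are followed by a ModRM byte);
 *   - the caller provides enough bytes for the prefix, the opcode,
 *     the ModRM byte and the SIB byte. Displacement bytes are only
 *     counted, never read.
 * Returns the instruction length in bytes, or 0 for an unsupported opcode.
 */
size_t toy_insn_length( const uint8_t * code )
{
    size_t len = 0;

    assert( code != NULL );

    /* Optional address-size prefix */
    if( code[ len ] == 0x67 )
    {
        len++;
    }

    uint8_t opcode = code[ len++ ];

    if( opcode != 0x31 && opcode != 0x89 && opcode != 0x8B )
    {
        return 0;
    }

    uint8_t modrm = code[ len++ ];
    uint8_t mod   = ( uint8_t )( modrm >> 6 );
    uint8_t rm    = ( uint8_t )( modrm & 0x07 );

    /* mod == 11: register operand, nothing follows */
    if( mod == 3 )
    {
        return len;
    }

    /* rm == 100: a SIB byte follows, and its low 3 bits are the base */
    uint8_t base = rm;

    if( rm == 4 )
    {
        base = ( uint8_t )( code[ len++ ] & 0x07 );
    }

    if( mod == 1 )
    {
        len += 1; /* 8-bit displacement */
    }
    else if( mod == 2 )
    {
        len += 4; /* 32-bit displacement */
    }
    else if( base == 5 )
    {
        len += 4; /* mod == 00 with base 101: 32-bit displacement */
    }

    return len;
}
```

With this sketch, the bytes 31 C0 give a length of 2 and 67 8B 07 give a length of 3. The three bytes 89 84 D9 give a length of 7: a decoder that sees only those three bytes will swallow the four bytes that follow, whatever they are.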
The hard part here is that we obviously want our code to be executable.
Trying to execute incomplete instructions will most certainly result in a crash.
So we want these incomplete instructions in a dead branch of our code; that is a branch that will never be executed, but that will still be read by the disassembler.
Something like:
    if( false )
    {
        /* Incomplete instructions here... */
    }
Of course, we don't want this to be so obvious, so we'll need to use some other technique here.
When calling a function, using the call instruction, a few things happen before the target code is reached.
Mainly, the return address is pushed onto the stack. Later, from the called function, when the ret instruction is executed, the CPU will pop that return address, and jump back to the caller.
This pattern is recognised by all disassemblers. That's how they are able to generate complete call graphs.
Again, this is a nice opportunity for us.
First of all we might be able to break some disassemblers, as many will expect a function to have a single ret instruction.
Their ability to generate a call graph will then be greatly compromised.
Then it will help us insert dead code in our program, that will still be seen by the disassembler as actual and valid code.
In order to do this, we can basically place another return address on the stack, corresponding to a valid portion of our code.
When ret is executed, the CPU will jump to that portion instead of the original caller, giving us control again.
Just like a local jmp, but hidden from the disassembler.
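Before going to assembly, the idea can be sketched with a tiny simulated machine in C. Everything here is hypothetical (the `ToyCPU` type, the `toy_*` helpers, the addresses): it only models a stack and an instruction pointer, to show that ret blindly jumps to whatever value sits on top of the stack.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_STACK_SLOTS 16

/* Hypothetical toy machine: a downward-growing stack and an instruction pointer */
typedef struct
{
    uint64_t stack[ TOY_STACK_SLOTS ];
    int      sp; /* Index of the top of the stack (TOY_STACK_SLOTS means empty) */
    uint64_t ip;
}
ToyCPU;

void toy_push( ToyCPU * cpu, uint64_t value )
{
    assert( cpu->sp > 0 );

    cpu->stack[ --cpu->sp ] = value;
}

uint64_t toy_pop( ToyCPU * cpu )
{
    assert( cpu->sp < TOY_STACK_SLOTS );

    return cpu->stack[ cpu->sp++ ];
}

/* call: push the address of the next instruction, then jump to the target */
void toy_call( ToyCPU * cpu, uint64_t target )
{
    toy_push( cpu, cpu->ip + 5 ); /* A near call with a 32-bit offset is 5 bytes long */

    cpu->ip = target;
}

/* ret: pop whatever is on top of the stack, and jump there */
void toy_ret( ToyCPU * cpu )
{
    cpu->ip = toy_pop( cpu );
}

/*
 * Simulates a callee that swaps its return address before executing ret.
 * Returns the address where execution lands, and stores the original
 * return address in *original so it could be restored later.
 */
uint64_t toy_hijacked_ret( uint64_t call_site, uint64_t callee, uint64_t hidden_target, uint64_t * original )
{
    ToyCPU cpu;

    cpu.sp = TOY_STACK_SLOTS;
    cpu.ip = call_site;

    toy_call( &cpu, callee );        /* The caller calls our function      */
    *original = toy_pop( &cpu );     /* We take the real return address... */
    toy_push( &cpu, hidden_target ); /* ...and put our own one instead     */
    toy_ret( &cpu );                 /* ret now jumps to the hidden target */

    return cpu.ip;
}
```

In this model, a call made at 0x1000 to a function at 0x2000 normally returns to 0x1005. After the swap, ret lands on the hidden target instead, and the original return address is kept aside.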
Let's see how we can implement this in assembly.
But first of all, let's take a look at stack frames.
The stack is a memory region that is basically used as scratch memory for functions.
This is where local and temporary variables are stored.
Now to avoid overwriting values from other functions, each called function will create its own stack frame.
The CPU has two registers for the stack: rbp and rsp.
The first one is the base pointer, and contains the start address of the local stack frame.
The second is a pointer to the top of the stack. When using push or pop instructions, this one will change accordingly.
This is why functions usually start with the following prologue:
    push rbp
    mov  rbp, rsp
The first instruction saves the original base pointer, and the second one sets it to the top of the stack.
This way, we can have our own stack space, for the current function.
So let's start by saving the registers we are going to use, so we can restore them later and hopefully not break anything:
    push rax
    push rcx
    push rdx
Then we'll save the current stack pointer (rsp) into the rcx register, again so we'll be able to restore it later:
    mov rcx, rsp
Now we'll reset the stack pointer (rsp) to the base pointer (rbp).
Doing this effectively resets the current stack frame to where it was right after the function prologue, with the saved base pointer on top of the stack.
    mov rsp, rbp
This is where it gets interesting, because as we moved the stack pointer, the next two values stored on the stack that we can pop are the original base pointer and the return address.
We'll pop the base pointer into rbp, and the return address into rax. We'll restore them later:
    pop rbp
    pop rax
Now we can push another return address, of a location we know:
    lea  rdx, [ rip + 97 ]
    push rdx
    push rdx
Here, the first line loads a specific address into the rdx register.
rip is the current instruction pointer; that is where we are right now. +97 is simply an offset from here, and is the target code we'll want to execute.
We'll then have some room for garbage code, and some other neat tricks.
We'll obviously push this new return address, as we popped the old one earlier.
Note that we do it twice, so the stack keeps its original alignment (it used to have the base pointer as well).
And finally:
ret
This is where the magic occurs. For most disassemblers, our function is over here, as we've hit a ret instruction.
Control flow should return to the caller, but instead, it will simply jump a few bytes further, as we've overwritten the return address.
What can we do from here?
Well, we have some room until we reach the code portion that will be executed.
Let's try to fool the disassembler a little more.
Now what logically comes after the ret instruction of a function?
The start of another function.
Let's do one.
Remember this is completely dead code, that will never be executed.
We are just doing some stuff that will seem logical for a disassembler.
Let's start with a standard stack frame:
    push rbp
    mov  rbp, rsp
What can we do next?
Well, disassemblers are smart.
You can try to fool them in many ways, but sometimes they'll eventually recover.
They do this by analysing your program's flow. That is, your jmp and call instructions.
Even if the code seems completely garbage (like if you used the incomplete instruction trick), they might be able to recover if they see a jump to a valid code location.
Instead of reading garbage, they'll just start disassembling again from that location.
So I found it can actually be useful to write bogus jump instructions, jumping anywhere in your code.
This will usually mess a bit more with the control graph, and the disassembler's ability to recover.
As we have room for some garbage code, let's do this:
    xor rax, rax
    cmp rax, rax
This just zeroes the rax register, and compares it with itself.
Useless, but remember this is dead code.
Now following a cmp instruction, we expect some kind of branching:
    je j0
Meaning that if the two operands are equal (they surely are), it jumps to the local j0 label, that we'll define later.
And let's continue a bit more, with other random comparisons, and other jumps:
    cmp rax, rdi
    je  j1
    add rax, 0xCAFE
    cmp rax, rsi
    je  j2
    cmp rax, rdx
    je  j3
    cmp rax, rcx
    je  j4
    jmp 24[ rip ]
We are here just comparing useless stuff with useless stuff, and jumping to some local labels.
Again, just to mess with the control graph.
The last instruction jumps to a random location, based on the current instruction pointer.
So this is just:
    if( ... )
    {
        goto ...;
    }
    else if( ... )
    {
        goto ...;
    }
    else
    {
        goto ...;
    }
Now we'll simply define these local labels, and in each one jump to another random location:
    j0:
        jmp 16[ rip ]
    j1:
        jmp 48[ rip ]
    j2:
        jmp 64[ rip ]
    j3:
        jmp 128[ rip ]
    j4:
        jmp 256[ rip ]
Now, at this point, the disassembler should be pretty confused.
It's time for us to go back to real code.
Remember our return address override?
It was [ rip + 97 ].
That +97 offset brings us just here, accounting for all the previous instructions we wrote.
So let's undo all the mess we've done:
    pop  rdx
    push rax
We saved the original return address in rax, so we'll push it back onto the stack. But just before, since we pushed the fake return address twice to keep the stack alignment, we first pop the remaining copy into rdx, which is a safe register for us to use at this point.
The original base pointer was saved in rbp previously, so let's push it again:
    push rbp
And now we can simply restore our previous stack frame (rsp was saved to rcx):
    mov rbp, rsp
    mov rsp, rcx
And that gives us the opportunity to restore the three registers we earlier pushed on the stack, because we were going to use them:
    pop rdx
    pop rcx
    pop rax
At this very specific point, it's just as if nothing happened.
The stack frame and the registers are in exactly the same state as before.
This is great, because it means our software will run unaffected.
But it's also great because we produced a lot of garbage for the disassembler.
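If you want to convince yourself that the whole sequence really leaves everything untouched, it can be replayed on a small model in C. This is only a sketch under assumptions: the `Machine` type and its helpers are hypothetical, registers hold slot indices instead of real addresses, the function is assumed to use the standard prologue with two local slots, and the constant 0x2061 simply stands for the [ rip + 97 ] target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLOTS 32

/* Hypothetical model: a few registers and a stack made of 8-byte slots */
typedef struct
{
    uint64_t rax, rcx, rdx, rbp, rsp, rip;
    uint64_t mem[ SLOTS ];
}
Machine;

static void m_push( Machine * m, uint64_t value )
{
    assert( m->rsp > 0 );

    m->mem[ --m->rsp ] = value;
}

static uint64_t m_pop( Machine * m )
{
    assert( m->rsp < SLOTS );

    return m->mem[ m->rsp++ ];
}

/* Returns 1 if registers and live stack are identical before and after the trick */
int trick_preserves_state( void )
{
    Machine        m;
    const uint64_t target = 0x2061; /* Stands for [ rip + 97 ] */

    memset( &m, 0, sizeof( m ) );

    /* Caller state, then the call and the usual prologue */
    m.rsp = 28; m.rbp = 30;
    m.rax = 0x11; m.rcx = 0x22; m.rdx = 0x33;
    m_push( &m, 0x1005 );       /* Return address pushed by call */
    m_push( &m, m.rbp );        /* push rbp                      */
    m.rbp = m.rsp;              /* mov  rbp, rsp                 */
    m.rsp -= 2;                 /* sub  rsp, 0x10 (two locals)   */
    m.mem[ m.rsp ]     = 0xAAAA;
    m.mem[ m.rsp + 1 ] = 0xBBBB;

    Machine before = m;

    /* The trick, up to the fake return */
    m_push( &m, m.rax ); m_push( &m, m.rcx ); m_push( &m, m.rdx );
    m.rcx = m.rsp;              /* mov rcx, rsp                  */
    m.rsp = m.rbp;              /* mov rsp, rbp                  */
    m.rbp = m_pop( &m );        /* pop rbp                       */
    m.rax = m_pop( &m );        /* pop rax                       */
    m.rdx = target;             /* lea rdx, [ rip + 97 ]         */
    m_push( &m, m.rdx ); m_push( &m, m.rdx );
    m.rip = m_pop( &m );        /* ret                           */

    /* The cleanup, executed at the hidden target */
    m.rdx = m_pop( &m );        /* pop  rdx                      */
    m_push( &m, m.rax );        /* push rax                      */
    m_push( &m, m.rbp );        /* push rbp                      */
    m.rbp = m.rsp;              /* mov  rbp, rsp                 */
    m.rsp = m.rcx;              /* mov  rsp, rcx                 */
    m.rdx = m_pop( &m ); m.rcx = m_pop( &m ); m.rax = m_pop( &m );

    int regs_ok  = m.rax == before.rax && m.rcx == before.rcx && m.rdx == before.rdx
                && m.rbp == before.rbp && m.rsp == before.rsp;
    int stack_ok = memcmp( &m.mem[ m.rsp ], &before.mem[ before.rsp ],
                           ( SLOTS - m.rsp ) * sizeof( uint64_t ) ) == 0;

    return regs_ok && stack_ok && m.rip == target;
}
```

The comparison only covers the live part of the stack, from rsp upwards. The slots below rsp still contain leftovers from the pushes, which is also true on the real machine.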
Now there's one more thing we can do, before continuing normal code execution.
We spoke about incomplete instructions, but we never actually used them.
Now is the right moment.
The idea of offsetting a disassembler is great, but I found in practice that many disassemblers are quite robust to incomplete instructions.
But now that we've messed so much with the disassembler's ability to generate a control graph, and to detect that we're actually inside a single function, the trick might be quite effective.
Now we're still in a valid code section, although it might not be recognised as such by the disassembler.
Let's do some shit, and jump to another valid code section:
    push rax
    xor  rax, rax
    jz   done
Pushing the rax register on the stack, zeroing it, and jumping to a done label.
Nothing scary here, I don't expect the disassembler to see the done label because of the mess we just did.
Now let's output an actual incomplete instruction:
    .byte 0x89
    .byte 0x84
    .byte 0xD9
For an x86-64 processor, that is:
0x89 is the opcode for the mov (r/m16/32/64, r16/32/64) instruction, where the register is the source and the r/m operand is the destination.
0x84 (1000 0100) is the MOD-REG-R/M byte, for a four-byte displacement following a SIB byte, with EAX (000) as source register (there is no REX.W prefix, so the operand is 32-bit).
0xD9 (1101 1001) is the SIB byte, for 8 as scale, RBX (011) as index and RCX (001) as base.
As you can see, the four displacement bytes are omitted, so the instruction is incomplete.
Assuming the disassembler is able to reach this location, this will fool it, as it will try to interpret the bytes of the next instructions as the displacement.
Note that the instruction, if complete, would translate to:
    mov [ rcx + rbx * 8 + displacement ], eax
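If you want to check the bit fields by hand, here is a minimal sketch in C that splits a MOD-REG-R/M byte and a SIB byte into their three fields. The type and function names are made up for illustration, and the scale is returned as the actual multiplier (1, 2, 4 or 8) rather than as the raw two bits.

```c
#include <stdint.h>

/* Fields of a MOD-REG-R/M byte: 2 bits, 3 bits, 3 bits */
typedef struct
{
    uint8_t mod;
    uint8_t reg;
    uint8_t rm;
}
ModRM;

/* Fields of a SIB byte: 2 bits (stored here as a multiplier), 3 bits, 3 bits */
typedef struct
{
    uint8_t scale;
    uint8_t index;
    uint8_t base;
}
SIB;

ModRM decode_modrm( uint8_t byte )
{
    ModRM m;

    m.mod = ( uint8_t )( byte >> 6 );
    m.reg = ( uint8_t )( ( byte >> 3 ) & 0x07 );
    m.rm  = ( uint8_t )( byte & 0x07 );

    return m;
}

SIB decode_sib( uint8_t byte )
{
    SIB s;

    s.scale = ( uint8_t )( 1u << ( byte >> 6 ) ); /* 00, 01, 10, 11: 1, 2, 4, 8 */
    s.index = ( uint8_t )( ( byte >> 3 ) & 0x07 );
    s.base  = ( uint8_t )( byte & 0x07 );

    return s;
}
```

For 0x84, this gives mod 2 (a 32-bit displacement follows), reg 0 and rm 4 (a SIB byte follows). For 0xD9, it gives a scale of 8, index 3 (RBX) and base 1 (RCX).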
Now we simply have to declare our done label, pop rax, and we can continue normal execution:
    done:
        pop rax
We're basically done, and I hope you found this article interesting.
Now remember this is a basic approach to some kind of obfuscation, for a specific platform.
In practice, I found that mixing different techniques in some specific way usually gives the best results.
That being said, disassemblers are very smart, and getting smarter each day.
Each one uses different heuristics, so as I said at the beginning of the article, there's really no silver bullet.
But if you're looking into obfuscation, my only hope is that this article gave you some ideas… : )
As always, you can find the code for the article on my GitHub.
Cheers!