Saturday, October 28, 2006

ELF - No Section Header? No Problem

During the research and development process of some private anti-malware tools of mine I have come to the conclusion that there just isnt enough documentation around on ELF malware analysis. This is obviosuly because the ratio of PE to ELF malware is probably around 100 to 1. This doesnt mean unix systems don't get infected or attacked on a daily basis. It just means the attacks are either more stealthy or dont occur as often.

Infection techniques are not neccessarily what I have been researching. Its more of the obfuscation techniques that common malware uses. Just about every ELF monkey has heard of the ELF Kickers suite of tools. One of the utilities that comes with that tarball is sstrip. Its a utility for stripping the section header from an ELF binary. In Linux, the linker and loader do not require that an ELF object of type ET_EXEC has a section header, only a program header is needed. The removal of the section header is done by changing the elf header e_shnum and e_shentsize values to 0 and then stripping or zeroing out the section header itself. Unfortunately ANY tools based on the BFD libraries are hopelessly dependent on the ELF section header. And whats worse many other open source and commercial tools can be _easily_ fooled by tweaking certain values in the section header (this is sad but true, but most will blindly follow the section header details without stopping to think thats NOT what the OS loader would do!).

SStriping the section header, and any other dead code in the ELF object is an obvious no brainer for a malware author. Not only does it save size but it makes the malware that much harder to analyze. This is nothing new, sstrip has been around for 5 years. Plenty of discussion has raged over this topic. What I find most surprising is the lack of section header rebuilding tools or tools that completely ignore the section header and parse the ELF object correctly (I am told IDA does this correctly, I do not own a copy). So in this post I present to you the correct way to partially rebuild an ELF section header while grabbing some more interesting data along the way (my example is rebuilding a partial symbol table). I present this information _not_ as something new, (like I said before sstrip is 5 years old now), but as information to help others write better tools in the future.

First off lets assume the ELF object we are analyzing has had its entire section header sstriped away. Lets also assume this binary was compiled with GCC and does have some symbol relocation happening at runtime. The very first thing we want to do is parse the program header for the DYNAMIC segment. Once we have the address of the DYNAMIC segment we want to iterate over that segment and use the Elf32_Dyn struct that is defined in /usr/include/elf.h

First let me explain what the dynamic segment holds. The dynamic segment is actually the start of the .dynamic section in a gcc compiled binary. It contains a wealth of values that will help us to begin piecing back together our ELF section header. Each one of our dynamic entries is going to contain a different value stored in the d_tag member of the struct. For example in our case if dyn->d_tag == DT_REL then this means that dyn->d_un.d_val == 'start of of a reloctable section'. In the case of our gcc binary its probably the .reldyn section. Many bits and pieces of the section header can be pieced back together this way. Offsets for sections such as .interp, .ctors, .strtab, .hash, .symtab, .reldyn and .relplt can be found using the DYNAMIC segment. Most of these sections will have to be parsed to find their other section header values such as size and sh_link members etc...

I can't very well end this post without a good example. So here is how to parse the symbol tables of a binary thats been sstriped using the data from the DYNAMIC segment.

Steps:

#1. Parse the program header to find the DYNAMIC segment
#2. Grab the address of your strtab section by finding a DYNAMIC segment entry with d_tag type of DT_STRTAB
#3. Grab the address of your symbol table by finding a DYNAMIC segment entry with d_tag type of DT_SYMTAB
#4. Get the size of the string table by reading the nchains value from the Hash table which you can find from the DYNAMIC segments d_tag of type DT_HASH
#5. Loop through the symbol table using the Elf32_Sym struct. Refer to the location of your string table and use the st_name value to find the string that corresponds with the symbol entry your parsing.

http://chris.rohlf.googlepages.com/phdr_syms.c.txt Please go there to find the code that accomplishes steps 1 through 5. I make no guarantee or warranty on this code whatsoever :)

One could easily build upon the code above to find the complete symbol data for a sstriped object by also parsing the relocatable sections of the binary by looking for dynamic segment entries with d_tag = DT_JMPREL || DT_RELDYN and using the Elf32_Rel structure. Thats all for now though. Thanks for reading.

UPDATE: Cleaned up some errors in the old code and now uses the hash tables nchains value to find the # of symbols

Tuesday, October 17, 2006

Nvidia Overflow

This Nvidia driver bug is a nasty one. I looked at the POC code and started thinking about all the different attack vectors for this. Quite scary. Think about it, this is a driver bug that can be reached by manipulating client software (malicious website, anything that talks to X etc...). Its about time someone started looking at closed source software in Linux. Much credit to Derek Abdine for this find.

Tuesday, October 10, 2006

Resolving ELF Relocation Name / Symbols

Updated October 4th 2007

I have updated this post to be more accurate as at least 5-10 people a day find it looking for information on ELF relocations/symbols.

From the ELF 1.2 standard:

"Relocation is the process of connecting symbolic references with symbolic definitions. For example, when a program calls a function, the associated call instruction must transfer control to the proper destination address at execution. In other words, relocatable files must have information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process's program image. Relocation entries are these data."

Each 'call' instruction in your ELF object that calls a function located in a shared object (the function puts() for example) is instead going to call an address in the Procedure Linkage Table (.plt). The PLT is going to resolve the functions real address at runtime. This is because we don't know where the shared library will be loaded in memory at runtime. Lets look at an example ELF object:

Heres a call to a location '0x804833c' within our .text segment:

$ objdump -d example-elf | grep 804833c | grep call
804843f: e8 f8 fe ff ff call 804833c

Here is our PLT. Notice the address '0x804833c' that was called. It's just a jmp to the location at *0x8049684.

$ objdump -d example-elf | grep "section .plt:" -A 31
Disassembly of section .plt:

080482fc <__gmon_start__@plt-0x10>:
80482fc: ff 35 70 96 04 08 pushl 0x8049670
8048302: ff 25 74 96 04 08 jmp *0x8049674
8048308: 00 00 add %al,(%eax)
...

0804830c <__gmon_start__@plt>:
804830c: ff 25 78 96 04 08 jmp *0x8049678
8048312: 68 00 00 00 00 push $0x0
8048317: e9 e0 ff ff ff jmp 80482fc <_init+0x18>

0804831c <__libc_start_main@plt>:
804831c: ff 25 7c 96 04 08 jmp *0x804967c
8048322: 68 08 00 00 00 push $0x8
8048327: e9 d0 ff ff ff jmp 80482fc <_init+0x18>

0804832c <__stack_chk_fail@plt>:
804832c: ff 25 80 96 04 08 jmp *0x8049680
8048332: 68 10 00 00 00 push $0x10
8048337: e9 c0 ff ff ff jmp 80482fc <_init+0x18>

0804833c :
804833c: ff 25 84 96 04 08 jmp *0x8049684
8048342: 68 18 00 00 00 push $0x18
8048347: e9 b0 ff ff ff jmp 80482fc <_init+0x18>

0804834c :
804834c: ff 25 88 96 04 08 jmp *0x8049688
8048352: 68 20 00 00 00 push $0x20
8048357: e9 a0 ff ff ff jmp 80482fc <_init+0x18>
(Remember that little * means the address at 0x8049684) If we look at whats at *0x8049684 we find an entry in the Global Offset Table (GOT) as shown as below.

$ objdump -s example-elf | grep got.plt -A3
Contents of section .got.plt:
804966c 98950408 00000000 00000000 12830408 ................
804967c 22830408 32830408 42830408 52830408 "...2...B...R...

At 0x8049684 we find another address '42830408' which in little endian is 08048342. If you go back up and look at our PLT you will find at 0x8048342 is a push instruction which is directly after our first jmp. It looks like this "push $0x18". 0x18 (24) is the offset into our relocation tables. That is followed by a jmp instruction back to the beginning of our PLT "jmp 80482fc". You may have noticed the beginning of the PLT looks slightly different then the rest of it. These first two instructions "pushl 0x8049670" and "jmp *0x8049674" are important. The first pushes the address at 0x8049670 on the stack (which points to our GOT), and then calls the address located at *0x8049674, which again is located in our GOT. Both of these addresses at those locations are going to be 0x0 in your ELF file, because they are filled in at runtime. At runtime the first value will be a number which identifies the particular library being used and the second will be an address of the linkers symbol resolution routines. These routines will use that offset of 0x18 that was pushed onto the stack earlier to resolve the correct relocation.

This process is called lazy linking, because the relocation is only resolved at runtime when it is needed. After the first time its looked up by the linker, the linker will then edit it's GOT entry, so that when the PLT performs its first jmp *, it will jump directly to the function instead of back to the PLT to push its offset into the relocation tables.

Now, how do we get symbol names for relocation entries? After all we want to see 'snprintf' instead of '0x8049688' ...

When a section has type SHT_REL it contains a bunch of structures that look like this:

typedef struct
{
Elf32_Addr r_offset; /* Address */
Elf32_Word r_info; /* Relocation type and symbol index */
} Elf32_Rel;


Lets look at readelf reading the relocation entries on an ELF object:

$ readelf -r /testbins/sha1

Relocation section '.rel.dyn' at offset 0x420 contains 3 entries:
Offset Info Type Sym.Value Sym. Name
0804b54c 00001106 R_386_GLOB_DAT 00000000 __gmon_start__
0804b598 00000505 R_386_COPY 0804b598 stderr
0804b59c 00000d05 R_386_COPY 0804b59c stdin

Relocation section '.rel.plt' at offset 0x438 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
0804b55c 00000107 R_386_JUMP_SLOT 00000000 feof
0804b560 00000207 R_386_JUMP_SLOT 00000000 putchar

Where do all these entries for each symbol come from?! Offset/Info/Type/Value/Name I didnt see any of those in that struct! Well most of these values are going to be pulled from the r_info field, or using a value from the r_info field to find what we are looking for. The offset value is where in our PLT you can find the relocation. If you were to disassemble your binary you would see a jmp to this address at the correct offset inside the PLT. The field we want to look at here is the r_info field. There are two (well actually three, the third is a combination of 1 and 2) important macros held in elf.h we will be using.

#define ELF32_R_SYM(val) ((val) >> 8)
#define ELF32_R_TYPE(val) ((val) & 0xff)

Each of these macros is designed to take in the r_info field and output a different result. Lets back up a step. The section header that has type SHT_REL (the relocation section we are looking at) is going to have another member called 'sh_link'. This holds an important value. The sh_link value is a pointer to another section thats going to hold symbolic information for these relocations - including its function name string. It is typically the section labeled 'dynsym' by GCC. This is readelf parsing the section header of /bin/ls (cropped for readability).

$ readelf -S /bin/ls
   There are 26 section headers, starting at offset 0x126d8:

Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
...
[ 4] .dynsym DYNSYM 080484a0 0004a0 0006b0 10 A 5 1 4
...
[ 8] .rel.dyn REL 08049170 001170 000028 08 A 4 0 4
[ 9] .rel.plt REL 08049198 001198 0002f0 08 A 4 11 4
...
Notice the sh_link member of both sections 8 and 9. Its the #4 which points us to our dynamic symbol table section. Thats where the symbol names that correspond with our relocation entries can be found. So lets recap. #1 scan the section header for a section that is type SHT_REL #2 grab its sh_link member value.

Ok so where does that leave us, we have a bunch of numbers but how do we know what matches up where. Well as we iterate through the relocation table entries we run each r_info value through our ELF32_R_SYM(val) macro and we get a number back. That number corresponds to an entry in the dynamic symbol table (or wherever our sh_link member points us to). Parse that entry in the dynamic symbol table and your symbol name will be resolved.

What about the other macro? Well the other ELF32_R_TYPE(val) is going to tell us what kind of relocation this is. These values are defined in elf.h as R_386_GOT32, R_386_JMP_SLOT and so on. Their definitions as well can be found in elf.h. These are very helpful for trying to find out if this relocation is a function or not (it usually is when found in the rel.plt section).

Remember, not all ELF objects will have a section header. In those cases you can use the program header, to find and parse the dynamic segment. Which will also give you the address of your relocation tables, dynamic symbol tables and more. If you find any errors in this post, feel free to let me know.

Free BSD Local DOS Bug (IDefense)

This summary is not available. Please click here to view the post.

Wednesday, October 04, 2006

The Linux Kernel Binary Format Handler (bugs?)

Ok well tonight I read this short paper from shellcode.com.ar. It covers a supposed 'bug' in the linux kernel ELF loader. It does not report this bug as a vulnerability, these guys have released some quality content in the past so I will not immediately toss this one to the side with a 'you have to be root.." blah blah. Ok well its true you have to be root to insert a kernel module, which is the technique they have used for hijacking the binary loader. Basically the bug can be described very easily, an attacker (who has already gained root access) can hijack the linked list of binary format handlers in the kernel. The default action of the kernel is to check an executable's format at runtime against that newly registered binary format handler instead of the pre-existing ones first. The authors of the paper said from the beginning it was a technique that could be used by a rootkit writer to stay hidden. I for one applaud their work for one simple reason , rootkit detection software now has one more place to look. But thats about it. Id like to see the kernel check against ELF, a.out, COFF before checking against the newly registered format but I am a firm believer in a static kernel anyway (I dont enable the ability to load kernel modules on my boxes).