Friday, July 03, 2009
Leaf - Hit Tracing
Here is what my basic hit tracer, 'lhit' (included with Leaf) implements:
1. LEAF_init() - a mandatory function that must be present in all plugins. You can use it to initialize any private data structures your plugin may need, or you can leave its function body blank.
2. LEAF_interactive() - this is the plugin hook a debugger would want to call. Ideally you only want *one* plugin calling this; it doesn't make sense to have more than one. If your plugin implements this hook it will be called after all other static analysis is finished. Consider it your debugger plugin's main().
3. LEAF_attach(pid_t) - takes a pid_t as its only argument and will attach the debugger to your target process.
4. LEAF_set_hittracer(pid_t, breakpoints_t, int) - this is where it gets slightly tricky. Your plugin must declare a structure somewhere of type breakpoints_t. Pass the target's pid, the breakpoints structure and a flag (ON/OFF) to this function and Leaf will automatically use the vector of function addresses it collected during static analysis and set breakpoints on each of them. There is no need for your plugin to manage any of this. There is also another function called LEAF_set_breakpoint, which takes a pid_t, a breakpoints_t structure, and the address you want to break on. You can use this for any other manual breakpoints you want to set.
5. LEAF_cont(pid_t) - this one is pretty self-explanatory. It takes a pid_t as its only argument and instructs the traced program to continue. At this point Leaf will handle calling wait() for you. All you have to do is inspect and handle the signal it returns. If you used LEAF_set_hittracer and you hit one of the breakpoints it set, you will want to call LEAF_reinstall_breakpoint. Leaf will then take care of putting the old instruction back, single stepping and reinstalling the breakpoint for you.
6. LEAF_get_regs(pid_t, user_regs_struct) - this will retrieve the process's registers for you.
7. LEAF_detach(pid_t) - will detach Leaf from your process.
8. LEAF_cleanup() - another mandatory plugin hook which you can use to free memory or close file descriptors, or you can leave it blank.
You will find an example hit tracer (lhit) which implements all of this in version 0.0.15 of Leaf here. It's not the best hit tracer in the world but it does the job. The debugger internals will be getting an overhaul soon, but the API should stay the same.
This new version of Leaf also contains my experimental LeafRub plugin which embeds a Ruby interpreter for scripting capabilities. An example LeafRub.rb script is also included, but I'll blog more about that later.
Saturday, June 20, 2009
Fun with erase()
Over the last few months I've been knee deep in C++. You can view this as a good thing or a bad thing; I for one enjoy it. I personally like finding bugs in C++ applications, as they are usually more complex than plain old C and require a bit more thought:
- Keeping track of which variables your destructor will take care of, and which it won't
- Iterators, and what methods invalidate them
- (insert your favorite C++ gotcha here)
While debugging a crash one day it occurred to me that the security research community has paid very little attention to the STL and C++-isms in general. There are a few things out there like TAOSSA's delete vs delete[] and of course there is also CERT's secure coding standards. But there is very little written on exploring STL-specific bugs. Maybe it's all private and I'm just not cool enough to see it :/
I decided to document some ways STL specific bugs may be exploited. The first place I looked was containers, you know vectors/queues/lists etc... Any use of these containers probably means lots of interesting data is being stored, and considering they all have very easy-to-use methods even novice C++ developers were (ab)using them somewhere.
Most of these methods take in iterators (don't let the name fool you, for a vector they're little more than pointers), and tainted iterators have been a known bad thing for a long time (read CERT's secure coding standards). But where were the exploits? Where were the how-to's on owning an attacker-influenced iterator? I decided to look into it myself.
I settled on using vectors as my first topic of interest, as they are widely used for their efficiency and ease of use. I further focused my efforts by looking at any method that added/moved/deleted multiple elements of data at a time from a container. The erase method seemed like a good candidate considering the amount of memory copies that take place under the hood.
The erase method either takes a single position within the container and removes it, or it takes a range supplied by two iterators and deletes the elements within that range. But I needed to see what it looked like under the hood. After navigating the tangled mess that is the GNU C++ templates (this is probably the real reason no one has done a lot of STL security research) I was able to isolate the relevant erase() code and find what I was looking for.
This is where you are probably getting bored, so I'll skip ahead and just tell you why you care about any of this.
Tainted iterators are a known C++ gotcha that every code auditor should know about, but in certain situations they can lead to very interesting conditions for an exploit writer. The CERT secure coding standard begins to touch on the subject of invalid iterator ranges, but labels their 'undefined behavior' as equivalent to a buffer overflow. This is true, however it can be more than that depending on the STL implementation. When an attacker can control the range iterators passed to erase(), he may be able to leak or directly overwrite memory contents or, even better, trick the STL into resizing the container to encapsulate adjacent heap memory (think 'other containers'). This opens up all kinds of doors for creative exploitation.
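To make the 'resizing' idea concrete, here is a toy model in plain C rather than real STL code. It assumes an implementation that moves the tail of the container down over the erased range and then sets the new end to first + (end - last) without checking that first <= last; check your own headers before relying on that. The names toy_vec and toy_erase are made up for this sketch, and the spare slots in the array stand in for adjacent heap memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_CAP 16

/* Toy stand-in for a vector<int>: 'size' live elements inside a larger
 * buffer. Slots at index >= size model memory adjacent to the container. */
struct toy_vec {
    int    data[TOY_CAP];
    size_t size;
};

/* Model of a range erase that trusts its arguments: the tail after 'last'
 * is moved to 'first', then the end is recomputed from that move. */
static void toy_erase(struct toy_vec *v, size_t first, size_t last)
{
    size_t tail;

    /* Guards for the model only, so the sketch itself stays in bounds. */
    assert(last <= v->size && first <= TOY_CAP);
    tail = v->size - last;               /* elements after the range */
    assert(first + tail <= TOY_CAP);

    if (first == last)
        return;
    if (last != v->size)
        memmove(&v->data[first], &v->data[last], tail * sizeof(int));
    v->size = first + tail;              /* grows when first > last */
}
```

With 8 elements, toy_erase(&v, 2, 5) shrinks the container to 5 elements as expected. Swapping the arguments, toy_erase(&v, 5, 2), copies six elements to index 5 (writing over slots 8 to 10) and leaves size at 11, so the container now 'owns' three slots it never allocated.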
I would love to post those details here, but blogger mangled my write up pretty bad. So I've uploaded it here. If you spot any inaccurate technical information please let me know.
Thursday, January 08, 2009
Leaf
I posted a new project on googlecode. Leaf is an ELF reversing framework written in C. It has a built-in API for developing your own analysis and output plugins. The current version (0.0.7) supports plugins written in C. The whole point of the project is flexibility in the analysis and output of the stuff you're interested in. It's not just another text-based disassembler, although a plugin that implements one can be easily written. In fact I released one with it and it's available for download at the website. I am slowly releasing other plugins of varying quality. There are plenty of great tools for reversing on the Win32 platform, so there is no plan to support the PE format. If you want more information on it check out the googlecode link and look at the wiki. It's still beta quality and there are definitely a few bugs. I hope you find it useful.
Update: Posted Leaf-0.0.10.tar.gz at http://leaf-re.googlecode.com It now uses udis86. Lots of work still to do, but it's a start.
Wednesday, June 25, 2008
BitStruct is great
require 'pcaplet'
require 'bit-struct'
# Fake protocol I made up for this example
class CustomProtocol < BitStruct
  char :header, 64, :endian => :native
  unsigned :length, 8, :endian => :native
  unsigned :next_hdr, 16, :endian => :little
  unsigned :next_tag, 16, :endian => :network
  unsigned :type, 32, :endian => :native
  rest :data
end
# Capture up to 1533 bytes
sniff = Pcaplet.new('-s 1533')
# Specific pcap filter so we only grab the protocol we are dissecting
pcap_filter = Pcap::Filter.new('tcp && port 34504 && src 192.168.1.10', sniff.capture)
sniff.add_filter(pcap_filter)
for pkt in sniff
  if pcap_filter =~ pkt
    puts pkt
    struct = CustomProtocol.new(pkt.tcp_data)
    puts sprintf("ASCII Header: %s\tLength: %x\tNext Hdr: %x\tNext Tag: %x\tType: %x\tData: %s",
                 struct.header, struct.length, struct.next_hdr, struct.next_tag, struct.type, struct.data)
  end
end
Tuesday, June 03, 2008
Known API's and automated static code analysis
It's interesting that the slides Halvar presented in 2004 on automating reverse engineering are still entirely relevant. He made a good point ... "no matter how stupid an analysis tool is, some programmers will make mistakes which are stupider". How true...
Friday, May 02, 2008
Self Protecting GOT
You can find the draft version 1.1 of my writeup here. If you find any mistakes let me know and I will fix them.
Friday, April 18, 2008
kmemcheck and an old bug
"kmemcheck is a patch to the linux kernel that detects use of uninitialized memory. It does this by trapping every read and write to memory that was allocated dynamically (e.g. using kmalloc()). If a memory address is read that has not previously been written to, a message is printed to the kernel log."The author provided a sample log file from the patch which is here. I spent a few minutes browsing it and I think it definitely shows promise for more than debugging. **Consider the case of these ELF loader vulnerabilities found by Paul Starzetz in 2004. Bug [1] is basically incorrect checking of the kernel_read() return value. Here's the bug:
...
size = elf_ex.e_phnum * sizeof(struct elf_phdr);
elf_phdata = (struct elf_phdr *) kmalloc(size, GFP_KERNEL);
if (!elf_phdata)
        goto out;
retval = kernel_read(bprm->file, elf_ex.e_phoff, (char *) elf_phdata, size);
if (retval < 0)
        goto out_free_ph;
...
The code above makes the incorrect assumption that kernel_read() only signals a failure by returning less than zero. It does return less than zero on error, however kernel_read() can also return a value greater than zero but less than 'size', which in this case leaves a portion of elf_phdata uninitialized. What's my point? I'm getting to that. An attacker can potentially control this uninitialized data and take control of a process image. Now this particular bug is pretty hard to trigger and even harder to exploit. But the important thing is kmemcheck may have caught this particular issue, and others like it. kmemcheck would fire off a log entry when the ELF loader goes to read the uninitialized data in elf_phdata, because technically the attacker-controlled data was never written to it in this context; it's old 'left over' data. Very neat stuff.
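The missing check is easier to see in a small userland analogue. The sketch below uses read() in place of kernel_read() and a made-up helper name, read_table; it is not the kernel fix, just the same idea of treating a short read as an error instead of only testing for a negative return value.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: succeed only when the whole table was read.
 * Returns 0 on success or a negative errno-style value on failure. */
static int read_table(int fd, void *buf, size_t size)
{
    ssize_t retval = read(fd, buf, size);

    if (retval < 0)
        return -errno;     /* hard error, the only case the old check caught */
    if ((size_t)retval != size)
        return -EIO;       /* short read: the tail of buf was never written */
    return 0;
}
```

A caller that only tested for a negative return value would happily go on to parse the unwritten tail of buf, which is exactly the read kmemcheck is designed to flag.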
The kernel allocators are a bit more complex than malloc in userland though. The slab code has many small details about it that can make or break a kmalloc based vulnerability, but the concept here is very intriguing regardless. You can grab the kmemcheck patches here.
**As a side note, I took a quick look at linux/fs/binfmt_elf_fdpic.c and found this bug in virtually the same place as Paul found it and in an additional spot as well, where the program interpreter is loaded. They affect a small population and have already been fixed.
Wednesday, March 19, 2008
CLD/STD and GCC 4.3.0
804de86: fc cld
804de87: f3 a4 rep movsb %ds:(%esi),%es:(%edi)
804de89: 89 c1 mov %eax,%ecx
804de8b: c1 e9 02 shr $0x2,%ecx
This instruction (CLD) clears a flag that determines which direction data should be written in (forward or backward). The flag itself is stored in the EFLAGS register. Clearing the flag with CLD sets the flag to 0 (forward). The STD instruction can then change this by setting the flag to 1 (backward). GCC no longer emits this instruction before inline string copies. This change is documented here. Technically this is right because the ABI states the direction flag should be cleared before entering any function (see page 38 under EFLAGS). The problem in this case is that the Linux kernel does not clear the flag when entering a signal handler. So in theory, if the flag is set to 1 for whatever reason and a signal then arrives, the handler may call something like memcpy or memmove with the flag still set. Since the CLD instruction is no longer used inline, the copy can write data in the wrong direction. This can obviously lead to security issues. I put together some x86 example code for this based on the x86_64 version posted to LKML. You can find it here.
./cld
Hit Ctrl+C
In signal handler...
DF = 1 (backward)
In signal handler...
DF = 1 (backward)
In signal handler...
DF = 0 (forward)
In signal handler...
DF = 0 (forward)
In signal handler...
DF = 1 (backward)
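For reference, the direction flag is bit 10 of EFLAGS, so checking it comes down to one mask. The sketch below is not the linked example code. It assumes GCC or clang on x86, where x86intrin.h provides __readeflags(), and the helper names direction_flag and direction_name are made up here. On other architectures the helper simply returns -1.

```c
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>      /* __readeflags() on GCC and clang (assumed) */
#define HAVE_EFLAGS 1
#else
#define HAVE_EFLAGS 0
#endif

#define EFLAGS_DF (1u << 10)    /* direction flag is bit 10 of EFLAGS */

/* Returns 1 if DF is set, 0 if clear, -1 if EFLAGS cannot be read here. */
static int direction_flag(void)
{
#if HAVE_EFLAGS
    return (__readeflags() & EFLAGS_DF) ? 1 : 0;
#else
    return -1;
#endif
}

/* Human-readable label matching the output format used above. */
static const char *direction_name(int df)
{
    if (df < 0)
        return "unknown";
    return df ? "backward" : "forward";
}
```

In ordinary code, printf("DF = %d (%s)\n", df, direction_name(df)) with df = direction_flag() should report 0, since the ABI requires the flag to be clear on function entry. Note that printf is not async-signal-safe, so a real signal handler should record the value and print it later.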
Monday, March 03, 2008
Updated: Spamhaus-Snort Correlation Script
I updated the Spamhaus-Snort correlation script today. I hope you find it useful.
Sunday, December 30, 2007
2008 Blogging
Sunday, December 23, 2007
Ret-2-libc Without Function Calls
From the paper:
Gadgets perform well defined operations, such as a load, an xor, or a jump. Return-oriented programming consists in putting gadgets together that will perform the desired operations.
...
These gadgets can be found in byte streams from libc within a process' memory. They are not injected due to W^X constraints on most platforms. ... Each of our gadgets expects to be entered in the same way: the processor executes a ret with the stack pointer, %esp, pointing to the bottom word of the gadget. This means that, in an exploit, the first gadget should be placed so that its bottom word overwrites some functions saved return address on the stack.
The technique is an interesting one. It reminds me of certain ret-2-text techniques that may fall into the middle of a long instruction to produce a jmp %reg trampoline. Overall the technique will vary from platform to platform because libc may be compiled differently from Fedora to Ubuntu, for example.
Randomized mmap() (randomized library base mappings), PIE (Position Independent Executables) and RANDEXEC hardening make this type of exploitation technique a bit harder to pull off. The paper is worth a read if you have the time.
Tuesday, November 27, 2007
Your favorite "better than C" scripting language is probably implemented in C
-------------------------------------------------------------------------
dialog = Gtk::MessageDialog.new(@main_app_window, Gtk::Dialog::MODAL,
                                Gtk::MessageDialog::INFO,
                                Gtk::MessageDialog::BUTTONS_CLOSE,
                                "%s - Was your string!" % my_string)
-------------------------------------------------------------------------
As you can see the variable my_string is placed in the message dialog text using a format specifier correctly according to the man page. I started to wonder what happened if this string contained a format specifier, would the underlying C libraries and bindings display it correctly? Surprise!

No, it was not displayed correctly. In fact it was vulnerable to a format string attack straight from the year 2001. UGH! Now you might argue - "Your fault for not sanitizing your string". Well, that's true to a point. But the MessageDialog interface is just a very deep abstraction layer over a printf() style function in the GTK C library. Unlike those functions, MessageDialog is not well documented as an 'easily misused' function.
Programmers typically trust their API to correctly sanitize and display their input, especially in scripting languages. This is because in scripting languages programmers feel they are safe from traditional C language vulnerabilities. This isn't always the case when your abstraction layers don't handle data correctly. My audit to find the offending code took about ten minutes but I narrowed it down to
ruby-gnome2-all-0.16.0/gtk/src/rbgtkmessagedialog.c
Where it calls GTK like this:
w = gtk_message_dialog_new(NIL_P(parent) ? NULL : GTK_WINDOW(RVAL2GOBJ(parent)),
                           RVAL2GFLAGS(flags, GTK_TYPE_DIALOG_FLAGS),
                           RVAL2GENUM(type, GTK_TYPE_MESSAGE_TYPE),
                           RVAL2GENUM(buttons, GTK_TYPE_BUTTONS_TYPE),
                           (const gchar*)(NIL_P(message) ? "": RVAL2CSTR(message)));
The variable 'message' is passed directly to GTK. I don't blame GTK authors for this one, it would be like blaming libc authors for printf()'s ability to print a variable without a format specifier. The GTK MessageDialog page shows the function prototype for gtk_message_dialog_new()
GtkWidget* gtk_message_dialog_new
(GtkWindow *parent, GtkDialogFlags flags, GtkMessageType type,
GtkButtonsType buttons, const gchar *message_format, ...);
parent: transient parent, or NULL for none
flags: flags
type: type of message
buttons: set of buttons to use
message_format: printf()-style format string, or NULL
...: arguments for message_format
So GTK is clearly expecting a proper format string, which should be properly passed to it by whatever API called it.
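One way a calling layer can honor that contract is to pass a constant "%s" format and hand the caller's text over as an argument. The sketch below shows the idea with a stand-in function, since it does not link against GTK. format_message and show_text are made-up names, and this is not a claim about how the Ruby/Gnome2 binding was actually patched.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for any printf()-style API that takes a message_format. */
static int format_message(char *out, size_t n, const char *message_format, ...)
{
    va_list ap;
    int written;

    va_start(ap, message_format);
    written = vsnprintf(out, n, message_format, ap);
    va_end(ap);
    return written;
}

/* What a wrapper can do: the caller's text is data, never the format. */
static int show_text(char *out, size_t n, const char *text)
{
    return format_message(out, n, "%s", text ? text : "");
}
```

Calling show_text(buf, sizeof buf, "%x.%x.%x") leaves the literal string "%x.%x.%x" in buf, because the specifiers are never interpreted.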
Example vulnerable code:
-------------------------------------------------------------------------
#!/usr/bin/env ruby
# ruby rubber.rb %x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x
require 'gtk2'
my_string = ARGV[0]
dialog = Gtk::MessageDialog.new(@main_app_window, Gtk::Dialog::MODAL,
                                Gtk::MessageDialog::INFO,
                                Gtk::MessageDialog::BUTTONS_CLOSE,
                                "%s - Was your string!" % my_string)
dialog.run
dialog.destroy
-------------------------------------------------------------------------
To avoid this issue in your Ruby code you could use the markup member. This will use the Pango markup language on your text. It's a workaround but it gets the job done.
-------------------------------------------------------------------------
my_string = ARGV[0]
dialog = Gtk::MessageDialog.new(@main_app_window, Gtk::Dialog::MODAL,
                                Gtk::MessageDialog::INFO,
                                Gtk::MessageDialog::BUTTONS_CLOSE)
dialog.markup = "#{my_string} - Was your string!"
dialog.run
dialog.destroy
-------------------------------------------------------------------------
Or alternatively you could do something like "my_string = my_string.gsub(/%/, "%%")" before calling MessageDialog.
Using google we can find some other projects vulnerable to similar bugs. Most just stick #{my_string} in the message, including example applications from the official Ruby/Gnome2 website.
That about wraps up this post. Other Ruby/Gnome2 API's may have similar 'functionality'. This should teach all the scripters out there a security lesson. Always remember your favorite "better than C" scripting language is probably implemented in C. Ruby/Gnome2 authors have been notified and they have committed a patch to SVN.
Thursday, November 22, 2007
What Every Programmer Should Know About Memory (PDF)
Link to PDF
It's going to take me a while to get through this (it's 114 pages long) - but so far it's a decent read. I'm currently cheating and searching through it for things that interest me. Right now I'm taking in section 7.3 'Measuring Memory Usage'. This section is particularly interesting to me because I've been toying with a project of mine lately that collects massive amounts of data. Searching and sorting that data efficiently has not been easy.
Ulrich states in the PDF that using libc's malloc to store a linked list you populate for later retrieval and use is probably a bad idea. This is true, because there's no guarantee malloc will return memory that is anywhere near the next member in the linked list. There are alternatives to the traditional libc malloc such as obstack and Google's TCMalloc. I have toyed with TCMalloc on several occasions. Despite all of my heap mappings being in sequential order it only seems to slow down my data hungry application.
Obstack on the other hand is generally fast because it starts out mapping a larger chunk of memory. Within that chunk you allocate smaller chunks to hold your data (in my case my linked list structures), which end up being in sequential order. That is of course a lot faster to access than a list that is fragmented.
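The locality argument can be shown with a tiny hand-rolled bump allocator. This is not obstack itself, just a sketch of the same idea: grab one large chunk up front and carve list nodes out of it in order. All names here (arena, arena_alloc, build_list) are made up for the example.

```c
#include <stddef.h>
#include <stdlib.h>

struct node {
    int          value;
    struct node *next;
};

/* One big chunk; allocations are carved off the front in order. */
struct arena {
    unsigned char *base;
    size_t         used;
    size_t         cap;
};

static int arena_init(struct arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->used = 0;
    a->cap  = a->base ? cap : 0;
    return a->base ? 0 : -1;
}

static void *arena_alloc(struct arena *a, size_t size)
{
    size_t align = sizeof(void *);   /* enough for struct node */
    void *p;

    size = (size + align - 1) & ~(align - 1);
    if (size > a->cap - a->used)
        return NULL;                 /* chunk exhausted */
    p = a->base + a->used;
    a->used += size;
    return p;
}

/* Build a list of 'count' nodes; each node sits right after the last. */
static struct node *build_list(struct arena *a, int count)
{
    struct node *head = NULL, **tail = &head;
    int i;

    for (i = 0; i < count; i++) {
        struct node *n = arena_alloc(a, sizeof *n);
        if (!n)
            break;
        n->value = i;
        n->next  = NULL;
        *tail = n;
        tail  = &n->next;
    }
    return head;
}

static void arena_free(struct arena *a)
{
    free(a->base);                   /* frees every node at once */
    a->base = NULL;
    a->used = a->cap = 0;
}
```

Walking a list built this way touches memory in ascending order, which is friendlier to the cache and prefetcher than nodes scattered across separate malloc calls. The trade-off is that individual nodes cannot be freed; the whole arena goes at once.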
There's lots of other good stuff in his paper, take a look for yourself.
Thursday, October 18, 2007
OSX Leopard - ASLR?
And now that all of this is on slashdot.org I'm sure the fanboi war will begin. Please let it be known that my official opinion is: it doesn't matter what OS you run, you can still get owned.
http://pax.grsecurity.net/docs/aslr.txt
Wednesday, October 03, 2007
Code Auditing Checklist
Years ago when I would try to audit a fairly large application like Apache, I simply got lost in its many functions and data structures, unable to get a good enough grasp of how it worked. By that point I had become frustrated and would probably move on to another application. Sometimes you get lucky and sometimes you walk away angry. There were never any good guidelines from the masters, only examples of vulnerable code. But without a thorough understanding of how a program works, I don't believe it's possible to get the most out of your time spent auditing it. I have written down a few simple steps to understand an application in less time, which means more time auditing for vulnerabilities.
1. Does the application have its own memory management? Many applications will have their own internal memory management instead of just allocating space when they need it. You will find many larger applications have memory structures that contain a pointer to some dynamic buffer, the total size of the buffer, the length of the data in that buffer, and perhaps a pointer to a function that needs the data. This will vary greatly from app to app, but understanding how this internal memory management works is absolutely key to finding any vulnerabilities related to mishandling of that memory. It's also important when exploiting a vulnerability you have found. Sometimes these higher abstraction layers can be abused.
2. Are there any functions that the application calls repeatedly? For example during a recent code audit I did there was a function that processed and stripped HTML characters from a string of user input. This function was called repeatedly throughout the application. I reviewed the function from start to end, making notes about how it could be called insecurely. So next time I came across another block of code that called that function I already knew what it did and I knew right away if it was being used correctly or not. Don't make the beginner mistake of trying to find all instances of str/memcpy abuses - when there are plenty of home grown functions that are just as lousy and widespread.
3. Macros, typedefs, defines, and structures - Study them and know them well. Most larger applications are going to typedef large structs or variables they use often. Large applications have many structures that are important to understanding their internals. A variable type can make a big difference between being vulnerable and not being vulnerable. Make a list on paper if you have to.
This is not an exhaustive list of how you should approach a code review. It is more of a quick checklist for understanding how an application works internally so you can spend more time finding bugs.
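As an illustration of item 1, here is roughly what such a home-grown memory structure and its accessor tend to look like. The names (managed_buf, managed_buf_append) are invented for this sketch and are not taken from any particular application.

```c
#include <stddef.h>
#include <string.h>

/* Typical shape of an application's internal buffer object. */
struct managed_buf {
    char  *data;                                   /* dynamic buffer */
    size_t size;                                   /* total capacity */
    size_t len;                                    /* bytes currently used */
    void (*consume)(const char *data, size_t len); /* optional consumer */
};

/* Append n bytes; returns 0 on success, -1 if they do not fit. */
static int managed_buf_append(struct managed_buf *b, const void *src, size_t n)
{
    /* Written as a subtraction so a huge n cannot wrap the comparison.
     * This relies on the invariant len <= size holding everywhere. */
    if (n > b->size - b->len)
        return -1;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}
```

When auditing, the questions to ask of every caller are whether len and size can ever disagree with the real allocation, and whether any code path writes to data without going through the checked accessor.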