require 'pcaplet'
require 'bit-struct'

# Fake protocol I made up for this example
class CustomProtocol < BitStruct
  char     :header,   64, :endian => :native
  unsigned :length,    8, :endian => :native
  unsigned :next_hdr, 16, :endian => :little
  unsigned :next_tag, 16, :endian => :network
  unsigned :type,     32, :endian => :native
  rest     :data
end

# Capture up to 1533 bytes
sniff = Pcaplet.new('-s 1533')

# Specific pcap filter so we only grab the protocol we are dissecting
pcap_filter = Pcap::Filter.new('tcp && port 34504 && src 192.168.1.10', sniff.capture)
sniff.add_filter(pcap_filter)

for pkt in sniff
  if pcap_filter =~ pkt
    puts pkt
    struct = CustomProtocol.new(pkt.tcp_data)
    puts sprintf("ASCII Header: %s\tLength: %x\tNext Hdr: %x\tNext Tag: %x\tType: %x\tData: %s",
                 struct.header, struct.length, struct.next_hdr, struct.next_tag, struct.type, struct.data)
  end
end
Wednesday, June 25, 2008
BitStruct is great
If you code in Ruby and do any binary parsing, then you need to be using BitStruct. It makes C-style structs in Ruby very easy. Sometimes you have to sniff a custom binary protocol the quick and dirty way, and these are the times I turn to Ruby instead of C. The BitStruct release has some good examples of parsing network protocols, but using raw sockets in Ruby is ugly. I prefer to use the LibPcap wrappers instead, for the awesomeness of pcap filters.
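To make the field layout of the made-up CustomProtocol concrete, here is a minimal sketch that decodes the same fields with nothing but Ruby's built-in String#unpack. It is only an illustration and not how BitStruct works internally; the `parse_custom_protocol` helper, the constant name, and the sample bytes are hypothetical.

```ruby
# Hypothetical helper: decodes the same made-up layout as CustomProtocol
# using only core Ruby (no bit-struct, no pcap).
#   a8 -> 8 raw bytes (the 64-bit ASCII header)
#   C  -> 8-bit unsigned (length)
#   v  -> 16-bit unsigned, little-endian (next_hdr)
#   n  -> 16-bit unsigned, network byte order (next_tag)
#   L  -> 32-bit unsigned, native byte order (type)
#   a* -> everything that is left (data)
CUSTOM_PROTOCOL_FORMAT = 'a8CvnLa*'

def parse_custom_protocol(payload)
  header, length, next_hdr, next_tag, type, data =
    payload.unpack(CUSTOM_PROTOCOL_FORMAT)
  { header: header, length: length, next_hdr: next_hdr,
    next_tag: next_tag, type: type, data: data }
end

# Build a sample payload with the matching pack format, then parse it back.
sample = ['PROTO001', 5, 0x1234, 0xABCD, 7, 'hello'].pack(CUSTOM_PROTOCOL_FORMAT)
fields = parse_custom_protocol(sample)
puts format("ASCII Header: %s\tLength: %x\tNext Hdr: %x\tNext Tag: %x\tType: %x\tData: %s",
            fields[:header], fields[:length], fields[:next_hdr],
            fields[:next_tag], fields[:type], fields[:data])
```

For anything beyond a throwaway script BitStruct is still the nicer option, since the fields get names and accessors instead of a cryptic format string.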
Tuesday, June 03, 2008
Known APIs and automated static code analysis
I did some quick work a few weeks ago on automating static code analysis by using known APIs to generate information about data structures and logic flow. The work is not groundbreaking, but I felt the techniques are quite useful and I wanted to document them clearly for myself and others. You can grab the short paper here.
It's interesting that the slides Halvar presented in 2004 on automating reverse engineering are still entirely relevant. He made a good point ... "no matter how stupid an analysis tool is, some programmers will make mistakes which are stupider". How true...
Friday, May 02, 2008
Self Protecting GOT
I had some time to kill over the past few days and I wanted to explore an idea I had a few months ago. The idea is to protect the ELF GOT (Global Offset Table), and other segments of memory, from userland without the support of the 'relro' functionality now found in the GNU dynamic linker. I accomplished it through techniques such as linker script modification and constructor functions. No kernel modifications are needed, and I have tested it on a semi-large project (Snort IDS).
You can find the draft version 1.1 of my writeup here. If you find any mistakes let me know and I will fix them.
Friday, April 18, 2008
kmemcheck and an old bug
I wanted to do a quick post about 'kmemcheck' because I think the concept is pretty cool. It's a debugging patch, now in its 7th revision, that has been proposed for the mainline Linux kernel in 2.6.26. The idea is pretty simple but has lots of security uses...
"kmemcheck is a patch to the linux kernel that detects use of uninitialized memory. It does this by trapping every read and write to memory that was allocated dynamically (e.g. using kmalloc()). If a memory address is read that has not previously been written to, a message is printed to the kernel log."
The author provided a sample log file from the patch, which is here. I spent a few minutes browsing it and I think it definitely shows promise for more than debugging. **Consider the case of these ELF loader vulnerabilities found by Paul Starzetz in 2004. Bug [1] is basically incorrect checking of the kernel_read() return value. Here's the bug:
...
size = elf_ex.e_phnum * sizeof(struct elf_phdr);
elf_phdata = (struct elf_phdr *) kmalloc(size, GFP_KERNEL);
if (!elf_phdata)
    goto out;

retval = kernel_read(bprm->file, elf_ex.e_phoff, (char *) elf_phdata, size);
if (retval < 0)
    goto out_free_ph;
...
The code above assumes that the only way kernel_read() can fail is by returning less than zero. It does return a negative value on error, but it can also return a value greater than zero and less than 'size', which in this case leaves a portion of elf_phdata uninitialized. What's my point? I'm getting to that. An attacker can potentially control this uninitialized data and take control of a process image. Now this particular bug is pretty hard to trigger and even harder to exploit. But the important thing is that kmemcheck may have caught this particular issue, and others like it. kmemcheck would fire off a log entry when the ELF loader goes to read the uninitialized data in elf_phdata, because technically the attacker-controlled data was never written to it in this context; it's old 'left over' data. Very neat stuff.
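The detection idea can be sketched as a toy model in Ruby: every byte of a buffer gets a shadow flag that records whether it has been written since allocation, and a read of an unflagged byte is logged. This is only an illustration of the concept. The real patch traps memory accesses inside the kernel, and the `TrackedBuffer` class and its method names are made up for this example.

```ruby
# Toy model of the idea behind kmemcheck. Class and method names are
# hypothetical; nothing here reflects the patch's actual implementation.
class TrackedBuffer
  attr_reader :warnings

  def initialize(size)
    @data     = Array.new(size, 0xAA)   # stale 'left over' bytes
    @written  = Array.new(size, false)  # shadow state, one flag per byte
    @warnings = []
  end

  # Store bytes and mark them initialized. Returns the number of bytes
  # stored, like a read call that fills a buffer.
  def fill(offset, bytes)
    bytes.each_with_index do |byte, i|
      @data[offset + i]    = byte
      @written[offset + i] = true
    end
    bytes.length
  end

  # Fetch one byte, logging a warning if it was never written.
  def read_byte(offset)
    unless @written[offset]
      @warnings << "read of uninitialized byte at offset #{offset}"
    end
    @data[offset]
  end
end

size   = 8
buf    = TrackedBuffer.new(size)
retval = buf.fill(0, [1, 2, 3])        # short read: 3 of 8 bytes arrive

# The check from the buggy code only rejects negative values...
puts 'retval < 0 check passes' unless retval < 0
# ...while comparing against the requested size notices the short read.
puts 'short read detected' if retval != size

size.times { |i| buf.read_byte(i) }    # the consumer reads all 8 bytes
puts buf.warnings
```

The five bytes that were never filled each produce a warning, which is the kind of log entry described above.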
The kernel allocators are a bit more complex than malloc in userland though. The slab code has many small details about it that can make or break a kmalloc based vulnerability, but the concept here is very intriguing regardless. You can grab the kmemcheck patches here.
**As a side note, I took a quick look at linux/fs/binfmt_elf_fdpic.c and found this bug in virtually the same place as Paul found it and in an additional spot as well, where the program interpreter is loaded. They affect a small population and have already been fixed.
Wednesday, March 19, 2008
CLD/STD and GCC 4.3.0
Some of you may have seen this already. It's a very subtle bug, exposed by GCC 4.3.0, that manifests itself in an interesting way. Here's a quick overview. In its latest version, GCC has changed a very small detail. Before version 4.3.0, GCC would insert a CLD (Clear Direction Flag) instruction before any inline string copy functions, as shown below:
 804de86:  fc         cld
 804de87:  f3 a4      rep movsb %ds:(%esi),%es:(%edi)
 804de89:  89 c1      mov    %eax,%ecx
 804de8b:  c1 e9 02   shr    $0x2,%ecx
This instruction (CLD) clears a flag that determines which direction data should be written in (forward or backward). The flag itself is stored in the EFLAGS register. Clearing the flag with CLD sets it to 0 (forward). The STD instruction can then change this by setting the flag to 1 (backward). GCC no longer emits CLD before inline string copies. This change is documented here. Technically this is right, because the ABI states the direction flag should be cleared before entering any function (see page 38 under EFLAGS). The problem in this case is that the Linux kernel does not clear the flag when entering a signal handler. So in theory, if the flag is set to 1 for whatever reason when a signal gets tripped, and the handler calls something like memcpy or memmove, the copy can write data in the wrong direction because the CLD instruction is no longer emitted inline. This can obviously lead to security issues. I put together some x86 example code for this based on the x86_64 version posted to LKML; you can find it here.
./cld
Hit Ctrl+C
In signal handler...
DF = 1 (backward)
In signal handler...
DF = 1 (backward)
In signal handler...
DF = 0 (forward)
In signal handler...
DF = 0 (forward)
In signal handler...
DF = 1 (backward)
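As a rough illustration of why a stale DF matters, here is a toy model in Ruby of a `rep movsb`-style byte copy. It is not real CPU code and not the example linked above; the `rep_movsb` helper and the 16-byte 'memory' are made up for this sketch.

```ruby
# Toy model of `rep movsb`: copies `count` bytes from index `src` to index
# `dst`, stepping both indexes forward when df is 0 and backward when df is 1.
def rep_movsb(mem, src, dst, count, df)
  step = df.zero? ? 1 : -1
  count.times do
    mem[dst] = mem[src]
    src += step
    dst += step
  end
  mem
end

# Source "abcd" lives at 4..7, the destination buffer at 12..15.
layout = 'wxyzabcd....____'

forward  = rep_movsb(layout.chars, 4, 12, 4, 0).join
backward = rep_movsb(layout.chars, 4, 12, 4, 1).join

puts "DF = 0 (forward):  #{forward}"
puts "DF = 1 (backward): #{backward}"
```

With DF = 0 the destination ends up holding "abcd". With DF = 1 the same call copies only the first byte into place and then walks downward, pulling in bytes that sit below the source and overwriting bytes that sit below the destination.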
Monday, March 03, 2008
Updated: Spamhaus-Snort Correlation Script
If you have ever worked in security operations, you should be pretty familiar with the daily pains of trying to detect and stop malware before it gets into your network environment. There are plenty of sources out there to help you out. Last year I toyed with the concept of correlating my Snort alert sources with the Spamhaus DNS blacklist. The results were pretty much what I expected. A lot of the unsolicited attacks and probes picked up by my IDS were coming from hosts that were on the Spamhaus blacklist. This is presumably because the same botnet hosts that are sending spam are also scanning for other victims and hosting malicious client-side exploits. This really isn't 'news' - but what I find disturbing is that there doesn't seem to be any correlation between some of these defenses. Specifically, when my mail filter rejects a spam due to a hit on the Spamhaus XBL (exploits/trojans list etc...), it stops there. Why not send that offending IP to my firewall and blacklist it? I know there are IDSs that will send this type of information to the firewall when an alert is triggered. Are there any anti-spam technologies out there doing this? If any big anti-spam vendors start doing this, be sure to send me consulting work :)
I updated the Spamhaus-Snort correlation script today. I hope you find it useful.
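For readers who have not queried a DNS blacklist before, the lookup behind this kind of correlation is simple: reverse the octets of the IPv4 address, append the list's zone, and see whether the name resolves. The sketch below is not the script mentioned above. The helper names and the sample addresses are hypothetical, `xbl.spamhaus.org` is assumed as the zone, and treating any answer as "listed" is a simplification, since a real script should inspect the return codes the list operator documents.

```ruby
require 'resolv'

# Build the DNSBL query name for an IPv4 address: octets reversed, zone appended.
def dnsbl_query_name(ip, zone = 'xbl.spamhaus.org')
  octets = ip.split('.')
  valid  = octets.length == 4 &&
           octets.all? { |o| o.match?(/\A\d{1,3}\z/) && o.to_i <= 255 }
  raise ArgumentError, "not an IPv4 address: #{ip}" unless valid
  (octets.reverse << zone).join('.')
end

# True if the query name resolves (assumption: any answer means "listed").
def listed?(ip, zone = 'xbl.spamhaus.org')
  Resolv.getaddress(dnsbl_query_name(ip, zone))
  true
rescue Resolv::ResolvError
  false
end

# Hypothetical source addresses pulled from Snort alerts. Only the query
# names are printed here; call listed?(ip) to perform the actual DNS lookup.
alert_sources = ['192.0.2.10', '198.51.100.7']
alert_sources.each { |ip| puts "#{ip} -> #{dnsbl_query_name(ip)}" }
```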
Sunday, December 23, 2007
Ret-2-libc Without Function Calls
Someone posted a link to this paper (http://www.cse.ucsd.edu/~hovav/papers/s07.html) on Full Disclosure the other day. I had not seen it before. It discusses ret-2-libc attacks without using functions. Instead the authors use what they call 'gadgets', which in plain technical terms means finding unintended code sequences in executable pages of memory that can be strung together to execute arbitrary code. The authors present it as a way to defeat W^X protections.
From the paper:
Gadgets perform well defined operations, such as a load, an xor, or a jump. Return-oriented programming consists in putting gadgets together that will perform the desired operations.
...
These gadgets can be found in byte streams from libc within a process' memory. They are not injected due to W^X constraints on most platforms. ... Each of our gadgets expects to be entered in the same way: the processor executes a ret with the stack pointer, %esp, pointing to the bottom word of the gadget. This means that, in an exploit, the first gadget should be placed so that its bottom word overwrites some function's saved return address on the stack.
The technique is an interesting one. It reminds me of certain ret-2-text techniques that may fall into the middle of a long instruction to produce a jmp %reg trampoline. Overall the technique will vary from platform to platform, because libc may be compiled differently on Fedora than on Ubuntu, for example.
Randomized mmap() (randomized library base mappings), PIE (Position Independent Executables) and RANDEXEC hardening make this type of exploitation technique a bit harder to pull off. The paper is worth a read if you have the time.
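To give a feel for where gadgets come from, here is a very rough sketch of the first step of gadget hunting: find every `ret` opcode (0xC3 on x86) in a blob of bytes and list the short byte sequences that end at it. This is not the paper's algorithm. The `candidate_gadgets` helper is hypothetical, and a real tool would disassemble each candidate to check that it decodes into useful instructions.

```ruby
RET_OPCODE = 0xC3 # x86 near `ret`

# Hypothetical helper: for every ret byte in `code`, return the byte
# sequences of up to `max_prefix` preceding bytes plus the ret itself.
def candidate_gadgets(code, max_prefix = 4)
  bytes = code.bytes
  found = []
  bytes.each_with_index do |byte, ret_at|
    next unless byte == RET_OPCODE
    1.upto(max_prefix) do |len|
      start = ret_at - len
      break if start < 0
      found << { offset: start, bytes: bytes[start..ret_at] }
    end
  end
  found
end

# nop; pop %eax; ret
sample = [0x90, 0x58, 0xC3].pack('C*')
candidate_gadgets(sample).each do |gadget|
  hex = gadget[:bytes].map { |b| format('%02x', b) }.join(' ')
  puts format('offset %d: %s', gadget[:offset], hex)
end
```

Because the scan works on raw bytes rather than instruction boundaries, it also turns up sequences that start in the middle of an intended instruction, which is where the unintended code sequences come from.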
Tuesday, November 27, 2007
Your favorite "better than C" scripting language is probably implemented in C
I was writing an application front-end in Ruby/Gnome2 and I needed to produce an error message for the user that contained a string the user had previously input. My MessageDialog code looked like this:

-------------------------------------------------------------------------
dialog = Gtk::MessageDialog.new(@main_app_window, Gtk::Dialog::MODAL,
                                Gtk::MessageDialog::INFO,
                                Gtk::MessageDialog::BUTTONS_CLOSE,
                                "%s - Was your string!" % my_string)
-------------------------------------------------------------------------
As you can see, the variable my_string is placed in the message dialog text using a format specifier, correctly according to the man page. I started to wonder what would happen if this string contained a format specifier. Would the underlying C libraries and bindings display it correctly? Surprise!

No, it was not displayed correctly. In fact it was vulnerable to a format string attack straight from the year 2001. UGH! Now you might argue - "Your fault for not sanitizing your string". Well, that's true to a point. But the MessageDialog interface is just a very deep abstraction layer over a printf()-style function in the GTK C library. And unlike those functions, MessageDialog is not well documented as an 'easily mis-used' function.
Programmers typically trust their API to correctly sanitize and display their input, especially in scripting languages. This is because in scripting languages programmers feel they are safe from traditional C language vulnerabilities. This isn't always the case when your abstraction layers don't handle data correctly. My audit to find the offending code took about ten minutes, and I narrowed it down to
ruby-gnome2-all-0.16.0/gtk/src/rbgtkmessagedialog.c
Where it calls GTK like this:
w = gtk_message_dialog_new(NIL_P(parent) ? NULL : GTK_WINDOW(RVAL2GOBJ(parent)),
                           RVAL2GFLAGS(flags, GTK_TYPE_DIALOG_FLAGS),
                           RVAL2GENUM(type, GTK_TYPE_MESSAGE_TYPE),
                           RVAL2GENUM(buttons, GTK_TYPE_BUTTONS_TYPE),
                           (const gchar*)(NIL_P(message) ? "": RVAL2CSTR(message)));
The variable 'message' is passed directly to GTK as the format string. I don't blame the GTK authors for this one; it would be like blaming libc authors for printf()'s ability to print a variable without a format specifier. The GTK MessageDialog page shows the function prototype for gtk_message_dialog_new():
GtkWidget* gtk_message_dialog_new
    (GtkWindow *parent, GtkDialogFlags flags, GtkMessageType type,
     GtkButtonsType buttons, const gchar *message_format, ...);

    parent:         transient parent, or NULL for none
    flags:          flags
    type:           type of message
    buttons:        set of buttons to use
    message_format: printf()-style format string, or NULL
    ...:            arguments for message_format
So GTK is clearly expecting a proper format string, which should be properly passed to it by whatever API called it.
Example vulnerable code:
-------------------------------------------------------------------------
#!/usr/bin/env ruby
# ruby rubber.rb %x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x
require 'gtk2'
my_string = ARGV[0]
dialog = Gtk::MessageDialog.new(@main_app_window, Gtk::Dialog::MODAL,
                                Gtk::MessageDialog::INFO,
                                Gtk::MessageDialog::BUTTONS_CLOSE,
                                "%s - Was your string!" % my_string)
dialog.run
dialog.destroy
-------------------------------------------------------------------------
To avoid this issue in your Ruby code you could use the markup member. This will use the Pango markup language on your text. It's a workaround, but it gets the job done.
-------------------------------------------------------------------------
my_string = ARGV[0]
dialog = Gtk::MessageDialog.new(@main_app_window, Gtk::Dialog::MODAL,
                                Gtk::MessageDialog::INFO,
                                Gtk::MessageDialog::BUTTONS_CLOSE)
dialog.markup = "#{my_string} - Was your string!"
dialog.run
dialog.destroy
-------------------------------------------------------------------------
Or alternatively you could do something like "my_string = my_string.gsub(/%/, "%%")" before calling MessageDialog.
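The gsub workaround can be wrapped in a small helper and checked without GTK at all. This is a minimal sketch: the `escape_format` name is hypothetical, and Ruby's own format() is used only as a stand-in for the printf()-style function on the C side, since both treat '%%' as a literal percent sign.

```ruby
# Hypothetical helper: doubles every '%' so a printf()-style consumer prints
# the text literally instead of interpreting it as format specifiers.
def escape_format(text)
  text.gsub(/%/, '%%')
end

my_string = '%x.%x.%x' # attacker-style input
message   = "#{escape_format(my_string)} - Was your string!"

# Stand-in for the C side: the escaped message comes back out verbatim.
puts format(message)
```

In the dialog code above, that would mean passing escape_format(my_string) into the message text instead of the raw my_string.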
Using Google we can find some other projects vulnerable to similar bugs. Most just stick #{my_string} in the message, including example applications from the official Ruby/Gnome2 website.
That about wraps up this post. Other Ruby/Gnome2 APIs may have similar 'functionality'. This should teach all the scripters out there a security lesson. Always remember your favorite "better than C" scripting language is probably implemented in C. Ruby/Gnome2 authors have been notified and they have committed a patch to SVN.
Thursday, November 22, 2007
What Every Programmer Should Know About Memory (PDF)
I just came across this PDF on reddit.com titled "What every programmer should know about memory". It's written by Ulrich Drepper from Red Hat; you should know who he is.
Link to PDF
It's going to take me a while to get through this (it's 114 pages long) - but so far it's a decent read. I'm currently cheating and searching through it for things that interest me. I'm currently taking in section 7.3 'Measuring Memory Usage'. This section is particularly interesting to me because I've been toying with a project of mine lately that collects massive amounts of data. Searching and sorting that data efficiently has not been easy.
Ulrich states in the PDF that using libc's malloc to store a linked list you populate for later retrieval and use is probably a bad idea. This is true, because there's no guarantee malloc will return memory that is anywhere near the next member in the linked list. There are alternatives to the traditional libc malloc, such as obstack and Google's TCMalloc.
There's lots of other good stuff in his paper, take a look for yourself.
Thursday, October 18, 2007
OSX Leopard - ASLR?
A lot of mainstream media is reporting OSX will be getting ASLR (Address Space Layout Randomization). However, OSX's new features page says 'library randomization', not ASLR. I'm not an OSX user, but I think some clarification is needed here. ASLR is a pretty vague term to apply to this. The PaX implementation, for example, describes ASLR as randomization of many different regions of a process's memory. The true die-hard in me reserves the term ASLR for a wider randomization implementation covering stack base, mmap, .text base and many others, not just library mappings.
And now that all of this is on slashdot.org I'm sure the fanboi war will begin. Please let it be known that my official opinion is: it doesn't matter what OS you run, you can still get owned.
http://pax.grsecurity.net/docs/aslr.txt
Wednesday, October 03, 2007
Code Auditing Checklist
When I audit any code I always follow the same steps to familiarize myself with the application and give me a better sense of its internals. I was giving this advice to a friend over IM today, and I thought it would make a good blog post for others.
Years ago when I would try to audit a fairly large application like Apache, I simply got lost in its many functions and data structures, unable to get a good enough grasp of how it worked. By that point I had become frustrated and would probably move on to another application. Sometimes you get lucky and sometimes you walk away angry. There were never any good guidelines from the masters, only examples of vulnerable code. But without a thorough understanding of how a program works, I don't believe it's possible to get the most out of your time spent auditing it. I have written down a few simple steps to understand an application in less time, which means more time auditing for vulnerabilities.
1. Does the application have its own memory management? Many applications will have their own internal memory management instead of just allocating space when they need it. You will find many larger applications have memory structures that contain a pointer to some dynamic buffer, the total size of the buffer, the length of the data in that buffer, and perhaps a pointer to a function that needs the data. This will vary greatly from app to app, but understanding how this internal memory management works is absolutely key to finding any vulnerabilities related to mishandling of that memory. It's also important when exploiting a vulnerability you have found. Sometimes these higher abstraction layers can be abused.
2. Are there any functions that the application calls repeatedly? For example, during a recent code audit I did there was a function that processed and stripped HTML characters from a string of user input. This function was called repeatedly throughout the application. I reviewed the function from start to end, making notes about how it could be called insecurely. So the next time I came across a block of code that called that function, I already knew what it did and I knew right away if it was being used correctly or not. Don't make the beginner mistake of trying to find all instances of str/memcpy abuses when there are plenty of home-grown functions that are just as lousy and widespread.
3. Macros, typedefs, #defines, and structures - Study them and know them well. Most larger applications are going to typedef large structs or variables they use often. Large applications have many structures that are important to understanding their internals. A variable type can make a big difference between being vulnerable and not being vulnerable. Make a list on paper if you have to.
This is not an exhaustive list of how you should approach a code review. It is more of a quick checklist for understanding how an application works internally, so you can spend more time finding bugs.
Tuesday, October 02, 2007
1 Year Has Passed
I just realized this blog turned one year old a few weeks ago, and I'm still not at 50 posts. That's pretty sad, I'll have to pick up the pace. Over the past year I have blogged about various topics such as security, ELF, Linux, random security headlines and more. Sometimes even 'real' tech media will quote my posts. Does a lack of comments indicate no one finds what you have to say interesting? I hope not.
The blog averages about 20-40 hits a day from various google keyword searches and links to it. From what I can tell there's an additional 75 to 100 people who subscribe to the RSS feed via feedburner, bloglines, google and a few others I've never heard of. Thanks for reading for the past year. As long as I have readers I will continue to post :)
Saturday, September 29, 2007
Blackboxes and Trust
I'm sure you've heard the saying "you wouldn't buy a car that had the hood sealed shut would you?" - followed up by an open source zealot explaining to you why that analogy works for software. Well, I actually do agree with that analogy. Anton Chuvakin put it into better words than I ever could in this blog post.
Every single day very large and important organizations rely on software to keep themselves running (hospitals, infrastructure control, intelligence agencies, the military ... and so on). Yet nearly none of these organizations are legally allowed to see the source code of that software. There is just absolute blind trust in its ability to work correctly and be reliable. Not to mention secure.
Where is the proof this software isn't full of backdoors, vulnerabilities, logic bugs or more? Organizations such as those above need to start asking (demanding) that their vendors provide some real proof that the source code or binary was audited by a third party - i.e. not the original developers of the software. This proof works both ways. It gives the company the chance to say "hey - we can't catch all the bugs, but we did our best, and that's why you should choose us over our competition". And customers are given a little more trust in the investment they just made, because now they know their vendor went further than the competition to produce a better quality product.
Let's take Windows Vista for example - many hackers have audited its source code while on Microsoft's payroll. This is a good thing, and Microsoft can now say to customers "YES we did audit our code after development", which is a lot more than most other vendors out there can say. The flip side to this argument is open source. Just because the source is open doesn't mean people have reviewed it for vulnerabilities (download a random sourceforge project and you will understand what I mean). But on the other hand, it does give the customer/user the ability to inspect the software they are relying on so heavily.
How many of you can honestly say the software products your company relies on have been audited by a third party?
Monday, September 24, 2007
Some Thoughts On Virtualization and Security
With high profile VMware vulnerabilities just hitting the news, it's easy to find some mainstream articles covering the subject. This post isn't about hypervisor rootkits (because we're all tired of hearing about that), but more about the assumption in corporations and academia that (virtualization == security). This is just plain WRONG. Virtualization environments are extremely complex pieces of software - and with complexity comes insecurity. In fact I would venture as far as to say that by default (virtualization == insecurity); running two operating systems within the same machine just creates more attack surface. Considering the high degree of interaction the host and guest OS must have, you inherently create a greater possibility of vulnerability than if they were on separate hardware. And just because VMs are easy to create and re-create doesn't mean they shouldn't be secured as well. As we have seen from this latest VMware vulnerability, there's always the possibility your guest VM can compromise your host OS. It should also be noted that once the host OS has been hijacked, ALL of your guest VMs should be considered compromised and untrusted. In order for the attacker to completely own your virtualization environment he/she has to know exactly what host OS is being used. There needs to be more fool-proof research into this area before widespread panic can begin. There will also hopefully be more utilization of the host OS/virtualizer as a Virtual IDS (VIDS) of sorts - to tell us when our virtual machines have been compromised. This use hasn't been explored enough in my opinion.
Now it's true some virtualization technologies were designed with security in mind and others were meant to increase the efficiency and productivity of hardware. This should be considered when deciding which virtualization strategy to use. But companies should also be aware of the security issues they may introduce by improperly implementing a virtualization strategy, as they may be causing more harm than it's worth.
Saturday, September 22, 2007
A good presentation by FX ....
I just read a pretty good presentation by FX (Felix Lindner) called "Security and Attack Surface of Modern Applications". He presented it at HITB 2007 (I did not attend). As FX describes it, his presentation is not about hex and 0day ;( but more about how security problems are not being fixed and things are rapidly progressing downhill. He makes some very good points such as "Respect that software is there to solve real problems for people, security isn’t one of them." And this is very true; the security community tends to forget this detail most of the time. His presentation has some excellent numbers associated with vulnerability classes and what attackers focused on from the late nineties to today.
One subject he touches on which is of interest to me is perimeter security. While it's true most attackers focus on client side exploits today, perimeter security should not be forgotten just because we tunnel 50% of our applications over HTTP. Client side exploits allow attackers to create larger botnets. But client side vulnerabilities aren't always the first pick in a targeted attack. Well, they can be (MS Office parsing vulns - google for what I mean). But targeted attacks can involve something specific to that target: a mis-configured web server or email server, etc. To FX's point, combining all of these different technologies (VPN termination, LDAP, SSL etc) into the firewall is _not_ the way to do perimeter security. Defense in depth is still entirely relevant and will be for a long time to come. And if done correctly it can, at the very least, stop some successful client side exploits from calling home, which can minimize their impact on your network.
On slide 13 FX also talks about 'Skill and Time'. He seems to put far more skill+time on finding vulnerabilities as opposed to writing exploits, which he states 'requires little skills but quite some time'. I'm not sure how I feel about that slide yet. Others certainly do not agree with him.
I recommend reading it. You can grab FX's presentation and others from HITB 2007 here.
(FX's take on the 'self defending network' is priceless)
Wednesday, September 19, 2007
QueFuzz
**Update: New version is out (v06), supports a fuzzing template file - source is here
QueFuzz is a very basic C program that uses the libnetfilter_queue library to turn any networked application into a fuzzer. It basically works like this:
- You set a specific iptables QUEUE rule like so:
$ iptables -A OUTPUT -p tcp --dport 110 -j QUEUE
- Start it like so:
$ ./quefuzz -a -v -c USER
or
$ ./quefuzz -b -v -f 3
- Open your POP3 client and connect to the POP server you want to fuzz
- QueFuzz picks up your packets using libnetfilter_queue, fuzzes them and sends them on the wire
This works with any protocol/port. If netfilter/iptables can queue it, QueFuzz can fuzz it.
QueFuzz has no protocol awareness; it expects to receive a proper packet. It has minimal command line flags, such as whether the protocol you want to fuzz is binary, ASCII, or both. If the protocol is TCP or UDP, QueFuzz will skip those headers and start fuzzing the packet data. If the protocol is not TCP or UDP then it starts fuzzing immediately after the IP header.
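The header-skipping rule described above can be sketched in a few lines. This is only an illustration in Python, not QueFuzz's actual C code: the function name payload_offset is made up for the example, and it assumes the queued packet is a raw IPv4 packet starting at the IP header.

```python
IPPROTO_TCP = 6
IPPROTO_UDP = 17
UDP_HEADER_LEN = 8


def payload_offset(packet: bytes) -> int:
    """Return the index of the first byte a fuzzer should touch.

    Hypothetical helper: skips the IPv4 header, plus the TCP or UDP
    header when one is present, mirroring the rule described in the post.
    """
    if len(packet) < 20:
        raise ValueError("truncated IPv4 header")
    if packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")

    # The low nibble of byte 0 (IHL) is the IP header length in 32-bit words.
    ip_header_len = (packet[0] & 0x0F) * 4
    protocol = packet[9]

    if protocol == IPPROTO_TCP:
        if len(packet) < ip_header_len + 20:
            raise ValueError("truncated TCP header")
        # The high nibble of TCP byte 12 is the data offset in 32-bit words.
        tcp_header_len = (packet[ip_header_len + 12] >> 4) * 4
        return ip_header_len + tcp_header_len

    if protocol == IPPROTO_UDP:
        return ip_header_len + UDP_HEADER_LEN

    # Anything else: start right after the IP header.
    return ip_header_len
```

Everything from packet[payload_offset(packet)] onward is fair game for mutation. Any tool that rewrites the payload also has to recompute the IP and TCP/UDP checksums before the packet goes back on the wire.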
A lot of work is needed on the tool. It was never meant to be protocol aware or intelligent, but it could certainly be cleaner. It is BETA code at best, so use at your own risk. I can guarantee it's full of bugs (probably some bad ones) - so be careful! I literally whipped it up in a couple of hours. I'll be refining it over the next few weeks and releasing updates. Feel free to send me patches and suggestions by email.
QueFuzz is released under the GPLv2, as is libnetfilter_queue. Some checksum routines are released under the BSD-3 license from various sources.
You can download the beta code here. Enjoy!
Thursday, September 13, 2007
Ngrep is still useful
I just had to blog about how much I love ngrep. Despite all the advances in security, we are still left with a huge problem called data leakage. If you work in any type of operational security role, it's one of your worst nightmares. I have used ngrep for a couple of years, as I'm sure most of you have too. I had a (legal) need for ngrep again over the past week while trying to assess the state of security in a specific network I protect and monitor, and I thought I would post some of my more usable ngrep queries for you. I am not a regular expression guru like some people I know, sorry.
Looking for social security numbers:
$ ngrep -q -d eth0 -w '[0-9]{3}\-[0-9]{2}\-[0-9]{4}'
Almost the same as above but searching for credit card number patterns (this one can lead to some false positives if searching through HTTP conversations):
$ ngrep -q -d eth0 '[0-9]{4}\-[0-9]{4}\-[0-9]{4}\-[0-9]{4}'
Looking for 'password=':
$ ngrep -q -d eth0 -i 'password='
Some storm worm executable names (this could be expanded easily):
$ ngrep -q -d eth0 -i '(ecard|postcard|youtube|FullClip|MoreHere|FullVideo|greeting|ClickHere|NFLSeasonTracker)\.exe' 'port 80'
Detect an HTTP connection to a server by IP address, not FQDN (this is how the new Bleeding Threats storm worm download rules look):
$ ngrep -q -d eth0 -i 'Host\: [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' 'port 80'
Look for basic HTTP login:
$ ngrep -q -d eth0 -i 'Authorization: Basic' 'port 80'
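One common way to cut down the false positives from the credit card pattern is to run each hit through the Luhn checksum that real card numbers satisfy. ngrep cannot do this itself, so the sketch below is a hypothetical post-processing step in Python; the names CARD_PATTERN, luhn_valid and likely_card_numbers are invented for the example.

```python
import re

# Same shape as the ngrep pattern above: four dash-separated groups of four digits.
CARD_PATTERN = re.compile(r"\b[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def likely_card_numbers(text: str) -> list:
    """Return pattern matches that also pass the Luhn check."""
    return [m.group(0) for m in CARD_PATTERN.finditer(text)
            if luhn_valid(m.group(0))]
```

A random 16-digit string passes Luhn about one time in ten, so this only trims the noise; it does not prove a match is a real card number.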
These are just smaller examples of what expensive 'data leak prevention' boxes do. Hopefully they perform the regular expression lookups on reassembled packet flows, not individual packets. Otherwise it's a waste of time, as the data can be chunked up between different packets. Data leakage continues to be an issue to this day, and unfortunately I don't see it going away anytime soon. That's mostly because it's a human problem, and user education is a losing battle : \
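To see why matching on individual packets misses data, here is a small Python sketch comparing the two approaches. It assumes the payloads of one flow have already been captured and put in order (real reassembly also has to deal with retransmissions and out-of-order segments); the function names are made up for the example.

```python
import re

# Byte-oriented version of the social security number pattern used above.
SSN_PATTERN = re.compile(rb"\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b")


def match_per_packet(payloads):
    """Search each packet payload on its own, the way a per-packet grep does."""
    return [m.group(0) for p in payloads for m in SSN_PATTERN.finditer(p)]


def match_reassembled(payloads):
    """Join the in-order payloads of one flow first, then search the stream."""
    stream = b"".join(payloads)
    return [m.group(0) for m in SSN_PATTERN.finditer(stream)]


# A number split across two segments of the same flow.
segments = [b"ssn=123-4", b"5-6789&name=x"]
print(match_per_packet(segments))
print(match_reassembled(segments))
```

The per-packet search comes back empty because neither segment contains the whole pattern, while the reassembled search finds it.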
Sorry this post was soooo 2001 - please resist the urge to remove me from your RSS reader
Friday, August 10, 2007
Static Analysis Headaches
I am very interested in the static analysis of binaries, mainly because there's no one way to do it. There's no correct or incorrect way of analyzing compiler generated code - especially without running it. In fact most techniques only work with certain compiler constructs and function behaviors. I think that's why even today there are very few tools that do it well.
I started coding static analysis tools a few years ago and have steadily been rewriting and testing pieces of one in particular, which analyzes x86 ELF objects. (Yes, I will eventually release it in some form.) I have run into many pitfalls during its design, specifically emulating the x86 without too much overhead. Obviously I don't care to emulate every single instruction in every combination; that's not only pointless but it would take forever. There are only certain parts of the execution process I am interested in. That includes what the stack looks like, register contents, variable types, and how they all tie together. For example, a programmer might write sizeof(var). The compiler normally resolves that to a constant (for a variable-length array it is computed at runtime), and the binary keeps no record of what var was. Now let's suppose that size is used as a length argument to a function like memcpy. I can't be too sure if the call is vulnerable or not because I don't know exactly what var is or how big it is. Sometimes educated guesses must be made. For example, does var get assigned a value from a packet? Is it a command line argument? When you can't execute the binary, you have to make certain assumptions and just hope they are correct.
And sometimes you do know certain things about the variables. I thought it might make a nice write-up to show how a tool of mine evaluated a specific vulnerable call to memcpy(). This is one very non-scientific way of finding variable objects in the code and assigning them attributes such as 'size' - another 'assumption' I had to make.
Here's a function foo():
During its first pass over the object code, the tool stored a size value and assigned it to the static object at 0x08049640 based on the arguments to memset(). This is obviously not a fool-proof way of knowing what the object at 0x08049640 is or what its true size is, but at the very least it should be the object's minimum size. It's probably a global struct that contains some variables, or a static character array, but it's impossible for the tool to figure that out with any degree of certainty at this point. Following the memset() call there was a call to memcpy(); based on the prior observation the tool is able to determine auto-magically that there is a potential buffer overflow.
...
80483de push %ebp
| Symbol: [foo @ 080483de]
| Xref: (0x80483de -> [0x080483cb call 0x80483de])
80483df mov %esp,%ebp
80483e1 sub $0x18,%esp
80483e4 mov $0x8049640,%edx
80483e9 mov $0x80,%eax
80483ee mov %eax,0x8(%esp)
80483f2 movl $0x0,0x4(%esp)
80483fa mov %edx,(%esp)
80483fd call 0x80482d4
| Symbol: [memset @ plt]
| Analysis:
| EAX 0x00000080 EBX 0x00000000
| ECX 0x00000000 EDX 0x08049640
| | Symbol: [0x8049640 buf1 @ .bss]
| Analysis:
| memset() argument indicates sizeof(0x08049640)=0x80(128 bytes)
8048402 mov 0x8(%ebp),%eax
8048405 add $0x4,%eax
8048408 mov (%eax),%eax
804840a mov $0x8049640,%ecx
| Symbol: [0x8049640 buf1 @ .bss]
804840f mov %eax,%edx
8048411 mov $0x100,%eax
8048416 mov %eax,0x8(%esp)
804841a mov %edx,0x4(%esp)
804841e mov %ecx,(%esp)
8048421 call 0x80482f4
| Symbol: [memcpy @ plt]
| Analysis:
| EAX 0x00000100 EBX 0x00000000
| ECX 0x08049640 EDX 0x00000000
| | Symbol: [0x8049640 buf1 @ .bss]
| Analysis:
| memcpy() argument indicates buffer overflow at 0x08049640 by (0x80) bytes [!]
8048426 mov $0x0,%eax
804842b leave
804842c ret
...
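The inference in the listing above boils down to two steps: remember the length passed to memset() as a guess at the destination's size, then compare later memcpy() lengths against that guess. Here is a minimal Python sketch of just that bookkeeping. It is not the tool's code (which is written in C); it assumes the call arguments have already been recovered as constants, and the name find_overflows and the tuple format are invented for the example.

```python
def find_overflows(calls):
    """Flag memcpy() calls that write past a size inferred from memset().

    `calls` is a list of (function_name, destination_address, length)
    tuples in program order, a hypothetical format for this sketch.
    Returns a list of (destination_address, bytes_over) tuples.
    """
    inferred_sizes = {}
    findings = []
    for function_name, dest, length in calls:
        if function_name == "memset":
            # Assumption: the largest memset() length seen is treated as
            # the object's size. It is really only a lower bound.
            inferred_sizes[dest] = max(inferred_sizes.get(dest, 0), length)
        elif function_name == "memcpy":
            size = inferred_sizes.get(dest)
            if size is not None and length > size:
                findings.append((dest, length - size))
    return findings


# The two calls from foo() above: memset(buf1, 0, 0x80), memcpy(buf1, src, 0x100).
calls = [("memset", 0x08049640, 0x80), ("memcpy", 0x08049640, 0x100)]
for dest, over in find_overflows(calls):
    print("potential overflow at 0x%08x by 0x%x bytes" % (dest, over))
```

Because the memset() length is only a lower bound on the object's size, anything this flags is a hint for a human to check, not a confirmed bug.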
Obviously I am not the only person to use this method, as it's a very simple concept and easy to implement. It certainly won't catch more complex bugs that require the interaction of many functions.
This requires several passes over the binary before any output to the user can occur. My first and second passes gather all symbol, relocation, and cross reference data, followed by function analysis routines. The third pass contains mostly output plugins that make all of the data accessible for display.
Blogspot has a way of jumbling up my text, not to mention it's not really formatted nicely to begin with. The vulnerability analysis plugin has lots of 'hint strings' that are basically triggered by the occurrence of specific instructions plus a combination of pre-existing knowledge about the static data objects and code that has already been evaluated in previous passes. For now it works on smaller programs. Despite being written in straight C, it can sometimes take a while to crunch all of this on a large binary like Firefox (and most of the time produces absolute nonsense). The end goal is to have an efficient tool that can process and accurately report on a larger binary.
Tuesday, August 07, 2007
Summer is almost over
As you may have noticed, I have not written a blog entry since June. I am spending my summer relaxing for once and catching up on some reading. Some advisories and beta quality tools will be along shortly.
I often help beginners in the field of information/computer security at work and on a personal level. The question I get asked most often is "what should I start with?!". Usually they are expecting some cool and interesting technique they can dive into like "breaking XYZ encryption!" but they are typically disappointed when I respond with something like "start learning C and reading the Linux kernel source". That's when their smile fades and they realize they have to go back to stuff they ignored freshman year of college. Today I came across "Computer Science From the Bottom Up". It's full of good information for the beginner to computer science, which is a necessary base for computer security. Have fun.
Friday, June 08, 2007
Dual Licenses and more
There has been some good discussion on GPL and dual licensing at Matasano's blog, and Ryan Russell has also posted some good thoughts on this. This came right on time for me, as I've been debating lately what to do with a couple of projects I've been working on for a while. I want to release the code, but it would also be great to sell and/or license it to companies wishing to use it commercially. These projects include a reverse engineering framework and various network security tools. The RE framework is basically an engine written in C that securely and reliably parses, disassembles and stores massive amounts of data on any ELF object. It becomes usable by writing plugins for it. You can write output plugins (I will be including an HTML one with it) and plugins that hook the internal disassembler and ELF parsing routines. I have a couple of plugins ready and I want to release this code soon (1-2 months). So expect an open source version of that with a dual license for companies wishing to license it for commercial use.
** [ Start reading here if you came from bleedingthreats.net ] **
In other news, I posted a basic script today that parses the snort alert file for IP addresses and then queries Spamhaus' ZEN real-time blacklist. Feel free to modify and use it in your sensor network (it's certainly not production quality as it is now). I am very interested in receiving modifications to the script and general feedback on the idea. I have already seen some interesting trends that I think will prove useful after a few days of correlating data. Enjoy!
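For readers who have not used a DNS blacklist before, the lookup works by reversing the octets of the IP address, appending the blacklist zone, and resolving the resulting name; an answer means the address is listed. The Python sketch below illustrates that idea only. It is not the script from this post, and the function names and the sample alert line are made up for the example.

```python
import re
import socket

IPV4_PATTERN = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")


def extract_ips(alert_text):
    """Return the set of dotted-quad IPv4 addresses found in alert text."""
    ips = set()
    for candidate in IPV4_PATTERN.findall(alert_text):
        # Drop things like 999.1.1.1 that merely look like addresses.
        if all(int(octet) <= 255 for octet in candidate.split(".")):
            ips.add(candidate)
    return ips


def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the DNSBL lookup name: reversed octets followed by the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def is_listed(ip, zone="zen.spamhaus.org"):
    """Resolve the DNSBL name; any answer is treated as 'listed'.

    Simplification for this sketch: a real script should inspect the
    returned 127.0.0.x address instead of accepting any answer.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# Made-up alert line, loosely in the style of snort's fast alert output.
line = "[**] [1:1000001:1] Example alert [**] {TCP} 203.0.113.7:4444 -> 192.168.1.10:80"
for ip in sorted(extract_ips(line)):
    print(ip, "->", dnsbl_query_name(ip))
```

Calling is_listed() sends a live DNS query, so cache results and skip private addresses rather than querying once per alert.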
Note: Spamhaus is unfortunately under DDoS as I write this, so don't use it too heavily.
Update - I have posted a new version of the script - please contribute if you make changes.