Wednesday, June 23, 2010

Its 2010 and your browser has an assembler

So it's 2010 and your browser has an assembler. How the hell did that happen? Simple, performance. Your browser probably includes at least one JIT (Just In Time Compiler). One for Javascript (Chrome, Firefox) and possibly others for plugins like Adobe Flash Player. If you're using Firefox with the Adobe Flash Player plugin then it has two instances of the same JIT called NanoJIT. Firefox uses SpiderMonkey/TraceMonkey as its front end to NanoJIT and Flash uses Tamarin as its front end. TraceMonkey watches for Javascript code that could benefit from being compiled into native code. It translates Javascript into LIR (Low-Level Intermediate Representation) instructions which NanoJIT turns into Native x86, x86_64, Sparc, PPC or ARM instructions. This is why your browser has an assembler in 2010.

Surely native code generation in your browser can't be bad for security? Dionysus Blazakis figured out a clever exploitation technique [1] where he used ActionScript byte code to JIT-spray a browser. The native code generated by Tamarin/NanoJIT was easy to influence via the ActionScript under his control. After reading Dion's paper last year I wanted to write my own JIT spray and research other JITs. I started with NanoJIT in Firefox. Along the way I noticed some interesting things that I thought would make a good blog post. My research isn't done but this writeup has been for awhile now and there is no sense in letting it collect dust in a directory any longer.

So first the basics. Pages for NanoJIT on Windows are allocated using the VirtualAlloc function and have RWX permissions. VirtualAlloc is not subject to ASLR (Address Space Layout Randomization) so it does not return randomized page locations. i.e. An attackers JavaScript can cause many allocations of contiguous RWX page mappings. These allocations are not at a 32-bit x86 page size granularity but their sizes are static. The next JIT-page allocation should be at 0x10000 bytes past the previous one. Dion also notes this in his paper. This is bad, but not the end of the world.

Heres some output from a WinDbg plugin I wrote (JitFind) that tracks the pages NanoJIT is using for compiled JavaScript. There are some holes in the range but as you can clearly see there are multiple contiguous allocations.

[ JitFind ] (25) 0x03510000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (26) 0x03520000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (27) 0x03530000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (28) 0x03540000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (29) 0x03550000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (30) 0x03560000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (31) 0x03570000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (32) 0x03580000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (33) 0x03590000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (34) 0x035a0000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
(..)
[ JitFind ] (35) 0x039d0000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (36) 0x039e0000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (37) 0x039f0000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
(..)
[ JitFind ] (38) 0x03b00000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (39) 0x03b10000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (40) 0x03b20000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (41) 0x03b30000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
(..)
[ JitFind ] (42) 0x03c40000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
(..)
[ JitFind ] (43) 0x03c60000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (44) 0x03c70000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (45) 0x03c80000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX

NanoJIT internally uses a CodeList structure to track JIT pages, within each page is a pointer to the previous page, the next page and the offset where the executable code can be found. In the case of NanoJIT, JitFind works by monitoring for calls to VMPI_setPageProtection, nanojit::Assembler::endAssembly, and nanojit::CodeAlloc::reset and then walks the list for pages it might have missed. JitFind can also track pages in any process marked executable with VirtualProtect, and I'm in the process of adding support for other JIT's as well. Writing JitFind was a large part of this research effort and if I accomplished nothing else, its a pretty nice plugin.

Mozilla had already started to tackle part of the problem awhile back [2]. If you couldn't read that whole Bugzilla thread then I will sum it up for you. Mozilla's current Linux solution to solve the RWX pages problem is to make a mirror page. One is W and one is X where writes to one are translated to another. There is talk of porting the patch to Win32 and OSX as well. I don't believe this is the best solution as its goal is to make an SELinux policy happy (no WX pages) and ignores the Windows user base with a bigger problem (contiguous RWX page allocations at non-randomized locations). At best its a partial solution. The contiguous page issue does not affect Linux because NanoJIT will use the mmap function to allocate pages and mmap is subject to ASLR, which means it allocates pages at a random location.

So its quite obvious if an attacker can get his code into those RWX page regions, or a write4 anywhere condition, via an overflow or dangling pointer, he stands a pretty good chance of guessing where they are because of VirtualAlloc. But theres another twist to this issue and when combined with the contiguous RWX pages, makes for an interesting exploitation scenario.

NanoJIT writes code backwards from the end of the page. This is why code offsets from the beginning of the page appear to change. One of the important things to know is that NanoJIT also assembles a large jump table and other static code inline with JIT'd code. This static code sets up stack frames and performs other small work for the transition to/from JIT'd code. This means there are static instructions at known offsets repeated across contiguous RWX pages. I started thinking, might an attacker be able to write Javascript that forces the allocation of many RWX JIT pages, trigger his vulnerability and then transfer execution to [ JIT_PAGE + PAGE_SIZE - ROP Gadget ] to get arbitrary code execution? If your not familiar with ROP, it stands for 'Return Oriented Programming' [3] [4]. Now of course its not as easy as that in the real world. We need to find useable ROP gadgets within that static code. This is done by analyzing a large set of JIT pages, determining if any RET instructions can be found at predictable locations and then finding usable ROP gadgets preceding them. I attempted this using a combination of JitFind to dump the raw JIT RWX page contents as they were created and a simple Ruby script that analyzed each page dump for consistent RET instructions and used rbdasm to dump those potential ROP gadgets.

First we need to write some JavaScript that will force Firefox into allocating a bunch of contiguous pages that contain the same static code before we can realistically look for ROP gadgets. Resource intensive JavaScript will be required, thats why the JIT was created in the first place. Remember we are only interested in the static code produced by the JIT, not influencing the instructions like Dion did. I thought about this problem for awhile. Some simple math problems wouldn't do the trick so I turned to graphics. A Google search for JavaScript ray tracers turned up exactly what I needed. I borrowed a ray tracer written in JavaScript from the internet and after some minor tweaking NanoJIT started to sing. My tweaks mainly consisted of instructing the ray tracer to draw high quality large resolution images. With this I could now force the allocation of many (40 to 50) contiguous RWX allocations through a single instance of the ray tracer. There are a few holes in the address space here, presumably due to the system heap allocator stealing them but I haven't looked into that. This minor detail obviously hurts the reliability of the technique and I need to research it further.

Heres the output of my gogo-gadget-finder ruby script analyzing 224 pages created by five simultaneous instances of the ray tracer in a single Firefox instance. One issue here is that as pages are no longer needed they are free'd and sometimes allocated again by another ray tracer instance. I filtered these duplicates out and while it lowered the overall number of pages I was scanning it is a more accurate reflection of the process image during runtime. This behavior appears to be different then that of Tamarin and Dions experience with Flash where pages were not immediately unmapped.

* As Dion points out in the comments, he kept the JIT pages mapped using a reference to functions on them. I suspect you can do something similar in Javascript but I have not looked into it.
$ ruby gogo-gadget-finder.rb -m 5
RET @ 0xffd7 (10)
|_ raw_page_2150000-1947765593.bin
|_ raw_page_23d0000-2279729936.bin
|_ raw_page_23f0000-50251687.bin
|_ raw_page_2750000-2059690635.bin
|_ raw_page_29f0000-3894631435.bin
|_ raw_page_2d10000-2295099192.bin
|_ raw_page_2e60000-1268413647.bin
|_ raw_page_2e80000-1700891559.bin
|_ raw_page_3560000-1344858230.bin
|_ raw_page_36c0000-1565064345.bin
RET @ 0xe437 (5)
|_ raw_page_2150000-1947765593.bin
|_ raw_page_23f0000-50251687.bin
|_ raw_page_2750000-2059690635.bin
|_ raw_page_2e60000-1268413647.bin
|_ raw_page_35f0000-1316993084.bin
RET @ 0xfb09 (5)
|_ raw_page_1eb0000-1004861587.bin
|_ raw_page_2e70000-2748893595.bin
|_ raw_page_35d0000-2041835722.bin
|_ raw_page_3620000-3921514024.bin
|_ raw_page_3620000-467968694.bin
RET @ 0xffef (6)
|_ raw_page_1ea0000-181584545.bin
|_ raw_page_29a0000-141065932.bin
|_ raw_page_29b0000-3609538306.bin
|_ raw_page_2e60000-1313636049.bin
|_ raw_page_3580000-330610872.bin
|_ raw_page_35c0000-420260428.bin

Thats not a lot of repeated RET instructions, and the ones that do match are generated by the NanoJIT assembler and not in the middle of another instruction stream. Each of these repeated RET's is preceded by a large jump table and then a 'pop ebp' instruction. Whats worse is that the repeated RETs that do exist are not in pages that border one another. So while our ray tracer is generating the right number of pages, we are not getting consistent reusable gadgets across them.

So where does this leave us? ROP gadgets ending in RET instructions is going to be difficult to pull off. We didn't even find a stack pivot gadget to kick off the process. More research is needed.

Next Steps:

It may be possible to chain together sequences of 'pop %reg; jmp %reg;' to get code execution [5]. These too of course must meet the same requirements as our RET based ROP gadgets. They must be at static locations across many of the JIT pages. More fine grained control over the page creation is also desirable. The ray tracer proved useful for generating pages to search for ROP gadgets but further research of NanoJIT is needed to know whether we can influence a specific jump table or other static code construct. I only looked at NanoJITs generation of 32bit x86 code, it supports other architectures too! Of course researching similar issues on other JITs is also on the radar.

Conclusion:

This is not a specific flaw in Firefox or NanoJIT. If anything its an architectural oversight that should be further researched and hardened. The Mozilla team obviously cares about your security, which is evident by the SELinux thread from last year, but I think the problem is bigger on Windows and thats where the focus should be. This is not a trivial problem to solve on any platform. There are several factors at play including attacker influenced native code (Dions attack), static code produced by NanoJIT, page permissions and page allocation specifics. VirtualAlloc makes the contiguous allocations unavoidable short of a VirtualAlloc wrapper or another randomized page allocation API. But keeping the page permissions correct is a good place to start. Of course if you have ever written a browser exploit you already know that there are much easier ways to get code execution, DLLs loaded at fixed addresses or heap spray to name a few.

In the end this could theoretically result in an ASLR / DEP bypass for Firefox or any application that uses NanoJIT. The combination of repeated ROP gadgets in contiguous RWX page allocations makes it worth further study. And while that has not been proven here it was fun to research. Hope you enjoyed the post rant.

Note:

If your a Firefox user and the idea of a JavaScript JIT makes you nervous, I have good news for you. You can disable it: https://wiki.mozilla.org/JavaScript:TraceMonkey#Playing_with_TraceMonkey


Another JavaScript JIT Spray Presentation:

7 comments:

Dion said...

Really thorough job -- well done. My initial plan was to try and use existing JITed pages (for ROP) and not try to influence the output, but with Flash it was quite easy to do. I'm really excited to see someone tried and (mostly) succeeded with that line of attack. Really cool :)

One comment on:
"This behavior appears to be different then that of Tamarin and Dions experience with Flash where pages were not immediately unmapped."
Just to make it clear, I kept each of my JITed pages from being reclaimed (can't quite remember exactly how -- back to Tamarin internal heap? to Tamarin code page cache?) by keeping a reference to a function on that page. Without the reference I was unable to maintain a spray.

That might have been what you meant, but I just wanted to make it clear.

Anyway, great post.

-- Dion

Chris Rohlf said...

Dion - That makes sense and I suspect you could play similar games with JavaScript to keep the pages around. The ray tracer served its purpose but I need more fine grained control over the page creation.

David said...

Only just started reading but I think the two addresses:

0x03510000
0x03520000

are 16 pages apart in non-PSE X86, not contiguous as you say?

Chris Rohlf said...

David,

Poor choice of words on my part. They are always 0x10000 apart. Dions paper also notes this. I'll correct my wording :)

David said...

Chris,

Great. Looking forward to reading the rest.

Chris Rohlf said...

David,

I should also point out these mappings aren't made at a page granularity. They are bigger than that. i.e. VirtualAlloc is not saying allocate(4096); its grabbing much larger sections of memory. So they technically are contiguous mappings. But ill make sure to clear that up.

David said...

Chris,

Yeah that makes sense. I should've waited til I'd read the entire paper before posting. Still, s/contiguous page/contiguous mapping/ should clear that up. :D