Tuesday, October 10, 2006

Resolving ELF Relocation Name / Symbols

Updated October 4th 2007

I have updated this post to be more accurate as at least 5-10 people a day find it looking for information on ELF relocations/symbols.

From the ELF 1.2 standard:

"Relocation is the process of connecting symbolic references with symbolic definitions. For example, when a program calls a function, the associated call instruction must transfer control to the proper destination address at execution. In other words, relocatable files must have information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process's program image. Relocation entries are these data."

Each 'call' instruction in your ELF object that calls a function located in a shared object (the function puts() for example) instead calls an address in the Procedure Linkage Table (.plt). The PLT entry, with help from the dynamic linker, resolves the function's real address at runtime. This indirection is needed because we don't know at link time where the shared library will be loaded in memory. Let's look at an example ELF object:

Here's a call to the address '0x804833c' from within our .text section:

$ objdump -d example-elf | grep 804833c | grep call
804843f: e8 f8 fe ff ff call 804833c
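
As a quick sanity check on that disassembly, the target of an 'e8' call can be recomputed by hand: the four bytes after the opcode are a little-endian displacement relative to the address of the next instruction. The sketch below is only an illustration; the function name call_rel32_target is made up for this post and is not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compute the destination of an x86 near call.
 * insn_addr is the address of the 0xe8 opcode byte, insn the 5 raw bytes. */
uint32_t call_rel32_target(uint32_t insn_addr, const uint8_t insn[5])
{
    assert(insn[0] == 0xe8);   /* only handles the rel32 form of call */

    /* The 4 bytes after the opcode are a little-endian signed displacement. */
    uint32_t disp = (uint32_t)insn[1]
                  | ((uint32_t)insn[2] << 8)
                  | ((uint32_t)insn[3] << 16)
                  | ((uint32_t)insn[4] << 24);

    /* The displacement is relative to the next instruction (addr + 5).
     * Unsigned wraparound gives the same result as signed addition. */
    return insn_addr + 5u + disp;
}
```

Feeding it the address 0x804843f and the bytes e8 f8 fe ff ff from the line above gives back the PLT address 0x804833c.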

Here is our PLT. Notice the entry at '0x804833c', the address that was called. Its first instruction is just an indirect jmp to whatever address is stored at 0x8049684.

$ objdump -d example-elf | grep "section .plt:" -A 31
Disassembly of section .plt:

080482fc <__gmon_start__@plt-0x10>:
80482fc: ff 35 70 96 04 08 pushl 0x8049670
8048302: ff 25 74 96 04 08 jmp *0x8049674
8048308: 00 00 add %al,(%eax)
...

0804830c <__gmon_start__@plt>:
804830c: ff 25 78 96 04 08 jmp *0x8049678
8048312: 68 00 00 00 00 push $0x0
8048317: e9 e0 ff ff ff jmp 80482fc <_init+0x18>

0804831c <__libc_start_main@plt>:
804831c: ff 25 7c 96 04 08 jmp *0x804967c
8048322: 68 08 00 00 00 push $0x8
8048327: e9 d0 ff ff ff jmp 80482fc <_init+0x18>

0804832c <__stack_chk_fail@plt>:
804832c: ff 25 80 96 04 08 jmp *0x8049680
8048332: 68 10 00 00 00 push $0x10
8048337: e9 c0 ff ff ff jmp 80482fc <_init+0x18>

0804833c :
804833c: ff 25 84 96 04 08 jmp *0x8049684
8048342: 68 18 00 00 00 push $0x18
8048347: e9 b0 ff ff ff jmp 80482fc <_init+0x18>

0804834c :
804834c: ff 25 88 96 04 08 jmp *0x8049688
8048352: 68 20 00 00 00 push $0x20
8048357: e9 a0 ff ff ff jmp 80482fc <_init+0x18>

(Remember that the little * means an indirect jump: the target is the address stored at 0x8049684, not 0x8049684 itself.) If we look at what's stored at 0x8049684 we find an entry in the Global Offset Table (GOT), as shown below.

$ objdump -s example-elf | grep got.plt -A3
Contents of section .got.plt:
804966c 98950408 00000000 00000000 12830408 ................
804967c 22830408 32830408 42830408 52830408 "...2...B...R...

At 0x8049684 we find another address, '42830408', which read as a little-endian word is 0x08048342. If you go back up and look at our PLT you will find that 0x8048342 is the push instruction directly after our first jmp: "push $0x18". 0x18 (24) is the byte offset of this function's entry in the .rel.plt relocation table; since each Elf32_Rel entry is 8 bytes, that is entry number 3. The push is followed by a jmp back to the beginning of our PLT, "jmp 80482fc".

You may have noticed the beginning of the PLT looks slightly different than the rest of it. These first two instructions, "pushl 0x8049670" and "jmp *0x8049674", are important. The first pushes the value stored at 0x8049670 (the second word of our GOT) onto the stack, and the second jumps to the address stored at 0x8049674 (the third word of our GOT). Both of those words are 0x0 in your ELF file, because they are filled in at runtime. At runtime the first will hold a value that identifies this particular object to the dynamic linker, and the second will hold the address of the dynamic linker's symbol resolution routine. That routine uses the offset of 0x18 that was pushed onto the stack earlier to find and resolve the correct relocation.
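
To make the byte-order step concrete, here is a small sketch that reads a word out of the .got.plt dump shown earlier and converts a pushed offset into a relocation index. The array simply copies the 32 bytes from the objdump -s output; the names got_word_at and rel_plt_index are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Start address of .got.plt, taken from the objdump -s output. */
#define GOT_PLT_BASE 0x804966cu

/* Raw bytes of .got.plt exactly as dumped (two rows of 16 bytes). */
static const uint8_t got_plt[32] = {
    0x98, 0x95, 0x04, 0x08, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x12, 0x83, 0x04, 0x08,
    0x22, 0x83, 0x04, 0x08, 0x32, 0x83, 0x04, 0x08,
    0x42, 0x83, 0x04, 0x08, 0x52, 0x83, 0x04, 0x08,
};

/* Read the 32-bit little-endian word stored at virtual address addr. */
uint32_t got_word_at(uint32_t addr)
{
    assert(addr >= GOT_PLT_BASE);
    assert(addr - GOT_PLT_BASE + 4u <= sizeof got_plt);

    const uint8_t *p = got_plt + (addr - GOT_PLT_BASE);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Convert the byte offset pushed by a PLT entry into a .rel.plt index.
 * Each Elf32_Rel is two 32-bit words, i.e. 8 bytes. */
uint32_t rel_plt_index(uint32_t pushed_offset)
{
    return pushed_offset / 8u;
}
```

Here got_word_at(0x8049684) yields 0x08048342, the address of the push instruction, and rel_plt_index(0x18) yields 3.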

This process is called lazy linking (more commonly, lazy binding), because the relocation is only resolved at runtime, the first time it is needed. After that first lookup, the dynamic linker overwrites the function's GOT entry with the real address, so that the next time the PLT entry performs its jmp *, it jumps directly to the function instead of falling through to push its offset into the relocation table.
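
The idea can be modelled in a few lines of plain C using a function pointer as a one-entry "GOT". This is a toy model only and not how the dynamic linker is actually written; every name in it (got_slot, lazy_stub, real_function and so on) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_t)(int);

static int resolver_calls;        /* how many times the slow path ran */
static int lazy_stub(int arg);    /* stands in for "push offset; jmp PLT0" */

/* One-slot "GOT": it starts out pointing back at the stub. */
static func_t got_slot = lazy_stub;

/* Hypothetical library function the slot should end up pointing at. */
static int real_function(int arg)
{
    return arg * 2;
}

/* Slow path: "resolve" the symbol, patch the slot, then call the target. */
static int lazy_stub(int arg)
{
    resolver_calls++;
    got_slot = real_function;
    return got_slot(arg);
}

/* Equivalent of the PLT's "jmp *GOT[n]": always go through the slot. */
int call_through_plt(int arg)
{
    assert(got_slot != NULL);
    return got_slot(arg);
}

int resolver_call_count(void)
{
    return resolver_calls;
}
```

Calling call_through_plt() twice runs the stub only on the first call; the second call goes straight to real_function because the slot has already been patched.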

Now, how do we get symbol names for relocation entries? After all we want to see 'snprintf' instead of '0x8049688' ...

When a section has type SHT_REL it contains a bunch of structures that look like this:

typedef struct
{
    Elf32_Addr r_offset; /* Address */
    Elf32_Word r_info;   /* Relocation type and symbol index */
} Elf32_Rel;


Let's look at readelf reading the relocation entries of an ELF object:

$ readelf -r /testbins/sha1

Relocation section '.rel.dyn' at offset 0x420 contains 3 entries:
 Offset     Info    Type              Sym.Value  Sym. Name
0804b54c  00001106  R_386_GLOB_DAT    00000000   __gmon_start__
0804b598  00000505  R_386_COPY        0804b598   stderr
0804b59c  00000d05  R_386_COPY        0804b59c   stdin

Relocation section '.rel.plt' at offset 0x438 contains 2 entries:
 Offset     Info    Type              Sym.Value  Sym. Name
0804b55c  00000107  R_386_JUMP_SLOT   00000000   feof
0804b560  00000207  R_386_JUMP_SLOT   00000000   putchar

Where do all these fields for each entry come from? Offset, Info, Type, Sym.Value, Sym. Name: I didn't see most of those in that struct! Offset and Info are simply r_offset and r_info printed in hex; the rest are either decoded from r_info or found by using part of r_info as an index into another table. The offset is the address the relocation applies to. For the .rel.plt entries this is the function's GOT slot, so if you disassemble your binary you will see the matching PLT entry jmp through this address. The field we want to look at here is r_info. There are two important macros in elf.h that we will be using (there is also a third, ELF32_R_INFO, which combines a symbol index and a type back into an r_info value).

#define ELF32_R_SYM(val) ((val) >> 8)
#define ELF32_R_TYPE(val) ((val) & 0xff)
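
As a worked example, the Info values from the readelf output can be pushed through these macros by hand. The sketch below assumes a Linux system where <elf.h> is available; split_r_info is a made-up helper name.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split an r_info value using the elf.h macros. */
void split_r_info(uint32_t r_info, uint32_t *sym_index, uint32_t *type)
{
    assert(sym_index != NULL && type != NULL);

    *sym_index = ELF32_R_SYM(r_info);   /* upper 24 bits: symbol table index */
    *type      = ELF32_R_TYPE(r_info);  /* lower 8 bits: relocation type */
}
```

For __gmon_start__ above, 0x00001106 splits into symbol index 0x11 (17) and type 6, which is R_386_GLOB_DAT. For putchar, 0x00000207 splits into symbol index 2 and type 7, which elf.h calls R_386_JMP_SLOT and readelf prints as R_386_JUMP_SLOT.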

Each of these macros takes the r_info field and extracts a different value. Let's back up a step. The section header with type SHT_REL (the relocation section we are looking at) has another member called 'sh_link', and it holds an important value: the section header table index of the section that holds the symbol information for these relocations. That is typically the dynamic symbol table, '.dynsym'. The symbol table's own sh_link in turn gives the index of the string table that holds the actual name strings. This is readelf parsing the section headers of /bin/ls (cropped for readability).

$ readelf -S /bin/ls
There are 26 section headers, starting at offset 0x126d8:

Section Headers:
  [Nr] Name      Type    Addr     Off    Size   ES Flg Lk Inf Al
  ...
  [ 4] .dynsym   DYNSYM  080484a0 0004a0 0006b0 10   A  5   1  4
  ...
  [ 8] .rel.dyn  REL     08049170 001170 000028 08   A  4   0  4
  [ 9] .rel.plt  REL     08049198 001198 0002f0 08   A  4  11  4
  ...

Notice the sh_link member (the Lk column) of both sections 8 and 9. It's 4, which points us to our dynamic symbol table section. That's where the symbols that correspond to our relocation entries can be found. So let's recap: #1 scan the section header table for a section of type SHT_REL, #2 grab its sh_link member value.
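
The two recap steps could be sketched like this, assuming the section header table has already been read into an array of Elf32_Shdr (reading and validating the file is left out). The function name next_rel_section is invented for this example.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: find the next SHT_REL section at or after index
 * `start`. Returns its index, or shnum if there are no more. */
size_t next_rel_section(const Elf32_Shdr *shdrs, size_t shnum, size_t start)
{
    assert(shdrs != NULL);

    for (size_t i = start; i < shnum; i++) {
        if (shdrs[i].sh_type == SHT_REL)
            return i;   /* caller then reads shdrs[i].sh_link */
    }
    return shnum;
}
```

A caller would loop with i = next_rel_section(shdrs, shnum, i + 1) until it gets shnum back, reading shdrs[i].sh_link for each hit. With the /bin/ls headers above that visits sections 8 and 9, both with sh_link 4.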

OK, so where does that leave us? We have a bunch of numbers, but how do we know what matches up where? As we iterate through the relocation table entries, we run each r_info value through the ELF32_R_SYM(val) macro and get a number back. That number is the index of an entry in the dynamic symbol table (or whichever section our sh_link member points to). Parse that entry, use its st_name field as an offset into the associated string table, and your symbol name is resolved.
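
Put together, the lookup for a single relocation might look roughly like the following. It assumes the symbol table and its string table have already been loaded into memory, and it uses the Elf32_Sym type from <elf.h>; rel_symbol_name is a hypothetical name, not a libelf or glibc function.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: return the name of the symbol a relocation refers
 * to, or NULL if an index falls outside the tables. symtab and strtab are
 * the in-memory contents of the sections reached through sh_link. */
const char *rel_symbol_name(const Elf32_Rel *rel,
                            const Elf32_Sym *symtab, size_t nsyms,
                            const char *strtab, size_t strtab_size)
{
    assert(rel != NULL && symtab != NULL && strtab != NULL);

    size_t idx = ELF32_R_SYM(rel->r_info);      /* index into the symbol table */
    if (idx >= nsyms)
        return NULL;

    Elf32_Word name_off = symtab[idx].st_name;  /* offset into the string table */
    if (name_off >= strtab_size)
        return NULL;

    return strtab + name_off;
}
```

The bounds checks are there because r_info and st_name come straight from the file, and a truncated or hostile binary can contain any value.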

What about the other macro? ELF32_R_TYPE(val) tells us what kind of relocation this is. The values are defined in elf.h as R_386_GOT32, R_386_JMP_SLOT and so on. They are very helpful when trying to find out whether a relocation refers to a function or not (it usually does when found in the .rel.plt section).
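
A small decoder for the most common i386 types could look like this. It covers only a handful of the constants from <elf.h> and returns the elf.h spellings (readelf prints R_386_JUMP_SLOT where elf.h says R_386_JMP_SLOT). Both function names are made up for this sketch.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map an r_info value to the name of its type. */
const char *rel386_type_name(uint32_t r_info)
{
    switch (ELF32_R_TYPE(r_info)) {
    case R_386_NONE:     return "R_386_NONE";
    case R_386_32:       return "R_386_32";
    case R_386_PC32:     return "R_386_PC32";
    case R_386_GOT32:    return "R_386_GOT32";
    case R_386_PLT32:    return "R_386_PLT32";
    case R_386_COPY:     return "R_386_COPY";
    case R_386_GLOB_DAT: return "R_386_GLOB_DAT";
    case R_386_JMP_SLOT: return "R_386_JMP_SLOT";
    case R_386_RELATIVE: return "R_386_RELATIVE";
    default:             return "unknown";
    }
}

/* Hypothetical helper: true if this entry is a PLT (function) relocation. */
int rel_is_jump_slot(const Elf32_Rel *rel)
{
    assert(rel != NULL);
    return ELF32_R_TYPE(rel->r_info) == R_386_JMP_SLOT;
}
```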

Remember, not all ELF objects will have a section header table. In those cases you can use the program headers to find and parse the dynamic segment, which will also give you the addresses of your relocation tables, dynamic symbol table and more. If you find any errors in this post, feel free to let me know.
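
For the case without section headers, here is a rough sketch of searching the dynamic array (the contents of the PT_DYNAMIC segment) for a tag such as DT_SYMTAB, DT_STRTAB, DT_REL or DT_JMPREL. It assumes the array is already in memory; dyn_lookup is a hypothetical name.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: look up one tag in a dynamic array.
 * Returns 1 and stores the value on success, 0 if the tag is absent.
 * The scan stops at DT_NULL or after max_entries, whichever comes first. */
int dyn_lookup(const Elf32_Dyn *dyn, size_t max_entries,
               Elf32_Sword tag, Elf32_Word *value)
{
    assert(dyn != NULL && value != NULL);

    for (size_t i = 0; i < max_entries && dyn[i].d_tag != DT_NULL; i++) {
        if (dyn[i].d_tag == tag) {
            *value = dyn[i].d_un.d_val;
            return 1;
        }
    }
    return 0;
}
```

Keep in mind that the values stored for DT_SYMTAB, DT_STRTAB, DT_REL and DT_JMPREL are virtual addresses. When working on a file on disk they still have to be translated to file offsets using the PT_LOAD program headers.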

10 comments:

Rmn said...

hi there. Can u say more abt relocation entries of type R_386_RELATIVE. And how to perform relocation with those type of entries ?

Ragha said...

Hello,

This is a wonderful post, extremely helpful!


I am working on a TriCore/ T-Kernel based environment and I need to perform load time relocation. Currently, the environment does not support relocation, but supports Position independent code. What I want to achieve is, loading an application in any desired address space, although at Link time, an address space may have been allocated.

How do I go about this? Is it enough if I change all section addresses, symbol table entries and got entries?

I would be grateful if you can guide me on this. Thanks!

Cheers,
Raghavendran V

Vijay said...

Hi,
can you let us know if there is a way to resolve addresses to symbols of a statically linked elf executable using libelf ?

MAK said...

Awesome Post!!!

Mike said...

There's further information regarding ELF x86_64 relocations here, together with a few graphics which show how what each type represents.
