mirror of
https://github.com/0xAX/linux-insides.git
synced 2025-01-05 13:21:00 +00:00
Create linkers.md
This commit is contained in:
parent
377cbdf5ca
commit
dff801f85d
636
Misc/linkers.md
Normal file
Introduction
---------------

While writing the [linux-insides](http://0xax.gitbooks.io/linux-insides/content/) book I have received, and continue to receive, many emails with similar questions about [linker](https://en.wikipedia.org/wiki/Linker_%28computing%29) scripts and other linker-related topics. So I've decided to write this post, which covers some aspects of the linker and the linking of object files.

If we open the `Linker` page on Wikipedia, we will see the following definition:

```
In computer science, a linker or link editor is a computer program that takes one or more object files generated by a compiler and combines them into a single executable file, library file, or another object file.
```

If you've written at least one program in C, you have seen files with the `*.o` extension. These files are [object files](https://en.wikipedia.org/wiki/Object_file). Object files are blocks of machine code and data with placeholder addresses for references to data and functions in other object files (or libraries), as well as a list of their own functions and data. The main purpose of the linker is to collect the code and data of each object file and combine them into the final executable file or library. In this post we will try to go through the main steps of this process. Let's start.

Linking process
----------------

Let's create a simple project with the following structure:

```
*-linkers
*--main.c
*--lib.c
*--lib.h
```

And write a factorial program there as an example. Our `main.c` source code file will contain:

```C
#include <stdio.h>

#include "lib.h"

int main(int argc, char **argv) {
    printf("factorial of 5 is: %d\n", factorial(5));
    return 0;
}
```

The `lib.c` will contain:

```C
int factorial(int base) {
    int res = 1, i = 1;

    if (base == 0) {
        return 1;
    }

    while (i <= base) {
        res *= i;
        i++;
    }

    return res;
}
```

And the `lib.h`:

```C
#ifndef LIB_H
#define LIB_H

int factorial(int base);

#endif
```

Now let's compile only the `main.c` source code file with:

```
$ gcc -c main.c
```

If we look inside the result with the `nm` util, we will see the following output:

```
$ nm -A main.o
main.o:                 U factorial
main.o:0000000000000000 T main
main.o:                 U printf
```

The `nm` util allows us to see the list of symbols from the given object file. Its output consists of three columns: the first is the name of the object file together with the address of each resolved symbol, the second is a letter that describes the status of the symbol, and the third is the name of the symbol. In our case `U` means `undefined` and `T` means that the symbol is placed in the `.text` section. So the `nm` util shows us that we have three symbols in the `main.o` object file:

* `factorial` - the factorial function defined in the `lib.c` source code file. It is marked as `undefined` because we compiled only the `main.c` source code file, and it does not know anything about the code from `lib.c` for now;
* `main` - the main function;
* `printf` - the function from the [glibc](https://en.wikipedia.org/wiki/GNU_C_Library) library. `main.c` does not know anything about it for now either.

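To make the three columns concrete, here is a minimal sketch that splits the `nm -A` lines shown above into their parts. The helper name `parse_nm_line` is hypothetical and only handles the two line shapes that appear in our output; real `nm` output has more symbol types and formats.

```python
def parse_nm_line(line):
    """Split one `nm -A` output line into (file, address, type, name)."""
    file_name, rest = line.split(":", 1)
    fields = rest.split()
    if len(fields) == 3:
        # Defined symbol: address, type letter, name
        address, sym_type, name = int(fields[0], 16), fields[1], fields[2]
    else:
        # Undefined symbol: the address column is empty
        address, sym_type, name = None, fields[0], fields[1]
    return file_name, address, sym_type, name


sample = """\
main.o:                 U factorial
main.o:0000000000000000 T main
main.o:                 U printf
"""

symbols = [parse_nm_line(line) for line in sample.splitlines()]
undefined = [name for _, _, sym_type, name in symbols if sym_type == "U"]
print(undefined)  # ['factorial', 'printf']
```

The two names in `undefined` are exactly the references that the linker will have to resolve later.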
What can we understand from the output of `nm` so far? It is simple. The `main.o` object file defines the symbol `main` at address `0000000000000000` (it will be filled with the correct address after linking) and has two unresolved symbols. We can see all of this information in the disassembly output of the `main.o` object file:

```
$ objdump -S main.o

main.o:     file format elf64-x86-64
Disassembly of section .text:

0000000000000000 <main>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 10             sub    $0x10,%rsp
   8:   89 7d fc                mov    %edi,-0x4(%rbp)
   b:   48 89 75 f0             mov    %rsi,-0x10(%rbp)
   f:   bf 05 00 00 00          mov    $0x5,%edi
  14:   e8 00 00 00 00          callq  19 <main+0x19>
  19:   89 c6                   mov    %eax,%esi
  1b:   bf 00 00 00 00          mov    $0x0,%edi
  20:   b8 00 00 00 00          mov    $0x0,%eax
  25:   e8 00 00 00 00          callq  2a <main+0x2a>
  2a:   b8 00 00 00 00          mov    $0x0,%eax
  2f:   c9                      leaveq
  30:   c3                      retq
```

Here we are interested only in the two `callq` operations. They contain `linker stubs`: the operand bytes are zero placeholders, and each call comes with a relocation entry that records the function name and an offset relative to the next instruction. These stubs will be updated to the real addresses of the functions. We can see the function names in the following `objdump` output:

```
$ objdump -S -r main.o

...
  14:   e8 00 00 00 00          callq  19 <main+0x19>
            15: R_X86_64_PC32   factorial-0x4
  19:   89 c6                   mov    %eax,%esi
...
  25:   e8 00 00 00 00          callq  2a <main+0x2a>
            26: R_X86_64_PC32   printf-0x4
  2a:   b8 00 00 00 00          mov    $0x0,%eax
...
```

The `-r` or `--reloc` flag of the `objdump` util prints the `relocation` entries of the file. Now let's learn a little about the relocation process.

Relocation
------------

Relocation is the process of connecting symbolic references with symbolic definitions. Let's look at the previous snippet from the `objdump` output:

```
  14:   e8 00 00 00 00          callq  19 <main+0x19>
            15: R_X86_64_PC32   factorial-0x4
  19:   89 c6                   mov    %eax,%esi
```

Note the `e8 00 00 00 00` on the first line. The `e8` is the [opcode](https://en.wikipedia.org/wiki/Opcode) of the `call` instruction with a relative offset. So `e8 00 00 00 00` contains a one-byte operation code followed by a four-byte relative offset. Note that `00 00 00 00` is four bytes, but why only four bytes if an address can be eight bytes on `x86_64`? Actually, we compiled the `main.c` source code file with `-mcmodel=small`. From the `gcc` man page:

```
-mcmodel=small
Generate code for the small code model: the program and its symbols must be linked in the lower 2 GB of the address space. Pointers are 64 bits. Programs can be statically or dynamically linked. This is the default code model.
```

Of course we didn't pass this option to `gcc` when we compiled `main.c`, but it is the default. We know from the quote from the `gcc` manual that our program will be linked in the lower 2 GB of the address space, so four bytes are enough. So we have the opcode of the `call` instruction and an unknown address. When we compile `main.c` with all of its dependencies into an executable file and look at the call of `factorial`, we will see:

```
$ gcc main.c lib.c -o factorial && objdump -S factorial | grep factorial

factorial:     file format elf64-x86-64
...
...
0000000000400506 <main>:
  40051a:   e8 18 00 00 00          callq  400537 <factorial>
...
...
0000000000400537 <factorial>:
  400550:   75 07                   jne    400559 <factorial+0x22>
  400557:   eb 1b                   jmp    400574 <factorial+0x3d>
  400559:   eb 0e                   jmp    400569 <factorial+0x32>
  40056f:   7e ea                   jle    40055b <factorial+0x24>
...
...
```

As we can see in the previous output, the address of the `main` function is `0x0000000000400506`. Why does it not start at `0x0`? You may already know that a standard C program is linked with the `glibc` C standard library unless `-nostdlib` is passed to `gcc`. The compiled code for a program includes constructor functions that initialize data in the program when the program is started. These functions need to be called before the `main` function is called. To make the initialization and termination functions work, the compiler must output something in the assembler code to cause those functions to be called at the appropriate time. The initialization code is placed in a special section called `.init`. We can see it at the beginning of the `objdump` output:

```
$ objdump -S factorial | less

factorial:     file format elf64-x86-64

Disassembly of section .init:

00000000004003a8 <_init>:
  4003a8:   48 83 ec 08             sub    $0x8,%rsp
  4003ac:   48 8b 05 a5 05 20 00    mov    0x2005a5(%rip),%rax        # 600958 <_DYNAMIC+0x1d0>
```

Note that it starts at the `0x00000000004003a8` address, before the code of our own program. We can also check it in the resulting [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format):

```
$ readelf -d factorial | grep \(INIT\)
 0x000000000000000c (INIT)               0x4003a8
```

So the address of the `main` function is `0000000000400506`, which comes after the `.init` section. As we can see from the output, the address of the `factorial` function is `0x0000000000400537`, and the binary code for the call of the `factorial` function is now `e8 18 00 00 00`. We already know that `e8` is the opcode of the `call` instruction. The next `18 00 00 00` (note that values are stored as little endian on `x86_64`, in other words it is `00 00 00 18`) is the offset from the instruction that follows the `callq` to the `factorial` function:

```python
>>> hex(0x40051a + 0x18 + 0x5) == hex(0x400537)
True
```

So we add `0x18` and `0x5` to the address of the `call` instruction. The offset is measured from the address of the following instruction. Our call instruction is five bytes long (`e8 18 00 00 00`), and `0x18` is the offset from the instruction after the call to the `factorial` function. A compiler generally creates each object file with the program addresses starting at zero. But if a program is created from multiple object files, their address ranges would overlap. What we have just seen is the process called `relocation`. This process assigns load addresses to the various parts of the program, adjusting the code and data in the program to reflect the assigned addresses.

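The arithmetic above matches the way an `R_X86_64_PC32` relocation is computed: `S + A - P`, where `S` is the address of the symbol, `A` is the addend (the `-0x4` from the `objdump -r` output) and `P` is the address of the four-byte field being patched, which is one byte past the `e8` opcode. The following is only a sketch of that formula; the helper names `pc32` and `encode_call` are made up for this illustration and are not part of any tool.

```python
import struct


def pc32(symbol_addr, addend, place_addr):
    """Value stored by an R_X86_64_PC32 relocation: S + A - P."""
    return symbol_addr + addend - place_addr


def encode_call(call_addr, target_addr):
    """Encode a `call rel32` placed at call_addr that targets target_addr."""
    # The rel32 field starts one byte after the e8 opcode.
    rel = pc32(target_addr, -4, call_addr + 1)
    # "<i" is a little-endian signed 32-bit value; struct.error is raised
    # if the target is too far away for a four-byte offset.
    return b"\xe8" + struct.pack("<i", rel)


code = encode_call(0x40051a, 0x400537)
print(" ".join(f"{byte:02x}" for byte in code))  # e8 18 00 00 00
```

The result is the same byte sequence that we saw in the disassembly of the linked `factorial` executable.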
OK, now we know a little about linkers and relocation. It is time to link our object files and learn more about linkers.

GNU linker
-----------------

As you can understand from the title, I will use the [GNU linker](https://en.wikipedia.org/wiki/GNU_linker) or just `ld` in this post. Of course we can use `gcc` to link our `factorial` project:

```
$ gcc main.c lib.o -o factorial
```

and after it we will get the executable file `factorial` as a result:

```
$ ./factorial
factorial of 5 is: 120
```

But `gcc` does not link object files itself. Instead it uses `collect2`, which is just a wrapper for the `GNU ld` linker:

```
~$ /usr/lib/gcc/x86_64-linux-gnu/4.9/collect2 --version
collect2 version 4.9.3
/usr/bin/ld --version
GNU ld (GNU Binutils for Debian) 2.25
...
...
...
```

OK, we can use `gcc` and it will produce the executable file of our program for us. But let's look at how to use the `GNU ld` linker for the same purpose. First of all, let's try to link these object files with the following command:

```
ld main.o lib.o -o factorial
```

Try to do it and you will get the following error:

```
$ ld main.o lib.o -o factorial
ld: warning: cannot find entry symbol _start; defaulting to 00000000004000b0
main.o: In function `main':
main.c:(.text+0x26): undefined reference to `printf'
```

Here we can see two problems:

* The linker can't find the `_start` symbol;
* The linker does not know anything about the `printf` function.

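The second problem can be pictured with a toy model of symbol resolution. This is only a sketch: the symbol table for `lib.o` is an assumption (the post only shows `nm` output for `main.o`), and a real linker also deals with archives, weak symbols and much more.

```python
def unresolved_symbols(objects):
    """Return the names referenced by some object file but defined by none."""
    defined = set()
    referenced = set()
    for symbols in objects.values():
        defined |= symbols["defined"]
        referenced |= symbols["undefined"]
    return referenced - defined


# Hypothetical symbol tables for the two object files passed to ld.
objects = {
    "main.o": {"defined": {"main"}, "undefined": {"factorial", "printf"}},
    "lib.o": {"defined": {"factorial"}, "undefined": set()},
}

print(sorted(unresolved_symbols(objects)))  # ['printf']
```

The `factorial` reference is satisfied by `lib.o`, but nothing on the command line defines `printf`, so it stays unresolved, just as in the error message above.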
First of all, let's try to understand what this `_start` entry symbol is, which appears to be required for our program to run. When I started to learn programming, I learned that the `main` function is the entry point of the program. I think you learned this too :) But actually it is not the entry point; `_start` is. The `_start` symbol is defined in the `crt1.o` object file. We can find it with:

```
$ objdump -S /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o

/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <_start>:
   0:   31 ed                   xor    %ebp,%ebp
   2:   49 89 d1                mov    %rdx,%r9
   ...
   ...
   ...
```

We pass this object file to the `ld` command as the first argument. Now let's try to link again and look at the result:

```
$ ld /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o \
main.o lib.o -o factorial

/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o: In function `_start':
/tmp/buildd/glibc-2.19/csu/../sysdeps/x86_64/start.S:115: undefined reference to `__libc_csu_fini'
/tmp/buildd/glibc-2.19/csu/../sysdeps/x86_64/start.S:116: undefined reference to `__libc_csu_init'
/tmp/buildd/glibc-2.19/csu/../sysdeps/x86_64/start.S:122: undefined reference to `__libc_start_main'
main.o: In function `main':
main.c:(.text+0x26): undefined reference to `printf'
```

Unfortunately we see even more errors. We can see the old error about the undefined `printf` and three more undefined references:

* `__libc_csu_fini`
* `__libc_csu_init`
* `__libc_start_main`

The `_start` symbol is defined in the [sysdeps/x86_64/start.S](https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/start.S;h=0d27a38e9c02835ce17d1c9287aa01be222e72eb;hb=HEAD) assembly file in the `glibc` source code. We can find the following assembly code lines there:

```assembly
mov $__libc_csu_fini, %R8_LP
mov $__libc_csu_init, %RCX_LP
...
call __libc_start_main
```

Here we pass the addresses of the functions that run the code from the `.init` and `.fini` sections, that is, the code that executes when the program starts and the code that executes when the program terminates. At the end we see the call of `__libc_start_main`, which in turn calls the `main` function of our program. The `__libc_csu_init` and `__libc_csu_fini` symbols are defined in the [csu/elf-init.c](https://sourceware.org/git/?p=glibc.git;a=blob;f=csu/elf-init.c;hb=1d4bbc54bd4f7d85d774871341b49f4357af1fb7) source code file, and `__libc_start_main` is provided by `glibc` as well. The following two object files:

* `crti.o`;
* `crtn.o`.

define the function prologs/epilogs for the `.init` and `.fini` sections (with the `_init` and `_fini` symbols respectively).

The `crtn.o` object file contains these `.init` and `.fini` sections:

```
$ objdump -S /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crtn.o

0000000000000000 <.init>:
   0:   48 83 c4 08             add    $0x8,%rsp
   4:   c3                      retq

Disassembly of section .fini:

0000000000000000 <.fini>:
   0:   48 83 c4 08             add    $0x8,%rsp
   4:   c3                      retq
```

And `crti.o` contains the `_init` and `_fini` symbols. Let's try to link again with these two object files:

```
$ ld \
    /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o \
    /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crti.o \
    /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crtn.o main.o lib.o \
    -o factorial
```

And anyway we will get the same errors. Now we need to pass the `-lc` option to `ld`. This option makes `ld` look for the C standard library (`libc.so` or `libc.a`) in its library search path, that is, the directories given with `-L` plus the linker's default directories. Let's try to link again with the `-lc` option:

```
$ ld \
    /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o \
    /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crti.o \
    /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crtn.o main.o lib.o -lc \
    -o factorial
```

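As an aside, the way an option like `-lc` is turned into a file name can be sketched as follows. This is a simplified model under stated assumptions: the function name `find_library` is hypothetical, the directory layout is a throwaway one created only for the demonstration, and the real linker has more rules (for example static linking and `-l:filename`).

```python
import os
import tempfile


def find_library(name, search_dirs):
    """Return the first lib<name>.so or lib<name>.a found in search_dirs."""
    for directory in search_dirs:
        # Within one directory the shared object is preferred over the archive.
        for suffix in (".so", ".a"):
            candidate = os.path.join(directory, "lib" + name + suffix)
            if os.path.isfile(candidate):
                return candidate
    return None


# Demonstration in a temporary directory, not a real system library path.
with tempfile.TemporaryDirectory() as libdir:
    open(os.path.join(libdir, "libc.a"), "w").close()
    only_archive = os.path.basename(find_library("c", [libdir]))
    open(os.path.join(libdir, "libc.so"), "w").close()
    with_shared = os.path.basename(find_library("c", [libdir]))

print(only_archive, with_shared)  # libc.a libc.so
```

So `-lc` ends up as `libc.so` when a shared library is available in the searched directory, and as `libc.a` otherwise.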
Finally we get an executable file, but if we try to run it, we will get a strange result:

```
$ ./factorial
bash: ./factorial: No such file or directory
```

What's the problem here? Let's look at the executable file with the [readelf](https://sourceware.org/binutils/docs/binutils/readelf.html) util:

```
$ readelf -l factorial

Elf file type is EXEC (Executable file)
Entry point 0x4003c0
There are 7 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x0000000000000188 0x0000000000000188  R E    8
  INTERP         0x00000000000001c8 0x00000000004001c8 0x00000000004001c8
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000000610 0x0000000000000610  R E    200000
  LOAD           0x0000000000000610 0x0000000000600610 0x0000000000600610
                 0x00000000000001cc 0x00000000000001cc  RW     200000
  DYNAMIC        0x0000000000000610 0x0000000000600610 0x0000000000600610
                 0x0000000000000190 0x0000000000000190  RW     8
  NOTE           0x00000000000001e4 0x00000000004001e4 0x00000000004001e4
                 0x0000000000000020 0x0000000000000020  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     10

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame
   03     .dynamic .got .got.plt .data
   04     .dynamic
   05     .note.ABI-tag
   06
```

Note the strange line:

```
  INTERP         0x00000000000001c8 0x00000000004001c8 0x00000000004001c8
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
```

The `.interp` section in the `elf` file holds the path name of a program interpreter; in other words, the `.interp` section simply contains an `ascii` string that is the name of the dynamic linker. The dynamic linker is the part of the operating system that loads and links the shared libraries needed by an executable when it is executed, by copying the content of the libraries from disk to RAM. As we can see in the output of the `readelf` command, on `x86_64` it is located at `/lib64/ld-linux-x86-64.so.2`. If the path recorded in `.interp` does not exist on the system, the program cannot be started and the shell reports `No such file or directory`. Now let's pass the `-dynamic-linker` option with the path of `ld-linux-x86-64.so.2` to `ld` and look at the result:

```
$ gcc -c main.c lib.c

$ ld \
    /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o \
    /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crti.o \
    /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crtn.o main.o lib.o \
    -dynamic-linker /lib64/ld-linux-x86-64.so.2 \
    -lc -o factorial
```

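If you want to check which interpreter path ended up in an executable without `readelf`, the `INTERP` program header can be read by hand. The sketch below assumes a little-endian 64-bit ELF image and uses the standard ELF64 header layout; the function name `read_interp` is made up for this example, and error handling is kept minimal.

```python
import struct

PT_INTERP = 3  # program header type that points at the interpreter path


def read_interp(data):
    """Return the PT_INTERP path of a little-endian ELF64 image, or None."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    # e_phoff is at offset 32; e_phentsize and e_phnum are at offsets 54 and 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for index in range(e_phnum):
        base = e_phoff + index * e_phentsize
        (p_type,) = struct.unpack_from("<I", data, base)
        if p_type == PT_INTERP:
            (p_offset,) = struct.unpack_from("<Q", data, base + 8)
            (p_filesz,) = struct.unpack_from("<Q", data, base + 32)
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode("ascii")
    return None
```

It could be used as `read_interp(open("factorial", "rb").read())`, which for the executable linked above should give the path that we passed with `-dynamic-linker`.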
Now we can run it as a normal executable file:

```
$ ./factorial

factorial of 5 is: 120
```

It works! With the first line we compile the `main.c` and `lib.c` source code files to object files. We get `main.o` and `lib.o` after the execution of `gcc`:

```
$ file lib.o main.o
lib.o:  ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
main.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
```

After this we link the object files of our program with the needed system object files and libraries. We have just seen a simple example of how to compile and link a C program with the `gcc` compiler and the `GNU ld` linker. In this example we used a couple of command line options of the `GNU linker`, but it supports many more command line options than `-o`, `-dynamic-linker`, etc. Moreover, `GNU ld` has its own language that allows us to control the linking process. We will look at both in the next two sections.

Useful command line options of the GNU linker
----------------------------------------------

As I already wrote, and as you can see in the manual of the `GNU linker`, it has a big set of command line options. We've seen a couple of options in this post: `-o <output>`, which tells `ld` to produce an output file called `output` as the result of linking; `-l<name>`, which adds the archive or object file specified by the name; and `-dynamic-linker`, which specifies the name of the dynamic linker. Of course `ld` supports many more command line options; let's look at some of them.

The first useful command line option is `@file`. Here `file` specifies the name of a file from which command line options will be read. For example, we can create a file with the name `linker.ld`, put our command line arguments from the previous example there, and run:

```
$ ld @linker.ld
```

The next command line option is `-b` or `--format`. This command line option specifies the format of the input object files: `ELF`, `DJGPP/COFF`, etc. There is a command line option for the same purpose but for the output file: `--oformat=output-format`.

The next command line option is `--defsym`. The full format of this command line option is `--defsym=symbol=expression`. It allows us to create a global symbol in the output file containing the absolute address given by the expression. Here is a case where this command line option is useful. Let's look at the Linux kernel source code, more precisely at the Makefile related to kernel decompression for the ARM architecture - [arch/arm/boot/compressed/Makefile](https://github.com/torvalds/linux/blob/master/arch/arm/boot/compressed/Makefile). We can find the following definition there:

```
LDFLAGS_vmlinux = --defsym _kernel_bss_size=$(KBSS_SZ)
```

As we already know, it defines the `_kernel_bss_size` symbol with the size of the `.bss` section in the output file. This symbol will be used in the first [assembly file](https://github.com/torvalds/linux/blob/master/arch/arm/boot/compressed/head.S) that is executed during kernel decompression:

```assembly
ldr r5, =_kernel_bss_size
```

The next command line option is `-shared`, which allows us to create a shared library. The `-M` (or `--print-map`) command line option prints the link map with information about symbols to the standard output, and `-Map <filename>` writes it to a file. In our case:

```
$ ld -M @linker.ld
...
...
...
.text           0x00000000004003c0      0x112
 *(.text.unlikely .text.*_unlikely .text.unlikely.*)
 *(.text.exit .text.exit.*)
 *(.text.startup .text.startup.*)
 *(.text.hot .text.hot.*)
 *(.text .stub .text.* .gnu.linkonce.t.*)
 .text          0x00000000004003c0       0x2a /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o
...
...
...
 .text          0x00000000004003ea       0x31 main.o
                0x00000000004003ea                main
 .text          0x000000000040041b       0x3f lib.o
                0x000000000040041b                factorial
```

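The addresses in this map can be checked by hand: for the three `.text` input sections shown, each one starts where the previous one ends. A minimal sketch of that placement, ignoring alignment padding (the function name `place_back_to_back` is only illustrative):

```python
def place_back_to_back(start, sizes):
    """Give each input section the first free address after the previous one."""
    addresses = []
    location = start
    for size in sizes:
        addresses.append(location)
        location += size  # no alignment padding in this simplified model
    return addresses


# Sizes of the .text input sections of crt1.o, main.o and lib.o from the map.
addresses = place_back_to_back(0x4003c0, [0x2a, 0x31, 0x3f])
print([hex(address) for address in addresses])
# ['0x4003c0', '0x4003ea', '0x40041b']
```

These are the same addresses that the link map reports for `crt1.o`, `main` and `factorial`.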
Of course the `GNU linker` supports the standard command line options `--help` and `--version`, which print usage help for `ld` and its version. That's all about the command line options of the `GNU linker`. Of course this is not the full set of command line options supported by the `ld` util. You can find the full description in the manual of this util.

Linker Control Language
----------------------------------------------

As I wrote previously, `ld` has support for its own language. It accepts Linker Command Language files written in a superset of AT&T's Link Editor Command Language syntax, to provide explicit and total control over the linking process. Let's look at its details.

With the linker language we can control:

* input files;
* output files;
* file formats;
* addresses of sections;
* etc.

Usually commands written in the linker control language are placed in a file called a linker script. We can pass it to `ld` with the `-T` command line option. The main command in a linker script is `SECTIONS`. Each linker script must contain this command, which determines the `map` of the output file. The special variable `.` contains the current position in the output. Let's write a simple assembly program and look at how we can use a linker script to control its linking. We will take a hello world program as an example:

```assembly
section .data
    msg db "hello, world!",`\n`
section .text
    global _start
_start:
    mov rax, 1
    mov rdi, 1
    mov rsi, msg
    mov rdx, 14
    syscall
    mov rax, 60
    mov rdi, 0
    syscall
```

We can compile and link it with:

```
$ nasm -f elf64 -o hello.o hello.asm
$ ld -o hello hello.o
```

Our program consists of two sections: `.text` contains the code of the program and `.data` contains initialized variables. Let's write a simple linker script and try to link our `hello.asm` assembly file with it. Our script is:

```
/*
 * Linker script for the hello world program
 */
OUTPUT(hello)
OUTPUT_FORMAT("elf64-x86-64")
INPUT(hello.o)

SECTIONS
{
    . = 0x200000;
    .text : {
          *(.text)
    }

    . = 0x400000;
    .data : {
          *(.data)
    }
}
```

On the first three lines you can see a comment written in `C` style. After it, the `OUTPUT` and `OUTPUT_FORMAT` commands specify the name of our executable file and its format. The next command, `INPUT`, specifies the input file for the `ld` linker. After these commands we can see the main `SECTIONS` command; as I already wrote, each linker script must contain it. The `SECTIONS` command describes the set and order of the sections that will be in the output file.

At the beginning of the `SECTIONS` command we can see the line `. = 0x200000`. I already wrote above that `.` points to the current position in the output. This line says that the code should be loaded at address `0x200000`, and the line `. = 0x400000` says that the data section should be loaded at address `0x400000`. The line after `. = 0x200000` defines the `.text` section as an output section. We can see the `*(.text)` expression inside it. The `*` symbol is a wildcard that matches any file name. In other words, the `*(.text)` expression means all `.text` input sections in all input files. We could rewrite it as `hello.o(.text)` for our example. After the next location counter assignment, `. = 0x400000`, we can see the definition of the data section.

We can compile and link it with:

```
$ nasm -f elf64 -o hello.o hello.asm && ld -T linker.script && ./hello
hello, world!
```

If we look inside with the `objdump` util, we will see that the `.text` section starts at address `0x200000` and the `.data` section starts at address `0x400000`:

```
$ objdump -D hello

Disassembly of section .text:

0000000000200000 <_start>:
  200000:   b8 01 00 00 00          mov    $0x1,%eax
  ...

Disassembly of section .data:

0000000000400000 <msg>:
  400000:   68 65 6c 6c 6f          pushq  $0x6f6c6c65
  ...
```

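Do not be confused by the `pushq` in the `.data` section. `objdump -D` disassembles every section, so it tries to decode the bytes of our string as instructions. A quick check of the five bytes from the dump shows what they really are:

```python
# First five bytes of the .data section, copied from the objdump output.
data = bytes.fromhex("68 65 6c 6c 6f")
text = data.decode("ascii")
print(text)  # hello

# The "immediate" of the bogus pushq is just the next four characters.
immediate = (0x6f6c6c65).to_bytes(4, "little")
print(immediate)  # b'ello'
```

So the address `0x400000` indeed holds the beginning of our `hello, world!` message.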
Apart from the commands that we have already seen, there are a few other linker script commands. The first is `ASSERT(exp, message)`, which ensures that the given expression is not zero. If it is zero, the linker exits with an error code and prints the given error message. If you've read about the Linux kernel booting process in the [linux-insides](http://0xax.gitbooks.io/linux-insides/content/) book, you may know that the setup header of the Linux kernel has the offset `0x1f1`. In the linker script of the Linux kernel we can find a check for this:

```
. = ASSERT(hdr == 0x1f1, "The setup header has the wrong offset!");
```

The next command, `INCLUDE filename`, allows us to include an external linker script in the current one. In a linker script we can also assign a value to a symbol. `ld` supports a number of assignment operators:

* `symbol = expression ;`
* `symbol += expression ;`
* `symbol -= expression ;`
* `symbol *= expression ;`
* `symbol /= expression ;`
* `symbol <<= expression ;`
* `symbol >>= expression ;`
* `symbol &= expression ;`
* `symbol |= expression ;`

As you can see, all of these are C assignment operators. For example, we can use them in our linker script as:

```
START_ADDRESS = 0x200000;
DATA_OFFSET   = 0x200000;

SECTIONS
{
    . = START_ADDRESS;
    .text : {
          *(.text)
    }

    . = START_ADDRESS + DATA_OFFSET;
    .data : {
          *(.data)
    }
}
```

As you may have already noticed, the syntax for expressions in the linker script language is identical to that of C expressions. Besides this, the linker control language supports the following builtin functions:

* `ABSOLUTE` - returns the absolute (non-relocatable) value of the given expression;
* `ADDR` - takes a section and returns its address;
* `ALIGN` - returns the value of the location counter (the `.` operator) aligned to the next boundary given by the expression;
* `DEFINED` - returns `1` if the given symbol is in the global symbol table and `0` otherwise;
* `MAX` and `MIN` - return the maximum and minimum of the two given expressions;
* `NEXT` - returns the next unallocated address that is a multiple of the given expression;
* `SIZEOF` - returns the size in bytes of the given named section.

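To get a feeling for what `ALIGN` computes, here is a small sketch of the usual rounding formula, `(value + boundary - 1) & ~(boundary - 1)`. The function name `align_up` is just for illustration, and the boundary is assumed to be a power of two.

```python
def align_up(value, boundary):
    """Round value up to the next multiple of boundary (a power of two)."""
    return (value + boundary - 1) & ~(boundary - 1)


print(hex(align_up(0x4003ea, 16)))    # 0x4003f0
print(hex(align_up(0x200000, 4096)))  # 0x200000
```

An address that is already aligned stays as it is, and any other address moves up to the next boundary.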
That's all.

Conclusion
-----------------

This is the end of the post about linkers. We learned many things about linkers in this post, such as what a linker is, why we need it, how to use it, etc.

If you have any questions or suggestions, write me an [email](kuleshovmail@gmail.com) or ping [me](https://twitter.com/0xAX) on twitter.

Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes, please let me know via email or send a PR.

Links
-----------------

* [Book about Linux kernel internals](http://0xax.gitbooks.io/linux-insides/content/)
* [linker](https://en.wikipedia.org/wiki/Linker_%28computing%29)
* [object files](https://en.wikipedia.org/wiki/Object_file)
* [glibc](https://en.wikipedia.org/wiki/GNU_C_Library)
* [opcode](https://en.wikipedia.org/wiki/Opcode)
* [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format)
* [GNU linker](https://en.wikipedia.org/wiki/GNU_linker)
* [My posts about assembly programming for x86_64](http://0xax.github.io/categories/assembly/)
* [readelf](https://sourceware.org/binutils/docs/binutils/readelf.html)