1
0
mirror of https://github.com/0xAX/linux-insides.git synced 2024-12-31 10:50:57 +00:00

fix typos and sentences in linkers.md

This commit is contained in:
Johan Manuel 2015-08-03 14:36:30 +02:00
parent cf74bb9fd7
commit 40e7cb75a4

View File

@ -3,7 +3,7 @@ Introduction
During the writing of the [linux-insides](http://0xax.gitbooks.io/linux-insides/content/) book I have received many emails with questions related to the [linker](https://en.wikipedia.org/wiki/Linker_%28computing%29) script and linker-related subjects. So I've decided to write this to cover some aspects of the linker and the linking of object files.
If we open page the `Linker` page on wikipidia, we will see following definition:
If we open page the `Linker` page on wikipidia, we can see the following definition:
>In computer science, a linker or link editor is a computer program that takes one or more object files generated by a compiler and combines them into a single executable file, library file, or another object file.
@ -34,7 +34,7 @@ int main(int argc, char **argv) {
}
```
The `lib.c` contains:
The `lib.c` file contains:
```C
int factorial(int base) {
@ -53,7 +53,7 @@ int factorial(int base) {
}
```
And the `lib.h`:
And the `lib.h` file contains:
```C
#ifndef LIB_H
@ -140,14 +140,14 @@ Relocation is the process of connecting symbolic references with symbolic defini
19: 89 c6 mov %eax,%esi
```
Note `e8 00 00 00 00` on the first line. The `e8` is the [opcode](https://en.wikipedia.org/wiki/Opcode) of the `call` instruction with a relative offset. So the `e8 00 00 00 00` contains a one-byte operation code followed by a four-byte address. Note that the `00 00 00 00` is 4-bytes, but why only 4-bytes if an address can be 8-bytes in the `x86_64`. Actually we compiled the `main.c` source code file with the `-mcmodel=small`. From the `gcc` man:
Note `e8 00 00 00 00` on the first line. The `e8` is the [opcode](https://en.wikipedia.org/wiki/Opcode) of the `call` instruction with a relative offset. So the `e8 00 00 00 00` contains a one-byte operation code followed by a four-byte address. Note that the `00 00 00 00` is 4-bytes, but why only 4-bytes if an address can be 8-bytes in the `x86_64`? Actually we compiled the `main.c` source code file with the `-mcmodel=small`. From the `gcc` man:
```
-mcmodel=small
Generate code for the small code model: the program and its symbols must be linked in the lower 2 GB of the address space. Pointers are 64 bits. Programs can be statically or dynamically linked. This is the default code model.
```
Of course we didn't pass this option to the `gcc` when we compiled the `main.c`, but it is default. We know that our program will be linked in the lower 2 GB of the address space from the quoute from `gcc` manual. In this way 4-bytes enough for this. So we have opcode of the `call` instruction and unknown address. When we compile `main.c` with all dependencies to the executable file and will look on the call of the factorial we will see:
Of course we didn't pass this option to the `gcc` when we compiled the `main.c`, but it is default. We know that our program will be linked in the lower 2 GB of the address space from the quote from the `gcc` manual. With this code model, 4-bytes is enough to represent the address. So we have opcode of the `call` instruction and unknown address. When we compile `main.c` with all dependencies to the executable file and will look on the call of the factorial we will see:
```
$ gcc main.c lib.c -o factorial | objdump -S factorial | grep factorial
@ -168,7 +168,7 @@ factorial: file format elf64-x86-64
...
```
As we can see in the previous output, the address of the `main` function is `0x0000000000400506`. Why it does not starts from the `0x0`? You already can know that standard C program is linked with the `glibc` C standard library if the `-nostdlib` was not passed to the `gcc`. The compiled code for a program includes constructors functions to initialize data in the program when the program is started. These functions need to be called before the program is started or in another words before the `main` function is called. To make the initialization and termination functions work, the compiler must output something in the assembler code to cause those functions to be called at the appropriate time. Execution of this program will starts from the code that placed in the special section which is called `.init`. We can see it in the beginning of the objdump output:
As we can see in the previous output, the address of the `main` function is `0x0000000000400506`. Why it does not starts from the `0x0`? You may already know that standard C programs are linked with the `glibc` C standard library unless `-nostdlib` is passed to `gcc`. The compiled code for a program includes constructors functions to initialize data in the program when the program is started. These functions need to be called before the program is started or in another words before the `main` function is called. To make the initialization and termination functions work, the compiler must output something in the assembler code to cause those functions to be called at the appropriate time. Execution of this program will starts from the code that is placed in the special section which is called `.init`. We can see it in the beginning of the objdump output:
```
objdump -S factorial | less
@ -182,7 +182,7 @@ Disassembly of section .init:
4003ac: 48 8b 05 a5 05 20 00 mov 0x2005a5(%rip),%rax # 600958 <_DYNAMIC+0x1d0>
```
Not that it starts at the `0x00000000004003a8` address relative to the `glibc` code. We can check it also in the resulted [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format):
Note that it starts at the `0x00000000004003a8` address relative to the `glibc` code. We can check it also in the resulted [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format):
```
$ readelf -d factorial | grep \(INIT\)
@ -196,7 +196,7 @@ So, the address of the `main` function is the `0000000000400506` and it is offse
True
```
So we add `0x18` and `0x5` to the address of the `call` instruction. The offset is measured from the address of the following instruction. Our call instruction is 5-bytes size - `e8 18 00 00 00` and the `0x18` is the offset from the next after call instruction to the `factorial` function. A compiler generally creates each object file with the program addresses starting at zero. But if a program is created from multiple object files, all of they will be overlapped. Just now we saw a process which called - `relocation`. This process assigns load addresses to the various parts of the program, adjusting the code and data in the program to reflect the assigned addresses.
So we add `0x18` and `0x5` to the address of the `call` instruction. The offset is measured from the address of the following instruction. Our call instruction is 5-bytes size - `e8 18 00 00 00` and the `0x18` is the offset from the next after call instruction to the `factorial` function. A compiler generally creates each object file with the program addresses starting at zero. But if a program is created from multiple object files, all of them will be overlapped. Just now we saw a process which is called `relocation`. This process assigns load addresses to the various parts of the program, adjusting the code and data in the program to reflect the assigned addresses.
Ok, now we know a little about linkers and relocation. Time to link our object files and to know more about linkers.
@ -248,7 +248,7 @@ Here we can see two problems:
* Linker can't find `_start` symbol;
* Linker does not know anything about `printf` function.
First of all let's try to understand what is this `_start` entry symbol that appears to be required for our program to run? When I've started to learn programming I have learned that `main` function is the entry point of the program. I think you learned this too :) But actually it is not entry point, there is `_start` instead. The `_start` symbol defined in the `crt1.o` object file. We can find it with the:
First of all let's try to understand what is this `_start` entry symbol that appears to be required for our program to run? When I started to learn programming I learned that the `main` function is the entry point of the program. I think you learned this too :) But it actually isn't the entry point, it's `_start` instead. The `_start` symbol is defined in the `crt1.o` object file. We can find it with the following command:
```
$ objdump -S /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o
@ -266,7 +266,7 @@ Disassembly of section .text:
...
```
and we pass this object file to the `ld` command as first argumet (see above). Now let's try to link it and will look on result:
We pass this object file to the `ld` command as its first argument (see above). Now let's try to link it and will look on result:
```
ld /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o \
@ -286,7 +286,7 @@ Unfortunately we will see even more errors. We can see here old error about unde
* `__libc_csu_init`
* `__libc_start_main`
The `_start` symbol defined in the [sysdeps/x86_64/start.S](https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/start.S;h=0d27a38e9c02835ce17d1c9287aa01be222e72eb;hb=HEAD) assembly file in the `glibc` source code. We can find following assembly code lines there:
The `_start` symbol is defined in the [sysdeps/x86_64/start.S](https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/start.S;h=0d27a38e9c02835ce17d1c9287aa01be222e72eb;hb=HEAD) assembly file in the `glibc` source code. We can find following assembly code lines there:
```assembly
mov $__libc_csu_fini, %R8_LP
@ -295,12 +295,12 @@ mov $__libc_csu_init, %RCX_LP
call __libc_start_main
```
Here we pass address of the entry point to the `.init` and `.fini` section that contain code that starts to execute when program runned and the code that executes when program terminates. And in the end we see the call of the `main` function from our program. These three symbols defined in the [csu/elf-init.c](https://sourceware.org/git/?p=glibc.git;a=blob;f=csu/elf-init.c;hb=1d4bbc54bd4f7d85d774871341b49f4357af1fb7) source code file. The following two object files:
Here we pass address of the entry point to the `.init` and `.fini` section that contain code that starts to execute when the program is ran and the code that executes when program terminates. And in the end we see the call of the `main` function from our program. These three symbols are defined in the [csu/elf-init.c](https://sourceware.org/git/?p=glibc.git;a=blob;f=csu/elf-init.c;hb=1d4bbc54bd4f7d85d774871341b49f4357af1fb7) source code file. The following two object files:
* `crtn.o`;
* `crtn.i`.
Defines the function prologs/epilogs for the .init and .fini sections (with the `_init` and `_fini` symbols respectively).
define the function prologs/epilogs for the .init and .fini sections (with the `_init` and `_fini` symbols respectively).
The `crtn.o` object file contains these `.init` and `.fini` sections:
@ -318,7 +318,7 @@ Disassembly of section .fini:
4: c3 retq
```
And the `crti.o` contains `_init` and `_fini` symbols. Let's try to link again with these two object files:
And the `crti.o` object file contains the `_init` and `_fini` symbols. Let's try to link again with these two object files:
```
$ ld \
@ -328,7 +328,7 @@ $ ld \
-o factorial
```
And anyway we will get the same errors. Now we need to pass `-lc` option to the `ld`. This option will search the standard library in the paths that are pointed in the `$LD_LIBRARY_PATH` enviroment variable. Let's try to link again wit the `-lc` option:
And anyway we will get the same errors. Now we need to pass `-lc` option to the `ld`. This option will search for the standard library in the paths present in the `$LD_LIBRARY_PATH` enviroment variable. Let's try to link again wit the `-lc` option:
```
$ ld \
@ -338,7 +338,7 @@ $ ld \
-o factorial
```
Finally we will get executable file, but if we will try to run it, we will get strange result:
Finally we get an executable file, but if we try to run it, we will get strange results:
```
$ ./factorial
@ -392,7 +392,7 @@ Note on the strange line:
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
```
The `.interp` section in the `elf` file holds the path name of a program interpreter or in another words the `.interp` section simply contains an `ascii` string that is the name of the dynamic linker. The dynamic linker is the part of an Linux that loads and links shared libraries needed by an executable when it is executed, by copying the content of libraries from disk to RAM. As we can see in the output of the `readelf` command it placed in the `/lib64/ld-linux-x86-64.so.2` for the `x86_64`. Now let's add pass `-dynamic-linker` option with the path of the `ld-linux-x86-64.so.2` to the `ld` and will see on the result:
The `.interp` section in the `elf` file holds the path name of a program interpreter or in another words the `.interp` section simply contains an `ascii` string that is the name of the dynamic linker. The dynamic linker is the part of Linux that loads and links shared libraries needed by an executable when it is executed, by copying the content of libraries from disk to RAM. As we can see in the output of the `readelf` command it is placed in the `/lib64/ld-linux-x86-64.so.2` file for the `x86_64` architecture. Now let's add the `-dynamic-linker` option with the path of `ld-linux-x86-64.so.2` to the `ld` call and will see the following results:
```
$ gcc -c main.c lib.c
@ -413,7 +413,7 @@ $ ./factorial
factorial of 5 is: 120
```
It works! With the first line we compile the `main.c` and the `lib.c` source code files to the object files. We will get the `main.o` and the `lib.o` after execution of the `gcc`:
It works! With the first line we compile the `main.c` and the `lib.c` source code files to object files. We will get the `main.o` and the `lib.o` after execution of the `gcc`:
```
$ file lib.o main.o
@ -421,12 +421,12 @@ lib.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
main.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
```
and after this we link object files of the our program with the needed system object files and libraries. We just saw simple example how to compile and link C program with the `gcc` compiler and `GNU ld` linker. In this example we have used a couple command line options of the `GNU linker`, but it supports much more command line options than `-o`, `-dynamic-linker` and etc. Moreover `GNU ld` has own language that allows to control of the linking process. In the next two paragraps we will look it.
and after this we link object files of our program with the needed system object files and libraries. We just saw a simple example of how to compile and link a C program with the `gcc` compiler and `GNU ld` linker. In this example we have used a couple command line options of the `GNU linker`, but it supports much more command line options than `-o`, `-dynamic-linker`, etc... Moreover `GNU ld` has its own language that allows to control the linking process. In the next two paragraphs we will look into it.
Useful command line options of the GNU linker
----------------------------------------------
As I already wrote and as you can see in the manual of the `GNU linker`, it has big set of the command line options. We've seen a couple of options in this post: `-o <output>` - that tells `ld` to produce an output file called `output` as the result of linking, `-l<name>` that adds the archive or object file specified by the name, `-dynamic-linker` that specifies the name of the dynamic linker. Of course the `ld` supports much more command line options, let's look on some of it.
As I already wrote and as you can see in the manual of the `GNU linker`, it has big set of the command line options. We've seen a couple of options in this post: `-o <output>` - that tells `ld` to produce an output file called `output` as the result of linking, `-l<name>` that adds the archive or object file specified by the name, `-dynamic-linker` that specifies the name of the dynamic linker. Of course `ld` supports much more command line options, let's look at some of them.
The first useful command line option is `@file`. In this case the `file` specifies filename where command line options will be read. For example we can create file with the name `linker.ld`, put there our command line arguments from the previous example and execute it with:
@ -434,9 +434,9 @@ The first useful command line option is `@file`. In this case the `file` specifi
$ ld @linker.ld
```
The next command line option is `-b` or `--format`. This command line option specifies format of the input object files `ELF`, `DJGPP/COFF` and etc. There is command line option for the same purpose but for the output file: `--oformat=output-format`.
The next command line option is `-b` or `--format`. This command line option specifies format of the input object files `ELF`, `DJGPP/COFF` and etc. There is a command line option for the same purpose but for the output file: `--oformat=output-format`.
The next command line option is `--defsym`. Full format of this command line option is the `--defsym=symbol=expression`. It allows to create global symbol in the output file containing the absolute address given by expression. We can find following case when this command line option can be useful. For example let's look in the Linux kernel source code and more precisely in the Makefile that related to the kernel decompression for ARM architecture - [arch/arm/boot/compressed/Makefile](https://github.com/torvalds/linux/blob/master/arch/arm/boot/compressed/Makefile). We can find following definition there:
The next command line option is `--defsym`. Full format of this command line option is the `--defsym=symbol=expression`. It allows to create global symbol in the output file containing the absolute address given by expression. We can find following case where this command line option can be useful: in the Linux kernel source code and more precisely in the Makefile that is related to the kernel decompression for the ARM architecture - [arch/arm/boot/compressed/Makefile](https://github.com/torvalds/linux/blob/master/arch/arm/boot/compressed/Makefile), we can find following definition:
```
LDFLAGS_vmlinux = --defsym _kernel_bss_size=$(KBSS_SZ)
@ -471,12 +471,12 @@ $ ld -M @linker.ld
0x000000000040041b factorial
```
Of course the `GNU linker` support standard command line options: `--help` and `--version` that print common help of the usage of the `ld` and its version. That's all about command line options of the `GNU linker`. Of course it is not full set of the command line options support by the `ld` util. Full description you can find in the manual of this util.
Of course the `GNU linker` support standard command line options: `--help` and `--version` that print common help of the usage of the `ld` and its version. That's all about command line options of the `GNU linker`. Of course it is not the full set of command line options supported by the `ld` util. You can find the complete documentation of the `ld` util in the manual.
Control Language linker
----------------------------------------------
As I wrote previously, the `ld` has support of the own language. It accepts Linker Command Language files written in a superset of AT&T's Link Editor Command Language syntax, to provide explicit and total control over the linking process. Let's look on its details.
As I wrote previously, `ld` has support for its own language. It accepts Linker Command Language files written in a superset of AT&T's Link Editor Command Language syntax, to provide explicit and total control over the linking process. Let's look on its details.
With the linker language we can control:
@ -484,9 +484,9 @@ With the linker language we can control:
* output files;
* file formats
* addresses of sections;
* and etc.
* etc...
Usually commands written on linker control language placed in a file that called - linker script. We can pass it to the `ld` with the `-T` command line option. The main command in the each linker script is the `SECTIONS`. Each linker script must contain this command and it determines the `map` of the output file. The special variable - `.` contains current position of the output. Let's write simple assembly program and will look how we can use linker script to control linking of this program. For example it will be hello world:
Commands written in the linker control language are usually placed in a file called linker script. We can pass it to `ld` with the `-T` command line option. The main command in a linker script is the `SECTIONS` command. Each linker script must contain this command and it determines the `map` of the output file. The special variable `.` contains current position of the output. Let's write simple assembly program andi we will look at how we can use a linker script to control linking of this program. We will take a hello world program for this example:
```assembly
section .data
@ -504,14 +504,14 @@ _start:
syscall
```
We can compile and link it with the:
We can compile and link it with the following commands:
```
$ nasm -f elf64 -o hello.o hello.asm
$ ld -o hello hello.o
```
Our program consists from tw sections: `.text` - contains code of the program and `.data` - contains initialized variables. Let's write simple linker script and try to link our `hello.asm` assembly file with it. Our script is:
Our program consists from two sections: `.text` contains code of the program and `.data` contains initialized variables. Let's write simple linker script and try to link our `hello.asm` assembly file with it. Our script is:
```
/*
@ -535,7 +535,7 @@ SECTIONS
}
```
On the first three lines you can see comment that written with `C` style. After it the `OUTPUT` and the `OUTPUT_FORMAT` command specifies name of the our executable file and its format. The next command - is `INPUT` specfies input file to the `ld` linker. After all of this command we can see main `SECTIONS` command, as I already wrote each linker script must contain definition of this command. The `SECTIONS` command represents set and order of the sections which are will be in the output file. At the beginning of the `SECTIONS` command we can see following line `. = 0x200000`. I already wrote above that `.` command points to the current position of the output. This line says that the code should be loaded at address `0x200000` and the line `. = 0x400000` says that data section should be loaded at address `0x400000`. The second line after the `. = 0x200000` defines `.text` section as an output section. We can see `*(.text)` expression inside it. The `*` symbol is wildcard that matches any file name. In another words the `*(.text)` expression says all `.text` input sections in all input files. We can rewrite it as `hello.o(.text)` for our example. After the following location counter `. = 0x400000`, we can see definition of the data section.
On the first three lines you can see a comment written in `C` style. After it the `OUTPUT` and the `OUTPUT_FORMAT` commands specifiy the name of our executable file and its format. The next command, `INPUT`, specfies the input file to the `ld` linker. Then, we can see the main `SECTIONS` command, which, as I already wrote, must be present in every linker script. The `SECTIONS` command represents the set and order of the sections which will be in the output file. At the beginning of the `SECTIONS` command we can see following line `. = 0x200000`. I already wrote above that `.` command points to the current position of the output. This line says that the code should be loaded at address `0x200000` and the line `. = 0x400000` says that data section should be loaded at address `0x400000`. The second line after the `. = 0x200000` defines `.text` as an output section. We can see `*(.text)` expression inside it. The `*` symbol is wildcard that matches any file name. In other words, the `*(.text)` expression says all `.text` input sections in all input files. We can rewrite it as `hello.o(.text)` for our example. After the following location counter `. = 0x400000`, we can see definition of the data section.
We can compile and link it with the:
@ -544,7 +544,7 @@ $ nasm -f elf64 -o hello.o hello.S && ld -T linker.script && ./hello
hello, world!
```
If we will look inside with the `objdump` util, we will see that `.text` section starts from the `0x200000` and the `.data` sections starts from the `0x400000` address:
If we will look insidei it with the `objdump` util, we can see that `.text` section starts from the address `0x200000` and the `.data` sections starts from the address `0x400000`:
```
$ objdump -D hello
@ -562,13 +562,13 @@ Disassembly of section .data:
...
```
Except of those comands that we have already seen, there are a few other linker scripts commands. The first is the `ASSERT(exp, message)` that ensures that given expression is not zero. If it is zero, then exit the linker with an error code and print given error message. If you've read about Linux kernel booting process in the [linux-insides](http://0xax.gitbooks.io/linux-insides/content/) book, you may know that setup header of the Linux kernel has offset - `0x1f1`. In the linker script of the Linux kernel we can find check for this:
Apart from the commands we have already seen, there are a few others. The first is the `ASSERT(exp, message)` that ensures that given expression is not zero. If it is zero, then exit the linker with an error code and print the given error message. If you've read about Linux kernel booting process in the [linux-insides](http://0xax.gitbooks.io/linux-insides/content/) book, you may know that the setup header of the Linux kernel has offset `0x1f1`. In the linker script of the Linux kernel we can find a check for this:
```
. = ASSERT(hdr == 0x1f1, "The setup header has the wrong offset!");
```
The next `INCLUDE filename` command allows to include external linker script symbols to the current. In a linker script we can assign a value to a symbol. The `ld` support a couple of assignment operators:
The `INCLUDE filename` command allows to include external linker script symbols in the current one. In a linker script we can assign a value to a symbol. `ld` supports a couple of assignment operators:
* symbol = expression ;
* symbol += expression ;
@ -615,11 +615,11 @@ That's all.
Conclusion
-----------------
This is the end of the post about linkers. We knew many things about linkers in this post, such things like what is it linker and why we need in it, how to use it and etc..
This is the end of the post about linkers. We learned many things about linkers in this post, such as what is a linker and why it is needed, how to use it, etc..
If you will have any questions or suggestions write me [email](kuleshovmail@gmail.com) or ping [me](https://twitter.com/0xAX) in twitter.
If you have any questions or suggestions, write me an [email](kuleshovmail@gmail.com) or ping [me](https://twitter.com/0xAX) on twitter.
Please note that English is not my first language, And I am really sorry for any inconvenience. If you will find any mistakes please send let me know via emal or send a PR.
Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes please let me know via email or send a PR.
Links
-----------------