From bbc3931edb4b10ed84c41762c8da57ed727c7c22 Mon Sep 17 00:00:00 2001 From: zhaoxiaoqiang Date: Thu, 30 Jul 2015 16:49:33 +0800 Subject: [PATCH 01/32] fix minor typos --- Misc/how_kernel_compiled.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Misc/how_kernel_compiled.md b/Misc/how_kernel_compiled.md index ee9097b..022dfdc 100644 --- a/Misc/how_kernel_compiled.md +++ b/Misc/how_kernel_compiled.md @@ -116,7 +116,7 @@ obj := $(objtree) export srctree objtree VPATH ``` -That tells to `Makefile` that source tree of the Linux kernel will be in the current directory where `make` command was executed. After this we set `objtree` and other variables to this directory and export these variables. The next step is the getting value for the `SUBARCH` variable that will represent tewhat the underlying archicecture is: +That tells to `Makefile` that source tree of the Linux kernel will be in the current directory where `make` command was executed. After this we set `objtree` and other variables to this directory and export these variables. The next step is the getting value for the `SUBARCH` variable that will represent what the underlying architecture is: ```Makefile SUBARCH := $(shell uname -m | sed -e s/i.86/x86/ -e s/x86_64/x86/ \ From 377cbdf5ca810e66e5d97e20967dc1f274e09e60 Mon Sep 17 00:00:00 2001 From: 0xAX Date: Fri, 31 Jul 2015 14:41:58 +0600 Subject: [PATCH 02/32] Update ELF.md --- Theory/ELF.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Theory/ELF.md b/Theory/ELF.md index c4362ef..97db281 100644 --- a/Theory/ELF.md +++ b/Theory/ELF.md @@ -102,7 +102,7 @@ in the linux kernel source code. `elf64_phdr` defined in the same [elf.h](https://github.com/torvalds/linux/blob/master/include/uapi/linux/elf.h). -And ELF object file also contains other fields/structures which you can find in the [Documentation](http://downloads.openwatcom.org/ftp/devel/docs/elf-64-gen.pdf). Better let's look on the `vmlinux`. 
+And ELF object file also contains other fields/structures which you can find in the [Documentation](http://www.uclibc.org/docs/elf-64-gen.pdf). Better let's look on the `vmlinux`. vmlinux -------------------------------------------------------------------------------- @@ -213,4 +213,4 @@ Program Headers: Here we can see five segments with sections list. All of these sections you can find in the generated linker script at - `arch/x86/kernel/vmlinux.lds`. -That's all. Of course it's not a full description of ELF(Executable and Linkable Format), but if you are interested in it, you can find documentation - [here](ftp://ftp.openwatcom.org/pub/devel/docs/elf-64-gen.pdf) +That's all. Of course it's not a full description of ELF(Executable and Linkable Format), but if you are interested in it, you can find documentation - [here](http://www.uclibc.org/docs/elf-64-gen.pdf) From e10c624957b2e01a8b2182ac9e6e5684c13dbde6 Mon Sep 17 00:00:00 2001 From: Waqar144 Date: Sat, 1 Aug 2015 23:59:49 +0500 Subject: [PATCH 03/32] [1/3] Fix sentence structures in linux-bootstrap-3.md --- Booting/linux-bootstrap-3.md | 116 ++++++++++++++++++++--------------- 1 file changed, 66 insertions(+), 50 deletions(-) diff --git a/Booting/linux-bootstrap-3.md b/Booting/linux-bootstrap-3.md index 8cc25ac..20a8254 100644 --- a/Booting/linux-bootstrap-3.md +++ b/Booting/linux-bootstrap-3.md @@ -4,17 +4,20 @@ Kernel booting process. Part 3. Video mode initialization and transition to protected mode -------------------------------------------------------------------------------- -This is the third part of the `Kernel booting process` series. In the previous [part](linux-bootstrap-2.md#kernel-booting-process-part-2), we stopped right before the call of the `set_video` routine from the [main.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L181). 
We will see video mode initialization in the kernel setup code, preparation before switching into the protected mode and transition into it in this part. +This is the third part of the `Kernel booting process` series. In the previous [part](linux-bootstrap-2.md#kernel-booting-process-part-2), we stopped right before the call of the `set_video` routine from the [main.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L181). In this part, we will see: +- video mode initialization in the kernel setup code, +- preparation before switching into the protected mode, +- transition to protected mode -**NOTE** If you don't know anything about protected mode, you can find some information about it in the previous [part](linux-bootstrap-2.md#protected-mode). Also there are a couple of [links](linux-bootstrap-2.md#links) which can help you. +**NOTE: If you don't know anything about protected mode, you can find some information about it in the previous [part](linux-bootstrap-2.md#protected-mode). Also there are a couple of [links](linux-bootstrap-2.md#links) which can help you.** -As i wrote above, we will start from the `set_video` function which defined in the [arch/x86/boot/video.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/video.c#L315) source code file. We can see that it starts with getting of video mode from the `boot_params.hdr` structure: +As i wrote above, we will start from the `set_video` function which defined in the [arch/x86/boot/video.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/video.c#L315) source code file. We can see that it starts by first getting the video mode from the `boot_params.hdr` structure: ```C u16 mode = boot_params.hdr.vid_mode; ``` -which we filled in the `copy_boot_params` function (you can read about it in the previous post). `vid_mode` is an obligatory field which filled by the bootloader. 
You can find information about it in the kernel boot protocol: +which we filled in the `copy_boot_params` function (you can read about it in the previous post). `vid_mode` is an obligatory field which is filled by the bootloader. You can find information about it in the kernel boot protocol: ``` Offset Proto Name Meaning @@ -34,49 +37,44 @@ vga= line is parsed. ``` -So we can add `vga` option to the grub or another bootloader configuration file and it will pass this option to the kernel command line. This option can have different values as we can read from the description, for example it can be integer number or `ask`. If you will pass `ask`, you see menu like this: +So we can add the `vga` option to the grub or another bootloader configuration file and it will pass this option to the kernel command line. This option can have different values, as mentioned in the description: for example, it can be an integer number such as `0xFFFD`, or the string `ask`. If you pass `ask` to `vga`, you will see a menu like this: ![video mode setup menu](http://oi59.tinypic.com/ejcz81.jpg) -which will suggest to select video mode. We will look on it's implementation, but before we must to know about another things. +which will ask you to select a video mode. We will look at its implementation, but before diving into it we have to look at some other things. Kernel data types -------------------------------------------------------------------------------- -Earlier we saw definitions of different data types like `u16` and etc... in the kernel setup code. Let's look on a couple of data types provided by the kernel: +Earlier we saw definitions of different data types like `u16` etc. in the kernel setup code. 
Let's look on a couple of data types provided by the kernel: + -``` | Type | char | short | int | long | u8 | u16 | u32 | u64 | |------|------|-------|-----|------|----|-----|-----|-----| | Size | 1 | 2 | 4 | 8 | 1 | 2 | 4 | 8 | -``` -If you will read source code of the kernel, you'll see it very often, so it will be good to remember about it. +If you read source code of the kernel, you'll see these very often and so it will be good to remember them. Heap API -------------------------------------------------------------------------------- -As we got `vid_mode` from the `boot_params.hdr`, we can see call of the `RESET_HEAP` in the `set_video` function. `RESET_HEAP` is a macro which defined in the [boot.h](https://github.com/torvalds/linux/blob/master/arch/x86/boot/boot.h#L199) and looks as: +After we have `vid_mode` from the `boot_params.hdr` in the `set_video` function we can see call to `RESET_HEAP` function. `RESET_HEAP` is a macro which defined in the [boot.h](https://github.com/torvalds/linux/blob/master/arch/x86/boot/boot.h#L199). It is defined as: -```C +```c #define RESET_HEAP() ((void *)( HEAP = _end )) ``` -If you read second part, you can remember that we initialized the heap with the [init_heap](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L116) function. Since we can use heap, we have a couple functions for it which defined in the `boot.h`. They are: +If you have read the second part, you will remember that we initialized the heap with the [`init_heap`](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L116) function. We have a couple of utility functions for heap which are defined in the `boot.h`. They are: -```C -#define RESET_HEAP()... +```c +RESET_HEAP() ``` -As we saw just now. 
It uses for resetting the heap by setting the `HEAP` variable equal to `_end`, where `_end` is just: - -```C -extern char _end[]; -``` +As we saw just above it resets the heap by setting the `HEAP` variable equal to `_end`, where `_end` is just `extern char _end[];` Next is `GET_HEAP` macro: -```C +```c #define GET_HEAP(type, n) \ ((type *)__get_heap(sizeof(type),__alignof__(type),(n))) ``` @@ -84,12 +82,12 @@ Next is `GET_HEAP` macro: for heap allocation. It calls internal function `__get_heap` with 3 parameters: * size of a type in bytes, which need be allocated -* next parameter shows how type of variable is aligned -* how many bytes to allocate +* `__alignof__(type)` shows how type of variable is aligned +* `n` tells how many bytes to allocate Implementation of `__get_heap` is: -```C +```c static inline char *__get_heap(size_t s, size_t a, size_t n) { char *tmp; @@ -101,20 +99,20 @@ static inline char *__get_heap(size_t s, size_t a, size_t n) } ``` -and further we will see usage of it, something like this: +and further we will see its usage, something like: -```C -saved.data = GET_HEAP(u16, saved.x*saved.y); +```c +saved.data = GET_HEAP(u16, saved.x * saved.y); ``` -Let's try to understand how `GET_HEAP` works. We can see here that `HEAP` (which equal to `_end` after `RESET_HEAP()`) is the address of aligned memory according to `a` parameter. After it we save memory address from `HEAP` to the `tmp` variable, move `HEAP` to the end of allocated block and return `tmp` which is start address of allocated memory. +Let's try to understand how `__get_heap` works. We can see here that `HEAP` (which is equal to `_end` after `RESET_HEAP()`) is the address of aligned memory according to `a` parameter. After it we save memory address from `HEAP` to the `tmp` variable, move `HEAP` to the end of allocated block and return `tmp` which is start address of allocated memory. 
And the last function is: ```C static inline bool heap_free(size_t n) { - return (int)(heap_end-HEAP) >= (int)n; + return (int)(heap_end - HEAP) >= (int)n; } ``` @@ -125,30 +123,41 @@ That's all. Now we have simple API for heap and can setup video mode. Setup video mode -------------------------------------------------------------------------------- -Now we can move directly to video mode initialization. We stopped at the `RESET_HEAP()` call in the `set_video` function. The next call of `store_mode_params` which stores video mode parameters in the `boot_params.screen_info` structure which defined in the [include/uapi/linux/screen_info.h](https://github.com/0xAX/linux/blob/master/include/uapi/linux/screen_info.h). If we will look at `store_mode_params` function, we can see that it starts from the call of the `store_cursor_position` function. As you can understand from the function name, it gets information about cursor and stores it. First of all `store_cursor_position` initializes two variables which has type - `biosregs`, with `AH = 0x3` and calls `0x10` BIOS interruption. After interruption successfully executed, it returns row and column in the `DL` and `DH` registers. Row and column will be stored in the `orig_x` and `orig_y` fields from the the `boot_params.screen_info` structure. After `store_cursor_position` executed, `store_video_mode` function will be called. It just gets current video mode and stores it in the `boot_params.screen_info.orig_video_mode`. +Now we can move directly to video mode initialization. We stopped at the `RESET_HEAP()` call in the `set_video` function. Next is the call to `store_mode_params` which stores video mode parameters in the `boot_params.screen_info` structure which is defined in the [include/uapi/linux/screen_info.h](https://github.com/0xAX/linux/blob/master/include/uapi/linux/screen_info.h). -After this, it checks current video mode and set the `video_segment`. 
After the BIOS transfers control to the boot sector, the following addresses are video memory: +If we will look at `store_mode_params` function, we can see that it starts with the call to `store_cursor_position` function. As you can understand from the function name, it gets information about cursor and stores it. + +First of all `store_cursor_position` initializes two variables which has type - `biosregs`, with `AH = 0x3` and calls `0x10` BIOS interruption. After interruption successfully executed, it returns row and column in the `DL` and `DH` registers. Row and column will be stored in the `orig_x` and `orig_y` fields from the the `boot_params.screen_info` structure. + +After `store_cursor_position` executed, `store_video_mode` function will be called. It just gets current video mode and stores it in the `boot_params.screen_info.orig_video_mode`. + +After this, it checks current video mode and sets the `video_segment`. After the BIOS transfers control to the boot sector, the following addresses are for video memory: ``` 0xB000:0x0000 32 Kb Monochrome Text Video Memory 0xB800:0x0000 32 Kb Color Text Video Memory ``` -So we set the `video_segment` variable to `0xb000` if current video mode is MDA, HGC, VGA in monochrome mode or `0xb800` in color mode. After setup of the address of the video segment need to store font size in the `boot_params.screen_info.orig_video_points` with: +So we set the `video_segment` variable to `0xB000` if current video mode is MDA, HGC, VGA in monochrome mode or `0xB800` in color mode. After setup of the address of the video segment font size needs to be stored in the `boot_params.screen_info.orig_video_points` with: -```C +```c set_fs(0); font_size = rdfs16(0x485); boot_params.screen_info.orig_video_points = font_size; ``` -First of all we put 0 to the `FS` register with `set_fs` function. We already saw functions like `set_fs` in the previous part. 
They are all defined in the [boot.h](https://github.com/0xAX/linux/blob/master/arch/x86/boot/boot.h). Next we read value which located at address `0x485` (this memory location used to get the font size) and save font size in the `boot_params.screen_info.orig_video_points`. +First of all we put 0 to the `FS` register with `set_fs` function. We already saw functions like `set_fs` in the previous part. They are all defined in the [boot.h](https://github.com/0xAX/linux/blob/master/arch/x86/boot/boot.h). Next we read value which is located at address `0x485` (this memory location is used to get the font size) and save font size in the `boot_params.screen_info.orig_video_points`. -The next we get amount of columns and rows by address `0x44a` and stores they in the `boot_params.screen_info.orig_video_cols` and `boot_params.screen_info.orig_video_lines`. After this, execution of the `store_mode_params` is finished. +``` + x = rdfs16(0x44a); + y = (adapter == ADAPTER_CGA) ? 25 : rdfs8(0x484)+1; +``` -The next we can see `save_screen` function which just saves screen content to the heap. This function collects all data which we got in the previous functions like rows and columns amount and etc... to the `saved_screen` structure, which defined as: +Next we get amount of columns by `0x44a` and rows by address `0x484` and store them in the `boot_params.screen_info.orig_video_cols` and `boot_params.screen_info.orig_video_lines`. After this, execution of the `store_mode_params` is finished. -```C +Next we can see `save_screen` function which just saves screen content to the heap. This function collects all data which we got in the previous functions like rows and columns amount etc. 
and stores it in the `saved_screen` structure, which is defined as: + +```c static struct saved_screen { int x, y; int curx, cury; @@ -156,9 +165,9 @@ static struct saved_screen { } saved; ``` -It checks that heap has free space for it with: +It then checks whether the heap has free space for it with: -```C +```c if (!heap_free(saved.x*saved.y*sizeof(u16)+512)) return; ``` @@ -167,7 +176,7 @@ and allocates space in the heap if it is enough and stores `saved_screen` in it. The next call is `probe_cards(0)` from the [arch/x86/boot/video-mode.c](https://github.com/0xAX/linux/blob/master/arch/x86/boot/video-mode.c#L33). It goes over all video_cards and collects number of modes provided by the cards. Here is the interesting moment, we can see the loop: -```C +```c for (card = video_cards; card < video_cards_end; card++) { /* collecting number of modes here */ } @@ -175,7 +184,7 @@ for (card = video_cards; card < video_cards_end; card++) { but `video_cards` not declared anywhere. Answer is simple: Every video mode presented in the x86 kernel setup code has definition like this: -```C +```c static __videocard video_vga = { .card_name = "VGA", .probe = vga_probe, @@ -185,13 +194,13 @@ static __videocard video_vga = { where `__videocard` is a macro: -```C +```c #define __videocard struct card_info __attribute__((used,section(".videocards"))) ``` which means that `card_info` structure: -```C +```c struct card_info { const char *card_name; int (*set_mode)(struct mode_info *mode); @@ -204,7 +213,7 @@ struct card_info { }; ``` -is in the `.videocards` segment. Let's look on the [arch/x86/boot/setup.ld](https://github.com/0xAX/linux/blob/master/arch/x86/boot/setup.ld) linker file, we can see there: +is in the `.videocards` segment. Let's look in the [arch/x86/boot/setup.ld](https://github.com/0xAX/linux/blob/master/arch/x86/boot/setup.ld) linker file, we can see there: ``` .videocards : { @@ -216,13 +225,13 @@ is in the `.videocards` segment. 
Let's look on the [arch/x86/boot/setup.ld](https://github.com/0xAX/linux/blob/master/arch/x86/boot/setup.ld) linker file, we can see there: +is in the `.videocards` segment. Let's look in the [arch/x86/boot/setup.ld](https://github.com/0xAX/linux/blob/master/arch/x86/boot/setup.ld) linker file, where we can see: ``` .videocards : { @@ -216,13 +225,13 @@ is in the `.videocards` segment. Let's look on the [arch/x86/boot/setup.ld](http It means that `video_cards` is just memory address and all `card_info` structures are placed in this segment. It means that all `card_info` structures are placed between `video_cards` and `video_cards_end`, so we can use it in a loop to go over all of it. After `probe_cards` executed we have all structures like `static __videocard video_vga` with filled `nmodes` (number of video modes). -After that `probe_cards` executed, we move to the main loop in the `setup_video` function. There is infinite loop which tries to setup video mode with the `set_mode` function or prints menu if we passed `vid_mode=ask` to the kernel command line or video mode is undefined. +After `probe_cards` has finished, we move to the main loop in the `set_video` function. There is an infinite loop which tries to set up the video mode with the `set_mode` function, or prints a menu if we passed `vga=ask` on the kernel command line or the video mode is undefined. -The `set_mode` function is defined in the [video-mode.c](https://github.com/0xAX/linux/blob/master/arch/x86/boot/video-mode.c#L147) and gets only one parameter - `mode` which is number of video mode (we got it or from the menu or in the start of the `setup_video`, from kernel setup header). +The `set_mode` function is defined in the [video-mode.c](https://github.com/0xAX/linux/blob/master/arch/x86/boot/video-mode.c#L147) and gets only one parameter, `mode`, which is the number of the video mode (we got it either from the menu or at the start of `set_video`, from the kernel setup header). -`set_mode` function checks the `mode` and calls `raw_set_mode` function. The `raw_set_mode` calls `set_mode` function for selected card. We can get access to this function from the `card_info` structure, every video mode defines this structure with filled value which depends on video mode (for example for `vga` it is `video_vga.set_mode` function, see above example of `card_info` structure for `vga`). 
`video_vga.set_mode` is `vga_set_mode`, which checks vga mode and call function depend on mode: +`set_mode` function checks the `mode` and calls `raw_set_mode` function. The `raw_set_mode` calls `set_mode` function for selected card i.e. `card->set_mode(struct mode_info*)`. We can get access to this function from the `card_info` structure, every video mode defines this structure with values filled depending upon the video mode (for example for `vga` it is `video_vga.set_mode` function, see above example of `card_info` structure for `vga`). `video_vga.set_mode` is `vga_set_mode`, which checks the vga mode and calls the respective function: -```C +```c static int vga_set_mode(struct mode_info *mode) { vga_set_basic_mode(); @@ -256,12 +265,18 @@ static int vga_set_mode(struct mode_info *mode) } ``` -Every function which setups video mode, just call `0x10` BIOS interruption with certain value in the `AH` register. After this we have set video mode and now we can switch to the protected mode. +Every function which setups video mode, just calls `0x10` BIOS interrupt with certain value in the `AH` register. + +After we have set video mode, we pass it to the `boot_params.hdr.vid_mode`. + +Next `vesa_store_edid` is called. This function simply stores the [EDID](https://en.wikipedia.org/wiki/Extended_Display_Identification_Data) (**E**xtended **D**isplay **I**dentification **D**ata) information for kernel use. After this `store_mode_params` is called again. Lastly, if `do_restore` is set, screen is restored to an earlier state. + +After this we have set video mode and now we can switch to the protected mode. Last preparation before transition into protected mode -------------------------------------------------------------------------------- -We can see the last function call - `go_to_protected_mode` in the [main.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L184). 
As comment says: `Do the last things and invoke protected mode`, so let's see last preparation and switch into the protected mode. +We can see the last function call - `go_to_protected_mode` in the [main.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L184). As the comment says: `Do the last things and invoke protected mode`, so let's see these last things and switch into the protected mode. `go_to_protected_mode` defined in the [arch/x86/boot/pm.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/pm.c#L104). It contains some functions which make last preparations before we can jump into protected mode, so let's look on it and try to understand what they do and how it works. @@ -539,3 +554,4 @@ Links * [GCC designated inits](https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Designated-Inits.html) * [GCC type attributes](https://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html) * [Previous part](linux-bootstrap-2.md) + From dff801f85d43b5ebd4689d36846d69ab6711319a Mon Sep 17 00:00:00 2001 From: 0xAX Date: Sun, 2 Aug 2015 20:20:36 +0600 Subject: [PATCH 04/32] Create linkers.md --- Misc/linkers.md | 636 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 636 insertions(+) create mode 100644 Misc/linkers.md diff --git a/Misc/linkers.md b/Misc/linkers.md new file mode 100644 index 0000000..8c07e73 --- /dev/null +++ b/Misc/linkers.md @@ -0,0 +1,636 @@ +Introduction +--------------- + +While writing the [linux-insides](http://0xax.gitbooks.io/linux-insides/content/) book I have received, and continue to receive, many emails with similar questions related to the [linker](https://en.wikipedia.org/wiki/Linker_%28computing%29) script and linker-related stuff. So I've decided to write this post that will cover some aspects related to the linker and linking of object files. 
+
+If we open the page about `Linker` on Wikipedia, we will see the following definition:
+
+```
+In computer science, a linker or link editor is a computer program that takes one or more object files generated by a compiler and combines them into a single executable file, library file, or another object file.
+```
+
+If you've written at least one program in C in your life, you have seen files with the `*.o` extension. These files are [object files](https://en.wikipedia.org/wiki/Object_file). Object files are blocks of machine code and data with uncertain addresses of references to data and functions in other object files (or libraries), as well as a list of their own functions and data. The main purpose of the linker is to collect/handle the code and data of each object file and combine them into the final executable file or library. In this post we will try to go through all aspects of this process. Let's start.
+
+Linking process
+----------------
+
+Let's create a simple project with the following structure:
+
+```
+*-linkers
+*--main.c
+*--lib.c
+*--lib.h
+```
+
+And write there, for example, a factorial program. Our `main.c` source code file will contain:
+
+```C
+#include <stdio.h>
+
+#include "lib.h"
+
+int main(int argc, char **argv) {
+    printf("factorial of 5 is: %d\n", factorial(5));
+    return 0;
+}
+```
+
+The `lib.c` will contain:
+
+```C
+int factorial(int base) {
+    int res = 1, i = 1;
+
+    if (base == 0) {
+        return 1;
+    }
+
+    while (i <= base) {
+        res *= i;
+        i++;
+    }
+
+    return res;
+}
+```
+
+And the `lib.h`:
+
+```C
+#ifndef LIB_H
+#define LIB_H
+
+int factorial(int base);
+
+#endif
+```
+
+Now let's compile only the `main.c` source code file with:
+
+```
+$ gcc -c main.c
+```
+
+If we look inside with the `nm` util, we will see the following output:
+
+```
+$ nm -A main.o
+main.o:                 U factorial
+main.o:0000000000000000 T main
+main.o:                 U printf
+```
+
+The `nm` util allows us to see the list of symbols from the given object file.
+Look at its output: it consists of three columns. The first is the name of the given object file followed by the address of the symbol (shown only for resolved symbols). The second column contains a character that determines the status of the given symbol, and the third is the symbol name. In our case `U` means an `undefined` symbol and `T` means a symbol that is placed in the `.text` section. So the `nm` util shows us that we have three symbols in the `main.c` source code file:
+
+* `factorial` - the factorial function that is defined in the `lib.c` source code file and marked as `undefined` because we compiled only the `main.c` source code file and it does not know anything about the code from `lib.c` for now;
+* `main` - the main function;
+* `printf` - a function from the [glibc](https://en.wikipedia.org/wiki/GNU_C_Library) library; `main.c` does not know anything about it for now either.
+
+What can we understand from the output of `nm` at this point? All is simple. The `main.o` object file contains the defined symbol `main` at address `0000000000000000` (it will be filled with the correct address after linking) and two unresolved symbols. We can see all of this information in the disassembly output of the `main.o` object file:
+
+```
+$ objdump -S main.o
+
+main.o:     file format elf64-x86-64
+Disassembly of section .text:
+
+0000000000000000 <main>:
+   0:	55                   	push   %rbp
+   1:	48 89 e5             	mov    %rsp,%rbp
+   4:	48 83 ec 10          	sub    $0x10,%rsp
+   8:	89 7d fc             	mov    %edi,-0x4(%rbp)
+   b:	48 89 75 f0          	mov    %rsi,-0x10(%rbp)
+   f:	bf 05 00 00 00       	mov    $0x5,%edi
+  14:	e8 00 00 00 00       	callq  19 <main+0x19>
+  19:	89 c6                	mov    %eax,%esi
+  1b:	bf 00 00 00 00       	mov    $0x0,%edi
+  20:	b8 00 00 00 00       	mov    $0x0,%eax
+  25:	e8 00 00 00 00       	callq  2a <main+0x2a>
+  2a:	b8 00 00 00 00       	mov    $0x0,%eax
+  2f:	c9                   	leaveq
+  30:	c3                   	retq
+```
+
+Here we are interested only in the two `callq` operations. The two `callq` operations contain `linker stubs`, or in other words a function name and an offset from it to the next instruction. These stubs will be updated to the real addresses of the functions. We can see these function names in the following `objdump` output:
+
+```
+$ objdump -S -r main.o
+
+...
+  14:	e8 00 00 00 00       	callq  19 <main+0x19>
+			15: R_X86_64_PC32	factorial-0x4
+  19:	89 c6                	mov    %eax,%esi
+...
+  25:	e8 00 00 00 00       	callq  2a <main+0x2a>
+			26: R_X86_64_PC32	printf-0x4
+  2a:	b8 00 00 00 00       	mov    $0x0,%eax
+...
+```
+
+The `-r` or `--reloc` flags of the `objdump` util print the `relocation` entries of the file. Now let's learn a little about the relocation process.
+
+Relocation
+------------
+
+Relocation is the process of connecting symbolic references with symbolic definitions. Let's look at the previous snippet from the `objdump` output:
+
+```
+  14:	e8 00 00 00 00       	callq  19 <main+0x19>
+			15: R_X86_64_PC32	factorial-0x4
+  19:	89 c6                	mov    %eax,%esi
+```
+
+Note `e8 00 00 00 00` on the first line. The `e8` is the [opcode](https://en.wikipedia.org/wiki/Opcode) of the `call` instruction with a relative offset. So the `e8 00 00 00 00` contains a one-byte operation code followed by a four-byte address. Note that the `00 00 00 00` is 4 bytes, but why only 4 bytes if an address can be 8 bytes on `x86_64`? Actually we compiled the `main.c` source code file with the `-mcmodel=small`.
+From the `gcc` man:
+
+```
+-mcmodel=small
+    Generate code for the small code model: the program and its symbols must be linked in the lower 2 GB of the address space. Pointers are 64 bits. Programs can be statically or dynamically linked. This is the default code model.
+```
+
+Of course we didn't pass this option to `gcc` when we compiled `main.c`, but it is the default. We know from the quote from the `gcc` manual that our program will be linked in the lower 2 GB of the address space, so 4 bytes are enough for this. So we have the opcode of the `call` instruction and an unknown address. When we compile `main.c` with all its dependencies into an executable file and look at the call of the factorial, we will see:
+
+```
+$ gcc main.c lib.c -o factorial && objdump -S factorial | grep factorial
+
+factorial:     file format elf64-x86-64
+...
+...
+0000000000400506 <main>:
+  40051a:	e8 18 00 00 00       	callq  400537 <factorial>
+...
+...
+0000000000400537 <factorial>:
+  400550:	75 07                	jne    400559 <factorial+0x22>
+  400557:	eb 1b                	jmp    400574 <factorial+0x3d>
+  400559:	eb 0e                	jmp    400569 <factorial+0x32>
+  40056f:	7e ea                	jle    40055b <factorial+0x24>
+...
+...
+```
+
+As we can see in the previous output, the address of the `main` function is `0x0000000000400506`. Why does it not start from `0x0`? You may already know that a standard C program is linked with the `glibc` C standard library if `-nostdlib` was not passed to `gcc`. The compiled code for a program includes constructor functions to initialize data in the program when the program is started. These functions need to be called before the program is started, or in other words before the `main` function is called. To make the initialization and termination functions work, the compiler must output something in the assembler code to cause those functions to be called at the appropriate time. Execution of this program will start from the code that is placed in the special section which is called `.init`. We can see it at the beginning of the objdump output:
+
+```
+objdump -S factorial | less
+
+factorial:     file format elf64-x86-64
+
+Disassembly of section .init:
+
+00000000004003a8 <_init>:
+  4003a8:       48 83 ec 08             sub    $0x8,%rsp
+  4003ac:       48 8b 05 a5 05 20 00    mov    0x2005a5(%rip),%rax        # 600958 <_DYNAMIC+0x1d0>
+```
+
+Note that it starts at the `0x00000000004003a8` address relative to the `glibc` code. We can also check it in the resulting [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format):
+
+```
+$ readelf -d factorial | grep \(INIT\)
+ 0x000000000000000c (INIT)               0x4003a8
+```
+
+So, the address of the `main` function is `0000000000400506` and it is offset from the `.init` section. As we can see from the output, the address of the `factorial` function is `0x0000000000400537` and the binary code for the call of the `factorial` function now is `e8 18 00 00 00`.
We already know that `e8` is the opcode for the `call` instruction. The next four bytes, `18 00 00 00` (note that the value is stored as little endian on `x86_64`, in other words it is `00 00 00 18`), are the offset from the instruction that follows the `callq` to the `factorial` function: + +```python +>>> hex(0x40051a + 0x18 + 0x5) == hex(0x400537) +True +``` + +So we add `0x18` and `0x5` to the address of the `call` instruction. The offset is measured from the address of the following instruction. Our call instruction is 5 bytes long - `e8 18 00 00 00` - and `0x18` is the offset from the instruction after the call to the `factorial` function. A compiler generally creates each object file with the program addresses starting at zero. But if a program is created from multiple object files, all of them would overlap. What we just saw is a process called `relocation`. This process assigns load addresses to the various parts of the program, adjusting the code and data in the program to reflect the assigned addresses. + +Ok, now we know a little about linkers and relocation. Time to link our object files and to learn more about linkers. + +GNU linker +----------------- + +As you can understand from the title, I will use the [GNU linker](https://en.wikipedia.org/wiki/GNU_linker), or just `ld`, in this post. Of course we can use `gcc` to link our `factorial` project: + +``` +$ gcc main.c lib.o -o factorial +``` + +and after it we will get the executable file `factorial` as a result: + +``` +./factorial +factorial of 5 is: 120 +``` + +But `gcc` does not link object files itself. Instead it uses `collect2`, which is just a wrapper for the `GNU ld` linker: + +``` +~$ /usr/lib/gcc/x86_64-linux-gnu/4.9/collect2 --version +collect2 version 4.9.3 +/usr/bin/ld --version +GNU ld (GNU Binutils for Debian) 2.25 +... +... +... +``` + +Ok, we can use `gcc` and it will produce the executable file of our program for us. But let's look at how to use the `GNU ld` linker for the same purpose.
First of all let's try to link these object files with the following command: + +``` +ld main.o lib.o -o factorial +``` + +Try to do it and you will get the following error: + +``` +$ ld main.o lib.o -o factorial +ld: warning: cannot find entry symbol _start; defaulting to 00000000004000b0 +main.o: In function `main': +main.c:(.text+0x26): undefined reference to `printf' +``` + +Here we can see two problems: + +* The linker can't find the `_start` symbol; +* The linker does not know anything about the `printf` function. + +First of all let's try to understand what this `_start` entry symbol is, and why it appears to be required for our program to run. When I started to learn programming I learned that the `main` function is the entry point of the program. I think you learned this too :) But actually it is not the entry point; `_start` is. The `_start` symbol is defined in the `crt1.o` object file. We can find it with: + +``` +$ objdump -S /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o + +/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o: file format elf64-x86-64 + + +Disassembly of section .text: + +0000000000000000 <_start>: + 0: 31 ed xor %ebp,%ebp + 2: 49 89 d1 mov %rdx,%r9 + ... + ... + ... +``` + +and we pass this object file to the `ld` command as its first argument.
Now let's try to link it and look at the result: + +``` +ld /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o \ +main.o lib.o -o factorial + +/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o: In function `_start': +/tmp/buildd/glibc-2.19/csu/../sysdeps/x86_64/start.S:115: undefined reference to `__libc_csu_fini' +/tmp/buildd/glibc-2.19/csu/../sysdeps/x86_64/start.S:116: undefined reference to `__libc_csu_init' +/tmp/buildd/glibc-2.19/csu/../sysdeps/x86_64/start.S:122: undefined reference to `__libc_start_main' +main.o: In function `main': +main.c:(.text+0x26): undefined reference to `printf' +``` + +Unfortunately we will see even more errors. We can see here the old error about the undefined `printf` and three more undefined references: + +* `__libc_csu_fini` +* `__libc_csu_init` +* `__libc_start_main` + +The `_start` symbol is defined in the [sysdeps/x86_64/start.S](https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/start.S;h=0d27a38e9c02835ce17d1c9287aa01be222e72eb;hb=HEAD) assembly file in the `glibc` source code. We can find the following assembly code lines there: + +```assembly +mov $__libc_csu_fini, %R8_LP +mov $__libc_csu_init, %RCX_LP +... +call __libc_start_main +``` + +Here we pass the addresses of the functions that run the code of the `.fini` and `.init` sections, which is the code that executes when the program terminates and the code that executes when the program starts. At the end we see the call of `__libc_start_main`, which in turn calls the `main` function of our program. The first two of these symbols are defined in the [csu/elf-init.c](https://sourceware.org/git/?p=glibc.git;a=blob;f=csu/elf-init.c;hb=1d4bbc54bd4f7d85d774871341b49f4357af1fb7) source code file. The following two object files: + +* `crti.o`; +* `crtn.o`. + +define the function prologues and epilogues for the `.init` and `.fini` sections (with the `_init` and `_fini` symbols respectively).
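
The undefined-reference errors reported by `ld` above come down to simple bookkeeping: every symbol that some object file needs must be defined by some other input. The following is a minimal sketch of that idea in Python. It is a toy model, not how `GNU ld` is implemented; the `unresolved` helper and the dictionaries are hypothetical, the symbol names are taken from the `nm` and `ld` output in this post, and the `libc` entry is a simplification that stands in for what the C library provides.

```python
# Toy model of symbol resolution. This is NOT how GNU ld works internally;
# it only illustrates why "undefined reference" errors appear.

def unresolved(objects):
    """Return the sorted list of needed symbols that no input defines."""
    defined = set()
    needed = set()
    for obj in objects:
        defined |= obj["defines"]
        needed |= obj["needs"]
    return sorted(needed - defined)

# Symbol lists taken from the nm/ld output shown in this post.
main_o = {"defines": {"main"}, "needs": {"factorial", "printf"}}
lib_o = {"defines": {"factorial"}, "needs": set()}
crt1_o = {
    "defines": {"_start"},
    "needs": {"__libc_csu_fini", "__libc_csu_init", "__libc_start_main"},
}
# Assumption: a single stand-in for everything the C library provides.
libc = {
    "defines": {"printf", "__libc_csu_fini", "__libc_csu_init",
                "__libc_start_main"},
    "needs": set(),
}

print(unresolved([main_o, lib_o]))                # printf is still missing
print(unresolved([crt1_o, main_o, lib_o]))        # crt1.o adds three more
print(unresolved([crt1_o, main_o, lib_o, libc]))  # everything is resolved
```

In this model, adding `crt1.o` makes the list of missing symbols longer rather than shorter, which mirrors what `ld` printed: the startup object brings its own references that only the C library can satisfy.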
+ +The `crtn.o` object file contains these `.init` and `.fini` sections: + +``` +$ objdump -S /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crtn.o + +0000000000000000 <.init>: + 0: 48 83 c4 08 add $0x8,%rsp + 4: c3 retq + +Disassembly of section .fini: + +0000000000000000 <.fini>: + 0: 48 83 c4 08 add $0x8,%rsp + 4: c3 retq +``` + +And `crti.o` contains the `_init` and `_fini` symbols. Let's try to link again with these two object files: + +``` +$ ld \ +/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o \ +/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crti.o \ +/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crtn.o main.o lib.o \ +-o factorial +``` + +We will still get the same errors. Now we need to pass the `-lc` option to `ld`. This option makes the linker look for the C standard library (`libc`) in its library search path, that is, in the directories given with `-L` plus the built-in defaults. Let's try to link again with the `-lc` option: + +``` +$ ld \ +/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o \ +/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crti.o \ +/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crtn.o main.o lib.o -lc \ +-o factorial +``` + +Finally we will get an executable file, but if we try to run it, we will get a strange result: + +``` +$ ./factorial +bash: ./factorial: No such file or directory +``` + +What's the problem here?
Let's look on the executable file with the [readelf](https://sourceware.org/binutils/docs/binutils/readelf.html) util: + +``` +$ readelf -l factorial + +Elf file type is EXEC (Executable file) +Entry point 0x4003c0 +There are 7 program headers, starting at offset 64 + +Program Headers: + Type Offset VirtAddr PhysAddr + FileSiz MemSiz Flags Align + PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040 + 0x0000000000000188 0x0000000000000188 R E 8 + INTERP 0x00000000000001c8 0x00000000004001c8 0x00000000004001c8 + 0x000000000000001c 0x000000000000001c R 1 + [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2] + LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000 + 0x0000000000000610 0x0000000000000610 R E 200000 + LOAD 0x0000000000000610 0x0000000000600610 0x0000000000600610 + 0x00000000000001cc 0x00000000000001cc RW 200000 + DYNAMIC 0x0000000000000610 0x0000000000600610 0x0000000000600610 + 0x0000000000000190 0x0000000000000190 RW 8 + NOTE 0x00000000000001e4 0x00000000004001e4 0x00000000004001e4 + 0x0000000000000020 0x0000000000000020 R 4 + GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000 + 0x0000000000000000 0x0000000000000000 RW 10 + + Section to Segment mapping: + Segment Sections... + 00 + 01 .interp + 02 .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame + 03 .dynamic .got .got.plt .data + 04 .dynamic + 05 .note.ABI-tag + 06 +``` + +Note on the strange line: + +``` + INTERP 0x00000000000001c8 0x00000000004001c8 0x00000000004001c8 + 0x000000000000001c 0x000000000000001c R 1 + [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2] +``` + +The `.interp` section in the `elf` file holds the path name of a program interpreter or in another words the `.interp` section simply contains an `ascii` string that is the name of the dynamic linker. 
The dynamic linker is the part of a Linux system that loads and links the shared libraries needed by an executable when it is executed, by copying the content of the libraries from disk to RAM. As we can see in the output of the `readelf` command, it is located at `/lib64/ld-linux-x86-64.so.2` for `x86_64`. Now let's pass the `-dynamic-linker` option with the path of `ld-linux-x86-64.so.2` to `ld` and look at the result: + +``` +$ gcc -c main.c lib.c + +$ ld \ +/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o \ +/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crti.o \ +/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crtn.o main.o lib.o \ +-dynamic-linker /lib64/ld-linux-x86-64.so.2 \ +-lc -o factorial +``` + +Now we can run it as a normal executable file: + +``` +$ ./factorial + +factorial of 5 is: 120 +``` + +It works! With the first line we compile the `main.c` and `lib.c` source code files into object files. We get `main.o` and `lib.o` after the execution of `gcc`: + +``` +$ file lib.o main.o +lib.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped +main.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped +``` + +and after this we link the object files of our program with the needed system object files and libraries. We just saw a simple example of how to compile and link a C program with the `gcc` compiler and the `GNU ld` linker. In this example we used a couple of command line options of the `GNU linker`, but it supports many more command line options than `-o`, `-dynamic-linker`, etc. Moreover, `GNU ld` has its own language that allows us to control the linking process. In the next two sections we will look at both. + +Useful command line options of the GNU linker +---------------------------------------------- + +As I already wrote, and as you can see in the manual of the `GNU linker`, it has a big set of command line options.
We've seen a couple of options in this post: `-o <output>`, which tells `ld` to produce an output file called `output` as the result of linking; `-l`, which adds the archive or object file specified by the name; and `-dynamic-linker`, which specifies the name of the dynamic linker. Of course `ld` supports many more command line options, so let's look at some of them. + +The first useful command line option is `@file`. In this case `file` specifies a file name from which command line options will be read. For example we can create a file with the name `linker.ld`, put our command line arguments from the previous example there and execute it with: + +``` +$ ld @linker.ld +``` + +The next command line option is `-b` or `--format`. This command line option specifies the format of the input object files: `ELF`, `DJGPP/COFF`, etc. There is a command line option for the same purpose but for the output file: `--oformat=output-format`. + +The next command line option is `--defsym`. The full format of this command line option is `--defsym=symbol=expression`. It allows us to create a global symbol in the output file containing the absolute address given by the expression. Here is a case where this command line option can be useful. Let's look in the Linux kernel source code, more precisely in the Makefile that is related to the kernel decompression for the ARM architecture - [arch/arm/boot/compressed/Makefile](https://github.com/torvalds/linux/blob/master/arch/arm/boot/compressed/Makefile). We can find the following definition there: + +``` +LDFLAGS_vmlinux = --defsym _kernel_bss_size=$(KBSS_SZ) +``` + +As we already know, it defines the `_kernel_bss_size` symbol with the size of the `.bss` section in the output file.
This symbol will be used in the first [assembly file](https://github.com/torvalds/linux/blob/master/arch/arm/boot/compressed/head.S) that is executed during kernel decompression: + +```assembly +ldr r5, =_kernel_bss_size +``` + +The next command line option is `-shared`, which allows us to create a shared library. The `-M` (or `--print-map`) command line option prints the link map with information about symbols. In our case: + +``` +$ ld -M @linker.ld +... +... +... +.text 0x00000000004003c0 0x112 + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + .text 0x00000000004003c0 0x2a /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o +... +... +... + .text 0x00000000004003ea 0x31 main.o + 0x00000000004003ea main + .text 0x000000000040041b 0x3f lib.o + 0x000000000040041b factorial +``` + +Of course the `GNU linker` supports the standard command line options `--help` and `--version`, which print the usage help of `ld` and its version. That's all about the command line options of the `GNU linker`. Of course this is not the full set of command line options supported by the `ld` util. You can find the full description in the manual of this util. + +Linker control language +---------------------------------------------- + +As I wrote previously, `ld` has support for its own language. It accepts Linker Command Language files written in a superset of AT&T's Link Editor Command Language syntax, to provide explicit and total control over the linking process. Let's look at its details. + +With the linker language we can control: + +* input files; +* output files; +* file formats; +* addresses of sections; +* etc. + +Commands written in the linker control language are usually placed in a file called a linker script. We can pass it to `ld` with the `-T` command line option.
The main command in each linker script is `SECTIONS`. Each linker script must contain this command, and it determines the `map` of the output file. The special variable `.` contains the current position in the output. Let's write a simple assembly program and look at how we can use a linker script to control the linking of this program. For example it will be hello world: + +```assembly +section .data + msg db "hello, world!",`\n` +section .text + global _start +_start: + mov rax, 1 + mov rdi, 1 + mov rsi, msg + mov rdx, 14 + syscall + mov rax, 60 + mov rdi, 0 + syscall +``` + +We can compile and link it with: + +``` +$ nasm -f elf64 -o hello.o hello.asm +$ ld -o hello hello.o +``` + +Our program consists of two sections: `.text`, which contains the code of the program, and `.data`, which contains initialized variables. Let's write a simple linker script and try to link our `hello.asm` assembly file with it. Our script is: + +``` +/* + * Linker script for the hello world program + */ +OUTPUT(hello) +OUTPUT_FORMAT("elf64-x86-64") +INPUT(hello.o) + +SECTIONS +{ + . = 0x200000; + .text : { + *(.text) + } + + . = 0x400000; + .data : { + *(.data) + } +} +``` + +In the first three lines you can see a comment written in `C` style. After it, the `OUTPUT` and `OUTPUT_FORMAT` commands specify the name of our executable file and its format. The next command, `INPUT`, specifies the input file for the `ld` linker. After all of these commands we can see the main `SECTIONS` command; as I already wrote, each linker script must contain this command. The `SECTIONS` command describes the set and order of the sections which will be in the output file. At the beginning of the `SECTIONS` command we can see the following line: `. = 0x200000`. I already wrote above that the `.` symbol points to the current position in the output. This line says that the code should be loaded at address `0x200000`, and the line `. = 0x400000` says that the data section should be loaded at address `0x400000`.
The line following `. = 0x200000` defines the `.text` section as an output section. We can see the `*(.text)` expression inside it. The `*` symbol is a wildcard that matches any file name. In other words the `*(.text)` expression means all `.text` input sections in all input files. We could rewrite it as `hello.o(.text)` for our example. After the following location counter assignment, `. = 0x400000`, we can see the definition of the data section. + +We can compile and link it with: + +``` +$ nasm -f elf64 -o hello.o hello.asm && ld -T linker.script && ./hello +hello, world! +``` + +If we look inside with the `objdump` util, we will see that the `.text` section starts at `0x200000` and the `.data` section starts at the `0x400000` address: + +``` +$ objdump -D hello + +Disassembly of section .text: + +0000000000200000 <_start>: + 200000: b8 01 00 00 00 mov $0x1,%eax + ... + +Disassembly of section .data: + +0000000000400000 <msg>: + 400000: 68 65 6c 6c 6f pushq $0x6f6c6c65 + ... +``` + +Apart from the commands that we have already seen, there are a few other linker script commands. The first is `ASSERT(exp, message)`, which ensures that the given expression is not zero. If it is zero, the linker exits with an error code and prints the given error message. If you've read about the Linux kernel booting process in the [linux-insides](http://0xax.gitbooks.io/linux-insides/content/) book, you may know that the setup header of the Linux kernel has the offset `0x1f1`. In the linker script of the Linux kernel we can find a check for this: + +``` +. = ASSERT(hdr == 0x1f1, "The setup header has the wrong offset!"); +``` + +The next command, `INCLUDE filename`, includes the contents of an external linker script in the current one. In a linker script we can also assign a value to a symbol.
`ld` supports a number of assignment operators: + +* symbol = expression ; +* symbol += expression ; +* symbol -= expression ; +* symbol *= expression ; +* symbol /= expression ; +* symbol <<= expression ; +* symbol >>= expression ; +* symbol &= expression ; +* symbol |= expression ; + +As you can see, all of these operators are C assignment operators. For example we can use them in our linker script as: + +``` +START_ADDRESS = 0x200000; +DATA_OFFSET = 0x200000; + +SECTIONS +{ + . = START_ADDRESS; + .text : { + *(.text) + } + + . = START_ADDRESS + DATA_OFFSET; + .data : { + *(.data) + } +} +``` + +As you may already have noted, the syntax for expressions in the linker script language is identical to that of C expressions. Besides this, the linker control language supports the following builtin functions: + +* `ABSOLUTE` - returns the absolute (non-relocatable) value of the given expression; +* `ADDR` - takes a section and returns its address; +* `ALIGN` - returns the value of the location counter (the `.` operator) aligned up to the boundary given by the expression; +* `DEFINED` - returns `1` if the given symbol is in the global symbol table and `0` otherwise; +* `MAX` and `MIN` - return the maximum and minimum of the two given expressions; +* `NEXT` - returns the next unallocated address that is a multiple of the given expression; +* `SIZEOF` - returns the size in bytes of the given named section. + +That's all. + +Conclusion +----------------- + +This is the end of the post about linkers. We learned many things about linkers in this post, such as what a linker is and why we need it, how to use it, etc. + +If you have any questions or suggestions, write me an [email](kuleshovmail@gmail.com) or ping [me](https://twitter.com/0xAX) on twitter. + +Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes please let me know via email or send a PR.
+ +Links +----------------- + +* [Book about Linux kernel internals](http://0xax.gitbooks.io/linux-insides/content/) +* [linker](https://en.wikipedia.org/wiki/Linker_%28computing%29) +* [object files](https://en.wikipedia.org/wiki/Object_file) +* [glibc](https://en.wikipedia.org/wiki/GNU_C_Library) +* [opcode](https://en.wikipedia.org/wiki/Opcode) +* [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) +* [GNU linker](https://en.wikipedia.org/wiki/GNU_linker) +* [My posts about assembly programming for x86_64](http://0xax.github.io/categories/assembly/) +* [readelf](https://sourceware.org/binutils/docs/binutils/readelf.html) From 097226ad6c2e366539e13b7bd48bb5ffb79ed944 Mon Sep 17 00:00:00 2001 From: 0xAX Date: Sun, 2 Aug 2015 20:21:13 +0600 Subject: [PATCH 05/32] Update linkers.md --- Misc/linkers.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/Misc/linkers.md b/Misc/linkers.md index 8c07e73..bf0a623 100644 --- a/Misc/linkers.md +++ b/Misc/linkers.md @@ -5,9 +5,8 @@ During writing of the [linux-insides](http://0xax.gitbooks.io/linux-insides/cont If we will open page about `Linker` on wikipidia, we will see following definition: -``` -In computer science, a linker or link editor is a computer program that takes one or more object files generated by a compiler and combines them into a single executable file, library file, or another object file. -``` + +>In computer science, a linker or link editor is a computer program that takes one or more object files generated by a compiler and combines them into a single executable file, library file, or another object file. If you've wrote at least one program on C in your life, you saw files with the `*.o` extension. These files are [object files](https://en.wikipedia.org/wiki/Object_file). Object files are blocks of machine code and data with uncertain addresses of references to data and functions in other object files (or libraries), as well as a list of its own functions and data. 
The main purpose of the linker is collect/handle code and data of the each object file to the the final executable file or library. In this post we will try to go through all aspects of this process. Let's start. From e47cab30b140e6cd1803700bedc0259f245a23a5 Mon Sep 17 00:00:00 2001 From: 0xAX Date: Sun, 2 Aug 2015 20:22:01 +0600 Subject: [PATCH 06/32] Update SUMMARY.md --- SUMMARY.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/SUMMARY.md b/SUMMARY.md index 7e3b6b5..640a567 100644 --- a/SUMMARY.md +++ b/SUMMARY.md @@ -46,7 +46,8 @@ * [initrd]() * [Misc](Misc/README.md) * [How kernel compiled](Misc/how_kernel_compiled.md) - * [Write and Submit your first Linux kernel Patch]() + * [Linkers](Misc/linkers.md) + * [Write and Submit your first Linux kernel Patch]() * [Data types in the kernel]() * [Useful links](LINKS.md) * [Contributors](contributors.md) From e3b60d6b4bcc568d543211596c10d147d34cd4f0 Mon Sep 17 00:00:00 2001 From: Ian Miell Date: Sun, 2 Aug 2015 20:02:57 +0100 Subject: [PATCH 07/32] Nit-picks and corrections up to "Relocation" --- Misc/linkers.md | 34 +++++++++++++++++----------------- README.md | 4 ++-- contributors.md | 1 + 3 files changed, 20 insertions(+), 19 deletions(-) diff --git a/Misc/linkers.md b/Misc/linkers.md index bf0a623..726f5f4 100644 --- a/Misc/linkers.md +++ b/Misc/linkers.md @@ -1,17 +1,16 @@ Introduction --------------- -During writing of the [linux-insides](http://0xax.gitbooks.io/linux-insides/content/) book I have received and continue to receive many emails with the similar questions related with the [linker](https://en.wikipedia.org/wiki/Linker_%28computing%29) script and a linker related stuff. So've decided to write this post that will cover some aspects related to the linker and linking of object files. 
- -If we will open page about `Linker` on wikipidia, we will see following definition: +During the writing of the [linux-insides](http://0xax.gitbooks.io/linux-insides/content/) book I have received many emails with questions related to the [linker](https://en.wikipedia.org/wiki/Linker_%28computing%29) script and linker-related subjects. So I've decided to write this to cover some aspects of the linker and the linking of object files. +If we open page the `Linker` page on wikipidia, we will see following definition: >In computer science, a linker or link editor is a computer program that takes one or more object files generated by a compiler and combines them into a single executable file, library file, or another object file. -If you've wrote at least one program on C in your life, you saw files with the `*.o` extension. These files are [object files](https://en.wikipedia.org/wiki/Object_file). Object files are blocks of machine code and data with uncertain addresses of references to data and functions in other object files (or libraries), as well as a list of its own functions and data. The main purpose of the linker is collect/handle code and data of the each object file to the the final executable file or library. In this post we will try to go through all aspects of this process. Let's start. +If you've written at least one program on C in your life, you will have seen files with the `*.o` extension. These files are [object files](https://en.wikipedia.org/wiki/Object_file). Object files are blocks of machine code and data with placeholder addresses that reference data and functions in other object files or libraries, as well as a list of its own functions and data. The main purpose of the linker is collect/handle the code and data of each object file, turning it into the the final executable file or library. In this post we will try to go through all aspects of this process. Let's start. 
Linking process ----------------- +--------------- Let's create simple project with the following structure: @@ -22,7 +21,7 @@ Let's create simple project with the following structure: *--lib.h ``` -And write there for example factorial program. Our `main.c` source code file will contain: +And write there our example factorial program. Our `main.c` source code file contains: ```C #include @@ -35,7 +34,7 @@ int main(int argc, char **argv) { } ``` -The `lib.c` will contain: +The `lib.c` contains: ```C int factorial(int base) { @@ -65,13 +64,14 @@ int factorial(int base); #endif ``` -Now let's compile only `main.c` source code file with the: +Now let's compile only the `main.c` source code file with: ``` $ gcc -c main.c ``` -If we will look inside with the `nm` util, we will see following output: +If we look inside the outputted object file with the `nm` util, we will see the +following output: ``` $ nm -A main.o @@ -80,13 +80,13 @@ main.o:0000000000000000 T main main.o: U printf ``` -The `nm` util allows us to see the list of symbols from the given object file. Look on the its output, it consists from the three columns: the first is the name of the given object file and address of resolved symbols. The second column contains symbol that determines status of the given symbol. In our case the `U` is `undefined` symbol and the `T` is the symbols that placed in the `.text` section. So `nm` util shows us that we have three symbols in the `main.c` source code file: +The `nm` util allows us to see the list of symbols from the given object file. It consists of three columns: the first is the name of the given object file and the address of any resolved symbols. The second column contains a character that represents the status of the given symbol. In this case the `U` means `undefined` and the `T` denotes that the symbols are placed in the `.text` section of the object. 
The `nm` utility shows us here that we have three symbols in the `main.c` source code file: -* `factorial` - factorial function that defined in the `lib.c` source code file and marked as `undefined` because we compile only `main.c` source code file and it does not know anything about code from the `lib.c` for now; -* `main` - main function; -* `printf` - function from the [glibc](https://en.wikipedia.org/wiki/GNU_C_Library) library and `main.c` does not anything about it for now too. +* `factorial` - the factorial function defined in the `lib.c` source code file. It is marked as `undefined` here because we compiled only the `main.c` source code file, and it does not know anything about code from the `lib.c` file for now; +* `main` - the main function; +* `printf` - the function from the [glibc](https://en.wikipedia.org/wiki/GNU_C_Library) library. `main.c` does not know anything about it for now either. -What we can understand from the output of the `nm` for this moment? All is simple. The `main.o` object file contains local symbol `main` by the `0000000000000000` (it will be filled with correct address after the linking) and two unresolved symbols. We can see all of this information in the disassembly output of the `main.o` object file: +What can we understand from the output of `nm` so far? The `main.o` object file contains the local symbol `main` at address `0000000000000000` (it will be filled with correct address after is is linked), and two unresolved symbols. We can see all of this information in the disassembly output of the `main.o` object file: ``` $ objdump -S main.o @@ -111,7 +111,7 @@ Disassembly of section .text: 30: c3 retq ``` -Here we are interesting only in the two `callq` operations. The two `callq` operations contain `linker stubs` or in another words function name and offset from it to the next instruction. These stubs will be updated to the real addresses of the functions. 
We can see these functions names with in the following `objdump` output: +Here we are interested only in the two `callq` operations. The two `callq` operations contain `linker stubs`, or the function name and offset from it to the next instruction. These stubs will be updated to the real addresses of the functions. We can see these functions' names with in the following `objdump` output: ``` $ objdump -S -r main.o @@ -127,12 +127,12 @@ $ objdump -S -r main.o ... ``` -The `-r` or `--reloc ` flags of the `objdump` util print the `relocation` entries of the file. Now let's know a little about relocation process. +The `-r` or `--reloc ` flags of the `objdump` util print the `relocation` entries of the file. Now let's look in more detail at the relocation process. Relocation ------------ -Relocation is the process of connecting symbolic references with symbolic definitions. Let's look on the previous snippet from the `objdump` output: +Relocation is the process of connecting symbolic references with symbolic definitions. Let's look at the previous snippet from the `objdump` output: ``` 14: e8 00 00 00 00 callq 19 diff --git a/README.md b/README.md index 54072e5..ce5d4b2 100644 --- a/README.md +++ b/README.md @@ -3,9 +3,9 @@ linux-insides A series of posts about the linux kernel and its insides. -**The goal is simple** - to share my modest knowledge about the internals of the linux kernel and help people who are interested in the linux kernel internals, and other low-level subject matter. +**The goal is simple** - to share my modest knowledge about the internals of the linux kernel and help people who are interested in linux kernel internals, and other low-level subject matter. -**Questions/Suggestions**: Feel free about any questions or suggestions by pinging me at twitter [@0xAX](https://twitter.com/0xAX), adding [issue](https://github.com/0xAX/linux-internals/issues/new) or just drop me [email](mailto:anotherworldofworld@gmail.com). 
+**Questions/Suggestions**: Feel free about any questions or suggestions by pinging me at twitter [@0xAX](https://twitter.com/0xAX), adding an [issue](https://github.com/0xAX/linux-internals/issues/new) or just drop me an [email](mailto:anotherworldofworld@gmail.com). Support ------- diff --git a/contributors.md b/contributors.md index 066d753..21c3f70 100644 --- a/contributors.md +++ b/contributors.md @@ -64,3 +64,4 @@ Thank you to all contributors: * [Donny Nadolny](https://github.com/dnadolny) * [Ehsun N](https://github.com/imehsunn) * [Waqar Ahmed](https://github.com/Waqar144) +* [Ian Miell](https://github.com/ianmiell) From 175f348a020af8ead70894d90beb682da420d9d1 Mon Sep 17 00:00:00 2001 From: Waqar144 Date: Sat, 1 Aug 2015 23:59:49 +0500 Subject: [PATCH 08/32] [1/3] Fix sentence structures in linux-bootstrap-3.md --- Booting/linux-bootstrap-3.md | 116 ++++++++++++++++++++--------------- 1 file changed, 66 insertions(+), 50 deletions(-) diff --git a/Booting/linux-bootstrap-3.md b/Booting/linux-bootstrap-3.md index 8cc25ac..20a8254 100644 --- a/Booting/linux-bootstrap-3.md +++ b/Booting/linux-bootstrap-3.md @@ -4,17 +4,20 @@ Kernel booting process. Part 3. Video mode initialization and transition to protected mode -------------------------------------------------------------------------------- -This is the third part of the `Kernel booting process` series. In the previous [part](linux-bootstrap-2.md#kernel-booting-process-part-2), we stopped right before the call of the `set_video` routine from the [main.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L181). We will see video mode initialization in the kernel setup code, preparation before switching into the protected mode and transition into it in this part. +This is the third part of the `Kernel booting process` series. 
In the previous [part](linux-bootstrap-2.md#kernel-booting-process-part-2), we stopped right before the call of the `set_video` routine from the [main.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L181). In this part, we will see: +- video mode initialization in the kernel setup code, +- preparation before switching into the protected mode, +- transition to protected mode -**NOTE** If you don't know anything about protected mode, you can find some information about it in the previous [part](linux-bootstrap-2.md#protected-mode). Also there are a couple of [links](linux-bootstrap-2.md#links) which can help you. +**NOTE: If you don't know anything about protected mode, you can find some information about it in the previous [part](linux-bootstrap-2.md#protected-mode). Also there are a couple of [links](linux-bootstrap-2.md#links) which can help you.** -As i wrote above, we will start from the `set_video` function which defined in the [arch/x86/boot/video.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/video.c#L315) source code file. We can see that it starts with getting of video mode from the `boot_params.hdr` structure: +As i wrote above, we will start from the `set_video` function which defined in the [arch/x86/boot/video.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/video.c#L315) source code file. We can see that it starts by first getting the video mode from the `boot_params.hdr` structure: ```C u16 mode = boot_params.hdr.vid_mode; ``` -which we filled in the `copy_boot_params` function (you can read about it in the previous post). `vid_mode` is an obligatory field which filled by the bootloader. You can find information about it in the kernel boot protocol: +which we filled in the `copy_boot_params` function (you can read about it in the previous post). `vid_mode` is an obligatory field which is filled by the bootloader. 
You can find information about it in the kernel boot protocol: ``` Offset Proto Name Meaning @@ -34,49 +37,44 @@ vga= line is parsed. ``` -So we can add `vga` option to the grub or another bootloader configuration file and it will pass this option to the kernel command line. This option can have different values as we can read from the description, for example it can be integer number or `ask`. If you will pass `ask`, you see menu like this: +So we can add `vga` option to the grub or another bootloader configuration file and it will pass this option to the kernel command line. This option can have different values as we can mentioned in the description, for example it can be an integer number `0xFFFD` or `ask`. If you pass `ask` t `vga`, you will see a menu like this: ![video mode setup menu](http://oi59.tinypic.com/ejcz81.jpg) -which will suggest to select video mode. We will look on it's implementation, but before we must to know about another things. +which will ask to select a video mode. We will look at it's implementation, but before diving into the implementation we have to look at some other things. Kernel data types -------------------------------------------------------------------------------- -Earlier we saw definitions of different data types like `u16` and etc... in the kernel setup code. Let's look on a couple of data types provided by the kernel: +Earlier we saw definitions of different data types like `u16` etc. in the kernel setup code. Let's look on a couple of data types provided by the kernel: + -``` | Type | char | short | int | long | u8 | u16 | u32 | u64 | |------|------|-------|-----|------|----|-----|-----|-----| | Size | 1 | 2 | 4 | 8 | 1 | 2 | 4 | 8 | -``` -If you will read source code of the kernel, you'll see it very often, so it will be good to remember about it. +If you read source code of the kernel, you'll see these very often and so it will be good to remember them. 
Heap API -------------------------------------------------------------------------------- -As we got `vid_mode` from the `boot_params.hdr`, we can see call of the `RESET_HEAP` in the `set_video` function. `RESET_HEAP` is a macro which defined in the [boot.h](https://github.com/torvalds/linux/blob/master/arch/x86/boot/boot.h#L199) and looks as: +After we have `vid_mode` from the `boot_params.hdr` in the `set_video` function we can see call to `RESET_HEAP` function. `RESET_HEAP` is a macro which defined in the [boot.h](https://github.com/torvalds/linux/blob/master/arch/x86/boot/boot.h#L199). It is defined as: -```C +```c #define RESET_HEAP() ((void *)( HEAP = _end )) ``` -If you read second part, you can remember that we initialized the heap with the [init_heap](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L116) function. Since we can use heap, we have a couple functions for it which defined in the `boot.h`. They are: +If you have read the second part, you will remember that we initialized the heap with the [`init_heap`](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L116) function. We have a couple of utility functions for heap which are defined in the `boot.h`. They are: -```C -#define RESET_HEAP()... +```c +RESET_HEAP() ``` -As we saw just now. It uses for resetting the heap by setting the `HEAP` variable equal to `_end`, where `_end` is just: - -```C -extern char _end[]; -``` +As we saw just above it resets the heap by setting the `HEAP` variable equal to `_end`, where `_end` is just `extern char _end[];` Next is `GET_HEAP` macro: -```C +```c #define GET_HEAP(type, n) \ ((type *)__get_heap(sizeof(type),__alignof__(type),(n))) ``` @@ -84,12 +82,12 @@ Next is `GET_HEAP` macro: for heap allocation. 
It calls internal function `__get_heap` with 3 parameters: * size of a type in bytes, which need be allocated -* next parameter shows how type of variable is aligned -* how many bytes to allocate +* `__alignof__(type)` shows how type of variable is aligned +* `n` tells how many bytes to allocate Implementation of `__get_heap` is: -```C +```c static inline char *__get_heap(size_t s, size_t a, size_t n) { char *tmp; @@ -101,20 +99,20 @@ static inline char *__get_heap(size_t s, size_t a, size_t n) } ``` -and further we will see usage of it, something like this: +and further we will see its usage, something like: -```C -saved.data = GET_HEAP(u16, saved.x*saved.y); +```c +saved.data = GET_HEAP(u16, saved.x * saved.y); ``` -Let's try to understand how `GET_HEAP` works. We can see here that `HEAP` (which equal to `_end` after `RESET_HEAP()`) is the address of aligned memory according to `a` parameter. After it we save memory address from `HEAP` to the `tmp` variable, move `HEAP` to the end of allocated block and return `tmp` which is start address of allocated memory. +Let's try to understand how `__get_heap` works. We can see here that `HEAP` (which is equal to `_end` after `RESET_HEAP()`) is the address of aligned memory according to `a` parameter. After it we save memory address from `HEAP` to the `tmp` variable, move `HEAP` to the end of allocated block and return `tmp` which is start address of allocated memory. And the last function is: ```C static inline bool heap_free(size_t n) { - return (int)(heap_end-HEAP) >= (int)n; + return (int)(heap_end - HEAP) >= (int)n; } ``` @@ -125,30 +123,41 @@ That's all. Now we have simple API for heap and can setup video mode. Setup video mode -------------------------------------------------------------------------------- -Now we can move directly to video mode initialization. We stopped at the `RESET_HEAP()` call in the `set_video` function. 
The next call of `store_mode_params` which stores video mode parameters in the `boot_params.screen_info` structure which defined in the [include/uapi/linux/screen_info.h](https://github.com/0xAX/linux/blob/master/include/uapi/linux/screen_info.h). If we will look at `store_mode_params` function, we can see that it starts from the call of the `store_cursor_position` function. As you can understand from the function name, it gets information about cursor and stores it. First of all `store_cursor_position` initializes two variables which has type - `biosregs`, with `AH = 0x3` and calls `0x10` BIOS interruption. After interruption successfully executed, it returns row and column in the `DL` and `DH` registers. Row and column will be stored in the `orig_x` and `orig_y` fields from the the `boot_params.screen_info` structure. After `store_cursor_position` executed, `store_video_mode` function will be called. It just gets current video mode and stores it in the `boot_params.screen_info.orig_video_mode`. +Now we can move directly to video mode initialization. We stopped at the `RESET_HEAP()` call in the `set_video` function. Next is the call to `store_mode_params` which stores video mode parameters in the `boot_params.screen_info` structure which is defined in the [include/uapi/linux/screen_info.h](https://github.com/0xAX/linux/blob/master/include/uapi/linux/screen_info.h). -After this, it checks current video mode and set the `video_segment`. After the BIOS transfers control to the boot sector, the following addresses are video memory: +If we will look at `store_mode_params` function, we can see that it starts with the call to `store_cursor_position` function. As you can understand from the function name, it gets information about cursor and stores it. + +First of all `store_cursor_position` initializes two variables which has type - `biosregs`, with `AH = 0x3` and calls `0x10` BIOS interruption. 
After interruption successfully executed, it returns row and column in the `DL` and `DH` registers. Row and column will be stored in the `orig_x` and `orig_y` fields from the the `boot_params.screen_info` structure. + +After `store_cursor_position` executed, `store_video_mode` function will be called. It just gets current video mode and stores it in the `boot_params.screen_info.orig_video_mode`. + +After this, it checks current video mode and sets the `video_segment`. After the BIOS transfers control to the boot sector, the following addresses are for video memory: ``` 0xB000:0x0000 32 Kb Monochrome Text Video Memory 0xB800:0x0000 32 Kb Color Text Video Memory ``` -So we set the `video_segment` variable to `0xb000` if current video mode is MDA, HGC, VGA in monochrome mode or `0xb800` in color mode. After setup of the address of the video segment need to store font size in the `boot_params.screen_info.orig_video_points` with: +So we set the `video_segment` variable to `0xB000` if current video mode is MDA, HGC, VGA in monochrome mode or `0xB800` in color mode. After setup of the address of the video segment font size needs to be stored in the `boot_params.screen_info.orig_video_points` with: -```C +```c set_fs(0); font_size = rdfs16(0x485); boot_params.screen_info.orig_video_points = font_size; ``` -First of all we put 0 to the `FS` register with `set_fs` function. We already saw functions like `set_fs` in the previous part. They are all defined in the [boot.h](https://github.com/0xAX/linux/blob/master/arch/x86/boot/boot.h). Next we read value which located at address `0x485` (this memory location used to get the font size) and save font size in the `boot_params.screen_info.orig_video_points`. +First of all we put 0 to the `FS` register with `set_fs` function. We already saw functions like `set_fs` in the previous part. They are all defined in the [boot.h](https://github.com/0xAX/linux/blob/master/arch/x86/boot/boot.h). 
Next we read value which is located at address `0x485` (this memory location is used to get the font size) and save font size in the `boot_params.screen_info.orig_video_points`. -The next we get amount of columns and rows by address `0x44a` and stores they in the `boot_params.screen_info.orig_video_cols` and `boot_params.screen_info.orig_video_lines`. After this, execution of the `store_mode_params` is finished. +``` + x = rdfs16(0x44a); + y = (adapter == ADAPTER_CGA) ? 25 : rdfs8(0x484)+1; +``` -The next we can see `save_screen` function which just saves screen content to the heap. This function collects all data which we got in the previous functions like rows and columns amount and etc... to the `saved_screen` structure, which defined as: +Next we get amount of columns by `0x44a` and rows by address `0x484` and store them in the `boot_params.screen_info.orig_video_cols` and `boot_params.screen_info.orig_video_lines`. After this, execution of the `store_mode_params` is finished. -```C +Next we can see `save_screen` function which just saves screen content to the heap. This function collects all data which we got in the previous functions like rows and columns amount etc. and stores it in the `saved_screen` structure, which is defined as: + +```c static struct saved_screen { int x, y; int curx, cury; @@ -156,9 +165,9 @@ static struct saved_screen { } saved; ``` -It checks that heap has free space for it with: +It then checks whether the heap has free space for it with: -```C +```c if (!heap_free(saved.x*saved.y*sizeof(u16)+512)) return; ``` @@ -167,7 +176,7 @@ and allocates space in the heap if it is enough and stores `saved_screen` in it. The next call is `probe_cards(0)` from the [arch/x86/boot/video-mode.c](https://github.com/0xAX/linux/blob/master/arch/x86/boot/video-mode.c#L33). It goes over all video_cards and collects number of modes provided by the cards. 
Here is the interesting moment, we can see the loop: -```C +```c for (card = video_cards; card < video_cards_end; card++) { /* collecting number of modes here */ } @@ -175,7 +184,7 @@ for (card = video_cards; card < video_cards_end; card++) { but `video_cards` not declared anywhere. Answer is simple: Every video mode presented in the x86 kernel setup code has definition like this: -```C +```c static __videocard video_vga = { .card_name = "VGA", .probe = vga_probe, @@ -185,13 +194,13 @@ static __videocard video_vga = { where `__videocard` is a macro: -```C +```c #define __videocard struct card_info __attribute__((used,section(".videocards"))) ``` which means that `card_info` structure: -```C +```c struct card_info { const char *card_name; int (*set_mode)(struct mode_info *mode); @@ -204,7 +213,7 @@ struct card_info { }; ``` -is in the `.videocards` segment. Let's look on the [arch/x86/boot/setup.ld](https://github.com/0xAX/linux/blob/master/arch/x86/boot/setup.ld) linker file, we can see there: +is in the `.videocards` segment. Let's look in the [arch/x86/boot/setup.ld](https://github.com/0xAX/linux/blob/master/arch/x86/boot/setup.ld) linker file, we can see there: ``` .videocards : { @@ -216,13 +225,13 @@ is in the `.videocards` segment. Let's look on the [arch/x86/boot/setup.ld](http It means that `video_cards` is just memory address and all `card_info` structures are placed in this segment. It means that all `card_info` structures are placed between `video_cards` and `video_cards_end`, so we can use it in a loop to go over all of it. After `probe_cards` executed we have all structures like `static __videocard video_vga` with filled `nmodes` (number of video modes). -After that `probe_cards` executed, we move to the main loop in the `setup_video` function. There is infinite loop which tries to setup video mode with the `set_mode` function or prints menu if we passed `vid_mode=ask` to the kernel command line or video mode is undefined. 
+After `probe_cards` execution is finished, we move to the main loop in the `set_video` function. There is infinite loop which tries to setup video mode with the `set_mode` function or prints a menu if we passed `vid_mode=ask` to the kernel command line or video mode is undefined. -The `set_mode` function is defined in the [video-mode.c](https://github.com/0xAX/linux/blob/master/arch/x86/boot/video-mode.c#L147) and gets only one parameter - `mode` which is number of video mode (we got it or from the menu or in the start of the `setup_video`, from kernel setup header). +The `set_mode` function is defined in the [video-mode.c](https://github.com/0xAX/linux/blob/master/arch/x86/boot/video-mode.c#L147) and gets only one parameter, `mode` which is the number of video mode (we got it or from the menu or in the start of the `setup_video`, from kernel setup header). -`set_mode` function checks the `mode` and calls `raw_set_mode` function. The `raw_set_mode` calls `set_mode` function for selected card. We can get access to this function from the `card_info` structure, every video mode defines this structure with filled value which depends on video mode (for example for `vga` it is `video_vga.set_mode` function, see above example of `card_info` structure for `vga`). `video_vga.set_mode` is `vga_set_mode`, which checks vga mode and call function depend on mode: +`set_mode` function checks the `mode` and calls `raw_set_mode` function. The `raw_set_mode` calls `set_mode` function for selected card i.e. `card->set_mode(struct mode_info*)`. We can get access to this function from the `card_info` structure, every video mode defines this structure with values filled depending upon the video mode (for example for `vga` it is `video_vga.set_mode` function, see above example of `card_info` structure for `vga`). 
`video_vga.set_mode` is `vga_set_mode`, which checks the vga mode and calls the respective function: -```C +```c static int vga_set_mode(struct mode_info *mode) { vga_set_basic_mode(); @@ -256,12 +265,18 @@ static int vga_set_mode(struct mode_info *mode) } ``` -Every function which setups video mode, just call `0x10` BIOS interruption with certain value in the `AH` register. After this we have set video mode and now we can switch to the protected mode. +Every function which setups video mode, just calls `0x10` BIOS interrupt with certain value in the `AH` register. + +After we have set video mode, we pass it to the `boot_params.hdr.vid_mode`. + +Next `vesa_store_edid` is called. This function simply stores the [EDID](https://en.wikipedia.org/wiki/Extended_Display_Identification_Data) (**E**xtended **D**isplay **I**dentification **D**ata) information for kernel use. After this `store_mode_params` is called again. Lastly, if `do_restore` is set, screen is restored to an earlier state. + +After this we have set video mode and now we can switch to the protected mode. Last preparation before transition into protected mode -------------------------------------------------------------------------------- -We can see the last function call - `go_to_protected_mode` in the [main.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L184). As comment says: `Do the last things and invoke protected mode`, so let's see last preparation and switch into the protected mode. +We can see the last function call - `go_to_protected_mode` in the [main.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L184). As comment says: `Do the last things and invoke protected mode`, so let's see these last things and switch into the protected mode. `go_to_protected_mode` defined in the [arch/x86/boot/pm.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/pm.c#L104). 
It contains some functions which make last preparations before we can jump into protected mode, so let's look on it and try to understand what they do and how it works. @@ -539,3 +554,4 @@ Links * [GCC designated inits](https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Designated-Inits.html) * [GCC type attributes](https://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html) * [Previous part](linux-bootstrap-2.md) + From 196a8b8f901b089b7ca59606c5e4b4c4dec10187 Mon Sep 17 00:00:00 2001 From: Waqar144 Date: Mon, 3 Aug 2015 00:20:47 +0500 Subject: [PATCH 09/32] [2/3] Fix sentence structures in linux-bootstrap-3.md --- Booting/linux-bootstrap-3.md | 75 +++++++++++++++++++++++------------- 1 file changed, 48 insertions(+), 27 deletions(-) diff --git a/Booting/linux-bootstrap-3.md b/Booting/linux-bootstrap-3.md index 20a8254..8c96d9b 100644 --- a/Booting/linux-bootstrap-3.md +++ b/Booting/linux-bootstrap-3.md @@ -276,23 +276,27 @@ After this we have set video mode and now we can switch to the protected mode. Last preparation before transition into protected mode -------------------------------------------------------------------------------- -We can see the last function call - `go_to_protected_mode` in the [main.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L184). As comment says: `Do the last things and invoke protected mode`, so let's see these last things and switch into the protected mode. +We can see the last function call - `go_to_protected_mode` in the [main.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L184). As the comment says: `Do the last things and invoke protected mode`, so let's see these last things and switch into the protected mode. `go_to_protected_mode` defined in the [arch/x86/boot/pm.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/pm.c#L104). 
It contains some functions which make last preparations before we can jump into protected mode, so let's look on it and try to understand what they do and how it works. -At first we see call of `realmode_switch_hook` function in the `go_to_protected_mode`. This function invokes real mode switch hook if it is present and disables [NMI](http://en.wikipedia.org/wiki/Non-maskable_interrupt). Hooks are used if bootloader runs in a hostile environment. More about hooks you can read in the [boot protocol](https://www.kernel.org/doc/Documentation/x86/boot.txt) (see **ADVANCED BOOT LOADER HOOKS**). `readlmode_swtich` hook presents pointer to the 16-bit real mode far subroutine which disables non-maskable interruptions. After we checked `realmode_switch` hook (it doesn't present for me), there is disabling of non-maskable interruptions: +First is the call to `realmode_switch_hook` function in the `go_to_protected_mode`. This function invokes real mode switch hook if it is present and disables [NMI](http://en.wikipedia.org/wiki/Non-maskable_interrupt). Hooks are used if bootloader runs in a hostile environment. You can read more about hooks in the [boot protocol](https://www.kernel.org/doc/Documentation/x86/boot.txt) (see **ADVANCED BOOT LOADER HOOKS**). + +`readlmode_swtich` hook presents pointer to the 16-bit real mode far subroutine which disables non-maskable interrupts. After `realmode_switch` hook (it isn't present for me) is checked, disabling of Non-Maskable Interrupts(NMI) occurs: ```assembly asm volatile("cli"); -outb(0x80, 0x70); +outb(0x80, 0x70); /* Disable NMI */ io_delay(); ``` -At first there is inline assembly instruction with `cli` instruction which clears the interrupt flag (`IF`), after this external interrupts disabled. Next line disables NMI (non-maskable interruption). Interruption is a signal to the CPU which emitted by hardware or software. 
After getting signal, CPU stops to execute current instructions sequence and transfers control to the interruption handler. After interruption handler finished it's work, it transfers control to the interrupted instruction. Non-maskable interruptions (NMI) - interruptions which are always processed, independently of permission. We will not dive into details interruptions now, but will back to it in the next posts. +At first there is inline assembly instruction with `cli` instruction which clears the interrupt flag (`IF`). After this, external interrupts are disabled. Next line disables NMI (non-maskable interrupt). -Let's back to the code. We can see that second line is writing `0x80` (disabled bit) byte to the `0x70` (CMOS Address register). And call the `io_delay` function after it. `io_delay` which initiates small delay and looks like: +Interrupt is a signal to the CPU which is emitted by hardware or software. After getting signal, CPU suspends current instructions sequence, saves its state and transfers control to the interrupt handler. After interrupt handler has finished it's work, it transfers control to the interrupted instruction. Non-maskable interrupts (NMI) are interrupts which are always processed, independently of permission. It cannot be ignored and is typically used to signal for non-recoverable hardware errors. We will not dive into details of interrupts now, but will discuss it in the next posts. -``` +Let's get back to the code. We can see that second line is writing `0x80` (disabled bit) byte to the `0x70` (CMOS Address register). After that call to the `io_delay` function occurs. `io_delay` causes a small delay and looks like: + +```c static inline void io_delay(void) { const u16 DELAY_PORT = 0x80; @@ -300,11 +304,11 @@ static inline void io_delay(void) } ``` -Outputting any byte to the port `0x80` should delay exactly 1 microsecond. Sow we can write any value (value from `AL` register in our case) to the `0x80` port. 
After this delay `realmode_switch_hook` function finished execution and we can move to the next function. +Outputting any byte to the port `0x80` should delay exactly 1 microsecond. So we can write any value (value from `AL` register in our case) to the `0x80` port. After this delay `realmode_switch_hook` function has finished execution and we can move to the next function. -The next function is `enable_a20`, which enables [A20 line](http://en.wikipedia.org/wiki/A20_line). This function defined in the [arch/x86/boot/a20.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/a20.c) and it tries to enable A20 gate with different methods. The first is `a20_test_short` function which checks is A20 already enabled or not with `a20_test` function: +The next function is `enable_a20`, which enables [A20 line](http://en.wikipedia.org/wiki/A20_line). This function is defined in the [arch/x86/boot/a20.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/a20.c) and it tries to enable A20 gate with different methods. The first is `a20_test_short` function which checks is A20 already enabled or not with `a20_test` function: -```C +```c static int a20_test(int loops) { int ok = 0; @@ -328,7 +332,11 @@ static int a20_test(int loops) } ``` -First of all we put `0x0000` to the `FS` register and `0xffff` to the `GS` register. Next we read value by address `A20_TEST_ADDR` (it is `0x200`) and put this value into `saved` variable and `ctr`. Next we write updated `ctr` value into `fs:gs` with `wrfs32` function, make little delay, and read value into the `GS` register by address `A20_TEST_ADDR+0x10`, if it's not zero we already have enabled a20 line. If A20 is disabled, we try to enabled it with different method which you can find in the `a20.c`. For example with call of `0x15` BIOS interruption with `AH=0x2041` and etc... If `enabled_a20` function finished with fail, printed error message and called function `die`. 
You can remember it from the first source code file where we started - [arch/x86/boot/header.S](https://github.com/torvalds/linux/blob/master/arch/x86/boot/header.S): +First of all we put `0x0000` to the `FS` register and `0xffff` to the `GS` register. Next we read value by address `A20_TEST_ADDR` (it is `0x200`) and put this value into `saved` variable and `ctr`. + +Next we write updated `ctr` value into `fs:gs` with `wrfs32` function, then delay for 1ms, and then read the value into the `GS` register by address `A20_TEST_ADDR+0x10`, if it's not zero we already have enabled A20 line. If A20 is disabled, we try to enable it with a different method which you can find in the `a20.c`. For example with call of `0x15` BIOS interrupt with `AH=0x2041` etc. + +If `enabled_a20` function finished with fail, print an error message and call function `die`. You can remember it from the first source code file where we started - [arch/x86/boot/header.S](https://github.com/torvalds/linux/blob/master/arch/x86/boot/header.S): ```assembly die: @@ -337,14 +345,28 @@ die: .size die, .-die ``` -After the a20 gate successfully enabled, there are reset coprocessor and mask all interrupts. And after all of this preparations, we can see actual transition into protected mode. +After the A20 gate is successfully enabled, `reset_coprocessor` function is called: + ```c +outb(0, 0xf0); +outb(0, 0xf1); +``` +This function clears the Math Coprocessor by writing `0` to `0xf0` and then resets it by writing `0` to `0xf1`. + +After this `mask_all_interrupts` function is called: +```c +outb(0xff, 0xa1); /* Mask all interrupts on the secondary PIC */ +outb(0xfb, 0x21); /* Mask all but cascade on the primary PIC */ +``` +This masks all interrupts on the secondary PIC (Programmable Interrupt Controller) and primary PIC except for IRQ2 on the primary PIC. + +And after all of these preparations, we can see actual transition into protected mode. 
Setup Interrupt Descriptor Table -------------------------------------------------------------------------------- -Then next ist setup of Interrupt Descriptor table (IDT). `setup_idt`: +Now we setup the Interrupt Descriptor table (IDT). `setup_idt`: -```C +```c static void setup_idt(void) { static const struct gdt_ptr null_idt = {0, 0}; @@ -352,23 +374,22 @@ static void setup_idt(void) } ``` -which setups `Interrupt descriptor table` (describes interrupt handlers and etc...). For now IDT is not installed (we will see it later), but now we just load IDT with `lidtl` instruction. `null_idt` contains address and size of IDT, but now they are just zero. `null_idt` is a `gdt_ptr` structure, it looks: - -```C +which setups the Interrupt Descriptor Table (describes interrupt handlers and etc.). For now IDT is not installed (we will see it later), but now we just load IDT with `lidtl` instruction. `null_idt` contains address and size of IDT, but now they are just zero. `null_idt` is a `gdt_ptr` structure, it as defined as: +```c struct gdt_ptr { u16 len; u32 ptr; } __attribute__((packed)); ``` -where we can see - 16-bit length of IDT and 32-bit pointer to it (More details about IDT and interruptions we will see in the next posts). ` __attribute__((packed))` means here that size of `gdt_ptr` minimum as required. So size of the `gdt_ptr` will be 6 bytes here or 48 bits. (Next we will load pointer to the `gdt_ptr` to the `GDTR` register and you can remember from the previous post that it is 48-bits size). +where we can see - 16-bit length(`len`) of IDT and 32-bit pointer to it (More details about IDT and interruptions we will see in the next posts). ` __attribute__((packed))` means here that size of `gdt_ptr` minimum as required. So size of the `gdt_ptr` will be 6 bytes here or 48 bits. (Next we will load pointer to the `gdt_ptr` to the `GDTR` register and you might remember from the previous post that it is 48-bits in size). 
Setup Global Descriptor Table -------------------------------------------------------------------------------- -The next point is setup of the Global Descriptor Table (GDT). We can see `setup_gdt` function which setups GDT (you can read about it in the [Kernel booting process. Part 2.](linux-bootstrap-2.md#protected-mode)). There is definition of the `boot_gdt` array in this function, which contains definition of the three segments: +Next is the setup of Global Descriptor Table (GDT). We can see `setup_gdt` function which sets up GDT (you can read about it in the [Kernel booting process. Part 2.](linux-bootstrap-2.md#protected-mode)). There is definition of the `boot_gdt` array in this function, which contains definition of the three segments: -```C +```c static const u64 boot_gdt[] __attribute__((aligned(16))) = { [GDT_ENTRY_BOOT_CS] = GDT_ENTRY(0xc09b, 0, 0xfffff), [GDT_ENTRY_BOOT_DS] = GDT_ENTRY(0xc093, 0, 0xfffff), @@ -376,9 +397,8 @@ The next point is setup of the Global Descriptor Table (GDT). We can see `setup_ }; ``` -For code, data and TSS (Task state segment). We will not use task state segment for now, it was added there to make Intel VT happy as we can see in the comment line (if you're interesting you can find commit which describes it - [here](https://github.com/torvalds/linux/commit/88089519f302f1296b4739be45699f06f728ec31)). Let's look on `boot_gdt`. First of all we can note that it has `__attribute__((aligned(16)))` attribute. It means that this structure will be aligned by 16 bytes. Let's look on simple example: - -```C +For code, data and TSS (Task State Segment). We will not use task state segment for now, it was added there to make Intel VT happy as we can see in the comment line (if you're interesting you can find commit which describes it - [here](https://github.com/torvalds/linux/commit/88089519f302f1296b4739be45699f06f728ec31)). Let's look on `boot_gdt`. First of all note that it has `__attribute__((aligned(16)))` attribute. 
It means that this structure will be aligned by 16 bytes. Let's look at a simple example: +```c #include <stdio.h> struct aligned { @@ -409,7 +429,7 @@ Not aligned - 4 Aligned - 16 ``` -`GDT_ENTRY_BOOT_CS` has index - 2 here, `GDT_ENTRY_BOOT_DS` is `GDT_ENTRY_BOOT_CS + 1` and etc... It starts from 2, because first is a mandatory null descriptor (index - 0) and the second is not used (index - 1). +`GDT_ENTRY_BOOT_CS` has index - 2 here, `GDT_ENTRY_BOOT_DS` is `GDT_ENTRY_BOOT_CS + 1` and etc. It starts from 2, because first is a mandatory null descriptor (index - 0) and the second is not used (index - 1). `GDT_ENTRY` is a macro which takes flags, base and limit and builds GDT entry. For example let's look on the code segment entry. `GDT_ENTRY` takes following values: @@ -436,11 +456,11 @@ in binary. Let's try to understand what every bit means. We will go through all * 101 - segment type execute/read/ * 1 - accessed bit -You can know more about every bit in the previous [post](linux-bootstrap-2.md) or in the [Intel® 64 and IA-32 Architectures Software Developer’s Manuals 3A](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html). +You can read more about every bit in the previous [post](linux-bootstrap-2.md) or in the [Intel® 64 and IA-32 Architectures Software Developer's Manuals 3A](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html). After this we get length of GDT with: -```C +```c gdt.len = sizeof(boot_gdt)-1; ``` @@ -448,7 +468,7 @@ We get size of `boot_gdt` and subtract 1 (the last valid address in the GDT). 
Next we get pointer to the GDT with: -```C +```c gdt.ptr = (u32)&boot_gdt + (ds() << 4); ``` @@ -456,7 +476,7 @@ Here we just get address of `boot_gdt` and add it to address of data segment shi In the last we execute `lgdtl` instruction to load GDT into GDTR register: -```C +```c asm volatile("lgdtl %0" : : "m" (gdt)); ``` @@ -555,3 +575,4 @@ Links * [GCC type attributes](https://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html) * [Previous part](linux-bootstrap-2.md) + From 4af04f58c682a6e912224860eddfc9055125855b Mon Sep 17 00:00:00 2001 From: Waqar144 Date: Mon, 3 Aug 2015 00:39:25 +0500 Subject: [PATCH 10/32] [3/3] Fix sentence structures in linux-bootstrap-3.md --- Booting/linux-bootstrap-3.md | 38 ++++++++++++++++++++++-------------- 1 file changed, 23 insertions(+), 15 deletions(-) diff --git a/Booting/linux-bootstrap-3.md b/Booting/linux-bootstrap-3.md index 8c96d9b..24791e6 100644 --- a/Booting/linux-bootstrap-3.md +++ b/Booting/linux-bootstrap-3.md @@ -472,9 +472,9 @@ Next we get pointer to the GDT with: gdt.ptr = (u32)&boot_gdt + (ds() << 4); ``` -Here we just get address of `boot_gdt` and add it to address of data segment shifted on 4 (remember we're in the real mode now). +Here we just get address of `boot_gdt` and add it to address of data segment left-shifted by 4 bits (remember we're in the real mode now). -In the last we execute `lgdtl` instruction to load GDT into GDTR register: +Lastly we execute `lgdtl` instruction to load GDT into GDTR register: ```c asm volatile("lgdtl %0" : : "m" (gdt)); @@ -485,23 +485,27 @@ Actual transition into protected mode It is the end of `go_to_protected_mode` function. We loaded IDT, GDT, disable interruptions and now can switch CPU into protected mode. 
The last step we call `protected_mode_jump` function with two parameters: -```C +```c protected_mode_jump(boot_params.hdr.code32_start, (u32)&boot_params + (ds() << 4)); ``` -which defined in the [arch/x86/boot/pmjump.S](https://github.com/torvalds/linux/blob/master/arch/x86/boot/pmjump.S#L26). It takes two parameters: +which is defined in the [arch/x86/boot/pmjump.S](https://github.com/torvalds/linux/blob/master/arch/x86/boot/pmjump.S#L26). It takes two parameters: * address of protected mode entry point * address of `boot_params` -Let's look inside `protected_mode_jump`. As i wrote above, you can find it in the `arch/x86/boot/pmjump.S`. First parameter will be in `eax` register and second is in `edx`. First of all we put address of `boot_params` to the `esi` register and address of code segment register `cs` (0x1000) to the `bx`. After this we shift `bx` on 4 and add address of label `2` to it (we will have physical address of label `2` in the `bx` after it) and jump to label `1`. Next we put data segment and task state segment in the `cs` and `di` registers with: +Let's look inside `protected_mode_jump`. As I wrote above, you can find it in the `arch/x86/boot/pmjump.S`. First parameter will be in `eax` register and second is in `edx`. + +First of all we put address of `boot_params` in the `esi` register and address of code segment register `cs` (0x1000) in the `bx`. After this we shift `bx` by 4 bits and add address of label `2` to it (we will have physical address of label `2` in the `bx` after it) and jump to label `1`. Next we put data segment and task state segment in the `cs` and `di` registers with: ```assembly movw $__BOOT_DS, %cx movw $__BOOT_TSS, %di ``` -As you can read above `GDT_ENTRY_BOOT_CS` has index 2 and every GDT entry is 8 byte, so `CS` will be `2 * 8 = 16`, `__BOOT_DS` is 24 and etc... 
Next we set `PE` (Protection Enable) bit in the `CR0` control register: +As you can read above `GDT_ENTRY_BOOT_CS` has index 2 and every GDT entry is 8 byte, so `CS` will be `2 * 8 = 16`, `__BOOT_DS` is 24 etc. + +Next we set `PE` (Protection Enable) bit in the `CR0` control register: ```assembly movl %cr0, %edx @@ -517,16 +521,20 @@ and make long jump to the protected mode: .word __BOOT_CS ``` -where `0x66` is the operand-size prefix, which allows to mix 16-bit and 32-bit code, `0xea` - is the jump opcode, `in_pm32` is the segment offset and `__BOOT_CS` is the segment. +where +* `0x66` is the operand-size prefix which allows to mix 16-bit and 32-bit code, +* `0xea` - is the jump opcode, +* `in_pm32` is the segment offset +* `__BOOT_CS` is the code segment. -After this we are in the protected mode: +After this we are finally in the protected mode: ```assembly .code32 .section ".text32","ax" ``` -Let's look on the first steps in the protected mode. First of all we setup data segment with: +Let's look at the first steps in the protected mode. First of all we setup data segment with: ```assembly movl %ecx, %ds @@ -536,7 +544,7 @@ movl %ecx, %gs movl %ecx, %ss ``` -if you read with attention, you can remember that we saved `$__BOOT_DS` in the `cx` register. Now we fill with it all segment registers besides `cs` (`cs` is already `__BOOT_CS`). Next we zero out all general purpose registers besides `eax` with: +If you read with attention, you can remember that we saved `$__BOOT_DS` in the `cx` register. Now we fill with it all segment registers besides `cs` (`cs` is already `__BOOT_CS`). Next we zero out all general purpose registers besides `eax` with: ```assembly xorl %ecx, %ecx @@ -552,16 +560,18 @@ And jump to the 32-bit entry point in the end: jmpl *%eax ``` -remember that `eax` contains address of the 32-bit entry (we passed it as first parameter into `protected_mode_jump`). That's all we're in the protected mode and stops before it's entry point. 
What is happening after we joined in the 32-bit entry point we will see in the next part. +Remember that `eax` contains address of the 32-bit entry (we passed it as first parameter into `protected_mode_jump`). + +That's all we're in the protected mode and stop at it's entry point. What happens next, we will see in the next part. Conclusion -------------------------------------------------------------------------------- It is the end of the third part about linux kernel internals. In next part we will see first steps in the protected mode and transition into the [long mode](http://en.wikipedia.org/wiki/Long_mode). -If you will have any questions or suggestions write me a comment or ping me at [twitter](https://twitter.com/0xAX). +If you have any questions or suggestions write me a comment or ping me at [twitter](https://twitter.com/0xAX). -**Please note that English is not my first language, And I am really sorry for any inconvenience. If you will find any mistakes please send me PR to [linux-internals](https://github.com/0xAX/linux-internals).** +**Please note that English is not my first language, And I am really sorry for any inconvenience. 
If you find any mistakes, please send me a PR with corrections at [linux-internals](https://github.com/0xAX/linux-internals).** Links -------------------------------------------------------------------------------- @@ -574,5 +584,3 @@ Links * [GCC designated inits](https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Designated-Inits.html) * [GCC type attributes](https://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html) * [Previous part](linux-bootstrap-2.md) - - From bc872eb2f7e33ca6fbc11a92b30db8c72662d0cc Mon Sep 17 00:00:00 2001 From: Waqar Ahmed Date: Mon, 3 Aug 2015 01:12:27 +0500 Subject: [PATCH 11/32] Fix conflicts --- Booting/linux-bootstrap-3.md | 18 ++++-------------- 1 file changed, 4 insertions(+), 14 deletions(-) diff --git a/Booting/linux-bootstrap-3.md b/Booting/linux-bootstrap-3.md index d052d41..9d40a68 100644 --- a/Booting/linux-bootstrap-3.md +++ b/Booting/linux-bootstrap-3.md @@ -1,8 +1,8 @@ Kernel booting process. Part 3. -================================================================================ +=============================== Video mode initialization and transition to protected mode --------------------------------------------------------------------------------- +---------------------------------------------------------- This is the third part of the `Kernel booting process` series. In the previous [part](linux-bootstrap-2.md#kernel-booting-process-part-2), we stopped right before the call of the `set_video` routine from the [main.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L181). In this part, we will see: - video mode initialization in the kernel setup code, @@ -13,7 +13,7 @@ This is the third part of the `Kernel booting process` series. In the previous [ As i wrote above, we will start from the `set_video` function which defined in the [arch/x86/boot/video.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/video.c#L315) source code file. 
We can see that it starts by first getting the video mode from the `boot_params.hdr` structure: -```C +```c u16 mode = boot_params.hdr.vid_mode; ``` @@ -126,15 +126,9 @@ Setup video mode Now we can move directly to video mode initialization. We stopped at the `RESET_HEAP()` call in the `set_video` function. Next is the call to `store_mode_params` which stores video mode parameters in the `boot_params.screen_info` structure which is defined in the [include/uapi/linux/screen_info.h](https://github.com/0xAX/linux/blob/master/include/uapi/linux/screen_info.h). If we will look at `store_mode_params` function, we can see that it starts with the call to `store_cursor_position` function. As you can understand from the function name, it gets information about cursor and stores it. -<<<<<<< HEAD First of all `store_cursor_position` initializes two variables which has type - `biosregs`, with `AH = 0x3` and calls `0x10` BIOS interruption. After interruption successfully executed, it returns row and column in the `DL` and `DH` registers. Row and column will be stored in the `orig_x` and `orig_y` fields from the the `boot_params.screen_info` structure. -======= - -First of all `store_cursor_position` initializes two variables which has type - `biosregs`, with `AH = 0x3` and calls `0x10` BIOS interruption. After interruption successfully executed, it returns row and column in the `DL` and `DH` registers. Row and column will be stored in the `orig_x` and `orig_y` fields from the the `boot_params.screen_info` structure. - ->>>>>>> e10c624957b2e01a8b2182ac9e6e5684c13dbde6 After `store_cursor_position` executed, `store_video_mode` function will be called. It just gets current video mode and stores it in the `boot_params.screen_info.orig_video_mode`. After this, it checks current video mode and sets the `video_segment`. 
After the BIOS transfers control to the boot sector, the following addresses are for video memory: @@ -282,11 +276,7 @@ After this we have set video mode and now we can switch to the protected mode. Last preparation before transition into protected mode -------------------------------------------------------------------------------- -<<<<<<< HEAD We can see the last function call - `go_to_protected_mode` in the [main.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L184). As the comment says: `Do the last things and invoke protected mode`, so let's see these last things and switch into the protected mode. -======= -We can see the last function call - `go_to_protected_mode` in the [main.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L184). As comment says: `Do the last things and invoke protected mode`, so let's see these last things and switch into the protected mode. ->>>>>>> e10c624957b2e01a8b2182ac9e6e5684c13dbde6 `go_to_protected_mode` defined in the [arch/x86/boot/pm.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/pm.c#L104). It contains some functions which make last preparations before we can jump into protected mode, so let's look on it and try to understand what they do and how it works. @@ -466,7 +456,7 @@ in binary. Let's try to understand what every bit means. We will go through all * 101 - segment type execute/read/ * 1 - accessed bit -You can read more about every bit in the previous [post](linux-bootstrap-2.md) or in the [Intel® 64 and IA-32 Architectures Software Developer's Manuals 3A](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html). +You can read more about every bit in the previous [post](linux-bootstrap-2.md) or in the [Intel® 64 and IA-32 Architectures Software Developer's Manuals 3A](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html). 
After this we get length of GDT with: From 5b033496def51ebc8dd6163781f322ca0b056750 Mon Sep 17 00:00:00 2001 From: Simon Funke Date: Sun, 2 Aug 2015 22:56:10 +0200 Subject: [PATCH 12/32] Fix spelling mistake --- Misc/linkers.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Misc/linkers.md b/Misc/linkers.md index 726f5f4..76adf41 100644 --- a/Misc/linkers.md +++ b/Misc/linkers.md @@ -189,7 +189,7 @@ $ readelf -d factorial | grep \(INIT\) 0x000000000000000c (INIT) 0x4003a8 ``` -So, the address of the `main` function is the `0000000000400506` and it is offset from the `.init` section. As we can see from the output, the address of the `factorial` function is `0x0000000000400537` and binary code for the call of the `factorial` function now is `e8 18 00 00 00`. We already knwo that `e8` is opcode for the `call` instruction, the next `18 00 00 00` (note that address represented as little endian for the `x86_64`, in other words it is `00 00 00 18`) is the offset from the `callq` to the `factorial` function: +So, the address of the `main` function is the `0000000000400506` and it is offset from the `.init` section. As we can see from the output, the address of the `factorial` function is `0x0000000000400537` and binary code for the call of the `factorial` function now is `e8 18 00 00 00`. We already know that `e8` is opcode for the `call` instruction, the next `18 00 00 00` (note that address represented as little endian for the `x86_64`, in other words it is `00 00 00 18`) is the offset from the `callq` to the `factorial` function: ```python >>> hex(0x40051a + 0x18 + 0x5) == hex(0x400537) From 650aa0f1fe9dc3995c303a0e0e37adf3102f7613 Mon Sep 17 00:00:00 2001 From: Nahakiole Date: Mon, 3 Aug 2015 06:55:17 +0200 Subject: [PATCH 13/32] Fixed some grammar and spelling mistakes. 
--- Misc/how_kernel_compiled.md | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/Misc/how_kernel_compiled.md b/Misc/how_kernel_compiled.md index ee9097b..8c08315 100644 --- a/Misc/how_kernel_compiled.md +++ b/Misc/how_kernel_compiled.md @@ -4,7 +4,9 @@ Process of the Linux kernel building Introduction -------------------------------------------------------------------------------- -I will not tell you how to build and install custom Linux kernel on your machine, you can find many many [resources](https://encrypted.google.com/search?q=building+linux+kernel#q=building+linux+kernel+from+source+code) that will help you to do it. Instead, we will know what does occur when you are typed `make` in the directory with Linux kernel source code in this part. When I just started to learn source code of the Linux kernel, the [Makefile](https://github.com/torvalds/linux/blob/master/Makefile) file was a first file that I've opened. And it was scary :) This [makefile](https://en.wikipedia.org/wiki/Make_%28software%29) contains `1591` lines of code at the time when I wrote this part and it was [third](https://github.com/torvalds/linux/commit/52721d9d3334c1cb1f76219a161084094ec634dc) release candidate. +I won't tell you how to build and install a custom Linux kernel on your machine. If you need help with this, you can find many [resources](https://encrypted.google.com/search?q=building+linux+kernel#q=building+linux+kernel+from+source+code) that will help you do it. Instead, we will learn what occurs when you type `make` in the directory of the Linux kernel source code. + +When I started to study the source code of the Linux kernel, the [makefile](https://github.com/torvalds/linux/blob/master/Makefile) was the first file that I opened. And it was scary :). 
The [makefile](https://en.wikipedia.org/wiki/Make_%28software%29) contained `1591` lines of code when I wrote this and this was the [4.2.0-rc3](https://github.com/torvalds/linux/commit/52721d9d3334c1cb1f76219a161084094ec634dc) release. This makefile is the the top makefile in the Linux kernel source code and kernel build starts here. Yes, it is big, but moreover, if you've read the source code of the Linux kernel you can noted that all directories with a source code has an own makefile. Of course it is not real to describe how each source files compiled and linked. So, we will see compilation only for the standard case. You will not find here building of the kernel's documentation, cleaning of the kernel source code, [tags](https://en.wikipedia.org/wiki/Ctags) generation, [cross-compilation](https://en.wikipedia.org/wiki/Cross_compiler) related stuff and etc. We will start from the `make` execution with the standard kernel configuration file and will finish with the building of the [bzImage](https://en.wikipedia.org/wiki/Vmlinux#bzImage). @@ -15,8 +17,8 @@ So let's start. Preparation before the kernel compilation --------------------------------------------------------------------------------- -There are many things to preparate before the kernel compilation will be started. The main point here is to find and configure -the type of compilation, to parse command line arguments that are passed to the `make` util and etc. So let's dive into the top `Makefile` of the Linux kernel. +There are many things to prepare before the kernel compilation will be started. The main point here is to find and configure +The type of compilation, to parse command line arguments that are passed to the `make` util and etc. So let's dive into the top `Makefile` of the Linux kernel. The Linux kernel top `Makefile` is responsible for building two major products: [vmlinux](https://en.wikipedia.org/wiki/Vmlinux) (the resident kernel image) and the modules (any module files). 
The [Makefile](https://github.com/torvalds/linux/blob/master/Makefile) of the Linux kernel starts from the definition of the following variables: @@ -85,7 +87,7 @@ We check the `KBUILD_SRC` that represent top directory of the source code of the * Store value of the `KBUILD_OUTPUT` in the temp `saved-output` variable; * Try to create given output directory; * Check that directory created, in other way print error; -* If custom output directory created sucessfully, execute `make` again with the new directory (see `-C` option). +* If custom output directory created successfully, execute `make` again with the new directory (see `-C` option). The next `ifeq` statements checks that `C` or `M` options was passed to the make: @@ -116,7 +118,7 @@ obj := $(objtree) export srctree objtree VPATH ``` -That tells to `Makefile` that source tree of the Linux kernel will be in the current directory where `make` command was executed. After this we set `objtree` and other variables to this directory and export these variables. The next step is the getting value for the `SUBARCH` variable that will represent tewhat the underlying archicecture is: +That tells to `Makefile` that source tree of the Linux kernel will be in the current directory where `make` command was executed. After this we set `objtree` and other variables to this directory and export these variables. The next step is the getting value for the `SUBARCH` variable that will represent what the underlying architecture is: ```Makefile SUBARCH := $(shell uname -m | sed -e s/i.86/x86/ -e s/x86_64/x86/ \ @@ -300,7 +302,7 @@ prepare1: prepare2 $(version_h) include/generated/utsrelease.h \ prepare2: prepare3 outputmakefile asm-generic ``` -The first `prepare0` expands to the `archprepare` that exapnds to the `archheaders` and `archscripts` that defined in the `x86_64` specific [Makefile](https://github.com/torvalds/linux/blob/master/arch/x86/Makefile). Let's look on it. 
The `x86_64` specific makefile starts from the definition of the variables that are related to the archicteture-specific configs ([defconfig](https://github.com/torvalds/linux/tree/master/arch/x86/configs) and etc.). After this it defines flags for the compiling of the [16-bit](https://en.wikipedia.org/wiki/Real_mode) code, calculating of the `BITS` variable that can be `32` for `i386` or `64` for the `x86_64` flags for the assembly source code, flags for the linker and many many more (all definitions you can find in the [arch/x86/Makefile](https://github.com/torvalds/linux/blob/master/arch/x86/Makefile)). The first target is `archheaders` in the makefile generates syscall table: +The first `prepare0` expands to the `archprepare` that expands to the `archheaders` and `archscripts` that defined in the `x86_64` specific [Makefile](https://github.com/torvalds/linux/blob/master/arch/x86/Makefile). Let's look on it. The `x86_64` specific makefile starts from the definition of the variables that are related to the architecture-specific configs ([defconfig](https://github.com/torvalds/linux/tree/master/arch/x86/configs) and etc.). After this it defines flags for the compiling of the [16-bit](https://en.wikipedia.org/wiki/Real_mode) code, calculating of the `BITS` variable that can be `32` for `i386` or `64` for the `x86_64` flags for the assembly source code, flags for the linker and many many more (all definitions you can find in the [arch/x86/Makefile](https://github.com/torvalds/linux/blob/master/arch/x86/Makefile)). The first target is `archheaders` in the makefile generates syscall table: ```Makefile archheaders: @@ -380,7 +382,7 @@ Note on the `build`. It defined in the [scripts/Kbuild.include](https://github.c build := -f $(srctree)/scripts/Makefile.build obj ``` -or in our case it is current source directory - `.`: +Or in our case it is current source directory - `.`: ```Makefile $(Q)$(MAKE) -f $(srctree)/scripts/Makefile.build obj=. 
From 7a3a4014c945d1d8ab4a7a164a461abb039c8f38 Mon Sep 17 00:00:00 2001 From: Nan Xiao Date: Mon, 3 Aug 2015 16:18:53 +0800 Subject: [PATCH 14/32] Update linux-initialization-2.md Fix some typos. --- Initialization/linux-initialization-2.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Initialization/linux-initialization-2.md b/Initialization/linux-initialization-2.md index 542deb7..568881e 100644 --- a/Initialization/linux-initialization-2.md +++ b/Initialization/linux-initialization-2.md @@ -20,11 +20,11 @@ Some theory Interrupt is an event caused by software or hardware to the CPU. On interrupt, CPU stops the current task and transfer control to the interrupt handler, which handles interruption and transfer control back to the previously stopped task. We can split interrupts on three types: -* Software interrupts - when a software signals CPU that it needs kernel attention. These interrupts generally used for system calls; -* Hardware interrupts - when a hardware, for example button pressed on a keyboard; +* Software interrupts - when a software signals CPU that it needs kernel attention. These interrupts are generally used for system calls; +* Hardware interrupts - when a hardware event happens, for example button is pressed on a keyboard; * Exceptions - interrupts generated by CPU, when the CPU detects error, for example division by zero or accessing a memory page which is not in RAM. -Every interrupt and exception is assigned an unique number which called - `vector number`. `Vector number` can be any number from `0` to `255`. There is common practice to use first `32` vector numbers for exceptions, and vector numbers from `31` to `255` are used for user-defined interrupts. We can see it in the code above - `NUM_EXCEPTION_VECTORS`, which defined as: +Every interrupt and exception is assigned a unique number which called - `vector number`. `Vector number` can be any number from `0` to `255`. 
There is common practice to use first `32` vector numbers for exceptions, and vector numbers from `32` to `255` are used for user-defined interrupts. We can see it in the code above - `NUM_EXCEPTION_VECTORS`, which defined as: ```C #define NUM_EXCEPTION_VECTORS 32 From 40e7cb75a4ef37d39fb7b548fb2e826987bb887d Mon Sep 17 00:00:00 2001 From: Johan Manuel Date: Mon, 3 Aug 2015 14:36:30 +0200 Subject: [PATCH 15/32] fix typos and sentences in linkers.md --- Misc/linkers.md | 70 ++++++++++++++++++++++++------------------------- 1 file changed, 35 insertions(+), 35 deletions(-) diff --git a/Misc/linkers.md b/Misc/linkers.md index 76adf41..ba615d1 100644 --- a/Misc/linkers.md +++ b/Misc/linkers.md @@ -3,7 +3,7 @@ Introduction During the writing of the [linux-insides](http://0xax.gitbooks.io/linux-insides/content/) book I have received many emails with questions related to the [linker](https://en.wikipedia.org/wiki/Linker_%28computing%29) script and linker-related subjects. So I've decided to write this to cover some aspects of the linker and the linking of object files. -If we open page the `Linker` page on wikipidia, we will see following definition: +If we open page the `Linker` page on wikipidia, we can see the following definition: >In computer science, a linker or link editor is a computer program that takes one or more object files generated by a compiler and combines them into a single executable file, library file, or another object file. @@ -34,7 +34,7 @@ int main(int argc, char **argv) { } ``` -The `lib.c` contains: +The `lib.c` file contains: ```C int factorial(int base) { @@ -53,7 +53,7 @@ int factorial(int base) { } ``` -And the `lib.h`: +And the `lib.h` file contains: ```C #ifndef LIB_H @@ -140,14 +140,14 @@ Relocation is the process of connecting symbolic references with symbolic defini 19: 89 c6 mov %eax,%esi ``` -Note `e8 00 00 00 00` on the first line. 
The `e8` is the [opcode](https://en.wikipedia.org/wiki/Opcode) of the `call` instruction with a relative offset. So the `e8 00 00 00 00` contains a one-byte operation code followed by a four-byte address. Note that the `00 00 00 00` is 4-bytes, but why only 4-bytes if an address can be 8-bytes in the `x86_64`. Actually we compiled the `main.c` source code file with the `-mcmodel=small`. From the `gcc` man: +Note `e8 00 00 00 00` on the first line. The `e8` is the [opcode](https://en.wikipedia.org/wiki/Opcode) of the `call` instruction with a relative offset. So the `e8 00 00 00 00` contains a one-byte operation code followed by a four-byte address. Note that the `00 00 00 00` is 4-bytes, but why only 4-bytes if an address can be 8-bytes in the `x86_64`? Actually we compiled the `main.c` source code file with the `-mcmodel=small`. From the `gcc` man: ``` -mcmodel=small Generate code for the small code model: the program and its symbols must be linked in the lower 2 GB of the address space. Pointers are 64 bits. Programs can be statically or dynamically linked. This is the default code model. ``` -Of course we didn't pass this option to the `gcc` when we compiled the `main.c`, but it is default. We know that our program will be linked in the lower 2 GB of the address space from the quoute from `gcc` manual. In this way 4-bytes enough for this. So we have opcode of the `call` instruction and unknown address. When we compile `main.c` with all dependencies to the executable file and will look on the call of the factorial we will see: +Of course we didn't pass this option to the `gcc` when we compiled the `main.c`, but it is default. We know that our program will be linked in the lower 2 GB of the address space from the quote from the `gcc` manual. With this code model, 4-bytes is enough to represent the address. So we have opcode of the `call` instruction and unknown address. 
When we compile `main.c` with all dependencies to the executable file and will look on the call of the factorial we will see: ``` $ gcc main.c lib.c -o factorial | objdump -S factorial | grep factorial @@ -168,7 +168,7 @@ factorial: file format elf64-x86-64 ... ``` -As we can see in the previous output, the address of the `main` function is `0x0000000000400506`. Why it does not starts from the `0x0`? You already can know that standard C program is linked with the `glibc` C standard library if the `-nostdlib` was not passed to the `gcc`. The compiled code for a program includes constructors functions to initialize data in the program when the program is started. These functions need to be called before the program is started or in another words before the `main` function is called. To make the initialization and termination functions work, the compiler must output something in the assembler code to cause those functions to be called at the appropriate time. Execution of this program will starts from the code that placed in the special section which is called `.init`. We can see it in the beginning of the objdump output: +As we can see in the previous output, the address of the `main` function is `0x0000000000400506`. Why it does not starts from the `0x0`? You may already know that standard C programs are linked with the `glibc` C standard library unless `-nostdlib` is passed to `gcc`. The compiled code for a program includes constructors functions to initialize data in the program when the program is started. These functions need to be called before the program is started or in another words before the `main` function is called. To make the initialization and termination functions work, the compiler must output something in the assembler code to cause those functions to be called at the appropriate time. Execution of this program will starts from the code that is placed in the special section which is called `.init`. 
We can see it in the beginning of the objdump output: ``` objdump -S factorial | less @@ -182,7 +182,7 @@ Disassembly of section .init: 4003ac: 48 8b 05 a5 05 20 00 mov 0x2005a5(%rip),%rax # 600958 <_DYNAMIC+0x1d0> ``` -Not that it starts at the `0x00000000004003a8` address relative to the `glibc` code. We can check it also in the resulted [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format): +Note that it starts at the `0x00000000004003a8` address relative to the `glibc` code. We can check it also in the resulted [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format): ``` $ readelf -d factorial | grep \(INIT\) @@ -196,7 +196,7 @@ So, the address of the `main` function is the `0000000000400506` and it is offse True ``` -So we add `0x18` and `0x5` to the address of the `call` instruction. The offset is measured from the address of the following instruction. Our call instruction is 5-bytes size - `e8 18 00 00 00` and the `0x18` is the offset from the next after call instruction to the `factorial` function. A compiler generally creates each object file with the program addresses starting at zero. But if a program is created from multiple object files, all of they will be overlapped. Just now we saw a process which called - `relocation`. This process assigns load addresses to the various parts of the program, adjusting the code and data in the program to reflect the assigned addresses. +So we add `0x18` and `0x5` to the address of the `call` instruction. The offset is measured from the address of the following instruction. Our call instruction is 5-bytes size - `e8 18 00 00 00` and the `0x18` is the offset from the next after call instruction to the `factorial` function. A compiler generally creates each object file with the program addresses starting at zero. But if a program is created from multiple object files, all of them will be overlapped. Just now we saw a process which is called `relocation`. 
This process assigns load addresses to the various parts of the program, adjusting the code and data in the program to reflect the assigned addresses. Ok, now we know a little about linkers and relocation. Time to link our object files and to know more about linkers. @@ -248,7 +248,7 @@ Here we can see two problems: * Linker can't find `_start` symbol; * Linker does not know anything about `printf` function. -First of all let's try to understand what is this `_start` entry symbol that appears to be required for our program to run? When I've started to learn programming I have learned that `main` function is the entry point of the program. I think you learned this too :) But actually it is not entry point, there is `_start` instead. The `_start` symbol defined in the `crt1.o` object file. We can find it with the: +First of all let's try to understand what is this `_start` entry symbol that appears to be required for our program to run? When I started to learn programming I learned that the `main` function is the entry point of the program. I think you learned this too :) But it actually isn't the entry point, it's `_start` instead. The `_start` symbol is defined in the `crt1.o` object file. We can find it with the following command: ``` $ objdump -S /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o @@ -266,7 +266,7 @@ Disassembly of section .text: ... ``` -and we pass this object file to the `ld` command as first argumet (see above). Now let's try to link it and will look on result: +We pass this object file to the `ld` command as its first argument (see above). Now let's try to link it and will look on result: ``` ld /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/crt1.o \ @@ -286,7 +286,7 @@ Unfortunately we will see even more errors. 
We can see here old error about unde * `__libc_csu_init` * `__libc_start_main` -The `_start` symbol defined in the [sysdeps/x86_64/start.S](https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/start.S;h=0d27a38e9c02835ce17d1c9287aa01be222e72eb;hb=HEAD) assembly file in the `glibc` source code. We can find following assembly code lines there: +The `_start` symbol is defined in the [sysdeps/x86_64/start.S](https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/start.S;h=0d27a38e9c02835ce17d1c9287aa01be222e72eb;hb=HEAD) assembly file in the `glibc` source code. We can find following assembly code lines there: ```assembly mov $__libc_csu_fini, %R8_LP @@ -295,12 +295,12 @@ mov $__libc_csu_init, %RCX_LP call __libc_start_main ``` -Here we pass address of the entry point to the `.init` and `.fini` section that contain code that starts to execute when program runned and the code that executes when program terminates. And in the end we see the call of the `main` function from our program. These three symbols defined in the [csu/elf-init.c](https://sourceware.org/git/?p=glibc.git;a=blob;f=csu/elf-init.c;hb=1d4bbc54bd4f7d85d774871341b49f4357af1fb7) source code file. The following two object files: +Here we pass address of the entry point to the `.init` and `.fini` section that contain code that starts to execute when the program is ran and the code that executes when program terminates. And in the end we see the call of the `main` function from our program. These three symbols are defined in the [csu/elf-init.c](https://sourceware.org/git/?p=glibc.git;a=blob;f=csu/elf-init.c;hb=1d4bbc54bd4f7d85d774871341b49f4357af1fb7) source code file. The following two object files: * `crtn.o`; * `crtn.i`. -Defines the function prologs/epilogs for the .init and .fini sections (with the `_init` and `_fini` symbols respectively). +define the function prologs/epilogs for the .init and .fini sections (with the `_init` and `_fini` symbols respectively). 
The `crtn.o` object file contains these `.init` and `.fini` sections: @@ -318,7 +318,7 @@ Disassembly of section .fini: 4: c3 retq ``` -And the `crti.o` contains `_init` and `_fini` symbols. Let's try to link again with these two object files: +And the `crti.o` object file contains the `_init` and `_fini` symbols. Let's try to link again with these two object files: ``` $ ld \ @@ -328,7 +328,7 @@ $ ld \ -o factorial ``` -And anyway we will get the same errors. Now we need to pass `-lc` option to the `ld`. This option will search the standard library in the paths that are pointed in the `$LD_LIBRARY_PATH` enviroment variable. Let's try to link again wit the `-lc` option: +And anyway we will get the same errors. Now we need to pass `-lc` option to the `ld`. This option will search for the standard library in the paths present in the `$LD_LIBRARY_PATH` enviroment variable. Let's try to link again wit the `-lc` option: ``` $ ld \ @@ -338,7 +338,7 @@ $ ld \ -o factorial ``` -Finally we will get executable file, but if we will try to run it, we will get strange result: +Finally we get an executable file, but if we try to run it, we will get strange results: ``` $ ./factorial @@ -392,7 +392,7 @@ Note on the strange line: [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2] ``` -The `.interp` section in the `elf` file holds the path name of a program interpreter or in another words the `.interp` section simply contains an `ascii` string that is the name of the dynamic linker. The dynamic linker is the part of an Linux that loads and links shared libraries needed by an executable when it is executed, by copying the content of libraries from disk to RAM. As we can see in the output of the `readelf` command it placed in the `/lib64/ld-linux-x86-64.so.2` for the `x86_64`. 
Now let's add pass `-dynamic-linker` option with the path of the `ld-linux-x86-64.so.2` to the `ld` and will see on the result: +The `.interp` section in the `elf` file holds the path name of a program interpreter or in another words the `.interp` section simply contains an `ascii` string that is the name of the dynamic linker. The dynamic linker is the part of Linux that loads and links shared libraries needed by an executable when it is executed, by copying the content of libraries from disk to RAM. As we can see in the output of the `readelf` command it is placed in the `/lib64/ld-linux-x86-64.so.2` file for the `x86_64` architecture. Now let's add the `-dynamic-linker` option with the path of `ld-linux-x86-64.so.2` to the `ld` call and will see the following results: ``` $ gcc -c main.c lib.c @@ -413,7 +413,7 @@ $ ./factorial factorial of 5 is: 120 ``` -It works! With the first line we compile the `main.c` and the `lib.c` source code files to the object files. We will get the `main.o` and the `lib.o` after execution of the `gcc`: +It works! With the first line we compile the `main.c` and the `lib.c` source code files to object files. We will get the `main.o` and the `lib.o` after execution of the `gcc`: ``` $ file lib.o main.o @@ -421,12 +421,12 @@ lib.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped main.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped ``` -and after this we link object files of the our program with the needed system object files and libraries. We just saw simple example how to compile and link C program with the `gcc` compiler and `GNU ld` linker. In this example we have used a couple command line options of the `GNU linker`, but it supports much more command line options than `-o`, `-dynamic-linker` and etc. Moreover `GNU ld` has own language that allows to control of the linking process. In the next two paragraps we will look it. 
+and after this we link the object files of our program with the needed system object files and libraries. We just saw a simple example of how to compile and link a C program with the `gcc` compiler and the `GNU ld` linker. In this example we have used a couple of command line options of the `GNU linker`, but it supports many more command line options than `-o`, `-dynamic-linker`, etc. Moreover `GNU ld` has its own language that allows controlling the linking process. In the next two paragraphs we will look into it.

 Useful command line options of the GNU linker
 ----------------------------------------------

-As I already wrote and as you can see in the manual of the `GNU linker`, it has big set of the command line options. We've seen a couple of options in this post: `-o ` - that tells `ld` to produce an output file called `output` as the result of linking, `-l` that adds the archive or object file specified by the name, `-dynamic-linker` that specifies the name of the dynamic linker. Of course the `ld` supports much more command line options, let's look on some of it.
+As I already wrote and as you can see in the manual of the `GNU linker`, it has a big set of command line options. We've seen a couple of options in this post: `-o ` - that tells `ld` to produce an output file called `output` as the result of linking, `-l` that adds the archive or object file specified by the name, `-dynamic-linker` that specifies the name of the dynamic linker. Of course `ld` supports many more command line options, let's look at some of them.

 The first useful command line option is `@file`. In this case the `file` specifies filename where command line options will be read. For example we can create file with the name `linker.ld`, put there our command line arguments from the previous example and execute it with:

@@ -434,9 +434,9 @@ The first useful command line option is `@file`. In this case the `file` specifi
 $ ld @linker.ld
 ```

-The next command line option is `-b` or `--format`.
This command line option specifies format of the input object files `ELF`, `DJGPP/COFF` and etc. There is command line option for the same purpose but for the output file: `--oformat=output-format`. +The next command line option is `-b` or `--format`. This command line option specifies format of the input object files `ELF`, `DJGPP/COFF` and etc. There is a command line option for the same purpose but for the output file: `--oformat=output-format`. -The next command line option is `--defsym`. Full format of this command line option is the `--defsym=symbol=expression`. It allows to create global symbol in the output file containing the absolute address given by expression. We can find following case when this command line option can be useful. For example let's look in the Linux kernel source code and more precisely in the Makefile that related to the kernel decompression for ARM architecture - [arch/arm/boot/compressed/Makefile](https://github.com/torvalds/linux/blob/master/arch/arm/boot/compressed/Makefile). We can find following definition there: +The next command line option is `--defsym`. Full format of this command line option is the `--defsym=symbol=expression`. It allows to create global symbol in the output file containing the absolute address given by expression. We can find following case where this command line option can be useful: in the Linux kernel source code and more precisely in the Makefile that is related to the kernel decompression for the ARM architecture - [arch/arm/boot/compressed/Makefile](https://github.com/torvalds/linux/blob/master/arch/arm/boot/compressed/Makefile), we can find following definition: ``` LDFLAGS_vmlinux = --defsym _kernel_bss_size=$(KBSS_SZ) @@ -471,12 +471,12 @@ $ ld -M @linker.ld 0x000000000040041b factorial ``` -Of course the `GNU linker` support standard command line options: `--help` and `--version` that print common help of the usage of the `ld` and its version. 
That's all about command line options of the `GNU linker`. Of course it is not full set of the command line options support by the `ld` util. Full description you can find in the manual of this util.
+Of course the `GNU linker` supports the standard command line options `--help` and `--version`, which print usage help for `ld` and its version. That's all about command line options of the `GNU linker`. Of course it is not the full set of command line options supported by the `ld` util. You can find the complete documentation of the `ld` util in the manual.

 Control Language linker
 ----------------------------------------------

-As I wrote previously, the `ld` has support of the own language. It accepts Linker Command Language files written in a superset of AT&T's Link Editor Command Language syntax, to provide explicit and total control over the linking process. Let's look on its details.
+As I wrote previously, `ld` has support for its own language. It accepts Linker Command Language files written in a superset of AT&T's Link Editor Command Language syntax, to provide explicit and total control over the linking process. Let's look at its details.

 With the linker language we can control:

@@ -484,9 +484,9 @@ With the linker language we can control:
 * output files;
 * file formats
 * addresses of sections;
-* and etc.
+* etc...

-Usually commands written on linker control language placed in a file that called - linker script. We can pass it to the `ld` with the `-T` command line option. The main command in the each linker script is the `SECTIONS`. Each linker script must contain this command and it determines the `map` of the output file. The special variable - `.` contains current position of the output. Let's write simple assembly program and will look how we can use linker script to control linking of this program. For example it will be hello world:
+Commands written in the linker control language are usually placed in a file called a linker script.
We can pass it to `ld` with the `-T` command line option. The main command in a linker script is the `SECTIONS` command. Each linker script must contain this command and it determines the `map` of the output file. The special variable `.` contains the current position of the output. Let's write a simple assembly program and look at how we can use a linker script to control linking of this program. We will take a hello world program for this example:

 ```assembly
 section .data
@@ -504,14 +504,14 @@ _start:
 syscall
 ```

-We can compile and link it with the:
+We can compile and link it with the following commands:

 ```
 $ nasm -f elf64 -o hello.o hello.asm
 $ ld -o hello hello.o
 ```

-Our program consists from tw sections: `.text` - contains code of the program and `.data` - contains initialized variables. Let's write simple linker script and try to link our `hello.asm` assembly file with it. Our script is:
+Our program consists of two sections: `.text` contains the code of the program and `.data` contains initialized variables. Let's write a simple linker script and try to link our `hello.asm` assembly file with it. Our script is:

 ```
 /*
@@ -535,7 +535,7 @@ SECTIONS
 }
 ```

-On the first three lines you can see comment that written with `C` style. After it the `OUTPUT` and the `OUTPUT_FORMAT` command specifies name of the our executable file and its format. The next command - is `INPUT` specfies input file to the `ld` linker. After all of this command we can see main `SECTIONS` command, as I already wrote each linker script must contain definition of this command. The `SECTIONS` command represents set and order of the sections which are will be in the output file. At the beginning of the `SECTIONS` command we can see following line `. = 0x200000`. I already wrote above that `.` command points to the current position of the output. This line says that the code should be loaded at address `0x200000` and the line `.
= 0x400000` says that data section should be loaded at address `0x400000`. The second line after the `. = 0x200000` defines `.text` section as an output section. We can see `*(.text)` expression inside it. The `*` symbol is wildcard that matches any file name. In another words the `*(.text)` expression says all `.text` input sections in all input files. We can rewrite it as `hello.o(.text)` for our example. After the following location counter `. = 0x400000`, we can see definition of the data section.
+On the first three lines you can see a comment written in `C` style. After it the `OUTPUT` and the `OUTPUT_FORMAT` commands specify the name of our executable file and its format. The next command, `INPUT`, specifies the input file to the `ld` linker. Then, we can see the main `SECTIONS` command, which, as I already wrote, must be present in every linker script. The `SECTIONS` command represents the set and order of the sections which will be in the output file. At the beginning of the `SECTIONS` command we can see the following line: `. = 0x200000`. I already wrote above that the `.` variable points to the current position of the output. This line says that the code should be loaded at address `0x200000` and the line `. = 0x400000` says that the data section should be loaded at address `0x400000`. The second line after the `. = 0x200000` defines `.text` as an output section. We can see the `*(.text)` expression inside it. The `*` symbol is a wildcard that matches any file name. In other words, the `*(.text)` expression matches all `.text` input sections in all input files. We can rewrite it as `hello.o(.text)` for our example. After the following location counter `. = 0x400000`, we can see the definition of the data section.

 We can compile and link it with the:

@@ -544,7 +544,7 @@ $ nasm -f elf64 -o hello.o hello.S && ld -T linker.script && ./hello
 hello, world!
 ```

-If we will look inside with the `objdump` util, we will see that `.text` section starts from the `0x200000` and the `.data` sections starts from the `0x400000` address:
+If we look inside it with the `objdump` util, we can see that the `.text` section starts from the address `0x200000` and the `.data` section starts from the address `0x400000`:

 ```
 $ objdump -D hello
@@ -562,13 +562,13 @@ Disassembly of section .data:
 ...
 ```

-Except of those comands that we have already seen, there are a few other linker scripts commands. The first is the `ASSERT(exp, message)` that ensures that given expression is not zero. If it is zero, then exit the linker with an error code and print given error message. If you've read about Linux kernel booting process in the [linux-insides](http://0xax.gitbooks.io/linux-insides/content/) book, you may know that setup header of the Linux kernel has offset - `0x1f1`. In the linker script of the Linux kernel we can find check for this:
+Apart from the commands we have already seen, there are a few others. The first is `ASSERT(exp, message)`, which ensures that the given expression is not zero. If it is zero, the linker exits with an error code and prints the given error message. If you've read about the Linux kernel booting process in the [linux-insides](http://0xax.gitbooks.io/linux-insides/content/) book, you may know that the setup header of the Linux kernel has offset `0x1f1`. In the linker script of the Linux kernel we can find a check for this:

 ```
 . = ASSERT(hdr == 0x1f1, "The setup header has the wrong offset!");
 ```

-The next `INCLUDE filename` command allows to include external linker script symbols to the current. In a linker script we can assign a value to a symbol. The `ld` support a couple of assignment operators:
+The `INCLUDE filename` command allows including an external linker script in the current one. In a linker script we can assign a value to a symbol.
`ld` supports a couple of assignment operators:

 * symbol = expression ;
 * symbol += expression ;
@@ -615,11 +615,11 @@ That's all.
 Conclusion
 -----------------

-This is the end of the post about linkers. We knew many things about linkers in this post, such things like what is it linker and why we need in it, how to use it and etc..
+This is the end of the post about linkers. We learned many things about linkers in this post, such as what a linker is, why it is needed, how to use it, etc.

-If you will have any questions or suggestions write me [email](kuleshovmail@gmail.com) or ping [me](https://twitter.com/0xAX) in twitter.
+If you have any questions or suggestions, write me an [email](kuleshovmail@gmail.com) or ping [me](https://twitter.com/0xAX) on Twitter.

-Please note that English is not my first language, And I am really sorry for any inconvenience. If you will find any mistakes please send let me know via emal or send a PR.
+Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes please let me know via email or send a PR.

 Links
 -----------------

From 2538edd5d9126ae42360d911a7ce8c3af392bd46 Mon Sep 17 00:00:00 2001
From: Waqar Ahmed
Date: Tue, 4 Aug 2015 19:24:50 +0500
Subject: [PATCH 16/32] Fix sentence structures in linux-bootstrap-3

Updated.
---
 Booting/linux-bootstrap-3.md | 62 ++++++++++++++++++------------------
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/Booting/linux-bootstrap-3.md b/Booting/linux-bootstrap-3.md
index 9d40a68..d60a7f6 100644
--- a/Booting/linux-bootstrap-3.md
+++ b/Booting/linux-bootstrap-3.md
@@ -1,5 +1,5 @@
 Kernel booting process. Part 3.
-===============================
+==========================================================

 Video mode initialization and transition to protected mode
 ----------------------------------------------------------
@@ -9,11 +9,11 @@ This is the third part of the `Kernel booting process` series. In the previous [
In the previous [ - preparation before switching into the protected mode, - transition to protected mode -**NOTE: If you don't know anything about protected mode, you can find some information about it in the previous [part](linux-bootstrap-2.md#protected-mode). Also there are a couple of [links](linux-bootstrap-2.md#links) which can help you.** +**NOTE** If you don't know anything about protected mode, you can find some information about it in the previous [part](linux-bootstrap-2.md#protected-mode). Also there are a couple of [links](linux-bootstrap-2.md#links) which can help you. -As i wrote above, we will start from the `set_video` function which defined in the [arch/x86/boot/video.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/video.c#L315) source code file. We can see that it starts by first getting the video mode from the `boot_params.hdr` structure: +As I wrote above, we will start from the `set_video` function which defined in the [arch/x86/boot/video.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/video.c#L315) source code file. We can see that it starts by first getting the video mode from the `boot_params.hdr` structure: -```c +```C u16 mode = boot_params.hdr.vid_mode; ``` @@ -60,21 +60,21 @@ Heap API After we have `vid_mode` from the `boot_params.hdr` in the `set_video` function we can see call to `RESET_HEAP` function. `RESET_HEAP` is a macro which defined in the [boot.h](https://github.com/torvalds/linux/blob/master/arch/x86/boot/boot.h#L199). It is defined as: -```c +```C #define RESET_HEAP() ((void *)( HEAP = _end )) ``` -If you have read the second part, you will remember that we initialized the heap with the [`init_heap`](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L116) function. We have a couple of utility functions for heap which are defined in the `boot.h`. 
They are: +If you have read the second part, you will remember that we initialized the heap with the [`init_heap`](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L116) function. We have a couple of utility functions for heap which are defined in `boot.h`. They are: -```c -RESET_HEAP() +```C +#define RESET_HEAP() ``` As we saw just above it resets the heap by setting the `HEAP` variable equal to `_end`, where `_end` is just `extern char _end[];` Next is `GET_HEAP` macro: -```c +```C #define GET_HEAP(type, n) \ ((type *)__get_heap(sizeof(type),__alignof__(type),(n))) ``` @@ -87,7 +87,7 @@ for heap allocation. It calls internal function `__get_heap` with 3 parameters: Implementation of `__get_heap` is: -```c +```C static inline char *__get_heap(size_t s, size_t a, size_t n) { char *tmp; @@ -101,7 +101,7 @@ static inline char *__get_heap(size_t s, size_t a, size_t n) and further we will see its usage, something like: -```c +```C saved.data = GET_HEAP(u16, saved.x * saved.y); ``` @@ -140,7 +140,7 @@ After this, it checks current video mode and sets the `video_segment`. After the So we set the `video_segment` variable to `0xB000` if current video mode is MDA, HGC, VGA in monochrome mode or `0xB800` in color mode. After setup of the address of the video segment font size needs to be stored in the `boot_params.screen_info.orig_video_points` with: -```c +```C set_fs(0); font_size = rdfs16(0x485); boot_params.screen_info.orig_video_points = font_size; @@ -157,7 +157,7 @@ Next we get amount of columns by `0x44a` and rows by address `0x484` and store t Next we can see `save_screen` function which just saves screen content to the heap. This function collects all data which we got in the previous functions like rows and columns amount etc. 
and stores it in the `saved_screen` structure, which is defined as: -```c +```C static struct saved_screen { int x, y; int curx, cury; @@ -167,7 +167,7 @@ static struct saved_screen { It then checks whether the heap has free space for it with: -```c +```C if (!heap_free(saved.x*saved.y*sizeof(u16)+512)) return; ``` @@ -176,7 +176,7 @@ and allocates space in the heap if it is enough and stores `saved_screen` in it. The next call is `probe_cards(0)` from the [arch/x86/boot/video-mode.c](https://github.com/0xAX/linux/blob/master/arch/x86/boot/video-mode.c#L33). It goes over all video_cards and collects number of modes provided by the cards. Here is the interesting moment, we can see the loop: -```c +```C for (card = video_cards; card < video_cards_end; card++) { /* collecting number of modes here */ } @@ -184,7 +184,7 @@ for (card = video_cards; card < video_cards_end; card++) { but `video_cards` not declared anywhere. Answer is simple: Every video mode presented in the x86 kernel setup code has definition like this: -```c +```C static __videocard video_vga = { .card_name = "VGA", .probe = vga_probe, @@ -194,13 +194,13 @@ static __videocard video_vga = { where `__videocard` is a macro: -```c +```C #define __videocard struct card_info __attribute__((used,section(".videocards"))) ``` which means that `card_info` structure: -```c +```C struct card_info { const char *card_name; int (*set_mode)(struct mode_info *mode); @@ -231,7 +231,7 @@ The `set_mode` function is defined in the [video-mode.c](https://github.com/0xAX `set_mode` function checks the `mode` and calls `raw_set_mode` function. The `raw_set_mode` calls `set_mode` function for selected card i.e. `card->set_mode(struct mode_info*)`. We can get access to this function from the `card_info` structure, every video mode defines this structure with values filled depending upon the video mode (for example for `vga` it is `video_vga.set_mode` function, see above example of `card_info` structure for `vga`). 
`video_vga.set_mode` is `vga_set_mode`, which checks the vga mode and calls the respective function: -```c +```C static int vga_set_mode(struct mode_info *mode) { vga_set_basic_mode(); @@ -296,7 +296,7 @@ Interrupt is a signal to the CPU which is emitted by hardware or software. After Let's get back to the code. We can see that second line is writing `0x80` (disabled bit) byte to the `0x70` (CMOS Address register). After that call to the `io_delay` function occurs. `io_delay` causes a small delay and looks like: -```c +```C static inline void io_delay(void) { const u16 DELAY_PORT = 0x80; @@ -308,7 +308,7 @@ Outputting any byte to the port `0x80` should delay exactly 1 microsecond. So we The next function is `enable_a20`, which enables [A20 line](http://en.wikipedia.org/wiki/A20_line). This function is defined in the [arch/x86/boot/a20.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/a20.c) and it tries to enable A20 gate with different methods. The first is `a20_test_short` function which checks is A20 already enabled or not with `a20_test` function: -```c +```C static int a20_test(int loops) { int ok = 0; @@ -346,14 +346,14 @@ die: ``` After the A20 gate is successfully enabled, `reset_coprocessor` function is called: - ```c + ```C outb(0, 0xf0); outb(0, 0xf1); ``` This function clears the Math Coprocessor by writing `0` to `0xf0` and then resets it by writing `0` to `0xf1`. After this `mask_all_interrupts` function is called: -```c +```C outb(0xff, 0xa1); /* Mask all interrupts on the secondary PIC */ outb(0xfb, 0x21); /* Mask all but cascade on the primary PIC */ ``` @@ -366,7 +366,7 @@ Setup Interrupt Descriptor Table Now we setup the Interrupt Descriptor table (IDT). `setup_idt`: -```c +```C static void setup_idt(void) { static const struct gdt_ptr null_idt = {0, 0}; @@ -375,7 +375,7 @@ static void setup_idt(void) ``` which setups the Interrupt Descriptor Table (describes interrupt handlers and etc.). 
For now IDT is not installed (we will see it later), but now we just load IDT with `lidtl` instruction. `null_idt` contains address and size of IDT, but now they are just zero. `null_idt` is a `gdt_ptr` structure, it as defined as: -```c +```C struct gdt_ptr { u16 len; u32 ptr; @@ -389,7 +389,7 @@ Setup Global Descriptor Table Next is the setup of Global Descriptor Table (GDT). We can see `setup_gdt` function which sets up GDT (you can read about it in the [Kernel booting process. Part 2.](linux-bootstrap-2.md#protected-mode)). There is definition of the `boot_gdt` array in this function, which contains definition of the three segments: -```c +```C static const u64 boot_gdt[] __attribute__((aligned(16))) = { [GDT_ENTRY_BOOT_CS] = GDT_ENTRY(0xc09b, 0, 0xfffff), [GDT_ENTRY_BOOT_DS] = GDT_ENTRY(0xc093, 0, 0xfffff), @@ -398,7 +398,7 @@ Next is the setup of Global Descriptor Table (GDT). We can see `setup_gdt` funct ``` For code, data and TSS (Task State Segment). We will not use task state segment for now, it was added there to make Intel VT happy as we can see in the comment line (if you're interesting you can find commit which describes it - [here](https://github.com/torvalds/linux/commit/88089519f302f1296b4739be45699f06f728ec31)). Let's look on `boot_gdt`. First of all note that it has `__attribute__((aligned(16)))` attribute. It means that this structure will be aligned by 16 bytes. Let's look at a simple example: -```c +```C #include struct aligned { @@ -460,7 +460,7 @@ You can read more about every bit in the previous [post](linux-bootstrap-2.md) o After this we get length of GDT with: -```c +```C gdt.len = sizeof(boot_gdt)-1; ``` @@ -468,7 +468,7 @@ We get size of `boot_gdt` and subtract 1 (the last valid address in the GDT). 
Next we get pointer to the GDT with: -```c +```C gdt.ptr = (u32)&boot_gdt + (ds() << 4); ``` @@ -476,7 +476,7 @@ Here we just get address of `boot_gdt` and add it to address of data segment lef Lastly we execute `lgdtl` instruction to load GDT into GDTR register: -```c +```C asm volatile("lgdtl %0" : : "m" (gdt)); ``` @@ -485,7 +485,7 @@ Actual transition into protected mode It is the end of `go_to_protected_mode` function. We loaded IDT, GDT, disable interruptions and now can switch CPU into protected mode. The last step we call `protected_mode_jump` function with two parameters: -```c +```C protected_mode_jump(boot_params.hdr.code32_start, (u32)&boot_params + (ds() << 4)); ``` From 8a166ac4242c1afe12a27b0048423c2bc4c55395 Mon Sep 17 00:00:00 2001 From: Waqar Ahmed Date: Tue, 4 Aug 2015 23:37:26 +0500 Subject: [PATCH 17/32] Fix sentence structures in linux-bootstrap-3 Update heading underline to 80 chars --- Booting/linux-bootstrap-3.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Booting/linux-bootstrap-3.md b/Booting/linux-bootstrap-3.md index d60a7f6..df31090 100644 --- a/Booting/linux-bootstrap-3.md +++ b/Booting/linux-bootstrap-3.md @@ -1,8 +1,8 @@ Kernel booting process. Part 3. -========================================================== +================================================================================ Video mode initialization and transition to protected mode ----------------------------------------------------------- +-------------------------------------------------------------------------------- This is the third part of the `Kernel booting process` series. In the previous [part](linux-bootstrap-2.md#kernel-booting-process-part-2), we stopped right before the call of the `set_video` routine from the [main.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c#L181). 
In this part, we will see: - video mode initialization in the kernel setup code, From 40c9df395c186bc8aa798fccf9e0e33e89112674 Mon Sep 17 00:00:00 2001 From: faj25 Date: Sat, 8 Aug 2015 02:36:41 -0400 Subject: [PATCH 18/32] minor grammar fix --- Theory/ELF.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/Theory/ELF.md b/Theory/ELF.md index 97db281..59bad96 100644 --- a/Theory/ELF.md +++ b/Theory/ELF.md @@ -1,7 +1,7 @@ Executable and Linkable Format ================================================================================ -ELF (Executable and Linkable Format) is a standard file format for executable files and shared libraries. Linux as many UNIX-like operating systems uses this format. Let's look on structure of the ELF-64 Object File Format and some defintions in the linux kernel source code related with it. +ELF (Executable and Linkable Format) is a standard file format for executable files and shared libraries. Linux, as well as, many UNIX-like operating systems uses this format. Let's look on structure of the ELF-64 Object File Format and some defintions in the linux kernel source code related with it. An ELF object file consists of the following parts: @@ -16,7 +16,7 @@ Now let's look closer on these components. It's located in the beginning of the object file. It's main point is to locate all other parts of the object file. File header contains following fields: * ELF identification - array of bytes which helps to identify the file as an ELF object file and also provides information about general object file characteristic; -* Object file type - identifies the object file type. This field can describe that ELF file is relocatable object file, executable file, etc...; +* Object file type - identifies the object file type. 
This field can describe that ELF file is a relocatable object file, executable file, etc...; * Target architecture; * Version of the object file format; * Virtual address of the program entry point; @@ -51,7 +51,7 @@ This structure defined in the [elf.h](https://github.com/torvalds/linux/blob/mas **Sections** -All data stores in a sections in an Elf object file. Sections identified by index in the section header table. Section header contains following fields: +All data is stored in sections in an Elf object file. Sections identified by index in the section header table. Section header contains following fields: * Section name; * Section type; @@ -102,12 +102,12 @@ in the linux kernel source code. `elf64_phdr` defined in the same [elf.h](https://github.com/torvalds/linux/blob/master/include/uapi/linux/elf.h). -And ELF object file also contains other fields/structures which you can find in the [Documentation](http://www.uclibc.org/docs/elf-64-gen.pdf). Better let's look on the `vmlinux`. +And ELF object file also contains other fields/structures which you can find in the [Documentation](http://www.uclibc.org/docs/elf-64-gen.pdf). Now let's look on the `vmlinux`. vmlinux -------------------------------------------------------------------------------- -`vmlinux` is relocatable ELF object file too. So we can look on it with the `readelf` util. First of all let's look on a header: +`vmlinux` is relocatable ELF object file too. So we can look at it with the `readelf` util. 
First of all let's look on a header: ``` $ readelf -h vmlinux From 57deafd20a9fd220ff2cb39a75827e50b6cd9d72 Mon Sep 17 00:00:00 2001 From: 0xAX <0xAX@users.noreply.github.com> Date: Sun, 9 Aug 2015 20:05:48 +0600 Subject: [PATCH 19/32] Create interrupts-9.md --- interrupts/interrupts-9.md | 515 +++++++++++++++++++++++++++++++++++++ 1 file changed, 515 insertions(+) create mode 100644 interrupts/interrupts-9.md diff --git a/interrupts/interrupts-9.md b/interrupts/interrupts-9.md new file mode 100644 index 0000000..be59c23 --- /dev/null +++ b/interrupts/interrupts-9.md @@ -0,0 +1,515 @@ +Interrupts and Interrupt Handling. Part 9. +================================================================================ + +Introduction to deferred interrupts (Softirq, Tasklets and irqs) +-------------------------------------------------------------------------------- + +It is the ninth part of the [linux-insides](https://www.gitbook.com/book/0xax/linux-insides/details) book and in the previous [Previous part](http://0xax.gitbooks.io/linux-insides/content/interrupts/interrupts-8.html) we saw implementation of the `init_IRQ` from that defined in the [arch/x86/kernel/irqinit.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/irqinit.c) source code file. So, we will continue to dive into the initialization stuff which is related to the external hardware interrupts in this part. + +After the `init_IRQ` function we can see the call of the `softirq_init` function in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c). This function defined in the [kernel/softirq.c](https://github.com/torvalds/linux/blob/master/kernel/softirq.c) source code file and as we can understand from its name, this function makes initialization of the `softirq` or in other words initialization of the `deferred interrupts`. What is it deferreed intrrupt? 
We already saw a little bit about it in the ninth [part](http://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-9.html) of the chapter that describes the initialization process of the Linux kernel. There are three types of `deferred interrupts` in the Linux kernel: + +* `softirqs`; +* `tasklets`; +* `workqueues`; + +And we will see a description of all of these types in this part. As I said, we saw only a little bit about this theme, so now it is time to dive deep into the details. + +Deferred interrupts +---------------------------------------------------------------------------------- + +Interrupts may have different important characteristics and there are two among them: + +* Handler of an interrupt must execute quickly; +* Sometimes an interrupt handler must do a large amount of work. + +As you can understand, it is almost impossible to satisfy both characteristics at once. Because of this, the handling of interrupts was previously split into two parts: + +* Top half; +* Bottom half; + +In the past there was one way to organize this postprocessing in the Linux kernel, which was called `the bottom half`, but it is no longer in use. Now this term has remained as a common noun referring to all the different ways of organizing deferred processing of an interrupt. With the advent of parallelism in the Linux kernel, all new schemes of implementation of the bottom half handlers are built on top of the processor specific kernel thread that is called `ksoftirqd` (will be discussed below). The `softirq` mechanism represents handling of interrupts that are `almost` as important as the handling of the hardware interrupts. The deferred processing of an interrupt means that some of the actions for an interrupt may be postponed to a later execution when the system will be less loaded.
As you can guess, an interrupt handler may need to do a large amount of work, which is impermissible because it executes in the context where interrupts are disabled. That's why processing of an interrupt can be split into two different parts. In the first part, the main handler of an interrupt does only the minimal and most important job. After this it schedules the second part and finishes its work. When the system is less busy and the context of the processor allows it to handle interrupts, the second part starts its work and finishes processing the remaining part of a deferred interrupt. That is the main explanation of the deferred interrupt handling. + +As I already wrote above, handling of deferred interrupts (or `softirq` in other words) and accordingly `tasklets` is provided by a set of special kernel threads (one thread per processor). Each processor has its own thread and it is called `ksoftirqd/n` where the `n` is the number of the processor. We can see it in the output of the `systemd-cgls` util: + +``` +$ systemd-cgls -k | grep ksoft +├─ 3 [ksoftirqd/0] +├─ 13 [ksoftirqd/1] +├─ 18 [ksoftirqd/2] +├─ 23 [ksoftirqd/3] +├─ 28 [ksoftirqd/4] +├─ 33 [ksoftirqd/5] +├─ 38 [ksoftirqd/6] +├─ 43 [ksoftirqd/7] +``` + +The `spawn_ksoftirqd` function starts these threads. As we can see this function is called as an early [initcall](http://www.compsoc.man.ac.uk/~moz/kernelnewbies/documents/initcall/index.html): + +```C +early_initcall(spawn_ksoftirqd); +``` + +Deferred interrupts are determined statically at compile time of the Linux kernel and the `open_softirq` function takes care of `softirq` initialization.
The `open_softirq` function defined in the [kernel/softirq.c](https://github.com/torvalds/linux/blob/master/kernel/softirq.c): + + +```C +void open_softirq(int nr, void (*action)(struct softirq_action *)) +{ + softirq_vec[nr].action = action; +} +``` + +and as we can see this function uses two parameters: + +* the index of the `softirq_vec` array; +* a pointer to the softirq function to be executed; + +First of all let's look on the `softirq_vec` array: + +```C +static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp; +``` + +it defined in the same source code file. As we can see, the `softirq_vec` array may contain `NR_SOFTIRQS` or `10` types of `softirqs` that has type - `softirq_action`. First of all about its elements. In the actual version of the Linux kernel there are ten softirq vectors defined; two for tasklet processing, two for networking, two for the block layer, two for timers, and one each for the scheduler and read-copy-update processing. All of these kinds represented by the following enum: + +```C +enum +{ + HI_SOFTIRQ=0, + TIMER_SOFTIRQ, + NET_TX_SOFTIRQ, + NET_RX_SOFTIRQ, + BLOCK_SOFTIRQ, + BLOCK_IOPOLL_SOFTIRQ, + TASKLET_SOFTIRQ, + SCHED_SOFTIRQ, + HRTIMER_SOFTIRQ, + RCU_SOFTIRQ, + NR_SOFTIRQS +}; +``` + +All names of these kinds of softirqs represented by the following array: + +```C +const char * const softirq_to_name[NR_SOFTIRQS] = { + "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL", + "TASKLET", "SCHED", "HRTIMER", "RCU" +}; +``` + +Or we can see it in the output of the `/proc/softirqs`: + +``` +~$ cat /proc/softirqs + CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 + HI: 5 0 0 0 0 0 0 0 + TIMER: 332519 310498 289555 272913 282535 279467 282895 270979 + NET_TX: 2320 0 0 2 1 1 0 0 + NET_RX: 270221 225 338 281 311 262 430 265 + BLOCK: 134282 32 40 10 12 7 8 8 +BLOCK_IOPOLL: 0 0 0 0 0 0 0 0 + TASKLET: 196835 2 3 0 0 0 0 0 + SCHED: 161852 146745 129539 126064 127998 128014 120243 117391 + HRTIMER: 0 0 0 0 0 0 0 0 + 
RCU: 337707 289397 251874 239796 254377 254898 267497 256624 +``` + +As we can see the `softirq_vec` array has `softirq_action` types. This is the main data structure related to the `softirq` mechanism, so all `softirqs` are represented by the `softirq_action` structure. The `softirq_action` structure consists of only one field: an action pointer to the softirq function: + +```C +struct softirq_action +{ + void (*action)(struct softirq_action *); +}; +``` + +So, after this we can understand that the `open_softirq` function fills the `softirq_vec` array with the given `softirq_action`. For a registered deferred interrupt (registered with the call of the `open_softirq` function) to be queued for execution, it should be activated by the call of the `raise_softirq` function. This function takes only one parameter - softirq index `nr`. Let's look on its implementation: + +```C +void raise_softirq(unsigned int nr) +{ + unsigned long flags; + + local_irq_save(flags); + raise_softirq_irqoff(nr); + local_irq_restore(flags); +} +``` + +Here we can see the call of the `raise_softirq_irqoff` function between the `local_irq_save` and the `local_irq_restore` macros. The `local_irq_save` is defined in the [include/linux/irqflags.h](https://github.com/torvalds/linux/blob/master/include/linux/irqflags.h) header file and saves the state of the [IF](https://en.wikipedia.org/wiki/Interrupt_flag) flag of the [eflags](https://en.wikipedia.org/wiki/FLAGS_register) register and disables interrupts on the local processor. The `local_irq_restore` macro is defined in the same header file and does the opposite thing: it restores the saved state of the `interrupt flag`, so interrupts are enabled again if they were enabled before. We disable interrupts here because a `softirq` interrupt runs in interrupt context and that one softirq (and no others) will be run. + +The `raise_softirq_irqoff` function marks the softirq as deferred by setting the bit corresponding to the given index `nr` in the `softirq` bit mask (`__softirq_pending`) of the local processor.
It does it with the help of the: + +```C +__raise_softirq_irqoff(nr); +``` + +macro. After this, it checks the result of the `in_interrupt` that returns `irq_count` value. We already saw the `irq_count` in the first [part](http://0xax.gitbooks.io/linux-insides/content/interrupts/interrupts-1.html) of this chapter and it is used to check if a CPU is already on an interrupt stack or not. We just exit from the `raise_softirq_irqoff`, restore the `IF` flag and enable interrupts on the local processor if we are in interrupt context. Otherwise we call the `wakeup_softirqd`: + +```C +if (!in_interrupt()) + wakeup_softirqd(); +``` + +Where the `wakeup_softirqd` function activates the `ksoftirqd` kernel thread of the local processor: + +```C +static void wakeup_softirqd(void) +{ + struct task_struct *tsk = __this_cpu_read(ksoftirqd); + + if (tsk && tsk->state != TASK_RUNNING) + wake_up_process(tsk); +} +``` + +Each `ksoftirqd` kernel thread runs the `run_ksoftirqd` function that checks the existence of deferred interrupts and calls the `__do_softirq` function depending on the result. This function reads the `__softirq_pending` softirq bit mask of the local processor and executes the deferrable functions corresponding to every set bit. While a deferred function is executing, new pending `softirqs` might occur. The main problem here is that execution of the userspace code can be delayed for a long time while the `__do_softirq` function handles deferred interrupts. For this purpose, it has a limit on the time by which it must be finished: + +```C +unsigned long end = jiffies + MAX_SOFTIRQ_TIME; +... +... +... +restart: +while ((softirq_bit = ffs(pending))) { + ... + h->action(h); + ... +} +... +... +... +pending = local_softirq_pending(); +if (pending) { + if (time_before(jiffies, end) && !need_resched() && + --max_restart) + goto restart; +} +... +``` + +Checks of the existence of the deferred interrupts are performed periodically and there are some points where this check occurs.
The main point where this situation occurs is the call of the `do_IRQ` function that is defined in the [arch/x86/kernel/irq.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/irq.c) and provides main possibilities for actual interrupt processing in the Linux kernel. When this function finishes handling an interrupt, it calls the `exiting_irq` function from the [arch/x86/include/asm/apic.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/apic.h) that expands to the call of the `irq_exit` function. The `irq_exit` checks deferred interrupts, current context and calls the `invoke_softirq` function: + +```C +if (!in_interrupt() && local_softirq_pending()) + invoke_softirq(); +``` + +that executes the `__do_softirq` too. So what do we have in summary? Each `softirq` goes through the following stages: Registration of a `softirq` with the `open_softirq` function. Activation of a `softirq` by marking it as deferred with the `raise_softirq` function. After this, all marked `softirqs` will be run the next time the Linux kernel schedules a round of executions of deferrable functions. And execution of the deferred functions that have the same type. + +As I already wrote, the `softirqs` are statically allocated and it is a problem for a kernel module that can be loaded. The second concept that is built on top of `softirq` - the `tasklets` solves this problem. + +Tasklets +-------------------------------------------------------------------------------- + +If you read the source code of the Linux kernel that is related to the `softirq`, you will note that it is very rarely used there. The preferable way to implement deferrable functions is `tasklets`. As I already wrote above the `tasklets` are built on top of the `softirq` concept and generally on top of two `softirqs`: + +* `TASKLET_SOFTIRQ`; +* `HI_SOFTIRQ`.
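The three stages summarized above (registration, activation, execution) can be tried out in a small userspace model. This is only a sketch under stated assumptions: the names mirror the kernel ones, but the single `softirq_pending` variable stands in for the per-cpu `__softirq_pending` mask, there is no interrupt masking at all, and `run_order`, `run_count` and `log_action` are hypothetical helpers that exist only to make the execution order visible.

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h>   /* ffs() */

/* Userspace model only: names mirror the kernel's, the code does not. */
enum { HI_SOFTIRQ = 0, TIMER_SOFTIRQ, TASKLET_SOFTIRQ, NR_SOFTIRQS };

struct softirq_action {
    void (*action)(struct softirq_action *);
};

static struct softirq_action softirq_vec[NR_SOFTIRQS];
static uint32_t softirq_pending;  /* stands in for the per-cpu pending mask */

#define RUN_LOG_SIZE 16
static int run_order[RUN_LOG_SIZE];  /* hypothetical log of executed indexes */
static int run_count;

/* Stage 1: registration of a handler for index nr. */
static void open_softirq(int nr, void (*action)(struct softirq_action *))
{
    assert(nr >= 0 && nr < NR_SOFTIRQS);
    softirq_vec[nr].action = action;
}

/* Stage 2: activation, which only sets the bit for index nr. */
static void raise_softirq(unsigned int nr)
{
    assert(nr < NR_SOFTIRQS);
    softirq_pending |= 1u << nr;
}

/* Stage 3: execution, lowest set bit first, walking the mask with ffs(). */
static void do_softirq(void)
{
    uint32_t pending = softirq_pending;
    struct softirq_action *h = softirq_vec;
    int softirq_bit;

    softirq_pending = 0;
    while ((softirq_bit = ffs((int)pending))) {
        h += softirq_bit - 1;     /* move to the handler of the found bit */
        h->action(h);
        h++;
        pending >>= softirq_bit;  /* drop the bits handled so far */
    }
}

/* Hypothetical handler: records which vector entry was executed. */
static void log_action(struct softirq_action *h)
{
    assert(run_count < RUN_LOG_SIZE);
    run_order[run_count++] = (int)(h - softirq_vec);
}
```

In this model, raising `TASKLET_SOFTIRQ` first and `HI_SOFTIRQ` second still runs the `HI_SOFTIRQ` handler first, because the loop always picks the lowest set bit of the mask.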
+ +In short, `tasklets` are `softirqs` that can be allocated and initialized at runtime and unlike `softirqs`, tasklets that have the same type cannot be run on several processors at the same time. Ok, now we know a little bit about the `softirqs`. Of course the previous text does not cover all aspects of this, but now we can look directly at the code and learn more about the `softirqs` step by step in practice, and learn about `tasklets`. Let's go back to the implementation of the `softirq_init` function that we talked about in the beginning of this part. This function is defined in the [kernel/softirq.c](https://github.com/torvalds/linux/blob/master/kernel/softirq.c) source code file, let's look on its implementation: + +```C +void __init softirq_init(void) +{ + int cpu; + + for_each_possible_cpu(cpu) { + per_cpu(tasklet_vec, cpu).tail = + &per_cpu(tasklet_vec, cpu).head; + per_cpu(tasklet_hi_vec, cpu).tail = + &per_cpu(tasklet_hi_vec, cpu).head; + } + + open_softirq(TASKLET_SOFTIRQ, tasklet_action); + open_softirq(HI_SOFTIRQ, tasklet_hi_action); +} +``` + +We can see the definition of the integer `cpu` variable at the beginning of the `softirq_init` function. Next we will use it as parameter for the `for_each_possible_cpu` macro that goes through all possible processors in the system. If the `possible processor` is new terminology for you, you can read more about it in the [CPU masks](http://0xax.gitbooks.io/linux-insides/content/Concepts/cpumask.html) chapter. In short, `possible cpus` is the set of processors that can be plugged in anytime during the life of that system boot. All `possible processors` are stored in the `cpu_possible_bits` bitmap, you can find its definition in the [kernel/cpu.c](https://github.com/torvalds/linux/blob/master/kernel/cpu.c): + +```C +static DECLARE_BITMAP(cpu_possible_bits, CONFIG_NR_CPUS) __read_mostly; +... +... +...
+const struct cpumask *const cpu_possible_mask = to_cpumask(cpu_possible_bits); +``` + +Ok, we defined the integer `cpu` variable and go through all possible processors with the `for_each_possible_cpu` macro and initialize the two following [per-cpu](http://0xax.gitbooks.io/linux-insides/content/Concepts/per-cpu.html) variables: + +* `tasklet_vec`; +* `tasklet_hi_vec`; + +These two `per-cpu` variables are defined in the same source [code](https://github.com/torvalds/linux/blob/master/kernel/softirq.c) file as the `softirq_init` function and represent two `tasklet_head` structures: + +```C +static DEFINE_PER_CPU(struct tasklet_head, tasklet_vec); +static DEFINE_PER_CPU(struct tasklet_head, tasklet_hi_vec); +``` + +Where the `tasklet_head` structure represents a list of `Tasklets` and contains two fields, head and tail: + +```C +struct tasklet_head { + struct tasklet_struct *head; + struct tasklet_struct **tail; +}; +``` + +The `tasklet_struct` structure is defined in the [include/linux/interrupt.h](https://github.com/torvalds/linux/blob/master/include/linux/interrupt.h) and represents the `Tasklet`. Previously we did not see this word in this book. Let's try to understand what a `tasklet` is. In short, the tasklet is a thread that has no context and stack. Actually, the tasklet is one of the mechanisms to handle a deferred interrupt. Let's look on the implementation of the `tasklet_struct` structure: + +```C +struct tasklet_struct +{ + struct tasklet_struct *next; + unsigned long state; + atomic_t count; + void (*func)(unsigned long); + unsigned long data; +}; +``` + +As we can see this structure contains five fields, they are: + +* Next tasklet in the scheduling queue; +* State of the tasklet; +* Counter that shows whether the tasklet is enabled (zero) or disabled (non-zero); +* Main callback of the tasklet; +* Parameter of the callback. + +In our case, we initialize only two arrays of tasklets in the `softirq_init` function: the `tasklet_vec` and the `tasklet_hi_vec`.
Tasklets and high-priority tasklets are stored in the `tasklet_vec` and `tasklet_hi_vec` arrays, respectively. So, we have initialized these arrays and now we can see two calls of the `open_softirq` function that is defined in the [kernel/softirq.c](https://github.com/torvalds/linux/blob/master/kernel/softirq.c) source code file: + +```C +open_softirq(TASKLET_SOFTIRQ, tasklet_action); +open_softirq(HI_SOFTIRQ, tasklet_hi_action); +``` + +in the end of the `softirq_init` function. The main purpose of the `open_softirq` function is initialization of `softirq`. We already saw the implementation of the `open_softirq` function above: it stores the given handler in the `softirq_vec` array. + +In our case the handlers are `tasklet_action` and `tasklet_hi_action`: the `softirq` function associated with the `HI_SOFTIRQ` softirq is named `tasklet_hi_action` and the `softirq` function associated with the `TASKLET_SOFTIRQ` is named `tasklet_action`. The Linux kernel provides an API for the manipulation of `tasklets`. First of all it is the `tasklet_init` function that takes a `tasklet_struct`, a function and a parameter for it and initializes the given `tasklet_struct` with the given data: + +```C +void tasklet_init(struct tasklet_struct *t, + void (*func)(unsigned long), unsigned long data) +{ + t->next = NULL; + t->state = 0; + atomic_set(&t->count, 0); + t->func = func; + t->data = data; +} +``` + +There are additional methods to initialize a tasklet statically with the two following macros: + +```C +DECLARE_TASKLET(name, func, data); +DECLARE_TASKLET_DISABLED(name, func, data); +``` + +The Linux kernel provides three following functions to mark a tasklet as ready to run: + +```C +void tasklet_schedule(struct tasklet_struct *t); +void tasklet_hi_schedule(struct tasklet_struct *t); +void tasklet_hi_schedule_first(struct tasklet_struct *t); +``` + +The first function schedules a tasklet with the normal priority, the second with the high priority and the third out of turn.
Implementation of all of these three functions is similar, so we will consider only the first - `tasklet_schedule`. Let's look on its implementation: + +```C +static inline void tasklet_schedule(struct tasklet_struct *t) +{ + if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) + __tasklet_schedule(t); +} + +void __tasklet_schedule(struct tasklet_struct *t) +{ + unsigned long flags; + + local_irq_save(flags); + t->next = NULL; + *__this_cpu_read(tasklet_vec.tail) = t; + __this_cpu_write(tasklet_vec.tail, &(t->next)); + raise_softirq_irqoff(TASKLET_SOFTIRQ); + local_irq_restore(flags); +} +``` + +As we can see it checks and sets the state of the given tasklet to the `TASKLET_STATE_SCHED` and executes the `__tasklet_schedule` with the given tasklet. The `__tasklet_schedule` looks very similar to the `raise_softirq` function that we saw above. It saves the `interrupt flag` and disables interrupts at the beginning. After this it updates `tasklet_vec` with the new tasklet and calls the `raise_softirq_irqoff` function that we saw above. When the Linux kernel scheduler decides to run deferred functions, the `tasklet_action` function will be called for deferred functions which are associated with the `TASKLET_SOFTIRQ` and `tasklet_hi_action` for deferred functions which are associated with the `HI_SOFTIRQ`. These functions are very similar and there is only one difference between them: the `tasklet_action` uses `tasklet_vec`, but the `tasklet_hi_action` uses `tasklet_hi_vec`. + +Let's look on the implementation of the `tasklet_action` function: + +```C +static void tasklet_action(struct softirq_action *a) +{ + struct tasklet_struct *list; + + local_irq_disable(); + list = __this_cpu_read(tasklet_vec.head); + __this_cpu_write(tasklet_vec.head, NULL); + __this_cpu_write(tasklet_vec.tail, this_cpu_ptr(&tasklet_vec.head)); + local_irq_enable(); + + while (list) { + struct tasklet_struct *t = list; + + list = list->next; + + if (tasklet_trylock(t)) { + t->func(t->data); + tasklet_unlock(t); + } + ... + ... + ...
+ } +} +``` + +In the beginning of the `tasklet_action` function we disable interrupts for the local processor with the help of the `local_irq_disable` macro (you can read about this macro in the second [part](http://0xax.gitbooks.io/linux-insides/content/interrupts/interrupts-2.html) of this chapter). In the next step we take the head of the list that contains tasklets with normal priority and set this percpu list to `NULL` because all tasklets from it are going to be processed in this pass. After this we enable interrupts for the local processor and go through the list of tasklets in the loop. In every iteration of the loop we call the `tasklet_trylock` function for the given tasklet that updates the state of the given tasklet to `TASKLET_STATE_RUN`: + +```C +static inline int tasklet_trylock(struct tasklet_struct *t) +{ + return !test_and_set_bit(TASKLET_STATE_RUN, &(t)->state); +} +``` + +If this operation was successful we execute tasklet's action (it was set in the `tasklet_init`) and call the `tasklet_unlock` function that clears tasklet's `TASKLET_STATE_RUN` state. + +In general that's all about the `tasklets` concept. Of course this does not cover `tasklets` in full, but I think that it is a good point from where you can continue to learn this concept. + +The `tasklets` are a [widely](http://lxr.free-electrons.com/ident?i=tasklet_init) used concept in the Linux kernel, but as I wrote in the beginning of this part there is a third mechanism for deferred functions - `workqueue`. In the next paragraph we will see what it is. + +Workqueues +-------------------------------------------------------------------------------- + +The `workqueue` is another concept for handling deferred functions. It is similar to `tasklets`, but has some differences. Workqueue functions run in the context of a kernel process, but `tasklet` functions run in software interrupt context. This means that `workqueue` functions do not have to be atomic, unlike `tasklet` functions.
Tasklets always run on the processor from which they were originally submitted. Workqueues work in the same way, but only by default. The `workqueue` concept is represented by the: + +```C +struct worker_pool { + spinlock_t lock; + int cpu; + int node; + int id; + unsigned int flags; + + struct list_head worklist; + int nr_workers; +... +... +... +``` + +structure that is defined in the [kernel/workqueue.c](https://github.com/torvalds/linux/blob/master/kernel/workqueue.c) source code file in the Linux kernel. I will not write the full source code of this structure here, because it has quite a lot of fields, but we will consider some of its fields. + +In its most basic form, the work queue subsystem is an interface for creating kernel threads to handle work that is queued from elsewhere. All of these kernel threads are called - `worker threads`. The work queue is maintained by the `work_struct` that is defined in the [include/linux/workqueue.h](https://github.com/torvalds/linux/blob/master/include/linux/workqueue.h). Let's look on this structure: + +```C +struct work_struct { + atomic_long_t data; + struct list_head entry; + work_func_t func; +#ifdef CONFIG_LOCKDEP + struct lockdep_map lockdep_map; +#endif +}; +``` + +Here are two things that we are interested in: `func` - the function that will be scheduled by the `workqueue` and the `data` - parameter of this function. The Linux kernel provides special per-cpu threads that are called - `kworker`: + +``` +systemd-cgls -k | grep kworker +├─ 5 [kworker/0:0H] +├─ 15 [kworker/1:0H] +├─ 20 [kworker/2:0H] +├─ 25 [kworker/3:0H] +├─ 30 [kworker/4:0H] +... +... +... +``` + +This process can be used to schedule the deferred functions of the workqueues (as `ksoftirqd` for `softirqs`). Besides this we can create a new separate worker thread for a `workqueue`.
The Linux kernel provides the following macros for the creation of workqueue: + +```C +#define DECLARE_WORK(n, f) \ + struct work_struct n = __WORK_INITIALIZER(n, f) +``` + +for static creation. It takes two parameters: name of the workqueue and the workqueue function. For creation of workqueue at runtime, we can use the: + +```C +#define INIT_WORK(_work, _func) \ + __INIT_WORK((_work), (_func), 0) + +#define __INIT_WORK(_work, _func, _onstack) \ + do { \ + __init_work((_work), _onstack); \ + (_work)->data = (atomic_long_t) WORK_DATA_INIT(); \ + INIT_LIST_HEAD(&(_work)->entry); \ + (_work)->func = (_func); \ + } while (0) +``` + +macro that takes the `work_struct` structure that has to be created and the function to be scheduled in this workqueue. After a `work` was created with one of these macros, we need to put it to the `workqueue`. We can do it with the help of the `queue_work` or the `queue_delayed_work` functions: + +```C +static inline bool queue_work(struct workqueue_struct *wq, + struct work_struct *work) +{ + return queue_work_on(WORK_CPU_UNBOUND, wq, work); +} +``` + +The `queue_work` function just calls the `queue_work_on` function that queues work on a specific processor. Note that in our case we pass the `WORK_CPU_UNBOUND` to the `queue_work_on` function. It is a part of the `enum` that is defined in the [include/linux/workqueue.h](https://github.com/torvalds/linux/blob/master/include/linux/workqueue.h) and represents workqueues which are not bound to any specific processor. The `queue_work_on` function tests and sets the `WORK_STRUCT_PENDING_BIT` bit of the given `work` and executes the `__queue_work` function with the `workqueue` for the given processor and given `work`: + +```C +__queue_work(cpu, wq, work); +``` + +The `__queue_work` function gets the `work pool`. Yes not `workqueue`, but `work pool`. Actually all `works` are placed not in the `workqueue`, but in the `work pool` that is represented by the `worker_pool` structure in the Linux kernel.
The `workqueue_struct` structure has the `pwqs` field, which is the list of its `pool_workqueue` structures. When we create a `workqueue`, a `pool_workqueue` is allocated for each processor. Each `pool_workqueue` is associated with a `worker_pool`, which is allocated on the same processor and corresponds to the type of priority queue. Through them the `workqueue` interacts with the `worker_pool`. So in the `__queue_work` function we set the cpu to the current processor with the `raw_smp_processor_id` (you can find information about this macro in the fourth [part](http://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-4.html) of the Linux kernel initialization process chapter), get the `pool_workqueue` for the given `workqueue_struct` and insert the given `work` to the given `workqueue`: + +```C +static void __queue_work(int cpu, struct workqueue_struct *wq, + struct work_struct *work) +{ +... +... +... +if (req_cpu == WORK_CPU_UNBOUND) + cpu = raw_smp_processor_id(); + +if (!(wq->flags & WQ_UNBOUND)) + pwq = per_cpu_ptr(wq->cpu_pwqs, cpu); +else + pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu)); +... +... +... +insert_work(pwq, work, worklist, work_flags); +``` + +Now that we can create `works` and a `workqueue`, we need to know when they are executed. As I already wrote, all `works` are executed by the kernel thread. When this kernel thread is scheduled, it starts to execute `works` from the given `workqueue`. Each worker thread executes a loop inside the `worker_thread` function. This thread does many different things and some of them are similar to what we saw before in this part. As it starts to work, it removes all `work_struct` or `works` from its `workqueue`. + +That's all.
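The queue-then-drain flow described in this section can be sketched in plain userspace C. This is a simplified model under stated assumptions, not the kernel implementation: the structures only borrow the kernel names, a singly linked list replaces `struct list_head`, the `pending` field stands in for `WORK_STRUCT_PENDING_BIT`, and `pool_init`, `init_work`, `process_works`, `count_hit` and `hits` are hypothetical helpers introduced for the illustration. There are no threads and no locking here.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: simplified stand-ins for the kernel structures. */
struct work_struct;
typedef void (*work_func_t)(struct work_struct *work);

struct work_struct {
    work_func_t func;
    struct work_struct *next;  /* the kernel uses struct list_head entry */
    int pending;               /* stands in for WORK_STRUCT_PENDING_BIT */
};

struct worker_pool {
    struct work_struct *head;
    struct work_struct **tail; /* points at the slot for the next item */
};

static void pool_init(struct worker_pool *pool)
{
    pool->head = NULL;
    pool->tail = &pool->head;
}

static void init_work(struct work_struct *work, work_func_t func)
{
    work->func = func;
    work->next = NULL;
    work->pending = 0;
}

/* Returns 1 if the work was queued, 0 if it was already pending. */
static int queue_work(struct worker_pool *pool, struct work_struct *work)
{
    if (work->pending)
        return 0;
    work->pending = 1;
    work->next = NULL;
    *pool->tail = work;        /* append at the tail */
    pool->tail = &work->next;
    return 1;
}

/* One pass of a worker: detach the list, then run every item in order. */
static int process_works(struct worker_pool *pool)
{
    struct work_struct *list = pool->head;
    int done = 0;

    pool_init(pool);
    while (list) {
        struct work_struct *work = list;

        list = list->next;
        work->pending = 0;     /* cleared first, so the callback may requeue */
        work->func(work);
        done++;
    }
    return done;
}

/* Hypothetical callback that only counts how often it ran. */
static int hits;

static void count_hit(struct work_struct *work)
{
    (void)work;
    hits++;
}
```

In this model, queueing the same item twice before the worker runs has no effect the second time, which mirrors the test-and-set of the pending bit described above.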
+ +Conclusion +-------------------------------------------------------------------------------- + +It is the end of the ninth part of the [Interrupts and Interrupt Handling](http://0xax.gitbooks.io/linux-insides/content/interrupts/index.html) chapter and we continued to dive into external hardware interrupts in this part. In the previous part we saw initialization of the `IRQs` and main `irq_desc` structure. In this part we saw three concepts: the `softirq`, `tasklet` and `workqueue` that are used for the deferred functions. + +The next part will be last part of the `Interrupts and Interrupt Handling` chapter and we will look on the real hardware driver and will try to learn how it works with the interrupts subsystem. + +If you will have any questions or suggestions write me a comment or ping me at [twitter](https://twitter.com/0xAX). + +**Please note that English is not my first language, And I am really sorry for any inconvenience. If you will find any mistakes please send me PR to [linux-internals](https://github.com/0xAX/linux-internals).** + +Links +-------------------------------------------------------------------------------- + +* [initcall](http://www.compsoc.man.ac.uk/~moz/kernelnewbies/documents/initcall/index.html) +* [IF](https://en.wikipedia.org/wiki/Interrupt_flag) +* [eflags](https://en.wikipedia.org/wiki/FLAGS_register) +* [CPU masks](http://0xax.gitbooks.io/linux-insides/content/Concepts/cpumask.html) +* [per-cpu](http://0xax.gitbooks.io/linux-insides/content/Concepts/per-cpu.html) +* [Workqueue](https://github.com/torvalds/linux/blob/master/Documentation/workqueue.txt) +* [Previous part](http://0xax.gitbooks.io/linux-insides/content/interrupts/interrupts-8.html) From 4b8bb6e1252cbd8da4f4a7df8942803fd2762d15 Mon Sep 17 00:00:00 2001 From: 0xAX <0xAX@users.noreply.github.com> Date: Sun, 9 Aug 2015 20:06:27 +0600 Subject: [PATCH 20/32] Update SUMMARY.md --- SUMMARY.md | 1 + 1 file changed, 1 insertion(+) diff --git a/SUMMARY.md b/SUMMARY.md index 
640a567..f9eb9e1 100644 --- a/SUMMARY.md +++ b/SUMMARY.md @@ -26,6 +26,7 @@ * [Handling Non-Maskable interrupts](interrupts/interrupts-6.md) * [Dive into external hardware interrupts](interrupts/interrupts-7.md) * [Initialization of external hardware interrupts structures](interrupts/interrupts-8.md) + * [Softirq, Tasklets and Workqueues](https://github.com/0xAX/linux-insides/blob/master/interrupts/interrupts-9.md) * [Memory management](mm/README.md) * [Memblock](mm/linux-mm-1.md) * [Fixmaps and ioremap](mm/linux-mm-2.md) From 47982587fe43d4852e3ffcc96908145e2283e7ae Mon Sep 17 00:00:00 2001 From: 0xAX <0xAX@users.noreply.github.com> Date: Sun, 9 Aug 2015 20:06:30 +0600 Subject: [PATCH 21/32] Update interrupts-9.md --- interrupts/interrupts-9.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/interrupts/interrupts-9.md b/interrupts/interrupts-9.md index be59c23..135e6de 100644 --- a/interrupts/interrupts-9.md +++ b/interrupts/interrupts-9.md @@ -1,7 +1,7 @@ Interrupts and Interrupt Handling. Part 9. ================================================================================ -Introduction to deferred interrupts (Softirq, Tasklets and irqs) +Introduction to deferred interrupts (Softirq, Tasklets and Workqueues) -------------------------------------------------------------------------------- It is the ninth part of the [linux-insides](https://www.gitbook.com/book/0xax/linux-insides/details) book and in the previous [Previous part](http://0xax.gitbooks.io/linux-insides/content/interrupts/interrupts-8.html) we saw implementation of the `init_IRQ` from that defined in the [arch/x86/kernel/irqinit.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/irqinit.c) source code file. So, we will continue to dive into the initialization stuff which is related to the external hardware interrupts in this part. 
From 00db88ef639cddd8ce5f9a6e5ec0efdbe1821d1a Mon Sep 17 00:00:00 2001 From: 0xAX <0xAX@users.noreply.github.com> Date: Sun, 9 Aug 2015 20:07:04 +0600 Subject: [PATCH 22/32] Update README.md --- interrupts/README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/interrupts/README.md b/interrupts/README.md index 8382468..04840a5 100644 --- a/interrupts/README.md +++ b/interrupts/README.md @@ -10,3 +10,4 @@ You will find a couple of posts which describes an interrupts and an exceptions * [Handling Non-Maskable interrupts](https://github.com/0xAX/linux-insides/blob/master/interrupts/interrupts-6.md) - describes handling of non-maskable interrupts and the rest of interrupts handlers from the architecture-specific part. * [Dive into external hardware interrupts](https://github.com/0xAX/linux-insides/blob/master/interrupts/interrupts-7.md) - this part describes early initialization of code which is related to handling of external hardware interrupts. * [Non-early initialization of the IRQs](https://github.com/0xAX/linux-insides/blob/master/interrupts/interrupts-8.md) - this part describes non-early initialization of code which is related to handling of external hardware interrupts. +* [Softirq, Tasklets and Workqueues](https://github.com/0xAX/linux-insides/blob/master/interrupts/interrupts-9.md) - this part describes softirqs, tasklets and workqueues concepts. From 879bc8317b75089f580e37eb4a7851834c62cb0f Mon Sep 17 00:00:00 2001 From: 0xAX <0xAX@users.noreply.github.com> Date: Sun, 9 Aug 2015 20:11:15 +0600 Subject: [PATCH 23/32] Update interrupts-9.md --- interrupts/interrupts-9.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/interrupts/interrupts-9.md b/interrupts/interrupts-9.md index 135e6de..de2b657 100644 --- a/interrupts/interrupts-9.md +++ b/interrupts/interrupts-9.md @@ -1,7 +1,7 @@ Interrupts and Interrupt Handling. Part 9. 
================================================================================ -Introduction to deferred interrupts (Softirq, Tasklets and Workqueues) +Introduction to deferred interrupts (Softirq, Tasklets and irqs) -------------------------------------------------------------------------------- It is the ninth part of the [linux-insides](https://www.gitbook.com/book/0xax/linux-insides/details) book and in the previous [Previous part](http://0xax.gitbooks.io/linux-insides/content/interrupts/interrupts-8.html) we saw implementation of the `init_IRQ` from that defined in the [arch/x86/kernel/irqinit.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/irqinit.c) source code file. So, we will continue to dive into the initialization stuff which is related to the external hardware interrupts in this part. @@ -258,7 +258,7 @@ struct tasklet_head { }; ``` -The `tasklet_struct` structure defined in the [include/linux/interrupt.h](https://github.com/torvalds/linux/blob/master/include/linux/interrupt.h) and represents the `Tasklet`. Previously we did not see this word in this book. Let's try to understand what is it `tasklet`. In short words, the tasklet is a thread that has no context and stack. Actually, the tasklet is an one of mechanisms to handle deferred interrupt. Let's look on the implementation if the `tasklet_struct` structure: +The `tasklet_struct` structure defined in the [include/linux/interrupt.h](https://github.com/torvalds/linux/blob/master/include/linux/interrupt.h) and represents the `Tasklet`. Previously we did not see this word in this book. Let's try to understand what is it `tasklet`. Actually, the tasklet is an one of mechanisms to handle deferred interrupt. 
Let's look on the implementation if the `tasklet_struct` structure: ```C struct tasklet_struct From 3d66d938c68f8532dd0e48ef8c829a9f86a91da3 Mon Sep 17 00:00:00 2001 From: 0xAX <0xAX@users.noreply.github.com> Date: Sun, 9 Aug 2015 20:12:30 +0600 Subject: [PATCH 24/32] Update interrupts-9.md --- interrupts/interrupts-9.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/interrupts/interrupts-9.md b/interrupts/interrupts-9.md index de2b657..223911d 100644 --- a/interrupts/interrupts-9.md +++ b/interrupts/interrupts-9.md @@ -1,7 +1,7 @@ Interrupts and Interrupt Handling. Part 9. ================================================================================ -Introduction to deferred interrupts (Softirq, Tasklets and irqs) +Introduction to deferred interrupts (Softirq, Tasklets and Workqueues) -------------------------------------------------------------------------------- It is the ninth part of the [linux-insides](https://www.gitbook.com/book/0xax/linux-insides/details) book and in the previous [Previous part](http://0xax.gitbooks.io/linux-insides/content/interrupts/interrupts-8.html) we saw implementation of the `init_IRQ` from that defined in the [arch/x86/kernel/irqinit.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/irqinit.c) source code file. So, we will continue to dive into the initialization stuff which is related to the external hardware interrupts in this part. From 0f1eb03682dc9a1ccdd5da605bf03cc885ff64e6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Koutn=C3=BD?= Date: Sun, 9 Aug 2015 22:08:40 +0200 Subject: [PATCH 25/32] Fixed some typos and grammar constructs I tried to fix obvious typos and some grammar constructs that made text less readable. 
--- interrupts/interrupts-9.md | 64 +++++++++++++++++++------------------- 1 file changed, 32 insertions(+), 32 deletions(-) diff --git a/interrupts/interrupts-9.md b/interrupts/interrupts-9.md index 223911d..0671862 100644 --- a/interrupts/interrupts-9.md +++ b/interrupts/interrupts-9.md @@ -27,9 +27,9 @@ As you can understand, it is almost impossible to make so that both characterist * Top half; * Bottom half; -Once the Linux kernel was one of the ways the organization postprocessing, and which was called: `the bottom half` of the processor, but now it is already not actual. Now this term has remained as a common noun referring to all the different ways of organizing deffered processing of an interrupt. With the advent of parallelisms in the Linux kernel, all new schemes of implementation of the bottom half handlers are built on the performance of the processor specific kernel thread that called `ksoftirqd` (will be discussed below). The `softirq` mechanism represents handling of interrupts that are `almost` important as the handling of the hardware interrupts. The deferred processing of an interrupt suggests that some of the actions for an interrupt may be postponed to a later execution when the system will be less loaded. As you can suggests, an interrupt handler can do large amount of work that is impermissible as it executes in the context where interrupts are disabled. That's why processing of an interrupt can be splitted on two different parts. In the first part, the main handler of an interrupt does only minimal and the most important job. After this it schedules the second part and finishes its work. When the system less busy and context of the processor allows to handle interrupts, the second part starts its work and finishes to process remaing part of a deferred interrupt. That is main explanation of the deferred interrupt handling. 
+Once the Linux kernel was one of the ways the organization postprocessing, and which was called: `the bottom half` of the processor, but now it is already not actual. Now this term has remained as a common noun referring to all the different ways of organizing deffered processing of an interrupt. With the advent of parallelisms in the Linux kernel, all new schemes of implementation of the bottom half handlers are built on the performance of the processor specific kernel thread that called `ksoftirqd` (will be discussed below). The `softirq` mechanism represents handling of interrupts that are `almost` as important as the handling of the hardware interrupts. The deferred processing of an interrupt suggests that some of the actions for an interrupt may be postponed to a later execution when the system will be less loaded. As you can suggests, an interrupt handler can do large amount of work that is impermissible as it executes in the context where interrupts are disabled. That's why processing of an interrupt can be splitted on two different parts. In the first part, the main handler of an interrupt does only minimal and the most important job. After this it schedules the second part and finishes its work. When the system is less busy and context of the processor allows to handle interrupts, the second part starts its work and finishes to process remaing part of a deferred interrupt. That is main explanation of the deferred interrupt handling. -As I already wrote above, handling of deferred interrupts (or `softirq` in other words) and accordingly `tasklets` provided by a set of the special kernel threads (one thread per processor). Each processor has own thread and it is called `ksoftirqd/n` where the `n` is the number of the processor. 
We can see it in the output of the `systemd-cgls` util: +As I already wrote above, handling of deferred interrupts (or `softirq` in other words) and accordingly `tasklets` is performed by a set of the special kernel threads (one thread per processor). Each processor has its own thread that is called `ksoftirqd/n` where the `n` is the number of the processor. We can see it in the output of the `systemd-cgls` util: ``` $ systemd-cgls -k | grep ksoft @@ -49,7 +49,7 @@ The `spawn_ksoftirqd` function starts this these threads. As we can see this fun early_initcall(spawn_ksoftirqd); ``` -Deferred interrupts are determined statically at compile-time Linux kernel and the `open_softirq` function takes care of `softirq` initialization. The `open_softirq` function defined in the [kernel/softirq.c](https://github.com/torvalds/linux/blob/master/kernel/softirq.c): +Deferred interrupts are determined statically at compile-time of the Linux kernel and the `open_softirq` function takes care of `softirq` initialization. The `open_softirq` function defined in the [kernel/softirq.c](https://github.com/torvalds/linux/blob/master/kernel/softirq.c): ```C @@ -70,7 +70,7 @@ First of all let's look on the `softirq_vec` array: static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp; ``` -it defined in the same source code file. As we can see, the `softirq_vec` array may contain `NR_SOFTIRQS` or `10` types of `softirqs` that has type - `softirq_action`. First of all about its elements. In the actual version of the Linux kernel there are ten softirq vectors defined; two for tasklet processing, two for networking, two for the block layer, two for timers, and one each for the scheduler and read-copy-update processing. All of these kinds represented by the following enum: +it defined in the same source code file. As we can see, the `softirq_vec` array may contain `NR_SOFTIRQS` or `10` types of `softirqs` that has type `softirq_action`. First of all about its elements. 
In the current version of the Linux kernel there are ten softirq vectors defined; two for tasklet processing, two for networking, two for the block layer, two for timers, and one each for the scheduler and read-copy-update processing. All of these kinds are represented by the following enum: ```C enum @@ -89,7 +89,7 @@ enum }; ``` -All names of these kinds of softirqs represented by the following array: +All names of these kinds of softirqs are represented by the following array: ```C const char * const softirq_to_name[NR_SOFTIRQS] = { @@ -115,7 +115,7 @@ BLOCK_IOPOLL: 0 0 0 0 0 0 RCU: 337707 289397 251874 239796 254377 254898 267497 256624 ``` -As we can see the `softirq_vec` array has `softirq_action` types. This is main data structure related to the `softirq` mechanism, so all `softirqs` represented by the `softirq_action` structure. The `softirq_action` structure consists only of one the field: an action pointer to the softirq function: +As we can see the `softirq_vec` array has `softirq_action` types. This is the main data structure related to the `softirq` mechanism, so all `softirqs` represented by the `softirq_action` structure. The `softirq_action` structure consists a single field only: an action pointer to the softirq function: ```C struct softirq_action @@ -124,7 +124,7 @@ struct softirq_action }; ``` -So, after this we can understand thatthe `open_softirq` function fills the `softirq_vec` array with the given `softirq_action`. The registered deferred interrupt (with the call of the `open_softirq` function) for it to be queued for execution, it should be activated by the call of the `raise_softirq` function. This function takes only one parameter - softirq index `nr`. Let's look on its implementation: +So, after this we can understand that the `open_softirq` function fills the `softirq_vec` array with the given `softirq_action`. 
The registered deferred interrupt (with the call of the `open_softirq` function) for it to be queued for execution, it should be activated by the call of the `raise_softirq` function. This function takes only one parameter -- a softirq index `nr`. Let's look on its implementation: ```C void raise_softirq(unsigned int nr) @@ -137,7 +137,7 @@ void raise_softirq(unsigned int nr) } ``` -Here we can see the call of the `raise_softirq_irqoff` function between the `local_irq_save` and the `local_irq_restore` macros. The `local_irq_save` defined in the [include/linux/irqflags.h](https://github.com/torvalds/linux/blob/master/include/linux/irqflags.h) header file and saves the state of the [IF](https://en.wikipedia.org/wiki/Interrupt_flag) flag of the [eflags](https://en.wikipedia.org/wiki/FLAGS_register) register and disables interrupts on the local processor. The `local_irq_restore` macro defined in the same header file and does the opposite thing: restores the `interrupt flag` and enables interrupts. We disable interrupts here because a `softirq` interrupt runs in interrupt context and that one softirq (and no others) will be run. +Here we can see the call of the `raise_softirq_irqoff` function between the `local_irq_save` and the `local_irq_restore` macros. The `local_irq_save` defined in the [include/linux/irqflags.h](https://github.com/torvalds/linux/blob/master/include/linux/irqflags.h) header file and saves the state of the [IF](https://en.wikipedia.org/wiki/Interrupt_flag) flag of the [eflags](https://en.wikipedia.org/wiki/FLAGS_register) register and disables interrupts on the local processor. The `local_irq_restore` macro defined in the same header file and does the opposite thing: restores the `interrupt flag` and enables interrupts. We disable interrupts here because a `softirq` interrupt runs in the interrupt context and that one softirq (and no others) will be run. 
The `raise_softirq_irqoff` function marks the softirq as deffered by setting the bit corresponding to the given index `nr` in the `softirq` bit mask (`__softirq_pending`) of the local processor. It does it with the help of the: @@ -145,7 +145,7 @@ The `raise_softirq_irqoff` function marks the softirq as deffered by setting the __raise_softirq_irqoff(nr); ``` -macro. After this, it checks the result of the `in_interrupt` that returns `irq_count` value. We already saw the `irq_count` in the first [part](http://0xax.gitbooks.io/linux-insides/content/interrupts/interrupts-1.html) of this chapter and it is used to check if a CPU is already on an interrupt stack or not. We just exit from the `raise_softirq_irqoff`, restore `IF` flang and enable interrupts on the local processor If we are in interrupt context. In another way we call the `wakeup_softirqd`: +macro. After this, it checks the result of the `in_interrupt` that returns `irq_count` value. We already saw the `irq_count` in the first [part](http://0xax.gitbooks.io/linux-insides/content/interrupts/interrupts-1.html) of this chapter and it is used to check if a CPU is already on an interrupt stack or not. We just exit from the `raise_softirq_irqoff`, restore `IF` flang and enable interrupts on the local processor, if we are in the interrupt context, otherwise we call the `wakeup_softirqd`: ```C if (!in_interrupt()) @@ -164,7 +164,7 @@ static void wakeup_softirqd(void) } ``` -Each `ksoftirqd` kernel thread runs the `run_ksoftirqd` function that checks existence of deferred interrupts and calls the `__do_softirq` function depends on result. This function reads the `__softirq_pending` softirq bit mask of the local processor and executes the deferrable functions corresponding to every set bit. During executing a deferred function, new pending `softirqs` might occur. The main problem here that execution of the userspace code can be delayed for a long time while the `__do_softirq` function will handle deferred interrupts. 
For this purpose, it has the limit of the time when it must be finsihed: +Each `ksoftirqd` kernel thread runs the `run_ksoftirqd` function that checks existence of deferred interrupts and calls the `__do_softirq` function depends on result. This function reads the `__softirq_pending` softirq bit mask of the local processor and executes the deferrable functions corresponding to every bit set. During execution of a deferred function, new pending `softirqs` might occur. The main problem here that execution of the userspace code can be delayed for a long time while the `__do_softirq` function will handle deferred interrupts. For this purpose, it has the limit of the time when it must be finsihed: ```C unsigned long end = jiffies + MAX_SOFTIRQ_TIME; @@ -198,17 +198,17 @@ if (!in_interrupt() && local_softirq_pending()) that executes the `__do_softirq` too. So what do we have in summary. Each `softirq` goes through the following stages: Registration of a `softirq` with the `open_softirq` function. Activation of a `softirq` by marking it as deferred with the `raise_softirq` function. After this, all marked `softirqs` will be runned in the next time the Linux kernel schedules a round of executions of deferrable functions. And execution of the deferred functions that have the same type. -As I already wrote, the `softirqs` are statically allocated and it is a problem for a kernel module that can be loaded. The second concept that built on top of `softirq` - the `tasklets` solves this problem. +As I already wrote, the `softirqs` are statically allocated and it is a problem for a kernel module that can be loaded. The second concept that built on top of `softirq` -- the `tasklets` solves this problem. Tasklets -------------------------------------------------------------------------------- -If you will read source code of the Linux kernel that is related to the `softirq`, you will note that it is very rarely used there. 
The preferable way to implement deferrable functions are `tasklets`. As I already wrote above the `tasklets` are built on top of the `softirq` concept and generally on top of two `softirqs`: +If you read the source code of the Linux kernel that is related to the `softirq`, you notice that it is used very rarely. The preferable way to implement deferrable functions are `tasklets`. As I already wrote above the `tasklets` are built on top of the `softirq` concept and generally on top of two `softirqs`: * `TASKLET_SOFTIRQ`; * `HI_SOFTIRQ`. -In sort words, `tasklets` are `softirqs` that can be allocated and initialized at runtime and unlike `softirqs`, tasklets that have the same type cannot be runned on a sereral processors in one time. Ok, now we know a little bit about the `softirqs`, of course previous text does not cover all aspects about this, but now we can directly look on the code and to know more about the `softirqs` step by step on practice and to know about `tasklets`. Let's back to the implementation of the `softirq_init` function that we talked about in the beginning of this part. This function defined in the [kernel/softirq.c](https://github.com/torvalds/linux/blob/master/kernel/softirq.c) source code file, let's look on its implementation: +In short words, `tasklets` are `softirqs` that can be allocated and initialized at runtime and unlike `softirqs`, tasklets that have the same type cannot be run on multiple processors at a time. Ok, now we know a little bit about the `softirqs`, of course previous text does not cover all aspects about this, but now we can directly look on the code and to know more about the `softirqs` step by step on practice and to know about `tasklets`. Let's return back to the implementation of the `softirq_init` function that we talked about in the beginning of this part. 
This function is defined in the [kernel/softirq.c](https://github.com/torvalds/linux/blob/master/kernel/softirq.c) source code file, let's look on its implementation: ```C void __init softirq_init(void) @@ -227,7 +227,7 @@ void __init softirq_init(void) } ``` -We can see defineition of the integer `cpu` variable at the beginning of the `softirq_init` function. Next we will use it as parameter for the `for_each_possible_cpu` macro that goes through the all possible processors in the system. If the `possible processor` is the new terminology for you, you can know more about it the [CPU masks](http://0xax.gitbooks.io/linux-insides/content/Concepts/cpumask.html) chapter. In short words, `possible cpus` is the set of processors that can be plugged in anytime during the life of that system boot. All `possible processors` stored in the `cpu_possible_bits` bitmap, you can find its definition in the [kernel/cpu.c](https://github.com/torvalds/linux/blob/master/kernel/cpu.c): +We can see definition of the integer `cpu` variable at the beginning of the `softirq_init` function. Next we will use it as parameter for the `for_each_possible_cpu` macro that goes through the all possible processors in the system. If the `possible processor` is the new terminology for you, you can read more about it the [CPU masks](http://0xax.gitbooks.io/linux-insides/content/Concepts/cpumask.html) chapter. In short words, `possible cpus` is the set of processors that can be plugged in anytime during the life of that system boot. 
All `possible processors` stored in the `cpu_possible_bits` bitmap, you can find its definition in the [kernel/cpu.c](https://github.com/torvalds/linux/blob/master/kernel/cpu.c): ```C static DECLARE_BITMAP(cpu_possible_bits, CONFIG_NR_CPUS) __read_mostly; @@ -249,7 +249,7 @@ static DEFINE_PER_CPU(struct tasklet_head, tasklet_vec); static DEFINE_PER_CPU(struct tasklet_head, tasklet_hi_vec); ``` -Where `tasklet_head` structure represent list of `Tasklets` and contains two fields, head and tail: +Where `tasklet_head` structure represents a list of `Tasklets` and contains two fields, head and tail: ```C struct tasklet_head { @@ -258,7 +258,7 @@ struct tasklet_head { }; ``` -The `tasklet_struct` structure defined in the [include/linux/interrupt.h](https://github.com/torvalds/linux/blob/master/include/linux/interrupt.h) and represents the `Tasklet`. Previously we did not see this word in this book. Let's try to understand what is it `tasklet`. Actually, the tasklet is an one of mechanisms to handle deferred interrupt. Let's look on the implementation if the `tasklet_struct` structure: +The `tasklet_struct` structure is defined in the [include/linux/interrupt.h](https://github.com/torvalds/linux/blob/master/include/linux/interrupt.h) and represents the `Tasklet`. Previously we did not see this word in this book. Let's try to understand what the `tasklet` is. Actually, the tasklet is one of mechanisms to handle deferred interrupt. Let's look on the implementation of the `tasklet_struct` structure: ```C struct tasklet_struct @@ -279,14 +279,14 @@ As we can see this structure contains five fields, they are: * Main callback of the tasklet; * Parameter of the callback. -In our case, we set only for initialize only two arrays of tasklets in the `softirq_init` function: the `tasklet_vec` and the `tasklet_hi_vec`. Tasklets and high-priority tasklets are stored in the `tasklet_vec` and `tasklet_hi_vec` arrays, respectively. 
So, we have initialized these arrays and now we can see two calls of the `open_softirq` function that defined in the [kernel/softirq.c](https://github.com/torvalds/linux/blob/master/kernel/softirq.c) source code file: +In our case, we set only for initialize only two arrays of tasklets in the `softirq_init` function: the `tasklet_vec` and the `tasklet_hi_vec`. Tasklets and high-priority tasklets are stored in the `tasklet_vec` and `tasklet_hi_vec` arrays, respectively. So, we have initialized these arrays and now we can see two calls of the `open_softirq` function that is defined in the [kernel/softirq.c](https://github.com/torvalds/linux/blob/master/kernel/softirq.c) source code file: ```C open_softirq(TASKLET_SOFTIRQ, tasklet_action); open_softirq(HI_SOFTIRQ, tasklet_hi_action); ``` -in the end of the `softirq_init` function. The main purpose of the `open_softirq` function is initalization of `softirq`. Let's look on the implementation of the `open_softirq` function. +at the end of the `softirq_init` function. The main purpose of the `open_softirq` function is the initalization of `softirq`. Let's look on the implementation of the `open_softirq` function. , in our case they are: `tasklet_action` and the `tasklet_hi_action` or the `softirq` function associated with the `HI_SOFTIRQ` softirq is named `tasklet_hi_action` and `softirq` function associated with the `TASKLET_SOFTIRQ` is named `tasklet_action`. The Linux kernel provides API for the manipulating of `tasklets`. First of all it is the `tasklet_init` function that takes `tasklet_struct`, function and parameter for it and initializes the given `tasklet_struct` with the given data: @@ -317,7 +317,7 @@ void tasklet_hi_schedule(struct tasklet_struct *t); void tasklet_hi_schedule_first(struct tasklet_struct *t); ``` -The first function schedules a tasklet with the normal priory, the second with the high priority and the third out of turn. 
Implementation of the all of these three functions is similar, so we will consider only the first - `tasklet_schedule`. Let's look on its implementation: +The first function schedules a tasklet with the normal priority, the second with the high priority and the third out of turn. Implementation of the all of these three functions is similar, so we will consider only the first -- `tasklet_schedule`. Let's look on its implementation: ```C static inline void tasklet_schedule(struct tasklet_struct *t) @@ -339,7 +339,7 @@ void __tasklet_schedule(struct tasklet_struct *t) } ``` -As we can see it checks and set the state of the given tasklet to the `TASKLET_STATE_SCHED` and executes the `__tasklet_schedule` with the given tasklet. The `__tasklet_schedule` looks very similar on the `raise_softirq` function that we saw above. It saves the `interrupt flag` and disables interrupts at the beginning. After this it updates `tasklet_vec` with the new tasklet and calls the `raise_softirq_irqoff` function that we saw above. When the Linux kernel scheduler will decide to run deferred functions, the `tasklet_action` function will be called for deferred functions which are associated with the `TASKLET_SOFTIRQ` and `tasklet_hi_action` for deferred functions which are associated with the `HI_SOFTIRQ`. These functions are very similar and there is only one difference between they is that the `tasklet_action` uses `tasklet_vec`, but the `tasklet_hi_action` uses `tasklet_hi_vec`. +As we can see it checks and sets the state of the given tasklet to the `TASKLET_STATE_SCHED` and executes the `__tasklet_schedule` with the given tasklet. The `__tasklet_schedule` looks very similar to the `raise_softirq` function that we saw above. It saves the `interrupt flag` and disables interrupts at the beginning. After this, it updates `tasklet_vec` with the new tasklet and calls the `raise_softirq_irqoff` function that we saw above. 
When the Linux kernel scheduler will decide to run deferred functions, the `tasklet_action` function will be called for deferred functions which are associated with the `TASKLET_SOFTIRQ` and `tasklet_hi_action` for deferred functions which are associated with the `HI_SOFTIRQ`. These functions are very similar and there is only one difference between them -- `tasklet_action` uses `tasklet_vec` and `tasklet_hi_action` uses `tasklet_hi_vec`. Let's look on the implementation of the `tasklet_action` function: @@ -364,7 +364,7 @@ static void tasklet_action(struct softirq_action *a) } ``` -In the beginning of the `tasketl_action` function we disable interrupts for the local processor with the help of the `local_irq_disable` macro (you can read about this macro in the second [part](http://0xax.gitbooks.io/linux-insides/content/interrupts/interrupts-2.html) of this chapter). In the next step we take a head of the list that contains tasklets with normal priority and set this percpu list to `NULL` because all tasklets must be executed in a generaly way. After this we enable interrupts for the local processor and go through the list of taklets in the loop. In the every iteration of the loop we call the `tasklet_trylock` function for the given tasklet that update state of the given tasklet on `TASKLET_STATE_RUN`: +In the beginning of the `tasketl_action` function, we disable interrupts for the local processor with the help of the `local_irq_disable` macro (you can read about this macro in the second [part](http://0xax.gitbooks.io/linux-insides/content/interrupts/interrupts-2.html) of this chapter). In the next step, we take a head of the list that contains tasklets with normal priority and set this per-cpu list to `NULL` because all tasklets must be executed in a generaly way. After this we enable interrupts for the local processor and go through the list of taklets in the loop. 
In every iteration of the loop we call the `tasklet_trylock` function for the given tasklet that updates state of the given tasklet on `TASKLET_STATE_RUN`: ```C static inline int tasklet_trylock(struct tasklet_struct *t) @@ -375,14 +375,14 @@ static inline int tasklet_trylock(struct tasklet_struct *t) If this operation was successful we execute tasklet's action (it was set in the `tasklet_init`) and call the `tasklet_unlock` function that clears tasklet's `TASKLET_STATE_RUN` state. -In generall that's all about `tasklets` concept. Of course this does not cover full `tasklets`, but I think that it is good point from where you can continue to learn this concept. +In general, that's all about `tasklets` concept. Of course this does not cover full `tasklets`, but I think that it is a good point from where you can continue to learn this concept. -The `tasklets` are [widely](http://lxr.free-electrons.com/ident?i=tasklet_init) used concept in the Linux kernel, but as I wrote in the beginning of this part there is third mechanism for deferred functions - `workqueue`. In the next paragraph we will know what is it. +The `tasklets` are [widely](http://lxr.free-electrons.com/ident?i=tasklet_init) used concept in the Linux kernel, but as I wrote in the beginning of this part there is third mechanism for deferred functions -- `workqueue`. In the next paragraph we will see what it is. Workqueues -------------------------------------------------------------------------------- -The `workqueue` is another concept for handling deferred functions. It is similar on `tasklets`, but has some differences. Workqueue functions run in the context of a kernel process, but `tasklet` functions run in software interrupt context. This means that `workqueue` functions must not be atomic as `tasklet` functions. Tasklets always run on the processor from which they were originally submitted. Workqueues work in the same way, but only by default. 
The `workqueue` concept represented by the: +The `workqueue` is another concept for handling deferred functions. It is similar to `tasklets` with some differences. Workqueue functions run in the context of a kernel process, but `tasklet` functions run in the software interrupt context. This means that `workqueue` functions must not be atomic as `tasklet` functions. Tasklets always run on the processor from which they were originally submitted. Workqueues work in the same way, but only by default. The `workqueue` concept represented by the: ```C struct worker_pool { @@ -399,9 +399,9 @@ struct worker_pool { ... ``` -structure that defined in the [kernel/workqueue.c](https://github.com/torvalds/linux/blob/master/kernel/workqueue.c) source code file in the Linux kernel. I will not write the source code of this structure here, because it has quite a bit of fileds, but we will consider some of their feilds. +structure that is defined in the [kernel/workqueue.c](https://github.com/torvalds/linux/blob/master/kernel/workqueue.c) source code file in the Linux kernel. I will not write the source code of this structure here, because it has quite a lot of fields, but we will consider some of those fields. -In its most basic form, the work queue subsystem is an interface for creating kernel threads to handle work that is queued from elsewhere. All of these kernel threads are called - `worker threads`. The work queue are maintained by the `work_struct` that defined in the [include/linux/workqueue.h](https://github.com/torvalds/linux/blob/master/include/linux/workqueue.h). Let's look on this structure: +In its most basic form, the work queue subsystem is an interface for creating kernel threads to handle work that is queued from elsewhere. All of these kernel threads are called -- `worker threads`. The work queue are maintained by the `work_struct` that defined in the [include/linux/workqueue.h](https://github.com/torvalds/linux/blob/master/include/linux/workqueue.h). 
Let's look on this structure: ```C struct work_struct { @@ -414,7 +414,7 @@ struct work_struct { }; ``` -Here are two things that we are interested: `func` - the function that will be scheduled by the `workqueue` and the `data` - parameter of this function. The Linux kernel provides special per-cpu threads that are called - `kworker`: +Here are two things that we are interested: `func` -- the function that will be scheduled by the `workqueue` and the `data` - parameter of this function. The Linux kernel provides special per-cpu threads that are called `kworker`: ``` systemd-cgls -k | grep kworker @@ -460,13 +460,13 @@ static inline bool queue_work(struct workqueue_struct *wq, } ``` -The `queue_work` function just calls the `queue_work_on` function that queue work on specific processor. Note that in our case we pass the `WORK_STRUCT_PENDING_BIT` to the `queue_work_on` function. It is a part of the `enum` that defined in the [include/linux/workqueue.h](https://github.com/torvalds/linux/blob/master/include/linux/workqueue.h) and represents workqueue which are not bound to any specific processor. The `queue_work_on` function tests and set the `WORK_STRUCT_PENDING_BIT` bit of the given `work` and executes the `__queue_work` function with the `workqueue` for the given processor and given `work`: +The `queue_work` function just calls the `queue_work_on` function that queue work on specific processor. Note that in our case we pass the `WORK_STRUCT_PENDING_BIT` to the `queue_work_on` function. It is a part of the `enum` that is defined in the [include/linux/workqueue.h](https://github.com/torvalds/linux/blob/master/include/linux/workqueue.h) and represents workqueue which are not bound to any specific processor. 
The `queue_work_on` function tests and set the `WORK_STRUCT_PENDING_BIT` bit of the given `work` and executes the `__queue_work` function with the `workqueue` for the given processor and given `work`: ```C __queue_work(cpu, wq, work); ``` -The `__queue_work` function gets the `work pool`. Yes not `workqueue`, but `work pool`. Actually all `works` are placed not in the `workqueue`, but to the `work pool` that represented by the `worker_pool` structure in the Linux kernel. As you can see above, the `workqueue_struct` structure has the `pwqs` field which is list of `worker_pools`. When we create a `workqueue`, it stands out for each processor the `pool_workqueue`. Each `pool_workqueue` associated with `worker_pool`, which is allocated on the same processor and corresponds to the type of priority queue. Through them `workqueue` interacts with `worker_pool`. So in the `__queue_work` function we set the cpu to the current processor with the `raw_smp_processor_id` (you can find information about this marco in the fouth [part](http://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-4.html) of the Linux kernel initialization process chapter), geting the `pool_workqueue` for the given `workqueue_struct` and insert the given `work` to the given `workqueue`: +The `__queue_work` function gets the `work pool`. Yes, the `work pool` not `workqueue`. Actually, all `works` are not placed in the `workqueue`, but to the `work pool` that is represented by the `worker_pool` structure in the Linux kernel. As you can see above, the `workqueue_struct` structure has the `pwqs` field which is list of `worker_pools`. When we create a `workqueue`, it stands out for each processor the `pool_workqueue`. Each `pool_workqueue` associated with `worker_pool`, which is allocated on the same processor and corresponds to the type of priority queue. Through them `workqueue` interacts with `worker_pool`. 
So in the `__queue_work` function we set the cpu to the current processor with the `raw_smp_processor_id` (you can find information about this marco in the fouth [part](http://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-4.html) of the Linux kernel initialization process chapter), getting the `pool_workqueue` for the given `workqueue_struct` and insert the given `work` to the given `workqueue`: ```C static void __queue_work(int cpu, struct workqueue_struct *wq, @@ -488,7 +488,7 @@ else insert_work(pwq, work, worklist, work_flags); ``` -As we can create `works` and `workqueue`, we need to know when they are execute. As I already wrote, all `works` executed by the kernel thread. When this kernel thread scheduled, it starts to execute `works` from the given `workqueue`. Each worker thread executes a loop inside the `worker_thread` function. This thread makes many different things and part of these things are similar to that we saw before in this part. As it started to work, it removes all `work_struct` or `works` from its `workqueue`. +As we can create `works` and `workqueue`, we need to know when they are executed. As I already wrote, all `works` are executed by the kernel thread. When this kernel thread is scheduled, it starts to execute `works` from the given `workqueue`. Each worker thread executes a loop inside the `worker_thread` function. This thread makes many different things and part of these things are similar to what we saw before in this part. As it starts executing, it removes all `work_struct` or `works` from its `workqueue`. That's all. @@ -499,9 +499,9 @@ It is the end of the ninth part of the [Interrupts and Interrupt Handling](http: The next part will be last part of the `Interrupts and Interrupt Handling` chapter and we will look on the real hardware driver and will try to learn how it works with the interrupts subsystem. 
-If you will have any questions or suggestions write me a comment or ping me at [twitter](https://twitter.com/0xAX). +If you have any questions or suggestions, write me a comment or ping me at [twitter](https://twitter.com/0xAX). -**Please note that English is not my first language, And I am really sorry for any inconvenience. If you will find any mistakes please send me PR to [linux-internals](https://github.com/0xAX/linux-internals).** +**Please note that English is not my first language, And I am really sorry for any inconvenience. If you find any mistakes please send me PR to [linux-internals](https://github.com/0xAX/linux-internals).** Links -------------------------------------------------------------------------------- From 51e7262cdc23e2259fcbe4843d313eb26444ed9a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Si=C3=B4n=20Le=20Roux?= Date: Sun, 9 Aug 2015 22:57:56 +0200 Subject: [PATCH 26/32] Fix typo stoped --> stopped --- Booting/linux-bootstrap-5.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Booting/linux-bootstrap-5.md b/Booting/linux-bootstrap-5.md index dce09cd..5e0c90e 100644 --- a/Booting/linux-bootstrap-5.md +++ b/Booting/linux-bootstrap-5.md @@ -9,7 +9,7 @@ This is the fifth part of the `Kernel booting process` series. We saw transition Preparation before kernel decompression -------------------------------------------------------------------------------- -We stoped right before jump on 64-bit entry point - `startup_64` which located in the [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/boot/compressed/head_64.S) source code file. We already saw the jump to the `startup_64` in the `startup_32`: +We stopped right before jump on 64-bit entry point - `startup_64` which located in the [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/boot/compressed/head_64.S) source code file. 
We already saw the jump to the `startup_64` in the `startup_32`: ```assembly pushl $__KERNEL_CS From 59300f915cbfbc77bb05ce15f9e78622c9a4286a Mon Sep 17 00:00:00 2001 From: Mihir Shete Date: Mon, 10 Aug 2015 13:07:44 +0530 Subject: [PATCH 27/32] Fix duplicate links and a grammatical costruct - Remove duplicate entries from the Links section - events should be "raised" instead of "emitted" --- interrupts/interrupts-1.md | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/interrupts/interrupts-1.md b/interrupts/interrupts-1.md index 38d527e..ae0d858 100644 --- a/interrupts/interrupts-1.md +++ b/interrupts/interrupts-1.md @@ -16,7 +16,7 @@ We have already heard of the word `interrupt` in several parts of this book. We We will then continue to dig deeper into the details of `interrupts` and how the Linux kernel handles them. -So..., First of all what is an interrupt? An interrupt is an `event` which is emitted by software or hardware when its needs the CPU's attention. For example, we press a button on the keyboard and what do we expect next? What should the operating system and computer do after this? To simplify matters assume that each peripheral device has an interrupt line to the CPU. A device can use it to signal an interrupt to the CPU. However interrupts are not signaled directly to the CPU. In the old machines there was a [PIC](http://en.wikipedia.org/wiki/Programmable_Interrupt_Controller) which is a chip responsible for sequentially processing multiple interrupt requests from multiple devices. In the new machines there is an [Advanced Programmable Interrupt Controller](https://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller) commonly known as - `APIC`. An `APIC` consists of two separate devices: +So..., First of all what is an interrupt? An interrupt is an `event` which is raised by software or hardware when its needs the CPU's attention. For example, we press a button on the keyboard and what do we expect next? 
What should the operating system and computer do after this? To simplify matters assume that each peripheral device has an interrupt line to the CPU. A device can use it to signal an interrupt to the CPU. However interrupts are not signaled directly to the CPU. In the old machines there was a [PIC](http://en.wikipedia.org/wiki/Programmable_Interrupt_Controller) which is a chip responsible for sequentially processing multiple interrupt requests from multiple devices. In the new machines there is an [Advanced Programmable Interrupt Controller](https://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller) commonly known as - `APIC`. An `APIC` consists of two separate devices: * `Local APIC` * `I/O APIC` @@ -505,14 +505,11 @@ If you will have any questions or suggestions write me a comment or ping me at [ Links -------------------------------------------------------------------------------- +* [PIC](http://en.wikipedia.org/wiki/Programmable_Interrupt_Controller) * [Advanced Programmable Interrupt Controller](https://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller) * [protected mode](http://en.wikipedia.org/wiki/Protected_mode) * [long mode](http://en.wikipedia.org/wiki/Long_mode) * [kernel stacks](https://www.kernel.org/doc/Documentation/x86/x86_64/kernel-stacks) -* [PIC](http://en.wikipedia.org/wiki/Programmable_Interrupt_Controller) -* [Advanced Programmable Interrupt Controller](https://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller) -* [long mode](http://en.wikipedia.org/wiki/Long_mode) -* [protected mode](http://en.wikipedia.org/wiki/Protected_mode) * [Task State Segement](http://en.wikipedia.org/wiki/Task_state_segment) * [segmented memory model](http://en.wikipedia.org/wiki/Memory_segmentation) * [Model specific registers](http://en.wikipedia.org/wiki/Model-specific_register) From 226a1a8aac5a9db0519da7a893b7eaa3d2f02942 Mon Sep 17 00:00:00 2001 From: Nan Xiao Date: Tue, 11 Aug 2015 13:35:21 +0800 Subject: [PATCH 
28/32] Update README.md Fix typo. --- interrupts/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/interrupts/README.md b/interrupts/README.md index 04840a5..784ef2f 100644 --- a/interrupts/README.md +++ b/interrupts/README.md @@ -1,6 +1,6 @@ # Interrupts and Interrupt Handling -You will find a couple of posts which describes an interrupts and an exceptions handling in the linux kernel. +You will find a couple of posts which describe an interrupts and an exceptions handling in the linux kernel. * [Interrupts and Interrupt Handling. Part 1.](https://github.com/0xAX/linux-insides/blob/master/interrupts/interrupts-1.md) - describes an interrupts handling theory. * [Start to dive into interrupts in the Linux kernel](https://github.com/0xAX/linux-insides/blob/master/interrupts/interrupts-2.md) - this part starts to describe interrupts and exceptions handling related stuff from the early stage. From 8979bcd6d3201571b878b6d5da126e5816e3ed5c Mon Sep 17 00:00:00 2001 From: Nan Xiao Date: Tue, 11 Aug 2015 13:58:38 +0800 Subject: [PATCH 29/32] Update interrupts-1.md Fix typos. --- interrupts/interrupts-1.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/interrupts/interrupts-1.md b/interrupts/interrupts-1.md index ae0d858..1c3792e 100644 --- a/interrupts/interrupts-1.md +++ b/interrupts/interrupts-1.md @@ -25,13 +25,13 @@ The first - `Local APIC` is located on each CPU core. The local APIC is responsi The second - `I/O APIC` provides multi-processor interrupt management. It is used to distribute external interrupts among the CPU cores. More about the local and I/O APICs will be covered later in this chapter. As you can understand, interrupts can occur at any time. When an interrupt occurs, the operating system must handle it immediately. But what does it mean `to handle an interrupt`? 
When an interrupt occurs, the operating system must ensure the following steps: -* The kernel must pause execution of the current process; (preempt current task) +* The kernel must pause execution of the current process; (preempt current task); * The kernel must search for the handler of the interrupt and transfer control (execute interrupt handler); -* After the interrupt handler completes execution, the interrupted process can resume execution; +* After the interrupt handler completes execution, the interrupted process can resume execution. Of course there are numerous intricacies involved in this procedure of handling interrupts. But the above 3 steps form the basic skeleton of the procedure. -Addresses of each of the interrupt handlers are maintained in a special location referred to as the - `Interrupt Descriptor Table` or `IDT`. The processor uses an unique number for recognizing the type of interruption or exception. This number is called - `vector number`. A vector number is an index in the `IDT`. There is limited amount of the vector numbers and it can be from `0` to `255`. You can note the following range-check upon the vector number within the Linux kernel source-code: +Addresses of each of the interrupt handlers are maintained in a special location referred to as the - `Interrupt Descriptor Table` or `IDT`. The processor uses a unique number for recognizing the type of interruption or exception. This number is called - `vector number`. A vector number is an index in the `IDT`. There is limited amount of the vector numbers and it can be from `0` to `255`. 
You can note the following range-check upon the vector number within the Linux kernel source-code: ```C BUG_ON((unsigned)n > 0xFF); From 809ba66b309140a8ef3b95a761a938145831375f Mon Sep 17 00:00:00 2001 From: 0xAX <0xAX@users.noreply.github.com> Date: Wed, 12 Aug 2015 13:55:41 +0600 Subject: [PATCH 30/32] Update SUMMARY.md --- SUMMARY.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/SUMMARY.md b/SUMMARY.md index f9eb9e1..87aae65 100644 --- a/SUMMARY.md +++ b/SUMMARY.md @@ -26,7 +26,7 @@ * [Handling Non-Maskable interrupts](interrupts/interrupts-6.md) * [Dive into external hardware interrupts](interrupts/interrupts-7.md) * [Initialization of external hardware interrupts structures](interrupts/interrupts-8.md) - * [Softirq, Tasklets and Workqueues](https://github.com/0xAX/linux-insides/blob/master/interrupts/interrupts-9.md) + * [Softirq, Tasklets and Workqueues](interrupts/interrupts-8.md) * [Memory management](mm/README.md) * [Memblock](mm/linux-mm-1.md) * [Fixmaps and ioremap](mm/linux-mm-2.md) From 3a82d20bca8fcd4fd489c2d5a217204f3555d59d Mon Sep 17 00:00:00 2001 From: 0xAX <0xAX@users.noreply.github.com> Date: Wed, 12 Aug 2015 13:58:07 +0600 Subject: [PATCH 31/32] Update SUMMARY.md --- SUMMARY.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/SUMMARY.md b/SUMMARY.md index 87aae65..c9c5ccc 100644 --- a/SUMMARY.md +++ b/SUMMARY.md @@ -26,7 +26,7 @@ * [Handling Non-Maskable interrupts](interrupts/interrupts-6.md) * [Dive into external hardware interrupts](interrupts/interrupts-7.md) * [Initialization of external hardware interrupts structures](interrupts/interrupts-8.md) - * [Softirq, Tasklets and Workqueues](interrupts/interrupts-8.md) + * [Softirq, Tasklets and Workqueues](interrupts/interrupts-9.md) * [Memory management](mm/README.md) * [Memblock](mm/linux-mm-1.md) * [Fixmaps and ioremap](mm/linux-mm-2.md) From ae4e704e5441015ae3f32dc570b6e134ba67c822 Mon Sep 17 00:00:00 2001 From: 0xAX <0xAX@users.noreply.github.com> Date: 
Sat, 15 Aug 2015 19:17:12 +0600 Subject: [PATCH 32/32] Update SUMMARY.md --- SUMMARY.md | 1 + 1 file changed, 1 insertion(+) diff --git a/SUMMARY.md b/SUMMARY.md index c9c5ccc..22374b7 100644 --- a/SUMMARY.md +++ b/SUMMARY.md @@ -27,6 +27,7 @@ * [Dive into external hardware interrupts](interrupts/interrupts-7.md) * [Initialization of external hardware interrupts structures](interrupts/interrupts-8.md) * [Softirq, Tasklets and Workqueues](interrupts/interrupts-9.md) + * [Last part]() * [Memory management](mm/README.md) * [Memblock](mm/linux-mm-1.md) * [Fixmaps and ioremap](mm/linux-mm-2.md)