mirror of
https://github.com/0xAX/linux-insides.git
synced 2025-01-18 11:41:08 +00:00
Merge branch 'master' into sync2_fix_1
This commit is contained in:
commit
5303f8b91a
@ -4,9 +4,9 @@ Kernel booting process. Part 1.
|
||||
From the bootloader to the kernel
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
If you've read my previous [blog posts](https://0xax.github.io/categories/assembler/), you might have noticed that I have been involved with low-level programming for some time. I have written some posts about assembly programming for `x86_64` Linux and, at the same time, I have also started to dive into the Linux kernel source code.
|
||||
If you read my previous [blog posts](https://0xax.github.io/categories/assembler/), you might have noticed that I have been involved with low-level programming for some time. I wrote some posts about assembly programming for `x86_64` Linux and, at the same time, started to dive into the Linux kernel source code.
|
||||
|
||||
I have a great interest in understanding how low-level things work, how programs run on my computer, how they are located in memory, how the kernel manages processes and memory, how the network stack works at a low level, and many many other things. So, I have decided to write yet another series of posts about the Linux kernel for the **x86_64** architecture.
|
||||
I have a great interest in understanding how low-level things work, how programs run on my computer, how they are located in memory, how the kernel manages processes and memory, how the network stack works at a low level, and many many other things. So, I decided to write yet another series of posts about the Linux kernel for the **x86_64** architecture.
|
||||
|
||||
Note that I'm not a professional kernel hacker and I don't write code for the kernel at work. It's just a hobby. I just like low-level stuff, and it is interesting for me to see how these things work. So if you notice anything confusing, or if you have any questions/remarks, ping me on Twitter [0xAX](https://twitter.com/0xAX), drop me an [email](anotherworldofworld@gmail.com) or just create an [issue](https://github.com/0xAX/linux-insides/issues/new). I appreciate it.
|
||||
|
||||
@ -19,16 +19,16 @@ All posts will also be accessible at [github repo](https://github.com/0xAX/linux
|
||||
* Understanding C code
|
||||
* Understanding assembly code (AT&T syntax)
|
||||
|
||||
Anyway, if you are just starting to learn such tools, I will try to explain some parts during this and the following posts. Alright, this is the end of the simple introduction, and now we can start to dive into the Linux kernel and low-level stuff.
|
||||
Anyway, if you're just starting to learn such tools, I will try to explain some parts during this and the following posts. Alright, this is the end of the simple introduction. Let's start to dive into the Linux kernel and low-level stuff!
|
||||
|
||||
I've started writing this book at the time of the `3.18` Linux kernel, and many things might have changed since that time. If there are changes, I will update the posts accordingly.
|
||||
I started writing these posts at the time of the `3.18` Linux kernel, and many things have changed since that time. If there are changes, I will update the posts accordingly.
|
||||
|
||||
The Magical Power Button, What happens next?
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
Although this is a series of posts about the Linux kernel, we will not be starting directly from the kernel code - at least not, in this paragraph. As soon as you press the magical power button on your laptop or desktop computer, it starts working. The motherboard sends a signal to the [power supply](https://en.wikipedia.org/wiki/Power_supply) device. After receiving the signal, the power supply provides the proper amount of electricity to the computer. Once the motherboard receives the [power good signal](https://en.wikipedia.org/wiki/Power_good_signal), it tries to start the CPU. The CPU resets all leftover data in its registers and sets up predefined values for each of them.
|
||||
Although this is a series of posts about the Linux kernel, we won't start directly from the kernel code. As soon as you press the magical power button on your laptop or desktop computer, it starts working. The motherboard sends a signal to the [power supply](https://en.wikipedia.org/wiki/Power_supply) device. After receiving the signal, the power supply provides the proper amount of electricity to the computer. Once the motherboard receives the [power good signal](https://en.wikipedia.org/wiki/Power_good_signal), it tries to start the CPU. The CPU resets all leftover data in its registers and sets predefined values for each of them.
|
||||
|
||||
The [80386](https://en.wikipedia.org/wiki/Intel_80386) CPU and later CPUs define the following predefined data in CPU registers after the computer resets:
|
||||
The [80386](https://en.wikipedia.org/wiki/Intel_80386) and later CPUs define the following predefined data in CPU registers after the computer resets:
|
||||
|
||||
```
|
||||
IP 0xfff0
|
||||
@ -40,7 +40,7 @@ The processor starts working in [real mode](https://en.wikipedia.org/wiki/Real_m
|
||||
|
||||
[Memory segmentation](https://en.wikipedia.org/wiki/Memory_segmentation) is used to make use of all the address space available. All memory is divided into small, fixed-size segments of `65536` bytes (64 KB). Since we cannot address memory above `64 KB` with 16-bit registers, an alternate method was devised.
|
||||
|
||||
An address consists of two parts: a segment selector, which has a base address, and an offset from this base address. In real mode, the associated base address of a segment selector is `Segment Selector * 16`. Thus, to get a physical address in memory, we need to multiply the segment selector part by `16` and add the offset to it:
|
||||
An address consists of two parts: a segment selector, which has a base address; and an offset from this base address. In real mode, the associated base address of a segment selector is `Segment Selector * 16`. Thus, to get a physical address in memory, we need to multiply the segment selector part by `16` and add the offset to it:
|
||||
|
||||
```
|
||||
PhysicalAddress = Segment Selector * 16 + Offset
|
||||
@ -62,9 +62,9 @@ But, if we take the largest segment selector and offset, `0xffff:0xffff`, then t
|
||||
|
||||
which is `65520` bytes past the first megabyte. Since only one megabyte is accessible in real mode, `0x10ffef` becomes `0x00ffef` with the [A20 line](https://en.wikipedia.org/wiki/A20_line) disabled.
|
||||
|
||||
Ok, now we know a little bit about real mode and memory addressing in this mode. Let's get back to discussing register values after reset.
|
||||
Ok, now we know a little bit about real mode and its memory addressing. Let's get back to discussing register values after reset.
|
||||
|
||||
The `CS` register consists of two parts: the visible segment selector, and the hidden base address. While the base address is normally formed by multiplying the segment selector value by 16, during a hardware reset the segment selector in the CS register is loaded with `0xf000` and the base address is loaded with `0xffff0000`; the processor uses this special base address until `CS` is changed.
|
||||
The `CS` register consists of two parts: the visible segment selector and the hidden base address. While the base address is normally formed by multiplying the segment selector value by 16, during a hardware reset the segment selector in the CS register is loaded with `0xf000` and the base address is loaded with `0xffff0000`. The processor uses this special base address until `CS` changes.
|
||||
|
||||
The starting address is formed by adding the base address to the value in the EIP register:
|
||||
|
||||
@ -73,7 +73,7 @@ The starting address is formed by adding the base address to the value in the EI
|
||||
'0xfffffff0'
|
||||
```
|
||||
|
||||
We get `0xfffffff0`, which is 16 bytes below 4GB. This point is called the [reset vector](https://en.wikipedia.org/wiki/Reset_vector). This is the memory location at which the CPU expects to find the first instruction to execute after reset. It contains a [jump](https://en.wikipedia.org/wiki/JMP_%28x86_instruction%29) (`jmp`) instruction that usually points to the BIOS entry point. For example, if we look in the [coreboot](https://www.coreboot.org/) source code (`src/cpu/x86/16bit/reset16.inc`), we will see:
|
||||
We get `0xfffffff0`, which is 16 bytes below 4GB. This point is called the [reset vector](https://en.wikipedia.org/wiki/Reset_vector). It's the memory location at which the CPU expects to find the first instruction to execute after reset. It contains a [jump](https://en.wikipedia.org/wiki/JMP_%28x86_instruction%29) (`jmp`) instruction that usually points to the [BIOS](https://en.wikipedia.org/wiki/BIOS) (Basic Input/Output System) entry point. For example, if we look in the [coreboot](https://www.coreboot.org/) source code (`src/cpu/x86/16bit/reset16.inc`), we see:
|
||||
|
||||
```assembly
|
||||
.section ".reset", "ax", %progbits
|
||||
@ -87,7 +87,7 @@ _start:
|
||||
|
||||
Here we can see the `jmp` instruction [opcode](http://ref.x86asm.net/coder32.html#xE9), which is `0xe9`, and its destination address at `_start16bit - ( . + 2)`.
|
||||
|
||||
We can also see that the `reset` section is `16` bytes and is compiled to start from the address `0xfffffff0` (`src/cpu/x86/16bit/reset16.ld`):
|
||||
We also see that the `reset` section is `16` bytes and is compiled to start from the address `0xfffffff0` (`src/cpu/x86/16bit/reset16.ld`):
|
||||
|
||||
```
|
||||
SECTIONS {
|
||||
@ -103,7 +103,7 @@ SECTIONS {
|
||||
}
|
||||
```
|
||||
|
||||
Now the BIOS starts; after initializing and checking the hardware, the BIOS needs to find a bootable device. A boot order is stored in the BIOS configuration, controlling which devices the BIOS attempts to boot from. When attempting to boot from a hard drive, the BIOS tries to find a boot sector. On hard drives partitioned with an [MBR partition layout](https://en.wikipedia.org/wiki/Master_boot_record), the boot sector is stored in the first `446` bytes of the first sector, where each sector is `512` bytes. The final two bytes of the first sector are `0x55` and `0xaa`, which designates to the BIOS that this device is bootable.
|
||||
Now the BIOS starts. After initializing and checking the hardware, the BIOS needs to find a bootable device. A boot order is stored in the BIOS configuration, controlling which devices the BIOS attempts to boot from. When attempting to boot from a hard drive, the BIOS tries to find a boot sector. On hard drives partitioned with an [MBR partition layout](https://en.wikipedia.org/wiki/Master_boot_record), the boot sector is stored in the first `446` bytes of the first sector, where each sector is `512` bytes. The final two bytes of the first sector are `0x55` and `0xaa`, which designates to the BIOS that this device is bootable.
|
||||
|
||||
For example:
|
||||
|
||||
@ -134,13 +134,13 @@ Build and run this with:
|
||||
nasm -f bin boot.nasm && qemu-system-x86_64 boot
|
||||
```
|
||||
|
||||
This will instruct [QEMU](http://qemu.org) to use the `boot` binary that we just built as a disk image. Since the binary generated by the assembly code above fulfills the requirements of the boot sector (the origin is set to `0x7c00` and we end it with the magic sequence), QEMU will treat the binary as the master boot record (MBR) of a disk image.
|
||||
This will instruct [QEMU](https://www.qemu.org/) to use the `boot` binary that we just built as a disk image. Since the binary generated by the assembly code above fulfills the requirements of the boot sector (the origin is set to `0x7c00` and we end it with the magic sequence), QEMU will treat the binary as the master boot record (MBR) of a disk image.
|
||||
|
||||
You will see:
|
||||
|
||||
![Simple bootloader which prints only `!`](http://oi60.tinypic.com/2qbwup0.jpg)
|
||||
|
||||
In this example, we can see that the code will be executed in `16-bit` real mode and will start at `0x7c00` in memory. After starting, it calls the [0x10](http://www.ctyme.com/intr/rb-0106.htm) interrupt, which just prints the `!` symbol; it fills the remaining `510` bytes with zeros and finishes with the two magic bytes `0xaa` and `0x55`.
|
||||
In this example, we can see that the code will be executed in `16-bit` real mode and will start at `0x7c00` in memory. After starting, it calls the [0x10](http://www.ctyme.com/intr/rb-0106.htm) interrupt, which just prints the `!` symbol. It fills the remaining `510` bytes with zeros and finishes with the two magic bytes `0xaa` and `0x55`.
|
||||
|
||||
You can see a binary dump of this using the `objdump` utility:
|
||||
|
||||
@ -149,15 +149,15 @@ nasm -f bin boot.nasm
|
||||
objdump -D -b binary -mi386 -Maddr16,data16,intel boot
|
||||
```
|
||||
|
||||
A real-world boot sector has code for continuing the boot process and a partition table instead of a bunch of 0's and an exclamation mark :) From this point onwards, the BIOS hands over control to the bootloader.
|
||||
A real-world boot sector has code for continuing the boot process and a partition table instead of a bunch of 0's and an exclamation mark. :) From this point onwards, the BIOS hands control over to the bootloader.
|
||||
|
||||
**NOTE**: As explained above, the CPU is in real mode; in real mode, calculating the physical address in memory is done as follows:
|
||||
**NOTE**: As explained above, the CPU is in real mode. In real mode, calculating the physical address in memory is done as follows:
|
||||
|
||||
```
|
||||
PhysicalAddress = Segment Selector * 16 + Offset
|
||||
```
|
||||
|
||||
just as explained above. We have only 16-bit general purpose registers; the maximum value of a 16-bit register is `0xffff`, so if we take the largest values, the result will be:
|
||||
just as explained above. We have only 16-bit general purpose registers, which has a maximum value of `0xffff`, so if we take the largest values the result will be:
|
||||
|
||||
```python
|
||||
>>> hex((0xffff * 16) + 0xffff)
|
||||
@ -182,7 +182,7 @@ In general, real mode's memory map is as follows:
|
||||
0x000F0000 - 0x000FFFFF - System BIOS
|
||||
```
|
||||
|
||||
In the beginning of this post, I wrote that the first instruction executed by the CPU is located at address `0xFFFFFFF0`, which is much larger than `0xFFFFF` (1MB). How can the CPU access this address in real mode? The answer is in the [coreboot](https://www.coreboot.org/Developer_Manual/Memory_map) documentation:
|
||||
At the beginning of this post, I wrote that the first instruction executed by the CPU is located at address `0xFFFFFFF0`, which is much larger than `0xFFFFF` (1MB). How can the CPU access this address in real mode? The answer is in the [coreboot](https://www.coreboot.org/Developer_Manual/Memory_map) documentation:
|
||||
|
||||
```
|
||||
0xFFFE_0000 - 0xFFFF_FFFF: 128 kilobyte ROM mapped into address space
|
||||
@ -195,11 +195,11 @@ Bootloader
|
||||
|
||||
There are a number of bootloaders that can boot Linux, such as [GRUB 2](https://www.gnu.org/software/grub/) and [syslinux](http://www.syslinux.org/wiki/index.php/The_Syslinux_Project). The Linux kernel has a [Boot protocol](https://github.com/torvalds/linux/blob/v4.16/Documentation/x86/boot.txt) which specifies the requirements for a bootloader to implement Linux support. This example will describe GRUB 2.
|
||||
|
||||
Continuing from before, now that the `BIOS` has chosen a boot device and transferred control to the boot sector code, execution starts from [boot.img](http://git.savannah.gnu.org/gitweb/?p=grub.git;a=blob;f=grub-core/boot/i386/pc/boot.S;hb=HEAD). This code is very simple, due to the limited amount of space available, and contains a pointer which is used to jump to the location of GRUB 2's core image. The core image begins with [diskboot.img](http://git.savannah.gnu.org/gitweb/?p=grub.git;a=blob;f=grub-core/boot/i386/pc/diskboot.S;hb=HEAD), which is usually stored immediately after the first sector in the unused space before the first partition. The above code loads the rest of the core image, which contains GRUB 2's kernel and drivers for handling filesystems, into memory. After loading the rest of the core image, it executes the [grub_main](http://git.savannah.gnu.org/gitweb/?p=grub.git;a=blob;f=grub-core/kern/main.c) function.
|
||||
Continuing from before, now that the BIOS has chosen a boot device and transferred control to the boot sector code, execution starts from [boot.img](http://git.savannah.gnu.org/gitweb/?p=grub.git;a=blob;f=grub-core/boot/i386/pc/boot.S;hb=HEAD). Its code is very simple, due to the limited amount of space available. It contains a pointer which is used to jump to the location of GRUB 2's core image. The core image begins with [diskboot.img](http://git.savannah.gnu.org/gitweb/?p=grub.git;a=blob;f=grub-core/boot/i386/pc/diskboot.S;hb=HEAD), which is usually stored immediately after the first sector in the unused space before the first partition. The above code loads the rest of the core image, which contains GRUB 2's kernel and drivers for handling filesystems, into memory. After loading the rest of the core image, it executes the [grub_main](http://git.savannah.gnu.org/gitweb/?p=grub.git;a=blob;f=grub-core/kern/main.c) function.
|
||||
|
||||
The `grub_main` function initializes the console, gets the base address for modules, sets the root device, loads/parses the grub configuration file, loads modules, etc. At the end of execution, the `grub_main` function moves grub to normal mode. The `grub_normal_execute` function (from the `grub-core/normal/main.c` source code file) completes the final preparations and shows a menu to select an operating system. When we select one of the grub menu entries, the `grub_menu_execute_entry` function runs, executing the grub `boot` command and booting the selected operating system.
|
||||
|
||||
As we can read in the kernel boot protocol, the bootloader must read and fill some fields of the kernel setup header, which starts at the `0x01f1` offset from the kernel setup code. You may look at the boot [linker script](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/setup.ld) to confirm the value of this offset. The kernel header [arch/x86/boot/header.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/header.S) starts from:
|
||||
As we can read in the kernel boot protocol, the bootloader must read and fill some fields of the kernel setup header, which starts at offset `0x01f1` from the kernel setup code. You may look at the boot [linker script](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/setup.ld) to confirm the value of this offset. The kernel header [arch/x86/boot/header.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/header.S) starts from:
|
||||
|
||||
```assembly
|
||||
.globl hdr
|
||||
@ -213,9 +213,9 @@ hdr:
|
||||
boot_flag: .word 0xAA55
|
||||
```
|
||||
|
||||
The bootloader must fill this and the rest of the headers (which are only marked as being type `write` in the Linux boot protocol, such as in [this example](https://github.com/torvalds/linux/blob/v4.16/Documentation/x86/boot.txt#L354)) with values which it has either received from the command line or calculated during boot. (We will not go over full descriptions and explanations for all fields of the kernel setup header now, but we shall do so when we discuss how the kernel uses them; you can find a description of all fields in the [boot protocol](https://github.com/torvalds/linux/blob/v4.16/Documentation/x86/boot.txt#L156).)
|
||||
The bootloader must fill this and the rest of the headers (which are only marked as being type `write` in the Linux boot protocol, such as in [this example](https://github.com/torvalds/linux/blob/v4.16/Documentation/x86/boot.txt#L354)) with values either received from the command line or calculated during booting. (We will not go over full descriptions and explanations for all fields of the kernel setup header for now, but we shall do so when discussing how the kernel uses them. You can find a description of all fields in the [boot protocol](https://github.com/torvalds/linux/blob/v4.16/Documentation/x86/boot.txt#L156).)
|
||||
|
||||
As we can see in the kernel boot protocol, the memory will be mapped as follows after loading the kernel:
|
||||
As we can see in the kernel boot protocol, memory will be mapped as follows after loading the kernel:
|
||||
|
||||
```shell
|
||||
| Protected-mode kernel |
|
||||
@ -242,7 +242,7 @@ X+08000 +------------------------+
|
||||
|
||||
```
|
||||
|
||||
So, when the bootloader transfers control to the kernel, it starts at:
|
||||
When the bootloader transfers control to the kernel, it starts at:
|
||||
|
||||
```
|
||||
X + sizeof(KernelBootSector) + 1
|
||||
@ -252,14 +252,14 @@ where `X` is the address of the kernel boot sector being loaded. In my case, `X`
|
||||
|
||||
![kernel first address](http://oi57.tinypic.com/16bkco2.jpg)
|
||||
|
||||
The bootloader has now loaded the Linux kernel into memory, filled the header fields, and then jumped to the corresponding memory address. We can now move directly to the kernel setup code.
|
||||
The bootloader has now loaded the Linux kernel into memory, filled the header fields, and then jumped to the corresponding memory address. We now move directly to the kernel setup code.
|
||||
|
||||
The Beginning of the Kernel Setup Stage
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
Finally, we are in the kernel! Technically, the kernel hasn't run yet; first, the kernel setup part must configure stuff such as the decompressor and some memory management related things, to name a few. After all these things are done, the kernel setup part will decompress the actual kernel and jump to it. Execution of the setup part starts from [arch/x86/boot/header.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/header.S) at the [_start](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/header.S#L292) symbol.
|
||||
Finally, we are in the kernel! Technically, the kernel hasn't run yet. First, the kernel setup part must configure stuff such as the decompressor and some memory management related things, to name a few. After all these things are done, the kernel setup part will decompress the actual kernel and jump to it. Execution of the setup part starts from [arch/x86/boot/header.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/header.S) at the [_start](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/header.S#L292) symbol.
|
||||
|
||||
It may looks a little bit strange at first sight, as there are several instructions before it. A long time ago, the Linux kernel used to have its own bootloader. Now, however, if you run, for example,
|
||||
It may look a bit strange at first sight, as there are several instructions before it. A long time ago, the Linux kernel had its own bootloader. Now, however, if you run, for example,
|
||||
|
||||
```
|
||||
qemu-system-x86_64 vmlinuz-3.18-generic
|
||||
@ -285,7 +285,7 @@ pe_header:
|
||||
.word 0
|
||||
```
|
||||
|
||||
It needs this to load an operating system with [UEFI](https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface) support. We won't be looking into its inner workings right now and will cover it in upcoming chapters.
|
||||
It needs this to load an operating system with [UEFI](https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface) support. We won't be looking into its inner workings right now but will cover it in upcoming chapters.
|
||||
|
||||
The actual kernel setup entry point is:
|
||||
|
||||
@ -295,7 +295,7 @@ The actual kernel setup entry point is:
|
||||
_start:
|
||||
```
|
||||
|
||||
The bootloader (grub2 and others) knows about this point (at an offset of `0x200` from `MZ`) and makes a jump directly to it, despite the fact that `header.S` starts from the `.bstext` section, which prints an error message:
|
||||
The bootloader (GRUB 2 and others) knows about this point (at an offset of `0x200` from `MZ`) and jumps directly to it, despite the fact that `header.S` starts from the `.bstext` section, which prints an error message:
|
||||
|
||||
```
|
||||
//
|
||||
@ -319,9 +319,9 @@ _start:
|
||||
//
|
||||
```
|
||||
|
||||
Here we can see a `jmp` instruction opcode (`0xeb`) that jumps to the `start_of_setup-1f` point. In `Nf` notation, `2f`, for example, refers to the local label `2:`; in our case, it is the label `1` that is present right after the jump, and it contains the rest of the setup [header](https://github.com/torvalds/linux/blob/v4.16/Documentation/x86/boot.txt#L156). Right after the setup header, we see the `.entrytext` section, which starts at the `start_of_setup` label.
|
||||
Here we can see a `jmp` instruction opcode (`0xeb`) that jumps to the `start_of_setup-1f` point. In `Nf` notation, `2f`, for example, refers to the local label `2:`. In our case, it's label `1:` that is present right after the jump, and contains the rest of the setup [header](https://github.com/torvalds/linux/blob/v4.16/Documentation/x86/boot.txt#L156). Right after the setup header, we see the `.entrytext` section, which starts at the `start_of_setup` label.
|
||||
|
||||
This is the first code that actually runs (aside from the previous jump instructions, of course). After the kernel setup part receives control from the bootloader, the first `jmp` instruction is located at the `0x200` offset from the start of the kernel real mode, i.e., after the first 512 bytes. This can be seen in both the Linux kernel boot protocol and the grub2 source code:
|
||||
This is the first code that actually runs (aside from the previous jump instructions, of course). After the kernel setup part receives control from the bootloader, the first `jmp` instruction is located at the `0x200` offset from the start of the kernel real mode, i.e., after the first 512 bytes. This can be seen in both the Linux kernel boot protocol and the GRUB 2 source code:
|
||||
|
||||
```C
|
||||
segment = grub_linux_real_target >> 4;
|
||||
@ -329,7 +329,7 @@ state.gs = state.fs = state.es = state.ds = state.ss = segment;
|
||||
state.cs = segment + 0x20;
|
||||
```
|
||||
|
||||
In my case, the kernel is loaded at `0x10000` physical address. This means that segment registers will have the following values after kernel setup starts:
|
||||
In my case, the kernel is loaded at the physical address `0x10000`. This means that segment registers have the following values after kernel setup starts:
|
||||
|
||||
```
|
||||
gs = fs = es = ds = ss = 0x1000
|
||||
@ -356,7 +356,7 @@ First of all, the kernel ensures that the `ds` and `es` segment registers point
|
||||
cld
|
||||
```
|
||||
|
||||
As I wrote earlier, `grub2` loads kernel setup code at address `0x10000` by default and `cs` at `0x1020` because execution doesn't start from the start of file, but from the jump here:
|
||||
As I wrote earlier, `grub2` loads kernel setup code at address `0x10000` by default and `cs` at `0x1020` because execution doesn't start from the start of the file, but from the jump here:
|
||||
|
||||
```assembly
|
||||
_start:
|
||||
@ -388,7 +388,7 @@ Almost all of the setup code is for preparing the C language environment in real
|
||||
|
||||
This can lead to 3 different scenarios:
|
||||
|
||||
* `ss` has a valid value `0x1000` (as do all the other segment registers beside `cs`)
|
||||
* `ss` has a valid value `0x1000` (as do all the other segment registers besides `cs`)
|
||||
* `ss` is invalid and the `CAN_USE_HEAP` flag is set (see below)
|
||||
* `ss` is invalid and the `CAN_USE_HEAP` flag is not set (see below)
|
||||
|
||||
@ -405,11 +405,11 @@ Let's look at all three of these scenarios in turn:
|
||||
sti
|
||||
```
|
||||
|
||||
Here we set the alignment of `dx` (which contains the value of `sp` as given by the bootloader) to `4` bytes and check if it is zero. If it is, we set `dx` to `0xfffc` (The last 4-byte aligned address in a 64KB segment). If it is not zero, we continue to use the value of `sp` given by the bootloader (0xf7f4 in my case). After this, we put the value of `ax` into `ss`, which means `ss` contains the value `0x1000`. We now have a correct stack:
|
||||
Here we set the alignment of `dx` (which contains the value of `sp` as given by the bootloader) to `4` bytes and check if it is zero. If it is, we set `dx` to `0xfffc` (The last 4-byte aligned address in a 64KB segment). If it is not zero, we continue to use the value of `sp` given by the bootloader (`0xf7f4` in my case). Afterwards, we put the value of `ax` (`0x1000`) into `ss`. We now have a correct stack:
|
||||
|
||||
![stack](http://oi58.tinypic.com/16iwcis.jpg)
|
||||
|
||||
* In the second scenario, (`ss` != `ds`). First, we put the value of [_end](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/setup.ld) (the address of the end of the setup code) into `dx` and check the `loadflags` header field using the `testb` instruction to see whether we can use the heap. [loadflags](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/header.S#L320) is a bitmask header which is defined as:
|
||||
* The second scenario, (`ss` != `ds`). First, we put the value of [_end](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/setup.ld) (the address of the end of the setup code) into `dx` and check the `loadflags` header field using the `testb` instruction to see whether we can use the heap. [loadflags](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/header.S#L320) is a bitmask header defined as:
|
||||
|
||||
```C
|
||||
#define LOADED_HIGH (1<<0)
|
||||
@ -418,7 +418,7 @@ Here we set the alignment of `dx` (which contains the value of `sp` as given by
|
||||
#define CAN_USE_HEAP (1<<7)
|
||||
```
|
||||
|
||||
and, as we can read in the boot protocol:
|
||||
and as we can read in the boot protocol:
|
||||
|
||||
```
|
||||
Field name: loadflags
|
||||
@ -464,14 +464,14 @@ The BSS section is used to store statically allocated, uninitialized data. Linux
|
||||
rep; stosl
|
||||
```
|
||||
|
||||
First, the [__bss_start](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/setup.ld) address is moved into `di`. Next, the `_end + 3` address (+3 - aligns to 4 bytes) is moved into `cx`. The `eax` register is cleared (using a `xor` instruction), and the bss section size (`cx`-`di`) is calculated and put into `cx`. Then, `cx` is divided by four (the size of a 'word'), and the `stosl` instruction is used repeatedly, storing the value of `eax` (zero) into the address pointed to by `di`, automatically increasing `di` by four, repeating until `cx` reaches zero). The net effect of this code is that zeros are written through all words in memory from `__bss_start` to `_end`:
|
||||
First, the [__bss_start](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/setup.ld) address is moved into `di`. Next, the `_end + 3` address (+3 - aligns to 4 bytes) is moved into `cx`. The `eax` register is cleared (using the `xor` instruction), and the bss section size (`cx - di`) is calculated and put into `cx`. Then, `cx` is divided by four (the size of a 'word'), and the `stosl` instruction is used repeatedly, storing the value of `eax` (zero) into the address pointed to by `di`, automatically increasing `di` by four, repeating until `cx` reaches zero. The net effect of this code is that zeros are written through all words in memory from `__bss_start` to `_end`:
|
||||
|
||||
![bss](http://oi59.tinypic.com/29m2eyr.jpg)
|
||||
|
||||
Jump to main
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
That's all - we have the stack and BSS, so we can jump to the `main()` C function:
|
||||
That's all! We have the stack and BSS, so we can jump to the `main()` C function:
|
||||
|
||||
```assembly
|
||||
calll main
|
||||
|
@ -4,14 +4,14 @@ Notification Chains in Linux Kernel
|
||||
Introduction
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
The Linux kernel is huge piece of [C](https://en.wikipedia.org/wiki/C_(programming_language)) code which consists from many different subsystems. Each subsystem has its own purpose which is independent of other subsystems. But often one subsystem wants to know something from other subsystem(s). There is special mechanism in the Linux kernel which allows to solve this problem partly. The name of this mechanism is - `notification chains` and its main purpose to provide a way for different subsystems to subscribe on asynchronous events from other subsystems. Note that this mechanism is only for communication inside kernel, but there are other mechanisms for communication between kernel and userspace.
|
||||
The Linux kernel is huge piece of [C](https://en.wikipedia.org/wiki/C_%28programming_language%29) code which consists from many different subsystems. Each subsystem has its own purpose which is independent of other subsystems. But often one subsystem wants to know something from other subsystem(s). There is special mechanism in the Linux kernel which allows to solve this problem partly. The name of this mechanism is - `notification chains` and its main purpose to provide a way for different subsystems to subscribe on asynchronous events from other subsystems. Note that this mechanism is only for communication inside kernel, but there are other mechanisms for communication between kernel and userspace.
|
||||
|
||||
Before we will consider `notification chains` [API](https://en.wikipedia.org/wiki/Application_programming_interface) and implementation of this API, let's look at `Notification chains` mechanism from theoretical side as we did it in other parts of this book. Everything which is related to `notification chains` mechanism is located in the [include/linux/notifier.h](https://github.com/torvalds/linux/blob/master/include/linux/notifier.h) header file and [kernel/notifier.c](https://github.com/torvalds/linux/blob/master/kernel/notifier.c) source code file. So let's open them and start to dive.
|
||||
|
||||
Notification Chains related data structures
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
Let's start to consider `notification chains` mechanism from related data structures. As I wrote above, main data structures should be located in the [include/linux/notifier.h](https://github.com/torvalds/linux/blob/master/include/linux/notifier.h) header file, so the Linux kernel provides generic API which does not depend on certain architecture. In general, the `notification chains` mechanism represents a list (that's why it named `chains`) of [callback](https://en.wikipedia.org/wiki/Callback_(computer_programming)) functions which are will be executed when an event will be occurred.
|
||||
Let's start to consider `notification chains` mechanism from related data structures. As I wrote above, main data structures should be located in the [include/linux/notifier.h](https://github.com/torvalds/linux/blob/master/include/linux/notifier.h) header file, so the Linux kernel provides generic API which does not depend on certain architecture. In general, the `notification chains` mechanism represents a list (that's why it named `chains`) of [callback](https://en.wikipedia.org/wiki/Callback_%28computer_programming%29) functions which are will be executed when an event will be occurred.
|
||||
|
||||
All of these callback functions are represented as `notifier_fn_t` type in the Linux kernel:
|
||||
|
||||
@ -355,9 +355,9 @@ That's all.
|
||||
Links
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
* [C programming langauge](https://en.wikipedia.org/wiki/C_(programming_language))
|
||||
* [C programming language](https://en.wikipedia.org/wiki/C_%28programming_language%29)
|
||||
* [API](https://en.wikipedia.org/wiki/Application_programming_interface)
|
||||
* [callback](https://en.wikipedia.org/wiki/Callback_(computer_programming))
|
||||
* [callback](https://en.wikipedia.org/wiki/Callback_%28computer_programming%29)
|
||||
* [RCU](https://en.wikipedia.org/wiki/Read-copy-update)
|
||||
* [spinlock](https://0xax.gitbooks.io/linux-insides/content/SyncPrim/linux-sync-1.html)
|
||||
* [loadable modules](https://en.wikipedia.org/wiki/Loadable_kernel_module)
|
||||
|
@ -666,3 +666,4 @@ Links
|
||||
* [Documentation](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/Documentation/kbuild/makefiles.txt)
|
||||
* [System.map](https://en.wikipedia.org/wiki/System.map)
|
||||
* [Relocation](https://en.wikipedia.org/wiki/Relocation_%28computing%29)
|
||||
* [The Linux Kernel](https://www.kernel.org/doc/html/latest/driver-api/device_link.html)
|
||||
|
@ -126,3 +126,4 @@ Thank you to all contributors:
|
||||
* [Junsoo Lee](https://github.com/junsooo)
|
||||
* [SeongJae Park](https://github.com/sjp38)
|
||||
* [Stefan20162016](https://github.com/stefan20162016)
|
||||
* [Marco Torsello](https://github.com/md1512)
|
Loading…
Reference in New Issue
Block a user