mirror of
https://github.com/0xAX/linux-insides.git
synced 2025-01-21 21:21:18 +00:00
Improve English throughout the file
parent
b9f1115479
commit
58888011c3
@ -4,7 +4,7 @@ GNU/Linux kernel internals
|
||||
Linux kernel booting process. Part 1.
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
If you have read my previous [blog posts](http://0xax.blogspot.com/search/label/asm), you can see that some time ago I started to get involved with low-level programming. I wrote some posts about x86_64 assembly programming for Linux. At the same time, I started to dive into the GNU/Linux kernel source code. It is very interesting for me to understand how low-level things work, how programs run on my computer, how they are located in memory, how the kernel manages processes and memory, how the network stack works on low-level and many many other things. I decided to write yet another series of posts about the GNU/Linux kernel for **x86_64**.
|
||||
If you read my previous [blog posts](http://0xax.blogspot.com/search/label/asm), you'll see that some time ago I started to get involved with low-level programming. I wrote some posts about x86_64 assembly programming for Linux. At the same time, I started to dive into the GNU/Linux kernel source code. It is very interesting for me to understand how low-level things work: how programs run on my computer, how they are located in memory, how the kernel manages processes and memory, how the network stack works at a low level, and many other things. I decided to write yet another series of posts about the GNU/Linux kernel for **x86_64** processors.
|
||||
|
||||
Note that I'm not a professional kernel hacker, and I don't write code for the kernel at work. It's just a hobby. I just like low-level stuff, and it is interesting for me to see how these things work. So if you notice anything confusing, or if you have any questions or remarks, ping me on Twitter [0xAX](https://twitter.com/0xAX), drop me an [email](anotherworldofworld@gmail.com) or just create an [issue](https://github.com/0xAX/linux-internals/issues/new). I appreciate it. All posts will also be accessible at [linux-internals](https://github.com/0xAX/linux-internals), and if you find something wrong with my English or the post content, feel free to send a pull request.
|
||||
|
||||
@ -16,17 +16,16 @@ Note that I'm not a professional kernel hacker, and I don't write code for the k
|
||||
* Understanding C code
|
||||
* Understanding assembly code (AT&T syntax)
|
||||
|
||||
Anyway, if you just started to learn some tools, I will try to explain some parts during this and following posts. Ok, little introduction finished and now we can start to dive into kernel and low-level stuff.
|
||||
Anyway, if you have just started to learn some of these tools, I will try to explain some parts during this and the following posts. Ok, enough with the introductions. Let's dive into the kernel and low-level stuff.
|
||||
|
||||
All code is actual for kernel - 3.18, if there will be changes, I will update posts.
|
||||
All code is for kernel 3.18. If anything changes, I will update the posts.
|
||||
|
||||
Magic power button, what's next?
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
Despite that it is series of posts about linux kernel, we will not start from kernel code (at least in this paragraph). Ok, you pressed magic power button on your laptop or desktop computer and it started to work. After this mother board sends signal to the [power supply](http://en.wikipedia.org/wiki/Power_supply) which provides computer with the proper amount of electricity. Once motherboard receives [power good signal](http://en.wikipedia.org/wiki/Power_good_signal), it tries to run CPU. CPU resets all leftover data in its registers and sets up predefined values for every register.
|
||||
Although this is a series of posts about the Linux kernel, we will not start with kernel code (at least not in this paragraph). Ok, so you pressed the magic power button on your laptop or desktop computer and it started up. The motherboard sends a signal to the [power supply](http://en.wikipedia.org/wiki/Power_supply), which provides the computer with the proper amount of electricity. Once the motherboard receives the [power good signal](http://en.wikipedia.org/wiki/Power_good_signal), it tries to start the CPU. The CPU then resets all of the leftover data in its registers and sets up predefined values for every register.
|
||||
|
||||
|
||||
[80386](http://en.wikipedia.org/wiki/Intel_80386) and later CPUs defines following predifined data in CPU registers after computer resets:
|
||||
[80386](http://en.wikipedia.org/wiki/Intel_80386) and later CPUs define the following predefined data in CPU registers after the computer resets:
|
||||
|
||||
```
|
||||
IP 0xfff0
|
||||
@ -34,13 +33,13 @@ CS selector 0xf000
|
||||
CS base 0xffff0000
|
||||
```
|
||||
|
||||
Processor works in the [real mode](http://en.wikipedia.org/wiki/Real_mode) now and we need to make a little retreat for understanding memory segmentation in this mode. Real mode is supported in all x86 compatible processors, from [8086](http://en.wikipedia.org/wiki/Intel_8086) to modern intel 64 CPUs. 8086 processor had 20 bit addres bus, which means that it could work with 0-2^20 bytes address space (1 megabyte). But it had only 16 bit registers, and with 16 bit registers maximum address is 2^16 or 0xffff (640 KB). Memory segmentation was used to make use of all of the addres space. All memory was divided into small fixed-size segments of 65535 bytes, or 64 KB. Since we can not address memory behind 640 KB with 16 bit register, another method to do it has been devised. Address consists of two parts: beginning address of segment and offset from the beginning of this segment. To get physical address in memory, we need to multiply segment part by 16 and add offset part:
|
||||
The processor begins working in [real mode](http://en.wikipedia.org/wiki/Real_mode), and we need to take a small detour to understand memory segmentation in this mode. Real mode is supported in all x86-compatible processors, from the [8086](http://en.wikipedia.org/wiki/Intel_8086) to modern Intel CPUs. The 8086 processor had a 20-bit address bus, which means that it could work with an address space of 2^20 bytes (1 MB). However, it had only 16-bit registers, and with a 16-bit register the maximum address is 2^16 - 1, or 0xffff (64 KB). Memory segmentation was used to make use of the whole address space. All memory is divided into small fixed-size segments of 65536 bytes, or 64 KB, each. Since we cannot address memory beyond 64 KB with a 16-bit register, another method was devised. An address consists of two parts: the beginning address of the segment and the offset from the beginning of this segment. To get the physical address in memory, we need to multiply the segment part by 16 and add the offset part:
|
||||
|
||||
```
|
||||
PhysicalAddress = Segment * 16 + Offset
|
||||
```
|
||||
|
||||
For example `CS:IP` is `0x2000:0x0010`, physical address will be:
|
||||
For example, if `CS:IP` is `0x2000:0x0010`, the physical address will be:
|
||||
|
||||
```python
|
||||
>>> hex((0x2000 << 4) + 0x0010)
|
||||
@ -54,24 +53,24 @@ But if we take the biggest segment part and offset: `0xffff:0xffff`, it will be:
|
||||
'0x10ffef'
|
||||
```
|
||||
|
||||
which is 65519 bytes over first megabyte. Since only one megabyte is accessible in real mode, `0x10ffef` becomes `0x00ffef` with disabled [A20](http://en.wikipedia.org/wiki/A20_line).
|
||||
which is 65519 bytes over the first megabyte. Since only one megabyte is accessible in real mode, `0x10ffef` becomes `0x00ffef` when the [A20](http://en.wikipedia.org/wiki/A20_line) line is disabled.
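
The translation and the wraparound can be sketched in a few lines of Python. This is only an illustration: the function name `real_mode_address` and the `a20_enabled` parameter are made up for this example, and the disabled A20 line is modelled simply as masking the result to 20 bits.

```python
def real_mode_address(segment: int, offset: int, a20_enabled: bool = False) -> int:
    """Translate a real-mode segment:offset pair into a physical address."""
    # Both parts are 16-bit values; the segment is multiplied by 16 (shifted by 4).
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if not a20_enabled:
        # Assumption of this sketch: with A20 disabled, address bit 20 is
        # forced to zero, so everything above 1 MB wraps around.
        linear &= 0xFFFFF
    return linear


print(hex(real_mode_address(0x2000, 0x0010)))                    # 0x20010
print(hex(real_mode_address(0xFFFF, 0xFFFF)))                    # 0xffef
print(hex(real_mode_address(0xFFFF, 0xFFFF, a20_enabled=True)))  # 0x10ffef
```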
|
||||
|
||||
Ok, now we know about real mode and memory addressing, let's get back to register values after reset.
|
||||
Ok, now that we know about real mode and memory addressing, let's get back to the register values after reset.
|
||||
|
||||
`CS` register has two parts: the visible segment selector and hidden base addres. We know predefined `CS` base and `IP` value, so our logical address will be:
|
||||
The `CS` register has two parts: the visible segment selector and the hidden base address. We know the predefined values of the `CS` base and `IP`, so our logical address will be:
|
||||
|
||||
```
|
||||
0xffff0000:0xfff0
|
||||
```
|
||||
|
||||
which we can translate to the physical address::
|
||||
which we can translate to the physical address:
|
||||
|
||||
```python
|
||||
>>> hex((0xffff000 << 4) + 0xfff0)
|
||||
'0xfffffff0'
|
||||
```
|
||||
|
||||
We get `fffffff0` which is 4GB - 16 bytes. This point is the [Reset vector](http://en.wikipedia.org/wiki/Reset_vector). This is the memory location at which CPU expects to find the first instruction to execute after reset. It contains [jump](http://en.wikipedia.org/wiki/JMP_%28x86_instruction%29) instruction which usually points to the BIOS entry point. For example if we look in [coreboot](http://www.coreboot.org/) source code, we will see it:
|
||||
We get `fffffff0`, which is 4GB - 16 bytes. This point is the [Reset vector](http://en.wikipedia.org/wiki/Reset_vector). This is the memory location at which the CPU expects to find the first instruction to execute after reset. It contains a [jump](http://en.wikipedia.org/wiki/JMP_%28x86_instruction%29) instruction which usually points to the BIOS entry point. For example, if we look into the [coreboot](http://www.coreboot.org/) source code, we see:
|
||||
|
||||
```assembly
|
||||
.section ".reset"
|
||||
@ -83,7 +82,7 @@ reset_vector:
|
||||
...
|
||||
```
|
||||
|
||||
We can see here jump instruction [opcode](http://ref.x86asm.net/coder32.html#xE9) - 0xe9 to the address `_start - ( . + 2)`. And we can see that `reset` section is 16 bytes and starts at `0xfffffff0`:
|
||||
Here we can see the jump instruction [opcode](http://ref.x86asm.net/coder32.html#xE9) `0xe9` followed by the destination, written as `_start - ( . + 2)`. We can also see that the `reset` section is 16 bytes long and starts at `0xfffffff0`:
|
||||
|
||||
```
|
||||
SECTIONS {
|
||||
@ -97,7 +96,7 @@ SECTIONS {
|
||||
}
|
||||
```
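
The expression `_start - ( . + 2)` in the reset vector code is a relative displacement: the jump target is measured from the end of the jump instruction, not from address zero. Here is a minimal Python sketch of that arithmetic, assuming a 16-bit near jump (opcode `0xe9` followed by a 2-byte displacement). The function names and the target address `0xe05b` are made up for the illustration and are not taken from coreboot.

```python
import struct


def encode_near_jmp16(jmp_address: int, target: int) -> bytes:
    """Encode `jmp target` as opcode 0xe9 plus a 16-bit relative displacement."""
    # `.` in the assembly source is the address of the displacement field,
    # which sits one byte after the opcode.
    dot = jmp_address + 1
    # The displacement is relative to the end of the instruction (dot + 2).
    displacement = (target - (dot + 2)) & 0xFFFF
    return bytes([0xE9]) + struct.pack("<H", displacement)


def decode_near_jmp16(jmp_address: int, encoded: bytes) -> int:
    """Return the 16-bit offset that the encoded jump lands on."""
    if encoded[0] != 0xE9:
        raise ValueError("not a near jmp")
    (displacement,) = struct.unpack("<h", encoded[1:3])
    # The instruction is 3 bytes long: 1 opcode byte + 2 displacement bytes.
    return (jmp_address + 3 + displacement) & 0xFFFF


# IP is 0xfff0 after reset; 0xe05b is a hypothetical entry point.
code = encode_near_jmp16(0xFFF0, 0xE05B)
print(code.hex())                             # e968e0
print(hex(decode_near_jmp16(0xFFF0, code)))   # 0xe05b
```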
|
||||
|
||||
Now BIOS started to work, after all initializations, hardware checking, it needs to load operating system. BIOS tries to find bootable device which contains boot sector. Boot sector is the first sector on device (512 bytes) and contains sequence of `0x55` and `0xaa` at 511 and 512 byte. For example:
|
||||
Now the BIOS has begun its work, and after all the required initializations and hardware checks, it needs to load the operating system. The BIOS tries to find a bootable device, which contains a boot sector. The boot sector is the first sector on the device (512 bytes) and contains the bytes `0x55` and `0xaa` at the **511**th and **512**th byte. For example:
|
||||
|
||||
```assembly
|
||||
[BITS 16]
|
||||
@ -106,21 +105,20 @@ Now BIOS started to work, after all initializations, hardware checking, it needs
|
||||
jmp boot
|
||||
|
||||
boot:
|
||||
mov al, '!'
|
||||
mov ah, 0x0e
|
||||
mov bh, 0x00
|
||||
mov bl, 0x07
|
||||
mov al, '!'
|
||||
|
||||
int 0x10
|
||||
int 0x10
|
||||
jmp $
|
||||
|
||||
times 510-($-$$) db 0
|
||||
|
||||
db 0x55
|
||||
db 0xaa
|
||||
db 0x55
|
||||
```
|
||||
|
||||
Build and run it with:
|
||||
Build and run this code with the following command:
|
||||
|
||||
```
|
||||
nasm -f bin boot.nasm && qemu-system-x86_64 boot
|
||||
@ -130,26 +128,26 @@ We will see:
|
||||
|
||||
![Simple bootloader which prints only `!`](http://oi60.tinypic.com/2qbwup0.jpg)
|
||||
|
||||
In this example we can see that this code will be executed in 16 bit real mode and will start at 0x7c00 in memory. After the start it calls [0x10](http://www.ctyme.com/intr/rb-0106.htm) interrupt which just prints `!` symbol. It fills rest of 510 bytes with zeros and finish with two magic bytes 0xaa and 0x55.
|
||||
In this example we can see that this code will be executed in 16-bit real mode and will start at 0x7c00 in memory. After the start it calls the [0x10](http://www.ctyme.com/intr/rb-0106.htm) interrupt, which just prints the `!` symbol. It fills the rest of the first 510 bytes with zeros and finishes with the two magic bytes `0x55` and `0xaa`.
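
The layout of such a boot sector can also be checked from the outside. The sketch below is a rough Python equivalent of the padding done by `times 510-($-$$) db 0` plus the signature check; `make_boot_sector` and `is_bootable` are hypothetical helper names, and this is not how any particular BIOS implements the check.

```python
BOOT_SECTOR_SIZE = 512
SIGNATURE = b"\x55\xaa"  # byte 511 is 0x55, byte 512 is 0xaa (counting from 1)


def make_boot_sector(code: bytes) -> bytes:
    """Pad machine code with zeros up to 510 bytes and append the signature."""
    if len(code) > BOOT_SECTOR_SIZE - len(SIGNATURE):
        raise ValueError("code does not fit into a boot sector")
    return code.ljust(BOOT_SECTOR_SIZE - len(SIGNATURE), b"\x00") + SIGNATURE


def is_bootable(sector: bytes) -> bool:
    """Check for the 0x55, 0xaa pair at the end of the first 512 bytes."""
    return len(sector) >= BOOT_SECTOR_SIZE and sector[510:512] == SIGNATURE


# 0xeb 0xfe is a short jump to itself, the same as `jmp $`.
sector = make_boot_sector(b"\xeb\xfe")
print(len(sector), is_bootable(sector))   # 512 True
print(is_bootable(bytes(512)))            # False
```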
|
||||
|
||||
Real world boot loader starts at the same point, ends with `0xaa55` bytes, but reads kernel code from device, loads it to memory, parses and passes boot parameters to kernel and etc... intead of printing one symbol :) Ok, so, from this moment bios handed control to the operating system bootloader and we can go ahead.
|
||||
In the real world, a bootloader starts at the same point and ends with the `0xaa55` bytes, but instead of printing one symbol it reads kernel code from the device, loads it into memory, parses and passes boot parameters to the kernel, and so on :) Ok, so from this moment the BIOS has handed control to the operating system bootloader and we can go ahead.
|
||||
|
||||
**NOTE**: as you can read above CPU is in real mode. In real mode for calculating physical address in memory uses following form:
|
||||
**NOTE**: as you can read above, the CPU is in real mode. In real mode, the physical address in memory is calculated as follows:
|
||||
|
||||
```
|
||||
PhysicalAddress = Segment * 16 + Offset
|
||||
```
|
||||
|
||||
as I wrote above. But we have only 16 bit general purpose registers. The maximum value of 16 bit register is: `0xffff`; So if we take the biggest values, it will be:
|
||||
as I wrote above. But we have only 16-bit general purpose registers. The maximum value of a 16-bit register is `0xffff`, so if we take the biggest values, we get:
|
||||
|
||||
```python
|
||||
>>> hex((0xffff * 16) + 0xffff)
|
||||
'0x10ffef'
|
||||
```
|
||||
|
||||
Where `0x10ffef` is equal to `1mb + 64KB - 16b`. But [8086](http://en.wikipedia.org/wiki/Intel_8086) processor, which was first processor with real mode, had 20 bit address line, and `2^20 = 1048576.0` which is 1MB, so it means that actually available memory amount is 1MB.
|
||||
Here `0x10ffef` is the highest address we can form, so the reachable range is `1MB + 64KB - 16` bytes long. But the [8086](http://en.wikipedia.org/wiki/Intel_8086) processor, which was the first processor with real mode, had only 20 address lines, and `2^20 = 1048576` bytes is 1MB, so the amount of memory actually available was 1MB.
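
The numbers in this note are easy to double-check. The following is just the arithmetic written out in Python; the variable names are chosen for this example.

```python
KB = 2 ** 10
MB = 2 ** 20

# Highest address that segment:offset arithmetic can produce.
highest_address = (0xFFFF * 16) + 0xFFFF
# Addresses start at zero, so the reachable range is one byte longer.
reachable_bytes = highest_address + 1

print(hex(highest_address))                      # 0x10ffef
print(reachable_bytes == MB + 64 * KB - 16)      # True
print(highest_address - MB)                      # 65519
print(2 ** 20)                                   # 1048576 bytes with 20 address lines
```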
|
||||
|
||||
General real mode memory map is:
|
||||
The general real mode memory map is:
|
||||
|
||||
```
|
||||
0x00000000 - 0x000003FF - Real Mode Interrupt Vector Table
|
||||
@ -165,24 +163,24 @@ General real mode memory map is:
|
||||
0x000F0000 - 0x000FFFFF - System BIOS
|
||||
```
|
||||
|
||||
But stop, at the beginning of post I wrote that first instruction executed by CPU located by `0xfffffff0` address, which is much bigger than `0xffff` (1MB). How can CPU access it in real mode? As I write about and you can read in [coreboot](http://www.coreboot.org/Developer_Manual/Memory_map) documentation:
|
||||
But stop. At the beginning of this post I wrote that the first instruction executed by the CPU is located at address `0xfffffff0`, which is much bigger than `0xfffff` (1MB). How can the CPU access it in real mode? As you can read in the [coreboot](http://www.coreboot.org/Developer_Manual/Memory_map) documentation:
|
||||
|
||||
```
|
||||
0xFFFE_0000 - 0xFFFF_FFFF: 128 kilobyte ROM mapped into address space
|
||||
```
|
||||
|
||||
At the start of execution BIOS is not in RAM, it is located in ROM.
|
||||
At the start, the BIOS is not located in RAM, but in ROM.
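
As a quick sanity check, the reset vector does fall inside the ROM window quoted from the coreboot memory map. The constants below are copied from the text above; the variable names are only for this sketch.

```python
ROM_START = 0xFFFE0000
ROM_END = 0xFFFFFFFF       # inclusive
RESET_VECTOR = 0xFFFFFFF0

rom_size = ROM_END - ROM_START + 1
print(rom_size // 1024)                          # 128 (kilobytes)
print(ROM_START <= RESET_VECTOR <= ROM_END)      # True
print(ROM_END - RESET_VECTOR + 1)                # 16 bytes left for the reset code
```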
|
||||
|
||||
Bootloader
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
Now bios transfered control to the operating system bootlader and it needs to load operating system into the memory. There are a couple of bootloaders which can boot linux, like: [Grub2](http://www.gnu.org/software/grub/), [syslinux](http://www.syslinux.org/wiki/index.php/The_Syslinux_Project) and etc... Linux kernel has [Boot protocol](https://github.com/torvalds/linux/blob/master/Documentation/x86/boot.txt) which describes how to load linux kernel.
|
||||
Now the BIOS has transferred control to the operating system bootloader, which needs to load the operating system into memory. There are a number of bootloaders which can boot Linux, such as [Grub2](http://www.gnu.org/software/grub/), [syslinux](http://www.syslinux.org/wiki/index.php/The_Syslinux_Project), etc. The Linux kernel has a [Boot protocol](https://github.com/torvalds/linux/blob/master/Documentation/x86/boot.txt) which describes how to load the Linux kernel.
|
||||
|
||||
Let us briefly consider how grub loads linux. GRUB2 execution starts from `grub-core/boot/i386/pc/boot.S`. It starts to load from device its own kernel (not to be confused with linux kernel) and executes `grub_main` after successfully loading.
|
||||
Let us briefly consider how grub loads Linux. GRUB2 execution starts from `grub-core/boot/i386/pc/boot.S`. It loads its own kernel from the device (**not to be confused with the Linux kernel**) and executes `grub_main` after it has loaded successfully.
|
||||
|
||||
`grub_main` initializes console, gets base address for modules, sets root device, loads/parses grub configuration file, loads modules etc... At the end of execution `grub_main` moves grub to normal mode. `grub_normal_execute` (from `grub-core/normal/main.c`) completes last preparation and shows menu for selecting operating system. When we select one of grub menu entries, `grub_menu_execute_entry` begins to be executed, which executes grub `boot` command. It starts to boot operating system.
|
||||
`grub_main` initializes the console, gets the base address for the modules, sets the root device, loads/parses the grub configuration file, etc. At the end of its execution, `grub_main` moves grub to normal mode. `grub_normal_execute` (from `grub-core/normal/main.c`) completes the final preparation steps and shows the menu for selecting the operating system. When we select one of grub's menu entries, `grub_menu_execute_entry` is executed, which runs grub's `boot` command. Finally, it starts to boot the operating system.
|
||||
|
||||
As we can read in the kernel boot protocol, bootloader must read and fill some fields of kernel setup header which starts at `0x01f1` offset from the kernel setup code. Kernel header [arch/x86/boot/header.S](https://github.com/torvalds/linux/blob/master/arch/x86/boot/header.S) starts from:
|
||||
As we can read in the kernel boot protocol, the bootloader must read and fill some fields of the kernel setup header, which starts at offset `0x01f1` from the kernel setup code. The kernel header [arch/x86/boot/header.S](https://github.com/torvalds/linux/blob/master/arch/x86/boot/header.S) starts with:
|
||||
|
||||
```assembly
|
||||
.globl hdr
|
||||
@ -196,9 +194,9 @@ hdr:
|
||||
boot_flag: .word 0xAA55
|
||||
```
|
||||
|
||||
Bootloader must fill this and the rest of headers (only marked as `write` in the linux boot protocol, for example [this](https://github.com/torvalds/linux/blob/master/Documentation/x86/boot.txt#L354)) with values which it either got from command line or calculated. We will not see description and explanation of all fields of kernel setup header, we will get back to it when kernel uses it. Anyway, you can find description of any field in the [boot protocol](https://github.com/torvalds/linux/blob/master/Documentation/x86/boot.txt#L156).
|
||||
The bootloader must fill this and the rest of the headers (only those marked as `write` in the Linux boot protocol, for example [this](https://github.com/torvalds/linux/blob/master/Documentation/x86/boot.txt#L354)) with values which it either got from the command line or calculated. We will not go through the description and explanation of all the fields of the kernel setup header here; we will get back to them when the kernel uses them. However, you can find the description of any field in the [boot protocol](https://github.com/torvalds/linux/blob/master/Documentation/x86/boot.txt#L156).
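
To get a feeling for the setup header, here is a small Python sketch that reads a few of its fields from a kernel image. The offsets (`setup_sects` at `0x1f1`, `boot_flag` at `0x1fe`, the `HdrS` magic at `0x202`, `version` at `0x206`, `loadflags` at `0x211`) are my reading of the boot protocol document, so check them against it before relying on them. `parse_setup_header` is a made-up helper, and the image used here is synthetic, not a real kernel.

```python
import struct

SETUP_SECTS_OFFSET = 0x1F1   # start of the setup header
BOOT_FLAG_OFFSET = 0x1FE
HEADER_MAGIC_OFFSET = 0x202
VERSION_OFFSET = 0x206
LOADFLAGS_OFFSET = 0x211


def parse_setup_header(image: bytes) -> dict:
    """Read a handful of setup header fields from the start of a kernel image."""
    if len(image) <= LOADFLAGS_OFFSET:
        raise ValueError("image is too short to contain a setup header")
    (boot_flag,) = struct.unpack_from("<H", image, BOOT_FLAG_OFFSET)
    (version,) = struct.unpack_from("<H", image, VERSION_OFFSET)
    return {
        "setup_sects": image[SETUP_SECTS_OFFSET],
        "boot_flag": boot_flag,
        "magic": image[HEADER_MAGIC_OFFSET:HEADER_MAGIC_OFFSET + 4],
        "version": version,
        "loadflags": image[LOADFLAGS_OFFSET],
    }


# Build a fake 1 KB image just to exercise the parser.
fake = bytearray(1024)
fake[SETUP_SECTS_OFFSET] = 4
struct.pack_into("<H", fake, BOOT_FLAG_OFFSET, 0xAA55)
fake[HEADER_MAGIC_OFFSET:HEADER_MAGIC_OFFSET + 4] = b"HdrS"
struct.pack_into("<H", fake, VERSION_OFFSET, 0x020D)
fake[LOADFLAGS_OFFSET] = 0x81

header = parse_setup_header(bytes(fake))
print(hex(header["boot_flag"]), header["magic"], hex(header["version"]))
```

With a real image you would pass the first kilobyte or so of the `bzImage` file instead of `fake`.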
|
||||
|
||||
As we can see in kernel boot protocol, memory map will be following after kernel loading:
|
||||
As we can see in the kernel boot protocol, the memory map will be the following after the kernel is loaded:
|
||||
|
||||
```shell
|
||||
| Protected-mode kernel |
|
||||
@ -225,24 +223,24 @@ X+08000 +------------------------+
|
||||
|
||||
```
|
||||
|
||||
So after bootloader trasferred control to the kernel, it starts somewhere at:
|
||||
So after the bootloader has transferred control to the kernel, it starts somewhere at:
|
||||
|
||||
```
|
||||
0x1000 + X + sizeof(KernelBootSector) + 1
|
||||
```
|
||||
|
||||
where `X` is the address kernel bootsector loaded. In my case `X` is `0x10000` (), we can see it in memory dump:
|
||||
where `X` is the address at which the kernel boot sector is loaded. In my case `X` is `0x10000`, which we can see in the memory dump:
|
||||
|
||||
![kernel first address](http://oi57.tinypic.com/16bkco2.jpg)
|
||||
|
||||
Ok, bootloader loaded linux kernel into memory, filled header fields and jumped to it. Now we can move directly to the kernel setup code.
|
||||
Ok, the bootloader has successfully loaded the linux kernel into memory, filled the required header fields and jumped to it. Now we can move directly to the kernel setup code.
|
||||
|
||||
Start of kernel setup
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
Finally we are in the kernel. Technically kernel didn't run yet, first of all we need to setup kernel, memory manager, process manager and etc... Kernel setup execution starts from [arch/x86/boot/header.S](https://github.com/torvalds/linux/blob/master/arch/x86/boot/header.S) at the [_start](https://github.com/torvalds/linux/blob/master/arch/x86/boot/header.S#L293). It is little strange at the first look, there are many instructions before it. Actually....
|
||||
Finally we are in the kernel. Technically the kernel hasn't run yet; first of all we need to set up the kernel, memory manager, process manager, etc. Kernel setup execution starts from [arch/x86/boot/header.S](https://github.com/torvalds/linux/blob/master/arch/x86/boot/header.S) at [_start](https://github.com/torvalds/linux/blob/master/arch/x86/boot/header.S#L293). It looks a little strange at first, since there are many instructions before it. Actually....
|
||||
|
||||
Long time ago linux had its own bootloader, but now if you run for example:
|
||||
A long time ago, Linux had its own bootloader, but now if you run, for example:
|
||||
|
||||
```
|
||||
qemu-system-x86_64 vmlinuz-3.18-generic
|
||||
@ -268,7 +266,7 @@ pe_header:
|
||||
.word 0
|
||||
```
|
||||
|
||||
It needs this for loading operating system with [UEFI](http://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface). Here we will not see how it works (will look into it in the next parts).
|
||||
It needs this for loading the operating system with [UEFI](http://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface). We will not see how it works here (we will look into it in the next parts).
|
||||
|
||||
So the actual kernel setup entry point is:
|
||||
|
||||
@ -278,7 +276,7 @@ So actual kernel setup entry point is:
|
||||
_start:
|
||||
```
|
||||
|
||||
Bootloader (grub2 and others) knows about this point (`0x200` offset from `MZ`) and makes a jump directly to this point, despite the fact that `header.S` starts from `.bstext` section which prints error message:
|
||||
The bootloader (grub2 and others) knows about this point (offset `0x200` from `MZ`) and jumps directly to it, despite the fact that `header.S` starts with the `.bstext` section, which prints an error message:
|
||||
|
||||
```
|
||||
//
|
||||
@ -289,7 +287,7 @@ Bootloader (grub2 and others) knows about this point (`0x200` offset from `MZ`)
|
||||
.bsdata : { *(.bsdata) }
|
||||
```
|
||||
|
||||
So kernel setup entry point is:
|
||||
So the kernel setup entry point is:
|
||||
|
||||
```assembly
|
||||
.globl _start
|
||||
@ -302,37 +300,37 @@ _start:
|
||||
//
|
||||
```
|
||||
|
||||
Here we can see `jmp` instruction opcode - `0xeb` to the `start_of_setup-1f` point. `Nf` notation means following: `2f` refers to the next local `2:` label. In our case it is label `1` which goes right after jump. It contains rest of setup [header](https://github.com/torvalds/linux/blob/master/Documentation/x86/boot.txt#L156) and right after setup header we can see `.entrytext` section which starts at `start_of_setup` label.
|
||||
Here we can see the `jmp` instruction opcode `0xeb` followed by the jump distance `start_of_setup-1f`. The `Nf` notation means the following: `2f`, for example, refers to the next local `2:` label. In our case it is label `1`, which goes right after the jump. It contains the rest of the setup [header](https://github.com/torvalds/linux/blob/master/Documentation/x86/boot.txt#L156), and right after the setup header we can see the `.entrytext` section, which starts at the `start_of_setup` label.
|
||||
|
||||
Actually it's first code which starts to execute besides previous jump instruction. After kernel setup got the control from bootloader, first `jmp` instruction is located at `0x200` (first 512 bytes) offset from the start of kernel real mode. This we can read in linux kernel boot protocol and also see in grub2 source code:
|
||||
Actually, this is the first code that runs (aside from the previous jump instruction, of course). After the kernel setup gets control from the bootloader, the first `jmp` instruction is located at offset `0x200` (right after the first 512 bytes) from the start of the kernel real-mode code. We can read this in the Linux kernel boot protocol and also see it in the grub2 source code:
|
||||
|
||||
```C
|
||||
state.gs = state.fs = state.es = state.ds = state.ss = segment;
|
||||
state.cs = segment + 0x20;
|
||||
```
|
||||
|
||||
It means that segment registers will have following values after kernel setup starts to work:
|
||||
It means that segment registers will have the following values after the kernel setup starts to work:
|
||||
|
||||
```
|
||||
fs = es = ds = ss = 0x1000
|
||||
cs = 0x1020
|
||||
```
|
||||
|
||||
for my case when kernel loaded at `0x10000`.
|
||||
in my case, when the kernel is loaded at `0x10000`.
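
The relation between these segment values and the `0x200` offset can be written out in Python. This is only a sketch of the arithmetic, assuming the load address `0x10000` from my case; `grub_segments` is a made-up name and is not a grub function.

```python
def grub_segments(load_address: int) -> dict:
    """Mimic the grub2 snippet: data segments = segment, cs = segment + 0x20."""
    segment = load_address >> 4           # real-mode segment of the load address
    return {"ds": segment, "cs": segment + 0x20}


regs = grub_segments(0x10000)
print(hex(regs["ds"]), hex(regs["cs"]))   # 0x1000 0x1020

# cs:0 is the first instruction executed. 0x20 paragraphs are 0x200 bytes,
# so execution starts 512 bytes after the start of the setup code.
entry = regs["cs"] << 4
print(hex(entry), hex(entry - 0x10000))   # 0x10200 0x200
```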
|
||||
|
||||
After jump to `start_of_setup`, needs to do following things:
|
||||
After the jump to `start_of_setup`, the kernel needs to do the following things:
|
||||
|
||||
* Be sure that all values of all segement registers are equal
|
||||
* Setup correct stack if need
|
||||
* Make sure that all segment register values are equal
|
||||
* Set up a correct stack, if needed
|
||||
* Set up [bss](http://en.wikipedia.org/wiki/.bss)
|
||||
* Jump to C code at [main.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c)
|
||||
|
||||
Let's look at implementation.
|
||||
Let's look at the implementation.
|
||||
|
||||
Segment registers align
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
First of all it ensures that `ds` and `es` segment registers point to the same address and enables interrupts with `sti` instruction:
|
||||
First of all, it ensures that the `ds` and `es` segment registers point to the same address and enables interrupts with the `sti` instruction:
|
||||
|
||||
```assembly
|
||||
movw %ds, %ax
|
||||
@ -340,7 +338,7 @@ First of all it ensures that `ds` and `es` segment registers point to the same a
|
||||
sti
|
||||
```
|
||||
|
||||
As i wrote above, grub2 loads kernel setup code at `0x10000` address and `cs` at `0x0x1020` because execution doesn't start from the start of file, but from:
|
||||
As I wrote above, grub2 loads the kernel setup code at address `0x10000` and sets `cs` to `0x1020`, because execution doesn't start from the start of the file, but rather from:
|
||||
|
||||
```
|
||||
_start:
|
||||
@ -348,7 +346,7 @@ _start:
|
||||
.byte start_of_setup-1f
|
||||
```
|
||||
|
||||
jump, which is 512 bytes offset from the [4d 5a](https://github.com/torvalds/linux/blob/master/arch/x86/boot/header.S#L47). Also need to align `cs` from 0x10200 to 0x10000 as all other segement registers. After that we setup stack:
|
||||
jump, which is at a 512-byte offset from [4d 5a](https://github.com/torvalds/linux/blob/master/arch/x86/boot/header.S#L47). It also needs to align `cs` from 0x10200 to 0x10000, like all the other segment registers. After that, we set up the stack:
|
||||
|
||||
```assembly
|
||||
pushw %ds
|
||||
@ -356,12 +354,12 @@ jump, which is 512 bytes offset from the [4d 5a](https://github.com/torvalds/lin
|
||||
lretw
|
||||
```
|
||||
|
||||
push `ds` value to stack, and address of [6](https://github.com/torvalds/linux/blob/master/arch/x86/boot/header.S#L494) label and execute `lretw` instruction. When we call `lretw`, it loads address of `6` label to [instruction pointer](http://en.wikipedia.org/wiki/Program_counter) register and `cs` with value of `ds`. After it we will have `ds` and `cs` with the same values.
|
||||
This pushes the value of `ds` onto the stack, followed by the address of the [6](https://github.com/torvalds/linux/blob/master/arch/x86/boot/header.S#L494) label, and executes the `lretw` instruction. When `lretw` executes, it loads the address of label `6` into the [instruction pointer](http://en.wikipedia.org/wiki/Program_counter) register and loads `cs` with the value of `ds`. After this, `ds` and `cs` have the same values.
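
The `pushw`/`pushw`/`lretw` trick is easier to follow with a toy model. The Python sketch below keeps the registers in a dictionary and the stack in a list. It only models the order in which a 16-bit far return pops `ip` and then `cs`, and the offset used for label `6` is a hypothetical value.

```python
def pushw(stack: list, value: int) -> None:
    """Push a 16-bit word onto the model stack."""
    stack.append(value & 0xFFFF)


def lretw(stack: list, regs: dict) -> None:
    """16-bit far return: pop the offset into ip first, then the segment into cs."""
    regs["ip"] = stack.pop()
    regs["cs"] = stack.pop()


LABEL_6 = 0x02F0   # hypothetical offset of label 6 inside the setup code
regs = {"cs": 0x1020, "ds": 0x1000, "ip": 0x0000}
stack = []

pushw(stack, regs["ds"])   # pushw %ds
pushw(stack, LABEL_6)      # pushw $6f
lretw(stack, regs)         # lretw

print(hex(regs["cs"]), hex(regs["ip"]))   # 0x1000 0x2f0
```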
|
||||
|
||||
Stack setup
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
Actually, almost all of the setup code is preparation for C language environment in the real mode. Next [step](https://github.com/torvalds/linux/blob/master/arch/x86/boot/header.S#L467) is checking of `ss` register value and making of correct stack if `ss` is wrong:
|
||||
Actually, almost all of the setup code is preparation for the C language environment in real mode. The next [step](https://github.com/torvalds/linux/blob/master/arch/x86/boot/header.S#L467) is checking the `ss` register value and setting up a correct stack if `ss` is wrong:
|
||||
|
||||
```assembly
|
||||
movw %ss, %dx
|
||||
@ -372,13 +370,13 @@ Actually, almost all of the setup code is preparation for C language environment
|
||||
|
||||
This can lead to 3 different cases:
|
||||
|
||||
* `ss` has valid value 0x10000 (as all other segment registers beside `cs`)
|
||||
* `ss` is invlalid and `CAN_USE_HEAP` flag is set (see below)
|
||||
* `ss` is invlalid and `CAN_USE_HEAP` flag is not set (see below)
|
||||
* `ss` has a valid value of 0x10000 (as do all the other segment registers besides `cs`)
|
||||
* `ss` is invalid and `CAN_USE_HEAP` flag is set (see below)
|
||||
* `ss` is invalid and `CAN_USE_HEAP` flag is not set (see below)
|
||||
|
||||
Let's look at all of these cases:
|
||||
Let's look at all of these cases:
|
||||
|
||||
1. `ss` has a correct address (0x10000). In this case we go to [2](https://github.com/torvalds/linux/blob/master/arch/x86/boot/header.S#L481) label:
|
||||
1. `ss` has a correct address (0x10000). If this is the case, we go to the [2](https://github.com/torvalds/linux/blob/master/arch/x86/boot/header.S#L481) label:
|
||||
|
||||
```
|
||||
2: andw $~3, %dx
|
||||
@ -389,11 +387,11 @@ Let's look at all of these cases:
|
||||
sti
|
||||
```
|
||||
|
||||
Here we can see aligning of `dx` (contains `sp` given by bootloader) to 4 bytes and checking that it is not zero. If it is zero we put `0xfffc` (4 byte aligned address before maximum segment size - 64 KB) to `dx`. If it is not zero we continue to use `sp` given by bootloader (0xf7f4 in my case). After this we put `ax` value to `ss` which stores correct segment address `0x10000` and set up correct `sp`. After it we have correct stack:
|
||||
Here we can see `dx` (which contains the `sp` value given by the bootloader) being aligned to 4 bytes and checked for zero. If it is zero, we put `0xfffc` (the last 4-byte aligned address within the maximum segment size of 64 KB) into `dx`. If it is not zero, we continue to use the `sp` value given by the bootloader (0xf7f4 in my case). After this we put the `ax` value into `ss`, which then stores the correct segment address `0x10000`, and set up the correct `sp`. Now we have a correct stack:
|
||||
|
||||
![stack](http://oi58.tinypic.com/16iwcis.jpg)
|
||||
|
||||
2. In the second case (`ss` != `ds`), first of all put [_end](https://github.com/torvalds/linux/blob/master/arch/x86/boot/setup.ld#L52) (address of end of setup code) value in `dx`. And check `loadflags` header field with `testb` instruction too see if we can use heap or not. [loadflags](https://github.com/torvalds/linux/blob/master/arch/x86/boot/header.S#L321) is a bitmask header which is defined as:
|
||||
2. In the second case (`ss` != `ds`), we first put the value of [_end](https://github.com/torvalds/linux/blob/master/arch/x86/boot/setup.ld#L52) (the address of the end of the setup code) into `dx`. Then we check the `loadflags` header field with the `testb` instruction to determine whether we can use the heap or not. [loadflags](https://github.com/torvalds/linux/blob/master/arch/x86/boot/header.S#L321) is a bitmask header which is defined as:
|
||||
|
||||
```C
|
||||
#define LOADED_HIGH (1<<0)
|
||||
@ -402,7 +400,7 @@ Here we can see aligning of `dx` (contains `sp` given by bootloader) to 4 bytes
|
||||
#define CAN_USE_HEAP (1<<7)
|
||||
```
|
||||
|
||||
And as we can read in the boot protocol:
|
||||
And as we read in the boot protocol:
|
||||
|
||||
```
|
||||
Field name: loadflags
|
||||
@ -415,27 +413,27 @@ Field name: loadflags
|
||||
functionality will be disabled.
|
||||
```
|
||||
|
||||
If `CAN_USE_HEAP` bit is set, put `heap_end_ptr` to `dx` which points to `_end` and add `STACK_SIZE` (minimal stack size - 512 bytes) to it. After this if `dx` is not carry, jump to `2` (it will be not carry, dx = _end + 512) label as in previous case and make correct stack.
|
||||
If the `CAN_USE_HEAP` bit is set, we put `heap_end_ptr` into `dx`, which points to `_end`, and add `STACK_SIZE` (the minimal stack size, 512 bytes) to it. After this, if the addition did not set the carry flag (it will not: dx = _end + 512), we jump to label `2` as in the previous case and make a correct stack.
|
||||
|
||||
![stack](http://oi62.tinypic.com/dr7b5w.jpg)
|
||||
|
||||
3. The last case when `CAN_USE_HEAP` is not set, we just use minimal stack from `_end` to `_end + STACK_SIZE`:
|
||||
3. In the last case, when `CAN_USE_HEAP` is not set, we just use a minimal stack from `_end` to `_end + STACK_SIZE`:
|
||||
|
||||
![minimal stack](http://oi60.tinypic.com/28w051y.jpg)
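
The three cases above can be summed up in a small C model. This is only a sketch for illustration, not the kernel's code: the function name `pick_stack_pointer` and its parameters are hypothetical, `STACK_SIZE` uses the 512 bytes quoted above, and resetting `dx` to zero on carry is my assumption about how wraparound is handled before the alignment step.

```c
#include <stdint.h>

#define CAN_USE_HEAP (1 << 7)   /* loadflags bit, as defined above */
#define STACK_SIZE   512        /* minimal stack size quoted in the text */

/* Hypothetical helper: returns the value that would end up in sp. */
static uint16_t pick_stack_pointer(uint16_t ss, uint16_t ds, uint16_t sp,
                                   uint8_t loadflags, uint16_t heap_end_ptr,
                                   uint16_t end)
{
    uint16_t dx;

    if (ss == ds) {
        /* Case 1: ss is valid, keep the sp given by the bootloader. */
        dx = sp;
    } else {
        /* Cases 2 and 3: build a new stack above the heap end or above _end. */
        uint16_t base = (loadflags & CAN_USE_HEAP) ? heap_end_ptr : end;
        uint32_t sum = (uint32_t)base + STACK_SIZE;

        /* Assumption: on 16-bit carry, fall back to 0 and let the code below fix it. */
        dx = (sum > 0xffff) ? 0 : (uint16_t)sum;
    }

    dx &= (uint16_t)~3u;        /* align down to 4 bytes */
    if (dx == 0)
        dx = 0xfffc;            /* never hand out a zero stack pointer */
    return dx;
}
```

With `ss == ds` and `sp = 0xf7f4` (the value from my bootloader), the function returns `0xf7f4` unchanged, because that value is already 4-byte aligned and not zero.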
|
||||
|
||||
Bss setup
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
Last two steps before we can jump to see code need to setup [bss](http://en.wikipedia.org/wiki/.bss) and check magic signature. Signature checking:
|
||||
The last two steps before we can jump to the C code are to set up the [bss](http://en.wikipedia.org/wiki/.bss) area and check the magic signature. Signature checking:
|
||||
|
||||
```assembly
|
||||
cmpl $0x5a5aaa55, setup_sig
|
||||
jne setup_bad
|
||||
```
|
||||
|
||||
just consists of comparing of [setup_sig](https://github.com/torvalds/linux/blob/master/arch/x86/boot/setup.ld#L39) and `0x5a5aaa55` number, and if they are not equal jump to error printing.
|
||||
just consists of comparing [setup_sig](https://github.com/torvalds/linux/blob/master/arch/x86/boot/setup.ld#L39) with the number `0x5a5aaa55`, and if they are not equal, jumping to error printing.
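
The same comparison can be sketched in C. This is a rough illustration only: `read_le32` and `setup_sig_ok` are hypothetical names, and the byte order shown is simply how an x86 processor lays out a 32-bit value in memory (least significant byte first).

```c
#include <stdint.h>

#define SETUP_SIG_MAGIC 0x5a5aaa55u  /* value compared against setup_sig */

/* Hypothetical helper: assemble a little-endian dword the way x86 reads it. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns non-zero when the four bytes at setup_sig hold the magic number. */
static int setup_sig_ok(const uint8_t *setup_sig)
{
    return read_le32(setup_sig) == SETUP_SIG_MAGIC;
}
```

In memory the signature therefore appears as the bytes `55 aa 5a 5a`.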
|
||||
|
||||
Ok now we have correct segment registers, stack, need only setup bss and jump to C code. Bss section used for storing statically allocated uninitialized data. Here is the code:
|
||||
OK, now we have the correct segment registers and stack, and we only need to set up the BSS and jump to C code. The BSS section is used for storing statically allocated uninitialized data. Here is the code:
|
||||
|
||||
```assembly
|
||||
movw $__bss_start, %di
|
||||
@ -446,17 +444,17 @@ Ok now we have correct segment registers, stack, need only setup bss and jump to
|
||||
rep; stosl
|
||||
```
|
||||
|
||||
First of all we put [__bss_start](https://github.com/torvalds/linux/blob/master/arch/x86/boot/setup.ld#L47) address in `di` and `_end + 3` (+3 - align to 4 bytes) in `cx`. Clear `eax` register with `xor` instruction and calculate size of BSS section (put in `cx`). Devide `cx` by 4 and repeat `cx` times `stosl` instruction which stores value of `eax` (it is zero) and increase `di`by the size of `eax`. In this way, we write zeros from `__bss_start` to `_end`:
|
||||
First of all, we put the [__bss_start](https://github.com/torvalds/linux/blob/master/arch/x86/boot/setup.ld#L47) address into `di` and `_end + 3` (+3 to align to 4 bytes) into `cx`. We clear the `eax` register with the `xor` instruction, calculate the size of the BSS section (`cx` - `di`) and put it into `cx`. Then we divide `cx` by 4 and repeat the `stosl` instruction `cx` times; each time it stores the value of `eax` (which is zero) at the address in `di` and increases `di` by the size of `eax`. In this way, we write zeros from `__bss_start` to `_end`:
|
||||
|
||||
![bss](http://oi59.tinypic.com/29m2eyr.jpg)
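
To make the arithmetic concrete, here is a small C model of the same loop. It is a sketch under assumptions, not the kernel's code: `clear_bss` is a hypothetical name, `mem` stands in for the setup segment, and `bss_start` and `end` are offsets into it that play the roles of `__bss_start` and `_end`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the BSS clearing loop.
 * Returns the number of 4-byte stores that were performed. */
static uint16_t clear_bss(uint8_t *mem, uint16_t bss_start, uint16_t end)
{
    uint16_t di = bss_start;            /* destination offset */
    uint16_t cx = (uint16_t)(end + 3);  /* _end + 3, so the division rounds up */
    uint32_t eax = 0;                   /* the value to store */
    uint16_t count;

    cx = (uint16_t)(cx - di);           /* size of BSS in bytes (plus rounding) */
    cx >>= 2;                           /* divide by 4: number of dwords */
    count = cx;

    while (cx != 0) {                   /* what `rep; stosl` does */
        memcpy(&mem[di], &eax, sizeof eax);
        di = (uint16_t)(di + 4);
        cx--;
    }
    return count;
}
```

For example, with `bss_start = 8` and `end = 18` the model performs 3 stores and zeroes bytes 8 through 19. The extra bytes past `end` come from rounding up to a whole dword, so the caller has to make sure the buffer has room for them.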
|
||||
|
||||
Jump to main
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
That's all, we have stack, bss and now we can jump to `main` C function:
|
||||
That's all! We have our stack and BSS set up. Now we can jump to the `main` C function:
|
||||
|
||||
```assembly
|
||||
calll main
|
||||
call main
|
||||
```
|
||||
|
||||
which is in [arch/x86/boot/main.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/main.c). What will be there? We will see it in the next part.
|
||||
@ -464,9 +462,9 @@ which is in [arch/x86/boot/main.c](https://github.com/torvalds/linux/blob/master
|
||||
Conclusion
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
It is the end of the first part about linux kernel internals. If you have questions or suggestions, ping me in twitter [0xAX](https://twitter.com/0xAX), drop me [email](anotherworldofworld@gmail.com) or just create [issue](https://github.com/0xAX/linux-internals/issues/new). In next part we will see first C code which executes in linux kernel setup, implementation of memory routines as memset, memcpy, `earlyprintk` implementation and early console initialization and many more.
|
||||
This is the end of the first part about the Linux kernel internals. If you have any questions or suggestions, ping me on twitter [0xAX](https://twitter.com/0xAX), drop me an [email](anotherworldofworld@gmail.com) or just create an [issue](https://github.com/0xAX/linux-internals/issues/new). In the next part we will see the first C code which executes during the Linux kernel setup, the implementation of memory routines such as `memset` and `memcpy`, the `earlyprintk` implementation, early console initialization and much more. Stay tuned!
|
||||
|
||||
**Please note that English is not my first language and I am really sorry for any inconvenience. If you will find any mistakes please send me PR to [linux-internals](https://github.com/0xAX/linux-internals).**
|
||||
**Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes, please send me a PR to [linux-internals](https://github.com/0xAX/linux-internals).**
|
||||
|
||||
Links
|
||||
--------------------------------------------------------------------------------
|
||||
|