
Merge pull request #667 from WarpspeedSCP/grammar-fix

Fix grammar and improve formatting for linux-bootstrap section
pull/671/head
0xAX 2 months ago
commit b97cc6aaf5
4 changed files with 217 additions and 195 deletions
  1. Booting/linux-bootstrap-1.md (+6 −6)
  2. Booting/linux-bootstrap-4.md (+101 −81)
  3. Booting/linux-bootstrap-5.md (+52 −50)
  4. Booting/linux-bootstrap-6.md (+58 −58)

Booting/linux-bootstrap-1.md (+6 −6)

@@ -4,7 +4,7 @@ Kernel booting process. Part 1.
4 4
 From the bootloader to the kernel
5 5
 --------------------------------------------------------------------------------
6 6
 
7
-If you have been reading my previous [blog posts](https://0xax.github.io/categories/assembler/), then you can see that, for some time now, I have been starting to get involved with low-level programming. I have written some posts about assembly programming for `x86_64` Linux and, at the same time, I have also started to dive into the Linux kernel source code.
7
+If you've read my previous [blog posts](https://0xax.github.io/categories/assembler/), you might have noticed that I have been involved with low-level programming for some time. I have written some posts about assembly programming for `x86_64` Linux and, at the same time, I have also started to dive into the Linux kernel source code.
8 8
 
9 9
 I have a great interest in understanding how low-level things work, how programs run on my computer, how they are located in memory, how the kernel manages processes and memory, how the network stack works at a low level, and many many other things. So, I have decided to write yet another series of posts about the Linux kernel for the **x86_64** architecture.
10 10
 
@@ -87,7 +87,7 @@ _start:
87 87
 
88 88
 Here we can see the `jmp` instruction [opcode](http://ref.x86asm.net/coder32.html#xE9), which is `0xe9`, and its destination address at `_start16bit - ( . + 2)`.
89 89
 
90
-We can also see that the `reset` section is `16` bytes and that is compiled to start from `0xfffffff0` address (`src/cpu/x86/16bit/reset16.ld`):
90
+We can also see that the `reset` section is `16` bytes and is compiled to start from the address `0xfffffff0` (`src/cpu/x86/16bit/reset16.ld`):
91 91
 
92 92
 ```
93 93
 SECTIONS {
@@ -134,7 +134,7 @@ Build and run this with:
134 134
 nasm -f bin boot.nasm && qemu-system-x86_64 boot
135 135
 ```
136 136
 
137
-This will instruct [QEMU](http://qemu.org) to use the `boot` binary that we just built as a disk image. Since the binary generated by the assembly code above fulfills the requirements of the boot sector (the origin is set to `0x7c00` and we end with the magic sequence), QEMU will treat the binary as the master boot record (MBR) of a disk image.
137
+This will instruct [QEMU](http://qemu.org) to use the `boot` binary that we just built as a disk image. Since the binary generated by the assembly code above fulfills the requirements of the boot sector (the origin is set to `0x7c00` and we end it with the magic sequence), QEMU will treat the binary as the master boot record (MBR) of a disk image.
138 138
 
139 139
 You will see:
140 140
 
@@ -377,7 +377,7 @@ which pushes the value of `ds` to the stack, followed by the address of the [6](
377 377
 Stack Setup
378 378
 --------------------------------------------------------------------------------
379 379
 
380
-Almost all of the setup code is in preparation for the C language environment in real mode. The next [step](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/header.S#L575) is checking the `ss` register value and making a correct stack if `ss` is wrong:
380
+Almost all of the setup code is for preparing the C language environment in real mode. The next [step](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/header.S#L575) is checking the `ss` register's value and setting up a correct stack if `ss` is wrong:
381 381
 
382 382
 ```assembly
383 383
     movw    %ss, %dx
@@ -405,7 +405,7 @@ Let's look at all three of these scenarios in turn:
405 405
     sti
406 406
 ```
407 407
 
408
-Here we set the alignment of `dx` (which contains the value of `sp` as given by the bootloader) to `4` bytes and a check for whether or not it is zero. If it is zero, we put `0xfffc` (4 byte aligned address before the maximum segment size of 64 KB) in `dx`. If it is not zero, we continue to use the value of `sp` given by the bootloader (0xf7f4 in my case). After this, we put the value of `ax` into `ss`, which stores the correct segment address of `0x1000` and sets up a correct `sp`. We now have a correct stack:
408
+Here we set the alignment of `dx` (which contains the value of `sp` as given by the bootloader) to `4` bytes and check if it is zero. If it is, we set `dx` to `0xfffc` (the last 4-byte aligned address in a 64 KB segment). If it is not zero, we continue to use the value of `sp` given by the bootloader (0xf7f4 in my case). After this, we put the value of `ax` into `ss`, which means `ss` contains the value `0x1000`. We now have a correct stack:
409 409
 
410 410
 ![stack](http://oi58.tinypic.com/16iwcis.jpg)
411 411
 
@@ -431,7 +431,7 @@ Field name: loadflags
431 431
     functionality will be disabled.
432 432
 ```
433 433
 
434
-If the `CAN_USE_HEAP` bit is set, we put `heap_end_ptr` into `dx` (which points to `_end`) and add `STACK_SIZE` (minimum stack size, `1024` bytes) to it. After this, if `dx` is not carried (it will not be carried, `dx = _end + 1024`), jump to label `2` (as in the previous case) and make a correct stack.
434
+If the `CAN_USE_HEAP` bit is set, we put `heap_end_ptr` into `dx` (which points to `_end`) and add `STACK_SIZE` (the minimum stack size, `1024` bytes) to it. After this, if `dx` is not carried (it will not be carried, `dx = _end + 1024`), jump to label `2` (as in the previous case) and make a correct stack.
435 435
 
436 436
 ![stack](http://oi62.tinypic.com/dr7b5w.jpg)
437 437
 

Booting/linux-bootstrap-4.md (+101 −81)

@@ -1,12 +1,12 @@
1 1
 Kernel booting process. Part 4.
2 2
 ================================================================================
3 3
 
4
-Transition to 64-bit mode
4
+The Transition to 64-bit mode
5 5
 --------------------------------------------------------------------------------
6 6
 
7
-This is the fourth part of the `Kernel booting process` where we will see first steps in [protected mode](http://en.wikipedia.org/wiki/Protected_mode), like checking that CPU supports [long mode](http://en.wikipedia.org/wiki/Long_mode) and [SSE](http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions), [paging](http://en.wikipedia.org/wiki/Paging), initializes the page tables and at the end we will discuss the transition to [long mode](https://en.wikipedia.org/wiki/Long_mode).
7
+This is the fourth part of the `Kernel booting process`. Here, we will learn about the first steps taken in [protected mode](http://en.wikipedia.org/wiki/Protected_mode), like checking if the CPU supports [long mode](http://en.wikipedia.org/wiki/Long_mode) and [SSE](http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions). We will set up the initial page tables for [paging](http://en.wikipedia.org/wiki/Paging) and, at the end, transition the CPU to [long mode](https://en.wikipedia.org/wiki/Long_mode).
8 8
 
9
-**NOTE: there will be much assembly code in this part, so if you are not familiar with that, you might want to consult a book about it**
9
+**NOTE: there will be lots of assembly code in this part, so if you are not familiar with that, you might want to consult a book about it**
10 10
 
11 11
 In the previous [part](https://github.com/0xAX/linux-insides/blob/v4.16/Booting/linux-bootstrap-3.md) we stopped at the jump to the `32-bit` entry point in [arch/x86/boot/pmjump.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/pmjump.S):
12 12
 
@@ -14,13 +14,13 @@ In the previous [part](https://github.com/0xAX/linux-insides/blob/v4.16/Booting/
14 14
 jmpl	*%eax
15 15
 ```
16 16
 
17
-You will recall that `eax` register contains the address of the 32-bit entry point. We can read about this in the [linux kernel x86 boot protocol](https://www.kernel.org/doc/Documentation/x86/boot.txt):
17
+You will recall that the `eax` register contains the address of the 32-bit entry point. We can read about this in the [linux kernel x86 boot protocol](https://www.kernel.org/doc/Documentation/x86/boot.txt):
18 18
 
19 19
 ```
20 20
 When using bzImage, the protected-mode kernel was relocated to 0x100000
21 21
 ```
22 22
 
23
-Let's make sure that it is true by looking at the register values at the 32-bit entry point:
23
+Let's make sure that this is so by looking at the register values at the 32-bit entry point:
24 24
 
25 25
 ```
26 26
 eax            0x100000	1048576
@@ -41,14 +41,14 @@ fs             0x18	24
41 41
 gs             0x18	24
42 42
 ```
43 43
 
44
-We can see here that `cs` register contains - `0x10` (as you may remember from the [previous part](https://github.com/0xAX/linux-insides/blob/v4.16/Booting/linux-bootstrap-3.md), this is the second index in the `Global Descriptor Table`), `eip` register contains `0x100000` and the base address of all segments including the code segment are zero.
44
+We can see here that the `cs` register contains a value of `0x10` (as you might recall from the [previous part](https://github.com/0xAX/linux-insides/blob/v4.16/Booting/linux-bootstrap-3.md), this is the second index in the `Global Descriptor Table`), the `eip` register contains the value `0x100000` and the base addresses of all segments, including the code segment, are zero.
45 45
 
46
-So we can get the physical address, it will be `0:0x100000` or just `0x100000`, as specified by the boot protocol. Now let's start with the `32-bit` entry point.
46
+So, the physical address where the kernel is loaded would be `0:0x100000` or just `0x100000`, as specified by the boot protocol. Now let's start with the `32-bit` entry point.
47 47
 
48
-32-bit entry point
48
+The 32-bit entry point
49 49
 --------------------------------------------------------------------------------
50 50
 
51
-We can find the definition of the `32-bit` entry point in the [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S) assembly source code file:
51
+The `32-bit` entry point is defined in the [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S) assembly source code file:
52 52
 
53 53
 ```assembly
54 54
 	__HEAD
@@ -60,14 +60,14 @@ ENTRY(startup_32)
60 60
 ENDPROC(startup_32)
61 61
 ```
62 62
 
63
-First of all, why the directory is named `compressed`? Actually `bzimage` is a gzipped `vmlinux + header + kernel setup code`. We saw the kernel setup code in all of the previous parts. So, the main goal of the `head_64.S` is to prepare for entering long mode, enter into it and then decompress the kernel. We will see all of the steps up to kernel decompression in this part.
63
+First, why is the directory named `compressed`? The answer is that `bzimage` is a gzipped package consisting of `vmlinux`, a `header` and the `kernel setup code`. We looked at the kernel setup code in all of the previous parts. The main goal of the code in `head_64.S` is to prepare to enter long mode, enter it and then decompress the kernel. We will look at all of the steps leading to kernel decompression in this part.
64 64
 
65
-You may find two files in the `arch/x86/boot/compressed` directory:
65
+You will find two files in the `arch/x86/boot/compressed` directory:
66 66
 
67 67
 * [head_32.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_32.S)
68 68
 * [head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S)
69 69
 
70
-but we will consider only `head_64.S` source code file because, as you may remember, this book is only `x86_64` related; Let's look at [arch/x86/boot/compressed/Makefile](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/Makefile). We can find the following `make` target here:
70
+but we will consider only the `head_64.S` source code file because, as you may remember, this book is only `x86_64` related. Let's look at [arch/x86/boot/compressed/Makefile](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/Makefile). We can find the following `make` target here:
71 71
 
72 72
 ```Makefile
73 73
 vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
@@ -75,7 +75,7 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
75 75
 	$(obj)/piggy.o $(obj)/cpuflags.o
76 76
 ```
77 77
 
78
-Take a look on the `$(obj)/head_$(BITS).o`.
78
+The first line contains `$(obj)/head_$(BITS).o`.
79 79
 
80 80
 This means that we will select which file to link based on what `$(BITS)` is set to, either `head_32.o` or `head_64.o`. The `$(BITS)` variable is defined elsewhere in [arch/x86/Makefile](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/Makefile) based on the kernel configuration:
81 81
 
@@ -91,12 +91,12 @@ else
91 91
 endif
92 92
 ```
93 93
 
94
-Now we know where to start, so let's do it.
94
+Now that we know where to start, let's get to it.
95 95
 
96 96
 Reload the segments if needed
97 97
 --------------------------------------------------------------------------------
98 98
 
99
-As indicated above, we start in the [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/boot/compressed/head_64.S) assembly source code file. First we see the definition of the special section attribute before the `startup_32` definition:
99
+As indicated above, we start in the [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/boot/compressed/head_64.S) assembly source code file. We first see the definition of a special section attribute before the definition of the `startup_32` function:
100 100
 
101 101
 ```assembly
102 102
     __HEAD
@@ -104,13 +104,13 @@ As indicated above, we start in the [arch/x86/boot/compressed/head_64.S](https:/
104 104
 ENTRY(startup_32)
105 105
 ```
106 106
 
107
-The `__HEAD` is macro which is defined in [include/linux/init.h](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/include/linux/init.h) header file and expands to the definition of the following section:
107
+`__HEAD` is a macro defined in the [include/linux/init.h](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/include/linux/init.h) header file and expands to the definition of the following section:
108 108
 
109 109
 ```C
110 110
 #define __HEAD		.section	".head.text","ax"
111 111
 ```
112 112
 
113
-with `.head.text` name and `ax` flags. In our case, these flags show us that this section is [executable](https://en.wikipedia.org/wiki/Executable) or in other words contains code. We can find definition of this section in the [arch/x86/boot/compressed/vmlinux.lds.S](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/boot/compressed/vmlinux.lds.S) linker script:
113
+Here, `.head.text` is the name of the section and `ax` is a set of flags. In our case, these flags show us that this section is [executable](https://en.wikipedia.org/wiki/Executable) or in other words contains code. We can find the definition of this section in the [arch/x86/boot/compressed/vmlinux.lds.S](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/boot/compressed/vmlinux.lds.S) linker script:
114 114
 
115 115
 ```
116 116
 SECTIONS
@@ -127,17 +127,17 @@ SECTIONS
127 127
 }
128 128
 ```
129 129
 
130
-If you are not familiar with the syntax of `GNU LD` linker scripting language, you can find more information in the [documentation](https://sourceware.org/binutils/docs/ld/Scripts.html#Scripts). In short, the `.` symbol is a special variable of linker - location counter. The value assigned to it is an offset relative to the segment. In our case, we assign zero to location counter. This means that our code is linked to run from the `0` offset in memory. Moreover, we can find this information in comments:
130
+If you are not familiar with the syntax of the `GNU LD` linker scripting language, you can find more information in its [documentation](https://sourceware.org/binutils/docs/ld/Scripts.html#Scripts). In short, the `.` symbol is a special linker variable, the location counter. The value assigned to it is an offset relative to the segment. In our case, we set the location counter to zero. This means that our code is linked to run from an offset of `0` in memory. This is also stated in the comments:
131 131
 
132 132
 ```
133 133
 Be careful parts of head_64.S assume startup_32 is at address 0.
134 134
 ```
135 135
 
136
-Ok, now we know where we are, and now is the best time to look inside the `startup_32` function.
136
+Now that we have our bearings, let's look at the contents of the `startup_32` function.
137 137
 
138
-In the beginning of the `startup_32` function, we can see the `cld` instruction which clears the `DF` bit in the [flags](https://en.wikipedia.org/wiki/FLAGS_register) register. When direction flag is clear, all string operations like [stos](http://x86.renejeschke.de/html/file_module_x86_id_306.html), [scas](http://x86.renejeschke.de/html/file_module_x86_id_287.html) and others will increment the index registers `esi` or `edi`. We need to clear direction flag because later we will use strings operations for clearing space for page tables, etc.
138
+In the beginning of the `startup_32` function, we can see the `cld` instruction which clears the `DF` bit in the [flags](https://en.wikipedia.org/wiki/FLAGS_register) register. When the direction flag is clear, all string operations like [stos](http://x86.renejeschke.de/html/file_module_x86_id_306.html), [scas](http://x86.renejeschke.de/html/file_module_x86_id_287.html) and others will increment the index registers `esi` or `edi`. We need to clear the direction flag because later we will use string operations to perform various tasks such as clearing space for page tables.
139 139
 
140
-After we have cleared the `DF` bit, next step is the check of the `KEEP_SEGMENTS` flag from `loadflags` kernel setup header field. If you remember we already saw `loadflags` in the very first [part](https://0xax.gitbooks.io/linux-insides/content/Booting/linux-bootstrap-1.html) of this book. There we checked `CAN_USE_HEAP` flag to get ability to use heap. Now we need to check the `KEEP_SEGMENTS` flag. This flag is described in the linux [boot protocol](https://www.kernel.org/doc/Documentation/x86/boot.txt) documentation:
140
+After we have cleared the `DF` bit, the next step is to check the `KEEP_SEGMENTS` flag in the `loadflags` kernel setup header field. If you remember, we already talked about `loadflags` in the very first [part](https://0xax.gitbooks.io/linux-insides/content/Booting/linux-bootstrap-1.html) of this book. There we checked the `CAN_USE_HEAP` flag to query the ability to use the heap. Now we need to check the `KEEP_SEGMENTS` flag. This flag is described in the Linux [boot protocol](https://www.kernel.org/doc/Documentation/x86/boot.txt) documentation:
141 141
 
142 142
 ```
143 143
 Bit 6 (write): KEEP_SEGMENTS
@@ -148,7 +148,7 @@ Bit 6 (write): KEEP_SEGMENTS
148 148
 		a base of 0 (or the equivalent for their environment).
149 149
 ```
150 150
 
151
-So, if the `KEEP_SEGMENTS` bit is not set in the `loadflags`, we need to set `ds`, `ss` and `es` segment registers to the index of data segment with base `0`. That we do:
151
+So, if the `KEEP_SEGMENTS` bit is not set in `loadflags`, we need to set the `ds`, `ss` and `es` segment registers to the index of the data segment with a base of `0`. We do this here:
152 152
 
153 153
 ```C
154 154
 	testb $KEEP_SEGMENTS, BP_loadflags(%esi)
@@ -161,9 +161,9 @@ So, if the `KEEP_SEGMENTS` bit is not set in the `loadflags`, we need to set `ds
161 161
 	movl	%eax, %ss
162 162
 ```
163 163
 
164
-Remember that the `__BOOT_DS` is `0x18` (index of data segment in the [Global Descriptor Table](https://en.wikipedia.org/wiki/Global_Descriptor_Table)). If `KEEP_SEGMENTS` is set, we jump to the nearest `1f` label or update segment registers with `__BOOT_DS` if it is not set. It is pretty easy, but here is one interesting moment. If you've read the previous [part](https://github.com/0xAX/linux-insides/blob/v4.16/Booting/linux-bootstrap-3.md), you may remember that we already updated these segment registers right after we switched to [protected mode](https://en.wikipedia.org/wiki/Protected_mode) in [arch/x86/boot/pmjump.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/pmjump.S). So why do we need to care about values of segment registers again? The answer is easy. The Linux kernel also has a 32-bit boot protocol and if a bootloader uses it to load the Linux kernel all code before the `startup_32` will be missed. In this case, the `startup_32` will be the first entry point of the Linux kernel right after the bootloader and there are no guarantees that segment registers will be in known state.
164
+Remember that `__BOOT_DS` is `0x18` (the index of the data segment in the [Global Descriptor Table](https://en.wikipedia.org/wiki/Global_Descriptor_Table)). If `KEEP_SEGMENTS` is set, we jump to the nearest `1f` label; if it is not set, we update the segment registers with `__BOOT_DS`. This is all pretty easy, but here's something to consider. If you've read the previous [part](https://github.com/0xAX/linux-insides/blob/v4.16/Booting/linux-bootstrap-3.md), you may remember that we already updated these segment registers right after we switched to [protected mode](https://en.wikipedia.org/wiki/Protected_mode) in [arch/x86/boot/pmjump.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/pmjump.S). So why do we need to care about the values in the segment registers again? The answer is easy. The Linux kernel also has a 32-bit boot protocol and if a bootloader uses *that* to load the Linux kernel, all the code before the `startup_32` function will be missed. In this case, the `startup_32` function would be the first entry point to the Linux kernel right after the bootloader and there are no guarantees that the segment registers will be in a known state.
165 165
 
166
-After we have checked the `KEEP_SEGMENTS` flag and put the correct value to the segment registers, the next step is to calculate the difference between where we loaded and compiled to run. Remember that `setup.ld.S` contains following definition: `. = 0` at the start of the `.head.text` section. This means that the code in this section is compiled to run from `0` address. We can see this in `objdump` output:
166
+After we have checked the `KEEP_SEGMENTS` flag and set the segment registers to a correct value, the next step is to calculate the difference between where the kernel is compiled to run, and where we loaded it. Remember that `setup.ld.S` contains the following definition: `. = 0` at the start of the `.head.text` section. This means that the code in this section is compiled to run at the address `0`. We can see this in the output of `objdump`:
167 167
 
168 168
 ```
169 169
 arch/x86/boot/compressed/vmlinux:     file format elf64-x86-64
@@ -176,14 +176,14 @@ Disassembly of section .head.text:
176 176
    1:   f6 86 11 02 00 00 40    testb  $0x40,0x211(%rsi)
177 177
 ```
178 178
 
179
-The `objdump` util tells us that the address of the `startup_32` is `0` but actually it's not so. Our current goal is to know where actually we are. It is pretty simple to do in [long mode](https://en.wikipedia.org/wiki/Long_mode) because it support `rip` relative addressing, but currently we are in [protected mode](https://en.wikipedia.org/wiki/Protected_mode). We will use common pattern to know the address of the `startup_32`. We need to define a label and make a call to this label and pop the top of the stack to a register:
179
+The `objdump` util tells us that the address of the `startup_32` function is `0` but that isn't so. We now need to know where we actually are. This is pretty simple to do in [long mode](https://en.wikipedia.org/wiki/Long_mode) because it supports `rip` relative addressing, but currently we are in [protected mode](https://en.wikipedia.org/wiki/Protected_mode). We will use a common pattern to find the address of the `startup_32` function. We need to define a label, make a call to it and pop the top of the stack to a register:
180 180
 
181 181
 ```assembly
182 182
 call label
183 183
 label: pop %reg
184 184
 ```
185 185
 
186
-After this, a `%reg` register will contain the address of a label. Let's look at the similar code which searches address of the `startup_32` in the Linux kernel:
186
+After this, the register indicated by `%reg` will contain the address of `label`. Let's look at the code which uses this pattern to search for the `startup_32` function in the Linux kernel:
187 187
 
188 188
 ```assembly
189 189
         leal	(BP_scratch+4)(%esi), %esp
@@ -192,7 +192,7 @@ After this, a `%reg` register will contain the address of a label. Let's look at
192 192
         subl	$1b, %ebp
193 193
 ```
194 194
 
195
-As you remember from the previous part, the `esi` register contains the address of the [boot_params](https://github.com/torvalds/linux/blob/v4.16/arch/x86/include/uapi/asm/bootparam.h#L113) structure which was filled before we moved to the protected mode. The `boot_params` structure contains a special field `scratch` with offset `0x1e4`. These four bytes field will be temporary stack for `call` instruction. We are getting the address of the `scratch` field + `4` bytes and putting it in the `esp` register. We add `4` bytes to the base of the `BP_scratch` field because, as just described, it will be a temporary stack and the stack grows from top to down in `x86_64` architecture. So our stack pointer will point to the top of the stack. Next, we can see the pattern that I've described above. We make a call to the `1f` label and put the address of this label to the `ebp` register because we have return address on the top of stack after the `call` instruction will be executed. So, for now we have an address of the `1f` label and now it is easy to get address of the `startup_32`. We just need to subtract address of label from the address which we got from the stack:
195
+As you remember from the previous part, the `esi` register contains the address of the [boot_params](https://github.com/torvalds/linux/blob/v4.16/arch/x86/include/uapi/asm/bootparam.h#L113) structure which was filled before we moved to protected mode. The `boot_params` structure contains a special field `scratch` at an offset of `0x1e4`. This four-byte field is a temporary stack for the `call` instruction. We set `esp` to the address four bytes after the `BP_scratch` field of the `boot_params` structure. We add `4` bytes to the base of the `BP_scratch` field because, as just described, it will be a temporary stack and the stack grows from top to bottom on the `x86_64` architecture. So our stack pointer will point to the top of the temporary stack. Next, we can see the pattern that I've described above. We make a call to the `1f` label and pop the top of the stack into `ebp`. This works because the `call` instruction stores the return address on top of the stack. We now have the address of the `1f` label and can easily get the address of the `startup_32` function. We just need to subtract the address of the label from the address we got from the stack:
196 196
 
197 197
 ```
198 198
 startup_32 (0x0)     +-----------------------+
@@ -210,7 +210,7 @@ startup_32 (0x0)     +-----------------------+
210 210
                      +-----------------------+
211 211
 ```
212 212
 
213
-The `startup_32` is linked to run at address `0x0` and this means that `1f` has the address `0x0 + offset to 1f`, approximately `0x21` bytes. The `ebp` register contains the real physical address of the `1f` label. So, if we subtract `1f` from the `ebp` we will get the real physical address of the `startup_32`. The Linux kernel [boot protocol](https://www.kernel.org/doc/Documentation/x86/boot.txt) describes that the base of the protected mode kernel is `0x100000`. We can verify this with [gdb](https://en.wikipedia.org/wiki/GNU_Debugger). Let's start the debugger and put breakpoint to the `1f` address, which is `0x100021`. If this is correct we will see `0x100021` in the `ebp` register:
213
+The `startup_32` function is linked to run at the address `0x0` and this means that `1f` has the address `0x0 + offset to 1f`, which is approximately `0x21` bytes. The `ebp` register contains the real physical address of the `1f` label. So, if we subtract `1f` from the `ebp` register, we will get the real physical address of the `startup_32` function. The Linux kernel [boot protocol](https://www.kernel.org/doc/Documentation/x86/boot.txt) says the base of the protected mode kernel is `0x100000`. We can verify this with [gdb](https://en.wikipedia.org/wiki/GNU_Debugger). Let's start the debugger and add a breakpoint at the address of `1f`, which is `0x100021`. If this is correct, we will see the value `0x100021` in the `ebp` register:
214 214
 
215 215
 ```
216 216
 $ gdb
@@ -255,12 +255,12 @@ ebp            0x100000	0x100000
255 255
 ...
256 256
 ```
257 257
 
258
-Ok, that's true. The address of the `startup_32` is `0x100000`. After we know the address of the `startup_32` label, we can prepare for the transition to [long mode](https://en.wikipedia.org/wiki/Long_mode). Our next goal is to setup the stack and verify that the CPU supports long mode and [SSE](http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions).
258
+Ok, we've verified that the address of the `startup_32` function is `0x100000`. Now that we know it, we can prepare for the transition to [long mode](https://en.wikipedia.org/wiki/Long_mode). Our next goal is to set up the stack and verify that the CPU supports long mode and [SSE](http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions).
259 259
 
260 260
 Stack setup and CPU verification
261 261
 --------------------------------------------------------------------------------
262 262
 
263
-We could not setup the stack while we did not know the address of the `startup_32` label. We can imagine the stack as an array and the stack pointer register `esp` must point to the end of this array. Of course, we can define an array in our code, but we need to know its actual address to configure the stack pointer in a correct way. Let's look at the code:
263
+We can't set up the stack until we know where in memory the `startup_32` label is. If we imagine the stack as an array, the stack pointer register `esp` must point to the end of it. Of course, we can define an array in our code, but we need to know its actual address to configure the stack pointer correctly. Let's look at the code:
264 264
 
265 265
 ```assembly
266 266
 	movl	$boot_stack_end, %eax
@@ -268,7 +268,7 @@ We could not setup the stack while we did not know the address of the `startup_3
268 268
 	movl	%eax, %esp
269 269
 ```
270 270
 
271
-The `boot_stack_end` label, defined in the same [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S) assembly source code file and located in the [.bss](https://en.wikipedia.org/wiki/.bss) section:
271
+The `boot_stack_end` label is also defined in the [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S) assembly source code file and is located in the [.bss](https://en.wikipedia.org/wiki/.bss) section:
272 272
 
273 273
 ```assembly
274 274
 	.bss
@@ -280,9 +280,9 @@ boot_stack:
280 280
 boot_stack_end:
281 281
 ```
282 282
 
283
-First of all, we put the address of `boot_stack_end` into the `eax` register, so the `eax` register contains the address of `boot_stack_end` where it was linked, which is `0x0 + boot_stack_end`. To get the real address of `boot_stack_end`, we need to add the real address of the `startup_32`. As you remember, we have found this address above and put it to the `ebp` register. In the end, the register `eax` will contain real address of the `boot_stack_end` and we just need to put to the stack pointer.
283
+First of all, we put the address of `boot_stack_end` into the `eax` register, so the `eax` register contains the address of `boot_stack_end` as it was linked, which is `0x0 + boot_stack_end`. To get the real address of `boot_stack_end`, we need to add the real address of the `startup_32` function. We've already found this address and put it into the `ebp` register. In the end, the `eax` register will contain the real address of `boot_stack_end` and we just need to set the stack pointer to it.
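As a concrete illustration of this link-time vs. run-time fix-up, here is a small Python sketch; the numeric value of `boot_stack_end` below is made up for illustration, only the `0x100000` address of `startup_32` comes from the text:

```python
# Convert a link-time address (linked at base 0x0) to a run-time address
# by adding the real physical address of startup_32 (held in ebp).
boot_stack_end_linked = 0x00010ff0   # made-up link-time address of boot_stack_end
startup_32_physical   = 0x100000     # real address of startup_32 found earlier

esp = boot_stack_end_linked + startup_32_physical
print(hex(esp))  # 0x110ff0
```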
284 284
 
285
-After we have set up the stack, next step is CPU verification. As we are going to execute transition to the `long mode`, we need to check that the CPU supports `long mode` and `SSE`. We will do it by the call of the `verify_cpu` function:
285
+After we have set up the stack, the next step is CPU verification. Since we are transitioning to `long mode`, we need to check that the CPU supports `long mode` and `SSE`. We will do this with a call to the `verify_cpu` function:
286 286
 
287 287
 ```assembly
288 288
 	call	verify_cpu
@@ -290,9 +290,9 @@ After we have set up the stack, next step is CPU verification. As we are going t
290 290
 	jnz	no_longmode
291 291
 ```
292 292
 
293
-This function defined in the [arch/x86/kernel/verify_cpu.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/kernel/verify_cpu.S) assembly file and just contains a couple of calls to the [cpuid](https://en.wikipedia.org/wiki/CPUID) instruction. This instruction is used for getting information about the processor. In our case, it checks `long mode` and `SSE` support and returns `0` on success or `1` on fail in the `eax` register.
293
+This function is defined in the [arch/x86/kernel/verify_cpu.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/kernel/verify_cpu.S) assembly file and just contains a couple of calls to the [cpuid](https://en.wikipedia.org/wiki/CPUID) instruction. This instruction is used to get information about the processor. In our case, it checks for `long mode` and `SSE` support and sets the `eax` register to `0` on success and `1` on failure.
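The logic of those `cpuid` checks can be sketched in Python; the bit positions are the architectural ones (long mode is `EDX` bit `29` of `CPUID` leaf `0x80000001`, `SSE` is `EDX` bit `25` of leaf `0x00000001`), but the function itself is only an illustration, not the kernel's code:

```python
# Sketch of the checks verify_cpu performs with cpuid: long mode support
# in EDX bit 29 of leaf 0x80000001, SSE support in EDX bit 25 of leaf 1.
def verify_cpu(extended_edx, basic_edx):
    ok = bool(extended_edx & (1 << 29)) and bool(basic_edx & (1 << 25))
    return 0 if ok else 1   # like eax: 0 on success, 1 on failure

print(verify_cpu(1 << 29, 1 << 25))  # 0
```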
294 294
 
295
-If the value of the `eax` is not zero, we jump to the `no_longmode` label which just stops the CPU by the call of the `hlt` instruction while no hardware interrupt will not happen:
295
+If the value of `eax` is not zero, we jump to the `no_longmode` label, which just halts the CPU in an endless loop with the `hlt` instruction:
296 296
 
297 297
 ```assembly
298 298
 no_longmode:
@@ -301,12 +301,12 @@ no_longmode:
301 301
 	jmp     1b
302 302
 ```
303 303
 
304
-If the value of the `eax` register is zero, everything is ok and we are able to continue.
304
+If the value of the `eax` register is zero, everything is ok and we can continue.
305 305
 
306
-Calculate relocation address
306
+Calculate the relocation address
307 307
 --------------------------------------------------------------------------------
308 308
 
309
-The next step is calculating relocation address for decompression if needed. First, we need to know what it means for a kernel to be `relocatable`. We already know that the base address of the 32-bit entry point of the Linux kernel is `0x100000`, but that is a 32-bit entry point. The default base address of the Linux kernel is determined by the value of the `CONFIG_PHYSICAL_START` kernel configuration option. Its default value is `0x1000000` or `16 MB`. The main problem here is that if the Linux kernel crashes, a kernel developer must have a `rescue kernel` for [kdump](https://www.kernel.org/doc/Documentation/kdump/kdump.txt) which is configured to load from a different address. The Linux kernel provides special configuration option to solve this problem: `CONFIG_RELOCATABLE`. As we can read in the documentation of the Linux kernel:
309
+The next step is to calculate the relocation address for decompression if needed. First, we need to know what it means for a kernel to be `relocatable`. We already know that the base address of the 32-bit entry point of the Linux kernel is `0x100000`, but that is a 32-bit entry point. The default base address of the Linux kernel is determined by the value of the `CONFIG_PHYSICAL_START` kernel configuration option. Its default value is `0x1000000` or `16 MB`. The main problem here is that if the Linux kernel crashes, a kernel developer must have a `rescue kernel` for [kdump](https://www.kernel.org/doc/Documentation/kdump/kdump.txt) which is configured to load from a different address. The Linux kernel provides a special configuration option to solve this problem: `CONFIG_RELOCATABLE`. As we can read in the documentation of the Linux kernel:
310 310
 
311 311
 ```
312 312
 This builds a kernel image that retains relocation information
@@ -317,13 +317,34 @@ it has been loaded at and the compile time physical address
317 317
 (CONFIG_PHYSICAL_START) is used as the minimum location.
318 318
 ```
319 319
 
320
-In simple terms, this means that the Linux kernel with the same configuration can be booted from different addresses. Technically, this is done by compiling the decompressor as [position independent code](https://en.wikipedia.org/wiki/Position-independent_code). If we look at [arch/x86/boot/compressed/Makefile](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/Makefile), we will see that the decompressor is indeed compiled with the `-fPIC` flag:
320
+Now that we know where to start, let's get to it.
321
+
322
+Reload the segments if needed
323
+--------------------------------------------------------------------------------
324
+
325
+As indicated above, we start in the [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/boot/compressed/head_64.S) assembly source code file. We first see the definition of a special section attribute before the definition of the `startup_32` function:
326
+
327
+```assembly
328
+    __HEAD
329
+    .code32
330
+ENTRY(startup_32)
331
+```
332
+
333
+`__HEAD` is a macro defined in the [include/linux/init.h](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/include/linux/init.h) header file and expands to the definition of the following section:
334
+
335
+```C
336
+#define __HEAD		.section	".head.text","ax"
337
+```
338
+
339
+Here, `.head.text` is the name of the section and `ax` is a set of flags. In our case, these flags show us that this section is [executable](https://en.wikipedia.org/wiki/Executable), i.e. it contains code.
340
+
341
+In simple terms, this means that a Linux kernel with this option set can be booted from different addresses. Technically, this is done by compiling the decompressor as [position independent code](https://en.wikipedia.org/wiki/Position-independent_code). If we look at [arch/x86/boot/compressed/Makefile](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/Makefile), we can see that the decompressor is indeed compiled with the `-fPIC` flag:
321 342
 
322 343
 ```Makefile
323 344
 KBUILD_CFLAGS += -fno-strict-aliasing -fPIC
324 345
 ```
325 346
 
326
-When we are using position-independent code an address is obtained by adding the address field of the instruction to the value of the program counter. We can load code which uses such addressing from any address. That's why we had to get the real physical address of `startup_32`. Now let's get back to the Linux kernel code. Our current goal is to calculate an address where we can relocate the kernel for decompression. Calculation of this address depends on `CONFIG_RELOCATABLE` kernel configuration option. Let's look at the code:
347
+When we are using position-independent code an address is obtained by adding the address field of the instruction to the value of the program counter. We can load code which uses such addressing from any address. That's why we had to get the real physical address of `startup_32`. Now let's get back to the Linux kernel code. Our current goal is to calculate an address where we can relocate the kernel for decompression. The calculation of this address depends on the `CONFIG_RELOCATABLE` kernel configuration option. Let's look at the code:
327 348
 
328 349
 ```assembly
329 350
 #ifdef CONFIG_RELOCATABLE
@@ -339,7 +360,7 @@ When we are using position-independent code an address is obtained by adding the
339 360
 	movl	$LOAD_PHYSICAL_ADDR, %ebx
340 361
 ```
341 362
 
342
-Remember that the value of the `ebp` register is the physical address of the `startup_32` label. If the `CONFIG_RELOCATABLE` kernel configuration option is enabled during kernel configuration, we put this address in the `ebx` register, align it to a multiple of `2MB` and compare it with the `LOAD_PHYSICAL_ADDR` value. The `LOAD_PHYSICAL_ADDR` macro is defined in the [arch/x86/include/asm/boot.h](https://github.com/torvalds/linux/blob/v4.16/arch/x86/include/asm/boot.h) header file and it looks like this:
363
+Remember that the value of the `ebp` register is the physical address of the `startup_32` label. If the `CONFIG_RELOCATABLE` kernel configuration option is enabled during kernel configuration, we put this address in the `ebx` register, align it to a multiple of `2MB` and compare it with the result of the `LOAD_PHYSICAL_ADDR` macro. `LOAD_PHYSICAL_ADDR` is defined in the [arch/x86/include/asm/boot.h](https://github.com/torvalds/linux/blob/v4.16/arch/x86/include/asm/boot.h) header file and it looks like this:
343 364
 
344 365
 ```C
345 366
 #define LOAD_PHYSICAL_ADDR ((CONFIG_PHYSICAL_START \
@@ -347,9 +368,9 @@ Remember that the value of the `ebp` register is the physical address of the `st
347 368
 				& ~(CONFIG_PHYSICAL_ALIGN - 1))
348 369
 ```
349 370
 
350
-As we can see it just expands to the aligned `CONFIG_PHYSICAL_ALIGN` value which represents the physical address of where to load the kernel. After comparison of the `LOAD_PHYSICAL_ADDR` and value of the `ebx` register, we add the offset from the `startup_32` where to decompress the compressed kernel image. If the `CONFIG_RELOCATABLE` option is not enabled during kernel configuration, we just put the default address where to load kernel and add `z_extract_offset` to it.
371
+As we can see, it just expands to `CONFIG_PHYSICAL_START` aligned up to `CONFIG_PHYSICAL_ALIGN`, which represents the physical address where the kernel will be loaded. After comparing `LOAD_PHYSICAL_ADDR` and the value of the `ebx` register, we add the offset from `startup_32` where we will decompress the compressed kernel image. If the `CONFIG_RELOCATABLE` option is not enabled during kernel configuration, we just add `z_extract_offset` to the default address where the kernel is loaded.
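The alignment in this macro can be made concrete with a small Python sketch; the `CONFIG_PHYSICAL_START` and `CONFIG_PHYSICAL_ALIGN` values below are just the common defaults, assumed for illustration:

```python
# Mirror of the LOAD_PHYSICAL_ADDR macro: round start up to the next
# multiple of align (align must be a power of two).
def load_physical_addr(start, align):
    return (start + align - 1) & ~(align - 1)

CONFIG_PHYSICAL_START = 0x1000000   # 16 MB, assumed default
CONFIG_PHYSICAL_ALIGN = 0x200000    # 2 MB, assumed default

print(hex(load_physical_addr(CONFIG_PHYSICAL_START, CONFIG_PHYSICAL_ALIGN)))  # 0x1000000
```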
351 372
 
352
-After all of these calculations, we will have `ebp` which contains the address where we loaded it and `ebx` set to the address of where kernel will be moved after decompression. But that is not the end. The compressed kernel image should be moved to the end of the decompression buffer to simplify calculations where kernel will be located later. For this:
373
+After all of these calculations, `ebp` will contain the address where we loaded the kernel and `ebx` will contain the address where the decompressed kernel will be relocated. But that is not the end. The compressed kernel image should be moved to the end of the decompression buffer to simplify calculations regarding where the kernel will be located later. For this:
353 374
 
354 375
 ```assembly
355 376
 1:
@@ -358,19 +379,19 @@ After all of these calculations, we will have `ebp` which contains the address w
358 379
     addl	%eax, %ebx
359 380
 ```
360 381
 
361
-we put value from the `boot_params.BP_init_size` (or kernel setup header value from the `hdr.init_size`) to the `eax` register. The `BP_init_size` contains larger value between compressed and uncompressed [vmlinux](https://en.wikipedia.org/wiki/Vmlinux). Next we subtract address of the `_end` symbol from this value and add the result of subtraction to `ebx` register which will stores base address for kernel decompression.
382
+we put the value from the `boot_params.BP_init_size` field (or the kernel setup header value from `hdr.init_size`) in the `eax` register. The `BP_init_size` field contains the larger of the compressed and uncompressed [vmlinux](https://en.wikipedia.org/wiki/Vmlinux) sizes. Next we subtract the address of the `_end` symbol from this value and add the result of the subtraction to the `ebx` register which will store the base address for kernel decompression.
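The buffer-end adjustment amounts to simple arithmetic, sketched below; all three numeric values are made up for illustration, only the roles (`init_size`, `_end`, `ebx`) come from the text:

```python
# Move the relocation target so the compressed image ends up at the end
# of the decompression buffer: ebx += init_size - _end.
init_size = 0x1400000   # made up: larger of compressed/uncompressed vmlinux sizes
end       = 0x900000    # made up: address of the _end symbol
ebx       = 0x1000000   # made up: relocation base computed above

ebx += init_size - end
print(hex(ebx))  # 0x1b00000
```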
362 383
 
363 384
 Preparation before entering long mode
364 385
 --------------------------------------------------------------------------------
365 386
 
366
-When we have the base address where we will relocate the compressed kernel image, we need to do one last step before we can transition to 64-bit mode. First, we need to update the [Global Descriptor Table](https://en.wikipedia.org/wiki/Global_Descriptor_Table) with 64-bit segments because an relocatable kernel may be runned at any address below 512G:
387
+After we get the address to relocate the compressed kernel image to, we need to do one last step before we can transition to 64-bit mode. First, we need to update the [Global Descriptor Table](https://en.wikipedia.org/wiki/Global_Descriptor_Table) with 64-bit segments because a relocatable kernel is runnable at any address below 512GB:
367 388
 
368 389
 ```assembly
369 390
 	addl	%ebp, gdt+2(%ebp)
370 391
 	lgdt	gdt(%ebp)
371 392
 ```
372 393
 
373
-Here we adjust base address of the Global Descriptor table to the address where we actually loaded and load the `Global Descriptor Table` with the `lgdt` instruction.
394
+Here we adjust the base address of the Global Descriptor table to the address where we actually loaded the kernel and load the `Global Descriptor Table` with the `lgdt` instruction.
374 395
 
375 396
 To understand the magic with `gdt` offsets we need to look at the definition of the `Global Descriptor Table`. We can find its definition in the same source code [file](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S):
376 397
 
@@ -393,11 +414,11 @@ gdt:
393 414
 gdt_end:
394 415
 ```
395 416
 
396
-We can see that it is located in the `.data` section and contains five descriptors: the first is `32-bit` descriptor for kernel code segment, `64-bit` kernel segment, kernel data segment and two task descriptors.
417
+We can see that it is located in the `.data` section and contains five descriptors: the first is a `32-bit` descriptor for the kernel code segment, a `64-bit` kernel segment, a kernel data segment and two task descriptors.
397 418
 
398
-We already loaded the `Global Descriptor Table` in the previous [part](https://github.com/0xAX/linux-insides/blob/v4.16/Booting/linux-bootstrap-3.md), and now we're doing almost the same here, but descriptors with `CS.L = 1` and `CS.D = 0` for execution in `64` bit mode. As we can see, the definition of the `gdt` starts from two bytes: `gdt_end - gdt` which represents the last byte in the `gdt` table or table limit. The next four bytes contains base address of the `gdt`.
419
+We already loaded the `Global Descriptor Table` in the previous [part](https://github.com/0xAX/linux-insides/blob/v4.16/Booting/linux-bootstrap-3.md), and now we're doing almost the same here, but we set descriptors to use `CS.L = 1` and `CS.D = 0` for execution in `64` bit mode. As we can see, the definition of the `gdt` starts with a two-byte value, `gdt_end - gdt`, which represents the table limit. The next four bytes contain the base address of the `gdt`.
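To see why the code patches `gdt+2(%ebp)`, it helps to lay out the pseudo-descriptor that `lgdt` consumes: a two-byte limit followed by a four-byte base, so the base field sits at offset `2`. A minimal Python sketch, with a made-up base and limit:

```python
import struct

# The lgdt pseudo-descriptor in 32-bit mode: a 16-bit limit followed by a
# 32-bit base address, both little-endian (the .word/.long pair in the gdt
# definition). The base therefore lives at byte offset 2.
def gdt_pseudo_descriptor(base, limit):
    return struct.pack("<HI", limit, base)

d = gdt_pseudo_descriptor(0x100000, 0x2f)   # made-up base and limit
print(d.hex())  # 2f0000001000
```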
399 420
 
400
-After we have loaded the `Global Descriptor Table` with `lgdt` instruction, we must enable [PAE](http://en.wikipedia.org/wiki/Physical_Address_Extension) by putting the value of `cr4` register into `eax`, setting the 5th bit and loading it back into `cr4`:
421
+After we have loaded the `Global Descriptor Table` with the `lgdt` instruction, we must enable [PAE](http://en.wikipedia.org/wiki/Physical_Address_Extension) by putting the value of the `cr4` register into `eax`, setting the 5th bit and loading it back into `cr4`:
401 422
 
402 423
 ```assembly
403 424
 	movl	%cr4, %eax
@@ -405,41 +426,42 @@ After we have loaded the `Global Descriptor Table` with `lgdt` instruction, we m
405 426
 	movl	%eax, %cr4
406 427
 ```
407 428
 
408
-Now we are almost finished with all preparations before we can move into 64-bit mode. The last step is to build page tables, but before that, here is some information about long mode.
429
+Now we are almost finished with the preparations needed to move into 64-bit mode. The last step is to build page tables, but before that, here is some information about long mode.
409 430
 
410 431
 Long mode
411 432
 --------------------------------------------------------------------------------
412 433
 
413
-The [Long mode](https://en.wikipedia.org/wiki/Long_mode) is the native mode for [x86_64](https://en.wikipedia.org/wiki/X86-64) processors. First, let's look at some differences between `x86_64` and the `x86`.
434
+[Long mode](https://en.wikipedia.org/wiki/Long_mode) is the native mode for [x86_64](https://en.wikipedia.org/wiki/X86-64) processors. First, let's look at some differences between `x86_64` and `x86`.
414 435
 
415
-The `64-bit` mode provides features such as:
436
+`64-bit` mode provides the following features:
416 437
 
417
-* New 8 general purpose registers from `r8` to `r15` + all general purpose registers are 64-bit now;
418
-* 64-bit instruction pointer - `RIP`;
419
-* New operating mode - Long mode;
438
+* 8 new general purpose registers from `r8` to `r15`;
439
+* All general purpose registers are 64-bit now;
440
+* A 64-bit instruction pointer - `RIP`;
441
+* A new operating mode - Long mode;
420 442
 * 64-Bit Addresses and Operands;
421
-* RIP Relative Addressing (we will see an example of it in the next parts).
443
+* RIP Relative Addressing (we will see an example of this in the coming parts).
422 444
 
423
-Long mode is an extension of legacy protected mode. It consists of two sub-modes:
445
+Long mode is an extension of the legacy protected mode. It consists of two sub-modes:
424 446
 
425 447
 * 64-bit mode;
426 448
 * compatibility mode.
427 449
 
428
-To switch into `64-bit` mode we need to do following things:
450
+To switch into `64-bit` mode we need to do the following things:
429 451
 
430 452
 * Enable [PAE](https://en.wikipedia.org/wiki/Physical_Address_Extension);
431 453
 * Build page tables and load the address of the top level page table into the `cr3` register;
432 454
 * Enable `EFER.LME`;
433 455
 * Enable paging.
434 456
 
435
-We already enabled `PAE` by setting the `PAE` bit in the `cr4` control register. Our next goal is to build the structure for [paging](https://en.wikipedia.org/wiki/Paging). We will see this in next paragraph.
457
+We already enabled `PAE` by setting the `PAE` bit in the `cr4` control register. Our next goal is to build the structure for [paging](https://en.wikipedia.org/wiki/Paging). We will discuss this in the next paragraph.
436 458
 
437 459
 Early page table initialization
438 460
 --------------------------------------------------------------------------------
439 461
 
440
-So, we already know that before we can move into `64-bit` mode, we need to build page tables, so, let's look at the building of early `4G` boot page tables.
462
+We already know that before we can move into `64-bit` mode, we need to build page tables. Let's look at how the early `4G` boot page tables are built.
441 463
 
442
-**NOTE: I will not describe the theory of virtual memory here. If you need to know more about it, see links at the end of this part.**
464
+**NOTE: I will not describe the theory of virtual memory here. If you want to know more about virtual memory, check out the links at the end of this part.**
443 465
 
444 466
 The Linux kernel uses `4-level` paging, and we generally build 6 page tables:
445 467
 
@@ -447,7 +469,7 @@ The Linux kernel uses `4-level` paging, and we generally build 6 page tables:
447 469
 * One `PDP` or `Page Directory Pointer` table with four entries;
448 470
 * Four Page Directory tables with a total of `2048` entries.
449 471
 
450
-Let's look at the implementation of this. First of all, we clear the buffer for the page tables in memory. Every table is `4096` bytes, so we need clear `24` kilobyte buffer:
472
+Let's look at how this is implemented. First, we clear the buffer for the page tables in memory. Every table is `4096` bytes, so we need to clear a `24` kilobyte buffer:
451 473
 
452 474
 ```assembly
453 475
 	leal	pgtable(%ebx), %edi
@@ -456,11 +478,11 @@ Let's look at the implementation of this. First of all, we clear the buffer for
456 478
 	rep	stosl
457 479
 ```
458 480
 
459
-We put the address of `pgtable` plus `ebx` (remember that `ebx` contains the address to relocate the kernel for decompression) in the `edi` register, clear the `eax` register and set the `ecx` register to `6144`.
481
+We put the address of `pgtable` with an offset of `ebx` (remember that `ebx` points to the location in memory where the kernel will be decompressed later) into the `edi` register, clear the `eax` register and set the `ecx` register to `6144`.
460 482
 
461
-The `rep stosl` instruction will write the value of the `eax` to `edi`, increase value of the `edi` register by `4` and decrease the value of the `ecx` register by `1`. This operation will be repeated while the value of the `ecx` register is greater than zero. That's why we put `6144` or `BOOT_INIT_PGT_SIZE/4` in `ecx`.
483
+The `rep stosl` instruction will write the value of `eax` to `edi`, add `4` to `edi` and decrement `ecx` by `1`. This operation will be repeated while the value of the `ecx` register is greater than zero. That's why we put `6144` or `BOOT_INIT_PGT_SIZE/4` in `ecx`.
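The effect of `rep stosl` can be sketched in Python (the 24-byte buffer below is a toy stand-in for the real 24-kilobyte page-table buffer):

```python
# Sketch of rep stosl: store eax into the 4 bytes at [edi], advance edi
# by 4, decrement ecx, and repeat while ecx > 0.
def rep_stosl(memory, edi, eax, ecx):
    while ecx > 0:
        memory[edi:edi + 4] = eax.to_bytes(4, "little")
        edi += 4
        ecx -= 1
    return edi, ecx

buf = bytearray(b"\xff" * 24)        # toy stand-in for the page-table buffer
edi, ecx = rep_stosl(buf, 0, 0, 6)   # six 4-byte stores of zero
print(buf == bytearray(24))          # True: the buffer is cleared
```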
462 484
 
463
-The `pgtable` is defined at the end of [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S) assembly file and is:
485
+`pgtable` is defined at the end of the [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S) assembly file:
464 486
 
465 487
 ```assembly
466 488
 	.section ".pgtable","a",@nobits
@@ -482,7 +504,7 @@ As we can see, it is located in the `.pgtable` section and its size depends on t
482 504
 # endif
483 505
 ```
484 506
 
485
-After we have got buffer for the `pgtable` structure, we can start to build the top level page table - `PML4` - with:
507
+After we have a buffer for the `pgtable` structure, we can start to build the top level page table - `PML4` - with:
486 508
 
487 509
 ```assembly
488 510
 	leal	pgtable + 0(%ebx), %edi
@@ -490,7 +512,7 @@ After we have got buffer for the `pgtable` structure, we can start to build the
490 512
 	movl	%eax, 0(%edi)
491 513
 ```
492 514
 
493
-Here again, we put the address of the `pgtable` relative to `ebx` or in other words relative to address of the `startup_32` to the `edi` register. Next, we put this address with offset `0x1007` in the `eax` register. The `0x1007` is `4096` bytes which is the size of the `PML4` plus `7`. The `7` here represents flags of the `PML4` entry. In our case, these flags are `PRESENT+RW+USER`. In the end, we just write first the address of the first `PDP` entry to the `PML4`.
515
+Here again, we put the address of `pgtable` relative to `ebx`, or in other words relative to the address of `startup_32`, in the `edi` register. Next, we put this address with an offset of `0x1007` into the `eax` register. `0x1007` is the sum of the size of the `PML4` table, which is `4096` or `0x1000` bytes, and `7`. The `7` here represents the flags associated with the `PML4` entry. In our case, these flags are `PRESENT+RW+USER`. In the end, we just write the address of the first `PDP` entry to the `PML4` table.
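The way the entry value combines a table address with flag bits can be sketched as follows; the `0x100000` base address is assumed only for illustration:

```python
# A page-table entry is the physical address of the next-level table
# OR'd with flag bits: 0x7 = PRESENT | RW | USER.
PRESENT, RW, USER = 1 << 0, 1 << 1, 1 << 2

def make_entry(next_table_phys, flags):
    return next_table_phys | flags

pgtable = 0x100000                # assumed physical base, for illustration
pdp = pgtable + 0x1000            # the PDP table follows the 4 KB PML4
pml4_entry = make_entry(pdp, PRESENT | RW | USER)
print(hex(pml4_entry - pgtable))  # 0x1007
```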
494 516
 
495 517
 In the next step we will build four `Page Directory` entries in the `Page Directory Pointer` table with the same `PRESENT+RW+USE` flags:
496 518
 
@@ -505,7 +527,7 @@ In the next step we will build four `Page Directory` entries in the `Page Direct
505 527
 	jnz	1b
506 528
 ```
507 529
 
508
-We put the base address of the page directory pointer which is `4096` or `0x1000` offset from the `pgtable` table in `edi` and the address of the first page directory pointer entry in `eax` register. Put `4` in the `ecx` register, it will be a counter in the following loop and write the address of the first page directory pointer table entry to the `edi` register. After this `edi` will contain the address of the first page directory pointer entry with flags `0x7`. Next we just calculate the address of following page directory pointer entries where each entry is `8` bytes, and write their addresses to `eax`. The last step of building paging structure is the building of the `2048` page table entries with `2-MByte` pages:
530
+We set `edi` to the base address of the page directory pointer table, which is at an offset of `4096` or `0x1000` bytes from the `pgtable` table, and set `eax` to the address of the first page directory plus the flags `0x7`. We also set `ecx` to `4` to act as a counter in the following loop. In each iteration of the loop, we write the value of `eax` to the memory `edi` points to, add `0x1000` to `eax` so that it points to the next page directory, and add `8` to `edi`, since each entry is `8` bytes. The last step in building the paging structure is to build the `2048` page table entries with `2-MByte` pages:
509 531
 
510 532
 ```assembly
511 533
 	leal	pgtable + 0x2000(%ebx), %edi
@@ -518,23 +540,23 @@ We put the base address of the page directory pointer which is `4096` or `0x1000
518 540
 	jnz	1b
519 541
 ```
520 542
 
521
-Here we do almost the same as in the previous example, all entries will be with flags - `$0x00000183` - `PRESENT + WRITE + MBZ`. In the end, we will have `2048` pages with `2-MByte` page or:
543
+Here we do almost the same things that we did in the previous example; all entries are associated with these flags - `$0x00000183` - `PRESENT + WRITE + MBZ`. In the end, we will have a page table with `2048` `2-MByte` pages, which represents a 4 Gigabyte block of memory:
522 544
 
523 545
 ```python
524 546
 >>> 2048 * 0x00200000
525 547
 4294967296
526 548
 ```
527 549
 
528
-`4G` page table. We just finished to build our early page table structure which maps `4` gigabytes of memory and now we can put the address of the high-level page table - `PML4` - in `cr3` control register:
550
+Since we've just finished building our early page table structure which maps `4` gigabytes of memory, we can put the address of the high-level page table - `PML4` - into the `cr3` control register:
529 551
 
530 552
 ```assembly
531 553
 	leal	pgtable(%ebx), %eax
532 554
 	movl	%eax, %cr3
533 555
 ```
534 556
 
535
-That's all. All preparation are finished and now we can see transition to the long mode.
557
+That's all. We are now prepared to transition to long mode.
536 558
 
537
-Transition to the 64-bit mode
559
+The transition to 64-bit mode
538 560
 --------------------------------------------------------------------------------
539 561
 
540 562
 First of all we need to set the `EFER.LME` flag in the [MSR](http://en.wikipedia.org/wiki/Model-specific_register) to `0xC0000080`:
@@ -546,7 +568,7 @@ First of all we need to set the `EFER.LME` flag in the [MSR](http://en.wikipedia
546 568
 	wrmsr
547 569
 ```
548 570
 
549
-Here we put the `MSR_EFER` flag (which is defined in [arch/x86/include/asm/msr-index.h](https://github.com/torvalds/linux/blob/v4.16/arch/x86/include/asm/msr-index.h)) in the `ecx` register and call `rdmsr` instruction which reads the [MSR](http://en.wikipedia.org/wiki/Model-specific_register) register. After `rdmsr` executes, we will have the resulting data in `edx:eax` which depends on the `ecx` value. We check the `EFER_LME` bit with the `btsl` instruction and write data from `eax` to the `MSR` register with the `wrmsr` instruction.
571
+Here we put the `MSR_EFER` flag (which is defined in [arch/x86/include/asm/msr-index.h](https://github.com/torvalds/linux/blob/v4.16/arch/x86/include/asm/msr-index.h)) in the `ecx` register and execute the `rdmsr` instruction which reads the [MSR](http://en.wikipedia.org/wiki/Model-specific_register) register. After `rdmsr` executes, the resulting data is stored in `edx:eax` according to the `MSR` register specified in `ecx`. We set the `EFER_LME` bit with the `btsl` instruction and write the data from `edx:eax` back to the `MSR` register with the `wrmsr` instruction.
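The bit manipulation that `btsl` performs on the `EFER` value can be sketched in Python; `EFER_LME` is architecturally bit `8` of the `EFER` MSR, and the starting value below is made up for illustration:

```python
# Sketch of what btsl does to the EFER value read by rdmsr:
# set EFER.LME, which is bit 8 of the EFER MSR (MSR number 0xC0000080).
EFER_LME_BIT = 8

efer = 0x0                    # made-up value read via rdmsr
efer |= 1 << EFER_LME_BIT     # btsl sets the Long Mode Enable bit
print(hex(efer))              # 0x100
```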
550 572
 
551 573
 In the next step, we push the address of the kernel segment code to the stack (we defined it in the GDT) and put the address of the `startup_64` routine in `eax`.
552 574
 
@@ -555,7 +577,7 @@ In the next step, we push the address of the kernel segment code to the stack (w
555 577
 	leal	startup_64(%ebp), %eax
556 578
 ```
557 579
 
558
-After this we push this address to the stack and enable paging by setting `PG` and `PE` bits in the `cr0` register:
580
+After this we push `eax` to the stack and enable paging by setting the `PG` and `PE` bits in the `cr0` register:
559 581
 
560 582
 ```assembly
561 583
 	pushl	%eax
@@ -563,15 +585,13 @@ After this we push this address to the stack and enable paging by setting `PG` a
563 585
 	movl	%eax, %cr0
564 586
 ```
565 587
 
566
-and execute:
588
+We then execute the `lret` instruction:
567 589
 
568 590
 ```assembly
569 591
 lret
570 592
 ```
571 593
 
572
-instruction.
573
-
574
-Remember that we pushed the address of the `startup_64` function to the stack in the previous step, and after the `lret` instruction, the CPU extracts the address of it and jumps there.
594
+Remember that we pushed the address of the `startup_64` function to the stack in the previous step. The CPU extracts `startup_64`'s address from the stack and jumps there.
575 595
 
576 596
 After all of these steps we're finally in 64-bit mode:
577 597
 
@@ -589,11 +609,11 @@ That's all!
589 609
 Conclusion
590 610
 --------------------------------------------------------------------------------
591 611
 
592
-This is the end of the fourth part linux kernel booting process. If you have questions or suggestions, ping me in twitter [0xAX](https://twitter.com/0xAX), drop me [email](anotherworldofworld@gmail.com) or just create an [issue](https://github.com/0xAX/linux-insides/issues/new).
612
+This is the end of the fourth part of the Linux kernel booting process. If you have any questions or suggestions, ping me on twitter [0xAX](https://twitter.com/0xAX), drop me an [email](anotherworldofworld@gmail.com) or just create an [issue](https://github.com/0xAX/linux-insides/issues/new).
593 613
 
594
-In the next part, we will see kernel decompression and much more.
614
+In the next part, we will learn about many things, including how kernel decompression works.
595 615
 
596
-**Please note that English is not my first language and I am really sorry for any inconvenience. If you find any mistakes please send me PR to [linux-insides](https://github.com/0xAX/linux-internals).**
616
+**Please note that English is not my first language and I am really sorry for any inconvenience. If you find any mistakes please send a PR to [linux-insides](https://github.com/0xAX/linux-internals).**
597 617
 
598 618
 Links
599 619
 --------------------------------------------------------------------------------

+ 52
- 50
Booting/linux-bootstrap-5.md View File

@@ -1,15 +1,15 @@
1 1
 Kernel booting process. Part 5.
2 2
 ================================================================================
3 3
 
4
-Kernel decompression
4
+Kernel Decompression
5 5
 --------------------------------------------------------------------------------
6 6
 
7
-This is the fifth part of the `Kernel booting process` series. We saw transition to the 64-bit mode in the previous [part](https://github.com/0xAX/linux-insides/blob/v4.16/Booting/linux-bootstrap-4.md#transition-to-the-long-mode) and we will continue from this point in this part. We will see the last steps before we jump to the kernel code as preparation for kernel decompression, relocation and directly kernel decompression. So... let's start to dive in the kernel code again.
7
+This is the fifth part of the `Kernel booting process` series. We went over the transition to 64-bit mode in the previous [part](https://github.com/0xAX/linux-insides/blob/v4.16/Booting/linux-bootstrap-4.md#transition-to-the-long-mode) and we will continue where we left off in this part. We will study the steps taken to prepare for kernel decompression and relocation, as well as the process of kernel decompression itself. So... let's dive into the kernel code again.
8 8
 
9
-Preparation before kernel decompression
9
+Preparing to Decompress the Kernel
10 10
 --------------------------------------------------------------------------------
11 11
 
12
-We stopped right before the jump on the `64-bit` entry point - `startup_64` which is located in the [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S) source code file. We already saw the jump to the `startup_64` in the `startup_32`:
12
+We stopped right before the jump to the `64-bit` entry point - `startup_64` which is located in the [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S) source code file. We already covered the jump to `startup_64` from `startup_32` in the previous part:
13 13
 
14 14
 ```assembly
15 15
 	pushl	$__KERNEL_CS
@@ -24,7 +24,7 @@ We stopped right before the jump on the `64-bit` entry point - `startup_64` whic
24 24
 	lret
25 25
 ```
26 26
 
27
-in the previous part. Since we loaded the new `Global Descriptor Table` and there was CPU transition in other mode (`64-bit` mode in our case), we can see the setup of the data segments:
27
+Since we have loaded a new `Global Descriptor Table` and the CPU has transitioned to a new mode (`64-bit` mode in our case), we set up the segment registers again at the beginning of the `startup_64` function:
28 28
 
29 29
 ```assembly
30 30
 	.code64
@@ -38,9 +38,9 @@ ENTRY(startup_64)
38 38
 	movl	%eax, %gs
39 39
 ```
40 40
 
41
-in the beginning of the `startup_64`. All segment registers besides `cs` register now reseted as we joined into the `long mode`.
41
+All segment registers besides the `cs` register are now reset in `long mode`.
42 42
 
43
-The next step is computation of difference between where the kernel was compiled and where it was loaded:
43
+The next step is to compute the difference between the location the kernel was compiled to be loaded at and the location where it is actually loaded:
44 44
 
45 45
 ```assembly
46 46
 #ifdef CONFIG_RELOCATABLE
@@ -60,9 +60,9 @@ The next step is computation of difference between where the kernel was compiled
60 60
 	addq	%rbp, %rbx
61 61
 ```
62 62
 
63
-The `rbp` contains the decompressed kernel start address and after this code executes `rbx` register will contain address to relocate the kernel code for decompression. We already saw code like this in the `startup_32` ( you can read about it in the previous part - [Calculate relocation address](https://github.com/0xAX/linux-insides/blob/v4.16/Booting/linux-bootstrap-4.md#calculate-relocation-address)), but we need to do this calculation again because the bootloader can use 64-bit boot protocol and `startup_32` just will not be executed in this case.
63
+The `rbp` register contains the decompressed kernel's start address. After this code executes, the `rbx` register will contain the address where the kernel code will be relocated to for decompression. We've already done this before in the `startup_32` function (you can read about this in the previous part - [Calculate relocation address](https://github.com/0xAX/linux-insides/blob/v4.16/Booting/linux-bootstrap-4.md#calculate-relocation-address)), but we need to do this calculation again because the bootloader can use the 64-bit boot protocol now and `startup_32` is no longer being executed.
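To make the calculation above concrete, here is a minimal C sketch (our own illustration, not kernel code; the name `align_up` is ours) of the power-of-two align-up operation the relocatable case performs on the load address:

```c
#include <stdint.h>

/* Illustrative sketch: round addr up to the next multiple of align,
 * which is what the relocatable case does with the kernel alignment
 * taken from the boot parameters. align must be a power of two. */
static inline uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}
```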
64 64
 
65
-In the next step we can see setup of the stack pointer, resetting of the flags register and setup `GDT` again because of in a case of `64-bit` protocol `32-bit` code segment can be omitted by bootloader:
65
+In the next step we set up the stack pointer, reset the flags register and set up the `GDT` again to overwrite the `32-bit` specific values with those from the `64-bit` protocol:
66 66
 
67 67
 ```assembly
68 68
     leaq	boot_stack_end(%rbx), %rsp
@@ -75,9 +75,9 @@ In the next step we can see setup of the stack pointer, resetting of the flags r
75 75
     popfq
76 76
 ```
77 77
 
78
-If you look at the Linux kernel source code after `lgdt gdt64(%rip)` instruction, you will see that there is some additional code. This code builds trampoline to enable [5-level pagging](https://lwn.net/Articles/708526/) if need. We will consider only 4-level paging in this books, so this code will be omitted.
78
+If you take a look at the code after the `lgdt gdt64(%rip)` instruction, you will see that there is some additional code. This code builds the trampoline to enable [5-level paging](https://lwn.net/Articles/708526/) if needed. We will only consider 4-level paging in this book, so this code will be omitted.
79 79
 
80
-As you can see above, the `rbx` register contains the start address of the kernel decompressor code and we just put this address with `boot_stack_end` offset to the `rsp` register which represents pointer to the top of the stack. After this step, the stack will be correct. You can find definition of the `boot_stack_end` in the end of [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S) assembly source code file:
80
+As you can see above, the `rbx` register contains the start address of the kernel decompressor code and we just put this address with an offset of `boot_stack_end` in the `rsp` register which points to the top of the stack. After this step, the stack will be correct. You can find the definition of the `boot_stack_end` constant at the end of the [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S) assembly source code file:
81 81
 
82 82
 ```assembly
83 83
 	.bss
@@ -89,9 +89,9 @@ boot_stack:
89 89
 boot_stack_end:
90 90
 ```
91 91
 
92
-It located in the end of the `.bss` section, right before the `.pgtable`. If you will look into [arch/x86/boot/compressed/vmlinux.lds.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/vmlinux.lds.S) linker script, you will find  Definition of the `.bss` and `.pgtable` there.
92
+It is located at the end of the `.bss` section, right before `.pgtable`. If you peek inside the [arch/x86/boot/compressed/vmlinux.lds.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/vmlinux.lds.S) linker script, you will find the definitions of `.bss` and `.pgtable` there.
93 93
 
94
-As we set the stack, now we can copy the compressed kernel to the address that we got above, when we calculated the relocation address of the decompressed kernel. Before details, let's look at this assembly code:
94
+Since the stack is now correct, we can copy the compressed kernel to the address that we got above, when we calculated the relocation address of the decompressed kernel. Before we get into the details, let's take a look at this assembly code:
95 95
 
96 96
 ```assembly
97 97
 	pushq	%rsi
@@ -105,9 +105,11 @@ As we set the stack, now we can copy the compressed kernel to the address that w
105 105
 	popq	%rsi
106 106
 ```
107 107
 
108
-First of all we push `rsi` to the stack. We need preserve the value of `rsi`, because this register now stores a pointer to the `boot_params` which is real mode structure that contains booting related data (you must remember this structure, we filled it in the start of kernel setup). In the end of this code we'll restore the pointer to the `boot_params` into `rsi` again. 
108
+This set of instructions copies the compressed kernel over to where it will be decompressed.
109
+
110
+First of all, we push `rsi` to the stack. We need to preserve the value of `rsi`, because this register now stores a pointer to `boot_params`, which is a real mode structure that contains booting related data (remember, this structure was populated at the start of the kernel setup). We pop the pointer to `boot_params` back to `rsi` after we execute this code.
109 111
 
110
-The next two `leaq` instructions calculates effective addresses of the `rip` and `rbx` with `_bss - 8` offset and put it to the `rsi` and `rdi`. Why do we calculate these addresses? Actually the compressed kernel image is located between this copying code (from `startup_32` to the current code) and the decompression code. You can verify this by looking at the linker script - [arch/x86/boot/compressed/vmlinux.lds.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/vmlinux.lds.S):
112
+The next two `leaq` instructions calculate the effective addresses of the `rip` and `rbx` registers with an offset of `_bss - 8` and assign the results to `rsi` and `rdi` respectively. Why do we calculate these addresses? The compressed kernel image is located between this code (from `startup_32` to the current code) and the decompression code. You can verify this by looking at this linker script - [arch/x86/boot/compressed/vmlinux.lds.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/vmlinux.lds.S):
111 113
 
112 114
 ```
113 115
 	. = 0;
@@ -127,7 +129,7 @@ The next two `leaq` instructions calculates effective addresses of the `rip` and
127 129
 	}
128 130
 ```
129 131
 
130
-Note that `.head.text` section contains `startup_32`. You may remember it from the previous part:
132
+Note that the `.head.text` section contains `startup_32`. You may remember it from the previous part:
131 133
 
132 134
 ```assembly
133 135
 	__HEAD
@@ -138,7 +140,7 @@ ENTRY(startup_32)
138 140
 ...
139 141
 ```
140 142
 
141
-The `.text` section contains decompression code:
143
+The `.text` section contains the decompression code:
142 144
 
143 145
 ```assembly
144 146
 	.text
@@ -152,21 +154,21 @@ relocated:
152 154
 ...
153 155
 ```
154 156
 
155
-And `.rodata..compressed` contains the compressed kernel image. So `rsi` will contain the absolute address of `_bss - 8`, and `rdi` will contain the relocation relative address of `_bss - 8`. As we store these addresses in registers, we put the address of `_bss` in the `rcx` register. As you can see in the `vmlinux.lds.S` linker script, it's located at the end of all sections with the setup/kernel code. Now we can start to copy data from `rsi` to `rdi`, `8` bytes at the time, with the `movsq` instruction. 
157
+And `.rodata..compressed` contains the compressed kernel image. So `rsi` will contain the absolute address of `_bss - 8`, and `rdi` will contain the relocation relative address of `_bss - 8`. Along with these addresses, we also put the address of `_bss` in the `rcx` register. As you can see in the `vmlinux.lds.S` linker script, it's located at the end of all sections with the setup/kernel code. Now we can start copying data from `rsi` to `rdi`, `8` bytes at a time, with the `movsq` instruction.
156 158
 
157
-Note that there is an `std` instruction before data copying: it sets the `DF` flag, which means that `rsi` and `rdi` will be decremented. In other words, we will copy the bytes backwards. At the end, we clear the `DF` flag with the `cld` instruction, and restore `boot_params` structure to `rsi`.
159
+Note that we execute an `std` instruction before copying the data. This sets the `DF` flag, which means that `rsi` and `rdi` will be decremented. In other words, we will copy the bytes backwards. At the end, we clear the `DF` flag with the `cld` instruction, and restore the `boot_params` structure to `rsi`.
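The backwards copy described above can be sketched in C (an illustration, not the kernel's code): copying from the highest quadword down is what makes the copy safe when the destination range overlaps the source from above, which is exactly the situation here.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative sketch of what the std/movsq/cld sequence achieves:
 * copy count quadwords starting at the highest address and moving
 * downward, so a destination that overlaps the source from above is
 * not clobbered before it is read. */
static void copy_qwords_backwards(uint64_t *dst, const uint64_t *src, size_t count)
{
    for (size_t i = count; i-- > 0; )
        dst[i] = src[i];
}
```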
158 160
 
159
-Now we have the address of the `.text` section address after relocation, and we can jump to it:
161
+Now we have a pointer to the `.text` section's address after relocation, and we can jump to it:
160 162
 
161 163
 ```assembly
162 164
 	leaq	relocated(%rbx), %rax
163 165
 	jmp	*%rax
164 166
 ```
165 167
 
166
-Last preparation before kernel decompression
168
+The final touches before kernel decompression
167 169
 --------------------------------------------------------------------------------
168 170
 
169
-In the previous paragraph we saw that the `.text` section starts with the `relocated` label. The first thing it does is clearing the `bss` section with:
171
+In the previous paragraph we saw that the `.text` section starts with the `relocated` label. The first thing we do is to clear the `bss` section with:
170 172
 
171 173
 ```assembly
172 174
 	xorl	%eax, %eax
@@ -177,9 +179,9 @@ In the previous paragraph we saw that the `.text` section starts with the `reloc
177 179
 	rep	stosq
178 180
 ```
179 181
 
180
-We need to initialize the `.bss` section, because we'll soon jump to [C](https://en.wikipedia.org/wiki/C_%28programming_language%29) code. Here we just clear `eax`, put the address of `_bss` in `rdi` and `_ebss` in `rcx`, and fill it with zeros with the `rep stosq` instruction.
182
+We need to initialize the `.bss` section, because we'll soon jump to [C](https://en.wikipedia.org/wiki/C_%28programming_language%29) code. Here we just clear `eax`, put the addresses of `_bss` in `rdi` and `_ebss` in `rcx`, and fill `.bss` with zeros using the `rep stosq` instruction.
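What the `rep stosq` loop does can be sketched in C (illustrative only; `zero_region` is our name, not a kernel function):

```c
#include <stdint.h>

/* Illustrative sketch: zero the region [start, end) eight bytes at a
 * time, like the rep stosq loop that clears .bss between _bss and
 * _ebss. Assumes the region size is a multiple of 8 bytes. */
static void zero_region(uint64_t *start, uint64_t *end)
{
    while (start < end)
        *start++ = 0;
}
```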
181 183
 
182
-At the end, we can see the call to the `extract_kernel` function:
184
+At the end, we can see a call to the `extract_kernel` function:
183 185
 
184 186
 ```assembly
185 187
 	pushq	%rsi
@@ -193,21 +195,21 @@ At the end, we can see the call to the `extract_kernel` function:
193 195
 	popq	%rsi
194 196
 ```
195 197
 
196
-Again we set `rdi` to a pointer to the `boot_params` structure and preserve it on the stack. In the same time we set `rsi` to point to the area which should be used for kernel uncompression. The last step is preparation of the `extract_kernel` parameters and call of this function which will uncompres the kernel. The `extract_kernel` function is defined in the  [arch/x86/boot/compressed/misc.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/misc.c) source code file and takes six arguments:
198
+Like before, we push `rsi` onto the stack to preserve the pointer to `boot_params`. We also copy the contents of `rsi` to `rdi`. Then, we set `rsi` to point to the area where the kernel will be decompressed. The last step is to prepare the parameters for the `extract_kernel` function and call it to decompress the kernel. The `extract_kernel` function is defined in the [arch/x86/boot/compressed/misc.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/misc.c) source code file and takes six arguments:
197 199
 
198
-* `rmode` - pointer to the [boot_params](https://github.com/torvalds/linux/blob/v4.16/arch/x86/include/uapi/asm/bootparam.h) structure which is filled by bootloader or during early kernel initialization;
199
-* `heap` - pointer to the `boot_heap` which represents start address of the early boot heap;
200
-* `input_data` - pointer to the start of the compressed kernel or in other words pointer to the `arch/x86/boot/compressed/vmlinux.bin.bz2`;
201
-* `input_len` - size of the compressed kernel;
202
-* `output` - start address of the future decompressed kernel;
203
-* `output_len` - size of decompressed kernel;
200
+* `rmode` - a pointer to the [boot_params](https://github.com/torvalds/linux/blob/v4.16/arch/x86/include/uapi/asm/bootparam.h) structure which is filled by either the bootloader or during early kernel initialization;
201
+* `heap` - a pointer to `boot_heap` which represents the start address of the early boot heap;
202
+* `input_data` - a pointer to the start of the compressed kernel or in other words, a pointer to the `arch/x86/boot/compressed/vmlinux.bin.bz2` file;
203
+* `input_len` - the size of the compressed kernel;
204
+* `output` - the start address of the decompressed kernel;
205
+* `output_len` - the size of the decompressed kernel;
204 206
 
205
-All arguments will be passed through the registers according to [System V Application Binary Interface](http://www.x86-64.org/documentation/abi.pdf). We've finished all preparation and can now look at the kernel decompression.
207
+All arguments will be passed through registers as per the [System V Application Binary Interface](http://www.x86-64.org/documentation/abi.pdf). We've finished all the preparations and can now decompress the kernel.
206 208
 
207 209
 Kernel decompression
208 210
 --------------------------------------------------------------------------------
209 211
 
210
-As we saw in previous paragraph, the `extract_kernel` function is defined in the [arch/x86/boot/compressed/misc.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/misc.c) source code file and takes six arguments. This function starts with the video/console initialization that we already saw in the previous parts. We need to do this again because we don't know if we started in [real mode](https://en.wikipedia.org/wiki/Real_mode) or a bootloader was used, or whether the bootloader used the `32` or `64-bit` boot protocol.
212
+As we saw in the previous paragraph, the `extract_kernel` function is defined in the [arch/x86/boot/compressed/misc.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/misc.c) source code file and takes six arguments. This function starts with the video/console initialization that we already saw in the previous parts. We need to do this again because we don't know if we started in [real mode](https://en.wikipedia.org/wiki/Real_mode) or if a bootloader was used, or whether the bootloader used the `32` or `64-bit` boot protocol.
211 213
 
212 214
 After the first initialization steps, we store pointers to the start of the free memory and to the end of it:
213 215
 
@@ -216,26 +218,26 @@ free_mem_ptr     = heap;
216 218
 free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
217 219
 ```
218 220
 
219
-where the `heap` is the second parameter of the `extract_kernel` function which we got in the [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S):
221
+Here, `heap` is the second parameter of the `extract_kernel` function as passed to it in [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S):
220 222
 
221 223
 ```assembly
222 224
 leaq	boot_heap(%rip), %rsi
223 225
 ```
224 226
 
225
-As you saw above, the `boot_heap` is defined as:
227
+As you saw above, `boot_heap` is defined as:
226 228
 
227 229
 ```assembly
228 230
 boot_heap:
229 231
 	.fill BOOT_HEAP_SIZE, 1, 0
230 232
 ```
231 233
 
232
-where the `BOOT_HEAP_SIZE` is macro which expands to `0x10000` (`0x400000` in a case of `bzip2` kernel) and represents the size of the heap.
234
+where `BOOT_HEAP_SIZE` is a macro which expands to `0x10000` (`0x400000` in the case of a `bzip2` kernel) and represents the size of the heap.
233 235
 
234
-After heap pointers initialization, the next step is the call of the `choose_random_location` function from [arch/x86/boot/compressed/kaslr.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/kaslr.c) source code file. As we can guess from the function name, it chooses the memory location where the kernel image will be decompressed. It may look weird that we need to find or even `choose` location where to decompress the compressed kernel image, but the Linux kernel supports [kASLR](https://en.wikipedia.org/wiki/Address_space_layout_randomization) which allows decompression of the kernel into a random address, for security reasons.
236
+After we initialize the heap pointers, the next step is to call the `choose_random_location` function from the [arch/x86/boot/compressed/kaslr.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/kaslr.c) source code file. As we can guess from the function name, it chooses a memory location to write the decompressed kernel to. It may look weird that we need to find or even `choose` where to decompress the compressed kernel image, but the Linux kernel supports [kASLR](https://en.wikipedia.org/wiki/Address_space_layout_randomization) which allows decompression of the kernel into a random address, for security reasons.
235 237
 
236
-We will not consider randomization of the Linux kernel load address in this part, but will do it in the next part.
238
+We'll take a look at how the kernel's load address is randomized in the next part.
237 239
 
238
-Now let's back to [misc.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/misc.c). After getting the address for the kernel image, there need to be some checks to be sure that the retrieved random address is correctly aligned and address is not wrong:
240
+Now let's get back to [misc.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/misc.c). After getting the address for the kernel image, we need to check that the random address we got is correctly aligned, and in general, not wrong:
239 241
 
240 242
 ```C
241 243
 if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
@@ -260,16 +262,16 @@ if (virt_addr != LOAD_PHYSICAL_ADDR)
260 262
 After all these checks we will see the familiar message:
261 263
 
262 264
 ```
263
-Decompressing Linux... 
265
+Decompressing Linux...
264 266
 ```
265 267
 
266
-and call the `__decompress` function:
268
+Now, we call the `__decompress` function to decompress the kernel:
267 269
 
268 270
 ```C
269 271
 __decompress(input_data, input_len, NULL, NULL, output, output_len, NULL, error);
270 272
 ```
271 273
 
272
-which will decompress the kernel. The implementation of the `__decompress` function depends on what decompression algorithm was chosen during kernel compilation:
274
+The implementation of the `__decompress` function depends on what decompression algorithm was chosen during kernel compilation:
273 275
 
274 276
 ```C
275 277
 #ifdef CONFIG_KERNEL_GZIP
@@ -297,7 +299,7 @@ which will decompress the kernel. The implementation of the `__decompress` funct
297 299
 #endif
298 300
 ```
299 301
 
300
-After kernel is decompressed, the last two functions are `parse_elf` and `handle_relocations`. The main point of these functions is to move the uncompressed kernel image to the correct memory place. The fact is that the decompression will decompress [in-place](https://en.wikipedia.org/wiki/In-place_algorithm), and we still need to move kernel to the correct address. As we already know, the kernel image is an [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) executable, so the main goal of the `parse_elf` function is to move loadable segments to the correct address. We can see loadable segments in the output of the `readelf` program:
302
+After the kernel is decompressed, two more functions are called: `parse_elf` and `handle_relocations`. The main point of these functions is to move the decompressed kernel image to its correct place in memory. This is because the decompression is done [in-place](https://en.wikipedia.org/wiki/In-place_algorithm), and we still need to move the kernel to the correct address. As we already know, the kernel image is an [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) executable. The main goal of the `parse_elf` function is to move loadable segments to the correct address. We can see the kernel's loadable segments in the output of the `readelf` program:
301 303
 
302 304
 ```
303 305
 readelf -l vmlinux
@@ -319,7 +321,7 @@ Program Headers:
319 321
                  0x0000000000138000 0x000000000029b000  RWE    200000
320 322
 ```
321 323
 
322
-The goal of the `parse_elf` function is to load these segments to the `output` address we got from the `choose_random_location` function. This function starts with checking the [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) signature:
324
+The goal of the `parse_elf` function is to load these segments to the `output` address we got from the `choose_random_location` function. This function starts by checking the [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) signature:
323 325
 
324 326
 ```C
325 327
 Elf64_Ehdr ehdr;
@@ -336,7 +338,7 @@ if (ehdr.e_ident[EI_MAG0] != ELFMAG0 ||
336 338
 }
337 339
 ```
338 340
 
339
-and if it's not valid, it prints an error message and halts. If we got a valid `ELF` file, we go through all program headers from the given `ELF` file and copy all loadable segments with correct 2 megabytes aligned address to the output buffer:
341
+If the ELF header is not valid, it prints an error message and halts. If we have a valid `ELF` file, we go through all the program headers from the given `ELF` file and copy all loadable segments with correct 2 megabyte aligned addresses to the output buffer:
340 342
 
341 343
 ```C
342 344
 	for (i = 0; i < ehdr.e_phnum; i++) {
@@ -347,7 +349,7 @@ and if it's not valid, it prints an error message and halts. If we got a valid `
347 349
 #ifdef CONFIG_X86_64
348 350
 			if ((phdr->p_align % 0x200000) != 0)
349 351
 				error("Alignment of LOAD segment isn't multiple of 2MB");
350
-#endif                
352
+#endif
351 353
 #ifdef CONFIG_RELOCATABLE
352 354
 			dest = output;
353 355
 			dest += (phdr->p_paddr - LOAD_PHYSICAL_ADDR);
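The destination computation in that loop can be sketched in C (illustrative; the helper name `segment_dest` is ours, and we use the default `LOAD_PHYSICAL_ADDR` of `0x1000000` as an assumed constant):

```c
#include <stdint.h>

#define LOAD_PHYSICAL_ADDR_SKETCH 0x1000000UL /* assumed default link address */

/* Illustrative sketch: for a relocatable kernel, each loadable segment
 * lands at output plus the segment's offset from the address the kernel
 * was linked for, mirroring dest = output + (p_paddr - LOAD_PHYSICAL_ADDR). */
static uint64_t segment_dest(uint64_t output, uint64_t p_paddr)
{
    return output + (p_paddr - LOAD_PHYSICAL_ADDR_SKETCH);
}
```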
@@ -366,9 +368,9 @@ That's all.
366 368
 
367 369
 From this moment, all loadable segments are in the correct place.
368 370
 
369
-The next step after the `parse_elf` function is the call of the `handle_relocations` function. Implementation of this function depends on the `CONFIG_X86_NEED_RELOCS` kernel configuration option and if it is enabled, this function adjusts addresses in the kernel image, and is called only if the `CONFIG_RANDOMIZE_BASE` configuration option was enabled during kernel configuration. Implementation of the `handle_relocations` function is easy enough. This function subtracts value of the `LOAD_PHYSICAL_ADDR` from the value of the base load address of the kernel and thus we obtain the difference between where the kernel was linked to load and where it was actually loaded. After this we can perform kernel relocation as we know actual address where the kernel was loaded, its address where it was linked to run and relocation table which is in the end of the kernel image.
371
+The next step after the `parse_elf` function is to call the `handle_relocations` function. The implementation of this function depends on the `CONFIG_X86_NEED_RELOCS` kernel configuration option and if it is enabled, this function adjusts addresses in the kernel image. This function is also only called if the `CONFIG_RANDOMIZE_BASE` configuration option was enabled during kernel configuration. The implementation of the `handle_relocations` function is easy enough. This function subtracts the value of `LOAD_PHYSICAL_ADDR` from the value of the base load address of the kernel and thus we obtain the difference between where the kernel was linked to load and where it was actually loaded. After this we can relocate the kernel since we know the actual address where the kernel was loaded, the address where it was linked to run and the relocation table which is at the end of the kernel image.
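The core idea of that adjustment can be sketched in C (illustrative only; the kernel's actual relocation-table format and the `handle_relocations` internals are more involved):

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative sketch: compute the delta between where the kernel
 * actually landed and where it was linked to run, then add that delta
 * to every recorded relocation target. */
static void apply_delta(uint64_t *targets, size_t n,
                        uint64_t load_addr, uint64_t link_addr)
{
    uint64_t delta = load_addr - link_addr;
    for (size_t i = 0; i < n; i++)
        targets[i] += delta;
}
```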
370 372
 
371
-After the kernel is relocated, we return back from the `extract_kernel` to [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S).
373
+After the kernel is relocated, we return from the `extract_kernel` function to [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S).
372 374
 
373 375
 The address of the kernel will be in the `rax` register and we jump to it:
374 376
 
@@ -381,9 +383,9 @@ That's all. Now we are in the kernel!
381 383
 Conclusion
382 384
 --------------------------------------------------------------------------------
383 385
 
384
-This is the end of the fifth part about linux kernel booting process. We will not see posts about kernel booting anymore (maybe updates to this and previous posts), but there will be many posts about other kernel internals. 
386
+This is the end of the fifth part about the linux kernel booting process. We will not see any more posts about the kernel booting process (there may be updates to this and previous posts though), but there will be many posts about other kernel internals.
385 387
 
386
-Next chapter will describe more advanced details about linux kernel booting process, like a load address randomization and etc.
388
+The next chapter will describe more advanced details about the linux kernel booting process, like load address randomization.
387 389
 
388 390
 If you have any questions or suggestions write me a comment or ping me in [twitter](https://twitter.com/0xAX).
389 391
 

+ 58
- 58
Booting/linux-bootstrap-6.md View File

@@ -4,9 +4,9 @@ Kernel booting process. Part 6.
4 4
 Introduction
5 5
 --------------------------------------------------------------------------------
6 6
 
7
-This is the sixth part of the `Kernel booting process` series. In the [previous part](linux-bootstrap-5.md) we have seen the end of the kernel boot process. But we have skipped some important advanced parts.
7
+This is the sixth part of the `Kernel booting process` series. In the [previous part](linux-bootstrap-5.md) we took a look at the final stages of the Linux kernel boot process. But we have skipped some important, more advanced parts.
8 8
 
9
-As you may remember the entry point of the Linux kernel is the `start_kernel` function from the [main.c](https://github.com/torvalds/linux/blob/v4.16/init/main.c) source code file started to execute at `LOAD_PHYSICAL_ADDR` address. This address depends on the `CONFIG_PHYSICAL_START` kernel configuration option which is `0x1000000` by default:
9
+As you may remember, the entry point of the Linux kernel is the `start_kernel` function defined in the [main.c](https://github.com/torvalds/linux/blob/v4.16/init/main.c) source code file. This function starts executing at the address stored in `LOAD_PHYSICAL_ADDR`, which depends on the `CONFIG_PHYSICAL_START` kernel configuration option and is `0x1000000` by default:
10 10
 
11 11
 ```
12 12
 config PHYSICAL_START
@@ -19,18 +19,18 @@ config PHYSICAL_START
19 19
       ...
20 20
 ```
21 21
 
-This value may be changed during kernel configuration, but also load address can be selected as a random value. For this purpose the `CONFIG_RANDOMIZE_BASE` kernel configuration option should be enabled during kernel configuration.
+This value may be changed during kernel configuration, but the load address can also be configured to be a random value. For this purpose, the `CONFIG_RANDOMIZE_BASE` kernel configuration option should be enabled during kernel configuration.
 
-In this case a physical address at which Linux kernel image will be decompressed and loaded will be randomized. This part considers the case when this option is enabled and load address of the kernel image will be randomized for [security reasons](https://en.wikipedia.org/wiki/Address_space_layout_randomization).
+Now, the physical address where the Linux kernel image will be decompressed and loaded will be randomized. This part considers the case when the `CONFIG_RANDOMIZE_BASE` option is enabled and the load address of the kernel image is randomized for [security reasons](https://en.wikipedia.org/wiki/Address_space_layout_randomization).
 
-Initialization of page tables
+Page Table Initialization
 --------------------------------------------------------------------------------
 
-Before the kernel decompressor will start to find random memory range where the kernel will be decompressed and loaded, the identity mapped page tables should be initialized. If a [bootloader](https://en.wikipedia.org/wiki/Booting) used [16-bit or 32-bit boot protocol](https://github.com/torvalds/linux/blob/v4.16/Documentation/x86/boot.txt), we already have page tables. But in any case, we may need new pages by demand if the kernel decompressor selects memory range outside of them. That's why we need to build new identity mapped page tables.
+Before the kernel decompressor can look for a random memory range to decompress and load the kernel to, the identity mapped page tables should be initialized. If the [bootloader](https://github.com/torvalds/linux/blob/v4.16/Documentation/x86/boot.txt) used the [16-bit or 32-bit boot protocol](https://github.com/torvalds/linux/blob/v4.16/Documentation/x86/boot.txt), we already have page tables. But, there may be problems if the kernel decompressor selects a memory range which is valid only in a 64-bit context. That's why we need to build new identity mapped page tables.
 
-Yes, building of identity mapped page tables is the one of the first step during randomization of load address. But before we will consider it, let's try to remember where did we come from to this point.
+Indeed, the first step in randomizing the kernel load address is to build new identity mapped page tables. But first, let's reflect on how we got to this point.
 
-In the [previous part](linux-bootstrap-5.md), we saw transition to [long mode](https://en.wikipedia.org/wiki/Long_mode) and jump to the kernel decompressor entry point - `extract_kernel` function. The randomization stuff starts here from the call of the:
+In the [previous part](linux-bootstrap-5.md), we followed the transition to [long mode](https://en.wikipedia.org/wiki/Long_mode) and jumped to the kernel decompressor entry point - the `extract_kernel` function. The randomization begins with a call to the `choose_random_location` function:
 
 ```C
 void choose_random_location(unsigned long input,
@@ -41,7 +41,7 @@ void choose_random_location(unsigned long input,
 {}
 ```
 
-function. As you may see, this function takes following five parameters:
+This function takes five parameters:
 
   * `input`;
   * `input_size`;
@@ -49,7 +49,7 @@ function. As you may see, this function takes following five parameters:
   * `output_size`;
   * `virt_addr`.
 
-Let's try to understand what these parameters are. The first `input` parameter came from parameters of the `extract_kernel` function from the [arch/x86/boot/compressed/misc.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/misc.c) source code file:
+Let's try to understand what these parameters are. The first parameter, `input`, is just the `input_data` parameter of the `extract_kernel` function from the [arch/x86/boot/compressed/misc.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/misc.c) source code file, cast to `unsigned long`:
 
 ```C
 asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
@@ -71,13 +71,13 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 }
 ```
 
-This parameter is passed from assembler code:
+This parameter is passed through assembly from the [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S) source code file:
 
 ```assembly
 leaq	input_data(%rip), %rdx
 ```
 
-from the [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S). The `input_data` is generated by the little [mkpiggy](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/mkpiggy.c) program. If you have compiled linux kernel source code under your hands, you may find the generated file by this program which should be placed in the `linux/arch/x86/boot/compressed/piggy.S`. In my case this file looks:
+`input_data` is generated by the little [mkpiggy](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/mkpiggy.c) program. If you've tried compiling the Linux kernel yourself, you may find the output generated by this program in the `linux/arch/x86/boot/compressed/piggy.S` source code file. In my case this file looks like this:
 
 ```assembly
 .section ".rodata..compressed","a",@progbits
@@ -91,21 +91,21 @@ input_data:
 input_data_end:
 ```
 
-As you may see it contains four global symbols. The first two `z_input_len` and `z_output_len` which are sizes of compressed and uncompressed `vmlinux.bin.gz`. The third is our `input_data` and as you may see it points to linux kernel image in raw binary format (all debugging symbols, comments and relocation information are stripped). And the last `input_data_end` points to the end of the compressed linux image.
+As you can see, it contains four global symbols. The first two, `z_input_len` and `z_output_len`, are the sizes of the compressed and uncompressed `vmlinux.bin.gz` archive. The third is our `input_data`, which points to the Linux kernel image's raw binary (stripped of all debugging symbols, comments and relocation information). The last symbol, `input_data_end`, points to the end of the compressed Linux image.
 
-So, our first parameter of the `choose_random_location` function is the pointer to the compressed kernel image that is embedded into the `piggy.o` object file.
+So, the first parameter to the `choose_random_location` function is the pointer to the compressed kernel image that is embedded into the `piggy.o` object file.
 
-The second parameter of the `choose_random_location` function is the `z_input_len` that we have seen just now.
+The second parameter of the `choose_random_location` function is `z_input_len`.
 
-The third and fourth parameters of the `choose_random_location` function are address where to place decompressed kernel image and the length of decompressed kernel image respectively. The address where to put decompressed kernel came from [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S) and it is address of the `startup_32` aligned to 2 megabytes boundary. The size of the decompressed kernel came from the same `piggy.S` and it is `z_output_len`.
+The third and fourth parameters of the `choose_random_location` function are the address of the decompressed kernel image and its length respectively. The decompressed kernel's address came from the [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S) source code file and is the address of `startup_32` aligned to a 2 megabyte boundary. The size of the decompressed kernel is given by `z_output_len`, which, again, is found in `piggy.S`.
 
-The last parameter of the `choose_random_location` function is the virtual address of the kernel load address. As we may see, by default it coincides with the default physical load address:
+The last parameter of the `choose_random_location` function is the virtual address of the kernel load address. As can be seen, by default, it coincides with the default physical load address:
 
 ```C
 unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
 ```
 
-which depends on kernel configuration:
+The physical load address is defined by the following configuration options:
 ```C
 #define LOAD_PHYSICAL_ADDR ((CONFIG_PHYSICAL_START \
@@ -113,7 +113,7 @@ which depends on kernel configuration:
 				& ~(CONFIG_PHYSICAL_ALIGN - 1))
 ```
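To get a feel for this arithmetic, here is a small standalone sketch of the same align-up computation. The `CONFIG_PHYSICAL_ALIGN` value is hard-coded to the usual `x86_64` default of 2 megabytes for illustration; a real kernel takes both values from its `.config`:

```c
#include <assert.h>

/* Illustrative default: CONFIG_PHYSICAL_ALIGN is 0x200000 (2 MB) on
 * x86_64. A real kernel reads this from its configuration. */
#define CONFIG_PHYSICAL_ALIGN 0x200000UL

/* Same arithmetic as the LOAD_PHYSICAL_ADDR macro: round
 * physical_start up to the next CONFIG_PHYSICAL_ALIGN boundary. */
static unsigned long load_physical_addr(unsigned long physical_start)
{
	return (physical_start + (CONFIG_PHYSICAL_ALIGN - 1))
		& ~(CONFIG_PHYSICAL_ALIGN - 1);
}
```

With the default `CONFIG_PHYSICAL_START` of `0x1000000`, which is already 2 megabyte aligned, the rounding changes nothing; an unaligned start such as `0x1234567` would be rounded up to the next boundary, `0x1400000`.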
 
-Now, as we considered parameters of the `choose_random_location` function, let's look at implementation of it. This function starts from the checking of `nokaslr` option in the kernel command line:
+We've covered `choose_random_location`'s parameters, so let's look at its implementation. This function starts by checking the `nokaslr` option in the kernel command line:
 
 ```C
 if (cmdline_find_option_bool("nokaslr")) {
@@ -122,7 +122,7 @@ if (cmdline_find_option_bool("nokaslr")) {
 }
 ```
 
-and if the options was given we exit from the `choose_random_location` function ad kernel load address will not be randomized. Related command line options can be found in the [kernel documentation](https://github.com/torvalds/linux/blob/v4.16/Documentation/admin-guide/kernel-parameters.rst):
+We exit `choose_random_location` if the option is specified, leaving the kernel load address unrandomized. Information related to this can be found in the [kernel's documentation](https://github.com/torvalds/linux/blob/v4.16/Documentation/admin-guide/kernel-parameters.rst):
 
 ```
 kaslr/nokaslr [X86]
@@ -140,13 +140,13 @@ Let's assume that we didn't pass `nokaslr` to the kernel command line and the `C
 boot_params->hdr.loadflags |= KASLR_FLAG;
 ```
 
-and the next step is the call of the:
+Now, we call another function:
 
 ```C
 initialize_identity_maps();
 ```
 
-function which is defined in the [arch/x86/boot/compressed/kaslr_64.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/kaslr_64.c) source code file. This function starts from initialization of `mapping_info` an instance of the `x86_mapping_info` structure:
+The `initialize_identity_maps` function is defined in the [arch/x86/boot/compressed/kaslr_64.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/kaslr_64.c) source code file. This function starts by initializing an instance of the `x86_mapping_info` structure called `mapping_info`:
 
 ```C
 mapping_info.alloc_pgt_page = alloc_pgt_page;
@@ -155,7 +155,7 @@ mapping_info.page_flag = __PAGE_KERNEL_LARGE_EXEC | sev_me_mask;
 mapping_info.kernpg_flag = _KERNPG_TABLE;
 ```
 
-The `x86_mapping_info` structure is defined in the [arch/x86/include/asm/init.h](https://github.com/torvalds/linux/blob/v4.16/arch/x86/include/asm/init.h) header file and looks:
+The `x86_mapping_info` structure is defined in the [arch/x86/include/asm/init.h](https://github.com/torvalds/linux/blob/v4.16/arch/x86/include/asm/init.h) header file and looks like this:
 
 ```C
 struct x86_mapping_info {
@@ -168,18 +168,18 @@ struct x86_mapping_info {
 };
 ```
 
-This structure provides information about memory mappings. As you may remember from the previous part, we already setup'ed initial page tables from 0 up to `4G`. For now we may need to access memory above `4G` to load kernel at random position. So, the `initialize_identity_maps` function executes initialization of a memory region for a possible needed new page table. First of all let's try to look at the definition of the `x86_mapping_info` structure.
+This structure provides information about memory mappings. As you may remember from the previous part, we have already set up page tables to cover the range `0` to `4G`. This won't do since we might generate a randomized address outside of the 4 gigabyte range. So, the `initialize_identity_maps` function initializes the memory for a new page table entry. Let's take a closer look at the fields of the `x86_mapping_info` structure.
 
-The `alloc_pgt_page` is a callback function that will be called to allocate space for a page table entry. The `context` field is an instance of the `alloc_pgt_data` structure in our case which will be used to track allocated page tables. The `page_flag` and `kernpg_flag` fields are page flags. The first represents flags for `PMD` or `PUD` entries. The second `kernpg_flag` field represents flags for kernel pages which can be overridden later. The `direct_gbpages` field represents support for huge pages and the last `offset` field represents offset between kernel virtual addresses and physical addresses up to `PMD` level.
+`alloc_pgt_page` is a callback function that is called to allocate space for a page table entry. The `context` field is an instance of the `alloc_pgt_data` structure. We use it to track allocated page tables. The `page_flag` and `kernpg_flag` fields are page flags. The first represents flags for `PMD` or `PUD` entries. The `kernpg_flag` field represents overridable flags for kernel pages. The `direct_gbpages` field is used to check if huge pages are supported and the last field, `offset`, represents the offset between the kernel's virtual addresses and its physical addresses up to the `PMD` level.
 
-The `alloc_pgt_page` callback just validates that there is space for a new page, allocates new page:
+The `alloc_pgt_page` callback just checks that there is space for a new page, allocates it in the `pgt_buf` field of the `alloc_pgt_data` structure and returns the address of the new page:
 
 ```C
 entry = pages->pgt_buf + pages->pgt_buf_offset;
 pages->pgt_buf_offset += PAGE_SIZE;
 ```
 
-in the buffer from the:
+Here's what the `alloc_pgt_data` structure looks like:
 
 ```C
 struct alloc_pgt_data {
@@ -189,36 +189,36 @@ struct alloc_pgt_data {
 };
 ```
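Putting the two snippets above together, the allocator can be sketched as a simple bump allocator over a fixed buffer. This is a simplified model rather than the kernel's exact code: `PAGE_SIZE` is hard-coded to 4096 here for illustration.

```c
#include <stddef.h>

#define PAGE_SIZE 4096UL

struct alloc_pgt_data {
	unsigned char *pgt_buf;
	unsigned long pgt_buf_size;
	unsigned long pgt_buf_offset;
};

/* Hand out the next PAGE_SIZE chunk of pgt_buf, or NULL when the
 * buffer is exhausted, mirroring what the alloc_pgt_page callback
 * does. */
static void *alloc_pgt_page(void *context)
{
	struct alloc_pgt_data *pages = context;
	unsigned char *entry;

	/* validate that there is room for one more page table page */
	if (pages->pgt_buf_offset + PAGE_SIZE > pages->pgt_buf_size)
		return NULL;

	entry = pages->pgt_buf + pages->pgt_buf_offset;
	pages->pgt_buf_offset += PAGE_SIZE;
	return entry;
}
```

Because the buffer is sized up front, running out of space simply makes the callback fail instead of allocating more memory.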
 
-structure and returns address of a new page. The last goal of the `initialize_identity_maps` function is to initialize `pgdt_buf_size` and `pgt_buf_offset`. As we are only in initialization phase, the `initialze_identity_maps` function sets `pgt_buf_offset` to zero:
+The last goal of the `initialize_identity_maps` function is to initialize `pgt_buf_size` and `pgt_buf_offset`. As we are only in the initialization phase, the `initialize_identity_maps` function sets `pgt_buf_offset` to zero:
 
 ```C
 pgt_data.pgt_buf_offset = 0;
 ```
 
-and the `pgt_data.pgt_buf_size` will be set to `77824` or `69632` depends on which boot protocol will be used by bootloader (64-bit or 32-bit). The same is for `pgt_data.pgt_buf`. If a bootloader loaded the kernel at `startup_32`, the `pgdt_data.pgdt_buf` will point to the end of the page table which already was initialzed in the [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S):
+`pgt_data.pgt_buf_size` will be set to `77824` or `69632` depending on which boot protocol was used by the bootloader (64-bit or 32-bit). The same is done for `pgt_data.pgt_buf`. If a bootloader loaded the kernel at `startup_32`, `pgt_data.pgt_buf` will point to the end of the already initialized page table in the [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/head_64.S) source code file:
 
 ```C
 pgt_data.pgt_buf = _pgtable + BOOT_INIT_PGT_SIZE;
 ```
 
-where `_pgtable` points to the beginning of this page table [_pgtable](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/vmlinux.lds.S). In other way, if a bootloader have used 64-bit boot protocol and loaded the kernel at `startup_64`, early page tables should be built by bootloader itself and `_pgtable` will be just overwrote:
+Here, [_pgtable](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/vmlinux.lds.S) points to the beginning of this page table. On the other hand, if the bootloader used the 64-bit boot protocol and loaded the kernel at `startup_64`, the early page tables should already be built by the bootloader itself and `_pgtable` will just point to those instead:
 
 ```C
 pgt_data.pgt_buf = _pgtable
 ```
 
-As the buffer for new page tables is initialized, we may return back to the `choose_random_location` function.
+As the buffer for new page tables is initialized, we may return to the `choose_random_location` function.
-Avoid reserved memory ranges
+Avoiding Reserved Memory Ranges
 --------------------------------------------------------------------------------
 
-After the stuff related to identity page tables is initilized, we may start to choose random location where to put decompressed kernel image. But as you may guess, we can't choose any address. There are some reseved addresses in memory ranges. Such addresses occupied by important things, like [initrd](https://en.wikipedia.org/wiki/Initial_ramdisk), kernel command line and etc. The
+After the stuff related to identity page tables is initialized, we can choose a random memory location to extract the kernel image to. But as you may have guessed, we can't just choose any address. There are certain reserved memory regions, occupied by important things like the [initrd](https://en.wikipedia.org/wiki/Initial_ramdisk) and the kernel command line, which must be avoided. The `mem_avoid_init` function will help us do this:
 
 ```C
 mem_avoid_init(input, input_size, *output);
 ```
 
-function will help us to do this. All non-safe memory regions will be collected in the:
+All unsafe memory regions will be collected in an array called `mem_avoid`:
 
 ```C
 struct mem_vector {
@@ -229,7 +229,7 @@ struct mem_vector {
 static struct mem_vector mem_avoid[MEM_AVOID_MAX];
 ```
 
-array. Where `MEM_AVOID_MAX` is from `mem_avoid_index` [enum](https://en.wikipedia.org/wiki/Enumerated_type#C) which represents different types of reserved memory regions:
+Here, `MEM_AVOID_MAX` is from the `mem_avoid_index` [enum](https://en.wikipedia.org/wiki/Enumerated_type#C) which represents different types of reserved memory regions:
 
 ```C
 enum mem_avoid_index {
@@ -245,7 +245,7 @@ enum mem_avoid_index {
 
 Both are defined in the [arch/x86/boot/compressed/kaslr.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/kaslr.c) source code file.
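The purpose of the `mem_avoid` array is to make overlap checks cheap later, when candidate load addresses are tested. The helper below is a hypothetical sketch of such a test (the real kernel has its own overlap logic in `kaslr.c`); the half-open interval comparison is the essential idea:

```c
#include <stdbool.h>

struct mem_vector {
	unsigned long long start;
	unsigned long long size;
};

/* Two half-open regions [start, start + size) overlap iff each one
 * begins before the other ends. */
static bool mem_overlaps(const struct mem_vector *a,
			 const struct mem_vector *b)
{
	return a->start < b->start + b->size &&
	       b->start < a->start + a->size;
}
```

A candidate region is usable only if this check returns `false` against every populated `mem_avoid` entry.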
 
-Let's look at the implementation of the `mem_avoid_init` function. The main goal of this function is to store information about reseved memory regions described by the `mem_avoid_index` enum in the `mem_avoid` array and create new pages for such regions in our new identity mapped buffer. Numerous parts fo the `mem_avoid_index` function are similar, but let's take a look at the one of them:
+Let's look at the implementation of the `mem_avoid_init` function. The main goal of this function is to store information about the reserved memory regions described by the `mem_avoid_index` enum in the `mem_avoid` array and to create new pages for such regions in our new identity mapped buffer. The `mem_avoid_init` function handles every element of the `mem_avoid_index` enum in the same way, so let's look at a typical example of the process:
 
 ```C
 mem_avoid[MEM_AVOID_ZO_RANGE].start = input;
@@ -254,7 +254,7 @@ add_identity_map(mem_avoid[MEM_AVOID_ZO_RANGE].start,
 		 mem_avoid[MEM_AVOID_ZO_RANGE].size);
 ```
 
-At the beginning of the `mem_avoid_init` function tries to avoid memory region that is used for current kernel decompression. We fill an entry from the `mem_avoid` array with the start and size of such region and call the `add_identity_map` function which should build identity mapped pages for this region. The `add_identity_map` function is defined in the [arch/x86/boot/compressed/kaslr_64.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/kaslr_64.c) source code file and looks:
+The `mem_avoid_init` function first tries to avoid the memory region that is used for the current kernel decompression. We fill an entry from the `mem_avoid` array with the start address and the size of the relevant region and call the `add_identity_map` function, which builds the identity mapped pages for this region. The `add_identity_map` function is defined in the [arch/x86/boot/compressed/kaslr_64.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/kaslr_64.c) source code file and looks like this:
 
 ```C
 void add_identity_map(unsigned long start, unsigned long size)
@@ -271,18 +271,18 @@ void add_identity_map(unsigned long start, unsigned long size)
 }
 ```
 
-As you may see it aligns memory region to 2 megabytes boundary and checks given start and end addresses.
+The `round_up` and `round_down` functions are used to align the start and end addresses to a 2 megabyte boundary.
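The two helpers can be modeled with the usual power-of-two masking tricks. `PMD_SIZE` (2 megabytes on `x86_64`) is hard-coded here for illustration:

```c
#include <assert.h>

/* 2 MB, the size covered by one PMD entry on x86_64 */
#define PMD_SIZE (2UL << 20)

/* Classic power-of-two rounding: these only work when `align` is a
 * power of two, which PMD_SIZE is. */
#define round_down(x, align) ((x) & ~((align) - 1))
#define round_up(x, align)   (((x) + (align) - 1) & ~((align) - 1))
```

So a region starting at `0x1234567` has its start rounded down to `0x1200000`, and an end of `0x1345678` is rounded up to `0x1400000`, guaranteeing that whole 2 megabyte pages cover the requested range.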
 
-In the end it just calls the `kernel_ident_mapping_init` function from the [arch/x86/mm/ident_map.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/mm/ident_map.c) source code file and pass `mapping_info` instance that was initilized above, address of the top level page table and addresses of memory region for which new identity mapping should be built.
+In the end, this function calls the `kernel_ident_mapping_init` function from the [arch/x86/mm/ident_map.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/mm/ident_map.c) source code file and passes the previously initialized `mapping_info` instance, the address of the top level page table and the start and end addresses of the memory region for which a new identity mapping should be built.
 
-The `kernel_ident_mapping_init` function sets default flags for new pages if they were not given:
+The `kernel_ident_mapping_init` function sets default flags for new pages if they were not already set:
 
 ```C
 if (!info->kernpg_flag)
 	info->kernpg_flag = _KERNPG_TABLE;
 ```
 
-and starts to build new 2-megabytes (because of `PSE` bit in the `mapping_info.page_flag`) page entries (`PGD -> P4D -> PUD -> PMD` in a case of [five-level page tables](https://lwn.net/Articles/717293/) or `PGD -> PUD -> PMD` in a case of [four-level page tables](https://lwn.net/Articles/117749/)) related to the given addresses.
+It then starts to build new 2-megabyte (because of the `PSE` bit in `mapping_info.page_flag`) page entries (`PGD -> P4D -> PUD -> PMD` if we're using [five-level page tables](https://lwn.net/Articles/717293/) or `PGD -> PUD -> PMD` if [four-level page tables](https://lwn.net/Articles/117749/) are used) associated with the given addresses.
 
 ```C
 for (; addr < end; addr = next) {
@@ -299,32 +299,32 @@ for (; addr < end; addr = next) {
 }
 ```
 
-First of all here we find next entry of the `Page Global Directory` for the given address and if it is greater than `end` of the given memory region, we set it to `end`. After this we allocate a new page with our `x86_mapping_info` callback that we already considered above and call the `ident_p4d_init` function. The `ident_p4d_init` function will do the same, but for low-level page directories (`p4d` -> `pud` -> `pmd`).
+The first thing this for loop does is find the address of the next entry of the `Page Global Directory` for the given address. If this address is greater than the `end` of the given memory region, it is clamped to `end`. After this, we allocate a new page with the `alloc_pgt_page` callback from the `x86_mapping_info` instance that we looked at previously and call the `ident_p4d_init` function. The `ident_p4d_init` function will do the same thing, but for the lower level page directories (`p4d` -> `pud` -> `pmd`).
 
 That's all.
 
-New page entries related to reserved addresses are in our page tables. This is not the end of the `mem_avoid_init` function, but other parts are similar. It just build pages for [initrd](https://en.wikipedia.org/wiki/Initial_ramdisk), kernel command line and etc.
+We now have new page entries related to reserved addresses in our page tables. We haven't reached the end of the `mem_avoid_init` function, but the rest is similar. It builds pages for the [initrd](https://en.wikipedia.org/wiki/Initial_ramdisk) and the kernel command line, among other things.
 
-Now we may return back to `choose_random_location` function.
+Now we may return to the `choose_random_location` function.
 
 Physical address randomization
 --------------------------------------------------------------------------------
 
-After the reserved memory regions were stored in the `mem_avoid` array and identity mapping pages were built for them, we select minimal available address to choose random memory region to decompress the kernel:
+After the reserved memory regions have been stored in the `mem_avoid` array and identity mapped pages have been built for them, we calculate the minimum available address for the random memory region that the kernel will be decompressed to:
 
 ```C
 min_addr = min(*output, 512UL << 20);
 ```
 
-As you may see it should be smaller than `512` megabytes. This `512` megabytes value was selected just to avoid unknown things in lower memory.
+Note that the minimum address is capped at `512` megabytes. This `512` megabyte lower bound was selected to avoid unknown things in lower memory.
 
-The next step is to select random physical and virtual addresses to load kernel. The first is physical addresses:
+The next step is to select random physical and virtual addresses to load the kernel to. First, the physical address:
 
 ```C
 random_addr = find_random_phys_addr(min_addr, output_size);
 ```
 
-The `find_random_phys_addr` function is defined in the [same](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/kaslr.c) source code file:
+The `find_random_phys_addr` function is defined in the [same](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/kaslr.c) source code file as `choose_random_location`:
 
 ```
 static unsigned long find_random_phys_addr(unsigned long minimum,
@@ -340,7 +340,7 @@ static unsigned long find_random_phys_addr(unsigned long minimum,
 }
 ```
 
-The main goal of `process_efi_entries` function is to find all suitable memory ranges in full accessible memory to load kernel. If the kernel compiled and runned on the system without [EFI](https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface) support, we continue to search such memory regions in the [e820](https://en.wikipedia.org/wiki/E820) regions. All founded memory regions will be stored in the
+The main goal of the `process_efi_entries` function is to find all suitable memory ranges in fully accessible memory to load the kernel to. If the kernel is compiled and run on a system without [EFI](https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface) support, we continue to search for such memory regions in the [e820](https://en.wikipedia.org/wiki/E820) regions. All memory regions found will be stored in the `slot_areas` array:
 
 ```C
 struct slot_area {
@@ -353,20 +353,20 @@ struct slot_area {
 static struct slot_area slot_areas[MAX_SLOT_AREA];
 ```
 
-array. The kernel will select a random index of this array for kernel to be decompressed. This selection will be executed by the `slots_fetch_random` function. The main goal of the `slots_fetch_random` function is to select random memory range from the `slot_areas` array via `kaslr_get_random_long` function:
+The kernel will select a random slot from this array to decompress the kernel to. The selection is performed by the `slots_fetch_random` function, whose main goal is to select a random memory range from the `slot_areas` array via the `kaslr_get_random_long` function:
 
 ```C
 slot = kaslr_get_random_long("Physical") % slot_max;
 ```
 
-The `kaslr_get_random_long` function is defined in the [arch/x86/lib/kaslr.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/lib/kaslr.c) source code file and it just returns random number. Note that the random number will be get via different ways depends on kernel configuration and system opportunities (select random number base on [time stamp counter](https://en.wikipedia.org/wiki/Time_Stamp_Counter), [rdrand](https://en.wikipedia.org/wiki/RdRand) and so on).
+The `kaslr_get_random_long` function is defined in the [arch/x86/lib/kaslr.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/lib/kaslr.c) source code file and, as its name suggests, returns a random number. Note that the random number can be generated in a number of ways depending on the kernel configuration and the features present in the system (for example, using the [time stamp counter](https://en.wikipedia.org/wiki/Time_Stamp_Counter), [rdrand](https://en.wikipedia.org/wiki/RdRand) or some other method).
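As a toy model of the selection, suppose each `slot_area` contributes `num` possible load slots spaced `CONFIG_PHYSICAL_ALIGN` apart; a random index modulo the total slot count then maps back to a concrete address. The random number is passed in by the caller so the sketch stays deterministic, and the function name `pick_slot` is made up for illustration:

```c
#define CONFIG_PHYSICAL_ALIGN 0x200000UL /* 2 MB, the x86_64 default */

struct slot_area {
	unsigned long addr;
	int num; /* number of CONFIG_PHYSICAL_ALIGN-spaced slots */
};

/* Walk the areas until the randomly chosen slot index falls inside
 * one of them, then convert the index into a physical address. */
static unsigned long pick_slot(const struct slot_area *areas, int n,
			       unsigned long random)
{
	unsigned long slot_max = 0;
	unsigned long slot;
	int i;

	for (i = 0; i < n; i++)
		slot_max += areas[i].num;
	if (!slot_max)
		return 0;

	slot = random % slot_max;
	for (i = 0; i < n; i++) {
		if (slot < (unsigned long)areas[i].num)
			return areas[i].addr + slot * CONFIG_PHYSICAL_ALIGN;
		slot -= areas[i].num;
	}
	return 0;
}
```

With two areas of 2 and 3 slots, any random value maps onto one of the 5 possible 2 megabyte aligned load addresses.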
 
-That's all from this point random memory range will be selected.
+We now have a random physical address to decompress the kernel to.
 
 Virtual address randomization
 --------------------------------------------------------------------------------
 
-After random memory region was selected by the kernel decompressor, new identity mapped pages will be built for this region by demand:
+After selecting a random physical address for the decompressed kernel, we generate identity mapped pages for the region:
 
 ```C
 random_addr = find_random_phys_addr(min_addr, output_size);
@@ -377,7 +377,7 @@ if (*output != random_addr) {
 }
 ```
 
-From this time `output` will store the base address of a memory region where kernel will be decompressed. But for this moment, as you may remember we randomized only physical address. Virtual address should be randomized too in a case of [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture:
+From now on, `output` will store the base address of the memory region where the kernel will be decompressed. Currently, we have only randomized the physical address. We can randomize the virtual address as well on the [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture:
 
 ```C
 if (IS_ENABLED(CONFIG_X86_64))
@@ -386,18 +386,18 @@ if (IS_ENABLED(CONFIG_X86_64))
 *virt_addr = random_addr;
 ```
 
-As you may see in a case of non `x86_64` architecture, randomzed virtual address will coincide with randomized physical address. The `find_random_virt_addr` function calculates amount of virtual memory ranges that may hold kernel image and calls the `kaslr_get_random_long` that we already saw in a previous case when we tried to find random `physical` address.
+On architectures other than `x86_64`, the randomized physical and virtual addresses are the same. The `find_random_virt_addr` function calculates the number of virtual memory ranges that can hold the kernel image and calls the `kaslr_get_random_long` function, which we have already seen being used to generate a random `physical` address.
 
-From this moment we have both randomized base physical (`*output`) and virtual (`*virt_addr`) addresses for decompressed kernel.
+At this point we have randomized both the base physical (`*output`) and virtual (`*virt_addr`) addresses for the decompressed kernel.
 
 That's all.
 
 Conclusion
 --------------------------------------------------------------------------------
 
-This is the end of the sixth and the last part about linux kernel booting process. We will not see posts about kernel booting anymore (maybe updates to this and previous posts), but there will be many posts about other kernel internals. 
+This is the end of the sixth and last part concerning the Linux kernel's booting process. We will not see any more posts about kernel booting (though there may be updates to this and previous posts). We will now turn to other parts of the Linux kernel instead.
 
-Next chapter will be about kernel initialization and we will see the first steps in the Linux kernel initialization code.
+The next chapter will be about kernel initialization and we will study the first steps taken in the Linux kernel initialization code.
 
 If you have any questions or suggestions write me a comment or ping me in [twitter](https://twitter.com/0xAX).
 
