mirror of https://github.com/0xAX/linux-insides.git synced 2025-01-18 11:41:08 +00:00

Update linux-initialization-2

Takuya Yamamoto 2018-10-08 21:19:59 +09:00
parent cfc6994ee6
commit 1bbdcee0ce


@ -8,14 +8,13 @@ In the previous [part](https://0xax.gitbooks.io/linux-insides/content/Initializa
We already started to do this preparation in the previous [first](https://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-1.html) part of this [chapter](https://0xax.gitbooks.io/linux-insides/content/Initialization/index.html). We continue in this part and will know more about interrupt and exception handling. We already started to do this preparation in the previous [first](https://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-1.html) part of this [chapter](https://0xax.gitbooks.io/linux-insides/content/Initialization/index.html). We continue in this part and will know more about interrupt and exception handling.
Remember that we stopped before following loop: Remember that we stopped before following function:
```C ```C
for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) idt_setup_early_handler();
set_intr_gate(i, early_idt_handler_array[i]);
``` ```
from the [arch/x86/kernel/head64.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/head64.c) source code file. But before we started to sort out this code, we need to know about interrupts and handlers. from the [arch/x86/kernel/head64.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head64.c) source code file. But before we start to sort out this function, we need to know about interrupts and handlers.
Some theory Some theory
-------------------------------------------------------------------------------- --------------------------------------------------------------------------------
@ -26,13 +25,9 @@ An interrupt is an event caused by software or hardware to the CPU. For example
* Hardware interrupts - when a hardware event happens, for example button is pressed on a keyboard; * Hardware interrupts - when a hardware event happens, for example button is pressed on a keyboard;
* Exceptions - interrupts generated by CPU, when the CPU detects error, for example division by zero or accessing a memory page which is not in RAM. * Exceptions - interrupts generated by CPU, when the CPU detects error, for example division by zero or accessing a memory page which is not in RAM.
Every interrupt and exception is assigned a unique number which called - `vector number`. `Vector number` can be any number from `0` to `255`. There is common practice to use first `32` vector numbers for exceptions, and vector numbers from `32` to `255` are used for user-defined interrupts. We can see it in the code above - `NUM_EXCEPTION_VECTORS`, which defined as: Every interrupt and exception is assigned a unique number which is called - `vector number`. `Vector number` can be any number from `0` to `255`. There is common practice to use first `32` vector numbers for exceptions, and vector numbers from `32` to `255` are used for user-defined interrupts.
```C CPU uses vector number as an index in the `Interrupt Descriptor Table` (we will see description of it soon). CPU catches interrupts from the [APIC](http://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller) or through its pins. Following table shows `0-31` exceptions:
#define NUM_EXCEPTION_VECTORS 32
```
CPU uses vector number as an index in the `Interrupt Descriptor Table` (we will see description of it soon). CPU catch interrupts from the [APIC](http://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller) or through it's pins. Following table shows `0-31` exceptions:
``` ```
---------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------
@ -84,7 +79,7 @@ CPU uses vector number as an index in the `Interrupt Descriptor Table` (we will
---------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------
``` ```
To react on interrupt CPU uses special structure - Interrupt Descriptor Table or IDT. IDT is an array of 8-byte descriptors like Global Descriptor Table, but IDT entries are called `gates`. CPU multiplies vector number on 8 to find index of the IDT entry. But in 64-bit mode IDT is an array of 16-byte descriptors and CPU multiplies vector number on 16 to find index of the entry in the IDT. We remember from the previous part that CPU uses special `GDTR` register to locate Global Descriptor Table, so CPU uses special register `IDTR` for Interrupt Descriptor Table and `lidt` instruction for loading base address of the table into this register. To react on interrupt CPU uses special structure - Interrupt Descriptor Table or IDT. IDT is an array of 8-byte descriptors like Global Descriptor Table, but IDT entries are called `gates`. CPU multiplies vector number by 8 to find the IDT entry. But in 64-bit mode IDT is an array of 16-byte descriptors and CPU multiplies vector number by 16 to find the entry in the IDT. We remember from the previous part that CPU uses special `GDTR` register to locate Global Descriptor Table, so CPU uses special register `IDTR` for Interrupt Descriptor Table and `lidt` instruction for loading base address of the table into this register.
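The index arithmetic described here can be sketched in C. This is only an illustration of the multiplication the CPU performs; `idt_entry_offset`, `GATE_SIZE_32` and `GATE_SIZE_64` are hypothetical names and do not come from the kernel sources.

```C
#include <stddef.h>

/* Hypothetical constants: size of one IDT gate in bytes. */
#define GATE_SIZE_32  8   /* protected mode: 8-byte descriptors */
#define GATE_SIZE_64 16   /* 64-bit mode: 16-byte descriptors */

/* Byte offset of the gate for `vector`, counted from the IDT base address. */
static inline size_t idt_entry_offset(unsigned int vector, size_t gate_size)
{
	return (size_t)vector * gate_size;
}
```

For example, the page fault exception has vector number `14`, so in 64-bit mode its gate starts `14 * 16 = 224` bytes after the base address stored in `IDTR`.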
64-bit mode IDT entry has following structure: 64-bit mode IDT entry has following structure:
@ -123,38 +118,53 @@ Where:
* `Segment selector` - a code segment selector in GDT or LDT * `Segment selector` - a code segment selector in GDT or LDT
* `IST` - provides ability to switch to a new stack for interrupts handling. * `IST` - provides ability to switch to a new stack for interrupts handling.
And the last `Type` field describes type of the `IDT` entry. There are three different kinds of handlers for interrupts: And the last `Type` field describes type of the `IDT` entry. There are three different kinds of gates for interrupts:
* Task descriptor * Task gate
* Interrupt descriptor * Interrupt gate
* Trap descriptor * Trap gate
Interrupt and trap descriptors contain a far pointer to the entry point of the interrupt handler. Only one difference between these types is how CPU handles `IF` flag. If interrupt handler was accessed through interrupt gate, CPU clear the `IF` flag to prevent other interrupts while current interrupt handler executes. After that current interrupt handler executes, CPU sets the `IF` flag again with `iret` instruction. Interrupt and trap gates contain a far pointer to the entry point of the interrupt handler. Only one difference between these types is how CPU handles `IF` flag. If interrupt handler was accessed through interrupt gate, CPU clear the `IF` flag to prevent other interrupts while current interrupt handler executes. After that current interrupt handler executes, CPU sets the `IF` flag again with `iret` instruction.
Other bits in the interrupt gate reserved and must be 0. Now let's look how CPU handles interrupts: Other bits in the interrupt descriptor is reserved and must be 0. Now let's look how CPU handles interrupts:
* CPU save flags register, `CS`, and instruction pointer on the stack. * CPU save flags register, `CS`, and instruction pointer on the stack.
* If interrupt causes an error code (like `#PF` for example), CPU saves an error on the stack after instruction pointer; * If interrupt causes an error code (like `#PF` for example), CPU saves an error on the stack after instruction pointer;
* After interrupt handler executed, `iret` instruction used to return from it. * After interrupt handler executes, `iret` instruction will be used to return from it.
Now let's back to code. Now let's back to code.
Fill and load IDT Fill and load IDT
-------------------------------------------------------------------------------- --------------------------------------------------------------------------------
We stopped at the following point: We stopped at the following function:
```C ```C
for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) idt_setup_early_handler();
set_intr_gate(i, early_idt_handler_array[i]);
``` ```
Here we call `set_intr_gate` in the loop, which takes two parameters: `idt_setup_early_handler` is defined in the [arch/x86/kernel/idt.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/idt.c) like the following:
```C
void __init idt_setup_early_handler(void)
{
int i;
for (i = 0; i < NUM_EXCEPTION_VECTORS; i++)
set_intr_gate(i, early_idt_handler_array[i]);
load_idt(&idt_descr);
}
```
where `NUM_EXCEPTION_VECTORS` expands to `32`. As we can see, we're filling only the first 32 `IDT` entries in the loop, because all of the early setup runs with interrupts disabled, so there is no need to set up interrupt handlers for vectors `32` and above. Here we call `set_intr_gate` in the loop, which takes two parameters:
* Number of an interrupt or `vector number`; * Number of an interrupt or `vector number`;
* Address of the idt handler. * Address of the idt handler.
and inserts an interrupt gate to the `IDT` table which is represented by the `&idt_descr` array. First of all let's look on the `early_idt_handler_array` array. It is an array which is defined in the [arch/x86/include/asm/segment.h](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/include/asm/segment.h) header file contains addresses of the first `32` exception handlers: and inserts an interrupt gate to the `IDT` table which is represented by the `&idt_descr` array.
The `early_idt_handler_array` array is declared in the [arch/x86/include/asm/segment.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/segment.h) header file and contains addresses of the first `32` exception handlers:
```C ```C
#define EARLY_IDT_HANDLER_SIZE 9 #define EARLY_IDT_HANDLER_SIZE 9
@ -163,9 +173,7 @@ and inserts an interrupt gate to the `IDT` table which is represented by the `&i
extern const char early_idt_handler_array[NUM_EXCEPTION_VECTORS][EARLY_IDT_HANDLER_SIZE]; extern const char early_idt_handler_array[NUM_EXCEPTION_VECTORS][EARLY_IDT_HANDLER_SIZE];
``` ```
The `early_idt_handler_array` is `288` bytes array which contains address of exception entry points every nine bytes. Every nine bytes of this array consist of two bytes optional instruction for pushing dummy error code if an exception does not provide it, two bytes instruction for pushing vector number to the stack and five bytes of `jump` to the common exception handler code. The `early_idt_handler_array` is `288` bytes array which contains address of exception entry points every nine bytes. Every nine bytes of this array consist of two bytes optional instruction for pushing dummy error code if an exception does not provide it, two bytes instruction for pushing vector number to the stack and five bytes of `jump` to the common exception handler code. You will see more detail in the next paragraph.
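The size arithmetic can be checked with a small C sketch. The two macros repeat the values quoted above; `EARLY_IDT_ARRAY_BYTES` and `early_stub_offset` are hypothetical names introduced only for this illustration.

```C
#include <stddef.h>

#define NUM_EXCEPTION_VECTORS  32
#define EARLY_IDT_HANDLER_SIZE 9

/* Total size of the stub array in bytes: 32 stubs of 9 bytes each. */
enum { EARLY_IDT_ARRAY_BYTES = NUM_EXCEPTION_VECTORS * EARLY_IDT_HANDLER_SIZE };

/* Hypothetical helper: byte offset of the stub for `vector` inside the array. */
static inline size_t early_stub_offset(unsigned int vector)
{
	return (size_t)vector * EARLY_IDT_HANDLER_SIZE;
}
```

Because every stub has the same fixed size, `early_idt_handler_array[i]` is simply the address of the array plus `i * 9` bytes, which is the value passed to `set_intr_gate` for vector `i`.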
As we can see, We're filling only first 32 `IDT` entries in the loop, because all of the early setup runs with interrupts disabled, so there is no need to set up interrupt handlers for vectors greater than `32`. The `early_idt_handler_array` array contains generic idt handlers and we can find its definition in the [arch/x86/kernel/head_64.S](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/head_64.S) assembly file. For now we will skip it, but will look it soon. Before this we will look on the implementation of the `set_intr_gate` function.
The `set_intr_gate` function is defined in the [arch/x86/kernel/idt.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/idt.c) source file and looks: The `set_intr_gate` function is defined in the [arch/x86/kernel/idt.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/idt.c) source file and looks:
@ -187,7 +195,7 @@ static void set_intr_gate(unsigned int n, const void *addr)
} }
``` ```
First of all it checks with that passed interrupt number is not greater than `255` with `BUG_ON` macro. We need to do this check because we can have only `256` interrupts. After this, we setup the idt data with the given values. And then we call `idt_setup_from_table` function which looks like: First of all it checks that passed vector number is not greater than `255` with `BUG_ON` macro. We need to do this because we are limited to have up to `256` interrupts. After this, we fill the idt data with the given arguments and others, which will be passed to `idt_setup_from_table`. The `idt_setup_from_table` function is defined in the same file as the `set_intr_gate` function like the following:
```C ```C
static void static void
@ -209,15 +217,19 @@ idt_setup_from_table(gate_desc *idt, const struct idt_data *t, int size, bool sy
} }
``` ```
which fill three parts of the address of the interrupt handler with the address which we got in the main loop (address of the interrupt handler entry point). And then we just copy the gate descriptor to the idt entry. which fill temporary idt descriptor with the given arguments and others. And then we just copy it to the certain element of the `idt_table` array. `idt_table` is an array of idt entries:
After that main loop will finished, we will have filled `idt_table` array of `gate_desc` structures and we can load `Interrupt Descriptor table` with the call of the:
```C ```C
load_idt((const struct desc_ptr *)&idt_descr); gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
``` ```
Where `idt_descr` is: Now we are moving back to main loop code. After main loop finishes, we can load `Interrupt Descriptor table` with the call of the:
```C
load_idt((const struct desc_ptr *)&idt_descr);
```
where `idt_descr` is:
```C ```C
struct desc_ptr idt_descr __ro_after_init = { struct desc_ptr idt_descr __ro_after_init = {
@ -229,17 +241,15 @@ struct desc_ptr idt_descr __ro_after_init = {
and `load_idt` just executes `lidt` instruction: and `load_idt` just executes `lidt` instruction:
```C ```C
asm volatile("lidt %0"::"m" (*dtr)); asm volatile("lidt %0"::"m" (idt_descr));
``` ```
You can note that there are calls of the `_trace_*` functions in the `_set_gate` and other functions. These functions fills `IDT` gates in the same manner that `_set_gate` but with one difference. These functions use `trace_idt_table` the `Interrupt Descriptor Table` instead of `idt_table` for tracepoints (we will cover this theme in the another part).
Okay, now we have filled and loaded `Interrupt Descriptor Table`, we know how the CPU acts during an interrupt. So now time to deal with interrupts handlers. Okay, now we have filled and loaded `Interrupt Descriptor Table`, we know how the CPU acts during an interrupt. So now time to deal with interrupts handlers.
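A 64-bit gate does not store the handler address as one field; it is split into a 16-bit low part, a 16-bit middle part and a 32-bit high part. The following is a minimal sketch of that split, not the kernel's code: `gate_offset_parts`, `split_handler_address` and `join_handler_address` are hypothetical names, and only the three offset fields of a gate are modelled.

```C
#include <stdint.h>

/* Illustrative struct: just the offset fields of a 64-bit IDT gate. */
struct gate_offset_parts {
	uint16_t low;     /* handler address bits 0..15 */
	uint16_t middle;  /* handler address bits 16..31 */
	uint32_t high;    /* handler address bits 32..63 */
};

/* Split a 64-bit handler address into the three parts stored in a gate. */
static inline struct gate_offset_parts split_handler_address(uint64_t addr)
{
	struct gate_offset_parts p;

	p.low = (uint16_t)(addr & 0xffff);
	p.middle = (uint16_t)((addr >> 16) & 0xffff);
	p.high = (uint32_t)(addr >> 32);
	return p;
}

/* Reassemble the address the way the CPU does when it dispatches through the gate. */
static inline uint64_t join_handler_address(struct gate_offset_parts p)
{
	return (uint64_t)p.low | ((uint64_t)p.middle << 16) | ((uint64_t)p.high << 32);
}
```

Splitting and joining are exact inverses, so the CPU always recovers the same entry point that was written into the gate.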
Early interrupts handlers Early interrupts handlers
-------------------------------------------------------------------------------- --------------------------------------------------------------------------------
As you can read above, we filled `IDT` with the address of the `early_idt_handler_array`. We can find it in the [arch/x86/kernel/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head_64.S) assembly file: As you can read above, we filled `IDT` with the address of the `early_idt_handler_array`. In this section, we are going to look into it in detail. We can find it in the [arch/x86/kernel/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head_64.S) assembly file:
```assembly ```assembly
ENTRY(early_idt_handler_array) ENTRY(early_idt_handler_array)
@ -261,7 +271,7 @@ ENTRY(early_idt_handler_array)
END(early_idt_handler_array) END(early_idt_handler_array)
``` ```
Functions which tend to not be called directly by other functions, such as syscall and interrupt handlers, often do unusual non-C-function-type things with the stack pointer. Such code needs to be annotated by using `UNWIND_HINT_IRET_REGS` macro such that objtool can understand it. We can see here, interrupt handlers generation for the first `32` exceptions. We check here, if exception has an error code then we do nothing, if exception does not return error code, we push zero to the stack. We do it for that would stack was uniform. After that we push exception number on the stack and jump on the `early_idt_handler_array` which is generic interrupt handler for now. As we may see above, every nine bytes of the `early_idt_handler_array` array consists from optional push of an error code, push of `vector number` and jump instruction. We can see it in the output of the `objdump` util: We can see here, interrupt handlers generation for the first `32` exceptions. We check here, if exception has an error code then we do nothing, if exception does not return error code, we push zero to the stack. We do it for that stack was uniform. After that we push `vector number` on the stack and jump on the `early_idt_handler_common` which is generic interrupt handler for now. After all, every nine bytes of the `early_idt_handler_array` array consists of optional push of an error code, push of `vector number` and jump instruction to `early_idt_handler_common`. We can see it in the output of the `objdump` util:
``` ```
$ objdump -D vmlinux $ objdump -D vmlinux
@ -282,7 +292,7 @@ ffffffff81fe5014: 6a 02 pushq $0x2
... ...
``` ```
As i wrote above, CPU pushes flag register, `CS` and `RIP` on the stack. So before `early_idt_handler` will be executed, stack will contain following data: As we may know, CPU pushes flag register, `CS` and `RIP` on the stack before calling interrupt handler. So before `early_idt_handler_common` will be executed, stack will contain following data:
``` ```
|--------------------| |--------------------|
@ -293,7 +303,7 @@ As i wrote above, CPU pushes flag register, `CS` and `RIP` on the stack. So befo
|--------------------| |--------------------|
``` ```
Now let's look on the `early_idt_handler_common` implementation. It locates in the same [arch/x86/kernel/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head_64.S) assembly file and first of all we increment `early_recursion_flag` to prevent recursion in the `early_idt_handler_common`: Now let's look on the `early_idt_handler_common` implementation. It locates in the same [arch/x86/kernel/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head_64.S) assembly file. First of all we increment `early_recursion_flag` to prevent recursion in the `early_idt_handler_common`:
```assembly ```assembly
incl early_recursion_flag(%rip) incl early_recursion_flag(%rip)
@ -332,7 +342,7 @@ We need to do it to prevent wrong values of registers when we return from the in
jz 20f jz 20f
``` ```
If vector number is not `#PF`, we call `early_fixup_exception` function with passing kernel stack pointer. (refer to [x86-64 calling convention](https://en.wikipedia.org/wiki/X86_calling_conventions#x86-64_calling_conventions)): otherwise we call `early_fixup_exception` function by passing kernel stack pointer:
```assembly ```assembly
10: 10:
@ -348,16 +358,16 @@ We'll see the implementaion of the `early_fixup_exception` function later.
jmp restore_regs_and_return_to_kernel jmp restore_regs_and_return_to_kernel
``` ```
After we decrement the `early_recursion_flag`, we restore registers which we saved earlier from the stack and return from the handler with `iretq`. After we decrement the `early_recursion_flag`, we restore registers which we saved before from the stack and return from the handler with `iretq`.
It is the end of the first interrupt handler. Note that it is very early interrupt handler, so it handles only Page Fault now. We will see handlers for the other interrupts, but now let's look on the page fault handler. It is the end of the interrupt handler. We will examine the page fault handling and the other exception handling in order.
Page fault handling Page fault handling
-------------------------------------------------------------------------------- --------------------------------------------------------------------------------
In the previous paragraph we saw first early interrupt handler which checks interrupt number for page fault and calls `early_make_pgtable` for building new page tables if it is. We need to have `#PF` handler in this step because there are plans to add ability to load kernel above `4G` and make access to `boot_params` structure above the 4G. In the previous paragraph we saw the early interrupt handler which checks if the vector number is page fault and calls `early_make_pgtable` for building new page tables if it is. We need to have `#PF` handler in this step because there are plans to add ability to load kernel above `4G` and make access to `boot_params` structure above the 4G.
You can find implementation of the `early_make_pgtable` in the [arch/x86/kernel/head64.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head64.c) and takes one parameter - address from the `cr2` register, which caused Page Fault. Let's look on it: You can find the implementation of `early_make_pgtable` in [arch/x86/kernel/head64.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head64.c) and takes one parameter - the value of `cr2` register, which contains the address caused page fault. Let's look on it:
```C ```C
int __init early_make_pgtable(unsigned long address) int __init early_make_pgtable(unsigned long address)
@ -371,7 +381,7 @@ int __init early_make_pgtable(unsigned long address)
} }
``` ```
Next we call `__early_make_pgtable` function which is defined in the same file as `early_make_pgtable` function as following: We initialize `pmd` and pass it to the `__early_make_pgtable` function along with `address`. The `__early_make_pgtable` function is defined in the same file as the `early_make_pgtable` function as the following:
```C ```C
int __init __early_make_pgtable(unsigned long address, pmdval_t pmd) int __init __early_make_pgtable(unsigned long address, pmdval_t pmd)
@ -387,108 +397,58 @@ int __init __early_make_pgtable(unsigned long address, pmdval_t pmd)
} }
``` ```
It starts from the definition of some variables which have `*val_t` types. All of these types are just: It starts from the definition of some variables which have `*val_t` types. All of these types are declared as alias of `unsigned long` using `typedef`.
After checking that the address is valid, we get the address of the Page Global Directory entry, which contains the base address of the Page Upper Directory, and put its value into the `pgd` variable:
```C ```C
typedef unsigned long pgdval_t; again:
pgd_p = &early_top_pgt[pgd_index(address)].pgd;
pgd = *pgd_p;
``` ```
Also we will operate with the `*_t` (not val) types, for example `pgd_t` and etc... All of these types are defined in the [arch/x86/include/asm/pgtable_types.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/pgtable_types.h) and represent structures like this: And we check if `pgd` is presented. If it is, we assign the base address of the page upper directory table to `pud_p`:
```C ```C
typedef struct { pgdval_t pgd; } pgd_t; pud_p = (pudval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map - phys_base);
``` ```
For example, where `PTE_PFN_MASK` is a macro which mask lower `12` bits of `(pte|pmd|pud|pgd)val_t`.
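The effect of such a mask can be sketched as follows. This is an illustration under stated assumptions, not the kernel's definition: it assumes 4 KiB pages and 52 physical address bits, and all `SKETCH_*` names as well as `entry_to_phys` are hypothetical.

```C
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/* Assumption: keep address bits 12..51, drop the flag bits below and above. */
#define SKETCH_PFN_MASK   (~(SKETCH_PAGE_SIZE - 1) & ((1ULL << 52) - 1))

/* Strip the flag bits from a page table entry, leaving the physical base address. */
static inline uint64_t entry_to_phys(uint64_t entry)
{
	return entry & SKETCH_PFN_MASK;
}
```

The result is a physical address; the code above then adds `__START_KERNEL_map - phys_base` to turn it into a kernel virtual address that can be dereferenced.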
If `pgd` is not present, we check whether `next_early_pgt` has reached `EARLY_DYNAMIC_PAGE_TABLES`, which is `64` and represents a fixed number of buffers for setting up new page tables on demand. If `next_early_pgt` is greater than or equal to `EARLY_DYNAMIC_PAGE_TABLES`, we reset the page tables and start again from the `again` label. Otherwise, we assign the next entry of `early_dynamic_pgts` to `pud_p`, fill the whole page upper directory with `0`, and then fill the page global directory entry with its base address and some access rights:
```C ```C
extern pgd_t early_top_pgt[PTRS_PER_PGD]; if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES) {
reset_early_page_tables();
goto again;
}
pud_p = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
memset(pud_p, 0, sizeof(*pud_p) * PTRS_PER_PUD);
*pgd_p = (pgdval_t)pud_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
``` ```
Here `early_top_pgt` presents early top-level page table directory which consists of an array of `pgd_t` types and `pgd` points to low-level page entries. And we fix `pud_p` to point to correct entry and assign its value to `pud` with the following:
After we made the check that we have no invalid address, we're getting the address of the Page Global Directory entry which contains `#PF` address and put its value to the `pgd` variable:
```C ```C
pgd_p = &early_top_pgt[pgd_index(address)].pgd; pud_p += pud_index(address);
pgd = *pgd_p; pud = *pud_p;
``` ```
Next we check if five-layer paging is enabled: And then we do the same routine as above, but to the page middle directory.
In the end we assign the given `pmd` which is passed by the `early_make_pgtable` function to the certain entry of page middle directory which maps kernel text+data virtual addresses:
```C ```C
if (!pgtable_l5_enabled()) pmd_p[pmd_index(address)] = pmd;
p4d_p = pgd_p;
``` ```
In most cases five-layer paging is not enabled, so `p4d_p` most likely equals to `pgd_p`. After page fault handler finished its work, as a result, `early_top_pgt` contains entries which point to the valid addresses.
After this we fix up address of the p4d with:
```C
p4d_p += p4d_index(address);
p4d = *p4d_p;
```
In the next step we check `p4d`, if it contains correct p4d entry we put physical address of the p4d entry and put it to the `pud_p` with:
```C
pud_p = (pudval_t *)((p4d & PTE_PFN_MASK) + __START_KERNEL_map - phys_base);
```
where `PTE_PFN_MASK` is a macro:
```C
#define PTE_PFN_MASK ((pteval_t)PHYSICAL_PAGE_MASK)
```
which expands to:
```C
(signed long)(~(PAGE_SIZE-1)) & ((1 << 52) - 1)
```
Here [sign-extension](https://en.wikipedia.org/wiki/Sign_extension) is used. To be more expanded:
```
0b1111111111111111111111111111111111111111111111111111
```
which is 52 bits to mask page frame.
If `p4d` does not contain correct address we check that `next_early_pgt` is not greater than `EARLY_DYNAMIC_PAGE_TABLES` which is `64` and present a fixed number of buffers to set up new page tables on demand. If `next_early_pgt` is greater than `EARLY_DYNAMIC_PAGE_TABLES` we reset page tables and start again. If `next_early_pgt` is less than `EARLY_DYNAMIC_PAGE_TABLES`, we create new page upper directory pointer which points to the current dynamic page table and writes its physical address with the `_KERNPG_TABLE` access rights to the p4d:
```C
if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES) {
reset_early_page_tables();
goto again;
}
pud_p = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
for (i = 0; i < PTRS_PER_PUD; i++)
pud_p[i] = 0;
*p4d_p = (p4dval_t)pud_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
```
As we did above, we fix up address of the page upper directory with:
```C
pud_p += pud_index(address);
pud = *pud_p;
```
In the next step we do the same actions as we did before, but with the page middle directory. In the end we fix address of the page middle directory which contains maps kernel text+data virtual addresses:
```C
pmd_p[pmd_index(address)] = pmd;
```
After page fault handler finished its work and as result our `early_top_pgt` contains entries which point to the valid addresses.
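The `pgd_index`, `pud_index` and `pmd_index` helpers used in this walk each pick a 9-bit field out of the faulting virtual address. The sketch below shows the idea, assuming four-level paging with 512 entries per table; the `sketch_*` functions and `SKETCH_*` constants are hypothetical stand-ins, not the kernel's own macros.

```C
#include <stdint.h>

#define SKETCH_PTRS_PER_TABLE 512u  /* 9 index bits per level */
#define SKETCH_PGDIR_SHIFT    39    /* one pgd entry covers 512 GiB */
#define SKETCH_PUD_SHIFT      30    /* one pud entry covers 1 GiB */
#define SKETCH_PMD_SHIFT      21    /* one pmd entry covers 2 MiB */

/* Index into the page global directory. */
static inline unsigned int sketch_pgd_index(uint64_t addr)
{
	return (unsigned int)((addr >> SKETCH_PGDIR_SHIFT) & (SKETCH_PTRS_PER_TABLE - 1));
}

/* Index into the page upper directory. */
static inline unsigned int sketch_pud_index(uint64_t addr)
{
	return (unsigned int)((addr >> SKETCH_PUD_SHIFT) & (SKETCH_PTRS_PER_TABLE - 1));
}

/* Index into the page middle directory. */
static inline unsigned int sketch_pmd_index(uint64_t addr)
{
	return (unsigned int)((addr >> SKETCH_PMD_SHIFT) & (SKETCH_PTRS_PER_TABLE - 1));
}
```

Under these assumptions, the address `0xffffffff81000000` decomposes into pgd index `511`, pud index `510` and pmd index `8`.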
Other exception handling Other exception handling
-------------------------------------------------------------------------------- --------------------------------------------------------------------------------
In early interrupt phase, exceptions other than page fault are handled by `early_fixup_exception` function which is defined in [arch/x86/mm/extable.c](https://github.com/torvalds/linux/blob/master/arch/x86/mm/extable.c) and takes two parameters - pointer to kernel stack which consists of saved registers and interrupt number: In early interrupt phase, exceptions other than page fault are handled by `early_fixup_exception` function which is defined in [arch/x86/mm/extable.c](https://github.com/torvalds/linux/blob/master/arch/x86/mm/extable.c) and takes two parameters - pointer to kernel stack which consists of saved registers and vector number:
```C ```C
void __init early_fixup_exception(struct pt_regs *regs, int trapnr) void __init early_fixup_exception(struct pt_regs *regs, int trapnr)
@ -499,7 +459,7 @@ void __init early_fixup_exception(struct pt_regs *regs, int trapnr)
} }
``` ```
First of all we need to pass some condition expressions. First of all we need to make some checks as the following:
```C ```C
if (trapnr == X86_TRAP_NMI) if (trapnr == X86_TRAP_NMI)
@ -512,14 +472,16 @@ First of all we need to pass some condition expressions.
goto fail; goto fail;
``` ```
Here we just ignore [NMI](https://en.wikipedia.org/wiki/Non-maskable_interrupt). And we make sure that we are not in recursive situation. After that, we get into: Here we just ignore [NMI](https://en.wikipedia.org/wiki/Non-maskable_interrupt) and make sure that we are not in recursive situation.
After that, we get into:
```C ```C
if (fixup_exception(regs, trapnr)) if (fixup_exception(regs, trapnr))
return; return;
``` ```
The `fixup_exception` function is defined in the same file as `early_fixup_exception` function and looks like: The `fixup_exception` function finds the actual handler and call it. It is defined in the same file as `early_fixup_exception` function as the following:
```C ```C
int fixup_exception(struct pt_regs *regs, int trapnr) int fixup_exception(struct pt_regs *regs, int trapnr)
@ -543,16 +505,16 @@ typedef bool (*ex_handler_t)(const struct exception_table_entry *,
struct pt_regs *, int) struct pt_regs *, int)
``` ```
The `search_exception_tables` function looks up the given address in the exception table (i.e. the contents of the ELF section __ex_table). After that, we get the actual address by `ex_fixup_handler` function. At last we call actual handler. For more information about exception table, you can refer to [Documentation/x86/exception-tables.txt](https://github.com/torvalds/linux/blob/master/Documentation/x86/exception-tables.txt). The `search_exception_tables` function looks up the given address in the exception table (i.e. the contents of the ELF section, `__ex_table`). After that, we get the actual address by `ex_fixup_handler` function. At last we call actual handler. For more information about exception table, you can refer to [Documentation/x86/exception-tables.txt](https://github.com/torvalds/linux/blob/master/Documentation/x86/exception-tables.txt).
Back to `early_fixup_exception` function, the next step is: Let's get back to the `early_fixup_exception` function, the next step is:
```C ```C
if (fixup_bug(regs, trapnr)) if (fixup_bug(regs, trapnr))
return; return;
``` ```
The `fixup_bug` function is defined in [arch/x86/kernel/traps.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/traps.c). Let's have a look on the function implementation. The `fixup_bug` function is defined in [arch/x86/kernel/traps.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/traps.c). Let's have a look on the function implementation:
```C ```C
int fixup_bug(struct pt_regs *regs, int trapnr) int fixup_bug(struct pt_regs *regs, int trapnr)
@ -574,7 +536,7 @@ int fixup_bug(struct pt_regs *regs, int trapnr)
} }
``` ```
All what this funtion do is just return `1` if the exception is generated because `#UD` (or [Invalid Opcode](https://wiki.osdev.org/Exceptions#Invalid_Opcode)) occured and the `report_bug` function returns `BUG_TRAP_TYPE_WARN`, otherwise return `0`. All what this funtion does is just returns `1` if the exception is generated because `#UD` (or [Invalid Opcode](https://wiki.osdev.org/Exceptions#Invalid_Opcode)) occured and the `report_bug` function returns `BUG_TRAP_TYPE_WARN`, otherwise returns `0`.
Conclusion Conclusion
-------------------------------------------------------------------------------- --------------------------------------------------------------------------------