mirror of
https://github.com/0xAX/linux-insides.git
synced 2025-01-06 22:01:06 +00:00
Fix grammar in linux-initialization-2.
Signed-off-by: Jakub Duchniewicz <j.duchniewicz@gmail.com>
parent
859d98c037
commit
f759770302
@ -4,9 +4,9 @@ Kernel initialization. Part 2.
|
|||||||
Early interrupt and exception handling
|
Early interrupt and exception handling
|
||||||
--------------------------------------------------------------------------------
|
--------------------------------------------------------------------------------
|
||||||
|
|
||||||
In the previous [part](https://0xax.gitbook.io/linux-insides/summary/initialization/linux-initialization-1) we stopped before setting of early interrupt handlers. At this moment we are in the decompressed Linux kernel, we have basic [paging](https://en.wikipedia.org/wiki/Page_table) structure for early boot and our current goal is to finish early preparation before the main kernel code will start to work.
|
In the previous [part](https://0xax.gitbook.io/linux-insides/summary/initialization/linux-initialization-1) we stopped before setting up early interrupt handlers. At this moment we are in the decompressed Linux kernel, we have a basic [paging](https://en.wikipedia.org/wiki/Page_table) structure for early boot and our current goal is to finish early preparation before the main kernel code starts to work.
|
||||||
|
|
||||||
We already started to do this preparation in the previous [first](https://0xax.gitbook.io/linux-insides/summary/initialization/linux-initialization-1) part of this [chapter](https://0xax.gitbook.io/linux-insides/summary/initialization). We continue in this part and will know more about interrupt and exception handling.
|
We already started this preparation in the previous ([first](https://0xax.gitbook.io/linux-insides/summary/initialization/linux-initialization-1)) part of this [chapter](https://0xax.gitbook.io/linux-insides/summary/initialization). We continue in this part and will learn more about interrupt and exception handling.
|
||||||
|
|
||||||
Remember that we stopped before following function:
|
Remember that we stopped before following function:
|
||||||
|
|
||||||
@ -14,20 +14,20 @@ Remember that we stopped before following function:
|
|||||||
idt_setup_early_handler();
|
idt_setup_early_handler();
|
||||||
```
|
```
|
||||||
|
|
||||||
from the [arch/x86/kernel/head64.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head64.c) source code file. But before we start to sort out this function, we need to know about interrupts and handlers.
|
from the [arch/x86/kernel/head64.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head64.c) source code file. But before we start to sort out this function, we need to understand interrupts and handlers.
|
||||||
|
|
||||||
Some theory
|
Some theory
|
||||||
--------------------------------------------------------------------------------
|
--------------------------------------------------------------------------------
|
||||||
|
|
||||||
An interrupt is an event caused by software or hardware to the CPU. For example a user have pressed a key on keyboard. On interrupt, CPU stops the current task and transfer control to the special routine which is called - [interrupt handler](https://en.wikipedia.org/wiki/Interrupt_handler). An interrupt handler handles and interrupt and transfer control back to the previously stopped task. We can split interrupts on three types:
|
An interrupt is an event raised by software or hardware to get the attention of the CPU, for example when a user presses a key on the keyboard. On an interrupt, the CPU stops the current task and transfers control to a special routine called an [interrupt handler](https://en.wikipedia.org/wiki/Interrupt_handler). An interrupt handler handles the interrupt and transfers control back to the previously stopped task. We can split interrupts into three types:
|
||||||
|
|
||||||
* Software interrupts - when a software signals CPU that it needs kernel attention. These interrupts are generally used for system calls;
|
* Software interrupts - when a software signals CPU that it needs kernel attention. These interrupts are generally used for system calls;
|
||||||
* Hardware interrupts - when a hardware event happens, for example button is pressed on a keyboard;
|
* Hardware interrupts - when a hardware event happens, for example button is pressed on a keyboard;
|
||||||
* Exceptions - interrupts generated by CPU, when the CPU detects error, for example division by zero or accessing a memory page which is not in RAM.
|
* Exceptions - interrupts generated by CPU, when the CPU detects an error, for example a division by zero or accessing a memory page which is not in RAM.
|
||||||
|
|
||||||
Every interrupt and exception is assigned a unique number which is called - `vector number`. `Vector number` can be any number from `0` to `255`. There is common practice to use first `32` vector numbers for exceptions, and vector numbers from `32` to `255` are used for user-defined interrupts.
|
Every interrupt and exception is assigned a unique number called a `vector number`. `Vector number` can be any number from `0` to `255`. A common practice is to use the first `32` vector numbers for exceptions, and vector numbers from `32` to `255` are used for user-defined interrupts.
|
||||||
|
|
||||||
CPU uses vector number as an index in the `Interrupt Descriptor Table` (we will see description of it soon). CPU catches interrupts from the [APIC](http://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller) or through its pins. Following table shows `0-31` exceptions:
|
The CPU uses the vector number as an index into the `Interrupt Descriptor Table` (we will see a description of it soon). The CPU catches interrupts from the [APIC](http://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller) or through its pins. The following table shows `0-31` exceptions:
|
||||||
|
|
||||||
```
|
```
|
||||||
----------------------------------------------------------------------------------------------
|
----------------------------------------------------------------------------------------------
|
||||||
@ -79,7 +79,7 @@ CPU uses vector number as an index in the `Interrupt Descriptor Table` (we will
|
|||||||
----------------------------------------------------------------------------------------------
|
----------------------------------------------------------------------------------------------
|
||||||
```
|
```
|
||||||
|
|
||||||
To react on interrupt CPU uses special structure - Interrupt Descriptor Table or IDT. IDT is an array of 8-byte descriptors like Global Descriptor Table, but IDT entries are called `gates`. CPU multiplies vector number by 8 to find the IDT entry. But in 64-bit mode IDT is an array of 16-byte descriptors and CPU multiplies vector number by 16 to find the entry in the IDT. We remember from the previous part that CPU uses special `GDTR` register to locate Global Descriptor Table, so CPU uses special register `IDTR` for Interrupt Descriptor Table and `lidt` instruction for loading base address of the table into this register.
|
To react to an interrupt, the CPU uses a special structure, the Interrupt Descriptor Table or IDT. In protected mode the IDT is an array of 8-byte descriptors, just like the Global Descriptor Table, but IDT entries are called `gates`, and the CPU multiplies the vector number by 8 to find the IDT entry. In 64-bit mode, however, the IDT is an array of 16-byte descriptors and the CPU multiplies the vector number by 16 to find the entry in the IDT. We remember from the previous part that the CPU uses the special `GDTR` register to locate the Global Descriptor Table; in the same way, the CPU uses the special `IDTR` register for the Interrupt Descriptor Table and the `lidt` instruction to load the base address of the table into this register.
|
||||||
|
|
||||||
64-bit mode IDT entry has following structure:
|
64-bit mode IDT entry has following structure:
|
||||||
|
|
||||||
@ -112,10 +112,10 @@ To react on interrupt CPU uses special structure - Interrupt Descriptor Table or
|
|||||||
|
|
||||||
Where:
|
Where:
|
||||||
|
|
||||||
* `Offset` - is offset to entry point of an interrupt handler;
|
* `Offset` - is the offset to entry point of an interrupt handler;
|
||||||
* `DPL` - Descriptor Privilege Level;
|
* `DPL` - Descriptor Privilege Level;
|
||||||
* `P` - Segment Present flag;
|
* `P` - Segment Present flag;
|
||||||
* `Segment selector` - a code segment selector in GDT or LDT (actually in linux, it must point to a valid descriptor in your GDT.)
|
* `Segment selector` - a code segment selector in GDT or LDT (actually in Linux, it must point to a valid descriptor in your GDT.)
|
||||||
```C
|
```C
|
||||||
#define __KERNEL_CS (GDT_ENTRY_KERNEL_CS*8) // 0000 0000 0001 0000
|
#define __KERNEL_CS (GDT_ENTRY_KERNEL_CS*8) // 0000 0000 0001 0000
|
||||||
#define GDT_ENTRY_KERNEL_CS 2
|
#define GDT_ENTRY_KERNEL_CS 2
|
||||||
@ -128,15 +128,15 @@ And the last `Type` field describes type of the `IDT` entry. There are three dif
|
|||||||
* Interrupt gate
|
* Interrupt gate
|
||||||
* Trap gate
|
* Trap gate
|
||||||
|
|
||||||
Interrupt and trap gates contain a far pointer to the entry point of the interrupt handler. Only one difference between these types is how CPU handles `IF` flag. If interrupt handler was accessed through interrupt gate, CPU clear the `IF` flag to prevent other interrupts while current interrupt handler executes. After that current interrupt handler executes, CPU sets the `IF` flag again with `iret` instruction.
|
Interrupt and trap gates contain a far pointer to the entry point of the interrupt handler. The only difference between these types is how CPU handles the `IF` flag. If an interrupt handler was accessed through the interrupt gate, CPU clears the `IF` flag to prevent other interrupts while current interrupt handler executes. After the current interrupt handler executes, CPU sets the `IF` flag again with `iret` instruction.
|
||||||
|
|
||||||
Other bits in the interrupt descriptor is reserved and must be 0. Now let's look how CPU handles interrupts:
|
Other bits in the interrupt descriptor are reserved and must be 0. Now let's look at how the CPU handles interrupts:
|
||||||
|
|
||||||
* CPU save flags register, `CS`, and instruction pointer on the stack.
|
* CPU saves flags register, `CS`, and instruction pointer on the stack.
|
||||||
* If interrupt causes an error code (like `#PF` for example), CPU saves an error on the stack after instruction pointer;
|
* If an interrupt produces an error code (for example `#PF`), CPU saves the error code on the stack after the instruction pointer;
|
||||||
* After interrupt handler executes, `iret` instruction will be used to return from it.
|
* After interrupt handler executes, `iret` instruction will be used to return from it.
|
||||||
|
|
||||||
Now let's back to code.
|
Now let's go back to code.
|
||||||
|
|
||||||
Fill and load IDT
|
Fill and load IDT
|
||||||
--------------------------------------------------------------------------------
|
--------------------------------------------------------------------------------
|
||||||
@ -147,7 +147,7 @@ We stopped at the following function:
|
|||||||
idt_setup_early_handler();
|
idt_setup_early_handler();
|
||||||
```
|
```
|
||||||
|
|
||||||
`idt_setup_early_handler` is defined in the [arch/x86/kernel/idt.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/idt.c) like the following:
|
`idt_setup_early_handler` is defined in the [arch/x86/kernel/idt.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/idt.c) as follows:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
void __init idt_setup_early_handler(void)
|
void __init idt_setup_early_handler(void)
|
||||||
@ -161,12 +161,12 @@ void __init idt_setup_early_handler(void)
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
where `NUM_EXCEPTION_VECTORS` expands to `32`. As we can see, We're filling only first 32 `IDT` entries in the loop, because all of the early setup runs with interrupts disabled, so there is no need to set up interrupt handlers for vectors greater than `32`. Here we call `set_intr_gate` in the loop, which takes two parameters:
|
where `NUM_EXCEPTION_VECTORS` expands to `32`. As we can see, we're filling only the first 32 `IDT` entries in the loop, because all of the early setup runs with interrupts disabled, so there is no need to set up interrupt handlers for vectors `32` and above. Here we call `set_intr_gate` in the loop, which takes two parameters:
|
||||||
|
|
||||||
* Number of an interrupt or `vector number`;
|
* Number of an interrupt or `vector number`;
|
||||||
* Address of the idt handler.
|
* Address of the idt handler.
|
||||||
|
|
||||||
and inserts an interrupt gate to the `IDT` table which is represented by the `&idt_descr` array.
|
and inserts an interrupt gate to the `IDT` table represented by the `&idt_descr` array.
|
||||||
|
|
||||||
The `early_idt_handler_array` array is declared in the [arch/x86/include/asm/segment.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/segment.h) header file and contains addresses of the first `32` exception handlers:
|
The `early_idt_handler_array` array is declared in the [arch/x86/include/asm/segment.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/segment.h) header file and contains addresses of the first `32` exception handlers:
|
||||||
|
|
||||||
@ -177,9 +177,9 @@ The `early_idt_handler_array` array is declared in the [arch/x86/include/asm/seg
|
|||||||
extern const char early_idt_handler_array[NUM_EXCEPTION_VECTORS][EARLY_IDT_HANDLER_SIZE];
|
extern const char early_idt_handler_array[NUM_EXCEPTION_VECTORS][EARLY_IDT_HANDLER_SIZE];
|
||||||
```
|
```
|
||||||
|
|
||||||
The `early_idt_handler_array` is `288` bytes array which contains address of exception entry points every nine bytes. Every nine bytes of this array consist of two bytes optional instruction for pushing dummy error code if an exception does not provide it, two bytes instruction for pushing vector number to the stack and five bytes of `jump` to the common exception handler code. You will see more detail in the next paragraph.
|
The `early_idt_handler_array` is a `288`-byte array that holds one exception entry point every nine bytes. Each nine-byte slot consists of an optional two-byte instruction that pushes a dummy error code if the exception does not provide one, a two-byte instruction that pushes the vector number onto the stack, and a five-byte `jump` to the common exception handler code. You will see more detail in the next paragraph.
|
||||||
|
|
||||||
The `set_intr_gate` function is defined in the [arch/x86/kernel/idt.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/idt.c) source file and looks:
|
The `set_intr_gate` function is defined in the [arch/x86/kernel/idt.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/idt.c) source file and looks as follows:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
static void set_intr_gate(unsigned int n, const void *addr)
|
static void set_intr_gate(unsigned int n, const void *addr)
|
||||||
@ -199,7 +199,7 @@ static void set_intr_gate(unsigned int n, const void *addr)
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
First of all it checks that passed vector number is not greater than `255` with `BUG_ON` macro. We need to do this because we are limited to have up to `256` interrupts. After this, we fill the idt data with the given arguments and others, which will be passed to `idt_setup_from_table`. The `idt_setup_from_table` function is defined in the same file as the `set_intr_gate` function like the following:
|
First of all, it checks with the `BUG_ON` macro that the vector number passed to it is not greater than `255`. We need to do this because there are at most `256` interrupt vectors. After this, we fill the idt data with the given arguments and others, which will be passed to `idt_setup_from_table`. The `idt_setup_from_table` function is defined in the same file as the `set_intr_gate` function as follows:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
static void
|
static void
|
||||||
@ -221,13 +221,13 @@ idt_setup_from_table(gate_desc *idt, const struct idt_data *t, int size, bool sy
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
which fill temporary idt descriptor with the given arguments and others. And then we just copy it to the certain element of the `idt_table` array. `idt_table` is an array of idt entries:
|
that fills a temporary idt descriptor with the given arguments and others. And then we just copy it to the certain element of the `idt_table` array. `idt_table` is an array of idt entries:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
|
gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
|
||||||
```
|
```
|
||||||
|
|
||||||
Now we are moving back to main loop code. After main loop finishes, we can load `Interrupt Descriptor table` with the call of the:
|
Now we are moving back to main loop code. After main loop finishes, we can load `Interrupt Descriptor table` with the call to the:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
load_idt((const struct desc_ptr *)&idt_descr);
|
load_idt((const struct desc_ptr *)&idt_descr);
|
||||||
@ -248,9 +248,9 @@ and `load_idt` just executes `lidt` instruction:
|
|||||||
asm volatile("lidt %0"::"m" (idt_descr));
|
asm volatile("lidt %0"::"m" (idt_descr));
|
||||||
```
|
```
|
||||||
|
|
||||||
Okay, now we have filled and loaded `Interrupt Descriptor Table`, we know how the CPU acts during an interrupt. So now time to deal with interrupts handlers.
|
Okay, now that we have filled and loaded the `Interrupt Descriptor Table` and know how the CPU acts during an interrupt, it's time to deal with interrupt handlers.
|
||||||
|
|
||||||
Early interrupts handlers
|
Early interrupt handlers
|
||||||
--------------------------------------------------------------------------------
|
--------------------------------------------------------------------------------
|
||||||
|
|
||||||
As you can read above, we filled `IDT` with the address of the `early_idt_handler_array`. In this section, we are going to look into it in detail. We can find it in the [arch/x86/kernel/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head_64.S) assembly file:
|
As you can read above, we filled `IDT` with the address of the `early_idt_handler_array`. In this section, we are going to look into it in detail. We can find it in the [arch/x86/kernel/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head_64.S) assembly file:
|
||||||
@ -275,7 +275,7 @@ ENTRY(early_idt_handler_array)
|
|||||||
END(early_idt_handler_array)
|
END(early_idt_handler_array)
|
||||||
```
|
```
|
||||||
|
|
||||||
We can see here, interrupt handlers generation for the first `32` exceptions. We check here, if exception has an error code then we do nothing, if exception does not return error code, we push zero to the stack. We do it for that stack was uniform. After that we push `vector number` on the stack and jump on the `early_idt_handler_common` which is generic interrupt handler for now. After all, every nine bytes of the `early_idt_handler_array` array consists of optional push of an error code, push of `vector number` and jump instruction to `early_idt_handler_common`. We can see it in the output of the `objdump` util:
|
As we can see above, interrupt handlers are generated for the first `32` exceptions. If an exception provides an error code, we do nothing; if it does not, we push a zero onto the stack so that the stack layout is uniform. After that we push the `vector number` onto the stack and jump to `early_idt_handler_common`, which is the generic interrupt handler for the time being. As a result, every nine bytes of the `early_idt_handler_array` array consist of an optional push of a dummy error code, a push of the `vector number` and a jump instruction to `early_idt_handler_common`. We can see it in the output of the `objdump` util:
|
||||||
|
|
||||||
```
|
```
|
||||||
$ objdump -D vmlinux
|
$ objdump -D vmlinux
|
||||||
@ -296,7 +296,7 @@ ffffffff81fe5014: 6a 02 pushq $0x2
|
|||||||
...
|
...
|
||||||
```
|
```
|
||||||
|
|
||||||
As we may know, CPU pushes flag register, `CS` and `RIP` on the stack before calling interrupt handler. So before `early_idt_handler_common` will be executed, stack will contain following data:
|
As we may know, the CPU pushes the flags register, `CS` and `RIP` on the stack before calling the interrupt handler. So before `early_idt_handler_common` is executed, the stack will contain the following data:
|
||||||
|
|
||||||
```
|
```
|
||||||
|--------------------|
|
|--------------------|
|
||||||
@ -308,7 +308,7 @@ As we may know, CPU pushes flag register, `CS` and `RIP` on the stack before cal
|
|||||||
|--------------------|
|
|--------------------|
|
||||||
```
|
```
|
||||||
|
|
||||||
Now let's look on the `early_idt_handler_common` implementation. It locates in the same [arch/x86/kernel/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head_64.S) assembly file. First of all we increment `early_recursion_flag` to prevent recursion in the `early_idt_handler_common`:
|
Now let's look at the `early_idt_handler_common` implementation. It is located in the same [arch/x86/kernel/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head_64.S) assembly file. First of all we increment `early_recursion_flag` to prevent recursion in the `early_idt_handler_common`:
|
||||||
|
|
||||||
```assembly
|
```assembly
|
||||||
incl early_recursion_flag(%rip)
|
incl early_recursion_flag(%rip)
|
||||||
@ -367,7 +367,7 @@ High |-------------------------|
|
|||||||
Low |-------------------------|
|
Low |-------------------------|
|
||||||
```
|
```
|
||||||
|
|
||||||
We need to do it to prevent wrong values of registers when we return from the interrupt handler. After this we check the vector number, and if it is `#PF` or [Page Fault](https://en.wikipedia.org/wiki/Page_fault), we put value from the `cr2` to the `rdi` register and call `early_make_pgtable` (we'll see it soon):
|
We need to do it to prevent wrong values of registers when we return from the interrupt handler. After this we check the vector number, and if it is `#PF` or a [Page Fault](https://en.wikipedia.org/wiki/Page_fault), we put value from the `cr2` to the `rdi` register and call `early_make_pgtable` (we'll see it soon):
|
||||||
|
|
||||||
```assembly
|
```assembly
|
||||||
cmpq $14,%rsi /* Page fault? */
|
cmpq $14,%rsi /* Page fault? */
|
||||||
@ -394,16 +394,16 @@ We'll see the implementation of the `early_fixup_exception` function later.
|
|||||||
jmp restore_regs_and_return_to_kernel
|
jmp restore_regs_and_return_to_kernel
|
||||||
```
|
```
|
||||||
|
|
||||||
After we decrement the `early_recursion_flag`, we restore registers which we saved before from the stack and return from the handler with `iretq`.
|
After we decrement the `early_recursion_flag`, we restore registers that we saved before on the stack and return from the handler with `iretq`.
|
||||||
|
|
||||||
It is the end of the interrupt handler. We will examine the page fault handling and the other exception handling in order.
|
That is the end of the interrupt handler. We will examine the page fault handling and the other exception handling in order.
|
||||||
|
|
||||||
Page fault handling
|
Page fault handling
|
||||||
--------------------------------------------------------------------------------
|
--------------------------------------------------------------------------------
|
||||||
|
|
||||||
In the previous paragraph we saw the early interrupt handler which checks if the vector number is page fault and calls `early_make_pgtable` for building new page tables if it is. We need to have `#PF` handler in this step because there are plans to add ability to load kernel above `4G` and make access to `boot_params` structure above the 4G.
|
In the previous paragraph we saw the early interrupt handler that checks if the vector number is a page fault and calls `early_make_pgtable` for building new page tables if it is. We need to have `#PF` handler in this step because there are plans to add an ability to load kernels above `4G` addresses and allow accesses to `boot_params` structure above the 4G addressing limit.
|
||||||
|
|
||||||
You can find the implementation of `early_make_pgtable` in [arch/x86/kernel/head64.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head64.c) and takes one parameter - the value of `cr2` register, which contains the address caused page fault. Let's look on it:
|
You can find the implementation of `early_make_pgtable` in [arch/x86/kernel/head64.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head64.c). It takes one parameter, the value of the `cr2` register, which contains the address that caused the page fault. Let's look at it:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
int __init early_make_pgtable(unsigned long address)
|
int __init early_make_pgtable(unsigned long address)
|
||||||
@ -417,7 +417,7 @@ int __init early_make_pgtable(unsigned long address)
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
`__PAGE_OFFSET` is defined in the [arch/x86/include/asm/page_64_types.h](https://elixir.bootlin.com/linux/v3.10-rc1/source/arch/x86/include/asm/page_64_types.h#L33) header file, and the suffix `UL` forces the page offset to be a unsigned long data type.
|
`__PAGE_OFFSET` is defined in the [arch/x86/include/asm/page_64_types.h](https://elixir.bootlin.com/linux/v3.10-rc1/source/arch/x86/include/asm/page_64_types.h#L33) header file, and the suffix `UL` forces the page offset to be an unsigned long data type.
|
||||||
|
|
||||||
```C
|
```C
|
||||||
#define __PAGE_OFFSET _AC(0xffff880000000000, UL)
|
#define __PAGE_OFFSET _AC(0xffff880000000000, UL)
|
||||||
@ -442,7 +442,7 @@ And the `_AC` macro is defined in the [include/uapi/linux/const.h](https://elixi
|
|||||||
#define _AC(X,Y) __AC(X,Y)
|
#define _AC(X,Y) __AC(X,Y)
|
||||||
#endif
|
#endif
|
||||||
```
|
```
|
||||||
Where `__PAGE_OFFSET` expands to `0xffff888000000000`. But, why is it possible to translate a virtual address to a physical address by subtracting `__PAGE_OFFSET`? The answer is in the [Documentation/x86/x86_64/mm.rst](https://elixir.bootlin.com/linux/v5.10-rc5/source/Documentation/x86/x86_64/mm.rst#L45) documentation:
|
Where `__PAGE_OFFSET` expands to `0xffff888000000000`. But, why is it possible to translate a virtual address to a physical address by subtracting `__PAGE_OFFSET`? The answer is in the [Documentation/x86/x86_64/mm.rst](https://elixir.bootlin.com/linux/v5.10-rc5/source/Documentation/x86/x86_64/mm.rst#L45):
|
||||||
|
|
||||||
```
|
```
|
||||||
...
|
...
|
||||||
@ -452,7 +452,7 @@ ffff888000000000 | -119.5 TB | ffffc87fffffffff | 64 TB | direct mapping of a
|
|||||||
|
|
||||||
As explained above, the virtual address space `ffff888000000000-ffffc87fffffffff` is a direct mapping of all physical memory. When the kernel wants to access any physical memory, it uses the direct mapping.
|
As explained above, the virtual address space `ffff888000000000-ffffc87fffffffff` is a direct mapping of all physical memory. When the kernel wants to access any physical memory, it uses the direct mapping.
|
||||||
|
|
||||||
Okay, let's get back to discussing `early_make_pgtable`. We initialize `pmd` and pass it to the `__early_make_pgtable` function along with `address`. The `__early_make_pgtable` function is defined in the same file as the `early_make_pgtable` function as follows:
|
Okay, let's get back to discussing `early_make_pgtable`. We initialize `pmd` and pass it to the `__early_make_pgtable` function along with an `address`. The `__early_make_pgtable` function is defined in the same file as the `early_make_pgtable` function as follows:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
int __init __early_make_pgtable(unsigned long address, pmdval_t pmd)
|
int __init __early_make_pgtable(unsigned long address, pmdval_t pmd)
|
||||||
@ -468,9 +468,9 @@ int __init __early_make_pgtable(unsigned long address, pmdval_t pmd)
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
It starts from the definition of some variables which have `*val_t` types. All of these types are declared as alias of `unsigned long` using `typedef`.
|
It starts from the definition of some variables having `*val_t` types. All of these types are declared as an alias of `unsigned long` using `typedef`.
|
||||||
|
|
||||||
After we made the check that we have no invalid address, we're getting the address of the Page Global Directory entry which contains base address of Page Upper Directory and put its value to the `pgd` variable:
|
After performing the check for invalid addresses, we get the address of the Page Global Directory entry that contains the base address of the Page Upper Directory and put its value into the `pgd` variable:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
again:
|
again:
|
||||||
@ -478,15 +478,15 @@ again:
|
|||||||
pgd = *pgd_p;
|
pgd = *pgd_p;
|
||||||
```
|
```
|
||||||
|
|
||||||
And we check if `pgd` is presented. If it is, we assign the base address of the page upper directory table to `pud_p`:
|
And we check if `pgd` is present. If it is, we assign the base address of the page upper directory table to `pud_p`:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
pud_p = (pudval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map - phys_base);
|
pud_p = (pudval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map - phys_base);
|
||||||
```
|
```
|
||||||
|
|
||||||
where `PTE_PFN_MASK` is a macro which mask lower `12` bits of `(pte|pmd|pud|pgd)val_t`.
|
where `PTE_PFN_MASK` is a macro that masks lower `12` bits of `(pte|pmd|pud|pgd)val_t`.
|
||||||
|
|
||||||
If `pgd` is not presented, we check if `next_early_pgt` is not greater than `EARLY_DYNAMIC_PAGE_TABLES` which is `64` and present a fixed number of buffers to set up new page tables on demand. If `next_early_pgt` is greater than `EARLY_DYNAMIC_PAGE_TABLES` we reset page tables and start again from `again` label. If `next_early_pgt` is less than `EARLY_DYNAMIC_PAGE_TABLES`, we assign the next entry of `early_dynamic_pgts` to `pud_p` and fill whole entry of the page upper directory with `0`, then fill the page global directory entry with the base address and some access rights:
|
If `pgd` is not present, we check whether `next_early_pgt` is still less than `EARLY_DYNAMIC_PAGE_TABLES`, which is `64` and represents a fixed number of buffers for setting up new page tables on demand. If `next_early_pgt` is greater than or equal to `EARLY_DYNAMIC_PAGE_TABLES`, we reset the page tables and start again from the `again` label. If `next_early_pgt` is less than `EARLY_DYNAMIC_PAGE_TABLES`, we assign the next entry of `early_dynamic_pgts` to `pud_p` and fill the whole entry of the page upper directory with `0`, then fill the page global directory entry with the base address and some access rights:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES) {
|
if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES) {
|
||||||
@ -519,7 +519,7 @@ After page fault handler finished its work, as a result, `early_top_pgt` contain
|
|||||||
Other exception handling
|
Other exception handling
|
||||||
--------------------------------------------------------------------------------
|
--------------------------------------------------------------------------------
|
||||||
|
|
||||||
In early interrupt phase, exceptions other than page fault are handled by `early_fixup_exception` function which is defined in [arch/x86/mm/extable.c](https://github.com/torvalds/linux/blob/master/arch/x86/mm/extable.c) and takes two parameters - pointer to kernel stack which consists of saved registers and vector number:
|
In the early interrupt phase, exceptions other than the page fault are handled by `early_fixup_exception` function defined in [arch/x86/mm/extable.c](https://github.com/torvalds/linux/blob/master/arch/x86/mm/extable.c) taking two parameters - a pointer to the kernel stack that consists of saved registers and a vector number:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
void __init early_fixup_exception(struct pt_regs *regs, int trapnr)
|
void __init early_fixup_exception(struct pt_regs *regs, int trapnr)
|
||||||
@ -530,7 +530,7 @@ void __init early_fixup_exception(struct pt_regs *regs, int trapnr)
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
First of all we need to make some checks as the following:
|
First of all, we need to make some checks as follows:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
if (trapnr == X86_TRAP_NMI)
|
if (trapnr == X86_TRAP_NMI)
|
||||||
@ -552,7 +552,7 @@ After that, we get into:
|
|||||||
return;
|
return;
|
||||||
```
|
```
|
||||||
|
|
||||||
The `fixup_exception` function finds the actual handler and call it. It is defined in the same file as `early_fixup_exception` function as the following:
|
The `fixup_exception` function finds the actual handler and calls it. It is defined in the same file as the `early_fixup_exception` function, as follows:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
int fixup_exception(struct pt_regs *regs, int trapnr)
|
int fixup_exception(struct pt_regs *regs, int trapnr)
|
||||||
@ -576,7 +576,7 @@ typedef bool (*ex_handler_t)(const struct exception_table_entry *,
|
|||||||
struct pt_regs *, int)
|
struct pt_regs *, int)
|
||||||
```
|
```
|
||||||
|
|
||||||
The `search_exception_tables` function looks up the given address in the exception table (i.e. the contents of the ELF section, `__ex_table`). After that, we get the actual address by `ex_fixup_handler` function. At last we call actual handler. For more information about exception table, you can refer to [Documentation/x86/exception-tables.txt](https://github.com/torvalds/linux/blob/master/Documentation/x86/exception-tables.txt).
|
The `search_exception_tables` function looks up the given address in the exception table (i.e. the contents of the ELF section `__ex_table`). After that, we get the address of the actual handler with the `ex_fixup_handler` function. Finally, we call the actual handler. For more information about the exception table, you can refer to [Documentation/x86/exception-tables.txt](https://github.com/torvalds/linux/blob/master/Documentation/x86/exception-tables.txt).
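
The lookup described above can be modelled in user space. The following is only a minimal sketch under stated assumptions: `struct ex_entry`, `search_table` and `fixup_ip` are hypothetical names, the entries hold absolute addresses (the real kernel table stores relative offsets), and instead of calling a per-entry handler the sketch simply redirects the saved instruction pointer to the fixup address.

```C
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified entry; not the kernel's exception_table_entry. */
struct ex_entry {
	unsigned long insn;   /* address of an instruction that may fault */
	unsigned long fixup;  /* address to resume at if it does fault */
};

/* Comparator for bsearch(): the key points to the faulting address. */
static int cmp_ex(const void *key, const void *elem)
{
	unsigned long addr = *(const unsigned long *)key;
	const struct ex_entry *e = elem;

	if (addr < e->insn)
		return -1;
	if (addr > e->insn)
		return 1;
	return 0;
}

/* Look up addr in a table sorted by insn; NULL means no entry exists. */
static const struct ex_entry *search_table(const struct ex_entry *table,
					   size_t n, unsigned long addr)
{
	assert(table != NULL);
	return bsearch(&addr, table, n, sizeof(*table), cmp_ex);
}

/* Model of the fixup step: returns 1 and redirects *ip on a match. */
static int fixup_ip(const struct ex_entry *table, size_t n, unsigned long *ip)
{
	const struct ex_entry *e = search_table(table, n, *ip);

	if (!e)
		return 0;   /* address not in the table: nothing to fix up */
	*ip = e->fixup;     /* continue execution at the fixup code */
	return 1;
}
```

The table has to be sorted by `insn` for the binary search to work. When `fixup_ip` returns `0`, the caller has to fall back to other ways of handling the exception.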
|
||||||
|
|
||||||
Let's get back to the `early_fixup_exception` function, the next step is:
|
Let's get back to the `early_fixup_exception` function, the next step is:
|
||||||
|
|
||||||
@ -585,7 +585,7 @@ Let's get back to the `early_fixup_exception` function, the next step is:
|
|||||||
return;
|
return;
|
||||||
```
|
```
|
||||||
|
|
||||||
The `fixup_bug` function is defined in [arch/x86/kernel/traps.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/traps.c). Let's have a look on the function implementation:
|
The `fixup_bug` function is defined in [arch/x86/kernel/traps.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/traps.c). Let's have a look at its implementation:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
int fixup_bug(struct pt_regs *regs, int trapnr)
|
int fixup_bug(struct pt_regs *regs, int trapnr)
|
||||||
@ -607,12 +607,12 @@ int fixup_bug(struct pt_regs *regs, int trapnr)
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
All what this function does is just returns `1` if the exception is generated because `#UD` (or [Invalid Opcode](https://wiki.osdev.org/Exceptions#Invalid_Opcode)) occurred and the `report_bug` function returns `BUG_TRAP_TYPE_WARN`, otherwise returns `0`.
|
All this function does is return `1` if the exception was generated because `#UD` (or [Invalid Opcode](https://wiki.osdev.org/Exceptions#Invalid_Opcode)) occurred and the `report_bug` function returns `BUG_TRAP_TYPE_WARN`; otherwise it returns `0`.
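
The decision described above can be sketched as a small user-space model. This is not the kernel implementation: `struct regs_model` and `fixup_bug_model` are hypothetical names, and the result of `report_bug` is passed in as a parameter instead of being computed. The sketch also assumes that in the warning case the saved instruction pointer is advanced past the 2-byte `ud2` instruction so that execution can continue.

```C
#include <assert.h>
#include <stddef.h>

#define X86_TRAP_UD 6   /* vector number of #UD (Invalid Opcode) */
#define LEN_UD2     2   /* ud2 is encoded as two bytes: 0x0f 0x0b */

/* Same names as the kernel's enum; the values here are only illustrative. */
enum bug_trap_type {
	BUG_TRAP_TYPE_NONE,
	BUG_TRAP_TYPE_WARN,
	BUG_TRAP_TYPE_BUG,
};

/* Hypothetical stand-in for struct pt_regs: only the field we need. */
struct regs_model {
	unsigned long ip;
};

/*
 * reported stands for the value report_bug() would return for regs->ip.
 * Returns 1 only for a #UD that turned out to be a warning.
 */
static int fixup_bug_model(struct regs_model *regs, int trapnr,
			   enum bug_trap_type reported)
{
	assert(regs != NULL);

	if (trapnr != X86_TRAP_UD)
		return 0;

	if (reported == BUG_TRAP_TYPE_WARN) {
		regs->ip += LEN_UD2;   /* assumption: skip the ud2 instruction */
		return 1;
	}

	return 0;   /* BUG_TRAP_TYPE_NONE or BUG_TRAP_TYPE_BUG: not fixed up */
}
```

Only the combination of `#UD` and `BUG_TRAP_TYPE_WARN` is treated as recoverable. Every other combination returns `0` and leaves the registers untouched.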
|
||||||
|
|
||||||
Conclusion
|
Conclusion
|
||||||
--------------------------------------------------------------------------------
|
--------------------------------------------------------------------------------
|
||||||
|
|
||||||
This is the end of the second part about Linux kernel insides. If you have questions or suggestions, ping me in twitter [0xAX](https://twitter.com/0xAX), drop me [email](mailto:anotherworldofworld@gmail.com) or just create [issue](https://github.com/0xAX/linux-insides/issues/new). In the next part we will see all steps before kernel entry point - `start_kernel` function.
|
This is the end of the second part about Linux kernel insides. If you have questions or suggestions, ping me on Twitter [0xAX](https://twitter.com/0xAX), drop me an [email](mailto:anotherworldofworld@gmail.com) or just create an [issue](https://github.com/0xAX/linux-insides/issues/new). In the next part we will see all the steps before the kernel entry point, the `start_kernel` function.
|
||||||
|
|
||||||
**Please note that English is not my first language and I am really sorry for any inconvenience. If you found any mistakes please send me PR to [linux-insides](https://github.com/0xAX/linux-insides).**
|
**Please note that English is not my first language and I am really sorry for any inconvenience. If you found any mistakes please send me PR to [linux-insides](https://github.com/0xAX/linux-insides).**
|
||||||
|
|
||||||
|