Mirror of https://github.com/0xAX/linux-insides.git (synced 2025-01-05 05:10:55 +00:00)

fix: typo grammar spelling

parent 75a4d9a8e5
commit f74f207581

@@ -91,7 +91,7 @@ We also see that the `reset` section is `16` bytes and is compiled to start from
 
 ```
 SECTIONS {
-/* Trigger an error if I have an unuseable start address */
+/* Trigger an error if I have an unusable start address */
 _bogus = ASSERT(_start16bit >= 0xffff0000, "_start16bit too low. Please report.");
 _ROMTOP = 0xfffffff0;
 . = _ROMTOP;
@@ -212,7 +212,7 @@ As the buffer for new page tables is initialized, we may return to the `choose_r
 Avoiding Reserved Memory Ranges
 --------------------------------------------------------------------------------
 
-After the stuff related to identity page tables is initilized, we can choose a random memory location to extract the kernel image to. But as you may have guessed, we can't just choose any address. There are certain reseved memory regions which are occupied by important things like the [initrd](https://en.wikipedia.org/wiki/Initial_ramdisk) and the kernel command line which must be avoided. The `mem_avoid_init` function will help us do this:
+After the stuff related to identity page tables is initilized, we can choose a random memory location to extract the kernel image to. But as you may have guessed, we can't just choose any address. There are certain reserved memory regions which are occupied by important things like the [initrd](https://en.wikipedia.org/wiki/Initial_ramdisk) and the kernel command line which must be avoided. The `mem_avoid_init` function will help us do this:
 
 ```C
 mem_avoid_init(input, input_size, *output);
@@ -245,7 +245,7 @@ enum mem_avoid_index {
 
 Both are defined in the [arch/x86/boot/compressed/kaslr.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/compressed/kaslr.c) source code file.
 
-Let's look at the implementation of the `mem_avoid_init` function. The main goal of this function is to store information about reseved memory regions with descriptions given by the `mem_avoid_index` enum in the `mem_avoid` array and to create new pages for such regions in our new identity mapped buffer. The `mem_avoid_index` function does the same thing for all elements in the `mem_avoid_index`enum, so let's look at a typical example of the process:
+Let's look at the implementation of the `mem_avoid_init` function. The main goal of this function is to store information about reserved memory regions with descriptions given by the `mem_avoid_index` enum in the `mem_avoid` array and to create new pages for such regions in our new identity mapped buffer. The `mem_avoid_index` function does the same thing for all elements in the `mem_avoid_index`enum, so let's look at a typical example of the process:
 
 ```C
 mem_avoid[MEM_AVOID_ZO_RANGE].start = input;
@@ -377,7 +377,7 @@ if (*output != random_addr) {
 }
 ```
 
-From now on, `output` will store the base address of the memory region where kernel will be decompressed. Currrently, we have only randomized the physical address. We can randomize the virtual address as well on the [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture:
+From now on, `output` will store the base address of the memory region where kernel will be decompressed. Currently, we have only randomized the physical address. We can randomize the virtual address as well on the [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture:
 
 ```C
 if (IS_ENABLED(CONFIG_X86_64))
@@ -386,7 +386,7 @@ otherwise we call `early_fixup_exception` function by passing kernel stack point
 call early_fixup_exception
 ```
 
-We'll see the implementaion of the `early_fixup_exception` function later.
+We'll see the implementation of the `early_fixup_exception` function later.
 
 ```assembly
 20:
@@ -403,7 +403,7 @@ $ cat /proc/sys/kernel/sched_rt_runtime_us
 950000
 ```
 
-The values related to a group can be configured in `<cgroup>/cpu.rt_period_us` and `<cgroup>/cpu.rt_runtime_us`. Due no one filesystem is not mounted yet, the `def_rt_bandwidth` and the `def_dl_bandwidth` will be initialzed with default values which will be retuned by the `global_rt_period` and `global_rt_runtime` functions.
+The values related to a group can be configured in `<cgroup>/cpu.rt_period_us` and `<cgroup>/cpu.rt_runtime_us`. Due no one filesystem is not mounted yet, the `def_rt_bandwidth` and the `def_dl_bandwidth` will be initialized with default values which will be retuned by the `global_rt_period` and `global_rt_runtime` functions.
 
 That's all with the bandwiths of `real-time` and `deadline` tasks and in the next step, depends on enable of [SMP](http://en.wikipedia.org/wiki/Symmetric_multiprocessing), we make initialization of the `root domain`:
 
@@ -501,7 +501,7 @@ struct task_struct {
 
 The first one is `dynamic priority` which can't be changed during lifetime of a process based on its static priority and interactivity of the process. The `static_prio` contains initial priority most likely well-known to you `nice value`. This value does not changed by the kernel if a user will not change it. The last one is `normal_priority` based on the value of the `static_prio` too, but also it depends on the scheduling policy of a process.
 
-So the main goal of the `set_load_weight` function is to initialze `load_weight` fields for the `init` task:
+So the main goal of the `set_load_weight` function is to initialize `load_weight` fields for the `init` task:
 
 ```C
 static void set_load_weight(struct task_struct *p)
@@ -12,7 +12,7 @@ what happens when an interrupt is triggered and etc. The source code of this dri
 Initialization of a kernel module
 --------------------------------------------------------------------------------
 
-We will start to consider this driver as we usually did it with all new concepts that we saw in this book. We will start to consider it from the intialization. As you already may know, the Linux kernel provides two macros for initialization and finalization of a driver or a kernel module:
+We will start to consider this driver as we usually did it with all new concepts that we saw in this book. We will start to consider it from the initialization. As you already may know, the Linux kernel provides two macros for initialization and finalization of a driver or a kernel module:
 
 * `module_init`;
 * `module_exit`.
@@ -194,7 +194,7 @@ In our case we pass `0`, so it will be `IRQF_TRIGGER_NONE`. This flag means that
 static const char serial21285_name[] = "Footbridge UART";
 ```
 
-and will be displayed in the output of the `/proc/interrupts`. And in the last parameter we pass the pointer to the our main `uart_port` structure. Now we know a little about `request_irq` function and its parameters, let's look at its implemenetation. As we can see above, the `request_irq` function just makes a call of the `request_threaded_irq` function inside. The `request_threaded_irq` function defined in the [kernel/irq/manage.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/irq/manage.c) source code file and allocates a given interrupt line. If we will look at this function, it starts from the definition of the `irqaction` and the `irq_desc`:
+and will be displayed in the output of the `/proc/interrupts`. And in the last parameter we pass the pointer to the our main `uart_port` structure. Now we know a little about `request_irq` function and its parameters, let's look at its implementation. As we can see above, the `request_irq` function just makes a call of the `request_threaded_irq` function inside. The `request_threaded_irq` function defined in the [kernel/irq/manage.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/irq/manage.c) source code file and allocates a given interrupt line. If we will look at this function, it starts from the definition of the `irqaction` and the `irq_desc`:
 
 ```C
 int request_threaded_irq(unsigned int irq, irq_handler_t handler,
@@ -296,7 +296,7 @@ if (new->thread_fn && !nested) {
 }
 ```
 
-And fill the rest of the given interrupt descriptor fields in the end. So, our `16` and `17` interrupt request lines are registered and the `serial21285_rx_chars` and `serial21285_tx_chars` functions will be invoked when an interrupt controller will get event releated to these interrupts. Now let's look at what happens when an interrupt occurs.
+And fill the rest of the given interrupt descriptor fields in the end. So, our `16` and `17` interrupt request lines are registered and the `serial21285_rx_chars` and `serial21285_tx_chars` functions will be invoked when an interrupt controller will get event related to these interrupts. Now let's look at what happens when an interrupt occurs.
 
 Prepare to handle an interrupt
 --------------------------------------------------------------------------------
@@ -283,7 +283,7 @@ After we allocated space for general purpose registers, we do some checks to und
 
 Let's consider all of these there cases in course.
 
-An exception occured in userspace
+An exception occurred in userspace
 --------------------------------------------------------------------------------
 
 In the first let's consider a case when an exception has `paranoid=1` like our `debug` and `int3` exceptions. In this case we check selector from `CS` segment register and jump at `1f` label if we came from userspace or the `paranoid_entry` will be called in other way.
@@ -477,7 +477,7 @@ In the end of this second way we just call secondary exception handler as we did
 call \do_sym
 ```
 
-The last method is similar to previous both, but an exception occured with `paranoid=0` and we may use fast method determination of where we are from.
+The last method is similar to previous both, but an exception occurred with `paranoid=0` and we may use fast method determination of where we are from.
 
 Exit from an exception handler
 --------------------------------------------------------------------------------
@@ -446,7 +446,7 @@ That's all.
 Conclusion
 --------------------------------------------------------------------------------
 
-It is the end of the sixth part of the [Interrupts and Interrupt Handling](https://0xax.gitbook.io/linux-insides/summary/interrupts) chapter and we saw implementation of some exception handlers in this part, like `non-maskable` interrupt, [SIMD](https://en.wikipedia.org/wiki/SIMD) and [x87 FPU](https://en.wikipedia.org/wiki/X87) floating point exception. Finally we have finsihed with the `trap_init` function in this part and will go ahead in the next part. The next our point is the external interrupts and the `early_irq_init` function from the [init/main.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/init/main.c).
+It is the end of the sixth part of the [Interrupts and Interrupt Handling](https://0xax.gitbook.io/linux-insides/summary/interrupts) chapter and we saw implementation of some exception handlers in this part, like `non-maskable` interrupt, [SIMD](https://en.wikipedia.org/wiki/SIMD) and [x87 FPU](https://en.wikipedia.org/wiki/X87) floating point exception. Finally we have finished with the `trap_init` function in this part and will go ahead in the next part. The next our point is the external interrupts and the `early_irq_init` function from the [init/main.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/init/main.c).
 
 If you have any questions or suggestions write me a comment or ping me at [twitter](https://twitter.com/0xAX).
 
@@ -472,7 +472,7 @@ Links
 * [breakpoint](https://en.wikipedia.org/wiki/Breakpoint)
 * [Global Descriptor Table](https://en.wikipedia.org/wiki/Global_Descriptor_Table)
 * [stack frame](https://en.wikipedia.org/wiki/Call_stack)
-* [Model Specific regiser](https://en.wikipedia.org/wiki/Model-specific_register)
+* [Model Specific register](https://en.wikipedia.org/wiki/Model-specific_register)
 * [percpu](https://0xax.gitbook.io/linux-insides/summary/concepts/linux-cpu-1)
 * [RCU](https://en.wikipedia.org/wiki/Read-copy-update)
 * [MPX](https://en.wikipedia.org/wiki/Intel_MPX)
@@ -314,7 +314,7 @@ if (kmemcheck_fault(regs, address, error_code))
 return;
 ```
 
-First of all the `kmemcheck_fault` function checks that the fault was occured by the correct reason. At first we check the [flags register](https://en.wikipedia.org/wiki/FLAGS_register) and check that we are in normal kernel mode:
+First of all the `kmemcheck_fault` function checks that the fault was occurred by the correct reason. At first we check the [flags register](https://en.wikipedia.org/wiki/FLAGS_register) and check that we are in normal kernel mode:
 
 ```C
 if (regs->flags & X86_VM_MASK)
@@ -135,11 +135,11 @@ SYSCALL_DEFINE3(execve,
 }
 ```
 
-It takes an executable file name, set of command line arguments, and set of enviroment variables. As you may guess, everything is done by the `do_execve` function. I will not describe the implementation of the `do_execve` function in detail because you can read about this in [here](https://0xax.gitbook.io/linux-insides/summary/syscall/linux-syscall-4). But in short words, the `do_execve` function does many checks like `filename` is valid, limit of launched processes is not exceed in our system and etc. After all of these checks, this function parses our executable file which is represented in [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) format, creates memory descriptor for newly executed executable file and fills it with the appropriate values like area for the stack, heap and etc. When the setup of new binary image is done, the `start_thread` function will set up one new process. This function is architecture-specific and for the [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture, its definition will be located in the [arch/x86/kernel/process_64.c](https://github.com/torvalds/linux/blob/08e4e0d0456d0ca8427b2d1ddffa30f1c3e774d7/arch/x86/kernel/process_64.c#L239) source code file.
+It takes an executable file name, set of command line arguments, and set of environment variables. As you may guess, everything is done by the `do_execve` function. I will not describe the implementation of the `do_execve` function in detail because you can read about this in [here](https://0xax.gitbook.io/linux-insides/summary/syscall/linux-syscall-4). But in short words, the `do_execve` function does many checks like `filename` is valid, limit of launched processes is not exceed in our system and etc. After all of these checks, this function parses our executable file which is represented in [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) format, creates memory descriptor for newly executed executable file and fills it with the appropriate values like area for the stack, heap and etc. When the setup of new binary image is done, the `start_thread` function will set up one new process. This function is architecture-specific and for the [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture, its definition will be located in the [arch/x86/kernel/process_64.c](https://github.com/torvalds/linux/blob/08e4e0d0456d0ca8427b2d1ddffa30f1c3e774d7/arch/x86/kernel/process_64.c#L239) source code file.
 
 The `start_thread` function sets new value to [segment registers](https://en.wikipedia.org/wiki/X86_memory_segmentation) and program execution address. From this point, our new process is ready to start. Once the [context switch](https://en.wikipedia.org/wiki/Context_switch) will be done, control will be returned to userspace with new values of registers and the new executable will be started to execute.
 
-That's all from the kernel side. The Linux kernel prepares the binary image for execution and its execution starts right after the context switch and returns controll to userspace when it is finished. But it does not answer our questions like where does `_start` come from and others. Let's try to answer these questions in the next paragraph.
+That's all from the kernel side. The Linux kernel prepares the binary image for execution and its execution starts right after the context switch and returns control to userspace when it is finished. But it does not answer our questions like where does `_start` come from and others. Let's try to answer these questions in the next paragraph.
 
 How does a program start in userspace
 --------------------------------------------------------------------------------
@@ -300,7 +300,7 @@ We can get all the arguments we need for `__libc_start_main` function from the s
 +-----------------+
 ```
 
-After we cleared `ebp` register and saved the address of the termination function in the `r9` register, we pop an element from the stack to the `rsi` register, so after this `rsp` will point to the `argv` array and `rsi` will contain count of command line arguemnts passed to the program:
+After we cleared `ebp` register and saved the address of the termination function in the `r9` register, we pop an element from the stack to the `rsi` register, so after this `rsp` will point to the `argv` array and `rsi` will contain count of command line arguments passed to the program:
 
 ```
 +-----------------+
@@ -196,7 +196,7 @@ if (vsyscall_nr < 0) {
 ...
 sigsegv:
 force_sig(SIGSEGV, current);
-reutrn true;
+return true;
 ```
 
 As it checked number of a virtual system call, it does some yet another checks like `access_ok` violations and execute system call function depends on the number of a virtual system call:
@@ -25,7 +25,7 @@ int main(int argc, char *argv) {
 perror("Opening of the file is failed\n");
 }
 else {
-printf("file sucessfully opened\n");
+printf("file successfully opened\n");
 }
 
 close(fd);
@@ -321,7 +321,7 @@ If you remember, we have started this part with the call of the `tick_init` func
 Initialization of dyntick related data structures
 --------------------------------------------------------------------------------
 
-We already saw some information about `dyntick` concept in this part and we know that this concept allows kernel to disable system timer interrupts in the `idle` state. The `tick_nohz_init` function makes initialization of the different data structures which are related to this concept. This function defined in the [kernel/time/tick-sched.c](https://github.com/torvalds/linux/blob/master/kernel/time/tick-sched.c) source code file and starts from the check of the value of the `tick_nohz_full_running` variable which represents state of the tick-less mode for the `idle` state and the state when system timer interrups are disabled during a processor has only one runnable task:
+We already saw some information about `dyntick` concept in this part and we know that this concept allows kernel to disable system timer interrupts in the `idle` state. The `tick_nohz_init` function makes initialization of the different data structures which are related to this concept. This function defined in the [kernel/time/tick-sched.c](https://github.com/torvalds/linux/blob/master/kernel/time/tick-sched.c) source code file and starts from the check of the value of the `tick_nohz_full_running` variable which represents state of the tick-less mode for the `idle` state and the state when system timer interrupts are disabled during a processor has only one runnable task:
 
 ```C
 if (!tick_nohz_full_running) {
@@ -133,3 +133,4 @@ Thank you to all contributors:
 * [Mingzhe Yang](https://github.com/Mutated1994)
 * [Yuxin Wu](https://github.com/chaffz)
 * [Biao Ding](https://github.com/SmallPond)
+* [Arfy slowy](https://github.com/slowy07)