mirror of
https://github.com/0xAX/linux-insides.git
synced 2024-11-15 12:38:54 +00:00
Merge pull request #94 from davydovanton/fix-typos
Fix typos in repository
commit 66e3dbdd73
@ -11,7 +11,7 @@ Protected mode
Before we can move to the native Intel64 [Long mode](http://en.wikipedia.org/wiki/Long_mode), the kernel must switch the CPU into protected mode. What is the protected mode? The Protected mode was first added to the x86 architecture in 1982 and was the main mode of Intel processors from [80286](http://en.wikipedia.org/wiki/Intel_80286) processor until Intel 64 and long mode. The Main reason to move away from the real mode that there is very limited access to the RAM. As you can remember from the previous part, there is only 2^20 bytes or 1 megabyte, sometimes even only 640 kilobytes.

Protected mode brought many changes, but the main is a different memory management.The 24-bit address bus was replaced with a 32-bit address bus. It allows to access to 4 gigabytes of physical adress space. Also [paging](http://en.wikipedia.org/wiki/Paging) support was added which we will see in the next parts.
Protected mode brought many changes, but the main is a different memory management.The 24-bit address bus was replaced with a 32-bit address bus. It allows to access to 4 gigabytes of physical address space. Also [paging](http://en.wikipedia.org/wiki/Paging) support was added which we will see in the next parts.

Memory management in the protected mode is divided into two, almost independent parts:
@ -317,7 +317,7 @@ struct mem_vector {
The next step after we collected all unsafe memory regions in the `mem_avoid` array will be search of the random address which does not overlap with the unsafe regions with the `find_random_addr` function.

First of all we can see allign of the output address in the `find_random_addr` function:
First of all we can see align of the output address in the `find_random_addr` function:

```C
minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN);
@ -399,7 +399,7 @@ static unsigned long slots[CONFIG_RANDOMIZE_BASE_MAX_OFFSET /
static unsigned long slot_max;
```

After `process_e820_entry` will be executed, we will have array of the addressess which are safe for the decompressed kernel. Next we call `slots_fetch_random` function for getting random item from this array:
After `process_e820_entry` will be executed, we will have array of the addresses which are safe for the decompressed kernel. Next we call `slots_fetch_random` function for getting random item from this array:

```C
if (slot_max == 0)
@ -418,7 +418,7 @@ After all these checks will see the familiar message:
Decompressing Linux...
```

and call `decompress` function which will decompress the kernel. `decompress` function depends on what decompression algorithm was choosen during kernel compilartion:
and call `decompress` function which will decompress the kernel. `decompress` function depends on what decompression algorithm was chosen during kernel compilartion:

```C
#ifdef CONFIG_KERNEL_GZIP
@ -25,6 +25,6 @@ If you want to contribute to [linux-insides](https://github.com/0xAX/linux-insid
**IMPORTANT**

Please, make the actual changes. While you made your changes, I can merge changes from somebody else and your changes can conflict with `master` branch content. Please rebase on master everytime before you're going to push your changes and check that your branch doesn't conflict with `master`.
Please, make the actual changes. While you made your changes, I can merge changes from somebody else and your changes can conflict with `master` branch content. Please rebase on master every time before you're going to push your changes and check that your branch doesn't conflict with `master`.

Thank you.
Thank you.
@ -19,7 +19,7 @@ set_cpu_present(cpu, true);
set_cpu_possible(cpu, true);
```

`set_cpu_possible` is a set of cpu ID's which can be plugged in anytime during the life of that system boot. `cpu_present` represents which CPUs are currently plugged in. `cpu_online` represents subset of the `cpu_present` and indicates CPUs which are available for scheduling. These masks depends on `CONFIG_HOTPLUG_CPU` configuration option and if this option is disabled `possible == present` and `active == online`. Implementation of the all of these functions are very similar. Every function checks the second paramter. If it is `true`, calls `cpumask_set_cpu` or `cpumask_clear_cpu` otherwise.
`set_cpu_possible` is a set of cpu ID's which can be plugged in anytime during the life of that system boot. `cpu_present` represents which CPUs are currently plugged in. `cpu_online` represents subset of the `cpu_present` and indicates CPUs which are available for scheduling. These masks depends on `CONFIG_HOTPLUG_CPU` configuration option and if this option is disabled `possible == present` and `active == online`. Implementation of the all of these functions are very similar. Every function checks the second parameter. If it is `true`, calls `cpumask_set_cpu` or `cpumask_clear_cpu` otherwise.

There are two ways for a `cpumask` creation. First is to use `cpumask_t`. It defined as:
@ -27,7 +27,7 @@ There are two ways for a `cpumask` creation. First is to use `cpumask_t`. It def
typedef struct cpumask { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t;
```

It wraps `cpumask` structure which contains one bitmak `bits` field. `DECLARE_BITMAP` macro gets two paramters:
It wraps `cpumask` structure which contains one bitmak `bits` field. `DECLARE_BITMAP` macro gets two parameters:

* bitmap name;
* number of bits.
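The `cpumask_t` definition in the hunk above can be tried out in user space. The following is only a minimal sketch: the macros are stand-ins written for this example that mirror the shape of the kernel ones, `NR_CPUS` is an assumed value and `cpumask_words` is a hypothetical helper that does not exist in the kernel.

```C
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* User-space stand-ins for the kernel macros; NR_CPUS is an assumed value. */
#define NR_CPUS 8
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(nr) (((nr) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Two parameters: the bitmap name and the number of bits it must hold. */
#define DECLARE_BITMAP(name, bits) unsigned long name[BITS_TO_LONGS(bits)]

typedef struct cpumask { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t;

/* Hypothetical helper: number of unsigned long words backing one cpumask_t. */
static size_t cpumask_words(void)
{
	/* The struct holds only the bitmap, so its size is a whole number of words. */
	assert(sizeof(cpumask_t) % sizeof(unsigned long) == 0);
	return sizeof(cpumask_t) / sizeof(unsigned long);
}
```

With the assumed `NR_CPUS` of 8 the whole mask fits into a single `unsigned long`, because the bit count is rounded up to whole words.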
@ -70,7 +70,7 @@ The second way to define cpumask is to use `DECLARE_BITMAP` macro directly and `
: (void *)sizeof(__check_is_bitmap(bitmap))))
```

We can see ternary operator operator here which is `true` everytime. `__check_is_bitmap` inline function defined as:
We can see ternary operator operator here which is `true` every time. `__check_is_bitmap` inline function defined as:

```C
static inline int __check_is_bitmap(const unsigned long *bitmap)
@ -79,7 +79,7 @@ static inline int __check_is_bitmap(const unsigned long *bitmap)
}
```

And returns `1` everytime. We need in it here only for one purpose: In compile time it checks that given `bitmap` is a bitmap, or with another words it checks that given `bitmap` has type - `unsigned long *`. So we just pass `cpu_possible_bits` to the `to_cpumask` macro for converting array of `unsigned long` to the `struct cpumask *`.
And returns `1` every time. We need in it here only for one purpose: In compile time it checks that given `bitmap` is a bitmap, or with another words it checks that given `bitmap` has type - `unsigned long *`. So we just pass `cpu_possible_bits` to the `to_cpumask` macro for converting array of `unsigned long` to the `struct cpumask *`.

cpumask API
--------------------------------------------------------------------------------
@ -103,7 +103,7 @@ void set_cpu_online(unsigned int cpu, bool online)
}
```

First of all it checks the second `state` paramter and calls `cpumask_set_cpu` or `cpumask_clear_cpu` depends on it. Here we can see casting to the `struct cpumask *` of the second paramter in the `cpumask_set_cpu`. In our case it is `cpu_online_bits` which is bitmap and defined as:
First of all it checks the second `state` parameter and calls `cpumask_set_cpu` or `cpumask_clear_cpu` depends on it. Here we can see casting to the `struct cpumask *` of the second parameter in the `cpumask_set_cpu`. In our case it is `cpu_online_bits` which is bitmap and defined as:

```C
static DECLARE_BITMAP(cpu_online_bits, CONFIG_NR_CPUS) __read_mostly;
@ -118,7 +118,7 @@ static inline void cpumask_set_cpu(unsigned int cpu, struct cpumask *dstp)
}
```

`set_bit` function takes two paramter too, and sets a given bit (first paramter) in the memory (second paramter or `cpu_online_bits` bitmap). We can see here that before `set_bit` will be called, its two paramter will be passed to the
`set_bit` function takes two parameter too, and sets a given bit (first parameter) in the memory (second parameter or `cpu_online_bits` bitmap). We can see here that before `set_bit` will be called, its two parameter will be passed to the

* cpumask_check;
* cpumask_bits.
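The call chain described in the hunk above (check the cpu number, take the bitmap out of the mask, set the bit) can be sketched in user space. This is a rough, non-atomic illustration and not the kernel code: all `sketch_*` names and `SKETCH_NR_CPUS` are assumptions made for this example, and a plain `|=` stands in for the locked `bts` instruction.

```C
#include <assert.h>
#include <limits.h>

#define SKETCH_NR_CPUS 8  /* assumed CPU limit for this sketch */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

struct cpumask { unsigned long bits[1]; };

/* Stand-in for cpumask_check(): validate the CPU number and hand it back. */
static unsigned int sketch_cpumask_check(unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	return cpu;
}

/* Stand-in for cpumask_bits(): expose the underlying bitmap words. */
static unsigned long *sketch_cpumask_bits(struct cpumask *maskp)
{
	return maskp->bits;
}

/* Non-atomic stand-in for set_bit(): bit number first, memory second. */
static void sketch_set_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

/* Both arguments pass through the helpers before the bit is set. */
static void sketch_cpumask_set_cpu(unsigned int cpu, struct cpumask *dstp)
{
	sketch_set_bit(sketch_cpumask_check(cpu), sketch_cpumask_bits(dstp));
}
```

Marking CPUs 0 and 3 in an empty mask this way leaves bits 0 and 3 set in its first word.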
@ -153,7 +153,7 @@ This function looks scarry, but it is not so hard as it seems. First of all it p
#define IS_IMMEDIATE(nr) (__builtin_constant_p(nr))
```

`__builtin_constant_p` checks that given paramter is known constant at compile-time. As our `cpu` is not compile-time constant, `else` clause will be executed:
`__builtin_constant_p` checks that given parameter is known constant at compile-time. As our `cpu` is not compile-time constant, `else` clause will be executed:

```C
asm volatile(LOCK_PREFIX "bts %1,%0" : BITOP_ADDR(addr) : "Ir" (nr) : "memory");
@ -163,7 +163,7 @@ Let's try to understand how it works step by step:
`LOCK_PREFIX` is a x86 `lock` instruction. This instruction tells to the cpu to occupy the system bus while instruction will be executed. This allows to synchronize memory access, preventing simultaneous access of multiple processors (or devices - DMA controller for example) to one memory cell.

`BITOP_ADDR` casts given paramter to the `(*(volatile long *)` and adds `+m` constraints. `+` means that this operand is bot read and written by the instruction. `m` shows that this is memory operand. `BITOP_ADDR` is defined as:
`BITOP_ADDR` casts given parameter to the `(*(volatile long *)` and adds `+m` constraints. `+` means that this operand is bot read and written by the instruction. `m` shows that this is memory operand. `BITOP_ADDR` is defined as:

```C
#define BITOP_ADDR(x) "+m" (*(volatile long *) (x))
@ -187,7 +187,7 @@ cpumask provides the set of macro for getting amount of the CPUs with different
#define num_online_cpus() cpumask_weight(cpu_online_mask)
```

This macro returns amount of the `online` CPUs. It calls `cpumask_weight` function with the `cpu_online_mask` bitmap (read about about it). `cpumask_wieght` function makes an one call of the `bitmap_wiegt` function with two paramters:
This macro returns amount of the `online` CPUs. It calls `cpumask_weight` function with the `cpu_online_mask` bitmap (read about about it). `cpumask_wieght` function makes an one call of the `bitmap_wiegt` function with two parameters:

* cpumask bitmap;
* `nr_cpumask_bits` - which is `NR_CPUS` in our case.
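Counting the online CPUs comes down to counting the set bits in the first `nbits` positions of a bitmap. The loop below is a deliberately naive user-space sketch of that idea; `sketch_bitmap_weight` is a hypothetical name, and the real kernel function is expected to be considerably more optimized.

```C
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Count the set bits among the first nbits bits of the bitmap. */
static unsigned int sketch_bitmap_weight(const unsigned long *bitmap,
					 unsigned int nbits)
{
	unsigned int count = 0;

	assert(bitmap != NULL);
	for (unsigned int i = 0; i < nbits; i++) {
		/* Pick the word that holds bit i, then test that bit. */
		if (bitmap[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))
			count++;
	}
	return count;
}
```

The second parameter matters: bits beyond `nbits` are ignored, which is why the mask is always passed together with the number of valid bits.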
@ -14,13 +14,13 @@ Kernel provides API for creating per-cpu variables - `DEFINE_PER_CPU` macro:
This macro defined in the [include/linux/percpu-defs.h](https://github.com/torvalds/linux/blob/master/include/linux/percpu-defs.h) as many other macros for work with per-cpu variables. Now we will see how this feature implemented.

Take a look on `DECLARE_PER_CPU` definition. We see that it takes 2 paramters: `type` and `name`. So we can use it for creation per-cpu variable, for example like this:
Take a look on `DECLARE_PER_CPU` definition. We see that it takes 2 parameters: `type` and `name`. So we can use it for creation per-cpu variable, for example like this:

```C
DEFINE_PER_CPU(int, per_cpu_n)
```

We pass type of our variable and name. `DEFI_PER_CPU` calls `DEFINE_PER_CPU_SECTION` macro and passes the same two paramaters and empty string to it. Let's look on the defintion of the `DEFINE_PER_CPU_SECTION`:
We pass type of our variable and name. `DEFI_PER_CPU` calls `DEFINE_PER_CPU_SECTION` macro and passes the same two paramaters and empty string to it. Let's look on the definition of the `DEFINE_PER_CPU_SECTION`:

```C
#define DEFINE_PER_CPU_SECTION(type, name, sec) \
@ -83,7 +83,7 @@ and
#define raw_cpu_ptr(ptr) per_cpu_ptr(ptr, 0)
```

where `per_cpu_ptr` returns a pointer to the per-cpu variable for the given cpu (second paramter). After that we got per-cpu variables and made any manipulations on it, we must call `put_cpu_var` macro which enables preemption with call of `preempt_enable` function. So the typical usage of a per-cpu variable is following:
where `per_cpu_ptr` returns a pointer to the per-cpu variable for the given cpu (second parameter). After that we got per-cpu variables and made any manipulations on it, we must call `put_cpu_var` macro which enables preemption with call of `preempt_enable` function. So the typical usage of a per-cpu variable is following:

```C
get_cpu_var(var);
@ -115,7 +115,7 @@ do {
which makes given `ptr` type of `const void __percpu *`,

After this we can see the call of the `SHIFT_PERCPU_PTR` macro with two paramters. At first paramter we pass our ptr and sencond we pass cpu number to the `per_cpu_offset` macro which:
After this we can see the call of the `SHIFT_PERCPU_PTR` macro with two parameters. At first parameter we pass our ptr and sencond we pass cpu number to the `per_cpu_offset` macro which:

```C
#define per_cpu_offset(x) (__per_cpu_offset[x])
@ -141,7 +141,7 @@ That's all! Of course it is not full API, but the general part. It can be hard f
Let's again look on the algorithm of getting pointer on per-cpu variable:

* Kernel creates multiply `.data..percpu` sections (ones perc-pu) during intialization process;
* Kernel creates multiply `.data..percpu` sections (ones perc-pu) during initialization process;
* All variables created with the `DEFINE_PER_CPU` macro will be reloacated to the first section or for CPU0;
* `__per_cpu_offset` array filled with the distance (`BOOT_PERCPU_OFFSET`) between `.data..percpu` sections;
* When `per_cpu_ptr` called for example for getting pointer on the certain per-cpu variable for the third CPU, `__per_cpu_offset` array will be accessed, where every index points to the certain CPU.
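The algorithm in the list above can be modelled in user space with one flat array that plays the role of the per-cpu sections. This is only a sketch under loud assumptions: the area size, the CPU count and every `sketch_*` name are invented for the example, and the real kernel sets up its offsets in a different way.

```C
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4     /* assumed CPU count */
#define SKETCH_AREA_INTS 16  /* assumed size of one per-cpu area, in ints */

/* All per-cpu areas back to back; the first one plays the role of CPU0's section. */
static int sketch_percpu_areas[SKETCH_NR_CPUS * SKETCH_AREA_INTS];

/* Byte distance of each CPU's area from the first one, like `__per_cpu_offset`. */
static size_t sketch_per_cpu_offset[SKETCH_NR_CPUS];

/* Fill the offset table: every index corresponds to one CPU. */
static void sketch_setup_per_cpu_areas(void)
{
	for (size_t cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sketch_per_cpu_offset[cpu] = cpu * SKETCH_AREA_INTS * sizeof(int);
}

/* Shift a pointer into the first area by the offset of the requested CPU. */
static int *sketch_per_cpu_ptr(int *ptr, size_t cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	return (int *)((char *)ptr + sketch_per_cpu_offset[cpu]);
}
```

A variable that lives at the start of the first area is reached for CPU 2 by adding `sketch_per_cpu_offset[2]` to its address, so a write through that pointer does not touch the CPU0 copy.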
@ -133,7 +133,7 @@ static inline void list_add(struct list_head *new, struct list_head *head)
}
```

It just calls internal function `__list_add` with the 3 given paramters:
It just calls internal function `__list_add` with the 3 given parameters:

* new - new entry;
* head - list head after which will be inserted new item;
@ -230,7 +230,7 @@ The next offsetof macro calculates offset from the beginning of the structure to
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
```

Let's summarize all about `container_of` macro. `container_of` macro returns address of the structure by the given address of the structure's field with `list_head` type, the name of the structure field with `list_head` type and type of the container structure. At the first line this macro declares the `__mptr` pointer which points to the field of the structure that `ptr` points to and assigns it to the `ptr`. Now `ptr` and `__mptr` points have the same address. Techincally we no need in this line, it needs for the type checking. First line ensures that that given structure (`type` paramter) has a member which called `member`. In the second line it calculates offset of the field from the structure with the `offsetof` macro and substracts it from the structure address. That's all.
Let's summarize all about `container_of` macro. `container_of` macro returns address of the structure by the given address of the structure's field with `list_head` type, the name of the structure field with `list_head` type and type of the container structure. At the first line this macro declares the `__mptr` pointer which points to the field of the structure that `ptr` points to and assigns it to the `ptr`. Now `ptr` and `__mptr` points have the same address. Techincally we no need in this line, it needs for the type checking. First line ensures that that given structure (`type` parameter) has a member which called `member`. In the second line it calculates offset of the field from the structure with the `offsetof` macro and subtracts it from the structure address. That's all.

Of course `list_add` and `list_entry` is not only functions which provides `<linux/list.h>`. Implementation of the doubly linked list provides the following API:
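The `container_of` summary above can be checked with a few lines of user-space C. This is a simplified sketch: it uses the standard `offsetof` from `<stddef.h>` and leaves out the `__mptr` type-checking line, and `sketch_container_of`, `struct sketch_item` and `sketch_item_from_list` are hypothetical names introduced only for this example.

```C
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Simplified container_of: member address minus the member's offset. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical container structure with an embedded list_head. */
struct sketch_item {
	int value;
	struct list_head list;
};

/* Recover the enclosing sketch_item from a pointer to its list field. */
static struct sketch_item *sketch_item_from_list(struct list_head *entry)
{
	assert(entry != NULL);
	return sketch_container_of(entry, struct sketch_item, list);
}
```

Given only the address of the `list` field, the macro steps back by the offset of that field and lands on the start of the enclosing structure.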
@ -3,7 +3,7 @@
You will find here a couple of posts which describe the full cycle of kernel initialization from its first steps after the kernel has decompressed to the start of the first process run by the kernel itself.

* [First steps after kernel decompression](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-1.md) - describes first steps in the kernel.
* [Early interupt and exception handling](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-2.md) - describes early interrupts initialization and early page fault handler.
* [Early interrupt and exception handling](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-2.md) - describes early interrupts initialization and early page fault handler.
* [Last preparations before the kernel entry point](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-3.md) - describes the last preparations before the call of the `start_kernel`.
* [Kernel entry point](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-4.md) - describes first steps in the kernel generic code.
* [Continue of architecture-specific initializations](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-5.md) - describes architecture-specific initialization.
@ -387,7 +387,7 @@ As we got `init_per_cpu__gdt_page` in `INIT_PER_CPU_VAR` and `INIT_PER_CPU` macr
Generally per-CPU variables is a 2.6 kernel feature. You can understand what is it from it's name. When we create `per-CPU` variable, each CPU will have will have it's own copy of this variable. Here we creating `gdt_page` per-CPU variable. There are many advantages for variables of this type, like there are no locks, because each CPU works with it's own copy of variable and etc... So every core on multiprocessor will have it's own `GDT` table and every entry in the table will represent a memory segment which can be accessed from the thread which runned on the core. You can read in details about `per-CPU` variables in the [Theory/per-cpu](http://0xax.gitbooks.io/linux-insides/content/Theory/per-cpu.html) post.

As we loaded new Global Descriptor Table, we reload segments as we did it everytime:
As we loaded new Global Descriptor Table, we reload segments as we did it every time:

```assembly
xorl %eax,%eax
@ -136,7 +136,7 @@ As we have `init_level4_pgt` filled with zeros, we set the last `init_level4_pgt
init_level4_pgt[511] = early_level4_pgt[511];
```

Remeber that we dropped all `early_level4_pgt` entries in the `reset_early_page_table` function and kept only kernel high mapping there.
Remember that we dropped all `early_level4_pgt` entries in the `reset_early_page_table` function and kept only kernel high mapping there.

The last step in the `x86_64_start_kernel` function is the call of the:
@ -250,7 +250,7 @@ lowmem = min(lowmem, LOWMEM_CAP);
memblock_reserve(lowmem, 0x100000 - lowmem);
```

`memblock_reserve` function is defined at [mm/block.c](https://github.com/torvalds/linux/blob/master/mm/block.c) and takes two paramters:
`memblock_reserve` function is defined at [mm/block.c](https://github.com/torvalds/linux/blob/master/mm/block.c) and takes two parameters:

* base physical address;
* region size.
@ -266,7 +266,7 @@ In the previous paragraph we stopped at the call of the `memblock_reserve` funct
memblock_reserve_region(base, size, MAX_NUMNODES, 0);
```

function and passes 4 paramters there:
function and passes 4 parameters there:

* physical base address of the memory region;
* size of the memory region;
@ -382,7 +382,7 @@ NUMA node id depends on `MAX_NUMNODES` macro which is defined in the [include/li
#define MAX_NUMNODES (1 << NODES_SHIFT)
```

where `NODES_SHIFT` depends on `CONFIG_NODES_SHIFT` configuration paramter and defined as:
where `NODES_SHIFT` depends on `CONFIG_NODES_SHIFT` configuration parameter and defined as:

```C
#ifdef CONFIG_NODES_SHIFT
@ -235,13 +235,13 @@ Remember that we have passed `cpu_number` as `pcp` to the `this_cpu_read` from t
})
```

Yes, it look a little strange, but it's easy. First of all we can see defintion of the `pscr_ret__` variable with the `int` type. Why int? Ok, `variable` is `common_cpu` and it was declared as per-cpu int variable:
Yes, it look a little strange, but it's easy. First of all we can see definition of the `pscr_ret__` variable with the `int` type. Why int? Ok, `variable` is `common_cpu` and it was declared as per-cpu int variable:

```C
DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number);
```

In the next step we call `__verify_pcpu_ptr` with the address of `cpu_number`. `__veryf_pcpu_ptr` used to verifying that given paramter is an per-cpu pointer. After that we set `pscr_ret__` value which depends on the size of the variable. Our `common_cpu` variable is `int`, so it 4 bytes size. It means that we will get `this_cpu_read_4(common_cpu)` in `pscr_ret__`. In the end of the `__pcpu_size_call_return` we just call it. `this_cpu_read_4` is a macro:
In the next step we call `__verify_pcpu_ptr` with the address of `cpu_number`. `__veryf_pcpu_ptr` used to verifying that given parameter is an per-cpu pointer. After that we set `pscr_ret__` value which depends on the size of the variable. Our `common_cpu` variable is `int`, so it 4 bytes size. It means that we will get `this_cpu_read_4(common_cpu)` in `pscr_ret__`. In the end of the `__pcpu_size_call_return` we just call it. `this_cpu_read_4` is a macro:

```C
#define this_cpu_read_4(pcp) percpu_from_op("mov", pcp)
@ -276,9 +276,9 @@ set_cpu_present(cpu, true);
set_cpu_possible(cpu, true);
```

All of these functions use the concept - `cpumask`. `cpu_possible` is a set of cpu ID's which can be plugged in anytime during the life of that system boot. `cpu_present` represents which CPUs are currently plugged in. `cpu_online` represents subset of the `cpu_present` and indicates CPUs which are available for scheduling. These masks depends on `CONFIG_HOTPLUG_CPU` configuration option and if this option is disabled `possible == present` and `active == online`. Implementation of the all of these functions are very similar. Every function checks the second paramter. If it is `true`, calls `cpumask_set_cpu` or `cpumask_clear_cpu` otherwise.
All of these functions use the concept - `cpumask`. `cpu_possible` is a set of cpu ID's which can be plugged in anytime during the life of that system boot. `cpu_present` represents which CPUs are currently plugged in. `cpu_online` represents subset of the `cpu_present` and indicates CPUs which are available for scheduling. These masks depends on `CONFIG_HOTPLUG_CPU` configuration option and if this option is disabled `possible == present` and `active == online`. Implementation of the all of these functions are very similar. Every function checks the second parameter. If it is `true`, calls `cpumask_set_cpu` or `cpumask_clear_cpu` otherwise.

For example let's look on `set_cpu_possible`. As we passed `true` as the second paramter, the:
For example let's look on `set_cpu_possible`. As we passed `true` as the second parameter, the:

```C
cpumask_set_cpu(cpu, to_cpumask(cpu_possible_bits));
@ -304,7 +304,7 @@ As we can see from its definition, `DECLARE_BITMAP` macro expands to the array o
: (void *)sizeof(__check_is_bitmap(bitmap))))
```

I don't know how about you, but it looked really weird for me at the first time. We can see ternary operator operator here which is `true` everytime, but why the `__check_is_bitmap` here? It's simple, let's look on it:
I don't know how about you, but it looked really weird for me at the first time. We can see ternary operator operator here which is `true` every time, but why the `__check_is_bitmap` here? It's simple, let's look on it:

```C
static inline int __check_is_bitmap(const unsigned long *bitmap)
@ -313,7 +313,7 @@ static inline int __check_is_bitmap(const unsigned long *bitmap)
}
```

Yeah, it just returns `1` everytime. Actually we need in it here only for one purpose: In compile time it checks that given `bitmap` is a bitmap, or with another words it checks that given `bitmap` has type - `unsigned long *`. So we just pass `cpu_possible_bits` to the `to_cpumask` macro for converting array of `unsigned long` to the `struct cpumask *`. Now we can call `cpumask_set_cpu` function with the `cpu` - 0 and `struct cpumask *cpu_possible_bits`. This function makes only one call of the `set_bit` function which sets the given `cpu` in the cpumask. All of these `set_cpu_*` functions work on the same principle.
Yeah, it just returns `1` every time. Actually we need in it here only for one purpose: In compile time it checks that given `bitmap` is a bitmap, or with another words it checks that given `bitmap` has type - `unsigned long *`. So we just pass `cpu_possible_bits` to the `to_cpumask` macro for converting array of `unsigned long` to the `struct cpumask *`. Now we can call `cpumask_set_cpu` function with the `cpu` - 0 and `struct cpumask *cpu_possible_bits`. This function makes only one call of the `set_bit` function which sets the given `cpu` in the cpumask. All of these `set_cpu_*` functions work on the same principle.

If you're not sure that this `set_cpu_*` operations and `cpumask` are not clear for you, don't worry about it. You can get more info by reading of the special part about it - [cpumask](http://0xax.gitbooks.io/linux-insides/content/Concepts/cpumask.html) or [documentation](https://www.kernel.org/doc/Documentation/cpu-hotplug.txt).
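The compile-time check described in the hunk above is easier to see in isolation. Below is a user-space sketch that follows the shape of the macro quoted in the diff; the `sketch_*` names, the one-word `struct cpumask` and `SKETCH_NR_CPUS` are assumptions made for this example.

```C
#include <assert.h>

#define SKETCH_NR_CPUS 8  /* assumed CPU limit for this sketch */

struct cpumask { unsigned long bits[1]; };

/* Always returns 1; it exists only so the compiler checks the argument type. */
static inline int sketch_check_is_bitmap(const unsigned long *bitmap)
{
	(void)bitmap;
	return 1;
}

/* The condition is constant, so the sizeof branch is never taken or evaluated. */
#define sketch_to_cpumask(bitmap) \
	((struct cpumask *)(1 ? (bitmap) \
			      : (void *)sizeof(sketch_check_is_bitmap(bitmap))))

static unsigned long sketch_cpu_possible_bits[1];

/* Convert the raw bitmap to a cpumask pointer and mark one CPU in it. */
static void sketch_set_cpu_possible(unsigned int cpu)
{
	struct cpumask *mask = sketch_to_cpumask(sketch_cpu_possible_bits);

	assert(cpu < SKETCH_NR_CPUS);
	mask->bits[0] |= 1UL << cpu;
}
```

Passing something that is not an `unsigned long` array or pointer, for example an `int` array, to `sketch_to_cpumask` should draw an incompatible pointer type diagnostic from the compiler, even though the call inside `sizeof` is never executed.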
@ -335,7 +335,7 @@ as you can see it just expands to the `printk` call. For this moment we use `pr_
pr_notice("%s", linux_banner);
```

which is just kernel version with some additional paramters:
which is just kernel version with some additional parameters:

```
Linux version 4.0.0-rc6+ (alex@localhost) (gcc version 4.9.1 (Ubuntu 4.9.1-16ubuntu6) ) #319 SMP
@ -352,7 +352,7 @@ This function starts from the reserving memory block for the kernel `_text` and
memblock_reserve(__pa_symbol(_text), (unsigned long)__bss_stop - (unsigned long)_text);
```

You can read about `memblock` in the [Linux kernel memory management Part 1.](http://0xax.gitbooks.io/linux-insides/content/mm/linux-mm-1.html). As you can remember `memblock_reserve` function takes two paramters:
You can read about `memblock` in the [Linux kernel memory management Part 1.](http://0xax.gitbooks.io/linux-insides/content/mm/linux-mm-1.html). As you can remember `memblock_reserve` function takes two parameters:

* base physical address of a memory block;
* size of a memor block.
@ -364,7 +364,7 @@ Base physical address of the `_text` symbol we will get with the `__pa_symbol` m
__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
```

First of all it calls `__phys_reloc_hide` macro on the given paramter. `__phys_reloc_hide` macro does nothing for `x86_64` and just returns the given paramter. Implementation of the `__phys_addr_symbol` macro is easy. It just substracts the symbol address from the base addres of the kernel text mapping base virtual address (you can remember that it is `__START_KERNEL_map`) and adds `phys_base` which is base address of the `_text`:
First of all it calls `__phys_reloc_hide` macro on the given parameter. `__phys_reloc_hide` macro does nothing for `x86_64` and just returns the given parameter. Implementation of the `__phys_addr_symbol` macro is easy. It just subtracts the symbol address from the base address of the kernel text mapping base virtual address (you can remember that it is `__START_KERNEL_map`) and adds `phys_base` which is base address of the `_text`:

```C
#define __phys_addr_symbol(x) \
@ -376,7 +376,7 @@ After we got physical address of the `_text` symbol, `memblock_reserve` can rese
Reserve memory for initrd
---------------------------------------------------------------------------------

In the next step after we reserved place for the kernel text and data is resering place for the [initrd](http://en.wikipedia.org/wiki/Initrd). We will not see details about `initrd` in this post, you just may know that it is temprary root file system stored in memory and used by the kernel during its startup. `early_reserve_initrd` function does all work. First of all this function get the base address of the ram disk, its size and the end address with:
In the next step after we reserved place for the kernel text and data is resering place for the [initrd](http://en.wikipedia.org/wiki/Initrd). We will not see details about `initrd` in this post, you just may know that it is temporary root file system stored in memory and used by the kernel during its startup. `early_reserve_initrd` function does all work. First of all this function get the base address of the ram disk, its size and the end address with:

```C
u64 ramdisk_image = get_ramdisk_image();
@ -384,7 +384,7 @@ u64 ramdisk_size = get_ramdisk_size();
u64 ramdisk_end = PAGE_ALIGN(ramdisk_image + ramdisk_size);
```

All of these paramters it takes from the `boot_params`. If you have read chapter abot [Linux Kernel Booting Process](http://0xax.gitbooks.io/linux-insides/content/Booting/index.html), you must remember that we filled `boot_params` structure during boot time. Kerne setup header contains a couple of fields which describes ramdisk, for example:
All of these parameters it takes from the `boot_params`. If you have read chapter abot [Linux Kernel Booting Process](http://0xax.gitbooks.io/linux-insides/content/Booting/index.html), you must remember that we filled `boot_params` structure during boot time. Kerne setup header contains a couple of fields which describes ramdisk, for example:

```
Field name: ramdisk_image
@ -52,7 +52,7 @@ As you can read above, we passed address of the `#DB` handler as `&debug` in the
asmlinkage void debug(void);
```

We can see `asmlinkage` attribute which tells to us that `debug` is function written with [assembly](http://en.wikipedia.org/wiki/Assembly_language). Yeah, again and again assembly :). Implementation of the `#DB` handler as other handlers is in ths [arch/x86/kernel/entry_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/entry_64.S) and defined with the `idtentry` assembly macro:
We can see `asmlinkage` attribute which tells to us that `debug` is function written with [assembly](http://en.wikipedia.org/wiki/Assembly_language). Yeah, again and again assembly :). Implementation of the `#DB` handler as other handlers is in this [arch/x86/kernel/entry_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/entry_64.S) and defined with the `idtentry` assembly macro:

```assembly
idtentry debug do_debug has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK
@ -163,7 +163,7 @@ The next step is initialization of early `ioremap`. In general there are two way
|
||||
|
||||
We already saw first method (`outb/inb` instructions) in the part about linux kernel booting [process](http://0xax.gitbooks.io/linux-insides/content/Booting/linux-bootstrap-3.html). The second method is to map I/O physical addresses to virtual addresses. When a physical address is accessed by the CPU, it may refer to a portion of physical RAM which can be mapped on memory of the I/O device. So `ioremap` used to map device memory into kernel address space.
|
||||
|
||||
As i wrote above next function is the `early_ioremap_init` which re-maps I/O memory to kernel address space so it can access it. We need to initialize early ioremap for early intialization code which needs to temporarily map I/O or memory regions before the normal mapping functions like `ioremap` are available. Implementation of this function is in the [arch/x86/mm/ioremap.c](https://github.com/torvalds/linux/blob/master/arch/x86/mm/ioremap.c). At the start of the `early_ioremap_init` we can see definition of the `pmd` point with `pmd_t` type (which presents page middle directory entry `typedef struct { pmdval_t pmd; } pmd_t;` where `pmdval_t` is `unsigned long`) and make a check that `fixmap` aligned in a correct way:
|
||||
As i wrote above next function is the `early_ioremap_init` which re-maps I/O memory to kernel address space so it can access it. We need to initialize early ioremap for early initialization code which needs to temporarily map I/O or memory regions before the normal mapping functions like `ioremap` are available. Implementation of this function is in the [arch/x86/mm/ioremap.c](https://github.com/torvalds/linux/blob/master/arch/x86/mm/ioremap.c). At the start of the `early_ioremap_init` we can see definition of the `pmd` point with `pmd_t` type (which presents page middle directory entry `typedef struct { pmdval_t pmd; } pmd_t;` where `pmdval_t` is `unsigned long`) and make a check that `fixmap` aligned in a correct way:
|
||||
|
||||
```C
|
||||
pmd_t *pmd;
|
||||
@ -196,7 +196,7 @@ After early `ioremap` was initialized, you can see the following code:
|
||||
ROOT_DEV = old_decode_dev(boot_params.hdr.root_dev);
|
||||
```
|
||||
|
||||
This code obtains major and minor numbers for the root device where `initrd` will be mounted later in the `do_mount_root` function. Major number of the device identifies a driver associated with the device. Minor number reffered on the device controlled by driver. Note that `old_decode_dev` takes one parameter from the `boot_params_structure`. As we can read from the x86 linux kernel boot protocol:
|
||||
This code obtains major and minor numbers for the root device where `initrd` will be mounted later in the `do_mount_root` function. Major number of the device identifies a driver associated with the device. Minor number referred on the device controlled by driver. Note that `old_decode_dev` takes one parameter from the `boot_params_structure`. As we can read from the x86 linux kernel boot protocol:
|
||||
|
||||
```
|
||||
Field name: root_dev
|
||||
|
@ -61,7 +61,7 @@ first of all it check that given index of `fixed_addresses` enum is not greater
|
||||
#define __fix_to_virt(x) (FIXADDR_TOP - ((x) << PAGE_SHIFT))
|
||||
```
|
||||
|
||||
Here we shift left the given `fix-mapped` address index on the `PAGE_SHIFT` which determines size of a page as I wrote above and substract it from the `FIXADDR_TOP` which is the highest address of the `fix-mapped` area. There is inverse function for getting `fix-mapped` address from a virtual address:
|
||||
Here we shift left the given `fix-mapped` address index on the `PAGE_SHIFT` which determines size of a page as I wrote above and subtract it from the `FIXADDR_TOP` which is the highest address of the `fix-mapped` area. There is inverse function for getting `fix-mapped` address from a virtual address:
|
||||
|
||||
```C
|
||||
static inline unsigned long virt_to_fix(const unsigned long vaddr)
|
||||
@ -79,7 +79,7 @@ static inline unsigned long virt_to_fix(const unsigned long vaddr)
|
||||
|
||||
A PFN is simply in index within physical memory that is counted in page-sized units. PFN for a physical address could be trivially defined as (page_phys_addr >> PAGE_SHIFT);
|
||||
|
||||
`__virt_to_fix` clears first 12 bits in the given address, substracts it from the last address the of `fix-mapped` area (`FIXADDR_TOP`) and shifts right result on `PAGE_SHIFT` which is `12`. Let I explain how it works. As i already wrote we will crear first 12 bits in the given address with `x & PAGE_MASK`. As we substract this from the `FIXADDR_TOP`, we will get last 12 bits of the `FIXADDR_TOP` which are represent. We know that first 12 bits of the virtual address present offset in the page frame. With the shiting it on `PAGE_SHIFT` we will get `Page frame number` which is just all bits in a virtual address besides first 12 offset bits. `Fix-mapped` addresses are used in different [places](http://lxr.free-electrons.com/ident?i=fix_to_virt) of the linux kernel. `IDT` descriptor stored there, [Intel Trusted Execution Technology](http://en.wikipedia.org/wiki/Trusted_Execution_Technology) UUID stored in the `fix-mapped` area started from `FIX_TBOOT_BASE` index, [Xen](http://en.wikipedia.org/wiki/Xen) bootmap and many more... We already saw a little about `fix-mapped` addresses in the fifth [part](http://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-5.html) about linux kernel initialization. We used `fix-mapped` area in the early `ioremap` initialization. Let's look on it and try to understand what is it `ioremap`, how it implemented in the kernel and how it releated with the `fix-mapped` addresses.
|
||||
`__virt_to_fix` clears first 12 bits in the given address, subtracts it from the last address the of `fix-mapped` area (`FIXADDR_TOP`) and shifts right result on `PAGE_SHIFT` which is `12`. Let I explain how it works. As i already wrote we will crear first 12 bits in the given address with `x & PAGE_MASK`. As we subtract this from the `FIXADDR_TOP`, we will get last 12 bits of the `FIXADDR_TOP` which are represent. We know that first 12 bits of the virtual address present offset in the page frame. With the shiting it on `PAGE_SHIFT` we will get `Page frame number` which is just all bits in a virtual address besides first 12 offset bits. `Fix-mapped` addresses are used in different [places](http://lxr.free-electrons.com/ident?i=fix_to_virt) of the linux kernel. `IDT` descriptor stored there, [Intel Trusted Execution Technology](http://en.wikipedia.org/wiki/Trusted_Execution_Technology) UUID stored in the `fix-mapped` area started from `FIX_TBOOT_BASE` index, [Xen](http://en.wikipedia.org/wiki/Xen) bootmap and many more... We already saw a little about `fix-mapped` addresses in the fifth [part](http://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-5.html) about linux kernel initialization. We used `fix-mapped` area in the early `ioremap` initialization. Let's look on it and try to understand what is it `ioremap`, how it implemented in the kernel and how it releated with the `fix-mapped` addresses.
|
||||
|
||||
ioremap
|
||||
--------------------------------------------------------------------------------
|
||||
@ -164,7 +164,7 @@ struct resource iomem_resource = {
|
||||
.flags = IORESOURCE_MEM,
|
||||
};
|
||||
|
||||
As I wrote about `request_regions` is used for registering of I/O port region and this macro used in many [places](http://lxr.free-electrons.com/ident?i=request_region) in the kernel. For example let's look on the [drivers/char/rtc.c](https://github.com/torvalds/linux/blob/master/char/rtc.c). This source code file provides [Real Time Clock](http://en.wikipedia.org/wiki/Real-time_clock) interface in the linux kernel. As every kernel module, `rtc` module contains `module_init` defintion:
|
||||
As I wrote about `request_regions` is used for registering of I/O port region and this macro used in many [places](http://lxr.free-electrons.com/ident?i=request_region) in the kernel. For example let's look on the [drivers/char/rtc.c](https://github.com/torvalds/linux/blob/master/char/rtc.c). This source code file provides [Real Time Clock](http://en.wikipedia.org/wiki/Real-time_clock) interface in the linux kernel. As every kernel module, `rtc` module contains `module_init` definition:
|
||||
|
||||
```C
|
||||
module_init(rtc_init);
|
||||
@ -254,7 +254,7 @@ static inline const char *e820_type_to_string(int e820_type)
|
||||
|
||||
and we can see it in the `/proc/iomem` (read above).
|
||||
|
||||
Now let's try to understand how `ioremap` works. We already know little about `ioremap`, we saw it in the fifth [part](http://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-5.html) about linux kernel initalization. If you have read this part, you can remember call of the `early_ioremap_init` function from the [arch/x86/mm/ioremap.c](https://github.com/torvalds/linux/blob/master/arch/x86/mm/ioremap.c). Initialization of the `ioremap` splitten on two parts: there is early part which we can use before normal `ioremap` is available and normal `ioremap` which is available after `vmalloc` initialization and call of the `paging_init`. We do not know anything about `vmalloc` for now, so let's consider early initialization of the `ioremap`. First of all `early_ioremap_init` checks that `fixmap` is aligned on page middle directory boundary:
|
||||
Now let's try to understand how `ioremap` works. We already know little about `ioremap`, we saw it in the fifth [part](http://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-5.html) about linux kernel initialization. If you have read this part, you can remember call of the `early_ioremap_init` function from the [arch/x86/mm/ioremap.c](https://github.com/torvalds/linux/blob/master/arch/x86/mm/ioremap.c). Initialization of the `ioremap` splitten on two parts: there is early part which we can use before normal `ioremap` is available and normal `ioremap` which is available after `vmalloc` initialization and call of the `paging_init`. We do not know anything about `vmalloc` for now, so let's consider early initialization of the `ioremap`. First of all `early_ioremap_init` checks that `fixmap` is aligned on page middle directory boundary:
|
||||
|
||||
```C
|
||||
BUILD_BUG_ON((fix_to_virt(0) + PAGE_SIZE) & ((1 << PMD_SHIFT) - 1));
|
||||
@ -479,7 +479,7 @@ static inline void __native_flush_tlb_single(unsigned long addr)
|
||||
}
|
||||
```
|
||||
|
||||
or call `__flush_tlb` which just updates `cr3` register as we saw it above. After this step execution of the `__early_set_fixmap` function is finsihed and we can back to the `__early_ioremap` implementation. As we set fixmap area for the given addres, need to save the base virtual address of the I/O Re-mapped area in the `prev_map` with the `slot` index:
|
||||
or call `__flush_tlb` which just updates `cr3` register as we saw it above. After this step execution of the `__early_set_fixmap` function is finsihed and we can back to the `__early_ioremap` implementation. As we set fixmap area for the given address, need to save the base virtual address of the I/O Re-mapped area in the `prev_map` with the `slot` index:
|
||||
|
||||
```C
|
||||
prev_map[slot] = (void __iomem *)(offset + slot_virt[slot]);
|
||||
|