Kernel initialization. Part 9.
================================================================================

RCU initialization
================================================================================

This is the ninth part of the [Linux Kernel initialization process](http://0xax.gitbooks.io/linux-insides/content/Initialization/index.html) and in the previous part we stopped at the [scheduler initialization](http://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-8.html). In this part we will continue to dive into the Linux kernel initialization process, and the main purpose of this part is to learn about the initialization of the [RCU](http://en.wikipedia.org/wiki/Read-copy-update). We can see that the next step in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) after the `sched_init` is the call of the `preempt_disable`. There are two macros:

* `preempt_disable`
* `preempt_enable`

for disabling and enabling preemption. First of all, let's try to understand what `preempt` means in the context of an operating system kernel. In simple words, preemption is the ability of the operating system kernel to preempt the current task in order to run a task with a higher priority. Here we need to disable preemption because we have only one `init` process during early boot and we do not need to stop it before we call the `cpu_idle` function. The `preempt_disable` macro is defined in the [include/linux/preempt.h](https://github.com/torvalds/linux/blob/master/include/linux/preempt.h) and depends on the `CONFIG_PREEMPT_COUNT` kernel configuration option. This macro is implemented as:

```C
#define preempt_disable() \
do { \
    preempt_count_inc(); \
    barrier(); \
} while (0)
```

and if `CONFIG_PREEMPT_COUNT` is not set, just:

```C
#define preempt_disable() barrier()
```

Let's look at it. First of all, we can see one difference between these macro implementations. The `preempt_disable` with `CONFIG_PREEMPT_COUNT` set contains the call of the `preempt_count_inc`. There is a special `percpu` variable which stores the number of held locks and `preempt_disable` calls:

```C
DECLARE_PER_CPU(int, __preempt_count);
```

In the first implementation of the `preempt_disable` we increment this `__preempt_count`. There is an API for returning the value of the `__preempt_count`: the `preempt_count` function. When we call `preempt_disable`, first of all we increment the preemption counter with the `preempt_count_inc` macro which expands to:

```C
#define preempt_count_inc() preempt_count_add(1)
#define preempt_count_add(val) __preempt_count_add(val)
```

where `__preempt_count_add` calls the `raw_cpu_add_4` macro which adds `1` to the given `percpu` variable (`__preempt_count` in our case). You can read more about `percpu` variables in the part about [Per-CPU variables](http://0xax.gitbooks.io/linux-insides/content/Concepts/per-cpu.html). Ok, we increased `__preempt_count`, and in the next step we can see the call of the `barrier` macro in both macros. The `barrier` macro inserts an optimization barrier. Both the compiler and the processor are allowed to reorder independent memory accesses, so we need a way to tell them that a certain order must be kept. This mechanism is a memory barrier. Note that `barrier` itself is a compiler barrier: it only prevents the compiler from moving memory accesses across it. Let's consider a simple example:

```C
preempt_disable();
foo();
preempt_enable();
```

The compiler can rearrange it as:

```C
preempt_disable();
preempt_enable();
foo();
```

In this case the non-preemptible function `foo` can be preempted. As we put the `barrier` macro in the `preempt_disable` and `preempt_enable` macros, it prevents the compiler from swapping `preempt_count_inc` with other statements. You can read more about barriers [here](http://en.wikipedia.org/wiki/Memory_barrier) and [here](https://www.kernel.org/doc/Documentation/memory-barriers.txt).
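
To see how the counter and the barrier work together, here is a minimal userspace sketch. It assumes GCC or Clang (for the inline assembly), it uses one plain global counter instead of a real `percpu` variable, and all `sketch_` names are hypothetical and are not part of the kernel:

```C
#include <assert.h>

/* Compiler-only barrier: an empty asm statement with a "memory" clobber. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for the per-CPU __preempt_count. */
static int sketch_preempt_count;

#define sketch_preempt_disable() \
do { \
    sketch_preempt_count++; \
    barrier(); \
} while (0)

#define sketch_preempt_enable() \
do { \
    barrier(); \
    sketch_preempt_count--; \
} while (0)

/* Preemption is allowed only when nobody holds it disabled. */
static int sketch_preemptible(void)
{
    return sketch_preempt_count == 0;
}

/* Disable preemption twice and report how deep the nesting got. */
static int sketch_nested_depth(void)
{
    int depth;

    sketch_preempt_disable();
    sketch_preempt_disable();
    assert(!sketch_preemptible());
    depth = sketch_preempt_count;
    sketch_preempt_enable();
    sketch_preempt_enable();
    return depth;
}
```

Because the value is a counter and not a flag, nested `disable`/`enable` pairs work naturally: the code becomes preemptible again only when the counter drops back to zero.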

In the next step we can see the following statement:

```C
if (WARN(!irqs_disabled(),
     "Interrupts were enabled *very* early, fixing it\n"))
    local_irq_disable();
```

which checks the state of the [IRQs](http://en.wikipedia.org/wiki/Interrupt_request_%28PC_architecture%29) and disables them (with the `cli` instruction on `x86_64`) if they are enabled.
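
On `x86_64` the interrupt state lives in the `IF` bit (bit 9) of the flags register, and the `cli` and `sti` instructions clear and set this bit. The following is only a userspace sketch of such a test on an already saved flags value; the `sketch_`/`SKETCH_` names are hypothetical and are not kernel API:

```C
#include <assert.h>

/* Interrupt enable flag: bit 9 of the x86 flags register. */
#define SKETCH_EFLAGS_IF (1UL << 9)

/* Return 1 if the saved flags value describes a CPU with interrupts disabled. */
static int sketch_irqs_disabled_flags(unsigned long flags)
{
    return !(flags & SKETCH_EFLAGS_IF);
}
```

For example, a flags value of `0x202` has the `IF` bit set, so interrupts are enabled, while `0x2` describes a CPU with interrupts disabled.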

That's all. Preemption is disabled and we can go ahead.

Initialization of the integer ID management
--------------------------------------------------------------------------------

In the next step we can see the call of the `idr_init_cache` function which is defined in the [lib/idr.c](https://github.com/torvalds/linux/blob/master/lib/idr.c). The `idr` library is used in various [places](http://lxr.free-electrons.com/ident?i=idr_find) in the Linux kernel to manage assigning integer `IDs` to objects and looking up objects by ID.

Let's look at the implementation of the `idr_init_cache` function:

```C
void __init idr_init_cache(void)
{
    idr_layer_cache = kmem_cache_create("idr_layer_cache",
                sizeof(struct idr_layer), 0, SLAB_PANIC, NULL);
}
```

Here we can see the call of the `kmem_cache_create`. We already called the `kmem_cache_init` in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c#L485). This function creates generalized caches again, using the `kmem_cache_alloc` (we will see more about caches in the [Linux kernel memory management](http://0xax.gitbooks.io/linux-insides/content/mm/index.html) chapter). In our case, as we are using `kmem_cache_t`, the [slab](http://en.wikipedia.org/wiki/Slab_allocation) allocator will be used and `kmem_cache_create` creates the cache. As you can see, we pass five parameters to the `kmem_cache_create`:

* name of the cache;
* size of the object to store in cache;
* alignment of the objects;
* flags;
* constructor for the objects.

and it will create a `kmem_cache` for the integer IDs. Integer `IDs` are a commonly used pattern for mapping a set of integer IDs to a set of pointers. We can see the usage of integer IDs, for example, in the [i2c](http://en.wikipedia.org/wiki/I%C2%B2C) drivers subsystem. For example, [drivers/i2c/i2c-core.c](https://github.com/torvalds/linux/blob/master/drivers/i2c/i2c-core), which represents the core of the `i2c` subsystem, defines an `ID` for the `i2c` adapter with the `DEFINE_IDR` macro:

```C
static DEFINE_IDR(i2c_adapter_idr);
```

and then uses it for the declaration of the `i2c` adapter:

```C
static int __i2c_add_numbered_adapter(struct i2c_adapter *adap)
{
    int id;
    ...
    ...
    ...
    id = idr_alloc(&i2c_adapter_idr, adap, adap->nr, adap->nr + 1, GFP_KERNEL);
    ...
    ...
    ...
}
```

and `i2c_adapter_idr` represents the dynamically calculated bus number.

You can read more about integer ID management [here](https://lwn.net/Articles/103209/).
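
To illustrate the idea of mapping integer IDs to pointers, here is a toy allocator. It is only a sketch under simplifying assumptions: a small fixed array is used instead of the kernel's layered structure, there is no locking, and the `sketch_idr` names are hypothetical. Like the `idr_alloc` call above, it looks for a free ID in the range `[start, end)`:

```C
#include <assert.h>
#include <stddef.h>

#define SKETCH_IDR_SIZE 16

/* Hypothetical toy replacement for struct idr: the slot index is the ID. */
struct sketch_idr {
    void *slots[SKETCH_IDR_SIZE];
};

/*
 * Bind `ptr` to the lowest free ID in [start, end) and return this ID,
 * or -1 if the range is full (the kernel returns a negative errno instead).
 */
static int sketch_idr_alloc(struct sketch_idr *idr, void *ptr, int start, int end)
{
    int id;

    assert(ptr != NULL && start >= 0);
    if (end > SKETCH_IDR_SIZE)
        end = SKETCH_IDR_SIZE;
    for (id = start; id < end; id++) {
        if (idr->slots[id] == NULL) {
            idr->slots[id] = ptr;
            return id;
        }
    }
    return -1;
}

/* Look up the pointer bound to `id`, or NULL if the ID is unused. */
static void *sketch_idr_find(const struct sketch_idr *idr, int id)
{
    if (id < 0 || id >= SKETCH_IDR_SIZE)
        return NULL;
    return idr->slots[id];
}
```

Passing `adap->nr` and `adap->nr + 1` as the range, as the `i2c` code does, asks for exactly one specific ID: the call succeeds only if this bus number is still free.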

RCU initialization
--------------------------------------------------------------------------------

The next step is the [RCU](http://en.wikipedia.org/wiki/Read-copy-update) initialization with the `rcu_init` function, and its implementation depends on two kernel configuration options:

* `CONFIG_TINY_RCU`
* `CONFIG_TREE_RCU`

In the first case `rcu_init` is defined in the [kernel/rcu/tiny.c](https://github.com/torvalds/linux/blob/master/kernel/rcu/tiny.c), and in the second case it is defined in the [kernel/rcu/tree.c](https://github.com/torvalds/linux/blob/master/kernel/rcu/tree.c). We will look at the implementation of the `tree rcu`, but first of all a few words about the `RCU` in general.

`RCU` or read-copy update is a scalable high-performance synchronization mechanism implemented in the Linux kernel. In its early days the Linux kernel provided support and an environment for concurrently running applications, but all execution in the kernel was serialized using a single global lock. Nowadays the Linux kernel has no single global lock, but provides different mechanisms including [lock-free data structures](http://en.wikipedia.org/wiki/Concurrent_data_structure), [percpu](http://0xax.gitbooks.io/linux-insides/content/Concepts/per-cpu.html) data structures and others. One of these mechanisms is the `read-copy update`. The `RCU` technique is designed for rarely-modified data structures. The idea of the `RCU` is simple. For example, we have a rarely-modified data structure. If somebody wants to change this data structure, we make a copy of it and make all changes in the copy. At the same time all other users of the data structure use the old version of it. Next, we need to choose a safe moment when the original version of the data structure has no users and update it with the modified copy.
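
The copy-and-publish idea can be sketched in userspace with C11 atomics. This is not how the kernel implements `RCU`: the sketch assumes a single writer, it leaves the choice of the "safe moment" to the caller, and the `sketch_` names are hypothetical:

```C
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical rarely-modified structure. */
struct sketch_config {
    int timeout;
    int retries;
};

/* Readers and the writer share only this pointer. */
static _Atomic(struct sketch_config *) sketch_current;

/* Reader side: take a snapshot of the currently published version. */
static const struct sketch_config *sketch_read(void)
{
    return atomic_load_explicit(&sketch_current, memory_order_acquire);
}

/*
 * Writer side: copy the current version, change the copy, publish it.
 * Returns the old version; the caller may free it only after it knows
 * that no reader still uses it (the "safe moment" from the text).
 */
static struct sketch_config *sketch_update_timeout(int timeout)
{
    struct sketch_config *old =
        atomic_load_explicit(&sketch_current, memory_order_acquire);
    struct sketch_config *copy = malloc(sizeof(*copy));

    assert(copy != NULL);
    if (old)
        *copy = *old;
    else
        *copy = (struct sketch_config){ 0, 0 };
    copy->timeout = timeout;
    return atomic_exchange_explicit(&sketch_current, copy, memory_order_acq_rel);
}
```

A reader that called `sketch_read` before the update keeps working with the old, still valid version, while readers that come later already see the new one.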

Of course this description of the `RCU` is very simplified. To understand some details about `RCU`, first of all we need to learn some terminology. Data readers in the `RCU` are executed in a [critical section](http://en.wikipedia.org/wiki/Critical_section). Every time a data reader enters the critical section, it calls `rcu_read_lock`, and it calls `rcu_read_unlock` on exit from the critical section. If a thread is not in the critical section, it is in a state called the `quiescent state`. A time interval during which every thread passes through at least one `quiescent state` is called a `grace period`. If a thread wants to remove an element from the data structure, this occurs in two steps. The first step is `removal`: the thread atomically removes the element from the data structure, but does not release the physical memory. After this the writer thread announces the start of a `grace period` and waits until it is finished. From this moment, the removed element is available only to the reader threads that were already in their critical sections. After the `grace period` is finished, the second step of the element removal starts: it just frees the memory of the element.
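
The relation between quiescent states and a grace period can be modelled with a bitmask that has one bit per CPU. This is a toy model with hypothetical `sketch_` names, not the kernel's data structures: the grace period starts with all bits set, every CPU clears its bit when it reports a quiescent state, and the grace period is over when the mask is empty:

```C
#include <assert.h>

/* Hypothetical toy model: one bit per CPU that still owes a quiescent state. */
struct sketch_grace_period {
    unsigned long waiting_mask;
};

/* Start a grace period that waits for CPUs 0..ncpus-1. */
static void sketch_gp_start(struct sketch_grace_period *gp, unsigned int ncpus)
{
    assert(ncpus > 0 && ncpus < 8 * sizeof(gp->waiting_mask));
    gp->waiting_mask = (1UL << ncpus) - 1;
}

/* A CPU reports that it is outside of any read-side critical section. */
static void sketch_note_quiescent_state(struct sketch_grace_period *gp,
                                        unsigned int cpu)
{
    assert(cpu < 8 * sizeof(gp->waiting_mask));
    gp->waiting_mask &= ~(1UL << cpu);
}

/* The grace period is over once every CPU has reported. */
static int sketch_gp_finished(const struct sketch_grace_period *gp)
{
    return gp->waiting_mask == 0;
}
```

Only after `sketch_gp_finished` returns non-zero would the writer be allowed to perform the second step and free the removed element.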

There are a couple of implementations of the `RCU`. The old `RCU` is called classic, and the new implementation is called `tree` RCU. As you may already understand, the `CONFIG_TREE_RCU` kernel configuration option enables tree `RCU`. Another one is the `tiny` RCU which depends on `CONFIG_TINY_RCU` and `CONFIG_SMP=n`. We will see more details about the `RCU` in general in the separate chapter about synchronization primitives, but now let's look at the `rcu_init` implementation from the [kernel/rcu/tree.c](https://github.com/torvalds/linux/blob/master/kernel/rcu/tree.c):

```C
void __init rcu_init(void)
{
    int cpu;

    rcu_bootup_announce();
    rcu_init_geometry();
    rcu_init_one(&rcu_bh_state, &rcu_bh_data);
    rcu_init_one(&rcu_sched_state, &rcu_sched_data);
    __rcu_init_preempt();
    open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);

    /*
     * We don't need protection against CPU-hotplug here because
     * this is called early in boot, before either interrupts
     * or the scheduler are operational.
     */
    cpu_notifier(rcu_cpu_notify, 0);
    pm_notifier(rcu_pm_notify, 0);
    for_each_online_cpu(cpu)
        rcu_cpu_notify(NULL, CPU_UP_PREPARE, (void *)(long)cpu);

    rcu_early_boot_tests();
}
```

At the beginning of the `rcu_init` function we define the `cpu` variable and call `rcu_bootup_announce`. The `rcu_bootup_announce` function is pretty simple:

```C
static void __init rcu_bootup_announce(void)
{
    pr_info("Hierarchical RCU implementation.\n");
    rcu_bootup_announce_oddness();
}
```

It just prints information about the `RCU` with the `pr_info` function and calls `rcu_bootup_announce_oddness`, which uses `pr_info` too, to print different information about the current `RCU` configuration which depends on different kernel configuration options like `CONFIG_RCU_TRACE`, `CONFIG_PROVE_RCU`, `CONFIG_RCU_FANOUT_EXACT`, etc. In the next step, we can see the call of the `rcu_init_geometry` function. This function is defined in the same source code file and computes the geometry of the node tree depending on the number of CPUs. `RCU` provides scalability with extremely low contention on its internal locks. But what if a data structure is read from different CPUs? The `RCU` API provides the `rcu_state` structure which represents the global `RCU` state, including the node hierarchy. The hierarchy is represented by the:

```C
struct rcu_node node[NUM_RCU_NODES];
```

array of structures. As we can read in the comment above the definition of this structure:

```
The root (first level) of the hierarchy is in ->node[0] (referenced by ->level[0]), the second
level in ->node[1] through ->node[m] (->node[1] referenced by ->level[1]), and the third level
in ->node[m+1] and following (->node[m+1] referenced by ->level[2]). The number of levels is
determined by the number of CPUs and by CONFIG_RCU_FANOUT.

Small systems will have a "hierarchy" consisting of a single rcu_node.
```

The `rcu_node` structure is defined in the [kernel/rcu/tree.h](https://github.com/torvalds/linux/blob/master/kernel/rcu/tree.h) and contains information about the current grace period, whether the grace period is completed or not, the CPUs or groups that need to switch in order for the current grace period to proceed, etc. Every `rcu_node` contains a lock for a couple of CPUs. These `rcu_node` structures are embedded into a linear array in the `rcu_state` structure and represented as a tree with the root in the zeroth element, which covers all CPUs. As you can see, the number of `rcu_node` structures is determined by `NUM_RCU_NODES`, which depends on the number of available CPUs:

```C
#define NUM_RCU_NODES (RCU_SUM - NR_CPUS)
#define RCU_SUM (NUM_RCU_LVL_0 + NUM_RCU_LVL_1 + NUM_RCU_LVL_2 + NUM_RCU_LVL_3 + NUM_RCU_LVL_4)
```

where the values of the levels depend on the `CONFIG_RCU_FANOUT_LEAF` configuration option. For example, in the simplest case, one `rcu_node` covers two CPUs on a machine with eight CPUs:

```
+-----------------------------------------------------------------+
|  rcu_state                                                      |
|                 +----------------------+                        |
|                 |         root         |                        |
|                 |       rcu_node       |                        |
|                 +----------------------+                        |
|                    |                |                           |
|               +----v-----+       +--v-------+                   |
|               |          |       |          |                   |
|               | rcu_node |       | rcu_node |                   |
|               |          |       |          |                   |
|         +------------------+     +----------------+             |
|         |                  |        |             |             |
|         |                  |        |             |             |
|    +----v-----+    +-------v--+   +-v--------+  +-v--------+    |
|    |          |    |          |   |          |  |          |    |
|    | rcu_node |    | rcu_node |   | rcu_node |  | rcu_node |    |
|    |          |    |          |   |          |  |          |    |
|    +----------+    +----------+   +----------+  +----------+    |
|         |                 |             |               |       |
|         |                 |             |               |       |
|         |                 |             |               |       |
|         |                 |             |               |       |
+---------|-----------------|-------------|---------------|-------+
          |                 |             |               |
+---------v-----------------v-------------v---------------v--------+
|                 |                |               |               |
|     CPU1        |      CPU3      |      CPU5     |     CPU7      |
|                 |                |               |               |
|     CPU2        |      CPU4      |      CPU6     |     CPU8      |
|                 |                |               |               |
+------------------------------------------------------------------+
```
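
The picture contains seven `rcu_node` structures: four leaves, two intermediate nodes and the root. The following sketch shows how such a number can be obtained for a given number of CPUs. It is only an illustration with hypothetical `sketch_` names: the kernel computes these values with the `NUM_RCU_LVL_*` macros at compile time and in `rcu_init_geometry` at boot time, not with this function:

```C
#include <assert.h>

/* Round-up integer division. */
static unsigned int sketch_div_round_up(unsigned int n, unsigned int d)
{
    return (n + d - 1) / d;
}

/*
 * Count the rcu_node structures of a tree in which every leaf covers
 * `leaf_fanout` CPUs and every inner node covers `fanout` children.
 */
static unsigned int sketch_count_rcu_nodes(unsigned int ncpus,
                                           unsigned int leaf_fanout,
                                           unsigned int fanout)
{
    unsigned int level, total;

    assert(ncpus > 0 && leaf_fanout > 0 && fanout > 1);
    level = sketch_div_round_up(ncpus, leaf_fanout);   /* leaf level */
    total = level;
    while (level > 1) {                                /* walk up to the root */
        level = sketch_div_round_up(level, fanout);
        total += level;
    }
    return total;
}
```

For the machine from the picture, `sketch_count_rcu_nodes(8, 2, 2)` gives `4 + 2 + 1 = 7`, while a small system where one leaf covers all CPUs ends up with a single `rcu_node`.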

So, in the `rcu_init_geometry` function we just need to calculate the total number of `rcu_node` structures. We start with the calculation of the `jiffies` until the first and the next `fqs`, which stands for `force-quiescent-state` (quiescent states were described above):

```C
d = RCU_JIFFIES_TILL_FORCE_QS + nr_cpu_ids / RCU_JIFFIES_FQS_DIV;
if (jiffies_till_first_fqs == ULONG_MAX)
    jiffies_till_first_fqs = d;
if (jiffies_till_next_fqs == ULONG_MAX)
    jiffies_till_next_fqs = d;
```

where:

```C
#define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
#define RCU_JIFFIES_FQS_DIV 256
```

After we have calculated these [jiffies](http://en.wikipedia.org/wiki/Jiffy_%28time%29), we check whether the previously defined `jiffies_till_first_fqs` and `jiffies_till_next_fqs` variables are equal to [ULONG_MAX](http://www.rowleydownload.co.uk/avr/documentation/index.htm?http://www.rowleydownload.co.uk/avr/documentation/ULONG_MAX.htm) (their default value) and, if so, set them to the calculated value. As we did not touch these variables before, they are equal to `ULONG_MAX`:

```C
static ulong jiffies_till_first_fqs = ULONG_MAX;
static ulong jiffies_till_next_fqs = ULONG_MAX;
```
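
To get a feeling for the resulting numbers, the arithmetic for `d` can be reproduced in a small standalone sketch. `HZ` and `nr_cpu_ids` are passed as ordinary parameters here, and the `sketch_` names are hypothetical:

```C
#include <assert.h>

#define SKETCH_RCU_JIFFIES_FQS_DIV 256

/* Same arithmetic as RCU_JIFFIES_TILL_FORCE_QS, with HZ passed as a parameter. */
static unsigned long sketch_jiffies_till_force_qs(unsigned int hz)
{
    return 1 + (hz > 250) + (hz > 500);
}

/* Default delay `d`: the base value plus one jiffy per 256 possible CPUs. */
static unsigned long sketch_default_fqs_delay(unsigned int hz,
                                              unsigned int nr_cpu_ids)
{
    assert(hz > 0);
    return sketch_jiffies_till_force_qs(hz) +
           nr_cpu_ids / SKETCH_RCU_JIFFIES_FQS_DIV;
}
```

With `HZ=1000` and eight CPUs the delay is `1 + 1 + 1 + 8 / 256 = 3` jiffies, and with `HZ=250` it is just one jiffy. The number of CPUs starts to matter only on machines with at least 256 of them.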

In the next step of the `rcu_init_geometry`, we check whether `rcu_fanout_leaf` was changed. If it still has the compile-time value of the `CONFIG_RCU_FANOUT_LEAF` configuration option and `nr_cpu_ids` is equal to `NR_CPUS`, the values computed at compile time are still accurate and we just return:

```C
if (rcu_fanout_leaf == CONFIG_RCU_FANOUT_LEAF &&
    nr_cpu_ids == NR_CPUS)
    return;
```

After this we need to compute the number of nodes that can be handled by an `rcu_node` tree with the given number of levels:

```C
rcu_capacity[0] = 1;
rcu_capacity[1] = rcu_fanout_leaf;
for (i = 2; i <= MAX_RCU_LVLS; i++)
    rcu_capacity[i] = rcu_capacity[i - 1] * CONFIG_RCU_FANOUT;
```

And in the last step we calculate the number of `rcu_node` structures at each level of the tree in the [loop](https://github.com/torvalds/linux/blob/master/kernel/rcu/tree.c#L4094).
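
The capacity table makes it easy to see how many levels a given machine needs: it is the first level whose capacity is not smaller than the number of CPUs. The following sketch fills the table in the same way as the snippet above and returns this number of levels. The function itself, the `sketch_` names and the limit of four levels are assumptions made for this illustration:

```C
#include <assert.h>

#define SKETCH_MAX_RCU_LVLS 4

/*
 * Fill capacity[0..SKETCH_MAX_RCU_LVLS] and return the smallest number of
 * levels whose capacity covers `ncpus` CPUs, or 0 if even the deepest
 * tree is too small.
 */
static int sketch_rcu_levels(unsigned long ncpus, unsigned long leaf_fanout,
                             unsigned long fanout,
                             unsigned long capacity[SKETCH_MAX_RCU_LVLS + 1])
{
    int i;

    assert(leaf_fanout > 0 && fanout > 0);
    capacity[0] = 1;
    capacity[1] = leaf_fanout;
    for (i = 2; i <= SKETCH_MAX_RCU_LVLS; i++)
        capacity[i] = capacity[i - 1] * fanout;

    for (i = 1; i <= SKETCH_MAX_RCU_LVLS; i++)
        if (ncpus <= capacity[i])
            return i;
    return 0;
}
```

With a leaf fanout of `16` and a fanout of `64`, one level is enough for up to 16 CPUs, two levels cover up to 1024 CPUs and three levels cover up to 65536 CPUs.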

Now that we have calculated the geometry of the `rcu_node` tree, we go back to the `rcu_init` function, where the next step is to initialize two `rcu_state` structures with the `rcu_init_one` function:

```C
rcu_init_one(&rcu_bh_state, &rcu_bh_data);
rcu_init_one(&rcu_sched_state, &rcu_sched_data);
```

The `rcu_init_one` function takes two arguments:

* Global `RCU` state;
* Per-CPU data for `RCU`.

Both variables are defined in the [kernel/rcu/tree.h](https://github.com/torvalds/linux/blob/master/kernel/rcu/tree.h) together with their `percpu` data:

```C
extern struct rcu_state rcu_bh_state;
DECLARE_PER_CPU(struct rcu_data, rcu_bh_data);
```

You can read about these states [here](http://lwn.net/Articles/264090/). As I wrote above, we need to initialize the `rcu_state` structures, and the `rcu_init_one` function will help us with it. After the `rcu_state` initialization, we can see the call of the `__rcu_init_preempt`, which depends on the `CONFIG_PREEMPT_RCU` kernel configuration option. It does the same as the previous calls: it initializes the `rcu_preempt_state` structure, which has the `rcu_state` type, with the `rcu_init_one` function. After this, in the `rcu_init`, we can see the call of the:

```C
open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
```

function. This function registers a handler of the `pending interrupt`. A pending interrupt or `softirq` means that a part of the work can be delayed for later execution, when the system is less loaded. Pending interrupts are represented by the following structure:

```C
struct softirq_action
{
    void (*action)(struct softirq_action *);
};
```

which is defined in the [include/linux/interrupt.h](https://github.com/torvalds/linux/blob/master/include/linux/interrupt.h) and contains only one field: the handler of an interrupt. You can see the `softirqs` of your system with:

```
$ cat /proc/softirqs
                    CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
          HI:          2          0          0          1          0          2          0          0
       TIMER:     137779     108110     139573     107647     107408     114972      99653      98665
      NET_TX:       1127          0          4          0          1          1          0          0
      NET_RX:        334        221     132939       3076        451        361        292        303
       BLOCK:       5253       5596          8        779       2016      37442         28       2855
BLOCK_IOPOLL:          0          0          0          0          0          0          0          0
     TASKLET:         66          0       2916        113          0         24      26708          0
       SCHED:     102350      75950      91705      75356      75323      82627      69279      69914
     HRTIMER:        510        302        368        260        219        255        248        246
         RCU:      81290      68062      82979      69015      68390      69385      63304      63473
```

The `open_softirq` function takes two parameters:

* index of the interrupt;
* interrupt handler.

and adds the interrupt handler to the array of pending interrupts:

```C
void open_softirq(int nr, void (*action)(struct softirq_action *))
{
    softirq_vec[nr].action = action;
}
```
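
Registration is only one half of the picture: a registered handler runs later, after somebody has marked the `softirq` as pending. The following userspace sketch models this with an array of handlers and a bitmask of pending entries. It is a simplification with hypothetical `sketch_` names (no `percpu` data, no interrupt context), and the index `9` is chosen only to mirror the position of the `RCU` row in the output above:

```C
#include <assert.h>

#define SKETCH_NR_SOFTIRQS 10
#define SKETCH_RCU_SOFTIRQ 9

/* Hypothetical, simplified counterpart of struct softirq_action. */
struct sketch_softirq_action {
    void (*action)(struct sketch_softirq_action *);
};

static struct sketch_softirq_action sketch_softirq_vec[SKETCH_NR_SOFTIRQS];
static unsigned int sketch_pending;   /* one bit per raised softirq */
static int sketch_rcu_runs;           /* how many times the sample handler ran */

/* Sample handler used to demonstrate the dispatch. */
static void sketch_rcu_handler(struct sketch_softirq_action *unused)
{
    (void)unused;
    sketch_rcu_runs++;
}

/* Register a handler, in the same way as open_softirq() above. */
static void sketch_open_softirq(int nr,
                                void (*action)(struct sketch_softirq_action *))
{
    assert(nr >= 0 && nr < SKETCH_NR_SOFTIRQS);
    sketch_softirq_vec[nr].action = action;
}

/* Mark a softirq as pending; its handler runs later. */
static void sketch_raise_softirq(int nr)
{
    assert(nr >= 0 && nr < SKETCH_NR_SOFTIRQS);
    sketch_pending |= 1u << nr;
}

/* Run the handlers of all pending softirqs, lowest index first. */
static int sketch_do_softirq(void)
{
    int nr, handled = 0;
    unsigned int pending = sketch_pending;

    sketch_pending = 0;
    for (nr = 0; nr < SKETCH_NR_SOFTIRQS; nr++) {
        if ((pending & (1u << nr)) && sketch_softirq_vec[nr].action) {
            sketch_softirq_vec[nr].action(&sketch_softirq_vec[nr]);
            handled++;
        }
    }
    return handled;
}
```

After `sketch_open_softirq(SKETCH_RCU_SOFTIRQ, sketch_rcu_handler)`, nothing runs until `sketch_raise_softirq(SKETCH_RCU_SOFTIRQ)` is called, and the next `sketch_do_softirq` call executes the handler exactly once.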

In our case the interrupt handler is `rcu_process_callbacks`, which is defined in the [kernel/rcu/tree.c](https://github.com/torvalds/linux/blob/master/kernel/rcu/tree.c) and does the `RCU` core processing for the current CPU. After we registered the `softirq` interrupt for the `RCU`, we can see the following code:

```C
cpu_notifier(rcu_cpu_notify, 0);
pm_notifier(rcu_pm_notify, 0);
for_each_online_cpu(cpu)
    rcu_cpu_notify(NULL, CPU_UP_PREPARE, (void *)(long)cpu);
```

Here we can see the registration of the `cpu` notifier which is needed in systems which support [CPU hotplug](https://www.kernel.org/doc/Documentation/cpu-hotplug.txt), and we will not dive into details about this theme. The last function in the `rcu_init` is the `rcu_early_boot_tests`:

```C
void rcu_early_boot_tests(void)
{
    pr_info("Running RCU self tests\n");

    if (rcu_self_test)
        early_boot_test_call_rcu();
    if (rcu_self_test_bh)
        early_boot_test_call_rcu_bh();
    if (rcu_self_test_sched)
        early_boot_test_call_rcu_sched();
}
```

which runs self tests for the `RCU`.

That's all. We saw the initialization process of the `RCU` subsystem. As I wrote above, more about the `RCU` will be in the separate chapter about synchronization primitives.

Rest of the initialization process
--------------------------------------------------------------------------------

Ok, we have already passed the main theme of this part, which is the `RCU` initialization, but it is not the end of the Linux kernel initialization process. In the last paragraph of this part we will see a couple of functions which work at initialization time, but we will not dive into deep details around these functions for different reasons. Some reasons not to dive into details are the following:

* They are not very important for the generic kernel initialization process and can depend on the kernel configuration;
* They are of a debugging nature and are not important for now either;
* We will see many of these things in separate parts/chapters.

After we initialized `RCU`, the next step which you can see in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) is the `trace_init` function. As you can understand from its name, this function initializes the [tracing](http://en.wikipedia.org/wiki/Tracing_%28software%29) subsystem. You can read more about the Linux kernel trace systems [here](http://elinux.org/Kernel_Trace_Systems).

After the `trace_init`, we can see the call of the `radix_tree_init`. If you are familiar with different data structures, you can understand from the name of this function that it initializes the kernel implementation of the [Radix tree](http://en.wikipedia.org/wiki/Radix_tree). This function is defined in the [lib/radix-tree.c](https://github.com/torvalds/linux/blob/master/lib/radix-tree.c), and you can read more about it in the part about [Radix tree](http://0xax.gitbooks.io/linux-insides/content/DataStructures/radix-tree.md).

In the next step we can see the functions which are related to the `interrupts handling` subsystem. They are:

* `early_irq_init`
* `init_IRQ`
* `softirq_init`

We will see an explanation of these functions and their implementation in the special part about interrupts and exceptions handling. After this come many different functions (like `init_timers`, `hrtimers_init`, `time_init`, etc.) which are related to different timing and timer facilities. We will see more about these functions in the chapter about timers.

The next couple of functions are related to the [perf](https://perf.wiki.kernel.org/index.php/Main_Page) events: `perf_event_init` (there will be a separate chapter about perf) and the initialization of `profiling` with the `profile_init`. After this we enable `irq` with the call of the:

```C
local_irq_enable();
```

which expands to the `sti` instruction, and make the post-initialization of the [SLAB](http://en.wikipedia.org/wiki/Slab_allocation) with the call of the `kmem_cache_init_late` function (as I wrote above, we will learn about the `SLAB` in the [Linux memory management](http://0xax.gitbooks.io/linux-insides/content/mm/index.html) chapter).

After the post-initialization of the `SLAB`, the next point is the initialization of the console with the `console_init` function from the [drivers/tty/tty_io.c](https://github.com/torvalds/linux/blob/master/drivers/tty/tty_io.c).

After the console initialization, we can see the `lockdep_info` function which prints information about the [Lock dependency validator](https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt). After this, we can see the initialization of the dynamic allocation of the `debug objects` with the `debug_objects_mem_init`, the kernel memory leak [detector](https://www.kernel.org/doc/Documentation/kmemleak.txt) initialization with the `kmemleak_init`, the `percpu` pageset setup with the `setup_per_cpu_pageset`, the setup of the [NUMA](http://en.wikipedia.org/wiki/Non-uniform_memory_access) policy with the `numa_policy_init`, setting the time for the scheduler with the `sched_clock_init`, the `pidmap` initialization with the call of the `pidmap_init` function for the initial `PID` namespace, the cache creation with the `anon_vma_init` for the private virtual memory areas and the early initialization of the [ACPI](http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface) with the `acpi_early_init`.

This is the end of the ninth part of the [linux kernel initialization process](http://0xax.gitbooks.io/linux-insides/content/Initialization/index.html) and here we saw the initialization of the [RCU](http://en.wikipedia.org/wiki/Read-copy-update). In the last paragraph of this part (`Rest of the initialization process`) we went through many functions but did not dive into details about their implementations. Do not worry if you do not know anything about these things, or if you know about them but do not understand them yet. As I have written many times already, we will see the details of the implementations, but in other parts or other chapters.

Conclusion
--------------------------------------------------------------------------------

It is the end of the ninth part about the Linux kernel [initialization process](http://0xax.gitbooks.io/linux-insides/content/Initialization/index.html). In this part, we looked at the initialization process of the `RCU` subsystem. In the next part we will continue to dive into the Linux kernel initialization process, and I hope that we will finish with the `start_kernel` function and go to the `rest_init` function from the same [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) source code file, where we will see the start of the first process.

If you have any questions or suggestions, write me a comment or ping me on [twitter](https://twitter.com/0xAX).

**Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes, please send me a PR to [linux-internals](https://github.com/0xAX/linux-internals).**

Links
--------------------------------------------------------------------------------

* [lock-free data structures](http://en.wikipedia.org/wiki/Concurrent_data_structure)
* [kmemleak](https://www.kernel.org/doc/Documentation/kmemleak.txt)
* [ACPI](http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface)
* [IRQs](http://en.wikipedia.org/wiki/Interrupt_request_%28PC_architecture%29)
* [RCU](http://en.wikipedia.org/wiki/Read-copy-update)
* [RCU documentation](https://github.com/torvalds/linux/tree/master/Documentation/RCU)
* [integer ID management](https://lwn.net/Articles/103209/)
* [Documentation/memory-barriers.txt](https://www.kernel.org/doc/Documentation/memory-barriers.txt)
* [Runtime locking correctness validator](https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt)
* [Per-CPU variables](http://0xax.gitbooks.io/linux-insides/content/Concepts/per-cpu.html)
* [Linux kernel memory management](http://0xax.gitbooks.io/linux-insides/content/mm/index.html)
* [slab](http://en.wikipedia.org/wiki/Slab_allocation)
* [i2c](http://en.wikipedia.org/wiki/I%C2%B2C)
* [Previous part](http://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-8.html)