1
0
mirror of https://github.com/0xAX/linux-insides.git synced 2024-12-22 14:48:08 +00:00
linux-insides/Initialization/linux-initialization-9.md

431 lines
29 KiB
Markdown
Raw Normal View History

2015-05-17 13:59:01 +00:00
Kernel initialization. Part 9.
================================================================================
RCU initialization
================================================================================
This is ninth part of the [Linux Kernel initialization process](https://0xax.gitbook.io/linux-insides/summary/initialization) and in the previous part we stopped at the [scheduler initialization](https://0xax.gitbook.io/linux-insides/summary/initialization/linux-initialization-8). In this part we will continue to dive to the linux kernel initialization process and the main purpose of this part will be to learn about initialization of the [RCU](http://en.wikipedia.org/wiki/Read-copy-update). We can see that the next step in the [init/main.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/init/main.c) after the `sched_init` is the call of the `preempt_disable`. There are two macros:
2015-05-17 13:59:01 +00:00
* `preempt_disable`
* `preempt_enable`
for preemption disabling and enabling. First of all let's try to understand what is `preempt` in the context of an operating system kernel. In simple words, preemption is ability of the operating system kernel to preempt current task to run task with higher priority. Here we need to disable preemption because we will have only one `init` process for the early boot time and we don't need to stop it before we call `cpu_idle` function. The `preempt_disable` macro is defined in the [include/linux/preempt.h](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/include/linux/preempt.h) and depends on the `CONFIG_PREEMPT_COUNT` kernel configuration option. This macro is implemented as:
2015-05-17 13:59:01 +00:00
```C
#define preempt_disable() \
do { \
preempt_count_inc(); \
barrier(); \
} while (0)
```
and if `CONFIG_PREEMPT_COUNT` is not set just:
```C
#define preempt_disable() barrier()
```
2016-01-12 07:43:54 +00:00
Let's look on it. First of all we can see one difference between these macro implementations. The `preempt_disable` with `CONFIG_PREEMPT_COUNT` set contains the call of the `preempt_count_inc`. There is special `percpu` variable which stores the number of held locks and `preempt_disable` calls:
2015-05-17 13:59:01 +00:00
```C
DECLARE_PER_CPU(int, __preempt_count);
```
In the first implementation of the `preempt_disable` we increment this `__preempt_count`. There is API for returning value of the `__preempt_count`, it is the `preempt_count` function. As we called `preempt_disable`, first of all we increment preemption counter with the `preempt_count_inc` macro which expands to the:
```
#define preempt_count_inc() preempt_count_add(1)
#define preempt_count_add(val) __preempt_count_add(val)
```
where `preempt_count_add` calls the `raw_cpu_add_4` macro which adds `1` to the given `percpu` variable (`__preempt_count`) in our case (more about `precpu` variables you can read in the part about [Per-CPU variables](https://0xax.gitbook.io/linux-insides/summary/concepts/linux-cpu-1)). Ok, we increased `__preempt_count` and the next step we can see the call of the `barrier` macro in the both macros. The `barrier` macro inserts an optimization barrier. In the processors with `x86_64` architecture independent memory access operations can be performed in any order. That's why we need the opportunity to point compiler and processor on compliance of order. This mechanism is memory barrier. Let's consider a simple example:
2015-05-17 13:59:01 +00:00
```C
preempt_disable();
foo();
preempt_enable();
```
Compiler can rearrange it as:
```C
preempt_disable();
preempt_enable();
foo();
```
In this case non-preemptible function `foo` can be preempted. As we put `barrier` macro in the `preempt_disable` and `preempt_enable` macros, it prevents the compiler from swapping `preempt_count_inc` with other statements. More about barriers you can read [here](http://en.wikipedia.org/wiki/Memory_barrier) and [here](https://www.kernel.org/doc/Documentation/memory-barriers.txt).
In the next step we can see following statement:
```C
if (WARN(!irqs_disabled(),
"Interrupts were enabled *very* early, fixing it\n"))
local_irq_disable();
```
which check [IRQs](http://en.wikipedia.org/wiki/Interrupt_request_%28PC_architecture%29) state, and disabling (with `cli` instruction for `x86_64`) if they are enabled.
That's all. Preemption is disabled and we can go ahead.
Initialization of the integer ID management
--------------------------------------------------------------------------------
In the next step we can see the call of the `idr_init_cache` function which defined in the [lib/idr.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/lib/idr.c). The `idr` library is used in a various [places](http://lxr.free-electrons.com/ident?i=idr_find) in the linux kernel to manage assigning integer `IDs` to objects and looking up objects by id.
2015-05-17 13:59:01 +00:00
Let's look on the implementation of the `idr_init_cache` function:
```C
void __init idr_init_cache(void)
{
idr_layer_cache = kmem_cache_create("idr_layer_cache",
sizeof(struct idr_layer), 0, SLAB_PANIC, NULL);
}
```
Here we can see the call of the `kmem_cache_create`. We already called the `kmem_cache_init` in the [init/main.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/init/main.c#L485). This function create generalized caches again using the `kmem_cache_alloc` (more about caches we will see in the [Linux kernel memory management](https://0xax.gitbook.io/linux-insides/summary/mm) chapter). In our case, as we are using `kmem_cache_t` which will be used by the [slab](http://en.wikipedia.org/wiki/Slab_allocation) allocator and `kmem_cache_create` creates it. As you can see we pass five parameters to the `kmem_cache_create`:
2015-05-17 13:59:01 +00:00
* name of the cache;
* size of the object to store in cache;
* offset of the first object in the page;
* flags;
* constructor for the objects.
Fix broken git hub links Replace the following dead github links, with equivalent working ones. s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973 | https://github.com/torvalds/linux s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/ | https://github.com/torvalds/linux s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/Documentation/security/credentials.txt | https://github.com/torvalds/linux/blob/master/Documentation/security/credentials.rst s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/Documentation/workqueue.txt | https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/Documentation/core-api/workqueue.rst s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/entry_entry_64.S | https://github.com/torvalds/linux/blob/master/arch/x86/entry/entry_64.S s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/include/asm/calling.h | https://github.com/torvalds/linux/blob/master/arch/x86/entry/calling.h s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/include/asm/pgalloc. | https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/pgalloc.h s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/include/bitops.h | https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/bitops.h s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/include/irqflags.h | https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/irqflags.h s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/include/uapi/asm/msr-index.h | https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/msr-index.h s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel.setup.c | https://github.com/torvalds/linux/blob/master/arch/x86/kernel/setup.c s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/entry_64.S | https://github.com/torvalds/linux/blob/master/arch/x86/entry/entry_64.S s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/vsyscall_64.c | https://github.com/torvalds/linux/blob/master/arch/x86/entry/vsyscall/vsyscall_64.c s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/vsyscall_emu_64.S | https://github.com/torvalds/linux/blob/master/arch/x86/entry/vsyscall/vsyscall_emu_64.S s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/blob/arch/x86/kernel/cpu/common.c | https://github.com/torvalds/linux/blob/master/arch/x86/kernel/cpu/common.c s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/drivers/clocksource_acpi_pm.c | https://github.com/torvalds/linux/blob/master/drivers/clocksource/acpi_pm.c s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/drivers/i2c/i2c-core.c | https://github.com/torvalds/linux/blob/master/drivers/i2c/i2c-core-base.c s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/include/asm-generic-sections.h | https://github.com/torvalds/linux/blob/master/include/asm-generic/sections.h s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/include/context_tracking.h | https://github.com/torvalds/linux/blob/master/include/linux/context_tracking.h s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/include/mm_types.h | https://github.com/torvalds/linux/blob/master/include/linux/mm_types.h s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/apic/io_apic.c | https://github.com/torvalds/linux/blob/master/arch/x86/kernel/apic/io_apic.c s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/apic/vector.c | https://github.com/torvalds/linux/blob/master/arch/x86/kernel/apic/vector.c s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/cgroup.c | https://github.com/torvalds/linux/blob/master/kernel/cgroup/cgroup.c s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/cpuset.c | https://github.com/torvalds/linux/blob/master/kernel/cgroup/cpuset.c s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/irqinit.c | https://github.com/torvalds/linux/blob/master/arch/x86/kernel/irqinit.c s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/locking/lockdep_insides.h | https://github.com/torvalds/linux/blob/master/kernel/locking/lockdep_internals.h s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/tick-common.c | https://github.com/torvalds/linux/blob/master/kernel/time/tick-common.c s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/time/tich-sched.c | https://github.com/torvalds/linux/blob/master/kernel/time/tick-sched.c s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/linux/binfmts.h | https://github.com/torvalds/linux/blob/master/include/linux/binfmts.h s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/locking/rwsem-xadd.c | https://github.com/torvalds/linux/blob/master/kernel/locking/rwsem.c s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/mm/block.c | https://github.com/torvalds/linux/blob/master/mm/memblock.c s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/sched/idle.c | https://github.com/torvalds/linux/blob/master/kernel/sched/idle.c s/ https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/sound/isa/sscape | https://github.com/torvalds/linux/blob/master/sound/isa/sscape.c Signed-off-by: Sebastian Fricke <sebastian.fricke.linux@gmail.com>
2020-06-18 15:22:10 +00:00
and it will create `kmem_cache` for the integer IDs. Integer `IDs` is commonly used pattern to map set of integer IDs to the set of pointers. We can see usage of the integer IDs in the [i2c](http://en.wikipedia.org/wiki/I%C2%B2C) drivers subsystem. For example [drivers/i2c/i2c-core-base.c](https://github.com/torvalds/linux/blob/master/drivers/i2c/i2c-core-base.c) which represents the core of the `i2c` subsystem defines `ID` for the `i2c` adapter with the `DEFINE_IDR` macro:
2015-05-17 13:59:01 +00:00
```C
static DEFINE_IDR(i2c_adapter_idr);
```
2016-01-12 07:43:54 +00:00
and then uses it for the declaration of the `i2c` adapter:
2015-05-17 13:59:01 +00:00
```C
static int __i2c_add_numbered_adapter(struct i2c_adapter *adap)
{
int id;
...
...
...
id = idr_alloc(&i2c_adapter_idr, adap, adap->nr, adap->nr + 1, GFP_KERNEL);
...
...
...
}
```
and `id2_adapter_idr` presents dynamically calculated bus number.
More about integer ID management you can read [here](https://lwn.net/Articles/103209/).
RCU initialization
--------------------------------------------------------------------------------
The next step is [RCU](http://en.wikipedia.org/wiki/Read-copy-update) initialization with the `rcu_init` function and it's implementation depends on two kernel configuration options:
* `CONFIG_TINY_RCU`
* `CONFIG_TREE_RCU`
In the first case `rcu_init` will be in the [kernel/rcu/tiny.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/rcu/tiny.c) and in the second case it will be defined in the [kernel/rcu/tree.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/rcu/tree.c). We will see the implementation of the `tree rcu`, but first of all about the `RCU` in general.
2015-05-17 13:59:01 +00:00
`RCU` or read-copy update is a scalable high-performance synchronization mechanism implemented in the Linux kernel. On the early stage the linux kernel provided support and environment for the concurrently running applications, but all execution was serialized in the kernel using a single global lock. In our days linux kernel has no single global lock, but provides different mechanisms including [lock-free data structures](http://en.wikipedia.org/wiki/Concurrent_data_structure), [percpu](https://0xax.gitbook.io/linux-insides/summary/concepts/linux-cpu-1) data structures and other. One of these mechanisms is - the `read-copy update`. The `RCU` technique is designed for rarely-modified data structures. The idea of the `RCU` is simple. For example we have a rarely-modified data structure. If somebody wants to change this data structure, we make a copy of this data structure and make all changes in the copy. In the same time all other users of the data structure use old version of it. Next, we need to choose safe moment when original version of the data structure will have no users and update it with the modified copy.
2015-05-17 13:59:01 +00:00
2016-05-19 14:46:26 +00:00
Of course this description of the `RCU` is very simplified. To understand some details about `RCU`, first of all we need to learn some terminology. Data readers in the `RCU` executed in the [critical section](http://en.wikipedia.org/wiki/Critical_section). Every time when data reader get to the critical section, it calls the `rcu_read_lock`, and `rcu_read_unlock` on exit from the critical section. If the thread is not in the critical section, it will be in state which called - `quiescent state`. The moment when every thread is in the `quiescent state` called - `grace period`. If a thread wants to remove an element from the data structure, this occurs in two steps. First step is `removal` - atomically removes element from the data structure, but does not release the physical memory. After this thread-writer announces and waits until it is finished. From this moment, the removed element is available to the thread-readers. After the `grace period` finished, the second step of the element removal will be started, it just removes the element from the physical memory.
2015-05-17 13:59:01 +00:00
There a couple of implementations of the `RCU`. Old `RCU` called classic, the new implementation called `tree` RCU. As you may already understand, the `CONFIG_TREE_RCU` kernel configuration option enables tree `RCU`. Another is the `tiny` RCU which depends on `CONFIG_TINY_RCU` and `CONFIG_SMP=n`. We will see more details about the `RCU` in general in the separate chapter about synchronization primitives, but now let's look on the `rcu_init` implementation from the [kernel/rcu/tree.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/rcu/tree.c):
2015-05-17 13:59:01 +00:00
```C
void __init rcu_init(void)
{
int cpu;
rcu_bootup_announce();
rcu_init_geometry();
rcu_init_one(&rcu_bh_state, &rcu_bh_data);
rcu_init_one(&rcu_sched_state, &rcu_sched_data);
__rcu_init_preempt();
open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
/*
* We don't need protection against CPU-hotplug here because
* this is called early in boot, before either interrupts
* or the scheduler are operational.
*/
cpu_notifier(rcu_cpu_notify, 0);
pm_notifier(rcu_pm_notify, 0);
for_each_online_cpu(cpu)
rcu_cpu_notify(NULL, CPU_UP_PREPARE, (void *)(long)cpu);
rcu_early_boot_tests();
}
```
In the beginning of the `rcu_init` function we define `cpu` variable and call `rcu_bootup_announce`. The `rcu_bootup_announce` function is pretty simple:
```C
static void __init rcu_bootup_announce(void)
{
pr_info("Hierarchical RCU implementation.\n");
rcu_bootup_announce_oddness();
}
```
2016-05-19 14:46:26 +00:00
It just prints information about the `RCU` with the `pr_info` function and `rcu_bootup_announce_oddness` which uses `pr_info` too, for printing different information about the current `RCU` configuration which depends on different kernel configuration options like `CONFIG_RCU_TRACE`, `CONFIG_PROVE_RCU`, `CONFIG_RCU_FANOUT_EXACT`, etc. In the next step, we can see the call of the `rcu_init_geometry` function. This function is defined in the same source code file and computes the node tree geometry depends on the amount of CPUs. Actually `RCU` provides scalability with extremely low internal RCU lock contention. What if a data structure will be read from the different CPUs? `RCU` API provides the `rcu_state` structure which presents RCU global state including node hierarchy. Hierarchy is presented by the:
2015-05-17 13:59:01 +00:00
```
struct rcu_node node[NUM_RCU_NODES];
```
2016-01-12 07:43:54 +00:00
array of structures. As we can read in the comment of above definition:
2015-05-17 13:59:01 +00:00
```
The root (first level) of the hierarchy is in ->node[0] (referenced by ->level[0]), the second
level in ->node[1] through ->node[m] (->node[1] referenced by ->level[1]), and the third level
in ->node[m+1] and following (->node[m+1] referenced by ->level[2]). The number of levels is
determined by the number of CPUs and by CONFIG_RCU_FANOUT.
Small systems will have a "hierarchy" consisting of a single rcu_node.
```
The `rcu_node` structure is defined in the [kernel/rcu/tree.h](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/rcu/tree.h) and contains information about current grace period, is grace period completed or not, CPUs or groups that need to switch in order for current grace period to proceed, etc. Every `rcu_node` contains a lock for a couple of CPUs. These `rcu_node` structures are embedded into a linear array in the `rcu_state` structure and represented as a tree with the root as the first element and covers all CPUs. As you can see the number of the rcu nodes determined by the `NUM_RCU_NODES` which depends on number of available CPUs:
2015-05-17 13:59:01 +00:00
```C
#define NUM_RCU_NODES (RCU_SUM - NR_CPUS)
#define RCU_SUM (NUM_RCU_LVL_0 + NUM_RCU_LVL_1 + NUM_RCU_LVL_2 + NUM_RCU_LVL_3 + NUM_RCU_LVL_4)
```
where levels values depend on the `CONFIG_RCU_FANOUT_LEAF` configuration option. For example for the simplest case, one `rcu_node` will cover two CPU on machine with the eight CPUs:
```
+-----------------------------------------------------------------+
| rcu_state |
| +----------------------+ |
| | root | |
| | rcu_node | |
| +----------------------+ |
| | | |
| +----v-----+ +--v-------+ |
| | | | | |
| | rcu_node | | rcu_node | |
| | | | | |
| +------------------+ +----------------+ |
| | | | | |
| | | | | |
| +----v-----+ +-------v--+ +-v--------+ +-v--------+ |
| | | | | | | | | |
| | rcu_node | | rcu_node | | rcu_node | | rcu_node | |
| | | | | | | | | |
| +----------+ +----------+ +----------+ +----------+ |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
+---------|-----------------|-------------|---------------|-------+
| | | |
+---------v-----------------v-------------v---------------v--------+
| | | | |
| CPU1 | CPU3 | CPU5 | CPU7 |
| | | | |
| CPU2 | CPU4 | CPU6 | CPU8 |
| | | | |
+------------------------------------------------------------------+
```
So, in the `rcu_init_geometry` function we just need to calculate the total number of `rcu_node` structures. We start to do it with the calculation of the `jiffies` till to the first and next `fqs` which is `force-quiescent-state` (read above about it):
```C
d = RCU_JIFFIES_TILL_FORCE_QS + nr_cpu_ids / RCU_JIFFIES_FQS_DIV;
if (jiffies_till_first_fqs == ULONG_MAX)
jiffies_till_first_fqs = d;
if (jiffies_till_next_fqs == ULONG_MAX)
jiffies_till_next_fqs = d;
```
where:
```C
#define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
#define RCU_JIFFIES_FQS_DIV 256
```
As we calculated these [jiffies](http://en.wikipedia.org/wiki/Jiffy_%28time%29), we check that previous defined `jiffies_till_first_fqs` and `jiffies_till_next_fqs` variables are equal to the [ULONG_MAX](http://www.rowleydownload.co.uk/avr/documentation/index.htm?http://www.rowleydownload.co.uk/avr/documentation/ULONG_MAX.htm) (their default values) and set they equal to the calculated value. As we did not touch these variables before, they are equal to the `ULONG_MAX`:
```C
static ulong jiffies_till_first_fqs = ULONG_MAX;
static ulong jiffies_till_next_fqs = ULONG_MAX;
```
2016-05-19 14:46:26 +00:00
In the next step of the `rcu_init_geometry`, we check that `rcu_fanout_leaf` didn't change (it has the same value as `CONFIG_RCU_FANOUT_LEAF` in compile-time) and equal to the value of the `CONFIG_RCU_FANOUT_LEAF` configuration option, we just return:
2015-05-17 13:59:01 +00:00
```C
if (rcu_fanout_leaf == CONFIG_RCU_FANOUT_LEAF &&
nr_cpu_ids == NR_CPUS)
return;
```
2016-01-12 07:43:54 +00:00
After this we need to compute the number of nodes that an `rcu_node` tree can handle with the given number of levels:
2015-05-17 13:59:01 +00:00
```C
rcu_capacity[0] = 1;
rcu_capacity[1] = rcu_fanout_leaf;
for (i = 2; i <= MAX_RCU_LVLS; i++)
rcu_capacity[i] = rcu_capacity[i - 1] * CONFIG_RCU_FANOUT;
```
And in the last step we calculate the number of rcu_nodes at each level of the tree in the [loop](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/rcu/tree.c#L4094).
2015-05-17 13:59:01 +00:00
2016-01-12 07:43:54 +00:00
As we calculated geometry of the `rcu_node` tree, we need to go back to the `rcu_init` function and next step we need to initialize two `rcu_state` structures with the `rcu_init_one` function:
2015-05-17 13:59:01 +00:00
```C
rcu_init_one(&rcu_bh_state, &rcu_bh_data);
rcu_init_one(&rcu_sched_state, &rcu_sched_data);
```
The `rcu_init_one` function takes two arguments:
* Global `RCU` state;
* Per-CPU data for `RCU`.
Both variables defined in the [kernel/rcu/tree.h](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/rcu/tree.h) with its `percpu` data:
2015-05-17 13:59:01 +00:00
```
extern struct rcu_state rcu_bh_state;
DECLARE_PER_CPU(struct rcu_data, rcu_bh_data);
```
2016-01-12 07:43:54 +00:00
About this states you can read [here](http://lwn.net/Articles/264090/). As I wrote above we need to initialize `rcu_state` structures and `rcu_init_one` function will help us with it. After the `rcu_state` initialization, we can see the call of the ` __rcu_init_preempt` which depends on the `CONFIG_PREEMPT_RCU` kernel configuration option. It does the same as previous functions - initialization of the `rcu_preempt_state` structure with the `rcu_init_one` function which has `rcu_state` type. After this, in the `rcu_init`, we can see the call of the:
2015-05-17 13:59:01 +00:00
```C
open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
```
2016-05-19 14:46:26 +00:00
function. This function registers a handler of the `pending interrupt`. Pending interrupt or `softirq` supposes that part of actions can be delayed for later execution when the system is less loaded. Pending interrupts is represented by the following structure:
2015-05-17 13:59:01 +00:00
```C
struct softirq_action
{
void (*action)(struct softirq_action *);
};
```
which is defined in the [include/linux/interrupt.h](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/include/linux/interrupt.h) and contains only one field - handler of an interrupt. You can check about `softirqs` in the your system with the:
2015-05-17 13:59:01 +00:00
```
$ cat /proc/softirqs
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
HI: 2 0 0 1 0 2 0 0
TIMER: 137779 108110 139573 107647 107408 114972 99653 98665
NET_TX: 1127 0 4 0 1 1 0 0
NET_RX: 334 221 132939 3076 451 361 292 303
BLOCK: 5253 5596 8 779 2016 37442 28 2855
BLOCK_IOPOLL: 0 0 0 0 0 0 0 0
TASKLET: 66 0 2916 113 0 24 26708 0
SCHED: 102350 75950 91705 75356 75323 82627 69279 69914
HRTIMER: 510 302 368 260 219 255 248 246
RCU: 81290 68062 82979 69015 68390 69385 63304 63473
```
The `open_softirq` function takes two parameters:
* index of the interrupt;
* interrupt handler.
and adds interrupt handler to the array of the pending interrupts:
```C
void open_softirq(int nr, void (*action)(struct softirq_action *))
{
softirq_vec[nr].action = action;
}
```
In our case the interrupt handler is - `rcu_process_callbacks` which is defined in the [kernel/rcu/tree.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/rcu/tree.c) and does the `RCU` core processing for the current CPU. After we registered `softirq` interrupt for the `RCU`, we can see the following code:
2015-05-17 13:59:01 +00:00
```C
cpu_notifier(rcu_cpu_notify, 0);
pm_notifier(rcu_pm_notify, 0);
for_each_online_cpu(cpu)
rcu_cpu_notify(NULL, CPU_UP_PREPARE, (void *)(long)cpu);
```
2016-05-19 14:46:26 +00:00
Here we can see registration of the `cpu` notifier which needs in systems which supports [CPU hotplug](https://www.kernel.org/doc/Documentation/cpu-hotplug.txt) and we will not dive into details about this theme. The last function in the `rcu_init` is the `rcu_early_boot_tests`:
2015-05-17 13:59:01 +00:00
```C
void rcu_early_boot_tests(void)
{
pr_info("Running RCU self tests\n");
if (rcu_self_test)
early_boot_test_call_rcu();
if (rcu_self_test_bh)
early_boot_test_call_rcu_bh();
if (rcu_self_test_sched)
early_boot_test_call_rcu_sched();
}
```
which runs self tests for the `RCU`.
That's all. We saw initialization process of the `RCU` subsystem. As I wrote above, more about the `RCU` will be in the separate chapter about synchronization primitives.
Rest of the initialization process
--------------------------------------------------------------------------------
2016-01-12 07:43:54 +00:00
Ok, we already passed the main theme of this part which is `RCU` initialization, but it is not the end of the linux kernel initialization process. In the last paragraph of this theme we will see a couple of functions which work in the initialization time, but we will not dive into deep details around this function for different reasons. Some reasons not to dive into details are following:
2015-05-17 13:59:01 +00:00
2016-01-12 07:43:54 +00:00
* They are not very important for the generic kernel initialization process and depend on the different kernel configuration;
* They have the character of debugging and not important for now;
2015-05-17 13:59:01 +00:00
* We will see many of this stuff in the separate parts/chapters.
After we initialized `RCU`, the next step which you can see in the [init/main.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/init/main.c) is the - `trace_init` function. As you can understand from its name, this function initialize [tracing](http://en.wikipedia.org/wiki/Tracing_%28software%29) subsystem. You can read more about linux kernel trace system - [here](http://elinux.org/Kernel_Trace_Systems).
2015-05-17 13:59:01 +00:00
After the `trace_init`, we can see the call of the `radix_tree_init`. If you are familiar with the different data structures, you can understand from the name of this function that it initializes kernel implementation of the [Radix tree](http://en.wikipedia.org/wiki/Radix_tree). This function is defined in the [lib/radix-tree.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/lib/radix-tree.c) and you can read more about it in the part about [Radix tree](https://0xax.gitbook.io/linux-insides/summary/datastructures/linux-datastructures-2).
2015-05-17 13:59:01 +00:00
In the next step we can see the functions which are related to the `interrupts handling` subsystem, they are:
* `early_irq_init`
* `init_IRQ`
* `softirq_init`
2016-01-12 07:43:54 +00:00
We will see explanation about this functions and their implementation in the special part about interrupts and exceptions handling. After this many different functions (like `init_timers`, `hrtimers_init`, `time_init`, etc.) which are related to different timing and timers stuff. We will see more about these function in the chapter about timers.
2015-05-17 13:59:01 +00:00
2016-01-12 07:43:54 +00:00
The next couple of functions are related with the [perf](https://perf.wiki.kernel.org/index.php/Main_Page) events - `perf_event-init` (there will be separate chapter about perf), initialization of the `profiling` with the `profile_init`. After this we enable `irq` with the call of the:
2015-05-17 13:59:01 +00:00
```C
local_irq_enable();
```
which expands to the `sti` instruction and making post initialization of the [SLAB](http://en.wikipedia.org/wiki/Slab_allocation) with the call of the `kmem_cache_init_late` function (As I wrote above we will know about the `SLAB` in the [Linux memory management](https://0xax.gitbook.io/linux-insides/summary/mm) chapter).
2015-05-17 13:59:01 +00:00
After the post initialization of the `SLAB`, next point is initialization of the console with the `console_init` function from the [drivers/tty/tty_io.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/drivers/tty/tty_io.c).
2015-05-17 13:59:01 +00:00
2016-01-12 07:43:54 +00:00
After the console initialization, we can see the `lockdep_info` function which prints information about the [Lock dependency validator](https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt). After this, we can see the initialization of the dynamic allocation of the `debug objects` with the `debug_objects_mem_init`, kernel memory leak [detector](https://www.kernel.org/doc/Documentation/kmemleak.txt) initialization with the `kmemleak_init`, `percpu` pageset setup with the `setup_per_cpu_pageset`, setup of the [NUMA](http://en.wikipedia.org/wiki/Non-uniform_memory_access) policy with the `numa_policy_init`, setting time for the scheduler with the `sched_clock_init`, `pidmap` initialization with the call of the `pidmap_init` function for the initial `PID` namespace, cache creation with the `anon_vma_init` for the private virtual memory areas and early initialization of the [ACPI](http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface) with the `acpi_early_init`.
2015-05-17 13:59:01 +00:00
This is the end of the ninth part of the [linux kernel initialization process](https://0xax.gitbook.io/linux-insides/summary/initialization) and here we saw initialization of the [RCU](http://en.wikipedia.org/wiki/Read-copy-update). In the last paragraph of this part (`Rest of the initialization process`) we will go through many functions but did not dive into details about their implementations. Do not worry if you do not know anything about these stuff or you know and do not understand anything about this. As I already wrote many times, we will see details of implementations in other parts or other chapters.
2015-05-17 13:59:01 +00:00
Conclusion
--------------------------------------------------------------------------------
It is the end of the ninth part about the linux kernel [initialization process](https://0xax.gitbook.io/linux-insides/summary/initialization). In this part, we looked on the initialization process of the `RCU` subsystem. In the next part we will continue to dive into linux kernel initialization process and I hope that we will finish with the `start_kernel` function and will go to the `rest_init` function from the same [init/main.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/init/main.c) source code file and will see the start of the first process.
2015-05-17 13:59:01 +00:00
2016-01-05 03:01:32 +00:00
If you have any questions or suggestions write me a comment or ping me at [twitter](https://twitter.com/0xAX).
2015-05-17 13:59:01 +00:00
2016-01-05 03:01:32 +00:00
**Please note that English is not my first language, And I am really sorry for any inconvenience. If you find any mistakes please send me PR to [linux-insides](https://github.com/0xAX/linux-insides).**
2015-05-17 13:59:01 +00:00
Links
--------------------------------------------------------------------------------
* [lock-free data structures](http://en.wikipedia.org/wiki/Concurrent_data_structure)
* [kmemleak](https://www.kernel.org/doc/Documentation/kmemleak.txt)
* [ACPI](http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface)
* [IRQs](http://en.wikipedia.org/wiki/Interrupt_request_%28PC_architecture%29)
* [RCU](http://en.wikipedia.org/wiki/Read-copy-update)
* [RCU documentation](https://github.com/torvalds/linux/tree/master/Documentation/RCU)
* [integer ID management](https://lwn.net/Articles/103209/)
* [Documentation/memory-barriers.txt](https://www.kernel.org/doc/Documentation/memory-barriers.txt)
* [Runtime locking correctness validator](https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt)
* [Per-CPU variables](https://0xax.gitbook.io/linux-insides/summary/concepts/linux-cpu-1)
* [Linux kernel memory management](https://0xax.gitbook.io/linux-insides/summary/mm)
2015-05-17 13:59:01 +00:00
* [slab](http://en.wikipedia.org/wiki/Slab_allocation)
* [i2c](http://en.wikipedia.org/wiki/I%C2%B2C)
* [Previous part](https://0xax.gitbook.io/linux-insides/summary/initialization/linux-initialization-8)