mirror of
https://github.com/0xAX/linux-insides.git
synced 2025-01-05 05:10:55 +00:00
Minor typos and grammatical fixes
parent
a5b4ec093b
commit
8da68308db
@ -1,9 +1,9 @@
|
|||||||
Per-CPU variables
|
Per-CPU variables
|
||||||
================================================================================
|
================================================================================
|
||||||
|
|
||||||
Per-CPU variables are one of the kernel features. You can understand what this feature means by reading its name. We can create a variable and each processor core will have its own copy of this variable. We take a closer look on this feature and try to understand how it is implemented and how it works in this part.
|
Per-CPU variables are one of the kernel features. You can understand what this feature means by reading its name. We can create a variable and each processor core will have its own copy of this variable. In this part, we take a closer look at this feature and try to understand how it is implemented and how it works.
|
||||||
|
|
||||||
The kernel provides API for creating per-cpu variables - `DEFINE_PER_CPU` macro:
|
The kernel provides an API for creating per-cpu variables - the `DEFINE_PER_CPU` macro:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
#define DEFINE_PER_CPU(type, name) \
|
#define DEFINE_PER_CPU(type, name) \
|
||||||
@ -12,13 +12,13 @@ The kernel provides API for creating per-cpu variables - `DEFINE_PER_CPU` macro:
|
|||||||
|
|
||||||
This macro is defined in [include/linux/percpu-defs.h](https://github.com/torvalds/linux/blob/master/include/linux/percpu-defs.h), as are many other macros for working with per-cpu variables. Now we will see how this feature is implemented.
|
This macro is defined in [include/linux/percpu-defs.h](https://github.com/torvalds/linux/blob/master/include/linux/percpu-defs.h), as are many other macros for working with per-cpu variables. Now we will see how this feature is implemented.
|
||||||
|
|
||||||
Take a look at the `DECLARE_PER_CPU` definition. We see that it takes 2 parameters: `type` and `name`, so we can use it to create per-cpu variable, for example like this:
|
Take a look at the `DEFINE_PER_CPU` definition. We see that it takes 2 parameters: `type` and `name`, so we can use it to create per-cpu variables, for example like this:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
DEFINE_PER_CPU(int, per_cpu_n)
|
DEFINE_PER_CPU(int, per_cpu_n)
|
||||||
```
|
```
|
||||||
|
|
||||||
We pass the type and the name of our variable. `DEFI_PER_CPU` calls `DEFINE_PER_CPU_SECTION` macro and passes the same two paramaters and empty string to it. Let's look at the definition of the `DEFINE_PER_CPU_SECTION`:
|
We pass the type and the name of our variable. `DEFINE_PER_CPU` calls the `DEFINE_PER_CPU_SECTION` macro and passes the same two parameters and an empty string to it. Let's look at the definition of `DEFINE_PER_CPU_SECTION`:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
#define DEFINE_PER_CPU_SECTION(type, name, sec) \
|
#define DEFINE_PER_CPU_SECTION(type, name, sec) \
|
||||||
@ -32,35 +32,35 @@ We pass the type and the name of our variable. `DEFI_PER_CPU` calls `DEFINE_PER_
|
|||||||
PER_CPU_ATTRIBUTES
|
PER_CPU_ATTRIBUTES
|
||||||
```
|
```
|
||||||
|
|
||||||
where section is:
|
where `section` is:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
#define PER_CPU_BASE_SECTION ".data..percpu"
|
#define PER_CPU_BASE_SECTION ".data..percpu"
|
||||||
```
|
```
|
||||||
|
|
||||||
After all macros are expanded we will get global per-cpu variable:
|
After all macros are expanded we will get a global per-cpu variable:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
__attribute__((section(".data..percpu"))) int per_cpu_n
|
__attribute__((section(".data..percpu"))) int per_cpu_n
|
||||||
```
|
```
|
||||||
|
|
||||||
It means that we will have `per_cpu_n` variable in the `.data..percpu` section. We can find this section in the `vmlinux`:
|
It means that we will have a `per_cpu_n` variable in the `.data..percpu` section. We can find this section in the `vmlinux`:
|
||||||
|
|
||||||
```
|
```
|
||||||
.data..percpu 00013a58 0000000000000000 0000000001a5c000 00e00000 2**12
|
.data..percpu 00013a58 0000000000000000 0000000001a5c000 00e00000 2**12
|
||||||
CONTENTS, ALLOC, LOAD, DATA
|
CONTENTS, ALLOC, LOAD, DATA
|
||||||
```
|
```
|
||||||
|
|
||||||
Ok, now we know that when we use `DEFINE_PER_CPU` macro, per-cpu variable in the `.data..percpu` section will be created. When the kernel initilizes it calls `setup_per_cpu_areas` function which loads `.data..percpu` section multiply times, one section per CPU.
|
Ok, now we know that when we use the `DEFINE_PER_CPU` macro, a per-cpu variable in the `.data..percpu` section will be created. When the kernel initializes it calls the `setup_per_cpu_areas` function which loads the `.data..percpu` section multiple times, one section per CPU.
|
||||||
|
|
||||||
Let's look on the per-CPU areas initialization process. It start in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) from the call of the `setup_per_cpu_areas` function which defined in the [arch/x86/kernel/setup_percpu.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/setup_percpu.c).
|
Let's look at the per-CPU areas initialization process. It starts in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) from the call of the `setup_per_cpu_areas` function which is defined in the [arch/x86/kernel/setup_percpu.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/setup_percpu.c).
|
||||||
|
|
||||||
```C
|
```C
|
||||||
pr_info("NR_CPUS:%d nr_cpumask_bits:%d nr_cpu_ids:%d nr_node_ids:%d\n",
|
pr_info("NR_CPUS:%d nr_cpumask_bits:%d nr_cpu_ids:%d nr_node_ids:%d\n",
|
||||||
NR_CPUS, nr_cpumask_bits, nr_cpu_ids, nr_node_ids);
|
NR_CPUS, nr_cpumask_bits, nr_cpu_ids, nr_node_ids);
|
||||||
```
|
```
|
||||||
|
|
||||||
The `setup_per_cpu_areas` starts from the output information about the Maximum number of CPUs set during kernel configuration with `CONFIG_NR_CPUS` configuration option, actual number of CPUs, `nr_cpumask_bits` is the same that `NR_CPUS` bit for the new `cpumask` operators and number of `NUMA` nodes.
|
The `setup_per_cpu_areas` function starts by printing the maximum number of CPUs set during kernel configuration with the `CONFIG_NR_CPUS` configuration option, the actual number of CPUs, `nr_cpumask_bits` (the same as `NR_CPUS`, for the new `cpumask` operators), and the number of `NUMA` nodes.
|
||||||
|
|
||||||
We can see this output in the dmesg:
|
We can see this output in the dmesg:
|
||||||
|
|
||||||
@ -69,7 +69,7 @@ $ dmesg | grep percpu
|
|||||||
[ 0.000000] setup_percpu: NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:8 nr_node_ids:1
|
[ 0.000000] setup_percpu: NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:8 nr_node_ids:1
|
||||||
```
|
```
|
||||||
|
|
||||||
In the next step we check `percpu` first chunk allocator. All percpu areas are allocated in chunks. First chunk is used for the static percpu variables. Linux kernel has `percpu_alloc` command line parameters which provides type of the first chunk allocator. We can read about it in the kernel documentation:
|
In the next step we check the `percpu` first chunk allocator. All percpu areas are allocated in chunks. The first chunk is used for the static percpu variables. The Linux kernel has the `percpu_alloc` command line parameter, which selects the type of the first chunk allocator. We can read about it in the kernel documentation:
|
||||||
|
|
||||||
```
|
```
|
||||||
percpu_alloc= Select which percpu first chunk allocator to use.
|
percpu_alloc= Select which percpu first chunk allocator to use.
|
||||||
@ -80,21 +80,21 @@ percpu_alloc= Select which percpu first chunk allocator to use.
|
|||||||
and performance comparison.
|
and performance comparison.
|
||||||
```
|
```
|
||||||
|
|
||||||
The [mm/percpu.c](https://github.com/torvalds/linux/blob/master/mm/percpu.c) contains handler of this command line option:
|
The [mm/percpu.c](https://github.com/torvalds/linux/blob/master/mm/percpu.c) contains the handler of this command line option:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
early_param("percpu_alloc", percpu_alloc_setup);
|
early_param("percpu_alloc", percpu_alloc_setup);
|
||||||
```
|
```
|
||||||
|
|
||||||
Where `percpu_alloc_setup` function sets the `pcpu_chosen_fc` variable depends on the `percpu_alloc` parameter value. By default first chunk allocator is `auto`:
|
Where the `percpu_alloc_setup` function sets the `pcpu_chosen_fc` variable depending on the `percpu_alloc` parameter value. By default the first chunk allocator is `auto`:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
enum pcpu_fc pcpu_chosen_fc __initdata = PCPU_FC_AUTO;
|
enum pcpu_fc pcpu_chosen_fc __initdata = PCPU_FC_AUTO;
|
||||||
```
|
```
|
||||||
|
|
||||||
If `percpu_alooc` parameter not given to the kernel command line, the `embed` allocator will be used wich as you can understand embed the first percpu chunk into bootmem with the [memblock](http://0xax.gitbooks.io/linux-insides/content/mm/linux-mm-1.html). The last allocator is first chunk `page` allocator which maps first chunk with `PAGE_SIZE` pages.
|
If the `percpu_alloc` parameter is not given to the kernel command line, the `embed` allocator will be used which embeds the first percpu chunk into bootmem with the [memblock](http://0xax.gitbooks.io/linux-insides/content/mm/linux-mm-1.html). The last allocator is the first chunk `page` allocator which maps the first chunk with `PAGE_SIZE` pages.
|
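To make the selection logic concrete, here is a minimal user-space sketch of how a `percpu_alloc=` value could be mapped to an allocator choice, and how the later "is it not `page`?" check follows from it. The helper names `parse_percpu_alloc` and `uses_embed_path` are hypothetical and exist only for this illustration; the kernel's real handler is `percpu_alloc_setup` in `mm/percpu.c` and may differ in detail.

```C
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* User-space stand-in for the allocator types named in the text. */
enum pcpu_fc { PCPU_FC_AUTO, PCPU_FC_EMBED, PCPU_FC_PAGE };

/*
 * Hypothetical helper: translate the value given after "percpu_alloc="
 * into an allocator type. A missing or unknown value keeps the default.
 */
static enum pcpu_fc parse_percpu_alloc(const char *value)
{
    if (value == NULL)
        return PCPU_FC_AUTO;
    if (strcmp(value, "embed") == 0)
        return PCPU_FC_EMBED;
    if (strcmp(value, "page") == 0)
        return PCPU_FC_PAGE;
    return PCPU_FC_AUTO;
}

/*
 * Hypothetical helper mirroring the `pcpu_chosen_fc != PCPU_FC_PAGE`
 * test: everything except `page` goes down the embed path.
 */
static int uses_embed_path(enum pcpu_fc fc)
{
    assert(fc == PCPU_FC_AUTO || fc == PCPU_FC_EMBED || fc == PCPU_FC_PAGE);
    return fc != PCPU_FC_PAGE;
}
```

Under these assumptions, both `auto` and `embed` end up on the embed path, which matches the description above.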
||||||
|
|
||||||
As I wrote about first of all we make a check of the first chunk allocator type in the `setup_per_cpu_areas`. First of all we check that first chunk allocator is not page:
|
As mentioned above, we first check the first chunk allocator type in `setup_per_cpu_areas`. In particular, we check that the first chunk allocator is not `page`:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
if (pcpu_chosen_fc != PCPU_FC_PAGE) {
|
if (pcpu_chosen_fc != PCPU_FC_PAGE) {
|
||||||
@ -104,7 +104,7 @@ if (pcpu_chosen_fc != PCPU_FC_PAGE) {
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
If it is not `PCPU_FC_PAGE`, we will use `embed` allocator and allocate space for the first chunk with the `pcpu_embed_first_chunk` function:
|
If it is not `PCPU_FC_PAGE`, we will use the `embed` allocator and allocate space for the first chunk with the `pcpu_embed_first_chunk` function:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
rc = pcpu_embed_first_chunk(PERCPU_FIRST_CHUNK_RESERVE,
|
rc = pcpu_embed_first_chunk(PERCPU_FIRST_CHUNK_RESERVE,
|
||||||
@ -116,13 +116,13 @@ rc = pcpu_embed_first_chunk(PERCPU_FIRST_CHUNK_RESERVE,
|
|||||||
As I wrote above, the `pcpu_embed_first_chunk` function embeds the first percpu chunk into bootmem. As you can see, we pass a couple of parameters to `pcpu_embed_first_chunk`; they are:
|
As I wrote above, the `pcpu_embed_first_chunk` function embeds the first percpu chunk into bootmem. As you can see, we pass a couple of parameters to `pcpu_embed_first_chunk`; they are:
|
||||||
|
|
||||||
* `PERCPU_FIRST_CHUNK_RESERVE` - the size of the reserved space for the static `percpu` variables;
|
* `PERCPU_FIRST_CHUNK_RESERVE` - the size of the reserved space for the static `percpu` variables;
|
||||||
* `dyn_size` - minimum free size for dynamic allocation in byte;
|
* `dyn_size` - minimum free size for dynamic allocation in bytes;
|
||||||
* `atom_size` - all allocations are whole multiples of this and aligned to this parameter;
|
* `atom_size` - all allocations are whole multiples of this and aligned to this parameter;
|
||||||
* `pcpu_cpu_distance` - callback to determine distance between cpus;
|
* `pcpu_cpu_distance` - callback to determine distance between cpus;
|
||||||
* `pcpu_fc_alloc` - function to allocate `percpu` page;
|
* `pcpu_fc_alloc` - function to allocate `percpu` page;
|
||||||
* `pcpu_fc_free` - function to release `percpu` page.
|
* `pcpu_fc_free` - function to release `percpu` page.
|
||||||
|
|
||||||
All of this parameters we calculat before the call of the `pcpu_embed_first_chunk`:
|
All of these parameters we calculate before the call of the `pcpu_embed_first_chunk`:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
const size_t dyn_size = PERCPU_MODULE_RESERVE + PERCPU_DYNAMIC_RESERVE - PERCPU_FIRST_CHUNK_RESERVE;
|
const size_t dyn_size = PERCPU_MODULE_RESERVE + PERCPU_DYNAMIC_RESERVE - PERCPU_FIRST_CHUNK_RESERVE;
|
||||||
@ -134,15 +134,15 @@ size_t atom_size;
|
|||||||
#endif
|
#endif
|
||||||
```
|
```
|
||||||
|
|
||||||
If first chunk allocator is `PCPU_FC_PAGE`, we will use the `pcpu_page_first_chunk` instead of the `pcpu_embed_first_chunk`. After that `percpu` areas up, we setup `percpu` offset and its segment for the every CPU with the `setup_percpu_segment` function (only for `x86` systems) and move some early data from the arrays to the `percpu` variables (`x86_cpu_to_apicid`, `irq_stack_ptr` and etc...). After the kernel finished the initialization process, we have loaded N `.data..percpu` sections, where N is the number of CPU, and section used by bootstrap processor will contain uninitialized variable created with `DEFINE_PER_CPU` macro.
|
If the first chunk allocator is `PCPU_FC_PAGE`, we will use `pcpu_page_first_chunk` instead of `pcpu_embed_first_chunk`. Once the `percpu` areas are up, we set up the `percpu` offset and its segment for every CPU with the `setup_percpu_segment` function (only for `x86` systems) and move some early data from the arrays to the `percpu` variables (`x86_cpu_to_apicid`, `irq_stack_ptr`, etc.). After the kernel finishes the initialization process, we will have loaded N `.data..percpu` sections, where N is the number of CPUs, and the section used by the bootstrap processor will contain an uninitialized variable created with the `DEFINE_PER_CPU` macro.
|
||||||
|
|
||||||
The kernel provides API for per-cpu variables manipulating:
|
The kernel provides an API for manipulating per-cpu variables:
|
||||||
|
|
||||||
* get_cpu_var(var)
|
* get_cpu_var(var)
|
||||||
* put_cpu_var(var)
|
* put_cpu_var(var)
|
||||||
|
|
||||||
|
|
||||||
Let's look at `get_cpu_var` implementation:
|
Let's look at the `get_cpu_var` implementation:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
#define get_cpu_var(var) \
|
#define get_cpu_var(var) \
|
||||||
@ -152,7 +152,7 @@ Let's look at `get_cpu_var` implementation:
|
|||||||
}))
|
}))
|
||||||
```
|
```
|
||||||
|
|
||||||
Linux kernel is preemptible and accessing a per-cpu variable requires to know which processor kernel running on. So, current code must not be preempted and moved to the another CPU while accessing a per-cpu variable. That's why first of all we can see call of the `preempt_disable` function. After this we can see call of the `this_cpu_ptr` macro, which looks as:
|
The Linux kernel is preemptible, and accessing a per-cpu variable requires us to know which processor the kernel is running on. So the current code must not be preempted and moved to another CPU while accessing a per-cpu variable. That's why we first see a call of the `preempt_disable` function. After this we can see a call of the `this_cpu_ptr` macro, which looks like:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
#define this_cpu_ptr(ptr) raw_cpu_ptr(ptr)
|
#define this_cpu_ptr(ptr) raw_cpu_ptr(ptr)
|
||||||
@ -164,7 +164,7 @@ and
|
|||||||
#define raw_cpu_ptr(ptr) per_cpu_ptr(ptr, 0)
|
#define raw_cpu_ptr(ptr) per_cpu_ptr(ptr, 0)
|
||||||
```
|
```
|
||||||
|
|
||||||
where `per_cpu_ptr` returns a pointer to the per-cpu variable for the given cpu (second parameter). After that we got per-cpu variables and made any manipulations on it, we must call `put_cpu_var` macro which enables preemption with call of `preempt_enable` function. So the typical usage of a per-cpu variable is following:
|
where `per_cpu_ptr` returns a pointer to the per-cpu variable for the given cpu (second parameter). After we have got the per-cpu variable and made modifications to it, we must call the `put_cpu_var` macro, which enables preemption with a call of the `preempt_enable` function. So the typical usage of a per-cpu variable is as follows:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
get_cpu_var(var);
|
get_cpu_var(var);
|
||||||
@ -174,7 +174,7 @@ get_cpu_var(var);
|
|||||||
put_cpu_var(var);
|
put_cpu_var(var);
|
||||||
```
|
```
|
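The get/modify/put pattern can be modelled in user space to show why the two calls must be paired. This is only a sketch under stated assumptions: every name ending in `_sim` is hypothetical, the "current CPU" is a plain variable, and preemption is modelled as a counter rather than anything the scheduler actually looks at.

```C
#include <assert.h>

#define NR_CPUS_SIM 4                    /* hypothetical CPU count */

static int preempt_count_sim;            /* models the preemption counter */
static int current_cpu_sim;              /* models "the CPU we are running on" */
static int per_cpu_n_sim[NR_CPUS_SIM];   /* one copy of the variable per CPU */

/* Like get_cpu_var: disable preemption first, then hand out this CPU's copy. */
static int *get_cpu_var_sim(void)
{
    assert(current_cpu_sim >= 0 && current_cpu_sim < NR_CPUS_SIM);
    preempt_count_sim++;                 /* models preempt_disable() */
    return &per_cpu_n_sim[current_cpu_sim];
}

/* Like put_cpu_var: re-enable preemption once we are done with the copy. */
static void put_cpu_var_sim(void)
{
    assert(preempt_count_sim > 0);       /* a put without a get is a bug */
    preempt_count_sim--;                 /* models preempt_enable() */
}

/* Typical usage: get, modify, put. */
static void bump_counter_sim(void)
{
    int *n = get_cpu_var_sim();
    (*n)++;
    put_cpu_var_sim();
}
```

Between the get and the put the counter is non-zero, which in this model corresponds to the window in which the code cannot be moved to another CPU.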
||||||
|
|
||||||
Let's look at `per_cpu_ptr` macro:
|
Let's look at the `per_cpu_ptr` macro:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
#define per_cpu_ptr(ptr, cpu) \
|
#define per_cpu_ptr(ptr, cpu) \
|
||||||
@ -184,7 +184,7 @@ Let's look at `per_cpu_ptr` macro:
|
|||||||
})
|
})
|
||||||
```
|
```
|
||||||
|
|
||||||
As I wrote above, this macro returns per-cpu variable for the given cpu. First of all it calls `__verify_pcpu_ptr`:
|
As I wrote above, this macro returns a per-cpu variable for the given cpu. First of all it calls `__verify_pcpu_ptr`:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
#define __verify_pcpu_ptr(ptr)
|
#define __verify_pcpu_ptr(ptr)
|
||||||
@ -194,37 +194,37 @@ do {
|
|||||||
} while (0)
|
} while (0)
|
||||||
```
|
```
|
||||||
|
|
||||||
which makes given `ptr` type of `const void __percpu *`,
|
which checks that the given `ptr` is of type `const void __percpu *`.
|
||||||
|
|
||||||
After this we can see the call of the `SHIFT_PERCPU_PTR` macro with two parameters. At first parameter we pass our ptr and sencond we pass cpu number to the `per_cpu_offset` macro which:
|
After this we can see the call of the `SHIFT_PERCPU_PTR` macro with two parameters. As the first parameter we pass our `ptr`, and as the second we pass the cpu number to the `per_cpu_offset` macro:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
#define per_cpu_offset(x) (__per_cpu_offset[x])
|
#define per_cpu_offset(x) (__per_cpu_offset[x])
|
||||||
```
|
```
|
||||||
|
|
||||||
expands to getting `x` element from the `__per_cpu_offset` array:
|
which expands to getting the `x` element from the `__per_cpu_offset` array:
|
||||||
|
|
||||||
|
|
||||||
```C
|
```C
|
||||||
extern unsigned long __per_cpu_offset[NR_CPUS];
|
extern unsigned long __per_cpu_offset[NR_CPUS];
|
||||||
```
|
```
|
||||||
|
|
||||||
where `NR_CPUS` is the number of CPUs. `__per_cpu_offset` array filled with the distances between cpu-variables copies. For example all per-cpu data is `X` bytes size, so if we access `__per_cpu_offset[Y]`, so `X*Y` will be accessed. Let's look at the `SHIFT_PERCPU_PTR` implementation:
|
where `NR_CPUS` is the number of CPUs. The `__per_cpu_offset` array is filled with the distances between the per-cpu copies of the variables. For example, if all per-cpu data is `X` bytes in size, then reading `__per_cpu_offset[Y]` gives the offset `X*Y`. Let's look at the `SHIFT_PERCPU_PTR` implementation:
|
||||||
|
|
||||||
```C
|
```C
|
||||||
#define SHIFT_PERCPU_PTR(__p, __offset) \
|
#define SHIFT_PERCPU_PTR(__p, __offset) \
|
||||||
RELOC_HIDE((typeof(*(__p)) __kernel __force *)(__p), (__offset))
|
RELOC_HIDE((typeof(*(__p)) __kernel __force *)(__p), (__offset))
|
||||||
```
|
```
|
||||||
|
|
||||||
`RELOC_HIDE` just returns offset `(typeof(ptr)) (__ptr + (off))` and it will be pointer of the variable.
|
`RELOC_HIDE` just returns the shifted pointer `(typeof(ptr)) (__ptr + (off))`, which is a pointer to the variable.
|
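The offset lookup and pointer shift described above can be sketched in plain user-space C. This is an illustration rather than the kernel's code: the `_sim` names and the sizes are assumptions, the per-CPU areas are simply laid out back to back in one array, and area 0 plays the role of the original `.data..percpu` image.

```C
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SIM   4   /* hypothetical CPU count */
#define UNIT_INTS_SIM 16  /* hypothetical size of one per-CPU area, in ints */

/* One area per CPU; row 0 stands in for the original section. */
static int percpu_areas_sim[NR_CPUS_SIM][UNIT_INTS_SIM];

/* Stand-in for __per_cpu_offset[]: byte distance of each area from area 0. */
static size_t per_cpu_offset_sim[NR_CPUS_SIM];

/* Fill the offset table: area Y starts X*Y bytes after area 0. */
static void setup_per_cpu_areas_sim(void)
{
    for (size_t cpu = 0; cpu < NR_CPUS_SIM; cpu++)
        per_cpu_offset_sim[cpu] = cpu * UNIT_INTS_SIM * sizeof(int);
}

/*
 * Like per_cpu_ptr: take the address of the variable in area 0 and
 * shift it by the offset of the requested CPU.
 */
static int *per_cpu_ptr_sim(int *base, size_t cpu)
{
    assert(cpu < NR_CPUS_SIM);
    return (int *)((unsigned char *)base + per_cpu_offset_sim[cpu]);
}
```

After `setup_per_cpu_areas_sim()` has run, `per_cpu_ptr_sim(&percpu_areas_sim[0][3], 2)` points at the same slot in the third CPU's area, which is the whole trick: one symbol address plus a per-CPU offset.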
||||||
|
|
||||||
That's all! Of course it is not the full API, but the general part. It can be hard for the start, but to understand per-cpu variables feature need to understand mainly [include/linux/percpu-defs.h](https://github.com/torvalds/linux/blob/master/include/linux/percpu-defs.h) magic.
|
That's all! Of course it is not the full API, but a general overview. It can be hard to start with, but to understand per-cpu variables you mainly need to understand the [include/linux/percpu-defs.h](https://github.com/torvalds/linux/blob/master/include/linux/percpu-defs.h) magic.
|
||||||
|
|
||||||
Let's again look at the algorithm of getting pointer on per-cpu variable:
|
Let's again look at the algorithm of getting a pointer to a per-cpu variable:
|
||||||
|
|
||||||
* The kernel creates multiply `.data..percpu` sections (ones perc-pu) during initialization process;
|
* The kernel creates multiple `.data..percpu` sections (one per CPU) during the initialization process;
|
||||||
* All variables created with the `DEFINE_PER_CPU` macro will be reloacated to the first section or for CPU0;
|
* All variables created with the `DEFINE_PER_CPU` macro will be relocated to the first section, the one for CPU0;
|
||||||
* The `__per_cpu_offset` array is filled with the distance (`BOOT_PERCPU_OFFSET`) between `.data..percpu` sections;
|
* The `__per_cpu_offset` array is filled with the distance (`BOOT_PERCPU_OFFSET`) between `.data..percpu` sections;
|
||||||
* When `per_cpu_ptr` called for example for getting pointer on the certain per-cpu variable for the third CPU, `__per_cpu_offset` array will be accessed, where every index points to the certain CPU.
|
* When `per_cpu_ptr` is called, for example to get a pointer to a certain per-cpu variable for the third CPU, the `__per_cpu_offset` array will be accessed, where every index corresponds to a particular CPU.
|
||||||
|
|
||||||
That's all.
|
That's all.
|
||||||
|