mirror of
https://github.com/0xAX/linux-insides.git
synced 2024-12-22 14:48:08 +00:00
Merge pull request #328 from spacewander/linux-timers-1-fix
update Timers
This commit is contained in:
commit
9d4536c96d
@ -284,7 +284,7 @@ function. This function goes through the list of handlers that contains differen
|
||||
* `binfmt_elf_fdpic` - Support for [elf](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) [FDPIC](http://elinux.org/UClinux_Shared_Library#FDPIC_ELF) binaries;
|
||||
* `binfmt_em86` - support for Intel [elf](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) binaries running on [Alpha](https://en.wikipedia.org/wiki/DEC_Alpha) machines.
|
||||
|
||||
So, the search-binary_handler tries to call the `load_binary` function and pass `linux_binprm` to it. If the binary handler supports the given executable file format, it starts to prepare the executable binary for execution:
|
||||
So, the search_binary_handler tries to call the `load_binary` function and pass `linux_binprm` to it. If the binary handler supports the given executable file format, it starts to prepare the executable binary for execution:
|
||||
|
||||
```C
|
||||
int search_binary_handler(struct linux_binprm *bprm)
|
||||
|
@ -25,7 +25,7 @@ x86_init.timers.wallclock_init();
|
||||
|
||||
We already saw `x86_init` structure in the chapter that describes initialization of the Linux kernel. This structure contains pointers to the default setup functions for the different platforms like [Intel MID](https://en.wikipedia.org/wiki/Mobile_Internet_device#Intel_MID_platforms), [Intel CE4100](http://www.wpgholdings.com/epaper/US/newsRelease_20091215/255874.pdf) and etc. The `x86_init` structure defined in the [arch/x86/kernel/x86_init.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/x86_init.c#L36) and as you can see it determines standard PC hardware by default.
|
||||
|
||||
As we can see, the `x86_init` structure has `x86_init_ops` type that provides a set of functions for platform specific setup like reserviing standard resources, platform specific memory setup, initialization of interrupt handlers and etc. This structure looks like:
|
||||
As we can see, the `x86_init` structure has `x86_init_ops` type that provides a set of functions for platform specific setup like reserving standard resources, platform specific memory setup, initialization of interrupt handlers and etc. This structure looks like:
|
||||
|
||||
```C
|
||||
struct x86_init_ops {
|
||||
@ -105,7 +105,7 @@ void __init intel_mid_rtc_init(void)
|
||||
}
|
||||
```
|
||||
|
||||
That's all, after this a device based on `Intel MID` will be able to get get time from hardware clock. As I already wrote, the standard PC [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture does not support `function` and just do nothing during call of this function. We just saw initialization of the [real time clock](https://en.wikipedia.org/wiki/Real-time_clock) for the [Intel MID](https://en.wikipedia.org/wiki/Mobile_Internet_device#Intel_MID_platforms) architecture and now times to return to the general `x86_64` architecture and will look on the time management related stuff there.
|
||||
That's all, after this a device based on `Intel MID` will be able to get time from hardware clock. As I already wrote, the standard PC [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture does not support `x86_init_noop` and just do nothing during call of this function. We just saw initialization of the [real time clock](https://en.wikipedia.org/wiki/Real-time_clock) for the [Intel MID](https://en.wikipedia.org/wiki/Mobile_Internet_device#Intel_MID_platforms) architecture and now times to return to the general `x86_64` architecture and will look on the time management related stuff there.
|
||||
|
||||
Acquainted with jiffies
|
||||
--------------------------------------------------------------------------------
|
||||
@ -128,7 +128,7 @@ This definition is very similar to the `jiffy` in the Linux kernel. There is glo
|
||||
extern unsigned long volatile __jiffy_data jiffies;
|
||||
```
|
||||
|
||||
during initialization process. This global variable will be incremented each time during timer interrupt. Besides this, near the `jiffies` variable we can see definition of the similar variable
|
||||
during initialization process. This global variable will be increased each time during timer interrupt. Besides this, near the `jiffies` variable we can see definition of the similar variable
|
||||
|
||||
```C
|
||||
extern u64 jiffies_64;
|
||||
@ -168,7 +168,7 @@ Now we know a little theory about `jiffies` and we can return to the our functio
|
||||
The `clocksource` is hardware abstraction for a free-running counter.
|
||||
```
|
||||
|
||||
I'm not sure about you, but that description didn't give a good understanding about the `clocksource` concept. Let's try to understand what is it, but we will not go deeper because this topic will be described in a separate part in much more detail. The main point of the `clocksource` is timekeeping abstraction or in very simple words - it provides a time value to the kernel. We already know about `jiffies` interface that represents number of ticks that have occurred since the system booted. It represented by the global variable in the Linux kernel and incremented each timer interrupt. The Linux kernel can use `jiffies` for time measurement. So why do we need in separate context like the `clocksource`? Actually different hardware devices provide different clock sources that are widely in their capabilities. The availability of more precise techniques for time intervals measurement is hardware-dependent.
|
||||
I'm not sure about you, but that description didn't give a good understanding about the `clocksource` concept. Let's try to understand what is it, but we will not go deeper because this topic will be described in a separate part in much more detail. The main point of the `clocksource` is timekeeping abstraction or in very simple words - it provides a time value to the kernel. We already know about `jiffies` interface that represents number of ticks that have occurred since the system booted. It represented by the global variable in the Linux kernel and increased each timer interrupt. The Linux kernel can use `jiffies` for time measurement. So why do we need in separate context like the `clocksource`? Actually different hardware devices provide different clock sources that are widely in their capabilities. The availability of more precise techniques for time intervals measurement is hardware-dependent.
|
||||
|
||||
For example `x86` has on-chip a 64-bit counter that is called [Time Stamp Counter](https://en.wikipedia.org/wiki/Time_Stamp_Counter) and its frequency can be equal to processor frequency. Or for example [High Precision Event Timer](https://en.wikipedia.org/wiki/High_Precision_Event_Timer) that consists of a `64-bit` counter of at least `10 MHz` frequency. Two different timers and they are both for `x86`. If we will add timers from other architectures, this only makes this problem more complex. The Linux kernel provides `clocksource` concept to solve the problem.
|
||||
|
||||
@ -279,7 +279,7 @@ As I already wrote, the main purpose of the `register_refined_jiffies` function
|
||||
struct clocksource refined_jiffies;
|
||||
```
|
||||
|
||||
There is one different between `refined_jiffies` and `clocksource_jiffies`: The standard `jiffies` based clock source is the lowest common denominator clock source which should function on all systems. As we already know, the `jiffies` global variable will be incremented during each timer interrupt. This means that standard `jiffies` based clock source has the same resolution as the timer interrupt frequency. From this we can understand that standard `jiffies` based clock source may suffer from inaccuracies. The `refined_jiffies` uses `CLOCK_TICK_RATE` as the base of `jiffies` shift.
|
||||
There is one different between `refined_jiffies` and `clocksource_jiffies`: The standard `jiffies` based clock source is the lowest common denominator clock source which should function on all systems. As we already know, the `jiffies` global variable will be increased during each timer interrupt. This means that standard `jiffies` based clock source has the same resolution as the timer interrupt frequency. From this we can understand that standard `jiffies` based clock source may suffer from inaccuracies. The `refined_jiffies` uses `CLOCK_TICK_RATE` as the base of `jiffies` shift.
|
||||
|
||||
Let's look on the implementation of this function. First of all we can see that the `refined_jiffies` clock source based on the `clocksource_jiffies` structure:
|
||||
|
||||
@ -297,7 +297,7 @@ int register_refined_jiffies(long cycles_per_second)
|
||||
...
|
||||
```
|
||||
|
||||
Here we can see that we update the name of the `refined_jiffies` to `refined-jiffies` and increment the rating of this structure. As you remember, the `clocksource_jiffies` has rating - `1`, so our `refined_jiffies` clocksource will have rating - `2`. This means that the `refined_jiffies` will be best selection for clock source management code.
|
||||
Here we can see that we update the name of the `refined_jiffies` to `refined-jiffies` and increase the rating of this structure. As you remember, the `clocksource_jiffies` has rating - `1`, so our `refined_jiffies` clocksource will have rating - `2`. This means that the `refined_jiffies` will be best selection for clock source management code.
|
||||
|
||||
In the next step we need to calculate number of cycles per one tick:
|
||||
|
||||
@ -354,7 +354,7 @@ We just saw initialization of two `jiffies` based clock sources in the previous
|
||||
* standard `jiffies` based clock source;
|
||||
* refined `jiffies` based clock source;
|
||||
|
||||
Don't worry if you don't understand the calculations here. They look frightening at first. Soon, step by step we will learn these things. So, we just saw initialization of `jffies` based clock sources and also we know that the Linux kernel has the global variable `jiffies` that holds the number of ticks that have occured since the kernel started to work. Now, let's look how to use it. To use `jiffies` we just can use `jiffies` global variable by its name or with the call of the `get_jiffies_64` function. This function defined in the [kernel/time/jiffies.c](https://github.com/torvalds/linux/blob/master/kernel/time/jiffies.c) source code file and just returns full full `64-bit` value of the `jiffies`:
|
||||
Don't worry if you don't understand the calculations here. They look frightening at first. Soon, step by step we will learn these things. So, we just saw initialization of `jffies` based clock sources and also we know that the Linux kernel has the global variable `jiffies` that holds the number of ticks that have occured since the kernel started to work. Now, let's look how to use it. To use `jiffies` we just can use `jiffies` global variable by its name or with the call of the `get_jiffies_64` function. This function defined in the [kernel/time/jiffies.c](https://github.com/torvalds/linux/blob/master/kernel/time/jiffies.c) source code file and just returns full `64-bit` value of the `jiffies`:
|
||||
|
||||
```C
|
||||
u64 get_jiffies_64(void)
|
||||
|
@ -9,7 +9,7 @@ The previous [part](https://0xax.gitbooks.io/linux-insides/content/Timers/timers
|
||||
* `jiffies`
|
||||
* `clocksource`
|
||||
|
||||
The first is the global variable that defined in the [include/linux/jiffies.h](https://github.com/torvalds/linux/blob/master/include/linux/jiffies.h) header file and represents counter that incremented during each timer interrupt. So if we can access this global variable and we know timer interrupt rate we can convert `jiffies` to the human time units. As we already know the timer interrupt rate represented by the compile-time constant that is called `HZ` in the Linux kernel. The value of the `HZ` is equal to the value of the `CONFIG_HZ` kernel configuration option and if we will look in the [arch/x86/configs/x86_64_defconfig](https://github.com/torvalds/linux/blob/master/arch/x86/configs/x86_64_defconfig) kernel configuration file, we will see that:
|
||||
The first is the global variable that defined in the [include/linux/jiffies.h](https://github.com/torvalds/linux/blob/master/include/linux/jiffies.h) header file and represents counter that increased during each timer interrupt. So if we can access this global variable and we know timer interrupt rate we can convert `jiffies` to the human time units. As we already know the timer interrupt rate represented by the compile-time constant that is called `HZ` in the Linux kernel. The value of the `HZ` is equal to the value of the `CONFIG_HZ` kernel configuration option and if we will look in the [arch/x86/configs/x86_64_defconfig](https://github.com/torvalds/linux/blob/master/arch/x86/configs/x86_64_defconfig) kernel configuration file, we will see that:
|
||||
|
||||
```
|
||||
CONFIG_HZ_1000=y
|
||||
@ -79,7 +79,7 @@ So, you can find that `jiffies` variable is very widely used in the Linux kernel
|
||||
Introduction to `clocksource`
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
The `clocksource` concept represents generic API for clock sources management in the Linux kernel. Why do we need separate framework for this? Let's go back to the beginning. The `time` concept is fundamental concept in the Linux kernel and other operating system kernels. And the timekeeping is an one one of the necessities to use this concept. For example Linux kernel must know and update the time elapsed since system startup, it must determine how long the current process has been running for every processor and many many more. Where the Linux kernel can get information about time? First of all it is Real Time Clock or [RTC](https://en.wikipedia.org/wiki/Real-time_clock) that represents by the a nonvolatile device. You can find a set of architecture-independend real time clock drivers in the Linux kernel in the [drivers/rtc](https://github.com/torvalds/linux/tree/master/drivers/rtc) directory. Besides this, each architecture can provide a driver for the architecture-dependend real time clock, for example - `CMOS/RTC` - [arch/x86/kernel/rtc.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/rtc.c) for the [x86](https://en.wikipedia.org/wiki/X86) architecture. The second is system timer - timer that excites [interrupts](https://en.wikipedia.org/wiki/Interrupt) with a periodic rate. For example, for [IBM PC](https://en.wikipedia.org/wiki/IBM_Personal_Computer) compatibles it was - [programmable interval timer](https://en.wikipedia.org/wiki/Programmable_interval_timer).
|
||||
The `clocksource` concept represents generic API for clock sources management in the Linux kernel. Why do we need separate framework for this? Let's go back to the beginning. The `time` concept is fundamental concept in the Linux kernel and other operating system kernels. And the timekeeping is one of the necessities to use this concept. For example Linux kernel must know and update the time elapsed since system startup, it must determine how long the current process has been running for every processor and many many more. Where the Linux kernel can get information about time? First of all it is Real Time Clock or [RTC](https://en.wikipedia.org/wiki/Real-time_clock) that represents by the a nonvolatile device. You can find a set of architecture-independend real time clock drivers in the Linux kernel in the [drivers/rtc](https://github.com/torvalds/linux/tree/master/drivers/rtc) directory. Besides this, each architecture can provide a driver for the architecture-dependend real time clock, for example - `CMOS/RTC` - [arch/x86/kernel/rtc.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/rtc.c) for the [x86](https://en.wikipedia.org/wiki/X86) architecture. The second is system timer - timer that excites [interrupts](https://en.wikipedia.org/wiki/Interrupt) with a periodic rate. For example, for [IBM PC](https://en.wikipedia.org/wiki/IBM_Personal_Computer) compatibles it was - [programmable interval timer](https://en.wikipedia.org/wiki/Programmable_interval_timer).
|
||||
|
||||
We already know that for timekeeping purposes we can use `jiffies` in the Linux kernel. The `jiffies` can be considered as read only global variable which is updated with `HZ` frequency. We know that the `HZ` is a compile-time kernel parameter whose reasonable range is from `100` to `1000` [Hz](https://en.wikipedia.org/wiki/Hertz). So, it is guaranteed to have an interface for time measurement with `1` - `10` milliseconds resolution. Besides standard `jiffies`, we saw the `refined_jiffies` clock source in the previous part that is based on the `i8253/i8254` [programmable interval timer](https://en.wikipedia.org/wiki/Programmable_interval_timer) tick rate which is almost `1193182` hertz. So we can get something about `1` microsecond resolution with the `refined_jiffies`. In this time, [nanoseconds](https://en.wikipedia.org/wiki/Nanosecond) are the favorite choice for the time value units of the given clock source.
|
||||
|
||||
@ -123,7 +123,7 @@ struct clocksource {
|
||||
} ____cacheline_aligned;
|
||||
```
|
||||
|
||||
We already saw the first field of the `clocksource` structure in the previous part - it is pointer to the `read` function that returns best conter selected by the clocksource framework. For example we use `jiffies_read` function to read `jiffies` value:
|
||||
We already saw the first field of the `clocksource` structure in the previous part - it is pointer to the `read` function that returns best counter selected by the clocksource framework. For example we use `jiffies_read` function to read `jiffies` value:
|
||||
|
||||
```C
|
||||
static struct clocksource clocksource_jiffies = {
|
||||
@ -269,7 +269,7 @@ void __clocksource_update_freq_scale(struct clocksource *cs, u32 scale, u32 freq
|
||||
}
|
||||
```
|
||||
|
||||
Here we can see calculation of the maximum number of seconds which we can run before a clock source counter will overflow. First of all we fill the `sec` variable with the value of a clock source mask. Remember that a clock source's mask represents maximum amount of bits that are valid for the given clock source. After this, we can see two division operations. At first we divide our `sec` variable on a clock source frequency and than on scale factor. The `freq` parameter shows us how many timer interrupts will be occured in one second. So, we divide `mask` value that represents maximum number of a counter (for example `jiffy`) on the frequency of a timer and will get the maximum number of seconds for the certain clock source. The second division operation will give us maximum number of seconds for the certain clock source depends on its scale factor which can be `1` hertz or `1` kilohertz (10^ Hz).
|
||||
Here we can see calculation of the maximum number of seconds which we can run before a clock source counter will overflow. First of all we fill the `sec` variable with the value of a clock source mask. Remember that a clock source's mask represents maximum amount of bits that are valid for the given clock source. After this, we can see two division operations. At first we divide our `sec` variable on a clock source frequency and then on scale factor. The `freq` parameter shows us how many timer interrupts will be occured in one second. So, we divide `mask` value that represents maximum number of a counter (for example `jiffy`) on the frequency of a timer and will get the maximum number of seconds for the certain clock source. The second division operation will give us maximum number of seconds for the certain clock source depends on its scale factor which can be `1` hertz or `1` kilohertz (10^ Hz).
|
||||
|
||||
After we have got maximum number of seconds, we check this value and set it to `1` or `600` depends on the result at the next step. These values is maximum sleeping time for a clocksource in seconds. In the next step we can see call of the `clocks_calc_mult_shift`. Main point of this function is calculation of the `mult` and `shift` values for a given clock source. In the end of the `__clocksource_update_freq_scale` function we check that just calculated `mult` value of a given clock source will not cause overflow after adjustment, update the `max_idle_ns` and `max_cycles` values of a given clock source with the maximum nanoseconds that can be converted to a clock source counter and print result to the kernel buffer:
|
||||
|
||||
@ -316,7 +316,7 @@ static void clocksource_enqueue(struct clocksource *cs)
|
||||
}
|
||||
```
|
||||
|
||||
In the end we just insert new clocksource to the `clocksource_list`. The second function - `clocksource_enqueue_watchdog` does almost the same that previous function, but it inserts new clock source to the `wd_list` depends on flangs of a clock source and starts new [watchdog](https://en.wikipedia.org/wiki/Watchdog_timer) timer. As I already wrote, we will not consider `watchdog` related stuff in this part but will do it in next parts.
|
||||
In the end we just insert new clocksource to the `clocksource_list`. The second function - `clocksource_enqueue_watchdog` does almost the same that previous function, but it inserts new clock source to the `wd_list` depends on flags of a clock source and starts new [watchdog](https://en.wikipedia.org/wiki/Watchdog_timer) timer. As I already wrote, we will not consider `watchdog` related stuff in this part but will do it in next parts.
|
||||
|
||||
The last function is the `clocksource_select`. As we can understand from the function's name, main point of this function - select the best `clocksource` from registered clocksources. This function consists only from the call of the function helper:
|
||||
|
||||
@ -327,7 +327,7 @@ static void clocksource_select(void)
|
||||
}
|
||||
```
|
||||
|
||||
Note that the `__clocksource_select` function takes one parameter (`false` in our case). This [bool](https://en.wikipedia.org/wiki/Boolean_data_type) parameter shows how to traverese the `clocksource_list`. In our case we pass `false` that is mean that we will go through all entries of the `clocksource_list`. We already know that `clocksource` with the best rating will the first in the `clocksource_list` after the call of the `clocksource_enqueue` function, so we can easily get it from this list. After we found a clock source with the best rating, we switch to it:
|
||||
Note that the `__clocksource_select` function takes one parameter (`false` in our case). This [bool](https://en.wikipedia.org/wiki/Boolean_data_type) parameter shows how to traverse the `clocksource_list`. In our case we pass `false` that is meant that we will go through all entries of the `clocksource_list`. We already know that `clocksource` with the best rating will the first in the `clocksource_list` after the call of the `clocksource_enqueue` function, so we can easily get it from this list. After we found a clock source with the best rating, we switch to it:
|
||||
|
||||
```C
|
||||
if (curr_clocksource != best && !timekeeping_notify(best)) {
|
||||
|
@ -117,7 +117,7 @@ Ultimately, the memory space will be allocated for the given `cpumask` with the
|
||||
*mask = kmalloc_node(cpumask_size(), flags, node);
|
||||
```
|
||||
|
||||
Now let's look on the `cpumasks` that will be initialized in the `tick_broadcast_init` function. As we can see, the `tick_broadcast_init` function will initialize six `cpumasks`, and moreover, initialization of the last three `cpumasks` will be dependet on the `CONFIG_TICK_ONESHOT` kernel configuration option.
|
||||
Now let's look on the `cpumasks` that will be initialized in the `tick_broadcast_init` function. As we can see, the `tick_broadcast_init` function will initialize six `cpumasks`, and moreover, initialization of the last three `cpumasks` will be depended on the `CONFIG_TICK_ONESHOT` kernel configuration option.
|
||||
|
||||
The first three `cpumasks` are:
|
||||
|
||||
@ -321,7 +321,7 @@ If you remember, we have started this part with the call of the `tick_init` func
|
||||
Initialization of dyntick related data structures
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
We already saw some information about `dyntick` concept in this part and we know that this concept allows kernel to disable system timer interrupts in the `idle` state. The `tick_nohz_init` function makes initialization of the different data structures which are related to this concept. This function defined in the [kernel/time/tick-sched.c](https://github.com/torvalds/linux/blob/master/kernel/time/tich-sched.c) source code file and starts from the check of the value of the `tick_nohz_full_running` variable which represents state of the tick-less mode for the `idle` state and the state when system timer interrups are disabled durung a processor has only one runnable task:
|
||||
We already saw some information about `dyntick` concept in this part and we know that this concept allows kernel to disable system timer interrupts in the `idle` state. The `tick_nohz_init` function makes initialization of the different data structures which are related to this concept. This function defined in the [kernel/time/tick-sched.c](https://github.com/torvalds/linux/blob/master/kernel/time/tich-sched.c) source code file and starts from the check of the value of the `tick_nohz_full_running` variable which represents state of the tick-less mode for the `idle` state and the state when system timer interrups are disabled during a processor has only one runnable task:
|
||||
|
||||
```C
|
||||
if (!tick_nohz_full_running) {
|
||||
@ -330,7 +330,7 @@ if (!tick_nohz_full_running) {
|
||||
}
|
||||
```
|
||||
|
||||
If this mode is not running we cann the `tick_nohz_init_all` function that defined in the same source code file and check its result. The `tick_nohz_init_all` function tries to allocate the `tick_nohz_full_mask` with the call of the `alloc_cpumask_var` that will allocate space for a `tick_nohz_full_mask`. The `tck_nohz_full_mask` will store numbers of processors that have enabled full `NO_HZ`. After successful allocation of the `tick_nohz_full_mask` we set all bits in the `tick_nogz_full_mask`, set the `tick_nohz_full_running` and return result to the `tick_nohz_init` function:
|
||||
If this mode is not running we call the `tick_nohz_init_all` function that defined in the same source code file and check its result. The `tick_nohz_init_all` function tries to allocate the `tick_nohz_full_mask` with the call of the `alloc_cpumask_var` that will allocate space for a `tick_nohz_full_mask`. The `tck_nohz_full_mask` will store numbers of processors that have enabled full `NO_HZ`. After successful allocation of the `tick_nohz_full_mask` we set all bits in the `tick_nogz_full_mask`, set the `tick_nohz_full_running` and return result to the `tick_nohz_init` function:
|
||||
|
||||
```C
|
||||
static int tick_nohz_init_all(void)
|
||||
@ -360,7 +360,7 @@ if (!alloc_cpumask_var(&housekeeping_mask, GFP_KERNEL)) {
|
||||
}
|
||||
```
|
||||
|
||||
This `cpumask` will store number of processor for `housekeeping` or in other words we need at least in one processor that will not me in `NO_HZ` mode, because it will do timekeeping and etc. After this we check the result of the architecture-specific `arch_irq_work_has_interrupt` function. This function checks ability to send inter-processor interrupt for the certain architecture. We need to check this, because system timer of a processor will be disabled during `NO_HZ` mode, so there must be at least one online processor which can send inter-processor interrupt to awake offline processor. This function defined in the [arch/x86/include/asm/irq_work.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/irq_work.h) header file for the [x86_64](https://en.wikipedia.org/wiki/X86-64) and just checks that a processor has [APIC](https://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller) from the [CPUID](https://en.wikipedia.org/wiki/CPUID):
|
||||
This `cpumask` will store number of processor for `housekeeping` or in other words we need at least in one processor that will not be in `NO_HZ` mode, because it will do timekeeping and etc. After this we check the result of the architecture-specific `arch_irq_work_has_interrupt` function. This function checks ability to send inter-processor interrupt for the certain architecture. We need to check this, because system timer of a processor will be disabled during `NO_HZ` mode, so there must be at least one online processor which can send inter-processor interrupt to awake offline processor. This function defined in the [arch/x86/include/asm/irq_work.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/irq_work.h) header file for the [x86_64](https://en.wikipedia.org/wiki/X86-64) and just checks that a processor has [APIC](https://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller) from the [CPUID](https://en.wikipedia.org/wiki/CPUID):
|
||||
|
||||
```C
|
||||
static inline bool arch_irq_work_has_interrupt(void)
|
||||
@ -369,7 +369,7 @@ static inline bool arch_irq_work_has_interrupt(void)
|
||||
}
|
||||
```
|
||||
|
||||
If a processor has not `APIC`, the Linux kernel prints warning message, clears the `tick_nohz_full_mask` cpumask, copies numbers of all posible processors in the system to the `housekeeping_mask` and resets the value of the `tick_nogz_full_running` variable:
|
||||
If a processor has not `APIC`, the Linux kernel prints warning message, clears the `tick_nohz_full_mask` cpumask, copies numbers of all possible processors in the system to the `housekeeping_mask` and resets the value of the `tick_nohz_full_running` variable:
|
||||
|
||||
```C
|
||||
if (!arch_irq_work_has_interrupt()) {
|
||||
@ -382,7 +382,7 @@ if (!arch_irq_work_has_interrupt()) {
|
||||
}
|
||||
```
|
||||
|
||||
After this step, we get the number of the current processor by the call of the `smp_processor_id` and check this processor in the `tick_nogh_full_mask`. If the `tick_nohz_full_mask` contains a given processor we clear appropriate bit in the `tick_nohz_full_mask`:
|
||||
After this step, we get the number of the current processor by the call of the `smp_processor_id` and check this processor in the `tick_nohz_full_mask`. If the `tick_nohz_full_mask` contains a given processor we clear appropriate bit in the `tick_nohz_full_mask`:
|
||||
|
||||
```C
|
||||
cpu = smp_processor_id();
|
||||
@ -393,7 +393,7 @@ if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
|
||||
}
|
||||
```
|
||||
|
||||
Because this processor will be used for timekeeping. After this step we put all numbers of processors that are in the `cpu_posssible_mask` and not in the `tick_nogz_full_mask`:
|
||||
Because this processor will be used for timekeeping. After this step we put all numbers of processors that are in the `cpu_possible_mask` and not in the `tick_nohz_full_mask`:
|
||||
|
||||
```C
|
||||
cpumask_andnot(housekeeping_mask,
|
||||
|
@ -4,7 +4,7 @@ Timers in the Linux kernel. Part 4.
|
||||
Timers
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
This is fourth part of the [chapter](https://0xax.gitbooks.io/linux-insides/content/Timers/index.html) which describes timers and time management related stuff in the Linux kernel and in the previous [part](https://0xax.gitbooks.io/linux-insides/content/Timers/timers-3.html) we knew about the `tick broadcast` framework and `NO_HZ` mode in the Linux kernel. We will continue to dive into the time managemented related stuff in the Linux kernel in this part and will and will acquainted with yet another concept in the Linux kernel - `timers`. Before we will look at timers in the Linux kernel, we have to learn some theory about this concept. Note that we will consider software timers in this part.
|
||||
This is fourth part of the [chapter](https://0xax.gitbooks.io/linux-insides/content/Timers/index.html) which describes timers and time management related stuff in the Linux kernel and in the previous [part](https://0xax.gitbooks.io/linux-insides/content/Timers/timers-3.html) we knew about the `tick broadcast` framework and `NO_HZ` mode in the Linux kernel. We will continue to dive into the time management related stuff in the Linux kernel in this part and will be acquainted with yet another concept in the Linux kernel - `timers`. Before we will look at timers in the Linux kernel, we have to learn some theory about this concept. Note that we will consider software timers in this part.
|
||||
|
||||
The Linux kernel provides a `software timer` concept to allow to kernel functions could be invoked at future moment. Timers are widely used in the Linux kernel. For example, look in the [net/netfilter/ipset/ip_set_list_set.c](https://github.com/torvalds/linux/blob/master/net/netfilter/ipset/ip_set_list_set.c) source code file. This source code file provides implementation of the framework for the managing of groups of [IP](https://en.wikipedia.org/wiki/Internet_Protocol) addresses.
|
||||
|
||||
@ -75,7 +75,7 @@ static void __init init_timer_cpus(void)
|
||||
}
|
||||
```
|
||||
|
||||
If you do not know or do not remember what is it a `possible` cpu, you can read the special [part](https://0xax.gitbooks.io/linux-insides/content/Concepts/cpumask.html) of this book which describes `cpumask` concept in the Linux kernel. In shot words, a `possible` processor is a processor which can be plugged in anytime during the life of the system.
|
||||
If you do not know or do not remember what is it a `possible` cpu, you can read the special [part](https://0xax.gitbooks.io/linux-insides/content/Concepts/cpumask.html) of this book which describes `cpumask` concept in the Linux kernel. In short words, a `possible` processor is a processor which can be plugged in anytime during the life of the system.
|
||||
|
||||
The `init_timer_cpu` function does main work for us, namely it executes initialization of the `tvec_base` structure for each processor. This structure defined in the [kernel/time/timer.c](https://github.com/torvalds/linux/blob/master/kernel/time/timer.c) source code file and stores data related to a `dynamic` timer for a certain processor. Let's look on the definition of this structure:
|
||||
|
||||
@ -98,7 +98,7 @@ struct tvec_base {
|
||||
} ____cacheline_aligned;
|
||||
```
|
||||
|
||||
The `thec_base` structure contains following fields: The `lock` for `tvec_base` protection, the next `running_timer` field points to the currently running timer for the certain processor, the `timer_jiffies` fields represents the earilest expiration time (it will be used by the Linux kernel to find already expired timers). The next field - `next_timer` contains the next pending timer for a next timer [interrupt](https://en.wikipedia.org/wiki/Interrupt) in a case when a processor goes to sleep and the `NO_HZ` mode is enabled in the Linux kernel. The `active_timers` field provides accounting of non-deferrable timers timers or in other words all timers that will not be stopped during a processor will go to sleep. The `all_timers` field tracks total number of timers or `active_timers` + deferrable timers. The `cpu` field represents number of a processor which owns timers. The `migration_enabled` and `nohz_active` fields are represent opportunity of timers migration to another processor and status of the `NO_HZ` mode respectively.
|
||||
The `thec_base` structure contains following fields: The `lock` for `tvec_base` protection, the next `running_timer` field points to the currently running timer for the certain processor, the `timer_jiffies` fields represents the earliest expiration time (it will be used by the Linux kernel to find already expired timers). The next field - `next_timer` contains the next pending timer for a next timer [interrupt](https://en.wikipedia.org/wiki/Interrupt) in a case when a processor goes to sleep and the `NO_HZ` mode is enabled in the Linux kernel. The `active_timers` field provides accounting of non-deferrable timers or in other words all timers that will not be stopped during a processor will go to sleep. The `all_timers` field tracks total number of timers or `active_timers` + deferrable timers. The `cpu` field represents number of a processor which owns timers. The `migration_enabled` and `nohz_active` fields are represent opportunity of timers migration to another processor and status of the `NO_HZ` mode respectively.
|
||||
|
||||
The last five fields of the `tvec_base` structure represent lists of dynamic timers. The first `tv1` field has:
|
||||
|
||||
@ -232,9 +232,9 @@ static int timer_cpu_notify(struct notifier_block *self,
|
||||
}
|
||||
```
|
||||
|
||||
This chapter will not describe `hotplug` related events in the Linux kernel source code, but if you are interesting in such things, you can find implementation iof the `migrate_timers` function in the [kernel/time/timer.c](https://github.com/torvalds/linux/blob/master/kernel/time/timer.c) source code file.
|
||||
This chapter will not describe `hotplug` related events in the Linux kernel source code, but if you are interesting in such things, you can find implementation of the `migrate_timers` function in the [kernel/time/timer.c](https://github.com/torvalds/linux/blob/master/kernel/time/timer.c) source code file.
|
||||
|
||||
The last step in the is the `init_timers` function is the call of the:
|
||||
The last step in the `init_timers` function is the call of the:
|
||||
|
||||
```C
|
||||
open_softirq(TIMER_SOFTIRQ, run_timer_softirq);
|
||||
@ -267,7 +267,7 @@ At the beginning of the `run_timer_softirq` function we get a `dynamic` timer fo
|
||||
|
||||
Reclaim that the `timer_jiffies` field of the `tvec_base` structure represents the relative time when functions delayed by the given timer will be executed. So we compare these two values and if the current time represented by the `jiffies` is greater than `base->timer_jiffies`, we call the `__run_timers` function that defined in the same source code file. Let's look on the implementation of this function.
|
||||
|
||||
As I just wrote, the `__run_timers` function runs all expired timers for a given processor. This function starts from the acquireing of the `tvec_base's` lock to protect the `tvec_base` structure
|
||||
As I just wrote, the `__run_timers` function runs all expired timers for a given processor. This function starts from the acquiring of the `tvec_base's` lock to protect the `tvec_base` structure
|
||||
|
||||
```C
|
||||
static inline void __run_timers(struct tvec_base *base)
|
||||
|
Loading…
Reference in New Issue
Block a user