CPU masks ================================================================================ Introduction -------------------------------------------------------------------------------- `Cpumasks` is a special way provided by the Linux kernel to store information about CPUs in the system. The relevant source code and header files which contains API for `Cpumasks` manipulation: * [include/linux/cpumask.h](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/include/linux/cpumask.h) * [lib/cpumask.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/lib/cpumask.c) * [kernel/cpu.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/cpu.c) As comment says from the [include/linux/cpumask.h](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/include/linux/cpumask.h): Cpumasks provide a bitmap suitable for representing the set of CPU's in a system, one bit position per CPU number. We already saw a bit about cpumask in the `boot_cpu_init` function from the [Kernel entry point](https://0xax.gitbook.io/linux-insides/summary/initialization/linux-initialization-4) part. This function makes first boot cpu online, active and etc...: ```C set_cpu_online(cpu, true); set_cpu_active(cpu, true); set_cpu_present(cpu, true); set_cpu_possible(cpu, true); ``` Before we consider implementation of these functions, let's consider all of these masks. The `cpu_possible` is a set of cpu ID's which can be plugged in anytime during the life of that system boot or in other words mask of possible CPUs contains maximum number of CPUs which are possible in the system. It will be equal to value of the `NR_CPUS` which is set statically via the `CONFIG_NR_CPUS` kernel configuration option. The `cpu_present` mask represents which CPUs are currently plugged in. The `cpu_online` represents a subset of the `cpu_present` and indicates CPUs which are available for scheduling or in other words a bit from this mask tells the kernel if a processor may be utilized by the Linux kernel. The last mask is `cpu_active`. Bits of this mask tells to Linux kernel is a task may be moved to a certain processor. All of these masks depend on the `CONFIG_HOTPLUG_CPU` configuration option and if this option is disabled `possible == present` and `active == online`. The implementations of all of these functions are very similar. Every function checks the second parameter. If it is `true`, it calls `cpumask_set_cpu` otherwise it calls `cpumask_clear_cpu` . There are two ways for a `cpumask` creation. First is to use `cpumask_t`. It is defined as: ```C typedef struct cpumask { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t; ``` It wraps the `cpumask` structure which contains one bitmask `bits` field. The `DECLARE_BITMAP` macro gets two parameters: * bitmap name; * number of bits. and creates an array of `unsigned long` with the given name. Its implementation is pretty easy: ```C #define DECLARE_BITMAP(name,bits) \ unsigned long name[BITS_TO_LONGS(bits)] ``` where `BITS_TO_LONGS`: ```C #define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long)) #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d)) ``` As we are focusing on the `x86_64` architecture, `unsigned long` is 8-bytes size and our array will contain only one element: ``` (((8) + (64) - 1) / (64)) = 1 ``` `NR_CPUS` macro represents the number of CPUs in the system and depends on the `CONFIG_NR_CPUS` macro which is defined in [include/linux/threads.h](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/include/linux/threads.h) and looks like this: ```C #ifndef CONFIG_NR_CPUS #define CONFIG_NR_CPUS 1 #endif #define NR_CPUS CONFIG_NR_CPUS ``` The second way to define cpumask is to use the `DECLARE_BITMAP` macro directly and the `to_cpumask` macro which converts the given bitmap to `struct cpumask *`: ```C #define to_cpumask(bitmap) \ ((struct cpumask *)(1 ? (bitmap) \ : (void *)sizeof(__check_is_bitmap(bitmap)))) ``` We can see the ternary operator operator here which is `true` every time. `__check_is_bitmap` inline function is defined as: ```C static inline int __check_is_bitmap(const unsigned long *bitmap) { return 1; } ``` And returns `1` every time. We need it here for only one purpose: at compile time it checks that a given `bitmap` is a bitmap, or in other words it checks that a given `bitmap` has type - `unsigned long *`. So we just pass `cpu_possible_bits` to the `to_cpumask` macro for converting an array of `unsigned long` to the `struct cpumask *`. cpumask API -------------------------------------------------------------------------------- As we can define cpumask with one of the methods, Linux kernel provides API for manipulating a cpumask. Let's consider one of the function which presented above. For example `set_cpu_online`. This function takes two parameters: * Index of CPU; * CPU status; Implementation of this function looks as: ```C void set_cpu_online(unsigned int cpu, bool online) { if (online) { cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits)); cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits)); } else { cpumask_clear_cpu(cpu, to_cpumask(cpu_online_bits)); } } ``` First of all it checks the second `state` parameter and calls `cpumask_set_cpu` or `cpumask_clear_cpu` depending on it. Here we can see casting to the `struct cpumask *` of the second parameter in the `cpumask_set_cpu`. In our case it is `cpu_online_bits` which is a bitmap and defined as: ```C static DECLARE_BITMAP(cpu_online_bits, CONFIG_NR_CPUS) __read_mostly; ``` The `cpumask_set_cpu` function makes only one call to the `set_bit` function: ```C static inline void cpumask_set_cpu(unsigned int cpu, struct cpumask *dstp) { set_bit(cpumask_check(cpu), cpumask_bits(dstp)); } ``` The `set_bit` function takes two parameters too, and sets a given bit (first parameter) in the memory (second parameter or `cpu_online_bits` bitmap). We can see here that before `set_bit` is called, its two parameters will be passed to the * cpumask_check; * cpumask_bits. Let's consider these two macros. First if `cpumask_check` does nothing in our case and just returns given parameter. The second `cpumask_bits` just returns the `bits` field from the given `struct cpumask *` structure: ```C #define cpumask_bits(maskp) ((maskp)->bits) ``` Now let's look on the `set_bit` implementation: ```C static __always_inline void set_bit(long nr, volatile unsigned long *addr) { if (IS_IMMEDIATE(nr)) { asm volatile(LOCK_PREFIX "orb %1,%0" : CONST_MASK_ADDR(nr, addr) : "iq" ((u8)CONST_MASK(nr)) : "memory"); } else { asm volatile(LOCK_PREFIX "bts %1,%0" : BITOP_ADDR(addr) : "Ir" (nr) : "memory"); } } ``` This function looks scary, but it is not so hard as it seems. First of all it passes `nr` or number of the bit to the `IS_IMMEDIATE` macro which just calls the GCC internal `__builtin_constant_p` function: ```C #define IS_IMMEDIATE(nr) (__builtin_constant_p(nr)) ``` `__builtin_constant_p` checks that given parameter is known constant at compile-time. As our `cpu` is not compile-time constant, the `else` clause will be executed: ```C asm volatile(LOCK_PREFIX "bts %1,%0" : BITOP_ADDR(addr) : "Ir" (nr) : "memory"); ``` Let's try to understand how it works step by step: `LOCK_PREFIX` is a x86 `lock` instruction. This instruction tells the cpu to occupy the system bus while the instruction(s) will be executed. This allows the CPU to synchronize memory access, preventing simultaneous access of multiple processors (or devices - the DMA controller for example) to one memory cell. `BITOP_ADDR` casts the given parameter to the `(*(volatile long *)` and adds `+m` constraints. `+` means that this operand is both read and written by the instruction. `m` shows that this is a memory operand. `BITOP_ADDR` is defined as: ```C #define BITOP_ADDR(x) "+m" (*(volatile long *) (x)) ``` Next is the `memory` clobber. It tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). `Ir` - immediate register operand. The `bts` instruction sets a given bit in a bit string and stores the value of a given bit in the `CF` flag. So we passed the cpu number which is zero in our case and after `set_bit` is executed, it sets the zero bit in the `cpu_online_bits` cpumask. It means that the first cpu is online at this moment. Besides the `set_cpu_*` API, cpumask of course provides another API for cpumasks manipulation. Let's consider it in short. Additional cpumask API -------------------------------------------------------------------------------- cpumask provides a set of macros for getting the numbers of CPUs in various states. For example: ```C #define num_online_cpus() cpumask_weight(cpu_online_mask) ``` This macro returns the amount of `online` CPUs. It calls the `cpumask_weight` function with the `cpu_online_mask` bitmap (read about it). The`cpumask_weight` function makes one call of the `bitmap_weight` function with two parameters: * cpumask bitmap; * `nr_cpumask_bits` - which is `NR_CPUS` in our case. ```C static inline unsigned int cpumask_weight(const struct cpumask *srcp) { return bitmap_weight(cpumask_bits(srcp), nr_cpumask_bits); } ``` and calculates the number of bits in the given bitmap. Besides the `num_online_cpus`, cpumask provides macros for the all CPU states: * num_possible_cpus; * num_active_cpus; * cpu_online; * cpu_possible. and many more. Besides that the Linux kernel provides the following API for the manipulation of `cpumask`: * `for_each_cpu` - iterates over every cpu in a mask; * `for_each_cpu_not` - iterates over every cpu in a complemented mask; * `cpumask_clear_cpu` - clears a cpu in a cpumask; * `cpumask_test_cpu` - tests a cpu in a mask; * `cpumask_setall` - set all cpus in a mask; * `cpumask_size` - returns size to allocate for a 'struct cpumask' in bytes; and many many more... Links -------------------------------------------------------------------------------- * [cpumask documentation](https://www.kernel.org/doc/Documentation/cpu-hotplug.txt)