From db72e924c14ff231598430364a99efe1e6a40890 Mon Sep 17 00:00:00 2001 From: Sebastian Fricke Date: Fri, 27 Mar 2020 06:55:13 +0100 Subject: [PATCH 1/8] correct the source code file as reference the function void set_system_intr_gate doesn't exist anymore the function set_intr_gate is now located in arch/x86/kernel/idt.c --- Interrupts/linux-interrupts-1.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Interrupts/linux-interrupts-1.md b/Interrupts/linux-interrupts-1.md index c1f88e6..411e7ee 100644 --- a/Interrupts/linux-interrupts-1.md +++ b/Interrupts/linux-interrupts-1.md @@ -37,7 +37,7 @@ Addresses of each of the interrupt handlers are maintained in a special location BUG_ON((unsigned)n > 0xFF); ``` -You can find this check within the Linux kernel source code related to interrupt setup (e.g. The `set_intr_gate`, `void set_system_intr_gate` in [arch/x86/include/asm/desc.h](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/include/asm/desc.h)). The first 32 vector numbers from `0` to `31` are reserved by the processor and used for the processing of architecture-defined exceptions and interrupts. You can find the table with the description of these vector numbers in the second part of the Linux kernel initialization process - [Early interrupt and exception handling](https://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-2.html). Vector numbers from `32` to `255` are designated as user-defined interrupts and are not reserved by the processor. These interrupts are generally assigned to external I/O devices to enable those devices to send interrupts to the processor. +You can find this check within the Linux kernel source code related to interrupt setup (e.g. the `set_intr_gate` function in [arch/x86/kernel/idt.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/idt.c)). 
The first 32 vector numbers from `0` to `31` are reserved by the processor and used for the processing of architecture-defined exceptions and interrupts. You can find the table with the description of these vector numbers in the second part of the Linux kernel initialization process - [Early interrupt and exception handling](https://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-2.html). Vector numbers from `32` to `255` are designated as user-defined interrupts and are not reserved by the processor. These interrupts are generally assigned to external I/O devices to enable those devices to send interrupts to the processor. Now let's talk about the types of interrupts. Broadly speaking, we can split interrupts into 2 major classes: From 73fd0ad5e11e30c616054ce917e93b1fe194d463 Mon Sep 17 00:00:00 2001 From: Sebastian Fricke Date: Fri, 27 Mar 2020 07:01:38 +0100 Subject: [PATCH 2/8] add Sebastian Fricke to the contributor list --- contributors.md | 1 + 1 file changed, 1 insertion(+) diff --git a/contributors.md b/contributors.md index e76005e..0dea738 100644 --- a/contributors.md +++ b/contributors.md @@ -128,3 +128,4 @@ Thank you to all contributors: * [Stefan20162016](https://github.com/stefan20162016) * [Marco Torsello](https://github.com/md1512) * [Bruno Meneguele](https://github.com/bmeneguele) +* [Sebastian Fricke](https://github.com/initBasti) From 1bf6ed1ec9f826518f1dfad9fe0c073420ee0783 Mon Sep 17 00:00:00 2001 From: Sebastian Fricke Date: Wed, 1 Apr 2020 07:21:50 +0200 Subject: [PATCH 3/8] replace gate_struct64 with unified gate_struct As described in this: https://lore.kernel.org/lkml/20170828064957.861974317@linutronix.de/ mail from the lkml. And changed within this commit: https://github.com/torvalds/linux/commit/64b163fab684e3de47aa8db6cc08ae7d2e194373#diff-35bcd00365a749ba6cfa246a7dc86a68 The gate_struct was unified for 32-bit and 64-bit machines. Replaced gate_struct64 definition with that of gate_struct. 
--- Interrupts/linux-interrupts-1.md | 30 +++++++++--------------------- 1 file changed, 9 insertions(+), 21 deletions(-) diff --git a/Interrupts/linux-interrupts-1.md b/Interrupts/linux-interrupts-1.md index 411e7ee..4fbdca5 100644 --- a/Interrupts/linux-interrupts-1.md +++ b/Interrupts/linux-interrupts-1.md @@ -232,33 +232,21 @@ The `IST` or `Interrupt Stack Table` is a new mechanism in the `x86_64`. It is u The `Interrupt Descriptor Table` represented by the array of the `gate_desc` structures: ```C -extern gate_desc idt_table[]; +gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss; ``` -where `gate_desc` is: +where `gate_struct` is defined as: ```C +struct gate_struct { + u16 offset_low; + u16 segment; + struct idt_bits bits; + u16 offset_middle; #ifdef CONFIG_X86_64 -... -... -... -typedef struct gate_struct64 gate_desc; -... -... -... + u32 offset_high; + u32 reserved; #endif -``` - -and `gate_struct64` defined as: - -```C -struct gate_struct64 { - u16 offset_low; - u16 segment; - unsigned ist : 3, zero0 : 5, type : 5, dpl : 2, p : 1; - u16 offset_middle; - u32 offset_high; - u32 zero1; } __attribute__((packed)); ``` From e3711a1ac34dd33f9eeb55498c6668952b12a10b Mon Sep 17 00:00:00 2001 From: Sebastian Fricke Date: Wed, 1 Apr 2020 07:34:48 +0200 Subject: [PATCH 4/8] Add correct location & link to the definition Add link to the GitHub file location and the path within the source directory to gate_struct definition --- Interrupts/linux-interrupts-1.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/Interrupts/linux-interrupts-1.md b/Interrupts/linux-interrupts-1.md index 4fbdca5..b94008f 100644 --- a/Interrupts/linux-interrupts-1.md +++ b/Interrupts/linux-interrupts-1.md @@ -231,11 +231,13 @@ The `IST` or `Interrupt Stack Table` is a new mechanism in the `x86_64`. 
It is u The `Interrupt Descriptor Table` represented by the array of the `gate_desc` structures: + ```C -gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss; +extern gate_desc idt_table[]; ``` where `gate_struct` is defined as: +[/arch/x86/include/asm/desc\_defs.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/desc_defs.h) ```C struct gate_struct { From 7a3f099c76da551c457880ccac5ccd80eac52655 Mon Sep 17 00:00:00 2001 From: Sebastian Fricke Date: Sat, 4 Apr 2020 09:03:02 +0200 Subject: [PATCH 5/8] Replace irq_stack_union with new implementation The irq_stack is no longer within a irq_stack_union but separated into the irq_stack struct and the fixed_percpu_data struct This change was made with the following series of commits: https://github.com/torvalds/linux/commit/e6401c13093173aad709a5c6de00cf8d692ee786#diff-7db868ab08485b2578c9f97e45fb7d00 --- Interrupts/linux-interrupts-1.md | 31 ++++++++++++++++++++++--------- 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/Interrupts/linux-interrupts-1.md b/Interrupts/linux-interrupts-1.md index b94008f..8fd1ee4 100644 --- a/Interrupts/linux-interrupts-1.md +++ b/Interrupts/linux-interrupts-1.md @@ -281,22 +281,35 @@ The `PAGE_SIZE` is `4096`-bytes and the `THREAD_SIZE_ORDER` depends on the `KASA #define IRQ_STACK_SIZE (PAGE_SIZE << IRQ_STACK_ORDER) ``` -Or `16384` bytes. The per-cpu interrupt stack represented by the `irq_stack_union` union in the Linux kernel for `x86_64`: +Or `16384` bytes. The per-cpu interrupt stack is represented by the `irq_stack` struct and the `fixed_percpu_data` struct +in the Linux kernel for `x86_64`: ```C -union irq_stack_union { - char irq_stack[IRQ_STACK_SIZE]; +/* Per CPU interrupt stacks */ +struct irq_stack { + char stack[IRQ_STACK_SIZE]; +} __aligned(IRQ_STACK_SIZE); +``` - struct { - char gs_base[40]; - unsigned long stack_canary; - }; +```C +#ifdef CONFIG_X86_64 +struct fixed_percpu_data { + /* + * GCC hardcodes the stack canary as %gs:40. 
Since the + * irq_stack is the object at %gs:0, we reserve the bottom + * 48 bytes of the irq stack for the canary. + */ + char gs_base[40]; + unsigned long stack_canary; }; +... +#endif ``` -The first `irq_stack` field is a 16 kilobytes array. Also you can see that `irq_stack_union` contains a structure with the two fields: +The `irq_stack` struct contains 16 kilobytes array. +Also, you can see that the fixed\_percpu\_data contains two fields: -* `gs_base` - The `gs` register always points to the bottom of the `irqstack` union. On the `x86_64`, the `gs` register is shared by per-cpu area and stack canary (more about `per-cpu` variables you can read in the special [part](https://0xax.gitbooks.io/linux-insides/content/Concepts/linux-cpu-1.html)). All per-cpu symbols are zero-based and the `gs` points to the base of the per-cpu area. You already know that [segmented memory model](http://en.wikipedia.org/wiki/Memory_segmentation) is abolished in the long mode, but we can set the base address for the two segment registers - `fs` and `gs` with the [Model specific registers](http://en.wikipedia.org/wiki/Model-specific_register) and these registers can be still be used as address registers. If you remember the first [part](https://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-1.html) of the Linux kernel initialization process, you can remember that we have set the `gs` register: +* `gs_base` - The `gs` register always points to the bottom of the `fixed_percpu_data`. On the `x86_64`, the `gs` register is shared by per-cpu area and stack canary (more about `per-cpu` variables you can read in the special [part](https://0xax.gitbooks.io/linux-insides/content/Concepts/linux-cpu-1.html)). All per-cpu symbols are zero-based and the `gs` points to the base of the per-cpu area. 
You already know that [segmented memory model](http://en.wikipedia.org/wiki/Memory_segmentation) is abolished in the long mode, but we can set the base address for the two segment registers - `fs` and `gs` with the [Model specific registers](http://en.wikipedia.org/wiki/Model-specific_register) and these registers can be still be used as address registers. If you remember the first [part](https://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-1.html) of the Linux kernel initialization process, you can remember that we have set the `gs` register: ```assembly movl $MSR_GS_BASE,%ecx From 64a9777ca71fd91eaef4b9d21bb0810498141b94 Mon Sep 17 00:00:00 2001 From: Sebastian Fricke Date: Sat, 4 Apr 2020 09:09:02 +0200 Subject: [PATCH 6/8] Replace deprecated initial_gs initialization Within /arch/x86/kernel/head_64.S the implementation of the initialization was changed. Update the passage accordingly. https://github.com/torvalds/linux/commit/b1bd27b9ad45d77a2924e2168c6982c8ff1d8083#diff-a136f03867893e5d01eeadaba59c2dff Also fix a typo from a previous commit. --- Interrupts/linux-interrupts-1.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/Interrupts/linux-interrupts-1.md b/Interrupts/linux-interrupts-1.md index 8fd1ee4..c8cee23 100644 --- a/Interrupts/linux-interrupts-1.md +++ b/Interrupts/linux-interrupts-1.md @@ -306,7 +306,7 @@ struct fixed_percpu_data { #endif ``` -The `irq_stack` struct contains 16 kilobytes array. +The `irq_stack` struct contains a 16 kilobytes array. Also, you can see that the fixed\_percpu\_data contains two fields: * `gs_base` - The `gs` register always points to the bottom of the `fixed_percpu_data`. On the `x86_64`, the `gs` register is shared by per-cpu area and stack canary (more about `per-cpu` variables you can read in the special [part](https://0xax.gitbooks.io/linux-insides/content/Concepts/linux-cpu-1.html)). 
All per-cpu symbols are zero-based and the `gs` points to the base of the per-cpu area. You already know that [segmented memory model](http://en.wikipedia.org/wiki/Memory_segmentation) is abolished in the long mode, but we can set the base address for the two segment registers - `fs` and `gs` with the [Model specific registers](http://en.wikipedia.org/wiki/Model-specific_register) and these registers can be still be used as address registers. If you remember the first [part](https://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-1.html) of the Linux kernel initialization process, you can remember that we have set the `gs` register: @@ -318,11 +318,10 @@ Also, you can see that the fixed\_percpu\_data contains two fields: wrmsr ``` -where `initial_gs` points to the `irq_stack_union`: +where `initial_gs` points to the `fixed_percpu_data`: ```assembly -GLOBAL(initial_gs) -.quad INIT_PER_CPU_VAR(irq_stack_union) +SYM_DATA(initial_gs, .quad INIT_PER_CPU_VAR(fixed_percpu_data)) ``` * `stack_canary` - [Stack canary](http://en.wikipedia.org/wiki/Stack_buffer_overflow#Stack_canaries) for the interrupt stack is a `stack protector` From c96791d527a82671dee8c0d13653d8892e2453ed Mon Sep 17 00:00:00 2001 From: Sebastian Fricke Date: Sun, 5 Apr 2020 07:10:37 +0200 Subject: [PATCH 7/8] Update irq_stack initialization Replace irq_stack_union with fixed_percpu_data Update to the current system map Update description of initialization process Replace DECLARE macros with the current implementation --- Interrupts/linux-interrupts-1.md | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/Interrupts/linux-interrupts-1.md b/Interrupts/linux-interrupts-1.md index c8cee23..65a8dc3 100644 --- a/Interrupts/linux-interrupts-1.md +++ b/Interrupts/linux-interrupts-1.md @@ -327,13 +327,17 @@ SYM_DATA(initial_gs, .quad INIT_PER_CPU_VAR(fixed_percpu_data)) * `stack_canary` - [Stack 
canary](http://en.wikipedia.org/wiki/Stack_buffer_overflow#Stack_canaries) for the interrupt stack is a `stack protector` to verify that the stack hasn't been overwritten. Note that `gs_base` is a 40 bytes array. `GCC` requires that stack canary will be on the fixed offset from the base of the `gs` and its value must be `40` for the `x86_64` and `20` for the `x86`. -The `irq_stack_union` is the first datum in the `percpu` area, we can see it in the `System.map`: +The `fixed_percpu_data` is the first datum in the `percpu` area, we can see it in the `System.map`: ``` 0000000000000000 D __per_cpu_start -0000000000000000 D irq_stack_union -0000000000004000 d exception_stacks +0000000000000000 D fixed_percpu_data +00000000000001e0 A kexec_control_code_size +0000000000001000 D cpu_debug_store +0000000000002000 D irq_stack_backing_store +0000000000006000 D cpu_tss_rw 0000000000009000 D gdt_page +000000000000a000 d exception_stacks ... ... ... @@ -342,17 +346,21 @@ The `irq_stack_union` is the first datum in the `percpu` area, we can see it in We can see its definition in the code: ```C -DECLARE_PER_CPU_FIRST(union irq_stack_union, irq_stack_union) __visible; +DECLARE_PER_CPU_FIRST(struct fixed_percpu_data, fixed_percpu_data) __visible; ``` -Now, it's time to look at the initialization of the `irq_stack_union`. Besides the `irq_stack_union` definition, we can see the definition of the following per-cpu variables in the [arch/x86/include/asm/processor.h](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/include/asm/processor.h): +Now, it's time to look at the initialization of the `fixed_percpu_data`. 
Besides the `fixed_percpu_data` definition, we can see the definition of the following per-cpu variables in the [arch/x86/include/asm/processor.h](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/include/asm/processor.h): ```C -DECLARE_PER_CPU(char *, irq_stack_ptr); +DECLARE_PER_CPU(struct irq_stack *, hardirq_stack_ptr); +... DECLARE_PER_CPU(unsigned int, irq_count); +... +/* Per CPU softirq stack pointer */ +DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr); ``` -The first is the `irq_stack_ptr`. From the variable's name, it is obvious that this is a pointer to the top of the stack. The second - `irq_count` is used to check if a CPU is already on an interrupt stack or not. Initialization of the `irq_stack_ptr` is located in the `setup_per_cpu_areas` function in [arch/x86/kernel/setup_percpu.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/setup_percpu.c): +The first and third are the stack pointers for hardware and software interrupts. It is obvious from the name of the variables, that these point to the top of stacks. The second - `irq_count` is used to check if a CPU is already on an interrupt stack or not. Initialization of the `irq_stack_ptr` is located in the `setup_per_cpu_areas` function in [arch/x86/kernel/setup_percpu.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/setup_percpu.c): ```C void __init setup_per_cpu_areas(void) From 4a7f812014403eb5f20ae98975984cca9b0597e7 Mon Sep 17 00:00:00 2001 From: Sebastian Fricke Date: Sun, 5 Apr 2020 09:54:15 +0200 Subject: [PATCH 8/8] Update irq_stack initialization II Replace the removed initialization within setup_percpu.c with the initialization for X86_64 defined within irq_64.c Change the description accordingly. 
--- Interrupts/linux-interrupts-1.md | 31 ++++++++++++------------------- 1 file changed, 12 insertions(+), 19 deletions(-) diff --git a/Interrupts/linux-interrupts-1.md b/Interrupts/linux-interrupts-1.md index 65a8dc3..eeab497 100644 --- a/Interrupts/linux-interrupts-1.md +++ b/Interrupts/linux-interrupts-1.md @@ -360,31 +360,24 @@ DECLARE_PER_CPU(unsigned int, irq_count); DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr); ``` -The first and third are the stack pointers for hardware and software interrupts. It is obvious from the name of the variables, that these point to the top of stacks. The second - `irq_count` is used to check if a CPU is already on an interrupt stack or not. Initialization of the `irq_stack_ptr` is located in the `setup_per_cpu_areas` function in [arch/x86/kernel/setup_percpu.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/setup_percpu.c): +The first and third are the stack pointers for hardware and software interrupts. As the variable names suggest, they point to the top of the respective stacks. The second, `irq_count`, is used to check whether a CPU is already on an interrupt stack. Initialization of the `hardirq_stack_ptr` is located in the `irq_init_percpu_irqstack` function in [arch/x86/kernel/irq\_64.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/irq_64.c): ```C -void __init setup_per_cpu_areas(void) +int irq_init_percpu_irqstack(unsigned int cpu) { -... -... -#ifdef CONFIG_X86_64 -for_each_possible_cpu(cpu) { - ... - ... - ... - per_cpu(irq_stack_ptr, cpu) = - per_cpu(irq_stack_union.irq_stack, cpu) + - IRQ_STACK_SIZE - 64; - ... - ... - ... -#endif -... -... + if (per_cpu(hardirq_stack_ptr, cpu)) + return 0; + return map_irq_stack(cpu); } ``` -Here we go over all the CPUs one-by-one and setup `irq_stack_ptr`. This turns out to be equal to the top of the interrupt stack minus `64`. 
Why `64`?TODO [arch/x86/kernel/cpu/common.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/cpu/common.c) source code file is following: +Here the `hardirq_stack_ptr` of the given CPU is set up, unless it has already been initialized. +`map_irq_stack` is called to initialize the `hardirq_stack_ptr` +so that it points into the `irq_stack_backing_store` of that CPU at an offset of IRQ\_STACK\_SIZE, +either with guard pages or without them when KASan is enabled. + + +[arch/x86/kernel/cpu/common.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/cpu/common.c) source code file is following: ```C void load_percpu_segment(int cpu)