mirror of
https://github.com/0xAX/linux-insides.git
synced 2024-12-22 06:38:07 +00:00
Many fixes for initialization and MM related parts
This commit is contained in:
parent
4c6b631361
commit
f3bc5949e2
@ -20,6 +20,7 @@ First steps in the kernel
|
||||
Okay, we got the address of the decompressed kernel image from the `decompress_kernel` function into `rax` register and just jumped there. As we already know the entry point of the decompressed kernel image starts in the [arch/x86/kernel/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head_64.S) assembly source code file and at the beginning of it, we can see following definitions:
|
||||
|
||||
```assembly
|
||||
.text
|
||||
__HEAD
|
||||
.code64
|
||||
.globl startup_64
|
||||
@ -91,7 +92,11 @@ Here we just compare low part of the `rbp` register with the complemented value
|
||||
|
||||
```C
|
||||
#define PMD_PAGE_MASK (~(PMD_PAGE_SIZE-1))
|
||||
```
|
||||
|
||||
where `PMD_PAGE_SIZE` macro defined as:
|
||||
|
||||
```
|
||||
#define PMD_PAGE_SIZE (_AC(1, UL) << PMD_SHIFT)
|
||||
#define PMD_SHIFT 21
|
||||
```
|
||||
@ -207,10 +212,6 @@ After this we store address of the `_text` in the `rax` and get the index of the
|
||||
```assembly
|
||||
movq %rdi, %rax
|
||||
shrq $PGDIR_SHIFT, %rax
|
||||
|
||||
leaq (4096 + _KERNPG_TABLE)(%rbx), %rdx
|
||||
movq %rdx, 0(%rbx,%rax,8)
|
||||
movq %rdx, 8(%rbx,%rax,8)
|
||||
```
|
||||
|
||||
where `PGDIR_SHIFT` is `39`. `PGDIR_SHFT` indicates the mask for page global directory bits in a virtual address. There are macro for all types of page directories:
|
||||
@ -221,45 +222,50 @@ where `PGDIR_SHIFT` is `39`. `PGDIR_SHFT` indicates the mask for page global dir
|
||||
#define PMD_SHIFT 21
|
||||
```
|
||||
|
||||
After this we put the address of the first `level3_kernel_pgt` in the `rdx` with the `_KERNPG_TABLE` access rights (see above) and fill the `early_level4_pgt` with the 2 `level3_kernel_pgt` entries.
|
||||
After this we put the address of the first entry of the `early_dynamic_pgts` page table to the `rdx` register with the `_KERNPG_TABLE` access rights (see above) and fill the `early_level4_pgt` with the 2 `early_dynamic_pgts` entries:
|
||||
|
||||
After this we add `4096` (size of the `early_level4_pgt`) to the `rdx` (it now contains the address of the first entry of the `level3_kernel_pgt`) and put `rdi` (it now contains physical address of the `_text`) to the `rax`. And after this we write addresses of the two page upper directory entries to the `level3_kernel_pgt`:
|
||||
```assembly
|
||||
leaq (4096 + _KERNPG_TABLE)(%rbx), %rdx
|
||||
movq %rdx, 0(%rbx,%rax,8)
|
||||
movq %rdx, 8(%rbx,%rax,8)
|
||||
```
|
||||
|
||||
The `rbx` register contains address of the `early_level4_pgt` and `%rax * 8` here is index of a page global directory occupied by the `_text` address. So here we fill two entries of the `early_level4_pgt` with address of two entries of the `early_dynamic_pgts` which is related to `_text`. The `early_dynamic_pgts` is array of arrays:
|
||||
|
||||
```C
|
||||
extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
|
||||
```
|
||||
|
||||
which will store temporary page tables for early kernel which we will not move to the `init_level4_pgt`.
|
||||
|
||||
After this we add `4096` (size of the `early_level4_pgt`) to the `rdx` (it now contains the address of the first entry of the `early_dynamic_pgts`) and put `rdi` (it now contains physical address of the `_text`) to the `rax`. Now we shift address of the `_text` ot `PUD_SHIFT` to get index of an entry from page upper directory which contains this address and clears high bits to get only `pud` related part:
|
||||
|
||||
```assembly
|
||||
addq $4096, %rdx
|
||||
movq %rdi, %rax
|
||||
shrq $PUD_SHIFT, %rax
|
||||
andl $(PTRS_PER_PUD-1), %eax
|
||||
```
|
||||
|
||||
As we have index of a page upper directory we write two addresses of the second entry of the `early_dynamic_pgts` array to the first entry of this temporary page directory:
|
||||
|
||||
```assembly
|
||||
movq %rdx, 4096(%rbx,%rax,8)
|
||||
incl %eax
|
||||
andl $(PTRS_PER_PUD-1), %eax
|
||||
movq %rdx, 4096(%rbx,%rax,8)
|
||||
```
|
||||
|
||||
In the next step we write addresses of the page middle directory entries to the `level2_kernel_pgt` and the last step is correcting of the kernel text+data virtual addresses:
|
||||
In the next step we do the same operation for last page table directory, but filling not two entries, but all entries to cover full size of the kernel.
|
||||
|
||||
After our early page table directories filled, we put physical address of the `early_level4_pgt` to the `rax` register and jump to label `1`:
|
||||
|
||||
```assembly
|
||||
leaq level2_kernel_pgt(%rip), %rdi
|
||||
leaq 4096(%rdi), %r8
|
||||
1: testq $1, 0(%rdi)
|
||||
jz 2f
|
||||
addq %rbp, 0(%rdi)
|
||||
2: addq $8, %rdi
|
||||
cmp %r8, %rdi
|
||||
jne 1b
|
||||
```
|
||||
|
||||
Here we put the address of the `level2_kernel_pgt` to the `rdi` and address of the page table entry to the `r8` register. Next we check the present bit in the `level2_kernel_pgt` and if it is zero we're moving to the next page by adding 8 bytes to `rdi` which contains address of the `level2_kernel_pgt`. After this we compare it with `r8` (contains address of the page table entry) and go back to label `1` or move forward.
|
||||
|
||||
In the next step we correct `phys_base` physical address with `rbp` (contains physical address of the `_text`), put physical address of the `early_level4_pgt` and jump to label `1`:
|
||||
|
||||
```assembly
|
||||
addq %rbp, phys_base(%rip)
|
||||
movq $(early_level4_pgt - __START_KERNEL_map), %rax
|
||||
jmp 1f
|
||||
```
|
||||
|
||||
where `phys_base` matches the first entry of the `level2_kernel_pgt` which is `512` MB kernel mapping.
|
||||
That's all for now. Our early paging is prepared and we just need to finish last preparation before we will jump into C code and kernel entry point later.
|
||||
|
||||
Last preparation before jump at the kernel entry point
|
||||
--------------------------------------------------------------------------------
|
||||
@ -343,16 +349,16 @@ movq %rax, %cr0
|
||||
We already know that to run any code, and even more [C](https://en.wikipedia.org/wiki/C_%28programming_language%29) code from assembly, we need to setup a stack. As always, we are doing it by the setting of [stack pointer](https://en.wikipedia.org/wiki/Stack_register) to a correct place in memory and resetting [flags](https://en.wikipedia.org/wiki/FLAGS_register) register after this:
|
||||
|
||||
```assembly
|
||||
movq stack_start(%rip), %rsp
|
||||
movq initial_stack(%rip), %rsp
|
||||
pushq $0
|
||||
popfq
|
||||
```
|
||||
|
||||
The most interesting thing here is the `stack_start`. It defined in the same [source](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head_64.S) code file and looks like:
|
||||
The most interesting thing here is the `initial_stack`. This symbol is defined in the [source](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head_64.S) code file and looks like:
|
||||
|
||||
```assembly
|
||||
GLOBAL(stack_start)
|
||||
.quad init_thread_union+THREAD_SIZE-8
|
||||
GLOBAL(initial_stack)
|
||||
.quad init_thread_union+THREAD_SIZE-8
|
||||
```
|
||||
|
||||
The `GLOBAL` is already familiar to us from. It defined in the [arch/x86/include/asm/linkage.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/linkage.h) header file expands to the `global` symbol definition:
|
||||
@ -372,7 +378,7 @@ The `THREAD_SIZE` macro is defined in the [arch/x86/include/asm/page_64_types.h]
|
||||
|
||||
We consider when the [kasan](http://lxr.free-electrons.com/source/Documentation/kasan.txt) is disabled and the `PAGE_SIZE` is `4096` bytes. So the `THREAD_SIZE` will expands to `16` kilobytes and represents size of the stack of a thread. Why is `thread`? You may already know that each [process](https://en.wikipedia.org/wiki/Process_%28computing%29) may have parent [processes](https://en.wikipedia.org/wiki/Parent_process) and [child](https://en.wikipedia.org/wiki/Child_process) processes. Actually, a parent process and child process differ in stack. A new kernel stack is allocated for a new process. In the Linux kernel this stack is represented by the [union](https://en.wikipedia.org/wiki/Union_type#C.2FC.2B.2B) with the `thread_info` structure.
|
||||
|
||||
And as we can see the `init_thread_union` is represented by the `thread_union`, which defined as:
|
||||
And as we can see the `init_thread_union` is represented by the `thread_union` [union](https://en.wikipedia.org/wiki/Union_type#C.2FC.2B.2B). Earlier this union looked like:
|
||||
|
||||
```C
|
||||
union thread_union {
|
||||
@ -381,46 +387,40 @@ union thread_union {
|
||||
};
|
||||
```
|
||||
|
||||
and `init_thread_union` looks like:
|
||||
but from the Linux kernel `4.9-rc1` release, `thread_info` was moved to the `task_struct` structure which represents a thread. So, for now `thread_union` looks like:
|
||||
|
||||
```C
|
||||
union thread_union init_thread_union __init_task_data =
|
||||
{ INIT_THREAD_INFO(init_task) };
|
||||
union thread_union {
|
||||
#ifndef CONFIG_THREAD_INFO_IN_TASK
|
||||
struct thread_info thread_info;
|
||||
#endif
|
||||
unsigned long stack[THREAD_SIZE/sizeof(long)];
|
||||
};
|
||||
```
|
||||
|
||||
Where the `INIT_THREAD_INFO` macro takes `task_struct` structure which represents process descriptor in the Linux kernel and does some basic initialization of the given `task_struct` structure:
|
||||
where the `CONFIG_THREAD_INFO_IN_TASK` kernel configuration option is enabled for `x86_64` architecture. So, as we consider only `x86_64` architecture in this book, an instance of `thread_union` will contain only stack and `thread_info` structure will be placed in the `task_struct`.
|
||||
|
||||
```C
|
||||
#define INIT_THREAD_INFO(tsk) \
|
||||
{ \
|
||||
.task = &tsk, \
|
||||
.flags = 0, \
|
||||
.cpu = 0, \
|
||||
.addr_limit = KERNEL_DS, \
|
||||
}
|
||||
```
|
||||
|
||||
So, the `thread_union` contains low-level information about a process and process's stack and placed in the bottom of stack:
|
||||
The `init_thread_union` looks like:
|
||||
|
||||
```
|
||||
+-----------------------+
|
||||
| |
|
||||
| |
|
||||
| |
|
||||
| Kernel stack |
|
||||
| |
|
||||
| |
|
||||
| |
|
||||
|-----------------------|
|
||||
| |
|
||||
| struct thread_info |
|
||||
| |
|
||||
+-----------------------+
|
||||
union thread_union init_thread_union __init_task_data = {
|
||||
#ifndef CONFIG_THREAD_INFO_IN_TASK
|
||||
INIT_THREAD_INFO(init_task)
|
||||
#endif
|
||||
};
|
||||
```
|
||||
|
||||
Note that we reserve `8` bytes at the to of stack. This is necessary to guarantee illegal access of the next page memory.
|
||||
which represents just thread stack. Now we may understand this expression:
|
||||
|
||||
After the early boot stack is set, to update the [Global Descriptor Table](https://en.wikipedia.org/wiki/Global_Descriptor_Table) with `lgdt` instruction:
|
||||
```assembly
|
||||
GLOBAL(initial_stack)
|
||||
.quad init_thread_union+THREAD_SIZE-8
|
||||
```
|
||||
|
||||
|
||||
that `initial_stack` symbol points to the start of the `thread_union.stack` array + `THREAD_SIZE` which is 16 killobytes and - 8 bytes. Here we need to subtract `8` bytes at the to of stack. This is necessary to guarantee illegal access of the next page memory.
|
||||
|
||||
After the early boot stack is set, to update the [Global Descriptor Table](https://en.wikipedia.org/wiki/Global_Descriptor_Table) with the `lgdt` instruction:
|
||||
|
||||
```assembly
|
||||
lgdt early_gdt_descr(%rip)
|
||||
@ -441,7 +441,9 @@ We need to reload `Global Descriptor Table` because now kernel works in the low
|
||||
#define GDT_ENTRIES 32
|
||||
```
|
||||
|
||||
for kernel code, data, thread local storage segments and etc... it's simple. Now let's look at the `early_gdt_descr_base`. First of `gdt_page` defined as:
|
||||
for kernel code, data, thread local storage segments and etc... it's simple. Now let's look at the definition of the `early_gdt_descr_base`.
|
||||
|
||||
First of `gdt_page` defined as:
|
||||
|
||||
```C
|
||||
struct gdt_page {
|
||||
@ -516,10 +518,9 @@ We need to put `MSR_GS_BASE` to the `ecx` register and load data from the `eax`
|
||||
In the next step we put the address of the real mode bootparam structure to the `rdi` (remember `rsi` holds pointer to this structure from the start) and jump to the C code with:
|
||||
|
||||
```assembly
|
||||
movq initial_code(%rip),%rax
|
||||
pushq $0
|
||||
pushq $__KERNEL_CS
|
||||
pushq %rax
|
||||
movq initial_code(%rip), %rax
|
||||
pushq $__KERNEL_CS # set correct cs
|
||||
pushq %rax # target address in negative space
|
||||
lretq
|
||||
```
|
||||
|
||||
|
@ -130,7 +130,7 @@ void set_task_stack_end_magic(struct task_struct *tsk)
|
||||
}
|
||||
```
|
||||
|
||||
Its implementation is simple. `set_task_stack_end_magic` gets the end of the stack for the given `task_struct` with the `end_of_stack` function. The end of a process stack depends on the `CONFIG_STACK_GROWSUP` configuration option. As we learn in `x86_64` architecture, the stack grows down. So the end of the process stack will be:
|
||||
Its implementation is simple. `set_task_stack_end_magic` gets the end of the stack for the given `task_struct` with the `end_of_stack` function. Earlier (and now for all architectures besides `x86_64`) stack was located in the `thread_info` structure. So the end of a process stack depends on the `CONFIG_STACK_GROWSUP` configuration option. As we learn in `x86_64` architecture, the stack grows down. So the end of the process stack will be:
|
||||
|
||||
```C
|
||||
(unsigned long *)(task_thread_info(p) + 1);
|
||||
@ -142,13 +142,52 @@ where `task_thread_info` just returns the stack which we filled with the `INIT_T
|
||||
#define task_thread_info(task) ((struct thread_info *)(task)->stack)
|
||||
```
|
||||
|
||||
From the Linux kernel `v4.9-rc1` release, `thread_info` structure may contains only flags and stack pointer resides in `task_struct` structure which represents a thread in the Linux kernel. This depends on `CONFIG_THREAD_INFO_IN_TASK` kernel configuration option which is enabled by default for `x86_64`. You can be sure in this if you will look in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) configuration build file:
|
||||
|
||||
```
|
||||
config THREAD_INFO_IN_TASK
|
||||
bool
|
||||
help
|
||||
Select this to move thread_info off the stack into task_struct. To
|
||||
make this work, an arch will need to remove all thread_info fields
|
||||
except flags and fix any runtime bugs.
|
||||
|
||||
One subtle change that will be needed is to use try_get_task_stack()
|
||||
and put_task_stack() in save_thread_stack_tsk() and get_wchan().
|
||||
```
|
||||
|
||||
and [arch/x86/Kconfig](https://github.com/torvalds/linux/blob/master/arch/x86/Kconfig):
|
||||
|
||||
```
|
||||
config X86
|
||||
def_bool y
|
||||
...
|
||||
...
|
||||
...
|
||||
select THREAD_INFO_IN_TASK
|
||||
...
|
||||
...
|
||||
...
|
||||
```
|
||||
|
||||
So, in this way we may just get end of a thread stack from the given `task_struct` structure:
|
||||
|
||||
```C
|
||||
#ifdef CONFIG_THREAD_INFO_IN_TASK
|
||||
static inline unsigned long *end_of_stack(const struct task_struct *task)
|
||||
{
|
||||
return task->stack;
|
||||
}
|
||||
#endif
|
||||
```
|
||||
|
||||
As we got the end of the init process stack, we write `STACK_END_MAGIC` there. After `canary` is set, we can check it like this:
|
||||
|
||||
```C
|
||||
if (*end_of_stack(task) != STACK_END_MAGIC) {
|
||||
//
|
||||
// handle stack overflow here
|
||||
//
|
||||
//
|
||||
}
|
||||
```
|
||||
|
||||
|
@ -8,9 +8,9 @@ In the previous [part](http://0xax.gitbooks.io/linux-insides/content/Initializat
|
||||
|
||||
```
|
||||
----------------------------------------------------------------------------------------------
|
||||
|Vector|Mnemonic|Description |Type |Error Code|Source |
|
||||
|Vector|Mnemonic|Description |Type |Error Code|Source |
|
||||
----------------------------------------------------------------------------------------------
|
||||
|3 | #BP |Breakpoint |Trap |NO |INT 3 |
|
||||
|3 | #BP |Breakpoint |Trap |NO |INT 3 |
|
||||
----------------------------------------------------------------------------------------------
|
||||
```
|
||||
|
||||
@ -29,7 +29,9 @@ We already saw implementation of the `set_intr_gate` in the previous part about
|
||||
|
||||
* number of the interrupt;
|
||||
* base address of the interrupt/exception handler;
|
||||
* third parameter is - `Interrupt Stack Table`. `IST` is a new mechanism in the `x86_64` and part of the [TSS](http://en.wikipedia.org/wiki/Task_state_segment). Every active thread in kernel mode has own kernel stack which is 16 kilobytes. While a thread in user space, kernel stack is empty except `thread_info` (read about it previous [part](http://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-4.html)) at the bottom. In addition to per-thread stacks, there are a couple of specialized stacks associated with each CPU. All about these stack you can read in the linux kernel documentation - [Kernel stacks](https://www.kernel.org/doc/Documentation/x86/x86_64/kernel-stacks). `x86_64` provides feature which allows to switch to a new `special` stack for during any events as non-maskable interrupt and etc... And the name of this feature is - `Interrupt Stack Table`. There can be up to 7 `IST` entries per CPU and every entry points to the dedicated stack. In our case this is `DEBUG_STACK`.
|
||||
* third parameter is - `Interrupt Stack Table`. `IST` is a new mechanism in the `x86_64` and part of the [TSS](http://en.wikipedia.org/wiki/Task_state_segment). Every active thread in kernel mode has own kernel stack which is `16` kilobytes. While a thread in user space, this kernel stack is empty.
|
||||
|
||||
In addition to per-thread stacks, there are a couple of specialized stacks associated with each CPU. All about these stack you can read in the linux kernel documentation - [Kernel stacks](https://www.kernel.org/doc/Documentation/x86/x86_64/kernel-stacks). `x86_64` provides feature which allows to switch to a new `special` stack for during any events as non-maskable interrupt and etc... And the name of this feature is - `Interrupt Stack Table`. There can be up to 7 `IST` entries per CPU and every entry points to the dedicated stack. In our case this is `DEBUG_STACK`.
|
||||
|
||||
`set_intr_gate_ist` and `set_system_intr_gate_ist` work by the same principle as `set_intr_gate` with only one difference. Both of these functions checks
|
||||
interrupt number and call `_set_gate` inside:
|
||||
|
@ -33,10 +33,12 @@ Base virtual address and size of the `fix-mapped` area are presented by the two
|
||||
|
||||
```C
|
||||
#define FIXADDR_SIZE (__end_of_permanent_fixed_addresses << PAGE_SHIFT)
|
||||
#define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE)
|
||||
#define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE)
|
||||
```
|
||||
|
||||
Here `__end_of_permanent_fixed_addresses` is an element of the `fixed_addresses` enum and as I wrote above: Every fix-mapped address is represented by an integer index which is defined in the `fixed_addresses`. `PAGE_SHIFT` determines the size of a page. For example size of the one page we can get with the `1 << PAGE_SHIFT`. In our case we need to get the size of the fix-mapped area, but not only of one page, that's why we are using `__end_of_permanent_fixed_addresses` for getting the size of the fix-mapped area. In my case it's a little more than `536` kilobytes. In your case it might be a different number, because the size depends on amount of the fix-mapped addresses which are depends on your kernel's configuration.
|
||||
Here `__end_of_permanent_fixed_addresses` is an element of the `fixed_addresses` enum and as I wrote above: Every fix-mapped address is represented by an integer index which is defined in the `fixed_addresses`. `PAGE_SHIFT` determines the size of a page. For example size of the one page we can get with the `1 << PAGE_SHIFT` expression.
|
||||
|
||||
In our case we need to get the size of the fix-mapped area, but not only of one page, that's why we are using `__end_of_permanent_fixed_addresses` for getting the size of the fix-mapped area. The `__end_of_permanent_fixed_addresses` is the last index of the `fixed_addresses` enum or in other words the `__end_of_permanent_fixed_addresses` contains amount of pages in a fixed-mapped area. So if multiply value of the `__end_of_permanent_fixed_addresses` on a page size value we will get size of fix-mapped area. In my case it's a little more than `536` kilobytes. In your case it might be a different number, because the size depends on amount of the fix-mapped addresses which are depends on your kernel's configuration.
|
||||
|
||||
The second `FIXADDR_START` macro just subtracts the fix-mapped area size from the last address of the fix-mapped area to get its base virtual address. `FIXADDR_TOP` is a rounded up address from the base address of the [vsyscall](https://lwn.net/Articles/446528/) space:
|
||||
|
||||
@ -60,7 +62,19 @@ first of all it checks that the index given for the `fixed_addresses` enum is no
|
||||
#define __fix_to_virt(x) (FIXADDR_TOP - ((x) << PAGE_SHIFT))
|
||||
```
|
||||
|
||||
Here we shift left the given `fix-mapped` address index on the `PAGE_SHIFT` which determines size of a page as I wrote above and subtract it from the `FIXADDR_TOP` which is the highest address of the `fix-mapped` area. There is an inverse function for getting `fix-mapped` address from a virtual address:
|
||||
Here we shift left the given index of a `fix-mapped` area on the `PAGE_SHIFT` which determines size of a page as I wrote above and subtract it from the `FIXADDR_TOP` which is the highest address of the `fix-mapped` area:
|
||||
|
||||
```
|
||||
+-----------------+
|
||||
| PAGE 1 | FIXADDR_TOP (virt address)
|
||||
| PAGE 2 |
|
||||
| PAGE 3 |
|
||||
| PAGE 4 (idx) | x - 4
|
||||
| PAGE 5 |
|
||||
+-----------------+
|
||||
```
|
||||
|
||||
There is an inverse function for getting an index of a fix-mapped area corresponding to the given virtual address:
|
||||
|
||||
```C
|
||||
static inline unsigned long virt_to_fix(const unsigned long vaddr)
|
||||
@ -70,15 +84,19 @@ static inline unsigned long virt_to_fix(const unsigned long vaddr)
|
||||
}
|
||||
```
|
||||
|
||||
`virt_to_fix` takes a virtual address, checks that this address is between `FIXADDR_START` and `FIXADDR_TOP` and calls the `__virt_to_fix` macro which implemented as:
|
||||
The `virt_to_fix` takes a virtual address, checks that this address is between `FIXADDR_START` and `FIXADDR_TOP` and calls the `__virt_to_fix` macro which implemented as:
|
||||
|
||||
```C
|
||||
#define __virt_to_fix(x) ((FIXADDR_TOP - ((x)&PAGE_MASK)) >> PAGE_SHIFT)
|
||||
```
|
||||
|
||||
A PFN is simply an index within physical memory that is counted in page-sized units. PFN for a physical address could be trivially defined as (page_phys_addr >> PAGE_SHIFT);
|
||||
As we may see, the `__virt_to_fix` macro clears the first `12` bits in the given virtual address, subtracts it from the last address the of `fix-mapped` area (`FIXADDR_TOP`) and shifts the result right on `PAGE_SHIFT` which is `12`. Let me explain how it works.
|
||||
|
||||
`__virt_to_fix` clears the first 12 bits in the given address, subtracts it from the last address the of `fix-mapped` area (`FIXADDR_TOP`) and shifts the result right on `PAGE_SHIFT` which is `12`. Let me explain how it works. As I already wrote we will clear the first 12 bits in the given address with `x & PAGE_MASK`. As we subtract this from the `FIXADDR_TOP`, we will get the last 12 bits of the `FIXADDR_TOP` which are present. We know that the first 12 bits of the virtual address represent the offset in the page frame. With the shifting it on `PAGE_SHIFT` we will get `Page frame number` which is just all bits in a virtual address besides the first 12 offset bits. `Fix-mapped` addresses are used in different [places](http://lxr.free-electrons.com/ident?i=fix_to_virt) in the linux kernel. `IDT` descriptor stored there, [Intel Trusted Execution Technology](http://en.wikipedia.org/wiki/Trusted_Execution_Technology) UUID stored in the `fix-mapped` area started from `FIX_TBOOT_BASE` index, [Xen](http://en.wikipedia.org/wiki/Xen) bootmap and many more... We already saw a little about `fix-mapped` addresses in the fifth [part](http://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-5.html) about of the linux kernel initialization. We use `fix-mapped` area in the early `ioremap` initialization. Let's look at it more closely and try to understand what `ioremap` is, how it is implemented in the kernel and how it is related to the `fix-mapped` addresses.
|
||||
As in previous example (in `__fix_to_virt` macro), we start from the top of the fix-mapped area. We also go back to bottom from the top to search an index of a fix-mapped area corresponding to the given virtual address. As you may see, forst of all we will clear the first `12` bits in the given virtual address with `x & PAGE_MASK` expression. This allows us to get base address of page. We need to do this for case when the given virtual address points somewhere in a beginning/middle or end of a page, but not to the base address of it. At the next step subtract this from the `FIXADDR_TOP` and this gives us virtual address of a correspinding page in a fix-mapped area. In the end we just divide value of this address on `PAGE_SHIFT`. This gives us index of a fix-mapped area corresponding to the given virtual address. It may looks hard, but if you will go through this step by step, you will be sure that the `__virt_to_fix` macro is pretty easy.
|
||||
|
||||
That's all. For this moment we know a little about `fix-mapped` addresses, but this is enough to go next.
|
||||
|
||||
`Fix-mapped` addresses are used in different [places](http://lxr.free-electrons.com/ident?i=fix_to_virt) in the linux kernel. `IDT` descriptor stored there, [Intel Trusted Execution Technology](http://en.wikipedia.org/wiki/Trusted_Execution_Technology) UUID stored in the `fix-mapped` area started from `FIX_TBOOT_BASE` index, [Xen](http://en.wikipedia.org/wiki/Xen) bootmap and many more... We already saw a little about `fix-mapped` addresses in the fifth [part](http://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-5.html) about of the linux kernel initialization. We use `fix-mapped` area in the early `ioremap` initialization. Let's look at it more closely and try to understand what `ioremap` is, how it is implemented in the kernel and how it is related to the `fix-mapped` addresses.
|
||||
|
||||
ioremap
|
||||
--------------------------------------------------------------------------------
|
||||
|
Loading…
Reference in New Issue
Block a user