mirror of
https://github.com/0xAX/linux-insides.git
synced 2025-01-21 21:21:18 +00:00
Syscall: fix syscall table entry for x86_64
Fix the maximum syscall number of the syscall table for x86_64 as of kernel version v5.0.0-rc7.
Signed-off-by: Bo YU <tsu.yubo@gmail.com>
parent
c9fbcafdb6
commit
d71137fbf2
@@ -1,7 +1,7 @@
System calls in the Linux kernel. Part 2.
================================================================================
-How does the Linux kernel handle a system call
+How does the Linux kernel handle a system call
--------------------------------------------------------------------------------
The previous [part](https://0xax.gitbooks.io/linux-insides/content/SysCall/linux-syscall-1.html) was the first part of the chapter that describes the [system call](https://en.wikipedia.org/wiki/System_call) concepts in the Linux kernel.
@@ -51,10 +51,10 @@ asmlinkage const sys_call_ptr_t sys_call_table[__NR_syscall_max+1] = {
};
```
-As we can see, the `sys_call_table` is an array of `__NR_syscall_max + 1` size where the `__NR_syscall_max` macro represents the maximum number of system calls for the given [architecture](https://en.wikipedia.org/wiki/List_of_CPU_architectures). This book is about the [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture, so for our case the `__NR_syscall_max` is `322` and this is the correct number at the time of writing (current Linux kernel version is `4.2.0-rc8+`). We can see this macro in the header file generated by [Kbuild](https://www.kernel.org/doc/Documentation/kbuild/makefiles.txt) during kernel compilation - `include/generated/asm-offsets.h`:
+As we can see, the `sys_call_table` is an array of `__NR_syscall_max + 1` size where the `__NR_syscall_max` macro represents the maximum number of system calls for the given [architecture](https://en.wikipedia.org/wiki/List_of_CPU_architectures). This book is about the [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture, so for our case the `__NR_syscall_max` is `547` and this is the correct number at the time of writing (current Linux kernel version is `5.0.0-rc7`). We can see this macro in the header file generated by [Kbuild](https://www.kernel.org/doc/Documentation/kbuild/makefiles.txt) during kernel compilation - `include/generated/asm-offsets.h`:
```C
-#define __NR_syscall_max 322
+#define __NR_syscall_max 547
```
There will be the same number of system calls in the [arch/x86/entry/syscalls/syscall_64.tbl](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/entry/syscalls/syscall_64.tbl#L331) for the `x86_64`. There are two important topics here: the type of the `sys_call_table` array, and the initialization of elements in this array. First of all, the type. The `sys_call_ptr_t` is the type of each entry in the system call table. It is defined as a [typedef](https://en.wikipedia.org/wiki/Typedef) for a pointer to a function that returns nothing and does not take arguments:
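To illustrate the two topics together, here is a minimal user-space sketch of such a table. It is not the kernel's code: every `demo_` name is hypothetical, `547` is simply the value quoted above, and the range designator is a GNU C extension (accepted by GCC and Clang) that we assume is available.

```c
#include <stddef.h>

/* Hypothetical stand-in for __NR_syscall_max; 547 is the value quoted above. */
#define DEMO_NR_SYSCALL_MAX 547

/* Same shape as the typedef described above: no arguments, no return value. */
typedef void (*demo_sys_call_ptr_t)(void);

/* Counts how often the illustrative handler ran. */
static int demo_calls_seen;

/* Placeholder for table slots without a real implementation. */
static void demo_ni_syscall(void) { }

/* Illustrative handler; it only records that it was called. */
static void demo_sys_example(void) { demo_calls_seen++; }

/* Valid indices run from 0 to DEMO_NR_SYSCALL_MAX inclusive, hence the "+ 1". */
static const demo_sys_call_ptr_t demo_sys_call_table[DEMO_NR_SYSCALL_MAX + 1] = {
    /* GNU C range designator: fill every slot with the placeholder first. */
    [0 ... DEMO_NR_SYSCALL_MAX] = demo_ni_syscall,
    /* Then override the slots that have a real handler. */
    [1] = demo_sys_example,
};

/* Number of entries in the table. */
static size_t demo_table_entries(void)
{
    return sizeof(demo_sys_call_table) / sizeof(demo_sys_call_table[0]);
}
```

Calling `demo_sys_call_table[1]()` runs the example handler, while any other slot falls through to the placeholder.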
@@ -247,7 +247,7 @@ sub $(6*8), %rsp
When a system call occurs from the user's application, general purpose registers have the following state:
-* `rax` - contains system call number;
+* `rax` - contains system call number;
* `rcx` - contains return address to the user space;
* `r11` - contains the saved flags register (`rflags`);
* `rdi` - contains first argument of a system call handler;
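
To make the register state above concrete, here is a minimal user-space sketch that issues `write` with the `syscall` instruction. `demo_raw_write` is a hypothetical helper, not a libc or kernel API. The use of `rsi` and `rdx` for the second and third arguments follows the usual x86_64 Linux convention and is an assumption here, since only `rdi` is named in the list above. On other platforms the sketch falls back to libc's `syscall()`.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* syscall(), used only in the fallback */

/*
 * Hypothetical helper: issue write(2) directly.
 * On x86_64 a failure is returned as a negative errno value;
 * the libc fallback returns -1 instead.
 */
static long demo_raw_write(int fd, const void *buf, size_t len)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a" (ret)                 /* rax: result on exit               */
        : "0" ((long)SYS_write),     /* rax: system call number on entry  */
          "D" ((long)fd),            /* rdi: first argument               */
          "S" (buf),                 /* rsi: second argument (assumed)    */
          "d" (len)                  /* rdx: third argument (assumed)     */
        : "rcx", "r11", "memory"     /* CPU stores return address and flags here */
    );
    return ret;
#else
    /* Other platforms: let libc choose the calling convention. */
    return syscall(SYS_write, fd, buf, len);
#endif
}
```

The clobber list names `rcx` and `r11` because the processor overwrites them with the return address and the flags, as described in the list above.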
@@ -293,7 +293,7 @@ where the `__X32_SYSCALL_BIT` is
As we can see the `__SYSCALL_MASK` depends on the `CONFIG_X86_X32_ABI` kernel configuration option and represents the mask for the 32-bit [ABI](https://en.wikipedia.org/wiki/Application_binary_interface) in the 64-bit kernel.
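
A rough C model of this mask may help. It is only a sketch: the `demo_` names are hypothetical, the configuration option is modelled as a run-time flag instead of a preprocessor symbol, the bit value `0x40000000` is assumed from the kernel's headers, and `547` is the maximum quoted earlier.

```c
#include <stdbool.h>

/* Assumed value of __X32_SYSCALL_BIT: bit 30 marks x32 system call numbers. */
#define DEMO_X32_SYSCALL_BIT 0x40000000u
/* Hypothetical stand-in for __NR_syscall_max. */
#define DEMO_NR_SYSCALL_MAX  547u

/* Mask applied to the system call number: all ones when x32 support is off. */
static unsigned int demo_syscall_mask(bool x32_enabled)
{
    return x32_enabled ? ~DEMO_X32_SYSCALL_BIT : ~0u;
}

/* True when the (possibly masked) number fits inside the table. */
static bool demo_syscall_nr_ok(unsigned long nr, bool x32_enabled)
{
    if (!x32_enabled)
        return nr <= DEMO_NR_SYSCALL_MAX;   /* the whole of rax is compared */

    /* With x32 enabled only the low 32 bits (eax) are masked and compared. */
    return ((unsigned int)nr & demo_syscall_mask(true)) <= DEMO_NR_SYSCALL_MAX;
}
```

With the flag off, a number such as `0x40000001` is rejected as out of range; with the flag on, the x32 bit is cleared first and the same number passes the check.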
-So we check the value of the `__SYSCALL_MASK`. If `CONFIG_X86_X32_ABI` is disabled, we compare the value of the `rax` register to the maximum syscall number (`__NR_syscall_max`). If `CONFIG_X86_X32_ABI` is enabled, we first mask the `eax` register with `__SYSCALL_MASK`, which clears the `__X32_SYSCALL_BIT`, and then do the same comparison:
+So we check the value of the `__SYSCALL_MASK`. If `CONFIG_X86_X32_ABI` is disabled, we compare the value of the `rax` register to the maximum syscall number (`__NR_syscall_max`). If `CONFIG_X86_X32_ABI` is enabled, we first mask the `eax` register with `__SYSCALL_MASK`, which clears the `__X32_SYSCALL_BIT`, and then do the same comparison:
```assembly
#if __SYSCALL_MASK == ~0
@@ -310,7 +310,7 @@ After this we check the result of the last comparison with the `ja` instruction
ja 1f
```
-and if the system call number is valid, we move the fourth argument from `r10` to `rcx` to stay compliant with the [x86_64 C ABI](http://www.x86-64.org/documentation/abi.pdf) and execute the `call` instruction with the address of the system call handler:
+and if the system call number is valid, we move the fourth argument from `r10` to `rcx` to stay compliant with the [x86_64 C ABI](http://www.x86-64.org/documentation/abi.pdf) and execute the `call` instruction with the address of the system call handler:
```assembly
movq %r10, %rcx
@@ -367,11 +367,11 @@ In the end we just call the `USERGS_SYSRET64` macro that expands to the call of
Now we know what occurs when a user application calls a system call. The full path of this process is as follows:
* The user application contains code that fills the general purpose registers with values (the system call number and the arguments of this system call);
-* The processor switches from user mode to kernel mode and starts executing the system call entry point, `entry_SYSCALL_64`;
+* The processor switches from user mode to kernel mode and starts executing the system call entry point, `entry_SYSCALL_64`;
* `entry_SYSCALL_64` switches to the kernel stack and saves some general purpose registers, the old stack and code segment, flags, etc. on the stack;
* `entry_SYSCALL_64` checks the system call number in the `rax` register, looks up the system call handler in the `sys_call_table` and calls it if the system call number is valid;
* If the system call number is not valid, jump to the exit from the system call;
-* After the system call handler finishes its work, `entry_SYSCALL_64` restores the general purpose registers, old stack, flags and return address, and exits with the `sysretq` instruction.
+* After the system call handler finishes its work, `entry_SYSCALL_64` restores the general purpose registers, old stack, flags and return address, and exits with the `sysretq` instruction.
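
The path above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: all `demo_` names are hypothetical, the handlers receive a register snapshot (a simplification, not the `sys_call_ptr_t` signature discussed earlier), the table has just three entries, and returning `-ENOSYS` for an invalid number is an assumption about the exit path, not something shown in this part.

```c
#include <errno.h>

/* Tiny table for illustration only; the real maximum is much larger. */
#define DEMO_NR_SYSCALL_MAX 2

/* Hypothetical register snapshot filled in by the "user application". */
struct demo_regs {
    unsigned long rax;   /* system call number on entry, result on exit */
    unsigned long rdi;   /* first argument */
    unsigned long rsi;   /* second argument (assumed convention) */
};

typedef long (*demo_handler_t)(const struct demo_regs *);

/* Three illustrative handlers. */
static long demo_sys_zero(const struct demo_regs *r)  { (void)r; return 0; }
static long demo_sys_add(const struct demo_regs *r)   { return (long)(r->rdi + r->rsi); }
static long demo_sys_first(const struct demo_regs *r) { return (long)r->rdi; }

/* Handlers indexed by system call number. */
static const demo_handler_t demo_table[DEMO_NR_SYSCALL_MAX + 1] = {
    demo_sys_zero, demo_sys_add, demo_sys_first,
};

/* Check the number, call the handler, and store the result back in rax. */
static void demo_dispatch(struct demo_regs *regs)
{
    if (regs->rax > DEMO_NR_SYSCALL_MAX) {
        /* Invalid number: skip the call and go straight to the exit path. */
        regs->rax = (unsigned long)-ENOSYS;
        return;
    }
    regs->rax = (unsigned long)demo_table[regs->rax](regs);
}
```

Filling a `struct demo_regs` with number `1` and arguments `2` and `3` and passing it to `demo_dispatch` leaves `5` in `rax`; an out-of-range number leaves the error value there instead.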
That's all.