1
0
mirror of https://github.com/0xAX/linux-insides.git synced 2024-12-22 06:38:07 +00:00

Syscall: fix syscall table entry for 86_64

Fix the maxium number of syscall table for 86_64
in kernel version: v5.0.0-rc7

Signed-off-by: Bo YU <tsu.yubo@gmail.com>
This commit is contained in:
Bo YU 2019-02-24 21:45:00 +08:00
parent c9fbcafdb6
commit d71137fbf2

View File

@ -1,7 +1,7 @@
System calls in the Linux kernel. Part 2.
================================================================================
How does the Linux kernel handle a system call
How does the Linux kernel handle a system call
--------------------------------------------------------------------------------
The previous [part](https://0xax.gitbooks.io/linux-insides/content/SysCall/linux-syscall-1.html) was the first part of the chapter that describes the [system call](https://en.wikipedia.org/wiki/System_call) concepts in the Linux kernel.
@ -51,10 +51,10 @@ asmlinkage const sys_call_ptr_t sys_call_table[__NR_syscall_max+1] = {
};
```
As we can see, the `sys_call_table` is an array of `__NR_syscall_max + 1` size where the `__NR_syscall_max` macro represents the maximum number of system calls for the given [architecture](https://en.wikipedia.org/wiki/List_of_CPU_architectures). This book is about the [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture, so for our case the `__NR_syscall_max` is `322` and this is the correct number at the time of writing (current Linux kernel version is `4.2.0-rc8+`). We can see this macro in the header file generated by [Kbuild](https://www.kernel.org/doc/Documentation/kbuild/makefiles.txt) during kernel compilation - include/generated/asm-offsets.h`:
As we can see, the `sys_call_table` is an array of `__NR_syscall_max + 1` size where the `__NR_syscall_max` macro represents the maximum number of system calls for the given [architecture](https://en.wikipedia.org/wiki/List_of_CPU_architectures). This book is about the [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture, so for our case the `__NR_syscall_max` is `547` and this is the correct number at the time of writing (current Linux kernel version is `5.0.0-rc7`). We can see this macro in the header file generated by [Kbuild](https://www.kernel.org/doc/Documentation/kbuild/makefiles.txt) during kernel compilation - include/generated/asm-offsets.h`:
```C
#define __NR_syscall_max 322
#define __NR_syscall_max 547
```
There will be the same number of system calls in the [arch/x86/entry/syscalls/syscall_64.tbl](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/entry/syscalls/syscall_64.tbl#L331) for the `x86_64`. There are two important topics here; the type of the `sys_call_table` array, and the initialization of elements in this array. First of all, the type. The `sys_call_ptr_t` represents a pointer to a system call table. It is defined as [typedef](https://en.wikipedia.org/wiki/Typedef) for a function pointer that returns nothing and does not take arguments:
@ -247,7 +247,7 @@ sub $(6*8), %rsp
When a system call occurs from the user's application, general purpose registers have the following state:
* `rax` - contains system call number;
* `rax` - contains system call number;
* `rcx` - contains return address to the user space;
* `r11` - contains register flags;
* `rdi` - contains first argument of a system call handler;
@ -293,7 +293,7 @@ where the `__X32_SYSCALL_BIT` is
As we can see the `__SYSCALL_MASK` depends on the `CONFIG_X86_X32_ABI` kernel configuration option and represents the mask for the 32-bit [ABI](https://en.wikipedia.org/wiki/Application_binary_interface) in the 64-bit kernel.
So we check the value of the `__SYSCALL_MASK` and if the `CONFIG_X86_X32_ABI` is disabled we compare the value of the `rax` register to the maximum syscall number (`__NR_syscall_max`), alternatively if the `CONFIG_X86_X32_ABI` is enabled we mask the `eax` register with the `__X32_SYSCALL_BIT` and do the same comparison:
So we check the value of the `__SYSCALL_MASK` and if the `CONFIG_X86_X32_ABI` is disabled we compare the value of the `rax` register to the maximum syscall number (`__NR_syscall_max`), alternatively if the `CONFIG_X86_X32_ABI` is enabled we mask the `eax` register with the `__X32_SYSCALL_BIT` and do the same comparison:
```assembly
#if __SYSCALL_MASK == ~0
@ -310,7 +310,7 @@ After this we check the result of the last comparison with the `ja` instruction
ja 1f
```
and if we have the correct system call for this, we move the fourth argument from the `r10` to the `rcx` to keep [x86_64 C ABI](http://www.x86-64.org/documentation/abi.pdf) compliant and execute the `call` instruction with the address of a system call handler:
and if we have the correct system call for this, we move the fourth argument from the `r10` to the `rcx` to keep [x86_64 C ABI](http://www.x86-64.org/documentation/abi.pdf) compliant and execute the `call` instruction with the address of a system call handler:
```assembly
movq %r10, %rcx
@ -367,11 +367,11 @@ In the end we just call the `USERGS_SYSRET64` macro that expands to the call of
Now we know what occurs when a user application calls a system call. The full path of this process is as follows:
* User application contains code that fills general purpose register with the values (system call number and arguments of this system call);
* Processor switches from the user mode to kernel mode and starts execution of the system call entry - `entry_SYSCALL_64`;
* Processor switches from the user mode to kernel mode and starts execution of the system call entry - `entry_SYSCALL_64`;
* `entry_SYSCALL_64` switches to the kernel stack and saves some general purpose registers, old stack and code segment, flags and etc... on the stack;
* `entry_SYSCALL_64` checks the system call number in the `rax` register, searches a system call handler in the `sys_call_table` and calls it, if the number of a system call is correct;
* If a system call is not correct, jump on exit from system call;
* After a system call handler will finish its work, restore general purpose registers, old stack, flags and return address and exit from the `entry_SYSCALL_64` with the `sysretq` instruction.
* After a system call handler will finish its work, restore general purpose registers, old stack, flags and return address and exit from the `entry_SYSCALL_64` with the `sysretq` instruction.
That's all.