mirror of https://github.com/0xAX/linux-insides.git synced 2025-01-09 23:31:09 +00:00

Update asm.md

walkerlala 2017-02-21 09:32:49 +08:00 committed by GitHub
parent c9a9daa15f
commit 32672a7d90


@@ -7,7 +7,7 @@ Introduction
 While reading source code in the [Linux kernel](https://github.com/torvalds/linux), I often see statements like this:
 ```C
-__asm__("andq %%rsp,%0; ":"=r" (ti) : "0" (CURRENT_MASK));
+__asm__("andl %%esp,%0; ":"=r" (ti) : "0" (CURRENT_MASK));
 ```
 Yes, this is [inline assembly](https://en.wikipedia.org/wiki/Inline_assembler) or in other words assembler code which is integrated in a high level programming language. In this case the high level programming language is [C](https://en.wikipedia.org/wiki/C_%28programming_language%29). Yes, the `C` programming language is not very high-level, but still.
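The `CURRENT_MASK` statement uses a matching constraint. `"0"` loads the mask into the same register that output `%0` uses, so the AND leaves its result there. Below is a minimal sketch of that pattern. It assumes GCC or Clang on 32- or 64-bit x86 and uses ordinary operands in place of the stack pointer. The helper name `and_mask` is hypothetical.

```c
/*
 * Hypothetical helper: `mask` starts in the register picked for output %0
 * (matching constraint "0"), and `value` is ANDed into it.
 * x86 only; other targets fall back to plain C.
 */
static unsigned long and_mask(unsigned long value, unsigned long mask)
{
    unsigned long result;
#if defined(__i386__) || defined(__x86_64__)
    /* No size suffix: the assembler infers 32 or 64 bits from the registers. */
    __asm__("and %2, %0"
            : "=r" (result)   /* %0: output register               */
            : "0" (mask),     /* %1: placed in %0's register       */
              "r" (value)     /* %2: any general-purpose register  */
            : "cc");          /* AND updates the flags             */
#else
    result = value & mask;
#endif
    return result;
}
```

For example, `and_mask(0x12345, 0xff00)` keeps only bits 8 to 15 of the value.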
@@ -15,7 +15,7 @@ Yes, this is [inline assembly](https://en.wikipedia.org/wiki/Inline_assembler) o
 If you are familiar with the [assembly](https://en.wikipedia.org/wiki/Assembly_language) programming language, you may notice that `inline assembly` is not very different from normal assembler. Moreover, the special form of inline assembly which is called `basic form` is exactly the same. For example:
 ```C
-__asm__("movq %rax, %rsp");
+__asm__("movl %eax, %esp");
 ```
 or
@@ -32,8 +32,8 @@ The same code (of course without `__asm__` prefix) you might see in plain assemb
 The basic form consists of only two things: the `__asm__` keyword and the string with valid assembler instructions. For example it may look something like this:
 ```C
-__asm__("movq $3, %rax\t\n"
-"movq %rsi, %rdi");
+__asm__("movl $3, %eax\t\n"
+"movl %esi, %edi");
 ```
 The `asm` keyword may be used in place of `__asm__`, however `__asm__` is portable whereas the `asm` keyword is a `GNU` [extension](https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html). In further examples I will only use the `__asm__` variant.
@@ -42,7 +42,9 @@ If you know assembly programming language this looks pretty familiar. The main p
 I've decided to write this part to consolidate my knowledge related to the inline assembly, as inline assembly statements are quite common in the Linux kernel and we may see them in [linux-insides](https://0xax.gitbooks.io/linux-insides/content/) parts sometimes. I thought that it would be useful if we have a special part which contains information on more important aspects of the inline assembly. Of course you may find comprehensive information about inline assembly in the official [documentation](https://gcc.gnu.org/onlinedocs/gcc/Using-Assembly-Language-with-C.html#Using-Assembly-Language-with-C), but I like to put everything in one place.
-** Note: This part will not provide guide for assembly programming. It is not intended to teach you to write programs with assembler or to know what one or another assembler instruction means. Just a little memo for extended asm. **
+**Note:**
+- This part will not provide guide for assembly programming. It is not intended to teach you to write programs with assembler or to know what one or another assembler instruction means. Just a little memo for extended asm.
+- Codes in this document are targeted at 32-bit machine specifically. But, in 64-bit machine, things would be similar
 Introduction to extended inline assembly
 --------------------------------------------------------------------------------
@@ -159,7 +161,7 @@ As mentioned above, the "clobbered" part should contain a comma-separated list o
 Consider the example from before, but we will add an additional, simple assembler instruction:
 ```C
-__asm__("movq $100, %%rdx\t\n"
+__asm__("movl $100, %%edx\t\n"
 "addl %1,%2" : "=r" (sum) : "r" (a), "0" (b));
 ```
@@ -169,15 +171,15 @@ If we look at the assembly output
 0000000000400400 <main>:
 400400: ba 05 00 00 00 mov $0x5,%edx
 400405: b8 0a 00 00 00 mov $0xa,%eax
-40040a: 48 c7 c2 64 00 00 00 mov $0x64,%rdx
+40040a: 48 c7 c2 64 00 00 00 mov $0x64,%edx
 400411: 01 d0 add %edx,%eax
 ```
 we see that the `%edx` register is overwritten with `0x64` or `100` and the result will be `115` instead of `15`. Now if we add the `%rdx` register to the list of "clobbered" register
 ```C
-__asm__("movq $100, %%rdx\t\n"
-"addl %1,%2" : "=r" (sum) : "r" (a), "0" (b) : "%rdx");
+__asm__("movl $100, %%edx\t\n"
+"addl %1,%2" : "=r" (sum) : "r" (a), "0" (b) : "%edx");
 ```
 and look at the assembler output again
@@ -186,7 +188,7 @@ and look at the assembler output again
 0000000000400400 <main>:
 400400: b9 05 00 00 00 mov $0x5,%ecx
 400405: b8 0a 00 00 00 mov $0xa,%eax
-40040a: 48 c7 c2 64 00 00 00 mov $0x64,%rdx
+40040a: 48 c7 c2 64 00 00 00 mov $0x64,%edx
 400411: 01 c8 add %ecx,%eax
 ```
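The clobber example above can be reduced to a small function. Because the template writes `%edx`, listing `"%edx"` as clobbered stops GCC from placing `a`, `b` or `sum` in that register. The sketch assumes GCC or Clang on x86, and `add_with_clobber` is a hypothetical name. It writes `%0` where the example writes `%2`. The two refer to the same register because of the `"0"` constraint.

```c
/*
 * Hypothetical helper: the template overwrites %edx, so "%edx" is declared
 * clobbered and no operand is assigned to it. x86 only; other targets fall
 * back to plain C.
 */
static int add_with_clobber(int a, int b)
{
    int sum;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl $100, %%edx\n\t"   /* scribble on %edx                */
            "addl %1, %0"            /* sum (starts as b via "0") += a  */
            : "=r" (sum)
            : "r" (a), "0" (b)
            : "%edx", "cc");
#else
    sum = a + b;
#endif
    return sum;
}
```

With `a = 5` and `b = 10` this yields `15` rather than `115`.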
@@ -198,7 +200,7 @@ the `%ecx` register will be used for `sum` calculation, preserving the intended
 The first - `cc` indicates that an assembler code modifies [flags](https://en.wikipedia.org/wiki/FLAGS_register) register. This is typically used if the assembly within contains arithmetic or logic instructions.
 ```C
-__asm__("incq %0" ::""(variable): "cc");
+__asm__("incl %0" ::""(variable): "cc");
 ```
 The second `memory` specifier tells the compiler that the given inline assembly statement executes read/write operations on memory not specified by operands in the output list. This prevents the compiler from keeping memory values loaded and cached in registers. Let's take a look at the following example:
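In the `cc` snippet, `variable` has an empty constraint string and is listed only as an input, so the incremented value is never handed back to C code. Below is a minimal sketch of a read-write form. It assumes GCC or Clang on x86, and the helper `inc_with_asm` is hypothetical. `"+r"` marks the operand as both read and written, and `"cc"` records that `incl` changes the flags.

```c
/*
 * Hypothetical helper: increments v with incl.
 * x86 only; other targets fall back to plain C.
 */
static int inc_with_asm(int v)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "+r": v is an input and an output; "cc": flags are modified. */
    __asm__("incl %0" : "+r" (v) : : "cc");
#else
    v += 1;
#endif
    return v;
}
```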
@@ -229,13 +231,13 @@ The result is `5` here, but why? We incremented `a[0]` and subtracted b, so the
 ```assembly
 00000000004004f6 <main>:
-4004f6: c7 44 24 f0 0a 00 00 movl $0xa,-0x10(%rsp)
+4004f6: c7 44 24 f0 0a 00 00 movl $0xa,-0x10(%esp)
 4004fd: 00
-4004fe: c7 44 24 f4 14 00 00 movl $0x14,-0xc(%rsp)
+4004fe: c7 44 24 f4 14 00 00 movl $0x14,-0xc(%esp)
 400505: 00
-400506: c7 44 24 f8 1e 00 00 movl $0x1e,-0x8(%rsp)
+400506: c7 44 24 f8 1e 00 00 movl $0x1e,-0x8(%esp)
 40050d: 00
-40050e: ff 44 24 f0 incl -0x10(%rsp)
+40050e: ff 44 24 f0 incl -0x10(%esp)
 400512: b8 05 00 00 00 mov $0x5,%eax
 ```
@@ -259,22 +261,22 @@ Now the result is correct. If we look at the assembly output again
 ```assembly
 00000000004004f6 <main>:
-4004f6: c7 44 24 f0 0a 00 00 movl $0xa,-0x10(%rsp)
+4004f6: c7 44 24 f0 0a 00 00 movl $0xa,-0x10(%esp)
 4004fd: 00
-4004fe: c7 44 24 f4 14 00 00 movl $0x14,-0xc(%rsp)
+4004fe: c7 44 24 f4 14 00 00 movl $0x14,-0xc(%esp)
 400505: 00
-400506: c7 44 24 f8 1e 00 00 movl $0x1e,-0x8(%rsp)
+400506: c7 44 24 f8 1e 00 00 movl $0x1e,-0x8(%esp)
 40050d: 00
-40050e: ff 44 24 f0 incl -0x10(%rsp)
-400512: 8b 44 24 f0 mov -0x10(%rsp),%eax
+40050e: ff 44 24 f0 incl -0x10(%esp)
+400512: 8b 44 24 f0 mov -0x10(%esp),%eax
 400516: 83 e8 05 sub $0x5,%eax
-400519: c3 retq
+400519: c3 retl
 ```
 we will see one difference here which is in the following piece code:
 ```assembly
-400512: 8b 44 24 f0 mov -0x10(%rsp),%eax
+400512: 8b 44 24 f0 mov -0x10(%esp),%eax
 400516: 83 e8 05 sub $0x5,%eax
 ```
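The `memory` clobber tells GCC that the assembly reads or writes memory not named by its operands, for example memory reached through a pointer held in a register. GCC's documentation also says input-only operands must not be modified, so the pointer form below is the pattern the clobber is meant for. This is a minimal sketch assuming GCC or Clang on x86. The helper `inc_first_then_subtract` is hypothetical and uses the same values as the output above (`10` and `5`).

```c
/*
 * Hypothetical helper: the asm increments a[0] through a pointer in a
 * register, which GCC cannot see from the operands. The "memory" clobber
 * stops GCC from reusing a cached a[0] after the statement.
 * x86 only; other targets fall back to plain C.
 */
static int inc_first_then_subtract(int b)
{
    int a[3] = {10, 20, 30};
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile("incl (%0)"
                     :                  /* no outputs               */
                     : "r" (&a[0])      /* address in a register    */
                     : "memory", "cc");
#else
    a[0] += 1;
#endif
    return a[0] - b;
}
```

Here the expected result is `6`, not the stale `5`.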
@@ -313,7 +315,7 @@ unsigned long test_asm(int nr)
 {
 unsigned long a = 0;
-__asm__("movq %1, %0" : "=r"(a) : "I"(0xffffffffffff));
+__asm__("movl %1, %0" : "=r"(a) : "I"(0xffffffffffff));
 return a;
 }
 ```
@@ -324,7 +326,7 @@ you will get an error
 $ gcc -O3 test.c -o test
 test.c: In function ‘test_asm’:
 test.c:7:9: warning: asm operand 1 probably doesn’t match constraints
-__asm__("movq %1, %0" : "=r"(a) : "I"(0xffffffffffff));
+__asm__("movl %1, %0" : "=r"(a) : "I"(0xffffffffffff));
 ^
 test.c:7:9: error: impossible constraint in asm
 ```
@@ -336,7 +338,7 @@ unsigned long test_asm(int nr)
 {
 unsigned long a = 0;
-__asm__("movq %1, %0" : "=r"(a) : "i"(0xffffffffffff));
+__asm__("movl %1, %0" : "=r"(a) : "i"(0xffffffffffff));
 return a;
 }
 ```
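On x86, GCC's `I` constraint accepts only integer constants from 0 to 31, which is a 32-bit shift count. That is why `0xffffffffffff` fails with it. The `i` constraint accepts any integer constant known at compile time. Below is a minimal sketch that uses each constraint where it fits, assuming GCC or Clang on x86. The helper `immediates_demo` is hypothetical.

```c
/*
 * Hypothetical helper contrasting two x86 immediate constraints:
 *   "i": any compile-time integer constant
 *   "I": a constant in 0..31, e.g. a 32-bit shift count
 * Other targets fall back to plain C with the same values.
 */
static int immediates_demo(void)
{
    int value = 100;
    int bits = 1;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl %1, %0" : "=r" (value) : "i" (100));
    __asm__("shll %1, %0" : "+r" (bits) : "I" (4) : "cc");
#else
    bits <<= 4;
#endif
    return value + bits; /* 100 + (1 << 4) */
}
```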
@@ -359,7 +361,7 @@ int main(void)
 static unsigned long arr[3] = {0, 1, 2};
 static unsigned long element;
-__asm__ volatile("movq 16+%1, %0" : "=r"(element) : "o"(arr));
+__asm__ volatile("movl 16+%1, %0" : "=r"(element) : "o"(arr));
 printf("%lu\n", element);
 return 0;
 }
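The displacement in `16+%1` counts bytes. It selects `arr[2]` only when `unsigned long` is 8 bytes wide. With 4-byte elements, `arr[2]` starts at byte 8. Below is a minimal sketch of the offsettable-memory (`o`) constraint with `int` elements. It assumes GCC on x86, where `int` is 4 bytes in both 32- and 64-bit mode. The name `third_element` is hypothetical.

```c
/*
 * Hypothetical helper: "o" passes arr as an offsettable memory operand,
 * so "8+%1" addresses the int 8 bytes past its start (arr[2]).
 * Assumes 4-byte int; other targets fall back to plain C.
 */
static int third_element(void)
{
    static int arr[3] = {0, 1, 2};
    int element;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl 8+%1, %0" : "=r" (element) : "o" (arr));
#else
    element = arr[2];
#endif
    return element;
}
```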
@@ -392,7 +394,7 @@ will use a memory operand.
 ```assembly
 0000000000400400 <main>:
-400400: 8b 05 26 0c 20 00 mov 0x200c26(%rip),%eax # 60102c <a>
+400400: 8b 05 26 0c 20 00 mov 0x200c26(%eip),%eax # 60102c <a>
 ```
 That's about all of the commonly used constraints in inline assembly statements. You can find more in the official [documentation](https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints).
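With the `m` constraint, GCC substitutes the operand as a memory reference instead of first copying it into a register. On x86-64, GCC commonly expresses a global like this as a `%rip`-relative address. 32-bit x86 has no instruction-pointer-relative data addressing. Below is a minimal sketch assuming GCC or Clang on x86. `a_global` and `load_from_memory` are hypothetical names.

```c
/* Hypothetical global used as a memory operand. */
static int a_global = 42;

/*
 * Hypothetical helper: movl reads a_global directly from memory ("m").
 * x86 only; other targets fall back to plain C.
 */
static int load_from_memory(void)
{
    int r;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl %1, %0" : "=r" (r) : "m" (a_global));
#else
    r = a_global;
#endif
    return r;
}
```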
@@ -400,7 +402,7 @@ That's about all of the commonly used constraints in inline assembly statements.
 Architecture specific constraints
 --------------------------------------------------------------------------------
-Before we finish, let's look at the set of special constraints. These constrains are architecture specific and as this book is specific to the [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture, we will look at constraints related to it. First of all the set of `a` ... `d` and also `S` and `D` constraints represent [generic purpose](https://en.wikipedia.org/wiki/Processor_register) registers. In this case the `a` constraint corresponds to `%al`, `%ax`, `%eax` or `%rax` register depending on instruction size. The `S` and `D` constraints are `%si` and `%di` registers respectively. For example let's take our previous example. We can see in its assembly output that value of the `a` variable is stored in the `%eax` register. Now let's look at the assembly output of the same assembly, but with other constraint:
+Before we finish, let's look at the set of special constraints. These constrains are architecture specific and as this book is specific to the [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture, we will look at constraints related to it. First of all the set of `a` ... `d` and also `S` and `D` constraints represent [generic purpose](https://en.wikipedia.org/wiki/Processor_register) registers. In this case the `a` constraint corresponds to `%al`, `%ax`, `%eax` register depending on instruction size. The `S` and `D` constraints are `%si` and `%di` registers respectively. For example let's take our previous example. We can see in its assembly output that value of the `a` variable is stored in the `%eax` register. Now let's look at the assembly output of the same assembly, but with other constraint:
 ```C
 #include <stdio.h>
@@ -419,7 +421,7 @@ Now we see that value of the `a` variable will be stored in the `%edx` register:
 ```assembly
 0000000000400400 <main>:
-400400: 8b 15 26 0c 20 00 mov 0x200c26(%rip),%edx # 60102c <a>
+400400: 8b 15 26 0c 20 00 mov 0x200c26(%eip),%edx # 60102c <a>
 ```
 The `f` and `t` constraints represent any floating point stack register - `%st` and the top of the floating point stack respectively. The `u` constraint represents the second value from the top of the floating point stack.
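The register-letter constraints (`a` to `d`, `S`, `D`) pin an operand to one specific register, so the template can name that register directly. Below is a minimal sketch assuming GCC or Clang on x86. The helper `add_fixed_regs` is hypothetical.

```c
/*
 * Hypothetical helper: "=a" puts the result in %eax, "0" starts x in the
 * same register, and "d" places y in %edx, so the template can name both.
 * x86 only; other targets fall back to plain C.
 */
static int add_fixed_regs(int x, int y)
{
    int result;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("addl %%edx, %%eax"
            : "=a" (result)
            : "0" (x), "d" (y)
            : "cc");
#else
    result = x + y;
#endif
    return result;
}
```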