mirror of
https://github.com/0xAX/linux-insides.git
synced 2025-01-22 05:31:19 +00:00
commit
7777b3f30b
@ -72,7 +72,7 @@ As mentioned above the GDT contains `segment descriptors` which describe memory
------------------------------------------------------------
```

Don't worry, I know it looks a little scary after real mode, but it's easy. For example, LIMIT 15:0 means that bits 0-15 of the descriptor contain the low part of the limit. The rest of it is in LIMIT 19:16, so the limit is 20 bits wide in total. Let's take a closer look at it:

1. Limit[20-bits] is stored in bits 0-15 and 16-19. It defines `length_of_segment - 1`. It depends on the `G` (Granularity) bit.

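To make the split limit concrete, here is a small C sketch that decodes it from a raw 8-byte descriptor. It assumes the standard x86 layout:

* LIMIT 15:0 occupies descriptor bits 0-15.
* LIMIT 19:16 occupies bits 48-51, which are bits 16-19 of the upper 32-bit half.
* `G` is bit 55.

The helper names `gdt_limit`, `gdt_granularity` and `gdt_segment_size` are hypothetical and are not part of the kernel.

```C
#include <stdint.h>

/* Hypothetical helpers: decode fields of a raw 64-bit x86 segment descriptor. */

/* Join LIMIT 15:0 (bits 0-15) and LIMIT 19:16 (bits 48-51) into a 20-bit value. */
uint32_t gdt_limit(uint64_t desc)
{
	uint32_t low  = (uint32_t)(desc & 0xFFFF);
	uint32_t high = (uint32_t)((desc >> 48) & 0xF);

	return (high << 16) | low;
}

/* Granularity flag (bit 55): 0 means byte units, 1 means 4 KiB units. */
int gdt_granularity(uint64_t desc)
{
	return (int)((desc >> 55) & 1);
}

/* The limit stores length_of_segment - 1, scaled by 4 KiB when G is set. */
uint64_t gdt_segment_size(uint64_t desc)
{
	uint64_t units = (uint64_t)gdt_limit(desc) + 1;

	return gdt_granularity(desc) ? units * 4096 : units;
}
```

For example, the flat code descriptor `0x00CF9A000000FFFF` has the limit `0xFFFFF` with `G` set. `gdt_segment_size` therefore gives (0xFFFFF + 1) * 4096 bytes, i.e. 4 GiB. With `G` cleared, the same 20-bit field can describe at most 1 MiB.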
@ -14,7 +14,7 @@ In the previous [part](https://github.com/0xAX/linux-insides/blob/master/Booting
jmpl *%eax
```

Recall that the `eax` register contains the address of the 32-bit entry point. We can read about this entry point in the Linux kernel x86 boot protocol:

```
When using bzImage, the protected-mode kernel was relocated to 0x100000
@ -14,7 +14,7 @@ struct list_head {
};
```

You may notice that it differs from many implementations of doubly linked lists which you have seen. For example, this doubly linked list structure from the [glib](http://www.gnu.org/software/libc/) library looks like this:

```C
struct GList {
@ -118,13 +118,13 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
}
```

In the next step, after a device is created by the `device_create` function, we add it to the miscellaneous devices list with:

```
list_add(&misc->list, &misc_list);
```

Kernel `list.h` provides this API for adding a new entry to the list. Let's look at its implementation:

```C
static inline void list_add(struct list_head *new, struct list_head *head)
@ -135,8 +135,8 @@ static inline void list_add(struct list_head *new, struct list_head *head)

It just calls the internal function `__list_add` with three parameters:

* new - the new entry.
* head - the list head after which the new item will be inserted.
* head->next - the next item after the list head.

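How these three parameters are used can be shown with a small userspace C sketch modeled on the classic `<linux/list.h>` code. It is a simplification: it leaves out the debug checks and the `WRITE_ONCE()` accessor that newer kernel versions use. The `list_count` helper is hypothetical.

```C
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* An empty list is a head whose next and prev point back to itself. */
void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* Link `new` between two entries that are known to be adjacent. */
static void __list_add(struct list_head *new,
		       struct list_head *prev,
		       struct list_head *next)
{
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}

/* Insert `new` right after `head`, so the newest entry comes first. */
void list_add(struct list_head *new, struct list_head *head)
{
	__list_add(new, head, head->next);
}

/* Hypothetical helper: count entries by walking forward back to the head. */
size_t list_count(const struct list_head *head)
{
	size_t n = 0;

	for (const struct list_head *pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}
```

Calling `list_add(&a, &head)` and then `list_add(&b, &head)` on an initialized `head` leaves the order `head -> b -> a`. This is why `list_add` is handy for implementing stacks.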

The implementation of `__list_add` is pretty simple:

@ -189,7 +189,7 @@ As we can see it just calls `container_of` macro with the same arguments. At fir
(type *)( (char *)__mptr - offsetof(type,member) );})
```

First of all, you may notice that it consists of two expressions in curly brackets; this is a GNU C extension called a statement expression. The compiler evaluates the whole block in the curly braces and uses the value of the last expression.

For example:

@ -205,7 +205,7 @@ int main() {
will print `2`.

The next point is `typeof`, and it's simple. As you can understand from its name, it just returns the type of the given variable or expression. When I first saw the implementation of the `container_of` macro, the strangest thing I found was the zero in the `((type *)0)` expression. Actually, this pointer magic calculates the offset of the given field from the address of the structure. Since the structure is placed at address `0`, the address of the field is simply the field's offset within the structure. Let's look at a simple example:

```C
#include <stdio.h>
@ -224,13 +224,13 @@ int main() {
will print `0x5`.

The next macro, `offsetof`, calculates the offset from the beginning of the structure to the given structure field. Its implementation is very similar to the previous code:

```C
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
```

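The null-pointer form relies on compiler behavior that ISO C does not guarantee, so portable userspace code normally takes `offsetof` from `<stddef.h>`. The following sketch uses a hypothetical `struct item`. It checks that the standard `offsetof` agrees with the distance measured on a real object, which is exactly the quantity the `((TYPE *)0)->MEMBER` trick computes.

```C
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical payload structure with an embedded list node. */
struct item {
	char tag;
	int value;
	struct list_head list;
};

/* Distance in bytes from the start of *it to its `list` member. */
size_t list_offset_measured(const struct item *it)
{
	return (size_t)((const char *)&it->list - (const char *)it);
}
```

The exact numbers depend on the target's alignment rules. On common x86_64 ABIs, `value` lands at offset 4 and `list` at offset 8. Code should therefore use `offsetof` rather than hard-coded constants.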

Let's summarize the `container_of` macro. Given the address of a structure field (here a field of type `list_head`), the name of that field and the type of the containing structure, `container_of` returns the address of the containing structure. In the first line, the macro declares the `__mptr` pointer with the type of the `member` field and assigns `ptr` to it, so `ptr` and `__mptr` point to the same address. Technically we don't need this line, but it's useful for type checking. It ensures that the given structure (the `type` parameter) has a member called `member`, and it lets the compiler complain if `ptr` does not point to that member's type. In the second line, the macro calculates the offset of the field within the structure with the `offsetof` macro and subtracts it from the field's address, which gives the address of the structure. That's all.

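Putting `offsetof` and the pointer subtraction together, here is a userspace sketch of the same idea. It is a simplified, portable variant: it drops the `typeof`-based `__mptr` line, so it loses the type check described above. `struct miscdev` and `miscdev_from_list` are hypothetical names used only for illustration.

```C
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Simplified container_of: step back from the member to the start of `type`. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical container with the list node deliberately not first. */
struct miscdev {
	int minor;
	struct list_head list;
	const char *name;
};

/* Recover the enclosing miscdev from a pointer to its embedded list node. */
struct miscdev *miscdev_from_list(struct list_head *node)
{
	return container_of(node, struct miscdev, list);
}
```

Because `list` is not the first member, the subtraction is what makes this work. The node's address and the structure's address differ by `offsetof(struct miscdev, list)` bytes.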

Of course, `list_add` and `list_entry` are not the only functions that `<linux/list.h>` provides. The implementation of the doubly linked list provides the following API:

@ -41,9 +41,9 @@ Lets talk about what a `radix tree` is. Radix tree is a `compressed trie` where
+-----------+
```

So in this example, we can see the `trie` with the keys `go` and `cat`. A compressed trie, or `radix tree`, differs from a `trie` in that all intermediate nodes which have only one child are removed.

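Unlike the string-keyed trie above, the kernel's radix tree works on integer keys, and each level consumes a fixed number of key bits. The C sketch below shows that idea in isolation. It computes how many levels a key needs and which slot is taken at each level. The 6-bit step (64 slots per node) is an assumption modeled on the kernel's usual `RADIX_TREE_MAP_SHIFT`. The macros and functions here are hypothetical and are not the kernel API.

```C
/* Assumed fan-out: 6 key bits per level, i.e. 64 slots per node. */
#define MAP_SHIFT 6
#define MAP_SIZE  (1UL << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1)

/* Smallest tree height whose levels together cover every set bit of `key`. */
unsigned int height_for_key(unsigned long key)
{
	unsigned int height = 1;

	/* Stop before the shift count reaches the width of `key`. */
	while (height * MAP_SHIFT < sizeof(key) * 8 &&
	       (key >> (height * MAP_SHIFT)) != 0)
		height++;
	return height;
}

/* Slot index used at `level` (0 = root) in a tree of the given height. */
unsigned int slot_at_level(unsigned long key, unsigned int height,
			   unsigned int level)
{
	unsigned int shift = (height - 1 - level) * MAP_SHIFT;

	return (unsigned int)((key >> shift) & MAP_MASK);
}
```

For the key `0x1234` (4660), this gives a height of 3 and the slots 1, 8 and 52 from the root down, since 1 * 4096 + 8 * 64 + 52 = 4660.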

A radix tree in the Linux kernel is a data structure which maps integer keys to values. It is represented by the following structures from the file [include/linux/radix-tree.h](https://github.com/torvalds/linux/blob/master/include/linux/radix-tree.h):

```C
struct radix_tree_root {
@ -56,14 +56,20 @@ struct radix_tree_root {

This structure represents the root of a radix tree and contains three fields:

* `height` - height of the tree;
* `gfp_mask` - tells how memory allocations are to be performed;
* `rnode` - pointer to the child node.

The first field we will discuss is `gfp_mask`:

Low-level kernel memory allocation functions take a set of flags, `gfp_mask`, which describes how an allocation is to be performed. These `GFP_` flags, which control the allocation process, can have the following values:

* `GFP_NOIO` - the allocation can sleep and wait for memory, but will not start any I/O;
* `__GFP_HIGHMEM` - high memory can be used;
* `GFP_ATOMIC` - the allocation is high-priority and can't sleep;

etc.

The next field is `rnode`:

```C
struct radix_tree_node {
@ -83,7 +89,7 @@ struct radix_tree_node {
};
```

This structure contains information about the offset in a parent and the height from the bottom, the count of child nodes, and fields for accessing and freeing a node. These fields are described below:

* `path` - offset in parent & height from the bottom;
* `count` - count of the child nodes;

@ -99,7 +105,7 @@ Now that we know about radix tree structure, it is time to look on its API.

Linux kernel radix tree API
---------------------------------------------------------------------------------

We start with data structure initialization. There are two ways to initialize a new radix tree. The first is to use the `RADIX_TREE` macro:

```C
RADIX_TREE(name, gfp_mask);
@ -140,10 +146,10 @@ do { \

makes the same initialization with default values as the `RADIX_TREE_INIT` macro does.

Next are two functions for inserting and deleting records to/from a radix tree:

* `radix_tree_insert`;
* `radix_tree_delete`.

The first, `radix_tree_insert`, takes three parameters:

@ -173,7 +179,7 @@ unsigned int radix_tree_gang_lookup(struct radix_tree_root *root,
unsigned int max_items);
```

and returns the number of records, sorted by key, starting from the given first index. The number of returned records will not be greater than the `max_items` value.

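The contract of `radix_tree_gang_lookup` described above can be illustrated without any kernel code. In the sketch below, a sorted array of hypothetical `struct entry` records stands in for the tree. `gang_lookup_sketch` is a hypothetical helper that only mimics the documented behavior: ascending keys, starting at the first index, at most `max_items` results. It says nothing about how the kernel walks its nodes.

```C
#include <stddef.h>

/* Hypothetical stand-in for the tree: entries kept sorted by ascending key. */
struct entry {
	unsigned long key;
	void *item;
};

/*
 * Copy up to max_items items whose keys are >= first_index into results[],
 * in ascending key order, and return how many were copied.
 */
unsigned int gang_lookup_sketch(const struct entry *entries, size_t nr_entries,
				void **results, unsigned long first_index,
				unsigned int max_items)
{
	unsigned int found = 0;

	for (size_t i = 0; i < nr_entries && found < max_items; i++) {
		if (entries[i].key >= first_index)
			results[found++] = entries[i].item;
	}
	return found;
}
```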

And the last function, `radix_tree_lookup_slot`, returns the slot which contains the data.

@ -122,7 +122,7 @@ The first step before we started to setup identity paging, need to correct follo
addq %rbp, level2_fixmap_pgt + (506*8)(%rip)
```

Here we need to correct `early_level4_pgt` and the other addresses of the page table directories. As I wrote above, the kernel may not be running at the default `0x1000000` address. The `rbp` register holds the delta between the actual load address and the address the kernel was compiled for, so we add it to `early_level4_pgt`, `level3_kernel_pgt` and `level2_fixmap_pgt`. Let's try to understand what these labels mean. First of all, let's look at their definitions:

```assembly
NEXT_PAGE(early_level4_pgt)
@ -398,7 +398,7 @@ As we loaded new Global Descriptor Table, we reload segments as we did it every
movl %eax,%gs
```

After all of these steps, we set up the `gs` register so that it points to the `irqstack` (we will see more about it in the upcoming parts):

```assembly
movl $MSR_GS_BASE,%ecx
@ -508,7 +508,7 @@ This is the end of the first part about linux kernel initialization.

If you have questions or suggestions, feel free to ping me on twitter [0xAX](https://twitter.com/0xAX), drop me an [email](anotherworldofworld@gmail.com) or just create an [issue](https://github.com/0xAX/linux-internals/issues/new).

In the next part we will see the initialization of the early interrupt handlers, kernel space memory mapping and a lot more.

**Please note that English is not my first language and I am really sorry for any inconvenience. If you find any mistakes please send me a PR to [linux-internals](https://github.com/0xAX/linux-insides).**

@ -4,15 +4,15 @@ Linux kernel development

Introduction
--------------------------------------------------------------------------------

As you may already know, last year I started a series of [blog posts](http://0xax.github.io/categories/assembly/) about assembler programming for the `x86_64` architecture. I had never written a line of low-level code before that, except for a couple of toy `Hello World` examples at university. That was a long time ago and, as I already said, I didn't write low-level code at all. Some time ago I became interested in such things. In other words, I understood that I could write programs, but I didn't actually understand how my programs were arranged.

After writing some assembler code, I began to understand how my program looks after compilation, **approximately**. But I still didn't understand many other things. For example: what happens when the `syscall` instruction is executed in my assembler code, what happens when the `printf` function starts to work, or how my program can talk with other computers via a network. The [Assembler](https://en.wikipedia.org/wiki/Assembly_language#Assembler) programming language didn't give me answers to my questions, so I decided to go deeper in my research. I started to learn from the source code of the Linux kernel and tried to understand the things that I'm interested in. The source code of the Linux kernel didn't give me the answers to **all** of my questions, but now my knowledge about the Linux kernel and the processes around it is much better.

I'm writing this part nine and a half months after I started learning from the source code of the Linux kernel and published the first [part](https://0xax.gitbooks.io/linux-insides/content/Booting/linux-bootstrap-1.html) of this book. It now contains forty parts, and that is not the end. I decided to write this series about the Linux kernel mostly for myself. As you know, the Linux kernel is a huge piece of code, and it is easy to forget what this or that part of it means and how it implements something. But soon the [linux-insides](https://github.com/0xAX/linux-insides) repo became popular, and after nine months it has `9096` stars:

![github](http://s2.postimg.org/jjb3s4frt/stars.png)

It seems that people are interested in the internals of the Linux kernel. Besides this, in all the time that I've been writing `linux-insides`, I have received many questions from different people. For example: how to start with the Linux kernel, what do I need to start contributing to the Linux kernel, and others like these. Generally, people are interested in contributing to open source projects for different reasons, and the Linux kernel is no exception:

![google-linux](http://s4.postimg.org/yg9z5zx0d/google_linux.png)

@ -199,7 +199,7 @@ echo -e "Distributive: ${Green}${DISTRIBUTIVE}${Color_Off}"

if [[ "$DISTRIBUTIVE" == "Fedora" ]] ;
then
    su -c 'grub2-mkconfig -o /boot/grub2/grub.cfg'
else
    sudo update-grub
fi

@ -254,10 +254,10 @@ copy `busybox` fields to the `bin`, `sbin` and other directories. Now we need to

```shell
#!/bin/sh

mount -t proc none /proc
mount -t sysfs none /sys

exec /bin/sh
```

@ -341,7 +341,7 @@ $ git commit -s -v

After the last command, an editor will be opened. It is chosen from the `$GIT_EDITOR` or `$EDITOR` environment variable. The `-s` command line argument adds a `Signed-off-by` line by the committer at the end of the commit log message. You can find this line at the end of each commit message, for example [00cc1633](https://github.com/torvalds/linux/commit/00cc1633816de8c95f337608a1ea64e228faf771). The main point of this line is to track who made a change. The `-v` option shows a unified diff between the HEAD commit and what would be committed at the bottom of the commit message template. It is not necessary, but sometimes very useful. A couple of words about the commit message. A commit message actually consists of two parts:

The first part is on the first line and contains a short description of the changes. It starts with the `[PATCH]` prefix, followed by a subsystem, driver or architecture name and, after the `:` symbol, a short description. In our case it will be something like this:

```
[PATCH] staging/dgap: Use strpbrk() instead of dgap_sindex()
@ -354,12 +354,12 @@ The <linux/string.h> provides strpbrk() function that does the same that the
dgap_sindex(). Let's use already defined function instead of writing custom.
```

And the `Signed-off-by` line at the end of the commit message. Note that each line of a commit message must not be longer than `80` symbols, and the commit message must describe your changes in detail. Do not just write a commit message like `Custom function removed`. You need to describe what you did and why, because the patch reviewers must know what they are reviewing. Besides this, commit messages written this way are very helpful later: each time we can't understand something, we can use [git blame](http://git-scm.com/docs/git-blame) to read the description of the changes.

After we have committed the changes, it's time to generate a patch. We can do it with the `format-patch` command:

```
$ git format-patch master
0001-staging-dgap-Use-strpbrk-instead-of-dgap_sindex.patch
```

@ -382,7 +382,7 @@ devel@driverdev.osuosl.org (open list:STAGING SUBSYSTEM)
linux-kernel@vger.kernel.org (open list)
```

You will see a set of names and related emails. Now we can send our patch with:

```
$ git send-email --to "Lidza Louina <lidza.louina@gmail.com>" \
@ -428,7 +428,7 @@ Also you can see problematic places with the help of the `git diff`:

* [Linus doesn't accept github pull requests](https://github.com/torvalds/linux/pull/17#issuecomment-5654674)

* If your change consists of several different and unrelated changes, you need to split them into separate commits. The `git format-patch` command will generate a patch for each commit. The subject of each patch will contain an `N/M` prefix, where `N` is the number of the patch and `M` is the total number of patches. If you are planning to send a series of patches, it will be helpful to pass the `--cover-letter` option to the `git format-patch` command. This will generate an additional file containing a cover letter that you can use to describe what your patchset changes. It is also a good idea to use the `--in-reply-to` option of the `git send-email` command. This option allows you to send your patch series in reply to your cover message. The structure of your patch series will look like this for a maintainer:

```
|--> cover letter
@ -436,36 +436,31 @@ Also you can see problematic places with the help of the `git diff`:
|----> patch_2
```

You need to pass the `message-id`, which you can find in the output of `git send-email`, as the argument of the `--in-reply-to` option:

![send-email](http://oi60.tinypic.com/2mhd8wo.jpg)

It's important that your email be in [plain text](https://en.wikipedia.org/wiki/Plain_text) format. Generally, `send-email` and `format-patch` are very useful during development, so look at the documentation for these commands and you'll find many useful options: [git send-email](http://git-scm.com/docs/git-send-email) and [git format-patch](http://git-scm.com/docs/git-format-patch).

* Do not be surprised if you do not get an answer right away after you send your patch. Maintainers are people too, and they can be very busy.

* The [scripts](https://github.com/torvalds/linux/tree/master/scripts) directory contains many useful scripts related to Linux kernel development. We already saw two scripts from this directory: `checkpatch.pl` and `get_maintainer.pl`. Besides these two, you can find the [stackusage](https://github.com/torvalds/linux/blob/master/scripts/stackusage) script, which, as its name suggests, prints stack usage, [extract-vmlinux](https://github.com/torvalds/linux/blob/master/scripts/extract-vmlinux) for extracting an uncompressed kernel image, and many others. Outside of the `scripts` directory, you can find some very useful [scripts](https://github.com/lorenzo-stoakes/kernel-scripts) by [Lorenzo Stoakes](https://twitter.com/ljsloz) for kernel development.

* Subscribe to the Linux kernel mailing list. There are a large number of letters every day on `lkml`, but it is very useful to read them and understand things such as the current state of the Linux kernel. Besides `lkml`, there is a [set](http://vger.kernel.org/vger-lists.html) of mailing lists related to the different Linux kernel subsystems.

* If your patch is not accepted the first time and you receive feedback from Linux kernel developers, make your changes and resend the patch with the `[PATCH vN]` prefix (where `N` is the number of the patch version). For example:

```
[PATCH v2] staging/dgap: Use strpbrk() instead of dgap_sindex()
```

It must also contain a changelog that describes all changes from previous patch versions. Of course, this is not an exhaustive list of requirements for Linux kernel development, but some of the most important items were addressed.

Happy Hacking!

Conclusion
--------------------------------------------------------------------------------

This is the end of this part. In it, we saw all the steps from getting the source code of the Linux kernel to sending a patch to the Linux kernel mailing list.

I hope this will help others join the Linux kernel community!

If you have any questions or suggestions, write me an [email](kuleshovmail@gmail.com) or ping [me](https://twitter.com/0xAX) on twitter.

Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes please let me know via email or send a PR.

@ -166,7 +166,7 @@ HOSTCFLAGS = -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-p
HOSTCXXFLAGS = -O2
```

Next we get to the `CC` variable that represents compiler too, so why do we need the `HOST*` variables? `CC` is the target compiler that will be used during kernel compilation, but `HOSTCC` will be used during compilation of the set of the `host` programs (we will see it soon). After this we can see the definition of `KBUILD_MODULES` and `KBUILD_BUILTIN` variables that are used to determine what to compile (kernel, modules or both):
Next we get to the `CC` variable that represents compiler too, so why do we need the `HOST*` variables? `CC` is the target compiler that will be used during kernel compilation, but `HOSTCC` will be used during compilation of the set of the `host` programs (we will see it soon). After this we can see the definition of `KBUILD_MODULES` and `KBUILD_BUILTIN` variables that are used to determine what to compile (modules, kernel, or both):

```Makefile
KBUILD_MODULES :=
@ -550,7 +550,7 @@ The first is `voffset.h` generated by the `sed` script that gets two addresses f
#define VO__text 0xffffffff81000000
```

They are start and end of the kernel. The second is `zoffset.h` depens on the `vmlinux` target from the [arch/x86/boot/compressed/Makefile](https://github.com/torvalds/linux/blob/master/arch/x86/boot/compressed/Makefile):
They are the start and the end of the kernel. The second is `zoffset.h` depens on the `vmlinux` target from the [arch/x86/boot/compressed/Makefile](https://github.com/torvalds/linux/blob/master/arch/x86/boot/compressed/Makefile):

```Makefile
$(obj)/zoffset.h: $(obj)/compressed/vmlinux FORCE
@ -11,7 +11,7 @@ Before we start to dive into the implementation of the system calls related stuf
System call. What is it?
--------------------------------------------------------------------------------

A system call is just a userspace request of a kernel service. Yes, the operating system kernel provides many services. When your program wants to write to or read from a file, starts to listen for connections on a [socket](https://en.wikipedia.org/wiki/Network_socket), delete or create a directory, or even to finish its work, a program uses a system call. In another words, a system call is just a [C](https://en.wikipedia.org/wiki/C_%28programming_language%29) function that is placed in the kernel space and a user program can ask the kernel to do something via this function.
A system call is just a userspace request of a kernel service. Yes, the operating system kernel provides many services. When your program wants to write to or read from a file, start to listen for connections on a [socket](https://en.wikipedia.org/wiki/Network_socket), delete or create directory, or even to finish its work, a program uses a system call. In another words, a system call is just a [C](https://en.wikipedia.org/wiki/C_%28programming_language%29) function that is placed in the kernel space and an user program can ask kernel to do something via this function.

The Linux kernel provides a set of these functions and each architecture provides its own set. For example: the [x86_64](https://en.wikipedia.org/wiki/X86-64) provides [322](https://github.com/torvalds/linux/blob/master/arch/x86/entry/syscalls/syscall_64.tbl) system calls and the [x86](https://en.wikipedia.org/wiki/X86) provides [358](https://github.com/torvalds/linux/blob/master/arch/x86/entry/syscalls/syscall_32.tbl) different system calls. Ok, a system call is just a function. Let's look on a simple `Hello world` example that's written in the assembly programming language:

@ -184,7 +184,7 @@ Yes, system calls are ubiquitous. Each program needs to open/write/read file, ne
$ sudo cat /proc/1/comm
systemd

$ sudo cat /proc/1/syscall
$ sudo cat /proc/1/syscall
232 0x4 0x7ffdf82e11b0 0x1f 0xffffffff 0x100 0x7ffdf82e11bf 0x7ffdf82e11a0 0x7f9114681193
```

@ -197,7 +197,7 @@ $ ps ax | grep emacs
$ sudo cat /proc/2093/comm
emacs

$ sudo cat /proc/2093/syscall
$ sudo cat /proc/2093/syscall
270 0xf 0x7fff068a5a90 0x7fff068a5b10 0x0 0x7fff068a59c0 0x7fff068a59d0 0x7fff068a59b0 0x7f777dd8813c
```

@ -6,7 +6,7 @@ vsyscalls and vDSO

This is the third part of the [chapter](http://0xax.gitbooks.io/linux-insides/content/SysCall/index.html) that describes system calls in the Linux kernel and we saw preparations after a system call caused by an userspace application and process of handling of a system call in the previous [part](http://0xax.gitbooks.io/linux-insides/content/SysCall/syscall-2.html). In this part we will look at two concepts that are very close to the system call concept, they are called `vsyscall` and `vdso`.

We already know what is a `system call`. This is special routine in the Linux kernel which userspace application asks to do privileged tasks, like to read or to write to a file, to open a socket and etc. As you maybe know, invoking a system call is an expensive operation in the Linux kernel, because the processor must interrupt the currently executing task and switch context to kernel mode, subsequently jumping again into userspace after the system call handler finishes its work. These two mechanisms - `vsyscall` and `vdso` are designed to speed up this process for certain system calls and in this part we will try to understand how these mechanisms are arranged.
We already know what is a `system call`. This is special routine in the Linux kernel which userspace application asks to do privileged tasks, like to read or to write to a file, to open a socket and etc. As you may know, invoking a system call is an expensive operation in the Linux kernel, because the processor must interrupt the currently executing task and switch context to kernel mode, subsequently jumping again into userspace after the system call handler finishes its work. These two mechanisms - `vsyscall` and `vdso` are designed to speed up this process for certain system calls and in this part we will try to understand how these mechanisms work.

Introduction to vsyscalls
--------------------------------------------------------------------------------
@ -36,7 +36,7 @@ static inline void map_vsyscall(void) {}
#endif
```

As we can read in the help text, the `CONFIG_X86_VSYSCALL_EMULATION` configuration option: `Enable vsyscall emulation`. Why emulate `vsyscall`? Actually, the `vsyscall` is a legacy [ABI](https://en.wikipedia.org/wiki/Application_binary_interface) due to the security reasons. Virtual system calls have fixed addresses, meaning that `vsyscall` page is still at the same location every time and the location of this page is determined in the `map_vsyscall` function. Let's look on the implementation of this function:
As we can read in the help text, the `CONFIG_X86_VSYSCALL_EMULATION` configuration option: `Enable vsyscall emulation`. Why emulate `vsyscall`? Actually, the `vsyscall` is a legacy [ABI](https://en.wikipedia.org/wiki/Application_binary_interface) due to security reasons. Virtual system calls have fixed addresses, meaning that `vsyscall` page is still at the same location every time and the location of this page is determined in the `map_vsyscall` function. Let's look on the implementation of this function:

```C
void __init map_vsyscall(void)
@ -6,7 +6,7 @@ Introduction

This is yet another post that opens new chapter in the [linux-insides](http://0xax.gitbooks.io/linux-insides/content/) book. The previous [part](https://0xax.gitbooks.io/linux-insides/content/SysCall/syscall-4.html) was a list part of the chapter that describes [system call](https://en.wikipedia.org/wiki/System_call) concept and now time is to start new chapter. As you can understand from the post's title, this chapter will be devoted to the `timers` and `time management` in the Linux kernel. The choice of topic for the current chapter is not accidental. Timers and generally time management are very important and widely used in the Linux kernel. The Linux kernel uses timers for various tasks, different timeouts for example in [TCP](https://en.wikipedia.org/wiki/Transmission_Control_Protocol) implementation, the kernel must know current time, scheduling asynchronous functions, next event interrupt scheduling and many many more.

So, we will start to learn implementation of the different time management related stuff in this part. We will see different types of timers and how do different Linux kernel subsystems use them. As always we will start from the earliest part of the Linux kernel and will go through initialization process of the Linux kernel. We already did it in the special [chapter](https://0xax.gitbooks.io/linux-insides/content/Initialization/index.html) which describes initialization process of the Linux kernel, but as you may remember we missied some things there. And one of these is initialization of timers.
So, we will start to learn implementation of the different time management related stuff in this part. We will see different types of timers and how do different Linux kernel subsystems use them. As always we will start from the earliest part of the Linux kernel and will go through initialization process of the Linux kernel. We already did it in the special [chapter](https://0xax.gitbooks.io/linux-insides/content/Initialization/index.html) which describes initialization process of the Linux kernel, but as you may remember we missied some things there. And one of them is the initialization of timers.

Let's start.

@ -168,7 +168,7 @@ Now we know a little theory about `jiffies` and we can return to the our functio
The `clocksource` is hardware abstraction for a free-running counter.
```

I don't know how about you, but this not full description didn't give me almost anything in understanding of the `clocksource` concept. Let's try to understand what is it, but we will not go deeper because this topic will be described in the separate part in details. The main point of the `clocksource` is timekeeping abstraction or in very simple words - it provides a time value to the kernel. We already know about `jiffies` interface that represents number of ticks that have occurred since the system booted. It represented by the global variable in the Linux kernel and incremented each timer interrupt. The Linux kernel can use `jiffies` for time measurement. So why do we need in separate context like the `clocksource`? Actually different hardware devices provide different clock sources that are widely in their capabilities. The availability of more precise techniques for time intervals measurement is hardware-dependent.
I'm not sure about you, but that description didn't give a good understanding about the `clocksource` concept. Let's try to understand what is it, but we will not go deeper because this topic will be described in a separate part in much more detail. The main point of the `clocksource` is timekeeping abstraction or in very simple words - it provides a time value to the kernel. We already know about `jiffies` interface that represents number of ticks that have occurred since the system booted. It represented by the global variable in the Linux kernel and incremented each timer interrupt. The Linux kernel can use `jiffies` for time measurement. So why do we need in separate context like the `clocksource`? Actually different hardware devices provide different clock sources that are widely in their capabilities. The availability of more precise techniques for time intervals measurement is hardware-dependent.

For example `x86` has on-chip a 64-bit counter that is called [Time Stamp Counter](https://en.wikipedia.org/wiki/Time_Stamp_Counter) and its frequency can be equal to processor frequency. Or for example [High Precision Event Timer](https://en.wikipedia.org/wiki/High_Precision_Event_Timer) that consists of a `64-bit` counter of at least `10 MHz` frequency. Two different timers and they are both for `x86`. If we will add timers from other architectures, this only makes this problem more complex. The Linux kernel provides `clocksource` concept to solve the problem.

@ -354,7 +354,7 @@ We just saw initialization of two `jiffies` based clock sources in the previous
* standard `jiffies` based clock source;
* refined `jiffies` based clock source;

Don't worry if you didn't understand calculations there. They looks frighteningly at first. Soon, step by step we will learn these things. So, we just saw initialization of `jffies` based clock sources and also we know that the Linux kernel has the global variable `jiffies` that holds the number of ticks that have occured since the kernel started to work. Now, let's look how to use it. To use `jiffies` we just can use `jiffies` global variable by its name or with the call of the `get_jiffies_64` function. This function defined in the [kernel/time/jiffies.c](https://github.com/torvalds/linux/blob/master/kernel/time/jiffies.c) source code file and just returns full full `64-bit` value of the `jiffies`:
Don't worry if you don't understand the calculations here. They look frightening at first. Soon, step by step we will learn these things. So, we just saw initialization of `jffies` based clock sources and also we know that the Linux kernel has the global variable `jiffies` that holds the number of ticks that have occured since the kernel started to work. Now, let's look how to use it. To use `jiffies` we just can use `jiffies` global variable by its name or with the call of the `get_jiffies_64` function. This function defined in the [kernel/time/jiffies.c](https://github.com/torvalds/linux/blob/master/kernel/time/jiffies.c) source code file and just returns full full `64-bit` value of the `jiffies`:

```C
u64 get_jiffies_64(void)
@ -73,3 +73,4 @@ Thank you to all contributors:
* [aouelete](https://github.com/aouelete)
* [Dennis Birkholz](https://github.com/dennisbirkholz)
* [Anton Tyurin](https://github.com/noxiouz)
* [Bogdan Kulbida](https://github.com/kulbida)