8.8 KiB
Data Structures in the Linux Kernel
Doubly linked list
Linux kernel provides it's own doubly linked list implementation which you can find in the include/linux/list.h. We will start Data Structures in the Linux kernel
from the doubly linked list data structure. Why? Because it is very popular in the kernel, just try to search
First of all let's look on the main structure:
struct list_head {
struct list_head *next, *prev;
};
You can note that it is different from many lists implementations which you could see. For example this doubly linked list structure from the glib:
struct GList {
gpointer data;
GList *next;
GList *prev;
};
Usually linked list structure contains pointer to the item. Linux kernel implementation of the list has not. So the main question is here - where list stores data?
. Actually implementation of lists in the kernel is - Intrusive list
. An intrusive linked list does not contain data in it's nodes and nodes just contain pointers to the to the next and previous node and list nodes part of the data that are added to the list. This makes data structure generic, so it does not care about entry data type anymore.
For example:
struct nmi_desc {
spinlock_t lock;
struct list_head head;
};
Let's look on some examples, how list_head
uses in the kernel. As i already wrote about, there are many, really many different places where lists are used in the kernel. Let's look for example in miscellaneous character drivers. Misc character drivers API from the drivers/char/misc.c for writing small drivers for handling simple hardware or virtual devices. This drivers share major number:
#define MISC_MAJOR 10
but has own minor number. For example you can see it with:
ls -l /dev | grep 10
crw------- 1 root root 10, 235 Mar 21 12:01 autofs
drwxr-xr-x 10 root root 200 Mar 21 12:01 cpu
crw------- 1 root root 10, 62 Mar 21 12:01 cpu_dma_latency
crw------- 1 root root 10, 203 Mar 21 12:01 cuse
drwxr-xr-x 2 root root 100 Mar 21 12:01 dri
crw-rw-rw- 1 root root 10, 229 Mar 21 12:01 fuse
crw------- 1 root root 10, 228 Mar 21 12:01 hpet
crw------- 1 root root 10, 183 Mar 21 12:01 hwrng
crw-rw----+ 1 root kvm 10, 232 Mar 21 12:01 kvm
crw-rw---- 1 root disk 10, 237 Mar 21 12:01 loop-control
crw------- 1 root root 10, 227 Mar 21 12:01 mcelog
crw------- 1 root root 10, 59 Mar 21 12:01 memory_bandwidth
crw------- 1 root root 10, 61 Mar 21 12:01 network_latency
crw------- 1 root root 10, 60 Mar 21 12:01 network_throughput
crw-r----- 1 root kmem 10, 144 Mar 21 12:01 nvram
brw-rw---- 1 root disk 1, 10 Mar 21 12:01 ram10
crw--w---- 1 root tty 4, 10 Mar 21 12:01 tty10
crw-rw---- 1 root dialout 4, 74 Mar 21 12:01 ttyS10
crw------- 1 root root 10, 63 Mar 21 12:01 vga_arbiter
crw------- 1 root root 10, 137 Mar 21 12:01 vhci
Now let's look how lists are used in the misc device drivers. First of all let's look on miscdevice
structure:
struct miscdevice
{
int minor;
const char *name;
const struct file_operations *fops;
struct list_head list;
struct device *parent;
struct device *this_device;
const char *nodename;
mode_t mode;
};
We can see the fourth field in the miscdevice
structure - list
which is list of registered devices. In the beginning of the source code file we can see definition of the:
static LIST_HEAD(misc_list);
which expands to definition of the variables with list_head
type:
#define LIST_HEAD(name) \
struct list_head name = LIST_HEAD_INIT(name)
and initializes it with the LIST_HEAD_INIT
macro which set previous and next entries:
#define LIST_HEAD_INIT(name) { &(name), &(name) }
Now let's look on the misc_register
function which registers a miscellaneous device. At the start it initializes miscdevice->list
with the INIT_LIST_HEAD
function:
INIT_LIST_HEAD(&misc->list);
which does the same that LIST_HEAD_INIT
macro:
static inline void INIT_LIST_HEAD(struct list_head *list)
{
list->next = list;
list->prev = list;
}
In the next step after device created with the device_create
function we add it to the miscellaneous devices list with:
list_add(&misc->list, &misc_list);
Kernel list.h
provides this API for the addition of new entry to the list. Let's look on it's implementation:
static inline void list_add(struct list_head *new, struct list_head *head)
{
__list_add(new, head, head->next);
}
It just calls internal function __list_add
with the 3 given paramters:
- new - new entry;
- head - list head after which will be inserted new item;
- head->next - next item after list head.
Implementation of the __list_add
is pretty simple:
static inline void __list_add(struct list_head *new,
struct list_head *prev,
struct list_head *next)
{
next->prev = new;
new->next = next;
new->prev = prev;
prev->next = new;
}
Here we set new item between prev
and next
. So misc
list which we defined at the start with the LIST_HEAD_INIT
macro will contain previous and next pointers to the miscdevice->list
.
There is still only one question how to get list's entry. There is special special macro for this point:
#define list_entry(ptr, type, member) \
container_of(ptr, type, member)
which gets three parameters:
- ptr - the structure list_head pointer;
- type - structure type;
- member - the name of the list_head within the struct;
For example:
const struct miscdevice *p = list_entry(v, struct miscdevice, list)
After this we can access to the any miscdevice
field with p->minor
or p->name
and etc... Let's look on the list_entry
implementation:
#define list_entry(ptr, type, member) \
container_of(ptr, type, member)
As we can see it just calls container_of
macro with the same arguments. For the first look container_of
looks strange:
#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})
First of all you can note that it consists from two expressions in curly brackets. Compiler will evaluate the whole block in the curly braces and use the value of the last expression.
For example:
#include <stdio.h>
int main() {
int i = 0;
printf("i = %d\n", ({++i; ++i;}));
return 0;
}
will print 2
.
The next point is typeof
, it's simple. As you can understand from it's name, it just returns the type of the given varible. When I first time saw implementation of the container_of
macro, the stranges thing for me was the zero in the ((type *)0)
expression. Actually this pointers magic calculates the offset of the given field from the address of the structure, but as we have 0
here, it will be just adds offset of the given field to zero. Let's look on the one simple example:
#include <stdio.h>
struct s {
int field1;
char field2;
char field3;
};
int main() {
printf("%p\n", &((struct s*)0)->field3);
return 0;
}
will print 0x5
.
The next offsetof macro calculates offset from the beggining of the structure to the given structure's field. It's implementation very similar to the previous code:
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
Let's summarize all about container_of
macro. container_of
macro returns address of the structure by the given address of the structure's field with list_head
type, the name of the structure field with list_head
type and type of the container structure. At the first line this macro declares the __mptr
pointer which points to the field of the structure that ptr
points to and assigns it to the ptr
. Now ptr
and __mptr
points have the same address. Techincally we no need in this line, it needs for the type checking. First line ensures that that given structure (type
paramter) has a member which called member
. In the second line it calculates offset of the field from the structure with the offsetof
macro and substracts it from the structure address. That's all.
Of course list_add
and list_entry
is not only functions which provides <linux/list.h>
. Implementation of the doubly linked list provides the following API:
- list_add
- list_add_tail
- list_del
- list_replace
- list_move
- list_is_last
- list_empty
- list_cut_position
- list_splice
and many more.