If you need to printf from within a _loop kernel, keep in mind that you need to add a branch manually so that the output is printed only for a specific loop position.
```
if ((loop_pos + i) == 0) printf ("%08x\n", a);
```
When using -n 1 -u 1 -T 1, you shouldn't add
```
if((gid==1) && (lid==1)) {
```
to your kernel. With these settings there is only one kernel thread, so gid==1 never occurs (only gid==0) and your code won't be executed.
Some final recommendations about printf() itself. Printing a string with %s is not recommended: missing zero bytes or big-endian byte order can make the output very confusing. Instead, try to use only the %08x template for everything, calling printf() multiple times where needed. This makes a lot of sense especially for strings, for example if you want to find unexpected non-zero bytes. Get used to this and it will simplify a lot of things for you.
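To illustrate the %08x recommendation, here is a minimal host-side sketch in plain C. It is not kernel code and not part of hashcat: `hex_words` and `MAX_WORDS` are hypothetical names made up for this example, and the sketch assumes the bytes are packed little-endian into 32-bit words. It formats one %08x line per word, which is the same output you would get from calling printf ("%08x\n", w[i]) once per word.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORDS 16 // hypothetical limit, chosen only for this sketch

// Hypothetical helper: pack the bytes of buf into 32-bit words (little-endian,
// an assumption of this sketch) and format each word as "%08x\n".
// Returns a pointer to a static buffer, so it is not thread-safe.
static const char *hex_words (const void *buf, const size_t len)
{
  static char out[MAX_WORDS * 9 + 1]; // 8 hex digits + newline per word, plus NUL

  const unsigned char *b = (const unsigned char *) buf;

  const size_t cnt = (len + 3) / 4; // number of 32-bit words needed

  assert (cnt <= MAX_WORDS);

  out[0] = 0;

  for (size_t i = 0; i < cnt; i++)
  {
    uint32_t w = 0;

    // bytes past the end of the input are left as zero padding
    for (size_t j = 0; j < 4 && (i * 4 + j) < len; j++)
    {
      w |= (uint32_t) b[i * 4 + j] << (j * 8);
    }

    snprintf (out + i * 9, sizeof (out) - i * 9, "%08x\n", (unsigned int) w);
  }

  return out;
}
```

With this packing, the input "abcde" produces the two lines 64636261 and 00000065. Both the byte order and the zero padding are visible at a glance, whereas %s would hide them.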
To decide which type of kernel you want to write (pure or optimized), here are some recommendations on when to write an optimized kernel implementation: