Observations

Kernel mode HTML parser

System calls are the means by which user-level processes communicate with the kernel. The Linux kernel, however, also allows kernel code to invoke system calls.

This is not generally considered a good idea from the standpoint of debugging, maintaining and porting the code. But where performance or size is critical, moving an application into the kernel can bring large benefits.

The performance gain comes from avoiding the costly user/kernel space transitions and the associated data passing.

To measure the timing benefit, I implemented a rudimentary HTML parser in kernel space and a similar parser in userland.

A snippet from the kernel module that reads an HTML file and removes the HTML tags follows (complete source available here):

	/*Time the measurement itself, i.e. no code*/
	best = ~0;
	measure_time(0);
	tsc = best;
	printk(KERN_INFO "Time taken for no code: %ld\n", tsc);

	/*Measure time of reading a file*/
	/*Prepare to invoke system call*/
	fs = get_fs();	/*Save previous value*/
	set_fs(get_ds());	/*use kernel limit*/
	/*Open the file from kernel space*/
	fd = filp_open(FILE_NAME, O_RDONLY, 0600);

	/*filp_open returns an ERR_PTR on failure, never NULL*/
	if(!IS_ERR(fd) && fd->f_op && fd->f_op->read){
	    best = ~0;
	    measure_time(fd->f_op->read(fd, html, 1000, &fd->f_pos));
	    printk(KERN_INFO "Time taken by read: %ld\n", best-tsc);
	    parse_html(html, text); /*Parse html to text*/
	    printk(KERN_INFO "Parsed text: %s\n", text);
	}

This code parses the HTML by calling an ugly parser, parse_html (defined in common.h, available here), which strips out the HTML tags.

Part of the equivalent userland code follows (complete source available here):
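The source of parse_html is not reproduced in this post, so the following is only a minimal sketch of what a tag stripper of this kind might look like. The name strip_tags, its signature and the bounded output buffer are assumptions made for illustration, not the contents of common.h.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical tag stripper (not the author's parse_html): copies src to
 * dst and drops everything between '<' and '>'. At most dst_size - 1
 * characters are written and dst is always NUL-terminated.
 * Returns the number of characters written.
 */
size_t strip_tags(const char *src, char *dst, size_t dst_size)
{
	size_t out = 0;
	int in_tag = 0;

	assert(src != NULL && dst != NULL && dst_size > 0);

	for (; *src != '\0' && out + 1 < dst_size; src++) {
		if (*src == '<')
			in_tag = 1;		/* a tag starts: stop copying */
		else if (*src == '>' && in_tag)
			in_tag = 0;		/* the tag ends: resume copying */
		else if (!in_tag)
			dst[out++] = *src;	/* plain text outside any tag */
	}
	dst[out] = '\0';
	return out;
}
```

Like the parser described above, this is deliberately crude: it knows nothing about comments, entities or a '>' inside an attribute value.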

	/*time rdtsc, i.e. no code*/
	best = ~0;
	measure_time(0);
	tsc = best;
	printf("Time taken for no code: %ld\n", tsc);
	
	/*Measure time for reading a file*/
	fd = open(FILE_NAME, O_RDONLY);
	if(fd < 0){	/*open returns -1 on failure, not 0*/
	    printf("Error opening file\n");
	    exit(1);
	}
	best = ~0;
	measure_time(read(fd, html, 1000));
	printf("Time taken by read: %li\n", best - tsc);
	parse_html(html, text);
	printf("Parsed text: %s\n", text);
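Both snippets rely on a measure_time helper and a variable named best whose definitions are not shown here. A rough userland sketch of how such a helper could work, assuming it repeats the measured code a few times and keeps the smallest tick count in best, is given below. The repeat count MEASURE_RUNS, the read_ticks helper, the measure_baseline function and the clock_gettime fallback for non-x86 machines are all assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the CPU time stamp counter (x86 only). */
static inline uint64_t read_ticks(void)
{
	return __rdtsc();
}
#else
#include <time.h>
/* Assumed fallback for other CPUs: monotonic clock in nanoseconds. */
static inline uint64_t read_ticks(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

#define MEASURE_RUNS 10			/* assumed repeat count */

static uint64_t best = UINT64_MAX;	/* smallest tick count seen so far */

/* Run `code` MEASURE_RUNS times and keep the fastest run in `best`. */
#define measure_time(code) do {						\
	int run_;							\
	for (run_ = 0; run_ < MEASURE_RUNS; run_++) {			\
		uint64_t start_ = read_ticks();				\
		(void)(code);						\
		uint64_t elapsed_ = read_ticks() - start_;		\
		if (elapsed_ < best)					\
			best = elapsed_;				\
	}								\
} while (0)

/* Cost of the timing code itself, like measure_time(0) in the post. */
uint64_t measure_baseline(void)
{
	best = UINT64_MAX;
	measure_time(0);
	assert(best != UINT64_MAX);	/* at least one run was recorded */
	return best;
}
```

Subtracting the baseline from a later measurement, as the snippets do with best - tsc, removes the overhead of reading the counter. Note that with a repeat count above 1 a measured read() advances the file offset on every run, so whether the original helper repeats at all is an open question.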

I collected the following read times for the first 5 runs:

Clock ticks/run        1st Run  2nd Run  3rd Run  4th Run  5th Run
Kernel HTML Parser       246      366      245      351      246
Userland HTML Parser     675      683      561      683      675

Thus, the file read in the kernel is roughly 2 to 3 times faster than in userland (about 2.25 times on average over these five runs).
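The speedup can be checked directly from the five samples. The sketch below averages each row and divides the userland mean by the kernel mean; the function and array names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples (n must be greater than 0). */
double mean_ticks(const unsigned *samples, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(samples != NULL && n > 0);
	for (i = 0; i < n; i++)
		sum += samples[i];
	return sum / (double)n;
}

/* Read times from the measurements above, in clock ticks. */
static const unsigned kernel_ticks[5]   = { 246, 366, 245, 351, 246 };
static const unsigned userland_ticks[5] = { 675, 683, 561, 683, 675 };

/* Ratio of the mean userland read time to the mean kernel read time. */
double userland_to_kernel_ratio(void)
{
	return mean_ticks(userland_ticks, 5) / mean_ticks(kernel_ticks, 5);
}
```

Five runs is a very small sample, so the ratio is an indication rather than a benchmark result.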

There are a couple of interesting possibilities for porting applications that need high performance to kernel space. A few already exist, including a kernel-mode web server. Of course, a crash in an insufficiently tested module can cost far more than one in its userland counterpart.

ReiserFS: To be or not to be

There seems to be quite a lot of debate on LKML over whether to include ReiserFS (the new version, Reiser4) in the kernel.
ReiserFS has run into problems over a couple of things. Firstly, the way it was pushed by Hans Reiser was not liked by many.

Then there were discussions about the reliability of the file system. As someone pointed out:
"The fact that reiserfs uses a single B-tree to store all of its data means that very entertaining things can happen if you lose a sector containing a high-level node in the tree. It's even more entertaining if you have image files (like initrd files) in reiserfs format stored in reiserfs, and you run the recovery program on the filesystem."

Another problem with ReiserFS is its quest to integrate everything within the filesystem. As an example, it has plugins that can alter the semantics of files, such as turning files into directories inside which you can see meta-files like file/uid and file/size. These contain metadata and are accessible as normal files to all the Unix tools. You could then do the equivalent of chown simply by running
'echo root >file/owner'.

Whether this is a good idea is debatable, as the Unix world has long believed in doing one thing well and keeping it simple. The next step in this direction could be to parse compressed archives in kernel space so that you can do a 'cd linux-2.6.17.tar.bz2' (or is that already implemented?), which does not sound like a good idea.
Moreover, this may require a couple of changes in the VFS.

I recently noticed, while calling it from kernel space, that readv is missing in ReiserFS.

Someone wrote an article on why Reiser4 is not included in the kernel:
http://wiki.kernelnewbies.org/WhyReiser4IsNotIn

That said, there seems to have been quite a lot of development on ReiserFS, and it is installed as part of SUSE distributions. Some of the ideas implemented by this file system appear to be unique and may be useful to other filesystems in the future.

Java binary on Linux

I recently noticed that the Linux kernel supports Java binaries, which means you can execute your Java applications simply as:
$ ./HelloWorld.class
This can be achieved in the following few steps (assuming the JDK is already installed and CLASSPATH is properly configured):

  • Recompile your kernel with the CONFIG_BINFMT_MISC option. This can be done as follows:
    # cd /usr/src/linux; make menuconfig
    Select "Executable file formats / Emulations" -> Kernel Support for MISC binaries
    Save and exit. Follow /usr/src/linux/README for further information on compiling the kernel.
    BINFMT_MISC can also be compiled as an independent module and inserted manually. This feature allows you to invoke almost any binary by simply typing its name in the shell. Refer to /usr/src/linux/Documentation/binfmt_misc.txt for more information.
  • Mount binfmt_misc and setup for Java executable:
    # mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc
    # echo ':Java:M::\xca\xfe\xba\xbe::/usr/local/bin/javawrapper:' > /proc/sys/fs/binfmt_misc/register
  • Now execute the following:
    # cat /usr/src/linux/Documentation/java.txt |grep -m3 -A 195 "Cut here"
    and copy the first script to /usr/local/bin/javawrapper. This script adds the class file's location to the classpath.
    Save the second listing, a C program, as javaclassname.c and compile it as follows:
    # gcc -O2 -o javaclassname javaclassname.c
    # cp javaclassname /usr/local/bin
    This executable is needed to find the fully qualified class name, i.e. for the class Test.class in package foo.bar it returns foo.bar.Test.
  • And now the fun part. Just chmod any Java class to execute it.
    # javac HelloWorld.java
    # chmod +x HelloWorld.class
    # ./HelloWorld.class
  • I gathered this from /usr/src/linux/Documentation/java.txt, which can be consulted for more information.
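The register line above tells binfmt_misc to match files by magic number ('M') and names the four bytes \xca\xfe\xba\xbe, which every compiled class file starts with. As a small userland illustration of that test (this is not kernel code, and has_java_magic is a name invented for this sketch), the same comparison can be written as:

```c
#include <assert.h>
#include <stddef.h>

/* Magic number at the start of every compiled Java class file. */
static const unsigned char JAVA_MAGIC[4] = { 0xca, 0xfe, 0xba, 0xbe };

/*
 * Hypothetical helper: returns 1 if buf starts with the Java class
 * magic, 0 otherwise. It mirrors the comparison at offset 0 that the
 * 'M' rule asks binfmt_misc to perform.
 */
int has_java_magic(const unsigned char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL);
	if (len < sizeof JAVA_MAGIC)
		return 0;		/* too short to hold the magic */
	for (i = 0; i < sizeof JAVA_MAGIC; i++)
		if (buf[i] != JAVA_MAGIC[i])
			return 0;
	return 1;
}
```

Reading the first four bytes of HelloWorld.class into a buffer and passing them to this function should return 1, while an ELF binary or a shell script would give 0.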
