Kernel mode HTML parser

System calls are the means through which user-level processes communicate with the kernel. The Linux kernel, however, also allows kernel code to invoke system calls.

This is not generally considered a good idea in terms of debugging, maintaining and porting the code. But if performance or size is absolutely critical, porting applications to the kernel seems to have huge benefits.

The performance gain comes from avoiding the costly user/kernel space transition and the associated data passing.

In order to measure the timing benefits, I implemented a rudimentary HTML parser in kernel space and a similar parser in userland.

Code snippets from the kernel module for reading an HTML file and removing the HTML tags are as follows (complete source available here):

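	/*Time the measurement itself, i.e. no code*/
	best = ~0;
	measure_time(0);
	tsc = best;	/*tsc is the timing overhead*/
	printk(KERN_INFO "Time taken for no code: %ld\n", tsc);

	/*Measure time of reading a file*/
	/*Prepare to call file operations with a kernel buffer*/
	fs = get_fs();	/*Save previous value*/
	set_fs(get_ds());	/*use kernel limit*/
	/*Open the file from kernel space; fd is a struct file pointer*/
	fd = filp_open(FILE_NAME, O_RDONLY, 0600);

	/*filp_open returns an ERR_PTR value on failure, never NULL*/
	if(!IS_ERR(fd) && fd->f_op && fd->f_op->read){
	    best = ~0;
	    measure_time(fd->f_op->read(fd, html, 1000, &fd->f_pos));
	    printk(KERN_INFO "Time taken by read: %ld\n", best-tsc);
	    parse_html(html, text); /*Parse html to text*/
	    printk(KERN_INFO "Parsed text: %s\n", text);
	    filp_close(fd, NULL);
	}
	set_fs(fs);	/*Restore previous value*/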

This code parses the HTML by calling an ugly parser, parse_html (defined in common.h, available here), which strips out the HTML tags.
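Since parse_html itself is not shown here, the following is only a minimal sketch of what a tag stripper of this kind might look like. The name strip_tags, its signature and the bounded output buffer are my assumptions, not the contents of common.h.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for parse_html(): copy src to dst, skipping
 * everything between '<' and '>'. Writes at most dst_size - 1
 * characters plus a terminating NUL and returns the number of
 * characters written.
 */
static size_t strip_tags(const char *src, char *dst, size_t dst_size)
{
	size_t out = 0;
	int in_tag = 0;

	assert(src != NULL && dst != NULL && dst_size > 0);

	for (; *src != '\0' && out + 1 < dst_size; src++) {
		if (*src == '<')
			in_tag = 1;	/* tag opens: stop copying */
		else if (*src == '>' && in_tag)
			in_tag = 0;	/* tag closes: resume copying */
		else if (!in_tag)
			dst[out++] = *src;	/* plain text: keep it */
	}
	dst[out] = '\0';
	return out;
}
```

Calling strip_tags("<b>Hello</b>, world", text, sizeof text) would leave "Hello, world" in text. The function uses no libc calls apart from the assert, so the same loop could be shared between a module and a userland program, which is presumably why the original parser lives in a common header.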

Part of the similar userland code is as follows (complete source available here):

	/*time rdtsc, i.e. no code*/
	best = ~0;
	measure_time(0);
	tsc = best;	/*tsc is the timing overhead*/
	printf("Time taken for no code: %ld\n", tsc);
	/*Measure time for reading a file*/
	fd = open(FILE_NAME, O_RDONLY, 0600);
	if(fd < 0){
	    printf("Error opening file\n");
	    return 1;
	}
	best = ~0;
	measure_time(read(fd, html, 1000));
	printf("Time taken by read: %ld\n", best - tsc);
	parse_html(html, text);
	printf("Parsed text: %s\n", text);
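Both snippets rely on a measure_time macro whose definition is not shown here. A rough userland sketch of such a macro, assuming it keeps the smallest tick delta over a number of trials in a variable named best, could look like this. NTRIALS, read_ticks and timing_overhead are hypothetical names of mine, and the non-x86 fallback counts nanoseconds rather than clock ticks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NTRIALS 100	/* hypothetical repetition count */

/* Read a tick counter: the TSC on x86, nanoseconds elsewhere. */
static inline uint64_t read_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
	return __builtin_ia32_rdtsc();	/* GCC/Clang builtin for rdtsc */
#else
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
}

/* Run `code` NTRIALS times; keep the smallest tick delta in `best`. */
#define measure_time(code)					\
	do {							\
		int i_;						\
		for (i_ = 0; i_ < NTRIALS; i_++) {		\
			uint64_t t1_ = read_ticks();		\
			code;					\
			uint64_t t2_ = read_ticks();		\
			if (t2_ - t1_ < best)			\
				best = t2_ - t1_;		\
		}						\
	} while (0)

/* Cost of the measurement itself, i.e. the "no code" case. */
static uint64_t timing_overhead(void)
{
	uint64_t best = ~(uint64_t)0;

	measure_time((void)0);
	assert(best != ~(uint64_t)0);	/* at least one trial ran */
	return best;
}
```

Taking the minimum over many trials rather than the mean is a common way to filter out interrupts and cache misses, which only ever make a run slower. Subtracting timing_overhead() from a later measurement corresponds to the best - tsc expression in the snippets above.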

I collected the following read times over the first 5 runs:

Clock ticks/run        1st Run  2nd Run  3rd Run  4th Run  5th Run
Kernel HTML Parser     246      366      245      351      246
Userland HTML Parser   675      683      561      683      675

Thus, the file read in the kernel outperforms the userland code by a factor of roughly 2.25 on average (about 291 versus 655 clock ticks per read).
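As a quick check on the numbers in the table, the averages and their ratio can be computed with a few lines of C. This is only a sketch of the arithmetic; mean_ticks and the two array names are mine.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n tick counts. */
static double mean_ticks(const unsigned long *ticks, size_t n)
{
	unsigned long sum = 0;
	size_t i;

	assert(ticks != NULL && n > 0);
	for (i = 0; i < n; i++)
		sum += ticks[i];
	return (double)sum / (double)n;
}

/* Read times from the table above, in clock ticks. */
static const unsigned long kernel_ticks[]   = { 246, 366, 245, 351, 246 };
static const unsigned long userland_ticks[] = { 675, 683, 561, 683, 675 };
```

Here mean_ticks(kernel_ticks, 5) is 290.8 and mean_ticks(userland_ticks, 5) is 655.4, so the userland read takes about 2.25 times as long as the kernel read. Comparing the best runs instead (245 against 561) gives nearly the same ratio.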

There are a couple of interesting possibilities for porting applications that require high performance to kernel space. A few already exist, including a kernel-mode web server. Of course, a crash in a module that has not been properly tested could cost far more than a crash in its userland counterpart.
