About a kernel panic and in praise of the wonderous compiler that is GCC
I have been mucking around with the xv6 kernel lately, adding my ugly code and diminishing it’s elegance. This has resulted in infinite loops, page faults, triple faults, and kernel panics.
This post is about a stupid piece of code I wrote today, and cool shit I learned while debugging it.
While I was running tests for my system calls, the kernel panicked. As the kernel manages all hardware resources, it must be extremely secure. A null pointer derefernce or a deadlock in the kernel, or something that might lead to undefined behaviour is to be prevented at all costs. So, whenever there is a trap or exception inside the kernel, the kernel panics.
First, I tried to replicate the error. Losing the error would have been horrible, because that would mean an undetected bug. Thankfully, it happened again. I started thinking about rewriting my code, regretting the fact that I had not made proper commits in my frenzy.
Then I got to my senses, and tried to find out the reason behind the trap.
Trap number 6. Inspect traps.h. Invalid Opcode. This made me hypothesize that somehow, the user program had gained access to kernel memory, and had overwritten the code segment with some invalid value. But then, the user program I was running made no system call where the kernel dereferenced a pointer passed by the user.
Then, I looked at the eip.
eip = 801047e3
Let’s take a look at the kernel assembly code- if there is a valid instruction in the assembly code, and yet there is a trap, it means that something nasty happened and a non-malicious code had by fluke succeeded in changing the kernel code, and making it panic.
objdump -D kernel | less
This was my code! What is ud2? Googling yielded this.
ud2 is an instruction which is meant to raise an invalid opcode exception. What is ud2 doing in my code? And which is the exact snippet that got compiled down to ud2?
For debugging with gdb, a symbol table is needed. The xv6 makefile has a debugging mode recipe, which generates .asm files (which are needed for line by line source level debugging . Opening up kernel.asm … Ah! Here is ud2! Below p->process->threadcount -= 1. Now, there is nothing in this line of code that should generate ud2.
So I searched some more. Google “when does gcc generate ud2”. Found this answer on StackOverflow. NULL pointer derefernce!
And sure enough, on the line just above the line incrementing the thread count, there was that fateful line of code saying p->process = 0; Wow. This is what you get for arbitrarily inserting code without thinking.
It seems that GCC has a lot of tricks up it’s sleeves. This has happened to me before- looking at the differences in the object code generated with and without the -O3 flag, and a recent experience where GCC reordered memory accesses and caused my communicating threads to deadlock. It’s a good idea to inspect assembly code once in a while.