I should read more code. Working with other people and reading other people's code are good ways to learn, and I don't do enough of either. Working with other people is something I do at work, so I get some of that, but it doesn't get me looking at the code of world-famous, effective programmers. So, I should read more code.
I've wanted to read the Linux source for some time, but a modern version seems a bad place to start, with several million lines of source. The 1.0 kernel, on the other hand, has 170K lines total. Pretty practical.
It's something of a time capsule. I extracted it from my 1997 Linux Developer's Resource CDROM set (where it was already a few years old), and the feature-set is utterly different from even a few years later, let alone now. x86 only (no 'arch' directory). Single processor only. A floating-point emulator in case your CPU lacks an FPU. The number of drivers is... laughable. The design is clearly an extremely vanilla Unix clone - Tanenbaum's comments on Linux's retro design at the time seem very fair.
Looking at comments in the source, there are lines I'm reading going back 25 years, which is somewhat scary. When I first used Linux, a few years after that, aged 16, OS kernels seemed like black magic - something I could compile and play with, but not truly understand. At the time, this was probably true, not just because of my inexperience, but also due to the lack of available documentation. There weren't wikis full of the minutiae of x86 OS construction. Coming back to it now, it's wonderfully convenient, so I can fit it in at the end of a work day when the children are in bed.
Anyway, that's enough of the nostalgic back story. I'm reading the source. For warm-up, the FPU emulator and top-level docs, and then the boot process. My plan is to understand most things reasonably, but I'm not aiming for a fine-toothed comb on everything. I just don't have time.
drivers/FPU-emu
I've also been reading a bit of Lua source. In comparison, the FPU emulator is wonderfully well-commented, even if I am skimming it. It makes it clear how much (tedious) work is required to get the details right in emulating an FPU. It also highlights the dearth of documentation available in the early '90s - the author notes they don't have access to a 486 or a copy of the IEEE spec, yet they still have a darned good go at emulating it!
CREDITS
There are a decent number of recognisable names here. The most amusing to see is Raymond Chen, noted MS employee and "The Old New Thing" blogger, who contributed the Configure script.
Makefile
Looking back, this thing is really very simple!
Configure
A "little language" in the grand old tradition of Unix, it parses the "config.in" file and asks configuration questions off the back of it, and then generates the config files. Nice. As noted above, written by Raymond Chen.
config.in
Wow, a seriously small number of drivers! The file warns that turning on verbose SCSI error reporting will bloat your kernel by 12K!
build.c
Before tackling the actual start-up process, I thought I should look at the tools used to construct the bootable image. "build.c" assembles the kernel from a boot block, a (16-bit) setup chunk, and then the main (32-bit) image. It's remarkably simple, perhaps because the files being glued together are very simple. The a.out format used makes it just a few lines of code to check the object file headers, strip them off, and glue them together into the final image.
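The check, strip and glue step can be sketched roughly as below. This is not the real build.c and not the real a.out layout: toy_header, TOY_MAGIC and append_payload are hypothetical names made up for illustration, and the sketch works on in-memory buffers rather than files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified object header. NOT the real a.out layout. */
struct toy_header {
    uint32_t magic;      /* identifies the format */
    uint32_t text_size;  /* bytes of code following the header */
    uint32_t data_size;  /* bytes of initialised data following the code */
};

#define TOY_MAGIC 0x1234abcdu  /* arbitrary value chosen for this sketch */

/*
 * Check the header at the start of obj, skip it, and append the payload
 * to image at offset *used. Returns 0 on success, -1 if the header is
 * bad or the payload does not fit.
 */
static int append_payload(const uint8_t *obj, size_t obj_len,
                          uint8_t *image, size_t image_cap, size_t *used)
{
    struct toy_header h;
    uint64_t payload;

    assert(*used <= image_cap);
    if (obj_len < sizeof h)
        return -1;
    memcpy(&h, obj, sizeof h);
    if (h.magic != TOY_MAGIC)
        return -1;

    payload = (uint64_t)h.text_size + h.data_size;
    if (payload > obj_len - sizeof h)      /* header claims more than we have */
        return -1;
    if (payload > image_cap - *used)       /* final image would overflow */
        return -1;

    memcpy(image + *used, obj + sizeof h, (size_t)payload);
    *used += (size_t)payload;
    return 0;
}
```

Calling append_payload once per input (boot block, setup chunk, main image) with the same image buffer and used counter gives the glued-together result.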
zBoot/
The whole compressed image thing is charmingly straightforward, too. Rather than having a kernel with a base address of 0x1000, it is now based at 0x100000, and the initialisation code, er, starts at 0x1000, calls standard decompression code that'll undo gzip, and then jumps into the decompressed kernel at 0x100000. The tools that do all this are nice and simple.
inflate.c is a hairy but standard (g)zip decompression implementation, so I skipped it.
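In outline, the stub's job looks something like the sketch below. The two addresses are the ones mentioned above. Everything else is an assumption for illustration: toy_inflate is a stand-in run-length decoder rather than gzip, and unpack_kernel works on a simulated memory buffer and returns the address to jump to instead of actually jumping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    STUB_ADDR   = 0x1000,    /* where the decompression stub starts */
    KERNEL_ADDR = 0x100000   /* where the decompressed kernel is based */
};

/*
 * Stand-in for the real inflate code: a hypothetical run-length decoder
 * over (count, byte) pairs. Returns the number of bytes written, or 0 if
 * the input is malformed or the output would overflow.
 */
static size_t toy_inflate(const uint8_t *in, size_t in_len,
                          uint8_t *out, size_t out_cap)
{
    size_t n = 0;

    if (in_len % 2 != 0)
        return 0;
    for (size_t i = 0; i < in_len; i += 2) {
        uint8_t count = in[i];
        if (count > out_cap - n)
            return 0;
        for (uint8_t k = 0; k < count; k++)
            out[n++] = in[i + 1];
    }
    return n;
}

/*
 * Unpack the kernel into simulated memory at KERNEL_ADDR and return the
 * address to jump to, or 0 on failure.
 */
static uint32_t unpack_kernel(uint8_t *mem, size_t mem_len,
                              const uint8_t *packed, size_t packed_len)
{
    size_t n;

    assert(mem_len > KERNEL_ADDR);
    n = toy_inflate(packed, packed_len, mem + KERNEL_ADDR,
                    mem_len - KERNEL_ADDR);
    return n ? KERNEL_ADDR : 0;
}
```

The real thing ends with a jump rather than a return value, but the shape is the same: decompress to the kernel's base address, then transfer control there.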
boot/
Things move all over the place! "bootsect.S" gets loaded at 0x7c00, moves itself to 0x90000, loads "setup.S" after itself, and then puts the compressed system at 0x10000. "setup.S" calls a bunch of BIOS stuff, and moves the compressed system from 0x10000 to 0x1000 (obviously), before calling it in 32-bit mode. Most of the code seems to be about selecting and setting up the video mode.
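To keep the shuffling straight, here is a small simulation of the two moves in a flat byte array. The four addresses come from the description above. The 512-byte sector size, the 640K low_mem array and the function names are assumptions of this sketch, and nothing here is taken from the actual assembly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    BIOS_LOAD_ADDR   = 0x7c00,   /* where the boot sector gets loaded */
    BOOTSECT_ADDR    = 0x90000,  /* where bootsect.S moves itself */
    SYSTEM_LOAD_ADDR = 0x10000,  /* where the compressed system is first put */
    SYSTEM_RUN_ADDR  = 0x1000,   /* where setup.S moves it to */
    SECTOR_SIZE      = 512,      /* assumption: one 512-byte boot sector */
    LOW_MEM_SIZE     = 0xA0000   /* assumption: 640K of simulated memory */
};

static uint8_t low_mem[LOW_MEM_SIZE];

/* Copy len bytes within the simulated memory; the ranges may overlap. */
static void move_block(uint32_t dst, uint32_t src, uint32_t len)
{
    assert(len <= LOW_MEM_SIZE);
    assert(dst <= LOW_MEM_SIZE - len && src <= LOW_MEM_SIZE - len);
    memmove(low_mem + dst, low_mem + src, len);
}

/* Replay the two moves for a compressed system of system_len bytes. */
static void replay_boot_moves(uint32_t system_len)
{
    move_block(BOOTSECT_ADDR, BIOS_LOAD_ADDR, SECTOR_SIZE);    /* bootsect.S */
    move_block(SYSTEM_RUN_ADDR, SYSTEM_LOAD_ADDR, system_len); /* setup.S */
}
```

One thing the simulation makes visible: once the system is longer than 0x6c00 bytes, the copy down to 0x1000 runs over 0x7c00, which in this sketch is harmless only because the boot sector has already been copied out of the way.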
At this point, we call into the "head.S" of zBoot, which decompresses the kernel, and then calls into the "head.S" of boot.
"head.S" is the final 32-bit start-up code. Turns out the initial start-up code is very similar to the "head.S" on zBoot, which isn't really that surprising. We then stash parameters down in the zero page and identify the processor. Set up paging, reload segment registers, and call "start_kernel"!
init/main.c
Called by head.S. The first thing I see is lots of extern declarations in the .c file, and some shared structs being defined again. Very much not "Don't repeat yourself".
It starts everything up, moves into user mode as process 0, forks init (process 1) and enters the idle loop. How does entering user mode just work? It does a return from interrupt to the next instruction, only with the segment registers switched from kernel segments to user segments, which (for this process) are mapped to just be the same as the kernel segments, only user-readable.
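To make the trick a bit more concrete, here is a sketch of the stack frame that such a return from interrupt consumes. The field order is the one x86 uses when iret returns to a less privileged ring (EIP, CS, EFLAGS, ESP, SS). The function names, and the selector values in the note after the code, are illustrative assumptions rather than anything copied from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * The five values iret pops when it returns to a less privileged ring,
 * in the order they come off the stack.
 */
struct iret_frame {
    uint32_t eip;     /* where execution resumes: the "next instruction" */
    uint32_t cs;      /* code segment selector to load */
    uint32_t eflags;  /* flags to restore */
    uint32_t esp;     /* stack pointer to restore */
    uint32_t ss;      /* stack segment selector to load */
};

/* The low two bits of a selector are its requested privilege level. */
static unsigned selector_rpl(uint16_t selector)
{
    return selector & 3u;
}

/*
 * Build the frame for "return to the next instruction, but with user
 * segments". Both selectors must ask for ring 3.
 */
static struct iret_frame make_user_frame(uint32_t next_eip, uint32_t esp,
                                         uint32_t eflags,
                                         uint16_t user_cs, uint16_t user_ds)
{
    struct iret_frame f;

    assert(selector_rpl(user_cs) == 3 && selector_rpl(user_ds) == 3);
    f.eip = next_eip;
    f.cs = user_cs;
    f.eflags = eflags;
    f.esp = esp;
    f.ss = user_ds;
    return f;
}
```

For example, make_user_frame(next_eip, esp, eflags, 0x23, 0x2b), with made-up ring-3 selectors, gives a frame whose CS and SS both carry privilege level 3. The CPU ends up in user mode simply because the popped selectors say so; the code and stack it continues with are unchanged.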
"init" starts by calling the "setup" syscall. I need to find out how syscalls get dispatched! Then it tries to exec some version of init, and then there's a little bit of emergency fallback stuff. Done!
Posted 2015-02-19.