Subtitled "The breakneck race to create Windows NT and the next generation at Microsoft", the idea is that this is The Soul of a New Machine, but for software. In practice, it's... something else. It's a bit of a horror story.
While Kidder's TSoaNM had elements of stress to it, it reads like the entire development of NT was dysfunctional and destructive. The focus of all this is Dave Cutler, who comes across as pretty darn horrible as far as modern management theory goes. The designer of VMS, he came in to lead NT when his previous project got canned at DEC. He's a perfectionist, which is kinda useful in a kernel dev, but the only way he seems able to express this is through aggression and even violence. He got on well with Steve Ballmer, which for me somehow sits nicely with Ballmer later running MS into the ground.
Cutler sounds like he has the biggest ego possible, which might be helpful when he's mostly right, but there are great examples where he was myopic and his ego just got in the way. Coming from the world of minicomputers, he built another large-machine OS that was supposed to run on 386s when memory was really expensive. He didn't take this into account from the start, so a lot of work went in later on memory optimisation.
Cutler decided not to make the OS pageable, which certainly increased memory pressure. Towards the end of the project, Rick Rashid, developer of Mach, got involved, adding kernel pageability to NT. Cutler didn't like this, and instead added his own version of kernel paging to NT. Just. So. Much. Ego.
NT was a huge project for its day, but Cutler was clearly a bad leader for it. The core kernel is just a small fraction of the whole, and in some ways not the bit that customers actually care about - it's assumed and the actual features and Windows personality on top of the core and all the rest of it are the real product. This meant that the overall Windows NT team was hundreds of people, but Cutler only really felt they could go up to 25 people or so. This wouldn't be so bad, if he could be leading the core kernel team in a wider project, but Cutler isn't a team player, and has to be the lead. Which sucks if they can't scale.
With 25 people, team members could get enough contact with Cutler to get desensitised, or, well, couldn't actually hide. With a couple of hundred engineers, his macho, aggressive style would make people avoid him, and poor information flows are anathema to decent project management.
All this male-centric, macho posturing made for a toxic work environment in more than one way. The author does give space to the rare women working on NT and at MS, and for example their campaign to ban nudes from the office. Yeah, not a particularly inclusive place. I feel the author Has Opinions on the MS culture.
The thing is, I don't believe any of this Cutler-worship was necessary. NT was an OS rewrite with a Big Name. On the other side, Apple has switched core OS (MacOS classic to OS X), and gone from 68K to PPC to Intel to ARM, all without needing a big ego technical expert. Modern internet systems make NT look like a toy, and leading a 25-person team project is nothing when there are platforms with thousands of developers. Sure, Cutler may have been a technical expert, but a team player would have worked so much better - enabling better org scaling, getting away from a "bus factor" of 1, and just having a better design when multiple experts can work together.
History shows the leadership of NT was BS.
This is made all the more painful by the "death march" section, seeing how NT's development managed to destroy or mangle so many families and marriages.
The book offers some decent technical insights. One of the strengths was the focus on the build process. Keeping the pipeline of builds flowing was key to keeping up momentum. Dogfooding, while costly to developer productivity, also helped find bugs and bring in useful features. It sounds like the build process was very manual, with a cost of human misery, and it's interesting to think how that could have been improved. Behind that (and perhaps explaining the pain), there were no forced code reviews or tests on code check-in, so it's easy to see how the build could break.
It seems Cutler's attitude was "just write correct code", but I think he appreciated the usefulness of the tests. I imagine it would have been useful to put more of the testing onus on the devs rather than testers, and push to a more unit-testing-like approach. This would be tough to do for a kernel, but, seriously, catching bugs early is super-valuable. Similarly, I wonder how they could have started dogfooding earlier, given it took a couple of years from the start of the project to being able to do so. If you compare with Linux, that feels like dogfood even from the earliest days.
In any case, this book is both a good reminder of how important good development tools are to efficient development, and of the huge strides we've made since the early '90s in this area. Powerful open source source control systems and CI/CD pipelines are awesome.
NT was designed to support multiple "personalities", and targeting the Windows API, rather than primarily OS/2, was a big twist that came late. I guess this means the joke that "WNT is VMS + 1" is just that, and not a Cutler trick.
I was surprised to see that NTFS was a late addition to NT. It looked like they were thinking of having NT run on FAT or the OS/2 filesystem for quite some time. Even then, NTFS was at risk of being cut for lack of time.
The world of graphics seemed pretty messy, being a thing that Cutler didn't understand, and seemed to have a power vacuum I'd like to attribute to Cutler messing everything up. ;) They spent a year developing a system only to chuck it out, and experimented with using C++ (in the late '80s/early '90s), only to have it bite them. Cutler did not like C++, sticking to C, which I found interesting as I'd heard of NT as being the exciting new OS written in C++. So there you go.
Towards the end, Michael Abrash came in and fixed the graphics performance. Yeah, the guy who did Quake. Zachary is complimentary about very few people in this book, but Abrash comes off really well. I get the impression that the guy's legendary status is worth it just for making things good in that environment!
The historical perspective the book offers as a contemporaneous account is also pretty fascinating. It mentions Allchin working on Cairo (where everything was componentised), which was aimed to be the Next Big Thing before it was canned. The early discussion of making NT cross-platform, on the grounds that RISC would be the next big thing, had a dissenter who thought RISC was overblown... and they were right at a 20-year horizon!
The book discusses NT as the last big new OS; ironically Linux was just getting going (and while it started as a reimplementation of a simple classical uni-processor Unix, it has become so much more since then). OS design has become much less of a black art, with whole wikis devoted to developing DIY OSes.
The retrospective addendum from 2008 is even more interesting. While it identifies how NT missed mobile and the internet, the author still thinks OSes are an uninteresting space. Systems programming is the bedrock of modern internet-scale systems, and Cloud computing has made new parts of OS design interesting. The retrospective downplays MS, yet MS is now back again, thanks to Azure. I always find it fun to read retrospectives and see what they miss when they're looking at what they previously missed.
The new MS has a reputation for being a rather kinder, gentler place - a de-Ballmered environment, and one that's dropped the Cutler-style confrontational egotistical attitudes. Looking back at the leadership style, Zachary justifies it as necessary to push the project forwards. I just don't buy it, as we've developed so many large-scale, complicated systems without that kind of rubbish. Focus on technical excellence does not mean treating people horribly. Hiring good people and treating them well... works, actually.
One of the things I think is interesting throughout all this is that I've had a very aggressive and perhaps slightly egotistical tech lead, way back when I started working (and, just like Cutler, he's technically extremely strong). I'm full Stockholm on this one, because he's still my friend. Partly I think he's changed over the years, but it's also something you can get away with in a small team where everyone can get used to the personas.
In a highly silo'd work environment where skill levels are highly variable, being competent and aggressive can get things done correctly across the silos, but it's far better to work somewhere with a better culture. Investment banks are not known for good tech culture! I also think that his approach would have been a disaster with a large team - he would always take a small high-calibre team over a large mediocre team, but I don't think it'd have worked if you ever wanted to assemble a large high-calibre team like they were going for with NT.
Hindsight is easy. It takes effort for me to remember how slow compiling even Linux 0.97 was on a 486 with 4MB of RAM. Trying to build complex new systems with the hardware and software available at the time was a completely different challenge from what we do now, and the knowledge of modern best practices was not available. It's hard to remember how different developing software to be released on disk was. Given that, the focus on the build process and dogfooding was excellent. Acting as an aggressive and egotistical diva was never acceptable.
Posted 2021-11-20.