A Philosophy of Software Design - John Ousterhout

John Ousterhout is a Stanford CS professor, and has done many things, including having his name on the Raft paper. On the other hand, the thing he's best known for is Tcl, so maybe I'm not totally convinced of his philosophy. :) However, I'm interested to hear what he says.

The book is small - A5ish, with only 170 or so pages of content. Surprisingly, it seems to be Amazon print-on-demand - perhaps he's expecting most people to take the e-book? It talks about good design, and it repeats much standard wisdom. The main thing it brings to the conversation, though, is the concept of "deep" modules - modules with simple interfaces and complicated internals, that make powerful abstractions.

This sounds very straightforward - for example a sorted map structure that hides a balanced tree implementation fits this well. Ousterhout is also willing to name and shame examples of bad interfaces - pointing out that to deserialise a Java object you need to construct and chain a FileInputStream, BufferedInputStream and ObjectInputStream. And Unix is given as a good example, with open, read, write, lseek and close forming most of the interface.
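
For concreteness, the chaining being complained about looks roughly like this. This is only a sketch: the class name ChainedRead and its two helper methods are made up for illustration, and only the java.io classes are real.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical class, for illustration only.
public class ChainedRead {
    // Write one serialisable object to a file, so the read below has input.
    public static void writeObject(String path, Object value) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(path)))) {
            out.writeObject(value);
        }
    }

    // The caller has to know about, and assemble, all three layers.
    public static Object readObject(String path)
            throws IOException, ClassNotFoundException {
        try (FileInputStream file = new FileInputStream(path);
             BufferedInputStream buffered = new BufferedInputStream(file);
             ObjectInputStream objects = new ObjectInputStream(buffered)) {
            return objects.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("chained", ".ser");
        tmp.deleteOnExit();
        writeObject(tmp.getPath(), "hello");
        System.out.println(readObject(tmp.getPath()));
    }
}
```

Leaving out the BufferedInputStream still works, just without buffering, which is presumably part of why it makes a poor default.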

However, I think there needs to be more subtlety. Sometimes you need the control - Unix's interface gets into a messy pile of ioctls and fcntls to handle all the other cases, and TCP/IP's simple abstraction through the socket interface doesn't help you when the abstraction leaks and you need to debug production. Simple interfaces need escape hatches for advanced users.

The Java file I/O situation is interesting, because it provides the escape hatch by default - you get all the grubby details and can build a stack of Streams as you like - but it's almost certainly the wrong interface by default. However, it's still an attractive design for the implementation, separating out concerns. Some factory methods can build the useful default stacks, and you could break out individual constructors if you need something non-standard, but this wasn't discussed.
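
A rough sketch of what that might look like, assuming a hypothetical ObjectFiles helper class (not part of the JDK, and not something from the book): one factory method builds the usual file, buffer, object stack, and a second overload is the escape hatch for callers who want to supply their own underlying stream.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical helper, for illustration only.
public final class ObjectFiles {
    private ObjectFiles() {}

    // The simple default: one call builds the whole stack.
    public static ObjectInputStream open(String path) throws IOException {
        FileInputStream file = new FileInputStream(path);
        try {
            return open(file);
        } catch (IOException | RuntimeException e) {
            file.close(); // don't leak the file if the stream header is bad
            throw e;
        }
    }

    // The escape hatch: bring your own source (socket, byte array, ...).
    public static ObjectInputStream open(InputStream source) throws IOException {
        return new ObjectInputStream(new BufferedInputStream(source));
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("objectfiles", ".ser");
        tmp.deleteOnExit();
        try (ObjectOutputStream out =
                new ObjectOutputStream(new FileOutputStream(tmp))) {
            out.writeObject("hello");
        }
        try (ObjectInputStream in = open(tmp.getPath())) {
            System.out.println(in.readObject());
        }
    }
}
```

The individual constructors are still there for anyone who needs a non-standard stack; the factory just means most callers never have to think about them.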

The idea of deep modules is to give you a lot of functionality without having to put much in your head. This is almost the exact opposite of various Haskell combinator libraries, where you get dozens of combinators that each do very little, and you know what's inside them, yet you're still expected to assemble them together. I think the idea behind these combinator libraries is to change the way you think and work, without constraining the functionality you build. In contrast, Ousterhout's deep modules don't tell you how to approach coding, but have strong opinions on the functionality they expose to you.

Deep modules make me think of hi-fi separates - big chunks of functionality with clear and simple interfaces that you can plug together without needing to know the details. Haskell combinators are like individual components.

Both in my current role as a Site Reliability Engineer, and indeed on the edge of my previous software engineering roles, things have got most interesting when abstractions leak. One could claim that good modules don't leak their abstractions. I think this is not the case - especially for the kind of modules that this book proposes, that try to make life as easy as possible for the user, encapsulate more, and thus have more that can leak when reality strikes. This book doesn't talk about leaky abstractions, and how to leak safely and effectively, which I think is a shame, as that's where things get really interesting.

The chapter on comments is quite refreshing, since not only does it do the usual "What/why, not how" spiel, but it also gives some real-world examples and they're way more verbose than I'd normally expect. I read them and... understand what the intention is. When I've looked at random pieces of open source code, this has not been the case, and I've felt that maybe I'm dumb. The book makes me feel vindicated. :)

The chapter on naming conventions is also interesting. In particular, it describes a case where the same variable name, "block", was used repeatedly for both file block ids and disk block ids, and the horrific bug that resulted when this led to one value being used in the other context. The book suggests using "fileBlock" and "diskBlock" instead. I find this fascinating because at this point I'm just jumping up and down shouting "Type systems! Different types would prevent this!", and I assume this is just a generational thing, and there are just a bunch of super-senior engineers who are never going to get type systems until they retire! :)
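
To make the type-system point concrete, here is a minimal sketch. Everything in it is hypothetical: the BlockIds class, the two wrapper types, and especially the toDiskBlock mapping, which is a stand-in and not how the code in the book's example worked.

```java
// Hypothetical types, for illustration only.
public class BlockIds {
    // Index of a block within a file.
    public static final class FileBlock {
        public final long index;
        public FileBlock(long index) { this.index = index; }
    }

    // Index of a block on the disk.
    public static final class DiskBlock {
        public final long index;
        public DiskBlock(long index) { this.index = index; }
    }

    // The only way across is an explicit conversion. The arithmetic here
    // assumes a made-up layout where a file is one contiguous run on disk.
    public static DiskBlock toDiskBlock(FileBlock block, long fileStartOnDisk) {
        return new DiskBlock(fileStartOnDisk + block.index);
    }

    // Anything that touches the disk only accepts a DiskBlock.
    public static String describeRead(DiskBlock block) {
        return "reading disk block " + block.index;
    }

    public static void main(String[] args) {
        FileBlock fileBlock = new FileBlock(3);
        DiskBlock diskBlock = toDiskBlock(fileBlock, 100);
        System.out.println(describeRead(diskBlock));
        // describeRead(fileBlock); // does not compile: FileBlock is not a DiskBlock
    }
}
```

With both ids as a bare long, the commented-out call would compile and quietly read the wrong block. With separate types, the compiler rejects it, whatever the variable happens to be called.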

Overall, though, I like this book. Having come at the world from a slightly different angle, there are bits I disagree with, but there are large swathes I really like, backed up by a solid career designing many systems, and teaching software design. On the whole, recommended.

Posted 2018-12-15.