Patterns for Fault Tolerant Software - Robert S. Hanmer

Patterns books are often right up there with Agile books for stating the bleeding obvious really, really slowly. This book is not quite like that. Perhaps it's because I didn't have much background in fault-tolerant stuff, so much of the content was new to me. Perhaps it's because it's not really a patterns book, but got shoe-horned into that format, probably because this is how architecture is explained now.

Before I get into why this is a funny book, I should probably mention my motivation. When writing quant library code for exotic derivatives, something going wrong is bad, but not the end of the world. The time-frame of action is not "It's not been working for seconds, argh!", and there's a human cynically looking at all the numbers we produce (people taking at face value the numbers that come out of quant models built on tractable but naive assumptions is a very, very bad thing). When writing algorithmic trading systems, as I do now, a system stopping or going mad is really bad. However, there's not much of a formal approach to making our systems fault tolerant. I thought I would do some reading.

This book is not really a patterns book. It's a monograph - "Explaining the interlinked fault-tolerant concepts Robert Hanmer uses when designing fault-tolerant systems". You can tell this because it occasionally flails around to find pertinent examples outside of his domain, and they don't click well, a bit like a hastily-put-together "Related Work" section in a research paper.

There's a picture of a satellite on the cover, but that's misleading. Space (and aeronautic) stuff is one of the better-known cases of complex fault-tolerant software, but he worked on the other one - telecoms. Specifically, 4ESS, which has a ridiculous number of 9s of reliability. So, he knows his stuff.

However, the concepts are spread out in full pattern style, which means that despite the content not being stupidly obvious, it does become highly repetitive as the same thing gets discussed in multiple patterns. The need for pattern books to have lots of diagrams or illustrations leads to lots of massively, amusingly spurious pictures.

Fundamentally, though, the underlying content is excellent. The mental framework discussed is logical, the concepts placed within the framework are solid, and the whole thing is very clear. Trade-offs are discussed instead of silver bullets, and it's got a great sense of pragmatism.

For some reason, this sane (and in places rather plodding) book reminds me of the utterly enjoyable but totally nutty Systems Bible (aka Systemantics). Perhaps it's the weird diagrams and pervasive sense of paranoia. Who knows. In any case, I can forgive the cross-reference repetition as an unfortunate necessity, as underneath that this book is thorough and excellent.

Posted 2014-05-03.