Simon Frankau's blog

Paper round: Distributed Systems

I recently realised, to my horror, that I really haven't been keeping up with a fair old amount of research. So, I decided to catch up on 'distributed systems'.

Back when I was an undergrad, we had a course on 'concurrent systems', which dealt partly with locking and race conditions and stuff, and then did a little bit of the distributed end of things. When I finally discovered that expensive 'middleware' was mostly fancy networking wrappers, I was remarkably disappointed and underwhelmed. Making a pile of networked PCs work together sounded like the boring end of research.

Since then, distributed systems have become Key To The Web. Real practical uses, with real practical requirements, instead of half-hearted academic systems. Things have come a long way, and I just haven't kept up. It's rather ironic, really, since my first job out of uni was working on compute farm infrastructure. This managed a queue of short-run compute jobs, and associated data distribution, so it's nothing to write home about algorithm-wise, but it taught me a lot about middleware, failure modes and performance. And then, I moved on.

I didn't track the research, which is a bit embarassing given all these famous papers by Google, etc. I am finally catching up on the area. It's not exactly practical to set up a grid at home, but reading the theory is better than nothing...

Here are my paper mini-reviews. If I were still acting like a proper researcher I would provide a proper bibliography, but these papers are well-known enough that paper titles suffice.

The PageRank Citation Ranking: Bringing Order to the Web Ok, it's not a distributed systems paper, but given it's what effectively kicked off Google, it seemed worth reading. I'd never actually read the details of the PageRank algorithm, but it's both simple and clever. Elegant, in other words. The paper is well-written and clued up, and having read this paper I don't begrudge Brin and Page becoming billionaires! It also reminded me how bad web search had got by that point. The example web search for 'university' had a 'major commercial search engine' producing more-or-less junk, whereas PageRank produced something basically sane.

The Anatomy of a Large-Scale Hypertextual Web Search Engine 'Large-Scale' is severely relative. What was large back then is tiny now. This paper is pretty much a partner to the PageRank paper, focussing on system architecture, both in terms of how the algorithms are glued together, and how they are parallelised. This early-days paper is perhaps not terribly relevant now, but is certainly interesting (if only from a historical perspective :p ).

Web search for a planet: The Google cluster architecture A not terribly informative introduction that tells you that web search is parallelisable, and that best bang-per-buck is achieved with mid-range commodity hardware. Hmmm. Well, maybe it's not that bad, but it's not an earth-shattering paper.

The Google File System A very readable paper, describing Google's big, distributed file system (used to manage hundreds of terabytes across thousands of machines). Demonstrates a nicely simple design - don't bother with full Posix, and everything is entirely user-space (building the server on top of the native file system). Funny consistency model, neat operation log.

The Chubby lock server for loosely-coupled distributed systems After the GFS paper, quite a disappointment. Chubby provides a smaller, simpler file system with locks on the files. It's mostly used to hold configuration for other systems, provide locks for other systems, and provide a mechanism for other distributed systems to elect masters. Bits of it feel like a nameserver with subscription/non-timeout-based caching. This all-in-one behaviour, coupled with a lack of details on common usage patterns made it feel like a bit of a wishy-washy design to me. What problem is it really trying to solve? Yuck, yuck, yuck.

Bigtable: A distibuted storage system for structured data This is really just a key-value-ish database built on top of GFS. It feels like another simple design, done well, explained well. The immutable design is rather neat.

MapReduce: Simplified data processing on large clusters Another popular yet underwhelming paper. The idea is simultaneously neat and straightforward: perform your grid calculations in a stylised map and aggregate format, and use that structure to make the gridded implementation simple, efficient, etc. The 'reduce' part is basically a monoid, and I think that whole area could be cleaned up by talking about it in that way. But maybe I'm just a Haskell nutter.

Dynamo: Amazon's highly available key-value store The paper doesn't start off well. It reads like an advert for just how darned brilliant Amazon are. As you read through the paper, either the advertising falls away, or you become immunised to it. What's left is a rather good paper. Dynamo is highly optimised for uniform good performance, and the paper goes into a lot of detail about the engineering. There's a lot more focus on the algorithms than in the Google papers, and it feels like they've built a complex system on top of pre-existing research, in a way that the Google descriptions don't. It's well-written and introduced me to various algorithms and ideas I hadn't really looked into before.

MAD Skills: New Analysis Practices for Big Data Randomly recommended by Wikipedia as an influence on modern Big Data stuff, this paper tries to sell the idea of agile practices for Big Data, rather than creating a fixed schema data warehouse and making it difficut to pile new data in. The start of the paper felt like it was trying too hard (UrbanDictionary.com reference, anyone?), but it soon settled down, into a spirited promotion of an agile approach (at the expense of a somewhat straw man waterfall approach, it seems), plus an explanation of how to do linear algebra and machine learning in SQL. I'm still somewhat in two minds about SQL for Big Data, but there you go...

Apache Hadoop Goes Realtime at Facebook Another Wikipedia recommendation, and the first one to really dig into using Hadoop. A fairly neat introduction into what it takes to pick up Hadoop and use it for a production-scale and production-reliability real-time system. An interesting combination of showing that such open source software in this area can be used for the big time, and what customisations Facebook needed to make. Technically, compared to most of the above papers, it's not much, but in effect it's showing a flattening in the technology curve - you can now get the kind of software previously only described in research papers off the shelf.

So, I've read enough papers to keep me going for a while. On the other hand, it's shown me how much more stuff I should really read up on. Cool things that I read about that I want to read more about include:

Paxos and Byzantine fault tolerance, etc.
Gossip protocols and anti-entropy protocols
Bentley-McIlroy compression
Turbo codes, erasure codes
Consistent hashing
Merkle trees
Log-structured merge trees

Plenty to go!

BONUS PAPER: "Ulrich Drepper's How To Write Shared Libraries" One of our work projects is a library that works on both Windows and Linux. Once I'd discovered that Linux shared object files are dynamically linked purely by symbol names, not by library name/symbol name pair, I couldn't believe it. Ok, it gives you LD_PRELOAD, but leads to so many other possible brokennesses. I went and read a paper on it:

This (long) paper goes into excruciating depth about how it all works, and all the pain involved. It's very readable and comprehensive, and tells you all the things not to do, plus various micro-optimisations suitable for making a really efficient libc. It covers API and ABI compatibility, and is generally a good read, even if you can tell from time to time that English isn't his first language.

Posted 2012-07-11.

Full index