Simon Frankau's blog

Paper round: Odds and ends

Around, er, two years ago, I reviewed a bunch of papers, and noted some other topics I should learn more about. I read up on some of those topics shortly afterwards, but never actually put together a round of paper reviews covering highlights of what I'd read since then. This is it. There are a lot fewer papers than I should have read, but on the other hand, there's a lot of cool stuff that's just readable as web pages now, so I almost feel less embarassed about it...

Data structure and algorithms

A Digital Signature Based on a Conventional Encryption Function - Merkle This is the origin of Merkle trees (hash trees). The hash tree idea is rather useful in many situations, but it's really not a huge part of the paper, which is another clever crypto algorithm paper. Fun, but less relevant than you'd expect for hash trees.
"Hash tree" in Wikipedia I'm reviewing Wikipedia pages now?! Apparently. Anyway, a much more useful introduction to hash trees than the above paper, even if it misses out fun irrelevant crypto.
"Error detection and correction" in Wikipedia While I'm at it, error detection/correction codes are nutty, and I've never found a decent introduction to them. On the other hand, reading a few Wikipedia pages is actually quite a good way to get a basic idea of the landscape. I also read the pages on Erasure Code, and Turbo Code. Turbo codes are nutty magic. I need to read a proper book on codes.
Turbo Codes - Emilia Kaesper Another introduction to turbo codes. It explains how they work, but doesn't give a good intuition as to why they work. They continue to be magic to me.
Space/Time Trade-offs in Hash Coding with Allowable Errors - Bloom Bloom filters are awesome, and I'm embarassed I didn't know about them sooner. They should have been somewhere on my undergrad algorithms course. This paper introduces them. Read it.
The Log-structured Merge-tree - O'Neil et al The LSM-tree is basically "Keep multiple copies of the data in a hierarchy (e.g. in-memory and on-disk), and then batch up and merge through, to allow high-throughput writing with querying." The paper is, I thought, disappointing - long-winded and not terribly inspiring.
Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web Ugh. An ugly long paper title, presumably influenced by "it must be a trendy web technology". This paper covers consistent hashing and random trees, but it's only really consistent hashing that I wanted to learn about. The paper hides the intuition behind it pretty badly compared to, say, the Wikipedia page.

Google papers

I'd previously read about GFS, Bigtable etc., as well as Dynamo and a few others, but there were plenty more Google papers to read. Bear in mind I read these a couple of years before my Google start date:

Spanner: Google's Globally-Distributed Database This is Google's replacement for the famous BigTable and Megastore systems. You can tell it's a big deal by the number of authors on the paper! Sadly, however, I don't get the big deal. There appear to be two impressive angles: The timing aspect, and SQL-like querying. The timing aspect seems mildly unimpressive to me - if you synchronise your clocks, you can bound when each component sees things. Er, great. And the SQL-like querying angle isn't really covered by the paper. So, the level of innovation, as presented by the paper, underwhelmed me. Ho hum.
Experience Report: Haskell as a Reagent - Pop Having used Haskell for a production system, I had some interest in seeing how other people ended up using it. In the end, I found the paper a little unexciting, but competent enough. Oh, and Haskell was being integrated with Python. Ick. Since reading the paper, I've become a little more interested in sysadmin approaches at Google. Ahem.
A New ELF Linker - Taylor This describes the "gold" linker. Pretty cool work by the guy who worked on the original GNU "ld".
Modular Software Upgrades for Distributed Systems - Ajmani et al Another Google one. Basically "How do you upgrade complex, long-lived distributed systems?" "Carefully." Nice ideas, but just a little more formalisation than was really necessary.

Misc

The Athens Affair (IEEE Spectrum) Co-authored by a co-author of the FPF paper (the rather prolific Diomidis Spinellis), it's a discussion of the rather fascinating hack of a Greek cell-phone network. It's one of the lesser-known publicly-known probably-sponsored-by-the-US hacks.
Barbarians at the Gateways - ACM Queue An introductory article on the HFT arms race and the technology it involves. Light fun.
Finding and Understanding Bugs in C Compilers - Yang et al Generate randomised test code for compilers, compile them, look for bugs. Collect bugs, fix them, look for patterns. Really quite interesting.

Posted 2015-05-27.

Full index