Around, er, two years ago, I reviewed a bunch of papers, and noted
some other topics I should learn more about. I read up on some of
those topics shortly afterwards, but never actually put together a
round of paper reviews covering highlights of what I'd read since
then. This is it. There are a lot fewer papers than I should have
read, but on the other hand, there's a lot of cool stuff that's just
readable as web pages now, so I almost feel less embarassed about
it...
Data structure and algorithms
- A Digital Signature Based on a Conventional Encryption Function
- Merkle This is the origin of Merkle trees (hash trees). The hash
tree idea is rather useful in many situations, but it's really not a
huge part of the paper, which is another clever crypto algorithm
paper. Fun, but less relevant than you'd expect for hash trees.
- "Hash tree" in Wikipedia I'm reviewing Wikipedia pages
now?! Apparently. Anyway, a much more useful introduction to hash
trees than the above paper, even if it misses out fun irrelevant crypto.
- "Error detection and correction" in Wikipedia While I'm at
it, error detection/correction codes are nutty, and I've never found a
decent introduction to them. On the other hand, reading a few
Wikipedia pages is actually quite a good way to get a basic idea of
the landscape. I also read the pages on Erasure Code, and
Turbo Code. Turbo codes are nutty magic. I need to read a
proper book on codes.
- Turbo Codes - Emilia Kaesper Another introduction to turbo
codes. It explains how they work, but doesn't give a good intuition as
to why they work. They continue to be magic to me.
- Space/Time Trade-offs in Hash Coding with Allowable Errors -
Bloom Bloom filters are awesome, and I'm embarassed I didn't know
about them sooner. They should have been somewhere on my undergrad
algorithms course. This paper introduces them. Read it.
- The Log-structured Merge-tree - O'Neil et al The LSM-tree
is basically "Keep multiple copies of the data in a hierarchy (e.g.
in-memory and on-disk), and then batch up and merge through, to allow
high-throughput writing with querying." The paper is, I thought,
disappointing - long-winded and not terribly inspiring.
- Consistent Hashing and Random Trees: Distributed Caching
Protocols for Relieving Hot Spots on the World Wide Web Ugh. An
ugly long paper title, presumably influenced by "it must be a trendy
web technology". This paper covers consistent hashing and
random trees, but it's only really consistent hashing that I wanted to
learn about. The paper hides the intuition behind it pretty badly
compared to, say, the Wikipedia page.
Google papers
I'd previously read about GFS, Bigtable etc., as well as Dynamo and
a few others, but there were plenty more Google papers to read. Bear
in mind I read these a couple of years before my Google start
date:
- Spanner: Google's Globally-Distributed Database This is
Google's replacement for the famous BigTable and
Megastore systems. You can tell it's a big deal by the number
of authors on the paper! Sadly, however, I don't get the big deal.
There appear to be two impressive angles: The timing aspect, and
SQL-like querying. The timing aspect seems mildly unimpressive to me -
if you synchronise your clocks, you can bound when each component sees
things. Er, great. And the SQL-like querying angle isn't really
covered by the paper. So, the level of innovation, as presented by the
paper, underwhelmed me. Ho hum.
- Experience Report: Haskell as a Reagent - Pop Having used
Haskell for a production system, I had some interest in seeing how
other people ended up using it. In the end, I found the paper a little
unexciting, but competent enough. Oh, and Haskell was being integrated
with Python. Ick. Since reading the paper, I've become a little more
interested in sysadmin approaches at Google. Ahem.
- A New ELF Linker - Taylor This describes the "gold" linker.
Pretty cool work by the guy who worked on the original GNU "ld".
- Modular Software Upgrades for Distributed Systems - Ajmani et
al Another Google one. Basically "How do you upgrade complex,
long-lived distributed systems?" "Carefully." Nice ideas, but just a
little more formalisation than was really necessary.
Misc
- The Athens Affair (IEEE Spectrum) Co-authored by a
co-author of the FPF paper (the rather prolific Diomidis Spinellis),
it's a discussion of the rather fascinating hack of a Greek cell-phone
network. It's one of the lesser-known publicly-known
probably-sponsored-by-the-US hacks.
- Barbarians at the Gateways - ACM Queue An introductory
article on the HFT arms race and the technology it involves. Light
fun.
- Finding and Understanding Bugs in C Compilers - Yang et al
Generate randomised test code for compilers, compile them, look for
bugs. Collect bugs, fix them, look for patterns. Really quite
interesting.
Posted 2015-05-27.