A number of months ago, I moved to a team which does algorithmic trading in Java. I had little expertise in algorithmic trading or Java, so I did a little bit of reading. Not a huge amount, but just enough to get an idea of the field. I also just read a bunch of papers that seemed fun that I hadn't got around to otherwise.
Making a Faster Cryptanalytic Time-Memory Trade-Off - Philippe Oechslin This paper describes rainbow tables, a space-efficient way of storing password reverse look-up tables. I'd seen them about the place, including being referenced by (but not understood by) Cory Doctorow - see the awful Knights of the Rainbow Tables for example. I'd read the Wikipedia explanation, but it never really clicked. So, I read the original paper, and it made sense for a while. It's slightly fallen out of my head now, but it clicked nicely for a while. :)
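The core trick, as I remember it (my sketch, not the paper's code), is to store only the endpoints of long hash-then-reduce chains, trading lookup time for space. The names and the toy reduction function here are entirely made up for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChainDemo {
    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz";

    // Toy reduction function: maps a hash back into the password space.
    // The rainbow-table insight is to use a *different* reduction at each
    // chain position, which avoids the chain merges of classic Hellman tables.
    static String reduce(byte[] hash, int position, int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            int idx = Math.floorMod(hash[i % hash.length] + position, ALPHABET.length());
            sb.append(ALPHABET.charAt(idx));
        }
        return sb.toString();
    }

    // Walk one chain; only the (start, end) pair gets stored in the table.
    static String chainEnd(String start, int chainLength) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            String current = start;
            for (int pos = 0; pos < chainLength; pos++) {
                byte[] hash = md5.digest(current.getBytes(StandardCharsets.UTF_8));
                current = reduce(hash, pos, start.length());
            }
            return current;
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A 1000-link chain compresses 1000 hash/password pairs into one row.
        System.out.println(chainEnd("secret", 1000));
    }
}
```

To invert a hash you repeatedly reduce-and-hash it forward until you hit a stored endpoint, then replay that chain from its start to recover the preimage.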
Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine - Simon Peyton Jones A while ago, I tried reading SPJ's book on lazy language implementations, which went via spines and all the rest. I moved incrementally from finding it heavy going to getting lost. Given that the STG is core to GHC, and Haskell was my day job for so long, I thought I owed it to myself to try to understand the STG. So, I read this extended paper, and it's very straightforward! Without spines and all that, and devoid of the memoisation part, the non-strict evaluation is quite clear, and the virtual machine it describes is not far off a strict machine. Adding updates for memoising results is clever and subtle, but it's written in a way that makes it feel not totally impossible. It's a really well-written exposition that makes the whole thing come together for me.
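The update idea can be caricatured in a few lines of Java (a loose analogy of mine, nothing like the real STG): a thunk starts life as an unevaluated closure, and the first force overwrites it with the computed value, so later forces are just a field read.

```java
import java.util.function.Supplier;

// Sketch of the STG "update" in Java terms: evaluate once, then
// overwrite the closure with its result.
public class Thunk<T> {
    private Supplier<T> code;   // null once evaluated
    private T value;

    public Thunk(Supplier<T> code) {
        this.code = code;
    }

    public T force() {
        if (code != null) {   // still a closure: evaluate it...
            value = code.get();
            code = null;      // ...and perform the "update"
        }
        return value;
    }

    public static void main(String[] args) {
        Thunk<Integer> answer = new Thunk<>(() -> {
            System.out.println("evaluating...");  // printed exactly once
            return 21 * 2;
        });
        System.out.println(answer.force());  // evaluating... then 42
        System.out.println(answer.force());  // just 42, no re-evaluation
    }
}
```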
JSR 133 (Java Memory Model) FAQ - Jeremy Manson and Brian Goetz As the algo trading system I work with is (unsurprisingly) multithreaded, and the Java memory model is new since I last used Java in anger (i.e. around the turn of the millennium), I thought it would be a good idea to get up to speed on what Java guarantees. So, this document provides a nice summary of that.
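The headline guarantee the FAQ hammers home is happens-before via volatile; a minimal sketch (my example, not from the document):

```java
// A volatile write happens-before any subsequent read of that field,
// so once the reader sees ready == true it is guaranteed to see the
// earlier ordinary write data = 42. Drop the volatile and the JMM
// permits the reader to spin forever, or to see a stale data value.
public class Visibility {
    static int data;
    static volatile boolean ready;

    static int runOnce() {
        data = 0;
        ready = false;
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) { }   // spin until the volatile write is visible
            seen[0] = data;      // happens-before guarantees this sees 42
        });
        reader.start();
        data = 42;      // ordinary write...
        ready = true;   // ...published by the volatile write
        try {
            reader.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce());  // prints 42
    }
}
```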
JSR-133: Java Memory Model and Thread Specification This is the official spec itself. It's got a formal model. However, I found it quite disappointing. The informal rationale didn't really make things clear, and the formal model was simultaneously a bit woolly in places, and actually surprisingly unintuitive. It could just be that I'm a bit dumb. However, I like to pretend that I'm not, and even if I were, a decent write-up would allow me to understand it easily enough anyway.
The JSR-133 Cookbook for Compiler Writers - Doug Lea And this one tries to cover everything again from another angle - that of a compiler writer. A slightly interesting alternative, but not really terribly insightful.
Generics in the Java Programming Language - Gilad Bracha The other particularly funky thing that turned up in Java was generics. The basic idea of generics is simple. The full subtle complexity is... subtle and complex. You can do most stuff without worrying about it. I hoped this document might cover the other side. It didn't. I think I might need to go to the spec.
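A small taste of where the subtlety lives (my own illustration, not from Bracha's document): generics are invariant, so a List&lt;String&gt; is not a List&lt;Object&gt;, and writing code generic over both producers and consumers forces you into wildcards and the "PECS" rule.

```java
import java.util.ArrayList;
import java.util.List;

public class Pecs {
    // "Producer extends": we only read Numbers out of src.
    // "Consumer super": we only write Numbers into dst.
    static void copy(List<? super Number> dst, List<? extends Number> src) {
        for (Number n : src) {
            dst.add(n);
        }
    }

    public static void main(String[] args) {
        List<Integer> ints = List.of(1, 2, 3);
        List<Object> out = new ArrayList<>();
        copy(out, ints);  // fine: Object is a super of Number, Integer extends Number
        // List<Object> oops = new ArrayList<String>();  // does not compile: invariance
        System.out.println(out);  // prints [1, 2, 3]
    }
}
```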
The Reyes Image Rendering Architecture - Cook, Carpenter and Catmull That's Java, so now a random other paper. And it's a bit of a classic. Reyes is the algorithm behind Renderman, Pixar's rendering system. So, it's a kinda lightweight, high-level paper, but it does give you the flavour of it, micro-polygons and all. It's all rather interesting, and well worth reading.
NVIDIA Fermi compute architecture Not actually related to Reyes! So, in my last role, we were looking at GPUs for computation. I realised that I didn't really know what the architecture involved really was. Turns out, it's basically a multiprocessor system where each processor is itself a SIMD processor with multithreaded scheduling (a la Hyperthreading). Not a surprise, I guess, but fun to understand where the state of the art is/was.
Ecology of the Modern Institutional Spot FX: The EBS Market in 2011 - Anatoly Schmidt EBS is one of the online exchanges. Some people access it through GUIs, others through algorithmic trading systems. It's interesting to see how it's used, and who makes money off who, etc. Also, it's interesting to see the size of the market, the overall structure, etc. This paper focusses on liquidity in this market, but does provide some other interesting background.
The Trading Profits of High Frequency Traders - Baron, Brogaard, Kirilenko This paper tries to work out who profits from whom, in terms of traders they categorised as aggressive, medium and passive HFTs, fundamental traders, small traders, non-HFT market makers and opportunistic traders, by analysing data from S&P 500 eMini trading. It does this competently enough, not really revealing anything that will help make profits, but at least making the structure of the market that little bit clearer.
Does Academic Research Destroy Stock Return Predictability? - McLean and Pontiff This paper looks at published studies of stock market anomalies, and investigates whether the returns decrease once they're published, presumably as they're arbitraged away. It seems they do. Not a real surprise. What's impressive is how non-commercial this kind of research is. Not only does it not actually produce a convenient list of stock market anomalies you might trade off, but it doesn't look at these anomalies in terms of risk-reward tradeoffs. So, it's not clear what the limits of arbitraging away are - has it just hit a level where the risk isn't worth it, or what? Useless social science research. Gah.
Is Everything We Know About Password Stealing Wrong? - Florencio and Herley This rather random paper is a simple one-trick pony. It points out that because banks can mostly undo disputed transactions, online theft mostly consists of stealing money from mules ('work from home, processing transactions!'), and it's waaay harder to find mules than compromised accounts. While it's not deep, it provides a fairly sensible reality check on the economics of password stealing. Fun.
A Parameter-Free Hedging Algorithm - Chaudhuri, Freund and Hsu I saw a presentation where this approach was mentioned for an online-learning decision theoretic system. It made sense at the time. Then I read this paper. It describes the simple algorithm, succinctly. It proves useful properties in an appendix. What it doesn't do is motivate the underlying problem. Nor does it explain the advance over the previous (equally unexplained) state of the art. As I'm lacking all the background in this area, I'm completely missing the point of what this is about and why I should care!
The Influence of Organizational Structure on Software Quality: An Empirical Case Study - Nagappan, Murphy and Basili Conway's law says organisations produce systems that mirror their organisational/communications structure. This paper extends that idea - that defect rates can be predicted from organisational information, when related to source control information - so if code is edited by lots of different people in various unrelated teams, you know you've got problems. The 'empirical case study' in question is Windows Vista. They claim good results, even compared to other metrics that look at the code itself, but it never really convinces me. There's some correlation/causation stuff going on here - I'm not quite sure if the analysis is picking up the right stuff, but it does make a very fun formalised anecdote.
Searching for Build Debt: Experiences Managing Technical Debt at Google - Morgenthaler, Gridnev, Sauciuc and Bhansali 'Technical Debt' is such a great phrase - it's actually pretty good at getting across to management why, just because it works right now, not tidying things up is a bad thing and will bite us in the future. All projects get technical debt. Well, either that or they're not real world projects, and probably won't ever be used. :p So, how to manage technical debt is a bit of an issue, and any ideas how others do it are interesting. Sadly, this paper covers a few very specific cases, and so falls rather short of my hopes.
CP-Miner: Finding Copy-Paste and Related Bugs in Large-Scale Software Code - Li, Lu, Myagmar and Zhou Another perennial issue is code duplication. At least it is where I work - I work with mathematicians and scientists who aren't necessarily highly experienced with good software development practices. And sometimes, they're up against tight deadlines. So, copy-and-paste is a way of life. So, code bloats, bugs are only fixed in one of several locations, etc. There are tools out there to analyse this stuff, so I thought I'd read up on them. This paper starts inauspiciously, but is actually rather good. The algorithm does some neatish stuff, the analysis shows more-or-less what you'd expect (most of the copy-and-paste in the Linux kernel is in drivers), and it does a rather neat thing of spotting copy-and-paste related bugs. The only downside is that the actual CP-Miner software is only available commercially, and the open source equivalents are nothing like as good. Ho hum.
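CP-Miner itself mines frequent token subsequences with a proper data-mining algorithm; a much cruder relative of the idea (my sketch, nothing from the paper) is to hash every fixed-size window of tokens and flag windows that occur more than once as copy-paste candidates:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CloneSketch {
    // Map each window of `window` consecutive tokens to the positions
    // where it occurs, then keep only windows appearing at least twice.
    static Map<String, List<Integer>> findClones(List<String> tokens, int window) {
        Map<String, List<Integer>> seen = new HashMap<>();
        for (int i = 0; i + window <= tokens.size(); i++) {
            String key = String.join(" ", tokens.subList(i, i + window));
            seen.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
        }
        seen.values().removeIf(positions -> positions.size() < 2);
        return seen;
    }

    public static void main(String[] args) {
        // "a = b + c ;" appears twice: a copy-paste candidate.
        List<String> tokens = List.of(
            "a", "=", "b", "+", "c", ";",
            "x", "=", "y", "+", "z", ";",
            "a", "=", "b", "+", "c", ";");
        System.out.println(findClones(tokens, 6).keySet());  // prints [a = b + c ;]
    }
}
```

The real tool also canonicalises identifiers before matching, so renamed copies still hash together - which is exactly how it spots the "fixed in one copy but not the other" bugs.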