Enterprise Middleware Development

I wrote this a few weeks ago, but never stuck it up:

I'm reading Joel on Software at the moment. It's quite interesting, and I'll no doubt review it at the appropriate juncture. What I have noticed, however, is how product-focussed he is. With user interfaces and stuff. We don't really do that. While we kind of have users, we really have people who interface with our code, and in the distance we have the angry shouts of actual end users. In other words, we are doing Enterprise Middleware Development. It's that polysyllabic. We're also not coding for the long term - the codebase isn't static and requirements are always changing. Anyway, I thought I'd share some of the thoughts I've had:

  • You'll never reach consensus. If it's a big system, built by multiple groups, there's never going to be a complete top-to-bottom focus, with everyone understanding everything else. If there could be, it wouldn't be a big system, and you wouldn't need all the groups. If someone's trying to get the big picture, that may not be a bad thing, but beware of missing all the important details. It's the n-squared communications path thing. So, instead you'll go for 'abstraction', and you'll have to cope with different groups not really knowing what the others are doing, and all taking different approaches.
  • Be language/compiler compatible. So, lots of different groups, no one fully understands everyone else... a big way this will hit you is that different people will probably be using different compilers, or even different languages. Don't pretend there's a standard, since it'll just screw you even more when people finally diverge (we've got at least 4 compilers for C++ alone, let alone the number of versions of STL). This will probably mean your interface is going to be very sparse and low-level. Most interfaces tend to end up reminiscent of COM. (There's a sketch of this kind of sparse interface after this list.)
  • Yes, your code will mostly be interface. A corollary of the language independence and general loose-coupling and abstraction between layers is that most of the code you write will end up being boilerplate interface fluff around simple functionality. Yes it sucks, get over it. This does have various knock-on effects, though....
  • Minimise the number of layers. More layers is good, right? Looser coupling, more stratification, blah, blah? No! You've only got the layers because it's too big to stick in your head otherwise - it's a necessity, not something you want. More layers equals more interface, more bloat. The end user wants a complex new feature from the bottom layer? Well, you'll have to stick it in every single layer.... This is not to say that the design should not be layered. I'm saying the number of political domains should be minimised. Fewer layers gives fewer meetings. You can keep the class hierarchy just as big (although you might want to bin a few pointless proxy classes).
  • Data-driven, for great justice! When you're the filling of a systems sandwich, you want to be data-driven. It means your interfaces should change less (see below), configuration is simplified (ditto), plug-ins work better (again) and it improves version compatibility (yadda yadda). If everyone does this, you should be able to expose features from the layer below to the layer above in an automated fashion (sketch after the list). Yay.
  • Synchronising releases? Hahahaha. You want to release multiple layers of your software stack at the same time? Yes, very funny. Don't. A requirement for releasing multiple parts simultaneously is generally a sign that you've screwed something up big-time. It's just going to make debugging worse as everything has changed, you're going to compound your schedule problems, and generally you're sticking a big pile of risk together. If you're going to do this, you need to allow a lot of time for integration. Better still, don't do it.
  • You need version-safe interfaces. So, if you're not releasing the entire world at once, you need to be able to upgrade a single component at a time. Welcome to version-safe interfaces. Generally, I've nicked stuff from COM here, and it works pretty well. The outline is an extensible interface with version numbers, which can also expose completely new and different interfaces if you get bored (sketch after the list). You've got to keep supporting the old interfaces, though. Maybe you can retire several-year-old interfaces that no one uses. Sadly, it generally turns out that someone actually is still using them (see rollback, below).
  • Version compatibility, too. A big stack really sucks if releasing a feature somewhere down the bottom of the stack requires a release of all the other layers to reap the benefits. Being able to have other components successfully upgrade around yours, even if you don't know about the new features, is generally a good sign. Data-driven interfaces allow you to expose new stuff you don't even understand.
  • Plug-ins should be pretty abstract, actually. In the last bullet, we had different versions of the same software being swappable. If you have swappable components that meet an interface, do make them properly swappable, eh? Plug-ins aren't necessarily going to have 100% overlap in functionality, so a good interface design should abstract away the common ground, and provide a flexible way of accessing the uncommon features (sketch after the list). Again, a data-driven interface for the uncommon features should mean that dropping in a new plug-in and a config file gives you all the features you want. Put another way: a plug-in system that requires a new release in order to use a new plug-in isn't a plug-in system at all. It's a waste of time.
  • Make configuration brain-dead. Configuration is boring. Every layer has to do it for the other layers. Most likely, the configuration is going to be read from a config file in another layer, anyway. The lowest common denominator is text key/value pairs. Increasingly popular is XML, but you know this'll mean that every layer will pull in its own SAX/DOM/whatever library. Super-bloat, but that's what we're here for. The last interface I worked on, we did key/value pairs, but the values were typed with basic types (ints, bools, etc.). String values with parsing on the inside do seem preferable, since they allow the external config code to be much simpler, which is great, 'cos then people might bother using your code. Personally, I like strongly-typed interfaces where I can use them, but I think if you're going to have one you should still support parsing from string values, just 'cos it makes life that much simpler for your users. You're going to want nice error handling, though (sketch after the list).
  • You are in DLL hell. You may not know it yet, but you pretty certainly are. There will almost certainly be some kind of mess involved with installing multiple versions of everything simultaneously and getting them to talk with each other correctly. If you're mostly avoiding it, congratulations! With nicely-designed stuff, you still tend to get it when, e.g., you're using a hacked-up dev build, and then there's a production problem (on an old version) which you need to reproduce and resolve. Ideally, without knackering the dev build. Oh, and the bug's actually in your side-by-side install code.
  • Allow side-by-side installation. With rollback. So, you've thought hard to minimise your DLL hell. Lots of versioning information in every executable, automatically updated with every build. Source control labels generated every time you save a file. Scripts to automatically select versions. Tools to select which version of the other layers you use with your code. This of course presupposes that side-by-side installation is possible. So, make sure you really can do that. That generally means no centralised configuration, except maybe some kind of routing mechanism to plug the layers together (sketch after the list). Which should probably be programmatically overridable, but with a sensible default to prevent everyone overriding it, thus locking your layers together. Maybe you will have centralised configuration, but one which all versions can use, to make upgrading easier. Or whatever. It's going to be complex. You may well want to run multiple versions of the same software on the same machine. Perhaps in the same process, for plug-ins. And when it does go horribly wrong, you want to nuke all traces of the new version, as if it had never been there. With 100% certainty.
  • Quick deployment and comprehensive, flexible UAT are necessary. It's all very well having side-by-side installation and quick rollback, but if you can't get a fast turnaround on installing your software under something suspiciously like all real loads (not just a realistic load: all realistic loads - the situations you don't test will be the ones that screw you, and tragically they'll be the ones you don't test because you don't even know they exist - the major pain of such super-layered systems), it just means you'll always be running an old version. Testing within a layer is all well and good, but your code doesn't run in isolation. It's got to share with who-knows-what. And the kinds of bug that show up when all the code is integrated together, but not separately, are the really, really unpleasant kind. And they'll turn up at the last minute, because nobody budgeted time for integration, because your system is so wonderfully layered and loosely-coupled that it couldn't possibly go wrong after you've unit-tested each individual component. The UAT system basically needs to scale to the same size and load as production, yet have change control at a level that allows you to actually find bugs, fix them, then find the bugs caused by the fix, and not suffocate in red tape.
  • Fast release cycles. That'd be nice. Otherwise you get to the situation where each release is so painful that it actually takes longer than a normal release cycle to debug that release (after the two-day roll-out-roll-back period). You don't want to go there.
  • Premature optimisation does not mean no optimisation - AKA don't just deal with reasonable cases. If your code is being used as part of a large system, someone somewhere you don't even know will be abusing it. So don't write your code to deal with just the reasonable cases. Try to survive the nutty cases. The ones where they go outside your target range by a factor of 10. Because they will. And if you don't plan for it, it'll get ugly, fixes will be written in a hurry, and the code will get hacky. And if your release process is not speedy, everything's screwed in the meantime, and if you're lucky there'll be bugs in the fix because it was worked on quickly and released quickly and ugh. So, over-engineer everything, and suspect that developers on other teams are out to get you. Failing at this is something I am particularly susceptible to - I've treated 'premature optimisation is the root of all evil' as meaning 'don't optimise things that would impact simplicity at the expense of performing better in worst-case scenarios', rather than 'make it simple first, get good tests in place, then optimise, ruthlessly but scientifically'. Generally I get the O-notation bounds fine, but stick in a big constant factor which means it runs out of resources in a fifth of the time it should (but 5 times longer than my 'sane' test cases. D'oh).
  • Frameworks and flexibility don't go together. By 'framework', I mean a system where one group defines a bunch of interfaces for plug-ins, which everyone else implements, and then that group implements a magic black box which'll glue all the plug-ins together. If you look carefully at the spec, all the magic black box is doing is calling 3 methods on object 1, and 3 methods on object 2. However, the magic black box^W^W^Wframework code somehow takes 50,000 lines, and has a 6-month release cycle. Moreover, the plug-ins never speak to anything other than the framework, and have brittle interfaces. The upshot of all this is an inflexible architecture with a gaping bottleneck. If you don't need flexibility and quick direction changes, a framework is very Enterprise and appropriate. However, if you want quick changes, bin the framework. Just define the appropriate interfaces, and let people write transparent plumbing tools that connect the objects (sketch after the list). Given COM and a bit of gaffer tape, you can write custom 'frameworks' for your objects as required in C++, VB or Excel. In less than a day (you can probably do the same thing in .NET, too). That's flexibility.
  • Fix problems at their root. And probably elsewhere, too. When big, complicated systems blow up, there's interest in applying the minimal hack to keep going. If one component doesn't scale, hack it independently of the other components until it does? Round-robinning between inefficient servers is not the way forwards. If the short-term hack provides relief, and doesn't affect long-term stability, by all means do it. But fix the inefficiency that caused the problem, which may well be in a different component, and re-examine the architecture to see whether assumptions have changed and sub-systems or approaches need replacing. At the same time, don't just fix the root of the problem, but make sure all systems that can be affected by it can cope. If layer A now produces an nth of the load, and layer B can cope with n times the load, you've just given yourself n-squared headroom. That kind of thing.
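
Since several of the points above promised sketches, here they are, in C++ because that's what we mostly live in. First, the sparse, compiler-proof interface from the language/compiler compatibility point. All the names here are made up (a hypothetical pricing component); the shape is what matters: extern "C", an opaque handle, plain C types, and no STL anywhere near the boundary, so any compiler or language can call in.

    /* price_api.h - a deliberately sparse boundary. Only C types cross it,
       so any compiler (or language) can call in. */
    #ifdef __cplusplus
    extern "C" {
    #endif

    typedef struct pricer pricer;                /* opaque handle */

    pricer *pricer_create(const char *config);   /* NULL on failure */
    int pricer_price(pricer *p,                  /* 0 on success, else error code */
                     const char *trade_id,
                     double *price_out);
    const char *pricer_last_error(pricer *p);    /* static string, don't free */
    void pricer_destroy(pricer *p);

    #ifdef __cplusplus
    }
    #endif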
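
Next, the data-driven point. A minimal sketch, again with hypothetical names: requests are key/value data rather than function parameters, so a middle layer can forward keys it has never heard of, and the bottom layer can grow features without the middle layer needing a release.

    #include <map>
    #include <string>

    // A data-driven request: features are entries, not function parameters.
    typedef std::map<std::string, std::string> Request;

    // Middle layer: touch the bits we understand, pass the rest through.
    Request middle_layer(const Request &from_above) {
        Request down(from_above);         // unknown keys survive untouched
        down["audit.source"] = "middle";  // the bit we do understand
        return down;
    }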
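
The version-safe interface sketch, nicked from COM in spirit rather than letter (real COM uses GUIDs and reference counting; everything here is made up). Old clients keep asking for V1 and keep working; new clients probe for V2 and fall back if it isn't there.

    #include <stdexcept>

    struct IUnknownish {
        virtual ~IUnknownish() {}
        // Ask for another interface by id; false if this version lacks it.
        virtual bool QueryInterface(const char *iface_id, void **out) = 0;
    };

    struct IPricerV1 : IUnknownish {
        virtual double Price(const char *trade_id) = 0;
    };

    struct IPricerV2 : IPricerV1 {   // extends V1, never breaks it
        virtual double PriceWithSpread(const char *trade_id, double spread) = 0;
    };

    // Client: use the newest interface the component actually supports.
    double price(IUnknownish *obj, const char *trade) {
        void *p = 0;
        if (obj->QueryInterface("IPricerV2", &p))
            return static_cast<IPricerV2 *>(p)->PriceWithSpread(trade, 0.01);
        if (obj->QueryInterface("IPricerV1", &p))
            return static_cast<IPricerV1 *>(p)->Price(trade);
        throw std::runtime_error("no pricing interface supported");
    }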
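
The plug-in sketch: common ground as explicit methods, uncommon features through a generic data-driven door, so the host can drive a plug-in it has never heard of from a config file alone. Hypothetical interface, obviously.

    #include <map>
    #include <string>

    struct IPlugin {
        virtual ~IPlugin() {}
        virtual void Process(const std::string &payload) = 0;  // common ground
        // Uncommon features: the host needn't understand the keys,
        // it just shovels them through from configuration.
        virtual bool SetOption(const std::string &key, const std::string &value) = 0;
    };

    // Host side: drive any plug-in from its config, no recompile required.
    void configure(IPlugin &plugin, const std::map<std::string, std::string> &cfg) {
        for (std::map<std::string, std::string>::const_iterator it = cfg.begin();
             it != cfg.end(); ++it)
            plugin.SetOption(it->first, it->second);
    }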
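
The configuration sketch: string key/value pairs on the outside (lowest common denominator), typed accessors with noisy errors on the inside. A minimal sketch of the idea, not any particular library.

    #include <cstdlib>
    #include <map>
    #include <stdexcept>
    #include <string>

    class Config {
        std::map<std::string, std::string> values_;
    public:
        void set(const std::string &key, const std::string &value) {
            values_[key] = value;
        }
        std::string get_string(const std::string &key) const {
            std::map<std::string, std::string>::const_iterator it = values_.find(key);
            if (it == values_.end())
                throw std::runtime_error("missing config key: " + key);
            return it->second;
        }
        int get_int(const std::string &key) const {  // parse, whinge loudly on junk
            const std::string s = get_string(key);
            char *end = 0;
            const long v = std::strtol(s.c_str(), &end, 10);
            if (end == s.c_str() || *end != '\0')
                throw std::runtime_error("key " + key + " isn't an int: " + s);
            return static_cast<int>(v);
        }
        bool get_bool(const std::string &key) const {
            const std::string s = get_string(key);
            if (s == "true")  return true;
            if (s == "false") return false;
            throw std::runtime_error("key " + key + " isn't a bool: " + s);
        }
    };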
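
The routing mechanism from the side-by-side point, sketched under one big assumption: versioned install directories (the paths are hypothetical). A sensible default plugs the layers together, a per-process override points at a dev build, and rollback is just deleting a directory nothing else refers to.

    #include <map>
    #include <string>

    class Router {
        std::map<std::string, std::string> overrides_;
    public:
        // e.g. point 'pricing' at a dev build, just for this process
        void override_version(const std::string &layer, const std::string &version) {
            overrides_[layer] = version;
        }
        std::string locate(const std::string &layer) const {
            std::map<std::string, std::string>::const_iterator it =
                overrides_.find(layer);
            const std::string v = (it != overrides_.end()) ? it->second : "current";
            return "/opt/stack/" + layer + "/" + v;   // versioned install dirs
        }
    };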
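
And the anti-framework plumbing sketch. Given interfaces like those above, the whole 'framework' is a visible handful of lines anyone can read, step through, and rewrite in a day - not a 50,000-line black box.

    #include <string>

    struct ISource {
        virtual ~ISource() {}
        virtual bool Done() = 0;
        virtual std::string Next() = 0;
    };
    struct ISink {
        virtual ~ISink() {}
        virtual void Put(const std::string &item) = 0;
    };

    // That's the entire magic black box: call methods on one object,
    // call methods on the other. Replace as required.
    void pump(ISource &from, ISink &to) {
        while (!from.Done())
            to.Put(from.Next());
    }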

Posted 2006-10-21.