I wrote this a few weeks ago, but never stuck it up:
I'm reading Joel on Software at the moment. It's quite
interesting, and I'll no doubt review it at the appropriate
juncture. What I have noticed, however, is how product-focussed he
is. With user interfaces and stuff. We don't really do that. While we
kind of have users, we really have people who interface with our code,
and in the distance we have the angry shouts of actual end users. In
other words, we are doing Enterprise Middleware Development. It's that
polysyllabic. We're also not coding for the long term - the codebase
isn't static and requirements are always changing. Anyway, I thought
I'd share some of the thoughts I've had:
- You'll never reach consensus. If it's a big system, made
of multiple groups, there's never going to be a complete top to bottom
focus, with everyone understanding everything else. If you could, it
wouldn't be a big system, and you wouldn't need all the groups. If
someone's trying to get the big picture, this may not be a bad thing,
but beware missing all the important details. It's the n-squared
communications path thing. So, instead you'll go for 'abstraction',
and you'll have to cope with different groups not really knowing what
each other are doing, and all taking different approaches.
- Be language/compiler compatible. So, lots of different
groups, no one fully understands everyone else... a big way this will
hit you is that different people will probably be using different
compilers, or even different languages. Don't pretend there's a
standard, since it'll just screw you even more when people finally
diverge (we've got at least 4 compilers for C++ alone, let alone the
number of versions of STL). This will probably mean your interface is
going to be very sparse and low-level. Most interfaces tend to be
reminiscent of COM; there's a sketch of that kind of boundary below.
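Something like this is the shape I mean (names invented for
illustration, not from any real system): a pure abstract class using
only primitive types and char*, handed out by a plain C factory, so
nothing compiler- or STL-specific ever crosses the boundary.

```cpp
// Hypothetical example: an interface that different compilers and STL
// versions can share, because all they have to agree on is the vtable
// layout and the C calling convention.
class IPriceFeed {
public:
    virtual int  Subscribe(const char* instrument) = 0;                 // 0 on success
    virtual int  GetPrice(const char* instrument, double* priceOut) = 0;
    virtual void Release() = 0;                                          // callee deletes itself
protected:
    virtual ~IPriceFeed() {}                                             // never delete across the boundary
};

// The only exported symbol: a plain C factory function.
extern "C" IPriceFeed* CreatePriceFeed();
```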
- Yes, your code will mostly be interface. A corollary of the
language independence and general loose-coupling and abstraction
between layers is that most of the code you write will end up being
boilerplate interface fluff around simple functionality. Yes it sucks,
get over it. This does have various knock-on effects, though....
- Minimise the number of layers. More layers is good, right?
Looser coupling, more stratification, blah, blah? No! You've only got
the layers because it's too big to stick in your head otherwise - it's
a necessity, not something you want. More layers equals more
interface, more bloat. The end user wants a complex new feature from
the bottom layer? Well, you'll have to stick it in every single
layer.... This is not to say that the design should not be
layered. I'm saying the number of political domains should be
minimised. Fewer layers mean fewer meetings. You can keep the class
hierarchy just as big (although you might want to bin a few pointless
proxy classes).
- Data-driven, for great justice! When you're the filling of
a systems sandwich, you want to be data-driven. It means your
interfaces should change less (see below), configuration is simplified
(ditto), plug-ins work better (again) and it improves version
compatibility (yadda yadda). If everyone does this, you should be able
to expose features from the layer below to the layer above in an
automated fashion. Yay. (Sketch below.)
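A sketch of what that means in practice (the names are made up): the
interface takes a bag of named values rather than a method per
feature, so the middle layer can forward keys it has never heard of.

```cpp
#include <map>
#include <string>

// Hypothetical example: features travel as named values, not as new methods.
typedef std::map<std::string, std::string> ParamBag;

class IRequestHandler {
public:
    virtual int Handle(const std::string& requestType, const ParamBag& params) = 0;
    virtual ~IRequestHandler() {}
};

// The middle layer adds its own bits and passes everything else straight
// through; a new feature in the layer below needs no change here.
int ForwardQuoteRequest(IRequestHandler& lowerLayer, const ParamBag& fromAbove) {
    ParamBag params(fromAbove);
    params["middleware.source"] = "gateway";   // our own addition
    return lowerLayer.Handle("quote", params);
}
```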
- Synchronising releases? Hahahaha. You want to release
multiple layers of your software stack at the same time? Yes, very
funny. Don't. A requirement for releasing multiple parts
simultaneously is generally a sign that you've screwed something up
big-time. It's just going to make debugging worse as everything has
changed, you're going to compound your schedule problems, and
generally you're sticking a big pile of risk together. If you're going
to do this, you need to allow a lot of time for
integration. Better still, don't do it.
- You need version-safe interfaces. So, if you're not
releasing the entire world at once, you need to be able to upgrade a
single component at a time. Welcome to version-safe
interfaces. Generally, I've nicked stuff from COM here, and it works
pretty well. The outline is to have an extensible interface with
version numbers, which has the ability to expose completely new and
different interfaces if you get bored. You've got to support the old
interfaces, though. Maybe you can retire several-year-old interfaces
that no one uses. Sadly, it generally turns out that someone actually
is still using them (see rollback, below). There's a sketch of this
sort of extensible interface below.
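The outline, roughly, with invented names (this is just the shape, not
anyone's actual interface): a base interface that never changes, a
version number, and a way to ask for newer interfaces by name, in the
spirit of COM's QueryInterface.

```cpp
// Hypothetical example: new functionality arrives as new named interfaces
// that old clients simply never ask for, so old and new binaries coexist.
class IExtensible {
public:
    virtual int   GetVersion() const = 0;                       // e.g. 3
    // Returns the named interface, or null if this build predates it.
    virtual void* QueryInterface(const char* interfaceName) = 0;
    virtual void  Release() = 0;
protected:
    virtual ~IExtensible() {}
};

// A version-1 client carries on calling what it always called. A newer
// client asks politely and falls back if the answer is null:
//
//   void* p = obj->QueryInterface("IBulkPricing.2");
//   if (p) { /* use the new interface */ } else { /* old-style path */ }
```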
- Version compatibility, too. A big stack really sucks if
releasing a feature somewhere down the bottom of the stack requires a
release of all other layers to reap the benefits. Being able to have
other components successfully upgrade around yours, even if you don't
know about the new features, is generally a good sign. Data-driven
interfaces allow you to expose new stuff you don't even
understand.
- Plug-ins should be pretty abstract, actually. In the last
paragraph, we had different versions of the same software being
swappable. If you have swappable components that meet an interface, do
make them properly swappable, eh? Plug-ins aren't necessarily going to
have 100% overlap in functionality, so the design of a good interface
should abstract away common ground, and provide a flexible way of
accessing the uncommon features. Again, a data-driven interface for
the uncommon features should mean that dropping in a new plug-in and
config file should give you all the features you want. Put another
way: A plug-in system that requires a new release in order to use a
new plug-in isn't a plug-in system at all. It's a waste of
time. (There's a sketch of such an interface below.)
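The split I have in mind looks something like this (names invented):
common ground as ordinary methods, uncommon features behind a generic
option setter that a config file can drive.

```cpp
#include <string>

// Hypothetical example: a new plug-in plus a new config file should need no
// new release of the host.
class IMarketDataPlugin {
public:
    // Common ground every plug-in provides.
    virtual int Connect(const std::string& endpoint) = 0;
    virtual int Subscribe(const std::string& instrument) = 0;

    // Uncommon features: named options the host reads from config and passes
    // through without understanding them; non-zero for unknown options.
    virtual int SetOption(const std::string& name, const std::string& value) = 0;

    virtual ~IMarketDataPlugin() {}
};
```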
- Make configuration brain-dead. Configuration is
boring. Every layer has to do it for the other layers. Most likely,
the configuration is going to be read from a config file in another
layer, anyway. Lowest common denominator is text key/value
pairs. Increasingly popular is XML, but you know this'll mean that
every layer will pull in its own SAX/DOM/whatever
library. Super-bloat, but that's what we're here for. Last interface I
worked on, we did key/value pairs, but the values were typed with
basic types (ints, bools, etc.). String values with parsing on the
inside do seem preferable, since they allow the external config code to
be much simpler, which is great, 'cos then people might bother using
your code. Personally, I like strongly-typed interfaces where I can
use them, but I think if you're going to have one you should still
support parsing from string values, just 'cos it makes life that much
simpler for your users. You're going to want nice error handling,
though. (Sketch below.)
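Something along these lines (again, invented names): typed setters for
callers who have real types, plus a parse-from-string fall-back with
explicit error reporting so a dumb text config in another layer can
still drive it.

```cpp
#include <string>

// Hypothetical example: strongly-typed where the caller can manage it, with
// a string-parsing fall-back for "timeout_ms=250" style config files, and
// errors handed back rather than swallowed.
class IConfig {
public:
    virtual bool SetInt(const std::string& key, int value) = 0;
    virtual bool SetBool(const std::string& key, bool value) = 0;

    // Lowest common denominator: parse the value, report what went wrong.
    virtual bool SetFromString(const std::string& key, const std::string& value,
                               std::string* errorOut) = 0;
    virtual ~IConfig() {}
};
```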
- You are in DLL hell. You may not know it yet, but you
pretty certainly are. There will almost certainly be some kind of mess
involved with installing multiple versions of everything
simultaneously and getting them to talk with each other correctly. If
you're mostly avoiding it, congratulations! With nicely-designed
stuff, you still tend to get it when, e.g. you're using a hacked-up
dev build, and then there's a production problem you need to resolve
(on an old version) which you need to reproduce. Ideally, without
knackering the dev build. Oh, and the bug's actually in your
side-by-side install code.
- Allow side-by-side installation. With rollback. So, you've
thought hard about minimising your DLL hell. Lots of versioning information
in every executable, automatically updated with every build. Source
control labels generated every time you save a file. Scripts to
automatically select versions. Tools to select which version of the
other layers you use with your code. This of course presupposes the
ability to install side-by-side. So, make sure you really can do
that. That generally means no centralised configuration, except maybe
some kind of routing mechanism to plug the layers together. Which
should probably be programmatically overridable, but with a sensible
default to prevent everyone overriding it, thus locking your layers
together. Maybe you will have centralised configuration, but one which
all versions can use, to make upgrading easier. Or whatever. It's
going to be complex. You may well want to run multiple versions of the
same software on the same machine. Perhaps in the same process for
plug-ins. And when it does go horribly wrong, you want to nuke all
traces of the new version, as if it had never been there. With 100%
certainty. (There's a sketch of the routing idea below.)
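The routing idea, sketched with made-up names and paths: a
programmatic override wins (for dev builds and for reproducing old
production versions), everything else gets the shared, sensible
default, so the versions stay decoupled.

```cpp
#include <map>
#include <string>

// Hypothetical example of a routing mechanism for plugging layers together.
class LayerRouter {
public:
    std::string Resolve(const std::string& layerName) const {
        std::map<std::string, std::string>::const_iterator it =
            overrides_.find(layerName);
        if (it != overrides_.end())
            return it->second;                 // e.g. a hacked-up dev build
        return SharedDefault(layerName);       // what everyone else gets
    }

    void Override(const std::string& layerName, const std::string& location) {
        overrides_[layerName] = location;
    }

private:
    static std::string SharedDefault(const std::string& layerName) {
        // Stub: in reality this would read the shared routing configuration.
        return "/opt/" + layerName + "/current";
    }

    std::map<std::string, std::string> overrides_;
};
```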
- Quick deployment and comprehensive, flexible UAT are
necessary. It's all very well having side-by-side installation
and quick rollback, but if you can't get a fast turnaround on
installing your software on something suspiciously like all real loads
(not a realistic load. All realistic loads - the situations you
don't test will be the ones that screw you. Tragically, these are
going to be the ones you don't test because you don't even know they
exist - the major pain of such super-layered systems), it just means
you'll always be running an old version. Testing within a layer is all
well and good, but your code doesn't run in isolation. It's got to
share with who-knows-what. And the kinds of bug that show up when all
the code is integrated together, but not separately, are the really,
really unpleasant kind. And they'll turn up at the last minute
because nobody budgeted time for integration because your system is so
wonderfully layered and loosely-coupled that it couldn't possibly go
wrong after you've unit-tested each individual component. The UAT
system basically needs to scale to the same size and load as
production, yet have change control at a level that allows you to
actually find bugs, fix them, and then find the bugs caused by the
fix, and then not suffocate in red tape.
- Fast release cycles. That'd be nice. Otherwise you get to
the situation where each release is so painful that it actually takes
longer than a normal release cycle to debug that release (after the
two day roll-out-roll-back period). You don't want to go there.
- Premature optimisation does not mean no optimisation - AKA
don't just deal with reasonable cases. If your code is being used
as part of a large system, someone somewhere you don't even know will
be abusing it. So don't write your code to just deal with the
reasonable cases. Try to survive in the nutty cases. The ones where
they go outside your target range by a factor of 10. Because they
will. And if you don't plan for it, it'll get ugly, and fixes will be
written in a hurry, and the code will get hacky. And if your release
process is not speedy, everything's screwed in the meantime, and if
you're lucky there'll be bugs in the fix because it was worked on
quickly and released quickly and ugh. So, over-engineer everything,
and suspect that developers on other teams are out to get you. Failing
at this is something I am particularly susceptible to - I've treated
'premature optimisation is the root of all evil' as meaning 'don't
optimise things that would impact simplicity at the expense of being
able to perform better in worst-case scenarios', rather than 'make it
simple first, get good tests in place, then optimise, ruthlessly but
scientifically'. Generally I get the O-notation bounds fine, but stick
in a big constant factor which means it runs out of resources in a
fifth of the time it should (but 5 times longer than my 'sane' test
cases. D'oh).
- Frameworks and flexibility don't go together. By
'framework', I mean a system where one group defines a bunch of
interfaces for plug-ins, which everyone else implements, and then that
group implements a magic black box which'll glue all the plug-ins
together. If you look carefully at the spec, all the magic black box
is doing is calling 3 methods on object 1, and 3 methods on object
2. However, the magic black box^W^W^Wframework code somehow takes
50,000 lines, and has a 6 month release cycle. Moreover, the plug-ins
never speak to anything other than the framework, and have brittle
interfaces. The upshot of all this is an inflexible architecture with
a gaping bottleneck. If you don't need flexibility and quick direction
changes, a framework is very Enterprise and appropriate. However, if
you want quick changes, bin the framework. Just define the appropriate
interfaces, and let people write transparent plumbing tools that
connect the objects. Given COM and a bit of gaffer tape, you can write
custom 'frameworks' for your objects as required in C++, VB or
Excel. In less than a day (you can probably do the same thing in .NET,
too). That's flexibility. (Sketch below.)
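To show how little the 'framework' is actually doing, here's the
transparent-plumbing version with toy interfaces (all names invented):
the glue is a few readable lines anyone can rewrite in a day, not a
50,000-line black box.

```cpp
// Hypothetical example: if the magic is really just calling a couple of
// methods on each object, custom plumbing is this big.
class ISource {
public:
    virtual bool Next(double* valueOut) = 0;   // false when exhausted
    virtual ~ISource() {}
};

class ISink {
public:
    virtual void Put(double value) = 0;
    virtual ~ISink() {}
};

// The entire 'framework'.
void Pump(ISource& source, ISink& sink) {
    double value;
    while (source.Next(&value))
        sink.Put(value);
}
```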
- Fix problems at their root. And probably elsewhere, too.
When big, complicated systems blow up, there's interest in applying
the minimal hack to keep going. If one component doesn't scale, hack
it independently of the other components until it does?
Round-robinning between inefficient servers is not the way
forwards. If the short-term hack provides relief, and doesn't affect
long-term stability, by all means do it. But fix the inefficiency that
caused the problem, which may well be in a different component, and
re-examine the architecture to see if assumptions have changed and
sub-systems or approaches must be replaced. At the same time, don't
just fix the root of the problem, but make sure all systems that can
be affected by it can cope. If layer A now produces an nth of the
load, and layer B can cope with n times the load, you've just given
yourself n-squared headroom. That kind of thing.
Posted 2006-10-21.