On configs and general-purpose languages

Sometimes you'll read something that's clear, coherent, thoughtful and utterly at odds with your experience. That's just happened to me. In particular, it's this article on using a general-purpose programming language to specify the structure of your system. Well, "specify the structure of your system" is probably incorrect, since there's no hard line between the bit about system shape and the operations you perform, but that's roughly the gist? Anyway, the idea is to use general-purpose code to configure production.

As usual, bear in mind that I'm arguing against what I read, which may not be what was written. Interpretation is fun.

My TL;DR is that every time I've seen a general-purpose language used for something that doesn't need it, at scale it has turned into the kind of horror that has you Googling "The Office no meme". The freedom gets abused. I'm going to spend the rest of this post trying to add some subtlety to that viewpoint...

My Fun With DSLs (TM)

First, let's introduce DSLs...

Domain-Specific Languages (DSLs) are languages cooked up for a particular purpose. They should look and feel like a general-purpose programming language, but they're specialised for a particular task.

A DSL can be Turing-complete (set up so that any program you want to write can be written in it, at the cost that you cannot, in general, tell whether a given DSL program will terminate). For most tasks, though, you don't need that power and it's a pain to work with, so you'll usually make your DSL not Turing-complete. You can still have conditionals and loops, for example, but the loops must be bounded (roughly primitive-recursive-level power).
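To make "bounded loops" concrete, here's a minimal sketch in Go of a hypothetical toy expression DSL. All the names (`Expr`, `Lit`, `Add`, `If`, `Repeat`, `Acc`, `Eval`) are made up for illustration. The key restriction is that `Repeat` takes its iteration count as a literal in the program text, and there's no recursion or unbounded loop construct, so `Eval` always terminates.

```go
package main

import "fmt"

// Expr is a node in a hypothetical, deliberately tiny DSL. The marker
// method keeps the set of node types closed to this package.
type Expr interface{ isExpr() }

// Lit is an integer literal.
type Lit struct{ N int }

// Acc refers to the current accumulator inside a Repeat body.
type Acc struct{}

// Add sums two sub-expressions.
type Add struct{ L, R Expr }

// If evaluates Then when Cond is non-zero, otherwise Else.
type If struct{ Cond, Then, Else Expr }

// Repeat starts from Init and applies Body Times times. Times is a
// literal in the program, so the loop bound is known before evaluation.
type Repeat struct {
	Times int
	Init  Expr
	Body  Expr
}

func (Lit) isExpr()    {}
func (Acc) isExpr()    {}
func (Add) isExpr()    {}
func (If) isExpr()     {}
func (Repeat) isExpr() {}

// Eval interprets a program. Every case either returns directly or
// recurses into strictly smaller sub-trees, and Repeat runs a fixed number
// of iterations, so evaluation always terminates.
func Eval(e Expr, acc int) int {
	switch e := e.(type) {
	case Lit:
		return e.N
	case Acc:
		return acc
	case Add:
		return Eval(e.L, acc) + Eval(e.R, acc)
	case If:
		if Eval(e.Cond, acc) != 0 {
			return Eval(e.Then, acc)
		}
		return Eval(e.Else, acc)
	case Repeat:
		v := Eval(e.Init, acc)
		for i := 0; i < e.Times; i++ {
			v = Eval(e.Body, v)
		}
		return v
	default:
		panic(fmt.Sprintf("unknown node %T", e))
	}
}

func main() {
	// Start at 1 and double three times: 1 -> 2 -> 4 -> 8.
	prog := Repeat{Times: 3, Init: Lit{1}, Body: Add{Acc{}, Acc{}}}
	fmt.Println(Eval(prog, 0)) // prints 8
}
```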

I've seen DSLs used to great effect throughout my career. I spent a bunch of time in banking, and used them for specifying exotic equity derivative payouts and real-time trading algorithms (which are extremely different uses, even if the words mean nothing to you), as well as for configuring infrastructure. My PhD thesis was also based on a DSL, but I gotta say, looking back, it was more an exercise in how not to do it!

In general, I've found DSLs to work really well. I've found that when people break the box open and shovel in general-purpose language behaviour, things go really badly.

Note that I'm not against using general-purpose languages to write your DSL code in. Embedded DSLs, where you write your DSL within the syntax of another language, are a quick and easy way to bootstrap a DSL, and can give you a lot of mileage before you hit any problems. The important thing, though, is to be clear about the DSL boundaries, and never to let full general-purpose language functionality leak into the DSL.
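Here's a minimal sketch of what an embedded DSL with a clear boundary might look like in Go, using the functional-options pattern. `Service`, `Option`, `NewService`, `Replicas` and `InRegions` are hypothetical names. Note that Go itself can't stop someone writing an `Option` that makes a network call: the boundary (options only transform the `Service` value) is a convention that has to be enforced by code review or linting.

```go
package main

import "fmt"

// Service is the value the hypothetical DSL builds. It is plain data.
type Service struct {
	Name     string
	Replicas int
	Regions  []string
}

// Option is the only sanctioned way for DSL code to change a Service.
// By convention, an Option is a pure transformation: value in, value out.
type Option func(Service) Service

// Replicas sets the replica count.
func Replicas(n int) Option {
	return func(s Service) Service {
		s.Replicas = n
		return s
	}
}

// InRegions sets the regions, copying the slice so later changes to the
// caller's slice can't leak into the built value.
func InRegions(regions ...string) Option {
	return func(s Service) Service {
		s.Regions = append([]string(nil), regions...)
		return s
	}
}

// NewService is the DSL entry point: start from defaults, apply options.
func NewService(name string, opts ...Option) Service {
	s := Service{Name: name, Replicas: 1}
	for _, opt := range opts {
		s = opt(s)
	}
	return s
}

func main() {
	svc := NewService("web", Replicas(3), InRegions("eu", "us"))
	fmt.Printf("%+v\n", svc) // {Name:web Replicas:3 Regions:[eu us]}
}
```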

Terminology

The linked article uses a rather personal naming system. The ideas embedded in it seem to closely, but not necessarily precisely, align with concepts I have, so rather than risk confusing ideas, I'm going to use my own personal naming scheme, and suggest alignments. My stack-of-abstractions looks like this:

  1. Data: Roughly akin to "config", it's a data structure in some form, not too bothered about the details. It represents a "fully-expanded" view of the world.
  2. Domain-specific language: See above! A constrained language, contains some features of a programming language. Roughly aligns with "code", but not necessarily Turing-powerful.
  3. General-purpose programming languages: Go and stuff. :) Full Turing power, you can put whatever you like in there. Aligns with "software".

Why DSLs are awesome

Roughly, the article is "code sucks, use software", and my response is "general-purpose sucks, use DSLs". So, let's put forward my position...

The reason I like DSLs over general-purpose languages can be summarised as "If you give people a general-purpose language, they will use its full power, and this will screw you over." Breaking this down, we have:

  1. Pure functional behaviour: The DSL can be constrained to have extremely well-defined, consistent behaviour through well-known interfaces. Your general-purpose code can just go off and do whatever it likes. Good luck avoiding surprises in large systems, because people will make use of that.
  2. Separation of "what" and "how": Since the DSL is constrained, you are forced to specify the "what" in the DSL, and all the details of "how" go into the DSL implementation. You're forced to have some structure to your system.
  3. Analysability: This enforced structure simplifies analysis. A decent DSL is easy to statically analyse. Since programs are known to terminate, most "analyses" consist of "running" the program under a particular interpretation and collecting the results, and you can be confident that it won't hit some general-purpose code that takes down prod when all you wanted was to calculate how many machines this config would require.
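A minimal sketch of "analysis by interpretation", assuming a hypothetical DSL whose programs are plain trees of `Job`, `PerRegion` and `All` nodes (all names made up). `CountMachines` and `CountJobs` are two independent interpretations of the same program. Neither knows anything about how jobs would actually be deployed, which is the "what" vs. "how" split in miniature.

```go
package main

import "fmt"

// Node is a hypothetical DSL node; the set of node types is closed.
type Node interface{ isNode() }

// Job asks for Replicas copies of a task, each needing Machines machines.
type Job struct {
	Name     string
	Replicas int
	Machines int
}

// PerRegion instantiates Body once in each listed region.
type PerRegion struct {
	Regions []string
	Body    Node
}

// All groups several nodes together.
type All []Node

func (Job) isNode()       {}
func (PerRegion) isNode() {}
func (All) isNode()       {}

// CountMachines interprets a program as "how many machines would this
// need?". It only walks the tree; nothing is deployed or contacted.
func CountMachines(n Node) int {
	switch n := n.(type) {
	case Job:
		return n.Replicas * n.Machines
	case PerRegion:
		return len(n.Regions) * CountMachines(n.Body)
	case All:
		total := 0
		for _, c := range n {
			total += CountMachines(c)
		}
		return total
	default:
		panic(fmt.Sprintf("unknown node %T", n))
	}
}

// CountJobs is a second interpretation: how many job instances exist.
func CountJobs(n Node) int {
	switch n := n.(type) {
	case Job:
		return 1
	case PerRegion:
		return len(n.Regions) * CountJobs(n.Body)
	case All:
		total := 0
		for _, c := range n {
			total += CountJobs(c)
		}
		return total
	default:
		panic(fmt.Sprintf("unknown node %T", n))
	}
}

func main() {
	prog := All{
		PerRegion{Regions: []string{"eu", "us"}, Body: Job{Name: "frontend", Replicas: 3, Machines: 2}},
		Job{Name: "batch", Replicas: 1, Machines: 10},
	}
	// 2 regions * (3 * 2) + 1 * 10 = 22 machines; 2 + 1 = 3 job instances.
	fmt.Println(CountMachines(prog), CountJobs(prog)) // prints 22 3
}
```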

In practice, the main "analysis" you run on the DSL is "expand out a fully-unfolded data structure/config", but the DSL gives you the option to do better than that. And I always want to be able to see the fully-unfolded config, exactly as it will be used downstream, without actually performing any operations; that's an absolute baseline for observability. A badly-done system built on a general-purpose programming language won't give you that.
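Here's a sketch of that fully-unfolded view, assuming a hypothetical `Template` with per-region overrides as the DSL-level description (`Template`, `Deployment` and `Expand` are illustrative names). `Expand` is pure, so you can print its JSON output, the exact thing a downstream actuator would consume, without applying anything.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Template is a hypothetical DSL-level description: one service,
// replicated across regions, with optional per-region replica overrides.
type Template struct {
	Service   string
	Regions   []string
	Replicas  int
	Overrides map[string]int // region -> replica count
}

// Deployment is one entry of the fully-unfolded config: no defaults,
// no loops, nothing left to evaluate.
type Deployment struct {
	Service  string `json:"service"`
	Region   string `json:"region"`
	Replicas int    `json:"replicas"`
}

// Expand unfolds a Template into concrete Deployments, in region order.
// Its output depends only on its input, and nothing is actuated.
func Expand(t Template) []Deployment {
	out := make([]Deployment, 0, len(t.Regions))
	for _, r := range t.Regions {
		n := t.Replicas
		if o, ok := t.Overrides[r]; ok { // reading a nil map is safe
			n = o
		}
		out = append(out, Deployment{Service: t.Service, Region: r, Replicas: n})
	}
	return out
}

func main() {
	t := Template{
		Service:   "frontend",
		Regions:   []string{"eu", "us"},
		Replicas:  3,
		Overrides: map[string]int{"us": 5},
	}
	// Show exactly what would be handed downstream, without applying it.
	b, err := json.MarshalIndent(Expand(t), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```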

DSLs are great, code sucks

Much of what I read in the article rails against the current state of tooling. All the YAML. Templating. Crappy tooling. Code as config/data with a thin coat of paint.

I think I've been really lucky, in that my career has kept me away from the "state of the art" of open source infra tools, and has let me play around with decent DSLs, supported by the people who use them. I've not had that pain.

If you have crappy "code" systems, I can see the temptation to move to "software" systems. I've seen people give in to that temptation, and as things scale up, I've seen them regret it. General-purpose languages are harder to reason about than DSLs, and we don't need more complexity. DSLs are an investment in keeping things simple, because if you're operating large-scale distributed systems, complexity is a complete reliability killer.

My take is that if infrastructure as code sucks, we should make it not suck, rather than move to general-purpose languages. I get the impression that a lot of the tooling is built by infra people without an understanding of programming language design, so we end up with bad tools. The proposal is to jump to general-purpose languages to steal their lessons and tools, but that's not the lesson I want to take away. Which brings me on to...

The infra engineer inferiority complex

I found the article's framing of "software engineers" vs. "infrastructure engineers" very revealing. I could be reading way too much into it, as someone with a strong software background who's now spent some time in infra, but it feels like the infra inferiority complex.

The message I'm getting is "Infra engineers write code in these crappy tools, while proper software engineers write in proper languages like Go. Infra engineers should be like software engineers." I wholeheartedly agree, but draw a different conclusion.

Proper Software Engineering throws in tonnes of abstractions. It builds DSLs. Look at Unix: You want to manipulate text? C sucks for that, let's build awk and sed and stuff. The software engineer approach is DSLs, but with real ownership - tools built by those who use the tools. They treat it like a real language, and build out the tooling so that it's not a joke to use. They don't just create a better string manipulation library for C.

I think the issue is ownership. Right now, the "code" end belongs to infra engineers, while the tooling that actuates it is "software", usually part of some OSS project. The two are viewed as somewhat distinct, and in the end the infra people are just writing templated config.

This is not how it should be. "Infra engineers" should have the same software engineering skills as "software engineers", and hence engineer software similarly. They should own the system end to end - both the config/what and the actuation/how parts - but still respect abstractions. "Software engineers" do not (ahem, should not) mix config into their source tree, even if they can. Infra engineers should be just as good at abstracting.

Back to Embedded DSLs

Is the article really suggesting something I totally hate? I can't be sure, but I think so. The "func MyApplication" example code is pretty much an embedded DSL: a data structure is built within a general-purpose language, but it can be clearly distinguished from the embedding language. This does not scare me.

On the other hand, the linked pull request is clearly structured in a way that mixes deployment behaviour with parameters, breaking all the abstractions I'm such a fan of. So... yeah, I think I don't like this.

What now?

What now? I'm not sure. I'm sympathetic to existing tooling sucking badly. I really don't like the proposed solution. I've had a go at explaining why I dislike it, though I have no idea whether I've got that across. I don't have the time or energy to propose a better alternative, and TBH I don't know the OSS tools anything like well enough to make a sensible suggestion. All I can say is "if this is a proposed improvement, I think the starting point must be really bad."

Or... maybe I lie. I do have strong opinions of my own. I like declarative systems, with reconciliation-based actuation. I like DSLs for those declarative systems, static-analysis-based tooling to leverage those descriptions, and strong end-to-end ownership of the full system by infrastructure engineers so that infrastructure management is truly treated as software engineering. Alas, I still lack the time and energy to act on those opinions.
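For the curious, here's a minimal sketch of the general reconciliation pattern, assuming the declared state is just a map from hypothetical service names to replica counts. `Action`, `Plan` and `Reconcile` are illustrative names, not any real tool's API. `Plan` is a pure diff between desired and observed state, so the plan can be reviewed before anything happens; `Reconcile` is a single observe/plan/apply pass, which a real system would run repeatedly so that drift gets corrected.

```go
package main

import (
	"fmt"
	"sort"
)

// Action is one step a hypothetical actuator would take.
type Action struct {
	Service string
	From    int
	To      int
}

// Plan compares desired and observed replica counts and returns the
// changes needed to converge, sorted by service name. Missing entries
// count as 0. It is pure, so the plan can be printed and reviewed.
func Plan(desired, actual map[string]int) []Action {
	seen := make(map[string]bool)
	for n := range desired {
		seen[n] = true
	}
	for n := range actual {
		seen[n] = true
	}
	names := make([]string, 0, len(seen))
	for n := range seen {
		names = append(names, n)
	}
	sort.Strings(names)

	var actions []Action
	for _, n := range names {
		if want, have := desired[n], actual[n]; want != have {
			actions = append(actions, Action{Service: n, From: have, To: want})
		}
	}
	return actions
}

// Reconcile runs one pass: observe the world, plan, then apply each step.
func Reconcile(desired map[string]int, observe func() map[string]int, apply func(Action)) {
	for _, a := range Plan(desired, observe()) {
		apply(a)
	}
}

func main() {
	desired := map[string]int{"frontend": 3, "batch": 1}
	actual := map[string]int{"frontend": 2, "legacy": 4}
	Reconcile(desired,
		func() map[string]int { return actual },
		func(a Action) {
			fmt.Printf("%s: %d -> %d\n", a.Service, a.From, a.To)
			actual[a.Service] = a.To // stand-in for a real actuator
		})
	// prints:
	// batch: 0 -> 1
	// frontend: 2 -> 3
	// legacy: 4 -> 0
}
```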

In any case, that article really got me thinking about config for the first time in a while.

Posted 2021-06-14.