Syntax trees, APIs, templates, and languages

My old pal Tim St. Clair recently lamented the state of configuration languages on Twitter:

The world desperately needs a configuration language that isn’t YAML or JSON. Something that is “expressive enough” yet “simple enough”. That balance is super hard.

Since thinking about the semantics of configuration systems and languages is a longstanding hobby of mine, I chimed in:

Part of the problem is endemic confusion over what constitutes a “language” — at best, YAML and JSON are human-readable serialization formats for abstract syntax trees.

So many extant DSL designs clearly start and end with AST serialization, never getting to semantics or UX.

Which led to a great question from Justin Taylor-Berrick:

Is the problem, then, that YAML and JSON are poor ASTs? The configuration languages all build up from data and add their features, rather than us agreeing on an AST between human friendly languages and data? Maybe we are missing an abstraction layer.

I can’t answer Justin properly in a tweet, so I’m doing it here.

I really like the way Justin notes that starting with a serialization format and adding some ad hoc operations isn’t ideal. The result is “languages” that are adequate at representing the data they’re supposed to manipulate but probably aren’t as good as they could be at expressing the operations we’d like to perform on that data, to say nothing of establishing any guarantees about the transformations we’re performing or providing a predictable and consistent developer experience.¹

In the case of Kubernetes in particular, “configuration languages” generally don’t do much beyond specifying API objects (sometimes parameterized at configuration time) that should or should not exist. This is actually not as bad as it sounds, given how Kubernetes works! (In general, you could do worse than using an established API as a starting point for a DSL.) But it would be better to provide higher-level query, coordination, and templating operations over these objects, and that’s where the complexity comes in.

A serious problem with the approach of starting with API objects and adding templating is that templating languages are extraordinarily difficult to get right. Existing templating facilities are often full of usability pitfalls because they have grown organically to satisfy certain applications. This sort of language evolution leads to many corner cases and inconsistencies. (Consider, for example, whether anyone could reliably document the semantics of PHP, or whether you’d be more inclined to trust a page-long Scheme program or a page-long Perl script.)
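To make one such pitfall concrete, here is a minimal sketch in Python. It assumes a made-up JSON manifest and uses the standard library's `string.Template` as a stand-in for a view-layer templating engine; the field names and helper functions are hypothetical and not taken from any real tool. Splicing values into serialized text can yield a broken document or a silently different tree, while building the tree first and serializing it afterwards cannot.

```python
import json
from string import Template

# Hypothetical manifest; the field names are illustrative only.
TEMPLATE = Template('{"kind": "ConfigMap", "data": {"greeting": "$greeting"}}')


def render_with_text_template(greeting: str) -> str:
    """Splice the value into serialized text, as a text templating engine would."""
    return TEMPLATE.substitute(greeting=greeting)


def render_structurally(greeting: str) -> str:
    """Build the tree first, then serialize it, so escaping is the serializer's job."""
    return json.dumps({"kind": "ConfigMap", "data": {"greeting": greeting}})


def is_valid_json(text: str) -> bool:
    """Report whether the rendered text parses as JSON at all."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    quoted = 'say "hi"'                     # breaks the text-templated document
    sneaky = 'hello", "extra": "surprise'   # parses, but adds a key nobody asked for
    for value in (quoted, sneaky):
        print(is_valid_json(render_with_text_template(value)),
              is_valid_json(render_structurally(value)))
```

The second input is the more worrying one: the text-templated output is valid JSON, so nothing fails, but the resulting object has a different shape from the one the template author intended.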

Syntax, whether concrete or abstract, isn’t the most interesting or difficult part of DSL (or language) design, but it is the part that invites the most bikeshedding. Most designers of YAML- or JSON-based “DSLs” seem to have put some thought into what the nouns should be called and into what files should look like in their editors but not into what things should mean or do. This is great if one’s goal is to design a language that one will enjoy seeing in an editor window, to the extent that one can enjoy looking at serialized nested lists and dictionaries in an editor window, but it is less great if one’s goal is to design a language that will not confuse its users.

Amateur language designers often think about incidental features of how they like languages to look and enjoy combining these incidental features in a novel way. Starting with an AST (and using a textual serialization format) means that they don’t have to develop a lexer or parser; failing to design the language beyond the AST means that the language is essentially a way to construct API objects (whether the API is for the system being configured or for the configuration system itself). This provides little value above simply publishing the API, but can potentially introduce usability headaches if, e.g., not every feature of the API is exposed, or if the semantics of iteration or variable-expansion facilities are unclear.
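As an illustration of how variable-expansion semantics can be unclear, consider the following sketch. It assumes a hypothetical convention in which a node of the form `{"$var": name}` stands for a variable reference; the convention and both function names are invented for this example. Two plausible interpreters read the same tree and disagree about what it means, and nothing in the serialized form says which one is right.

```python
def expand_once(node, env):
    """Replace {"$var": name} nodes with env[name], leaving the substituted value as-is."""
    if isinstance(node, dict):
        if set(node) == {"$var"}:
            return env[node["$var"]]
        return {key: expand_once(value, env) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_once(item, env) for item in node]
    return node


def expand_recursively(node, env):
    """Like expand_once, but also expands references inside substituted values.

    Note: a cyclic environment would recurse forever; a real design would
    have to decide what that case means, too.
    """
    if isinstance(node, dict):
        if set(node) == {"$var"}:
            return expand_recursively(env[node["$var"]], env)
        return {key: expand_recursively(value, env) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_recursively(item, env) for item in node]
    return node


if __name__ == "__main__":
    document = {"image": {"$var": "image"}}
    env = {"image": {"$var": "default_image"}, "default_image": "nginx:1.25"}
    print(expand_once(document, env))
    print(expand_recursively(document, env))
```

Both behaviors are defensible, which is the point: the meaning of the "language" lives entirely in whichever interpreter happens to be reading the tree.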

Flexible configuration is often really multi-stage programming, and we want a way to check, document, and test our configurations in the same way we check, document, and test our programs. It would be better to approach configuration by starting with a lightweight general-purpose programming language², removing unnecessary features almost to the point of austerity, and enriching this core language with built-in functions and literals for API objects from the system to be configured.
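A rough sketch of what this could look like follows, with Python standing in for the austere core language (the approach described above would strip far more away). The `deployment`, `environments`, and `check` helpers are hypothetical, and the object is a heavily simplified, Deployment-shaped dictionary rather than a complete Kubernetes object. The point is only that configuration written as a small program can be checked and tested before anything is serialized.

```python
import json


def deployment(name, image, replicas=1):
    """Construct a simplified Deployment-shaped object (hypothetical helper)."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": dict(labels)},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def environments(image):
    """The configuration stage: compute one object per environment."""
    sizes = {"staging": 1, "production": 3}  # illustrative values
    return [deployment(f"web-{env}", image, replicas=count)
            for env, count in sizes.items()]


def check(objects):
    """Invariants we can test before any object is sent anywhere."""
    names = [obj["metadata"]["name"] for obj in objects]
    assert len(names) == len(set(names)), "duplicate object names"
    for obj in objects:
        selector = obj["spec"]["selector"]["matchLabels"]
        template_labels = obj["spec"]["template"]["metadata"]["labels"]
        assert selector == template_labels, "selector must match template labels"


if __name__ == "__main__":
    objects = environments("example/web:1.0")
    check(objects)
    # Serialization is the last step, not the starting point.
    print(json.dumps(objects, indent=2))
```

Here the serialized form is an output of the configuration program rather than the medium it is written in, so the usual tools for documenting and testing programs apply to it directly.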

¹ For (an intentionally-controversial) example: I hate XML, but no matter how much one hates XML, one has to acknowledge that the XPath, XQuery, and XSLT tooling is better for document query and manipulation than an ad hoc combination of YAML or JSON and some templating engine designed for the view layer of a web application.

² Almost certainly not a Turing-complete one.