• Syntax trees, APIs, templates, and languages

    My old pal Tim St. Clair recently lamented the state of configuration languages on Twitter:

    The world desperately needs a configuration language that isn’t YAML or JSON. Something that is “expressive enough” yet “simple enough”. That balance is super hard.

    Since thinking about the semantics of configuration systems and languages is a longstanding hobby of mine, I chimed in:

    Part of the problem is endemic confusion over what constitutes a “language” — at best, YAML and JSON are human-readable serialization formats for abstract syntax trees.

    So many extant DSL designs clearly start and end with AST serialization, never getting to semantics or UX.

    Which led to a great question from Justin Taylor-Berrick:

    Is the problem, then, that YAML and JSON are poor ASTs? The configuration languages all build up from data and add their features, rather than us agreeing on an AST between human friendly languages and data? Maybe we are missing an abstraction layer.

    I can’t answer Justin properly in a Tweet, so I’m doing it here.

    I really like the way Justin notes that starting with a serialization format and adding some ad hoc operations isn’t ideal: this results in “languages” that are adequate at representing the data they’re supposed to manipulate but probably aren’t as good at expressing the operations we’d like to perform on that data as they could be, to say nothing of establishing any guarantees about the transformations we’re performing or providing a predictable and consistent developer experience.[1]

    In the case of Kubernetes in particular, “configuration languages” generally don’t do much beyond specifying API objects (sometimes parameterized at configuration time) that should or should not exist. This is actually not as bad as it sounds, given how Kubernetes works! (In general, you could do worse than using an established API as a starting point for a DSL.) But it would be better to provide higher-level query, coordination, and templating operations over these, and that’s where the complexity comes in.

    A serious problem with the approach of starting with API objects and adding templating is that templating languages are extraordinarily difficult to get right. Existing templating facilities are often full of usability pitfalls because they have grown organically to satisfy certain applications. This sort of language evolution leads to many corner cases and inconsistencies. (Consider, for example, whether anyone could reliably document the semantics of PHP, or whether you’d be more inclined to trust a page-long Scheme program or a page-long Perl script.)
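    One common pitfall of this kind can be sketched in a few lines of Python. The example below is an illustration rather than a description of any particular tool: the template text, the `ConfigMap`-shaped object, and the function names are all hypothetical. It contrasts substituting a value into serialized text with building the tree first and serializing it afterwards.

```python
import json
from string import Template

# Hypothetical text template for a Kubernetes-style object (illustrative only).
TEXT_TEMPLATE = Template('{"kind": "ConfigMap", "data": {"greeting": "$greeting"}}')


def render_textually(greeting: str) -> str:
    """Substitute into the serialized text, as a text-oriented template engine does."""
    return TEXT_TEMPLATE.substitute(greeting=greeting)


def render_structurally(greeting: str) -> str:
    """Build the tree first and let the serializer handle quoting."""
    return json.dumps({"kind": "ConfigMap", "data": {"greeting": greeting}})


def parses(text: str) -> bool:
    """Report whether the text is a well-formed JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

    With these definitions, a value containing a double quote makes the textual rendering unparseable, and a value such as `x", "extra": "y` silently adds a key to the object. The structural rendering round-trips both values unchanged.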

    Syntax, whether concrete or abstract, isn’t the most interesting or difficult part of DSL (or language) design, but it is the part that invites the most bikeshedding. Most designers of YAML- or JSON-based “DSLs” seem to have put some thought into what the nouns should be called and into what files should look like in their editors but not into what things should mean or do. This is great if one’s goal is to design a language that one will enjoy seeing in an editor window, to the extent that one can enjoy looking at serialized nested lists and dictionaries in an editor window, but it is less great if one’s goal is to design a language that will not confuse its users.

    Amateur language designers often think about incidental features of how they like languages to look and enjoy combining these incidental features in a novel way. Starting with an AST (and using a textual serialization format) means that they don’t have to develop a lexer or parser; failing to design the language beyond the AST means that the language is essentially a way to construct API objects (whether the API is for the system being configured or for the configuration system itself). This provides little value above simply publishing the API, but can potentially introduce usability headaches if, e.g., not every feature of the API is exposed, or if the semantics of iteration or variable-expansion facilities are unclear.
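    To make the gap between an AST and a language concrete, here is a minimal sketch. The JSON document and the `add`/`mul` operators are invented for this example. The serialized text gives us nested lists for free, but nothing in it says what those lists mean until someone writes down the semantics, here as a tiny evaluator.

```python
import json

# A hypothetical "program": on its own it is just nested lists, numbers, and strings.
DOCUMENT = '["add", 1, ["mul", 2, 3]]'


def evaluate(node):
    """Give the tree a meaning: numbers are literals, lists are operator applications."""
    if isinstance(node, (int, float)):
        return node
    if not isinstance(node, list) or not node:
        raise ValueError(f"not an expression: {node!r}")
    op, *args = node
    values = [evaluate(arg) for arg in args]
    if op == "add":
        return sum(values)
    if op == "mul":
        result = 1
        for value in values:
            result *= value
        return result
    raise ValueError(f"unknown operator: {op!r}")


tree = json.loads(DOCUMENT)  # parsing comes for free with the serialization format
result = evaluate(tree)      # semantics has to be designed and written separately
```

    Everything a user could find surprising (what an unknown operator does, whether `add` accepts zero arguments) lives in `evaluate`, not in the JSON.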

    Flexible configuration is often really multi-stage programming, and we want a way to check, document, and test our configurations in the same way we check, document, and test our programs. It would be better to approach configuration by starting with a lightweight general-purpose programming language[2], removing unnecessary features almost to the point of austerity, and enriching this core language with built-in functions and literals for API objects from the system to be configured.
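    As a rough sketch of what this could feel like, the following uses Python as a stand-in for the austere core language. Python is of course neither austere nor restricted, and the `Container` and `Deployment` types, their fields, and the `web_service` helper are hypothetical, not the real Kubernetes API. The point is only that API objects become checked values, configuration-time parameters become function arguments, and serialization is the last step rather than the first.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Container:
    """Hypothetical API object: one container in a deployment."""
    name: str
    image: str


@dataclass(frozen=True)
class Deployment:
    """Hypothetical API object, validated when it is constructed."""
    name: str
    replicas: int
    containers: Tuple[Container, ...]

    def __post_init__(self):
        if self.replicas < 0:
            raise ValueError("replicas must be non-negative")
        if not self.containers:
            raise ValueError("a deployment needs at least one container")


def web_service(name: str, image: str, replicas: int = 2) -> Deployment:
    """A 'built-in function': parameterization is an ordinary, testable function."""
    return Deployment(name=name, replicas=replicas,
                      containers=(Container(name=name, image=image),))


def emit(objects: Iterable[Deployment]) -> str:
    """The second stage: serialize the checked objects for the target system."""
    return json.dumps([asdict(obj) for obj in objects], indent=2)
```

    A configuration is then an expression such as `emit([web_service("api", "example/api:1.0", replicas=3)])`, and a mistake like a negative replica count is rejected when the configuration is evaluated rather than when the target system receives it.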


    1. For an (intentionally controversial) example: I hate XML, but no matter how much one hates XML, one has to acknowledge that the XPath, XQuery, and XSLT tooling is better for document query and manipulation than an ad hoc combination of YAML or JSON and some templating engine designed for the view layer of a web application.
    2. Almost certainly not a Turing-complete one.
  • Sometimes you can’t even improve what you do measure

    As part of my Sisyphean quest to find a bicycle computer that doesn’t make me want to start tracking my rides with an abacus or programmable loom, I recently bought a new Garmin device to replace a deceased old one. The new device is not yet enraging me; it has everything I liked about my old one, plus more, and it also works.

    Part of the “more” is three new mountain bike metrics. The most obvious is a screen that pops up and beeps at you whenever you momentarily leave the ground, saying “Great Jump!” and telling you how long you were in the air, how far you traveled, and how fast you were going at takeoff. Given my level of mountain bike proficiency, I read “Great Jump!” as sarcastic every single time.

    The other new metrics are called “Grit” and “Flow.” I hadn’t read the manual before my first ride, but I remembered reading that “Grit” was a measure of the difficulty of the route and “Flow” was a measure of how well you maintained speed while descending.

    I hit a few loops of smooth singletrack at the ski club and tried to increase my “Flow” score with each lap. No matter how much I focused on improving my “Flow” score — staying loose, breathing slowly, pretending that every instant of pressure on my brakes came from a finite budget — I couldn’t get my “Flow” above five or so for a given lap. I didn’t know what five flow meant or if it was any good, but I was confident I could do better.

    When I got home, I read Garmin’s description of “Flow,” which told me two things:

    • a “Flow” score between one and twenty is not bad, but
    • a “Flow” score between zero and one is ideal

    Optimizing for the opposite of the right metric seems worse than the more common problem of optimizing for the wrong metric altogether; it’s probably worth looking out for in general.

  • Operas summarized briefly

    Carmen (Bizet, 1875): Man disregards advice from woman, with grave consequences.
    Orfeo ed Euridice (Gluck, 1762): Man disregards advice from deity, with grave consequences.
    Der fliegende Holländer (Wagner, 1843): Woman disregards advice from ghost pirate, with grave consequences.

    Così fan tutte (Mozart, 1790): The composer does not like his wife.
    Fidelio (Beethoven, 1805–14): The composer likes the idea of having a wife, but would settle for a free and just society.
    Tristan und Isolde (Wagner, 1865): The composer likes other people’s wives.

    La Traviata (Verdi, 1853): All you need is love.
    Pagliacci (Leoncavallo, 1892): The tears of a clown, when everyone’s around.
    Die Zauberflöte (Mozart, 1791): You say you want a revolution, and your bird can sing.

    Rienzi (Wagner, 1840): The composer would like to make some money in Paris.
    Tannhäuser (Wagner, 1845): The composer demonstrates that the Twisted Sister/Tipper Gore feud would have had a much larger body count had it occurred in medieval Germany.
    Die Meistersinger von Nürnberg (Wagner, 1868): The composer thinks you should know that he read your review and it still stings a bit.
    Parsifal (Wagner, 1882): The composer spent a lot of late nights thinking aloud in his dorm room the semester he took Intro to World Religions.

    Falstaff (Verdi, 1893): It might actually be possible to improve on Shakespeare.
    Roméo et Juliette (Gounod, 1867): But not like this.
    Das Liebesverbot (Wagner, 1836): And absolutely not like this.

    Le nozze di Figaro (Mozart, 1786): Rich people are terrible.
    Madama Butterfly (Puccini, 1904): Americans are terrible.
    La Bohème (Puccini, 1896): Infectious diseases are terrible.
    Das Rheingold (Wagner, 1869): Teutonic deities are terrible.
    Guillaume Tell (Rossini, 1829): Habsburgs are terrible.
    Die Fledermaus (J. Strauss, 1874): Operetta is terrible.

  • Tracing the development of presentation style

    For nearly all of my adult life, a large part of my job has involved communicating technical concepts. I like to imagine that I’ve developed a consistent voice, style, and visual language, and I’m also inclined to imagine that it has taken me a long time to get here. I recently needed to look over an old deck and was surprised to note that many elements of my current (and presumably at least somewhat refined) style were present in a talk I gave over a decade ago as a graduate student.

    Here’s the old talk; for comparison, here’s a talk I gave this January. While I’m still improving at giving talks and designing visual explanations, I guess things haven’t changed as radically as I might have assumed.

  • The delights of cookbooks

    I was reminded today that the delights of good cookbooks subsist not merely in explaining how to prepare particular dishes of interest but in introducing wonderful things that one didn’t even know were of interest.

    This is true of the best technical writing structured around a cookbook metaphor, as well.
