Anti-unification in practice

Finding the shape of a fleet's configs

So far this is all nice in theory: parse some configs, extract their structure, and operate on fleets. Whether the structure extraction holds up in practice is what makes or breaks the idea, and thankfully it seems to work quite well.

Given a set of terms (here, parsed TOML or YAML values), anti-unification finds the most specific term that generalises all of them, using fresh variables where they disagree. For trees, this amounts to structural recursion: walk the trees in parallel, and where they agree copy the literal, where they disagree emit a hole. It's thresholdless (meaning there are no algorithm parameters to tune by hand) and has been a well-known tool since Plotkin and Reynolds each wrote about it in 1970.
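As a rough sketch of that recursion (not the tool's actual code), here is a minimal version over plain dicts, lists and scalars. The names Hole and anti_unify are mine, and it makes two simplifying assumptions: mapping keys are strings, and every disagreement gets its own fresh hole, whereas textbook anti-unification would reuse one variable for a repeated disagreement.

```python
from dataclasses import dataclass
from itertools import count
from typing import Any


@dataclass(frozen=True)
class Hole:
    """Placeholder for a position where the inputs disagree (hypothetical name)."""
    index: int


def anti_unify(terms: list[Any]) -> Any:
    """Generalise a non-empty list of parsed config values into one template."""
    fresh = count()

    def walk(values: list[Any]) -> Any:
        first = values[0]
        # All mappings with the same key set: recurse key by key.
        # Keys are sorted, which assumes they are all strings.
        if all(isinstance(v, dict) for v in values) and all(
            v.keys() == first.keys() for v in values
        ):
            return {k: walk([v[k] for v in values]) for k in sorted(first)}
        # All sequences of the same length: recurse position by position.
        if all(isinstance(v, list) for v in values) and all(
            len(v) == len(first) for v in values
        ):
            return [walk([v[i] for v in values]) for i in range(len(first))]
        # Agreement on a literal of the same type: copy it.
        if all(type(v) is type(first) and v == first for v in values):
            return first
        # Disagreement: emit a fresh hole (simplification, see above).
        return Hole(next(fresh))

    return walk(terms)


configs = [
    {"version": 2, "schedule": {"interval": "weekly"}},
    {"version": 2, "schedule": {"interval": "monthly"}},
]
template = anti_unify(configs)
# template == {"schedule": {"interval": Hole(index=0)}, "version": 2}
```

In this sketch, mappings whose key sets differ simply collapse to a hole; it does nothing special for keys that are present in only some inputs.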

On a fleet of dependabot configs, this works very cleanly as the files are structurally rigid (and I haven't been particularly adventurous with them). Across the 9 dependabot configs in my fleet it recovers exactly the template you'd write by hand:

    updates:
      - cooldown?: ⟨?0⟩
        directory: "/"
        package-ecosystem: ⟨?1⟩
        schedule:
          interval: ⟨?2⟩
    version: 2

with three holes:

the package-ecosystem (8× github-actions, 1× cargo),
the interval (6× weekly, 3× monthly),
and an optional cooldown block (present in 3, constant when present).

The ? after a key name indicates it's optional (present in some instances but not others). For those we descend into the subset where it's present rather than treating the whole subtree as opaque.
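One way to picture the optional-key step, as a sketch under my own naming (split_keys is hypothetical, and the cooldown contents below are illustrative values, not taken from my fleet): for each key seen in any mapping, record whether it is missing somewhere and collect its values from only the mappings that have it. Those collected values are what gets generalised further.

```python
from typing import Any


def split_keys(mappings: list[dict[str, Any]]) -> dict[str, tuple[bool, list[Any]]]:
    """Map each key to (is_optional, values from the mappings that contain it)."""
    result: dict[str, tuple[bool, list[Any]]] = {}
    # Union of all keys; sorted for stable output (assumes string keys).
    for key in sorted({k for m in mappings for k in m}):
        present = [m[key] for m in mappings if key in m]
        # A key is optional if at least one mapping lacks it.
        optional = len(present) < len(mappings)
        result[key] = (optional, present)
    return result


entries = [
    {"directory": "/", "cooldown": {"default-days": 7}},
    {"directory": "/"},
    {"directory": "/", "cooldown": {"default-days": 7}},
]
keys = split_keys(entries)
# keys["directory"] == (False, ["/", "/", "/"])
# keys["cooldown"]  == (True, [{"default-days": 7}, {"default-days": 7}])
```

Here cooldown would be rendered as cooldown? in the template, and because its two observed values agree, the subtree under it stays constant rather than becoming a hole.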

Repo-derived holes

Some holes aren't really variables but are functionally determined by the repo itself. project.name in pyproject.toml across a fleet of Python packages takes a distinct value per repo, and most of the time that value just is the repo name (with the conventional kebab-case ↔ snake_case allowance).

After anti-unifying, we scan each hole's observed values and check whether they match the per-repo names after PEP 503 normalisation. If every observed value does, we flag the hole as derivable. In my own fleet this catches project.name directly, and separately flags things like tool.coverage.run.source[0] and tool.isort.known_first_party[0] as also derivable from the repo name. That is the right call: each of those is the Python module path, which equals the normalised package name.
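The check itself is small. A minimal sketch, assuming each hole's observations are available as a mapping from repo name to the value seen in that repo (the function names and the example repo names are hypothetical; the regex is the normalisation given in PEP 503):

```python
import re


def normalise(name: str) -> str:
    """PEP 503 normalisation: runs of '-', '_' and '.' become one '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_derivable(observed: dict[str, object]) -> bool:
    """True if every observed value matches its repo's name after normalisation.

    `observed` maps repo name -> the value that repo has at this hole.
    An empty mapping is not treated as derivable.
    """
    return bool(observed) and all(
        isinstance(value, str) and normalise(value) == normalise(repo)
        for repo, value in observed.items()
    )


is_derivable({"my-tool": "my_tool", "Other.Pkg": "other-pkg"})  # True
is_derivable({"my-tool": "my_tool", "other-pkg": "weekly"})     # False
```

Normalising both sides is what gives the kebab-case ↔ snake_case allowance: my-tool and my_tool both normalise to my-tool.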

The value of this is that a derivable hole isn't a free parameter. It doesn't need to appear in the fleet model at all; it can be filled in from the metadata we already have when rendering or validating. This is one direction where the tool starts doing the same compression we do mentally when reading these files: the apparent variability across the fleet collapses to zero degrees of freedom once you account for the repo you're in.