Deferred Path-ification

On creating pathlib Paths from template strings (t-strings) in Python 3.14

    Out of the box, t-strings in Python 3.14 will not solve the issue of deferred (or "late-binding") paths, as discussed in part 1. But with a little tinkering I managed to make a decent proof of concept of how we can achieve it!

    For the unfamiliar: they're like f-strings, except instead of producing an interpolated string at runtime, they produce a string.templatelib.Template (see their docs).

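    This Template class is composed of a .strings tuple holding the static parts of the string and an .interpolations tuple of Interpolation objects, one per {...} slot (each carrying the value, the expression text, the conversion, and the format spec).

    It also provides access to the values of the interpolations as a tuple on the .values attribute.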

    For example, this kind of string you often see in dataset filenames:

    >>> chunk_idx, chunk_total = 3, 10
    >>> template = t"chunk_{chunk_idx:03}-of-{chunk_total:03}.parquet"
    >>> template
    Template(
        strings=('chunk_', '-of-', '.parquet'),
        interpolations=(
            Interpolation(3, 'chunk_idx', None, '03'),
            Interpolation(10, 'chunk_total', None, '03'),
        )
    )
    >>> template.values == (chunk_idx, chunk_total)
    True
    

    They are not deferred: they bind eagerly at runtime. But they do have all the moving parts we would need to infer filename patterns.
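
To make "moving parts" concrete, here is a minimal sketch of what you can do with just the strings and interpolations tuples: render the filename the way an f-string would, or turn the static parts into a glob pattern. The Interp named tuple is a hypothetical stand-in with the same four fields as string.templatelib.Interpolation, used only so the sketch also runs before Python 3.14. On 3.14 you should be able to pass template.strings and template.interpolations straight in. The helper names render and to_glob are mine, not part of any library.

```python
from typing import Any, NamedTuple, Optional


class Interp(NamedTuple):
    """Hypothetical stand-in mirroring the fields of Interpolation."""
    value: Any
    expression: str
    conversion: Optional[str]
    format_spec: str


def render(strings, interpolations):
    """Interleave static parts with formatted values (conversions are ignored here)."""
    out = [strings[0]]
    for interp, tail in zip(interpolations, strings[1:]):
        out.append(format(interp.value, interp.format_spec))
        out.append(tail)
    return "".join(out)


def to_glob(strings):
    """Replace every interpolation slot with a wildcard."""
    return "*".join(strings)


strings = ("chunk_", "-of-", ".parquet")
interpolations = (
    Interp(3, "chunk_idx", None, "03"),
    Interp(10, "chunk_total", None, "03"),
)
print(render(strings, interpolations))  # chunk_003-of-010.parquet
print(to_glob(strings))                 # chunk_*-of-*.parquet
```

The key point is that the format spec travels alongside the value rather than being applied up front, so the formatting can happen whenever we choose.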

    Let's set up the demo with some variables to use in t-string paths. The t-string path wrapper is called T for now (but you might call it TPath):

    root = Param("root")
    idx = Param("idx")
    total = Param("total")
    

    These three variables are our unbound params, which will be deferred and go in string.templatelib.Interpolation:

    >>> root, idx, total
    ('$root', '$idx', '$total')
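
A Param does not need to be much more than a named placeholder. The following is a minimal sketch and not the gist's implementation: I'm assuming a frozen dataclass whose repr shows the symbolic $name form seen above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    """A named placeholder whose value is supplied later (hypothetical sketch)."""
    name: str

    def __repr__(self) -> str:
        # Show the symbolic form, e.g. '$root', rather than a concrete value.
        return repr(f"${self.name}")


root, idx, total = Param("root"), Param("idx"), Param("total")
print((root, idx, total))  # ('$root', '$idx', '$total')
```

Because a t-string stores each interpolated value without formatting it, an object like this can sit inside an Interpolation untouched until resolve time.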
    

    We use them in our t-string and they are immediately stored, but stored as deferred Params, so they're still symbolic:

    chunk_filename_t = T(t"chunk_{idx:03}-of-{total:03}.parquet")
    

    Inside the chunk_filename_t template we find this:

    TemplateExpr(
        template=Template(
            strings=('chunk_', '-of-', '.parquet'),
            interpolations=(
                Interpolation('$idx', 'idx', None, '03'),
                Interpolation('$total', 'total', None, '03'),
            ),
        ),
    )
    

    We can join the root Param with a string literal for a fixed segment, then append the templated leaf path fragment chunk_filename_t:

    chunk_file_t = root / "chunks" / chunk_filename_t
    

    This gives a JoinExpr of the root and the "chunks", wrapped in another JoinExpr appending the chunk filename template (the leaf path):

    JoinExpr(
        left=JoinExpr(
            left=ParamExpr(param='$root'),
            right=LiteralExpr(value='chunks')
        ),
        right=TemplateExpr(
            template=Template(
                strings=('chunk_', '-of-', '.parquet'),
                interpolations=(
                    Interpolation('$idx', 'idx', None, '03'),
                    Interpolation('$total', 'total', None, '03'),
                ),
            ),
        ),
    )
    

    It's all still deferred: to get a concrete path out of it, we resolve its variables with concrete values:

    chunk_file_t.resolve({"root": Path("/data"), "idx": 1, "total": 10})
    

    PosixPath('/data/chunks/chunk_001-of-010.parquet')

    We just interpolated the root as /data and the chunk index as 1, of a total of 10. The 03 format specs in the t-string zero-pad both int values to a width of 3.
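
Resolution can be pictured as a recursive walk over the expression tree, with each node turning itself into a path fragment from the bindings. This is a minimal sketch under my own assumptions, not the gist's code. The field names are hypothetical, TemplateExpr stores (name, format spec) pairs instead of a real Template so that it runs without 3.14, and PurePosixPath keeps the output platform-independent.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ParamExpr:
    name: str

    def resolve(self, bindings):
        return PurePosixPath(bindings[self.name])


@dataclass(frozen=True)
class LiteralExpr:
    value: str

    def resolve(self, bindings):
        return PurePosixPath(self.value)


@dataclass(frozen=True)
class TemplateExpr:
    strings: tuple  # static parts, like Template.strings
    fields: tuple   # (param name, format spec) pairs, one per slot

    def resolve(self, bindings):
        out = [self.strings[0]]
        for (name, spec), tail in zip(self.fields, self.strings[1:]):
            # The format spec is applied only now, at resolve time.
            out.append(format(bindings[name], spec))
            out.append(tail)
        return PurePosixPath("".join(out))


@dataclass(frozen=True)
class JoinExpr:
    left: object
    right: object

    def resolve(self, bindings):
        return self.left.resolve(bindings) / self.right.resolve(bindings)


tree = JoinExpr(
    JoinExpr(ParamExpr("root"), LiteralExpr("chunks")),
    TemplateExpr(("chunk_", "-of-", ".parquet"), (("idx", "03"), ("total", "03"))),
)
resolved = tree.resolve({"root": "/data", "idx": 1, "total": 10})
print(resolved)  # /data/chunks/chunk_001-of-010.parquet
```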

    log_dir = root / "logs"
    log_filename = T(t"log_{idx:05}.parquet")
    log_file = log_dir / log_filename
    log_file.resolve({"root": Path("/data"), "idx": 123})
    

    PosixPath('/data/logs/log_00123.parquet')

    A similar situation, except now we have just one variable (the logs are indexed without a total).

    We can take the parent path of the chunk file, and notice we don't have to specify idx or total to do so:

    chunk_file_t.parent.resolve({"root": Path("/data")})
    

    PosixPath('/data/chunks')

    The .parent step up the path chopped off the parameterised segment.
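
One way to see why the parent needs fewer bindings is to track which parameter names each subtree depends on. In this sketch (hypothetical Leaf and Join classes, and assuming the right-hand side of a join is always a single path segment), .parent simply returns the left subtree, so the leaf's parameters drop out of the set of free names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    params: frozenset  # parameter names this segment needs


@dataclass(frozen=True)
class Join:
    left: object
    right: object

    @property
    def parent(self):
        # Stepping up discards the rightmost segment and whatever it needed.
        return self.left


def free_params(expr):
    """Collect every parameter name the expression still depends on."""
    if isinstance(expr, Join):
        return free_params(expr.left) | free_params(expr.right)
    return expr.params


chunk_file = Join(
    Join(Leaf(frozenset({"root"})), Leaf(frozenset())),  # root / "chunks"
    Leaf(frozenset({"idx", "total"})),                   # the templated filename
)
print(sorted(free_params(chunk_file)))         # ['idx', 'root', 'total']
print(sorted(free_params(chunk_file.parent)))  # ['root']
```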

    json_file_t = chunk_file_t.with_suffix(".json")
    json_file_t.resolve({"root": Path("/data"), "idx": 5, "total": 10})
    

    PosixPath('/data/chunks/chunk_005-of-010.json')

    This time we changed the suffix to .json and just used it as a template. It acts like both a template and a regular pathlib Path!
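
A suffix change does not need to touch the interpolations at all, as long as the suffix lives entirely in the last static part of the template. Here is a small sketch of that idea operating on a strings tuple. The helper name and the rule that the suffix must sit in the final static part are my assumptions, not necessarily how the gist does it.

```python
def with_suffix(strings, suffix):
    """Swap the file suffix inside the final static part of a template."""
    *head, last = strings
    stem, dot, _old = last.rpartition(".")
    if not dot:
        # The suffix is not in the static tail, so this simple rule cannot apply.
        raise ValueError("final static part has no suffix to replace")
    return (*head, stem + suffix)


print(with_suffix(("chunk_", "-of-", ".parquet"), ".json"))
# ('chunk_', '-of-', '.json')
```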

    Now let's do a lateral movement: we're going to get a sibling path, replacing the log_filename (the t-string containing idx) with just a fixed string literal: "schema.json":

    log_schema = log_file.with_name("schema.json")
    log_schema.resolve({"root": Path("/data")})
    

    PosixPath('/data/logs/schema.json')
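
The sibling move can be thought of as swapping the right-hand child of the outermost join for a literal, which is why idx is no longer required. A minimal sketch with hypothetical Literal and Join classes (the left and right values below are placeholder strings, not real expression nodes):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Join:
    left: object
    right: object

    def with_name(self, name):
        # Keep the directory part, replace only the final segment.
        return replace(self, right=Literal(name))


log_file = Join(left="<root / logs>", right="<log_{idx:05}.parquet>")
log_schema = log_file.with_name("schema.json")
print(log_schema.right)  # Literal(value='schema.json')
```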


    For the code used in this post, see the gist