When it comes to software that takes source code as its input, there are two main ways you'd supply test cases:
- Create a set of files and refer to those files within the tests
- Hard-code them into the tests themselves as strings (making them easy to see in context, though at the cost of verbose tests; a minimal sketch of this approach follows below)
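To make the second approach concrete, here is a toy example with a hypothetical check_source function standing in for a real checker (none of the tools below are structured exactly like this):

def check_source(source: str) -> list[str]:
    # Hypothetical stand-in for a real checker: flag overly long lines.
    return [f"line {i} too long"
            for i, line in enumerate(source.splitlines(), 1) if len(line) > 79]

def test_short_lines_pass():
    code = "x = 1\ny = x + 2\n"  # the test case lives right here in the test
    assert check_source(code) == []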
black
The popular code formatter black takes the first approach a step further by referring to its own source files as the test cases (listing them as a constant named SOURCES).
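Black's actual test suite is more involved, but the general shape of "use your own sources as test cases" might look something like the following sketch, which assumes the project's sources live under src/ and uses black's public format_str API:

from pathlib import Path

import black
import pytest

# Sketch only: collect this project's own source files, in the spirit of
# black's SOURCES constant, and check that reformatting each one is a no-op.
SOURCES = sorted(Path("src").rglob("*.py"))

@pytest.mark.parametrize("path", SOURCES, ids=str)
def test_own_source_is_already_formatted(path):
    source = path.read_text()
    assert black.format_str(source, mode=black.Mode()) == source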
flake8
The style guide/quality checker flake8 takes a middle ground between the two approaches. Unlike black, it doesn't just need to confirm that a set of files passes the check; it has to make specific recommendations (e.g. output diffs), so it can't simply take a list of files to check. Instead, its hard-coded strings are written out as temporary files using the pytest tmpdir fixture (which doesn't need to be imported, as pytest detects it automatically).
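flake8's own tests are organised differently, but the tmpdir pattern itself boils down to "write the snippet to a real file, then point the checker at it". A hypothetical sketch (invoking the CLI here is my choice, not how flake8 tests itself):

import subprocess
import sys

def test_reports_unused_import(tmpdir):
    source_file = tmpdir.join("example.py")  # tmpdir is injected by pytest
    source_file.write("import os\n")         # the inline snippet becomes a real file
    result = subprocess.run(
        [sys.executable, "-m", "flake8", str(source_file)],
        capture_output=True, text=True,
    )
    assert "F401" in result.stdout  # 'os' imported but unused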
isort
The import reordering tool isort is another diff-producing linter tested with pytest, but it also relies heavily on the property-based testing library Hypothesis. The tests are all specified within decorators, coupling the config and the code in a way that has been noticed among data science tools lately too.
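isort's real Hypothesis strategies are far more elaborate, but a toy property-based test in the same decorator-driven spirit (using isort's public isort.code function) could look like:

from hypothesis import given, strategies as st
import isort

# Sketch only: whatever jumble of imports Hypothesis generates,
# sorting the output a second time should change nothing.
module_names = st.from_regex(r"[a-z]{1,8}", fullmatch=True)

@given(st.lists(module_names, min_size=1, max_size=5, unique=True))
def test_sorting_is_idempotent(modules):
    source = "".join(f"import {m}\n" for m in modules)
    sorted_once = isort.code(source)
    assert isort.code(sorted_once) == sorted_once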
vulture
The 'dead code' finding tool Vulture has a few dozen test modules assessing various aspects of the library, with test_report.py checking the formatted report output, again in pytest. I find it unusual that it calls its own module via subprocess with python -m vulture.
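That subprocess pattern is roughly the following (a hypothetical sketch rather than Vulture's own test code, using the python -m vulture invocation mentioned above):

import subprocess
import sys

def test_cli_reports_unused_function(tmp_path):
    script = tmp_path / "example.py"
    script.write_text("def unused():\n    pass\n")
    result = subprocess.run(
        [sys.executable, "-m", "vulture", str(script)],
        capture_output=True, text=True,
    )
    assert "unused" in result.stdout  # the report should name the dead function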
I imagine the Vulture tests are self-explanatory to the author, but some would benefit from module docstrings documenting what each module specifically tests (e.g. test_scavenging).
Vulture does not use parametrised fixtures, and hard-codes the input program strings directly as arguments to the function calls (e.g. in test_unreachable). Despite this, I think its tests are straightforward enough.
pydocstyle
I hadn't heard of pydocstyle ("docstring style checker") before, but it has a nice approach to testing, and it's the approach I initially expected most of these linters to use. Unusually, its tests live in src/tests alongside the package directory (rather than under the top level).
All of the test functions have docstrings (naturally, for a docstring linting library), and rather than just using plain strings to represent code, it wraps them all in a class CodeSnippet, which uses the textwrap library's dedent function to remove the indenting you get from writing multiline strings within an indented code block [within the test function]. This is a really neat idea I've never seen before! The CodeSnippet class also wraps the snippet as a file-like object (allowing it to be treated as if it were loaded from a file), which avoids having to write anything to a pytest tmpdir.
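A minimal sketch of that idea (not pydocstyle's actual CodeSnippet class) would be a StringIO subclass that dedents on construction, so checkers that expect an open file never need a real file on disk:

import io
import textwrap

class CodeSnippet(io.StringIO):
    def __init__(self, code: str):
        super().__init__(textwrap.dedent(code))  # strip the test function's indentation

def test_snippet_reads_like_a_file():
    snippet = CodeSnippet("""\
        def f():
            return 1
        """)
    assert snippet.readline() == "def f():\n"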
More complex cases are stored in standalone files within src/tests/test_cases, with some peculiar positional-argument-related handling decorators that get placed on the functions being checked themselves. I'm not sure whether I like this second part in practice (because the decorators change the code being checked...), even if I like the idea in theory (of keeping the expected results coupled closely to the source being checked).
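I haven't reproduced pydocstyle's real decorators here, but the general idea of recording an expected result on the checked function itself could look something like this hypothetical sketch:

EXPECTED = {}

def expect(violation_code: str):
    """Record the violation this function is expected to trigger."""
    def decorator(func):
        EXPECTED[func.__name__] = violation_code
        return func
    return decorator

@expect("D103")  # missing docstring in public function
def undocumented_function():
    return 1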
The machinery to run the test_cases subdirectory's modules is also a neat approach (the test case module names are provided as a pytest mark.parametrize list), stored in test_definitions.
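The pattern of parametrising over the case modules is roughly the following (a sketch with a hypothetical directory layout, not pydocstyle's actual test_definitions):

from pathlib import Path
import pytest

CASE_DIR = Path(__file__).parent / "test_cases"  # hypothetical location
CASE_MODULES = sorted(p.name for p in CASE_DIR.glob("*.py"))

@pytest.mark.parametrize("module_name", CASE_MODULES)
def test_case_module(module_name):
    source = (CASE_DIR / module_name).read_text()
    assert source  # stand-in for running the checker and comparing with expectations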
Conclusion
My personal preference after reviewing the options would be for pydocstyle's approach of string literals wrapped in dedent and an io.FileIO file-like object wrapper. I would go one step further and parametrise the creation of the strings involved, potentially refactoring these into classes that eliminate as much of the repetition in the tests as possible.
For example, if we were testing a library that handles imports, then rather than write those imports out by hand we could use ast.unparse to generate them for us:
import ast

def make_import_string(imports: dict[str, dict[str, str]]) -> str:
    # Build `from <module> import <name> as <alias>` lines from a nested mapping.
    nodes = [ast.ImportFrom(module=mod, names=[ast.alias(n, a) for n, a in d.items()], level=0)
             for mod, d in imports.items()]
    return ast.unparse(ast.Module(body=nodes, type_ignores=[]))
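Assuming the nested mapping maps module names to {imported name: alias} pairs (my reading of the type hint above), this generates the source text for a test case directly:

>>> make_import_string({"os": {"path": "osp"}, "typing": {"Iterator": "It"}})
'from os import path as osp\nfrom typing import Iterator as It'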