In the previous part of this series, we introduced the following 3 examples which have equivalent behaviour, and found that type checking alone is insufficient to verify the correctness of rewrites.
def guarded_handler(event: dict[str, str]) -> int | None:
if (payload := event.get("payload")) is None:
return None
return len(payload)
def ternary_handler(event: dict) -> int | None:
return None if (payload := event.get("payload")) is None else len(payload)
def handler(event: dict) -> int | None:
payload = event.get("payload")
if payload is None:
output = None
else:
output = len(payload)
return output
The 2nd and 3rd (ternary_handler and handler) are more alike than the 1st,
as they both have a single return value, so it makes sense for us to try to verify them first.
import ast
from enum import Enum
from pathlib import Path
class Handlers(Enum):
guarded = "guarded_handler.py"
ternary = "ternary_handler.py"
linear = "handler.py"
all_handler_programs = [ast.parse(Path(h.value).read_text()) for h in Handlers]
all_handler_functions = [tree.body[1] for tree in all_handler_programs]
assert all("handler" in f.name for f in all_handler_functions)
Comparing two Abstract Syntax Trees
I found some code to dict-ify the AST on StackOverflow (ironically using early
returns itself), which I updated for Python 3, to pretty-print the ASTs for review:
def simplify(node):
"""
Modified version of AST pprinter https://stackoverflow.com/a/19598419/
"""
ignored_node_attributes = (
"lineno end_lineno col_offset end_col_offset ctx type_comment keywords"
)
if isinstance(node, ast.AST):
res = vars(node).copy()
for k in ignored_node_attributes.split():
res.pop(k, None)
for k, v in res.items():
res[k] = simplify(v)
res["__type__"] = type(node).__name__
return res
elif isinstance(node, list):
return list(map(simplify, node))
else:
return node
For the ternary handler it gives:
[{'value': {'test': {'left': {'target': {'id': 'payload', '__type__': 'Name'},
'value': {'func': {'value': {'id': 'event',
'__type__': 'Name'},
'attr': 'get',
'__type__': 'Attribute'},
'args': [{'value': 'payload',
'kind': None,
'__type__': 'Constant'}],
'__type__': 'Call'},
'__type__': 'NamedExpr'},
'ops': [{'__type__': 'Is'}],
'comparators': [{'value': None,
'kind': None,
'__type__': 'Constant'}],
'__type__': 'Compare'},
'body': {'value': None, 'kind': None, '__type__': 'Constant'},
'orelse': {'func': {'id': 'len', '__type__': 'Name'},
'args': [{'id': 'payload', '__type__': 'Name'}],
'__type__': 'Call'},
'__type__': 'IfExp'},
'__type__': 'Return'}]
And for the linear handler it gives:
[{'targets': [{'id': 'payload', '__type__': 'Name'}],
'value': {'func': {'value': {'id': 'event', '__type__': 'Name'},
'attr': 'get',
'__type__': 'Attribute'},
'args': [{'value': 'payload',
'kind': None,
'__type__': 'Constant'}],
'__type__': 'Call'},
'__type__': 'Assign'},
{'test': {'left': {'id': 'payload', '__type__': 'Name'},
'ops': [{'__type__': 'Is'}],
'comparators': [{'value': None,
'kind': None,
'__type__': 'Constant'}],
'__type__': 'Compare'},
'body': [{'targets': [{'id': 'output', '__type__': 'Name'}],
'value': {'value': None, 'kind': None, '__type__': 'Constant'},
'__type__': 'Assign'}],
'orelse': [{'targets': [{'id': 'output', '__type__': 'Name'}],
'value': {'func': {'id': 'len', '__type__': 'Name'},
'args': [{'id': 'payload', '__type__': 'Name'}],
'__type__': 'Call'},
'__type__': 'Assign'}],
'__type__': 'If'},
{'value': {'id': 'output', '__type__': 'Name'}, '__type__': 'Return'}]
From glancing between these trees, and thinking about what they represent, it seems simple enough to construct an equivalence from three simplifications:
-
The assignment with
:=should be considered as a preceding assignment expression. This is represented in the AST by aNamedExprnode. -
The ternary value assignment should be considered as equivalent to
if/elseclauses. Both are represented in the AST by atestnode. -
The unnamed return value (with the ternary expression) should be considered as equivalent to the named value. That is, consider a 'direct' value in the return statement as equivalent to a reference.
That seems to break the problem down sufficiently to tackle it.
-
The
NamedExprnode in the ternary handler can be pulled out of theleftside of theIfExpand be considered for purpose of comparison as a precedingAssignnode, for which it should match the same node in the linear handler. Where the value was used, it can be considered equivalent to any named variable with the samevalueforargs(i.e. the variable is not renamed). -
The
testnode in the ternary handler (which says simplypayload is None):
{'target': {'id': 'payload', '__type__': 'Name'} ...
'ops': [{'__type__': 'Is'}], 'comparators': [{'value': None
matches the test node in the linear handler:
{'test': {'left': {'id': 'payload', '__type__': 'Name'},
'ops': [{'__type__': 'Is'}], 'comparators': [{'value': None,
- The direct return value of the ternary handler
'body': {'value': None, 'kind': None, '__type__': 'Constant'},
'orelse': {'func': {'id': 'len', '__type__': 'Name'},
'args': [{'id': 'payload', '__type__': 'Name'}], '__type__': 'Call'},
is the same as the return value from the variable output in the linear handler:
{'value': {'id': 'output', '__type__': 'Name'}, '__type__': 'Return'}]
which comes from one of two Assign nodes in the body and orelse of the If node.
'body': ... {'value': None, 'kind': None, '__type__': 'Constant'},
'orelse': ... {'func': {'id': 'len', '__type__': 'Name'},
'args': [{'id': 'payload', '__type__': 'Name'}], '__type__': 'Call'},
So far so easy to make rules for: but what about early returns?
Revisiting the guarded handler
Now we can return to the guarded handler:
[{'test': {'left': {'target': {'id': 'payload', '__type__': 'Name'},
'value': {'func': {'value': {'id': 'event',
'__type__': 'Name'},
'attr': 'get',
'__type__': 'Attribute'},
'args': [{'value': 'payload',
'kind': None,
'__type__': 'Constant'}],
'__type__': 'Call'},
'__type__': 'NamedExpr'},
'ops': [{'__type__': 'Is'}],
'comparators': [{'value': None,
'kind': None,
'__type__': 'Constant'}],
'__type__': 'Compare'},
'body': [{'value': {'value': None, 'kind': None, '__type__': 'Constant'},
'__type__': 'Return'}],
'orelse': [],
'__type__': 'If'},
{'value': {'func': {'id': 'len', '__type__': 'Name'},
'args': [{'id': 'payload', '__type__': 'Name'}],
'__type__': 'Call'},
'__type__': 'Return'}]
To get to the linear handler from this will require all of the 3 rules used for the ternary handler, plus:
- An
ifstatement with areturnstatement inside entails an implicitelseclause for everything that follows.
This lets it be 'translated' into a program with an if/else clause that returns once
with a variable assigned in the condition blocks rather than returning early.
The only other rule needed to 'rewrite' the AST into the linear handler's AST is the 3rd one we already defined (consider anonymous and named return values as equivalent).
Specifying the precise ast.AST node conditions
To actually use these rules, we need to be more specific. We can rewrite them with reference to the AST nodes themselves as:
- An
ast.NamedExprwalrus assignment ⇒ a precedingast.Assign. - An
ast.Assignmentfrom anast.IfExpternary value ⇒ anast.Ifif/elsestatement with 2ast.Assignstatements (inbodyandorelse). - Unnamed
ast.Returnvalue + name ⇒ast.Returnbyid(named value) - An
ast.Compareifblock with anast.Returnin the body ⇒ anast.orelseclause for everything that follows.