Here's a trivial equivalent of some Python code I wrote:
def pastry_template(
hours: int,
fruit: str,
flour: str = "self-raising",
temp: int = 180,
turn: bool = True,
chop: bool = True,
):
recipe = f"""
First turn your oven to {temp}°C.
Next, mix 100g of {flour} flour into the {'chopped ' if chop else ''}{fruit}.
{'Turn once during cooking' if turn else ''}
After {'an hour' if hours == 1 else f'{hours} hours'} take your pastry out to cool."""
return recipe
This was useful to me as I had a lot of similar fruit pastry recipies to bake at work, and I got the right recipes, but it wasn't great code. Doing everything in one big f-string seemed wrong, and for example forced me to tolerate a blank line in recipes that didn't require turning.
The obvious solution I saw was a dataclass, which simplifies to:
from dataclasses import dataclass
@dataclass
class Pastry:
hours: int
fruit: str
flour: str = "self-raising"
temp: int = 180
turn: bool = True
chop: bool = True
@property
def recipe(self) -> str:
return f"""
First turn your oven to {temp}°C.
Next, mix 100g of {flour} flour into the {'chopped ' if chop else ''}{fruit}.
{'Turn once during cooking' if turn else ''}
After {'an hour' if hours == 1 else f'{hours} hours'} take your pastry out to cool."""
def pastry_template(
hours: int,
fruit: str,
flour: str = "self-raising",
temp: int = 180,
turn: bool = True,
chop: bool = True,
):
pastry = Pastry(hours=hours, fruit=fruit, flour=flour, temp=temp, turn=turn, chop=chop)
return pastry.recipe
- Note a common pitfall here would be to forget to remove the trailing commas from the dataclass attributes, making the default values into tuples.
- Also note an immediate benefit you gain: variables that have been misnamed/renamed will be picked
up as type errors, e.g. if I renamed
chop
tochop_fruit
thenmypy
could tell me thatPastry
has no such attributechop_fruit
and I'd know exactly where to fix it.
This is the strangler fig refactor pattern: the old interface is kept, all that changed is what happens inside. Once that's confirmed, the old interface can be stripped away if not needed. The kwarg passing is clearly redundant and begging to be deleted.
The correctness of a rewrite aside (which tests can verify, or use of a trustworthy automated refactoring tool), what I prefer about the new style is that it is clearer about the 'contract' and the computation.
I went down a Wikipedia rabbit hole recently and at the bottom found the Eiffel programming language, an object-oriented language written with the goal of
conceived in 1985 with the goal of increasing the reliability of commercial software development
and whose key characteristics include:
- An object-oriented program structure in which a class serves as the basic unit of decomposition.
- Design by contract tightly integrated with other language constructs.
- ...
- Static typing
Design by contract in brief:
prescribes that software designers should define formal, precise and verifiable interface specifications for software components, which extend the ordinary definition of abstract data types with preconditions, postconditions and invariants. These specifications are referred to as "contracts", in accordance with a conceptual metaphor with the conditions and obligations of business contracts.
I don't want to revive DbC, but I really appreciated this idea of preconditions and postconditions. The clear separation of these aspects does allow us to reason about code better.
My original function also did some comparisons between the bools involved, which is harder to illustrate for a pastry, but imagine the function had extra flags it compared like:
def pastry_template(
allergens: bool = False,
rare_ingredient: bool = False,
prep_time_mins: int = 60,
):
if not (rare_ingredient or allergens):
no_risk = True
if rare_ingredient or prep_time_mins > 60:
high_effort = True
Dataclasses don't support this, so you'd put these checks in property
methods too:
@dataclass
class Pastry:
allergens: bool = False
difficult: bool = False
prep_time_mins: int = 60
@property
def no_risk(self) -> bool:
return not (self.rare_ingredient or self.allergens)
@property
def high_effort(self) -> bool:
return self.rare_ingredient or self.prep_time_mins > 60
...and these properties could be accessed in your recipe method on self
rather than from the local
namespace. This is the part I like: it makes the individual calculations that go into the 'one big
procedure' separate, and nudges you to maintain a sense of state on the class, which in turn
simplifies variable names (e.g. pastry_risk
can be shortened to no_risk
when it's on the
Pastry
class). That shedding of a noun becomes quite clarifying when you do it at the scale of an
entire program.
By this nudge toward dataclasses, Python programs get nudged to store their state in a clearer way, and the computation upon that state is more readily separated as being available at instantiation or not.
When you split them apart like that, you are then led to move those classes out of the same module
as the business logic operating on them entirely. This is how you end up with modules with names
like data_model.py
(perhaps a bad name, but sometimes appropriate and feels clear to me from reuse).
Crucially, I find code with this emphasis on clarity and separation easier to show others, and easier to revisit myself.
def __post_init__(self) -> None:
"""Fail fast if requirements not met!"""
if recipe.high_effort and not self.no_risk:
raise ValueError("Recipe must be less hassle!")
In reality this is the equivalent of asserting the paramaters do not specify incompatible aims for the program.