Inheritance and architectural implications

How will dataclasses change a codebase, and how we think about designing libraries?

So far I've not given much indication of how this code is meant to work, so it's hard to make firm judgements on what improvement is gained here.

In reality, as is often the case in programs, we are dealing with an enumeration of things (to continue the analogy, of a certain number of pastries). Let's say I have 4 pastries for my patisserie. We originally had to specify them by calling our function multiple times:

from enum import Enum

def pastry_template(
    hours: int,
    fruit: str,
    flour: str = "self-raising",
    temp: int = 180,
    turn: bool = True,
    chop: bool = True,
):
    recipe = f"""
    First turn your oven to {temp}°C.
    Next, mix 100g of {flour} flour into the {'chopped ' if chop else ''}{fruit}.
    {'Turn once during cooking' if turn else ''}
    After {'an hour' if hours == 1 else f'{hours} hours'} take your pastry out to cool."""
    return recipe

apple_muffin = pastry_template(hours=1, fruit="apple")
banana_bread = pastry_template(hours=1, fruit="banana", flour="plain", turn=False)
cherry_pie = pastry_template(hours=1, fruit="cherry", chop=False)
xmas_pudding = pastry_template(
    hours=4, fruit="raisins", flour="lard", temp=140, turn=False, chop=False
)

class Menu(Enum):
    muffin = apple_muffin
    bread = banana_bread
    pie = cherry_pie
    pudding = xmas_pudding

In a real codebase the arguments would not be so simple: good variable names are longer, so if you format your code with Black each kwarg tends to take up a line of its own. The switch to dataclasses won't increase the lines of code in this case, but it lets you drop the trailing commas and read each line as an assignment, which makes the code as a whole more readable and intuitive to reason about (you are working with objects rather than function calls).

We've already covered this next conversion step, from functions to dataclasses, which essentially just sprinkles some self accesses into what is now a recipe property:

from enum import Enum
from dataclasses import dataclass

@dataclass
class Pastry:
    hours: int
    fruit: str
    flour: str = "self-raising"
    temp: int = 180
    turn: bool = True
    chop: bool = True

    @property
    def recipe(self) -> str:
        prepped_fruit = f"chopped {self.fruit}" if self.chop else self.fruit
        time = "an hour" if self.hours == 1 else f"{self.hours} hours"
        recipe = f"""
        First turn your oven to {self.temp}°C.
        Next, mix 100g of {self.flour} flour into the {prepped_fruit}.
        """
        if self.turn:
            recipe += """Turn once during cooking.
        """
        recipe += f"After {time} take your pastry out to cool."
        return recipe

apple_muffin = Pastry(hours=1, fruit="apple")
banana_bread = Pastry(hours=1, fruit="banana", flour="plain", turn=False)
cherry_pie = Pastry(hours=1, fruit="cherry", chop=False)
xmas_pudding = Pastry(
    hours=4, fruit="raisins", flour="lard", temp=140, turn=False, chop=False
)

class Menu(Enum):
    muffin = apple_muffin
    bread = banana_bread
    pie = cherry_pie
    pudding = xmas_pudding

At a glance this looks uneven, and if you had dozens of these dataclass instantiations, what you would really be doing is storing state in a module (as you store recipes in a recipe book in real life).

There's a convenient way to turn the dataclass-centric code above into something more stateful again: subclass the dataclass and store each pastry's settings as field defaults. The kw_only=True argument this uses is available in Python 3.10+.

from enum import Enum
from dataclasses import dataclass

@dataclass(kw_only=True)
class Pastry:
    hours: int
    fruit: str
    flour: str = "self-raising"
    temp: int = 180
    turn: bool = True
    chop: bool = True

    @property
    def recipe(self) -> str:
        prepped_fruit = f"chopped {self.fruit}" if self.chop else self.fruit
        time = "an hour" if self.hours == 1 else f"{self.hours} hours"
        recipe = f"""
        First turn your oven to {self.temp}°C.
        Next, mix 100g of {self.flour} flour into the {prepped_fruit}.
        """
        if self.turn:
            recipe += """Turn once during cooking.
        """
        recipe += f"After {time} take your pastry out to cool."
        return recipe

@dataclass
class AppleMuffin(Pastry):
    hours: int = 1
    fruit: str = "apple"

@dataclass
class BananaBread(Pastry):
    hours: int = 1
    fruit: str = "banana"
    flour: str = "plain"
    turn: bool = False

@dataclass
class CherryPie(Pastry):
    hours: int = 1
    fruit: str = "cherry"
    chop: bool = False

@dataclass
class XmasPudding(Pastry):
    hours: int = 4
    fruit: str = "raisins"
    flour: str = "lard"
    temp: int = 140
    turn: bool = False
    chop: bool = False

class Menu(Enum):
    muffin = AppleMuffin()
    bread = BananaBread()
    pie = CherryPie()
    pudding = XmasPudding()

I think that immediately looks clearer to scan through, and more consistent.

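The only downside to this rewrite was that I needed to add back the type annotations, which weren't needed when simply calling the dataclass constructor with kwargs. If you leave them off, the assignments are no longer treated as dataclass fields: inspect.get_annotations returns nothing for the subclass, and the inherited fields keep whatever defaults (or lack of defaults) the base class gave them.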

Et voilà:

>>> for i in Menu: print(i.name, i.value.recipe)
... 

muffin 
        First turn your oven to 180°C.
        Next, mix 100g of self-raising flour into the chopped apple.
        Turn once during cooking.
        After an hour take your pastry out to cool.
bread 
        First turn your oven to 180°C.
        Next, mix 100g of plain flour into the chopped banana.
        After an hour take your pastry out to cool.
pie 
        First turn your oven to 180°C.
        Next, mix 100g of self-raising flour into the cherry.
        Turn once during cooking.
        After an hour take your pastry out to cool.
pudding 
        First turn your oven to 140°C.
        Next, mix 100g of lard flour into the raisins.
        After 4 hours take your pastry out to cool.

Again, in the real world, when optimising for readable (and thus more easily modifiable) code, I'd turn the parts of the recipe that are calculated inside the method into properties of their own. We end up with really clear code, whose components are less of a burden to reason about:

@dataclass(kw_only=True)
class Pastry:
    hours: int
    fruit: str
    flour: str = "self-raising"
    temp: int = 180
    turn: bool = True
    chop: bool = True

    @property
    def recipe(self) -> str:
        recipe = f"""
        First turn your oven to {self.temp}°C.
        Next, mix 100g of {self.flour} flour into the {self.prepped_fruit}.
        """
        if self.turn:
            recipe += """Turn once during cooking.
        """
        recipe += f"After {self.time} take your pastry out to cool."
        return recipe

    @property
    def time(self) -> str:
        return "an hour" if self.hours == 1 else f"{self.hours} hours"

    @property
    def prepped_fruit(self) -> str:
        return f"chopped {self.fruit}" if self.chop else self.fruit

The negative view of properties is that they involve a tradeoff: the route a computation takes becomes less visible, in exchange for more interpretable code (more structured and easier to reason about, which by extension helps with testing, debugging etc.).

If another member of my team, or someone in a non-dev part of the business, wants to change some behaviour (e.g. the chef wants to cook the banana bread for longer), it's much easier to see where to make that edit in the dataclass form than in the dense and not-so-structured procedural form we started with.

In terms of my perception while using them, I find that a class is simply easier to handle than a partial or a function call: for instance here I can create a Pastry and then review its attributes before I compute the recipe property, and even alter the attributes while trying to debug. Function calls have less 'granular' control, and to debug you tend to end up loading the computation path into your short-term memory then figuring out where to breakpoint.