By properly utilizing Algebraic Data Types (ADTs, not to be confused with abstract data types), you can transform certain types of invalid states from runtime errors into type-checking errors, making them an excellent method for representing data and managing state. Although ADTs may sound complex, they represent a fairly straightforward concept. They are composite types, meaning they reference or contain other types, and primarily consist of two basic categories: product and sum types. If you have experience with tuples, you've worked with product types, and if you've used booleans, you've dealt with sum types. While there are additional types, we'll focus on these two fundamental categories for now.
We'll be using typed Python, since ADTs really shine with the help of type-checking.
Here's the short version: both product and sum types are composite types that reference other types and let's pretend those other types are str
and int
. Following this, a product type would contain str
AND int
, and a sum type contains str
OR int
.
So, what's up with the naming? Product and sum types do sound kind of pretentious.
Sum types are called sum types because the number of possible different values in a sum type is simply the sum of all possible values of its containing types. The number of possible different values of a product type is... A Cartesian product of all the values of its containing types. This actually came as a surprise to me while researching this post, since I had a different notion of where the naming comes from[1].
Product types
Python has a ton of product types; tuples being the OG. A tuple[str, int]
is a product type over str
and int
- it contains a string AND an integer. The number of different values of this tuple is: the number of different values of str
, multiplied by the number of different values of int
. Admittedly that's not very useful in this context since both strings and integers in Python are bounded by available memory, but the theory checks out.
Other product types include named tuples, dataclasses, attrs classes, msgspec classes, TypedDicts, you get the idea. You use a ton of product types every day without thinking about it so I won't dwell on them too much.
Sum types
bool
is the quintessential sum type in Python, being logically equivalent to a union of two singletons, True
and False
. Barring bool
, Python sum types come in the form of enums, literals or unions.
Enums have been here a while, being added back in 3.4. Here's an example from the docs:
from enum import Enum
class Color(Enum):
RED = 1
GREEN = 2
BLUE = 3
This is a sum type of 3 singletons. Color.RED
, Color.GREEN
and Color.BLUE
are, effectively, values inhabiting the Color
type. Meaning if a function parameter is annotated with Color
, passing in Color.RED
is valid. Enums can only contain singletons, although the singletons themselves can be of any value. So this is also legal:
class Color(Enum):
RED = (1, "a")
GREEN = (2, "b")
BLUE = (3, "c")
(Given that these are singletons, you want to be using immutable values here.)
If you've ever used the Rust programming language, you may have noticed it has enum types of its own. Even though they're called the same, Rust enums are different; they are equivalent to Python unions which we'll get to later.
Literals were added to Python in 3.8 (although since they're mostly a typing construct I think they are usable on earlier versions via typing_extensions
), and they are essentially a simple, anonymous enum with some restrictions.
Here's Color
as a literal:
from typing import Literal
ColorLiteral = Literal["RED", "GREEN", "BLUE"]
If a function argument was annotated as a ColorLiteral
, Mypy would ensure only these three strings were allowed to be passed in.
I think literals are super fun since they're so easy to define and use. But again, literals can only contain string and int values, and singletons; you can't have a Literal["RED", tuple[str, int]]
. I've been using them more and more since they're just more airy than enums.
Unions are the heavyweight of Python sum types. They've been around for ages, and in Python 3.10 they've received dedicated syntax to make them more readable (which I'll be using here).
If you want a sum type that can be either a string or an int, you need to use a union:
StringOrInt = str | int
Now if your function takes an instance of StringOrInt
, you'll know it's either an instance of a string or an int.
You might think unions only work for classes, so you can't have a union of a product type and a sum type, and you'd be wrong - unions can contain literals and enums too.
Sum types in practice
Here's an example. Your web app has a user model, and your users have a privilege assigned to them. The privilege can be one of:
- normal user
- admin
- banned
But for banned users, you want to know when they were banned. You might go about modeling your user class like so:
class User:
privilege: str
banned_at: datetime | None
This definition sucks. First of all, the privilege field is just a string, but not any string can be a valid privilege. So let's fix that.
class User:
privilege: Literal["normal", "admin", "banned"]
banned_at: datetime | None
Ok, great, now we can leverage our type-checker to ensure privilege is always valid. But our state modeling is still lacking; it's possible for a user to exist with the normal privilege and a banned_at
date, and a banned user with no banned_at
field. We're using two independent fields of a product type (User
) where we should be using a sum type instead.
A product type in a trench coat pretending to be a sum type is something I see all the time, especially in contexts where sum types don't exist or are hard to use. The most famous example is maybe found in the Go programming language, where a function that can return a successful result T
or an error will return a tuple[T, error]
, and the callers are expected to check if the error is nil before proceeding. Following example taken from the offical docs:
f, err := os.Open("filename.ext")
if err != nil {
log.Fatal(err)
}
// do something with the open *File f
There's no help from the type-checker if you return both (or neither) by accident. Good times.
We can, and will, do better:
class Banned:
banned_at: datetime
class User:
privilege: Literal["normal", "admin"] | Banned
So there, as long as we run Mypy on this (and the rest of our toolkit supports these constructs), it's actually impossible for our Users to get into an invalid state. Not only that, but intended valid states are obvious from the types themselves, so we don't need endless comments, tests and Notion pages explaining how the banned_at
field is only there for banned users.
Handling Sum Types
Booleans and enum members are singletons, so you'll need a chain of if
statements using identity comparisons (the is
operator):
def process_color(c: Color):
if c is Color.RED:
print("Got Red!")
elif c is Color.BLUE:
print("Got Blue!")
# and so on
Literals are values so you'll need ordinary equality, unless the literal contains an enum member, in which case you'll want identity comparisons again.
def process_color_literal(c: ColorLiteral):
if c == "RED":
print("Got Red!")
# an so on
Unions are a little more complex since they can contain literals, enums, and a lot of different product types at once. For union members that happen to be classes, you should use an isinstance
check.
def process_user(user: User):
if isinstance(user.privilege, Banned):
print("The user is banned!")
elif user.privilege == "admin":
print("The user is an admin"
# an so on
Python 3.10 introduced the match
statement to the language. Matching is cool since it presents a unified way of handling both product and sum types, and I feel it expresses the intent of the code better. The cons are its complexity (I've been using it for a while and I still need to go look up the guides sometimes) and the fact it requires an additional indentation level. Here's the last example written using match
:
def process_user(user: User):
match user.privilege:
case Banned(banned_at=banned_at):
print(f"The user was banned at {banned_at}!")
case "admin":
print("The user is an admin!")
# an so on
Handling Sum Types Exhaustively
Here's a useful trick for exhaustively handling sum types. It can often be the case that all possible variants of a sum type need to be handled, and neither the if
chain nor the match statement provide this by default. For example, in the last example, the case of the normal
user privilege wasn't handled. Exhaustiveness checking is great because it can add a layer of safety to refactoring sum types: new variants can be added without them causing havoc somewhere downcode.
The typing
module contains a special function called assert_never
. If a static type checker detects this function in a place that's not unreachable, it will generate an error. We can make use of this to make our checks exhaustive.
from typing import assert_never
def process_user(user: User):
match user.privilege:
case Banned(banned_at=banned_at):
print(f"The user was banned at {banned_at}!")
case "admin":
print("The user is an admin!")
case _:
assert_never(user.privilege) # ERROR
(For the if
chain approach, instead of case _
just use a bare else
.)
If we actually handle the missing variant, the error goes away.
from typing import assert_never
def process_user(user: User):
match user.privilege:
case Banned(banned_at=banned_at):
print(f"The user was banned at {banned_at}!")
case "admin":
print("The user is an admin!")
case "normal":
print("The user is a normal user!")
case _:
assert_never(user.privilege) # No error any more.
More on Naming
Ever since some ancient, long-forgotten college course I've had this notion that multiplication maps to conjunction (i.e. logical AND), and addition maps to disjunction (i.e. logical OR). In other words, *
means AND, and +
means OR. Probably because someone taught me Boolean algebra for a digital electronics class. So it was always logical to me that product types (product, *
) means those types reference type A AND type B, and sum types (+
) mean those types reference type A OR type B. Which is ultimately correct.
But it turns out that's not exactly where product and sum types get their name.