DEV Community

Chad Dombrova

Advanced Static Typing with mypy (part2)

More lessons learned from 7 years of annotating a large code base

If you haven't read the first part of this series, you don't strictly need it to understand this article, but it's worth a read. This article focuses on typing classes, whereas the first part focuses on general concepts and functions.

A quick primer on generics and variance

This is an advanced topic, and the better you understand it, the less time you'll spend fixing type errors by trial and error. It's especially important once you start writing your own generic classes, so the first thing you should do is read up on variance of generic types in the mypy docs.

Now that you've read all of that, let's forge ahead.

Consider this simple example:


class Employee:
    def work(self) -> None:
        pass

class Manager(Employee):
    def manage(self) -> None:
        pass

def do_work(x: Employee) -> None:
    x.work()

do_work(Employee())
do_work(Manager())


mypy passes with flying colors. We can pass a Manager instance to do_work because Manager is considered a subtype of Employee. Subclasses are subtypes. Easy enough. But it gets more complicated when we introduce generics.

Consider this:

from typing import Generic, TypeVar

T = TypeVar('T')

class Employee:
    def work(self) -> None:
        pass

class Manager(Employee):
    def manage(self) -> None:
        pass

class Team(Generic[T]):
    pass

def do_team_work(x: Team[Employee]) -> None:
    pass

team = Team[Employee]()
do_team_work(team)
management_team = Team[Manager]()
do_team_work(management_team)  # mypy error!

Is Team[Manager] a subtype of Team[Employee]? Intuitively it seems like it should be, but in fact it is not! The code above produces the following error:

Argument 1 to "do_team_work" has incompatible type "Team[Manager]"; expected "Team[Employee]"

This is pretty unintuitive. Now it's time to RTFD!

From the mypy docs on variance of generic types:

By default, mypy assumes that all user-defined generics are invariant.

Team is a user-defined generic, so it's invariant. What does that mean?

The very first thing we need to understand is what "generic" means. For the purposes of this discussion, a generic is essentially a container type. A list is a container type: it holds other objects, as in list[str]. A dictionary is also a container type: it holds keys and values of other types, as in dict[str, int].
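As a minimal sketch of what a user-defined generic looks like, here is a hypothetical Box container. The TypeVar T is a placeholder, and each parameterization, such as Box[int] or Box[str], is a distinct type as far as mypy is concerned.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")

class Box(Generic[T]):
    """A hypothetical single-item container, used only for illustration."""

    def __init__(self, item: T) -> None:
        self.item = item

    def get(self) -> T:
        # mypy knows this returns whatever type the Box was parameterized with
        return self.item

int_box = Box[int](42)
str_box = Box("hello")  # mypy infers Box[str] from the argument

lengths: List[int] = [len(str_box.get())]  # OK: get() returns str here
```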

We intuitively understand that a subclass is a subtype of its parent class. For example, in the code below, B is clearly a subtype of A.

class A:
    pass

class B(A):
    pass

"Variance" is about understanding when and why one generic (i.e. container) type is a subtype of another.

To understand this, we have to review the 3 types of variance: invariant, covariant, and contravariant. Here's my simplified version of the mypy docs:

Given these classes:

from typing import Generic, TypeVar

T = TypeVar("T") 

class A:
    pass

class B(A):
    pass

class Thing(Generic[T]):
    pass

Here's how we can think about the variance of some generic type Thing:

| Variance | Rule | Rule, as pseudocode |
|---|---|---|
| covariant | Thing[T] is covariant if Thing[B] is always a subtype of Thing[A] | issubclass(Thing[B], Thing[A]) is True |
| contravariant | Thing[T] is contravariant if Thing[A] is always a subtype of Thing[B] | issubclass(Thing[A], Thing[B]) is True |
| invariant | Thing[T] is invariant if neither of the above is true | issubclass(Thing[B], Thing[A]) is False and issubclass(Thing[A], Thing[B]) is False |

The pseudocode column is conceptual only: at runtime, issubclass() raises a TypeError for parameterized generics such as Thing[A].

This definition is recursive: if A and B are themselves generic types, then knowing whether B is a subtype of A requires applying these same rules to them.

We want Team[B] to be a subtype of Team[A], so, referring to the table above, we need the first kind: covariance. Here's how to declare it:

from typing import Generic, TypeVar

T_co = TypeVar('T_co', covariant=True)

class Employee:
    def work(self) -> None:
        pass

class Manager(Employee):
    def manage(self) -> None:
        pass

class Team(Generic[T_co]):
    pass

def do_team_work(x: Team[Employee]) -> None:
    pass

team = Team[Employee]()
do_team_work(team)
management_team = Team[Manager]()
do_team_work(management_team)  # <-- NO mypy error!

This does not mean that you should use covariant=True for every TypeVar you define! A covariant TypeVar should be reserved for immutable generics: containers whose members cannot be added, removed, or replaced after instantiation. If the container is mutable, covariance subverts the protection that mypy's variance checks provide. This is why Sequence is covariant but List is not: Sequence does not provide any methods for modifying its contents.
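To see what invariance protects against, here is a sketch using the built-in list, which is mutable and therefore invariant. The function names hire_intern and headcount are hypothetical. The type: ignore comment is there only so the snippet passes mypy and can show what would go wrong at runtime; by contrast, a read-only Sequence parameter accepts the same list without complaint.

```python
from typing import List, Sequence

class Employee:
    def work(self) -> str:
        return "working"

class Manager(Employee):
    def manage(self) -> str:
        return "managing"

def hire_intern(team: List[Employee]) -> None:
    # Perfectly legal for a List[Employee]: any Employee may be appended.
    team.append(Employee())

def headcount(team: Sequence[Employee]) -> int:
    # Sequence has no mutating methods, so it can safely be covariant.
    return len(team)

managers: List[Manager] = [Manager()]

# Accepted by mypy: List[Manager] is compatible with Sequence[Employee].
print(headcount(managers))  # 1

# Rejected by mypy (arg-type) because List is invariant. If it were allowed...
hire_intern(managers)  # type: ignore[arg-type]

# ...managers would now hold a plain Employee, and calling
# managers[-1].manage() would raise AttributeError.
print(type(managers[-1]).__name__)  # Employee
```

mypy also enforces the read-only rule for your own classes: it reports "Cannot use a covariant type variable as a parameter" if a covariant TypeVar appears as a method parameter type.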

It's a well-established convention (used in PEP 484's own examples) to give covariant and contravariant TypeVars the _co and _contra suffixes, respectively.
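For completeness, here is a sketch of the opposite case, contravariance, using a hypothetical Reviewer class. A generic that only consumes T values (takes them as arguments) can be contravariant: something able to review any Employee can certainly review a Manager.

```python
from typing import Generic, TypeVar

T_contra = TypeVar("T_contra", contravariant=True)

class Employee:
    pass

class Manager(Employee):
    pass

class Reviewer(Generic[T_contra]):
    """Hypothetical example: T_contra only appears as a parameter type."""

    def review(self, person: T_contra) -> str:
        return f"reviewed {type(person).__name__}"

def review_manager(reviewer: Reviewer[Manager], manager: Manager) -> str:
    return reviewer.review(manager)

general_reviewer = Reviewer[Employee]()

# Accepted by mypy: because Reviewer is contravariant,
# Reviewer[Employee] is a subtype of Reviewer[Manager].
result = review_manager(general_reviewer, Manager())
```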

How to deal with classes that initialize attributes outside of __init__

A class often has a method that refreshes its instance variables, and the principle of code reuse dictates that we also call that method from __init__ to initialize them:

from typing import Tuple

def read_from_db() -> Tuple[int, str]:
    ...

class Person:
    def __init__(self, name: str) -> None:
        self.name = name
        self.age: int = None         # error!
        self.location: str = None    # error!
        self.refresh()

    def refresh(self) -> None:
        self.age, self.location = read_from_db()

mypy produces the following errors:

error: Incompatible types in assignment (expression has type "None", variable has type "int")
error: Incompatible types in assignment (expression has type "None", variable has type "str")

The naive solution is to make self.age and self.location Optional. In this case, however, that is not what we want: in our contrived example, read_from_db() always returns non-None values, and we don't want code that uses Person to add is None checks everywhere it touches these attributes.

Here's one solution:

from typing import Tuple

def read_from_db() -> Tuple[int, str]:
    ...

class Person:
    def __init__(self, name: str) -> None:
        self.name = name
        self.refresh()

    def refresh(self) -> None:
        self.age, self.location = read_from_db()

This works because mypy infers an attribute's type from the first assignment it sees. The downside is that the attributes and their types are no longer front-and-center in __init__ where we expect them, so developers reading your code may miss them unless they go hunting. It's also somewhat brittle: reordering methods during a refactor could make some other line the first assignment of these attributes (not in this example, obviously, since there are only two methods and one assignment per attribute).

Here's an alternative solution:

from typing import Tuple

def read_from_db() -> Tuple[int, str]:
    ...

class Person:
    age: int
    location: str

    def __init__(self, name: str) -> None:
        self.name = name
        self.refresh()

    def refresh(self) -> None:
        self.age, self.location = read_from_db()

This is valid, but we have to be careful when using this technique. We must initialize self.age and self.location as early in the lifecycle of Person as possible, because mypy now assumes they are always set (and never None).
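The reason for this caution is that a bare annotation in the class body only declares a type for mypy; it does not create the attribute. A quick runtime check, as a minimal sketch, shows the difference:

```python
class Person:
    age: int  # declaration only: recorded in __annotations__, no value assigned

p = Person()

# The annotation exists...
print(Person.__annotations__)  # {'age': <class 'int'>}
# ...but the attribute does not, so reading p.age raises AttributeError
# even though mypy is happy with it.
print(hasattr(p, "age"))  # False
```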

Consider this example:

from typing import Tuple

def read_from_db() -> Tuple[int, str]:
    ...

class Person(object):
    age: int
    location: str

    def __init__(self, name: str) -> None:
        self.name = name
        # self.refresh() not called!

    def refresh(self) -> None:
        "You must call this manually!"
        self.age, self.location = read_from_db()

p = Person('chad')
next_year = p.age + 1  # runtime error!

mypy will not complain, because it believes p.age is an int. However, this code fails at runtime with an AttributeError: refresh() was never called, so p.age was never assigned.

When to use assert and typing.cast

typing.cast and assert are easy ways to resolve mypy errors, especially with Optional types, but they should be used as a last resort.

assert can be used to narrow a type in the same way that you can with an if/else statement, but directly in the current scope.

typing.cast does the same thing from mypy's point of view, but it has no runtime effect: it performs no check and simply returns its argument unchanged.

These two approaches should only be used to correct mypy oversights that can't be corrected by other means.

Why? An assert is a sanity check: it means "if everything is working, this statement should never fail." Asserts are also skipped entirely when Python runs with the -O flag. If there is any chance the condition could fail, raise an error with a proper exception type instead.

In the case of typing.cast, mypy will blindly change the type of the variable to whatever you cast it to, and you may be wrong! There is no runtime check to keep you honest. I almost always reserve cast for scenarios involving "imaginary" types that can't be used in an isinstance check, such as TypeVars.
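A minimal sketch of why cast needs care: it neither converts nor validates anything at runtime. Below, the hypothetical parse_port tells mypy its result is an int, but the value is still a string.

```python
from typing import Any, cast

def parse_port(raw: Any) -> int:
    # Wrong on purpose: cast() only changes what mypy believes.
    return cast(int, raw)

port = parse_port("8080")
print(type(port).__name__)  # str, despite the int return annotation
# Arithmetic that mypy happily accepts would fail here:
# port + 1 raises TypeError
```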

Let's consider this example adapted from above:

from typing import Optional, Tuple

def read_from_db() -> Tuple[int, str]:
    ...

class Person(object):

    def __init__(self, name: str) -> None:
        self.name = name
        self.age: Optional[int] = None

    def load_age(self) -> None:
        "You must call this manually!"
        self.age = read_from_db()[0]

    def is_younger_than(self, other_age: int) -> bool:
        self.load_age()
        return self.age < other_age   # mypy error!

self.age is still Optional[int] inside is_younger_than, because mypy does not track how calling another method (here, load_age) changes the state of an attribute.

We can solve this by adding an assert:

    def is_younger_than(self, other_age: int) -> bool:
        self.load_age()
        assert self.age is not None, \
            "self.age is set by load_age"
        return self.age < other_age

When it comes to typing, it is a good habit to add an explanation for every assert or cast. Explain why you believe that the assertion should not fail or the cast is correct at this point in the program's execution. What logical assurances do we have that this will always be safe? If you can't explain it, then you should consider raising a proper exception.

A better alternative is almost always to restructure things so that static typing passes. For example, you could redesign your API so that the values are always set in __init__ (e.g. by providing alternate constructors as classmethods).
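As a minimal sketch of the alternate-constructor idea, assuming a stubbed-out read_from_db whose hard-coded return values are placeholders, a hypothetical from_db classmethod loads the data first and passes complete values to __init__, so age never needs to be Optional:

```python
from typing import Tuple

def read_from_db() -> Tuple[int, str]:
    # Placeholder for a real database query.
    return 42, "Springfield"

class Person:
    def __init__(self, name: str, age: int, location: str) -> None:
        # Every attribute is set here, with its final, non-Optional type.
        self.name = name
        self.age = age
        self.location = location

    @classmethod
    def from_db(cls, name: str) -> "Person":
        # Hypothetical alternate constructor: load first, then construct.
        age, location = read_from_db()
        return cls(name, age, location)

    def is_younger_than(self, other_age: int) -> bool:
        return self.age < other_age  # no assert needed: age is always an int

p = Person.from_db("chad")
```

Callers that already have the data can still use the plain constructor, so tests don't need a database.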

Below is an example of using a cached property instead of an attribute to solve the situation outlined above:

from functools import cached_property  # Python 3.8+
from typing import Tuple

def read_from_db() -> Tuple[int, str]:
    ...

class Person:

    def __init__(self, name: str) -> None:
        self.name = name

    @cached_property
    def age(self) -> int:
        "Loaded from the database on first access, then cached."
        return read_from_db()[0]

    def is_younger_than(self, other_age: int) -> bool:
        return self.age < other_age

Sometimes the cure is worse than the illness, meaning that refactoring the code to remove the type error would be too disruptive. It's up to you to decide whether an assert is the right solution, but use them sparingly.

Here's my order of preference:

  1. Try to solve the problem with better typing. A type error often means that A) the type of the variable is wrong, or B) the type of a function that the variable is being passed to is wrong. If the types don't line up with reality, I try my best to make them. This may require making an argument more permissive (using a protocol), adding overloads to a function, or adding generics to a class.
  2. If option 1 fails, or the added complexity is untenable, but I know the actual type at runtime, I do one of two things:
    1. Use an assert if I'm reasonably certain that the assertion will never fail, with a comment explaining why it will never fail.
    2. Raise a proper exception if there is a chance that it might fail.
  3. As a final fallback, I use typing.cast and provide a comment explaining why the cast is necessary.
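For the second half of option 2 (raising a proper exception), note that an explicit raise narrows the type just as well as an assert does, and unlike an assert it is not skipped under -O. A sketch reusing the Optional age pattern from above, with a hypothetical AgeNotLoadedError:

```python
from typing import Optional

class AgeNotLoadedError(RuntimeError):
    """Hypothetical exception type for this example."""

class Person:
    def __init__(self, name: str) -> None:
        self.name = name
        self.age: Optional[int] = None

    def is_younger_than(self, other_age: int) -> bool:
        if self.age is None:
            raise AgeNotLoadedError(f"age of {self.name!r} has not been loaded")
        # Past the raise, mypy narrows self.age to int.
        return self.age < other_age

p = Person("chad")
try:
    p.is_younger_than(30)
except AgeNotLoadedError as exc:
    print(exc)  # age of 'chad' has not been loaded

p.age = 25
print(p.is_younger_than(30))  # True
```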

I highly recommend enabling --warn-redundant-casts so that mypy notifies you when cast is used on an expression that already has the target type. A redundant cast can mislead developers into thinking that the expression or variable had some type other than the cast type.
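For example, the cast below is a no-op as far as mypy is concerned, because x is already declared as an int, so with the flag enabled mypy reports it:

```python
from typing import cast

def double(x: int) -> int:
    # With --warn-redundant-casts, mypy reports: Redundant cast to "int"
    return cast(int, x) * 2
```

The same setting can also go in a mypy config file as warn_redundant_casts = True.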

Be mindful of tuples: they come in two forms

Tuples are unique in the typing system because they can be used in two different ways:

  1. A specific, bounded, and possibly heterogeneous group of types, e.g. Tuple[str, int] is a 2-tuple containing a string and an integer.
  2. An unbounded sequence of homogeneous types, e.g. Tuple[str, ...] (note the ellipsis!). This is a sequence of strings of arbitrary length.

And of course you can also combine the two: Tuple[Tuple[str, int], ...]
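Here are the two forms side by side, as a small sketch (the variable and function names are arbitrary):

```python
from typing import Tuple

# Form 1: fixed length, per-position types.
pair: Tuple[str, int] = ("age", 42)

# Form 2: any length, every item the same type (note the ellipsis).
names: Tuple[str, ...] = ("ann", "bob", "cy")

# Combined: any number of (str, int) pairs.
rows: Tuple[Tuple[str, int], ...] = (("ann", 30), ("bob", 41))

def longest(items: Tuple[str, ...]) -> str:
    # Accepts a tuple of strings of any length, including an empty one.
    return max(items, key=len, default="")
```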

When mypy infers the type of a tuple literal, it assumes the intent is option 1 by default. In order to tell mypy, "no I actually mean option 2", you must add an annotation. This is particularly annoying with inherited class variables.

class Base(object):
    valid_things = ('thing1',)

class A(Base):
    valid_things = ('thing1', 'thing2')  # error: Incompatible types in assignment (expression has type "Tuple[str, str]", base class "Base" defined the type as "Tuple[str]")

This results in an error because Base.valid_things is Tuple[str] and A.valid_things is Tuple[str, str]. Here's a naive solution to this problem:

from typing import Tuple

class Base(object):
    valid_things: Tuple[str, ...] = ('thing1',)

class A(Base):
    valid_things: Tuple[str, ...] = ('thing1', 'thing2')

Unfortunately, this is rather verbose. A less verbose solution that maintains immutability is to use frozenset (assuming, of course, that order does not matter):


class Base(object):
    valid_things = frozenset(['thing1'])

class A(Base):
    valid_things = frozenset(['thing1', 'thing2'])

Or we could use the faux-immutability approach described in the section "Use abstract types to help enforce immutability":

from typing import Sequence

class Base(object):
    valid_things: Sequence[str] = ['thing1']

class A(Base):
    valid_things: Sequence[str] = ['thing1', 'thing2']

With this solution, you do need to redeclare the attribute type as Sequence[str] on each subclass; otherwise it will become List[str] and will thus be mutable.

Get the benefits of abc.ABCMeta without the metaclass conflicts

The purpose of abc.ABCMeta is to raise an error at runtime if you try to instantiate a class that does not implement all of the methods marked as abstract on its abstract base class. Unfortunately, ABCMeta can cause metaclass conflicts with other classes that use metaclasses, which is particularly irksome with third-party libraries like PyQt and PySide, or with typing.Generic before Python 3.7. The solutions to this can be quite unsavory.

mypy gives us a new solution to this problem: keep using the abc decorators, but drop the abc.ABCMeta metaclass. mypy performs the same checks as abc.ABCMeta, but during static analysis rather than at runtime. Of course, this is only effective if your code is annotated well enough for mypy to track your abstract classes, but when it is, it's a great option.
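A minimal sketch with hypothetical Shape classes: abstractmethod is used without ABCMeta. mypy flags the bad instantiation statically, while at runtime nothing stops it. The type: ignore comment only silences mypy so the snippet type-checks; remove it to see the error.

```python
from abc import abstractmethod

class Shape:  # note: no metaclass=ABCMeta
    @abstractmethod
    def area(self) -> float: ...

class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side * self.side

class Circle(Shape):
    pass  # forgot to implement area()

square = Square(3.0)  # fine, both statically and at runtime

# mypy: Cannot instantiate abstract class "Circle" with abstract attribute "area"
# Without ABCMeta, Python itself raises nothing here.
circle = Circle()  # type: ignore[abstract]
```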


Did I miss anything? Feel free to leave comments with questions or other tips that I missed!
