Chad Dombrova

Posted on Apr 26, 2022 • Edited on Sep 15, 2022

Advanced Static Typing with mypy (part1)

#python #mypy #typing

Lessons learned from 7 years of annotating a large code base

"I don't meditate, I annotate"

Introducing static typing and mypy to a large code base is not for the faint of heart, but the benefits are worth the effort:

release fewer bugs
make large refactors with confidence
navigate codebase with ease
rapidly onboard new developers

Once you've worked with a well-typed Python project, working with "naked" code is painful. In fact, you may find, like I did, that annotating code is a rewarding puzzle that can be quite meditative.

This article is not an introduction to static typing; there are other great resources for that. It is adapted from an internal wiki that I wrote over several years of collecting solutions to common problems. Eventually I came to realize that these tips represent a new set of idioms, conventions, best practices, and design patterns for working with typed code that (to my knowledge) haven't been well documented and disseminated yet. Hopefully you'll find them useful.

You may also find a use for some of the tools I've made along the way:

typewriter: generate annotations from various sources
mypy-runner: wrapper around mypy that gives more control over filtering

Also, the second part of this series is here.

Enough foreplay, let's get started!

Use concrete types for function results, and abstract types for arguments

If you read only one part of this article, it should be this.

When adding type annotations for containers (e.g. list, set, tuple, dict), you can increase the flexibility of your code by specifying the capabilities that your function requires of each argument, rather than using a specific type: this is known as "structural subtyping". The easiest way to do this is by using the generic collections in collections.abc, which are also available in the typing module.

In the collections.abc docs you'll see that each abstract collection class corresponds to one or more underlying special methods. For example, Sized requires __len__, which enables len(x), and Container requires __contains__, which enables y in x. For the simplest of these classes, such as Sized, Container, and Iterable, an object is considered an instance if it implements the required methods.
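
As a quick runtime illustration of that last point, here is a minimal sketch. The Bag class is hypothetical and exists only for this example; it defines __len__ and __contains__ but deliberately leaves out __iter__.

```python
from collections.abc import Container, Iterable, Sized


class Bag:
    """Hypothetical class: defines __len__ and __contains__, but not __iter__."""

    def __init__(self, *items: str) -> None:
        self._items = set(items)

    def __len__(self) -> int:
        return len(self._items)

    def __contains__(self, item: object) -> bool:
        return item in self._items


bag = Bag("a", "b")

# Bag never inherits from these classes; defining the methods is enough.
is_sized = isinstance(bag, Sized)          # Bag defines __len__
is_container = isinstance(bag, Container)  # Bag defines __contains__
is_iterable = isinstance(bag, Iterable)    # Bag does not define __iter__
```

Here is_sized and is_container are True, while is_iterable is False because nothing in Bag provides __iter__.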

Consider this example:

from typing import List

def get_unique(values: List[str], invalid: List[str]) -> List[str]:
    return [x for x in values if x not in invalid]

The annotations for the arguments are overly specific: values and invalid don't need to be lists. This function requires only that values implements __iter__ and that invalid implements __contains__. We can refer to the collections.abc documentation to see that we can use Iterable and Container respectively:

from typing import Container, Iterable, List

def get_unique(values: Iterable[str], invalid: Container[str]) -> List[str]:
    return [x for x in values if x not in invalid]
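
To see what the looser annotations buy us, here is a small usage sketch. The argument values are made up for illustration; the point is that tuples, generators, sets, frozensets, and even dicts are all accepted without any conversion to list.

```python
from typing import Container, Iterable, List


def get_unique(values: Iterable[str], invalid: Container[str]) -> List[str]:
    return [x for x in values if x not in invalid]


# A tuple is Iterable, a set is a Container.
from_tuple = get_unique(("a", "b", "c"), {"b"})

# A generator is Iterable, a frozenset is a Container.
from_generator = get_unique((c for c in "abca"), frozenset({"a"}))

# A dict is a Container of its keys.
from_dict = get_unique(["x", "y"], {"y": 1})
```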

Does that mean we should also use an abstract type for the return type? The answer is usually "no". There are specific situations where using abstract rather than concrete result types is beneficial, such as to fake immutability (see next section) or to satisfy Liskov, but in most cases, when you use a more abstract type for return annotations you're withholding information from the type system and other developers. In our example above, the result is always a list, and there's no advantage in pretending that it's not. Doing so might encourage users of your function to needlessly re-cast the result to a list, for example.

Note, if the return type varies based on an argument type, use a TypeVar and/or @typing.overload to describe those relationships.
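
A minimal sketch of the TypeVar case, using a hypothetical helper named first: the element type of the argument flows through to the result, so no Union or cast is needed at the call site.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def first(values: Sequence[T]) -> T:
    """Return the first item; the result type follows the element type."""
    return values[0]


name = first(["ann", "bob"])  # a type checker infers str here
count = first((1, 2, 3))      # and int here
```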

Use abstract types to help enforce immutability

Using abstract container types has another benefit.

Consider this example:

from functools import lru_cache
from typing import List, Set

@lru_cache(None)
def get_value() -> List[Set[int]]:
    return [{1, 2}, {3}]

x = get_value()
x.append({4})  # oops, we're appending to the cached value!

Caching decorators like lru_cache will return the same object on successive calls to the same function, and our example above demonstrates how a developer might unknowingly alter the cached value, thinking they have a unique copy. We can protect against this by using abstract containers that don't support mutation.

from functools import lru_cache
from typing import AbstractSet, Sequence

@lru_cache(None)
def get_value() -> Sequence[AbstractSet[int]]:
    return [{1, 2}, {3}]

x = get_value()
x.append({4})  # mypy errors here: no append method
x[0].add(4)  # mypy errors here: no add method

Replacing list with tuple and set with frozenset would achieve the same goal, but not every container has a concrete immutable alternative, and it's often easier to strip away capabilities at the typing level, rather than modifying our code to recursively convert mutable types to immutable types.

Note that if you have a nested data structure, as above, and you want to use typing to enforce immutability, you need to declare each type in the hierarchy as immutable. Immutability is not recursively granted by the parent container to its children -- a type only affects the attributes and methods of itself.

Here's a list of concrete types and some abstract types you can use to enforce immutability.

concrete type	immutable alternative	abstract immutable type
`dict`	?	`Mapping`
`list`	`tuple` / `Tuple`	`Sequence`
`set`	`frozenset` / `FrozenSet`	`AbstractSet`

Don't obscure the use of `isinstance` / `issubclass` / `is type`.

mypy inspects conditional blocks (and asserts) that use isinstance, issubclass, is type (including is None) in order to track the types of variables as they enter new scopes.
As a result, you should avoid using helper functions that wrap these calls when checking types, or if you do, you should use typing.TypeGuard (see below).

For example, at the time of this writing, mypy does not support inspect.isclass():

if inspect.isclass(x) and issubclass(x, Asset):
    ...

Here's what you should do instead:

if isinstance(x, type) and issubclass(x, Asset):
    ...

Another common way to confuse mypy is storing the result of a type check to a variable:

is_asset = isinstance(x, Asset)
if is_asset:
    ...  # do stuff
else:
    ...  # do stuff

The solution is simply to perform the check on the if line:

if isinstance(x, Asset):
    is_asset = True
    ...  # do stuff
else:
    is_asset = False
    ...  # do stuff

Likewise, when designing APIs, avoid providing methods that distinguish between types, in order to encourage use of isinstance checks.

Here's another example of an idiom that mypy does not understand:

if None not in (x, y, z):
    result = x + y + z

Instead, you need to be more explicit:

if x is not None and y is not None and z is not None:
    result = x + y + z

Or, depending on your circumstances, you might be able to do something like this:

inputs = (x, y, z)
valid = [v for v in inputs if v is not None]
if len(valid) == len(inputs):
    result = sum(valid)

If you do obscure the use of `isinstance` / `issubclass` / `is type`, then use `TypeGuard`

Here's a giant caveat to everything in the last section. A recent addition to typing provides a way to write functions that can be used to narrow the type of a variable in a way that has previously only been possible with builtin functions like isinstance that mypy supports explicitly. The new functionality is called a TypeGuard. This is covered here in the mypy docs.

Here's a simple recipe that's very handy:

from typing import Any, Type, TypeGuard, TypeVar
T = TypeVar("T")

def safe_issubclass(obj: Any, typ: Type[T]) -> TypeGuard[Type[T]]:
    return isinstance(obj, type) and issubclass(obj, typ)

Avoid writing functions that return different types based on an argument

This is a big no-no in static type checking. The naive approach is to use a Union result:

from typing import Union

def add(x: int, y: int, as_str: bool = False) -> Union[str, int]:
    result = x + y
    if as_str:
        return str(result)
    return result

value = add(1, 2)
print(value + 10)  # error! can't add str and int

Using a Union in this situation undermines the type checker for everything downstream of this call.

It can sometimes be possible to solve this using @typing.overload, but it is a verbose and overly complicated solution.

from typing import overload, Literal, Union

@overload
def add(x: int, y: int, as_str: Literal[False] = False) -> int:
    pass

@overload
def add(x: int, y: int, as_str: Literal[True]) -> str:
    pass

def add(x: int, y: int, as_str: bool = False) -> Union[str, int]:
    result = x + y
    if as_str:
        return str(result)
    return result

value = add(1, 2)
print(value + 10)  # success!  mypy knows that value is an int

The simplest solution is to provide two different functions, ideally where one calls the other and then performs some final type adjustments, so that you can avoid duplicating code.
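
Applied to the add example, that might look like the sketch below. The name add_as_str is made up for this illustration.

```python
def add(x: int, y: int) -> int:
    return x + y


def add_as_str(x: int, y: int) -> str:
    # Reuse add() and only adjust the type at the end.
    return str(add(x, y))


value = add(1, 2)
total = value + 10        # fine: value is always an int
label = add_as_str(1, 2)  # always a str
```

Each function has exactly one return type, so neither Union nor overloads are needed.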

Use annotations to "relax" inferred types when they're too restrictive

mypy assigns a type to a variable based on its value on the first line on which it is defined. This initial type is often too narrow, because the type of the variable may be a work-in-progress.

Here's an example where we build up a dictionary of keyword args to pass to a function, which is a fairly common practice:

import subprocess
from typing import Dict, Iterable, Optional, Union

def run(args: Union[str, Iterable[str]], env: Optional[Dict[str, str]] = None) -> None:
    kwargs = {"shell": isinstance(args, str)}  # <-- inferred type is Dict[str, bool] !!!
    if env:
        kwargs["env"] = env  # error! env is not a bool
    subprocess.call(args, **kwargs)

mypy produces the following errors:

mypytest.py:11: error: Incompatible types in assignment (expression has type "Dict[str, str]", target has type "bool")
mypytest.py:12: error: Argument 2 to "call" has incompatible type "**Dict[str, bool]"; expected "Union[str, unicode]"
mypytest.py:12: error: Argument 2 to "call" has incompatible type "**Dict[str, bool]"; expected "Callable[[], Any]"
mypytest.py:12: error: Argument 2 to "call" has incompatible type "**Dict[str, bool]"; expected "Union[Mapping[str, Union[str, unicode]], Mapping[unicode, Union[str, unicode]]]"

The first error is the real culprit; the others are collateral damage. What's happened here is that mypy has determined that the variable kwargs is Dict[str, bool] based on its first assignment, so it complains when trying to add env to the dict, because it is restricted to bool values.

To solve this, we just need to relax the type of the kwargs dictionary from its inferred type of Dict[str, bool] to Dict[str, Any]:

import subprocess
from typing import Any, Dict, Iterable, Optional, Union

def run(args: Union[str, Iterable[str]], env: Optional[Dict[str, str]] = None) -> None:
    kwargs: Dict[str, Any] = {"shell": isinstance(args, str)}
    if env:
        kwargs["env"] = env
    subprocess.call(args, **kwargs)

I do not recommend using a Union for the value component of a dictionary (e.g. Dict[str, Union[int, bool]]), especially not for one that will be used with **: it causes far more trouble than it's worth. Use Dict[str, Any] instead.

It's important to realize that using ** arg expansion undermines mypy, because it's not able to connect the dots between the types of the individual keys in the expanded dictionary and the keyword args in the function being called (you could with a TypedDict, but in most cases that's not worth the extra work). So, in our example above, we could set the type wrong for kwargs["env"] and mypy would not warn us.
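
For the cases where the extra work is justified, here is a rough sketch of the TypedDict approach. LaunchOptions and launch are hypothetical; launch is a stand-in for a function like subprocess.call so that the example has no side effects.

```python
from typing import Dict, Optional, TypedDict


class LaunchOptions(TypedDict, total=False):
    """Hypothetical description of the keyword args we intend to pass."""
    shell: bool
    env: Dict[str, str]


def launch(command: str, shell: bool = False,
           env: Optional[Dict[str, str]] = None) -> str:
    # Hypothetical stand-in for the real call; it just reports its inputs.
    return f"{command} shell={shell} env={env}"


options: LaunchOptions = {"shell": True}
options["env"] = {"LANG": "C"}  # checked against the declared type of "env"
summary = launch("ls", **options)
```

Because each key has its own declared type, a type checker can flag a value such as options["env"] = True, which Dict[str, Any] would silently accept.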

The ideal solution, when possible, is to avoid ** expansion and instead pass values directly. This often requires doing a little digging to find the default arguments for the function that's being called to avoid the need for conditionally passing arguments:

import subprocess
from typing import Dict, Iterable, Optional, Union

def run(args: Union[str, Iterable[str]], env: Optional[Dict[str, str]] = None) -> None:
    subprocess.call(args, shell=isinstance(args, str), env=env)

This problem also rears its head with variables that we want to be a Union of types:

if whatever:
    foo = 2
else:
    foo = 'some string'

This produces the dreaded Incompatible types in assignment error. To avoid this, we need to declare the real type on the first line that foo is defined:

if whatever:
    foo: Union[int, str] = 2
else:
    foo = 'some string'
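
A variation some people find easier to read is to declare the variable without a value before the branches, so that the annotation is not tucked inside one arm of the if. This is a sketch; pick is a hypothetical function wrapped around the same logic.

```python
from typing import Union


def pick(use_number: bool) -> Union[int, str]:
    foo: Union[int, str]  # declaration only; no value is assigned yet
    if use_number:
        foo = 2
    else:
        foo = "some string"
    return foo


number = pick(True)
text = pick(False)
```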

Consider returning an optional tuple instead of a tuple with optional values

It's quite common for a function to return None as a signal that a minor failure occurred: i.e. the correct value could not be retrieved. In the case of a function that returns a tuple, prior to adopting static typing, I would have typically returned None for each item in the tuple. For example:

def get_dimensions() -> Tuple[Optional[int], Optional[int], Optional[int]]:
    if is_valid_context():
        return 10, 20, 30
    else:
        return None, None, None

This is nice because we can immediately unpack the results:

x, y, z = get_dimensions()
if x is not None:
    print(x + y + z)  # ERROR: y and z might be None

As humans reading this code we can reason that if one of these is None then they all are, or if any of them is not None, then they are all not None. But to a type checker each None-state is independent.

This can create some painfully verbose type-checking scenarios:

x, y, z = get_dimensions()
if x is not None and y is not None and z is not None:
    print(x + y + z)

The most technically correct solution is to use typing.overload:

from typing import overload, Optional, Tuple

@overload
def get_dimensions() -> Tuple[int, int, int]:
    pass

@overload
def get_dimensions() -> Tuple[None, None, None]:
    pass

def get_dimensions() -> Tuple[Optional[int], Optional[int], Optional[int]]:
    if is_valid_context():
        return 10, 20, 30
    else:
        return None, None, None

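The intent of these overloads is to tell mypy that there are only two scenarios: either all the items are None or they're all int. Be aware, though, that overloads are selected by argument types, so this only works when the arguments distinguish the two cases; when the parameter lists are identical, as they are here, mypy reports the signatures as overlapping instead of choosing between them. It's also a pretty verbose solution, and particularly unappealing for functions with many arguments, and thus many annotations to keep up to date.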

Note that the overloads establish the typing for get_dimensions for the outside world -- code that calls get_dimensions -- but they do not affect the analysis of the internals of get_dimensions itself. Only the typing of the final, non-overloaded function applies to the internals of the function implementation.

Another solution is to make the entire tuple optional:

def get_dimensions() -> Optional[Tuple[int, int, int]]:
    if is_valid_context():
        return 10, 20, 30
    else:
        return None

Then use the returned result like this:

dimensions = get_dimensions()
if dimensions is not None:
    x, y, z = dimensions
    print(x + y + z)

This would not normally be my favorite choice because it often requires an extra variable and line to unpack it, but it seems to be the lesser of the evils.

In Python 3.8 and later this can be made even more concise with the walrus operator:

if (dimensions := get_dimensions()) is not None:
    x, y, z = dimensions
    print(x + y + z)

Techniques to reuse variables

A common complaint about mypy is that it forces developers to create additional variables "needlessly" just to avoid mypy errors about incompatible types.

As mentioned before, mypy assigns a type to a variable based on its value on the first line on which it is defined. If you assign a new value to a variable it must be the same as the original type, or narrower.

Note that this behavior can be disabled using the mypy flag --allow-redefinition, though I honestly don't recommend it.

Narrow the type within scopes

You can always make the type of a variable more specific. Here are some examples of valid narrowing of types:

kind of narrowing	example
base to subclass	`basestring -> str`
union pruning	`Union[str, int] -> int`
abstract to concrete	`Iterable[int] -> List[int]`

ways to narrow
`isinstance(x, y)`
`issubclass(x, y)`
`type(x) is y`
`typing.cast(y, x)`
`typing.TypeGuard[y]`
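
Here is a small sketch that uses three of these in one function. The describe function is hypothetical and exists only for this illustration.

```python
from typing import List, Union, cast


def describe(value: Union[int, str, List[int]]) -> str:
    if isinstance(value, str):
        # isinstance narrows value to str in this block
        return value.upper()
    if type(value) is int:
        # an exact type check narrows value to int in this block
        return str(value + 1)
    # cast is an unchecked assertion to the type checker; it does nothing at runtime
    numbers = cast(List[int], value)
    return ",".join(str(n) for n in numbers)


shout = describe("ab")
bumped = describe(1)
joined = describe([1, 2])
```

Prefer isinstance where possible: cast silences the checker without verifying anything, so a wrong cast only surfaces at runtime.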

Here's an example of successful union narrowing:

import subprocess
from typing import Any, Dict, Optional, List, Union

def run(args: Union[str, List[str]], env: Optional[Dict[str, str]] = None) -> None:
    kwargs: Dict[str, Any] = {
        "shell": False,
    }
    if isinstance(args, str):
        # in this scope, args is understood to be a str.
        # now we convert args from str to List[str].
        args = args.split()
        # the str case of Union[str, List[str]] is now removed
    # at this point args can now only be List[str]
    if env:
        kwargs["env"] = env
    subprocess.call(args, **kwargs)

Above, we've successfully narrowed args from Union[str, List[str]] to List[str].

Return early

Another way to avoid variable proliferation is to look for opportunities to return early. Since Python is so flexible, I often see if/else blocks that bind different types to the same variable before returning it:

def letter_list(s: str, strip_last: bool = False) -> Optional[List[str]]:
    if s:
        result = list(s)
        if strip_last:
            result = result[:-1]
    else:
        result = None   # error: incompatible type
    return result

If we try to type result it leads to more problems:

def letter_list(s: str, strip_last: bool = False) -> Optional[List[str]]:
    if s:
        result: Optional[List[str]] = list(s)
        if strip_last:
            result = result[:-1]   # error: None does not support indexing
    else:
        result = None
    return result

The simplest solution is sometimes to return as soon as you can:

def letter_list(s: str, strip_last: bool = False) -> Optional[List[str]]:
    if s:
        result = list(s)
        if strip_last:
            result = result[:-1]
        return result
    else:
        return None

When to use a new variable name

Sometimes using a new variable is unavoidable.

Here's an example of that kind of failure:

import subprocess
from typing import Any, Dict, Optional

def run(args: str, env: Optional[Dict[str, str]] = None) -> None:
    kwargs: Dict[str, Any] = {
        "shell": False,
    }
    args = args.split()  # error: can't reassign the type of args!
    args.insert(0, 'xdg-open')
    if env:
        kwargs["env"] = env
    subprocess.call(args, **kwargs)

The problem line is args = args.split() where we try to convert args from a str to List[str].

Instead, we need to use different variables for each type:

import subprocess
from typing import Any, Dict, Optional

def run(args: str, env: Optional[Dict[str, str]] = None) -> None:
    kwargs: Dict[str, Any] = {
        "shell": False,
    }
    parts = args.split()
    parts.insert(0, 'xdg-open')
    if env:
        kwargs["env"] = env
    subprocess.call(parts, **kwargs)

Generally speaking, I find this to be a good thing, as usually when data changes type, it changes meaning as well.

If you enjoyed this, check out the second part, and feel free to leave comments with questions or other tips that I missed!

Top comments (2)

Richie • Sep 14 '22

Do you have any general guidance on type hinting class instance types that would:

avoid circular import
reduce import costs at runtime

Is this the way to go?
docs.python.org/3/library/typing.h...

Chad Dombrova • Sep 15 '22

Yes, if TYPE_CHECKING is the way to go. Note that if False also works, if you want to completely avoid importing the typing module.
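
A minimal sketch of the if TYPE_CHECKING pattern, assuming a hypothetical format_price function and using decimal.Decimal as the stand-in for a class that is expensive or circular to import:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by the type checker only; this import never runs at runtime.
    from decimal import Decimal


def format_price(price: "Decimal") -> str:
    # The annotation is a string, so Decimal need not exist at runtime.
    return f"${price:.2f}"
```

The quoted annotation is what makes this work: the name is resolved by the type checker but never evaluated when the module is imported.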
