Read the original on my personal blog
Mypy is a static type checker for Python. It acts as a linter that lets you write statically typed code and verify the soundness of your types.
All mypy does is check your type hints. It's not like TypeScript, which needs to be compiled before it can work. All mypy code is valid Python, no compiler needed.
This gives us the advantage of having types, as you can know for certain that there is no type-mismatch in your code, just as you can in typed, compiled languages like C++ and Java, but you also get the benefit of being Python 🐍✨ (you also get other benefits like null safety!)
For a more detailed explanation of what types are useful for, head over to the blog I wrote previously: Does Python need types?
This article is going to be a deep dive for anyone who wants to learn about mypy, and all of its capabilities.
If you haven't noticed the article length, this is going to be long. So grab a cup of your favorite beverage, and let's get straight into it.
Index
- Setting up mypy
- Primitive types
- Collection types
- Type debugging - Part 1
- Union and Optional
- Any type
- Miscellaneous types
- Typing classes
- Typing namedtuples
- Typing decorators
- Typing generators
- Typing *args and **kwargs
- Duck types
- Function overloading with @overload
- Type type
- Typing pre-existing projects
- Type debugging - Part 2
- Typing context managers
- Typing async functions
- Making your own Generics
- Advanced/Recursive type checking with Protocol
- Further learning
Setting up mypy
All you really need to do to set it up is pip install mypy.
Using mypy in the terminal
Let's create a regular Python file, and call it test.py:
This doesn't have any type definitions yet, but let's run mypy over it to see what it says.
$ mypy test.py
Success: no issues found in 1 source file
🤨
Don't worry though, it's nothing unexpected. As explained in my previous article, mypy doesn't force you to add types to your code. But, if it finds types, it will evaluate them.
This can definitely lead to mypy missing entire parts of your code just because you accidentally forgot to add types.
Thankfully, there are ways to customise mypy so that it always checks for missing annotations:
$ mypy --disallow-untyped-defs test.py
test.py:1: error: Function is missing a return type annotation
Found 1 error in 1 file (checked 1 source file)
And now it gave us the error we wanted.
There are a lot of these --disallow- arguments that we should be using if we are starting a new project to prevent such mishaps, but mypy gives us an extra powerful one that does it all: --strict
This gave us even more information: the fact that we're using give_number in our code, which doesn't have a defined return type, so that piece of code can also have unintended issues.
TL;DR: for starters, use mypy --strict filename.py
Using mypy in VSCode
VSCode has pretty good integration with mypy. All you need to get mypy working with it is to add this to your settings.json:
...
"python.linting.mypyEnabled": true,
"python.linting.mypyArgs": [
"--ignore-missing-imports",
"--follow-imports=silent",
"--show-column-numbers",
"--strict"
],
...
Now opening your code folder in VSCode should show you the exact same errors in the "Problems" pane:
Also, if you're using VSCode, I'd highly suggest installing Pylance from the Extensions panel. It'll help a lot with tab-completion and getting better insight into your types.
Okay, now on to actually fixing these issues.
Primitive types
The most fundamental types that exist in mypy are the primitive types. To name a few:
- int
- str
- float
- bool
- ...
Notice a pattern?
Yup. These are the same exact primitive Python data types that you're familiar with.
And these are actually all we need to fix our errors:
All we've changed is the function's definition in def:
What this says is "function double takes an argument n which is an int, and the function returns an int".
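As a rough sketch, the annotated file could look something like this. The names double and num come from the surrounding text; the function body and the argument 21 are assumptions, chosen so that the script prints 42.

```python
# test.py (reconstructed sketch; the body of double is an assumption)
def double(n: int) -> int:
    """Return twice the given integer."""
    return n * 2


num = double(21)  # no annotation here: mypy infers int from double's return type
print(num)
```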
And running mypy now:
$ mypy --strict test.py
Success: no issues found in 1 source file
Congratulations, you've just written your first type-checked Python program 🎉
We can run the code to verify that it does indeed work:
$ python test.py
42
I should clarify that mypy does all of its type checking without ever running the code. It is what's called a static analysis tool (this static is different from the static in "static typing"), which means it works not by running your Python code, but by evaluating your program's structure. So if your program does interesting things like making API calls, or deleting files on your system, you can still run mypy over your files and it will have no real-world effect.
What is interesting to note is that we have declared num in the program as well, but we never told mypy what type it is going to be, and yet it still worked just fine.
We could tell mypy what type it is, like so:
And mypy would be equally happy with this as well. But we don't have to provide this type, because mypy knows its type already. Because double is only supposed to return an int, mypy inferred it:
And inference is cool. For 80% of the cases, you'll only be writing types for function and method definitions, as we did in the first example. One notable exception to this is "empty collection types", which we will discuss now.
Collection types
Collection types are how you're able to add types to collections, such as "a list of strings", or "a dictionary with string keys and boolean values", and so on.
Some collection types include:
- List
- Dict
- Set
- DefaultDict
- Deque
- Counter
Now these might sound very familiar, but they aren't the same as the builtin collection types (more on that later).
These are all defined in the typing module that comes built-in with Python, and there's one thing that all of these have in common: they're generic.
I have an entire section dedicated to generics below, but what it boils down to is that "with generic types, you can pass types inside other types". Here's how you'd use collection types:
This tells mypy that nums should be a list of integers (List[int]), and that average returns a float.
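A minimal sketch of such a function, assuming a plain arithmetic mean (the names average and nums are from the text; the body is a guess at the simplest implementation):

```python
from typing import List


def average(nums: List[int]) -> float:
    """Return the arithmetic mean of a non-empty list of integers."""
    return sum(nums) / len(nums)


print(average([1, 2, 3, 4]))
```

Passing something like average(["1", "2"]) would still run up to the point of failure in Python, but mypy would flag it before the code is ever executed.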
Here's a couple more examples:
Remember when I said that empty collections are one of the rare cases that need to be typed? This is because there's no way for mypy to infer the types in that case:
Since the set has no items to begin with, mypy can't statically infer what type it should be.
PS: Small note, if you try to run mypy on the piece of code above, it'll actually succeed. It's because the mypy devs are smart, and they added simple cases of look-ahead inference. Meaning, new versions of mypy can figure out such types in simple cases. Keep in mind that it doesn't always work.
To fix this, you can manually add in the required type:
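For illustration, here is one way the annotation on an empty collection might look. The variable names are hypothetical and not taken from the original example.

```python
from typing import Dict, List, Set

unique_ids: Set[int] = set()    # an empty set that will hold ints
names: List[str] = []           # an empty list that will hold strings
scores: Dict[str, float] = {}   # an empty mapping from str to float

unique_ids.add(7)
names.append("mypy")
scores["mypy"] = 9.5
```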
Note: Starting from Python 3.7, you can add a future import, from __future__ import annotations, at the top of your files, which will allow you to use the builtin types as generics, i.e. you can use list[int] instead of List[int]. If you're using Python 3.9 or above, you can use this syntax without needing the __future__ import at all. However, there are some edge cases where it might not work, so in the meantime I'll suggest using the typing.List variants. This is detailed in PEP 585.
Type debugging - Part 1
Let's say you're reading someone else's — or your own past self's — code, and it's not really apparent what the type of a variable is. The code is using a lot of inference, and it's using some builtin methods that you don't exactly remember how they work, bla bla.
Thankfully mypy lets you reveal the type of any variable by using reveal_type:
Running mypy on this piece of code gives us:
$ mypy --strict test.py
test.py:12: note: Revealed type is 'builtins.int'
Ignore the builtins for now, it's able to tell us that counts here is an int.
Cool, right? You don't need to rely on an IDE or VSCode and hover over a variable to check its type. A simple terminal and mypy is all you need. (Although VSCode internally uses a similar process to this to get all type information.)
However, some of you might be wondering where reveal_type came from. We didn't import it from typing... is it a new builtin? Is that even valid in python?
And sure enough, if you try to run the code:
$ py test.py
Traceback (most recent call last):
  File "/home/tushar/code/test/test.py", line 12, in <module>
    reveal_type(counts)
NameError: name 'reveal_type' is not defined
reveal_type is a special "mypy function". Since python doesn't know about types (type annotations are ignored at runtime), only mypy knows about the types of variables when it runs its type checking. So, only mypy can work with reveal_type.
All this means is that you should only use reveal_type to debug your code, and remove it when you're done debugging.
Union and Optional
So far, we have only seen variables and collections that can hold only one type of value. But what about this piece of code?
What's the type of fav_color in this code?
Let's try to do a reveal_type:
BTW, since this function has no return statement, its return type is None.
Running mypy on this:
$ mypy test.py
test.py:5: note: Revealed type is 'Union[builtins.str*, None]'
And we get one of our two new types: Union. Specifically, Union[str, None].
All this means is that fav_color can be one of two different types, either str, or None.
And unions are actually very important for Python, because of how Python does polymorphism. Here's a simpler example:
$ python test.py
Hi!
This is a test
of polymorphism
Now let's add types to it, and learn some things by using our friend reveal_type:
Can you guess the output of the reveal_types?
$ mypy test.py
test.py:4: note: Revealed type is 'Union[builtins.str, builtins.list[builtins.str]]'
test.py:8: note: Revealed type is 'builtins.list[builtins.str]'
test.py:11: note: Revealed type is 'builtins.str'
Mypy is smart enough that if you add an isinstance(...) check to a variable, it will correctly assume that the type inside that block is narrowed to that type.
In our case, item was correctly identified as List[str] inside the isinstance block, and str in the else block.
This is an extremely powerful feature of mypy, called Type narrowing.
Now, here's a more contrived example, a type-annotated Python implementation of the builtin function abs:
And that's everything you need to know about Union.
... so what's Optional, you ask?
Well, Union[X, None] seemed to occur so commonly in Python that they decided it needs a shorthand. Optional[str] is just a shorter way to write Union[str, None].
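To tie Union, type narrowing and Optional together, here is a small sketch. The function names describe and favorite_color are hypothetical, made up for this illustration.

```python
from typing import Dict, List, Optional, Union


def describe(item: Union[str, List[str]]) -> str:
    """Join a list of strings, or return a single string unchanged."""
    if isinstance(item, list):
        # Inside this branch mypy narrows `item` to List[str].
        return ", ".join(item)
    # Here mypy has narrowed `item` to str.
    return item


def favorite_color(colors: Dict[str, str], name: str) -> Optional[str]:
    """Return the stored color for `name`, or None if there is none."""
    return colors.get(name)


print(describe(["red", "green"]))
print(favorite_color({"tushar": "blue"}, "someone else"))
```

Because favorite_color returns Optional[str], mypy will insist that callers handle the None case before calling string methods on the result.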
Any type
If you ever try to run reveal_type inside an untyped function, this is what happens:
The revealed type is reported as Any.
Any just means that anything can be passed here. Whatever is passed, mypy should just accept it. In other words, Any turns off type checking.
Of course, this means that if you want to take advantage of mypy, you should avoid using Any as much as you can.
But since Python is inherently a dynamically typed language, in some cases it's impossible for you to know what the type of something is going to be. For such cases, you can use Any. For example:
You can also use Any as a placeholder value for something while you figure out what it should be, to make mypy happy in the meanwhile. But make sure to get rid of the Any if you can.
Miscellaneous types
Tuple
You might think of tuples as an immutable list, but Python thinks of it in a very different way.
Tuples are different from other collections, as they are essentially a way to represent a collection of data points related to an entity, kinda similar to how a C struct is stored in memory. While other collections usually represent a bunch of objects, tuples usually represent a single object.
A good example is sqlite:
Tuples also come in handy when you want to return multiple values from a function, for example:
Because of these reasons, tuples tend to have a fixed length, with each index having a specific type. (Our sqlite example had a tuple of length 3, with types int, str and int respectively.)
Here's how you'd type a tuple:
However, sometimes you do have to create variable length tuples. You can use the Tuple[X, ...] syntax for that.
The ... in this case simply means there's a variable number of elements in the tuple, but their type is X. For example:
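A short sketch of both forms side by side. The Row alias and the min_max function are hypothetical names used only for this illustration.

```python
from typing import Tuple

# Fixed length: exactly (int, str, int), like one row of a hypothetical table.
Row = Tuple[int, str, int]


def min_max(nums: Tuple[int, ...]) -> Tuple[int, int]:
    """Take a tuple of any length and return a fixed-length (min, max) pair."""
    return min(nums), max(nums)


row: Row = (1, "Alice", 30)
print(row)
print(min_max((3, 1, 4, 1, 5)))
```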
TypedDict
A TypedDict is a dictionary whose keys are always strings, and whose values are of the specified type. At runtime, it behaves exactly like a normal dictionary.
By default, all keys must be present in a TypedDict. It is possible to override this by specifying total=False.
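As an illustrative sketch (the Movie and PartialMovie classes are made-up examples, and typing.TypedDict assumes Python 3.8 or newer):

```python
from typing import TypedDict


class Movie(TypedDict):
    title: str
    year: int


class PartialMovie(TypedDict, total=False):
    title: str
    year: int


movie: Movie = {"title": "Blade Runner", "year": 1982}
draft: PartialMovie = {"title": "Untitled"}  # "year" may be omitted here

print(type(movie))  # a plain dict at runtime
```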
Literal
A Literal represents the type of a literal value. You can use it to constrain already existing types like str and int to just some specific values of them. Like so:
$ mypy test.py
test.py:7: error: Argument 1 to "i_only_take_5" has incompatible type "Literal[6]"; expected "Literal[5]"
This has some interesting use-cases. A notable one is to use it in place of simple enums:
$ mypy test.py
test.py:8: error: Argument 1 to "make_request" has incompatible type "Literal['DLETE']"; expected "Union[Literal['GET'], Literal['POST'], Literal['DELETE']]"
Oops, you made a typo in 'DELETE'! Don't worry, mypy saved you an hour of debugging.
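A sketch of how such a function might be declared. The name make_request and the three method names come from the error message above; the HttpMethod alias, the url parameter and the function body are assumptions. typing.Literal assumes Python 3.8 or newer.

```python
from typing import Literal

HttpMethod = Literal["GET", "POST", "DELETE"]


def make_request(method: HttpMethod, url: str) -> str:
    """Pretend to send a request; returns a description instead."""
    return f"{method} {url}"


print(make_request("DELETE", "https://example.com/items/1"))
# make_request("DLETE", ...) would be rejected by mypy, even though Python
# itself would happily run it.
```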
Final
Final is an annotation that declares a variable as final. What that means is that the variable cannot be re-assigned. This is similar to final in Java and const in JavaScript.
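A tiny sketch of what that looks like in practice. MAX_RETRIES and attempts_left are hypothetical names, and typing.Final assumes Python 3.8 or newer.

```python
from typing import Final

MAX_RETRIES: Final = 3  # mypy infers int and forbids re-assignment


def attempts_left(used: int) -> int:
    """Return how many retries remain."""
    return MAX_RETRIES - used


print(attempts_left(1))
# MAX_RETRIES = 5  # mypy would flag this line; Python itself would not stop it
```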
NoReturn
NoReturn is an interesting type. It's rarely ever used, but it still needs to exist, for that one time where you might have to use it.
There are cases where you can have a function that might never return. Two possible reasons that I can think of for this are:
- The function always raises an exception, or
- The function is an infinite loop.
Here's an example of both:
Note that in both these cases, typing the function as -> None will also work. But if you intend for a function to never return anything, you should type it as NoReturn, because then mypy will show an error if the function were to ever have a condition where it does return.
For example, if you edit while True: to be while False: or while some_condition() in the first example, mypy will throw an error:
$ mypy test.py
test.py:6: error: Implicit return in function which does not return
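To make the two cases above concrete, here is a minimal sketch. The function names fail and serve_forever are hypothetical and are not the ones from the original example.

```python
from typing import NoReturn


def fail(message: str) -> NoReturn:
    """Always raises, so control never returns to the caller."""
    raise RuntimeError(message)


def serve_forever() -> NoReturn:
    """Loops forever; deliberately not called in this sketch."""
    while True:
        pass


try:
    fail("boom")
except RuntimeError as exc:
    print(f"caught: {exc}")
```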
Typing classes
All class methods are essentially typed just like regular functions, except for self, which is left untyped. Here's a simple Stack class:
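A sketch of what an integer-only Stack could look like. The method names and the _values attribute are assumptions, not necessarily those of the original listing.

```python
from typing import List


class Stack:
    """A last-in, first-out stack of integers."""

    def __init__(self) -> None:
        self._values: List[int] = []

    def push(self, value: int) -> None:
        self._values.append(value)

    def pop(self) -> int:
        if not self._values:
            raise IndexError("pop from an empty stack")
        return self._values.pop()

    def __repr__(self) -> str:
        return f"Stack({self._values!r})"


stack = Stack()
stack.push(1)
stack.push(2)
print(stack.pop(), stack)
```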
If you've never seen the {x!r} syntax inside f-strings, it's a way to use the repr() of a value. For more information, pyformat.info is a very good resource for learning Python's string formatting features.
There is, however, one caveat to typing classes: you can't normally access the class itself inside the class's method declarations (because the class isn't fully defined while its methods are still being declared).
So something like this isn't valid Python:
$ mypy --strict test.py
Success: no issues found in 1 source file
$ python test.py
Traceback (most recent call last):
File "/home/tushar/code/test/test.py", line 11, in <module>
class MyClass:
File "/home/tushar/code/test/test.py", line 15, in MyClass
def copy(self) -> MyClass:
NameError: name 'MyClass' is not defined
There are two ways to fix this:
- Turn the class name into a string: The creators of PEP 484 and mypy knew that such cases exist where you might need to define a return type which doesn't exist yet. So, mypy is able to check types if they're wrapped in strings.
- Use from __future__ import annotations. What this does is turn on a feature in Python called "postponed evaluation of type annotations". This essentially makes Python treat all type annotations as strings, storing them in the internal __annotations__ attribute. Details are described in PEP 563.
Postponed evaluation was once planned to become the default behaviour in a later Python release, but that change was put on hold, so don't assume you can drop the __future__ import.
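A sketch of the first fix, the quoted class name. MyClass and copy are the names from the traceback above; the value attribute is an assumption added so the example has something to copy.

```python
class MyClass:
    def __init__(self, value: int) -> None:
        self.value = value

    def copy(self) -> "MyClass":
        # The quoted name is resolved by mypy; Python stores it as a plain
        # string and never looks it up while the class is being defined.
        return MyClass(self.value)


original = MyClass(3)
clone = original.copy()
print(clone.value, clone is original)
```

The second fix would leave the annotation unquoted and instead put from __future__ import annotations as the first statement in the file.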
Typing namedtuples
namedtuples are a lot like tuples, except every index of their fields is named, and they have some syntactic sugar which allows you to access their properties like attributes on an object:
Since the underlying data structure is a tuple, and there's no real way to provide any type information to namedtuples, by default this will have a type of Tuple[Any, Any, Any].
To combat this, Python has added a NamedTuple class which you can extend to have the typed equivalent of the same:
Inner workings of NamedTuple:
If you're curious how NamedTuple works under the hood: age: int is a type declaration, without any assignment (like age: int = 5). Type declarations inside a function or class don't actually define the variable, but they add the type annotation to that function or class' metadata, in the form of a dictionary entry, into x.__annotations__. Doing print(ishan.__annotations__) in the code above gives us {'name': <class 'str'>, 'age': <class 'int'>, 'bio': <class 'str'>}. typing.NamedTuple uses these annotations to create the required tuple.
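A sketch of a typed named tuple with the three fields mentioned above. The class name Person and the field values are assumptions; only the field names and the variable name ishan come from the text.

```python
from typing import NamedTuple


class Person(NamedTuple):
    name: str
    age: int
    bio: str


ishan = Person("Ishan", 31, "Writer")
print(ishan.name, ishan[1])   # attribute access and index access both work
print(Person._fields)
```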
Typing decorators
Decorators are a fairly advanced, but really powerful feature of Python. If you don't know anything about decorators, I'd recommend you to watch Anthony explains decorators, but I'll explain it in brief here as well.
A decorator is essentially a function that wraps another function. Decorators can extend the functionalities of pre-existing functions, by running other side-effects whenever the original function is called. A decorator decorates a function by adding new functionality.
A simple example would be to monitor how long a function takes to run:
To be able to type this, we'd need a way to be able to define the type of a function. That way is called Callable.
Callable is a generic type with the following syntax:
Callable[[<list of argument types>], <return type>]
The types of a function's arguments go into the first list inside Callable, and the return type follows after. A few examples:
Here's how you'd implement the previously-shown time_it decorator:
Note: Callable is what's called a Duck Type. What it means is that you can create your own custom object, and make it a valid Callable, by implementing the magic method called __call__. I have a dedicated section where I go in-depth about duck types ahead.
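One possible way to type a timing decorator is sketched below. It uses Callable[..., Any], which accepts any function but loses the wrapped function's exact signature; this is an assumption about the simplest typing, not necessarily the original listing. The decorated add function is a hypothetical example.

```python
import time
from functools import wraps
from typing import Any, Callable


def time_it(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap `func` so that each call prints how long it took."""

    @wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f}s")
        return result

    return wrapper


@time_it
def add(a: int, b: int) -> int:
    return a + b


print(add(2, 3))
```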
Typing generators
Generators are also a fairly advanced topic to completely cover in this article, and you can watch Anthony explains generators if you've never heard of them. A brief explanation is this:
Generators are a bit like perpetual functions. Instead of returning a value a single time, they yield values out of them, which you can iterate over. When you yield a value from a generator, its execution pauses. But when another value is requested from the generator, it resumes execution from where it was last paused. When the generator function returns, the iteration stops.
Here's an example:
To add type annotations to generators, you need typing.Generator. The syntax is as follows:
Generator[yield_type, send_type, return_type]
With that knowledge, typing this is fairly straightforward:
Since we're not sending any values into the generator, send_type is None. And although the return type is int, which is correct, we're not really using the returned value anyway, so you could use Generator[str, None, None] as well, and skip the return part altogether.
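A sketch of a typed generator that yields str, accepts nothing sent into it, and returns an int. The countdown function is a hypothetical example, not the original listing.

```python
from typing import Generator


def countdown(start: int) -> Generator[str, None, int]:
    """Yield labels from `start` down to 1, then return how many were yielded."""
    count = 0
    for n in range(start, 0, -1):
        yield f"T-minus {n}"
        count += 1
    return count


print(list(countdown(3)))
```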
Typing *args and **kwargs
*args and **kwargs is a feature of python that lets you pass any number of arguments and keyword arguments to a function (that's what the names args and kwargs stand for, but these names are just convention, you can name the variables anything). Anthony explains args and kwargs
All the extra arguments passed to *args get turned into a tuple, and keyword arguments turn into a dictionary, with the keys being the string keywords:
Since *args will always be of type Tuple[X, ...], and **kwargs will always be of type Dict[str, X], we only need to provide one type value X to type them. Here's a practical example:
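A small sketch of the idea: only the element type is written in the signature. The describe_order function and its parameter names are hypothetical.

```python
def describe_order(*items: str, **quantities: int) -> str:
    """Inside the body, `items` is a tuple of str and `quantities` maps str to int."""
    parts = list(items) + [f"{name} x{count}" for name, count in quantities.items()]
    return ", ".join(parts)


print(describe_order("napkins", apples=3, pears=2))
```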
Duck types
Duck types are a pretty fundamental concept of python: the entirety of the Python object model is built around the idea of duck types.
Quoting Alex Martelli:
"You don't really care for IS-A -- you really only care for BEHAVES-LIKE-A-(in-this-specific-context), so, if you do test, this behaviour is what you should be testing for."
What it means is that Python doesn't really care what the type of an object is, but rather how it behaves.
I had a short note above in typing decorators that mentioned duck typing a function with __call__, now here's the actual implementation:
PS: Running mypy over the above code is going to give a cryptic error about "Special Forms", don't worry about that right now, we'll fix this in the Protocol section. All I'm showing right now is that the Python code works.
You can see that Python agrees that both of these functions are "Call-able", i.e. you can call them using the x() syntax. (This is why the type is called Callable, and not something like Function.)
Duck types let you define your function parameters and return types not in terms of concrete classes, but in terms of how your objects behave. This gives you a lot more flexibility in what kinds of things you can use in your code, and makes it much easier to extend later without making "breaking changes".
A simple example here:
Running this code with Python works just fine. But running mypy over this gives us the following error:
$ mypy test.py
test.py:12: error: Argument 1 to "count_non_empty_strings" has incompatible type "ValuesView[str]"; expected "List[str]"
ValuesView is the type when you do dict.values(), and although you could imagine it as a list of strings in this case, it's not exactly the type List.
In fact, none of the other collection types like tuple or set are going to work with this code. You could patch it for some of the builtin types by doing strings: Union[List[str], Set[str], ...] and so on, but just how many types will you add? And what about third party/custom types?
The correct solution here is to use a Duck Type (yes, we finally got to the point). The only thing we want to ensure in this case is that the object can be iterated upon (which in Python terms means that it implements the __iter__ magic method), and the right type for that is Iterable:
And now mypy is happy with our code.
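A sketch of the Iterable-based version. The function name count_non_empty_strings and the parameter name strings come from the error message above; the body and the sample data are assumptions.

```python
from typing import Dict, Iterable


def count_non_empty_strings(strings: Iterable[str]) -> int:
    """Count the items that are not the empty string."""
    return sum(1 for s in strings if s)


names: Dict[int, str] = {1: "Ann", 2: "", 3: "Bo"}
print(count_non_empty_strings(names.values()))   # a ValuesView is accepted
print(count_non_empty_strings(("a", "", "b")))   # so is a tuple
```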
There are many, many of these duck types that ship within Python's typing module, and a few of them include:
- Sequence, for defining things that can be indexed and reversed, like List and Tuple.
- MutableMapping, for when you have a key-value pair kind of data structure, like dict, but also others like defaultdict, OrderedDict and Counter from the collections module.
- Collection, if all you care about is having a finite number of items in your data structure, e.g. a set, list, dict, or anything from the collections module.
If you haven't already at this point, you should really look into how python's syntax and top level functions hook into Python's object model via __magic_methods__, for essentially all of Python's behaviour. The documentation for it is right here, and there's an excellent talk by James Powell that really dives deep into this concept in the beginning.
Function overloading with @overload
Let's write a simple add function that supports ints and floats:
The implementation seems perfectly fine... but mypy isn't happy with it:
$ mypy test.py
test.py:15: error: No overload variant of "__getitem__" of "list" matches argument type "float"
test.py:15: note: Possible overload variants:
test.py:15: note: def __getitem__(self, int) -> int
test.py:15: note: def __getitem__(self, slice) -> List[int]
What mypy is trying to tell us here, is that in the line:
print(joined_list[last_index])
last_index could be of type float. And checking with reveal_type, that definitely is the case:
And since it could, mypy won't allow you to use a possible float value to index a list, because that will error out.
One thing we could do is do an isinstance assertion on our side to convince mypy:
But this will be pretty cumbersome to do at every single place in our code where we use add with ints. Also, we as programmers know that passing two ints will only ever return an int. But how do we tell mypy that?
Answer: use @overload. The syntax basically replicates what we wanted to say in the paragraph above:
And now mypy knows that add(3, 4) returns an int.
Note that Python has no way to ensure that the code actually always returns an int when it gets int values. It's your job as the programmer providing these overloads to verify that they are correct. This is why in some cases, using assert isinstance(...) could be better than doing this, but for most cases @overload works fine.
Also, in the overload definitions -> int: ..., the ... at the end is a convention for when you provide type stubs for functions and classes, but you could technically write anything as the function body: pass, 42, etc. It'll be ignored either way.
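A sketch of how the overloaded add might be written. The two overload signatures follow the text; the implementation signature and the sample list are assumptions.

```python
from typing import Union, overload


@overload
def add(a: int, b: int) -> int: ...
@overload
def add(a: float, b: float) -> float: ...


def add(a: Union[int, float], b: Union[int, float]) -> Union[int, float]:
    """The single runtime implementation shared by both overloads."""
    return a + b


numbers = [10, 20, 30, 40]
last_index = add(1, 2)        # mypy sees an int here, so indexing is allowed
print(numbers[last_index])
print(add(1.5, 2))
```

Only the final, undecorated definition exists at runtime; the @overload stubs are there purely for the type checker.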
Another good overload example is this:
Type type
Type is a type used to type classes. It derives from python's way of determining the type of an object at runtime:
You'd usually use isinstance(x, int) instead of type(x) == int to check for behaviour, but sometimes knowing the exact type can help, for eg. in optimizations.
Since type(x) returns the class of x, the type of a class C is Type[C]:
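A sketch of a function that takes a class and instantiates it, typed with Type and Any. The name make_object and the cls(*args) call are mentioned later in the article; MyClass and its value attribute are hypothetical.

```python
from typing import Any, Type


class MyClass:
    def __init__(self, value: int = 0) -> None:
        self.value = value


def make_object(cls: Type[Any], *args: Any) -> Any:
    """Instantiate whatever class is passed in."""
    return cls(*args)


obj = make_object(MyClass, 5)
print(type(obj) is MyClass, obj.value)
```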
We had to use Any in 3 places here, and 2 of them can be eliminated by using generics, and we'll talk about it later on.
Typing pre-existing projects
If you need it, mypy gives you the ability to add types to your project without ever modifying the original source code. It's done using what's called "stub files".
Stub files are python-like files that only contain type-checked variable, function, and class definitions. It's kind of like a mypy header file.
You can make your own type stubs by creating a .pyi file:
Now, run mypy on the current folder (make sure you have an __init__.py file in the folder, if not, create an empty one).
$ ls
__init__.py test.py test.pyi
$ mypy --strict .
Success: no issues found in 2 source files
Type debugging - Part 2
Since we are on the topic of projects and folders, let's discuss a few of the pitfalls that you can find yourself in when using mypy.
The first one is PEP 420
A fact that took me some time to realise, was that for mypy to be able to type-check a folder, the folder must be a module.
Let's say you find yourself in this situation:
$ tree
.
├── test.py
└── utils
└── foo.py
1 directory, 2 files
$ cat test.py
from utils.foo import average
print(average(3, 4))
$ cat utils/foo.py
def average(x: int, y: int) -> float:
return float(x + y) / 2
$ py test.py
3.5
$ mypy test.py
test.py:1: error: Cannot find implementation or library stub for module named 'utils.foo'
test.py:1: note: See https://mypy.readthedocs.io/en/latest/running_mypy.html#missing-imports
Found 1 error in 1 file (checked 1 source file)
What's the problem? Python is able to find utils.foo with no problems, so why can't mypy?
The error is very cryptic, but the thing to focus on is the word "module" in the error. utils.foo should be a module, and for that, the utils folder should have an __init__.py, even if it's empty.
$ tree
.
├── test.py
└── utils
├── foo.py
└── __init__.py
1 directory, 3 files
$ mypy test.py
Success: no issues found in 1 source file
Now, the same issue re-appears if you're installing your package via pip, because of a completely different reason:
$ tree ..
..
├── setup.py
├── src
│ └── mypackage
│ ├── __init__.py
│ └── utils
│ ├── foo.py
│ └── __init__.py
└── test
└── test.py
4 directories, 5 files
$ cat ../setup.py
from setuptools import setup, find_packages
setup(
name="mypackage",
packages = find_packages('src'),
package_dir = {"":"src"}
)
$ pip install ..
[...]
successfully installed mypackage-0.0.0
$ cat test.py
from mypackage.utils.foo import average
print(average(3, 4))
$ python test.py
3.5
$ mypy test.py
test.py:1: error: Cannot find implementation or library stub for module named 'mypackage.utils.foo'
test.py:1: note: See https://mypy.readthedocs.io/en/latest/running_mypy.html#missing-imports
Found 1 error in 1 file (checked 1 source file)
What now? Every folder has an __init__.py, it's even installed as a pip package and the code runs, so we know that the module structure is right. What gives?
Well, turns out that pip packages aren't type checked by mypy by default. This behaviour exists because type definitions are opt-in. Python packages aren't expected to be type-checked, because mypy types are completely optional. If mypy were to assume every package has type hints, it would show possibly dozens of errors because a package doesn't have proper types, or used type hints for something else, etc.
To opt in to type checking for your package, you need to add an empty py.typed file into your package's root directory, and also include it as metadata in your setup.py:
$ tree ..
..
├── setup.py
├── src
│ └── mypackage
│ ├── __init__.py
│ ├── py.typed
│ └── utils
│ ├── foo.py
│ └── __init__.py
└── test
└── test.py
4 directories, 6 files
$ cat ../setup.py
from setuptools import setup, find_packages
setup(
name="mypackage",
packages = find_packages(
where = 'src',
),
package_dir = {"":"src"},
package_data={
"mypackage": ["py.typed"],
}
)
$ mypy test.py
Success: no issues found in 1 source file
There's yet another pitfall that you might encounter sometimes: a.py declares a class MyClass, and it imports stuff from a file b.py, which in turn needs to import MyClass from a.py for type-checking purposes.
This creates an import cycle, and Python gives you an ImportError. To avoid this, simply put the import statement in b.py inside an if typing.TYPE_CHECKING: block, since b.py only needs MyClass for type checking. Also, everywhere you use MyClass, add quotes: 'MyClass' so that Python is happy.
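A sketch of what b.py might look like with this guard. The module a and the class MyClass are the hypothetical names from the paragraph above; the describe function is made up for illustration.

```python
# b.py (sketch): `a` is the hypothetical module that defines MyClass
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; never executed at runtime, so no import cycle.
    from a import MyClass


def describe(obj: "MyClass") -> str:
    """The quoted annotation is never evaluated when Python runs this file."""
    return f"got {type(obj).__name__}"


print(describe(object()))
```

Since TYPE_CHECKING is False at runtime, this file runs even when no module named a exists.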
Typing Context managers
Context managers are a way of adding common setup and teardown logic to parts of your code, things like opening and closing database connections, establishing a websocket, and so on. On the surface it might seem simple but it's a pretty extensive topic, and if you've never heard of it before, Anthony covers it here.
To define a context manager, you need to provide two magic methods in your class, namely __enter__ and __exit__. They're then called automatically at the start and end of your with block.
You might have used a context manager before: with open(filename) as file: uses a context manager underneath. Speaking of which, let's write our own implementation of open:
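A simplified, text-mode-only sketch of such a class with typed __enter__ and __exit__ methods. The class name OpenFile and its attributes are assumptions, not the original listing.

```python
import os
import tempfile
from types import TracebackType
from typing import IO, Optional, Type


class OpenFile:
    """A simplified stand-in for open(), usable in a with block."""

    def __init__(self, path: str, mode: str = "r") -> None:
        self.path = path
        self.mode = mode
        self.file: Optional[IO[str]] = None

    def __enter__(self) -> IO[str]:
        self.file = open(self.path, self.mode)
        return self.file

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc_value: Optional[BaseException],
        traceback: Optional[TracebackType],
    ) -> None:
        # Returning None means exceptions from the with block are not swallowed.
        if self.file is not None:
            self.file.close()


with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.txt")
    with OpenFile(demo_path, "w") as f:
        f.write("hello")
    with OpenFile(demo_path) as f:
        print(f.read())
```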
Typing async functions
The typing module has a duck type for all types that can be awaited: Awaitable.
Just like how a regular function is a Callable, an async function is a Callable that returns an Awaitable:
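A sketch of passing an async function around with that type. The function names fetch_length and apply are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


async def fetch_length(text: str) -> int:
    """A stand-in for real async work."""
    await asyncio.sleep(0)
    return len(text)


async def apply(func: Callable[[str], Awaitable[int]], value: str) -> int:
    """Call an async function and await its result."""
    return await func(value)


print(asyncio.run(apply(fetch_length, "mypy")))
```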
Generics
Generics (or generic types) is a language feature that lets you "pass types inside other types".
I personally think it is best explained with an example:
Let's say you have a function that returns the first item in a list. To define this, we need this behaviour:
"Given a list of type List[X], we will be returning an item of type X."
And that's exactly what generic types are: defining your return type based on the input type.
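The "first item" idea can be sketched with a TypeVar like this (the function name first is an assumption):

```python
from typing import List, TypeVar

T = TypeVar("T")


def first(items: List[T]) -> T:
    """Return the first element; its type matches the list's item type."""
    return items[0]


print(first([1, 2, 3]))      # mypy infers int for this call
print(first(["a", "b"]))     # and str for this one
```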
Generic functions
We've seen make_object from the Type type section before, but we had to use Any to be able to support returning any kind of object that got created by calling cls(*args). But we don't actually have to do that, because we can use generics. Here's how you'd do that:
T = TypeVar('T') is how you declare a generic type in Python. What the function definition now says is "If I give you a class that makes T's, you'll be returning an object T".
And sure enough, the reveal_type on the bottom shows that mypy knows c is an object of MyClass.
The generic type name T is another convention, you can call it anything.
Another example: largest, which returns the largest item in a list:
This seems good, but mypy isn't happy:
$ mypy --strict test.py
test.py:10: error: Unsupported left operand type for > ("T")
Found 1 error in 1 file (checked 1 source file)
This is because you need to ensure you can do a < b on the objects, to compare them with each other, which isn't always the case:
>>> {} < {}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'dict' and 'dict'
For this, we need a Duck Type that defines this "a less than b" behaviour.
And although Python doesn't currently have such a builtin, thankfully there's a "virtual module" that ships with mypy called _typeshed. It has a lot of extra duck types, along with other mypy-specific features.
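Here is a sketch of the bounded version, following the description in this section. SupportsLessThan is the name this article uses; as far as I know, newer typeshed releases have renamed it (SupportsDunderLT and SupportsRichComparison are the closest equivalents), so treat the import as an assumption and check your installed version.

```python
from typing import TYPE_CHECKING, List, TypeVar

if TYPE_CHECKING:
    # Only evaluated by the type checker, never at runtime.
    # Assumption: the name may differ in your typeshed version.
    from _typeshed import SupportsLessThan

# The bound is a string because the name doesn't exist at runtime.
T = TypeVar("T", bound="SupportsLessThan")


def largest(items: List[T]) -> T:
    biggest = items[0]
    for item in items[1:]:
        if item > biggest:
            biggest = item
    return biggest


biggest_number = largest([3, 1, 4])
biggest_word = largest(["a", "c", "b"])
```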
Now, mypy will only allow passing lists of objects to this function that can be compared to each other.
If you're wondering why checking for < was enough while our code uses >, that's how Python does comparisons: when a doesn't implement a > b, Python falls back to the reflected b < a. I'm planning to write an article on this later.
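A tiny demonstration of that fallback, using a made-up Version class that only defines __lt__:

```python
class Version:
    def __init__(self, number: int) -> None:
        self.number = number

    # Only "less than" is defined; there is no __gt__.
    def __lt__(self, other: "Version") -> bool:
        return self.number < other.number


# Python has no Version.__gt__ to call, so it evaluates the
# reflected comparison Version(1) < Version(2) instead.
newer_wins = Version(2) > Version(1)
older_wins = Version(1) > Version(2)
```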
Note that _typeshed is not an actual module in Python, so you'll have to import it under an if TYPE_CHECKING: guard to ensure Python doesn't give a ModuleNotFoundError. And since SupportsLessThan won't be defined when Python runs, we had to use it as a string when passing it to TypeVar.
At this point you might be interested in how you could implement one of your own such SupportsX types. For that, we have another section below: Protocols.
Generic classes
We implemented a simple Stack class in typing classes, but it only worked for integers. We can very simply make it work for any type.
To do that, we need mypy to understand what T means inside the class. And for that, we need the class to extend Generic[T], and then provide the concrete type to Stack:
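A sketch of a generic Stack along those lines is below. The method names are assumptions, since the earlier Stack implementation isn't reproduced here.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class Stack(Generic[T]):
    def __init__(self) -> None:
        self._values: List[T] = []

    def push(self, value: T) -> None:
        self._values.append(value)

    def pop(self) -> T:
        return self._values.pop()


# Providing the concrete type: this stack only holds ints.
stack = Stack[int]()
stack.push(1)
stack.push(2)
top = stack.pop()

# The same class works for strings without any changes.
names = Stack[str]()
names.push("mypy")
```

With this in place, mypy should reject something like stack.push("three"), because T was fixed to int for that instance.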
You can pass as many TypeVars to Generic[...] as you need. For example, to make a generic dictionary, you might use class Dict(Generic[KT, VT]): ...
Generic types
Generic types (a.k.a. Type Aliases) allow you to put a commonly used type in a variable -- and then use that variable as if it were that type.
And mypy lets us do that very easily: with literally just an assignment. The generic parts of the type are automatically inferred.
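For instance, here is a short sketch with one plain alias and one generic alias. The names Vector, Pair, add_vectors and swap are chosen for illustration.

```python
from typing import Tuple, TypeVar

# A plain alias: just an assignment.
Vector = Tuple[int, int]


def add_vectors(a: Vector, b: Vector) -> Vector:
    return (a[0] + b[0], a[1] + b[1])


# A generic alias: the TypeVar makes Pair itself parameterizable.
T = TypeVar("T")
Pair = Tuple[T, T]


def swap(pair: Pair[T]) -> Pair[T]:
    return (pair[1], pair[0])


summed = add_vectors((1, 2), (3, 4))
swapped = swap(("left", "right"))
```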
There is also a newer syntax that makes it clearer that we're defining a type alias: Vector: TypeAlias = Tuple[int, int]. This is available starting with Python 3.10.
Just like how we were able to tell the TypeVar T before to only support types that are SupportsLessThan, we can also restrict a TypeVar to a fixed set of types.
AnyStr is a builtin restricted TypeVar, used to define a unifying type for functions that accept str and bytes:
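A minimal sketch of such a function, using the concat name that the following paragraph refers to:

```python
from typing import AnyStr


def concat(first: AnyStr, second: AnyStr) -> AnyStr:
    # Both arguments must be the same type: both str, or both bytes.
    return first + second


text = concat("type", "check")
raw = concat(b"type", b"check")
```

A mixed call such as concat("type", b"check") is the case mypy is expected to reject, and it would also fail at runtime.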
This is different from Union[str, bytes], because AnyStr represents any one of those two types at a time, and thus concat doesn't accept the first arg as str and the second as bytes.
Advanced/Recursive type checking with Protocol
We implemented FakeFuncs in the duck types section above, and we used isinstance(FakeFuncs, Callable) to verify that the object was indeed recognized as a callable.
But what if we need to duck-type methods other than __call__?
If we want to do that with an entire class, it becomes harder. Say we want a "duck-typed class" that "has a get method that returns an int", and so on. We don't have access to the actual class for some reason, like maybe we're writing helper functions for an API library.
To do that, we need to define a Protocol:
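Here is a sketch of what that could look like. Only the "get method that returns an int" requirement comes from the text above; the key parameter, the double_get helper and the FakeApi class are assumptions for the example.

```python
from typing import Protocol


class Api(Protocol):
    # Any class with a matching get method satisfies this protocol.
    def get(self, key: str) -> int: ...


def double_get(api: Api, key: str) -> int:
    # A helper written purely against the protocol.
    return api.get(key) * 2


class FakeApi:
    # Note: no inheritance from Api is needed.
    def get(self, key: str) -> int:
        return len(key)


doubled = double_get(FakeApi(), "mypy")
```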
Using this, we were able to type check our code, without ever needing a completed Api implementation.
This is extremely powerful. We're essentially defining the structure of the object we need, instead of what class it is from, or what it inherits from. This gives us the flexibility of duck typing, but on the scale of an entire class.
Remember SupportsLessThan? If you check its implementation in _typeshed, this is it:
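The definition looked roughly like the sketch below. It is reconstructed from memory rather than copied from typeshed, so the parameter name is illustrative, and the smaller helper is added only to show the protocol in use.

```python
from typing import Any, Protocol, TypeVar


class SupportsLessThan(Protocol):
    def __lt__(self, other: Any) -> bool: ...


T = TypeVar("T", bound=SupportsLessThan)


def smaller(a: T, b: T) -> T:
    # Allowed because the bound guarantees __lt__ exists.
    return a if a < b else b


smallest = smaller(1, 2)
```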
Yeah, that's the entire implementation.
What this also allows us to do is write recursive type definitions. The simplest example would be a Tree:
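One way to sketch a recursive Tree protocol is shown below. The members are declared as read-only properties so that a plain class with ordinary attributes can satisfy it; the Node class and the total function are assumptions made for illustration.

```python
from typing import Optional, Protocol


class Tree(Protocol):
    @property
    def value(self) -> int: ...

    # The protocol refers to itself: each child is another Tree.
    @property
    def left(self) -> Optional["Tree"]: ...

    @property
    def right(self) -> Optional["Tree"]: ...


class Node:
    def __init__(
        self,
        value: int,
        left: Optional["Node"] = None,
        right: Optional["Node"] = None,
    ) -> None:
        self.value = value
        self.left = left
        self.right = right


def total(tree: Optional[Tree]) -> int:
    # Sum every value by walking the structure recursively.
    if tree is None:
        return 0
    return tree.value + total(tree.left) + total(tree.right)


tree_sum = total(Node(1, Node(2), Node(3, Node(4))))
```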
Note that for this simple example, using Protocol wasn't necessary, as mypy is able to understand simple recursive structures. But for anything more complex than this, like an N-ary tree, you'll need to use Protocol.
Structural subtyping and all of its features are defined extremely well in PEP 544.
Further learning
If you're interested in reading even more about types, mypy has excellent documentation, and you should definitely read it for further learning, especially the section on Generics.
I referenced a lot of Anthony Sottile's videos in this for topics out of reach of this article. He has a YouTube channel where he posts short, and very informative videos about Python.
You can find the source code of the typing module here, of all the typing duck types inside the _collections_abc module, and of the extra ones in _typeshed in the typeshed repo.
A topic that I skipped over while talking about TypeVar and generics is variance. It's a topic in type theory that defines how subtypes and generics relate to each other. If you want to learn about it in depth, there's documentation in the mypy docs of course, and there are two more blog posts I found which help grasp the concept, here and here.
A bunch of this material was cross-checked using Python's official documentation, and honestly their docs are always great. Also, the "Quick search" feature works surprisingly well.
There's also quite a few typing PEPs you can read, starting with the kingpin: PEP 484, and the accompanying PEP 526. Other PEPs I've mentioned in the article above are PEP 585, PEP 563, PEP 420 and PEP 544.
And that's it!
I've worked pretty hard on this article, distilling down everything I've learned about mypy in the past year, into a single source of knowledge. If you have any doubts, thoughts, or suggestions, be sure to comment below and I'll get back to you.
Also, if you read the whole article till here, Thank you! And congratulations, you now know almost everything you'll need to be able to write fully typed Python code in the future. I hope you liked it ✨
Top comments (13)
I thought I use typehints a lot, but I have not yet encountered half of the things described here! Thank you for such an awesome and thorough article :3
Question. What do you think would be best approach on separating types for several concepts that share the same builtin type underneath? To avoid something like:
In modern C++ there is a concept of ratio heavily used in std::chrono to convert seconds to milliseconds and vice versa, and there are strict-typing libraries for various SI units. Would be nice to have some alternative for that in python.
oh yea, that's the one thing that I omitted from the article because I couldn't think up a reason to use it. mypy has NewType which lets you subtype any other type
like you can do ms = NewType('ms', int) and now if your function requires a ms it won't work with an int, you need to specifically do ms(1000). But in python code, it's still just an int. I think that's exactly what you need.
Totally! The ultimate syntactic sugar now would be an option to provide automatic "conversion constructors" for those custom types, like def __ms__(seconds: s): return ms(s*1000) - but that's not a big deal compared to ability to differentiate integral types semantically.
Thank you :)
Knowing that it's Python, I'm pretty sure that's easy to patch in on your side as well :)
I'm going to add NewType to the article now that I have a reason to :)
Thanks for this very interesting article. Same as Artalus below, I use types a lot in all my recent Py modules, but I learned a lot of new tricks by reading this.
Superb!
I am using pyproject.toml as a configuration file and stubs folder for my custom-types for third party packages.
Running from CLI, mypy . runs successfully. but when it runs at pre-commit, it fails (probably assuming stubs not present and thus return type is Any)
Great post! I use type hinting all the time in python, it helps readability in larger projects.
In JavaScript ecosystem, some third-party libraries have no Typescript support at all or sometimes have incorrect types which can be a major hassle during development. How's the status of mypy in Python ecosystem?
Mypy is still fairly new, it was essentially unknown as early as 4 years ago. But the good thing about both of them is that you can add types to projects even if the original authors don't, using type stub files, and most common libraries have either type support or stubs available :)
Not much different than TypeScript honestly.
What a great post! This is the most comprehensive article about mypy I have ever found, really good. Congratulations!
Thanks a lot, that's what I aimed it to be :D
Used mypy for the first time today and it is sometimes quite obscure. Many, many thanks for your GREAT article!