In this article, I will discuss the most important aspects of namedtuples in Python. We’ll start from the very basics and move up to more complex concepts. You’ll learn why you should use them and how they can make your code cleaner. At the end of this guide, you’ll feel very comfortable using them in numerous situations.
Learning Objectives
By the end of this tutorial, you should be able to:
- Understand why and when you should use namedtuples
- Convert regular tuples and dictionaries into namedtuples
- Convert a namedtuple to a dictionary or regular tuple
- Sort a list of namedtuples
- Understand the differences between namedtuples and Data Classes
- Create namedtuples with optional fields
- Serialize namedtuples to JSON
- Add docstrings
Table of Contents
- Why should I use namedtuples?
- How to Convert a Regular Tuple or Dictionary Into a namedtuple
- How to Convert a namedtuple to Dictionary or Regular Tuple
- How to Sort a List of namedtuples
- How to Serialize namedtuples to JSON
- How to Add a docstring to a namedtuple
- What Are the Differences Between namedtuples and Data Classes?
- Conclusion
Why should I use namedtuples?
namedtuple is a very interesting (and also underrated) data structure. It’s very common to find Python code that relies heavily on regular tuples, or sometimes dictionaries, to store data. Don’t get me wrong, both dictionaries and tuples have their value. The problem lies in misusing them. Allow me to explain.
Suppose that you have a function that converts a string into a color. The color must be represented in a 4-dimensional space, the RGBA.
def convert_string_to_color(desc: str, alpha: float = 0.0):
    if desc == "green":
        return 50, 205, 50, alpha
    elif desc == "blue":
        return 0, 0, 255, alpha
    else:
        return 0, 0, 0, alpha
Then, we can use it like this:
r, g, b, a = convert_string_to_color(desc="blue", alpha=1.0)
Ok, that works, but... we have a couple of problems here. The first one is that nothing enforces the order in which the returned values are unpacked. That is, there's nothing stopping another developer from calling convert_string_to_color like this:
g, b, r, a = convert_string_to_color(desc="blue", alpha=1.0)
Also, we may not know that the function returns 4 values, and end up calling the function like so:
r, g, b = convert_string_to_color(desc="blue", alpha=1.0)
That, in turn, fails with a ValueError, since four values cannot be unpacked into three names.
That's true. But why don't you use a dictionary instead?
Python’s dictionaries are a very versatile data structure. They can serve as an easy and convenient way to store multiple values. However, a dict doesn’t come without shortcomings. Due to their flexibility, dictionaries are very easily abused. As an illustration, let us convert our example to use a dictionary instead of a tuple.
def convert_string_to_color(desc: str, alpha: float = 0.0):
    if desc == "green":
        return {"r": 50, "g": 205, "b": 50, "alpha": alpha}
    elif desc == "blue":
        return {"r": 0, "g": 0, "b": 255, "alpha": alpha}
    else:
        return {"r": 0, "g": 0, "b": 0, "alpha": alpha}
Ok, we now can use it like this, expecting just one value to be returned:
color = convert_string_to_color(desc="blue", alpha=1.0)
No need to remember the order, but it has at least two drawbacks. The first one is that we must keep track of the key names. If we change {"r": 0, "g": 0, "b": 0, "alpha": alpha} to {"red": 0, "green": 0, "blue": 0, "a": alpha}, we’ll get a KeyError when accessing a field, as the keys r, g, b, and alpha no longer exist.
The second issue with dicts is that they are not hashable. That means we cannot store them in a set or use them as keys in other dictionaries. Let’s imagine that we want to keep track of how many colors a particular image has. If we use collections.Counter to count, we’ll get TypeError: unhashable type: 'dict'.
Also, dictionaries are mutable, so we can add as many new keys as we want. Trust me, this is a recipe for nasty bugs that are really hard to track down.
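Both problems are easy to reproduce. The snippet below is a small sketch, not taken from the original function: the two color dictionaries and the misspelled key are made up for illustration.

```python
from collections import Counter

green = {"r": 50, "g": 205, "b": 50, "alpha": 1.0}
blue = {"r": 0, "g": 0, "b": 255, "alpha": 1.0}

# Counting dicts fails because Counter uses its items as dictionary keys.
try:
    Counter([green, blue, blue])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Nothing prevents a typo from silently creating a brand-new key.
blue["alhpa"] = 0.5   # misspelled on purpose
print(sorted(blue))   # the dict now has five keys
```

The typo does not raise anything, which is exactly why this kind of bug tends to surface far away from the line that caused it.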
Ok, fine, that makes sense. So, now what? What can I use instead?
namedtuples! Just... use them!
Converting our function to use namedtuples is as easy as this:
from collections import namedtuple
...
Color = namedtuple("Color", "r g b alpha")
...
def convert_string_to_color(desc: str, alpha: float = 0.0):
    if desc == "green":
        return Color(r=50, g=205, b=50, alpha=alpha)
    elif desc == "blue":
        return Color(r=0, g=0, b=255, alpha=alpha)
    else:
        return Color(r=0, g=0, b=0, alpha=alpha)
Like in the dict case, we can assign the result to a single variable and use it as we please. There’s no need to remember the ordering. And if you’re using an IDE such as PyCharm or VS Code, you get autocompletion out of the box.
color = convert_string_to_color(desc="blue", alpha=1.0)
...
has_alpha = color.alpha > 0.0
...
is_black = color.r == 0 and color.g == 0 and color.b == 0
To top it all off, namedtuples are immutable. If another developer on the team thinks it’s a good idea to add a new field at runtime, the program will fail.
>>> blue = Color(r=0, g=0, b=255, alpha=1.0)
>>> blue.e = 0
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-13-8c7f9b29c633> in <module>
----> 1 blue.e = 0
AttributeError: 'Color' object has no attribute 'e'
Not only that, now we can use Counter to track how many colors a collection has.
>>> from collections import Counter
>>> Counter([blue, blue])
Counter({Color(r=0, g=0, b=255, alpha=1.0): 2})
How to Convert a Regular Tuple or Dictionary Into a namedtuple
Now that we understand the motivations behind using namedtuple, it’s time to learn how to convert regular tuples and dictionaries into named tuples. Say that, for whatever reason, you have a dictionary instance containing the RGBA values for a color. If you want to convert it to the Color namedtuple we just created, you can do it like this:
>>> c = {"r": 50, "g": 205, "b": 50, "alpha": 0}
>>> Color(**c)
Color(r=50, g=205, b=50, alpha=0)
That’s it. We can just leverage the ** construct to unpack the dict into a namedtuple.
What if I want to create a namedtuple from the dict?
No problem, do it like this and you're good:
>>> c = {"r": 50, "g": 205, "b": 50, "alpha": 0}
>>> Color = namedtuple("Color", c)
>>> Color(**c)
Color(r=50, g=205, b=50, alpha=0)
By passing the dict instance to the namedtuple factory function, it will take care of the creation of the fields for you (iterating over a dict yields its keys, which become the field names). Then, to create a new Color instance from a dict, we can just unpack the dictionary like in the previous example.
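Regular tuples can be converted in much the same way. The sketch below assumes the tuple holds its values in the same order as the fields of Color; the rgba variable is just an illustrative name.

```python
from collections import namedtuple

Color = namedtuple("Color", "r g b alpha")

rgba = (50, 205, 50, 0.5)  # a plain tuple, in field order

# Option 1: unpack the tuple into positional arguments.
lime = Color(*rgba)

# Option 2: the _make class method accepts any iterable.
also_lime = Color._make(rgba)

print(lime)               # Color(r=50, g=205, b=50, alpha=0.5)
print(lime == also_lime)  # True
```

Both options fail with a TypeError if the tuple does not have exactly four elements, so a malformed input is caught at conversion time.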
How to Convert a namedtuple to Dictionary or Regular Tuple
We've just learned how to convert a dict into a namedtuple. What about the inverse? How can we convert a namedtuple to a dictionary instance?
It turns out, namedtuple comes with a method called ._asdict(). So, converting it is as simple as calling the method.
>>> blue = Color(r=0, g=0, b=255, alpha=1.0)
>>> blue._asdict()
{'r': 0, 'g': 0, 'b': 255, 'alpha': 1.0}
You may be wondering why the method starts with a _. Unfortunately, this is one of the inconsistencies in Python. Usually, a leading _ marks a private method or attribute. However, namedtuple prefixes its public methods and attributes with an underscore to avoid conflicts with field names. Besides _asdict, there’s also _replace, _fields, and _field_defaults. You can find all of them in the documentation of the collections module.
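To make those helpers a bit more concrete, here is a short sketch using the same Color tuple; faded_blue is just a name picked for this example.

```python
from collections import namedtuple

Color = namedtuple("Color", "r g b alpha")
blue = Color(r=0, g=0, b=255, alpha=1.0)

# _fields lists the field names in declaration order.
print(Color._fields)          # ('r', 'g', 'b', 'alpha')

# _replace returns a new instance; the original is left untouched.
faded_blue = blue._replace(alpha=0.2)
print(faded_blue)             # Color(r=0, g=0, b=255, alpha=0.2)
print(blue.alpha)             # 1.0

# _field_defaults maps fields to their default values (Python 3.7+).
# It is empty here because Color declares no defaults.
print(Color._field_defaults)  # {}
```

Since namedtuples are immutable, _replace is the usual way to get a modified copy.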
To convert a named tuple into a regular tuple, it's enough to pass it to the tuple constructor.
>>> tuple(Color(r=50, g=205, b=50, alpha=0.1))
(50, 205, 50, 0.1)
How to Sort a List of namedtuples
Another common use case is storing several namedtuple instances in a list and sorting them based on some criterion. For example, say that we have a list of colors and we need to sort them by alpha intensity.
Fortunately, Python allows a very pythonic way of doing that. We can use the operator.attrgetter function. According to the docs, attrgetter “returns a callable object that fetches attr from its operand”. In layman’s terms, we pass it the name of the field we want to sort by, and pass the result as the key argument to the sorted function. Example:
from operator import attrgetter
...
colors = [
    Color(r=50, g=205, b=50, alpha=0.1),
    Color(r=50, g=205, b=50, alpha=0.5),
    Color(r=50, g=0, b=0, alpha=0.3)
]
...
>>> sorted(colors, key=attrgetter("alpha"))
[Color(r=50, g=205, b=50, alpha=0.1),
 Color(r=50, g=0, b=0, alpha=0.3),
 Color(r=50, g=205, b=50, alpha=0.5)]
The result is a new list of colors sorted in ascending order by alpha intensity!
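The same approach extends to descending order and to sorting by more than one field. The following sketch reuses the list from the example above; the variable names for the results are illustrative.

```python
from collections import namedtuple
from operator import attrgetter

Color = namedtuple("Color", "r g b alpha")

colors = [
    Color(r=50, g=205, b=50, alpha=0.1),
    Color(r=50, g=205, b=50, alpha=0.5),
    Color(r=50, g=0, b=0, alpha=0.3),
]

# Highest alpha first.
by_alpha_desc = sorted(colors, key=attrgetter("alpha"), reverse=True)
print([c.alpha for c in by_alpha_desc])   # [0.5, 0.3, 0.1]

# attrgetter accepts several names and then returns a tuple,
# so this sorts by green first and breaks ties by alpha.
by_green_then_alpha = sorted(colors, key=attrgetter("g", "alpha"))
print([(c.g, c.alpha) for c in by_green_then_alpha])
# [(0, 0.3), (205, 0.1), (205, 0.5)]
```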
How to Serialize namedtuples to JSON
Sometimes you may need to save a namedtuple to JSON. As you probably know, Python’s dictionaries can be converted to JSON through the json module. As a result, if we convert our tuple to a dictionary with the _asdict method, then we’re all set. As an example, consider this scenario:
>>> blue = Color(r=0, g=0, b=255, alpha=1.0)
>>> import json
>>> json.dumps(blue._asdict())
'{"r": 0, "g": 0, "b": 255, "alpha": 1.0}'
As you can see, json.dumps converts a dict into a JSON string.
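It may be worth seeing why the _asdict step matters, and how to get the tuple back from the JSON string. The sketch below is one possible round trip; payload and restored are names chosen for this example.

```python
import json
from collections import namedtuple

Color = namedtuple("Color", "r g b alpha")
blue = Color(r=0, g=0, b=255, alpha=1.0)

# Passing the namedtuple directly serializes it as a JSON array,
# because the json module treats it like any other tuple.
print(json.dumps(blue))            # [0, 0, 255, 1.0]

# Going through _asdict() keeps the field names...
payload = json.dumps(blue._asdict())

# ...so the object can be rebuilt by unpacking the decoded dict.
restored = Color(**json.loads(payload))
print(restored == blue)            # True
```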
How to Add a docstring to a namedtuple
In Python, we can document methods, classes and modules using plain strings. This string is then made available as a special attribute named __doc__. That being said, how can we add documentation to our Color namedtuple?
There’s no single right answer to this, but we can do it in two ways. The first one (a bit more cumbersome) is to extend the tuple using a wrapper class. By doing so, we can then define the docstring in this wrapper. As an example, consider the following snippet:
_Color = namedtuple("Color", "r g b alpha")
class Color(_Color):
    """A namedtuple that represents a color.
    It has 4 fields:
    r - red
    g - green
    b - blue
    alpha - the alpha channel
    """
>>> print(Color.__doc__)
A namedtuple that represents a color.
It has 4 fields:
r - red
g - green
b - blue
alpha - the alpha channel
>>> help(Color)
Help on class Color in module __main__:
class Color(Color)
| Color(r, g, b, alpha)
|
| A namedtuple that represents a color.
| It has 4 fields:
| r - red
| g - green
| b - blue
| alpha - the alpha channel
|
| Method resolution order:
| Color
| Color
| builtins.tuple
| builtins.object
|
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
As you can see, by inheriting from the _Color tuple, we added a __doc__ attribute to it.
The second way of adding a docstring is just setting __doc__. You see? There’s no need to extend the tuple in the first place.
>>> Color.__doc__ = """A namedtuple that represents a color.
It has 4 fields:
r - red
g - green
b - blue
alpha - the alpha channel
"""
Just bear in mind that these methods only work on Python 3+.
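The individual fields can be documented in a similar fashion, since each one is exposed as an attribute of the class with its own writable __doc__. A minimal sketch, with docstring texts made up for this example:

```python
from collections import namedtuple

Color = namedtuple("Color", "r g b alpha")

Color.__doc__ = "A namedtuple that represents a color in RGBA space."

# Each field is exposed as a descriptor on the class, and its
# docstring is writable as well.
Color.r.__doc__ = "Red channel"
Color.alpha.__doc__ = "The alpha channel"

print(Color.__doc__)
print(Color.alpha.__doc__)   # The alpha channel
```

These per-field docstrings then show up in the output of help(Color).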
What Are the Differences Between namedtuples and Data Classes?
Before Python 3.7, creating a simple container of data involved using either:
- a namedtuple
- a regular class
- a third-party library, such as attrs.
If you wanted to go down the class route, that meant you would have to implement a couple of methods. For instance, a regular class requires an __init__ method to set the attributes during instantiation. If you wanted the class to be hashable, that meant implementing a __hash__ method yourself. To compare different objects, you also want an __eq__ method. And finally, to make debugging easier, you need a __repr__ method. Let’s revisit our color use case using a regular class.
class Color:
    """A regular class that represents a color."""
    def __init__(self, r, g, b, alpha=0.0):
        self.r = r
        self.g = g
        self.b = b
        self.alpha = alpha
    def __hash__(self):
        return hash((self.r, self.g, self.b, self.alpha))
    def __repr__(self):
        return "{0}({1}, {2}, {3}, {4})".format(
            self.__class__.__name__, self.r, self.g, self.b, self.alpha
        )
    def __eq__(self, other):
        if not isinstance(other, Color):
            return False
        return (
            self.r == other.r
            and self.g == other.g
            and self.b == other.b
            and self.alpha == other.alpha
        )
As you can see, there's a lot to implement. You just need a container to hold the data for you and not bother with distracting details. Also, a key reason why people preferred to implement a class is that its instances are mutable. In fact, the PEP that introduced Data Classes refers to them as "mutable namedtuples with defaults".
Now, let's see how this class is implemented as a Data Class.
from dataclasses import dataclass
...
@dataclass
class Color:
    """A regular class that represents a color."""
    r: float
    g: float
    b: float
    alpha: float
Wow! Is that it?
Yes, that's it. As simple as that! A major difference is that, since you no longer write __init__ yourself (the decorator generates it), you can just define the attributes after the docstring. Also, they must be annotated with a type hint.
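To see what the decorator provides, the sketch below creates a couple of instances and exercises the generated __repr__ and __eq__ methods, the same ones we had to write by hand in the regular class.

```python
from dataclasses import dataclass


@dataclass
class Color:
    """A Data Class that represents a color."""
    r: float
    g: float
    b: float
    alpha: float


blue = Color(r=0, g=0, b=255, alpha=1.0)

# __repr__ is generated from the field names.
print(blue)    # Color(r=0, g=0, b=255, alpha=1.0)

# __eq__ compares instances field by field.
print(blue == Color(r=0, g=0, b=255, alpha=1.0))   # True
print(blue == Color(r=0, g=0, b=0, alpha=1.0))     # False
```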
Besides being mutable, a Data Class can also have optional fields out of the box. Let’s say that our Color class does not require an alpha field. We can then annotate it as Optional and give it a default value of None.
from dataclasses import dataclass
from typing import Optional
...
@dataclass
class Color:
    """A regular class that represents a color."""
    r: float
    g: float
    b: float
    alpha: Optional[float] = None
And we can instantiate it like so:
>>> blue = Color(r=0, g=0, b=255)
Since they're mutable, we can change whatever field we want:
>>> blue = Color(r=0, g=0, b=255)
>>> blue.r = 1
>>> # or even add more fields on the fly
>>> blue.e = 10
Unfortunately, due to their nature, namedtuples make every field required unless it is given a default. To add one we need either the defaults argument (available since Python 3.7) or a little subclassing.
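Caveat: A Data Class is not hashable by default. To add a __hash__ method, you can either make the class immutable with frozen=True or, if it must stay mutable, force the generation by setting unsafe_hash to True:
@dataclass(unsafe_hash=True)
class Color:
    ...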
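When mutability is not actually needed, freezing the class is usually the safer route to a hashable Data Class. The following is a minimal sketch of that alternative; it is not part of the original example.

```python
from collections import Counter
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Color:
    """An immutable, hashable Data Class."""
    r: float
    g: float
    b: float
    alpha: float


blue = Color(r=0, g=0, b=255, alpha=1.0)

# Hashable, so it can be counted or stored in a set.
counts = Counter([blue, Color(r=0, g=0, b=255, alpha=1.0)])
print(counts[blue])   # 2

# Assigning to a field now raises FrozenInstanceError.
try:
    blue.r = 1
except FrozenInstanceError:
    print("Color instances are read-only")
```

With unsafe_hash=True, by contrast, the hash is computed from fields that can still change, so mutating an instance after putting it in a set or using it as a dictionary key can make it impossible to find again.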
Another difference is that unpacking is a first-class citizen with namedtuples. If you want your Data Class to have the same behavior, you must implement it yourself.
from dataclasses import dataclass, astuple
...
@dataclass
class Color:
    """A regular class that represents a color."""
    r: float
    g: float
    b: float
    alpha: float

    def __iter__(self):
        # astuple is imported by name above, so it is called directly.
        yield from astuple(self)
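If adding __iter__ to the class feels intrusive, the unpacking can also happen at the call site. This is a small sketch of that variation, which leaves the class untouched:

```python
from dataclasses import dataclass, astuple


@dataclass
class Color:
    r: float
    g: float
    b: float
    alpha: float


blue = Color(r=0, g=0, b=255, alpha=1.0)

# astuple() returns a plain tuple of the field values, in order.
r, g, b, alpha = astuple(blue)
print(r, g, b, alpha)   # 0 0 255 1.0
```

Keep in mind that astuple copies the values and recurses into nested Data Classes, so it does more work than unpacking a namedtuple.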
Performance Comparison
Comparing only the features is not enough: named tuples and Data Classes differ in performance too. Data Classes are implemented in pure Python and store their fields in a per-instance dict, so reading a field is a dictionary lookup. On the other hand, namedtuples are just an extension of a regular tuple. That means their implementation is based on C code and they have a smaller memory footprint.
To show that, consider this experiment on Python 3.8.5.
In [6]: import sys
In [7]: ColorTuple = namedtuple("Color", "r g b alpha")
In [8]: @dataclass
   ...: class ColorClass:
   ...:     """A regular class that represents a color."""
   ...:     r: float
   ...:     g: float
   ...:     b: float
   ...:     alpha: float
   ...:
In [9]: color_tup = ColorTuple(r=50, g=205, b=50, alpha=1.0)
In [10]: color_cls = ColorClass(r=50, g=205, b=50, alpha=1.0)
In [11]: %timeit color_tup.r
36.8 ns ± 0.109 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [12]: %timeit color_cls.r
38.4 ns ± 0.112 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [15]: sys.getsizeof(color_tup)
Out[15]: 72
In [16]: sys.getsizeof(color_cls) + sys.getsizeof(vars(color_cls))
Out[16]: 152
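As you can see, accessing a field takes about the same time in both (36.8 ns for the namedtuple versus 38.4 ns for the Data Class in this run), however the Data Class takes up about twice as much memory as the tuple (152 versus 72 bytes).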
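For readers who are not using IPython, roughly the same experiment can be run as a plain script with the timeit module. This is a sketch that assumes CPython; the absolute numbers depend on the machine and the Python version, so none are shown here.

```python
import sys
import timeit
from collections import namedtuple
from dataclasses import dataclass

ColorTuple = namedtuple("ColorTuple", "r g b alpha")


@dataclass
class ColorClass:
    r: float
    g: float
    b: float
    alpha: float


color_tup = ColorTuple(r=50, g=205, b=50, alpha=1.0)
color_cls = ColorClass(r=50, g=205, b=50, alpha=1.0)

# Time one million attribute reads on each object.
tup_seconds = timeit.timeit("c.r", globals={"c": color_tup}, number=1_000_000)
cls_seconds = timeit.timeit("c.r", globals={"c": color_cls}, number=1_000_000)
print(f"namedtuple: {tup_seconds:.3f} s, dataclass: {cls_seconds:.3f} s")

# The Data Class footprint includes the dict that holds its attributes.
tup_bytes = sys.getsizeof(color_tup)
cls_bytes = sys.getsizeof(color_cls) + sys.getsizeof(vars(color_cls))
print(f"namedtuple: {tup_bytes} bytes, dataclass: {cls_bytes} bytes")
```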
How to Add Type Hints to a namedtuple
As you can see, Data Classes use type hints by default. However, we can have them on namedtuples as well. By importing the NamedTuple class from the typing module and inheriting from it, we can have our Color tuple annotated.
from typing import NamedTuple
...
class Color(NamedTuple):
    """A namedtuple that represents a color."""
    r: float
    g: float
    b: float
    alpha: float
Another detail that might have gone unnoticed is that this way also allows us to have docstrings. If we type help(Color) we'll be able to see them.
Help on class Color in module __main__:
class Color(builtins.tuple)
| Color(r: float, g: float, b: float, alpha: float)
|
| A namedtuple that represents a color.
|
| Method resolution order:
| Color
| builtins.tuple
| builtins.object
|
| Methods defined here:
|
| __getnewargs__(self)
| Return self as a plain tuple. Used by copy and pickle.
|
| __repr__(self)
| Return a nicely formatted representation string
|
| _asdict(self)
| Return a new dict which maps field names to their values.
How to Add Optional Default Values to a namedtuple
Earlier, we learned that Data Classes can have optional values. Also, I mentioned that mimicking the same behavior on a named tuple requires some extra work. One way to do it is through inheritance, as in the example below.
from collections import namedtuple
class Color(namedtuple("Color", "r g b alpha")):
    __slots__ = ()
    def __new__(cls, r, g, b, alpha=None):
        return super().__new__(cls, r, g, b, alpha)
>>> c = Color(r=0, g=0, b=0)
>>> c
Color(r=0, g=0, b=0, alpha=None)
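On Python 3.7 and later, the same result can be had without subclassing. The sketch below shows two alternatives: the defaults argument of the namedtuple factory, and a default value in the typing.NamedTuple class syntax. TypedColor is a name made up for this example.

```python
from collections import namedtuple
from typing import NamedTuple, Optional

# defaults are applied to the rightmost fields, so a single
# default value here belongs to alpha.
Color = namedtuple("Color", "r g b alpha", defaults=[None])
print(Color(r=0, g=0, b=0))        # Color(r=0, g=0, b=0, alpha=None)
print(Color._field_defaults)       # {'alpha': None}


class TypedColor(NamedTuple):
    """Same idea with the class syntax and type hints."""
    r: float
    g: float
    b: float
    alpha: Optional[float] = None


print(TypedColor(r=0, g=0, b=0))   # TypedColor(r=0, g=0, b=0, alpha=None)
```

As with function arguments, fields with defaults must come after the fields without them.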
Conclusion
Named tuples are a very powerful data structure. They make our code cleaner and more reliable. Despite the competition from the new Data Classes, they still have plenty of firewood to burn. In this tutorial, we learned several ways of making use of namedtuples, and I hope you find them useful.
If you liked this post, consider sharing it with your friends! Also, feel free to follow me at https://miguendes.me.
Everything You Need to Know About Python's Namedtuples first appeared on miguendes's blog.