
Miguel Brito

Posted on • Originally published at miguendes.me

Everything You Need to Know About Python's Namedtuples

In this article, I will discuss the most important aspects of namedtuples in Python. We’ll start from the very basics and move up to more complex concepts. You’ll learn why you should use them and how they can make your code cleaner. At the end of this guide, you’ll feel very comfortable using them in numerous situations.

Learning Objectives

By the end of this tutorial, you should be able to:

  • Understand why and when you should use them
  • Convert regular tuples and dictionaries into Namedtuples
  • Convert a namedtuple to dictionary or regular tuple
  • Sort a list of Namedtuples
  • Understand the differences between Namedtuples and Data Classes
  • Create Namedtuples with optional fields
  • Serialize Namedtuples to JSON
  • Add docstrings

Table of Contents

  1. Why should I use namedtuples?
  2. How to Convert a Regular Tuple or Dictionary Into a namedtuple
  3. How to Convert a namedtuple to Dictionary or Regular Tuple
  4. How to Sort a List of namedtuples
  5. How to Serialize namedtuples to JSON
  6. How to Add a docstring to a namedtuple
  7. What Are the Differences Between namedtuples and Data Classes?
  8. Conclusion

Why should I use namedtuples?

namedtuple is a very interesting (and also underrated) data structure. It’s very common to find Python code that relies heavily on regular tuples, or sometimes dictionaries, to store data. Don’t get me wrong, both dictionaries and tuples have their value. The problem lies in misusing them. Allow me to explain.

Suppose that you have a function that converts a string into a color. The color must be represented in a 4-dimensional space, the RGBA.

def convert_string_to_color(desc: str, alpha: float = 0.0):
    if desc == "green":
        return 50, 205, 50, alpha
    elif desc == "blue":
        return 0, 0, 255, alpha
    else:
        return 0, 0, 0, alpha

Then, we can use it like this:

r, g, b, a = convert_string_to_color(desc="blue", alpha=1.0)

Ok, that works, but... we have a couple of problems here. The first one is that there's no way to ensure the order of the returned values. That is, there's nothing stopping another developer from calling convert_string_to_color like this:

g, b, r, a = convert_string_to_color(desc="blue", alpha=1.0)

Also, we may not know that the function returns 4 values, and end up calling the function like so:

r, g, b = convert_string_to_color(desc="blue", alpha=1.0)

Which, in turn, fails with ValueError since we cannot unpack the whole tuple.

That's true. But why don't you use a dictionary instead?

Python’s dictionaries are a very versatile data structure. They can serve as an easy and convenient way to store multiple values. However, a dict doesn’t come without shortcomings. Due to their flexibility, dictionaries are very easily abused. As an illustration, let us convert our example to use a dictionary instead of a tuple.

def convert_string_to_color(desc: str, alpha: float = 0.0):
    if desc == "green":
        return {"r": 50, "g": 205, "b": 50, "alpha": alpha}
    elif desc == "blue":
        return {"r": 0, "g": 0, "b": 255, "alpha": alpha}
    else:
        return {"r": 0, "g": 0, "b": 0, "alpha": alpha}

Ok, we now can use it like this, expecting just one value to be returned:

color = convert_string_to_color(desc="blue", alpha=1.0)

No need to remember the order, but it has at least two drawbacks. The first one is that we must keep track of the key names. If we change {"r": 0, "g": 0, "b": 0, "alpha": alpha} to {"red": 0, "green": 0, "blue": 0, "a": alpha}, we’ll get a KeyError back when accessing a field, as the keys r, g, b, and alpha no longer exist.

The second issue with dicts is that they are not hashable. That means we cannot store them in a set or use them as keys in other dictionaries. Let’s imagine that we want to keep track of how many colors a particular image has. If we use collections.Counter to count them, we’ll get TypeError: unhashable type: 'dict'.

Also, dictionaries are mutable, so we can add as many new keys as we want. Trust me, this is a recipe for nasty bugs that are really hard to track down.
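To make the hashability problem concrete, here is a minimal sketch that feeds two dictionary-based colors to Counter. The variable names (pixels, error_message) are made up for illustration.

```python
from collections import Counter

# Two pixels with the same color, stored as dictionaries.
pixels = [
    {"r": 0, "g": 0, "b": 255, "alpha": 1.0},
    {"r": 0, "g": 0, "b": 255, "alpha": 1.0},
]

try:
    Counter(pixels)
    error_message = None
except TypeError as exc:
    # Counter has to hash every element, and a dict cannot be hashed.
    error_message = str(exc)

print(error_message)
```

The exact wording of the message may differ slightly between Python versions, but the exception type is TypeError in all of them.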

Ok, fine, that makes sense. So, now what? What can I use instead?

namedtuples! Just... use them!

Converting our function to use namedtuples is as easy as this:

from collections import namedtuple
...
Color = namedtuple("Color", "r g b alpha")
...
def convert_string_to_color(desc: str, alpha: float = 0.0):
    if desc == "green":
        return Color(r=50, g=205, b=50, alpha=alpha)
    elif desc == "blue":
        return Color(r=0, g=0, b=255, alpha=alpha)
    else:
        return Color(r=0, g=0, b=0, alpha=alpha)

As in the dict case, we can assign it to a single variable and use it as we please. There’s no need to remember the ordering. And if you’re using an IDE such as PyCharm or VSCode, you get autocompletion out of the box.

color = convert_string_to_color(desc="blue", alpha=1.0)
...
has_alpha = color.alpha > 0.0
...
is_black = color.r == 0 and color.g == 0 and color.b == 0

To top it all off, namedtuples are immutable. If another developer on the team thinks it’s a good idea to add a new field during runtime, the program will fail.

>>> blue = Color(r=0, g=0, b=255, alpha=1.0)

>>> blue.e = 0
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-8c7f9b29c633> in <module>
----> 1 blue.e = 0

AttributeError: 'Color' object has no attribute 'e'

Not only that, now we can use Counter to track how many colors a collection has.

>>> from collections import Counter
>>> Counter([blue, blue])
Counter({Color(r=0, g=0, b=255, alpha=1.0): 2})

How to Convert a Regular Tuple or Dictionary Into a namedtuple

Now that we understand the motivations behind using namedtuple, it’s time to learn how to convert regular tuples and dictionaries into named tuples. Say that, for whatever reason, you have a dictionary instance containing the RGBA values for a color. If you want to convert it to the Color namedtuple we just created, you can do it like this:

>>> c = {"r": 50, "g": 205, "b": 50, "alpha": 0}
>>> Color(**c)
Color(r=50, g=205, b=50, alpha=0)

That’s it. We can just leverage the ** construct to unpack the dict into a namedtuple.

What if I want to create a namedtuple from the dict?

No problem, do it like this and you're good:

>>> c = {"r": 50, "g": 205, "b": 50, "alpha": 0}
>>> Color = namedtuple("Color", c)
>>> Color(**c)
Color(r=50, g=205, b=50, alpha=0)

By passing the dict instance to the namedtuple factory function, it will take care of the creation of the fields for you. Then, to create a new Color instance from a dict we can just unpack the dictionary like in the previous example.
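Regular tuples can be converted in a similar way. The sketch below shows two options, assuming the values in the tuple are already in field order: unpacking with * and the _make class method. The rgba variable is just an illustrative name.

```python
from collections import namedtuple

Color = namedtuple("Color", "r g b alpha")

rgba = (50, 205, 50, 0.5)  # a regular tuple, in field order

# Option 1: unpack the tuple into positional arguments.
green = Color(*rgba)

# Option 2: use the _make class method, which accepts any iterable.
also_green = Color._make(rgba)

print(green)
print(green == also_green)
```

Both calls raise TypeError if the tuple has the wrong number of items, so a mismatch is caught at construction time.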

How to Convert a namedtuple to Dictionary or Regular Tuple

We've just learned how to convert a dict into a namedtuple. What about the inverse? How can we convert a namedtuple to a dictionary instance?

It turns out, namedtuple comes with a method called ._asdict(). So, converting it is as simple as calling the method.

>>> blue = Color(r=0, g=0, b=255, alpha=1.0)
>>> blue._asdict()
{'r': 0, 'g': 0, 'b': 255, 'alpha': 1.0}

You may be wondering why the method starts with a _. Unfortunately, this is one of Python's inconsistencies. Usually, a leading _ marks a private method or attribute. However, namedtuple prefixes its public methods and attributes with _ to avoid conflicts with field names. Besides _asdict, there’s also _make, _replace, _fields, and _field_defaults. You can find all of them in the official documentation for collections.namedtuple.
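As a quick illustration of the other underscore-prefixed helpers, here is a small sketch using the same Color tuple. The transparent_blue name is only for this example.

```python
from collections import namedtuple

Color = namedtuple("Color", "r g b alpha")
blue = Color(r=0, g=0, b=255, alpha=1.0)

# _fields lists the field names in order.
print(Color._fields)

# _replace returns a new instance; the original is left untouched.
transparent_blue = blue._replace(alpha=0.0)
print(transparent_blue)
print(blue)

# _field_defaults maps field names to default values (empty here,
# because no defaults were declared).
print(Color._field_defaults)
```

Since namedtuples are immutable, _replace is the usual way to get a modified copy.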

To convert a named tuple into a regular tuple, it's enough to pass it to the tuple constructor.

>>> tuple(Color(r=50, g=205, b=50, alpha=0.1))
(50, 205, 50, 0.1)

How to Sort a List of namedtuples

Another common use case is storing several namedtuple instances in a list and sorting them based on some criteria. For example, say that we have a list of colors and we need to sort them by alpha intensity.

Fortunately, Python offers a very pythonic way of doing that. We can use operator.attrgetter. According to the docs, attrgetter “returns a callable object that fetches attr from its operand”. In layman’s terms, we pass attrgetter the name of the field we want to sort by, and pass the result as the key argument of the sorted function. Example:

from operator import attrgetter
...
colors = [
    Color(r=50, g=205, b=50, alpha=0.1),
    Color(r=50, g=205, b=50, alpha=0.5),
    Color(r=50, g=0, b=0, alpha=0.3)
]
...
>>> sorted(colors, key=attrgetter("alpha"))
[Color(r=50, g=205, b=50, alpha=0.1),
 Color(r=50, g=0, b=0, alpha=0.3),
 Color(r=50, g=205, b=50, alpha=0.5)]

Now, the list of colors is sorted in ascending order by alpha intensity!
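The same idea extends to descending order and to tie-breaking on a second field. This is a small sketch with a slightly different list of colors, chosen so that two of them share the same alpha.

```python
from collections import namedtuple
from operator import attrgetter

Color = namedtuple("Color", "r g b alpha")

colors = [
    Color(r=50, g=205, b=50, alpha=0.1),
    Color(r=50, g=205, b=50, alpha=0.5),
    Color(r=0, g=0, b=0, alpha=0.5),
]

# Most opaque first.
by_alpha_desc = sorted(colors, key=attrgetter("alpha"), reverse=True)

# Sort by alpha, then break ties with the red channel.
# With several names, attrgetter returns a tuple of values.
by_alpha_then_r = sorted(colors, key=attrgetter("alpha", "r"))

print(by_alpha_desc)
print(by_alpha_then_r)
```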

How to Serialize namedtuples to JSON

Sometimes you may need to save a namedtuple to JSON. As you probably know, Python’s dictionaries can be converted to JSON through the json module. As a result, if we convert our tuple to a dictionary with the _asdict method, then we’re all set. As an example, consider this scenario:

>>> blue = Color(r=0, g=0, b=255, alpha=1.0)
>>> import json
>>> json.dumps(blue._asdict())
'{"r": 0, "g": 0, "b": 255, "alpha": 1.0}'

As you can see, json.dumps converts a dict into a JSON string.
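Going the other way is just as short. The sketch below round-trips a Color through JSON and also shows why the _asdict step matters: when a namedtuple is passed to json.dumps directly, it is encoded as a plain JSON array. The variable names are illustrative.

```python
import json
from collections import namedtuple

Color = namedtuple("Color", "r g b alpha")
blue = Color(r=0, g=0, b=255, alpha=1.0)

# Serialize through a dict so the field names are preserved.
payload = json.dumps(blue._asdict())

# Passing the namedtuple directly also works, but it is encoded as a
# JSON array and the field names are lost.
as_array = json.dumps(blue)

# Deserialize by unpacking the decoded dict back into Color.
restored = Color(**json.loads(payload))

print(payload)
print(as_array)
print(restored == blue)
```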

How to Add a docstring to a namedtuple

In Python, we can document methods, classes and modules using plain strings. This string is then made available as a special attribute named __doc__. That being said, how can we add documentation to our Color namedtuple?

There’s no right answer to this, but we can do it in two ways. The first one (and a bit more cumbersome) is to extend the tuple using a wrapper. By doing so, we can then define the docstring in this wrapper. As an example, consider the following snippet:

_Color = namedtuple("Color", "r g b alpha")

class Color(_Color):
    """A namedtuple that represents a color.
    It has 4 fields:
    r - red
    g - green
    b - blue
    alpha - the alpha channel
    """

>>> print(Color.__doc__)
A namedtuple that represents a color.
    It has 4 fields:
    r - red
    g - green
    b - blue
    alpha - the alpha channel
>>> help(Color)
Help on class Color in module __main__:

class Color(Color)
 |  Color(r, g, b, alpha)
 |  
 |  A namedtuple that represents a color.
 |  It has 4 fields:
 |  r - red
 |  g - green
 |  b - blue
 |  alpha - the alpha channel
 |  
 |  Method resolution order:
 |      Color
 |      Color
 |      builtins.tuple
 |      builtins.object
 |  
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)

As you can see, by inheriting from the _Color tuple, we added a __doc__ attribute to it.

The second way of adding docstring is just setting __doc__. You see? There’s no need to extend the tuple in the first place.

>>> Color.__doc__ = """A namedtuple that represents a color.
    It has 4 fields:
    r - red
    g - green
    b - blue
    alpha - the alpha channel
    """

Just bear in mind that these methods only work on Python 3+.
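Individual fields can be documented the same way. This is a small sketch, assuming Python 3.5 or newer (the version where the documentation describes field docstrings as writable); the docstring wording is invented for illustration.

```python
from collections import namedtuple

Color = namedtuple("Color", "r g b alpha")

# Document the class itself.
Color.__doc__ = "A namedtuple that represents an RGBA color."

# Document individual fields; these show up in help(Color) too.
Color.r.__doc__ = "Red channel, 0-255."
Color.alpha.__doc__ = "Alpha channel, 0.0 (transparent) to 1.0 (opaque)."

print(Color.__doc__)
print(Color.r.__doc__)
```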

What Are the Differences Between namedtuples and Data Classes?

Before Python 3.7, creating a simple container of data involved using either:

  • a namedtuple
  • a regular class
  • a third-party library, such as attrs.

If you wanted to go down the class route, that meant you would have to implement a couple of methods. For instance, a regular class requires an __init__ method to set the attributes during instantiation. If you wanted the class to be hashable, that meant implementing a __hash__ method yourself. To compare different objects, you also want an __eq__ method. And finally, to make debugging easier, you need a __repr__ method. Let’s revisit our color use case using a regular class.

class Color:
    """A regular class that represents a color."""

    def __init__(self, r, g, b, alpha=0.0):
        self.r = r
        self.g = g
        self.b = b
        self.alpha = alpha

    def __hash__(self):
        return hash((self.r, self.g, self.b, self.alpha))

    def __repr__(self):
        return "{0}({1}, {2}, {3}, {4})".format(
            self.__class__.__name__, self.r, self.g, self.b, self.alpha
        )

    def __eq__(self, other):
        if not isinstance(other, Color):
            return False
        return (
            self.r == other.r
            and self.g == other.g
            and self.b == other.b
            and self.alpha == other.alpha
        )

As you can see, there's a lot to implement when all you need is a container to hold the data, without distracting details. Also, a key reason why people preferred to implement a class is that its instances are mutable. In fact, the PEP that introduced Data Classes refers to them as "mutable namedtuples with defaults".

Now, let's see how this class is implemented as a Data Class.

from dataclasses import dataclass
...
@dataclass
class Color:
    """A regular class that represents a color."""
    r: float
    g: float
    b: float
    alpha: float

Wow! Is that it?

Yes, that's it. As simple as that! A major difference is that, since you no longer write __init__ yourself, you just define the attributes after the docstring. Also, they must be annotated with a type hint.

Besides being mutable, a Data Class can also have optional fields out of the box. Let’s say that our Color class does not require an alpha field. We can then annotate it as Optional and give it a default value of None.

from dataclasses import dataclass
from typing import Optional
...
@dataclass
class Color:
    """A regular class that represents a color."""
    r: float
    g: float
    b: float
    alpha: Optional[float] = None

And we can instantiate it like so:

>>> blue = Color(r=0, g=0, b=255)

Since they're mutable, we can change whatever field we want:

>>> blue = Color(r=0, g=0, b=255)
>>> blue.r = 1
>>> # or even add more fields on the fly
>>> blue.e = 10

Unfortunately, namedtuples don't get optional fields for free the way Data Classes do. On Python 3.7+ the factory function accepts a defaults argument; on older versions we need a bit of a hack and a little meta-programming.

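Caveat: Because they are mutable, Data Classes with the default settings are not hashable. To get a __hash__ method, you can either make them immutable with frozen=True or, if you accept the risk of hashing a mutable object, set unsafe_hash to True: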

@dataclass(unsafe_hash=True)
class Color:
    ...
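For comparison, here is a minimal sketch of the immutable route using frozen=True. The class name FrozenColor is hypothetical and only used to keep this example separate from the Color class above.

```python
from collections import Counter
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class FrozenColor:
    """An immutable, hashable color (hypothetical name for this sketch)."""
    r: float
    g: float
    b: float
    alpha: float


blue = FrozenColor(r=0, g=0, b=255, alpha=1.0)

# frozen=True together with the default eq=True generates __hash__,
# so instances can be counted or used as dict keys.
counts = Counter([blue, blue])

try:
    blue.r = 1
    was_blocked = False
except FrozenInstanceError:
    # Assigning to a field of a frozen instance is rejected.
    was_blocked = True

print(counts[blue], was_blocked)
```

A frozen Data Class behaves much like a namedtuple in this respect, although it still cannot be unpacked or indexed like a tuple.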

Another difference is that unpacking is a first-class citizen with namedtuples. If you want your Data Class to have the same behavior, you must implement it yourself.

from dataclasses import dataclass, astuple
...
@dataclass
class Color:
    """A regular class that represents a color."""
    r: float
    g: float
    b: float
    alpha: float

    def __iter__(self):
        yield from astuple(self)
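A short usage sketch of the same idea, written as a self-contained script: once __iter__ is defined, an instance can be unpacked just like a tuple.

```python
from dataclasses import astuple, dataclass


@dataclass
class Color:
    """A Data Class that supports tuple-style unpacking."""
    r: float
    g: float
    b: float
    alpha: float

    def __iter__(self):
        # Yield the field values in definition order.
        yield from astuple(self)


r, g, b, alpha = Color(r=0, g=0, b=255, alpha=1.0)
print(r, g, b, alpha)
```

Keep in mind that astuple makes a deep copy of nested values, so for large or nested objects a hand-written generator over the fields may be cheaper.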

Performance Comparison

Comparing only the features is not enough; Named Tuples and Data Classes differ in performance too. Data Classes are implemented in pure Python and store their fields in a per-instance dict, so accessing a field is a plain dictionary lookup. On the other hand, namedtuples are just an extension of a regular tuple. That means their implementation is based on C code and they have a smaller memory footprint.

To show that, consider this experiment on Python 3.8.5.

In [6]: import sys

In [7]: ColorTuple = namedtuple("Color", "r g b alpha")

In [8]: @dataclass
   ...: class ColorClass:
   ...:     """A regular class that represents a color."""
   ...:     r: float
   ...:     g: float
   ...:     b: float
   ...:     alpha: float
   ...: 

In [9]: color_tup = ColorTuple(r=50, g=205, b=50, alpha=1.0)

In [10]: color_cls = ColorClass(r=50, g=205, b=50, alpha=1.0)

In [11]: %timeit color_tup.r
36.8 ns ± 0.109 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [12]: %timeit color_cls.r
38.4 ns ± 0.112 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [15]: sys.getsizeof(color_tup)
Out[15]: 72

In [16]: sys.getsizeof(color_cls) + sys.getsizeof(vars(color_cls))
Out[16]: 152

As you can see, in this run the access times are nearly identical (the namedtuple is marginally faster), but the Data Class takes up roughly twice as much memory as the tuple.
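If you are not using IPython, the experiment can be reproduced with the standard timeit module. This is a rough sketch rather than a rigorous benchmark: the class names are changed slightly, and the absolute numbers depend on your machine and Python version.

```python
import sys
import timeit
from collections import namedtuple
from dataclasses import dataclass

ColorTuple = namedtuple("ColorTuple", "r g b alpha")


@dataclass
class ColorClass:
    r: float
    g: float
    b: float
    alpha: float


color_tup = ColorTuple(r=50, g=205, b=50, alpha=1.0)
color_cls = ColorClass(r=50, g=205, b=50, alpha=1.0)

# Time one million attribute reads for each object.
tup_seconds = timeit.timeit("c.r", globals={"c": color_tup}, number=1_000_000)
cls_seconds = timeit.timeit("c.r", globals={"c": color_cls}, number=1_000_000)

# The Data Class instance also owns a __dict__, so count both.
tup_bytes = sys.getsizeof(color_tup)
cls_bytes = sys.getsizeof(color_cls) + sys.getsizeof(vars(color_cls))

print(f"namedtuple: {tup_seconds:.3f}s per 1M reads, {tup_bytes} bytes")
print(f"dataclass:  {cls_seconds:.3f}s per 1M reads, {cls_bytes} bytes")
```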

How to Add Type Hints to a namedtuple

As you can see, Data Classes use type hints by default. However, we can have them on namedtuples as well. By importing the NamedTuple class from the typing module and inheriting from it, we can have our Color tuple annotated.

from typing import NamedTuple
...
class Color(NamedTuple):
    """A namedtuple that represents a color."""
    r: float
    g: float
    b: float
    alpha: float

Another detail that might have gone unnoticed is that this way also allows us to have docstrings. If we type help(Color) we'll be able to see them.

Help on class Color in module __main__:

class Color(builtins.tuple)
 |  Color(r: float, g: float, b: float, alpha: float)
 |  
 |  A namedtuple that represents a color.
 |  
 |  Method resolution order:
 |      Color
 |      builtins.tuple
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __getnewargs__(self)
 |      Return self as a plain tuple.  Used by copy and pickle.
 |  
 |  __repr__(self)
 |      Return a nicely formatted representation string
 |  
 |  _asdict(self)
 |      Return a new dict which maps field names to their values.

How to Add Optional Default Values to a namedtuple

Earlier, we learned that Data Classes can have optional values. I also mentioned that mimicking the same behavior on a named tuple requires some hacking. As it turns out, one way to do it is to use inheritance, as in the example below.

from collections import namedtuple

class Color(namedtuple("Color", "r g b alpha")):
    __slots__ = ()
    def __new__(cls, r, g, b, alpha=None):
        return super().__new__(cls, r, g, b, alpha)
>>> c = Color(r=0, g=0, b=0)
>>> c
Color(r=0, g=0, b=0, alpha=None)
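On Python 3.7 and newer there are two shorter alternatives, sketched below: the defaults argument of the namedtuple factory, and a default value on a typing.NamedTuple field. The TypedColor name is hypothetical and only used to keep the two variants apart.

```python
from collections import namedtuple
from typing import NamedTuple, Optional

# defaults are applied to the rightmost fields, so this covers alpha only.
Color = namedtuple("Color", "r g b alpha", defaults=(None,))


class TypedColor(NamedTuple):
    """Hypothetical typed variant with a default alpha."""
    r: float
    g: float
    b: float
    alpha: Optional[float] = None


black = Color(r=0, g=0, b=0)
typed_black = TypedColor(r=0, g=0, b=0)

print(black)
print(typed_black)
print(Color._field_defaults)
```

As with function arguments, fields with defaults must come after fields without them.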

Conclusion

Named Tuples are a very powerful data structure. They make our code cleaner and more reliable. Despite the competition from the new Data Classes, they still have plenty of firewood to burn. In this tutorial, we learned several ways of making use of namedtuples, and I hope you find them useful.

If you liked this post, consider sharing it with your friends! Also, feel free to follow me at https://miguendes.me.

Everything You Need to Know About Python's Namedtuples first appeared on miguendes's blog.
