Kevin Tewouda

Brief history of data classes in Python

In this tutorial, we will review the different ways to create data classes in Python, from the oldest to the newest. Hopefully, by the end, you will be convinced to use pydantic dataclasses as your default way to create data classes.

Default python class creation

The oldest way to define a class in Python is to write it by hand and initialize its attributes in the special __init__ method. For our example, we will work with a Point class taking x and y coordinates as input.

class Point:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y


p1 = Point(1, 2)
print(p1.x)  # will print 1
print(p1)  # will print something like <__main__.Point object at 0x7fc5d4283c90>

p2 = Point(1, 2)
print(p1 == p2)  # will print False

As you can see, there is some boilerplate code because we write the same names twice, once as method arguments and once as instance attributes (x and y). When we print the created object to see what it looks like, we get a strange default representation made by Python.
Even worse, we can instantiate an object like Point(1, 'foo') and Python will not complain at all.
You could say that tools like mypy or pyright will help you catch the bug, but not everybody wants to use them, so we have to find another way.
Also, the default class implementation compares objects by identity, therefore p1 == p2 returns False even if the two objects have the same attribute values. 🥲
Here is how we can fix these three issues (representation, type checking and comparison) with the following code.

class Point:
    def __init__(self, x: int, y: int):
        for item in [x, y]:
            if not isinstance(item, int):
                raise ValueError(f'{item} is not an integer')
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Point(x={self.x}, y={self.y})'

    def __eq__(self, other):
        if other.__class__ is self.__class__:
            return (self.x, self.y) == (other.x, other.y)
        return NotImplemented

    # Python 3 already derives __ne__ from __eq__, so this method is optional;
    # it is spelled out here to show what the opposite comparison looks like
    def __ne__(self, other):
        result = self.__eq__(other)
        if result is NotImplemented:
            return NotImplemented
        return not result


p1 = Point(1, 2)
print(p1)  # will print Point(x=1, y=2)

p2 = Point(1, 2)
print(p1 == p2)  # will print True
Point(1, 'foo')  # will raise ValueError

Now we have a working class with a pretty representation, but look at the amount of code we have to write... And we haven't even implemented the ordering comparison methods (<, <=, > and >=).
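To give an idea of what is still missing, here is a minimal sketch that adds ordering with functools.total_ordering from the standard library. It is only an illustration: ordering points like the tuples (x, y) is an assumption made for the example, and the validation from the class above is left out to keep it short.

```python
from functools import total_ordering


@total_ordering
class Point:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Point(x={self.x}, y={self.y})'

    def __eq__(self, other):
        if other.__class__ is self.__class__:
            return (self.x, self.y) == (other.x, other.y)
        return NotImplemented

    # total_ordering derives __le__, __gt__ and __ge__ from __eq__ and __lt__
    # assumption for this example: points are ordered like the tuples (x, y)
    def __lt__(self, other):
        if other.__class__ is self.__class__:
            return (self.x, self.y) < (other.x, other.y)
        return NotImplemented


print(Point(1, 2) < Point(1, 3))  # will print True
print(Point(2, 0) >= Point(1, 3))  # will print True
```

Even with this helper, we still have to write two comparison methods by hand, on top of __init__ and __repr__.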

Namedtuples

I will make a small digression on namedtuples because one can argue that they are a way to declare a class quickly. That is true, but not without some caveats...
Consider the following example:

from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p1 = Point(1, 2)
print(p1)  # will print Point(x=1, y=2)

p2 = Point(1, 'foo')  # we can still mess with variable type

print(p1 == (1, 2))  # will print True :(

Ok, with namedtuples, we have a pretty representation by default, but we suffer from the same lack of type verification as with the default way of creating classes. And since a namedtuple inherits from tuple, the comparison with a plain tuple in the last instruction returns True, which is not always what we want.
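For completeness, the standard library also offers typing.NamedTuple, which lets us write the same thing with type annotations. The sketch below is an addition to the original example and only shows that the two caveats remain: the annotations are not enforced at runtime and the object is still a tuple.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int


p1 = Point(1, 2)
print(p1)  # will print Point(x=1, y=2)

p2 = Point(1, 'foo')  # still accepted: annotations are not checked at runtime

print(p1 == (1, 2))  # will print True, a Point is still a tuple
```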

attrs

To address the class creation problems described above, a well-known Pythonista created a library called attrs. Let's see how we can rewrite our Point class.

from attrs import define, field, validators


@define
class Point:
    x: int = field(validator=[validators.instance_of(int)])
    y: int = field(validator=[validators.instance_of(int)])


p1 = Point(1, 2)
print(p1)  # will print Point(x=1, y=2)

p2 = Point(1, 2)
print(p1 == p2)  # will print True

print(p1 == (1, 2))  # will print False

p = Point(1, 'foo')  # will raise a TypeError

Ok, we clearly see a difference with the handwritten class from the first section. We have:

  • A pretty default representation
  • Type verification using the field function and validators.
  • A default comparison implementation that takes into account the type of the compared objects. This is why the test with a tuple returns False.

It is a well-thought-out library, and it can be customized in different ways, such as defining slots, frozen classes, keyword-only arguments and more.

dataclasses

Python 3.7 introduced dataclasses, defined in PEP 557, with the aim of simplifying the writing of classes. In fact, this standard library module is heavily inspired by attrs. Let's see what our famous Point class looks like:

from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


p1 = Point(1, 2)

p2 = Point(1, 2)
print(p1 == p2)  # will print True

print(p1 == (1, 2))  # will print False

p = Point(1, 'foo')  # will not raise a TypeError :(

We have almost the same advantages as with the attrs definition, except that the argument types are not verified at initialization time.
With the standard library alone, we have to write this verification ourselves in the __post_init__ method:

from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int

    def __post_init__(self):
        if not isinstance(self.x, int):
            raise TypeError('x is not an integer')
        if not isinstance(self.y, int):
            raise TypeError('y is not an integer')


Point(1, 'foo')  # will raise a TypeError

Yeah, it sucks a little, but it was a deliberate choice of the CPython maintainers to have a simplified version of attrs, without validation and other niceties.
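If writing one check per attribute feels repetitive, the checks can be driven by the annotations themselves. The following is only a sketch under a strong assumption: every annotation is a plain class such as int or str. Generic types like list[int], or annotations stored as strings, would need more work.

```python
from dataclasses import dataclass, fields


@dataclass
class Point:
    x: int
    y: int

    def __post_init__(self):
        # assumption: f.type is a plain class usable with isinstance
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(f'{f.name} is not of type {f.type.__name__}')


Point(1, 2)  # no error raised

try:
    Point(1, 'foo')
except TypeError as e:
    print(e)  # will print y is not of type int
```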

pydantic dataclasses

Finally, we will talk about pydantic, a data validation library made famous by a relatively young web framework, FastAPI. If you don't know it, I highly recommend checking out its API; it is another well-thought-out piece of software. I also wrote a blog post presenting some of its advantages.

The feature that interests us in this article is its dataclasses. Again, we will look at our Point class implementation. 😁

from pydantic.dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


p1 = Point(1, 2)
print(p1)  # will print Point(x=1, y=2)

p2 = Point(1, 2)
print(p1 == p2)  # will print True

print(p1 == (1, 2))  # will print False

p = Point(1, 'foo')  # will raise a pydantic.ValidationError

We have all the advantages that we had with the attrs implementation, and the code is even shorter!
pydantic leverages type annotations to validate data, and we can still use the API provided by the standard library dataclasses module, like field, asdict, etc., because pydantic.dataclasses is just a wrapper around the standard one. For proof, look at the following example:

import dataclasses
import pydantic.dataclasses


@dataclasses.dataclass
class Point:
    x: int
    y: int


# no error raised
print(Point('foo', 2))

Point = pydantic.dataclasses.dataclass(Point)

# error raised
print(Point('foo', 2))

Here we wrap a normal dataclass into a pydantic one, and we get the same verifications and features!
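To illustrate the point about the standard library API, here is a small sketch that uses dataclasses.field, is_dataclass and asdict with a pydantic dataclass. The default value for y is an assumption added for the example. One behavior worth knowing: in its default mode, pydantic converts values when it safely can, so a numeric string such as '2' is accepted for an int field.

```python
import dataclasses

from pydantic import ValidationError
from pydantic.dataclasses import dataclass


@dataclass
class Point:
    x: int
    # hypothetical default, only here to show dataclasses.field in action
    y: int = dataclasses.field(default=0)


p = Point(1)
print(dataclasses.is_dataclass(p))  # will print True
print(dataclasses.asdict(p))  # will print {'x': 1, 'y': 0}

# in the default mode, the numeric string '2' is converted to the integer 2
print(Point(1, '2').y)  # will print 2

try:
    Point(1, 'foo')
except ValidationError:
    print('invalid point')
```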
Pydantic is a fantastic library (yes, I am a little biased) and I can only recommend that you check out its documentation.

This is all for this tutorial, hope you enjoyed it. Take care of yourself and see you next time! 😁


If you like my article and want to continue learning with me, don’t hesitate to follow me here and subscribe to my newsletter on substack 😉
