In this tutorial, we will review the different ways to create data classes in Python, from the oldest to the newest. Hopefully, by the end, you will be convinced to use pydantic dataclasses as your default way to create data classes.
Default python class creation
The oldest way to define a class in Python is via the special __init__ method. For our example, we will work with a Point class taking x and y coordinates as input.
class Point:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

p1 = Point(1, 2)
print(p1.x)  # will print 1
print(p1)  # will print something like <__main__.Point object at 0x7fc5d4283c90>
p2 = Point(1, 2)
print(p1 == p2)  # will print False
As you can see, there is some boilerplate because the same names (x and y) appear both as method arguments and as instance attributes. When we print the object to see what it looks like, we get Python's unhelpful default representation.
Even worse, we can instantiate an object like Point(1, 'foo') and Python will not complain at all.
OK, you could say that tools like mypy or pyright will help you catch this bug, but not everybody wants to use them, so we have to find another way.
Also, a plain class does not implement value-based comparison: the default equality check compares object identity, so p1 == p2 returns False even though both objects hold the same attribute values. 🥲
Here is how we can fix these three issues (representation, type checking and comparison):
class Point:
    def __init__(self, x: int, y: int):
        for item in [x, y]:
            if not isinstance(item, int):
                raise ValueError(f'{item} is not an integer')
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Point(x={self.x}, y={self.y})'

    def __eq__(self, other):
        if other.__class__ is self.__class__:
            return (self.x, self.y) == (other.x, other.y)
        return NotImplemented

    # we take the opportunity to write the opposite method
    # (Python 3 derives __ne__ from __eq__ by default, so this one is optional)
    def __ne__(self, other):
        result = self.__eq__(other)
        if result is NotImplemented:
            return NotImplemented
        return not result

p1 = Point(1, 2)
print(p1)  # will print Point(x=1, y=2)
p2 = Point(1, 2)
print(p1 == p2)  # will print True
Point(1, 'foo')  # will raise ValueError
Now we have a working class with a pretty representation, but look at the amount of code we had to write... And we haven't even implemented all the comparison methods.
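To give an idea of what "all the comparison methods" would involve, here is a minimal sketch using functools.total_ordering from the standard library, which generates the missing ordering methods from __eq__ and one of __lt__, __le__, __gt__ or __ge__. The ordering rule (compare points like (x, y) tuples) is my own assumption for the example, not something a Point has to obey.

```python
from functools import total_ordering


@total_ordering
class Point:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Point(x={self.x}, y={self.y})'

    def __eq__(self, other):
        if other.__class__ is self.__class__:
            return (self.x, self.y) == (other.x, other.y)
        return NotImplemented

    def __lt__(self, other):
        # Assumption for this sketch: points are ordered like (x, y) tuples
        if other.__class__ is self.__class__:
            return (self.x, self.y) < (other.x, other.y)
        return NotImplemented


print(Point(1, 2) < Point(1, 3))   # uses our __lt__
print(Point(2, 0) >= Point(1, 9))  # __ge__ is generated by total_ordering
print(sorted([Point(2, 0), Point(1, 9)]))
```

Even with this helper, we still write two comparison methods by hand, which is exactly the kind of boilerplate the next sections try to remove.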
Namedtuples
I will make a small digression on namedtuples, because one can argue that they are a quick way to declare a class. That is true, but not without some caveats...
Consider the following example:
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
p1 = Point(1, 2)
print(p1) # will print Point(x=1, y=2)
p2 = Point(1, 'foo') # we can still mess with variable type
print(p1 == (1, 2)) # will print True :(
OK, with namedtuples we get a pretty representation by default, but we suffer from the same lack of type verification as with a handwritten class. And since a namedtuple inherits from the tuple class, comparing it with a plain tuple in the last instruction returns True, which is not always what we want.
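The same caveats apply to the typed flavour, typing.NamedTuple. The sketch below is only an illustration: the Size class is a hypothetical second type I added to show that two unrelated namedtuples compare equal when their values match.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int


class Size(NamedTuple):
    # hypothetical second type, only here for the comparison below
    width: int
    height: int


p = Point(1, 'foo')               # annotations are not checked at runtime
print(p)                          # Point(x=1, y='foo')
print(Point(1, 2) == Size(1, 2))  # True: both are plain tuples underneath
x, y = Point(3, 4)                # tuple unpacking works as well
```

So the annotations help readers and static type checkers, but nothing is verified when the object is created.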
attrs
To address the class creation problems described above, a well-known Pythonista came up with a library called attrs. Let's see how we can rewrite our Point class.
from attrs import define, field, validators

@define
class Point:
    x: int = field(validator=[validators.instance_of(int)])
    y: int = field(validator=[validators.instance_of(int)])

p1 = Point(1, 2)
print(p1)  # will print Point(x=1, y=2)
p2 = Point(1, 2)
print(p1 == p2)  # will print True
print(p1 == (1, 2))  # will print False
p = Point(1, 'foo')  # will raise a TypeError
Ok, we clearly see a difference with the handwritten class we wrote above. We have:
- A pretty default representation
- Type verification using the field function and validators.
- A default comparison implementation that takes into account the type of the compared objects. This is why the test with a tuple returns False.
It is a well-thought-out library, and it can be customized in different ways, such as defining slots, frozen classes, keyword-only arguments and more.
dataclasses
Python 3.7 introduced dataclasses, defined in PEP 557, with the aim of simplifying the writing of classes. In fact, this standard library module is heavily inspired by attrs. Let's see what our famous Point class looks like:
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

p1 = Point(1, 2)
p2 = Point(1, 2)
print(p1 == p2)  # will print True
print(p1 == (1, 2))  # will print False
p = Point(1, 'foo')  # will not raise a TypeError :(
We have almost the same advantages as with the attrs definition, except that the argument types are not verified at initialization time.
With the standard library alone, the way to get this verification is to write it yourself in __post_init__:
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

    def __post_init__(self):
        if not isinstance(self.x, int):
            raise TypeError('x is not an integer')
        if not isinstance(self.y, int):
            raise TypeError('y is not an integer')

Point(1, 'foo')  # will raise a TypeError
Yeah, it sucks a little, but it was a deliberate choice of the CPython maintainers to have a simplified version of attrs without validation and other joys.
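If you stay with the standard library, the per-field checks can at least be written once. The following is a rough sketch, not a full validator: it assumes every annotation is a plain class such as int or str, and it would break on generic annotations (list[int], Optional[int]) or when annotations are stored as strings.

```python
from dataclasses import dataclass, fields


@dataclass
class Point:
    x: int
    y: int

    def __post_init__(self):
        # Assumption: each field's annotation is a plain class, so it can be
        # passed directly to isinstance(). Generic types need more work.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f'{f.name} must be {f.type.__name__}, got {type(value).__name__}'
                )


try:
    Point(1, 'foo')
except TypeError as exc:
    print(exc)  # y must be int, got str
```

It works for our Point, but as soon as the annotations get richer you end up rewriting a validation library, which is a good reason to reach for one that already exists.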
pydantic dataclasses
Finally, we will talk about pydantic, a data validation library made famous by a relatively young web framework, FastAPI. If you don't know it, I highly recommend checking out its API; it is another well-thought-out piece of software. I also wrote a blog post presenting some of its advantages.
The feature that interests us in this article is its dataclasses. Again, we will look at our Point class implementation. 😁
from pydantic.dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

p1 = Point(1, 2)
print(p1)  # will print Point(x=1, y=2)
p2 = Point(1, 2)
print(p1 == p2)  # will print True
print(p1 == (1, 2))  # will print False
p = Point(1, 'foo')  # will raise a pydantic.ValidationError
We have all the advantages that we had with the attrs implementation, and the code is even shorter!
pydantic leverages type annotations to validate data, and we can still use the API provided by the standard library dataclasses module, like field, asdict, etc., because pydantic.dataclasses is just a wrapper around the standard one. For proof, look at the following example:
import dataclasses
import pydantic.dataclasses

@dataclasses.dataclass
class Point:
    x: int
    y: int

# no error raised
print(Point('foo', 2))

Point = pydantic.dataclasses.dataclass(Point)
# error raised
print(Point('foo', 2))
Here we wrap a normal dataclass into a pydantic one, and we have the same verifications and features!
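If you have never used the standard library helpers mentioned above (field, asdict), here is a small sketch of what they do. It deliberately uses a plain standard library dataclass so it runs without pydantic installed; the label field is a hypothetical extra I added just to have something to pass to field().

```python
from dataclasses import asdict, dataclass, field, replace


@dataclass(frozen=True, order=True)
class Point:
    x: int
    y: int
    # hypothetical extra field, only here to show field();
    # compare=False keeps it out of == and < comparisons
    label: str = field(default='', compare=False)


p = Point(1, 2, label='start')
print(asdict(p))                  # {'x': 1, 'y': 2, 'label': 'start'}
print(replace(p, y=5))            # Point(x=1, y=5, label='start')
print(p == Point(1, 2))           # True: label is ignored by comparisons
print(Point(1, 2) < Point(1, 3))  # True: order=True generates the ordering methods
```

These are the kind of features you keep access to when the pydantic decorator wraps a standard dataclass.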
Pydantic is a fantastic library (yes, I am a little biased), and I can only recommend that you check out its documentation.
This is all for this tutorial, hope you enjoyed it. Take care of yourself and see you next time! 😁