DEV Community

Patrice Ferlet


Python, the usefulness of "dataclass"

When you need to describe data, you commonly use classes (rather than dicts), and Python offers a very elegant way to manage this.

A data object represents data that can be saved in a database. This concept is at the heart of SQLAlchemy, but as the name suggests, SQLAlchemy is made for SQL databases. Today there are also document databases (like MongoDB, ArangoDB, RethinkDB, which I love so much, or even PostgreSQL with its JSON types). In that world, a piece of data is a "structured and typed document" that you save "as is". It's not the same paradigm, and not the same kind of control. There are advantages and disadvantages, but we won't debate that here.

Today's topic is how Python can help us define our "data classes" in a generic way, with validation and initialization built in.

Until now, you have probably used a plain object… that is about to change.

For example, to describe a "user":

class User:
    username = ""
    email = ""
    password = ""
    level = 0

Of course, you need a bit more control: a constructor to initialize the properties, maybe a setter to handle the password and to keep it out of the JSON representation...
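To see what is at stake, here is a sketch of the same User written entirely by hand, with only a constructor, a readable representation and an equality check (no password setter yet). It is roughly the boilerplate that dataclass will generate for you:

```python
class User:
    """A user written without dataclasses: every method is hand-written."""

    def __init__(self, username: str, email: str, password: str, level: int = 0):
        self.username = username
        self.email = email
        self.password = password
        self.level = level

    def __repr__(self) -> str:
        return (
            f"User(username={self.username!r}, email={self.email!r}, "
            f"password={self.password!r}, level={self.level!r})"
        )

    def __eq__(self, other: object) -> bool:
        # Compare field by field, only against objects of the same class
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.username, self.email, self.password, self.level) == (
            other.username,
            other.email,
            other.password,
            other.level,
        )


print(User("John", "me@test.com", "foobar"))
# User(username='John', email='me@test.com', password='foobar', level=0)
```

Every new field means touching three methods, which is exactly the repetition dataclass removes.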

And this is where dataclasses comes in: a very cool module from Python's standard library (available since Python 3.7).

Dataclasses?

If you read the corresponding documentation page, you'll discover a treasure. This module offers an easy-to-use but powerful decorator, plus helper functions, to manage a "data object" defined as a class.

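The @dataclass decorator generates a constructor, a representation and an equality operator for you, and it only manages as fields the class attributes that have a type annotation. An attribute without an annotation is not a field: it stays a plain class attribute, for example for internal use. This makes it very easy to tell what is part of the data and what is not.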

Let me show you the most basic usage, with annotations instead of values:

from dataclasses import dataclass

@dataclass
class User:
    username: str
    email: str
    password: str
    level: int
    example = "foo" # This is not a field, no annotation

Now the class has an __init__ method that accepts each field as a positional or keyword argument, a __repr__() method, and an __eq__() method that implements the == operator by comparing the fields in order.
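You can check which attributes became fields with the fields() function from dataclasses, and look at the generated constructor with the standard inspect module. A small sketch reusing the User class above:

```python
import inspect
from dataclasses import dataclass, fields


@dataclass
class User:
    username: str
    email: str
    password: str
    level: int
    example = "foo"  # no annotation: a plain class attribute, not a field


# Only the annotated attributes are fields
print([f.name for f in fields(User)])
# ['username', 'email', 'password', 'level']

# The generated __init__ takes one parameter per field, in declaration order
print(list(inspect.signature(User).parameters))
# ['username', 'email', 'password', 'level']

print(User.example)  # foo
```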

There is more to see later, but let's check the usage:

user1 = User(
    username="John",
    email="me@test.com",
    password="foobar",
    level=0,
)
user2 = User(
    username="John",
    email="me2@test.com",
    password="foobar",
    level=0,
)

# show a nice object representation
print(user1)
# >> User(username='John', email='me@test.com', password='foobar', level=0)

# try comparison
print(user1 == user2)  # False, email differs

OK, that's nice, but we can do more… a lot more!

Dataclass fields

Let's imagine we want to create a user without setting its level, because we decide that the level should be 0 by default.

The issue is that the generated __init__ requires every field as an argument, so at this point we must provide a level each time we build an object.

The dataclasses module provides a function named field() that makes this easy.

from dataclasses import dataclass, field

@dataclass
class User:
    username: str
    email: str
    password: str
    level: int = field(default=0) # set field as optional

# test
user1 = User(
    username="joe",
    email="me@test.com",
    password="123456") # no level provided

# but the level is set to 0
print(user1.level)
# >> 0
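For an immutable value like 0, a plain default (level: int = 0) is equivalent to field(default=0). field() becomes essential for mutable defaults: writing roles: list = [] raises a ValueError when the class is defined, because that one list would be shared by every instance, so you pass a default_factory instead. A sketch, where roles is a hypothetical field added only for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class User:
    username: str
    email: str
    password: str
    level: int = 0  # same as field(default=0)
    # Hypothetical field: default_factory is called for each new instance,
    # so every user gets its own empty list
    roles: list[str] = field(default_factory=list)


user1 = User(username="joe", email="me@test.com", password="123456")
user2 = User(username="ann", email="ann@test.com", password="abcdefgh")
user1.roles.append("admin")

print(user1.level)  # 0
print(user1.roles)  # ['admin']
print(user2.roles)  # []
```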

And that's not all: we can do a lot more.

No need for a constructor: use "post init"

Sometimes you want to do something when an object is instantiated, and the first reflex is to write a constructor. But here the dataclass decorator already provides one, and it handles default values well; if you write your own __init__, the decorator won't generate its own.

That's why you can define a __post_init__ method instead: the generated __init__ calls it right after assigning the fields.

For example, let's check the password length.

""" User management """
from dataclasses import dataclass, field


@dataclass
class User:
    """A user object"""

    username: str
    email: str
    password: str
    level: int = field(default=0)  # set field as optional

    def __post_init__(self):
        if len(self.password.strip()) < 8:
            raise ValueError("Password must be at least 8 characters long")
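__post_init__ is not limited to validation: it can also normalize values before the object is used. A sketch, assuming you also want to store emails trimmed and lower-cased (that normalization rule is only an illustration, not part of the original example):

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """A user object"""

    username: str
    email: str
    password: str
    level: int = field(default=0)

    def __post_init__(self):
        # Called by the generated __init__, after all fields are assigned
        if len(self.password.strip()) < 8:
            raise ValueError("Password must be at least 8 characters long")
        # Hypothetical normalization rule, for illustration only
        self.email = self.email.strip().lower()


user = User(username="joe", email="  Me@Test.com ", password="12345678")
print(user.email)  # me@test.com

try:
    User(username="joe", email="me@test.com", password="123456")
except ValueError as err:
    print(err)  # Password must be at least 8 characters long
```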

That's enough for simple validation.

Of course, you could also manage this with a property setter or getter; this is only an example.

When dataclass becomes central

You may think it is only a gadget, an overly simple way to manage classes that represent data.

Now it's time to see how this simplicity gives you control.

I will show how this can help you build an API with Quart and a bit of Quart Schema. If you already use Flask, that will not be a problem, as Quart is essentially an asynchronous "fork" of Flask with the same API.

Without it, you probably did something like this:


from quart import jsonify


@api.route("/user")
async def user():
    user = User(
        username="John",
        email="me@test.com",
        password="12345678"  # at least 8 characters, see __post_init__
    )
    return jsonify({
        "username": user.username,
        "email": user.email,
        "level": user.level,
    })


Of course, you probably also wrote methods to convert the data to JSON, or to a dict. With dataclasses and Quart Schema, it's much more explicit.

First, you must attach Quart Schema to the application:

from quart import Quart
from quart_schema import QuartSchema, validate_response

api = Quart(__name__)
QuartSchema(api)

Then you can return the dataclass instance directly: Quart Schema converts it to JSON, so there is no need to call jsonify or to write the conversion yourself!

@api.route("/user")
async def user() -> User:
    return User(
        username="John",
        email="me@test.com",
        password="12345678"  # at least 8 characters, see __post_init__
    )

That works:

http -b :5000/user
{
    "email": "me@test.com",
    "level": 0,
    "password": "12345678",
    "username": "John"
}
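To reproduce that payload outside the web framework, here is a stdlib-only approximation built on dataclasses.asdict() and json.dumps(). It is not Quart Schema's actual code, just a quick way to see which fields end up in the body:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class User:
    username: str
    email: str
    password: str
    level: int = field(default=0)


user = User(username="John", email="me@test.com", password="12345678")

# Dataclass -> dict -> JSON, with sorted keys like the HTTPie output above
payload = json.dumps(asdict(user), sort_keys=True, indent=4)
print(payload)
```

Every field is serialized, password included, which is exactly the problem discussed next.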

OK, but… the password…

Hide fields

The problem, of course, is that the password is sent in the response.

Of course, you will hash the password in the database, and you must never send the password back, hashed or not.

There are plenty of ways to handle this; here is the one I prefer.

In my design, there are several kinds of data to manage:

  • a user: the representation of what I can show to everybody
  • a data user: the representation of what I manage in the database
  • some specific user representations for the "login" or "registration" processes

So, here is my example:

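from dataclasses import dataclass, field
from hashlib import sha1


@dataclass
class User:
    """User data"""

    username: str
    email: str
    level: int = field(default=0, kw_only=True)  # kw_only requires Python 3.10+


@dataclass
class DataUser(User):
    """Data class for User in database"""

    password: str = field(kw_only=True)

    def hash_password(self):
        """Hash password with SHA1"""
        # SHA1 keeps the example short; for real passwords prefer a slow,
        # salted hash such as hashlib.scrypt, bcrypt or argon2
        self.password = sha1(self.password.encode()).hexdigest()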

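The DataUser class inherits the fields of User. We must set kw_only=True (available since Python 3.10) on level: otherwise the inherited field with a default value would come before password, which has no default, and the dataclass decorator would raise a TypeError when defining DataUser.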

@api.route("/user/<email>")
async def get_user(email: str) -> User:
    """Get a user"""
    # db() is pseudo-code standing for your database client
    user = db("app").table("users").get(email)
    del user["password"]
    return User(**user)

I want to insist on this: I'm SURE the password will never be sent back in the response, because constructing a User() raises a TypeError if a password argument is passed to it.

And to save a user:

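from dataclasses import asdict
from quart import request


@api.route("/user", methods=["POST"])
async def create_user():
    """Create a new user"""
    sent = await request.json
    user = DataUser(**sent)
    user.hash_password()

    document = asdict(user)
    # db() is pseudo-code; the insert result is a status report, not the document
    res = db("app").table("users").insert(document)
    # check errors in res... then

    del document["password"]
    return User(**document)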

asdict() comes from the dataclasses module: it recursively builds a dictionary from your dataclass instance, so it's easy to use with a document database like MongoDB or RethinkDB.

Note that using validate_request and validate_response from Quart Schema simplifies the handler a lot. For example:

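from dataclasses import asdict
from quart_schema import validate_request, validate_response


@api.route("/user", methods=["POST"])
@validate_request(DataUser)
@validate_response(User)
async def create_user(data: DataUser) -> User:
    """Create a new user"""

    data.hash_password()
    document = asdict(data)
    res = db("app").table("users").insert(document)  # db() is pseudo-code

    # check errors in res... then
    del document["password"]
    return User(**document)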

Last words

Anyway, what I hope you take away is the value of dataclass and its companion functions (field, fields, asdict and others) for making your data structures easy to use and manage.
