Paolo Melchiorre

Posted on Nov 24, 2023 • Originally published at paulox.net on Nov 23, 2023

Database generated columns (2): Django & PostgreSQL

#django #database #postgres #python

An introduction to database generated columns, using PostgreSQL and the new GeneratedField added in Django 5.0.

⚡ TL;DR

GeneratedField is a new Django 5.0 feature and has these characteristics:

its value is calculated entirely by the database
it works in all supported backend databases

Here is a model example with a GeneratedField to use with PostgreSQL as the database backend:

from django.db import models
from django.contrib.postgres.search import (
    SearchVector, SearchVectorField
)


class Quote(models.Model):
    author = models.TextField()
    text = models.TextField()
    search = models.GeneratedField(
        db_persist=True,
        expression=SearchVector(
            "text", config="english"
        ),
        output_field=SearchVectorField(),
    )

🚀 Introduction

I've been thinking for a long time that database generated columns would be a killer feature for Django and now they will finally be available in Django 5.0 with the new GeneratedField.

Given my great interest in this functionality, I proposed it to the Django developer mailing list, I tried to help in the revision phase of the Pull Request on the Django code, but above all I tried to test them as much as possible after the merge, trying to highlight any bugs before the release of Django 5.0 as a stable version.

In this article, I want to share with you the work I have done to allow you to experiment with GeneratedField using common model field types using PostgreSQL as a database backend.

🤖 Generated Field

The GeneratedField are full-fledged fields that can be used in queries, displayed and even indexed but their values cannot be set or modified because they are automatically calculated by the database itself whenever the other fields of the same row are modified.

For this reason, they are very useful for having immediately available values calculated starting from the other fields of the same model as long as these are in the same database table.

The definition of the GeneratedField class requires specifying the expression, output_field, and db_persist attributes.

class GeneratedField(
    expression, output_field, db_persist=None, **kwargs
):
    ...

🕶️ Attributes

`db_persist`

In this article, we're going to use PostgreSQL as a database backend with persisted columns, specified with db_persist=True, in that the database column will occupy storage as if it were a real column.

`output_field`

We have to specify an explicit model field instance to define the field’s data type and its attributes.

`expression`

The expression attributes is the one we're going to play more with, using various example of Expression that the database will use to automatically set the field value each time the model is changed.

↔️ Migrations

The SQL code generated from the migrations produced in our examples will all have the GENERATED ALWAYS SQL (...) STORED syntax, varying only in the internal logic of the expression.

💻 Set Up

The requirements to experiment with Django GeneratedField are:

🐍 Python

A stable and supported version of Python 3 (tested with Python 3.11-3.12):

$ python3 --version
Python 3.12.0

⚗️ Virtual environment

A Python virtual environment:

$ python3 -m venv ~/.generatedfields
$ source ~/.generatedfields/bin/activate

🦄 Django

The latest version of Django (tested with Django 5.0):

$ python3 -m pip install django==5.0

🐘 Psycopg

We’ll use the Python PostgreSQL database adapter which requires installing psycopg greater than the 3.1.8 version.

$ python3 -m pip install psycopg[binary]~=3.1.8

⚠️ Warning

The psycopg[binary] package is meant for beginners to start playing with Python and PostgreSQL without the need to meet the build requirements. For production uses you are advised to use the "Local installation". ⬀

🗃️ The `generatedfields` project

To create the generatedfields project I'll switch to my projects directory:

$ cd ~/projects

and then use the startproject Django command:

$ python3 -m django startproject generatedfields

The basic files of our project will be created in the generatedfields directory:

$ tree --noreport generatedfields/
generatedfields/
├── manage.py
└── generatedfields
    ├── asgi.py
    ├── __init__.py
    ├── settings.py
    ├── urls.py
    └── wsgi.py

🎛️ Activating PostgreSQL

To use Postgres as the database in our Django project, we modify the project database settings, adding the PostgreSQL engine and the connection parameters of our PostgreSQL database:

generatedfields/generatedfields/settings.py

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "HOST": "<my_database_host>",
        "NAME": "<my_database_name",
        "PASSWORD": "<my_database_password>",
        "PORT": "<my_database_port>",
        "USER": "<my_database_user>",
    }
}

⚠️ Warning

You need to create/activate a PostgreSQL database instance (e.g. with system packages, with Docker or as a remote service) and then replace your credentials in the DATABASE settings.

🗂️ The `samples` app

After switching to the generatedfields directory:

$ cd generatedfields

We can create our samples app with the Django startapp command:

$ python3 -m django startapp samples

Again, all the necessary files will be created for us in the samples directory:

$ tree --noreport samples/
samples/
├── admin.py
├── apps.py
├── __init__.py
├── migrations
│   └── __init__.py
├── models.py
├── tests.py
└── views.py

✅ Activating the `samples` app

Now, we have to activate our samples application by inserting its name in the list of the INSTALLED_APPS in the generatedfields settings file.

generatedfields/generatedfields/settings.py

INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "samples",
]

📚 Sample models

The example models I have experimented with have expressions with complexity that progressively increase from example to example.

We start by combining two fields of the model to create the value for the generated one and then using database functions, combining them, using them with conditions and even generating multiple fields from the same starting field.

These experiments have allowed us to identify some bugs and fix them during this Django 5.0 testing period, but I leave off some edge cases to take into account which we'll mention at the end of the article.

🔧 Useful commands

In the following examples, we will use some commands that we show here to avoid repetition.

`makemigrations`

Generates a new database migration to apply it to our database:

$ python3 -m manage makemigrations --no-header

`sqlmigrate`

See the SQL code that our migration will apply to the database:

$ python3 -m manage sqlmigrate samples <migration-id>

`migrate`

Apply migration to our database with:

$ python3 -m manage migrate

`shell`

Start an interactive Django shell:

$ python3 -m manage shell

♻️ Common generated fields

In my previous article in this series (Database generated columns⁽¹⁾: Django & SQLite), I showed nine examples of generated fields using SQLite.

The first eight examples can be used as they were shown even with PostgreSQL as the database.

By viewing the generated SQL code with the sqlmigrate command you will notice some differences because different syntaxes and specific field types will be used, or PostgreSQL capabilities absent in SQLite will be exploited.

The difference in the SQL code generated by PostgreSQL is relevant when viewing the example "🔑 A calculated JSON key field" where the data type for JSON fields and the specific syntax for extracting keys is used.

The ninth example "🖇️ A calculated concatenated field", however, although simple from a functionality point of view, highlights a problem that should be kept in mind when using generated fields, namely that only IMMUTABLE functions can be used.

Unfortunately in the SQL code, the concatenation function that the ORM uses when PostgreSQL is the selected database is not IMMUTABLE, and I will show you a way I found to get around this.

I copied the two examples from my previous article below in full for completeness, showing the difference in the SQL or Python code.

🔑 A calculated JSON key field

In the Package model there is the slug field and the data JSONfield in which we want to store the JSON payload of the Python package, corresponding to the slug value, as it is returned to us by PyPi via its endpoint https://pypi.org/pypi/. The generated field version will contain the latest version of the package, extracted from the JSONfield data.

generatedfields/samples/models.py

from django.db import models
from django.db.models import F


class Package(models.Model):
    slug = models.SlugField()
    data = models.JSONField()
    version = models.GeneratedField(
        expression=F("data__info__version"),
        output_field=models.TextField(),
        db_persist=True,
    )

    def __str__(self):
        return f"{self.slug} {self.version}"

After running makemigrations, use sqlmigrate and see the SQL code that will be applied to the database:

BEGIN;
--
-- Create model Package
--
CREATE TABLE "samples_package" (
  "id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  "slug" varchar(50) NOT NULL,
  "data" jsonb NOT NULL,
  "version" text GENERATED ALWAYS AS (
    ("data"#>'{info,version}')
  ) STORED
);
CREATE INDEX "samples_package_slug_idx"
ON "samples_package" ("slug");
CREATE INDEX "samples_package_slug_idx_like"
ON "samples_package" ("slug" varchar_pattern_ops);
COMMIT;

The SQL code generated for PostgreSQL is decidedly simpler than that generated for other databases (for example SQLite) because it uses the JSON type with its specific functions.

After running migrate, open a shell and check the model behaves as intended:

>>> import json
>>> from urllib.request import urlopen
>>> from samples.models import Package
>>> slug = "django"
>>> URL = f"https://pypi.org/pypi/{slug}/json"
>>> data = json.loads(urlopen(URL).read())
>>> Package.objects.create(slug=slug, data=data)
<Package: django 4.2.7>

🔔 Note

At first glance, the case presented in this example might seem not very useful because it is limited to a simple extraction of the value corresponding to a specific key from a JSON, which in many databases is rather quick to do at run time.

In reality, if we assume that we have a large number of instances of the Package model, the execution of a query in which the version value is already present in its field greatly lightens its execution by simplifying the logic, and also alleviating indexes.

Furthermore, if the version were the only interesting field to extract from the JSON in the data field, we could defer the data field from our queries, speeding up them and drastically reducing the data transferred from the database (e.g. in the change list pages, in the Django admin).

🖇️ A calculated concatenated field

The last User model partially follows the one already present in Django itself, but we only have the first_name and the last_name fields. The generated field full_name is calculated starting from the first two on which is performed a concatenation with the addition of a space, using the appropriate Concat function.

generatedfields/samples/models.py

from django.db import models
from django.db.models import Value


class ConcatOp(models.Func):
    arg_joiner = " || "
    function = None
    output_field = models.TextField()
    template = "%(expressions)s"


class User(models.Model):
    first_name = models.CharField(max_length=150)
    last_name = models.CharField(max_length=150)
    full_name = models.GeneratedField(
        expression=ConcatOp(
            "first_name", Value(" "), "last_name",
        ),
        output_field=models.TextField(),
        db_persist=True,
    )

    def __str__(self):
        return self.full_name

After running makemigrations, use sqlmigrate and see the SQL code that will be applied to the database:

BEGIN;
--
-- Create model User
--
CREATE TABLE "samples_user" (
  "id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  "first_name" varchar(150) NOT NULL,
  "last_name" varchar(150) NOT NULL,
  "full_name" text GENERATED ALWAYS AS (
    "first_name" || ' ' || "last_name"
  ) STORED
);
COMMIT;

Using the custom ConcatOp function the SQL code generated for PostgreSQL uses the concatenation operator || to join column values together.

After running migrate, open a shell and check the model behaves as intended:

>>> from samples.models import User
>>> User.objects.create(first_name="Jane", last_name="Doe")
<User: Jane Doe>

🔔 Note

I defined the ConcatOp function to make this example work also in PostgreSQL because with the Concat function defined by the Djanngo ORM it would have failed as it would have used the CONCAT SQL function which is not IMMUTABLE.

I don't recommend you use this function in production, but rather you can try to lend a hand to modify the current implementation in Django ORM which is in progress in this pull request: Refs #34955 -- Simplified implementation of Concat ⬀

🧩 Activating PostgreSQL module

As we have seen in all the examples of using the generated fields shown with SQLite, with some small differences they also work with PostgreSQL, it was enough to change the backend database in the settings of our project.

However, there are specific features that are contained in the specific module for PostgreSQL (present in Django since version 1.8) and which can be exploited after adding it to the list of INSTALLED_APPS of our project settings.

generatedfields/generatedfields/settings.py

INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "django.contrib.postgres",
    "samples",
]

🔍 A calculated search vector field

Once the PostgreSQL module is activated, we can add a calculated search vector field, in a document model with a content text field. The search vector field is updated with each change to the content of the document, this makes it possible to perform the full-text search immediately eliminating the time and effort of its update.

generatedfields/samples/models.py

from django.db import models
from django.contrib.postgres.search import (
    SearchVector, SearchVectorField
)


class Quote(models.Model):
    author = models.TextField()
    text = models.TextField()
    search = models.GeneratedField(
        db_persist=True,
        expression=SearchVector(
            "text", config="english"
        ),
        output_field=SearchVectorField(),
    )

    def __str__(self):
        return f"[{self.author}] {self.text}"

After running makemigrations, use sqlmigrate and see the SQL code that will be applied to the database:

BEGIN;
--
-- Create model Quote
--
CREATE TABLE "samples_quote" (
  "id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  "author" text NOT NULL,
  "text" text NOT NULL,
  "search" tsvector GENERATED ALWAYS AS (
    to_tsvector('english'::regconfig, COALESCE("text", ''))
  ) STORED
);
COMMIT;

The SQL code is specific to PostgreSQL and exploits the expressions and functions of full-text search, which however are used automatically without the need for triggers or store procedures to keep the search vector column updated.

After running migrate, open a shell and check the model behaves as intended:

>>> from samples.models import Quote
>>> author = "Plato"
>>> text = "Man is a being in search of meaning"
>>> Quote.objects.create(author=author, text=text)
<Quote: [Plato] Man is a being in search of meaning>
>>> Quote.objects.values_list("search")
<QuerySet [("'man':1 'mean':8 'search':6",)]>
>>> Quote.objects.filter(search="meanings").first()
<Quote: [Plato] Man is a being in search of meaning>

🔔 Note

What was done with generated fields could also be achieved using functional indexes, but having a generated field greatly simplifies the investigation of their content, which is very useful for the continuous improvement of search mechanisms.

We have already pointed out that the generated fields, like functional indexes, have the constraint of operating only on the columns of the same table.

But, if we wanted to include fields from other tables in the search mechanisms, we could not do it with functional indexes, but only using a search vector field, updated with more complex systems.

🧮 A calculated array length field

Another specific field that can be used after activating the PostgreSQL module is the ArrayField. This field opens up many interesting usage scenarios such as storing values of the same type (e.g. URL fields) in a single column. In this case, we used an array field of integers which we are going to count in the generated field.

generatedfields/samples/models.py

from django.db import models
from django.contrib.postgres.fields.array import (
    ArrayField,
    ArrayLenTransform,
)


class Landmark(models.Model):
    name = models.TextField()
    reviews = ArrayField(
        models.SmallIntegerField()
    )
    count = models.GeneratedField(
        db_persist=True,
        expression=ArrayLenTransform(
            "reviews"
        ),
        output_field=models.IntegerField(),
    )

    def __str__(self):
        return (
            f"{self.name} "
            f"({self.count} reviews)"
        )

After running makemigrations, use sqlmigrate and see the SQL code that will be applied to the database:

BEGIN;
--
-- Create model Landmark
--
CREATE TABLE "samples_landmark" (
  "id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  "name" text NOT NULL,
  "reviews" smallint [] NOT NULL,
  "count" integer GENERATED ALWAYS AS (
    CASE
      WHEN "reviews" IS NULL THEN NULL
      ELSE coalesce(
        array_length("reviews", 1),
        0
      )
    END
  ) STORED
);
COMMIT;

In the generated SQL code we see the specific syntax used for array-type columns and the function for calculating the length of the field which we use to count how many values are present in the column.

After running migrate, open a shell and check the model behaves as intended:

>>> from samples.models import Landmark
>>> name = "Colosseum"
>>> reviews = [4, 3, 5, 4, 4, 5, 5, 3, 4]
>>> Landmark.objects.create(
...     name=name, reviews=reviews
... )
<Landmark: Colosseum (9 reviews)>

🔔 Note

Unfortunately in the ORM, there are not many functions ready to work with Array fields, but there are many available in PostgreSQL (e.g. concatenation, value extraction, etc ...) that can be used in Django by defining a custom function or proposing addition in Django's PostgreSQL module.

⏳ A calculated date range field

The Range type is also an interesting data type available thanks to Django's PostgreSQL module. Range fields of various types can be created (e.g. integers) and very useful comparison operations can then be carried out (e.g. containment, overlaps, etc...).

In this case, we have created a generated date range type field that we create starting from your normal date fields.

Missing a ready-to-use function in the Django ORM, we defined our custom DateRange function to then be used in the generated field.

generatedfields/samples/models.py

from django.db import models
from django.contrib.postgres.fields import (
    DateRangeField,
)


class DateRangeFunc(models.Func):
    function = "DATERANGE"
    output_field = DateRangeField()


class Booking(models.Model):
    start = models.DateField()
    end = models.DateField()
    span = models.GeneratedField(
        expression=DateRangeFunc(
            "start", "end"
        ),
        output_field=DateRangeField(),
        db_persist=True,
    )

    def __str__(self):
        return (
            f"{self.span.bounds[0]}"
            f"{self.span.lower.isoformat()} -"
            f"> {self.span.upper.isoformat()}"
            f"{self.span.bounds[1]}"
        )

After running makemigrations, use sqlmigrate and see the SQL code that will be applied to the database:

BEGIN;
--
-- Create model Booking
--
CREATE TABLE "samples_booking" (
  "id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  "start" date NOT NULL,
  "end" date NOT NULL,
  "span" daterange GENERATED ALWAYS AS (
    DATERANGE("start", "end")
  ) STORED
);
COMMIT;

The SQL code generated for PostgreSQL in this case uses only the specific data type for the date range and the function that generates it starting from two single dates.

After running migrate, open a shell and check the model behaves as intended:

>>> from datetime import date
>>> from samples.models import Booking
>>> Booking.objects.create(
...     start="2023-1-1",
...     end="2023-1-9"
... )
<Booking: [2023-01-01 -> 2023-01-09)>
>>> Booking.objects.filter(
...     span__contains=date(2023, 1, 5)
... )
<QuerySet [<Booking: [2023-01-01 -> 2023-01-09)>]>

🔔 Note

The range data type is very convenient for making complex range queries with little effort. Having ranges automatically calculated with the generated fields allows you not to change the experience of the user who has to provide this data, but at the same time makes the power of this data type available.

Some functions or features are not yet supported by the Django ORM (e.g. multi ranges) but it is not excluded that they will be added in the future.

🙋 Get involved

Thank you for getting this far with reading this article.

I hope you found it interesting to read the examples I proposed, as it was for me to formulate them and report them here. The motivation behind this work and this article is to collaborate with all Django developers to churn out features that are as solid and stable as possible.

As I have repeatedly invited you to do in my articles and talks, I invite you to get involved.

Try Django 5.0 and experiment with its new features, but above all report any strange behavior or bugs so that they end up being better for everyone.

🥠 Conclusion

As we have seen, the generated fields of Django 5.0 are a very powerful tool, which lend themselves to solving a wide variety of problems.

I think they also have the merit of bringing a lot of logic back into the database, which in most projects is the lowest layer of the project stack.

Together with the database-computed default values they also make it possible to simplify the collaboration of different applications on the same database.

In future articles, I'll try to show other examples of using Django's generated fields with other Django modules (e.g. GeoDjango).

Stay tuned.

— Paolo

⚠️ Disclaimer

This code is for demonstration purposes only and should not be used in production as is. However, the code is released without any guarantee from the author and no liability can be attributed. Use at your own risk.

⚡ TL;DR

🚀 Introduction

🤖 Generated Field

🕶️ Attributes

db_persist

output_field

expression

↔️ Migrations

💻 Set Up

🐍 Python

⚗️ Virtual environment

🦄 Django

🐘 Psycopg

⚠️ Warning

🗃️ The generatedfields project

🎛️ Activating PostgreSQL

⚠️ Warning

🗂️ The samples app

✅ Activating the samples app

📚 Sample models

🔧 Useful commands

makemigrations

sqlmigrate

migrate

shell

♻️ Common generated fields

🔑 A calculated JSON key field

🔔 Note

🖇️ A calculated concatenated field

🔔 Note

🧩 Activating PostgreSQL module

🔍 A calculated search vector field

🔔 Note

🧮 A calculated array length field

🔔 Note

⏳ A calculated date range field

🔔 Note

🙋 Get involved

🥠 Conclusion

⚠️ Disclaimer

🔗 Resources

Read next

Referential integrity In The Absence Of Foreign Key

Dirty Code: Simple Rules to Avoid It

使用 selenium 讀取需要登入會員的網頁

What is SQL and Why It’s Essential for Data Management

`db_persist`

`output_field`

`expression`

🗃️ The `generatedfields` project

🗂️ The `samples` app

✅ Activating the `samples` app

`makemigrations`

`sqlmigrate`

`migrate`

`shell`