An introduction to database generated columns, using PostgreSQL and the new GeneratedField added in Django 5.0.
⚡ TL;DR
GeneratedField
is a new Django 5.0 feature and has these characteristics:
- its value is calculated entirely by the database
- it works in all supported backend databases
Here is a model example with a GeneratedField
to use with PostgreSQL as the database backend:
from django.db import models
from django.contrib.postgres.search import (
SearchVector, SearchVectorField
)
class Quote(models.Model):
author = models.TextField()
text = models.TextField()
search = models.GeneratedField(
db_persist=True,
expression=SearchVector(
"text", config="english"
),
output_field=SearchVectorField(),
)
🚀 Introduction
I've been thinking for a long time that database generated columns would be a killer feature for Django and now they will finally be available in Django 5.0 with the new GeneratedField
.
Given my great interest in this functionality, I proposed it to the Django developer mailing list, I tried to help in the revision phase of the Pull Request on the Django code, but above all I tried to test them as much as possible after the merge, trying to highlight any bugs before the release of Django 5.0 as a stable version.
In this article, I want to share with you the work I have done to allow you to experiment with GeneratedField
using common model field types using PostgreSQL as a database backend.
🤖 Generated Field
The GeneratedField
are full-fledged fields that can be used in queries, displayed and even indexed but their values cannot be set or modified because they are automatically calculated by the database itself whenever the other fields of the same row are modified.
For this reason, they are very useful for having immediately available values calculated starting from the other fields of the same model as long as these are in the same database table.
The definition of the GeneratedField
class requires specifying the expression
, output_field
, and db_persist
attributes.
class GeneratedField(
expression, output_field, db_persist=None, **kwargs
):
...
🕶️ Attributes
db_persist
In this article, we're going to use PostgreSQL as a database backend with persisted columns, specified with db_persist=True
, in that the database column will occupy storage as if it were a real column.
output_field
We have to specify an explicit model field instance to define the field’s data type and its attributes.
expression
The expression
attributes is the one we're going to play more with, using various example of Expression
that the database will use to automatically set the field value each time the model is changed.
↔️ Migrations
The SQL code generated from the migrations produced in our examples will all have the GENERATED ALWAYS SQL (...) STORED
syntax, varying only in the internal logic of the expression.
💻 Set Up
The requirements to experiment with Django GeneratedField
are:
🐍 Python
A stable and supported version of Python 3 (tested with Python 3.11-3.12):
$ python3 --version
Python 3.12.0
⚗️ Virtual environment
A Python virtual environment:
$ python3 -m venv ~/.generatedfields
$ source ~/.generatedfields/bin/activate
🦄 Django
The latest version of Django (tested with Django 5.0):
$ python3 -m pip install django==5.0
🐘 Psycopg
We’ll use the Python PostgreSQL database adapter which requires installing psycopg
greater than the 3.1.8
version.
$ python3 -m pip install psycopg[binary]~=3.1.8
⚠️ Warning
The psycopg[binary]
package is meant for beginners to start playing with Python and PostgreSQL without the need to meet the build requirements. For production uses you are advised to use the "Local installation". ⬀
🗃️ The generatedfields
project
To create the generatedfields
project I'll switch to my projects
directory:
$ cd ~/projects
and then use the startproject
Django command:
$ python3 -m django startproject generatedfields
The basic files of our project will be created in the generatedfields
directory:
$ tree --noreport generatedfields/
generatedfields/
├── manage.py
└── generatedfields
├── asgi.py
├── __init__.py
├── settings.py
├── urls.py
└── wsgi.py
🎛️ Activating PostgreSQL
To use Postgres as the database in our Django project, we modify the project database settings, adding the PostgreSQL engine and the connection parameters of our PostgreSQL database:
generatedfields/generatedfields/settings.py
DATABASES = {
"default": {
"ENGINE": "django.db.backends.postgresql",
"HOST": "<my_database_host>",
"NAME": "<my_database_name",
"PASSWORD": "<my_database_password>",
"PORT": "<my_database_port>",
"USER": "<my_database_user>",
}
}
⚠️ Warning
You need to create/activate a PostgreSQL database instance (e.g. with system packages, with Docker or as a remote service) and then replace your credentials in the DATABASE settings.
🗂️ The samples
app
After switching to the generatedfields
directory:
$ cd generatedfields
We can create our samples
app with the Django startapp
command:
$ python3 -m django startapp samples
Again, all the necessary files will be created for us in the samples
directory:
$ tree --noreport samples/
samples/
├── admin.py
├── apps.py
├── __init__.py
├── migrations
│ └── __init__.py
├── models.py
├── tests.py
└── views.py
✅ Activating the samples
app
Now, we have to activate our samples
application by inserting its name in the list of the INSTALLED_APPS
in the generatedfields
settings file.
generatedfields/generatedfields/settings.py
INSTALLED_APPS = [
"django.contrib.admin",
"django.contrib.auth",
"django.contrib.contenttypes",
"django.contrib.sessions",
"django.contrib.messages",
"django.contrib.staticfiles",
"samples",
]
📚 Sample models
The example models I have experimented with have expressions with complexity that progressively increase from example to example.
We start by combining two fields of the model to create the value for the generated one and then using database functions, combining them, using them with conditions and even generating multiple fields from the same starting field.
These experiments have allowed us to identify some bugs and fix them during this Django 5.0 testing period, but I leave off some edge cases to take into account which we'll mention at the end of the article.
🔧 Useful commands
In the following examples, we will use some commands that we show here to avoid repetition.
makemigrations
Generates a new database migration to apply it to our database:
$ python3 -m manage makemigrations --no-header
sqlmigrate
See the SQL code that our migration will apply to the database:
$ python3 -m manage sqlmigrate samples <migration-id>
migrate
Apply migration to our database with:
$ python3 -m manage migrate
shell
Start an interactive Django shell:
$ python3 -m manage shell
♻️ Common generated fields
In my previous article in this series (Database generated columns⁽¹⁾: Django & SQLite), I showed nine examples of generated fields using SQLite.
The first eight examples can be used as they were shown even with PostgreSQL as the database.
By viewing the generated SQL code with the sqlmigrate
command you will notice some differences because different syntaxes and specific field types will be used, or PostgreSQL capabilities absent in SQLite will be exploited.
The difference in the SQL code generated by PostgreSQL is relevant when viewing the example "🔑 A calculated JSON key field" where the data type for JSON fields and the specific syntax for extracting keys is used.
The ninth example "🖇️ A calculated concatenated field", however, although simple from a functionality point of view, highlights a problem that should be kept in mind when using generated fields, namely that only IMMUTABLE functions can be used.
Unfortunately in the SQL code, the concatenation function that the ORM uses when PostgreSQL is the selected database is not IMMUTABLE
, and I will show you a way I found to get around this.
I copied the two examples from my previous article below in full for completeness, showing the difference in the SQL or Python code.
🔑 A calculated JSON key field
In the Package
model there is the slug
field and the data
JSONfield in which we want to store the JSON payload of the Python package, corresponding to the slug
value, as it is returned to us by PyPi
via its endpoint https://pypi.org/pypi/
. The generated field version
will contain the latest version of the package, extracted from the JSONfield data
.
generatedfields/samples/models.py
from django.db import models
from django.db.models import F
class Package(models.Model):
slug = models.SlugField()
data = models.JSONField()
version = models.GeneratedField(
expression=F("data__info__version"),
output_field=models.TextField(),
db_persist=True,
)
def __str__(self):
return f"{self.slug} {self.version}"
After running makemigrations
, use sqlmigrate
and see the SQL code that will be applied to the database:
BEGIN;
--
-- Create model Package
--
CREATE TABLE "samples_package" (
"id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
"slug" varchar(50) NOT NULL,
"data" jsonb NOT NULL,
"version" text GENERATED ALWAYS AS (
("data"#>'{info,version}')
) STORED
);
CREATE INDEX "samples_package_slug_idx"
ON "samples_package" ("slug");
CREATE INDEX "samples_package_slug_idx_like"
ON "samples_package" ("slug" varchar_pattern_ops);
COMMIT;
The SQL code generated for PostgreSQL is decidedly simpler than that generated for other databases (for example SQLite) because it uses the JSON type with its specific functions.
After running migrate
, open a shell
and check the model behaves as intended:
>>> import json
>>> from urllib.request import urlopen
>>> from samples.models import Package
>>> slug = "django"
>>> URL = f"https://pypi.org/pypi/{slug}/json"
>>> data = json.loads(urlopen(URL).read())
>>> Package.objects.create(slug=slug, data=data)
<Package: django 4.2.7>
🔔 Note
At first glance, the case presented in this example might seem not very useful because it is limited to a simple extraction of the value corresponding to a specific key from a JSON, which in many databases is rather quick to do at run time.
In reality, if we assume that we have a large number of instances of the Package
model, the execution of a query in which the version value is already present in its field greatly lightens its execution by simplifying the logic, and also alleviating indexes.
Furthermore, if the version were the only interesting field to extract from the JSON in the data
field, we could defer the data
field from our queries, speeding up them and drastically reducing the data transferred from the database (e.g. in the change list pages, in the Django admin).
🖇️ A calculated concatenated field
The last User
model partially follows the one already present in Django itself, but we only have the first_name
and the last_name
fields. The generated field full_name
is calculated starting from the first two on which is performed a concatenation with the addition of a space, using the appropriate Concat
function.
generatedfields/samples/models.py
from django.db import models
from django.db.models import Value
class ConcatOp(models.Func):
arg_joiner = " || "
function = None
output_field = models.TextField()
template = "%(expressions)s"
class User(models.Model):
first_name = models.CharField(max_length=150)
last_name = models.CharField(max_length=150)
full_name = models.GeneratedField(
expression=ConcatOp(
"first_name", Value(" "), "last_name",
),
output_field=models.TextField(),
db_persist=True,
)
def __str__(self):
return self.full_name
After running makemigrations
, use sqlmigrate
and see the SQL code that will be applied to the database:
BEGIN;
--
-- Create model User
--
CREATE TABLE "samples_user" (
"id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
"first_name" varchar(150) NOT NULL,
"last_name" varchar(150) NOT NULL,
"full_name" text GENERATED ALWAYS AS (
"first_name" || ' ' || "last_name"
) STORED
);
COMMIT;
Using the custom ConcatOp
function the SQL code generated for PostgreSQL uses the concatenation operator ||
to join column values together.
After running migrate
, open a shell
and check the model behaves as intended:
>>> from samples.models import User
>>> User.objects.create(first_name="Jane", last_name="Doe")
<User: Jane Doe>
🔔 Note
I defined the ConcatOp
function to make this example work also in PostgreSQL because with the Concat
function defined by the Djanngo ORM it would have failed as it would have used the CONCAT
SQL function which is not IMMUTABLE
.
I don't recommend you use this function in production, but rather you can try to lend a hand to modify the current implementation in Django ORM which is in progress in this pull request: Refs #34955 -- Simplified implementation of Concat ⬀
🧩 Activating PostgreSQL module
As we have seen in all the examples of using the generated fields shown with SQLite, with some small differences they also work with PostgreSQL, it was enough to change the backend database in the settings of our project.
However, there are specific features that are contained in the specific module for PostgreSQL (present in Django since version 1.8) and which can be exploited after adding it to the list of INSTALLED_APPS
of our project settings.
generatedfields/generatedfields/settings.py
INSTALLED_APPS = [
"django.contrib.admin",
"django.contrib.auth",
"django.contrib.contenttypes",
"django.contrib.sessions",
"django.contrib.messages",
"django.contrib.staticfiles",
"django.contrib.postgres",
"samples",
]
🔍 A calculated search vector field
Once the PostgreSQL module is activated, we can add a calculated search vector field, in a document model with a content text field. The search vector field is updated with each change to the content of the document, this makes it possible to perform the full-text search immediately eliminating the time and effort of its update.
generatedfields/samples/models.py
from django.db import models
from django.contrib.postgres.search import (
SearchVector, SearchVectorField
)
class Quote(models.Model):
author = models.TextField()
text = models.TextField()
search = models.GeneratedField(
db_persist=True,
expression=SearchVector(
"text", config="english"
),
output_field=SearchVectorField(),
)
def __str__(self):
return f"[{self.author}] {self.text}"
After running makemigrations
, use sqlmigrate
and see the SQL code that will be applied to the database:
BEGIN;
--
-- Create model Quote
--
CREATE TABLE "samples_quote" (
"id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
"author" text NOT NULL,
"text" text NOT NULL,
"search" tsvector GENERATED ALWAYS AS (
to_tsvector('english'::regconfig, COALESCE("text", ''))
) STORED
);
COMMIT;
The SQL code is specific to PostgreSQL and exploits the expressions and functions of full-text search, which however are used automatically without the need for triggers or store procedures to keep the search vector column updated.
After running migrate
, open a shell
and check the model behaves as intended:
>>> from samples.models import Quote
>>> author = "Plato"
>>> text = "Man is a being in search of meaning"
>>> Quote.objects.create(author=author, text=text)
<Quote: [Plato] Man is a being in search of meaning>
>>> Quote.objects.values_list("search")
<QuerySet [("'man':1 'mean':8 'search':6",)]>
>>> Quote.objects.filter(search="meanings").first()
<Quote: [Plato] Man is a being in search of meaning>
🔔 Note
What was done with generated fields could also be achieved using functional indexes, but having a generated field greatly simplifies the investigation of their content, which is very useful for the continuous improvement of search mechanisms.
We have already pointed out that the generated fields, like functional indexes, have the constraint of operating only on the columns of the same table.
But, if we wanted to include fields from other tables in the search mechanisms, we could not do it with functional indexes, but only using a search vector field, updated with more complex systems.
🧮 A calculated array length field
Another specific field that can be used after activating the PostgreSQL module is the ArrayField
. This field opens up many interesting usage scenarios such as storing values of the same type (e.g. URL fields) in a single column. In this case, we used an array field of integers which we are going to count in the generated field.
generatedfields/samples/models.py
from django.db import models
from django.contrib.postgres.fields.array import (
ArrayField,
ArrayLenTransform,
)
class Landmark(models.Model):
name = models.TextField()
reviews = ArrayField(
models.SmallIntegerField()
)
count = models.GeneratedField(
db_persist=True,
expression=ArrayLenTransform(
"reviews"
),
output_field=models.IntegerField(),
)
def __str__(self):
return (
f"{self.name} "
f"({self.count} reviews)"
)
After running makemigrations
, use sqlmigrate
and see the SQL code that will be applied to the database:
BEGIN;
--
-- Create model Landmark
--
CREATE TABLE "samples_landmark" (
"id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
"name" text NOT NULL,
"reviews" smallint [] NOT NULL,
"count" integer GENERATED ALWAYS AS (
CASE
WHEN "reviews" IS NULL THEN NULL
ELSE coalesce(
array_length("reviews", 1),
0
)
END
) STORED
);
COMMIT;
In the generated SQL code we see the specific syntax used for array-type columns and the function for calculating the length of the field which we use to count how many values are present in the column.
After running migrate
, open a shell
and check the model behaves as intended:
>>> from samples.models import Landmark
>>> name = "Colosseum"
>>> reviews = [4, 3, 5, 4, 4, 5, 5, 3, 4]
>>> Landmark.objects.create(
... name=name, reviews=reviews
... )
<Landmark: Colosseum (9 reviews)>
🔔 Note
Unfortunately in the ORM, there are not many functions ready to work with Array fields, but there are many available in PostgreSQL (e.g. concatenation, value extraction, etc ...) that can be used in Django by defining a custom function or proposing addition in Django's PostgreSQL module.
⏳ A calculated date range field
The Range type is also an interesting data type available thanks to Django's PostgreSQL module. Range fields of various types can be created (e.g. integers) and very useful comparison operations can then be carried out (e.g. containment, overlaps, etc...).
In this case, we have created a generated date range type field that we create starting from your normal date fields.
Missing a ready-to-use function in the Django ORM, we defined our custom DateRange
function to then be used in the generated field.
generatedfields/samples/models.py
from django.db import models
from django.contrib.postgres.fields import (
DateRangeField,
)
class DateRangeFunc(models.Func):
function = "DATERANGE"
output_field = DateRangeField()
class Booking(models.Model):
start = models.DateField()
end = models.DateField()
span = models.GeneratedField(
expression=DateRangeFunc(
"start", "end"
),
output_field=DateRangeField(),
db_persist=True,
)
def __str__(self):
return (
f"{self.span.bounds[0]}"
f"{self.span.lower.isoformat()} -"
f"> {self.span.upper.isoformat()}"
f"{self.span.bounds[1]}"
)
After running makemigrations
, use sqlmigrate
and see the SQL code that will be applied to the database:
BEGIN;
--
-- Create model Booking
--
CREATE TABLE "samples_booking" (
"id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
"start" date NOT NULL,
"end" date NOT NULL,
"span" daterange GENERATED ALWAYS AS (
DATERANGE("start", "end")
) STORED
);
COMMIT;
The SQL code generated for PostgreSQL in this case uses only the specific data type for the date range and the function that generates it starting from two single dates.
After running migrate
, open a shell
and check the model behaves as intended:
>>> from datetime import date
>>> from samples.models import Booking
>>> Booking.objects.create(
... start="2023-1-1",
... end="2023-1-9"
... )
<Booking: [2023-01-01 -> 2023-01-09)>
>>> Booking.objects.filter(
... span__contains=date(2023, 1, 5)
... )
<QuerySet [<Booking: [2023-01-01 -> 2023-01-09)>]>
🔔 Note
The range data type is very convenient for making complex range queries with little effort. Having ranges automatically calculated with the generated fields allows you not to change the experience of the user who has to provide this data, but at the same time makes the power of this data type available.
Some functions or features are not yet supported by the Django ORM (e.g. multi ranges) but it is not excluded that they will be added in the future.
🙋 Get involved
Thank you for getting this far with reading this article.
I hope you found it interesting to read the examples I proposed, as it was for me to formulate them and report them here. The motivation behind this work and this article is to collaborate with all Django developers to churn out features that are as solid and stable as possible.
As I have repeatedly invited you to do in my articles and talks, I invite you to get involved.
Try Django 5.0 and experiment with its new features, but above all report any strange behavior or bugs so that they end up being better for everyone.
🥠 Conclusion
As we have seen, the generated fields of Django 5.0 are a very powerful tool, which lend themselves to solving a wide variety of problems.
I think they also have the merit of bringing a lot of logic back into the database, which in most projects is the lowest layer of the project stack.
Together with the database-computed default values they also make it possible to simplify the collaboration of different applications on the same database.
In future articles, I'll try to show other examples of using Django's generated fields with other Django modules (e.g. GeoDjango).
Stay tuned.
— Paolo
⚠️ Disclaimer
This code is for demonstration purposes only and should not be used in production as is. However, the code is released without any guarantee from the author and no liability can be attributed. Use at your own risk.
Top comments (0)