DEV Community

Aidas Bendoraitis
Aidas Bendoraitis

Posted on • Originally published at djangotricks.blogspot.com on

QuerySet Filters on Many-to-many Relations

Django ORM (Object-relational mapping) makes querying the database so intuitive, that at some point you might forget that SQL is being used in the background.

This year at the DjangoCon Europe Katie McLaughlin was giving a talk and mentioned one thing that affects the SQL query generated by Django ORM, depending on how you call the QuerySet or manager methods. This particularity is especially relevant when you are creating your QuerySets dynamically. Here it is. When you have a many-to-many relationship, and you try to filter objects by the fields of the related model, every new filter() method of a QuerySet creates a new INNER JOIN clause. I won't discuss whether that's a Django bug or a feature, but these are my observations about it.

The Books and Authors Example

Let's create an app with books and authors, where each book can be written by multiple authors.

# -*- coding: UTF-8 -*-
from __future__ import unicode_literals

from django.db import models
from django.utils.translation import ugettext_lazy as _
from django.utils.encoding import python_2_unicode_compatible


@python_2_unicode_compatible
class Author(models.Model):
    first_name = models.CharField(_("First name"), max_length=200)
    last_name = models.CharField(_("Last name"), max_length=200)
    author_name = models.CharField(_("Author name"), max_length=200)

    class Meta:
        verbose_name = _("Author")
        verbose_name_plural = _("Authors")
        ordering = ("author_name",)

    def __str__(self):
        return self.author_name


@python_2_unicode_compatible
class Book(models.Model):
    title = models.CharField(_("Title"), max_length=200)
    authors = models.ManyToManyField(Author, verbose_name=_("Authors"))
    publishing_date = models.DateField(_("Publishing date"), blank=True, null=True)

    class Meta:
        verbose_name = _("Book")
        verbose_name_plural = _("Books")
        ordering = ("title",)

    def __str__(self):
        return self.title
Enter fullscreen mode Exit fullscreen mode

The similar app with sample data can be found in this repository.

Inefficient Filter

With the above models, you could define the following QuerySet to select books which author is me, Aidas Bendoraitis.

queryset = Book.objects.filter(
    authors__first_name='Aidas',
).filter(
    authors__last_name='Bendoraitis',
)
Enter fullscreen mode Exit fullscreen mode

We can check what SQL query it would generate with str(queryset.query) (or queryset.query. __str__ ()).

The output would be something like this:

SELECT `libraryapp_book`.`id`, `libraryapp_book`.`title`, `libraryapp_book`.`publishing_date`
FROM `libraryapp_book`
INNER JOIN `libraryapp_book_authors` ON ( `libraryapp_book`.`id` = `libraryapp_book_authors`.`book_id` )
INNER JOIN `libraryapp_author` ON ( `libraryapp_book_authors`.`author_id` = `libraryapp_author`.`id` )
INNER JOIN `libraryapp_book_authors` T4 ON ( `libraryapp_book`.`id` = T4.`book_id` )
INNER JOIN `libraryapp_author` T5 ON ( T4.`author_id` = T5.`id` )
WHERE (`libraryapp_author`.`first_name` = 'Aidas' AND T5.`last_name` = 'Bendoraitis')
ORDER BY `libraryapp_book`.`title` ASC;
Enter fullscreen mode Exit fullscreen mode

Did you notice, that the database table libraryapp_author was attached through the libraryapp_book_authors table to the libraryapp_book table TWICE where just ONCE would be enough?

Efficient Filter

On the other hand, if you are defining query expressions in the same filter() method like this:

queryset = Book.objects.filter(
    authors__first_name='Aidas',
    authors__last_name='Bendoraitis',
)
Enter fullscreen mode Exit fullscreen mode

The generated SQL query will be much shorter and (theoretically) would perform faster:

SELECT `libraryapp_book`.`id`, `libraryapp_book`.`title`, `libraryapp_book`.`publishing_date`
FROM `libraryapp_book`
INNER JOIN `libraryapp_book_authors` ON ( `libraryapp_book`.`id` = `libraryapp_book_authors`.`book_id` )
INNER JOIN `libraryapp_author` ON ( `libraryapp_book_authors`.`author_id` = `libraryapp_author`.`id` )
WHERE (`libraryapp_author`.`first_name` = 'Aidas' AND `libraryapp_author`.`last_name` = 'Bendoraitis')
ORDER BY `libraryapp_book`.`title` ASC;
Enter fullscreen mode Exit fullscreen mode

The same SQL query can be achieved using the Q() objects:

queryset = Book.objects.filter(
    models.Q(authors__first_name='Aidas') &
    models.Q(authors__last_name='Bendoraitis')
)
Enter fullscreen mode Exit fullscreen mode

The Q() objects add a lot of flexibility to filters allowing to OR, AND, and negate query expressions.

Dynamic Filtering

So to have faster performance, when creating QuerySets dynamically, DON'T use filter() multiple times:

queryset = Book.objects.all()
if first_name:
    queryset = queryset.filter(
        authors__first_name=first_name,
    )
if last_name:
    queryset = queryset.filter(
        authors__last_name=last_name,
    )
Enter fullscreen mode Exit fullscreen mode

DO this instead:

filters = models.Q()
if first_name:
    filters &= models.Q(
        authors__first_name=first_name,
    )
if last_name:
    filters &= models.Q(
        authors__last_name=last_name,
    )
queryset = Book.objects.filter(filters)
Enter fullscreen mode Exit fullscreen mode

Here the empty Q() doesn't have any impact for the generated SQL query, so you don't need the complexity of creating a list of filters and then joining all of them with the bitwise AND operator, like this:

import operator
from django.utils.six.moves import reduce

filters = []
if first_name:
    filters.append(models.Q(
        authors__first_name=first_name,
    ))
if last_name:
    filters.append(models.Q(
        authors__last_name=last_name,
    ))
queryset = Book.objects.filter(reduce(operator.iand, filters))
Enter fullscreen mode Exit fullscreen mode

Profiling

In DEBUG mode, you can check how long the previously executed SQL queries took by checking django.db.connection.queries:

>>> from django.db import connection
>>> connection.queries
[{'sql': 'SELECT …', 'time': '0.001'}, {'sql': 'SELECT …', 'time': '0.004'}]
Enter fullscreen mode Exit fullscreen mode

The Takeaways

  • When querying many-to-many relationships, avoid using multiple filter() methods, make use of Q() objects instead.
  • You can check the SQL query of a QuerySet with str(queryset.query).
  • Check the performance of recently executed SQL queries with django.db.connection.queries.
  • With small datasets, the performance difference is not so obvious. For your specific cases you should do the benchmarks yourself.

Cover photo by Tobias Fischer.

This post was originally published on djangotricks.blogspot.com

Top comments (0)