Fixing inefficient Django ORM in brownfield projects

Django Doctor ・ 3 min read

One of the quickest wins for improving the speed of a Django app is to fix the "oops, didn't mean to" inefficient database reads that slipped into the codebase unnoticed over the years. As the codebase matures, team members change, and code gathers dust, old inefficient ORM queries can be overlooked. Simple ORM queries like:

def check_hounds():
    queryset = HoundsModel.objects.all()
    if len(queryset) > 2:
        return "oh no. Run!"

This reads every record from the database and does the counting at the application level. Much less efficient than:

def check_hounds():
    if HoundsModel.objects.count() > 2:
        return "oh no. Run!"
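The difference is in what each approach asks the database to do. Here is a standalone sketch of the same idea using raw SQL via sqlite3 (the table and rows are hypothetical, just to make the two queries visible):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hounds (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO hounds (name) VALUES (?)",
                 [("Fang",), ("Fluffy",), ("Norbert",)])

# What len(queryset) effectively does: fetch every row, then count in Python.
all_rows = conn.execute("SELECT * FROM hounds").fetchall()
rows_transferred = len(all_rows)  # every row crossed the db boundary

# What .count() does: let the database count and return a single value.
(count,) = conn.execute("SELECT COUNT(*) FROM hounds").fetchone()

print(rows_transferred, count)  # same answer, very different cost at scale
```

With three rows the difference is invisible; with millions, the first version transfers and materialises every row just to throw them away.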

It's easy to scoff at such inefficiencies: "I would never do that". Well, maybe not. But what about some dev in the past rushing to meet a deadline, reviewed at 11pm by someone equally surviving on coffee during crunch time? And you're now responsible for that code!

Slightly less textbook, but also just as easy to both overlook and improve:

def write_condolence_letters(visited_coops):
    queryset = ChickenModel.objects.all()
    for chicken in queryset:
        if chicken.coop.pk in visited_coops:
            return f"dear {chicken.coop.owner_name}..."
        else:
            ...

Spot the problems? Looping over all() reads everything from the db in one go. If there are billions of chickens, expect performance issues. For that we can use iterator() to chunk the reads: 2000 records (the default) are pulled from the db at a time. Additionally, chicken.coop.pk does an extra database read for each chicken, because relationships are lazily evaluated by default: coop is only read from the db when it is accessed via chicken.coop. For this particular field, though, no extra query is needed: Django stores the foreign key on the chicken row itself as an _id field, so chicken.coop_id is available without touching the coop table. So this can be:

def write_condolence_letters(visited_coops):
    queryset = ChickenModel.objects.all()
    for chicken in queryset.iterator():
        if chicken.coop_id in visited_coops:
            return f"dear {chicken.coop.owner_name}..."
        else:
            ...
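Under the hood, iterator() avoids materialising the whole result set by fetching in chunks. A rough standalone sketch of the chunking idea (names and table are hypothetical; this is not Django's implementation):

```python
import sqlite3

def chunked_rows(cursor, sql, chunk_size=2000):
    """Yield rows chunk_size at a time instead of materialising them all,
    roughly what QuerySet.iterator() does with its default chunk size."""
    cursor.execute(sql)
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            return
        yield from chunk

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chickens (id INTEGER PRIMARY KEY, coop_id INTEGER)")
conn.executemany("INSERT INTO chickens (coop_id) VALUES (?)",
                 [(i % 5,) for i in range(10_000)])

visited_coops = {1, 3}
visited = [row for row
           in chunked_rows(conn.cursor(), "SELECT id, coop_id FROM chickens")
           if row[1] in visited_coops]
print(len(visited))
```

At no point do all 10,000 rows sit in memory at once; peak memory is bounded by the chunk size rather than the table size.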

What if we're working with a field on a related model other than the pk? Sure, coop_id is created by Django for us, but what if we need to access other fields such as chicken.coop.owner_name? For that we can use select_related (for ForeignKey and OneToOne relationships) or prefetch_related (for ManyToMany and reverse relationships):

def write_condolence_letters(visited_coops):
    queryset = ChickenModel.objects.all()
    for chicken in queryset.select_related('coop'):
        if chicken.coop.pk in visited_coops:
            return f"dear {chicken.coop.owner_name}..."
        else:
            ...

Now the related coop is pulled out in the same read as chicken, via a SQL join. Notice though that iterator() is no longer shown: prefetch_related is ignored when iterator() runs the query (Django 4.1 added support for combining them via iterator(chunk_size=...)), while select_related, being a plain SQL join, does combine with iterator(). So only with context can we really say which performance improvement strategy a given piece of code needs.
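What select_related('coop') boils down to is a single joined query instead of one query per chicken. A standalone sqlite3 sketch of that join (table and column names hypothetical, mirroring the models above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE coops (id INTEGER PRIMARY KEY, owner_name TEXT);
    CREATE TABLE chickens (id INTEGER PRIMARY KEY,
                           coop_id INTEGER REFERENCES coops(id));
    INSERT INTO coops VALUES (1, 'Hagrid'), (2, 'McGonagall');
    INSERT INTO chickens (coop_id) VALUES (1), (1), (2);
""")

# One query: each chicken row arrives with its coop columns attached,
# so accessing the owner's name needs no further lookup.
rows = conn.execute("""
    SELECT chickens.id, coops.id, coops.owner_name
    FROM chickens JOIN coops ON coops.id = chickens.coop_id
""").fetchall()

owners = {owner for (_, _, owner) in rows}
print(len(rows), owners)
```

Without the join, the lazy-evaluation pattern would issue one SELECT against coops per chicken: the classic N+1 queries problem.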

Does your codebase have old inefficient ORM lookups?

It's easier to spot new inefficient ORM lookups during code review, but what about those already in your codebase? I can check that for you at django.doctor, or Django Doctor can review your GitHub PRs:


Or try out Django refactor challenges.
