Ivan Slavko Matić

Posted on Sep 1, 2022

Django 4.1: Where To Apply Async


Introduction

Some exciting changes have arrived in Django's ongoing development. Among the notable ones in Django 4.1 are an asynchronous interface to the ORM and support for async HTTP method handlers in class-based views. Throughout this article, we will go through some tests I've created and compare the execution times of asynchronous and synchronous processes.

Difference between synchronous and asynchronous in Django

Synchronous programming executes operations sequentially: one at a time, in strict order. The following quote from David Bevans at mendix.com gives a great description of synchronous programming:

To illustrate how synchronous programming works, think of a telephone. During a phone call, while one person speaks, the other listens. When the first person finishes, the second tends to respond immediately.

Asynchronous programming allows multiple related operations to make progress at the same time, without each one waiting for the previous one to finish. It follows a non-blocking model: when an operation has to wait (for example, for a network response), it is paused, and the program performs other work that is ready in the meantime. In Python's asyncio, this switching is done by an event loop, usually on a single thread, rather than by running the operations in multiple threads. Another excellent description of asynchronous programming from David Bevans:

Texting is an asynchronous communication method. One person can send a text message and the recipient can respond at their leisure. In the meantime, the sender may do other things while waiting for a response.

That being said, we can expect async to bring a major performance boost when it is used for the right kind of work.
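The phone call versus texting contrast can be sketched with plain asyncio, no Django involved. In this minimal example, `fake_io()` is a hypothetical stand-in for an I/O wait such as a network call, implemented with `asyncio.sleep()`. Awaiting three waits one after another should take roughly the sum of the delays, while `asyncio.gather()` puts all of them in flight at once, so the total should be close to the longest single delay.

```python
import asyncio
import time


async def fake_io(delay):
    # Hypothetical stand-in for an I/O wait (e.g. a network call); sleeping
    # here hands control back to the event loop instead of blocking it
    await asyncio.sleep(delay)
    return delay


async def run_sequentially(delays):
    # Like the phone call: each wait must finish before the next one starts
    results = []
    for d in delays:
        results.append(await fake_io(d))
    return results


async def run_concurrently(delays):
    # Like texting: all waits are in flight at once, the event loop switches between them
    return await asyncio.gather(*(fake_io(d) for d in delays))


def timed(coro_fn, delays):
    # Run one of the coroutines above and measure its wall-clock time
    start = time.perf_counter()
    result = asyncio.run(coro_fn(delays))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.2, 0.2, 0.2]
    _, seq_s = timed(run_sequentially, delays)
    _, conc_s = timed(run_concurrently, delays)
    print(f"sequential: {seq_s:.2f} s, concurrent: {conc_s:.2f} s")
```

The gain only appears because the tasks spend their time waiting; the next point, about I/O-bound versus CPU-bound work, is exactly about that condition.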

Django at its core works synchronously. As asyncio and projects such as Twisted and Channels gained traction, the idea of an asynchronous Django started to take shape, and support arrived in stages: Django 3.0 added ASGI support, 3.1 added async views and middleware, and 4.1 added an async ORM interface and async handlers for class-based views.

From my research, the best way to decide between sync and async is to determine whether the view (and by extension, what we want to achieve with that view) is I/O-bound or CPU-bound. Async should be used for I/O-bound problems, where the code spends most of its time waiting on something external such as the network, disk, or database.
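The I/O-bound versus CPU-bound distinction can be made visible by logging the order in which two tasks start and finish. In this stdlib-only sketch, `io_bound()` and `cpu_bound()` are hypothetical tasks: the first awaits a sleep (standing in for I/O), the second does pure computation and never awaits anything. Under `asyncio.gather()`, the I/O-bound tasks interleave (b starts before a finishes), while each CPU-bound task runs to completion before the next one even starts, so async has nothing to overlap.

```python
import asyncio


async def io_bound(name, log):
    log.append(f"{name} start")
    await asyncio.sleep(0.01)  # waiting on "I/O": control returns to the event loop
    log.append(f"{name} end")


async def cpu_bound(name, log):
    log.append(f"{name} start")
    _ = sum(i * i for i in range(200_000))  # pure computation: never yields control
    log.append(f"{name} end")


async def run_pair(task_fn):
    # Schedule two tasks of the same kind together and record the order of events
    log = []
    await asyncio.gather(task_fn("a", log), task_fn("b", log))
    return log


if __name__ == "__main__":
    print("I/O-bound:", asyncio.run(run_pair(io_bound)))
    print("CPU-bound:", asyncio.run(run_pair(cpu_bound)))
```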

Environment and prerequisites

Before we start testing, let's review the setup and some important notes. First things first, an obvious but important note: results may vary with the IDE, with each project's set of installed packages, and with background processes and third-party software. Therefore, the execution times you see here may or may not be similar to the times you get if you decide to recreate these tests.

For these tests, we are using Python 3.9 as our interpreter and PostgreSQL 13.6 as our database system. The installed Python packages are as follows:

Django~=4.1
psycopg2~=2.9.3
hypercorn~=0.14.1

psycopg2 acts as our database adapter, and hypercorn serves as the ASGI server for our local environment.

You might wonder and say:

Yeah, but why use hypercorn when we already have asgi.py in our project?

That is true: asgi.py in our project does contain an application callable that can be used by any ASGI server in development or production. But it is not used by the development server (the one we start with the runserver command), and that is where hypercorn comes in. By pointing hypercorn at myproject.asgi:application (hypercorn myproject.asgi:application), the server is up and ready to be used locally.
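What does "application callable" mean here? Under the ASGI specification, an application is an async callable that receives `(scope, receive, send)`; Django's asgi.py builds one for the whole project. The sketch below is a bare, hypothetical hello-world ASGI app (not Django's) plus `call_like_a_server()`, a hypothetical helper that drives it the way a server such as hypercorn would for one HTTP request, collecting the messages the app sends back.

```python
import asyncio


async def application(scope, receive, send):
    # An ASGI app: an async callable taking (scope, receive, send)
    if scope["type"] != "http":
        raise ValueError("this sketch only handles HTTP requests")
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"ok"})


async def call_like_a_server(app, path="/"):
    # Hypothetical helper: a minimal stand-in for what an ASGI server does per request
    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    sent = []

    async def receive():
        # Empty request body, delivered in a single message
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app(scope, receive, send)
    return sent


if __name__ == "__main__":
    for message in asyncio.run(call_like_a_server(application)):
        print(message)
```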

Important note: the proper way to measure performance and execution time would be through TestCase (or the more feature-rich TransactionTestCase) and unit testing in general. But I have some concerns about the AsyncRequestFactory and AsyncClient() used in tests. Reading through the documentation, it seems that async behaviour is only partly emulated: async functions are executed, and then the process reverts to synchronous behaviour. That is why I'd say the most authentic/realistic way to test this is a simple view called via URL on the ASGI server run by hypercorn. References [3] and [4] contain more information about 'Testing asynchronous code' and 'Advanced testing topics'; it is possible I missed some crucial context.

Models

For our testing, we will keep our models as light as possible. The fields vary, but the data size of each object will be similar, if not the same.

from django.db.models import Model, TextField, CharField, DateField, ManyToManyField, ForeignKey, CASCADE
from django.utils.timezone import now


# Create your models here.
class CarParts(Model):
    engine = TextField()


class Manufacturer(Model):
    name = CharField(max_length=150)


class Store(Model):
    location = CharField(max_length=150)


class Car(Model):
    created_at = DateField(default=now)
    # null has no effect on ManyToManyField; blank=True makes the relation optional
    store = ManyToManyField(Store, blank=True)
    car_parts = ForeignKey(CarParts, on_delete=CASCADE, null=True)
    manufacturer = ForeignKey(Manufacturer, on_delete=CASCADE, null=True)

We will be querying and iterating through 50, 500 and 1500 objects per model, so we can track how the results scale with the amount of data.

Testing

We have two class-based views extending the base class 'View' from django.views, each containing multiple methods. Django 4.1 requires all HTTP method handlers of a single view class to be either all sync or all async; mixing them raises ImproperlyConfigured. So one class view will have only async methods and the other only sync methods.
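To make the all-sync-or-all-async rule concrete, here is a simplified, stdlib-only model of such a check. `ImproperlyConfigured` is a local stand-in for Django's exception and `view_is_async()` is a hypothetical helper; to my understanding Django performs a similar check on View subclasses, but this is not Django's code. `inspect.iscoroutinefunction()` tells whether each handler was defined with `async def`.

```python
import inspect


class ImproperlyConfigured(Exception):
    """Local stand-in for django.core.exceptions.ImproperlyConfigured."""


# HTTP handlers considered by this sketch (options() is left out here)
HANDLER_NAMES = ("get", "post", "put", "patch", "delete", "head", "trace")


def view_is_async(view_cls):
    # Hypothetical helper: collect the handlers the class actually defines
    handlers = [getattr(view_cls, n) for n in HANDLER_NAMES if hasattr(view_cls, n)]
    if not handlers:
        return False
    kinds = {inspect.iscoroutinefunction(h) for h in handlers}
    if len(kinds) > 1:
        # Mixed sync and async handlers are rejected
        raise ImproperlyConfigured(f"{view_cls.__name__} mixes sync and async handlers")
    return kinds.pop()


class AllAsync:
    async def get(self, request): ...
    async def post(self, request): ...


class Mixed:
    async def get(self, request): ...
    def post(self, request): ...
```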

Tests will be performed in the following order:

Load x number of objects
Perform three consecutive tests (multiple tests to compensate for potential IDE background processes that may cause performance penalties)
Analyse
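The "three consecutive runs, then average" step can be sketched as a small helper. `average_runtime()` is a hypothetical name, not something used in the views; it times a callable several times with `time.perf_counter()`, which the Python docs describe as a monotonic, high-resolution clock meant for measuring short durations (the views in this article use `time.time()`), and returns the mean along with the individual runs.

```python
import statistics
import time


def average_runtime(fn, runs=3):
    # Hypothetical helper: time `fn` several times, return the mean and each run
    timings = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), timings


if __name__ == "__main__":
    mean_s, timings = average_runtime(lambda: time.sleep(0.05))
    print(f"mean: {mean_s:.4f} s, runs: {[round(t, 4) for t in timings]}")
```

Keeping the individual runs alongside the mean makes it easy to spot one outlier run caused by a background process.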

Test A: CRUD cycle using ORM via get and post methods

Module-level helper functions to be called in the views (one async, one sync):

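import time

from django.core.exceptions import ObjectDoesNotExist
from django.db.models import Q
from django.http import HttpResponse
from django.shortcuts import render
from django.views import View

from .models import Car, CarParts, Manufacturer, Store  # assuming views.py sits next to models.py

async def async_iteration():
    car_obj = Car.objects.all()
    async for i in car_obj:  # async iteration over a QuerySet is new in 4.1
        a_query = await i.store.afirst()
    return car_obj

def sync_iteration():
    car_obj = Car.objects.all()
    for i in car_obj:
        a_query = i.store.first()
    return car_obj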

Our sync class:

class PerformanceTestSync(View):
    def get(self, request):
        e_list = []
        start = time.time()
        for entry in Car.objects.all():
            try:
                a_query = entry.store.first()
                b_query = CarParts.objects.filter(
                    Q(engine__contains='engine') &
                    Q(car__manufacturer_id__gt=20)).first()
            except Exception as e:
                print('Error during sync iteration: ', e)
        co = sync_iteration()
        for v in co:
            e_list.append(v)
        end = time.time()
        print('Sync GET time: ', end - start)
        # Views must return an HttpResponse; render the same template as the async view
        return render(request, 'demo_template.html')

    def post(self, request):
        start = time.time()
        engine = request.POST.get('engine')
        name = request.POST.get('name')
        location = request.POST.get('location')
        try:
            car_parts_obj = CarParts.objects.create(engine=engine)
        except Exception as e:
            print('CarParts failed to create: ', e)
            car_parts_obj = None
        try:
            manufacturer_obj = Manufacturer.objects.create(name=name)
        except Exception as e:
            print('Manufacturer failed to create: ', e)
            manufacturer_obj = None
        try:
            store_obj = Store.objects.create(location=location)
        except Exception as e:
            print('Store failed to create: ', e)
            store_obj = None

        if car_parts_obj and manufacturer_obj:
            try:
                car_obj = Car.objects.create(
                    car_parts=car_parts_obj,
                    manufacturer=manufacturer_obj
                )
                # Only link the store if it was actually created
                if store_obj:
                    car_obj.store.add(store_obj)
            except Exception as e:
                print('Car failed to create: ', e)

        end = time.time()
        print('Sync POST time: ', end - start)
        return HttpResponse("ok", status=200)

Our async class:

class PerformanceTestAsync(View):
    async def get(self, request):
        e_list = []
        start = time.time()
        async for entry in Car.objects.all():
            try:
                a_query = await entry.store.afirst()
                b_query = await CarParts.objects.filter(
                    Q(engine__contains='engine') &
                    Q(car__manufacturer_id__gt=20)).afirst()
            except Exception as e:
                print('Error during async iteration: ', e)
        co = await async_iteration()
        async for v in co:
            e_list.append(v)
        end = time.time()
        print('Async GET time: ', end - start)

        return render(request, 'demo_template.html')

    async def post(self, request):
        start = time.time()
        engine = request.POST.get('engine')
        name = request.POST.get('name')
        location = request.POST.get('location')
        try:
            car_parts_obj = await CarParts.objects.acreate(engine=engine)
        except Exception as e:
            print('CarParts failed to create: ', e)
            car_parts_obj = None
        try:
            manufacturer_obj = await Manufacturer.objects.acreate(name=name)
        except Exception as e:
            print('Manufacturer failed to create: ', e)
            manufacturer_obj = None

        if car_parts_obj and manufacturer_obj:
            try:
                car_obj = await Car.objects.acreate(
                    car_parts=car_parts_obj,
                    manufacturer=manufacturer_obj
                )
                try:
                    # No async add() in 4.1, so create the Store via the m2m manager instead
                    await car_obj.store.acreate(location=location)
                except Exception as e:
                    print("Couldn't add store object relations for m2m: ", e)
            except Exception as e:
                print('Car failed to create: ', e)

        end = time.time()
        print('Async POST time: ', end - start)
        return HttpResponse("ok", status=200)

In the get() method, we iterate through the objects in our Car model, and in the loop we make one simple query and one slightly more complex query with Q expressions; in the async class, these calls go through the new async ORM interface introduced in 4.1. At the end, we call a module-level function that iterates over the cars and returns the queryset, whose items are then appended to an empty list. Meanwhile, the post() method creates one object for each model from the data the client sends in the request, and links them together. Updating and deleting could each get a method of their own, but to keep things simple, get and post will do the trick.
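How can `async for` work over a QuerySet when the database driver itself is synchronous? To my understanding, Django 4.1 runs the blocking fetch in a worker thread (through asgiref's `sync_to_async`) and then yields the already-fetched rows. The stdlib-only sketch below models that idea; it is not Django's code. `fetch_all_rows()` is a hypothetical stand-in for a blocking query, and `asyncio.to_thread()` (Python 3.9+) stands in for `sync_to_async`. The point it illustrates: each such call involves a thread hand-off, which costs time without overlapping any work when nothing else is waiting.

```python
import asyncio
import threading

seen_threads = []  # records which thread ran the blocking fetch


def fetch_all_rows():
    # Hypothetical stand-in for a blocking database query
    seen_threads.append(threading.get_ident())
    return [{"id": i} for i in range(1, 4)]


async def aiter_rows(fetch):
    # Move the blocking call off the event loop, then yield from the fetched list
    rows = await asyncio.to_thread(fetch)
    for row in rows:
        yield row


async def main():
    return [row async for row in aiter_rows(fetch_all_rows)]


if __name__ == "__main__":
    print(asyncio.run(main()))
```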

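You will notice that in the async class I didn't use .add() for the m2m field update. That is because (sadly) I couldn't find an async version of it in the documentation, and Django treats it as a sync operation. As a workaround, I create the 'Store' object directly through the m2m manager with acreate(). Another option would be to wrap the sync call with sync_to_async from asgiref (a Django dependency), e.g. await sync_to_async(car_obj.store.add)(store_obj).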

Test results

Method get() for 50 objects (displaying an average of three tests):

async: ~0.39832 s
sync: ~0.29502 s

Method post(), creating a single object for each model:

async: ~0.07699 s
sync: ~0.08299 s

Method get() for 500 objects:

async: ~3.67973 s
sync: ~2.39427 s

Method get() for 1500 objects:

async: ~8.79098 s
sync: ~6.26344 s

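From this test, we can conclude that this workload is CPU-bound rather than I/O-bound, so our async code is underutilized: there is no slow external wait to overlap. Part of the overhead likely also comes from the fact that, in Django 4.1, the async ORM methods still run the underlying database queries synchronously in a worker thread, so each await adds a thread hand-off without overlapping any work.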
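To put the Test A averages above on one scale, we can divide each async time by the matching sync time; a ratio above 1 means the async view was slower. The numbers below are the averages reported in this article, and `async_to_sync_ratio()` is just a hypothetical helper name for the division.

```python
# Averages reported in the Test A results (seconds): (async, sync)
results = {
    "get, 50 objects": (0.39832, 0.29502),
    "get, 500 objects": (3.67973, 2.39427),
    "get, 1500 objects": (8.79098, 6.26344),
    "post, 1 object per model": (0.07699, 0.08299),
}


def async_to_sync_ratio(async_s, sync_s):
    # Hypothetical helper: > 1 means the async view took longer than the sync one
    return async_s / sync_s


if __name__ == "__main__":
    for name, (a, s) in results.items():
        print(f"{name}: async/sync = {async_to_sync_ratio(a, s):.2f}")
```

The get() ratios come out at about 1.35, 1.54 and 1.40, so the async view took roughly 35% to 54% longer, while the single post() was about 7% faster under async.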

Test B: API communication
For this test, you might need to install Python's httpx package. I first tried the requests package, hoping it had async support, but sadly it doesn't, so httpx came in as a wonderful alternative. Python's built-in asyncio module is also needed, to gather the results of the awaited calls. Finally, we import shield from asyncio, which protects an awaited operation from being cancelled.

pip install httpx
# Then in your view
import httpx
import asyncio
from asyncio import shield
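shield is imported but not used in the views of this test, so here is what it does in plain asyncio: `asyncio.shield()` lets an awaited operation keep running even if the task awaiting it is cancelled. In this sketch, `slow_call()` is a hypothetical stand-in for a slow API request; the outer `caller()` task is cancelled midway, yet the shielded call still finishes.

```python
import asyncio


async def slow_call(events):
    # Hypothetical stand-in for a slow API request
    await asyncio.sleep(0.05)
    events.append("slow_call finished")
    return "payload"


async def caller(events):
    # shield() protects slow_call from cancellation of this outer task
    return await asyncio.shield(slow_call(events))


async def main():
    events = []
    outer = asyncio.create_task(caller(events))
    await asyncio.sleep(0.01)  # let slow_call start
    outer.cancel()             # cancel the task that is awaiting it
    try:
        await outer
    except asyncio.CancelledError:
        events.append("caller cancelled")
    await asyncio.sleep(0.1)   # give the shielded call time to complete
    return events


if __name__ == "__main__":
    print(asyncio.run(main()))
```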

Using a GET request, we will contact an external exchange-rate API at apilayer.com.

Note: I was using a free subscription which acts as a trial/demo for testing purposes, which is perfect for this testing. Specifically, I was using API on https://apilayer.com/marketplace/exchangerates_data-api. Additionally, I want to point out that it is pretty easy to use and it even offers template code for multiple programming languages to call its API.

From the offered API set, we are using only one endpoint:

GET /timeseries

The targeted endpoint lets us pull historical exchange rates for a period of up to 365 days, which should produce a lengthy response that takes some time to assemble and send. Because apilayer.com is really fast and responsive, we request a whole year of exchange rates for this test.

As before, we create a sync and an async version of the API call, run three consecutive tests, and display the average.

API helper functions:

def api_exchange_sync():
    url = "https://api.apilayer.com/exchangerates_data/timeseries"
    headers = {
        "apikey": "[Insert your API key]"
    }

    r = httpx.get(url, params={"start_date": "2016-01-02",
                               "end_date": "2017-01-01"},
                  headers=headers)

    return r.json()

async def api_exchange_async():
    url = "https://api.apilayer.com/exchangerates_data/timeseries"
    headers = {
        "apikey": "[Insert your API key]"
    }

    # A generous timeout avoids httpx.ReadTimeout on slow responses
    async with httpx.AsyncClient(timeout=50.0) as client:
        r = await client.get(url, params={"start_date": "2016-01-02",
                                          "end_date": "2017-01-01"},
                             headers=headers)

    return r.json()

We set the timeout on AsyncClient (httpx's default is 5 seconds) to prevent an httpx.ReadTimeout error, which can happen if you mash the button that restarts the API call :P.

Sync and async function-based views:

def append_data_sync(request):
    start = time.time()
    e_list = []
    data = [api_exchange_sync()]
    for k in data:
        e_list.append(k)
    end = time.time()
    print('read_sync time: ', end - start)
    return HttpResponse()

async def append_data_async(request):
    start = time.time()
    e_list = []
    # gather() receives a single coroutine here; pass more to run them concurrently
    data = await asyncio.gather(*[api_exchange_async()])
    for d in data:
        e_list.append(d)
    end = time.time()
    print('read_async time: ', end - start)
    return HttpResponse()

Test results

async: ~2.23437 s
sync: ~3.30023 s

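As we iterate through the data received from the API functions, we append it to a list. Async won the race in this test. Note, however, that gather() receives only one coroutine here, so nothing runs concurrently within a single request; gather() pays off when several API calls are awaited at once. I believe that streaming the data could make the async version faster still.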
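In append_data_async(), gather() receives a single coroutine. To see what gather() offers with several awaitables, here is a sketch that splits the requested year into hypothetical quarterly windows and "fetches" them concurrently. `split_range()`, `fake_timeseries()` and `fetch_windows()` are hypothetical names, and `fake_timeseries()` sleeps instead of calling apilayer; in a real version it would wrap an httpx call with the window's start_date and end_date, assuming the API accepts such sub-ranges. Later windows are given shorter delays so they finish first, which shows that gather() still returns results in the order the awaitables were passed in.

```python
import asyncio
from datetime import date, timedelta


def split_range(start, end, parts):
    # Split the inclusive range [start, end] into up to `parts` back-to-back windows
    total_days = (end - start).days + 1
    step = -(-total_days // parts)  # ceiling division
    windows = []
    current = start
    while current <= end:
        stop = min(current + timedelta(days=step - 1), end)
        windows.append((current, stop))
        current = stop + timedelta(days=1)
    return windows


async def fake_timeseries(window, delay):
    # Hypothetical stand-in for one /timeseries request covering `window`
    await asyncio.sleep(delay)
    return {"start_date": window[0].isoformat(), "end_date": window[1].isoformat()}


async def fetch_windows(start, end, parts=4):
    windows = split_range(start, end, parts)
    # Later windows get shorter delays, so they complete first
    delays = [0.01 * (len(windows) - i) for i in range(len(windows))]
    # gather() runs all requests concurrently and keeps the argument order
    return await asyncio.gather(
        *(fake_timeseries(w, d) for w, d in zip(windows, delays))
    )


if __name__ == "__main__":
    chunks = asyncio.run(fetch_windows(date(2016, 1, 2), date(2017, 1, 1)))
    for chunk in chunks:
        print(chunk)
```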

Conclusion

Async came in handy while we were waiting for the API to respond: we received the data faster and used the waiting time better. Sync excelled at internal data processing and database communication through the ORM. There was no waiting to overlap and no shielding or thread hand-offs to pay for, hence the extra speed sync got. Asking which is faster in general makes little sense, because it heavily depends on the context of what we want to do. It's important to recognize CPU-bound and I/O-bound situations to use async and sync properly. We don't need async for every little delay we encounter; I believe a pattern of delays must be noticed during development before deciding to implement the async approach.

The Django 4.1 async update is not a speed update; I see it rather as an update that lets us cover more situations, and cover them better.

References

[1] David Bevans, 'Difference between async and sync programming', mendix.com, https://www.mendix.com/blog/asynchronous-vs-synchronous-programming/
[2] 'Async related pages, related 4.1 changes and sites', djangoproject.com, https://docs.djangoproject.com/
[3] 'Testing tools', djangoproject.com, https://docs.djangoproject.com/en/4.0/topics/testing/tools/
[4] 'Advanced testing topics', djangoproject.com, https://docs.djangoproject.com/en/4.1/topics/testing/advanced/
