Matt Layman

Posted on • Originally published at mattlayman.com

Understand Django: Go Fast With Django

In the last Understand Django article, we learned about commands. Commands are the way to execute scripts that interact with your Django app.

With this article, we're going to dig into performance. How do you make your Django site faster? Keep reading to find out.

Theory Of Performance

There are two ways to make a website faster:

  • Do more work.
  • Do less work.

How we do more work and less work depends on the type of work that we're trying to address on the site.

When we're talking about doing more work, what I'm really saying is that we often need to increase the throughput of a site. Throughput is a measure of work over time. By increasing throughput, a site can serve more users concurrently.

A very natural throughput measure on a Django site is requests per second. A page view on a site could contain multiple requests, so requests per second isn't a perfect analog to how many users your site can handle, but it's still a useful measure to help you reason about site performance.

On the flip side of doing more work, what does it mean to do less work to improve performance?

The fastest code is no code.

Every line of code that you write must be processed by a computer when it runs. If your code is inefficient, or if too much of it is running, producing a final result will naturally take longer.

The time from when an input happens to when an output is received is called latency. If a user clicks a link on your site, how long does it take for them to get a response? That time delay is latency.

Less work doesn't just mean that you're running too much code! Many factors can contribute to latency, and some of them are easier to optimize than others:

  • Inefficient code - mistakes in development can make a computer slower than necessary
  • Data size - sending more data requires more effort to deliver to users
  • Geographic location - the speed of light is a real limit on network communication
  • and more!

If you can reduce latency on your site, you can improve the experience of the people using the site.

In the rest of this article, you'll learn how you can do both more and less work to make a better site that will benefit your users.

Measure First

Before optimizing, we have to recognize what kind of work is impacting an app. In other words, what is the resource constraint that is preventing an app from performing better?

Measuring applications, especially those that are running with live traffic, can be a tricky endeavor. We can look at an app from a zoomed-out macro level or from a zoomed-in micro point of view.

I would start my analysis by inspecting the overall system for patterns. Broadly, you will find that performance problems tend to fall into two major bottleneck categories:

  • I/O bound
  • CPU bound

I/O bound means that the system is limited (that's the "bound" part) by the inputs and outputs of the system. Since that's still a really vague statement, let's make it more concrete. An I/O bound system is one that is waiting for work to be available. Classic examples include:

  • waiting for responses from a database,
  • waiting for content from a file system,
  • waiting for data to transfer over a network,
  • and so on.

Optimizing an I/O bound system is all about minimizing those waiting moments.

Conversely, CPU bound systems are systems that are drowning in computational work. The computer's Central Processing Unit can't keep up with everything it's being asked to do. Classic examples of CPU bound work are:

  • Data science computations for machine learning
  • Image processing and rendering
  • Test suite execution

Optimizing a CPU bound system focuses heavily on making calculations faster and cheaper.

Since we understand that an application that is underperforming is likely to be I/O bound or CPU bound, we can start to look for these patterns within the system. The easiest initial signal to observe is the CPU load. If the processors on your production machines are running at very high CPU utilization, then it's a pretty clear indicator that your app is CPU bound. In my estimation, you'll rarely see this for web applications. Most underperforming web applications are likely to be I/O bound.

The reason that web apps are often I/O bound has to do with their typical function. Most apps are fetching data from a database and displaying it to a user. There isn't a massive amount of computation (compared to something like machine learning) that the app needs to do. Thus, your app is probably waiting around for data from the database.

If you observe that the system is not CPU bound, then the next action is to dig deeper and find where the application is spending its time waiting. But how? By now, you should have a decent understanding of what to look for, but what tools can you use to accomplish the task of measurement?

We need to rewind a bit. A moment ago, I assumed that you know how to find the CPU load of your application. That may not be the case.

The easiest way to monitor basic resource information about your app (CPU, memory, disk usage, and more) may be through your hosting vendor. My preferred hosting vendor, Heroku, displays all of these kinds of metrics on a single page so I can assess system performance at a glance. Other vendors like Digital Ocean or AWS provide tools to help you see this information too.

If your hosting vendor doesn't provide these tools, then you'll have to use other techniques. Presumably, if you're using a Virtual Private Server (VPS) for hosting, you have access to the server itself via ssh. On the server that's running your application, you can use a program like top. This is a classic program for checking which processes are using the "top" amount of resources. top will show the list of processes ordered by what is consuming the most CPU and will refresh the order of the list every second to provide a current snapshot in time. (Tip: use q to quit top after you start it.)

While top is useful and gets the job done to learn about CPU usage, it's not exactly the friendliest tool out there. There are alternatives to top that may offer a better user experience. I personally find top sufficient, but I know htop is a popular alternative.
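If you've never done this kind of check, the whole flow is only a couple of commands (the server address here is hypothetical):

$ ssh deploy@your-server.example.com
$ top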

If you don't have tools from your hosting provider and don't want to use ssh to log into a server, there are other options to consider. Broadly, this other category of tools is called Application Performance Monitoring (APM). APM tools are vendors that will monitor your application (go figure!) if you install the tool along with your app. Application performance is very important to businesses, so the software industry is full of vendors to choose from with a wide range of features.

To see what these tools can be like for free, you might want to check out Datadog which has a free tier (Datadog is not a sponsor, I've just used their service, enjoyed it, and know that it's free for a small number of servers). Other popular vendors include Scout APM and New Relic.

Finally, we've reached a point where you can diagnose the performance constraints on your application using a wide variety of tools or services. Let's see how to fix problems you may be experiencing!

Do More

We can address throughput and do more with a few different knobs.

When thinking about doing more, try to think about the system in two different scaling dimensions:

  • Horizontally
  • Vertically

Horizontal and vertical scaling are methods of describing how to do more in software systems.

Let's relate this to a silly example to give you a good intuitive feel for scaling. Imagine that you need to move large bags of dirt (approximately 40 lbs / 18 kg per bag) to plant a huge garden. The job is to unload the hundreds of bags from a delivery truck to your imaginary backyard. You enlist the help of your friends to get the job done.

One strategy is to get your strongest friends to help. Maybe there aren't as many of them, but their strength can make quick work of moving the bags. This is vertical scaling. The additional power of your friends allows them to move the bags more easily than someone with an average or weaker build.

Another strategy is to get lots of friends to help. Maybe these friends can't move as many bags as the stronger ones, but many hands make light work. This is horizontal scaling. The increased number of people allows the group to move more bags because more individuals can do the work simultaneously.

We can apply this same thinking to computer systems.

Vertical Scaling

To achieve vertical scaling, you would run your application on a more powerful computer. Cloud vendors give you all kinds of tools to do this. Picking a faster computer is naturally going to cost you more, so vendors make many options available (check out this page from AWS to see the dizzying array of options).

When should you think about vertical scaling? One natural case is when your application is CPU bound. If the processor is struggling to process the requests from an application, a faster processor may help. With a higher clock speed from a faster individual CPU, a computer will be able to process an individual request faster.

Moving to a larger computer is typically considered vertical scaling, but it may also produce horizontal effects because of how modern computers are designed. These days, larger computers typically come with more CPUs. Each individual CPU may be faster than the one in a smaller configuration, and there will be more CPUs on the single machine. Because of this characteristic, you will likely need to change your application configuration to take advantage of the additional power.

In the Understand Django deployment article, we discussed Gunicorn's --workers flag. Recall that Python application servers like Gunicorn work by creating a main process and a set of worker processes. The main process will distribute incoming network connections to the worker processes to handle the actual traffic on your website. If you vertically scale the server machine from a size that has 1 CPU to a machine that has 2, 4, or more CPUs, and you don't change the number of workers, then you'll waste available CPU capacity and won't see most of the benefits from the upgrade in server size.
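As a sketch, if you scaled from a 1 CPU machine to a 4 CPU machine, you might raise the worker count to match (the Gunicorn docs suggest (2 x num_cores) + 1 workers as a starting point; project.wsgi is a hypothetical module path):

$ gunicorn project.wsgi --workers 9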

If modern vertical scaling uses more CPUs when moving to a bigger machine, then what is horizontal scaling? The difference is primarily in the number of computers needed to do the scaling. Vertical scaling changes a single machine to achieve more throughput. Horizontal scaling pulls multiple machines into the equation.

Horizontal Scaling

Conceptually, how does horizontal scaling work? With the vertical scaling model, you can see a clear connection between users making a request to your website's domain and a single machine handling those requests (i.e., the main process from your application server distributes requests). With the horizontal model, we're now discussing multiple computers. How does a single domain name handle routing to multiple computers? With more computers!

Like the main process that distributes requests, we need a central hub that is able to route traffic to the different machines in your horizontally scaled system. This hub is usually called a load balancer. A load balancer can be used for multiple things. I see load balancers used primarily to:

  • Route traffic to the different application servers in a system.
  • Handle the TLS certificate management that makes HTTPS possible.

Since the load balancer doesn't do most of the actual work of processing a request, your system can increase its throughput by increasing the number of application servers. In this setup, each application server "thinks" that it is the main server that's handling requests. The load balancer behaves like a client that's making requests on behalf of the actual user. This kind of configuration is called a proxy setup.

If you want to learn more about horizontal scaling with a load balancer, then I suggest you check out Nginx (pronounced "engine X"), HAProxy (which stands for "high availability proxy"), or AWS ALBs (for "application load balancer"). These tools are commonly reached for and have a reputation for being strong load balancers.

What's Better?

What are the tradeoffs between horizontal scaling and vertical scaling?

When you add more pieces to a system, you're increasing the complexity of the system. Thus, vertical scaling can, at least initially, produce a design with lower operational complexity. Personally, if I ran a service on some VPS like Digital Ocean or AWS, I would probably reach for vertical scaling first. A bigger machine would allow me to use a higher number of concurrent worker processes to increase throughput, and I would avoid the complexity of deploying multiple application servers.

In reality, I run my side projects on a Platform as a Service, Heroku. With my choice of Heroku, the service already includes a load balancer by default. This means that I can trivially scale horizontally by changing a setting in Heroku that will start multiple application servers.

While vertical scaling may be a good fit if you don't have an existing load balancer, that scaling path does have downsides to consider.

First, in a vertically scaled world, downtime on your server could mean downtime for your service. Whether a site is reachable or not reachable is called "availability" by the software industry. If your entire site is tied to a large vertically scaled server, then it can act as a single point of failure if there is a problem.

Secondly, a vertically scaled service may cost you more. In my experience, most websites have high and low periods of usage throughout the day. For instance, my current employer is a US healthcare company that provides telemedicine visits for people who need to speak with a doctor virtually. When it's the middle of the night in the US, site utilization is naturally lower because most people are sleeping.

One common cost optimization is to use fewer computing resources during periods of lower utilization. On a vertically scaled service, it is harder to change computer sizes quickly. Thus, that computing resource usage is relatively fixed, even if no one is using your service. In contrast, a horizontally scaled service can be configured to use "auto-scaling."

Auto-scaling is the idea that the infrastructure can be resized dynamically, depending on the use of the site. During periods of high activity, more computers can be added automatically to join the load balancer distribution and handle additional load. When activity dies down, these extra machines can be removed from use. This cost saving technique helps ensure that your system is only using what it needs.

The truth is that if your system reaches a large enough size and scale, then picking horizontal or vertical scaling is a false choice. As a system matures and grows, you may need to have a mix of the two scaling types, so that your service has the characteristics that you need (like availability).

I hope that I've helped equip you with some mental modeling tools. With these tools, you should have some idea of how to handle more traffic when your site becomes wildly popular. 😄

In this section, we focused on increasing throughput by changing your service's infrastructure to handle more load. Now let's shift from the macro point of view to the micro view and talk about how to improve throughput by doing less.

Do Less

How do you make your Django site do less work? We should always measure, but since I believe most websites are I/O bound, let's focus on techniques to improve in that dimension.

Optimizing Database Queries

The most common performance problem that I've encountered with Django applications is the N+1 query bug (some people will describe it as the 1+N query bug for reasons that may become evident in a moment).

The N+1 bug occurs when your application code calls the database in a loop. How many queries are in this made up example?

from application.models import Movie

movies = Movie.objects.all()
for movie in movies:
    print(movie.director.name)

It's a bit of a trick question because you might have a custom manager (i.e., objects), but, in the simplest scenario, there is one query to fetch the movies, and one query for each director.

The reason for this behavior is that Django does a lazy evaluation of the movies QuerySet. The ORM is not aware that it needs to do a join on the movie and director tables to fetch all the data. The first query on the movie table occurs when the iteration happens in the Python for loop. When the print function tries to access the director foreign key, the ORM does not have the director information cached in Python memory for the query set. Django must then fetch the director data in another database query to display the director's name.

This adds up to:

  • 1 query for the movies table
  • N queries on the director table, one for each iteration through the for loop

Hence the name, "N+1" query bug.

The reason that this is so bad is that calling the database is far slower than accessing data in Python memory. The problem also gets worse as there are more rows to iterate over (i.e., more movies to process and, thus, more directors to fetch individually).

The way to fix this issue is to hint to Django that the code is going to access data from the deeper relationship. We can do that hinting to the ORM with select_related. Let's see the example with this change.

from application.models import Movie

movies = Movie.objects.select_related("director").all()
for movie in movies:
    print(movie.director.name)

In the reworked example, the ORM will "know" that it must fetch the director data. Because of this extra information, the framework will fetch from both the movie and director tables in a single query when the for loop iteration starts.

Under the hood, Django performs a more complex SELECT query that includes a join on the two tables. The database sends back all the data at once and Django caches the data in Python memory. Now, when execution reaches the print line, the director.name attribute can pull from memory instead of needing to trigger another database query.

The performance savings here can be massive, especially if your code works with a lot of database rows at once.

While select_related is fantastic, it doesn't work for all scenarios. Other types of relationships like a many to many relationship can't be fetched in a single query. For those scenarios, you can reach for prefetch_related. With this method, Django will issue a smaller number of queries (usually 1 per table) and blend the results together in memory. In practice, prefetch_related operates very much like select_related in most circumstances. Check out the Django docs to understand more.
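As an illustration, here's a sketch that assumes a hypothetical many-to-many actors relationship on the Movie model:

from application.models import Movie

# Two queries total: one for the movies and one for all the related
# actors. Django blends the results together in Python memory.
movies = Movie.objects.prefetch_related("actors").all()
for movie in movies:
    # This loop triggers no extra queries because the actors
    # are already cached on each movie.
    print([actor.name for actor in movie.actors.all()])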

Caching Expensive Work

If you know that some piece of work:

  • will likely happen many times,
  • is expensive to create, AND
  • won't need to change,

then you're looking at a very good candidate for caching. With caching, Django can save the results of some expensive operation into a very fast caching tool and restore those results later.

A good example of this might be a news site. A news site is very "read heavy," that is, users are more likely to use the site for viewing information than for writing and saving information to the site. A news site is also a good example because users will read the same article, and the content of that article is fixed in form.

Django includes tools to make it simple to work with the cache to optimize content like our news site example.

The simplest of these tools is the cache_page decorator. This decorator can cache the results of an entire Django view for a period of time. When a page doesn't have any personalization, this can be a quick and effective way to serve HTML results from a view. You can find this decorator in django.views.decorators.cache.
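For example, a hypothetical article view with no personalization could cache its full response like this:

from django.views.decorators.cache import cache_page

@cache_page(60 * 15)  # Cache the rendered response for 15 minutes.
def article_detail(request, article_id):
    ...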

You may need a lower level of granularity than a whole page. For instance, your site might have some kind of logged in user and a customized navigation bar with a profile picture or something similar. In that scenario, you can't really cache the whole page and serve that to multiple users, because other users would see the customized navigation bar of the first user who made the request. If this is the kind of situation you're in, then the cache template tag may be the best tool for you.

Here's a template example of the cache tag in use.

{% load cache %}

Hi {{ user.username }}, this part won't be cached.

{% cache 600 my_cache_key_name %}
    Everything inside of here will be cached.
    The first argument to `cache` is how long this should be cached
    in seconds. This cache fragment will cache for 10 minutes.
    Cached chunks need a key name to help the cache system
    find the right cache chunk.

    This cache example usage is a bit silly because this is static text
    and there is no expensive computation in this chunk.
{% endcache %}

With this scheme, any expensive computation that your template does will be cached. Be mindful with this tag! The tag is useful if the computation happens during rendering, but if you do the evaluation and fetching inside of your view instead of at template rendering time, then you're unlikely to get the benefits that you want.

Finally, there is the option to use the cache interface directly. Here's the basic usage pattern:

# application/views.py
from django.core.cache import cache

from application.complex import calculate_expensive_thing

def some_view(request):
    expensive_result = cache.get("expensive_computation")
    if expensive_result is None:
        expensive_result = calculate_expensive_thing()
        cache.set("expensive_computation", expensive_result)

    # Handle the rest of the view.
    ...

On the first request to this view, the expensive_result won't be in the cache, so the view will calculate the result and save it to the cache. On subsequent requests, the expensive result can be pulled from the cache. In this example, I'm using the default timeout for the cache, but you can control the timeout values when you need more control. The cache system has plenty of other cool features, so check it out in the docs.
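When you do want that control, the timeout argument takes a number of seconds. The same view logic can also be condensed with get_or_set, which computes and stores the value only on a cache miss (calculate_expensive_thing is the same hypothetical function from above):

from django.core.cache import cache

from application.complex import calculate_expensive_thing

def some_view(request):
    # Cache the result for five minutes.
    expensive_result = cache.get_or_set(
        "expensive_computation", calculate_expensive_thing, timeout=300
    )

    # Handle the rest of the view.
    ...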

As fair warning, caching often requires more tools and configuration. Django works with very popular cache tools like Redis and Memcached, but you'll have to configure one of the tools on your own. The Django documentation will help you, but be prepared for more work on your part.
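To give you a feel for the scale of that configuration, here is a minimal sketch that assumes Django 4.0 or newer and a Redis server already running locally (the LOCATION value depends on your setup):

# project/settings.py
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379",
    }
}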

Database optimization and caching are go-to techniques for optimization. When you're optimizing, how do you know that you're doing it right? What gains are you getting? Let's look at some tools next that will let you answer those questions.

Tools To Measure Change

We'll look at tools in increasing order of complexity. The first tool is one that is massively useful while developing in Django. The others are more general purpose, but they're still worth knowing about so that you'll know when to reach for them.

Each of these tools helps you get real performance data. By measuring the before and after of your changes, you can learn if the changes are actually producing the gains that you expect or hope to achieve.

Django Debug Toolbar

The Django Debug Toolbar is a critical tool that I add to my projects. The toolbar acts as an overlay on your site that opens to give you a tray of different categories of diagnostic information about your views.

Each category of information is grouped into a "panel," and you can select between the different panels to dig up information that will assist you while doing optimization work.

You'll find panels like:

  • SQL
  • Templates
  • Request
  • Time

The SQL panel is where I spend nearly all of my time when optimizing. This panel will display all the queries that a page requests. For each query, you can find what code triggered the database query and you even get the exact SQL SELECT. You can also get an EXPLAIN about a query if you really need the gory details of what the database is doing.

With a little bit of eye training, you'll learn to spot N+1 query bugs because you can see certain queries repeated over and over and "cascading" like a waterfall.

I'll often test with the debug toolbar when I'm trying to sprinkle in select_related to visually confirm that I've reduced the query count on a page. The debug toolbar is open source and is a great free resource. The toolbar is totally worth the investment of configuring it for your next Django project.
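Setting up the toolbar takes a handful of changes (check the project's documentation for the full, current instructions), but the core of it is a sketch like this:

# project/settings.py
INSTALLED_APPS = [
    # ...
    "debug_toolbar",
]

MIDDLEWARE = [
    "debug_toolbar.middleware.DebugToolbarMiddleware",
    # ...
]

# The toolbar only renders for requests from these client addresses.
INTERNAL_IPS = ["127.0.0.1"]

# project/urls.py
from django.urls import include, path

urlpatterns = [
    # ...
    path("__debug__/", include("debug_toolbar.urls")),
]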

hey / ab

There are two tools that are very similar that I use when I need to get a crude measure of the performance of a site. These tools are hey and ab (Apache Bench). Both of these tools are load generators that are meant to benchmark a site's basic performance characteristics.

In practice, I prefer hey, but I mention ab because it is a well known tool in this space that you are likely to encounter if you research this load generator topic.

Operating this kind of tool is trivial:

$ hey https://www.example.com

In this example, hey will try to open up a large number of concurrent connections to the URL and make a bunch of requests. When the tool is done, it will report how many of the requests were successful and some related timing information and statistics. Using a load generator like this lets you synthesize traffic to learn how your site is going to perform.
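hey also has flags to shape the load. For instance, this invocation sends 1,000 total requests with 50 concurrent workers (the URL is a placeholder):

$ hey -n 1000 -c 50 https://www.example.com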

I'd suggest you be careful where you tell these tools to operate. If you're not careful, you could cause a Denial of Service attack on your own machines. The flood of requests might make your site unavailable to other users by consuming all your server's resources. Think twice before pointing this at your live site!

Locust

The previous load generator tools that I mentioned act as somewhat crude measurements because you're limited to testing a single URL at a time. What should you do if you need to simulate traffic that matches real user usage patterns? Enter Locust. Locust is not a tool that I would reach for casually, but it is super cool and worth knowing about.

The goal of Locust is to do load testing on your site in a realistic way. This means that it's your job to model the expected behavior of your users in a machine understandable way. If you know your users well (and I hope you do), then you can imagine the flows that they might follow when using your site.

In Locust, you codify the behavior patterns that you care about, then run Locust to simulate a large number of users that will act like you expect (with randomness to boot to really make the test like reality).
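As a sketch, a Locust file for a hypothetical user who reads articles more often than they visit the home page might look like this (the URL paths are assumptions):

# locustfile.py
from locust import HttpUser, between, task

class ReaderUser(HttpUser):
    # Each simulated user waits 1 to 5 seconds between tasks.
    wait_time = between(1, 5)

    @task(3)  # Weighted: viewing articles is three times more likely.
    def view_articles(self):
        self.client.get("/articles/")

    @task
    def view_home(self):
        self.client.get("/")

Running locust -f locustfile.py starts a web interface where you choose how many of these simulated users to unleash on your site.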

Advanced load testing is something you may never need for your site, but it's pretty cool to know that Python has you covered if you need to understand performance and the limits of your site at that deep level.

Application Performance Monitoring (APM)

Earlier in this article, I mentioned that Application Performance Monitoring tools can show you CPU and memory utilization of your site. That's usually just the tip of the iceberg.

An APM tool often goes far beyond hardware resource measurement. I like to think of APMs as a supercharged version of the debug toolbar.

First, an APM is used on live sites typically. The tool will collect data about real requests. This lets you learn about the real performance problems on the site that affect real users.

For instance, New Relic will collect data on slow requests into "traces." These traces are aggregated into a set to show you which pages on your site are the worst performers. You can drill into that list, view an individual trace, and investigate the problem.

Maybe you've got an N+1 bug. Maybe one of your database tables is missing an index on an important field, and the database is scanning too many records during SELECT statements. These traces (or whatever they are called in other services) help you prioritize what to fix.

In fact, an APM highlights the true value of measurement. If I can leave you with a parting thought about optimization, think about this: optimize where it counts.

Here's a simple thought experiment to illustrate what I mean. You have an idealized system that does two things repeatedly:

  • One task (A) is 90% of all activity on the site.
  • The other task (B) is the remaining 10%.

If you have to pick a target to try to optimize because your system performance is inadequate, which one do you pick?

Let's assume that you know an optimization for each type of task that could cause the task to execute in 50% of the time. If implementing each optimization is the same level of effort, then there is a clear winner as to which task you should optimize. You could either:

  • Optimize A for 90% * 50% for a total system saving of 45%.
  • Optimize B for 10% * 50% for a total system saving of 5%.

In most circumstances, spend your optimization effort on the area that will have outsized impact (i.e., pick task A as much as you can). Sometimes the hard part is figuring out which task is A and which task is B. Monitoring tools like an APM can help you see where the worst offenders are so you can focus your limited time in the right spot.

Summary

In this article, we looked into making Django apps go fast. We saw:

  • A mental model for thinking about performance optimization
  • Different types of performance bottlenecks
  • How to get your system to do more by either horizontal or vertical scaling
  • How to get your app to do less work by optimizing database queries and caching
  • Tools to aid you in all of this optimization work

In the next article, we'll look into security. You'll learn about:

  • How Django helps you be more secure with some of its design features
  • What those different warnings from ./manage.py check --deploy mean
  • Fundamental things you should consider to help keep your site secure

If you'd like to follow along with the series, please feel free to sign up for my newsletter where I announce all of my new content. If you have other questions, you can reach me online on Twitter where I am @mblayman.
