Scale your WSGI project to the next level by leveraging everything Gunicorn has to offer.
This article assumes you’re using a sync framework like Flask or Django and won’t explore the possibility of using the async/await pattern.
First, let’s briefly discuss how Python handles concurrency and parallelism.
In CPython, only one thread per process can execute Python bytecode at a time because of the Global Interpreter Lock (GIL).
Even if you have 100 threads inside your process, the GIL will only allow a single thread to run at any given moment. That means that, at any time, 99 of those threads are paused and 1 thread is working. The GIL is responsible for that orchestration.
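A quick way to see this in practice: a purely CPU-bound task gains nothing from threads on CPython. The sketch below times the same work sequentially and across two threads; the workload size is an arbitrary choice, and the exact numbers will vary by machine.

```python
import threading
import time

def cpu_work(n: int) -> int:
    """Pure CPU-bound work: no I/O, so threads can't overlap under the GIL."""
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 2_000_000

# Sequential: two runs back to back.
start = time.perf_counter()
cpu_work(N)
cpu_work(N)
sequential = time.perf_counter() - start

# Threaded: two threads, but the GIL lets only one run Python code at a time.
threads = [threading.Thread(target=cpu_work, args=(N,)) for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

On CPython you’ll typically see the threaded time match (or slightly exceed) the sequential time, because the GIL serializes the work and adds context switching on top.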
To get around this limitation, we can use Gunicorn. From the docs:
Gunicorn is based on the pre-fork worker model. This means that there is a central master process that manages a set of worker processes. The master never knows anything about individual clients. All requests and responses are handled completely by worker processes.
This means that Gunicorn will spawn the specified number of individual processes and load your application into each process/worker, allowing parallel processing for your Python application.
Since one size will never fit everyone’s needs, Gunicorn offers different worker types to suit a broader range of use cases.
sync is the default worker class. Each worker process handles 1 request at a time, and you can use the -w/--workers parameter to set the number of workers.
The recommendation for the number of workers is 2–4 x $(NUM_CORES), although it will depend on how your application works.
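As a sketch, a gunicorn.conf.py for sync workers following that rule of thumb could look like this (the file name, the “times 2 plus 1” factor, and the timeout value are my choices, not requirements):

```python
# gunicorn.conf.py -- sync workers sized from the CPU count
import multiprocessing

# Default worker class; each worker handles one request at a time.
worker_class = "sync"

# Rule of thumb from the Gunicorn docs: 2-4 x $(NUM_CORES).
workers = multiprocessing.cpu_count() * 2 + 1

# Kill and restart workers that stay silent for more than this many seconds.
timeout = 30
```

You’d then start the server with something like `gunicorn -c gunicorn.conf.py myapp.wsgi:application`, where `myapp.wsgi:application` is a placeholder for your own WSGI entry point.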
When to use:
- Your work is almost entirely CPU bound;
- Low to zero I/O operations (this includes database access, network requests, etc).
Signs to look for in production:
Monitor CPU usage and incoming requests to make sure you have the right average number of processes for your machine size and also request patterns.
If you have too many processes, your average latency can degrade, since the excess forces a lot of context switching on your machine’s CPU.
If you see a lot of timeout errors between your reverse proxy (e.g. nginx) and Gunicorn, it’s a sign that you don’t have enough concurrency to handle your traffic patterns/load.
If you try to use the sync worker type and set the threads setting to more than 1, the gthread worker type will be used instead.
If you use gthread, Gunicorn will allow each worker to have multiple threads. In this case, the Python application is loaded once per worker, and each of the threads spawned by the same worker shares the same memory space.
Those threads will be at the mercy of the GIL, but this is still useful when you have some blocking I/O happening. It will allow you to handle more concurrency without increasing your memory usage too much.
The recommended total number of concurrent requests (workers × threads) is still the same.
This is probably the most used configuration you’ll see out in the wild.
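A minimal gthread configuration might look like the sketch below; the exact workers/threads split is an assumption you should tune against your own traffic.

```python
# gunicorn.conf.py -- gthread: fewer processes, several threads each
import multiprocessing

worker_class = "gthread"

# Keep total concurrency (workers x threads) around the same 2-4 x cores
# rule of thumb, trading processes for threads to save memory.
workers = multiprocessing.cpu_count()
threads = 4
```

Note that each worker still loads a full copy of your application, so shifting concurrency from `workers` to `threads` is what keeps memory in check.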
When to use:
- Moderate I/O operations;
- Moderate CPU usage;
- You’re using packages/extensions that are not patched to run async and/or are unable to patch them yourself.
Signs to look for in production:
The same ones I described for the sync worker type, with the added caveat of the balance between processes and threads. This balance will depend a lot on your usage patterns.
Eventlet and gevent make use of “green threads” or “pseudo threads” and are based on greenlet.
In practice, if your application work is mainly I/O bound, it will allow it to scale to potentially thousands of concurrent requests on a single process.
Even with the rise of async frameworks (FastAPI, Sanic, etc.), this is still relevant today since it allows you to optimize for I/O without the extra code complexity.
The way they manage to do it is by “monkey patching” your code, mainly replacing blocking parts with compatible cooperative counterparts from the gevent package.
It uses epoll or kqueue or libevent for highly scalable non-blocking I/O. Coroutines ensure that the developer uses a blocking style of programming that is similar to threading, but provide the benefits of non-blocking I/O.
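In practice, the patching step looks like the sketch below. The try/except is only there so the snippet degrades gracefully where gevent isn’t installed; in a real application you’d call patch_all() unconditionally, as the very first thing in your entry point.

```python
# Must run before anything else imports socket, ssl, time, etc.,
# otherwise those modules keep their original blocking implementations.
try:
    from gevent import monkey
    monkey.patch_all()  # swaps blocking stdlib calls for cooperative ones
    patched = True
except ImportError:  # gevent not installed in this environment
    patched = False
```

Gunicorn’s gevent worker class applies monkey patching itself when it boots, so the explicit call matters mainly for scripts and tests that run outside Gunicorn.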
This is usually the most efficient way to run your Django/Flask/etc. web application, since most of the time the bulk of the latency comes from I/O-related work.
That being said, it can be tricky to have it configured 100% correctly, and if you’re not serving hundreds or more requests/sec, it’s probably easier to just use the gthread worker class.
Signs to look for in production:
- Make sure all parts of your code cooperate with these async frameworks (i.e. are properly patched). Without that, you could have blocked threads sitting idle, unable to execute work (like accepting new requests or answering previously accepted requests whose I/O calls have finished). In production, if your CPU usage is low but you’re seeing a lot of timeouts in your nginx logs, there’s a good chance that’s happening. You should audit this before deploying to production (I’ll describe how later in this post).
- Connections to your databases. If you have thousands of concurrent connections and you’re using a DBMS like PostgreSQL without a connection pooler, chances are you’re going to have a bad time (I’ll describe how to handle this later in this post).
There’s also a Tornado worker class. It can be used to write applications using the Tornado framework. Although the Tornado workers are capable of serving a WSGI application, this is not a recommended configuration.
I’ll focus on gevent instead of eventlet since it has become the popular choice.
Make sure everything on your project is gevent friendly. This includes packages and drivers. I’ll list some of the most used packages and how to patch them if needed.
For PostgreSQL, the most widely used package is psycopg2, but it’s not prepared to be patched by gevent on its own.
You also need psycogreen:
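A sketch of the psycogreen setup (guarded with try/except here only so it degrades where the packages aren’t installed; the order matters, and the patching must happen before any connection is opened):

```python
try:
    # gevent must patch the stdlib first...
    from gevent import monkey
    monkey.patch_all()

    # ...then psycogreen makes psycopg2 yield to the gevent hub
    # while it waits on the database socket.
    from psycogreen.gevent import patch_psycopg
    patch_psycopg()
    db_patched = True
except ImportError:  # gevent/psycogreen not installed in this environment
    db_patched = False
```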
For MySQL, the recommended package is PyMySQL, and it is gevent friendly.
For Redis, the recommended package is redis-py, and it is gevent friendly.
For MongoDB, the recommended package is PyMongo, and it is gevent friendly.
For Elasticsearch, the recommended package is elasticsearch-py, and it is gevent friendly.
Quote from a maintainer:
The library itself just passes whatever is returned from the connection class. It uses standard sockets by default (via urllib3) so it can be made compatible by monkey patching. Alternatively you can create your own connection_class and plug it in.
For Cassandra, the recommended package is the DataStax driver, and it is gevent friendly.
One thing to take into consideration when using gevent is to understand that it’s really easy to end up with a lot of concurrent connections to, for example, your database. For some DBMS like PostgreSQL, that can be really dangerous.
The standard practice for these cases is to use a connection pool. In the case of PostgreSQL, SQLAlchemy’s built-in connection pool or an external pooler like PgBouncer will work very well.
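With SQLAlchemy, capping the pool is a few keyword arguments on the engine. The numbers and the DSN below are placeholders; the real ceiling per Gunicorn worker is pool_size + max_overflow, so multiply that by your worker count when sizing PostgreSQL’s max_connections.

```python
try:
    from sqlalchemy import create_engine

    # At most 5 pooled + 10 overflow connections *per worker process*.
    # The engine connects lazily, so nothing is opened until first use.
    engine = create_engine(
        "postgresql+psycopg2://user:password@localhost/mydb",  # placeholder DSN
        pool_size=5,
        max_overflow=10,
        pool_timeout=30,  # seconds to wait for a free connection before erroring
    )
    pool_configured = True
except ImportError:  # SQLAlchemy not installed in this environment
    pool_configured = False
```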
Blocked thread monitoring
It’s really important to make sure parts of your code are not blocking a greenlet from returning to the hub.
Fortunately, since gevent version 1.3, it’s simple to monitor using the monitor_thread setting, and you can even enable it inside your unit tests:
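A sketch of enabling it programmatically (the 0.1-second threshold is an arbitrary example value, and the try/except only makes the snippet degrade where gevent isn’t installed):

```python
try:
    import gevent

    # Start gevent's background monitor thread (available since gevent 1.3).
    gevent.config.monitor_thread = True
    # Report any greenlet that blocks the hub longer than this many seconds.
    gevent.config.max_blocking_time = 0.1
    monitoring = True
except ImportError:  # gevent not installed in this environment
    monitoring = False
```

The same switches also exist as environment variables (GEVENT_MONITOR_THREAD_ENABLE and GEVENT_MAX_BLOCKING_TIME), which is handy for turning them on in CI without touching code.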
It’s also a good idea to have it enabled in your development environment, since some blocking calls might be missed during CI runs, where it’s common to mock some of the I/O.
- Gunicorn/wsgi is still a valid choice even with the rise of async frameworks like fastapi and sanic;
- gthread is usually the preferred worker type for many due to its ease of configuration coupled with the ability to scale concurrency without bloating your memory too much;
- gevent is the best choice when you need concurrency and most of your work is I/O bound (network calls, file access, databases, etc…).
How does this all sound? Is there anything you’d like me to expand on? Let me know your thoughts in the comments section below (and hit the clap if this was useful)!
Stay tuned for the next post. Follow so you won’t miss it!