Ciro Spaciari

The Status of Python Web Performance at the end of 2022

Let's talk about the current state of Python web framework throughput, using the very popular TechEmpower benchmarks as a reference. Looking only at Plaintext, to measure framework throughput rather than DB or JSON library performance, we can see that for a few years japronto has dominated the top.

Round 21:

(image: TechEmpower Round 21 Plaintext results)

Check the results here: https://www.techempower.com/benchmarks/#section=data-r21&test=plaintext&l=zijzen-6bj

The state of japronto

The project's source code lives on GitHub; with more than 8.6k stars and 596 forks it is a very popular repository, but no new releases have been made since 2018 and it looks pretty much unmaintained: no PRs are accepted, no issues are closed, and there is still no Windows, macOS Apple Silicon, or PyPy3 support. Japronto itself uses uvloop, which has more than 9k stars and 521 forks and, unlike japronto, seems to be well maintained.

So how is it that the only Python framework hitting more than 1 million, actually more than 2.6 million, requests per second, over 4.7 times the second-place falcon at only 564k requests per second, is an obsolete project with no security updates since 2018 and is still kept in the charts by TechEmpower? I don't know, but japronto's GitHub page says it is an early preview and recommends using sanic, which hits 277k requests per second.

Sanic is very, very popular with 16.6k stars, 1.5k forks, Open Collective sponsors, and a very active GitHub.
Falcon is more popular than japronto with 8.9k stars, 898 forks, Open Collective sponsors, and a very active GitHub too.
Despite japronto being kept in first place by TechEmpower, falcon is a much better solution in general, with performance similar to fastify, a very fast Node.js framework that hits 575k requests per second in this benchmark.

So... is there still hope of hitting 1 million requests per second or more with Python?

After years of japronto's dominance in first place in TechEmpower Plaintext for Python, a new library called vibora appeared... and it's another project without any updates since 2019. OK, let's filter out vibora and japronto and move on to the next active entries: a new library called socketify.py and another new library called robyn both arrive in the millions of req/s in the live results.

(image: live Plaintext results with vibora and japronto filtered out)

Check the results here: https://www.techempower.com/benchmarks/#section=test&runid=43469a25-670e-4bda-a3c3-652839435784&test=plaintext&l=hra0hr-35n&w=18y6o-0-0-0&f=0-0-0-0-0-w-0-0-0-0-0-8vn08w-0-0

With socketify.py hitting more than 2 million requests per second with Python or PyPy3, and robyn hitting more than 1 million, the hope of a stable, well-maintained library that can reach the millions is born again.
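For a taste of the API, a minimal socketify.py "Hello, World!" server looks roughly like this (a sketch based on the project's README; exact signatures may vary between alpha releases):

```python
from socketify import App

app = App()

# reply to GET / with a plain-text body
app.get("/", lambda res, req: res.end("Hello, World!"))

app.listen(3000, lambda config: print(f"Listening on http://localhost:{config.port}"))
app.run()
```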

But how do they compare in other benchmarks?

I picked uvicorn, falcon, and robyn for comparison, using oha with 40 concurrent connections for 5 seconds, running once for warmup and taking the average of 3 runs, just sending back a "Hello, World!" message.

```
oha -c 40 -z 5s http://localhost:8000/
```
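For reference, the uvicorn target in a test like this is just a plain ASGI "Hello, World!" app along these lines (a minimal sketch of the kind of app used; the falcon and robyn apps are equally trivial):

```python
# asgi_app.py - minimal ASGI "Hello, World!" application
async def app(scope, receive, send):
    assert scope["type"] == "http"
    # send the response status and headers
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    # send the response body
    await send({
        "type": "http.response.body",
        "body": b"Hello, World!",
    })
```

Served with gunicorn -k uvicorn.workers.UvicornWorker asgi_app:app, matching the gunicorn + uvicorn versions listed at the end of this post.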

HTTP requests per second (Linux x64)

(image: HTTP requests per second chart)

It's a really oversimplified microbenchmark, but it shows a big difference from the TechEmpower benchmarks, which use wrk with 16-deep pipelining via a Lua script, and of course nothing will hit 1 million req/s on my machine. For some context, running the tfb tools on my machine, socketify.py + PyPy3 hits 770k req/s.

It also shows that PyPy3 will not magically boost your performance; you need to integrate in a way that PyPy3 can optimize to deliver CPU performance, and with a more complex example maybe it can help more. But why is socketify so much faster using PyPy3? The answer is CFFI: socketify does not use Cython for its integration, so it cannot deliver full performance on Python3; this will be solved with HPy.
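To illustrate why CFFI matters on PyPy: CFFI declarations are plain Python-level bindings that PyPy's JIT can trace and turn into near-direct native calls, while CPython C extensions (and Cython modules) bypass the JIT entirely. Here is a minimal ABI-mode sketch, where libdemo.so and its add function are hypothetical:

```python
from cffi import FFI

ffi = FFI()

# declare the C signature we want to call
ffi.cdef("int add(int a, int b);")

# load a shared library at runtime (ABI mode); "./libdemo.so" is hypothetical
lib = ffi.dlopen("./libdemo.so")

# on PyPy, calls like this are traced by the JIT and have very low overhead
print(lib.add(2, 3))
```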

We can also see that Cython + meinheld (which is built with C) is WAY faster than falcon's other options, but you will not get an optimized JIT for all the Python code you write in a real-life project.

HTTP pipelining is buggy and not really supported by browsers; the technique was superseded by multiplexing via HTTP/2, which is supported by most modern browsers. TechEmpower doesn't test HTTP/2 or HTTP/3.

In HTTP/3, multiplexing is accomplished via QUIC, which replaces TCP. This further reduces loading time, as there is no head-of-line blocking even if some packets are lost.

Supporting HTTP/2 and HTTP/3 is far more important than winning TechEmpower HTTP/1.1 benchmarks. This just means you need to take any benchmark with a grain of salt and run benchmarks yourself in a scenario closer to reality: caching strategy, bandwidth, latency, CPU time, and memory are all relevant and not really tested here.

WebSocket Messages per second (Linux x64)

(image: WebSocket messages per second chart)

Socketify.py got almost 900k messages/s with PyPy3 and 860k with Python3, the same performance as Bun; falcon got 35k messages/s (56k messages/s with PyPy3), and Node.js managed 192k.

Bun is REALLY fast! But why is Bun's performance almost identical to socketify.py's? The answer is that it also uses uWebSockets as its base; actually, the same code from the PR I made to the C API is used in Bun with some tweaks, and it ends up with almost the same performance as socketify.py, showing that both have some DNA in common.
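For context, the server side of this kind of chat benchmark in socketify.py is a small pub/sub app, roughly like this (a sketch based on the project's examples, not the exact benchmark server):

```python
from socketify import App

app = App()

def ws_open(ws):
    # every client joins the same "chat" topic
    ws.subscribe("chat")

def ws_message(ws, message, opcode):
    # broadcast each incoming message to all subscribers
    ws.publish("chat", message, opcode)

app.ws("/*", {
    "open": ws_open,
    "message": ws_message,
})
app.any("/", lambda res, req: res.end("Nothing to see here!"))
app.listen(3000, lambda config: print(f"Listening on port {config.port}"))
app.run()
```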

I will take a deeper look and run more benchmarks in the future, but that's it for now.

This library is still very young, with less than 100 stars on GitHub, but it offers commercial support and is looking for sponsors. It's built on the shoulders of giants (very well tested, maintained, and supported giants), bringing HTTP, HTTP/2, and WebSockets support with experimental HTTP/3 on the roadmap, and it is already much more complete than japronto.

Runtime versions: PyPy3 7.3.9, Python 3.10.7, node v16.17.0, bun v0.2.2

Framework versions: gunicorn 20.1.0 + uvicorn 0.19.0, socketify alpha, gunicorn 20.1.0 + falcon 3.1.0, robyn 0.18.3

HTTP tested with oha -c 40 -z 5s http://localhost:8000/ (1 run for warmup and the average of 3 runs for testing)

WebSocket tested with the Bun.sh bench chat-client; check the source code in this link

Machine OS: Debian GNU/Linux bookworm/sid x86_64 Kernel: 6.0.0-2-amd64 CPU: Intel i7-7700HQ (8) @ 3.800GHz Memory: 32066MiB

Follow @cirospaciari on Twitter for more updates about socketify.py, and if you have some time check out the GitHub repo at https://github.com/cirospaciari/socketify.py. This project can be even better with some community love ;)
