Hey folks,
lately I've been playing around a lot with artificial intelligence and Python. What I want to build now is a web interface for a Python-based ML model (Keras etc.). For full flexibility I want the app to a) take a stream of input data and b) return a stream of output data. Each chunk in the input stream would be run through the ML model and the corresponding output streamed back as a chunk.
However, I could not find any resources online that explain how to SEND a streaming response from a production-grade Python web framework. Does anyone have experience sending an HTTP streaming response in Python?
Currently I'm focusing on the combination of Falcon + Gunicorn to build the web app, as Flask doesn't seem to be a production-grade framework. Although my requirements aren't production-grade, I would love to figure out how to do this at that level.
/andreas
Top comments (6)
Hi Andreas,
Do you have special requirements?
With HTTP 1.1 you can stream data using chunked transfer encoding, which means you can send the header
Transfer-Encoding: chunked
and then the data you want. You can see an example on MDN. You can also stream over other protocols: HTTP/2, WebSockets, gRPC and so on.
So if you control both the client and the server you can choose how to stream your data.
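At the wire level, chunked framing is simple: each chunk is its length in hex, a CRLF, the bytes, another CRLF, and a zero-length chunk terminates the body. A quick sketch (the helper name is my own):

```python
def encode_chunk(data: bytes) -> bytes:
    # one chunk on the wire: hex length, CRLF, payload, CRLF
    return f"{len(data):x}\r\n".encode() + data + b"\r\n"

# a complete chunked body ends with a zero-length chunk
body = encode_chunk(b"Hello") + encode_chunk(b"World") + b"0\r\n\r\n"
```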
Falcon unfortunately doesn't support that header.
The actual reason is that WSGI itself (the interface underneath pretty much all Python web servers) does not support Transfer-Encoding: WSGI apps can be stacked as middleware, and streaming would break that pattern.
Flask does not support it either, but they worked around it using generators. Here's my example:
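A minimal sketch of that generator approach (route and chunk contents are illustrative):

```python
from flask import Flask, Response
import time

app = Flask(__name__)

def generate():
    # each yielded string becomes one piece of the response body;
    # Flask sends it as soon as it is yielded
    for i in range(5):
        yield f"chunk {i}\n"
        time.sleep(0.1)

@app.route("/stream")
def stream():
    # wrapping a generator in a Response makes Flask stream it
    return Response(generate(), mimetype="text/plain")
```

Run it with `flask run` and watch the body arrive piece by piece.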
and the client:
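A sketch of a client using `requests` (the endpoint URL is an assumption):

```python
import requests

def consume(url):
    # stream=True keeps requests from buffering the whole body;
    # iter_content yields pieces as the server flushes them
    with requests.get(url, stream=True) as resp:
        for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
            print(chunk, end="", flush=True)

# consume("http://localhost:5000/stream")  # needs a streaming endpoint running
```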
As you can see it's not using "Transfer-Encoding", just iterating on the generator and sending data.
Another option is aiohttp, which is not based on WSGI and works well with chunked streaming.
You can find an example in this article, though there's a bug on line 8 of his example; the rest works :-)
Replace:
interval = int(request.GET.get('interval', 1))
with
interval = int(request.query.get('interval', 1))
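For reference, a minimal aiohttp streaming handler with that fix applied looks roughly like this (route and chunk contents are illustrative, not the article's exact code):

```python
import asyncio
from aiohttp import web

async def stream_handler(request):
    # the fixed line: aiohttp exposes query parameters via request.query
    interval = int(request.query.get("interval", 1))
    # StreamResponse lets us write pieces as they are produced;
    # without a Content-Length, aiohttp uses Transfer-Encoding: chunked
    resp = web.StreamResponse(headers={"Content-Type": "text/plain"})
    await resp.prepare(request)
    for i in range(5):
        await resp.write(f"chunk {i}\n".encode())
        await asyncio.sleep(interval)
    await resp.write_eof()
    return resp

app = web.Application()
app.add_routes([web.get("/stream", stream_handler)])
# web.run_app(app)  # uncomment to serve on :8080
```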
The headers sent by the server:
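They look roughly like this (abridged):

```
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked
Server: Python/3.x aiohttp
```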
As you can see it supports chunked streaming.
Wow, thanks for this amazing response! I'll take a deeper look at aiohttp. But wrapping my code in a generator would also be a valid fallback.
Maybe I'm just too spoiled by the way Node.js handles streams, so I have a hard time understanding why things are so difficult in the Python world :)
If you're used to NodeJS I'm sure you'll fit in with aiohttp being async and all.
About the generator trick: I haven't tried it with Gunicorn and multiple processes. I feel like it's going to hurt performance, because each process might only be able to serve one request (the one generating the stream).
Can't wait to read your article on the solution ;)
It's gonna work well if you don't need to handle 10+ clients simultaneously
Hi Alex,
do you mean Flask's solution? If so, probably even fewer. If you mean aiohttp's, I'm curious to know whether you've tested it.
I've never used aiohttp so I don't know much.
Use SSE with aiohttp; you can use WebSockets or
Transfer-Encoding: chunked
as well. Don't use any framework on top of WSGI unless you are building a server for just one or a few concurrent clients.
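SSE needs no extra library on top of aiohttp: it's just a `text/event-stream` response where each event is a `data: ...` line followed by a blank line. A minimal sketch (route and event contents are my own assumptions):

```python
import asyncio
from aiohttp import web

def format_sse(data: str) -> bytes:
    # one SSE event: "data: ...\n" terminated by a blank line
    return f"data: {data}\n\n".encode()

async def sse_handler(request):
    resp = web.StreamResponse(headers={
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
    })
    await resp.prepare(request)
    for i in range(3):
        await resp.write(format_sse(f"event {i}"))
        await asyncio.sleep(1)
    await resp.write_eof()
    return resp

app = web.Application()
app.add_routes([web.get("/events", sse_handler)])
```

In the browser, `new EventSource("/events")` consumes this and handles reconnection for free.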