I’ve been a Python (2) dev for around 5 years now; I’m not here to bash Python. That being said, one of the language’s few failings is its poor concurrency/parallelism story. There have been attempts to fight the GIL before, like Twisted, eventlet, and the standard library’s threading module, but the results have been (IMO) overengineered, overcomplicated, and just not...nice.
Now there’s asyncio, which seems better, but still inferior to goroutines or Clojure’s core.async (which is basically goroutines), or the plethora of options presented by e.g. Rust or Haskell.
My question to the audience is: is asyncio worth it? That is:
1) if you had to start a project from scratch that you knew would involve concurrency and/or parallelism and had sufficient freedom of choice, would you choose Python 3.6 and asyncio?
2) if you had a preexisting Python project that you had to add concurrency/parallelism to, would you choose asyncio (over Twisted, eventlet, gevent, etc.)? Would it be worth porting a Python 2.x project to 3.6 for?
Top comments (5)
Let's define concurrency and parallelism first because it's a subtle difference and sometimes they get confused one for the other.
Concurrency means the program can run different tasks in overlapping time periods. A standard multitasker on a single core CPU is concurrent, because it cycles between different tasks (your programs) but those tasks are not technically running at the same time.
Parallelism means the ability to run completely separate tasks at the same instant, for example on different CPU cores.
A program can be concurrent but not parallel, parallel but not concurrent, both or neither if it's totally sequential.
There are many ways to achieve concurrency. Probably the best known is an event loop that runs until termination and on which you build your async program. Twisted, eventlet, and asyncio all use event loops (usually backed by system calls such as select, poll, epoll, or kqueue).
Multithreading can be concurrent or parallel. On a single core it will always be concurrent. On multiple cores it might be parallel, because parallelism means "running two separate tasks AT THE SAME TIME", which you might or might not achieve using multithreading, depending on how you structure your program.
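To make the event loop idea concrete, here is a minimal sketch using asyncio. The `worker` and `main` names and the delays are made up for illustration, and `asyncio.run` assumes Python 3.7 or later (on 3.6 you would drive the loop with `run_until_complete` instead). Both coroutines run in a single thread; they only overlap because each one hands control back to the loop while it waits.

```python
import asyncio


async def worker(name, delay, log):
    """Pretend to wait on I/O, recording when we start and finish."""
    log.append(f"{name} start")
    await asyncio.sleep(delay)  # yields control back to the event loop
    log.append(f"{name} done")
    return name


async def main():
    log = []
    # Both coroutines are scheduled on the same loop, in the same thread.
    results = await asyncio.gather(
        worker("slow", 0.2, log),
        worker("fast", 0.1, log),
    )
    return results, log


if __name__ == "__main__":
    results, log = asyncio.run(main())  # asyncio.run needs Python 3.7+
    print(results)  # gather returns results in argument order
    print(log)      # "fast" finishes before "slow" even though it started second
```

The log shows the tasks interleaving: that is concurrency without any parallelism.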
Let me repeat: given a unit of time concurrent means multiple tasks make progress, parallel means multiple tasks run at the same time.
Python's multithreading, because of the GIL, can never be parallel (at least for Python code running in CPython), but it is concurrent. That's why you can use ThreadPool to "advance your tasks in concurrency" but not to run those tasks at the same time on different CPU cores.
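A small sketch of what threads are still good for: blocking I/O. I'm using `concurrent.futures.ThreadPoolExecutor` here rather than `multiprocessing.pool.ThreadPool`, and `fake_download` is a hypothetical stand-in for a network call. It works because `time.sleep`, like most blocking I/O, releases the GIL while waiting, so the waits overlap even though only one thread executes Python code at a time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item):
    """Stand-in for a blocking network call; time.sleep releases the GIL."""
    time.sleep(0.1)
    return item, threading.current_thread().name


def run_in_threads(items, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, not completion order.
        return list(pool.map(fake_download, items))


if __name__ == "__main__":
    started = time.perf_counter()
    for item, thread_name in run_in_threads(range(4)):
        print(item, thread_name)
    # Expect roughly one sleep's worth of time, not four.
    print(f"elapsed: {time.perf_counter() - started:.2f}s")
```

Swap the sleep for a pure-Python CPU loop and the speedup should disappear, which is the GIL limit described above.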
A way to use all the cores in Python is to use multiple processes. Having more than one process (the interpreter with your piece of code) running on multiple cores can help you achieve parallelism in Python.
Processes in Python have a limit though: you are effectively running multiple interpreters for the duration of your parallel tasks, so you have to be careful about how many you start (benchmarking helps) and how much memory each process occupies.
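Here is a rough sketch of the multi-process approach for CPU-bound work, using `concurrent.futures.ProcessPoolExecutor` from the standard library. `count_primes` is a toy task I made up for the example, and the limits are arbitrary. Each worker is a separate interpreter with its own GIL, so the tasks can really run on different cores.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound toy task: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, workers=None):
    # workers=None lets the executor pick a default based on the CPU count.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":  # the guard matters where workers are spawned, e.g. Windows
    print("cores:", os.cpu_count())
    print(count_primes_parallel([10_000, 20_000, 30_000]))
```

Arguments and results are pickled and copied between processes, which is part of the memory and startup cost mentioned above, so this only pays off when each task does a fair amount of work.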
You can also obviously combine multiple processes and threads.
My head is already hurting and we haven't even talked about parallel programming with multiple networked machines, synchronisation between different tasks, or other concurrency models (like Go's or Erlang's).
So, to answer your questions: it depends on what you have to do :-)
asyncio (or uvloop) is a perfectly valid choice if you want to build a high-performance concurrent application. I wouldn't introduce Twisted in an existing application. I know there's an asyncio clone for Python 2, trollius, but I have no idea how stable it is.
Keep in mind you need async libraries for IO: database access, HTTP calls and stuff, everything needs to support asyncio or be adapted to it.
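When a library has no async version, one common way to "adapt" it is to push the blocking call onto a thread with `loop.run_in_executor`. This is only a sketch: `blocking_query` is a hypothetical stand-in for a synchronous database driver or HTTP client, and `asyncio.get_running_loop` and `asyncio.run` assume Python 3.7 or later.

```python
import asyncio
import time


def blocking_query(user_id):
    """Hypothetical stand-in for a synchronous driver or HTTP client call."""
    time.sleep(0.1)
    return {"id": user_id}


async def fetch_user(user_id):
    loop = asyncio.get_running_loop()  # Python 3.7+
    # Passing None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, blocking_query, user_id)


async def main():
    # The three blocking calls run in worker threads while the loop stays free.
    return await asyncio.gather(*(fetch_user(i) for i in range(3)))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

This keeps the event loop responsive, but you are back to using threads under the hood, so a native async library is usually the better option when one exists.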
Concurrency and parallelism always come with a price :-)
I am a big fan of Python, and I would pick Node. I know the objections about callback hell, but I love its non-blocking philosophy of handling concurrency.
I don't know either of the two languages below, but if I had space for experiments, I would pick Rust or Golang.
I am currently working on a realtime Django channels project which needs to fetch data from a REST API and an AMQP server. Fetched data needs to be processed very quickly and pushed to web clients. There is also Redis involved. This was my first time using asyncio, but I am quite happy with how the code is shaping up and with its performance. Your mind is free from dealing with race conditions or moving data between threads. You need to adapt your thinking to the concept of coroutines and the event loop, but it happens quickly (i.e. an easy learning curve), and from what you said in your post I guess it will be no problem for you either. I guess the other languages you mentioned already offer this without the need to fight the GIL, but to me asyncio is a good enough abstraction to have the same in Python.
One downside is that once you are in the async world you need compatible libraries to get the most out of it. Luckily for me, aiohttp and aioredis are already available and stable. For AMQP I chose kombu, which is not yet asyncio compatible (but the next major version will be), but Django channels provides something called background workers, and integrating kombu wasn't a problem.
More specific answers to your questions:
1) I am not familiar with Rust or Clojure yet but I'll definitely choose Python and asyncio over Java or C++ threads if I need concurrency mainly for dealing with network bound slow operations. Pros of Python will beat whatever cons asyncio has for me.
2) I guess this depends mostly on the complexity of the project and what dependencies you need. If it includes, for instance, an ORM library which is not async, it can be a pain to adapt it to asyncio. Before starting this project I checked Twisted and eventlet and found them too complex for my needs. Asyncio offered a much simpler way of getting concurrency, and the results so far are good, as I said.
Concurrency is really hard in Python, not parallelism. (IMO)
I'd use multiprocessing, greenlets, or Stackless Python. Jython or IronPython could also help.