
How to make Python code concurrent with 3 lines

rhymes ・3 min read

I was inspired by @rpalo's quest to uncover gems in Python's standard library.

I decided to share one of my favorite tricks in Python's standard library through an example. The entire code runs on Python 3.2+ without external packages.

The initial problem

Let's say you have a thousand URLs to process/download/examine, so you need to issue that many HTTP GET requests and retrieve the body of each response.

This is a way to do it:

import http.client
import socket

def get_it(url):
    try:
        # always set a timeout when you connect to an external server
        connection = http.client.HTTPSConnection(url, timeout=2)

        connection.request("GET", "/")

        response = connection.getresponse()

        return response.read()
    except socket.timeout:
        # in a real world scenario you would probably do stuff if the
        # socket goes into timeout
        pass

urls = [
    "www.google.com",
    "www.youtube.com",
    "www.wikipedia.org",
    "www.reddit.com",
    "www.httpbin.org"
] * 200

for url in urls:
    get_it(url)

(I wouldn't use the standard library as an HTTP client in real code, but for the purposes of this post it's okay.)
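If you're curious, here's a rough sketch of what that might look like with the third-party requests package. This is just an illustration (it isn't used for any of the measurements in this post), and note that requests wants a full URL, while HTTPSConnection above takes a bare host:

import requests

def get_it_requests(url):
    try:
        # requests needs a scheme, so prepend https:// to the bare host
        response = requests.get("https://" + url, timeout=2)
        return response.text
    except requests.exceptions.RequestException:
        # covers timeouts, connection errors and friends
        pass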

As you can see there's no magic here. Python iterates over the 1,000 URLs and fetches each one, one at a time.

On my computer this script occupies about 2% of the CPU and spends most of its time waiting for I/O:

$ time python io_bound_serial.py
20.67s user 5.37s system 855.03s real 24292kB mem

It runs for roughly 14 minutes. We can do better.

Show me the trick!

from concurrent.futures import ThreadPoolExecutor as PoolExecutor
import http.client
import socket

def get_it(url):
    try:
        # always set a timeout when you connect to an external server
        connection = http.client.HTTPSConnection(url, timeout=2)

        connection.request("GET", "/")

        response = connection.getresponse()

        return response.read()
    except socket.timeout:
        # in a real world scenario you would probably do stuff if the
        # socket goes into timeout
        pass

urls = [
    "www.google.com",
    "www.youtube.com",
    "www.wikipedia.org",
    "www.reddit.com",
    "www.httpbin.org"
] * 200

with PoolExecutor(max_workers=4) as executor:
    for _ in executor.map(get_it, urls):
        pass

Let's see what changed:

# import a new API to create a thread pool
from concurrent.futures import ThreadPoolExecutor as PoolExecutor

# create a thread pool of 4 threads
with PoolExecutor(max_workers=4) as executor:

    # distribute the 1000 URLs among 4 threads in the pool
    # _ is the body of each page that I'm ignoring right now
    for _ in executor.map(get_it, urls):
        pass

So, with 3 lines of code, we turned a slow serial task into a concurrent one that takes a little under 5 minutes:

$ time python io_bound_threads.py
21.40s user 6.10s system 294.07s real 31784kB mem

We went from 855.03s to 294.07s, a 2.9x speedup!
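A side note: executor.map yields results in the same order as the input, and an exception raised inside get_it is re-raised when you consume the corresponding result. So if you want to keep the bodies instead of throwing them away, a minimal sketch:

with PoolExecutor(max_workers=4) as executor:
    # results come back in input order; worker exceptions surface
    # here as the iterator is consumed
    bodies = list(executor.map(get_it, urls))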

Wait, there's more

The great thing about this new API is that you can substitute

from concurrent.futures import ThreadPoolExecutor as PoolExecutor

with

from concurrent.futures import ProcessPoolExecutor as PoolExecutor

to tell Python to use processes instead of threads. Out of curiosity, let's see what happens to the running time:

$ time python io_bound_processes.py
22.19s user 6.03s system 270.28s real 23324kB mem

Roughly 24 seconds less than the threaded version; not much of a difference. Keep in mind that these are unscientific experiments and I'm using the computer while these scripts run.
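One caveat if you go the process route: ProcessPoolExecutor moves the function and its arguments between processes by pickling them, so get_it must be a module-level function, and on platforms that spawn worker processes instead of forking them (Windows, and macOS on newer Pythons) the pool has to be created under a main guard, roughly like this:

if __name__ == "__main__":
    # without the guard, each spawned child re-imports the module
    # and would try to build its own pool
    with PoolExecutor(max_workers=4) as executor:
        for _ in executor.map(get_it, urls):
            pass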

Bonus content

My computer has 4 cores; let's see what happens to the threaded version as we increase the number of worker threads:

# 6 threads
20.48s user 5.19s system 155.92s real 35876kB mem
# 8 threads
23.48s user 5.55s system 178.29s real 40472kB mem
# 16 threads
23.77s user 5.44s system 119.69s real 58928kB mem
# 32 threads
21.88s user 4.81s system 119.26s real 96136kB mem

Three things to notice: RAM usage obviously increases, we hit a wall around 16 threads, and at 16 threads we're more than 7x faster than the serial version.

If you don't recognize time's output, it's because I've aliased it like this:

time='gtime -f '\''%Us user %Ss system %es real %MkB mem -- %C'\'

where gtime comes from brew install gnu-time.

Conclusions

I think ThreadPoolExecutor and ProcessPoolExecutor are super cool additions to Python's standard library. You could have done most of what they do with the older threading and multiprocessing modules plus FIFO queues, but this API is so much nicer.
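For comparison, here's a rough sketch of what the example might look like the older way, with threading plus a FIFO queue; a sketch only, but it shows how much plumbing the executor hides:

import queue
import threading

def worker(q):
    while True:
        url = q.get()
        if url is None:  # sentinel: this worker is done
            break
        get_it(url)

q = queue.Queue()
for url in urls:
    q.put(url)
for _ in range(4):
    q.put(None)  # one sentinel per worker

threads = [threading.Thread(target=worker, args=(q,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()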

Posted on Nov 23 '18 by rhymes (@rhymes), software developer @ DEV

Discussion

 

Great article. I was not aware of concurrent.futures in standard library.

I use gevent in Python 2 for lightweight threads. It uses green threads under the hood. The API is very simple to use. If you have a lot of I/O bound tasks, e.g. downloading over 100 files / making a lot of requests concurrently, this library is very useful.
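For instance, the pattern might look roughly like this (a sketch only; fetch is a placeholder download function, and the monkey-patching is what makes blocking stdlib calls cooperative):

from gevent import monkey
monkey.patch_all()  # patch the stdlib so blocking I/O yields to other greenlets

import urllib2  # Python 2, as in this comment

import gevent

def fetch(url):
    # placeholder: any blocking I/O works once the stdlib is patched
    return urllib2.urlopen("https://" + url, timeout=2).read()

jobs = [gevent.spawn(fetch, url) for url in urls]
gevent.joinall(jobs)  # wait for all greenlets to finish
bodies = [job.value for job in jobs]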

 

Great article. I was not aware of concurrent.futures in standard library.

I don't think you're the only one; the various "What's New in Python" pages contain a lot of gems over the years ;)

I use gevent in Python 2 for light-weight threads. It uses green threads under the hood. The API is very simple to use. If you have a lot of IO bound tasks, e.g downloading over 100 files / making a lot of requests concurrently, this library is very useful.

Yeah, gevent is nice, though I've never been a huge fan of cooperative multitasking as a concurrency paradigm. It's true, as you say, that it has its uses for I/O bound tasks.

The objective of the post was to uncover a hidden gem in the standard library. Regarding async I/O, you might want to take a look at asyncio in Python 3's standard library and uvloop outside it!

 

uvloop is excellent! I have used it for Python 3 based web-services.

 

I agree, have a look at asyncio; it seems to be the future. Better to say, it's already the present!
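For reference, the smallest asyncio take on this post's example might look something like this (a sketch for Python 3.7+; it still runs the blocking get_it on threads under the hood, whereas a native async client such as aiohttp would avoid threads entirely):

import asyncio

async def main():
    loop = asyncio.get_running_loop()
    # off-load the blocking calls to the default thread pool
    tasks = [loop.run_in_executor(None, get_it, url) for url in urls]
    return await asyncio.gather(*tasks)

bodies = asyncio.run(main())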

 

map is fantastic, but I like being able to see the progress of my tasks with tqdm; for that I use the following pattern:

from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm

output = []
with ThreadPoolExecutor(num_threads) as ex:
    # submit (index, result) pairs so the input order can be restored later
    futures = [ex.submit(lambda ii, query: (ii, query_func(query)), ii, query)
               for ii, query in enumerate(queries)]
    # as_completed yields futures as they finish, so tqdm can show progress
    for f in tqdm(as_completed(futures), total=len(futures)):
        output.append(f.result())

# sort by index, then keep only the results
output = [s[1] for s in sorted(output, key=lambda x: x[0])]
 

Yeah, there's a lot of neat stuff in that module. I chose map because it's probably the quickest and dirtiest way to turn a sequential workload into a parallel one.

I have a program in production that does a mass upload of images to S3 using futures and as_completed.

Didn't know about tqdm though, thanks for the tip!

 
 

Awesome! I haven’t done much async/concurrent Python and I want to learn more about it. Thanks for sharing :)

P.S. I’m curious to see if series can span multiple users or not. What happens if you add ‘series: The Python Standard Library’ to your front matter? Does it link it to my post? Or does it start your own series?

 

P.S. I’m curious to see if series can span multiple users or not. What happens if you add ‘series: The Python Standard Library’ to your front matter? Does it link it to my post? Or does it start your own series?

I just checked the source code, series are linked to a single user :-)
github.com/thepracticaldev/dev.to/...

 

Probably for the best. Wouldn't want just anybody hijacking your series.

 

Just to add three things to your post:

  1. The API has been around since Python 3.2
  2. There's a backport for those still using 2.7.
  3. Concurrency is not parallelism
 

The API has been around since Python 3.2

This I said in the intro :D

There's a backport for those still using 2.7.

This I didn't know; I don't pay attention to what's going on in Python 2 anymore :D
Thanks for the info!

Concurrency is not parallelism

That is true, though in this context it's yes and no, and it depends on the number of cores.

Python has a GIL, which simplifies its C API by ensuring that at most one thread at a time is executing Python bytecode.

Python, though, has a 1:1 threading model, where each Python thread is mapped onto an OS thread.

If the code is I/O bound, Python effectively releases the GIL, so the threads waiting for I/O run free of it.

How many threads are running at the same time? One per core. I have 4 cores, so there are effectively 4 units of independent code running in parallel. The context switching (concurrency) does happen, but there are up to 4 different threads running in parallel.

I hope this clears it.

As the famous Go slides you linked state: concurrency is about dealing with lots of things at once (and you can be concurrent with only one thread and one process, see Node for example); parallelism is about doing multiple things at once.

In this case we have concurrent tasks that can be run in parallel if you have more than one core.

In Python, the only time multiple threads achieve parallelism is when they are waiting for I/O with multiple cores available (and that's exactly why I chose this example :D). If you want CPU bound parallelism in Python you have to use processes.
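To make that last point concrete, here's a toy sketch (burn is a made-up CPU-bound function, not from the post): under the GIL the thread pool runs it roughly serially, while the process pool scales across cores:

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def burn(n):
    # pure bytecode: this holds the GIL the whole time
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    work = [10000000] * 8
    with ThreadPoolExecutor(max_workers=4) as ex:
        list(ex.map(burn, work))  # roughly serial speed
    with ProcessPoolExecutor(max_workers=4) as ex:
        list(ex.map(burn, work))  # scales with the number of cores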

 

How many threads are running at the same time? One per core. I have 4 cores, so there are effectively 4 units of independent code running in parallel.

Having 4 cores is irrelevant, as Python threads can only utilize one logical core due to the GIL. The only possible way to release the GIL is using the C API, but it's not advisable.

Python's I/O primitives are written with the C API.

They release the GIL when they are waiting for I/O, so while you're waiting for I/O, your program is, for a little while, technically parallel.

A few moments of pure parallelism ;-)

So no, having multiple cores is not completely irrelevant.

 
 

2018: Python engineers have heard about concurrency :)

AFAIU, these executors were added in 3.2, and they are unfortunately not a solution for real life. Like Ruby, Python is stuck being single-threaded not because the standard library lacks wrappers for system threads/processes, but because neither threads nor processes are robust enough to deal with real tasks, which usually require some action upon completion.

Synchronization, yielding, and basic map-reduce become a nightmare. To deal with concurrency, a virtual machine is unfortunately required, and there are Java, Erlang, and to some extent Golang to deal with the issue.

 

2018: Python engineers have heard about concurrency :)

So if my next post about some lesser known part of the standard library (or whatever I want to talk about) is about a string method, are you going to start your comment with "2018: Python engineers have heard about strings"?

AFAIU, these executors were added in 3.2,

If you re-read the intro you'll notice I said the code works from Python 3.2 onwards; I'm well aware of that.

and they are unfortunately not a solution for real life

So, "not a solution for real life", even though people (me included) have been using the underlying APIs these executors rely on in production for years.

Like Ruby, Python is stuck being single-threaded not because the standard library lacks wrappers for system threads/processes, but because neither threads nor processes are robust enough to deal with real tasks, which usually require some action upon completion.

So now you're saying that everyone who has Ruby and Python code in production using multiple threads or processes, and doing fine, doesn't exist, because you decided threads and processes are not robust enough to deal with "real tasks"?

To deal with concurrency, a virtual machine is unfortunately required, and there are Java, Erlang, and to some extent Golang to deal with the issue.

Please tell me in which part of my post I made a comparison with other languages. The whole post is about a functionality that has been in the standard library for years and that some people might not know about.

 

I don’t know what you call production-ready, I am talking about 100K+ concurrent processes.

I am aware of the existence of people who think 1K processes is a concurrency. They make me laugh.

Laughter is always good, it releases endorphins.

@rhymes, please don't pay too much attention when people with Russian names criticize your work. A high level of criticism is a heritage of the Soviet model of education.
We suffer from self-criticism, too :)

On the other hand, sometimes Westerners are too supportive, and will not tell you the bitter truth, trying not to hurt anybody.

As for the question, I saw Python code that reaches 100k requests per minute per core:
pawelmhm.github.io/asyncio/python/...

But far before those numbers, in a real app, you will reach the limits of the DB, or the disk, or whatever other part of the code, so...

[Comment marked as low quality/non-constructive by the community]

Now my name is not western enough. I wonder what will be next in this ad-hominem hell on this “welcoming” site.

We suffer from self-criticism

Please do not speak for the internets, speak for yourself.

I saw Python code

You seem to be fine at googling titles; unfortunately, fair arguing implies reading those texts as well. The link you posted contains “way is just adding some synchronization in your client limiting number of concurrent requests it can process. I’m going to do this by adding asyncio.Semaphore() with max tasks of 1000”. Guess what?

in a real app, you will reach [...]

Please tell that to Hadoop engineers, or to the WhatsApp team (or any other telecom). There are tons of examples, and none is made with Python. Twitter escaped Ruby as soon as possible for exactly the same reason.

To be welcomed, you shouldn't start a comment on a Python article with something like “Python sucks, use Java.”

Everybody here knows the good and bad sides of Python, and sure, it is not a 1kk RPS per core language, thank you, K.O.

As for company examples, remember EVE Online, running tens of thousands of players in one world using Python 2.7.

To be welcomed, you shouldn't

I never said I want to be welcomed and/or appreciated.

start a comment on a Python article with something like “Python sucks, use Java.”

Please stop putting words in my mouth. I never said that either. What I said was “don’t use Python for concurrency,” and this is indeed good advice.

Wow, these guys are rude. As a Python developer (and telecom employee!), I'm with you. Don't use Python threading or multiprocessing too much if you can avoid it. We definitely prefer a queue/worker model for our async code.

That being said, multiprocessing isn't terrible, but you are running a full instance of the Python interpreter for each process, so it's less than friendly on memory and can introduce leaks, especially with file descriptors.