DEV Community

Cover image for Python: meet ThreadPool
pO0q 🦄
pO0q 🦄

Posted on

Python: meet ThreadPool

As the name suggests, you can use ThreadPool to manage thread pools in Python.


I'll simplify various concepts here, as it's only an introduction.

Python Threads in short

A Python process can be considered as an instance of the Python program (~ main thread).

It usually executes specific instructions in one thread, but you can create more threads to execute some tasks concurrently.

The built-in ThreadPool class can ease the configuration while providing some good standards.

Of course, you may want to manage it manually, like starting and closing threads exactly when you need it, but it gets significantly harder when the number of tasks increases, and the class already optimized that operation.

Multiprocessing in short

With ThreadPool, you basically get "reusable threads" to execute tasks. The class abstracts the complexity:

  • you don't have to select a thread for your task
  • you don't have to start the thread manually
  • you don't have to wait for the task to complete
  • it supports both local and remote concurrency

There is so much more to say about multiprocessing, but, as a beginner, such built-in tool can be beneficial:

from multiprocessing.pool import ThreadPool

if __name__ == '__main__':
    results = ThreadPool(5).imap_unordered(myfunc, some_list)
    for result in results:
Enter fullscreen mode Exit fullscreen mode

Here, we define a pool of 5 tasks, and we apply myfunc to some_list. If you have a list of files or URLs to process, you may leverage the benefits of the pool to speed up the execution.

N.B.: pool.imap_unordered is a variant of pool.imap. It might be slightly faster in some cases.

The big caveat

Obviously, if you misuse it, ThreadPool can have unexpected effects, but it's designed to ease the implementation and prevent common mistakes.

In my experience, it should not be used for writing large files unless you process them in chunks in your handler (myfunc), which Python allows you to do quite easily.

Better implementations

Please refer to the documentation for better implementations of ThreadPool.

For example, you'll see that Python devs recommend using concurrent.futures.ThreadPoolExecutor instead of ThreadPool, because it's compatible with more libraries:

from concurrent.futures import ThreadPoolExecutor

if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers = 5) as executor:, some_list)
Enter fullscreen mode Exit fullscreen mode

Bottom line

Thread pools allows managing thread conveniently and efficiently.

The internal mapper is pretty handy to apply a function on each element in a list.

Top comments (0)