As the name suggests, you can use ThreadPool to manage thread pools in Python.
I'll simplify various concepts here, as it's only an introduction.
A Python process is a running instance of a Python program, and it starts with a single thread (the main thread).
It usually executes specific instructions in one thread, but you can create more threads to execute some tasks concurrently.
The built-in ThreadPool class can ease the configuration while providing sensible defaults.
Of course, you may want to manage threads manually, starting and stopping them exactly when you need to, but this gets significantly harder as the number of tasks increases, and the class already handles that for you.
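For comparison, here is a minimal sketch of what manual management can look like with the standard threading module. The square task and the run_manually helper are hypothetical names used only for illustration:

```python
import threading

def square(n, results, index):
    # Hypothetical task: store n squared at the given position.
    results[index] = n * n

def run_manually(numbers):
    results = [None] * len(numbers)
    threads = []
    for index, n in enumerate(numbers):
        # One new thread per task: created, started, and later joined by hand.
        t = threading.Thread(target=square, args=(n, results, index))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()  # wait for every thread before reading the results
    return results

if __name__ == '__main__':
    print(run_manually([1, 2, 3, 4]))
```

This works for a handful of tasks, but it creates one thread per item and leaves all the bookkeeping (starting, joining, collecting results) to you.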
With ThreadPool, you basically get "reusable threads" to execute tasks. The class abstracts the complexity:
- you don't have to select a thread for your task
- you don't have to start the thread manually
- you don't have to wait for the task to complete
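- it belongs to the multiprocessing package, which supports both local and remote concurrency (ThreadPool itself only runs threads inside the current process)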
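The "reusable threads" idea can be observed with a small sketch: many tasks go in, but only a few worker threads ever run them. The which_thread and worker_names helpers are hypothetical, and the short sleep only stands in for real I/O:

```python
import threading
import time
from multiprocessing.pool import ThreadPool

def which_thread(_):
    # Simulate a little I/O, then report which worker ran this task.
    time.sleep(0.01)
    return threading.current_thread().name

def worker_names(task_count, pool_size):
    # The pool picks a worker for each task; we never start a thread ourselves.
    with ThreadPool(pool_size) as pool:
        return set(pool.map(which_thread, range(task_count)))

if __name__ == '__main__':
    names = worker_names(task_count=20, pool_size=3)
    print(f"{len(names)} worker thread(s) handled 20 tasks")
```

The set of names never holds more entries than the pool size, since the same workers are reused from one task to the next.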
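There is much more to say about multiprocessing, but, as a beginner, you can benefit from such a built-in tool:

```python
from multiprocessing.pool import ThreadPool

# myfunc and some_list are placeholders: define your own handler and input list.
if __name__ == '__main__':
    with ThreadPool(5) as pool:
        # Results are yielded as tasks finish, in no particular order.
        for result in pool.imap_unordered(myfunc, some_list):
            print(result)
```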
Here, we define a pool of 5 worker threads, and we apply myfunc to each element of some_list. If you have a list of files or URLs to process, you may leverage the pool to speed up the execution, since the threads can wait on I/O at the same time.
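As a rough sketch of the URL case, the example below replaces the real network call with a short sleep so it runs anywhere. The fake_download and process_all names, and the example.com URLs, are assumptions made for the illustration:

```python
import time
from multiprocessing.pool import ThreadPool

def fake_download(url):
    # Stand-in for a network call: sleep instead of fetching anything.
    time.sleep(0.05)
    return url, len(url)

def process_all(urls, workers=5):
    with ThreadPool(workers) as pool:
        # Each (url, size) pair arrives as soon as its task finishes.
        return dict(pool.imap_unordered(fake_download, urls))

if __name__ == '__main__':
    urls = [f"https://example.com/page/{i}" for i in range(10)]
    start = time.perf_counter()
    sizes = process_all(urls)
    print(f"{len(sizes)} items in {time.perf_counter() - start:.2f}s")
```

Because the waits overlap, the total time should be well under the sum of the individual delays, though the exact figure depends on your machine.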
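pool.imap_unordered is a variant of pool.imap that yields each result as soon as its task finishes instead of preserving the input order. It might be slightly faster in some cases, because a slow task does not hold back the results that are already available.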
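The difference is easier to see side by side. In this sketch, slow_identity and compare are hypothetical helpers, and the delays are arbitrary values chosen so that tasks finish out of order:

```python
import time
from multiprocessing.pool import ThreadPool

def slow_identity(delay):
    # Sleep for the given delay, then return it.
    time.sleep(delay)
    return delay

def compare(delays, workers=3):
    with ThreadPool(workers) as pool:
        ordered = list(pool.imap(slow_identity, delays))
        unordered = list(pool.imap_unordered(slow_identity, delays))
    return ordered, unordered

if __name__ == '__main__':
    ordered, unordered = compare([0.06, 0.02, 0.04])
    print("imap:          ", ordered)    # always matches the input order
    print("imap_unordered:", unordered)  # order depends on completion time
```

Both calls return the same values; only the order in which you receive them can differ.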
Obviously, if you misuse it, ThreadPool can have unexpected effects, but it's designed to ease the implementation and prevent common mistakes.
In my experience, it should not be used for writing large files unless you process them in chunks in your handler (myfunc), which Python allows you to do quite easily.
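One possible shape for such a chunked handler is sketched below. The copy_in_chunks and copy_all names and the 64 KiB chunk size are assumptions, not recommendations from the documentation:

```python
import os
import tempfile
from multiprocessing.pool import ThreadPool

CHUNK_SIZE = 64 * 1024  # assumed chunk size, tune it for your workload

def copy_in_chunks(paths):
    src, dst = paths
    written = 0
    with open(src, 'rb') as fin, open(dst, 'wb') as fout:
        # Read and write one chunk at a time instead of loading the whole file.
        while chunk := fin.read(CHUNK_SIZE):
            fout.write(chunk)
            written += len(chunk)
    return dst, written

def copy_all(pairs, workers=5):
    with ThreadPool(workers) as pool:
        return list(pool.imap_unordered(copy_in_chunks, pairs))

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as folder:
        src = os.path.join(folder, 'in.bin')
        with open(src, 'wb') as f:
            f.write(os.urandom(200_000))
        print(copy_all([(src, os.path.join(folder, 'out.bin'))]))
```

Each worker only ever holds one chunk in memory, so the pool size, not the file size, bounds the memory used by the handlers.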
Please refer to the documentation for better implementations of ThreadPool.
For example, you'll see that Python devs recommend using concurrent.futures.ThreadPoolExecutor instead of ThreadPool, because it has a simpler interface and returns Future objects that are compatible with many other libraries:
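```python
from concurrent.futures import ThreadPoolExecutor

# myfunc and some_list are placeholders: define your own handler and input list.
if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=5) as executor:
        # map() returns an iterator over the results, in input order.
        results = executor.map(myfunc, some_list)
```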
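If you need the Future objects themselves, a sketch with submit and as_completed could look like this. The word_count task and the count_all helper are hypothetical names for the illustration:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def word_count(text):
    # Hypothetical task: count whitespace-separated words.
    return len(text.split())

def count_all(texts, workers=5):
    counts = {}
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # submit() returns a Future right away; map each one back to its input.
        futures = {executor.submit(word_count, t): t for t in texts}
        for future in as_completed(futures):
            # result() returns the task's value, or re-raises its exception.
            counts[futures[future]] = future.result()
    return counts

if __name__ == '__main__':
    print(count_all(["a quick test", "hello", ""]))
```

Keeping a dictionary from Future to input is a common way to know which task a result belongs to when results arrive in completion order.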
Thread pools allow you to manage threads conveniently and efficiently.
The built-in map methods are pretty handy for applying a function to each element in a list.