DEV Community

Yosuke Hanaoka


Accelerate Python Programs with Concurrent Programming

Introduction

Concurrency is one of the approaches that can drastically improve the performance of Python programs, and Python offers numerous methods and modules to achieve it. In this blog post, I would like to summarize my understanding and share the results of my attempts to speed up Python programs using the following three basic libraries.

  • asyncio: Coroutine-Based Concurrency
  • threading: Thread-Based Concurrency
  • multiprocessing: Process-Based Concurrency

What is Concurrency?

Before we get to the main topic of concurrent programming, let me clarify what I mean by the word "concurrent". There are a variety of slightly different interpretations of "concurrent" on the Internet, but after reading explanations by various people, I understand it as follows, and I use the term with this meaning throughout this blog post.

Concurrency is the ability to manage multiple tasks at the same time. The tasks run in overlapping time periods but not necessarily simultaneously; for example, a single core can interleave them by switching back and forth between tasks.

Parallelism is a subset of concurrency: the ability to execute multiple tasks simultaneously, for example on multiple CPU cores. Here the tasks actually run at the same instant, and the purpose of parallelism is to increase computational performance.

[Figure: Concurrency]

What limits the speed of our Python program?

To reduce the overall execution time of a program, it is important to understand what currently limits that execution time, because the cause determines which approach will be effective. In general, there are two main types of causes that limit program execution time: 1) CPU-bound and 2) I/O-bound.

CPU-bound refers to a situation where the execution time of a program depends on the computation speed of the CPU. For example, suppose a program performs large-scale scientific calculations. If the program does not perform input/output (I/O) to or from the disk but still takes a considerable amount of time to complete, its processing speed depends on the CPU's computation speed. Other examples include compression/decompression, encryption, and image conversion. The simplest solution is to use a CPU with faster (higher-frequency) cores. Another solution is to rewrite the program so that it splits the work across multiple cores and processes it in parallel instead of on a single core, thereby reducing the overall execution time.

I/O-bound refers to a situation where the execution time of a program depends on the speed of I/O. For example, suppose a program searches for a given document in a large amount of data stored on disk. If a faster disk makes the search finish sooner, then the program's speed depends on the speed of I/O, not the CPU. Other examples include Web API calls and other operations dominated by network latency. While the program waits for I/O to complete, the CPU has nothing to do for it, so that time is CPU idle time. By running other processing during this waiting time, the overall execution time of the program can be shortened.
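One rough way to tell which kind of bottleneck a function has is to compare its wall-clock time with the CPU time it actually consumes. The sketch below uses the standard-library `time.perf_counter()` and `time.process_time()`; `cpu_bound_task()`, `io_bound_task()`, and `measure()` are hypothetical helpers, and `time.sleep()` stands in for real disk or network I/O.

```python
import time


def cpu_bound_task(n: int) -> int:
    """Pure computation: the CPU is busy for the whole call."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def io_bound_task(seconds: float) -> None:
    """Simulated I/O: time.sleep() stands in for waiting on a disk or network."""
    time.sleep(seconds)


def measure(func, *args) -> tuple[float, float]:
    """Return (wall-clock seconds, CPU seconds) spent running func(*args)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


if __name__ == "__main__":
    wall, cpu = measure(cpu_bound_task, 5_000_000)
    # For CPU-bound work, CPU time is typically close to wall-clock time.
    print(f"CPU-bound: wall={wall:.2f}s cpu={cpu:.2f}s")
    wall, cpu = measure(io_bound_task, 1.0)
    # For I/O-bound work, CPU time is close to zero: the process is just waiting.
    print(f"I/O-bound: wall={wall:.2f}s cpu={cpu:.2f}s")
```

A large gap between the two numbers suggests there is idle time that concurrency could fill; no gap suggests the CPU itself is the limit.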

Basic Types of Concurrent Programming in Python

asyncio: Coroutine-Based Concurrency

A coroutine is a unit of execution that can be suspended and resumed. A single thread may execute many coroutines in an event loop. With threads and processes, the operating system decides when a thread or process is suspended and when it is resumed; coroutines, in contrast, decide for themselves when to suspend (yield control), and the event loop resumes them later. In Python, coroutines are defined and used via the async/await syntax.

asyncio is a library for writing single-threaded asynchronous I/O code using the async/await syntax. Asynchronous I/O means, simply put, "not waiting for one I/O operation to finish before starting another." asyncio is appropriate for I/O-bound workloads.
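A minimal asyncio sketch, assuming `asyncio.sleep()` as a stand-in for a real non-blocking I/O call; `fetch()` and `main()` are hypothetical names. Three coroutines share one thread, and each yields control at `await`, so their waits overlap.

```python
import asyncio
import time


async def fetch(name: str, delay: float) -> str:
    # asyncio.sleep() stands in for a non-blocking I/O call (e.g. an HTTP request).
    # While this coroutine is suspended at `await`, the event loop runs the others.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list[str]:
    # Schedule three coroutines on one event loop (one thread) and wait for all.
    return await asyncio.gather(
        fetch("task-1", 1.0),
        fetch("task-2", 1.0),
        fetch("task-3", 1.0),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(main()))
    # The three 1-second waits overlap, so this is roughly 1 s rather than 3 s.
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```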

threading: Thread-Based Concurrency

A thread is the smallest unit of CPU utilization within a process. Basically, a CPU core executes only one thread at any given instant. Every process contains at least one thread, called the main thread (MainThread). Any additional threads we create within the process belong to that process.

threading is a library for launching multiple threads in the same process and writing multithreaded concurrent code. Because of the Global Interpreter Lock (GIL), no matter how many threads are created in a process, only one thread at a time executes Python bytecode, while the other threads wait to acquire the lock. A thread that is blocked on I/O releases the GIL, so other threads can run during the wait; this is why threading is appropriate for I/O-bound workloads.
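A minimal threading sketch for I/O-bound work, again using `time.sleep()` in place of real blocking I/O (which, like `time.sleep()`, releases the GIL while waiting); `blocking_io()` and `run_threads()` are hypothetical helpers.

```python
import threading
import time


def blocking_io(name: str, delay: float, results: dict) -> None:
    # time.sleep() releases the GIL, as real blocking I/O calls do,
    # so the other threads can run while this one waits.
    time.sleep(delay)
    results[name] = "done"


def run_threads(delay: float = 1.0) -> dict:
    results: dict = {}
    threads = [
        threading.Thread(target=blocking_io, args=(f"thread-{i}", delay, results))
        for i in range(3)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread to finish
    return results


if __name__ == "__main__":
    start = time.perf_counter()
    print(run_threads(1.0))
    # The three 1-second sleeps overlap, so this is roughly 1 s rather than 3 s.
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```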

multiprocessing: Process-Based Concurrency

A process is a program currently being executed by the OS. For example, when you run a Python script, the OS starts a Python interpreter process; the interpreter compiles the source code into bytecode and then executes that bytecode as described in the source code. This running instance of the program is a process.

multiprocessing is a library for launching multiple processes and writing parallel processing code. Since each process has its own interpreter and its own GIL, threads in different processes can execute bytecode in parallel on different cores. multiprocessing is appropriate for CPU-bound workloads.
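A minimal multiprocessing sketch for CPU-bound work, assuming a pure-Python loop as the workload; `cpu_heavy()` and the job sizes are hypothetical. `Pool.map()` distributes the jobs across worker processes. The `if __name__ == "__main__":` guard is required on platforms that start workers with the spawn method (the default on Windows and macOS).

```python
import time
from multiprocessing import Pool


def cpu_heavy(n: int) -> int:
    # Pure-Python loop: holds the GIL for as long as it runs.
    total = 0
    for i in range(n):
        total += i % 7
    return total


if __name__ == "__main__":
    jobs = [2_000_000] * 4

    start = time.perf_counter()
    sequential = [cpu_heavy(n) for n in jobs]
    print(f"sequential:      {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    # Each job runs in a separate worker process with its own interpreter and GIL,
    # so on a machine with 4 or more cores the jobs can execute in parallel.
    with Pool(processes=4) as pool:
        parallel = pool.map(cpu_heavy, jobs)
    print(f"multiprocessing: {time.perf_counter() - start:.2f}s")

    assert parallel == sequential  # same results, computed in parallel
```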

Comparison Table

Library          Threads        Processes  Cores  Concurrency type
asyncio          1              1          1      Concurrency
threading        N              1          1      Concurrency
multiprocessing  1 per process  N          N      Parallelism

The impact of GIL on multi-thread programming

Python's Global Interpreter Lock (GIL) is, simply put, a mutex (lock) that allows only the thread holding it to execute bytecode, even when a process has multiple threads; the other threads are kept waiting. In other words, multiple threads cannot execute Python bytecode simultaneously within a single process, so Python multithreading cannot achieve parallelism for Python code.

CPython uses the GIL to keep its memory management (such as reference counting) thread-safe. On the other hand, the GIL can become a performance bottleneck in CPU-bound, multithreaded code.

[Figure: GIL1]

[Figure: GIL2]
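To see the GIL's effect directly, the following sketch runs the same pure-Python countdown loop first sequentially and then in separate threads; `count_down()`, `run_sequential()`, `run_threaded()`, and the loop sizes are hypothetical. On a standard CPython build with the GIL, the threaded version is typically no faster than the sequential one, and sometimes slightly slower because of lock contention.

```python
import threading
import time


def count_down(n: int) -> None:
    # A pure-Python CPU-bound loop; a thread must hold the GIL to run it.
    while n > 0:
        n -= 1


def run_sequential(n: int, workers: int) -> float:
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n: int, workers: int) -> float:
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    N, WORKERS = 2_000_000, 4
    print(f"sequential: {run_sequential(N, WORKERS):.2f}s")
    # Only one thread executes bytecode at a time, so there is no parallel speedup.
    print(f"threaded:   {run_threaded(N, WORKERS):.2f}s")
```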

Code Example

Sorry for the lengthy lead-in. Finally, I’ve reached the main part. For testing each library, I used the following code. I defined a function that performs a CPU-bound step, waits for I/O, and then performs another CPU-bound step, in that sequence. The CPU-bound steps simply execute a large number of loop iterations to generate load. worker() is used for the synchronous, threading, and multiprocessing runs, and worker_async() is used for asyncio.

Then, I used the asyncio, threading, and multiprocessing libraries to run worker() and worker_async(). To measure processing time, logs are output before and after each CPU-bound and I/O-bound step. Logs are also output at 25%, 50%, and 75% progress of each CPU-bound loop. For readability, however, log output is omitted from the following code. The full version of the code can be found in the GitHub Gist concurrent_test.py.

Realistic I/O-bound processing should be verified with libraries such as the following, but this time I simplify that part by substituting time.sleep() for the actual I/O.

  • File I/O -> built-in modules or aiofiles
  • Network I/O (HTTP) -> requests or aiohttp
  • Database I/O -> sqlite3 or aiosqlite
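The gist itself is not reproduced here. As a rough, hypothetical reconstruction based only on the description above, the following sketch defines worker() and worker_async() with the CPU-bound -> I/O wait -> CPU-bound pattern and runs three workers with each approach. `CPU_LOAD`'s value, `IO_WAIT`, `NUM_WORKERS`, `cpu_bound()`, and the `run_*` helpers are my own assumptions, not the code or settings from concurrent_test.py. `IO_WAIT` is 1 second here (rather than the 2 seconds in the results) only to keep the run short, and the progress logging is left out.

```python
import asyncio
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical settings, not the values from concurrent_test.py.
CPU_LOAD = 5 * 10**6  # loop iterations per CPU-bound step
IO_WAIT = 1           # seconds of simulated I/O wait per worker
NUM_WORKERS = 3
NAMES = [f"w{i}" for i in range(NUM_WORKERS)]


def cpu_bound() -> None:
    # Busy loop that only consumes CPU time.
    for _ in range(CPU_LOAD):
        pass


def worker(name: str) -> str:
    # Used for the synchronous, threading, and multiprocessing runs.
    cpu_bound()
    time.sleep(IO_WAIT)  # blocking stand-in for real I/O
    cpu_bound()
    return name


async def worker_async(name: str) -> str:
    # Used for the asyncio run. cpu_bound() still blocks the event loop,
    # but the await hands control to another coroutine during the I/O wait.
    cpu_bound()
    await asyncio.sleep(IO_WAIT)
    cpu_bound()
    return name


def run_sync() -> list:
    return [worker(n) for n in NAMES]


def run_threading() -> list:
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        return list(pool.map(worker, NAMES))


def run_multiprocessing() -> list:
    with ProcessPoolExecutor(max_workers=NUM_WORKERS) as pool:
        return list(pool.map(worker, NAMES))


async def _gather_workers() -> list:
    return await asyncio.gather(*(worker_async(n) for n in NAMES))


def run_asyncio() -> list:
    return asyncio.run(_gather_workers())


if __name__ == "__main__":
    runners = [
        ("synchronous", run_sync),
        ("threading", run_threading),
        ("asyncio", run_asyncio),
        ("multiprocessing", run_multiprocessing),
    ]
    for label, run in runners:
        start = time.perf_counter()
        run()
        print(f"{label:<16}{time.perf_counter() - start:6.2f}s")
```

Absolute timings from a sketch like this will depend on the machine and the Python version.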

Code Example Execution Result

Below is a summary of the results of running the code example. The actual output log can be found in the GitHub Gist console.log.

[Figure: Result]

For programs that include I/O waits, asyncio executes the next CPU-bound step during each I/O wait, so processing continues without leaving the CPU idle. Compared to the synchronous version, the overall processing time is reduced by 6 seconds (three times the 2-second I/O wait).

Regarding threading, the CPU-time behaviour shown for threading in the table above only illustrates its characteristics and is not exact. However, the log output shows the loop progress of the threads advancing side by side during the CPU-bound steps, so I assume the processing proceeds roughly in this way.

The problem in this threading case is the long stretch of CPU idle time in the middle of processing. Personally, I expected the I/O wait time to be used a little more effectively by staggering the three CPU-bound steps, as in the asyncio case. However, that didn't happen.

So I lowered CPU_LOAD to 10**4 and re-ran the test. In that case, threading behaved the same way as asyncio: instead of three CPU-bound steps proceeding concurrently, one CPU-bound step completed first, and the next CPU-bound step started while the first thread was waiting for I/O. What criteria does threading use to switch between threads? I would like to explore this in depth separately.
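One factor that plausibly plays a role is CPython's thread switch interval, which can be read with `sys.getswitchinterval()` and changed with `sys.setswitchinterval()`: the thread holding the GIL is asked to release it after roughly this interval so that another waiting thread can run. I have not confirmed that this alone explains the results above. The sketch below logs the 25/50/75/100% progress marks of three CPU-bound threads for a large and a small loop; `spin()`, `run()`, and the loop sizes are hypothetical, and the interleaving described in the comments is a tendency, not a guarantee.

```python
import sys
import threading


def spin(name: str, n: int, log: list) -> None:
    # CPU-bound loop that records when it reaches 25%, 50%, 75% and 100%.
    step = n // 4
    for i in range(1, n + 1):
        if i % step == 0:
            log.append((name, i // step * 25))


def run(n: int) -> list:
    log: list = []
    threads = [threading.Thread(target=spin, args=(f"t{k}", n, log)) for k in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    # 0.005 seconds by default in CPython.
    print("switch interval:", sys.getswitchinterval())
    # Large loop: each thread is usually preempted many times,
    # so progress marks from t0, t1 and t2 tend to interleave.
    print(run(4_000_000))
    # Small loop: a thread often finishes within one switch interval,
    # so its progress marks tend to appear together.
    print(run(10**4))
```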

In the case of multiprocessing, the main thread of each process executes in parallel with the others. Therefore, even in this example, the whole run completes in about one-third of the time of the synchronous version.

Conclusion

  • If we want to accelerate CPU-bound processing, we must adopt multiprocessing because of the GIL. Of these three libraries, it is the only one that can run Python code in parallel.
  • If we want to speed up I/O-bound processing, we should apply asyncio or threading to make use of the waiting time.
  • As for choosing between asyncio and threading, asyncio can be the first choice because it can do everything on a single thread. However, there is some processing that asyncio can't handle (for example, libraries that only provide blocking calls), and in that case threading is a candidate.
  • Moreover, as my code example above shows, there may be cases where threading uses the CPU much less efficiently than asyncio.
  • Based on the above results alone, multiprocessing is the fastest. However, more complex cases, such as exchanging data between processes, still need to be considered, and multiprocessing by itself does not make use of I/O wait time.
  • The first thing to consider is whether we should use these libraries at all. In some cases, they will have little effect and only complicate the code. Once we have determined that concurrent programming is needed, the next step is to examine whether the program is CPU-bound or I/O-bound and select the appropriate library.
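When asyncio is the first choice but part of the work is a blocking call that asyncio cannot handle, the two approaches can be combined. A minimal sketch using `asyncio.to_thread()` (available since Python 3.9), which runs a blocking function in a worker thread and returns an awaitable; `blocking_call()` is a hypothetical stand-in for such a function, with `time.sleep()` simulating the blocking I/O.

```python
import asyncio
import time


def blocking_call(name: str, delay: float) -> str:
    # Stands in for a library call with no async version (e.g. a synchronous
    # HTTP client or database driver). Called directly inside a coroutine,
    # it would block the whole event loop.
    time.sleep(delay)
    return f"{name} done"


async def main() -> list[str]:
    # asyncio.to_thread() runs each blocking call in the default thread pool,
    # so the event loop can wait on all of them concurrently.
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_call, f"job-{i}", 1.0) for i in range(3))
    )


if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(main()))
    # The three blocking calls overlap in worker threads: about 1 s, not 3 s.
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```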
