Multithreading and Multiprocessing are both used to build concurrent applications, but many developers are unsure how they differ or when to choose one over the other. In this article, we explore multiprocessing. Picking up where we left off in the last article, we will write a function that uses it, measure its speed, and see when it is the best choice. Let's begin!
Prerequisites
- You have read the previous article on Multithreading.
- You have Python installed.
- You have a code editor ready.
Before you get started, check how many logical CPU cores your machine has. Open your code editor and run the following program.
import multiprocessing
print("Number of cpu :", multiprocessing.cpu_count())
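One caveat worth sketching: the number above is the count of logical CPUs on the machine, which is not always the number your program is allowed to use (for example, inside a container). The helper below is a minimal sketch; `count_cpus` is a name I made up, and `os.sched_getaffinity` only exists on some platforms such as Linux, so the code falls back to the logical count elsewhere.

```python
import multiprocessing
import os


def count_cpus():
    """Return (logical CPUs on the machine, CPUs this process may use)."""
    logical = multiprocessing.cpu_count()
    # os.sched_getaffinity is not available everywhere (e.g. not on Windows or macOS),
    # so fall back to the logical count when it is missing.
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = logical
    return logical, usable


if __name__ == "__main__":
    logical, usable = count_cpus()
    print("Logical CPUs :", logical)
    print("Usable CPUs  :", usable)
```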
Thread vs. Process
A thread is the smallest unit of execution, and it always lives inside a process. Every process has one main thread and can start additional threads.
A process is an instance of a computer program being executed. It has its own memory space and is independent of other processes.
To understand this, imagine a process as a container. The threads share the resources that the container (the program) owns. So, a process can be seen as the entire running program, while a thread is one path of execution inside it.
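The container idea can be sketched in a few lines. This is only an illustration; the names `shared_items` and `add_item` are mine. A thread appends to the same list the main program sees, while a child process appends to its own copy, so the parent's list is left untouched.

```python
import multiprocessing
import threading

shared_items = []  # lives in the memory of whichever process imports this module


def add_item(label):
    shared_items.append(label)


if __name__ == "__main__":
    # A thread runs inside the same process, so it changes the same list.
    worker_thread = threading.Thread(target=add_item, args=("from thread",))
    worker_thread.start()
    worker_thread.join()

    # A child process has its own memory, so it changes only its own copy.
    worker_process = multiprocessing.Process(target=add_item, args=("from process",))
    worker_process.start()
    worker_process.join()

    print(shared_items)  # ['from thread']
```

The item added by the child process never shows up in the parent, because the two processes do not share that list.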
Multiprocessing
Multiprocessing, as the name implies, means running multiple processes. In simpler terms, it means running multiple independent processes, each with its own Python interpreter and memory space, and each able to run on a separate CPU core.
For example, right now, you might be reading this article on Google and playing music on Spotify. These are unrelated processes and are independent of each other. When you run multiple apps, think of them as multiple processes.
Multiprocessing allows parallel execution of tasks by creating separate processes, each with its own Python interpreter. This approach is useful for CPU-bound tasks that can benefit from multiple CPU cores.
In summary, multiprocessing speeds up a program by spreading independent work across several processes. It is how we achieve parallel computing in Python, and it relies on the system's ability to use more than one processor (or core) at a time.
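To make "separate processes" concrete, here is a minimal sketch using `multiprocessing.Process` from the standard library. The function names `describe` and `report` are hypothetical. Each worker prints the ID of the process it runs in, and those IDs differ from the parent's.

```python
import multiprocessing
import os


def describe(name):
    # os.getpid() returns the ID of the process executing this call.
    return f"{name} runs in process {os.getpid()}"


def report(name):
    print(describe(name))


if __name__ == "__main__":
    report("parent")
    workers = [
        multiprocessing.Process(target=report, args=(f"worker-{i}",))
        for i in range(3)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
```

The exact process IDs depend on your system, so your output will differ from mine.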
To better understand Multiprocessing, let’s delve into the concept of parallelism.
Parallelism
Parallelism means doing several things at the same time. Tasks are executed simultaneously, with each one running on its own CPU core. The program distributes its tasks across the available cores, and in Python, multiprocessing is the tool that makes this kind of parallel execution possible.
To deepen our understanding of parallelism, let’s delve into CPU-intensive tasks and how to identify them in a program.
CPU-bound Task
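These are tasks whose running time is limited by the speed of the CPU rather than by waiting on input/output. They involve heavy computations and complex calculations. A quick way to identify one: the task keeps a CPU core busy the whole time instead of waiting on a disk, a network, or a user. Examples of CPU-bound tasks are intensive mathematical calculations, data processing, and training machine learning models.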
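As a rough illustration of a CPU-bound task, the sketch below counts prime numbers by trial division, first sequentially and then with a process pool. The function `count_primes` and the input sizes are my own choices, not something from the previous article, and the timings you see will depend on your machine.

```python
import concurrent.futures
import time


def count_primes(limit):
    """Count the primes below `limit` using trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in 2..sqrt(n) divides it evenly.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    limits = [200_000] * 4

    start = time.perf_counter()
    sequential = [count_primes(limit) for limit in limits]
    print(f"Sequential: {time.perf_counter() - start:.2f} second(s)")

    start = time.perf_counter()
    with concurrent.futures.ProcessPoolExecutor() as executor:
        parallel = list(executor.map(count_primes, limits))
    print(f"Processes:  {time.perf_counter() - start:.2f} second(s)")

    print("Same results:", sequential == parallel)
```

On a machine with several cores, the second timing should usually be lower, because each call can run on its own core.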
Global Interpreter Lock
Because every process has its own Python interpreter, every process also has its own global interpreter lock (GIL). Multiprocessing therefore sidesteps the GIL, allowing us to use all CPU cores and run code in true parallel. The shared-memory problems we saw with threads mostly disappear as well: the processes are independent and don't write to the same memory, so race conditions on in-memory data don't surface unless you deliberately share state between processes.
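One way to see the effect of the GIL is to run the same CPU-bound function through a thread pool and a process pool. This is a sketch under assumptions: `sum_of_squares` and `timed_run` are hypothetical helpers, and the comparison assumes a standard CPython build with the GIL enabled. I am not quoting timings because they vary from machine to machine.

```python
import concurrent.futures
import time


def sum_of_squares(n):
    """Pure-Python loop that keeps the CPU busy."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_run(executor_class, inputs):
    """Run sum_of_squares over inputs with the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_class() as executor:
        results = list(executor.map(sum_of_squares, inputs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    inputs = [2_000_000] * 4
    for executor_class in (
        concurrent.futures.ThreadPoolExecutor,
        concurrent.futures.ProcessPoolExecutor,
    ):
        _, seconds = timed_run(executor_class, inputs)
        print(f"{executor_class.__name__}: {seconds:.2f} second(s)")
```

With the GIL in place, the threads take turns executing Python bytecode, so the thread pool typically gains little here, while the process pool can spread the four calls across cores.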
Multithreading and Multiprocessing
Let’s discuss the similarities and differences between the two.
- Multiprocessing uses parallelism and multiple processors, while Multithreading uses concurrency and multiple threads.
- Understanding Multiprocessing requires a deeper look at CPUs, cores, and processes than understanding Multithreading does.
- The operating system schedules both processes and threads. In CPython, the global interpreter lock additionally allows only one thread at a time to execute Python bytecode within a process.
- In multiprocessing, multiple workers (processes) execute the instructions, whereas in Multithreading, multiple workers (threads) handle subsets of instructions.
- A process is independent of other processes, while a thread depends on the process that owns it.
- A process has its own memory space and Python interpreter, whereas a thread shares the same memory space with the other threads in its process.
- Multiprocessing is used on tasks that rely heavily on the CPU. Multithreading is used on tasks that spend most of their time waiting for input/output operations to complete.
- Race conditions and deadlocks can occur in multithreading, but they are less common in multiprocessing.
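The point about race conditions follows from threads sharing memory: updates to shared data have to be coordinated. The sketch below shows one common way to do that with `threading.Lock`. The function name `increment_with_lock` is made up for this example.

```python
import threading


def increment_with_lock(num_threads, increments_per_thread):
    """Have several threads increment one shared counter, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments_per_thread):
            # Only one thread at a time may update the shared counter.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == "__main__":
    print(increment_with_lock(4, 10_000))  # 40000
```

Separate processes would each get their own `counter`, which is why this kind of coordination is needed less often in multiprocessing.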
Implementation
ProcessPoolExecutor
I won't walk through the steps again, as that would repeat the previous article. Everything is the same as in the multithreading example, except that we use ProcessPoolExecutor() instead.
import concurrent.futures
import time


def do_something(seconds):
    # Only 8 and 12 simulate long-running work; every other value is printed immediately.
    if seconds == 8 or seconds == 12:
        print(f'Sleeping {seconds} second(s)...')
        time.sleep(seconds)
        print(f'Done sleeping...{seconds}')
    else:
        print(seconds)


# The guard keeps worker processes from re-running this block when they import the module.
if __name__ == '__main__':
    start = time.perf_counter()

    # Leaving the with block waits for every submitted task to finish.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        secs = range(30)
        results = executor.map(do_something, secs)

    end = time.perf_counter()
    elapsed_time = end - start
    print(f'Finished in {round(elapsed_time, 2)} second(s)')
This was my result.
‘Finished in 12.82 second(s)’
You should get a similar result. I ran the program multiple times and got slightly different values each time, but they were all in the same range, differing only after the decimal point. Now, let’s focus on something important. Remember, a process is an independent instance of the program, and every process has at least one thread of its own. So, the program combines parallel execution across processes with the work each process does internally. In this example, the tasks handed to a single worker process run one after another.
As a result, the program is a bit slower compared to Multithreading, but the difference is minimal. The extra time comes from the overhead described earlier: each process needs its own Python interpreter and memory space, which costs more to set up than a thread.
In summary, multiple workers (processes) handle the printing of the numbers in the example code above.
However, this isn’t something we would typically do in the real world, because printing a handful of small numbers is already fast and gains nothing from extra processes. We only used it for experimentation. When you need to perform complex calculations or work with large datasets, like in data science or training machine learning models, leverage the power of parallelism and multiprocessing to speed things up.
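If you want to see the workers for yourself, one option is to return the process ID along with each number. This is a sketch, not part of the original program; `tag_with_pid` is a hypothetical helper, and how the 30 tasks are divided between processes will differ from run to run.

```python
import concurrent.futures
import os
from collections import Counter


def tag_with_pid(number):
    # Return the input together with the ID of the process that handled it.
    return number, os.getpid()


if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        tagged = list(executor.map(tag_with_pid, range(30)))

    tasks_per_process = Counter(pid for _, pid in tagged)
    for pid, task_count in tasks_per_process.items():
        print(f"process {pid} handled {task_count} task(s)")
```

Because each task here finishes almost instantly, it would not be surprising to see one or two processes pick up most of the work.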
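For larger inputs, the `chunksize` argument of `executor.map` is worth knowing about: with ProcessPoolExecutor, it sends items to the workers in batches instead of one at a time, which reduces the communication overhead. The sketch below assumes a made-up `transform` function and an arbitrary chunk size of 10,000; the best value depends on your data and machine.

```python
import concurrent.futures
import math


def transform(value):
    # Stand-in for a heavier per-item calculation.
    return math.sqrt(value) * math.sin(value)


if __name__ == "__main__":
    data = range(1_000_000)
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # chunksize batches the items so each worker receives 10,000 at a time.
        results = list(executor.map(transform, data, chunksize=10_000))
    print(len(results))  # 1000000
```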
Conclusion
We can draw the following conclusions from this discussion:
- A process is an instance of a computer program.
- A process contains one main thread and can start additional threads.
- A process is independent of other processes.
- Multiprocessing is achieved by executing multiple processes.
- Multiprocessing helps us achieve parallelism.
- We use multiprocessing for tasks that rely heavily on the CPU.