In Chapter 2, we mentioned that most processes in the system are in the sleep state. So, how does the kernel make each process run on the CPU when there are multiple executable processes in the system? In this chapter, we will discuss the Linux kernel's process scheduler (hereinafter referred to as the scheduler), which is responsible for allocating CPU resources to processes.
In books about computer science, the scheduler is explained as follows:
- Only one process can run at a time on a single logical CPU
- It allows multiple executable processes to use the CPU in turn, in units called "time slices"
For example, if there are three processes, p0, p1, and p2, it would look like the following figure.
Let's confirm whether this is actually the case in Linux by conducting an experiment.
Elapsed Time and CPU Time
To understand the content of this chapter, it is essential to understand the concepts of elapsed time and CPU time related to processes. This section explains these times. Their definitions are as follows:
- Elapsed Time: The time that passes from the start to the end of a process. It is like the time measured with a stopwatch from the start to the end of a process.
- CPU Time: The time that a process actually uses a logical CPU.
These terms probably won't make much sense just by looking at the explanation, so let's understand them through an experiment. If you run a process using the time command, you can get the elapsed time and CPU time from the start to the end of the target process. For example, let's run the following "load.py" which terminates after consuming some CPU resources.
#!/usr/bin/python3
# This parameter is to change the amount of CPU time used by this program. It is convenient to change this parameter to make the CPU time consumption several seconds.
NLOOP=100000000
for _ in range(NLOOP):
pass
The result is as follows.
$ time ./load
real 0m2.357s
user 0m2.357s
sys 0m0.000s
The output includes three lines starting with real, user, and sys. Among these, real indicates the elapsed time, and user and sys indicate the CPU time. user refers to the time when the process was running. In contrast, sys refers to the time when the kernel was operating as a result of system calls issued by the process. The load program continues to use the CPU from the start to the end of its execution and does not issue any system calls during that time, so real and user are almost the same, and sys is almost zero. The reason why it is "almost" is because the Python interpreter calls a few system calls at the beginning and end of the process.
Let's also run an experiment with the sleep command, which sleeps most of the time.
$ time sleep 3
real 0m3.009s
user 0m0.002s
sys 0m0.000s
Since it terminates after sleeping for 3 seconds, "real" is nearly 3 seconds. On the other hand, this command gives up the small mount of CPU time and goes to sleep right after starting, and when it begins to use only a little CPU time again 3 seconds later before terminating. So "user" and "sys" are almost 0. The differences of two values are shown in the following figure.
When Using Only One Logical CPU
To simplify the discussion, let's first consider the case of a single logical CPU. The "multiload.py" program will be used for the experiment. This program performs the following actions:
Usage: ./multiload [-m] <number of processes>
Executes a specified number of load processing processes for a set period of time and waits for all to finish.
The time it took to execute each process is outputted.
By default, all processes run only on a single logical CPU.
Meaning of the option:
-m: Allows each process to run on multiple CPUs.
Here is its source code.
#!/bin/bash
MULTICPU=0
PROGNAME=$0
SCRIPT_DIR=$(cd $(dirname $0) && pwd)
usage() {
exec >&2
echo Usage: ./multiload [-m] <number of processes>
Executes a specified number of load processing processes for a set period of time and waits for all to finish.
The time it took to execute each process is outputted.
By default, all processes run only on a single logical CPU.
Meaning of the option:
-m: Allows each process to run on multiple CPUs."
exit 1
}
while getopts "m" OPT ; do
case $OPT in
m)
MULTICPU=1
;;
\?)
usage
;;
esac
done
shift $((OPTIND - 1))
if [ $# -lt 1 ] ; then
usage
fi
CONCURRENCY=$1
if [ $MULTICPU -eq 0 ] ; then
# Pin the "load.py" program to CPU0
taskset -p -c 0 $$ >/dev/null
fi
for ((i=0;i<CONCURRENCY;i++)) do
time "${SCRIPT_DIR}/load.py" &
done
for ((i=0;i<CONCURRENCY;i++)) do
wait
done
Let's first set "" to 1 and run it. This is almost the same as running the load program alone.
$ ./multiload 1
real 0m2.359s
user 0m2.358s
sys 0m0.000s
In my environment, the elapsed time was 2.359 seconds. What about when the parallelism is 2 and 3?
$ ./multiload 2
real 0m4.730s
user 0m2.360s
sys 0m0.004s
real 0m4.739s
user 0m2.374s
sys 0m0.000s
$ ./multiload 3
real 0m7.095s
user 0m2.360s
sys 0m0.004s
real 0m7.374s
user 0m2.499s
sys 0m0.000s
real 0m7.541s
user 0m2.676s
sys 0m0.000s
As the level of parallelism increased by 2 times, 3 times, the CPU time did not change much, but the elapsed time increased by about 2 times, 3 times. This is because, as mentioned at the beginning, only one process can run at the same time on one logical CPU, and the scheduler gives CPU resources to each process in turn.
When Using Multiple Logical CPUs
Next, let's also take a look at the case of multiple logical CPUs. When the multiload program is run with the "-m" option, the scheduler tries to distribute the multiple load processes evenly across all logical CPUs. As a result, for instance, if there are two logical CPUs and two load processes, as shown in the following figure, the two load processes can each monopolize the resources of a logical CPU.
The logic of load balancing is very complex, so this book avoids detailed explanation.
Let's actually verify this. The results of running the multiload program with the -m option and parallelism from 1 to 3 are shown below.
$ ./multiload -m 1
real 0m2.361s
user 0m2.361s
sys 0m0.000s
$ ./multiload -m 2
real 0m2.482s
user 0m2.482s
sys 0m0.000s
real 0m2.870s
user 0m2.870s
sys 0m0.000s
$ ./multiload -m 3
real 0m2.694s
user 0m2.693s
sys 0m0.000s
real 0m2.857s
user 0m2.853s
sys 0m0.004s
real 0m2.936s
user 0m2.935s
sys 0m0.000s
For all processes, the values of real and user+sys were almost the same. In other words, we can see that each process was able to monopolize the resources of a logical CPU.
For all processes, the values of real and user+sys were almost the same. In other words, we can see that each process was able to monopolize the resources of a logical CPU.
Cases where "user + sys" is larger than "real"
Intuitively, you might think that "real >= user + sys" will always hold, but in reality, there can be cases where the value of "user + sys" is slightly larger than that of "real". This is due to the different methods used to measure each time and the fact that the precision of the measurement is not that high. There's no need to worry too much about this; it's enough to be aware that such things can happen.
Furthermore, there are cases where "user + sys" can be significantly larger than "real". For example, this occurs when you run the "multiload.py" program with the "-m" option and set the number of processes to 2 or more. Now, let's try running "./multiload -m 2" through the time command.
$ time ./multiload -m 2
real 0m2.510s
user 0m2.502s
sys 0m0.008s
real 0m2.725s
user 0m2.716s
sys 0m0.008s
real 0m2.728s
user 0m5.222s
sys 0m0.016s
The first and second entries are data about the load processing processes of the "multiload.py" program. The third entry is data about the "multiload.py" program itself. As you can see, the value of user is about twice that of "real". In fact, the "user" and "sys" values obtained by the "time" command are the sums of the values for the target process and its terminated child processes. Therefore, if a process generates child processes and they each run on a different logical CPU, the value of "user+sys" could be greater than real. The "multiload.py" program fits exactly this condition.
NOTE
This article is based on my book written in Japanese. Please contact me via satoru.takeuchi@gmail.com if you're interested in publishing this book's English version.
Oldest comments (0)