This article explains the meaning of Linux's sysctl parameters for the process scheduler, along with the background knowledge needed to understand them. I don't intend to explain every parameter, only the essential ones.
For simplicity, the descriptions in this article ignore the following aspects of process scheduling.
- nice value
- real-time priority
This article is based on Linux kernel v5.0.
There is a concept called scheduling classes in the Linux kernel. All processes running on Linux belong to one of the scheduling classes. Each scheduling class defines how the processes belonging to it are scheduled.
Processes belong to the fair scheduling class by default. In this article, I call these processes normal processes. On the other hand, processes called real-time processes (see later) belong to the realtime scheduling class.
In the following sections, I'll describe the meaning of the sysctl parameters related to these two scheduling classes, together with a brief explanation of each class.
Normal processes, which belong to the fair scheduling class, are scheduled by the Completely Fair Scheduler (CFS). What CFS means is explained in the next section.
If there are two or more runnable processes, CFS divides CPU time among them as fairly as possible. Here, fair means giving each process an equal share of CPU time.
CFS has a concept called the latency target. CFS tries to give a timeslice to every runnable process once per latency target. The timeslice of each process is (latency target)/(the number of runnable processes). For example, if the latency target is 10ms and there are two runnable processes, each of them gets 5ms per 10ms. If there are four, each of them gets 2.5ms per 10ms.
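The division above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not kernel code; the function name cfs_timeslice_ns is made up for this example, and nice values are ignored as in the rest of the article.

```python
MS = 1_000_000  # nanoseconds per millisecond


def cfs_timeslice_ns(latency_target_ns, nr_runnable):
    """Timeslice of each process when the latency target is split evenly.

    Hypothetical helper: it models only the formula in the text.
    """
    if nr_runnable < 1:
        raise ValueError("need at least one runnable process")
    return latency_target_ns / nr_runnable


print(cfs_timeslice_ns(10 * MS, 2) / MS)  # 5.0
print(cfs_timeslice_ns(10 * MS, 4) / MS)  # 2.5
```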
kernel.sched_latency_ns defines the latency target of CFS in nanoseconds. If there are multiple CPUs in the system, the latency target becomes kernel.sched_latency_ns * (1+log2(the number of CPUs)).
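As a rough illustration of the multi-CPU formula, the following sketch applies the scaling factor literally as written above. It assumes that log2 is rounded down when the number of CPUs is not a power of two; the function name and the 6ms base value are chosen only for this example.

```python
def scaled_latency_target_ns(sched_latency_ns, nr_cpus):
    """Apply the factor (1 + log2(number of CPUs)) from the text.

    Assumption: log2 is taken as an integer, rounded down.
    """
    if nr_cpus < 1:
        raise ValueError("need at least one CPU")
    factor = 1 + (nr_cpus.bit_length() - 1)  # integer log2 of nr_cpus
    return sched_latency_ns * factor


base_ns = 6_000_000  # hypothetical base value of 6ms
print(scaled_latency_target_ns(base_ns, 1))  # 6000000
print(scaled_latency_target_ns(base_ns, 4))  # 18000000
print(scaled_latency_target_ns(base_ns, 8))  # 24000000
```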
What happens when there are very many runnable processes? For example, if the latency target is 10ms and there are 100 runnable processes, does each process get a timeslice of just 100us? That seems too short, because the cost of context switches becomes too high in this case.
To prevent this problem, the timeslice is guaranteed to be equal to or longer than the value of the kernel.sched_min_granularity_ns parameter. The unit of this parameter is nanoseconds. Please note that when this guarantee takes effect, the latency target is stretched to kernel.sched_min_granularity_ns * (the number of runnable processes).
Similar to the latency target, if there are multiple CPUs in the system, the guaranteed timeslice becomes kernel.sched_min_granularity_ns * (1+log2(the number of CPUs)).
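The interaction between the latency target and the minimum granularity can be sketched as follows. This is a simplified model of the rule described above, not the kernel's implementation; the function name and the 1ms minimum granularity are assumptions made for the example.

```python
MS = 1_000_000  # nanoseconds per millisecond


def cfs_period_and_slice_ns(latency_target_ns, min_granularity_ns, nr_runnable):
    """Return (effective latency target, timeslice per process).

    If an even split would fall below the minimum granularity, every process
    gets the minimum granularity and the latency target is stretched.
    """
    if nr_runnable < 1:
        raise ValueError("need at least one runnable process")
    slice_ns = latency_target_ns / nr_runnable
    if slice_ns >= min_granularity_ns:
        return latency_target_ns, slice_ns
    return min_granularity_ns * nr_runnable, min_granularity_ns


# Hypothetical values: 10ms latency target, 1ms minimum granularity.
print(cfs_period_and_slice_ns(10 * MS, 1 * MS, 4))    # (10000000, 2500000.0)
print(cfs_period_and_slice_ns(10 * MS, 1 * MS, 100))  # (100000000, 1000000)
```

With 100 runnable processes, each one still gets 1ms, but a full round now takes 100ms instead of 10ms.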
Processes that have just been woken up from a sleep state tend to go back to sleep after a short time. So, in many cases, it's efficient to give CPU time to a woken-up process as soon as possible.
A typical example is a terminal emulator, which interacts directly with users through keyboard input. When a user types something, the terminal emulator is woken up and echoes back the input. If the echo takes too long, the user experience suffers.
CFS has special logic to shorten the latency of such interactive processes. The details of this logic are a bit difficult to explain, so I'll only say that if you decrease the kernel.sched_wakeup_granularity_ns parameter, a woken-up process becomes more likely to preempt the running process. The system's interactivity would then get better.
However, please note that there is a tradeoff between interactivity and throughput. If you set a value smaller than the default, the number of context switches would increase and the throughput would get worse.
The realtime scheduling class is for processes that must run in preference to any normal process, in other words, any process belonging to the fair scheduling class.
As I already described, processes that belong to the realtime scheduling class are called real-time processes. More precisely, real-time processes are the processes whose scheduling policy is SCHED_FIFO or SCHED_RR. We can set the scheduling policy of a process with the sched_setscheduler() system call.
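As a minimal sketch, Python's os module wraps this system call on Linux. The helper names below (is_realtime, make_realtime) are made up for this example, and the priority of 1 is an arbitrary choice, since real-time priorities are outside the scope of this article. Switching to a real-time policy normally requires root or the CAP_SYS_NICE capability, so the sketch reports failure instead of raising.

```python
import os

# Map the policy constants to readable names.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",  # default policy of normal processes
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}


def is_realtime(policy):
    """True if the policy makes a process a real-time process."""
    return policy in (os.SCHED_FIFO, os.SCHED_RR)


def make_realtime(pid=0, policy=os.SCHED_RR, priority=1):
    """Try to give `pid` (0 means the calling process) a real-time policy.

    Returns False when the caller lacks the required privilege.
    """
    try:
        os.sched_setscheduler(pid, policy, os.sched_param(priority))
    except PermissionError:
        return False
    return True


if __name__ == "__main__":
    current = os.sched_getscheduler(0)
    print(POLICY_NAMES.get(current, current), is_realtime(current))
```

The example only prints the current policy. Call make_realtime() deliberately, keeping in mind that a runaway real-time process can starve everything else on its CPU.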
Let's assume that a real-time process A becomes runnable on a CPU on which process B, which belongs to the fair scheduling class, is running. Here A can preempt B at any time by definition. So, what happens if B is also a real-time process? It depends on the scheduling policy of B.
If B's scheduling policy is SCHED_FIFO, A can't preempt B and can run on this CPU only when B exits or goes to sleep. However, if B's scheduling policy is SCHED_RR, B has a predefined timeslice and A can preempt B after B exhausts its timeslice. If A also belongs to SCHED_RR, A and B then get CPU time in a round-robin manner.
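The difference between the two policies can be illustrated with a toy simulation. It is a deliberately simplified model, not how the kernel is implemented: it assumes that all tasks share one policy and one priority, that they never sleep, and that B is already running when A becomes runnable. The function name and the workloads are invented for the example.

```python
from collections import deque


def simulate(policy, work_ms, timeslice_ms=100):
    """Return the list of (task, ran_ms) segments on one CPU.

    policy: "SCHED_FIFO" or "SCHED_RR", applied to every task.
    work_ms: dict of task name -> CPU time needed, in run-queue order.
    """
    queue = deque(work_ms.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        if policy == "SCHED_FIFO":
            ran = remaining  # runs until it exits
        else:
            ran = min(remaining, timeslice_ms)  # runs for one timeslice at most
        timeline.append((name, ran))
        if remaining > ran:
            queue.append((name, remaining - ran))  # back to the end of the queue
    return timeline


work = {"B": 250, "A": 150}  # B is running, A has just become runnable
print(simulate("SCHED_FIFO", work))
# [('B', 250), ('A', 150)]
print(simulate("SCHED_RR", work))
# [('B', 100), ('A', 100), ('B', 100), ('A', 50), ('B', 50)]
```

Under SCHED_FIFO, A has to wait for B to finish; under SCHED_RR, the two alternate every timeslice.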
kernel.sched_rr_timeslice_ms defines the timeslice of real-time processes whose scheduling policy is SCHED_RR. Its unit is milliseconds.
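To check the value on a running system, a sysctl parameter can be read through the /proc/sys tree, where each dot in the name becomes a directory separator. The sketch below assumes the SCHED_RR timeslice is exposed as kernel.sched_rr_timeslice_ms; the helper names are made up for this example, and the file may be missing on some kernels, so the read is guarded.

```python
import os


def sysctl_path(name, root="/proc/sys"):
    """Translate a dotted sysctl name into its path under /proc/sys."""
    return os.path.join(root, *name.split("."))


def read_sysctl_int(name, root="/proc/sys"):
    """Read an integer-valued sysctl parameter."""
    with open(sysctl_path(name, root)) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    try:
        print(read_sysctl_int("kernel.sched_rr_timeslice_ms"), "ms")
    except (OSError, ValueError) as exc:
        print("could not read the parameter:", exc)
```

The same helper works for the other integer parameters in this article, as long as the running kernel exposes them under /proc/sys.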
kernel.sched_rt_period_us and kernel.sched_rt_runtime_us exist to prevent out-of-control real-time processes from monopolizing the CPU.
If a real-time process keeps running for a long time without sleeping, normal processes can't get any CPU time during this period. This can cause serious problems, such as hanging the whole system. For example, let's assume a system that has only one CPU, with a real-time process A running on it. If A hangs, the system also hangs. In addition, we can't kill this problematic real-time process, because even launching bash is prevented by it.
To prevent this kind of problem, the process scheduler has logic to limit the running time of real-time processes. In short, the total CPU time consumed by real-time processes can't exceed kernel.sched_rt_runtime_us within each period of kernel.sched_rt_period_us. The unit of both parameters is microseconds.
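The limit is easier to read as a fraction: the runtime budget kernel.sched_rt_runtime_us divided by the period kernel.sched_rt_period_us. The sketch below computes that share. The function name is made up for this example; the values 950000 and 1000000 are the usual defaults, and a runtime of -1 is treated as "no limit".

```python
def rt_cpu_share(rt_runtime_us, rt_period_us):
    """Largest fraction of CPU time that real-time processes may consume."""
    if rt_period_us <= 0:
        raise ValueError("the period must be positive")
    if rt_runtime_us < 0:
        return 1.0  # -1 disables the limit
    return min(rt_runtime_us / rt_period_us, 1.0)


print(rt_cpu_share(950_000, 1_000_000))  # 0.95
print(rt_cpu_share(-1, 1_000_000))       # 1.0
```

With the usual defaults, at least 5% of every second stays available to normal processes, which is enough to launch a shell and kill a runaway real-time process.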
This article described some of the sysctl parameters of Linux's process scheduler and the background knowledge needed to understand them. If you're interested in this topic, please modify these parameters and run your workload to verify whether the descriptions in this article are correct. For example, the following article would help you.