1. Overview

As a multitasking operating system, Linux shares its resources between processes. One of these resources is CPU time. Usually, users’ processes run with time-sharing scheduling, while some kernel tasks use real-time scheduling. However, we can change the scheduling policies to meet our needs.

In this tutorial, we’ll learn how to manage these policies. Then, we’ll walk through some scheduling examples.

2. The Basics of Scheduling

First, let’s emphasize that the main goal of real-time scheduling is to run tasks in a predictable manner, so that we can meet the requirements of real-time systems. Accordingly, the real-time policies rely on a static priority in the range of 1 through 99, and a runnable thread of higher priority always takes precedence over one of lower priority. On the other hand, all time-sharing threads have the same static priority of zero. So, runnable real-time tasks obtain the CPU before time-sharing threads.

Basically, we can work with three policies:

  • SCHED_FIFO – First In First Out real-time policy – threads of the same priority are queued in order of arrival, and the thread at the head of the queue keeps the CPU until it blocks, yields, or is preempted by a higher-priority thread
  • SCHED_RR – round-robin real-time policy that extends the FIFO scheme with a time slice, so all threads of the same priority receive the CPU in turn
  • SCHED_OTHER – the default time-sharing policy, implemented by the Completely Fair Scheduler (CFS)

It’s worth noting that the Linux system is preemptive, which means that the kernel can take the CPU away from a thread and return it when that thread’s turn comes.

Finally, as we’ll be using top, let’s mention that it maps positive real-time priorities into negative values. So, the higher the process priority, the lower the value in the PR column shown by top.
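
To make this mapping concrete, here’s a minimal sketch. It assumes that top derives the PR value as -1 minus the real-time priority and prints rt for the highest priority, which agrees with the top outputs shown later in this tutorial. The function name top_pr_for_rt_priority is hypothetical and not part of any tool:

```shell
# Hypothetical helper: convert a real-time priority (1-99) into the value
# we expect in the PR column of top. Assumption: PR = -1 - priority, and
# the highest priority (99) is printed as "rt".
top_pr_for_rt_priority() {
    prio=$1
    if [ "$prio" -lt 1 ] || [ "$prio" -gt 99 ]; then
        echo "priority must be in the 1-99 range" >&2
        return 1
    fi
    if [ "$prio" -eq 99 ]; then
        echo "rt"
    else
        echo $(( -1 - prio ))
    fi
}

top_pr_for_rt_priority 2    # prints -3
top_pr_for_rt_priority 99   # prints rt
```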

3. The chrt Command

With the chrt command, we can examine or set the process’ scheduling attributes. In addition, we can start a new process with the given priority and scheduling policy.

First, let’s see the command’s syntax:

chrt [options] -p [priority] PID       
chrt [options] priority command argument ...

So, the first variant allows us to manipulate properties of the running process by means of its PID, while the second enables running a command.

Next, let’s list the scheduling policies and their corresponding priority ranges with the -m option:

$ chrt -m
SCHED_OTHER min/max priority    : 0/0
SCHED_FIFO min/max priority    : 1/99
SCHED_RR min/max priority    : 1/99
# ...

To select the SCHED_OTHER, SCHED_FIFO, or SCHED_RR policy, we should use the -o, -f, or -r option, respectively.
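
As a sketch of how the option letters and priority ranges fit together, the helper below only prints the chrt command for a running process instead of executing it. The function name chrt_command_for is hypothetical, and the ranges are hardcoded from the chrt -m output above, so they’re an assumption on other systems:

```shell
# Hypothetical helper: print (but don't run) the chrt command that would set
# the given policy and priority on a running process.
# Ranges are copied from the "chrt -m" listing and may differ elsewhere.
chrt_command_for() {
    policy=$1
    prio=$2
    pid=$3
    case "$policy" in
        SCHED_OTHER) flag=-o; min=0; max=0 ;;
        SCHED_FIFO)  flag=-f; min=1; max=99 ;;
        SCHED_RR)    flag=-r; min=1; max=99 ;;
        *) echo "unsupported policy: $policy" >&2; return 1 ;;
    esac
    if [ "$prio" -lt "$min" ] || [ "$prio" -gt "$max" ]; then
        echo "priority $prio is outside $min-$max for $policy" >&2
        return 1
    fi
    echo "chrt $flag -p $prio $pid"
}

chrt_command_for SCHED_RR 2 7575   # prints: chrt -r -p 2 7575
```

Printing the command first lets us review it before running it with sudo.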

3.1. Changing Properties of Running Process

For our first example, let’s start a stress-ng process for two minutes:

$ stress-ng --cpu 1 --timeout 120s

Then, let’s find the PID of the running stressor:

$ ps -eo command,stat,pid | grep ^stress-ng-cpu
stress-ng-cpu [run]         R+      7575

Now, we’ll use the -r option to change the process scheduling policy to round-robin and set its priority to 2:

$ sudo chrt -r -p 2 7575

Finally, let’s confirm the change:

$ chrt -p 7575
pid 7575's current scheduling policy: SCHED_RR
pid 7575's current scheduling priority: 2
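
If we want to perform this check in a script, we could condense the report into a single line. The following is only a sketch that assumes the two-line output format shown above; the function name summarize_chrt_report is hypothetical:

```shell
# Hypothetical helper: reduce the two-line "chrt -p" report read from
# standard input to a single "POLICY PRIORITY" line.
summarize_chrt_report() {
    awk -F': ' '
        /scheduling policy/   { policy = $2 }
        /scheduling priority/ { prio = $2 }
        END { print policy, prio }
    '
}

# Sample input copied from the output above. On a live system, we could
# pipe the real report instead: chrt -p 7575 | summarize_chrt_report
printf '%s\n' \
    "pid 7575's current scheduling policy: SCHED_RR" \
    "pid 7575's current scheduling priority: 2" | summarize_chrt_report
# prints: SCHED_RR 2
```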

3.2. Running a Command

Next, let’s start a process with the FIFO policy indicated by the -f option and the highest priority 99:

$ sudo chrt -f 99 stress-ng --cpu 1 --timeout 90s

Now, assuming that the stressor’s PID is 5278, let’s display the top data of this process:

$ top -p 5278
# ...

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND  
   5278 root      rt   0   45780   5900   3528 R  95,0   0,0   0:42.33 stress-+

So, rt in the PR column tells us that the process is real-time scheduled with the highest priority.

4. Simulating a Real-Time Machine

Now let’s simulate a one-processor machine to explore various aspects of real-time scheduling. To do so, we’re going to create a one-processor CPU set with the help of the cset command:

$ sudo cset shield --sysset=ts-set --userset=rt-set --cpu=3 --kthread=on

So now we have the system set called ts-set for time-sharing tasks and the user set named rt-set for real-time ones. In addition, we set the kthread option to move as many kernel threads as possible out of the newly created rt-set.

Subsequently, let’s move the current Bash shell process into the shield. In this way, all threads started from the corresponding terminal will stay in the user set:

$ sudo cset shield --sysset=ts-set --userset=rt-set --shield --pid=$$
cset: --> shielding following pidspec: 4508 
cset: done
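
Before moving on, we could double-check that the shell is really confined to the shield’s CPU. The sketch below assumes that the kernel exposes the Cpus_allowed_list field in /proc/<pid>/status. The function name allowed_cpus and its optional second argument are hypothetical additions of ours:

```shell
# Hypothetical helper: print the Cpus_allowed_list value of a process.
# The optional second argument overrides the status file path, which is
# handy for trying the function on a saved copy of the file.
allowed_cpus() {
    pid=$1
    status_file=${2:-/proc/$pid/status}
    awk '$1 == "Cpus_allowed_list:" { print $2 }' "$status_file"
}

# Report the CPUs available to the current shell, if /proc is readable
if [ -r "/proc/$$/status" ]; then
    allowed_cpus "$$"
fi
```

For a shell shielded with the command above, we’d expect the function to report only CPU 3.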

In the next step, let’s change the policy to round-robin and set the priority to 2. Then, any process which runs in this terminal will inherit these attributes:

$ sudo chrt -r -p 2 $$

Finally, let’s check the shielded processes:

$ cset shield --sysset=ts-set --userset=rt-set --shield --verbose
cset: "rt-set" cpuset of CPUSPEC(3) with 1 task running
   USER       PID  PPID SPPr TASK NAME
   -------- ----- ----- ---- ---------
   joe       4508  2610 Sr_2 bash 
cset: done

So, we can learn from the SPPr column that the Bash process is sleeping now (S), its policy is round-robin (r), and its priority equals 2.
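
Since we’ll read this column several times, a small decoder may help. It’s a sketch based only on the field values appearing in this tutorial (Sr_2, Rf99, and Roth), so the assumed layout of a state letter, a policy letter, and a padded priority isn’t taken from the cset documentation. The function name decode_sppr is hypothetical:

```shell
# Hypothetical helper: decode a cset SPPr field such as Sr_2, Rf99 or Roth.
# Assumed layout: one state letter, then either "oth" (SCHED_OTHER) or a
# policy letter (r or f) followed by a priority padded with "_".
decode_sppr() {
    field=$1
    state=$(printf '%s' "$field" | cut -c1)
    rest=$(printf '%s' "$field" | cut -c2-)
    case "$rest" in
        oth) policy=SCHED_OTHER; prio=0 ;;
        r*)  policy=SCHED_RR;    prio=${rest#r} ;;
        f*)  policy=SCHED_FIFO;  prio=${rest#f} ;;
        *)   echo "unrecognized field: $field" >&2; return 1 ;;
    esac
    prio=${prio#_}   # drop the padding used for one-digit priorities
    echo "state=$state policy=$policy priority=$prio"
}

decode_sppr Sr_2   # prints: state=S policy=SCHED_RR priority=2
decode_sppr Rf99   # prints: state=R policy=SCHED_FIFO priority=99
decode_sppr Roth   # prints: state=R policy=SCHED_OTHER priority=0
```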

5. Running Tasks With Round-Robin Scheduling

As we have the real-time machine simulator ready in the terminal, let’s start the stress-ng hogs from inside it:

$ stress-ng --cpu 2 --timeout 90s

Then, let’s see the processes in the real-time set rt-set:

$ cset shield --sysset=ts-set --userset=rt-set --shield --verbose
cset: "rt-set" cpuset of CPUSPEC(3) with 4 tasks running
   USER       PID  PPID SPPr TASK NAME
   -------- ----- ----- ---- ---------
   joe       4508  2610 Sr_2 bash 
   joe       9355  4508 Sr_2 stress-ng --cpu 2 --timeout 90s 
   joe       9356  9355 Rr_2 stress-ng-cpu [run]
   joe       9357  9355 Rr_2 stress-ng-cpu [run]
cset: done

So, we have two running instances of stress-ng-cpu, with the inherited round-robin policy and the same priority 2.

Next, let’s get the top output for these very processes:

$ top -p 9356,9357
# ...

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                          
   9357 joe       -3   0   45784   5880   3500 R  48,7   0,0   0:12.24 stress-ng                                        
   9356 joe       -3   0   45784   5880   3500 R  46,7   0,0   0:12.25 stress-ng 

We can see that each of them gets roughly half of the single CPU’s time.
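
The length of each turn is the round-robin time slice. As a sketch, and assuming that the kernel exposes this value in the /proc/sys/kernel/sched_rr_timeslice_ms file, we could read it with a helper like the one below. The function name rr_timeslice_ms and its optional path argument are hypothetical:

```shell
# Hypothetical helper: print the round-robin time slice in milliseconds.
# The optional argument overrides the file path, mainly for experiments.
rr_timeslice_ms() {
    file=${1:-/proc/sys/kernel/sched_rr_timeslice_ms}
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "cannot read $file" >&2
        return 1
    fi
}

# Don't fail on systems where the file is missing
rr_timeslice_ms || true
```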

6. Priorities and Preempting

Now let’s understand how the priorities work. So, let’s change the scheduling policy of our simulator shell to FIFO, with priority 2. Let’s type this command in the simulator’s terminal:

$ sudo chrt -f -p 2 $$

Then, let’s start a one-CPU stressor to simulate a long-running task:

$ stress-ng --cpu 1 --timeout 240s

We can observe in the top output that it consumes nearly all of the shield’s only processor:

$ top -p 9235
# ...
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                          
   9235 joe       -3   0   45780   5996   3624 R  95,0   0,0   0:51.22 stress-ng

Next, let’s open a fresh terminal and start another hog outside the shield:

$ stress-ng --cpu 1 --timeout 120s

Assuming that its PID is 9256, let’s switch it to the FIFO policy with priority 3 and move it to the shield:

$ sudo chrt -f -p 3 9256
$ sudo cset shield --sysset=ts-set --userset=rt-set --shield --pid 9256

Afterwards, let’s list the shield’s dwellers:

$ cset shield --sysset=ts-set --userset=rt-set --shield --verbose
cset: "rt-set" cpuset of CPUSPEC(3) with 4 tasks running
   USER       PID  PPID SPPr TASK NAME
   -------- ----- ----- ---- ---------
   joe       4508  2610 Sf_2 bash 
   joe       9234  4508 Sf_2 stress-ng --cpu 1 --timeout 240s 
   joe       9235  9234 Rf_2 stress-ng-cpu [run]              
   joe       9256  9255 Rf_3 stress-ng-cpu [run]              
cset: done

So, both stressors are reported as running and using the FIFO policy now. However, now the top command shows that only the stressor of PID 9256 accesses the CPU:

$ top -p 9235,9256
# ...

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                          
   9256 joe       -4   0   45780   5900   3516 R  95,0   0,0   0:40.77 stress-ng                                        
   9235 joe       -3   0   45780   5996   3624 R   0,0   0,0   2:08.26 stress-ng

So, we’ve seen how the process of higher priority takes over the resource. At the same time, the lower-priority task is preempted (i.e., kicked off the CPU). However, its status remains running (R), as preemption doesn’t put the task to sleep.
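
The selection rule can be illustrated with a toy model. It isn’t kernel code, only a sketch that assumes a single CPU and FIFO tasks that are all runnable. The function name pick_running_task is hypothetical:

```shell
# Toy model, not kernel code: read "PID PRIORITY" lines in arrival order
# and print the PID that would hold the single CPU. The highest priority
# wins, and among equal priorities the earliest arrival wins.
pick_running_task() {
    awk '
        NR == 1 || $2 + 0 > best_prio { best_pid = $1; best_prio = $2 + 0 }
        END { if (NR > 0) print best_pid }
    '
}

# The two stressors from this section: PID 9235 with priority 2 and
# PID 9256 with priority 3
printf '%s\n' "9235 2" "9256 3" | pick_running_task   # prints 9256
```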

7. Room for Time-Sharing

Now, let’s examine a situation that’s a bit different. First, let’s set the priority of the simulator to the highest possible value, 99. So, let’s issue the command in the simulator’s terminal:

$ sudo chrt -f -p 99 $$

Next, we’ll start a stressor in the rt-set shield as before:

$ stress-ng --cpu 1 --timeout 120s

Then, let’s start another task outside the simulator. Assuming that its PID is 5745, let’s add it to rt-set, but without changing its scheduling policy with chrt:

$ stress-ng --cpu 1 --timeout 120s
$ sudo cset shield --sysset=ts-set --userset=rt-set --shield --pid 5745

Finally, let’s take a look at the top output for these processes:

$ top -p 5745,5774
# ...

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND  
   5774 joe       rt   0   45780   5896   3512 R  95,0   0,0   2:11.35 stress-+ 
   5745 joe       20   0   45780   5988   3616 R   5,0   0,0   2:09.65 stress-+ 

In contrast to the prioritized task from the previous example, now the second process acquires a share of the CPU time. To explain that, let’s check the processes’ details:

$ cset shield --sysset=ts-set --userset=rt-set --shield --verbose
cset: "rt-set" cpuset of CPUSPEC(3) with 4 tasks running
   USER       PID  PPID SPPr TASK NAME
   -------- ----- ----- ---- ---------
   joe       4508  5448 Sf99 bash 
   joe       5745  5744 Roth stress-ng-cpu [run]              
   joe       5773  5449 Sf99 stress-ng --cpu 1 --timeout 240s 
   joe       5774  5773 Rf99 stress-ng-cpu [run]               
cset: done

We can observe that the process of PID 5745 runs with the SCHED_OTHER policy, yet the kernel still provides it with some CPU time. The reason is that the kernel limits the CPU time available to real-time tasks. In detail, the /proc/sys/kernel/sched_rt_period_us file holds the length of the accounting period, which is 1000000 microseconds (i.e., 1s) by default. Then, the /proc/sys/kernel/sched_rt_runtime_us file tells us how much of this period real-time tasks may use. By default, it’s 950000 microseconds (i.e., 0.95s). So, the remaining 0.05s is left to keep time-sharing tasks running.
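
The arithmetic behind these two files can be sketched as follows. The function name rt_budget_percent is hypothetical, and the special handling of a negative runtime reflects our understanding that a value of -1 switches the limit off:

```shell
# Hypothetical helper: print the CPU share (in percent) available to
# real-time and time-sharing tasks, given the runtime and the period in
# microseconds.
rt_budget_percent() {
    runtime_us=$1
    period_us=$2
    if [ "$runtime_us" -lt 0 ]; then
        # Assumption: a runtime of -1 removes the limit for real-time tasks
        echo "realtime=100 timesharing=0"
        return 0
    fi
    rt=$(( runtime_us * 100 / period_us ))
    echo "realtime=$rt timesharing=$(( 100 - rt ))"
}

# Default values quoted in the text
rt_budget_percent 950000 1000000   # prints: realtime=95 timesharing=5

# On a live system, we could pass the current settings instead:
# rt_budget_percent "$(cat /proc/sys/kernel/sched_rt_runtime_us)" \
#                   "$(cat /proc/sys/kernel/sched_rt_period_us)"
```

The result matches the 95% and 5% CPU usage reported by top for the two stressors.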

8. Conclusion

In this article, we learned about real-time scheduling in Linux. First, we briefly looked through the different scheduling policies. Then, we used the chrt command to manipulate processes’ policies and priorities.

Next, we created a one-processor cpuset to highlight specific aspects of real-time scheduling. We saw how two processes of the same priority run with round-robin scheduling. Following this, we demonstrated the preempting of a lower priority process in the FIFO scheme. Finally, we discussed sharing the same CPU by the prioritized and normal tasks.