CS 251/EE 255: Real-Time Embedded Systems
University of California, Riverside
HW Project #2: Task Scheduling and Monitoring
Due: 2/23/2022, Wednesday, 11:59 p.m. (Pacific time)
Read the entire handout carefully before starting work. Good luck and happy kernel hacking!
1. Introduction
This homework project is designed to exercise the following topics:
• Accessing task information in the kernel data structures
• Linux scheduler basics
• Timers
Before you start working:
• Read the handout carefully: the background, writeup questions, and tips sections aim to give you
hints about design and implementation.
• Whenever the handout asks you to do something without explaining how to accomplish it, please
consult references on kernel development. That's by design.
• To be awarded full credit, your kernel must not crash or freeze under any circumstances.
2. Background Information
2.1. Periodic Task Parameters
In real-time systems, tasks are conventionally modeled as a periodic sequence of jobs. Jobs are released
every T units of time and each job needs to finish within a relative deadline of D units of time from its
release. To guarantee schedulability, we only admit a task into the system if the schedulability of the
resulting task set is guaranteed. To perform a schedulability test, it is necessary to calculate the amount of
resources each task uses, e.g., task τ_i's utilization U_i = C_i / T_i, or its worst-case response time.
However, our guarantee relies on each task never exceeding its allowed execution time, C. Unfortunately,
tasks are not that obedient in practice and may overrun their budget due to an inaccurately estimated
execution time, timing penalties from memory accesses, etc. To maintain the timing safety of the system in
light of this, you will implement a simple monitoring framework in the kernel that keeps track of each
task's budget (i.e., at most C units of computation time every T).
2.2. Processes and Threads
In the Linux kernel, the kernel object corresponding to each process/thread is a task. Every process/thread
running under Linux is dynamically allocated a struct task_struct structure, called Task Control
Block (TCB). The TCB is declared in /include/linux/sched.h.
The Linux kernel assigns a task ID to each task. Hence, both processes and threads have task IDs, and each
task's ID is stored in its TCB, in the member variable pid_t pid (not 'tid', for historical reasons). In a
single-threaded process, the task ID is equal to the process ID. In a multi-threaded process, all threads have
the same process ID, but each has a unique task ID. The process ID is also called the thread group ID
(TGID) and is stored in the pid_t tgid field of the TCB.
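The distinction can be observed from userspace. The hypothetical sketch below (not part of the assignment) compares getpid() with the gettid syscall in the main thread and in a child thread; gettid is invoked via syscall() for portability to older glibc:

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <pthread.h>
#include <sys/syscall.h>

static pid_t my_tid(void)
{
    return (pid_t)syscall(SYS_gettid);   /* task ID (the LWP column) */
}

static void *worker(void *arg)
{
    pid_t *tid = arg;
    *tid = my_tid();                     /* record child thread's task ID */
    return NULL;
}

/* Returns 1 if the observed IDs follow the TGID rules described above. */
int check_task_ids(void)
{
    pthread_t th;
    pid_t child_tid = 0;

    pthread_create(&th, NULL, worker, &child_tid);
    pthread_join(th, NULL);

    /* Leader thread: task ID == process ID (TGID). */
    int leader_ok = (my_tid() == getpid());
    /* Child thread: same process ID, but a different task ID. */
    int child_ok = (child_tid != getpid());
    return leader_ok && child_ok;
}
```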
You can use ps and top commands in shell to view the list of running processes and (optionally) threads.
You can also see the family relationship between the running processes in a Linux system using the pstree
command.
For example, type “ps -eLfc” in the command line.
$ ps -eLfc
UID PID PPID LWP NLWP CLS PRI STIME TTY TIME CMD
root 1 0 1 1 TS 19 Nov29 ? 00:00:09 /sbin/init
root 2 0 2 1 TS 19 Nov29 ? 00:00:00 [kthreadd]
...
root 10 2 10 1 TS 19 Nov29 ? 00:00:00 [rcu_bh]
root 11 2 11 1 FF 139 Nov29 ? 00:00:00 [migration/0]
root 12 2 12 1 TS 19 Nov29 ? 00:00:00 [cpuhp/0]
...
pi 804 560 804 3 TS 19 Nov29 ? 00:00:00 /usr/lib/gvfs/gvfsd-trash...
pi 804 560 809 3 TS 19 Nov29 ? 00:00:00 /usr/lib/gvfs/gvfsd-trash...
pi 804 560 810 3 TS 19 Nov29 ? 00:00:00 /usr/lib/gvfs/gvfsd-trash...
...
This command prints all tasks, including processes and threads, running in the system. The second column,
PID, displays the process ID of a task, and the fourth column, LWP, shows the actual task ID. In most lines
above, the PID and LWP fields are the same; such a task is either a single-threaded process or the leader
thread (also called the master or main thread) of a multi-threaded process. In fact, PID 804 is a multi-threaded
process: in its last two lines, PID is still 804 but LWP is different. This means these two tasks,
809 and 810, are the child threads of the process 804.
The TCBs of tasks are maintained in a doubly-linked list in the kernel. To find a task's TCB given a task ID
(pid), you can use:
    find_task_by_pid_ns(pid, &init_pid_ns);
Each task is associated with a priority level. In Linux, the priority value range is divided into general-purpose
priorities and real-time priorities (see the rt_priority field in struct task_struct). All real-time
priorities supersede all general-purpose priorities. A real-time task is a task whose scheduling class is
a real-time class and whose rt_priority value is within the real-time priority range (1 to 99). The chrt
command can be used to give a process/thread a real-time priority from a shell terminal;
sched_setscheduler() is the system call to set a real-time priority from C programs.
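A minimal usage sketch of the latter is shown below; note that moving a task into a real-time class requires root privileges, so an unprivileged run fails with EPERM:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <string.h>

/* Try to give the calling task SCHED_FIFO real-time priority `prio`
 * (1..99).  Returns 0 on success, -1 on failure with errno set
 * (typically EPERM when not running as root). */
int make_rt_fifo(int prio)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = prio;
    return sched_setscheduler(0, SCHED_FIFO, &sp); /* 0 = calling task */
}
```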
Let’s take a look at the above “ps -eLfc” example again. The scheduling class and priority of tasks are
shown in the CLS and PRI columns, respectively. For convenience, Linux displays the priorities of the time-sharing
scheduling classes (e.g., SCHED_OTHER) in the range of 0 to 40 and those of the real-time scheduling
classes (e.g., SCHED_FIFO, SCHED_RR) in the range of 41 to 139 (meaning real-time priority 1 to 99). See
the line for PID 11. Its CLS is FF, meaning its scheduling class is SCHED_FIFO; its PRI is 139,
indicating its real-time priority is 99 (the highest priority level). You can verify this observation with
‘chrt -p <pid>’ in a shell.
2.3. Task Scheduling
To keep track of the amount of computation resources a task consumes, the task will need to be followed
throughout its lifetime: birth, context switch in, context switch out, and death.
In lecture, you were introduced to Linux processes and threads. The kernel shares the available CPU(s)
among the tasks in the system. The kernel must be able to suspend the instruction stream of one task and
resume the instruction stream of another previously-suspended task. This activity is referred to as a context
switch.
When executing on one CPU, all tasks share the CPU registers. Hence, after suspending a task, the kernel
must save the values of registers to restore them when the task is resumed. The set of data that must be
loaded into the registers before the task resumes its execution on the CPU is called the hardware context.
A context switch involves saving the task control block (TCB, struct task_struct) of the task being
switched out and replacing it with that of the task being switched in. The context switch in the Linux
kernel is initiated from one well-defined point: the __schedule() function in kernel/sched/core.c.
static void __sched notrace __schedule(bool preempt)
{
    struct task_struct *prev, *next;
    unsigned long *switch_count;
    unsigned long prev_state;
    struct rq_flags rf;
    struct rq *rq;
    int cpu;

    cpu = smp_processor_id();   /* get current CPU ID */
    rq = cpu_rq(cpu);
    prev = rq->curr;
    ...
The __schedule() function is central to the kernel scheduler. Its objective is to find a task in the runqueue
(a list of tasks ready to run) and assign the CPU to it. The function involves several kernel routines.
It sets a local variable prev, which points to the TCB of the currently-running task on the current CPU.
    ...
    next = pick_next_task(rq, prev, &rf);
    clear_tsk_need_resched(prev);
    clear_preempt_need_resched();

    if (likely(prev != next)) {
        rq->nr_switches++;
        ...
        rq = context_switch(rq, prev, next, &rf);
    } else {
        ...
    }
}
Next, it picks the next task to run and sets a local variable next to that task's TCB. If next is different
from prev, a context switch takes place. If no runnable task has a higher priority than the current
task, then next coincides with prev and no context switch occurs.
2.4. Timing
The in-kernel timer of choice for this project is the high-resolution hrtimer. Documentation with plenty
of crucial usage information is available in the kernel source tree at
Documentation/timers/hrtimers.rst and in include/linux/hrtimer.h. Examples are
available at https://developer.ibm.com/tutorials/l-timers-list/. Understand the various modes and choose the
best ones for your purposes. Do not forget to cancel a timer after use or when it is no longer needed; not
doing so is a recipe for resource leaks and kernel panics.
Hrtimer works with time values of the type ktime_t, which is a 64-bit signed integer representing time in
nanoseconds.
A good, but not the only, function for getting the current time in the kernel is ktime_get().
3. Build Directory
You need to create a subfolder proj2/ in the top-level directory of your repository. Your source files to
be implemented will fall into three categories:
Category Location in kernel repo Build method
built-in kernel code proj2/kernel built together with the kernel image
kernel modules proj2/modules standalone using the kernel build system
user-space applications proj2/apps standalone using user-space libraries
Follow the instructions in the HW #1 handout to set up the Makefile (and Kbuild for kernel source code).
4. Assignment (total 120 points)
4.1. Periodic Real-time User-level Test Program (no points; needed for testing your kernel)
Source code location: /proj2/apps/periodic/periodic.c
This user-level application takes C, T, and CPUID arguments as input and busy-loops for C
milliseconds every T milliseconds on CPU CPUID. It supports C and T values up to 10,000 ms (10
secs). C and T values should be greater than 0, with C <= T. CPUID should be greater than or equal to 0 (0
means the first CPU core). The program is runnable on the stock kernel too: it does not rely on any of your
modifications to the kernel. The program continues to run until a user presses ‘Ctrl-C’ or terminates
it by sending a KILL signal.
Download periodic.tgz from Canvas (Modules – Projects). Extract the file and build by:
$ mv periodic.tgz /proj2/apps/
$ cd /proj2/apps
$ tar xvfz periodic.tgz # extract
$ cd periodic
$ ls
Makefile periodic.c
$ make # build
Once compiled successfully, you can run it with some input parameters and this program will print out its
PID and the given input parameters. See the below example for formatting.
Example:
#### run with C=100ms, T=300ms on CPU0
$ ./periodic 100 300 0
PID: 892, C: 100, T: 300, CPUID: 0
^C
$
#### error case (C > T): C=300ms, T=100ms, CPUID=1
$ ./periodic 300 100 1
Invalid arguments: C 300, T 100, CPUID 1
$
#### run two instances in background, the one on CPU0 and the other on CPU1
$ ./periodic 100 500 0 & ### ‘&’ is to run in background
[1] 1203
PID: 1203, C: 100, T: 500, CPUID: 0
$ ./periodic 200 1000 1 &
[2] 1207
PID: 1207, C: 200, T: 1000, CPUID: 1
$
#### bringing back to foreground and terminate them
$ fg
./periodic 200 1000 1
^C
$ fg
./periodic 100 500 0
^C
$
Given that you are using a VM, the timing error of this program is expected to be around 20-50 ms, so use
C and T parameters in units of 100 milliseconds (e.g., 100, 200, 300, etc.)
How to verify this app? You may want to check precisely whether this app is working as expected. One way is
to use trace-cmd (a command-line tool for event tracing) and kernelshark (its GUI front end).
Install as follows:
$ sudo apt install trace-cmd kernelshark
Then, you can record scheduling events while the periodic application is running. For example:
$ ./periodic 100 200 1 & ### ‘&’ is to run in background
[1] 1207
PID: 1207, C: 100, T: 200, CPUID: 1
$
### Record scheduling events of PID 1207 for 5s
### - It takes longer than 5s to complete
### - If you want to record all tasks, try “sudo trace-cmd record -e sched sleep 5”
$ sudo trace-cmd record -e sched_switch -P 1207 sleep 5
CPU 1: 67314 events lost
CPU0 data recorded at offset=0x577000
0 bytes in size
CPU1 data recorded at offset=0x577000
515227648 bytes in size
...
### Terminate the periodic task after recording
$ fg
./periodic 100 200 1
^C
$
After this, you should be able to see trace.dat file generated in the same directory. Launch the following
command to see the data.
$ kernelshark ### or “kernelshark trace.dat”
This is a screenshot of kernelshark. By default, it displays all recorded events. You may want to filter
out unrelated events by clicking Filters > Show tasks in the menu and choosing only the tasks
you are interested in.
If you want to trace two or more tasks, append -P <pid> for each of them. For example, to trace three
tasks with PIDs 1207, 1208, and 1209 for 3 seconds:
$ sudo trace-cmd record -e sched_switch -P 1207 -P 1208 -P 1209 sleep 3
4.2. Setting Task’s Timing Parameters (20 points)
Source code location: /proj2/kernel/
(If you want to implement it as an LKM, store the source code in /proj2/modules/ and
indicate in the writeup that LKM needs to be loaded for testing)
As a first step to develop a task monitoring framework, your job is to add system calls that set periodic task
timing parameters for a given task.
First, add timing parameters in a TCB:
ktime_t C, T; // C: execution time, T: period
The C and T parameters in the TCB should be in the ktime_t type so that they can be directly used with
hrtimers. Recall that ktime_t represents time in nanoseconds!
Then implement the following two syscalls:
int set_rtmon(pid_t pid, unsigned int C_ms, unsigned int T_ms);
int cancel_rtmon(pid_t pid);
Use the syscall number 443 for set_rtmon, and 444 for cancel_rtmon.
The pid argument is the target task's ID. The C_ms and T_ms arguments of set_rtmon are time values
in milliseconds (ms). The maximum C_ms and T_ms allowed is 10000 ms (= 10 secs) and the minimum is
1 ms. The roles of these two syscalls are simple: set_rtmon sets the C and T parameters for the target task in
its TCB, and cancel_rtmon clears these parameters for the target task (i.e., sets C and T to zero).
Your syscalls are supposed to check that the input arguments are all valid (e.g., a valid pid; C_ms and T_ms
within the maximum/minimum range; don’t trust the provided user-level program!). The syscalls should
return 0 if there is no error, and -1 if there was any error. If set_rtmon is called for a task that has non-zero
C and T parameters in its TCB, it should return -1. Likewise, if cancel_rtmon is called for a task
with zero C and T parameters, it should return -1.
The syscalls should work with processes as well as threads. When the given pid is of a thread, the setting
should apply to that thread and not to any of its siblings, parent or children.
4.3. Printing Task’s Timing Parameters (20 points)
Source code location: /proj2/kernel/
(If you want to implement it as an LKM, store the source code in /proj2/modules/)
Implement a syscall that prints out task timing parameters (C and T values).
int print_rtmon(pid_t pid);
Use the syscall number 445 for print_rtmon.
The pid argument is the target task's ID. print_rtmon prints out the target task’s C and T parameters
with its task ID using printk. If pid is -1, then print_rtmon should print the C and T values of all tasks
whose C and T values are not zero. The syscalls should work with processes as well as threads.
print_rtmon should return 0 if it found a task with pid or pid is -1. Otherwise, it returns -1.
For example:
# print_rtmon(1295);
[ 256.201952] print_rtmon: PID 1295, C 200 ms, T 1200 ms
# print_rtmon(1302);
[ 271.129505] print_rtmon: PID 1302, C 0 ms, T 0 ms
# print_rtmon(-1);
[ 312.201952] print_rtmon: PID 1295, C 200 ms, T 1200 ms
[ 312.202015] print_rtmon: PID 1323, C 100 ms, T 500 ms
[ 312.202035] print_rtmon: PID 1324, C 300 ms, T 800 ms
In this example, PID 1302 prints all zero values, meaning that no value has been set for this task. There are
a total of three tasks, PID 1295, 1323, 1324, whose C and T values are non-zero (set by set_rtmon).
You must print timing values in milliseconds, not in nanoseconds.
4.4. Periodic Task Support (20 points)
Source code location: /proj2/kernel/
(If you want to implement it as an LKM, store the source code in /proj2/modules/)
Add an additional syscall to facilitate the implementation of periodic tasks under your monitoring
framework:
int wait_until_next_period(void);
Use the syscall number of 446. When a task with non-zero C and T values makes this call, the kernel
suspends the task until the beginning of the next period of this task. In order to implement this feature, you
would need to start a periodic timer when timing parameters are set for a task by set_rtmon. Use hrtimer
to implement the periodic timer.
The syscall should return 0 when the calling task has non-zero C and T values, and -1 otherwise.
An example of using this syscall to implement a periodic task is:
set_rtmon(pid, C_ms, T_ms);
while (1) {
// do computation
wait_until_next_period();
}
Test your kernel with periods of tens to hundreds of milliseconds. Kernelshark will be useful to verify
correctness. Small timing errors (< 10 ms in most cases, with intermittent spikes) are acceptable in a VM.
IMPORTANT: Clean up any timer used upon task termination and rtmon cancellation. Remember that a
task may be terminated at any time by ‘Ctrl-C’, so cancel_rtmon may not be explicitly called by the task.
Your kernel should take care of such cases, and it must never crash.
4.5. Computation Time Tracking (20 points)
For each task with non-zero C and T values in the task's TCB, keep track of the cumulative CPU time used
by the task within each period (i.e., from the beginning of a period to the end of that period; and across
context switches). Reset this time accumulator at the start of every period.
When the kernel finds that a task has reached or exceeded its budget (execution time of C per T), print out
the following message to the kernel log:
PID 123: budget overrun (C xx ms, T yy ms, actual zz ms)
where xx and yy are the original C and T parameters set by set_rtmon and zz is the actual cumulative
CPU time used in this period.
Note:
1) Do NOT use existing time-accumulating variables that may already exist in struct task_struct;
implement your own time tracking feature.
2) Do NOT use a system-wide periodic timer to periodically count how many times you see the task running.
That logic cannot accurately capture the execution time of a task with a short period. Under ideal
conditions (e.g., a native Linux machine), your logic should be able to capture execution time with
sub-millisecond accuracy.
4.6. RT Priority Assignment (20 points)
Automatically assign a real-time priority to a task when an rtmon is set via the set_rtmon syscall. Use
Rate-Monotonic scheduling to determine the real-time priority (i.e., tasks with shorter periods get
higher RT priority). Use the SCHED_FIFO scheduling class. Tasks with the same period should get the same
RT priority.
You MUST use the RT priority from 50 (lowest) to 70 (highest). It is safe to assume that the maximum
number of active rtmons in the system does not exceed 20. But tasks can be created and terminated anytime
(e.g., 15 tasks created first, 10 terminated, and then 10 tasks created again). Terminated tasks should not be
considered by the priority assignment.
Keep in mind that whenever a new rtmon is set, the assigned RT priorities may need to be readjusted; for
example, in the case where a new task has a shorter period than existing ones.
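The readjustment logic can be prototyped in plain C before moving it into the kernel. The sketch below (names are illustrative, not required) maps each active period to a priority in the 50..70 band, giving shorter periods higher values and equal periods equal values:

```c
/* Given the periods T[] (in ms) of the n active rtmons (n <= 20),
 * write each task's SCHED_FIFO priority into prio[]: the shortest
 * period gets 70, the next distinct period 69, and so on; tasks
 * with equal periods share a priority.  With at most 20 distinct
 * periods, values never drop below 51, staying in the 50..70 band. */
void assign_rm_priorities(const unsigned int *T, int *prio, int n)
{
    for (int i = 0; i < n; i++) {
        /* Rank = number of distinct periods strictly shorter than T[i]. */
        int rank = 0;
        for (int j = 0; j < n; j++) {
            if (T[j] < T[i]) {
                int dup = 0;
                for (int k = 0; k < j; k++)
                    if (T[k] == T[j])
                        dup = 1;      /* period already counted */
                if (!dup)
                    rank++;
            }
        }
        prio[i] = 70 - rank;          /* 70 = highest, for shortest T */
    }
}
```

For example, periods {250, 500, 1000, 500} map to priorities {70, 69, 68, 69}. A real implementation would call this whenever an rtmon is set or cancelled, then apply the results with the in-kernel equivalent of sched_setscheduler().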
4.7. Enforced Budget (Bonus: 10 points)
Ensure that a task with an active rtmon (non-zero C and T values) executes for up to C time units in each
period and is suspended immediately when the budget is exhausted (so the task never runs more than C
per T). The task will wake up at the start of the next period, when the budget is replenished. See the Tips
section for suspending and resuming tasks and hrtimers.
If you have implemented computation time tracking (Sec 4.5), it should print the budget overrun message
whenever the budget is enforced. The ‘actual’ field of the message should be close to the original C value.
Hint: You will need an additional hrtimer for each task in order to enforce task execution time. Recall that
the requirement is to suspend the task as soon as its cumulative execution time has reached C.
4.8. Writeup (20 points)
Writeup location: /proj2/proj2_team00.pdf or proj2_team00.txt
Submit a plain text or PDF document with brief answers to the following questions:
1. Run three instances of the periodic program (periodic.c) and capture their execution traces for 3
seconds using trace-cmd. Use the following task parameters:
- Task 1: C = 40 ms, T = 250 ms, CPUID = 1
- Task 2: C = 60 ms, T = 500 ms, CPUID = 1
- Task 3: C = 200 ms, T = 1000 ms, CPUID = 1
(If your system has only one CPU, use CPUID = 0)
To launch the tasks at the same time:
$ ./periodic 40 250 1 & ./periodic 60 500 1 & ./periodic 200 1000 1 &
Attach a screenshot of kernelshark.
2. Run the same taskset, but after launching them, assign real-time priorities to each task by using
chrt on the command line (e.g., $ sudo chrt -f -p 80 <pid>; see Lecture 7 slides). Use
SCHED_FIFO.
- Task 1: C = 40 ms, T = 250 ms, CPUID = 1 -- real-time priority: 80
- Task 2: C = 60 ms, T = 500 ms, CPUID = 1 -- real-time priority: 79
- Task 3: C = 200 ms, T = 1000 ms, CPUID = 1 -- real-time priority: 78
Attach a screenshot of kernelshark.
3. Compare the two results and give an explanation of task behavior.
4. Give a brief summary of the contributions of each member.
5. How to submit?
You must use git to submit all your work for this project, including source code, makefiles, and the
answers to the writeup questions. Please make sure that all your commits to the source code and the writeup
are pushed into the master branch of your team repo, and they are all visible at the BitBucket website.
The code that will be graded has to be pushed to the remote origin (BitBucket) by the deadline. If you miss
any file and submit it late, there will be a late submission penalty.
A fresh clone of the submitted version of your repository should compile on the grader's machine.
Do not use absolute file paths for header file inclusion (e.g., #include "/home/user/foo/bar/file.h" –
this will likely create compile errors on a different machine).
6. Tips
6.1. General
• Look into include/linux/sched.h for TCB definition.
• sched.h is used by many kernel source files, so whenever you modify it, the kernel compile takes a
long time because a large portion of the kernel needs to be rebuilt. Try to add all necessary variables to the
TCB at once to avoid repeated long compilations.
• You may need to install VirtualBox Guest Additions again when the TCB data structure is modified.
• If you don’t see any “[M]” symbol when recompiling the kernel, that means no kernel module has
been recompiled and you can skip “sudo make modules_install” to save time. But if you are unsure,
follow all the steps: make, sudo make modules_install, and then sudo make install (and of course,
reboot with the new kernel).
6.2. Task management
• You can pin a task to a core using sched_setaffinity().
• grep inside the core kernel folder for the macros for_each_process and
for_each_process_thread.
• Remember to protect accesses to shared data structures. Some of the ones you will need may be
protected by semaphores, mutexes, spinlocks or RCUs. When in doubt, look at how the needed
data structure is accessed elsewhere in kernel code.
• wake_up_process() can be used to wake up a suspended task.
• How to suspend a task? Basically, each task maintains its state in its TCB (e.g., TASK_RUNNING
for running or ready tasks; TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE for non-ready,
blocked tasks). If the currently-running task changes its state from TASK_RUNNING to
TASK_INTERRUPTIBLE and schedule() is invoked, the scheduler will find another ready task
and switch to it.
• There are two ways to invoke the scheduler: directly and indirectly. If the kernel is in task context
(e.g., running a system call function), the scheduler can be invoked directly by calling schedule().
If the kernel is in interrupt context, the scheduler should be invoked indirectly by using
set_tsk_need_resched(). Either way, the scheduler runs on the current CPU before
returning to user space.
• How to check each task’s CPU time usage? A task consumes CPU time from when it is context-switched
in to when it is switched out. So you can measure and compare timestamps within the
schedule function (e.g., right before the context switch function).
6.3. hrtimer
• Hrtimer handlers can be executed in interrupt context. This means hrtimer handlers cannot
sleep and should not use blocking functions like msleep() and semaphore/mutex locks
(*_trylock functions are usable as they do not block).
• By default, on a multi-processor or multi-core platform, the handlers of hrtimers may migrate freely
among cores. This may complicate the synchronization and scheduler-invocation issues that need
to be handled in your implementation.
• To force the handlers of hrtimers to execute on the core on which the timer was started, start the
hrtimer with the HRTIMER_MODE_PINNED bit set in the mode bitmask (e.g., for absolute time, use
“HRTIMER_MODE_ABS | HRTIMER_MODE_PINNED” or “HRTIMER_MODE_ABS_PINNED” for the
mode parameter; see hrtimer.h).
• If you define hrtimer objects in a TCB, the container_of() macro can be used in an hrtimer
callback function to retrieve the corresponding TCB.
• Your kernel must cancel a task's hrtimers when the task finishes. Recall that any terminating task issues
an exit() syscall (even those terminated by Ctrl-C).
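The container_of() idiom itself can be tried out in plain userspace C. The macro below mirrors the kernel's definition; given a pointer to a member embedded in a struct (as an hrtimer callback receives), it recovers a pointer to the enclosing struct. The struct names are illustrative stand-ins for a TCB with an embedded timer:

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for an embedded timer object (e.g., a struct hrtimer). */
struct fake_timer { int armed; };

/* Stand-in for a TCB that embeds the timer. */
struct fake_tcb {
    int pid;
    struct fake_timer timer;
};

/* Given only a pointer to the embedded timer (as a timer callback
 * receives), recover the enclosing "TCB" and return its pid. */
int pid_from_timer(struct fake_timer *t)
{
    struct fake_tcb *tcb = container_of(t, struct fake_tcb, timer);
    return tcb->pid;
}
```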
References
1. Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman, Linux Device Drivers, Third Edition,
O'Reilly Media, 2005. http://lwn.net/Kernel/LDD3/
2. Daniel P. Bovet and Marco Cesati, Understanding the Linux Kernel, Third Edition, O'Reilly Media,
2005. ISBN: 978-0596005658.
3. Robert Love, Linux Kernel Development, Third Edition, Addison-Wesley Professional, 2010.
ISBN: 978-0672329463.