Supporting Multiprocessor Resource Synchronization Protocols in RTEMS

by Junjie Shi, et al.
TU Dortmund
University of Twente

When considering recurrent tasks in real-time systems, concurrent accesses to shared resources can cause race conditions or data corruption. This problem has been extensively studied since the 1990s, and numerous resource synchronization protocols have been developed for both uni-processor and multiprocessor real-time systems, under the assumption that the implementation overheads are negligible. In practice, however, the implementation overheads may affect the performance of different protocols depending on the scenario, e.g., whether resources are accessed locally or remotely, and whether tasks spin or suspend themselves when the requested resources are not available. In this paper, to show the applicability of different protocols in real-world systems, we detail the implementation of several state-of-the-art multiprocessor resource synchronization protocols in RTEMS. To study the impact of the implementation overheads, we deploy the implemented protocols on a real platform with synthetic task sets. The measured results illustrate that the overheads of the developed resource synchronization protocols in RTEMS are comparable to those of the existing protocol, i.e., MrsP.



I Introduction

In multi-tasking real-time systems, accesses to shared resources, e.g., files and memory cells, are mutually exclusive in order to prevent race conditions or data corruption. A code segment in which a task accesses shared resource(s) is called a critical section, and it is protected by binary semaphores or mutex locks. That is, a task must finish its execution of the critical section before another task can access the same resource. However, the mutually exclusive execution of critical sections may cause other problems, e.g., priority inversion and deadlock, which could jeopardize the predictability of the real-time system. In order to guarantee the timeliness of real-time systems, many resource synchronization protocols have been developed and analyzed since the 1990s for both uni-processor and multiprocessor real-time systems.

In uni-processor real-time systems, the Priority Inheritance Protocol (PIP) and the Priority Ceiling Protocol (PCP) by Sha et al. [DBLP:journals/tc/ShaRL90], as well as the Stack Resource Policy (SRP) by Baker [DBLP:journals/rts/Baker91], have been widely studied. Since PIP may lead to deadlocks, whose absence requires additional verification [DBLP:conf/icfem/GadiaAB16], PCP has become relatively common and its performance has been widely accepted. Specifically, a variant of PCP has been implemented in Ada (named Ceiling locking) and in POSIX (named Priority Protect Protocol).

Because of the increasing demand for computational power in real-time systems, multiprocessor platforms have been widely used. Many multiprocessor resource synchronization protocols have been proposed and extensively studied in the domain, such as the Distributed Priority Ceiling Protocol (DPCP) [DBLP:conf/rtss/RajkumarSL88], the Multiprocessor Priority Ceiling Protocol (MPCP) [Rajkumar_1990], the Multiprocessor Stack Resource Policy (MSRP) [DBLP:conf/rtss/GaiLN01], the Flexible Multiprocessor Locking Protocol (FMLP) [block-2007], the O(m) Locking Protocol (OMLP) [DBLP:conf/rtss/BrandenburgA10], the Multiprocessor Bandwidth Inheritance (M-BWI) [DBLP:conf/ecrts/FaggioliLC10], gEDF-vpr [DBLP:journals/rts/AnderssonE10], LP-EE-vpr [DBLP:journals/rts/AnderssonR14], the Multiprocessor resource sharing Protocol (MrsP) [DBLP:conf/ecrts/BurnsW13], the Resource-Oriented Partitioned PCP (ROP-PCP) [RTSS2016-resource], the Dependency Graph Approach (DGA) for frame-based task sets [Chen-Dependency-RTSS18], and its extension for periodic task sets (HDGA) [Shi-Dependency-RTAS2019].

Although the protocols above provide timing guarantees by bounding the worst-case response time of tasks, most of them rely on the assumption that the overheads introduced by the implementation are negligible. This assumption needs to be reconsidered. Depending on the setting, e.g., local or remote execution of critical sections, the multiprocessor scheduling paradigm, and the tasks' waiting semantics, the performance of a protocol depends heavily on its implementation. For example, under a suspension-based synchronization protocol, tasks that are waiting for access to a shared resource (i.e., the resource is locked by another task) are suspended. This strategy frees the processor so that it can be used by other ready tasks, which improves processor utilization, but it also increases the context-switch overhead due to the extra enqueue and dequeue operations for each suspension. In contrast, under a spin-based synchronization protocol, the task does not give up the processor and waits by spinning until it can access the requested resource and start its critical section, which is efficient when the critical sections are short [han-2014].
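The two waiting semantics can be illustrated with a minimal sketch. It uses Python threads and the standard threading.Lock as a stand-in for a semaphore; the helper names (acquire_spinning, acquire_suspending, hold_briefly) are hypothetical, and the sketch says nothing about how RTEMS realizes either strategy.

```python
import threading
import time


def acquire_spinning(lock: threading.Lock) -> int:
    """Spin-based waiting: keep polling without giving up the processor.

    Returns the number of polling attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        if lock.acquire(blocking=False):   # non-blocking try-lock
            return attempts


def acquire_suspending(lock: threading.Lock) -> None:
    """Suspension-based waiting: the caller is blocked until the lock is free."""
    lock.acquire()


def hold_briefly(lock: threading.Lock, started: threading.Event, seconds: float) -> None:
    """Helper that plays the role of the current resource owner."""
    with lock:
        started.set()                      # signal that the lock is now held
        time.sleep(seconds)                # simulated critical section
```

The spinning variant trades processor time for the absence of enqueue and dequeue operations, which matches the observation above that spinning pays off only when critical sections are short.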

In fact, only a few of these protocols are officially supported in the two real-time operating systems that are popular in the domain: the Linux Testbed for Multiprocessor Scheduling in Real-Time Systems (LITMUS^RT) [calandrino2006litmus] and the Real-Time Executive for Multiprocessor Systems (RTEMS) [rtems]. LITMUS^RT is an experimental platform for timing analysis, mainly for academic use. Brandenburg et al. implemented DPCP, MPCP, and FMLP [brandenburg2008implementation], Catellani et al. implemented MrsP [DBLP:conf/adaEurope/CatellaniBHM15], and Shi et al. consolidated the implementation of MrsP [shi-OSPERT17]. In addition, the recently developed DGA and its extension for periodic tasks, HDGA, have been implemented by Shi et al. [DGALITMUS, HDGALITMUS]. In contrast, RTEMS is an open-source real-time operating system that is popular for industrial applications and has been widely used in many fields, e.g., space flight, medical, and networking applications. However, in RTEMS, only MrsP, implemented by Catellani et al. [DBLP:conf/adaEurope/CatellaniBHM15], is officially supported in the upstream repository.

Therefore, we believe it is beneficial to provide comprehensive support for resource synchronization protocols in RTEMS for related research. With such support, system designers can assess the performance of resource synchronization protocols in practice, and optimizations of the implementations can be discussed. In this work, we focus on resource synchronization protocols that are based on (semi-)partitioned scheduling, detailed as follows:

  • Partitioned Schedule: Each task is assigned to a dedicated processor, and each processor maintains its own ready queue and scheduler. Tasks are not allowed to migrate among processors, e.g., MPCP.

  • Semi-partitioned Schedule: Unlike the pure partitioned schedule, the semi-partitioned schedule allows tasks to migrate to other processors under certain conditions. For example, in DPCP and ROP-PCP, shared resources are assigned to processors, and the critical sections have to be executed on the corresponding processors, which may differ from the original partition of a task.

Our Contribution in a nutshell: We enhance RTEMS with the aforementioned multiprocessor resource synchronization protocols and discuss how to revise the kernel with RTEMS Symmetric Multiprocessing (SMP) support.

  • To support open-source development, we review the SMP support of RTEMS and point out potential pitfalls during the implementation, so that the insights can be reused on other platforms (see Section III).

  • We detail the development of three multiprocessor resource synchronization protocols, i.e., MPCP, DPCP, and FMLP, and their variants in RTEMS (see Section IV).

  • To study the impact of the implementation overheads, we deploy our implementations on a real platform with synthetic task sets (see Section V). The measured overheads of our implementations are comparable to those of the existing implementation of MrsP in RTEMS, which illustrates the applicability of our implementations.

The patches have been released under the MIT license in [RTEMS-protocols] for RTEMS 4.12. Please note that this release branch was planned to be the latest release, but significant changes warranted bumping the major version number from 4 to 5. Applying our patches to RTEMS 5 therefore requires some additional adaptation.

II System Model

We consider a task set $\mathbf{T}$ consisting of $n$ recurrent tasks to be scheduled on $M$ symmetric and identical (homogeneous) processors. All tasks can have multiple (non-nested) critical sections, and each critical section accesses one of the $Z$ shared resources. Each task $\tau_i$ is described by a tuple $(C_i, A_i, T_i, D_i, P_i)$, where:

  • $C_i$ is the worst-case execution time (WCET) of task $\tau_i$.

  • $A_i$ is the set of resource(s) that $\tau_i$ requests.

  • $T_i$ is the period of task $\tau_i$.

  • $D_i$ is the relative deadline of task $\tau_i$. To fulfill its timing requirements, a job of $\tau_i$ released at time $t$ must finish its execution before its absolute deadline $t + D_i$. We consider constrained-deadline task systems, i.e., $D_i \leq T_i$ for every task $\tau_i \in \mathbf{T}$.

  • $P_i$ is the priority of task $\tau_i$.
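The task model can be written down as a small data structure. The following is a minimal sketch, assuming illustrative field names (wcet, resources, period, deadline, priority) that are not taken from the paper; it only encodes the constrained-deadline condition and the absolute deadline of a job.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Task:
    """One recurrent task; all field names are illustrative assumptions."""
    name: str
    wcet: float                  # worst-case execution time
    resources: FrozenSet[str]    # shared resources the task requests
    period: float                # separation between job releases
    deadline: float              # relative deadline
    priority: int                # fixed priority assigned by the designer

    def __post_init__(self) -> None:
        if self.wcet < 0 or self.period <= 0:
            raise ValueError("WCET must be non-negative and period positive")
        if self.deadline > self.period:
            raise ValueError("constrained deadline requires deadline <= period")

    def absolute_deadline(self, release_time: float) -> float:
        """A job released at release_time must finish by this instant."""
        return release_time + self.deadline


# Hypothetical example values, chosen only to exercise the checks.
task = Task("tau_1", wcet=2.0, resources=frozenset({"r1"}),
            period=10.0, deadline=8.0, priority=1)
```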




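[Figure: flowchart of the lock directive. Decision: owner == NULL. If an owner exists, the wait block _SEM_Wait_For_ownership enqueues the task using the TQ functions of the semaphore variant; otherwise the claim block _SEM_Claim_Ownership is executed.]

Fig. 1: Workflow of the lock directive. The wait block (_SEM_Wait_For_ownership) and the claim block (_SEM_Claim_Ownership) are specified according to the adopted protocols.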

III Symmetric Multiprocessing Support in RTEMS

RTEMS allows users to implement new resource synchronization protocols by strictly following the RTEMS API. To create a new semaphore, the SEM_Initialize function is called to define the attributes specified for each resource synchronization protocol. Besides the creation of the semaphore, which is protocol-specific, some components are similar for all protocols, i.e., the lock and unlock directives, the application configuration, and the migration mechanism. These common components are introduced in this section.

III-A Lock and Unlock Directives

The workflow of the lock directive is shown in Fig. 1. Once a task requests a shared resource, it tries to lock the corresponding semaphore. After selecting the right semaphore variant, denoted as SEM, the task calls the corresponding _SEM_Seize function. Then, the ownership of the semaphore is checked by getting the owner from the Thread queue Control. If the semaphore is locked by another task, the requesting task has to wait for the owner to release the semaphore. The detailed operations in the wait block (_SEM_Wait_For_ownership) are specified according to the design of each protocol. If there is no owner yet, the requesting task is set as the owner of the semaphore and starts the execution of its critical section. The operations in the claim block (_SEM_Claim_Ownership) can also differ depending on the design of each protocol.

The workflow of the unlock directive is shown in Fig. 2. It is called when a task has finished the execution of its critical section and releases the lock of the semaphore. The unlock directive selects the right _SEM_Surrender function, which checks whether the calling task is the current owner of the semaphore. If the task is not the owner, the semaphore cannot be unlocked. Otherwise, the task unlocks the semaphore by executing the commands in the protocol-specific block of Fig. 2. The main function of this block is to find the next owner of the semaphore if at least one task is waiting for it. If there is no waiting task, the owner is set to NULL. The details of this block are discussed in the corresponding sections for the different protocols.
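The two workflows can be summarized as a generic skeleton with protocol-specific hooks. This is a minimal sketch in Python rather than RTEMS code: the class and method names (Semaphore, on_claim, on_wait, pick_next_owner) are hypothetical, tasks are represented by plain strings, and the default hooks use a FIFO queue purely as a placeholder.

```python
from collections import deque
from typing import Deque, Optional


class Semaphore:
    """Generic lock/unlock workflow; subclasses override the three hooks."""

    def __init__(self) -> None:
        self.owner: Optional[str] = None
        self.waiters: Deque[str] = deque()

    # Protocol-specific blocks (placeholders, assumed for illustration).
    def on_claim(self, task: str) -> None:
        """Called when a task becomes owner through the lock directive."""

    def on_wait(self, task: str) -> None:
        """Called when the semaphore is already owned by another task."""
        self.waiters.append(task)

    def pick_next_owner(self) -> Optional[str]:
        """Called on unlock to select the next owner, if any task waits."""
        return self.waiters.popleft() if self.waiters else None

    def lock(self, task: str) -> bool:
        """Return True if the task became owner, False if it has to wait."""
        if self.owner is None:
            self.owner = task
            self.on_claim(task)
            return True
        self.on_wait(task)
        return False

    def unlock(self, task: str) -> Optional[str]:
        """Release the semaphore and return the next owner (or None)."""
        if self.owner != task:
            raise PermissionError("only the owner may unlock the semaphore")
        self.owner = self.pick_next_owner()
        return self.owner
```

Each protocol discussed later then differs only in how it fills in the hooks, e.g., a priority-ordered queue versus a FIFO queue, and which priority the owner receives.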

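[Figure: flowchart of rtems_semaphore_release. Decision: owner != executing. If the executing task is not the owner, the request is rejected ("Not owner"); otherwise the protocol-specific block is executed.]

Fig. 2: Workflow of the unlock directive. The protocol-specific block is specified according to the adopted protocols.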

III-B Application Configuration

In order to support the semi-partitioned schedule in RTEMS, the configuration flow in Fig. 3 has to be followed. First, processors have to be bound to specific scheduler instances by using the macro _RTEMS_SCHEDULER_ASSIGN, which is supported in RTEMS by default. After that, each task is partitioned to a scheduler instance by using the rtems_task_set_scheduler directive. Each task can then only be executed on the processor of the corresponding scheduler instance.


[Figure: configuration flow with the labels "Scheduler Instance", "Step 1", and "Step 2".]

Fig. 3: The steps to configure

When an RTEMS application is configured with SMP support by following the workflow in Fig. 3, some new functions have to be implemented. In Step 1, an initial task has to be defined, which is executed at the beginning of the RTEMS application. The binding of scheduler instances to processors is based on the official c-user guide. The scheduling algorithm for the scheduler instances has to be selected first. In this paper, the Deterministic Priority SMP Scheduler, which is supported in RTEMS by default, is selected for all the protocols; it corresponds to the Fixed-Priority (FP) scheduler in the literature. Please note that scheduler instances have to be defined for all the available processors in the system in order to support the semi-partitioned schedule, i.e., tasks may migrate to other processors by changing their scheduler nodes. Details can be found in the next subsection.
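The two configuration steps amount to building two mappings. The sketch below models them in plain Python and is not RTEMS configuration code: the function names (bind_processors, assign_tasks) and the instance naming scheme are assumptions made for illustration.

```python
from typing import Dict


def bind_processors(num_cpus: int) -> Dict[int, str]:
    """Step 1: bind every available processor to its own scheduler instance."""
    return {cpu: f"sched_{cpu}" for cpu in range(num_cpus)}


def assign_tasks(partition: Dict[str, int],
                 cpu_to_scheduler: Dict[int, str]) -> Dict[str, str]:
    """Step 2: partition each task to the scheduler instance of its processor."""
    assignment = {}
    for task, cpu in partition.items():
        if cpu not in cpu_to_scheduler:
            raise KeyError(f"no scheduler instance is bound to CPU#{cpu}")
        assignment[task] = cpu_to_scheduler[cpu]
    return assignment


# Hypothetical example: four processors, two tasks on different processors.
schedulers = bind_processors(4)
assignment = assign_tasks({"tau_1": 0, "tau_2": 2}, schedulers)
```

Defining an instance for every processor, including those that host no task initially, is what later allows a task to be moved to any of them.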

III-C Migration Mechanism

The migration mechanism based on arbitrary processor affinity in [DBLP:journals/rts/GujaratiCB15] is not supported in the current version of RTEMS. Therefore, a new migration mechanism has to be applied for distribution-based protocols, e.g., DPCP. In our implementation, the scheduler node is modified at run time in order to realize task migration. When a task needs to migrate to another processor, the scheduler node of the task in its original scheduler instance is blocked, and the scheduler node of the task on its destination processor is unblocked. An additional function named _Scheduler_Migrate_To is implemented in schedulerimpl.h, which takes the task information block, the target processor, and the priority of the task on the target processor. In addition, in order to guarantee the correctness of the migration, thread dispatching is disabled during the migration operation.

Fig. 4 demonstrates an example of the implemented task migration. In Fig. 4 (1), the task has a scheduler node for every scheduler instance in the system. It is currently executing on CPU#0 with its assigned priority by using the scheduler node of CPU#0, which is indicated by the node with the green background. The other two nodes, with grey backgrounds, are blocked, since the task has no access to their respective scheduler instances, which is denoted by dashed lines. In Fig. 4 (2), the task migrates to CPU#1. It first blocks itself on its original scheduler by using the block function of the scheduler instance on CPU#0. After that, it adds the scheduler node of CPU#1 to the list of its active scheduler nodes and modifies the priority of that node accordingly. Finally, it unblocks the node by using the unblock function of the corresponding scheduler instance. Migrating back to the original processor works similarly, i.e., the state in Fig. 4 (1) is restored by using the same block/unblock functions of the scheduler instances.
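The block/unblock sequence can be modeled in a few lines. The following is a minimal sketch in Python, not the RTEMS implementation: SchedulerNode, TaskControl, make_task, and migrate_to are hypothetical names, the value 255 mirrors the label shown for blocked nodes in Fig. 4, and disabling thread dispatch is represented by a simple flag.

```python
from dataclasses import dataclass, field
from typing import Dict

INACTIVE_PRIORITY = 255  # value shown for blocked nodes in Fig. 4


@dataclass
class SchedulerNode:
    cpu: int
    priority: int = INACTIVE_PRIORITY
    blocked: bool = True


@dataclass
class TaskControl:
    name: str
    nodes: Dict[int, SchedulerNode] = field(default_factory=dict)
    dispatch_disabled: bool = False

    def active_cpu(self) -> int:
        """Return the processor whose scheduler node is currently unblocked."""
        active = [node.cpu for node in self.nodes.values() if not node.blocked]
        if len(active) != 1:
            raise RuntimeError("exactly one scheduler node must be active")
        return active[0]


def make_task(name: str, num_cpus: int, home_cpu: int, priority: int) -> TaskControl:
    """Create one scheduler node per processor; only the home node is active."""
    task = TaskControl(name, {cpu: SchedulerNode(cpu) for cpu in range(num_cpus)})
    task.nodes[home_cpu].priority = priority
    task.nodes[home_cpu].blocked = False
    return task


def migrate_to(task: TaskControl, target_cpu: int, priority: int) -> None:
    """Block the node on the current processor and unblock the target node."""
    source = task.nodes[task.active_cpu()]
    target = task.nodes[target_cpu]
    task.dispatch_disabled = True      # no thread dispatch during migration
    try:
        source.blocked = True          # block on the original scheduler instance
        target.priority = priority     # priority used on the target processor
        target.blocked = False         # unblock on the target scheduler instance
    finally:
        task.dispatch_disabled = False
```

Migrating back is the same call with the home processor and the original priority as arguments.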

[Figure: two panels. (1) "Task Executing on CPU#0", with the nodes labeled "(255) [CPU#1] BLOCKED" and "(255) [CPU#2] BLOCKED". (2) "Task Migrated to CPU#1", with the node labeled "(255) [CPU#2] BLOCKED".]

Fig. 4: Scheduler node management: (1) before migration, (2) after migration. Dashed blocks and lines indicate that the task has no access to the respective scheduler instances, whereas the green block is the currently used one.

IV Multiprocessor Resource Synchronization

In this section, the implementation details of three protocols and corresponding variants are explained and discussed. Please note, we only consider non-nested resource accesses in our implementation, i.e., only one shared resource is requested during the execution of one critical section.

IV-A Multiprocessor Priority Ceiling Protocol

The Multiprocessor Priority Ceiling Protocol (MPCP) is a typical protocol that is based on a partitioned fixed-priority (P-FP) scheduler. That is, each task has a pre-defined priority, and the execution of a task is bound to a pre-defined processor, i.e., no migration is allowed. The main features of MPCP are: 1) a task suspends itself if the requested resource is not available; 2) if a task is granted access to a shared resource, the priority of the task is boosted to the ceiling priority, which equals the highest priority among the tasks that request that resource.

The self-suspension feature is supported in RTEMS by default. In order to implement the ceiling priority boosting, a new semaphore structure is created. Besides the usual components, e.g., the semaphore lock, the wait queue, and the current semaphore owner, one variable named ceiling_priority is added. Please note that in our implementation the ceiling priority is defined by users instead of being calculated dynamically by the system. The pseudo code in Algo. 1 shows the two main functions of our implementation, which fit the lock and unlock directives in Section III-A. The details are as follows: Once a task $\tau_i$ requests a shared resource, the ownership of the shared resource (semaphore) is checked. If the owner of the requested shared resource is NULL, $\tau_i$ becomes the owner, and the priority of $\tau_i$ is boosted to the ceiling priority on the corresponding scheduler instance (operations in the claim block in Fig. 1). Otherwise, $\tau_i$ is added to a wait queue, which is sorted by the tasks' original priorities, i.e., a task with a higher priority gets an earlier position (operations in the wait block in Fig. 1). Once the task finishes the execution of its critical section, it releases the semaphore. The first task in the wait queue, i.e., the waiting task with the highest priority, is then checked. If there is no task in the wait queue, the semaphore owner is set to NULL. Otherwise, the first task in the wait queue is set as the semaphore owner (operations in the protocol-specific block in Fig. 2).

Input: Task $\tau_i$, and ceiling_priority of the related semaphore;

Function mpcp_lock():
  if semaphore_owner is NULL then
     semaphore_owner $\leftarrow$ $\tau_i$;
     priority of $\tau_i$ $\leftarrow$ ceiling_priority;
     $\tau_i$ starts the execution of its critical section;
  else
     Add $\tau_i$ to the corresponding wait_queue;
  end if

Function mpcp_unlock():
  $\tau_i$ releases the semaphore lock;
  Next task $\tau_{next}$ $\leftarrow$ the head of the wait_queue;
  if $\tau_{next}$ is NULL then
     semaphore_owner $\leftarrow$ NULL;
  else
     semaphore_owner $\leftarrow$ $\tau_{next}$;
     $\tau_{next}$ starts the execution of its critical section;
  end if

Algorithm 1 MPCP implementation
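Algorithm 1 can be turned into a small executable model. The sketch below is written in Python and is not the RTEMS code: the class names are hypothetical, a lower number is assumed to denote a higher priority, and two steps that Algorithm 1 does not spell out are added as labeled assumptions, namely restoring the base priority on release and boosting the next owner at handover.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional, Tuple


@dataclass
class Task:
    name: str
    base_priority: int                      # lower number = higher priority (assumed)
    current_priority: int = field(init=False)

    def __post_init__(self) -> None:
        self.current_priority = self.base_priority


class MpcpSemaphore:
    def __init__(self, ceiling_priority: int) -> None:
        self.ceiling_priority = ceiling_priority      # supplied by the user
        self.owner: Optional[Task] = None
        self._queue: List[Tuple[int, int, Task]] = [] # ordered by base priority
        self._seq = count()                           # FIFO among equal priorities

    def lock(self, task: Task) -> bool:
        """mpcp_lock: claim the semaphore or join the priority-ordered queue."""
        if self.owner is None:
            self.owner = task
            task.current_priority = self.ceiling_priority
            return True
        heapq.heappush(self._queue, (task.base_priority, next(self._seq), task))
        return False

    def unlock(self, task: Task) -> Optional[Task]:
        """mpcp_unlock: hand the semaphore to the head of the wait queue."""
        if self.owner is not task:
            raise PermissionError("only the owner may unlock the semaphore")
        task.current_priority = task.base_priority    # assumption: undo the boost
        if not self._queue:
            self.owner = None
            return None
        _, _, next_task = heapq.heappop(self._queue)
        self.owner = next_task
        next_task.current_priority = self.ceiling_priority  # assumption: boost new owner
        return next_task
```

A short usage example: with a ceiling of 1 and three tasks of priorities 5, 3, and 2, the first locker is boosted to 1, and on release the semaphore goes to the waiting task with priority 2 rather than to the one that arrived first.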

IV-B Distributed Priority Ceiling Protocol

The Distributed Priority Ceiling Protocol (DPCP) is based on a semi-partitioned fixed-priority schedule. In DPCP, tasks and shared resources are assigned to separate processors: the processors assigned to the execution of non-critical sections are called application processors, and the processors assigned to the execution of critical sections are called synchronization processors. Once a task tries to access a shared resource, it migrates to the synchronization processor to which the shared resource is assigned before trying to lock the corresponding semaphore. Afterwards, the tasks on the same synchronization processor follow the uni-processor PCP, which is supported in RTEMS by default in the form of the Immediate Ceiling Priority Protocol (ICPP). When a task has finished the execution of its critical section, it migrates back to its original application processor to continue the execution of its non-critical section, if one exists.

Hence, the main challenge of the implementation of DPCP is to allow task migrations among processors. In RTEMS, task partitioning is realized by the scheduler node in the scheduler function, i.e., scheduler node defines the original partition for each task before the execution, and stays the same during the run time. Details have been explained in Section III-C.
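The resulting order of operations for one critical section can be sketched as follows. This is a simplified Python model under stated assumptions: the names (Task, Resource, run_critical_section) are hypothetical, a lower number denotes a higher priority, the ceiling is applied immediately on locking as in ICPP, and waiting for a busy resource is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Task:
    name: str
    home_cpu: int                 # application processor of the task
    base_priority: int            # lower number = higher priority (assumed)
    cpu: int = field(init=False)
    priority: int = field(init=False)

    def __post_init__(self) -> None:
        self.cpu = self.home_cpu
        self.priority = self.base_priority


@dataclass
class Resource:
    name: str
    sync_cpu: int                 # synchronization processor hosting the resource
    ceiling: int                  # ceiling priority of the resource
    owner: Optional[Task] = None


def run_critical_section(task: Task, res: Resource,
                         body: Callable[[], None]) -> List[str]:
    """Migrate, lock with ceiling boost, run body, unlock, migrate back."""
    if res.owner is not None:
        raise RuntimeError("resource busy; waiting is not modeled in this sketch")
    trace = []
    task.cpu = res.sync_cpu
    trace.append(f"migrate to CPU#{task.cpu}")
    res.owner = task
    task.priority = res.ceiling
    trace.append(f"lock {res.name} at priority {task.priority}")
    try:
        body()
    finally:
        res.owner = None
        task.priority = task.base_priority
        trace.append(f"unlock {res.name}")
        task.cpu = task.home_cpu
        trace.append(f"migrate back to CPU#{task.cpu}")
    return trace
```

The trace makes visible that every critical section pays for two migrations in addition to the lock and unlock operations.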

IV-C Flexible Multiprocessor Locking Protocol

In the Flexible Multiprocessor Locking Protocol (FMLP), requests for shared resources are divided into two groups, i.e., long and short, according to the execution time of the corresponding critical section. When the requested resource is not available, a task suspends itself if the request is long, and spins on the corresponding processor if the request is short. However, there is no consensus on how to divide the requests in order to obtain better schedulability. Therefore, we divided our implementation into FMLP-L, which only supports long requests, and FMLP-S, which only supports short requests. Please note that, to simplify the implementation, all tasks in one task set belong either to the long group or to the short group; no mixture of the two groups is allowed.

In both FMLP-L and FMLP-S, the wait queue in the semaphore structure is in FIFO order, rather than sorted by priority as in MPCP and DPCP. The operations in the claim block in Fig. 1 are as follows: In FMLP-L, we dynamically maintain a ceiling priority for each resource, which equals the highest priority among the tasks that are currently waiting for the resource, i.e., the tasks in the corresponding wait queue. When the semaphore owner starts the execution of its critical section, its priority is boosted to the ceiling priority if its original priority is lower than the ceiling priority. In FMLP-S, the priority of the semaphore owner is boosted to the highest possible priority in the system, so that the execution of its critical section is non-preemptive. The operations in the wait block in Fig. 1 are the same for both FMLP-L and FMLP-S, i.e., the task is added to the end of the corresponding wait queue. The unlock operations in the protocol-specific block in Fig. 2 are also the same, i.e., the next owner of the semaphore is determined by checking the first task in the wait queue, if it exists.
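The difference between the two variants lies only in the priority given to the owner, which the following minimal Python sketch makes explicit. It is a model under stated assumptions rather than the RTEMS code: the names are hypothetical, a lower number denotes a higher priority, 1 is assumed to be the highest priority in the system, and the effective priority of the owner is recomputed on demand instead of being stored.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional

HIGHEST_PRIORITY = 1  # assumed: lower number = higher priority, 1 is the highest


@dataclass
class Task:
    name: str
    base_priority: int


class FmlpSemaphore:
    def __init__(self, short: bool) -> None:
        self.short = short                       # True: FMLP-S, False: FMLP-L
        self.owner: Optional[Task] = None
        self.waiters: Deque[Task] = deque()      # FIFO order, not priority order

    def owner_priority(self) -> Optional[int]:
        """Effective priority of the current owner, or None if unlocked."""
        if self.owner is None:
            return None
        if self.short:
            return HIGHEST_PRIORITY              # effectively non-preemptive
        # FMLP-L: dynamic ceiling = highest priority among the waiting tasks.
        ceiling = min((t.base_priority for t in self.waiters),
                      default=self.owner.base_priority)
        return min(self.owner.base_priority, ceiling)

    def lock(self, task: Task) -> bool:
        if self.owner is None:
            self.owner = task
            return True
        self.waiters.append(task)                # enqueue at the tail
        return False

    def unlock(self, task: Task) -> Optional[Task]:
        if self.owner is not task:
            raise PermissionError("only the owner may unlock the semaphore")
        self.owner = self.waiters.popleft() if self.waiters else None
        return self.owner
```

Note that the handover follows arrival order: a high-priority waiter raises the ceiling of the resource but does not overtake tasks that were enqueued earlier.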

Additionally, we implemented a distributed version of FMLP, denoted as DFLP, in which all requests are treated as long requests. The main difference between FMLP and DFLP is that, when a task requests a shared resource, it migrates to the corresponding synchronization processor, similarly to DPCP. The migration mechanism has been explained in Section III-C. After the migration, critical sections are executed by following FMLP-L on the corresponding synchronization processor(s).

V Evaluation and Discussion

In this section, we first introduce the experimental setup for the overhead evaluation. Afterwards, the measured overheads are reported and analyzed. Finally, we discuss the need for formal verification of the implementation in general.

V-A Experimental Setup

We evaluated the overheads of our implementations on the following platform: an NXP QorIQ T4240 RDB reference design board, which is the same as the one used in [DBLP:conf/adaEurope/CatellaniBHM15]. It has 6 GB of DDR3 memory with a 1866 MT/s data rate, 128 MB of NOR flash (16-bit), and 2 GB of SLC NAND flash. The T4240 processor contains 24 virtual cores (12 physical cores) based on the PowerPC architecture and runs at 1.67 GHz.

To measure the overheads of our implemented protocols, timestamps are taken before and after the functions of our implementations. The obtain and release functions of the semaphore are measured, denoted as lock and unlock, respectively. We consider a multiprocessor system consisting of four processors, i.e., $M = 4$, including three application processors and one synchronization processor. The total number of tasks is $n = 15$, and the number of available shared resources is $Z = 3$. On each application processor, there are five tasks with five different priority levels, i.e., {High (H), Medium-High (MH), Medium (M), Medium-Low (ML), and Low (L)}. Each task requests one of the three shared resources. Details can be found in Table I.

Application    Application    Application    Synchronization
processor 1    processor 2    processor 3    processor
L ()           L ()           L ()           -
ML ()          ML ()          ML ()          -
M ()           M ()           M ()           -
MH ()          MH ()          MH ()          -
H ()           H ()           H ()           -

TABLE I: Processor allocation of the test application.

Fig. 5: Overheads of protocols in RTEMS (lock operations are suffixed with _lk and unlock operations with _ulk), together with the overheads of migrating a task to the synchronization processor (denoted as mig_to) and back to the application processor (denoted as mig_bk).
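The structure of the test application and the timestamp-based measurement can be sketched as follows. This is an illustrative Python model, not the measurement code used on the board: the names (build_task_set, timed, measure) are hypothetical, time.perf_counter_ns stands in for the platform timestamp, and the assignment of resources to tasks is left out.

```python
import time
from typing import Callable, Dict, List, Tuple

PRIORITY_LEVELS = ["H", "MH", "M", "ML", "L"]
NUM_APPLICATION_CPUS = 3


def build_task_set() -> List[Dict[str, object]]:
    """Five tasks with distinct priority levels on each application processor."""
    return [{"cpu": cpu, "level": level}
            for cpu in range(NUM_APPLICATION_CPUS)
            for level in PRIORITY_LEVELS]


def timed(operation: Callable[[], object]) -> Tuple[object, int]:
    """Take a timestamp before and after the operation; return (result, ns)."""
    start = time.perf_counter_ns()
    result = operation()
    return result, time.perf_counter_ns() - start


def measure(operation: Callable[[], object], repetitions: int) -> List[int]:
    """Collect one overhead sample in nanoseconds per repetition."""
    return [timed(operation)[1] for _ in range(repetitions)]
```

In such a setup, the lock and unlock directives would each be wrapped separately, so that their overheads can be reported as separate distributions.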

V-B Overheads Evaluation

The overheads of the different protocols are reported in Figure 5, based on more than 9,000 instances of lock and unlock operations. The distribution-based protocols, i.e., DPCP and DFLP, have higher overheads than the others due to the task migrations. DFLP has the highest average overheads, since it additionally maintains the dynamic ceiling priority. MrsP also has relatively high overheads, since its helping mechanism requires task migration (however, the helping mechanism may not be activated all the time). Our results for MrsP are similar to those reported in [DBLP:conf/adaEurope/CatellaniBHM15], i.e., 5376 ns for lock and 5514 ns for unlock on average. FMLP-L has the lowest overheads, since it has the simplest mechanism. Overall, the overheads of all the protocols are relatively low and acceptable. For the distribution-based protocols, we observe quite a few outliers. A similar observation has been reported in [shi-OSPERT17]. One reason could be that cache effects introduce additional overheads for some operations, but we do not have sufficient data to pinpoint the exact cause.

The migration overheads are measured separately and reported on the left side of Figure 5. The results show that the overheads of task migration are significant, which might substantially affect the distribution-based protocols, i.e., DPCP and DFLP. Interestingly, we also notice that migrating a task to the synchronization processor is faster than migrating it back to the application processor. The reason is that there are normally more tasks running on the application processors than on the synchronization processors, so a task has to wait longer on average to obtain the scheduler instance lock. That is why the unlock overheads of DPCP and DFLP are higher than their lock overheads.

Although our measured overheads on RTEMS are similar to those of the protocols implemented on LITMUS^RT [DBLP:conf/adaEurope/CatellaniBHM15, Shi-Dependency-RTAS2019], the implementations of protocols on RTEMS and LITMUS^RT are not directly comparable due to the different purposes and architectures of the two operating systems. Please note that RTEMS is a self-contained RTOS for real-world applications, whilst LITMUS^RT is a Linux-based testbed that is mainly used for functional validation. It might be interesting to investigate which protocol is preferable on which operating system, but this is considered out of scope here.

V-C Validation and Formal Verification

To validate the correctness of our implementation, we first ran the official coverage tests provided by RTEMS, i.e., the SMP test suites, on the PowerPC device and also on the QEMU emulator for the ARM RealView Platform (realview-pbx-a9), and concluded that the SMP-related parts of RTEMS are not affected. Moreover, we designed several dedicated corner cases for each protocol and ensured that the designated tasks behave as expected; these cases serve as additional coverage tests for future integration.

We note that such case-based validation may not be sufficient, since it is not possible to test every case exhaustively. One possible way is to adopt software model checkers, as proposed in [DBLP:conf/icfem/GadiaAB16] to detect potential data races and deadlocks in the implementation of PIP with nested locks in RTEMS. However, such search-based approaches may not scale well for multiprocessor protocols unless an effective pruning strategy can be found beforehand. How to validate or formally verify an existing implementation of synchronization protocols is still an unsolved problem, but it is out of the scope of this paper.

VI Conclusion

Over the decades, a large number of resource synchronization protocols have been extensively studied for uni-processor and especially multiprocessor real-time systems. In this work, we reviewed the SMP support in one popular real-time operating system, RTEMS, and detailed how we developed three state-of-the-art multiprocessor resource synchronization protocols, i.e., MPCP, DPCP, and FMLP, and their variants. With extensive synthetic experiments, the measured results showed that the overheads of our implementations are comparable to those of MrsP, which is officially supported in RTEMS. By taking the real system overheads into account, system designers can better assess the performance of resource synchronization protocols and decide among them.

Although several dedicated tests are provided to verify the correctness of the implementation, formal model checking is still desirable to protect the system from potential deadlocks, data races, and priority inversions. In future work, we plan to explore nested resource synchronization and to support arbitrary processor affinity in RTEMS in order to improve generality and efficiency. We are also working on support for the latest version of RTEMS.


This paper is supported by DFG, as part of the Collaborative Research Center SFB876, subproject A1 and A3 (