DPCP-p: A Distributed Locking Protocol for Parallel Real-Time Tasks

07/01/2020 ∙ by Maolin Yang, et al.

Real-time scheduling and locking protocols are fundamental facilities for constructing time-critical systems. For parallel real-time tasks, predictable locking protocols are required when concurrent sub-jobs access shared resources in a mutually exclusive manner. This paper is the first to study a distributed synchronization framework for parallel real-time tasks, where both tasks and global resources are partitioned to designated processors, and requests to each global resource are executed on the processor to which the resource is partitioned. We extend the Distributed Priority Ceiling Protocol (DPCP) to parallel tasks under federated scheduling and prove that, under the extended protocol, a request can be blocked by at most one lower-priority request. We develop task and resource partitioning heuristics and propose analysis techniques to safely bound the task response times. Numerical evaluation (with heavy tasks on 8-, 16-, and 32-core processors) indicates that the proposed methods improve schedulability significantly compared to state-of-the-art locking protocols under federated scheduling.


I Introduction

To exploit the parallelism of multicores for time-critical applications, the design of scheduling and analysis techniques for parallel real-time tasks has attracted increasing interest in recent years. Among the scheduling algorithms for parallel real-time tasks, federated scheduling [13] is a promising approach owing to its high flexibility and simplicity of analysis.

Coordinated locking protocols are used to ensure mutually exclusive access to shared resources while preventing uncontrolled priority inversions [6, 11]. In multiprocessor systems, requests to shared resources can be executed locally by the tasks [15] or remotely by resource agents [16], e.g., by means of the Remote Procedure Call (RPC) mechanism. Local execution of requests is in general more efficient since no migrations are needed, while remote execution allows blocking to be better analyzed and managed, e.g., by confining resource contention to designated processors [9, 10]. Existing locking protocols for parallel tasks [6, 11] are all based on local execution of requests; to the best of our knowledge, no work has considered remote execution of requests for parallel tasks so far.

The Distributed Priority Ceiling Protocol (DPCP) [16] is a classic multiprocessor real-time locking protocol for sequential tasks that executes requests of global resources remotely, where both tasks and shared resources are partitioned among the processors and all requests to a global resource must be conducted by the resource agents on the processor on which the resource is partitioned. Empirical studies [3] indicate that the DPCP has better schedulability performance compared to similar protocols with local execution of requests. Further, the recent Resource-Oriented Partitioned (ROP) scheduling [10, 17, 18] with the DPCP guarantees bounded speedup factors.

In addition, since each heavy task exclusively uses a subset of processors under federated scheduling, significant processing capacity can be wasted; in the extreme case, almost half of the processing capacity is wasted. Executing global-resource-requests on remote processors can alleviate this waste by shifting part of the resource-related workload of a task to processors with lower workload.

This paper is the first to study a distributed synchronization framework for parallel real-time tasks. We answer the fundamental question of whether, and how, the key insight of executing requests to shared resources remotely, as done for sequential tasks, can be applied to parallel real-time tasks. We propose DPCP-p, an extension of the DPCP that supports parallel real-time tasks under federated scheduling, and develop the corresponding schedulability analysis and a partitioning heuristic. DPCP-p retains the fundamental property of the priority ceiling mechanism underlying the DPCP, namely that a request can be blocked by at most one lower-priority request. Numerical evaluation with heavy tasks on 8-, 16-, and 32-core processors indicates that DPCP-p significantly improves schedulability compared to existing locking protocols under federated scheduling.

II System Model and Terminology

We consider a set $\Gamma = \{\tau_1, \ldots, \tau_n\}$ of $n$ parallel tasks to be scheduled on $m$ identical processors $\mathcal{P} = \{P_1, \ldots, P_m\}$ with a set $\Phi$ of shared resources.

Parallel Tasks. Each task $\tau_i$ is characterized by a Worst-Case Execution Time (WCET) $C_i$, a relative deadline $D_i$, and a minimum inter-arrival time $T_i$, where $D_i \le T_i$ (i.e., constrained deadlines are considered). The utilization of $\tau_i$ is defined by $U_i = C_i / T_i$.

The structure of $\tau_i$ is represented by a Directed Acyclic Graph (DAG) $G_i = (V_i, E_i)$, where $V_i$ is the set of vertices and $E_i$ is the set of edges. Each vertex $v_{i,x} \in V_i$ has a WCET $C_{i,x}$, and the WCET of all vertices of $\tau_i$ is $C_i = \sum_{v_{i,x} \in V_i} C_{i,x}$. Each edge $(v_{i,x}, v_{i,y}) \in E_i$ represents the precedence relation between $v_{i,x}$ and $v_{i,y}$. A vertex is said to be pending from the time when all its predecessors have finished until it finishes itself. A complete path is a sequence of vertices $(v_{i,a_1}, \ldots, v_{i,a_k})$, where $v_{i,a_1}$ is a head vertex (without predecessors), $v_{i,a_k}$ is a tail vertex (without successors), and $v_{i,a_j}$ is an immediate predecessor of $v_{i,a_{j+1}}$ for each pair of consecutive vertices. We use $\lambda$ to denote an arbitrary complete path. The length of $\lambda$, denoted by $len(\lambda)$, is defined as the sum of the WCETs of the vertices on $\lambda$. We also use $L_i$ to denote the length of the longest path of $\tau_i$. For example, Fig. 1(a) shows the DAG structures of two tasks.
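As a concrete illustration of $C_i$ and $L_i$, the following minimal Python sketch computes the total WCET and the longest-path length of a DAG given as a vertex-to-WCET map plus an edge list. The function name and input format are hypothetical choices for illustration, not part of DPCP-p; the longest path is obtained by dynamic programming over a topological order (Kahn's algorithm).

```python
from collections import deque


def dag_wcet_and_longest_path(wcet, edges):
    """Return (C_i, L_i): total WCET and longest complete-path length of a DAG.

    wcet  -- dict mapping each vertex to its WCET C_{i,x}
    edges -- iterable of (u, v) pairs meaning u must finish before v starts
    """
    succ = {v: [] for v in wcet}
    indeg = {v: 0 for v in wcet}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # longest[v] = length of the longest path that ends at v (v included)
    longest = dict(wcet)
    ready = deque(v for v in wcet if indeg[v] == 0)  # head vertices
    visited = 0
    while ready:  # Kahn's topological order
        u = ready.popleft()
        visited += 1
        for v in succ[u]:
            longest[v] = max(longest[v], longest[u] + wcet[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if visited != len(wcet):
        raise ValueError("edges contain a cycle, so this is not a DAG")
    return sum(wcet.values()), max(longest.values())


# Hypothetical four-vertex diamond: a -> {b, c} -> d
C, L = dag_wcet_and_longest_path({"a": 2, "b": 3, "c": 1, "d": 4},
                                 [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
print(C, L)  # 10 9
```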

At runtime, each task $\tau_i$ generates a sequence of jobs, and each job inherits the DAG structure of the task. Let $J_{i,j}$ denote the $j$-th job of $\tau_i$. Let $a_{i,j}$ and $f_{i,j}$ denote the arrival and finish time of $J_{i,j}$, respectively; then $J_{i,j}$ must finish no later than $a_{i,j} + D_i$, and the subsequent job $J_{i,j+1}$ cannot arrive before $a_{i,j} + T_i$. The Worst-Case Response Time (WCRT) of task $\tau_i$ is defined as $R_i = \max_j (f_{i,j} - a_{i,j})$. For brevity, let $J_i$ denote an arbitrary job of $\tau_i$.

Shared Resources. Each task $\tau_i$ uses a set of shared resources $\Phi_i \subseteq \Phi$, and each resource $\ell_q$ is shared by a set of tasks $\Gamma(\ell_q)$. To ensure mutual exclusion, $\ell_q$ is protected by a binary semaphore (also called a lock for short). A job is allowed to execute a critical section for $\ell_q$ only if it holds the lock of $\ell_q$; otherwise, it is suspended. A vertex $v_{i,x}$ requests $\ell_q$ at most $N_{i,x,q}$ times, and each request uses $\ell_q$ for at most $L_{i,q}$ time units. For brevity, a path $\lambda$ requests $\ell_q$ at most $N_{\lambda,q} = \sum_{v_{i,x} \in \lambda} N_{i,x,q}$ times, and a job of $\tau_i$ requests $\ell_q$ at most $N_{i,q} = \sum_{v_{i,x} \in V_i} N_{i,x,q}$ times.

Given that the critical sections are included in $C_{i,x}$, for brevity, we use $C'_{i,x}$ and $C'_i$ to denote the WCETs of the non-critical sections of $v_{i,x}$ and $\tau_i$, respectively. For simplicity, it is assumed that …. Further, critical sections are assumed to be non-nested; nested critical sections are left for future work.

Scheduling. The tasks are scheduled based on the federated scheduling paradigm [13]. Each task $\tau_i$ with $U_i > 1$ (i.e., a heavy task) is assigned $m_i$ dedicated processors, and we use $\mathcal{P}_i$ to denote the set of processors assigned to $\tau_i$. The light tasks are assigned to the remaining processors. Each task $\tau_i$ has a unique base priority $\pi_i$, and $\pi_i < \pi_j$ implies that $\tau_i$ has a lower base priority than $\tau_j$. All jobs and all vertices of $\tau_i$ have the same base priority $\pi_i$.
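One common way to choose $m_i$ is the classic federated-scheduling rule of [13], $m_i = \lceil (C_i - L_i)/(D_i - L_i) \rceil$, sketched below in Python. Whether DPCP-p starts from exactly this count is an assumption here, since the partitioning step may assign further processors; the function name is hypothetical.

```python
def federated_processors(C_i, L_i, D_i):
    """Dedicated processors for a heavy task under the classic federated rule [13].

    m_i = ceil((C_i - L_i) / (D_i - L_i)), which requires L_i < D_i.
    """
    if L_i >= D_i:
        raise ValueError("longest path must be shorter than the deadline")
    # Integer ceiling division avoids floating-point rounding for integer inputs.
    return -(-(C_i - L_i) // (D_i - L_i))


print(federated_processors(40, 10, 25))  # (40 - 10) / (25 - 10) = 2 -> 2
print(federated_processors(41, 10, 25))  # 31 / 15 ~ 2.07 -> 3
```

The numerator is the work that cannot be on the critical path, and the denominator is the slack left after the longest path; more parallel work or a tighter deadline both increase $m_i$.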

At runtime, each heavy task is scheduled exclusively on its assigned processors by a work-conserving scheduler (i.e., no processor assigned to a task is idle while a vertex of the task is ready to be scheduled), while each light task is treated as a sequential task and is scheduled together with the other tasks (if any) assigned to the same processor. We focus on heavy tasks in the following and discuss how to handle both heavy and light tasks in Sec. VI.

III The Distributed Locking Protocol DPCP-p

The design of DPCP-p is based on the DPCP [16] and extends it to support parallel real-time tasks under federated scheduling.

III-A The Synchronization Framework

Under federated scheduling, a resource can be shared locally or globally. A resource is a local resource if it is shared only by the vertices of a single task, and it is a global resource if it is shared by more than one task. For example, in Fig. 1, one of the two resources is global and the other is local. We use $\Phi^L$ and $\Phi^G$ to denote the sets of local resources and global resources, respectively.

Each global resource $\ell_q$ is assigned to a processor, and all requests to $\ell_q$ must execute on that processor, e.g., by means of an RPC-like agent [16]. Once a vertex requests a global resource, it is suspended until the agent finishes. Requests to local resources are executed by the tasks directly, i.e., no migration is required.

For brevity, we use $\Phi^G_k$ to denote the set of global resources on processor $P_k$. The global resources that are assigned to the same processor as $\ell_q$ are denoted by $\Phi^G(\ell_q)$, and the global resources that are assigned to the processors of $\tau_i$ (i.e., to $\mathcal{P}_i$) are denoted by $\Phi^G(\mathcal{P}_i)$.

III-B Queue Structure

While pending, a vertex is either executing, ready but not scheduled, or suspended. The following queues are used to maintain the states of the vertices of each task $\tau_i$.

  • $\mathcal{R}_i$: the ready queue of $\tau_i$ for the vertices that are ready to execute non-critical sections. The vertices in $\mathcal{R}_i$ are scheduled in First-In-First-Out (FIFO) order.

  • $\mathcal{R}^L_i$: the ready queue of $\tau_i$ for the vertices that are holding local resources. The vertices in $\mathcal{R}^L_i$ are scheduled in FIFO order. If both $\mathcal{R}_i$ and $\mathcal{R}^L_i$ are non-empty, the vertices in $\mathcal{R}^L_i$ are scheduled first.

  • $\mathcal{S}_i$: the suspended queue of $\tau_i$. Each vertex in $\mathcal{S}_i$ is waiting for a request to be finished.

In addition, each processor $P_k$ maintains two hybrid queues to handle the global-resource-requests.

  • $\mathcal{R}^G_k$: the ready queue of the global-resource-requests on processor $P_k$. The requests in $\mathcal{R}^G_k$ are scheduled by the priorities of their tasks.

  • $\mathcal{S}^G_k$: the suspended queue of the global-resource-requests on processor $P_k$.

III-C Locking Rules

Under priority scheduling, priority inversion [2] is inevitable when jobs compete for shared resources. Various progress mechanisms [16, 15, 2] are used to limit the duration of priority inversions. In the following, we adopt the priority ceiling mechanism used in the DPCP [16].

Consider a global resource $\ell_q$ on processor $P_k$. The priority ceiling of $\ell_q$ is defined as $\Pi(\ell_q) = \pi_H + \max\{\pi_i \mid \tau_i \in \Gamma(\ell_q)\}$, where $\pi_H$ is a priority level higher than the base priority of any task in $\Gamma$. At runtime, the processor ceiling of $P_k$ at time $t$, denoted by $\Pi(P_k, t)$, is the maximum of the priority ceilings of the global resources that are assigned to $P_k$ and locked at time $t$. Let $\rho$ be a request from a job of $\tau_i$ to a global resource $\ell_q$. The effective priority of $\rho$ at time $t$, denoted by $\pi(\rho, t)$, is elevated to $\pi_H + \pi_i$. The priority ceiling mechanism ensures that a global-resource-request $\rho$ is granted the lock at time $t$ only if $\pi(\rho, t) > \Pi(P_k, t)$.
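A minimal Python sketch of the ceiling computation and the grant test described above, assuming that larger numbers denote higher priorities and that $\pi_H$ is a constant above every base priority. All names and the example priorities are hypothetical.

```python
def resource_ceiling(users, base_prio, pi_H):
    """Pi(l_q) = pi_H + highest base priority among the tasks sharing l_q."""
    return pi_H + max(base_prio[t] for t in users)


def processor_ceiling(locked, ceilings):
    """Pi(P_k, t): highest ceiling among global resources on P_k locked at t (0 if none)."""
    return max((ceilings[r] for r in locked), default=0)


def can_grant(task, base_prio, pi_H, locked, ceilings):
    """A request is granted only if its effective priority exceeds the processor ceiling."""
    return pi_H + base_prio[task] > processor_ceiling(locked, ceilings)


# Hypothetical setup: larger number = higher priority, pi_H above every base priority.
base_prio = {"t1": 1, "t2": 2, "t3": 3}
pi_H = 10
ceilings = {"l1": resource_ceiling({"t1", "t3"}, base_prio, pi_H),  # 13
            "l2": resource_ceiling({"t1", "t2"}, base_prio, pi_H)}  # 12
print(can_grant("t2", base_prio, pi_H, set(), ceilings))   # True: nothing locked
print(can_grant("t2", base_prio, pi_H, {"l1"}, ceilings))  # False: 12 <= 13
```

Note that a request whose effective priority merely equals the processor ceiling is still blocked, which is what limits each request to at most one lower-priority blocker.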

Next, we introduce the locking rules of DPCP-p. Consider a vertex $v_{i,x}$ that issues a request $\rho$ for a resource $\ell_q$ at some time $t$.

Rule 1. If $\ell_q$ is a local resource locked by another vertex at time $t$, then $v_{i,x}$ is suspended and enqueued to $\mathcal{S}_i$.

Rule 2. If $\ell_q$ is a local resource not locked at time $t$, then $v_{i,x}$ locks $\ell_q$ and is enqueued to $\mathcal{R}^L_i$, i.e., $v_{i,x}$ is ready to be scheduled to execute the critical section.

Rule 3. If $\ell_q$ is a global resource on some processor $P_k$, then $v_{i,x}$ is suspended and enqueued to $\mathcal{S}_i$. Meanwhile, $\rho$ tries to lock $\ell_q$ according to the priority ceiling mechanism. If the lock is granted, $\rho$ is enqueued to $\mathcal{R}^G_k$ and is ready to be scheduled (according to its effective priority); otherwise, $\rho$ is enqueued to $\mathcal{S}^G_k$.

Rule 4. Once $\rho$ finishes, it releases the lock of $\ell_q$, and $v_{i,x}$ is dequeued from $\mathcal{S}_i$ if $\ell_q$ is a global resource. Then, $v_{i,x}$ is enqueued to $\mathcal{R}_i$.
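To make Rules 1 to 4 concrete, here is a minimal Python sketch of the queue bookkeeping. It assumes the priority-ceiling test is evaluated elsewhere and passed in as a boolean, and it does not model how suspended vertices or requests are later woken up; the class, function, and queue names are hypothetical.

```python
from collections import deque


class TaskQueues:
    """Per-task queues (hypothetical names for R_i, R^L_i and S_i)."""

    def __init__(self):
        self.ready = deque()        # vertices ready to run non-critical sections (FIFO)
        self.ready_local = deque()  # vertices holding a local resource (FIFO)
        self.suspended = []         # vertices waiting for a request to finish

    def pick_next(self):
        """Dispatch holders of local resources before non-critical work."""
        if self.ready_local:
            return self.ready_local.popleft()
        return self.ready.popleft() if self.ready else None


def issue_request(tq, proc_q, owner, vertex, res, is_global, granted=False):
    """Rules 1-3. `owner` maps each resource to its lock holder (None if free);
    `proc_q` holds the request queues of the resource's processor;
    `granted` is the outcome of the priority-ceiling test for global resources."""
    if not is_global:
        if owner.get(res) is not None:    # Rule 1: lock busy -> suspend
            tq.suspended.append(vertex)
        else:                             # Rule 2: take lock, run critical section
            owner[res] = vertex
            tq.ready_local.append(vertex)
    else:                                 # Rule 3: vertex waits, agent runs request
        tq.suspended.append(vertex)
        if granted:
            owner[res] = vertex
            proc_q["ready"].append((vertex, res))
        else:
            proc_q["suspended"].append((vertex, res))


def finish_request(tq, proc_q, owner, vertex, res, is_global):
    """Rule 4: release the lock and return the vertex to the non-critical ready queue."""
    owner[res] = None
    if is_global:
        proc_q["ready"].remove((vertex, res))
        tq.suspended.remove(vertex)
    tq.ready.append(vertex)


tq = TaskQueues()
proc_q = {"ready": deque(), "suspended": deque()}
owner = {}
issue_request(tq, proc_q, owner, "v1", "loc", is_global=False)               # Rule 2
issue_request(tq, proc_q, owner, "v2", "loc", is_global=False)               # Rule 1
issue_request(tq, proc_q, owner, "v3", "glb", is_global=True, granted=True)  # Rule 3
print(list(tq.ready_local), tq.suspended)  # ['v1'] ['v2', 'v3']
finish_request(tq, proc_q, owner, "v3", "glb", is_global=True)               # Rule 4
print(tq.suspended, list(tq.ready))        # ['v2'] ['v3']
```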

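Fig. 1 shows an example schedule of DPCP-p with two DAG tasks on a four-core processor, where each task is assigned two processors. At a certain time instant in this schedule, (i) one vertex is suspended and enqueued to the suspended queue of its task until its global-resource-request finishes on the processor hosting the global resource, (ii) another vertex is suspended and enqueued to the suspended queue until the local resource it requests is released, and (iii) a third vertex locks a local resource, is enqueued to the ready queue for local-resource holders, and is scheduled until it releases the resource, while a vertex requesting the same resource remains suspended and queued on it.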

(a) The structures of two DAG tasks with two shared resources (red and blue).
(b) Example schedule of DPCP-p with the global resource assigned to one of the processors.
Fig. 1: Example schedule of two DAG tasks.
Lemma 1.

Under DPCP-p, a request can be blocked by lower-priority requests at most once.

Proof.

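We prove the lemma by contradiction. Since each local resource is used by only one task, we only need to consider global-resource-requests. Suppose that a request $\rho$ of $\tau_i$ on a processor $P_k$ is blocked by at least two lower-priority requests $\rho_1$ and $\rho_2$, issued by tasks $\tau_x$ and $\tau_y$ ($\pi_x < \pi_i$, $\pi_y < \pi_i$) for resources $\ell_u$ and $\ell_w$ on $P_k$, respectively. Let $t_s$ and $t_e$ be the times when $\rho$ arrives and finishes, respectively. Let $t_1$ and $t_2$ be the times when $\rho_1$ and $\rho_2$ are granted the locks, respectively. Without loss of generality, we assume that $t_1 \le t_2$.

While $\rho_1$ is pending at some time $t$, the processor ceiling satisfies $\Pi(P_k, t) \ge \Pi(\ell_u)$ according to the priority ceiling mechanism. Since $\rho$ can be blocked by $\rho_1$, the priority ceiling of $\ell_u$ is no lower than the effective priority of $\rho$, i.e., $\Pi(\ell_u) \ge \pi_H + \pi_i$. Thus, $\Pi(P_k, t) \ge \pi_H + \pi_i$ while $\rho_1$ is pending. Further, since $\rho$ is also blocked by $\rho_2$ by hypothesis, $\rho_2$ must be granted the lock at some time $t_2$ during this interval. According to the priority ceiling mechanism, the effective priority of $\rho_2$ must be higher than the processor ceiling at time $t_2$, i.e., $\pi_H + \pi_y > \Pi(P_k, t_2) \ge \pi_H + \pi_i$. Thus, $\pi_y > \pi_i$, which contradicts $\pi_y < \pi_i$. ∎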

IV Worst-Case Response Time Analysis

We derive an upper bound on the WCRT of an arbitrary path $\lambda$ of $\tau_i$ using fixed-point Response-Time Analysis (RTA) in this section. Let $R(\lambda)$ be the WCRT of $\lambda$; then $R_i$ can be upper bounded by the maximum of the WCRTs of all paths of $\tau_i$, that is,

$$R_i \le \max_{\lambda \in G_i} R(\lambda). \qquad (1)$$

To upper bound $R(\lambda)$, we classify the delays of a path into four categories as follows.

IV-A Blocking and Interference

First, a global-resource-request of a job of another task $\tau_j$ ($j \ne i$) causes $\lambda$ to incur

  • inter-task blocking, if an agent on behalf of the request is executing on some processor $P_k$ while $\lambda$ is suspended on a global resource on $P_k$.

Second, a vertex $v_{i,y}$ of $\tau_i$ that is not on $\lambda$ (i.e., $v_{i,y} \notin \lambda$) causes $\lambda$ to incur

  • intra-task blocking, if $v_{i,y}$ is holding a local resource and is scheduled while $\lambda$ is suspended on that resource, or if an agent is executing on behalf of $v_{i,y}$ on some processor $P_k$ while $\lambda$ is suspended on a global resource on $P_k$; and

  • intra-task interference, if $v_{i,y}$ is executing a non-critical section or a local-resource-request while $\lambda$ is ready but not executing.

Third, a global-resource-request from a job of another task or from a vertex of $\tau_i$ that is not on $\lambda$ causes $\lambda$ to incur

  • agent interference, if an agent on behalf of the request is executing while $\lambda$ is (i) ready but not executing, or (ii) suspended on a local resource whose holder is not scheduled (i.e., preempted by the agent of the request).

Notably, the defined delays are mutually exclusive, i.e., at any point in time, a vertex or an agent can cause a path to incur at most one type of delay. This is essential to avoid over-counting in the blocking-time analysis. For example, in Fig. 1(b), during a certain interval, each of four different vertices or agents causes a path to incur exactly one type of delay: one causes inter-task blocking, one intra-task blocking, one intra-task interference, and one agent interference. Note also that a path can incur multiple types of delay at the same time from different sources; for instance, a path in Fig. 1(b) simultaneously incurs intra-task interference due to a vertex and agent interference due to an agent.

Based on these definitions, we derive an upper bound on $R(\lambda)$ in Theorem 1. In preparation, we use $B^{inter}_\lambda$ to denote the workload of the other tasks that causes $\lambda$ to incur inter-task blocking. Analogously, let $B^{intra}_\lambda$ and $I^{intra}_\lambda$ denote the workloads of the vertices of $\tau_i$ not on $\lambda$ that cause $\lambda$ to incur intra-task blocking and intra-task interference, respectively. Let $I^{agent}_\lambda$ denote the workload of the agents that cause $\lambda$ to incur agent interference. These variables are bounded in Sect. IV-B and IV-C.

Theorem 1.

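$R(\lambda) \le len(\lambda) + B^{inter}_\lambda + B^{intra}_\lambda + \dfrac{I^{intra}_\lambda + I^{agent}_\lambda}{m_i}$.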

Proof.

While $\lambda$ is pending, it is either (I) executing, (II) suspended while an agent executes its request on a global resource, (III) ready but not executing, or (IV) suspended while none of its requests is being executed on a global resource. By definition, the total duration of (I) and (II) is bounded by $len(\lambda)$.

Case (III). The workload executed on $\mathcal{P}_i$ can come from (i) the vertices of $\tau_i$ not on $\lambda$ (i.e., intra-task interference), and (ii) the agents executing global-resource-requests that are not from $\lambda$ (i.e., agent interference). By definition, the workload of (i) is upper bounded by $I^{intra}_\lambda$, and the workload of (ii), denoted by $I^{a1}_\lambda$, is a part of $I^{agent}_\lambda$.

Case (IV). If $\lambda$ is suspended on a local resource $\ell_q$, then it is either (iii) waiting for a vertex of $\tau_i$ not on $\lambda$ to release $\ell_q$ (i.e., intra-task blocking), or (iv) waiting for the agents that preempted the resource holder to finish (i.e., agent interference). If $\lambda$ is suspended on a global resource on a processor $P_k$, then it can be delayed by (v) an agent on behalf of another task on $P_k$ (i.e., inter-task blocking), or (vi) an agent on behalf of a vertex of $\tau_i$ not on $\lambda$ on $P_k$ (i.e., intra-task blocking). By definition, the total duration of (iii) and (vi) is at most $B^{intra}_\lambda$, and the duration of (v) is at most $B^{inter}_\lambda$. Further, for case (iv), we denote the workload of the agents by $I^{a2}_\lambda$.

Total duration of (I)-(IV). During (III) and (IV)-(iv), at least one vertex of $\tau_i$ is ready but not executing; thus, by work-conserving scheduling, none of the $m_i$ processors in $\mathcal{P}_i$ is idle. Let the total duration of (III) and (IV)-(iv) be $t$; then $m_i \cdot t \le I^{intra}_\lambda + I^{a1}_\lambda + I^{a2}_\lambda$. By definition, $I^{a1}_\lambda + I^{a2}_\lambda \le I^{agent}_\lambda$. Hence, $t \le (I^{intra}_\lambda + I^{agent}_\lambda)/m_i$. Summing up (I)-(IV), we obtain the bound stated in the theorem. ∎
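Once the four delay terms are known, Theorem 1 is plain arithmetic: the path itself and both blocking terms are counted in full, while the interference workload is spread over the $m_i$ busy processors of the task. A minimal Python sketch, assuming integer time units so that exact fractions can be used (the function name is hypothetical):

```python
from fractions import Fraction


def path_response_bound(path_len, b_inter, b_intra, i_intra, i_agent, m_i):
    """Theorem 1: len(lambda) + B_inter + B_intra + (I_intra + I_agent) / m_i.

    Only the interference workload is divided by m_i: while lambda is ready but
    not executing, all m_i processors of tau_i are busy. Blocking is counted in full.
    Inputs are assumed to be integers so that Fraction gives an exact result.
    """
    return path_len + b_inter + b_intra + Fraction(i_intra + i_agent, m_i)


print(path_response_bound(10, 2, 3, 8, 4, 4))  # 10 + 2 + 3 + 12/4 = 18
```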

IV-B Upper Bounds on Blocking

We begin with the analysis of inter-task blocking. To derive an upper bound on $B^{inter}_\lambda$, we first derive an upper bound on the response time of a global-resource-request.

In preparation, let $\eta_j(t)$ denote the maximum number of jobs of a task $\tau_j$ during a time interval of length $t$. It is well known that $\eta_j(t) = \lceil (t + R_j)/T_j \rceil$. Further, let $W^H_{i,q}(t)$ be the cumulative length of the requests from tasks with higher priority than $\tau_i$ to the global resources that are assigned to the same processor as $\ell_q$, during a time interval of length $t$. Since there are at most $\eta_j(t)$ jobs of each higher-priority task $\tau_j$ ($\pi_j > \pi_i$) during a time interval of length $t$, and each job uses a resource $\ell_u$ for a total of at most $N_{j,u} L_{j,u}$, summing up the workload of all higher-priority requests yields

$$W^H_{i,q}(t) = \sum_{\pi_j > \pi_i} \sum_{\ell_u \in \Phi^G(\ell_q)} \eta_j(t) \cdot N_{j,u} \cdot L_{j,u}. \qquad (2)$$
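A direct Python transcription of $\eta_j(t)$ and of the higher-priority workload, i.e., summing $\eta_j(t) N_{j,u} L_{j,u}$ over higher-priority tasks and over the global resources on the processor of $\ell_q$. It assumes a simple dictionary-based task model (all field names hypothetical), integer time units, and that larger numbers denote higher priorities.

```python
import math


def eta(t, T_j, R_j):
    """Maximum number of jobs of tau_j in any window of length t."""
    return math.ceil((t + R_j) / T_j)


def higher_prio_workload(t, i, q, tasks, res_proc):
    """W^H_{i,q}(t) as in Eq. (2).

    tasks    -- {task_id: {"prio", "T", "R", "req": {resource: (N, L)}}}
    res_proc -- {global_resource: processor}; local resources are absent
    """
    proc = res_proc[q]
    total = 0
    for tj in tasks.values():
        if tj["prio"] <= tasks[i]["prio"]:
            continue  # only tasks with strictly higher base priority
        for u, (N, L) in tj["req"].items():
            if res_proc.get(u) == proc:  # global resource on l_q's processor
                total += eta(t, tj["T"], tj["R"]) * N * L
    return total


tasks = {
    1: {"prio": 1, "T": 100, "R": 50, "req": {"a": (2, 3)}},
    2: {"prio": 2, "T": 40, "R": 20, "req": {"a": (1, 5), "b": (2, 1)}},
    3: {"prio": 3, "T": 50, "R": 30, "req": {"c": (1, 4)}},
}
res_proc = {"a": 0, "b": 0, "c": 1}
print(higher_prio_workload(60, 1, "a", tasks, res_proc))  # task 2 only: 2 * (5 + 2) = 14
```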

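Let $r_{\lambda,q}$ be the response time of a request from $\lambda$ to a global resource $\ell_q$. We bound $r_{\lambda,q}$ according to the following lemma.

Lemma 2.

$r_{\lambda,q}$ can be upper bounded by the least positive solution, if one exists, of the following equation:

$$r_{\lambda,q} = L_{i,q} + \beta_{i,q} + \sum_{\ell_u \in \Phi^G(\ell_q)} (N_{i,u} - N_{\lambda,u}) \cdot L_{i,u} + W^H_{i,q}(r_{\lambda,q}), \qquad (3)$$

where $\beta_{i,q} = \max\{L_{j,u} \mid \pi_j < \pi_i,\ \ell_u \in \Phi^G(\ell_q),\ \Pi(\ell_u) \ge \pi_H + \pi_i\}$, or 0 if no such request exists.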

Proof.

Under DPCP-p, a global-resource-request has an effective priority higher than the base priority of any task. Thus, while a request $\rho$ from $\lambda$ to $\ell_q$ is pending, only global-resource-requests can delay it on its processor. Since global-resource-requests are scheduled by their priorities, $\rho$ may wait for (i) at most one lower-priority request to a global resource on that processor whose priority ceiling is no lower than $\pi_H + \pi_i$, (ii) the requests from the vertices of $\tau_i$ not on $\lambda$ to the global resources on that processor, and (iii) the higher-priority requests to the global resources on that processor.

By definition, (i) can be bounded by $\beta_{i,q}$, and (ii) can be bounded by $\sum_{\ell_u \in \Phi^G(\ell_q)} (N_{i,u} - N_{\lambda,u}) L_{i,u}$. By the definition of $W^H_{i,q}$, (iii) can be bounded by $W^H_{i,q}(r_{\lambda,q})$. In addition, $\rho$ itself executes for at most $L_{i,q}$. Summing up these bounds proves the lemma. ∎
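The least positive solution of Eq. (3) can be found by the usual fixed-point iteration. The Python sketch below assumes the higher-priority workload is supplied as a non-decreasing function of the window length (for instance, $W^H_{i,q}$) and gives up once the iterate exceeds a caller-chosen horizon such as the deadline. All names are hypothetical; larger numbers denote higher priorities.

```python
import math


def lower_prio_blocking(pi_i, pi_H, candidates):
    """beta_{i,q}: longest lower-priority request on the same processor whose
    resource ceiling is at least pi_H + pi_i. candidates: [(pi_j, L_ju, ceiling_u)]."""
    return max((L for p, L, c in candidates if p < pi_i and c >= pi_H + pi_i),
               default=0)


def request_response_time(L_iq, beta, intra, w_hp, horizon):
    """Least fixed point of r = L_iq + beta + intra + w_hp(r), or None past horizon.

    w_hp must be non-decreasing; starting from the constant part, the iterates
    never decrease, so the first repeated value is the least solution.
    """
    base = L_iq + beta + intra
    r = base
    while True:
        r_next = base + w_hp(r)
        if r_next > horizon:
            return None
        if r_next == r:
            return r
        r = r_next


beta = lower_prio_blocking(2, 10, [(1, 4, 13), (1, 7, 11), (3, 9, 13)])  # -> 4
print(request_response_time(2, beta, 1, lambda t: 5 * math.ceil(t / 20), 100))  # 12
```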

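With Lemma 2 in place, we are ready to upper bound $B^{inter}_\lambda$.

Lemma 3.

$B^{inter}_\lambda \le \sum_{k=1}^{m} \min(X_{\lambda,k}, Y_{\lambda,k})$, where

$$X_{\lambda,k} = \sum_{\ell_q \in \Phi^G_k} N_{\lambda,q} \cdot \big(\beta_{i,q} + W^H_{i,q}(r_{\lambda,q})\big) \qquad (4)$$

and

$$Y_{\lambda,k} = \sum_{j \ne i} \sum_{\ell_q \in \Phi^G_k} \eta_j(R(\lambda)) \cdot N_{j,q} \cdot L_{j,q}. \qquad (5)$$
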
Proof.

Each time $\lambda$ requests a global resource $\ell_q$ on $P_k$, it can be blocked by (i) at most one lower-priority request and (ii) all higher-priority requests. Analogous to the proof of Lemma 2, (i) can be bounded by $\beta_{i,q}$, and (ii) can be bounded by $W^H_{i,q}(r_{\lambda,q})$. Since $\lambda$ requests each global resource $\ell_q$ at most $N_{\lambda,q}$ times, the workload of the other tasks that causes $\lambda$ to incur inter-task blocking on $P_k$ can be bounded by $X_{\lambda,k}$ in Eq. (4).

Further, each other task $\tau_j$ ($j \ne i$) has at most $\eta_j(R(\lambda))$ jobs before $\lambda$ finishes, and each job uses a resource $\ell_q$ for a total of at most $N_{j,q} L_{j,q}$. Thus, the workload of the other tasks on the global resources on $P_k$ can be bounded by $Y_{\lambda,k}$ in Eq. (5). Summing up the minimum of $X_{\lambda,k}$ and $Y_{\lambda,k}$ over all processors proves the lemma. ∎
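Lemma 3 takes, per processor, the smaller of a request-driven bound ($X$: each request of $\lambda$ suffers one lower-priority blocker plus all higher-priority requests) and a job-driven bound ($Y$: total usage by other tasks while $\lambda$ is pending). A minimal Python sketch, assuming that $\beta_{i,q}$ and $W^H_{i,q}(r_{\lambda,q})$ have already been computed per resource and using a hypothetical dictionary-based task model:

```python
import math


def x_term(path_req, beta, w_hp_at_r):
    """X_{lambda,k}: sum of N_{lambda,q} * (beta_{i,q} + W^H_{i,q}(r_{lambda,q})) over l_q on P_k."""
    return sum(n * (beta[q] + w_hp_at_r[q]) for q, n in path_req.items())


def y_term(other_tasks, resources_on_k, R_lambda):
    """Y_{lambda,k}: usage of P_k's global resources by other tasks within R(lambda)."""
    total = 0
    for t in other_tasks:
        jobs = math.ceil((R_lambda + t["R"]) / t["T"])  # eta_j(R(lambda))
        for q in resources_on_k:
            N, L = t["req"].get(q, (0, 0))
            total += jobs * N * L
    return total


def inter_task_blocking(per_processor):
    """Lemma 3: sum over processors of min(X_{lambda,k}, Y_{lambda,k})."""
    return sum(min(x, y) for x, y in per_processor)


x0 = x_term({"a": 2}, {"a": 3}, {"a": 10})                          # 2 * 13 = 26
y0 = y_term([{"T": 50, "R": 20, "req": {"a": (1, 4)}}], ["a"], 80)  # 2 jobs * 4 = 8
print(inter_task_blocking([(x0, y0), (5, 9)]))                      # 8 + 5 = 13
```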

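Next, we derive an upper bound on intra-task blocking. For brevity, let $\delta_{\lambda,k} = \min\big(1, \sum_{\ell_q \in \Phi^G_k} N_{\lambda,q}\big)$. Intuitively, $\delta_{\lambda,k} = 1$ if some vertex on path $\lambda$ requests a global resource on processor $P_k$, and $\delta_{\lambda,k} = 0$ otherwise.

Lemma 4.

$B^{intra}_\lambda \le \sum_{\ell_q \in \Phi^L \cap \Phi_i} B^{L}_{\lambda,q} + \sum_{k=1}^{m} B^{G}_{\lambda,k}$, where

$$B^{L}_{\lambda,q} = \min(1, N_{\lambda,q}) \cdot (N_{i,q} - N_{\lambda,q}) \cdot L_{i,q} \qquad (6)$$

and

$$B^{G}_{\lambda,k} = \delta_{\lambda,k} \cdot \sum_{\ell_q \in \Phi^G_k} (N_{i,q} - N_{\lambda,q}) \cdot L_{i,q}. \qquad (7)$$
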
Proof.

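By definition, $\lambda$ incurs intra-task blocking on a local resource $\ell_q$ only if it requests $\ell_q$. Clearly, $\min(1, N_{\lambda,q}) = 1$ if $\lambda$ requests $\ell_q$, and 0 otherwise. Given that the vertices of $\tau_i$ not on $\lambda$ can execute on $\ell_q$ for a total of at most $(N_{i,q} - N_{\lambda,q}) L_{i,q}$, $\lambda$ incurs intra-task blocking on $\ell_q$ for at most $B^L_{\lambda,q}$, as in Eq. (6).

Moreover, $\lambda$ incurs intra-task blocking on a global resource on some processor $P_k$ only if it requests some global resource on $P_k$. By definition, $\delta_{\lambda,k} = 1$ if $\lambda$ requests some global resource on $P_k$, and 0 otherwise. Thus, the workload that causes $\lambda$ to incur intra-task blocking on $P_k$ can be bounded by summing up $(N_{i,q} - N_{\lambda,q}) L_{i,q}$ over all global resources $\ell_q$ on $P_k$, i.e., by $B^G_{\lambda,k}$, as in Eq. (7).

Thus, $B^{intra}_\lambda$ can be bounded by summing up (i) $B^L_{\lambda,q}$ for all local resources $\ell_q$ of $\tau_i$, and (ii) $B^G_{\lambda,k}$ for all processors $P_k$. ∎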
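Both indicator terms in Lemma 4 reduce to checking whether the path issues any request to the resource (or processor) at all. A minimal Python sketch under hypothetical dictionary inputs; resources missing from a dictionary are treated as unused:

```python
def intra_task_blocking(N_task, N_path, L, local_res, global_by_proc):
    """Lemma 4: B^intra_lambda from Eqs. (6) and (7).

    N_task, N_path -- {resource: N_{i,q}} and {resource: N_{lambda,q}}
    L              -- {resource: L_{i,q}}
    local_res      -- local resources of tau_i
    global_by_proc -- {processor: [global resources on it]}
    """
    def others(q):  # time the vertices off the path can spend on l_q
        return (N_task.get(q, 0) - N_path.get(q, 0)) * L.get(q, 0)

    total = 0
    for q in local_res:                        # Eq. (6): only if lambda requests l_q
        if N_path.get(q, 0) > 0:
            total += others(q)
    for resources in global_by_proc.values():  # Eq. (7): delta_{lambda,k}
        if any(N_path.get(q, 0) > 0 for q in resources):
            total += sum(others(q) for q in resources)
    return total


print(intra_task_blocking({"loc": 3, "g1": 4, "g2": 2}, {"loc": 1, "g1": 1},
                          {"loc": 2, "g1": 5, "g2": 3}, ["loc"],
                          {0: ["g1"], 1: ["g2"]}))  # 4 + 15 = 19
```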

IV-C Upper Bounds on Interference

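Next, we derive upper bounds on the intra-task interference and the agent interference. First, the intra-task interference of $\lambda$ can be upper bounded by the workload of the non-critical sections and the local-resource-requests of the vertices of $\tau_i$ that are not on $\lambda$.

Lemma 5.

$I^{intra}_\lambda \le \big(C'_i - \sum_{v_{i,x} \in \lambda} C'_{i,x}\big) + \sum_{\ell_q \in \Phi^L \cap \Phi_i} (N_{i,q} - N_{\lambda,q}) \cdot L_{i,q}$.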

Proof.

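By definition, $I^{intra}_\lambda$ consists of the workload of (i) the non-critical sections and (ii) the local-resource-requests of the vertices of $\tau_i$ that are not on $\lambda$. From the task model, (i) and (ii) are bounded by $C'_i - \sum_{v_{i,x} \in \lambda} C'_{i,x}$ and $\sum_{\ell_q \in \Phi^L \cap \Phi_i} (N_{i,q} - N_{\lambda,q}) L_{i,q}$, respectively. Thus, $I^{intra}_\lambda$ is bounded by the sum of (i) and (ii). ∎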
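Lemma 5 in code form, as a minimal Python sketch with hypothetical argument names; only work that lies off the path $\lambda$ contributes.

```python
def intra_task_interference(Cnc_task, Cnc_path, N_task, N_path, L, local_res):
    """Lemma 5: non-critical work off the path plus local critical sections off the path.

    Cnc_task -- C'_i, non-critical WCET of the whole task
    Cnc_path -- sum of C'_{i,x} over the vertices on lambda
    """
    local = sum((N_task[q] - N_path.get(q, 0)) * L[q] for q in local_res)
    return (Cnc_task - Cnc_path) + local


print(intra_task_interference(30, 12, {"loc": 3}, {"loc": 1}, {"loc": 2}, ["loc"]))  # 18 + 4 = 22
```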

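For each global resource $\ell_q$ on the processors $\mathcal{P}_i$ of $\tau_i$, the agent interference of $\lambda$ consists of the agent workload of the requests to $\ell_q$ that are not from $\lambda$, i.e., those from other tasks and from the vertices of $\tau_i$ that are not on $\lambda$.

Lemma 6.

$I^{agent}_\lambda \le \sum_{\ell_q \in \Phi^G(\mathcal{P}_i)} \big(A^{O}_{\lambda,q} + A^{S}_{\lambda,q}\big)$, where

$$A^{O}_{\lambda,q} = \sum_{j \ne i} \eta_j(R(\lambda)) \cdot N_{j,q} \cdot L_{j,q} \qquad (8)$$

and

$$A^{S}_{\lambda,q} = (N_{i,q} - N_{\lambda,q}) \cdot L_{i,q}. \qquad (9)$$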
Proof.

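While $\lambda$ is pending, the other tasks can request a resource $\ell_q$ for at most $A^O_{\lambda,q}$ in total, and the vertices of $\tau_i$ not on $\lambda$ can execute on $\ell_q$ for at most $A^S_{\lambda,q}$. Thus, the agent interference of $\lambda$ can be bounded by summing up $A^O_{\lambda,q} + A^S_{\lambda,q}$ over all global resources $\ell_q$ on $\mathcal{P}_i$. ∎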
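A minimal Python sketch of Lemma 6 under a hypothetical dictionary model. Only global resources hosted on the task's own processors $\mathcal{P}_i$ can produce agent interference for $\lambda$, and the other-task term depends on $R(\lambda)$ through $\eta_j$.

```python
import math


def agent_interference(own_req, other_tasks, N_path, cluster_resources, R_lambda):
    """Lemma 6: agent workload of the global resources hosted on tau_i's processors.

    own_req     -- {resource: (N_{i,q}, L_{i,q})} for tau_i
    other_tasks -- [{"T", "R", "req": {resource: (N, L)}}] for every tau_j, j != i
    """
    total = 0
    for q in cluster_resources:
        for t in other_tasks:                  # Eq. (8): requests of other tasks
            N, L = t["req"].get(q, (0, 0))
            total += math.ceil((R_lambda + t["R"]) / t["T"]) * N * L
        N_i, L_i = own_req.get(q, (0, 0))      # Eq. (9): own vertices off the path
        total += (N_i - N_path.get(q, 0)) * L_i
    return total


print(agent_interference({"g": (3, 2)}, [{"T": 40, "R": 10, "req": {"g": (1, 5)}}],
                         {"g": 1}, ["g"], 70))  # 2 * 5 + 2 * 2 = 14
```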

Now that all the variables in Theorem 1 are bounded, the WCRT of task $\tau_i$ can be bounded according to Eq. (1) by calculating the WCRTs of all paths of $\tau_i$.
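Because $\eta_j(R(\lambda))$ appears in Lemmas 3 and 6, the bound of Theorem 1 depends on $R(\lambda)$ itself and has to be solved iteratively before Eq. (1) takes the maximum over paths. The Python sketch below assumes that each path's bound is given as a non-decreasing function of $R$, started from $len(\lambda)$; in practice all complete paths would have to be enumerated, and a DAG can have exponentially many of them. Names and example bounds are hypothetical.

```python
import math


def path_wcrt(bound, start, deadline):
    """Fixed-point RTA for one path: iterate R = bound(R) from R = len(lambda).

    bound must be non-decreasing in R; returns None if the iterate exceeds the deadline.
    """
    R = start
    while True:
        R_next = bound(R)
        if R_next > deadline:
            return None
        if R_next == R:
            return R
        R = R_next


def task_wcrt(paths, deadline):
    """Eq. (1): maximum over the paths; None if any path misses the deadline."""
    results = [path_wcrt(bound, length, deadline) for bound, length in paths]
    return None if None in results else max(results)


# Two hypothetical paths whose delay terms grow with the window length R.
paths = [(lambda R: 10 + 2 * math.ceil(R / 20), 10),
         (lambda R: 8 + 3 * math.ceil(R / 10), 8)]
print(task_wcrt(paths, 30))  # path bounds 12 and 14 -> 14
```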

V Task and Resource Partitioning

According to the schedulability analysis in Sect. IV, the WCRT of a task can be determined only after the tasks and the global resources have been partitioned. In this section, we present a partitioning algorithm that iteratively assigns tasks and resources.

For ease of description, we regard the processors assigned to each task as a cluster. Accordingly, we use $\mathcal{C}_k$ to denote the $k$-th cluster ($1 \le k \le n$). The capacity of $\mathcal{C}_k$, denoted by $|\mathcal{C}_k|$, is equal to the number of processors in $\mathcal{C}_k$. The utilization of $\mathcal{C}_k$, denoted by $U(\mathcal{C}_k)$, is the total utilization of the task and the global resources assigned to $\mathcal{C}_k$, where the utilization of a resource $\ell_q$ is defined as $U(\ell_q) = \sum_{\tau_j \in \Gamma(\ell_q)} N_{j,q} L_{j,q} / T_j$. The total utilization of the global resources assigned to a processor $P_x$ is denoted by $U^G(P_x)$, i.e., $U^G(P_x) = \sum_{\ell_q \in \Phi^G_x} U(\ell_q)$. The utilization slack of a cluster is defined by $S(\mathcal{C}_k) = |\mathcal{C}_k| - U(\mathcal{C}_k)$. A cluster is infeasible if $S(\mathcal{C}_k) < 0$.

Each task $\tau_i$ is initially assigned $m_i$ processors, and the global resources are partitioned according to the Worst-Fit Decreasing (WFD) heuristic, as shown in Algorithm 1. Specifically, the global resource with the highest utilization is assigned to the processor with the lowest resource utilization in the cluster with the maximum utilization slack, as shown in Algorithm 2. The schedulability analysis is performed starting from the task with the highest base priority. If a task is unschedulable, an additional processor, if one exists, is assigned to that task. Since the capacity of the cluster changes when an additional processor is assigned, the global resources are re-assigned and the schedulability tests are performed again. Since each round either terminates or assigns one additional processor, the partitioning process runs at most $m$ rounds for systems containing only heavy tasks.

Algorithm 1 Task and Resource Partitioning
1: Input: the tasks $\Gamma$, the processors $\mathcal{P}$, and the resources $\Phi$
2: Output: the schedulability of the system
3: for each $\tau_i \in \Gamma$ do
4:     if there are $m_i$ processors unassigned then
5:         assign $m_i$ processors to $\tau_i$
6:     else
7:         return unschedulable
8: while true do
9:     if WFD_Resources($\Phi^G$, $\mathcal{P}$) is infeasible then
10:        return unschedulable
11:    for each $\tau_i \in \Gamma$ in decreasing order of priority do
12:        if $R_i > D_i$ then
13:            if there is a processor unassigned then
14:                assign one more processor to $\tau_i$
15:                roll back the global resource assignment
16:                break     // i.e., go to line 9
17:            else
18:                return unschedulable
19:    return schedulable     // reached only if every task passes the test at line 12
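Algorithm 2 WFD_Resources
1: Input: the global resources $\Phi^G$ and the processors $\mathcal{P}$
2: Output: the feasibility of the global resource allocation
3: sort $\Phi^G$ in non-increasing order of utilization
4: for each $P_x \in \mathcal{P}$ do
5:     $U^G(P_x) \gets 0$
6: for each $\ell_q \in \Phi^G$ do
7:     select the cluster $\mathcal{C}_k$ with the maximum value of $S(\mathcal{C}_k)$
8:     if $S(\mathcal{C}_k) < U(\ell_q)$ then
9:         return infeasible allocation
10:    else
11:        assign $\ell_q$ to processor $P_x \in \mathcal{C}_k$, s.t., $U^G(P_x)$ is minimal in $\mathcal{C}_k$
12:        $U^G(P_x) \gets U^G(P_x) + U(\ell_q)$
13: return feasible allocation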
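A minimal Python sketch of the WFD step: largest resource first, into the cluster with the most slack, onto that cluster's least-loaded processor. It assumes that each cluster's slack before resource placement is supplied by the caller and that placement fails as soon as a resource exceeds the largest remaining slack; that infeasibility test is an assumption. Exact fractions avoid floating-point ties when slacks are compared, and all names are hypothetical.

```python
from fractions import Fraction


def wfd_resources(res_util, clusters):
    """Worst-fit-decreasing placement of global resources.

    res_util -- {resource: U(l_q)}
    clusters -- [{"procs": [processor ids], "slack": S(C_k) before placing resources}]
    Returns {resource: processor} or None if some resource does not fit.
    """
    proc_load = {p: 0 for c in clusters for p in c["procs"]}  # U^G(P_x)
    slack = [c["slack"] for c in clusters]
    placement = {}
    for q in sorted(res_util, key=res_util.get, reverse=True):     # largest first
        k = max(range(len(clusters)), key=lambda idx: slack[idx])  # roomiest cluster
        if slack[k] < res_util[q]:
            return None  # even the roomiest cluster would become infeasible
        p = min(clusters[k]["procs"], key=lambda x: proc_load[x])  # least-loaded CPU
        placement[q] = p
        proc_load[p] += res_util[q]
        slack[k] -= res_util[q]
    return placement


F = Fraction
print(wfd_resources({"a": F(1, 2), "b": F(3, 10), "c": F(1, 5)},
                    [{"procs": [0, 1], "slack": F(1)},
                     {"procs": [2, 3], "slack": F(9, 10)}]))
# {'a': 0, 'b': 2, 'c': 3}
```

Worst fit spreads the resource load across clusters, which keeps slack available for the additional processors that Algorithm 1 may later assign.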

VI Discussion

Although we focus on heavy tasks in this paper, the DPCP-p approach can be extended to support light tasks. First, light tasks are treated as sequential tasks under federated scheduling; thus, the original DPCP can be used to handle resource sharing among them. Further, since each heavy task is exclusively assigned a cluster of processors, the delays between heavy and light tasks are only due to global resources. According to the definitions in Sect. IV-A, such delays are captured by inter-task blocking and agent interference. According to Lemma 3 and Lemma 6, the bounds on inter-task blocking and agent interference do not distinguish between heavy and light tasks. Thus, the delays between heavy and light tasks can be analyzed using the analysis framework presented in Sect. IV. Notably, optimally handling light tasks with shared resources under federated scheduling remains an open problem.

Further, we assume that the maximum number of requests of each vertex is known. This is possible in real-life applications where the maximum number of requests of each vertex can be pre-determined. Thus, we can derive a more accurate blocking bound by using the exact number of requests on a path $\lambda$, i.e., $N_{\lambda,q}$, rather than enumerating the value of $N_{\lambda,q}$ from 0 to $N_{i,q}$ [6]. The tradeoff is the additional computation needed to enumerate all paths of the task in the analysis. Notably, the presented analysis also applies to the prior model [6, 11] by using the key-path-oriented analysis [11].

The blocking-time analysis can be further improved by modern analysis techniques, e.g., the Linear-Programming-based (LP-based) analysis in [3]. However, it is not yet clear how the LP-based analysis [3] can be applied to this scenario. Thus, we first establish the fundamental analysis framework in this paper and leave fine-grained analysis as future work.

VII Empirical Comparisons

In this section, we conduct schedulability experiments to evaluate the DPCP-p approach using synthesized heavy tasks.

VII-A Experimental Setup

Multiprocessor platforms with $m \in \{8, 16, 32\}$ unispeed processors and several different numbers of shared resources were considered. For each $m$, we generated tasksets with total utilizations starting from 1 in steps of 0.05. The utilizations of the tasks in a taskset were generated with the RandFixedSum algorithm [7] within a range determined by $U_{avg}$, the average utilization of the tasks. The base priorities of the tasks were assigned by the Rate Monotonic (RM) heuristic. The number of tasks was determined by the chosen $U_{avg}$ and the total utilization of the taskset.

For each task $\tau_i$, the DAG structure was generated by the Erdős–Rényi algorithm [5], where the number of vertices was randomly chosen from a given range and the probability of an edge between any two vertices was set to 0.1. The task period $T_i$ was randomly chosen from a log-uniform distribution over a given range, and $C_i$ was computed by $C_i = U_i \cdot T_i$. Each task $\tau_i$ uses each resource $\ell_q$ with a given probability. If $\tau_i$ uses $\ell_q$, the maximum number of requests $N_{i,q}$ was randomly chosen from one of two ranges, and the maximum critical section length $L_{i,q}$ was chosen from one of two ranges. The WCET of each vertex and the maximum number of requests of each vertex were randomly determined such that $\sum_{v_{i,x} \in V_i} C_{i,x} = C_i$ and $\sum_{v_{i,x} \in V_i} N_{i,x,q} = N_{i,q}$. To ensure plausibility, additional constraints were enforced on the generated parameters. The combination of the parameters yields 216 experimental scenarios.

VII-B Baselines

We compare DPCP-p with the existing locking protocols under federated scheduling, denoted by SPIN-SON [6] and LPP [11] (no locking protocols have been studied for the other state-of-the-art scheduling approaches in the literature, as discussed in Sect. VIII). For DPCP-p, we use DPCP-p-EP to denote the analysis presented in Sect. IV, which enumerates all paths, and DPCP-p-EN to denote the analysis that enumerates $N_{\lambda,q}$ from 0 to $N_{i,q}$ for each $\ell_q$, as in [6, 11]. We also use FED-FP to denote a hypothetical baseline under federated scheduling [13] that ignores shared resources.

VII-C Results

Fig. 2 shows the acceptance ratios of the tested approaches with increasing normalized utilization, where Fig. 2(b) and (d) involve more resource contention than Fig. 2(a) and (c). DPCP-p-EP consistently schedules more tasksets than SPIN-SON and LPP. In particular, the advantage of the DPCP-p approach is more significant under heavy resource contention, as shown in Fig. 2(b) and (d), while SPIN-SON appears to be competitive under light resource contention, as shown in Fig. 2(a) and (c).

For brevity, we use the notions of dominance and outperformance¹ to summarize the main trends of the results in Tables 2 and 3. The DPCP-p approach improves upon SPIN-SON and LPP significantly. In particular, DPCP-p-EP outperforms DPCP-p-EN, SPIN-SON, and LPP in all scenarios, and it dominates them in more than 99% of the scenarios. Similarly, DPCP-p-EN dominates and outperforms SPIN-SON and LPP more often than it is dominated or outperformed by them.

¹For an experimental scenario, algorithm A is said to outperform algorithm B if A schedules more task sets than B, and to dominate B if its acceptance ratio is higher than that of B at some tested points and never lower than that of B at any tested point.
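The two comparison notions can be stated as small predicates. The Python sketch below assumes every tested utilization point uses the same number of task sets, so comparing schedulable counts point by point is equivalent to comparing acceptance ratios; the function names and data are hypothetical.

```python
def outperforms(sched_a, sched_b):
    """A outperforms B if it schedules more task sets in total over the tested points."""
    return sum(sched_a) > sum(sched_b)


def dominates(ratio_a, ratio_b):
    """A dominates B if it is never lower at any tested point and higher at some point."""
    pairs = list(zip(ratio_a, ratio_b))
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)


# Hypothetical counts of schedulable task sets at three utilization points
a, b, c = [100, 80, 50], [100, 70, 50], [100, 60, 90]
print(dominates(a, b), outperforms(a, b))  # True True
print(dominates(c, a), outperforms(c, a))  # False True
```

Dominance implies outperformance, but not conversely, which is why the dominance counts in Table 2 are never larger than the corresponding outperformance counts in Table 3.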

Fig. 2: Acceptance ratios of the tested approaches with increasing normalized utilization, panels (a) to (d); (b) and (d) involve more resource contention than (a) and (c).

Table 2. Statistics for dominance: number (percentage) of the 216 scenarios in which the row approach dominates the column approach.

              DPCP-p-EP    DPCP-p-EN    SPIN-SON     LPP
DPCP-p-EP     N/A          216 (100%)   215 (99.5%)  216 (100%)
DPCP-p-EN     0 (0.0%)     N/A          104 (48.1%)  87 (40.3%)
SPIN-SON      0 (0.0%)     10 (4.6%)    N/A          39 (18.1%)
LPP           0 (0.0%)     32 (14.8%)   38 (17.6%)   N/A

Table 3. Statistics for outperformance: number (percentage) of the 216 scenarios in which the row approach outperforms the column approach.

              DPCP-p-EP    DPCP-p-EN    SPIN-SON     LPP
DPCP-p-EP     N/A          216 (100%)   216 (100%)   216 (100%)
DPCP-p-EN     0 (0.0%)     N/A          138 (63.9%)  158 (73.1%)
SPIN-SON      0 (0.0%)     78 (36.1%)   N/A          114 (52.8%)
LPP           0 (0.0%)     58 (26.9%)   102 (47.2%)  N/A

VIII Related Work

Real-time scheduling algorithms and analysis techniques for independent parallel tasks have been widely studied in the literature [8, 13, 12, 14, 1, 4], where shared resources are not modeled explicitly.

The study of multiprocessor real-time locking protocols stems from the DPCP [16] and the Multiprocessor Priority Ceiling Protocol (MPCP) [15]. Empirical studies [3] showed that the DPCP has better schedulability performance than the MPCP. Based on the DPCP, Hsiu et al. [9] presented a dedicated-core scheduling approach. More recently, Huang et al. [10] proposed the ROP scheduling. However, the works in [16, 15, 3, 9, 10] all assume sequential task models. Although locking protocols originally designed for sequential tasks, e.g., [16, 15], might be used to handle concurrent requests of parallel tasks, no corresponding analysis has been established in the literature. In this paper, we extend the DPCP to support parallel real-time tasks and present the schedulability analysis.

Recently, there has been significant progress on the scheduling of parallel real-time tasks, e.g., partitioned [4], semi-partitioned [1], global [8, 14], federated [13], and decomposition-based scheduling [12]. However, to the best of our knowledge, no study on locking protocols for state-of-the-art scheduling approaches other than federated scheduling has been reported in the literature. For federated scheduling, Dinh et al. [6] studied the schedulability analysis for spinlocks, and Jiang et al. [11] developed a semaphore protocol called LPP. Both [6] and [11] assume local execution of resource requests. In contrast, DPCP-p is based on a distributed synchronization framework, where requests to global resources are executed on designated processors. In this way, the contention on each resource is isolated to its designated processor, so that blocking among tasks can be better managed.

IX Conclusion

This paper is the first to study a distributed synchronization framework for parallel real-time tasks with shared resources. We extend the DPCP to DAG tasks under federated scheduling and develop analysis techniques and a partitioning heuristic to bound the task response times. More precise blocking analysis based on the concrete DAG structure is an interesting direction for future work.

References

  • [1] V. Bonifaci, G. D’Angelo, and A. Marchetti-Spaccamela (2017) Algorithms for hierarchical and semi-partitioned parallel scheduling. In IPDPS, pp. 738–747.
  • [2] B. B. Brandenburg and J. H. Anderson (2010) Optimality results for multiprocessor real-time locking. In RTSS, pp. 49–60.
  • [3] B. B. Brandenburg (2013) Improved analysis and evaluation of real-time semaphore protocols for P-FP scheduling. In RTAS, pp. 141–152.
  • [4] D. Casini, A. Biondi, G. Nelissen, and G. C. Buttazzo (2018) Partitioned fixed-priority scheduling of parallel tasks without preemptions. In RTSS, pp. 421–433.
  • [5] D. Cordeiro, G. Mounié, S. Perarnau, D. Trystram, J. Vincent, and F. Wagner (2010) Random graph generation for scheduling simulations. In SIMUTools, pp. 60.
  • [6] S. Dinh, J. Li, K. Agrawal, C. D. Gill, and C. Lu (2018) Blocking analysis for spin locks in real-time parallel tasks. IEEE Trans. Parallel Distrib. Syst. 29 (4), pp. 789–802.
  • [7] P. Emberson, R. Stafford, and R. I. Davis (2010) Techniques for the synthesis of multiprocessor tasksets. In WATERS, pp. 6–11.
  • [8] J. C. Fonseca, G. Nelissen, and V. Nélis (2017) Improved response time analysis of sporadic DAG tasks for global FP scheduling. In RTNS, E. Bini and C. Pagetti (Eds.), pp. 28–37.
  • [9] P. Hsiu, D. Lee, and T. Kuo (2011) Task synchronization and allocation for many-core real-time systems. In EMSOFT, pp. 79–88.
  • [10] W. Huang, M. Yang, and J. Chen (2016) Resource-oriented partitioned scheduling in multiprocessor systems: how to partition and how to share? In RTSS, pp. 111–122.
  • [11] X. Jiang, N. Guan, W. Liu, and M. Yang (2019) Scheduling and analysis of parallel real-time tasks with semaphores. In DAC, pp. 93.
  • [12] X. Jiang, X. Long, N. Guan, and H. Wan (2016) On the decomposition-based global EDF scheduling of parallel real-time tasks. In RTSS, pp. 237–246.
  • [13] J. Li, J. Chen, K. Agrawal, C. Lu, C. D. Gill, and A. Saifullah (2014) Analysis of federated and global scheduling for parallel real-time tasks. In ECRTS, pp. 85–96.
  • [14] A. Melani, M. Bertogna, V. Bonifaci, A. Marchetti-Spaccamela, and G. C. Buttazzo (2017) Schedulability analysis of conditional parallel task graphs in multicore systems. IEEE Trans. Comput. 66 (2), pp. 339–353.
  • [15] R. Rajkumar (1990) Real-time synchronization protocols for shared memory multiprocessors. In ICDCS, pp. 116–123.
  • [16] R. Rajkumar, L. Sha, and J. P. Lehoczky (1988) Real-time synchronization protocols for multiprocessors. In RTSS, pp. 259–269.
  • [17] G. von der Brüggen, J. Chen, W. Huang, and M. Yang (2017) Release enforcement in resource-oriented partitioned scheduling for multiprocessor systems. In RTNS, pp. 287–296.
  • [18] M. Yang, W. Huang, and J. Chen (2019) Resource-oriented partitioning for multiprocessor systems with shared resources. IEEE Trans. Comput. 68 (6), pp. 882–898.