1. Introduction
With the prevalence of multiprocessor platforms and of parallel programming languages and runtime systems such as OpenMP (OpenMP, 2019), Cilk Plus (Frigo et al., 1998; Intel Cilk Plus, 2019), and Intel’s Threading Building Blocks (Intel Threading Building Blocks, 2019), the demand for computer programs that can exploit the parallelism offered by modern hardware is ever-growing. In recent years, the real-time systems research community has worked to address this trend for real-time applications that require parallel execution to satisfy their deadlines, such as real-time hybrid simulation of structures (Ferry et al., 2014) and autonomous vehicles (Kim et al., 2013).
Much effort has been made to develop analysis techniques and schedulability tests for scheduling parallel real-time tasks under scheduling algorithms such as Global Earliest Deadline First (GEDF) and Global Deadline Monotonic (GDM). However, schedulability analysis for parallel tasks is inherently more complex than for conventional sequential tasks. This is because intra-task parallelism is allowed within individual tasks, which enables each task to execute simultaneously upon multiple processors. The parallelism of a task can also vary as it executes, since it depends on the precedence constraints imposed on the task. Consequently, this raises the questions of how to account for the inter-task interference caused by other tasks on a task and the intra-task interference among threads of the task itself.
In this paper, we consider task systems that consist of parallel tasks scheduled under Global Fixed-Priority (GFP) scheduling, in which each task is represented by a Directed Acyclic Graph (DAG). Our analysis is based on the concepts of critical interference and critical chain (Chwa et al., 2013; Melani et al., 2015; Chwa et al., 2017), which allow the analysis to focus on a special chain of sequential segments of each task, and hence enable us to use techniques similar to the ones developed for sequential tasks (Baker, 2003; Bertogna et al., 2005; Bertogna and Cirinei, 2007; Bertogna et al., 2009).
The contributions of this paper are as follows:

We summarize the state-of-the-art analyses for GFP and highlight their limitations, specifically for the calculation of the interference of carry-in jobs and carry-out jobs.

We propose a new technique for computing upper bounds on carry-out workloads, by transforming the problem into an optimization problem that can be solved by modern optimization solvers.

We present a response-time analysis using the workload bound computed with the new technique. Experimental results for randomly generated DAG tasks confirm that our technique dominates existing analyses for GFP.
The rest of this paper is organized as follows. In Sections 2 and 3 we discuss related work and present the task model we consider in this paper. Section 4 reviews the concepts of critical interference and critical chain and discusses a general framework to bound response-time. Section 5 summarizes the most recent analyses of GFP and highlights limitations of those analyses. In Section 6 we propose a new technique to bound carry-out workload. A response-time analysis and a discussion of the complexity of our method are given in Section 7. Section 8 presents the evaluation of our method for randomly generated DAG tasks. We conclude our work in Section 9.
2. Related Work
For the sequential task model, Bertogna et al. (Bertogna and Cirinei, 2007) proposed a response-time analysis that works for GEDF and GFP. They bound the interference of a task in a problem window by the worst-case workload it can generate in that window. The worst-case workload is then bounded by considering a worst-case release pattern of the interfering task. This technique was later extended by others to analyze parallel tasks, as is done in this work. Bertogna et al. (Bertogna et al., 2009) proposed a sufficient slack-based schedulability test for GEDF and GFP in which the slack values for the tasks are used in an iterative algorithm to improve the schedulability gradually. Later, Guan et al. (Guan et al., 2009) proposed a new response-time analysis for both constrained-deadline and arbitrary-deadline tasks.
Initially, simple parallel real-time task models were studied, such as the fork-join task model and the synchronous task model. Lakshmanan et al. (Lakshmanan et al., 2010) presented a transformation algorithm to schedule fork-join tasks where all parallel segments of each task must have the same number of threads, which must be less than the number of processors. They also proved a resource augmentation bound of 3.42 for their algorithm. Saifullah et al. (Saifullah et al., 2013) improved on that work by removing the restriction on the number of threads in parallel segments. They proposed a task decomposition algorithm and proved resource augmentation bounds for the algorithm under GEDF and Partitioned Deadline Monotonic (PDM) scheduling. Axer et al. (Axer et al., 2013) presented a response-time analysis for fork-join tasks under Partitioned Fixed-Priority (PFP) scheduling. Chwa et al. (Chwa et al., 2013) developed an analysis for synchronous parallel tasks scheduled under GEDF. They introduced the concept of critical interference and presented a sufficient test for GEDF. Maia et al. (Maia et al., 2014) reused the concept of critical interference to introduce a response-time analysis for synchronous tasks scheduled under GFP. A general parallel task model was presented by Baruah et al. (Baruah et al., 2012) in which each task is modeled as a Directed Acyclic Graph (DAG) and can have an arbitrary deadline. They presented a polynomial test and a pseudo-polynomial test for a DAG task scheduled with EDF and proved their speedup bounds. However, they only considered a single DAG task. Bonifaci et al. (Bonifaci et al., 2013) later developed feasibility tests for task systems with multiple DAG tasks, scheduled under GEDF and GDM.
Melani et al. (Melani et al., 2015) proposed a response-time analysis for conditional DAG tasks, where each DAG can have conditional vertices. Their analysis utilizes the concepts of critical interference and critical chain, and works for both GEDF and GFP. However, their bounds for carry-in and carry-out workloads are likely to be overestimated since they ignore the internal structures of the tasks. Chwa et al. (Chwa et al., 2017) extended their work in (Chwa et al., 2013) to DAG tasks scheduled under GEDF. They proposed a sufficient, workload-based schedulability test and improved it by exploiting slack values of the tasks. Fonseca et al. (Fonseca et al., 2017) proposed a response-time analysis for sporadic DAG tasks scheduled under GFP that improves upon the response-time analysis in (Melani et al., 2015). They improve the upper bounds for interference by taking the DAGs of the tasks into consideration. In particular, by explicitly considering the DAGs, the workloads generated by the carry-in and carry-out jobs can be reduced compared to the ones in (Melani et al., 2015), and hence schedulability can be improved. The carry-in workload is bounded by considering a schedule for the carry-in job with unrestricted processors, in which subtasks execute as soon as they are ready and for their full WCETs. The carry-out workload is bounded for a less general type of DAG tasks, called nested fork-join DAGs. We discuss the state-of-the-art analyses for GFP and differentiate our work in detail in Section 5.
3. System Model
We consider a set of real-time parallel tasks, $\tau = \{\tau_1, \ldots, \tau_n\}$, scheduled preemptively by a global fixed-priority scheduling algorithm upon $m$ identical processors. Each task $\tau_i$ is a recurrent, sporadic process which may release an infinite sequence of jobs and is modeled by $(G_i, D_i, T_i)$, where $D_i$ denotes its relative deadline and $T_i$ denotes the minimum inter-arrival time of two consecutive jobs of $\tau_i$. We assume that all tasks have constrained deadlines, i.e., $D_i \le T_i$. Each task is represented as a Directed Acyclic Graph (DAG) $G_i = (V_i, E_i)$, where $V_i$ is the set of vertices of the DAG and $E_i$ is the set of directed edges of $G_i$. In this paper, we also use subtasks and nodes to refer to the vertices of the tasks. Each subtask $v_{i,j}$ of $\tau_i$ represents a section of instructions that can only be run sequentially. A subtask $v_{i,j}$ is called a predecessor of $v_{i,l}$ if there exists an edge from $v_{i,j}$ to $v_{i,l}$ in $G_i$, i.e., $(v_{i,j}, v_{i,l}) \in E_i$. Subtask $v_{i,l}$ is then called a successor of $v_{i,j}$. Each edge represents a precedence constraint between the two subtasks. A subtask is ready if all of its predecessors have finished. Whenever a task releases a job, all of its subtasks are released and have the same deadline as the job’s deadline. We use $J_k$ to denote an arbitrary job of $\tau_k$ which has release time $r_k$ and absolute deadline $d_k$. When the task index is clear from context, we drop it and write $v_j$ for a subtask.
Each subtask $v_{i,j}$ has a worst-case execution time (WCET), denoted by $c_{i,j}$. The sum of the WCETs of all subtasks of $\tau_i$ is the worst-case execution time of the whole task, and is denoted by $C_i$. The WCET of a task is also called its work. A sequence of subtasks $(v_{i,a_1}, \ldots, v_{i,a_p})$ of $\tau_i$, in which $(v_{i,a_l}, v_{i,a_{l+1}}) \in E_i$ for $1 \le l < p$, is called a chain of $\tau_i$ and is denoted by $\lambda$. The length of a chain is the sum of the WCETs of the subtasks in $\lambda$ and is denoted by $len(\lambda)$, i.e., $len(\lambda) = \sum_{l=1}^{p} c_{i,a_l}$. A chain of $\tau_i$ which has the longest length is a critical path of the task. The length of a critical path of a DAG is called its critical path length or span, and is denoted by $L_i$. Figure 1 illustrates an example DAG task with 6 subtasks, together with its work and span. In this paper, we consider tasks that are scheduled using a preemptive, global fixed-priority algorithm where each task is assigned a fixed task-level priority. All subtasks of a task have the same priority as the task. Without loss of generality, we assume that tasks have distinct priorities, and $\tau_i$ has higher priority than $\tau_k$ if $i < k$.
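As an illustration, the work and span of a DAG task can be computed with a simple longest-path traversal over the graph. The sketch below uses a hypothetical DAG whose vertex names and WCETs are invented for illustration (they are not the values of Figure 1):

```python
from functools import lru_cache

# Hypothetical DAG: subtask -> WCET, and subtask -> successors.
wcet = {"v1": 2, "v2": 3, "v3": 1, "v4": 2, "v5": 4, "v6": 1}
succ = {"v1": ["v2", "v3"], "v2": ["v4"], "v3": ["v4", "v5"],
        "v4": ["v6"], "v5": ["v6"], "v6": []}

def work(wcet):
    """Total work C_i: the sum of all subtask WCETs."""
    return sum(wcet.values())

def span(wcet, succ):
    """Critical path length L_i: the longest WCET-weighted chain."""
    @lru_cache(maxsize=None)
    def longest_from(v):
        # WCET of v plus the longest downstream chain.
        return wcet[v] + max((longest_from(s) for s in succ[v]), default=0)
    return max(longest_from(v) for v in wcet)

print(work(wcet))        # 13
print(span(wcet, succ))  # 8, via the chain v1 -> v3 -> v5 -> v6
```

Memoizing `longest_from` makes the traversal linear in the size of the DAG, which matters for the randomly generated graphs used later in the evaluation.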
4. Background
In this section we discuss the concept of critical interference that our work is based on, and present a general framework to bound response-times of DAG tasks scheduled under GFP. In the next section, we summarize the state-of-the-art analyses for GFP and give an overview of our method.
4.1. Critical Chain and Critical Interference
The notions of critical chain and critical interference were introduced by Chwa et al. (Chwa et al., 2017, 2013) for analyzing parallel tasks scheduled with GEDF. Unlike for sequential tasks, analysis of DAG tasks with internal parallelism is inherently more complicated: (i) some subtasks of a task can be interfered with by other subtasks of the same task (i.e., intra-task interference); (ii) subtasks of a task can be interfered with by subtasks of higher-priority tasks (i.e., inter-task interference); and (iii) the parallelism of a DAG task may vary during execution, subject to the precedence constraints imposed by its graph. The critical chain and critical interference concepts alleviate the complexity of the analysis by focusing on a special chain of subtasks of a task which accounts for its response time, thus bringing the problem closer to a more familiar analysis technique for sequential tasks. Although they were originally proposed for the analysis of GEDF (Chwa et al., 2017, 2013), these concepts are also useful for analyzing GFP. We therefore use them in our analysis and include a discussion of them in this section.
Consider any job $J_k$ of a task $\tau_k$ and its corresponding schedule. A last-completing subtask of $J_k$ is a subtask that completes last among all subtasks in the schedule of $J_k$. A last-completing predecessor of a subtask is a predecessor that completes last among all predecessors of the subtask in the schedule of $J_k$. Note that a subtask can only be ready after a last-completing predecessor finishes, since only then are all the precedence constraints for the subtask satisfied. Starting from a last-completing subtask of $J_k$, we can recursively trace back through last-completing predecessors until we reach a subtask with no predecessors. If during that process a subtask has more than one last-completing predecessor, we arbitrarily pick one. The chain that is reconstructed by appending those last-completing predecessors and the last-completing subtask is called a critical chain of job $J_k$. We call the subtasks that belong to a critical chain critical subtasks.
Example 4.1 ().
Figure 2 presents an example of a critical chain of a job of a task with the same DAG as shown in Figure 1. In Figure 2, boxes with bold, solid borders denote the execution of critical subtasks of the job; boxes with bold, dashed borders denote the execution of its other subtasks. The remaining boxes belong to jobs of other tasks. Starting from the last-completing subtask and tracing back through last-completing predecessors, as described above, yields the critical chain highlighted in the figure.
The critical chain concept has a few properties that make it useful for schedulability analysis of parallel DAG tasks. First, the first subtask of any critical chain of a job $J_k$ is ready to execute as soon as the job is released, since it does not have any predecessor. Second, when the last subtask of a critical chain completes, the corresponding job finishes — this follows from the construction of the critical chain. Thus the scheduling window of a critical chain of $J_k$ — i.e., from the release time of its first subtask to the completion time of its last subtask — is also the scheduling window of job $J_k$ — i.e., from the job’s release time to its completion time. Third, consider a critical chain of $J_k$: at any instant during the scheduling window of $J_k$, either a critical subtask of $J_k$ is executed, or a critical subtask of $J_k$ is ready but not executed because all processors are busy executing subtasks not belonging to the critical chain, including non-critical subtasks of job $J_k$ and subtasks from other tasks (see Figure 2). Therefore, the response-time of a critical chain of $J_k$ is also the response-time of $J_k$. Hence if we can upper-bound the response-time of a critical chain for any job of $\tau_k$, that bound also serves as an upper bound for the response-time of $\tau_k$.
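The back-tracing construction of a critical chain can be sketched as follows, assuming per-subtask finish times from some concrete schedule of one job are given. The DAG and the finish times below are hypothetical, invented purely for illustration:

```python
# Hypothetical predecessors and per-subtask finish times of one job.
preds = {"v1": [], "v2": ["v1"], "v3": ["v1"],
         "v4": ["v2", "v3"], "v5": ["v3"], "v6": ["v4", "v5"]}
finish = {"v1": 2, "v2": 6, "v3": 3, "v4": 9, "v5": 8, "v6": 11}

def critical_chain(preds, finish):
    """Trace back from a last-completing subtask through
    last-completing predecessors (ties broken arbitrarily)."""
    v = max(finish, key=finish.get)        # last-completing subtask
    chain = [v]
    while preds[v]:
        v = max(preds[v], key=finish.get)  # last-completing predecessor
        chain.append(v)
    chain.reverse()
    return chain

print(critical_chain(preds, finish))  # ['v1', 'v2', 'v4', 'v6']
```

Note that the chain depends on the concrete schedule through the finish times, which is exactly why the analysis reasons about a critical chain of a given job rather than a fixed path of the DAG.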
The third property of the critical chain suggests that we can partition the scheduling window of a job into two sets of intervals. One includes all intervals during which critical subtasks of the job are executed, and the other includes all intervals during which a critical subtask of the job is ready but not executed. The total length of the intervals in the second set is called the critical interference on the job. We include definitions for critical interference and for the interference caused by an individual task as follows.
Definition 4.2 ().
Critical interference $\mathcal{I}_k(a, b)$ on a job of task $\tau_k$ is the aggregated length of all intervals in $[a, b)$ during which a critical subtask of the job is ready but not executed.
Definition 4.3 ().
Critical interference $\mathcal{I}_{i,k}(a, b)$ on a job of task $\tau_k$ due to task $\tau_i$ is the aggregated processor time from all intervals in $[a, b)$ during which one or more subtasks of $\tau_i$ are executed and a critical subtask of the job is ready but not executed.
In Figure 2, the critical interference on the job is the sum of the lengths of the four highlighted intervals, which is 7. The critical interference caused by a task $\tau_i$ is the total processor time of $\tau_i$ in those four intervals. Note that $\tau_i$ may execute simultaneously on multiple processors, and we must sum its processor time on all processors. From the definition of critical interference, we have:
(1) $\mathcal{I}_k(a, b) = \frac{1}{m} \sum_{\tau_i \in \tau} \mathcal{I}_{i,k}(a, b)$
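As a small illustration of Definitions 4.2 and 4.3, the sketch below computes the critical interference and the per-task contributions from a set of intervals. All interval values are invented for illustration and do not correspond to Figure 2:

```python
def overlap(a, b, c, d):
    """Length of the overlap between intervals [a, b) and [c, d)."""
    return max(0, min(b, d) - max(a, c))

# Intervals during which a critical subtask of the job under analysis
# is ready but not executing (hypothetical values).
interference = [(1, 3), (5, 7), (9, 11), (12, 13)]

# Execution intervals of each interfering task; a task occupying two
# processors at once simply lists two (possibly equal) intervals.
execs = {"t1": [(0, 4), (1, 3)], "t2": [(5, 7), (5, 6), (9, 13)]}

# Critical interference: total length of the interference intervals.
I_k = sum(b - a for a, b in interference)

# Per-task interference: processor time of each task inside the
# interference intervals, summed over all processors it occupies.
I_ik = {t: sum(overlap(a, b, c, d)
               for c, d in ivs for a, b in interference)
        for t, ivs in execs.items()}

print(I_k)   # 7
print(I_ik)
```

The per-task sums deliberately double-count simultaneous execution on multiple processors, matching the "aggregated processor time" wording of Definition 4.3.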
4.2. A General Method for Bounding ResponseTime
We now discuss a general framework for bounding response-time under GFP that is used in this work and was also employed by the state-of-the-art analyses (Melani et al., 2015; Fonseca et al., 2017). Based on the definitions of critical chain and critical interference, the response-time of $J_k$ is $R_k = len(\lambda_k) + \mathcal{I}_k$, where $\lambda_k$ is a critical chain of $J_k$ and $len(\lambda_k)$ is its length (see Figure 2 for an example). Applying Equation 1 we have:
(2) $R_k = len(\lambda_k) + \frac{1}{m} \Big( \mathcal{I}_{k,k} + \sum_{\tau_i \in hp(k)} \mathcal{I}_{i,k} \Big)$
where $hp(k)$ is the set of tasks with higher priorities than $\tau_k$’s. Thus if we can bound the right-hand side of Equation 2, we can bound the response-time of $J_k$. To do so, we bound the contributions to $J_k$’s response-time caused by subtasks of $\tau_k$ itself and by jobs of higher-priority tasks separately.
4.2.1. Intra-Task Interference
The term $\mathcal{I}_{k,k}$, which includes the intra-task interference on the critical chain of $\tau_k$ caused by non-critical subtasks of $\tau_k$, is bounded by Lemma V.3 in (Melani et al., 2015). We include the bound below.
Lemma 4.4 ().
The following inequality holds for any task $\tau_k$ scheduled by any work-conserving algorithm: $\mathcal{I}_{k,k} \le C_k - len(\lambda_k)$.
4.2.2. Inter-Task Interference
Now we need to bound the inter-task interference on the right-hand side of Equation 2. Since the interference caused by a task in an interval is at most the workload generated by the task during that interval, we can bound $\mathcal{I}_{i,k}(a, b)$ using a bound for the workload generated by $\tau_i$ in the interval $[a, b)$. Let $\mathcal{W}_i(a, b)$ denote the maximum workload generated by $\tau_i$ in the interval $[a, b)$. Let $\mathcal{W}_i(L)$ denote the maximum workload generated by $\tau_i$ in any interval of length $L$. The following inequality holds for any interval $[a, b)$:
(3) $\mathcal{I}_{i,k}(a, b) \le \mathcal{W}_i(a, b) \le \mathcal{W}_i(b - a)$
Let the problem window be the interval of interest, with length $L$. The jobs of $\tau_i$ that may generate workload within the problem window are classified into three types: (i) a carry-in job is released strictly before the problem window and has a deadline within it; (ii) a carry-out job is released within the problem window and has its deadline strictly after it; and (iii) body jobs have both release time and deadline within the problem window. Similarly to analyses for sequential tasks (e.g., Bertogna et al. (Bertogna and Cirinei, 2007)), the maximum workload generated by $\tau_i$ in the problem window can be attained with a release pattern in which (i) jobs of $\tau_i$ are released as quickly as possible, meaning that the gap between any two consecutive releases is exactly the period $T_i$; (ii) the carry-in job finishes as late as its worst-case finishing time; and (iii) the body jobs and the carry-out job start executing as soon as they are released. Figure 3 shows an example of such a job-release pattern of an interfering task with the DAG structure shown in Figure 1. However, unlike for sequential tasks, the analysis for parallel DAG tasks is more challenging in two respects. First, it is not obvious which schedule for the subtasks of the carry-in (carry-out) job would generate the maximum carry-in (carry-out) workload. This is because the parallelism of a DAG task can vary depending on its internal graph structure. Second, for the same reason, aligning the problem window’s start time with the start time of the carry-in job of $\tau_i$ may not correspond to the maximum workload generated by $\tau_i$. For instance, in Figure 3, if we shift the problem window to the right by 2 time units, the carry-in job’s workload loses 2 time units but the carry-out job’s workload gains 5 time units. The total workload thus increases by 3 time units. Therefore, in order to compute the maximum workload generated by $\tau_i$, we must slide the problem window to find a position that corresponds to the maximum sum of the carry-in workload and carry-out workload. We discuss an existing method for computing carry-in workload in Section 5 and our technique for computing carry-out workload in Section 6.
In Section 7, we combine those two bounds in a response-time analysis and explain how we slide problem windows to compute maximum workloads.
We note that the maximum workload generated by each body job does not depend on the schedule of its subtasks and is simply its total work $C_i$. Furthermore, regardless of the position of the problem window, the workload contributed by the body jobs, denoted by $\mathcal{W}^{body}_i$, is bounded as follows.
Lemma 4.5 ().
The workload generated by the body jobs of task $\tau_i$ in a problem window of length $L$ is upper-bounded by $\lfloor L / T_i \rfloor \cdot C_i$.
Proof.
Consider the case where the start of the problem window is aligned with the starting time of the carry-in job, as shown in Figure 3. The number of body jobs is then at most $\lfloor L / T_i \rfloor$. Thus for this case the workload of the body jobs is at most $\lfloor L / T_i \rfloor \cdot C_i$.
Shifting the problem window to the left or right can change the workload contributed by the carry-in and carry-out jobs but does not increase the maximum number of body jobs or their workload. The bound thus follows. ∎
Let the carry-in window and carry-out window be the intervals within the problem window during which the carry-in job and the carry-out job are executed, respectively. Intuitively, the carry-in window spans from the start of the problem window to the completion time of the carry-in job; the carry-out window spans from the starting time of the carry-out job to the end of the problem window. We denote the lengths of the carry-in window and carry-out window for task $\tau_i$ by $\Delta^{CI}_i$ and $\Delta^{CO}_i$, respectively. The sum of $\Delta^{CI}_i$ and $\Delta^{CO}_i$ is:
(4) 
Let $\mathcal{W}^{CI}_i(\Delta^{CI}_i)$ be the maximum carry-in workload of $\tau_i$ for a carry-in window of length $\Delta^{CI}_i$. Similarly, let $\mathcal{W}^{CO}_i(\Delta^{CO}_i)$ be the maximum carry-out workload of $\tau_i$ for a carry-out window of length $\Delta^{CO}_i$. The maximum workload generated by $\tau_i$ in any problem window of length $L$ can be computed by taking the maximum over all $\Delta^{CI}_i$ and $\Delta^{CO}_i$ that satisfy Equation 4:
(5) $\mathcal{W}_i(L) = \max \big\{ \mathcal{W}^{CI}_i(\Delta^{CI}_i) + \mathcal{W}^{body}_i + \mathcal{W}^{CO}_i(\Delta^{CO}_i) \big\}$, where the maximum is taken over all pairs $(\Delta^{CI}_i, \Delta^{CO}_i)$ satisfying Equation 4
Therefore, if we can bound $\mathcal{W}^{CI}_i$ and $\mathcal{W}^{CO}_i$, we can bound the inter-task interference of $\tau_i$ on $\tau_k$, and thus the response-time of $\tau_k$.
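As a sketch of how the two bounds combine, the code below maximizes the total interfering workload over all integer splits of the carry-in and carry-out window lengths, treating Equation 4 simply as a fixed total budget for the two lengths. The bound functions and numbers are illustrative placeholders, not the bounds of any real task:

```python
def max_interfering_workload(wci, wco, body, budget):
    """Maximize WCI(dci) + body + WCO(dco) over all integer splits
    with dci + dco == budget (standing in for Equation 4)."""
    return max(wci(dci) + body + wco(budget - dci)
               for dci in range(budget + 1))

# Placeholder bound functions (monotone, capped), purely illustrative:
wci = lambda d: min(2 * d, 9)   # carry-in workload bound
wco = lambda d: min(3 * d, 7)   # carry-out workload bound

print(max_interfering_workload(wci, wco, body=10, budget=5))  # 22
```

Since both window lengths range over integers bounded by the problem window, this maximization adds only a linear factor on top of the cost of evaluating the two bounds.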
5. The State-of-the-Art Analysis for GFP
Melani et al. (Melani et al., 2015) proposed a response-time analysis for GFP scheduling of conditional DAG tasks, which may contain conditional vertices for modeling conditional constructs such as if-then-else statements. They bounded the interfering workload by assuming that jobs of the interfering task execute perfectly in parallel on all processors. Their bound for the interfering workload is computed as follows.
Figure 4 illustrates the workload computation for an interfering task given in (Melani et al., 2015). As shown in this figure, both the carry-in and carry-out jobs are assumed to execute with perfect parallelism upon all $m$ processors. Thus their workload contributions in the considered window are maximized. This assumption simplifies the workload computation as it ignores the internal DAG structures of the interfering tasks. However, assuming that DAG tasks have such abundant parallelism is likely unrealistic and thus makes the analysis pessimistic.
Fonseca et al. (Fonseca et al., 2017) later considered a task model similar to the one in this paper and proposed a method to improve the bounds for carry-in and carry-out workloads by explicitly considering the DAGs. The carry-in workload was bounded using a hypothetical schedule for the carry-in job, in which the carry-in job can use as many processors as it needs to fully exploit its parallelism. They proved that the carry-in workload of the hypothetical schedule is maximized when: (i) the hypothetical schedule’s completion time is aligned with the worst-case completion time of the interfering task, (ii) every subtask in the hypothetical schedule starts executing as soon as all of its predecessors finish, and (iii) every subtask in the hypothetical schedule executes for its full WCET. Figure 3 shows the hypothetical schedule of the carry-in job for the task in Figure 1. In this paper, we adopt their method for computing carry-in workload. In particular, the carry-in workload of task $\tau_i$ with a carry-in window of length $\Delta^{CI}_i$, i.e., from the start of the problem window to the completion time of the carry-in job (see Figure 3), is computed as follows.
(6) $\mathcal{W}^{CI}_i(\Delta^{CI}_i) = \sum_{v_j \in V_i} \min\big(c_j,\ \max(0,\ s_j + c_j - (L_i - \Delta^{CI}_i))\big)$
In Equation 6, $s_j$ is the start time of subtask $v_j$ in the hypothetical schedule for the carry-in job described above, and $L_i$ is the completion time (makespan) of that schedule, which equals the span of the task. The start time $s_j$ can be computed by taking a longest path among all paths from the source subtasks to $v_j$ and adding up the WCETs of the subtasks along that path, excluding $v_j$ itself.
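The computation just described can be sketched as follows, under the assumption that the carry-in window covers the last $\Delta^{CI}$ time units of the hypothetical schedule; the DAG below is invented for illustration:

```python
from functools import lru_cache

def earliest_starts(wcet, preds):
    """Start time s_j of each subtask in the hypothetical schedule:
    the longest WCET sum over paths to v_j, excluding v_j itself."""
    @lru_cache(maxsize=None)
    def s(v):
        return max((s(p) + wcet[p] for p in preds[v]), default=0)
    return {v: s(v) for v in wcet}

def carry_in_workload(wcet, preds, delta):
    """Workload of the hypothetical schedule falling inside the
    carry-in window, i.e., its last `delta` time units."""
    s = earliest_starts(wcet, preds)
    makespan = max(s[v] + wcet[v] for v in wcet)
    window_start = makespan - delta
    return sum(max(0, min(wcet[v], s[v] + wcet[v] - window_start))
               for v in wcet)

# Hypothetical DAG (not the task of Figure 1):
wcet = {"v1": 2, "v2": 3, "v3": 1, "v4": 2}
preds = {"v1": [], "v2": ["v1"], "v3": ["v1"], "v4": ["v2", "v3"]}
print(carry_in_workload(wcet, preds, delta=3))  # 3
```

With a window as long as the whole hypothetical schedule, the function returns the total work of the task, as one would expect.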
For the carry-out workload, (Fonseca et al., 2017) considered a subset of generalized DAG tasks, namely nested fork-join DAG (NFJ-DAG) tasks. An NFJ-DAG is constructed recursively from smaller NFJ-DAGs using two operations: series composition and parallel composition. Figure 4(b) shows an example NFJ-DAG task. Figure 4(a) shows a similar DAG with one additional edge; the DAG in Figure 4(a) is not an NFJ-DAG due to that single cross edge. To deal with a non-NFJ-DAG, (Fonseca et al., 2017) first transforms the original DAG into an NFJ-DAG by removing the conflicting edges, as in Figure 5. Then they compute the upper bound for the carry-out workload using the obtained NFJ-DAG. The computed bound is proved to be an upper bound for the carry-out workload. We note that the transformation removes some precedence constraints from the original DAG, and thus the resulting NFJ-DAG may have higher parallelism than the original DAG. Hence, computing the carry-out workload of a generalized DAG task via its transformed NFJ-DAG may be pessimistic, especially for a complex DAG, as the transformation may remove many edges from the original DAG.
In this paper, we propose a new technique to directly compute an upper bound for the carry-out workload of a generalized DAG task. The high-level idea is to frame the problem of finding the bound as an optimization problem, which can be solved effectively by solvers such as CPLEX (IBM ILOG CPLEX Optimizer, 2019), Gurobi (Gurobi Solver, 2019), or SCIP (SCIP Solver, 2019). The solution of the optimization problem then serves as a safe and tight upper bound for the carry-out workload. In the next section we present our method in detail.
6. Bound for Carry-Out Workload
In this section we propose a method to bound the carry-out workload that can be generated by a job of task $\tau_i$, by constructing an integer linear program (ILP) whose optimal solution value is an upper bound for the carry-out workload. Consider a carry-out job of task $\tau_i$ which is scheduled with an unrestricted number of processors, meaning that it can use as many processors as it requires to fully exploit its parallelism. Each subtask of the carry-out job executes as soon as it is ready, i.e., immediately after all of its predecessors have finished. We label such a schedule for the carry-out job $S^{\infty}$. We prove in the following lemma that the workload generated by $S^{\infty}$ is an upper bound for the carry-out workload.
Lemma 6.1 ().
For specific values of the execution times of the subtasks of $\tau_i$, the workload generated by $S^{\infty}$ in a carry-out window of length $\Delta^{CO}_i$ is an upper bound for the carry-out workload generated by $\tau_i$ with the given subtasks’ execution times.
Proof.
We prove by contradiction. Consider a schedule $S$ for the carry-out job in which the subtasks execute for the same lengths as in $S^{\infty}$. Suppose subtask $v_j$ is the first subtask in time order that produces more workload in $S$ than it does in $S^{\infty}$. This means $v_j$ must have started executing earlier in $S$ than it has in $S^{\infty}$. Hence, $v_j$ must have started its execution before all of its predecessors had finished in $S$. This is impossible and the lemma follows. ∎
Unlike the carry-in workload, the carry-out workload generated when all subtasks execute for their full WCETs is not guaranteed to be the maximum. Consider the interfering task shown in Figure 1 and a carry-out window of length 3 time units. If all subtasks of the carry-out job execute for their WCETs, the carry-out workload is 4 time units, as shown in Figure 5(a). However, if one of the subtasks finishes immediately, i.e., executes for 0 time units, the carry-out workload becomes 7 time units, as shown in Figure 5(b). From Lemma 6.1 and the discussion above, to compute an upper bound for the carry-out workload we must consider all possible execution times of the subtasks, and subtasks must execute as soon as they are ready.
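The effect above can be reproduced by brute force: compute the workload of the unrestricted-processor schedule of Lemma 6.1 within the carry-out window for every combination of integer execution times. The fork-shaped DAG below is hypothetical (it is not the task of Figure 1), but it exhibits the same phenomenon:

```python
from functools import lru_cache
from itertools import product

def unrestricted_workload(exe, preds, delta):
    """Workload within window [0, delta) of the schedule in which
    every subtask starts as soon as all its predecessors finish."""
    @lru_cache(maxsize=None)
    def start(v):
        return max((start(p) + exe[p] for p in preds[v]), default=0)
    return sum(max(0, min(start(v) + exe[v], delta) - start(v))
               for v in exe)

# Hypothetical DAG: v1 fans out to three parallel subtasks.
wcet = {"v1": 3, "v2": 3, "v3": 3, "v4": 3}
preds = {"v1": [], "v2": ["v1"], "v3": ["v1"], "v4": ["v1"]}
delta = 3

# With full WCETs, only v1 fits into the window ...
full = unrestricted_workload(wcet, preds, delta)
# ... but searching all integer execution times finds much more:
best = max(unrestricted_workload(dict(zip(wcet, xs)), preds, delta)
           for xs in product(*(range(c + 1) for c in wcet.values())))
print(full, best)  # 3 9: shrinking v1 to 0 exposes the parallel fan-out
```

This exhaustive search is exponential in the number of subtasks, which is exactly why the paper encodes the search as an optimization problem instead.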
For each subtask $v_j$ of the carry-out job of an interfering task $\tau_i$, we define two nonnegative integer variables $x_j$ and $w_j$: $x_j$ represents the actual execution time of subtask $v_j$ in the carry-out job, and $w_j$ denotes the contribution of subtask $v_j$ to the carry-out workload. Let $\Delta^{CO}_i$ be an integer constant denoting the length of the carry-out window. Then the carry-out workload is the sum of the contributions of all subtasks in $V_i$, which is upper-bounded by the maximum of the following optimization objective function:
(7) $\max \sum_{v_j \in V_i} w_j$
The optimal value of the above objective function gives the actual maximum workload generated by the carry-out job with an unrestricted number of processors. We now construct a set of constraints on the contribution of each subtask in $V_i$ to the carry-out workload. From the definitions of $x_j$ and $w_j$, we have the following bounds for them.
Constraint 1 ().
For any interfering task $\tau_i$: $0 \le x_j \le c_j,\ \forall v_j \in V_i$.
Constraint 2 ().
For any interfering task $\tau_i$: $w_j \le x_j,\ \forall v_j \in V_i$.
These two constraints come from the fact that the actual execution time of subtask $v_j$ cannot exceed its WCET $c_j$, and each subtask can contribute at most its whole execution time to the carry-out workload. Let $s_j$ be the starting time of $v_j$ in $S^{\infty}$, assuming that the carry-out job starts at time instant 0. For simplicity of exposition, we assume that the DAG has exactly one source vertex and one sink vertex. If this is not the case, we can always add a pair of dummy vertices, $v_{src}$ and $v_{sink}$, with zero WCETs, for the source and sink, respectively. We then add edges from $v_{src}$ to all vertices with no predecessors in the original DAG $G_i$, and edges from all vertices with no successors in $G_i$ to $v_{sink}$. Without loss of generality, we assume that $v_{src}$ and $v_{sink}$ are the source vertex and sink vertex of $G_i$, respectively. Let $\pi$ denote a path from the source to $v_j$: $\pi = (v_{src}, \ldots, v_j)$, where each pair of consecutive vertices of $\pi$ is connected by an edge in $E_i$. Let $\Pi_j$ denote the set of all paths from $v_{src}$ to $v_j$ in $G_i$. $\Pi_j$ for all subtasks can be constructed by a graph traversal algorithm; for instance, a simple modification of depth-first search would accomplish this.
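The normalization with dummy vertices and the enumeration of the path sets $\Pi_j$ by depth-first search can be sketched as follows; the vertex names and WCETs are illustrative:

```python
def add_dummies(wcet, succ):
    """Add zero-WCET dummy source/sink so the DAG has exactly one of each."""
    wcet = dict(wcet, src=0, sink=0)
    succ = {v: list(s) for v, s in succ.items()}
    has_pred = {u for s in succ.values() for u in s}
    # Dummy source precedes every original vertex with no predecessor.
    succ["src"] = [v for v in wcet
                   if v not in has_pred and v not in ("src", "sink")]
    succ["sink"] = []
    # Every original vertex with no successor precedes the dummy sink.
    for v in list(succ):
        if v != "sink" and not succ[v]:
            succ[v] = ["sink"]
    return wcet, succ

def paths_to(succ, target):
    """Enumerate all paths from 'src' to target by depth-first search."""
    out = []
    def dfs(v, path):
        if v == target:
            out.append(path)
            return
        for nxt in succ[v]:
            dfs(nxt, path + [nxt])
    dfs("src", ["src"])
    return out

# Illustrative diamond-shaped DAG:
wcet = {"v1": 1, "v2": 2, "v3": 3, "v4": 1}
succ = {"v1": ["v2", "v3"], "v2": ["v4"], "v3": ["v4"], "v4": []}
wcet2, succ2 = add_dummies(wcet, succ)
print(paths_to(succ2, "v4"))
```

Note that the number of paths can be exponential in the number of vertices, which is one source of the complexity discussed in Section 7.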
For a particular path $\pi \in \Pi_j$, the sum of the execution times of all subtasks in this path, excluding $v_j$, is called the distance to $v_j$ with respect to this path. We let $d^{\pi}_j$ be a variable denoting the distance to $v_j$ in path $\pi$. We impose the following two straightforward constraints on $d^{\pi}_j$ based on its definition.
Constraint 3 ().
For any interfering task $\tau_i$:
Constraint 4 ().
For any interfering task $\tau_i$:
In the schedule $S^{\infty}$, the starting time of a subtask $v_j$ cannot be smaller than the distance to $v_j$ in any path $\pi \in \Pi_j$. We prove this as follows.
Lemma 6.2 ().
In the schedule $S^{\infty}$ of any interfering task $\tau_i$: $s_j \ge d^{\pi}_j,\ \forall v_j \in V_i,\ \forall \pi \in \Pi_j$.
Proof.
We prove by contradiction. Let $\pi$ be a path such that the starting time $s_j$ is smaller than $d^{\pi}_j$. Subtask $v_j$ must be ready to start execution, meaning all of its predecessors must have finished, at time $s_j$. Since $s_j < d^{\pi}_j$, there must be a subtask $v_l \in \pi$ executing (and thus not finished) at time $s_j$. Then $v_j$ cannot be ready at time $s_j$ since it depends on $v_l$. This contradicts the assumption that $v_j$ is ready at $s_j$ and the lemma follows. ∎
In fact, in the schedule $S^{\infty}$ the starting time of $v_j$ is equal to the longest distance among all paths to it.
Lemma 6.3 ().
In the schedule $S^{\infty}$ of any interfering task $\tau_i$: $s_j = \max_{\pi \in \Pi_j} d^{\pi}_j,\ \forall v_j \in V_i$.
Proof.
Consider a path $\pi$ constructed as follows. First we take a last-completing predecessor of $v_j$, say $v_l$. Since $v_j$ executes as soon as it is ready, it executes immediately after $v_l$ finishes. We recursively trace back through the last-completing predecessors in that way until we reach the source vertex $v_{src}$. Path $\pi$ is then constructed by chaining the last-completing predecessors together with $v_j$. We note that any subtask in $\pi$ executes as soon as its immediately preceding subtask finishes, since no other predecessor finishes later than it does. Therefore, $s_j = d^{\pi}_j$. By Lemma 6.2, $\pi$ must have the longest distance to $v_j$ among all paths in $\Pi_j$. Thus the lemma follows. ∎
Constraint 5 ().
For any interfering task $\tau_i$: $s_j \ge d^{\pi}_j,\ \forall v_j \in V_i,\ \forall \pi \in \Pi_j$.
Proof.
We prove that this constraint requires that $s_j$, for every subtask $v_j$ with $s_j < \Delta^{CO}_i$, satisfies Lemma 6.3, that is, $s_j = \max_{\pi \in \Pi_j} d^{\pi}_j$. (Recall that $\Delta^{CO}_i$ is a constant denoting the carry-out window’s length.) In other words, we prove that every subtask which would start executing within the carry-out window in an unrestricted-processor schedule $S^{\infty}$ gets exactly the same starting time from the solution to the optimization problem. Let $\Gamma$ denote the collection of such subtasks — the ones that would start executing within the carry-out window in $S^{\infty}$.
Let $\Theta$ be an optimal solution to the optimization problem and let $s_j$ be the corresponding value for the starting time of any subtask $v_j$ in the solution $\Theta$. Obviously $s_j \ge \max_{\pi \in \Pi_j} d^{\pi}_j$ for any $v_j$, since any solution to the optimization problem satisfies this constraint. If $s_j = \max_{\pi \in \Pi_j} d^{\pi}_j$ for every $v_j \in \Gamma$, then we are done. Suppose instead that $s_j > \max_{\pi \in \Pi_j} d^{\pi}_j$ for some $v_j \in \Gamma$. Let $\Gamma'$ denote the set of such subtasks. We construct a solution $\Theta'$ to the optimization problem from $\Theta$ as follows. Consider a first subtask $v_j \in \Gamma'$ in time. We reduce its starting time to $\max_{\pi \in \Pi_j} d^{\pi}_j$. Since $v_j$ is the first delayed subtask, doing this does not violate the precedence constraints for the other subtasks. We iteratively perform that operation for the other subtasks in $\Gamma'$ in increasing time order. The solution $\Theta'$ constructed in this way yields a larger carry-out workload, since more workload from individual subtasks can fit in the carry-out window. Therefore $\Theta'$ is a better solution, which contradicts the assumption that $\Theta$ is an optimal solution. ∎
The workload contributed by a subtask is: .
The second part of the outer minimization has been taken care of by Constraint 2.
We now construct constraints to impose the first part of the minimization.
Let be an integer variable representing the expression . Let be a binary variable that takes value either 0 or 1. We have the following constraints.
Constraint 6.
For any interfering task :
Constraint 7.
For any interfering task :
Constraint 8.
For any interfering task :
Constraints 7 and 8 bound the value for and Constraint 6 enforces another upper bound for the workload . If , can only be 0 in order to satisfy both Constraints 7 and 8. If , the value of does not matter. In both cases, these three constraints together with Constraint 2 force the contribution of to the carry-out workload to zero. If , the maximizing process enforces that takes value 1. Therefore in any case Constraints 2, 6, 7, and 8 enforce a correct value for the workload contribution of .
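The gating behavior described above can be illustrated in miniature. The names below are hypothetical stand-ins for the symbols elided in this rendering: `z` plays the role of the indicator expression that Constraints 7 and 8 compare the binary variable against, and `w` the subtask workload capped by Constraint 6.

```python
def best_contribution(z: int, w: int) -> int:
    """Enumerate the binary variable y over {0, 1} under bounds in the
    style of Constraints 7/8 (a nonpositive indicator z forces y = 0)
    and Constraint 6 (the contribution is at most y * w), returning the
    value a maximizer would select."""
    best = 0
    for y in (0, 1):
        if y > max(z, 0):        # Constraint-7/8 style bound: z <= 0 forces y = 0
            continue             # infeasible assignment of y
        best = max(best, y * w)  # Constraint-6 style cap on the contribution
    return best
```

When the indicator is nonpositive the subtask contributes nothing regardless of `w`; when it is positive, maximization enables the full contribution, mirroring how the ILP selects the correct workload value in each case.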
We have constructed an ILP with a quadratic constraint (Constraint 8) for each , for which the optimal solution value is an upper bound for the carry-out workload. The carry-out workload of in a carry-out window of length can also be upper-bounded by the following straightforward lemma.
Lemma 6.4.
The carry-out workload of an interfering task scheduled by GFP in a carry-out window of length is upper-bounded by .
Lemma 6.4 follows directly from the fact that the carry-out job can execute at most on all processors of the system during the carry-out window. Since the carry-out workload of is upper-bounded by both the maximum value returned for the optimization problem and Lemma 6.4, it is upper-bounded by the minimum of the two quantities.
Theorem 6.5.
The carry-out workload of an interfering task scheduled by GFP in a carry-out window of length is upper-bounded by: , where is the maximum value returned for the maximization problem (Equation 7).
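Theorem 6.5 amounts to taking the tighter of the two bounds; a minimal sketch, with `w_opt` standing for the ILP optimum, `m` for the number of processors, and `delta` for the carry-out window length:

```python
def carry_out_bound(w_opt: float, m: int, delta: float) -> float:
    """Theorem 6.5: the carry-out workload is upper-bounded by the minimum
    of the ILP optimum (Equation 7) and the m * delta processor-capacity
    bound of Lemma 6.4."""
    return min(w_opt, m * delta)
```

For short windows the capacity bound m·delta is usually the tighter one; for long windows the ILP optimum dominates.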
As discussed in Section 5, the technique proposed by Fonseca et al. (Fonseca et al., 2017) can be applied directly to NFJ-DAGs but not to general DAGs. For a general DAG, the procedure that transforms it into an NFJ-DAG will likely inflate the carry-out workload bound, as it removes some precedence constraints between subtasks and thereby enables higher parallelism (and thus a greater interfering workload) for the carry-out job. In contrast, our method directly bounds the carry-out workload for any DAG, and the optimal value obtained is the actual maximum carry-out workload. Hence, our method theoretically yields better schedulability than that of (Fonseca et al., 2017) for general DAGs. The cost of our method is the higher time complexity of computing the carry-out workload, due to the hardness of the ILP problem. However, it can be implemented and works effectively with modern optimization solvers, as we show in our experiments (Section 8).
7. Response-Time Analysis
From the above calculations for the bounds on intra-task interference and inter-task interference on , we have the following theorem for the response-time bound of .
Theorem 7.1.
A constrained-deadline task scheduled by a global fixed-priority algorithm has a response time upper-bounded by the smallest integer that satisfies the following fixed-point iteration:
Proof.
In Theorem 7.1, is computed using Equation 5 for all carry-in and carry-out windows that satisfy Equation 4. For specific carry-in and carry-out window lengths, the carry-in workload is bounded using Equation 6 and the carry-out workload is bounded as discussed in Section 6. The lengths of the carry-in window and carry-out window are varied as follows. Let denote the right-hand side of Equation 4. First takes its largest value: , and takes the remaining sum: . Then in each subsequent step, is decreased and is increased until takes its largest value and takes the remaining value. We note that if at the first step both and are greater than or equal to , the carry-in workload and carry-out workload are bounded by and , respectively. Similarly, if the sum of and is 0 in Equation 4, both the carry-in workload and the carry-out workload are 0. We also note that for the highest-priority task, there is no interference from any other task, and thus its response-time bound can be computed simply by: .
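The exact terms of the recurrence are elided in this rendering; the following sketch shows only the generic fixed-point structure, assuming a response-time recurrence of the common form span plus the remaining work and interfering workload spread over m processors, with `interference(r)` standing for the maximized inter-task interference in a window of length r.

```python
import math
from typing import Callable, Optional

def response_time_bound(span: int, work: int, m: int, deadline: int,
                        interference: Callable[[int], int]) -> Optional[int]:
    """Generic fixed-point iteration for a response-time bound: start from
    the task's no-interference bound, then repeatedly recompute the bound
    with the interfering workload of the current window length until a
    fixed point is reached or the deadline is exceeded."""
    r = span + math.ceil((work - span) / m)
    while r <= deadline:
        nxt = span + math.ceil((work - span + interference(r)) / m)
        if nxt == r:
            return r          # fixed point: r is a valid response-time bound
        r = nxt
    return None               # no fixed point within the deadline
```

With zero interference this reduces to the highest-priority-task case noted above, where the bound is computed in a single step.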
Using the above response-time bound, we derive a schedulability test, shown in Algorithm 1. First we initialize the response times to for all tasks . If the initial response time of any task is larger than its relative deadline, the task set is deemed unschedulable (lines 2–7). Otherwise, we repeatedly compute the response-time bound for each task in descending order of priority using the fixed-point iteration in Theorem 7.1 (line 10). After the computation for each task finishes, we check whether its response-time bound exceeds its deadline. If it does, the task set is deemed unschedulable (lines 11–13). Otherwise, the task set is deemed schedulable after all tasks have been checked (line 15).
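The overall test loop can be sketched as follows. This is not Algorithm 1 verbatim: the initialization value is elided in this rendering, so the sketch assumes the no-interference bound, and `rtb(task, higher_prio_responses)` is a hypothetical callback standing for the fixed-point iteration of Theorem 7.1 (returning None when no fixed point exists within the deadline).

```python
import math
from typing import Callable, Dict, List, Optional

def schedulable(tasks: List[Dict], m: int,
                rtb: Callable[[Dict, List[int]], Optional[int]]) -> bool:
    """Sketch of the schedulability test: `tasks` is sorted in descending
    priority order; each task dict carries 'span', 'work', 'deadline'."""
    # Initialize with an assumed no-interference bound and reject early.
    for t in tasks:
        init = t["span"] + math.ceil((t["work"] - t["span"]) / m)
        if init > t["deadline"]:
            return False                 # trivially unschedulable
    responses: List[int] = []
    for t in tasks:                      # descending priority order
        r = rtb(t, responses)            # fixed-point iteration (Theorem 7.1)
        if r is None or r > t["deadline"]:
            return False                 # response-time bound exceeds deadline
        responses.append(r)
    return True                          # all tasks passed the check
```

Processing tasks in descending priority order means each call to `rtb` can reuse the already-computed response times of all higher-priority (interfering) tasks.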
As expected for response-time analysis, for each task the number of iterations of the fixed-point equation (Theorem 7.1) is pseudo-polynomial in the task's deadline (line 10). In each iteration of the fixed-point equation, and for each interfering task, we consider all combinations of carry-in and carry-out window lengths that satisfy Equation 4 to compute the maximum interfering workload. There are such combinations, and thus the ILP for the carry-out workload is solved times. The maximum workload over all combinations of carry-in and carry-out window lengths gives an upper bound for the interfering workload generated by the given interfering task.
8. Evaluation
As we discussed in Sections 5 and 6, we apply a high-level framework for analyzing schedulability under GFP scheduling similar to the one used by Fonseca et al. (Fonseca et al., 2017) — i.e., accounting separately for the interfering workloads caused by the body jobs and the carry-in and carry-out jobs, and maximizing the interference by sliding the problem window. However, unlike (Fonseca et al., 2017), our technique for bounding the carry-out workload works directly for general DAGs and does not introduce the pessimism, caused by the removal of precedence constraints between subtasks, that is present in (Fonseca et al., 2017). For the carry-in workload, though, we reuse the result from (Fonseca et al., 2017). Hence, we consider our work a generalization/extension of (Fonseca et al., 2017) that can be applied to general sporadic DAG tasks. The performance of our method in terms of schedulability ratio is comparable with that of (Fonseca et al., 2017) — it is theoretically at least as good for NFJ-DAGs and better for non-NFJ-DAGs. We thus focus on measuring the performance of our method and use the work by Melani et al. (Melani et al., 2015) as a reference for evaluating the improvement of our method over their simpler analysis.
We applied the Erdős–Rényi method, described in (Cordeiro et al., 2010), to generate DAG tasks. In this method the number of subtasks, given by parameter in , is first fixed. Then, directed edges between pairs of vertices are added with probability . Since the obtained DAG may not necessarily be connected, we added a minimum number of edges to make it weakly connected. In our experiments, the probability for a directed edge to be added is . We chose the number of subtasks uniformly in the range . Other parameters for each DAG task were generated similarly to (Melani et al., 2015). In particular, the WCETs of the subtasks of were generated uniformly in the range . After that, the work and span were calculated. ’s utilization was generated uniformly in the range , where is a parameter to control the minimum task utilization and represents the degree of parallelism of task . ’s deadline was generated using a normal distribution with mean equal to and standard deviation equal to . We kept generating the relative deadline until a value in the range was obtained. To generate a task set for a given total utilization, we repeatedly add DAG tasks to the task set until the desired utilization is reached. The utilization (and period) of the last task may need to be adjusted to match the total utilization. We used the SCIP solver (SCIP Solver, 2019) with CPLEX (IBM ILOG CPLEX Optimizer, 2019) as its underlying LP solver to compute the bound for the carry-out workload. For our experiments, we set the default minimum utilization of individual tasks to . For each configuration we generated 500 task sets and recorded the ratio of task sets that were deemed schedulable. We compare our response-time analysis, denoted by DGA-RTA, with the response-time analysis introduced in (Melani et al., 2015), denoted by MBB-RTA. For all generated task sets, priorities were assigned in Deadline Monotonic order — studying an efficient priority assignment scheme for GFP is beyond the scope of this paper.
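The graph-generation step can be sketched as follows; the specific parameter values used in the experiments are elided in this rendering, so `n` and `p` below are placeholders for the chosen number of subtasks and edge probability.

```python
import random
from typing import Set, Tuple

def erdos_renyi_dag(n: int, p: float,
                    rng: random.Random) -> Set[Tuple[int, int]]:
    """Random DAG on vertices 0..n-1: add edge (i, j) for each i < j with
    probability p, then add a minimal set of extra edges so the underlying
    undirected graph is weakly connected."""
    edges = {(i, j) for i in range(n) for j in range(i + 1, n)
             if rng.random() < p}
    # Union-find over the undirected version to detect components.
    parent = list(range(n))
    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for i, j in edges:
        parent[find(i)] = find(j)
    for v in range(1, n):                  # merge every component into vertex 0's
        if find(v) != find(0):
            edges.add((0, v))              # i < j, so the graph stays acyclic
            parent[find(v)] = find(0)
    return edges
```

Because every edge is oriented from a lower-numbered to a higher-numbered vertex, acyclicity holds by construction, including for the connectivity-repair edges.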
Figures 6(a), 6(b), 6(c), and 6(d) show representative results of our experiments. In Figures 6(a) and 6(b), we fixed the total number of processors and varied the total utilization from 1.0 to 14.0. The minimum task utilization was set to and in these two experiments, respectively. Unsurprisingly, DGA-RTA dominates MBB-RTA, as also observed in (Fonseca et al., 2017). Notably, its schedulability ratios for some configurations are at least twice those of MBB-RTA, e.g., for total utilizations of 8.0 and 9.0 in Figure 6(a), and 7.0 and 8.0 in Figure 6(b). In Figures 6(c) and 6(d), we fixed the normalized total utilization and varied the number of processors from 2 to 36. For each value of , we generated task sets with total utilization or for these two experiments, respectively. As in the previous experiments, the schedulability ratios of the generated task sets were improved significantly by DGA-RTA compared to MBB-RTA.
To provide a trade-off between computational complexity and the accuracy of the schedulability test, one can employ our analysis in combination with the analysis presented in (Fonseca et al., 2017): first apply their response-time analysis, then apply ours if the task set is deemed unschedulable by (Fonseca et al., 2017). In this way, one obtains the best result of both analyses.
9. Conclusion
In this paper we consider constrained-deadline, parallel DAG tasks scheduled under a preemptive GFP scheduling algorithm on multiprocessor platforms. We propose a new technique for bounding the carry-out workload of an interfering task by converting the calculation of the bound into an optimization problem, for which efficient solvers exist. The proposed technique applies directly to general DAG tasks. The optimal solution value of the optimization problem serves as a safe and tight upper bound for the carry-out workload. We present a response-time analysis for GFP based on the proposed workload-bounding technique. Experimental results affirm the dominance of the proposed approach over existing techniques. There are a couple of open questions that we would like to address in future work, including bounding the carry-in and carry-out workloads for the actual number of processors of the system and designing an efficient priority assignment scheme for parallel DAG tasks scheduled under a GFP algorithm.
References
 Axer et al. (2013) Philip Axer, Sophie Quinton, Moritz Neukirchner, Rolf Ernst, Björn Döbel, and Hermann Härtig. 2013. Response-time analysis of parallel fork-join workloads with real-time constraints. In 25th Euromicro Conference on Real-Time Systems, 2013. IEEE, 215–224.
 Baker (2003) Theodore Baker. 2003. Multiprocessor EDF and deadline monotonic schedulability analysis. In 24th Real-Time Systems Symposium, 2003. IEEE, 120–129.
 Baruah et al. (2012) Sanjoy Baruah, Vincenzo Bonifaci, Alberto Marchetti-Spaccamela, Leen Stougie, and Andreas Wiese. 2012. A generalized parallel task model for recurrent real-time processes. In 33rd Real-Time Systems Symposium, 2012. IEEE, 63–72.
 Bertogna and Cirinei (2007) Marko Bertogna and Michele Cirinei. 2007. Response-time analysis for globally scheduled symmetric multiprocessor platforms. In 28th Real-Time Systems Symposium, 2007. IEEE, 149–160.
 Bertogna et al. (2005) Marko Bertogna, Michele Cirinei, and Giuseppe Lipari. 2005. Improved schedulability analysis of EDF on multiprocessor platforms. In 17th Euromicro Conference on Real-Time Systems, 2005. IEEE, 209–218.
 Bertogna et al. (2009) Marko Bertogna, Michele Cirinei, and Giuseppe Lipari. 2009. Schedulability analysis of global scheduling algorithms on multiprocessor platforms. IEEE Transactions on Parallel and Distributed Systems 20, 4 (2009), 553–566.
 Bonifaci et al. (2013) Vincenzo Bonifaci, Alberto Marchetti-Spaccamela, Sebastian Stiller, and Andreas Wiese. 2013. Feasibility analysis in the sporadic DAG task model. In 25th Euromicro Conference on Real-Time Systems, 2013. IEEE, 225–233.
 Chwa et al. (2017) Hoon Sung Chwa, Jinkyu Lee, Jiyeon Lee, Kieu-My Phan, Arvind Easwaran, and Insik Shin. 2017. Global EDF schedulability analysis for parallel tasks on multicore platforms. IEEE Transactions on Parallel and Distributed Systems 28, 5 (2017), 1331–1345.
 Chwa et al. (2013) Hoon Sung Chwa, Jinkyu Lee, Kieu-My Phan, Arvind Easwaran, and Insik Shin. 2013. Global EDF schedulability analysis for synchronous parallel tasks on multicore platforms. In 25th Euromicro Conference on Real-Time Systems, 2013. IEEE, 25–34.
 Cordeiro et al. (2010) Daniel Cordeiro, Grégory Mounié, Swann Perarnau, Denis Trystram, Jean-Marc Vincent, and Frédéric Wagner. 2010. Random graph generation for scheduling simulations. In Proceedings of the 3rd International ICST Conference on Simulation Tools and Techniques. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 60.
 Ferry et al. (2014) David Ferry, Gregory Bunting, Amin Maghareh, Arun Prakash, Shirley Dyke, Kunal Agrawal, Chris Gill, and Chenyang Lu. 2014. Real-time system support for hybrid structural simulation. In Proceedings of the 14th International Conference on Embedded Software. ACM, 1–10.
 Fonseca et al. (2017) José Fonseca, Geoffrey Nelissen, and Vincent Nélis. 2017. Improved response time analysis of sporadic DAG tasks for global FP scheduling. In Proceedings of the 25th International Conference on Real-Time Networks and Systems. ACM, 28–37.
 Frigo et al. (1998) Matteo Frigo, Charles E Leiserson, and Keith H Randall. 1998. The implementation of the Cilk-5 multithreaded language. ACM SIGPLAN Notices 33, 5 (1998), 212–223.
 Guan et al. (2009) Nan Guan, Martin Stigge, Wang Yi, and Ge Yu. 2009. New response time bounds for fixed priority multiprocessor scheduling. In 30th Real-Time Systems Symposium, 2009. IEEE, 387–397.
 Gurobi Solver (2019) Gurobi Solver. 2019. http://www.gurobi.com/index.
 IBM ILOG CPLEX Optimizer (2019) IBM ILOG CPLEX Optimizer. 2019. https://www.ibm.com/analytics/cplexoptimizer.
 Intel Cilk Plus (2019) Intel Cilk Plus. 2019. https://www.cilkplus.org/.
 Intel Threading Building Blocks (2019) Intel Threading Building Blocks. 2019. https://www.threadingbuildingblocks.org/.
 Kim et al. (2013) Junsung Kim, Hyoseung Kim, Karthik Lakshmanan, and Ragunathan Raj Rajkumar. 2013. Parallel scheduling for cyber-physical systems: Analysis and case study on a self-driving car. In Proceedings of the ACM/IEEE 4th International Conference on Cyber-Physical Systems. ACM, 31–40.
 Lakshmanan et al. (2010) Karthik Lakshmanan, Shinpei Kato, and Ragunathan Raj Rajkumar. 2010. Scheduling parallel real-time tasks on multi-core processors. In 31st IEEE Real-Time Systems Symposium, 2010. IEEE, 259–268.
 Maia et al. (2014) Cláudio Maia, Marko Bertogna, Luís Nogueira, and Luis Miguel Pinho. 2014. Response-time analysis of synchronous parallel tasks in multiprocessor systems. In Proceedings of the 22nd International Conference on Real-Time Networks and Systems. ACM, 3.
 Melani et al. (2015) Alessandra Melani, Marko Bertogna, Vincenzo Bonifaci, Alberto Marchetti-Spaccamela, and Giorgio C Buttazzo. 2015. Response-time analysis of conditional DAG tasks in multiprocessor systems. In 27th Euromicro Conference on Real-Time Systems, 2015. IEEE, 211–221.
 OpenMP (2019) OpenMP. 2019. https://www.openmp.org/.
 Saifullah et al. (2013) Abusayeed Saifullah, Jing Li, Kunal Agrawal, Chenyang Lu, and Christopher Gill. 2013. Multicore realtime scheduling for generalized parallel task models. RealTime Systems 49, 4 (2013), 404–435.
 SCIP Solver (2019) SCIP Solver. 2019. http://scip.zib.de/.