Timing Analysis for DAG-based and GFP Scheduled Tasks

06/04/2014 ∙ by José Marinho, et al. ∙ Politécnico do Porto

Modern embedded systems have made the transition from single-core to multi-core architectures, providing performance improvements via parallelism rather than higher clock frequencies. DAGs are considered among the most generic task models in the real-time domain and are well suited to exploit this parallelism. In this paper we provide a schedulability test using response-time analysis, exploring and bounding the self-interference of a DAG task. Additionally, we bound the interference that a high-priority task has on lower-priority ones.


I Introduction

The quest for higher computational power has brought about multicore platforms as a compelling solution, first in general-purpose computing and now also in the embedded real-time systems arena. Rather than relying on increasing the throughput of single processors, the multicore paradigm provides the ability to perform a greater number of simultaneous calculations, but it has also given rise to a new challenge. It often forces system designers to utilize the hardware facilities and use parallel algorithms in order to perform tasks of high computational demand within a predefined time window. However, this implies a subtle difference in the way schedulability conditions are posed, since parts of the workload from the same task are allowed to execute concurrently; each task is then referred to as a parallel or Directed Acyclic Graph (DAG) task. This paper presents a framework to address this issue for fully preemptive global fixed task priority (GFP) schedulers and homogeneous multicores in which all cores have the same computing capabilities and are interchangeable. It is worth mentioning that GFP schedulers are commonly adopted and supported out of the box on several industry-grade real-time operating systems such as VxWorks [11].

Related Work

Valuable works such as [5, 12, 1, 9, 4] addressed the scheduling problem of parallel tasks upon homogeneous multicores. Saifullah et al. [12] presented a method to decompose a generic DAG task into a set of virtual sequential tasks; after the decomposition, the popular global earliest deadline first (GEDF) density-based schedulability test is applied. Andersson and de Niz [1] presented an analysis for GEDF where an upper bound on the workload that each task may execute in a given time window is computed. Nevertheless, this upper bound is computed for a special case of DAG tasks, namely the "fork-join" tasks. For such a task: (i) the parallel workloads have the same execution requirement; (ii) they are spawned after a common point; and (iii) they join again after a common point. Note: when a fork-join task is executing a section of workload in parallel, no further forks can occur. Chwa et al. [5] provided a method to compute the interference that each task would suffer in a system of so-called "synchronous parallel" tasks, where each task is composed of multiple, potentially contiguous regions of parallel workload with distinct parallelism levels. In more than one aspect, DAG tasks cover a broader area, as they allow parallel workloads to have distinct execution requirements and a different immediate predecessor for each node. Previous works using fixed-priority schedulers exist, but in a partitioned environment, i.e., tasks are assigned to cores at design time and no migration is allowed at run-time [6, 8]. For example, Lakshmanan et al. [8] presented a basic form of parallel tasks, namely "Gang tasks", in which all the parallel workloads have to be scheduled simultaneously on the processing platform.

This Research

In this paper, we present a sufficient schedulability test applicable to constrained-deadline DAG tasks (see Section II for a formal definition) scheduled by a global fixed task priority (GFP) scheduler on a homogeneous multicore platform.

II System Model

Fig. 1: A DAG task

Task specifications

We consider a task-set composed of sporadic DAG tasks. Each sporadic task τ_i is characterized by a DAG G_i, a relative deadline D_i and a minimum timespan T_i (also called period) between two consecutive activations of τ_i. These parameters are given with the following interpretation. Nodes in G_i (also called sub-jobs in the literature) stand for a vector of execution requirements at each activation of τ_i, and the edges represent dependencies between the nodes. A node is denoted by v_{i,j}, with 1 ≤ j ≤ n_i, where n_i is the total number of nodes in G_i. The execution requirement of node v_{i,j} is denoted by C_{i,j}. A directed edge from node v_{i,j} to node v_{i,k}, denoted as (v_{i,j}, v_{i,k}), implies that the execution of v_{i,k} cannot start unless that of v_{i,j} has completed. In this case, v_{i,j} is called a parent of v_{i,k}, while v_{i,k} is its child. We denote by succ(v_{i,j}) the set of all children of node v_{i,j} and by pred(v_{i,j}) the set of all its parents. If there is no directed path between v_{i,j} and v_{i,k}, then the two nodes may execute concurrently; in this case, we state that v_{i,j} is concurrent to v_{i,k}, and reversely. A node without a parent is called an entry node, while a node without a child is called an exit node. We assume that a node can start executing only after all its parents have completed. For brevity's sake, we consider only tasks with a single entry node and a single exit node. For each task τ_i, we assume D_i ≤ T_i, which is commonly referred to as the constrained-deadline task model. Figure 1 illustrates a DAG task. Note: the analysis presented in this paper is easily tunable for tasks with multiple entry and exit nodes.

The total execution requirement of τ_i, denoted by C_i, is the sum of the execution requirements of all the nodes in G_i, i.e., C_i = Σ_{j=1}^{n_i} C_{i,j}. The task-set is said to be GFP-schedulable if the GFP scheduler can schedule it such that all the nodes of every task τ_i meet its deadline D_i.

Definition 1 (Critical path).

A critical path for task τ_i is a directed path that has the maximum execution requirement among all paths in G_i.

Definition 2 (Critical path length).

The critical path length for task τ_i is the sum of the execution requirements of the nodes belonging to a critical path in G_i.
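As an illustration of Definitions 1 and 2, the critical path length can be computed by a longest-path traversal of the DAG in topological order. This is a hedged sketch: the function and variable names below are our own, not the paper's notation.

```python
from collections import deque

def critical_path_length(nodes, edges, wcet):
    """Longest path (by execution requirement) through a DAG.

    nodes : iterable of node ids
    edges : list of (parent, child) pairs
    wcet  : dict mapping node id -> worst-case execution requirement
    """
    children = {v: [] for v in nodes}
    indeg = {v: 0 for v in nodes}
    for u, v in edges:
        children[u].append(v)
        indeg[v] += 1
    # longest[v] = maximum execution requirement of any path ending at v
    longest = {v: wcet[v] for v in nodes}
    queue = deque(v for v in nodes if indeg[v] == 0)  # entry nodes
    while queue:
        u = queue.popleft()
        for v in children[u]:
            longest[v] = max(longest[v], longest[u] + wcet[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return max(longest.values())
```

For the diamond-shaped DAG 1→{2,3}→4 with requirements {1, 3, 2, 1}, the critical path is 1→2→4 with length 5.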

Platform and scheduler specifications

We consider a platform consisting of m unit-capacity cores, and a fully preemptive GFP scheduler. That is: (i) a priority is assigned to each task at system design-time and then, at run-time, every node inherits the priority of the task it belongs to; (ii) different nodes of the same task may execute upon different cores; and finally (iii) a preempted node may resume execution upon the same or a different core, at no cost or penalty. We assume that each node may execute on at most one core at any time instant and that the lower the index of a task, the higher its priority.

III Timing Analysis and Self-Interference Extraction

Intrinsically, some nodes of a given task τ_i may prevent other nodes of the same task from executing. This constitutes a form of self-interference. Since G_i may be viewed as a set of paths, each path represents a set of sequential nodes in G_i connected to each other via edges, i.e., from the view-point of any node of a path, the other nodes of that path are either children or parents. We also consider the complementary set of a path, which contains all the nodes of G_i that do not belong to it. Note: the nodes in the complement of a path are not necessarily concurrent to all the nodes in that path.

Let us consider the set of all partial paths in G_i which connect nodes v_{i,j} and v_{i,k}. Since each such partial path is, by definition, a path of G_i, it has a worst-case execution requirement which is computed by summing up the execution requirements of all its nodes. Note: the set of partial paths between v_{i,j} and v_{i,k} also has a critical path defined as in Definition 1, i.e., the path with the largest execution requirement between v_{i,j} and v_{i,k}. It is fairly straightforward that if either v_{i,j} or v_{i,k} is not part of the end-to-end critical path, then the critical partial path between v_{i,j} and v_{i,k} is not contained in it either. Now we can quantify the maximum self-interference that a task may generate on a given subset of its nodes.

Let R_i denote the worst-case response time of task τ_i (the response time of an activation of τ_i is the timespan between its release and the completion of its workload; hence R_i is the largest such value over all activations of τ_i). On the road towards the computation of an upper-bound on R_i, there are some important checkpoints we must investigate.

Definition 3 (partial worst-case response time).

The partial worst-case response time of a set of partial paths connecting nodes v_{i,j} and v_{i,k} is the largest timespan between the release time of v_{i,j} and the completion time of v_{i,k}.

Lemma 1 (Critical Self-interference Path).

Considering only self-interference, the partial path of which leads to the worst-case response time of is the critical partial path among all partial paths in .

Proof (by contradiction).

Initially, assume that some other partial path leads to a larger response time. Baker and Cirinei [2] provided an upper-bound on the interference of a Liu & Layland (LL) task (in the LL model, each task τ_k generates a potentially infinite sequence of jobs and is characterized by a 3-tuple (C_k, D_k, T_k), where C_k is the worst-case execution time of each job, D_k is the relative deadline and T_k is the minimum inter-arrival time between two consecutive jobs of τ_k) on an m-core platform (m ≥ 2). In this work, we extend this result to compute, in the same manner, the interference that concurrent nodes induce on a partial path (see Eq. 1).

(1)

Let us assume that for some we have:

(2)

Then it follows that:

(3)

Since

(4)

By substituting Eq. (4) into Eq. (3), Eq. (2) leads us to:

(5)

which trivially means , contradicting the initial assumption. The Lemma follows. ∎

Informally speaking, Lemma 1 implies, for any non-parallel pair of fringe nodes v_{i,j} and v_{i,k}, that an upper-bound on the response time is obtained by considering the critical partial path between v_{i,j} and v_{i,k}. At the same time, the nodes which do not belong to this path are assumed to induce the maximum interference over it. As this is proven for any pair of nodes, the result also holds for the extreme (entry and exit) nodes.

Fig. 2: Earliest and latest release times for nodes in a DAG

Now we focus on deriving the critical path in G_i. For every node in G_i, we consider its earliest and latest release times. Note: these quantities can be computed through a breadth-first [10] traversal of G_i. Assuming v_{i,1} and v_{i,n_i} denote the entry and exit nodes of G_i, the earliest release time of any node, without any interference, can be computed as follows.

(6)
(7)

where the minimum execution requirement of each node is used. In the same manner, a breadth-first traversal of G_i starting from the exit node provides the latest release time of each node as follows.

(8)
(9)
(10)

Eq. (7) and Eq. (10) clearly represent a lower- and an upper-bound on the best-case and worst-case start times of a node, respectively. This can be observed in the following two scenarios: (i) the node does not suffer any external interference and all its parents request their minimum execution requirements, which yields the lower bound; (ii) the node suffers the maximum possible external interference and its parents request their maximum execution requirements, which yields the upper bound. With these equations, we can derive the worst-case response time of τ_i in isolation. To do so, we compute the critical path length of G_i, as the two problems can be addressed separately. From Eq. (8) and (9), and by starting from the exit node of G_i, it is obtained as follows.

(11)
(12)
(13)
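The traversals behind Eqs. (6)-(10) can be sketched as a pair of passes in topological order: the earliest release of a node assumes its parents take their minimum requirement with no interference, while a latest-start bound assumes maximum parent requirements. This is a simplified illustration only: the paper's latest-release equations additionally account for external interference, which is omitted here, and all names are hypothetical.

```python
from collections import deque

def release_bounds(nodes, edges, c_min, c_max):
    """Earliest and latest release-time bounds for every node of a DAG.

    c_min / c_max : dicts mapping node id -> min / max execution requirement
    A node is released once all its parents have completed, so each bound
    is a max over parent completion-time bounds.
    """
    children = {v: [] for v in nodes}
    indeg = {v: 0 for v in nodes}
    for u, v in edges:
        children[u].append(v)
        indeg[v] += 1
    earliest = {v: 0 for v in nodes}  # no interference, minimal parents
    latest = {v: 0 for v in nodes}    # maximal parents (interference omitted)
    queue = deque(v for v in nodes if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        for v in children[u]:
            earliest[v] = max(earliest[v], earliest[u] + c_min[u])
            latest[v] = max(latest[v], latest[u] + c_max[u])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return earliest, latest
```

For the diamond DAG 1→{2,3}→4, the exit node's earliest release follows the cheapest-parents schedule, while its latest release follows the most expensive chain of parents.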

For any path, the execution requirement of its nodes is yielded by the equations above. From Lemma 1, an upper-bound on the response time of task τ_i, including only the self-interference, is thus given by:

(14)

IV Upper-bound on the Interference and Schedulability Condition

In this section we provide an upper-bound on the interference of any task and we derive a sufficient schedulability condition. To this end, we distinguish between two scenarios: (i) the scenario where the task does not suffer any interference from higher-priority tasks, and (ii) the scenario where it suffers the maximum possible interference. For brevity's sake, we assume that all tasks have carry-in at this stage, and will relax this assumption in Section V.

Regarding Scenario (i), we recall that Eq. (7) gives a lower-bound on the release time of each node. This leads to an upper-bound function on the workload request of a node at any time (see Figure 3), defined as follows.

(15)

Since the workload request of the task is the sum over the workload requests of all its nodes, then an upper-bound on the workload request of at any time (see Figure 4) is defined as follows.

(16)

where and .
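To illustrate the shape of such bounds: one simple upper-bound of this kind lets each node contribute workload at unit rate from its earliest release time onward, capped by its execution requirement; summing the per-node ramps yields a piecewise-linear task-level bound. This is a sketch under our own naming and is not necessarily the exact expression of Eq. (16).

```python
def node_workload_ub(t, release, c):
    # by time t, a node released at `release` can have requested at most
    # min(c, t - release) execution units (it runs on one core at a time)
    return min(c, max(0, t - release))

def task_workload_ub(t, earliest_release, wcet):
    # summing the per-node ramps gives a piecewise-linear upper bound
    # whose slope never exceeds the task's maximum parallelism
    return sum(node_workload_ub(t, earliest_release[v], wcet[v])
               for v in wcet)
```

A node not yet released contributes nothing; a completed node contributes exactly its execution requirement.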

Regarding scenario , is assigned to a core at most time units after the task is released and is released at most time units after node has started execution. This leads to a lower-bound function on the workload request of at any time (see Figure 3) defined as follows.

(17)
Fig. 3: Extreme cases for node execution requirements
Fig. 4: Extreme cases for task execution requirements

A lower-bound on the workload request of at any time (see Figure 4) is thus defined as follows.

(18)

where .

Eq. (16) and Eq. (18) can be used to obtain an upper-bound on the workload request of a task in a time window of length L. To this end, we consider that an activation of the task occurs some offset prior to the beginning of the targeted window. Then two situations can lead to increasing the workload request in the window: (i) at the beginning of the window, the task suffers the maximum possible interference and its nodes are released as late as possible; (ii) at the end of the window, the task does not suffer any interference and its nodes are released as early as possible.

Lemma 2 (Upper-bound on the Workload of a Task with Carry-in).

Assuming task τ_i has carry-in, an upper-bound on its workload request in a window of length L is given by:

(19)
Proof Sketch.

We consider an activation of occurring at time . The worst-case scenario for task is when it is prevented from execution on any core by higher priority tasks in the interval . Let us assume this worst-case scenario and let us assume that all nodes are released at time but one specific node is released at time . Since is a lower-bound on the workload request of at any time , it follows that the workload executed after , when is released at time , is greater than or equal to the workload request of when it is released at time . Hence on the left border of the window of length (i.e., at the beginning of the window), if the nodes are assumed to be released as late as possible, then the workload request in the window is maximized. On the right border of the window (i.e., at the end of the window), we assume the earliest release time of all the nodes but one specific node . By applying the same logic, it follows that the workload request in the window is maximized since the nodes are assumed to be released as early as possible and is an upper-bound on the workload request of at any time .

Now, let denote the maximum number of parallel nodes in . We recall that the summation of the workload requests of all the nodes is a piecewise linear function, where each segment has its first derivative in the interval . In order to compute the maximum workload request of each task in an interval of length , we must evaluate the workload request in all windows of length assuming an offset . Since on the one hand the first derivative of (resp. the first derivative of ) is clearly periodic from time with a period (see Fig. 4), and on the other hand, the next activation of occurs only at time , it is not necessary to check the offsets over as there is no extra workload after by construction. Hence and the lemma follows. ∎

In order to obtain the solution of Eq. (19), instead of exhaustively testing all the values of the offset in the continuous interval, we derive hereafter the finite set of offsets which maximizes it.

As previously mentioned, both bounds are piecewise linear functions. Hence, the set of points where the first derivative of the upper-bound increases and the set of points where the first derivative of the lower-bound decreases should be considered, respectively, at the left and at the right border of the targeted window of length L. The points in these sets maximize the workload request in the window. Since the release pattern repeats with period T_i, for each node the first derivative of its upper-bound can increase only at a finite set of points, and the first derivative of its lower-bound can decrease only at a finite set of points. Therefore the candidate set of offsets can be defined as follows.

(20)

The computation of this bound for each task makes it easy to assess an upper-bound on the interference it will induce on the workload of the lower-priority tasks in any given time window. In [2], it has been proven that every unit of execution of a LL task can interfere for at most 1/m time units with the workload request of any other LL task with a lower priority. Thus, an upper-bound on the interference suffered by a task in a window of size L is provided as:

(21)

where is the set of tasks with a higher priority than . A sufficient schedulability condition for a task-set is derived from Eq. (21) as follows.

Theorem 1 (Sufficient schedulability condition).

A task-set is schedulable on m homogeneous cores using a GFP scheduler if:

(22)

where is computed by the following fixed-point algorithm.

Note: this iterative algorithm stops as soon as, for every task, the response-time estimate converges or exceeds the corresponding deadline. In the latter case, the task-set is deemed not schedulable.

Proof.

This theorem follows directly from Lemma 1, Lemma 2, Eq. (14) and Eq. (21). ∎
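The fixed-point computation referenced by Theorem 1 can be sketched as follows, with `self_bound` standing in for the self-interference bound of Eq. (14) and `hp_workload` for a caller-supplied bound on the higher-priority workload; dividing the workload by the number of cores m is the standard shape of global response-time analysis, and the paper's exact recurrence may differ. All names are hypothetical. Assuming `hp_workload` is monotone non-decreasing over integer time, the iteration terminates.

```python
def response_time(self_bound, deadline, hp_workload, m):
    """Fixed-point iteration for a response-time upper bound.

    self_bound  : response-time bound with self-interference only
    hp_workload : function L -> higher-priority workload bound in a window L
    Returns the fixed point, or None when the estimate exceeds the
    deadline (i.e., the task is deemed not schedulable).
    """
    r = self_bound
    while True:
        nxt = self_bound + hp_workload(r) / m
        if nxt == r:          # fixed point reached: response-time bound
            return r
        if nxt > deadline:    # bound would exceed the deadline
            return None
        r = nxt
```

For instance, with `self_bound = 3`, a constant higher-priority workload of 4 and `m = 2`, the iteration settles at 5.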

V Reduction of the number of tasks with carry-in

Fig. 5: Functions and for task

Rather than considering that each task has carry-in as in Section IV, the intuitive idea of this section consists of reducing the number of tasks with carry-in to at most (m−1) tasks (where m is the number of cores). Since it is usually the case that the number of tasks exceeds the number of cores, we thus obtain a tighter upper-bound on the interference that each task may suffer at run-time, and finally a better schedulability condition for each task. To accomplish this, let us first recall some fundamental results regarding the "Liu & Layland (LL) task model".

Upper-bound on the workload request of a LL task without carry-in. Let be a LL task with no pending workload at the beginning of a window of length . An upper-bound on its workload request in this window is recalled (see [3, 7]):

(23)
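A bound of the kind in Eq. (23) is classically a staircase function: the jobs that fit entirely in the window contribute their full execution time, plus one final job truncated by the end of the window. The sketch below shows this textbook form under the assumption of integer time; it is not necessarily the paper's exact expression.

```python
def ll_workload_no_carry_in(L, C, T):
    """Upper bound on the workload a Liu & Layland task (WCET C, period T)
    can request in a window of length L when it has no pending workload at
    the window start: complete jobs plus a truncated final job."""
    full_jobs = L // T
    return full_jobs * C + min(C, L - full_jobs * T)
```

For example, a task with C = 2 and T = 4 can request at most 6 units in a window of length 10: two full jobs plus 2 units of a third.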

Upper-bound on the workload request of a LL task with carry-in. Let be a LL task with some pending workload at the beginning of a window of length . An upper-bound on its workload request in this window is recalled (see [3, 7]):

(24)

Extra workload request of a LL task. The difference between the upper-bounds with and without carry-in of a LL task in a window of length L is thus recalled as:

(25)

Upper-bound on the interference of a LL task. Assume a GFP scheduler and a task-set in which tasks are indexed in decreasing priority order. An upper-bound on the interference that higher-priority tasks induce on the execution of a task in a targeted window of length L is recalled (see [3, 7]):

(26)

In Eq. (26), the inner term returns the greatest value among the workloads of tasks with a higher priority than the task under analysis. For a LL task-set, it has been proven in [7] that a worst-case scenario in terms of total workload request in a targeted window of length L can be constructed by considering (m−1) tasks with carry-in. Therefore, it follows that the extra workload induced by these carry-in tasks in the window of concern cannot exceed the difference between (i) the workload assuming the carry-in scenario (see Eq. (24)) and (ii) the maximum workload assuming no carry-in for all tasks (see Eq. (23)). Consequently, from the view-point of the task under analysis, if no task has a higher priority, then it does not suffer any interference; otherwise, we can choose the (m−1) tasks among its higher-priority tasks such that the difference between the workload assuming the carry-in scenario and the workload assuming the non-carry-in scenario is the largest possible for each selected task. By summing up these differences and the remaining workloads corresponding to the tasks without carry-in, an upper-bound on the workload that higher-priority tasks induce in the window of length L is computed.
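The selection described above can be sketched as follows: charge the carry-in bound to the m−1 higher-priority tasks with the largest carry-in/no-carry-in workload difference, charge the no-carry-in bound to the rest, and scale by 1/m since each workload unit interferes for at most 1/m time units. All names are hypothetical, and the paper's Eq. (26) may additionally cap per-task contributions.

```python
def interference_with_limited_carry_in(L, m, hp_tasks, w_nc, w_ci):
    """Sketch of the carry-in reduction for an m-core platform.

    hp_tasks : ids of the higher-priority tasks
    w_nc(k, L) / w_ci(k, L) : caller-supplied workload bounds for task k
    in a window of length L, without / with carry-in.
    """
    # extra workload contributed by the m-1 worst carry-in tasks
    diffs = sorted((w_ci(k, L) - w_nc(k, L) for k in hp_tasks), reverse=True)
    extra = sum(d for d in diffs[:m - 1] if d > 0)
    base = sum(w_nc(k, L) for k in hp_tasks)
    return (base + extra) / m
```

With three higher-priority tasks whose no-carry-in workloads are (2, 3, 4) and carry-in workloads (5, 3, 6), and m = 2, only the first task's carry-in surplus of 3 is charged, giving (9 + 3) / 2 = 6.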

Before we extend Eq. (26) to the scheduling problem of DAG tasks using a GFP scheduler, let us present an alternative formal proof to the one provided by Guan et al. [7] for the analysis considering (m−1) tasks with carry-in.

Theorem 2 (Eq. (26) is an Upper-bound for LL tasks [7]).

Let τ be a feasible LL task-set scheduled by a GFP scheduler on m homogeneous cores, and consider a task τ_k ∈ τ. Eq. (26) is an upper-bound on the interference on τ_k in any window of length L.

Proof.

Since τ is feasible, consider the latest time-instant at which at least one core is idle; then at most (m−1) tasks have carry-in workload at that instant. Let the window of length L start at this instant. By considering the (m−1) tasks with the largest possible carry-in, we are conservative w.r.t. the workload request of the tasks with carry-in in the window. In the same vein, by considering the tasks without carry-in to be simultaneously released at the start of the window, with the future activations of each of these tasks occurring as soon as legally permitted, we are also conservative w.r.t. the workload request of the tasks without carry-in in the window.

Now consider a window of length L starting earlier, with some offset, and assume that the beginning of this window triggers the first activation of τ_k. The earliest time-instant at which τ_k may start executing is constrained as follows: (i) τ_k cannot start executing before its activation time, and (ii) all the cores are busy executing higher-priority tasks before the idle instant, by construction. As all the cores are busy executing higher-priority tasks in that interval, moving the first activation of τ_k earlier (i.e., by sliding the window) can only increase the interference on τ_k (as the end of the execution of τ_k remains unchanged). The maximum interference is obtained when τ_k is activated simultaneously with all higher-priority tasks, as then we have the largest possible carry-in as well as non-carry-in interference on the execution of τ_k. The theorem follows. ∎

VI Extension to DAG-based Tasks

In this section we extend the reduction of the number of tasks with carry-in, obtained in the framework of LL tasks, to the DAG task model. To this end, we distinguish between the upper-bound on the workload request of the tasks with carry-in (see Eq. (19)) and without carry-in (which is detailed hereafter). These expressions will be considered when computing the interference of higher-priority tasks on the execution of every task in a window of length L.

Upper-bound on the workload request of tasks without carry-in. Let us assume that is a task without carry-in. An upper-bound on its workload request in a targeted window of length can be constructed by distinguishing between the same two scenarios as those which allowed us to derive Eq. (19) in Section IV.

Regarding Scenario (i), where the task does not suffer any interference from higher-priority tasks, an upper-bound on its workload request at any time is defined as follows.

(27)

In a similar manner, regarding Scenario (ii), where the task suffers the maximum interference from higher-priority tasks, a lower-bound on its workload request at any time is defined as follows.

(28)

As in the carry-in case, Eq. (27) and Eq. (28) can be used to obtain an upper-bound on the workload request of a task in a time window of length L, as claimed in Lemma 3.

Lemma 3 (Upper-bound on the Workload of a Task Without Carry-in).

Assuming no carry-in for task τ_i, an upper-bound on its workload request in a window of length L is given by:

(29)
Proof Sketch.

The proof sketch of this lemma follows the same reasoning as that of Lemma 2. ∎

From Lemma 2 and Lemma 3, it follows that the difference between the upper-bounds with and without carry-in for a task in a window of length L can be written as:

(30)

All the results presented so far enable us to present a tighter upper-bound on the interference of a task together with the corresponding sufficient schedulability condition.

Tighter Upper-bound on the Interference of a Task. Assume a GFP scheduler and a task-set in which tasks are indexed in decreasing priority order, as in Section V. An upper-bound on the interference that higher-priority tasks induce on the execution of a task in a targeted window of length L is obtained as follows.

(31)

Each term in Eq. (31) is explained analogously to the corresponding term in Eq. (26), and a tighter schedulability test follows.

Theorem 3 (Tighter Sufficient Schedulability Condition).

A task-set is schedulable on m homogeneous cores using a GFP scheduler if:

(32)

where is computed by the following fixed-point algorithm.

Note: this algorithm also stops as soon as, for every task, the response-time estimate converges or exceeds the corresponding deadline. Again, in the latter case, the task-set is deemed not schedulable.

Proof.

The proof of this theorem is similar to that of Theorem 2. The difference here resides in the evaluation of the upper-bound on the workload of tasks without carry-in. Instead of considering a synchronous activation of these tasks at the beginning of the targeted window and assuming their subsequent activations occur as soon as legally permitted, the upper-bound has to be computed by using Eq. (29). ∎

VII Conclusions

In this paper, a sufficient schedulability test for fully preemptive DAG-based tasks with constrained deadlines is presented. A global fixed task priority (GFP) scheduler and a homogeneous multicore platform are assumed. Under these settings, this work is, to the best of our knowledge, the first to address this problem. As future work we intend to evaluate the properties of a task model where nodes belonging to each task may execute with different priorities rather than directly inheriting their priority from the task they belong to.

References

  • [1] B. Andersson and D. de Niz. Analyzing global-edf for multiprocessor scheduling of parallel tasks. In OPODIS, 2012.
  • [2] T. Baker and M. Cirinei. A unified analysis of global edf and fixed-task-priority schedulability of sporadic task systems on multiprocessors. Journal of Embedded Computing, 4(2):55–69, 2010.
  • [3] M. Bertogna and M. Cirinei. Response-time analysis for globally scheduled symmetric multiprocessor platforms. In RTSS, 2007.
  • [4] V. Bonifaci, A. Marchetti-Spaccamela, S. Stiller, and A. Wiese. Feasibility analysis in the sporadic DAG task model. In ECRTS, 2013.
  • [5] H. S. Chwa, J. Lee, K. Phan, A. Easwaran, and I. Shin. Global edf schedulability analysis for synchronous parallel tasks on multicore platforms. In ECRTS, 2013.
  • [6] F. Fauberteau, M. Qamhieh, and S. Midonnet. Partitioned scheduling of parallel real-time tasks on multiprocessor systems. In WIP ECRTS, 2011.
  • [7] Nan Guan, Martin Stigge, Wang Yi, and Ge Yu. New response time bounds for fixed priority multiprocessor scheduling. In RTSS, 2009.
  • [8] K. Lakshmanan, S. Kato, and R. Rajkumar. Scheduling parallel real-time tasks on multi-core processors. In RTSS, 2010.
  • [9] J. Li, K. Agrawal, C. Lu, and C. Gill. Analysis of global edf for parallel tasks. In ECRTS, 2012.
  • [10] J. Marinho, V. Nélis, S. Petters, and I. Puaut. Preemption delay analysis for floating non-preemptive region scheduling. In DATE, 2012.
  • [11] Wind River. VxWorks Platforms. http://www.windriver.com/products/
    product-notes/PN_VE_6_9_Platform_0311.pdf.
  • [12] A. Saifullah, K. Agrawal, C. Lu, and C. Gill. Multi-core real-time scheduling for generalized parallel task models. In RTSS, 2010.