# Energy-Efficient Real-Time Scheduling for Two-Type Heterogeneous Multiprocessors


## 1 Introduction

Efficient energy management has become an important issue for modern computing systems, from sensor networks, satellites and multi-robot systems to personal electronic devices, due to their growing computational power demands. There are two common schemes used in modern energy management. One is dynamic power management (DPM), where certain parts of the system are turned off while the processor is idle. The other is dynamic voltage and frequency scaling (DVFS), which reduces energy consumption by exploiting the relation between the supply voltage and power consumption. In this work, we consider the problem of scheduling real-time tasks on heterogeneous multiprocessors under a DVFS scheme with the goal of minimizing energy consumption, while ensuring that both the execution cycle requirements and timeliness constraints of the real-time tasks are satisfied.

### 1.1 Terminologies and Definitions

This section provides basic terminologies and definitions used throughout the paper.

Speed $s$: The operating speed is defined as the ratio between the operating frequency $f^r$ of a type-$r$ processor and the maximum system frequency $f_{\max}$, i.e. $s^r := f^r / f_{\max}$, where $s^r \in [0, 1]$.

Minimum Execution Time $\underline{x}_i$: The minimum execution time is the execution time of task $i$ when executed at the maximum system frequency $f_{\max}$, i.e. $\underline{x}_i := c_i / f_{\max}$, where $c_i$ is the execution cycle requirement of task $i$. (In the literature, this is often called the 'worst-case execution time'. However, when the speed is allowed to vary, the term 'minimum execution time' makes more sense, since the execution time increases as the speed is scaled down. For simplicity of exposition, we also assume no uncertainty, hence 'worst-case' is not applicable here. Extensions to uncertainty should be relatively straightforward, in which case the term becomes 'minimum worst-case execution time'.)

Task Density $\delta_i$: For a periodic task, the task density is defined as the ratio between the task execution time and the minimum of its deadline and its period, i.e. $\delta_i := \frac{c_i}{s_i f_{\max} \min(d_i, p_i)}$, where $s_i$ is the task execution speed. (When all tasks are assumed to have implicit deadlines, this is often called 'task utilization'.)

System Capacity $C$: The system capacity is defined as $C := \sum_{r=1}^{\kappa} m_r s^r_{\max}$, where $s^r_{\max}$ is the maximum speed of a type-$r$ processor and $m_r$ is the total number of processors of type-$r$.

Migration Scheme: A global scheduling scheme allows task migration between processors, whereas a partitioned scheduling scheme does not.

Feasibility Optimal: An algorithm is feasibility optimal if the algorithm is guaranteed to be able to construct a valid schedule such that no deadlines are missed, provided a schedule exists.

Energy Optimal: An algorithm is energy optimal when it is guaranteed to find a schedule that minimizes the energy, while meeting the deadlines, provided such a schedule exists.

Step Function: A function $f : [0, L) \to \mathbb{R}$ is a step (also called a piecewise constant) function, denoted $f \in PC$, if there exists a finite partition $\{T_j\}_{j=1}^{J}$ of $[0, L)$ and a set of real numbers $\{c_j\}_{j=1}^{J}$ such that $f(t) = c_j$ for all $t \in T_j$, $j \in \{1, \ldots, J\}$.
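The definitions above can be made concrete with a small numerical sketch. This is our illustration, not code from the paper; the platform and task numbers (maximum frequency, core counts, cycle counts) are made-up examples.

```python
# Illustrative sketch of the Section 1.1 quantities for a hypothetical
# two-type platform. All constants below are assumptions for the example.

F_MAX = 2.0e9  # assumed maximum system frequency (Hz)

def speed(freq_hz):
    """Operating speed: ratio of operating frequency to the maximum system frequency."""
    return freq_hz / F_MAX

def min_execution_time(cycles):
    """Minimum execution time: execution time at the maximum system frequency."""
    return cycles / F_MAX

def task_density(cycles, s, deadline, period):
    """Density: execution time at speed s over min(deadline, period)."""
    exec_time = cycles / (s * F_MAX)
    return exec_time / min(deadline, period)

def system_capacity(clusters):
    """Sum over processor types of (processor count) * (maximum speed)."""
    return sum(m_r * s_max for (m_r, s_max) in clusters)

# Example: 4 'big' cores at full speed and 4 'LITTLE' cores capped at half speed.
print(system_capacity([(4, 1.0), (4, 0.5)]))  # -> 6.0
print(task_density(cycles=1.0e9, s=0.5, deadline=2.0, period=3.0))  # -> 0.5
```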

### 1.2 Related Work

Due to the heterogeneity of the processors, one should not only consider the different operating frequency sets among processors, but also the hardware architecture of the processors, since task execution time will be different for each processor type. In other words, the system has to be captured by two aspects: the difference in operating speed sets and the execution cycles required by different tasks on different processor types.

Given these aspects, fully-migrative/global scheduling algorithms, in which tasks may migrate between different processor types, are not applicable in practice, since it is difficult to quantify how much computational work is executed on one processor type compared to another due to differences in instruction sets, register formats, etc. Thus, most work on heterogeneous multiprocessor scheduling concerns partition-based/non-preemptive task scheduling algorithms [1, 2, 3, 4, 5, 6, 7], i.e. each task is partitioned onto one of the processor types and a well-known uniprocessor scheduling algorithm, such as Earliest Deadline First (EDF) [8], is used to find a valid schedule. With this scheme, the heterogeneous multiprocessor scheduling problem is reduced to a task partitioning problem, which can be formulated as an integer linear program (ILP). Examples of such work are [1] and [5].

However, with the advent of ARM two-type heterogeneous multicore architectures that support task migration among different core types, such as the big.LITTLE architecture [9], a global scheduling algorithm is possible. In [10, 11], the first energy-aware global scheduling framework for this special architecture is presented: an algorithm called Hetero-Split solves the workload assignment problem and a Hetero-Wrap algorithm solves the schedule generation problem. Their framework is similar to ours, except that we adopt a fluid model to represent the scheduling dynamics, our assigned operating frequency is time-varying, and CPU idle energy consumption is also considered.

A fluid model is the ideal schedule path of a real-time task: the remaining execution time is represented by a straight line whose slope is the task execution speed. A practical task execution path, however, deviates from this straight line, since a task may be preempted by other tasks; an execution interval of a task is represented by a line segment with negative slope and a non-execution interval by a segment with zero slope.
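The contrast between the ideal fluid path and a practical piecewise path can be sketched numerically. This is our own illustration under made-up task numbers, not the paper's code:

```python
# Minimal sketch of the fluid-model idea: the ideal path decreases linearly
# to zero at the deadline, while a practical path alternates between
# executing (negative slope) and preempted (zero slope) intervals.

def fluid_remaining(x0, deadline, t):
    """Ideal fluid schedule: remaining work decreases linearly to 0 at the deadline."""
    return max(0.0, x0 * (1.0 - t / deadline))

def practical_remaining(x0, schedule, t):
    """Piecewise path: `schedule` is a list of (start, end, speed) execution intervals."""
    done = 0.0
    for (a, b, s) in schedule:
        done += s * max(0.0, min(t, b) - a)  # work completed in [a, b] up to time t
    return max(0.0, x0 - done)

x0, d = 1.0, 4.0
sched = [(0.0, 1.0, 0.5), (2.0, 3.0, 0.5)]  # preempted during [1, 2] and [3, 4]
print(fluid_remaining(x0, d, 2.0))           # -> 0.5
print(practical_remaining(x0, sched, 4.0))   # -> 0.0 (all work done by the deadline)
```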

Algorithms based on the fairness notion [13, 14, 16, 17, 18, 19] are feasibility optimal, but have hardly been applied in real systems, since they suffer from high scheduling overheads, i.e. frequent task preemptions and migrations. Recently, two feasibility optimal algorithms that are not based on the notion of fairness have been proposed. One is the RUN algorithm [20], which uses a dualization technique to reduce the multiprocessor scheduling problem to a series of uniprocessor scheduling problems. The other is U-EDF [21], which generalises the earliest deadline first (EDF) algorithm to multiprocessors by reducing the problem to EDF on a uniprocessor.

Alternatively, the multiprocessor scheduling problem can be formulated as an optimization problem. However, since the problem is NP-hard in general [22], an approximate polynomial-time heuristic method is often used. Examples of this approach can be found in [23, 24], which consider energy-aware multiprocessor scheduling with probabilistic task execution times. The tasks are partitioned among the set of processors, and the running frequency is then computed based on the task execution time probabilities. Among all feasible assignments, a minimum-energy assignment is chosen by solving a mathematical optimization problem whose objective is to minimize an energy function and whose constraints ensure that all tasks meet their deadlines and that each task is assigned to exactly one processor. In partitioned scheduling algorithms such as [23, 24], once a task is assigned to a specific processor, the multiprocessor scheduling problem is reduced to a set of uniprocessor scheduling problems, which is well studied [25]. However, a partitioned scheduling method cannot provide an optimal schedule.

### 1.3 Contribution

The main contributions of this work are:

• The formulation of a real-time multiprocessor scheduling problem as an infinite-dimensional continuous-time optimal control problem.

• Three mathematical programming formulations to solve a hard real-time task scheduling problem on heterogeneous multiprocessor systems with DVFS capabilities are proposed.

• We provide a generalised optimal speed profile solution to a uniprocessor scheduling problem with a real-time taskset.

• Our formulation yields a multiprocessor scheduling algorithm that is both feasibility optimal and energy optimal.

• Due to the incorporation of a scheduling dynamic and a time-varying speed profile, and in contrast to existing work, our formulations are capable of solving a multiprocessor scheduling problem with any periodic taskset as well as aperiodic tasksets.

• The proposed algorithms can be applied both to an online scheduling scheme, where the characteristics of the taskset are not known until the time of execution, and to an offline scheduling scheme, where the taskset information is known a priori.

• Moreover, the proposed formulations can be extended to multicore architectures that only allow the frequency to be changed at cluster level, rather than at core level, as explained in Section 2.3.

### 1.4 Outline

This paper is organized as follows: Section 2 defines our feasibility scheduling problem in detail. Details on solving the scheduling problem with finite-dimensional mathematical optimization is given in Section 3. The optimality problem formulations are presented in Section 4. The simulation setup and results are presented in Section 5. Finally, conclusions and future work are discussed in Section 6.

## 2 Feasibility Problem Formulation

Though our objective is to minimize the total energy consumption, we will first consider a feasibility problem before presenting an optimality problem.

### 2.1 System model

We consider a set of $n$ real-time tasks that are to be partitioned on a two-type heterogeneous multiprocessor system composed of $m_r$ processors of type-$r$, $r \in \{1, 2\}$. We will assume that the system supports task migration among processor types, e.g. the types share the same instruction set and have a special interconnection for data transfer between processor types. Note that the execution cycle requirement $c_i$ of a task is the same for all processor types, since the instruction set is the same.

Tasks do not share resources, have no precedence constraints and are ready to start at the beginning of execution. A task can be preempted or migrated between different processor types at any time. The cost of preemption and migration is assumed to be negligible or included in the minimum task execution times. Processors of the same type are homogeneous, i.e. they have the same set of operating frequencies and power consumption characteristics. Each processor's voltage/speed can be adjusted individually. Additionally, for an ideal system, a processor is assumed to have a continuous speed range; for a practical system, a processor is assumed to have a finite set of operating speed levels.

### 2.3 Scheduling as an Optimal Control Problem

Below, we will refer to the index sets $I := \{1, \ldots, n\}$, $K_r := \{1, \ldots, m_r\}$ and $R := \{1, \ldots, \kappa\}$, and to the scheduling horizon $[0, L]$, where $L$ is the largest deadline of all tasks. Note that $\forall i$, $\forall k$, $\forall r$ are short-hand notations for $\forall i \in I$, $\forall k \in K_r$, $\forall r \in R$, respectively. The scheduling problem can therefore be formulated as the following infinite-dimensional continuous-time optimal control problem:

$$
\begin{aligned}
\text{find} \quad & x_i(\cdot),\; a^r_{ik}(\cdot),\; s^r_k(\cdot), && \forall i \in I,\, k \in K_r,\, r \in R \\
\text{subject to} \quad & x_i(b_i) = \underline{x}_i, && \forall i && \text{(1a)} \\
& x_i(t) = 0, && \forall i,\, t \notin [b_i, b_i + d_i) && \text{(1b)} \\
& \dot{x}_i(t) \ge -\sum_{r=1}^{\kappa} \sum_{k=1}^{m_r} a^r_{ik}(t)\, s^r_k(t), && \forall i,\, t \text{ a.e.} && \text{(1c)} \\
& \sum_{r=1}^{\kappa} \sum_{k=1}^{m_r} a^r_{ik}(t) \le 1, && \forall i, t && \text{(1d)} \\
& \sum_{i=1}^{n} a^r_{ik}(t) \le 1, && \forall k, r, t && \text{(1e)} \\
& s^r_k(t) \in S^r, && \forall k, r, t && \text{(1f)} \\
& a^r_{ik}(t) \in \{0, 1\}, && \forall i, k, r, t && \text{(1g)} \\
& a^r_{ik}(\cdot) \in PC,\; s^r_k(\cdot) \in PC, && \forall i, k, r && \text{(1h)}
\end{aligned}
$$

where the state $x_i(t)$ is the remaining minimum execution time of task $i$ at time $t$, the control input $s^r_k(t)$ is the execution speed of processor $k$ of type-$r$ at time $t$ and the control input $a^r_{ik}(t)$ indicates the processor assignment of task $i$ at time $t$, i.e. $a^r_{ik}(t) = 1$ if and only if task $i$ is active on processor $k$ of type-$r$. Notice that here we formulated the problem with speed selection at core level; incorporating the stricter assumption of a multicore architecture, i.e. a cluster-level speed assignment, is straightforward, by replacing the core-level speed assignment with a cluster-level one in the above formulation.

The initial conditions on the minimum execution time of all tasks and the task deadline constraints are specified in (1a) and (1b), respectively. The fluid model of the scheduling dynamics is given by the differential constraint (1c). Constraint (1d) ensures that each task is assigned to at most one non-idle processor at a time. Constraint (1e) guarantees that each non-idle processor is assigned at most one task at a time. The speeds are constrained by (1f) to take values in $S^r$. Constraint (1g) enforces that the task assignment variables are binary. Lastly, (1h) requires that the control inputs be step functions.

###### Fact 1

A solution to (1) where (1c) is satisfied with equality can be constructed from a solution to (1).

###### Proof:

Let be a feasible point to (1). Let . Choose such that (i) and (ii) . Choose and . It follows that is a solution to (1) where (1c) is an equality.

## 3 Solving the Scheduling Problem with Finite-dimensional Mathematical Optimization

The original problem (1) will be discretized by introducing piecewise constant constraints on the control inputs $a^r_{ik}(\cdot)$ and $s^r_k(\cdot)$. Let $\tau := \{\tau_0, \ldots, \tau_M\}$, which we will refer to as the major grid, denote the set of discretization time steps corresponding to the distinct arrival times and deadlines of all tasks within $[0, L]$, where $0 = \tau_0 < \tau_1 < \cdots < \tau_M = L$.
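Constructing the major grid is mechanical; the following is our own sketch of the construction under the paper's definitions (the task numbers are made up):

```python
# Sketch: the major grid collects the distinct arrival times and deadlines of
# all tasks over [0, L], where L is the largest absolute deadline.

def major_grid(tasks):
    """tasks: list of (arrival b_i, relative deadline d_i). Returns sorted grid points."""
    points = {0.0}
    for (b, d) in tasks:
        points.add(b)       # arrival time
        points.add(b + d)   # absolute deadline
    return sorted(points)

tasks = [(0.0, 4.0), (1.0, 2.0), (1.0, 5.0)]
print(major_grid(tasks))  # -> [0.0, 1.0, 3.0, 4.0, 6.0]
```

Within each resulting interval, no task arrives or reaches its deadline, which is what lets the later formulations treat the workload per interval as a single decision.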

### 3.1 Mixed-Integer Nonlinear Program (MINLP-DVFS)

The above scheduling problem, subject to piecewise constant constraints on the control inputs, can be naturally formulated as an MINLP, defined below. Since context switches due to task preemption and migration can jeopardize performance, a variable discretization time step method [26] is applied on a minor grid, so that the solution to our scheduling problem does not depend on the size of the discretization time step. Let $\{\tau_{\mu,0}, \ldots, \tau_{\mu,M}\}$ denote the set of discretization time steps of a minor grid on the interval $[\tau_\mu, \tau_{\mu+1}]$ with $\tau_{\mu,0} = \tau_\mu$, so that each $\tau_{\mu,\nu}$ is to be determined for all $\mu, \nu$ from solving an appropriately-defined optimization problem.

Let $\forall \mu$ and $\forall \nu$ be short notations for $\forall \mu \in \{0, \ldots, M-1\}$ and $\forall \nu \in \{0, \ldots, M-1\}$, respectively. Denote the discretized state and input sequences as

$$
\begin{aligned}
x_i[\mu,\nu] &:= x_i(\tau_{\mu,\nu}), && \forall i, \mu, \nu && \text{(2a)} \\
s^r_k[\mu,\nu] &:= s^r_k(\tau_{\mu,\nu}), && \forall k, r, \mu, \nu && \text{(2b)} \\
a^r_{ik}[\mu,\nu] &:= a^r_{ik}(\tau_{\mu,\nu}), && \forall i, k, r, \mu, \nu && \text{(2c)}
\end{aligned}
$$

Let $s^r_k(\cdot)$ and $a^r_{ik}(\cdot)$ be step functions in between time instances on the minor grid, i.e.

$$
\begin{aligned}
s^r_k(t) &= s^r_k[\mu,\nu], && \forall t \in [\tau_{\mu,\nu}, \tau_{\mu,\nu+1}),\, \mu, \nu && \text{(3a)} \\
a^r_{ik}(t) &= a^r_{ik}[\mu,\nu], && \forall t \in [\tau_{\mu,\nu}, \tau_{\mu,\nu+1}),\, \mu, \nu && \text{(3b)}
\end{aligned}
$$

Let $T$ denote the set of all tasks within $[0, L]$. Define a task arrival time mapping $\Phi_b$ such that $\tau_{\Phi_b(T_i)} = b_i$ for all $i$ and a task deadline mapping $\Phi_d$ such that $\tau_{\Phi_d(T_i)} = b_i + d_i$ for all $i$. Define $U_i := \{\Phi_b(T_i), \ldots, \Phi_d(T_i) - 1\}$, the set of major-grid indices during which task $i$ may execute.

By solving a first-order ODE with piecewise constant input, a solution of the scheduling dynamic (1c) has to satisfy the difference constraint

$$
x_i[\mu,\nu+1] \ge x_i[\mu,\nu] - h[\mu,\nu] \sum_{r=1}^{\kappa} \sum_{k=1}^{m_r} s^r_k[\mu,\nu]\, a^r_{ik}[\mu,\nu], \quad \forall i, \mu, \nu, \tag{4a}
$$
where $h[\mu,\nu] := \tau_{\mu,\nu+1} - \tau_{\mu,\nu}$, $\forall \mu, \nu$.

The discretization of the original problem (1) subject to piecewise constant constraints on the inputs (3) is therefore equivalent to the following finite-dimensional MINLP:

$$
\begin{aligned}
\text{find} \quad & x_i[\cdot],\; a^r_{ik}[\cdot],\; s^r_k[\cdot],\; h[\cdot], && \forall i \in I,\, k \in K_r,\, r \in R \\
\text{subject to} \quad & \text{(4a) and} \\
& x_i[\Phi_b(T_i), 0] = \underline{x}_i, && \forall i && \text{(4b)} \\
& x_i[\mu,\nu] = 0, && \forall i,\, \mu \notin U_i,\, \nu && \text{(4c)} \\
& \sum_{r=1}^{\kappa} \sum_{k=1}^{m_r} a^r_{ik}[\mu,\nu] \le 1, && \forall i, \mu, \nu && \text{(4d)} \\
& \sum_{i=1}^{n} a^r_{ik}[\mu,\nu] \le 1, && \forall k, r, \mu, \nu && \text{(4e)} \\
& s^r_k[\mu,\nu] \in S^r, && \forall k, r, \mu, \nu && \text{(4f)} \\
& a^r_{ik}[\mu,\nu] \in \{0, 1\}, && \forall i, k, r, \mu, \nu && \text{(4g)} \\
& 0 \le h[\mu,\nu], && \forall \mu, \nu && \text{(4h)} \\
& \sum_{\nu=0}^{M-1} h[\mu,\nu] \le \tau_{\mu+1} - \tau_{\mu}, && \forall \mu && \text{(4i)}
\end{aligned}
$$

where (4h)-(4i) enforce upper and lower bounds on discretization time steps.

###### Theorem 2

Let the size of the minor grid be $M$. A solution to (1) exists if and only if a solution to (4) exists.

###### Proof:

This follows from the fact that if a solution to (1) exists, then the Hetero-Wrap scheduling algorithm [11] can find a valid schedule with a bounded number of migrations within each cluster [11, Lemma 2].

Next, we will show that $(\tilde{x}_i[\cdot], \tilde{a}^r_{ik}[\cdot], \tilde{s}^r_k[\cdot], \tilde{h}[\cdot])$, a solution to (4), can be constructed from $(x_i(\cdot), a^r_{ik}(\cdot), s^r_k(\cdot))$, a solution to (1). Specifically, choose $\tilde{h}[\mu,\nu]$ and $\tilde{a}^r_{ik}[\mu,\nu]$ such that

$$
\tilde{h}[\mu,\nu]\, \tilde{a}^r_{ik}[\mu,\nu] = \int_{\tau_{\mu,\nu}}^{\tau_{\mu,\nu+1}} a^r_{ik}(t)\, dt, \quad \forall i, k, r, \mu, \nu. \tag{5}
$$

Then (4a)-(4c) are satisfied with $\tilde{x}_i[\mu,\nu] := x_i(\tau_{\mu,\nu})$. It follows from (1d), (1e) and (1g) that (4d), (4e) and (4g) are satisfied, respectively. Constraint (4f) is satisfied with $\tilde{s}^r_k[\mu,\nu] := s^r_k(\tau_{\mu,\nu})$.

Suppose now we have a solution to (4). We can construct a solution to (1) by choosing the inputs to be the step functions $a^r_{ik}(t) = a^r_{ik}[\mu,\nu]$ and $s^r_k(t) = s^r_k[\mu,\nu]$ when $t \in [\tau_{\mu,\nu}, \tau_{\mu,\nu+1})$. It is simple to verify that (1) is satisfied by this choice.

### 3.2 Computationally Tractable Multiprocessor Scheduling Algorithms

The time needed to compute a solution to problem (4) is impractical even for small problem sizes. However, if we relax the binary constraints (4g) so that the relaxed variable can be interpreted as the percentage of a time interval during which the task is executed (denoted $\omega$ in later formulations), rather than as a processor assignment, the problem can be reformulated as an NLP for a system with a continuous operating speed range and as an LP for a system with discrete speed levels. The NLP and LP can be solved in a fraction of the time taken to solve the MINLP above. In particular, the heterogeneous multiprocessor scheduling problem can be split into two steps:

STEP 1:

Determine the percentage of task execution times and execution speed within a time interval such that the feasibility constraints are satisfied.

STEP 2:

From the solution given in the workload partitioning step, find the execution order of all tasks within a time interval such that no task will be executed on more than one processor at a time.

#### 3.2.1 Solving the Workload Partitioning Problem as a Continuous Nonlinear Program (NLP-DVFS)

Since knowing the processor on which a task will be executed does not help in finding the task execution order, the processor assignment subscript $k$ of the control variables is dropped to reduce the number of decision variables. Moreover, partitioning time using only a major grid is enough to guarantee a valid solution, i.e. the percentage of the task execution time within a major grid interval equals the sum of all percentages of task execution times on the corresponding minor grid. Since we only need a major grid, we define the notation $x_i[\mu] := x_i(\tau_\mu)$ and $h[\mu] := \tau_{\mu+1} - \tau_{\mu}$. We also assume that the set of allowable speed levels is a closed interval given by a lower bound $s^r_{\min}$ and an upper bound $s^r_{\max}$.

Consider now the following finite-dimensional NLP:

$$
\begin{aligned}
\text{find} \quad & x_i[\cdot],\; \omega^r_i[\cdot],\; s^r_i[\cdot], && \forall i \in I,\, r \in R \\
\text{subject to} \quad & x_i[\Phi_b(T_i)] = \underline{x}_i, && \forall i && \text{(6a)} \\
& x_i[\mu] = 0, && \forall i,\, \mu \notin U_i && \text{(6b)} \\
& x_i[\mu+1] \ge x_i[\mu] - h[\mu] \sum_{r=1}^{\kappa} \omega^r_i[\mu]\, s^r_i[\mu], && \forall i, \mu && \text{(6c)} \\
& \sum_{r=1}^{\kappa} \omega^r_i[\mu] \le 1, && \forall i, \mu && \text{(6d)} \\
& \sum_{i=1}^{n} \omega^r_i[\mu] \le m_r, && \forall r, \mu && \text{(6e)} \\
& s^r_{\min} \le s^r_i[\mu] \le s^r_{\max}, && \forall i, r, \mu && \text{(6f)} \\
& 0 \le \omega^r_i[\mu] \le 1, && \forall i, r, \mu && \text{(6g)}
\end{aligned}
$$

where $\omega^r_i[\mu]$ is defined as the percentage of the time interval $[\tau_\mu, \tau_{\mu+1}]$ for which task $i$ executes on a processor of type-$r$ at speed $s^r_i[\mu]$. Constraint (6d) guarantees that a task will not run on more than one processor at a time. The constraint that the total workload in each time interval should be less than or equal to the system capacity is specified in (6e). Upper and lower bounds on the task execution speed and on the percentage of task execution time are given in (6f) and (6g), respectively.

#### 3.2.2 Solving the Workload Partitioning Problem as a Linear Program (LP-DVFS)

The problem (6) can be further simplified to an LP if the set of speed levels is finite, as is often the case for practical systems. We denote with $s^r_q$ the execution speed at level $q$ of a type-$r$ processor, where $l_r$ is the total number of speed levels of a type-$r$ processor. Let $\forall q$ be short-hand for $\forall q \in Q_r := \{1, \ldots, l_r\}$.

Consider now the following finite-dimensional LP:

$$
\begin{aligned}
\text{find} \quad & x_i[\cdot],\; \omega^r_{iq}[\cdot], && \forall i \in I,\, q \in Q_r,\, r \in R \\
\text{subject to} \quad & x_i[\Phi_b(T_i)] = \underline{x}_i, && \forall i && \text{(7a)} \\
& x_i[\mu] = 0, && \forall i,\, \mu \notin U_i && \text{(7b)} \\
& x_i[\mu+1] \ge x_i[\mu] - h[\mu] \sum_{r=1}^{\kappa} \sum_{q=1}^{l_r} \omega^r_{iq}[\mu]\, s^r_q, && \forall i, \mu && \text{(7c)} \\
& \sum_{r=1}^{\kappa} \sum_{q=1}^{l_r} \omega^r_{iq}[\mu] \le 1, && \forall i, \mu && \text{(7d)} \\
& \sum_{i=1}^{n} \sum_{q=1}^{l_r} \omega^r_{iq}[\mu] \le m_r, && \forall r, \mu && \text{(7e)} \\
& 0 \le \omega^r_{iq}[\mu] \le 1, && \forall i, q, r, \mu && \text{(7f)}
\end{aligned}
$$

where $\omega^r_{iq}[\mu]$ is the percentage of the time interval $[\tau_\mu, \tau_{\mu+1}]$ for which task $i$ executes on a processor of type-$r$ at speed level $q$. Note that all constraints are analogous to those of (6), but with fixed speed levels.
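The per-interval structure of the LP (7) can be illustrated with a hand-coded feasibility check. This is our own sketch (the platform data are made up), verifying a candidate workload partition against (7d)-(7f) and the progress requirement implied by (7c):

```python
# Feasibility check for one major-grid interval of the LP (7): each task runs
# at most 100% of the interval (7d), each cluster's load fits its m_r
# processors (7e), all fractions lie in [0, 1] (7f), and the work delivered
# covers each task's required progress (cf. (7c)).

def feasible(omega, speeds, m, h, demand, eps=1e-9):
    """
    omega[r][i][q]: fraction of the interval task i spends on type-r at level q.
    speeds[r][q]: speed of level q on type-r; m[r]: processor count of type-r;
    h: interval length; demand[i]: work each task must complete in this interval.
    """
    n = len(demand)
    for i in range(n):  # (7d): task i uses at most the whole interval
        if sum(omega[r][i][q] for r in range(len(m))
               for q in range(len(speeds[r]))) > 1.0 + eps:
            return False
    for r in range(len(m)):  # (7e): cluster load fits m_r processors
        if sum(omega[r][i][q] for i in range(n)
               for q in range(len(speeds[r]))) > m[r] + eps:
            return False
    for r in range(len(m)):  # (7f): fractions in [0, 1]
        for i in range(n):
            for q in range(len(speeds[r])):
                if not -eps <= omega[r][i][q] <= 1.0 + eps:
                    return False
    for i in range(n):  # progress requirement derived from (7c)
        work = h * sum(omega[r][i][q] * speeds[r][q]
                       for r in range(len(m)) for q in range(len(speeds[r])))
        if work + eps < demand[i]:
            return False
    return True

# Two tasks, one big core (levels 0.5/1.0) and one LITTLE core (level 0.5).
speeds = [[0.5, 1.0], [0.5]]
omega = [[[0.0, 0.5], [0.0, 0.0]],  # task 0: half the interval on big at speed 1
         [[0.0], [1.0]]]            # task 1: whole interval on LITTLE at speed 0.5
print(feasible(omega, speeds, m=[1, 1], h=2.0, demand=[1.0, 1.0]))  # -> True
```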

###### Theorem 3

A solution to (6) can be constructed from a solution to (7), and vice versa, if the discrete speed set $S^r$ is any finite subset of the closed interval $[s^r_{\min}, s^r_{\max}]$ with $s^r_{\min}$ and $s^r_{\max}$ in $S^r$ for all $r$.

###### Proof:

Let $(x_i[\cdot], \omega^r_i[\cdot], s^r_i[\cdot])$ denote a solution to (6) and $(x_i[\cdot], \omega^r_{iq}[\cdot])$ a solution to (7). The result follows by noting that one can choose the variables such that $\sum_{q} \omega^r_{iq}[\mu] = \omega^r_i[\mu]$ and $\sum_{q} \omega^r_{iq}[\mu]\, s^r_q = \omega^r_i[\mu]\, s^r_i[\mu]$ are satisfied.

#### 3.2.3 Solving the Task Ordering Problem

This section discusses how to find a valid schedule in the task ordering step for each time interval $[\tau_\mu, \tau_{\mu+1}]$. Since the solutions obtained in the workload partitioning step are partitioned workloads of each task on each processor type within each time interval, one might think of using McNaughton's wrap around algorithm [15] to find a valid schedule for each processor within a processor type. However, McNaughton's wrap around algorithm only guarantees that a task will not be executed at the same time within a cluster; a task may still be assigned to more than one processor type (cluster) at the same time.

To avoid parallel execution on any two clusters, we adopt the Hetero-Wrap algorithm proposed in [11] to solve the task ordering problem on a two-type heterogeneous multiprocessor platform. The algorithm takes the workload partitioning solution of STEP 1 as its input and returns a task-to-processor interval assignment on each cluster. Note that, for a solution to problem (7), we define the total execution workload of task $i$ on type-$r$ as $\omega^r_i[\mu] := \sum_{q} \omega^r_{iq}[\mu]$ and assume that the percentages of execution time of each task at all frequency levels are grouped together, in order to minimize the number of migrations and preemptions. To be self-contained, the Hetero-Wrap algorithm is given in Algorithm 1.

Specifically, the algorithm classifies the tasks into four subsets: (i) a set of migrating tasks with $\omega^1_i[\mu] + \omega^2_i[\mu] = 1$, (ii) a set of migrating tasks with $\omega^1_i[\mu] + \omega^2_i[\mu] < 1$, (iii) a set of partitioned tasks on the type-1 cluster, and (iv) a set of partitioned tasks on the type-2 cluster. The algorithm then employs the following simple rules:

• For a type-1 cluster, tasks are scheduled in the order of the migrating sets (i)-(ii) followed by the partitioned set (iii), using McNaughton's wrap around algorithm. That is, a slot along the number line is allocated, starting at zero, with length equal to the task's assigned workload, and each task is aligned with its assigned workload on the empty slots of the cluster in the specified order, from left to right.

• For a type-2 cluster, tasks are scheduled in the same manner using McNaughton's wrap around algorithm, but in the order of the migrating sets (i)-(ii) followed by the partitioned set (iv), and starting from right to left. Note that the order of the migrating tasks has to be consistent with the order used in the type-1 cluster.
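The wrap-around placement underlying both rules can be sketched in a few lines. This is our own illustrative implementation of McNaughton's rule (workloads and task names are made-up examples), not the full Hetero-Wrap algorithm:

```python
# McNaughton's wrap around rule: workloads (fractions of the interval) are laid
# out left to right along a number line of length m and wrapped at each integer
# boundary onto the next processor, so each task splits across at most two
# processors and never runs twice at the same instant within the cluster.

def mcnaughton(workloads, m):
    """
    workloads: list of (task_id, length) with each length <= 1 and total <= m.
    Returns per-processor lists of (task_id, start, end) within a unit interval.
    """
    procs = [[] for _ in range(m)]
    pos = 0.0  # global position along the number line [0, m)
    for task, w in workloads:
        remaining = w
        while remaining > 1e-12:
            p = int(pos)                  # current processor index
            room = (p + 1) - pos          # space left on this processor
            piece = min(remaining, room)
            procs[p].append((task, pos - p, pos - p + piece))
            pos += piece
            remaining -= piece
    return procs

# Three tasks on two processors; task "B" wraps from processor 0 onto processor 1.
print(mcnaughton([("A", 0.6), ("B", 0.8), ("C", 0.5)], m=2))
```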

However, the algorithm requires a feasible solution to (6) or (7) in which the set of tasks assigned to both clusters has at most one task, which we will call an inter-cluster migrating task. From Theorem 3, we can always transform a solution to (6) into a solution to (7). Therefore, we only need to show that there exists a solution to (7), lying at a vertex of the feasible region, with at most one inter-cluster migrating task; this follows from the facts and lemma below.

###### Fact 4

Among all the solutions to an LP, at least one solution lies at a vertex of the feasible region. In other words, at least one solution is a basic solution.

###### Proof:

This follows from the Fundamental Theorem of Linear Programming, which states that if a feasible solution exists, then a basic feasible solution exists [27, p. 38].

###### Fact 5

A feasible solution to an LP that is not a basic solution can always be converted into a basic solution.

###### Proof:

This follows from the Fundamental Theorem of Linear Programming [27, p.38].

###### Fact 6

[28, Fact 2] Consider a linear program for some , , . Suppose that constraints are nonnegative constraints on each variable, i.e.  and the rest are linearly independent constraints. If , then a basic solution will have at most non-zero values.

###### Proof:

A unique basic solution can be identified by any linearly independent active constraints. Since there are nonnegative constraints and , a basic solution will have at most non-zero values.

###### Lemma 7

For a solution to (7) that lies on the vertex of the feasible region, there will be at most one inter-cluster partitioning task.

###### Proof:

The number of variables subject to the nonnegativity constraint (7f) at each time interval of (7) is $n \sum_{r=1}^{\kappa} l_r$. The feasibility constraints (7d)-(7e) number $n + \kappa$; note that we do not count (7c), because (7c) and (7d) are linearly dependent constraints for a given value of $x_i[\mu]$. If each processor type has at least one speed level, then it follows from Fact 6 that the number of non-zero entries of a solution to (7) at a vertex of the feasible region is bounded by the number of linearly independent constraints in (7d)-(7e). Let $g$ be the number of tasks assigned to two processor types; each such task contributes one non-zero entry beyond the single entry every task requires. This implies that $g \le 1$, i.e. the number of inter-cluster partitioning tasks is at most one.

To illustrate how Algorithm 1 works, consider a simple taskset in which the percentage of execution workload partition at time interval for each task is as shown in Table I.

A feasible schedule obtained by Algorithm 1 is shown in Figure 3.2.3.

For this example, , and .

###### Theorem 8

If a solution to (1) exists, then a solution to (6)/(7) exists. Furthermore, at least one valid schedule satisfying (1) can be constructed from a solution to problem (6)/(7) and the output from Algorithm 1.

###### Proof:

The existence of a valid schedule is proven in [11, Thm 3]. It follows from Facts 4-6 and Lemma 7 that one can compute a solution with at most one inter-cluster partitioning task. Given a solution to (6)/(7) and the output from Algorithm 1 for all intervals, choose $a^r_{ik}(\cdot)$ to be a step function that equals 1 when the corresponding interval assignment is active and 0 otherwise. Specifically, one can verify that the following condition holds

$$
h[\mu,\nu]\, \omega^r_i[\mu] = \int_{\tau_{\mu,\nu}}^{\tau_{\mu,\nu+1}} \sum_{k} a^r_{ik}(t)\, dt, \quad \forall i, r, \mu, \nu. \tag{8}
$$

Then it is straightforward to show that (1) is satisfied.

Note that, although the multiprocessor scheduling problem is solved in two steps in this section, the computation times for (6) or (7) are extremely short compared to solving problem (1): even for a small problem, computing a solution of (4) can take up to an hour, while (6) or (7) can be solved in milliseconds on a general-purpose desktop PC with off-the-shelf optimization solvers. Furthermore, the complexity of Algorithm 1 is analyzed in [11].

## 4 Energy Optimality

### 4.1 Energy Consumption model

A power consumption model can be expressed as a summation of dynamic power consumption $P_d$ and static power consumption $P_s$. Dynamic power consumption is due to the charging and discharging of CMOS gates, while static power consumption is due to subthreshold leakage current and reverse bias junction current [29]. The dynamic power consumption of a CMOS processor operating at speed $s$ is given by

$$
P_d(s) = C_{ef} V_{dd}^2\, s f_{\max}, \tag{9a}
$$
where the constraint
$$
s f_{\max} \le \zeta \frac{(V_{dd} - V_t)^2}{V_{dd}} \tag{9b}
$$

has to be satisfied [29]. Here $C_{ef}$ denotes the effective switch capacitance, $V_{dd}$ is the supply voltage, $V_t$ is the threshold voltage and $\zeta$ is a hardware-specific constant.

From (9b), it follows that if the speed $s$ increases, then the supply voltage $V_{dd}$ may have to increase (and if $s$ decreases, so may $V_{dd}$). In the literature, the total power consumption is often simply expressed as an increasing function of the form

$$
P(s) := P_d(s) + P_s = \alpha s^{\beta} + P_s, \tag{10}
$$

where $\alpha$ and $\beta$ are hardware-dependent constants, while the static power consumption $P_s$ is assumed to be either constant or zero [30].

The energy consumption of executing and completing a task at a constant speed $s_i$ is given by

$$
E(s_i) := \frac{c_i}{f_{\max}} \cdot \frac{P_d(s_i) + P_s}{s_i} = \underline{x}_i\, \frac{P_d(s_i) + P_s}{s_i}. \tag{11a}
$$

In the literature, it is often assumed that $P$ is an increasing function of the operating speed. However, because $1/s_i$ is a decreasing function of $s_i$, the energy consumed need not be an increasing function of speed if $P_s$ is non-zero; Figure 6 gives an example where the energy is non-monotonic, even though the power is an increasing function of clock frequency.

This result implies the existence of a non-zero energy-efficient speed, i.e. the minimizer of (11a) [31, 32, 33]. Moreover, in the work of [34], a non-convex relationship between energy consumption and processor speed is observed as a result of scaling the supply voltage.

The total energy consumption of executing a real-time task can be expressed as the sum of active and idle energy consumption, i.e. $E = E_{\text{active}} + E_{\text{idle}}$, where $E_{\text{active}}$ is the energy consumed while the processor is busy executing the task and $E_{\text{idle}}$ is the energy consumed while the processor is idle. The energy consumption of executing and completing a task at a constant speed $s_i$ is

$$
\begin{aligned}
E(s_i) &= E_{\text{active}}(s_i) + E_{\text{idle}} && \text{(12a)} \\
&= \frac{c_i}{f_{\max}} \cdot \frac{P_{\text{active}}(s_i) - P_{\text{idle}}}{s_i} + P_{\text{idle}}\, d_i && \text{(12b)} \\
&= \underline{x}_i\, \frac{P_{\text{active}}(s_i) - P_{\text{idle}}}{s_i} + P_{\text{idle}}\, d_i, && \text{(12c)}
\end{aligned}
$$

where $P_{\text{active}}$ is the total power consumption in the active interval and $P_{\text{idle}}$ is the total power consumption during the idle period; each is the sum of its dynamic and static components over the respective interval. $P_{\text{idle}}$ will be assumed to be a constant, since the processor is executing a nop (no operation) instruction at the lowest frequency during the idle interval; the dynamic and static idle power components are likewise assumed constant. Note that $P_{\text{idle}}$ is strictly greater than zero.
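The non-monotonicity of (12) and the resulting energy-efficient speed can be checked numerically. The following is our own sketch; the constants alpha, beta, P_s and P_idle are invented for illustration and are not from the paper:

```python
# Energy model of (12) under the common form P_active(s) = alpha * s**beta + P_s.
# For a task with minimum execution time x and deadline d run at constant speed s,
# E(s) = x * (P_active(s) - P_IDLE) / s + P_IDLE * d. The 1/s term makes E
# non-monotone, so the minimizer is an interior, non-zero energy-efficient speed.

ALPHA, BETA, P_S, P_IDLE = 1.0, 3.0, 0.1, 0.05  # assumed hardware constants

def energy(s, x=1.0, d=2.0):
    p_active = ALPHA * s**BETA + P_S
    return x * (p_active - P_IDLE) / s + P_IDLE * d

# Grid search for the energy-efficient speed on (0, 1].
grid = [q / 1000 for q in range(1, 1001)]
s_star = min(grid, key=energy)
print(round(s_star, 3))  # a non-zero interior minimizer, neither s_min nor 1.0
```

For these constants, $E(s) = s^2 + 0.05/s + 0.1$, whose minimizer $s^* = (0.025)^{1/3} \approx 0.29$ lies strictly inside the speed range: running slower than $s^*$ wastes idle-like power over a longer active interval, while running faster pays the superlinear dynamic power cost.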

### 4.2 Optimality Problem Formulation

The scheduling problem with the objective to minimize the total energy consumption of executing the taskset on a two-type heterogeneous multiprocessor can be formulated as the following optimal control problems:

I) Continuous Optimal Control Problem:

$$
\underset{x_i(\cdot),\, a^r_{ik}(\cdot),\, s^r_k(\cdot),\ \forall i \in I,\, k \in K_r,\, r \in R}{\text{minimize}} \quad \sum_{r,k,i} \int_0^L \ell^r(a^r_{ik}(t), s^r_k(t))\, dt \tag{13}
$$
subject to (1).

II) MINLP-DVFS: