I. Introduction
Computing devices, such as server farms, data centers, portable devices and desktops, will consume more than 14% of global electricity by 2020 [1]. As processor performance and speed increase, the main challenges in designing future high-performance computing systems are power consumption and heat dissipation. Moreover, these systems may need to operate under tight energy budgets while guaranteeing a quality of service.
As specified by the Advanced Configuration and Power Interface (ACPI) [2], an open industry standard for device configuration as well as power and thermal management, the power usage of a device can be controlled by various methods: for example, by controlling the time spent in idle power states, by changing the operating frequency in performance states, or by putting a CPU to sleep in throttling states when the CPU temperature is critically high.
Dynamic Voltage and Frequency Scaling (DVFS) techniques have been widely used as an energy management scheme in modern computing systems. Typically, a processor running at a higher clock frequency consumes more energy than one running at a lower clock frequency. Hence, DVFS techniques aim to reduce the power/energy consumption by dynamically adjusting the CPU operating frequency/voltage to match the workload. Since timeliness is an important aspect of real-time systems, the main consideration in applying DVFS is to ensure that deadline constraints are not violated.
Though a lot of work has been proposed to solve real-time scheduling problems, most of it is based on the assumption that the computational task parameters, e.g. the task's execution time, period and deadline, do not change. In other words, these are open-loop schedulers. Though an open-loop scheduler can provide good performance in a predictable environment, the performance can degrade in an unpredictable environment, where there are uncertainties in the task parameters. Specifically, the actual execution time of a task can vary by as much as 87% of the measured worst-case execution time [3]. Since task parameters are often based on the worst case, the system workload is overestimated, resulting in higher energy consumption due to non-optimal solutions. Therefore, in this work, we aim to apply feedback methods from control theory to address a scheduling problem subject to time-varying workload uncertainty.
Only a few works have adopted feedback methods from control theory to cope with a dynamic environment in real-time scheduling. For example, [4] proposed an energy-aware feedback scheduling architecture for soft real-time tasks on a uniprocessor, in which a proportional controller adjusts the workload utilization through a variable voltage optimization unit. (The utilization of a task is defined as the ratio between the task execution time and its deadline. In this work we use the term 'density' rather than utilization, since in the literature utilization often refers to the special case of a periodic taskset in which the task deadline equals its period.) Specifically, the controlled variable is the energy savings ratio and the manipulated variable is the worst-case utilization.
Similarly, [5] proposed a feedback method for estimating execution times to improve system performance, i.e. the number of tasks that meet their deadlines and the number of tasks admitted to the system. The estimated execution time is recalculated at each decision time interval based on the deadline miss and rejection ratios.
In [6], a feedback method was developed for a uniprocessor hard real-time scheduling problem with DVFS to cope with tasksets of varying execution times. In the same manner, the actual execution time of the task is fed back to a PID controller to adjust the estimated execution time of the task, as well as the execution frequency.
A two-level power optimization control scheme for multicore real-time systems was proposed in [7]. At the core level, the utilization of each CPU is monitored and a DVFS scheme is applied in response to uncertainties in task execution times in order to maintain a desired utilization. To further reduce power consumption, task reassignment and idle-core shutdown schemes are employed at the processor level.
All of the work in this area treats feedback in real-time scheduling purely as a regulation problem. In contrast, our work considers real-time multiprocessor scheduling as a constrained optimal control problem [8], which can be combined with a feedback scheme to handle uncertainties in an unpredictable scheduling environment, as is done in model predictive control [9]. In the real-time scheduling literature, our proposed scheme would be classified as a slack reclamation scheme, in which the slack time due to early completion of a task is exploited to reduce energy consumption by decreasing the operating speed of the remaining tasks in the system [10, 11].
The main contributions of this paper are:

- A feedback and optimal control framework for solving a real-time scheduling problem with uncertainty in task execution times on a homogeneous multiprocessor system with DVFS capabilities.

- A convex optimization formulation of the workload partitioning problem.

- The first energy-optimal scheduling algorithm for multiprocessor systems with aperiodic tasksets.

- Though we introduce the problem for systems with discrete frequency levels, the framework can be applied to continuous-frequency multiprocessor systems by simply replacing the workload partitioning algorithm with the nonlinear programming formulation proposed in [8].
Details of the system model are given in Section II. The feedback scheduling framework is presented in Section III: Section III-A describes scheduling as an optimal control problem, Section III-B presents an LP formulation to solve the problem, and the overall feedback scheduling architecture is provided in Section III-C. Simulation results demonstrating the performance of our feedback algorithm are given in Section IV. Lastly, we summarise the results and discuss future work in Section V.
II. Task and Processor Models
A task is assumed to be aperiodic and defined as a triple $\tau_i = (a_i, c_i, D_i)$, where $a_i$ is the task arrival time, $c_i$ is the estimated number of CPU cycles to complete the task and $D_i$ is the task relative deadline, i.e. a task arriving at time $a_i$ has a deadline at time $a_i + D_i$. The estimated minimum execution time $\hat{C}_i := c_i/f_{\max}$ is the estimated execution time of the task when executed at the maximum clock frequency $f_{\max}$. The minimum task density $\delta_i := \hat{C}_i/D_i$ is defined as the ratio between the task minimum execution time and its deadline. The actual minimum execution time $C_i := k_i \hat{C}_i$ of the task is the actual execution time when the task is executed at clock frequency $f_{\max}$, where $k_i$ is the estimation factor. Note that the actual execution time of a task is not known until the task has finished. We assume that tasks can be preempted at any time, i.e. the execution of a task on a processor can be suspended in order to start executing another task. Moreover, task migration is allowed, i.e. execution can be suspended on one processor and continued on another. Preemption and migration incur no delay, since we assume any such delay is either negligible or included in the estimated task execution times. Lastly, it is also assumed that tasks have no resource or precedence constraints, i.e. a task is ready to start upon its arrival.
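As a concrete illustration of the task model, the following minimal Python sketch computes the derived quantities; the class and field names are our own, not the paper's.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """Aperiodic task (a, c, D): arrival time, estimated CPU cycles, relative deadline.

    Illustrative sketch of the task model in Section II; names are assumptions."""
    arrival: float      # a: arrival time (s)
    cycles: float       # c: estimated CPU cycles to complete the task
    deadline: float     # D: relative deadline (s); absolute deadline is a + D

    def est_min_exec_time(self, f_max: float) -> float:
        # Estimated minimum execution time: cycles executed at the maximum frequency.
        return self.cycles / f_max

    def min_density(self, f_max: float) -> float:
        # Ratio between the minimum execution time and the relative deadline.
        return self.est_min_exec_time(f_max) / self.deadline

    def actual_min_exec_time(self, f_max: float, k: float) -> float:
        # Actual execution time = estimation factor k times the estimate;
        # in the model this is unknown until the task finishes.
        return k * self.est_min_exec_time(f_max)
```

For example, a task of 10^9 cycles on a 1 GHz processor with a 5 s deadline has a minimum execution time of 1 s and a minimum density of 0.2.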
For this work, we assume a practical processor model, i.e. a processor has a finite set of operating frequency levels. Additionally, the processors are homogeneous, that is, having the same set of operating frequencies and power consumptions. The processor voltage/frequency can be adjusted individually using a DVFS technique.
The energy consumed during the time interval $[t_0, t_f]$ is

  $E = \int_{t_0}^{t_f} p(s(t))\,\mathrm{d}t$,   (1)

where $p(s)$ is the instantaneous power consumption of executing a task at an execution speed $s$, defined as the ratio between the operating frequency $f$ and the maximum frequency $f_{\max}$, i.e. $s := f/f_{\max}$. The energy consumed by executing and completing task $i$ at a constant speed is the sum of the energy in the active and idle modes, hence $E_i = E^{a}_i + E^{idle}_i$, where the active energy $E^{a}_i$ is determined by the active power $p_a(s)$ and the idle energy $E^{idle}_i$ by the idle power $p_{idle}$. Note that $p_{idle}$ is not a function of speed, hence the idle term can be omitted when minimizing energy.
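As a sanity check on why DVFS saves energy under this model, here is a hedged sketch using the XScale figures from Table II and an assumed 1-second scheduling window; the function name and the exact accounting of idle time are our own simplifications, not the paper's.

```python
def task_energy(c_min, s, p_active, p_idle, horizon):
    """Energy (J) to finish a task with minimum execution time c_min (at s = 1)
    at constant speed s within a window of length `horizon` (s).
    A simplified reading of the model in (1); names are illustrative."""
    exec_time = c_min / s                  # running slower stretches execution
    assert exec_time <= horizon, "speed too low to finish within the window"
    return p_active[s] * exec_time + p_idle * (horizon - exec_time)

# XScale levels from Table II, converted to watts: speed -> active power
xscale = {0.15: 0.08, 0.4: 0.17, 0.6: 0.4, 0.8: 0.9, 1.0: 1.6}
e_fast = task_energy(0.6, 1.0, xscale, 0.04, 1.0)   # race at full speed, then idle
e_slow = task_energy(0.6, 0.6, xscale, 0.04, 1.0)   # just-in-time at speed 0.6
```

Under these numbers, running just-in-time at speed 0.6 (0.4 J) uses well under half the energy of racing at full speed and idling (0.976 J).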
III. Feedback Scheduling
III-A Continuous-time Optimal Control Problem
This section recalls an optimal control formulation of a multiprocessor scheduling problem whose objective is to minimize the total energy consumption [8]. The problem statement is: Given $m$ homogeneous processors and $n$ real-time tasks, determine a schedule for all tasks within a time interval $[t_0, t_f]$ that solves the following infinite-dimensional continuous-time optimal control problem:
  $\min_{u}\ \int_{t_0}^{t_f} \sum_{j=1}^{m} \sum_{i=1}^{n} \sum_{l=1}^{L} p_a(s_l)\, u_{ijl}(t)\,\mathrm{d}t$   (2a)
subject to
  $x_i(a_i) = \hat{C}_i, \quad \forall i$   (2b)
  $x_i(a_i + D_i) = 0, \quad \forall i$   (2c)
  $\dot{x}_i(t) = -\sum_{j=1}^{m} \sum_{l=1}^{L} s_l\, u_{ijl}(t), \quad \forall i$   (2d)
  $\sum_{j=1}^{m} \sum_{l=1}^{L} u_{ijl}(t) \le 1, \quad \forall i,\ \forall t$   (2e)
  $\sum_{i=1}^{n} \sum_{l=1}^{L} u_{ijl}(t) \le 1, \quad \forall j,\ \forall t$   (2f)
  $u_{ijl}(t) \in \{0, 1\}, \quad \forall i, j, l,\ \forall t$   (2g)
where $x_i(t)$ is the remaining estimated minimum execution time of task $i$, $u_{ijl}(t) = 1$ denotes that processor $j$ executes task $i$ at speed level $l$ at time $t$, $s_l$ is the corresponding speed and $L$ is the total number of non-idle speed levels of a processor.
The objective is to minimize energy consumption. The estimated execution time and deadline constraints are specified in (2b) and (2c), respectively. The scheduling dynamic (2d) is represented by a flow model (an integrator) with state $x_i$ and control input $u_{ijl}$. Constraints (2e) and (2f) ensure that at all times a task is assigned to at most one non-idle processor and that a processor is assigned at most one task, respectively. Constraint (2g) indicates that the assignment variables are binary.
III-B Discrete-time Optimal Control Problem as an LP
It was shown in [8] that for a practical system, where each processor has a discrete set of operating frequencies, problem (2) can be simplified into two steps: (i) solve a workload partitioning problem using a linear programming (LP) formulation and (ii) given a solution to the workload partitioning problem, solve a task ordering problem using McNaughton's wrap-around algorithm [12].
III-B1 Workload Partitioning
By relaxing constraint (2g) so that the value of the assignment variable is interpreted as a fraction of the task execution time during each discretization time interval, the workload partitioning problem can be formulated as a finite-dimensional LP (denoted LP-DVFS). For this purpose, let $q^{k}_{il}$ denote the fraction of the interval $[t_k, t_{k+1})$ during which task $i$ is to be executed at speed level $l$.
Let $\mathcal{T}$ denote the taskset composed of all active tasks within $[t_0, t_f]$. Let $\{t_0, t_1, \ldots, t_N\}$ be the set of times corresponding to the distinct task arrival times and deadlines within the time interval $[t_0, t_f]$, where $t_0 < t_1 < \cdots < t_N = t_f$. Let $T_k := t_{k+1} - t_k$ and define a task arrival time mapping $\alpha$ such that $t_{\alpha(i)} = a_i$ and a task deadline mapping $\beta$ such that $t_{\beta(i)} = a_i + D_i$.
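Constructing the interval boundaries from arrivals and deadlines can be sketched as follows; `event_times` is a hypothetical helper and task tuples follow the (arrival, execution time, relative deadline) convention of Section II.

```python
def event_times(tasks, t0, tf):
    """Sorted distinct arrival times and absolute deadlines within [t0, tf]:
    the boundaries t_0 < t_1 < ... < t_N of the discretization intervals.
    tasks: iterable of (arrival, est_exec_time, relative_deadline) tuples."""
    pts = {t0, tf}
    for a, _, d in tasks:
        if t0 <= a <= tf:
            pts.add(a)          # arrival starts a new interval
        if t0 <= a + d <= tf:
            pts.add(a + d)      # absolute deadline ends one
    return sorted(pts)
```

For two tasks arriving at 0 and 2 with absolute deadlines 5 and 8, the horizon [0, 10] is split into the intervals bounded by 0, 2, 5, 8 and 10.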
The workload partitioning problem statement is: Given $m$ homogeneous processors and a taskset with $n$ tasks, determine the fractions of the task execution times within each time interval that solve the following discrete-time optimal control problem:
  $\min_{q}\ \sum_{k=0}^{N-1} \sum_{i=1}^{n} \sum_{l=1}^{L} p_a(s_l)\, q^{k}_{il}\, T_k$   (3a)
subject to
  $x^{\alpha(i)}_i = \hat{C}_i, \quad \forall i$   (3b)
  $x^{\beta(i)}_i = 0, \quad \forall i$   (3c)
  $x^{k+1}_i = x^{k}_i - \sum_{l=1}^{L} s_l\, q^{k}_{il}\, T_k, \quad \forall i, k$   (3d)
  $\sum_{l=1}^{L} q^{k}_{il} \le 1, \quad \forall i, k$   (3e)
  $\sum_{i=1}^{n} \sum_{l=1}^{L} q^{k}_{il} \le m, \quad \forall k$   (3f)
  $0 \le q^{k}_{il} \le 1, \quad \forall i, l, k$   (3g)
where the state $x^{k}_i$ is the estimated minimum remaining execution time of task $i$ at time $t_k$ and $q^{k}_{il}$ can be interpreted as the value of a control input at time instant $t_k$.
The constraints on the dynamics (3b)–(3d) correspond to (2b)–(2d). Constraint (3e) ensures that a task will not be assigned to more than one processor at a time. Constraint (3f) guarantees that the total workload during each time interval does not exceed the system capacity. Lastly, (3g) provides the appropriate lower and upper bounds on $q^{k}_{il}$.
The mappings $\alpha$ and $\beta$ are defined on finite sets, hence it follows that (3) is equivalent to a finite-dimensional LP with a tractable number of decision variables and constraints. Note that many components of the solution are always zero and that the LP is highly structured with sparse matrices and vectors. These facts can be exploited to develop efficient tailor-made solvers, as in the literature on model predictive control [9]. Note also that $q^{k}_{il}$ does not have a subscript indicating processor assignment; the assignment to processors is done during task ordering.
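To make the structure of the LP concrete, here is a minimal sketch for a toy instance (two tasks, two speed levels, assumed power figures) using SciPy's `linprog`; the instance data, the variable flattening and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance: 2 non-idle speed levels with assumed active powers (W), 2 processors.
speeds = np.array([0.6, 1.0])          # s_l = f_l / f_max
p_active = np.array([0.4, 1.6])        # active power at each level, assumed
m = 2                                  # number of processors
t = np.array([0.0, 5.0, 10.0])         # event times -> intervals [0,5) and [5,10)
T = np.diff(t)                         # interval lengths T_k
# Task i: (set of interval indices it is active in, required min execution time C_i)
tasks = [({0}, 2.0), ({0, 1}, 6.0)]

L, K, n = len(speeds), len(T), len(tasks)
nvar = n * L * K
idx = lambda i, l, k: (i * L + l) * K + k      # flatten q[i, l, k]

c = np.zeros(nvar)                              # objective: sum p_a(s_l) q T_k
A_eq, b_eq, A_ub, b_ub = [], [], [], []
for i, (active, C) in enumerate(tasks):
    row = np.zeros(nvar)                        # workload equality: work done == C_i
    for l in range(L):
        for k in range(K):
            c[idx(i, l, k)] = p_active[l] * T[k]
            if k in active:
                row[idx(i, l, k)] = speeds[l] * T[k]
    A_eq.append(row)
    b_eq.append(C)
    for k in range(K):                          # one processor per task per interval
        r = np.zeros(nvar)
        for l in range(L):
            r[idx(i, l, k)] = 1.0
        A_ub.append(r)
        b_ub.append(1.0)
for k in range(K):                              # total workload within system capacity
    r = np.zeros(nvar)
    for i in range(n):
        for l in range(L):
            r[idx(i, l, k)] = 1.0
    A_ub.append(r)
    b_ub.append(float(m))

res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
              A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, 1))
```

With these numbers the lower level is cheaper per unit of work (0.4/0.6 vs 1.6/1.0 W per unit speed) and the capacity constraints are loose, so the optimum executes everything at speed 0.6.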
III-B2 Task Ordering
Given a solution to (3), we can find an execution order for all tasks within each time interval such that no task is executed on more than one non-idle processor at each time instant. This can be done using McNaughton's wrap-around algorithm [12], which is detailed in Algorithm 1 for the problem considered here. (Note that this version of McNaughton's algorithm is chosen to simplify the presentation; there could be better ways to order tasks and modes to minimise preemptions, migrations, etc.)
The algorithm proceeds as follows for a given interval $[t_k, t_{k+1})$. The fractions $q^{k}_{il}$ are aligned in order by task, with modes grouped together by task, along the real number line starting at zero. The line is split at each natural number 1, 2, etc., with each chunk assigned to one processor. Tasks that have been split (called migrating tasks) are assigned to two different processors at non-overlapping time intervals. The algorithm returns the start and end times of the tasks on the processors during an interval: processor $j$ starts to work on task $i$ at mode $l$ at the returned start time and finishes at the returned end time.
Consider a taskset composed of four tasks to be scheduled on two homogeneous processors with two non-idle modes. Suppose the execution fractions in a time interval are as shown in Table I.
TABLE I: Execution fractions of the example taskset

        Task 1  Task 2  Task 3  Task 4
Mode 1  0.1     0.2     0.2     0.4
Mode 2  0       0.5     0.4     0
Figure 3 illustrates a feasible schedule of the taskset using McNaughton’s wrap around algorithm.
We are now in a position to state the following.
Theorem 1: Problems (2) and (3) are equivalent, i.e. they have the same optimal cost, and a solution to one can be used to construct a solution to the other.

Proof: Given a solution to (2), choose $q$ such that

  $q^{k}_{il}\, T_k = \int_{t_k}^{t_{k+1}} \sum_{j=1}^{m} u_{ijl}(t)\,\mathrm{d}t$.   (4)

This ensures that (3b)–(3d) are satisfied with $x^{k}_i = x_i(t_k)$, $\forall i, k$. It follows from (2e) and (2f) that (3e) and (3f) are satisfied, respectively. One can similarly verify that (3g) holds.

Conversely, given a solution to (3) and the output from Algorithm 1 for all intervals, it follows from the properties of McNaughton's algorithm [12] that only one task is assigned to a processor at a time if $u$ is chosen to be piecewise constant such that $u_{ijl}(t) = 1$ when processor $j$ executes task $i$ at level $l$ in the returned schedule and $u_{ijl}(t) = 0$ otherwise. After verifying that (4) holds, one can show that (2b)–(2g) are satisfied.

The result follows by noting that the costs of the two problems are equal with the above choices.
III-C Feedback Scheduler
As can be seen, our open-loop optimal control problem is based on the estimated minimum execution time $\hat{C}_i$. A task will often finish earlier than expected, i.e. the actual minimum execution time $C_i$ is often less than $\hat{C}_i$. Consider Figure 4, which illustrates a fluid path of executing a task.
Due to uncertainty in task execution times, our open-loop algorithm follows a different path from the one that we really want to follow, i.e. the dotted line. In other words, the open-loop algorithm can produce a solution that overestimates the system workload, leading to higher energy consumption because the system operates at an unnecessarily high speed. It is therefore better to feed back information whenever (i) a task finishes or (ii) a new task arrives, in order to recalculate a new control action in response to the changing workload.
TABLE II: Processor details

Processor type      XScale [13]                       PowerPC 405LP [14]
Frequency (MHz)     150   400   600   800   1000      33    100   266   333
Speed               0.15  0.4   0.6   0.8   1.0       0.1   0.3   0.8   1.0
Voltage (V)         0.75  1.0   1.3   1.6   1.8       1.0   1.0   1.8   1.9
Active Power (mW)   80    170   400   900   1600      19    72    600   750
Idle Power (mW)     40 [15]                           12
The overall architecture of our feedback scheduling system is given in Figure 5, where the scheduler is called at two scheduling events.
One event occurs when a task finishes its required execution workload/cycles on one of the processors, the other when a new task arrives. The scheduler is composed of two subunits: a workload partitioning unit and a task ordering unit. By solving (3), the workload partitioning unit provides the control input to the task ordering unit, which then uses McNaughton's wrap-around algorithm to produce a valid schedule for the execution unit.
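The event-driven closed loop can be sketched as follows; `partition_and_order` stands in for the workload partitioning (LP (3)) and task ordering units, and the whole class is an illustrative skeleton of our own, not the paper's implementation.

```python
class FeedbackScheduler:
    """Closed-loop scheduler sketch: the plan is recomputed from the measured
    state at every scheduling event, as in model predictive control.

    partition_and_order: callable (remaining_times, now) -> schedule, standing
    in for the LP (3) + McNaughton units. All names here are assumptions."""

    def __init__(self, partition_and_order):
        self.remaining = {}          # task id -> remaining estimated execution time
        self.plan = None
        self.replan = partition_and_order

    def on_arrival(self, tid, est_exec_time, now):
        # Event (ii): a new task arrives; re-solve for the enlarged taskset.
        self.remaining[tid] = est_exec_time
        self.plan = self.replan(dict(self.remaining), now)

    def on_finish(self, tid, now):
        # Event (i): the task completed, possibly earlier than estimated;
        # reclaim its leftover workload by replanning for the remaining tasks.
        self.remaining.pop(tid, None)
        self.plan = self.replan(dict(self.remaining), now)
```

The key point is that `remaining` reflects measured completions rather than the original estimates, so each re-solved plan reclaims the slack of tasks that finished early.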
TABLE III: Tasksets used in the simulation

Density  Task 1   Task 2      Task 3
0.50     (0,1,5)  (0,2,10)    (0,1.5,15)
0.75     (0,1,5)  (0,3.5,10)  (0,3,15)
1.00     (0,2,5)  (0,4,10)    (0,3,15)
1.25     (0,1,5)  (0,6.5,10)  (0,6,15)
1.50     (0,2,5)  (0,7,10)    (0,6,15)
1.75     (0,3,5)  (0,7.5,10)  (0,6,15)
2.00     (0,4,5)  (0,6,10)    (0,9,15)

Note: The second parameter of a task is the estimated minimum execution time $\hat{C}_i$; the cycle count $c_i$ can be obtained by multiplying by $f_{\max}$.
IV. Simulation and Results
To evaluate the performance of our feedback scheme, we consider a set of aperiodic tasks to be scheduled on two commercial processors, namely a PowerPC 405LP and an XScale. The details of the two processors are given in Table II. Two homogeneous systems, each composed of two processors of the same type, were chosen. The energy consumed by executing each taskset listed in Table III was evaluated.
The minimum taskset density, a measure of the utilization of computing resources in a given time interval, is defined as the sum of the minimum task densities of all tasks in the system. The LP (3) was modelled using OPTI TOOLBOX [16] and solved with SoPlex [17].
For this simulation, we only consider the scheduling event when a task finishes. Three algorithms are compared: (i) Feedback LP-DVFS, which is our LP-DVFS plus McNaughton's wrap-around algorithm proposed in Section III-C; (ii) Open-Loop LP-DVFS, which is our LP-DVFS without feedback information on finishing tasks; and (iii) No mismatch/Ideal, which is our LP-DVFS with the actual minimum task execution times equal to the estimated ones, i.e. $k_i = 1$.
Figure 8 shows the results from executing the tasksets in Table III on the two homogeneous multiprocessor systems, each composed of two processors of one type, with a fixed estimation factor.
The vertical axis is the total energy consumption normalised by that of the Open-Loop LP-DVFS algorithm. For the system composed of PowerPCs, the feedback scheme can save up to about 40% energy compared to the open-loop scheme. However, for the system with XScale processors, the feedback scheme starts to perform better than the open-loop scheme only when the density is more than 1. Moreover, the percentage saving of the XScale system is less than that of the PowerPC system. This is due to the difference in the distribution of speed levels of the two processor types: the XScale processor has more evenly distributed speed levels than the PowerPC, so the optimizer can select an operating speed level closer to the optimal continuous speed value.
The results from varying the estimation factor of a taskset with fixed density are shown in Figure 11.
Note that, for this simulation, the estimation factors of all tasks are identical. For the PowerPC system, the energy saving is high when the estimation factor is low. In addition, the difference between the energy consumed by the feedback strategy and the ideal decreases as the estimation factor increases. On the other hand, for the XScale system, the maximum energy saving does not occur at the lowest estimation factor, but at an intermediate value. Furthermore, the energy consumption difference between the feedback and the optimal/ideal is larger than that of the PowerPC system. Note that the energy saving varies with the taskset, the solutions from different LP solvers, and the task execution order. In particular, since the solutions are not unique, the choice of task execution order has an effect on the total energy consumption.
V. Conclusions and Future Work
A feedback method was adopted to solve a multiprocessor scheduling problem with uncertainty in task execution times. We have shown that our proposed closed-loop optimal control scheduling algorithm outperforms the open-loop algorithm in terms of energy efficiency. Simulation results suggest that the difference between closed-loop and open-loop performance can be reduced by having a more refined distribution of operating speed levels.
The work presented here can be extended in a number of ways. For periodic tasks, an estimator could be incorporated to obtain better performance. For further energy savings, a dynamic power management (DPM) scheme, which determines when and for how long the processor should be in the active or idle state, could also be integrated into the scheme.
Finally, note that there are many links here to model predictive control [9] and it would therefore be of interest to investigate how methods developed in that community could be applied to the scheduling problem defined here. For example, one could extend the work to the problem of optimizing over feedback policies, rather than open-loop input sequences, as was done here. Efficient numerical methods, including distributed cooperative schemes, could also be developed to solve the LP (3) in real time.
References
 [1] W. Vereecken, W. Van Heddeghem, D. Colle, M. Pickavet, and P. Demeester, “Overall ICT footprint and green communication technologies,” in Communications, Control and Signal Processing (ISCCSP), 2010 4th International Symposium on, March 2010, pp. 1–6.
 [2] Hewlett-Packard, Intel, Microsoft, P. T. Ltd., and Toshiba, “Advanced Configuration and Power Interface Specification (ACPI),” http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf, 2010.
 [3] J. Wegener and F. Mueller, “A comparison of static analysis and evolutionary testing for the verification of timing constraints,” Real-Time Systems, vol. 21, no. 3, pp. 241–268, 2001.
 [4] A. Soria-Lopez, P. Mejia-Alvarez, and J. Cornejo, “Feedback scheduling of power-aware soft real-time tasks,” in Computer Science, 2005. ENC 2005. 6th Mexican International Conference on, Sept 2005, pp. 266–273.
 [5] D. R. Sahoo, S. Swaminathan, R. Aomari, M. V. Salapaka, G. Manimaran, and A. K. Somani, “Feedback control for real-time scheduling,” in Proc. American Control Conference, 2002, pp. 1254–1259.
 [6] Y. Zhu and F. Mueller, “Feedback EDF scheduling of real-time tasks exploiting dynamic voltage scaling,” Real-Time Systems, vol. 31, no. 1–3, pp. 33–63, 2005.
 [7] X. Fu and X. Wang, “Utilization-controlled task consolidation for power optimization in multi-core real-time systems,” in Embedded and Real-Time Computing Systems and Applications (RTCSA), 2011 IEEE 17th International Conference on, vol. 1, Aug 2011, pp. 73–82.
 [8] M. Thammawichai and E. C. Kerrigan, “Energy-efficient scheduling for homogeneous multiprocessor systems,” arXiv:1510.05567v2 [cs.OS], 2015.
 [9] D. Q. Mayne, “Model predictive control: Recent developments and future promise,” Automatica, vol. 50, no. 12, pp. 2967–2986, 2014.
 [10] D. Zhu, R. Melhem, and B. Childers, “Scheduling with dynamic voltage/speed adjustment using slack reclamation in multiprocessor real-time systems,” in Real-Time Systems Symposium, 2001. (RTSS 2001). Proceedings. 22nd IEEE, Dec 2001, pp. 84–94.
 [11] J.-J. Chen, C.-Y. Yang, and T.-W. Kuo, “Slack reclamation for real-time task scheduling over dynamic voltage scaling multiprocessors,” in Sensor Networks, Ubiquitous, and Trustworthy Computing, 2006. IEEE International Conference on, vol. 1, June 2006, 8 pp.

 [12] R. McNaughton, “Scheduling with deadlines and loss functions,” Management Science, vol. 6, no. 1, pp. 1–12, October 1959.
 [13] Intel XScale Microarchitecture: Benchmarks, 2005, http://web.archive.org/web/20050326232506/developer.intel.com/design/intelxscale/benchmarks.htm.
 [14] C. Rusu, R. Xu, R. Melhem, and D. Mossé, “Energy-efficient policies for request-driven soft real-time systems,” in Real-Time Systems, 2004. ECRTS 2004. Proceedings. 16th Euromicro Conference on, June 2004, pp. 175–183.
 [15] R. Xu, C. Xi, R. Melhem, and D. Mossé, “Practical PACE for embedded systems,” in Proceedings of the 4th ACM International Conference on Embedded Software, ser. EMSOFT ’04. New York, NY, USA: ACM, 2004, pp. 54–63.
 [16] J. Currie and D. I. Wilson, “OPTI: Lowering the Barrier Between Open Source Optimizers and the Industrial MATLAB User,” in Foundations of Computer-Aided Process Operations, N. Sahinidis and J. Pinto, Eds., Savannah, Georgia, USA, 8–11 January 2012.
 [17] R. Wunderling, “Paralleler und objektorientierter SimplexAlgorithmus,” Ph.D. dissertation, Technische Universität Berlin, 1996, http://www.zib.de/Publications/abstracts/TR9609/.