Feedback Scheduling for Energy-Efficient Real-Time Homogeneous Multiprocessor Systems

06/08/2016, by Mason Thammawichai, et al., Imperial College London

Real-time scheduling algorithms proposed in the literature are often based on worst-case estimates of task parameters. The performance of an open-loop scheme can be degraded significantly if there are uncertainties in task parameters, such as the execution times of the tasks. Therefore, to cope with such a situation, a closed-loop scheme, where feedback is exploited to adjust the system parameters, can be applied. We propose an optimal control framework that takes advantage of feeding back information on finished tasks to solve a real-time multiprocessor scheduling problem with uncertainty in task execution times, with the objective of minimizing the total energy consumption. Specifically, we propose a linear programming based algorithm to solve a workload partitioning problem and adopt McNaughton's wrap around algorithm to find the task execution order. The simulation results illustrate that our feedback scheduling algorithm can save energy by as much as 40% compared to an open-loop method for two processor models, i.e. a PowerPC 405LP and an XScale processor.

I Introduction

Computing devices, such as server farms, data centers, portable devices and desktops, will consume more than 14% of global electricity by 2020 [1]. As the performance and speed of processors increase, the key challenges in designing future high-performance computing systems are processor power consumption and heat dissipation. Moreover, these systems may need to operate under tight energy requirements while guaranteeing a quality of service.

As specified by the Advanced Configuration and Power Interface (ACPI) [2], an open industry standard for device configuration as well as power and thermal management, the power usage of a device can be controlled by various methods: for example, by controlling the time spent in idling power states, by changing the operating frequency in the performance states, or by putting a CPU to sleep in throttling states when its temperature is critically high.

Dynamic Voltage and Frequency Scaling (DVFS) techniques have been widely used as an energy management scheme in modern computing systems. Typically, a processor running at a higher clock frequency consumes more energy than a processor running at a lower clock frequency. Hence, DVFS techniques aim to reduce the power/energy consumption by dynamically controlling the CPU operating frequency/voltage to match the workload. Since timeliness is an important aspect for real-time systems, the main consideration in applying DVFS is to ensure that deadline constraints are not violated.

Although much work has been proposed to solve real-time scheduling problems, most of it is based on the assumption that the computational task parameters, e.g. the task’s execution time, period and deadline, do not change. In other words, these schedulers are open-loop controllers. Though an open-loop scheduler can provide good performance in a predictable environment, the performance can be degraded in an unpredictable environment, where there are uncertainties in task parameters. Specifically, the actual execution time of a task can vary by as much as 87% of the measured worst-case execution time [3]. Since the task parameters are often based on the worst case, the system workload is overestimated, resulting in higher energy consumption due to non-optimal solutions. Therefore, in this work, we aim to apply feedback methods from control theory to address a scheduling problem subject to time-varying workload uncertainty.

Only a few works have adopted feedback methods from control theory to cope with a dynamic environment in real-time scheduling. For example, [4] proposed an energy-aware feedback scheduling architecture for soft real-time tasks on a uniprocessor. A proportional controller adjusts the workload utilization through a variable voltage optimization unit. (The utilization of a task is defined as the ratio between the task execution time and its deadline. In this work we use the term ‘density’ rather than utilization, since in the literature utilization often refers to the special case of a periodic taskset in which the task deadline equals its period.) Specifically, the controlled variable is the energy savings ratio and the manipulated variable is the worst-case utilization.

Similarly, [5] proposed a feedback method for estimating execution times to improve the system performance, i.e. the number of tasks that meet deadlines and the number of tasks that are admitted to the system. That is, the estimated execution time is calculated at each decision time interval based on the deadline miss and rejection ratios.

In [6], a feedback method was developed for a uniprocessor hard real-time scheduling problem with DVFS to cope with varying execution time tasksets. In the same manner, the actual execution time of the task is fed back to a PID controller to adjust the estimated execution time of the task, as well as the execution frequency.

A two-level power optimization scheme for multi-core real-time systems was proposed in [7]. At the core level, the utilization of each CPU is monitored and a DVFS scheme is applied in response to uncertainties in task execution times in order to obtain a desired utilization. To further reduce power consumption, task reassignment and idle core shutdown schemes are employed at the processor level.

All of the works in this area treat feedback in real-time scheduling purely as a regulation problem. In contrast, our work considers real-time multiprocessor scheduling as a constrained optimal control problem [8], which can be combined with a feedback scheme to handle uncertainties in an unpredictable scheduling environment, as is done in model predictive control [9]. In the real-time scheduling literature, our proposed scheme would be known as a slack reclamation scheme, in which the slack time due to the early completion of a task is exploited to reduce energy consumption by decreasing the operating speed of the remaining tasks in the system [10, 11].

The main contributions of this paper are:

  • A feedback and optimal control framework is proposed to solve a real-time scheduling problem with uncertainty in task execution times on a homogeneous multiprocessor system with DVFS capabilities.

  • A convex optimization formulation is proposed to solve a workload partitioning problem.

  • The first energy-optimal scheduling algorithm for multiprocessor scheduling of aperiodic tasksets.

  • Though we introduce the problem with discrete frequency level systems, the framework can be applied to continuous frequency multiprocessor systems by simply replacing the workload partitioning algorithm by the nonlinear programming formulation proposed in [8].

Details of the system model are given in Section II. The feedback scheduling framework is presented in Section III. That is, Section III-A describes scheduling as an optimal control problem, Section III-B presents an LP formulation to solve the problem and the overall feedback scheduling architecture is provided in Section III-C. Simulation results to demonstrate the performance of our feedback algorithm are given in Section IV. Lastly, we summarise the results and discuss future work in Section V.

II Task and Processor Models

A task is assumed to be aperiodic and is defined by a triple consisting of its arrival time, the estimated number of CPU cycles required to complete it and its relative deadline, i.e. a task arriving at a given time must complete by that time plus its relative deadline. The estimated minimum execution time of a task is its estimated execution time when executed at the maximum clock frequency, i.e. the estimated number of cycles divided by the maximum frequency. The minimum task density is defined as the ratio between the task's minimum execution time and its deadline. The actual minimum execution time of a task is its actual execution time when executed at the maximum clock frequency, i.e. the estimated minimum execution time scaled by an estimation factor. Note that the actual execution time of a task is not known until the task has finished. We will assume that tasks can be preempted at any time, i.e. the execution of a task on a processor can be suspended in order to start executing another task. Moreover, task migration is allowed, i.e. execution may be suspended on one processor and continued on another. There is no delay associated with task preemption or migration, since we assume that any such delay is either included in the estimated task execution times or negligible. Lastly, it is assumed that tasks do not have any resource or precedence constraints, i.e. a task is ready to start upon its arrival.
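For concreteness, the task model above can be summarised in a few lines of Python. The attribute names, the illustrative maximum frequency and the relation "actual = estimation factor × estimated" are our own reading of the text, not notation from the paper:

from dataclasses import dataclass

F_MAX = 1.0e9  # maximum clock frequency in Hz (illustrative value, an assumption)

@dataclass
class Task:
    arrival: float            # arrival time
    cycles: float             # estimated number of CPU cycles to complete the task
    rel_deadline: float       # relative deadline; absolute deadline = arrival + rel_deadline
    est_factor: float = 1.0   # estimation factor in (0, 1]: actual work / estimated work

    @property
    def est_min_exec_time(self) -> float:
        """Estimated execution time at the maximum clock frequency."""
        return self.cycles / F_MAX

    @property
    def min_density(self) -> float:
        """Ratio of the minimum execution time to the relative deadline."""
        return self.est_min_exec_time / self.rel_deadline

    @property
    def actual_min_exec_time(self) -> float:
        """Actual execution time at the maximum frequency (unknown until the task finishes)."""
        return self.est_factor * self.est_min_exec_time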

For this work, we assume a practical processor model, i.e. a processor has a finite set of operating frequency levels. Additionally, the processors are homogeneous, that is, they have the same set of operating frequencies and power consumptions. The processor voltage/frequency can be adjusted individually using a DVFS technique.

The energy consumed during the time interval is

(1)

where the instantaneous power consumption of executing a task depends on the execution speed, defined as the ratio of the operating frequency to the maximum frequency. The energy consumed by executing and completing a task at a constant speed is the sum of the energy in the active and idle modes, where the active power and the idle power are the power drawn while executing and while idle, respectively. Note that the idle term is not a function of the speed and can hence be omitted when minimizing energy.
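As an illustration (not a reproduction of equation (1)), the following snippet computes the active energy needed to complete one second of maximum-speed work at each constant XScale speed level, using the values from Table II later in the paper and ignoring the speed-independent idle term discussed above:

speeds = [0.15, 0.4, 0.6, 0.8, 1.0]          # s = f / f_max (XScale levels from Table II)
active_power_mw = [80, 170, 400, 900, 1600]   # active power (mW) at each level

for s, p in zip(speeds, active_power_mw):
    busy_time = 1.0 / s                       # time needed to finish the work at speed s
    print(f"speed {s:4.2f}: active energy = {p * busy_time:7.1f} mJ")

With these particular numbers the lowest speed is not the most energy-efficient level, which is one reason the scheduler optimizes over the available speed levels rather than always running as slowly as possible.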

III Feedback Scheduling

III-A Continuous-time Optimal Control Problem

This section recalls an optimal control formulation of a multiprocessor scheduling problem with the objective of minimizing the total energy consumption [8]. The problem statement is: given a set of homogeneous processors and a set of real-time tasks, determine a schedule for all tasks within a given time interval that solves the following infinite-dimensional continuous-time optimal control problem:

(2a)
subject to
(2b)
(2c)
(2d)
(2e)
(2f)
(2g)

where the state is the remaining estimated minimum execution time of each task, and the binary control input indicates whether a given processor executes a given task at a given non-idle speed level at a given time; each of the finitely many non-idle speed levels of a processor has a corresponding speed.

The objective is to minimize energy consumption. The estimated execution time and deadline constraints are specified in (2b) and (2c), respectively. The scheduling dynamics (2d) are represented by a flow model (an integrator) with the remaining execution time as the state and the processor assignment as the control input. Constraints (2e) and (2f) ensure that, at all times, each task is assigned to at most one non-idle processor and each processor executes at most one task, respectively. Constraint (2g) indicates that the assignment variables are binary.

III-B Discrete-time Optimal Control Problem as an LP

It was shown in [8] that, for a practical system where each processor has a discrete set of operating frequencies, problem (2) can be decomposed into two steps: (i) solving a workload partitioning problem using a linear programming (LP) formulation and (ii) given a solution to the workload partitioning problem, solving a task ordering problem using McNaughton’s wrap around algorithm [12].

III-B1 Workload Partitioning

By relaxing constraint (2g) so that each assignment variable is interpreted as the fraction of a discretization time interval during which the corresponding task is executed, the workload partitioning problem can be formulated as a finite-dimensional LP (denoted LP-DVFS). For this purpose, the decision variables are the fractions of each interval during which each task is to be executed at each speed level.

Consider the taskset composed of all active tasks within the scheduling horizon, and let the discretization grid consist of the distinct task arrival times and deadlines within that horizon, ordered increasingly. Task arrival time and deadline mappings then associate each task with the grid indices of its arrival time and deadline, respectively.

The workload partitioning problem statement is: given a set of homogeneous processors and a taskset, determine the fractions of the task execution times within each time interval that solve the following discrete-time optimal control problem:

(3a)
subject to
(3b)
(3c)
(3d)
(3e)
(3f)
(3g)

where the state is the estimated minimum execution time of each task and each decision variable can be interpreted as the value of a control input at a given time instant.

The constraints on the dynamics (3b)–(3d) correspond to (2b)–(2d). Constraint (3e) ensures that a task will not be assigned to more than one processor at a time. Constraint (3f) guarantees that the total workload during each time interval will not exceed the system capacity. Lastly, (3g) provides the appropriate lower and upper bounds on the decision variables.

Since the arrival time and deadline mappings are defined on finite sets, it follows that (3) is equivalent to a finite-dimensional LP with a tractable number of decision variables and constraints. Note that many of the components of the solution are always zero and that the LP is highly structured, with sparse matrices and vectors. These facts can be exploited to develop efficient tailor-made solvers, as in the literature on model predictive control [9].

Note that the decision variables do not carry a subscript indicating a processor assignment; tasks are assigned to processors during task ordering.
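To make the structure of (3) concrete, the following sketch sets up a small instance of the workload partitioning LP with cvxpy. The data, variable layout and exact constraint forms are our own reconstruction from the descriptions above (the precise equations are given in [8]), and cvxpy is used here purely for illustration, not the tooling used in the paper:

import cvxpy as cp
import numpy as np

M = 2                                  # number of processors
speeds = np.array([0.4, 1.0])          # non-idle speed levels s = f / f_max
power = np.array([170.0, 1600.0])      # active power (mW) at each level (XScale-like values)

# Tasks: (arrival time, absolute deadline, estimated minimum execution time)
tasks = [(0.0, 5.0, 1.0), (0.0, 10.0, 2.0), (0.0, 15.0, 1.5)]

# Discretization grid: the distinct arrival times and deadlines
grid = sorted({t for (a, d, _) in tasks for t in (a, d)})
dt = np.diff(grid)                     # interval lengths
K, N, J = len(dt), len(tasks), len(speeds)

x = cp.Variable((N, J * K), nonneg=True)   # fraction of interval k spent on task i at level j
frac = lambda i, j, k: x[i, j * K + k]

cons = [x <= 1]                            # (3g) upper bound; nonnegativity is already imposed
for i, (a, d, c) in enumerate(tasks):
    window = [k for k in range(K) if grid[k] >= a and grid[k + 1] <= d]
    # (3b)-(3d): enough max-speed-equivalent work is done between arrival and deadline
    cons.append(sum(speeds[j] * frac(i, j, k) * dt[k] for j in range(J) for k in window) >= c)
    # no execution outside the task's arrival-to-deadline window
    cons += [frac(i, j, k) == 0 for j in range(J) for k in range(K) if k not in window]
    # (3e): a task occupies at most one processor in each interval
    cons += [sum(frac(i, j, k) for j in range(J)) <= 1 for k in range(K)]
# (3f): the total workload per interval does not exceed the system capacity
cons += [sum(frac(i, j, k) for i in range(N) for j in range(J)) <= M for k in range(K)]

energy = sum(power[j] * frac(i, j, k) * dt[k]
             for i in range(N) for j in range(J) for k in range(K))
prob = cp.Problem(cp.Minimize(energy), cons)
prob.solve()
print("minimum active energy:", prob.value)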

III-B2 Task Ordering

Given a solution to (3), we can find an execution order for all tasks within each time interval such that no task is executed on more than one non-idle processor at each time instant. This can be done using McNaughton’s wrap around algorithm [12], which is detailed in Algorithm 1 for the problem considered here. (Note that this version of McNaughton’s algorithm is chosen to simplify the presentation in this paper; there could be better ways to order tasks and modes to minimise preemptions, migrations, etc.)

Algorithm 1: McNaughton’s wrap around algorithm [12]

The algorithm proceeds as follows for a given interval. The execution fractions are aligned along the real number line starting at zero, ordered by task with the modes of each task grouped together. The line is split at each natural number 1, 2, etc., and each unit-length chunk is assigned to one processor. Tasks that have been split (called migrating tasks) are assigned to two different processors at non-overlapping time intervals. The algorithm returns, for every processor, the start and end times of each task and mode assigned to it during the interval.

Consider a taskset composed of four tasks to be scheduled on two homogeneous processors with two non-idle modes, and suppose the execution fractions in a given time interval are as shown in Table I.

         Mode 1   Mode 2
Task 1   0.1      0.2
Task 2   0.2      0.4
Task 3   0        0.5
Task 4   0.4      0
TABLE I: Execution workload partition example

Figure 3 illustrates a feasible schedule of the taskset using McNaughton’s wrap around algorithm.

(a) Tasks are aligned along the real number line.
(b) Each chunk of length 1 is assigned to a processor.
Fig. 3: Feasible schedule for the time interval obtained by McNaughton’s wrap around algorithm, where the number in each box identifies the corresponding task.
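A minimal Python sketch of this wrap-around step is given below. The function name and the data layout (an ordered mapping of task-and-mode pairs to fractions) are our own assumptions, not the paper's; the final lines apply it to the fractions of Table I, where one task migrates between the two processors at non-overlapping times, as in Fig. 3.

def wrap_around(fractions, num_procs):
    """McNaughton-style wrap-around for a single time interval.

    fractions: ordered mapping (task, mode) -> fraction of the interval,
               with each task's total <= 1 and the overall total <= num_procs.
    Returns, per processor, a list of (task, mode, start, end) tuples with
    times normalised to the interval [0, 1].
    """
    schedule = [[] for _ in range(num_procs)]
    proc, t = 0, 0.0                           # current processor and its next free time
    for (task, mode), remaining in fractions.items():
        while remaining > 1e-12:
            piece = min(remaining, 1.0 - t)    # run as much as fits before the wrap point
            schedule[proc].append((task, mode, t, t + piece))
            t, remaining = t + piece, remaining - piece
            if t >= 1.0 - 1e-12:               # this processor is full: wrap to the next one
                proc, t = proc + 1, 0.0
    return schedule

table1 = {(1, 1): 0.1, (1, 2): 0.2, (2, 1): 0.2, (2, 2): 0.4,
          (3, 1): 0.0, (3, 2): 0.5, (4, 1): 0.4, (4, 2): 0.0}
print(wrap_around(table1, num_procs=2))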

We are now in a position to state the following.

Theorem 1

A solution to (2) can be used to construct a solution to (3). Furthermore, a solution to (2) can be constructed from a solution to (3) and the output from Algorithm 1.

Given a solution to (2), choose the execution fractions such that

(4)

This ensures that (3b)–(3d) are satisfied. It follows from (2e) and (2f) that (3e) and (3f) are satisfied, respectively. One can similarly verify that (3g) holds.

Conversely, given a solution to (3) and the output of Algorithm 1 for all intervals, it follows from the properties of McNaughton’s algorithm [12] that only one task is assigned to a processor at a time if the control input in (2) is chosen to be piecewise constant according to the start and end times returned by Algorithm 1. After verifying that (4) holds, one can show that (2b)–(2g) are satisfied.

The result follows by noting that the costs of the two problems are equal with the above choices.

III-C Feedback Scheduler

As can be seen, our open-loop optimal control problem is based on the estimated minimum execution times. A task will often finish earlier than expected, i.e. the actual minimum execution time is often less than the estimated one. Consider Figure 4, which illustrates a fluid path of executing a task.

Fig. 4: Fluid scheduling model with uncertainty in the task execution time

Our open-loop algorithm follows a different path from the one that we really want to follow, i.e. the dotted line, due to uncertainty in task execution times. In other words, the open-loop algorithm can produce a solution that overestimates the system workload, leading to higher energy consumption because the system operates at an unnecessarily high speed. Therefore, it is better to feed back information whenever (i) a task finishes or (ii) a new task arrives at the system, in order to recalculate a new control action in response to the changing workload.

Processor type       XScale [13]                      PowerPC 405LP [14]
Frequency (MHz)      150   400   600   800   1000     33    100   266   333
Speed                0.15  0.4   0.6   0.8   1.0      0.1   0.3   0.8   1.0
Voltage (V)          0.75  1.0   1.3   1.6   1.8      1.0   1.0   1.8   1.9
Active Power (mW)    80    170   400   900   1600     19    72    600   750
Idle Power (mW)      40 [15]                          12
TABLE II: Commercial processor details for simulation

The overall architecture of our feedback scheduling system is given in Figure 5, where the scheduler is called at two scheduling events.

Fig. 5: Feedback scheduling architecture

One event occurs when a task finishes its required workload/cycles on one of the processors, and the other when a new task arrives. The scheduler is composed of two sub-units, i.e. a workload partitioning unit and a task ordering unit. By solving (3), the workload partitioning unit provides the control inputs to the task ordering unit, which then uses McNaughton’s wrap around algorithm to produce a valid schedule for the execution unit.
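A minimal event-driven sketch of this architecture is shown below, assuming a workload-partitioning function and an ordering function in the spirit of the sketches above; the event structure, attribute names and callbacks are our own illustration, not the paper's implementation.

from dataclasses import dataclass, field
from queue import Queue

@dataclass
class SchedulingEvent:
    kind: str                                      # "arrival", "finish" or "shutdown"
    task: object = None                            # the arriving or finishing task
    executed: dict = field(default_factory=dict)   # work completed per task since the last event

def feedback_scheduler(events: Queue, tasks: list, num_procs: int, partition, order, dispatch):
    """Invoke the two-stage scheduler at every scheduling event (cf. Fig. 5).

    partition(tasks, num_procs) -> execution fractions, e.g. by solving LP (3);
    order(fractions, num_procs) -> a per-processor schedule (Algorithm 1);
    dispatch(schedule)          -> hands the new plan to the execution unit.
    """
    while True:
        ev = events.get()                          # block until a task finishes or arrives
        for t in tasks:                            # feed back the work actually completed so far
            t.remaining = max(t.remaining - ev.executed.get(t.name, 0.0), 0.0)
        if ev.kind == "arrival":
            tasks.append(ev.task)
        elif ev.kind == "finish":
            tasks.remove(ev.task)                  # early completion frees slack for the other tasks
        elif ev.kind == "shutdown":
            return
        dispatch(order(partition(tasks, num_procs), num_procs))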

Density   Task 1      Task 2        Task 3
0.50      (0, 1, 5)   (0, 2, 10)    (0, 1.5, 15)
0.75      (0, 1, 5)   (0, 3.5, 10)  (0, 3, 15)
1.00      (0, 2, 5)   (0, 4, 10)    (0, 3, 15)
1.25      (0, 1, 5)   (0, 6.5, 10)  (0, 6, 15)
1.50      (0, 2, 5)   (0, 7, 10)    (0, 6, 15)
1.75      (0, 3, 5)   (0, 7.5, 10)  (0, 6, 15)
2.00      (0, 4, 5)   (0, 6, 10)    (0, 9, 15)
Note: the second parameter of a task is its estimated minimum execution time;
the estimated number of CPU cycles can be obtained by multiplying it by the maximum frequency.
TABLE III: Simulation tasksets

IV Simulation and Results

To evaluate the performance of our feedback scheme, we consider sets of aperiodic tasks to be scheduled on two commercial processor models, namely the PowerPC 405LP and the XScale. The details of the two processors are given in Table II. Two homogeneous systems, each composed of two processors of the same type, were chosen. The energy consumed in executing each of the tasksets listed in Table III was evaluated.

The minimum taskset density, a measure of the utilization of computing resources in a given time interval, is defined as the sum of the minimum task densities of all tasks in the system. The LP (3) was modelled using OPTI TOOLBOX [16] and solved with SoPlex [17].
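For example, the taskset with density 1.00 in Table III consists of the tasks (0, 2, 5), (0, 4, 10) and (0, 3, 15), giving a minimum taskset density of 2/5 + 4/10 + 3/15 = 0.4 + 0.4 + 0.2 = 1.0.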

For this simulation, we only consider the scheduling event that occurs when a task finishes. Three algorithms are compared: (i) Feedback LP-DVFS, which is our LP-DVFS plus McNaughton’s wrap around algorithm, as proposed in Section III-C, (ii) Open-Loop LP-DVFS, which is our LP-DVFS without feedback of information on finishing tasks, and (iii) No mismatch/Ideal, which is our LP-DVFS with the actual minimum task execution times equal to the estimated ones, i.e. an estimation factor of one.

Figure 8 shows the results of executing the tasksets in Table III on the two homogeneous multiprocessor systems, each composed of two processors of one type, for a fixed estimation factor.

(a) PowerPC 405LP
(b) XScale
Fig. 8: Simulation results for different minimum taskset densities at a fixed estimation factor.

The vertical axis is the total energy consumption normalised by that of the Open-Loop LP-DVFS algorithm. For a system composed of PowerPCs, the feedback scheme can save up to about 40% of the energy compared to the open-loop scheme. However, for a system with XScale processors, the feedback scheme starts to perform better than the open-loop scheme only when the density is greater than 1. Moreover, the percentage saving of the XScale system is less than that of the PowerPC system. This is due to the differences in the distribution of speed levels of the two processor types: the XScale processor has more evenly distributed speed levels than the PowerPC, so the optimizer can select an operating speed level that is closer to the optimal continuous speed value.

The results of varying the estimation factor, with the minimum taskset density held fixed, are shown in Figure 11.

(a) PowerPC 405LP
(b) XScale
Fig. 11: Simulation results for different estimation factors at a fixed minimum taskset density.

Note that, for this simulation, the estimation factors of all tasks are the same. For a PowerPC system, the energy saving is high when the estimation factor is low. In addition, the difference between the energy consumed by the feedback strategy and the ideal decreases as the estimation factor increases. On the other hand, for an XScale system, the maximum energy saving does not occur at the lowest estimation factor, but rather at a larger value. Furthermore, the energy consumption difference between the feedback and the optimal/ideal strategies is larger than that of the PowerPC system. Note that the energy saving varies with the tasksets, with the solutions returned by different LP solvers and with the task execution order. In particular, since the solutions are not unique, the choice of task execution order affects the total energy consumption.

V Conclusions and Future Work

A feedback method was adopted to solve a multiprocessor scheduling problem with uncertainty in task execution times. We have shown that our proposed closed-loop optimal control scheduling algorithm performs better than the open-loop algorithm in terms of energy efficiency. Simulation results suggest that the difference between closed-loop and open-loop performance can be reduced by having a more refined distribution of operating speed levels.

The work presented here can be extended in a number of ways. For periodic tasks, an estimator could be incorporated to obtain better performance. For further energy savings, a dynamic power management (DPM) scheme, which determines when and for how long a processor should be in the active or idle state, could also be integrated into the scheme.

Finally, note that there are many links here to model predictive control [9] and it would therefore be of interest to investigate how methods developed in that community could be applied to the scheduling problem defined here. For example, one could extend the work to the problem of optimizing over feedback policies, rather than open-loop input sequences, as was done here. Efficient numerical methods, including distributed cooperative schemes, could also be developed to solve the LP (3) in real-time.

References

  • [1] W. Vereecken, W. Van Heddeghem, D. Colle, M. Pickavet, and P. Demeester, “Overall ICT footprint and green communication technologies,” in Communications, Control and Signal Processing (ISCCSP), 2010 4th International Symposium on, March 2010, pp. 1–6.
  • [2] Hewlett-Packard, Intel, Microsoft, Phoenix Technologies Ltd., and Toshiba, “Advanced Configuration and Power Interface Specification (ACPI),” http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf, 2010.
  • [3] J. Wegener and F. Mueller, “A comparison of static analysis and evolutionary testing for the verification of timing constraints,” Real-Time Systems, vol. 21, no. 3, pp. 241–268, 2001.
  • [4] A. Soria-Lopez, P. Mejia-Alvarez, and J. Cornejo, “Feedback scheduling of power-aware soft real-time tasks,” in Computer Science, 2005. ENC 2005. 6th Mexican International Conference on, Sept 2005, pp. 266–273.
  • [5] D. R. Sahoo, S. Swaminathan, R. Al-Omari, M. V. Salapaka, G. Manimaran, and A. K. Somani, “Feedback control for real-time scheduling,” in Proc. American Control Conference, 2002, pp. 1254–1259.
  • [6] Y. Zhu and F. Mueller, “Feedback EDF scheduling of real-time tasks exploiting dynamic voltage scaling,” Real-Time Systems, vol. 31, no. 1-3, pp. 33–63, 2005.
  • [7] X. Fu and X. Wang, “Utilization-controlled task consolidation for power optimization in multi-core real-time systems,” in Embedded and Real-Time Computing Systems and Applications (RTCSA), 2011 IEEE 17th International Conference on, vol. 1, Aug 2011, pp. 73–82.
  • [8] M. Thammawichai and E. C. Kerrigan, “Energy-efficient scheduling for homogeneous multiprocessor systems,” arXiv:1510.05567v2 [cs.OS], 2015.
  • [9] D. Q. Mayne, “Model predictive control: Recent developments and future promise,” Automatica, vol. 50, no. 12, pp. 2967–2986, 2014.
  • [10] D. Zhu, R. Melhem, and B. Childers, “Scheduling with dynamic voltage/speed adjustment using slack reclamation in multi-processor real-time systems,” in Real-Time Systems Symposium, 2001. (RTSS 2001). Proceedings. 22nd IEEE, Dec 2001, pp. 84–94.
  • [11] J.-J. Chen, C.-Y. Yang, and T.-W. Kuo, “Slack reclamation for real-time task scheduling over dynamic voltage scaling multiprocessors,” in Sensor Networks, Ubiquitous, and Trustworthy Computing, 2006. IEEE International Conference on, vol. 1, June 2006, 8 pp.
  • [12] R. McNaughton, “Scheduling with deadlines and loss functions,” Management Science, vol. 6, no. 1, pp. 1–12, October 1959.
  • [13] Intel XScale Microarchitecture: Benchmarks, 2005, http://web.archive.org/web/20050326232506/developer.intel.com/design/intelxscale/benchmarks.htm.
  • [14] C. Rusu, R. Xu, R. Melhem, and D. Mossé, “Energy-efficient policies for request-driven soft real-time systems,” in Real-Time Systems, 2004. ECRTS 2004. Proceedings. 16th Euromicro Conference on, June 2004, pp. 175–183.
  • [15] R. Xu, C. Xi, R. Melhem, and D. Mossé, “Practical PACE for embedded systems,” in Proceedings of the 4th ACM International Conference on Embedded Software, ser. EMSOFT ’04. New York, NY, USA: ACM, 2004, pp. 54–63.
  • [16] J. Currie and D. I. Wilson, “OPTI: Lowering the Barrier Between Open Source Optimizers and the Industrial MATLAB User,” in Foundations of Computer-Aided Process Operations, N. Sahinidis and J. Pinto, Eds., Savannah, Georgia, USA, 8–11 January 2012.
  • [17] R. Wunderling, “Paralleler und objektorientierter Simplex-Algorithmus,” Ph.D. dissertation, Technische Universität Berlin, 1996, http://www.zib.de/Publications/abstracts/TR-96-09/.