Energy efficiency is now widely recognized as a crucial aspect of embedded systems, in which a huge number of small, highly specialized autonomous devices interact through many kinds of media (wired/wireless network, Bluetooth, GSM/GPRS, infrared…). Moreover, the uniprocessor paradigm will no longer hold for those devices: even today, many mobile phones are already equipped with several processors.
In this ongoing work, we are interested in energy-efficient multiprocessor systems where task durations are not known in advance, but are known stochastically. More precisely, we consider global scheduling algorithms for frame-based multiprocessor stochastic DVFS (Dynamic Voltage and Frequency Scaling) systems. Moreover, we consider processors with a discrete set of available frequencies.
In the past few years, a lot of work has been devoted to energy-efficient multiprocessor systems. Most of it considered static partitioning strategies, meaning that a task is assigned to a specific processor and every instance of this task runs on that processor. The first of those works were devoted to deterministic tasks (with a task duration known beforehand, or with only the worst case considered), such as [1, 8, 4, 5]; probabilistic models were considered later [7, 6]. Only a little work has addressed global scheduling, such as , but for deterministic systems, or , using some slack reclamation mechanism, but not really using stochastic information.
We consider a set of sequential tasks. Each task requires a given number of cycles with a known probability, and its maximum number of cycles is its Worst Case Execution Cycles (WCEC). The number of cycles a task requires is not known before the end of its execution. We consider a frame-based model, where all tasks share the same deadline and period and are synchronous. In the following, this common deadline is called the frame length.
Those tasks run on a set of identical cpus, and each of those cpus can run at any frequency taken from a discrete set of available frequencies.
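As an illustration of the task model above, here is a minimal sketch in Python. The class name `Task`, the field `cycle_probs`, and the method `worst_case_time` are hypothetical names chosen for this example; they do not come from the original algorithms. The sketch assumes the cycle distribution is given as a finite table of cycle counts and probabilities.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Task:
    """A sequential task whose cycle demand is known only stochastically.

    `cycle_probs` maps a number of cycles to the probability that the task
    needs exactly that many cycles (an assumed, table-based representation).
    """
    name: str
    cycle_probs: Dict[int, float]

    def __post_init__(self) -> None:
        # The probabilities must form a distribution.
        total = sum(self.cycle_probs.values())
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"probabilities of {self.name} sum to {total}")

    @property
    def wcec(self) -> int:
        """Worst Case Execution Cycles: largest cycle count with non-zero probability."""
        return max(c for c, p in self.cycle_probs.items() if p > 0)

    def worst_case_time(self, frequency: float) -> float:
        """Worst-case duration when the whole task runs at `frequency` (cycles per time unit)."""
        return self.wcec / frequency
```

For example, a task needing 100 cycles with probability 0.25 and 200 cycles with probability 0.75 has a WCEC of 200, and takes at most 2 time units at a frequency of 100 cycles per time unit.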
We consider that tasks cannot be preempted, but different instances of the same task can run on different processors, i.e., task migrations are allowed. We consider global scheduling techniques which schedule a queue of tasks: each time a cpu is available, it picks the first task in the queue, chooses a frequency, and runs the job. We assume the system is expedient (an expedient system is a system where tasks never wait intentionally; in other words, if a task is ready, the processor cannot be idle). The job order has been chosen beforehand but, in some cases, the scheduler can adapt that order to ensure schedulability. In other words, we assume that the initial task order is not crucial and can be considered a soft constraint.
3 Global Scheduling Algorithm
In , we provided techniques to schedule such a task set on a single cpu. The main idea is to compute (offline) a function giving, for each task, the frequency at which to run the task based on the time elapsed in the current frame: this function gives the frequency at which a task should run if it starts at a given time in the current frame. Here, for the sake of clarity, we consider the symmetric function, which gives the frequency for a task if this task starts a given number of time units before the end of the frame.
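To make the role of such a function concrete, the following sketch shows one possible way to store and query it at run time. It assumes the function is a step function of the time remaining before the end of the frame, stored as a sorted table of thresholds; this representation, and the names `make_frequency_lookup`, `thresholds`, and `frequencies`, are assumptions made for illustration and are not taken from the original work.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def make_frequency_lookup(thresholds: Sequence[float],
                          frequencies: Sequence[float]) -> Callable[[float], float]:
    """Build a lookup for one task.

    `thresholds` is sorted in increasing order; `frequencies[k]` is used when
    the remaining time d satisfies thresholds[k] <= d < thresholds[k + 1].
    More remaining time therefore allows a lower frequency.
    """
    if len(thresholds) != len(frequencies):
        raise ValueError("one frequency per threshold is required")

    def lookup(remaining: float) -> float:
        k = bisect_right(thresholds, remaining) - 1
        if k < 0:
            # Less time than the smallest threshold: no tabulated frequency fits.
            raise ValueError("not enough time left for this task")
        return frequencies[k]

    return lookup
```

With thresholds `[2.0, 4.0, 8.0]` and frequencies `[400, 200, 100]`, a task started 5 time units before the end of the frame would run at 200, and a task started 8 or more time units before the end would run at 100.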
In the uniprocessor case, we were able to give schedulability guarantees, as well as good energy consumption performance. We want to provide both in the multiprocessor case, using a global scheduling algorithm. As far as we know, global scheduling on multiprocessor systems with stochastic tasks and a limited number of available frequencies has not been considered so far.
The idea of our scheduling algorithm is to consider that a system with several cpus and a given frame length is close to a system with a single cpu and a frame length multiplied by the number of cpus or, equivalently, to a single cpu with the same frame length that runs as many times faster as there are cpus. We then first compute a set of -functions considering the same set of tasks, but with this longer deadline. A very naive approach would consider that, when a task ends, the total remaining time available before the deadline is the sum of the remaining time available on each cpu: the time left until the end of the frame on the current cpu, plus, on each of the other cpus, the time between the worst-case end of the task currently running there and the end of the frame. We could then evaluate the function computed above at this total to choose the frequency.
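The naive estimate described above can be written as a short sketch. The function name `naive_remaining_time` and its parameters are hypothetical; the sketch only restates the sum described in the text, under the assumption that the worst-case end time of the task running on each other cpu is known.

```python
from typing import Iterable


def naive_remaining_time(frame_length: float,
                         now: float,
                         other_worst_end_times: Iterable[float]) -> float:
    """Sum the time left on the current cpu and on every other cpu.

    `other_worst_end_times` holds, for each other cpu, the worst-case time at
    which the task currently running there will end.
    """
    # Time left on the cpu that just became available.
    total = frame_length - now
    # Time left on each other cpu once its current task has ended (worst case).
    for worst_end in other_worst_end_times:
        total += frame_length - worst_end
    return total
```

For a frame of length 10, a task ending at time 4, and two other cpus whose current tasks end at worst at times 6 and 9, the estimate is 6 + 4 + 1 = 11 time units.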
Unfortunately, this simple approach does not work, because a single task cannot use time on several cpus simultaneously. However, if the number of tasks is reasonably greater than the number of cpus, we expect that, in most cases, the task will not need more than the time available on the current cpu and will, in effect, leave the time available on the other cpus to future tasks. When the task requires more time than is actually available, we simply use a faster frequency.
Of course, we need to ensure the schedulability of the system, which cannot be guaranteed with the previous approach: for instance, at the end of a frame, we might have some slack time that is unusable because it is too short to run any of the remaining tasks. But as this time was taken into account when we chose the frequency of previous tasks, we might miss the deadline if we do not take any precaution.
The algorithm we propose is composed of two phases, one off-line and one on-line. The off-line phase consists in performing a (virtual) static partitioning, aiming at reserving enough time in the system for each task. This phase is close to what we did in  with Danger Zones. The on-line phase uses both this pre-reservation, to ensure the schedulability (while performing dynamic changes to this static partitioning), and the -functions, to improve the energy efficiency.
3.1 Virtual Static Partitioning
We first perform a “virtual static partitioning”. The aim of this partitioning is not to assign a task to a processor, but to make sure that every task can be executed. A task does not have to run on its assigned processor, but we know that some time has been reserved for this task, which allows us to guarantee schedulability.
This static partitioning can be performed in many ways, but we propose in Algorithm 1 to make it as balanced as possible, by sorting tasks according to their WCEC.
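Since Algorithm 1 is not reproduced here, the following is only a sketch of one balanced partitioning in that spirit: a greedy "largest task first, least loaded cpu" heuristic. It assumes that the time reserved for a task is its worst-case execution time at the highest frequency; this assumption, and the names `virtual_static_partitioning` and `wcet`, are illustrative and not taken from the original algorithm.

```python
import heapq
from typing import Dict, List, Optional


def virtual_static_partitioning(wcet: Dict[str, float],
                                num_cpus: int,
                                frame_length: float) -> Optional[Dict[int, List[str]]]:
    """Reserve time for every task, balancing the load across cpus.

    `wcet[task]` is the time reserved for the task (assumed here to be its
    worst-case execution time at the highest frequency).
    Returns a mapping cpu -> reserved tasks, or None if this heuristic fails.
    """
    # Min-heap of (reserved load, cpu index): the least loaded cpu comes first.
    heap = [(0.0, cpu) for cpu in range(num_cpus)]
    heapq.heapify(heap)
    assignment: Dict[int, List[str]] = {cpu: [] for cpu in range(num_cpus)}

    # Largest tasks first, each one placed on the currently least loaded cpu.
    for task in sorted(wcet, key=wcet.get, reverse=True):
        load, cpu = heapq.heappop(heap)
        if load + wcet[task] > frame_length:
            # Even the least loaded cpu cannot host the task.
            return None
        assignment[cpu].append(task)
        heapq.heappush(heap, (load + wcet[task], cpu))
    return assignment
```

A `None` result only means that this particular heuristic did not find a partitioning; it says nothing about whether the task set is schedulable by other means.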
After this first step of virtual static partitioning, we can see the system as in Figure 1, left part. Notice that failing to build this virtual partitioning does not imply that the system is unschedulable. But if we do manage to build it, then we can ensure that the system is schedulable. This virtual static partitioning can be computed offline, and used for the whole life of the system.
3.2 On-line algorithm
Based on the virtual static partitioning, the main idea of the on-line part is to start a task at a frequency which allows it to end before the beginning of the “reserved” part of the frame. For instance, in Figure 1, could start on using all the space between the beginning of the frame and the reserved space for . But we will see situations where the scheduler needs to give more time to . In such cases, we can also move, for instance, or to , or to . By doing so, and because we never let a running task use the reserved time of another (not yet started) task, we can guarantee that, if we were able to build a partitioning in the off-line phase, no task will ever miss its deadline. Of course, as soon as a task starts, we release the time reserved for this task.
The on-line part of the algorithm is given in Algorithm 4. We first give some explanation about two procedures we need in the main algorithm.
This procedure (Algorithm 2) aims at moving tasks away from cpu , until enough space (the quantity in the algorithm) is available, or no task can be moved anymore. For instance, in Figure 1, at time , we may want to run on at frequency . But according to the worst case of , we do not have enough time to run this task between 0 and the beginning of the reserved area of . However, we can move to , and or to .
As long as the required amount of time is not available, we take the largest task on the cpu and move it to the cpu with the largest free space. This is of course a heuristic, since finding the optimal choice is probably an NP-hard, or at least intractable, problem.
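The heuristic just described can be sketched as follows. This is not Algorithm 2 itself: the function name `move_tasks_out`, the data layout (`reserved` maps each cpu to the time reserved for each of its not yet started tasks), and the simplified notion of free space (frame length minus reserved time, ignoring the task currently running) are all assumptions made for this illustration. When the largest task does not fit anywhere, the sketch tries the next largest one, which is one possible reading of "no task can be moved anymore".

```python
from typing import Dict, Hashable


def move_tasks_out(reserved: Dict[Hashable, Dict[str, float]],
                   source: Hashable,
                   needed: float,
                   frame_length: float) -> float:
    """Move reservations away from `source` until `needed` time is free there.

    `reserved` is modified in place. Returns the free time on `source` after
    the moves, which may still be smaller than `needed`.
    """
    def free(cpu: Hashable) -> float:
        # Simplified free space: frame length minus all reserved time.
        return frame_length - sum(reserved[cpu].values())

    while free(source) < needed and reserved[source]:
        # Destination candidate: the other cpu with the largest free space.
        target = max((c for c in reserved if c != source), key=free, default=None)
        moved = False
        if target is not None:
            # Try the largest reservation first, then smaller ones.
            for task in sorted(reserved[source], key=reserved[source].get, reverse=True):
                if free(target) >= reserved[source][task]:
                    reserved[target][task] = reserved[source].pop(task)
                    moved = True
                    break
        if not moved:
            break  # no task can be moved anymore
    return free(source)
```

The caller compares the returned value with `needed` to decide whether the task can start at the preferred frequency or needs a faster one.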
This procedure (Algorithm 3) aims at trying to move a task assigned to some cpu to the cpu . The main idea is that we first move out as many tasks as needed from (line LABEL:al:mti_mto), until we have enough space to import (lines LABEL:al:mti_move to LABEL:al:mti_e_move). If we have not managed to get enough space, false is returned (line LABEL:al:mti_fail). However, this algorithm is a heuristic, and is not always able to find a solution, even when such a solution exists.
For instance (see Figure 1, right part), at the end of , we would like to start on . But neither nor can be moved to another cpu, so our algorithm fails to find a solution. However, a smarter algorithm could find out that by swapping and , would be able to start on . Notice that finding a solution in every solvable case is probably also an NP-hard, or at least intractable, problem.
The procedure we give here is quite naive and not very efficient, but we leave a better algorithm for further research. The naivety of this algorithm does not affect the schedulability at all: it only forces the system to accept task order changes more often, which might degrade the energy efficiency (-functions are computed according to the given order) and the user satisfaction, if the user's preferences are often not respected.
3.2.3 Main algorithm
Here are the main steps of the procedure given in Algorithm 4, which is called each time a cpu (say ) is available, at time , with the next task to start. This procedure will always start a task at a speed guaranteeing deadlines, but not necessarily .
line LABEL:al:d: We first evaluate , the remaining time we have for : if is the worst time where is going to be available (the time of the last start, plus the worst case execution time of the current task at the chosen frequency), we have:
line LABEL:al:f: Let , the frequency chosen for in the single cpu model with units of time before the deadline. We are going to check if we can use this frequency (we assume this frequency to be a “good” one from the energy consumption point of view).
line LABEL:al:notin-LABEL:al:e_notin: If was not assigned to , we first try to move it to (Algorithm 3). If we have enough space on , the situation is easy. Otherwise, we need to move some tasks out from , in order to create enough space.
line LABEL:al:starti+1: If we cannot manage to make enough space, then we are not able to start right now. We then try the same procedure for , but we need to left-shift functions of . This is not required from the schedulability point of view (we ensure the schedulability by controlling the available time), but we expect it to improve the energy consumption. For the same reason, we will need to right-shift functions of the same amount when starts, because we have one task less to run after . (This improvement is not yet implemented in the given algorithm. It must be done carefully, because we might have several swapped tasks.)
line LABEL:al:mto: If we succeeded, we try to move as many tasks as possible from to other cpus (Algorithm 2), until we have enough space to start at , or no task can be moved anymore. We then start either at , or at the smallest frequency allowing to run in the space we manage to free (line LABEL:al:f2). As was assigned to (possibly after some changes), we are at least sure that we can start at .
Notice that when StartTask is invoked, it is always possible to run a job, and therefore, we will never consider in Algorithm 4, line LABEL:al:starti+1. Because of space limitation, we will not give the proof here.
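The final frequency choice in the steps above (start at the preferred frequency if it fits, otherwise at the smallest frequency that fits in the freed space) can be sketched as follows. The function name `choose_start_frequency` and its parameters are hypothetical, and the sketch assumes that frequencies are expressed in cycles per time unit so that the worst-case duration is the WCEC divided by the frequency.

```python
from typing import Iterable, Optional


def choose_start_frequency(wcec: float,
                           preferred: float,
                           available_time: float,
                           frequencies: Iterable[float]) -> Optional[float]:
    """Pick the frequency at which to start a task.

    Returns `preferred` if the worst case fits in `available_time`, otherwise
    the smallest higher frequency whose worst case fits, or None if even the
    highest frequency is too slow.
    """
    # Only frequencies at least as fast as the preferred one are candidates.
    for f in sorted(f for f in frequencies if f >= preferred):
        if wcec / f <= available_time:
            return f
    return None
```

For a task with a WCEC of 1000 cycles, a preferred frequency of 100, and available frequencies 100, 200, and 400, the sketch returns 100 when 12 time units are available and 200 when only 6 are available.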
Here are a few points we want to investigate further, in order to improve the energy consumption or the number of systems we are able to schedule.
At the end of a frame, assuming we can verify that no further task will run on this cpu after the task we start, we can try to run this task using the cpu until . For instance, if we start a task on at a speed which leaves a free space too small to run any of the remaining tasks, then we should try to stretch the task to use up to .
If we allow the frequency to change during the execution of a task, we can use the continuous model to obtain a frequency , and use two frequencies and to “emulate” this , where (resp. ) stands for the smallest frequency above (resp. largest below) .
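As a worked illustration of this two-frequency emulation, the sketch below computes how long to run at each of the two neighbouring frequencies so that the same number of cycles is executed in the same duration as at the target frequency. The function name `emulate_frequency` and its parameters are hypothetical. The split follows from requiring that the two time shares add up to the duration and that the executed cycles add up to the target frequency times the duration.

```python
from typing import List, Sequence, Tuple


def emulate_frequency(target: float,
                      frequencies: Sequence[float],
                      duration: float) -> List[Tuple[float, float]]:
    """Return (frequency, time) pairs emulating `target` over `duration`.

    Uses the smallest available frequency above `target` and the largest one
    below it, so that the total number of cycles equals target * duration.
    """
    if target in frequencies:
        return [(target, duration)]
    lower = max((f for f in frequencies if f < target), default=None)
    upper = min((f for f in frequencies if f > target), default=None)
    if lower is None or upper is None:
        raise ValueError("target is outside the range of available frequencies")
    # Solve: t_up + t_low = duration, upper * t_up + lower * t_low = target * duration.
    time_upper = duration * (target - lower) / (upper - lower)
    return [(upper, time_upper), (lower, duration - time_upper)]
```

For example, emulating a frequency of 150 for 4 time units with available frequencies 100 and 200 gives 2 time units at 200 followed by 2 time units at 100, which executes 600 cycles, the same as 150 for 4 time units.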
Several steps require solving NP-hard problems, which we handle with heuristics: Static partitioning (Algorithm 1), MoveTaskIn (Algorithm 3), and MoveTasksOut (Algorithm 2). The efficiency of the first one determines the number of systems we can accept to schedule; the second one, the number of tasks we will need to swap (not run in the right order); and the third one, how close we can stay to the uniprocessor algorithm. We may try to improve those three algorithms.
In order to reduce leakage or static energy consumption, we could turn off cpus that are no longer needed before the end of the frame.
Of course, we also, and mainly, need to validate our model and show its efficiency by means of simulations, using realistic environments and workloads.
-  Aydin, H., and Yang, Q. Energy-aware partitioning for multiprocessor real-time systems. In IPDPS ’03: Proceedings of the 17th International Symposium on Parallel and Distributed Processing (Washington, DC, USA, 2003), IEEE Computer Society, p. 113.2.
-  Berten, V., Chang, C.-J., and Kuo, T.-W. Discrete frequency selection of frame-based stochastic real-time tasks. In Proceedings of the 14th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (Taiwan, August 2008), IEEE, Ed., RTCSA2008, p. 8.
-  Chen, J.-J., Hsu, H.-R., Chuang, K.-H., Yang, C.-L., Pang, A.-C., and Kuo, T.-W. Multiprocessor energy-efficient scheduling with task migration considerations. In ECRTS ’04: Proceedings of the 16th Euromicro Conference on Real-Time Systems (Washington, DC, USA, 2004), IEEE Computer Society, pp. 101–108.
-  Chen, J.-J., and Kuo, T.-W. Energy-efficient scheduling of periodic real-time tasks over homogeneous multiprocessors. In PARC (September 2005), pp. 30–35.
-  Chen, J.-J., and Kuo, T.-W. Multiprocessor energy-efficient scheduling for real-time tasks with different power characteristics. In ICPP ’05: Proceedings of the 2005 International Conference on Parallel Processing (Washington, DC, USA, 2005), IEEE Computer Society, pp. 13–20.
-  Mishra, R., Rastogi, N., Zhu, D., Mossé, D., and Melhem, R. Energy aware scheduling for distributed real-time systems. In IPDPS ’03: Proceedings of the 17th International Symposium on Parallel and Distributed Processing (Washington, DC, USA, 2003), IEEE Computer Society, p. 21.2.
-  Xian, C., Lu, Y.-H., and Li, Z. Energy-aware scheduling for real-time multiprocessor systems with uncertain task execution time. In DAC ’07: Proceedings of the 44th annual conference on Design automation (New York, NY, USA, 2007), ACM, pp. 664–669.
-  Yang, C.-Y., Chen, J.-J., and Kuo, T.-W. An approximation algorithm for energy-efficient scheduling on a chip multiprocessor. In DATE ’05: Proceedings of the conference on Design, Automation and Test in Europe (Washington, DC, USA, 2005), IEEE Computer Society, pp. 468–473.
-  Zhu, D., Melhem, R., and Childers, B. Scheduling with dynamic voltage/speed adjustment using slack reclamation in multi-processor real-time systems. In RTSS ’01: Proceedings of the 22nd IEEE Real-Time Systems Symposium (RTSS’01) (Washington, DC, USA, 2001), IEEE Computer Society, p. 84.