Energy-aware Fixed-Priority Multi-core Scheduling for Real-time Systems

December 23, 2015 · Yao Guo, et al.

Multi-core processors are becoming increasingly popular in embedded and real-time systems. While fixed-priority scheduling with task-splitting is widely applied in real-time systems, current approaches have not taken energy-aware aspects such as dynamic voltage/frequency scaling (DVS) into consideration. In this paper, we propose two strategies to apply DVS to fixed-priority scheduling algorithms with task-splitting for periodic real-time tasks on multi-core processors. The first strategy (Static DVS) determines voltage scales for each processor after scheduling, ensuring that all tasks meet the timing requirements on synchronization. The second strategy (Adaptive DVS) adaptively determines the frequency of each task before scheduling, according to the total utilization of the task-set and the number of cores available. The combination of frequency pre-allocation and task-splitting makes it possible to maximize energy savings with DVS. Simulation results show that significant energy savings are achievable with DVS while preserving the schedulability requirements of real-time schedulers for multi-core processors.


1 Introduction

Multi-core processors have been adopted not only in high-performance servers and personal computers, but also in embedded and real-time systems. Many real-time scheduling algorithms for multi-core processors have been proposed in recent years, among which semi-partitioned fixed-priority multi-core scheduling algorithms [Guan:2010:FMS:1828428.1829220, Lakshmanan:2009:PFP:1581378.1581523] achieve higher utilization bounds.

On the other hand, energy consumption is critical for many battery-operated embedded and real-time systems. However, although techniques such as dynamic voltage/frequency scaling (DVS) [WeiserWDS94] are available on most modern processors, these energy-aware aspects have not been considered in the recently proposed semi-partitioned fixed-priority multi-core scheduling algorithms with high utilization bounds.

In order to understand the energy implications of these semi-partitioned fixed-priority multi-core scheduling algorithms, this paper explores the possibility of applying DVS to two recently proposed algorithms: SPA2 [Guan:2010:FMS:1828428.1829220], introduced by Guan et al., and PDMS_HPTS_DS (PHD) [Lakshmanan:2009:PFP:1581378.1581523], introduced by Lakshmanan et al. SPA2 reaches a utilization bound of 69.3% and PHD of 65%, both considerably higher than previous priority-based multi-core scheduling approaches.

Because neither algorithm considers energy, we introduce two different methods to apply DVS to these two multi-core scheduling algorithms (SPA2 and PHD):

  • Static DVS: We first develop an extended DVS algorithm based on the traditional DVS algorithm for fixed-priority schedulers [Pillai:2001:RDV:502034.502044], and apply it after scheduling with task-splitting to evaluate the schedulability and energy savings. Specifically, we pin certain split subtasks to the maximal frequency, so that each earlier subtask finishes before the next one is released. This ensures that the schedule meets the timing requirements on synchronization for split tasks.

  • Adaptive DVS: We then propose a new algorithm that adaptively determines the frequency of each task before scheduling, in order to achieve better performance on both schedulability and energy consumption. Adaptive DVS schedules the tasks with execution times prolonged by DVS, while ensuring that all tasks still meet their timing requirements. Specifically, the frequency is determined by the total utilization of the task-set and the number of cores available, pursuing a maximally balanced distribution of tasks across processors and thereby saving considerably more energy than the static DVS approach.

To evaluate the two approaches, we developed a simulator that compares their schedulability and energy consumption under different configurations, including different total utilizations and different numbers of processor cores.

In terms of schedulability, PHD performs much better than SPA2 when the utilization is over 70% (both algorithms are 100% schedulable when the utilization is below 65%, due to their utilization bounds). PHD keeps its schedulability above 90% even when the utilization reaches 90%, while the schedulability of SPA2 drops sharply to zero.

When considering energy consumption, PHD with the static DVS algorithm comes close to the worst case, while PHD with adaptive DVS saves the most energy because it distributes tasks equally across all available cores. Overall, the simulation results show that PHD with adaptive DVS achieves lower energy consumption than the other three strategies while maintaining the good schedulability of PHD.

The main contribution of this paper is exploring the possibility of applying DVS to semi-partitioned fixed-priority multi-core scheduling algorithms. We present two different approaches to apply DVS, and simulation results show that energy-aware scheduling is achievable without affecting the schedulability of two state-of-the-art algorithms. To the best of our knowledge, this is the first work considering energy-efficient multi-core scheduling with task-splitting.

The rest of the paper is organized as follows: Section 2 introduces the background information, assumptions, and notations used in this paper. The proposed static DVS and adaptive DVS algorithms are presented in Sections 3 and 4, respectively. Section 5 presents the experimental evaluation of the scheduling and DVS schemes. Section 6 reviews related work on multi-core scheduling for real-time systems and energy-aware approaches. We conclude in Section 7.

2 Preliminaries

This section introduces the background information and assumptions, as well as notations used in this paper.

2.1 Task Model

We shall use the following notation throughout this paper. We consider a task-set $\tau = \{\tau_1, \tau_2, \dots, \tau_n\}$ comprising $n$ periodic tasks, assigned to $M$ processor cores. We use the classical model $(C_i, T_i, D_i)$ to represent the parameters of a task $\tau_i$, where $C_i$ represents the worst-case computation time of each job of $\tau_i$ at maximal frequency, $T_i$ represents the period of $\tau_i$, and $D_i$ is defined as the deadline of each job of $\tau_i$ relative to its release time. For a task that is not split, its deadline $D_i$ equals its period $T_i$. For a subtask of a split task, the deadline is less than the period, in order to set aside time for the other subtasks of the same split task.

Tasks $\tau_1, \dots, \tau_n$ are ordered such that $i < j$ implies $D_i \le D_j$. Since our proposed algorithms use deadline-monotonic scheduling as the scheduling algorithm on each processor, we can use the task indices to represent the task priorities, i.e., $\tau_i$ has higher priority than $\tau_j$ if and only if $i < j$. The utilization of each task is defined as $U_i = C_i / T_i$. The total utilization is given by $U = \sum_{i=1}^{n} U_i$.

During scheduling, some tasks are split and assigned to different processors. We call these tasks split tasks; each is split into several subtasks. For a split task $\tau_i$, $\tau_i^k$ denotes the $k$-th subtask of $\tau_i$. We call the last subtask of a split task its tail subtask; the other subtasks are called body subtasks.

The subtasks of a split task need to be synchronized to execute correctly: $\tau_i^{k+1}$ cannot start execution until $\tau_i^k$ has finished. Therefore, the time for a subtask to execute is shorter than its period, in order to share time with the other subtasks of the same split task. For a subtask $\tau_i^k$ split from task $\tau_i$, its deadline satisfies $\sum_k X_i^k \le T_i$, where $X_i^k$ denotes the actual time span of the subtask from its release to its completion.
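To make the notation concrete, the following Python sketch models tasks and split subtasks; the class and function names are our own illustration, not part of the original algorithms.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    """A periodic task tau_i = (C_i, T_i, D_i); D_i equals T_i unless split."""
    wcet: float      # C_i: worst-case execution time at maximal frequency
    period: float    # T_i
    deadline: float  # D_i, relative to the release time

    @property
    def utilization(self) -> float:
        return self.wcet / self.period

@dataclass
class SplitTask:
    """Subtasks of a split task; tau_i^(k+1) is released when tau_i^k finishes."""
    subtasks: List[Task] = field(default_factory=list)

    @property
    def body(self) -> List[Task]:   # body subtasks: all but the last
        return self.subtasks[:-1]

    @property
    def tail(self) -> Task:         # tail subtask: the last one
        return self.subtasks[-1]

def dm_order(tasks: List[Task]) -> List[Task]:
    """Deadline-monotonic priority: smaller relative deadline, higher priority."""
    return sorted(tasks, key=lambda t: t.deadline)

def total_utilization(tasks: List[Task]) -> float:
    return sum(t.utilization for t in tasks)
```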

2.2 Energy Model

For processors based on CMOS technology, the power consumption of each core consists of two parts: dynamic power dissipation $P_{dynamic}$ and static power dissipation $P_{static}$. Dynamic power dissipation is given by $P_{dynamic} = C_{ef} \cdot V_{dd}^2 \cdot f$, where $V_{dd}$ is the supply voltage, $C_{ef}$ is the effective switching capacitance, and $f$ is the processor clock frequency. For simplicity, the processor frequency can be considered roughly linear in the supply voltage: $f = k \cdot (V_{dd} - V_t)$, where $k$ is a constant and $V_t$ is the threshold voltage [Chandrakasan95lowpower]. Thus, $P_{dynamic}$ is almost cubically related to $f$: $P_{dynamic} \propto f^3$. Static power dissipation is dominated by the leakage current, and can be regarded as a constant.

For simplicity of presentation, we assume the power dissipation function below:

$$P(f) = a \cdot f^3 + b \qquad (1)$$

where $a$ and $b$ are non-negative constants.

The total power consumption of a multi-core processor is simply the sum of the power dissipated in each core: $P_{total} = \sum_{j=1}^{M} P_j$. For real-time systems, because we can assume that the processors are always running, the energy consumption is proportional to the power dissipation in all cases.

Frequency (MHz) | 150  | 400 | 600 | 800 | 1000
Voltage (V)     | 0.75 | 1.0 | 1.3 | 1.6 | 1.8
TABLE I: Frequency/voltage settings of the XScale processor.

Based on the frequency/power settings of the Intel XScale processor [Xu:2004:PPE:1017753.1017767] in Table I, we estimate the parameters of the power consumption model in equation (1) as:

$$P(f) = 1.52 f^3 + 0.08 \qquad (2)$$

with $f$ in GHz and $P$ in watts.

To eliminate the effect of leakage power consumption, we introduce a critical speed [Chen:2007:PDP:1326073.1326132] as the minimal frequency at which a processor core executes; it is the frequency that minimizes the energy per cycle $P(f)/f$, below which slowing down costs more static energy than it saves in dynamic energy. Based on the model in equation (2), the critical speed, and hence the minimal frequency $f_{min}$, is 0.297 GHz.
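As a sanity check on this model, a few lines of Python reproduce the critical speed: minimizing the energy per cycle $P(f)/f = a f^2 + b/f$ gives $f_{crit} = (b/2a)^{1/3}$ analytically. The constants are those estimated in equation (2).

```python
# Power model P(f) = a*f^3 + b (f in GHz, P in W), parameters from equation (2).
a, b = 1.52, 0.08

def power(f: float) -> float:
    return a * f**3 + b

def energy_per_cycle(f: float) -> float:
    # Running slower saves dynamic energy but accrues static energy for
    # longer, hence the trade-off that creates a minimum at f_crit.
    return power(f) / f

# Setting d/df [a*f^2 + b/f] = 0 yields f_crit = (b / (2*a)) ** (1/3).
f_crit = (b / (2 * a)) ** (1 / 3)
print(f"critical speed = {f_crit:.3f} GHz")  # -> 0.297 GHz
```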

To model different processor frequencies, we simulate both continuously variable frequencies in $[f_{min}, f_{max}]$ (the ideal case) and a discrete frequency-level model based on an actual processor (i.e., the Intel XScale).

2.3 Scheduling Algorithms Studied

As mentioned earlier, this paper studies two existing multi-core scheduling algorithms, each of which we briefly describe below.

2.3.1 SPA2

The algorithm SPA2 [Guan:2010:FMS:1828428.1829220] reaches Liu & Layland's utilization bound [Liu:1973:SAM:321738.321743] for task-sets without any constraints. It assigns tasks in increasing order of priority, each time selecting the processor with the minimal utilization. To ensure the schedulability of task-sets with heavy tasks, it first assigns heavy tasks that satisfy a particular condition, so that these heavy tasks are not split and their execution sequence is guaranteed.

2.3.2 PDMS_HPTS_DS

The algorithm PDMS_HPTS_DS (PHD) [Lakshmanan:2009:PFP:1581378.1581523] has a slightly lower utilization bound (65%) than the algorithm SPA2 (69.3%). Based on our simulation results, PHD could achieve an average schedulable utilization of 88%, which is considerably higher than SPA2.

PHD assigns tasks in decreasing order of utilization, and moves to the next processor only when the current processor can no longer hold more tasks. To ensure the sequencing rules of split tasks, the algorithm splits only the highest-priority task on each processor, so that the subtasks of a split task complete in sequence.
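For intuition, the sketch below captures PHD's depth-first, decreasing-utilization assignment in Python, using a plain utilization capacity as a stand-in for the real response-time-based schedulability test. Note that the actual PHD splits only the highest-priority task on each processor; this simplification just splits whichever task overflows the core.

```python
from typing import List, Tuple

def phd_assign(utils: List[float], n_cores: int,
               capacity: float = 1.0) -> List[List[Tuple[int, float]]]:
    """Depth-first bin packing with task splitting (simplified).

    utils: task utilizations, considered in decreasing order.
    Returns, per core, a list of (task_id, utilization_share); a task
    appearing on two cores has been split across them.
    """
    order = sorted(range(len(utils)), key=lambda i: -utils[i])
    cores: List[List[Tuple[int, float]]] = [[] for _ in range(n_cores)]
    c = 0
    for tid in order:
        remaining = utils[tid]
        while remaining > 1e-12:
            if c >= n_cores:
                raise ValueError("task-set not schedulable in this sketch")
            free = capacity - sum(u for _, u in cores[c])
            share = min(remaining, free)
            if share > 0:
                cores[c].append((tid, share))
                remaining -= share
            if remaining > 1e-12:   # current core is full: split and move on
                c += 1
    return cores

# The Table II example on 2 cores, utilizations (0.6, 0.6, 0.4):
# core 0 is filled completely, core 1 ends up 60% used, as in Fig. 3(a).
print(phd_assign([0.6, 0.6, 0.4], 2))
```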

3 The Static DVS Approach

Pillai and Shin introduced a DVS algorithm for fixed-priority schedulers [Pillai:2001:RDV:502034.502044], achieving energy savings by reducing the operating frequency and voltage whenever the remaining tasks need less than the remaining time before the next deadline. However, for scheduling with task-splitting, one cannot reduce the frequency freely: postponing the execution of a subtask may violate the synchronization requirements of split tasks, as the example in Fig. 1 shows.

In order to achieve energy savings for scheduling with task-splitting, we develop a new static DVS algorithm based on this previous work, which did not take task-splitting into consideration. We reselect the frequency whenever any task is released or finishes. Before selecting a frequency, we first examine whether the task/subtask is a body subtask. If it is, we run it at the maximal frequency and reselect the frequency when it finishes, so that the synchronization requirement is always satisfied.

Fig. 1: Synchronous Violation with Traditional DVS

The detailed description of the static DVS algorithm for scheduling with task-splitting is shown in Algorithm 1. In function select_frequency(), a time line is either a release time or a deadline, so available_time_until_next_time_line() returns the smaller of the time available until the next deadline and the time available until the next release.

1:  select_frequency():
2:     $s$ ← available_time_until_next_time_line()
3:     use lowest frequency $f_j \in \{f_1, \dots, f_m \mid f_1 < \dots < f_m\}$ such that $d_1 + \dots + d_n \le s \cdot f_j / f_m$
4:  upon task_release($\tau_i$):
5:     $c\_left_i$ ← $C_i$
6:     $s$ ← available_time_until_next_time_line()
7:     allocate_cycles($s$)
8:     if $\tau_i$ is a body subtask then
9:        use maximal frequency $f_m$
10:  else
11:     select_frequency()
12:  end if
13:  upon task_completion($\tau_i$):
14:     $c\_left_i$ ← 0
15:     $d_i$ ← 0
16:     select_frequency()
17:  during task_execution($\tau_i$):
18:     decrement $c\_left_i$ and $d_i$
19:  allocate_cycles($k$):
20:  for $i$ = 1 to $n$ (in priority order) do
21:     if $c\_left_i < k$ then
22:        $d_i$ ← $c\_left_i$
23:        $k$ ← $k - c\_left_i$
24:     else
25:        $d_i$ ← $k$
26:        $k$ ← 0
27:     end if
28:  end for
ALGORITHM 1 Static DVS for Scheduling with Task-Splitting
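A compact Python rendering of the key routines of Algorithm 1 might look as follows; the frequency levels (normalized XScale settings), the helper names, and the list-based bookkeeping are our assumptions.

```python
from typing import List

F_MAX = 1.0
FREQS = [0.15, 0.4, 0.6, 0.8, 1.0]   # normalized XScale frequency levels

def allocate_cycles(c_left: List[float], k: float) -> List[float]:
    """Distribute the available time k (measured at f_max) to tasks in
    priority order; d[i] is task i's share before the next time line."""
    d = [0.0] * len(c_left)
    for i in range(len(c_left)):       # index order == priority order
        d[i] = min(c_left[i], k)
        k -= d[i]
    return d

def select_frequency(d: List[float], s: float) -> float:
    """Lowest frequency that fits all allocated cycles into the time s
    remaining until the next time line (release or deadline)."""
    need = sum(d)
    for f in FREQS:                    # ascending order
        if need <= s * f / F_MAX:
            return f
    return F_MAX

def frequency_on_release(is_body_subtask: bool,
                         d: List[float], s: float) -> float:
    # Body subtasks always run at f_max, so the next subtask's release,
    # which waits on their completion, is never postponed.
    return F_MAX if is_body_subtask else select_frequency(d, s)
```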

We can prove that the above static DVS algorithm does not violate any timing requirement. The timing requirements can be separated into the two statements below.

  1. If a task set is schedulable under a scheduling algorithm without DVS, it could still be ensured that all the tasks finish before their deadlines after applying the static DVS algorithm.

  2. The split task could satisfy the synchronous requirements. That is, it is guaranteed that the next sub-task will be released after the previous one finishes.

For the first statement, consider the earliest future time at which a deadline could be broken. This earliest time is either a task's deadline, or a task's release time, since a newly released task gains remaining cycles to execute and thereby increases the chance of violating a later deadline. By taking this earliest time as a bound when selecting the frequency, it is guaranteed that all deadlines are met.

For the second statement, we note that both algorithms we study [Guan:2010:FMS:1828428.1829220, Lakshmanan:2009:PFP:1581378.1581523] satisfy the following property: a body subtask always has the highest priority on its core, so that it completes as early as possible for the next subtask, while the tail subtask only needs to meet its deadline. So all we need to do is let these body subtasks execute with no latency compared to the original schedule without DVS. Executing the body subtasks at the maximal frequency guarantees this.

On the other hand, the energy savings suffer very little from this trade-off for split tasks. We performed a simple simulation for scheduling on eight cores, comparing the energy consumption of traditional DVS against the proposed static DVS algorithm for scheduling with task-splitting. The simulation results in Fig. 2 show only slight differences in energy consumption between traditional DVS and the static DVS algorithm: on average, the energy consumption of static DVS is 1.2% higher than that of the traditional DVS approach. We believe such a small cost is acceptable in order to meet the synchronization requirements for split tasks.

Fig. 2: Energy consumption between traditional DVS and our DVS: (a) energy for SPA2, (b) energy for PHD.

The static DVS approach will be evaluated in detail later in Section 5.

4 The Adaptive DVS Approach

To get the best results from DVS, we explore algorithms that can achieve more energy savings than the above static DVS algorithm.

In this section, we first consider the potential of energy optimization, and then propose a new DVS algorithm which adaptively determines the frequency of each task before scheduling (Adaptive DVS) that can save more energy compared to the static DVS approach.

4.1 Energy Optimization

Given a task-set of periodic real-time tasks and a processor with $M$ cores, we need to find a schedulable task-to-core assignment that minimizes the energy consumption under DVS. Two conditions [Aydin:2003:EPM:838237.838347] must be satisfied:

  1. The assignment must evenly divide the total load among all the cores.

  2. On each core with total utilization $U_j$, the frequency must be constant and equal to $U_j \cdot f_{max}$.

That is, each processor must run at a constant frequency $f$ that satisfies $f = \frac{U}{M} \cdot f_{max}$, where $f_{max}$ is the maximal frequency that the processor's multiple supply voltages can provide. We call this frequency the "ideal frequency":

$$f_{ideal} = \frac{U}{M} \cdot f_{max}$$

With frequency pre-allocation and task-splitting, it is possible to get very close to the minimal energy, as long as the assignment is schedulable.
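As a sketch, the ideal frequency for a given task-set and core count can be computed as below, clipping to the critical speed and maximal frequency discussed in Section 2.2; the function name and normalization are ours.

```python
def ideal_frequency(total_util: float, n_cores: int,
                    f_max: float = 1.0, f_crit: float = 0.297) -> float:
    """Constant per-core frequency that a perfectly balanced, fully
    schedulable assignment would run at, clipped to [f_crit, f_max]."""
    return min(f_max, max(f_crit, total_util / n_cores * f_max))

# Table II example: U = 1.6 on 2 cores -> f_ideal = 0.8 * f_max.
print(ideal_frequency(1.6, 2))
```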

4.2 Motivating Example

Consider the three tasks shown in Table II, to be executed on a 2-core processor. We assign the task-set with different algorithms and compute the energy consumption on each processor.

Task     | $C_i$ | $T_i$ | $U_i$
$\tau_1$ | 3     | 5     | 0.6
$\tau_2$ | 3     | 5     | 0.6
$\tau_3$ | 4     | 10    | 0.4
TABLE II: An example task-set.

4.2.1 SPA2

According to the algorithm description from Guan et al. [Guan:2010:FMS:1828428.1829220], the task-set is not schedulable under SPA2, because the average utilization per processor exceeds Liu & Layland's utilization bound [Liu:1973:SAM:321738.321743]:

$$U/M = (0.6 + 0.6 + 0.4)/2 = 0.8 > \ln 2 \approx 0.693.$$

4.2.2 PHD with Static DVS

With PHD, the task assignment is shown in Fig. 3(a): the first processor is fully used, while only 60% of the second processor is used. With static DVS, based on the model in equation (2), the first core runs at $f_{max} = 1\,\mathrm{GHz}$ and consumes $P(1.0) \times 10 = 16\,\mathrm{J}$ over 10 seconds, while the second core runs at $0.6\,\mathrm{GHz}$ and consumes $P(0.6) \times 10 \approx 4.08\,\mathrm{J}$. Thus, the total energy is approximately $20.08\,\mathrm{J}$.

4.2.3 PHD with Adaptive DVS

If we first set the frequency to $0.8 f_{max}$, the utilization of each task becomes (0.75, 0.75, 0.5) because of the prolonged execution. We then feed these adjusted values into the PHD algorithm. The task assignment is shown in Fig. 3(b): both cores are fully used, and each core's energy consumption over 10 seconds is $P(0.8) \times 10 \approx 8.58\,\mathrm{J}$. Thus, the total energy is approximately $17.16\,\mathrm{J}$, clearly less than the previous result of $20.08\,\mathrm{J}$. In this case, the strategy of deciding the frequency before scheduling satisfies the two conditions for energy minimization mentioned above.

From the example above, we can see that adaptively deciding the frequency before scheduling (Adaptive DVS) consumes much less energy than directly assigning tasks in a depth-first way (as PHD with static DVS does). At the same time, PHD with adaptive DVS manages to assign task-sets with larger total utilization than SPA2.
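These energy figures follow directly from the power model in equation (2); the quick check below recomputes them, assuming each core runs constantly at the stated frequency over the 10-second window.

```python
a, b = 1.52, 0.08                      # power model from equation (2)
P = lambda f: a * f**3 + b             # watts at frequency f (GHz)

# PHD + static DVS: core 1 full at f_max = 1.0, core 2 at 0.6 * f_max.
static_total = P(1.0) * 10 + P(0.6) * 10
# PHD + adaptive DVS: both cores full at 0.8 * f_max.
adaptive_total = 2 * P(0.8) * 10
print(static_total, adaptive_total)    # ~20.08 J vs ~17.16 J
```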

Fig. 3: Assignment for the task-set in Table II: (a) PHD+Static DVS, (b) PHD+Adaptive DVS.

Fig. 4: Execution for the task-set in Table II: (a) PHD+Static DVS, (b) PHD+Adaptive DVS.

4.3 Algorithm Description

Even though the two conditions for energy minimization are usually hard to satisfy exactly, we can still try to get as close as possible. Below are the detailed steps of the proposed adaptive DVS algorithm.

Step 1. We try to set every task's execution frequency to the same ideal level: $f = f_{ideal} = \frac{U}{M} \cdot f_{max}$.

Step 2. Some tasks cannot execute at such a low frequency because of their relatively high utilization, which might be greater than $f / f_{max}$; we therefore treat tasks in two different ways: if a task satisfies $U_i \le f / f_{max}$, we set its frequency to $f$. Otherwise, we set its frequency to $U_i \cdot f_{max}$, so that its prolonged execution time exactly fills its period.

Step 3. After changing the frequency of each task, we extend its execution time accordingly. For convenience in the next step, we regard the extended execution time as the new execution time and store the original one. Thus, for each task, $C_i' = C_i \cdot f_{max} / f_i$.

Step 4. We try the new task-set with the scheduling algorithm (PHD, for example). If it is schedulable, we have found the minimal frequency $f$ that is schedulable for the given task-set and processors. Otherwise, we gradually increase the frequency and repeat Steps 2-4 until the task-set is schedulable.

As a result, we could assign the tasks to processors as balanced as possible and achieve significant energy savings with the adaptive DVS algorithm. Experiments in the next section will show that such an algorithm could maintain the good schedulability of PHD while consuming much less energy.

The detailed description of the adaptive DVS algorithm for scheduling with task-splitting is shown in Algorithm 2.

1:  for each available $f$ in the range [$f_{ideal}$, $f_{max}$], in increasing order, do
2:     for each task $\tau_i$ do
3:        store the original execution time $C_i$
4:     end for
5:     for each task $\tau_i$ do
6:        if $U_i > f / f_{max}$ then
7:           $f_i$ ← $U_i \cdot f_{max}$
8:           $C_i$ ← $T_i$
9:        else
10:          $f_i$ ← $f$
11:          $C_i$ ← $C_i \cdot f_{max} / f$
12:       end if
13:    end for
14:    if the task-set is schedulable then
15:       Done
16:    else
17:       for each task $\tau_i$ do
18:          $C_i$ ← original $C_i$
19:       end for
20:    end if
21:  end for
ALGORITHM 2 Adaptive DVS for Scheduling with Task-Splitting
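Equivalently, a Python sketch of Algorithm 2 is given below; the schedulability test is a placeholder to be supplied by the caller (e.g., the PHD test), and the frequency search grid is our assumption.

```python
from typing import Callable, List, Tuple

def adaptive_dvs(C: List[float], T: List[float], f_ideal: float,
                 schedulable: Callable[[List[float], List[float]], bool],
                 f_max: float = 1.0, step: float = 0.01
                 ) -> Tuple[float, List[float], List[float]]:
    """Search upward from f_ideal for the lowest frequency at which the
    stretched task-set is schedulable; returns (f, per-task freqs, new C)."""
    f = f_ideal
    while f <= f_max + 1e-9:
        freqs, C_new = [], []
        for c, t in zip(C, T):
            u = c / t
            if u > f / f_max:
                # Task too heavy for f: give it just enough frequency
                # that its stretched execution time fills its period.
                freqs.append(u * f_max)
                C_new.append(t)
            else:
                freqs.append(f)
                C_new.append(c * f_max / f)
        if schedulable(C_new, T):
            return f, freqs, C_new
        f += step  # original C is never mutated, so no explicit restore
    raise ValueError("unschedulable even at f_max")

# Toy usage with the Table II task-set: accept iff the stretched total
# utilization fits on 2 cores (a stand-in for the real PHD test).
ok = lambda C_new, T: sum(c / t for c, t in zip(C_new, T)) <= 2.0
f, freqs, C_new = adaptive_dvs([3, 3, 4], [5, 5, 10], 0.8, ok)
print(f)  # -> 0.8, matching the motivating example
```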

5 Simulation

We have developed a simulator to evaluate the schedulability and energy savings from dynamic voltage scaling in a multi-core real-time system for both static DVS and adaptive DVS approaches on the two scheduling algorithms SPA2 and PHD.

5.1 Simulation Methodology

We developed a simulator for the operation of hardware capable of voltage and frequency scaling with real-time scheduling. The simulator takes the average utilization per core and the number of cores as input, and calculates the schedulability and energy consumption for each of the approaches we study: SPA2 with static DVS, PHD with static DVS, SPA2 with adaptive DVS, and PHD with adaptive DVS. Figure 5 outlines the simulator, which consists of four components: a task-set generator, schedulers for the different algorithms, a frequency allocator for the different DVS schemes, and an energy estimator.

Fig. 5: The working flow of our simulator.

The real-time tasks are specified using pairs $(C_i, T_i)$, indicating their worst-case computation time and period. The task-sets are generated as follows. Each task has an equal probability of having a period among [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]. This simulates the varied mix of short- and long-period tasks commonly found in real-time systems. The worst-case computation time is uniformly distributed over $[0, T_i]$. Finally, the task computation requirements are scaled by a constant chosen such that the sum of the utilizations of the tasks in the task-set reaches a desired value. We simulate each task-set on platforms with 2, 4, 8, or 16 cores. For a fixed number of cores $M$, we varied the average utilization per core from close to 0 up to 100%.
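A sketch of this task-set generator in Python, following the description above (the function name and the seed are ours):

```python
import random
from typing import List, Tuple

PERIODS = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100,
           200, 300, 400, 500, 600, 700, 800, 900, 1000]

def generate_taskset(n_tasks: int, target_util: float,
                     rng: random.Random) -> List[Tuple[float, float]]:
    """Generate (C, T) pairs, then scale the C values so the total
    utilization of the task-set hits target_util."""
    tasks = []
    for _ in range(n_tasks):
        T = rng.choice(PERIODS)       # equal probability over the period list
        C = rng.uniform(0, T)         # WCET uniform over [0, T]
        tasks.append((C, T))
    u = sum(c / t for c, t in tasks)
    scale = target_util / u
    return [(c * scale, t) for c, t in tasks]

# Example: a task-set for 8 cores at 70% average utilization per core.
ts = generate_taskset(20, 0.7 * 8, random.Random(42))
```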

5.2 Energy Calculation

To study the ideal effect of DVS, we first assume that the supply voltage is continuous; that is, the processor can execute at any frequency in the range $[f_{crit}, f_{max}]$, where $f_{crit}$ is the critical speed.

Then, for a more practical setting, we simulate the approaches under discrete voltages and frequencies. To model a real processor, we use the frequency/power settings of the Intel XScale [Xu:2004:PPE:1017753.1017767] (as shown in Table I) for the discrete frequencies.

Only the energy consumed by the processor is computed, and variations due to the different types of instructions executed are not taken into account. With this simplification, the task execution model reduces to counting execution cycles, and execution traces are not needed. In particular, we do not model preemption and task-switching overheads, or the time required to switch operating frequencies or voltages. There is little loss of generality from these simplifications: the preemption and task-switch overheads are the same with static DVS or adaptive DVS, so they have no (or very little) effect on the relative energy consumption numbers.

5.3 Simulation Results

We performed simulations for the four approaches: the two static DVS approaches (PHD+Static DVS and SPA2+Static DVS) and the two adaptive DVS approaches (PHD+Adaptive DVS and SPA2+Adaptive DVS). Each utilization and core-count level in the results is the average over 10,000 simulated task-sets.

Fig. 6: Schedulability simulation results under continuous voltages/frequencies: (a) 2 cores, (b) 4 cores, (c) 8 cores, (d) 16 cores.

Fig. 7: Schedulability simulation results under discrete voltages/frequencies: (a) 2 cores, (b) 4 cores, (c) 8 cores, (d) 16 cores.

5.3.1 Schedulability

Fig. 6 shows the schedulability simulation results for all approaches under continuous frequencies/voltages. Schedulability here is the probability that a task-set with a specified total utilization is schedulable on a specified number of processors under a given algorithm.

From Fig. 6, we can see clearly that all algorithms have a schedulability of 100% when the average utilization is under 70%, which corresponds to the utilization bounds given by previous work: 65% for PHD and 69.3% for SPA2. However, the two scheduling algorithms differ remarkably once the average utilization exceeds 70%. SPA2 does not perform as well beyond its utilization bound, due to the severe restrictions that Liu & Layland's bound [Liu:1973:SAM:321738.321743] imposes during scheduling. On the contrary, PHD has much better schedulability at higher utilization, because it fills every processor as full as possible without unnecessary restrictions. Of course, it is difficult for either algorithm to schedule most task-sets when the total utilization equals the number of processors, so the schedulability of almost all algorithms decreases to zero when the average utilization reaches 100%.

In addition, we find that when we perform DVS (statically or adaptively) makes little difference to schedulability. After all, adaptive DVS tries all frequencies until the task-set is schedulable or the frequency becomes maximal, as in static DVS, so any task-set that is schedulable with the static DVS algorithms is also schedulable with the adaptive DVS algorithms. Conversely, since adaptive DVS reduces frequency by prolonging task execution times, a task-set that is schedulable with adaptive DVS but not with static DVS is hard to find, although rare instances do exist. On the whole, the schedulability of static DVS and adaptive DVS is at approximately the same level, as the simulation results show.

To model a real processor, we also simulate the algorithms with discrete voltages and frequencies, as in the XScale processor mentioned above. The schedulability results are shown in Fig. 7. They are almost the same as the previous results under continuous voltages, because the frequency quantization is not significant enough to affect the schedulability statistics.

Fig. 8: Energy simulation results under continuous voltages/frequencies: (a) 2 cores, (b) 4 cores, (c) 8 cores, (d) 16 cores.

Fig. 9: Energy simulation results under discrete voltages/frequencies: (a) 2 cores, (b) 4 cores, (c) 8 cores, (d) 16 cores.

5.3.2 Energy

Fig. 8 shows the energy numbers for all approaches, normalized to the energy consumed at the processor's maximal frequency.

From Fig. 8, we note that when the average utilization is close to 0, the energy numbers of all algorithms approach 0 as expected, because there are few tasks to execute and DVS reduces the frequency toward 0 to save energy (in the ideal case). When the average utilization reaches 100%, the normalized energy numbers of all algorithms reach 1, because all cores must be full and kept at maximal frequency for the task-set to remain schedulable.

When the utilization gets close to about 50%, PHD with static DVS gradually shows much higher energy consumption than the other three techniques. This is because PHD greedily assigns tasks to as few processors as possible and keeps the remaining cores idle, while DVS has no effect on a core that is completely full or completely empty. As a result, the normalized energy of PHD with static DVS approximately equals the average utilization per core, showing an almost linear relationship.

The other three approaches distribute tasks more evenly across all the processors, in different ways, and therefore achieve considerable energy savings: the SPA2 variants assign tasks in a breadth-first way, while the adaptive DVS variants raise every task's utilization as high as possible, forcing the assignment to fill all the cores.

In all cases, we found that PHD with adaptive DVS achieves the most energy savings, while PHD with static DVS achieves the least. For example, in the 8-core case at 70% utilization, PHD with adaptive DVS reduces energy by 56.7%, compared to only 32.7% savings for PHD with static DVS. The savings of the other two techniques (SPA2 with static and adaptive DVS) are roughly 50%. The explanation is that pre-allocation of frequencies combined with depth-first assignment and task-splitting produces the most balanced schedule, thus taking the best advantage of all the cores.

For a more practical scenario, we also simulate the algorithms with discrete voltages/frequencies, as in the XScale processor mentioned above. The results are shown in Fig. 9. The energy savings become smaller for all algorithms compared to the continuous case, because the set of frequencies a processor can choose from is much more restricted. Among the four algorithms, the energy savings of PHD with static DVS decrease the least, because it was already near the worst case under continuous frequencies and could hardly get worse. PHD with adaptive DVS remains the best approach, although the gap between the best and worst techniques shrinks.

From the simulation results, we can see that it is practical to apply energy-saving techniques such as DVS to multi-core scheduling algorithms with task-splitting. Although all four approaches we studied can save considerable energy with DVS, the PHD scheduling algorithm with adaptive DVS shows both excellent schedulability and the best energy savings among all the approaches.

6 Related Work

Multi-core scheduling schemes for real-time systems can be classified into global and partitioned approaches. In global scheduling, all tasks are put in a global queue and each processor selects from the queue the task with the highest priority for execution. In partitioned scheduling, each task is assigned to a specific processor and each processor fetches tasks for execution from its own queue. It has been shown that each of these categories has its own advantages and disadvantages [Lauzac98comparisonof]. Global scheduling schemes can better utilize the available processors, as illustrated by PFair [Baruah:1998:PSG:626526.627187] and LLREF [Cho:2006:ORS:1193218.1194408]; these schemes appear to be best suited for applications with small working-set sizes. On the other hand, partitioned approaches are severely limited by the low utilization bounds associated with bin-packing problems. The advantage of these schemes is their stronger processor affinity, and hence they provide better average response times for tasks with larger working-set sizes.

Global scheduling schemes based on rate-monotonic scheduling (RMS) and earliest deadline first (EDF) are known to suffer from the so-called Dhall effect. When heavyweight (high-utilization) tasks are mixed with lightweight (low-utilization) tasks, conventional real-time scheduling schemes can yield arbitrarily low utilization bounds on multiprocessors. By dividing the task-set into heavy-weight and lightweight tasks, the RM-US [Andersson01static-priorityscheduling] algorithm achieves a utilization bound of 33% for fixed-priority global scheduling. These results have been improved with a higher bound of 37.5% [Lundberg:2002:AFG:827265.828503]. The global EDF scheduling schemes have been shown to possess a higher utilization bound of 50% [Baker:2005:AES:1070609.1070737]. PFair scheduling algorithms based on the notion of proportionate progress [Baruah:1993:PPN:167088.167194] can achieve the optimal utilization bound of 100%. Despite the superior performance of global schemes, significant research has also been devoted to partitioned schemes due to their appeal for a significant class of applications, and their scalability to massive multi-cores, while exploiting cache affinity.

Partitioned multiprocessor scheduling techniques have largely been restricted by the underlying bin-packing problem. The utilization bound of strictly partitioned scheduling schemes is known to be 50%. This optimal bound has been achieved both for fixed-priority algorithms [10.1109/EMRTS.2003.1212725] and for dynamic-priority algorithms based on EDF [Lopez:2004:UBE:1008193.1008208]. Most modern multi-core processors provide some level of data sharing through shared levels of the memory hierarchy; therefore, it can be useful to split a bounded number of tasks across processing cores to achieve higher system utilization [NizR06]. Partitioned dynamic-priority scheduling schemes with task-splitting have been explored in this context [Andersson:2006:MSF:1157741.1158329, Kato:2007:RST:1306877.1307300], and partitioned fixed-priority scheduling schemes with task-splitting have also been explored recently [Lakshmanan:2009:PFP:1581378.1581523, Guan:2010:FMS:1828428.1829220]. Lakshmanan et al. [Lakshmanan:2009:PFP:1581378.1581523] showed that the cache overheads due to task-splitting can be expected to be negligible on multi-core platforms. However, few works have considered the problem of energy consumption when scheduling with task-splitting.

On the other hand, there are also many works on energy-aware scheduling for real-time systems [Aydin:2003:EPM:838237.838347, Mei-2013, Fan-SAC13, Wu-HPCC13], which show that DVS can achieve significant energy savings. But none of these energy-aware techniques has considered task-splitting strategies, which provide better utilization of the available processors. In this paper, we focus on the energy aspects and explore energy-aware partitioned fixed-priority scheduling schemes with task-splitting. Our previous work [Lu-RTCSA13] showed preliminary and promising results on this topic.

7 Conclusion

In this paper, we have explored the possibility of combining dynamic voltage/frequency scaling with semi-partitioned fixed-priority multi-core scheduling with task-splitting for real-time systems. We proposed two techniques to apply DVS to multi-core scheduling approaches with task-splitting: performing DVS after scheduling (Static DVS) and performing DVS before scheduling (Adaptive DVS).

We simulated the proposed techniques under different processor setups. Simulation results show that it is possible to achieve significant energy savings with DVS while preserving the schedulability requirements of real-time schedulers for multi-core processors.

There are several areas we would like to explore to improve the current approach. First, the rounding of frequencies under real frequency settings may waste processor resources, and in turn result in more energy consumption and even weaker schedulability; we will therefore explore taking the discrete frequency settings as constraints in the energy optimization, to alleviate the loss from frequency rounding. Second, at present we evaluate our DVS strategies only in simulation, where many factors cannot be modeled precisely compared to the real case. In the future, we plan to run our DVS strategies on real multi-core processors to produce more accurate performance and energy evaluation results.

Acknowledgments

This work is supported in part by the National Basic Research Program of China (973) under Grant No. 2009CB320703, the Science Fund for Creative Research Groups of China under Grant No. 60821003, the National Natural Science Foundation of China under Grant No. 61103026, and the National High Technology Research and Development (863) Program of China under Grant No. 2011AA01A202.

References