Online Starvation Mitigation to Balance Average Flow Time and Fairness

12/29/2021
by   Tung-Wei Kuo, et al.
National Chengchi University

In job scheduling, it is well known that Shortest Remaining Processing Time (SRPT) minimizes the average flow time. However, SRPT may cause starvation and unfairness. To balance fairness and average flow time, one common approach is to minimize the ℓ_2 norm of flow time. All non-trivial algorithms designed for this problem are offline algorithms based on linear programming rounding. For the online setting, all previous works consider standard scheduling algorithms under the assumptions of speed augmentation or certain input distributions. In their seminal paper, Bansal and Pruhs prove that under speed augmentation, fairness is not sacrificed much when SRPT is used [SICOMP 2010]. However, in practice, to achieve better fairness, it is not uncommon to complement SRPT with some starvation mitigation mechanism. Nonetheless, starvation mitigation inevitably destroys SRPT's optimality in minimizing the average flow time. Thus, it is not clear whether starvation mitigation can improve SRPT's performance on minimizing the ℓ_2 norm of flow time. In this paper, we answer this question in the affirmative. Let n be the number of jobs. We use an estimate of n to carefully mitigate the starvation caused by SRPT. Given a good estimate of n, our starvation mitigation mechanism reduces the competitive ratio of SRPT for the ℓ_2 norm of flow time from Ω(n^1/2) to O(n^1/3). Finally, we remark that all the online algorithms considered previously for this problem have competitive ratios Ω̃(n^1/2).


1 Introduction

1.1 Background

When a job is submitted to a server, the response time (i.e., the amount of time between job release and job completion) is usually the primary concern. In job scheduling terminology, response time is also termed the flow time. To improve the average quality of service, an obvious approach is to minimize the average flow time. It is well known that Shortest Remaining Processing Time (SRPT) minimizes the average flow time. However, when the average flow time is minimized, some jobs may starve, which results in inequitable flow times. In their classic textbook, Operating System Concepts, Silberschatz, Galvin, and Gagne argue that a system with reasonable and predictable response time may be considered more desirable than a system that is faster on the average, but is highly variable [26]. To avoid starvation and improve fairness, one possible approach is to minimize the maximum flow time, which can be done by First-Come-First-Served (FCFS) [6]. However, when the maximum flow time is minimized, the average flow time may increase significantly and degrade the system-wide efficiency.
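The trade-off between SRPT and FCFS can be illustrated with a small simulation. The following Python sketch is not taken from the paper. It assumes integer release and processing times and a single server that runs one unit of work per time slot, and the names `simulate`, `srpt`, and `fcfs` are hypothetical helpers introduced only for this illustration.

```python
def simulate(jobs, pick):
    """Unit-slot, single-server simulation (illustrative assumption).

    jobs: list of (release_time, processing_time), non-negative integers.
    pick: function (active, remaining, jobs) -> index of the job to run.
    Returns the flow time (completion time minus release time) of every job.
    """
    remaining = [p for _, p in jobs]
    # A job with zero processing time is complete at its release time.
    completion = [r if p == 0 else None for r, p in jobs]
    t = 0
    while any(c is None for c in completion):
        active = [j for j, (r, _) in enumerate(jobs)
                  if r <= t and completion[j] is None]
        if active:
            j = pick(active, remaining, jobs)
            remaining[j] -= 1              # run job j during time slot t
            if remaining[j] == 0:
                completion[j] = t + 1      # slot t ends at time t + 1
        t += 1
    return [completion[j] - jobs[j][0] for j in range(len(jobs))]


def srpt(active, remaining, jobs):
    # Shortest remaining processing time first, ties broken by index.
    return min(active, key=lambda j: (remaining[j], j))


def fcfs(active, remaining, jobs):
    # Earliest release time first, ties broken by index.
    return min(active, key=lambda j: (jobs[j][0], j))


# One long job released at time 0, followed by a stream of unit jobs.
jobs = [(0, 3)] + [(t, 1) for t in range(6)]
flow_srpt = simulate(jobs, srpt)
flow_fcfs = simulate(jobs, fcfs)
print("SRPT:", flow_srpt, "avg", sum(flow_srpt) / len(jobs), "max", max(flow_srpt))
print("FCFS:", flow_fcfs, "avg", sum(flow_fcfs) / len(jobs), "max", max(flow_fcfs))
```

On this instance SRPT produces flow times [9, 1, 1, 1, 1, 1, 1] (average 15/7, maximum 9), while FCFS produces [3, 4, 4, 4, 4, 4, 4] (average 27/7, maximum 4): the long job starves under SRPT, and every short job waits under FCFS.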

To balance the average flow time (or equivalently, the ℓ_1 norm of flow time) and the maximum flow time (or equivalently, the ℓ_∞ norm of flow time), a natural approach is to minimize the ℓ_2 norm of flow time. For the offline setting, there are constant factor approximation algorithms for minimizing the general ℓ_p norm of flow time [18, 5, 12]. All the non-trivial offline algorithms for minimizing the ℓ_2 norm of flow time are based on linear programming rounding [18, 5, 12, 3]. For the online setting, only standard scheduling algorithms, including SRPT, Shortest Job First (SJF), Round Robin (RR, which shares the machine equally among all jobs), and Shortest Execution Time First (SETF), have been studied, and these analyses are based on speed augmentation [4, 17]. In particular, Bansal and Pruhs prove that SRPT is (1+ε)-speed O(1/ε)-competitive for the ℓ_p norm of flow time, for all p ≥ 1 and ε > 0 [4]. (Footnote 1: The competitive ratio of an algorithm A for minimizing an objective f is max_I f(A(I)) / f(OPT(I)), where f(A(I)) is the objective value achieved by algorithm A on problem instance I and OPT is the optimal algorithm. In addition, A is s-speed c-competitive if its competitive ratio is at most c when it is running on a machine that is s times faster than the original machine.) On the other hand, some experimental results indicate that when the job size distribution is heavy tailed, the fear that SRPT may cause starvation is unfounded [2, 9, 15, 25, 4, 7].

1.2 Motivation: The Need to Mitigate Starvation for SRPT in Practice

The above results show that under the assumptions of speed augmentation or special job size distributions, fairness may not be a serious concern for SRPT. (Footnote 2: In fact, SRPT, SJF, SETF, and RR are all -competitive for minimizing the general ℓ_p norm of flow time under -speed augmentation.) However, in practice, it is not uncommon to further improve SRPT's fairness by some starvation mitigation mechanism. For example, Mangharam et al. observe that SRPT may cause unfairness in multimedia transmission [22], and some SRPT-based schedulers have starvation mitigation mechanisms [22, 27, 10, 21, 11, 23].

Nonetheless, starvation mitigation inevitably destroys the optimality of SRPT for the ℓ_1 norm of flow time. Because the ℓ_2 norm of flow time balances the ℓ_1 norm of flow time and the ℓ_∞ norm of flow time, it is unclear whether starvation mitigation is beneficial for SRPT to minimize the ℓ_2 norm of flow time. As examples, we detail two natural starvation mitigation methods that cannot improve SRPT's ℓ_2 norm of flow time in Section 6. Thus, we have the following question, which is of both theoretical and practical interest:

Question 1.1.

For SRPT, can starvation mitigation improve the ℓ_2 norm of flow time?

1.3 Our Result

In this paper, we assume that a good estimate of n is available. For example, job scheduling is usually critical only during peak hours. Moreover, in many systems, the job arrival process is a Poisson process. Thus, the product of the job arrival rate and the duration of peak hours would be a good estimate of n. Nonetheless, in this study, we do not make any assumptions about the job arrival process or the estimation method. Instead, we assume that an estimate of n with bounded error is given. Specifically, for any and that satisfy , an estimate of n is termed a -estimate of n if . and can be viewed as the estimation error rates.

In this paper, we propose SRPT with Starvation Mitigation (SRPT/SM). We use the estimate of n to mitigate the starvation caused by SRPT without sacrificing the ℓ_1 norm of flow time too much. Recall that SRPT is optimal for the ℓ_1 norm of flow time, and FCFS is optimal for the ℓ_∞ norm of flow time. Our algorithm can also be viewed as a combination of SRPT and FCFS. Specifically, when a job has been waiting for a sufficiently long time, it becomes an urgent job. Our algorithm follows SRPT when there is no urgent job. When there is some urgent job, the job that becomes urgent first is served first.

It can be shown easily that the competitive ratio of SRPT for minimizing the ℓ_2 norm of flow time is Ω(n^{1/2}). We then answer Question 1.1 in the affirmative. Given a good estimate of n, SRPT/SM is O(n^{1/3})-competitive for minimizing the ℓ_2 norm of flow time. Specifically, we prove the following result.

Theorem 1.2.

SRPT/SM is -competitive for minimizing the ℓ_2 norm of flow time.

We remark that all the aforementioned online algorithms for this problem have competitive ratios .

1.4 Definitions

We consider the online problem of minimizing the ℓ_2 norm of flow time. In this problem, there are n jobs, J_1, J_2, …, J_n, and one server, and each job J_j has a processing time p_j and a release time r_j. As in [3], we assume that p_j and r_j are in ℕ. (Footnote 3: In this paper, we assume ℕ contains 0.) For any schedule S, we use C_j^S to denote the completion time of job J_j under S. The flow time of a job J_j under schedule S, denoted by F_j^S, is C_j^S − r_j. Define F(S) = Σ_{j=1}^{n} (F_j^S)^2. The goal is to compute a schedule S that minimizes the ℓ_2 norm of flow time, i.e., √F(S). Thus, any c-competitive algorithm for minimizing F(S) is √c-competitive for minimizing the ℓ_2 norm of flow time. Throughout this paper, we use , and , to denote the schedule obtained by the proposed algorithm (i.e., SRPT/SM), SRPT, and the optimal solution, respectively. A job J_j is said to be active at time t under schedule S, if it is released by time t but has not yet been completed by time t under S. We use to denote the index set of the active jobs under S at time t.
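The objective above can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names `lp_norm` and `sum_of_squares` are hypothetical, and the flow-time vector is a made-up example rather than data from the paper.

```python
import math


def lp_norm(flow_times, p):
    """l_p norm of a vector of flow times; p = math.inf gives the maximum."""
    if p == math.inf:
        return max(flow_times)
    return sum(f ** p for f in flow_times) ** (1.0 / p)


def sum_of_squares(flow_times):
    """Sum of squared flow times, i.e. the square of the l_2 norm."""
    return sum(f * f for f in flow_times)


# Hypothetical flow times: one starved job and six quickly served jobs.
flows = [9, 1, 1, 1, 1, 1, 1]

print(lp_norm(flows, 1))         # total flow time (n times the average)
print(lp_norm(flows, 2))         # l_2 norm, the objective studied here
print(lp_norm(flows, math.inf))  # maximum flow time
```

Because the ℓ_2 norm is the square root of the sum of squares, a ratio of at most c between two schedules in the sum of squares translates into a ratio of at most √c in the ℓ_2 norm, which is why the analysis can work with squared flow times.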

For every , the time slot is defined as the time interval between time and time . Thus, we can divide time into time slots . We can view each job as a chain of tasks , where task has unit processing time and a release time . Thus, every schedule must execute tasks of the same job in increasing order of release times. Because all the processing times and release times are in , by a simple exchange argument, we can assume that under the optimal schedule, the server never executes more than one task in time slot for any (i.e., the server is either idle or is executing the same task throughout the entire time slot ). Thus, in this paper, we view a schedule as a function that maps every to some (possibly empty) task executed in time slot . If a task is executed in time slot , then it is completed at time . A task is said to be queued at time under schedule , if it is released by time but has not yet been completed by time under . For any schedule and any time , we use to denote the number of queued tasks of under at time .

In this paper, we frequently consider functions from a subset of to . (Footnote 4: In this paper, for any with , is defined as . If , then .) In this paper, we call these functions maps. Thus, is a map that maps every to the number of 's queued tasks at time under . For any map , denotes the domain of , and denotes the map such that and for any . For any map and any set , the restriction of to , denoted by , is a map from to such that for any .

Definition 1.3.

For any two maps and , dominates (or is dominated by ), denoted by , if and .

Definition 1.4.

Let be any map. Define as a function that maps any to the element in that has the th largest output of (ties can be broken arbitrarily). Thus, and .

Definition 1.5.

Let be any map. Define . Moreover,

For any , define

1.5 Our Techniques

We consider two lower bounds of . At every time , if there is some queued task, any schedule should execute some queued task in time slot . A schedule satisfying the above property is called an efficient schedule. All efficient schedules have the same number of queued tasks at any time. Let be the number of queued tasks at time under any efficient schedule. We have the following simple lower bound of , whose proof can be found in Appendix B.
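The claim that all efficient schedules have the same number of queued tasks at any time can be checked numerically. The sketch below is an illustration, not the paper's proof. It assumes that the k-th unit task of a job with release time r is released at time r + k − 1 (the release-time formula is not legible in this text, so this is an assumption), and the names `released`, `total_queued_trace`, `srpt_priority`, and `fcfs_priority` are hypothetical.

```python
def released(job, t):
    """Number of unit tasks of `job` released by time t.

    Assumption: task k (k = 1..p) of a job (r, p) is released at r + k - 1.
    """
    r, p = job
    return min(p, max(0, t - r + 1))


def total_queued_trace(jobs, priority, horizon):
    """Total number of queued tasks at times 0..horizon-1 under an
    efficient schedule that always runs the queued job of best priority."""
    done = [0] * len(jobs)   # completed tasks per job
    trace = []
    for t in range(horizon):
        queued = [released(job, t) - done[j] for j, job in enumerate(jobs)]
        trace.append(sum(queued))
        candidates = [j for j, q in enumerate(queued) if q > 0]
        if candidates:       # efficient: never idle while a task is queued
            j = min(candidates, key=lambda i: priority(i, jobs, done))
            done[j] += 1     # execute one task of job j in time slot t
    return trace


def srpt_priority(j, jobs, done):
    # Fewest remaining tasks first.
    return (jobs[j][1] - done[j], j)


def fcfs_priority(j, jobs, done):
    # Earliest release time first.
    return (jobs[j][0], j)


jobs = [(0, 3), (0, 1), (1, 2), (4, 1)]
trace_srpt = total_queued_trace(jobs, srpt_priority, horizon=10)
trace_fcfs = total_queued_trace(jobs, fcfs_priority, horizon=10)
print(trace_srpt)
print(trace_fcfs)
```

Both policies yield the same trace because the total only depends on how many tasks have been released and on whether the server was busy, not on which queued task was chosen.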

Lemma 1.6.

On the other hand, observe that if all jobs are released concurrently, jobs should be executed in increasing order of processing times. Fix some time , and focus on the active jobs at time under . To derive the second lower bound (Lemma 2.3), we set for every , and optimize using the previous observation. In our analysis, we set to be the time at which SRPT/SM has the most urgent tasks. (Footnote 5: If a job becomes urgent, all its remaining tasks become urgent as well.)

For each lower bound, there exist instances where the ratio of to the lower bound is . Thus, the main challenge is to combine these two lower bounds and to compare the combined lower bound to . In our analysis, we consider a trimmed instance of the original instance. To construct , we remove all the tasks released after time from the original instance. (Footnote 6: Thus, jobs released after time have zero processing time and no tasks in .) Let be the schedule obtained by applying SRPT to . Because a schedule can be viewed as a function that maps every to the (possibly empty) task executed in time slot and is identical to the original instance by time , is also an efficient schedule for the original instance by time . (Footnote 7: A schedule is efficient for an instance by time if it satisfies the following constraint: For any , if there is a queued task at time in , then executes some queued task in in time slot .) In particular, is the number of queued tasks of at time under in the original instance. We will show that majorizes (Lemma 3.7).

Definition 1.7.

For any two maps and , majorizes , denoted by , if the following two conditions are satisfied.

  • .

  • .

Observe that if , then . The notion of majorization was first studied by Hardy et al. [16], and Golovin et al. use majorization to study all symmetric norms of flow time [14]. The original definition of majorization deals with vectors instead of maps. For example, in [14], the notion of majorization is applied to vectors consisting of the flow time of all jobs. In this paper, we do not directly consider the flow time. Instead, we consider the number of queued tasks and the number of remaining tasks. In addition, we will consider the restriction of a map to some subset of (e.g., ). Thus, we use maps instead of vectors.
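For readers unfamiliar with majorization, the classical vector version can be sketched in Python as follows. This is the textbook definition attributed to Hardy et al., not the map-based Definition 1.7, whose exact conditions are not legible in this text; the function name `majorizes` is hypothetical.

```python
def majorizes(x, y):
    """Classical majorization for equal-length vectors.

    x majorizes y if both have the same total and, after sorting each in
    non-increasing order, every prefix sum of x is at least the
    corresponding prefix sum of y.
    """
    if len(x) != len(y) or sum(x) != sum(y):
        return False
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    prefix_x = prefix_y = 0
    for a, b in zip(xs, ys):
        prefix_x += a
        prefix_y += b
        if prefix_x < prefix_y:
            return False
    return True


concentrated = [4, 0, 0]
spread = [2, 1, 1]
print(majorizes(concentrated, spread))  # the concentrated vector dominates
print(majorizes(spread, concentrated))
print(sum(v * v for v in concentrated), sum(v * v for v in spread))
```

A standard consequence of the classical definition is that the majorizing vector has a sum of squares at least as large as the majorized one (here 16 versus 6), which is what makes the notion useful for comparing ℓ_2-type objectives.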

In this paper, we show that if , then for any map dominated by , there exists a map such that , , and (Lemma 3.11). The existence of a map that satisfies the above three constraints is critical in our analysis. In addition to , we give two more sufficient conditions for the existence of a map that satisfies similar constraints (Lemmas 3.5 and 3.9). These two sufficient conditions are used when we consider the relationship between and and the relationship between and . As a result, we can obtain the relationship between and , which will be used to combine the two lower bounds and to compare the combined lower bound to .

1.6 Related Results

For the ℓ_1 norm of flow time, SRPT is optimal. For the ℓ_∞ norm of flow time, FCFS is optimal [6]. In [4], SRPT, SJF, and SETF are shown to be scalable for the general ℓ_p norm of flow time. (Footnote 8: A scheduling algorithm is scalable if for all ε > 0, it is (1+ε)-speed c-competitive for some constant c (which may depend upon ε) [24, 20].) The results are further extended to all symmetric norms of flow time [14] and identical machines [8, 13, 17]. Specifically, when there are multiple identical machines, SRPT is scalable for the general ℓ_p norm of flow time [13], and RR is -speed -competitive for the norm of flow time [17]. In [3, 19, 1], more general objective functions are considered. Specifically, for a job with flow time , a cost is incurred. The only restriction on is that must be non-decreasing. For this general cost minimization problem, there is an -speed -competitive algorithm [19, 1] and an -approximation algorithm [3]. Finally, for minimizing the general ℓ_p norm of flow time, there are constant factor approximation algorithms [18, 12, 5].

2 The Algorithm and the Competitive Analysis

2.1 The Algorithm: SRPT/SM

For brevity, we define as . Let (respectively, ) be the set of tasks executed in time slots under SRPT (respectively, SRPT/SM). Define . Note that SRPT/SM can obtain by simulating SRPT.

To avoid starvation, SRPT/SM categorizes active jobs into two types, urgent and normal. An active job is said to be urgent at time if ; otherwise, is said to be normal. Thus, every job is normal initially. Observe that once a job becomes urgent, the job will always be urgent until completion. In addition, a job can only become urgent after time . If a job never becomes urgent, it is termed a Finished-as-Normal (FaN) job; otherwise, it is termed a Finished-as-Urgent (FaU) job. We define as follows: if is an FaU job, is the time at which becomes urgent. That is, is the smallest that satisfies . If is an FaN job, .

When there are urgent jobs, SRPT/SM executes the urgent job that has the smallest . In other words, SRPT/SM executes the job that becomes urgent first. If there is no urgent job, SRPT/SM then follows SRPT. More specifically, to decide the task to be executed in time slot when there is no urgent job, we first find an arbitrary task . SRPT/SM then executes job in time slot . Notice that if then no job needs to be executed in time slot . In the pseudocode of SRPT/SM, we use to denote the set of urgent jobs and initially . We report some simulation results in Appendix A.

while true do
    for every active job  do
        if  and  then
            Add to
    if  then
        Execute job in time slot
        if  is completed at time  then
            Remove from
    else if  then
        Pick a task from
        Execute in time slot
Algorithm 1 SRPT/SM
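The overall control flow of SRPT/SM (urgent jobs are served in the order in which they became urgent, otherwise SRPT is followed) can be sketched in Python under explicit assumptions. The urgency condition is not legible in this text, so the sketch substitutes a hypothetical rule: a job becomes urgent once it has waited `threshold` time slots since its release. The sketch also runs plain SRPT on its own remaining work when no job is urgent, which is a simplification of the paper's mechanism of following a simulated SRPT schedule. The function name `srpt_sm` and the parameter `threshold` are illustrative only.

```python
from collections import deque


def srpt_sm(jobs, threshold):
    """Simplified SRPT with starvation mitigation (illustrative sketch).

    jobs: list of (release_time, processing_time), non-negative integers.
    threshold: HYPOTHETICAL waiting time after which a job becomes urgent;
               the paper derives its rule from an estimate of n.
    Returns the flow time of every job.
    """
    remaining = [p for _, p in jobs]
    completion = [r if p == 0 else None for r, p in jobs]
    urgent = deque()                  # urgent jobs, oldest urgency first
    is_urgent = [False] * len(jobs)
    t = 0
    while any(c is None for c in completion):
        active = [j for j, (r, _) in enumerate(jobs)
                  if r <= t and completion[j] is None]
        for j in active:              # promote long-waiting jobs
            if not is_urgent[j] and t - jobs[j][0] >= threshold:
                is_urgent[j] = True
                urgent.append(j)
        if urgent:
            j = urgent[0]             # job that became urgent first
        elif active:
            j = min(active, key=lambda i: (remaining[i], i))  # SRPT
        else:
            t += 1                    # nothing to run in this slot
            continue
        remaining[j] -= 1             # execute job j in time slot t
        if remaining[j] == 0:
            completion[j] = t + 1
            if is_urgent[j]:
                urgent.popleft()      # j is the head of the urgent queue
        t += 1
    return [completion[j] - jobs[j][0] for j in range(len(jobs))]


# One job of size 3 plus a stream of unit jobs released at times 0..11.
jobs = [(0, 3)] + [(t, 1) for t in range(12)]
flows_sm = srpt_sm(jobs, threshold=4)
flows_srpt = srpt_sm(jobs, threshold=10**9)   # never urgent: plain SRPT
print(flows_sm, sum(f * f for f in flows_sm))
print(flows_srpt, sum(f * f for f in flows_srpt))
```

On this particular instance and with this hypothetical threshold, the long job's flow time drops from 15 to 7 and the sum of squared flow times drops from 237 to 181, at the cost of a larger total flow time (43 instead of 27). This single example says nothing about competitive ratios; it only shows the kind of trade-off the algorithm is designed to exploit.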

2.2 Analysis of SRPT/SM

A job ’s post-urgent flow time, denoted by , is defined as , and ’s pre-urgent flow time, denoted by , is defined as . can then be expressed as

Obviously, . Based on Lemma 1.6, we give an upper bound of in the following lemma, whose proof can be found in Appendix B.

Lemma 2.1.

.

By the above lemma, to prove Theorem 1.2, it suffices to show . Note that if is an FaN job. Next, we give an upper bound of for an FaU job . Once becomes urgent, all the remaining tasks of are said to be urgent as well. Note that at time , all the remaining tasks of have been released. Let be the number of urgent tasks that are queued at time . Under SRPT/SM, at time , waits at most time slots before completion. Thus, and . Define . We then have and thus . As a result, to prove Theorem 1.2, it is sufficient to show

(1)

We use the following two lemmas to prove Eq. (1).

Lemma 2.2.

There exists a map that satisfies the following properties:

P1:

.

P2:

.

P3:

.

Lemma 2.3.

Let be the map defined in Lemma 2.2. We then have

Proof of Eq. (1):

For simplicity, we reindex jobs so that

(2)

We then have

(by Lemma 2.3 and P3 in Lemma 2.2)
(by the AM–GM inequality)
(by the Cauchy–Schwarz inequality)

The proof of Lemma 2.3 can be found in Appendix B. The remainder of this paper is mainly devoted to the proof of Lemma 2.2.

2.3 Proof Sketch of Lemma 2.2

To prove Lemma 2.2, we only need to focus on the case where . This is because when , we can simply set for every to satisfy all the properties in Lemma 2.2. Thus, in the following proof, we assume . To prove Lemma 2.2, we first construct a map such that:

  • .

  • .

The lower bound is obtained by Lemma 1.6. For brevity, we introduce the notation .

Definition 2.4.

Let and be any two maps. We write if the following two properties hold:

  • .

  • for some constant .

Thus, to prove Lemma 2.2, it suffices to construct a map such that:

  • .

  • .

To this end, we first show that there exists a map such that and . Recall the schedule defined in Section 1.5. We then show that there exists a map such that and . Thus, . Finally, we construct the desired map based on the properties of and the fact . The constructions of the above four key maps (i.e., , , , and ) are given in Section 3.

3 The Four Key Maps

3.1 The First Key Map:

Definition 3.1.

Let be any map such that . Let be any integer in . Let be the least integer such that . The truncation of at , denoted by , is a map dominated by such that

Finally, is said to be a valid truncation if .

Example 3.2.

Assume and . We then have and . If , then , , and . For all , .

To construct , we first introduce a new map that dominates . For every and , we define as follows. If is an FaN job, . If is an FaU job and , . If is an FaU job and , . Notice that when , we have because the final task of is released before time . Thus, . We stress that the definition of can be applied to . We have the following lemma, whose proof can be found in Appendix C.

Lemma 3.3.

Assume . Let . Let . Let be a map such that if and if . We then have

  • .

  • .

3.2 The Second Key Map:

The proof of the following lemma can be found in Appendix C.

Lemma 3.4.

For every , .

The proof of the following lemma can be found in Appendix D.

Lemma 3.5.

Let and be two maps that satisfy the following two constraints:

T1:

.

T2:

.

Then for any map dominated by , there exists a map such that:

  • .

  • .

The following lemma proves the existence of .

Lemma 3.6.

Let be the map defined in Lemma 3.3. There exists a map such that:

  • .

  • .

Proof.

Consider Lemma 3.5 and fix . and . By Lemma 3.4, T1 holds. T2 holds because . By the definition of in Lemma 3.3, is dominated by . Thus, the proof follows from Lemma 3.5. ∎

3.3 The Third Key Map:

We give the proofs of the following two lemmas in the following sections.

Lemma 3.7.

Let be any efficient schedule for the original instance. Then .

Lemma 3.8.

Let be any map dominated by . Let . If , then .

The proof of the following lemma can be found in Appendix D.

Lemma 3.9.

Let and be two maps that satisfy the following three constraints:

D1:

.

D2:

.

D3:

Let . If , then .

Then there exists a map such that:

  • .

  • .

We are now ready to prove the following lemma.

Lemma 3.10.

Let be the map defined in Lemma 3.6. There exists a map such that:

  • .

  • .

Proof.

Consider Lemma 3.9 and fix as the map defined in Lemma 3.6, and fix . Because , D1 holds. By Lemma 3.7, . Because , D2 holds. By Lemma 3.8, D3 holds. The proof then follows from Lemma 3.9. ∎

3.4 The Final Key Map:

The proof of the following lemma can be found in Appendix D.

Lemma 3.11.

If , then for any map dominated by , there is a map such that:

  • .

  • .

The following lemma proves the existence of in Lemma 2.2.

Lemma 3.12.

Let be the map defined in Lemma 3.10. There exists a map such that:

  • .

  • .

Proof.

Consider Lemma 3.11 and fix and . By Lemma 3.7, . Let be the map defined in Lemma 3.10. Thus, . The proof then follows from Lemma 3.11. ∎

4 The Relationship Between and

In this section, we prove Lemma 3.7. Recall the trimmed instance defined in Section 1.5.

Definition 4.1.

For any , any , and any schedule , is defined as the number of queued tasks of at time under schedule in .

Because and the original instance are identical by time , we have the following fact.

Fact 4.2.

For any schedule and any , .

Definition 4.3.

For any , any , and any schedule , is defined as the number of remaining tasks of at time under schedule in . In other words, is the number of ’s tasks in that are not completed by time under . Similarly, for any , any , and any schedule , is defined as the number of remaining tasks of at time under schedule in the original instance.

Because SRPT always executes the job that has the fewest remaining tasks, and is obtained by applying SRPT to , the following lemma should not be too surprising.

Lemma 4.4.

Let . Let be any schedule that is efficient for by time . Then .

The proof of the above lemma can be found in Appendix E. For any job , because all tasks of are released by time in , for any schedule . As a result, we have the last fact for the proof of Lemma 3.7.

Fact 4.5.

For any schedule and any , .

Proof of Lemma 3.7

First observe that because and are efficient schedules by time for the original instance, . For any , we have

(by Fact 4.2)
(by Fact 4.5)
(by Lemma 4.4)
(by Fact 4.5)
(by Fact 4.2)

5 The Relationship Between and

In this section, we prove Lemma 3.8.

Definition 5.1.

A job is untrimmed if its last task is released by time in the original instance. Otherwise, is trimmed.

In the following definition, we assume that there is a common deadline for every job in the original instance. Let . Because any schedule can complete at most tasks between time and time , cannot meet the deadline under if .

Definition 5.2.

Let . A job is a Definitely Late (DL) job at time under schedule if and . On the other hand, a job is said to be a Possibly-In-Time (PIT) job at time under schedule if and . Define as the set of indices of the DL jobs at time under schedule . Similarly, define as the set of indices of the PIT jobs at time under schedule .

Observe that regardless of the schedule, an untrimmed job is a PIT job when it is released, and a trimmed job is a DL job when it is released.

Notice that all the previous results about hold for any tie-breaking rule to choose the job to be executed next. To simplify the proof of Lemma 3.8, we assume that adopts the following tie-breaking rule to maximize the similarity between and