Online Makespan Minimization: The Power of Restart

by   Zhiyi Huang, et al.

We consider the online makespan minimization problem on identical machines. Chen and Vestjens (ORL 1997) show that the largest processing time first (LPT) algorithm is 1.5-competitive. For the special case of two machines, Noga and Seiden (TCS 2001) introduce the SLEEPY algorithm that achieves a competitive ratio of (5 - √(5))/2 ≈ 1.382, matching the lower bound by Chen and Vestjens (ORL 1997). Furthermore, Noga and Seiden note that in many applications one can kill a job and restart it later, and they leave open the question of whether algorithms with restart can obtain better competitive ratios. We resolve this long-standing open problem in the affirmative. Our algorithm has a natural rule for killing a processing job: a newly-arrived job replaces the smallest processing job if 1) the new job is larger than other pending jobs, 2) the new job is much larger than the processing one, and 3) the processed portion is small relative to the size of the new job. With an appropriate choice of parameters, we show that our algorithm improves the 1.5 competitive ratio for the general case, and the 1.382 competitive ratio for the two-machine case.




1 Introduction

In this paper we study the classic online scheduling problem on identical machines. Let there be m identical machines and a set of jobs that arrive over time. For each job j, let r_j denote its release time (arrival time) and p_j its processing time (size). We assume without loss of generality that all r_j's and p_j's are distinct. We seek to schedule each job on one of the m machines such that the makespan (the completion time of the job that completes last) is minimized.

We adopt the standard assumption that there is a pending pool such that jobs released but not scheduled are in the pending pool. That is, the algorithm does not need to assign a job to one of the machines at its arrival; it can decide later when a machine becomes idle. Alternatively, the immediate-dispatching model has also been considered in some papers (e.g., Avrahami and Azar (2007)).

We consider the standard competitive analysis of online algorithms. An algorithm is c-competitive if, for any online sequence of jobs, the makespan of the schedule produced by the algorithm is at most c times the minimum makespan in hindsight. Without loss of generality (by scaling the job sizes), we assume that the minimum makespan is OPT = 1 (for analysis purposes only).

Chen and Vestjens (1997) consider a greedy algorithm called largest processing time first (LPT): whenever there is an idle machine, schedule the largest job in the pending pool. They prove that the LPT algorithm is 1.5-competitive and provide a matching lower bound (consider m jobs of size 0.5 followed by a job of size 1). They also prove a lower bound on the competitive ratio that holds for every online algorithm. For the special case of two machines, Noga and Seiden (2001) introduce the SLEEPY algorithm, which achieves a competitive ratio of (5 - √(5))/2 ≈ 1.382; this is tight because of an earlier lower bound by Chen and Vestjens (1997).
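The LPT rule described above can be sketched as a small event simulation. This is an illustrative sketch, not the authors' code: the function name `lpt_makespan`, the job encoding as `(release_time, size)` pairs, and the concrete instance in the usage note (three machines, release offset 0.01) are assumptions made for the example.

```python
import heapq

def lpt_makespan(jobs, m):
    """Makespan of LPT without restart on m identical machines.

    jobs: list of (release_time, size) pairs (hypothetical encoding).
    """
    events = sorted(jobs)              # jobs in order of release time
    free_at = [0.0] * m                # min-heap: times at which machines become free
    heapq.heapify(free_at)
    pending = []                       # max-heap on size, stored as negated sizes
    now, i, makespan = 0.0, 0, 0.0
    while i < len(events) or pending:
        # The next decision happens when the earliest machine is free.
        now = max(now, heapq.heappop(free_at))
        if not pending and events[i][0] > now:
            now = events[i][0]         # nothing to run yet: wait for the next release
        # Move every job released by `now` into the pending pool.
        while i < len(events) and events[i][0] <= now:
            heapq.heappush(pending, -events[i][1])
            i += 1
        size = -heapq.heappop(pending)  # LPT: start the largest pending job
        finish = now + size
        makespan = max(makespan, finish)
        heapq.heappush(free_at, finish)
    return makespan
```

With m = 3, calling `lpt_makespan([(0.0, 0.5)] * 3 + [(0.01, 1.0)], 3)` returns 1.5, while a schedule that holds one machine back for the large job finishes by time 1.01, so the ratio approaches 1.5 as the release offset shrinks.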

The 1.382 lower bound (for two machines) holds under the assumption that whenever a job is scheduled, it must be processed all the way until its completion. However, as noted in Noga and Seiden (2001), many applications allow restart: a job being processed can be killed (put into pending) and restarted later to make room for a newly-arrived job; a job is considered completed only if it has been continuously processed on some machine for a period of time equal to its size. In other words, whenever a job gets killed, all previous processing of this job is wasted.

Note that the restart setting is different from the preemptive setting, in which the processed portion is not wasted. Noga and Seiden (2001) leave the following open problem: Is it possible to beat the 1.382 barrier with restart?

In this paper, we bring an affirmative answer to this long-standing open problem.

We propose a variant of the LPT algorithm (with restart) that improves the 1.5 competitive ratio for the general case, and the 1.382 competitive ratio for the two-machine case.

Our Replacement Rule.

A naïve attempt for the replacement rule would be to replace a job whenever the newly-arrived job has a larger size. However, it is easy to observe that the naïve attempt fails even on one machine: the worst case competitive ratio is if we keep replacing jobs that are almost completed (with jobs of slightly larger size). Hence we should prevent a job from being replaced if a large portion has been processed. Moreover, we allow a newly-arrived job to replace a processing job only if it has a much larger size, in order to avoid a long chain of replacements. As we will show by an example in Section 7, the worst case competitive ratio is if a job of size is replaced by a job of size , which is in turn replaced by a job of size , etc.

We hence propose the following algorithm that applies the above rules.

LPT with Restart.

As in the LPT algorithm, our algorithm schedules the largest pending job whenever there is an idle machine. The main difference is that our algorithm may kill a processing job to make room for a newly-arrived job according to the following rule. Upon the arrival of a job j, we kill a processing job k (i.e., put k into pending) and schedule j if:

  • j is the largest pending job and k is the smallest among the processing jobs;

  • the processed portion of k is less than α·p_j;

  • the size of j is more than 1 + β times larger than that of k, i.e., p_j > (1 + β)·p_k,

where α and β are parameters of the algorithm. We call such an operation a replacement (i.e., j replaces k).
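The three conditions of the rule can be written as a single predicate. The sketch below is an illustration under assumptions: the names `Running` and `should_replace` are hypothetical, the thresholds are read as `alpha * new_size` for the processed portion and `(1 + beta) * size` for the size gap, and the values `alpha = 0.1`, `beta = 0.5` in the usage note are chosen only for the example and are not the parameters analyzed in the paper. It also assumes every machine is busy; with an idle machine the new job would simply start.

```python
from dataclasses import dataclass

@dataclass
class Running:
    size: float      # processing time of the job currently on the machine
    started: float   # time at which its current run began

def should_replace(new_size, now, pending_sizes, running, alpha, beta):
    """Return the index in `running` of the job to kill, or None.

    alpha, beta: hypothetical parameter names for the two thresholds.
    """
    if not running:
        return None
    # Condition 1 (first half): the new job is larger than every other pending job.
    if any(p >= new_size for p in pending_sizes):
        return None
    # Condition 1 (second half): the candidate is the smallest processing job.
    k = min(range(len(running)), key=lambda idx: running[idx].size)
    victim = running[k]
    # Condition 2: little of the candidate has been processed so far.
    if now - victim.started >= alpha * new_size:
        return None
    # Condition 3: the new job is much larger than the candidate.
    if new_size <= (1 + beta) * victim.size:
        return None
    return k
```

For instance, with `running = [Running(0.3, 0.0), Running(0.6, 0.0)]`, `alpha = 0.1` and `beta = 0.5`, a job of size 0.9 arriving at time 0.02 replaces the job at index 0, whereas the same job arriving at time 0.2 replaces nothing because too much of the small job has already been processed.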

Intuitively, the parameter α (the processed-portion threshold in the second condition) provides a bound on the total amount of wasted processing (in terms of the total processing time), while the parameter β (the size gap in the third condition) guarantees an exponential growth in the processing time of jobs if there is a chain of replacements. With an appropriate choice of parameters, we show the following results.

Theorem 1.1

LPT with Restart, with parameters and , is -competitive for the Online Makespan Minimization problem with restart.

Theorem 1.2

LPT with Restart, with parameters , is -competitive for the Online Makespan Minimization problem with restart on two machines.

There are many other natural candidate replacement rules. We list some candidate algorithms that we have considered, together with their counterexamples, in Section 7.

Our Techniques.

The main focus of our paper is the general case, i.e., on m machines. The analysis for the two-machine case is built on the general case by refining some of the arguments.

We adopt an idea from Chen and Vestjens (1997) to look at the last completed job in our schedule. Intuitively, only jobs with size comparable to that of the last job matter. We develop two kinds of arguments, namely the bin-packing argument and the efficiency argument.

Assume for contradiction that the algorithm has a makespan strictly larger than the claimed competitive ratio (recall that we assume OPT = 1). We use the bin-packing argument to give an upper bound on the size of the last completed job. If the last job is large, we find a number of large jobs that cannot be packed into m bins of size 1. In other words, to schedule this set of jobs, one of the machines must get a total workload strictly greater than 1. For example, finding m + 1 jobs of size strictly greater than 1/2 would suffice. We refer to such a set of large jobs as an infeasible set of jobs.
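The notion of an infeasible set can be illustrated with a pigeonhole test. This is a sketch of one sufficient condition only, under the assumption that bins have size 1 and there are m of them: if k·m + 1 jobs are each larger than 1/(k + 1), some machine receives k + 1 of them and its load exceeds 1. The function name `infeasible_by_pigeonhole` is hypothetical, and the paper's own argument uses more refined case analysis than this check.

```python
def infeasible_by_pigeonhole(sizes, m):
    """Sufficient (not necessary) test that `sizes` cannot be packed
    into m bins of size 1.
    """
    big = sorted(sizes, reverse=True)
    # Try k = 1, 2, ...: look for k*m + 1 jobs, each of size > 1/(k+1).
    for k in range(1, (len(big) - 1) // m + 1):
        need = k * m + 1
        # big[need - 1] is the smallest of the `need` largest jobs.
        if big[need - 1] > 1.0 / (k + 1):
            return True
    return False
```

With m = 2, the set `[0.6, 0.6, 0.6]` is reported infeasible (three jobs larger than 1/2), as is `[0.34] * 5` (five jobs larger than 1/3), while `[0.5, 0.5, 0.5, 0.5]` is not, since those jobs pack exactly two per machine.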

We then develop an efficiency argument to handle the case when the last job is of small size. The core of the argument is a Leftover Lemma that upper bounds the difference between the total processing done by the algorithm and by OPT. As our main technical contribution, the lemma is general enough to be applied to all schedules.

Fix any schedule (produced by some algorithm ALG) and a time t. Let M denote the set of machines. For each machine i ∈ M and time x, let 1_i(x) be the indicator function of the event that “at time x, machine i is not processing while there are pending jobs”. Define W_t = Σ_{i ∈ M} ∫_0^t 1_i(x) dx to be the total waste (of processing power) before time t. We show (in Section 3) the following lemma that upper bounds the leftover workload.

Lemma 1.1 (Leftover Lemma)

For all time t, let Δ_t be the difference in total processing time before time t between ALG and OPT. We have Δ_t ≤ tm/4 + W_t.

Observe that the total processing power (of m machines) before time t is tm. The Leftover Lemma says that, compared to any schedule (produced by algorithm ALG), the extra processing the optimal schedule can finish before time t is upper bounded by the processing power wasted by the schedule (e.g., due to replacements), plus a quarter of the total processing power, which comes from the sub-optimal schedule of jobs.

Consider applying the lemma to the final schedule produced by our algorithm (since a job can be scheduled and replaced multiple times, its start time is finalized only when it is completed). Since our algorithm schedules a job whenever a machine becomes idle, the waste comes only from the processing (before time t) of jobs that are replaced. Thus, by our replacement rule, we can upper bound the total waste by a fraction (fixed by the second condition of the rule) of the total size of jobs that replace other jobs.

We remark that the above bound on the leftover workload is tight for LPT (for which the total waste is 0). Consider m jobs of size 0.5 arriving at time 0, followed by m/2 jobs of size 1 arriving at time ε. The optimal schedule uses m/2 machines to process the size-1 jobs and m/2 machines to process the size-0.5 jobs (two per machine), finishing all jobs at time 1 + ε. LPT would schedule all the size-0.5 jobs first; all of the size-1 jobs have half of their workload unprocessed at time 1. Therefore, the amount of leftover workload at time 1 is m/4 as ε tends to 0.
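The tightness remark can be checked numerically. The sketch below assumes a concrete instance, namely m jobs of size 0.5 released at time 0 followed by m/2 jobs of size 1 released at a small time `eps`, and compares the work done before time 1 by LPT and by a schedule that reserves half of the machines for the large jobs. The helper names `work_before` and `tight_instance`, the `(start, size)` encoding of runs, and the value of `eps` are all illustrative assumptions.

```python
def work_before(schedule, t):
    """Total processing done before time t.

    schedule: list of (start, size) pairs, one per completed run.
    """
    return sum(max(0.0, min(start + size, t) - start) for start, size in schedule)

def tight_instance(m, eps=1e-3):
    """Return (lpt_schedule, reference_schedule) for an even m."""
    half = m // 2
    # LPT: every machine takes a size-0.5 job at time 0, so the
    # size-1 jobs can only start at time 0.5.
    lpt = [(0.0, 0.5)] * m + [(0.5, 1.0)] * half
    # Reference schedule: half the machines run one size-1 job from eps,
    # the other half run two size-0.5 jobs back to back.
    ref = [(eps, 1.0)] * half + [(0.0, 0.5)] * half + [(0.5, 0.5)] * half
    return lpt, ref

lpt, ref = tight_instance(4)
leftover = work_before(ref, 1.0) - work_before(lpt, 1.0)
```

For m = 4 the leftover comes out just below 1, i.e. m/4 up to a term proportional to `eps`, which matches the quarter of the total processing power tm that appears in the lemma at t = 1.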

Other Work.

The online scheduling model with restart has been investigated in the problem of scheduling jobs on a single machine to maximize the number of jobs completed before their deadlines. Hoogeveen et al. (2000) study the general case and propose a 2-competitive algorithm with restart. Subsequently, Chrobak et al. (2007) consider the special case when jobs have equal lengths. They propose an improved 1.5-competitive algorithm with restart for this special case, and prove that this is optimal for deterministic algorithms. However, the restart rule and its analysis in our paper do not bear any obvious connection to those in Hoogeveen et al. (2000) and Chrobak et al. (2007), because the objectives differ.

Other settings of the online makespan minimization problem have been studied in the literature. A classic setting is when all machines are identical and all jobs have release time 0, but the algorithm must immediately assign each job to one of the machines at its arrival (immediate dispatching). This is the same as the online load balancing problem. Graham (1969) proves that the natural greedy algorithm that assigns jobs to the machine with the smallest workload is (2 - 1/m)-competitive in this setting, which is optimal for m ≤ 3 (due to folklore examples). A series of research efforts have then been devoted to improving the competitive ratio when m is large (e.g., Albers (1999); Bartal et al. (1995); Karger et al. (1996)). For m = 4, the best upper bound is 1.7333 by Chen et al. (1994a), while the best lower bound stands at 1.7321 by Rudin III and Chandrasekaran (2003). For m that tends to infinity, the best upper bound is 1.9201 by Fleischer and Wahl (2000), while the best lower bound is 1.880 by Rudin III (2001).
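The greedy rule attributed to Graham (1969) above is short enough to sketch. The function name `greedy_makespan` and the example instance are assumptions for illustration; jobs are given only by their sizes because in this setting each job is assigned immediately, in arrival order.

```python
import heapq

def greedy_makespan(sizes, m):
    """Assign each arriving job to the currently least-loaded machine."""
    loads = [0.0] * m
    heapq.heapify(loads)                 # min-heap of machine loads
    for s in sizes:
        least = heapq.heappop(loads)     # machine with the smallest workload
        heapq.heappush(loads, least + s)
    return max(loads)
```

On two machines, `greedy_makespan([1, 1, 2], 2)` returns 3.0, whereas placing the size-2 job alone gives makespan 2, so this instance has ratio 1.5.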

A variant of the above setting is that a buffer is provided for temporarily storing a number of jobs; when the buffer is full, one of the jobs must be removed from the buffer and allocated to a machine (e.g., Li et al. (2007); Dósa and Epstein (2010)). Kellerer et al. (1997) and Zhang (1997) use algorithms with a buffer of size one to achieve an improved competitive ratio for two machines. Englert et al. (2014) characterize the best ratio achievable with a buffer of size , where the ratio is between and depending on the number of machines . When both preemption and migration are allowed, Chen et al. (1995) give a -competitive algorithm without buffer, matching the previous lower bound by Chen et al. (1994b). Dósa and Epstein (2011) achieve a ratio of with a buffer of size .

Finally, if the machines are related instead of identical, the best known algorithm is -competitive by Berman et al. (2000), while the best lower bound is by Epstein and Sgall (2000). When preemption is allowed, Ebenlendr et al. (2009) show that the upper bound can be improved to . For the special case of two related machines, the current best competitive ratio is by Epstein et al. (1999) without preemption, and with preemption by Ebenlendr et al. (2009) and Wen and Du (1998).


We first provide some necessary definitions in Section 2. Then we prove the most crucial structural property (Lemma 1.1, the Leftover Lemma) in Section 3, which essentially gives a lower bound on the efficiency of all schedules. We present the details of the bin-packing argument and efficiency argument in Section 4, where our main result Theorem 1.1 is proved. The special case of two machines is considered in Section 6, where Theorem 1.2 is proved. Finally, we prove in Section 8 that no deterministic algorithm, even with restart, can get a competitive ratio better than .

2 Preliminaries

Consider the online makespan minimization with m identical machines and jobs arriving over time. Recall that for each job j, r_j denotes its release time and p_j denotes its size. Let OPT and ALG be the makespan of the optimal schedule and our schedule, respectively. Recall that we assume without loss of generality that OPT = 1 (for analysis purposes only). Hence we have r_j + p_j ≤ 1 for all jobs j. Further, let s_j and c_j denote the start and completion time of job j, respectively, in the final schedule produced by our online algorithm. Note that a job can be scheduled and replaced multiple times. We use s_j(t) to denote the last start time of j before time t.

We use n to denote the job that completes last, i.e., we have ALG = c_n.

We consider the time horizon as continuous, and starts from . Without loss of generality (by perturbing the variables slightly), we assume that all ’s, ’s and ’s are different.

Definition 2.1 (Processing Jobs)

For any , we denote by the set of jobs that are being processed at time , including the jobs that are completed or replaced at but excluding the jobs that start at .

Note that is defined based on the schedule produced by the algorithm at time . It is possible that jobs in are replaced at or after time .

Idle and Waste.

We say that a machine is idle in a time period (a, b) if, for all t ∈ (a, b), the machine is not processing any job according to our algorithm and there is no pending job. We call time t idle if there exists at least one idle machine at time t. Whenever a job j is replaced by a job k (at the release time r_k of k), we say that a waste is created at time r_k. The size of the waste is the portion of j that is (partially) processed before it is replaced. We can also interpret the waste as a time period on the machine. We say that the waste comes from j, and call k the replacer.

Definition 2.2 (Total Idle and Total Waste)

For any t, define I_t as the total idle time before time t, i.e., the sum over all machines of the idle time before time t on that machine. Similarly, define W_t as the total waste before time t in the final schedule, i.e., the total size of wastes located before time t, where if a waste crosses t, we count only the part of it that lies before t.
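The fractional counting in Definition 2.2 can be made concrete with a short helper. The sketch assumes each waste is recorded as a `(start, end)` time interval on some machine, which follows the interpretation of a waste as a time period; the function name `total_waste_before` and the interval encoding are hypothetical.

```python
def total_waste_before(wastes, t):
    """Total size of wastes located before time t.

    wastes: list of (start, end) intervals, one per killed run.
    A waste that crosses t contributes only its part before t.
    """
    return sum(max(0.0, min(end, t) - start) for start, end in wastes)
```

For example, with wastes `[(0.0, 0.2), (0.5, 0.9)]` and t = 0.7, the first waste counts in full and the second contributes only the 0.2 units that lie before t, for a total of about 0.4.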

3 Bounding Leftover: Idle and Waste

In this section, we prove Lemma 1.1, the most crucial structural property. Recall that we define as the total waste located before time . For applying the lemma to general scheduling algorithms, (recall from Section 1) is defined as , the total time during which machines are not processing while there are pending jobs. It is easy to check that the proofs hold under both definitions. We first give a formal definition of the leftover at time .

Definition 3.1 (Leftover)

Consider the final schedule and a fixed optimal schedule OPT. For any t, let Δ_t be the total processing OPT does before time t, minus the total processing our algorithm does before time t.

Since the optimal schedule can do a total processing of at most m(1 - t) after time t, we have the following useful observation.

Observation 3.1

The total processing our algorithm does after time t is at most m(1 - t) + Δ_t.

We call time a marginal idle time if is idle and the time immediately after is not. We first define , which is designated to be an upper bound on the total processing that could have been done before time , i.e., the leftover workload due to sub-optimal schedule.

Definition 3.2 ()

For all , if there is no idle time before , then define , otherwise let be the last idle time before . Define , where is the total pending time of job before time .

We show the following claim, which (roughly) says that the extra processing OPT does (compared to ALG) before time , is not only upper bounded by total idle and waste (), but also by the total size or pending time of jobs currently being processed ().

Claim 3.1

We have for all .


First observe that we only need to prove the claim for marginal idle times, as we have (while ) for non-idle time . Now suppose is a marginal idle time.

It is easy to see that is at most , the total length of time periods before during which the algorithm is not processing (in the final schedule). Next we show that .

Let , and be the corresponding variables when the algorithm is run until time . Observe that for a job , if it is replaced after time , then it contributes a waste to but not to . Moreover, it has the same contribution to and to . Thus we have . By definition we have .

Hence it suffices to show that .

Since is idle, there is no pending job at time . Thus the difference in total processing at time , i.e., , must come from the difference (between ALG and OPT) in processing of jobs in that has been completed. For each , the extra processing OPT can possibly do on (compared to ALG) is at most . Hence we have .

Recall by Definition 3.2, we have is the periods during which is pending.

Thus at every time , is being processed (and replaced later). Hence is at most the total wastes from that are created before , which implies

as desired. ∎

We prove the following technical claim.

Claim 3.2

For any integer , given any three sequences of positive reals , and satisfying conditions

  • ;

  • for all , we have ,

we have .


We prove the claim by induction on . We first show that the claim holds true when . Note that we have . Combine with property (2) we know that .

Figure 1: graph representation of Claim 3.2 for

Now suppose the claim is true for all values smaller than . Using induction hypothesis on and , we have

Define . Let and .

Note that and (and their prefixes) satisfy the conditions of the claim: first, by definition we have and ; second, since and are not changed, and , it suffices to check condition (2) for :

Applying the induction hypothesis on and ,

If , then immediately we have

as desired. Otherwise we have , and hence we have

which implies . Hence we have

which completes the induction. ∎

Given Claim 3.2, we are now ready to prove the Leftover Lemma.

Proof of Lemma 1.1: As before, it suffices to prove the lemma for marginal idle times, as we have (while ) for non-idle time . Now suppose is a marginal idle time. As before, let and be the values of variables when the algorithm is run until time .

We prove a stronger statement that , by induction on the number of marginal idle times at or before time . Note that the stronger statement implies the lemma, as we have .

In the following, we use a weaker version of Claim 3.1: we only need .

Base Case: . Since is the first marginal idle time, let be the first idle time, we know that is the only idle period. Define to be the set of jobs that are processed from time to . By definition we have . Recall that , where is the total pending time of job before time . Hence we have if , and otherwise. By Claim 3.1 we have

Induction. Now suppose the statement holds for all marginal idle times , and consider the next marginal idle time . We show that . First of all, observe that the difference in and must come from the idle periods in and wastes created in . Hence for all we have

Hence, if there exists some such that , then by induction hypothesis,

and we are done. Now suppose otherwise.

For all , let be the first idle time after (assume ), i.e., are the disjoint idle periods. Define . Note that for all , we have , as is not pending during idle periods; for , we have . For all , define

We show that the three sequences of positive reals , satisfy the conditions of Claim 3.2, which implies as

Finally, we check the conditions of Claim 3.2. Condition (1) trivially holds. For condition (2), observe that . Hence we have

as required.  

4 Breaking 1.5 on Identical Machines

In this section, we prove Theorem 1.1. We argue by contradiction: assuming that the makespan of our algorithm exceeds the claimed ratio, we seek to derive a contradiction, e.g., that no schedule could complete all jobs before time 1 (recall that we assume OPT = 1). To do so, we introduce two types of arguments: we use a bin-packing argument to show that the last job must be of small size, as otherwise there exists an infeasible set of large jobs; then we use an efficiency argument (built on the Leftover Lemma) to show that the total processing (excluding idle and waste periods) our algorithm completes exceeds m, the maximum possible processing OPT does.

For convenience of presentation, in the rest of the paper, we adopt the minimum counter-example assumption of Noga and Seiden (2001), i.e., we consider the instance with the minimum number of jobs such that and . As an immediate consequence of the assumption, we get that no job arrives after . This is because such jobs do not affect the start time of and therefore could be removed to obtain a smaller counter-example.

Recall that in our algorithm, we set and . Define , where . We first provide some additional structural properties of our algorithm, which will be the building blocks of our later analysis.

4.1 Structural Properties

Observe that if a job is replaced, then it must be the minimum job among the jobs that are currently being processed. Hence we immediately have the following fact, since otherwise we can find jobs (including the replacer) of size larger than .

Fact 4.1 (Irreplaceable Jobs)

Any job with size at least cannot be replaced.

Next, we show that if a job is pending for a long time, then each of the jobs processed at time must be of (relatively) large size.

Lemma 4.1

For any job , we have for all .


It suffices to consider the non-trivial case when . Consider any . If or (note that we use here instead of , as can possibly be replaced after time ), then we have . Otherwise, we consider time , at which job is scheduled. Since and ( is already released), we know that must be processed at . Hence we know that is replaced during , which is impossible since (which is of smaller size than ) is being processed during this period. ∎

Specifically, since and , we have . Applying Lemma 4.1 to job gives the following.

Corollary 4.1 (Jobs Processed at Time )

We have for all .

In the following, we show two lemmas, one showing that if a job released very early is not scheduled, then all jobs processed at that time are (relatively) large; the other showing that if a job is replaced, then the next time it is scheduled must be the completion time of a larger job.

Lemma 4.2 (Irreplaceable Jobs at Arrival)

If a job is not scheduled at and , then for all .


By our replacement rule, is not scheduled at either because is not the largest pending job at , or is the largest pending job, but the minimum job in is not replaceable.

For the second case, since the minimum job in is processed at most , job must violate our third replacement rule, that is . For the first case, let be the first job of size at least that is not scheduled at its release time. Then we have . Hence by the above argument every job in is of size at least . Thus every job in is also of size at least . ∎

Lemma 4.3 (Reschedule Rule)

Suppose some job is replaced, then the next time is scheduled must be the completion time of a job such that .


Suppose is replaced at time and rescheduled at time . Since can only replace other jobs at , the next time is scheduled must be when some machine becomes idle. And this happens only if the job processed before on this machine is completed. Moreover, since is pending from to , if , i.e., is pending when is scheduled, then (by greedy scheduling rule) we have ; otherwise is being processed at time , and we also have as is the smallest job among all jobs in by the replacement rule. Hence, we have . ∎

We present the central lemma for our bin-packing argument as follows. Intuitively, our bin-packing argument applies if there exists time and that are far apart, and the jobs in and are large (e.g. larger than ): if , then together with job , we have found an infeasible set of large jobs; otherwise (since and are far apart) we show that the jobs in must be even larger, e.g. larger than .

Lemma 4.4 (Bin-Packing Constraints)

Given non-idle times , such that and , let and , none of the following cases can happen:

  1. , , and ;

  2. , and .


We show that if any of the cases happens, then we have the contradiction that .

We first consider case (1). We show that we can associate jobs to machines such that every machine is associated with either a job of size larger than , or two jobs of size larger than . Moreover, we show that every job is associated once, while is not associated. Note that since , such an association would imply the contradiction that .

First, we associate every to the machine that it is processed on. If then we are done with this machine; otherwise (when ), we have and we show that we can associate another job of size larger than to this machine.

  • If the job processed on this machine is not replaced, or , then we associate with this machine (it is easy to check that has not been associated before);

  • otherwise the first job that completes after must be of size larger than , which also has not been associated before. Thus we associate it to this machine.

In both cases we are able to do the association, as claimed.

Next we consider case (2). Consider any job ; we have . We apply an association argument similar to the one above: let be the machine that job is processed on.

  • If is replaced, then we associate the first job that completes on this machine after is replaced, which is of size larger than , to machine ;

  • otherwise if then we associate to ;

  • otherwise we know that completes before , and we can further associate to the job in processed on .

It is easy to check that every job is associated at most once. Hence every machine is associated with either a job of size larger than , or two jobs of size larger than , which (together with ) gives , a contradiction. ∎

4.2 Upper Bounding the Last Job: Bin-Packing Argument

We show in this section how to apply the structural properties from Section 4.1 to provide an upper bound on . Recall that we assume . We show that if , then Lemma 4.4 leads us to a contradiction. We first prove the following lemma (which will be further used in Section 4.4), under a weaker assumption, i.e., .

Lemma 4.5

If , then job is never replaced.


Assume the contrary and consider the last time when is replaced. Note that by Fact 4.1, we have . Suppose is replaced by job at time . Then by our replacement rule, we have . As and , we have

where the second last inequality holds since and . Since , by Lemma 4.1, we have . Thus we can apply Lemma 4.4(2), with , , and , and derive a contradiction. ∎

Lemma 4.6 (Upper Bound on Last Job)

We have .


We first show a weaker upper bound: . Assume the contrary that . As shown in the proof of Lemma 4.1, for all , if , we have ; otherwise and we have . Among the jobs , there exist two jobs, say and , that are scheduled on the same machine in OPT. Moreover, we have , since otherwise one of them is larger than and they cannot be completed in the same machine within makespan . Let be the one with a smaller release time, i.e., . Then we have .

Observe that is never replaced, as otherwise (by Lemma 4.3, the reschedule rule) we have , which implies , a contradiction.

By Lemma 4.2, we know that ( is scheduled at its arrival time ), as otherwise the minimum job in is of size at least . Thus we have , which is also a contradiction as (recall that and ).

By and , we have . Hence . By Lemma 4.2, we know that the minimum job in is of size at least . Hence we have the contradiction that there are jobs, namely , of size larger than .

Hence we have that . Now assume that .

By Lemma 4.5, we know that