Scheduling with Testing on Multiple Identical Parallel Machines

05/05/2021 ∙ by Susanne Albers, et al. ∙ Technische Universität München

Scheduling with testing is a recent online problem within the framework of explorable uncertainty motivated by environments where some preliminary action can influence the duration of a task. Jobs have an unknown processing time that can be explored by running a test. Alternatively, jobs can be executed for the duration of a given upper limit. We consider this problem within the setting of multiple identical parallel machines and present competitive deterministic algorithms and lower bounds for the objective of minimizing the makespan of the schedule. In the non-preemptive setting, we present the SBS algorithm whose competitive ratio approaches 3.1016 if the number of machines becomes large. We compare this result with a simple greedy strategy and a lower bound which approaches 2. In the case of uniform testing times, we can improve the SBS algorithm to be 3-competitive. For the preemptive case we provide a 2-competitive algorithm and a tight lower bound which approaches the same value.


1 Introduction

One of the most fundamental problems in online scheduling is makespan minimization on multiple parallel machines. An online sequence of jobs with processing times has to be assigned to m identical machines. The objective is to minimize the makespan of the schedule, i.e. the maximum load on any machine. In 1966, Graham [Graham1966] showed that the List Scheduling algorithm, which assigns every job to the currently least-loaded machine, is (2 - 1/m)-competitive. Since then the upper bound has been improved multiple times, most recently to 1.9201 by Fleischer and Wahl [FleischerWahl2000]. At the same time, the lower bound has also been the focus of a lot of research; the current best result of 1.88 is due to Rudin [Rudin2001].

We consider this classical problem in the framework of explorable uncertainty, where part of the input is initially unknown to the algorithm and can be explored by investing resources which are added as costs to the objective function. Let n jobs be given. Every job j has a processing time p_j and an upper bound u_j, and it holds that p_j <= u_j for all jobs. Each job also has a testing time t_j. A job can be executed on one of m identical machines in one of two modes: it can either be run untested, which takes time u_j, or be tested and then executed, which takes a total time of t_j + p_j. The number of jobs n, as well as all testing times and upper bounds, are known to the algorithm in the beginning. In particular, an algorithm can sort/order the jobs in a convenient way based on these parameters. The processing time p_j of job j is revealed only once its test is completed. This scheduling with testing setting has been recently studied by Dürr et al. [DuerrEtAl2018], and Albers and Eckl [AlbersEckl2020] on a single machine.

We differentiate between preemptive and non-preemptive settings: If preemption is allowed, a job may be interrupted at any time and then continued later on a possibly different machine. No two machines may work on the same job at the same time. In case a job is tested, any section of the test must be scheduled earlier than any section of the actual job processing. In the non-preemptive setting, a job assigned to a machine has to be fully scheduled without interruption on this machine, independent of whether it is tested or not. We also introduce the notion of test-preemptive scheduling, where a job can only be interrupted right after its test is completed.

Scheduling with testing is well-motivated by real-world settings where a preliminary evaluation or operation can be executed to improve the duration or difficulty of a task. Examples for the case of multiple machines include a manufacturing plant where a number of jobs with uncertain length have to be assigned to multiple workers, or a distributed computing setting where tasks with unknown parameters have to be allocated to remote computing nodes by a central scheduler. Several examples of applicable settings for scheduling with testing can also be found in [AlbersEckl2020, DuerrEtAl2018].

In summary, we study the classical problem of makespan minimization on identical parallel machines in the framework of explorable uncertainty. We use competitive analysis to compare the value of an algorithm with an optimal offline solution. The setting closely relates to online machine scheduling problems studied previously in the literature. We investigate deterministic algorithms and lower bounds for the preemptive and non-preemptive variations of this problem.

1.1 Related Work

Scheduling with testing describes the setting where jobs with uncertain processing times have to be scheduled tested or untested on a given number of machines. The problem was first studied by Dürr et al. [DuerrEtAl2018, DuerrEtAl2020] for the special case of scheduling jobs on a single machine with uniform testing times t_j = 1. For the objective of minimizing the sum of completion times, they give a lower bound of 1.8546 and an upper bound of 2 in the deterministic setting. In the randomized setting, they present a lower bound of 1.6257 and a 1.7453-competitive algorithm. They also provide several upper bounds closer to the best possible ratio of 1.8546 for special-case instances. Tight algorithms for the objective of minimizing the makespan are given for both the deterministic and the randomized setting. More recently, Albers and Eckl [AlbersEckl2020] considered the one-machine case with general testing times t_j, presenting generalized algorithms for both objectives. In this paper, we consider scheduling with testing on identical parallel machines, a natural generalization of the previously studied one-machine case.

Makespan minimization in online scheduling with identical machines has been studied extensively in the past decades, ever since Graham [Graham1966] established his (2 - 1/m)-competitive List Scheduling algorithm in 1966. In the deterministic setting, a series of publications improved Graham's result to competitive ratios of 2 - 1/m - e_m [GalambosWoeginger1993], where e_m > 0 for large m, 1.986 [BartalEtAl1992], 1.945 [KargerEtAl1996], and 1.923 [Albers1999], before Fleischer and Wahl [FleischerWahl2000] presented the current best result of 1.9201. In terms of the deterministic lower bound for general m, research has been just as fruitful. The bound was improved from 1.707 [FaigleEtAl1989], to 1.837 [BartalEtAl1994], and 1.852 [Albers1999]. The best currently known bound of 1.88 is due to Rudin [Rudin2001]. For the randomized variant, the lower bound has a current value of e/(e-1) ≈ 1.582 [ChenEtAl1994, Sgall1997], while the upper bound is 1.916 [Albers2002]. For the deterministic preemptive setting, Chen et al. [ChenEtAl1994b] provide a tight bound of e/(e-1) for large values of m.

More recently, various extensions of this basic case have emerged. In resource augmentation settings the algorithm receives some extra resources like machines with higher speed [KalyanasundaramPruhs2000], parallel schedules [KellererEtAl1997, AlbersHellwig2017], or a reordering buffer [KellererEtAl1997, EnglertEtAl2008]. In a related setting, the algorithm might be allowed to migrate jobs [SandersEtAl2009]. A variation that is closely related to our setting is semi-online scheduling, where some additional piece of information is available to the online algorithm in advance. Possible pieces of information include, for example, the sum of all processing times [KellererEtAl1997, AlbersHellwig2012, KellererEtAl2015], the value of the optimum [AzarRegev2001], or information about the job order [Graham1969]. Refer also to the survey by Epstein [Epstein2018] for an overview of makespan minimization in semi-online scheduling.

Scheduling with testing is directly related to explorable uncertainty, a research area concerned with obtaining additional information about unknown parameters through queries with a given cost. Kahan [Kahan1991] pioneered this line of research in 1991 by studying approximation guarantees for the number of queries necessary to obtain the maximum and median value of a set of uncertain elements. Following this, a variety of problems have been studied in this setting, for example finding the median or the k-th smallest value [FederEtAl2003, Khanna2001, GuptaEtAl2011], geometric tasks [BruceEtAl2005], caching [Olston2000], as well as combinatorial problems like minimum spanning tree [ErlebachEtAl2008, Megow2017], shortest path [FederEtAl2007], and knapsack [GoerigkEtAl2015]. We refer to the survey by Erlebach and Hoffmann [ErlebachHoffmann2015] for an overview. In the scheduling with testing model, the cost of the queries is added to the objective function. Similar settings are considered, for example, in Weitzman's Pandora's box problem [Weitzman1979], or in the recent 'price of information' model by Singla [Singla2018].

1.2 Contribution

In this paper we provide the first results for makespan minimization on multiple machines with testing. We differentiate between general tests with arbitrary testing times t_j and uniform tests with t_j = 1, and consider non-preemptive as well as preemptive environments. In Table 1, we illustrate our results for these cases. The parameter m corresponds to the number of machines in the instance.

Setting          General tests       Uniform tests    Lower bound
Non-preemptive   ≈ 3.1016 (m → ∞)    3 (m → ∞)        2 (m → ∞)
Preemptive       2                   2                2 (m → ∞)
Table 1: Overview of results

In the non-preemptive setting, we present our main algorithm with competitive ratio c(m), which we refer to as the SBS algorithm. The function c(m) is increasing in m and has a value of approximately 3.1016 for m → ∞. For uniform tests, we can improve the algorithm to a competitive ratio of c_u(m), which approaches 3 for large values of m. Additionally, we analyze a simple Greedy algorithm for general tests with a competitive ratio of φ(2 - 1/m), where φ = (1 + √5)/2 ≈ 1.618 is the golden ratio. We also provide a lower bound whose value approaches 2 for large m. The values of c(m), c_u(m), the Greedy ratio and the lower bound are summarized in Table 2. For all values of m the SBS algorithm has better ratios compared to Greedy. At the same time, the uniform version of the algorithm improves these results further. Though our algorithms work for any number of machines m, they all achieve the ratio φ for m = 1, as was already proven in [DuerrEtAl2018] and [AlbersEckl2020] for uniform and general tests, respectively.

If the scheduler is allowed to use preemption, we obtain a 2-competitive algorithm for both general and uniform tests. The result holds even in the more restrictive test-preemptive setting. The corresponding lower bound approaches 2 and is therefore tight when the number of machines becomes large.

m             1    2    3    4    5    10    100
Greedy
SBS
Uniform-SBS
Lower Bound
Table 2: Results in the non-preemptive setting for selected values of

We utilize various methods for our algorithms and lower bounds. The Greedy algorithm we present is a variation of the well-known List Scheduling algorithm introduced by Graham [Graham1966]. For the more involved SBS algorithm and its uniform version we employ testing rules for jobs based on the ratio between their upper bound and testing time similar to [AlbersEckl2020]. We additionally divide the schedule into phases based on these ratios, therefore sorting the jobs by the given parameters to guarantee competitiveness. In the preemptive setting, we divide the schedule into two independent phases, testing and execution, and use an offline algorithm for makespan minimization to solve each instance separately. Lastly, the lower bounds we provide are loosely based on a common construction for the classical makespan minimization setting on multiple machines, where a large number of small jobs is followed by a single larger job.

The rest of the paper is structured in the following way: We start by giving some general definitions needed for later sections. In Section 2 we then first prove the competitive ratio of Greedy and the lower bound, before describing the main algorithm for the general case. At the end of the section, we then build a special version of the algorithm for the uniform case. In Section 3 we consider the preemptive setting and give an algorithm as well as a tight lower bound. We conclude the paper by describing some open problems.

1.3 Preliminary Definitions

We use the following notations throughout the document: For a job j, the optimal offline running time of j, i.e. the time needed by the optimum to schedule j on a machine, is denoted as p_j^* = min(u_j, t_j + p_j), while the algorithmic running time of j, i.e. the time needed for an algorithm to run j on a machine, is given by

p_j^A = t_j + p_j if j is tested, and p_j^A = u_j otherwise.   (1)

It is clear that p_j^* <= p_j^A for any job j. Additionally, it holds that p_j^* <= u_j, since the processing times are upper bounded by the upper bounds u_j.

At times, we may use the definition of the minimal running time of job j, which is given by ρ_j = min(u_j, t_j).

It is clear that any job must fulfill ρ_j <= p_j^*. In total, we get the following estimation for the different running times:

ρ_j <= p_j^* <= p_j^A.   (2)

Since an algorithm does not know the values p_j, the testing decisions for the jobs are non-trivial. A partial goal for any competitive algorithm is to define a testing scheme such that the algorithmic running times are not too large compared to the optimal offline running times. We provide the following result, which was used previously in [AlbersEckl2020] and is based on Theorem 14 of [DuerrEtAl2018]. The given testing scheme based on the ratio between upper bound and testing time is used multiple times within this paper.

Proposition 1

Let job j be tested if and only if u_j/t_j >= α, for some α >= 1. Then:

  (a) if j is tested: p_j^A <= (1 + 1/α) p_j^*;

  (b) if j is not tested: p_j^A <= α p_j^*.

As a direct consequence of Proposition 1, an optimal testing scheme for a single job is given by setting the threshold to the golden ratio α = φ = (1 + √5)/2 ≈ 1.618, which balances the two bounds since 1 + 1/φ = φ [DuerrEtAl2018].
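As an illustration, the threshold rule of Proposition 1 can be sketched in a few lines of Python; the function names are ours, and the sketch assumes t_j > 0:

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, the optimal single-job threshold

def optimal_time(u, t, p):
    """Offline optimum knows p and picks the cheaper of the two modes."""
    return min(u, t + p)

def algorithmic_time(u, t, p, alpha=PHI):
    """Test a job iff u/t >= alpha; return the algorithm's running time."""
    if u / t >= alpha:
        return t + p   # tested: pay the test, then the revealed processing time
    return u           # untested: pay the upper bound
```

With alpha = φ both cases of Proposition 1 give the same guarantee, since 1 + 1/φ = φ; hence the algorithmic running time is at most φ times the optimal offline running time for every single job.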

2 Non-preemptive Setting

In this section we assume that preemption is not allowed. Any job has to be assigned to one of available machines. Since we only consider makespan minimization, we may assume that there is no idle time on the machines and the actual ordering of the executions on a machine does not influence the outcome of the objective. It is therefore sufficient to only consider the assignment of the jobs to the machines.

2.1 Lower Bound and Greedy Algorithm

We first prove a straightforward lower bound and extend the simple List Scheduling algorithm from the classical setting to our problem.

For the lower bound we choose negligibly small testing times coupled with very large upper bounds. This forces the algorithm to test all jobs, and thus to decide on a machine for each job while having no information about its real execution time.

Theorem 2.1

No online algorithm is better than b(m)-competitive for the problem of makespan minimization on m identical machines with testing, where b(m) approaches 2 for large m, even if all testing times are equal to 1.

We note that φ ≈ 1.618 is always a lower bound for our problem (see [DuerrEtAl2018]), which is relevant only for small values of m. The proof of Theorem 2.1 is provided in Appendix 0.C.

To prove a simple upper bound, we can generalize the List Scheduling algorithm to our problem variant as follows:

Consider the given jobs in any order. For a job j to be scheduled next, test j if and only if u_j/t_j >= φ, and then execute it completely on the currently least-loaded machine.
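A minimal sketch of this greedy extension, assuming jobs are given as (upper bound, testing time, processing time) triples and using a min-heap to track the least-loaded machine:

```python
import heapq
import math

PHI = (1 + math.sqrt(5)) / 2

def greedy_schedule(jobs, m):
    """jobs: list of (u, t, p) triples; m: number of machines.
    Returns the makespan of the greedy schedule with the phi testing rule."""
    loads = [0.0] * m
    heapq.heapify(loads)
    for u, t, p in jobs:
        run_time = t + p if u / t >= PHI else u  # test iff u/t >= phi
        least = heapq.heappop(loads)             # currently least-loaded machine
        heapq.heappush(loads, least + run_time)
    return max(loads)
```

The heap makes each assignment O(log m), mirroring the classical List Scheduling implementation.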

Theorem 2.2

The extension of List Scheduling described above is φ(2 - 1/m)-competitive for minimizing the makespan on m identical machines with non-uniform testing, where φ ≈ 1.618 is the golden ratio. This analysis is tight.

The proof structure is similar to the proof of List Scheduling and uses common lower bounds for makespan minimization. We again refer to Appendix 0.C for all details.

2.2 SBS Algorithm

In this section we provide our main algorithm for the non-preemptive setting, whose competitive ratio approaches approximately 3.1016 for large m. It assigns jobs into three classes based on the ratios between their upper bounds and testing times.

Let J be the set of all jobs. We define a threshold function α(m) >= 1 for all m and divide the jobs into disjoint sets T1 and T2, where T2 will be further subdivided into T2^1 and T2^2. The set T1 corresponds to jobs where the ratio between upper bound and testing time is large, while jobs in T2 have a small ratio. We define

T1 = { j in J : u_j/t_j >= α(m) } and T2 = { j in J : u_j/t_j < α(m) }.

For the set T2, we would like the algorithm to be able to distinguish jobs based on their optimal offline running time p_j^*. Of course, without testing the algorithm does not know these values, so we instead use the minimal running time ρ_j = min(u_j, t_j), which can be computed directly using offline input only, to divide the set further.

We define T2^1 ⊆ T2 such that |T2^1| <= m and ρ_i >= ρ_j for all i in T2^1, j in T2 \ T2^1. In other words, T2^1 is the set of at most m jobs in T2 with the largest minimal running times. If this definition of T2^1 is not unique, we may choose any such set. We set T2^2 = T2 \ T2^1. It follows that if |T2| >= m, then |T2^1| = m.

The idea behind dividing into two sets is to identify the largest jobs according to minimal running time and schedule them first, each on a separate machine. This allows us to lower bound the runtime of the remaining jobs later in the schedule.

In Algorithm 1 we describe the SBS algorithm, which solves the non-uniform case and works in three phases corresponding to the sets T2^1, T1 and T2^2:

compute T1, T2^1 and T2^2 from the offline input;
foreach j in T2^1 do
      if u_j >= t_j then
            test and run j on an empty machine;
      else
            run j untested on an empty machine;
      end if
end foreach
foreach j in T1 do
      test and run j on the currently least-loaded machine;
end foreach
foreach j in T2^2 do
      run j untested on the currently least-loaded machine;
end foreach
Algorithm 1: SBS algorithm
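The three-phase structure can be sketched in Python as follows; the set names and the phase-1 testing rule (test a job running alone iff u_j >= t_j) are our assumptions for illustration:

```python
import heapq

def sbs_schedule(jobs, m, alpha):
    """Sketch of the three-phase SBS algorithm. jobs: list of (u, t, p) triples,
    m: number of machines, alpha: threshold alpha(m)."""
    T1 = [j for j in jobs if j[0] / j[1] >= alpha]        # large ratio: tested later
    T2 = sorted((j for j in jobs if j[0] / j[1] < alpha),
                key=lambda j: -min(j[0], j[1]))           # by minimal running time
    T2_big, T2_small = T2[:m], T2[m:]
    loads = [0.0] * m
    # Phase 1: the (at most m) biggest small-ratio jobs, one per empty machine.
    for i, (u, t, p) in enumerate(T2_big):
        loads[i] += t + p if u >= t else u
    heapq.heapify(loads)
    # Phase 2: jobs with large ratio are tested, greedily on least-loaded machine.
    for u, t, p in T1:
        heapq.heappush(loads, heapq.heappop(loads) + t + p)
    # Phase 3: remaining small-ratio jobs run untested.
    for u, t, p in T2_small:
        heapq.heappush(loads, heapq.heappop(loads) + u)
    return max(loads)
```

This is a sketch of the schedule construction only; the competitive analysis of course does not depend on the implementation.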

In order to have a non-trivial testing decision for jobs in T2^1, it makes sense to require that α(m) >= 1 for all m. More specifically, we will define the threshold function α(m) in the non-uniform setting as follows:

Theorem 2.3

Let α(m) be a parameter function of m defined as above. The SBS algorithm is c(m)-competitive for minimizing the makespan on m identical machines with non-uniform testing.

The function c(m) is increasing for all m and fulfills c(1) = φ ≈ 1.618 as well as c(m) ≈ 3.1016 for m → ∞. Additionally, it holds that c(m) < φ(2 - 1/m) for all m, i.e. the SBS algorithm improves on the Greedy ratio for every number of machines.

Proof

We assume w.l.o.g. that the job indices are sorted by non-increasing optimal offline running times, i.e. p_1^* >= p_2^* >= ... >= p_n^*. We denote the last job to finish in the schedule of the algorithm as l, and the minimum machine load just before job l starts as L. It follows that the value of the algorithm is ALG = L + p_l^A.

The value of the optimum is at least as large as the average of the optimal offline running times over the machines, or

OPT >= (1/m) Σ_{j=1}^{n} p_j^*,   (3)

since in any schedule at least one machine must have a load of at least this average. At the same time, we know that the optimum has to schedule every job on some machine:

OPT >= max_j p_j^*.   (4)

We also utilize another common lower bound in makespan minimization, which is the sum of the optimal running times of the m-th and (m+1)-st largest jobs. If there are at least m+1 jobs, then some machine has to schedule at least 2 of these jobs:

OPT >= p_m^* + p_{m+1}^*.   (5)

Here, p_{m+1}^* is defined as 0 if the instance has fewer than m+1 jobs.
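The combination of these three standard bounds can be sketched as a small helper; opt_times stands for the (in general unknown) values p_j^*:

```python
def makespan_lower_bound(opt_times, m):
    """Combine the three lower bounds (3)-(5) on the optimal makespan.
    opt_times: optimal offline running times p_j^*; m: number of machines."""
    times = sorted(opt_times, reverse=True)
    avg_load = sum(times) / m                                  # bound (3)
    largest = times[0] if times else 0.0                       # bound (4)
    pair = times[m - 1] + times[m] if len(times) > m else 0.0  # bound (5)
    return max(avg_load, largest, pair)
```

Any schedule on m machines, preemptive or not, has makespan at least this value.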

We differentiate between jobs handled by the algorithm in different phases and bound the algorithmic running times against the optimal offline running times. We write p_j^A <= β_j p_j^* and define different values for β_j depending on the set that j belongs to. It holds that

β_j <= 1 + 1/α(m) for tested jobs and β_j <= α(m) for untested jobs,   (6)

by Proposition 1 and the testing strategy of the algorithm.

The objective value of the algorithm depends on the set job belongs to, so we differentiate between three cases. The following proposition upper bounds the algorithmic value for each of these cases:

Proposition 2

The value of the algorithm can be estimated as follows:

To prove this proposition, we utilize the lower bounds (3)-(5) and the estimates (6) for the value of ALG. A critical step lies in the estimation of L for jobs finishing in the last phase, where we are able to lower bound L using the size of the m-th and (m+1)-st largest jobs, because the algorithm already ran the largest jobs of T2 at the beginning of the schedule. We refer to the appendix for a detailed proof.

It remains to take the maximum over all three cases and minimize the value in dependence of α(m). The value in the first case is always less than the values given by the other cases, therefore we only need to minimize the maximum of the remaining two bounds.

The left side of the maximum is decreasing in α(m), while the right side is increasing. The minimal maximum is therefore attained when both sides are equal. It can easily be verified that, for the given definition of the threshold function, both sides of the maximum are equal for all values of m.

It follows that the final ratio can be estimated by . ∎

2.3 An Improved Algorithm for the Uniform Case

The previous section established an algorithm with a competitive ratio of approximately 3.1016 for large m. We now present an algorithm with a better ratio in the case of uniform testing times, i.e. when t_j = 1 for all jobs j. We define the threshold function α(m) as follows:

The Uniform-SBS algorithm works as follows: Sort the jobs by non-increasing upper bounds u_j. Go through the sorted list of jobs and put the next job on the machine with the lowest current load. A job j is tested if u_j >= α(m), otherwise it is run untested.
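A sketch of Uniform-SBS, assuming jobs are given as (u, p) pairs with uniform test time t = 1, so the ratio u_j/t_j is simply u_j:

```python
import heapq

def uniform_sbs(jobs, m, alpha):
    """jobs: list of (u, p) pairs with uniform test time t = 1;
    alpha: threshold alpha(m). Returns the makespan of the schedule."""
    loads = [0.0] * m
    heapq.heapify(loads)
    for u, p in sorted(jobs, key=lambda j: -j[0]):  # non-increasing upper bounds
        run_time = 1 + p if u >= alpha else u       # test iff u/t = u >= alpha
        least = heapq.heappop(loads)
        heapq.heappush(loads, least + run_time)
    return max(loads)
```

Compared to the general greedy sketch, the only differences are the initial sorting step and the uniform threshold.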

Theorem 2.4

Uniform-SBS is a c_u(m)-competitive algorithm for uniform instances, where c_u(m) approaches 3 for large values of m.

For uniform jobs with t_j = 1, sorting by non-increasing upper bound is consistent with sorting by non-increasing ratio u_j/t_j. Hence, Uniform-SBS is similar to the SBS algorithm reduced to the phases corresponding to the sets T1 and T2^2, where T2^2 contains all small jobs. The reason behind running the largest jobs of T2 first in the SBS algorithm was to upper bound the remaining jobs in T2^2. For uniform testing times, this bound can be achieved without this special structure.

The function c_u(m) is increasing for all m, starts from c_u(1) = φ ≈ 1.618, and approaches 3 for m → ∞. Additionally, it holds that c_u(m) < c(m) for all m. In other words, this special version of the algorithm is strictly better than the general SBS algorithm described in Section 2.2. We defer the proof of Theorem 2.4 to Appendix 0.C.

3 Results with Preemption

In this section we assume that jobs can be preempted at any time during their execution. An interrupted job may be continued on a possibly different machine, but no two machines may work on the same job at the same time. Testing a job must be completely finished before any part of its execution can take place.

It makes sense to additionally consider the following stricter definition of preemption within scheduling with testing: Untested jobs must be run without interruption on a single machine. If a job is tested, its test must also be run without interruption on one machine. The execution after the test may then be run without interruption on a possibly different machine. We call this setting test-preemptive, referring to the fact that the only place where we might preempt a job is exactly when its test is completed. From an application point of view, the test-preemptive setting is a natural extension of the non-preemptive setting, allowing the scheduler to reconsider the assignment of a job after receiving more information through the test.

Clearly, the difficulty of settings within scheduling with testing increases in the following order: preemptive, test-preemptive, non-preemptive. We now present the 2-competitive Two Phases algorithm for the test-preemptive setting, which can be applied directly to the ordinary preemptive case. Additionally, we construct a lower bound for the ordinary preemptive case that approaches 2 for large m. This lower bound then also holds for test-preemption, and is therefore tight for both settings when the number of machines approaches infinity.

The Two Phases algorithm for the test-preemptive setting works as follows: Let OFF denote an optimal offline algorithm for makespan minimization on m machines. In the first phase, the algorithm schedules all jobs for their minimal running time ρ_j = min(u_j, t_j) using the algorithm OFF. Herein, the algorithm tests all jobs except trivial jobs with u_j <= t_j, for which running the upper bound is optimal. In the second phase, all remaining jobs are already tested, hence the algorithm now knows all remaining processing times p_j. We then use the offline algorithm OFF again to schedule the remaining jobs optimally. Finally, the algorithm obliviously puts the second schedule on top of the first.
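The two-phase structure can be sketched as follows. Since computing an optimal offline schedule is NP-hard, the sketch substitutes the LPT heuristic for the optimal scheduler OFF, so it only approximates the algorithm's schedule:

```python
import heapq

def lpt_makespan(times, m):
    """Longest Processing Time first: a stand-in for the optimal offline
    scheduler OFF (LPT itself is only a 4/3-approximation)."""
    loads = [0.0] * m
    heapq.heapify(loads)
    for t in sorted(times, reverse=True):
        least = heapq.heappop(loads)
        heapq.heappush(loads, least + t)
    return max(loads)

def two_phases(jobs, m):
    """jobs: list of (u, t, p); returns the makespan of the two stacked phases."""
    # Phase 1: schedule every job for its minimal running time min(u, t);
    # trivial jobs (u <= t) run untested, all other jobs are tested.
    phase1 = lpt_makespan([min(u, t) for u, t, p in jobs], m)
    # Phase 2: remaining processing times p of all tested jobs, now known.
    phase2 = lpt_makespan([p for u, t, p in jobs if u > t], m)
    # Obliviously stack the second schedule on top of the first.
    return phase1 + phase2
```

Each phase on its own is a feasible schedule whose makespan is at most the optimum (with an exact OFF), which is the core of the 2-competitive analysis below.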

Theorem 3.1

The Two Phases algorithm is 2-competitive for minimizing the makespan on m machines with testing in the test-preemptive setting.

The proof makes use of the assumption that the algorithm has access to unlimited computational power, which is a common assumption in online optimization. If we do not give the online algorithm this power, the result is slightly worse, since offline makespan minimization is strongly NP-hard. We may then make use of the PTAS for offline makespan minimization by Hochbaum and Shmoys [HochbaumShmoys1987] to achieve a ratio of 2(1 + ε) for any ε > 0, where the runtime of the algorithm increases exponentially with 1/ε. The complete version of the proof can be found in Appendix 0.C.

Proof (Proof sketch)

Let OFF be any optimal offline algorithm for makespan minimization on machines. In the first phase, our algorithm tests all jobs except trivial jobs and schedules them for their minimal running time using OFF. The resulting value is bounded by the optimum of the original instance.

In the second phase, we use the offline algorithm OFF again to schedule the remaining jobs optimally. The value of OFF is again bounded by the optimum.

The algorithm obliviously puts the second schedule on top of the first. In the worst case the completion time of the entire schedule is the sum of the two sub-schedules. ∎

For the lower bound result we now consider the standard preemptive setting where a job can be interrupted at any time.

Theorem 3.2

In the preemptive setting, no online algorithm for makespan minimization on m identical machines with testing can have a competitive ratio better than a bound that approaches 2 for large m, even if all testing times are equal to 1.

We note that φ ≈ 1.618 also remains a lower bound in the preemptive case, since two machines cannot run the same job concurrently. It is the stronger bound only for small values of m.

Proof

Let us consider the following example: Let be a sufficiently large number and let small jobs be given with as well as one large job with . As argued in the proof of Theorem 2.1, OPT has a value of and we may assume that the algorithm tests every job.

In the preemptive setting we required that any execution of the actual processing time of a job can only happen after its test is completed, therefore any job that finished testing at some time is completed not earlier than . The adversary decides the processing time of by the following rule: If and job has not yet been assigned, set (i.e. set ). Else, set .

If the adversary assigns job at any point, then job finished testing at time . It follows that

Hence the competitive ratio is at least .

All that remains is to show that this assignment of happens at some point during the runtime of the algorithm. Assume that this is not the case, i.e. all jobs finish testing earlier than . The adversary sets all , hence it follows directly that all jobs are completely finished before . But this means that the algorithmic solution has a value of .

Since for all jobs, we know that the average load fulfills

But is a lower bound on the optimal value of the instance, even in the preemptive setting, contradicting . ∎

4 Conclusion

We presented algorithms and lower bounds for the problem of scheduling with testing on multiple identical parallel machines with the objective of minimizing the makespan. Such settings arise whenever a preliminary action influences the cost, duration or difficulty of a task. Our main results were a non-preemptive algorithm whose competitive ratio approaches approximately 3.1016 for a large number of machines, and a 2-competitive algorithm for the preemptive case that is tight if the number of machines becomes large.

Apart from closing the gaps between our ratios and the lower bounds, we propose the following consideration for future work: A natural generalization of our setting is to consider fully-online arrivals, where jobs arrive one by one and have to be scheduled immediately. It is clear that this setting is at least as hard as the problem considered in this paper. In Appendix 0.A, we provide a simple lower bound with value for this generalization that holds for all values of . An upper bound is clearly given by the Greedy algorithm we provided in Section 2. Finding further algorithms or lower bounds for this new setting is a compelling direction for future research.

References

Appendix 0.A Results for the Fully-Online Setting

As an additional consideration, we also give some results for the fully-online setting, where the jobs arrive sequentially one by one. Whenever a job arrives, its upper bound u_j and testing time t_j are revealed. Testing the job then reveals the processing time p_j.

In this section we provide improved lower bounds compared to the semi-online setting, for which the lower bound was given in Theorem 2.1. Recall that this bound was tight for m = 1. The following result gives a better bound for all instances with at least two machines.

Theorem 0.A.1

Let m >= 2. In the fully-online setting, no algorithm is better than -competitive for the problem of makespan minimization on multiple identical machines with testing, even if all testing times are equal to 1.

Proof

We consider an instance with jobs where the first jobs have for all . Additionally, there is a single job with values . The processing times of the first jobs are irrelevant, since it is clear that running a job untested is always optimal. If the algorithm tests the final job, then the adversary sets , otherwise it sets .

The smaller jobs arrive first. If the algorithm stacks any two or more of these jobs on the same machine, then the desired bound already holds for the partial instance consisting only of the first jobs.

Hence assume this is not the case and the algorithm produces a flat schedule of height after the first jobs. If the final job is now part of the instance, the optimum puts two of the smaller jobs on the same machine and can run on its own machine.

If the algorithm tests the final job then it has an algorithmic running time of . The optimum runs the job untested, resulting in a final optimal value of . In total:

On the other hand, if the algorithm runs untested, then the algorithmic running time is given by , while the optimum tests the job yielding . This gives

We now want to improve this simple and direct lower bound. It turns out this becomes increasingly harder as the number of machines increases. The reason for this difficulty lies in the typical construction of lower bound examples based on several 'rounds' of jobs, where the algorithm is forced to produce flat schedules in order to be competitive. The example above also employs this construction in the first step.

We have not yet used the difficulty of deciding the testing strategy for such rounds of jobs. For m = 2, we can improve the lower bound further.

Theorem 0.A.2

In the fully-online setting, no algorithm is better than -competitive for the problem of makespan minimization on two identical machines with testing.

The proof uses parameter optimization based on the testing and running times of the jobs in one 'round' and of the final job. Since the number of parameters in this construction increases exponentially in m, we were unable to extend this result to general values of m. In particular, it is not directly clear whether parameters for instances with larger values of m can be chosen such that the same or a higher bound holds. We present the easiest case of m = 2 as a stand-in for all results with small values of m which are still computationally tractable.

Proof

The counterexample consists of three jobs. The first job has a ratio between upper bound and testing time of . We may scale all remaining running times such that, independent of the testing decision of the algorithm for the first job, we can always assume that and .

The running times of the second and third job are parameterized with the following values: with and . The adversary always chooses such that the outcome is worst possible for the algorithm, that is if is tested and otherwise.

We start by considering the first ’round’ of jobs, which consists only of the first and second job. We want the algorithm to schedule these jobs on two distinct machines. Hence we have to make sure that the competitive ratio is high in case the algorithm uses the same machine for both jobs. So assume for now that the algorithm does.

Clearly, the optimum always uses both machines if there are only two jobs. Hence it has a value of . If the algorithm tests the second job, then

(7)

If the algorithm runs the second job untested, then

(8)

These are the first two fractions we want to maximize. Assume now that the algorithm does not use the same machine for the first two jobs. Then the third job arrives and will be scheduled on top of the smaller of the two previous jobs. This gives an algorithmic value of .

We assume that the values of the parameters are such that the optimum puts jobs 1 and 2 on one machine and job 3 on the other. If this is actually not the case then the optimal value can only be smaller. Hence we have .

We now differentiate four cases corresponding to the testing decision of the algorithm with respect to jobs 2 and 3. The realizations of the processing times are chosen by the adversary as described above.

Jobs 2 and 3 are tested. Then

(9)

Job 2 is tested and job 3 is not tested. Then

(10)

Job 2 is not tested and job 3 is tested. Then

(11)

Jobs 2 and 3 are not tested. Then

(12)

All that remains is to optimize the minimum value of (7)-(12). We used numerical optimization and obtained values of and . This yields a minimum of

For it turned out that we may achieve a lower bound larger than with two equal-sized jobs in the first round. This changes as soon as , where three equal-sized jobs in the first round lead to a ratio of at most when the algorithm stacks two of these three jobs on the same machine. It is unclear whether this can be remedied for arbitrary values of by choosing suitable parameters.

Appendix 0.B An Improved Result for Uniform Instances with a Small Number of Uncertain Jobs

For an additional result in the uniform setting, we take a closer look at jobs whose lower bound is smaller than their testing time. We call any job with (in the case of uniform testing) uncertain. For these jobs, the algorithm has to make a non-trivial decision on whether to test or not. For all other jobs, running the upper bound untested is optimal. Let be the ratio between the number of uncertain jobs and the number of machines, that is

Lemma 1

For instances with , there exists a -competitive non-preemptive algorithm for the uniform testing case.

Proof

Our algorithm for the uniform setting with handles all uncertain jobs before considering any others. By the definition of , there are at most such uncertain jobs. We let the algorithm simply assign them to one machine each. By Proposition 1, we have for all uncertain jobs if we choose the parameter equal to the golden ratio.
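The per-job guarantee behind this choice of parameter can be illustrated with a small sketch. The uniform test time of 1, the function name, and the concrete threshold rule (test a job iff its upper bound is at least the golden ratio) are our assumptions based on the standard single-job result in scheduling with testing, not necessarily the exact statement of Proposition 1:

```python
# Threshold rule for a single job with uniform test time 1 (illustrative sketch).
PHI = (1 + 5 ** 0.5) / 2  # golden ratio, approx. 1.618

def algorithmic_running_time(upper_bound, processing_time):
    """Time spent by the algorithm on one job under the threshold rule.

    If the job is tested, the algorithm pays the test time 1 plus the
    revealed processing time; otherwise it runs the job for its upper bound.
    """
    if upper_bound >= PHI:          # uncertain enough that testing pays off
        return 1 + min(processing_time, upper_bound)
    return upper_bound              # run untested

# Under this rule the time spent is at most PHI times the offline optimum
# min(1 + p, u) for that single job.
```

A quick case check: for a tested job the worst case is a processing time equal to the upper bound, giving ratio (1 + u)/u ≤ 1 + 1/φ = φ; for an untested job the worst case is a processing time of 0, giving ratio at most u < φ.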

We can now employ a simple trick to solve the rest of the instance: since the only remaining jobs are those without uncertainty, the algorithm has complete information about the algorithmic running times of the instance, that is, it knows all values , even those of jobs that have not yet been scheduled. At this point we employ the Largest Processing Time (LPT) algorithm. LPT is a -approximation for makespan minimization on parallel machines [Graham1969].
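As a reference point, LPT can be sketched in a few lines; the function name and the heap-based implementation are our own illustration:

```python
import heapq

def lpt_makespan(processing_times, m):
    """Largest Processing Time first (Graham 1969): sort the jobs in
    non-increasing order, then assign each job to the currently
    least-loaded of the m machines. The resulting makespan is at most
    (4/3 - 1/(3m)) times the optimum."""
    loads = [0.0] * m                            # machine loads, kept as a min-heap
    for p in sorted(processing_times, reverse=True):
        heapq.heapreplace(loads, loads[0] + p)   # add job to least-loaded machine
    return max(loads)
```

For example, on two machines with jobs (5, 4, 3, 3, 3), LPT produces loads 8 and 10, while the optimum splits the jobs as {5, 4} and {3, 3, 3} for a makespan of 9, within the 4/3 - 1/6 guarantee.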

Our algorithm has only assigned at most one job per machine so far. Additionally, since for all uncertain jobs and for all other jobs, we know that these already assigned jobs correspond to the largest jobs w.r.t. the algorithmic running times. Hence the algorithm can assign all remaining jobs such that the final assignment is exactly the same as in the solution given by the offline algorithm LPT.

For any algorithm , let us denote the solution given by on the instance with running times by . Then the final value of our online algorithm ALG fulfills

As we have argued above, it holds that for all jobs. In particular, it follows that the instance with processing times has an optimal solution which is not larger than the optimal solution of the instance with processing times . Therefore,

Altogether, it follows that . ∎

Appendix 0.C Proofs

0.c.1 Proof of Theorem 2.1

Proof

Let be a sufficiently large number and consider the following instance: On machines we are given small jobs with values and . Additionally, we are given a single large job with .

The optimum tests all jobs and has a value of

which is achieved by distributing all small jobs onto machines and running job on the final machine.

It is immediately clear that if an algorithm decides to run any job untested, the ratio between the algorithmic solution and the optimum becomes larger as increases:

Hence assume that the algorithm tests everything. Since all jobs have the same testing times and upper bounds, the algorithm cannot distinguish between them. In particular, it does not know which one the large job is. Hence the adversary can decide the realization of the processing times whenever a job is being tested. Assume the algorithm runs some job on machine . Let be the current number of jobs on machine excluding . Then the adversary sets as follows:

  • If and job is not yet run, set (i.e. set ).

  • Else, set .

If at any point the algorithm tests some job and the corresponding machine fulfills for the first time, then the adversary sets and we have

The competitive ratio of the algorithm is then given by

It remains to show that at some point the number of jobs on all machines is at least , and hence the algorithm is forced to run the next job on such a machine. Assume this is not the case and the adversary declares the processing times of all jobs to be small. The average load on the machines after such jobs is given by . Since the adversary has not set any job as large, all machines must also have a load of at most , which is a contradiction. ∎

0.c.2 Proof of Theorem 2.2

Proof

The value of the optimum is at least as large as the average sum of the optimal offline running times, or

At the same time, we know that the optimum has to at least schedule every job on some machine:

We set in Proposition 1 and combine parts (a) and (b) to bound the algorithmic running time:

Let be the job that finishes last in the schedule. Let be the minimum machine load right before is assigned. It follows that job starts at time and finishes at time . This implies that the value of the algorithm is equal to as well.

The value of is at most the average sum of algorithmic running times of all jobs scheduled before . We overestimate this average using all jobs except itself. We receive

We can estimate the algorithmic value by

where we used the lower bounds (3) and (4) in the last step.
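The chain of estimates can be reconstructed in the standard Graham style. Here \(\rho_j\) denotes the algorithmic running time of job \(j\), \(p^*_j\) its optimal offline running time, \(L\) the minimum load before the last job \(n\), and \(\alpha\) the per-job guarantee of Proposition 1 (\(\rho_j \le \alpha\, p^*_j\)); these symbols are our assumptions for illustration, since the displayed formulas are not shown above:

```latex
\[
  \mathrm{ALG} \;=\; L + \rho_n
  \;\le\; \frac{1}{m}\sum_{j \ne n} \rho_j + \rho_n
  \;=\; \frac{1}{m}\sum_{j} \rho_j + \Bigl(1 - \frac{1}{m}\Bigr)\rho_n
  \;\le\; \alpha \Bigl( \frac{1}{m}\sum_{j} p^*_j
        + \Bigl(1 - \frac{1}{m}\Bigr) p^*_n \Bigr)
  \;\le\; \alpha \Bigl(2 - \frac{1}{m}\Bigr)\,\mathrm{OPT},
\]
```

where the last step applies the average-load bound (3) to the sum and the single-job bound (4) to \(p^*_n\).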

Finally, we provide a short example to see that the above analysis is tight. We note that the counterexample depends on the fact that the algorithm does not sort the jobs. It is unclear whether Greedy with some additional sorting strategy yields a provably better result. However, since Greedy without sorting only considers one job after the other, it is directly applicable to the fully online case (see Section 0.A).

Consider small jobs with and a single large job with . It is clear that the optimal makespan is given by .

Since Greedy does not sort the jobs, we may assume that it tests and schedules all small jobs first. Afterwards, all machines have a load of . Then job is tested and run, yielding a final makespan of

0.c.3 Proof of Proposition 2

Proof

Let the final job and the minimum machine load before job be defined as in the proof of Theorem 2.3. We want to estimate the value of the algorithm .

The value of is bounded by the average of the algorithmic running times of all jobs before . Let be the set of jobs the algorithm assigns before . Then:

Case 1: Job is in . In this case, is the first job assigned to its machine by the definition of the assignment for jobs in this set. Since is also the last job on its machine, it follows that is the only job on its machine and hence . By (6) and (4) we have

Case 2: . In this case the set only contains jobs from and . Since itself is also in , we can use (6) to write

where we additionally used .

For the value of the algorithm we use (6) again to receive

Here we additionally used (3) and (4) in the final step.

Case 3: . The set may now contain jobs of any set. We estimate as well as possible using (6). Since , we have

To receive the desired competitive ratio, we want to estimate . It now becomes apparent why we chose to schedule the largest jobs w.r.t. the minimal running time in the first phase of the algorithm: Since is in the set (and therefore is not empty), we know that and these jobs have a minimal running time not smaller than .

Since the are lower bounds for the optimal offline running times , it follows that for all jobs . Including itself there are at least such jobs. In particular, using the sorting of the optimal offline running times, we have and for the -th and -th largest job. With equation (5), we receive

If , then it follows directly that . If on the other hand , then, since is in , we have . Because of it follows in both cases that

The value of the algorithm is then

This concludes the proof of the proposition. ∎

0.c.4 Proof of Theorem 2.4

Proof

As before, let be the lower bound (3). The lower bounds (4) and (5) also hold. We again denote the last job to finish as and the minimum machine load before as . Hence, the value of the algorithm is .

By Proposition 1, the testing scheme of the algorithm yields , where

We first deal with the case when the number of jobs is less than or equal to the number of machines . In this case, the algorithm puts at most one job on every machine. Consider job , the last job to finish. By the testing scheme of the algorithm it holds that

where the second inequality holds due to and the last due to equation (4). This concludes the special case where .

Let us now consider . We assume w.l.o.g. that the job indices are sorted by non-increasing optimal offline running times , i.e. . Since we now have at least jobs, the lower bound of the -th and -th largest job is applicable.

We bound the value of by the average of the algorithmic running times of all jobs run before . Let be the set of jobs the algorithm assigns before .

Case 1: The algorithm tests job . Then, by the non-increasing order of the upper bounds, all jobs in are tested as well. Hence for all . Combining this with (3), we get

Finally, since itself is also tested, we can write

Case 2: The algorithm runs untested. If is part of the first round of jobs, that is if is the only job on its machine, then we can argue analogously to the case that .

Otherwise, recall the definition of the minimal running time of job , which is

in the uniform testing case. As we argued previously, it holds for all jobs.

Since there are at least jobs the algorithm considers before , and the algorithm sorts all jobs by , we know that there exist at least jobs with , including itself. Since the are lower bounds for the optimal offline running times , it follows that for at least different jobs . Using the sorting of the optimal offline running times and (5), we receive

Since is not tested by the algorithm we have . Now, if , then it follows directly that . If on the other hand , then . Since it follows in both cases that

We do not know which of the jobs before the algorithm tests and which it runs untested. From follows in both cases, hence we have for all . We write