Communication-Aware Scheduling of Precedence-Constrained Tasks on Related Machines

04/30/2020 ∙ by Yu Su, et al. ∙ Google ∙ Purdue University ∙ California Institute of Technology

Scheduling precedence-constrained tasks is a classical problem that has been studied for more than fifty years. However, little progress has been made in the setting where there are communication delays between tasks. Results for the case of identical machines were derived nearly thirty years ago, and yet no results for related machines have followed. In this work, we propose a new scheduler, Generalized Earliest Time First (GETF), and provide the first provable, worst-case approximation guarantees for the goals of minimizing both the makespan and total weighted completion time of tasks with precedence constraints on related machines with machine-dependent communication times.


1 Introduction

In this paper we study scheduling precedence-constrained tasks onto a set of heterogeneous machines with communication delays between the machines in order to minimize the makespan or the total weighted completion time. Initially, work on this topic was motivated by the goal of scheduling jobs on multi-processor systems, e.g., [coffman1976computer]. Today this problem is timely due to the prominence of large-scale, general-purpose machine learning platforms. For example, in systems such as Google’s TensorFlow [abadi2016tensorflow], Facebook’s PyTorch [paszke2017automatic] and Microsoft’s Azure Machine Learning (AzureML) [chappell2015azure], machine learning workflows are expressed via a computational graph, where jobs are made up of tasks, represented as vertices, and precedence relationships between the tasks, represented as edges. This “precedence graph” abstraction allows data scientists to quickly develop and incorporate modular components into their machine learning pipeline (e.g., data preprocessing, model training, and model evaluation) and then easily specify a workflow. The graphs that specify the workflows in platforms such as TensorFlow, PyTorch and AzureML can be made up of hundreds or even thousands of tasks, and the jobs may be run on systems with thousands of machines. As a result, the performance of the platforms depends on how these precedence-constrained tasks are scheduled across machines.

The goal of scheduling jobs composed of precedence-constrained tasks has been studied for more than fifty years, starting with the work of [graham1969bounds]. The simplest version of this scheduling problem focuses on scheduling a single job with precedence-constrained tasks on identical parallel machines with the goal of minimizing the makespan: the time until the last task completes. More generally, the goal of minimizing the total weighted completion time is considered, where the total weighted completion time is a weighted sum of the completion times of the tasks in the job. (Footnote 1: Makespan is a special case of total weighted completion time, as a dummy task with weight one can be added as the final task of the job, with all other tasks given weight zero.) For the goal of minimizing the makespan, Graham showed that a simple list scheduling algorithm can find a schedule of length within a multiplicative factor of $(2 - 1/m)$ of the optimal, where $m$ is the number of machines. This result is still the best guarantee known for this simple setting. Since then, research has sought to generalize the setting considered in two important ways: (i) to non-identical machines and (ii) to the case where communication is needed between tasks.

Addressing these two issues has been one of the major goals of the field since Graham’s initial result fifty years ago. Since that time, considerable progress has been made, mostly on generalizations to heterogeneous machines. The focus has been on (uniformly) related machines, a model where each machine $i$ has a speed $s_i$, each task $j$ has a size $p_j$, and the time to run task $j$ on machine $i$ is $p_j/s_i$. Under the related machine model, a sequence of results in the 1980s and 1990s culminated in a result that showed how to use list scheduling algorithms in combination with a partitioning of machines into groups with “similar” speeds in order to achieve an $O(\log m)$-approximation algorithm for makespan [chudak1999approximation]. This result was also extended in the same work to total weighted completion time by proposing a time-indexed linear programming technique. The extension yields an $O(\log m)$-approximation for total weighted completion time. The idea of using a group assignment rule to partition machines into groups of machines with similar speeds and then to assign tasks to a group is a powerful one and has shown up frequently in the years since; it recently led to a breakthrough when the idea of partitioning machines was adapted further and combined with a variation of list scheduling to obtain an $O(\log m/\log\log m)$-approximation algorithm for both makespan and total weighted completion time [li2017scheduling].

Despite the progress made in generalizing from identical machines to heterogeneous machines, there has been little progress toward the goal of incorporating communication delays. Machine-dependent communication delays are crucial for capturing issues such as data locality and the difference between intra-rack and inter-rack communication. We note that if communication delays are machine independent, they can simply be viewed as part of the processing time, making the problem much easier. The state-of-the-art result in the case of communication delays is [hwang1989scheduling], which studies machine-dependent communication costs in the setting of identical machines. In this context, a greedy algorithm called Earliest Time First (ETF) has been shown to produce schedules with a makespan bounded by $(2 - 1/m)\,\mathrm{OPT}^{(i)} + C$, where $\mathrm{OPT}^{(i)}$ is the optimal schedule length when ignoring communication time and $C$ is the maximum amount of communication of a chain (path) in the precedence graph. However, the analysis for the case of identical machines in [hwang1989scheduling] is quite complex and it has proven difficult to generalize to the related machines setting. As a result, there has been no progress outside the context of identical machines in the thirty years since [hwang1989scheduling].

Given the challenge of designing schedulers that are approximately optimal for related machines with machine-dependent communication time, most work studying the design of scheduling policies in this context has relied on developing scheduling heuristics and evaluating these heuristics numerically, e.g., [ananthanarayanan2014grass, ren2015hopper, wu1990hypertool, xu2015hybrid, yang1994dsc, topcuoglu2002performance]. For a recent survey see [wu2015workflow, mayer2017tensorflow] and the references therein.

Contributions. In this paper we propose a new scheduler, Generalized Earliest Time First (GETF), and prove that it computes a schedule whose makespan is at most $O(\log m/\log\log m)\,\mathrm{OPT}^{(i)} + C$ in the case of related machines and machine-dependent communication times, where $\mathrm{OPT}^{(i)}$ is the optimal makespan when communication time is ignored and $C$ is the amount of communication time in a chain (path) in the precedence graph. Additionally, we generalize our result to the objective of total weighted completion time and show that GETF produces a schedule whose total weighted completion time is at most $O(\log m/\log\log m)\,\mathrm{OPT}^{(i)} + \sum_j \omega_j C^{(j)}$, where $\mathrm{OPT}^{(i)}$ is here the optimal total weighted completion time when communication time is ignored, $\omega_j$ is the weight of task $j$ in the objective, and $C^{(j)}$ is the communication requirement in a chain in the precedence graph. These two results address long-standing open problems. Note that the makespan result matches state-of-the-art bounds for the special cases (i) when there is zero communication time and (ii) when the machines are identical. In the case of total weighted completion time, no previous result exists for the case of identical machines with communication time, but the result matches the best known bound for the case with related machines and zero communication time.

The key technical advance that enables our new result is a dramatically simplified analysis of ETF in the setting of identical machines. The state-of-the-art result in this setting is [hwang1989scheduling], which is established using a long, complex argument. In contrast, the core idea in our proof of Theorem 4.1 is a short, simple proof of a Separation Principle which can be used to provide a novel proof of the approximation ratio for ETF in the case of identical machines. The proof is simple and general enough that it can be extended from identical machines to related machines by adapting recent advances from [li2017scheduling].

Related literature. In recent years, the design and optimization of large-scale general-purpose machine learning platforms has been an overarching goal, bridging many communities in both industry and academia. The emergence of platforms such as TensorFlow, PyTorch and AzureML illustrates the power of such systems to democratize tools from machine learning, making them accessible and scalable for anyone.

Since the emergence of such systems, there has been a torrent of work that seeks to optimize the scheduling and assignment of the precedence-constrained graphs in such systems. Heuristics have emerged for managing straggler tasks, e.g., [ren2015hopper, ananthanarayanan2013effective, ananthanarayanan2014grass, ananthanarayanan2010reining]; scheduling tasks with different computational properties, e.g., jobs with MapReduce-type structures [vavilapalli2013apache, lin2013joint, palanisamy2011purlieus, tan2012delay, verma2012two, wang2016maptask], scheduling approximation jobs [ananthanarayanan2014grass, zaharia2008improving, ananthanarayanan2010reining], and managing communication times [mayer2017tensorflow, hashemi2018tictac]. Many of these heuristics have led to system designs that have had a significant industrial impact.

Such designs typically address the challenges associated with precedence constraints in ad hoc ways based on simplifying assumptions about the structures of the graphs. In contrast, there is a long history of analytic work seeking to design schedulers for precedence-constrained tasks with provable worst-case guarantees. As we have already mentioned, the initial results on this topic for makespan were provided by Graham, who gave a $(2 - 1/m)$-approximation algorithm based on list scheduling for $P|prec|C_{\max}$ [graham1969bounds]. A decade later, it was shown by [lenstra1978complexity] that it is NP-hard to approximate within a factor of $4/3$. This left a gap which has been essentially closed recently, when [svensson2010conditional] proved that it is NP-hard to achieve an approximation factor less than $2$, given the assumption of a new variant of the Unique Games Conjecture introduced by [bansal2009optimal]. In the case of the total weighted completion time objective $P|prec|\sum_j \omega_j C_j$, the negative results carry over from the makespan objective, since the makespan objective can be viewed as a special case of the total weighted completion time objective. Moreover, under the assumption of the stronger version of the Unique Games Conjecture, it is shown in [bansal2009optimal] that it is even hard to approximate within a factor of $2 - \epsilon$ for the problem with one machine. On the positive side, a $7$-approximation was given in [hall1996scheduling], and [munier1998approximation, queyranne2006approximation] later improved it to a $4$-approximation. The current best known result is a $(2 + 2\ln 2 + \epsilon)$-approximation by [li2017scheduling] via a time-indexed linear programming relaxation technique.

The results mentioned above all focus on identical machines with zero communication delays. When related machines are considered, the problem becomes more challenging. An early result on this topic is [chudak1999approximation], which proposed a Speed-based List Scheduling (SLS) algorithm that obtains an approximation of $O(\log m)$ for $Q|prec|C_{\max}$. A time-indexed linear programming technique has been proposed in the same work that gives a bound of $O(\log m)$ for $Q|prec|\sum_j \omega_j C_j$. Recently, an improvement to $O(\log m/\log\log m)$ for both objectives was proven in [li2017scheduling]. The best known lower bound for the problem of related machines is from [bazzi2015towards], which shows that it is impossible for a polynomial time algorithm to approximate the minimal makespan to any constant factor assuming the hardness of an optimization problem on $k$-partite graphs.

In contrast, when communication delay is considered, much less is known. To our knowledge, no approximation ratio is known for $P|prec, c|C_{\max}$, and this open problem was noted by [drozdowski2009scheduling]. The only algorithm with a guaranteed worst-case performance bound in this setting is ETF [hwang1989scheduling], which provides a bound of $(2 - 1/m)\,\mathrm{OPT}^{(i)} + C$ on the makespan in the case of identical machines, where $\mathrm{OPT}^{(i)}$ is the optimal makespan ignoring communication and $C$ is the maximum communication time of a chain. Prior to our paper, no algorithm with a worst-case approximation guarantee for either makespan or total weighted completion time was known for the case of related machines with communication delays, i.e., $Q|prec, c|C_{\max}$ and $Q|prec, c|\sum_j \omega_j C_j$.

2 Problem formulation

We study a model that generalizes $Q|prec|\sum_j \omega_j C_j$ by including machine-dependent communication times. Our goal is to derive bounds on the total weighted completion time and the makespan, which is an important special case of the total weighted completion time that uses a particular choice of the weights $\omega_j$.

Specifically, we consider the task of scheduling a job made up of a set $V$ of $n$ tasks on a heterogeneous system composed of a set $M$ of $m$ machines with potentially different processing speeds and communication speeds. The tasks form a directed acyclic graph (DAG) $G = (V, E)$, in which each node $j \in V$ represents a task and an edge $(j', j) \in E$ between task $j'$ and task $j$ represents a precedence constraint. We use node and task interchangeably, as convenient. Precedence constraints are denoted by a partial order $\prec$ between the two nodes of any edge, where $j' \prec j$ means that task $j$ can only be scheduled after task $j'$ completes. Let $p_j$ represent the processing demand of task $j$. The amount of data to be transmitted between task $j'$ and task $j$ is represented by the edge weight $d_{j',j}$ of $(j', j)$.

The system is heterogeneous in two aspects: processing speed and communication speed. For processing speed, we consider the classical related machines model: a machine $i$ has speed $s_i$, and it takes $p_j/s_i$ uninterrupted time units for task $j$, with processing demand $p_j$, to complete on machine $i$. In practice, computing resources such as CPUs and GPUs have varying speeds; hence schedulers must be able to handle heterogeneous servers. The communication speed $s_{i',i}$ between any two machines $i'$ and $i$ is heterogeneous across different machine pairs. We index the machine to which task $j$ is assigned by $h(j)$. If $i' = h(j')$ and $i = h(j)$, then the communication time between tasks $j'$ and $j$ in the DAG is $d_{j',j}/s_{i',i}$, where $d_{j',j}$ is the amount of data sent from $j'$ to $j$.
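As a concrete reading of the two speed models, the following minimal Python sketch computes a processing time and a communication time. The function and parameter names are hypothetical, and the sketch assumes that two tasks placed on the same machine incur no communication delay, which the paragraph above does not state explicitly.

```python
def processing_time(demand, machine_speed):
    """Related-machines model: run time is demand divided by machine speed."""
    return demand / machine_speed


def communication_time(data, comm_speed, src_machine, dst_machine):
    """Time to ship `data` units from src_machine to dst_machine.

    comm_speed maps an ordered machine pair to its communication speed.
    Assumption: tasks on the same machine communicate for free.
    """
    if src_machine == dst_machine:
        return 0.0
    return data / comm_speed[(src_machine, dst_machine)]
```

For instance, a task of demand 6 on a machine of speed 2 runs for 3 time units, and 4 units of data over a link of speed 2 take 2 time units.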

For simplicity, we consider a setting where the machines are fully connected to each other, so any machine can communicate with any other machine. This is without loss of generality as one can simply set the communication speed between any two disconnected machines to 0. We also assume that the DAG is connected. Again, this is without loss of generality because, otherwise, the DAG can be viewed as multiple DAGs and the same results can be applied to each. As a result, our results trivially apply to the case of multiple jobs. Additionally, our model assumes that each machine (processing unit) can process at most one task at a time, i.e., there is no time-sharing, and the machines are assumed to be non-preemptive, i.e., once a task starts on a machine, the scheduler must wait for the task to complete before assigning any new task to this machine. This is a natural assumption in many settings, as interrupting a task and transferring it to another machine can cause significant processing overhead and communication delays due to data locality, e.g., [kwok1999static].

The goal of the scheduler in our model is to minimize the total weighted completion time of the job, denoted by $\sum_j \omega_j C_j$, where $C_j$ is the completion time of task $j$ and $\omega_j$ is the weight associated with task $j$. We also consider the makespan, denoted by $C_{\max}$, which is the time when the final task in the DAG completes. Note that the problem we consider is an offline scheduling problem. This is a classical problem with relevance to modern ML platforms, which use batch scheduling of precedence-constrained tasks in their pipelines, e.g., [abadi2016tensorflow]. It is also known to be challenging. Specifically, minimizing the makespan (and hence also minimizing the total weighted completion time) of jobs with precedence constraints is known to be NP-complete [garey1979computers]. Thus, we aim to design a polynomial-time algorithm that computes an approximately optimal schedule. We say that an algorithm is a $\rho$-approximation algorithm if it always produces, in polynomial time, a solution with an objective value within a factor of $\rho$ of optimal.
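The two objectives can be evaluated directly from the completion times of a schedule. The sketch below is illustrative only (the function names and the toy data are assumptions); it also checks the standard reduction in which a zero-length final task with weight one, and weight zero everywhere else, makes the weighted objective equal the makespan.

```python
def makespan(completion):
    """Time at which the last task completes."""
    return max(completion.values())


def total_weighted_completion_time(completion, weight):
    """Sum over tasks of weight times completion time."""
    return sum(weight[j] * completion[j] for j in completion)


# Toy schedule: "sink" is a dummy final task that completes with the last real task.
completion = {"a": 2, "b": 5, "sink": 5}
weight = {"a": 0, "b": 0, "sink": 1}
```

With these weights, `total_weighted_completion_time(completion, weight)` and `makespan(completion)` coincide.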

Our main results use three important concepts. First, our results provide bounds in terms of the optimal makespan and the optimal total weighted completion time that would be attained if the communication delays were zero. Note that these two quantities are lower bounds on the corresponding optimal objectives of the problem when communication delays are included. Second, we provide bounds in terms of the communication time of a terminal chain of the schedule. A chain in the DAG is a sequence of immediate predecessor-successor pairs, whose first node is a node with no predecessor and whose last node is a leaf node with no successors. Third, we provide bounds in terms of the communication time of a terminal chain of a subset of the DAG that is naturally formed in the scheduling process. Formally, for any given schedule, a terminal chain of length $\ell$ can be constructed in the following fashion. We start with one of the tasks that ends last in the given schedule, denoted as $t_\ell$. Among all the immediate predecessors of node $t_\ell$, we pick one of the tasks that finishes last and define it as $t_{\ell-1}$. In such a way, we can construct a chain of tasks $t_1 \prec t_2 \prec \dots \prec t_\ell$, stopping when the first node in the chain does not have a predecessor. There may be many such terminal chains, and our results apply to any arbitrary terminal chain for the given schedule.
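The backward construction of a terminal chain described above can be sketched in a few lines of Python. The function name and the dictionary-based inputs are assumptions made for illustration; ties are broken by whichever candidate Python's `max` encounters first, which is permitted since any terminal chain may be used.

```python
def terminal_chain(finish, preds):
    """Build one terminal chain of a schedule.

    finish: task -> completion time in the given schedule
    preds:  task -> list of immediate predecessors (missing key = no predecessors)
    """
    # Start from a task that ends last in the schedule.
    current = max(finish, key=finish.get)
    chain = [current]
    # Repeatedly step to an immediate predecessor that finishes last.
    while preds.get(current):
        current = max(preds[current], key=finish.get)
        chain.append(current)
    chain.reverse()  # report the chain from its first task to its last
    return chain
```

On a diamond-shaped DAG where `d` depends on `b` and `c`, both of which depend on `a`, and `b` finishes after `c`, the chain returned is `a`, `b`, `d`.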

3 Generalized Earliest Time First (GETF) Scheduling

In this section, we introduce a new algorithm – Generalized Earliest Time First (GETF) – for scheduling tasks with precedence constraints in settings where servers have heterogeneous service rates and communication times. For GETF, we provide provable worst-case approximation guarantees for both the goal of minimizing the makespan and minimizing the total weighted completion time.

At its core, GETF is a greedy algorithm. Like ETF, it seeks to run tasks that can be started earliest, thus minimizing the idle time created by the precedence constraints in a greedy way. However, this simple heuristic does not take into account the potential difference between the service rates of different machines. For this, GETF is similar to SLS. It uses a group assignment function to determine sets of “similar” machines and then assigns tasks to different groups of machines. Within the groups of similar machines, GETF uses the ETF greedy allocation rule.

GETF is parameterized by a group assignment function $f$ and a tie-breaking rule, and proceeds in two stages. At every iteration, GETF finds a set $A$ of all the tasks that are ready to process and are not yet scheduled. For every task $j$ in $A$, GETF calculates the earliest starting time $e_j$ if it was only allowed to schedule on machines in the assigned group $M_{f(j)}$. Then, GETF computes $B$, the set of tasks in $A$ with the earliest starting times, and chooses one of the tasks to process on a machine based on the tie-breaking rule. The pseudocode for GETF is presented in Algorithm 1, and Figure 1 in Section 3.3 illustrates the operation of GETF on a simple example (Example 1).

INPUT: group assignment rule $f$, tie-breaking rule

OUTPUT: schedule with machine assignment mapping $h$ and starting time mapping $t$

$S \leftarrow \emptyset$ (set of scheduled tasks)
while $S \ne V$ do
    $A \leftarrow \{j \in V \setminus S : \text{every predecessor of } j \text{ is in } S\}$
    for each $j \in A$: $e_j \leftarrow$ earliest time at which $j$ can start on a machine $i$ s.t. $i \in M_{f(j)}$
    $B \leftarrow \{j \in A : e_j = \min_{j' \in A} e_{j'}\}$
    Choose $j$ from $B$ to start on machine $i$ with a starting time $e_j$ based on the given tie-breaking rule
    $h(j) \leftarrow i$, $t(j) \leftarrow e_j$
    $S \leftarrow S \cup \{j\}$
end while
Algorithm 1 Generalized Earliest Time First (GETF)
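The greedy loop of Algorithm 1 can be illustrated with a short Python sketch. This is not the authors' implementation: the data layout, the function name `getf`, the deterministic tie-breaking by task and machine name, and the assumption that co-located tasks communicate for free are all choices made here for illustration. The group assignment is passed in as a precomputed map from each task to the machines it may use.

```python
def getf(demand, preds, data, speed, comm_speed, allowed):
    """Greedy GETF-style scheduling sketch.

    demand:     task -> processing demand
    preds:      task -> list of immediate predecessors
    data:       (pred, succ) -> amount of data on that edge
    speed:      machine -> processing speed
    comm_speed: (src machine, dst machine) -> communication speed
    allowed:    task -> machines in the task's assigned group
    Returns (machine, start, finish) dictionaries keyed by task.
    """
    machine, start, finish = {}, {}, {}
    free_at = {i: 0.0 for i in speed}  # time each machine becomes idle
    while len(machine) < len(demand):
        best = None
        for j in demand:
            # Skip tasks already scheduled or with an unfinished predecessor.
            if j in machine or any(q not in finish for q in preds.get(j, [])):
                continue
            for i in allowed[j]:
                ready = free_at[i]
                for q in preds.get(j, []):
                    # Assumption: no delay when both tasks share a machine.
                    delay = 0.0 if machine[q] == i else data[(q, j)] / comm_speed[(machine[q], i)]
                    ready = max(ready, finish[q] + delay)
                cand = (ready, str(j), str(i), j, i)
                # Earliest start wins; ties broken by task then machine name.
                if best is None or cand[:3] < best[:3]:
                    best = cand
        t, _, _, j, i = best
        machine[j], start[j] = i, t
        finish[j] = t + demand[j] / speed[i]
        free_at[i] = finish[j]
    return machine, start, finish
```

The sketch assumes the precedence graph is acyclic and every task has at least one allowed machine, so some task is always ready. Any other tie-breaking rule can be substituted by changing the comparison key.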

GETF can be instantiated with different group assignment and tie-breaking rules. To understand how these rules work, consider a situation where the machines are divided into $K$ groups $M_1, \dots, M_K$ by a group assignment rule. Let $M_{f(j)}$ denote the group of machines to which task $j$ can be assigned, with $f(j) \in \{1, \dots, K\}$. Given this notation, a schedule under GETF consists of two mappings: a mapping $h$ from each task to its assigned machine and a mapping $t$ from each task to its starting time. Further, for any schedule produced by GETF, the machine assignment $h$ of the produced schedule should be consistent with the group assignment function $f$, i.e., $h(j) \in M_{f(j)}$ for each task $j$.

The choice of the group assignment rule has a significant impact on the performance of GETF. Indeed, different group assignment functions are used for the goals of minimizing the makespan and total weighted completion time. While our results hold for any tie-breaking rule, different tie-breaking rules could provide meaningful improvements in real-world workloads. Since it may help to keep a specific tie-breaking rule in mind while considering the algorithm and proofs, the reader may wish to assume random tie-breaking. Our technical results are based on the specific group assignment functions described in the following subsections.

3.1 A Group Assignment Rule for Makespan

The group assignment rule for the goal of minimizing the makespan that we focus on is adapted from SLS, which is designed for the setting without communication time. Specifically, machines of similar speeds are grouped together as follows.

First, all the machines with speed less than a $1/m$ fraction of the speed of the fastest machine are discarded. Then, the remaining machines are divided into $K$ groups $M_1, \dots, M_K$, where $K = \lceil \log_\gamma m \rceil$, $\gamma = \log m/\log\log m$. Note that $K = O(\log m/\log\log m)$. Given the removal of the slowest machines, we can assume that any remaining machine has speed within a factor of $m$ of the fastest machine. Without loss of generality, we assume the speed of the fastest machine is $m$ and the group $M_k$ contains machines with speeds in range $[\gamma^{k-1}, \gamma^k)$.

It may seem strange that some machines are discarded, but note that the total speed of discarded machines is not bigger than the speed of the fastest machine. So, if we consider the scheduling problem with zero communication time, removing these machines at most doubles the makespan in the worst case.
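The preprocessing step can be sketched as follows. This is a minimal illustration under stated assumptions: the discard threshold is taken to be a $1/m$ fraction of the fastest speed (consistent with the argument that the discarded machines together are no faster than the fastest machine), speeds are rescaled so the fastest machine has speed $m$, group $k$ collects rescaled speeds in $[\gamma^{k-1}, \gamma^k)$, and $\gamma$ is supplied by the caller. The function name and return format are hypothetical.

```python
def group_machines(speed, gamma):
    """Discard very slow machines and bucket the rest by speed.

    speed: machine -> processing speed
    gamma: ratio between consecutive group boundaries (must exceed 1)
    Returns (groups, discarded); groups maps a 1-based index k to the
    machines whose rescaled speed lies in [gamma**(k-1), gamma**k).
    """
    m = len(speed)
    fastest = max(speed.values())
    groups, discarded = {}, []
    for i, s in speed.items():
        norm = s * m / fastest  # rescale so the fastest machine has speed m
        if norm < 1:            # slower than a 1/m fraction of the fastest
            discarded.append(i)
            continue
        k = 1
        while norm >= gamma ** k:
            k += 1
        groups.setdefault(k, []).append(i)
    return groups, discarded
```

With four machines of speeds 8, 4, 2 and 1 and `gamma=2`, the slowest machine is discarded and the other three land in groups 3, 2 and 1 respectively; the discarded speed (1) is indeed no larger than the fastest speed (8).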

After dividing machines into groups in the preprocessing step, we need to assign tasks to the groups. This step is more involved than the division. The design of the group assignment rule is based on the solution of a linear program (LP), which is a relaxed version of the following mixed integer linear program (MILP).

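$$
\begin{aligned}
\min\quad & T \\
\text{s.t.}\quad & \sum_{i} x_{i,j} = 1 && \forall j && \text{(1a)} \\
& p_j \sum_{i} \frac{x_{i,j}}{s_i} \le C_j && \forall j && \text{(1b)} \\
& C_{j'} + p_j \sum_{i} \frac{x_{i,j}}{s_i} \le C_j && \forall j' \prec j && \text{(1c)} \\
& \frac{1}{s_i} \sum_{j} p_j x_{i,j} \le T && \forall i && \text{(1d)} \\
& C_j \le T && \forall j && \text{(1e)} \\
& x_{i,j} \in \{0, 1\} && \forall i, j && \text{(1f)}
\end{aligned}
$$

The MILP is only designed to produce a group assignment rule; its optimal solution does not necessarily provide a feasible schedule. In the MILP, $x_{i,j} = 1$ if task $j$ is assigned to machine $i$; otherwise $x_{i,j} = 0$. For each task $j$, $C_j$ denotes the completion time of task $j$. Constraint (1a) ensures that every task is processed on some machine. For any task $j$, the processing time is bounded by its completion time as in constraint (1b). Constraint (1c) enforces the precedence constraints between any predecessor-successor pair $j' \prec j$. Constraint (1d) guarantees that the total load assigned to machine $i$, which is $\frac{1}{s_i} \sum_j p_j x_{i,j}$, is not greater than the makespan $T$. Finally, constraint (1e) states that the makespan should not be smaller than the completion time of any task.

Since we cannot solve the MILP efficiently, we relax it to form an LP by replacing constraint (1f) with $x_{i,j} \ge 0$. Let $T^*$ denote the optimal value of this LP. Note that $T^*$ provides a lower bound on the optimal makespan for the same problem with zero communication time.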

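For a set $M'$ of machines, let $s(M')$ denote the total speed of machines in $M'$, i.e.,

$$s(M') = \sum_{i \in M'} s_i.$$

Define $x_{M',j}$ as the total fraction of task $j$ assigned to machines in set $M'$ by the LP solution:

$$x_{M',j} = \sum_{i \in M'} x_{i,j}.$$

For any task $j$, define $\ell_j$ as the largest group index such that at least half of task $j$ is fractionally assigned to machines in groups $M_{\ell_j}, \dots, M_K$:

$$\ell_j = \max\Big\{\ell : \sum_{k=\ell}^{K} x_{M_k,j} \ge \tfrac{1}{2}\Big\}.$$

We note that any choice of constant in place of $1/2$ above works for the purpose of our worst-case analysis of GETF, but the choice can potentially have an impact on its empirical performance. Thus the choice of the parameter should be further optimized when applied in practice. Each task $j$ is assigned to the group $f(j)$ that maximizes the total speed of machines in that group among candidates $M_{\ell_j}, \dots, M_K$, i.e.,

$$f(j) = \arg\max_{\ell_j \le k \le K} s(M_k).$$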
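The rounding from a fractional LP solution to a single group can be sketched as below. The sketch assumes groups are ordered from slowest to fastest, that the candidate groups are those from the threshold index up to the fastest group, and that the fractions for a task sum to one; the function name, the 0-based indexing and the `threshold` parameter are illustrative choices rather than notation from the paper.

```python
def assign_group(x_j, groups, speed, threshold=0.5):
    """Pick a group for one task from its fractional machine assignment.

    x_j:    machine -> fraction of the task placed there by the LP
    groups: list of machine lists, ordered from slowest to fastest group
    speed:  machine -> processing speed
    Returns the 0-based index of the chosen group.
    """
    K = len(groups)
    # Fraction of the task that the LP places in each group.
    frac = [sum(x_j.get(i, 0.0) for i in g) for g in groups]
    # Largest index l such that groups l..K-1 hold at least `threshold` of the task.
    l_j = max(l for l in range(K) if sum(frac[l:]) >= threshold)
    # Among the candidates, choose the group with the largest total speed.
    return max(range(l_j, K), key=lambda k: sum(speed[i] for i in groups[k]))
```

Note that a slower group can win when it contains enough machines: three machines of speed 2 offer more total speed than one machine of speed 4.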

3.2 A Group Assignment Rule for Total Weighted Completion Time

The group assignment rule for the goal of minimizing the total weighted completion time is similar in spirit to but is based on modified solutions of a different LP. We divide machines into groups in the same way as in Section 3.1. Without loss of generality, we assume that for any task to be processed on any machine . Thus, we can divide the time horizon into the following time-indexed intervals of possible task completion times: where and for . Then, the MILP that forms the basis for the group assignment rule can be formulated as follows:

(2a)
(2b)
(2c)
(2d)
(2e)
(2f)
(2g)

Again, the MILP is only designed to find a group assignment rule and thus its optimal solution does not necessarily produce a feasible schedule. Here, if task is assigned to machine and it completes in the th interval . For each task , denotes the completion time of task and represents its weight in the objective of total weighted completion time. Constraint (2a) enforces that each task will be assigned to some machine. Constraint (2b) guarantees that the completion time of a task is not smaller than its processing time. Constraints (2c) and (2d) together enforce the precedence constraint for every predecessor-successor pair. Constraint (2e) guarantees that the completion time of task is not smaller than the left boundary of the th interval . The total load assigned to machine up to th interval is , and it should not be greater than the upper bound as enforced in constraint (2f).

To define the group allocation rule, we relax constraint (2g) to form an LP. As in the previous section, let denote the optimal solution for this LP. Note that provides a lower bound for . For any task , define as the minimum value of such that both and are satisfied. Intuitively, can be viewed as a rough estimate of the completion time of task . Define as the total fraction of task over any machine in the first intervals with respect to solution :

We construct a set of feasible solutions based on the optimal solution for the LP:

(3)

Notice that the group assignment rule is of the same form as , with replacing . For task j, define as before but with respect to instead of :

The group assignment rule for the goal of minimizing the total weighted completion time follows as below:

3.3 A Comparison of GETF and SLS

Figure 1: An illustration of GETF running on Example 1. (a)-(d) show the first four iterations.

Figure 2: An illustration of SLS running on Example 1. (a)-(d) show the first four iterations.

The description of GETF above highlights that it combines the greedy heuristic of ETF with the speed-based assignment heuristic of SLS. This enables GETF to provide guarantees for settings with both heterogeneous processing rates and communication delays. In contrast, SLS does not provide guarantees in settings with communication time. This is a result of the fact that SLS is based on list scheduling and does not always schedule the earliest task first, thus making it impossible to bound the overall idle time in between tasks.

To illustrate the difference between GETF and SLS, we provide a simple example of scheduling a job made up of four tasks.

Example 1.

We consider a job made up of four tasks, with processing demands that are to be scheduled on a set of two identical machines with the same processing speed equal to . The weight for the edges in the graph are listed as below: . We assume for ; otherwise for .

The schedules of GETF and SLS are illustrated in Figures 1 and 2. Note that, since the servers are identical, the group assignment rule does not play a role in these examples. Given a priority list , a possible schedule produced by SLS puts tasks and on machine and assigns the rest of tasks to machine as demonstrated in Figure 2. A terminal chain for the given schedule is task followed by task , and the idle time of length between the end of task and the start of task on machine is not bounded by the communication time between task and . In contrast, task starts earlier on machine in a schedule produced by GETF, see Figure 1. List scheduling does not always schedule the earliest task at each step, thus making the idle time on machine not necessarily bounded by communication time between task and task . Our proofs in Section 4.1 highlight that maintaining a tight bound on the communication time between tasks is crucial to achieving a good approximation ratio in settings with machine-dependent communication time.

4 Results

Our main results bound the approximation ratio of GETF in settings with related machines and heterogeneous communication time for the goals of minimizing the makespan and minimizing the total weighted completion time.

4.1 Makespan

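In the case of minimizing the makespan, our main result provides a bound in terms of the communication time of a terminal chain of the schedule. Specifically, let $t_1 \prec t_2 \prec \dots \prec t_\ell$ be a terminal chain for the schedule and define $C$ as the communication time over such a chain in the worst case, i.e.,

$$C = \sum_{k=1}^{\ell-1} \frac{d_{t_k, t_{k+1}}}{\bar{s}_k},$$

where $d_{t_k, t_{k+1}}$ is the amount of data sent from $t_k$ to $t_{k+1}$ and $\bar{s}_k$ is defined as the slowest communication speed between $h(t_k)$, the machine assigned to $t_k$, and any machine in the group $M_{f(t_{k+1})}$, i.e.,

$$\bar{s}_k = \min_{i \in M_{f(t_{k+1})}} s_{h(t_k), i}.$$

Note that $C$ can be computed efficiently and minimized over all the terminal chains using dynamic programming and that the tie-breaking rule can have an impact on $C$ due to its impact on terminal chains.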

Theorem 4.1.

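For any schedule produced by GETF with the group assignment rule of Section 3.1,

$$C_{\max} \le O\!\left(\frac{\log m}{\log\log m}\right) \mathrm{OPT}^{(i)} + C,$$

where $C_{\max}$ is the makespan of the schedule, $C$ is the worst-case communication time of a terminal chain, and $\mathrm{OPT}^{(i)}$ is the optimal schedule length obtained if communication time for all pairs were zero.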

Theorem 4.1 represents the first result for makespan in the setting of related machines and heterogeneous communication time, addressing a problem that has been open since ETF was introduced for identical machines thirty years ago. Additionally, it matches the state-of-the-art results for the case without communication time, where the best known approximation ratio is $O(\log m/\log\log m)$ [li2017scheduling], and the case with communication time but identical machines, where the best known multiplicative factor is $(2 - 1/m)$ [hwang1989scheduling].

Concretely, in the special case of identical machines, the group assignment rule is no longer required when implementing GETF since all machines share the same speed and so there is only one group of machines. Thus, GETF reduces to ETF. The theorem makes use of which is defined as

Note that differs from since it is an average over the terminal chain. The result we obtain in this case is the following, which matches the current state-of-the-art result of [hwang1989scheduling].

Proposition 4.2.

Consider a setting with identical machines. For any schedule produced by GETF,

where is the optimal schedule length obtained if communication time for all pairs were zero.

4.2 Total Weighted Completion Time

Similarly to the makespan case, we provide a bound with respect to the communication time of chains. However, since total weighted completion time depends on the completion time of every task (instead of just one task as in the case of makespan), the communication time of terminal chains of many subsets of the DAG show up in the bound. More formally, assume that the tasks are indexed with respect to their order in the schedule determined by GETF, denoted by . At iteration , task is to be scheduled. Let denote a DAG formed by a set of the tasks that have been scheduled so far and the corresponding edges within these tasks. Define to be a subset of the given schedule up to iteration , i.e., it is a schedule for DAG . This definition ensures that task is one of the tasks that ends last in the schedule . Now, let be a terminal chain that ends with task in the schedule , and define as the communication time over such a chain in the worst case, i.e.,

This definition of generalizes the notion of used in Theorem 4.1 for makespan and plays a similar role in the theorem below.

Theorem 4.3.

For any schedule produced by GETF with group assignment rule ,

where is the optimal total weighted completion time obtained if communication time for all pairs were zero.

Theorem 4.3 is the first result on total weighted completion time for the setting of related machines with heterogeneous communication time, and it matches the bounds in cases where previous results exist. In particular, if the weights are chosen so as to recover makespan, then the bound matches that of Theorem 4.1. Similarly, results for identical machines can be recovered as done in the case of makespan. However, note that the group assignment rule used for GETF here differs from that in Theorem 4.1. The rule used in Theorem 4.3 applies more generally, and both group assignment rules yield the same worst-case performance bound for makespan. Nevertheless, we expect that the rule used in Theorem 4.1 will lead to a smaller makespan in most practical settings, as it is designed specifically to minimize the makespan.

5 Proofs

In this section, we present our proofs of Theorems 4.1 and 4.3. The general form of both arguments is similar; however, the case of total weighted completion time is more involved. The first step of our argument is to show a general upper bound, which is valid for GETF regardless of the choice of group assignment function and tie-breaking rule. This Separation Principle can be used to easily establish the result for makespan in the case of identical machines (Proposition 4.2), and represents a significant simplification compared to existing proofs of that result in the literature. We then tighten the general bound by taking advantage of the choices of described in Section 3 for makespan and total weighted completion time. Finally, we establish a connection between the makespan and total weighted completion time in the same settings by introducing a time-indexed LP that enables us to bound the total weighted completion time.

5.1 A Separation Principle

The Separation Principle presented here is a key component of our proof of Theorem 4.1. The core of nearly all proofs in this area is the construction of a chain, which is then used to bound the overall makespan. This idea goes back to the first list scheduling algorithms proposed by [graham1969bounds]. The key to our argument is to bound the amount of communication time between any predecessor-successor pairs in a terminal chain. However, as we discuss in Section 3, it is not possible to do this under list scheduling algorithms.

Our approach also differs considerably from the approach used to study ETF in [hwang1989scheduling], where the authors divide into two sets of time intervals, one for the time when all the machines are busy and the other that one chain covers. Extending this approach to related machines does not appear possible. In contrast, in our argument, the construction of a terminal chain is simple and so we can identify the set of time intervals between tasks in the terminal chain and take advantage of the greedy nature of GETF to bound these times directly.

A key feature of the Separation Principle below is that it separates the analysis of the terminal chain from the analysis of the group assignment rule, which provides another valuable simplification of the previous proof approaches.

Theorem 5.1 (Separation Principle).

For any choice of group assignment function and tie-breaking rule, GETF produces a schedule of makespan

where

Note that the upper bound in this result is valid regardless of the choice of group assignment rule and tie-breaking rule. is the sum of processing times along a terminal chain and can be viewed as total load assigned to machines in group . Both and are independent of the communication constraint, which enables us to take advantage of any good choice of group assignment rule for general DAG scheduling, even in the case of zero communication time.

Proof.

Our proof proceeds in four steps:

  (i) Define a terminal chain . Recall that a chain is a terminal chain when task completes at the end of the overall schedule.

  (ii) Partition the overall makespan into parts. The idea of this step is to decouple into one part where the tasks in the terminal chain are being processed and other parts associated with each machine group. Depending on the choice of group assignment rule, we can further bound these parts.

  (iii) Bound the idle time in between tasks. The greedy nature of GETF makes it possible to bound the length of the idle time intervals between tasks by communication delays of task pairs.

  (iv) Combine (ii) and (iii) to bound the overall makespan in terms of the communication time of the terminal chain.

Define a terminal chain . To find a terminal chain of length , we start with one of the tasks that ends last, denoted as . According to the definition of and , task is assigned to machine in group with a starting time . Among all the immediate predecessors of task , we pick one of the tasks that finishes last and define it as . In such a fashion, we construct a chain of tasks of length such that does not have any predecessor.
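The backward walk described above can be sketched in a few lines. This is an illustrative sketch only: the function name `terminal_chain` and the dictionaries `finish` (task to finish time) and `preds` (task to immediate predecessors) are hypothetical, and ties are broken arbitrarily by iteration order.

```python
def terminal_chain(finish, preds):
    """Build a terminal chain by walking backwards through the schedule.

    finish: dict task -> finish time in the given schedule
    preds:  dict task -> list of immediate predecessors (missing = none)
    """
    # Start from one of the tasks that ends last.
    current = max(finish, key=finish.get)
    chain = [current]
    # Repeatedly move to an immediate predecessor that finishes last,
    # stopping once the current task has no predecessor.
    while preds.get(current):
        current = max(preds[current], key=finish.get)
        chain.append(current)
    chain.reverse()  # the first task of the chain has no predecessor
    return chain


finish = {"a": 2.0, "b": 3.0, "c": 5.0, "d": 9.0}
preds = {"c": ["a", "b"], "d": ["c", "a"]}
chain = terminal_chain(finish, preds)
```

Here the walk starts at `d`, moves to `c` (its latest-finishing predecessor), then to `b`, and stops because `b` has no predecessor.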

Partition into parts, . Recall that is the number of groups for machines by the group assignment rule as we describe in the previous section. Let denote the union of the time intervals during which tasks of chain are being processed. Consider the time interval between the end of task and the start of task for , and assign it to where . As a set of time intervals, can be possibly empty or have more than one time interval. Essentially, is a set of time intervals that tasks in the terminal chain assigned to machines in group have to wait before being processed. In such a fashion, we define since maps each task to one of the machine groups. The length of the union of for is the makespan.
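As a rough illustration of this partition, the sketch below computes the length of each part for a given terminal chain. The names are hypothetical. Index 0 collects the time during which chain tasks are processed, and index `k` collects the waiting time charged to group `k`. Two assumptions are made for the sketch: each gap is charged to the group of the task that is waiting to start, and the time before the first chain task starts is charged to that task's group.

```python
def partition_lengths(chain, start, finish, group, num_groups):
    """Split the makespan into a chain-processing part and per-group waiting parts.

    chain:  list of tasks forming a terminal chain, in precedence order
    start, finish: dict task -> start / finish time
    group:  dict task -> group index in 1..num_groups
    Returns a list of lengths, index 0 for processing, index k for group k.
    """
    lengths = [0.0] * (num_groups + 1)
    prev_end = 0.0
    for task in chain:
        # Waiting time before this chain task starts (assumed charged to its group).
        lengths[group[task]] += start[task] - prev_end
        # Processing time of the chain task itself.
        lengths[0] += finish[task] - start[task]
        prev_end = finish[task]
    return lengths


chain = ["b", "c", "d"]
start = {"b": 1.0, "c": 4.0, "d": 6.0}
finish = {"b": 3.0, "c": 5.0, "d": 9.0}
group = {"b": 1, "c": 2, "d": 1}
lengths = partition_lengths(chain, start, finish, group, num_groups=2)
```

Because the parts tile the time axis from zero to the finish time of the last chain task, their lengths sum to the makespan, which is the property used in the remainder of the proof.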

Bound the idle time in between tasks. Consider a task assigned to machine . For each machine , let denote a union of disjoint empty time intervals on machine between the end time of task and the start time of task . Between the end time of task and the start time of task , there can be multiple tasks being processed on machine in series, possibly resulting in more than one idle time interval on machine during that time interval . Precedence constraints between task pairs can also make a successor wait before it gets started. Regardless of the reason for idle time between tasks, each task cannot start earlier on any machine in the assigned group due to the greedy feature of GETF. Thus the length of is bounded above by the communication time between task and task , i.e.,

This is true because if it were not the case then task could have started earlier on machine . Note that the end time of task could possibly be earlier if it were allowed to be scheduled on a faster machine with a slightly bigger communication delay, since the processing speeds of machines in the same group vary.
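The quantity being bounded here, the total idle time on one machine inside a window, can be measured on a concrete schedule as follows. This is a minimal sketch under stated assumptions: `idle_length`, `busy`, and the window endpoints are hypothetical names, and `busy` is assumed to be a list of non-overlapping `(start, end)` intervals for a single machine.

```python
def idle_length(busy, window_start, window_end):
    """Total idle time on one machine inside [window_start, window_end].

    busy: list of non-overlapping (start, end) busy intervals on the machine
    """
    idle = 0.0
    cursor = window_start  # everything before the cursor is accounted for
    for s, e in sorted(busy):
        if e <= cursor:
            continue  # interval lies entirely before the cursor
        if s >= window_end:
            break  # interval lies entirely after the window
        if s > cursor:
            idle += s - cursor  # gap between the cursor and this busy interval
        cursor = max(cursor, e)
    if cursor < window_end:
        idle += window_end - cursor  # trailing gap up to the window end
    return idle


# Window between the end of one chain task (time 1) and the start of the next (time 7).
busy = [(0.0, 2.0), (3.0, 4.0), (6.0, 8.0)]
idle = idle_length(busy, 1.0, 7.0)
```

In this example the machine is idle during [2, 3] and [4, 6], so the measured idle time is 3. Comparing such a value against the communication time between the two chain tasks is one way to check the inequality on a given schedule.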

Let be idle time on machine in group during the time interval , and let be maximum idle time on any machine in group during the time intervals , i.e., for all . Thus,

(4)

Bound the makespan. For , the total speed of machines in group is

Denote the total length of the intervals in by . There must be at least a sum of units of processing done on each machine in group during the time intervals . Thus for ,

Therefore,

(5)

We now bound :

(6a)
(6b)

where (6a) is due to (5) and (6b) is due to (4). ∎

5.2 Proof of Theorem 4.1

In order to apply the Separation Principle to prove Theorem 4.1, we need to prove bounds on and in the case of the group assignment rule defined in Section 3. For this, we consider the scheduling problem with zero communication time. Note that the design of group assignment function is based on the optimal solution of the relaxed LP for a scheduling problem with zero communication time, hence the upper bounds for both and are associated with the optimal objective of the relaxed LP in the setting with zero communication time as well.

The bounds of and are given in the following two lemmas, which are adapted from results in [li2017scheduling]. Theorem 4.1 follows directly from these two lemmas, the Separation Principle, and the fact that , where is the optimal solution to the LP.

Lemma 5.2.

Proof.

Recall that and as the largest group index such that at least more than half of tasks are assigned to machines in groups . For every task and any machine , by definition of the largest index ,

(7)

Thus,

(8a)
(8b)
(8c)

where (8b) is due to (7) and the fact that processing speed of machine in group for task is at most for , and (8c) is due to the fact that processing speed of machine in group , whose group index is not smaller than , is at least . Using this, we can bound as follows:

(9a)
(9b)
(9c)

where (9a) is due to (8), (9b) is due to constraint (1d) of the LP and (9c) is due to constraint (1c) of the LP. ∎

Lemma 5.3.

Proof.

For any task , by definition of , . Thus,

(10)

Inequality (10) is due to the fact that the assigned group maximizes the total speed of machines in that group among the candidates . Thus,

(11)

The total load assigned to machines in group is while its total speed is . Summing over machines in group on both sides for constraint (1d) leads to (11). ∎

5.3 Proof of Proposition 4.2

We now show how the Separation Principle can be used to provide a new, simpler proof of the state-of-the-art approximation ratio of ETF in the case of identical machines. Recall that the group assignment function is not required for GETF in this case.

To prove Proposition 4.2, we use the same approach as we used for proving the Separation Principle. However, we can tighten the analysis in the final step of the argument. Specifically, the proof can be broken into three steps, instead of four:

  (i) Define a terminal chain . This step is identical to the definition of a terminal chain in the proof of the Separation Principle.

  (ii) Bound the idle time in between tasks. Although the machines are identical in terms of processing speed, communication speeds between different machine pairs may still be heterogeneous, for example due to the geolocations of the machines.

  (iii) Combine (i) and (ii) to bound the overall makespan in terms of the communication time of the terminal chain.

Compared with the proof of the Separation Principle, Step (i) defines a terminal chain in exactly the same way. In Step (ii), bounding the idle time in the case of identical machines is also similar. Step (iii) requires more work. Here, we further tighten the bound by eliminating the processing time of the terminal chain to improve the constant factor.

Define a terminal chain . This step is identical to the definition of a terminal chain in the proof of the Separation Principle.

Bound the idle time in between tasks. Let be the time interval between the end time of task and the start time of for . As we explained in the Separation Principle, there can possibly be multiple idle time intervals on a machine during the time interval . For each machine , define as a union of disjoint empty time intervals on machine during the time interval . For any machine , the length of is bounded above by the communication time between task and task , i.e.,

Otherwise task could have started earlier on machine .

Bound the makespan. During the time intervals for , there must be at least processing units done, and it is bounded by a sum of the processing units for all the tasks except those in the terminal chain. This leads to the following bound:

(12a)

Finally, applying (