# Minimization of Weighted Completion Times in Path-based Coflow Scheduling

Coflow scheduling models communication requests in parallel computing frameworks where multiple data flows between shared resources need to be completed before computation can continue. In this paper, we introduce Path-based Coflow Scheduling, a generalized problem variant that considers coflows as collections of flows along fixed paths on general network topologies with node capacity restrictions. For this problem, we minimize the coflows' total weighted completion time. We show that flows on paths in the original network can be interpreted as hyperedges in a hypergraph and transform the path-based scheduling problem into an edge scheduling problem on this hypergraph. We present a (2λ + 1)-approximation algorithm when node capacities are set to one, where λ is the maximum number of nodes in a path. For the special case of simultaneous release times for all flows, our result improves to a (2λ)-approximation. Furthermore, we generalize the result to arbitrary node constraints and obtain a (2λΔ + 1)- and a (2λΔ)-approximation in the case of general and zero release times, where Δ captures the capacity disparity between nodes.

## 1 Introduction

Parallel computing frameworks, e.g., MapReduce [6], Spark [25], or Google Data-flow [8], are a central element of today’s IT architecture and services, especially in the course of Big Data. Computing jobs in such a setting consist of a sequence of tasks, executed at different stages, such that in between data must be transferred from multiple output to multiple input ports. Herein, a consecutive task cannot start before all data is transferred to the next stage. Based on the coflow networking abstraction (see [3]), we abstract such a data transfer as a set of data flows on a communication stage and refer to it as a coflow if all flows must be finished to allow for the execution of the next task of a computing job. This data transfer between computations can contribute more than 50% to a job’s completion time (see [20]), so minimizing coflow completion times remains a central challenge that heavily affects the efficiency of such environments.

So far, research on coflow scheduling has mainly focused on bipartite networks [1, 4, 5, 15]. Here, machines are uniquely divided into input and output ports and data can be transferred instantaneously via a direct link between each pair of input and output ports (see Figure 0(a)). Recently, research has shifted to more general network topologies. Jahanjou et al. [12] first introduced a variation of Coflow Scheduling where the underlying network of machines is an arbitrary graph. Since then, this generalized problem has been considered more extensively [2, 20]. Applications arise for grid computing projects, i.e., inter-datacenter communication, where parallel computing tasks are executed on multiple but decentralized high-power computing units [2]. A node in the underlying network may represent a machine, a datacenter, or an intermediate routing point.

These general approaches assume an infinite router/machine capacity on a communication stage, i.e., the number of different jobs sending flow over the same node is unbounded. While it seems appropriate to treat the total capacity of large data centers as infinite, node capacities are often heterogeneous in other contexts. Especially in distributed computing projects where donors offer computing time from their personal computers for parallel computation, heterogeneous technological characteristics and router capacities impose natural restrictions. Application fields for these projects include astrobiology [22], mathematics [24], and molecular biology [21]. In order to address the disparity inherent to privately owned computing resources one may restrict the data that is sent over the nodes in the network.

To address these challenges, we introduce the concept of Path-based Coflow Scheduling, which considers coflow scheduling in the more general setting of Jahanjou et al., where coflows consist of multiple data flows that may run between any two machines of the underlying network on a fixed path of finite length (see Figure 0(b)). Additionally, we impose that machines can handle only a single flow type at any time. We further generalize the problem to non-uniform node capacities to consider different router capacities.

In the following, we first give a formal definition of Path-based Coflow Scheduling, before we review related literature and detail the contribution of our work.

### 1.1 Definition of the Path-based Coflow Scheduling Problem

Let $G$ be a multigraph with $m$ nodes. Every node corresponds to a machine and every edge to a communication line between two machines. A coflow $k \in [n]$ is a collection of flows, indexed by $j$, each sending $c^{(k)}_j$ units of data along a given path $P^{(k)}_j$ in the graph, and is associated with a weight $w_k$. For the longest flow-carrying path in $G$, we denote its number of nodes as $\lambda$. Along all paths, we assume data transfer to be instantaneous.

For a given discrete and finite but sufficiently large time horizon, a schedule assigns the execution of every flow of each coflow to time steps, such that each node handles at most one unit of data per time step. In addition, a coflow $k$ and its flows have a release time $r_k$ such that flows of coflow $k$ can only be scheduled once $k$ is released. Each coflow $k$ has a completion time $C_k$, which is the earliest time at which all flows related to $k$ have been executed.

In this setting, the objective of Path-based Coflow Scheduling is to find a schedule that minimizes the weighted sum of completion times

$$\min \sum_{k=1}^{n} w_k C_k. \tag{1}$$
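To make the definition concrete, the following minimal Python sketch evaluates objective (1) for a hand-written schedule and checks the unit node capacities. The class names `Flow` and `Coflow`, the dictionary layout of `schedule`, and the tiny instance are illustrative assumptions, not notation from the paper; release times are stored but not checked here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    path: tuple   # nodes occupied by the flow, e.g. ("a", "b")
    units: int    # units of data the flow has to send


@dataclass(frozen=True)
class Coflow:
    weight: float
    release: int
    flows: tuple  # tuple of Flow objects


def is_feasible(coflows, schedule):
    """schedule[(k, j)] lists the time steps used by flow j of coflow k."""
    busy = set()  # (node, time step) pairs that are already occupied
    for (k, j), steps in schedule.items():
        flow = coflows[k].flows[j]
        if len(steps) != flow.units:
            return False
        for t in steps:
            for v in set(flow.path):
                if (v, t) in busy:  # node would handle two units at once
                    return False
                busy.add((v, t))
    return True


def weighted_completion(coflows, schedule):
    """Objective (1): sum of w_k * C_k, with C_k the last step used by coflow k."""
    total = 0.0
    for k, coflow in enumerate(coflows):
        completion = max(max(schedule[(k, j)]) for j in range(len(coflow.flows)))
        total += coflow.weight * completion
    return total


# Hypothetical instance: two coflows on the nodes a, b, c.
coflows = (
    Coflow(2.0, 0, (Flow(("a", "b"), 1), Flow(("b", "c"), 1))),
    Coflow(1.0, 0, (Flow(("a", "c"), 2),)),
)
schedule = {(0, 0): [1], (0, 1): [2], (1, 0): [3, 4]}
```

In this instance the first coflow completes at step 2 and the second at step 4, so the objective is 2·2 + 1·4.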

### 1.2 Related Work

One may view our approach as coflow scheduling with underlying matching constraints on general network topologies. Accordingly, Path-based Coflow Scheduling is related to different variants of Coflow Scheduling and to the Concurrent Open Shop Scheduling problem in general. In the following, we concisely review related work in these fields.

Within the emerging field of coflow scheduling, primarily Bipartite Coflow Scheduling (see Figure 0(a)) has been studied [1, 4, 5, 15]. Ahmadi et al. presented the current state of the art, providing a 5-approximation for Bipartite Coflow Scheduling with release times, and a 4-approximation if release times are zero [1]. Recently, Shafiee and Ghaderi [19] achieved the same ratios based on a different LP formulation.

Table 1 summarizes coflow variants on arbitrary graphs, which have not been extensively studied so far and can be divided into path-based, circuit-based, and packet-based coflow scheduling. Jahanjou et al. focused on circuit-based coflows where a flow is a connection request between two nodes, every edge has limited capacity, but different jobs may send flow over the same edge [12]. They provide a 17.6-approximation for circuit-based coflows with fixed paths. Recently, Chowdhury et al. improved this ratio to a randomized 2-approximation [2]. Jahanjou et al. also considered packet-based coflows, which are sets of packet transmissions from a source to a destination on a given network. Every edge can serve at most one packet per time step. Contrary to the circuit setting, packets are not transferred along the entire path instantaneously. They gave an -approximation for packet-based coflows with given paths.

Path-based Coflow Scheduling has not been addressed so far and differs from circuit- and packet-based coflows as it allows only single unit-sized packets to be sent over each node. In contrast, previous approaches allow fractional data transmissions on nodes and links.

In all of the previously mentioned results, the path a flow takes through the network is assumed to be fixed. Several publications additionally introduce different methods of routing for the flows in the network to further improve completion times, including [2, 12, 20, 26]. In this paper, we always assume paths are given by the problem definition and no routing takes place.

So far, providing an algorithm for Bipartite Coflow Scheduling with an approximation ratio less than 5 resp. 4 has not been successful. Im et al. [10] recently presented a -approximation for Matroid Coflow Scheduling, where the family of flow sets that can be scheduled in a given time slot form a matroid. Since the flows that can be scheduled in a bipartite network do not form a matroid, this result does not improve the aforementioned ratios. A -approximation for Bipartite Coflow Scheduling was claimed in [11], which was subsequently retracted.

Coflow Scheduling generalizes Concurrent Open Shop Scheduling [13, 14, 23], where we are given a set of $m$ machines and $n$ jobs, and every job $k$ has a given number of operations on each machine $i$. The weight, release time, and completion time of a job are defined as for coflows. The goal is to minimize the weighted sum of completion times $\sum_{k=1}^{n} w_k C_k$. Sachdeva and Saket showed in [18] that Concurrent Open Shop Scheduling is hard to approximate within a factor of $2 - \varepsilon$ if $\mathrm{P} \neq \mathrm{NP}$, and therefore the same result holds for any variant of Coflow Scheduling. Concurrent Open Shop Scheduling admits a 3-approximation and 2-approximation for general and zero release times, respectively [13]. During our algorithm, we will compute a solution to the underlying Concurrent Open Shop Scheduling problem inherent to any coflow instance.

### 1.3 Contribution

With this paper, we are the first to introduce Path-based Coflow Scheduling, which generalizes the well known Bipartite Coflow Scheduling. We present an approximation algorithm based on a novel edge scheduling problem on a hypergraph. Specifically, we show that flows can be interpreted as hyperedges instead of paths in the network, because they occupy all machines on their path simultaneously. Theorem 1 states our main result with $\lambda$ being the number of vertices in a longest flow-bearing path of an instance graph.

###### Theorem 1.

There exists a $(2\lambda + 1)$-approximation for Path-based Coflow Scheduling with arbitrary release times and a $(2\lambda)$-approximation for Path-based Coflow Scheduling with zero release times.

Section 2 details the proof of Theorem 1. It is possible to extend the special case where all release times are zero to arbitrary release times that are smaller than . Furthermore, the approximation guarantee can be slightly improved to a factor strictly smaller than $2\lambda + 1$ and $2\lambda$, respectively. See Appendix C for details on how to obtain these minor improvements. Additionally, we generalize the algorithm to the case of non-uniform node capacities in Section 3.1. We also show that it matches or improves the state of the art for several problem variants. First, for we improve the deterministic state of the art for circuit-based coflows with unit capacities, which is a 17.6-approximation developed by Jahanjou et al. [12]. Refer to Appendix A for more details on this statement. Second, for $\lambda = 2$ our algorithm matches the state of the art for Bipartite Coflow Scheduling, a 5-approximation with release times and a 4-approximation without release times [1, 19]. Moreover, our algorithm yields the same ratios without the bipartiteness condition. Refer to Section 3.2 for a detailed comparison. Third, with $\lambda = 1$ Path-based Coflow Scheduling reduces to Concurrent Open Shop Scheduling. In this case, our algorithm matches the state of the art, yielding a 3-approximation with release times and a 2-approximation without release times. Overall, our approach seems to capture the difficulty of open shop scheduling with matching constraints well, especially if the parameter $\lambda$ is small.

## 2 Methodology

This section details the methodological foundation for Theorem 1 in two steps.

First, we introduce an LP relaxation of Path-based Coflow Scheduling in Section 2.1. Specifically, we reduce an instance of Path-based Coflow Scheduling to an instance of Concurrent Open Shop Scheduling by ignoring matching constraints and considering each node individually. We derive deadlines for the coflows from the LP solution. These tentative deadlines lie provably close to the optimal solution of the LP.

An important insight of this paper is that since flows occupy all machines on their path simultaneously, they can be interpreted as hyperedges instead of paths in the network. Thus, we transform every flow-path of the underlying graph into a hyperedge in Section 2.2. Here, we determine a schedule such that every edge still finishes within a factor of $\lambda$ of the previously found deadlines but no hyperedges that contain the same node overlap. We introduce a new problem called Edge Scheduling, based on a hypergraph $H$ with release time $r_e$ and deadline $D_e$ for every edge $e$. At each discrete time step $t$, we can schedule a subset of edges if they form a matching. The goal is to find, if possible, a feasible solution that schedules all edges between their release time and deadline.

In summary, we prove Theorem 1 in Section 2.2 based on the following rationale: The solution of the Edge Scheduling problem lies within a guaranteed factor of the deadlines constructed in step one. Since these deadlines were defined by the LP solution, which in turn is bounded by the optimal solution of the Coflow instance, the combined algorithm ultimately yields a provably good approximation factor.

In the remainder, we refer to coflows as jobs and to flows as operations to avoid ambiguous wording.

### 2.1 Finding Deadlines with Good Properties

Let $I$ be an instance of Path-based Coflow Scheduling with its underlying graph $G$. We introduce variables $C_k$ to denote the completion time of each job $k \in [n]$. Further, we define the load $L^{(k)}_i$ of job $k$ on machine $i$ as the sum of all operations of $k$ that go through node $i$:

$$L^{(k)}_i = \sum_{j:\, i \in P^{(k)}_j} c^{(k)}_j.$$

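For any subset $S \subseteq [n]$ and any machine $i \in [m]$, we define the quantity $f_i(S)$:

$$f_i(S) = \frac{1}{2}\left(\sum_{k \in S} \bigl(L^{(k)}_i\bigr)^2 + \Bigl(\sum_{k \in S} L^{(k)}_i\Bigr)^2\right).$$

With this notation, our LP relaxation reads:

$$\begin{aligned}
\min\ & \sum_{k=1}^{n} w_k C_k && && (2)\\
\text{s.t.}\ & C_k \ge r_k + L^{(k)}_i && \forall k \in [n],\ \forall i \in [m] && (3)\\
& \sum_{k \in S} L^{(k)}_i C_k \ge f_i(S) && \forall S \subseteq [n],\ \forall i \in [m]. && (4)
\end{aligned}$$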

The first set of constraints (3) obtains a lower bound for the completion time of a single job based on its release time. The second set of constraints (4) provides a lower bound on the completion time of any set of jobs $S \subseteq [n]$. Note that (4) has been used frequently in Concurrent Open Shop Scheduling and Bipartite Coflow Scheduling [16, 13, 7, 1] and, although the number of constraints is exponential, can be polynomially separated [16]. Accordingly, we can solve (2)–(4) using the ellipsoid method [9].
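As a small illustration of constraints (3) and (4), the sketch below computes the loads and checks a candidate vector of completion times by brute force over all subsets. It is only meant for tiny instances and is not the polynomial separation routine referred to above. The helper names and the example data are hypothetical, and the closed form used for $f_i(S)$ is the standard parallel-inequality right-hand side, which is stated here as an assumption.

```python
from itertools import combinations


def loads(jobs, nodes):
    """jobs[k] is a list of (path, units); returns L[k][i], the load of job k on node i."""
    return [{i: sum(units for path, units in ops if i in path) for i in nodes}
            for ops in jobs]


def f(L, i, S):
    """Assumed right-hand side of (4): half of (sum of squares + square of sum)."""
    total = sum(L[k][i] for k in S)
    squares = sum(L[k][i] ** 2 for k in S)
    return 0.5 * (squares + total * total)


def satisfies_lp(C, r, L, nodes):
    """Brute-force check of (3) and (4); exponential in the number of jobs."""
    n = len(C)
    for k in range(n):
        for i in nodes:
            if C[k] < r[k] + L[k][i]:          # constraint (3)
                return False
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            for i in nodes:
                lhs = sum(L[k][i] * C[k] for k in S)
                if lhs < f(L, i, S) - 1e-9:    # constraint (4)
                    return False
    return True


# Hypothetical instance: two jobs on the nodes a, b, c.
nodes = ["a", "b", "c"]
jobs = [[(("a", "b"), 1), (("b", "c"), 1)], [(("a", "c"), 2)]]
L = loads(jobs, nodes)
```

For this instance the vector `[2, 4]` passes both constraint families, whereas `[2, 2]` violates (4) for the set of both jobs on node `a`.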

We denote by $C^*_1, \dots, C^*_n$ an optimal solution to the LP and consider (w.l.o.g.) the jobs to be ordered s.t. $C^*_1 \le C^*_2 \le \dots \le C^*_n$.

###### Lemma 1 ([13, Lemma 11]).

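For all jobs $k \in [n]$ and all machines $i \in [m]$ the following holds:

$$C^*_k \ge \frac{1}{2} \sum_{l=1}^{k} L^{(l)}_i.$$

###### Proof.

Let $S = \{1, \dots, k\}$. Since $C^*$ is a feasible solution of the LP, it must fulfill (4), such that

$$\sum_{l=1}^{k} L^{(l)}_i C^*_l = \sum_{l \in S} L^{(l)}_i C^*_l \ge f_i(S) \ge \frac{1}{2} \cdot \Bigl(\sum_{l=1}^{k} L^{(l)}_i\Bigr)^2.$$

Accordingly, using $C^*_l \le C^*_k$ for all $l \le k$, we estimate the completion time of job $k$ as follows:

$$C^*_k \ge \frac{\sum_{l=1}^{k} L^{(l)}_i C^*_k}{\sum_{l=1}^{k} L^{(l)}_i} \ge \frac{\sum_{l=1}^{k} L^{(l)}_i C^*_l}{\sum_{l=1}^{k} L^{(l)}_i} \ge \frac{1}{2} \sum_{l=1}^{k} L^{(l)}_i.$$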

We now define a deadline $D_k$ for every job $k$. We utilize $D_k$ in Section 2.2 to define a partial order on the operations of the instance. With Lemma 1, we estimate for all $i \in [m]$:

$$D_k := 2 \cdot C^*_k \ge \sum_{l=1}^{k} L^{(l)}_i. \tag{5}$$
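The deadline bound (5) can be checked numerically on a toy instance. The sketch below assumes that a vector `C` satisfying the LP constraints is already available (it does not solve the LP), sorts the jobs by `C`, and verifies that twice each value dominates the prefix load on every node. The function names and the load table are illustrative assumptions.

```python
def deadlines(C):
    """D_k := 2 * C_k for every job k."""
    return [2 * c for c in C]


def prefix_bound_holds(C, L, nodes):
    """Check D_k >= sum of loads of all jobs up to k (in order of C) on every node."""
    order = sorted(range(len(C)), key=lambda k: C[k])
    D = deadlines(C)
    for i in nodes:
        prefix = 0
        for k in order:
            prefix += L[k][i]
            if D[k] < prefix:
                return False
    return True


# Hypothetical load table L[k][i] for two jobs on the nodes a, b, c.
nodes = ["a", "b", "c"]
L = [{"a": 1, "b": 2, "c": 1}, {"a": 2, "b": 0, "c": 2}]
```

With `C = [2, 4]` the bound holds on all three nodes. With `C = [1, 1]`, which is not LP-feasible for this load table, the bound fails on node `a`.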

### 2.2 The Edge Scheduling Algorithm

In this section, we design our edge scheduling algorithm. First, based on the deadlines $D_k$, we define a partial order on the operations of $I$. For every operation, it induces an upper bound on the number of preceding operations that share a node with it. With this order, we can then devise our edge scheduling algorithm.

#### Operation Order Based on Deadlines.

We transform $G$ into a hypergraph $H$. While the node set remains the same, we derive the hyperedges from the operations of the instance, i.e., the edge set $E$ consists of all hyperedges constructed in the following way: Let $j$ be an operation of job $k$ on a path $P^{(k)}_j$. Then, we add for each of the $c^{(k)}_j$ units of data sent by $j$ a corresponding hyperedge $e$, such that it consists of all nodes of the operation’s path. By so doing, we obtain $c^{(k)}_j$ identical edges for every operation. Furthermore, let $k_e$ denote the job corresponding to the operation of edge $e$. We set the release time $r_e := r_{k_e}$ and the deadline $D_e := D_{k_e}$ of $e$. Note that we have $|e| \le \lambda$ for all $e \in E$ with the maximum path-length $\lambda$.
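The transformation described above can be sketched in a few lines of Python. The tuple layout `(nodes, release, deadline)` for a hyperedge and the function name are assumptions made for illustration; an operation that sends several units simply yields that many identical hyperedges.

```python
def build_hyperedges(jobs, release, deadline):
    """jobs[k] is a list of (path, units); returns one hyperedge per unit of data."""
    hyperedges = []
    for k, operations in enumerate(jobs):
        for path, units in operations:
            for _ in range(units):
                # The hyperedge inherits release time and deadline from its job.
                hyperedges.append((frozenset(path), release[k], deadline[k]))
    return hyperedges


# Hypothetical instance: job 0 has two unit operations, job 1 one operation with two units.
jobs = [[(("a", "b"), 1), (("b", "c"), 1)], [(("a", "c"), 2)]]
hyperedges = build_hyperedges(jobs, release=[0, 0], deadline=[4, 8])
```

Here `hyperedges` contains four entries, the last two being identical copies for the two units sent along the path from `a` to `c`.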

We now consider the line graph $L(H)$ of $H$. Note that $L(H)$ is always a simple graph, although $H$ is a hypergraph with possibly multiple edges. Let $e$ and $f$ be hyperedges of $H$ with a common vertex $v$. We then say the edge $\{e, f\}$ of $L(H)$ originated from $v$.

As a basis for our algorithm, we define an order on the operations, i.e., the hyperedges of $H$ or the vertices of $L(H)$, using the notion of orientations and kernels.

###### Definition 1.

Let $G$ be a graph. An orientation $O$ of $G$ is a directed graph on the same vertex and edge set, where every edge in $G$ is assigned a unique direction in $O$. We define $d^+(v)$ as the set of outgoing edges at a vertex $v$.

###### Definition 2.

Let $G$ be a graph and $O$ an orientation of $G$. An independent set $K$ is called a kernel of $O$ if for all $v \notin K$ there is an arc directed from $v$ to some vertex in $K$.

W.l.o.g. we order the vertices of $L(H)$ (i.e., the hyperedges of $H$) by the converted deadlines obtained from the job deadlines of Section 2.1. For vertices that have the same deadline, we use an arbitrary order. Let this order be $e_1, e_2, \dots$ such that $D_{e_1} \le D_{e_2} \le \dots$, which is consistent with the ordering obtained from the deadlines of Section 2.1. Let $N(e)$ be the set of neighbours of an edge $e$ in $L(H)$. We construct an orientation $O$ of $L(H)$ with Algorithm 1.

The algorithm simply directs any edge of $L(H)$ such that the endpoint with the higher deadline points to the one with the lower deadline. Specifically, $O$ shows the characteristics described in Lemma 2.
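Since the listing of Algorithm 1 is not reproduced here, the following Python sketch reconstructs the orientation from the prose description only: two hyperedges are adjacent in the line graph if they share a node, and the arc points from the one later in the deadline order to the earlier one. Breaking ties by list index is one arbitrary choice, and the tuple layout `(nodes, release, deadline)` is an assumption.

```python
def orient_line_graph(edges):
    """edges[e] = (nodes, release, deadline); returns out[e], the heads of arcs leaving e."""
    # Order hyperedges by deadline; ties are broken by index (an arbitrary choice).
    order = sorted(range(len(edges)), key=lambda e: (edges[e][2], e))
    rank = {e: position for position, e in enumerate(order)}
    out = {e: set() for e in rank}
    for e in rank:
        for f in rank:
            shares_node = bool(edges[e][0] & edges[f][0])
            if e != f and shares_node and rank[f] < rank[e]:
                out[e].add(f)  # arc from the later edge e to the earlier edge f
    return out


# Hypothetical hyperedges on the nodes a, b, c with deadlines 4 and 8.
edges = [
    (frozenset("ab"), 0, 4),
    (frozenset("bc"), 0, 4),
    (frozenset("ac"), 0, 8),
    (frozenset("ac"), 0, 8),
]
out = orient_line_graph(edges)
```

Because every arc points to an edge that is earlier in a fixed linear order, the resulting digraph cannot contain a directed cycle.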

###### Lemma 2.

An orientation constructed by Algorithm 1 has the following properties:

1. Any vertex $e$ satisfies the inequality $|d^+(e)| \le \lambda \cdot (D_e - 1)$.

2. $O$ is kernel-perfect, i.e., every induced subgraph of $O$ has a kernel.

###### Proof.

We prove Lemma 2 in two steps.

1. Consider an arbitrary vertex of $L(H)$ representing edge $e = e_j$, with $j$ being the index of $e$ in the ordering of the edges $e_1, e_2, \dots$. Recall that by Algorithm 1, $e$ has only outgoing arcs in $O$ to vertices in the set $N(e) \cap \{e_1, \dots, e_{j-1}\}$.

In $H$, $e$ is a hyperedge with at most $\lambda$ endpoints. Let $v$ be an endpoint of $e$ and let $d^+_v(e)$ be the set of outgoing arcs from $e$ that originated from $v$ during the construction of the line graph.

We now focus on the cardinality of $d^+_v(e)$: the endpoint of any arc from this set must lie in $\{e_1, \dots, e_{j-1}\}$. Recall that by $k_f$ we denote the job corresponding to an edge $f$. For all edges $f \in \{e_1, \dots, e_{j-1}\}$ we have $D_{k_f} \le D_{k_e}$. Hence, the same holds for all edges that are the endpoint of an arc in $d^+_v(e)$. Therefore, we obtain

$$\begin{aligned}
|d^+_v(e)| &\le \bigl|\{f \in E \setminus \{e\} : f \text{ contains } v \text{ and } D_{k_f} \le D_{k_e}\}\bigr| \\
&= \bigl|\{f \in E : f \text{ contains } v \text{ and } D_{k_f} \le D_{k_e}\}\bigr| - 1 \\
&= \sum_{l=1}^{k_e} L^{(l)}_v - 1 && (6)\\
&\le D_{k_e} - 1. && (7)
\end{aligned}$$

To derive (6), we observe that the load on machine $v$ up to job $k_e$ is equal to the number of edges containing $v$ from jobs with a smaller or equal deadline. The final step (7) results from (5).

Since $e$ has at most $\lambda$ endpoints in $H$, we conclude

$$|d^+(e)| \le \sum_{v \in e} |d^+_v(e)| \le \lambda \cdot (D_{k_e} - 1) = \lambda \cdot (D_e - 1).$$

2. We note that any digraph without directed cycles of odd length is kernel-perfect [17]. Additionally, we observe that $O$ does not contain any directed cycles to begin with.

#### Edge Scheduling.

With these preliminaries, we devise our Edge Scheduling algorithm as described in Algorithm 2. This algorithm finds a feasible edge schedule on $H$ such that no edge $e$ is scheduled later than $r_e + \lambda D_e$ (see Lemma 3).

###### Lemma 3.

Algorithm 2 finds a feasible solution for Edge Scheduling on a given hypergraph $H$, s.t. every hyperedge $e$ is scheduled not later than $r_e + \lambda D_e$.

###### Proof.

We note that any induced subgraph of $O$ has a kernel (see Lemma 2). Hence, we can find a kernel in each iteration of the algorithm because the modified graph remains an induced subgraph of the original orientation. Refer to Appendix B on how to construct a kernel in a cycle-free directed graph. Accordingly, Algorithm 2 is well defined.

For an arbitrary hyperedge of , assume that in any iteration of the algorithm we have and is already released. Then, is scheduled at the current time slot because lies in the kernel as it has no outgoing edges and . Hence, it suffices to prove that for any hyperedge of after at most iterations holds.

The orientation fulfills in the beginning of the algorithm (see Lemma 2). We note that for any iteration in which , hyperedge is not considered to be scheduled at all, which is necessary to satisfy the release time constraint. We now consider all iterations . In each of these iterations, holds because and two cases remain:

1. If at any point before iteration , the result is immediate.

2. If, on the other hand, , then must have an outgoing edge to some by the kernel property of . As gets removed from at the end of the iteration, loses at least one outgoing edge. Hence, after at most such iterations, we have .

This concludes the proof. ∎
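The listing of Algorithm 2 is not reproduced here, so the following Python sketch is a reconstruction from the proof above and should be read as one plausible variant rather than the authors' exact procedure. In every time slot it computes a kernel of the orientation induced on the released, not yet scheduled hyperedges, schedules the kernel, and removes it. Two assumptions are made: time slots start at 1 and an edge with release time `r` is eligible from slot `r + 1` on, and the kernel of the acyclic orientation is built greedily in deadline order instead of following Appendix B.

```python
def kernel(active, out):
    """Greedy kernel of the acyclic orientation induced on `active` (sorted by rank)."""
    chosen = set()
    for e in active:
        # e joins the kernel unless one of its out-neighbours was already chosen.
        if not (out[e] & chosen):
            chosen.add(e)
    return chosen


def schedule_hyperedges(edges):
    """edges[e] = (nodes, release, deadline); returns slot[e], the time slot of e."""
    order = sorted(range(len(edges)), key=lambda e: (edges[e][2], e))
    rank = {e: position for position, e in enumerate(order)}
    out = {e: {f for f in rank if rank[f] < rank[e] and edges[e][0] & edges[f][0]}
           for e in rank}
    remaining, slot, t = set(rank), {}, 0
    while remaining:
        t += 1
        active = sorted((e for e in remaining if edges[e][1] < t), key=rank.get)
        for e in kernel(active, out):
            slot[e] = t            # kernel members are pairwise node-disjoint
            remaining.discard(e)
    return slot


# Hypothetical hyperedges; the last one is released later and touches no other edge.
edges = [
    (frozenset("ab"), 0, 4),
    (frozenset("bc"), 0, 4),
    (frozenset("ac"), 0, 8),
    (frozenset("ac"), 0, 8),
    (frozenset("de"), 2, 4),
]
slot = schedule_hyperedges(edges)
```

The greedy kernel is valid because all arcs point to lower-ranked edges: a chosen edge never has a chosen out-neighbour, and every rejected edge has one. In the example every hyperedge is placed well before $r_e + \lambda D_e$ with $\lambda = 2$.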

Given this upper bound on the scheduled time for every edge, we prove Theorem 1.

Proof of Theorem 1. We consider a given instance $I$ of Path-based Coflow Scheduling. Then, we can solve the LP relaxation (2) to obtain solutions $C^*_k$ for all jobs $k$ (see Section 2.1). We define deadlines $D_k := 2 C^*_k$. Note that we have $r_k \le C^*_k$ because of (3).

Now, we transform the graph $G$ into a hypergraph $H$ as described in Section 2.2. Then, we define an orientation $O$ according to Algorithm 1 and run Algorithm 2 on $H$.

By Lemma 3, this algorithm schedules every edge $e$ within $r_e + \lambda D_e$ in polynomial time. Given the specific structure of the hypergraph and the definition of deadlines for the hyperedges, the resulting schedule induces a feasible solution for the Coflow instance by assigning every operation to the slot of the corresponding hyperedge.

Let $C_k$ be the final completion time of job $k$ in this solution; let $e$ be the last edge in the schedule associated to $k$; and let $C_e$ be the time slot in which $e$ is scheduled. Then for all $k \in [n]$:

$$C_k = C_e \le r_e + \lambda D_e = r_k + \lambda D_k.$$

Summing over all jobs $k$, we obtain

$$\sum_{k=1}^{n} w_k C_k \le \sum_{k=1}^{n} w_k (r_k + \lambda D_k) = \sum_{k=1}^{n} w_k (r_k + \lambda \cdot 2 C^*_k) \le (2\lambda + 1) \cdot \sum_{k=1}^{n} w_k C^*_k \le (2\lambda + 1) \cdot \mathrm{opt}(I),$$

and if $r_k = 0$ for all $k$, we have

$$\sum_{k=1}^{n} w_k C_k \le \sum_{k=1}^{n} w_k (\lambda D_k) = 2\lambda \cdot \sum_{k=1}^{n} w_k C^*_k \le 2\lambda \cdot \mathrm{opt}(I).$$

We conclude that our algorithm solves Path-based Coflow Scheduling within a factor of $2\lambda + 1$ of the optimal solution for general release times. In the case of zero release times the solution lies within a factor of $2\lambda$ of the optimum. ∎

## 3 Extensions of the Algorithm

This section generalizes our result to additional application cases. First, we show in Section 3.1 how the algorithm can be extended for general vertex constraints. Then, we apply our algorithm to Bipartite Coflow Scheduling in Section 3.2.

### 3.1 General Vertex Constraints

In this section, we show how our algorithm can be generalized to (i) homogeneous vertex capacities greater than one and (ii) heterogeneous vertex capacities.

In the homogeneous case it is simple to transform the problem back to the unit capacity case. In the heterogeneous case, the approximation ratio depends on the maximum ratio between the average and lowest capacity of the vertices of a hyperedge as we will show in the remainder of this section.

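Let $H$ be a hypergraph as constructed in Section 2.2 and let a capacity $u(v)$ be given for all nodes $v$. For every hyperedge $e$ we introduce the notions of average capacity ($\mathrm{avg}(e)$) and capacity disparity ($\Delta(e)$):

$$\mathrm{avg}(e) := \frac{1}{|e|} \sum_{v \in e} u(v), \qquad \Delta(e) := \frac{\mathrm{avg}(e)}{\min_{v \in e} u(v)}.$$

To this end, we show Theorem 2. We note that for $\lambda = 1$, where the “hyperedges” only consist of single vertices, $\Delta(e) = 1$ holds for all $e$. Hence, we retain the ratios of 3 and 2 in this generalization of Concurrent Open Shop Scheduling. As soon as edges consist of at least two vertices, we must include the capacity disparity in the approximation ratio.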

###### Theorem 2.

Let $\Delta := \max_{e \in E} \Delta(e)$. There exists a $(2\lambda\Delta + 1)$-approximation for Path-based Coflow Scheduling with arbitrary release times and a $(2\lambda\Delta)$-approximation for Path-based Coflow Scheduling with zero release times.

We note that if all vertex capacities are homogeneous, that is $u(v) = u$ for all nodes $v$, the capacity disparity of all edges is equal to 1. Thus, in this case we retain the ratios $2\lambda + 1$ and $2\lambda$ from the unit capacity case. Alternatively, the homogeneous problem can be transformed back to the unit capacity case by linearly scaling the time horizon by $u$, i.e., $u$ timesteps in the new schedule correspond to one timestep in the original problem. This incurs no additional factors in the approximation ratio of the algorithm.
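The two capacity notions can be computed directly. The sketch below takes the average capacity of a hyperedge to be the mean of its node capacities and the capacity disparity to be that mean divided by the smallest capacity, which is how the later derivation uses them; the function names and the capacity tables are hypothetical. Exact fractions avoid rounding issues.

```python
from fractions import Fraction


def avg_capacity(edge, u):
    """Mean capacity of the nodes of a hyperedge."""
    return Fraction(sum(u[v] for v in edge), len(edge))


def capacity_disparity(edge, u):
    """Ratio between the average and the lowest capacity on the hyperedge."""
    return avg_capacity(edge, u) / min(u[v] for v in edge)


def max_disparity(edges, u):
    """The instance-wide value: the largest disparity over all hyperedges."""
    return max(capacity_disparity(e, u) for e in edges)


# Hypothetical capacities: heterogeneous and homogeneous.
heterogeneous = {"a": 1, "b": 3, "c": 2}
homogeneous = {"a": 4, "b": 4, "c": 4}
edges = [("a", "b"), ("b", "c"), ("a", "c")]
```

For homogeneous capacities every hyperedge has disparity 1, matching the remark above that the unit-capacity ratios are retained.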

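Now consider general capacities $u(i)$ for every machine $i \in [m]$. We modify constraints (3) of the LP to $C_k \ge r_k + L^{(k)}_i / u(i)$ for all $k \in [n]$ and $i \in [m]$. Changing constraints (4) analogously, we get the following LP:

$$\begin{aligned}
\min\ & \sum_{k=1}^{n} w_k C_k \\
\text{s.t.}\ & C_k \ge r_k + \frac{L^{(k)}_i}{u(i)} && \forall k \in [n],\ \forall i \in [m] \\
& \sum_{k \in S} L^{(k)}_i C_k \ge \frac{f_i(S)}{u(i)} && \forall S \subseteq [n],\ \forall i \in [m].
\end{aligned}$$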

Let $C^*$ be an optimal solution of this LP, ordered such that $C^*_1 \le C^*_2 \le \dots \le C^*_n$. Then, Lemma 4 revisits Lemma 1, requiring only minor changes in its proof.

###### Lemma 4.

For all jobs $k \in [n]$ and all machines $i \in [m]$: $C^*_k \ge \frac{1}{2 u(i)} \sum_{l=1}^{k} L^{(l)}_i$.

We again define $D_k := 2 C^*_k$ and consider the hypergraph $H$ constructed from the input graph where all operations correspond to hyperedges. We define release times and edge deadlines analogously to Section 2.1, but based on the updated LP. Then, we use Algorithm 1 to construct an orientation $O$ of the line graph and reformulate Lemma 2.

###### Lemma 5.

The orientation as constructed by Algorithm 1 in the case of general vertex capacities has the following properties:

1. Any vertex $e$ of the line graph satisfies $|d^+(e)| \le \lambda \cdot (D_e \cdot \mathrm{avg}(e) - 1)$.

2. It is kernel-perfect, i.e., every induced subgraph of $O$ has a kernel.

###### Proof.

We prove Lemma 5 in two steps.

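1. Let $e$ be any vertex of $L(H)$ and $v$ be an endpoint of $e$. We may repeat the line of argument of the proof of Lemma 2 until the step

$$|d^+_v(e)| \le \sum_{l=1}^{k_e} L^{(l)}_v - 1.$$

By Lemma 4 and the definition of $D_{k_e}$, we have $\sum_{l=1}^{k_e} L^{(l)}_v \le D_{k_e} \cdot u(v)$. Note here that endpoints of $e$ correspond to machines of the job $k_e$. We sum over all such endpoints of $e$ to obtain

$$|d^+(e)| \le \sum_{v \in e} |d^+_v(e)| \le \sum_{v \in e} \bigl(D_{k_e} \cdot u(v) - 1\bigr) = D_e \cdot \sum_{v \in e} u(v) - |e| = |e| \cdot \bigl(D_e \cdot \mathrm{avg}(e) - 1\bigr).$$

Observing that the number of endpoints $|e|$ is bounded by $\lambda$ gives the final inequality.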

2. See Proof of Lemma 2.

Now, we change the Edge Scheduling algorithm to nontrivial vertex constraints as follows.

###### Lemma 6.

Algorithm 3 finds a feasible solution for Edge Scheduling on a given hypergraph $H$, s.t. every hyperedge $e$ is scheduled not later than $r_e + \lambda D_e \Delta(e)$.

###### Proof.

We note that the existence and construction of a kernel follow as in the proof of Lemma 3.

Let be any hyperedge of . We prove that after at most time steps it holds that and that there is at least one open slot left for itself.

By Lemma 5, the orientation fulfills in the beginning of Algorithm 3. Edge is in in every iteration . In every such iteration, we repeatedly search for a kernel of until all vertices have no capacities left. One particular edge remains in as long as all its endpoints have available capacity. Accordingly, unless it is already scheduled, is considered at least times in every slot .

If at any point until iteration , then is scheduled and the claim holds. If, on the other hand, for all sub-iterations before that, then must have an outgoing edge to some in every such sub-iteration by the kernel property of . Therefore, loses at least outgoing edges in every iteration.

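In total, $e$ would lose at least

$$\lambda D_e \Delta(e) \cdot \min_{v \in e} u(v) \ge \lambda D_e \cdot \mathrm{avg}(e)$$

outgoing edges until iteration $r_e + \lambda D_e \Delta(e)$. But since $e$ only has

$$|d^+(e)| \le \lambda \bigl(D_e \cdot \mathrm{avg}(e) - 1\bigr) < \lambda D_e \cdot \mathrm{avg}(e)$$

such outgoing edges, there is at least one slot left where it holds that . Hence $e$ is scheduled not later than iteration $r_e + \lambda D_e \Delta(e)$. ∎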
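The listing of Algorithm 3 is likewise not reproduced here. The sketch below is a reconstruction from the proof: within each time slot it repeatedly computes a kernel among the released, unscheduled hyperedges whose nodes all have residual capacity, schedules it, and decrements the capacities. It rests on the same assumptions as for the unit-capacity case (slots start at 1, an edge with release time `r` is eligible from slot `r + 1`, greedy kernel in deadline order) and additionally assumes that every capacity is at least one.

```python
def schedule_with_capacities(edges, u):
    """edges[e] = (nodes, release, deadline); u[v] = capacity of node v per time slot."""
    order = sorted(range(len(edges)), key=lambda e: (edges[e][2], e))
    rank = {e: position for position, e in enumerate(order)}
    out = {e: {f for f in rank if rank[f] < rank[e] and edges[e][0] & edges[f][0]}
           for e in rank}
    remaining, slot, t = set(rank), {}, 0
    while remaining:
        t += 1
        residual = dict(u)  # capacities are reset at the start of every slot
        while True:
            active = sorted(
                (e for e in remaining
                 if edges[e][1] < t and all(residual[v] > 0 for v in edges[e][0])),
                key=rank.get,
            )
            if not active:
                break
            chosen = set()
            for e in active:               # greedy kernel, as in the unit case
                if not (out[e] & chosen):
                    chosen.add(e)
            for e in chosen:
                slot[e] = t
                remaining.discard(e)
                for v in edges[e][0]:
                    residual[v] -= 1
    return slot


# Hypothetical hyperedges on the nodes a, b, c.
edges = [
    (frozenset("ab"), 0, 4),
    (frozenset("bc"), 0, 4),
    (frozenset("ac"), 0, 8),
    (frozenset("ac"), 0, 8),
]
```

With capacity 2 on every node, three of the four hyperedges fit into the first slot. With capacity 1 the sketch falls back to the unit-capacity behaviour and needs four slots.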

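To finally prove Theorem 2, we follow along the lines of the proof of Theorem 1. We estimate the completion time of a job $k$ by its latest edge $e$. Hence,

$$C_k = C_e \le r_e + \lambda D_e \Delta(e) \le r_k + \lambda D_k \cdot \max_{e \in E} \Delta(e).$$

For the final estimation we then get

$$\sum_{k=1}^{n} w_k C_k \le \sum_{k=1}^{n} w_k \Bigl(r_k + \lambda D_k \cdot \max_{e \in E} \Delta(e)\Bigr) \le (2\lambda\Delta + 1) \cdot \mathrm{opt}(I)$$

and note that the case without release times is analogous.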

### 3.2 Bipartite Coflow Scheduling

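We now show how our algorithm can be applied to Bipartite Coflow Scheduling. An instance of Bipartite Coflow Scheduling considers a bipartite graph $G$, each side consisting of $m$ ports. Each coflow $k$ sends $c^{(k)}_{i,j}$ units from input port $i$ to output port $j$. The definitions of weight, release time, and completion time are the same as in Section 1.1; each port can handle at most one unit-sized packet of data per time slot; and the objective remains to minimize $\sum_{k=1}^{n} w_k C_k$.

We define the load of job $k$ on machine $i$ as the sum of all operations on that machine. The load on machine $j$ is defined equivalently:

$$L^{(k)}_i = \sum_{j=1}^{m} c^{(k)}_{i,j}, \qquad L^{(k)}_j = \sum_{i=1}^{m} c^{(k)}_{i,j}.$$

With this notation, we redefine LP (2) as

$$\begin{aligned}
\min\ & \sum_{k=1}^{n} w_k C_k \\
\text{s.t.}\ & C_k \ge r_k + L^{(k)}_i && \forall k \in [n],\ \forall i \in [m] \\
& C_k \ge r_k + L^{(k)}_j && \forall k \in [n],\ \forall j \in [m] \\
& \sum_{k \in S} L^{(k)}_i C_k \ge f_i(S) && \forall S \subseteq [n],\ \forall i \in [m] \\
& \sum_{k \in S} L^{(k)}_j C_k \ge f_j(S) && \forall S \subseteq [n],\ \forall j \in [m].
\end{aligned}$$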

Again, we consider an optimal solution $C^*$ of the LP which is ordered such that $C^*_1 \le C^*_2 \le \dots \le C^*_n$. Then, Lemma 1 holds without changes and we can analogously define $D_k := 2 C^*_k$.

In the bipartite case, every operation already corresponds directly to an edge in the graph such that transforming $G$ becomes superfluous. Analogously to our general case, we define the release times and deadlines of the edges based on the job the edge belongs to. The orientation is defined as in Algorithm 1 and Lemmas 2 and 3 hold with $\lambda = 2$; the proofs are analogous. Moreover, the proof of Theorem 1 with $\lambda = 2$ is equivalent so that we can state our result for the bipartite case.

###### Theorem 3.

The Path-based Coflow Scheduling algorithm can be applied to Bipartite Coflow Scheduling and gives a 5-approximation for arbitrary release times and a 4-approximation for zero release times.

In this context, we clarify that the algorithm of Ahmadi et al. [1] is not applicable to our more general Path-based Coflow Scheduling: We recall that they based their approach on a primal-dual analysis of the LP relaxation to obtain an order for the jobs. The main idea of their algorithm is a combinatorial shifting of operations from later to earlier time slots based on this job order. They use a result by [15] to schedule single jobs within a number of steps equal to their maximum load.

We now show that this central lemma does not generalize to the case where the graph is non-bipartite. With our notation this lemma is as follows.

###### Lemma 7 ([1, Lemma 1]).

There exists a polynomial algorithm that schedules a single coflow job $k$ in $\max_i L^{(k)}_i$ time steps.

We consider a simple example graph (see Figure 2) consisting of three vertices connected by three edges which form a triangle. The single coflow on this graph is defined by an operation on each edge, each sending one unit of data. The load on any vertex of the graph is equal to two, hence the maximum load is two. However, since the edges form a triangle, three steps are needed to feasibly schedule the entire job and Lemma 7 does not hold.

Additionally, the lemma does not hold when we generalize the bipartiteness condition to -partite hypergraphs, which arise in Path-based Coflow Scheduling with . Figure 3 shows a counterexample that consists of one coflow with four flows. Each flow sends one unit of data along a path with three vertices ($\lambda = 3$). The corresponding hypergraph is 3-partite with the start vertices, middle vertices, and end vertices of the flow paths forming the three disjoint vertex sets. Flows and their corresponding hyperedges have the same color. Because any two hyperedges have a common vertex, any feasible schedule requires at least four time steps. This contradicts Lemma 7 as the maximum load is only .
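Both counterexamples can be verified by exhaustive search. The sketch below computes the maximum node load and the minimum number of time steps in which a set of unit hyperedges can be scheduled so that every step is a matching. The triangle follows the description above; the 3-partite instance is one possible construction consistent with the description (four pairwise intersecting hyperedges with three vertices each) and is not claimed to be the instance drawn in Figure 3. Function and vertex names are hypothetical.

```python
from itertools import product


def max_load(hyperedges):
    """Largest number of hyperedges that contain a common node."""
    nodes = set().union(*hyperedges)
    return max(sum(v in e for e in hyperedges) for v in nodes)


def min_time_steps(hyperedges):
    """Smallest number of steps such that edges sharing a node get different steps."""
    n = len(hyperedges)
    for steps in range(1, n + 1):
        for slots in product(range(steps), repeat=n):
            conflict_free = all(
                slots[a] != slots[b] or not (hyperedges[a] & hyperedges[b])
                for a in range(n) for b in range(a + 1, n)
            )
            if conflict_free:
                return steps


triangle = [frozenset("ab"), frozenset("bc"), frozenset("ac")]

# Assumed 3-partite instance: start vertices s*, middle vertices m*, end vertices t*.
three_partite = [
    frozenset({"s1", "m1", "t1"}),
    frozenset({"s1", "m2", "t2"}),
    frozenset({"s2", "m1", "t2"}),
    frozenset({"s2", "m2", "t1"}),
]
```

In both instances the search needs more steps than the maximum load, so a statement of the form of Lemma 7 cannot hold for them.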

## 4 Conclusion

In this paper, we introduced the Path-based Coflow Scheduling problem with release dates that arises in the context of today’s distributed computing projects. We presented a $(2\lambda + 1)$-approximation algorithm for homogeneous unit-sized node capacities. For zero release times this result improves to a $(2\lambda)$-approximation. We generalized this algorithm to arbitrary node constraints with a $(2\lambda\Delta + 1)$- and a $(2\lambda\Delta)$-approximation in the case of general and zero release times. Here, $\Delta$ captures the capacity disparity between nodes. Furthermore, we showed that our algorithm is applicable to a wide range of problem variants, often matching the state of the art, e.g., for Bipartite Coflow Scheduling and Concurrent Open Shop Scheduling.

Further work is required to close the gaps between the presented ratios and the lower bound given by the reduction to Concurrent Open Shop, which is not tight in general. It is likely that our robust approach, which uses orientations to sort the operations in the scheduling part of our algorithm, can be further improved with new ideas.

Finally, we remark that it might be possible to fix and extend the approach of [1] to our general framework, since the given counterexamples only contradict Lemma 7 but do not yield a worse approximation ratio. We leave this question open for future research.

## References

• [1] S. Ahmadi, S. Khuller, M. Purohit, and S. Yang (2017) On scheduling coflows. In Integer Programming and Combinatorial Optimization, pp. 13–24. Cited by: §1.2, §1.3, §1, §2.1, §3.2, §4, Lemma 7.
• [2] M. Chowdhury, S. Khuller, M. Purohit, S. Yang, and J. You (2019) Near optimal coflow scheduling in networks. In 31st ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’19. Cited by: Appendix A, Appendix A, §1.2, §1.2, Table 1, §1.
• [3] M. Chowdhury and I. Stoica (2012) Coflow: a networking abstraction for cluster applications. In Proceedings of the 11th ACM Workshop on Hot Topics in Networks, HotNets-XI, pp. 31–36. External Links: ISBN 978-1-4503-1776-4 Cited by: §1.
• [4] M. Chowdhury and I. Stoica (2015) Efficient coflow scheduling without prior knowledge. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, SIGCOMM ’15, New York, NY, USA, pp. 393–406. External Links: Document, ISBN 978-1-4503-3542-3 Cited by: §1.2, §1.
• [5] M. Chowdhury, Y. Zhong, and I. Stoica (2014) Efficient coflow scheduling with varys. In Proceedings of the 2014 ACM Conference on SIGCOMM, SIGCOMM ’14, New York, NY, USA, pp. 443–454. External Links: Document, ISBN 978-1-4503-2836-4 Cited by: §1.2, §1.
• [6] J. Dean and S. Ghemawat (2008-01) MapReduce: simplified data processing on large clusters. Commun. ACM 51 (1), pp. 107–113. External Links: Document, ISSN 0001-0782 Cited by: §1.
• [7] N. Garg, A. Kumar, and V. Pandit (2007) Order scheduling models: hardness and algorithms. In FSTTCS 2007: Foundations of Software Technology and Theoretical Computer Science, V. Arvind and S. Prasad (Eds.), Berlin, Heidelberg, pp. 96–107. External Links: ISBN 978-3-540-77050-3 Cited by: §2.1.
• [9] M. Grötschel, L. Lovász, and A. Schrijver (1993) Geometric algorithms and combinatorial optimization. Springer Berlin Heidelberg, Berlin, Heidelberg. External Links: ISBN 978-3-642-78240-4 Cited by: §2.1.
• [10] S. Im, B. Moseley, K. Pruhs, and M. Purohit (2019) Matroid Coflow Scheduling. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), C. Baier, I. Chatzigiannakis, P. Flocchini, and S. Leonardi (Eds.), Leibniz International Proceedings in Informatics (LIPIcs), Vol. 132, Dagstuhl, Germany, pp. 145:1–145:14. Note: Keywords: Coflow Scheduling, Concurrent Open Shop, Matroid Scheduling External Links: ISBN 978-3-95977-109-2, ISSN 1868-8969, Document Cited by: §1.2.
• [11] S. Im and M. Purohit (2017) A tight approximation for co-flow scheduling for minimizing total weighted completion time. CoRR abs/1707.04331. External Links: 1707.04331 Cited by: §1.2.
• [12] H. Jahanjou, E. Kantor, and R. Rajaraman (2017) Asymptotically optimal approximation algorithms for coflow scheduling. In Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’17, pp. 45–54. External Links: ISBN 978-1-4503-4593-4 Cited by: Appendix A, §1.2, §1.2, §1.3, Table 1, §1.
• [13] J. Y.-T. Leung, H. Li, and M. Pinedo (2007) Scheduling orders for multiple product types to minimize total weighted completion time. Discrete Applied Mathematics 155 (8), pp. 945–970. External Links: ISSN 0166-218X Cited by: §1.2, §2.1, Lemma 1.
• [14] M. Mastrolilli, M. Queyranne, A. S. Schulz, O. Svensson, and N. A. Uhan (2010) Minimizing the sum of weighted completion times in a concurrent open shop. Operations Research Letters 38 (5), pp. 390–395. External Links: ISSN 0167-6377 Cited by: §1.2.
• [15] Z. Qiu, C. Stein, and Y. Zhong (2015) Minimizing the total weighted completion time of coflows in datacenter networks. In Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’15, New York, NY, USA, pp. 294–303. External Links: Document, ISBN 978-1-4503-3588-1 Cited by: §1.2, §1, §3.2.
• [16] M. Queyranne (1993-01-01) Structure of a simple scheduling polyhedron. Mathematical Programming 58 (1), pp. 263–285. External Links: Document, ISSN 1436-4646 Cited by: §2.1.
• [17] M. Richardson (1946-02) On weakly ordered systems. Bull. Amer. Math. Soc. 52 (2), pp. 113–116. Cited by: item 2.
• [18] S. Sachdeva and R. Saket (2013) Optimal inapproximability for scheduling problems via structural hardness for hypergraph vertex cover. In 2013 IEEE Conference on Computational Complexity, pp. 219–229. External Links: ISSN 1093-0159 Cited by: §1.2.
• [19] M. Shafiee and J. Ghaderi (2018) An improved bound for minimizing the total weighted completion time of coflows in datacenters. IEEE/ACM Trans. Netw. 26 (4), pp. 1674–1687. External Links: Document, ISSN 1063-6692 Cited by: §1.2, §1.3.
• [20] L. Shi, J. Zhang, Y. Liu, and T. G. Robertazzi (2018) Coflow scheduling in data centers: routing and bandwidth allocation. CoRR abs/1812.06898. External Links: 1812.06898 Cited by: §1.2, §1, §1.
• [21] Stanford University (2000) Folding@home. Cited by: §1.
• [22] University of California (2002) Berkeley open infrastructure for network computing. Cited by: §1.
• [23] G. Wang and T. C. E. Cheng (2007) Customer order scheduling to minimize total weighted completion time. Omega 35 (5), pp. 623–626. External Links: Document, ISSN 0305-0483 Cited by: §1.2.
• [24] G. Woltman (1996) Great internet mersenne prime search. Mersenne Research, Inc.. Cited by: §1.
• [25] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica (2010) Spark: cluster computing with working sets. HotCloud 10 (10-10), pp. 95. Cited by: §1.
• [26] Y. Zhao, K. Chen, W. Bai, M. Yu, C. Tian, Y. Geng, Y. Zhang, D. Li, and S. Wang (2015-04) Rapier: integrating routing and scheduling for coflow-aware data center networks. In 2015 IEEE Conference on Computer Communications (INFOCOM), pp. 424–432. External Links: Document, ISSN 0743-166X Cited by: §1.2.

## Appendix A Equivalence of Edge Constraints and Node Constraints

In this section we show how to simulate edge capacities as used in [2, 12] by node capacities and vice versa.

For the forward direction, we assume a graph with capacities on every edge as well as a set of coflows defined by operations on given paths. The reduction works as follows: We split every edge in the middle and add a node with the corresponding capacity of the split edge. All other nodes are assigned infinite capacity. Evidently, this transformation is polynomial and correct.
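The forward reduction can be sketched in a few lines of Python. The function name `edges_to_node_capacities`, the dictionary-based graph encoding, and the `("mid", u, v)` labels for the inserted nodes are assumptions made for illustration only; edges are treated as undirected here.

```python
import math


def edges_to_node_capacities(edge_capacity, paths):
    """Split every edge by a middle node that inherits the edge capacity.

    edge_capacity: dict mapping an edge (u, v) to its capacity.
    paths: list of flow paths, each a non-empty list of nodes.
    Returns the node capacities and the flow paths in the transformed graph.
    """
    node_capacity = {}
    for (u, v), capacity in edge_capacity.items():
        # Original nodes are not restricted any more.
        node_capacity.setdefault(u, math.inf)
        node_capacity.setdefault(v, math.inf)
        # The new middle node carries the capacity of the split edge.
        node_capacity[("mid", u, v)] = capacity

    new_paths = []
    for path in paths:
        new_path = [path[0]]
        for u, v in zip(path, path[1:]):
            key = (u, v) if (u, v) in edge_capacity else (v, u)
            if key not in edge_capacity:
                raise KeyError(f"path uses unknown edge {u}-{v}")
            new_path += [("mid", *key), v]
        new_paths.append(new_path)
    return node_capacity, new_paths


caps, new_paths = edges_to_node_capacities(
    {("a", "b"): 2, ("b", "c"): 1}, [["a", "b", "c"]]
)
finite_on_path = [n for n in new_paths[0] if caps[n] != math.inf]
print(new_paths[0], finite_on_path)
```

On this small example, the transformed path contains exactly one finite-capacity node per original edge, and all original nodes become unrestricted.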

In Section 1.3 we argue that our algorithm can solve circuit-based coflows with unit-capacities with an approximation ratio smaller than for all . However, the above reduction increases the lengths of the given paths in our problem, in particular it increases the parameter , on which the approximation factor of our algorithm depends. In the following, we show that the approximation ratio is still less than if , based on the value of in the original graph .

Let the set of machines be partitioned into two sets of nodes, , where is the set of machines with finite capacity and is the set of machines with infinite capacity. By looking at the LP relaxation (2), we see that the constraints for a node do not need to be added to the LP since they do not limit the completion times of the operations. In fact, if all nodes had infinite capacity, one constraint of the form for all would suffice to describe the polyhedron completely.

Therefore, we only need to consider the nodes with finite capacity for the definition of the deadlines. Additionally, we can simplify the construction of the line graph in Section 2.2 such that an edge between two vertices of the line graph is only added if the two corresponding hyperedges share a node with finite capacity.

We redefine $\lambda$ as the maximum number of finite-capacity nodes on any flow-bearing path in the graph. Then Lemmas 2 and 3 hold with this new definition, since the deadlines were defined using only such finite nodes. Finally, Theorem 1 can be amended such that there exists a $(2\lambda + 1)$-approximation and a $(2\lambda)$-approximation, respectively, with respect to the redefined $\lambda$.

For our reduction described above, we see that for every path in the original graph , the number of finite nodes in the reduced setting is exactly equal to the number of edges in the path. Hence, for a given problem instance with parameter , our algorithm gives us a -approximation and -approximation, respectively. Therefore, if , the ratio of our algorithm is smaller than .

To show that node capacities can be simulated via edge capacities, we refer to [2]. There, it is stated that this can be done by replacing every node by a gadget consisting of two nodes and setting the capacity of the new edge as the capacity of the old node.

## Appendix B Constructing a Kernel in a Cycle-free Directed Graph

Algorithm 4 presents a simple procedure for finding a kernel in a directed graph without directed cycles.

The runtime of this algorithm is clearly polynomial, since all nodes are considered at most once. Nodes with out-degree 0 can be found in linear time.

For correctness, we first verify the termination of the algorithm. If the set of remaining nodes is non-empty, then the set of remaining nodes with out-degree 0 must contain at least one node. Assume that it is empty; then every remaining node has at least one outgoing arc, which implies the existence of a directed cycle, a contradiction. Hence, at least one node is removed from the set of remaining nodes in every iteration, until this set is empty.

Now consider the properties of a kernel from Definition 2: it is an independent set, and for every node outside the kernel there exists an arc from that node to some node in the kernel. The set constructed by the algorithm above is clearly independent, since nodes of out-degree 0 cannot be adjacent to each other and all other nodes adjacent to them are removed from the set of remaining nodes in every iteration.

For the second property, consider a node that is not contained in the constructed set. By the termination property, this node was removed from the set of remaining nodes in some iteration of the algorithm. By construction, it has an outgoing arc towards a node of out-degree 0 that was added to the constructed set in that iteration. Therefore, the above algorithm correctly returns a kernel of the graph in polynomial time.
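The procedure described in this appendix can be sketched in Python as follows. This is a minimal illustration based on the textual description only, not a transcription of Algorithm 4. The function name `find_kernel` and the encoding of the graph as a node list plus a list of arcs `(tail, head)` are assumptions, and the sketch favors readability over the linear-time bookkeeping mentioned above.

```python
def find_kernel(nodes, arcs):
    """Return a kernel of a directed graph without directed cycles.

    nodes: iterable of hashable node labels.
    arcs: iterable of (tail, head) pairs.
    """
    arcs = list(arcs)
    remaining = set(nodes)
    kernel = set()
    while remaining:
        # Nodes without an outgoing arc into the remaining graph.
        sinks = {v for v in remaining
                 if not any(u == v and w in remaining for (u, w) in arcs)}
        if not sinks:
            raise ValueError("graph contains a directed cycle")
        kernel |= sinks
        # Remaining nodes with an arc into a chosen sink are dominated by it.
        dominated = {u for (u, w) in arcs if w in sinks and u in remaining}
        remaining -= sinks | dominated
    return kernel


example = find_kernel(["a", "b", "c", "d"],
                      [("a", "b"), ("b", "c"), ("c", "d")])
print(example)
```

On the directed path a → b → c → d, the first iteration selects d and removes c, and the second iteration selects b and removes a, so the returned set is independent and every other node has an arc into it.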

## Appendix C Minor Improvement of Approximation Ratio

The inequalities used in the proofs of Lemma 1, Lemma 3, and Theorem 1 leave some room for minor improvements of the approximation guarantees. In the following, we show how to modify these proofs in order to obtain a slightly improved version of Theorem 1:

###### Theorem 4.

Let $n$ be the number of jobs. There exists a -approximation for Path-based Coflow Scheduling with arbitrary release times and a -approximation for Path-based Coflow Scheduling when all release times are smaller than .

We start by revisiting Lemma 1 and proving a slightly more precise version.

###### Lemma 8.

For all jobs $k$ and all machines $i$ the following holds:

$$C_k^* \ge \frac{k+1}{2k} \cdot \sum_{l=1}^{k} L_i^{(l)}.$$

###### Proof.

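Let $S = \{1, \dots, k\}$. Since $C^*$ is a feasible solution of the LP, it must fulfill constraint (4) of the LP for machine $i$ and the set $S$:

$$\sum_{l=1}^{k} L_i^{(l)} C_l^* = \sum_{l \in S} L_i^{(l)} C_l^* \ge f_i(S) = \frac{1}{2} \cdot \left( \sum_{l=1}^{k} \left(L_i^{(l)}\right)^2 + \left( \sum_{l=1}^{k} L_i^{(l)} \right)^2 \right).$$

Applying the Cauchy-Schwarz inequality to the first summand yields

$$\sum_{l=1}^{k} \left(L_i^{(l)}\right)^2 \ge \frac{1}{k} \cdot \left( \sum_{l=1}^{k} L_i^{(l)} \right)^2.$$

Thus, we obtain the following lower bound for the completion time of job $k$:

$$\begin{aligned}
C_k^* &\ge \frac{\sum_{l=1}^{k} L_i^{(l)} C_k^*}{\sum_{l=1}^{k} L_i^{(l)}} \\
&\ge \frac{\sum_{l=1}^{k} L_i^{(l)} C_l^*}{\sum_{l=1}^{k} L_i^{(l)}} \\
&\ge \frac{\sum_{l=1}^{k} \left(L_i^{(l)}\right)^2 + \left(\sum_{l=1}^{k} L_i^{(l)}\right)^2}{2 \cdot \sum_{l=1}^{k} L_i^{(l)}} \\
&\ge \frac{\frac{1}{k} \cdot \left(\sum_{l=1}^{k} L_i^{(l)}\right)^2 + \left(\sum_{l=1}^{k} L_i^{(l)}\right)^2}{2 \cdot \sum_{l=1}^{k} L_i^{(l)}} \\
&\ge \frac{(k+1) \cdot \left(\sum_{l=1}^{k} L_i^{(l)}\right)^2}{2k \cdot \sum_{l=1}^{k} L_i^{(l)}} \\
&\ge \frac{k+1}{2k} \cdot \sum_{l=1}^{k} L_i^{(l)}.
\end{aligned}$$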

This allows us to set the deadline for job $k$ to $\frac{2k}{k+1} \cdot C_k^*$ instead of $2 C_k^*$ in equation (5). These new deadlines later cause the approximation factors to be strictly smaller than $2\lambda + 1$ and $2\lambda$, respectively.
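The chain of inequalities in the proof of Lemma 8 can be sanity-checked numerically. The Python sketch below assumes the form of $f_i(S)$ used in the proof for $S = \{1, \dots, k\}$ and takes the loads of the first $k$ jobs on one machine as a plain list; the function names are hypothetical.

```python
import random


def f_i(loads):
    """Right-hand side of constraint (4) for S = {1, ..., k}, as in the proof."""
    total = sum(loads)
    return 0.5 * (sum(x * x for x in loads) + total * total)


def implied_bound(loads):
    """Lower bound on C*_k obtained by dividing f_i(S) by the total load."""
    return f_i(loads) / sum(loads)


def lemma8_bound(loads):
    """Closed-form bound (k + 1) / (2k) times the total load."""
    k = len(loads)
    return (k + 1) / (2 * k) * sum(loads)


random.seed(0)
for _ in range(1000):
    k = random.randint(1, 8)
    loads = [random.randint(1, 20) for _ in range(k)]
    # The Cauchy-Schwarz step guarantees this relation.
    assert implied_bound(loads) >= lemma8_bound(loads) - 1e-9

equal_loads = [3, 3, 3, 3]
print(implied_bound(equal_loads), lemma8_bound(equal_loads))
```

For equal loads the Cauchy-Schwarz step holds with equality, so both bounds coincide; for unequal loads the bound implied by the constraint is strictly larger.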

For the second improvement, which is the extension of the special case from zero release times to release times smaller than , we focus on a detail in the proof of Lemma 3. In the end of the proof, we stated that after at most iterations, we have . According to the first part of this inequality, we can restate Lemma 3 with a slightly stricter conclusion.

###### Lemma 9.

Algorithm 2 finds a feasible solution for Edge Scheduling on a given hypergraph , s.t. every hyperedge is scheduled not later than .

For the proof of Theorem 4, we proceed analogously to the proof of Theorem 1. Hence, we only need to modify the final computations for the approximation guarantee. Applying Lemma 9, we obtain that the final completion time of job $k$ in the solution provided by our algorithm satisfies

$$C_k \le r_k + \lambda D_k - (\lambda - 1)$$

for all jobs $k$. Summing over all jobs yields

 n∑k=1wkCk ≤n∑k=1wk(rk+λDk−(λ−1)) =