
# Scheduling Stochastic Real-Time Coflows in Unreliable Computing Machines

We consider a distributed computing network consisting of a master machine and multiple computing machines. The master machine runs multiple jobs, each of which stochastically generates real-time coflows with strict deadlines. A coflow is a collection of tasks that can be processed by corresponding computing machines; it is completed only when all its tasks are completed within the deadline. Moreover, we consider unreliable computing machines, whose processing speeds are uncertain but limited. Because of the limited processing abilities of the computing machines, an algorithm for scheduling coflows on the unreliable computing machines is critical to maximizing the average number of completed coflows for each job. In this paper, we develop two scheduling algorithms: a feasibility-optimal scheduling algorithm and an approximate scheduling algorithm. The feasibility-optimal scheduling algorithm can fulfill the largest region of jobs' requirements for the average number of completed coflows. However, it suffers from high computational complexity when the number of jobs is large. To address the issue, the approximate scheduling algorithm is proposed with a guaranteed approximation ratio in the worst-case scenario. The approximate scheduling algorithm is also validated in the average-case scenario via computer simulations.

10/02/2019


## I Introduction

Distributed computing networks (such as MapReduce [6], Spark [25], and Dryad [11]) have become increasingly popular for supporting data-intensive jobs. The key to processing a data-intensive job is to divide the job into a group of small tasks that can be processed in parallel by multiple computing machines. Because of this common feature among those distributed computing networks, the term coflow was recently proposed in [3] to represent such a group of tasks belonging to the same job. It turns out that the coflow abstraction not only captures data-parallel computing networks, but also provides new opportunities to reduce the completion time of jobs.

Moreover, because of the real-time nature of latency-sensitive jobs (e.g., online retail [23]), a coflow needs to be completed by a deadline. To maximize the number of real-time coflows that meet the deadline, a scheduling algorithm allocating computing machines to coflows is needed.

Coflow scheduling has been a hot topic since the coflow abstraction was proposed. On one hand, several works developed coflow scheduling systems, e.g., [4, 5, 17]. On the other hand, numerous works established coflow scheduling theory, e.g., [14, 19, 2, 22, 16, 20, 10]; see the recent survey [21]. Almost all prior research focused on deterministic networks; in contrast, little attention has been given to stochastic networks. Note that a coflow can be randomly generated; moreover, a computing machine can be unreliable because of unpredictable events [1] like hardware failures. In this context, a scheduling algorithm for stochastic real-time coflows in unreliable computing machines is crucial.

Although scheduling for traditional packet-based stochastic networks has been extensively studied (e.g., [18, 9]), those solutions cannot be applied to coflow-based stochastic networks. That is because all tasks in a coflow are dependent, in the sense that a coflow is not completed until all its tasks are completed, whereas all packets or tasks in traditional packet-based stochastic networks are treated independently (as also noted in [13]). The most relevant works on scheduling stochastic coflows are [15, 13]. While [15] focused on homogeneous stochastic coflows, [13] extended the analysis to a heterogeneous case. The fundamental difference between those works and ours is that we consider stochastic real-time coflows and unreliable computing machines.

In this paper, we consider a master machine and M unreliable computing machines. The master machine runs multiple jobs, each of which stochastically generates real-time coflows with a hard deadline. Our main contribution lies in developing coflow scheduling algorithms with provable performance guarantees. Leveraging Lyapunov techniques, we propose a feasibility-optimal scheduling algorithm that maximizes the (feasibility) region of feasible requirements for the average number of completed coflows. However, the feasibility-optimal scheduling algorithm turns out to involve an NP-hard combinatorial optimization problem. To tackle the computational issue, we propose an approximate scheduling algorithm that is computationally tractable; furthermore, we prove that its feasibility region shrinks by at most a factor of √M from the largest one. More surprisingly, our simulation results show that the feasibility region of the approximate scheduling algorithm is close to the largest one.

## II System overview

### II-A Network model

Consider a distributed computing network consisting of a master machine and M computing machines. The master machine runs N jobs. Fig. 1 illustrates an example network. Suppose that data transfer between the master machine and the computing machines occurs instantaneously and without error.

Divide time into frames indexed by t = 1, 2, …. At the beginning of each frame t, each job i stochastically generates a coflow, where a coflow is a collection of tasks that can be processed by the corresponding computing machines. Precisely, we use vector c_i(t) = (c_{i,1}(t), …, c_{i,M}(t)) to represent the coflow generated by job i in frame t, where each element c_{i,j}(t) ∈ {0, 1} indicates if the coflow has a task for computing machine j: if c_{i,j}(t) = 1, then the coflow has a task for computing machine j; otherwise, it does not. See Fig. 1 for an example. Each task is also stochastically generated, i.e., c_{i,j}(t) is a random variable for all i, j, and t. We denote by ‖c_i(t)‖ the number of 1's in vector c_i(t); in particular, if ‖c_i(t)‖ = 0, then job i does not generate any coflow in frame t. Suppose that the probability distribution of random variable c_{i,j}(t) is independent and identically distributed (i.i.d.) over frames t, for all i and j. Suppose that all tasks generated by job i for computing machine j have the same size. (For the case of time-varying task sizes, the methodology in this paper still applies if those sizes are i.i.d. over frames; we just need to revise the state defined in the proof of Theorem 5 to include those task sizes.) Moreover, we consider real-time coflows and suppose that the deadline for each real-time coflow is one frame. Such real-time frame-based systems have been justified in the literature, e.g., see [9].

Consider a time-varying processing speed for each computing machine, and suppose that the processing speed of each computing machine is i.i.d. over frames. With the i.i.d. assumption and the constant task sizes, we can suppose that a task generated by job i is completed by computing machine j (i.e., when c_{i,j}(t) = 1) with a constant probability p_{i,j} over frames. At the end of each frame, each computing machine reports whether its task was completed in that frame. If any task of a real-time coflow cannot be completed in its arriving frame, the coflow expires and is removed from the job.

Since the master machine is unaware of task completions at the beginning of each frame, we suppose that it assigns at most one task to each computing machine per frame. If two coflows c_i(t) and c_{i'}(t), for some i ≠ i', need the same computing machine in frame t, i.e., c_{i,j}(t) = c_{i',j}(t) = 1 for some j, then we say the two coflows have interference. For example, two of the coflows in Fig. 1 have interference.

As a result of the interference, the master machine has to decide a set of interference-free coflows for computing. Let D(t) be the set of interference-free coflows decided for computing in frame t. A scheduling algorithm π is the time sequence of decisions {D(t)} over all frames.
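
The interference relation above admits a compact check. The following is a minimal Python sketch (illustrative names, not the paper's implementation; coflows are represented as 0/1 tuples over the machines, matching the coflow vectors):

```python
from itertools import combinations

def interfere(c1, c2):
    """Two coflows interfere if both have a task for some machine j."""
    return any(a == 1 and b == 1 for a, b in zip(c1, c2))

def interference_free(coflows, chosen):
    """A decision set is valid only if no two chosen coflows share a machine."""
    return all(not interfere(coflows[i], coflows[k])
               for i, k in combinations(chosen, 2))

# Three coflows over three machines; coflows 0 and 1 both need machine 0.
coflows = [(1, 0, 1), (1, 1, 0), (0, 1, 0)]
```

For instance, the set {0, 2} above is interference-free, while {0, 1} is not.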

### II-B Problem formulation

Let random variable e_i(t; π) indicate if coflow c_i(t) is completed in frame t under scheduling algorithm π, where e_i(t; π) = 1 if all tasks of the coflow are completed by the corresponding computing machines in frame t, and e_i(t; π) = 0 otherwise. The random variable depends on the random variables c_{i,j}(t), the task completion probabilities p_{i,j}, and a potentially randomized scheduling algorithm π.

We define the average number of completed coflows for job i under scheduling algorithm π by

 N_i(π) = liminf_{T→∞} (1/T) ∑_{t=1}^{T} E[e_i(t; π)]. (1)

The goal of this paper is to design a feasibility-optimal scheduling algorithm, defined shortly. (The feasibility-optimal scheduling defined in this paper is analogous to throughput-optimal scheduling, e.g., [18], and timely-throughput-optimal scheduling, e.g., [9].)

We refer to vector r = (r_1, …, r_N) as a feasible requirement under scheduling algorithm π if N_i(π) ≥ r_i for all i. We define the feasibility region of a scheduling algorithm as follows.

###### Definition 1.

The feasibility region Λ(π) of scheduling algorithm π is the region of all feasible requirements under the scheduling algorithm π.

Next, we define the maximum feasibility region Λ as follows.

###### Definition 2.

The maximum feasibility region Λ is the region of all feasible requirements under some scheduling algorithm.

Unlike the feasibility region Λ(π) of a single scheduling algorithm π, the requirements in the region Λ can be achieved by various scheduling algorithms, i.e., Λ = ∪_π Λ(π).

We define an optimal scheduling algorithm as follows.

###### Definition 3.

A scheduling algorithm π is called a feasibility-optimal scheduling algorithm if its feasibility region is Λ.

That is, for any requirement r ∈ Λ, a feasibility-optimal scheduling algorithm can meet the requirement r.

## III Scheduling algorithm design

In this section, we develop a feasibility-optimal scheduling algorithm for managing the stochastic real-time coflows in the unreliable computing machines. To that end, we transform our coflow scheduling problem into a queue scheduling problem in a virtual queueing network in Section III-A. With the transformation, we propose a feasibility-optimal scheduling algorithm in Section III-B. However, the proposed feasibility-optimal scheduling algorithm involves a combinatorial optimization problem, which we show to be NP-hard. Thus, we develop a tractable approximate scheduling algorithm in Section III-C; meanwhile, we establish its approximation ratio.

### III-A Virtual queueing network

In this section, we propose a virtual queueing network for the original distributed computing network with a requirement r. The virtual queueing network consists of N queues Q_1, …, Q_N and M servers, operating under the same frame structure as in Section II-A. For example, Fig. 2 is the virtual queueing network for the distributed computing network in Fig. 1. Let Q_i(t) be the size of queue Q_i at the beginning of frame t, and let Q(t) = (Q_1(t), …, Q_N(t)) be the vector of all queue sizes at the beginning of frame t.

At the beginning of each frame t, new packets arrive at queue Q_i with average rate r_i; hence, r = (r_1, …, r_N) can represent the arrival rate vector for the virtual queueing network. In each frame t, if c_{i,j}(t) = 1, then queue Q_i connects to server j. See Fig. 2 for an example.

In each frame, a server can serve at most one queue; in particular, server j can complete its service for queue Q_i with probability p_{i,j} (the task completion probability) in each frame. At the end of each frame, if all servers connected to a queue complete their services for that queue, one packet is removed from that queue.
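
To make the frame dynamics concrete, here is a minimal Python sketch of one frame of the virtual queueing network (hypothetical names and data layout: `conn[i]` is the set of servers queue i connects to, and `p[i][j]` is the probability that server j completes its service for queue i):

```python
import random

def step(Q, conn, p, D, arrivals):
    """Advance the virtual queueing network by one frame: packets arrive,
    then each scheduled queue i loses one packet iff every server it
    connects to completes its service (each succeeding w.p. p[i][j])."""
    Q = [q + a for q, a in zip(Q, arrivals)]  # arrivals at frame start
    for i in D:
        if conn[i] and all(random.random() < p[i][j] for j in conn[i]):
            Q[i] = max(0, Q[i] - 1)           # remove one packet
    return Q
```

With all success probabilities equal to 1.0, the scheduled queue deterministically loses one packet in the frame.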

If two queues Q_i and Q_{i'}, for some i ≠ i', connect to the same server in frame t, i.e., c_{i,j}(t) = c_{i',j}(t) = 1 for some j, then we say the two queues have interference in frame t. For example, two queues in Fig. 2 have interference in frame 1. Because of the interference in the virtual queueing network, we redefine decision D(t) to be the set of interference-free queues that are served in frame t. A scheduling algorithm is likewise defined by the sequence of decisions {D(t)}.

Let e_i(t; π) indicate if one packet is removed from queue Q_i in frame t under scheduling algorithm π, where e_i(t; π) = 1 if one packet is removed from queue Q_i and e_i(t; π) = 0 otherwise. Then, N_i(π) defined in Eq. (1) can represent the packet service rate for queue Q_i.

With the above interpretation of the virtual queueing network, the region Λ(π) consists of all arrival rate vectors r such that each arrival rate r_i is less than the service rate N_i(π) under scheduling algorithm π. The condition r_i < N_i(π) for all i implies that all queues can be stabilized by the scheduling algorithm π, i.e., lim sup_{t→∞} E[Q_i(t)] is finite. The region Λ(π) is therefore called the stability region [7] of scheduling algorithm π. Moreover, the region Λ consists of all arrival rate vectors r such that all queues can be stabilized by some scheduling algorithm, and is called the capacity region [7] of the virtual queueing network. If the stability region of scheduling algorithm π is identical to the capacity region, i.e., Λ(π) = Λ, then the scheduling algorithm π is called a throughput-optimal scheduling algorithm [7] for the virtual queueing network. That is, for any arrival rate vector r ∈ Λ, a throughput-optimal scheduling algorithm can stabilize the virtual queueing network.

With the transformation, the feasibility-optimal scheduling problem for the distributed computing network becomes the throughput-optimal scheduling problem for the virtual queueing network. Hence, we will focus on throughput-optimal scheduling design for the virtual queueing network. We want to emphasize that, unlike in traditional stochastic networks (e.g., [18, 9, 7]), each packet in our virtual queueing network can be removed only when all its connected servers complete their services in its arriving frame. Thus, throughput-optimal scheduling algorithms for traditional stochastic networks cannot solve our problem. Our paper extends throughput-optimal scheduling to stochastic networks in which a packet requires multiple servers simultaneously; in particular, we develop a tractable approximate scheduling algorithm in Section III-C.

### III-B Feasibility-optimal scheduling algorithm

In this section, we propose a throughput-optimal scheduling algorithm for the virtual queueing network in Alg. 1, i.e., for any given arrival rate vector r ∈ Λ, Alg. 1 can stabilize all queues. Thus, the corresponding scheduling algorithm for the distributed computing network can meet any requirement r ∈ Λ.

At the beginning of each frame t, Alg. 1 updates each queue with the newly arriving packets; then, Alg. 1 decides D(t) for that frame according to the present queue size vector Q(t). The decision is made to maximize the weighted sum of the queue sizes in Eq. (2). The term 1{|F_i(t)| ≥ 1} · ∏_{j ∈ F_i(t)} p_{i,j} in Eq. (2) calculates the expected service rate for Q_i, i.e., the probability that one packet can be removed from Q_i: the indicator function indicates whether queue Q_i connects to at least one server in frame t, and if so, one packet can be removed from queue Q_i with probability ∏_{j ∈ F_i(t)} p_{i,j}, where F_i(t) denotes the set of servers to which queue Q_i connects in frame t. The underlying idea of Alg. 1 is to remove as many packets in expectation as possible.

After performing the decision D(t), Alg. 1 updates each queue Q_i at the end of frame t: if Q_i is scheduled, connects to at least one server, and all of its connected servers complete their services, then one packet is removed from Q_i.
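
Since the listing of Alg. 1 is not reproduced here, its decision step can be sketched as an exhaustive search. This is an illustrative Python sketch under the assumption that the Eq. (2) weight of a scheduled queue equals its queue size times the probability that all of its connected servers succeed; `conn[i]` and `p[i][j]` are hypothetical names for the connectivity sets and completion probabilities:

```python
from itertools import chain, combinations
from math import prod

def expected_weight(D, Q, conn, p):
    """Eq.-(2)-style weight of a decision set D: each scheduled queue i
    removes one packet iff all of its connected servers succeed, so its
    expected contribution is Q[i] times the product of their probabilities."""
    return sum(Q[i] * prod(p[i][j] for j in conn[i])
               for i in D if conn[i])

def max_weight_decision(Q, conn, p):
    """Alg. 1 (sketch): exhaustively search all interference-free sets
    of queues; exponential in the number of queues, hence intractable."""
    n = len(Q)
    best_w, best_D = 0.0, set()
    for D in chain.from_iterable(combinations(range(n), k)
                                 for k in range(n + 1)):
        used = [j for i in D for j in conn[i]]
        if len(used) != len(set(used)):   # two queues share a server
            continue
        w = expected_weight(D, Q, conn, p)
        if w > best_w:
            best_w, best_D = w, set(D)
    return best_D, best_w
```

For example, with sizes [3, 2], connectivity [{0, 1}, {1}], and all probabilities 0.9, the two queues interfere on server 1, so the search returns the singleton with the larger expected weight.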

###### Example 4.

Take Fig. 2 for example. Given the queue sizes and the task completion probabilities, Alg. 1 computes the value of Eq. (2) for the candidate decisions and serves a maximizing interference-free set of queues in frame 1 (i.e., the corresponding coflow in Fig. 1 is decided for computing in frame 1). If all servers connected to a served queue complete their services in frame 1 (i.e., the corresponding coflow is completed), then one packet is removed from that queue at the end of frame 1.

Leveraging Lyapunov techniques [18], we can establish the optimality of Alg. 1 in the following.

###### Theorem 5.

Alg. 1 is throughput-optimal for the virtual queueing network, i.e., the corresponding scheduling algorithm for the distributed computing network is feasibility-optimal.

###### Proof.

Let the vector (c_1(t), …, c_N(t)) of coflow arrivals represent the state of the virtual queueing network in frame t. Note that the state changes over frames but its probability distribution is i.i.d., according to the assumption in Section II-A. With the i.i.d. property of the state, the proof follows the standard argument of the Lyapunov theory in [18]. ∎

Note that Alg. 1 involves a combinatorial optimization problem, namely, finding an interference-free set of queues maximizing Eq. (2). In the next section, we investigate the computational complexity of solving this combinatorial optimization problem.

### III-C Tractable approximate scheduling algorithm

We show (in the next lemma) that the combinatorial optimization problem in Alg. 1 is NP-hard. Therefore, Alg. 1 is computationally intractable.

###### Lemma 6.

The combinatorial optimization problem in Alg. 1 in frame t is NP-hard, for all t.

###### Proof.

We construct a reduction from the set packing problem [8]. See Appendix A for details. ∎

To cope with the NP-hard problem, we define two notions of approximation ratio as follows. While Def. 7 concerns the resulting value of Eq. (2), Def. 8 concerns the resulting stability region.

###### Definition 7.

Given queue size vector Q(t) in frame t, let OPT(t) be the value in Eq. (2) computed by Alg. 1 in frame t, and let APX(t; π) be the value in Eq. (2) computed by scheduling algorithm π in frame t. Then, the scheduling algorithm π is called a γ-approximate scheduling algorithm to Eq. (2) if OPT(t) ≤ γ · APX(t; π) for all possible Q(t) and t.

###### Definition 8.

A scheduling algorithm π is called a γ-approximate scheduling algorithm to Λ if, for any arrival rate vector r ∈ Λ, the arrival rate vector r/γ lies in the stability region of the scheduling algorithm π.

In this paper, we propose an approximate scheduling algorithm in Alg. 2. The procedure of Alg. 2 is similar to that of Alg. 1; hence, we point out the key differences in the following.

Unlike Alg. 1, which solves the combinatorial optimization problem exactly, Alg. 2 simply sorts all queues according to the values computed by Eq. (3). If a queue connects to at least one server, then its value computed by Eq. (3) is its value computed by Eq. (2) divided by the square root of the number of its connected servers, i.e., V_i/√|F_i(t)|, where F_i(t) denotes the set of servers to which queue Q_i connects in frame t. Alg. 2 then processes the queues in descending order of the values from Eq. (3).

The underlying idea of Alg. 2 is to include a queue, in that order, into the decision set if the queue does not cause interference. Precisely, Alg. 2 maintains a set of available servers that have not yet been allocated, initialized to the set of all M servers. Then, at the i-th iteration, Alg. 2 checks whether queue Q_i satisfies two conditions: first, Q_i connects to at least one server; second, all of its connected servers are available. If Q_i meets both conditions, then it is scheduled, and the set of available servers is updated by removing the servers allocated to queue Q_i. After deciding D(t), Alg. 2 performs the decision for frame t, followed by updating the queue sizes.
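
The greedy rule of Alg. 2 can be sketched analogously (illustrative Python under the same assumed weight form; the score divides a queue's expected-service weight by the square root of its number of connected servers, as in Eq. (3)):

```python
from math import prod, sqrt

def greedy_decision(Q, conn, p):
    """Alg. 2 (sketch): score each connected queue by V_i / sqrt(|F_i|),
    where V_i = Q[i] * prod of its servers' success probabilities,
    then admit queues in descending score if their servers are free."""
    scored = []
    for i, servers in enumerate(conn):
        if servers:                            # skip unconnected queues
            V = Q[i] * prod(p[i][j] for j in servers)
            scored.append((V / sqrt(len(servers)), i))
    scored.sort(reverse=True)                  # descending Eq.-(3) score

    free = set().union(*conn)                  # available servers
    D = set()
    for _, i in scored:
        if conn[i] <= free:                    # all its servers still free
            D.add(i)
            free -= conn[i]                    # allocate them
    return D
```

Note that, as in Ex. 9, the greedy decision can differ from the decision of Alg. 1 on the same instance.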

###### Example 9.

Follow Ex. 4. According to Eq. (3), Alg. 2 scores each connected queue by its Eq. (2) value divided by the square root of its number of connected servers, and greedily serves an interference-free set of queues in frame 1, i.e., the corresponding coflow in Fig. 1 is decided for computing in frame 1. Note that the decision can differ from that in Ex. 4.

Next, we establish the approximation ratio of Alg. 2 to Eq. (2).

###### Lemma 10.

Alg. 2 is a √M-approximate scheduling algorithm to Eq. (2).

###### Proof.

See Appendix B. ∎

###### Remark 11.

We remark that √M is essentially the best approximation ratio to Eq. (2). That is because the combinatorial optimization problem in Alg. 1 is computationally harder than the set packing problem (see Lemma 6), and the best approximation ratio to the set packing problem is the square root (see [8]).

With Lemma 10, we can further establish the approximation ratio of Alg. 2 to the maximum feasibility region Λ.

###### Theorem 12.

Alg. 2 is a √M-approximate scheduling algorithm to Λ.

###### Proof.

See Appendix C. ∎

The computational complexity of Alg. 2 is dominated by sorting all queues. Thus, Alg. 2 remains tractable even when the number of jobs is large.

## IV Numerical results

In this section, we investigate Algs. 1 and 2 via computer simulations. First, we consider two jobs and two computing machines. Fig. 3 displays the feasibility regions of both scheduling algorithms for various task generation probabilities of one job, with the other task generation probabilities and all task completion probabilities fixed. Fig. 4 displays the feasibility regions of both scheduling algorithms for various task completion probabilities of one computing machine, with all task generation probabilities and the other task completion probabilities fixed. Each point marked in Fig. 3 or 4 is a requirement (r_1, r_2) such that the average number of completed coflows over 10,000 frames for job 1 is no less than r_1 and that for job 2 is no less than r_2. Both figures show that Alg. 2 is not only computationally efficient but can also fulfill almost all requirements in the maximum feasibility region.

Second, we consider more jobs and more computing machines in equal numbers, i.e., N = M. Moreover, all task completion probabilities are fixed to 0.9, i.e., p_{i,j} = 0.9 for all i and j. Fig. 5 then displays the maximum requirements that can be achieved by Alg. 2 when all task generation probabilities are the same. From Fig. 5, the maximum achievable requirement of Alg. 2 appears to decrease super-linearly with the number of computing machines.

## V Concluding remarks

In this paper, we provided a framework for studying real-time coflows in unreliable computing machines. In particular, we developed two algorithms for scheduling coflows in shared computing machines. While the proposed feasibility-optimal scheduling algorithm can support the largest region of jobs' requirements, it involves an NP-hard combinatorial optimization problem. In contrast, the proposed approximate scheduling algorithm is not only simple but also has a provable guarantee on the achievable requirement region. Moreover, we note that coding techniques have been exploited to mitigate stragglers in distributed computing networks, e.g., [12, 24]. Thus, incorporating coding design into our framework is a promising direction.

## Appendix A Proof of Lemma 6

We show a reduction from the set packing problem [8]: given a collection of non-empty sets S_1, …, S_N over a universal set {1, …, M} for some positive integers N and M, the objective is to identify a sub-collection of disjoint sets in that collection such that the number of sets in the sub-collection is maximized.

For the given instance of the set packing problem, we construct N queues and M servers in the virtual queueing network. Consider a fixed frame t. In frame t, for each element j ∈ S_i, we connect queue Q_i to server j. With this transformation, the set packing problem is equivalent to identifying a set of interference-free queues in frame t such that the number of queues in that set is maximized.

Moreover, consider no connection between the queues and the servers until frame t, identical arrival rates r_i = r for all i, and task completion probabilities p_{i,j} = 1 for all i and j. In this context, Eq. (2) in frame t becomes

 ∑_{i: Q_i ∈ D(t)} r·t, (4)

because Q_i(t) = r·t, |F_i(t)| ≥ 1 (due to the non-empty sets S_i), and ∏_{j ∈ F_i(t)} p_{i,j} = 1. As a result of the constant r·t in Eq. (4), the objective of the combinatorial optimization problem in Alg. 1 in frame t becomes identifying a set of interference-free queues such that the number of queues in that set is maximized.

Suppose there exists an algorithm that solves the combinatorial optimization problem in Alg. 1 in frame t in polynomial time. Then, that polynomial-time algorithm identifies a set D(t) maximizing the value in Eq. (4); in turn, it solves the set packing problem. That contradicts the NP-hardness of the set packing problem.

Because the above argument holds for all frames t, we conclude that the combinatorial optimization problem in Alg. 1 in frame t is NP-hard, for all t.

## Appendix B Proof of Lemma 10

Consider a fixed queue size vector Q(t) in a fixed frame t. Let V_i be the value of Eq. (2) for queue Q_i alone. Without loss of generality, we assume that |F_i(t)| ≥ 1 for all i, and further assume (by reordering the queue indices) that V_1/√|F_1(t)| ≥ ⋯ ≥ V_N/√|F_N(t)|, i.e., Alg. 2 processes Q_i at the i-th iteration. Let D_2(t) be the decision of Alg. 2 for queue size vector Q(t). Then, we can express the value of Eq. (2) computed by Alg. 2 as

 APX(t; Alg. 2) = ∑_{i: Q_i ∈ D_2(t)} V_i. (5)

Let D_1(t) be the decision of Alg. 1 for queue size vector Q(t). If the conditions of Alg. 2 hold at the i-th iteration (i.e., Q_i ∈ D_2(t)), then we let C_i = {Q_k ∈ D_1(t) : k ≥ i and F_k(t) ∩ F_i(t) ≠ ∅} be a set of queues. (Here, F_k(t) ∩ F_i(t) represents the set of common servers for queues Q_k and Q_i.) The set C_i has the following properties:

• For queue Q_k ∈ C_i, we have

 V_k/√|F_k(t)| ≤ V_i/√|F_i(t)|, (6)

since k ≥ i.

• All queues in C_i are interference-free, i.e., they connect to different servers, since C_i ⊆ D_1(t). Moreover, each queue Q_k ∈ C_i connects to at least one of the servers in F_i(t) (i.e., F_k(t) ∩ F_i(t) ≠ ∅). Thus, we have

 |C_i| ≤ |F_i(t)|. (7)
• Since all queues in C_i connect to different servers, and there are M servers in the virtual queueing network, we have

 ∑_{k: Q_k ∈ C_i} |F_k(t)| ≤ M. (8)

Note that every queue in D_1(t) belongs to C_i for some Q_i ∈ D_2(t). Thus, we can bound the value OPT(t) computed by Alg. 1 by

 OPT(t) ≤ ∑_{i: Q_i ∈ D_2(t)} OPT_i, (9)

where OPT_i = ∑_{k: Q_k ∈ C_i} V_k for all i.

Furthermore, we can bound OPT_i for each Q_i ∈ D_2(t) by

 OPT_i ≤(a) (V_i/√|F_i(t)|) · ∑_{k: Q_k ∈ C_i} √|F_k(t)| ≤(b) (V_i/√|F_i(t)|) · √|C_i| · √(∑_{k: Q_k ∈ C_i} |F_k(t)|) ≤(c) V_i·√M, (10)

where (a) follows from Eq. (6); (b) is due to the Cauchy–Schwarz inequality; (c) follows from Eqs. (7) and (8).

Then, we can bound OPT(t) by

 OPT(t) ≤(a) ∑_{i: Q_i ∈ D_2(t)} OPT_i ≤(b) ∑_{i: Q_i ∈ D_2(t)} V_i·√M ≤(c) √M · APX(t; Alg. 2),

where (a) follows from Eq. (9); (b) follows from Eq. (10); (c) follows from Eq. (5). Because the above argument holds for all Q(t) and t, the approximation ratio is √M.
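
As a numerical sanity check on Lemma 10 (not part of the proof), the following self-contained Python sketch compares an exhaustive maximizer of the assumed Eq. (2) weight against the greedy rule on random small instances and verifies OPT(t) ≤ √M · APX(t); all names and the weight form are illustrative:

```python
import random
from itertools import chain, combinations
from math import prod, sqrt

def weight(D, Q, conn, p):
    """Assumed Eq.-(2) value of a decision set D."""
    return sum(Q[i] * prod(p[i][j] for j in conn[i]) for i in D if conn[i])

def opt(Q, conn, p):
    """Exhaustive max over interference-free sets (Alg. 1's subproblem)."""
    n, best = len(Q), 0.0
    for D in chain.from_iterable(combinations(range(n), k)
                                 for k in range(n + 1)):
        used = [j for i in D for j in conn[i]]
        if len(used) == len(set(used)):          # interference-free
            best = max(best, weight(D, Q, conn, p))
    return best

def apx(Q, conn, p, M):
    """Greedy rule of Alg. 2: sort by Eq.-(3) score, admit if servers free."""
    order = sorted((i for i in range(len(Q)) if conn[i]),
                   key=lambda i: weight([i], Q, conn, p) / sqrt(len(conn[i])),
                   reverse=True)
    free, D = set(range(M)), set()
    for i in order:
        if conn[i] <= free:
            D.add(i)
            free -= conn[i]
    return weight(D, Q, conn, p)

random.seed(0)
M, N = 4, 5
for _ in range(200):
    Q = [random.randint(0, 10) for _ in range(N)]
    conn = [{j for j in range(M) if random.random() < 0.5} for _ in range(N)]
    p = [[random.uniform(0.5, 1.0) for _ in range(M)] for _ in range(N)]
    assert opt(Q, conn, p) <= sqrt(M) * apx(Q, conn, p, M) + 1e-9
```

In these trials the greedy value is typically much closer to the optimum than the worst-case factor √M suggests, consistent with the simulation results in Section IV.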

## Appendix C Proof of Theorem 12

The proof of Theorem 12 needs the following technical lemma, whose proof follows the line of [18] along with the i.i.d. property of the state (as discussed in the proof of Theorem 5) and the constant task completion probabilities p_{i,j}.

###### Lemma 13.

If there exists a scheduling algorithm (possibly depending on history) that stabilizes all queues in the virtual queueing network, then there exists a stationary scheduling algorithm (i.e., the decision D(t) depends only on the state in frame t) that stabilizes all queues.

With Lemma 13, we can focus on the stability regions of stationary scheduling algorithms. To analyze the stability region of Alg. 2, we leverage the Lyapunov theory [18] as stated in the following lemma, where we consider the quadratic Lyapunov function L(Q(t)) = ∑_{i=1}^{N} Q_i(t)².

###### Lemma 14.

Given arrival rate vector r, if there exist constants B > 0 and ε > 0 such that

 E[L(Q(t+1)) − L(Q(t)) | Q(t)] ≤ B − ε ∑_{i=1}^{N} Q_i(t),

for all frames t under scheduling algorithm π, then all queues are stable under the scheduling algorithm π, i.e., lim sup_{t→∞} E[Q_i(t)] < ∞ for all i.

Then, we are ready to prove Theorem 12. Suppose that arrival rate vector r lies in Λ. According to Lemma 13, under arrival rate vector r, all queues can be stabilized by a stationary scheduling algorithm; denote that stationary scheduling algorithm by π_s. Consider arrival rate vector r' where r'_i = r_i/√M for all i. Next, applying Lemma 14 to Alg. 2, we conclude that Alg. 2 stabilizes all queues under arrival rate vector r' because

 E[L(Q(t+1)) − L(Q(t)) | Q(t)]
 ≤(a) B + 2 ∑_{i=1}^{N} Q_i(t)·r'_i − 2 ∑_{i=1}^{N} Q_i(t)·E[e_i(t; Alg. 2) | Q(t)]
 ≤(b) B + (2/√M) ∑_{i=1}^{N} Q_i(t)·r_i − (2/√M) ∑_{i=1}^{N} Q_i(t)·E[e_i(t; Alg. 1) | Q(t)]
 ≤(c) B + (2/√M) ∑_{i=1}^{N} Q_i(t)·r_i − (2/√M) ∑_{i=1}^{N} Q_i(t)·E[e_i(t; π_s) | Q(t)]
 =(d) B + (2/√M) ∑_{i=1}^{N} Q_i(t)·r_i − (2/√M) ∑_{i=1}^{N} Q_i(t)·N_i(π_s)
 = B + (2/√M) ∑_{i=1}^{N} Q_i(t)·(r_i − N_i(π_s))
 ≤(e) B − (2ε/√M) ∑_{i=1}^{N} Q_i(t),

where (a) follows from [18] with some constant B > 0; (b) is because r'_i = r_i/√M and the approximation ratio of Alg. 2 to Eq. (2) is √M (as stated in Lemma 10); (c) is because Alg. 1 maximizes the value of Eq. (2) among all possible scheduling algorithms; (d) is because the decision under stationary scheduling algorithm π_s depends only on the state (regardless of the queue sizes) and the state is i.i.d. over frames, yielding E[e_i(t; π_s) | Q(t)] = N_i(π_s) for all i and t; (e) is because r_i < N_i(π_s) for all i, i.e., there exists an ε > 0 such that N_i(π_s) − r_i ≥ ε for all i.

## References

• [1] G. Ananthanarayanan, A. Ghodsi, S. Shenker, and I. Stoica (2013) Effective Straggler Mitigation: Attack of the Clones. Proc. of NSDI, pp. 185–198. Cited by: §I.
• [2] M. Chowdhury, S. Khuller, M. Purohit, S. Yang, and J. You (2019) Near Optimal Coflow Scheduling in Networks. Proc. of ACM SPAA, pp. 123–134. Cited by: §I.
• [3] M. Chowdhury and I. Stoica (2012) Coflow: A Networking Abstraction for Cluster Applications. Proc. of ACM HotNets, pp. 31–36. Cited by: §I.
• [4] M. Chowdhury and I. Stoica (2015) Efficient Coflow Scheduling without Prior Knowledge. Proc. of ACM SIGCOMM 45 (4), pp. 393–406. Cited by: §I.
• [5] M. Chowdhury, Y. Zhong, and I. Stoica (2014) Efficient Coflow Scheduling with Varys. Proc. of ACM SIGCOMM 44 (4), pp. 443–454. Cited by: §I.
• [6] J. Dean and S. Ghemawat (2008) MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM 51 (1), pp. 107–113. Cited by: §I.
• [7] L. Georgiadis, M. J. Neely, and L. Tassiulas (2006) Resource Allocation and Cross-Layer Control in Wireless Networks. Vol. 1, Now Publishers, Inc.. Cited by: §III-A, §III-A.
• [8] M. M. Halldórsson, J. Kratochvíl, and J. A. Telle (1998) Independent Sets with Domination Constraints. Proc. of ICALP, pp. 176–187. Cited by: Appendix A, §III-C, Remark 11.
• [9] I-H. Hou and P. R. Kumar (2013) Packets with Deadlines: A Framework for Real-Time Wireless Networks. Vol. 6, Morgan & Claypool Publishers. Cited by: §I, §II-A, §III-A, footnote 2.
• [10] S. Im, B. Moseley, K. Pruhs, and M. Purohit (2019) Matroid Coflow Scheduling. Proc. of ICALP, pp. 1–14. Cited by: §I.
• [11] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly (2007) Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. ACM SIGOPS operating systems review 41 (3), pp. 59–72. Cited by: §I.
• [12] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran (2017) Speeding Up Distributed Machine Learning Using Codes. IEEE Transactions on Information Theory 64 (3), pp. 1514–1529. Cited by: §V.
• [13] B. Li, Z. Shi, and A. Eryilmaz (2018) Efficient Scheduling for Synchronized Demands in Stochastic Networks. Proc. of IEEE WiOpt, pp. 1–8. Cited by: §I.
• [14] Y. Li, S. H. Jiang, H. Tan, C. Zhang, G. Chen, J. Zhou, and F. Lau (2016) Efficient Online Coflow Routing and Scheduling. Proc. of ACM MobiHoc, pp. 161–170. Cited by: §I.
• [15] Q. Liang and E. Modiano (2017) Coflow Scheduling in Input-Queued Switches: Optimal Delay Scaling and Algorithms. Proc. of IEEE INFOCOM, pp. 1–9. Cited by: §I.
• [16] S. Luo, H. Yu, and L. Li (2016) Decentralized Deadline-Aware Coflow Scheduling for Datacenter Networks. Proc. of IEEE ICC, pp. 1–6. Cited by: §I.
• [17] S. Ma, J. Jiang, B. Li, and B. Li (2016) Chronos: Meeting Coflow Deadlines in Data Center Networks. Proc. of IEEE ICC, pp. 1–6. Cited by: §I.
• [18] M. J. Neely (2010) Stochastic Network Optimization with Application to Communication and Queueing Systems. Vol. 3, Morgan & Claypool Publishers. Cited by: Appendix C, Appendix C, Appendix C, §I, §III-A, §III-B, §III-B, footnote 2.
• [19] M. Shafiee and J. Ghaderi (2018) An Improved Bound for Minimizing the Total Weighted Completion Time of Coflows in Datacenters. IEEE/ACM Transactions on Networking 26 (4), pp. 1674–1687. Cited by: §I.
• [20] S.-H. Tseng and A. Tang (2018) Coflow Deadline Scheduling via Network-Aware Optimization. Proc. of Allerton, pp. 829–833. Cited by: §I.
• [21] S. Wang, J. Zhang, T. Huang, J. Liu, and Y. Liu (2018) A Survey of Coflow Scheduling Schemes for Data Center Networks. IEEE Communications Magazine 56 (6), pp. 179–185. Cited by: §I.
• [22] Z. Wang, H. Zhang, X. Shi, X. Yin, Y. Li, H. Geng, Q. Wu, and J. Liu (2019) Efficient Scheduling of Weighted Coflows in Data Centers. IEEE Transactions on Parallel and Distributed Systems 30 (9), pp. 2003–2017. Cited by: §I.
• [23] C. Wilson, H. Ballani, T. Karagiannis, and A. Rowtron (2011) Better Never than Late: Meeting Deadlines in Datacenter Networks. Proc. of ACM SIGCOMM 41 (4), pp. 50–61. Cited by: §I.
• [24] C.-S. Yang, R. Pedarsani, and A. S. Avestimehr (2019) Timely-Throughput Optimal Coded Computing over Cloud Networks. Proc of ACM MobiHoc, pp. 301–310. Cited by: §V.
• [25] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica (2012) Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. Proc. of NSDI, pp. 2–2. Cited by: §I.