Scheduling Coflows for Minimizing the Total Weighted Completion Time in Identical Parallel Networks

04/06/2022
by   Chi-Yeh Chen, et al.

Coflow is a recently proposed network abstraction to capture communication patterns in data centers. The coflow scheduling problem in large data centers is one of the most important NP-hard problems. Previous research on coflow scheduling focused mainly on the single-switch model. However, with recent technological developments, this single-core model is no longer sufficient. This paper considers the coflow scheduling problem in identical parallel networks. The identical parallel network is an architecture based on multiple network cores running in parallel. A coflow can be considered divisible or indivisible: different flows in a divisible coflow can be transmitted through different network cores, whereas all flows of an indivisible coflow must use the same core. For the divisible coflow scheduling problem, we propose a (6-2/m)-approximation algorithm with arbitrary release times and a (5-2/m)-approximation algorithm without release times, where m is the number of network cores. When coflows are indivisible, we propose a (7-2/m)-approximation algorithm with arbitrary release times and a (6-2/m)-approximation algorithm without release times.

1 Introduction

Over the past decade, large data centers have become the dominant form of computing infrastructure. Numerous studies [7, 6, 18, 1] have demonstrated the benefits of application-aware network scheduling by exploiting the structured traffic patterns of distributed applications in data centers. Data-parallel computing applications such as MapReduce [8], Hadoop [15], and Spark [17] consist of multiple stages of computation and communication. The success of these applications has led to a proliferation of applications designed to alternate between computational and communication stages. In data-parallel computing applications, computation involves only local operations in the server. However, much of the intermediate data (flows) generated during the computation stage need to be transmitted across different machines during the communication stage for further processing. As the number of applications increases, data centers require more data transfer capability. In these data transfers, the collective impact of all flows between the two machine groups becomes important. This collective communication pattern in the data center is abstracted by coflow traffic [5].

A coflow represents a collection of related flows whose completion time is determined by the completion time of the last flow in the collection [14]. A data center can be modeled as a giant non-blocking switch, with input links connected to source servers and output links connected to destination servers. Moreover, we assume that the transmission speed of each port is uniform. This modeling allows us to focus only on scheduling tasks, rather than routing flows. Each coflow can be represented by an $N \times N$ integer matrix $D = (d_{ij})$, where entry $d_{ij}$ represents the number of data units that must be transferred from input port $i$ to output port $j$. Each coflow also has a weight and a release time. Weights can capture different priorities for different coflows.

Previous research on coflow scheduling focused mainly on the single-switch model, which has been widely used in coflow studies. However, with recent technological developments, this single-core model is no longer sufficient. In fact, a growing data center would operate multiple networks in parallel to increase the efficiency of the network [16, 10]. We consider the identical parallel network, an architecture based on multiple network cores running in parallel. Parallel networks provide a large amount of aggregated bandwidth by serving traffic at the same time. The goal of this paper is to schedule coflows in identical parallel networks such that the total weighted completion time is minimized. A coflow can be considered divisible or indivisible. Different flows in a divisible coflow can be transmitted through different network cores. We assume that divisible coflows are transmitted at the flow level, so that all data of a flow are sent through the same network core. On the other hand, flows in an indivisible coflow can only be transmitted through the same network core.

1.1 Related Work

Chowdhury and Stoica [5] first introduced the coflow abstraction to capture communication patterns in data centers. Since then, many related studies have been proposed in the literature to schedule coflows, e.g. [7, 6, 12, 19, 14, 2]. It is well known that the concurrent open shop problem is NP-hard to approximate within a factor better than $2-\epsilon$ for any $\epsilon > 0$ [13, 14]. Since the coflow scheduling problem generalizes the well-studied concurrent open shop scheduling problem, it is also NP-hard to approximate within a factor better than $2-\epsilon$ [2, 4, 13]. Qiu et al. [12] developed a deterministic $\frac{64}{3}$-approximation and a randomized $(8+\frac{16\sqrt{2}}{3})$-approximation algorithm for the problem of minimizing the weighted completion time of coflows. In the coflow scheduling problem with release dates, they claimed a deterministic $\frac{67}{3}$-approximation and a randomized $(9+\frac{16\sqrt{2}}{3})$-approximation algorithm. However, Ahmadi et al. [2] proved that their technique actually yields only a deterministic $\frac{76}{3}$-approximation algorithm for coflow scheduling with release times. Khuller and Purohit [11] also developed an approximation algorithm for coflow scheduling with arbitrary release times with a ratio of 12. Moreover, when all coflows have release dates equal to zero, they obtained a deterministic 8-approximation and a randomized $(3+2\sqrt{2})$-approximation. In recent work, Shafiee and Ghaderi [14] obtained a 5-approximation algorithm with arbitrary release times and a 4-approximation algorithm without release times. Ahmadi et al. [2] also proposed a primal-dual algorithm to improve the running time for the coflow scheduling problem. Huang et al. [10] considered scheduling a single coflow on a heterogeneous parallel network and proposed an $O(m)$-approximation algorithm, where $m$ is the number of network cores.

1.2 Our Contributions

This paper considers the coflow scheduling problem in identical parallel networks. For the divisible coflow scheduling problem, we propose a (6 - 2/m)-approximation algorithm with arbitrary release times and a (5 - 2/m)-approximation algorithm without release times, where m is the number of network cores. When coflows are indivisible, we propose a (7 - 2/m)-approximation algorithm with arbitrary release times and a (6 - 2/m)-approximation algorithm without release times.

1.3 Organization

The rest of this article is organized as follows. Section 2 introduces basic notations and preliminaries. Section 3 presents an algorithm for divisible coflow scheduling. Section 4 presents an algorithm for indivisible coflow scheduling. Section 5 compares the performance of the previous algorithms with that of the proposed algorithm. Section 6 draws conclusions.

2 Notation and Preliminaries

This work abstracts the identical parallel network as a set of $m$ giant non-blocking switches, with input links connected to source servers and output links connected to destination servers. Each switch represents a network core. This abstract model is simple and practical, since topological designs such as Fat-tree or Clos [3] enable the construction of networks with full bisection bandwidth. Among the $N$ source servers (or destination servers), the $i$-th source server (or the $j$-th destination server) is connected to the $i$-th input (or $j$-th output) port of each parallel switch. Therefore, each source server (or destination server) has $m$ simultaneous uplinks (or downlinks). Each uplink (or downlink) can be a bundle of multiple physical links in the actual topology [10]. Let $\mathcal{I}$ be the source server set and $\mathcal{J}$ be the destination server set. Each network core can be viewed as a bipartite graph, with $\mathcal{I}$ on one side and $\mathcal{J}$ on the other side. For simplicity, we assume that all network cores are identical and all links in each network core have the same capacity (or the same speed).

A coflow consists of a set of independent flows whose completion time is determined by the completion time of the latest flow in the set. Coflow $k$ can be expressed as a demand matrix $D^{(k)} = \left(d_{ijk}\right)_{i,j=1}^{N}$, where $d_{ijk}$ denotes the size of the flow to be transferred from input $i$ to output $j$ in coflow $k$. In other words, each flow is a triple $(i, j, k)$, where $i \in \mathcal{I}$ is its source node, $j \in \mathcal{J}$ is its destination node, and $k$ is the coflow to which it belongs. Moreover, we assume that flows consist of discrete data units, so their sizes are integers. For simplicity, we assume that all flows in a coflow arrive in the system at the same time (as in [12]).

This work considers the following offline coflow scheduling problem with release dates. There is a set of coflows denoted by $\mathcal{K} = \{1, 2, \ldots, n\}$. Coflow $k$ is released into the system at time $r_k$, which means it can only be scheduled after time $r_k$. Let $C_k$ be the completion time of coflow $k$, that is, the time at which all flows of coflow $k$ have finished processing. Each coflow $k$ has a positive weight $w_k$. Weights can capture different priorities for different coflows: the higher the weight, the higher the priority. The goal is to schedule coflows in an identical parallel network so as to minimize $\sum_{k \in \mathcal{K}} w_k C_k$, the total weighted completion time of the coflows. If all weights are equal, the problem is equivalent to minimizing the average coflow completion time. Table 1 presents the notation and terminology used herein.

$m$: The number of network cores.
$N$: The number of input/output ports.
$\mathcal{I}$, $\mathcal{J}$: The source server set and the destination server set.
$\mathcal{K}$: The set of coflows.
$D^{(k)}$: The demand matrix of coflow $k$.
$d_{ijk}$: The size of the flow to be transferred from input $i$ to output $j$ in coflow $k$.
$C_k$: The completion time of coflow $k$.
$C_{ijk}$: The completion time of flow $(i, j, k)$. In the analysis of the approximation algorithm for divisible coflow scheduling, we use $C_f$ to represent $C_{ijk}$.
$r_k$: The release time of coflow $k$.
$w_k$: The weight of coflow $k$.
$\tilde{C}$: An optimal solution to the linear program.
$C$: The schedule obtained by our algorithm.
Table 1: Notation and Terminology
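For concreteness, the following short Python sketch encodes a problem instance with the notation of Table 1 and evaluates the objective for a given vector of coflow completion times. It is only an illustration of the problem data; the names `CoflowInstance` and `total_weighted_completion_time` are ours, not part of the paper.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CoflowInstance:
    """A coflow scheduling instance over m identical network cores and N ports."""
    m: int                            # number of network cores
    N: int                            # number of input/output ports
    demands: List[List[List[int]]]    # demands[k][i][j] = d_ijk
    release: List[int]                # release[k] = r_k
    weight: List[float]               # weight[k]  = w_k

def total_weighted_completion_time(inst: CoflowInstance, completion: List[float]) -> float:
    """Objective sum_k w_k * C_k for a given vector of coflow completion times."""
    return sum(w * c for w, c in zip(inst.weight, completion))

# A tiny example: 2 cores, 2 ports, 2 coflows.
inst = CoflowInstance(
    m=2, N=2,
    demands=[
        [[2, 0], [0, 1]],   # coflow 0
        [[0, 3], [1, 0]],   # coflow 1
    ],
    release=[0, 1],
    weight=[1.0, 2.0],
)
print(total_weighted_completion_time(inst, [2.0, 4.0]))   # 1*2 + 2*4 = 10.0
```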

3 Approximation Algorithm for Divisible Coflow Scheduling

In this section, coflows are considered divisible, where different flows in a coflow can be transmitted through different cores. We assume that divisible coflows are transmitted at the flow level, so that all data of a flow are sent through the same core. Let $\mathcal{F}_i$ be the set of flows with source $i$, and let $\mathcal{F}_j$ be the set of flows with destination $j$. For a flow $f = (i, j, k)$, write $d_f = d_{ijk}$ for its size and $C_f = C_{ijk}$ for its completion time. For any subset $S \subseteq \mathcal{F}_i$ (or $S \subseteq \mathcal{F}_j$), let $d(S) = \sum_{f \in S} d_f$ and $d^2(S) = \sum_{f \in S} d_f^2$. We can formulate our problem as the following linear programming relaxation:

min  $\sum_{k \in \mathcal{K}} w_k C_k$   (1)
s.t.  $C_k \geq C_{ijk}$,  $\forall k \in \mathcal{K},\ \forall i \in \mathcal{I},\ \forall j \in \mathcal{J}$   (1a)
$C_{ijk} \geq r_k + d_{ijk}$,  $\forall k \in \mathcal{K},\ \forall i \in \mathcal{I},\ \forall j \in \mathcal{J}$   (1b)
$\sum_{f \in S} d_f C_f \geq \frac{1}{2m}\left(d(S)^2 + d^2(S)\right)$,  $\forall i \in \mathcal{I},\ \forall S \subseteq \mathcal{F}_i$   (1c)
$\sum_{f \in S} d_f C_f \geq \frac{1}{2m}\left(d(S)^2 + d^2(S)\right)$,  $\forall j \in \mathcal{J},\ \forall S \subseteq \mathcal{F}_j$   (1d)
In the linear program (1), $C_k$ is the completion time of coflow $k$ in the schedule and $C_{ijk}$ is the completion time of flow $(i, j, k)$. Constraint (1a) states that the completion time of coflow $k$ is bounded from below by the completion times of all its flows. Constraint (1b) ensures that the completion time of any flow is at least its release time plus its size. Constraints (1c) and (1d) are used to lower bound the completion time variables at the input ports and the output ports, respectively. These two constraints are adapted from the scheduling literature [9]. For any flow $(i, j, k)$, we abbreviate the three indices $i$, $j$ and $k$ and replace them with the single index $f$. Based on the analysis in [9], we have the following two lemmas:
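Constraints (1c) and (1d) range over all subsets of flows at a port, so the relaxation has exponentially many constraints. For intuition, the sketch below builds LP (1) explicitly for a tiny instance by enumerating these subsets and solves it with `scipy.optimize.linprog`; this brute-force construction is only our reading of the relaxation under the notation above, not the paper's implementation, and it is practical only for very small instances.

```python
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

# Tiny instance (illustrative): m cores, flows given as (input i, output j, coflow k, size d).
m = 2
flows = [(0, 0, 0, 2), (1, 1, 0, 1), (0, 1, 1, 3), (1, 0, 1, 1)]
release = [0, 1]          # r_k
weight = [1.0, 2.0]       # w_k
n_coflows, n_flows = len(release), len(flows)

# Variable vector x = [C_0, ..., C_{n-1}, C_f0, ..., C_f{F-1}].
num_vars = n_coflows + n_flows
c = np.zeros(num_vars)
c[:n_coflows] = weight                      # objective: sum_k w_k C_k

A_ub, b_ub = [], []

def add(row, rhs):
    A_ub.append(row)
    b_ub.append(rhs)

for f, (i, j, k, d) in enumerate(flows):
    # (1a)  C_f - C_k <= 0
    row = np.zeros(num_vars); row[n_coflows + f] = 1.0; row[k] = -1.0
    add(row, 0.0)
    # (1b)  -C_f <= -(r_k + d_f)
    row = np.zeros(num_vars); row[n_coflows + f] = -1.0
    add(row, -(release[k] + d))

def port_constraints(port_index, selector):
    """(1c)/(1d): for every subset S of flows at this port,
       sum_{f in S} d_f C_f >= (1/2m) * (d(S)^2 + sum_{f in S} d_f^2)."""
    port_flows = [f for f, fl in enumerate(flows) if selector(fl) == port_index]
    for size in range(1, len(port_flows) + 1):
        for S in combinations(port_flows, size):
            dS = sum(flows[f][3] for f in S)
            d2S = sum(flows[f][3] ** 2 for f in S)
            row = np.zeros(num_vars)
            for f in S:
                row[n_coflows + f] = -flows[f][3]
            add(row, -(dS ** 2 + d2S) / (2.0 * m))

for i in {fl[0] for fl in flows}:
    port_constraints(i, lambda fl: fl[0])   # input ports, constraint (1c)
for j in {fl[1] for fl in flows}:
    port_constraints(j, lambda fl: fl[1])   # output ports, constraint (1d)

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=[(0, None)] * num_vars)
print("LP value:", res.fun, "coflow completion times:", res.x[:n_coflows])
```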

Lemma 3.1.

For the $i$-th input with $F$ flows, let $\tilde{C}$ satisfy (1c) and assume without loss of generality that $\tilde{C}_1 \leq \tilde{C}_2 \leq \cdots \leq \tilde{C}_F$. Then, for each $f = 1, \ldots, F$,
$\tilde{C}_f \geq \frac{1}{2m} \sum_{g=1}^{f} d_g.$

Proof.

For clarity of description, we use $d_g$ to represent the size of the $g$-th flow in $\mathcal{F}_i$ and use $\tilde{C}_g$ to represent its completion time in constraint (1c). According to (1c), applied to $S = \{1, \ldots, f\}$, and the fact that $\tilde{C}_g \leq \tilde{C}_f$ for all $g \leq f$, we have
$\tilde{C}_f \sum_{g=1}^{f} d_g \geq \sum_{g=1}^{f} d_g \tilde{C}_g \geq \frac{1}{2m}\left(\left(\sum_{g=1}^{f} d_g\right)^2 + \sum_{g=1}^{f} d_g^2\right) \geq \frac{1}{2m}\left(\sum_{g=1}^{f} d_g\right)^2.$

The following inequality can be obtained:
$\tilde{C}_f \geq \frac{1}{2m} \sum_{g=1}^{f} d_g.$ ∎

Lemma 3.2.

For the $j$-th output with $F$ flows, let $\tilde{C}$ satisfy (1d) and assume without loss of generality that $\tilde{C}_1 \leq \tilde{C}_2 \leq \cdots \leq \tilde{C}_F$. Then, for each $f = 1, \ldots, F$,
$\tilde{C}_f \geq \frac{1}{2m} \sum_{g=1}^{f} d_g.$

Proof.

The proof is similar to that of lemma 3.1. ∎

Our algorithm flow-driven-list-scheduling (described in Algorithm 1) works as follows. Given the flows of all coflows in the coflow set $\mathcal{K}$, we first compute an optimal solution $\tilde{C}$ to the linear program (1). Without loss of generality, we assume $\tilde{C}_1 \leq \tilde{C}_2 \leq \cdots$ and schedule the flows iteratively in the order of this list. For each flow $f$, we consider all flows that are congested with $f$ and scheduled before $f$, and then find a network core $h^{*}$ and assign flow $f$ to it such that the completion time of $f$ is minimized. Lines 5-10 find the least loaded network core for the link of $f$ and assign flow $f$ to it. Lines 11-24 are modified from Shafiee and Ghaderi's algorithm [14]; therefore, all flows are transmitted in a preemptible manner.

0:  a vector $\tilde{C}$, an optimal solution to the linear program (1), used to decide the order of scheduling
1:  let $L^{h}_{I,i}$ be the load on the $i$-th input port of network core $h$
2:  let $L^{h}_{J,j}$ be the load on the $j$-th output port of network core $h$
3:  let $A_h$ be the set of flows allocated to network core $h$
4:  all $L^{h}_{I,i}$ and $L^{h}_{J,j}$ are initialized to zero and $A_h = \emptyset$ for all $h$
5:  for every flow $f = (i, j, k)$ in non-decreasing order of $\tilde{C}_f$, breaking ties arbitrarily do
6:     note that the flow $f$ is sent by link $(i, j)$
7:     $h^{*} = \arg\min_{h} \left(L^{h}_{I,i} + L^{h}_{J,j}\right)$
8:     $A_{h^{*}} = A_{h^{*}} \cup \{f\}$
9:     $L^{h^{*}}_{I,i} = L^{h^{*}}_{I,i} + d_f$ and $L^{h^{*}}_{J,j} = L^{h^{*}}_{J,j} + d_f$
10:  end for
11:  for each network core $h \in \{1, \ldots, m\}$ do in parallel
12:     wait until the first coflow is released
13:     while there is some incomplete flow in $A_h$ do
14:        for every released and incomplete flow $f = (i, j, k) \in A_h$ in non-decreasing order of $\tilde{C}_f$, breaking ties arbitrarily do
15:           note that the flow $f$ is sent by link $(i, j)$ of network core $h$
16:           if the link $(i, j)$ is idle then
17:              schedule flow $f$
18:           end if
19:        end for
20:        while no new flow is completed or released do
21:           transmit the flows that get scheduled in line 17 at maximum rate 1
22:        end while
23:     end while
24:  end for
Algorithm 1 flow-driven-list-scheduling
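The following Python sketch illustrates the assignment phase of Algorithm 1 (lines 5-10). It assumes that the least loaded core for flow $f = (i, j, k)$ is the one minimizing the combined load on input port $i$ and output port $j$, which is our reading of the pseudocode rather than a definitive implementation; the per-core preemptive transmission phase (lines 11-24) is omitted.

```python
from typing import Dict, List, Tuple

Flow = Tuple[int, int, int, int]   # (input i, output j, coflow k, size d)

def assign_flows_to_cores(flows: List[Flow], order: List[float], m: int) -> Dict[int, List[Flow]]:
    """Greedy phase of flow-driven-list-scheduling: scan flows in non-decreasing
    order of their LP completion times and put each on the currently least
    loaded core for its (input, output) link."""
    load_in = [{} for _ in range(m)]    # load_in[h][i]  = load on input port i of core h
    load_out = [{} for _ in range(m)]   # load_out[h][j] = load on output port j of core h
    assigned: Dict[int, List[Flow]] = {h: [] for h in range(m)}

    for f in sorted(range(len(flows)), key=lambda f: order[f]):
        i, j, k, d = flows[f]
        # pick the core minimizing the current load seen by link (i, j)
        h_star = min(range(m), key=lambda h: load_in[h].get(i, 0) + load_out[h].get(j, 0))
        assigned[h_star].append(flows[f])
        load_in[h_star][i] = load_in[h_star].get(i, 0) + d
        load_out[h_star][j] = load_out[h_star].get(j, 0) + d
    return assigned

# Usage: order[f] would be the LP value of flow f from (1); dummy values here.
flows = [(0, 0, 0, 2), (1, 1, 0, 1), (0, 1, 1, 3), (1, 0, 1, 1)]
print(assign_flows_to_cores(flows, order=[2, 1, 3, 1], m=2))
```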

3.1 Analysis

This section shows that the proposed algorithm achieves an approximation ratio of (6 - 2/m) with arbitrary release times, and an approximation ratio of (5 - 2/m) without release times. Recall that $\mathcal{F}_i$ is the set of flows that belong to the $i$-th input and $\mathcal{F}_j$ is the set of flows that belong to the $j$-th output. For any flow $f = (i, j, k)$ with the $i$-th input and the $j$-th output, let
$B_f = \left\{ g \in \mathcal{F}_j \mid \tilde{C}_g \leq \tilde{C}_f \right\}$
be the set of output flows congested with $f$ and scheduled no later than $f$. Let
$A_f = \left\{ g \in \mathcal{F}_i \mid \tilde{C}_g \leq \tilde{C}_f \right\}$
be the set of input flows congested with $f$ and scheduled no later than $f$. Note that these two sets also include $f$.

Lemma 3.3.

Let $\tilde{C}$ be an optimal solution to the linear program (1), and let $C$ denote the completion times in the schedule found by flow-driven-list-scheduling. For each flow $f$,
$C_f \leq \left(6 - \frac{2}{m}\right) \tilde{C}_f.$

Proof.

The proof is similar to that of Hall et al. [9]. Assume the flow $f$ is sent via link $(i, j)$. Let $A_f$ and $B_f$ be the sets defined above, and let $r_{\max} = \max_{g \in A_f \cup B_f} r_g$. Consider the schedule induced by the flows in $A_f \cup B_f$. Since, on every network core, input link $i$ or output link $j$ is busy from $r_{\max}$ to the start of flow $f$, we have

$C_f \leq r_{\max} + \frac{d(A_f \setminus \{f\}) + d(B_f \setminus \{f\})}{m} + d_f$   (2)
$\leq \tilde{C}_f + \frac{d(A_f \setminus \{f\}) + d(B_f \setminus \{f\})}{m} + d_f$   (3)
$= \tilde{C}_f + \frac{d(A_f)}{m} + \frac{d(B_f)}{m} + \left(1 - \frac{2}{m}\right) d_f$   (4)
$\leq \tilde{C}_f + 2\tilde{C}_f + \frac{d(B_f)}{m} + \left(1 - \frac{2}{m}\right) d_f$   (5)
$\leq \tilde{C}_f + 2\tilde{C}_f + 2\tilde{C}_f + \left(1 - \frac{2}{m}\right) d_f$   (6)
$\leq \left(6 - \frac{2}{m}\right) \tilde{C}_f.$   (7)

The inequality (3) holds because, for all $g \in A_f \cup B_f$, we have $r_g \leq \tilde{C}_g \leq \tilde{C}_f$. The equation (4) shifts the partial flow $d_f$ into the second and third terms. The inequalities (5) and (6) are based on lemma 3.1 and lemma 3.2, respectively. The inequality (7) is due to $d_f \leq \tilde{C}_f$, which follows from constraint (1b) in the linear program (1). ∎

According to lemma 3.3 and constraint (1a), the completion time of every coflow satisfies $C_k \leq (6 - \frac{2}{m}) \tilde{C}_k$; since the optimal value of the linear program (1) is a lower bound on the optimal total weighted completion time, we have the following theorem:

Theorem 3.4.

The flow-driven-list-scheduling algorithm has an approximation ratio of at most (6 - 2/m).

When all coflows are released at time zero, we have the following lemma:

Lemma 3.5.

Let $\tilde{C}$ be an optimal solution to the linear program (1), and let $C$ denote the completion times in the schedule found by flow-driven-list-scheduling. For each flow $f$,
$C_f \leq \left(5 - \frac{2}{m}\right) \tilde{C}_f$

when all coflows are released at time zero.

Proof.

Assume the flow $f$ is sent via link $(i, j)$. Let $A_f$ and $B_f$ be the sets defined above. Consider the schedule induced by the flows in $A_f \cup B_f$. Since, on every network core, input link $i$ or output link $j$ is busy from time zero to the start of flow $f$, we have
$C_f \leq \frac{d(A_f \setminus \{f\}) + d(B_f \setminus \{f\})}{m} + d_f = \frac{d(A_f)}{m} + \frac{d(B_f)}{m} + \left(1 - \frac{2}{m}\right) d_f \leq \left(5 - \frac{2}{m}\right) \tilde{C}_f,$
where the last inequality follows from lemma 3.1, lemma 3.2 and constraint (1b). ∎

According to lemma 3.5, we have the following theorem:

Theorem 3.6.

For the special case when all coflows are released at time zero, the flow-driven-list-scheduling algorithm has an approximation ratio of at most (5 - 2/m).

4 Approximation Algorithm for Indivisible Coflow Scheduling

In this section, coflows are considered indivisible, where flows in a coflow can only be transmitted through the same core. For every coflow $k$ and input port $i$, let $d_{i,k} = \sum_{j \in \mathcal{J}} d_{ijk}$ be the total amount of data that coflow $k$ needs to transmit through input port $i$. Moreover, let $d_{j,k} = \sum_{i \in \mathcal{I}} d_{ijk}$ be the total amount of data that coflow $k$ needs to transmit through output port $j$. When coflows are indivisible, we can formulate our problem as the following linear programming relaxation:

min  $\sum_{k \in \mathcal{K}} w_k C_k$   (8)
s.t.  $C_k \geq r_k + d_{i,k}$,  $\forall k \in \mathcal{K},\ \forall i \in \mathcal{I}$   (8a)
$C_k \geq r_k + d_{j,k}$,  $\forall k \in \mathcal{K},\ \forall j \in \mathcal{J}$   (8b)
$\sum_{k \in S} d_{i,k} C_k \geq \frac{1}{2m}\left(\left(\sum_{k \in S} d_{i,k}\right)^2 + \sum_{k \in S} d_{i,k}^2\right)$,  $\forall i \in \mathcal{I},\ \forall S \subseteq \mathcal{K}$   (8c)
$\sum_{k \in S} d_{j,k} C_k \geq \frac{1}{2m}\left(\left(\sum_{k \in S} d_{j,k}\right)^2 + \sum_{k \in S} d_{j,k}^2\right)$,  $\forall j \in \mathcal{J},\ \forall S \subseteq \mathcal{K}$   (8d)

In the linear program (8), $C_k$ is the completion time of coflow $k$ in the schedule. Constraints (8a) and (8b) ensure that the completion time of any coflow is at least its release time plus its load at each input and output port. Constraints (8c) and (8d) are used to lower bound the completion time variables at the input ports and the output ports, respectively. These two constraints are adapted from the scheduling literature [9, 2]. Based on the analysis in [9], we have the following two lemmas:
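As a small illustration of the quantities used in (8a)-(8d), the per-port loads $d_{i,k}$ and $d_{j,k}$ of a coflow are simply the row and column sums of its demand matrix. The helper below is an arithmetic sketch in our own notation, not code from the paper.

```python
from typing import List, Tuple

def port_loads(demand: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Given the demand matrix D^(k) of one coflow (demand[i][j] = d_ijk),
    return (input_load, output_load), where input_load[i] = d_{i,k} = sum_j d_ijk
    and output_load[j] = d_{j,k} = sum_i d_ijk."""
    input_load = [sum(row) for row in demand]
    output_load = [sum(col) for col in zip(*demand)]
    return input_load, output_load

# Example: a 2x2 demand matrix.
d_in, d_out = port_loads([[2, 0], [1, 3]])
print(d_in, d_out)   # [2, 4] and [3, 3]
```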

Lemma 4.1.

For the $i$-th input with $n$ coflows, let $\tilde{C}$ satisfy (8c) and assume without loss of generality that $\tilde{C}_1 \leq \tilde{C}_2 \leq \cdots \leq \tilde{C}_n$. Then, for each coflow $k$,
$\tilde{C}_k \geq \frac{1}{2m} \sum_{k'=1}^{k} d_{i,k'}.$

Proof.

According to (8c), applied to $S = \{1, \ldots, k\}$, and the fact that $\tilde{C}_{k'} \leq \tilde{C}_k$ for all $k' \leq k$, we have:
$\tilde{C}_k \sum_{k'=1}^{k} d_{i,k'} \geq \sum_{k'=1}^{k} d_{i,k'} \tilde{C}_{k'} \geq \frac{1}{2m}\left(\left(\sum_{k'=1}^{k} d_{i,k'}\right)^2 + \sum_{k'=1}^{k} d_{i,k'}^2\right) \geq \frac{1}{2m}\left(\sum_{k'=1}^{k} d_{i,k'}\right)^2.$

The following inequality can be obtained:
$\tilde{C}_k \geq \frac{1}{2m} \sum_{k'=1}^{k} d_{i,k'}.$ ∎

Lemma 4.2.

For the $j$-th output with $n$ coflows, let $\tilde{C}$ satisfy (8d) and assume without loss of generality that $\tilde{C}_1 \leq \tilde{C}_2 \leq \cdots \leq \tilde{C}_n$. Then, for each coflow $k$,
$\tilde{C}_k \geq \frac{1}{2m} \sum_{k'=1}^{k} d_{j,k'}.$

Proof.

The proof is similar to that of lemma 4.1. ∎

Our algorithm coflow-driven-list-scheduling (described in Algorithm 2) works as follows. Given a set of coflows, we first compute an optimal solution $\tilde{C}$ to the linear program (8). Without loss of generality, we assume $\tilde{C}_1 \leq \tilde{C}_2 \leq \cdots \leq \tilde{C}_n$ and schedule all the flows in all coflows iteratively respecting the ordering in this list. For each coflow $k$, we find a network core $h^{*}$ that can transmit coflow $k$ such that the completion time of coflow $k$ is minimized. Lines 5-9 find a network core that minimizes the maximum completion time of coflow $k$. Lines 10-25 transmit all the flows allocated to each network core in the order of the completion times of the coflows to which they belong.

0:  a vector $\tilde{C}$, an optimal solution to the linear program (8), used to decide the order of scheduling
1:  let $L^{h}_{I,i}$ be the load on the $i$-th input port of network core $h$
2:  let $L^{h}_{J,j}$ be the load on the $j$-th output port of network core $h$
3:  let $A_h$ be the set of coflows allocated to network core $h$
4:  all $L^{h}_{I,i}$ and $L^{h}_{J,j}$ are initialized to zero and $A_h = \emptyset$ for all $h$
5:  for every coflow $k$ in non-decreasing order of $\tilde{C}_k$, breaking ties arbitrarily do
6:     $h^{*} = \arg\min_{h} \max\left\{ \max_{i \in \mathcal{I}} \left(L^{h}_{I,i} + d_{i,k}\right),\ \max_{j \in \mathcal{J}} \left(L^{h}_{J,j} + d_{j,k}\right) \right\}$
7:     $A_{h^{*}} = A_{h^{*}} \cup \{k\}$
8:     $L^{h^{*}}_{I,i} = L^{h^{*}}_{I,i} + d_{i,k}$ and $L^{h^{*}}_{J,j} = L^{h^{*}}_{J,j} + d_{j,k}$ for all $i \in \mathcal{I}$ and $j \in \mathcal{J}$
9:  end for
10:  for each network core $h \in \{1, \ldots, m\}$ do in parallel
11:     wait until the first coflow is released
12:     while there is some incomplete flow in $A_h$ do
13:        for all coflows $k \in A_h$, list the released and incomplete flows respecting the non-decreasing order of $\tilde{C}_k$
14:        let $\mathcal{L}$ be the set of flows in the list
15:        for every flow $f = (i, j, k) \in \mathcal{L}$ do
16:           note that the flow $f$ is sent by link $(i, j)$ of network core $h$
17:           if the link $(i, j)$ is idle then
18:              schedule flow $f$
19:           end if
20:        end for
21:        while no new flow is completed or released do
22:           transmit the flows that get scheduled in line 18 at maximum rate 1
23:        end while
24:     end while
25:  end for
Algorithm 2 coflow-driven-list-scheduling
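The sketch below illustrates the assignment phase of Algorithm 2 (lines 5-9). We model "minimizing the maximum completion time of the coflow" by the bottleneck port load of a core after tentatively adding the coflow; this selection rule is our assumption about the exact criterion, and the per-core transmission phase (lines 10-25) is omitted.

```python
from typing import Dict, List

def assign_coflows_to_cores(input_loads: List[List[int]], output_loads: List[List[int]],
                            order: List[float], m: int) -> Dict[int, List[int]]:
    """Greedy phase of coflow-driven-list-scheduling: scan coflows in non-decreasing
    order of their LP completion times and place each whole coflow on the core that
    minimizes its bottleneck (maximum) port load after the placement.
    input_loads[k][i] = d_{i,k}, output_loads[k][j] = d_{j,k}."""
    N = len(input_loads[0])
    L_in = [[0] * N for _ in range(m)]
    L_out = [[0] * N for _ in range(m)]
    assigned: Dict[int, List[int]] = {h: [] for h in range(m)}

    for k in sorted(range(len(order)), key=lambda k: order[k]):
        def bottleneck(h: int) -> int:
            return max(max(L_in[h][i] + input_loads[k][i] for i in range(N)),
                       max(L_out[h][j] + output_loads[k][j] for j in range(N)))
        h_star = min(range(m), key=bottleneck)
        assigned[h_star].append(k)
        for i in range(N):
            L_in[h_star][i] += input_loads[k][i]
        for j in range(N):
            L_out[h_star][j] += output_loads[k][j]
    return assigned

# Example: 2 coflows on 2 ports and 2 cores; order[k] would be the LP value of coflow k from (8).
inp = [[2, 1], [0, 3]]   # inp[k][i] = d_{i,k}
out = [[1, 2], [3, 0]]   # out[k][j] = d_{j,k}
print(assign_coflows_to_cores(inp, out, order=[2.0, 3.0], m=2))
```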

4.1 Analysis

This section shows that the proposed algorithm achieves an approximation ratio of (7 - 2/m) with arbitrary release times, and an approximation ratio of (6 - 2/m) without release times.

Lemma 4.3.

Let $\tilde{C}$ be an optimal solution to the linear program (8), and let $C$ denote the completion times in the schedule found by coflow-driven-list-scheduling. For each coflow $k$,
$C_k \leq \left(7 - \frac{2}{m}\right) \tilde{C}_k.$

Proof.

Assume the last completed flow in coflow $k$ is sent via link $(i, j)$. Let $A_k = \left\{ k' \in \mathcal{K} \mid \tilde{C}_{k'} \leq \tilde{C}_k \right\}$ and $r_{\max} = \max_{k' \in A_k} r_{k'}$. Consider the schedule induced by the coflows in $A_k$. Since, on every network core, input link $i$ or output link $j$ is busy from $r_{\max}$ to the start of the last completed flow in coflow $k$, we have

$C_k \leq r_{\max} + \frac{\sum_{k' \in A_k} d_{i,k'} + \sum_{k' \in A_k} d_{j,k'} - d_{i,k} - d_{j,k}}{m} + d_{i,k} + d_{j,k}$
$\leq \tilde{C}_k + \frac{\sum_{k' \in A_k} d_{i,k'} + \sum_{k' \in A_k} d_{j,k'} - d_{i,k} - d_{j,k}}{m} + d_{i,k} + d_{j,k}$
$= \tilde{C}_k + \frac{\sum_{k' \in A_k} d_{i,k'}}{m} + \frac{\sum_{k' \in A_k} d_{j,k'}}{m} + \left(1 - \frac{1}{m}\right)\left(d_{i,k} + d_{j,k}\right)$
$\leq \tilde{C}_k + 2\tilde{C}_k + 2\tilde{C}_k + \left(1 - \frac{1}{m}\right)\left(d_{i,k} + d_{j,k}\right)$
$\leq \left(7 - \frac{2}{m}\right) \tilde{C}_k.$

The second inequality holds because, for all $k' \in A_k$, we have $r_{k'} \leq \tilde{C}_{k'} \leq \tilde{C}_k$. The third equation shifts the partial loads $d_{i,k}$ and $d_{j,k}$ into the second and third terms. The fourth inequality is based on lemma 4.1 and lemma 4.2. The last inequality is due to $d_{i,k} \leq \tilde{C}_k$ and $d_{j,k} \leq \tilde{C}_k$, which follow from constraints (8a) and (8b) in the linear program (8). ∎

According to lemma 4.3, we have the following theorem:

Theorem 4.4.

The coflow-driven-list-scheduling algorithm has an approximation ratio of at most (7 - 2/m).

Similar to the case of divisible coflow, we have the following lemma:

Lemma 4.5.

Let $\tilde{C}$ be an optimal solution to the linear program (8), and let $C$ denote the completion times in the schedule found by coflow-driven-list-scheduling. For each coflow $k$,
$C_k \leq \left(6 - \frac{2}{m}\right) \tilde{C}_k$

when all coflows are released at time zero.

Proof.

The proof is similar to that of lemmas 3.5 and 4.3. ∎

According to lemma 4.5, we have the following theorem:

Theorem 4.6.

For the special case when all coflows are released at time zero, the coflow-driven-list-scheduling algorithm has an approximation ratio of at most (6 - 2/m).

5 Results and Discussion

This section compares the approximation ratios of the proposed algorithms with those of previous algorithms. We consider two problems: scheduling a single coflow and scheduling multiple coflows. In the single-coflow scheduling problem, we compare with the algorithm of Huang et al. [10]. In the multiple-coflow scheduling problem, we first use Shafiee and Ghaderi's algorithm [14] to obtain the order of coflows; according to this order, the algorithm in [10] distributes each coflow to the identical parallel network. The approximation ratio of this combined baseline therefore follows from the guarantees of the two component algorithms, both with arbitrary release times and without release times.

Figure 1: The approximation ratio between the algorithm in [10] and the proposed algorithm.
Figure 2: The approximation ratio between the algorithm in [14, 10] and the proposed algorithm.
Figure 3: When all coflows are released at time zero, the approximation ratio between the algorithm in [14, 10] and the proposed algorithm.

In the single-coflow scheduling problem, Figure 1 presents the numerical results concerning the approximation ratios of the algorithms. As the number of network cores m grows, the approximation ratio of the proposed algorithm tends to 5, and beyond a small number of cores the proposed algorithm outperforms the algorithm in [10]. Figure 2 and Figure 3 present the numerical results concerning the approximation ratios in the multiple-coflow scheduling problem with arbitrary release times and with no release times, respectively. The proposed algorithm outperforms the previous algorithm in all cases.

6 Concluding Remarks

With recent technological developments, the single-core model is no longer sufficient. Therefore, we consider the identical parallel network, an architecture based on multiple network cores running in parallel. This paper develops two polynomial-time approximation algorithms to solve the coflow scheduling problem in identical parallel networks, where a coflow can be considered divisible or indivisible. For the divisible coflow scheduling problem, the proposed algorithm achieves an approximation ratio of (6 - 2/m) with arbitrary release times and an approximation ratio of (5 - 2/m) without release times. When coflows are indivisible, the proposed algorithm achieves an approximation ratio of (7 - 2/m) with arbitrary release times and an approximation ratio of (6 - 2/m) without release times.

References

  • [1] S. Agarwal, S. Rajakrishnan, A. Narayan, R. Agarwal, D. Shmoys, and A. Vahdat, “Sincronia: Near-optimal network design for coflows,” in Proceedings of the 2018 ACM Conference on SIGCOMM, ser. SIGCOMM ’18.   New York, NY, USA: Association for Computing Machinery, 2018, p. 16–29.
  • [2] S. Ahmadi, S. Khuller, M. Purohit, and S. Yang, “On scheduling coflows,” Algorithmica, vol. 82, no. 12, pp. 3604–3629, 2020.
  • [3] M. Al-Fares, A. Loukissas, and A. Vahdat, “A scalable, commodity data center network architecture,” ACM SIGCOMM computer communication review, vol. 38, no. 4, pp. 63–74, 2008.
  • [4] N. Bansal and S. Khot, “Inapproximability of hypergraph vertex cover and applications to scheduling problems,” in Automata, Languages and Programming, S. Abramsky, C. Gavoille, C. Kirchner, F. Meyer auf der Heide, and P. G. Spirakis, Eds.   Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 250–261.
  • [5] M. Chowdhury and I. Stoica, “Coflow: A networking abstraction for cluster applications,” in Proceedings of the 11th ACM Workshop on Hot Topics in Networks, ser. HotNets-XI.   New York, NY, USA: Association for Computing Machinery, 2012, p. 31–36.
  • [6] ——, “Efficient coflow scheduling without prior knowledge,” in Proceedings of the 2015 ACM Conference on SIGCOMM, ser. SIGCOMM ’15.   New York, NY, USA: Association for Computing Machinery, 2015, p. 393–406.
  • [7] M. Chowdhury, Y. Zhong, and I. Stoica, “Efficient coflow scheduling with varys,” in Proceedings of the 2014 ACM Conference on SIGCOMM, ser. SIGCOMM ’14.   New York, NY, USA: Association for Computing Machinery, 2014, p. 443–454.
  • [8] J. Dean and S. Ghemawat, “Mapreduce: Simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, p. 107–113, jan 2008.
  • [9] L. A. Hall, A. S. Schulz, D. B. Shmoys, and J. Wein, “Scheduling to minimize average completion time: Off-line and on-line approximation algorithms,” Mathematics of Operations Research, vol. 22, no. 3, pp. 513–544, 1997.
  • [10] X. S. Huang, Y. Xia, and T. S. E. Ng, “Weaver: Efficient coflow scheduling in heterogeneous parallel networks,” in 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020, pp. 1071–1081.
  • [11] S. Khuller and M. Purohit, “Brief announcement: Improved approximation algorithms for scheduling co-flows,” in Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, 2016, pp. 239–240.
  • [12] Z. Qiu, C. Stein, and Y. Zhong, “Minimizing the total weighted completion time of coflows in datacenter networks,” in Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures, ser. SPAA ’15.   New York, NY, USA: Association for Computing Machinery, 2015, p. 294–303.
  • [13] S. Sachdeva and R. Saket, “Optimal inapproximability for scheduling problems via structural hardness for hypergraph vertex cover,” in 2013 IEEE Conference on Computational Complexity, 2013, pp. 219–229.
  • [14] M. Shafiee and J. Ghaderi, “An improved bound for minimizing the total weighted completion time of coflows in datacenters,” IEEE/ACM Transactions on Networking, vol. 26, no. 4, pp. 1674–1687, 2018.
  • [15] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The hadoop distributed file system,” in 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010, pp. 1–10.
  • [16] A. Singh, J. Ong, A. Agarwal, G. Anderson, A. Armistead, R. Bannon, S. Boving, G. Desai, B. Felderman, P. Germano, A. Kanagala, J. Provost, J. Simmons, E. Tanda, J. Wanderer, U. Hölzle, S. Stuart, and A. Vahdat, “Jupiter rising: A decade of clos topologies and centralized control in google’s datacenter network,” in Proceedings of the 2015 ACM Conference on SIGCOMM, ser. SIGCOMM ’15.   New York, NY, USA: Association for Computing Machinery, 2015, p. 183–197.
  • [17] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, “Spark: Cluster computing with working sets,” in 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 10), 2010.
  • [18] H. Zhang, L. Chen, B. Yi, K. Chen, M. Chowdhury, and Y. Geng, “Coda: Toward automatically identifying and scheduling coflows in the dark,” in Proceedings of the 2016 ACM Conference on SIGCOMM, ser. SIGCOMM ’16.   New York, NY, USA: Association for Computing Machinery, 2016, p. 160–173.
  • [19] Y. Zhao, K. Chen, W. Bai, M. Yu, C. Tian, Y. Geng, Y. Zhang, D. Li, and S. Wang, “Rapier: Integrating routing and scheduling for coflow-aware data center networks,” in 2015 IEEE Conference on Computer Communications (INFOCOM).   IEEE, 2015, pp. 424–432.