Distributed computing networks (such as MapReduce [4]) have become increasingly popular for supporting data-intensive jobs. The underlying idea is to divide a data-intensive job into a group of small tasks that can be processed in parallel by multiple workers. In general, a worker can be specialized to process one type of task. For example, MapReduce allows an application to specify its computing network. Another prominent example is distributed computing networks for massively multiplayer online games [16]. The online game system illustrated in Fig. 1 includes one master and four workers processing different types of tasks. The master is serving two players. While the present job of player 1 needs two types of workers to be completed, that of player 2 needs three types of workers.
Moreover, because of the real-time nature of latency-sensitive applications (e.g., online games), a real-time job needs to be completed by a deadline. To maximize the number of jobs that meet their deadlines, a scheduling algorithm that allocates workers to jobs is needed. Job-level scheduling poses more challenges than packet-level scheduling, because all tasks in a job are dependent in the sense that a job is not completed until all its tasks are completed, whereas packets or tasks in traditional packet-based networks are treated independently.
Most prior research on job-level scheduling considered general-purpose workers. The closest scenario to ours (i.e., specialized workers) is the coflow model proposed in [2], where a coflow is a job consisting of tasks of various types. Since the coflow model was proposed, coflow scheduling has been a hot topic, e.g., [3, 11, 14, 15, 8]; see also the recent survey paper [17]. However, almost all prior research on coflow scheduling focused on deterministic networks; in contrast, little attention was given to stochastic networks. Note that a job can be randomly generated; moreover, a worker can be unreliable because of unpredictable events [1] such as hardware failures. Because of these practical issues, a scheduling algorithm for stochastic real-time jobs in unreliable workers is crucial in distributed computing networks. The most relevant works to ours are [12, 10]. While [12] focused on homogeneous stochastic jobs in the coflow model, [10] extended it to a heterogeneous case. The fundamental difference between those works and ours is that we consider stochastic real-time jobs and unreliable workers.
In this paper, we consider a master and specialized workers. The master is running multiple applications, which stochastically generate real-time jobs with a hard deadline. The workers are unreliable. Our main contribution lies in developing job scheduling algorithms with provable performance guarantees. Leveraging Lyapunov techniques, we propose a feasibility-optimal scheduling algorithm for maximizing the region of achievable requirements for the average number of completed jobs. However, the feasibility-optimal scheduling algorithm turns out to involve an NP-hard combinatorial optimization problem. To tackle the computational issue, we propose an approximate scheduling algorithm that is computationally tractable; furthermore, we prove that its region of achievable requirements shrinks by a factor of at most  from the largest one. More surprisingly, our simulation results show that the region of achievable requirements by the approximate scheduling algorithm is close to the largest one.
II System overview
II-A Network model
Consider a distributed computing network consisting of a master and specialized workers . The master is running applications . Fig. 1 illustrates an example network with and . Suppose that data transfer between the master and the workers occurs instantaneously with no error. Note that the prior works on the coflow model focused on the time for data transfer. To investigate the unreliability of the workers, we ignore the time for data transfer and instead focus on the time for computation.
Divide time into frames and index them by . At the beginning of each frame , each application stochastically generates a job, where a job is a collection of tasks that can be processed by the corresponding workers. Precisely, we use vector  to represent the job generated by application in frame , where each element indicates if the job has a task for worker : if , then the job has a task for worker ; otherwise, it does not. See Fig. 1 for an example. Each task is also stochastically generated, i.e., is a random variable for all , , and . By we denote the number of 1's in vector ; in particular, if , then application generates no job in frame . Suppose that random variable  is independent and identically distributed (i.i.d.) over frames , for all and . Suppose that the tasks generated by application for worker have the same workload. See Remark 15 later for time-varying workloads. Moreover, the jobs need real-time computations. Suppose that the deadline for each job is one frame. The real-time system has been justified in the literature, e.g., see [7].
Consider a time-varying processing speed for each worker. Suppose that the processing speed of each worker is i.i.d. over frames. With the i.i.d. assumption along with those constant workloads, we can assume that a task generated by application can be completed by worker (i.e., when ) with a constant probability over frames. At the end of each frame, each worker reports if its task is completed in that frame. A job is completed only when all its tasks are completed in the arriving frame. If any task of a job cannot be completed in the arriving frame, the job expires and is removed from the application.
Because the master does not know, at the beginning of a frame, whether a task will be completed in that frame, we suppose that the master assigns at most one task to each worker in each frame. If two jobs and , for some and , need the same worker in frame , i.e., for some , then we say that the two jobs interfere with each other. For example, jobs and in Fig. 1 interfere with each other.
As a result of the interference, the master has to decide a set of interference-free jobs for computing in each frame. Let be the set of interference-free jobs decided for computing in frame . For example, decision in Fig. 1 can be either or . If in Fig. 1, then workers and are allocated to job in frame ; moreover, job is completed only when the two workers complete their respective tasks in frame 1. A scheduling algorithm is a time sequence of the decisions for all frames.
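The interference condition above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: jobs are written as 0/1 tuples over the workers, and the example frame (three applications, four workers) and the names `interferes` and `is_interference_free` are assumptions made only for this sketch.

```python
from itertools import combinations

def interferes(job_a, job_b):
    """Return True if the two 0/1 task vectors need at least one common worker."""
    return any(a == 1 and b == 1 for a, b in zip(job_a, job_b))

def is_interference_free(jobs):
    """Return True if no pair of jobs in the collection shares a worker."""
    return not any(interferes(a, b) for a, b in combinations(jobs, 2))

# Hypothetical frame: three applications, four workers.
jobs = {
    "app1": (1, 1, 0, 0),  # needs workers 1 and 2
    "app2": (0, 1, 1, 1),  # needs workers 2, 3, and 4
    "app3": (0, 0, 1, 0),  # needs worker 3
}

print(interferes(jobs["app1"], jobs["app2"]))               # both need worker 2
print(is_interference_free([jobs["app1"], jobs["app3"]]))   # disjoint workers
```

A decision for a frame is then any collection of generated jobs for which `is_interference_free` holds.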
II-B Problem formulation
Let random variable indicate if job is completed in frame under scheduling algorithm , where if job is generated (i.e., ) and all tasks of the job are completed by the corresponding workers in frame ; otherwise. The random variable depends on the random variables , the task completion probabilities , and a potentially randomized scheduling algorithm .
We define the average number of completed jobs for application under scheduling algorithm by
Let vector represent the applications' requirement for the average numbers of completed jobs. We say that requirement can be fulfilled (or achieved) by scheduling algorithm if for all . Moreover, we refer to requirement as a feasible requirement if there exists a scheduling algorithm that can fulfill the requirement. We define the maximum feasibility region as follows.
Definition 1.
The maximum feasibility region is the (-dimensional) region consisting of all feasible requirements .
We define an optimal scheduling algorithm as follows.
Definition 2.
A scheduling algorithm is called a feasibility-optimal scheduling algorithm (see footnote 1) if, for any requirement  interior of  (see footnote 2), it can be fulfilled by the scheduling algorithm .
Footnote 1: The feasibility-optimal scheduling defined in this paper is analogous to the throughput-optimal scheduling (e.g., [13]) or the timely-throughput-optimal scheduling (e.g., [7]).
Footnote 2: We say that requirement is interior of the region if there exists an such that lies in the region . The concept of the strictly feasible requirement has been widely used in the throughput-optimal scheduling or timely-throughput-optimal scheduling.
The goal of this paper is to devise a feasibility-optimal scheduling algorithm.
III Scheduling algorithm design
In this section, we develop a feasibility-optimal scheduling algorithm for managing the stochastic real-time jobs in the unreliable workers. To that end, we introduce a virtual queueing network in Section III-A. With the assistance of the virtual queueing network, we propose a feasibility-optimal scheduling design in Section III-B. However, the proposed feasibility-optimal scheduling algorithm involves a combinatorial optimization problem, which we show to be NP-hard. Thus, we develop a tractable approximate scheduling algorithm in Section III-C and establish its approximation ratio.
III-A Virtual queueing network
Given the distributed computing network with scheduling algorithm and requirement , we construct a virtual queueing network. The virtual queueing network consists of queues , operating under the same frame system as that in Section II-A. For example, Fig. 2 is the virtual queueing network for the distributed computing network in Fig. 1. We want to emphasize that the virtual queueing network is not a real-world network. It is introduced for the scheduling design in Section III-B.
At the beginning of each frame , a fixed number (see footnote 3) of packets arrive at queue . At the end of frame , queue can remove packet, i.e., if job is completed in frame , then queue can remove one packet at the end of frame ; otherwise, it removes no packet in frame . Again, note that those packets are not real-world packets. We summarize the packet arrival rate and the packet service rate as follows.
Footnote 3: The virtual queueing network has a fractional number of packets.
Proposition 3.
The packet arrival rate for queue is , and the packet service rate for queue is .
Proof. The packet arrival rate for is . The packet service rate for is . ∎
Let be the queue size at queue at the beginning (before new packet arrival) of frame . Then, the queueing dynamics of queue can be expressed by . Let be the vector of all queue sizes at the beginning of frame . We define the notion of a stable queue in Definition 4, followed by introducing a necessary condition for the stable queue in Proposition 5.
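To make the queueing dynamics concrete, the following Python sketch updates one virtual queue over a few frames. It follows the description above (a fixed, possibly fractional number of packets arrives at the start of a frame, and one packet is removed at the end of the frame if the job is completed). The function name `update_queue`, the requirement value 0.4, and the clamp at zero when less than one packet is queued are assumptions of this sketch, not statements from the paper.

```python
def update_queue(queue_size, requirement, job_completed):
    """One frame of a virtual queue: packet arrival, then at most one removal.

    queue_size    -- backlog at the beginning of the frame (before arrival)
    requirement   -- fixed (possibly fractional) number of arriving packets
    job_completed -- True if the application's job was completed in this frame
    """
    backlog = queue_size + requirement
    if job_completed:
        # Assumption: the backlog never drops below zero.
        backlog = max(backlog - 1.0, 0.0)
    return backlog

# Hypothetical trace: requirement 0.4, job completed only in the third frame.
q = 0.0
for completed in [False, False, True]:
    q = update_queue(q, 0.4, completed)
    print(round(q, 3))
```

If completions occur, on average, at least as often as packets arrive, the backlog stays bounded, which is the intuition behind the stability notion defined next.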
Definition 4.
Queue is stable if the average total queue size is finite.
Proposition 5 ([5], Lemma 3.6).
If queue is stable, then its packet service rate is greater than or equal to its packet arrival rate.
We want to emphasize that, unlike traditional stochastic networks (e.g., [13, 7]), each packet in our virtual queueing network can be removed only when all associated tasks are completed in its arriving frame. Thus, our paper generalizes to stochastic networks with multiple required servers; in particular, we develop a tractable approximate scheduling algorithm for the scenario in Section III-C.
III-B Feasibility-optimal scheduling algorithm
In this section, we propose a feasibility-optimal scheduling algorithm in Alg. 1. At the beginning of frame 1, Alg. 1 (in Line 1) initializes all queue sizes to be zeros. At the beginning of each frame , Alg. 1 (in Line 1) updates each queue with the new arriving packets; then, Alg. 1 (in Line 1) decides for that frame according to the present queue size vector . The decision is made for maximizing the weighted sum of the queue sizes in Eq. (2). The term in Eq. (2) calculates the expected packet service rate for , where the indicator function indicates if job is generated in frame , and if so, that job can be completed with probability . The underlying idea of Alg. 1 is to remove as many packets from the virtual queueing network as possible (for stabilizing all queues).
After performing the decision , Alg. 1 (in Line 1) updates each at the end of frame : if job is scheduled, the job is indeed generated, and all its required workers complete their respective tasks, then one packet is removed from queue in the virtual queueing network.
Take Figs. 1 and 2 for example. Suppose that and for all , and . According to Line 1, Alg. 1 calculates and . Thus, Alg. 1 decides to compute for frame 1. If workers , , and in Fig. 1 can complete their respective tasks in frame 1, then one packet is removed from queue in Fig. 2 at the end of frame 1, i.e., queue has packet at the end of frame 1.
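The per-frame decision of Alg. 1 can be illustrated by a brute-force search over all interference-free subsets of generated jobs. This is only a sketch under stated assumptions: the weight of a job is taken to be its queue size times the product of the completion probabilities of its required workers (which assumes that workers complete tasks independently), and the names `job_weight` and `max_weight_decision` as well as the numbers in the example are hypothetical. The exhaustive enumeration is exponential in the number of applications, consistent with the hardness discussed in Section III-C.

```python
from itertools import combinations
from math import prod

def job_weight(queue_size, job, probs):
    """Queue size times the probability that every required worker finishes."""
    if sum(job) == 0:  # no job generated in this frame
        return 0.0
    return queue_size * prod(p for c, p in zip(job, probs) if c == 1)

def max_weight_decision(queue_sizes, jobs, completion_probs):
    """Exhaustively search interference-free subsets of generated jobs."""
    generated = [i for i, job in enumerate(jobs) if sum(job) > 0]
    best_set, best_value = (), 0.0
    for size in range(1, len(generated) + 1):
        for subset in combinations(generated, size):
            used, feasible = set(), True
            for i in subset:
                needed = {j for j, c in enumerate(jobs[i]) if c == 1}
                if used & needed:  # a worker would be assigned two tasks
                    feasible = False
                    break
                used |= needed
            if not feasible:
                continue
            value = sum(job_weight(queue_sizes[i], jobs[i], completion_probs[i])
                        for i in subset)
            if value > best_value:
                best_set, best_value = subset, value
    return best_set, best_value

# Hypothetical frame: two applications, four workers, all probabilities 0.9.
jobs = [(1, 1, 0, 0), (0, 1, 1, 1)]
queues = [0.5, 0.6]  # queue sizes after the new packets arrive
probs = [[0.9] * 4, [0.9] * 4]
print(max_weight_decision(queues, jobs, probs))
```

In this example the two jobs share the second worker, so only one of them can be scheduled; the search picks the second application because 0.6 × 0.9³ exceeds 0.5 × 0.9².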
Theorem 7.
Alg. 1 is a feasibility-optimal scheduling algorithm.
Proof. Let vector represent the state of the virtual queueing network in frame . Note that the state changes over frames but its probability distribution is i.i.d., according to the assumption in Section II-A. Following the standard argument of the Lyapunov theory in [13, Chapter 4] along with the i.i.d. property of the state, we can prove that for any requirement interior of , all queues in the virtual queueing network (associated with Alg. 1) are stable. That is, Alg. 1 can fulfill the requirement by Propositions 3 and 5. Thus, Alg. 1 is feasibility-optimal. ∎
III-C Tractable approximate scheduling algorithm
Lemma 8.
The combinatorial optimization problem in Alg. 1 in frame is NP-hard, for all .
To study the NP-hard problem, we define two notions of approximation ratios as follows. While Definition 9 studies the resulting value in Eq. (2), Definition 10 investigates the resulting region of achievable requirements.
Definition 10.
A scheduling algorithm is called a -approximate scheduling algorithm to if, for any requirement interior of , requirement can be fulfilled by the scheduling algorithm .
Unlike Alg. 1, which solves the combinatorial optimization problem, Alg. 2 (in Line 2) simply sorts all jobs according to the values computed by Eq. (3). Let (in Line 2) denote the sorted jobs in frame in descending order of the values from Eq. (3). In addition, let (in Line 2) indicate if job has a task for worker in frame . While the numerator of Eq. (3) is the weight in Eq. (2) for job , the denominator reflects the maximum number of jobs interfered with by job . The underlying idea of Alg. 2 is to consider jobs in that order, so as to achieve a higher value of Eq. (2) while keeping the interference as low as possible.
More precisely, Alg. 2 uses a set to record (in Line 2) the available workers that are not allocated yet, where set is initialized to be in Line 2. Then, at the -th iteration of Line 2, Alg. 2 checks if job satisfies the two conditions in Line 2: the first condition means that job is generated and the second condition means that its required workers are all available. If job meets the conditions, then it is scheduled as in Line 2. In addition, if job is scheduled, then set is updated as in Line 2 by removing the workers allocated to job . After deciding , Alg. 2 performs the decision in Line 2 for frame , followed by updating the queue sizes in Line 2.
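The sort-then-scan procedure described above can be sketched as follows. The exact form of Eq. (3) is not reproduced in this text, so the score used here is an assumption: the job's weight divided by the square root of the number of workers it requires, which is the customary ordering for greedy weighted set packing and not necessarily the authors' formula. The independence of workers, the function name `greedy_decision`, and the example numbers are likewise assumptions of this sketch.

```python
from math import prod, sqrt

def greedy_decision(queue_sizes, jobs, completion_probs):
    """Greedy scan over jobs sorted by a weight-per-interference score."""
    num_workers = len(jobs[0])

    def weight(i):
        return queue_sizes[i] * prod(
            p for c, p in zip(jobs[i], completion_probs[i]) if c == 1)

    # First condition of the scan: only generated jobs are considered.
    generated = [i for i, job in enumerate(jobs) if sum(job) > 0]
    # Assumed score: weight / sqrt(number of required workers).
    order = sorted(generated,
                   key=lambda i: weight(i) / sqrt(sum(jobs[i])),
                   reverse=True)

    available = set(range(num_workers))  # workers not allocated yet
    decision = []
    for i in order:
        needed = {j for j, c in enumerate(jobs[i]) if c == 1}
        # Second condition: all required workers are still available.
        if needed <= available:
            decision.append(i)
            available -= needed
    return decision

# Hypothetical frame: three applications, four workers, all probabilities 0.9.
jobs = [(1, 1, 0, 0), (0, 1, 1, 1), (0, 0, 1, 0)]
queues = [0.5, 0.6, 0.3]
probs = [[0.9] * 4 for _ in jobs]
print(greedy_decision(queues, jobs, probs))
```

The scan costs one sort plus one pass over the jobs, in contrast to the exhaustive search required by the combinatorial optimization problem in Alg. 1.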
Proof of Lemma 12. See Appendix B. ∎
Remark 13.
We remark that the approximation ratio of is the best approximation ratio to Eq. (2). That is because the combinatorial optimization problem in Alg. 1 is at least as hard as the set packing problem (see Lemma 8), and the best approximation ratio to the set packing problem is the square root (see [6]).
Theorem 14.
Alg. 2 is a -approximate scheduling algorithm to .
Proof. See Appendix C. ∎
Remark 15.
We remark that our methodology also applies to the case of time-varying workloads. Let be the workload generated by application for worker in frame . We just need to revise the constant task completion probability in Algs. 1 and 2 to be the probability of completing workload . If workload is i.i.d. over frames for all and , then Alg. 1 is still a feasibility-optimal scheduling algorithm and Alg. 2 is still a -approximate scheduling algorithm.
IV Numerical results
In this section, we investigate Algs. 1 and 2 via computer simulations. First, we consider two applications and two workers. Fig. 3 displays the regions of achievable requirements by both scheduling algorithms for various task generation probabilities by application , when and for all , , and are fixed. Fig. 4 displays the regions of achievable requirements by both scheduling algorithms for various task completion probabilities by worker , when and for all , , and are fixed. Each result marked in Fig. 3 or Fig. 4 is a requirement such that the average number of completed jobs in 10,000 frames for application is at least and that for application is at least . Both figures show that Alg. 2 is not only computationally efficient but also can fulfill almost all requirements within (achievable by Alg. 1).
Second, we consider more applications and more workers with the same quantities, i.e., . Moreover, all task completion probabilities are fixed to be 0.9, i.e., for all and . Then, Fig. 5 displays the maximum achievable requirements (for the case of for all ) by Alg. 2, when all task generation probabilities are the same. In this case, an application generates a job in a frame with probability . When in Fig. 5, the lower the task generation probability, the lower the achievable requirement, because a lower task generation probability generates fewer jobs. In contrast, when in Fig. 5, the lower the task generation probability, the higher the achievable requirement, because fewer jobs cause less interference. In other words, the interference becomes severe when . Moreover, from Fig. 5, the maximum achievable requirement by Alg. 2 appears to decrease super-linearly with the number of applications.
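A frame-by-frame simulation of the kind used in this section could be organized as in the following Python sketch. It is not the authors' simulator: the scheduler inside the loop is a deliberately simple placeholder (longest virtual queue first) rather than Alg. 1 or Alg. 2, the workers are assumed to complete tasks independently, and the function name `simulate` and all parameter names are hypothetical.

```python
import random

def simulate(gen_probs, completion_probs, requirements,
             num_frames=10_000, seed=0):
    """Return the per-application average number of completed jobs."""
    rng = random.Random(seed)
    num_apps, num_workers = len(gen_probs), len(gen_probs[0])
    queues = [0.0] * num_apps
    completed = [0] * num_apps
    for _ in range(num_frames):
        # Each task is generated independently with its own probability.
        jobs = [[1 if rng.random() < gen_probs[i][j] else 0
                 for j in range(num_workers)] for i in range(num_apps)]
        # Virtual packets arrive at the beginning of the frame.
        queues = [q + r for q, r in zip(queues, requirements)]
        # Placeholder scheduler (assumption): longest virtual queue first.
        available, scheduled = set(range(num_workers)), []
        for i in sorted(range(num_apps), key=lambda k: queues[k], reverse=True):
            needed = {j for j, c in enumerate(jobs[i]) if c == 1}
            if needed and needed <= available:
                scheduled.append(i)
                available -= needed
        # A job is completed only if all of its tasks are completed.
        for i in scheduled:
            if all(rng.random() < completion_probs[i][j]
                   for j, c in enumerate(jobs[i]) if c == 1):
                completed[i] += 1
                queues[i] = max(queues[i] - 1.0, 0.0)
    return [c / num_frames for c in completed]

# Hypothetical setting: two applications, two workers.
averages = simulate(gen_probs=[[0.8, 0.8], [0.8, 0.8]],
                    completion_probs=[[0.9, 0.9], [0.9, 0.9]],
                    requirements=[0.3, 0.3])
print(averages)
```

A requirement would be marked as achieved in such an experiment when every entry of the returned list is at least the corresponding entry of `requirements`; swapping the placeholder for Alg. 1 or Alg. 2 only changes the scheduling step.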
V Concluding remarks
In this paper, we provided a framework for studying stochastic real-time jobs in unreliable workers with specialized functions. In particular, we developed two algorithms for scheduling real-time jobs in shared unreliable workers. While the proposed feasibility-optimal scheduling algorithm can support the largest region of applications' requirements, it must solve an NP-hard combinatorial optimization problem in every frame. In contrast, the proposed approximate scheduling algorithm is not only simple, but also has a provable guarantee for the region of achievable requirements. Moreover, we note that coding techniques have been exploited to alleviate stragglers in distributed computing networks, e.g., [9, 18]. Incorporating coding design into our framework is a promising direction.
Appendix A Proof of Lemma 8
We show a reduction from the set packing problem [6], where given a collection of non-empty sets over a universal set for some positive integers and , the objective is to identify a sub-collection of disjoint sets in that collection such that the number of sets in the sub-collection is maximized.
For the given instance of the set packing problem, we construct applications and workers in the distributed computing network. Consider a fixed frame . In frame , application generates job . With the transformation, the set packing problem is equivalent to identifying a set of interference-free jobs in frame such that the number of jobs in that set is maximized.
Moreover, consider no job until frame , identical requirements for all , and identical task completion probabilities for all and . In this context, Eq. (2) in frame becomes
because , (due to non-empty sets for all ), and . As a result of the constant in Eq. (4), the objective of the combinatorial optimization problem in Alg. 1 in frame becomes identifying a set of interference-free jobs such that the number of jobs in that set is maximized.
Suppose there exists an algorithm such that the combinatorial optimization problem in Alg. 1 in frame can be solved in polynomial time. Then, the polynomial-time algorithm can identify a set for maximizing the value in Eq. (4); in turn, it solves the set packing problem. That contradicts the NP-hardness of the set packing problem.
Because the above argument is true for all frames , we conclude that the combinatorial optimization problem in Alg. 1 in frame is NP-hard, for all .
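The transformation used in this reduction can be illustrated with a small Python sketch: each set becomes the 0/1 task vector of one application, each element of the universal set becomes one worker, and a largest interference-free set of jobs corresponds to a largest sub-collection of disjoint sets. The function names and the example instance are hypothetical, and the exhaustive search is included only to check the correspondence on a tiny instance.

```python
from itertools import combinations

def sets_to_jobs(sets, universe_size):
    """Map each set over {1, ..., universe_size} to a 0/1 task vector."""
    return [tuple(1 if w in s else 0 for w in range(1, universe_size + 1))
            for s in sets]

def max_interference_free(jobs):
    """Exhaustively find a largest set of jobs that share no worker."""
    for size in range(len(jobs), 0, -1):
        for subset in combinations(range(len(jobs)), size):
            if all(not any(a and b for a, b in zip(jobs[i], jobs[k]))
                   for i, k in combinations(subset, 2)):
                return subset
    return ()

# Hypothetical set packing instance over the universal set {1, 2, 3, 4}.
sets = [{1, 2}, {2, 3}, {3, 4}, {4}]
jobs = sets_to_jobs(sets, universe_size=4)
print(jobs)
print(max_interference_free(jobs))  # indices of pairwise disjoint sets
```

In this instance no three sets are pairwise disjoint, so a largest packing has two sets, for example the first and the third.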
Appendix B Proof of Lemma 12
Consider a fixed queue size vector in a fixed frame . Let for all . Without loss of generality, we can assume that for all and further assume that (by reordering the job indices), i.e., Alg. 2 processes job at the -th iteration of Line 2. Let be the decision of Alg. 2 in frame for the given queue size vector . Then, we can express the value of Eq. (2) computed by Alg. 2 as
Let be the decision of Alg. 1 in frame for the given queue size vector . If the conditions in Line 2 of Alg. 2 hold for the -th iteration (i.e., ), then we let  (see footnote 4) be a set of jobs. The set has the following properties:
Footnote 4: Here, we use to represent the set of common workers for jobs and .
For job , we have
All jobs in are interference-free, i.e., they need different workers, since . Moreover, job needs at least one of the workers for (i.e., ). Thus, we have
Since all jobs in need different workers, and there are workers, we have
Note that . Thus, we can bound computed by Alg. 1 by
where for all .
Furthermore, we can bound for each by
Appendix C Proof of Theorem 14
The proof of Theorem 14 needs the following technical lemma, whose proof follows the line of [13, Appendix 4.A] along with the i.i.d. property of state (as discussed in the proof of Theorem 7) and the constant task completion probabilities .
Lemma 16.
There exists a stationary scheduling algorithm (i.e., decision depends on the state in frame only) such that, for any requirement interior of , all queues in the virtual queueing network are stable, i.e., the stationary scheduling algorithm can fulfill the requirement .
Moreover, we need the Lyapunov theory [13, Theorem 4.1] as stated in the following lemma, where we consider the Lyapunov function .
Lemma 17.
Given scheduling algorithm and requirement , if there exist constants and such that
for all frames , then all queues in the virtual queueing network are stable, i.e., the scheduling algorithm can fulfill the requirement .
Then, we are ready to prove Theorem 14. Suppose that requirement is interior of . By Lemma 16, there exists a stationary scheduling algorithm that can fulfill the requirement . We denote that stationary scheduling algorithm by . Moreover, since requirement is interior of , requirement for some is also interior of . By Lemma 16 again, the stationary scheduling algorithm can fulfill requirement , i.e.,
for all .
where (a) follows [13, Chapter 4] with some constant ; (b) is because and the approximation ratio of Alg. 2 to Eq. (2) is (as stated in Lemma 12); (c) is because Alg. 1 (in Line 1) maximizes the value of among all possible scheduling algorithms ; (d) is because decision under stationary scheduling algorithm depends on the state only (regardless of the queue sizes) and also the state is i.i.d. over frames, yielding for all and ; (e) follows Eq. (11).
References
[1] (2013) Effective Straggler Mitigation: Attack of the Clones. Proc. of NSDI, pp. 185–198.
[2] (2012) Coflow: A Networking Abstraction for Cluster Applications. Proc. of ACM HotNets, pp. 31–36.
[3] (2014) Efficient Coflow Scheduling with Varys. Proc. of ACM SIGCOMM 44 (4), pp. 443–454.
[4] (2008) MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM 51 (1), pp. 107–113.
[5] (2006) Resource Allocation and Cross-Layer Control in Wireless Networks. Vol. 1, Now Publishers, Inc.
[6] (1998) Independent Sets with Domination Constraints. Proc. of ICALP, pp. 176–187.
[7] (2013) Packets with Deadlines: A Framework for Real-Time Wireless Networks. Vol. 6, Morgan & Claypool Publishers.
[8] (2019) Matroid Coflow Scheduling. Proc. of ICALP, pp. 1–14.
[9] Speeding Up Distributed Machine Learning using Codes. IEEE Transactions on Information Theory 64 (3), pp. 1514–1529.
[10] (2018) Efficient Scheduling for Synchronized Demands in Stochastic Networks. Proc. of IEEE WiOpt, pp. 1–8.
[11] (2016) Efficient Online Coflow Routing and Scheduling. Proc. of ACM MobiHoc, pp. 161–170.
[12] (2017) Coflow Scheduling in Input-Queued Switches: Optimal Delay Scaling and Algorithms. Proc. of IEEE INFOCOM, pp. 1–9.
[13] (2010) Stochastic Network Optimization with Application to Communication and Queueing Systems. Vol. 3, Morgan & Claypool Publishers.
[14] (2018) An Improved Bound for Minimizing the Total Weighted Completion Time of Coflows in Datacenters. IEEE/ACM Transactions on Networking 26 (4), pp. 1674–1687.
[15] (2018) Coflow Deadline Scheduling via Network-Aware Optimization. Proc. of Allerton, pp. 829–833.
[16] Scalable Agent-Based Simulation of Players in Massively Multiplayer Online Games. Proc. of SCAI, pp. 80–89.
[17] (2018) A Survey of Coflow Scheduling Schemes for Data Center Networks. IEEE Communications Magazine 56 (6), pp. 179–185.
[18] (2019) Timely-Throughput Optimal Coded Computing over Cloud Networks. Proc. of ACM MobiHoc, pp. 301–310.