I Introduction
Distributed computing networks (such as MapReduce [4]) have become increasingly popular for supporting data-intensive jobs. The underlying idea for processing a data-intensive job is to divide the job into a group of small tasks that can be processed in parallel by multiple workers. In general, a worker can be specialized to process one type of task. For example, MapReduce allows an application to specify its computing network. Another prominent example is distributed computing networks for massively multiplayer online games [16]. The online game system illustrated in Fig. 1 includes one master and four workers processing different types of tasks. The master is serving two players. While the present job of player 1 needs two types of workers to be completed, that of player 2 needs three types of workers.
Moreover, because of the real-time nature of latency-intensive applications (e.g., online games), a real-time job needs to be completed by a deadline. To maximize the number of jobs that meet the deadline, a scheduling algorithm allocating workers to jobs is needed. Job-level scheduling poses more challenges than packet-level scheduling because all tasks in a job are dependent, in the sense that a job is not completed until all its tasks are completed, whereas all packets or tasks in traditional packet-based networks are treated independently.
Most prior research on job-level scheduling considered general-purpose workers. The closest scenario to ours (i.e., specialized workers) is the coflow model proposed in [2], where a coflow is a job consisting of tasks of various types. Since the coflow model was proposed, coflow scheduling has been a hot topic, e.g., [3, 11, 14, 15, 8]; see the recent survey paper [17]. However, almost all prior research on coflow scheduling focused on deterministic networks; in contrast, little attention was given to stochastic networks. Note that a job can be randomly generated; moreover, a worker can be unreliable because of unpredictable events [1] like hardware failures. Because of these practical issues, a scheduling algorithm for stochastic real-time jobs on unreliable workers is crucial in distributed computing networks. The works most relevant to ours are [12, 10]. While [12] focused on homogeneous stochastic jobs in the coflow model, [10] extended it to a heterogeneous case. The fundamental difference between those works and ours is that we consider stochastic real-time jobs and unreliable workers.
In this paper, we consider a master and multiple specialized workers. The master runs multiple applications, which stochastically generate real-time jobs with a hard deadline. The workers are unreliable. Our main contribution lies in developing job scheduling algorithms with provable performance guarantees. Leveraging Lyapunov techniques, we propose a feasibility-optimal scheduling algorithm for maximizing the region of achievable requirements for the average number of completed jobs. However, the feasibility-optimal scheduling algorithm turns out to involve an NP-hard combinatorial optimization problem. To tackle the computational issue, we propose an approximate scheduling algorithm that is computationally tractable; furthermore, we prove that its region of achievable requirements shrinks by at most a provable approximation factor from the largest one. More surprisingly, our simulation results show that the region of achievable requirements of the approximate scheduling algorithm is close to the largest one.
II System overview
II-A Network model
Consider a distributed computing network consisting of a master and multiple specialized workers. The master runs multiple applications. Fig. 1 illustrates an example network. Suppose that data transfer between the master and the workers occurs instantaneously with no error. Note that the prior works on the coflow model focused on the time for data transfer. To investigate the unreliability of the workers, we ignore the time for data transfer; instead, we focus on the time for computation.
Divide time into frames. At the beginning of each frame, each application stochastically generates a job, where a job is a collection of tasks that can be processed by the corresponding workers. Precisely, we use a binary vector to represent the job generated by an application in a frame, where each element indicates whether the job has a task for the corresponding worker; see Fig. 1 for example. Each task is also stochastically generated, i.e., each element of the vector is a random variable. If the vector contains no 1's, then the application generates no job in that frame. Suppose that the job vector of each application is independently and identically distributed (i.i.d.) over frames. Suppose that the tasks generated by an application for a given worker have the same workload; see Remark 15 later for time-varying workloads. Moreover, the jobs need real-time computation. Suppose that the deadline for each job is one frame. Such a real-time system has been justified in the literature, e.g., see [7]. Consider a time-varying processing speed for each worker. Suppose that the processing speed of each worker is i.i.d. over frames. With the i.i.d. assumption along with the constant workloads, we can assume that a task generated by an application can be completed by the corresponding worker with a constant probability over frames. At the end of each frame, each worker reports whether its task is completed in that frame. A job is completed only when all its tasks are completed in the arriving frame. If any task of a job cannot be completed in the arriving frame, the job expires and is removed from the application.
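The stochastic job and task-completion model above can be sketched as independent Bernoulli draws. This is a minimal illustration under our reading of the model; the function and variable names are ours, not the paper's.

```python
import random

def generate_job(task_probs, rng):
    """Sample one frame's job for an application: entry w is 1 when the
    application generates a task for worker w (assumed independent
    Bernoulli draws); the all-zero vector means no job this frame."""
    return [1 if rng.random() < p else 0 for p in task_probs]

def job_completed(job, completion_probs, rng):
    """A job is completed only if every one of its tasks is completed
    by the corresponding worker within the frame (each task succeeds
    independently with a constant per-worker probability)."""
    return all(rng.random() < completion_probs[w]
               for w, need in enumerate(job) if need)

rng = random.Random(42)
job = generate_job([1.0, 0.0, 1.0], rng)  # tasks for workers 0 and 2
done = job_completed(job, [0.9, 0.9, 0.9], rng)
```

An expired job simply yields `done == False` for its frame and is then discarded, matching the one-frame deadline above.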
Since the master is unaware of the completion of a task at the beginning of each frame, we suppose that the master assigns at most one task to a worker in each frame. If two jobs need the same worker in a frame, then we say the two jobs have interference. For example, the two jobs in Fig. 1 have interference.
As a result of the interference, the master has to select a set of interference-free jobs for computing in each frame; we refer to that set as the decision for the frame. For example, the decision in Fig. 1 can be either of the two jobs. If a job is selected in Fig. 1, then its required workers are allocated to it in frame 1; moreover, the job is completed only when those workers complete their respective tasks in frame 1. A scheduling algorithm is a time sequence of the decisions over all frames.
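As a quick illustration (names are ours), representing each job as a 0/1 vector over the workers makes the interference check a simple overlap test:

```python
from itertools import combinations

def interferes(job_a, job_b):
    """True if the two jobs request at least one common worker."""
    return any(a and b for a, b in zip(job_a, job_b))

def interference_free(jobs):
    """True if no pair of jobs in the candidate decision set requests
    the same worker, i.e., the set can be served in one frame."""
    return not any(interferes(x, y) for x, y in combinations(jobs, 2))

# Two jobs both needing worker 0 interfere; disjoint jobs do not.
assert interferes([1, 0, 1], [1, 1, 0])
assert interference_free([[1, 0, 1], [0, 1, 0]])
```

A decision for a frame is then any subset of the generated jobs that passes `interference_free`.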
II-B Problem formulation
Let a random variable indicate whether a job is completed in a frame under a scheduling algorithm: the indicator equals one if the job is generated and all tasks of the job are completed by the corresponding workers in that frame, and zero otherwise. This random variable depends on the job-generation random variables, the task completion probabilities, and a potentially randomized scheduling algorithm.
We define the average number of completed jobs for each application under a scheduling algorithm by
(1) 
Let a vector represent the applications' requirement for the average numbers of completed jobs. We say that a requirement can be fulfilled (or achieved) by a scheduling algorithm if the average number of completed jobs meets the required value for every application. Moreover, we refer to a requirement as a feasible requirement if there exists a scheduling algorithm that can fulfill it. We define the maximum feasibility region as follows.
Definition 1.
The maximum feasibility region is the region consisting of all feasible requirements.
We define an optimal scheduling algorithm as follows.
Definition 2.
A scheduling algorithm is called a feasibility-optimal^{1} scheduling algorithm if any requirement interior^{2} to the maximum feasibility region can be fulfilled by the scheduling algorithm. ^{1}The feasibility-optimal scheduling defined in this paper is analogous to throughput-optimal scheduling (e.g., [13]) and timely-throughput-optimal scheduling (e.g., [7]). ^{2}We say that a requirement is interior to a region if the requirement, inflated by some small positive margin, still lies in the region; the concept of a strictly feasible requirement has been widely used in throughput-optimal and timely-throughput-optimal scheduling.
The goal of this paper is to devise a feasibilityoptimal scheduling algorithm.
III Scheduling algorithm design
In this section, we develop a feasibility-optimal scheduling algorithm for managing stochastic real-time jobs on unreliable workers. To that end, we introduce a virtual queueing network in Section III-A. With the assistance of the virtual queueing network, we propose a feasibility-optimal scheduling design in Section III-B. However, the proposed feasibility-optimal scheduling algorithm involves a combinatorial optimization problem, which we show to be NP-hard. Thus, we develop a tractable approximate scheduling algorithm in Section III-C; meanwhile, we establish its approximation ratio.
III-A Virtual queueing network
Given the distributed computing network with a scheduling algorithm and a requirement, we construct a virtual queueing network. The virtual queueing network consists of one queue per application, operating under the same frame system as that in Section II-A. For example, Fig. 2 is the virtual queueing network for the distributed computing network in Fig. 1. We want to emphasize that the virtual queueing network is not a real-world network; it is introduced for the scheduling design in Section III-B.
At the beginning of each frame, a fixed number^{3} of packets arrive at each queue. At the end of the frame, a queue can remove a packet: if the corresponding job is completed in the frame, then the queue removes one packet at the end of the frame; otherwise, it removes no packet in the frame. Again, note that those packets are not real-world packets. We summarize the packet arrival rate and the packet service rate as follows. ^{3}The virtual queueing network may have a fractional number of packets.
Proposition 3.
The packet arrival rate for queue is , and the packet service rate for queue is .
Proof.
The packet arrival rate for is . The packet service rate for is . ∎
Let the queue size denote the number of packets in a queue at the beginning (before new packet arrivals) of a frame, and collect all queue sizes into a vector. The queueing dynamics of a queue then take the standard form: the next queue size equals the current size plus the fixed arrival, minus one packet if the corresponding job is completed, floored at zero. We define the notion of a stable queue in Definition 4, followed by a necessary condition for a stable queue in Proposition 5.
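The per-frame queue update is a Lindley-type recursion; here is a sketch under our reading of the model, with the arrival equal to the fixed (requirement-driven) per-frame packet count and one packet served on job completion:

```python
def update_virtual_queue(q, arrival, completed):
    """One frame of a virtual queue: add the fixed (possibly
    fractional) arrival, remove one packet if the corresponding
    job was completed this frame, and floor at zero."""
    return max(q + arrival - (1.0 if completed else 0.0), 0.0)

# With arrivals of 0.4 packets per frame, completions drain the queue.
q = 0.0
for completed in [False, True, True, False]:
    q = update_virtual_queue(q, 0.4, completed)
```

A queue whose completions keep pace with its arrivals stays bounded, which is exactly the stability notion of Definition 4.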
Definition 4.
A queue is stable if its time-average queue size is finite.
Proposition 5 ([5], Lemma 3.6).
If queue is stable, then its packet service rate is greater than or equal to its packet arrival rate.
By Propositions 3 and 5, we can turn our attention to developing a scheduling algorithm such that, for any requirement interior to the maximum feasibility region, all queues in the virtual queueing network are stable.
We want to emphasize that, unlike in traditional stochastic networks (e.g., [13, 7]), each packet in our virtual queueing network can be removed only when all associated tasks are completed in its arriving frame. Thus, our paper generalizes stochastic networks to the setting of multiple required servers; in particular, we develop a tractable approximate scheduling algorithm for this scenario in Section III-C.
III-B Feasibility-optimal scheduling algorithm
(2) 
In this section, we propose a feasibility-optimal scheduling algorithm in Alg. 1. At the beginning of frame 1, Alg. 1 initializes all queue sizes to zero. At the beginning of each frame, Alg. 1 updates each queue with the newly arriving packets; then, Alg. 1 makes the decision for that frame according to the present queue size vector. The decision is made to maximize the weighted sum of the queue sizes in Eq. (2). The corresponding term in Eq. (2) calculates the expected packet service rate for each queue, where the indicator function indicates whether the job is generated in the frame and, if so, the job can be completed with the corresponding probability. The underlying idea of Alg. 1 is to remove as many packets from the virtual queueing network as possible (for stabilizing all queues).
After performing the decision, Alg. 1 updates each queue at the end of the frame: if a job is scheduled, the job is indeed generated, and all its required workers complete their respective tasks, then one packet is removed from the corresponding queue in the virtual queueing network.
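To make the max-weight step concrete, here is a brute-force sketch (names are ours; each weight stands in for the job's Eq. (2) term, e.g., queue size times completion probability): it enumerates every subset of generated jobs, keeps the interference-free ones, and returns the heaviest. The exponential enumeration is exactly the computational bottleneck shown to be NP-hard below.

```python
from itertools import combinations

def max_weight_schedule(jobs, weights):
    """Return the interference-free subset of job indices with the
    largest total weight, by exhaustive search over all subsets."""
    def interference_free(indices):
        used = set()
        for i in indices:
            needed = {w for w, need in enumerate(jobs[i]) if need}
            if used & needed:       # a worker would be double-booked
                return False
            used |= needed
        return True

    best, best_val = set(), 0.0
    for r in range(1, len(jobs) + 1):
        for combo in combinations(range(len(jobs)), r):
            if interference_free(combo):
                val = sum(weights[i] for i in combo)
                if val > best_val:
                    best, best_val = set(combo), val
    return best, best_val

# Jobs over 3 workers; jobs 0 and 1 clash on worker 0.
jobs = [[1, 1, 0], [1, 0, 1], [0, 0, 1]]
chosen, total = max_weight_schedule(jobs, [2.0, 3.0, 1.5])
# The pair {0, 2} (weight 3.5) beats the single heaviest job 1 (3.0).
```

For a handful of jobs this is fine; the subset count doubles with every additional application, which is why a tractable alternative is needed.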
Example 6.
Take Figs. 1 and 2 for example, with the given queue sizes and task completion probabilities. Alg. 1 calculates the weight of each job and decides to compute the job with the larger weight for frame 1. If the three workers in Fig. 1 required by that job complete their respective tasks in frame 1, then one packet is removed from the corresponding queue in Fig. 2 at the end of frame 1.
Theorem 7.
Alg. 1 is a feasibilityoptimal scheduling algorithm.
Proof.
Let a vector represent the state of the virtual queueing network in each frame. Note that the state changes over frames but its probability distribution is i.i.d., according to the assumption in Section II-A. Following the standard argument of the Lyapunov theory in [13, Chapter 4] along with the i.i.d. property of the state, we can prove that, for any requirement interior to the maximum feasibility region, all queues in the virtual queueing network (associated with Alg. 1) are stable. That is, Alg. 1 can fulfill the requirement by Propositions 3 and 5. Thus, Alg. 1 is feasibility-optimal. ∎
III-C Tractable approximate scheduling algorithm
We show (in the next lemma) that the combinatorial optimization problem in Alg. 1 is NP-hard. Therefore, Alg. 1 is computationally intractable.
Lemma 8.
The combinatorial optimization problem in Alg. 1 is NP-hard in every frame.
(3) 
To study the NP-hard problem, we define two notions of approximation ratios as follows. While Definition 9 concerns the resulting value of Eq. (2), Definition 10 concerns the resulting region of achievable requirements.
Definition 9.
Definition 10.
A scheduling algorithm is called an approximate scheduling algorithm (with a given ratio) to the maximum feasibility region if, for any requirement interior to the region, the requirement scaled by that ratio can be fulfilled by the scheduling algorithm.
In this paper, we propose an approximate scheduling algorithm in Alg. 2. The procedure of Alg. 2 is similar to that of Alg. 1; hence, we only point out the key differences in the following.
Unlike Alg. 1, which solves the combinatorial optimization problem exactly, Alg. 2 simply sorts all jobs in descending order of the values computed by Eq. (3). While the numerator of Eq. (3) is the weight in Eq. (2) for a job, the denominator reflects the maximum number of jobs interfered with by that job. The underlying idea of Alg. 2 is to consider jobs in that order, so as to achieve a high value of Eq. (2) while keeping the interference as low as possible.
More precisely, Alg. 2 uses a set to record the available workers that have not been allocated yet, initialized to the set of all workers. Then, at each iteration, Alg. 2 checks whether the current job satisfies two conditions: the first condition means that the job is generated, and the second condition means that all of its required workers are available. If the job meets both conditions, then it is scheduled. In addition, if the job is scheduled, then the set of available workers is updated by removing the workers allocated to the job. After making the decision, Alg. 2 performs it for the frame, followed by updating the queue sizes.
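A sketch of this greedy pass follows (names are ours). As a stand-in for the Eq. (3) score we divide each job's Eq. (2)-style weight by its number of requested workers; the paper's actual denominator, the maximum number of interfered jobs, is an assumption we do not reproduce here.

```python
def greedy_schedule(jobs, weights):
    """Greedy sketch of Alg. 2: sort jobs by score in descending
    order, then admit each job whose required workers are all still
    unallocated; empty (not-generated) jobs are skipped."""
    num_workers = len(jobs[0]) if jobs else 0
    order = sorted(range(len(jobs)),
                   key=lambda m: weights[m] / max(sum(jobs[m]), 1),
                   reverse=True)
    available = set(range(num_workers))
    scheduled = []
    for m in order:
        needed = {w for w, need in enumerate(jobs[m]) if need}
        if needed and needed <= available:  # generated and workers free
            scheduled.append(m)
            available -= needed
    return scheduled

# Greedy takes the top-scored job 1 (weight 3.2), blocking jobs 0 and 2,
# while the exact optimum {0, 2} achieves 3.5 -- the kind of gap the
# approximation ratio bounds.
jobs = [[1, 1, 0], [1, 0, 1], [0, 0, 1]]
chosen = greedy_schedule(jobs, [2.0, 3.2, 1.5])  # -> [1]
```

One sort plus one linear pass replaces the exhaustive subset search, which is what makes the algorithm tractable.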
Example 11.
Lemma 12 (approximation ratio of Alg. 2 to Eq. (2)).
Proof.
See Appendix B. ∎
Remark 13.
We remark that the approximation ratio achieved here is the best possible for Eq. (2). That is because the combinatorial optimization problem in Alg. 1 is computationally at least as hard as the set packing problem (see Lemma 8), and the best approximation ratio for the set packing problem is a square-root factor (see [6]).
Theorem 14.
Alg. 2 is an approximate scheduling algorithm to the maximum feasibility region.
Proof.
See Appendix C. ∎
The computational complexity of Alg. 2 is dominated by sorting all jobs by their Eq. (3) values. Thus, Alg. 2 is tractable even when the number of applications is large.
Remark 15.
We remark that our methodology can be applied to the case of time-varying workloads. Let the workload generated by an application for a worker vary by frame. We just need to revise the constant task completion probability in Algs. 1 and 2 to be the probability of completing that frame's workload. If the workloads are i.i.d. over frames, then Alg. 1 is still a feasibility-optimal scheduling algorithm and Alg. 2 is still an approximate scheduling algorithm with the same ratio.
IV Numerical results
In this section, we investigate Algs. 1 and 2 via computer simulations. First, we consider two applications and two workers. Fig. 3 displays the regions of achievable requirements by both scheduling algorithms for various task generation probabilities of one application, with all other probabilities fixed. Fig. 4 displays the regions of achievable requirements by both scheduling algorithms for various task completion probabilities of one worker, with all other probabilities fixed. Each point marked in Figs. 3 and 4 is a requirement such that the average number of completed jobs over 10,000 frames meets the required value for each application. Both figures reflect that Alg. 2 is not only computationally efficient but can also fulfill almost all requirements achievable by Alg. 1.
Second, we consider more applications and more workers, with equal numbers of each. Moreover, all task completion probabilities are fixed at 0.9. Fig. 5 then displays the maximum achievable requirements (for the case of identical requirements across applications) by Alg. 2, when all task generation probabilities are the same. In this case, an application generates a job in a frame with the given probability. In one regime in Fig. 5, the lower the task generation probability, the lower the achievable requirement, because a lower task generation probability generates fewer jobs. In the other regime, the lower the task generation probability, the higher the achievable requirement, because fewer jobs cause less interference. In other words, the interference becomes severe as the task generation probability grows. Moreover, from Fig. 5, the maximum achievable requirement by Alg. 2 appears to decrease superlinearly with the number of applications.
V Concluding remarks
In this paper, we provided a framework for studying stochastic real-time jobs on unreliable workers with specialized functions. In particular, we developed two algorithms for scheduling real-time jobs on shared unreliable workers. While the proposed feasibility-optimal scheduling algorithm can support the largest region of applications' requirements, it involves an NP-hard optimization problem. In contrast, the proposed approximate scheduling algorithm is not only simple but also has a provable guarantee for its region of achievable requirements. Moreover, we note that coding techniques have been exploited to alleviate stragglers in distributed computing networks, e.g., [9, 18]. Including coding design in our framework is a promising direction.
Appendix A Proof of Lemma 8
We show a reduction from the set packing problem [6]: given a collection of nonempty sets over a universal set, the objective is to identify a subcollection of disjoint sets such that the number of sets in the subcollection is maximized.
For the given instance of the set packing problem, we construct one application per set and one worker per element of the universal set in the distributed computing network. Consider a fixed frame. In that frame, each application generates the job corresponding to its set. With this transformation, the set packing problem is equivalent to identifying a set of interference-free jobs in the frame such that the number of jobs in that set is maximized.
Moreover, consider no job arrivals until the fixed frame, identical requirements for all applications, and identical task completion probabilities for all applications and workers. In this context, Eq. (2) in the frame becomes
(4) 
because all queue sizes are equal, every job is nonempty (the given sets are nonempty), and the completion probabilities are identical. As a result of the constant factor in Eq. (4), the objective of the combinatorial optimization problem in Alg. 1 in the frame becomes identifying a set of interference-free jobs such that the number of jobs in that set is maximized.
Suppose there exists an algorithm that solves the combinatorial optimization problem in Alg. 1 in the frame in polynomial time. Then, the polynomial-time algorithm can identify a set maximizing the value in Eq. (4) and, in turn, solve the set packing problem. That contradicts the NP-hardness of the set packing problem.
Because the above argument holds for every frame, we conclude that the combinatorial optimization problem in Alg. 1 is NP-hard in every frame.
Appendix B Proof of Lemma 12
Consider a fixed queue size vector in a fixed frame. Without loss of generality, we can assume (by reordering the job indices) that Alg. 2 processes the jobs in descending order of their values from Eq. (3), i.e., it processes the job with the highest value at the first iteration. Let the decision of Alg. 2 in the frame for the given queue size vector be fixed. Then, we can express the value of Eq. (2) computed by Alg. 2 as
(5) 
Let the decision of Alg. 1 in the frame for the given queue size vector be given as well. If the conditions of Alg. 2 hold at an iteration (i.e., the corresponding job is scheduled), then we define an associated set of jobs.^{4} The set has the following properties: ^{4}Here, we use the notation to represent the set of common workers for two jobs.

For job , we have
(6) since .

All jobs in the set are interference-free, i.e., they need different workers. Moreover, each such job needs at least one of the workers required by the scheduled job. Thus, we have
(7) 
Since all jobs in need different workers, and there are workers, we have
(8)
Appendix C Proof of Theorem 14
The proof of Theorem 14 needs the following technical lemma, whose proof follows the line of [13, Appendix 4.A] along with the i.i.d. property of the state (as discussed in the proof of Theorem 7) and the constant task completion probabilities.
Lemma 16.
There exists a stationary scheduling algorithm (i.e., the decision depends only on the state in the current frame) such that, for any requirement interior to the maximum feasibility region, all queues in the virtual queueing network are stable, i.e., the stationary scheduling algorithm can fulfill the requirement.
Moreover, we need the Lyapunov theory [13, Theorem 4.1] as stated in the following lemma, where we consider the quadratic Lyapunov function.
Lemma 17.
Given scheduling algorithm and requirement , if there exist constants and such that
for all frames , then all queues in the virtual queueing network are stable, i.e., the scheduling algorithm can fulfill the requirement .
Then, we are ready to prove Theorem 14. Suppose that a requirement is interior to the maximum feasibility region. By Lemma 16, there exists a stationary scheduling algorithm that can fulfill the requirement. Moreover, since the requirement is interior, a slightly inflated requirement is also interior. By Lemma 16 again, a stationary scheduling algorithm can fulfill the inflated requirement, i.e.,
(11) 
for all .
Consider requirement where for all . Next, applying Lemma 17 to Alg. 2, we conclude that Alg. 2 can fulfill requirement because
where (a) follows [13, Chapter 4] with some constant; (b) holds because the approximation ratio of Alg. 2 to Eq. (2) is as stated in Lemma 12; (c) holds because Alg. 1 maximizes the value of Eq. (2) among all possible scheduling algorithms; (d) holds because the decision under the stationary scheduling algorithm depends only on the state (regardless of the queue sizes) and the state is i.i.d. over frames; (e) follows Eq. (11).
References
 [1] (2013) Effective Straggler Mitigation: Attack of the Clones. Proc. of NSDI, pp. 185–198. Cited by: §I.
 [2] (2012) Coflow: A Networking Abstraction for Cluster Applications. Proc. of ACM HotNets, pp. 31–36. Cited by: §I.
 [3] (2014) Efficient Coflow Scheduling with Varys. Proc. of ACM SIGCOMM 44 (4), pp. 443–454. Cited by: §I.
 [4] (2008) MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM 51 (1), pp. 107–113. Cited by: §I.
 [5] (2006) Resource Allocation and Cross-Layer Control in Wireless Networks. Vol. 1, Now Publishers, Inc. Cited by: Proposition 5.
 [6] (1998) Independent Sets with Domination Constraints. Proc. of ICALP, pp. 176–187. Cited by: Appendix A, §III-C, Remark 13.
 [7] (2013) Packets with Deadlines: A Framework for Real-Time Wireless Networks. Vol. 6, Morgan & Claypool Publishers. Cited by: §II-A, §III-A, footnote 1.
 [8] (2019) Matroid Coflow Scheduling. Proc. of ICALP, pp. 1–14. Cited by: §I.

 [9] (2017) Speeding Up Distributed Machine Learning Using Codes. IEEE Transactions on Information Theory 64 (3), pp. 1514–1529. Cited by: §V.
 [10] (2018) Efficient Scheduling for Synchronized Demands in Stochastic Networks. Proc. of IEEE WiOpt, pp. 1–8. Cited by: §I.
 [11] (2016) Efficient Online Coflow Routing and Scheduling. Proc. of ACM MobiHoc, pp. 161–170. Cited by: §I.
 [12] (2017) Coflow Scheduling in Input-Queued Switches: Optimal Delay Scaling and Algorithms. Proc. of IEEE INFOCOM, pp. 1–9. Cited by: §I.
 [13] (2010) Stochastic Network Optimization with Application to Communication and Queueing Systems. Vol. 3, Morgan & Claypool Publishers. Cited by: Appendix C, §III-A, §III-B, footnote 1.
 [14] (2018) An Improved Bound for Minimizing the Total Weighted Completion Time of Coflows in Datacenters. IEEE/ACM Transactions on Networking 26 (4), pp. 1674–1687. Cited by: §I.
 [15] (2018) Coflow Deadline Scheduling via NetworkAware Optimization. Proc. of Allerton, pp. 829–833. Cited by: §I.
 [16] Scalable AgentBased Simulation of Players in Massively Multiplayer Online Games. Proc. of SCAI, pp. 80–89. Cited by: §I.
 [17] (2018) A Survey of Coflow Scheduling Schemes for Data Center Networks. IEEE Communications Magazine 56 (6), pp. 179–185. Cited by: §I.
 [18] (2019) Timely-Throughput Optimal Coded Computing over Cloud Networks. Proc. of ACM MobiHoc, pp. 301–310. Cited by: §V.