GCNScheduler: Scheduling Distributed Computing Applications using Graph Convolutional Networks

We consider the classical problem of scheduling task graphs corresponding to complex applications on distributed computing systems. A number of heuristics have been previously proposed to optimize task scheduling with respect to metrics such as makespan and throughput. However, they tend to be slow to run, particularly for larger problem instances, limiting their applicability in more dynamic systems. Motivated by the goal of solving these problems more rapidly, we propose, for the first time, a graph convolutional network-based scheduler (GCNScheduler). By carefully integrating an inter-task data dependency structure with network settings into an input graph and feeding it to an appropriate GCN, the GCNScheduler can efficiently schedule tasks of complex applications for a given objective. We evaluate our scheme against baselines through simulations. We show that not only can our scheme quickly and efficiently learn from existing scheduling schemes, but it can also easily be applied to large-scale settings that current scheduling schemes fail to handle. We show that it achieves better makespan than the classic HEFT algorithm, and almost the same throughput as throughput-oriented HEFT (TP-HEFT), while providing several orders of magnitude faster scheduling times in both cases. For example, for makespan minimization, GCNScheduler schedules 50-node task graphs in about 4 milliseconds while HEFT takes more than 1500 seconds; and for throughput maximization, GCNScheduler schedules 100-node task graphs in about 3.3 milliseconds, compared to about 6.9 seconds for TP-HEFT.

1 Introduction

Successfully running complex graph-based applications, ranging from edge-cloud processing in IoT systems Cheng et al. (2019)-Kaur et al. (2018) to processing astronomical observations Garcia-Piquer, A. et al. (2017), relies heavily on executing all sub-components of such applications through efficient task scheduling. Not only does efficient task scheduling play a crucial role in improving the utilization of computing resources and reducing the time required to execute tasks, it can also lead to significant profits for service providers Barroso et al. (2013). In this framework, any application consists of multiple tasks with a given inter-task data dependency structure, i.e. each task generates inputs for certain other tasks. Such dependencies can be expressed via a directed acyclic graph (DAG), also known as a task graph, where vertices and edges represent tasks and inter-task data dependencies, respectively. An input job for an application is completed once all the tasks are executed by compute machines according to the inter-task dependencies.

There are two commonly used metrics for schedulers to optimize: makespan and throughput. The required time to complete all tasks for a single input is called the makespan. The maximum steady state rate at which inputs can be processed in a pipelined manner is called throughput. Makespan minimization and throughput maximization can each be achieved through relevant efficient task-scheduling algorithms that assign tasks to appropriate distributed computing resources to be executed.

The underlying methodology for task scheduling can be categorized into heuristic-based schemes (e.g. Eskandari et al. (2018)-Pham et al. (2017)), meta-heuristic schemes (e.g. Kennedy and Eberhart (1997), Addya et al. (2017)-Fan et al. (2013), Shishido et al. (2018), Dasgupta et al. (2013), Izadkhah and Li (2019)), and optimization-based schemes (e.g. Azar and Epstein (2005)-Skutella (2001)).

One of the most well-known heuristic scheduling schemes for makespan minimization is the heterogeneous earliest-finish time (HEFT) algorithm Topcuoglu et al. (2002), which will be considered as one of our benchmarks. For throughput maximization, we benchmark against the algorithm presented in Gallet et al. (2009) which we refer to as TP-HEFT.

One of the fundamental disadvantages of all the above-mentioned scheduling schemes is that they work well only in relatively small settings; once a task graph becomes large, they require very long computation times. We anticipate that many domains, such as IoT for smart cities, will give rise to increasingly complex applications with numerous inter-dependent tasks, and scheduling may need to be repeated quite frequently in the presence of network or resource dynamics Dustdar et al. (2017); Syed et al. (2021). Therefore, it is essential to design a faster method to schedule tasks for such large-scale task graphs.

A promising alternative is to apply machine learning techniques for function approximation to this problem, leveraging the fact that scheduling essentially has to do with finding a function mapping tasks to compute machines. Given the graph structure of applications, we propose to use an appropriate graph convolutional network (GCN) Kipf and Welling (2017) to schedule tasks through learning the inter-task dependencies of the task graph as well as network settings (i.e., execution speed of compute machines and communication bandwidth across machines) in order to extract the relationship between different entities. The GCN has attracted significant attention in the literature for its ability to address many graph-based applications, such as semi-supervised link prediction Zhang and Chen (2018) and node classification Kipf and Welling (2017). The idea behind GCN is to construct node embeddings layer by layer. In each layer, a node embedding is obtained by aggregating its neighbors' embeddings, followed by a neural network (i.e. a linear transformation and a nonlinear activation). In the case of node classification, the last-layer embedding is given to a softmax operator to predict node labels, and consequently the parameters of the GCN can be learned in an end-to-end manner. In general, there are two types of GCNs, namely spectral-based GCNs Defferrard et al. (2016) and spatial-based ones Hamilton et al. (2017). To obtain node embeddings, the former requires matrix decomposition of the Laplacian matrix (which results in scalability issues due to the non-linear computational complexity of the decomposition), while the latter avoids this cost thanks to the idea of message passing.

To the best of our knowledge, there is no prior work that has proposed a pure spatial-based GCN, combined with carefully designed features for both nodes and edges of task graphs, to perform scheduling over distributed computing systems.

The main contributions of this paper are as follows:

  • We propose GCNScheduler, which can quickly schedule tasks by carefully integrating a task graph with network settings into a single input graph and feeding it to an appropriate GCN model.

  • Any existing scheduling algorithm can be used as a teacher to train GCNScheduler, for any metric. We illustrate this by training GCNScheduler using HEFT Topcuoglu et al. (2002) for makespan minimization, and TP-HEFT Gallet et al. (2009) for throughput maximization.

  • We evaluate the performance of our proposed scheme and show that, not only can our GCNScheduler be trained in a very short period of time (for instance, it takes around 15 seconds to train on a graph with 8,000 nodes), it also gives scheduling performance comparable to the teacher algorithm. We show our approach gives comparable or better scheduling performance in terms of makespan with respect to HEFT and throughput with respect to TP-HEFT, respectively.

  • We show that GCNScheduler is several orders of magnitude faster than previous heuristic algorithms in obtaining the schedule for a given task graph. For example, for makespan minimization, GCNScheduler schedules 50-node task graphs in about 4 milliseconds while HEFT takes more than 1500 seconds; and for throughput maximization, GCNScheduler schedules 100-node task graphs in about 3.3 milliseconds, compared to about 6.9 seconds for TP-HEFT.

  • We show that GCNScheduler is able to efficiently perform scheduling for any size task graph. In particular, we show that our proposed scheme is able to operate over large-scale task graphs where existing schemes require excessive computational resources.

1.1 Related Work

Task scheduling can be categorized into multiple groups from different perspectives. One way of categorizing task scheduling schemes has to do with the type of algorithms used for assigning tasks to compute resources. Heuristic, meta-heuristic, and optimization-based are three categories of task scheduling schemes. Heuristic task scheduling schemes can be divided into quite a few sub categories based on their approach, such as load balancing Ren et al. (2012),Bhatia et al. (2012),Kumar and Sharma (2017), priority-based scheduling Topcuoglu et al. (2002),Sudarsan and Ribbens (2016),Dubey et al. (2015),Chrétienne et al. (1995), task duplication Ishfaq Ahmad and Yu-Kwong Kwok (1998), and clustering Palis et al. (1996).

Since heuristic algorithms may sometimes perform poorly compared to optimal task scheduling, meta-heuristic schemes (e.g. Particle Swarm Optimization Kennedy and Eberhart (1997), Simulated Annealing Addya et al. (2017)-Fan et al. (2013), and genetic-based approaches Shishido et al. (2018), Dasgupta et al. (2013), Izadkhah and Li (2019)) and optimization-based schemes (e.g. Azar and Epstein (2005)-Skutella (2001)), which aim at approximating the NP-hard optimization of task scheduling, have attracted significant attention. However, all the above-mentioned heuristic, meta-heuristic and optimization-based schemes tend to run extremely slowly as the number of tasks becomes large, because their iterative nature requires excessive computation. This makes the aforementioned schemes unable to handle large-scale task graphs.

As obtaining the optimal scheduler is basically the same as finding an appropriate mapper function, which maps tasks to compute machines, machine-learning based scheduling has begun emerging as an alternative thanks to advances in fundamental learning methods, such as deep learning Goodfellow et al. (2016) and reinforcement learning (RL) Sutton and Barto (2018). Sun et al. proposed DeepWave Sun et al. (2020), a scheduler which reduces job completion time using RL, with a priority list (which indicates the scheduling priority of edges in a job DAG) as the action and the completion time of a job DAG as the reward. Furthermore, Decima Mao et al. (2019) schedules tasks over a Spark cluster by training a neural network using RL, with scheduling the next task for execution as the action and a high-level scheduling objective of minimizing the makespan as the reward. The aforementioned RL-based schemes suffer from having a huge action space (i.e., the space of scheduling decisions).

While Decima Mao et al. (2019) only operates in a homogeneous environment, Luo et al. proposed Lachesis Luo et al. (2021) to operate over a heterogeneous environment. Lachesis combines three different components: a GCN, an RL policy network, and a heuristic task mapper. There are three main differences between our work and theirs with respect to the use of the GCN: first, they use the GCN to embed task nodes only, without taking network settings into account as we do; second, they use a regular GCN which does not explicitly account for edge direction, while we use an EDGNN Jaume et al. (2019) which does; and finally, the GCN in Lachesis does not do scheduling (only task node embedding), whereas we are the first to propose using a GCN directly for task scheduling.

The remainder of the paper is organized as follows. In the next section, we elaborate upon the problem formulation. In Section 3, we give an overview of GCNs and explain in detail how our proposed scheme works. Finally, in Section 4, we present numerical results on the performance of our proposed scheme against well-known approaches.

2 Problem Statements

We now formally represent the minimization of makespan and the maximization of throughput as optimization problems. Every application/job consists of tasks with inter-task data dependencies. In order to finish a job, each of its tasks must be executed on at least one compute machine. The compute machines are interconnected via communication links.

Before defining makespan and throughput, let us describe the task dependencies, referred to as the task graph, and the network settings.

Task Graph: Since there are dependencies across different tasks, meaning that a task generates inputs for certain other tasks, we can model this dependency through a DAG as depicted in Fig. 1. Suppose we have $N_T$ tasks $\{T_i\}_{i=1}^{N_T}$ with a given task graph $G_{task} := (V_{task}, E_{task})$, where $V_{task}$ and $E_{task}$ respectively represent the set of vertices and edges (task dependencies), with $(i, i') \in E_{task}$ if task $T_i$ generates inputs for task $T_{i'}$. Let us define the vector $\mathbf{p} := [p_1, \dots, p_{N_T}]$ as the amount of computation required by the tasks. For every pair of tasks $T_i$ and $T_{i'}$, where $(i, i') \in E_{task}$, task $T_i$ produces $d_{i,i'}$ amount of data for task $T_{i'}$ after being executed by a machine.

Figure 1: An example of a task graph, in the form of a DAG, with eight tasks. A task can be executed only after all of its predecessor tasks have been executed and have generated their outputs.

Network Settings: Each task is required to be executed on a compute node (machine) which is connected to other compute nodes (machines) through communication links (compute node and machine are used interchangeably in this paper). Let us suppose we have $N_K$ compute nodes $\{K_j\}_{j=1}^{N_K}$. Regarding the execution speed of compute nodes, we consider the vector $\mathbf{e} := [e_1, \dots, e_{N_K}]$ as the execution speed of the machines. The communication link delay between any two compute nodes can be characterized by bandwidth. Let us denote $B_{j,j'}$ as the communication bandwidth of the link from compute node $K_j$ to compute node $K_{j'}$. If two machines are not connected to each other, we can assume the corresponding bandwidth is zero (infinite communication delay).

In general, a task-scheduling scheme maps tasks to compute nodes according to a given objective. Formally speaking, a task scheduler can be represented as a function $m(\cdot)$ where task $T_i$, $i \in \{1, \dots, N_T\}$, is assigned to machine $m(T_i)$. We next present two well-known objectives, namely makespan minimization and throughput maximization.

Objective 1:

The first objective function for the task assignment we consider is makespan minimization. In particular, we need to find a scheduler that assigns tasks to compute machines such that the resulting makespan is minimized. Our proposed scheme aims at obtaining such a scheduler by utilizing a carefully designed GCN that is able to classify tasks into machines (each machine represents a class in our problem).
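To make the classification view concrete, the following is a minimal sketch of how per-task class scores could be turned into a schedule. It assumes the model emits one score (logit) per machine for every task; the function names `softmax` and `assign_tasks` are illustrative and not taken from the paper.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one task's machine scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]


def assign_tasks(logits_per_task: List[List[float]]) -> List[int]:
    """Pick, for every task, the machine (class) with the highest probability."""
    schedule = []
    for logits in logits_per_task:
        probs = softmax(logits)
        schedule.append(max(range(len(probs)), key=probs.__getitem__))
    return schedule


# Two tasks, three machines (hypothetical scores).
print(assign_tasks([[0.1, 2.0, -1.0], [3.0, 0.0, 0.5]]))
```

Under this view, producing a schedule is a single forward pass followed by an argmax per task, with no iterative search over candidate assignments.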

Before formally defining the makespan, we need to define Earliest Start Time (EST), Earliest Finish Time (EFT), Actual Start Time (AST), and Actual Finish Time (AFT) as follows:

Definition 1: $EST(T_i, K_j)$ denotes the earliest execution start time for task $T_i$ being executed on compute node $K_j$. Note that $EST(T_{entry}, K_j) = 0$ for the entry task $T_{entry}$.

Definition 2: $EFT(T_i, K_j)$ denotes the earliest execution finish time for task $T_i$ being executed on compute node $K_j$.

Definition 3: $AST(T_i)$ and $AFT(T_i)$ denote the actual start time and the actual finish time of task $T_i$.

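Regarding the computation of the aforementioned quantities for each task, one can recursively compute them starting from the entry task according to the following formula Topcuoglu et al. (2002):

$$EST(T_i, K_j) = \max\Big\{ avail(K_j),\ \max_{T_{i'} \in pred(T_i)} \big( AFT(T_{i'}) + c_{i',i} \big) \Big\}, \qquad EFT(T_i, K_j) = \frac{p_i}{e_j} + EST(T_i, K_j) \qquad (1)$$

where $pred(T_i)$ is the set of immediate predecessors of task $T_i$, $c_{i',i}$ is the time needed to transfer the output of task $T_{i'}$ to task $T_i$, and $avail(K_j)$ indicates the earliest time at which compute node $K_j$ is ready to execute a task.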

Definition 4 (Makespan): After all tasks are assigned to compute nodes for execution, the actual time for completion of a job is equal to the actual finish time of the last task. Therefore, the makespan can be represented as

$$\text{makespan} = \max_{i \in \{1, \dots, N_T\}} AFT(T_i) \qquad (2)$$
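As an illustration of how the makespan of a given assignment can be evaluated, here is a minimal sketch under simplifying assumptions that are ours, not the paper's: tasks are indexed in topological order, each machine runs its tasks one at a time in index order, and communication between tasks on the same machine is free. The argument names `p`, `e`, `d`, `B`, and `assign` are illustrative.

```python
from typing import Dict, List, Tuple


def makespan(p: List[float], e: List[float],
             d: Dict[Tuple[int, int], float],
             B: List[List[float]],
             assign: List[int]) -> float:
    """Makespan of a fixed task-to-machine assignment.

    p[i]      : computation amount of task i
    e[j]      : execution speed of machine j
    d[(i, k)] : data produced by task i for task k (edge i -> k, with i < k)
    B[s][t]   : bandwidth of the link from machine s to machine t
    assign[i] : machine index that task i runs on
    """
    n_tasks = len(p)
    aft = [0.0] * n_tasks      # actual finish time of each task
    avail = [0.0] * len(e)     # earliest time each machine is free
    preds: Dict[int, List[int]] = {k: [] for k in range(n_tasks)}
    for (i, k) in d:
        preds[k].append(i)
    for k in range(n_tasks):   # index order is assumed to be topological
        j = assign[k]
        ready = 0.0            # time at which all inputs have arrived
        for i in preds[k]:
            src = assign[i]
            comm = 0.0 if src == j else d[(i, k)] / B[src][j]
            ready = max(ready, aft[i] + comm)
        start = max(avail[j], ready)
        aft[k] = start + p[k] / e[j]
        avail[j] = aft[k]
    return max(aft)


# Three tasks (0 -> 1, 0 -> 2) on two machines.
p, e = [2.0, 4.0, 2.0], [1.0, 2.0]
d = {(0, 1): 2.0, (0, 2): 2.0}
B = [[0.0, 1.0], [1.0, 0.0]]
print(makespan(p, e, d, B, assign=[0, 1, 0]))  # offload task 1 to the faster machine
print(makespan(p, e, d, B, assign=[0, 0, 0]))  # run everything on machine 0
```

A helper like this is what one would use to compare the schedule quality of two schedulers on the same task graph, independently of how the assignments were produced.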

Objective 2: The second objective function that we consider for task scheduling is throughput maximization. Unlike makespan, which is the overall execution time for a given input, the throughput stands for the average number of inputs that can be executed per unit time in steady state. By assuming the number of inputs to be infinite and denoting $N(t)$ as the number of inputs completely executed by a scheduler at time $t$, the throughput would be $\lim_{t \to \infty} N(t)/t$. In Gallet et al. (2009), the authors showed that the following definition characterizes the throughput of a scheduler.

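Definition 5 (Throughput Gallet et al. (2009)): For a given task assignment, the throughput of a scheduler is $1/T$, where $T$ is the time taken by any resource to execute an input, and it can be written as

$$T = \max\Big\{ \max_{j} C^{comp}_{j},\ \max_{j} C^{out}_{j},\ \max_{j} C^{in}_{j},\ \max_{j, j'} C^{comm}_{j,j'} \Big\} \qquad (3)$$

with

  • $C^{comp}_{j}$: representing the computation time of compute machine $K_j$ for a single input (i.e. $C^{comp}_{j} = \sum_{i:\, m(T_i) = K_j} p_i / e_j$),

  • $C^{out}_{j}$: representing the time taken by compute machine $K_j$ for its outgoing interface (i.e. $C^{out}_{j} = \sum_{j'} D_{j,j'} / B^{out}_{j}$, where $D_{j,j'}$ and $B^{out}_{j}$ respectively indicate the amount of data transferred from compute machine $K_j$ to $K_{j'}$ and the maximum outgoing bandwidth of compute machine $K_j$),

  • $C^{in}_{j}$: representing the time taken by compute machine $K_j$ for its incoming interface (i.e. $C^{in}_{j} = \sum_{j'} D_{j',j} / B^{in}_{j}$, where $B^{in}_{j}$ indicates the maximum incoming bandwidth of compute machine $K_j$),

  • $C^{comm}_{j,j'}$: representing the communication time taken to transfer data from compute machine $K_j$ to compute machine $K_{j'}$ (i.e. $C^{comm}_{j,j'} = D_{j,j'} / B_{j,j'}$).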
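The four per-resource times listed above (computation, outgoing interface, incoming interface, and link communication) can be evaluated for a fixed assignment as in the sketch below. It assumes the period is the largest of these times over all resources and that data exchanged between tasks on the same machine costs nothing; the names `b_out`, `b_in`, and the other arguments are illustrative.

```python
from typing import Dict, List, Tuple


def throughput(p: List[float], e: List[float],
               d: Dict[Tuple[int, int], float],
               B: List[List[float]],
               b_out: List[float], b_in: List[float],
               assign: List[int]) -> float:
    """Steady-state throughput (inputs per unit time) of a fixed assignment."""
    n_k = len(e)
    # Computation time of each machine for a single input.
    comp = [0.0] * n_k
    for i, j in enumerate(assign):
        comp[j] += p[i] / e[j]
    # Data moved between each ordered pair of distinct machines.
    data = [[0.0] * n_k for _ in range(n_k)]
    for (i, k), amount in d.items():
        src, dst = assign[i], assign[k]
        if src != dst:
            data[src][dst] += amount
    # Outgoing / incoming interface time of each machine.
    out_t = [sum(data[j]) / b_out[j] for j in range(n_k)]
    in_t = [sum(data[s][j] for s in range(n_k)) / b_in[j] for j in range(n_k)]
    # Per-link communication time (only links that actually carry data).
    link_t = [data[s][t] / B[s][t]
              for s in range(n_k) for t in range(n_k)
              if s != t and data[s][t] > 0]
    period = max(comp + out_t + in_t + link_t)  # slowest resource sets the pace
    return 1.0 / period


p, e = [2.0, 4.0, 2.0], [1.0, 2.0]
d = {(0, 1): 2.0, (0, 2): 2.0}
B = [[0.0, 1.0], [1.0, 0.0]]
print(throughput(p, e, d, B, b_out=[2.0, 2.0], b_in=[2.0, 2.0], assign=[0, 1, 0]))
```

In this toy instance the bottleneck is the computation time of machine 0, so moving work away from it is the only way to raise the throughput.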

Remark: other objectives could also be considered in the future.

3 Proposed GCNScheduler

We present a novel machine-learning based task scheduler which can be trained with respect to the aforementioned objectives. Since the task-scheduling problem is inherently graph-based (i.e. defined over a task graph), it is essential to utilize a machine-learning approach designed for capturing the underlying graph-based relationships of the problem. To do so, we employ a suitable GCN Jaume et al. (2019) in order to obtain a model which can automatically assign tasks to compute machines. We point out that there is no prior GCN-based scheduling scheme that incorporates carefully designed features for both nodes and edges. This idea has significant advantages over conventional scheduling schemes. First, it can remarkably reduce the computational complexity compared to previous scheduling algorithms. Second, after training an appropriate GCN, our scheme can handle large-scale task graphs, while conventional schemes suffer severely from scalability issues.

3.1 Overview of Graph Neural Networks

Conceptually speaking, the idea of GCN is closely related to the neighborhood-aggregation encoder algorithm Hamilton et al. (2018). However, instead of aggregating information from neighbors, the intuition behind GCNs is to view graphs from the perspective of a message-passing algorithm between nodes Scarselli et al. (2009). An illustration of message passing is depicted in Fig. 2. In the GCN framework, every node is initialized with an embedding which is the same as its feature vector. At each layer of the GCN algorithm, nodes take the average of neighbor messages and apply a neural network to it as follows

$$h_v^{(l+1)} = \sigma\Big( W_0^{(l)} h_v^{(l)} + W_1^{(l)} \frac{1}{|\mathcal{N}(v)|} \sum_{u \in \mathcal{N}(v)} h_u^{(l)} \Big) \qquad (4)$$

where $h_v^{(l)}$, $W_0^{(l)}$, $W_1^{(l)}$, and $\sigma(\cdot)$ respectively represent the hidden vector of node $v$ at layer $l$, the weight matrix at layer $l$ for the self node, the weight matrix at layer $l$ for neighboring nodes, and a non-linear function (e.g. ReLU). Furthermore, $\mathcal{N}(v)$ indicates the neighbors of node $v$. After $L$ layers of neighborhood aggregation, we get an output embedding for each node. We can use these embeddings along with any loss function and stochastic gradient descent to train the GCN model parameters.
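The layer update described above (average the neighbors' embeddings, apply separate weight matrices to the self embedding and the neighbor average, then a non-linearity) can be sketched in plain Python as follows. This is an illustrative toy, not the authors' implementation; the names `gcn_layer`, `W_self`, and `W_neigh` are ours, and ReLU is used as the non-linearity.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Matrix-vector product for small dense matrices."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gcn_layer(h: List[Vector], neighbors: List[List[int]],
              W_self: Matrix, W_neigh: Matrix) -> List[Vector]:
    """One mean-aggregation layer: ReLU(W_self h_v + W_neigh mean_u h_u)."""
    out = []
    for v, h_v in enumerate(h):
        nbrs = neighbors[v]
        dim = len(h_v)
        if nbrs:
            mean = [sum(h[u][k] for u in nbrs) / len(nbrs) for k in range(dim)]
        else:
            mean = [0.0] * dim  # isolated node: no neighbor message
        z = [a + b for a, b in zip(matvec(W_self, h_v), matvec(W_neigh, mean))]
        out.append([max(0.0, t) for t in z])  # ReLU
    return out


# Star graph: node 0 is connected to nodes 1 and 2.
h0 = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
nbrs = [[1, 2], [0], [0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(h0, nbrs, identity, identity))
```

Stacking several such layers lets information from nodes several hops away reach each node's embedding.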

Figure 2: An illustration of message passing for the input graph shown on the left side. Regarding the notation, $x_A$ represents node A's feature. Each square box indicates a deep neural network and arrows show the averaged messages from neighbors.

The above scheme is suitable for undirected graphs where edges show reciprocal relationship between ending nodes.

3.2 Overview of GCNs for Directed Graphs

In real applications such as social media, where relations between nodes are not reciprocal, using regular GCN-based schemes might not be useful. An alternative for such situations is to utilize schemes designed for directed graphs, such as EDGNN Jaume et al. (2019), where incoming and outgoing edges are treated differently in order to capture the nonreciprocal relationship between nodes. In other words, EDGNN considers different weights for outgoing and incoming edges in addition to the weights set for neighboring nodes. In particular, the embedding of node $v$ would be as follows:

$$h_v^{(l+1)} = \sigma\Big( W_1^{(l)} h_v^{(l)} + W_2^{(l)} \sum_{u \in \mathcal{N}(v)} h_u^{(l)} + W_3^{(l)} \sum_{u:\,(u,v) \in E} e_{uv}^{(l)} + W_4^{(l)} \sum_{u:\,(v,u) \in E} e_{vu}^{(l)} \Big) \qquad (5)$$

where $W_1^{(l)}$, $W_2^{(l)}$, $W_3^{(l)}$, and $W_4^{(l)}$ represent the weight matrices of layer $l$ for the embedding of the self node, neighboring nodes, incoming edges, and outgoing edges, respectively. Moreover, $h_v^{(l)}$ and $e_{uv}^{(l)}$ respectively denote the embedding of node $v$ and the embedding of the edge from node $u$ to node $v$ at layer $l$.
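A toy version of such a direction-aware update is sketched below. It follows the description above (separate weight matrices for the self node, neighboring nodes, incoming edges, and outgoing edges) but makes assumptions of its own: neighbors are nodes joined by an edge in either direction, contributions are summed rather than averaged, and ReLU is the non-linearity. It is not the EDGNN reference implementation, and the name `edgnn_layer` is ours.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(*vs: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(c) for c in zip(*vs)]


def edgnn_layer(h: List[Vector], e_emb: Dict[Tuple[int, int], Vector],
                W1: Matrix, W2: Matrix, W3: Matrix, W4: Matrix) -> List[Vector]:
    """Direction-aware node update; e_emb[(u, v)] embeds the edge u -> v."""
    node_dim = len(h[0])
    edge_dim = len(next(iter(e_emb.values())))
    out = []
    for v in range(len(h)):
        neigh = [0.0] * node_dim     # sum of neighbor node embeddings
        incoming = [0.0] * edge_dim  # sum of embeddings of edges u -> v
        outgoing = [0.0] * edge_dim  # sum of embeddings of edges v -> w
        for (u, w), emb in e_emb.items():
            if w == v:
                neigh, incoming = add(neigh, h[u]), add(incoming, emb)
            elif u == v:
                neigh, outgoing = add(neigh, h[w]), add(outgoing, emb)
        z = add(matvec(W1, h[v]), matvec(W2, neigh),
                matvec(W3, incoming), matvec(W4, outgoing))
        out.append([max(0.0, t) for t in z])  # ReLU
    return out


# One directed edge 0 -> 1; distinct weights expose the in/out asymmetry.
h0 = [[1.0], [2.0]]
edges = {(0, 1): [1.0]}
print(edgnn_layer(h0, edges, [[1.0]], [[1.0]], [[10.0]], [[100.0]]))
```

With different weights for incoming and outgoing edges, the source and the destination of the same edge receive different updates, which a symmetric GCN layer cannot express.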

3.3 Proposed Input Graph

In order to train an EDGNN-based model, we need to carefully design the input graph components, namely the adjacency matrix, nodes' features, edges' features, and labels. It should be noted that our scheme is not tailored to a particular criterion, as we will show later that it can learn from two scheduling schemes with different objectives. Our designed input graph can be fed into the EDGNN, and the model will be trained according to labels generated from a given scheduling scheme. We next explain how we carefully design the input graph.

Designed Input Graph: We start from the original task graph and consider the same set of nodes and edges for our input graph as the task graph. In other words, by representing the input graph as $G_{input} := (V_{input}, E_{input})$, we have $V_{input} = V_{task}$ and $E_{input} = E_{task}$. The crucial part of obtaining an effective GCN-based scheduler is carefully designing the features of nodes and edges as well as the labels.

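The feature of node $T_i$, $i \in \{1, \dots, N_T\}$, is denoted by $x_{n,i}$ and it has the following $N_K$-dimensional features:

$$x_{n,i} := \Big[ \frac{p_i}{e_1}, \frac{p_i}{e_2}, \dots, \frac{p_i}{e_{N_K}} \Big]$$

The intuition behind $x_{n,i}$ is that these features represent the required computational time of task $T_i$ across all compute machines.

The feature of edge $(i, i') \in E_{task}$ is denoted by $x_{e,(i,i')}$ and it has the following $N_K^2$-dimensional features:

$$x_{e,(i,i')} := \Big[ \frac{d_{i,i'}}{B_{1,1}}, \frac{d_{i,i'}}{B_{1,2}}, \dots, \frac{d_{i,i'}}{B_{N_K,N_K}} \Big]$$

The intuition behind $x_{e,(i,i')}$ is that these features represent the required time for transferring the result of executing task $T_i$ to the following task $T_{i'}$ across all possible pairs of compute machines. An illustration of our designed input graph for the task graph of Fig. 1 is depicted in Fig. 3.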
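The node and edge features described above can be built as in the following sketch. It assumes node features are per-machine computation times and edge features are per-machine-pair transfer times; how the same-machine entries and zero-bandwidth links are encoded is our assumption (zero and infinite transfer time, respectively), and the function names are illustrative.

```python
from typing import Dict, List, Tuple


def node_features(p: List[float], e: List[float]) -> List[List[float]]:
    """One row per task: computation time of that task on every machine."""
    return [[p_i / e_j for e_j in e] for p_i in p]


def edge_features(d: Dict[Tuple[int, int], float],
                  B: List[List[float]]) -> Dict[Tuple[int, int], List[float]]:
    """One vector per edge: transfer time over every ordered machine pair."""
    n_k = len(B)

    def transfer_time(amount: float, s: int, t: int) -> float:
        if s == t:
            return 0.0           # assumption: no transfer cost on the same machine
        if B[s][t] == 0:
            return float("inf")  # unconnected machines: infinite delay
        return amount / B[s][t]

    return {edge: [transfer_time(amount, s, t)
                   for s in range(n_k) for t in range(n_k)]
            for edge, amount in d.items()}


p, e = [2.0, 4.0], [1.0, 2.0]
d = {(0, 1): 2.0}
B = [[0.0, 1.0], [4.0, 0.0]]
print(node_features(p, e))   # 2 tasks x 2 machines
print(edge_features(d, B))   # 1 edge x 4 machine pairs
```

Because the features already encode machine speeds and bandwidths, the same trained model can in principle be queried on a task graph of any size, as long as the number of machines is unchanged.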

Objective-Dependent Labeling: Based on which task scheduler our method should learn from (which we refer to as the "teacher" scheduler, namely HEFT for makespan minimization and TP-HEFT for throughput maximization), we label all nodes as well as edges. Let us define $l_{n,i}$ and $l_{e,(i,i')}$ as the labels of node $T_i$ and edge $(i, i')$, respectively. Regarding nodes' labeling, we consider the label of node $T_i$, $l_{n,i}$, as the index of the compute node that the teacher algorithm assigns task $T_i$ to run on.

Thus, for makespan minimization, we have $l_{n,i} = m_{\text{HEFT}}(T_i)$, where $m_{\text{HEFT}}(\cdot)$ is the mapper function of the HEFT algorithm.

And for throughput maximization, we have $l_{n,i} = m_{\text{TP-HEFT}}(T_i)$, where $m_{\text{TP-HEFT}}(\cdot)$ is the mapper function of the TP-HEFT algorithm.

Finally, we label each edge according to the label of the vertex it originates from. In other words, $l_{e,(i,i')} = l_{n,i}$ for every $(i, i') \in E_{task}$. We should note that this edge labeling is crucial in forcing the model to learn to label the outgoing edges of a node with the same label as the node itself.
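The labeling rule above is simple enough to state in a few lines. The sketch below assumes the teacher's output is available as a list mapping each task index to a machine index; `teacher_assignment` and `label_graph` are hypothetical names.

```python
from typing import Dict, List, Tuple


def label_graph(edges: List[Tuple[int, int]],
                teacher_assignment: List[int]
                ) -> Tuple[List[int], Dict[Tuple[int, int], int]]:
    """Node label = machine chosen by the teacher; edge label = label of its source node."""
    node_labels = list(teacher_assignment)
    edge_labels = {(i, k): node_labels[i] for (i, k) in edges}
    return node_labels, edge_labels


# Teacher placed tasks 0, 1, 2 on machines 1, 0, 1.
nodes, edge_lbls = label_graph([(0, 1), (0, 2), (1, 2)], [1, 0, 1])
print(nodes)
print(edge_lbls)
```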

3.4 Implementation

As far as the model parameters are concerned, we consider a 4-layer EDGNN with 128 nodes per layer, a nonlinear activation function, and dropout. Since both nodes and edges have features, we let both nodes and edges be embedded.

For training our model, we need to create a sufficiently large graph. However, since the HEFT and TP-HEFT algorithms are extremely slow in performing task scheduling for large-scale task graphs, obtaining labels (i.e. determining the machine each task needs to be executed on) for a single large graph is cumbersome. Therefore, we create a large graph by taking the union of disjoint medium-size graphs such that HEFT and TP-HEFT can handle scheduling tasks over each of them. We split the disjoint graphs into training, validation, and test sets. This allows us to use the teacher algorithms to train our GCNScheduler for graphs even larger than the teacher algorithms themselves can handle efficiently.
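The union of disjoint graphs can be formed by shifting the node indices of each medium-size graph by a running offset, as in this sketch. The representation of a graph as a `(num_nodes, edges)` pair and the returned `graph_id` list (useful for splitting by graph later) are our own choices for illustration.

```python
from typing import List, Tuple

Graph = Tuple[int, List[Tuple[int, int]]]  # (number of nodes, edge list)


def disjoint_union(graphs: List[Graph]) -> Tuple[int, List[Tuple[int, int]], List[int]]:
    """Merge graphs into one large graph with no edges between components."""
    offset = 0
    all_edges: List[Tuple[int, int]] = []
    graph_id: List[int] = []  # component index of every node in the union
    for g, (n_nodes, edges) in enumerate(graphs):
        all_edges.extend((i + offset, k + offset) for (i, k) in edges)
        graph_id.extend([g] * n_nodes)
        offset += n_nodes
    return offset, all_edges, graph_id


total, edges, gid = disjoint_union([(3, [(0, 1), (1, 2)]), (2, [(0, 1)])])
print(total, edges, gid)
```

Since each component is labeled independently by the teacher, the cost of labeling grows with the number of medium-size graphs rather than with the cost of running the teacher on one very large graph.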

Figure 3: An illustration of designed input graph fed into the EDGNN model for the example of the task graph shown in Fig. 1. Node and edge features are represented with brown and green colors, respectively.

4 Experimental Results

In this section, we evaluate the performance of our proposed GCNScheduler in terms of two criteria, namely the makespan minimization and the throughput maximization, for various task graphs (medium-scale and large-scale task graphs as well as the task graphs of three real perception applications considered in Ra et al. (2011)). For each criterion, we measure the performance of GCNScheduler as well as the time it takes to assign tasks and compare these values with the corresponding values of our benchmarks (i.e. HEFT/TP-HEFT and the random task-scheduler). We evaluate all schemes by running them on our local cluster which has 16 CPUs (with 8 cores and 2 threads per core) of Intel(R) Xeon(R) E5-2620 v4 @ 2.10GHz.

As far as network settings are concerned, the computation amount of tasks, the execution speed of compute machines, and the communication bandwidth are drawn randomly from uniform distributions. For simplicity, we assume each task produces the same amount of data after being executed. Regarding task graphs, we generate random DAGs in two different ways:

(1) establishing an edge between any two tasks with a given probability (which we call the edge probability (EP)), then pruning the edges such that the result forms a DAG (only edges between nodes $i$ and $i'$ with $i < i'$ are kept);

(2) specifying the width and depth of the graph, then randomly selecting successor tasks for each task.
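Both generation procedures can be sketched with the standard library as follows. The first keeps only edges from a lower to a higher task index, which guarantees acyclicity; the second arranges tasks in layers and draws successors from the next layer. The `max_succ` parameter and the layer-by-layer indexing are our assumptions, not details given in the text.

```python
import random
from typing import List, Optional, Tuple


def random_dag(n_tasks: int, ep: float,
               seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Edge-probability DAG: each pair (i, k) with i < k is an edge with probability ep."""
    rng = random.Random(seed)
    return [(i, k) for i in range(n_tasks) for k in range(i + 1, n_tasks)
            if rng.random() < ep]


def layered_dag(width: int, depth: int, max_succ: int = 2,
                seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Width/depth DAG: every task picks 1..max_succ successors in the next layer."""
    rng = random.Random(seed)
    edges = []
    for layer in range(depth - 1):
        next_layer = [(layer + 1) * width + c for c in range(width)]
        for c in range(width):
            src = layer * width + c
            n_succ = rng.randint(1, min(max_succ, width))
            edges.extend((src, dst) for dst in rng.sample(next_layer, n_succ))
    return edges


print(len(random_dag(50, 0.25, seed=0)))          # 50-node graph with EP = 0.25
print(len(layered_dag(width=5, depth=10, seed=0)))  # 5 x 10 = 50 tasks
```

In both generators every edge points from a lower index to a higher one, so the index order is a valid topological order of the resulting graph.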

4.1 Makespan Minimization

For training the model, since the teacher scheduler, i.e. the HEFT algorithm Topcuoglu et al. (2002), is extremely slow in generating labels for large task graphs (we observed that HEFT running on a commodity PC is incapable of conducting task assignment when the task graph becomes large, on the order of 100 nodes; see Table 1 for the increasing trend in compute time with task graph size), we create a sufficient number (approximately 400) of random medium-size task graphs (i.e. each has 50 nodes with either an EP of 0.25 or a width and depth of 5 and 10, respectively) and label tasks for each of these medium-size task graphs according to the HEFT algorithm Topcuoglu et al. (2002). By doing so, we create a single large-scale graph which is the union of disjoint medium-size graphs. Notably, it only takes seconds to train our model with such a large graph using just CPUs (no GPUs or other specialized hardware). After training the model, we consider both medium-scale and large-scale task graphs as input samples. Our model then labels the tasks and determines which machine will execute each task. We next measure the performance of GCNScheduler over medium-scale and large-scale task graphs as well as the task graphs of the three real perception applications provided in Ra et al. (2011).

4.1.1 Medium-scale task graphs

Fig. 4 shows the average makespan of GCNScheduler (with the makespan-minimization objective) compared to HEFT Topcuoglu et al. (2002) and the random task-scheduler for medium-size task graphs with different numbers of tasks. Our model significantly outperforms the random task-scheduler and considerably improves the makespan compared to HEFT, especially as the number of tasks increases. GCNScheduler does not replicate the schedules produced by HEFT exactly; however, we should note that ultimately the makespan of executing all tasks is more important than the accuracy of labeling tasks, and in this regard GCNScheduler does better than HEFT. To gain some intuitive understanding of why GCNScheduler outperforms HEFT in terms of makespan, we manually examined the schedules produced by GCNScheduler and HEFT for many scenarios. We found that HEFT is sometimes unable to avoid assigning tasks to machines with poor communication bandwidth, while GCNScheduler learns to do so more consistently. We believe this is due to the carefully designed edge features in the input to GCNScheduler, which explicitly take the communication bandwidth between machines into account.

Figure 4: Makespan of GCNScheduler, HEFT Topcuoglu et al. (2002), and the random scheduler in small settings for different numbers of tasks.

The time taken to assign tasks to compute machines for both GCNScheduler and HEFT Topcuoglu et al. (2002) is presented in Table 1. For the task-graph sizes in Table 1, GCNScheduler is faster than HEFT by a factor of roughly 260 (20 tasks) to roughly 420,000 (50 tasks), i.e. about two to five orders of magnitude, and the gap widens as the number of tasks grows. This clearly shows the advantage of GCNScheduler in scheduling time.

Number of tasks    20        30        40        50
GCNScheduler       0.0026    0.0027    0.0029    0.0037
HEFT               0.6764    9.4330    117.35    1552.0
Table 1: Time taken (in seconds) by GCNScheduler and HEFT to perform scheduling for medium-scale task graphs with different numbers of tasks.
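As a quick check of the speedups implied by Table 1, the ratio of the two scheduling times and its base-10 logarithm can be computed directly from the reported values. The variable names are ours; the numbers are copied from the table.

```python
import math

num_tasks = [20, 30, 40, 50]
gcn_seconds = [0.0026, 0.0027, 0.0029, 0.0037]
heft_seconds = [0.6764, 9.4330, 117.35, 1552.0]

# Speedup of GCNScheduler over HEFT and the corresponding order of magnitude.
speedups = [h / g for h, g in zip(heft_seconds, gcn_seconds)]
orders = [math.log10(s) for s in speedups]

for n, s, o in zip(num_tasks, speedups, orders):
    print(f"{n} tasks: speedup = {s:,.0f}x (about 10^{o:.1f})")
```

The GCNScheduler times stay nearly flat across the four sizes, while the HEFT times grow by more than an order of magnitude for each additional 10 tasks, so the speedup grows quickly with graph size.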
Figure 5: Makespan of GCNScheduler and the random scheduler for large-scale task graphs with different numbers of tasks and different EP values.

4.1.2 Large-scale task graphs

In addition to the promising results in the medium-size settings, we now focus on large-scale task graphs, where the HEFT algorithm Topcuoglu et al. (2002) is extremely slow to operate; hence we compare the performance of our GCNScheduler with the only possible benchmark, which is the random task-scheduler. Fig. 5 shows the average makespan of our proposed GCNScheduler (top plot) and the random task-scheduler (bottom plot) in large-scale settings where the number of tasks varies from 3,500 to 5,000 and the edge probability (EP) takes the values 0.005, 0.01, and 0.02. One can easily observe that our proposed GCNScheduler significantly reduces the makespan, by a factor of 8 for larger EP. The intuition behind the significant gain for larger EP (i.e. larger node degrees) is that some tasks may require more predecessor tasks to be executed in advance, hence randomly assigning tasks may place one of the predecessor tasks on a machine with poor computing power or communication bandwidth, resulting in a larger average makespan. GCNScheduler, however, efficiently exploits inter-task dependencies as well as network settings information (i.e. execution speed of machines and communication bandwidth across machines) through carefully designed node and edge features; therefore it leads to a remarkably lower makespan.

Figure 6: Inference time of our GCNScheduler for large-scale task graphs with different numbers of tasks and different EP values.

Finally, Fig. 6 illustrates the inference time (i.e. the time taken to assign tasks to compute machines) of our proposed GCNScheduler for different numbers of tasks and different EP values. Our GCNScheduler takes milliseconds to obtain labels for each of these large-scale task graphs. This clearly shows the advantage of our proposed scheme, which makes it an attractive alternative to state-of-the-art scheduling schemes that fail to operate efficiently over complicated jobs, each of which may have thousands of tasks with arbitrary inter-task dependencies.

Figure 7: Makespan of GCNScheduler (with the makespan-minimization objective), HEFT, and the random task-scheduler for the three real perception applications considered in Ra et al. (2011).
Table 2: Time taken (in milliseconds) by GCNScheduler and HEFT Topcuoglu et al. (2002) to perform scheduling for the task graphs of the three real perception applications considered in Ra et al. (2011).

4.1.3 Real Application Benchmarks

To show the significance of our proposed scheme on real applications, we consider the three real perception applications provided in Ra et al. (2011), namely face recognition, object-and-pose recognition, and gesture recognition, with the corresponding task graphs depicted in Fig. 8 (please see Ra et al. (2011) for more detail regarding each task). We measure the makespan of each application by running GCNScheduler (with the makespan-minimization objective) over the carefully designed input graphs (obtained from the original task graphs). Fig. 7 illustrates the makespan of GCNScheduler (with the makespan-minimization objective) against HEFT Topcuoglu et al. (2002) and the random task-scheduler for the three perception applications. While our GCNScheduler leads to a slightly better makespan than HEFT Topcuoglu et al. (2002), it reduces the time taken to perform scheduling significantly (by 3-4 orders of magnitude), as shown in Table 2. This remarkable time reduction demonstrates the importance of our scheme to other applications (in business or security) that use or are built upon these perception applications.

4.2 Throughput Maximization

To maximize the throughput, we use the TP-HEFT algorithm Gallet et al. (2009) as the teacher scheduler for training our GCNScheduler. Since the TP-HEFT scheduler Gallet et al. (2009), like the HEFT algorithm Topcuoglu et al. (2002), becomes excessively slow in generating labels for large task graphs, we create a sufficient number of random medium-size task graphs (i.e. each has around 40 tasks, with a width and depth of 5 and 8, respectively) and label the tasks according to the TP-HEFT algorithm Gallet et al. (2009). We then build a single large-scale graph, which is the union of these disjoint medium-size graphs, and train GCNScheduler with the throughput-maximization objective. As in the makespan case, we test our GCNScheduler over medium-scale and large-scale task graphs as well as the task graphs of the three perception applications.
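The union of disjoint medium-size graphs can be sketched as below. This is a minimal illustration rather than our training pipeline: the representation of each labeled graph as a triple (number of tasks, edge list, per-task machine label) and the function name disjoint_union are assumptions made for the example. Node indices of each graph are shifted by the number of nodes already merged, so no edge ever connects two different source graphs.

```python
def disjoint_union(graphs):
    """Merge labeled task graphs into one large graph with no cross edges.

    graphs: iterable of (num_nodes, edges, labels), where edges is a list of
    (src, dst) pairs with local indices and labels[i] is the machine that the
    teacher scheduler assigned to task i (hypothetical representation).
    """
    offset = 0
    merged_edges = []
    merged_labels = []
    for num_nodes, edges, labels in graphs:
        if len(labels) != num_nodes:
            raise ValueError("each task needs exactly one label")
        # Shift local node indices so that they stay unique after merging.
        merged_edges.extend((u + offset, v + offset) for u, v in edges)
        merged_labels.extend(labels)
        offset += num_nodes
    return offset, merged_edges, merged_labels


# Example: two tiny labeled graphs become one 5-node training graph.
g1 = (3, [(0, 1), (1, 2)], [0, 1, 0])
g2 = (2, [(0, 1)], [1, 1])
print(disjoint_union([g1, g2]))
```

Because the components stay disconnected, message passing on the merged graph never mixes information between the original task graphs.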

4.2.1 Medium-scale task graphs

Table 3 shows the throughput of GCNScheduler (with the throughput-maximization objective) compared to TP-HEFT Gallet et al. (2009) and the random task-scheduler for medium-size task graphs with different numbers of tasks. GCNScheduler achieves higher throughput than the TP-HEFT Gallet et al. (2009) scheduler (about 3.12 versus 1.80 to 2.17), while it significantly outperforms the random task-scheduler. Table 4 shows the time taken to schedule tasks. Moreover, the accuracy of our model is around .

Tasks      100      200      300      400
GCNSch.    3.1254   3.1251   3.1261   3.1185
TP-HEFT    2.1731   2.1690   1.8034   2.0046
Random     0.0193   0.01801  0.0174   0.0177
Table 3: Throughput of GCNScheduler (with the throughput-maximization objective), the throughput-oriented HEFT (TP-HEFT) algorithm, and the random task-scheduler for medium-size task graphs with different numbers of tasks.
Tasks      100      200      300      400
GCNSch.    0.0033   0.0049   0.0071   0.0078
TP-HEFT    6.9235   27.229   70.940   115.221
Table 4: Time taken (in seconds) by GCNScheduler (with the throughput-maximization objective) and TP-HEFT to schedule medium-size task graphs with different numbers of tasks.
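As a rough reading of Table 4, the short Python sketch below estimates how the scheduling time grows with the number of tasks by taking the slope between the first and last columns on a log-log scale. The helper name growth_exponent is hypothetical, and a two-point slope over four measurements is only a coarse indication, not a complexity analysis.

```python
import math

# Values copied from Table 4 (seconds).
tasks = [100, 200, 300, 400]
gcn_seconds = [0.0033, 0.0049, 0.0071, 0.0078]
tpheft_seconds = [6.9235, 27.229, 70.940, 115.221]


def growth_exponent(sizes, times):
    """Slope of log(time) against log(size) between the first and last points."""
    return math.log(times[-1] / times[0]) / math.log(sizes[-1] / sizes[0])


print("TP-HEFT exponent: %.2f" % growth_exponent(tasks, tpheft_seconds))
print("GCNSch. exponent: %.2f" % growth_exponent(tasks, gcn_seconds))
```

An exponent near 2 means the time roughly quadruples when the number of tasks doubles, while an exponent below 1 indicates sublinear growth over the measured range.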
Figure 8: Task graphs of the three perception applications considered in Ra et al. (2011).

4.2.2 Large-scale task graphs

Since TP-HEFT Gallet et al. (2009) and other existing scheduling schemes are extremely slow for very large task graphs (e.g. task graphs with a few thousand tasks), we compare the throughput of GCNScheduler only against the random task-scheduler, as shown in Table 5. Further, Table 6 shows the time taken to assign tasks to compute nodes for large-scale task graphs. GCNScheduler (with the throughput-maximization objective) remains remarkably fast on large-scale task graphs, scheduling up to 5,000 tasks in under 90 milliseconds.

Tasks      3,500    4,000    4,500    5,000
GCNSch.    2.9737   2.9731   2.9733   2.9734
Random     0.6344   0.6336   0.6332   0.6331
Table 5: Throughput of GCNScheduler (with the throughput-maximization objective) and the random task-scheduler for large-scale task graphs.
Tasks      3,500    4,000    4,500    5,000
GCNSch.    66.050   74.978   83.817   87.388
Table 6: Time taken (in milliseconds) by GCNScheduler (with the throughput-maximization objective) to schedule large-scale task graphs.

4.2.3 Real Application Benchmarks

We now evaluate the throughput of our GCNScheduler on the task graphs of the three real perception applications provided in Ra et al. (2011), namely face recognition, object-and-pose recognition, and gesture recognition. In particular, we run our trained GCNScheduler (with the throughput-maximization objective) over the carefully designed input graphs and measure the throughput for each application. Fig. 9 shows the throughput of our GCNScheduler (with the throughput-maximization objective) compared to TP-HEFT Gallet et al. (2009) and the random task-scheduler for the three perception applications. While our GCNScheduler leads to a marginally better throughput than the TP-HEFT scheduler Gallet et al. (2009), it reduces the time taken to perform task assignment significantly (by 2-3 orders of magnitude), as shown in Table 7.

Scheduler  Face Recog.  Pose Recog.  Gesture Recog.
GCNSch.    0.488        0.511        0.560
TP-HEFT    87.34        257.8        290.1
Table 7: Time taken (in milliseconds) by GCNScheduler (with the throughput-maximization objective) and TP-HEFT to perform scheduling for the task graphs of the three real perception applications.
Figure 9: Throughput of GCNScheduler (with the throughput-maximization objective), TP-HEFT, and the random task-scheduler for the task graphs of the three real perception applications.
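The speed-up figures implied by Table 7 can be checked with a few lines of Python. The sketch below is only arithmetic on the reported values; the dictionary layout and the function name orders_of_magnitude are choices made for the example.

```python
import math

# Scheduling times from Table 7 in milliseconds: (GCNScheduler, TP-HEFT).
table7_ms = {
    "Face Recog.": (0.488, 87.34),
    "Pose Recog.": (0.511, 257.8),
    "Gesture Recog.": (0.560, 290.1),
}


def orders_of_magnitude(fast, slow):
    """Base-10 logarithm of the speed-up of `fast` over `slow`."""
    return math.log10(slow / fast)


for app, (gcn_ms, tpheft_ms) in table7_ms.items():
    speedup = tpheft_ms / gcn_ms
    print("%-15s speed-up %.0fx (%.2f orders of magnitude)"
          % (app, speedup, orders_of_magnitude(gcn_ms, tpheft_ms)))
```

Each application yields a value between 2 and 3, consistent with the 2-3 orders of magnitude stated above.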

5 Conclusion

We proposed GCNScheduler, a scalable and fast task-scheduling scheme that can perform scheduling according to different objectives (such as minimizing the makespan or maximizing the throughput). By evaluating our scheme against benchmarks through simulations, we show that not only can our scheme easily handle large-scale settings that existing scheduling schemes are unable to handle, but it can also achieve better performance while requiring significantly less time (i.e., several orders of magnitude less) to perform scheduling. As a future direction, we aim to investigate the performance of our proposed GCNScheduler with respect to other objectives and with other teacher schedulers.

Acknowledgments

This material is based upon work supported in part by Defense Advanced Research Projects Agency (DARPA) under Contract No. HR001117C0053 and by the Army Research Laboratory under Cooperative Agreement W911NF-17-2-0196. Any views, opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

References

  • S. K. Addya, A. K. Turuk, B. Sahoo, M. Sarkar, and S. K. Biswash (2017) Simulated annealing based vm placement strategy to maximize the profit for cloud service providers. Engineering Science and Technology, an International Journal 20 (4), pp. 1249–1259. External Links: ISSN 2215-0986 Cited by: §1.1, §1.
  • Y. Azar and A. Epstein (2005) Convex programming for scheduling unrelated parallel machines. In Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing, STOC ’05, New York, NY, USA, pp. 331–337. External Links: ISBN 1581139608 Cited by: §1.1, §1.
  • L. A. Barroso, J. Clidaras, and U. Hölzle (2013) The datacenter as a computer: an introduction to the design of warehouse-scale machines, second edition. External Links: Link Cited by: §1.
  • J. Bhatia, T. Patel, H. Trivedi, and V. Majmudar (2012) HTV dynamic load balancing algorithm for virtual machine instances in cloud. In 2012 International Symposium on Cloud and Services Computing, pp. 15–20. External Links: Document Cited by: §1.1.
  • N. Cheng, F. Lyu, W. Quan, C. Zhou, H. He, W. Shi, and X. Shen (2019) Space/aerial-assisted computing offloading for iot applications: a learning-based approach. IEEE Journal on Selected Areas in Communications 37 (5), pp. 1117–1129. External Links: Document Cited by: §1.
  • P. Chrétienne, E.G. Coffman, J.K. Lenstra, and Z. Liu (Eds.) (1995) Scheduling theory and its applications. Wiley, United States (English). External Links: ISBN 0-471-94059-3 Cited by: §1.1.
  • K. Dasgupta, B. Mandal, P. Dutta, J. K. Mandal, and S. Dam (2013) A genetic algorithm (ga) based load balancing strategy for cloud computing. Procedia Technology 10, pp. 340–347. Note: First International Conference on Computational Intelligence: Modeling Techniques and Applications (CIMTA) 2013 External Links: ISSN 2212-0173 Cited by: §1.1, §1.
  • M. Defferrard, X. Bresson, and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, Red Hook, NY, USA, pp. 3844–3852. External Links: ISBN 9781510838819 Cited by: §1.
  • K. Dubey, M. Kumar, and M. A. Chandra (2015) A priority based job scheduling algorithm using iba and easy algorithm for cloud metaschedular. In 2015 International Conference on Advances in Computer Engineering and Applications, pp. 66–70. External Links: Document Cited by: §1.1.
  • S. Dustdar, S. Nastić, and O. Šćekić (2017) Smart cities. In The Internet of Things, People and Systems, Cited by: §1.
  • L. Eskandari, J. Mair, Z. Huang, and D. Eyers (2018) Iterative scheduling for distributed stream processing systems. In Proceedings of the 12th ACM International Conference on Distributed and Event-Based Systems, DEBS ’18, New York, NY, USA, pp. 234–237. External Links: ISBN 9781450357821, Link, Document Cited by: §1.
  • Z. Fan, H. Shen, Y. Wu, and Y. Li (2013) Simulated-annealing load balancing for resource allocation in cloud environments. In 2013 International Conference on Parallel and Distributed Computing, Applications and Technologies, pp. 1–6. External Links: Document Cited by: §1.1, §1.
  • M. Gallet, L. Marchal, and F. Vivien (2009) Efficient scheduling of task graph collections on heterogeneous resources. In IEEE International Symposium on Parallel Distributed Processing, pp. 1–11. External Links: Document Cited by: 2nd item, §1, §2, §2, §4.2.1, §4.2.2, §4.2.3, §4.2.
  • A. Garcia-Piquer, J. C. Morales, I. Ribas, J. Colomé, J. Guàrdia, M. Perger, J. A. Caballero, M. Cortés-Contreras, S. V. Jeffers, A. Reiners, P. J. Amado, A. Quirrenbach, and W. Seifert (2017) Efficient scheduling of astronomical observations - application to the carmenes radial-velocity survey. A&A 604, pp. A87. External Links: Document, Link Cited by: §1.
  • I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep learning. MIT Press. Note: http://www.deeplearningbook.org Cited by: §1.1.
  • W. L. Hamilton, R. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, Red Hook, NY, USA, pp. 1025–1035. External Links: ISBN 9781510860964 Cited by: §1.
  • W. L. Hamilton, R. Ying, and J. Leskovec (2018) Inductive representation learning on large graphs. External Links: 1706.02216 Cited by: §3.1.
  • I. Ahmad and Y.-K. Kwok (1998) On exploiting task duplication in parallel program scheduling. IEEE Transactions on Parallel and Distributed Systems 9 (9), pp. 872–892. External Links: Document Cited by: §1.1.
  • H. Izadkhah and Y. Li (2019) Learning based genetic algorithm for task graph scheduling. Appl. Comp. Intell. Soft Comput. 2019. External Links: ISSN 1687-9724, Link, Document Cited by: §1.1, §1.
  • G. Jaume, A. Nguyen, M. R. Martínez, J. Thiran, and M. Gabrani (2019) EdGNN: a simple and powerful gnn for directed labeled graphs. External Links: 1904.08745 Cited by: §1.1, §3.2, §3.
  • K. Kaur, S. Garg, G. S. Aujla, N. Kumar, J. J. P. C. Rodrigues, and M. Guizani (2018) Edge computing in the industrial internet of things environment: software-defined-networks-based edge-cloud interplay. IEEE Communications Magazine 56 (2), pp. 44–51. External Links: Document Cited by: §1.
  • J. Kennedy and R. C. Eberhart (1997) A discrete binary version of the particle swarm algorithm. In 1997 IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and Simulation, Vol. 5, pp. 4104–4108 vol.5. External Links: Document Cited by: §1.1, §1.
  • T. N. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. External Links: 1609.02907 Cited by: §1.
  • M. Kumar and S.C. Sharma (2017) Dynamic load balancing algorithm for balancing the workload among virtual machine in cloud computing. Procedia Computer Science 115, pp. 322–329. Note: 7th International Conference on Advances in Computing and Communications, ICACC-2017, 22-24 August 2017, Cochin, India External Links: ISSN 1877-0509 Cited by: §1.1.
  • J. Luo, X. Li, M. Yuan, J. Yao, and J. Zeng (2021) Learning to optimize DAG scheduling in heterogeneous environment. CoRR abs/2103.06980. External Links: Link, 2103.06980 Cited by: §1.1.
  • H. Mao, M. Schwarzkopf, S. B. Venkatakrishnan, Z. Meng, and M. Alizadeh (2019) Learning scheduling algorithms for data processing clusters. In Proceedings of the ACM Special Interest Group on Data Communication, SIGCOMM ’19, New York, NY, USA, pp. 270–288. External Links: ISBN 9781450359566, Link, Document Cited by: §1.1, §1.1.
  • M. A. Palis, J. Liou, and D. S. L. Wei (1996) Task clustering and scheduling for distributed memory parallel architectures. IEEE Trans. Parallel Distributed Syst. 7 (1), pp. 46–55. External Links: Link, Document Cited by: §1.1.
  • X. Pham, N. D. Man, N. D. T. Tri, N. Q. Thai, and E. Huh (2017) A cost- and performance-effective approach for task scheduling based on collaboration between cloud and fog computing. International Journal of Distributed Sensor Networks 13 (11), pp. 1550147717742073. External Links: Document, Link, https://doi.org/10.1177/1550147717742073 Cited by: §1.
  • M. Ra, A. Sheth, L. Mummert, P. Pillai, D. Wetherall, and R. Govindan (2011) Odessa: enabling interactive perception applications on mobile devices. In Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services, New York, NY, USA, pp. 43–56. External Links: ISBN 9781450306430, Link, Document Cited by: Figure 7, Figure 8, §4.1.3, §4.1, §4.2.3, Table 2, §4.
  • H. Ren, Y. Lan, and C. Yin (2012) The load balancing algorithm in cloud computing environment. In Proceedings of 2012 2nd International Conference on Computer Science and Network Technology, pp. 925–928. External Links: Document Cited by: §1.1.
  • F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini (2009) The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80. External Links: Document Cited by: §3.1.
  • H. Y. Shishido, J. C. Estrella, C. F. M. Toledo, and M. S. Arantes (2018) Genetic-based algorithms applied to a workflow scheduling algorithm with security and deadline constraints in clouds. Computers and Electrical Engineering 69, pp. 378–394. External Links: ISSN 0045-7906 Cited by: §1.1, §1.
  • M. Skutella (2001) Convex quadratic and semidefinite programming relaxations in scheduling. J. ACM 48 (2), pp. 206–242. External Links: ISSN 0004-5411, Link, Document Cited by: §1.1, §1.
  • R. Sudarsan and C. J. Ribbens (2016) Combining performance and priority for scheduling resizable parallel applications. Journal of Parallel and Distributed Computing 87, pp. 55–66. External Links: ISSN 0743-7315 Cited by: §1.1.
  • P. Sun, Z. Guo, J. Wang, J. Li, J. Lan, and Y. Hu (2020) DeepWeave: accelerating job completion time with deep reinforcement learning-based coflow scheduling. In IJCAI, Cited by: §1.1.
  • R. S. Sutton and A. G. Barto (2018) Reinforcement learning: an introduction. A Bradford Book, Cambridge, MA, USA. External Links: ISBN 0262039249 Cited by: §1.1.
  • A. S. Syed, D. Sierra-Sosa, A. Kumar, and A. Elmaghraby (2021) IoT in smart cities: a survey of technologies, practices and challenges. Smart Cities 4 (2), pp. 429–475. Cited by: §1.
  • H. Topcuoglu, S. Hariri, and M.-Y. Wu (2002) Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems 13 (3), pp. 260–274. External Links: Document Cited by: 2nd item, §1.1, §1, §2, Figure 4, §4.1.1, §4.1.1, §4.1.2, §4.1.3, §4.1, §4.2, Table 2.
  • M. Zhang and Y. Chen (2018) Link prediction based on graph neural networks. External Links: 1802.09691 Cited by: §1.