1 Introduction
Successfully running complex graph-based applications, ranging from edge-cloud processing in IoT systems Cheng et al. (2019); Kaur et al. (2018) to processing astronomical observations Garcia-Piquer et al. (2017), heavily relies on executing all sub-components of such applications through efficient task scheduling. Not only does efficient task scheduling play a crucial role in improving the utilization of computing resources and reducing the time required to execute tasks, it can also lead to significant profits for service providers Barroso et al. (2013). In this framework, an application consists of multiple tasks with a given inter-task data dependency structure, i.e., each task generates inputs for certain other tasks. Such dependencies can be expressed via a directed acyclic graph (DAG), also known as a task graph, where vertices and edges represent tasks and inter-task data dependencies, respectively. An input job for an application is completed once all the tasks are executed by compute machines according to the inter-task dependencies.
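As a small illustrative sketch (the task indices and edges are invented, not from the paper), a task graph can be represented as a DAG, and a topological order over it confirms that the inter-task dependencies are acyclic:

```python
# Hypothetical sketch: a task graph as a DAG, checked for acyclicity
# via Kahn's algorithm. Task names and edges are illustrative.
from collections import deque

def topological_order(num_tasks, edges):
    """Return a task order respecting dependencies, or raise if cyclic."""
    succ = {i: [] for i in range(num_tasks)}
    indeg = [0] * num_tasks
    for u, v in edges:              # edge (u, v): task u feeds task v
        succ[u].append(v)
        indeg[v] += 1
    ready = deque(i for i in range(num_tasks) if indeg[i] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != num_tasks:
        raise ValueError("dependency structure is not a DAG")
    return order

# A 4-task example: task 0 feeds tasks 1 and 2, which both feed task 3.
order = topological_order(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
```

Any scheduler that respects the dependencies must execute tasks consistently with such an order.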
There are two commonly used metrics for schedulers to optimize: makespan and throughput. The time required to complete all tasks for a single input is called the makespan. The maximum steady-state rate at which inputs can be processed in a pipelined manner is called the throughput. Makespan minimization and throughput maximization can each be achieved through efficient task-scheduling algorithms that assign tasks to appropriate distributed computing resources.
The underlying methodology for task scheduling can be categorized into heuristic-based schemes (e.g., Eskandari et al. (2018); Pham et al. (2017)), meta-heuristic ones (e.g., Kennedy and Eberhart (1997); Addya et al. (2017); Fan et al. (2013); Shishido et al. (2018); Dasgupta et al. (2013); Izadkhah and Li (2019)), and optimization-based schemes (e.g., Azar and Epstein (2005); Skutella (2001)).
One of the most well-known heuristic scheduling schemes for makespan minimization is the heterogeneous earliest-finish-time (HEFT) algorithm Topcuoglu et al. (2002), which we consider as one of our benchmarks. For throughput maximization, we benchmark against the algorithm presented in Gallet et al. (2009), which we refer to as TP-HEFT.
One fundamental disadvantage of all the above-mentioned scheduling schemes is that they work well only in relatively small settings; once a task graph becomes large, they require very long computation times. We anticipate that applications in many domains, such as IoT for smart cities, will result in increasingly complex applications with numerous interdependent tasks, and scheduling may need to be repeated quite frequently in the presence of network or resource dynamics Dustdar et al. (2017); Syed et al. (2021). Therefore, it is essential to design a faster method to schedule tasks for such large-scale task graphs.
A promising alternative is to apply machine learning techniques for function approximation to this problem, leveraging the fact that scheduling essentially amounts to finding a function mapping tasks to compute machines. Given the graph structure of applications, we propose to use an appropriate graph convolutional network (GCN) Kipf and Welling (2017) to schedule tasks by learning the inter-task dependencies of the task graph as well as the network settings (i.e., execution speeds of compute machines and communication bandwidths across machines), in order to extract the relationships between the different entities. The GCN has attracted significant attention in the literature for its ability to address many graph-based applications, such as semi-supervised link prediction Zhang and Chen (2018) and node classification Kipf and Welling (2017). The idea behind the GCN is to construct node embeddings layer by layer. In each layer, a node embedding is obtained by aggregating the embeddings of the node's neighbors, followed by a neural network (i.e., a linear transformation and a nonlinear activation). In the case of node classification, the last-layer embedding is given to a softmax operator to predict node labels, and consequently the parameters of the GCN can be learned in an end-to-end manner. In general, there are two types of GCNs, namely spectral-based GCNs Defferrard et al. (2016) and spatial-based ones Hamilton et al. (2017). To obtain node embeddings, the former requires a decomposition of the Laplacian matrix (which results in scalability issues due to the nonlinear computational complexity of the decomposition), while the latter avoids this complexity thanks to the idea of message passing. To the best of our knowledge, no prior work has proposed a pure spatial-based GCN, incorporating carefully-designed features of both nodes and edges of task graphs, to perform scheduling over distributed computing systems.
The main contributions of this paper are as follows:

We propose GCNScheduler, which can quickly schedule tasks by carefully integrating a task graph with the network settings into a single input graph and feeding it to an appropriate GCN model.

We evaluate the performance of our proposed scheme and show that, not only can GCNScheduler be trained in a very short period of time (for instance, it takes around 15 seconds to train on a graph with 8,000 nodes), it also gives scheduling performance comparable to the teacher algorithm. We show our approach gives comparable or better scheduling performance in terms of makespan with respect to HEFT and in terms of throughput with respect to TP-HEFT.

We show that GCNScheduler is several orders of magnitude faster than previous heuristic algorithms in obtaining the schedule for a given task graph. For example, for makespan minimization, GCNScheduler schedules 50-node task graphs in about 4 milliseconds while HEFT takes more than 1500 seconds; and for throughput maximization, GCNScheduler schedules 100-node task graphs in about 3.3 milliseconds, compared to about 6.9 seconds for TP-HEFT.

We show that GCNScheduler is able to efficiently perform scheduling for task graphs of any size. In particular, we show that our proposed scheme is able to operate over large-scale task graphs where existing schemes require excessive computational resources.
1.1 Related Work
Task scheduling can be categorized into multiple groups from different perspectives. One way of categorizing task-scheduling schemes has to do with the type of algorithm used for assigning tasks to compute resources: heuristic, meta-heuristic, and optimization-based. Heuristic task-scheduling schemes can be divided into several sub-categories based on their approach, such as load balancing Ren et al. (2012); Bhatia et al. (2012); Kumar and Sharma (2017), priority-based scheduling Topcuoglu et al. (2002); Sudarsan and Ribbens (2016); Dubey et al. (2015); Chrétienne et al. (1995), task duplication Ishfaq Ahmad and Yu-Kwong Kwok (1998), and clustering Palis et al. (1996).
Since heuristic algorithms may sometimes perform poorly compared to optimal task scheduling, meta-heuristic schemes (e.g., Particle Swarm Optimization Kennedy and Eberhart (1997), Simulated Annealing Addya et al. (2017); Fan et al. (2013), and genetic approaches Shishido et al. (2018); Dasgupta et al. (2013); Izadkhah and Li (2019)) and optimization-based schemes (e.g., Azar and Epstein (2005); Skutella (2001)), which aim at approximating the NP-hard task-scheduling optimization, have attracted significant attention. However, all of the above-mentioned heuristic, meta-heuristic, and optimization-based schemes tend to run extremely slowly as the number of tasks becomes large, due to the iterative nature of these methods, which requires excessive computation. This issue makes the aforementioned schemes unable to handle large-scale task graphs.
Since obtaining the optimal scheduler is essentially the same as finding an appropriate mapper function from tasks to compute machines, machine-learning based scheduling has begun emerging as an alternative, thanks to advances in fundamental learning methods such as deep learning Goodfellow et al. (2016) and reinforcement learning (RL) Sutton and Barto (2018). Sun et al. proposed DeepWave Sun et al. (2020), a scheduler which reduces job completion time using RL, with a priority list (indicating the scheduling priority of edges in a job DAG) as the action and the completion time of a job DAG as the reward. Furthermore, Decima Mao et al. (2019) schedules tasks over a Spark cluster by training a neural network using RL, with scheduling the next task for execution as the action and the high-level objective of minimizing the makespan as the reward. These RL-based schemes suffer from a huge action space (i.e., the space of scheduling decisions). While Decima Mao et al. (2019) only operates in a homogeneous environment, Luo et al. proposed Lachesis Luo et al. (2021) to operate over heterogeneous environments. Lachesis combines three different components: a GCN, an RL policy network, and a heuristic task mapper. There are three main differences between their use of a GCN and ours: first, they use the GCN to embed task nodes only, without taking the network settings into account as we do; second, they use a regular GCN which does not explicitly account for edge directions, while we use an EDGNN Jaume et al. (2019), which does; and finally, the GCN in Lachesis does not perform scheduling (only task-node embedding), whereas we are the first to propose using a GCN directly for task scheduling.
The remainder of the paper is organized as follows: In the next section, we elaborate upon the problem formulation. In Section 3, we give an overview of GCNs and explain in detail how our proposed scheme works. Finally, in Section 4, we present numerical results on the performance of our proposed scheme against well-known approaches.
2 Problem Statement
We now formally represent the minimization of makespan and the maximization of throughput as optimization problems. Every application/job comprises multiple tasks with inter-task data dependencies. In order to finish a job, each of its tasks must be executed on at least one compute machine; the compute machines themselves are interconnected via communication links.
Before defining makespan and throughput, let us describe the task dependencies, referred to as the task graph, and the network settings.
Task Graph: Since there are dependencies across different tasks, meaning that a task generates inputs for certain other tasks, we can model this dependency through a DAG as depicted in Fig. 1. Suppose we have N tasks with a given task graph G = (V, E), where V and E respectively represent the set of vertices (tasks) and the set of edges (task dependencies), with (i, j) ∈ E if task i generates inputs for task j. Let us define the vector p = (p_1, …, p_N) as the amounts of computation required by the tasks. For every pair of tasks i and j with (i, j) ∈ E, task i produces d_{i,j} units of data for task j after being executed by a machine.
Network Settings: Each task is required to be executed on a compute node (machine), which is connected to other compute nodes through communication links (compute node and machine are used interchangeably in this paper). Let us suppose we have K compute nodes. Regarding execution speed, we consider the vector e = (e_1, …, e_K) as the executing speeds of the machines. The communication delay between any two compute nodes is characterized by bandwidth: let B_{k,k'} denote the communication bandwidth of the link from compute node k to compute node k'. In case two machines are not connected to each other, we take the corresponding bandwidth to be zero (i.e., infinite communication delay).
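For concreteness, the task-graph and network-setting quantities described above can be held in simple containers, as in the following sketch (all numeric values are invented for illustration):

```python
# Illustrative containers for the notation of this section; all numbers
# are made up. p[i] is the computation amount of task i, d[(i, j)] the
# data task i produces for task j, e[k] the execution speed of machine k,
# and B[k][k2] the bandwidth from machine k to machine k2 (0 = not connected).
p = [4.0, 2.0, 3.0]                      # computation per task
d = {(0, 1): 1.0, (0, 2): 2.0}           # inter-task data volumes
e = [1.0, 2.0]                           # machine execution speeds
B = [[0.0, 5.0],
     [5.0, 0.0]]                         # pairwise bandwidths

# Execution time of task i on machine k is p[i] / e[k]; transferring the
# (i, j) data from machine k to machine k2 takes d[(i, j)] / B[k][k2].
exec_time = p[0] / e[1]                  # task 0 on machine 1
xfer_time = d[(0, 2)] / B[0][1]          # data of (0, 2) over link 0 -> 1
```

These two ratios (computation over speed, data over bandwidth) are the basic ingredients of both objectives below.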
In general, a task-scheduling scheme maps tasks to compute nodes according to a given objective. Formally speaking, a task scheduler can be represented as a function f: {1, …, N} → {1, …, K}, where task i is assigned to machine f(i). We next present two well-known objectives, namely makespan minimization and throughput maximization.
Objective 1: The first objective function we consider for the task assignment is makespan minimization. In particular, we need to find a scheduler that assigns tasks to compute machines such that the resulting makespan is minimized. Our proposed scheme aims at obtaining such a scheduler by utilizing a carefully-designed GCN that classifies tasks into machines (each machine represents a class in our problem). Before formally defining the makespan, we need to define the Earliest Start Time (EST), Earliest Finish Time (EFT), Actual Start Time (AST), and Actual Finish Time (AFT) as follows:
Definition 1: EST(i, k) denotes the earliest execution start time for task i being executed on compute node k. Note that EST(i, k) = 0 when i is the entry task.
Definition 2: EFT(i, k) denotes the earliest execution finish time for task i being executed on compute node k.
Definition 3: AST(i) and AFT(i) denote the actual start time and the actual finish time of task i, respectively.
Regarding the computation of the aforementioned quantities for each task, one can compute them recursively, starting from the entry task, according to the following formulas Topcuoglu et al. (2002):

EST(i, k) = max{ avail(k), max_{j ∈ pred(i)} ( AFT(j) + d_{j,i} / B_{f(j),k} ) },
EFT(i, k) = p_i / e_k + EST(i, k),     (1)

where pred(i) denotes the set of immediate predecessors of task i and avail(k) indicates the earliest time at which compute node k is ready to execute a task.
Definition 4 (Makespan): After all tasks are assigned to compute nodes for execution, the actual completion time of a job equals the actual finish time of the last task. Therefore, the makespan can be represented as

makespan = max_i AFT(i).     (2)
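For illustration, the recursion of Eq. (1) and the makespan of Definition 4 can be sketched as follows. This is a simplified sketch under stated assumptions: each machine executes its assigned tasks back to back in topological order, without the insertion-based gap filling of full HEFT, and all inputs are invented.

```python
# Simplified makespan evaluation for a GIVEN assignment, following the
# EST/EFT recursion: a task starts once its machine is free and all
# predecessor data has arrived; its finish time adds p[i] / e[m].
def makespan(succ_data, p, e, B, assign, order):
    """succ_data[i] = [(j, d_ij), ...]; assign[i] = machine of task i;
    order = a topological order of the tasks."""
    pred = {i: [] for i in order}
    for i, outs in succ_data.items():
        for j, d_ij in outs:
            pred[j].append((i, d_ij))
    aft = {}
    machine_ready = [0.0] * len(e)
    for i in order:
        m = assign[i]
        est = machine_ready[m]           # machine availability
        for j, d_ji in pred[i]:          # wait for predecessors' data
            comm = 0.0 if assign[j] == m else d_ji / B[assign[j]][m]
            est = max(est, aft[j] + comm)
        aft[i] = est + p[i] / e[m]       # EFT = EST + execution time
        machine_ready[m] = aft[i]
    return max(aft.values())

# Chain 0 -> 1 on a single machine of speed 1: makespan = 2 + 3 = 5.
ms = makespan({0: [(1, 1.0)], 1: []}, [2.0, 3.0], [1.0],
              [[0.0]], [0, 0], [0, 1])
```

A scheduler for this objective searches over assignments to minimize this quantity.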
Objective 2: The second objective function that we consider for task scheduling is throughput maximization. Unlike the makespan, which is the overall execution time for a given input, the throughput is the average number of inputs that can be executed per unit time in steady state. Assuming the number of inputs to be infinite and denoting by N(t) the number of inputs completely executed by a scheduler by time t, the throughput is lim_{t→∞} N(t)/t. In Gallet et al. (2009), the authors showed that the following definition characterizes the throughput of a scheduler.
Definition 5 (Throughput Gallet et al. (2009)): For a given task assignment, the throughput of a scheduler is 1/T, where T is the maximum time taken by any resource to process an input, and it can be written as

T = max_{k, k'} max{ T_comp(k), T_out(k), T_in(k), T_comm(k, k') }     (3)
with

T_comp(k): representing the computation time of compute machine k for a single input (i.e., T_comp(k) = Σ_{i: f(i)=k} p_i / e_k),

T_out(k): representing the time taken by compute machine k's outgoing interface per input (i.e., T_out(k) = d_out(k) / b_out(k), where d_out(k) and b_out(k) respectively indicate the amount of data transferred out of compute machine k per input and the maximum outgoing bandwidth of compute machine k),

T_in(k): representing the time taken by compute machine k's incoming interface per input (i.e., T_in(k) = d_in(k) / b_in(k), where d_in(k) is the amount of data received by compute machine k per input and b_in(k) indicates the maximum incoming bandwidth of compute machine k),

T_comm(k, k'): representing the communication time taken to transfer data from compute machine k to compute machine k' per input (i.e., T_comm(k, k') = Σ_{(i,j) ∈ E: f(i)=k, f(j)=k'} d_{i,j} / B_{k,k'}).
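Definition 5 can be sketched as follows, assuming the four per-resource times listed above have already been computed for a given assignment (all values below are invented); the throughput is simply the reciprocal of the bottleneck time:

```python
# Sketch of Definition 5: the steady-state period T is the largest
# per-resource time among computation, outgoing interface, incoming
# interface, and per-link communication; throughput = 1 / T.
def period(comp, out_iface, in_iface, link):
    """comp/out_iface/in_iface: per-machine times; link[(k, k2)]:
    per-link transfer times. Returns the steady-state period T."""
    return max(max(comp), max(out_iface), max(in_iface),
               max(link.values()) if link else 0.0)

T = period(comp=[2.0, 1.5],
           out_iface=[0.5, 0.25],
           in_iface=[0.5, 0.25],
           link={(0, 1): 4.0})
throughput = 1.0 / T          # the (0, 1) link is the bottleneck here
```

Maximizing throughput therefore amounts to choosing an assignment that minimizes the bottleneck resource time.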
Remark: Other objectives could also be considered in the future.
3 Proposed GCNScheduler
We present a novel machine-learning based task scheduler which can be trained with respect to the aforementioned objectives. Since the task-scheduling problem inherently has to do with graphs (i.e., the task graph), it is essential to utilize a machine-learning approach designed to capture the underlying graph-based relationships of the problem. To do so, we employ a suitable GCN Jaume et al. (2019) in order to obtain a model which can automatically assign tasks to compute machines. We point out that there is no prior GCN-based scheduling scheme incorporating carefully-designed features of both nodes and edges. This idea has significant advantages over conventional scheduling schemes. First, it remarkably reduces computational complexity compared to previous scheduling algorithms. Second, after training an appropriate GCN, our scheme can handle any large-scale task graph, while conventional schemes severely suffer from scalability issues.
3.1 Overview of Graph Neural Networks
Conceptually speaking, the idea of the GCN is closely related to neighborhood-aggregation encoder algorithms Hamilton et al. (2018). However, instead of merely aggregating information from neighbors, the intuition behind GCNs is to view graphs from the perspective of message passing between nodes Scarselli et al. (2009). An illustration of message passing is depicted in Fig. 2. In the GCN framework, every node is initialized with an embedding equal to its feature vector. At each layer of the GCN algorithm, every node takes the average of its neighbors' messages and applies a neural network to it as follows:

h_v^{(l+1)} = σ( W_l^{self} h_v^{(l)} + W_l · (1/|N(v)|) Σ_{u ∈ N(v)} h_u^{(l)} ),     (4)

where h_v^{(l)}, W_l^{self}, W_l, and σ(·) respectively represent the hidden vector of node v at layer l, the weight matrix at layer l for the self node, the weight matrix at layer l for neighboring nodes, and a nonlinear function (e.g., ReLU). Furthermore, N(v) denotes the set of neighbors of node v. After L layers of neighborhood aggregation, we get an output embedding for each node. We can use these embeddings with any loss function and run stochastic gradient descent to train the GCN model parameters.
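A minimal numpy sketch of the mean-aggregation update in Eq. (4) follows; the toy graph, dimensions, and random weights are invented and this is not the paper's implementation:

```python
# One GCN layer: sigma(W_self h_v + W_neigh * mean of neighbor h_u),
# applied to all nodes at once via the adjacency matrix.
import numpy as np

def gcn_layer(H, adj, W_self, W_neigh):
    """H: (n, d) node embeddings; adj: (n, n) 0/1 undirected adjacency."""
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                  # guard isolated nodes
    neigh_mean = (adj @ H) / deg         # average of neighbor messages
    return np.maximum(0.0, H @ W_self.T + neigh_mean @ W_neigh.T)  # ReLU

rng = np.random.default_rng(0)
H0 = rng.standard_normal((3, 4))         # initial embeddings = features
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)
H1 = gcn_layer(H0, adj,
               rng.standard_normal((8, 4)),   # W_self: d=4 -> 8
               rng.standard_normal((8, 4)))   # W_neigh: d=4 -> 8
```

Stacking such layers yields the L-layer embeddings described above.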
The above scheme is suitable for undirected graphs, where edges express a reciprocal relationship between their endpoint nodes.
3.2 Overview of GCNs for Directed Graphs
In real applications, such as social media, where the relations between nodes are not reciprocal, regular GCN-based schemes may not be adequate. An alternative in such situations is to utilize schemes designed for directed graphs, such as EDGNN Jaume et al. (2019), where incoming and outgoing edges are treated differently in order to capture the non-reciprocal relationships between nodes. In other words, EDGNN considers different weights for outgoing and incoming edges, in addition to the weights for neighboring nodes. In particular, the embedding of node v is computed as follows:

h_v^{(l+1)} = σ( W_self^{(l)} h_v^{(l)} + W_nodes^{(l)} Σ_{u ∈ N(v)} h_u^{(l)} + W_in^{(l)} Σ_{u: (u,v) ∈ E} e_{u→v}^{(l)} + W_out^{(l)} Σ_{u: (v,u) ∈ E} e_{v→u}^{(l)} ),     (5)

where W_self^{(l)}, W_nodes^{(l)}, W_in^{(l)}, and W_out^{(l)} represent the weight matrices of layer l for the embeddings of the self node, neighboring nodes, incoming edges, and outgoing edges, respectively. Moreover, h_v^{(l)} and e_{u→v}^{(l)} respectively denote the embedding of node v and the embedding of the edge from node u to node v at layer l.
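A minimal numpy sketch in the spirit of Eq. (5) follows, with separate weight matrices for the self node, neighbors, incoming edges, and outgoing edges; the toy graph, dimensions, and weights are invented, and the full EDGNN details may differ:

```python
# EDGNN-style layer: direction is preserved because incoming and
# outgoing edge embeddings are transformed by different matrices.
import numpy as np

def edgnn_layer(H, E, edges, Ws, Wn, Win, Wout):
    """H: (n, d) node embeddings; E: (m, d) edge embeddings;
    edges[k] = (u, v) is a directed edge u -> v with embedding E[k]."""
    agg = H @ Ws.T                      # self term
    for k, (u, v) in enumerate(edges):
        agg[v] += H[u] @ Wn.T           # neighbor via incoming edge
        agg[u] += H[v] @ Wn.T           # neighbor via outgoing edge
        agg[v] += E[k] @ Win.T          # incoming-edge embedding term
        agg[u] += E[k] @ Wout.T         # outgoing-edge embedding term
    return np.maximum(0.0, agg)         # ReLU nonlinearity

rng = np.random.default_rng(1)
d = 4
H = rng.standard_normal((3, d))
E = rng.standard_normal((2, d))
Ws, Wn, Win, Wout = [rng.standard_normal((d, d)) for _ in range(4)]
H1 = edgnn_layer(H, E, [(0, 1), (0, 2)], Ws, Wn, Win, Wout)
```

Swapping an edge's direction changes which of W_in and W_out it passes through, which is exactly the asymmetry a task graph needs.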
3.3 Proposed Input Graph
In order to train an EDGNN-based model, we need to carefully design the input-graph components, namely the adjacency matrix, node features, edge features, and labels. It should be noted that our scheme is not tailored to a particular criterion; as we will show later, it can learn from two scheduling schemes with different objectives. Our designed input graph can be fed into the EDGNN, and the model will be trained according to the labels generated by a given scheduling scheme. We next explain how we design the input graph.
Designed Input Graph: We start from the original task graph and keep the same set of nodes and edges for our input graph. In other words, representing the input graph as G' = (V', E'), we have V' = V and E' = E. The crucial part of building an efficacious GCN-based scheduler is carefully designing the features of nodes and edges as well as the labels.
The feature vector of node i ∈ V', denoted by x_i, has the following K features:

x_i = ( p_i / e_1, …, p_i / e_K ).

The intuition behind x_i is that these features represent the required computation time of task i across all compute machines.
The feature vector of edge (i, j) ∈ E', denoted by x_{(i,j)}, has the following K² features:

x_{(i,j)} = ( d_{i,j} / B_{k,k'} : k, k' ∈ {1, …, K} ).

The intuition behind x_{(i,j)} is that these features represent the time required to transfer the result of executing task i to the subsequent task j, across all possible ordered pairs of compute machines. An illustration of our designed input graph for the task graph of Fig. 1 is depicted in Fig. 3.
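Under the notation assumed earlier (computation amounts p_i, machine speeds e_k, data volumes d_ij, and bandwidths B), the node and edge features can be computed as in this sketch (all numbers invented; the same-machine transfer time is taken as zero):

```python
# Node i gets one feature per machine: its execution time p_i / e_k.
# Edge (i, j) gets one feature per ordered machine pair: the transfer
# time d_ij / B[k][k2] if i runs on machine k and j on machine k2.
def node_features(p_i, e):
    return [p_i / e_k for e_k in e]                  # length K

def edge_features(d_ij, B):
    K = len(B)
    return [0.0 if k == k2 else d_ij / B[k][k2]      # same machine: no transfer
            for k in range(K) for k2 in range(K)]    # length K^2

f_node = node_features(4.0, [1.0, 2.0])
f_edge = edge_features(2.0, [[0.0, 5.0],
                             [4.0, 0.0]])
```

This is how the network settings enter the input graph: slow machines inflate node features, and slow links inflate edge features.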
Objective-Dependent Labeling: Based on which task scheduler our method should learn from (which we refer to as the "teacher" scheduler, namely HEFT for makespan minimization and TP-HEFT for throughput maximization), we label all nodes as well as edges. Let us define y_i and y_{(i,j)} as the labels of node i and edge (i, j), respectively. Regarding node labeling, we take the label of node i, i.e., y_i, to be the index of the compute node to which the teacher algorithm assigns task i. Thus, for makespan minimization, we have y_i = f_HEFT(i), where f_HEFT(·) is the mapper function of the HEFT algorithm; and for throughput maximization, we have y_i = f_TP-HEFT(i), where f_TP-HEFT(·) is the mapper function of the TP-HEFT algorithm.
Finally, we label each edge according to the label of the vertex it emanates from. In other words, y_{(i,j)} = y_i for all (i, j) ∈ E'. We should note that this edge labeling is crucial in encouraging the model to learn to label the outgoing edges of a node with the same label as the node itself.
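A minimal sketch of this labeling rule (the teacher assignments here are invented): each node's label is the machine index chosen by the teacher scheduler, and each edge inherits the label of its source node:

```python
# Node labels come from the teacher scheduler's task-to-machine
# assignment; each directed edge is labeled with its source's label.
teacher_assign = {0: 1, 1: 0, 2: 1}          # task -> machine index
edges = [(0, 1), (0, 2), (1, 2)]

edge_labels = {(u, v): teacher_assign[u] for u, v in edges}
```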
3.4 Implementation
As far as the model parameters are concerned, we consider a 4-layer EDGNN with 128 units per layer, with a nonlinear activation function and dropout. Since nodes and edges both carry features, we let both nodes and edges be embedded.
For training our model, we need a sufficiently large graph. However, since the HEFT and TP-HEFT algorithms are extremely slow in performing task scheduling for large-scale task graphs, obtaining labels (i.e., determining the machine each task should be executed on) for a single large graph is cumbersome. Therefore, we create a large graph by taking the union of disjoint medium-size graphs, each small enough that HEFT and TP-HEFT can schedule its tasks. We then split the disjoint graphs into training, validation, and test sets. This allows us to apply the teacher algorithms to train our GCNScheduler for even larger graphs than the teacher algorithms themselves can handle efficiently.
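The construction of the large training graph as a union of disjoint medium-size graphs can be sketched as follows (component sizes invented); offsetting each component's node indices keeps the components disjoint:

```python
# Disjoint union of task graphs: node indices of each component are
# shifted by the number of nodes already placed, so no nodes or edges
# are shared across components.
def disjoint_union(components):
    """components: list of (num_nodes, edge_list); returns (N, edges)."""
    offset, all_edges = 0, []
    for n, edges in components:
        all_edges += [(u + offset, v + offset) for u, v in edges]
        offset += n
    return offset, all_edges

N, E = disjoint_union([(3, [(0, 1), (1, 2)]),    # first medium graph
                       (2, [(0, 1)])])           # second medium graph
# The second component's edge (0, 1) becomes (3, 4) in the union.
```

Because message passing never crosses between components, training on the union is equivalent to batching the medium-size graphs.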
4 Experimental Results
In this section, we evaluate the performance of our proposed GCNScheduler in terms of two criteria, namely makespan minimization and throughput maximization, over various task graphs (medium-scale and large-scale task graphs as well as the task graphs of three real perception applications considered in Ra et al. (2011)). For each criterion, we measure the performance of GCNScheduler as well as the time it takes to assign tasks, and compare these values with those of our benchmarks (i.e., HEFT/TP-HEFT and the random task scheduler). We evaluate all schemes by running them on our local cluster, which has 16 CPUs (8 cores with 2 threads per core) of Intel(R) Xeon(R) E5-2620 v4 @ 2.10GHz.
As far as network settings are concerned, the computation amounts of tasks, the execution speeds of compute machines, and the communication bandwidths are drawn randomly from uniform distributions. For simplicity, we assume each task produces the same amount of data after being executed. Regarding task graphs, we generate random DAGs in two different ways:
(i) establishing an edge between any two tasks with a given probability (which we call the edge probability (EP)), then pruning the edges so that the result forms a DAG (only edges between nodes i and j with i < j are kept); or (ii) specifying the width and depth of the graph, then randomly selecting successive tasks for each task.

4.1 Makespan Minimization
For training the model, since the teacher scheduler, i.e., the HEFT algorithm Topcuoglu et al. (2002), is extremely slow in generating labels for large task graphs (we observed that HEFT is incapable of performing task assignment on a commodity PC once the task graph grows to the order of 100 nodes; see Table 1 for the increasing trend in compute time with task-graph size), we create a sufficient number (approximately 400) of random medium-size task graphs (each with 50 nodes, with either EP = 0.25 or a width and depth of 5 and 10, respectively) and label the tasks of each of these medium-size task graphs according to the HEFT algorithm. By doing so, we create a single large-scale graph which is the union of the disjoint medium-size graphs. Notably, it takes only seconds to train our model on such a large graph using just CPUs (no GPUs or other specialized hardware). After training the model, we consider both medium-scale and large-scale task graphs as input samples. Our model then labels the tasks, determining which machine will execute each task. We next measure the performance of GCNScheduler over medium-scale and large-scale task graphs as well as the task graphs of the three real perception applications provided in Ra et al. (2011).
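The first of the two random-DAG constructions described earlier can be sketched as follows; the pruning rule keeps only edges (i, j) with i < j, which guarantees acyclicity because the node indices then form a topological order (EP and sizes are illustrative):

```python
# Random DAG via edge probability: draw each candidate edge with
# probability ep, keeping only "forward" edges (i < j) so that the
# result is acyclic by construction.
import random

def random_dag(num_tasks, ep, seed=0):
    rng = random.Random(seed)
    return [(i, j)
            for i in range(num_tasks) for j in range(num_tasks)
            if i < j and rng.random() < ep]

edges = random_dag(50, ep=0.25)
```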
4.1.1 Medium-scale task graphs
Fig. 4 shows the average makespan of GCNScheduler (with the makespan-minimization objective) compared to HEFT Topcuoglu et al. (2002) and the random task scheduler for medium-size task graphs with different numbers of tasks. Our model significantly outperforms the random task scheduler and considerably improves the makespan compared to HEFT, especially as the number of tasks increases. While GCNScheduler does not perfectly replicate the schedules produced by HEFT, we note that what ultimately matters is the makespan of executing all tasks rather than the accuracy of labeling tasks, and in this regard GCNScheduler does better than HEFT. To gain some intuitive understanding of why GCNScheduler outperforms HEFT in terms of makespan, we manually examined the schedules produced by GCNScheduler and HEFT in many scenarios. We found that HEFT is sometimes unable to avoid assigning tasks to machines with poor communication bandwidth, while GCNScheduler learns to do so more consistently. We believe this is due to the carefully-designed edge features in the input to GCNScheduler, which explicitly take the communication bandwidth between machines into account.
The time taken to assign tasks to compute machines for both GCNScheduler and HEFT Topcuoglu et al. (2002) is presented in Table 1. As one can easily see, our GCNScheduler outperforms HEFT by several orders of magnitude, with the gap widening as the graph grows. This clearly shows that GCNScheduler is a game-changer for task scheduling.
Table 1: Scheduling time (in seconds) for medium-size task graphs.

Number of tasks   20      30      40      50
GCN-Sch.          0.0026  0.0027  0.0029  0.0037
HEFT              0.6764  9.4330  117.35  1552.0
4.1.2 Large-scale task graphs
In addition to the promising results in the medium-size settings, we now focus on large-scale task graphs, where the HEFT algorithm Topcuoglu et al. (2002) is extremely slow; hence we compare the performance of our GCNScheduler against the only feasible benchmark, the random task scheduler. Fig. 5 shows the average makespan of our proposed GCNScheduler (top plot) and the random task scheduler (bottom plot) in large-scale settings, where the number of tasks varies from 3,500 to 5,000 and the edge probability (EP) takes values 0.005, 0.01, and 0.02. One can easily observe that our proposed GCNScheduler significantly reduces the makespan, by a factor of up to 8 (for larger EP). The intuition behind the larger gain for larger EP (i.e., larger node degrees) is that some tasks may require more predecessor tasks to be executed in advance; randomly assigning tasks may then place a predecessor task on a machine with poor computing power or communication bandwidth, resulting in a larger average makespan. GCNScheduler, however, efficiently exploits the inter-task dependencies as well as the network-settings information (i.e., execution speeds of machines and communication bandwidths across machines) through the carefully-designed node and edge features, and therefore leads to a remarkably lower makespan.
Finally, Fig. 6 illustrates the inference time (i.e., the time taken to assign tasks to compute machines) of our proposed GCNScheduler for different numbers of tasks and different EP values. Our GCNScheduler takes only milliseconds to obtain labels for each of these large-scale task graphs. This clearly shows the great advantage of our proposed scheme, which makes it an ideal alternative to state-of-the-art scheduling schemes that fail to operate efficiently over complicated jobs, each of which may have thousands of tasks with arbitrary inter-task dependencies.
4.1.3 Real Application Benchmarks
To show the significance of our proposed scheme on real applications, we consider the three real perception applications provided in Ra et al. (2011), namely face recognition, object-and-pose recognition, and gesture recognition, with the corresponding task graphs depicted in Fig. 8 (please see Ra et al. (2011) for more detail regarding each task). We measure the makespan of each application by running GCNScheduler (with the makespan-minimization objective) over the carefully-designed input graphs (obtained from the original task graphs). Fig. 7 illustrates the makespan of GCNScheduler against HEFT Topcuoglu et al. (2002) and the random task scheduler for the three perception applications. While our GCNScheduler leads to a slightly better makespan than HEFT, it also significantly (by 3 to 4 orders of magnitude) reduces the time taken to perform scheduling, as shown in Table 2. This remarkable time reduction demonstrates the relevance of our scheme to other applications (in business or security) built upon these perception applications.

4.2 Throughput Maximization
For the purpose of maximizing throughput, we use the TP-HEFT algorithm Gallet et al. (2009) as the teacher scheduler for training our GCNScheduler. Since the TP-HEFT scheduler, similar to the HEFT algorithm Topcuoglu et al. (2002), becomes excessively slow in generating labels for large task graphs, we create a sufficient number of random medium-size task graphs (each with around 40 tasks, with a width and depth of 5 and 8, respectively) and label the tasks according to the TP-HEFT algorithm. We then build a single large-scale graph, which is the union of the disjoint medium-size graphs, and train GCNScheduler with the throughput-maximization objective. As before, we test our GCNScheduler over medium-scale and large-scale task graphs as well as the task graphs of the three perception applications.
4.2.1 Medium-scale task graphs
Table 3 shows the throughput of GCNScheduler (with the throughput-maximization objective) compared to TP-HEFT Gallet et al. (2009) and the random task scheduler for medium-size task graphs with different numbers of tasks. GCNScheduler achieves slightly higher throughput than the TP-HEFT scheduler, while significantly outperforming the random task scheduler. Table 4 shows the corresponding time taken to schedule tasks.


4.2.2 Large-scale task graphs
Since TP-HEFT Gallet et al. (2009) and other existing scheduling schemes are extremely slow for very large task graphs (e.g., task graphs with a few thousand tasks), we only compare the throughput of GCNScheduler against the random task scheduler, as shown in Table 5. Further, Table 6 shows the time taken to assign tasks to compute nodes for large-scale task graphs. One can easily see that GCNScheduler (with the throughput-maximization objective) is remarkably fast at handling large-scale task graphs.
Table 5: Throughput for large-scale task graphs.

Number of tasks   3,500   4,000   4,500   5,000
GCN-Sch.          2.9737  2.9731  2.9733  2.9734
Random            0.6344  0.6336  0.6332  0.6331
Table 6: Task-assignment time (in milliseconds) for large-scale task graphs.

Number of tasks   3,500   4,000   4,500   5,000
GCN-Sch.          66.050  74.978  83.817  87.388
4.2.3 Real Application Benchmarks
We now evaluate the throughput of our GCNScheduler on the task graphs of the three real perception applications provided in Ra et al. (2011), namely face recognition, object-and-pose recognition, and gesture recognition. In particular, we run our trained GCNScheduler (with the throughput-maximization objective) over the carefully-designed input graphs and measure the throughput for each application. Fig. 9 shows the throughput of our GCNScheduler compared to TP-HEFT Gallet et al. (2009) and the random task scheduler for the three perception applications. While our GCNScheduler leads to marginally better throughput than the TP-HEFT scheduler, it also significantly (by 2 to 3 orders of magnitude) reduces the time taken to perform task assignment, as shown in Table 7.
Table 7: Task-assignment time for the three perception applications.

          Face Recog.   Pose Recog.   Gesture Recog.
GCN-Sch.  0.488         0.511         0.560
TP-HEFT   87.34         257.8         290.1
5 Conclusion
We proposed GCNScheduler, a scalable and fast task-scheduling scheme which can perform scheduling according to different objectives (such as minimizing the makespan or maximizing the throughput). By evaluating our scheme against benchmarks through simulations, we showed that not only can our scheme easily handle large-scale settings where existing scheduling schemes fail, but it can also lead to better performance with significantly lower scheduling time (i.e., several orders of magnitude faster). As a future direction, we aim to investigate the performance of our proposed GCNScheduler with respect to other objectives, using other teacher schedulers.
Acknowledgments
This material is based upon work supported in part by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-17-C-0053 and by the Army Research Laboratory under Cooperative Agreement W911NF-17-2-0196. Any views, opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
References
 Simulated annealing based VM placement strategy to maximize the profit for cloud service providers. Engineering Science and Technology, an International Journal 20 (4), pp. 1249–1259. External Links: ISSN 2215-0986 Cited by: §1.1, §1.
 Convex programming for scheduling unrelated parallel machines. In Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing, STOC ’05, New York, NY, USA, pp. 331–337. External Links: ISBN 1581139608 Cited by: §1.1, §1.
 The datacenter as a computer: an introduction to the design of warehouse-scale machines, second edition. External Links: Link Cited by: §1.
 HTV dynamic load balancing algorithm for virtual machine instances in cloud. In 2012 International Symposium on Cloud and Services Computing, Vol. , pp. 15–20. External Links: Document Cited by: §1.1.
 Space/aerial-assisted computing offloading for IoT applications: a learning-based approach. IEEE Journal on Selected Areas in Communications 37 (5), pp. 1117–1129. External Links: Document Cited by: §1.
 Scheduling theory and its applications. Wiley, United States (English). External Links: ISBN 0471940593 Cited by: §1.1.
 A genetic algorithm (GA) based load balancing strategy for cloud computing. Procedia Technology 10, pp. 340–347. Note: First International Conference on Computational Intelligence: Modeling Techniques and Applications (CIMTA) 2013 External Links: ISSN 2212-0173 Cited by: §1.1, §1.
 Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, Red Hook, NY, USA, pp. 3844–3852. External Links: ISBN 9781510838819 Cited by: §1.
 A priority based job scheduling algorithm using IBA and EASY algorithm for cloud metascheduler. In 2015 International Conference on Advances in Computer Engineering and Applications, Vol. , pp. 66–70. External Links: Document Cited by: §1.1.
 Smart cities. In The Internet of Things, People and Systems, Cited by: §1.
 Iterative scheduling for distributed stream processing systems. In Proceedings of the 12th ACM International Conference on Distributed and Event-Based Systems, DEBS ’18, New York, NY, USA, pp. 234–237. External Links: ISBN 9781450357821, Link, Document Cited by: §1.
 Simulated-annealing load balancing for resource allocation in cloud environments. In 2013 International Conference on Parallel and Distributed Computing, Applications and Technologies, Vol. , pp. 1–6. External Links: Document Cited by: §1.1, §1.
 Efficient scheduling of task graph collections on heterogeneous resources. In IEEE International Symposium on Parallel Distributed Processing, Vol. , pp. 1–11. External Links: Document Cited by: 2nd item, §1, §2, §2, §4.2.1, §4.2.2, §4.2.3, §4.2.
 Efficient scheduling of astronomical observations. Application to the CARMENES radial-velocity survey. A&A 604, pp. A87. External Links: Document, Link Cited by: §1.
 Deep learning. MIT Press. Note: http://www.deeplearningbook.org Cited by: §1.1.
 Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, Red Hook, NY, USA, pp. 1025–1035. External Links: ISBN 9781510860964 Cited by: §1.
 Inductive representation learning on large graphs. External Links: 1706.02216 Cited by: §3.1.
 On exploiting task duplication in parallel program scheduling. IEEE Transactions on Parallel and Distributed Systems 9 (9), pp. 872–892. External Links: Document Cited by: §1.1.
 Learning based genetic algorithm for task graph scheduling. Appl. Comp. Intell. Soft Comput. 2019. External Links: ISSN 1687-9724, Link, Document Cited by: §1.1, §1.
 EdGNN: a simple and powerful gnn for directed labeled graphs. External Links: 1904.08745 Cited by: §1.1, §3.2, §3.
 Edge computing in the industrial internet of things environment: software-defined-networks-based edge-cloud interplay. IEEE Communications Magazine 56 (2), pp. 44–51. External Links: Document Cited by: §1.
 A discrete binary version of the particle swarm algorithm. In 1997 IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and Simulation, Vol. 5, pp. 4104–4108 vol.5. External Links: Document Cited by: §1.1, §1.
 Semi-supervised classification with graph convolutional networks. External Links: 1609.02907 Cited by: §1.
 Dynamic load balancing algorithm for balancing the workload among virtual machine in cloud computing. Procedia Computer Science 115, pp. 322–329. Note: 7th International Conference on Advances in Computing and Communications, ICACC-2017, 22-24 August 2017, Cochin, India External Links: ISSN 1877-0509 Cited by: §1.1.
 Learning to optimize DAG scheduling in heterogeneous environment. CoRR abs/2103.06980. External Links: Link, 2103.06980 Cited by: §1.1.
 Learning scheduling algorithms for data processing clusters. In Proceedings of the ACM Special Interest Group on Data Communication, SIGCOMM ’19, New York, NY, USA, pp. 270–288. External Links: ISBN 9781450359566, Link, Document Cited by: §1.1, §1.1.
 Task clustering and scheduling for distributed memory parallel architectures. IEEE Trans. Parallel Distributed Syst. 7 (1), pp. 46–55. External Links: Link, Document Cited by: §1.1.
 A cost and performance-effective approach for task scheduling based on collaboration between cloud and fog computing. International Journal of Distributed Sensor Networks 13 (11), pp. 1550147717742073. External Links: Document, Link Cited by: §1.
 Odessa: enabling interactive perception applications on mobile devices. In Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services, New York, NY, USA, pp. 43–56. External Links: ISBN 9781450306430, Link, Document Cited by: Figure 7, Figure 8, §4.1.3, §4.1, §4.2.3, Table 2, §4.
 The load balancing algorithm in cloud computing environment. In Proceedings of 2012 2nd International Conference on Computer Science and Network Technology, Vol. , pp. 925–928. External Links: Document Cited by: §1.1.
 The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80. External Links: Document Cited by: §3.1.
 Genetic-based algorithms applied to a workflow scheduling algorithm with security and deadline constraints in clouds. Computers and Electrical Engineering 69, pp. 378–394. External Links: ISSN 0045-7906 Cited by: §1.1, §1.
 Convex quadratic and semidefinite programming relaxations in scheduling. J. ACM 48 (2), pp. 206–242. External Links: ISSN 0004-5411, Link, Document Cited by: §1.1, §1.
 Combining performance and priority for scheduling resizable parallel applications. Journal of Parallel and Distributed Computing 87, pp. 55–66. External Links: ISSN 0743-7315 Cited by: §1.1.
 DeepWeave: accelerating job completion time with deep reinforcement learningbased coflow scheduling. In IJCAI, Cited by: §1.1.
 Reinforcement learning: an introduction. A Bradford Book, Cambridge, MA, USA. External Links: ISBN 0262039249 Cited by: §1.1.
 IoT in smart cities: a survey of technologies, practices and challenges. Smart Cities 4 (2), pp. 429–475. Cited by: §1.
 Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems 13 (3), pp. 260–274. External Links: Document Cited by: 2nd item, §1.1, §1, §2, Figure 4, §4.1.1, §4.1.1, §4.1.2, §4.1.3, §4.1, §4.2, Table 2.
 Link prediction based on graph neural networks. External Links: 1802.09691 Cited by: §1.