Neural Architecture Search in Graph Neural Networks

Performing analytical tasks over graph data has become increasingly interesting due to the ubiquity and large availability of relational information. However, unlike images or sentences, there is no notion of sequence in networks. Nodes (and edges) follow no absolute order, and it is hard for traditional machine learning (ML) algorithms to recognize a pattern and generalize their predictions on this type of data. Graph Neural Networks (GNN) successfully tackled this problem. They became popular after the generalization of the convolution concept to the graph domain. However, they possess a large number of hyperparameters and their design and optimization is currently hand-made, based on heuristics or empirical intuition. Neural Architecture Search (NAS) methods appear as an interesting solution to this problem. In this direction, this paper compares two NAS methods for optimizing GNN: one based on reinforcement learning and a second based on evolutionary algorithms. Results consider 7 datasets over two search spaces and show that both methods obtain similar accuracies to a random search, raising the question of how many of the search space dimensions are actually relevant to the problem.




1 Introduction

Performing analytical tasks over graph¹ data has become increasingly interesting due to the ubiquity and wide availability of relational information. Predicting interactions between proteins, classifying users in social networks and recommending movies to users are some classical examples of such tasks. However, unlike images (formed by a grid of pixels) and sentences (formed by a string of ordered words), there is no notion of sequence in networks. Nodes (and edges) follow no absolute order, so it is hard for traditional machine learning (ML) algorithms, which were built to handle data stored in tensors, to recognize a pattern and generalize their predictions on this type of data.

¹ In this work we use the terms “graph” and “network” interchangeably. When referring to “neural networks” we will use NN or “neural network”.


Due to the success of convolutional neural networks (CNNs) in tasks such as image classification [12], object identification [14] and semantic segmentation [1], a large body of work began to redefine the concept of convolution for the graph domain. Following the work of Gori et al. [10] and Scarselli et al. [17] on Graph Neural Networks (GNNs), the concept of a spectral-based graph convolution function was defined by Bruna et al. [3] and later refined by Defferrard et al. [5]. In this approach, unlike traditional neural networks, where the architecture is composed of fully connected layers of neurons, graph neural networks follow the graph structure itself. Forward propagation is done on the nodes of the graph, which pass information on to the next layer by aggregating information from the neighborhood and applying an activation function to the result.

Since the concept of convolution was adapted to the context of graphs, a plethora of GNN models have been proposed, including GraphSAGE [11], Graph Attention Networks (GAT) [20], Graph Isomorphism Network (GIN) [22] and many others. These methods achieve state-of-the-art results on tasks such as node classification and link prediction. However, the design and optimization of GNN architectures is currently done by hand, based on heuristics or empirical intuition, which makes it an inefficient and error-prone task [22].

Automated Machine Learning (AutoML) appears as a solution to this problem, as it aims to automate the process of building and optimizing machine learning pipelines, relieving users of that burden [7]. Neural Architecture Search (NAS) is considered the current challenge in automating machine learning algorithms [8]. NAS methods are composed of a search space of possible architectures, a search method to explore this space and an evaluation framework for the generated architectures.

To the best of our knowledge, there have been few attempts in the literature to employ NAS for GNNs [9, 25]. In these works, reinforcement learning methods are used to explore similar search spaces. The NAS literature identifies two main types of methods as the most effective for this problem: reinforcement learning (RL) and evolutionary algorithms (EAs) [8]. The second type has so far been overlooked in the context of GNNs.

This work employs an EA previously proposed for NAS in the context of image classification [16] to optimize GNNs, and compares it with reinforcement learning and random search in terms of model accuracy and runtime. It also studies the characteristics of the search spaces previously proposed for GNNs in order to identify opportunities for improving GNN NAS algorithms. Results show that both RL and EA are able to find equivalent models in terms of accuracy, with EA being faster in some cases, which corroborates previous findings for image classification. Furthermore, in line with known problems of large search spaces with low effective dimensionality [2], such as those required in the case of GNNs, we show that a random search is able to find architectures with equivalent accuracy while being faster. We discuss these results in the light of previous works that address this problem.

The remainder of this work is organized as follows. Section 2 introduces background on GNNs and Section 3 discusses related work. Section 4 describes the methodology followed to apply the tested methods in GNN search spaces, while Section 5 presents the results. Finally, Section 6 draws conclusions and discusses directions of future work.

2 Background

In this work, we assume as input a graph $G = (V, E)$ composed of a set of nodes $V$ and a set of edges $E$. Each node $v \in V$ is attached to a feature/attribute vector $x_v$ and a label $y_v$. The presence of node labels indicates that we are assuming a supervised learning situation. We denote by $\mathcal{N}(v)$ the neighborhood of a node $v$, i.e., the set of nodes connected to $v$ by an edge. The primary concept behind GNNs is that each node in the graph represents an abstract concept, and edges represent the relationship between these concepts. Therefore, the node’s features should correlate with its neighboring features, defining a state (or hidden node representation) $h_v$ for each node [17].

Traditionally, each GNN layer $k$ is composed of a function that aggregates information from the neighborhood of each node $v$, forming an intermediate vector $a_v^{(k)}$, and a second function that combines this value with the current node representation $h_v^{(k-1)}$, which in turn goes through an activation function $\sigma$ before being output [17, 11]. Formally, this process can be defined as:

$$a_v^{(k)} = \mathrm{AGGREGATE}^{(k)}\left(\left\{ h_u^{(k-1)} : u \in \mathcal{N}(v) \right\}\right), \qquad h_v^{(k)} = \sigma\left(\mathrm{COMBINE}^{(k)}\left(h_v^{(k-1)},\, a_v^{(k)}\right)\right) \tag{1}$$

By convention, the first hidden representation of each node is its feature vector, $h_v^{(0)} = x_v$ [13]. Figure 1 shows how the structure of a GNN is generated. Given the graph represented in part (a) of the figure, which has 4 nodes and a feature vector associated with each of them, an intermediate representation is generated (part (b) of the figure). In this representation, for each node, the neighborhood information generates the intermediate vectors according to the process described in Eq. 1. The third part of the figure (c) shows the GNN itself, where each layer corresponds to an update of the state of the feature vectors of the current node.
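As a rough illustration of the aggregate-and-combine step described above, the following minimal sketch runs two layers over a toy 4-node graph. It is not the authors' implementation: the mean aggregator, the element-wise sum used as the combination function, the tanh activation and the absence of learned weights are all simplifying assumptions, and `gnn_layer` is a hypothetical helper name.

```python
import math


def gnn_layer(features, neighbors, activation=math.tanh):
    """Apply one aggregate-combine step to every node of an undirected graph."""
    new_features = {}
    for v, h_v in features.items():
        # AGGREGATE (assumed: mean) over the one-hop neighbors' representations.
        neigh = [features[u] for u in neighbors[v]]
        if neigh:
            a_v = [sum(vals) / len(neigh) for vals in zip(*neigh)]
        else:
            a_v = [0.0] * len(h_v)
        # COMBINE (assumed: element-wise sum), followed by the activation function.
        new_features[v] = [activation(h + a) for h, a in zip(h_v, a_v)]
    return new_features


# Toy 4-node path graph 0-1-2-3 with 2-dimensional feature vectors.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
h0 = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [0.0, 0.0]}

h1 = gnn_layer(h0, neighbors)  # first layer: uses the raw feature vectors
h2 = gnn_layer(h1, neighbors)  # second layer: uses the updated representations
```

After two layers, each node's representation depends on nodes up to two hops away, which is why stacking layers widens the receptive field even with a one-hop neighborhood.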

In this work we consider undirected graphs and a one-hop neighborhood for each node, which means that only features from a node’s direct neighbors are considered in aggregation. There are many options for aggregation and activation functions, and other mechanisms can also be added to this standard GNN architecture. These component choices are the main subject of this paper, as detailed in Section 4.1.

Figure 1: Structure of a GNN, adapted from Scarselli et al. [17]

3 Related Work

NAS is considered the current challenge in automating machine learning algorithms, after the success of automated feature engineering [8]. Well-known NAS works can be roughly split into two categories: Reinforcement Learning (RL) [26, 4] and Evolutionary Algorithms (EA) [16]. Both types of methods have been shown to find models that perform better than hand-crafted ones, and Real et al. present empirical evidence that EA-based and RL-based methods find equally well-suited models in terms of performance, with EA-based methods finding less complex models in less overall time [8, 16]. Our idea is to adapt and employ NAS methods for the task of finding a good GNN model for large-scale graph embedding, whereas in previous works the tasks of interest were mostly image classification and object detection.

To the best of our knowledge, NAS has not yet been largely explored in the context of GNNs. GraphNAS [9] is one of the few that uses RL to find feasible architectures for the node classification task. The authors define a search space composed of sampling, aggregation and gated functions, which can be extended to account for hyperparameters. Auto-GNN [25] follows the same line of work, exploring RL and a similar search space to GraphNAS.

Figure 2: Macro Search Space GNN Layer Example

4 Methodology

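The problem of NAS in GNNs can be formally defined as follows. Given a dataset $D$, split into training and validation sets $D_{train}$ and $D_{valid}$, respectively, and a search space of graph neural architectures $\mathcal{A}$, capable of generating a GNN with an architecture $a \in \mathcal{A}$ with its own set of hyperparameters, the goal is to find the model $a^*$ with the highest expected accuracy on $D_{valid}$ when its parameters $\theta$ are set on $D_{train}$, which yields the following bi-level optimization problem:

$$a^* = \arg\max_{a \in \mathcal{A}} \; \mathbb{E}\left[\mathrm{Acc}_{D_{valid}}\left(a(\theta^*)\right)\right] \quad \text{s.t.} \quad \theta^* = \arg\min_{\theta} \; \mathcal{L}_{D_{train}}\left(a(\theta)\right)$$

where $\mathcal{L}$ denotes the training loss.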

This section details the search spaces previously defined for GraphNAS [9] and describes the evolutionary algorithm and the RL methods we evaluated in the context of GNN architecture search.

Attention     Aggregation   Activation
const         sum           tanh
gcn           mean          linear
gat           max           softplus
sym-gat       mlp           sigmoid
cos                         elu
linear                      relu
gen_linear                  relu6
Table 1: Macro search space options for 5 actions.

4.1 Search Spaces

The two search spaces evaluated in this work, named “Macro” and “Micro” by the authors of [9], are composed of different GNN layers, as detailed next.

4.1.1 Macro Search Space

The name “Macro” comes from the fact that architectures generated from this space always follow the same structure: each layer is composed of an attention mechanism, the number of attention heads, a choice of aggregator, the output dimension and an activation function, in this order. The neighborhood sampling method is fixed as a first-order sampler, i.e., only direct neighbors of each node are sampled at each step.

Considering the definitions in Section 2, we have a new component here, which is the attention mechanism. As described by the authors in [20], an attention mechanism, implemented by the coefficients $e_{ij}$, is designed to attribute a different importance value to the features of each of a node’s neighbors. Such coefficients are calculated only for $j \in \mathcal{N}(i)$, the neighborhood of node $i$, for performance reasons (in order to avoid a $|V| \times |V|$ matrix), and in practice define the importance of node $j$’s features to node $i$. They are implemented as a single-layer feed-forward neural network, and a range of options for this mechanism is available (see the first column of Table 1). Multi-head attention is a way of having independent attention mechanisms over the node’s features. It has been shown that concatenating the results of these independent mechanisms yields better results than using a single attention head [20].
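To make the neighborhood-restricted attention and the concatenation of heads concrete, here is a minimal sketch in the spirit of the mechanism in [20]. It is a simplification under stated assumptions: node features are taken as already projected (no learned weight matrix), each head is a single hypothetical weight vector `a` applied to the concatenated pair of feature vectors, and the LeakyReLU slope of 0.2 is an illustrative choice.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_head(h, neighbors, a):
    """For each node i, return the attention-weighted sum of its neighbors' features."""
    out = {}
    for i, neigh in neighbors.items():
        # Unnormalized coefficients, computed only for j in N(i):
        # a single-layer feed-forward score on the concatenation [h_i || h_j].
        scores = [leaky_relu(sum(w * x for w, x in zip(a, h[i] + h[j]))) for j in neigh]
        # Softmax over the neighborhood (shifted by the max for numerical stability).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        out[i] = [sum(al * h[j][d] for al, j in zip(alphas, neigh)) for d in range(len(h[i]))]
    return out


def multi_head(h, neighbors, heads):
    """Run independent attention heads and concatenate their outputs per node."""
    per_head = [attention_head(h, neighbors, a) for a in heads]
    return {i: [x for head in per_head for x in head[i]] for i in h}


# Toy star graph: node 0 is connected to nodes 1 and 2.
neighbors = {0: [1, 2], 1: [0], 2: [0]}
h = {0: [1.0, 2.0], 1: [0.5, -1.0], 2: [3.0, 0.0]}
heads = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.5, 0.5, -0.5]]  # two hypothetical heads

out = multi_head(h, neighbors, heads)
```

With two heads and 2-dimensional inputs, every node ends up with a 4-dimensional output, which shows how the number of heads interacts with the layer's output dimension.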

Figure 2 presents the arrangement of the actions. The number of attention heads can be merged with the attention mechanism, as they alter the same behavior. The output dimension can also be merged with the activation function.

Table 1 presents the options for each action on the layers. The number of possibilities for each layer is the product of the number of options available for each of the five actions. According to the authors in [13], GNNs achieve the best overall results using architectures with 2 or 3 layers. Therefore, in this paper the architectures have 2 layers, so the total number of architecture possibilities is the square of the per-layer count.
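The counting argument can be sketched as follows. The attention, aggregator and activation options are those listed in Table 1; the options for the number of heads and the output dimension are hypothetical placeholders chosen only to make the example run, so the resulting counts are illustrative and should not be read as the figures of the paper.

```python
from math import prod

macro_space = {
    "attention": ["const", "gcn", "gat", "sym-gat", "cos", "linear", "gen_linear"],
    "heads": [1, 2, 4],          # hypothetical values, for illustration only
    "aggregator": ["sum", "mean", "max", "mlp"],
    "output_dim": [8, 16, 32],   # hypothetical values, for illustration only
    "activation": ["tanh", "linear", "softplus", "sigmoid", "elu", "relu", "relu6"],
}

# Each layer makes one independent choice per action.
per_layer = prod(len(options) for options in macro_space.values())

# Layers are chosen independently, so the counts multiply across layers.
num_layers = 2
total = per_layer ** num_layers
```

Because the count grows exponentially with the number of layers, even a modest per-layer space becomes too large to enumerate, which is what motivates a search method.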

One important characteristic of this search space is that the hyperparameters of the GNNs, such as learning rate, dropout and weight decay, are kept fixed. The dropout is set to 0.6; the learning rate and the weight decay are likewise held at constant values.

Convolutional layer   attention convolution with 1 to 8 heads, GCN, Cheb, SAGE, ARMA, SG, Linear, Zero
Combination scheme    Add, Product, Concat
Activation function   Sigmoid, tanh, elu, relu, linear
Table 2: Micro search space actions and hyperparameters.

4.1.2 Micro Search Space

The name “Micro” comes from the fact that architectures generated from this search space are composed by combining different convolution schemes, and do not follow a single fixed structure. The actions in this space are: a convolutional layer, a combination scheme and an activation function. The hyperparameters that can be tuned are: the learning rate, the dropout rate, the weight decay rate and the number of hidden units. Among the convolution options, the attention-based entry stands for 8 possible convolutions, using 1 to 8 attention heads.

Figure 3: Micro Search Space GNN architectures Example

Figure 3 illustrates the types of architectures that can be generated from this space. The straight arrows represent one type of connectivity, where the input is fed to two separate convolutional layers and their outputs are fed to the combination layer. The dashed line represents the second type, in which two convolutional layers are stacked before feeding the output to the combination layer. The full list of actions and hyperparameters for this space is presented in Table 2. The number of architecture possibilities in this space follows from the number of options for each action and hyperparameter listed.

Note that the architectures in the micro-space take advantage of convolutions. Graph convolution methods are classified mainly into two streams, both covered by the micro-search space: spectral-based and spatial-based methods [21]. Spectral methods [3, 13] rely on spectral properties of the graph, by finding eigenvectors of the normalized graph Laplacian. This approach is limited because eigendecomposition is an expensive operation, the eigenbasis is sensitive to small graph perturbations and the learned filters do not generalize well to graphs of different structure (therefore they do not work well in inductive learning scenarios). Spatial-based methods [11, 20] follow the message passing idea of traditional GNNs (also known as Recursive GNNs), in which a node’s hidden representation is an input to its neighbors’ computation. These methods scale to large graphs and generalize better to various types of graphs (heterogeneous, directed, graphs which contain edge labels, etc.).

4.2 Search Methods

This section describes the two methods we apply to search the macro and micro search spaces described in the previous section: the evolutionary method and the reinforcement learning method. We also describe the random search method that is used as a baseline for the results.

Evolutionary algorithm - Evolutionary methods are inspired by Darwin’s theory of evolution, and evolve a set of individuals, which represent solutions to the problem at hand, for a number of iterations (also known as generations) [6]. From one iteration to the next, individuals are evaluated according to a fitness function, which assesses their ability to solve the problem. The fitness value is used to probabilistically select the individuals that will undergo crossover and mutation operators, which are applied according to user-defined probabilities. We explore an evolutionary method inspired by the Aging Evolution method described by Real et al. [16]. In this method, a population of individuals (i.e., a set of GNNs) is generated randomly by sampling options for each action in a layer, considering the number of layers specified. These GNNs are then trained on a training set and have their accuracy measured on a validation set. This accuracy value is used to select an individual via tournament selection to generate a new offspring. The child individual is generated via mutation, which is uniform over the actions and replaces the selected action with a random option. The child individual is always added to the population and the oldest individual in the population (i.e., the individual that has been in the population for the highest number of iterations) is always removed (hence the name “Aging Evolution”).
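The loop described above can be sketched as follows. This is a minimal illustration under assumptions, not the code used in the experiments: the search space is a reduced, hypothetical subset, and `toy_accuracy` is a stand-in for training a GNN and measuring its validation accuracy.

```python
import random
from collections import deque

# Reduced, hypothetical search space used only for this illustration.
SPACE = {
    "aggregator": ["sum", "mean", "max", "mlp"],
    "activation": ["tanh", "linear", "sigmoid", "elu", "relu"],
}


def random_architecture(rng, num_layers=2):
    """Sample one option per action for each layer."""
    return [{action: rng.choice(opts) for action, opts in SPACE.items()}
            for _ in range(num_layers)]


def mutate(arch, rng):
    """Copy the parent and replace one uniformly chosen action by a random option."""
    child = [dict(layer) for layer in arch]
    layer = rng.choice(child)
    action = rng.choice(list(SPACE))
    layer[action] = rng.choice(SPACE[action])
    return child


def toy_accuracy(arch):
    """Stand-in fitness (assumption): replaces training and validating a GNN."""
    hits = sum((layer["aggregator"] == "mean") + (layer["activation"] == "relu")
               for layer in arch)
    return hits / (2 * len(arch))


def aging_evolution(evaluate, rng, population_size, tournament_size, iterations):
    population = deque()  # ordered by age: oldest individual on the left
    history = []          # every architecture evaluated so far
    while len(population) < population_size:
        arch = random_architecture(rng)
        individual = (arch, evaluate(arch))
        population.append(individual)
        history.append(individual)
    while len(history) < iterations:
        # Tournament selection: best of a random sample becomes the parent.
        sample = rng.sample(list(population), tournament_size)
        parent = max(sample, key=lambda ind: ind[1])
        child = mutate(parent[0], rng)
        individual = (child, evaluate(child))
        population.append(individual)  # the child is always added
        population.popleft()           # the oldest individual is always removed
        history.append(individual)
    return max(history, key=lambda ind: ind[1])


best_arch, best_acc = aging_evolution(
    toy_accuracy, random.Random(0),
    population_size=10, tournament_size=3, iterations=50,
)
```

Removing the oldest individual rather than the worst one means that a good architecture only survives if its descendants keep being selected, which acts as a form of regularization against lucky evaluations.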

Reinforcement Learning - GraphNAS uses an LSTM (Long Short-Term Memory) network as a controller to generate fixed-length architecture descriptions, which act as GNN architecture descriptors and can be viewed as lists of actions. The accuracy achieved by the GNN on the validation dataset at convergence is used as the reward signal for training the reinforcement learning controller. As the reward signal is non-differentiable, a policy gradient method is used to iteratively update the controller parameters, with a moving average baseline for the reward to reduce variance.
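The policy gradient update with a moving average baseline can be illustrated with a much simpler controller than the one used by GraphNAS. In the sketch below, the LSTM is replaced by independent softmax logits per action slot (a deliberate simplification), the reward function is a toy stand-in for validation accuracy, and `train_controller`, the learning rate and the decay value are hypothetical choices.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_controller(num_options, reward_fn, steps=300, lr=0.1, decay=0.9, seed=0):
    """REINFORCE over independent categorical choices, one per action slot."""
    rng = random.Random(seed)
    logits = [[0.0] * n for n in num_options]
    baseline = None
    for _ in range(steps):
        probs = [softmax(l) for l in logits]
        # Sample one option per slot: this list of actions describes an architecture.
        actions = [rng.choices(range(len(p)), weights=p)[0] for p in probs]
        reward = reward_fn(actions)
        if baseline is None:
            baseline = reward
        # Subtracting the baseline reduces the variance of the gradient estimate.
        advantage = reward - baseline
        baseline = decay * baseline + (1 - decay) * reward  # moving average
        for slot, (p, chosen) in enumerate(zip(probs, actions)):
            for k in range(len(p)):
                # Gradient of log softmax w.r.t. logit k: 1[k == chosen] - p[k].
                grad = (1.0 if k == chosen else 0.0) - p[k]
                logits[slot][k] += lr * advantage * grad
    return [softmax(l) for l in logits]


# Toy reward (assumption): only option 2 of the first slot is rewarded.
def toy_reward(actions):
    return 1.0 if actions[0] == 2 else 0.0


final_probs = train_controller([3, 2], toy_reward)
```

The reward is only ever used as a scalar weight on the log-probability gradient, which is why a non-differentiable signal such as validation accuracy can drive the update.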

Random Search - An initial random GNN is generated by sampling options from each action in a layer, for the specified number of layers. The GNN is trained and the accuracy on the validation set measured. This process is repeated for the specified number of iterations, storing the GNN with the highest accuracy.
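A minimal sketch of this baseline is given below. The search space is a reduced, hypothetical subset and `toy_accuracy` is an assumed stand-in for training a GNN and measuring its validation accuracy.

```python
import random

# Reduced, hypothetical search space used only for this illustration.
space = {
    "aggregator": ["sum", "mean", "max", "mlp"],
    "activation": ["tanh", "linear", "sigmoid", "elu", "relu"],
}


def toy_accuracy(arch):
    """Stand-in for training the GNN and measuring validation accuracy (assumption)."""
    hits = sum((layer["aggregator"] == "mean") + (layer["activation"] == "relu")
               for layer in arch)
    return hits / (2 * len(arch))


def random_search(space, evaluate, iterations, num_layers=2, seed=0):
    rng = random.Random(seed)
    best_arch, best_acc = None, float("-inf")
    for _ in range(iterations):
        # Sample one option per action, independently for each layer.
        arch = [{action: rng.choice(opts) for action, opts in space.items()}
                for _ in range(num_layers)]
        acc = evaluate(arch)
        # Keep the architecture with the highest accuracy seen so far.
        if acc > best_acc:
            best_arch, best_acc = arch, acc
    return best_arch, best_acc


best_arch, best_acc = random_search(space, toy_accuracy, iterations=100)
```

Each sample is independent of the previous ones, so the method has no state beyond the best result and adds no overhead on top of training the sampled architectures.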

5 Experimental Analysis

We assess the performance of the evolutionary algorithm (EA), the reinforcement learning (RL) method and the random search (RS) in the transductive learning scenario, on a node classification task, over a set of 7 datasets in terms of accuracy and runtime, as detailed next. It is important to note that this work does not compare the architectures obtained by the optimization methods to hand-crafted ones, as that was already done in the GraphNAS paper [9].

5.1 Datasets

Table 3 presents the details of the datasets, as previously used in [19] and provided by PyTorch Geometric. In all cases, we are dealing with a node classification task, where we use information from the nodes with known labels to assign a class to nodes with unknown labels (test set).

Dataset (Abbrv.)          # Classes   # Features   # Nodes   # Edges
Cora (COR)
Citeseer (CIT)
Pubmed (MED)
Coauthor CS (CS)
Coauthor Physics (PHY)
Amazon Computers (CMP)
Amazon Photo (PHO)
Table 3: Dataset characteristics.

The first three datasets (COR, CIT, MED) are citation networks, used previously in [13]. Nodes represent documents, and an edge between two documents means that one paper cited the other. Class labels represent sub-areas of machine learning [18]. Node features are sparse bag-of-words vectors.

CS and PHY are co-authorship networks, based on the Microsoft Academic Graph from KDD Cup 2016. In these datasets nodes represent authors instead of papers, connected by an edge if they have co-authored a paper. Node features represent paper keywords for each author’s papers. Class labels indicate the most active field of study for each author in the network.

CMP and PHO are segments of the Amazon co-purchase graph, where nodes represent products and edges are added between items frequently bought together. Node features are a bag-of-words representation of product reviews, and class labels represent the product category.

Table 4: Accuracies and execution times (in seconds) of search methods, on the Macro and Micro search spaces.

5.2 Experimental Setup

All search methods were executed for 1000 iterations in order to enable a fair comparison. In each iteration, a single GNN architecture is generated, trained on the training set and evaluated (in terms of accuracy) on the validation set. The architecture with the highest validation accuracy is saved across iterations, and returned as the result of the optimization process. The generated architectures are trained using the following fixed settings for all search spaces and methods: minimization of the cross-entropy loss using the ADAM optimizer, an initial learning rate of 0.005 and an early stopping strategy with a fixed patience, measured in epochs.

Random search has only one parameter: the number of iterations. The reinforcement learning controller is trained using the same hyperparameters as described in the GraphNAS paper [9]: a one-layer LSTM with 100 hidden units, the ADAM optimizer, the learning rate reported there and random initialization of weights. Aging Evolution has three main parameters: the population size, the tournament size and the number of iterations. The population size is related to the number of solutions evaluated during the search process, while the tournament size controls the convergence speed: the larger the tournament, the faster the algorithm converges. Among all tested values of population and tournament size, the best results were achieved with a tournament size of 3.

The dataset split between training, validation and testing sets was done in the same way as in the GraphNAS public code: the last nodes are separated for validation and testing, split evenly between the two.
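The splitting scheme can be sketched as follows. The helper `split_nodes`, the `holdout` parameter and the choice of which half goes to validation and which to testing are assumptions made for illustration; the number of held-out nodes used in the experiments is not stated here.

```python
def split_nodes(num_nodes, holdout):
    """Hold out the last `holdout` nodes and split them evenly into validation and test."""
    indices = list(range(num_nodes))
    half = holdout // 2
    train = indices[:num_nodes - holdout]
    # Assumption: the first half of the held-out block is used for validation,
    # the second half for testing.
    valid = indices[num_nodes - holdout:num_nodes - half]
    test = indices[num_nodes - half:]
    return train, valid, test


train_idx, valid_idx, test_idx = split_nodes(num_nodes=10, holdout=4)
```

For a graph with 10 nodes and 4 held-out nodes, nodes 0 to 5 are used for training, nodes 6 and 7 for validation and nodes 8 and 9 for testing.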

All experiments were repeated 5 times, as the methods are non-deterministic. The experiments were run on a machine with a 16-core Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz, 16GB DIMM DDR4 @ 2666 MHz RAM, and an NVIDIA GV100 [TITAN V] graphics card with 12GB of dedicated RAM.

5.3 Results

Figure 4: Highest validation accuracy by iteration, for CIT and COR datasets, on the Macro search space.

Table 4 shows the results of accuracy and execution time for the Macro and Micro search spaces, at the end of the optimization process (after 1000 iterations). In terms of accuracy, the results obtained by the EA and RL methods are very similar to the ones obtained by the random search. In terms of execution time, RS wins in most cases. The execution time for the search varies between 2 and 12 GPU hours.

Figure 4 presents the evolution of the highest validation accuracy achieved by a GNN architecture across the iterations, by search method⁵. Each line represents the mean validation score across all seeds, and the shaded area around it represents the standard deviation of this value. It is clear that all methods converge (find a well-performing architecture and plateau) within only a few iterations. The fact that the EA already starts at a high value may be attributed to the population initialization process, depicted in Figure 6.

⁵ We present only the results for the Macro search space because the results for Micro are very similar.

It may seem counter-intuitive that we are using sophisticated methods to obtain results that can also be achieved by a random search method, but as the authors in [2] have previously discussed, in large search spaces where many of the dimensions are irrelevant to the task at hand, random search can be as effective as more sophisticated methods. This problem is aggravated by the neutrality of the space, i.e., architectures in neighboring regions of the search space may differ in a few components but do not lead to an accuracy value different from that of their neighbors [15]. Another, stronger indicator of a neutral search space is the fact that many high-quality individuals are generated in the initialization step, and evolution plays a minor part in improving them, as shown in Figure 4.

Figure 5: Cumulative number of architectures with validation accuracy higher than threshold, for CIT and COR datasets, on the Macro search space.

Figure 5 presents the number of evaluated architectures with validation accuracy over a fixed threshold, for CIT and COR, in the Macro search space. The threshold was chosen because it is approximately the best accuracy obtained for CIT on the Macro search space. The pattern shown in the figure is consistent for all datasets in both search spaces. It shows that the EA tends to converge to a better region of the search space faster than the other two methods, thus evaluating more high-quality architectures. Such a tendency could be explained by the EA’s selective pressure (driven by the tournament selection process), which makes the algorithm prioritize good individuals for mutation and evaluation.

Figure 6: Distribution of EA’s initial population validation accuracies on both search spaces.

The parameter size of GNNs depends on the dataset (since the structure of the neural network follows the graph) and on the choice of architecture. Table 5 presents the percentage of generated architectures which exceeded GPU memory, by dataset and search method⁶. EA is consistently the search method for which the smallest percentage of generated architectures are too big for the GPU memory, while RL reaches the largest percentages of oversized architectures. This corroborates the findings of Real et al. [16], which state that Evolutionary Algorithms are able to find less complex architectures that perform as well as those found by RL.

⁶ The smallest datasets (CIT and COR) are not present in the table because none of the generated architectures for these datasets exceeded GPU memory.

Avg. Max
% %
EA 16.0
CMP RL 81.0
Table 5: Percentages of generated architectures which exceeded the GPU memory and therefore were not evaluated, by dataset and search method

6 Conclusions and Future Work

GNNs are able to achieve state-of-the-art performances in prediction tasks over networks. However, their design and optimization is currently hand-made and error prone. This paper compared the results of two NAS search methods – a reinforcement learning technique and an evolutionary algorithm – to a random search in the task of searching for architectures and hyperparameters for GNNs.

The three methods produced GNN architectures which achieved similar results in terms of accuracy when considering a set of 7 datasets and two architecture layer search spaces, with the random search being the fastest method, followed by the evolutionary algorithm and reinforcement learning. Architectures generated by EA tend to fit in GPU memory, while the other methods generate oversized architectures in up to 80% of cases. This shows that EA generates less complex structures while achieving accuracy similar to that of the other methods, corroborating the findings of Real et al. [16] for images.

In general, the results indicate that the defined search spaces contain dimensions that are irrelevant to this task, which calls for a more in-depth study of each of these spaces. Further, the neutrality of this space, i.e., the fact that neighboring solutions present different architectures but very similar accuracy, makes search even harder. As future work, we intend to perform a more in-depth investigation of the dimensions of the search space in order to identify those that may be irrelevant to search, as well as to propose new search methods that include mechanisms to avoid these neutral regions.


  • [1] V. Badrinarayanan, A. Kendall, and R. Cipolla (2017) Segnet: a deep convolutional encoder-decoder architecture for image segmentation. TPAMI’17 39 (12), pp. 2481–2495. Cited by: §1.
  • [2] J. Bergstra and Y. Bengio (2012) Random search for hyper-parameter optimization. JMLR’12 13 (Feb), pp. 281–305. Cited by: §1, §5.3.
  • [3] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun (2014) Spectral networks and locally connected networks on graphs. In ICLR’14, Y. Bengio and Y. LeCun (Eds.), Cited by: §1, §4.1.2.
  • [4] H. Cai, T. Chen, W. Zhang, Y. Yu, and J. Wang (2018) Efficient architecture search by network transformation. In AAAI’18, Cited by: §3.
  • [5] M. Defferrard, X. Bresson, and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In NeurIPS’16, D. D. Lee, M. Sugiyama, U. von Luxburg, I. Guyon, and R. Garnett (Eds.), pp. 3837–3845. Cited by: §1.
  • [6] A. Eiben and J. Smith (2015) Introduction to evolutionary computing. Springer. Cited by: §4.2.
  • [7] R. Elshawi, M. Maher, and S. Sakr (2019) Automated machine learning: state-of-the-art and open challenges. arXiv preprint arXiv:1906.02287. Cited by: §1.
  • [8] T. Elsken, J. H. Metzen, and F. Hutter (2019) Neural architecture search: A survey. JMLR’19 20, pp. 55:1–55:21. Cited by: §1, §1, §3.
  • [9] Y. Gao, H. Yang, P. Zhang, C. Zhou, and Y. Hu (2020) Graph neural architecture search. In IJCAI’20, pp. 1403–1409. Cited by: §1, §3, §4.1, §4, §5.2, §5.
  • [10] M. Gori, G. Monfardini, and F. Scarselli (2005) A new model for learning in graph domains. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., Vol. 2, pp. 729–734. Cited by: §1.
  • [11] W. Hamilton, Z. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In NIPS ’17, Cited by: §1, §2, §4.1.2.
  • [12] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In CVPR’16, pp. 770–778. Cited by: §1.
  • [13] T. N. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. In ICLR’17, Cited by: §2, §4.1.1, §4.1.2, §5.1.
  • [14] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, and A. C. Berg (2016) Ssd: single shot multibox detector. In ECCV’16, pp. 21–37. Cited by: §1.
  • [15] C. G. Pimenta, A. G. de Sá, G. Ochoa, and G. L. Pappa (2020) Fitness landscape analysis of automated machine learning search spaces. In EvoCOP’20, pp. 114–130. Cited by: §5.3.
  • [16] E. Real, A. Aggarwal, Y. Huang, and Q. V. Le (2019) Aging evolution for image classifier architecture search. In AAAI’19, Cited by: §1, §3, §4.2, §5.3, §6.
  • [17] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini (2009) The graph neural network model. IEEE TNN’09. Cited by: §1, Figure 1, §2, §2.
  • [18] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad (2008) Collective classification in network data. AI magazine 29 (3), pp. 93–93. Cited by: §5.1.
  • [19] O. Shchur, M. Mumme, A. Bojchevski, and S. Günnemann (2018) Pitfalls of graph neural network evaluation. arXiv preprint arXiv:1811.05868. Cited by: §5.1.
  • [20] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio (2018) Graph attention networks. In ICLR’18, Cited by: §1, §4.1.1, §4.1.2.
  • [21] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu (2019) A comprehensive survey on graph neural networks. CoRR. External Links: 1901.00596 Cited by: §4.1.2.
  • [22] K. Xu, W. Hu, J. Leskovec, and S. Jegelka (2019) How powerful are graph neural networks?. In ICLR’19, Cited by: §1.
  • [23] M. Zhang, Z. Cui, M. Neumann, and Y. Chen (2018) An end-to-end deep learning architecture for graph classification. In AAAI’18, Cited by: §1.
  • [24] Z. Zhang, P. Cui, and W. Zhu (2020) Deep learning on graphs: a survey. TKDE’20 (), pp. 1–1. Cited by: §1.
  • [25] K. Zhou, Q. Song, X. Huang, and X. Hu (2019) Auto-gnn: neural architecture search of graph neural networks. arXiv preprint arXiv:1909.03184. Cited by: §1, §3.
  • [26] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le (2018) Learning transferable architectures for scalable image recognition. In CVPR’2018, Cited by: §3.