1 Introduction
The recent success of neural networks has boosted research on pattern recognition and data mining. Many machine learning tasks such as object detection [1, 2], machine translation [3, 4], and speech recognition [5], which once heavily relied on handcrafted feature engineering to extract informative feature sets, have recently been revolutionized by various end-to-end deep learning paradigms, e.g., convolutional neural networks (CNNs) [6], long short-term memory (LSTM) [7], and autoencoders. The success of deep learning in many domains is partially attributed to rapidly developing computational resources (e.g., GPUs) and the availability of large training data, and partially to the effectiveness of deep learning in extracting latent representations from Euclidean data (e.g., images, text, and video). Taking image analysis as an example, an image can be represented as a regular grid in Euclidean space. A convolutional neural network (CNN) is able to exploit the shift-invariance, local connectivity, and compositionality of image data [8], and as a result, CNNs can extract locally meaningful features that are shared across the entire dataset for various image analysis tasks.
While deep learning has achieved great success on Euclidean data, there is an increasing number of applications where data are generated from non-Euclidean domains and need to be effectively analyzed. For instance, in e-commerce, a graph-based learning system is able to exploit the interactions between users and products [9, 10, 11] to make highly accurate recommendations. In chemistry, molecules are modeled as graphs and their bioactivity needs to be identified for drug discovery [12, 13]. In a citation network, papers are linked to each other via citations and need to be categorized into different groups [14, 15].
The complexity of graph data has imposed significant challenges on existing machine learning algorithms. This is because graph data are irregular. Each graph has a variable number of unordered nodes, and each node in a graph has a different number of neighbors, so some important operations (e.g., convolutions) that are easy to compute in the image domain are no longer directly applicable to the graph domain. Furthermore, a core assumption of existing machine learning algorithms is that instances are independent of each other. However, this is not the case for graph data, where each instance (node) is related to others (neighbors) via complex linkage information that captures the interdependence among data, including citations, friendships, and interactions.
1.1 Practical Applications
Graph neural networks have a wide range of applications across different tasks and domains. Besides the general tasks in which each category of GNNs specializes, including node classification, node representation learning, graph classification, graph generation, and spatial-temporal forecasting, GNNs can also be applied to node clustering, link prediction [119], and graph partitioning [120]. In this section, we mainly introduce practical applications according to the general domains to which they belong.
Computer Vision
One of the biggest application areas for graph neural networks is computer vision. Researchers have explored leveraging graph structures in scene graph generation, point cloud classification and segmentation, action recognition, and many other directions.
In scene graph generation, semantic relationships between objects facilitate the understanding of the semantic meaning behind a visual scene. Given an image, scene graph generation models detect and recognize objects and predict semantic relationships between pairs of objects [121, 122, 123]. Another application reverses the process by generating realistic images given scene graphs [124]. As natural language can be parsed as semantic graphs where each word represents an object, this is a promising approach to synthesizing images from textual descriptions.
In point cloud classification and segmentation, a point cloud is a set of 3D points recorded by LiDAR scans. Solutions for this task enable LiDAR devices to see the surrounding environment, which is typically beneficial for unmanned vehicles. To identify objects depicted by point clouds, [125, 126, 127] convert point clouds into k-nearest neighbor graphs or superpoint graphs, and use graph convolution networks to explore the topological structure.
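As a rough illustration of the conversion step mentioned above, the following sketch builds a directed k-nearest neighbor graph from a handful of 3D points. It is not the construction used in [125, 126, 127]: the function name `knn_graph`, the brute-force search, and the sample coordinates are all assumptions made for this example.

```python
import math


def knn_graph(points, k):
    """Return a directed k-nearest-neighbor edge list for 3D points.

    points: list of (x, y, z) tuples; k: number of neighbors per point.
    Brute force, O(n^2); larger point clouds would need a spatial index.
    """
    edges = []
    for i, p in enumerate(points):
        # Euclidean distance from point i to every other point.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        for _, j in dists[:k]:
            edges.append((i, j))
    return edges


# Hypothetical toy point cloud: three nearby points and one far away.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
edges = knn_graph(points, k=2)
```

Each point gets exactly k outgoing edges, so the far-away point is still connected to the cluster even though no cluster point selects it as a neighbor. Whether to symmetrize such a graph is a design choice left open here.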
In action recognition, recognizing human actions contained in videos facilitates a better understanding of video content from a machine perspective. One group of solutions detects the locations of human joints in video clips. Human joints, which are linked by the skeleton, naturally form a graph. Given the time series of human joint locations, [72, 73] apply spatial-temporal neural networks to learn human action patterns.
In addition, the number of possible directions in which to apply graph neural networks in computer vision is still growing. This includes few-shot image classification [128, 129], semantic segmentation [130, 131], visual reasoning [132], and question answering [133].
Recommender Systems
Graph-based recommender systems take items and users as nodes. By leveraging the relations between items and items, users and users, users and items, as well as content information, graph-based recommender systems are able to produce high-quality recommendations. The key to a recommender system is to score the importance of an item to a user. As a result, it can be cast as a link prediction problem. The goal is to predict the missing links between users and items. To address this problem, van den Berg et al. [9] and Ying et al. [11] propose a GCN-based graph autoencoder. Monti et al. [10] combine GCN and RNN to learn the underlying process that generates the known ratings.
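To make the link prediction view concrete, here is a minimal sketch that scores unobserved user-item pairs and ranks them. It assumes node embeddings are already available and uses a plain inner-product decoder, which is a common simple choice rather than the decoder of [9] or [11]. The names `score` and `recommend` and all embedding values are hypothetical.

```python
def score(user_vec, item_vec):
    """Inner-product decoder: a higher value means a more likely user-item link."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def recommend(user, user_emb, item_emb, observed, top_n=2):
    """Rank items the user has not interacted with by predicted link score."""
    candidates = [i for i in item_emb if (user, i) not in observed]
    candidates.sort(key=lambda i: score(user_emb[user], item_emb[i]), reverse=True)
    return candidates[:top_n]


# Hypothetical embeddings; in a graph autoencoder the encoder would produce them.
user_emb = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
item_emb = {"i1": [0.9, 0.1], "i2": [0.2, 0.8], "i3": [0.7, 0.3]}
observed = {("u1", "i1"), ("u2", "i2")}  # known links, excluded from ranking

top_for_u1 = recommend("u1", user_emb, item_emb, observed)
```

The missing links with the highest scores become the recommendations. Only the decoding step is shown; learning the embeddings from the graph is the part the cited models address.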
Traffic
Traffic congestion has become a hot social issue in modern cities. Accurately forecasting traffic speed, volume, or road density in traffic networks is fundamentally important in route planning and flow control. [134, 70, 71, 28] adopt a graph-based approach with spatial-temporal neural networks. The input to their models is a spatial-temporal graph. In this spatial-temporal graph, nodes represent sensors placed on roads, edges are determined by thresholding the pairwise distance between nodes, and each node contains a time series as features. The goal is to forecast the average speed of a road within a time interval. Another interesting application is taxi-demand prediction. This greatly helps intelligent transportation systems make effective use of resources and save energy. Given historical taxi demands, location information, weather data, and event features, Yao et al. [135] incorporate LSTM, CNN, and node embeddings trained by LINE [136] to form a joint representation for each location to predict the number of taxis demanded for a location within a time interval.
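The sensor graph described above can be sketched as follows. The exact edge rule differs between papers, so this example assumes one common reading: a Gaussian kernel turns each pairwise distance into a similarity weight, and weights below a threshold are dropped. The function name `sensor_graph`, the parameters `sigma` and `threshold`, and all numbers are illustrative assumptions.

```python
import math


def sensor_graph(dist, sigma, threshold):
    """Build a weighted adjacency matrix from pairwise sensor distances.

    Weight w_ij = exp(-(d_ij / sigma)^2); weights below `threshold` are
    dropped, so only sufficiently close sensors stay connected.
    """
    n = len(dist)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = math.exp(-((dist[i][j] / sigma) ** 2))
            if w >= threshold:
                adj[i][j] = w
    return adj


# Hypothetical pairwise road distances (km) between three sensors.
dist = [[0.0, 1.0, 4.0],
        [1.0, 0.0, 2.0],
        [4.0, 2.0, 0.0]]
adj = sensor_graph(dist, sigma=2.0, threshold=0.3)

# Each node also carries a time series of readings (e.g., speed) as features.
speeds = {0: [60.0, 58.0, 55.0], 1: [42.0, 40.0, 41.0], 2: [70.0, 71.0, 69.0]}
```

With these values, sensors 0 and 2 are too far apart to be linked, while the two closer pairs keep a positive weight.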
Chemistry
In chemistry, researchers apply graph neural networks to study the graph structures of molecules. In a molecular graph, atoms function as nodes and chemical bonds function as edges. Node classification, graph classification, and graph generation are the three main tasks on molecular graphs, used to learn molecular fingerprints [80, 53], to predict molecular properties [13], to infer protein interfaces [137], and to synthesize chemical compounds [66, 65, 138].
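As a small illustration of the atoms-as-nodes, bonds-as-edges representation, the sketch below encodes the heavy-atom skeleton of ethanol as a graph. The data layout (an atom list plus bond triples) and the helper `adjacency_list` are assumptions for this example, not a format prescribed by the cited works.

```python
# Ethanol heavy-atom skeleton (hydrogens omitted): C - C - O
atoms = ["C", "C", "O"]          # node labels, indexed 0..2
bonds = [(0, 1, 1), (1, 2, 1)]   # (atom index, atom index, bond order)


def adjacency_list(num_atoms, bonds):
    """Undirected adjacency list: atom index -> list of (neighbor, bond order)."""
    adj = {i: [] for i in range(num_atoms)}
    for a, b, order in bonds:
        adj[a].append((b, order))
        adj[b].append((a, order))
    return adj


mol = adjacency_list(len(atoms), bonds)
degrees = {i: len(nbrs) for i, nbrs in mol.items()}
```

Node labels (element types) and edge labels (bond orders) are the kind of features a graph neural network would consume for tasks such as property prediction.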
Others
There have been initial explorations into applying GNNs to other problems such as program verification [18], program reasoning [139], social influence prediction [140], adversarial attack prevention [141], electronic health record modeling [142, 143], event detection [144], and combinatorial optimization [145].
2 Future Directions
Though graph neural networks have proven their power in learning graph data, challenges still exist due to the complexity of graphs. In this section, we provide four future directions of graph neural networks.
Go Deep
The success of deep learning lies in deep neural architectures. In image classification, for example, an outstanding model named ResNet [146] has 152 layers. However, when it comes to graphs, experimental studies have shown that model performance drops dramatically as the number of layers increases [147]. According to [147], this is because graph convolutions essentially push the representations of adjacent nodes closer to each other, so that, in theory, with an infinite number of convolutions, all nodes’ representations will converge to a single point. This raises the question of whether going deep is still a good strategy for learning graph-structured data.
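The convergence effect can be observed in a toy setting. The sketch below repeatedly applies a parameter-free propagation step, in which each node averages its own value with those of its neighbors, and tracks how far apart the node features remain. Row normalization with self-loops, the absence of learned weights and nonlinearities, and the path graph itself are simplifying assumptions, so this is not the setup analyzed in [147].

```python
def normalized_adjacency(edges, n):
    """Row-normalized adjacency with self-loops: D^-1 (A + I)."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in a]


def propagate(p, x):
    """One propagation step: each node averages itself and its neighbors."""
    return [sum(p[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]


def spread(x):
    """Gap between the largest and smallest node feature."""
    return max(x) - min(x)


# Path graph 0 - 1 - 2 - 3 with one scalar feature per node.
p = normalized_adjacency([(0, 1), (1, 2), (2, 3)], 4)
x = [1.0, 0.0, 0.0, -1.0]
spreads = []
for _ in range(50):
    x = propagate(p, x)
    spreads.append(spread(x))
```

The recorded spread never increases and approaches zero, meaning the four node features become nearly indistinguishable after many steps.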
Receptive Field
The receptive field of a node refers to a set of nodes including the central node and its neighbors. The number of neighbors of a node follows a power-law distribution. Some nodes may have only one neighbor, while others may have as many as thousands. Though sampling strategies have been adopted [24, 26, 27], how to select a representative receptive field for a node remains to be explored.
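The two notions above, the full receptive field and a sampled one, can be sketched as follows. Uniform sampling with a fixed fanout is used purely for simplicity and is not claimed to match the strategies of [24, 26, 27]. The function names, the toy graph, and the fanout value are assumptions.

```python
import random


def k_hop_receptive_field(adj, node, k):
    """All nodes reachable from `node` within k hops, including itself."""
    field = {node}
    frontier = {node}
    for _ in range(k):
        # Expand one hop and keep only nodes not seen before.
        frontier = {v for u in frontier for v in adj[u]} - field
        field |= frontier
    return field


def sample_neighbors(adj, node, fanout, rng):
    """Uniformly sample at most `fanout` neighbors of `node`."""
    nbrs = adj[node]
    if len(nbrs) <= fanout:
        return list(nbrs)
    return rng.sample(nbrs, fanout)


# Hypothetical star-like graph: node 0 is a hub, node 6 hangs off node 1.
adj = {0: [1, 2, 3, 4, 5], 1: [0, 6], 2: [0], 3: [0], 4: [0], 5: [0], 6: [1]}
rng = random.Random(0)
hub_sample = sample_neighbors(adj, 0, fanout=2, rng=rng)
field_of_6 = k_hop_receptive_field(adj, 6, k=2)
```

Sampling caps the hub's contribution at a fixed size, while a low-degree node keeps all of its neighbors. Which sampled subset is representative is exactly the open question raised above.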
Scalability
Most graph neural networks do not scale well to large graphs. The main reason is that when stacking multiple graph convolution layers, a node’s final state involves the hidden states of a large number of its neighbors, leading to high complexity of backpropagation. While several approaches try to improve model efficiency by fast sampling [46, 45] and subgraph training [24, 27], they are still not scalable enough to handle deep architectures with large graphs.
Dynamics and Heterogeneity
The majority of current graph neural networks tackle static homogeneous graphs. On the one hand, graph structures are assumed to be fixed. On the other hand, nodes and edges in a graph are assumed to come from a single source. However, these two assumptions are not realistic in many scenarios. In a social network, a new person may enter the network at any time and an existing person may quit the network as well. In a recommender system, products may have different types, and their inputs may take different forms such as text or images. Therefore, new methods should be developed to handle dynamic and heterogeneous graph structures.
3 Conclusion
In this survey, we conduct a comprehensive overview of graph neural networks. We provide a taxonomy which groups graph neural networks into five categories: graph convolutional networks, graph attention networks, graph autoencoders, graph generative networks, and graph spatial-temporal networks. We provide a thorough review, comparison, and summary of the methods within and between categories. Then we introduce a wide range of applications of graph neural networks. Datasets, open-source code, and benchmarks for graph neural networks are summarized. Finally, we suggest four future directions for graph neural networks.
Acknowledgment
This research was funded by the Australian Government through the Australian Research Council (ARC) under grants 1) LP160100630 partnership with Australia Government Department of Health and 2) LP150100671 partnership with Australia Research Alliance for Children and Youth (ARACY) and Global Business College Australia (GBCA). We acknowledge the support of NVIDIA Corporation and MakeMagic Australia with the donation of GPU used for this research.
References
 [1] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788.
 [2] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” in Advances in neural information processing systems, 2015, pp. 91–99.

 [3] M.-T. Luong, H. Pham, and C. D. Manning, “Effective approaches to attention-based neural machine translation,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2015, pp. 1412–1421.
 [4] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey et al., “Google’s neural machine translation system: Bridging the gap between human and machine translation,” arXiv preprint arXiv:1609.08144, 2016.
 [5] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath et al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal processing magazine, vol. 29, no. 6, pp. 82–97, 2012.
 [6] Y. LeCun, Y. Bengio et al., “Convolutional networks for images, speech, and time series,” The handbook of brain theory and neural networks, vol. 3361, no. 10, p. 1995, 1995.
 [7] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
 [8] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, “Geometric deep learning: going beyond euclidean data,” IEEE Signal Processing Magazine, vol. 34, no. 4, pp. 18–42, 2017.
 [9] R. van den Berg, T. N. Kipf, and M. Welling, “Graph convolutional matrix completion,” stat, vol. 1050, p. 7, 2017.
 [10] F. Monti, M. Bronstein, and X. Bresson, “Geometric matrix completion with recurrent multigraph neural networks,” in Advances in Neural Information Processing Systems, 2017, pp. 3697–3707.
 [11] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec, “Graph convolutional neural networks for web-scale recommender systems,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2018, pp. 974–983.
 [12] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Advances in Neural Information Processing Systems, 2016, pp. 3844–3852.
 [13] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” in Proceedings of the International Conference on Machine Learning, 2017, pp. 1263–1272.
 [14] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in Proceedings of the International Conference on Learning Representations, 2017.
 [15] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” in Proceedings of the International Conference on Learning Representations, 2017.
 [16] M. Gori, G. Monfardini, and F. Scarselli, “A new model for learning in graph domains,” in Proceedings of the International Joint Conference on Neural Networks, vol. 2. IEEE, 2005, pp. 729–734.
 [17] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “The graph neural network model,” IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2009.
 [18] Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel, “Gated graph sequence neural networks,” in Proceedings of the International Conference on Learning Representations, 2015.
 [19] H. Dai, Z. Kozareva, B. Dai, A. Smola, and L. Song, “Learning steady-states of iterative algorithms over graphs,” in Proceedings of the International Conference on Machine Learning, 2018, pp. 1114–1122.
 [20] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, “Spectral networks and locally connected networks on graphs,” in Proceedings of International Conference on Learning Representations, 2014.
 [21] M. Henaff, J. Bruna, and Y. LeCun, “Deep convolutional networks on graph-structured data,” arXiv preprint arXiv:1506.05163, 2015.
 [22] R. Li, S. Wang, F. Zhu, and J. Huang, “Adaptive graph convolutional neural networks,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2018, pp. 3546–3553.
 [23] R. Levie, F. Monti, X. Bresson, and M. M. Bronstein, “Cayleynets: Graph convolutional neural networks with complex rational spectral filters,” arXiv preprint arXiv:1705.07664, 2017.
 [24] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Advances in Neural Information Processing Systems, 2017, pp. 1024–1034.
 [25] F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, and M. M. Bronstein, “Geometric deep learning on graphs and manifolds using mixture model cnns,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, no. 2, 2017, p. 3.
 [26] M. Niepert, M. Ahmed, and K. Kutzkov, “Learning convolutional neural networks for graphs,” in Proceedings of the International Conference on Machine Learning, 2016, pp. 2014–2023.
 [27] H. Gao, Z. Wang, and S. Ji, “Large-scale learnable graph convolutional networks,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2018, pp. 1416–1424.
 [28] J. Zhang, X. Shi, J. Xie, H. Ma, I. King, and D.-Y. Yeung, “Gaan: Gated attention networks for learning on large and spatiotemporal graphs,” in Proceedings of the Uncertainty in Artificial Intelligence, 2018.
 [29] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner et al., “Relational inductive biases, deep learning, and graph networks,” arXiv preprint arXiv:1806.01261, 2018.
 [30] J. B. Lee, R. A. Rossi, S. Kim, N. K. Ahmed, and E. Koh, “Attention models in graphs: A survey,” arXiv preprint arXiv:1807.07984, 2018.
 [31] Z. Zhang, P. Cui, and W. Zhu, “Deep learning on graphs: A survey,” arXiv preprint arXiv:1812.04202, 2018.
 [32] P. Cui, X. Wang, J. Pei, and W. Zhu, “A survey on network embedding,” IEEE Transactions on Knowledge and Data Engineering, 2017.
 [33] W. L. Hamilton, R. Ying, and J. Leskovec, “Representation learning on graphs: Methods and applications,” in Advances in Neural Information Processing Systems, 2017, pp. 1024–1034.
 [34] D. Zhang, J. Yin, X. Zhu, and C. Zhang, “Network representation learning: A survey,” IEEE Transactions on Big Data, 2018.
 [35] H. Cai, V. W. Zheng, and K. Chang, “A comprehensive survey of graph embedding: problems, techniques and applications,” IEEE Transactions on Knowledge and Data Engineering, 2018.
 [36] P. Goyal and E. Ferrara, “Graph embedding techniques, applications, and performance: A survey,” Knowledge-Based Systems, vol. 151, pp. 78–94, 2018.
 [37] S. Pan, J. Wu, X. Zhu, C. Zhang, and Y. Wang, “Tri-party deep network representation,” in Proceedings of the International Joint Conference on Artificial Intelligence. AAAI Press, 2016, pp. 1895–1901.
 [38] X. Shen, S. Pan, W. Liu, Y.-S. Ong, and Q.-S. Sun, “Discrete network embedding,” in Proceedings of the International Joint Conference on Artificial Intelligence, 2018, pp. 3549–3555.

 [39] H. Yang, S. Pan, P. Zhang, L. Chen, D. Lian, and C. Zhang, “Binarized attributed network embedding,” in IEEE International Conference on Data Mining. IEEE, 2018.
 [40] B. Perozzi, R. Al-Rfou, and S. Skiena, “Deepwalk: Online learning of social representations,” in Proceedings of the ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2014, pp. 701–710.
 [41] S. Cao, W. Lu, and Q. Xu, “Deep neural networks for learning graph representations,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2016, pp. 1145–1152.
 [42] D. Wang, P. Cui, and W. Zhu, “Structural deep network embedding,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 1225–1234.
 [43] A. Susnjara, N. Perraudin, D. Kressner, and P. Vandergheynst, “Accelerated filtering on graphs using lanczos method,” arXiv preprint arXiv:1509.04537, 2015.
 [44] J. Atwood and D. Towsley, “Diffusion-convolutional neural networks,” in Advances in Neural Information Processing Systems, 2016, pp. 1993–2001.
 [45] J. Chen, T. Ma, and C. Xiao, “Fastgcn: fast learning with graph convolutional networks via importance sampling,” in Proceedings of the International Conference on Learning Representations, 2018.

 [46] J. Chen, J. Zhu, and L. Song, “Stochastic training of graph convolutional networks with variance reduction,” in Proceedings of the International Conference on Machine Learning, 2018, pp. 941–949.
 [47] F. P. Such, S. Sah, M. A. Dominguez, S. Pillai, C. Zhang, A. Michael, N. D. Cahill, and R. Ptucha, “Robust spatial filtering with graph convolutional neural networks,” IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 6, pp. 884–896, 2017.
 [48] Z. Liu, C. Chen, L. Li, J. Zhou, X. Li, and L. Song, “Geniepath: Graph neural networks with adaptive receptive paths,” arXiv preprint arXiv:1802.00910, 2018.
 [49] C. Zhuang and Q. Ma, “Dual graph convolutional networks for graph-based semi-supervised classification,” in Proceedings of the World Wide Web Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2018, pp. 499–508.
 [50] T. Derr, Y. Ma, and J. Tang, “Signed graph convolutional network,” arXiv preprint arXiv:1808.06354, 2018.
 [51] T. Pham, T. Tran, D. Q. Phung, and S. Venkatesh, “Column networks for collective classification,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2017, pp. 2485–2491.
 [52] M. Simonovsky and N. Komodakis, “Dynamic edge-conditioned filters in convolutional neural networks on graphs,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017.
 [53] S. Kearnes, K. McCloskey, M. Berndl, V. Pande, and P. Riley, “Molecular graph convolutions: moving beyond fingerprints,” Journal of computer-aided molecular design, vol. 30, no. 8, pp. 595–608, 2016.
 [54] W. Huang, T. Zhang, Y. Rong, and J. Huang, “Adaptive sampling towards fast graph representation learning,” in Advances in Neural Information Processing Systems, 2018, pp. 4563–4572.
 [55] M. Zhang, Z. Cui, M. Neumann, and Y. Chen, “An end-to-end deep learning architecture for graph classification,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
 [56] Z. Ying, J. You, C. Morris, X. Ren, W. Hamilton, and J. Leskovec, “Hierarchical graph representation learning with differentiable pooling,” in Advances in Neural Information Processing Systems, 2018, pp. 4801–4811.
 [57] J. B. Lee, R. Rossi, and X. Kong, “Graph classification using structural attention,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2018, pp. 1666–1674.
 [58] S. Abu-El-Haija, B. Perozzi, R. Al-Rfou, and A. A. Alemi, “Watch your step: Learning node embeddings via graph attention,” in Advances in Neural Information Processing Systems, 2018, pp. 9197–9207.
 [59] T. N. Kipf and M. Welling, “Variational graph autoencoders,” arXiv preprint arXiv:1611.07308, 2016.
 [60] C. Wang, S. Pan, G. Long, X. Zhu, and J. Jiang, “Mgae: Marginalized graph autoencoder for graph clustering,” in Proceedings of the ACM on Conference on Information and Knowledge Management. ACM, 2017, pp. 889–898.
 [61] S. Pan, R. Hu, G. Long, J. Jiang, L. Yao, and C. Zhang, “Adversarially regularized graph autoencoder for graph embedding.” in Proceedings of the International Joint Conference on Artificial Intelligence, 2018, pp. 2609–2615.
 [62] W. Yu, C. Zheng, W. Cheng, C. C. Aggarwal, D. Song, B. Zong, H. Chen, and W. Wang, “Learning deep network representations with adversarially regularized autoencoders,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2018, pp. 2663–2671.
 [63] K. Tu, P. Cui, X. Wang, P. S. Yu, and W. Zhu, “Deep recursive network embedding with regular equivalence,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2018, pp. 2357–2366.
 [64] J. You, R. Ying, X. Ren, W. L. Hamilton, and J. Leskovec, “Graphrnn: A deep generative model for graphs,” Proceedings of International Conference on Machine Learning, 2018.
 [65] Y. Li, O. Vinyals, C. Dyer, R. Pascanu, and P. Battaglia, “Learning deep generative models of graphs,” in Proceedings of the International Conference on Machine Learning, 2018.
 [66] N. De Cao and T. Kipf, “Molgan: An implicit generative model for small molecular graphs,” arXiv preprint arXiv:1805.11973, 2018.
 [67] A. Bojchevski, O. Shchur, D. Zügner, and S. Günnemann, “Netgan: Generating graphs via random walks,” in Proceedings of the International Conference on Machine Learning, 2018.
 [68] T. Ma, J. Chen, and C. Xiao, “Constrained generation of semantically valid graphs via regularizing variational autoencoders,” in Advances in Neural Information Processing Systems, 2018, pp. 7110–7121.
 [69] Y. Seo, M. Defferrard, P. Vandergheynst, and X. Bresson, “Structured sequence modeling with graph convolutional recurrent networks,” arXiv preprint arXiv:1612.07659, 2016.

 [70] Y. Li, R. Yu, C. Shahabi, and Y. Liu, “Diffusion convolutional recurrent neural network: Data-driven traffic forecasting,” in Proceedings of International Conference on Learning Representations, 2018.
 [71] B. Yu, H. Yin, and Z. Zhu, “Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting,” in Proceedings of the International Joint Conference on Artificial Intelligence, 2017, pp. 3634–3640.
 [72] S. Yan, Y. Xiong, and D. Lin, “Spatial temporal graph convolutional networks for skeleton-based action recognition,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
 [73] A. Jain, A. R. Zamir, S. Savarese, and A. Saxena, “Structural-rnn: Deep learning on spatio-temporal graphs,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 5308–5317.
 [74] S. Pan, J. Wu, X. Zhu, C. Zhang, and P. S. Yu, “Joint structure feature exploration and regularization for multi-task graph classification,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 3, pp. 715–728, 2016.
 [75] S. Pan, J. Wu, X. Zhu, G. Long, and C. Zhang, “Task sensitive feature exploration and learning for multi-task graph classification,” IEEE transactions on cybernetics, vol. 47, no. 3, pp. 744–758, 2017.

 [76] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.
 [77] L. B. Almeida, “A learning rule for asynchronous perceptrons with feedback in a combinatorial environment.” in Proceedings of the International Conference on Neural Networks, vol. 2. IEEE, 1987, pp. 609–618.
 [78] F. J. Pineda, “Generalization of backpropagation to recurrent neural networks,” Physical review letters, vol. 59, no. 19, p. 2229, 1987.
 [79] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using rnn encoder-decoder for statistical machine translation,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2014, pp. 1724–1734.
 [80] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams, “Convolutional networks on graphs for learning molecular fingerprints,” in Advances in Neural Information Processing Systems, 2015, pp. 2224–2232.

 [81] K. T. Schütt, F. Arbabzadah, S. Chmiela, K. R. Müller, and A. Tkatchenko, “Quantum-chemical insights from deep tensor neural networks,” Nature communications, vol. 8, p. 13890, 2017.
 [82] B. Weisfeiler and A. Lehman, “A reduction of a graph to a canonical form and an algebra arising during this reduction,” Nauchno-Technicheskaya Informatsia, vol. 2, no. 9, pp. 12–16, 1968.
 [83] B. L. Douglas, “The weisfeiler-lehman method and graph isomorphism testing,” arXiv preprint arXiv:1101.5211, 2011.
 [84] J. Masci, D. Boscaini, M. Bronstein, and P. Vandergheynst, “Geodesic convolutional neural networks on riemannian manifolds,” in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2015, pp. 37–45.
 [85] D. Boscaini, J. Masci, E. Rodolà, and M. Bronstein, “Learning shape correspondence with anisotropic convolutional neural networks,” in Advances in Neural Information Processing Systems, 2016, pp. 3189–3197.
 [86] M. Fey, J. E. Lenssen, F. Weichert, and H. Müller, “Splinecnn: Fast geometric deep learning with continuous b-spline kernels,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 869–877.
 [87] S. Pan, J. Wu, and X. Zhu, “Cogboost: Boosting for fast cost-sensitive graph classification,” IEEE Transactions on Knowledge & Data Engineering, no. 1, pp. 1–1, 2015.
 [88] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks,” arXiv preprint arXiv:1810.00826, 2018.
 [89] S. Verma and Z.-L. Zhang, “Graph capsule convolutional neural networks,” arXiv preprint arXiv:1805.08090, 2018.
 [90] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
 [91] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural information processing systems, 2014, pp. 2672–2680.
 [92] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in Neural Information Processing Systems, 2014, pp. 3104–3112.

 [93] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in Proceedings of the international conference on Machine learning. ACM, 2008, pp. 1096–1103.
 [94] G. L. Guimaraes, B. Sanchez-Lengeling, C. Outeiral, P. L. C. Farias, and A. Aspuru-Guzik, “Objective-reinforced generative adversarial networks (organ) for sequence generation models,” arXiv preprint arXiv:1705.10843, 2017.
 [95] M. J. Kusner, B. Paige, and J. M. Hernández-Lobato, “Grammar variational autoencoder,” arXiv preprint arXiv:1703.01925, 2017.
 [96] H. Dai, Y. Tian, B. Dai, S. Skiena, and L. Song, “Syntax-directed variational autoencoder for molecule generation,” in Proceedings of the International Conference on Learning Representations, 2018.
 [97] R. Gómez-Bombarelli, J. N. Wei, D. Duvenaud, J. M. Hernández-Lobato, B. Sánchez-Lengeling, D. Sheberla, J. Aguilera-Iparraguirre, T. D. Hirzel, R. P. Adams, and A. Aspuru-Guzik, “Automatic chemical design using a data-driven continuous representation of molecules,” ACS central science, vol. 4, no. 2, pp. 268–276, 2018.
 [98] B. Chen, L. Sun, and X. Han, “Sequence-to-action: End-to-end semantic graph generation for semantic parsing,” in Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2018, pp. 766–777.
 [99] D. D. Johnson, “Learning graphical state transitions,” in Proceedings of the International Conference on Learning Representations, 2016.
 [100] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling, “Modeling relational data with graph convolutional networks,” in European Semantic Web Conference. Springer, 2018, pp. 593–607.
 [101] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, “Improved training of wasserstein gans,” in Advances in Neural Information Processing Systems, 2017, pp. 5767–5777.
 [102] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein gan,” arXiv preprint arXiv:1701.07875, 2017.
 [103] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad, “Collective classification in network data,” AI magazine, vol. 29, no. 3, p. 93, 2008.
 [104] X. Zhang, Y. Li, D. Shen, and L. Carin, “Diffusion maps for textual network embedding,” in Advances in Neural Information Processing Systems, 2018.
 [105] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su, “Arnetminer: extraction and mining of academic social networks,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2008, pp. 990–998.
 [106] Y. Ma, S. Wang, C. C. Aggarwal, D. Yin, and J. Tang, “Multidimensional graph convolutional networks,” arXiv preprint arXiv:1808.06099, 2018.
 [107] L. Tang and H. Liu, “Relational learning via latent social dimensions,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2009, pp. 817–826.
 [108] H. Wang, J. Wang, J. Wang, M. Zhao, W. Zhang, F. Zhang, X. Xie, and M. Guo, “Graphgan: Graph representation learning with generative adversarial nets,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2017.
 [109] M. Zitnik and J. Leskovec, “Predicting multicellular function through multilayer tissue networks,” Bioinformatics, vol. 33, no. 14, pp. i190–i198, 2017.
 [110] N. Wale, I. A. Watson, and G. Karypis, “Comparison of descriptor spaces for chemical compound retrieval and classification,” Knowledge and Information Systems, vol. 14, no. 3, pp. 347–375, 2008.
 [111] A. K. Debnath, R. L. Lopez de Compadre, G. Debnath, A. J. Shusterman, and C. Hansch, “Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. correlation with molecular orbital energies and hydrophobicity,” Journal of medicinal chemistry, vol. 34, no. 2, pp. 786–797, 1991.
 [112] P. D. Dobson and A. J. Doig, “Distinguishing enzyme structures from non-enzymes without alignments,” Journal of molecular biology, vol. 330, no. 4, pp. 771–783, 2003.
 [113] R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. Von Lilienfeld, “Quantum chemistry structures and properties of 134 kilo molecules,” Scientific data, vol. 1, p. 140022, 2014.
 [114] T. Joachims, “A probabilistic analysis of the rocchio algorithm with tfidf for text categorization,” Carnegie-mellon univ pittsburgh pa dept of computer science, Tech. Rep., 1996.
 [115] H. Jagadish, J. Gehrke, A. Labrinidis, Y. Papakonstantinou, J. M. Patel, R. Ramakrishnan, and C. Shahabi, “Big data and its technical challenges,” Communications of the ACM, vol. 57, no. 7, pp. 86–94, 2014.
 [116] B. N. Miller, I. Albert, S. K. Lam, J. A. Konstan, and J. Riedl, “Movielens unplugged: experiences with an occasionally connected recommender system,” in Proceedings of the International Conference on Intelligent User Interfaces. ACM, 2003, pp. 263–266.
 [117] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr, and T. M. Mitchell, “Toward an architecture for never-ending language learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2010, pp. 1306–1313.
 [118] P. Veličković, W. Fedus, W. L. Hamilton, P. Liò, Y. Bengio, and R. D. Hjelm, “Deep graph infomax,” arXiv preprint arXiv:1809.10341, 2018.
 [119] M. Zhang and Y. Chen, “Link prediction based on graph neural networks,” in Advances in Neural Information Processing Systems, 2018.
 [120] T. Kawamoto, M. Tsubaki, and T. Obuchi, “Mean-field theory of graph neural networks in graph partitioning,” in Advances in Neural Information Processing Systems, 2018, pp. 4362–4372.
 [121] D. Xu, Y. Zhu, C. B. Choy, and L. Fei-Fei, “Scene graph generation by iterative message passing,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2017.
 [122] J. Yang, J. Lu, S. Lee, D. Batra, and D. Parikh, “Graph r-cnn for scene graph generation,” in European Conference on Computer Vision. Springer, 2018, pp. 690–706.
 [123] Y. Li, W. Ouyang, B. Zhou, J. Shi, C. Zhang, and X. Wang, “Factorizable net: an efficient subgraph-based framework for scene graph generation,” in European Conference on Computer Vision. Springer, 2018, pp. 346–363.
 [124] J. Johnson, A. Gupta, and L. Fei-Fei, “Image generation from scene graphs,” arXiv preprint, 2018.
 [125] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, “Dynamic graph cnn for learning on point clouds,” arXiv preprint arXiv:1801.07829, 2018.
 [126] L. Landrieu and M. Simonovsky, “Large-scale point cloud semantic segmentation with superpoint graphs,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
 [127] G. Te, W. Hu, Z. Guo, and A. Zheng, “Rgcnn: Regularized graph cnn for point cloud segmentation,” arXiv preprint arXiv:1806.02952, 2018.
 [128] V. G. Satorras and J. B. Estrach, “Few-shot learning with graph neural networks,” in Proceedings of the International Conference on Learning Representations, 2018.
 [129] M. Guo, E. Chou, D.-A. Huang, S. Song, S. Yeung, and L. Fei-Fei, “Neural graph matching networks for few-shot 3d action recognition,” in European Conference on Computer Vision. Springer, 2018, pp. 673–689.
 [130] X. Qi, R. Liao, J. Jia, S. Fidler, and R. Urtasun, “3d graph neural networks for rgbd semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5199–5208.
 [131] L. Yi, H. Su, X. Guo, and L. J. Guibas, “Syncspeccnn: Synchronized spectral cnn for 3d shape segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6584–6592.
 [132] X. Chen, L.-J. Li, L. Fei-Fei, and A. Gupta, “Iterative visual reasoning beyond convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
 [133] M. Narasimhan, S. Lazebnik, and A. Schwing, “Out of the box: Reasoning with graph convolution nets for factual visual question answering,” in Advances in Neural Information Processing Systems, 2018, pp. 2655–2666.
 [134] Z. Cui, K. Henrickson, R. Ke, and Y. Wang, “High-order graph convolutional recurrent neural network: a deep learning framework for network-scale traffic learning and forecasting,” arXiv preprint arXiv:1802.07007, 2018.
 [135] H. Yao, F. Wu, J. Ke, X. Tang, Y. Jia, S. Lu, P. Gong, J. Ye, and Z. Li, “Deep multi-view spatial-temporal network for taxi demand prediction,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2018, pp. 2588–2595.
 [136] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “Line: Large-scale information network embedding,” in Proceedings of the International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2015, pp. 1067–1077.
 [137] A. Fout, J. Byrd, B. Shariat, and A. Ben-Hur, “Protein interface prediction using graph convolutional networks,” in Advances in Neural Information Processing Systems, 2017, pp. 6530–6539.
 [138] J. You, B. Liu, R. Ying, V. Pande, and J. Leskovec, “Graph convolutional policy network for goal-directed molecular graph generation,” in Advances in Neural Information Processing Systems, 2018.
 [139] M. Allamanis, M. Brockschmidt, and M. Khademi, “Learning to represent programs with graphs,” in Proceedings of the International Conference on Learning Representations, 2017.
 [140] J. Qiu, J. Tang, H. Ma, Y. Dong, K. Wang, and J. Tang, “Deepinf: Social influence prediction with deep learning,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2018, pp. 2110–2119.
 [141] D. Zügner, A. Akbarnejad, and S. Günnemann, “Adversarial attacks on neural networks for graph data,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2018, pp. 2847–2856.

 [142] E. Choi, M. T. Bahadori, L. Song, W. F. Stewart, and J. Sun, “Gram: graph-based attention model for healthcare representation learning,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017, pp. 787–795.
 [143] E. Choi, C. Xiao, W. Stewart, and J. Sun, “Mime: Multilevel medical embedding of electronic health records for predictive healthcare,” in Advances in Neural Information Processing Systems, 2018, pp. 4548–4558.
 [144] T. H. Nguyen and R. Grishman, “Graph convolutional networks with argument-aware pooling for event detection,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2018, pp. 5900–5907.
 [145] Z. Li, Q. Chen, and V. Koltun, “Combinatorial optimization with graph convolutional networks and guided tree search,” in Advances in Neural Information Processing Systems, 2018, pp. 536–545.
 [146] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

 [147] Q. Li, Z. Han, and X.-M. Wu, “Deeper insights into graph convolutional networks for semi-supervised learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2018.