# Learning Vertex Convolutional Networks for Graph Classification

In this paper, we develop a new aligned vertex convolutional network model to learn multi-scale local-level vertex features for graph classification. Our idea is to transform the graphs of arbitrary sizes into fixed-sized aligned vertex grid structures, and define a new vertex convolution operation by adopting a set of fixed-sized one-dimensional convolution filters on the grid structure. We show that the proposed model not only integrates the precise structural correspondence information between graphs but also minimises the loss of structural information residing on local-level vertices. Experiments on standard graph datasets demonstrate the effectiveness of the proposed model.


## 1 Introduction

Graph-based representations are powerful tools to analyze real-world structured data that encapsulates pairwise relationships between its parts [Defferrard et al.2016, Zambon et al.2018]. One fundamental challenge arising in the analysis of graph-based data is to represent discrete graph structures as numeric features that preserve the topological information. Due to the recent successes of deep learning networks in computer vision problems, many researchers have devoted their efforts to generalizing Convolutional Neural Networks (CNNs) [Vinyals et al.2015, Krizhevsky et al.2017] to the graph domain. These neural networks on graphs are now widely known as Graph Convolutional Networks (GCNs) [Kipf and Welling2016], and have proven to be an effective way to extract highly meaningful statistical features for graph classification problems [Defferrard et al.2016].

Generally speaking, most existing state-of-the-art graph convolutional networks are developed based on one of two strategies, i.e., a) the spectral strategy or b) the spatial strategy. Specifically, approaches based on the spectral strategy employ the property of the convolution operator in the graph Fourier domain, which is related to spectral graph theory [Bruna et al.2013]. By transforming the graph into the spectral domain through the eigenvectors of the Laplacian matrix, these methods perform the filter operation by multiplying the graph signal by a series of filter coefficients [Bruna et al.2013, Rippel et al.2015, Henaff et al.2015]. Unfortunately, most spectral-based approaches require the graph structures to be of the same size and cannot be applied to graphs with different sizes and Fourier bases. As a result, approaches based on the spectral strategy are usually applied to vertex classification tasks. Methods based on the spatial strategy, on the other hand, generalize the convolution operation to the spatial structure of a graph by propagating features between neighboring vertices [Vialatte et al.2016, Duvenaud et al.2015, Atwood and Towsley2016]. Since spatial-based approaches are not restricted to the same graph structure, these methods can be directly applied to graph classification problems. Unfortunately, most existing spatial-based methods have relatively poor performance on graph classification. The reason for this ineffectiveness is that these methods tend to directly sum up the local-level vertex features extracted by the convolution operation into global-level graph features through a SumPooling layer. As a result, the local topological information residing on the vertices of a graph may be discarded.

To address the shortcoming of the graph convolutional networks associated with SumPooling, a number of methods focusing on local-level vertex information have been proposed. For instance, [Niepert et al.2016] have developed a different graph convolutional network by re-ordering the vertices and converting each graph into a fixed-sized vertex grid structure, where standard one-dimensional CNNs can be directly used. [Zhang et al.2018] have developed a novel Deep Graph Convolutional Neural Network model to preserve more vertex information through global graph topologies. Specifically, they propose a new SortPooling layer to transform the vertex features extracted from unordered vertices by the spatial graph convolution layers into a fixed-sized vertex grid structure. A traditional convolution operation can then be performed by sliding a fixed-sized filter over the vertex grid structures to further learn the topological information. The aforementioned methods focus more on local-level vertex features and outperform state-of-the-art graph convolutional network models on graph classification tasks. However, they tend to sort the vertex order based on the local structure descriptor of each individual graph. As a result, they cannot easily reflect the accurate topological correspondence information between graph structures. Furthermore, these approaches also lead to significant information loss: when they form a fixed-sized vertex grid structure, some vertices with lower rankings may be discarded. In summary, developing effective methods to preserve the structural information residing in graphs still remains a significant challenge.

To overcome the shortcomings of the aforementioned methods, we propose a new graph convolutional network model, namely the Aligned Vertex Convolutional Network, to learn multi-scale features from local-level vertices for graph classification. One key innovation of the proposed model is the identification of the transitively aligned vertices between graphs. That is, given three vertices $v$, $w$ and $x$ from three sample graphs, assume $v$ and $x$ are aligned, and $w$ and $x$ are aligned; the proposed model can then guarantee that $v$ and $w$ are also aligned. More specifically, the new model utilizes the transitive alignment procedure to transform different graphs into fixed-sized aligned vertex grid structures with consistent vertex orders. Overall, the main contributions are threefold.

First, we propose a new vertex matching method to transitively align the vertices of graphs. We show that this matching procedure can establish reliable vertex correspondence information between graphs, by gradually minimizing the inner-vertex-cluster sum of squares over the vertices of all graphs through a $k$-means clustering method.

Second, with the transitive alignment information over a family of graphs to hand, we show how graphs of arbitrary sizes can be mapped into fixed-sized aligned vertex grid structures. The resulting Aligned Vertex Convolutional Network model is defined by adopting fixed-sized one-dimensional convolution filters on the grid structure that slide over the entire sequence of ordered aligned vertices. We show that the proposed model can effectively learn the multi-scale characteristics residing on the local-level vertex features for graph classification. Moreover, since all the original vertex information is mapped into the aligned vertex grid structure through the transitive alignment, the grid structure not only precisely integrates the structural correspondence information but also minimises the loss of structural information residing on local-level vertices. As a result, the proposed model addresses the shortcomings of information loss and imprecise information representation arising in existing graph convolutional networks associated with SortPooling or SumPooling.

Third, we empirically evaluate the performance of the proposed model on graph classification problems. Experiments on benchmark graph datasets demonstrate the effectiveness of the proposed model.

## 2 Transitive Vertex Alignment Method

One main objective of this work is to convert graphs of arbitrary sizes into the fixed-sized aligned vertex grid structures, so that a fixed-sized convolution filter can directly slide over the grid structures to learn local-level structural features through vertices. To this end, we need to identify the correspondence information between graphs.

In this section, we introduce a new matching method to transitively align the vertices. We commence by designating a family of prototype representations that encapsulate the principal characteristics over all vectorial vertex representations in a set of graphs $\mathbf{G}$. Assume there are $n$ vertices from all graphs in $\mathbf{G}$, and the associated $K$-dimensional vectorial representations of these vertices are $\mathbf{R}^K = \{R^K_1, R^K_2, \ldots, R^K_n\}$. We employ $k$-means [Witten et al.2011] to locate $M$ centroids over all representations in $\mathbf{R}^K$. Specifically, given $M$ clusters $\Omega = (c_1, c_2, \ldots, c_M)$, the aim of $k$-means is to minimize the objective function

$$\arg\min_{\Omega} \sum_{j=1}^{M} \sum_{R^K_i \in c_j} \left\| R^K_i - \mu^K_j \right\|^2, \tag{1}$$

where $\mu^K_j$ is the mean of the vertex representations belonging to the $j$-th cluster $c_j$. Since Eq.(1) minimizes the sum of the squared Euclidean distances between the vertex points $R^K_i$ and the centroid point $\mu^K_j$ of cluster $c_j$, the set of centroid points $\mathbf{PR}^K = \{\mu^K_1, \mu^K_2, \ldots, \mu^K_M\}$ can be seen as a family of $K$-dimensional prototype representations that encapsulate representative characteristics over all graphs in $\mathbf{G}$.
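The prototype computation of Eq.(1) can be illustrated with Lloyd's iterations for $k$-means. The sketch below is a minimal pure-Python illustration, not the authors' implementation: the names `kmeans_prototypes` and `objective` are hypothetical, initializing from a seeded random sample of $M$ points is an assumption, and Lloyd's algorithm is only guaranteed to reach a local minimum of Eq.(1).

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_prototypes(points, M, iters=100, seed=0):
    """Lloyd's k-means: return M centroids (prototype representations) that
    locally minimise the within-cluster sum of squares of Eq.(1)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, M)]  # assumed initialisation
    for _ in range(iters):
        # Assignment step: each vertex representation joins its nearest centroid.
        clusters = [[] for _ in range(M)]
        for p in points:
            nearest = min(range(M), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster members.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append([sum(m[d] for m in members) / len(members) for d in range(dim)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # centroids stopped moving
            break
        centroids = new_centroids
    return centroids


def objective(points, centroids):
    """Value of Eq.(1): squared distance of each point to its nearest centroid, summed."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)
```

For instance, `kmeans_prototypes([[0, 0], [0, 1], [10, 10], [10, 11]], 2)` returns the two pair means `[0.0, 0.5]` and `[10.0, 10.5]`, in an order that depends on the initialization.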

To establish the correspondence information between the graph vertices over all graphs in $\mathbf{G}$, we align the vectorial vertex representations of each graph to the prototype representations in $\mathbf{PR}^K$. Our alignment is similar to that introduced in [Bai et al.2015] for point matching in a pattern space. Specifically, for each sample graph $G_p(V_p, E_p) \in \mathbf{G}$ and the associated $K$-dimensional vectorial representation $R^K_i$ of each vertex $v_i \in V_p$, we compute a $K$-level affinity matrix in terms of the Euclidean distances between the two sets of points as

$$A^K_p(i,j) = \left\| R^K_i - \mu^K_j \right\|_2, \tag{2}$$

where $A^K_p$ is a $|V_p| \times M$ matrix, and each element $A^K_p(i,j)$ represents the distance between the vectorial representation $R^K_i$ of vertex $v_i \in V_p$ and the $j$-th prototype representation $\mu^K_j \in \mathbf{PR}^K$. If the value of $A^K_p(i,j)$ is the smallest in row $i$, the vertex $v_i$ is aligned to the $j$-th prototype representation. Note that for each graph there may be two or more vertices aligned to the same prototype representation. We record the correspondence information using the $K$-level correspondence matrix $C^K_p \in \{0,1\}^{|V_p| \times M}$

$$C^K_p(i,j) = \begin{cases} 1 & \text{if } A^K_p(i,j) \text{ is the smallest in row } i, \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$
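Eqs.(2) and (3) amount to a nearest-prototype assignment. A minimal sketch under stated assumptions: the helper names are hypothetical, matrices are lists of lists, and ties are broken towards the lowest prototype index, which Eq.(3) leaves unspecified.

```python
import math


def affinity_matrix(vertex_reprs, prototypes):
    """Eq.(2): A[i][j] is the Euclidean distance between vertex i and prototype j."""
    return [[math.dist(r, mu) for mu in prototypes] for r in vertex_reprs]


def correspondence_matrix(A):
    """Eq.(3): C[i][j] = 1 iff A[i][j] is the smallest entry of row i.
    Ties go to the lowest prototype index (an assumption; Eq.(3) does not say)."""
    C = []
    for row in A:
        best = min(range(len(row)), key=row.__getitem__)
        C.append([1 if j == best else 0 for j in range(len(row))])
    return C
```

With prototypes `[[0, 0], [10, 10]]`, the vertex representations `[1, 0]`, `[9, 9]` and `[0, 2]` are aligned to prototypes 0, 1 and 0, giving the rows `[1, 0]`, `[0, 1]` and `[1, 0]`.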

For a pair of graphs $G_p$ and $G_q$, if their vertices $v_p$ and $v_q$ are aligned to the same prototype representation $\mu^K_j$, we say that $v_p$ and $v_q$ possess similar characteristics and are also aligned. Thus, we can identify the transitive alignment information between the vertices of all graphs in $\mathbf{G}$ by aligning their vertices to the same set of prototype representations. The alignment process is equivalent to assigning the vectorial representation $R^K_i$ of each vertex $v_i$ to the mean $\mu^K_j$ of the cluster $c_j$. Thus, the proposed alignment procedure can be seen as an optimization process that gradually minimizes the inner-vertex-cluster sum of squares over the vertices of all graphs through $k$-means, and can establish reliable vertex correspondence information over all graphs.
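Because two vertices are declared aligned exactly when they map to the same prototype index, transitivity follows from the transitivity of equality. The hypothetical helpers below make this explicit for toy correspondence matrices of the form produced by Eq.(3):

```python
def prototype_index(C_row):
    """Index of the prototype a vertex is aligned to (the single 1 in its row of C)."""
    return C_row.index(1)


def aligned(C_p, i, C_q, k):
    """Vertex i of G_p and vertex k of G_q are aligned iff they share a prototype."""
    return prototype_index(C_p[i]) == prototype_index(C_q[k])


# Toy correspondence matrices for three graphs with M = 3 prototypes.
C_p = [[1, 0, 0], [0, 1, 0]]
C_q = [[0, 1, 0]]
C_r = [[0, 0, 1], [0, 1, 0]]

# v = vertex 1 of G_p, w = vertex 0 of G_q, x = vertex 1 of G_r.
assert aligned(C_p, 1, C_r, 1) and aligned(C_q, 0, C_r, 1)
assert aligned(C_p, 1, C_q, 0)  # transitivity: v and w are aligned as well
```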

## 3 Learning Vertex Convolutional Networks

In this section, we develop a new vertex convolutional network model for graph classification. Our idea is to employ the transitive alignment information over a family of graphs and convert arbitrarily sized graphs into fixed-sized aligned vertex grid structures. We then define a vertex convolution operation by adopting a set of fixed-sized one-dimensional convolution filters on the grid structure. With the new vertex convolution operation to hand, the proposed model can transform the original aligned vertex grid structure into a new grid structure with a reduced number of packed aligned vertices, i.e., the multi-scale vertex features learned through the convolution operation are packed into the new grid structure. Finally, we employ a Softmax layer to read the extracted vertex features and predict the graph class.

### 3.1 Aligned Vertex Grid Structures of Graphs

In this subsection, we show how to convert graphs of different sizes into fixed-sized aligned vertex grid structures. For each sample graph $G_p(V_p, E_p)$ from the graph set $\mathbf{G}$ defined earlier, assume each of its vertices is represented as a $c$-dimensional feature vector. Then the features of all the $n_p$ ($n_p = |V_p|$) vertices can be encoded using the $n_p \times c$ matrix $F_p$ (i.e., $F_p \in \mathbb{R}^{n_p \times c}$). If the graphs in $\mathbf{G}$ are vertex attributed graphs, $F_p$ can be the one-hot encoding matrix of the vertex labels. For un-attributed graphs, we propose to use the vertex degree as the vertex label. Based on the transitive alignment method defined in Section 2, we commence by identifying the family of $M$ $K$-dimensional prototype representations in $\mathbf{PR}^K$ of $\mathbf{G}$. For each graph $G_p$, we compute the $K$-level vertex correspondence matrix $C^K_p$, where the rows and columns of $C^K_p$ are indexed by the vertices in $V_p$ and the prototype representations in $\mathbf{PR}^K$, respectively. With $C^K_p$ to hand, we compute the $K$-level aligned vertex feature matrix for $G_p$ as

$$X^K_p = (C^K_p)^{\top} F_p, \tag{4}$$

where $X^K_p$ is an $M \times c$ matrix and each row of $X^K_p$ represents the feature of a corresponding aligned vertex. Since $X^K_p$ is computed by mapping the original feature information of each vertex to that of the new aligned vertices indexed by the corresponding prototypes in $\mathbf{PR}^K$, it encapsulates all the original vertex feature information of $G_p$.
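Eq.(4) can be checked on a toy example. The sketch below (a hypothetical helper operating on plain lists) shows that row $j$ of $X^K_p$ is the sum of the feature vectors of all vertices aligned to prototype $j$:

```python
def aligned_feature_matrix(C, F):
    """Eq.(4): X = C^T F. C is n_p x M (0/1), F is n_p x c; X is M x c.
    Row j of X sums the features of every vertex aligned to prototype j
    (an all-zero row if no vertex is aligned to it)."""
    n_p, M = len(C), len(C[0])
    c = len(F[0])
    return [[sum(C[i][j] * F[i][s] for i in range(n_p)) for s in range(c)] for j in range(M)]


# Three vertices with one-hot labels, aligned to prototypes 0, 2 and 0.
C = [[1, 0, 0], [0, 0, 1], [1, 0, 0]]
F = [[1, 0], [0, 1], [0, 1]]
X = aligned_feature_matrix(C, F)  # [[1, 1], [0, 0], [0, 1]]
```

Every vertex contributes its features to exactly one row, so no vertex is dropped; vertices that share a prototype are, however, summed into the same row.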

To construct the fixed-sized aligned vertex grid structure for each graph $G_p$, we need to establish a consistent vertex order for all graphs in $\mathbf{G}$. As the vertices are all aligned to the same prototype representations, the vertex orders can be determined by reordering the prototype representations. To this end, we construct a prototype graph that captures the pairwise similarity between the prototype representations, and then we reorder the prototype representations based on their degree. This process is equivalent to sorting the prototypes in order of average similarity to the remaining ones. Specifically, for the $M$ $K$-dimensional prototype representations in $\mathbf{PR}^K$, we compute the prototype graph as $G_{\mathrm{PR}}(V_{\mathrm{PR}}, E_{\mathrm{PR}})$, where each vertex $v_j \in V_{\mathrm{PR}}$ represents the prototype representation $\mu^K_j$ and each edge $(v_j, v_k) \in E_{\mathrm{PR}}$ represents the similarity between a pair of prototype representations $\mu^K_j$ and $\mu^K_k$. The similarity between two vertices of $G_{\mathrm{PR}}$ is computed as

$$s(\mu^K_j, \mu^K_k) = \exp\left( -\frac{\left\| \mu^K_j - \mu^K_k \right\|_2}{K} \right). \tag{5}$$

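The degree of each prototype representation $\mu^K_j$ is $\delta(\mu^K_j) = \sum_{k=1}^{M} s(\mu^K_j, \mu^K_k)$. We sort the $K$-dimensional prototype representations in $\mathbf{PR}^K$ according to their degree $\delta(\mu^K_j)$. Then, we rearrange $X^K_p$ accordingly.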
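The ordering step can be sketched as follows. The function names are hypothetical, and sorting in descending order of degree is an assumption, since the text does not state the direction:

```python
import math


def prototype_order(prototypes, K):
    """Eq.(5) similarities, degrees delta(mu_j) = sum_k s(mu_j, mu_k), and the
    prototype indices sorted by degree (descending order is an assumption)."""
    M = len(prototypes)
    S = [[math.exp(-math.dist(prototypes[j], prototypes[k]) / K) for k in range(M)]
         for j in range(M)]
    degrees = [sum(row) for row in S]
    order = sorted(range(M), key=lambda j: degrees[j], reverse=True)
    return order, degrees


def reorder_rows(X, order):
    """Rearrange the rows of an aligned vertex feature matrix X to follow `order`."""
    return [X[j] for j in order]
```

Including the self-similarity $s(\mu^K_j, \mu^K_j) = 1$ adds the same constant to every degree, so it does not affect the resulting order.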

Finally, note that, to construct reliable grid structures for graphs, we employ the depth-based (DB) representations as the vectorial vertex representations to compute the required $K$-level vertex correspondence matrix $C^K_p$. The DB representation of each vertex is defined by measuring the entropies on a family of $k$-layer expansion subgraphs rooted at the vertex [Bai and Hancock2014], where the parameter $k$ varies from $1$ to $K$. It has been shown that such a $K$-dimensional DB representation encapsulates rich entropy content flow from each local vertex to the global graph structure, as a function of depth [Bai and Hancock2014]. The process of computing the correspondence matrix $C^K_p$ associated with DB representations is shown in the appendix. When we vary the largest layer $K$ of the expansion subgraphs from $1$ to $L$ (i.e., $K \le L$), we compute the final aligned vertex grid structure for each graph $G_p$ as

$$X_p = \sum_{K=1}^{L} \frac{X^K_p}{L}, \tag{6}$$

where $X_p$ is also an $M \times c$ matrix, the same size as $X^K_p$. Clearly, Eq.(6) transforms the original graphs of arbitrary sizes into new aligned vertex grid structures with the same number of vertices, $M$. Moreover, note that the aligned vertex grid structure also preserves the original vertex feature information through the $K$-level aligned vertex feature matrices $X^K_p$.
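To make the DB representation more concrete, a simplified version can be sketched as below. It rests on assumptions that should not be read as the exact definition in [Bai and Hancock2014]: the $k$-layer expansion subgraph is taken as the subgraph induced by all vertices within shortest-path distance $k$ of the root, and its entropy is taken to be the Shannon entropy of the steady-state random walk, $P(u) = d(u) / \sum_{w} d(w)$, with degrees measured inside the subgraph. Graphs are adjacency dictionaries, and all function names are hypothetical.

```python
import math
from collections import deque


def expansion_subgraph(adj, root, k):
    """Vertices within shortest-path distance k of `root`, found by BFS."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond the k-th layer
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def shannon_entropy_rw(adj, nodes):
    """Shannon entropy of the steady-state random walk P(u) = d(u) / sum(d)
    on the subgraph induced by `nodes` (0.0 if the subgraph has no edges)."""
    deg = {u: sum(1 for w in adj[u] if w in nodes) for u in nodes}
    total = sum(deg.values())
    if total == 0:
        return 0.0
    return -sum(d / total * math.log(d / total) for d in deg.values() if d > 0)


def db_representation(adj, root, K):
    """K-dimensional DB-style vector: entropies of the 1..K layer expansion subgraphs."""
    return [shannon_entropy_rw(adj, expansion_subgraph(adj, root, k)) for k in range(1, K + 1)]
```

On the path graph `{0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}`, the first entry for vertex 0 is $\ln 2$, since its 1-layer expansion subgraph is a single edge whose two endpoints are equally likely under the random walk.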

### 3.2 The Aligned Vertex Convolutional Network

In this subsection, we develop a new Aligned Vertex Convolutional Network model that learns local-level vertex features for graph classification. This model is defined by adopting a set of fixed-sized one-dimensional convolution filters on the aligned vertex grid structures and sliding the filters over the ordered aligned vertices to learn features, in a manner analogous to the standard convolution operation. Specifically, for each graph $G_p$ and its associated aligned vertex grid structure $X_p \in \mathbb{R}^{M \times c}$ (i.e., $M$ aligned vertices each with $c$ feature channels), we denote the element of $X_p$ in the $i$-th row and $s$-th column as $X_{i,s}$, i.e., the $s$-th feature channel of the $i$-th aligned vertex. We pass $X_p$ to the convolution layer. Assume the size of the receptive field is $m$, i.e., the size of the one-dimensional convolution filter is $m$. The vertex convolution operation with stride 1 then takes the form

$$Z_{e,h} = \sigma\left( \sum_{s=1}^{c} \left( \sum_{j=1}^{m} W^{h,s}_j X_{e+j-1,s} \right) + b_h \right), \tag{7}$$

where $Z_{e,h}$ is the element in the $e$-th row and $h$-th column of the new grid structure $Z_p$ after the convolution operation, the parameter $e$ satisfies $1 \le e \le M - m + 1$, $W^{h,s}_j$ is the $j$-th element of the convolution filter $W^{h,s}$ that maps the $s$-th feature channel of $X_p$ to the $h$-th feature channel of $Z_p$, $b_h$ is the bias of the $h$-th convolution filter, and $\sigma(\cdot)$ is the activation function.
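A direct, loop-based sketch of Eq.(7) with stride 1 follows. It is a hypothetical helper rather than the authors' code: it uses 0-based indices, stores the filters as `W[h][s][j]`, and defaults to ReLU as $\sigma$, the activation used in the experiments.

```python
def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)


def vertex_conv(X, W, b, activation=relu):
    """Eq.(7) with stride 1 (0-based indices, so X[e + j] plays the role of X_{e+j-1}).

    X: M x c aligned vertex grid; W: one c x m weight array per output channel h,
    indexed W[h][s][j]; b: one bias per output channel. Returns (M - m + 1) x len(W)."""
    M, c = len(X), len(X[0])
    m = len(W[0][0])
    return [
        [activation(sum(W[h][s][j] * X[e + j][s] for s in range(c) for j in range(m)) + b[h])
         for h in range(len(W))]
        for e in range(M - m + 1)
    ]
```

This is the "valid" 1-D cross-correlation that deep learning libraries expose as a 1-D convolution (for example PyTorch's `torch.nn.Conv1d` with its default `padding=0` and `stride=1`), applied along the $M$ aligned vertices with $c$ input channels once $X_p$ is transposed to a channels-first layout.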

An example of the vertex convolution operation defined by Eq.(7) is shown in Figure 1. The vertex convolution operation consists of two computational steps. In the first step, the convolution filter maps a window of $m$ consecutive aligned vertices (the $e$-th to the $(e+m-1)$-th) into a new feature value, associated with all the feature channels of these vertices. Figure 1.(1) exhibits this process. Here, assume the vertex index $e = 1$ and the convolution filter size $m = 3$, so that the window is centered on the 2-nd aligned vertex of $X_p$. The convolution filter represented by the red lines first maps the $s$-th feature channels of the 2-nd aligned vertex and its neighbor vertices (the 1-st and 3-rd aligned vertices) into a single value $\sum_{j=1}^{3} W^{h,s}_j X_{j,s}$, and then sums up the values computed over all the channels to obtain the $h$-th feature channel of the new grid structure. Moreover, the convolution filter needs to slide over all the aligned vertices, and this requires three convolution filters, represented by the green, red and blue lines, respectively. The weights of the three filters are shared, i.e., they are in fact the same filter. Finally, the second step $Z_{e,h} = \sigma(\tilde{Z}_{e,h} + b_h)$, where $\tilde{Z}_{e,h} = \sum_{s=1}^{c} \sum_{j=1}^{m} W^{h,s}_j X_{e+j-1,s}$, applies the ReLU function together with the bias $b_h$ and outputs the final result $Z_{e,h}$.

To further extract the multi-scale features of a graph $G_p$ associated with its aligned vertex grid structure $X_p$, we stack multiple vertex convolution layers defined as follows

$$Z^t_{e,h} = \sigma\left( \sum_{s=1}^{c} \left( \sum_{j=1}^{m} W^{t,h,s}_j X^{t-1}_{e+j-1,s} \right) + b_{t,h} \right), \tag{8}$$

where $X^0$ is the input aligned vertex grid structure $X_p$, the output $Z^{t-1}$ of the $(t-1)$-th layer serves as the input $X^{t-1}$ of the $t$-th layer, and the corresponding notations of the symbols are listed in Table 1. After a number of vertex convolution operations, we employ a Softmax layer to read the extracted features computed from the vertex convolution layers and predict the graph class for graph classification.
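Stacking makes the shrinking of the grid explicit: with stride 1, a layer with filter size $m$ maps $M$ aligned vertices to $M - m + 1$. The hypothetical helper below tracks the number of aligned vertices through a stack of layers; a general stride $t$ is included as an assumption, since Section 4 also sets a stride for each filter, using the usual "valid" convolution formula $\lfloor (M - m)/t \rfloor + 1$.

```python
def grid_sizes(M, filter_sizes, stride=1):
    """Number of aligned vertices after each stacked vertex convolution layer.
    A 'valid' 1-D convolution with filter size m and stride t maps M rows to
    (M - m) // t + 1 rows; Eq.(7) and Eq.(8) correspond to t = 1."""
    sizes = [M]
    for m in filter_sizes:
        if m > sizes[-1]:
            raise ValueError("filter larger than the current grid")
        sizes.append((sizes[-1] - m) // stride + 1)
    return sizes
```

For example, `grid_sizes(5, [3])` gives `[5, 3]`: a filter of size 3 sliding over 5 aligned vertices occupies three positions.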

Discussions: Compared to existing state-of-the-art graph convolutional networks, the proposed Aligned Vertex Convolutional Network (AVCN) model has a number of advantages.

First, unlike the Neural Graph Fingerprint Network (NGFN) model [Duvenaud et al.2015] and the Diffusion Convolution Neural Network (DCNN) model [Atwood and Towsley2016], which both employ a SumPooling layer to directly sum up the local-level vertex features extracted by the convolution operation into global-level graph features, the proposed AVCN model focuses more on learning local structural features through the proposed aligned vertex grid structure. Specifically, Figure 1 indicates that the vertex convolution operation of the proposed AVCN model converts the original aligned vertex grid structure into a new grid structure by packing the aligned vertex features of the original grid structure into the new one. Thus, the new grid structure can be seen as a newly extracted aligned vertex grid structure with a reduced number of aligned vertices. As a result, the proposed AVCN model can gradually extract multi-scale local-level vertex features through a number of stacked vertex convolution layers, and encapsulate more significant local structural information than the NGFN and DCNN models associated with SumPooling.

Second, similar to the proposed AVCN model, both the PATCHY-SAN based Graph Convolution Neural Network (PSGCNN) model [Niepert et al.2016] and the Deep Graph Convolution Neural Network (DGCNN) model [Zhang et al.2018] need to rearrange the vertex order of each graph structure and transform each graph into a fixed-sized vertex grid structure. Unfortunately, both the PSGCNN and the DGCNN models sort the vertices of each graph based on a local structural descriptor, ignoring consistent vertex correspondence information between different graphs. By contrast, the proposed AVCN model relies on a transitive vertex alignment procedure to transform each graph into an aligned fixed-sized vertex grid structure. As a result, among these models only the proposed AVCN model can integrate the precise structural correspondence information over all graphs under investigation.

Third, when the PSGCNN model and the DGCNN model form fixed-sized vertex grid structures, some vertices with lower ranking will be discarded. This in turn leads to significant information loss. By contrast, the required aligned vertex grid structures for the proposed AVCN model can encapsulate all the original vertex features from the original graphs. As a result, the proposed AVCN overcomes the shortcoming of information loss arising in the PSGCNN and DGCNN models.

## 4 Experiments

In this section, we compare the performance of the proposed AVCN model to both state-of-the-art graph kernels and deep learning methods on graph classification problems on eight standard graph datasets. These datasets are abstracted from bioinformatics, computer vision and social networks. A selection of statistics of these datasets is shown in Table 2.

Experimental Setup: We evaluate the performance of the proposed AVCN model on graph classification problems against a) six alternative state-of-the-art graph kernels and b) six alternative state-of-the-art deep learning methods for graphs. Specifically, the graph kernels include 1) the Jensen-Tsallis q-difference kernel (JTQK) with  [Bai et al.2014], 2) the Weisfeiler-Lehman subtree kernel (WLSK) [Shervashidze et al.2010], 3) the shortest path graph kernel (SPGK) [Borgwardt and Kriegel2005], 4) the shortest path kernel based on core variants (CORE SP) [Nikolentzos et al.2018], 5) the random walk graph kernel (RWGK) [Kashima et al.2003], and 6) the graphlet count kernel (GK) [Shervashidze et al.2009]. The deep learning methods include 1) the deep graph convolutional neural network (DGCNN) [Zhang et al.2018], 2) the PATCHY-SAN based convolutional neural network for graphs (PSGCNN) [Niepert et al.2016], 3) the diffusion convolutional neural network (DCNN) [Atwood and Towsley2016], 4) the deep graphlet kernel (DGK) [Yanardag and Vishwanathan2015], 5) the graph capsule convolutional neural network (GCCNN) [Verma and Zhang2018], and 6) the feature-driven anonymous walk embeddings (AWE) [Ivanov and Burnaev2018].

For the experiment, the proposed AVCN model uses the same network structure on all graph datasets. Specifically, we set the number of channels of each vertex convolution operation as , and the number of the prototype representations as , i.e., the vertex numbers of the aligned vertex grid structures for the graphs in any dataset are all . To extract different hierarchical multi-scale local vertex features, we propose to input the aligned vertex grid structure of each graph to a family of parallel stacked vertex convolution layers associated with different convolution filter sizes. Specifically, the architecture of the AVCN model is ---. Here, denotes a vertex convolution layer consisting of parallel vertex convolution filters each with channels, and the filter sizes are , , and respectively. denotes a fully-connected layer consisting of hidden units. An example of the architecture -- for the proposed AVCN model is shown in Figure 2. We set the stride of each filter in layer as . With the extracted patterns learned from the parallel stacked vertex convolution layers to hand, we concatenate them and add a new fully-connected layer followed by a Softmax layer to learn the graph class. We set the dropout rate for the fully connected layer as . We employ the rectified linear units (ReLU) as the activation function for the convolution layers. The only hyperparameters that need to be optimized are the learning rate, the number of epochs, and the batch size for the mini-batch gradient descent algorithm. To optimize our AVCN model, we utilize Stochastic Gradient Descent with the Adam update rules. Finally, note that our AVCN model needs to construct the prototype representations to identify the transitive vertex alignment information over all graphs. In this evaluation, we compute the prototype representations from both the training and testing graphs. Thus, our model is an instance of transductive learning [Gammerman et al.1998], where all graphs are used to compute the prototype representations but the class labels of the testing graphs are not used during the training process. For our model, we perform 10-fold cross-validation to compute the classification accuracies, with nine folds for training and one fold for testing. For each dataset, we repeat the experiment 10 times and report the average classification accuracies and standard errors in Table 3.
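The 10-fold protocol can be sketched as follows. This is a hypothetical helper using a plain shuffled split; whether the original folds were stratified by class is not stated, so no stratification is assumed. Under the transductive setting described above, the prototypes of Section 2 would be computed from all graphs, while only the labels of the `train` indices would be used for fitting.

```python
import random


def k_fold_splits(n_graphs, n_folds=10, seed=0):
    """Yield (train_indices, test_indices): each fold serves as the test set
    once, and the remaining n_folds - 1 folds form the training set."""
    idx = list(range(n_graphs))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        test = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, test
```

Repeating the whole procedure with different seeds and averaging the accuracies mirrors the 10 repetitions reported above.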

For the alternative kernel methods, we set the parameters of the maximum subtree height for both the WLSK and JTQK kernels as , based on the previous empirical studies in the original papers. For each alternative graph kernel, we perform 10-fold cross-validation associated with the LIBSVM implementation of C-Support Vector Machines (C-SVM) to compute the classification accuracies. We repeat the experiment 10 times for each kernel and dataset, and we report the average classification accuracies and standard errors in Table 3. Note that for some kernels we directly report the best results from the corresponding original papers, since the evaluation of these kernels followed the same setting as ours. For the alternative deep learning methods, on the other hand, we report the best results for the PSGCNN and DGK models from their original papers, since these methods were evaluated with the same setting as the proposed AVCN model. For the DCNN model, we report the best results from the work of Zhang et al. [Zhang et al.2018], which follows the same setting as ours. For the AWE model, we report the classification accuracies of the feature-driven AWE, since the authors have stated that this kind of AWE model can achieve competitive performance on labeled datasets. Finally, although the PSGCNN model can leverage additional edge features, most of the graph datasets and the alternative methods do not use edge features. Thus, we do not report the results associated with edge features in the evaluation. The classification accuracies and standard errors for each deep learning method are shown in Table 4. Since none of the alternative deep learning methods has been evaluated on the Reeb and GatorBait datasets (abstracted from computer vision) by any author, we do not include the accuracies of these methods on those datasets.

Experimental Results and Discussions: Table 3 and Table 4 indicate that the proposed AVCN model generally outperforms the alternative state-of-the-art methods, including both the graph kernels and the deep learning methods for graphs. Specifically, for the alternative graph kernels, only the accuracy of the SPGK kernel on the IMDB-M dataset is slightly higher than that of the proposed AVCN model. For the alternative deep learning methods, on the other hand, only the accuracies of the GCCNN model on the PROTEINS dataset and the AWE model on the IMDB-M dataset are slightly higher than those of the proposed AVCN model. The reasons for this effectiveness are threefold. First, the alternative graph kernels are typical examples of R-convolution kernels and are based on measuring any pair of substructures, ignoring the correspondence information between the substructures. By contrast, the proposed model, associated with the aligned vertex grid structure, incorporates the transitive alignment information between graphs, and thus better reflects graph characteristics. Furthermore, the C-SVM classifier associated with graph kernels can only be seen as a shallow learning framework [Zhang et al.2015]. By contrast, the proposed model provides an end-to-end deep learning architecture, and thus better learns graph characteristics. Second, similar to the alternative graph kernels, none of the alternative deep learning methods integrates the correspondence information between graphs into the learning architecture. In particular, the PSGCNN and DGCNN models need to reorder the vertices, and some vertices may be discarded, leading to information loss. By contrast, the aligned vertex grid structures of the proposed model preserve all the information of the original graphs. Third, unlike the proposed model, the DCNN model needs to sum up the extracted local-level vertex features as global-level graph features. By contrast, the proposed model can learn richer multi-scale local-level vertex features. The experiments demonstrate the effectiveness of the proposed model.

## 5 Conclusion

In this paper, we have developed a new aligned vertex convolutional network model for graph classification. The proposed model not only integrates the precise structural correspondence information between graphs but also minimises the loss of structural information residing on local-level vertices. Experiments demonstrate the effectiveness of the proposed vertex convolutional network model.

## References

• [Atwood and Towsley2016] James Atwood and Don Towsley. Diffusion-convolutional neural networks. In Proceedings of NIPS, pages 1993–2001, 2016.
• [Bai and Hancock2014] Lu Bai and Edwin R. Hancock. Depth-based complexity traces of graphs. Pattern Recognition, 47(3):1172–1186, 2014.
• [Bai et al.2014] Lu Bai, Luca Rossi, Horst Bunke, and Edwin R. Hancock. Attributed graph kernels using the jensen-tsallis q-differences. In Proceedings of ECML-PKDD, pages 99–114, 2014.
• [Bai et al.2015] Lu Bai, Luca Rossi, Zhihong Zhang, and Edwin R. Hancock. An aligned subtree kernel for weighted graphs. In Proceedings of ICML, pages 30–39, 2015.
• [Borgwardt and Kriegel2005] Karsten M. Borgwardt and Hans-Peter Kriegel. Shortest-path kernels on graphs. In Proceedings of the IEEE International Conference on Data Mining, pages 74–81, 2005.
• [Bruna et al.2013] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. CoRR, abs/1312.6203, 2013.
• [Defferrard et al.2016] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of NIPS, pages 3837–3845, 2016.
• [Duvenaud et al.2015] David K. Duvenaud, Dougal Maclaurin, Jorge Aguilera-Iparraguirre, Rafael Gómez-Bombarelli, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P. Adams. Convolutional networks on graphs for learning molecular fingerprints. In Proceedings of NIPS, pages 2224–2232, 2015.
• [Gammerman et al.1998] Alexander Gammerman, Katy S. Azoury, and Vladimir Vapnik. Learning by transduction. In Proceedings of UAI, pages 148–155, 1998.
• [Henaff et al.2015] Mikael Henaff, Joan Bruna, and Yann LeCun. Deep convolutional networks on graph-structured data. CoRR, abs/1506.05163, 2015.
• [Ivanov and Burnaev2018] Sergey Ivanov and Evgeny Burnaev. Anonymous walk embeddings. In Proceedings of ICML, pages 2191–2200, 2018.
• [Kashima et al.2003] Hisashi Kashima, Koji Tsuda, and Akihiro Inokuchi. Marginalized kernels between labeled graphs. In Proceedings of ICML, pages 321–328, 2003.
• [Kipf and Welling2016] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. CoRR, abs/1609.02907, 2016.
• [Krizhevsky et al.2017] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. Commun. ACM, 60(6):84–90, 2017.
• [Niepert et al.2016] Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. Learning convolutional neural networks for graphs. In Proceedings of ICML, pages 2014–2023, 2016.
• [Nikolentzos et al.2018] Giannis Nikolentzos, Polykarpos Meladianos, Stratis Limnios, and Michalis Vazirgiannis. A degeneracy framework for graph similarity. In Proceedings of IJCAI, pages 2595–2601, 2018.
• [Rippel et al.2015] Oren Rippel, Jasper Snoek, and Ryan P. Adams. Spectral representations for convolutional neural networks. In Proceedings of NIPS, pages 2449–2457, 2015.
• [Shervashidze et al.2009] N. Shervashidze, S.V.N. Vishwanathan, K. Mehlhorn, T. Petri, and K. M. Borgwardt. Efficient graphlet kernels for large graph comparison. Journal of Machine Learning Research, 5:488–495, 2009.
• [Shervashidze et al.2010] Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M. Borgwardt. Weisfeiler-lehman graph kernels. Journal of Machine Learning Research, 1:1–48, 2010.
• [Verma and Zhang2018] Saurabh Verma and Zhi-Li Zhang. Graph capsule convolutional neural networks. CoRR, abs/1805.08090, 2018.
• [Vialatte et al.2016] Jean-Charles Vialatte, Vincent Gripon, and Grégoire Mercier. Generalizing the convolution operator to extend cnns to irregular domains. CoRR, abs/1606.01166, 2016.
• [Vinyals et al.2015] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. In Proceedings of CVPR, pages 3156–3164, 2015.
• [Witten et al.2011] Ian H. Witten, Eibe Frank, and Mark A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2011.
• [Yanardag and Vishwanathan2015] Pinar Yanardag and S. V. N. Vishwanathan. Deep graph kernels. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10-13, 2015, pages 1365–1374, 2015.
• [Zambon et al.2018] Daniele Zambon, Cesare Alippi, and Lorenzo Livi. Concept drift and anomaly detection in graph streams. IEEE Trans. Neural Netw. Learning Syst., 29(11):5592–5605, 2018.
• [Zhang et al.2015] Shi-Xiong Zhang, Chaojun Liu, Kaisheng Yao, and Yifan Gong. Deep neural support vector machines for speech recognition. In Proceedings of ICASSP, pages 4275–4279, 2015.
• [Zhang et al.2018] Muhan Zhang, Zhicheng Cui, Marion Neumann, and Yixin Chen. An end-to-end deep learning architecture for graph classification. In Proceedings of AAAI, 2018.