A Quantum Spatial Graph Convolutional Neural Network using Quantum Passing Information

09/04/2018
by   Lu Bai, et al.

In this paper, we develop a new Quantum Spatial Graph Convolutional Neural Network (QSGCNN) model that can directly learn a classification function for graphs of arbitrary sizes. Unlike state-of-the-art Graph Convolutional Neural Network (GCN) models, the proposed QSGCNN model incorporates the process of identifying transitive aligned vertices between graphs, and transforms arbitrary sized graphs into fixed-sized aligned vertex grid structures. To further learn representative graph characteristics, a new quantum spatial graph convolution is proposed and employed to extract multi-scale vertex features, in terms of quantum passing information between grid vertices of each graph. Since the quantum spatial convolution preserves the property of the input grid structures, the proposed QSGCNN model allows us to directly employ the traditional convolutional neural network to further learn from the global graph topology, providing an end-to-end deep learning architecture that integrates the graph representation and learning in the quantum spatial graph convolution layer and the traditional convolutional layer for graph classification. We demonstrate the effectiveness of the proposed QSGCNN model in terms of its theoretical connections to state-of-the-art methods. The proposed QSGCNN model addresses the shortcomings of information loss and imprecise information representation arising in existing GCN models associated with SortPooling or SumPooling layers. Experimental results on benchmark graph classification datasets demonstrate the effectiveness of the proposed QSGCNN model.


I Introduction

Graph-based representations have been widely employed to model and analyze data that lies on high-dimensional non-Euclidean domains and that is naturally described in terms of pairwise relationships between its parts [1]. Typical instances of problems where data can be represented using graphs include a) classifying proteins or chemical compounds [2], b) recognizing objects from digital images [3], and c) visualizing complex networks [4] (e.g., Twitter and Facebook social networks, DBLP citation networks, transportation networks, etc.). A fundamental challenge arising in the analysis of real-world data represented as graphs is the lack of a clear and accurate way to convert discrete graph structures into numeric features that can be directly analyzed by standard machine learning techniques [5].

To this end, the aim of this paper is to develop a new graph convolutional neural network model using quantum information passing between vertices, for graph classification tasks. Our model is based on identifying the transitive alignment information between the vertices of all different graphs. The alignment procedure can not only provide a way of mapping each graph into a fixed-sized vertex grid structure, but also bridge the gap between the graph convolution layer and the traditional convolutional neural network layer.

I-A Literature Review

There have been a large number of approaches that can convert graph structures into numeric characteristics and thus provide a way of directly applying standard machine learning algorithms to problems of graph classification or clustering. Generally speaking, over the last three decades, most classical state-of-the-art approaches to the analysis of graph structures can be divided into two classes, namely 1) graph embedding methods and 2) graph kernels. The methods from the first class aim to represent graphs as vectors of permutation-invariant features, so that one can directly employ standard vectorial machine learning algorithms [6, 7, 8]. All these approaches are based on the computation of explicit embeddings onto low-dimensional vector spaces, which inevitably leads to the loss of structural information. Graph kernels, on the other hand, try to soften this limitation by (implicitly) mapping graphs to a high-dimensional Hilbert space where the structural information is better preserved [9]. The majority of state-of-the-art graph kernels are instances of the R-convolution kernel originally proposed by Haussler [10]. The main idea underpinning R-convolution graph kernels is that of decomposing graphs into simple substructures (e.g., walks, paths, subtrees, and subgraphs) and then measuring the similarity between a pair of input graphs in terms of the similarity between their constituent substructures. Representative R-convolution graph kernels include the Weisfeiler-Lehman subtree kernel [11], the subgraph matching kernel [2], and the aligned subtree kernel [12]. A common limitation shared by both graph embedding methods and graph kernels is that of ignoring information from multiple graphs. This is because graph embedding methods usually capture structural features of individual graphs, while graph kernels reflect structural characteristics of pairs of graphs.

In recent years, deep learning networks have emerged as an effective way to extract highly meaningful statistical patterns from large-scale and high-dimensional data [13]. As evidenced by their recent successes in computer vision problems, convolutional neural networks (CNNs) [14, 15] are one of the most popular classes of deep learning architectures, and many researchers have devoted their efforts to generalizing CNNs to the graph domain [16]. Unfortunately, applying CNNs to graphs in a straightforward way tends to be elusive, since these networks are designed to operate on regular grids [1] and the associated operations of convolution, pooling and weight-sharing cannot be easily extended to graphs.

To address the aforementioned problem, two leading strategies, namely a) the spectral strategy and b) the spatial strategy, have been proposed and employed to extend convolutional neural networks to graph structures. Specifically, approaches using the spectral strategy draw on the properties of convolution operators in the graph Fourier domain, which is related to the graph Laplacian [17, 18, 19]. By transforming the graph into the spectral domain through the eigenvectors of the Laplacian matrix, the graph signal can be multiplied by an array of filter coefficients to perform the filtering operation. Unfortunately, most spectral-based approaches require the size of the graph structures to be the same and cannot be performed on graphs with different sizes and Fourier bases. As a result, approaches based on the spectral strategy are usually applied to vertex classification tasks. By contrast, methods based on the spatial strategy are not restricted to graphs with the same structure. These methods generalize the convolution operation to the spatial structure of a graph by propagating features between neighboring vertices [20]. For instance, Duvenaud et al. [21] have proposed a Neural Graph Fingerprint Network that propagates vertex features between their k-layer neighbors to simulate the traditional circular fingerprint. Atwood and Towsley [22] have proposed a Diffusion Convolution Neural Network that propagates vertex features between neighbors at different layers rooted at a vertex. Although the spatial strategy-based approaches can be directly applied to real-world graph classification tasks, most existing methods perform relatively poorly on graph classification. This is because these methods tend to directly sum up the extracted local-level vertex features from the convolution operation into global-level graph features through a SumPooling layer, and it is difficult to further learn the graph topological information from such global features.

To overcome the shortcoming of graph convolutional neural networks associated with SumPooling, unlike the works in [21] and [22], Niepert et al. [23] have developed a different graph convolutional neural network by constructing a fixed-sized local neighborhood for each vertex and re-ordering the vertices based on graph labeling methods and graph canonization tools. This procedure naturally forms a fixed-sized vertex grid structure for each graph, and the graph convolution operation can be performed by sliding a fixed-sized filter over spatially neighboring vertices. This operation is similar to that performed on images with standard convolutional neural networks. Zhang et al. [24] have developed a novel Deep Graph Convolutional Neural Network model that can preserve more vertex information and learn from the global graph topology. Specifically, this model utilizes a newly developed SortPooling layer, which transforms the extracted features of the unordered vertices from the spatial graph convolution layers into a fixed-sized vertex grid structure. A traditional convolutional neural network can then be applied to the grid structures to further learn the graph topological information.

Although both the methods of Niepert et al. and Zhang et al. outperform state-of-the-art graph convolutional neural network models and graph kernels on graph classification tasks, these approaches either ignore structural correspondence information between graphs, or rely on simple but inaccurate heuristics to align the vertices of the graphs, i.e., they sort the vertex orders based on the local structural descriptor of each individual graph and ignore the vertex correspondence information between different graphs. As a result, neither method can reflect precise topological correspondence information between graph structures. Moreover, these approaches also lead to significant information loss. This usually occurs when they form the fixed-sized vertex grid structure and vertices with lower rankings are discarded. In summary, developing effective methods to preserve the structural information residing in graphs remains a challenge.

I-B Contribution

The aim of this paper is to overcome the shortcomings of the aforementioned methods by developing a new spatial graph convolutional neural network model. One key innovation of the new model is the identification of transitive vertex alignment information between graphs. Specifically, the new model employs the transitive alignment information to map different-sized graphs into fixed-sized aligned representations, i.e., it transforms different graphs into fixed-sized aligned grid structures with consistent vertex orders. Since the aligned grid structure can precisely integrate the structural correspondence information and completely preserve the original graph topology and vertex feature information without any information loss, it can not only bridge the gap between the spatial graph convolution layer and the traditional convolutional neural network layer, but also address the shortcomings of information loss and imprecise information representation arising in most state-of-the-art graph convolutional neural networks associated with SortPooling or SumPooling layers. Overall, the main contributions of this work are threefold.

  • First, we develop a new framework for transitively aligning the vertices of a family of graphs in terms of depth-based entropy measures. This framework can establish reliable vertex correspondence information between graphs, by gradually minimizing the inner-vertex-cluster sum of squares over the vertices of all graphs. We show that this framework can be further employed to map graphs of arbitrary sizes into fixed-sized aligned vertex grid structures and associated corresponding aligned grid vertex adjacency matrices, integrating precise structural correspondence information and without any structural information loss. The resulting grid structures can bridge the gap between the spatial graph convolution layer and the traditional convolutional neural network layer.

  • Second, with the aligned vertex grid structure and the associated aligned grid vertex adjacency matrix of each graph to hand, we propose a novel quantum spatial graph convolution layer to extract multi-scale vertex features, in terms of the quantum information passing between vertices. The quantum information is formulated by the average mixing matrix of the continuous-time quantum walk evolved on the associated adjacency matrix. We show that the new convolution layer theoretically overcomes the shortcoming of popular graph convolutional neural networks and graph kernels, explaining why it can effectively work. Moreover, since the convolution layer also preserves the property of the input grid structures, it allows us to directly employ the traditional convolutional neural network to further learn from the global graph topology, providing an end-to-end deep learning architecture that integrates the graph representation and learning in the quantum spatial graph convolution layer and the traditional convolutional layer for graph classifications.

  • Third, we empirically evaluate the proposed Quantum Spatial Graph Convolutional Neural Network (QSGCNN). Experimental results on benchmark graph classification datasets demonstrate that our proposed QSGCNN significantly outperforms state-of-the-art graph kernels and deep graph convolutional network models for graph classifications.

II Preliminary Concepts

In this section, we introduce some preliminary concepts that will be used in this work. We commence by introducing the average mixing matrix of the continuous-time quantum walk. We show that this matrix encapsulates rich quantum passing information between any pair of vertices. Furthermore, we introduce the concept of depth-based representations. We show that the representation reflects rich nested structural information rooted at each vertex, and can be seen as a vertex point in the depth space. Finally, we introduce a new framework for identifying the transitive vertex alignment information between graphs, by aligning the depth-based representations.

II-A Continuous-time Quantum Walks

In this work, one main objective is to develop a spatial graph convolution layer to extract multi-scale vertex features by gradually propagating the information of each vertex to its neighboring vertices as well as the vertex itself. This requires passing information between each vertex and its neighboring vertices. Most existing methods employ the vertex adjacency matrix of each graph as the formulation of the vertex passing information [21, 22, 23, 24]. To capture richer vertex features in the proposed graph convolutional layer, in this work we propose to employ the quantum passing information in terms of the continuous-time quantum walk, which is the quantum analogue of the classical continuous-time random walk [25].

The main reason for using this quantum walk is that, unlike the classical random walk, where the state vector is real-valued and the evolution is governed by a doubly stochastic matrix, the state vector of the quantum walk is complex-valued and its evolution is governed by a time-varying unitary matrix. Thus, the quantum walk evolution is reversible, implying that it is non-ergodic and does not possess a limiting distribution. As a result, the behaviour of quantum walks is significantly different from that of their classical counterpart, and they possess a number of important properties. One notable consequence of the interference properties of quantum walks is that when a quantum walker backtracks on an edge it does so with opposite phase, which significantly reduces the problem of tottering. Furthermore, since the quantum walk is not dominated by the low-frequency components of the Laplacian spectrum, it has a better ability to distinguish different graph structures. In Section III, we will show that the proposed graph convolutional layer associated with the continuous-time quantum walk can not only reduce the tottering problem arising in some state-of-the-art graph kernels and graph convolutional network models, but also better discriminate different graph structures.

In this subsection, we review the concept of the continuous-time quantum walk. Specifically, we propose to use the average mixing matrix to summarize the time-averaged behaviour of the quantum walk and to formulate the quantum passing information between vertices. The continuous-time quantum walk is the quantum analogue of the continuous-time classical random walk [25], where the latter models a Markovian diffusion process over the vertices of a graph through transitions between adjacent vertices. Let a sample graph be denoted as $G(V,E)$ with vertex set $V$ and edge set $E$. Like the classical random walk, the state space of the quantum walk is the vertex set $V$. Its state at time $t$ is a complex linear combination of the basis states $|u\rangle$,

$|\psi(t)\rangle = \sum_{u \in V} \alpha_u(t)\, |u\rangle$,   (1)

where the amplitude $\alpha_u(t)$ and the state $|\psi(t)\rangle$ are both complex. Furthermore, $\alpha_u(t)\alpha_u^{*}(t)$ indicates the probability of the walker visiting vertex $u$ at time $t$, with $\sum_{u \in V}\alpha_u(t)\alpha_u^{*}(t) = 1$ and $\alpha_u(t)\alpha_u^{*}(t) \in [0,1]$, for all $u \in V$, $t \in \mathbb{R}^{+}$. Unlike the continuous-time classical random walk, the continuous-time quantum walk evolves based on the Schrödinger equation

$\frac{\partial}{\partial t}\,|\psi(t)\rangle = -\mathrm{i}\,H\,|\psi(t)\rangle$,   (2)

where $H$ represents the system Hamiltonian. In this work, we use the adjacency matrix of the graph as the Hamiltonian. The behaviour of a quantum walk over the graph at time $t$ can be summarized using the mixing matrix [26]

$M(t) = U(t) \circ U^{*}(t)$,   (3)

where $U(t) = e^{-\mathrm{i}Ht}$ and the operation symbol $\circ$ represents the Schur-Hadamard product of $U(t)$ and $U^{*}(t)$. Because $U(t)$ is unitary, $M(t)$ is a doubly stochastic matrix and each entry $M_{uv}(t)$ indicates the probability of the walk visiting vertex $v$ at time $t$ when the walk initially starts from vertex $u$. However, $M(t)$ cannot converge, because $U(t)$ is norm-preserving. To overcome this problem, we can enforce convergence by taking a time average. Specifically, we take the Cesàro mean and define the average mixing matrix as

$Q = \lim_{T \to \infty} \frac{1}{T}\int_{0}^{T} M(t)\, \mathrm{d}t$,   (4)

where each entry $Q_{uv}$ of the average mixing matrix represents the average probability for a quantum walk to visit vertex $v$ starting from vertex $u$, and $Q$ is still a doubly stochastic matrix. Furthermore, Godsil [26] has proved that the entries of $Q$ are rational numbers. We can easily compute $Q$ from the spectrum of the Hamiltonian. Specifically, let the adjacency matrix of $G$ be the Hamiltonian $H$. Assume $\lambda_1, \ldots, \lambda_m$ represent the $m$ distinct eigenvalues of $H$, and $P_l$ is the matrix representation of the orthogonal projection onto the eigenspace associated with $\lambda_l$, i.e., $H = \sum_{l=1}^{m} \lambda_l P_l$. Then, we can re-write the average mixing matrix as

$Q = \sum_{l=1}^{m} P_l \circ P_l$.   (5)
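A minimal sketch of Eq.(5) with NumPy: the average mixing matrix is obtained from an eigendecomposition of the (symmetric) adjacency matrix, with eigenvectors grouped by numerically equal eigenvalues into eigenspace projectors. The graph and helper names here are illustrative, not part of the original paper.

```python
import numpy as np

def average_mixing_matrix(A):
    """Compute Q = sum over distinct eigenvalues of P ∘ P (Eq. 5), where P is the
    orthogonal projector onto the corresponding eigenspace of the Hamiltonian A."""
    eigvals, eigvecs = np.linalg.eigh(A)           # A is symmetric for undirected graphs
    rounded = np.round(eigvals, 8)                 # group numerically equal eigenvalues
    Q = np.zeros_like(A, dtype=float)
    for lam in np.unique(rounded):
        V = eigvecs[:, rounded == lam]             # orthonormal basis of the eigenspace
        P = V @ V.T                                # projector P_lambda
        Q += P * P                                 # Schur-Hadamard product P ∘ P
    return Q

# Example: path graph with three vertices
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
Q = average_mixing_matrix(A)                       # doubly stochastic, entries in [0, 1]
```

Since the projectors resolve the identity, each row and column of `Q` sums to one, matching the doubly stochastic property stated above.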

II-B Vertex Points based on Depth-based Representations

In this subsection, we briefly review how to compute the depth-based representation as the vertex point. For a sample graph $G(V,E)$ and each vertex $v \in V$, let a vertex set $N_v^K$ be defined as $N_v^K = \{u \in V \mid S_G(v,u) \leq K\}$, where $S_G$ is the shortest path matrix of $G$ and $S_G(v,u)$ corresponds to the shortest path length between vertices $v$ and $u$. The $K$-layer expansion subgraph $\mathcal{G}_v^K(\mathcal{V}_v^K, \mathcal{E}_v^K)$ rooted at $v$ is

$\mathcal{V}_v^K = N_v^K, \quad \mathcal{E}_v^K = \{(u,w) \in E \mid u, w \in N_v^K\}$.   (6)

Based on [27], the $K$-dimensional depth-based representation rooted at $v$ is

$DB_v^K = \big[H_S(\mathcal{G}_v^1), \ldots, H_S(\mathcal{G}_v^k), \ldots, H_S(\mathcal{G}_v^K)\big]^{\top}$,   (7)

where $H_S(\mathcal{G}_v^k)$ is the Shannon entropy of $\mathcal{G}_v^k$ associated with the steady state of the random walk [27].

It is shown that the $K$-dimensional depth-based representation can be seen as a nested vertex representation that reflects both the local and global structural information of a graph [28]. This is because the family of $k$-layer expansion subgraphs satisfies $\mathcal{G}_v^1 \subseteq \cdots \subseteq \mathcal{G}_v^k \subseteq \cdots \subseteq \mathcal{G}_v^K \subseteq G$. As a result, the entropy information of each $k$-layer expansion subgraph includes that of the $1$-layer to $(k-1)$-layer expansion subgraphs.

In [12], the depth-based representations have been successfully employed as vertex points in a vector space to develop a vertex matching kernel, by identifying vertex correspondence information between graphs in terms of distance measures between the vertex points. The matching kernel demonstrates the effectiveness of the alignment procedure associated with the depth-based representations. Unfortunately, this alignment procedure cannot guarantee transitivity between pairs of aligned vertices, i.e., if vertices $v$ and $u$ are aligned, and vertices $u$ and $w$ are aligned, we cannot guarantee that $v$ and $w$ are also aligned. This is because the vertex correspondence information is identified separately for each individual pair of graphs. To overcome this non-transitivity problem, in the next subsection we introduce a new vertex alignment method.

II-C Transitive Vertex Alignment Through Prototype Representations

Fig. 1: The process of the transitive alignment procedure. Given a set of graphs, for each graph $G$: (1) we compute the $K$-dimensional depth-based (DB) representation $DB_v^K$ rooted at each vertex $v$ (e.g., vertex 2); (2) we identify a family of $K$-dimensional prototype representations $PR^K$ using k-means on the $K$-dimensional DB representations of all graphs; (3) we align the $K$-dimensional DB representations to the $K$-dimensional prototype representations and compute a $K$-level correspondence matrix $C^K$.

In this subsection, we develop a new transitive vertex alignment approach, by aligning the depth-based representations of each graph to a family of representative depth-based representations. The process of the alignment procedure is shown in Fig. 1. To this end, we commence by identifying a family of prototype representations that reflect the main characteristics of the depth-based representations over a set of graphs $\mathbf{G}$. Assume the $K$-dimensional depth-based representations of all the vertices of all graphs in $\mathbf{G}$ form the set $\mathbf{DB}^K$. We use k-means [29] to identify $M$ centroids over all representations in $\mathbf{DB}^K$. Specifically, given $M$ clusters $c_1, \ldots, c_M$, the k-means method aims to minimize the objective function

$\arg\min_{c_1,\ldots,c_M}\ \sum_{j=1}^{M} \sum_{DB_v^K \in c_j} \big\| DB_v^K - \mu_j^K \big\|_2^2$,   (8)

where $\mu_j^K$ is the mean of the depth-based representation vectors belonging to the $j$-th cluster $c_j$. Since Eq.(8) minimizes the sum of the squared Euclidean distances between the vertex points and the centroid point of their cluster $c_j$, the centroid points reflect the main characteristics of all $K$-dimensional depth-based representations in $\mathbf{DB}^K$, i.e., these centroids can be seen as a family of $K$-dimensional prototype representations that encapsulate representative characteristics over all graphs in $\mathbf{G}$.

Let $\mathbf{G}$ be a set of graphs. For each sample graph $G(V,E) \in \mathbf{G}$, we commence by computing the $K$-dimensional depth-based representation $DB_v^K$ rooted at each vertex $v \in V$. We then apply k-means to the $K$-dimensional depth-based representations of the graphs in $\mathbf{G}$ and identify $M$ centroids as the set of $K$-dimensional prototype representations $PR^K = \{PR_1^K, \ldots, PR_M^K\}$. To establish a set of correspondences between the vertices of the graphs, for each graph $G$, we align the $K$-dimensional depth-based representations of its vertices to the family of $K$-dimensional prototype representations. The alignment process is similar to that introduced in [12] for point matching in a pattern space. Specifically, we compute a $K$-level affinity matrix $R^K$ in terms of the Euclidean distances between the two sets of points, i.e., the affinity matrix element is

$R^K(i,j) = \big\| DB_{v_i}^K - PR_j^K \big\|_2$,   (9)

where $R^K$ is a $|V| \times M$ matrix, and each element $R^K(i,j)$ represents the distance between the depth-based representation rooted at $v_i$ and the $j$-th prototype representation $PR_j^K$. If the value of $R^K(i,j)$ is the smallest in row $i$, we say that $DB_{v_i}^K$ is aligned to $PR_j^K$, i.e., the vertex $v_i$ is aligned to the $j$-th prototype representation. Note that for each graph there may be two or more vertices aligned to the same prototype representation. We record the correspondence information using the $K$-level correspondence matrix $C^K \in \{0,1\}^{|V| \times M}$,

$C^K(i,j) = \begin{cases} 1 & \text{if } R^K(i,j) \text{ is the smallest element in row } i, \\ 0 & \text{otherwise}. \end{cases}$   (10)

For a pair of graphs $G_p$ and $G_q$, if their vertices $v_i$ and $v_j$ are aligned to the same prototype representation $PR_k^K$, we say that $v_i$ and $v_j$ are also aligned. As a result, we can identify the transitive alignment information between the vertices of all the graphs in $\mathbf{G}$. In other words, all the graphs in $\mathbf{G}$ are aligned by matching their vertices to a common set of reference points, i.e., the prototype representations.

Discussions: In fact, the alignment procedure formulated by Eq.(9) and Eq.(10) can be accommodated by the objective function of k-means defined by Eq.(8). This is because identifying the smallest element in the $i$-th row of $R^K$ is equivalent to assigning the depth-based representation vector of $v_i$ to the cluster $c_j$ whose mean vector is $\mu_j^K$. As a result, the proposed alignment procedure can be seen as an optimization process that gradually minimizes the inner-vertex-cluster sum of squares over the vertices of all graphs, and can establish reliable vertex correspondence information over all graphs.
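The clustering and alignment steps above can be sketched with a plain Lloyd-iteration k-means and a nearest-prototype assignment; the toy data and function names are illustrative stand-ins for the DB representations of a graph set.

```python
import numpy as np

def kmeans_centroids(X, M, iters=100, seed=0):
    """Plain Lloyd's algorithm: minimise the inner-cluster sum of squares (Eq. 8)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=M, replace=False)].copy()
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(M):
            if np.any(labels == j):                 # skip empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def correspondence_matrix(DB, prototypes):
    """C(i, j) = 1 iff prototype j is the nearest to vertex representation i (Eqs. 9-10)."""
    R = np.linalg.norm(DB[:, None, :] - prototypes[None, :, :], axis=2)  # affinity matrix
    C = np.zeros_like(R)
    C[np.arange(len(DB)), R.argmin(axis=1)] = 1.0
    return C

# Toy data standing in for the DB representations of all graphs' vertices
rng = np.random.default_rng(1)
all_DB = rng.normal(size=(30, 4))                  # 30 vertices, K = 4
PR = kmeans_centroids(all_DB, M=5)                 # 5 prototype representations
C = correspondence_matrix(all_DB[:7], PR)          # correspondence for a 7-vertex graph
```

Because every graph's vertices are matched against the same prototypes, two vertices from different graphs assigned to the same column of their respective `C` matrices are transitively aligned.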

Fig. 2: The architecture of the proposed QSGCNN model. An input graph $G$ of arbitrary size is first aligned to the prototype graph $G_R$. Then, $G$ is mapped into a fixed-sized aligned vertex grid structure, where the vertex order follows that of $G_R$. The grid structure of $G$ is passed through multiple quantum spatial graph convolution layers to extract multi-scale vertex features, where the vertex information is propagated between specified vertices in terms of the quantum passing information. Since the graph convolution layers preserve the property of the input grid structure, the concatenated vertex features from the graph convolution layers form new vertex grid structures of $G$. The vertex grid structure is passed to a traditional CNN layer to learn a classification function. Note that vertex features are visualized as different colors.

III Quantum Spatial Graph Convolutional Neural Network

In this section, we develop the new Quantum Spatial Graph Convolutional Neural Network (QSGCNN) model. The architecture of the QSGCNN model has three sequential stages, namely 1) the grid structure construction and input layer, 2) the quantum spatial graph convolution layer, and 3) the traditional convolutional neural network and Softmax layer. Specifically, the grid structure construction and input layer maps graphs of arbitrary sizes into fixed-sized grid structures, including the aligned vertex grid structures and the associated aligned grid vertex adjacency matrices, and inputs the grid structures into the QSGCNN model. With the input graph grid structures to hand, the quantum spatial graph convolution layer further extracts multi-scale vertex features by propagating vertex feature information in terms of the quantum passing information between aligned grid vertices. Since the extracted vertex features from the graph convolution layer preserve the property of the input grid structures, the traditional convolutional neural network and Softmax layer can read the extracted vertex features and predict the class of each graph. The architecture of the proposed QSGCNN model is shown in Fig. 2.

III-A Aligned Vertex Grid Structures of Graphs

In this subsection, we introduce how to map graphs of different sizes into fixed-sized aligned vertex grid structures and associated fixed-sized aligned grid vertex adjacency matrices. Assume $\mathbf{G}$ represents a set of graphs, and $G(V,E)$ is a sample graph from $\mathbf{G}$, with $V$ representing the vertex set, $E$ representing the edge set, and $A$ representing the vertex adjacency matrix. Suppose each vertex is represented as a $c$-dimensional feature vector; the feature information of all vertices in $G$ can then be denoted by a $|V| \times c$ matrix $X$. Note that the rows of $X$ follow the same vertex order as $A$. If the graphs in $\mathbf{G}$ are vertex-attributed graphs, $X$ can be the one-hot encoding matrix of the vertex labels. For un-attributed graphs, we propose to use the vertex degree as the vertex label. Based on the transitive vertex alignment method introduced in Section II, for each graph $G$, we commence by computing the $K$-level vertex correspondence matrix $C^K$ that records the correspondence information between the vertices in $V$ and the $K$-dimensional prototype representations in $PR^K$. The rows and columns of $C^K$ are indexed by the vertices in $V$ and the prototype representations in $PR^K$, respectively. With $C^K$ to hand, we compute the $K$-level aligned vertex feature matrix for $G$ as

$\hat{X}^K = (C^K)^{\top} X$,   (11)

where $\hat{X}^K$ is an $M \times c$ matrix and each row of $\hat{X}^K$ represents the feature of a corresponding aligned vertex. Moreover, we also compute the associated $K$-level aligned vertex adjacency matrix for $G$ as

$\hat{A}^K = (C^K)^{\top} A\, C^K$,   (12)

where $\hat{A}^K$ is an $M \times M$ matrix. Eq.(11) and Eq.(12) indicate that $\hat{X}^K$ and $\hat{A}^K$ are computed from the original vertex feature matrix $X$ and the original vertex adjacency matrix $A$ respectively, and the correspondence matrix $C^K$ maps the original feature and adjacency information of each vertex to that of a new aligned vertex indexed by a corresponding prototype representation in $PR^K$. As a result, $\hat{X}^K$ and $\hat{A}^K$ completely encapsulate the original vertex and structure information of $G$, and can be seen as the new aligned vertex feature matrix and aligned vertex adjacency matrix of $G$.
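Eqs.(11)-(12) reduce to two matrix products. A small sketch with a hypothetical 4-vertex graph mapped onto 3 grid vertices (vertices 2 and 3 sharing a prototype):

```python
import numpy as np

def aligned_grid_structures(X, A, C):
    """Map a graph's vertex features X (n×c) and adjacency A (n×n) onto
    M aligned grid vertices through the correspondence matrix C (n×M):
    X_hat = C^T X (Eq. 11), A_hat = C^T A C (Eq. 12)."""
    X_hat = C.T @ X          # aligned vertex feature matrix, M×c
    A_hat = C.T @ A @ C      # aligned vertex adjacency matrix, M×M
    return X_hat, A_hat

X = np.array([[1., 0.], [0., 1.], [1., 1.], [0., 0.]])
A = np.array([[0., 1., 1., 0.],
              [1., 0., 0., 1.],
              [1., 0., 0., 1.],
              [0., 1., 1., 0.]])
C = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 1., 0.],
              [0., 0., 1.]])   # vertices 2 and 3 aligned to the same prototype
X_hat, A_hat = aligned_grid_structures(X, A, C)
```

Since each row of `C` sums to one, the totals of the feature and adjacency mass are preserved, illustrating that no information is discarded by the alignment.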

To further construct the fixed-sized aligned grid structure for each graph $G$, we need to establish a consistent order for the graph vertices. Since all vertices are aligned to the prototype representations, we propose to determine the vertex order by reordering the prototype representations. To this end, we construct a prototype graph $G_R$ that captures the pairwise similarity between the prototype representations. Given this graph, one idea is to sort the prototype representations based on their degree. This is equivalent to sorting the prototypes in order of average similarity to the remaining ones. Specifically, we compute the prototype graph $G_R$ that characterizes the relationship between the $K$-dimensional prototype representations in $PR^K$, with each vertex representing a prototype representation $PR_i^K$ and each edge weight representing the similarity $s(i,j)$ between $PR_i^K$ and $PR_j^K$. The similarity between two vertices of $G_R$ is computed as

$s(i,j) = \exp\big(-\big\| PR_i^K - PR_j^K \big\|_2\big)$.   (13)

The degree of each prototype representation $PR_i^K$ is

$D(PR_i^K) = \sum_{j=1}^{M} s(i,j)$.   (14)

Based on the descending order of the degrees $D(PR_i^K)$, we rearrange the order of the $K$-dimensional prototype representations in $PR^K$. Accordingly, we also rearrange the rows of $\hat{X}^K$ and the rows and columns of $\hat{A}^K$, both of which are indexed by the prototype representations. With the rearranged $\hat{X}^K$ and $\hat{A}^K$ to hand, we compute the final aligned vertex grid structure for $G$ as

$\bar{X} = \hat{X}^K$,   (15)

and the associated aligned grid vertex adjacency matrix as

$\bar{A} = \hat{A}^K$,   (16)

where $K$ is set as the greatest length of the shortest paths over all graphs in $\mathbf{G}$, $\bar{X}$ is an $M \times c$ matrix, and $\bar{A}$ is an $M \times M$ matrix, following the sizes of $\hat{X}^K$ and $\hat{A}^K$, respectively.

Discussions: Eq.(15) and Eq.(16) can be seen as the process of transforming any original graph $G$ of arbitrary size into a new aligned grid graph structure of the same fixed size, where $\bar{X}$ is its aligned grid vertex feature matrix, each row of $\bar{X}$ is the feature of a corresponding aligned grid vertex, and $\bar{A}$ is the aligned grid vertex adjacency matrix. Since the rows of $\bar{X}$ for any graph are indexed by the same rearranged prototype representations with consistent orders, the fixed-sized vertex grid structure $\bar{X}$ can be directly employed as the input to a traditional convolutional neural network. In other words, one can apply a fixed-sized classical convolutional filter to slide over the rows of $\bar{X}$ and learn features for $G$. Finally, note that $\bar{X}$ and $\bar{A}$ completely encapsulate the original vertex and topological structure information of $G$, following the property of $\hat{X}^K$ and $\hat{A}^K$.
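The reordering step of Eqs.(13)-(14) can be sketched as follows; note the exponential-of-distance similarity here is an assumed concrete form consistent with the text ("similarity between prototype representations"), not necessarily the paper's exact choice.

```python
import numpy as np

def prototype_order(PR):
    """Order the prototypes by descending degree in the prototype graph G_R.
    s(i, j) = exp(-||PR_i - PR_j||) is an assumed similarity (cf. Eq. 13)."""
    dist = np.linalg.norm(PR[:, None] - PR[None], axis=2)
    S = np.exp(-dist)                 # pairwise similarity s(i, j)
    np.fill_diagonal(S, 0.0)          # no self-similarity edge
    degree = S.sum(axis=1)            # Eq. (14): degree of each prototype
    return np.argsort(-degree)        # descending degree order

# Hypothetical prototypes: M = 5 prototypes of dimension K = 4
rng = np.random.default_rng(2)
PR = rng.normal(size=(5, 4))
order = prototype_order(PR)
# Reorder the aligned matrices consistently:
#   X_bar = X_hat[order];  A_bar = A_hat[order][:, order]
```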

III-B The Quantum Spatial Graph Convolution Layer

In this subsection, we propose a new quantum spatial graph convolution layer to further extract the features of the vertices of each graph in terms of the quantum information between aligned grid vertices. Specifically, we input the aligned vertex grid structure of each graph as the aligned grid vertex feature matrix where each row represents the feature of an aligned grid vertex, and the quantum passing information between the aligned grid vertices is formulated by the average mixing matrix of the continuous-time quantum walk evolved on the associated aligned grid vertex adjacency matrix.
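As a concrete illustration, the average mixing matrix can be computed from the spectral decomposition of the adjacency matrix, following the formulation of Godsil [26] cited in Section II-A: it is the sum of the entrywise (Schur) squares of the eigenspace projectors. The sketch below uses NumPy; the function name and the tolerance used to group eigenvalues into eigenspaces are our own choices:

```python
import numpy as np

def average_mixing_matrix(A):
    """Average mixing matrix Q of a continuous-time quantum walk on
    the adjacency matrix A: Q = sum_k P_k ∘ P_k, where the P_k are
    the orthogonal projectors onto the eigenspaces of A."""
    w, V = np.linalg.eigh(A)          # eigh returns ascending eigenvalues
    Q = np.zeros_like(A, dtype=float)
    start = 0
    # group adjacent (near-equal) eigenvalues into one eigenspace
    for end in range(1, len(w) + 1):
        if end == len(w) or abs(w[end] - w[start]) > 1e-8:
            P = V[:, start:end] @ V[:, start:end].T  # eigenspace projector
            Q += P * P                               # Schur square
            start = end
    return Q

# Each row of Q is a probability distribution over vertices.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
Q = average_mixing_matrix(A)
print(np.allclose(Q.sum(axis=1), 1.0))  # rows sum to one (doubly stochastic)
```

Because the projectors resolve the identity, the resulting matrix is doubly stochastic, which is why its entries can be read as average visiting probabilities.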

For the set of graphs and each sample graph , we input the aligned vertex grid structure and the associated aligned grid vertex adjacency matrix of to the quantum spatial graph convolution layer. The proposed spatial graph convolution layer takes the following form, i.e.,

Z = ReLU(Q X W),
(17)

where ReLU(·) is the rectified linear units function (i.e., a nonlinear activation function), Q is the average mixing matrix of the continuous-time quantum walk evolved on the aligned grid vertex adjacency matrix as defined in Section II-A, X is the aligned grid vertex feature matrix, W is the matrix of trainable parameters of the proposed graph convolutional layer, and Z is the output activation matrix.

The proposed quantum spatial graph convolution layer defined by Eq.(17) consists of three steps. In the first step, the weight matrix W transforms the aligned grid vertex feature matrix X into a new matrix XW, mapping the feature channels of each aligned grid vertex to the channels of the next layer. The weights W are shared among all aligned grid vertices. The second step, premultiplication by the average mixing matrix Q, propagates the feature information of each aligned grid vertex to the remaining vertices as well as to the vertex itself, in terms of the quantum passing information. Specifically, the entry Q(i,j) encapsulates the average probability for a continuous-time quantum walk to visit the j-th aligned grid vertex starting from the i-th aligned grid vertex. Here, j can be equal to i, i.e., Q includes the self-loop information of each vertex. Thus, the i-th row of the resulting matrix QXW is the summation of the features of the i-th aligned grid vertex and of the remaining aligned grid vertices, weighted by the average visiting probabilities of quantum walks from the i-th vertex to the remaining vertices as well as to the i-th vertex itself. The final step applies the rectified linear units function to QXW and outputs the graph convolution result.
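A minimal sketch of the three steps of Eq.(17), assuming the shapes described above (Q: n×n average mixing matrix, X: n×c grid vertex features, W: c×c' trainable weights):

```python
import numpy as np

def quantum_graph_conv(Q, X, W):
    """One quantum spatial graph convolution layer, Z = ReLU(Q X W):
    (1) X @ W maps feature channels, (2) Q @ (X W) propagates features
    by quantum walk visiting probabilities, (3) ReLU is applied."""
    return np.maximum(Q @ X @ W, 0.0)
```

With Q equal to the identity the layer reduces to a shared per-vertex linear map followed by ReLU, which makes the role of the propagation step explicit.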

The proposed quantum spatial graph convolution propagates the aligned grid vertex information in terms of the quantum passing information associated with the continuous-time quantum walk between vertices. To further extract the multi-scale features of aligned grid vertices, we stack multiple graph convolution layers defined by Eq.(17) as follows

Z(t) = ReLU(Q Z(t-1) W(t)),
(18)

where Z(0) is the input aligned vertex grid structure, Z(t) is the output of the t-th spatial graph convolution layer, and W(t) is the trainable parameter matrix mapping the channels of layer t-1 to the channels of layer t.

After each quantum spatial graph convolutional layer, we also add a layer that horizontally concatenates the output of the current layer with the outputs of all previous spatial graph convolutional layers as well as the original input. As a result, each row of the concatenated output can be seen as the new multi-scale features of the corresponding grid vertex.
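The stacking and concatenation described above can be sketched as follows; `multi_scale_features` is an illustrative name, and only the final concatenation (input plus every layer's output) is returned for brevity:

```python
import numpy as np

def multi_scale_features(Q, X, weights):
    """Stack the graph convolution layers of Eq.(18) and horizontally
    concatenate the input with every layer's output, so each row holds
    the multi-scale features of one aligned grid vertex."""
    outputs = [X]
    Z = X
    for W in weights:
        Z = np.maximum(Q @ Z @ W, 0.0)  # one layer of Eq. (18)
        outputs.append(Z)
    return np.concatenate(outputs, axis=1)
```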

Discussions: Note that the proposed quantum spatial graph convolution only extracts new features for the grid vertices and does not change the order of the vertices. As a result, both the output and the concatenated output retain the grid structure property of the original input, and can be directly employed as input to the traditional convolutional neural network. This provides an elegant way of bridging the gap between the proposed quantum spatial graph convolution layer and the traditional convolutional neural network, yielding an end-to-end deep learning architecture that integrates graph representation and learning in the quantum spatial graph convolution layer and the traditional convolution layer for graph classification problems.

Iii-C The Traditional Convolutional Neural Network Layers

After each proposed quantum spatial graph convolution layer, we obtain a concatenated vertex grid structure, where each row represents the multi-scale features of a corresponding grid vertex. As mentioned above, each grid structure can be directly employed as the input to a traditional convolutional neural network (CNN). Specifically, the Classical One-dimensional CNN part of Fig.2 exhibits the architecture of the traditional CNN layers associated with each concatenated output. Here, each concatenated vertex grid structure is seen as a vertex grid structure in which each vertex is represented by a multi-dimensional feature, i.e., the feature dimension is the channel of each grid vertex. Then, we add a one-dimensional convolutional layer, whose convolution operation is performed by sliding a fixed-sized filter over the spatially neighbouring vertices. After this, several MaxPooling layers and further one-dimensional convolutional layers can be added to learn the local patterns on the aligned grid vertex sequence. Finally, by varying over the concatenated outputs (in Fig.2), we obtain one extracted pattern representation per output. We concatenate the extracted patterns and add a fully-connected layer followed by a Softmax layer.
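A simplified sketch of the one-dimensional convolution sliding a fixed-sized filter over the rows of the concatenated grid; for brevity a single output channel and no bias are assumed, and `conv1d_over_grid` is an illustrative name:

```python
import numpy as np

def conv1d_over_grid(Zc, F, stride):
    """Slide a filter F of shape (k, c) over the rows of the
    concatenated vertex grid Zc of shape (n, c) with the given stride,
    producing one scalar response per window (single output channel)."""
    k = F.shape[0]
    n = Zc.shape[0]
    return np.array([(Zc[i:i + k] * F).sum()
                     for i in range(0, n - k + 1, stride)])
```

Because the rows are indexed by the consistently ordered prototypes, the same filter position refers to the same aligned vertices in every graph.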

Iii-D Discussions of the Proposed QSGCNN Model

The proposed QSGCNN model is related to several existing state-of-the-art graph convolutional network models and graph kernels. However, there are a number of significant theoretical differences between the proposed QSGCNN model and these state-of-the-art methods, which explain the effectiveness of the proposed model. In this subsection, we discuss the relationships between the proposed model and these methods, and demonstrate the advantages of the proposed model.

First, similar to the quantum spatial graph convolution of the proposed QSGCNN model, the associated graph convolution of the Deep Graph Convolutional Neural Network (DGCNN) [24] and the spectral graph convolution of the Fast Approximate Graph Convolutional Neural Network (FAGCNN) [30] also propagate the features between specified vertices. Specifically, the graph convolutions of the DGCNN and FAGCNN models adopt the graph adjacency matrix or the normalized Laplacian matrix to determine the passing information between specified vertices. By contrast, our quantum spatial graph convolution utilizes the average mixing matrix of the continuous-time quantum walk evolved on the graph structure. As we mentioned in Section II-A, the quantum walk is not dominated by the low frequency of the Laplacian spectrum and has better ability to distinguish different graph structures. As a result, the proposed quantum spatial graph convolution associated with the quantum passing information formulated by the average mixing matrix of the quantum walk can extract more discriminative vertex features for different graphs.

Second, in order to preserve the scale of the vertex features after each graph convolution layer, the graph convolution of the DGCNN model [24] and the spectral graph convolution of the FAGCNN model [30] need to multiply by the inverse of the vertex degree matrix. For instance, the graph convolution layer of the DGCNN model is

Z = f(D̃⁻¹ Ã X W),
(19)

where Ã = A + I is the adjacency matrix of the graph with added self-loops, D̃ is the degree matrix of Ã, X is the vertex feature matrix with each row representing the features of a vertex, W is the matrix of trainable parameters, f is a nonlinear activation function (e.g., the ReLU function), and Z is the output. Similar to the proposed quantum spatial graph convolution defined in Eq.(17), XW maps the features of each vertex into new features. Moreover, multiplication by Ã propagates the feature information of each vertex to its neighbouring vertices as well as to the vertex itself. The i-th row of the resulting matrix represents the extracted features of the i-th vertex, i.e., the summation of its own features and those of its neighbouring vertices. To preserve the original scale of the vertex features, multiplying by the inverse degree matrix D̃⁻¹ can be seen as assigning equal averaged weights to a vertex and each of its neighbours. In other words, the graph convolution of the DGCNN model treats the mutual influences between the specified vertices as the same during the convolution operation. By contrast, the quantum spatial graph convolution of the proposed QSGCNN model defined in Eq.(17) assigns an averaged quantum walk visiting probability distribution to the specified vertices, with each vertex having a different visiting probability as its weight; the extracted vertex feature is thus the weighted summation of the specified vertex features. As a result, the quantum spatial graph convolution of the proposed QSGCNN model not only preserves the feature scale, but also discriminates the mutual influences between specified vertices in terms of the different visiting probabilities.
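To make the contrast concrete, the DGCNN-style propagation of Eq.(19) (before the weight matrix and nonlinearity) can be sketched as follows; every neighbour receives the same averaged weight, whereas the QSGCNN propagation weights vertices by the distinct entries of the average mixing matrix:

```python
import numpy as np

def dgcnn_propagate(A, X):
    """DGCNN-style propagation D̃⁻¹ Ã X: each vertex averages its own
    features and its neighbours' features with equal weights."""
    A_tilde = A + np.eye(A.shape[0])        # adjacency with self-loops
    d_inv = 1.0 / A_tilde.sum(axis=1)       # inverse degrees of Ã
    return (d_inv[:, None] * A_tilde) @ X   # row-normalized propagation
```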

Third, similar to the proposed QSGCNN model, both the PATCHY-SAN based Graph Convolutional Neural Network (PSGCNN) model [23] and the DGCNN model [24] need to rearrange the vertex order of each graph structure and transform each graph into a fixed-sized vertex grid structure. Specifically, the PSGCNN model first forms the grid structures and then performs a standard classical CNN on them. The DGCNN model sorts the vertices through a SortPooling layer associated with the vertex features extracted from multiple spatial graph convolution layers. Unfortunately, both the PSGCNN model and the DGCNN model sort the vertices of each graph based on local structural descriptors, ignoring the vertex correspondence information between different graphs. By contrast, the proposed QSGCNN model employs a transitive vertex alignment procedure to transform each graph into an aligned fixed-sized vertex grid structure. As a result, only the proposed QSGCNN model can integrate the precise structural correspondence information over all graphs under investigation.

Fourth, when the PSGCNN model [23] and the DGCNN model [24] form fixed-sized vertex grid structures, some vertices with lower rankings are discarded. Moreover, the Neural Graph Fingerprint Network (NGFN) [21] and the Diffusion Convolution Neural Network (DCNN) [22] tend to capture global-level graph features by summing up the extracted local-level vertex features through a SumPooling layer, since neither the NGFN model nor the DCNN model can directly form vertex grid structures. This leads to significant information loss of local-level vertex features. By contrast, the aligned vertex grid structures and the associated grid vertex adjacency matrices required by the proposed QSGCNN model completely encapsulate the original vertex features and the topological structure information of the original graphs. As a result, the proposed QSGCNN model overcomes the shortcoming of information loss arising in the mentioned state-of-the-art graph convolutional neural network models.

Fifth, similar to the DGCNN model [24], the quantum spatial graph convolution of the proposed QSGCNN model is also related to the Weisfeiler-Lehman subtree kernel (WLSK) [11]. Specifically, the WLSK kernel employs the classical Weisfeiler-Lehman (WL) algorithm as a canonical labeling method to extract multi-scale vertex features corresponding to subtrees for graph classification. The key idea of the WL method is to concatenate a vertex label with the labels of its neighbouring vertices, and then sort the concatenated label lexicographically to assign each vertex a new label. The procedure repeats until a maximum number of iterations, and each vertex label at a given iteration corresponds to a subtree of corresponding height rooted at the vertex. If the concatenated labels of two vertices are the same, the subtrees rooted at the two vertices are isomorphic, i.e., the two vertices are seen as sharing the same structural characteristics within the graph. The WLSK kernel uses this idea to measure the similarity between two graphs: it performs the WL method to update the vertex labels, and then counts the number of identical vertex labels (i.e., the number of isomorphic subtrees) up to the maximum iteration, in order to compare two graphs at multiple scales. To exhibit the relationship between the proposed quantum spatial graph convolution defined in Eq.(17) and the WLSK kernel, we decompose Eq.(17) in a row-wise manner, i.e.,

Z(i) = ReLU( Σ_j Q(i,j) (X W)(j) ),
(20)

where the sum runs over all vertices j with Q(i,j) > 0. In Eq.(20), (XW)(j) can be seen as the continuous vectorial vertex label of the j-th vertex. Moreover, if Q(i,j) > 0, the quantum walk starting from the i-th vertex can visit the j-th vertex, and the visiting probability is Q(i,j). Similar to the WL method, Eq.(20) aggregates the continuous label of the i-th vertex and the continuous labels of the vertices that can be visited by the quantum walk starting from the i-th vertex into a new signature vector for the i-th vertex. The ReLU function maps the signature to a new continuous vectorial label. As a result, the quantum spatial graph convolution of the proposed QSGCNN model can be seen as a quantum version of the WL algorithm, in terms of the quantum passing information formulated by the quantum walk. As mentioned in Section II-A, the quantum walk can significantly reduce the influence of the tottering problem, from which the classical WL method suffers [12]. As a result, the quantum spatial graph convolution can address the tottering problem arising in the classical WL method, and hence in the graph convolution of the DGCNN model that is similar to the classical WL method. In other words, the quantum spatial graph convolution of the proposed QSGCNN model can learn better vertex features of graphs.
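The row-wise aggregation of Eq.(20) can be sketched as a quantum analogue of one WL relabelling step: the new continuous label of a vertex is the probability-weighted aggregate of the labels of all vertices its quantum walk can visit. `quantum_wl_update` is an illustrative name:

```python
import numpy as np

def quantum_wl_update(Q, X, W, i):
    """New continuous vectorial label of vertex i: aggregate the
    mapped labels (X W)(j) of all vertices j, weighted by the average
    visiting probabilities Q(i, j), then apply ReLU."""
    signature = sum(Q[i, j] * (X[j] @ W) for j in range(X.shape[0]))
    return np.maximum(signature, 0.0)
```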

Finally, note that the proposed QSGCNN model is invariant with respect to permutations of the vertex order of each graph, indicating that the activations of a pair of isomorphic graphs will be the same. As mentioned, the proposed QSGCNN model consists of three stages, i.e., the grid structure construction and input layer, the quantum spatial graph convolution layer, and the traditional CNN layer. For the first layer, the construction of grid structures relies on the vertex features and the adjacency matrix, and is invariant to vertex permutations. As a result, the grid structures for a pair of isomorphic graphs are the same. For the second layer, the input grid structures of different graphs share the same parameter weights; thus the quantum spatial graph convolutions produce the same extracted vertex features for a pair of isomorphic graphs associated with the same grid structures. Consequently, the following classical CNN layers will correctly identify the isomorphic graphs. As a result, the proposed QSGCNN model can precisely categorize isomorphic graphs.

The above discussions reveal the advantages of the proposed QSGCNN model, explaining its effectiveness. The proposed QSGCNN model not only overcomes the shortcomings of existing state-of-the-art methods, but also bridges the theoretical gap between these methods.

 Datasets  MUTAG  NCI1  Protein  D&D  PTC(MR)  COLLAB  IMDB-B  IMDB-M
 Max # vertices                
 Mean # vertices                
 # graphs                
TABLE I: Information of the Graph Datasets
 Datasets  MUTAG  NCI1  PROTEINS  D&D  PTC(MR)
 QSGCNN          
 JTQK          
 WLSK          
 SPGK          
 PIGK          
 RWGK          
TABLE II: Classification Accuracy (In Standard Error) for Comparisons with Graph Kernels.
 Datasets  MUTAG  NCI1  PROTEINS  D&D  PTC(MR)  COLLAB  IMDB-B  IMDB-M
 QSGCNN                
 DGCNN                
 PSGCNN                
 DCNN                
 ECC                
 DGK                
TABLE III: Classification Accuracy (In Standard Error) for Comparisons with Graph Convolutional Neural Networks.

Iv Experiments

In this section, we empirically compare the performance of the proposed QSGCNN model to state-of-the-art graph kernels and deep learning neural networks on graph classification problems.

Iv-a Comparisons with Graph Kernels

Datasets: In this subsection, we utilize five standard graph datasets from bioinformatics to evaluate the performance of the proposed QSGCNN model: MUTAG, PTC, NCI1, PROTEINS and D&D. Statistical details of these datasets are shown in Table I.

Experimental Setup: We evaluate the performance of the proposed QSGCNN model on graph classification problems against five alternative state-of-the-art graph kernels. These graph kernels include 1) the Weisfeiler-Lehman subtree kernel (WLSK) [11], 2) the shortest path graph kernel (SPGK) [31], 3) the Jensen-Tsallis q-difference kernel (JTQK) with  [32], 4) the random walk graph kernel (RWGK) [33], and 5) the propagated information graph kernel (PIGK) [34].

For the evaluation, the proposed QSGCNN model uses the same network structure on all graph datasets. Specifically, we fix the number of prototype representations, the number of quantum spatial graph convolution layers (note that, including the original input grid structures, the spatial graph convolution produces several concatenated outputs), and the number of channels of each quantum spatial graph convolution. Following each of the concatenated outputs after the quantum graph convolution layers, we add a traditional CNN layer with the architecture C-P-C-P-C-FC to learn the extracted patterns, where C denotes a traditional convolutional layer, P denotes a classical MaxPooling layer, and FC denotes a fully connected layer. With the six sets of extracted patterns after the CNN layers in hand, we concatenate them and add a new fully-connected layer followed by a Softmax layer with dropout. We use rectified linear units (ReLU) in both the graph convolution and the traditional convolution layers. The learning rate of the proposed model is the same for all datasets. The only hyperparameters we optimize are the number of epochs and the batch size for the mini-batch gradient descent algorithm, and we use the Adam optimization algorithm for gradient descent. Finally, note that the proposed QSGCNN model needs to construct the prototype representations to identify the transitive vertex alignment information over all graphs. The prototype representations can be computed from the training graphs alone or from both the training and testing graphs. We observe that the choice between these two manners does not influence the final performance. Thus, in our evaluation we compute the prototype representations from both the training and testing graphs. In this sense, our model can be seen as an instance of transductive learning [35], where all the graphs are used to compute the prototype representations, but we do not observe the class labels of the test graphs during training. For the proposed QSGCNN model, we perform 10-fold cross-validation to compute the classification accuracies, with nine folds for training and one fold for testing. For each dataset, we repeat the experiment 10 times and report the average classification accuracies and standard errors in Table II.

For the WLSK kernel and the JTQK kernel, we set the highest dimension (i.e., the highest height of subtrees) of the Weisfeiler-Lehman isomorphism (for the WLSK kernel) and of the tree-index method (for the JTQK kernel) based on the suggestions of Shervashidze et al. [11] and Bai et al. [32], because of the good performance of the kernels with these parameters. Similarly, for each graph kernel, we perform 10-fold cross-validation using the C-Support Vector Machine (C-SVM) classification to compute the classification accuracies, using LIBSVM. We use nine folds for training and one for testing. For each fold, we choose the optimized parameters of each kernel together with the C parameter of the C-SVM by cross-validation on the training data. For each kernel and dataset, we repeat the experiment 10 times and report the average classification accuracies and standard errors in Table II.

Experimental Results and Discussions: Table II indicates that the proposed QSGCNN model outperforms the alternative state-of-the-art graph kernels on most of the datasets in this study. Although the proposed model does not achieve the best classification accuracy on the NCI1 and PROTEINS datasets, it is still competitive, and its accuracy on the PROTEINS dataset is only slightly lower than that of the SPGK kernel. The reasons for this effectiveness are twofold. First, the state-of-the-art graph kernels used for comparison are typical examples of R-convolution kernels. Specifically, these kernels are based on isomorphism measures between pairs of substructures, ignoring the structural correspondence information between the substructures. By contrast, the aligned vertex grid structure associated with the proposed QSGCNN model incorporates the transitive alignment information between vertices over all graphs. Thus, the proposed model can better reflect the precise characteristics of graphs. Second, the C-SVM classifier associated with graph kernels can only be seen as a shallow learning framework [36]. By contrast, the proposed QSGCNN model provides an end-to-end deep learning architecture for graph classification, and can better learn the graph characteristics. The experiments demonstrate the advantages of the proposed QSGCNN model compared to the shallow learning framework.

Iv-B Comparisons with Deep Learning Methods

Datasets: In this subsection, we further compare the performance of the proposed QSGCNN model with state-of-the-art deep learning methods for graph classification. The datasets for these evaluations include the five aforementioned bioinformatics datasets, as well as three social network datasets: COLLAB, IMDB-B, and IMDB-M. Details of these social network datasets can be found in Table I.

Experimental Setup: We evaluate the performance of the proposed QSGCNN model on graph classification problems against five alternative state-of-the-art deep learning methods for graphs. These methods include 1) the deep graph convolutional neural network (DGCNN) [24], 2) the PATCHY-SAN based convolutional neural network for graphs (PSGCNN) [23], 3) the diffusion convolutional neural network (DCNN) [22], 4) the edge-conditioned convolutional network (ECC) [37], and 5) the deep graphlet kernel (DGK) [38]. For the proposed QSGCNN model, we use the same experimental setup as in the comparison with graph kernels. For the PSGCNN, ECC, and DGK models, we report the best results from the original papers [23, 37, 38]; note that these methods follow the same setting as the proposed QSGCNN model. For the DCNN model, we report the best results from the work of Zhang et al. [24], which follows the same setting as ours. Finally, the PSGCNN and ECC models can leverage additional edge features. Since most graph datasets and all the other compared methods do not leverage edge features, in this work we do not report the results associated with edge features. The classification accuracies and standard errors for each deep learning method are shown in Table III.

Experimental Results and Discussions: Table III indicates that the proposed QSGCNN model significantly outperforms the state-of-the-art deep learning methods for graph classification on the NCI1, D&D, PTC, COLLAB, IMDB-B and IMDB-M datasets. On the other hand, the accuracy of the PSGCNN model on the MUTAG dataset and that of the DGCNN model on the PROTEINS dataset are slightly higher than those of the proposed QSGCNN model, but the proposed QSGCNN model is still competitive with the PSGCNN and DGCNN models. The reasons for this effectiveness are fivefold. First, similar to the state-of-the-art graph kernels, the alternative deep learning methods used for comparison also cannot integrate the correspondence information between graphs into the learning architecture. In particular, the PSGCNN, DGCNN and ECC models need to reorder the vertices, but they rely on simple yet inaccurate heuristics to align the vertices of the graphs, i.e., they sort the vertex orders based on the local structural descriptor of each individual graph and ignore the vertex correspondence information between different graphs. Thus, only the proposed QSGCNN model can precisely reflect the graph characteristics through the layer-wise learning. Second, the PSGCNN and DGCNN models need to form a fixed-sized vertex grid structure for each graph. Since the numbers of vertices of different graphs differ, forming such fixed-sized grid structures means that some vertices of each graph may be discarded, leading to information loss. By contrast, as mentioned in Section II and Section III, the associated aligned vertex grid structures completely preserve the information of the original graphs. As a result, only the proposed QSGCNN model can completely integrate the original graph characteristics into the learning process. Third, unlike the proposed model, the DCNN model needs to sum up the extracted local-level vertex features from the convolution operation into global-level graph features through a SumPooling layer. Thus, only the QSGCNN model can learn the graph topological information through the local vertex features. Fourth, unlike the PSGCNN, DGCNN and ECC models, which use the original vertex adjacency matrix to formulate the vertex passing information of the graph convolution operation, the graph convolution operation of the proposed QSGCNN model formulates the vertex passing information in terms of the average mixing matrix of the continuous-time quantum walk. As stated in Section II, the quantum walk is not dominated by the low frequencies of the Laplacian spectrum and can better distinguish different graph structures. Thus, the proposed QSGCNN model has a better ability to identify the differences between graphs. Fifth, similar to the DGCNN, PSGCNN and DGK models, the proposed QSGCNN model is also related to the classical Weisfeiler-Lehman (WL) method. Since the classical WL method suffers from the tottering problem, the related DGCNN, PSGCNN and DGK models possess the same drawback. By contrast, the graph convolution operation of the proposed QSGCNN model can be seen as a quantum version of the classical WL algorithm. Since the quantum walk can reduce the tottering problem, the proposed QSGCNN model overcomes this shortcoming arising in the DGCNN, PSGCNN and DGK models. The evaluation demonstrates the advantages of the proposed QSGCNN model compared to the state-of-the-art deep learning methods.

V Conclusion

In this paper, we have developed a new Quantum Spatial Graph Convolutional Neural Network (QSGCNN) model that can directly learn an end-to-end deep learning architecture for classifying graphs of arbitrary sizes. The main idea of the proposed QSGCNN model is to transform each graph into a fixed-sized vertex grid structure through transitive alignment between graphs, and to propagate the grid vertex features using the proposed quantum spatial graph convolution operation. Compared to state-of-the-art deep learning methods and graph kernels, the proposed QSGCNN model not only preserves the original graph characteristics, but also bridges the gap between the spatial graph convolution layer and the traditional convolutional neural network layer. Moreover, the proposed QSGCNN model can better distinguish different structures, and the experimental evaluations demonstrate the effectiveness of the proposed QSGCNN model on graph classification problems.

In this work, the proposed QSGCNN model follows the same network architecture for all datasets. In future works, we aim to use different structures of the proposed model for different datasets, expecting better performance. Furthermore, in future works, we also aim to extend the proposed QSGCNN model and develop a new quantum convolutional neural network based on edge-based grid structures. Specifically, in our previous works [39, 40, 41], we have shown how to characterize the edge information of the original graphs through the directed line graphs, where each vertex of the line graph represents an edge of original graphs. Moreover, we have exhibited the relationship between the discrete-time quantum walks and the directed line graphs. It will be interesting to further develop a new quantum edge-based convolutional network associated with the discrete-time quantum walks and the directed line graphs.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant no.61503422, 61602535 and 61773415), the Open Projects Program of National Laboratory of Pattern Recognition (NLPR), and the program for innovation research in Central University of Finance and Economics.

References

  • [1] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Proceedings of NIPS, 2016, pp. 3837–3845.
  • [2] N. Kriege and P. Mutzel, “Subgraph matching kernels for attributed graphs,” in Proceedings of ICML, 2012.
  • [3] Z. Harchaoui and F. Bach, “Image classification with segmentation graph kernels,” in Proceedings of CVPR, 2007.
  • [4] D. Wang, P. Cui, and W. Zhu, “Structural deep network embedding,” in Proceedings of KDD, 2016, pp. 1225–1234.
  • [5] K. Riesen and H. Bunke, “Graph classification by means of lipschitz embedding,” IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 39, no. 6, pp. 1472–1483, 2009.
  • [6] J. Gibert, E. Valveny, and H. Bunke, “Graph embedding in vector spaces by node attribute statistics,” Pattern Recognition, vol. 45, no. 9, pp. 3072–3083, 2012.
  • [7] R. C. Wilson, E. R. Hancock, and B. Luo, “Pattern vectors from algebraic graph theory,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 7, pp. 1112–1124, 2005.
  • [8] R. Kondor and K. M. Borgwardt, “The skew spectrum of graphs,” in Proceedings of ICML, 2008, pp. 496–503.
  • [9] M. Neuhaus and H. Bunke, Bridging the Gap between Graph Edit Distance and Kernel Machines, ser. Series in Machine Perception and Artificial Intelligence.   World Scientific, 2007, vol. 68.

  • [10] D. Haussler, “Convolution kernels on discrete structures,” in Technical Report UCS-CRL-99-10, Santa Cruz, CA, USA, 1999.
  • [11] N. Shervashidze, P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn, and K. M. Borgwardt, “Weisfeiler-lehman graph kernels,” Journal of Machine Learning Research, vol. 1, pp. 1–48, 2010.
  • [12] L. Bai, L. Rossi, Z. Zhang, and E. R. Hancock, “An aligned subtree kernel for weighted graphs,” in Proceedings of ICML.
  • [13] L. Lu, Y. Zheng, G. Carneiro, and L. Yang, Eds., Deep Learning and Convolutional Neural Networks for Medical Image Computing - Precision Medicine, High Performance and Large-Scale Datasets, ser. Advances in Computer Vision and Pattern Recognition.   Springer, 2017.
  • [14] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, “Show and tell: A neural image caption generator,” in Proceedings of CVPR, 2015, pp. 3156–3164.
  • [15] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Commun. ACM, vol. 60, no. 6, pp. 84–90, 2017.
  • [16] A. J. Tixier, G. Nikolentzos, P. Meladianos, and M. Vazirgiannis, “Classifying graphs as images with convolutional neural networks,” CoRR, vol. abs/1708.02218, 2017. [Online]. Available: http://arxiv.org/abs/1708.02218
  • [17] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, “Spectral networks and locally connected networks on graphs,” CoRR, vol. abs/1312.6203, 2013.
  • [18] O. Rippel, J. Snoek, and R. P. Adams, “Spectral representations for convolutional neural networks,” in Proceedings of NIPS, 2015, pp. 2449–2457.
  • [19] M. Henaff, J. Bruna, and Y. LeCun, “Deep convolutional networks on graph-structured data,” CoRR, vol. abs/1506.05163, 2015. [Online]. Available: http://arxiv.org/abs/1506.05163
  • [20] J. Vialatte, V. Gripon, and G. Mercier, “Generalizing the convolution operator to extend cnns to irregular domains,” CoRR, vol. abs/1606.01166, 2016. [Online]. Available: http://arxiv.org/abs/1606.01166
  • [21] D. K. Duvenaud, D. Maclaurin, J. Aguilera-Iparraguirre, R. Gómez-Bombarelli, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams, “Convolutional networks on graphs for learning molecular fingerprints,” in Proceedings of NIPS, 2015, pp. 2224–2232.
  • [22] J. Atwood and D. Towsley, “Diffusion-convolutional neural networks,” in Proceedings of NIPS, 2016, pp. 1993–2001.
  • [23] M. Niepert, M. Ahmed, and K. Kutzkov, “Learning convolutional neural networks for graphs,” in Proceedings of ICML, 2016, pp. 2014–2023.
  • [24] M. Zhang, Z. Cui, M. Neumann, and Y. Chen, “An end-to-end deep learning architecture for graph classification,” in Proceedings of AAAI, 2018.
  • [25] E. Farhi and S. Gutmann, “Quantum computation and decision trees,” Physical Review A, vol. 58, p. 915, 1998.
  • [26] C. Godsil, “Average mixing of continuous quantum walks,” Journal of Combinatorial Theory, Series A, vol. 120, no. 7, pp. 1649–1662, 2013.
  • [27] L. Bai and E. R. Hancock, “Depth-based complexity traces of graphs,” Pattern Recognition, vol. 47, no. 3, pp. 1172–1186, 2014.
  • [28] L. Bai, L. Cui, L. Rossi, L. Xu, and E. Hancock, “Local-global nested graph kernels using nested complexity traces,” Pattern Recognition Letters, To appear.
  • [29] I. H. Witten, E. Frank, and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques.   Morgan Kaufmann, 2011.
  • [30] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” CoRR, vol. abs/1609.02907, 2016. [Online]. Available: http://arxiv.org/abs/1609.02907
  • [31] K. M. Borgwardt and H.-P. Kriegel, “Shortest-path kernels on graphs,” in Proceedings of the IEEE International Conference on Data Mining, 2005, pp. 74–81.
  • [32] L. Bai, L. Rossi, H. Bunke, and E. R. Hancock, “Attributed graph kernels using the jensen-tsallis q-differences,” in Proceedings of ECML-PKDD, 2014, pp. 99–114.
  • [33] H. Kashima, K. Tsuda, and A. Inokuchi, “Marginalized kernels between labeled graphs,” in Proceedings of ICML, 2003, pp. 321–328.
  • [34] M. Neumann, R. Garnett, C. Bauckhage, and K. Kersting, “Propagation kernels: efficient graph kernels from propagated information,” Machine Learning, vol. 102, no. 2, pp. 209–245, 2016.
  • [35] A. Gammerman, K. S. Azoury, and V. Vapnik, “Learning by transduction,” in Proceedings of UAI, 1998, pp. 148–155.
  • [36] S. Zhang, C. Liu, K. Yao, and Y. Gong, “Deep neural support vector machines for speech recognition,” in Proceedings of ICASSP, 2015, pp. 4275–4279.
  • [37] M. Simonovsky and N. Komodakis, “Dynamic edge-conditioned filters in convolutional neural networks on graphs,” in Proceedings of CVPR, 2017, pp. 29–38.
  • [38] P. Yanardag and S. V. N. Vishwanathan, “Deep graph kernels,” in Proceedings of KDD, 2015, pp. 1365–1374.
  • [39] L. Bai, P. Ren, L. Rossi, and E. R. Hancock, “An edge-based matching kernel through discrete-time quantum walks,” in Proceedings of ICIAP.
  • [40] L. Bai, L. Rossi, L. Cui, Z. Zhang, P. Ren, X. Bai, and E. R. Hancock, “Quantum kernels for unattributed graphs using discrete-time quantum walks,” Pattern Recognition Letters, vol. 87, pp. 96–103, 2017.
  • [41] L. Bai, F. Escolano, and E. R. Hancock, “Depth-based hypergraph complexity traces from directed line graphs,” Pattern Recognition, vol. 54, pp. 229–240, 2016.