Learning Aligned-Spatial Graph Convolutional Networks for Graph Classification

04/06/2019
by Lu Bai, et al.

In this paper, we develop a novel Aligned-Spatial Graph Convolutional Network (ASGCN) model to learn effective features for graph classification. Our idea is to transform arbitrary-sized graphs into fixed-sized aligned grid structures, and define a new spatial graph convolution operation associated with the grid structures. We show that the proposed ASGCN model not only reduces the problems of information loss and imprecise information representation arising in existing spatially-based Graph Convolutional Network (GCN) models, but also bridges the theoretical gap between traditional Convolutional Neural Network (CNN) models and spatially-based GCN models. Moreover, the proposed ASGCN model can adaptively discriminate the importance between specified vertices during the process of spatial graph convolution, explaining the effectiveness of the proposed model. Experiments on standard graph datasets demonstrate the effectiveness of the proposed model.



1 Introduction

Graph-based representations are powerful tools to analyze structured data that are described in terms of pairwise relationships between components [26]. One common challenge arising in the analysis of graph-based data is how to learn effective graph representations. Due to the recent successes of deep learning in machine learning, there is increasing interest in generalizing deep Convolutional Neural Networks (CNNs) [15] to the graph domain. These deep learning networks on graphs are the so-called Graph Convolutional Networks (GCNs) [14], and have proven to be an effective way to extract highly meaningful statistical features for graph classification [8].

Generally speaking, most existing state-of-the-art GCN approaches can be divided into two main categories, i.e., GCN models based on a) spectral and b) spatial strategies. Specifically, approaches based on the spectral strategy define the convolution operation using spectral graph theory [7, 11, 18]. By transforming the graph into the spectral domain through the eigenvectors of the Laplacian matrix, these methods perform the filter operation by multiplying the graph signal by a series of filter coefficients. Unfortunately, most spectral-based approaches cannot be applied to graphs with different numbers of vertices and different Fourier bases. Thus, these approaches demand same-sized graph structures and are usually employed for vertex classification tasks. On the other hand, approaches based on the spatial strategy are not restricted to same-sized graph structures. These approaches generalize the graph convolution operation to the spatial structure of a graph by directly defining an operation on neighboring vertices [1, 9, 23]. For example, Duvenaud et al. [9] have proposed a spatially-based GCN model by defining a spatial graph convolution operation on the 1-layer neighboring vertices to simulate the traditional circular fingerprint. Atwood and Towsley [1] have proposed a spatially-based GCN model by performing spatial graph convolution operations on different layers of neighboring vertices rooted at a vertex. Although these spatially-based GCN models can be directly applied to real-world graph classification problems, they still need to further transform the multi-scale features learned by the graph convolution layers into fixed-sized representations, so that standard classifiers can directly read the representations for classification. One way to achieve this is to directly sum up the learned local-level vertex features from the graph convolution operation as global-level graph features through a SumPooling layer. Since it is difficult to learn rich local vertex topological information from the global features, these spatially-based GCN methods associated with SumPooling have relatively poor performance on graph classification.

To overcome the shortcomings of existing spatially-based GCN models, Zhang et al. [27] have developed a novel spatially-based Deep Graph Convolutional Neural Network (DGCNN) model to preserve more vertex information. Specifically, they propose a new SortPooling layer to transform the extracted features of unordered vertices from the spatial graph convolution layers into a fixed-sized local-level vertex grid structure. This is done by sequentially preserving a specified number of vertices with prior orders. With the fixed-sized grid structures of graphs to hand, a traditional CNN model followed by a Softmax layer can be directly employed for graph classification. Although this spatially-based DGCNN model focuses more on local-level vertex features and outperforms state-of-the-art GCN models on graph classification tasks, it sorts the vertex order based on each individual graph. Thus, it cannot accurately reflect the topological correspondence information between graph structures. Moreover, this model also leads to significant information loss, since vertices with lower ranking may be discarded. In summary, developing effective methods to learn graph representations remains a significant challenge.

In this paper, we propose a novel Aligned-Spatial Graph Convolutional Network (ASGCN) model for graph classification problems. One key innovation of the proposed ASGCN model is that of transitively aligning vertices between graphs. That is, given three vertices v_i, v_j and v_k from three different sample graphs, if v_i and v_j are aligned, and v_j and v_k are aligned, the proposed model can guarantee that v_i and v_k are also aligned. More specifically, the proposed model employs the transitive alignment procedure to transform arbitrary-sized graphs into fixed-sized aligned grid structures with consistent vertex orders, guaranteeing that the vertices on the same spatial position are also transitively aligned to each other in terms of the topological structures. The conceptual framework of the proposed ASGCN model is shown in Fig.1. Specifically, the main contributions are threefold.

First, we develop a new transitive matching method to map different arbitrary-sized graphs into fixed-sized aligned vertex grid structures. We show that the grid structures not only establish reliable vertex correspondence information between graphs, but also minimize the loss of structural information from the original graphs.

Second, we develop a novel spatially-based graph convolution model, i.e., the ASGCN model, for graph classification. More specifically, we propose a new spatial graph convolution operation associated with the aligned vertex grid structures as well as their associated adjacency matrices, to extract multi-scale local-level vertex features. We show that the proposed convolution operation not only reduces the problems of information loss and imprecise information representation arising in existing spatially-based GCN models associated with SortPooling or SumPooling, but also theoretically relates to the classical convolution operation on standard grid structures. Thus, the proposed ASGCN model bridges the theoretical gap between traditional CNN models and spatially-based GCN models, and can adaptively discriminate the importance between specified vertices during the process of the spatial graph convolution operation. Furthermore, since our spatial graph convolution operation does not change the original spatial sequence of vertices, the proposed ASGCN model utilizes the traditional CNN to further learn graph features. In this way, we provide an end-to-end deep learning architecture that integrates the graph representation learning into both the spatial graph convolutional layer and the traditional convolution layer for graph classification.

Third, we empirically evaluate the performance of the proposed ASGCN model on graph classification tasks. Experiments on benchmarks demonstrate the effectiveness of the proposed method, when compared to state-of-the-art methods.

Figure 1: The architecture of the proposed ASGCN model. An input graph of arbitrary size is first aligned to the prototype graph. Then, the graph is mapped into a fixed-sized aligned vertex grid structure, where the vertex order follows that of the prototype graph. The grid structure is passed through multiple spatial graph convolution layers to extract multi-scale vertex features, where the vertex information is propagated between specified vertices according to the adjacency matrix. Since the graph convolution layers preserve the original vertex order of the input grid structure, the concatenated vertex features from the graph convolution layers form a new vertex grid structure, which is then passed to a traditional CNN layer for classification. Note that vertex features are visualized as different colors.

2 Related Works of Spatially-based GCN Models

In this section, we briefly review state-of-the-art spatially-based GCN models in the literature. More specifically, we introduce the associated spatial graph convolution operation of the existing spatially-based Deep Graph Convolutional Neural Network (DGCNN) model [27]. To commence, consider a sample graph G with n vertices, where X ∈ R^{n×c} is the collection of the n vertex feature vectors of G in c dimensions, and A is the vertex adjacency matrix (A can be a weighted adjacency matrix). The spatial graph convolution operation of the DGCNN model takes the following form

Z = f( D̃^{-1} Ã X W ),   (1)

where Ã = A + I is the adjacency matrix of the graph with added self-loops, D̃ is the degree matrix of Ã with D̃(i,i) = Σ_j Ã(i,j), W ∈ R^{c×c′} is the matrix of trainable graph convolution parameters, f is a nonlinear activation function, and Z ∈ R^{n×c′} is the output of the convolution operation.

For the spatial graph convolution operation defined by Eq.(1), the process first maps the c-dimensional features of each vertex into a set of new c′-dimensional features through the trainable parameter matrix W. Here, the filter weights of W are shared by all vertices. Moreover, multiplying by Ã propagates the feature information of each vertex to its neighboring vertices as well as the vertex itself: the i-th row of ÃXW represents the extracted features of the i-th vertex, and corresponds to the summation or aggregation of the features of the vertex itself and those of its neighboring vertices. Multiplying by the inverse of D̃ (i.e., computing D̃^{-1}ÃXW) can be seen as the process of normalizing this aggregation, assigning equal weights to the i-th vertex and each of its neighbors.
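To make this propagation-and-normalization reading of Eq.(1) concrete, the following is a minimal NumPy sketch (not the authors' implementation; the toy graph, feature values, and function names are illustrative):

```python
import numpy as np

def dgcnn_conv(A, X, W, f=np.tanh):
    """DGCNN-style spatial graph convolution of Eq.(1): Z = f(D~^{-1} A~ X W)."""
    A_tilde = A + np.eye(A.shape[0])             # add self-loops
    D_inv = np.diag(1.0 / A_tilde.sum(axis=1))   # inverse degree matrix of A~
    return f(D_inv @ A_tilde @ X @ W)

# toy graph: a 3-vertex path 0-1-2, 2-dim features, 2 output channels
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.eye(3, 2)   # vertex features (n x c)
W = np.eye(2)      # shared trainable parameters (c x c')
Z = dgcnn_conv(A, X, W)
```

With the identity in place of f, each output row is exactly the equally-weighted average of the vertex's own transformed features and those of its neighbors, which is the normalization described above.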

Remark: Eq.(1) indicates that the spatial graph convolution operation of the DGCNN model cannot discriminate the importance between specified vertices in the convolution process. This is because the required filter weights of W are shared by all vertices, i.e., the feature transformations of the vertices are all based on the same trainable function. Thus, the DGCNN model cannot directly influence the aggregation process of the vertex features. In fact, this problem also arises in other spatially-based GCN models, e.g., the Neural Graph Fingerprint Network (NGFN) model [9] and the Diffusion Convolution Neural Network (DCNN) model [1], since their associated spatial graph convolution operations take a form similar to that of the DGCNN model, i.e., the trainable parameters of their spatial graph convolution operations are also shared by all vertices. This drawback limits the effectiveness of existing spatially-based GCN models for graph classification. In this paper, we aim to propose a new spatially-based GCN model to overcome the above problems.

3 Constructing Aligned Grid Structures for Arbitrary Graphs

Although spatially-based GCN models are not restricted to same-sized graph structures and can thus be applied to graph classification tasks, these methods still require us to further transform the extracted multi-scale features from the graph convolution layers into fixed-sized characteristics, so that standard classifiers (e.g., a traditional convolutional neural network followed by a Softmax layer) can be directly employed for classification. In this section, we develop a new transitive matching method to map graphs of arbitrary sizes into fixed-sized aligned grid structures. Moreover, we show that the proposed grid structure not only integrates precise structural correspondence information but also minimises the loss of structural information.

3.1 Identifying Transitive Alignment Information between Graphs

We introduce a new graph matching method to transitively align graph vertices. We first designate a family of prototype representations that encapsulate the principal characteristics of all vectorial vertex representations in a set of graphs G. Assume there are n vertices over all graphs in G, and their associated K-dimensional vectorial representations are R = {R_1, …, R_n}. We utilize k-means [24] to locate M centroids {μ_1, …, μ_M} over R, by minimizing the objective function

arg min_Ω Σ_{i=1}^{M} Σ_{R_j ∈ ω_i} ‖ R_j − μ_i ‖²,   (2)

where Ω = {ω_1, …, ω_M} represents M clusters, and μ_i is the mean of the vertex representations belonging to the i-th cluster ω_i.
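As a sketch, the centroid search of Eq.(2) can be run with plain Lloyd iterations; this is an illustrative stand-in for k-means [24], with deterministic initialization and toy data (all names are ours, not the paper's):

```python
import numpy as np

def kmeans_prototypes(R, M, iters=20):
    """Locate M prototype representations (centroids) over the pooled
    vertex representations R (n x K) by minimizing Eq.(2) with Lloyd steps."""
    mu = R[np.linspace(0, len(R) - 1, M).astype(int)].copy()  # spread-out init
    for _ in range(iters):
        # assign each vertex representation to its nearest centroid
        d = np.linalg.norm(R[:, None, :] - mu[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each centroid as the mean of its cluster
        for j in range(M):
            if np.any(labels == j):
                mu[j] = R[labels == j].mean(axis=0)
    return mu

# toy pooled representations: two well-separated groups of vertices
R = np.vstack([np.zeros((5, 3)), np.ones((5, 3))])
mu = kmeans_prototypes(R, M=2)
```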

Let G = {G_1, …, G_N} be the graph sample set. For each sample graph G_p ∈ G and each vertex v_i ∈ V_p associated with its K-dimensional vectorial representation R_i^p, we commence by identifying the set of K-dimensional prototype representations PR = {μ_1, …, μ_M} for the graph set G. We align the vectorial vertex representations of each graph to the family of prototype representations in PR. The alignment procedure is similar to that introduced in [5] for point matching in a pattern space, and we compute a K-level affinity matrix in terms of the Euclidean distances between the two sets of points, i.e.,

R^p(i, j) = ‖ R_i^p − μ_j ‖₂,   (3)

where R^p is an n_p × M matrix, and each element R^p(i, j) represents the distance between the vectorial representation R_i^p of v_i and the j-th prototype representation μ_j. If R^p(i, j) is the smallest element in row i, we say that the vertex v_i is aligned to the j-th prototype representation. Note that for each graph there may be two or more vertices aligned to the same prototype representation. We record the correspondence information using the K-level correspondence matrix C^p ∈ {0, 1}^{n_p × M}, i.e.,

C^p(i, j) = 1 if R^p(i, j) is the smallest element in row i, and 0 otherwise.   (4)

For each pair of graphs G_p and G_q, if their vertices v_i and v_j are aligned to the same prototype representation, we say that v_i and v_j are also aligned. Thus, we identify the transitive correspondence information between all graphs in G, by aligning their vertices to a common set of prototype representations.

Remark: The alignment process is equivalent to assigning the vectorial representation R_i^p of each vertex to the mean μ_j of its cluster ω_j. Thus, the proposed alignment procedure can be seen as an optimization process that gradually minimizes the inner-vertex-cluster sum of squares over the vertices of all graphs through k-means, and can establish reliable vertex correspondence information over all graphs.
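A small sketch of Eq.(3) and Eq.(4): each vertex representation is assigned to its nearest prototype, giving a one-hot row of the correspondence matrix (toy values; our own illustrative code, not the paper's):

```python
import numpy as np

def correspondence_matrix(Rp, mu):
    """Affinity (Eq.(3)) and correspondence (Eq.(4)) matrices: vertex i of a
    graph is aligned to its nearest prototype j."""
    # affinity: Euclidean distance between each vertex rep and each prototype
    dist = np.linalg.norm(Rp[:, None, :] - mu[None, :, :], axis=2)
    C = np.zeros_like(dist)
    C[np.arange(len(Rp)), dist.argmin(axis=1)] = 1.0  # one-hot per row
    return dist, C

Rp = np.array([[0.1, 0.0], [0.9, 1.0], [1.1, 0.9]])   # 3 vertices, 2-dim reps
mu = np.array([[0.0, 0.0], [1.0, 1.0]])               # 2 prototypes
dist, C = correspondence_matrix(Rp, mu)
```

Because every graph is matched against the same prototypes, two vertices from different graphs whose rows select the same column are aligned, which is exactly the transitive property discussed above.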

3.2 Aligned Grid Structures of Graphs

We employ the transitive correspondence information to map arbitrary-sized graphs into fixed-sized aligned grid structures. Assume G_p(V_p, E_p) is a sample graph from the graph set G, with V_p representing the vertex set, E_p representing the edge set, and Ã_p representing the vertex adjacency matrix with added self-loops (i.e., Ã_p = A_p + I, where A_p is the original adjacency matrix with no self-loops and I is the identity matrix). Let X_p ∈ R^{n_p × c} be the collection of the n_p (n_p = |V_p|) vertex feature vectors of G_p in c dimensions. Note that the rows of X_p follow the same vertex order as V_p. If the graphs in G are vertex-attributed, X_p can be the one-hot encoding matrix of the vertex labels. For un-attributed graphs, we use the vertex degree as the vertex label.

For each graph G_p, we utilize the proposed transitive vertex matching method to compute the K-level vertex correspondence matrix C^p that records the correspondence information between the K-dimensional vectorial vertex representations of G_p and the K-dimensional prototype representations in PR. With C^p to hand, we compute the K-level aligned vertex feature matrix for G_p as

X̂_p = (C^p)ᵀ X_p,   (5)

where X̂_p ∈ R^{M × c} and each row of X̂_p represents the feature of a corresponding aligned vertex. Moreover, we also compute the associated K-level aligned vertex adjacency matrix for G_p as

Â_p = (C^p)ᵀ Ã_p C^p,   (6)

where Â_p ∈ R^{M × M}. Both X̂_p and Â_p are indexed by the corresponding prototypes in PR. Since X̂_p and Â_p are computed from the original vertex feature matrix X_p and the original adjacency matrix Ã_p, respectively, by mapping the original feature and adjacency information of each vertex to that of the new aligned vertices, they encapsulate the original feature and structural information of G_p. Note that, according to Eq.(4), multiple vertices can be aligned to the same prototype, thus Â_p may be a weighted adjacency matrix.
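Under the correspondence matrix of Eq.(4), the aligned matrices of Eq.(5) and Eq.(6) are simple projections; a toy sketch (illustrative values, not from the paper):

```python
import numpy as np

# correspondence matrix of a toy 3-vertex graph onto M=2 prototypes
# (vertex 0 -> prototype 0, vertices 1 and 2 -> prototype 1)
C = np.array([[1., 0.],
              [0., 1.],
              [0., 1.]])
X = np.array([[1., 0.],
              [0., 1.],
              [0., 1.]])            # vertex features (n_p x c)
A_tilde = np.array([[1., 1., 0.],   # adjacency with self-loops
                    [1., 1., 1.],
                    [0., 1., 1.]])

X_hat = C.T @ X                     # Eq.(5): M x c aligned feature matrix
A_hat = C.T @ A_tilde @ C           # Eq.(6): M x M aligned adjacency matrix
```

Note how the two original vertices collapsing onto prototype 1 produce the weighted entry A_hat[1, 1] = 4, matching the remark that Â_p may be a weighted adjacency matrix.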

Figure 2: The procedure of computing the correspondence matrix. Given a set of graphs, for each graph G_p: (1) we compute the K-dimensional depth-based (DB) representation rooted at each vertex (e.g., vertex 2) as the K-dimensional vectorial vertex representation, where each element represents the Shannon entropy of the k-layer expansion subgraph rooted at that vertex of G_p [2]; (2) we identify a family of K-dimensional prototype representations using k-means on the K-dimensional DB representations of all graphs; (3) we align the K-dimensional DB representations to the K-dimensional prototype representations and compute a K-level correspondence matrix C^p.

In order to construct the fixed-sized aligned grid structure for each graph G_p, we need to sort the aligned vertices to determine their spatial orders. Since the vertices of each graph are all aligned to the same set of prototype representations, we sort the vertices of each graph by reordering the prototype representations. To this end, we construct a prototype graph G_R(V_R, E_R) that captures the pairwise similarity between the K-dimensional prototype representations in PR, with each vertex v_j ∈ V_R representing the prototype representation μ_j and each edge (v_j, v_k) ∈ E_R representing the similarity between μ_j and μ_k. The similarity between two vertices of G_R is computed as

s(j, k) = exp( − ‖ μ_j − μ_k ‖₂ ).   (7)

The degree of each prototype representation μ_j is D_R(j) = Σ_k s(j, k). We propose to sort the K-dimensional prototype representations in PR according to their degrees D_R(j), and we then rearrange X̂_p and Â_p accordingly.
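The reordering step can be sketched as follows, assuming a Gaussian-style similarity s(j, k) = exp(−‖μ_j − μ_k‖) as in Eq.(7) (the exact similarity function and all names here are our assumptions, not the paper's specification):

```python
import numpy as np

def sort_by_prototype_degree(mu, X_hat, A_hat):
    """Order the aligned vertices by the degree of each prototype in the
    prototype graph (similarity assumed to be exp(-||mu_j - mu_k||))."""
    d = np.linalg.norm(mu[:, None, :] - mu[None, :, :], axis=2)
    S = np.exp(-d)                      # pairwise prototype similarity
    np.fill_diagonal(S, 0.0)            # exclude self-similarity from degrees
    order = np.argsort(-S.sum(axis=1))  # descending prototype degree
    return X_hat[order], A_hat[np.ix_(order, order)], order

mu = np.array([[0.0], [1.0], [2.0]])          # three 1-dim prototypes
X_hat = np.arange(3, dtype=float).reshape(3, 1)
A_hat = np.eye(3)
Xs, As, order = sort_by_prototype_degree(mu, X_hat, A_hat)
```

Here the middle prototype is closest to both others, so it receives the largest degree and is placed first in the grid.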

To construct reliable grid structures for graphs, in this work we employ the depth-based (DB) representations as the vectorial vertex representations to compute the required K-level vertex correspondence matrix C^p. The DB representation of each vertex is defined by measuring the entropies of a family of k-layer expansion subgraphs rooted at the vertex [3], where the parameter k varies from 1 to K. It is shown that such a K-dimensional DB representation encapsulates a rich entropy content flow from each local vertex to the global graph structure, as a function of depth. The process of computing the correspondence matrix associated with depth-based representations is shown in Fig.2. When we vary the number of layers k from 1 to K (i.e., k ∈ {1, …, K}), we compute the final aligned vertex grid structure for each graph G_p as

X̄_p = (1/K) Σ_{k=1}^{K} X̂_p^{(k)},   (8)

and the associated aligned grid vertex adjacency matrix as

Ā_p = (1/K) Σ_{k=1}^{K} Â_p^{(k)},   (9)

where X̂_p^{(k)} and Â_p^{(k)} denote the aligned matrices computed with the k-level correspondence matrix, X̄_p ∈ R^{M × c}, Ā_p ∈ R^{M × M}, the i-th row of X̄_p corresponds to the feature vector of the i-th aligned grid vertex, and the i-th row and j-th column element of Ā_p corresponds to the adjacency information between the i-th and j-th aligned grid vertices.

Remark: Eq.(8) and Eq.(9) indicate that we can transform the original graphs G_p with varying numbers of vertices into a new aligned grid graph structure with the same number M of vertices, where X̄_p is the corresponding aligned grid vertex feature matrix and Ā_p is the corresponding aligned grid vertex adjacency matrix. Since both X̄_p and Ā_p are mapped from the original graph G_p, they not only reflect reliable structural correspondence information between G_p and the remaining graphs in the graph set G but also encapsulate more of the original feature and structural information of G_p.

4 The Aligned-Spatial Graph Convolutional Network Model

In this section, we propose a new spatially-based GCN model, namely the Aligned-Spatial Graph Convolutional Network (ASGCN) model. The core stage of a spatially-based GCN model is the associated graph convolution operation that extracts multi-scale features for each vertex based on the original features of its neighboring vertices as well as itself. As we have stated, most existing spatially-based GCN models perform the convolution operation by first applying a trainable parameter matrix to map the original feature of each vertex in c dimensions to one in c′ dimensions, and then averaging the vertex features of specific vertices [1, 9, 23, 27]. Since the trainable parameter matrix is shared by all vertices, these models cannot discriminate the importance of different vertices and have an inferior ability to aggregate vertex features. To overcome this shortcoming, we first propose a new spatial graph convolution operation associated with the aligned grid structures of graphs. Unlike existing methods, the trainable parameters of the proposed convolution operation can directly influence the aggregation of the aligned grid vertex features, thus the proposed convolution operation can discriminate the importance between specified aligned grid vertices. Finally, we introduce the architecture of the ASGCN model associated with the proposed convolution operation.

4.1 The Proposed Spatial Graph Convolution Operation

In this subsection, we propose a new spatial graph convolution operation to further extract multi-scale features of graphs, by propagating features between aligned grid vertices. Specifically, given a sample graph G_p with its aligned vertex grid structure X̄_p and the associated aligned grid vertex adjacency matrix Ā_p, the proposed spatial graph convolution operation takes the following form

Z^{(h)} = Relu( D̄_p^{-1} Ā_p ( ( X̄_p ⊙ W^{(h)} ) 1_c ) ),   (10)

where Relu is the rectified linear units function (i.e., a nonlinear activation function), W^{(h)} ∈ R^{M × c} is the trainable graph convolution parameter matrix of the h-th convolution filter with filter size M and channel number c, ⊙ represents the element-wise Hadamard product, 1_c ∈ R^c is the all-ones vector that sums the feature channels, D̄_p is the degree matrix of Ā_p, and Z^{(h)} ∈ R^{M × 1} is the output activation matrix. Note that, since the aligned grid vertex adjacency matrix Ā_p is computed based on the original vertex adjacency matrix with added self-loop information, the degree matrix D̄_p also encapsulates the self-loop information.

Figure 3: An Instance of the Proposed Spatial Graph Convolution Operation.

An instance of the proposed spatial graph convolution operation defined by Eq.(10) is shown in Fig.3. Specifically, the proposed convolution operation consists of four steps. In the first step, the procedure commences by computing the element-wise Hadamard product between X̄_p and W^{(h)}, and then summing the channels of X̄_p ⊙ W^{(h)} (i.e., summing its columns). Fig.3 exhibits this process. Assume X̄_p ∈ R^{5 × 3} is the collection of aligned grid vertex feature vectors in 3 dimensions (i.e., 3 feature channels), and W^{(h)} ∈ R^{5 × 3} is the h-th convolution filter with filter size 5 and channel number 3. The resulting ( X̄_p ⊙ W^{(h)} ) 1_c first assigns the feature vector of each i-th aligned grid vertex a different weight vector W^{(h)}(i, :), and then sums the channels of each weighted feature vector. After the first step, ( X̄_p ⊙ W^{(h)} ) 1_c can be seen as a new weighted aligned vertex grid structure with one vertex feature channel. The second step, Ā_p ( X̄_p ⊙ W^{(h)} ) 1_c, propagates the weighted feature information between each aligned grid vertex and its neighboring aligned grid vertices. Specifically, the i-th row of Ā_p ( X̄_p ⊙ W^{(h)} ) 1_c can be seen as the aggregated feature of the i-th aligned grid vertex, obtained by summing its own weighted feature as well as the weighted features of the aligned grid vertices adjacent to it. Note that, since the first step has assigned each i-th aligned grid vertex a different weight vector W^{(h)}(i, :), this aggregation procedure is similar to performing a standard fixed-sized convolution filter on a standard grid structure, where the filter first assigns different weight vectors to the features of each grid element as well as its neighboring grid elements and then aggregates (i.e., sums) the weighted features as the new feature of each grid element. This indicates that the trainable parameter matrix W^{(h)} of the proposed convolution operation can directly influence the aggregation process of the vertex features, i.e., it can adaptively discriminate the importance between specified vertices.

Fig.3 exhibits this propagation process. For the 2-nd aligned grid vertex v₂ (marked by the red broken-line frame), the 1-st and 3-rd aligned grid vertices v₁ and v₃ are adjacent to it. The process of computing the corresponding aggregated feature (marked by the red real-line frame) sums the weighted feature of the aligned grid vertex v₂ as well as those of its neighboring aligned grid vertices v₁ and v₃ as the new feature of v₂. The vertices participating in this aggregation process are indicated by the 2-nd row of Ā_p (marked by the purple broken-line frame on Ā_p) that encapsulates the aligned grid vertex adjacency information. The third step normalizes each i-th row of the aggregated features by multiplying by 1 / D̄_p(i, i), where D̄_p(i, i) is the i-th diagonal element of the degree matrix D̄_p. This process guarantees a fixed feature scale after the proposed convolution operation. Specifically, Fig.3 exhibits this normalization process: the aggregated feature of the 2-nd aligned grid vertex (marked by the red real-line frame) is multiplied by 1 / D̄_p(2, 2), where D̄_p(2, 2) is the corresponding diagonal element of D̄_p (marked by the purple broken-line frame on D̄_p). The last step employs the Relu activation function and outputs the convolution result.

Note that, since the proposed spatial graph convolution operation defined by Eq.(10) only extracts new features for the aligned grid vertices and does not change their order, the output Z^{(h)} is still an aligned vertex grid structure with the same vertex order as X̄_p.
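The four steps just described can be sketched directly in NumPy (a minimal illustration of Eq.(10) on a toy 3-vertex aligned grid, not the authors' code; the variable names are ours):

```python
import numpy as np

def asgcn_conv(A_bar, X_bar, W_h):
    """The four steps of the proposed spatial graph convolution (Eq.(10))."""
    # 1) Hadamard product with the filter, then sum the feature channels
    weighted = (X_bar * W_h).sum(axis=1)      # (M,)
    # 2) propagate weighted features along the aligned adjacency
    aggregated = A_bar @ weighted             # (M,)
    # 3) normalize each row by the degree of the aligned grid vertex
    normalized = aggregated / A_bar.sum(axis=1)
    # 4) ReLU nonlinearity
    return np.maximum(normalized, 0.0)

A_bar = np.array([[1., 1., 0.],
                  [1., 1., 1.],
                  [0., 1., 1.]])   # aligned adjacency with self-loops
X_bar = np.array([[1., 0.],
                  [0., 1.],
                  [1., 1.]])       # M=3 aligned vertices, c=2 channels
W_h = np.ones((3, 2))              # one filter, same shape as X_bar
Z_h = asgcn_conv(A_bar, X_bar, W_h)
```

Because W_h multiplies the features element-wise before aggregation, each aligned vertex receives its own weight vector, which is the vertex-discriminating behavior argued for above.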

4.2 The Architecture of the Proposed ASGCN Model

In this subsection, we introduce the architecture of the proposed ASGCN model. Fig.1 shows the overview of the ASGCN model. Specifically, the architecture is composed of three sequential stages, i.e., 1) the grid structure construction and input layer, 2) the spatial graph convolution layer, and 3) the traditional CNN and Softmax layers.

The Grid Structure Construction and Input Layer: For the proposed ASGCN model, we commence by employing the transitive matching method defined earlier to map each arbitrary-sized graph into the fixed-sized aligned grid structure, including the aligned vertex grid structure X̄_p and the associated aligned grid vertex adjacency matrix Ā_p. We then input the grid structures to the proposed ASGCN model.

The Spatial Graph Convolutional Layer: For each graph G_p, to extract multi-scale features of the aligned grid vertices, we stack multiple graph convolution layers associated with the proposed spatial graph convolution operation defined by Eq.(10) as

Z_t^{(h)} = Relu( D̄_p^{-1} Ā_p ( ( Z_{t-1} ⊙ W_t^{(h)} ) 1 ) ),   Z_t = [ Z_t^{(1)}, …, Z_t^{(c_t)} ],   (11)

where Z_0 is the input aligned vertex grid structure X̄_p, c_t is the number of convolution filters in graph convolution layer t, Z_t ∈ R^{M × c_t} is the horizontally concatenated output of all the convolution filters in graph convolution layer t, Z_t^{(h)} is the output of the h-th convolution filter in layer t, and W_t^{(h)} ∈ R^{M × c_{t-1}} is the trainable parameter matrix of the h-th convolution filter in layer t with filter size M and channel number c_{t-1}.
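A sketch of how the stacked layers of Eq.(11) compose, with each filter contributing one output channel of its layer (random toy weights; illustrative only):

```python
import numpy as np

def conv_layer(A_bar, Z_prev, W_list):
    """One spatial graph convolution layer: apply every filter of the layer
    and horizontally concatenate the single-channel outputs (Eq.(11))."""
    D_inv = 1.0 / A_bar.sum(axis=1)              # inverse degrees of A_bar
    cols = []
    for W_h in W_list:                           # one filter -> one channel
        weighted = (Z_prev * W_h).sum(axis=1)    # Hadamard + channel sum
        cols.append(np.maximum(D_inv * (A_bar @ weighted), 0.0))
    return np.stack(cols, axis=1)                # (M, number of filters)

rng = np.random.default_rng(0)
A_bar = np.array([[1., 1.], [1., 1.]])           # toy M=2 aligned adjacency
Z0 = rng.standard_normal((2, 3))                 # input grid, c=3 channels
layer1 = conv_layer(A_bar, Z0, [rng.standard_normal((2, 3)) for _ in range(4)])
layer2 = conv_layer(A_bar, layer1, [rng.standard_normal((2, 4)) for _ in range(2)])
```

Note how each layer's filters must match the channel count of the previous layer's concatenated output, as in the W_t^{(h)} ∈ R^{M × c_{t-1}} constraint.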

The Traditional CNN Layer: After each t-th spatial graph convolutional layer, we horizontally concatenate its output Z_t with the outputs of the previous 1 to t−1 spatial graph convolutional layers as well as the original input Z_0, i.e., Z_{0:t} = [Z_0, Z_1, …, Z_t]. As a result, for the concatenated output Z_{0:t}, each of its rows can be seen as the new multi-scale features of the corresponding aligned grid vertex. Since Z_{0:t} is still an aligned vertex grid structure, one can directly utilize a traditional CNN on the grid structure. Specifically, Fig.1 exhibits the architecture of the traditional CNN layers associated with each Z_{0:t}. Here, each concatenated vertex grid structure Z_{0:t} is seen as an M-vertex grid structure in which each vertex is represented by a multi-dimensional feature, i.e., the channel of each grid vertex equals the total number of concatenated feature dimensions. Then, we add a one-dimensional convolutional layer, whose convolution operation is performed by sliding a fixed-sized filter over the spatially neighboring vertices. After this, several MaxPooling layers and remaining one-dimensional convolutional layers can be added to learn the local patterns on the aligned grid vertex sequence. Finally, when we vary t from 1 to T, we obtain T extracted pattern representations. We concatenate the extracted patterns of each Z_{0:t} and add a fully-connected layer followed by a Softmax layer.
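The concatenation-then-1-D-convolution step can be sketched as follows (a hand-rolled 1-D convolution over the vertex sequence; sizes and names are illustrative, not the paper's configuration):

```python
import numpy as np

def one_d_conv(grid, kernel, stride=1):
    """Slide a fixed-sized filter over spatially neighboring grid vertices
    (a plain 1-D convolution over the aligned vertex sequence)."""
    k = kernel.shape[0]
    out = [float((grid[i:i + k] * kernel).sum())
           for i in range(0, grid.shape[0] - k + 1, stride)]
    return np.array(out)

# multi-scale features: concatenate the input grid with two layer outputs
Z0 = np.ones((5, 2)); Z1 = np.ones((5, 3)); Z2 = np.ones((5, 1))
multi = np.concatenate([Z0, Z1, Z2], axis=1)   # (M, c0 + c1 + c2)
feats = one_d_conv(multi, np.ones((2, 6)))     # filter spans 2 vertices
```

Because the concatenation preserves the aligned vertex order, the filter's receptive field always covers the same prototypes across different graphs.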

4.3 Discussions of the Proposed ASGCN

Comparing to existing state-of-the-art spatial graph convolution network models, the proposed ASGCN model has a number of advantages.

First, in order to transform the extracted multi-scale features from the graph convolution layers into fixed-sized representations, both the Neural Graph Fingerprint Network (NGFN) model [9] and the Diffusion Convolution Neural Network (DCNN) model [1] sum up the extracted local-level vertex features as global-level graph features through a SumPooling layer. Although the resulting fixed-sized features can be directly read by a classifier, it is difficult to capture the local topological information residing on the local vertices through the global-level graph features. By contrast, the proposed ASGCN model focuses more on extracting local-level aligned grid vertex features through the proposed spatial graph convolution operation on the aligned grid structures of graphs. Thus, the proposed ASGCN model can encapsulate richer local structural information than the NGFN and DCNN models associated with SumPooling.

Second, similar to the proposed ASGCN model, both the PATCHY-SAN based Graph Convolution Neural Network (PSGCNN) model [16] and the Deep Graph Convolutional Neural Network (DGCNN) model [27] also need to form fixed-sized vertex grid structures for arbitrary-sized graphs. To achieve this, these models rearrange the vertex order of each graph structure, and preserve a specified number of vertices with higher ranks. Although this unifies the number of vertices for different graphs, the discarded vertices may lead to significant information loss. By contrast, the aligned grid structures of the proposed ASGCN model encapsulate all the original vertex features of the original graphs, thus the proposed ASGCN model overcomes the information-loss shortcoming of the PSGCNN and DGCNN models. On the other hand, both the PSGCNN and DGCNN models tend to sort the vertices of each graph based on a local structural descriptor, ignoring consistent vertex correspondence information between different graphs. By contrast, the aligned grid structure of the proposed ASGCN model is constructed through a transitive vertex alignment procedure. As a result, only the proposed ASGCN model can encapsulate the structural correspondence information between any pair of graph structures, i.e., the vertices on the same spatial position are also transitively aligned to each other.

Finally, as stated in Sec. 4.1, the spatial graph convolution operation of the proposed ASGCN model is similar to performing standard fixed-sized convolution filters on standard grid structures. To further reveal this property, we explain the convolution process one step further with reference to Fig. 3. For the sample graph shown in Fig. 3, assume it has five vertices following fixed spatial vertex orders (positions), together with the collection of its vertex feature vectors over a given number of feature channels and a convolution filter whose size and channel number match the grid. Specifically, the procedure marked by the blue broken-line frame of Fig. 3 indicates that performing the proposed spatial graph convolution operation on the aligned vertex grid structure can be seen as respectively performing the same convolution filter on the five local-level neighborhood vertex grid structures included in the green broken-line frame. Here, each neighborhood vertex grid structure encapsulates only the original feature vectors of a root vertex and its adjacent vertices, with all the vertices following their original spatial positions. For the non-adjacent vertices, we assign dummy vertices (marked by the grey blocks) on the corresponding spatial positions of the neighborhood vertex grid structures, i.e., the elements of their feature vectors are all zero. Since the five neighborhood vertex grid structures are arranged by the spatial orders of their root vertices, the vertical concatenation of these neighborhood vertex grid structures can be seen as a global-level grid structure. We observe that the proposed spatial convolution operation is equivalent to sliding the fixed-sized convolution filter over this global-level grid structure with a stride equal to the number of vertices, i.e., this process is equivalent to sliding a standard classical convolution filter over a standard grid structure.
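The construction above can be sketched in a few lines of numpy. This is a minimal illustration under simplifying assumptions: a binary adjacency matrix, a single scalar-output filter, and no bias or normalization terms; the function names are ours, not the paper's.

```python
import numpy as np

def neighborhood_grids(A, X):
    """Build one n x c neighborhood vertex grid per root vertex.

    Row u of grid v keeps X[u] if u is the root or adjacent to it, and
    is a zero-padded dummy vertex otherwise, so every grid preserves
    the original spatial vertex positions."""
    n, c = X.shape
    grids = []
    for v in range(n):
        mask = A[v].astype(bool).copy()
        mask[v] = True                       # the root keeps its own features
        grids.append(np.where(mask[:, None], X, 0.0))
    return grids

def spatial_graph_conv(A, X, W):
    """Stack the grids vertically into an (n*n, c) global-level grid and
    slide one (n x c) filter W over it with stride n: one activation
    per root vertex, exactly as in a classical strided convolution."""
    grids = neighborhood_grids(A, X)
    stacked = np.vstack(grids)               # global-level grid structure
    n, _ = X.shape
    return np.array([np.sum(W * stacked[v * n:(v + 1) * n]) for v in range(n)])
```

With an all-ones filter this reduces to summing each root's own and adjacent feature vectors, which makes the sliding-filter interpretation easy to check by hand.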

As a result, the spatial graph convolution operation of the proposed ASGCN model is theoretically related to the classical convolution operation on standard grid structures, bridging the theoretical gap between traditional CNN models and spatially-based GCN models. Furthermore, since the convolution filter of the proposed ASGCN model is related to the classical convolution operation, it can assign each vertex a different weight parameter. Thus, the proposed ASGCN model can adaptively discriminate the importance between specified vertices during the convolution process. By contrast, as stated in Sec. 2, the existing spatial graph convolution operation of the DGCNN model only maps each vertex feature vector from the input dimension to the output dimension, and all the vertices share the same trainable parameters. As a result, the DGCNN model has less ability to discriminate the importance of different vertices during the convolution operation.
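The difference can be illustrated with a hypothetical two-function comparison: a DGCNN-style layer multiplies every vertex feature vector by one shared matrix, while a position-specific filter, possible only because the aligned grid fixes each vertex's spatial order, holds a separate weight vector per position. Both functions are illustrative sketches, not the actual implementations of either model.

```python
import numpy as np

def dgcnn_style(X, W):
    """Shared weights: every vertex feature vector is mapped by the same
    (c_in x c_out) matrix, so no vertex can be weighted differently."""
    return X @ W

def asgcn_style(X, w_pos):
    """Position-specific weights: the aligned grid fixes each vertex's
    spatial position u, so a filter can hold a distinct weight vector
    w_pos[u] per position, adaptively weighting vertex importance."""
    return float(np.einsum('uc,uc->', w_pos, X))    # one filter response
```

If two vertices carry identical features, the shared mapping necessarily treats them identically, whereas the position-specific filter can still weight them differently.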

5 Experiments

In this section, we compare the performance of the proposed ASGCN model with both state-of-the-art graph kernels and deep learning methods on graph classification problems over seven standard graph datasets. These datasets are abstracted from bioinformatics and social networks. Detailed statistics of the datasets are shown in Table 1.

 Datasets  MUTAG  PROTEINS  D&D  ENZYMES  IMDB-B  IMDB-M  RED-B
 Max # vertices              
 Mean # vertices              
 # graphs              
 # vertex labels              
 # classes              
 Description  Bioinformatics  Bioinformatics  Bioinformatics  Bioinformatics  Social  Social  Social
Table 1: Information of the Graph Datasets

Experimental Setup: We compare the performance of the proposed ASGCN model on graph classification problems with a) six alternative state-of-the-art graph kernels and b) seven alternative state-of-the-art deep learning methods for graphs. Specifically, the graph kernels include 1) the Jensen-Tsallis q-difference kernel (JTQK) with  [4], 2) the Weisfeiler-Lehman subtree kernel (WLSK) [20], 3) the shortest path graph kernel (SPGK) [6], 4) the shortest path kernel based on core variants (CORE SP) [17], 5) the random walk graph kernel (RWGK) [13], and 6) the graphlet count kernel (GK) [19]. The deep learning methods include 1) the deep graph convolutional neural network (DGCNN) [27], 2) the PATCHY-SAN based convolutional neural network for graphs (PSGCNN) [16], 3) the diffusion convolutional neural network (DCNN) [1], 4) the deep graphlet kernel (DGK) [25], 5) the graph capsule convolutional neural network (GCCNN) [22], 6) the anonymous walk embeddings based on feature driven (AWE) [12], and 7) the edge-conditioned convolutional networks (ECC) [21].

 Datasets  MUTAG  PROTEINS  D&D  ENZYMES  IMDB-B  IMDB-M  RED-B
 ASGCN              
 JTQK              
 WLSK              
 SPGK              
 CORE SP              
  GK              
 RWGK              
 Datasets  MUTAG  PROTEINS  D&D  ENZYMES  IMDB-B  IMDB-M  RED-B
 ASGCN              
 DGCNN              
 PSGCNN              
 DCNN              
 GCCNN              
 DGK              
 AWE              
 ECC              
Table 2: Classification Accuracies (± Standard Error) for Comparisons.

For the evaluation, we employ the same network structure for the proposed ASGCN model on all graph datasets. Specifically, we fix the number of prototype representations, the number of the proposed spatial graph convolution layers, and the number of spatial graph convolutions in each layer. Based on Fig. 1 and Sec. 4.2, we obtain concatenated outputs after the graph convolution layers, and we utilize a traditional CNN layer with the architecture C-P-C-P-C-F to further learn the extracted patterns, where C denotes a traditional convolutional layer, P denotes a classical MaxPooling layer, and F denotes a fully-connected layer. The filter size and stride are identical for each C layer. With the six sets of extracted patterns from the CNN layers to hand, we concatenate them and input the result into a new fully-connected layer followed by a Softmax layer with dropout. We use rectified linear units (ReLU) in both the graph convolution and the traditional convolution layers. The learning rate of the proposed model is the same for all datasets. The only hyperparameters we optimize are the number of epochs and the batch size for the mini-batch gradient descent algorithm. To optimize the proposed ASGCN model, we use Stochastic Gradient Descent with the Adam updating rule. Finally, note that our model needs to construct the prototype representations to identify the transitive vertex alignment information over all graphs. In this evaluation we compute the prototype representations from both the training and testing graphs. Thus, our model is an instance of transductive learning [10], where all graphs are used to compute the prototype representations but the class labels of the testing graphs are not used during the training process. For our model, we perform 10-fold cross-validation to compute the classification accuracies, with nine folds for training and one fold for testing. For each dataset, we repeat the experiment 10 times and report the average classification accuracies and standard errors in Table 2.
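As a rough illustration, the C-P-C-P-C-F stack can be sketched for a single-channel 1-D input. The real model is multi-channel, and its exact filter sizes, strides, and channel counts are not reproduced here; this sketch only shows how the layers compose, with all names and sizes chosen as assumptions.

```python
import numpy as np

def conv1d(x, k, stride=1):
    """Valid 1-D convolution of a single-channel signal x with kernel k."""
    n = (len(x) - len(k)) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + len(k)], k) for i in range(n)])

def maxpool1d(x, size=2, stride=2):
    """Classical MaxPooling layer."""
    n = (len(x) - size) // stride + 1
    return np.array([x[i * stride:i * stride + size].max() for i in range(n)])

def relu(x):
    return np.maximum(x, 0.0)

def cnn_head(x, k1, k2, k3, W_fc):
    """C-P-C-P-C-F: conv, pool, conv, pool, conv, then fully-connected."""
    x = relu(conv1d(x, k1))
    x = maxpool1d(x)
    x = relu(conv1d(x, k2))
    x = maxpool1d(x)
    x = relu(conv1d(x, k3))
    return x @ W_fc
```

Tracing the shapes through the stack makes the layer composition explicit; in the full model the output would then be concatenated with the other pattern sets before the final fully-connected and Softmax layers.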

For the alternative graph kernels, we follow the parameter settings from their original papers. We perform 10-fold cross-validation using the LIBSVM implementation of C-Support Vector Machines (C-SVM) and compute the classification accuracies, performing cross-validation on the training data to select the optimal parameters for each kernel and fold. We repeat the experiment 10 times for each kernel and dataset, and report the average classification accuracies and standard errors in Table 2. Note that for some kernels we directly report the best results from the corresponding original papers, since the evaluation of these kernels followed the same setting as ours. For the alternative deep learning methods, we report the best results for the PSGCNN, DCNN, and DGK models from their original papers, since these methods followed the same setting as the proposed model. For the AWE model, we report the classification accuracies of the feature-driven AWE, since the authors have stated that this variant achieves competitive performance on labeled datasets. Finally, although the PSGCNN and ECC models can leverage additional edge features, most of the graph datasets and alternative methods do not; thus, we do not report the results associated with edge features in this evaluation. The classification accuracies and standard errors for each deep learning method are also shown in Table 2.
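The kernel-side protocol hinges on slicing a precomputed kernel matrix per fold: a C-SVM with a precomputed kernel is fitted on the train-train block and evaluated on the test-train block. The numpy sketch below shows only this splitting logic, omitting the LIBSVM fitting step; the function names are illustrative.

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Shuffle indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def kernel_cv_splits(K, k=10, seed=0):
    """Yield (K_train, K_test) slices of a precomputed kernel matrix K.

    K_train is the (n_tr, n_tr) block used to fit a C-SVM with a
    precomputed kernel; K_test is the (n_te, n_tr) block of test rows
    against training columns used for prediction."""
    folds = kfold_indices(len(K), k, seed)
    for i in range(k):
        te = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        yield K[np.ix_(tr, tr)], K[np.ix_(te, tr)]
```

The key design point is that the test block is rectangular: each test graph is described only by its kernel values against the training graphs, so no test information leaks into the fitted C-SVM.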

Experimental Results and Discussions: Table 2 indicates that the proposed ASGCN model significantly outperforms both the competing graph kernel methods and the competing deep learning methods for graph classification. Specifically, among the graph kernel methods, only the accuracies of the SPGK and CORE SP kernels on the IMDB-M and RED-B datasets are slightly higher than those of the proposed model, and the proposed model remains competitive on these two datasets. Among the deep learning methods, only the accuracies of the GCCNN and AWE models on the ENZYMES and IMDB-M datasets are higher than those of the proposed model, which again remains competitive on the IMDB-M dataset.

Overall, the reasons for the effectiveness are fourfold. First, the C-SVM classifiers associated with graph kernels are instances of shallow learning methods [28]. By contrast, the proposed model provides an end-to-end deep learning architecture and can thus better learn graph characteristics. Second, as discussed earlier, most deep-learning-based graph convolution methods cannot integrate the correspondence information between graphs into the learning architecture. In particular, the PSGCNN and DGCNN models need to reorder the vertices, and some vertices may be discarded, leading to information loss. By contrast, the associated aligned vertex grid structures preserve more information of each original graph, reducing the problem of information loss. Third, unlike the proposed model, the DCNN model needs to sum up the extracted local-level vertex features as global-level graph features, whereas the proposed model can learn richer multi-scale local-level vertex features. Finally, as instances of spatially-based GCN models, the trainable parameters of the DGCNN and DCNN models are shared across all vertices, so these models cannot directly influence the aggregation process of the vertex features. By contrast, the graph convolution operation of the proposed model is theoretically related to the classical convolution operation on standard grid structures and can adaptively discriminate the importance between specified vertices.

6 Conclusions

In this paper, we have developed a new spatially-based GCN model, namely the Aligned-Spatial Graph Convolutional Network (ASGCN), to learn effective features for graph classification. The model transforms arbitrary-sized graphs into fixed-sized aligned grid structures and performs a newly developed spatial graph convolution operation on these grid structures. Unlike most existing spatially-based GCN models, the proposed ASGCN model can adaptively discriminate the importance between specified vertices during the spatial graph convolution operation, which explains the effectiveness of the proposed model. Experiments on standard graph datasets demonstrate the effectiveness of the proposed model.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant no.61503422 and 61602535), the Open Projects Program of National Laboratory of Pattern Recognition (NLPR), and the program for innovation research in Central University of Finance and Economics.

References

  • [1] Atwood, J., Towsley, D.: Diffusion-convolutional neural networks. In: Proceedings of NIPS. pp. 1993–2001 (2016)
  • [2] Bai, L., Cui, L., Rossi, L., Xu, L., Hancock, E.: Local-global nested graph kernels using nested complexity traces. Pattern Recognition Letters (To appear)
  • [3] Bai, L., Hancock, E.R.: Depth-based complexity traces of graphs. Pattern Recognition 47(3), 1172–1186 (2014)
  • [4] Bai, L., Rossi, L., Bunke, H., Hancock, E.R.: Attributed graph kernels using the jensen-tsallis q-differences. In: Proceedings of ECML-PKDD. pp. 99–114 (2014)
  • [5] Bai, L., Rossi, L., Zhang, Z., Hancock, E.R.: An aligned subtree kernel for weighted graphs. In: Proceedings of ICML
  • [6] Borgwardt, K.M., Kriegel, H.P.: Shortest-path kernels on graphs. In: Proceedings of the IEEE International Conference on Data Mining. pp. 74–81 (2005)
  • [7] Bruna, J., Zaremba, W., Szlam, A., LeCun, Y.: Spectral networks and locally connected networks on graphs. CoRR abs/1312.6203 (2013)
  • [8] Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Proceedings of NIPS. pp. 3837–3845 (2016)
  • [9] Duvenaud, D.K., Maclaurin, D., Aguilera-Iparraguirre, J., Gómez-Bombarelli, R., Hirzel, T., Aspuru-Guzik, A., Adams, R.P.: Convolutional networks on graphs for learning molecular fingerprints. In: Proceedings of NIPS. pp. 2224–2232 (2015)
  • [10] Gammerman, A., Azoury, K.S., Vapnik, V.: Learning by transduction. In: Proceedings of UAI. pp. 148–155 (1998)
  • [11] Henaff, M., Bruna, J., LeCun, Y.: Deep convolutional networks on graph-structured data. CoRR abs/1506.05163 (2015), http://arxiv.org/abs/1506.05163
  • [12] Ivanov, S., Burnaev, E.: Anonymous walk embeddings. In: Proceedings of ICML. pp. 2191–2200 (2018)
  • [13] Kashima, H., Tsuda, K., Inokuchi, A.: Marginalized kernels between labeled graphs. In: Proceedings of ICML. pp. 321–328 (2003)
  • [14] Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. CoRR abs/1609.02907 (2016), http://arxiv.org/abs/1609.02907
  • [15] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017)
  • [16] Niepert, M., Ahmed, M., Kutzkov, K.: Learning convolutional neural networks for graphs. In: Proceedings of ICML. pp. 2014–2023 (2016)
  • [17] Nikolentzos, G., Meladianos, P., Limnios, S., Vazirgiannis, M.: A degeneracy framework for graph similarity. In: Proceedings of IJCAI. pp. 2595–2601 (2018)
  • [18] Rippel, O., Snoek, J., Adams, R.P.: Spectral representations for convolutional neural networks. In: Proceddings of NIPS. pp. 2449–2457 (2015)
  • [19] Shervashidze, N., Vishwanathan, S.V.N., Petri, T., Mehlhorn, K., Borgwardt, K.M.: Efficient graphlet kernels for large graph comparison. Journal of Machine Learning Research 5, 488–495 (2009)
  • [20] Shervashidze, N., Schweitzer, P., van Leeuwen, E.J., Mehlhorn, K., Borgwardt, K.M.: Weisfeiler-lehman graph kernels. Journal of Machine Learning Research 12, 2539–2561 (2011)
  • [21] Simonovsky, M., Komodakis, N.: Dynamic edge-conditioned filters in convolutional neural networks on graphs. In: Proceedings of CVPR. pp. 29–38 (2017)
  • [22] Verma, S., Zhang, Z.: Graph capsule convolutional neural networks. CoRR abs/1805.08090 (2018), http://arxiv.org/abs/1805.08090
  • [23] Vialatte, J., Gripon, V., Mercier, G.: Generalizing the convolution operator to extend cnns to irregular domains. CoRR abs/1606.01166 (2016), http://arxiv.org/abs/1606.01166
  • [24] Witten, I.H., Frank, E., Hall, M.A.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann (2011)
  • [25] Yanardag, P., Vishwanathan, S.V.N.: Deep graph kernels. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10-13, 2015. pp. 1365–1374 (2015)
  • [26] Zambon, D., Alippi, C., Livi, L.: Concept drift and anomaly detection in graph streams. IEEE Trans. Neural Netw. Learning Syst. 29(11), 5592–5605 (2018)
  • [27] Zhang, M., Cui, Z., Neumann, M., Chen, Y.: An end-to-end deep learning architecture for graph classification. In: Proceedings of AAAI (2018)
  • [28] Zhang, S., Liu, C., Yao, K., Gong, Y.: Deep neural support vector machines for speech recognition. In: Proceedings of ICASSP. pp. 4275–4279 (2015)