1 Introduction
Much of modern data is naturally represented by graphs (Shrivastava and Li, 2014; Bronstein et al., 2017; Dadaneh and Qian, 2016; Cook and Holder, 2006; Qi et al., 2017b),
and it is an important research problem to design neural network architectures which can work with graphs. The adjacency matrix of a graph exhibits the local connectivity of the nodes. Thus, it is straightforward to extend the local feature aggregation used in Convolutional Neural Networks (CNNs) to Graph Neural Networks (GNNs)
(Simonovsky and Komodakis, 2017; Atwood and Towsley, 2016; Niepert et al., 2016; Bruna et al., 2014; Fey et al., 2018; Zhang et al., 2018; Gilmer et al., 2017). However, when the nodes of a graph are not labeled/attributed, or when the labels/attributes do not carry information about the differences between the nodes or about their locations in the graph, the GNN can fail to extract discriminative features. The GNN maps nodes whose local structures are similar to the same or close feature vectors in the continuous feature space. Accordingly, the GNN fails to distinguish graphs whose differences lie outside their local structures. In addition, when the nodes are not labeled/attributed, the spatial graph convolution only propagates information about the degrees of the nodes, which might not lead to informative features about the topological structure of the graph. Another limitation of the current GNN architectures is that they are mostly unable to perform the hierarchical feature learning employed in CNNs (Krizhevsky et al., 2012; He et al., 2016). The main reason is that graphs lack a tensor representation, and it is difficult to measure how accurately a subset of nodes represents the topological structure of the given graph.
Summary of Contributions. In this paper, we focus on the graph classification problem. The main contributions of this paper can be summarized as follows.

A shortcoming of GNNs is discussed, and it is shown that existing GNNs can fail to learn to perform even simple graph analysis tasks. We show that the proposed approach, which leverages a spatial representation of the graph, effectively addresses this shortcoming. Several new experiments demonstrate the shortcoming and show that providing the geometrical representation of the graph to the neural network substantially improves the capability of the GNN in inferring the structure of the graph.

The geometrical representation of the graph is leveraged to design a novel graph pooling method. The proposed approach simplifies the graph downsampling problem into a column/row sampling problem: it samples a subset of the nodes such that they preserve the structure of the graph. It is shown that the proposed approach achieves competitive or better performance in comparison with the existing methods.
Notation. Given a vector $\mathbf{x}$, $\|\mathbf{x}\|$ denotes its Euclidean norm, and $x_i$ denotes its $i$-th element. Given a matrix $\mathbf{X}$, $\mathbf{x}_i$ denotes the $i$-th row of $\mathbf{X}$. A graph with $n$ nodes is represented by two matrices $\mathbf{A} \in \mathbb{R}^{n \times n}$ and $\mathbf{X} \in \mathbb{R}^{n \times d}$, where $\mathbf{A}$ is the adjacency matrix, $\mathbf{X}$ is the matrix of node labels/attributes, and $d$ is the dimension of the attributes/labels of the nodes. The operation $\mathbf{a} \leftarrow \mathbf{b}$ means that the content of $\mathbf{a}$ is set equal to the content of $\mathbf{b}$. If $\mathcal{S}$ is a set of indices, $\mathbf{X}_{(\mathcal{S},:)}$ is the matrix of the rows of $\mathbf{X}$ whose indexes are in $\mathcal{S}$. The local structure corresponding to a node is the structure of the graph in the close neighbourhood of the node.
2 Related Work
This paper mainly focuses on the graph classification problem. In the proposed approach, the geometrical representation of the graph provided by a graph embedding algorithm is utilized to make the neural network aware of the topological structure of the graph. In this section, some of the related research works in GNN and graph embedding are briefly reviewed.
Graph Embedding: A graph embedding method aims at finding a continuous embedding vector for each node of the graph such that the topological structure of the graph is encoded in the spatial distribution of the embedding vectors. Nodes which are close on the graph or which share a similar structural role are mapped to nearby points in the embedding space, and vice versa. In general, graph embedding methods fall into three broad categories: matrix factorization based methods (Roweis and Saul, 2000; Ou et al., 2016; Belkin and Niyogi, 2002; Ahmed et al., 2013; Cao et al., 2015), random-walk based approaches (Perozzi et al., 2014; Grover and Leskovec, 2016), and deep learning based methods (Wang et al., 2016). In this paper, we use the DeepWalk graph embedding method (Perozzi et al., 2014), which is a random-walk based method.

Graph Neural Networks: In recent years, there has been a surge of interest in developing deep network architectures which can work with graphs (Kipf and Welling, 2017; Niepert et al., 2016; Kipf et al., 2018; Hamilton et al., 2017; Fout et al., 2017; Gilmer et al., 2017; Tixier et al., 2019; Simonovsky and Komodakis, 2017; Bruna et al., 2014; Bronstein et al., 2017; Duvenaud et al., 2015; Li et al., 2016)
. Local connectivity, weight sharing, and shift invariance of the convolution layer in CNNs have led to remarkable achievements in computer vision and natural language processing (Goodfellow et al., 2016). Accordingly, there are remarkable works focusing on the design of graph convolution layers encoding these traditional properties. Most of the existing graph convolution layers can be loosely divided into two main subsets: the spatial convolution layers and the spectral convolution layers. The spectral methods are based on the generalization of spectral filtering in graph signal processing (Bruna et al., 2014; Defferrard et al., 2016; Henaff et al., 2015; Levie et al., 2017). The major drawback of the spectral methods is their non-generalizability to data residing over multiple graphs, which is due to the dependency on the basis of the graph Laplacian. In contrast to spectral graph convolution, the spatial graph convolution performs the convolution directly in the nodal domain and can be generalized across graphs (Niepert et al., 2016; Zhang et al., 2018; Nguyen et al., 2018; Simonovsky and Komodakis, 2017; Schlichtkrull et al., 2018; Veličković et al., 2017; Verma and Zhang, 2018). If the sum function is used to aggregate the local feature vectors, a simple spatial convolution layer can be written as

$$\mathbf{y}_i = \phi\Big( \mathbf{W}_1 \mathbf{x}_i + \mathbf{W}_2 \sum_{j \in \mathcal{N}(i)} \mathbf{x}_j \Big), \qquad (1)$$

where $\phi$ is the elementwise nonlinear function, $\mathbf{x}_i$ and $\mathbf{y}_i$ are the input and output feature vectors of node $i$, and $\mathcal{N}(i)$ is the set of its neighbours. Some papers use a normalized version of (1) obtained via multiplying (1) with the inverse of the degree matrix (Ying et al., 2018). The weight matrix $\mathbf{W}_1$ transforms the feature vector of the given node and $\mathbf{W}_2$ transforms the feature vectors of the neighbouring nodes. A drawback of the spatial convolution is that the aggregation function might not be an injective function, i.e., two nodes with different local structures can be mapped to the same feature vector by the convolution layer. The authors of (Xu et al., 2019) showed that, with a minor modification, the sum aggregation function can become injective. However, when the nodes are not labeled/attributed or the labels/attributes do not inform the GNN about the location of the nodes, the GNN is not able to distinguish graphs whose local structures are similar but whose global structures are different.
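As a concrete illustration of the sum-aggregation convolution and of the indistinguishability issue discussed above, here is a minimal NumPy sketch of one such layer; the weight names W1/W2 and the ReLU nonlinearity are our assumptions, not the exact setup of any cited architecture.

```python
import numpy as np

def spatial_conv(A, X, W1, W2):
    """One sum-aggregation spatial convolution layer: the node's own
    features go through W1, the summed neighbour features through W2,
    followed by an elementwise ReLU."""
    return np.maximum(X @ W1 + (A @ X) @ W2, 0.0)

# Toy 3-node path graph 0-1-2 with unlabeled nodes (constant attributes).
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.ones((3, 1))                       # no labels: all nodes look alike
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(1, 4)), rng.normal(size=(1, 4))
H = spatial_conv(A, X, W1, W2)
# Nodes 0 and 2 have identical local structure (degree 1, both adjacent
# to node 1), so they receive identical feature vectors -- exactly the
# indistinguishability discussed in the text.
print(np.allclose(H[0], H[2]))            # True
```

Because the aggregation only sees degrees and (constant) attributes, no choice of weights can separate nodes 0 and 2 here.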
In order to obtain a global representation of the graph, the local feature vectors obtained by the convolution layers should be aggregated into a final feature vector. The elementwise max/mean functions are widely used to aggregate all the local feature vectors. Inspired by the hierarchical feature extraction in CNNs, tools have been developed to perform non-local feature aggregation (Duvenaud et al., 2015). In (Zhang et al., 2018), the nodes are ranked and the ranking is used to build a sequence of nodes using a subset of the nodes. Subsequently, a 1-dimensional CNN is applied to the sequence of the nodes to perform non-local feature aggregation. However, the way that (Zhang et al., 2018) builds the sequence of the nodes is not data driven. In addition, a 1-dimensional array might not have the capacity to preserve the topological structure of the graph. In (Ying et al., 2018), a soft graph pooling method was proposed which learns a set of cluster centers that are used to downsample the extracted local feature vectors. The graph pooling methods proposed in (Lee et al., 2019; Gao and Ji, 2019) are similar to the soft graph pooling method presented in (Ying et al., 2018), but they learn one cluster center which is used to rank the nodes and they sample the nodes which are closer to the learned cluster center. In (Defferrard et al., 2016; Fey et al., 2018), graph clustering algorithms were used to perform graph downsizing.

3 The Shortcoming of the GNNs
Suppose the nodes of the given graphs are not labeled/attributed, or assume that the labels/attributes do not contain any information about the role/location of the nodes in the global structure of the graph. Then, if two different nodes are not labeled or are labeled similarly, they appear the same to the GNN. Accordingly, if the local structures corresponding to two nodes are similar, the convolution layer of the GNN maps them to the same feature vector in the continuous feature space. This means that if the local structures of two different graphs are similar, the GNN cannot extract discriminative features to distinguish them. In the following, we study an example to clarify this behaviour of the GNN.
Illustrative example: Suppose we have a dataset of clustered unlabeled graphs (nodes and edges are not labeled/attributed) and assume that the clusters are not topologically different (a common generator created all the clusters). Every node is connected to a small set of the other nodes in its corresponding cluster. One class of graphs consists of two clusters and the other class consists of three clusters. Therefore, the task is to infer whether a graph is made of two clusters or three clusters. Consider the simplest case in which there is no connection between the clusters. Assume that we use a typical GNN which is composed of multiple spatial convolution layers and a global elementwise max/mean pooling layer. Suppose that a given graph belongs to the first class, i.e., it consists of two clusters. Define $\mathbf{f}_1$ and $\mathbf{f}_2$ as the aggregations of the local feature vectors corresponding to the first and the second clusters, respectively. The global feature vector of this graph is equal to the elementwise mean/max of $\mathbf{f}_1$ and $\mathbf{f}_2$. Clearly, $\mathbf{f}_2$ can be indistinguishable from $\mathbf{f}_1$ since the clusters are generated using the same generator and the GNN is not aware of the location of the nodes. Therefore, the feature vector of the whole graph can also be indistinguishable from $\mathbf{f}_1$. The same argument is also true for a graph with three clusters. Accordingly, the representation obtained by the GNN is unable to distinguish these two classes of graphs. If one trains a GNN on this task, the test accuracy is around 50%, which is not better than random guessing. The main reason is that the local structures corresponding to all the nodes in a graph are similar. In addition, the local structures in a graph with two clusters are not different from the local structures in a graph with three clusters. Since the GNN is not aware of the location of the nodes, all the nodes are mapped to similar feature vectors in the feature space. Accordingly, the spatial distributions of the local feature vectors of the two classes of graphs are indistinguishable.
As another example, note the last three columns of Table 1, which correspond to the three classification tasks described in Section 5. The second row shows the accuracy of a GNN on these tasks. In all three tasks, the performance of the GNN is comparable to random guessing.
Method       PROTEINS  NCI1   DD     ENZYM  SYNTHIE  HLLD   CNLC    CNC
GNN          76.87     69.09  75.68  43.30  67.75    54.24  37.33   36.40
GNN-ESR      79.47     73.72  79.77  53.51  71.15    99.10  98.18   99
Improvement  3.4%      6.7%   5.4%   23.5%  5.0%     79.4%  156.1%  179.6%

Table 1: Classification accuracy when nodes and edges are not labeled/attributed. Both the GNN and the GNN-ESR are composed of three spatial convolution layers and a global max pooling layer.
4 Proposed Approach
In Section 3, it was shown that the GNN can fail to infer the topological structure of the graph when the local structures in the graph are similar. The GNN is not aware of the location of each node in the global structure of the graph, and it maps all the nodes whose corresponding local structures are similar to same/close points in the feature space. This was the main reason that the GNN failed to learn to perform the task described in Section 3. Ideally, if the extracted feature vector for each node is a function of its location in the graph, the nodes are mapped to different points (corresponding to their locations) in the feature space, and the GNN can distinguish the graphs via analyzing the spatial distribution of the extracted local feature vectors. Accordingly, we propose an approach with which the extracted feature vectors depend on the location/role of the nodes in the global structure of the graph. Another motivation for the proposed approach is the remarkable success of neural networks in analyzing point-cloud data (Qi et al., 2017a). In point-cloud data, each data point corresponds to a point on the surface of the object, and the location of each point in the 3D space is included in the feature vector which represents each point of the surface. The neural networks which process the point-cloud representation yielded state-of-the-art performance in 3D vision tasks (Qi et al., 2017b).
In order to include the locations of the nodes in the extracted local feature vectors, first we have to define a representation for the location of each node. Suppose that the vector $\mathbf{u}_i$ in an $r$-dimensional Euclidean space contains the information about the location of the $i$-th node. Evidently, if node $i$ and node $j$ are close to each other on the graph, $\mathbf{u}_i$ and $\mathbf{u}_j$ should be close to each other in the continuous space, and vice versa. Interestingly, a graph embedding method can perfectly provide the location representation vectors $\{\mathbf{u}_i\}_{i=1}^{n}$. Graph embedding maps each node to a vector in the embedding space such that the distance between two points in the embedding space is proportional to the distance of the corresponding nodes on the graph. Accordingly, in order to make the extracted local feature vectors a function of the role of the nodes in the structure of the graph, we propose the approach depicted by the red box in Figure 1.
In Figure 1, the matrix $\mathbf{X}$ represents the node labels/attributes and $\mathbf{U}$ denotes the matrix of the embedding vectors obtained by a graph embedding method (e.g., DeepWalk). The red box represents a GNN Equipped with the Spatial Representation (GNN-ESR). The GNN-ESR leverages both the node labels/attributes and the node embedding vectors to extract the local features. Accordingly, the GNN-ESR is aware of the locations of the nodes and the distances between different nodes, and it does not map nodes whose corresponding local structures are similar to the same feature vector. The convolution layers of the GNN-ESR provide a spatial distribution of the local feature vectors which the subsequent layers use to infer the structure of the graph. For instance, in the example discussed in Section 3, the spatial distributions of the local feature vectors corresponding to two graphs in two different classes were indistinguishable. In sharp contrast, if the local feature vectors are extracted using the GNN-ESR, they can build two/three clusters in the feature space if the graph is composed of two/three clusters. Accordingly, the distribution of the feature vectors makes the two classes distinguishable.
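The input construction of Figure 1 amounts to giving the network both the attribute matrix and the embedding matrix. The following sketch assumes the two matrices are simply concatenated along the feature dimension before the first convolution layer; the exact fusion in the architecture may differ.

```python
import numpy as np

def gnn_esr_input(X, U):
    """Form the GNN-ESR input by concatenating the node attributes X
    (n x d) with the node embedding vectors U (n x r) computed by a
    graph embedding method such as DeepWalk."""
    assert X.shape[0] == U.shape[0], "one embedding vector per node"
    return np.concatenate([X, U], axis=1)   # shape n x (d + r)

X = np.zeros((5, 3))                        # unlabeled nodes: constant attributes
U = np.random.default_rng(1).normal(size=(5, 8))  # embeddings differ per node
Z = gnn_esr_input(X, U)
print(Z.shape)                              # (5, 11)
```

Even with constant attributes, the rows of the combined input differ across nodes, which is exactly what lets the downstream convolution layers tell the nodes apart.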
4.1 Graph Pooling via Data Point Sampling
One of the main challenges of extending the architecture of CNNs to graphs is to define a pooling function which is applicable to graphs. In this section, the proposed graph pooling method, which utilizes the geometrical representation of the graph, is presented. The proposed method is composed of two main steps: node sampling and graph downsampling.
Node Sampling: It is not straightforward to measure how accurately a subsampled graph represents the topological structure of the given graph. Graph embedding encodes the topological structure of the graph in the spatial distribution of the embedding vectors. Accordingly, the node sampling problem can be simplified into a data point sampling problem. Since the distribution of the embedding vectors represents the topological structure of the graph, we define the primary aim of the proposed pooling function as preserving the spatial distribution of the embedding vectors.
Data point sampling is a well-known problem in big data analysis (Halko et al., 2011), known as the column/row sampling problem. A simple method is to perform random data point sampling. However, if the distribution of the data points is sparse in some region of the space, random sampling might not be able to capture the spatial distribution of the data points. Most of the existing sampling methods aim at finding a small set of informative data points whose span is equal to the row space of the data. However, preserving the row space of the data is not necessarily equivalent to preserving the spatial distribution of the rows (Rahmani and Atia, 2017). Define $\mathbf{U}_{(\mathcal{S},:)}$ as the matrix of sampled embedding vectors, where $\mathcal{S}$ is the set of the indexes of the sampled embedding vectors. We define the objective of the sampling method as

$$\min_{\mathcal{S}} \; \sum_{i=1}^{n} \min_{j \in \mathcal{S}} \big\| \mathbf{u}_i - \mathbf{u}_j \big\| \quad \text{subject to} \quad |\mathcal{S}| = k, \qquad (2)$$

where $k$ is the cardinality of $\mathcal{S}$. The minimization problem (2) samples $k$ embedding vectors such that the sum of the distances between the embedding vectors and their nearest sampled embedding vector is minimized. This minimization problem is non-convex and hard to solve. We propose Algorithm 1, which uses farthest data point sampling and provides a greedy method to find the sampled data points. Algorithm 1 samples the first embedding vector randomly. In each subsequent step, the next embedding vector is sampled such that it has the maximum distance to the previously sampled embedding vectors. Accordingly, in each step the embedding vectors which are not close to the sampled ones are targeted, and gradually the sampled embedding vectors cover the distribution of all the embedding vectors.
Input: The matrix of embedding vectors $\mathbf{U} \in \mathbb{R}^{n \times r}$ and the number of sampled nodes $k$.
0. Initialization: Sample an index $s$ from the set $\{1, \dots, n\}$ randomly and initialize the set $\mathcal{S} = \{s\}$.
1. Repeat $k-1$ times: Sample the next embedding vector such that it has the maximum distance to the sampled embedding vectors, i.e., append $s$ to the set $\mathcal{S}$ such that

$$s = \underset{i \notin \mathcal{S}}{\arg\max} \; \min_{j \in \mathcal{S}} \big\| \mathbf{u}_i - \mathbf{u}_j \big\|. \qquad (3)$$

2. Output: $\mathcal{S}$ contains the indexes of the sampled nodes.
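The greedy farthest-point procedure of Algorithm 1 can be sketched in a few lines of NumPy; the incremental nearest-distance bookkeeping is an implementation detail we assume for efficiency.

```python
import numpy as np

def farthest_point_sampling(U, k, seed=0):
    """Greedy farthest-point sampling (Algorithm 1): start from a random
    node, then repeatedly add the node whose distance to its nearest
    already-sampled node is maximal."""
    n = U.shape[0]
    rng = np.random.default_rng(seed)
    S = [int(rng.integers(n))]
    # dist[i] = distance from node i to its nearest sampled node so far
    dist = np.linalg.norm(U - U[S[0]], axis=1)
    for _ in range(k - 1):
        s = int(np.argmax(dist))          # farthest from current samples
        S.append(s)
        dist = np.minimum(dist, np.linalg.norm(U - U[s], axis=1))
    return S

# Two well-separated clusters of embedding vectors: with k = 2 the two
# samples must land in different clusters, covering the distribution.
rng = np.random.default_rng(3)
U = np.vstack([rng.normal(0, 0.1, (20, 2)),    # cluster A: indexes 0..19
               rng.normal(5, 0.1, (20, 2))])   # cluster B: indexes 20..39
S = farthest_point_sampling(U, 2)
print(sorted(i < 20 for i in S))               # [False, True]
```

Wherever the random start lands, the second sample is forced into the opposite cluster, which is why the greedy rule covers sparse regions that random sampling can miss.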
Graph Down Sampling: Define $\mathbf{X}$ as the matrix of feature vectors (Figure 1) and define $\mathbf{X}'$ and $\mathbf{A}'$ as the matrix of feature vectors and the adjacency matrix of the subsampled graph, respectively. The subsampled adjacency matrix $\mathbf{A}'$ is obtained via sampling the columns and rows of $\mathbf{A}$ corresponding to the indexes in $\mathcal{S}$.
We present two methods to obtain $\mathbf{X}'$.
Method 1: Sample the rows of $\mathbf{X}$ corresponding to the indexes in $\mathcal{S}$. In other words,

$$\mathbf{X}' = \mathbf{X}_{(\mathcal{S},:)}. \qquad (4)$$
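Method 1 together with the row/column subsampling of the adjacency matrix can be sketched as follows (the function and variable names here are ours):

```python
import numpy as np

def downsample_graph(A, X, S):
    """Method 1: keep only the sampled nodes. The subsampled adjacency
    matrix takes the rows and columns of A indexed by S, and the
    subsampled feature matrix takes the rows of X indexed by S."""
    S = np.asarray(S)
    A_sub = A[np.ix_(S, S)]   # rows and columns of A in S
    X_sub = X[S]              # rows of X in S
    return A_sub, X_sub

# 4-cycle 0-1-3-2-0; sample the two opposite nodes 0 and 3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = np.arange(8, dtype=float).reshape(4, 2)
A_sub, X_sub = downsample_graph(A, X, [0, 3])
print(A_sub.shape, X_sub.shape)   # (2, 2) (2, 2)
```

Note that the two sampled nodes here end up with no edge between them, which is exactly the disconnection issue Remark 1 below addresses.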
Method 2: In (4), the feature vectors of the unsampled nodes are discarded. Thus, some useful information could be lost during the downsampling step. In the second method, we utilize the rows of $\mathbf{X}$ corresponding to the sampled indexes as pivotal feature vectors, and all the feature vectors are aggregated around them using the attention technique (Xu et al., 2015). Define $\mathbf{P} = \mathbf{X}_{(\mathcal{S},:)}$ as the matrix of pivotal feature vectors, which is equivalent to the matrix of the subsampled feature vectors in Method 1. Using the pivotal feature vectors, we define the assignment vectors as follows:

$$\mathbf{a}_i = \operatorname{softmax}\big( \mathbf{X} \, \mathbf{p}_i^{T} \big), \qquad (5)$$

where $\mathbf{p}_i$ is the $i$-th row of $\mathbf{P}$. The vector $\mathbf{a}_i \in \mathbb{R}^{n}$ is an attention vector which represents the resemblance between all the feature vectors and the $i$-th pivotal vector. The attention vector is used to aggregate the feature vectors of $\mathbf{X}$ and obtain the $i$-th feature vector of the subsampled graph as follows:

$$\mathbf{x}'_i = \mathbf{a}_i^{T} \mathbf{X}, \qquad (6)$$

where $\mathbf{x}'_i$ is the $i$-th row of $\mathbf{X}'$.
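One plausible reading of Method 2 is sketched below; the dot-product resemblance score and the softmax normalization are our assumptions, since the text does not pin down the exact attention form.

```python
import numpy as np

def attention_downsample(X, S):
    """Method 2 (sketched): use the sampled rows of X as pivotal feature
    vectors and aggregate ALL node features around each pivot with a
    softmax attention vector, so unsampled nodes still contribute."""
    P = X[np.asarray(S)]                  # pivotal feature vectors
    scores = X @ P.T                      # resemblance of every node to every pivot
    W = np.exp(scores - scores.max(axis=0, keepdims=True))
    W = W / W.sum(axis=0, keepdims=True)  # one attention vector per pivot (sums to 1)
    return W.T @ X                        # each new row: attention-weighted mix of X

# Four nodes whose features form two directions; sample one pivot per direction.
X = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0],
              [0.1, 0.9]])
X_sub = attention_downsample(X, [0, 2])
print(X_sub.shape)                        # (2, 2)
```

Each pooled feature vector leans toward the nodes that resemble its pivot, so information from the unsampled nodes 1 and 3 is retained rather than discarded.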
Remark 1.
If the nodes of the given graph are sparsely connected, the downsampled graph can be a disconnected graph. Define $\mathbf{A}_m = \mathbf{A}^m$. If the distance between node $i$ and node $j$ is less than $m$, $\mathbf{A}_m(i,j)$ is nonzero. Therefore, if we downsample $\mathbf{A}_m$ to obtain the adjacency matrix of the subsampled graph, two sampled nodes are connected if their distance is less than $m$. In the presented experiments, we downsized a power of $\mathbf{A}$ to obtain $\mathbf{A}'$.
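A small sketch of the idea in Remark 1: thresholding a power of the adjacency matrix connects nodes that are within a few hops of each other, so the subsampled graph is less likely to fall apart. Using $(\mathbf{A} + \mathbf{I})^m$ rather than $\mathbf{A}^m$ (so that walks of length up to $m$ count, not exactly $m$) is our assumption.

```python
import numpy as np

def connect_within(A, m):
    """Adjacency matrix linking every pair of nodes at graph distance
    at most m, obtained by thresholding the m-th power of (A + I)."""
    n = A.shape[0]
    M = np.linalg.matrix_power(A + np.eye(n), m)
    return (M > 0).astype(float) - np.eye(n)   # drop self-loops

# Path graph 0-1-2-3: squaring links distance-2 pairs but not distance-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A2 = connect_within(A, 2)
print(A2[0, 2], A2[0, 3])        # 1.0 0.0
```

Downsampling A2 instead of A keeps sampled nodes connected whenever some unsampled node sat between them in the original graph.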
Remark 2.
We presented two methods to obtain $\mathbf{X}'$. In the presented experiments, we used both of them, and the final $\mathbf{X}'$ was obtained as a convex combination of the two.
5 Numerical Experiments
First we focus on demonstrating the significance of equipping the GNN with the spatial representation. Subsequently, the proposed pooling method (Spatial Pooling) is compared with the existing graph pooling methods.
The structure of the base neural network: In the presented experiments, GNN-ESR indicates the neural network depicted by the red box in Figure 1. The dimensionality of the output of all the convolution layers is equal to 64. Each convolution layer is equipped with batch normalization (Ioffe and Szegedy, 2015), and ReLU is used as the elementwise nonlinear function. The outputs of all the convolution layers are concatenated to obtain the global representation of the graph, and the elementwise max function is used as the global aggregator. The fully connected classifier is composed of three fully connected layers. The first two layers of the fully connected classifier are equipped with dropout (Srivastava et al., 2014) and batch normalization. In the presented tables, GNN indicates a graph neural network similar to the GNN-ESR in which the matrix of embedding vectors is not provided as an input.

The input and the optimizer: The DeepWalk graph embedding method (Perozzi et al., 2014) was used to embed the graphs. The length of the random walks is determined as a function of the number of the nodes $n$. All the neural networks were similarly trained using the Adam optimizer (Kingma and Ba, 2015). The learning rate was initialized at 0.005 and was reduced to 0.0005 during the training process. Following the conventional settings, we perform k-fold cross validation (Zhang et al., 2018). We refer the reader to (Kersting et al., 2016) for a complete description of the used real-world datasets. The size of each synthetic dataset is equal to 1000.
5.1 Analyzing unlabeled/nonattributed graphs
In this section, we consider graphs whose nodes are not labeled/attributed. In some of the utilized real datasets the nodes are labeled/attributed; these labels are discarded in this experiment to examine the ability of the neural networks to infer the topological features of the graphs in the absence of labels/attributes. First we describe the designed synthetic graph classification tasks.
High Level Loop Detection (HLLD): In this task, the generated graphs are composed of 3 to 6 clusters (the number of clusters is chosen randomly per graph). Each cluster is composed of 20 to 45 nodes (the number of nodes is chosen randomly per cluster). Each node in a cluster is connected to 5 other nodes in the same cluster. In addition, if two clusters are connected, 3 nodes of one cluster are densely connected to 3 nodes of the other cluster. In the generated graphs, the consecutive clusters are connected. The classifiers are trained to detect whether the clusters in a graph form a loop. Obviously, there are many small loops inside each cluster; the task is to detect whether the clusters form a high-level loop. Accordingly, this task is equivalent to a binary classification problem. Part (a) of Figure 2 shows an example of the HLLD task.
Count Number of Clusters (CNC): In this task, the graphs are similar to the graphs in the HLLD task. However, the objective of the CNC task is to count the number of clusters. The generated graphs are composed of 3 to 6 clusters. This task is equivalent to a classification task with four classes.
Count the Number of the Loops of Clusters (CNLC):
In this task, the generated graphs contain several clusters and the clusters form multiple loops. For instance, the graph depicted in Part (b) of Figure 2 consists of 11 clusters and the clusters form 3 loops. In this experiment, the clusters can form 2 to 4 loops
and each loop is formed with 3 to 5 clusters. The number of clusters in each loop is chosen randomly per loop.
The objective of this task is to count the number of the loops. Thus, this task is equivalent to a graph classification task with three classes.
Table 1 shows the classification accuracy of a simple GNN and the GNN-ESR. The last three columns show the performance on the synthetic datasets. One can observe that the GNN failed to learn to perform the synthetic tasks. The main reason is that in the synthetic tasks, the local structures corresponding to all the nodes are similar and the convolution layers map all the nodes to similar vectors in the feature space. Thus, the GNN cannot extract discriminative features from the distribution of the local feature vectors. In sharp contrast, the GNN-ESR is aware of the differences/similarities between the nodes and it does not necessarily map them to similar feature vectors when the nodes are not close to each other on the graph. The results show that the GNN-ESR successfully extracts discriminative features from the distribution of the extracted local feature vectors. Moreover, the results on the real datasets demonstrate that the proposed approach significantly (up to 23%) improves the classification accuracy. The spatial representation makes the GNN-ESR aware of the differences between the nodes (although they are not labeled) and it paves the way for the GNN-ESR to extract discriminative features from the distribution of the embedding vectors.
5.2 Analyzing labeled graphs
This experiment is similar to the previous experiment but the node labels are included. We use the real datasets and the following synthetic tasks.
Number of Labeled Clusters (NLC): The graphs in this task are similar to the graphs in the CNC task. Each graph is composed of 6, 7, or 8 clusters. The consecutive clusters are connected and they form a loop. The labels of the nodes are binary (0 or 1) and all the nodes in the same cluster are labeled similarly. The task is to count the number of clusters whose nodes are labeled 1. There are 4 classes of graphs, i.e., the number of clusters labeled 1 is equal to 2, 3, 4, or 5. Figure 3 shows an example of this task.
Match the Diagonal Clusters (MDC): In this task, each graph is composed of 6, 8, or 10 clusters and the clusters form a loop. The labels of the nodes are binary and all the nodes in a cluster are labeled similarly. The task is to check whether all the pairs of diagonal clusters are labeled similarly. A pair of clusters is called a pair of diagonal clusters if they are at the maximum distance from each other. Therefore, the MDC task is equivalent to a binary classification task. Figure 3 shows an example of this graph classification task.
Method       PTC    PROTEINS  DD     ENZYM   SYNTHIE  NLC     MDC
GNN          77.44  80.01     82.90  61.66   67.75    90.33   86.44
GNN-ESR      77.64  80.54     83.33  69.16   71.15    100     97.50
Improvement  0.25%  0.66%     0.51%  12.17%  5.01%    10.70%  12.80%

Table 2: Classification accuracy when the node labels are included.
Table 2 shows classification accuracy with different datasets. One can observe that even when the nodes are labeled, the difference between the performances is significant for most of the datasets. The main reason is that the spatial representation makes the neural network aware of the differences between the nodes and the neural network leverages both the node labels and the node embedding vectors to extract discriminative features.
5.3 Graph Pooling
In this experiment, the proposed graph pooling method (Spatial Pooling) is compared with some of the recently published pooling algorithms. In all the neural networks, the described GNN-ESR is used to extract the local feature vectors. Similar to the architecture depicted in Figure 1, the pooling layer is placed after the last convolution layer of the GNN-ESR. Since the graphs in the datasets are not large (mostly fewer than 100 nodes), one downsampling layer was used. Four pooling methods are implemented as follows:

Spatial Pooling (our proposed method): Two spatial convolution layers were implemented after the downsampling step. Define $\mathbf{F}_1$ as the concatenation of the outputs of all the convolution layers before the pooling layer (three convolution layers) and define $\mathbf{f}_1$ as the aggregation of these feature vectors (by the elementwise max function). Similarly, define $\mathbf{F}_2$ as the concatenation of the outputs of all the convolution layers after the pooling layer and define $\mathbf{f}_2$ similar to $\mathbf{f}_1$. The final representation of the graph is defined as the concatenation of $\mathbf{f}_1$ and $\mathbf{f}_2$, and it is used as the input of the final classifier.
Method             MDC    PROTEINS  DD     ENZYM  SYNTHIE
Global MaxPooling  97.50  80.54     83.33  69.16  71.15
SpatialPooling     98.33  80.72     83.52  71.33  72.00
SortPooling        70.00  80.54     82.39  63.16  67.75
DiffPool           97.80  80.45     84.36  70.33  71.75
RankPool           97.66  80.72     83.67  67.50  71.50

Table 3: Classification accuracy of the GNN-ESR with different pooling methods.
Table 3 shows the accuracy of the GNN-ESR with different pooling methods. One can observe that SpatialPooling achieves higher or comparable results on all the datasets. The DiffPool method also achieves notable performance on most of the datasets. However, the feature vectors which DiffPool uses to downsize the graph are fixed. In contrast, the way the proposed approach downsizes the graph depends on the topology of the graph, not on a set of fixed cluster centers. One can observe that the accuracy of SortPooling is significantly lower than the accuracy of SpatialPooling and DiffPool on the MDC dataset. The main reason is that SortPooling sorts the nodes in a 1-dimensional array, and the placement of the nodes in a 1-dimensional array leads to losing important information about the structure of the graph. An observation one can make by studying the results is that the performance of the GNN-ESR alone is close to the performance of the GNN-ESR with the pooling layers. In contrast, the performance of a simple GNN with the pooling layers can be notably higher than the performance of the GNN (Yanardag and Vishwanathan, 2015).
Method              MDC    CNC    HLLD   ENZYM  SYNTHIE
GNN                 86.44  36.40  54.24  61.66  67.75
GNN + DiffPool      95.55  36.51  56.33  69.49  67.00
GNN-ESR             97.50  99     99.10  69.16  71.15
GNN-ESR + DiffPool  97.80  100    100    70.33  71.75

Table 4: Classification accuracy of the neural networks with and without a pooling layer.
For instance, Table 4 shows the performance of the neural networks with and without the pooling layers. One can observe that the performance of the GNN equipped with the pooling layer is notably higher than the performance of the GNN on the MDC and the ENZYM datasets. In contrast, the difference between the performances reported in the third and the fourth rows is smaller. Another predictable point that Table 4 makes is that the pooling layer cannot enable the GNN to learn to perform the CNC task or the HLLD task (and also the CNLC task). In these tasks, the neural network should be able to distinguish the nodes in order to infer the structure of the graph.
6 Conclusion
An important shortcoming of the GNN was discussed. It was shown that if the nodes/edges of the given graph are not labeled/attributed, or the labels/attributes do not make the GNN aware of the roles of the nodes in the structure of the graph, the GNN can fail to infer the topological structure of the graph. Motivated by the success of deep networks in analyzing point-cloud data, we proposed an approach in which a geometrical representation of the graph is provided to the GNN. The geometrical representation is computed by a graph embedding method which encodes the topological structure of the graph into the spatial distribution of the embedding vectors. We showed that the proposed approach significantly empowers the GNN by analyzing its performance on a diverse set of real and synthetic datasets. Moreover, the spatial representation was utilized to simplify the graph downsampling problem, and a novel graph pooling method was proposed.
References
Distributed large-scale natural graph factorization. In International Conference on World Wide Web (WWW), pp. 37–48.
Diffusion-convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), pp. 1993–2001.
Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems (NIPS), pp. 585–591.
Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine 34, pp. 18–42.
Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations (ICLR).
GraRep: learning graph representations with global structural information. In ACM International Conference on Information and Knowledge Management (CIKM), pp. 891–900.
Mining graph data. John Wiley & Sons.
Bayesian module identification from multiple noisy networks. EURASIP Journal on Bioinformatics and Systems Biology 2016, pp. 5.
Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems (NIPS), pp. 3844–3852.
Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems (NIPS), pp. 2224–2232.
SplineCNN: fast geometric deep learning with continuous B-spline kernels. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 869–877.
Protein interface prediction using graph convolutional networks. In Advances in Neural Information Processing Systems (NIPS), pp. 6530–6539.
Graph U-Nets. In International Conference on Machine Learning (ICML), pp. 2083–2092.
Neural message passing for quantum chemistry. In International Conference on Machine Learning (ICML), pp. 1263–1272.
Deep learning. Vol. 1, MIT Press.
Node2vec: scalable feature learning for networks. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 855–864.
Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review 53, pp. 217–288.
Representation learning on graphs: methods and applications. IEEE Data Engineering Bulletin 40, pp. 52–74.
Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778.
Deep convolutional networks on graph-structured data. CoRR abs/1506.05163.
Batch normalization: accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (ICML), pp. 448–456.
Benchmark data sets for graph kernels.
Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR).
Neural relational inference for interacting systems. In International Conference on Machine Learning (ICML), pp. 2693–2702.
Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR).
ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), pp. 1097–1105.
Self-attention graph pooling. In International Conference on Machine Learning (ICML), pp. 3734–3743.
CayleyNets: graph convolutional neural networks with complex rational spectral filters. IEEE Transactions on Signal Processing 67, pp. 97–109.
Gated graph sequence neural networks. In International Conference on Learning Representations (ICLR).
Learning graph representation via frequent subgraphs. In SIAM International Conference on Data Mining, San Diego, USA, pp. 306–314.
Learning convolutional neural networks for graphs. In International Conference on Machine Learning (ICML), pp. 2014–2023.
Asymmetric transitivity preserving graph embedding. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 1105–1114.
DeepWalk: online learning of social representations. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 701–710.
PointNet: deep learning on point sets for 3D classification and segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 77–85.
PointNet++: deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems (NIPS), pp. 5099–5108.
Spatial random sampling: a structure-preserving data sketching tool. IEEE Signal Processing Letters 24 (9), pp. 1398–1402.
Nonlinear dimensionality reduction by locally linear embedding. Science 290, pp. 2323–2326.
Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pp. 593–607.
A new space for comparing graphs. In IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 62–71.
Dynamic edge-conditioned filters in convolutional neural networks on graphs. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 29–38.
Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15, pp. 1929–1958.
Graph classification with 2D convolutional neural networks. In International Conference on Artificial Neural Networks (ICANN), pp. 578–593.
Graph attention networks. CoRR abs/1710.10903.
Graph capsule convolutional neural networks. CoRR abs/1805.08090.
Structural deep network embedding. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 1225–1234.
Show, attend and tell: neural image caption generation with visual attention. In International Conference on Machine Learning (ICML), pp. 2048–2057.
How powerful are graph neural networks? In International Conference on Learning Representations (ICLR).
Deep graph kernels. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 1365–1374.
Hierarchical graph representation learning with differentiable pooling. In Advances in Neural Information Processing Systems (NeurIPS), pp. 4805–4815.
An end-to-end deep learning architecture for graph classification. In AAAI Conference on Artificial Intelligence (AAAI), pp. 4438–4445.