I Introduction
Over the past decade, deep learning techniques such as convolutional neural networks (CNNs) have transformed computer vision and other Euclidean data domains (i.e., domains in which data have a uniform, grid-like structure). Many important domains, however, comprise non-Euclidean data (i.e., data with irregular relationships that require mathematical concepts like graphs or manifolds to model explicitly). Such domains include social networks, sensor feeds, web traffic, supply chains, and biological systems. As these data grow in size and complexity, deep learning is a natural candidate for classification and pattern recognition, but conventional deep learning approaches are often sharply limited when data lack a Euclidean structure to exploit. There are ongoing efforts to extend deep learning to these non-Euclidean domains, and such techniques have been dubbed geometric deep learning [3].
In parallel with advances in geometric deep learning are advances in graph signal processing (GSP) [19, 18]. Research in GSP attempts to generalize classical signal processing theory to irregular data defined on graphs. One attraction of GSP is that it provides a unified mathematical framework through which to view the spectral and vertex domains of a graph. Concepts like frequency or smoothness, which can be understood intuitively in classical signal processing, can be explicitly defined for data on graphs.
Graph convolutional neural networks (GCNNs), an extension of CNNs to graph-structured data, were first implemented with concepts from spectral graph theory [4], and methods based on the spectral approach have since been refined and expanded [7, 15]. Reference [9] proposes the topology adaptive graph convolutional network (TAGCN), which defines graph convolution directly in the vertex domain as multiplication by polynomials of the graph adjacency matrix. This is consistent with the concept of convolution in graph signal processing [19]. TAGCN designs a set of fixed-size learnable filters whose topologies adapt to the topology of the graph as the filters scan the graph to perform convolution; see also [8, 21]. Other implementations, such as GraphSAGE [12] and graph attention networks (GATs) [22], are also defined directly in the vertex domain of the graph and apply a learned, convolution-like aggregation function.
An important operation in conventional CNNs is pooling, a nonlinear downsampling operation. Pooling layers in a CNN shrink the number of dimensions of the feature representation, thereby reducing the computation cost, memory footprint, and number of learned parameters. As a result, pooling allows for deeper networks in practice and can help control overfitting. Additionally, pooling has translation invariance properties that are desirable in many applications. Recently, the use of pooling in CNNs has come into question, but it remains popular.
Just as convolution and convolution-like methods have been proposed to create graph convolutional layers in GCNNs, several methods have been proposed to perform pooling in GCNNs [24], [11], [25]. Unlike convolution, which has been derived in GSP [19], pooling has not been rigorously defined. Therefore, the current generation of pooling methods is based on ad hoc rather than systematic approaches. These methods have nonetheless shown improved accuracy on popular graph classification datasets.
In this paper, we perform experiments on graph classification datasets to study the effect of the choice of graph convolution and graph pooling in GCNNs. Graph classification is a supervised learning task in which previously unseen graphs are classified based on labeled graphs. This task is analogous to image classification. As with CNNs and image classification, tools like pooling layers are important for constructing high-level representations from node-level information.
The paper is organized as follows: we first present the background and related work in Section II. Section III describes our proposed approach. In Section IV, we discuss the datasets used and present the results and analysis. Finally, we conclude the paper in Section V.
I-A Graph Signal Processing Perspective
The convolution and pooling operators in graph neural networks have a theoretical foundation in GSP. GSP [19] extends traditional discrete signal processing to graph signals, i.e., signals that are indexed by the nodes of a graph.
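Let $G = (V, A)$ be a graph with adjacency matrix $A \in \mathbb{C}^{N \times N}$, where $V$ is the set of $N$ nodes and a nonzero entry $A_{ij}$ denotes a directed edge from node $j$ to node $i$.
A map $x: V \to \mathbb{C}$ on $G$ is a graph signal, where $\mathbb{C}^N$ is the signal space over the nodes of $G$ and $x \in \mathbb{C}^N$. We write $x = [x_1, \ldots, x_N]^\top$, and $x_i$ represents a measurement at node $i$.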
The heart of GCNNs is applying convolutional filters to graph signals. In GSP, convolution is a matrix-vector multiplication of a polynomial of the adjacency matrix, $P(A) = \sum_{k=0}^{K} h_k A^k$, and the graph signal $x$. This definition is used to create the graph convolutional layer in GCNNs.
The GSP literature also covers sampling [5], [1], where several sampling set selection and sampling methods are proposed. The pooling methods explored herein are not based specifically on these sampling methods, but we observe that there is a relationship between sampling in GSP and pooling in GCNNs. Both reduce the number of values in the signal and can reduce the number of nodes in the graph. The key difference is that sampling focuses on how to recover the original signal from the sampled signal, whereas recoverability is not required of pooling algorithms in GCNNs.
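The GSP view of convolution as a polynomial of the adjacency matrix applied to a graph signal can be illustrated with a minimal pure-Python sketch. The function names, the filter taps `h`, and the 4-node directed cycle are illustrative assumptions, not taken from the paper; the sketch also assumes the convention that `A[i][j]` is nonzero for an edge from node `j` to node `i`.

```python
def mat_vec(A, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def graph_filter(A, x, h):
    """Return (h[0] I + h[1] A + ... + h[K] A^K) x via repeated shifts."""
    out = [0.0] * len(x)
    shifted = list(x)  # A^0 x
    for h_k in h:
        out = [o + h_k * s for o, s in zip(out, shifted)]
        shifted = mat_vec(A, shifted)  # next power: A^(k+1) x
    return out


# Hypothetical example: directed cycle on 4 nodes (edge from j to (j + 1) % 4).
A = [[1 if i == (j + 1) % 4 else 0 for j in range(4)] for i in range(4)]
x = [1.0, 2.0, 3.0, 4.0]
print(graph_filter(A, x, [1.0, 0.5, 0.25]))  # [3.75, 3.5, 4.25, 6.0]
```

On the directed cycle the adjacency matrix acts as a cyclic shift, so the filter reduces to an ordinary circular FIR filter, which is the sense in which GSP generalizes classical convolution.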
II Related Work
In this section, we describe the infrastructure for graph convolutional and pooling layers and the related literature.
II-A Graph Convolutional Layer
We concentrate on three implementations of GCNNs, derived from different definitions of graph convolution: graph convolutional networks (GCNs) [15], GraphSAGE [12], and topology-adaptive graph convolutional networks (TAGCNs) [9, 21].
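In GCN [15], given a graph signal $X^{(l)} \in \mathbb{R}^{N \times C_l}$ (where $l = 0$ denotes the input layer, $N$ is the number of nodes, and $C_l$ is the number of features/input channels) and a graph structure $A$, a graph convolutional layer is defined as follows:
$$X^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X^{(l)} W^{(l)}\right) \tag{1}$$
where $\tilde{A} = A + I$, $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, $W^{(l)} \in \mathbb{R}^{C_l \times C_{l+1}}$ is the trainable weight matrix, $\sigma$ is the nonlinear activation function, and $C_{l+1}$ is the number of output channels. $X^{(0)}$ is the input signal for the first layer, and we can propagate the graph signal through additional layers in the network. This approach is based on a first-order approximation of localized spectral filters on graphs [13].
In GraphSAGE [12], graph convolution is defined as follows, for each node $v$ of $G$:
$$x_v^{(l+1)} = \sigma\left(W^{(l)} \cdot \mathrm{CONCAT}\left(x_v^{(l)}, \mathrm{AGG}\left(\{x_u^{(l)} : u \in \mathcal{N}(v)\}\right)\right)\right) \tag{2}$$
where $x_v^{(l)}$ is the feature vector of node $v$ at layer $l$, $\mathrm{AGG}$ is an aggregator function (e.g., sum, mean, or max), and $\mathcal{N}(v)$ is a random sample of the node $v$'s neighbors.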
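The GCN propagation rule of [15] (self-loops added, symmetric degree normalization, then a learned linear map and a nonlinearity) can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions: ReLU is chosen as the activation, matrices are plain lists of rows, and all function names are hypothetical rather than taken from the paper or from PyTorch Geometric.

```python
import math


def normalized_adjacency(A):
    """Return D~^(-1/2) (A + I) D~^(-1/2) for a square adjacency matrix A."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in A_tilde]  # degrees including self-loops
    return [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(P, Q):
    """Dense matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]


def relu(M):
    return [[max(0.0, v) for v in row] for row in M]


def gcn_layer(A, X, W):
    """One GCN layer: ReLU(A_hat X W), with A_hat the normalized adjacency."""
    return relu(matmul(matmul(normalized_adjacency(A), X), W))
```

Each layer mixes features only between one-hop neighbors, so information from nodes $K$ hops away requires stacking $K$ such layers.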
II-B Graph Pooling Layer
Similar to graph convolution, graph pooling is inspired by pooling in CNNs. In addition to static pooling methods [17, 2], various differentiable methods have been proposed.
Using the same notation as (1), a graph pooling operator should yield a new signal $X' \in \mathbb{R}^{N' \times C}$ and adjacency matrix $A' \in \mathbb{R}^{N' \times N'}$, usually with $N' < N$. See Fig. 1 for an example.
An important benefit of graph pooling is the hierarchical representation of data and structure. Otherwise, global patterns in the data are usually not considered until the final aggregation layer of a network. Below we describe four recent graph pooling algorithms.
II-B1 Sort Pooling
Sort Pooling (SortPool) [25] operates after the last graph convolution layer. Instead of summing or averaging features, SortPool arranges the vertices in a consistent order and outputs a representation of fixed size, so that conventional CNN layers can be trained on it.
The vertices are sorted based on their structural roles within the graph. Using the connection between graph convolution and the Weisfeiler-Lehman subtree kernel [20], SortPool sorts the nodes in descending order by their features in the last layer, breaks ties using the layer before, and finally selects the top $k$ nodes.
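The sort-then-truncate step can be sketched as follows. This is a simplified illustration, assuming the node features of all layers are concatenated channel-wise into one row per node, so that comparing channels from last to first also compares later layers before earlier ones; the zero-padding of small graphs and the function name are assumptions of this sketch.

```python
def sort_pool(features, k):
    """Sort nodes in descending order by the last channel, break ties with
    earlier channels, then keep the first k rows (zero-padded if needed)."""
    num_channels = len(features[0])
    # Reversing each row makes tuple comparison start from the last channel.
    ordered = sorted(features, key=lambda row: tuple(reversed(row)),
                     reverse=True)
    kept = [list(row) for row in ordered[:k]]
    # Pad so that every graph yields exactly k rows.
    kept += [[0.0] * num_channels for _ in range(k - len(kept))]
    return kept
```

Because every graph is mapped to a `k`-row matrix with a consistent node order, the output can be fed to ordinary 1-D convolutional and dense layers.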
II-B2 Differentiable Pooling
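Differentiable Pooling (DiffPool) [24] is a differentiable graph pooling module that learns hierarchical representations of graphs by aggregating nodes through several pooling layers. It uses a learned assignment matrix $S \in \mathbb{R}^{N \times N'}$, which softly assigns each of the $N$ nodes to one of $N'$ clusters, and updates the graph signal and topology as follows:
$$X' = S^\top Z, \qquad A' = S^\top A S,$$
where $Z$ denotes the node embeddings produced by the graph convolutional layers at the current level.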
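A minimal sketch of the DiffPool coarsening step, following the formulation in [24]: a row-wise softmax turns assignment logits into a soft assignment matrix `S`, which then pools the node embeddings `Z` and the adjacency `A`. In [24] both `Z` and the logits come from separate GNNs; here they are passed in as plain matrices, and all names are illustrative assumptions.

```python
import math


def softmax_rows(M):
    """Row-wise softmax, shifted by the row maximum for stability."""
    out = []
    for row in M:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]


def diffpool(A, Z, logits):
    """Return (S^T Z, S^T A S) with S = softmax(logits) taken row-wise."""
    S = softmax_rows(logits)
    S_t = transpose(S)
    return matmul(S_t, Z), matmul(matmul(S_t, A), S)
```

Since each row of `S` sums to one, the total edge weight of the graph is preserved by the coarsening, while the pooled adjacency is in general dense.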
II-B3 Top-k Pooling
Top-k Pool [11] pools using a trainable projection vector $p$: it computes the projection $y = Xp / \|p\|$, selects the top-$k$ indices of $y$, and keeps the corresponding rows of $X$ and the corresponding edges in $A$.
Top-k Pool is inspired by encoder-decoder architectures like U-Nets. In addition to the Top-k Pool operation, there is also an Unpool operation that reverses the process. Combined, these two operations create an encoder-decoder model on graphs, known as the graph U-Nets [11]. Reference [11] shows that Top-k Pool with the U-Net structure performs better than DiffPool; here we examine whether it also works well on its own compared with other pooling algorithms.
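The Top-k Pool selection can be sketched as below. This is a minimal illustration assuming the sigmoid gating described in [11] (other implementations may gate differently); the function name and argument layout are hypothetical.

```python
import math


def topk_pool(A, X, p, k):
    """Keep the k nodes with the largest projection scores X p / ||p||."""
    norm = math.sqrt(sum(v * v for v in p))
    scores = [sum(x * w for x, w in zip(row, p)) / norm for row in X]
    # Indices of the k highest-scoring nodes (stable for equal scores).
    idx = sorted(range(len(X)), key=lambda i: scores[i], reverse=True)[:k]
    # Gate the kept features so that p receives gradients during training.
    gate = [1.0 / (1.0 + math.exp(-scores[i])) for i in idx]
    X_new = [[g * v for v in X[i]] for i, g in zip(idx, gate)]
    # Induced subgraph on the kept nodes.
    A_new = [[A[i][j] for j in idx] for i in idx]
    return X_new, A_new, idx
```

Because only the induced subgraph is kept, paths that ran through dropped nodes are lost, which is one plausible reason the operation behaves differently with and without the U-Net decoder.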
II-B4 Self-Attention Graph Pooling
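Self-Attention Graph Pooling (SagPool) [16] uses an attention mechanism to select the important nodes:
$$Z = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X \Theta_{att}\right) \tag{6}$$
$$\mathrm{idx} = \text{top-rank}\left(Z, \lceil kN \rceil\right) \tag{7}$$
$$X' = X_{\mathrm{idx},:} \odot Z_{\mathrm{idx}} \tag{8}$$
$$A' = A_{\mathrm{idx},\mathrm{idx}} \tag{9}$$
where $\Theta_{att}$ is the trainable attention weight and $k \in (0, 1]$ is the pooling ratio. The attention score $Z$ is calculated with a GCN layer, and the top $\lceil kN \rceil$ nodes are selected from it. Since graph convolution is used to obtain the self-attention score, SagPool uses both the graph features and structure [16]. Reference [16] shows that SagPool performs better than DiffPool and Top-k Pool across some biochemical datasets.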
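A minimal sketch of the SagPool selection, following the formulation in [16]: the score of each node is a one-layer GCN output (so it depends on the neighbors' features as well as the node's own), and the top fraction of nodes is kept and gated by the score. The use of tanh as the activation follows [16]; the function name, the single-column attention weight `theta`, and the list-of-rows layout are assumptions of this sketch.

```python
import math


def sag_pool(A, X, theta, ratio):
    """Keep ceil(ratio * N) nodes ranked by a GCN-based attention score."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in A_tilde]
    # Project features to one scalar per node: X @ theta.
    proj = [sum(x * t for x, t in zip(row, theta)) for row in X]
    # Attention score: tanh of the normalized neighborhood aggregation.
    scores = [
        math.tanh(sum(A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) * proj[j]
                      for j in range(n)))
        for i in range(n)
    ]
    keep = math.ceil(ratio * n)
    idx = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    X_new = [[scores[i] * v for v in X[i]] for i in idx]
    A_new = [[A[i][j] for j in idx] for i in idx]
    return X_new, A_new, idx
```

The only structural difference from the Top-k Pool projection is that the score passes through the normalized adjacency, which is what lets the selection depend on graph topology.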
III Proposed Method
We first compare GCN, GraphSAGE, and TAGCN for graph classification across four benchmark datasets. We then investigate how pooling affects these results by combining the different convolutional architectures with the four pooling techniques described above, i.e., SortPool [25], DiffPool [24], Top-k Pool [11], and SagPool [16]. In each instance, the pooling method is paired with GCN or GraphSAGE (whichever was used in the corresponding pooling paper) and compared with the same pooling method paired with TAGCN.
IV Experiments
IV-A Datasets
To evaluate the efficacy of the different methods, we apply them to real-world graph kernel benchmarks. See Table I for the properties of these datasets. We evaluate our methods on bioinformatics datasets and social network datasets. Both the MUTAG and Proteins datasets are bioinformatics data. MUTAG [6] is a dataset consisting of chemical compounds represented by graphs. The task is to predict whether the chemical compound is mutagenic. Proteins [14] is a dataset consisting of proteins represented by graphs. The objective is to predict whether a protein functions as an enzyme. In both datasets, the nodes are structure elements, and two nodes are connected if there is a chemical bond between the structure elements represented by the nodes.
For social network datasets, we chose IMDB-Binary and Reddit-Binary. IMDB-Binary [23] is a set of graphs corresponding to ego-networks of actors and actresses. An edge is drawn between two actors if they were cast in the same movie. The task is to predict whether the genre is romance or action. In Reddit-Binary [23], each graph corresponds to an online discussion thread. An edge is drawn between two users if one has replied to the other. The task is to predict whether a thread belongs to a discussion forum or a question-answering forum.
Table I: Properties of the datasets.
Dataset        Graphs  Classes  Avg Nodes  Avg Edges
MUTAG          188     2        17.7       38.9
Proteins       1113    2        39.06      72.82
IMDB-Binary    1000    2        19.77      96.53
Reddit-Binary  2000    2        429.63     497.75
IV-B Network Training
We perform 5-fold cross-validation to select the hyperparameters from the validation accuracy and to estimate the test accuracy. For the baselines, the hyperparameters are the number of graph convolutional layers, the number of channels in each layer, the dropout rate, the pooling rate (number or percentage of nodes to keep), and (for TAGCN) the order of the polynomial filter. For a fairer comparison, we considered 1 to 5 layers for TAGCN vs. 1 to 15 layers for GCN and GraphSAGE when using graph polynomial filters of degree 3 (to show that 1 layer of TAGCN with degree $K$ is not equivalent to $K$ layers of GCN/GraphSAGE). We use cross-entropy loss and Adam optimization with a starting learning rate of 0.01, a decay factor of 0.5, and a decay step size of 50. Experiments were performed in PyTorch using code from the PyTorch Geometric library [10].
IV-C Results
Fig. 2 shows the results of the GCNN variants with no pooling, DiffPool, SagPool, SortPool, and Top-k Pool. The green, orange, and blue bars are the means of the cross-validated accuracy, and the smaller black error bars are their standard deviations.
IV-C1 Graph Convolution Comparison
In general, TAGCN performs better than GCN and GraphSAGE on the four graph classification benchmarks. However, due to the increase in complexity, TAGCN has high variance, especially on denser graph structures. TAGCN performs better as graphs become less sparse, i.e., as average degree increases.
We also show empirically that simply increasing the number of layers in GCN and GraphSAGE is not analogous to increasing the order of the polynomial filter in TAGCN. We attribute the advantage of TAGCN mainly to 1) passing a residual connection of the graph signal, and 2) having weights associated with each polynomial of the adjacency matrix. In comparison, GCN and GraphSAGE do not improve much after five layers, perhaps also suffering from oversmoothing.
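The distinction between filter order and depth can be made concrete with a short comparison. The expressions below are a sketch under stated assumptions: $\hat{A}$ denotes a normalized adjacency matrix, bias terms are omitted, the TAGCN layer is written in the polynomial form of [9] with one weight matrix $W_k$ per power, and the nonlinearity is dropped from the stacked GCN layers to expose the linear structure.

```latex
% One TAGCN-style layer of polynomial order K: a separate weight matrix
% for every power of \hat{A}; the k = 0 term passes the signal through.
X^{(l+1)} = \sigma\Big( \sum_{k=0}^{K} \hat{A}^{k} X^{(l)} W_k^{(l)} \Big)

% K stacked GCN-style layers with the nonlinearity removed: the product
% collapses to a single K-hop term with one combined weight matrix.
X^{(l+K)} = \hat{A}^{K} X^{(l)} \, W^{(l)} W^{(l+1)} \cdots W^{(l+K-1)}
```

Under these assumptions, the stacked layers weight only the $K$-hop term, whereas the polynomial layer weights the $0$-hop through $K$-hop terms independently; this matches the two attributed advantages, the pass-through term and the per-power weights.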
IV-C2 Graph Convolution and Graph Pooling Comparison
Among the pooling algorithms, DiffPool generally performs the best. SagPool and SortPool perform better for MUTAG and Proteins, but similar or worse for IMDB-Binary and Reddit-Binary. Top-k Pool performs poorly, suggesting that it requires the encoder-decoder structure to perform well. In general, only DiffPool is consistently better than no pooling.
The results for graph convolution carry over to graph pooling combined with graph convolution. TAGCN with pooling generally performs better than GCN and GraphSAGE with pooling but is more prone to overfitting, likely for the same reasons.
V Conclusion
TAGCN generally performs well against GCN and GraphSAGE on graph classification datasets, with and without pooling, for sparser and larger graphs. We also find that DiffPool generally outperforms the other pooling methods evaluated. For future work, we would like to develop a better theoretical understanding of GCNNs by studying problems such as oversmoothing and parameter design.
References
[1] (2016-07) Efficient Sampling Set Selection for Bandlimited Graph Signals Using Graph Spectral Proxies. IEEE Transactions on Signal Processing 64 (14), pp. 3775–3789. ISSN 1941-0476.
[2] (2006) Graph Cuts in Vision and Graphics: Theories and Application. In Handbook of Mathematical Models in Computer Vision, N. Paragios, Y. Chen, and O. D. Faugeras (Eds.), pp. 79–96.
[3] (2017-07) Geometric Deep Learning: Going Beyond Euclidean Data. IEEE Signal Processing Magazine 34 (4), pp. 18–42.
[4] (2014) Spectral Networks and Locally Connected Networks on Graphs. In International Conference on Learning Representations (ICLR).
[5] (2015-12) Discrete Signal Processing on Graphs: Sampling Theory. IEEE Transactions on Signal Processing 63 (24), pp. 6510–6523. ISSN 1941-0476.
[6] (1991) Structure-activity Relationship of Mutagenic Aromatic and Heteroaromatic Nitro Compounds. Correlation with Molecular Orbital Energies and Hydrophobicity. Journal of Medicinal Chemistry 34 (2), pp. 786–797.
[7] (2016) Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In Advances in Neural Information Processing Systems (NIPS).
[8] (2018-06) On Graph Convolution For Graph CNNs. In 2018 IEEE Data Science Workshop (DSW), pp. 1–5.
[9] (2017) Topology Adaptive Graph Convolutional Networks. Computing Research Repository abs/1710.10370. arXiv:1710.10370.
[10] (2019) Fast Graph Representation Learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds.
[11] (2019, 9–15 Jun) Graph U-Nets. In 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov (Eds.), Proceedings of Machine Learning Research, Vol. 97, Long Beach, California, USA, pp. 2083–2092.
[12] (2017) Inductive Representation Learning on Large Graphs. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), pp. 1024–1034.
[13] (2011) Wavelets on Graphs via Spectral Graph Theory. Applied and Computational Harmonic Analysis 30 (2), pp. 129–150. ISSN 1063-5203.
[14] (2005) Protein Function Prediction via Graph Kernels. Intelligent Systems for Molecular Biology.
[15] (2017) Semi-Supervised Classification with Graph Convolutional Networks. In 5th International Conference on Learning Representations (ICLR) 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings.
[16] (2019, 9–15 Jun) Self-Attention Graph Pooling. In Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov (Eds.), Proceedings of Machine Learning Research, Vol. 97, Long Beach, California, USA, pp. 3734–3743.
[17] (2019) Clique Pooling for Graph Classification. Computing Research Repository abs/1904.00374. arXiv:1904.00374.
[18] (2018-05) Graph Signal Processing: Overview, Challenges, and Applications. Proceedings of the IEEE 106 (5), pp. 808–828.
[19] (2013-04) Discrete Signal Processing on Graphs. IEEE Trans. Signal Proc. 61 (7), pp. 1644–1656.
[20] (2011-11) Weisfeiler-Lehman Graph Kernels. J. Mach. Learn. Res. 12, pp. 2539–2561. ISSN 1532-4435.
[21] (2018-10) Classification with Vertex-Based Graph Convolutional Neural Networks. In 2018 52nd Asilomar Conference on Signals, Systems, and Computers, pp. 752–756. ISSN 1058-6393.
[22] (2018) Graph Attention Networks. In 6th International Conference on Learning Representations (ICLR) 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings.
[23] (2015) Deep Graph Kernels. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '15, New York, NY, USA, pp. 1365–1374. ISBN 9781450336642.
[24] (2018) Hierarchical Graph Representation Learning with Differentiable Pooling. In 32nd International Conference on Neural Information Processing Systems (NIPS), pp. 4805–4815.
[25] (2018) An End-to-End Deep Learning Architecture for Graph Classification. In 32nd AAAI Conference on Artificial Intelligence.