1 Introduction
In recent years, deep learning has been widely explored on graph-structured data, such as chemical compounds, protein structures, financial networks, and social networks
[1, 2, 3]. Remarkable success has been achieved by generalizing deep neural networks from grid-like data to graphs [4, 5, 6, 7], resulting in the development of various graph neural networks (GNNs), like the graph convolutional network (GCN) [8], GraphSAGE [9], the graph attention network (GAT) [10], the jumping knowledge network (JK) [11], and graph isomorphism networks (GINs) [12]. They are able to learn representations for each node in graphs and have set new performance records on tasks like node classification and link prediction [13]. In order to extend this success to graph representation learning, graph pooling is required, which takes node representations of a graph as inputs and outputs the corresponding graph representation.

While pooling is common in deep learning on grid-like data, it is challenging to develop graph pooling approaches due to the special properties of graphs. First, the number of nodes varies across graphs, while graph representations are usually required to have the same fixed size to fit into other machine learning models. Therefore, graph pooling should be capable of handling a variable number of node representations as inputs and producing fixed-sized graph representations. Second, unlike images and texts, where pixels and words can be ordered according to spatial structural information, there is no inherent ordering relationship among nodes in graphs. Indeed, we can assign pseudo indices to nodes in a graph. However, an isomorphism of the graph may change the order of these indices. As isomorphic graphs should have the same graph representation, graph pooling is required to produce the same output when taking node representations in any order as inputs.
Some previous studies employ simple methods such as averaging and summation as graph pooling [14, 15, 12]. However, averaging and summation ignore the feature correlation information, hampering the overall model performance [16]. Other studies have proposed advanced graph pooling methods, including DiffPool [17], SortPool [16], TopKPool [18], SAGPool [19], and EigenPool [20]. DiffPool maps nodes to a predefined number of clusters but is hard to train. EigenPool involves the computation of eigenvectors, which is slow and expensive. SortPool, SAGPool, and TopKPool rely on top-k sorting to select and order a fixed number (k) of nodes, during which the information from unselected nodes is discarded. It is worth noting that all the existing graph pooling methods collect only first-order statistics [21].

In this work, we propose to use second-order pooling as graph pooling. Compared to existing graph pooling methods, second-order pooling naturally solves the challenges of graph pooling and is more powerful with its ability to use information from all nodes and to collect second-order statistics. We analyze the practical problems in directly using second-order pooling with GNNs. To address the problems, we propose two novel and effective global graph pooling approaches based on second-order pooling, namely bilinear mapping second-order pooling (SOPool_bimap) and attentional second-order pooling (SOPool_attn). In addition, we extend attentional second-order pooling to hierarchical graph pooling for more flexible use in GNNs. We perform thorough experiments on ten graph classification benchmark datasets. The experimental results show that our methods improve the performance significantly and consistently.
2 Related Work
In this section, we review two categories of existing graph pooling methods in Section 2.1. Then, in Section 2.2, we introduce what second-order statistics are, as well as their applications in both traditional machine learning and deep learning. In addition, we discuss the motivation for using second-order statistics in graph representation learning.
2.1 Graph Pooling: Global versus Hierarchical
Existing graph pooling methods can be divided into two categories according to their roles in graph neural networks (GNNs) for graph representation learning. One is global graph pooling, also known as graph readout operation [12, 19]. The other is hierarchical graph pooling, which is used to build hierarchical GNNs. We explain the details of the two categories and provide examples. In addition, we discuss advantages and disadvantages of the two categories.
Global graph pooling is typically used to connect embedded graphs outputted by GNN layers with classifiers for graph classification. Given a graph, GNN layers produce node representations, where each node is embedded as a vector. Global graph pooling is applied after GNN layers to process node representations into a single vector as the graph representation. A classifier takes the graph representation and performs graph classification. The "global" here refers to the fact that the output of global graph pooling encodes the entire graph. Global graph pooling is usually used only once in GNNs for graph representation learning. We refer to such GNNs as flat GNNs, in contrast to hierarchical GNNs. The most common global graph pooling methods include averaging and summation [14, 15, 12].

Hierarchical graph pooling is more similar to pooling in computer vision tasks [21]. The output of hierarchical graph pooling is a pseudo graph with fewer nodes than the input graph. It is used to build hierarchical GNNs, where hierarchical graph pooling is applied several times between GNN layers to gradually decrease the number of nodes. The most representative hierarchical graph pooling methods are DiffPool [17], SortPool [16], TopKPool [18], SAGPool [19], and EigenPool [20]. A straightforward way to use hierarchical graph pooling for graph representation learning is to reduce the number of nodes to one. Then the resulting single vector is treated as the graph representation. Besides, there are two other ways to generate a single vector from the pseudo graph outputted by hierarchical graph pooling. One is introduced in SAGPool [19], where global and hierarchical graph pooling are combined. After each hierarchical graph pooling, global graph pooling with an independent classifier is employed. The final prediction is an average of all classifiers. On the other hand, SortPool [16] directly applies convolutional neural networks (CNNs) to reduce the number of nodes to one. In particular, it takes advantage of a property of the pseudo graph outputted by hierarchical graph pooling. That is, the pseudo graph has a fixed number of nodes, and there is an inherent ordering relationship among nodes determined by the trainable parameters in the hierarchical graph pooling. Therefore, common deep learning methods like convolutions can be directly used. In fact, we can simply concatenate node representations following the inherent order as the graph representation.
Given this property, most hierarchical graph pooling methods can be flexibly used as global graph pooling, with the three ways introduced above. For example, SortPool [16] is used to build flat GNNs and applied only once after all GNN layers. While the idea of learning hierarchical graph representations makes sense, hierarchical GNNs do not consistently outperform flat GNNs [19]. In addition, with advanced techniques like jumping knowledge networks (JKNet) [11] to address the over-smoothing problem of GNN layers [22], flat GNNs can go deeper and achieve better performance than hierarchical GNNs [12].
In this work, we first focus on global graph pooling, as second-order pooling naturally fits this category. Later, we extend one of our proposed graph pooling methods to hierarchical graph pooling in Section 3.6.
2.2 Second-Order Statistics
In statistics, the k-th order statistics refer to functions of the k-th power of samples. Concretely, consider n samples x_1, x_2, ..., x_n. The first and second moments, i.e., the mean (1/n) Σ_i x_i and the variance (1/n) Σ_i (x_i − x̄)², are examples of first- and second-order statistics, respectively. If each sample is a vector, the covariance matrix is an example of second-order statistics. In terms of graph pooling, it is easy to see that existing methods are based on first-order statistics [21].

Second-order statistics have been widely explored in various computer vision tasks, such as face recognition, image segmentation, and object detection. In terms of traditional machine learning, the scale-invariant feature transform (SIFT) algorithm [23] utilizes second-order statistics of pixel values to describe local features in images and has become one of the most popular image descriptors. Tuzel et al. [24, 25] use covariance matrices of low-level features with boosting for detection and classification. The Fisher encoding [26] applies second-order statistics for recognition as well. Carreira et al. [27] employ second-order pooling for semantic segmentation. With the recent advances of deep learning, second-order pooling is also used in CNNs for fine-grained visual recognition [28] and visual question answering [29, 30, 31].

Many studies motivate the use of second-order statistics by taking advantage of the Riemannian geometry of the space of symmetric positive definite matrices [32, 25, 27]. In these studies, certain regularizations are imposed to guarantee that the applied second-order statistics are symmetric positive definite [33, 34]. Other work relates second-order statistics to orderless texture descriptors for images [26, 28].
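To make the distinction between first- and second-order statistics concrete, the sketch below (with made-up sample vectors) computes the mean as a first-order statistic and the covariance matrix as a second-order one:

```python
import numpy as np

# Toy samples: n = 4 vectors of dimension d = 3 (made-up values).
X = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 2.0]])

mean = X.mean(axis=0)                     # first-order statistic
cov = np.cov(X, rowvar=False, bias=True)  # second-order statistic (covariance)

# The covariance captures pairwise feature correlations that the mean discards.
second_moment = (X.T @ X) / len(X)        # uncentered second moment
# cov = E[x x^T] - E[x] E[x]^T
assert np.allclose(cov, second_moment - np.outer(mean, mean))
```

The covariance matrix is a d × d object built from products of sample entries, which is exactly the kind of information first-order pooling discards.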
In this work, we propose to incorporate second-order statistics in graph representation learning. Our motivations lie in three aspects. First, second-order pooling naturally fits the goal and requirements of graph pooling, as discussed in Sections 3.1 and 3.2. Second, second-order pooling is able to capture the correlations among features, as well as topology information, in graph representation learning, as demonstrated in Section 3.2. Third, our proposed graph pooling methods based on second-order pooling are related to covariance pooling [24, 25, 33, 34] and attentional pooling [35] used in computer vision tasks, as pointed out in Section 3.5. In addition, we show that both covariance pooling and attentional pooling have certain limitations when employed in graph representation learning, and our proposed methods appropriately address them.
3 Second-Order Pooling for Graphs
In this section, we introduce our proposed second-order pooling methods for graph representation learning. First, we formally define the aim and requirements of graph pooling in Section 3.1. Then we propose to use second-order pooling as graph pooling, analyze its advantages, and point out practical problems when directly using it with GNNs in Section 3.2. In order to address the problems, we propose two novel second-order pooling methods for graphs in Sections 3.3 and 3.4, respectively. Afterwards, we discuss why our proposed methods are more suitable as graph pooling compared to two similar pooling methods in image tasks in Section 3.5. Finally, while both methods focus on global graph pooling, we extend second-order pooling to hierarchical graph pooling in Section 3.6.
3.1 Properties of Graph Pooling
Consider a graph G represented by its adjacency matrix A ∈ R^{n×n} and node feature matrix X ∈ R^{n×d}, where n is the number of nodes in G and d is the dimension of node features. The node features may come from node labels or node degrees. Graph neural networks (GNNs) are known to be powerful in learning a good node representation matrix H from A and X:

    H = GNN(A, X) ∈ R^{n×f},    (1)

where the rows of H, h_1, h_2, ..., h_n ∈ R^f, are representations of the n nodes, and f depends on the architecture of the GNN. The task that we focus on in this work is to obtain a graph representation vector h_G from H, which is then fed into a classifier to perform graph classification:

    h_G = g(H) ∈ R^{f_G},    (2)

where g(·) is the graph pooling function and f_G is the dimension of h_G. Here, g(H, A) means that the information from A can be optionally used in graph pooling. For simplicity, we omit A in the following discussion.

Note that g must satisfy two requirements to serve as graph pooling. First, g should be able to take H with a variable number of rows as the input and produce fixed-sized outputs. Specifically, different graphs may have different numbers of nodes, which means that n is a variable. On the other hand, f_G is supposed to be fixed to fit into the following classifier.

Second, g should output the same h_G when the order of the rows of H changes. This permutation invariance property is necessary to handle isomorphic graphs. To be concrete, if two graphs G_1 and G_2 are isomorphic, GNNs will output the same multiset of node representations for them [16, 12]. That is, there exists a permutation matrix P ∈ {0, 1}^{n×n} such that H_1 = P H_2, for H_1 = GNN(A_1, X_1) and H_2 = GNN(A_2, X_2). However, the graph representation computed by g should be the same, i.e., g(H_1) = g(H_2) if H_1 = P H_2.
3.2 Second-Order Pooling
In this work, we propose to employ second-order pooling [27], also known as bilinear pooling [28], as graph pooling. We show that second-order pooling naturally satisfies the two requirements above.
We start by introducing the definition of second-order pooling.
Definition. Given H ∈ R^{n×f}, second-order pooling (SOPool) is defined as

    SOPool(H) = H^T H ∈ R^{f×f}.    (3)

In terms of graph pooling, we can view SOPool(H) as an f²-dimensional graph representation vector by simply flattening the matrix. Another way to transform the matrix into a vector is discussed in Section 3.4. Note that, as long as SOPool meets the two requirements, the way to transform the matrix into a vector does not affect its eligibility as graph pooling.
Now let us check the two requirements.
Proposition 1. SOPool always outputs an f × f matrix for H ∈ R^{n×f}, regardless of the value of n.
Proof.
The result is obvious since the dimension of H^T H does not depend on n. ∎
Proposition 2. SOPool is invariant to permutation, so that it outputs the same matrix when the order of the rows of H changes.
Proof.
Consider H̃ = PH, where P ∈ {0, 1}^{n×n} is a permutation matrix. Note that we have P^T P = I for any permutation matrix. Therefore, it is easy to derive

    SOPool(H̃) = H̃^T H̃ = H^T P^T P H = H^T H = SOPool(H).    (4)

This completes the proof. ∎
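Both propositions are easy to verify numerically. A minimal numpy sketch (the random node representations are illustrative):

```python
import numpy as np

def so_pool(H):
    """Second-order pooling: n x f -> f x f."""
    return H.T @ H

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 3))              # 5 nodes, f = 3

# Proposition 1: the output is f x f regardless of n.
assert so_pool(H).shape == (3, 3)
assert so_pool(rng.normal(size=(8, 3))).shape == (3, 3)

# Proposition 2: permutation invariance, since P^T P = I.
P = np.eye(5)[rng.permutation(5)]        # a random permutation matrix
assert np.allclose(so_pool(P @ H), so_pool(H))

# Flattening yields an f^2-dimensional graph representation vector.
h_G = so_pool(H).reshape(-1)
assert h_G.shape == (9,)
```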
In addition to satisfying the requirements of graph pooling, SOPool is capable of capturing second-order statistics, which are much more discriminative than the first-order statistics computed by most other graph pooling methods [27, 28, 29]. In detail, the advantages can be seen from two aspects. On the one hand, we can tell from H^T H = Σ_{i=1}^{n} h_i h_i^T that, for each node representation h_i, the features interact with each other, enabling the correlations among features to be captured. On the other hand, topology information is encoded as well. Specifically, we view H as [u_1, u_2, ..., u_f], where u_j ∈ R^n, j = 1, 2, ..., f. The vector u_j encodes the spatial distribution of the j-th feature in the graph, and the (i, j)-th entry of H^T H is u_i^T u_j. Based on this view, H^T H is able to capture the topology information.
However, we point out that the direct application of second-order pooling in GNNs leads to practical problems. The direct way to use second-order pooling as graph pooling is represented as

    h_G = FLATTEN(SOPool(H)) ∈ R^{f²}.    (5)

That is, we apply SOPool on H and flatten the output matrix into an f²-dimensional graph representation vector. However, this causes an explosion in the number of training parameters in the following classifier when f is large, making the learning process harder to converge and easier to overfit. While each layer in a GNN usually has outputs with a small number of hidden units (e.g., 16, 32, 64), it has been pointed out that graph representation learning benefits from using information from the outputs of all layers, obtaining better performance and generalization ability [11]. This is usually achieved by concatenating outputs across all layers in a GNN [16, 12]. In this case, H has a large final f, making the direct use of second-order pooling infeasible. For example, if a GNN has 5 layers and each layer's outputs have 32 hidden units, f becomes 5 × 32 = 160. Suppose h_G is sent into a 1-layer fully-connected classifier for c graph categories in a graph classification task. This results in 160² × c = 25,600c training parameters, which is excessive. We omit the bias term for simplicity.
3.3 Bilinear Mapping Second-Order Pooling
To address the above problem, a straightforward solution is to reduce f before applying SOPool. Based on this, our first proposed graph pooling method, called bilinear mapping second-order pooling (SOPool_bimap), employs a linear mapping on H to perform dimensionality reduction. Specifically, it is defined as

    SOPool_bimap(H) = (HW)^T (HW) = W^T H^T H W ∈ R^{f'×f'},    (6)

where f' < f and W ∈ R^{f×f'} is a trainable matrix representing a linear mapping. Afterwards, we follow the same process to flatten the matrix and obtain an f'²-dimensional graph representation vector:

    h_G = FLATTEN(SOPool_bimap(H)) ∈ R^{f'²}.    (7)

Figure 1 provides an illustration of the above process. By selecting an appropriate f', bilinear mapping second-order pooling does not suffer from an excessive number of training parameters. Taking the example above, if we set f' = 32, the total number of parameters in SOPool_bimap and a following 1-layer fully-connected classifier is 160 × 32 + 32² × c = 5,120 + 1,024c, which is much smaller than 25,600c.
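A minimal numpy sketch of bilinear mapping second-order pooling under the running example (f = 160, f' = 32; the class count c = 10 is a hypothetical value used only for the parameter-count comparison, and the random weights stand in for trained ones):

```python
import numpy as np

rng = np.random.default_rng(2)
n, f, f2 = 30, 160, 32               # f2 plays the role of f'
H = rng.normal(size=(n, f))          # node representations from a GNN
W = rng.normal(size=(f, f2)) * 0.1   # trainable linear mapping in practice

HW = H @ W                           # reduce feature dimension first
h_G = (HW.T @ HW).reshape(-1)        # flatten the f' x f' matrix
assert h_G.shape == (f2 * f2,)

# Parameter count with a 1-layer classifier over c classes:
c = 10
bimap_params = f * f2 + f2 * f2 * c  # W plus the classifier weights
direct_params = f * f * c            # flattening H^T H directly
assert bimap_params < direct_params
```

Computing HW before the outer product also avoids ever forming the full f × f matrix.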
3.4 Attentional Second-Order Pooling
Our second proposed graph pooling method tackles the problem by exploring another way to transform the matrix computed by SOPool into the graph representation vector, instead of simply flattening it. Similarly, we use a linear mapping to perform the transformation, defined as

    h_G = H^T H μ ∈ R^f,    (8)

where μ ∈ R^f is a trainable vector. It is interesting to note that H^T H μ = H^T (H μ), which is similar to the sentence attention in [36]. To be concrete, consider a word embedding matrix X ∈ R^{n×d} for a sentence, where n is the number of words and d is the dimension of word embeddings. The sentence attention is defined as

    a = Softmax(X μ) ∈ R^n,    (9)
    e = X^T a ∈ R^d,    (10)

where μ ∈ R^d is a trainable vector and e is the resulting sentence embedding. Note that Eqn. (9) is the Softmax function and serves as a normalization function [37]. Rewriting the sentence attention in matrix form, we have e = X^T Softmax(X μ). The only difference between the computation of e and that of h_G in Eqn. (8) is the normalization function. Therefore, we name our second proposed graph pooling method attentional second-order pooling (SOPool_attn), defined as

    SOPool_attn(H) = H^T H μ ∈ R^f,    (11)

where μ ∈ R^f is a trainable vector. It is illustrated in Figure 1. We take the same example as above to show that SOPool_attn reduces the number of training parameters. The total number of parameters in SOPool_attn and a following 1-layer fully-connected classifier is just 160 + 160c, significantly reducing the number of parameters compared to 25,600c.
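A numpy sketch of attentional second-order pooling (random H and μ stand in for learned values); note that evaluating H^T (H μ) avoids materializing the f × f matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n, f = 30, 160
H = rng.normal(size=(n, f))          # node representations from a GNN
mu = rng.normal(size=(f,))           # trainable attention vector in practice

h_G = H.T @ (H @ mu)                 # equals (H^T H) mu, an f-dim representation
assert h_G.shape == (f,)

# Associativity: same result as forming the second-order matrix explicitly.
assert np.allclose(h_G, (H.T @ H) @ mu)
```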
3.5 Relationships to Covariance Pooling and Attentional Pooling
The experimental results in Section 4 show that both of our proposed graph pooling methods achieve significantly and consistently better performance than previous methods. However, we note that there are pooling methods in image tasks that have computation processes similar to those of our proposed methods, although they were not developed based on second-order pooling. In this section, we point out the key differences between these methods and ours and show why they matter in graph representation learning.
Note that images are usually processed by deep neural networks into feature maps T ∈ R^{h×w×f}, where h, w, and f are the height, width, and number of feature maps, respectively. Following [35, 33, 34], we reshape T into the matrix X ∈ R^{n×f}, where n = h × w, so that different pooling methods can be compared directly.
Covariance pooling. Covariance pooling (CovPool) [24, 25, 33, 34] has been widely explored in image tasks, such as image categorization, facial expression recognition, and texture classification. Recently, it has also been explored in GNNs [38]. The definition is

    CovPool(X) = (X − 1x̄^T)^T (X − 1x̄^T),    (12)

where 1 is the n-dimensional all-one vector and x̄ is the mean of the rows of X. It differs from SOPool defined in Eqn. (3) only in whether the mean is subtracted. However, subtracting the mean makes CovPool less powerful in terms of distinguishing graphs with repeating node embeddings [12], which may cause performance loss. Figure 2(a) gives an example of this problem.
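This limitation can be reproduced with a toy example in the spirit of Figure 2(a) (the embeddings below are made up): two graphs whose nodes repeat the same embedding with different multiplicities become indistinguishable after mean subtraction:

```python
import numpy as np

def so_pool(H):
    return H.T @ H

def cov_pool(H):
    H_centered = H - H.mean(axis=0, keepdims=True)
    return H_centered.T @ H_centered

# Two graphs whose nodes all share one embedding, but with different sizes.
H1 = np.tile([[1.0, 2.0]], (2, 1))   # 2 identical nodes
H2 = np.tile([[1.0, 2.0]], (5, 1))   # 5 identical nodes

# Subtracting the mean collapses both to the zero matrix ...
assert np.allclose(cov_pool(H1), 0) and np.allclose(cov_pool(H2), 0)
# ... while second-order pooling still distinguishes the two multisets.
assert not np.allclose(so_pool(H1), so_pool(H2))
```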
Attentional pooling. Attentional pooling (AttnPool) [35] has been used in action recognition. As shown in Section 3.4, it is also used in text classification [36]. It is defined as

    AttnPool(X) = X^T Softmax(X μ),    (13)

where μ is a trainable vector. It differs from SOPool_attn only in the Softmax function. We show that the Softmax function leads to problems similar to those of other normalization functions, such as mean and max pooling [12]. Figure 2 provides examples in which AttnPool does not work.

To conclude, our methods derived from second-order pooling are more suitable as graph pooling. We compare these pooling methods through experiments in Section 4.3. The results show that CovPool and AttnPool suffer from significant performance loss on some datasets.
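A toy check of this normalization issue (made-up embeddings): duplicating every node leaves the softmax-normalized output unchanged, much like mean pooling, while the unnormalized second-order form reflects the change in multiplicities:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attn_pool(H, mu):
    return H.T @ softmax(H @ mu)     # normalized attention weights

def so_pool_attn(H, mu):
    return H.T @ (H @ mu)            # unnormalized second-order form

mu = np.array([1.0, -1.0])
H1 = np.array([[1.0, 0.0], [0.0, 1.0]])
H2 = np.vstack([H1, H1])             # same multiset, doubled multiplicities

# Softmax normalization hides the change in multiplicities ...
assert np.allclose(attn_pool(H1, mu), attn_pool(H2, mu))
# ... while the unnormalized form distinguishes the two inputs.
assert not np.allclose(so_pool_attn(H1, mu), so_pool_attn(H2, mu))
```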
3.6 Multi-Head Attentional Second-Order Pooling
The proposed SOPool_bimap and SOPool_attn belong to the global graph pooling category. As discussed in Section 2.1, they are used in flat GNNs after all GNN layers and output the graph representation for the classifier. While flat GNNs outperform hierarchical GNNs on most benchmark datasets [12], developing hierarchical graph pooling is still desirable, especially for large graphs [17, 20, 19]. Therefore, we explore a hierarchical graph pooling method based on second-order pooling.
Unlike global graph pooling, hierarchical graph pooling outputs multiple vectors corresponding to node representations in the pooled graph. In addition, hierarchical graph pooling has to update the adjacency matrix to indicate how nodes are connected in the pooled graph. To be specific, given the adjacency matrix A ∈ R^{n×n} and node representation matrix H ∈ R^{n×f}, a hierarchical graph pooling function g_h can be written as

    (A', H') = g_h(A, H),    (14)

where A' ∈ R^{k×k} and H' ∈ R^{k×f}. Here, k is a hyperparameter determining the number of nodes in the pooled graph. Note that Eqn. (14) does not conflict with Eqn. (2), as we can always transform H' into a vector h_G, as discussed in Section 2.1.

We note that the proposed SOPool_attn in Section 3.4 is closely related to the attention mechanism and can be easily extended to a hierarchical graph pooling method based on the multi-head technique in the attention mechanism [37, 10]. The multi-head technique means that multiple independent attentions are performed on the same inputs, and the outputs of the multiple attentions are then concatenated together. Based on this insight, we propose multi-head attentional second-order pooling (SOPool_m-attn), defined as
    SOPool_m-attn(H) = (H^T H M)^T ∈ R^{k×f},    (15)
where M ∈ R^{f×k} is a trainable matrix. To illustrate its relationship to the multi-head technique, we can equivalently write it as

    SOPool_m-attn(H) = [H^T H μ_1, H^T H μ_2, ..., H^T H μ_k]^T,    (16)

where we decompose M in Eqn. (15) as M = [μ_1, μ_2, ..., μ_k]. The relationship can be easily seen by comparing Eqn. (16) with Eqn. (11).
The multi-head technique enables SOPool_m-attn to output the node representation matrix H' for the pooled graph. We now describe how to update the adjacency matrix. In particular, we employ a contribution matrix in updating the adjacency matrix. The contribution matrix is an n × k matrix whose entries indicate how nodes in the input graph contribute to nodes in the pooled graph. In SOPool_m-attn, we can simply let S = HM. With the contribution matrix S, the corresponding adjacency matrix of the pooled graph can be computed as

    A' = S^T A S ∈ R^{k×k}.    (17)
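Putting Eqns. (15) and (17) together, a numpy sketch of one hierarchical pooling step on a random toy graph (random weights stand in for trained values):

```python
import numpy as np

rng = np.random.default_rng(4)
n, f, k = 6, 4, 2                    # pool a 6-node graph down to k = 2 nodes
H = rng.normal(size=(n, f))          # node representations from a GNN
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1)
A = A + A.T                          # symmetric adjacency, no self-loops
M = rng.normal(size=(f, k))          # trainable in practice

H_pooled = (H.T @ H @ M).T           # k x f: one pooled node per head
assert H_pooled.shape == (k, f)

S = H @ M                            # n x k contribution matrix
A_pooled = S.T @ A @ S               # adjacency of the pooled graph
assert A_pooled.shape == (k, k)
```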
The proposed SOPool_m-attn is closely related to DiffPool [17]. The contribution matrix S corresponds to the assignment matrix in DiffPool. However, DiffPool applies GNN layers with normalization to obtain the assignment matrix, preventing the explicit use of second-order statistics. In the experiments, we evaluate SOPool_m-attn as both a global and a hierarchical graph pooling method, in flat and hierarchical GNNs, respectively.
4 Experiments
We conduct thorough experiments on graph classification tasks to show the effectiveness of our proposed graph pooling methods, namely bilinear mapping second-order pooling (SOPool_bimap), attentional second-order pooling (SOPool_attn), and multi-head attentional second-order pooling (SOPool_m-attn). Section 4.1 introduces the datasets, baselines, and experimental setups for reproduction. The following sections aim at evaluating our proposed methods in different aspects, by answering the questions below:

Can GNNs with our proposed methods achieve improved performance in graph classification tasks? Section 4.2 provides the comparison results between our methods and existing methods in graph classification tasks.

Do our proposed methods outperform existing global graph pooling methods with the same flat GNN architecture? The ablation studies in Section 4.3 compare different graph pooling methods with the same GNN, eliminating the influence of different GNNs. In particular, we use hierarchical graph pooling methods as global graph pooling methods in this experiment, including SOPool_m-attn.

Is the improvement brought by our proposed methods consistent across various GNN architectures? Section 4.4 shows the performance of the proposed SOPool_bimap and SOPool_attn with different GNNs.

Is SOPool_m-attn effective as a hierarchical graph pooling method? We compare SOPool_m-attn with other hierarchical graph pooling methods in the same hierarchical GNN architecture in Section 4.5.
4.1 Experimental Setup
Reproducibility. The code used in our experiments is available at https://github.com/divelab/sopool. Details of datasets and parameter settings are described below.
Datasets. We use ten graph classification datasets from [1], including five bioinformatics datasets (MUTAG, PTC, PROTEINS, NCI1, DD) and five social network datasets (COLLAB, IMDB-BINARY, IMDB-MULTI, REDDIT-BINARY, REDDIT-MULTI5K). Note that only the bioinformatics datasets come with node labels. Detailed descriptions of the datasets are given below:

MUTAG is a bioinformatics dataset of 188 graphs representing nitro compounds. Each node is associated with one of 7 discrete node labels. The task is to classify each graph by determining whether the compound is mutagenic aromatic or heteroaromatic [39].

PTC [40] is a bioinformatics dataset of 344 graphs representing chemical compounds. Each node comes with one of 19 discrete node labels. The task is to predict the rodent carcinogenicity for each graph.

PROTEINS [41] is a bioinformatics dataset of 1,113 graph structures of proteins. Nodes in the graphs refer to secondary structure elements (SSEs) and have discrete node labels indicating whether they represent a helix, sheet, or turn. Edges mean that two nodes are neighbors along the amino-acid sequence or in space. The task is to predict the protein function for each graph.

NCI1 [42] is a bioinformatics dataset of 4,110 graphs representing chemical compounds. It contains data published by the National Cancer Institute (NCI). Each node is assigned one of 37 discrete node labels. The graph classification label is decided by NCI anti-cancer screens for the ability to suppress or inhibit the growth of a panel of human tumor cell lines.

COLLAB is a scientific collaboration dataset of 5,000 graphs corresponding to ego-networks generated using the method in [43]. The dataset is derived from 3 public collaboration datasets [44]. Each ego-network contains researchers from one field and is labeled by the corresponding field. The three fields are High Energy Physics, Condensed Matter Physics, and Astro Physics.

IMDB-BINARY is a movie collaboration dataset of 1,000 graphs representing ego-networks for actors/actresses. The dataset is derived from collaboration graphs on the Action and Romance genres. In each graph, nodes represent actors/actresses, and edges simply mean that they collaborate in the same movie. The graphs are labeled by the corresponding genre, and the task is to identify the genre for each graph.

IMDB-MULTI is the multi-class version of IMDB-BINARY. It contains 1,500 ego-networks and covers three genres, namely Comedy, Romance, and Sci-Fi.

REDDIT-BINARY is a dataset of 2,000 graphs where each graph represents an online discussion thread. Nodes in a graph correspond to users appearing in the corresponding discussion thread, and an edge means that one user responded to another. The data are crawled from top submissions under four popular subreddits, namely IAmA, AskReddit, TrollXChromosomes, and atheism. Among them, IAmA and AskReddit are question/answer-based subreddits while TrollXChromosomes and atheism are discussion-based subreddits, forming the two classes to be classified.

REDDIT-MULTI5K is a dataset similar to REDDIT-BINARY, containing 5,000 graphs. The difference is that REDDIT-MULTI5K is crawled from five different subreddits, namely worldnews, videos, AdviceAnimals, aww, and mildlyinteresting. The task is to identify the subreddit of each graph instead of determining the type of subreddit.

DD [45] is a bioinformatics dataset of 1,178 graph structures of proteins. Nodes in the graphs represent amino acids, and edges connect nodes that are less than 6 Ångströms apart. The task is a two-way classification between enzymes and non-enzymes. DD is only used in Section 4.5. The average number of nodes in DD is 284.3.
Datasets        MUTAG  PTC   PROTEINS  NCI1  COLLAB  IMDB-B  IMDB-M  RDT-B  RDT-M5K
# graphs        188    344   1113      4110  5000    1000    1500    2000   5000
# classes       2      2     2         2     3       2       3       2      5
# nodes (max)   28     109   620       111   492     136     89      3783   3783
# nodes (avg.)  18.0   25.6  39.1      29.9  74.5    19.8    13.0    429.6  508.5
Kernel
GK [2009]               81.4±1.7  57.3±1.4  71.7±0.6  62.3±0.3  72.8±0.3  65.9±1.0  43.9±0.4  77.3±0.2  41.0±0.2
RW [2010]               79.2±2.1  57.9±1.3  74.2±0.4  1 day     –         –         –         –         –
WL [2011]               90.4±5.7  59.9±4.3  75.0±3.1  86.0±1.8  78.9±1.9  73.8±3.9  50.9±3.8  81.0±3.1  52.5±2.1
DGK [2015]              –         60.1±2.6  75.7±0.5  80.3±0.5  73.1±0.3  67.0±0.6  44.6±0.5  78.0±0.4  41.3±0.2
AWE [2018]              87.9±9.8  –         –         –         73.9±1.9  74.5±5.9  51.5±3.6  87.9±2.5  54.7±2.9
GNN
DCNN [2016]             67.0      56.6      61.3      56.6      52.1      49.1      33.5      –         –
PatchySan [2016]        92.6±4.2  60.0±4.8  75.9±2.8  78.6±1.9  72.6±2.2  71.0±2.2  45.2±2.8  86.3±1.6  49.1±0.7
ECC [2017]              –         –         72.7      76.8      67.8      –         –         –         –
DGCNN [2018]            85.8±1.7  58.6±2.5  75.5±1.0  74.4±0.5  73.8±0.5  70.0±0.9  47.8±0.9  76.0±1.7  48.7±4.5
DiffPool [2018]         80.6      –         76.3      76.0      75.5      –         –         –         –
GCAPS-CNN [2018]        –         66.0±5.9  76.4±4.2  82.7±2.4  77.7±2.5  71.7±3.4  48.5±4.1  87.6±2.5  50.1±1.7
GIN-0 + Sum/Avg [2018]  89.4±5.6  64.6±7.0  76.2±2.8  82.7±1.7  80.2±1.9  75.1±5.1  52.3±2.8  92.4±2.5  57.5±1.5
EigenGCN [2019]         79.5      –         76.6      77.0      –         –         –         –         –
Ours
GIN-0 + SOPool_attn     93.6±4.1  72.9±6.2  79.4±3.2  82.8±1.4  81.1±1.8  78.1±4.0  54.3±2.6  91.7±2.7  58.3±1.4
GIN-0 + SOPool_bimap    95.3±4.4  75.0±4.3  80.1±2.7  83.6±1.4  79.9±1.9  78.4±4.7  54.6±3.6  89.6±3.3  58.4±1.6
GIN-0 + SOPool_m-attn   95.2±5.4  74.4±5.5  79.5±3.1  84.5±1.3  77.6±1.9  78.5±2.8  54.3±2.1  90.0±0.8  55.8±2.2
Models                  MUTAG     PTC       PROTEINS  NCI1      COLLAB    IMDB-B    IMDB-M    RDT-B     RDT-M5K
GIN-0 + Sum/Avg         89.4±5.6  64.6±7.0  76.2±2.8  82.7±1.7  80.2±1.9  75.1±5.1  52.3±2.8  92.4±2.5  57.5±1.5
GIN-0 + DiffPool        94.8±4.8  66.1±7.7  78.8±3.1  76.6±1.3  75.3±2.2  74.4±4.0  50.1±3.2  –         –
GIN-0 + SortPool        95.2±3.9  69.5±6.3  79.2±3.0  78.9±2.7  78.2±1.6  77.5±2.7  53.1±2.9  81.6±4.6  48.4±4.8
GIN-0 + TopKPool        94.7±3.5  68.4±6.4  79.1±2.2  79.6±1.7  79.6±2.1  77.8±5.1  53.7±2.8  –         –
GIN-0 + SAGPool         93.9±3.3  69.0±6.6  78.4±3.1  79.0±2.8  78.9±1.7  77.8±2.9  53.1±2.8  –         –
GIN-0 + AttnPool        93.2±5.8  71.2±8.0  77.5±3.3  80.6±2.1  81.8±2.2  77.1±4.4  53.8±2.5  92.5±2.3  57.9±1.7
GIN-0 + SOPool_attn     93.6±4.1  72.9±6.2  79.4±3.2  82.8±1.4  81.1±1.8  78.1±4.0  54.3±2.6  91.7±2.7  58.3±1.4
GIN-0 + CovPool         95.3±3.7  73.3±5.1  80.1±2.2  83.5±1.9  79.3±1.8  72.1±5.1  47.8±2.7  90.3±3.6  58.4±1.7
GIN-0 + SOPool_bimap    95.3±4.4  75.0±4.3  80.1±2.7  83.6±1.4  79.9±1.9  78.4±4.7  54.6±3.6  89.6±3.3  58.4±1.6
GIN-0 + SOPool_m-attn   95.2±5.4  74.4±5.5  79.5±3.1  84.5±1.3  77.6±1.9  78.5±2.8  54.3±2.1  90.0±0.8  55.8±2.2
More statistics of these datasets are provided in the "datasets" section of Table I. The input node features differ across datasets. For bioinformatics datasets, the nodes have categorical labels as input features. For social network datasets, we create node features. To be specific, we set all node feature vectors to be the same for REDDIT-BINARY and REDDIT-MULTI5K [12]. For the other social network datasets, we use one-hot encodings of node degrees as features.
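The one-hot degree encoding described above can be sketched as follows (the helper name and the degree cap are our own; details may differ from the released code):

```python
import numpy as np

def degree_one_hot(A, max_degree):
    """One-hot encode node degrees as input features (social network datasets)."""
    degrees = A.sum(axis=1).astype(int)
    X = np.zeros((len(A), max_degree + 1))
    X[np.arange(len(A)), np.minimum(degrees, max_degree)] = 1.0
    return X

# A triangle graph: every node has degree 2.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
X = degree_one_hot(A, max_degree=4)
assert X.shape == (3, 5)
assert (X[:, 2] == 1).all()          # all three degrees are 2
```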
Configurations. In Sections 4.2, 4.3, and 4.4, the flat GNNs we use with our proposed graph pooling methods are graph isomorphism networks (GINs) [12]. The original GINs employ averaging or summation (Sum/Avg) as the graph pooling function; specifically, summation on bioinformatics datasets and averaging on social network datasets. We replace averaging or summation with our proposed graph pooling methods and keep the other parts the same. There are seven variants of GINs, two of which are equivalent to the graph convolutional network (GCN) [8] and GraphSAGE [9], respectively. In Sections 4.2 and 4.3, we use GIN-0 with our methods. In Section 4.4, we examine our methods with all variants of GINs. Details of all variants can be found in Section 4.4.
The hierarchical GNNs used in Section 4.5 follow the hierarchical architecture in [19], allowing direct comparisons. To be specific, each block is composed of one GNN layer followed by one hierarchical graph pooling layer. After each hierarchical pooling step, a classifier is applied. The final prediction combines the outputs of all classifiers.
Training & Evaluation. Following [1, 46], model performance is evaluated using 10-fold cross-validation and reported as the average and standard deviation of validation accuracies across the 10 folds. For the flat GNNs, we follow the same training process as in [12]. All GINs have 5 layers, and each multi-layer perceptron (MLP) has 2 layers with batch normalization [47]. For the hierarchical GNNs, we follow the same training process as in [19]. There are three blocks in total. Dropout [48] is applied in the classifiers. The Adam optimizer [49] is used with the learning rate initialized to 0.01 and decayed by 0.5 every 50 epochs. The total number of epochs is selected according to the best cross-validation accuracy. We tune the number of hidden units (16, 32, 64) and the batch size (32, 128) using grid search.
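The learning-rate schedule and hyperparameter grid described above can be sketched framework-agnostically. The sketch assumes a step decay of 0.5 every 50 epochs starting from 0.01, and a user-supplied `train_and_eval` routine (hypothetical) that returns a cross-validation accuracy for a configuration.

```python
from itertools import product

def learning_rate(epoch, base_lr=0.01, decay=0.5, step=50):
    """Step schedule from the text: start at 0.01, halve every 50 epochs."""
    return base_lr * decay ** (epoch // step)

def grid_search(train_and_eval):
    """Search the grid tuned in the text: hidden units x batch size.

    train_and_eval(hidden_units, batch_size) -> cross-validation accuracy.
    Returns the best (hidden_units, batch_size) configuration.
    """
    grid = product([16, 32, 64], [32, 128])
    return max(grid, key=lambda cfg: train_and_eval(*cfg))
```

In a PyTorch setup, the same schedule would typically be expressed with `torch.optim.Adam` plus a step scheduler; the pure-Python form above only mirrors the stated values.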
Baselines. We compare our methods with various graph classification models as baselines, including both kernel-based and GNN-based methods. The kernel-based methods are the graphlet kernel (GK) [50], random walk kernel (RW) [51], Weisfeiler-Lehman subtree kernel (WL) [52], deep graphlet kernel (DGK) [1], and anonymous walk embeddings (AWE) [53]. Among them, DGK and AWE use deep learning methods as well. The GNN-based methods are the diffusion-convolutional neural network (DCNN) [54], PATCHY-SAN [46], ECC [55], deep graph CNN (DGCNN) [16], differentiable pooling (DiffPool) [17], graph capsule CNN (GCAPS-CNN) [38], self-attention graph pooling (SAGPool) [19], GIN [12], and eigenvector-based pooling (EigenGCN) [20]. We report the performance of these baselines as provided in [16, 38, 12, 17, 19, 20].
4.2 Comparison with Baselines
GNNs  Pools  MUTAG  PTC  PROTEINS  NCI1  COLLAB  IMDB-B  IMDB-M

Sum-MLP (GIN0)  Sum/Avg  89.4 ± 5.6  64.6 ± 7.0  76.2 ± 2.8  82.7 ± 1.7  80.2 ± 1.9  75.1 ± 5.1  52.3 ± 2.8
  SOPool  93.6 ± 4.1  72.9 ± 6.2  79.4 ± 3.2  82.8 ± 1.4  81.1 ± 1.8  78.1 ± 4.0  54.3 ± 2.6
  SOPool  95.3 ± 4.4  75.0 ± 4.3  80.1 ± 2.7  83.6 ± 1.4  79.9 ± 1.9  78.4 ± 4.7  54.6 ± 3.6
Sum-MLP (GIN-ε)  Sum/Avg  89.0 ± 6.0  63.7 ± 8.2  75.9 ± 3.8  82.7 ± 1.6  80.1 ± 1.9  74.3 ± 5.1  52.1 ± 3.6
  SOPool  92.6 ± 5.4  73.6 ± 5.5  79.2 ± 1.9  83.1 ± 1.8  80.6 ± 1.6  78.1 ± 4.3  55.4 ± 3.7
  SOPool  93.7 ± 5.3  73.5 ± 7.0  79.3 ± 1.8  83.6 ± 1.4  80.4 ± 2.4  77.5 ± 4.5  54.5 ± 3.5
Sum-1-Layer  Sum/Avg  90.0 ± 8.8  63.1 ± 5.7  76.2 ± 2.6  82.0 ± 1.5  80.6 ± 1.9  74.1 ± 5.0  52.2 ± 2.4
  SOPool  94.2 ± 4.4  73.6 ± 6.5  79.0 ± 2.9  81.2 ± 1.5  81.2 ± 1.6  78.6 ± 4.1  54.5 ± 3.0
  SOPool  95.8 ± 4.2  71.8 ± 6.1  80.1 ± 2.5  82.4 ± 1.3  80.5 ± 2.0  78.2 ± 3.6  54.1 ± 3.4
Mean-MLP  Sum/Avg  83.5 ± 6.3  66.6 ± 6.9  75.5 ± 3.4  80.9 ± 1.8  79.2 ± 2.3  73.7 ± 3.7  52.3 ± 3.1
  SOPool  92.6 ± 4.5  74.9 ± 6.6  79.4 ± 2.8  80.6 ± 1.1  80.0 ± 2.0  77.5 ± 3.9  55.2 ± 3.3
  SOPool  90.4 ± 6.2  72.7 ± 4.0  79.3 ± 2.4  81.1 ± 1.6  80.4 ± 1.7  77.9 ± 4.7  55.0 ± 3.7
Mean-1-Layer (GCN)  Sum/Avg  85.6 ± 5.8  64.2 ± 4.3  76.0 ± 3.2  80.2 ± 2.0  79.0 ± 1.8  74.0 ± 3.4  51.9 ± 3.8
  SOPool  90.0 ± 5.1  76.7 ± 5.6  78.5 ± 2.8  78.0 ± 1.8  80.2 ± 1.6  78.9 ± 4.2  54.8 ± 3.1
  SOPool  90.9 ± 5.7  70.9 ± 4.1  78.7 ± 3.1  78.8 ± 1.1  80.4 ± 2.1  77.7 ± 4.5  54.5 ± 4.0
Max-MLP  Sum/Avg  84.0 ± 6.1  64.6 ± 10.2  76.0 ± 3.2  77.8 ± 1.3  –  73.2 ± 5.8  51.1 ± 3.6
  SOPool  90.0 ± 7.3  72.4 ± 4.7  78.3 ± 3.1  78.6 ± 1.9  –  78.1 ± 4.1  54.1 ± 3.4
  SOPool  88.8 ± 7.0  73.3 ± 5.5  78.4 ± 3.0  78.0 ± 1.9  –  78.2 ± 4.7  54.6 ± 3.5
Max-1-Layer (GraphSAGE)  Sum/Avg  85.1 ± 7.6  63.9 ± 7.7  75.9 ± 3.2  77.7 ± 1.5  –  72.3 ± 5.3  50.9 ± 2.2
  SOPool  90.0 ± 6.8  72.1 ± 5.9  79.0 ± 2.9  77.4 ± 1.8  –  77.4 ± 5.1  54.1 ± 3.1
  SOPool  89.9 ± 5.8  73.6 ± 5.1  78.9 ± 2.8  77.0 ± 2.0  –  78.6 ± 4.7  54.2 ± 3.9
The comparison results between our methods and the baselines are reported in Table I. GIN0 equipped with our proposed graph pooling methods ("GIN0 + SOPool", "GIN0 + SOPool", and "GIN0 + SOPool") significantly outperforms all baselines on seven of the nine datasets. On NCI1, WL performs better than all GNN-based models; nevertheless, "GIN0 + SOPool" is the second-best model and improves upon the other GNN-based models. On REDDIT-BINARY, our methods achieve performance comparable to the best.
It is worth noting that the baseline "GIN0 + Sum/Avg" is the previous state-of-the-art model [12]. Our methods differ from it only in the graph pooling function. The significant improvement demonstrates the effectiveness of our proposed graph pooling methods. In the next section, we compare our methods with other graph pooling methods while fixing the GNN before pooling to GIN0, in order to eliminate the influence of different GNNs.
4.3 Ablation Studies in Flat Graph Neural Networks
We perform ablation studies to show that our proposed methods are superior to other global graph pooling methods under a fair setting. Starting from the baseline "GIN0 + Sum/Avg", we replace Sum/Avg with different graph pooling methods and keep all other configurations unchanged. The graph pooling methods we include are DiffPool [17], SortPool from DGCNN [16], TopKPool from Graph U-Nets [18], SAGPool [19], and the CovPool and AttnPool described in Section 3.5. DiffPool, TopKPool, and SAGPool were proposed as hierarchical graph pooling methods in their original works, but they achieve good performance as global pooling methods as well [17, 19]. EigenPool from EigenGCN suffers a significant performance loss when used as a global pooling method [20], so we do not include it in the ablation studies. CovPool and AttnPool use the same settings as our proposed methods.
Table II provides the comparison results. Our proposed SOPool and SOPool achieve better performance than DiffPool, SortPool, TopKPool, and SAGPool on all datasets, demonstrating the effectiveness of our graph pooling methods based on second-order statistics.
To support our discussion in Section 3.5, we analyze the performance of CovPool and AttnPool. Note that the same bilinear mapping technique used in SOPool is applied to CovPool, in order to avoid an excessive number of parameters. CovPool achieves performance comparable to SOPool on most datasets. However, a large performance drop is observed on PTC, IMDB-BINARY, and IMDB-MULTI, indicating that subtracting the mean is harmful in graph pooling.
Compared to SOPool, AttnPool suffers a performance loss on all datasets except COLLAB and REDDIT-BINARY. The loss is especially significant on the bioinformatics datasets (PTC, PROTEINS, NCI1). However, AttnPool achieves the best performance on COLLAB and REDDIT-BINARY among all graph pooling methods, even though the added Softmax function results in less discriminative power. The reason might be that capturing distributional information is more important than capturing the exact structure in these datasets. This is similar to GINs, where using averaging as graph pooling achieves better performance than summation on social network datasets [12].
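The distinction between plain second-order pooling and covariance pooling discussed in this section can be made concrete with a small NumPy sketch. The optional projection matrix stands in for the bilinear-mapping idea of shrinking the pooled output; the function names are ours, not the paper's API.

```python
import numpy as np

def second_order_pool(H, W=None):
    """Second-order pooling of node representations H (n x f).

    Optionally project features with W (f x f') first, mimicking the
    bilinear-mapping idea of shrinking the f x f output to f' x f'.
    The result is independent of the number of nodes and of their order.
    """
    if W is not None:
        H = H @ W
    return H.T @ H

def cov_pool(H):
    """Covariance pooling: second-order pooling after removing the mean."""
    centered = H - H.mean(axis=0, keepdims=True)
    return centered.T @ centered
```

The centering step is exactly what discards the mean information that, per the results above, appears useful for graph classification.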
4.4 Results with Different Graph Neural Networks
We have already demonstrated the superiority of our proposed SOPool and SOPool over previous pooling methods. Next, we show that their effectiveness is robust to different GNNs. In this experiment, we replace GIN0 with the other six variants of GINs. Note that these variants cover graph convolutional network (GCN) [8] and GraphSAGE [9], thus including a wide range of different kinds of GNNs.
We first give details of the different variants of graph isomorphism networks (GINs) [12]. GINs iteratively update the representation of each node in a graph by aggregating the representations of its neighbors, where the iteration is achieved by stacking several layers. It therefore suffices to describe the $k$-th layer of GINs in terms of a single node.
Recall that we represent a graph $G$ by its adjacency matrix $A \in \{0,1\}^{n \times n}$ and node feature matrix $X \in \mathbb{R}^{n \times f}$, where $n$ is the number of nodes in $G$ and $f$ is the dimension of node features. The adjacency matrix encodes the neighboring information of each node. We introduce GINs by defining node representation matrices $H^{(k-1)}$ and $H^{(k)}$ as the inputs and outputs of the $k$-th layer, respectively. We have $H^{(0)} = X$. Note that the first dimension does not change during the computation, as GINs learn representations for each node.
Specifically, consider a node $v$ with corresponding representations $h_v^{(k-1)}$ and $h_v^{(k)}$, which are rows of $H^{(k-1)}$ and $H^{(k)}$, respectively. The set of neighboring nodes of $v$ is given by $\mathcal{N}(v)$. We describe the $k$-th layer of the following variants:

Sum-MLP (GIN0): $h_v^{(k)} = \mathrm{MLP}^{(k)}\Big(h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)}\Big)$

Sum-MLP (GIN-$\epsilon$): $h_v^{(k)} = \mathrm{MLP}^{(k)}\Big(\big(1+\epsilon^{(k)}\big)\, h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)}\Big)$

Sum-1-Layer: $h_v^{(k)} = \mathrm{ReLU}\Big(W^{(k)}\Big(h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)}\Big)\Big)$

Mean-MLP: $h_v^{(k)} = \mathrm{MLP}^{(k)}\Big(\mathrm{MEAN}\big\{h_u^{(k-1)} : u \in \mathcal{N}(v) \cup \{v\}\big\}\Big)$

Mean-1-Layer (GCN): $h_v^{(k)} = \mathrm{ReLU}\Big(W^{(k)}\,\mathrm{MEAN}\big\{h_u^{(k-1)} : u \in \mathcal{N}(v) \cup \{v\}\big\}\Big)$

Max-MLP: $h_v^{(k)} = \mathrm{MLP}^{(k)}\Big(\mathrm{MAX}\big\{h_u^{(k-1)} : u \in \mathcal{N}(v) \cup \{v\}\big\}\Big)$

Max-1-Layer (GraphSAGE): $h_v^{(k)} = \mathrm{ReLU}\Big(W^{(k)}\,\mathrm{MAX}\big\{h_u^{(k-1)} : u \in \mathcal{N}(v) \cup \{v\}\big\}\Big)$

In the above, $W^{(k)}$ denotes a learnable weight matrix and $\epsilon^{(k)}$ a learnable scalar.
Here, each multi-layer perceptron (MLP) has two layers with ReLU activation functions. Note that Mean-1-Layer and Max-1-Layer correspond to GCN [8] and GraphSAGE [9], respectively, up to minor architecture modifications.
The results of these different GNNs with our graph pooling methods are reported in Table III. Our proposed SOPool and SOPool consistently achieve satisfactory performance. In particular, on social network datasets, the performance does not decline as the GNNs before graph pooling become less powerful, showing the highly discriminative ability of second-order pooling.
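For concreteness, the Sum-MLP (GIN0) update described in this section can be sketched on a dense adjacency matrix. This is a simplified illustration of the aggregation rule only, assuming dense inputs and omitting the batch normalization used in the experimental setup; practical implementations use sparse aggregation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gin0_layer(H, A, W1, b1, W2, b2):
    """One Sum-MLP (GIN0) layer on dense inputs.

    Each node adds its own representation to the sum of its neighbors'
    representations (A @ H), then a 2-layer MLP with ReLU is applied row-wise.
    """
    aggregated = H + A @ H               # self term plus neighbor sum
    hidden = relu(aggregated @ W1 + b1)  # first MLP layer
    return relu(hidden @ W2 + b2)        # second MLP layer
```

Stacking five such layers and applying a graph pooling function to the final node representations yields the flat architectures compared above.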
4.5 Ablation Studies in Hierarchical Graph Neural Networks
Models  DD  PROTEINS

DiffPool  67.0 ± 2.4  68.2 ± 2.0
TopKPool  75.0 ± 0.9  71.1 ± 0.9
SAGPool  76.5 ± 1.0  71.9 ± 1.0
SOPool  76.8 ± 1.9  77.1 ± 3.8

Models  DD  PROTEINS

1 block  73.3 ± 2.4  77.4 ± 4.3
2 blocks  77.2 ± 2.7  78.1 ± 4.3
3 blocks  76.8 ± 1.9  77.1 ± 3.8
SOPool has shown its effectiveness as a global graph pooling method through the experiments in Sections 4.2 and 4.3. In this section, we evaluate it as a hierarchical graph pooling method in hierarchical GNNs. The hierarchical GNN architecture follows the one in [19], which contains three blocks, each composed of a GNN layer followed by graph pooling, as introduced in Section 4.1. The experiments are performed on the DD and PROTEINS datasets, where hierarchical GNNs tend to achieve good performance [17, 19].
First, we compare SOPool with different hierarchical graph pooling methods under the same hierarchical GNN architecture. Specifically, we include DiffPool, TopKPool, and SAGPool, which have been used as hierarchical graph pooling methods in their works. The comparison results are provided in Table IV. Our proposed SOPool outperforms all the baselines on both datasets, indicating the effectiveness of SOPool as a hierarchical graph pooling method.
In addition, we conduct experiments to evaluate SOPool in different hierarchical GNNs by varying the number of blocks. The results are shown in Table V. On both datasets, SOPool achieves the best performance with two blocks. The results suggest that current graph classification datasets are not yet large enough and that, without techniques like jumping knowledge networks (JKNet) [11], hierarchical GNNs tend to suffer from overfitting, leading to worse performance than flat GNNs.
5 Conclusions
In this work, we propose to perform graph representation learning with second-order pooling, pointing out that second-order pooling naturally addresses the challenges of graph pooling. Second-order pooling is more powerful than existing graph pooling methods, since it is capable of using all node information and collecting second-order statistics that encode feature correlations and topology information. To take advantage of second-order pooling in graph representation learning, we propose two global graph pooling approaches based on second-order pooling, namely bilinear mapping and attentional second-order pooling. Our proposed methods solve the practical problems incurred by directly using second-order pooling with GNNs. We theoretically show that our proposed methods are more suitable for graph representation learning by comparing them with two related pooling methods from computer vision tasks. In addition, we extend one of the proposed methods to a hierarchical graph pooling method, which offers more flexibility. To demonstrate the effectiveness of our methods, we conduct thorough experiments on graph classification tasks. Our proposed methods achieve new state-of-the-art performance on eight of the nine benchmark datasets. Ablation studies show that our methods significantly outperform existing graph pooling methods and consistently achieve good performance with different GNNs.
Acknowledgments
This work was supported in part by National Science Foundation grant IIS-1908166 and Defense Advanced Research Projects Agency grant N66001-17-2-4031.
References
 [1] P. Yanardag and S. Vishwanathan, “Deep graph kernels,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015, pp. 1365–1374.
 [2] S. Zhang, H. Tong, J. Xu, and R. Maciejewski, “Graph convolutional networks: Algorithms, applications and open challenges,” in International Conference on Computational Social Networks. Springer, 2018, pp. 79–91.
 [3] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive survey on graph neural networks,” arXiv preprint arXiv:1901.00596, 2019.
 [4] W. Fan, Y. Ma, Q. Li, Y. He, E. Zhao, J. Tang, and D. Yin, “Graph neural networks for social recommendation,” in The World Wide Web Conference. ACM, 2019, pp. 417–426.
 [5] H. Gao, J. Pei, and H. Huang, “Conditional random field enhanced graph convolutional neural networks,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ser. KDD ’19. New York, NY, USA: ACM, 2019, pp. 276–284. [Online]. Available: http://doi.acm.org/10.1145/3292500.3330888
 [6] J. Ma, P. Cui, K. Kuang, X. Wang, and W. Zhu, “Disentangled graph convolutional networks,” in International Conference on Machine Learning, 2019, pp. 4212–4221.
 [7] X. Wang, H. Ji, C. Shi, B. Wang, Y. Ye, P. Cui, and P. S. Yu, “Heterogeneous graph attention network,” in The World Wide Web Conference, ser. WWW ’19. New York, NY, USA: ACM, 2019, pp. 2022–2032. [Online]. Available: http://doi.acm.org/10.1145/3308558.3313562
 [8] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in International Conference on Learning Representations, 2017.
 [9] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Advances in Neural Information Processing Systems, 2017, pp. 1024–1034.
 [10] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” in International Conference on Learning Representations, 2018.
 [11] K. Xu, C. Li, Y. Tian, T. Sonobe, K.-i. Kawarabayashi, and S. Jegelka, “Representation learning on graphs with jumping knowledge networks,” in International Conference on Machine Learning, 2018, pp. 5449–5458.
 [12] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?” in International Conference on Learning Representations, 2019.
 [13] K. Schütt, P.-J. Kindermans, H. E. S. Felix, S. Chmiela, A. Tkatchenko, and K.-R. Müller, “SchNet: A continuous-filter convolutional neural network for modeling quantum interactions,” in Advances in Neural Information Processing Systems, 2017, pp. 991–1001.
 [14] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams, “Convolutional networks on graphs for learning molecular fingerprints,” in Advances in Neural Information Processing Systems, 2015, pp. 2224–2232.
 [15] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Advances in Neural Information Processing Systems, 2016, pp. 3844–3852.

 [16] M. Zhang, Z. Cui, M. Neumann, and Y. Chen, “An end-to-end deep learning architecture for graph classification,” in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
 [17] Z. Ying, J. You, C. Morris, X. Ren, W. Hamilton, and J. Leskovec, “Hierarchical graph representation learning with differentiable pooling,” in Advances in Neural Information Processing Systems, 2018, pp. 4800–4810.
 [18] H. Gao and S. Ji, “Graph U-Nets,” in International Conference on Machine Learning, 2019.
 [19] J. Lee, I. Lee, and J. Kang, “Selfattention graph pooling,” in International Conference on Machine Learning, 2019, pp. 3734–3743.
 [20] Y. Ma, S. Wang, C. C. Aggarwal, and J. Tang, “Graph convolutional networks with eigenpooling,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ser. KDD ’19. New York, NY, USA: ACM, 2019, pp. 723–731. [Online]. Available: http://doi.acm.org/10.1145/3292500.3330982
 [21] Y. Boureau, J. Ponce, and Y. LeCun, “A theoretical analysis of feature pooling in vision algorithms,” in International Conference on Machine learning, vol. 345, 2010.
 [22] D. Chen, Y. Lin, W. Li, P. Li, J. Zhou, and X. Sun, “Measuring and relieving the over-smoothing problem for graph neural networks from the topological view,” in Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020.
 [23] D. G. Lowe, “Object recognition from local scale-invariant features,” in International Conference on Computer Vision, vol. 99, no. 2, 1999, pp. 1150–1157.
 [24] O. Tuzel, F. Porikli, and P. Meer, “Region covariance: A fast descriptor for detection and classification,” in European conference on computer vision. Springer, 2006, pp. 589–600.
 [25] ——, “Pedestrian detection via classification on riemannian manifolds,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 10, pp. 1713–1727, 2008.
 [26] F. Perronnin, J. Sánchez, and T. Mensink, “Improving the fisher kernel for large-scale image classification,” in European Conference on Computer Vision. Springer, 2010, pp. 143–156.
 [27] J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu, “Semantic segmentation with secondorder pooling,” in European Conference on Computer Vision. Springer, 2012, pp. 430–443.
 [28] T.-Y. Lin, A. RoyChowdhury, and S. Maji, “Bilinear CNN models for fine-grained visual recognition,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1449–1457.

 [29] Y. Gao, O. Beijbom, N. Zhang, and T. Darrell, “Compact bilinear pooling,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 317–326.
 [30] A. Fukui, D. H. Park, D. Yang, A. Rohrbach, T. Darrell, and M. Rohrbach, “Multimodal compact bilinear pooling for visual question answering and visual grounding,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, pp. 457–468.
 [31] Z. Wang and S. Ji, “Learning convolutional text representations for visual question answering,” in Proceedings of the 2018 SIAM International Conference on Data Mining. SIAM, 2018, pp. 594–602.

 [32] V. Arsigny, P. Fillard, X. Pennec, and N. Ayache, “Geometric means in a novel vector space structure on symmetric positive-definite matrices,” SIAM Journal on Matrix Analysis and Applications, vol. 29, no. 1, pp. 328–347, 2007.
 [33] D. Acharya, Z. Huang, D. Pani Paudel, and L. Van Gool, “Covariance pooling for facial expression recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018, pp. 367–374.
 [34] Q. Wang, J. Xie, W. Zuo, L. Zhang, and P. Li, “Deep CNNs meet global covariance pooling: Better representation and generalization,” arXiv preprint arXiv:1904.06836, 2019.
 [35] R. Girdhar and D. Ramanan, “Attentional pooling for action recognition,” in Advances in Neural Information Processing Systems, 2017, pp. 34–45.
 [36] Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy, “Hierarchical attention networks for document classification,” in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 1480–1489.
 [37] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
 [38] S. Verma and Z.-L. Zhang, “Graph capsule convolutional neural networks,” arXiv preprint arXiv:1805.08090, 2018.
 [39] A. K. Debnath, R. L. Lopez de Compadre, G. Debnath, A. J. Shusterman, and C. Hansch, “Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity,” Journal of Medicinal Chemistry, vol. 34, no. 2, pp. 786–797, 1991.
 [40] H. Toivonen, A. Srinivasan, R. D. King, S. Kramer, and C. Helma, “Statistical evaluation of the predictive toxicology challenge 2000–2001,” Bioinformatics, vol. 19, no. 10, pp. 1183–1193, 2003.
 [41] K. M. Borgwardt, C. S. Ong, S. Schönauer, S. Vishwanathan, A. J. Smola, and H.-P. Kriegel, “Protein function prediction via graph kernels,” Bioinformatics, vol. 21, no. suppl_1, pp. i47–i56, 2005.
 [42] N. Wale, I. A. Watson, and G. Karypis, “Comparison of descriptor spaces for chemical compound retrieval and classification,” Knowledge and Information Systems, vol. 14, no. 3, pp. 347–375, 2008.
 [43] A. Shrivastava and P. Li, “A new space for comparing graphs,” in Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. IEEE Press, 2014, pp. 62–71.
 [44] J. Leskovec, J. Kleinberg, and C. Faloutsos, “Graphs over time: densification laws, shrinking diameters and possible explanations,” in Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. ACM, 2005, pp. 177–187.
 [45] P. D. Dobson and A. J. Doig, “Distinguishing enzyme structures from nonenzymes without alignments,” Journal of Molecular Biology, vol. 330, no. 4, pp. 771–783, 2003.
 [46] M. Niepert, M. Ahmed, and K. Kutzkov, “Learning convolutional neural networks for graphs,” in International Conference on Machine Learning, 2016, pp. 2014–2023.
 [47] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning, 2015, pp. 448–456.
 [48] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
 [49] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations, 2015.
 [50] N. Shervashidze, S. Vishwanathan, T. Petri, K. Mehlhorn, and K. Borgwardt, “Efficient graphlet kernels for large graph comparison,” in Artificial Intelligence and Statistics, 2009, pp. 488–495.
 [51] S. V. N. Vishwanathan, N. N. Schraudolph, R. Kondor, and K. M. Borgwardt, “Graph kernels,” Journal of Machine Learning Research, vol. 11, no. Apr, pp. 1201–1242, 2010.
 [52] N. Shervashidze, P. Schweitzer, E. J. v. Leeuwen, K. Mehlhorn, and K. M. Borgwardt, “Weisfeiler-Lehman graph kernels,” Journal of Machine Learning Research, vol. 12, no. Sep, pp. 2539–2561, 2011.
 [53] S. Ivanov and E. Burnaev, “Anonymous walk embeddings,” in International Conference on Machine Learning, 2018, pp. 2191–2200.
 [54] J. Atwood and D. Towsley, “Diffusion-convolutional neural networks,” in Advances in Neural Information Processing Systems, 2016, pp. 1993–2001.
 [55] M. Simonovsky and N. Komodakis, “Dynamic edge-conditioned filters in convolutional neural networks on graphs,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3693–3702.