I. Introduction
Graph-structured data is becoming ubiquitous across a wide variety of domains, such as chemical molecules [debnath1991structure], social networks [yanardag2015deep, rozemberczki2020api], financial networks [hamilton2020graph], and citation networks [morris2020tudataset]. Learning effective graph representations plays a crucial role in many tasks across various application areas, such as drug discovery [jiang2021could], molecular property prediction [li2017learning], and traffic forecasting [jiang2021graph]. Recently, graph neural networks (GNNs) have emerged as state-of-the-art models for graph representation learning, including the graph convolutional network (GCN) [kipf2017semi], graph attention network (GAT) [velivckovic2018graph], graph isomorphism network (GIN) [xu2019powerful], and GraphSAGE [hamilton2017inductive]. The majority of these GNN models rely on message passing schemes in graph convolution to learn the embedding of each node by aggregating and transforming the embeddings of its neighbouring nodes. To obtain the representation of the entire graph, node embeddings are aggregated via a readout function or graph pooling methods [zhang2019hierarchical, ying2018hierarchical, ranjan2020asap, yuan2020structpool]. Graph pooling methods focus on coarsening an input graph into a compact vector-based representation of the entire graph, which is used for graph prediction tasks such as graph classification or graph regression.
To learn informative graph representations, a myriad of graph pooling methods have been proposed, which can be roughly categorized into node-sampling-based pooling and node-clustering-based pooling. Node-sampling-based pooling methods (e.g., SAGPool [lee2019self], ASAP [ranjan2020asap], HGP-SL [zhang2019hierarchical]) typically calculate an importance score for each node and select the top important nodes to generate an induced subgraph. For example, SAGPool [lee2019self] selects nodes by learning importance scores via a self-attention mechanism. HGP-SL [zhang2019hierarchical] samples the important nodes and uses an additional structure learning mechanism to learn a new graph structure for the sampled nodes. Node-clustering-based pooling methods, like differentiable graph pooling (DiffPool) [ying2018hierarchical], learn an assignment matrix to cluster nodes into several supernodes level by level. Through this process, a hierarchy of induced subgraphs can be generated for representing the whole graph. However, we argue that the existing pooling methods focus primarily on aggregating node-level information for learning graph-level representations, but fail to exploit key graph structure. The loss of information present in the global graph structure further hinders message passing in subsequent layers.
To verify our argument, we selected four state-of-the-art pooling methods, SAGPool [lee2019self], ASAP [ranjan2020asap], DiffPool [ying2018hierarchical], and HGP-SL [zhang2019hierarchical], and analyzed the influence of changing graph topological structure on graph classification accuracy. We used the PROTEINS dataset as a case study, where we randomly dropped and added edges with different ratios. As shown in Fig. 1, we found that random edge manipulation does not cause a significant decrease in graph classification accuracy. Surprisingly, when there are no edges at all, i.e., dropping 100% of edges in Fig. 1(a) and adding 0% of edges (no edges) in Fig. 1(b), the classification accuracy still remains at the same level as with other edge ratios. Especially for HGP-SL, which implicitly uses edge information, the classification accuracy is the highest when all edges are removed. Our empirical studies indicate that current graph pooling methods are heavily node-centric and are unable to fully leverage the crucial information contained in graph structure.
To fill this research gap, we propose a novel cross-view graph pooling method called Co-Pooling that explicitly exploits graph structure for learning graph-level representations. Our main motivations are twofold. First, we would like to capture crucial graph structure by explicitly pruning unimportant edges. Key structure information, such as functional groups (e.g., rings) in biomolecular networks, or cliques in protein-protein interaction networks and social networks, has been widely recognised as a crucial source for graph prediction tasks [Milo2019network]. Second, real-world graphs may have various properties, such as one-hot node attributes, real-valued node attributes, or even no attributes (see Table I). Hence, we would like our new pooling method to seamlessly handle different types of graphs and make the best of node-level information when available.
Specifically, Co-Pooling comprises two key components: edge-view pooling and node-view pooling. The aim of edge-view pooling is to preserve crucial graph structure, which can benefit subsequent graph prediction tasks. This is achieved by capturing high-order structural information via generalized PageRank and pruning the edges with lower proximity weights. For node-view pooling, an importance score for each node is computed and the top important nodes are selected for pooling. The learning of graph pooling from the edge and node views seamlessly reinforces each other through exchanging the proximity weights and the selected important nodes. The final pooled graph is obtained by fusing the graph representations learnt from the two views. Through cross-view interaction, Co-Pooling enables edge-view pooling and node-view pooling to complement each other towards learning effective graph representations.
Our contributions are summarised as follows:

We empirically analyze the ineffectiveness of the existing node-centric graph pooling methods in fully leveraging graph structure.

We propose a new cross-view graph pooling (Co-Pooling) method to learn graph representations by fusing the pooled graph information from both the node view and the edge view. The proposed method has the flexibility to handle different types of graphs (labeled/attributed graphs and plain graphs).

We verify the effectiveness of Co-Pooling on graph classification and regression tasks across a wide range of 15 graph benchmark datasets, demonstrating its superiority over state-of-the-art pooling methods.
II. Related Work
Graph pooling is a key component in GNNs for learning a vector representation of the whole graph. The existing graph pooling methods can be divided into two categories: sampling-based pooling and clustering-based pooling.
Sampling-based pooling methods generate a smaller induced graph by selecting the top important nodes according to certain node importance scores. A series of graph pooling methods fall into this category. Gao et al. [gao2019graph] proposed a pooling method that selects the top nodes to form a smaller graph. This method projects node features into scalar values and uses them as the node selection criterion. Similarly, SAGPool [lee2019self] employs a self-attention mechanism to calculate the importance score for each node, and then chooses the top-ranked nodes to induce the pooled graph. Ranjan et al. [ranjan2020asap] proposed adaptive structure aware pooling (ASAP) for learning graph representations. This method updates node embeddings by aggregating features of neighbouring nodes in a local region. After that, a fitness score for each updated node is calculated to select the top nodes to form the pooled graph. These methods, however, do not leverage graph structure in the pooling process. HGP-SL [zhang2019hierarchical] takes a step forward by learning new edges between the top important nodes selected for the pooled graph. However, this method does not preserve the structure information contained in the original graph.
On the other hand, clustering-based pooling methods learn an assignment matrix to cluster nodes into supernodes. DiffPool [ying2018hierarchical] learns a differentiable soft cluster assignment for nodes in an end-to-end manner. The learnt assignment is used for grouping the nodes at one layer into several clusters at the subsequent layer. HaarPooling [wang2020haar] relies on compressive Haar transform filters to generate an induced graph of smaller size.
Most of the existing pooling methods operate on a single node view; they are unable to fully leverage crucial graph structure. Although preliminary attempts (e.g., EdgePool [diehl2019edge] and EdgeCut [galland2021graph]) have been made to pool the input graph from an edge view, these methods simply rely on local connectivity to calculate pairwise edge scores and suffer from high computational complexity. In contrast, our edge-view pooling mechanism leverages higher-order structure information to assess the importance of edges, which is fed to further guide the selection of important nodes for node-view pooling. To the best of our knowledge, our work is the first to propose a cross-view graph pooling method, which enables us to fuse useful information from both edge and node views towards learning effective graph-level representations.
III. Method
This section presents an overview of our proposed cross-view graph pooling, followed by a detailed description of its main components.
III-A Preliminaries
We first define notations to clarify the description of our proposed method. Assume we have input graphs $\{G_1, \dots, G_N\}$ and their corresponding targets $\{y_1, \dots, y_N\}$. For graph classification, $y_i$ is a discrete class label; for graph regression, $y_i$ is a continuous regression target variable. A graph is represented as $G = (X, A)$, where $X \in \mathbb{R}^{n \times f}$ represents the node attribute matrix, $n$ is the number of nodes and $f$ is the dimension of node attributes; $A \in \{0, 1\}^{n \times n}$ is the adjacency matrix: if there is an edge between node $i$ and node $j$, $A_{ij} = 1$; otherwise $A_{ij} = 0$. For simplicity, $G$ is also used to denote an arbitrary graph. $\tilde{A} = A + I$ stands for the adjacency matrix with self-loops.
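As a concrete illustration of this notation, the following minimal numpy sketch builds a toy graph; the attribute values are hypothetical and chosen only for illustration:

```python
import numpy as np

# Toy graph with n = 3 nodes and f = 2 node attributes (values are
# made up): G = (X, A), and A_tilde = A + I adds self-loops.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])            # node attribute matrix, shape (n, f)
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])       # symmetric adjacency matrix
A_tilde = A + np.eye(3)               # adjacency matrix with self-loops
```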
In this paper, we use a graph convolutional network (GCN) as our backbone to learn representations for graphs. The graph convolution operation is defined as:

$H = \sigma\big(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}XW\big)$   (1)

where $H \in \mathbb{R}^{n \times d}$ is the node embedding matrix after convolution and $d$ is the dimension of node embeddings; $\tilde{D}$ is the diagonal degree matrix of $\tilde{A}$; $W \in \mathbb{R}^{f \times d}$ is a learnable parameter; and $\sigma$ is a non-linear activation function. After node embeddings are learnt, a graph pooling operation is applied to aggregate node embeddings into a vector representation of the whole graph. This graph-level representation can be used for downstream graph prediction tasks, i.e., graph classification and graph regression.
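The convolution in Eq. (1) can be sketched in a few lines of numpy. This is an illustrative re-implementation (with ReLU standing in for the activation $\sigma$), not the authors' code:

```python
import numpy as np

def gcn_layer(X, A, W):
    """One GCN convolution: sigma(D^-1/2 (A + I) D^-1/2 X W)."""
    A_tilde = A + np.eye(A.shape[0])            # add self-loops
    d = A_tilde.sum(axis=1)                     # degrees of A_tilde
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # D^-1/2
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # symmetric normalization
    return np.maximum(A_hat @ X @ W, 0.0)       # ReLU nonlinearity

# 3 nodes, 2 input attributes, 2 hidden dimensions
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
W = np.ones((2, 2)) * 0.5
H = gcn_layer(X, A, W)    # node embeddings, shape (3, 2)
```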
III-B Cross-View Graph Pooling
The proposed cross-view graph pooling (Co-Pooling) consists of two main components: edge-view pooling and node-view pooling, as illustrated in Fig. 2. Co-Pooling simultaneously performs pooling from both the edge view and the node view. Through cross-view interaction, edge-view pooling and node-view pooling seamlessly reinforce each other, and finally the pooled graphs from the two views are fused to form the final graph representation.
III-B1 Edge-View Pooling
The key objective of edge-view pooling is to preserve crucial information contained in graph structure. This is achieved by capturing high-order structural information via generalized PageRank (GPR) [chien2021adaptive] and pruning unimportant edges. Through edge-view pooling, the learnt representation captures better connectivity relationships between nodes and higher-order graph structure information.
Specifically, we first update node embeddings by generalized PageRank to capture information from higher-hop neighbours. As shown in Eq. (2), the node embeddings are updated by accumulating the propagated embeddings weighted by different GPR weights $\gamma_k$:

$H' = \sum_{k=0}^{K} \gamma_k H^{(k)}, \quad H^{(0)} = H, \quad H^{(k)} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2} H^{(k-1)} \; (k > 0)$   (2)

Through generalized PageRank, node embeddings propagate for $K$ steps. For each step, the GPR weight $\gamma_k$ is learnable; therefore, the contribution of each propagation step towards the node embeddings can be learnt adaptively. The $K$-step GPR operation helps to incorporate higher-hop neighbours' information into the updated node embeddings.
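The $K$-step GPR propagation can be sketched as follows; in the model the weights $\gamma_k$ are learnable, whereas here they are fixed numbers given for illustration, and `A_hat` is an assumed pre-normalized adjacency:

```python
import numpy as np

def gpr_propagate(H, A_hat, gammas):
    """Generalized PageRank: H' = sum_k gamma_k H^(k),
    with H^(0) = H and H^(k) = A_hat @ H^(k-1)."""
    out = gammas[0] * H          # k = 0 term
    Hk = H
    for gamma in gammas[1:]:     # k = 1 .. K terms
        Hk = A_hat @ Hk          # propagate one more hop
        out = out + gamma * Hk   # per-step learnable weighting
    return out

# example: 2-step propagation on a normalized 3-node adjacency
A_hat = np.array([[0.5, 0.5, 0.0],
                  [0.5, 0.0, 0.5],
                  [0.0, 0.5, 0.5]])
H = np.eye(3)
H_out = gpr_propagate(H, A_hat, [0.5, 0.3, 0.2])
```

With `gammas = [1.0]` the propagation reduces to the identity, which is a quick sanity check on the $k = 0$ term.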
After updating node embeddings by generalized PageRank, we calculate the proximity weight between each pair of connected nodes, and the edges with low proximity weights are pruned so that only crucial graph structure is preserved.
The process of computing the proximity weights is illustrated by Eq. (3), where $h_i$ and $h_j$ are the GPR-updated embeddings of node $i$ and node $j$. We first transform the node embeddings $h_i$ and $h_j$ via a linear transformation parameterized with $\Theta$, and then concatenate the transformed embeddings. Another linear transformation with learnable parameters $w$ is used to transform the concatenated embeddings. Finally, the proximity weight between node $i$ and node $j$ is obtained after a Sigmoid function:

$p_{ij} = \sigma\big(w^{\top}[\Theta h_i \,\|\, \Theta h_j]\big) \cdot A_{ij}$   (3)

where $p_{ij}$ is the proximity weight between node $i$ and node $j$; $\sigma$ is the Sigmoid function; $\|$ represents the concatenation operation; $\Theta$ and $w$ are learnable parameters; and $A_{ij}$ indicates whether or not there is an edge connecting node $i$ and node $j$.
According to the proximity weight of each node pair, we can obtain the proximity matrix $P$ for all node pairs in a graph. To emphasize the proximity between each node and itself, we update the proximity matrix by adding the value 1 on the diagonal, i.e., $P \leftarrow P + I$. For undirected graphs, we average the proximity weights at symmetric positions by $P \leftarrow (P + P^{\top})/2$.
Based on the proximity matrix $P$, we prune unimportant edges with low proximity weights in the graph. For a given edge retaining ratio $\rho$, we obtain the cut proximity matrix $\hat{P} = \mathrm{top}_{\rho}(P)$, where $\mathrm{top}_{\rho}$ is the operation that retains the top $\rho$ percentage of edges with high proximity weights. Accordingly, we update the adjacency matrix to reflect the removal of edges. The cut proximity matrix provides a better measure for quantifying the higher-order connectivity relationship between nodes, and is fed to node-view pooling to guide the selection of important nodes.
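Putting Eq. (3), the self-loop/symmetrization step, and top-$\rho$ pruning together, edge-view pooling might be sketched as below. `Theta`, `w`, and the quantile-based selection are illustrative stand-ins for the learnable parameters and the $\mathrm{top}_{\rho}$ operation, not the authors' implementation:

```python
import numpy as np

def edge_view_prune(H, A, Theta, w, rho):
    """Proximity weights on existing edges, self-proximity and
    symmetrization, then retaining the top-rho fraction of edges."""
    n = A.shape[0]
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if A[i, j] > 0:                            # existing edges only
                z = np.concatenate([Theta @ H[i], Theta @ H[j]])
                P[i, j] = 1.0 / (1.0 + np.exp(-w @ z)) # sigmoid(w [.||.])
    P = P + np.eye(n)                                  # self-proximity
    P = 0.5 * (P + P.T)                                # symmetrize
    off = P[~np.eye(n, dtype=bool)]                    # off-diagonal weights
    kept = off[off > 0]
    thresh = np.quantile(kept, 1.0 - rho)              # top-rho cut-off
    keep = (P >= thresh) | np.eye(n, dtype=bool)
    return np.where(keep, P, 0.0)                      # cut proximity matrix

# path graph 0-1-2; keep all edges (rho = 1.0)
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
P_cut = edge_view_prune(np.eye(3), A, 0.1 * np.ones((2, 3)),
                        np.ones(4), rho=1.0)
```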
III-B2 Node-View Pooling
For node-view pooling, the aim is to select the top important nodes for coarsening the input graph. To better measure the connectivity between nodes, we take the cut proximity matrix $\hat{P}$ from edge-view pooling to compute an importance score for each node, given by:

$s = \hat{D}^{-1/2}\hat{P}\hat{D}^{-1/2}\mathbf{1}$   (4)

where $s$ is the score vector for all nodes; $\hat{D}$ is the diagonal degree matrix of $\hat{P}$, with $\hat{D}_{ii} = \sum_{j}\hat{P}_{ij}$; and $\mathbf{1}$ is the vector containing all ones.
Based on the nodes' importance scores, we select the top $\lceil kn \rceil$ nodes, where $k$ is the node pooling ratio. After selecting the nodes, we obtain the indices $\mathrm{idx}$ and the embeddings of all selected nodes.
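The scoring and top-$k$ selection can be sketched as follows; the symmetric normalization is our illustrative reading of Eq. (4), and the example values of `P_cut` are made up:

```python
import numpy as np

def node_view_pooling(P_cut, H, pool_ratio):
    """Score nodes with the normalized cut proximity matrix and keep
    the top ceil(pool_ratio * n) nodes."""
    n = P_cut.shape[0]
    d = P_cut.sum(axis=1)                              # D_ii = sum_j P_ij
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    s = D_inv_sqrt @ P_cut @ D_inv_sqrt @ np.ones(n)   # importance scores
    k = int(np.ceil(pool_ratio * n))
    idx = np.argsort(-s)[:k]                           # top-k node indices
    return idx, H[idx]                                 # selected embeddings

# example: node 1 is the best-connected node under this cut proximity
P_cut = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.5],
                  [0.0, 0.5, 1.0]])
H = np.arange(6.0).reshape(3, 2)
idx, H_sel = node_view_pooling(P_cut, H, pool_ratio=0.5)
```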
III-B3 Edge-Node View Interaction
To enable edge-view pooling and node-view pooling to reinforce each other, our Co-Pooling method exchanges the cut proximity matrix and the indices of the selected nodes, which serve as mediators for the interaction between the two views.
For node-view pooling, the cut proximity matrix $\hat{P}$ from edge-view pooling is used to calculate the importance score for each node. The cut proximity matrix better reflects the higher-order connectivity relationship between nodes, thus providing more accurate information than the original adjacency matrix for quantifying the importance of nodes. After obtaining the node scores, we select the top-$K$ important nodes to form the pooled representation from node-view pooling, i.e., $X^{node} = H'(\mathrm{idx}, :)$, where $(\mathrm{idx}, :)$ represents the index selection operation.
For edge-view pooling, the indices of selected nodes obtained from node-view pooling are used to cluster nodes into supernodes. The important node indices from node-view pooling are useful to guide the clustering operation, as the selected important nodes are determined by considering higher-order structure information. The pooled representation from edge-view pooling is obtained through $X^{edge} = \hat{P}(\mathrm{idx}, :) \cdot H'$, where $\cdot$ means matrix multiplication.
Lastly, the pooled representations from node-view pooling and edge-view pooling are fused to form the final graph representation:

$Z = \big(X^{node} \,\|\, X^{edge}\big)\, W'$   (5)

where $W'$ is the learnable parameter of the linear transformation; $\|$ represents the concatenation operator; and $Z$ is the graph-level representation after pooling. Through edge-node view interaction, our Co-Pooling method enables edge-view pooling and node-view pooling to complement each other towards learning more effective graph representations.
TABLE I: Statistics of the benchmark graph datasets.

Dataset | # of Graphs | # of Classes | Avg. # of Nodes | Avg. # of Edges | Node Attributes | Dataset Type
BZR-A | 405 | 2 | 35.75 | 38.36 | Real-valued attribute | Attributed
AIDS-A | 2000 | 2 | 15.69 | 16.20 | Real-valued attribute | Attributed
FRANKENSTEIN | 4337 | 2 | 16.90 | 17.88 | Real-valued attribute | Attributed
PROTEINS | 1113 | 2 | 39.06 | 72.82 | Node label | Labeled
D&D | 1178 | 2 | 284.32 | 715.66 | Node label | Labeled
NCI1 | 4110 | 2 | 29.87 | 32.30 | Node label | Labeled
NCI109 | 4127 | 2 | 29.68 | 32.13 | Node label | Labeled
MSRC_21 | 563 | 20 | 77.52 | 198.32 | Node label | Labeled
COLLAB | 5000 | 3 | 74.49 | 2457.78 | None | Plain
IMDB-BINARY | 1000 | 2 | 19.77 | 96.53 | None | Plain
IMDB-MULTI | 1500 | 3 | 13.00 | 65.94 | None | Plain
REDDIT-BINARY | 2000 | 2 | 429.63 | 497.75 | None | Plain
REDDIT-MULTI-12K | 11929 | 11 | 391.41 | 456.89 | None | Plain
IV. Experiments
In this section, we validate the performance of our proposed cross-view graph pooling method on both graph classification and graph regression tasks. For the graph classification task, we compare our method with several state-of-the-art pooling methods in two settings: complete graphs with various types of node attributes, and incomplete graphs. As illustrated in Fig. 3, complete graphs refer to graphs with all node attributes, while incomplete graphs are those in which a portion of nodes have completely missing attributes. The incomplete graph setting is used to simulate real-world scenarios where attribute information for some nodes is inaccessible due to privacy or legal constraints. Our method is also compared against baseline pooling methods on the graph regression task.
IV-A Graph Classification on Complete Graphs
Benchmark Datasets
We conduct graph classification tasks on a total of 13 benchmark graph datasets with various attribute properties, including three attributed graph datasets with real-valued node attributes, five labeled graph datasets with only one-hot node attributes, and five plain graph datasets with no node attributes. The detailed statistics of these datasets are listed in Table I.

BZR-A [sutherland2003spline] is a dataset of chemical compounds for classifying biological activities as active or inactive. The node attributes are 3D coordinates of the compound structures.

AIDS-A [riesen2008iam] contains graphs representing molecular compounds. It contains two classes of graphs, indicating whether or not the compound is active against HIV.

FRANKENSTEIN [orsini2015graph] consists of molecules labeled as mutagens and non-mutagens for the mutagenicity classification task. Node attributes are 780-dimensional MNIST [LecunMnist] image vectors of pixel intensities, which represent chemical atom symbols.
D&D [dobson2003distinguishing] and PROTEINS [borgwardt2005protein] include macromolecules as graph datasets in bioinformatics, used for the enzyme vs. non-enzyme classification task.

NCI1 [wale2008comparison] and NCI109 [wale2008comparison] contain chemical compounds as small molecules, which are used for anticancer activity classification.

MSRC_21 [neumann2016propagation] is a graph dataset constructed from semantic images. Each semantic image is represented as a conditional Markov random field graph. Nodes in a graph represent the segmented superpixels in an image; if two segmented superpixels are adjacent, the corresponding nodes are connected. Each node is assigned a semantic label as the node attribute.

COLLAB [yanardag2015deep] is a scientific collaboration dataset, where each graph represents the collaboration network of one researcher. The task of COLLAB is to classify the graph into different research fields.

IMDB-BINARY [yanardag2015deep] and IMDB-MULTI [yanardag2015deep] are two datasets for classifying each graph into different movie genres. Each graph is an ego-network of an actor/actress, and nodes also represent actors/actresses.

REDDIT-BINARY [yanardag2015deep] and REDDIT-MULTI-12K [yanardag2015deep] are two datasets generated from online discussions. Each graph represents a discussion thread where nodes are different users; if one of two users responds to the other, there is an edge between them. The task is to classify which section the discussion belongs to.
TABLE II: Graph classification accuracy (%, mean ± std over 10-fold cross-validation) on complete graphs. "-" denotes a missing result.

Datasets | SAGPool | ASAP | DiffPool | HGP-SL | EdgePool | Co-Pooling/GPR | Co-Pooling/NV | Co-Pooling
BZR-A | 82.95±4.91 | 83.70±6.00 | 83.93±4.41 | 83.23±6.51 | 83.43±6.00 | 81.00±5.82 | 81.69±5.80 | 85.67±5.29
AIDS-A | 98.85±0.78 | 99.00±0.74 | 99.40±0.58 | 99.10±0.66 | 99.05±0.69 | 98.85±0.71 | 98.90±0.58 | 99.45±0.42
FRANKENSTEIN | 60.94±2.90 | 66.73±2.76 | 65.08±1.50 | 62.19±1.74 | 62.99±2.21 | 64.01±1.70 | 67.00±2.37 | 64.15±1.34
D&D | 76.91±3.42 | 77.84±3.41 | 78.01±2.70 | 77.33±4.22 | 76.66±2.05 | 75.81±3.81 | 77.00±5.04 | 77.85±2.21
PROTEINS | 73.68±4.63 | 74.85±5.18 | 75.11±2.95 | 74.13±4.12 | 77.01±5.41 | 73.68±2.33 | 76.28±5.09 | 76.19±4.13
NCI1 | 71.51±4.51 | 76.59±1.71 | 74.14±1.43 | 73.48±2.42 | 78.39±2.43 | 77.25±2.11 | 79.15±2.04 | 78.66±1.48
NCI109 | 69.69±3.27 | 74.73±3.48 | 72.04±1.43 | 72.30±2.18 | 77.01±2.39 | 75.60±1.46 | 78.07±1.77 | 77.08±2.03
MSRC_21 | 90.22±2.82 | 90.41±3.91 | 90.41±3.58 | 88.97±4.78 | 90.05±3.02 | 91.64±2.79 | 91.29±3.70 | 92.54±2.63
COLLAB | 70.58±2.31 | 72.84±1.84 | 72.18±1.68 | 74.2±2.72 | - | 74.82±2.10 | 68.9±5.59 | 77.30±2.29
IMDB-BINARY | 60.9±2.34 | 65.5±2.80 | 58.27±5.92 | 62.5±3.5 | 60.3±5.08 | 70.4±3.85 | 70.8±3.6 | 72.1±4.44
IMDB-MULTI | 39.8±3.39 | 45.93±4.03 | 40.00±4.52 | 40.53±4.88 | 44.27±4.50 | 47.6±4.55 | 44.8±3.94 | 49.07±3.28
REDDIT-BINARY | 83.55±4.53 | - | 84.61±2.42 | - | 88.35±2.31 | 88.90±2.00 | 88.0±4.69 | 89.35±1.25
REDDIT-MULTI-12K | 40.56±3.30 | - | 41.21±1.96 | - | - | 46.84±2.26 | 49.02±1.56 | 46.85±2.62
Baselines
We use five state-of-the-art graph pooling methods as our baselines: SAGPool [lee2019self], ASAP [ranjan2020asap], DiffPool [ying2018hierarchical], HGP-SL [zhang2019hierarchical], and EdgePool [diehl2019edge]. When training DiffPool, we use the auxiliary link prediction loss function and entropy regularization terms as in the original paper. In addition, we also compare with two ablated variants of our cross-view graph pooling (Co-Pooling): Co-Pooling/GPR and Co-Pooling/NV. Co-Pooling/GPR is our cross-view graph pooling without generalized PageRank, and Co-Pooling/NV is our cross-view graph pooling without node-view pooling.
Model Architecture and Training
In our experiments, the GNN is built on the GCN architecture. The whole GNN consists of three GCN layers, two pooling layers, and three linear transformation layers. A SoftMax classifier follows the last linear transformation layer. Note that the input to the first linear transformation layer is the concatenation of the features after each pooling layer. For all datasets, we use the same GNN architecture for a fair comparison. The detailed architecture is provided in Appendix A.
When training the GNN model, we perform 10-fold cross-validation as in [ying2018hierarchical]. We randomly split each dataset into training, validation, and test sets with 80%, 10%, and 10% of the graphs, respectively. We use the Adam [kingma2015adam] optimizer for training the GNN model. The optimization stops if the validation loss does not improve for 50 epochs, and the maximum number of epochs is set to 300. Following the hyperparameter search strategy in [lee2019self], we use grid search to obtain the optimal hyperparameters for each method. The ranges of the hyperparameters are as follows: learning rate in {0.005, 0.0005, 0.001}; weight decay in {0.0001, 0.001}; node pooling ratio in {0.5, 0.25}; hidden size in {128, 64}; dropout ratio in {0, 0.5}. To implement the convolution operation on plain graph datasets where nodes have no attributes, we follow the implementation in DiffPool [ying2018hierarchical] and pad each node with a constant vector, i.e., an all-one vector of fixed dimension.
Comparison with State-of-the-art
We compare the graph classification accuracy of all methods averaged over 10-fold cross-validation on each dataset. For a fair comparison, all baseline methods and our method are trained using the same training strategy, and the GNN model architecture used for each method is the same. As shown in Table II, our cross-view graph pooling method achieves the best results on 11 datasets and the second-best results on the other two. In particular, our method improves over the best baseline by 6.6%, 3.14%, 7.81%, 2.13%, and 1.74% on IMDB-BINARY, IMDB-MULTI, REDDIT-MULTI-12K, MSRC_21, and BZR-A, respectively. This demonstrates the effectiveness of our proposed method in predicting different types of graphs with various attribute properties. It is worth noting that our proposed cross-view graph pooling method achieves the best performance on all five datasets without node attributes. This shows the strength of our method in complementing node-view pooling with edge-view pooling when node attributes are not informative.
When comparing the different variants of our method, Co-Pooling consistently outperforms Co-Pooling/GPR on all datasets. This shows the importance of using generalized PageRank to capture higher-order structure information. Co-Pooling yields higher accuracy than Co-Pooling/NV on most (8/13) of the datasets. This demonstrates the effectiveness of our method in combining two complementary views. On labeled graphs, the performance of Co-Pooling and Co-Pooling/NV is comparable. This is because the important node indices used for clustering nodes in the edge view may be inaccurate, as one-hot attributes provide limited information. On attributed graphs with real-valued node attributes and plain graphs padded with all-one vectors as node attributes, the obtained important node indices are more accurate in complementing the edge view.
IV-B Graph Classification on Incomplete Graphs
We also compare the performance of our method and the baseline methods on incomplete graphs, in which a portion of nodes have completely missing attributes, as shown in Fig. 3. This set of experiments evaluates the effectiveness of our method in real-world scenarios where attribute information for some nodes is inaccessible due to privacy or legal constraints.
Experimental Setup and Training
We perform experiments on the attributed graph dataset AIDS-A and the labeled graph dataset MSRC_21 as case studies. For each of the two datasets, we randomly select different ratios of nodes and remove their original node attributes entirely, while keeping all remaining nodes unchanged. We define the ratio of nodes with all their attributes removed as the incomplete ratio. For example, if we remove all attributes for 10% of the nodes, the incomplete ratio is 10%. The resulting incomplete graph datasets are randomly divided into a training set (80%), a validation set (10%), and a test set (10%). We train the GNN model with the different pooling methods on the training set. The GNN model architecture used in this part is the same as in Section IV-A, and the best hyperparameters obtained in Section IV-A are used for training. The model architecture and training strategy for each method remain the same. We report graph classification accuracy averaged over 10-fold cross-validation.
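The incomplete-graph construction described above can be sketched as follows; representing "removed" attributes as zero vectors is our assumption about how missing attributes are encoded:

```python
import numpy as np

def make_incomplete(X, incomplete_ratio, rng):
    """Remove (zero out) all attributes for a random fraction of nodes."""
    X = X.copy()
    n = X.shape[0]
    k = int(round(incomplete_ratio * n))
    drop = rng.choice(n, size=k, replace=False)   # nodes losing attributes
    X[drop] = 0.0
    return X

rng = np.random.default_rng(0)
X = np.ones((10, 3))                  # toy attribute matrix
X_inc = make_incomplete(X, 0.3, rng)  # 30% incomplete ratio
```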
Comparison with State-of-the-art
Fig. 4 compares the classification accuracy of all methods on the MSRC_21 incomplete datasets. For all baseline methods, the classification accuracy drops markedly as the incomplete ratio increases from 0% to 50%, whereas the accuracy of our Co-Pooling method decreases at a much slower rate. In particular, for DiffPool, HGP-SL, and EdgePool, the classification accuracy drops by 3.73%, 12.33%, and 4.61%, respectively, even though only 10% of nodes have their attributes missing. Under the 10% incomplete ratio, Co-Pooling and its variants still achieve at least 77.93% accuracy. Compared with the best baseline method, ASAP, Co-Pooling achieves an average 8.62% increase in classification accuracy over all incomplete graph datasets with incomplete ratios from 0% to 50%.
Fig. 5 compares the graph classification accuracy of all methods on the AIDS-A incomplete datasets. For SAGPool, DiffPool, and EdgePool, the classification accuracy decreases by 3.25%, 5.45%, and 3.2%, respectively, when the incomplete ratio increases from 0% to 50%. In contrast, the accuracy of our method drops by only 1.15% under the same setting. Compared with ASAP, our method achieves better performance under all incomplete ratios. Compared with HGP-SL, our method achieves better performance on the 0%, 10%, 20%, and 40% incomplete graph datasets; overall, our method still outperforms HGP-SL in terms of average performance over all incomplete graph datasets.
The classification comparisons on incomplete graph datasets demonstrate the effectiveness of our method in handling graphs with missing node attributes. This further confirms the complementary advantage of fusing node-view and edge-view pooling, especially when node attributes are less informative.
IV-C Parameter Sensitivity
The Co-Pooling method has the edge retaining ratio $\rho$ as an important parameter, which determines what percentage of edges is retained during edge-view pooling. To investigate the effect of $\rho$ on the graph classification accuracy of Co-Pooling, we conduct empirical studies on six representative graph datasets, including two labeled graphs, two attributed graphs, and two plain graphs. On each dataset, we train the GNN model with the retaining ratio ranging from 10% to 100%. All other hyperparameters are set to the best values obtained in Section IV-A. We also use the same GNN model architecture and training strategy as in Section IV-A. We report the average classification accuracy over 10-fold cross-validation.
Fig. 6 plots the change in classification accuracy with respect to $\rho$ on the six datasets. On the two labeled graphs (PROTEINS and D&D), we find that keeping all edges is not the best choice for graph classification. As shown in Fig. 6(a) and (b), Co-Pooling achieves the highest classification accuracy at intermediate values of $\rho$ on PROTEINS and D&D. Similarly, this phenomenon can also be observed on the two attributed graphs (BZR-A and AIDS-A). As shown in Fig. 6(c) and (d), Co-Pooling achieves the best performance when $\rho$ is set to 0.6 on both datasets. The results on these four datasets indicate that not all edges are useful for graph classification when graphs have informative node attributes. Again, this confirms the benefit of our proposed method in preserving crucial edge information through edge-view pooling and using this knowledge to further guide node-view pooling.
In contrast, on the two plain graphs (IMDB-BINARY and REDDIT-MULTI-12K), keeping all edges yields the highest classification accuracy. As shown in Fig. 6(e) and (f), Co-Pooling achieves the best performance on both graphs when retaining all edges ($\rho = 1.0$). This is expected: when graphs have no node attributes, the whole graph structure is more critical for graph classification.
IV-D Graph Regression
Lastly, we carry out experiments to evaluate the effectiveness of the proposed method on the graph regression task. We compare Co-Pooling with the same state-of-the-art pooling methods on the following two graph datasets:

ZINC [bresson2019two, irwin2012zinc] contains 250K molecules and their property values. The task is to regress the property values of an input graph. In this experiment, we focus on predicting one graph property, constrained solubility. Following the setting in [dwivedi2020benchmarkgnns], we use 10K graphs from ZINC for training, 1K graphs for validation, and 1K graphs for testing.

QM9 [wu2018moleculenet, ramakrishnan2014quantum] is a graph dataset consisting of 13K molecules with 19 regression targets. This dataset is originally used in quantum chemistry for regressing molecular properties. We regress the dipole moment, one of the 19 properties. All 13K molecules are randomly divided into a training set (80%), a validation set (10%), and a test set (10%).
To train a regression model for each dataset, we use GCN as the backbone GNN and insert two pooling layers before an MLP. Following the setting in [dwivedi2020benchmarkgnns], the L1 loss function is used for training the model. We use the Adam optimizer with a learning-rate decay policy to optimize the model. The initial learning rate and weight decay are set to 0.001 and 0.0001, respectively. We train the regression model under four different random seeds and report the average mean absolute error (MAE) on the test set.
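The L1 training objective coincides with the reported MAE metric; as a quick sketch:

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error: the L1 training objective and test metric."""
    return float(np.mean(np.abs(pred - target)))

mae = l1_loss(np.array([1.0, 2.0]), np.array([0.0, 4.0]))  # (1 + 2) / 2 = 1.5
```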
We compare our Co-Pooling method with SAGPool, ASAP, DiffPool, HGP-SL, and EdgePool on the two datasets. As shown in Table III, Co-Pooling consistently achieves better regression performance than the baseline methods. In particular, Co-Pooling outperforms DiffPool and HGP-SL by a large margin on both datasets. These results reflect the effectiveness of our method on the graph regression task and show that fusing edge-view and node-view pooling learns better graph-level representations, leading to competitive performance on both graph classification and regression tasks.
TABLE III: Graph regression performance (test MAE, lower is better).

Datasets | ZINC | QM9
GCN+SAGPool | 0.378±0.031 | 0.545±0.010
GCN+ASAP | 0.372±0.026 | 0.500±0.017
GCN+DiffPool | 1.641±0.026 | 1.331±0.014
GCN+HGP-SL | 1.326±0.096 | 1.035±0.049
GCN+EdgePool | 0.382±0.030 | 0.489±0.022
GCN+Co-Pooling (ours) | 0.340±0.036 | 0.439±0.009
V. Conclusion and Future Work
We proposed a cross-view graph pooling (Co-Pooling) method to learn graph-level representations from both the edge view and the node view. We argued that most existing pooling methods are highly node-centric and unable to fully leverage the crucial information contained in graph structure. To explicitly exploit graph structure, our proposed method seamlessly fuses the pooled graph information from the two views. From the edge view, generalized PageRank is used to aggregate information from higher-hop neighbours to better capture higher-order graph structure, and the proximity weights between node pairs are calculated to prune less important edges. From the node view, node importance scores are computed from the proximity matrix to select the top important nodes for node-view pooling. The pooled representations from the two views are fused together as the final graph representation. Through cross-view interaction, edge-view pooling and node-view pooling complement each other to effectively learn informative graph-level representations. Experiments on a total of 15 graph benchmark datasets demonstrate the superior performance of our method on both graph classification and regression tasks. For future work, we would like to generalize cross-view graph pooling to learn more interpretable graph representations.
Appendix A: The Graph Neural Network Structure
The network structure for the graph classification task is shown in Fig. 7.