1 Introduction
Convolutional neural networks (CNNs) (LeCun et al., 2012)
have demonstrated great capability in various challenging artificial intelligence tasks, especially in the fields of computer vision (He et al., 2017; Huang et al., 2017) and natural language processing (Bahdanau et al., 2015). One common property behind these tasks is that both images and texts have grid-like structures. Elements on feature maps have locality and order information, which enables the application of convolutional operations (Defferrard et al., 2016). In practice, much real-world data can be naturally represented as graphs, such as social and biological networks. Due to the great success of CNNs on grid-like data, applying them to graph data (Gori et al., 2005; Scarselli et al., 2009) is particularly appealing. Recently, there have been many attempts to extend convolutions to graph data with graph neural networks (GNNs) (Kipf & Welling, 2017; Veličković et al., 2017; Gao et al., 2018). One common use of convolutions on graphs is to compute node representations (Hamilton et al., 2017; Ying et al., 2018). With learned node representations, we can perform various tasks on graphs such as node classification and link prediction.
Images can be considered as special cases of graphs, in which nodes lie on regular 2D lattices. It is this special structure that enables the use of convolution and pooling operations on images. Based on this relationship, node classification and embedding tasks have a natural correspondence with pixel-wise prediction tasks such as image segmentation (Noh et al., 2015; Gao & Ji, 2017; Jégou et al., 2017). In particular, both tasks aim to make predictions for each input unit, corresponding to a pixel on images or a node in graphs. In the computer vision field, pixel-wise prediction tasks have achieved major advances recently. Encoder-decoder architectures like the U-Net (Ronneberger et al., 2015) are state-of-the-art methods for these tasks. It is thus highly interesting to develop U-Net-like architectures for graph data. In addition to convolutions, pooling and upsampling operations are essential building blocks in these architectures. However, extending these operations to graph data is highly challenging. Unlike grid-like data such as images and texts, nodes in graphs have no spatial locality and order information as required by regular pooling operations.
To bridge the above gap, we propose novel graph pooling (gPool) and unpooling (gUnpool) operations in this work. Based on these two operations, we propose U-Net-like architectures for graph data. The gPool operation samples some nodes to form a smaller graph based on their scalar projection values on a trainable projection vector. As an inverse operation of gPool, we propose a corresponding graph unpooling (gUnpool) operation, which restores the graph to its original structure with the help of the locations of nodes selected in the corresponding gPool layer. Based on the gPool and gUnpool layers, we develop graph U-Nets, which allow high-level feature encoding and decoding for network embedding. Experimental results on node classification and graph classification tasks demonstrate the effectiveness of our proposed methods as compared to previous methods.
2 Related Work
Recently, there has been a rich line of research on graph neural networks (Gilmer et al., 2017). Inspired by first-order graph Laplacian methods, Kipf & Welling (2017) proposed graph convolutional networks (GCNs), which achieved promising performance on graph node classification tasks. The layer-wise forward-propagation operation of GCNs is defined as:
(1)  $X_{\ell+1} = \sigma\left( \hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} X_\ell W_\ell \right)$

where $\hat{A} = A + I$ is used to add self-loops to the input adjacency matrix $A$, and $X_\ell$ is the feature matrix of layer $\ell$. The GCN layer uses the diagonal node degree matrix $\hat{D}$ of $\hat{A}$ to normalize $\hat{A}$. $W_\ell$ is a trainable weight matrix that applies a linear transformation to feature vectors. GCNs essentially perform aggregation and transformation on node features without learning trainable filters.
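To make the propagation rule concrete, here is a minimal NumPy sketch of one GCN layer. The function name `gcn_layer`, the dense matrices, and the choice of `tanh` as the nonlinearity are our illustrative assumptions; practical implementations use sparse operations and task-specific activations:

```python
import numpy as np

def gcn_layer(A, X, W, act=np.tanh):
    """One GCN propagation step: act(D^-1/2 (A + I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)                    # node degrees of A_hat
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # D^-1/2
    return act(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W)
```

Each output row is thus a normalized average over a node's neighborhood (including itself), followed by the shared linear map `W` and the nonlinearity.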
Hamilton et al. (2017) tried to sample a fixed number of neighboring nodes to keep the computational footprint consistent. Veličković et al. (2017) proposed to use attention mechanisms to enable different weights for neighboring nodes. Schlichtkrull et al. (2018) used relational graph convolutional networks for link prediction and entity classification. Some studies applied GNNs to graph classification tasks (Duvenaud et al., 2015; Dai et al., 2016; Zhang et al., 2018). Bronstein et al. (2017) discussed possible ways of applying deep learning to graph data.
Henaff et al. (2015) and Bruna et al. (2014) proposed to use spectral networks for large-scale graph classification tasks. Some studies also applied graph neural networks to traditional computer vision tasks (Gama et al., 2019; Fey et al., 2018; Monti et al., 2017). In addition to convolution, some studies tried to extend pooling operations to graphs. Defferrard et al. (2016) proposed to use binary tree indexing for graph coarsening, which fixes the indices of nodes before applying 1D pooling operations. Simonovsky & Komodakis (2017) used a deterministic graph clustering algorithm to determine pooling patterns. Ying et al. (2018) used an assignment matrix to achieve pooling by assigning nodes to different clusters of the next layer.
3 Graph U-Nets
In this section, we introduce the graph pooling (gPool) layer and the graph unpooling (gUnpool) layer. Based on these two new layers, we develop the graph U-Nets for node classification tasks.
3.1 Graph Pooling Layer
Pooling layers play important roles in CNNs on grid-like data. They can reduce the sizes of feature maps and enlarge receptive fields, thereby giving rise to better generalization and performance (Yu & Koltun, 2016). On grid-like data such as images, feature maps are partitioned into non-overlapping rectangles, on which non-linear downsampling functions such as maximum are applied. In addition to local pooling, global pooling layers (Zhao et al., 2015a) perform downsampling operations on all input units, thereby reducing each feature map to a single number. In contrast, $k$-max pooling layers (Kalchbrenner et al., 2014) select the $k$ largest units out of each feature map. However, we cannot directly apply these pooling operations to graphs. In particular, there is no locality information among nodes in graphs, so the partition operation is not applicable on graphs. The global pooling operation would reduce all nodes to one single node, which restricts the flexibility of networks. The $k$-max pooling operation outputs the $k$ largest units, which may come from different nodes in graphs, resulting in inconsistency in the connectivity of the selected nodes.
In this section, we propose the graph pooling (gPool) layer to enable downsampling on graph data. In this layer, we adaptively select a subset of nodes to form a new but smaller graph. To this end, we employ a trainable projection vector $p$. By projecting all node features onto 1D, we can perform $k$-max pooling for node selection. Since the selection is based on the 1D footprint of each node, the connectivity in the new graph is consistent across nodes. Given a node $i$ with feature vector $x_i$, the scalar projection of $x_i$ on $p$ is $y_i = x_i p / \|p\|$. Here, $y_i$ measures how much information of node $i$ can be retained when projected onto the direction of $p$. By sampling nodes, we wish to preserve as much information as possible from the original graph. To achieve this, we select the nodes with the $k$-largest scalar projection values on $p$ to form a new graph.
Suppose a graph has $N$ nodes, each of which contains $C$ features. The graph can be represented by two matrices: the adjacency matrix $A^\ell \in \mathbb{R}^{N \times N}$ and the feature matrix $X^\ell \in \mathbb{R}^{N \times C}$. Each non-zero entry in the adjacency matrix $A^\ell$ represents an edge between two nodes in the graph. Each row vector $x_i^\ell$ in the feature matrix $X^\ell$ denotes the feature vector of node $i$ in the graph. The layer-wise propagation rule of the graph pooling layer is defined as:
(2)  $y = X^\ell p^\ell / \| p^\ell \|$,  $\mathrm{idx} = \mathrm{rank}(y, k)$,  $\tilde{y} = \mathrm{sigmoid}(y(\mathrm{idx}))$,  $\tilde{X}^\ell = X^\ell(\mathrm{idx}, :)$,  $A^{\ell+1} = A^\ell(\mathrm{idx}, \mathrm{idx})$,  $X^{\ell+1} = \tilde{X}^\ell \odot (\tilde{y} 1_C^T)$

where $k$ is the number of nodes selected in the new graph. $\mathrm{rank}(y, k)$ is the operation of node ranking, which returns the indices of the $k$-largest values in $y$. The idx returned by $\mathrm{rank}(y, k)$ contains the indices of the nodes selected for the new graph. $A^\ell(\mathrm{idx}, \mathrm{idx})$ and $X^\ell(\mathrm{idx}, :)$ perform the row and/or column extraction to form the adjacency matrix and the feature matrix for the new graph. $y(\mathrm{idx})$ extracts the values in $y$ with indices idx, followed by a sigmoid operation. $1_C \in \mathbb{R}^C$ is a vector of size $C$ with all components being 1, and $\odot$ represents element-wise matrix multiplication.
$X^\ell \in \mathbb{R}^{N \times C}$ is the feature matrix with row vectors $x_i^\ell$, each of which corresponds to a node in the graph. We first compute the scalar projection of $X^\ell$ on $p^\ell$, resulting in $y \in \mathbb{R}^N$, with each $y_i$ measuring the scalar projection value of node $i$ on the projection vector. Based on the scalar projection vector $y$, the $\mathrm{rank}(y, k)$ operation ranks the values and returns the indices of the $k$-largest values in $y$. Suppose the selected indices are $\mathrm{idx} = [i_1, \ldots, i_k]$ with $i_1 < i_2 < \cdots < i_k$. Note that the index selection process preserves the position order information in the original graph. With indices idx, we extract the adjacency matrix $A^{\ell+1}$ and the feature matrix $\tilde{X}^\ell$ for the new graph. Finally, we employ a gate operation to control information flow. With the selected indices idx, we obtain the gate vector $\tilde{y}$ by applying sigmoid to each element in the extracted scalar projection vector. Using the element-wise matrix product of $\tilde{X}^\ell$ and $\tilde{y} 1_C^T$, the information of the selected nodes is controlled. The $i$th row vector in $X^{\ell+1}$ is the product of the $i$th row vector in $\tilde{X}^\ell$ and the $i$th scalar value in $\tilde{y}$.
Notably, the gate operation makes the projection vector $p$ trainable by back-propagation (LeCun et al., 2012). Without the gate operation, the projection vector would produce only discrete outputs, making it non-trainable by back-propagation. Figure 1 provides an illustration of our proposed graph pooling layer. Compared to pooling operations used on grid-like data, our graph pooling layer employs extra training parameters in the projection vector $p$. We will show that these extra parameters are negligible but can boost performance.
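The gPool computation in Eq. 2 can be sketched in a few lines of NumPy. This is an illustrative, non-trainable version; the names `g_pool`, `p`, and `k` follow the notation above, and in a real implementation gradients would flow through the sigmoid gate to update `p`:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def g_pool(A, X, p, k):
    """Select the k nodes with the largest scalar projections y = X p / ||p||,
    extract their sub-graph, and gate their features by sigmoid(y)."""
    y = X @ p / np.linalg.norm(p)      # scalar projection of each node on p
    idx = np.argsort(-y)[:k]           # indices of the k largest values
    idx = np.sort(idx)                 # preserve the original node order
    y_gate = sigmoid(y[idx])           # gate values in (0, 1)
    X_new = X[idx] * y_gate[:, None]   # gated feature matrix, k x C
    A_new = A[np.ix_(idx, idx)]        # adjacency among selected nodes, k x k
    return A_new, X_new, idx
```

Returning `idx` matters: the corresponding gUnpool layer needs it to place nodes back at their original positions.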
3.2 Graph Unpooling Layer
Upsampling operations are important for encoder-decoder networks such as the U-Net. The encoders of such networks usually employ pooling operations to reduce the feature map size and increase the receptive field, while in decoders, feature maps need to be upsampled to restore their original resolutions. On grid-like data like images, there are several upsampling operations such as deconvolution (Isola et al., 2017; Zhao et al., 2015b) and unpooling layers (Long et al., 2015). However, such operations are not currently available for graph data.
To enable upsampling operations on graph data, we propose the graph unpooling (gUnpool) layer, which performs the inverse operation of the gPool layer and restores the graph to its original structure. To achieve this, we record the locations of the nodes selected in the corresponding gPool layer and use this information to place nodes back in their original positions in the graph. Formally, we propose the layer-wise propagation rule of the graph unpooling layer as
(3)  $X^{\ell+1} = \mathrm{distribute}(0_{N \times C}, X^\ell, \mathrm{idx})$

where idx contains the indices of the nodes selected in the corresponding gPool layer, which reduced the graph size from $N$ nodes to $k$ nodes. $X^\ell \in \mathbb{R}^{k \times C}$ is the feature matrix of the current graph, and $0_{N \times C}$ is the initially empty feature matrix for the new graph. $\mathrm{distribute}(0_{N \times C}, X^\ell, \mathrm{idx})$ is the operation that distributes the row vectors in $X^\ell$ into the feature matrix $0_{N \times C}$ according to their corresponding indices stored in idx. In $X^{\ell+1}$, row vectors with indices in idx are updated by the row vectors in $X^\ell$, while the other row vectors remain zero.
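A minimal NumPy sketch of the distribute operation in Eq. 3 (the function name `g_unpool` is our illustrative choice):

```python
import numpy as np

def g_unpool(X_small, idx, n_orig):
    """Distribute the k row vectors of X_small back to their original
    positions in an n_orig x C feature matrix; all other rows stay zero."""
    X_new = np.zeros((n_orig, X_small.shape[1]))
    X_new[idx] = X_small
    return X_new
```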
3.3 Graph U-Nets Architecture
It is well-known that encoder-decoder networks like the U-Net achieve promising performance on pixel-wise prediction tasks, since they can encode and decode high-level features while maintaining local spatial information. Similar to pixel-wise prediction tasks (Gong et al., 2014; Ronneberger et al., 2015), node classification tasks aim to make a prediction for each input unit. Based on our proposed gPool and gUnpool layers, we propose our graph U-Nets (g-U-Nets) architecture for node classification tasks.
In our graph U-Nets (g-U-Nets), we first apply a graph embedding layer to convert nodes into low-dimensional representations, since the original inputs of some datasets like Cora (Sen et al., 2008) usually have very high-dimensional feature vectors. After the graph embedding layer, we build the encoder by stacking several encoding blocks, each of which contains a gPool layer followed by a GCN layer. gPool layers reduce the size of the graph to encode higher-order features, while GCN layers are responsible for aggregating information from each node's first-order neighbors. In the decoder part, we stack the same number of decoding blocks as in the encoder part. Each decoding block is composed of a gUnpool layer and a GCN layer. The gUnpool layer restores the graph to its higher-resolution structure, and the GCN layer aggregates information from the neighborhood. There are skip connections between corresponding blocks of the encoder and decoder parts, which transmit spatial information to the decoders for better performance. A skip connection can use either feature map addition or concatenation. Finally, we employ a GCN layer for final predictions before the softmax function. Figure 3 provides an illustration of a sample g-U-Nets with two blocks in the encoder and decoder. Notably, there is a GCN layer before each gPool layer, thereby enabling gPool layers to capture the topological information in graphs implicitly.
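The encoder-decoder flow described above can be sketched as follows. This is a simplified stand-in rather than the trained model: the GCN layer is replaced by parameter-free mean aggregation, the gPool score uses feature norms instead of a trainable projection, and the skip connections use addition:

```python
import numpy as np

def gcn(A, X):
    # Stand-in for a GCN layer: row-normalized aggregation over A + I.
    A_hat = A + np.eye(len(A))
    return (A_hat / A_hat.sum(axis=1, keepdims=True)) @ X

def pool(A, X, k):
    # Simplified gPool: keep the k nodes with the largest feature norms
    # (a stand-in score for the trainable projection), preserving order.
    idx = np.sort(np.argsort(-np.linalg.norm(X, axis=1))[:k])
    return A[np.ix_(idx, idx)], X[idx], idx

def unpool(X, idx, n):
    out = np.zeros((n, X.shape[1]))
    out[idx] = X
    return out

def g_u_net_forward(A, X, depth=2):
    skips, idxs, sizes, graphs = [], [], [], []
    for _ in range(depth):               # encoder: GCN, record skip, then pool
        X = gcn(A, X)
        skips.append(X); sizes.append(len(A)); graphs.append(A)
        A, X, idx = pool(A, X, max(1, len(A) // 2))
        idxs.append(idx)
    X = gcn(A, X)                        # bottom block
    for d in reversed(range(depth)):     # decoder: unpool, add skip, GCN
        A = graphs[d]
        X = unpool(X, idxs[d], sizes[d]) + skips[d]
        X = gcn(A, X)
    return X
```

The output has one row per input node, matching the per-node prediction setting of the architecture.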
3.4 Graph Connectivity Augmentation via Graph Power
In our proposed gPool layer, we sample some important nodes to form a new graph for high-level feature encoding. Since related edges are removed when removing nodes in gPool, nodes in the pooled graph might become isolated. This may hamper the information propagation in subsequent layers, especially when GCN layers are used to aggregate information from neighboring nodes. We therefore need to increase the connectivity among nodes in the pooled graph. To address this problem, we propose to use graph power to increase the graph connectivity. The $k$th graph power builds links between nodes whose distances are at most $k$ hops (Chepuri & Leus, 2016). In this work, we employ $k = 2$, since there is a GCN layer before each gPool layer to aggregate information from first-order neighboring nodes. Formally, we replace the fifth equation in Eq. 2 by:
(4)  $A^2 = A^\ell A^\ell, \quad A^{\ell+1} = A^2(\mathrm{idx}, \mathrm{idx})$

where $A^2 \in \mathbb{R}^{N \times N}$ is the 2nd graph power. Now, the graph sampling is performed on the augmented graph with better connectivity.
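A small sketch of this augmentation, assuming a dense NumPy adjacency matrix (the function name is ours, and whether to binarize the powered matrix is an implementation detail we leave open):

```python
import numpy as np

def pool_with_graph_power(A, idx):
    """Extract the pooled adjacency matrix from the 2nd graph power A @ A,
    so nodes within two hops of each other in the original graph remain
    connected after node sampling."""
    A2 = A @ A
    return A2[np.ix_(idx, idx)]
```

On a path graph 0-1-2, pooling to nodes {0, 2} from A itself leaves them disconnected, while pooling from A @ A keeps them linked through the removed middle node.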
3.5 Improved GCN Layer
In Eq. 1, the adjacency matrix before normalization is computed as $\hat{A} = A + I$, in which a self-loop is added to each node in the graph. When performing information aggregation, this gives the same weight to a node's own feature vector and to those of its neighboring nodes. In this work, we wish to give a higher weight to a node's own feature vector, since its own features should be more important for prediction. To this end, we change the calculation to $\hat{A} = A + 2I$, imposing larger weights on self-loops in the graph, which is common in graph processing. All experiments in this work use this modified version of the GCN layer for better performance.
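A sketch of the modified normalization, with the self-loop weight exposed as a parameter (our illustrative choice of interface):

```python
import numpy as np

def normalized_adj(A, self_loop_weight=2.0):
    """Symmetrically normalized adjacency D^-1/2 (A + w*I) D^-1/2 with w = 2,
    giving each node's own features a larger weight than any single neighbor
    during aggregation."""
    A_hat = A + self_loop_weight * np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt
```

For a two-node graph with a single edge, the diagonal entries of the result (2/3) exceed the off-diagonal ones (1/3), so each node weights itself twice as much as its neighbor.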
4 Experimental Study
In this section, we evaluate our gPool and gUnpool layers based on the g-U-Nets proposed in Section 3.3. We compare our networks with previous state-of-the-art models on node classification and graph classification tasks. Experimental results show that our methods achieve new state-of-the-art results in terms of node classification accuracy and graph classification accuracy. Ablation studies are performed to examine the contributions of the proposed gPool layer, gUnpool layer, and graph connectivity augmentation to the performance improvements. We also conduct studies on the relationship between network depth and node classification performance, and investigate whether the additional parameters involved in gPool layers increase the risk of overfitting.
Table 1: Summary of the datasets used in node classification experiments.
Dataset  Nodes  Features  Classes  Training  Validation  Testing  Degree

Cora  2708  1433  7  140  500  1000  4 
Citeseer  3327  3703  6  120  500  1000  5 
Pubmed  19717  500  3  60  500  1000  6 
Table 2: Summary of the datasets used in graph classification experiments.
Dataset  Graphs  Nodes (max)  Nodes (avg)  Classes

D&D  1178  5748  284.32  2 
PROTEINS  1113  620  39.06  2 
COLLAB  5000  492  74.49  3 
4.1 Datasets
In experiments, we evaluate our networks on node classification tasks under transductive learning settings and graph classification tasks under inductive learning settings.
Under transductive learning settings, unlabeled data are accessible for training, which enables the network to learn about the graph structure. To be specific, only some of the nodes are labeled while the labels of the other nodes in the same graph remain unknown. We employ three benchmark datasets for this setting; those are Cora, Citeseer, and Pubmed (Kipf & Welling, 2017), which are summarized in Table 1. These datasets are citation networks, with each node and each edge representing a document and a citation, respectively. The feature vector of each node is a bag-of-words representation whose dimension is determined by the dictionary size. We follow the same experimental settings as in (Kipf & Welling, 2017). There are 20 nodes per class for training, and 500 nodes for validation and 1000 nodes for testing in total.
Under inductive learning settings, testing data are not available during training, which means the training process does not use the graph structures of the testing data. We evaluate our methods on relatively large graph datasets selected from common benchmarks used in graph classification tasks (Ying et al., 2018; Niepert et al., 2016; Zhang et al., 2018). We use the protein datasets D&D (Dobson & Doig, 2003) and PROTEINS (Borgwardt et al., 2005), and the scientific collaboration dataset COLLAB (Yanardag & Vishwanathan, 2015). These datasets are summarized in Table 2.
4.2 Experimental Setup
We describe the experimental setup for both transductive and inductive learning settings. For transductive learning tasks, we employ the g-U-Nets proposed in Section 3.3. Since nodes in the three datasets are associated with high-dimensional features, we employ a GCN layer to reduce them to low-dimensional representations. In the encoder part, we stack four blocks, each of which consists of a gPool layer and a GCN layer. We sample 2000, 1000, 500, and 200 nodes in the four gPool layers, respectively. Correspondingly, the decoder part also contains four blocks. Each decoder block is composed of a gUnpool layer and a GCN layer. We use the addition operation in skip connections between blocks of the encoder and decoder parts. Finally, we apply a GCN layer for the final prediction. For all layers in the model, we use the identity activation function (Gao et al., 2018) after each GCN layer. To avoid overfitting, we apply $L_2$ regularization on the weights. Dropout (Srivastava et al., 2014) is applied to both adjacency matrices and feature matrices with keep rates of 0.8 and 0.08, respectively. For inductive learning tasks, we follow the same experimental setups as in (Zhang et al., 2018), using our g-U-Nets architecture as described for the transductive learning settings for feature extraction. Since the sizes of graphs vary in graph classification tasks, we sample proportions of nodes in the four gPool layers; those are 90%, 70%, 60%, and 50%, respectively. The dropout keep rate imposed on feature matrices is 0.3.
Table 3: Node classification accuracies on Cora, Citeseer, and Pubmed.
Models  Cora  Citeseer  Pubmed

DeepWalk (Perozzi et al., 2014)  67.2%  43.2%  65.3%
Planetoid (Yang et al., 2016)  75.7%  64.7%  77.2%
Chebyshev (Defferrard et al., 2016)  81.2%  69.8%  74.4%
GCN (Kipf & Welling, 2017)  81.5%  70.3%  79.0%
GAT (Veličković et al., 2017)  83.0 ± 0.7%  72.5 ± 0.7%  79.0 ± 0.3%
g-U-Nets (Ours)  84.4 ± 0.6%  73.2 ± 0.5%  79.6 ± 0.2%
Table 4: Graph classification accuracies on D&D, PROTEINS, and COLLAB.
Models  D&D  PROTEINS  COLLAB

PSCN (Niepert et al., 2016)  76.27%  75.00%  72.60%
DGCNN (Zhang et al., 2018)  79.37%  76.26%  73.76%
DiffPool-DET (Ying et al., 2018)  75.47%  75.62%  82.13%
DiffPool-NOLP (Ying et al., 2018)  79.98%  76.22%  75.58%
DiffPool (Ying et al., 2018)  80.64%  76.25%  75.48%
g-U-Nets (Ours)  82.43%  77.68%  77.56%
4.3 Performance Study
Under transductive learning settings, we compare our proposed g-U-Nets with other state-of-the-art models in terms of node classification accuracy. We report node classification accuracies on the Cora, Citeseer, and Pubmed datasets, and the results are summarized in Table 3. We can observe from the results that our g-U-Nets achieve consistently better performance than the other networks. The baseline values listed for node classification tasks are the previous state-of-the-art results on these datasets. Our proposed model is composed of GCN, gPool, and gUnpool layers without involving more advanced graph convolution layers like GAT. When compared to GCN directly, our g-U-Nets significantly improve performance on all three datasets by margins of 2.9%, 2.9%, and 0.6%, respectively. Note that the only difference between our g-U-Nets and GCN is the use of the encoder-decoder architecture containing gPool and gUnpool layers. These results demonstrate the effectiveness of g-U-Nets in network embedding.
Under inductive learning settings, we compare our methods with other state-of-the-art models on graph classification tasks with the D&D, PROTEINS, and COLLAB datasets, and the results are summarized in Table 4. We can observe from the results that our proposed gPool method outperforms DiffPool (Ying et al., 2018) by margins of 1.79% and 1.43% on the D&D and PROTEINS datasets, respectively. Notably, the result obtained by DiffPool-DET on COLLAB is significantly higher than those of all other methods, including the other two DiffPool models. On all three datasets, our model outperforms the baseline DiffPool model. In addition, DiffPool used the auxiliary task of link prediction during training to stabilize model performance, which indicates the instability of the DiffPool model. In our experiments, we only use graph labels for training without any auxiliary tasks to stabilize training.
4.4 Ablation Study of gPool and gUnpool layers
Although GCNs have been reported to have worse performance when the network goes deeper (Kipf & Welling, 2017), it may be argued that the performance improvement over GCN in Table 3 is due to the use of a deeper network architecture. In this section, we investigate the contributions of the gPool and gUnpool layers to the performance of g-U-Nets. We conduct experiments by removing all gPool and gUnpool layers from our g-U-Nets, leading to a network with only GCN layers and skip connections. Table 5 provides the comparison between g-U-Nets with and without gPool and gUnpool layers. The results show that g-U-Nets outperform g-U-Nets without gPool or gUnpool layers by margins of 2.3%, 1.6%, and 0.5% on the Cora, Citeseer, and Pubmed datasets, respectively. These results demonstrate the contributions of the gPool and gUnpool layers to the performance improvement. In terms of architecture, g-U-Nets enable higher-level feature encoding, thereby resulting in better generalization and performance.
Table 5: Ablation study of gPool and gUnpool layers in terms of node classification accuracy.
Models  Cora  Citeseer  Pubmed

g-U-Nets without gPool or gUnpool  82.1 ± 0.6%  71.6 ± 0.5%  79.1 ± 0.2%
g-U-Nets (Ours)  84.4 ± 0.6%  73.2 ± 0.5%  79.6 ± 0.2%
Table 6: Comparison of g-U-Nets with and without graph connectivity augmentation in terms of node classification accuracy.
Models  Cora  Citeseer  Pubmed

g-U-Nets without augmentation  83.7 ± 0.7%  72.5 ± 0.6%  79.0 ± 0.3%
g-U-Nets (Ours)  84.4 ± 0.6%  73.2 ± 0.5%  79.6 ± 0.2%
Table 7: Node classification accuracies of g-U-Nets with different network depths.
Depth  Cora  Citeseer  Pubmed

2  82.6 ± 0.6%  71.8 ± 0.5%  79.1 ± 0.3%
3  83.8 ± 0.7%  72.7 ± 0.7%  79.4 ± 0.4%
4  84.4 ± 0.6%  73.2 ± 0.5%  79.6 ± 0.2%
5  84.1 ± 0.5%  72.8 ± 0.6%  79.5 ± 0.3%
Table 8: Comparison of g-U-Nets with and without gPool or gUnpool layers in terms of node classification accuracy and the number of parameters on the Cora dataset.
Models  Accuracy  #Params  Ratio of increase

g-U-Nets without gPool or gUnpool  82.1 ± 0.6%  75,643  0.00%
g-U-Nets (Ours)  84.4 ± 0.6%  75,737  0.12%
4.5 Graph Connectivity Augmentation Study
In the above experiments, we employ gPool layers with graph connectivity augmentation via the graph power described in Section 3.4. Here, we conduct experiments on node classification tasks to investigate the benefits of graph connectivity augmentation for g-U-Nets. We remove the graph connectivity augmentation from the gPool layers while keeping the other settings the same for fairness of comparison. Table 6 provides the comparison between g-U-Nets with and without graph connectivity augmentation. The results show that the absence of graph connectivity augmentation causes consistent performance degradation on all three datasets. This demonstrates that graph connectivity augmentation via graph power helps with graph connectivity and information transfer among nodes in the sampled graphs.
4.6 Network Depth Study of Graph UNets
Since the network depth in terms of the number of blocks in the encoder and decoder parts is an important hyper-parameter of the g-U-Nets, we conduct experiments to investigate the relationship between network depth and performance in terms of node classification accuracy. We try different network depths on node classification tasks and report the classification accuracies. The results are summarized in Table 7. We can observe from the results that performance improves as the network goes deeper, up to a depth of 4. Beyond that, overfitting occurs and prevents deeper networks from improving further. In image segmentation, U-Net models with depth 3 or 4 are commonly used (Badrinarayanan et al., 2017; Çiçek et al., 2016), which is consistent with our choice in experiments. This indicates the capacity of the gPool and gUnpool layers for receptive field enlargement and high-level feature encoding even in relatively shallow networks.
4.7 Parameter Study of Graph Pooling Layers
Since our proposed gPool layer involves extra parameters, we compute the number of additional parameters based on our g-U-Nets. The comparison between g-U-Nets with and without gPool or gUnpool layers on the Cora dataset is summarized in Table 8. From the results, we can observe that the gPool layers in the g-U-Nets model add only 0.12% additional parameters but improve performance by a margin of 2.3%. We believe this negligible increase in extra parameters will not increase the risk of overfitting. Compared to g-U-Nets without gPool or gUnpool layers, the encoder-decoder architecture with our gPool and gUnpool layers yields a significant performance improvement.
5 Conclusion
In this work, we propose novel gPool and gUnpool layers in g-U-Nets for network embedding. The gPool layer adaptively samples a subset of important nodes to enable high-level feature encoding and receptive field enlargement. By employing a trainable projection vector, gPool layers sample nodes based on their scalar projection values. Furthermore, we propose the gUnpool layer, which applies unpooling operations to graph data. Using the position information of nodes in the original graph, the gUnpool layer performs the inverse operation of the corresponding gPool layer and restores the original graph structure. Based on our gPool and gUnpool layers, we propose the graph U-Nets (g-U-Nets) architecture, which uses an encoder-decoder architecture similar to that of the regular U-Net on image data. Experimental results demonstrate that our g-U-Nets achieve performance improvements as compared to other GNNs on transductive learning tasks. To avoid the isolated node problem that may exist in sampled graphs, we employ graph power to improve graph connectivity. Ablation studies demonstrate the contributions of our graph connectivity augmentation approach.
Acknowledgments
This work was supported in part by National Science Foundation grants IIS1908166 and IIS1908198.
References
 Badrinarayanan et al. (2017) Badrinarayanan, V., Kendall, A., and Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis & Machine Intelligence, (12):2481–2495, 2017.
 Bahdanau et al. (2015) Bahdanau, D., Cho, K., and Bengio, Y. Neural machine translation by jointly learning to align and translate. International Conference on Learning Representations, 2015.
 Borgwardt et al. (2005) Borgwardt, K. M., Ong, C. S., Schönauer, S., Vishwanathan, S., Smola, A. J., and Kriegel, H.P. Protein function prediction via graph kernels. Bioinformatics, 21(suppl_1):i47–i56, 2005.
 Bronstein et al. (2017) Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., and Vandergheynst, P. Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.
 Bruna et al. (2014) Bruna, J., Zaremba, W., Szlam, A., and Lecun, Y. Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations, 2014.
 Chepuri & Leus (2016) Chepuri, S. P. and Leus, G. Subsampling for graph power spectrum estimation. In Sensor Array and Multichannel Signal Processing Workshop, pp. 1–5. IEEE, 2016.
 Çiçek et al. (2016) Çiçek, Ö., Abdulkadir, A., Lienkamp, S. S., Brox, T., and Ronneberger, O. 3D UNet: learning dense volumetric segmentation from sparse annotation. In International Conference on Medical Image Computing and ComputerAssisted Intervention, pp. 424–432. Springer, 2016.
 Dai et al. (2016) Dai, H., Dai, B., and Song, L. Discriminative embeddings of latent variable models for structured data. In International Conference on Machine Learning, pp. 2702–2711, 2016.
 Defferrard et al. (2016) Defferrard, M., Bresson, X., and Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pp. 3844–3852, 2016.
 Dobson & Doig (2003) Dobson, P. D. and Doig, A. J. Distinguishing enzyme structures from nonenzymes without alignments. Journal of molecular biology, 330(4):771–783, 2003.
 Duvenaud et al. (2015) Duvenaud, D. K., Maclaurin, D., Iparraguirre, J., Bombarell, R., Hirzel, T., AspuruGuzik, A., and Adams, R. P. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems, pp. 2224–2232, 2015.

 Fey et al. (2018) Fey, M., Eric Lenssen, J., Weichert, F., and Müller, H. SplineCNN: Fast geometric deep learning with continuous B-spline kernels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 869–877, 2018.
 Gama et al. (2019) Gama, F., Marques, A. G., Leus, G., and Ribeiro, A. Convolutional neural network architectures for signals supported on graphs. IEEE Transactions on Signal Processing, 67(4):1034–1049, 2019.
 Gao & Ji (2017) Gao, H. and Ji, S. Efficient and invariant convolutional neural networks for dense prediction. In 2017 IEEE International Conference on Data Mining, pp. 871–876. IEEE, 2017.
 Gao et al. (2018) Gao, H., Wang, Z., and Ji, S. Largescale learnable graph convolutional networks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1416–1424. ACM, 2018.
 Gilmer et al. (2017) Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212, 2017.
 Gong et al. (2014) Gong, Y., Jia, Y., Leung, T., Toshev, A., and Ioffe, S. Deep convolutional ranking for multilabel image annotation. In Proceedings of the International Conference on Learning Representations, 2014.
 Gori et al. (2005) Gori, M., Monfardini, G., and Scarselli, F. A new model for learning in graph domains. In 2005 IEEE International Joint Conference on Neural Networks, volume 2, pp. 729–734, 2005.
 Hamilton et al. (2017) Hamilton, W., Ying, Z., and Leskovec, J. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034, 2017.
 He et al. (2017) He, K., Gkioxari, G., Dollár, P., and Girshick, R. Mask R-CNN. In IEEE International Conference on Computer Vision, 2017.
 Henaff et al. (2015) Henaff, M., Bruna, J., and LeCun, Y. Deep convolutional networks on graphstructured data. arXiv preprint arXiv:1506.05163, 2015.
 Huang et al. (2017) Huang, G., Liu, Z., Weinberger, K. Q., and van der Maaten, L. Densely connected convolutional networks. IEEE Conference on Computer Vision and Pattern Recognition, 2017.
 Isola et al. (2017) Isola, P., Zhu, J.Y., Zhou, T., and Efros, A. A. Imagetoimage translation with conditional adversarial networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 5967–5976. IEEE, 2017.
 Jégou et al. (2017) Jégou, S., Drozdzal, M., Vazquez, D., Romero, A., and Bengio, Y. The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 11–19, 2017.
 Kalchbrenner et al. (2014) Kalchbrenner, N., Grefenstette, E., and Blunsom, P. A convolutional neural network for modelling sentences. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, volume 1, pp. 655–665, 2014.
 Kipf & Welling (2017) Kipf, T. N. and Welling, M. Semisupervised classification with graph convolutional networks. International Conference on Learning Representations, 2017.
 LeCun et al. (2012) LeCun, Y., Bottou, L., Orr, G. B., and Müller, K.R. Efficient backprop. In Neural networks: Tricks of the trade, pp. 9–48. Springer, 2012.
Long, J., Shelhamer, E., and Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440, 2015.
Monti, F., Boscaini, D., Masci, J., Rodola, E., Svoboda, J., and Bronstein, M. M. Geometric deep learning on graphs and manifolds using mixture model CNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5115–5124, 2017.
Niepert, M., Ahmed, M., and Kutzkov, K. Learning convolutional neural networks for graphs. In International Conference on Machine Learning, pp. 2014–2023, 2016.
Noh, H., Hong, S., and Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1520–1528, 2015.
Perozzi, B., Al-Rfou, R., and Skiena, S. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710. ACM, 2014.
Ronneberger, O., Fischer, P., and Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer, 2015.
Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2009.
Schlichtkrull, M., Kipf, T. N., Bloem, P., van den Berg, R., Titov, I., and Welling, M. Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pp. 593–607. Springer, 2018.
Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B., and Eliassi-Rad, T. Collective classification in network data. AI Magazine, 29(3):93, 2008.
Simonovsky, M. and Komodakis, N. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3693–3702, 2017.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014.
Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. Graph attention networks. In International Conference on Learning Representations, 2017.
Yanardag, P. and Vishwanathan, S. A structural smoothing framework for robust graph comparison. In Advances in Neural Information Processing Systems, pp. 2134–2142, 2015.
Yang, Z., Cohen, W., and Salakhutdinov, R. Revisiting semi-supervised learning with graph embeddings. In International Conference on Machine Learning, pp. 40–48, 2016.
Ying, Z., You, J., Morris, C., Ren, X., Hamilton, W., and Leskovec, J. Hierarchical graph representation learning with differentiable pooling. In Advances in Neural Information Processing Systems, pp. 4800–4810, 2018.
Yu, F. and Koltun, V. Multi-scale context aggregation by dilated convolutions. In Proceedings of the International Conference on Learning Representations, 2016.
Zhang, M., Cui, Z., Neumann, M., and Chen, Y. An end-to-end deep learning architecture for graph classification. In Thirty-Second AAAI Conference on Artificial Intelligence, pp. 4438–4445, 2018.
Zhao, H., Lu, Z., and Poupart, P. Self-adaptive hierarchical sentence model. In Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015a.
Zhao, J. J., Mathieu, M., Goroshin, R., and LeCun, Y. Stacked what-where auto-encoders. CoRR, abs/1506.02351, 2015b.
Zitnik, M. and Leskovec, J. Predicting multicellular function through multi-layer tissue networks. Bioinformatics, 33(14):i190–i198, 2017.