1 Introduction
Graphs have become ubiquitous owing to their ability to model complex systems such as social relationships, biological molecules, and publication citations. The problem of classifying graph-structured data is fundamental in many areas. Moreover, since there is a tremendous amount of unlabeled data in nature and labeling data is often expensive and time-consuming, it is both challenging and crucial to analyze graphs in a semi-supervised manner. For instance, in semi-supervised node classification on citation networks, where nodes denote articles and edges represent citations, the task is to predict the label of every article given only a few labeled ones.
As an efficient and effective approach to graph analysis, network embedding has attracted considerable research interest. It aims to learn low-dimensional representations for nodes while preserving the topological structure and node feature attributes. Much work has been proposed for network embedding that can be used in the node classification task, such as DeepWalk [Perozzi et al., 2014], LINE [Tang et al., 2015], and node2vec [Grover and Leskovec, 2016]. These methods convert the graph structure into sequences by performing random walks on the graph. The first- and second-order structural similarities can then be captured based on co-occurrence rates in these sequences. However, they are unsupervised algorithms that ignore the feature attributes of nodes, and therefore cannot perform node classification in an end-to-end fashion. Unlike previous random-walk-based approaches, employing neural networks on graphs has been studied extensively in recent years. In the graph neural network (GNN) model, both the highly non-linear topological structure and the node attributes are fed into networks to obtain the graph embedding. Using an information diffusion mechanism, GNNs update the states of nodes and propagate them until a stable equilibrium
[Scarselli et al., 2009]. Recently, there has been increasing research interest in applying convolutional operations to graphs. These graph convolutional networks (GCNs) [Kipf and Welling, 2017; Veličković et al., 2018] are based on a neighborhood aggregation scheme that generates node embeddings by combining information from neighborhoods. Compared with conventional methods, GCNs achieve promising performance in various tasks such as node classification and graph classification [Defferrard et al., 2016]. Nevertheless, GCN-based models are usually shallow and lack a "graph pooling" mechanism, which restricts the scale of the receptive field. For example, there are only 2 layers in GCN [Kipf and Welling, 2017]. As each graph convolutional layer acts as an approximation of aggregation over first-order neighbors, the 2-layer GCN model aggregates information only from 2-hop neighborhoods for each node. Because of this restricted receptive field, the model has difficulty obtaining adequate global information. However, it has been observed from reported results [Kipf and Welling, 2017] that simply adding more layers degrades performance. As explained in [Li et al., 2018], each GCN layer essentially acts as a form of Laplacian smoothing, which makes the features of nodes in the same connected component similar. Thereby, adding too many convolutional layers will over-smooth the output features and make them indistinguishable. Meanwhile, deeper neural networks with more parameters are harder to train. Although some recent methods [Chen et al., 2018; Xu et al., 2018; Ying et al., 2018] try to capture global information through deeper models, they are either unsupervised or need many training examples. As a result, they are still not capable of solving the semi-supervised node classification task directly.
To this end, we propose a novel architecture of Hierarchical Graph Convolutional Networks, HGCN for brevity, for node classification on graphs. Inspired by the success of deep architectures and pooling mechanisms in image classification tasks, the HGCN model is able to increase the receptive field of graph convolutions and better capture global information. As illustrated in Figure 1, HGCN mainly consists of several coarsening layers and refining layers. In each coarsening layer, a graph convolutional operation is first conducted to learn node representations. Then, a coarsening operation is performed to aggregate structurally similar nodes into hypernodes, as depicted in Figure 2. After this coarsening operation, each hypernode represents a local structure, which facilitates exploiting global structures on the graph. Following the coarsening layers, we apply symmetric graph refining layers to restore the original graph structure for node classification. Such a hierarchical model manages to comprehensively capture nodes' information from local to global perspectives, leading to better node representations.
The main contributions of this paper are twofold. Firstly, to the best of our knowledge, this is the first work to design a deep hierarchical model for the semi-supervised node classification task. Compared to previous work, the proposed model consists of more layers with larger receptive fields and is able to obtain more global information through the coarsening and refining procedures. Secondly, we conduct extensive experiments on a variety of public datasets and show that the proposed method consistently outperforms other state-of-the-art approaches. In particular, our model gains a considerable improvement over other approaches when only a few labeled samples are provided for each class.
The remainder of this paper is organized as follows. Section 2 reviews related work; Section 3 presents the proposed hierarchical structure-aware graph convolutional networks; Section 4 provides experiments and analysis; Section 5 concludes the paper and points out future directions.
2 Related Work
In this section, we review previous work on graph convolutional networks for semi-supervised node classification, hierarchical representation learning on graphs, and graph reduction algorithms.
Graph convolutional networks for semi-supervised learning.
In the past few years, there has been a surge of interest in applying convolutions to graphs. These approaches are essentially based on the neighborhood aggregation scheme and can be further divided into two branches: spectral approaches and spatial approaches. The spectral approaches rely on spectral graph theory to define parameterized filters. [Bruna et al., 2014] first defines the convolutional operation in the Fourier domain. However, its heavy computational burden limits its application to large-scale graphs. To improve efficiency, [Defferrard et al., 2016] proposes ChebNet, which approximates the spectral filters by means of a Chebyshev expansion of the graph Laplacian. [Kipf and Welling, 2017] further simplifies ChebNet by truncating the Chebyshev polynomial to the first-order neighborhood. DGCN [Zhuang and Ma, 2018] uses random walks to construct a positive pointwise mutual information matrix, then utilizes that matrix along with the graph's adjacency matrix to encode both local consistency and global consistency.
The spatial approaches generate node embeddings by combining neighborhood information in the vertex domain. MoNet [Monti et al., 2017] and SplineCNN [Fey et al., 2018] integrate local signals by designing a universal patch operator. To generalize to unseen nodes in the inductive setting, GraphSAGE [Hamilton et al., 2017] samples a fixed number of neighbors and employs several aggregation functions, such as concatenation, max-pooling, and an LSTM aggregator. GAT [Veličković et al., 2018] introduces the attention mechanism to model the different influences of neighbors with learnable parameters. [Gao et al., 2018] selects a fixed number of neighborhood nodes for each feature and enables the use of regular convolutional operations on Euclidean spaces. However, both branches of GCNs are usually shallow and consequently cannot obtain adequate global information.
Hierarchical representation learning on graphs.
Some work has been proposed for learning hierarchical information on graphs. [Chen et al., 2018; Liang et al.] use a coarsening procedure to construct a coarsened graph of smaller size and then employ unsupervised methods, such as DeepWalk [Perozzi et al., 2014] and node2vec [Grover and Leskovec, 2016], to learn node embeddings on that coarsened graph. They then conduct a refining procedure to obtain the original graph embedding. These two-stage methods are not capable of utilizing node attribute information and cannot conduct the node classification task in an end-to-end fashion either. JK-Nets [Xu et al., 2018] proposes general layer aggregation mechanisms to combine the output representations of every GCN layer. However, it can only propagate information across edges of the graph and is unable to aggregate information hierarchically; therefore, the hierarchical structure of the graph cannot be learned by JK-Nets. To solve this problem, DiffPool [Ying et al., 2018] proposes a differentiable pooling layer that reduces the graph size for graph embedding. As DiffPool is designed for graph classification tasks, it cannot generate an embedding for every node in the graph, hence it cannot be directly applied to node classification scenarios.
Graph reduction. Many approaches have been proposed to reduce the graph size without losing too much information, which facilitates downstream network analysis tasks such as community discovery and data summarization. There are two main classes of methods that reduce the graph size: graph sampling and graph coarsening. The first category is based on graph sampling strategies [Papagelis et al., 2013; Hu and Lau, 2013; Chen et al., 2017], which might lose key information during the sampling process. The second category applies graph coarsening strategies that collapse structurally similar nodes into hypernodes to generate a series of increasingly coarser graphs. The coarsening operation typically consists of two steps, i.e., grouping and collapsing. First, every vertex is assigned to a group in a heuristic manner, where a group refers to a set of nodes that constitute a hypernode. Then, these groups are used to generate a coarser graph. For an unmatched node, [Hendrickson and Leland, 1995] randomly selects one of its unmatched neighbors and merges the two vertices. [Karypis and Kumar, 1998] merges two unmatched nodes by selecting those with the maximum edge weight. [LaSalle and Karypis, 2015] uses a secondary jump during matching. However, these graph reduction approaches are usually used in unsupervised scenarios, such as community detection and graph partitioning. For semi-supervised node classification tasks, existing graph reduction methods cannot be used directly, as they are not capable of learning complex attributive and structural features of graphs. In this paper, HGCN conducts graph reduction analogously to pooling mechanisms on Euclidean data. In this sense, our work bridges graph reduction for unsupervised tasks to the practical but more challenging semi-supervised node classification problem.
3 The Proposed Method
3.1 Preliminaries
3.1.1 Notations and Problem Definition
For the input undirected graph $G = (V, E)$, where $V$ and $E$ are respectively the set of nodes and the set of edges, let $A \in \mathbb{R}^{n \times n}$ be the adjacency matrix and $X \in \mathbb{R}^{n \times d}$ be the node feature matrix. For an HGCN network with $L$ layers, the graph at layer $l$ is represented as $G^{(l)}$ with $n_l$ nodes. The adjacency matrix and hidden representation matrix of $G^{(l)}$ are denoted by $A^{(l)}$ and $H^{(l)}$, respectively. Since the coarsening layers and refining layers are symmetric, $G^{(l)}$ is identical to $G^{(L-l+1)}$. Given the labeled node set $V_L$, where each node $v_i \in V_L$ is associated with a label $y_i$, our objective is to predict the labels of the unlabeled nodes $V \setminus V_L$.
3.1.2 Graph Convolutional Networks
Graph convolutional networks achieve promising generalization in various tasks, and our work is built upon the GCN module. At layer $l$, taking the graph adjacency matrix and the previous hidden representation matrix $H^{(l-1)}$ as input, each GCN module outputs a hidden representation matrix $H^{(l)}$, which is described as:

$$H^{(l)} = \sigma\big(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l-1)} W^{(l)}\big) \qquad (1)$$

where $H^{(0)} = X$, $\sigma$ is a non-linear activation function such as ReLU, $\tilde{A} = A + I$ is the adjacency matrix with self-loops, $\tilde{D}$ with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$ is the degree matrix of $\tilde{A}$, and $W^{(l)}$ is a trainable weight matrix.
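As a concrete illustration, the propagation rule of Eq. (1) can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation; the function name `gcn_layer` and the choice of ReLU as the activation $\sigma$ are our own assumptions:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: ReLU(D^{-1/2} (A+I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops: A~ = A + I
    d = A_hat.sum(axis=1)                     # degree of each node in A~
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D~^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)    # ReLU non-linearity
```

For a two-node graph with a single edge, the normalized propagation matrix is constant 0.5, so identity features and weights yield a uniform output.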
3.2 The Overall Architecture
For an HGCN network of $L$ layers, the $l$-th graph coarsening layer first conducts a graph convolutional operation as formulated in Eq. (1) and then aggregates structurally similar nodes into hypernodes, producing a coarser graph $G^{(l+1)}$ and a node embedding matrix $H^{(l+1)}$ with fewer nodes. The corresponding adjacency matrix $A^{(l+1)}$ and $H^{(l+1)}$ are fed into the $(l+1)$-th layer. Symmetrically, each graph refining layer also performs a graph convolution first and then refines the coarsened graph embedding to restore the finer graph structure. To ease optimization in deeper networks, we add shortcut connections [He et al., 2016] between each coarsened graph and its corresponding refined part.
Since the topological structure of the graph changes between layers, we further introduce a node weight embedding matrix, which transforms the number of nodes contained in each hypernode into real-valued vectors. Besides, we add multiple channels by employing several different GCNs to explore different feature subspaces. The graph coarsening layers and refining layers together integrate different levels of node features and thus avoid over-smoothing during repeated neighborhood aggregation. After the refining process, we obtain a node embedding matrix in which each row represents a node representation vector. To classify each node, we apply an additional GCN module followed by a softmax layer on this matrix.
3.3 The Graph Coarsening Layer
Every graph coarsening layer consists of two steps: graph convolution and graph coarsening. A GCN module is first used to extract structural and attributive features by aggregating neighborhood information as described in Eq. (1). For the graph coarsening procedure, we design the following two hybrid grouping strategies to assign structurally similar nodes to a hypernode in the coarser graph.
Structural equivalence grouping (SEG). If two nodes share the same set of neighbors, they are considered structurally equivalent, and we assign them to one hypernode. For example, as illustrated in Figure 2, three structurally equivalent nodes are allocated to a single hypernode. We mark all such structurally equivalent nodes and leave the other nodes unmarked.
Structural similarity grouping (SSG). Then, we calculate the structural similarity between an unmarked node pair $(v_i, v_j)$ as the normalized connection strength $S(v_i, v_j)$:

$$S(v_i, v_j) = \frac{A_{ij}}{\sqrt{w(v_i) \cdot w(v_j)}} \qquad (2)$$

where $A_{ij}$ is the edge weight between $v_i$ and $v_j$, and $w(\cdot)$ is the node weight.
We iteratively take out an unmarked node and calculate similarity scores with all of its unmarked neighbors. We then select the neighbor with the largest structural similarity to form a new hypernode and mark both nodes. In particular, if a node is left unmarked while all of its neighbors are marked, it is marked as well and constitutes a hypernode by itself. For example, in Figure 2, the node pair with the largest structural similarity score is assigned as a group; afterwards, the single remaining unmarked node constitutes a hypernode by itself.
Please note that if we take out unmarked nodes in a different order, the resulting coarsened graph will be different. As a node with fewer neighbors has less chance of being grouped, we give such nodes higher priority. Therefore, in this paper, we take out unmarked nodes in ascending order of their number of neighbors.
Using the above two grouping strategies, we acquire all the hypernodes. For a hypernode, its node weight is defined as the sum of the weights of the nodes it contains. Its edge weight to a neighbor is calculated as the sum of the edge weights between that neighbor and the nodes contained in the hypernode. The updated node weights and edge weights are used in Eq. (2) in the next coarsening layer.
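The two grouping strategies (SEG followed by SSG) can be sketched as follows. This is a minimal NumPy sketch under our own simplifying assumptions (a dense adjacency matrix and a single pass over nodes in ascending-degree order); the function name `hybrid_grouping` is hypothetical:

```python
import numpy as np

def hybrid_grouping(A, w):
    """Assign each node a group id (hypernode) via SEG, then SSG.
    A: weighted adjacency matrix; w: node weights. Returns an int array."""
    n = A.shape[0]
    group = -np.ones(n, dtype=int)   # -1 means "unmarked"
    next_gid = 0
    # --- SEG: nodes sharing an identical neighbor set collapse together ---
    neigh = {i: frozenset(np.nonzero(A[i])[0]) for i in range(n)}
    buckets = {}
    for i in range(n):
        buckets.setdefault(neigh[i], []).append(i)
    for nodes in buckets.values():
        if len(nodes) > 1:
            for i in nodes:
                group[i] = next_gid
            next_gid += 1
    # --- SSG: pair each remaining node with its most similar unmarked
    # neighbor, visiting nodes in ascending order of degree ---
    order = sorted((i for i in range(n) if group[i] < 0),
                   key=lambda i: len(neigh[i]))
    for i in order:
        if group[i] >= 0:
            continue
        cands = [j for j in neigh[i] if group[j] < 0]
        if cands:  # normalized connection strength A_ij / sqrt(w_i * w_j)
            j = max(cands, key=lambda j: A[i, j] / np.sqrt(w[i] * w[j]))
            group[i] = group[j] = next_gid
        else:      # all neighbors already marked: a hypernode by itself
            group[i] = next_gid
        next_gid += 1
    return group
```

On a star graph, for instance, the leaves share the neighbor set consisting of the center alone, so SEG collapses them into one hypernode, while the center forms a hypernode by itself.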
To help restore the coarsened graph to the original graph, we preserve the collapsing relationship between nodes and their corresponding hypernodes in a grouping matrix $M^{(l)} \in \{0, 1\}^{n_l \times n_{l+1}}$, which can further help restore the graph back to the finer one. Formally, at layer $l$, we construct $M^{(l)}$, whose element $M^{(l)}_{ij}$ is calculated as:

$$M^{(l)}_{ij} = \begin{cases} 1, & \text{if node } v_i \in G^{(l)} \text{ is assigned to hypernode } v_j \in G^{(l+1)}, \\ 0, & \text{otherwise}. \end{cases} \qquad (3)$$

An example of the grouping matrix is given in Figure 2. Please note that a node which constitutes a hypernode by itself still has exactly one entry of 1 in its corresponding row. Next, the hidden node embedding matrix $H^{(l+1)}$ is determined as:

$$H^{(l+1)} = \big(M^{(l)}\big)^{\top} H^{(l)} \qquad (4)$$

In the end, we generate a collapsed graph $G^{(l+1)}$, whose adjacency matrix $A^{(l+1)}$ can be calculated by:

$$A^{(l+1)} = \big(M^{(l)}\big)^{\top} A^{(l)} M^{(l)} \qquad (5)$$
The coarser graph $G^{(l+1)}$ along with the current representation matrix $H^{(l+1)}$ is fed into the next layer as input. The node embeddings generated in each successive coarsening layer are thus of lower resolution. The graph coarsening procedure is summarized in Algorithm 1.
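Given the group assignments, the grouping matrix of Eq. (3) and the pooled quantities of Eqs. (4) and (5) reduce to two matrix products. A minimal sketch, with the hypothetical helper name `coarsen`:

```python
import numpy as np

def coarsen(A, H, group):
    """Build the grouping matrix M from integer group ids, then pool:
    A_coarse = M^T A M,  H_coarse = M^T H  (Eqs. (3)-(5))."""
    n, m = len(group), group.max() + 1
    M = np.zeros((n, m))
    M[np.arange(n), group] = 1.0   # M_ij = 1 iff node i collapses into hypernode j
    return M.T @ A @ M, M.T @ H, M
```

Summing rows of A and H through M is what makes the hypernode's edge weight and representation the aggregate of its members, matching the weight-update rule described above.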
3.4 The Graph Refining Layer
To restore the original topological structure of the graph and further facilitate node classification, we stack the same number of graph refining layers as coarsening layers. Like the coarsening procedure, each refining layer contains two steps, namely generating node embedding vectors and restoring node representations.
To learn a hierarchical representation of nodes, a GCN is employed first. Since we have saved the collapsing relationship in the grouping matrix $M^{(l)}$ during the coarsening process, we utilize it to restore the refined node representation matrix at the resolution of layer $l$. We further employ residual connections between the two corresponding coarsening and refining layers. In summary, node representations are computed by:

$$\hat{H}^{(l)} = M^{(l)} \hat{H}^{(l+1)} + H^{(l)} \qquad (6)$$

where $\hat{H}^{(l)}$ denotes the refined representation at the resolution of $G^{(l)}$, $M^{(l)}$ is the grouping matrix saved when coarsening $G^{(l)}$, and $H^{(l)}$ is the representation from the corresponding coarsening layer carried by the shortcut connection.
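The restore-plus-shortcut step described above amounts to an upsampling through the grouping matrix followed by an addition; the graph convolution that precedes it in each refining layer is omitted here. A minimal sketch with the hypothetical name `refine`:

```python
import numpy as np

def refine(M, H_coarse, H_skip):
    """Copy each hypernode's embedding back to its member nodes via the
    grouping matrix M, then add the shortcut from the symmetric
    coarsening layer."""
    return M @ H_coarse + H_skip   # every member node inherits its hypernode's row
```

Because each row of M has exactly one nonzero entry, M @ H_coarse simply broadcasts a hypernode's vector to all of its member nodes.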
3.5 Node Weight Embedding and Multiple Channels
Since different hypernodes may carry different numbers of nodes, as depicted in Figure 2, we assume such node weights reflect the hierarchical characteristics of the coarsened graphs. We transform each node weight into a real-valued vector by looking up a randomly initialized node weight embedding matrix, whose rows cover a fixed-size set of possible weights. We apply node weight embedding in every coarsening and refining layer. For example, for graph $G^{(l)}$, we obtain its representation $H^{(l)}$ and its node weight embedding matrix $E^{(l)}$. We then concatenate $H^{(l)}$ and $E^{(l)}$, and the resulting matrix is fed into the next layer.
Multi-head mechanisms help explore features in different subspaces, and HGCN employs multiple channels of GCNs to jointly obtain rich information at each layer. After obtaining $C$ channel outputs $\tilde{H}^{(l)}_1, \dots, \tilde{H}^{(l)}_C$, we perform a weighted average over these feature maps:

$$H^{(l)} = \sum_{c=1}^{C} \alpha_c \tilde{H}^{(l)}_c \qquad (7)$$

where $\alpha_c$ is the weight of channel $c$.
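The channel combination can be sketched as follows. Normalizing the channel weights to sum to one is our own assumption; in the full model the weights would be learned jointly with the network:

```python
import numpy as np

def combine_channels(H_list, alpha):
    """Weighted average over C channel outputs (a sketch of the
    combination step); alpha holds one weight per channel."""
    a = np.asarray(alpha, dtype=float) / np.sum(alpha)   # normalize weights
    return sum(ai * Hi for ai, Hi in zip(a, H_list))
```

With equal weights this reduces to a plain mean over the channel feature maps.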
3.6 The Output Layer
Finally, in the output layer, we use a GCN with a softmax layer on the refined node embedding matrix $H^{(L)}$ to output the class probabilities of nodes:

$$Z = \mathrm{softmax}\big(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(L)} W\big) \qquad (8)$$

where $W$ is a trainable weight matrix and $Z_{ic}$ denotes the probability that node $v_i$ belongs to class $c$.
The loss function is defined as the cross-entropy of predictions over the labeled nodes:

$$\mathcal{L} = -\sum_{v_i \in V_L} \sum_{c \in \mathcal{Y}} \mathbb{1}[y_i = c] \ln Z_{ic} \qquad (9)$$

where $\mathbb{1}[\cdot]$ is the indicator function, $y_i$ is the true label of labeled node $v_i$, $\mathcal{Y}$ is the label set, and $Z_{ic}$ is the predicted probability that $v_i$ belongs to class $c$.
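The cross-entropy restricted to labeled nodes can be sketched as follows. Taking the mean rather than the sum over labeled nodes, and the small constant added for numerical stability, are our own choices:

```python
import numpy as np

def masked_cross_entropy(Z, y, labeled_idx):
    """Cross-entropy over labeled nodes only.
    Z: (n, C) predicted class probabilities; y: integer labels;
    labeled_idx: indices of the labeled node set."""
    picked = Z[labeled_idx, y[labeled_idx]]        # probability of the true class
    return -np.mean(np.log(picked + 1e-12))        # mean instead of sum (our choice)
```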
4 Experiments and Analysis
In this section, we summarize the experimental setting, present the results, analyze the impact on available labeled data, and conduct parameter sensitivity analysis.
4.1 Experimental Settings
4.1.1 Datasets
For a comprehensive comparison with state-of-the-art methods, we use four widely used datasets, including 3 citation networks and 1 knowledge graph. The statistics of these datasets are summarized in Table 1. We set the node weights and edge weights of the graph to 1 for all four datasets. The dataset configuration follows the same setting as [Yang et al., 2016; Kipf and Welling, 2017] for fair comparison. For citation networks, documents and citations are treated as vertices and edges, respectively. For the knowledge graph, each triplet $(e_1, r, e_2)$ is converted into three nodes and two undirected edges $(e_1, r)$ and $(r, e_2)$, where $e_1$ and $e_2$ are entities and $r$ is the relation between them. During training, only 20 labels per class are used for each citation network and only 1 label per class is used for NELL. Besides, 500 nodes in each dataset are selected randomly as the validation set; we do not use the validation set for model training.

Dataset  Cora  Citeseer  Pubmed  NELL
Type  Citation network  Knowledge graph  
# Vertices  2,708  3,327  19,717  65,755 
# Edges  5,429  4,732  44,338  266,144 
# Classes  7  6  3  210 
# Features  1,433  3,703  500  5,414 
Labeling rate  0.052  0.036  0.003  0.003 
4.1.2 Baseline Algorithms
To evaluate the performance of HGCN, we compare our method with the following representative methods:

DeepWalk [Perozzi et al., 2014] generates node embeddings via random walks in an unsupervised manner; nodes are then classified by feeding the embedding vectors into an SVM classifier.

Planetoid [Yang et al., 2016] not only learns node embeddings but also predicts the context of nodes in the graph. It also leverages label information to build both transductive and inductive formulations.

GCN [Kipf and Welling, 2017] produces node embedding vectors by truncating the Chebyshev polynomial to the first-order neighborhood.

GAT [Veličković et al., 2018] generates node embedding vectors by modeling the different influences of a node's one-hop neighbors.

DGCN [Zhuang and Ma, 2018] utilizes the graph adjacency matrix and the positive pointwise mutual information matrix to encode both local consistency and global consistency.
4.1.3 Parameter Setting
We train our model using the Adam optimizer with learning rate of for epochs. The dropout is applied on all feature vectors with rates of . Besides, the regularization factor is set to . Considering different scales of datasets, we set the total number of layers to for citation networks and for the knowledge graph, and apply channel GCNs in both coarsening and refining layers.
4.2 Node Classification Results
To demonstrate the overall performance on semi-supervised node classification, we compare the proposed method with other state-of-the-art methods. The performance in terms of accuracy is shown in Table 2, with the best result in each column highlighted in boldface. The performance of our proposed method is reported as the average of 20 runs. Note that running GAT on the NELL dataset requires more than 64 GB of memory, hence its performance on NELL is not reported.
Method  Cora  Citeseer  Pubmed  NELL 
DeepWalk  67.2%  43.2%  65.3%  58.1% 
Planetoid  75.7%  64.7%  77.2%  61.9% 
GCN  81.5%  70.3%  79.0%  73.0% 
GAT  83.0 ± 0.7%  72.5 ± 0.7%  79.0 ± 0.3%  – 
DGCN  83.5%  72.6%  79.3%  74.2% 
HGCN  84.5 ± 0.5%  72.8 ± 0.5%  79.8 ± 0.4%  80.1 ± 0.4% 
The results show that the proposed method consistently outperforms the other state-of-the-art methods, which verifies the effectiveness of the proposed coarsening and refining mechanisms. The performance of traditional random-walk-based algorithms such as DeepWalk and Planetoid is relatively poor. DeepWalk cannot model attribute information, which heavily restricts its performance. Though Planetoid combines supervised information with an unsupervised loss, there is still information loss during random sampling. To avoid that problem, GCN and GAT employ the neighborhood aggregation scheme to boost performance. GAT outperforms GCN as it can model different relations to different neighbors rather than using a predefined order. DGCN further jointly models both local and global consistency, yet its global consistency is still attained through random walks. In contrast, the proposed HGCN manages to capture global information through different levels of convolutional layers and achieves the best results on all four datasets. Notably, compared with the citation networks, HGCN surpasses the other baselines by larger margins on the NELL dataset. This is probably because there are fewer training samples per class on NELL than on the citation networks; under such circumstances, training nodes are further away from testing nodes on average. As a result, the proposed HGCN, with its larger receptive field and deeper layers, shows more pronounced improvements over the baselines.
4.3 Impact of Varying Number of Training Data
We suppose that the larger receptive field of the convolutional model promotes the propagation of features and labels on graphs. To verify that the proposed HGCN obtains a larger receptive field, we reduce the number of training samples and check whether HGCN still performs well when limited labeled data is given. Since unlabeled data is abundant in practice, it is also of great significance to train the model with limited labeled data. In this section we conduct experiments with different numbers of labeled instances on the Pubmed dataset. We vary the number of labeled vertices from 20 to 5 per class, where the labeled data is randomly chosen from the original training set. All parameters are the same as previously described. The corresponding performance in terms of accuracy is reported in Table 3.
Method  5  10  15  20 
GCN  69.0%  72.2%  76.9%  79.0% 
GAT  70.3%  75.4%  77.3%  79.0% 
DGCN  70.1%  76.7%  77.4%  79.3% 
HGCN  76.5%  78.6%  79.3%  79.8% 
From the table, it can be observed that our method outperforms the other baselines in all cases. As the number of labeled nodes decreases, our method obtains a larger margin over these baseline algorithms. Especially when the number of labeled nodes per class is only 5 (a 0.08% labeling rate), the accuracy of HGCN exceeds GCN, DGCN, and GAT by 7.5%, 6.4%, and 6.2%, respectively. When the amount of training data decreases, it is more likely that nodes used for testing are further away from the training nodes. Only when the receptive field is large enough can information from those training nodes be captured. As the receptive field of GCN and GAT does not exceed 2-hop neighborhoods, these baselines degrade considerably. However, owing to its larger receptive field, the performance of HGCN declines only slightly even when labeled data decreases dramatically. Overall, it is verified that the proposed HGCN is well-suited to cases where training data is extremely scarce.
4.4 Ablation Study
In order to verify the effectiveness of the proposed coarsening and refining layers, we conduct ablation study on coarsening and refining layers and node weight embeddings respectively in this section.
4.4.1 Coarsening and Refining Layers
We remove all coarsening and refining operations of HGCN and compare its performance with the original HGCN. The results are shown in Table 4. From the results, it is obvious that the proposed HGCN has better performance compared with HGCN with no coarsening mechanisms on all datasets. It can be verified that the coarsening and refining mechanisms contribute to the performance improvements, since they can obtain global information with larger receptive fields.
Method  Cora  Citeseer  Pubmed  NELL
HGCN w/o coarsening and refining  80.3%  70.5%  76.8%  75.9%
HGCN  84.5%  72.8%  79.8%  80.1%
4.4.2 Node Weight Embeddings
To study the impact of node weight embeddings, we compare HGCN with a variant in which no node weight embeddings are used. It can be seen from Table 5 that the model with node weight embeddings performs better, which verifies the necessity of adding this embedding vector to the node embeddings.
Method  Cora  Citeseer  Pubmed  NELL
HGCN w/o node weight embeddings  84.2%  72.4%  79.5%  79.6%
HGCN  84.5%  72.8%  79.8%  80.1%
4.5 Sensitivity Analysis
Last, we conduct parameter sensitivity analysis. Specifically, we investigate how different numbers of coarsening layers and different numbers of channels affect the results. The performance is reported in terms of accuracy on all four datasets. While the parameter under study is varied, the other hyperparameters remain the same.
4.5.1 Effects of Coarsening Layers
Since the coarsening layers in our model control the granularity of receptive field enlargement, we conduct experiments with 1 to 8 coarsening layers and symmetric refining layers; the results are shown in Figure 3(a). HGCN performs best with 4 coarsening layers on the citation networks and 5 on the knowledge graph. We suspect that, since fewer labeled nodes are supplied on NELL than on the other datasets, deeper layers and larger receptive fields are needed. However, when too many coarsening layers are added, the performance drops due to overfitting.
4.5.2 Effects of Channel Numbers
Next, we investigate the impact of the number of channels on performance. Multiple channels benefit the graph convolutional network model, since they explore different feature subspaces; the results are shown in Figure 3(b). From the figure, it can be found that performance improves as the number of channels increases up to 4 channels, which demonstrates that more channels are helpful for capturing accurate node features. Nevertheless, too many channels inevitably introduce redundant parameters to the model, again leading to overfitting.
5 Conclusions
In this paper, we propose novel hierarchical graph convolutional networks for the semi-supervised node classification task. The proposed HGCN consists of coarsening layers and symmetric refining layers. By collapsing structurally similar nodes into hypernodes, our model obtains a larger receptive field and enables sufficient information propagation. Compared with previous work, the proposed HGCN is deeper and can fully utilize both local and global information. Comprehensive experiments confirm that the proposed method consistently outperforms other state-of-the-art methods. In particular, our method achieves substantial gains over them when labeled data is extremely scarce.
The study of semi-supervised learning over networks in general remains widely open, with various challenges and applications in diverse areas. In future work, we plan to investigate the following two directions. On the one hand, we want to apply the proposed HGCN to other node classification scenarios, especially in heterogeneous networks. On the other hand, we will explore more efficient convolutional filters for better performance.
References
 [Bruna et al.2014] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral Networks and Locally Connected Networks on Graphs. In ICLR, 2014.
 [Chen et al.2017] Haibo Chen, Jianfei Zhao, Xiaoji Chen, Ding Xiao, and Chuan Shi. Visual analysis of large heterogeneous network through interactive centrality based sampling. In ICNSC, pages 378–383, May 2017.
 [Chen et al.2018] Haochen Chen, Bryan Perozzi, Yifan Hu, and Steven Skiena. Harp: Hierarchical representation learning for networks. In AAAI, 2018.
 [Defferrard et al.2016] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS, pages 3844–3852, 2016.

 [Fey et al.2018] Matthias Fey, Jan Eric Lenssen, Frank Weichert, and Heinrich Müller. SplineCNN: Fast geometric deep learning with continuous B-spline kernels. In CVPR, 2018.
 [Gao et al.2018] Hongyang Gao, Zhengyang Wang, and Shuiwang Ji. Large-scale learnable graph convolutional networks. In KDD, pages 1416–1424, 2018.
 [Grover and Leskovec2016] Aditya Grover and Jure Leskovec. Node2vec: Scalable feature learning for networks. In KDD, pages 855–864, 2016.
 [Hamilton et al.2017] William L. Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NIPS, pages 1025–1035, 2017.
 [He et al.2016] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
 [Hendrickson and Leland1995] Bruce Hendrickson and Robert Leland. A multilevel algorithm for partitioning graphs. In Supercomputing, 1995.
 [Hu and Lau2013] Pili Hu and Wing Cheong Lau. A survey and taxonomy of graph sampling. CoRR, abs/1308.5865, 2013.
 [Karypis and Kumar1998] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359–392, 1998.
 [Kipf and Welling2017] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
 [LaSalle and Karypis2015] Dominique LaSalle and George Karypis. Multithreaded modularity based graph clustering using the multilevel paradigm. J. Parallel Distrib. Comput., 76(C):66–80, February 2015.
 [Li et al.2018] Qimai Li, Zhichao Han, and Xiao-Ming Wu. Deeper insights into graph convolutional networks for semi-supervised learning. In AAAI, pages 3538–3545, 2018.
 [Liang et al.] J. Liang, S. Gurukar, and S. Parthasarathy. MILE: A multi-level framework for scalable graph embedding. arXiv e-prints.
 [Monti et al.2017] F. Monti, D. Boscaini, J. Masci, E. Rodolà, J. Svoboda, and M. M. Bronstein. Geometric deep learning on graphs and manifolds using mixture model cnns. In CVPR, pages 5425–5434, 2017.
 [Papagelis et al.2013] M. Papagelis, G. Das, and N. Koudas. Sampling online social networks. TKDE, 25(3):662–676, March 2013.
 [Perozzi et al.2014] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In KDD, pages 701–710, 2014.
 [Scarselli et al.2009] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The Graph Neural Network Model. TNN, 20(1):61–80, 2009.
 [Tang et al.2015] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. Line: Large-scale information network embedding. In WWW, pages 1067–1077, 2015.
 [Veličković et al.2018] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In ICLR, 2018.
 [Xu et al.2018] Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, and Stefanie Jegelka. Representation learning on graphs with jumping knowledge networks. In ICML, pages 5453–5462, 2018.
 [Yang et al.2016] Zhilin Yang, William Cohen, and Ruslan Salakhudinov. Revisiting semisupervised learning with graph embeddings. In ICML, pages 40–48, 2016.
 [Ying et al.2018] Rex Ying, Jiaxuan You, Christopher Morris, Xiang Ren, William L. Hamilton, and Jure Leskovec. Hierarchical graph representation learning with differentiable pooling. arXiv e-prints, abs/1806.08804, 2018.
 [Zhuang and Ma2018] Chenyi Zhuang and Qiang Ma. Dual graph convolutional networks for graph-based semi-supervised classification. In WWW, pages 499–508, 2018.