Learning Graph Embedding with Limited Labeled Data: An Efficient Sampling Approach

Semi-supervised graph embedding methods, represented by the graph convolutional network, have become among the most popular approaches for applying deep learning to graph-based data. Most existing work focuses on designing novel model structures to improve performance, but ignores a common training problem: can these methods achieve the same performance with limited labelled data? To address this research gap, we propose a sampling-based training framework for semi-supervised graph embedding methods that achieves better performance with a smaller training set. The key idea is to integrate sampling theory and embedding methods in a pipeline, which has the following advantages: 1) the sampled training data preserve graph characteristics more accurately than uniformly chosen data, which reduces model deviation; 2) the smaller training set reduces the human cost of labelling. Extensive experiments show that the sampling-based method achieves the same performance with only 10%-50% of the original training data, verifying that the framework can extend existing semi-supervised methods to scenarios with an extremely small amount of labelled data.





1 Introduction

Graphs are a natural way to represent and organize data with complicated relationships. However, graph data is hard to process directly with machine learning methods, especially deep learning [11], which has achieved brilliant results in various fields. Learning a useful graph representation lies at the heart of many deep learning-based graph mining applications, such as node classification, link prediction, and community detection.

It is now widely adopted to embed structured data into vectors for well-developed deep learning methods. Recently, semi-supervised methods represented by the graph convolutional network have become a hot topic in the graph embedding area, and many outstanding works have been proposed¹.

¹ We use the graph convolutional network as the representative of semi-supervised methods throughout the paper.

Kipf and Welling [9] came up with GCN, the variant widely used today, which formally brought the field of graphs into the neural networks' era. Since then, plenty of works like GraphSAGE [7] and Graph Attention Networks [16] have been proposed.

(a) Uniformly sampled
(b) Random walk sampled
Figure 1: Comparison between uniformly sampled training nodes and random walk-based sampled training nodes at a limited sampling ratio (red nodes).

However, there are two key challenges in applying these semi-supervised methods to specific fields: 1) extreme insufficiency of labeled data; 2) out-of-distribution prediction, when the distribution of the training set differs greatly from that of the test set. Works like GCN [9] and GAT [16] tend to hand-pick the training set in order to maintain distribution similarity. On the other hand, as this work was proceeding, some researchers sought to overcome these challenges by utilizing transfer learning [8, 19]. But these 'pre-train' methods need extensive domain knowledge and fairly long training time.

To address the research gap, we propose a sampling-based training framework for graph convolutional network methods in this paper, which is more scalable and needs no domain knowledge. We integrate a random walk-based sampling strategy into the graph convolutional network training process. In this way, our framework uses the sampling algorithm to find the most representative nodes of a graph, an approach that has proven successful in graph measurement and graph structure estimation. The comparison between random walk-based sampled nodes and uniformly chosen nodes at a limited ratio is shown in Fig. 1. When the sampling scale is small, the 'Law of Large Numbers' fails, and nodes chosen uniformly are more likely to lie in a small sub-graph, which loses much information during training. Several works [7, 3] also utilize the idea of sampling; the difference between ours and theirs is clarified in Section 2.

Our framework can significantly improve the performance of existing graph convolutional network methods, with the following advantages: 1) the random walk-based sampled training data preserve graph characteristics more accurately than uniformly chosen data, which reduces model deviation; 2) the well-sampled training nodes can estimate the parameters of graph convolutional network models effectively; 3) the smaller amount of training data to be labelled lets existing models work with limited labelled data and saves substantial human effort. To demonstrate the effectiveness of the sampling method in training graph convolutional networks, we combine it with state-of-the-art GCN-based methods and evaluate its performance on challenging multi-label node classification problems with limited labelled training data. The results show that each combined method outperforms its original counterpart with the same amount of training data, or achieves the same accuracy with less labelled data.

The contribution of this paper is summarized as follows:

  • To the best of our knowledge, this is the first work to utilize a sampling strategy as a preprocessing stage to improve the performance of graph convolutional network algorithms. It can reduce the amount of labelled data and lower the human labelling cost significantly without changing the original methods.

  • We develop a general framework by integrating a random walk sampling strategy with graph convolutional network methods, which helps them obtain better results and extends them to applications with an extremely small amount of labelled data.

  • We verify the validity of our idea by evaluating our framework on different real-world networks. The case study of multi-label node classification shows that our framework makes GCN-based methods outperform their original versions.

This work is organized as follows. In Section 2, we formulate the problem definition. In Section 3, we detail the sampling-based training framework for graph convolutional network-related methods. We demonstrate our experiment settings and results in Section 4. Finally, we close with a discussion of related work in Section 5 and conclusions in Section 6.

2 Problem Definition

In this paper, we try to solve the problem that most state-of-the-art graph convolutional network methods need a large number of labelled nodes as the training set.

We address it with the sampling-based training framework introduced in Section 1: we only need to label the tiny set of nodes we sample out, and can train the graph convolutional network model to the same level of performance as usual.

Figure 2: Overview of our sampling-based training framework

The most related works are FastGCN [3] and GraphSAGE [7]. These two works also utilize sampling (we abbreviate it as F-sampling, for feature sampling) to improve performance. But it is necessary to clarify that their sampling strategies and ours are not the same concept. Specifically, both FastGCN and GraphSAGE focus on downsizing the data stream passing from layer to layer in the graph convolutional network during training, changing either the network structure or the neighbours' information.

We, differently, use a sampling strategy (we abbreviate it as N-sampling, for training node sampling) as a pre-processing step and propose a general framework, which can easily be extended to almost every kind of node representation method without knowledge of the inner method's specific workflow.

In short, the F-sampling strategy selects features for the training nodes by changing the workflow of the original models, while N-sampling selects more representative training nodes without changing anything in the original model.
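The N-sampling idea can be sketched as a small pipeline in which the sampler and the trainer are fully decoupled; `sample_fn` and `train_fn` below are hypothetical stand-ins for a sampling strategy and an unmodified GCN-style training routine, not functions from the paper's code.

```python
import random

def n_sampling_pipeline(graph, budget, sample_fn, train_fn):
    """N-sampling as a pre-processing stage: pick training nodes first,
    then hand them to an unchanged training routine."""
    train_nodes = sample_fn(graph, budget)   # stage 1: node sampling
    model = train_fn(train_nodes)            # stage 2: unmodified training
    return model                             # stage 3: use the model as usual

# Toy illustration with stand-in sampling and training functions.
toy_graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
uniform = lambda g, b: random.sample(list(g), b)
train = lambda nodes: {"trained_on": sorted(nodes)}

model = n_sampling_pipeline(toy_graph, 2, uniform, train)
```

Because only the node list crosses the stage boundary, any sampler can be swapped in without touching the model.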

3 Methodology

We seek to build such a sampling-based framework for training graph convolutional networks. Our framework can be generalized into a three-stage algorithm. First, we perform a sampling process on the graph we are going to deal with:

$(V_s, X_s, Y_s) = f(G, B)$    (1)

The function $f$ is the sampling strategy; $V_s$, $X_s$, and $Y_s$ represent the sampled nodes, the corresponding feature matrix, and the label vectors, respectively, while $B$ denotes the expected sampling budget. The original training process is carried out after the training set has been sampled by Equation 1.

Generally, we demonstrate the training strategy by the generalized Equation 2:

$\mathcal{M} = F(V_s, X_s, Y_s)$    (2)

The function $F$ represents the training process based on the sampled training dataset, and $\mathcal{M}$ the resulting model. After the model has been well trained, it functions as usual.

Our framework integrates these three stages in a pipeline, as shown in Fig. 2. The framework samples a small amount of data as the training set, with sub-linear time complexity. Moreover, because the sampling and training stages are independent, the framework can easily be applied to different models and datasets.

3.1 Sampling Strategy

The random way of choosing the training dataset can be seen as 'uniformly random sampling' (a.k.a. UR), which selects each node with probability $1/N$, where $N$ is the number of nodes. Because of its randomness, the generalization performance of the resulting networks varies a lot.

In order to overcome the disadvantage of UR, we consider several commonly used sampling methods: Depth First Sampling (DFS), Breadth First Sampling (BFS), and Random Walk.

DFS and BFS are common ways to explore graph structure, but both have inherent defects. If we set the sampling budget to a small number, DFS may sample a long path through the graph, losing the information along its way deep into the graph; BFS, on the contrary, may sample nodes only within a small part of the whole graph.

A regular random walk (RW) is one of the popular methods to explore a network structure by collecting a series of nodes or edges. It starts from a root vertex $v_0$, pushes it into the traversed node list $L$, and chooses its next hop uniformly from the neighbours of the current node $v$: each neighbour is selected with probability $1/d_v$, where $d_v$ is the degree of node $v$. Repeating this moving strategy for $B$ steps, we obtain the whole traversed node list. This is the most common type of RW.
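The regular RW above can be sketched in a few lines; the adjacency-list representation and the `seed` parameter are illustrative choices, not part of the paper's implementation.

```python
import random

def random_walk_sample(adj, start, budget, seed=None):
    """Regular random walk: from the current node, move to a uniformly
    chosen neighbour and record every traversed node, for `budget` steps."""
    rng = random.Random(seed)
    walk = [start]
    current = start
    for _ in range(budget):
        neighbours = adj[current]
        if not neighbours:      # isolated node: the walk cannot continue
            break
        current = rng.choice(neighbours)
        walk.append(current)
    return walk

# Toy path graph 0-1-2-3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
walk = random_walk_sample(adj, 0, 10, seed=42)
```

Each consecutive pair in `walk` is an edge of the graph, so the sample respects the graph topology by construction.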

RW can be performed at low time cost, and it is an exploration and exploitation tradeoff, which overcomes the weaknesses of BFS and DFS.

But this type of regular random walk still suffers from several flaws. It places relatively strong demands on the structure of the graph: a necessary condition for a regular RW to reach a stationary distribution is that the graph be symmetric, connected, and non-bipartite. When the graph is not connected, a regular random walker only explores the sub-graph where it starts. This severely damages the result of training a graph network, since the model can never learn the structure or the information of the parts disconnected from the starting component. More severely, if the start vertex is isolated ($d_{v_0} = 0$), the walk cannot proceed at all.

Even when the graph is connected, if it is only weakly connected, a regular random walker can still get temporarily 'trapped' inside a strongly connected sub-graph [15]. It can take a long time to escape, which contradicts the low time complexity we seek.
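The disconnection flaw is easy to demonstrate on a toy graph of two disjoint triangles (a hypothetical example, not a dataset from the paper): no matter how long the walk runs, a walker started in one component never observes the other.

```python
import random

def random_walk_nodes(adj, start, steps, seed=0):
    """Return the set of nodes visited by a regular random walk."""
    rng = random.Random(seed)
    visited = {start}
    cur = start
    for _ in range(steps):
        cur = rng.choice(adj[cur])  # uniform next hop among neighbours
        visited.add(cur)
    return visited

# Two disconnected triangles: {0,1,2} and {3,4,5}.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1],
       3: [4, 5], 4: [3, 5], 5: [3, 4]}
seen = random_walk_nodes(adj, 0, 100)
```

A training set drawn this way would carry no information at all about the unvisited component.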

To overcome these drawbacks, we utilize a technique called 'Frontier Sampling' [14], an advanced sampling strategy based on random walks. It runs $M$ dependent walkers at the same time, which share the candidate list.

Walkers² in 'Frontier Sampling' are less likely to get stuck in a loosely connected part of the graph, and this kind of method can easily be parallelized. We also carry out a comparison experiment on the sampling methods mentioned above in Section 4.5 to verify this analysis.

² 'Walker' and 'Dimension' have the same definition in 'Frontier Sampling', so we do not distinguish one from the other.
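A minimal sketch of Frontier Sampling follows, assuming adjacency-list input and the standard rule that the chosen walker always advances (Algorithm 1 instead re-draws when the stepped-to node was already sampled; both variants share the degree-proportional walker selection). All names here are illustrative.

```python
import random

def frontier_sample(adj, budget, m, seed=1):
    """Frontier Sampling sketch: m walkers share one frontier list.
    Each step picks a walker position u with probability proportional
    to its degree, moves it to a uniform neighbour w, and records w."""
    rng = random.Random(seed)
    frontier = rng.sample(list(adj), m)      # m random starting positions
    sampled = []
    while len(sampled) < budget:
        weights = [len(adj[u]) for u in frontier]
        i = rng.choices(range(m), weights=weights, k=1)[0]
        w = rng.choice(adj[frontier[i]])     # uniform neighbour of u
        if w not in sampled:
            sampled.append(w)
        frontier[i] = w                      # walker i moves to w
    return sampled

# Toy 6-cycle graph.
adj = {0: [1, 5], 1: [0, 2], 2: [1, 3],
       3: [2, 4], 4: [3, 5], 5: [4, 0]}
nodes = frontier_sample(adj, 4, 2)
```

Because walkers are chosen in proportion to their current degree, the shared frontier behaves like one walk whose state spreads across the graph, which is what makes escaping loosely connected regions easier.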

Input: input graph $G=(V,E)$, test vertexes $V_t$, corresponding test features $X_t$, GCN-based algorithm $A$
Parameter: sampling budget $B$, number of walkers $M$
Output: task-oriented model output

1:  Initialize the frontier list $L$ with $M$ uniformly randomly chosen nodes
2:  Initialize the sampled list $V_s \leftarrow \emptyset$
3:  while $|V_s| < B$ do
4:     Select $u$ from $L$ with probability $d_u / \sum_{v \in L} d_v$
5:     Select a neighbour $w$ of $u$ uniformly at random
6:     if $w$ not in $V_s$ then
7:        Add $w$ into list $V_s$
8:        Replace $u$ in $L$ by $w$
9:     else
10:       continue
11:    end if
12: end while                                ▹ End sampling process
13: Train $A$ on $V_s$ with its features $X_s$ and labels $Y_s$   ▹ Training process
14: Apply the trained model to $(V_t, X_t)$   ▹ Task-oriented functioning process
15: return the task-oriented output
Algorithm 1 Sampling-based training framework

3.2 Implementation

The sampling strategy provides an easy and effective way to decrease the scale of the training set. We combine this kind of strategy as a preprocessing stage of GCN-based methods in pipeline form.

The sampling stage takes the graph $G$, the sampling budget $B$, and the number of walkers $M$ as input, and generates the sampled list $V_s$ by performing the sampling loop of Algorithm 1.

The nodes in the sampled list $V_s$, their corresponding features, and their labels are then fed into the GCN-based method, and the model functions as it usually does. We formulate the whole structure of our framework as Algorithm 1, a typical three-stage framework.

3.3 Feasibility

The node sequence sampled by Algorithm 1 preserves the graph structure more accurately than a uniformly randomly sampled one, which overcomes the second drawback mentioned in Section 1: out-of-distribution prediction. Consider an important graph characteristic, the label density. We assume each vertex $v$ is associated with a label $l_v$. The label density $\theta$ on graph $G$ is defined by Equation 3, where $\mathbb{1}(\cdot)$ is the indicator function:

$\theta = \frac{1}{N} \sum_{v \in V} \mathbb{1}(l_v = 1)$    (3)
Utilizing the same unbiased estimator proposed by Zhao et al. (zhao2019sampling), depicted as Equation 4:

$\hat{\theta} = \frac{1}{B} \sum_{i=1}^{B} \frac{\mathbb{1}(l_{v_i} = 1)}{N \, \pi_{v_i}}$    (4)

Here $\pi_v$ is the probability that node $v$ is sampled, which equals $1/N$ in uniformly random sampling and $d_v / 2|E|$ in a random walk at steady state, where $d_v$ is the degree of node $v$ and $|E|$ is the number of edges in the graph.
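A small self-contained check of this reweighting, using hypothetical toy labels on a four-node path graph; the estimator form assumed here is the standard importance-weighted mean, with each sampled node contributing $\mathbb{1}(l_v=1)/(N\pi_v)$.

```python
def label_density(labels):
    """True label density: fraction of vertices whose label is 1."""
    return sum(labels.values()) / len(labels)

def estimate_density(sampled, labels, degrees, num_nodes, num_edges,
                     mode="uniform"):
    """Reweighted estimator: node v contributes 1(l_v = 1) / (N * pi_v),
    with pi_v = 1/N for uniform sampling and d_v / 2|E| for a random
    walk at steady state."""
    total = 0.0
    for v in sampled:
        if mode == "uniform":
            pi = 1.0 / num_nodes
        else:  # random walk at steady state
            pi = degrees[v] / (2.0 * num_edges)
        total += labels[v] / (num_nodes * pi)
    return total / len(sampled)

# Path graph 0-1-2-3: degrees (1, 2, 2, 1), |E| = 3, toy 0/1 labels.
labels = {0: 1, 1: 0, 2: 1, 3: 0}
degrees = {0: 1, 1: 2, 2: 2, 3: 1}
est = estimate_density([0, 1, 2, 3], labels, degrees, 4, 3, mode="uniform")
```

With every node sampled once under the uniform weights, the estimate recovers the exact label density, which is the sanity check one expects of an unbiased estimator.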

Theorem 1: For a single random walker, the label density estimate obtained from the random-walk sampled node sequence is closer to $\theta$ than the estimate obtained from a uniformly randomly sampled sequence.
For notational convenience, we write $\hat{\theta}_{RW}$ for the random-walk estimate and $\hat{\theta}_{UR}$ for the uniform estimate; the length of the sampled node sequence is $B$. Combining Equations 3 and 4, the original inequality can be rewritten accordingly. Under the first degree condition, the rewritten inequality can be simplified and scaled to a form that always holds; the opposite condition is handled in the same way. ∎

Theorem 1 tells us that the label density estimated from the node sequence sampled by a random walk is closer to $\theta$ than that estimated from a uniformly randomly sampled sequence.

Lemma 1: The $M$-dimensional random walk process is equivalent to the process of a single random walker over a product graph whose vertices are $M$-tuples of the original vertices.

Lemma 1 is proved by Ribeiro et al. [14]; combining it with Theorem 1, we can deduce Theorem 2.

Theorem 2: Estimating the label density with nodes sampled by 'Frontier Sampling' performs better than with nodes sampled uniformly.

Theorem 2 proves our earlier assumption theoretically, and ensures the performance of Algorithm 1.

3.4 Time Complexity

We propose the framework mainly to reduce the training dataset scale, and thereby the labelling effort and computation cost. So the first stage, i.e., the sampling process, should not have high time complexity; otherwise the data preprocessing stage would contradict our original idea.

According to Algorithm 1, the time complexity $T$ of the whole sampling process depends only on the sampling budget $B$:

$T = O(B) < O(N)$

where $N$ represents the number of nodes in the graph $G$. Thus, the time complexity of the sampling algorithm is lower than $O(N)$. Compared with the time complexity of graph convolutional network methods, this at most linear sampling cost is acceptable.

4 Evaluation

We verify our proposal on the multi-class classification task over three real-world datasets: two citation networks and one social network. In the citation networks, nodes are papers and edges are citation relationships; each paper has a feature vector describing its contents, and classes indicate the papers' categories. In the social network, nodes represent users of the social media platform, and an edge between two users means a follower-followee relation. Details of these datasets are presented in Table 1.

Dataset Type Nodes Edges Classes
Cora Citation 2,707 5,429 7
Pubmed Citation 19,717 44,338 3
BlogCatalog Social 10,312 333,983 10
Table 1: Overview of graph datasets

4.1 Experimental Settings

We apply our framework to several GCN-based methods to verify the validity of our proposal. For the sampling strategy, we use the Frontier Sampling described in Section 3.1. For the training data scales (sampling budgets), we range over 0.5%, 1%, 5%, and 10% of the nodes for each dataset. For the baseline algorithms, we choose the same amount of training data uniformly at random from the graphs; 100 nodes are randomly selected from the training set for validation. Prediction accuracy is evaluated on another 1000 randomly selected nodes for each dataset. We use cross-entropy loss throughout the experiments. Methods combined with our framework are denoted by prefixing 'SS-' to the original method name. We perform each experiment 10 times and take the average as the final result. The experiment on FastGCN is based on the code released by the original authors (https://github.com/matenure/FastGCN), and all other algorithms are implemented with the Deep Graph Library (DGL, https://github.com/dmlc/dgl).

Cora Pubmed BlogCatalog
Training Set 0.5% 1% 5% 10% 0.5% 1% 5% 10% 0.5% 1% 5% 10%
GCN 0.63 0.64 0.73 0.80 0.39 0.65 0.78 0.79 0.25 0.30 0.31 0.33
SS-GCN 0.70 0.71 0.75 0.83 0.63 0.73 0.78 0.81 0.28 0.30 0.33 0.34
GraphSAGE 0.63 0.57 0.79 0.85 0.70 0.79 0.82 0.83 0.27 0.27 0.32 0.34
SS-GraphSAGE 0.69 0.66 0.86 0.85 0.78 0.85 0.85 0.85 0.33 0.32 0.34 0.36
SGC 0.47 0.61 0.79 0.83 0.78 0.79 0.81 0.83 0.25 0.27 0.32 0.33
SS-SGC 0.56 0.68 0.81 0.84 0.81 0.82 0.83 0.84 0.33 0.33 0.33 0.35
FastGCN 0.18 0.26 0.33 0.33 0.56 0.59 0.63 0.63 0.19 0.23 0.27 0.27
SS-FastGCN 0.23 0.32 0.38 0.38 0.59 0.64 0.66 0.66 0.25 0.27 0.30 0.31
TAGCN 0.56 0.56 0.79 0.79 0.70 0.77 0.80 0.83 0.30 0.30 0.33 0.32
SS-TAGCN 0.70 0.71 0.80 0.84 0.79 0.80 0.86 0.85 0.31 0.32 0.34 0.34
APPNP 0.68 0.71 0.72 0.80 0.74 0.80 0.83 0.83 0.26 0.32 0.32 0.32
SS-APPNP 0.72 0.79 0.84 0.85 0.79 0.81 0.85 0.86 0.32 0.33 0.33 0.34
Table 2: Accuracy on Multi-label Classification

4.2 Baseline Methods

To validate the improvement brought by the sampling strategy to GCN-related methods, we evaluate it against several state-of-the-art GCN-related baselines:

  • GCN [9]: This is the first widely used graph convolutional network method for embedding graph structure. It takes the graph structure and a small number of labeled nodes as input and outputs node embedding vectors.

  • GraphSAGE [7]: This method builds on GCN and utilizes neighbours' features to represent a node. In this way, it can deal with dynamic graph structures.

  • SGC [17]: This method speeds up GCN's training by removing nonlinearities and collapsing weight matrices between consecutive layers.

  • FastGCN [3]: This method also speeds up GCN's training, by sampling active nodes between layers, which performs like 'dropout' in traditional neural networks.

  • TAGCN [5]: This method designs a set of fixed-size learnable node filters to perform convolutions on graphs, differing from the spectral-domain formulation of the original GCN.

  • APPNP [10]: This method uses the idea of PageRank [13] to improve GCN, utilizing a propagation procedure to construct a simple model.

4.3 Results

We now validate the effectiveness of our framework by combining it with the GCN-related baseline algorithms above and comparing against the originals. Specifically, we use the node classification task for evaluation. The experimental results are shown in Table 2. We bold the better result of each comparison pair; the detailed analysis is as follows.

When we fix the training set at the smallest scale (0.5%) and evaluate each algorithm, the ones with the sampling strategy outperform the originals. This simulates the extreme situation where labeled data is extremely scarce. We obtain a significant lift on average, and the greatest improvement happens between GCN and SS-GCN on the Pubmed dataset.

As the training set scale grows, the improvement gets smaller but still exists. In some cases with relatively larger training sets, the powerful structure of the original algorithm lets it match its 'SS-' competitor. The trend of the gap shrinking as the training set grows is reasonable: a large data scale ensures that uniform sampling explores as much of the graph as the 'Frontier Sampling' we utilize.

From another perspective, we can make a further observation. Take the accuracy of SS-SGC on Pubmed with 0.5% training data, 0.81, as a goal. The original SGC achieves this accuracy on the same dataset only with 5% of the training data, which is 10 times that of SS-SGC. In other words, the sampling strategy can save most of the labelled data needed to reach the same accuracy. This matters even more in real-world scenarios: saving labelled data significantly improves the efficiency of the labelling process and of cross-validation among different models or datasets.

Overall, we summarize the conclusions drawn from the results: 1) algorithms with the sampling strategy achieve an accuracy improvement under the same settings; 2) methods with the sampling strategy achieve comparable performance with a fraction of the training data. These results verify that the sampling strategy can improve GCN-related methods easily without changing them, i.e., just by adding the sampling strategy to the original ones as a pipeline stage.

Figure 3: Fig. 3(a) compares SS-GraphSAGE with the original GraphSAGE; Fig. 3(b) compares several sampling strategies.

4.4 Algorithm Efficiency

Well-distributed training data helps the model converge quickly, which is a common way to reduce training time. The sampling method we utilize is a simple but effective way to achieve such well-distributed training data.

To evaluate the contribution to reducing training time, we carried out a case study on GraphSAGE and SS-GraphSAGE with the Pubmed dataset. We set 10% of the dataset as the training data and the number of training epochs to 200.

The convergence of GraphSAGE and SS-GraphSAGE is plotted in Fig. 3(a); both the training and test accuracy of SS-GraphSAGE consistently outperform those of the original GraphSAGE. If we set the final test accuracy, 0.83, as a threshold, SS-GraphSAGE achieves it within only 62 epochs, decreasing the training time by 69%.

4.5 Sampling Strategy Comparison

We discussed several sampling strategies in Section 3.1. We carried out a case study on the Cora dataset with GraphSAGE to verify the earlier analysis, replacing the 'Frontier Sampling' used in Algorithm 1 with 'Uniform Sampling', 'Regular Random Walk', 'DFS', and 'BFS'. The experiment setting is the same as in Section 4.1. From the results shown in Fig. 3(b) (we take the natural logarithm of the training set scale for clearer display), the sampling method we use outperforms the others, both theoretically and experimentally. Regular random walk performs the worst when the sampling scale shrinks to the smallest setting, caused by its nature of easily being trapped, but its accuracy rises dramatically once the scale gets a little larger, which is consistent with our analysis. Notably, when the training scale gets larger, uniformly sampled data achieves accuracy close to ours, which meets the 'Law of Large Numbers'.

4.6 Parameter Influence

We numerically evaluate the influence of the sampling scale and the number of random walk dimensions with SS-GCN on the Cora dataset.

4.6.1 Sampling Scale

Fig. 4(a) shows the accuracy gap from the steady performance under different sampling budgets on the multi-label classification task. The steady accuracy is obtained by taking 50% of the nodes as training data. From the results, we observe that the sampling-based method can approach the steady performance with only about 1% to 3% of the nodes as sampled training data. This reveals the power of the sampling strategy added to GCN-related methods.

Figure 4: Figure 4(a) depicts the gap from the steady accuracy under different sampling scales for SS-GCN; Figure 4(b) depicts the accuracy and memory cost under different numbers of walkers.

4.6.2 Number of Walkers

The sampling scheme performs an $M$-dimensional random walk. Fig. 4(b) shows the influence of the number of walkers $M$ on accuracy and memory cost. The box plot shows the distribution of accuracy: the green line is the median accuracy, and the yellow line across the boxes connects the average accuracies. Accuracy shows only small fluctuations as $M$ changes. The memory cost, plotted in the same figure, stays at about the same level over the tested range of $M$.

5 Related Work

Two lines of research are related to our work, which are summarized as follows.

5.1 GCN-based Methods

Graph neural networks have drawn much attention recently. They were first proposed in 2014 [2] and later refined by [4, 6]. Kipf's GCN [9] brought them under the spotlight. Since then, researchers have sought to build more effective network structures: GraphSAGE [7] was proposed to handle dynamic graphs, and the Graph Attention Network [16] to weight a node's neighbours. Some works also focus on training efficiency: FastGCN [3] is one of the pioneers in accelerating the training process by eliminating part of the neurons, and SGC [17] goes a step further by simplifying the convolutional computation.

5.2 Graph Sampling Based on Random Walks

Sampling methods, especially random walk-based graph sampling methods, have been widely studied [1, 15, 18]. Leskovec et al. came up with an efficient way to downsize the sampling scale based on random walks, and Wei et al. worked out how to make sampling by random walk more efficient. Random walk-based sampling can also be used in overlay networks [12]. Based on prior knowledge about the graph structure, Zhao et al. proposed a graph sampling strategy using random walks with indirect jumps.

6 Conclusion and Future Work

Faced with the challenge of limited labelled data for semi-supervised methods, we propose a sampling-based framework that improves their performance without changing the original models. The evaluation of our proposal on real-world datasets shows that our framework achieves better results with a smaller amount of training data and surpasses the original methods.

In this paper, we take a small step towards improving the performance of semi-supervised methods, represented by GCN, under the condition of limited labelled data. But limited labelled data is a very common problem across graph embedding methods, and we will try to develop a more general framework that works for all of them. Moreover, we would like to study how to lower the training time complexity with a smaller training set.


Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 61902308, 61822309, 61773310, U1736205, U1766215, the Foundation of Xi'an Jiaotong University under Grants xxj022019016 and xtr022019002, and the Initiative Postdocs Supporting Program BX20190275.


  • [1] K. Avrachenkov, B. Ribeiro, and D. Towsley (2010) Improving random walk estimation accuracy with uniform restarts. In International Workshop on Algorithms and Models for the Web-Graph, pp. 98–109. Cited by: §5.2.
  • [2] J. Bruna, W. Zaremba, A. Szlam, and Y. Lecun (2014) Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations (ICLR 2014). Cited by: §5.1.
  • [3] J. Chen, T. Ma, and C. Xiao (2018) Fastgcn: fast learning with graph convolutional networks via importance sampling. arXiv preprint arXiv:1801.10247. Cited by: §1, §2, 4th item, §5.1.
  • [4] M. Defferrard, X. Bresson, and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, pp. 3844–3852. Cited by: §5.1.
  • [5] J. Du, S. Zhang, G. Wu, J. M. Moura, and S. Kar (2017) Topology adaptive graph convolutional networks. arXiv preprint arXiv:1710.10370. Cited by: 5th item.
  • [6] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams (2015) Convolutional networks on graphs for learning molecular fingerprints. In Advances in neural information processing systems, pp. 2224–2232. Cited by: §5.1.
  • [7] W. Hamilton, Z. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034. Cited by: §1, §1, §2, 2nd item, §5.1.
  • [8] W. Hu, B. Liu, J. Gomes, M. Zitnik, P. Liang, V. Pande, and J. Leskovec (2019) Pre-training graph neural networks. arXiv preprint arXiv:1905.12265. Cited by: §1.
  • [9] T. N. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), Cited by: §1, 1st item.
  • [10] J. Klicpera, A. Bojchevski, and S. Günnemann (2018) Predict then propagate: graph neural networks meet personalized pagerank. arXiv preprint arXiv:1810.05997. Cited by: 6th item.
  • [11] Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. nature 521 (7553), pp. 436–444. Cited by: §1.
  • [12] L. Massoulié, E. Le Merrer, A. Kermarrec, and A. Ganesh (2006) Peer counting and sampling in overlay networks: random walk methods. In Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing, pp. 123–132. Cited by: §5.2.
  • [13] L. Page, S. Brin, R. Motwani, and T. Winograd (1999) The pagerank citation ranking: bringing order to the web.. Technical report Stanford InfoLab. Cited by: 6th item.
  • [14] B. Ribeiro and D. Towsley (2010) Estimating and sampling graphs with multidimensional random walks. In Proceedings of the 10th ACM SIGCOMM conference on Internet measurement, pp. 390–403. Cited by: §3.1.
  • [15] B. Ribeiro, P. Wang, F. Murai, and D. Towsley (2012) Sampling directed graphs with random walks. In 2012 Proceedings IEEE INFOCOM, pp. 1692–1700. Cited by: §3.1, §5.2.
  • [16] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio (2017) Graph attention networks. arXiv preprint arXiv:1710.10903. Cited by: §1, §1, §5.1.
  • [17] F. Wu, A. Souza, T. Zhang, C. Fifty, T. Yu, and K. Weinberger (2019) Simplifying graph convolutional networks. In International Conference on Machine Learning, pp. 6861–6871. Cited by: 3rd item, §5.1.
  • [18] X. Xu, C. Lee, et al. (2014) A general framework of hybrid graph sampling for complex network analysis. In IEEE INFOCOM 2014-IEEE Conference on Computer Communications, pp. 2795–2803. Cited by: §5.2.
  • [19] S. Yun, M. Jeong, R. Kim, J. Kang, and H. J. Kim (2019) Graph transformer networks. In Advances in Neural Information Processing Systems, pp. 11960–11970. Cited by: §1.