I Introduction
Recently, remarkable progress has been made in graph representation learning, a.k.a. graph/network embedding, which addresses graph analytics problems by mapping the nodes of a graph to low-dimensional vector representations while preserving the graph structure [14, 4, 7, 34]. Graph neural networks (GNNs) have been widely applied in graph analysis owing to the strong performance of deep architectures and recent advances in optimization techniques [27, 36]. Existing GNN-based representation learning methods, e.g., GraphSAGE [13], Graph Convolutional Networks (GCNs) [17, 6] and Graph Attention Networks (GATs) [31], rely on aggregating neighborhood information, which makes them vulnerable to noise in the input graph. Some examples of such noise are as follows:

In knowledge graphs or open information extraction systems, spurious information may produce erroneous links between nodes. Likewise, incomplete information may lead to missing links.

In task-driven graph analysis, mislabeled samples or cross-class links can be viewed as noise in the node classification task.

Node features such as user profiles in social networks are often missing, or filled with obsolete or incorrect values.
Good graph representations are expected to be robust to erroneous links, mislabeled nodes and partially corrupted features in the input graph, and to capture geometric dependencies among nodes. However, existing approaches have paid limited attention to robustness in this regard. To overcome this limitation in handling noisy graph data, we propose the Graph Denoising Policy Network, denoted GDPNet, which learns robust representations through reinforcement learning. GDPNet includes two phases: signal neighbor selection and representation learning. It first selects signal neighbors for each node, and then aggregates the information from the selected neighbors to learn node representations with respect to the downstream tasks.
The major challenge is how to train these two phases jointly, particularly when the model has no explicit knowledge of where the noise might be. We address this challenge by formulating the graph denoising process as a Markov decision process. Intuitively, although we have no explicit supervision for signal neighbor selection, we can measure how well the representations learned from the selected neighbors perform on tasks such as node classification; the task-specific rewards received from the representation learning phase can then be used for trial-and-error search. In the signal neighbor selection phase, as shown in Fig. 2, GDPNet optimizes the neighborhood of each node by formulating the removal of noisy neighbors as a Markov decision process and learning a policy from the task-specific rewards. In the representation learning phase, GDPNet trains a set of aggregator functions that accumulate feature information from the selected signal neighbors of each target node to generate node representations for downstream tasks. Thus, at test time, the representations of unseen nodes can be generated by the trained GDPNet from the graph structure and the associated node features. The task-specific rewards computed w.r.t. the downstream tasks are passed back to the signal neighbor selection phase. The two phases are jointly trained to select optimal sets of neighbors for target nodes with maximum cumulative task-specific rewards, and to learn robust node representations.
We evaluate GDPNet on node classification benchmarks, which test GDPNet's ability to generate useful representations on unseen data. Experimental results show that GDPNet outperforms state-of-the-art graph representation learning baselines on several well-studied datasets by a large margin, which demonstrates the effectiveness of our approach. In summary, our contributions in this work include:

We propose a novel model, GDPNet, for robust graph representation learning through reinforcement learning. GDPNet consists of two phases, namely signal neighbor selection and representation learning, which enable GDPNet to effectively learn node representations from noisy graph data.

We formulate signal neighbor selection as a reinforcement learning problem, which enables the model to perform graph denoising with only weak supervision from the task-specific reward signals.

GDPNet is able to generate representations for unseen nodes in an inductive fashion, leveraging both the graph structure and the associated node feature information.

GDPNet is shown to be mathematically equivalent to solving a submodular maximization problem, which guarantees that its solution is bounded w.r.t. the optimal solution.
The rest of this work is organized as follows. Section II reviews the preliminaries of graph neural networks and reinforcement learning. Section III formally defines the graph representation learning problem through reinforcement learning and describes GDPNet. Section IV discusses the relation of GDPNet to the submodular maximization problem. Section V evaluates GDPNet on node classification tasks using various real-world graph data. Section VI briefly surveys related work on graph representation learning and reinforcement learning on graphs, followed by the conclusion in Section VII.
II Preliminaries
II-A Graph Neural Network
Graph neural networks (GNNs) encode nodes in a low-dimensional space so that similarity in this space approximates similarity in the original graph [36]. GNNs generate node representations based on local neighborhoods, that is, by aggregating information from each node's neighbors using neural networks. Mathematically, the basic GNN can be formulated as follows,
(1)  $h_v^{0} = x_v$
(2)  $h_v^{l} = \sigma\Big(W_l \sum_{u \in N(v)} \frac{h_u^{l-1}}{|N(v)|} + B_l\, h_v^{l-1}\Big)$
where $N(v)$ is the neighborhood set of node $v$, $x_v$ represents the node feature vector, $h_v^{l}$ represents the $l$-th layer embedding of node $v$, $\sigma$ is the nonlinear activation function (e.g. ReLU or tanh), and $W_l$ and $B_l$ are the parameters to be learned.
II-B Reinforcement Learning
The goal of reinforcement learning is to learn a policy that obtains the maximum cumulative reward by making multi-step decisions in a Markov decision process. Policy gradient is a principal approach to reinforcement learning problems; it directly optimizes the policy by computing the gradient of the cumulative reward and performing gradient ascent. Specifically, the policy is modeled as $\pi_\theta(a \mid s)$. To estimate the parameter $\theta$ of $\pi_\theta$, policy gradient methods maximize the expected cumulative reward from the start state to the end of the decision process. The objective function of reinforcement learning is defined as follows:
(3)  $J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_{t=1}^{T} r(s_t, a_t)\Big] \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} r(s_{i,t}, a_{i,t})$
where $\tau$ denotes the trajectories generated by $\pi_\theta$, each consisting of states $s$, actions $a$ and rewards $r$, $N$ is the number of trajectory samples, and $T$ is the length of the decision process. Policy gradient optimizes the parameter $\theta$ via gradient ascent, where the gradient is calculated by the policy gradient theorem [29]:
(4)  $\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \Big(\sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_{i,t} \mid s_{i,t})\Big) \Big(\sum_{t=1}^{T} r(s_{i,t}, a_{i,t})\Big)$
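As an illustration of gradient ascent on the expected cumulative reward, the following is a minimal sketch of a Monte Carlo score-function (REINFORCE-style) estimator for a one-parameter Bernoulli policy. The toy reward (1 for action 1, 0 otherwise), the sigmoid parameterization, and the names `sample_trajectory` and `policy_gradient_step` are assumptions made for illustration; they are not the setup used in this paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sample_trajectory(theta, horizon, rng):
    """Sample a_t ~ Bernoulli(sigmoid(theta)); toy reward is 1 when a_t = 1."""
    p = sigmoid(theta)
    actions = [1 if rng.random() < p else 0 for _ in range(horizon)]
    rewards = [float(a) for a in actions]
    return actions, rewards


def policy_gradient_step(theta, rng, num_samples=64, horizon=5, lr=0.1):
    """One gradient-ascent step with the sampled estimator
    (1/N) * sum_i (sum_t grad log pi(a_t)) * (sum_t r_t)."""
    p = sigmoid(theta)
    grad = 0.0
    for _ in range(num_samples):
        actions, rewards = sample_trajectory(theta, horizon, rng)
        # d/dtheta log Bernoulli(a; sigmoid(theta)) = a - sigmoid(theta)
        score = sum(a - p for a in actions)
        grad += score * sum(rewards)
    return theta + lr * grad / num_samples


rng = random.Random(0)
theta = 0.0
for _ in range(200):
    theta = policy_gradient_step(theta, rng)
```

Because the toy reward favors action 1, repeated updates push `sigmoid(theta)` toward 1. Practical implementations usually subtract a baseline from the return to reduce variance, which this sketch omits.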
III Approach
We formulate the robust graph representation learning problem as sequentially selecting an optimal set of neighbors for each node with maximum cumulative reward signals and aggregating features from nodes’ optimal neighborhoods. In this part, we formally define the problem, the environment setting for signal neighbor selection, and the GDPNet model.
III-A Problem Formulation
Given an attributed graph $G = (V, E, X)$, where $E$ is the edge set and $V$ is the node set, $X \in \mathbb{R}^{|V| \times d}$ collects the attribute information of each node, where $x_v \in \mathbb{R}^{d}$ is a $d$-dimensional attribute vector of node $v$. Note that we can simply use one-hot encodings as node features for a graph without attributes. Given a target node $v$, let $N(v)$ be the one-hop neighbors of $v$.
We aim to find a lower-dimensional representation $h_v$ for node $v$. First, a function is learned to map a neighborhood set $N(v)$ to a signal neighborhood set $\hat{N}(v)$, where $\hat{N}(v) \subseteq N(v)$. Then the node representations are generated based on the signal neighborhood set $\hat{N}(v)$. Given an order of the neighbors $u_1, \dots, u_n$, we decompose the conditional probability of $\hat{N}(v)$ given $N(v)$ as $p(\hat{N}(v) \mid N(v)) = \prod_{t=1}^{n} p(a_t \mid a_1, \dots, a_{t-1}, N(v))$ using the chain rule [19], where $a_t \in \{0, 1\}$; $a_t = 1$ indicates selecting $u_t$ as a signal neighbor, while $a_t = 0$ indicates removing $u_t$. We solve this signal neighbor selection problem by learning a policy $\pi_\theta$ with the neighborhood set and the predicted action values as inputs. The objective of signal neighbor selection is to select a subset of neighbors that maximizes a given reward function $R(\hat{N}(v)) = \sum_{t} r(s_t, a_t)$, where $\hat{N}(v)$ is the generated signal neighborhood set, $r(s_t, a_t)$ is the task-specific reward used to evaluate the action $a_t$, and $R$ is the cumulative reward function. The representation of node $v$ can then be learned by aggregating the neighborhood information from the signal neighbors $\hat{N}(v)$.
Selecting an optimal subset from a candidate set by maximizing an objective function is NP-hard, but it can be solved approximately by greedy algorithms when the objective is a submodular function [22]. With this observation, we design a reward function that satisfies submodularity, and show that the proposed GDPNet is mathematically equivalent to solving the submodular maximization problem. Thus our solution is bounded below by $(1 - 1/e)\,R(N^{*}(v))$, where $N^{*}(v)$ is the optimal neighborhood set.
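Notation  Description

$N(v)$  neighborhood set of node $v$
$\hat{N}(v)$  signal neighborhood set of node $v$
$\hat{N}_t(v)$  signal neighborhood set of node $v$ at time $t$
$N(v) \setminus \hat{N}(v)$  complementary set of the signal neighborhood set
$h_v^{t}$  embedding of target node $v$ at time $t$
$x_v$  feature vector of node $v$
$s_t$  the state at time $t$, where $u_t$ is the $t$-th neighbor of $v$
$r(s_t, a_t)$  reward function at time $t$
$R$  total reward function, $R = \sum_{t} r(s_t, a_t)$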



III-B Signal Neighbor Selection Environment
We formulate the problem of selecting a set of signal neighbors from a given neighborhood set as a Markov decision process (MDP) $M = (S, A, P, R, \gamma)$, where $S$ is the state space, $A$ is the action space, $P$ is the state transition probability matrix that describes the transition probability of the state after taking an action, $R$ is the reward function and $\gamma$ is the discount factor of the MDP. The signal neighbor selection process can be described by a trajectory with $T$ time steps, $(s_1, a_1, r_1, \dots, s_T, a_T, r_T)$. An MDP requires the state transition dynamics to satisfy the Markov property $p(s_{t+1} \mid s_t, a_t, \dots, s_1, a_1) = p(s_{t+1} \mid s_t, a_t)$. Thus we learn a policy $\pi_\theta(a_t \mid s_t)$ that only considers the current state $s_t$.
In reinforcement learning, the agent learns a policy via interacting with the environment. The main components (i.e., state, action, and reward) in the signal neighbor selection environment are described as follows,

State ($s_t$): The state $s_t$ encodes the information from the current node $v$ and the selected node $u_t$; it is the concatenation of the intermediate embeddings of the target node $v$ and the neighbor $u_t$. The calculation of these embeddings is defined in Section III-C. Consequently, a newly selected neighbor updates the embedding of $v$, which can be viewed as a state transition.

Action ($a_t$): Given an order of the neighbors of node $v$, the policy maps the state $s_t$ to an action $a_t \in \{0, 1\}$ at each time step $t$. $a_t = 1$ indicates that $u_t$ is selected as a signal neighbor, while $a_t = 0$ means that $u_t$ is not selected.

Reward ($r$): Our goal is to find an optimal set of signal neighbors from a finite neighborhood set in order to learn robust graph embeddings for downstream tasks such as node classification, link prediction and node clustering. The downstream tasks can produce task-specific scores as the reward signal for the signal neighbor selection phase. To ensure that the combination of the selected neighbors has maximum cumulative reward, we employ the submodular function framework to define the marginal value reward function:
(5)
where the aggregation function combines the target node feature and the neighbors' features to update the representation of the target node [13], and the scoring function returns the micro-averaged F1 score from the node classification task when the target node considers the candidate as a neighbor.
The environment updates the states from to by calculating the representations at time . It can be considered as a state transition:
(6) 
If , , otherwise .
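The marginal-value idea behind the reward can be sketched as follows. The example assumes a hypothetical set-valued `score` function standing in for the micro-averaged F1 returned by the downstream classifier, toy per-neighbor utilities, and a zero reward when a neighbor is rejected; none of these specifics are taken from the paper. It shows that per-step marginal rewards telescope, so the undiscounted cumulative reward equals the score gained by the final signal neighbor set.

```python
def score(selected, utility):
    """Hypothetical stand-in for the task score of a neighbor set."""
    # Concave in the summed utility, so extra neighbors give diminishing gains.
    total = sum(utility[u] for u in selected)
    return total / (1.0 + total)


def run_episode(neighbors, actions, utility):
    """Apply a_t in {0, 1} to each neighbor in order and record marginal rewards."""
    selected, rewards = set(), []
    for u, a in zip(neighbors, actions):
        if a == 1:
            reward = score(selected | {u}, utility) - score(selected, utility)
            selected = selected | {u}
        else:
            reward = 0.0  # assumption: rejecting a neighbor leaves the set unchanged
        rewards.append(reward)
    return selected, rewards


utility = {"u1": 0.9, "u2": 0.1, "u3": 0.5}  # toy values, not from the paper
selected, rewards = run_episode(["u1", "u2", "u3"], [1, 0, 1], utility)
```

Here `sum(rewards)` matches `score(selected, utility) - score(set(), utility)` up to floating-point error, which is why maximizing cumulative marginal rewards is the same as maximizing the score of the selected set.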
III-C Graph Denoising Policy Network
With the signal neighbor selection environment defined, we introduce the GDPNet model, which includes two phases: signal neighbor selection and representation learning. Given a target node $v$, GDPNet first takes its neighborhood set $N(v)$ as input and outputs a signal neighborhood subset $\hat{N}(v)$. Then the representation of $v$ is learned by aggregating the information from the signal neighborhood subset.
III-C1 Determine the Neighborhood Order
As mentioned above, we use the chain rule to decompose signal neighbor selection into a sequential decision-making process. However, this requires an order in which to make decisions. Here we design a high-level policy to learn an order for the policy to take actions.
We define a regret score for each neighbor to help determine the order. A neighbor with a large regret score will be selected with higher probability. At each time step, we calculate the regret score of each neighbor and sample one of the neighbors to be the $t$-th neighbor. The regret score is defined as follows:
(7) 
where is the th neighborhood in the neighborhood set with a random order and are parameter matrices. To reduce the size of for computational efficiency, we add an ending neighbor to for early stopping purpose. When is sampled, the neighborhood selection process of node stops. We use the Softmax function to normalize the regret scores, and sample one neighbor from the distribution generated by Softmax to be the neighbor.
(8) 
where is the neighbor for signal neighbor selection, . indicates the regret score of the ending neighbor . After selecting a neighbor , we adopt the policy to determine whether to select as a signal neighbor. Then will be removed from .
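The sampling procedure described above (softmax over regret scores, an ending neighbor for early stopping, and removal of each sampled neighbor) can be sketched as follows. The regret scores are given here as fixed toy numbers rather than computed by Eq. (7), and they are not recomputed as the state changes, which is a simplification. The `END` marker and the function names are hypothetical.

```python
import math
import random

END = "<end>"  # hypothetical marker for the ending neighbor


def softmax(scores):
    """Normalize a dict of scores into a probability distribution."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def sample_order(regret, end_score, rng):
    """Sample neighbors one at a time from a softmax over regret scores,
    removing each sampled neighbor, until the ending neighbor is drawn."""
    remaining = dict(regret)
    order = []
    while remaining:
        probs = softmax({**remaining, END: end_score})
        keys = list(probs)
        pick = rng.choices(keys, weights=[probs[k] for k in keys], k=1)[0]
        if pick == END:
            break  # early stopping: no further neighbors are considered
        order.append(pick)
        del remaining[pick]
    return order


rng = random.Random(0)
regret = {"u1": 2.0, "u2": 0.5, "u3": -1.0}  # toy scores
order = sample_order(regret, end_score=-2.0, rng=rng)
```

Each neighbor appears at most once in `order`, and neighbors with larger regret scores tend to be visited earlier.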
III-C2 Signal Neighbor Selection
Given the neighbor $u_t$, GDPNet takes an action $a_t$ at time step $t$ to decide whether to select $u_t$. A sequence of such decisions selects the signal neighbors for node $v$, so the total number of signal neighbors is determined automatically. As illustrated in Fig. 2, a policy is learned to map the state $s_t$ to the action $a_t$ at time step $t$, and the corresponding reward is provided. Our goal is to maximize the total reward of all the actions taken during these time steps, which can be learned by the following policy network,
(9) 
where the weight matrices are shared with Eq. (7), and the action $a_t$ is sampled from a Bernoulli distribution generated by the output of the policy network.
III-C3 Representation Learning
At each time step, GDPNet calculates the embeddings of the target node $v$ and the $t$-th neighbor $u_t$ as follows,
(10)  
(11) 
where $x_v$ and $x_{u_t}$ are the features of nodes $v$ and $u_t$, respectively. We compute the embedding of neighbor $u_t$ from its own feature only, because the goal is to evaluate the individual contribution of $u_t$. In this work we only consider one-hop neighbors for simplicity. The GDPNet model can be easily extended to aggregate information from multi-hop neighbors by augmenting the candidate neighborhood set from which the signal neighbors are selected.
As defined in Section III-B, the state at time step $t$, $s_t$, is a concatenation of the intermediate embeddings of the target node and the neighbor. Eventually, the node representations and the states can be obtained.
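To make the data flow of this phase concrete, the following is a minimal sketch assuming a GraphSAGE-style mean aggregator with ReLU. The exact form of Eqs. (10) and (11) is not reproduced here; the weight names `W` and `B`, the use of `B` for the neighbor embedding, and the helper names are assumptions for illustration.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def mean(vectors, dim):
    """Element-wise mean; a zero vector when no neighbor is selected."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def embed_target(x_v, selected_feats, W, B):
    """h_v = ReLU(W * mean(selected neighbor features) + B * x_v)."""
    agg = mean(selected_feats, len(x_v))
    return relu([a + b for a, b in zip(matvec(W, agg), matvec(B, x_v))])


def embed_neighbor(x_u, B):
    """Neighbor embedding computed from its own feature only."""
    return relu(matvec(B, x_u))


def make_state(h_v, h_u):
    """State s_t: concatenation of the two intermediate embeddings."""
    return h_v + h_u


I2 = [[1.0, 0.0], [0.0, 1.0]]  # identity weights keep the arithmetic easy to follow
h_v = embed_target([1.0, -2.0], [[1.0, 1.0], [3.0, 5.0]], W=I2, B=I2)
h_u = embed_neighbor([0.5, -0.5], B=I2)
state = make_state(h_v, h_u)
```

With identity weights, the mean of the two selected neighbors is `[2.0, 3.0]`, so `h_v` is `[3.0, 1.0]`, `h_u` is `[0.5, 0.0]`, and the state is their concatenation.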
III-C4 Iteration-wise Optimization
We adopt an iteration-wise optimization approach for GDPNet, which alternately optimizes the signal neighbor selection phase and the representation learning phase to learn the policy and the node representations. The representation learning phase aggregates the information from the signal neighbors selected by the policy to learn an embedding for the target node. Meanwhile, the policy is trained with the states calculated by the representation learning phase and the corresponding rewards. In this paper, the policy is optimized with Proximal Policy Optimization (PPO), one of the most widely used policy gradient methods [28].
(12) 
where the Kullback–Leibler (KL) divergence penalty is used to control the change of the policy at each iteration, performing a trust region update with a given threshold. The old policy and Q-value appearing in the objective are those saved before the current time step during training. The discounted state distribution is defined as,
(13) 
IV Connection with Submodular Maximization
The design of the reward function in GDPNet, described in Section III-B, is inspired by submodular functions. With this carefully designed reward function, we build the connection with the submodular maximization problem, and show that the solution provided by GDPNet is bounded below by $(1 - 1/e)\,R(N^{*}(v))$, where $N^{*}(v)$ is the optimal neighborhood set. In this section we first introduce the key definitions related to submodular functions, followed by the proof of the monotonicity and submodularity properties of the reward function in GDPNet.
IV-A Submodular Reward Function
In this section, we show that, given a special form of reward function, the total reward function in GDPNet turns out to be submodular.
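Definition 1 (Submodular Function).
Let $\Omega$ be a finite ground set and $f: 2^{\Omega} \to \mathbb{R}$ a set function. The set function $f$ is submodular if it satisfies the diminishing returns property, $f(A \cup \{x\}) - f(A) \ge f(B \cup \{x\}) - f(B)$, and the monotone property, $f(A) \le f(B)$, for all $A \subseteq B \subseteq \Omega$ and $x \in \Omega \setminus B$.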
Definition 2 (Submodular Maximization).
Let be an optimizer which maps a set in to a subset with size smaller than :
(14) 
The submodular maximization problem is to find the best possible optimizer, which satisfies:
(15) 
The reward function in GDPNet is denoted by , which is also named marginal value. Specifically, the reward function in GDPNet can be expressed as:
(16) 
where . Given this reward function , we can prove that the cumulative reward function is a submodular function.
Proposition 1.
The total reward function is a monotone function, where
(17) 
Proof.
Proposition 2.
The total reward function satisfies the submodularity property. That is,
(20) 
whenever and
Proof.
We define,
(21)  
(22) 
Then we need to show . Based on the reward definition in Eq. (16), we have,
(23) 
Assume , and , the above equations can be rewritten as,
(24) 
We have based on the monotonicity property. Thus we have . ∎
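The monotonicity and diminishing-returns properties used in these propositions can be checked numerically on small ground sets. The following sketch does so by brute force for a toy coverage function, which is a standard example of a monotone submodular function; the coverage data and function names are assumptions for illustration and are not the reward function of GDPNet.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_monotone(f, ground, tol=1e-12):
    """Adding an element never decreases f."""
    return all(f(a | {x}) >= f(a) - tol
               for a in subsets(ground) for x in ground - a)


def is_submodular(f, ground, tol=1e-12):
    """Diminishing returns: for A subset of B and x not in B,
    the gain of x at A is at least the gain of x at B."""
    for b in subsets(ground):
        for a in subsets(b):
            for x in ground - b:
                if f(a | {x}) - f(a) < f(b | {x}) - f(b) - tol:
                    return False
    return True


covers = {"u1": {1, 2}, "u2": {2, 3}, "u3": {3}}  # toy data
ground = frozenset(covers)


def coverage(s):
    """Number of distinct items covered by the chosen neighbors."""
    return len(set().union(*(covers[u] for u in s)))


def squared_size(s):
    """Counterexample: gains grow with set size, so this is not submodular."""
    return len(s) ** 2
```

The brute-force check enumerates all nested pairs of subsets, so it is only practical for very small ground sets; it is meant as a sanity check, not a proof.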
IV-B Equivalence between GDPNet and Submodular Maximization Problem
We will establish the following facts, which together imply the equivalence between GDPNet and submodular maximization problem,

The total reward function defined in the signal neighbor selection phase is a submodular function, which is equivalent to .

The submodular maximization problem can be formulated as an MDP which is equivalent to GDPNet.

The objective function in GDPNet is equivalent to the counterpart in submodular maximization.
Firstly, the goal of submodular maximization is to find the , with the objective function:
(25) 
where is the cumulative value of each element in set . Let be the selected neighborhood set , then .
Secondly, the submodular maximization problem can be formulated as an MDP where the set with the selected items indicates the state. After adding a new item into , the state is updated to .
Lastly, the objective function of GDPNet also aligns to the optimizer in submodular maximization, where each can be considered as an optimizer in Equation (25):
(26) 
where is equivalent to the signal neighbor set
Theorem 1.
The greedy algorithm gives a $(1 - 1/e)$-approximation for the problem of $\max_{|S| \le k} f(S)$ when $f$ is a monotone submodular function.
Proof.
Based on the aforementioned equivalence between the objective functions of GDPNet and submodular maximization, we need to show that a $(1 - 1/e)$-approximate solution can be achieved for $\max_{|S| \le k} f(S)$ with a submodular function $f$, which has been proved in [22]. ∎
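The greedy guarantee for monotone submodular maximization under a cardinality constraint can be illustrated on a small instance. The sketch below compares greedy selection with exhaustive search on a toy coverage function in which greedy is suboptimal but still within the $(1 - 1/e)$ factor. The instance and function names are assumptions for illustration; this is the classical greedy algorithm, not the learned policy of GDPNet.

```python
import math
from itertools import combinations


def greedy_max(f, ground, k):
    """Repeatedly add the element with the largest marginal gain, up to k elements."""
    selected = frozenset()
    for _ in range(k):
        candidates = sorted(ground - selected)
        if not candidates:
            break
        best = max(candidates, key=lambda x: f(selected | {x}) - f(selected))
        if f(selected | {best}) - f(selected) <= 0:
            break
        selected = selected | {best}
    return selected


def brute_force_max(f, ground, k):
    """Exhaustively search all subsets of size at most k."""
    best = frozenset()
    for r in range(k + 1):
        for combo in combinations(sorted(ground), r):
            if f(frozenset(combo)) > f(best):
                best = frozenset(combo)
    return best


covers = {"a": {1, 2, 3, 4}, "b": {1, 2, 5}, "c": {3, 4, 6}}  # toy data
ground = frozenset(covers)


def coverage(s):
    return len(set().union(*(covers[u] for u in s)))


greedy_set = greedy_max(coverage, ground, k=2)
optimal_set = brute_force_max(coverage, ground, k=2)
bound = (1 - 1 / math.e) * coverage(optimal_set)
```

Greedy first picks `a` (gain 4) and then `b` (gain 1), covering 5 items, while the optimum `{b, c}` covers 6; the greedy value 5 exceeds the bound of about 3.79.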
V Experiment
Experiments are conducted to evaluate the robustness of the representations learned by the proposed GDPNet model. For the quantitative experiments, we focus on two tasks: (1) Robustness Evaluation, where we use the micro-averaged F1 score to evaluate our model against baselines on the node classification task, and (2) Denoising Evaluation, where we evaluate the denoising capability of GDPNet by running the baselines on the denoised graph generated by GDPNet. We use four datasets, Cora, Citeseer, PubMed and DBLP, and split them into training, validation and test sets under the supervised learning scenario, following previous work [13, 31, 6]. For the qualitative experiments, we conduct embedding visualization, which projects the learned high-dimensional representations to a 2D space. In all these experiments, we separate out the test data from training and perform predictions on nodes that are not seen during training.
Dataset  Nodes  Edges  Classes  Features  Train/Validate/Test

Cora  2,708  5,429  7  1,433  1,208/500/1,000 
Citeseer  4,230  5,358  6  602  2,730/500/1,000 
PubMed  19,717  44,338  3  500  18,217/500/1,000 
DBLP  17,716  105,734  4  1,639  16,216/500/1,000 
V-A Experimental Setup and Baselines
For all these tasks, we apply a two-layer policy network to select the signal neighbors. The architectural hyperparameters are optimized on the Cora dataset and shared by the other datasets. The embedding dimension is . The sizes of the two hidden layers in the policy network are and , respectively, with ReLU as the activation function. The batch size is . The discount factor is optimized as for Cora and DBLP, for PubMed and for Citeseer. We compare our method with the following baselines:

LR: Logistic regression (LR) model, which takes the node features as inputs and ignores the graph structure.

GCN [17]: GCN uses the local connection structure of the graph as the filter to perform convolution, where filter parameters are shared over all locations in the graph. We use the inductive version of GCN in this paper for comparison.

GAT [31]: GAT utilizes the attention mechanism to enhance the performance of the graph convolutional network by considering the entire neighborhoods.

FastGCN [6]: FastGCN considers graph convolutions as integral transforms of embedding functions, and samples the neighborhoods in each layer independently to address the recursive expansion of neighborhoods.

GraphSAGE [13]: GraphSAGE extends the original graph convolution-style framework to the inductive setting. It randomly samples a fixed-size neighborhood of each node and then applies a specific aggregator over it.
Our proposed model is denoted as GDPNet. We also introduce a variant GDPNet which performs the signal neighbor selection with a random order of the neighbors.
V-B Performance Comparison
In this section, we first visualize the node representations learned by different methods, followed by the performance comparison on the node classification task. Additionally, we show the distributions of the signal neighbors selected by GDPNet on different datasets.
V-B1 Embedding Visualization
Node representations are learned by GAT, GCN, GraphSAGE and GDPNet on the test set of Cora, and visualized with t-SNE [20], as shown in Fig. 3. Different colors in the figure represent different categories in Cora. The following observations can be made from Fig. 3,

GDPNet correctly detects the classes in Cora, providing empirical evidence for the effectiveness of our method. This can be seen by the clear gap between samples with different colors. It also demonstrates that, removing the noisy neighbors can help nodes learn better representations.

GCN and GraphSAGE share similar “shape” in the 2D space. The reason is that in the inductive learning setting, GCN and GraphSAGE use the same methods in neighborhood sampling. GAT considers the entire neighborhoods which leads to a different visualization result from the others. It can be seen that the sampled neighbors have a profound effect on the representations.

GAT cannot identify different classes as effectively as the other methods. This might be because it considers all the neighbors with attention weights, which easily introduces noisy neighbors.
Method  Cora  PubMed  DBLP  Citeseer 

LR  
GAT  
GCN  %  
FastGCN  
GraphSAGE  
GDPNet  
GDPNet 
Method  Cora  PubMed  DBLP  Citeseer 

GCN  %  
GCN  
GraphSAGE  
GraphSAGE 
V-B2 Results on Node Classification
In this part, we compare the performance of GDPNet against the baselines on Cora, Citeseer, PubMed and DBLP. For all methods, we run the experiments with random seeds over trials and record the mean and standard deviation of the micro-averaged F1 scores. The results are summarized in Table III. From the table we observe that,

GDPNet consistently outperforms the other methods, which suggests that each dataset contains a set of neighbors that are noisy for the node classification task, and that GDPNet can learn robust embeddings by effectively removing these noisy neighbors.

GCN, FastGCN and GraphSAGE show lower F1 scores. The reason is that these methods randomly sample a subset of neighbors for representation learning, which makes it hard to avoid the noisy neighbors. In addition, random sampling leads to higher variance.

GAT learns the importance of the neighbors with attention weights, and it is also sensitive to noisy data according to the reported results.

Another interesting observation is that logistic regression achieves better performance than the other baselines on PubMed, which indicates that there are fewer signal neighbors for the nodes in PubMed. This observation can also be verified in Fig. 4.

GDPNet has a lower F1 score with higher variance than GDPNet, which demonstrates that the order of the decisions has an effect on the performance of representation learning. Thus learning an order for the neighbors is beneficial for selecting signal neighbors and robust graph representation learning.
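The evaluation protocol above (micro-averaged F1 per trial, then mean and spread over trials) can be sketched as follows. In single-label multi-class classification every misclassified node counts as one false positive and one false negative, so micro-averaged F1 coincides with accuracy. The labels and per-trial scores below are hypothetical placeholders, not results from this paper, and the sample standard deviation is one possible choice of spread.

```python
from statistics import mean, stdev


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label multi-class predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    errors = len(y_true) - tp
    fp = fn = errors  # each error is a false positive for one class and a false negative for another
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


score = micro_f1([0, 1, 2, 2], [0, 2, 2, 2])

trial_scores = [0.80, 0.82, 0.78]  # hypothetical per-seed scores
summary = (mean(trial_scores), stdev(trial_scores))
```

In this toy case three of four predictions are correct, so the micro-averaged F1 is 0.75, the same as the accuracy.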
V-B3 Distribution of the Selected Neighbors
Fig. 4 shows the distribution of the selected neighbor percentages, where the x-axis indicates the percentage of the neighbors selected as signal neighbors, and the y-axis indicates the probability density. We observe that most of the neighbors in Citeseer and DBLP are selected, while only a few neighbors are selected in PubMed. The results suggest that there are more “noisy” citations (e.g. cross-field citations) in PubMed than in Citeseer and DBLP. Interestingly, most of the research papers collected in Citeseer and DBLP are from computer science, while PubMed collects papers from biomedicine.
V-C Ablation Study
V-C1 Node classification performance comparison on selected signal neighbors
In this part, we evaluate the effectiveness of the denoising process in GDPNet. Specifically, we first use the policy learned by GDPNet to remove the noisy neighbors from Citeseer and PubMed. With the denoised graphs, we learn representations with GCN and GraphSAGE to see whether their performance improves. The results are summarized in Table IV, where the suffix “” indicates the results on the denoised graphs generated by GDPNet. As expected, both GCN and GraphSAGE achieve better performance on the denoised graphs, which demonstrates the effectiveness of the denoising process in GDPNet.
V-C2 Parameter Sensitivity Study
In Fig. 7, we vary the percentage of training nodes in Citeseer and PubMed to test the classification accuracy. We observe that the performance of all the methods improves as the training percentage increases. Additionally, GAT is very sensitive to the percentage of training data, and it requires a larger proportion of training data to reach a desirable performance. GraphSAGE, GCN and GDPNet achieve good performance with little training data, and GDPNet improves more as the training data percentage increases.
The discount factor balances the importance of instant reward and long-term reward. A larger discount factor gives the long-term reward a more important role. Fig. 5 shows that when , Citeseer achieves the best performance, while PubMed achieves the best performance when . We can see that Citeseer is more sensitive to the discount factor than PubMed.
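The role of the discount factor can be seen in a short sketch of the discounted return, computed backwards over a reward sequence. The reward sequence is a toy example in which the only reward arrives at the last step; the values of the discount factor below are illustrative and are not the settings used in the experiments.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma^t * r_t, accumulated from the last step backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


delayed = [0.0, 0.0, 1.0]  # toy episode: reward only at the final step
returns = {gamma: discounted_return(delayed, gamma) for gamma in (0.0, 0.5, 1.0)}
```

With a discount factor of 0 the delayed reward is ignored entirely, with 0.5 it is worth 0.25 at the first step, and with 1 it counts in full.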
Fig. 5 presents the analysis of the number of epochs for the representation learning phase. It can be seen from the figure that, as the number of epochs increases (between and ), the performance on both PubMed and Citeseer improves. The epochs to achieve the best performance are and for PubMed and Citeseer, respectively.
V-C3 Convergence Analysis
Fig. 7 shows the convergence analysis of GDPNet on Citeseer and PubMed. We initialize the policy randomly when epoch equals , and the neighbors are randomly selected as signal neighbors. We observe that Citeseer converges faster than PubMed. One explanation would be that PubMed has more nodes than Citeseer, which requires more time to explore the policy for nodes.
VI Related Work
In this section, we briefly describe previous graph representation learning approaches, including matrix factorization based methods and graph neural network based methods, and recent advances in applying reinforcement learning to graphs.
VI-A Graph Representation Learning
Graph representation learning tries to encode graph structure information into vector representations. The main idea is to learn a mapping function from nodes or entire graphs into an embedding space where the geometric relationships in the low-dimensional space coincide with those in the original graph. The methods can be grouped into two categories: matrix factorization based methods and graph neural network based methods [14].
VI-A1 Matrix Factorization based Embedding
Matrix factorization based methods learn an embedding lookup table, which trains a unique embedding vector for each node independently. These methods largely focus on matrix-factorization approaches and random walk approaches [14, 11, 4, 7]. Matrix-factorization approaches use dimension reduction to learn the representations [5, 1, 24] with a loss on node pair similarity. Inspired by successes in natural language processing [21], a set of methods use random walks to learn node embeddings, where node similarity is calculated from co-occurrence statistics of sentence-like vertex sequences generated by random walks among connected vertices [25, 32, 12, 30, 9]. The random-walk based methods have been shown to fit into the matrix factorization framework [26].
VI-A2 Graph Neural Network based Embedding
A set of graph neural network based embedding methods have been proposed recently for representation learning [3, 10, 18, 23]. GCN [17] first proposes the first-order graph convolution layer to perform recursive neighborhood aggregation based on local connections. Instead of using the full graph Laplacian during training as in GCN, GraphSAGE [13] considers the inductive setting and handles large-scale graphs with batch training and neighborhood sampling. Following GraphSAGE, the self-attention mechanism has been explored to enhance representation learning performance [31, 35]. To accelerate the training of GCNs, [6] samples the nodes in each layer independently, while [15] samples the lower layer conditioned on the top one, and the sampled neighborhoods are shared by different parent nodes. In this work, we propose to find an effective subset of neighbors for learning robust representations.
VI-B Reinforcement Learning on Graph
Reinforcement learning solves sequential decision-making problems with the goal of maximizing the cumulative reward of the decisions. A line of work uses reinforcement learning to solve sequential decision-making problems on graphs, such as minimum vertex cover, maximum cut and the travelling salesman problem [16, 2]. You et al. [33] considered molecular graph generation as a sequential decision-making process where the reward function is designed by non-differentiable rules. Dai et al. [8] used reinforcement learning to learn an attack policy that makes multiple decisions (deleting or adding edges) to attack the graph.
VII Conclusion
In this paper, we developed a novel framework, GDPNet, to learn robust representations from noisy graph data through reinforcement learning. GDPNet includes two phases: signal neighbor selection and representation learning. It learns a policy to sequentially select the signal neighbors for each node, and then aggregates the information from the selected neighbors to learn node representations for the downstream tasks. These two learning phases are complementary and together achieve significant improvement. We show that, with the carefully designed reward function, our method is mathematically equivalent to maximizing a submodular function, which guarantees that our objective value is bounded below by $(1 - 1/e)$ times the optimal value. Note that GDPNet is naturally an inductive model which can generate representations for unseen nodes. Experiments on a set of well-studied datasets provide empirical evidence for our analytical results, and yield significant gains in performance over state-of-the-art baselines.
References
 [1] (2013) Distributed large-scale natural graph factorization. In WWW, pp. 37–48. Cited by: §VIA1.
 [2] (2017) Neural combinatorial optimization with reinforcement learning. In ICLR, Cited by: §VIB.
 [3] (2013) Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203. Cited by: §VIA2.
 [4] (2018) A comprehensive survey of graph embedding: problems, techniques, and applications. TKDE 30 (9), pp. 1616–1637. Cited by: §I, §VIA1.
 [5] (2015) Grarep: learning graph representations with global structural information. In Proceedings of the 24th ACM international on conference on information and knowledge management, pp. 891–900. Cited by: §VIA1.
 [6] (2018) Fastgcn: fast learning with graph convolutional networks via importance sampling. arXiv preprint arXiv:1801.10247. Cited by: §I, 4th item, §V, §VIA2.
 [7] (2018) A survey on network embedding. IEEE Transactions on Knowledge and Data Engineering. Cited by: §I, §VIA1.
 [8] (2018) Adversarial attack on graph structured data. In ICML, Cited by: §VIB.
 [9] (2017) Metapath2vec: scalable representation learning for heterogeneous networks. In KDD, pp. 135–144. Cited by: §VIA1.
 [10] (2015) Convolutional networks on graphs for learning molecular fingerprints. In NeurIPS, pp. 2224–2232. Cited by: §VIA2.
 [11] (2018) Graph embedding techniques, applications, and performance: a survey. Knowledge-Based Systems 151, pp. 78–94. Cited by: §VIA1.
 [12] (2016) Node2vec: scalable feature learning for networks. In KDD, pp. 855–864. Cited by: §VIA1.
 [13] (2017) Inductive representation learning on large graphs. In NeurIPS, pp. 1024–1034. Cited by: §I, 3rd item, 5th item, §V, §VIA2.
 [14] (2017) Representation learning on graphs: methods and applications. arXiv preprint arXiv:1709.05584. Cited by: §I, §VIA1, §VIA.
 [15] (2018) Adaptive sampling towards fast graph representation learning. In NeurIPS, pp. 4558–4567. Cited by: §VIA2.
 [16] (2017) Learning combinatorial optimization algorithms over graphs. In NeurIPS, pp. 6348–6358. Cited by: §VIB.
 [17] (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §I, 2nd item, §VIA2.
 [18] (2015) Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493. Cited by: §VIA2.
 [19] (2015) On the optimality of classifier chain for multi-label classification. In NeurIPS, pp. 712–720. Cited by: §IIIA.
 [20] (2008) Visualizing data using t-SNE. Journal of machine learning research 9 (Nov), pp. 2579–2605. Cited by: §VB1.
 [21] (2013) Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pp. 3111–3119. Cited by: §VIA1.
 [22] (1978) An analysis of approximations for maximizing submodular set functions—i. Mathematical programming 14 (1), pp. 265–294. Cited by: §IIIA, §IVB.

 [23] (2016) Learning convolutional neural networks for graphs. In ICML, pp. 2014–2023. Cited by: §VIA2.
 [24] (2016) Asymmetric transitivity preserving graph embedding. In KDD, pp. 1105–1114. Cited by: §VIA1.
 [25] (2014) Deepwalk: online learning of social representations. In KDD, pp. 701–710. Cited by: §VIA1.
 [26] (2018) Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 459–467. Cited by: §VIA1.
 [27] (2008) The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80. Cited by: §I.
 [28] (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347. Cited by: §IIIC4.
 [29] (2000) Policy gradient methods for reinforcement learning with function approximation. In NeurIPS, pp. 1057–1063. Cited by: §IIB.
 [30] (2015) Line: large-scale information network embedding. In WWW, pp. 1067–1077. Cited by: §VIA1.
 [31] (2017) Graph attention networks. arXiv preprint arXiv:1710.10903. Cited by: §I, 3rd item, §V, §VIA2.
 [32] (2015) Network representation learning with rich text information. In IJCAI, Cited by: §VIA1.
 [33] (2018) Graph convolutional policy network for goal-directed molecular graph generation. In NeurIPS, pp. 6410–6421. Cited by: §VIB.
 [34] (2018) Learning deep network representations with adversarially regularized autoencoders. In KDD, pp. 2663–2671. Cited by: §I.
 [35] (2018) Gaan: gated attention networks for learning on large and spatiotemporal graphs. arXiv preprint arXiv:1803.07294. Cited by: §VIA2.
 [36] (2018) Graph neural networks: a review of methods and applications. arXiv preprint arXiv:1812.08434. Cited by: §I, §IIA.