Learning Robust Representations with Graph Denoising Policy Network

Graph representation learning, which aims to learn low-dimensional representations that capture the geometric dependencies between nodes in the original graph, has gained increasing popularity in a variety of graph analysis tasks, including node classification and link prediction. Existing representation learning methods based on graph neural networks and their variants rely on the aggregation of neighborhood information, which makes them sensitive to noise in the graph. In this paper, we propose Graph Denoising Policy Network (GDPNet for short) to learn robust representations from noisy graph data through reinforcement learning. GDPNet first selects signal neighborhoods for each node, and then aggregates the information from the selected neighborhoods to learn node representations for the down-stream tasks. Specifically, in the signal neighborhood selection phase, GDPNet optimizes the neighborhood for each target node by formulating the process of removing noisy neighborhoods as a Markov decision process and learning a policy with task-specific rewards received from the representation learning phase. In the representation learning phase, GDPNet aggregates features from signal neighbors to generate node representations for down-stream tasks, and provides task-specific rewards to the signal neighbor selection phase. These two phases are jointly trained to select optimal sets of neighbors for target nodes with maximum cumulative task-specific rewards, and to learn robust representations for nodes. Experimental results on the node classification task demonstrate the effectiveness of GDPNet, which outperforms state-of-the-art graph representation learning methods on several well-studied datasets. Additionally, GDPNet is mathematically equivalent to solving the submodular maximization problem, which theoretically guarantees that the solution of GDPNet is bounded with respect to the optimal solution.


I Introduction

Recently, remarkable progress has been made toward graph representation learning, a.k.a. graph/network embedding, which solves the graph analytics problem by mapping nodes in a graph to low-dimensional vector representations while effectively preserving the graph structure [14, 4, 7, 34]. Graph neural networks (GNNs) have been widely applied in graph analysis due to their ground-breaking performance with deep architectures and recent advances in optimization techniques [27, 36]. Existing representation learning methods based on GNNs, e.g. GraphSAGE [13], Graph Convolution Networks (GCNs) [17, 6] and Graph Attention Networks (GATs) [31], rely on the aggregation of neighborhood information, which makes the model vulnerable to noise in the input graph.

Some examples of such noise are as follows:

  • In knowledge graphs or open information extraction systems, spurious information may produce erroneous links between nodes. Likewise, incomplete information may lead to missing links.

  • In task-driven graph analysis, mislabeled samples or cross-class links can be viewed as noise in the node classification task.

  • Node features such as user profiles in social networks are often missing, or filled with obsolete or incorrect values.

Fig. 1: The framework of GDPNet with one-hop neighborhood aggregation

Fig. 2: Illustration of the GDPNet model from the view of signal neighbor selection

Good graph representations are expected to be robust to erroneous links, mislabeled nodes and partially corrupted features in the input graph, and to capture geometric dependencies among nodes in the graph. However, existing approaches have devoted limited effort to studying robustness in this regard. To overcome this limitation of graph representation learning in handling noisy graph data, we propose Graph Denoising Policy Network, denoted as GDPNet, to learn robust representations through reinforcement learning. GDPNet includes two phases: signal neighbor selection and representation learning. It first selects signal neighbors for each node, and then aggregates the information from the selected neighbors to learn node representations with respect to the down-stream tasks.

The major challenge is how to train these two phases jointly, particularly when the model has no explicit knowledge about where the noise might be. We address this challenge by formulating the graph denoising process as a Markov decision process. Intuitively, although we have no explicit supervision for the signal neighbor selection, we can measure the performance of the representations learned with the selected neighbors on tasks like node classification; the task-specific rewards received from the representation learning phase can then be used for trial-and-error search. In the signal neighbor selection phase, as shown in Fig. 2, GDPNet optimizes the neighborhood for each node by formulating the process of removing the noisy neighbors as a Markov decision process and learning a policy with the task-specific rewards received from the representation learning phase. In the representation learning phase, GDPNet trains a set of aggregator functions that accumulate feature information from the selected signal neighbors of each target node to generate node representations for down-stream tasks. Thus, at test time, the representations of unseen nodes can be generated by the trained GDPNet from the graph structure and the associated node feature information. The task-specific rewards computed w.r.t. the down-stream tasks are passed to the signal neighbor selection phase. These two phases are jointly trained to select optimal sets of neighbors for target nodes with maximum cumulative task-specific rewards, and to learn robust representations for nodes.

We evaluate GDPNet on node classification benchmarks, which test GDPNet's ability to generate useful representations on unseen data. Experimental results show that GDPNet outperforms state-of-the-art graph representation learning baselines on several well-studied datasets by a large margin, which demonstrates the effectiveness of our approach. In summary, our contributions in this work include:

  • We propose a novel model, GDPNet, for robust graph representation learning through reinforcement learning. GDPNet consists of two phases, namely signal neighbor selection and representation learning, which enable GDPNet to effectively learn node representations from noisy graph data.

  • We formulate signal neighbor selection as a reinforcement learning problem, which enables the model to perform graph denoising just with weak supervision from the task-specific reward signals.

  • GDPNet is able to generate representations for unseen nodes in an inductive fashion, which leverages both graph structure and the associated node feature information.

  • GDPNet is proved to be mathematically equivalent to solving the submodular maximization problem, which guarantees that our solution can be bounded w.r.t. the optimal solution.

The rest of this work is organized as follows. Section II reviews the preliminaries of graph neural networks and reinforcement learning. Section III formally defines the graph representation learning problem through reinforcement learning, along with the description of GDPNet. Section IV discusses the relation of GDPNet to the submodular maximization problem. Section V evaluates GDPNet on node classification tasks using various real-world graph data. Section VI briefly surveys related work on graph representation learning and reinforcement learning on graphs, followed by the conclusion in Section VII.

II Preliminaries

II-A Graph Neural Network

Graph neural networks (GNNs) encode nodes in a low-dimensional space so that similarity in this space approximates similarity in the original graph [36]. GNNs generate node representations based on local neighborhoods, that is, by aggregating information from each node's neighbors using neural networks. Mathematically, the basic GNN can be formulated as follows,

$$h_v^0 = x_v, \qquad h_v^k = \sigma\Big( W_k \sum_{u \in N(v)} \frac{h_u^{k-1}}{|N(v)|} + B_k h_v^{k-1} \Big),$$

where $N(v)$ is the neighborhood set of node $v$, $x_v$ represents the node feature vector, $h_v^k$ represents the $k$-th layer embedding of node $v$, $\sigma$ is the non-linear activation function (e.g. ReLU or tanh), and $W_k$ and $B_k$ are the parameters to be learned.
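As an illustration of the neighborhood aggregation described above, the following is a minimal pure-Python sketch of one mean-aggregation layer. The function names (`matvec`, `gnn_layer`), the dictionary-based graph layout and the default `tanh` activation are assumptions made for this example; it is not the authors' implementation.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gnn_layer(h, neighbors, W, B, act=math.tanh):
    """Apply one mean-aggregation layer to every node.

    h:         dict node -> embedding from the previous layer
    neighbors: dict node -> list of neighbor nodes
    W, B:      weight matrices for the neighbor mean and the node itself
               (illustrative parameters, fixed by the caller here)
    """
    out = {}
    for v, h_v in h.items():
        nbrs = neighbors.get(v, [])
        dim = len(h_v)
        if nbrs:
            mean = [sum(h[u][i] for u in nbrs) / len(nbrs) for i in range(dim)]
        else:
            # Assumption: an isolated node aggregates a zero vector.
            mean = [0.0] * dim
        z = [a + b for a, b in zip(matvec(W, mean), matvec(B, h_v))]
        out[v] = [act(zi) for zi in z]
    return out
```

Because every neighbor contributes equally to the mean, a single mislabeled or spurious neighbor shifts the embedding directly, which is the sensitivity to noise that motivates selecting neighbors before aggregating.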

II-B Reinforcement Learning

The goal of reinforcement learning is to learn a policy that obtains the maximum cumulative reward by making multi-step decisions in a Markov decision process. Policy gradient is a main approach to solving reinforcement learning problems: it directly optimizes the policy by calculating the gradient of the cumulative reward and performing gradient ascent. Specifically, the policy is modeled as $\pi_\theta(a \mid s)$. To estimate the parameter $\theta$ of $\pi_\theta$, policy gradient methods maximize the expected cumulative reward from the start state to the end of the decision process. The objective function of reinforcement learning is defined as follows:

$$J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_{t=1}^{T} r(s_t, a_t)\Big] \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} r(s_t^i, a_t^i),$$

where $\tau$ denotes the trajectories generated by $\pi_\theta$, each of which consists of states $s_t$, actions $a_t$ and rewards $r(s_t, a_t)$, $N$ is the number of trajectory samples, and $T$ is the length of the decision process. Policy gradient optimizes the parameter $\theta$ via gradient ascent, where the gradient is calculated by the policy gradient theorem [29]:

$$\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_t^i \mid s_t^i) \Big(\sum_{t'=t}^{T} r(s_{t'}^i, a_{t'}^i)\Big).$$
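To make the gradient-ascent update concrete, here is a minimal sketch of a score-function (REINFORCE-style) update for a one-step decision with a Bernoulli policy. The single-logit parameterization, the learning rate, the step count and the name `reinforce_bernoulli` are all assumptions chosen for illustration; the policy in this paper is a neural network over states rather than a single scalar.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reinforce_bernoulli(reward_fn, steps=500, lr=0.1, seed=0):
    """Learn P(action = 1) for a one-step problem by gradient ascent.

    reward_fn: maps an action in {0, 1} to a scalar reward
               (hypothetical stand-in for a task-specific reward).
    Returns the learned probability of choosing action 1.
    """
    rng = random.Random(seed)
    theta = 0.0  # logit of the Bernoulli policy, so P(a = 1) starts at 0.5
    for _ in range(steps):
        p = sigmoid(theta)
        action = 1 if rng.random() < p else 0
        reward = reward_fn(action)
        # For a Bernoulli policy with logit theta:
        # d/dtheta log pi(action) = action - p
        grad_log_prob = action - p
        theta += lr * reward * grad_log_prob
    return sigmoid(theta)
```

With a reward that favors action 1, the learned probability drifts toward 1; with a reward that favors action 0, it drifts toward 0. No label for the "correct" action is ever given, only the reward signal.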


III Approach

We formulate the robust graph representation learning problem as sequentially selecting an optimal set of neighbors for each node with maximum cumulative reward signals and aggregating features from nodes’ optimal neighborhoods. In this part, we formally define the problem, the environment setting for signal neighbor selection, and the GDPNet model.

III-A Problem Formulation

Given an attributed graph $G = (V, E, X)$, where $E$ is the edge set and $V$ is the node set, $X$ collects the attribute information for each node, where $x_v$ is a $d$-dimensional attribute vector of node $v$. Note that we can simply use one-hot encoding for node features for a graph without attributes. Given a target node $v$, let $N(v)$ be the one-hop neighbors of $v$.

We aim to find a lower-dimensional representation $h_v$ for node $v$. Firstly, a function is learned to map a neighborhood set $N(v)$ into a signal neighborhood set $\hat{S}(v)$, where $\hat{S}(v) \subseteq N(v)$. Then the node representation $h_v$ is generated based on the signal neighborhood set $\hat{S}(v)$. Given an order of the neighbors $(u_1, \dots, u_n)$, we decompose the conditional probability of the selection decisions given $N(v)$ as $p(a_1, \dots, a_n \mid N(v)) = \prod_{t=1}^{n} p(a_t \mid a_1, \dots, a_{t-1}, N(v))$ using the chain rule [19], where $a_t \in \{0, 1\}$: $a_t = 1$ indicates selecting $u_t$ as a signal neighbor while $a_t = 0$ indicates removing $u_t$. We solve this signal neighbor selection problem by learning a policy $\pi_\theta$ with the neighborhood set and the predicted action values as inputs. The objective of signal neighbor selection is to select a subset of neighbors that maximizes a given reward function $R(\hat{S}(v)) = \sum_t r(s_t, a_t)$, where $\hat{S}(v)$ is the generated signal neighborhood set, $r(s_t, a_t)$ is the task-specific reward used to evaluate the action $a_t$, and $R$ is the cumulative reward function. The representation of node $v$ can then be learned by aggregating the neighborhood information from the signal neighbors $\hat{S}(v)$.

Selecting an optimal subset from a candidate set by maximizing an objective function is NP-hard, but it can be solved approximately by greedy algorithms when the objective is a submodular function [22]. With this observation, we design a reward function that satisfies submodularity, and show that the proposed GDPNet is mathematically equivalent to solving the submodular maximization problem. Thus our solution can be bounded by $(1 - 1/e)\,R(S^*)$, where $S^*$ is the optimal neighborhood set.

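Notation            Description
$N(v)$              neighborhood set of node $v$
$\hat{S}(v)$        signal neighborhood set of node $v$
$\hat{S}_t(v)$      signal neighborhood set of node $v$ at time $t$
$\bar{S}_t(v)$      complementary set, $\bar{S}_t(v) = N(v) \setminus \hat{S}_t(v)$
$h_v^t$             embedding of target node $v$ at time $t$
$x_v$               feature vector of node $v$
$s_t$               the states, $s_t = [h_v^t, h_{u_t}]$, where $u_t$ is the $t$-th neighbor of $v$
$r(s_t, a_t)$       reward function at time $t$
$R$                 total reward function, $R = \sum_t r(s_t, a_t)$
$\mathcal{A}$       action space $\{0, 1\}$, where $1$ represents neighbor selection and $0$ represents neighbor removal
$\pi_\theta$        the policy which maps the current state into the action distribution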
TABLE I: Notation description

III-B Signal Neighbor Selection Environment

We formulate the problem of selecting a set of signal neighbors from a given neighborhood set as a Markov decision process (MDP) $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma)$, where $\mathcal{S}$ is the state space, $\mathcal{A}$ is the action space, $\mathcal{P}$ is the state transition probability matrix that describes the transition probability of the state after taking an action, $\mathcal{R}$ is the reward function and $\gamma$ is the discount factor of the MDP. The signal neighbor selection process can be described by a trajectory with $T$ time steps, $(s_1, a_1, r_1, \dots, s_T, a_T, r_T)$. An MDP requires the state transition dynamics to satisfy the Markov property $p(s_{t+1} \mid s_t, a_t, \dots, s_1, a_1) = p(s_{t+1} \mid s_t, a_t)$. Thus we learn a policy $\pi_\theta(a_t \mid s_t)$ that only considers the current state $s_t$.

In reinforcement learning, the agent learns a policy via interacting with the environment. The main components (i.e., state, action, and reward) in the signal neighbor selection environment are described as follows,

  • State ($s_t$): The state $s_t$ encodes the information from the current node $v$ and the selected node $u_t$. It is the concatenation of the intermediate embeddings $h_v^t$ and $h_{u_t}$ of the target node $v$ and the neighbor $u_t$, respectively. The calculation of $h_v^t$ and $h_{u_t}$ is defined in Section III-C. Consequently, a newly selected neighbor updates the embedding of $v$ from $h_v^t$ to $h_v^{t+1}$, which can be viewed as a state transition.

  • Action ($a_t$): Given an order of the neighbors of node $v$, the policy $\pi_\theta$ maps the state $s_t$ into an action $a_t \in \{0, 1\}$ at each time step $t$. $a_t = 1$ indicates that $u_t$ is selected as a signal neighbor, while $a_t = 0$ means that $u_t$ is not selected.

  • Reward ($r$): Our goal is to find an optimal set of signal neighbors from a finite neighborhood set to learn robust graph embeddings for downstream tasks such as node classification, link prediction and node clustering. The downstream tasks can produce task-specific scores as the reward signal for the signal neighbor selection phase. To ensure that the combination of the selected neighbors has maximum cumulative reward, we employ the submodular function framework to define the marginal value reward function:

    $$r(s_t, a_t) = F\big(\mathrm{AGG}(x_v, \hat{S}_{t-1}(v) \cup \{u_t\})\big) - F\big(\mathrm{AGG}(x_v, \hat{S}_{t-1}(v))\big),$$

    where $\mathrm{AGG}$ aggregates both the target node feature and the neighbors' features to update the representations of the target node [13], and $F$ returns the micro-averaged F1 score from the node classification task when $v$ considers $u_t$ as the neighbor.

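The environment updates the state from $s_t$ to $s_{t+1}$ by calculating the representations at time $t+1$. It can be considered as a state transition:

$$s_{t+1} = [h_v^{t+1}, h_{u_{t+1}}].$$

If $a_t = 1$, $\hat{S}_t(v) = \hat{S}_{t-1}(v) \cup \{u_t\}$, otherwise $\hat{S}_t(v) = \hat{S}_{t-1}(v)$.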
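The selection environment can be sketched as a small step function over sets. In the sketch below, `score` is a hypothetical stand-in for the task-specific score (the paper uses the micro-averaged F1 score of node classification), and assigning zero reward to a removal action is an assumption of this example rather than something stated in the text.

```python
def step(selected, candidate, action, score):
    """Apply one selection decision and return (new_selected, reward).

    selected:  frozenset of neighbors chosen so far
    candidate: the neighbor under consideration at this time step
    action:    1 to keep the candidate, 0 to drop it
    score:     hypothetical set function standing in for the task score
    """
    if action not in (0, 1):
        raise ValueError("action must be 0 or 1")
    if action == 1:
        new_selected = selected | {candidate}
        # Marginal value: the change in score caused by adding the candidate.
        return new_selected, score(new_selected) - score(selected)
    # Assumption: dropping a neighbor leaves the set unchanged and earns 0.
    return selected, 0.0


def rollout(order, actions, score):
    """Run a full episode over an ordered list of candidate neighbors."""
    selected = frozenset()
    rewards = []
    for candidate, action in zip(order, actions):
        selected, reward = step(selected, candidate, action, score)
        rewards.append(reward)
    return selected, rewards


def make_toy_score(labels, target_label):
    """Toy score: (same-label count - other-label count) / total neighbors."""
    def score(selected):
        same = sum(1 for u in selected if labels[u] == target_label)
        return (2 * same - len(selected)) / len(labels)
    return score
```

Because each reward is a marginal value, the rewards of an episode telescope: their sum equals the score of the final selected set minus the score of the empty set, which is why maximizing cumulative reward amounts to maximizing the set function itself.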

III-C Graph Denoising Policy Network

With the definitions of the signal neighbor selection environment, we introduce the GDPNet model, which includes two phases: signal neighbor selection and representation learning. Given a target node $v$, GDPNet first takes its neighborhood set $N(v)$ as input and outputs a signal neighborhood subset $\hat{S}(v)$. Then the representation $h_v$ is learned by aggregating the information from the signal neighborhood subset $\hat{S}(v)$.

III-C1 Determine the Neighborhood Order

As aforementioned, we use the chain rule to decompose the signal neighbor selection into a sequential decision making process. However, this requires an order in which to make decisions. Here we design a high-level policy to learn an order for the policy to take actions.

We define a regret score for each neighbor to help determine the order. A neighbor with a large regret score will be selected with higher probability. At each time step, we calculate the regret score $l_i$ of each neighbor and sample one of the neighbors to be the $t$-th neighbor. The regret score is described as follows:


where $u_i$ is the $i$-th neighbor in the neighborhood set $N(v)$ with a random order and $W_1$ and $W_2$ are parameter matrices. To reduce the size of $N(v)$ for computational efficiency, we add an ending neighbor $u_e$ to $N(v)$ for early stopping purposes. When $u_e$ is sampled, the neighborhood selection process of node $v$ stops. We use the Softmax function to normalize the regret scores, and sample one neighbor from the distribution generated by Softmax to be the $t$-th neighbor.

$$u_t \sim \mathrm{Softmax}([l_1, \dots, l_n, l_e]),$$

where $u_t$ is the $t$-th neighbor for signal neighbor selection, $u_t \in N(v) \cup \{u_e\}$. $l_e$ indicates the regret score of the ending neighbor $u_e$. After selecting a neighbor $u_t$, we adopt the policy $\pi_\theta$ to determine whether to select $u_t$ as a signal neighbor. Then $u_t$ will be removed from $N(v)$.
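The ordering step can be illustrated with a short sketch that samples neighbors from a Softmax over regret scores and stops when the ending neighbor is drawn. The scores here are fixed numbers supplied by the caller, which is a simplifying assumption: in the model they are computed from learned parameters at each time step. The names `softmax`, `sample_order` and the `END` sentinel are hypothetical.

```python
import math
import random

END = None  # sentinel standing in for the ending neighbor (assumed name)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_order(regret, end_score, rng):
    """Sample a visiting order of neighbors with early stopping.

    regret:    dict neighbor -> regret score (fixed here for illustration)
    end_score: regret score of the ending neighbor
    rng:       a random.Random instance
    Returns the neighbors in the order they were drawn, up to the stop.
    """
    remaining = dict(regret)
    order = []
    while remaining:
        candidates = list(remaining) + [END]
        probs = softmax([remaining[u] for u in remaining] + [end_score])
        choice = rng.choices(candidates, weights=probs, k=1)[0]
        if choice is END:
            break  # early stop: the rest of the neighborhood is skipped
        order.append(choice)
        del remaining[choice]  # a drawn neighbor is removed from the pool
    return order
```

A very low `end_score` makes the process visit every neighbor, while a very high one stops immediately, so the ending neighbor acts as a learned budget on how many neighbors are examined.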

III-C2 Signal Neighbor Selection

Given the neighbor $u_t$, GDPNet takes an action $a_t$ at time step $t$ to decide whether to select $u_t$. We make $T$ decisions to select the signal neighbors for node $v$. Here the total number of signal neighbors can be automatically determined. As illustrated in Fig. 2, a policy $\pi_\theta$ is learned to map the state $s_t$ to the action $a_t$ at time step $t$, and the corresponding reward $r(s_t, a_t)$ is provided. Our goal is to maximize the total reward of all the actions taken during these time steps, which can be learned by the following policy network,


where $W_1$ and $W_2$ are weight matrices shared with Eq. (7), and the action $a_t$ is sampled from a Bernoulli distribution which is generated by $\pi_\theta(s_t)$.


III-C3 Representation Learning

At each time step, GDPNet calculates the embeddings of the target node $v$ and the $t$-th neighbor $u_t$ as follows,


where $x_v$ and $x_{u_t}$ are the features of node $v$ and $u_t$, respectively. We compute the embedding of neighbor $u_t$ via its own feature $x_{u_t}$, because the goal is to evaluate the individual contribution of $u_t$. In this work we only consider one-hop neighbors for simplicity. The GDPNet model can be easily extended to aggregate the information from multi-hop neighbors with an augmented candidate neighborhood set for selecting the signal neighbors.

As defined in Section III-B, the state at time step $t$, $s_t$, is a concatenation of the intermediate node embeddings $h_v^t$ and $h_{u_t}$. Eventually, the representations $h_v$ and state $s_t$ can be obtained.

III-C4 Iteration-wise Optimization

We consider an iteration-wise optimization approach to optimize the GDPNet model, which optimizes the signal neighbor selection phase and the representation learning phase iteratively to learn the policy $\pi_\theta$ and the representations $h_v$. The representation learning phase aggregates the information from the signal neighbors selected by $\pi_\theta$ to learn an embedding for the target node $v$. Meanwhile, the policy $\pi_\theta$ is trained with the states calculated from these embeddings and the corresponding rewards. In this paper, $\pi_\theta$ is optimized with Proximal Policy Optimization (PPO), one of the widely used policy gradient methods [28].


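where the Kullback–Leibler (KL) divergence penalty is used to control the change of the policy at each iteration to perform a trust region update with a threshold $\delta$. $\pi_{\mathrm{old}}$ and $Q_{\mathrm{old}}$ are the policy and Q-value, respectively, which are saved before the current time step during training. $\rho_{\pi_{\mathrm{old}}}$ is the discounted state distribution defined as,

$$\rho_{\pi_{\mathrm{old}}}(s) = \sum_{t=0}^{\infty} \gamma^t P(s_t = s \mid \pi_{\mathrm{old}}).$$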


IV Connection with Submodular Maximization

The design of the reward function in GDPNet described in Section III-B is inspired by submodular functions. With this carefully designed reward function, we build the connection with the submodular maximization problem, and show that the solution provided by GDPNet can be bounded by $(1 - 1/e)\,R(S^*)$, where $S^*$ is the optimal neighborhood set. In this section we first introduce the key definitions related to submodular functions, followed by the proof of the monotonicity and submodularity properties of the reward function in GDPNet.

IV-A Submodular Reward Function

In this section, we show that given a special form of reward function, the total reward function in GDPNet turns out to be submodular.

Definition 1 (Submodular Function).

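Let $V$ be a finite ground set and $f: 2^V \to \mathbb{R}$ be a set function. The set function $f$ is submodular if it satisfies the diminishing returns property: $f(A \cup \{e\}) - f(A) \ge f(B \cup \{e\}) - f(B)$ for all $A \subseteq B \subseteq V$ and $e \in V \setminus B$. It is monotone if $f(A) \le f(B)$ for all $A \subseteq B \subseteq V$.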
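For small ground sets, both properties in Definition 1 can be checked by brute force. The sketch below enumerates all nested pairs of subsets; the helper names (`subsets`, `is_monotone`, `is_submodular`) and the coverage function used as the example are assumptions for illustration and are unrelated to the reward function used in the experiments.

```python
from itertools import combinations


def subsets(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = list(ground)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_monotone(f, ground):
    """Check f(A) <= f(B) for every A subset of B subset of ground."""
    return all(f(a) <= f(b) for b in subsets(ground) for a in subsets(b))


def is_submodular(f, ground):
    """Check diminishing returns for every A subset of B and e outside B."""
    ground = frozenset(ground)
    for b in subsets(ground):
        for a in subsets(b):
            for e in ground - b:
                if f(a | {e}) - f(a) < f(b | {e}) - f(b):
                    return False
    return True


# Example set function: number of distinct items covered by the chosen keys.
COVERS = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}


def coverage(selected):
    covered = set()
    for key in selected:
        covered |= COVERS[key]
    return len(covered)
```

The coverage function passes both checks, whereas a function such as `lambda s: len(s) ** 2` is monotone but fails the diminishing returns test, since each added element is worth more than the previous one.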

Definition 2 (Submodular Maximization).

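Let $O$ be an optimizer which maps the ground set $V$ to a subset $S$ with size at most $k$:

$$O: V \to S, \quad S \subseteq V, \ |S| \le k.$$

The submodular maximization problem is to find the best possible $S$ satisfying:

$$S^* = \arg\max_{S \subseteq V,\ |S| \le k} f(S).$$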

The reward function in GDPNet is denoted by , which is also named marginal value. Specifically, the reward function in GDPNet can be expressed as:


where . Given this reward function , we can prove that the cumulative reward function is a submodular function.

Proposition 1.

The total reward function is a monotone function, where


is monotone if whenever and .


we have based on Eq. (16). Therefore , and is monotone. ∎

Proposition 2.

The total reward function satisfies the submodularity property. That is,


whenever and


We define,


Then we need to show . Based on the reward definition in Eq. (16), we have,


Assume , and , the above equations can be rewritten as,


We have based on the monotonicity property. Thus we have . ∎

IV-B Equivalence between GDPNet and Submodular Maximization Problem

We will establish the following facts, which together imply the equivalence between GDPNet and submodular maximization problem,

  • The total reward function defined in the signal neighbor selection phase is a submodular function, which is equivalent to .

  • The submodular maximization problem can be formulated as an MDP which is equivalent to GDPNet.

  • The objective function in GDPNet is equivalent to the counterpart in submodular maximization.

Firstly, the goal of submodular maximization is to find the , with the objective function:


where is the cumulative value of each element in set . Let be the selected neighborhood set , then .

Secondly, the submodular maximization problem can be formulated as an MDP where the set with the selected items indicates the state. After adding a new item into , the state is updated to .

Lastly, the objective function of GDPNet also aligns to the optimizer in submodular maximization, where each can be considered as an optimizer in Equation (25):


where is equivalent to the signal neighbor set

Theorem 1.

Greedy gives a $(1 - 1/e)$-approximation for the problem of $\max_{|S| \le k} f(S)$ when $f$ is a monotone submodular function.


Based on the aforementioned equivalence between the objective functions of GDPNet and submodular maximization, we need to show that a $(1 - 1/e)$-approximation solution can be achieved for $\max_{|S| \le k} f(S)$ with a monotone submodular function $f$, which has been proved in [22]. ∎
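The greedy guarantee can be observed on a tiny instance. The following sketch implements the standard greedy procedure for a cardinality constraint and compares it with exhaustive search; the function names and the coverage instance are assumptions for illustration, and this is the classical greedy algorithm rather than the learned GDPNet policy.

```python
import math
from itertools import combinations


def greedy_max(f, ground, k):
    """Pick up to k elements, each time adding the largest marginal gain."""
    selected = frozenset()
    for _ in range(k):
        best, best_gain = None, 0
        for e in sorted(ground - selected):
            gain = f(selected | {e}) - f(selected)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:
            break  # no remaining element improves the objective
        selected = selected | {best}
    return selected


def brute_force_max(f, ground, k):
    """Exhaustively find a best subset of size at most k (small inputs only)."""
    best = frozenset()
    for size in range(k + 1):
        for combo in combinations(sorted(ground), size):
            candidate = frozenset(combo)
            if f(candidate) > f(best):
                best = candidate
    return best


# Toy coverage instance on which greedy is not optimal.
COVER = {"a": {1, 2, 3, 4}, "b": {1, 2, 5}, "c": {3, 4, 6}}


def covered_count(selected):
    items = set()
    for key in selected:
        items |= COVER[key]
    return len(items)


greedy_set = greedy_max(covered_count, set(COVER), k=2)
optimal_set = brute_force_max(covered_count, set(COVER), k=2)
ratio_bound = 1 - 1 / math.e
```

On this instance greedy first takes `"a"`, which blocks the optimal pair `{"b", "c"}`, yet its value still stays above the $(1 - 1/e)$ fraction of the optimum, as Theorem 1 states for monotone submodular functions.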

V Experiment

Experiments are conducted to evaluate the robustness of the representations learned by the proposed GDPNet model. For the quantitative experiments, we focus on two tasks: (1) Robustness Evaluation, where we use the micro-averaged F1 score to evaluate our model against baselines on the node classification task, and (2) Denoising Evaluation, where we evaluate the denoising capability of GDPNet by comparing with baselines running on the denoised graph generated by GDPNet. We use four datasets, Cora, Citeseer, PubMed and DBLP, and split them into training, validation and test sets in the supervised learning scenario, following previous work [13, 31, 6]. For the qualitative experiments, we conduct embedding visualization, which projects the learned high-dimensional representations to a 2D space. In all these experiments, we separate out test data from training and perform predictions on nodes that are not seen during training.
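Since the micro-averaged F1 score serves as the evaluation metric, a short reference computation may be helpful. The sketch below pools true positives, false positives and false negatives over all classes; the function name `micro_f1` is an assumption, and the example labels are made up for illustration rather than taken from any dataset in this paper.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label multi-class predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    # Pool the counts over every class before computing precision/recall.
    for c in labels:
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1
            elif p == c and t != c:
                fp += 1
            elif p != c and t == c:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

When every node has exactly one true label and one predicted label, each error counts once as a false positive and once as a false negative, so the micro-averaged F1 score coincides with plain accuracy.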

Dataset    Nodes    Edges     Classes  Features  Train/Validate/Test
Cora       2,708    5,429     7        1,433     1,208/500/1,000
Citeseer   4,230    5,358     6        602       2,730/500/1,000
PubMed     19,717   44,338    3        500       18,217/500/1,000
DBLP       17,716   105,734   4        1,639     16,216/500/1,000
TABLE II: Basic statistics of datasets

V-A Experimental Setup and Baselines

For all these tasks, we apply a two-layer policy network to select the signal neighbors. The architectural hyper-parameters are optimized on the Cora dataset and shared by the other datasets. The embedding dimension is . The sizes of the two hidden layers in the policy network are and , respectively, with ReLU as the activation function. The batch size is . The discount factor is optimized as for Cora and DBLP, for PubMed and for Citeseer. We compare our method with the following baselines:

  • LR: Logistic regression (LR) model which takes the node features as inputs, and ignores graph structure.

  • GCN [17]: GCN uses the local connection structure of the graph as the filter to perform convolution, where filter parameters are shared over all locations in the graph. We use the inductive version of GCN in this paper for comparison.

  • GAT [31]: GAT utilizes the attention mechanism to enhance the performance of the graph convolutional network by considering the entire neighborhoods.

  • FastGCN [6]: FastGCN considers graph convolutions as integral transforms of embedding functions, and samples the neighborhoods in each layer independently to address the recursive expansion of neighborhoods.

  • GraphSAGE [13]: GraphSAGE extends the original graph convolution-style framework to the inductive setting. It randomly samples a fixed-size neighborhood of each node and then applies a specific aggregator over it.

Our proposed model is denoted as GDPNet. We also introduce a variant of GDPNet which performs the signal neighbor selection with a random order of the neighbors.

Fig. 3: Visualizations of the compared methods on Cora. Surviving panel titles: (a)-(d) Cora, (e)-(h) Citeseer and (i)-(l) PubMed, each with GDPNet, GAT, GCN and GraphSAGE in that order; (p) DBLP-GraphSAGE.

V-B Performance Comparison

In this section, we first visualize the node representations learned by different methods, followed by the performance comparison on the node classification task. Additionally, we show the distributions of the signal neighbors selected by GDPNet on different datasets.

V-B1 Embedding Visualization

Node representations are learned by GAT, GCN, GraphSAGE and GDPNet on the test set of Cora, and visualized with t-SNE [20], as shown in Fig. 3. Different colors in the figure represent different categories in Cora. The following observations can be made from Fig. 3,

  • GDPNet correctly detects the classes in Cora, providing empirical evidence for the effectiveness of our method. This can be seen from the clear gap between samples with different colors. It also suggests that removing the noisy neighbors can help nodes learn better representations.

  • GCN and GraphSAGE share a similar "shape" in the 2D space. The reason is that, in the inductive learning setting, GCN and GraphSAGE use the same method for neighborhood sampling. GAT considers the entire neighborhoods, which leads to a different visualization result from the others. It can be seen that the sampled neighbors have a profound effect on the representations.

  • GAT cannot identify different classes as effectively as the other methods. A possible reason is that it considers all the neighbors with attention weights, which easily introduces noisy neighbors.

Method Cora PubMed DBLP Citeseer
TABLE III: Summary of node classification results in terms of micro-averaged F1 score, for Cora, Citeseer, PubMed and DBLP
Method Cora PubMed DBLP Citeseer
TABLE IV: Node classification results on original graph and the denoised graph by GDPNet, measured with micro-averaged F1 score
Fig. 4: The distribution of the selected signal neighbor percentages.

V-B2 Results on Node Classification

In this part, we compare the performance of GDPNet against the baselines on Cora, Citeseer, PubMed and DBLP. For all methods, we run the experiments with random seeds over trials and record the mean and standard variance of the micro-averaged F1 scores. The results are summarized in Table III. From the table we observe that,

  • GDPNet consistently outperforms the other methods, which suggests that there exists a set of noisy neighbors in each dataset for the node classification task, and that GDPNet can learn robust embeddings by effectively removing these noisy neighbors.

  • GCN, FastGCN and GraphSAGE show lower F1 scores. The reason is that these methods randomly sample a subset of neighbors for representation learning, which makes it hard to avoid the noisy neighbors. In addition, random sampling leads to higher variance.

  • GAT learns the importance of the neighbors with attention weights, which is also sensitive to noisy data according to the reported results.

  • Another interesting observation is that logistic regression achieves better performance than the other baselines on PubMed, which indicates that there may be fewer signal neighbors for the nodes in PubMed. This observation is also supported by Fig. 4.

  • The random-order variant of GDPNet has a lower F1 score with higher variance than GDPNet, which demonstrates that the order of the decisions has an effect on the performance of representation learning. Thus learning an order for the neighbors is beneficial for selecting signal neighbors and for robust graph representation learning.

Fig. 5: Parameters analysis
Fig. 6: Performance on different percentage of training data
Fig. 7: Convergence analysis

V-B3 Distribution of the Selected Neighbors

Fig. 4 shows the distribution of the selected neighbor percentages, where the x-axis indicates the percentage of the nodes that have been selected as signal neighbors, and the y-axis indicates the probability density. We observe that most of the neighbors in Citeseer and DBLP are selected while only a few neighbors are selected in PubMed. The results suggest that there may be more "noisy" citations (e.g. cross-field citations) in PubMed than in Citeseer and DBLP. Interestingly, most of the research papers collected in Citeseer and DBLP are from computer science, while PubMed collects papers from biomedicine.

V-C Ablation Study

V-C1 Node classification performance comparison on selected signal neighbors

In this part, we evaluate the effectiveness of the denoising process in GDPNet. Specifically, we first utilize the policy learned by GDPNet to remove the noisy neighbors from Citeseer and PubMed. With the denoised graphs, we learn representations with GCN and GraphSAGE to see whether the performance can be improved on the denoised graphs. The results are summarized in Table IV, where the suffix “” indicates the results on the denoised graphs generated by GDPNet. As expected, both GCN and GraphSAGE achieve better performance on the denoised graphs, which demonstrates the effectiveness of the denoising process in GDPNet.

V-C2 Parameter Sensitivity Study

In Fig. 6, we vary the training percentage of nodes in Citeseer and PubMed to test the classification accuracy. We observe that the performance of all the methods improves as the training percentage increases. Additionally, it can be seen that GAT is very sensitive to the percentage of training data, and it requires a larger proportion of training data in order to reach a desirable performance. GraphSAGE, GCN and GDPNet achieve good performance on small training data, and GDPNet improves more as the training data percentage increases.

The discount factor $\gamma$ balances the importance between instant reward and long-term reward. A larger $\gamma$ gives the long-term reward a more important role. Fig. 5 shows that when , Citeseer achieves the best performance, while PubMed achieves best performance when . We can see that Citeseer is more sensitive to the discount factor than PubMed.
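The role of the discount factor can be seen in a short computation of the discounted return. The function name `discounted_return` and the reward sequences are assumptions chosen for illustration; they are not values from the experiments.

```python
def discounted_return(rewards, gamma):
    """Compute sum over t of gamma**t * rewards[t], with t starting at 0."""
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# A reward that only arrives at the last of three steps.
late_reward = [0.0, 0.0, 1.0]
patient = discounted_return(late_reward, gamma=0.9)
myopic = discounted_return(late_reward, gamma=0.1)
```

With a large discount factor the late reward keeps most of its value, while with a small one it is almost ignored, so the choice of discount factor controls how much a selection decision is credited for rewards that appear only after later neighbors are processed.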

Fig. 5 presents the analysis on the number of epochs for the representation learning phase. It can be seen from the figure that, with the increase of epochs (between and ), the performances of PubMed and Citeseer are both improved. The epochs to achieve best performance are and for PubMed and Citeseer, respectively.

V-C3 Convergence Analysis

Fig. 7 shows the convergence analysis of GDPNet on Citeseer and PubMed. We initialize the policy randomly when epoch equals , and the neighbors are randomly selected as signal neighbors. We observe that Citeseer converges faster than PubMed. One explanation would be that PubMed has more nodes than Citeseer, which requires more time to explore the policy for nodes.

VI Related Work

In this section, we briefly describe previous graph representation learning approaches, including matrix factorization based methods and graph neural network based methods, as well as recent advances in applying reinforcement learning to graphs.

VI-A Graph Representation Learning

Graph representation learning tries to encode graph structure information into vector representations. The main idea is to learn a mapping function from nodes or entire graphs into an embedding space where the geometric relationships in the low-dimensional space correspond to those in the original graph. The methods can be grouped into two categories: matrix factorization based methods and graph neural network based methods [14].

VI-A1 Matrix Factorization based Embedding

Matrix factorization based methods learn an embedding look-up table, training a unique embedding vector for each node independently. These methods largely focus on matrix-factorization approaches and random-walk approaches [14, 11, 4, 7]. Matrix-factorization approaches utilize dimension reduction to learn the representations [5, 1, 24] with a loss based on node-pair similarity. Inspired by the success of natural language processing [21], a set of methods use random walks to learn node embeddings, where node similarity is calculated from co-occurrence statistics of sentence-like vertex sequences generated by random walks among connected vertices [25, 32, 12, 30, 9]. Random-walk based methods have been shown to fit into the matrix factorization framework [26].
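The idea of sentence-like vertex sequences can be illustrated with a minimal sketch that generates uniform random walks and counts how often two vertices co-occur within a window. This is a simplified illustration in the spirit of the random-walk methods cited above, not a reimplementation of any of them; the function names, the parameters `walk_length`, `walks_per_node` and `window`, and the toy graph are assumptions of this example.

```python
import random
from collections import Counter


def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks; each walk is a list of vertices."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adj:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks


def cooccurrence(walks, window):
    """Count symmetric co-occurrences of vertices within `window` steps."""
    counts = Counter()
    for walk in walks:
        for i, u in enumerate(walk):
            for v in walk[i + 1 : i + 1 + window]:
                counts[(u, v)] += 1
                counts[(v, u)] += 1
    return counts


# Toy path graph (hypothetical): a - b - c.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
walks = random_walks(adj, walk_length=5, walks_per_node=4)
counts = cooccurrence(walks, window=2)
```

The resulting counts play the role of word co-occurrence statistics in language models; an embedding method would then fit node vectors so that frequently co-occurring vertices are close in the embedding space.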

VI-A2 Graph Neural Network based Embedding

A set of graph neural network based embedding methods have been proposed recently for representation learning [3, 10, 18, 23]. GCN [17] first proposes the first-order graph convolution layer to perform recursive neighborhood aggregation based on local connections. Instead of utilizing the full graph Laplacian during training as in GCN, GraphSAGE [13] considers the inductive setting to handle large-scale graphs with batch training and neighborhood sampling. Following GraphSAGE, self-attention mechanisms have been explored to enhance representation learning performance [31, 35]. To accelerate the training of GCNs, [6] samples the nodes in each layer independently, while [15] samples the lower layer conditioned on the top one, with the sampled neighborhoods shared by different parent nodes. In this work, we propose to find an effective subset of neighbors for learning robust representations.
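For intuition, a single first-order neighborhood aggregation step in the style of GCN [17] can be sketched as follows. The sketch applies only the symmetrically normalized propagation with self-loops and omits the learnable weight matrix and the nonlinearity; the function name `gcn_aggregate` and the dictionary-based data layout are assumptions made for this example.

```python
import math


def gcn_aggregate(adj, features):
    """One propagation step: h_u = sum over v in N(u) + {u} of x_v / sqrt(d_u * d_v).

    adj: dict node -> list of neighbors (undirected graph, no self-loops).
    features: dict node -> list of floats (all of the same length).
    Degrees d_u are counted after adding a self-loop to every node.
    """
    nbrs = {u: set(vs) | {u} for u, vs in adj.items()}  # add self-loops
    deg = {u: len(vs) for u, vs in nbrs.items()}
    dim = len(next(iter(features.values())))
    out = {}
    for u in nbrs:
        agg = [0.0] * dim
        for v in nbrs[u]:
            weight = 1.0 / math.sqrt(deg[u] * deg[v])
            for k in range(dim):
                agg[k] += weight * features[v][k]
        out[u] = agg
    return out


# Toy graph (hypothetical): two connected nodes with scalar features.
adj = {0: [1], 1: [0]}
features = {0: [1.0], 1: [3.0]}
hidden = gcn_aggregate(adj, features)
```

Since every neighbor contributes to the sum, a noisy edge directly perturbs the aggregated representation, which is the sensitivity that neighbor selection aims to reduce.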

VI-B Reinforcement Learning on Graph

Reinforcement learning solves sequential decision-making problems with the goal of maximizing the cumulative reward of the decisions. A number of works have used reinforcement learning to solve sequential decision-making problems on graphs, such as minimum vertex cover, maximum cut and the travelling salesman problem [16, 2]. You et al. [33] treated molecular graph generation as a sequential decision-making process in which the reward function is designed by non-differentiable rules. Dai et al. [8] used reinforcement learning to learn an attack policy that makes multiple decisions (deleting or adding edges) to attack the graph.

VII Conclusion

In this paper, we developed a novel framework, GDPNet, to learn robust representations from noisy graph data through reinforcement learning. GDPNet includes two phases: signal neighbor selection and representation learning. It learns a policy to sequentially select the signal neighbors for each node, and then aggregates the information from the selected neighbors to learn node representations for the down-stream tasks. These two learning phases are complementary and together achieve significant improvement. We show that, with the carefully designed reward function, our method is mathematically equivalent to maximizing a submodular function, which guarantees our objective value can be bounded by . Note that GDPNet is naturally an inductive model which can generate representations for unseen nodes. Experiments on a set of well-studied datasets provide empirical evidence for our analytical results, and yield significant gains in performance over state-of-the-art baselines.


  • [1] A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A. J. Smola (2013) Distributed large-scale natural graph factorization. In WWW, pp. 37–48. Cited by: §VI-A1.
  • [2] I. Bello, H. Pham, Q. V. Le, M. Norouzi, and S. Bengio (2017) Neural combinatorial optimization with reinforcement learning. In ICLR, Cited by: §VI-B.
  • [3] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun (2013) Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203. Cited by: §VI-A2.
  • [4] H. Cai, V. W. Zheng, and K. C. Chang (2018) A comprehensive survey of graph embedding: problems, techniques, and applications. TKDE 30 (9), pp. 1616–1637. Cited by: §I, §VI-A1.
  • [5] S. Cao, W. Lu, and Q. Xu (2015) Grarep: learning graph representations with global structural information. In Proceedings of the 24th ACM international on conference on information and knowledge management, pp. 891–900. Cited by: §VI-A1.
  • [6] J. Chen, T. Ma, and C. Xiao (2018) Fastgcn: fast learning with graph convolutional networks via importance sampling. arXiv preprint arXiv:1801.10247. Cited by: §I, 4th item, §V, §VI-A2.
  • [7] P. Cui, X. Wang, J. Pei, and W. Zhu (2018) A survey on network embedding. IEEE Transactions on Knowledge and Data Engineering. Cited by: §I, §VI-A1.
  • [8] H. Dai, H. Li, T. Tian, X. Huang, L. Wang, J. Zhu, and L. Song (2018) Adversarial attack on graph structured data. In ICML, Cited by: §VI-B.
  • [9] Y. Dong, N. V. Chawla, and A. Swami (2017) Metapath2vec: scalable representation learning for heterogeneous networks. In KDD, pp. 135–144. Cited by: §VI-A1.
  • [10] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams (2015) Convolutional networks on graphs for learning molecular fingerprints. In NeurIPS, pp. 2224–2232. Cited by: §VI-A2.
  • [11] P. Goyal and E. Ferrara (2018) Graph embedding techniques, applications, and performance: a survey. Knowledge-Based Systems 151, pp. 78–94. Cited by: §VI-A1.
  • [12] A. Grover and J. Leskovec (2016) Node2vec: scalable feature learning for networks. In KDD, pp. 855–864. Cited by: §VI-A1.
  • [13] W. Hamilton, Z. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In NeurIPS, pp. 1024–1034. Cited by: §I, 3rd item, 5th item, §V, §VI-A2.
  • [14] W. L. Hamilton, R. Ying, and J. Leskovec (2017) Representation learning on graphs: methods and applications. arXiv preprint arXiv:1709.05584. Cited by: §I, §VI-A1, §VI-A.
  • [15] W. Huang, T. Zhang, Y. Rong, and J. Huang (2018) Adaptive sampling towards fast graph representation learning. In NeurIPS, pp. 4558–4567. Cited by: §VI-A2.
  • [16] E. Khalil, H. Dai, Y. Zhang, B. Dilkina, and L. Song (2017) Learning combinatorial optimization algorithms over graphs. In NeurIPS, pp. 6348–6358. Cited by: §VI-B.
  • [17] T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §I, 2nd item, §VI-A2.
  • [18] Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel (2015) Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493. Cited by: §VI-A2.
  • [19] W. Liu and I. Tsang (2015) On the optimality of classifier chain for multi-label classification. In NeurIPS, pp. 712–720. Cited by: §III-A.
  • [20] L. v. d. Maaten and G. Hinton (2008) Visualizing data using t-sne. Journal of machine learning research 9 (Nov), pp. 2579–2605. Cited by: §V-B1.
  • [21] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013) Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pp. 3111–3119. Cited by: §VI-A1.
  • [22] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher (1978) An analysis of approximations for maximizing submodular set functions—i. Mathematical programming 14 (1), pp. 265–294. Cited by: §III-A, §IV-B.
  • [23] M. Niepert, M. Ahmed, and K. Kutzkov (2016) Learning convolutional neural networks for graphs. In ICML, pp. 2014–2023. Cited by: §VI-A2.
  • [24] M. Ou, P. Cui, J. Pei, Z. Zhang, and W. Zhu (2016) Asymmetric transitivity preserving graph embedding. In KDD, pp. 1105–1114. Cited by: §VI-A1.
  • [25] B. Perozzi, R. Al-Rfou, and S. Skiena (2014) Deepwalk: online learning of social representations. In KDD, pp. 701–710. Cited by: §VI-A1.
  • [26] J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang (2018) Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 459–467. Cited by: §VI-A1.
  • [27] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini (2008) The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80. Cited by: §I.
  • [28] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347. Cited by: §III-C4.
  • [29] R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour (2000) Policy gradient methods for reinforcement learning with function approximation. In NeurIPS, pp. 1057–1063. Cited by: §II-B.
  • [30] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei (2015) Line: large-scale information network embedding. In WWW, pp. 1067–1077. Cited by: §VI-A1.
  • [31] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio (2017) Graph attention networks. arXiv preprint arXiv:1710.10903. Cited by: §I, 3rd item, §V, §VI-A2.
  • [32] C. Yang, Z. Liu, D. Zhao, M. Sun, and E. Chang (2015) Network representation learning with rich text information. In IJCAI, Cited by: §VI-A1.
  • [33] J. You, B. Liu, Z. Ying, V. Pande, and J. Leskovec (2018) Graph convolutional policy network for goal-directed molecular graph generation. In NeurIPS, pp. 6410–6421. Cited by: §VI-B.
  • [34] W. Yu, C. Zheng, W. Cheng, C. C. Aggarwal, D. Song, B. Zong, H. Chen, and W. Wang (2018) Learning deep network representations with adversarially regularized autoencoders. In KDD, pp. 2663–2671. Cited by: §I.
  • [35] J. Zhang, X. Shi, J. Xie, H. Ma, I. King, and D. Yeung (2018) Gaan: gated attention networks for learning on large and spatiotemporal graphs. arXiv preprint arXiv:1803.07294. Cited by: §VI-A2.
  • [36] J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, and M. Sun (2018) Graph neural networks: a review of methods and applications. arXiv preprint arXiv:1812.08434. Cited by: §I, §II-A.