Generative Graph Convolutional Network for Growing Graphs

03/06/2019 ∙ by Da Xu, et al.

Modeling the generative process of growing graphs has wide applications in social networks and recommendation systems, where the cold start problem leads to new nodes isolated from the existing graph. Despite the emerging literature on learning graph representations and graph generation, most existing methods cannot handle isolated new nodes without nontrivial modifications. The challenge arises from the fact that learning to generate representations for nodes in the observed graph relies heavily on topological features, whereas for new nodes only node attributes are available. Here we propose a unified generative graph convolutional network that learns node representations for all nodes adaptively in a generative model framework, by sampling graph generation sequences constructed from observed graph data. We optimize over a variational lower bound that consists of a graph reconstruction term and an adaptive Kullback-Leibler divergence regularization term. We demonstrate the superior performance of our approach on several benchmark citation network datasets.


1 Introduction

1.1 Background

Real-world graph-structured data often grows dynamically as new nodes emerge over time. However, directly modeling the generation process from observed graph data remains a difficult task due to the complexity of graph distributions. Recently, there have been significant advances in learning graph representations by mapping nodes onto a latent vector space. The latent factors (embeddings), which have simpler geometric structures, can then be used for downstream machine learning analysis such as generating graph structures [1] and various semi-supervised learning tasks [2].

Some early approaches in node embedding such as the graph factorization algorithm [3], Laplacian eigenmaps [4] and HOPE [5] are based on deterministic matrix-factorization techniques. Later approaches arise from random walk techniques that provide stochastic measures for analysis, including DeepWalk [6], node2vec [7] and LINE [8]. More recent graph embedding techniques focus on building deep graph convolutional networks (GCN) [9] as encoders that aggregate neighborhood information [10]. Variants of GCN have been proposed to tackle the computational complexity for large graphs, such as FastGCN [11], which applies graph sampling techniques. GraphSAGE [12] is another time-efficient inductive graph representation learning approach that implements localized neighborhood aggregations.

On the other side, advancements in generative models compatible with deep neural networks, such as variational autoencoders (VAE) [13, 14] and generative adversarial networks (GAN) [15], have enabled direct modeling of the generation of complex distributions. As a consequence, there have been several recent works on deep generative models for graphs [16, 17, 18, 19]. However, many of them only deal with fixed graphs [19, 18] or graphs of very small sizes [17, 1]. Moreover, most graph representation learning methods require at least some topological features from all nodes in order to conduct neighborhood aggregations or random walks, which is clearly infeasible for growing graphs with isolated new nodes. To obtain embeddings and further generate graph structures for both new and old nodes, it is essential to utilize node attributes. Also, instead of learning how the observed graph is generated as a whole, the graph generation should be decomposed into sequences that reflect how new nodes are sequentially fitted into existing graph structures.

1.2 Related Methods

Variational Autoencoder Unlike the vanilla autoencoder, VAE treats the latent factors $z$ as random variables such that they can capture variations in the observed data $x$ [13]. VAE has shown high efficiency in recovering complex multimodal data distributions. The parameters of the encoding distribution $q_\phi(z|x)$ and the decoding distribution $p_\theta(x|z)$ are optimized over the evidence (variational) lower bound (ELBO)

$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big] - \mathrm{KL}\big(q_\phi(z|x)\,\|\,p(z)\big).$$

The expectation with respect to $q_\phi(z|x)$ is approximated stochastically by reparametrizing $z$ as $z = \mu + \sigma \odot \epsilon$, where $\epsilon$ are independent standard Gaussian variables. This is also referred to as the 'reparameterization trick' [13]. It allows sampling $z$ directly from $q_\phi(z|x)$ so that the backpropagation technique becomes feasible for training deep networks.
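As a quick illustration, a minimal PyTorch sketch of the reparameterization step (the function name and tensor shapes are ours, not tied to any particular model):

```python
import torch

def reparameterize(mu: torch.Tensor, log_sigma: torch.Tensor) -> torch.Tensor:
    """Sample z = mu + sigma * eps with eps ~ N(0, I), keeping gradients w.r.t. mu and sigma."""
    eps = torch.randn_like(mu)              # independent standard Gaussian noise
    return mu + torch.exp(log_sigma) * eps

# Example: a batch of 4 latent vectors of dimension 8 (mu and log_sigma would come from an encoder).
mu = torch.zeros(4, 8, requires_grad=True)
log_sigma = torch.zeros(4, 8, requires_grad=True)
z = reparameterize(mu, log_sigma)           # differentiable sample from q(z|x)
```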

Graph Convolutional Network The original GCN [9] deals with node classification as a semi-supervised learning task. The layer-wise propagation rule is defined as $H^{(l+1)} = \sigma\big(\hat{A} H^{(l)} W^{(l)}\big)$. Here $\hat{A} = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$ is the normalized adjacency matrix with self-loops, where $\tilde{D}_{ii} = \sum_j (A + I)_{ij}$ gives the degree of node $i$. The $\sigma(\cdot)$ is some activation function such as ReLU. $H^{(l)}$ is the output of the $l$-th layer (with $H^{(0)} = X$, the node attributes) and $W^{(l)}$ is the layer-specific matrix of aggregation weights. Here $W^{(l)} \in \mathbb{R}^{d_l \times d_{l+1}}$, where $d_l$ is the dimension of the hidden units on layer $l$.
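A minimal sketch of this propagation rule (our own illustration with hypothetical tensor shapes, not the reference implementation):

```python
import torch

def normalize_adjacency(adj: torch.Tensor) -> torch.Tensor:
    """Symmetrically normalize A with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + torch.eye(adj.size(0))
    deg = a_hat.sum(dim=1)
    d_inv_sqrt = torch.diag(deg.pow(-0.5))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt

def gcn_layer(a_norm: torch.Tensor, h: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """One propagation step: H^{(l+1)} = ReLU(A_norm H^{(l)} W^{(l)})."""
    return torch.relu(a_norm @ h @ weight)

# Toy example: 3 nodes, 5-dim attributes, 2 hidden units.
adj = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
x = torch.randn(3, 5)
w0 = torch.randn(5, 2)
h1 = gcn_layer(normalize_adjacency(adj), x, w0)
```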

Graph Convolutional Autoencoder (GAE) GAE is an important extension of GCN for learning node representations for link prediction [18]. A two-layer GCN is used as the encoder $Z = \mathrm{GCN}(X, A)$. When adapting GCN into the VAE framework (GCN-VAE), the hidden factors $z_i$ are assumed to follow independent normal distributions which are parameterized by mean $\mu_i$ and log of standard deviation $\log \sigma_i$, where $\mu = \mathrm{GCN}_\mu(X, A)$ and $\log \sigma = \mathrm{GCN}_\sigma(X, A)$. The pairwise decoding (generative) distribution for the link between node $i$ and node $j$ is simply $p(A_{ij} = 1 \mid z_i, z_j) = \sigma(z_i^\top z_j)$, where $\sigma(\cdot)$ is the sigmoid function. The ELBO is formulated as

$$\mathcal{L} = \mathbb{E}_{q_\phi(Z|X,A)}\big[\log p_\theta(A|Z)\big] - \mathrm{KL}\big(q_\phi(Z|X,A)\,\|\,p(Z)\big).$$
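The two ingredients of this objective, the inner-product decoder and the KL term against a standard Gaussian prior, might look as follows in PyTorch (a rough sketch under the diagonal-Gaussian assumption; the function names are ours):

```python
import torch

def inner_product_decoder(z: torch.Tensor) -> torch.Tensor:
    """Pairwise edge probabilities p(A_ij = 1 | z_i, z_j) = sigmoid(z_i^T z_j)."""
    return torch.sigmoid(z @ z.t())

def gaussian_kl(mu: torch.Tensor, log_sigma: torch.Tensor) -> torch.Tensor:
    """KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims and averaged over nodes."""
    return -0.5 * torch.mean(
        torch.sum(1 + 2 * log_sigma - mu.pow(2) - torch.exp(2 * log_sigma), dim=1)
    )

# Example: edge probabilities for 3 nodes with 2-dim latent factors.
probs = inner_product_decoder(torch.randn(3, 2))
```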

GraphRNN Recently a graph generation approach relying only on topological structure was proposed in [16]. It learns the sequential generation mechanism by training on a set of sampled sequences of decomposed graph generation process. The formation of each new edge is conditioned on the graph structure generated so far. Our approach refers to this idea of sampling from decomposed generation sequences.

1.3 Present Work

This work addresses the challenge of generating graph structure for growing graphs with new nodes that are unconnected to the previously observed graph. It has important implications for the cold start problem [20] in social networks and recommender systems. The major assumption is that the underlying generating mechanism is stationary during growth. GraphRNN neither takes advantage of node attributes nor does it naturally extend to isolated new nodes. Most other graph representation learning methods have similar issues; specifically, the isolation from the existing graph hinders message passing or neighborhood aggregation.

We deal with this problem by learning how graph structures are generated sequentially, for cases where both node attributes and topological information exist as well as for cases where only node attributes are available. To the best of our knowledge, this work is the first of its kind in graph signal processing.

2 Method

Let the input be the observed undirected graph $G$ with associated binary adjacency matrix $A \in \{0,1\}^{N \times N}$, node attributes $X \in \mathbb{R}^{N \times D}$, and the new nodes with attributes $\tilde{X} \in \mathbb{R}^{\tilde{N} \times D}$. Our approach learns the generation of the overall adjacency matrix for all $N + \tilde{N}$ nodes.

2.1 Proposed Approach

We start by treating incoming nodes as being added one-by-one into the graph. Let $A^{(t)}$ be the observed adjacency matrix up to the $t$-th step according to the ordering $\pi$. When the $(t+1)$-th node is presented, we treat it as connected to all of the previous $t$ nodes with the same probability $\tilde{p}$, where $\tilde{p}$ may reflect the overall sparsity of the graph. Hence the new candidate adjacency matrix, denoted by $\tilde{A}^{(t+1)}$, is given by

$$\tilde{A}^{(t+1)} = \begin{bmatrix} A^{(t)} & \tilde{p}\,\mathbf{1} \\ \tilde{p}\,\mathbf{1}^\top & 0 \end{bmatrix}. \qquad (1)$$
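A small sketch of how the candidate adjacency matrix in (1) could be assembled (our own illustration; the helper name and the value of p_tilde are hypothetical):

```python
import torch

def candidate_adjacency(a_t: torch.Tensor, p_tilde: float) -> torch.Tensor:
    """Append one incoming node, tentatively connected to all t previous nodes
    with the same probability p_tilde, as in Eq. (1)."""
    t = a_t.size(0)
    a_new = torch.zeros(t + 1, t + 1)
    a_new[:t, :t] = a_t          # keep the observed structure
    a_new[:t, t] = p_tilde       # candidate edges from old nodes to the new node
    a_new[t, :t] = p_tilde       # symmetric, since the graph is undirected
    return a_new

# Example: grow a 2-node observed graph by one candidate node with p_tilde = 0.05.
a_tilde = candidate_adjacency(torch.tensor([[0., 1.], [1., 0.]]), p_tilde=0.05)
```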

Similar to GraphRNN, we obtain the marginal distribution for the graph $G$ by sampling the auxiliary ordering $\pi$ from the joint distribution of $(G, \pi)$, with $p(G) = \sum_{\pi} p(G, \pi)$, where $f_G(\cdot)$ maps the tuple $(A^{\pi}, \pi)$ back to a unique graph $G$. Each sampled $\pi$ gives a sequence of adjacency matrices $\{A^{(t)}\}_t$ that constitutes a one-sample mini-batch driving the stochastic gradient descent (SGD) updates of the parameters.
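For intuition, one sampled ordering and its per-step observed adjacency matrices might be produced as follows (a hypothetical helper of our own, assuming orderings are sampled uniformly at random as discussed later in this section):

```python
import torch

def sample_growth_sequence(adj: torch.Tensor, num_observed: int):
    """Sample one node ordering pi and return the per-step observed adjacency
    matrices A^(t), i.e. a one-sample mini-batch for SGD."""
    n = adj.size(0)
    pi = torch.randperm(n)             # uniform random node ordering
    adj_pi = adj[pi][:, pi]            # permute rows and columns by pi
    # Each step reveals one more node, starting from the initially observed block.
    return [adj_pi[:t, :t] for t in range(num_observed, n + 1)]
```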

To illustrate the sequential generation process, we decompose the joint marginal log-likelihood of $(A, X)$ under the node ordering $\pi$ into

$$\log p\big(A^{\pi}, X^{\pi}\big) = \log p\big(A^{(1)}, X^{(1)}\big) + \sum_{t} \log p\big(A^{(t+1)}, X^{(t+1)} \mid A^{(t)}, X^{(t)}\big). \qquad (2)$$

The log-likelihood term of the initial state is not of interest, as we focus on modeling the transition steps.

Following the VAE framework with hidden factors as Gaussian variables, for each transition step we use an encoding distribution $q_\phi\big(Z^{(t+1)} \mid \tilde{A}^{(t+1)}, X^{(t+1)}\big)$, a generating distribution $p_\theta\big(A^{(t+1)} \mid Z^{(t+1)}\big)$, and a conditional prior $p\big(Z^{(t+1)} \mid A^{(t)}, X^{(t)}\big)$. From now on we treat the conditioning on $X$ as implicit for simplicity of notation. The variational lower bound for each step is given by:

$$\log p\big(A^{(t+1)}, X^{(t+1)} \mid A^{(t)}\big) \ge \mathbb{E}_{q_\phi}\Big[\log p_\theta\big(A^{(t+1)} \mid Z^{(t+1)}\big)\Big] + \mathbb{E}_{q_\phi}\Big[\log p\big(X^{(t+1)} \mid Z^{(t+1)}\big)\Big] - \mathrm{KL}\Big(q_\phi\big(Z^{(t+1)} \mid \tilde{A}^{(t+1)}\big)\,\Big\|\,p\big(Z^{(t+1)} \mid A^{(t)}\big)\Big). \qquad (3)$$

Here $\mathbb{E}_{q_\phi}\big[\log p\big(X^{(t+1)} \mid Z^{(t+1)}\big)\big]$ is the reconstruction term for node attributes, which is not our target. We will discuss the interpretation of our evidence lower bound in Section 2.2. Given that we have assumed the consistency of the underlying generating mechanism, we use the same set of parameters for each step.

When formulating the encoding distribution, due to the efficiency of GCN in node classification and link prediction, we adopt its convolutional layers. The two-layer encoder for the $(t+1)$-th step is then given by:

$$\mu^{(t+1)} = \hat{A}^{(t+1)}\,\sigma\big(\hat{A}^{(t+1)} X^{(t+1)} W_0\big) W_\mu, \qquad \log \sigma^{(t+1)} = \hat{A}^{(t+1)}\,\sigma\big(\hat{A}^{(t+1)} X^{(t+1)} W_0\big) W_\sigma, \qquad (4)$$

where $\sigma(\cdot)$ is the activation function and $\hat{A}^{(t+1)}$ denotes the normalized candidate adjacency matrix constructed by (1). We also adopt the pairwise inner product decoder for edge generation:

$$p_\theta\big(A^{(t+1)}_{ij} = 1 \mid z_i, z_j\big) = \sigma\big(z_i^\top z_j\big), \qquad (5)$$

with $\sigma(\cdot)$ being the sigmoid function. Another reason for using a simple decoder is that, in the VAE framework, if the generative distribution is too expressive, the latent factors are often ignored [21].

As for the conditional priors of the hidden factors, standard Gaussian priors are no longer suitable because we already have information from the previous $t$ nodes at the $(t+1)$-th step. Hence, we use what the model has informed us up to the $t$-th step in an adaptive way by treating $p\big(Z^{(t+1)} \mid A^{(t)}\big)$ as $p\big(Z^{(t+1)}_{1:t} \mid A^{(t)}\big)\,p\big(z^{(t+1)}_{t+1}\big)$, where $Z^{(t+1)}_{1:t}$ are the hidden factors for the previous $t$ nodes and $z^{(t+1)}_{t+1}$ is for the new node. For $Z^{(t+1)}_{1:t}$ we can use the encoding distribution from the previous step, where the candidate adjacency matrix passes information from previous steps. For the new node we keep using the standard Gaussian prior. This gives us

$$p\big(Z^{(t+1)} \mid A^{(t)}\big) = q_\phi\big(Z^{(t+1)}_{1:t} \mid \tilde{A}^{(t)}\big)\;\mathcal{N}\big(z^{(t+1)}_{t+1} \mid 0, I\big). \qquad (6)$$

We use the sum of the negative ELBO over the transition steps as the loss function ($\mathcal{L} = -\sum_{t} \mathrm{ELBO}^{(t)}$) and obtain the optimal aggregation weights by minimizing this loss.
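To make the training objective concrete, here is a rough sketch of one transition-step loss under our reading of (3)-(6) (function and variable names are hypothetical; this is not the authors' released code): binary cross-entropy on the candidate edges, plus the adaptive KL term for the old nodes and a standard-Gaussian KL for the new node, weighted by the $\beta$ introduced in Section 2.2.

```python
import torch
import torch.nn.functional as F

def kl_diag_gaussians(mu_q, log_sig_q, mu_p, log_sig_p):
    """KL( N(mu_q, sig_q^2) || N(mu_p, sig_p^2) ), elementwise and summed."""
    var_q, var_p = torch.exp(2 * log_sig_q), torch.exp(2 * log_sig_p)
    return torch.sum(
        log_sig_p - log_sig_q + (var_q + (mu_q - mu_p).pow(2)) / (2 * var_p) - 0.5
    )

def step_loss(a_true, a_pred, mu, log_sig, mu_prev, log_sig_prev, beta=1.0):
    """Negative per-step ELBO: edge reconstruction + adaptive KL regularization.

    mu, log_sig:            posterior for all t+1 nodes at the current step
    mu_prev, log_sig_prev:  posterior for the first t (old) nodes from the previous step
    """
    recon = F.binary_cross_entropy(a_pred, a_true, reduction="sum")
    t = mu_prev.size(0)
    kl_old = kl_diag_gaussians(mu[:t], log_sig[:t], mu_prev, log_sig_prev)
    kl_new = kl_diag_gaussians(
        mu[t:], log_sig[t:], torch.zeros_like(mu[t:]), torch.zeros_like(log_sig[t:])
    )
    return recon + beta * (kl_old + kl_new)
```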

In practice, it is not necessary to consider adding new nodes one-by-one. Instead, the new nodes can be added in a batch-wise fashion to alleviate computational costs. In preliminary experiments we also observe that sampling uniformly at random from all node permutations gives results very similar to sampling from BFS orderings, hence we report results with the uniform sampling scheme.

Figure 1: An illustration for the workflow of our approach at a transition step. The growing graph has an observed subgraph with three nodes (node 0, 1 and 2) and an incoming new node (node 3). The informative conditional priors for latent factors contain structural information of the observed subgraph. The encoding distributions for all latent factors are formed according to the candidate adjacency matrix where candidate edges (the dashed edges) are added to original subgraph.

2.2 Adaptive Evidence Lower Bound

The loss function can be rearranged into (7) (with the tuning parameter $\beta$ introduced below):

$$\mathcal{L} = \sum_{t}\Big\{-\mathbb{E}_{q_\phi}\Big[\log p_\theta\big(A^{(t+1)} \mid Z^{(t+1)}\big)\Big] + \beta\,\mathrm{KL}\Big(q_\phi\big(Z^{(t+1)} \mid \tilde{A}^{(t+1)}\big)\,\Big\|\,p\big(Z^{(t+1)} \mid A^{(t)}\big)\Big)\Big\}. \qquad (7)$$

The first term sums up the reconstruction loss in each generation step. The second term serves as an adaptive regularizer that enforces the posterior of the latent factors for observed nodes to remain close to their priors, which contain information from previous steps. This can prevent the model from overfitting the edges of the new nodes, which is quite helpful in our batched version where new edges can outnumber original edges, as we are fitting new nodes into the original structure.

Similar to $\beta$-VAE [22], we also introduce the tuning parameter $\beta$, as shown in (7), to control the tradeoff between the reconstruction term and the adaptive regularization.

3 Experiment

Method    Cora AUC      Cora AP       Citeseer AUC   Citeseer AP   Pubmed AUC    Pubmed AP
Isolated new nodes
GCN-VAE   75.12 ± 0.4   76.32 ± 0.3   79.36 ± 0.3    82.13 ± 0.1   85.52 ± 0.2   85.43 ± 0.1
MLP-VAE   75.59 ± 0.7   75.64 ± 0.5   81.76 ± 0.6    83.67 ± 0.4   77.13 ± 0.4   77.24 ± 0.3
G-GCN     83.30 ± 0.3   85.03 ± 0.3   89.54 ± 0.2    91.30 ± 0.2   87.49 ± 0.2   87.24 ± 0.1
Nodes in observed graph
GCN-VAE   93.15 ± 0.4   94.42 ± 0.2   93.27 ± 0.4    94.42 ± 0.1   96.74 ± 0.4   96.94 ± 0.3
MLP-VAE   86.55 ± 0.2   87.21 ± 0.3   87.13 ± 0.2    89.34 ± 0.1   79.39 ± 0.5   79.53 ± 0.3
G-GCN     94.07 ± 0.4   95.15 ± 0.2   94.62 ± 0.7    95.93 ± 0.7   96.96 ± 0.6   97.27 ± 0.5

Table 1: Results for link prediction tasks in citation networks (AUC and AP, ± standard error over 10 runs with random initializations on random dataset splits). The first three rows report the first task, on isolated new nodes; the last three rows report the second task, on nodes in the observed graph.

We test our generative graph convolutional network (G-GCN) for growing graphs on two tasks: link prediction for isolated new nodes, and link prediction for nodes in the observed graph. We use three benchmark citation network datasets: Cora, Citeseer and Pubmed. Their details are described in [23]. Node attributes are informative for all three datasets, as indicated by the results of GCN-VAE in [18].

3.1 Baselines

We compare our approach against GCN-VAE and a multilayer perceptron VAE (MLP-VAE) [13]. Here the encoder of MLP-VAE is constructed by replacing the adjacency matrices in GCN-VAE with non-informative identity matrices. Their decoders are the same as our approach in (5). The difference is that GCN-VAE uses both topological information and node attributes, while MLP-VAE only considers node attributes. When predicting edges for isolated new nodes, for all three methods, we plug the 'candidate' adjacency matrix formulated in (1) with $\tilde{p}$ into the encoder-decoder frameworks and recover the true adjacency matrix.

We choose these two methods to compare with, instead of others, because both of them are able to utilize node attributes and follow the VAE framework. As we mentioned, most other graph embedding and graph generation techniques do not work for growing graphs without nontrivial modifications.

3.2 Experiment Setup

Link prediction for isolated new nodes
For each citation network, a growing graph is constructed by randomly sampling an observed subgraph containing 70% of all nodes. The left-out nodes are treated as isolated new nodes. The subgraph is used for training; the validation and test sets are formed by the edges between nodes in the observed subgraph and the new nodes, as well as the edges among the new nodes, according to the original full graph. As we are treating new nodes as being added in a batch-wise fashion, the size of the new node batch is set to be .
Link prediction for nodes in observed graph
We then test our model on the original link prediction task [18], which predicts the existence of unseen edges between nodes in the observed graph. In this task we adopt their experiment setup, where 10% and 5% of the edges are removed from the training graph and used as the positive validation and test sets respectively. The same number of unconnected node pairs are sampled to constitute the negative examples.

In both tasks we use a 400-dim hidden layer and 200-dim latent variables, and train for 200 iterations using the Adam optimizer with a learning rate of 0.001 for all methods. Notice that all three methods use encoding layers of the same form and their decoding layers are all parameter-free, so they already have the same number of parameters. The implementation of GCN-VAE on the second task uses their official TensorFlow code. The rest are conducted with our own implementations in PyTorch.
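For completeness, a minimal sketch of how the reported metrics can be computed with scikit-learn, given predicted edge probabilities and binary labels for the held-out node pairs (illustrative only; the function name is ours):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def evaluate_link_prediction(edge_probs: np.ndarray, edge_labels: np.ndarray):
    """AUC and AP over held-out positive/negative node pairs."""
    return roc_auc_score(edge_labels, edge_probs), average_precision_score(edge_labels, edge_probs)

# Example with dummy scores for 4 candidate edges (2 true, 2 false).
auc, ap = evaluate_link_prediction(np.array([0.9, 0.7, 0.3, 0.2]), np.array([1, 1, 0, 0]))
```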

3.3 Results

We report area under the ROC curve (AUC) and average precision (AP) scores for each model on the test sets for the two tasks (Table 1).

Firstly, our approach outperforms both baselines in the new-node link prediction task across all three datasets, in terms of both AUC and AP. The comparison with MLP-VAE shows the advantage of learning with topological information, and our better performance over GCN-VAE indicates the importance of modeling the sequential generating process when making predictions on new nodes.

Secondly, G-GCN has comparable or even slightly better results than GCN-VAE on the link prediction task in the observed graph, which suggests that our superior performance on isolated new nodes does not come at the cost of performance on nodes in the observed graph. This is expected, since our approach learns the generation process as the graph structure keeps growing under our sequential training setup, where new nodes are added in each step. It targets nodes in the observed graph as well as new nodes, without overfitting to either. In a nutshell, our approach achieves better performance on the link prediction task for growing graphs as a whole.

4 Conclusion and Future Work

We propose a generative graph convolution model for growing graphs that incorporates graph representation learning and the graph convolutional network into a sequential generative framework. Our approach outperforms the baselines on all benchmark datasets in link prediction for growing graphs.

However, scalability remains a major issue, as the computational complexity depends on the size of the full graph. The idea of localized convolution from GraphSAGE [12] and graph sampling from FastGCN [11] point toward promising directions, which we leave to future work.

References