Adversarially Regularized Graph Autoencoder

02/13/2018 ∙ by Shirui Pan, et al. ∙ University of Technology Sydney ∙ UNSW

Graph embedding is an effective method to represent graph data in a low dimensional space for graph analytics. Most existing embedding algorithms typically focus on preserving the topological structure or minimizing the reconstruction errors of graph data, but they have mostly ignored the data distribution of the latent codes from the graphs, which often results in inferior embedding in real-world graph data. In this paper, we propose a novel adversarial graph embedding framework for graph data. The framework encodes the topological structure and node content in a graph to a compact representation, on which a decoder is trained to reconstruct the graph structure. Furthermore, the latent representation is enforced to match a prior distribution via an adversarial training scheme. To learn a robust embedding, two variants of adversarial approaches, adversarially regularized graph autoencoder (ARGA) and adversarially regularized variational graph autoencoder (ARVGA), are developed. Experimental studies on real-world graphs validate our design and demonstrate that our algorithms outperform baselines by a wide margin in link prediction, graph clustering, and graph visualization tasks.


1 Introduction

Graphs are essential tools to capture and model complicated relationships among data. In a variety of graph applications, including protein-protein interaction networks, social media, and citation networks, analyzing graph data plays an important role in various data mining tasks, including node classification [Kipf and Welling2016a], link prediction [Wang et al.2017c], and node clustering [Wang et al.2017a]. However, the high computational complexity, low parallelizability, and inapplicability of machine learning methods to graph data have made these graph analytic tasks profoundly challenging [Cui et al.2017]. Recently, graph embedding has emerged as a general approach to these problems.

Graph embedding converts graph data into a low dimensional, compact, and continuous feature space. The key idea is to preserve the topological structure, vertex content, and other side information [Zhang et al.2017a]. This new learning paradigm has shifted the tasks of seeking complex models for classification, clustering, and link prediction to learning a robust representation of the graph data, so that any graph analytic task can be easily performed by employing simple traditional models (e.g., a linear SVM for the classification task). This merit has motivated a number of studies in this area [Cai et al.2017, Goyal and Ferrara2017].

Graph embedding algorithms can be classified into three categories: probabilistic models, matrix factorization-based algorithms, and deep learning-based algorithms. Probabilistic models like DeepWalk [Perozzi et al.2014], node2vec [Grover and Leskovec2016], and LINE [Tang et al.2015] attempt to learn graph embedding by extracting different patterns from the graph. The captured patterns or walks include global structural equivalence, local neighborhood connectivity, and various orders of proximity. Compared with classical methods such as Spectral Clustering [Tang and Liu2011], these graph embedding algorithms perform more effectively and are scalable to large graphs.

Matrix factorization-based algorithms, such as GraRep [Cao et al.2015], HOPE [Ou et al.2016], and M-NMF [Wang et al.2017b], pre-process the graph structure into an adjacency matrix and obtain the embedding by decomposing that matrix. Recently it has been shown that many probabilistic algorithms are equivalent to matrix factorization approaches [Qiu et al.2017]. Deep learning approaches, especially autoencoder-based methods, are also widely studied for graph embedding. SDNE [Wang et al.2016] and DNGR [Cao et al.2016] employ deep autoencoders to preserve the graph proximities and model positive pointwise mutual information (PPMI). The MGAE algorithm utilizes a marginalized single-layer autoencoder to learn representations for graph clustering [Wang et al.2017a].

The approaches above are typically unregularized: they mainly focus on preserving structural relationships (probabilistic approaches) or minimizing reconstruction error (matrix factorization or deep learning methods), and they have mostly ignored the data distribution of the latent codes. In practice, unregularized embedding approaches often learn a degenerate identity mapping where the latent code space is free of any structure [Makhzani et al.2015], and can easily result in poor representations when dealing with real-world sparse and noisy graph data. One common way to handle this problem is to introduce regularization to the latent codes and enforce them to follow some prior data distribution [Makhzani et al.2015]. Recently, generative adversarial frameworks [Donahue et al.2016, Zhao et al.2016, Dumoulin et al.2016, Radford et al.2015] have also been developed for learning robust latent representations. However, none of these frameworks is specifically designed for graph data, where both topological structure and content information must be embedded into a latent space.

In this paper, we propose a novel adversarial framework with two variants, namely adversarially regularized graph autoencoder (ARGA) and adversarially regularized variational graph autoencoder (ARVGA), for graph embedding. The theme of our framework is to not only minimize the reconstruction errors of the graph structure but also enforce the latent codes to match a prior distribution. By exploiting both graph structure and node content with a graph convolutional network, our algorithm encodes the graph data in the latent space. With a decoder aiming at reconstructing the topological graph information, we further incorporate an adversarial training scheme to regularize the latent codes and learn a robust graph representation. The adversarial training module aims to discriminate whether the latent codes are from a real prior distribution or from the graph encoder. The graph encoder learning and adversarial regularization learning are jointly optimized in a unified framework so that each benefits the other, finally leading to a better graph embedding. The experimental results on benchmark datasets demonstrate the superb performance of our algorithms on three unsupervised graph analytic tasks, namely link prediction, node clustering, and graph visualization. Our contributions can be summarized below:

  • We propose a novel adversarially regularized framework for graph embedding, which represents topological structure and node content in a continuous vector space. Our framework learns the embedding to minimize the reconstruction error while enforcing the latent codes to match a prior distribution.

  • We develop two variants of adversarial approaches, adversarially regularized graph autoencoder (ARGA) and adversarially regularized variational graph autoencoder (ARVGA) to learn the graph embedding.

  • Experiments on benchmark graph datasets demonstrate that our graph embedding approaches outperform the others on three unsupervised tasks.

2 Related Work

Graph Embedding Models. From the perspective of information exploration, graph embedding algorithms can also be separated into two groups: topological embedding approaches and content enhanced embedding methods.

Topological embedding approaches assume that only topological structure information is available, and the learning objective is to preserve the topological information as much as possible. Inspired by the word embedding approach [Mikolov et al.2013], Perozzi et al. propose a DeepWalk model to learn the node embedding from a collection of random walks [Perozzi et al.2014]. Since then, a number of probabilistic models such as node2vec [Grover and Leskovec2016] and LINE [Tang et al.2015] have been developed. As a graph can be mathematically represented as an adjacency matrix, many matrix factorization approaches such as GraRep [Cao et al.2015], HOPE [Ou et al.2016], and M-NMF [Wang et al.2017b] have been proposed to learn the latent representation for a graph. Recently, deep learning models have been widely exploited to learn the graph embedding. These algorithms preserve the first- and second-order proximities [Wang et al.2016] or reconstruct the positive pointwise mutual information (PPMI) [Cao et al.2016] via different variants of autoencoders.

Content enhanced embedding methods assume node content information is available and exploit both topological information and content features simultaneously. TADW [Yang et al.2015] proved that DeepWalk can be interpreted as a factorization approach and proposed an extension to DeepWalk to explore node features. TriDNR [Pan et al.2016] captures structure, node content, and label information via a tri-party neural network architecture. UPP-SNE employs an approximated kernel mapping scheme to exploit user profile features to enhance the embedding learning of users in social networks [Zhang et al.2017b].

Although these algorithms are well-designed for graph-structured data, they largely ignore the latent distribution of the embedding, which may result in poor representation in practice. In this paper, we explore adversarial training methods to address this issue.

Adversarial Models. Our method is motivated by the generative adversarial network (GAN) [Goodfellow et al.2014]. A GAN plays an adversarial game with two linked models: the generator and the discriminator. The discriminator can be a multi-layer perceptron which discriminates whether an input sample comes from the data distribution or from the generator; simultaneously, the generator is trained to generate samples that convince the discriminator the generated samples come from the prior data distribution. Due to its effectiveness in many unsupervised tasks, a number of algorithms have recently been proposed, including BiGAN [Donahue et al.2016], EBGAN [Zhao et al.2016], and ALI [Dumoulin et al.2016].

Recently, Makhzani et al. proposed an adversarial autoencoder (AAE) to learn the latent embedding by merging the adversarial mechanism into the autoencoder [Makhzani et al.2015]. However, it is designed for general data rather than graph data. Dai et al. applied the adversarial mechanism to graphs, but their approach can only exploit the topological information [Dai et al.2017]. In contrast, our algorithm is more flexible and can handle both topological and content information for graph data.

Figure 1: The architecture of the adversarially regularized graph autoencoder (ARGA). The upper tier is a graph convolutional autoencoder that reconstructs a graph from an embedding Z, which is generated by an encoder exploiting the graph structure A and the node content matrix X. The lower tier is an adversarial network trained to discriminate whether a sample is generated from the embedding or from a prior distribution. The adversarially regularized variational graph autoencoder (ARVGA) is similar to ARGA except that it employs a variational graph autoencoder in the upper tier (see Algorithm 1 for details).

3 Problem Definition and Framework

A graph is represented as G = {V, E, X}, where V is a set of nodes in a graph and e_{i,j} ∈ E represents a linkage encoding the citation edge between nodes v_i and v_j. The topological structure of graph G can be represented by an adjacency matrix A, where A_{ij} = 1 if e_{i,j} ∈ E, otherwise A_{ij} = 0. x_i ∈ X indicates the content features associated with each node v_i.

Given a graph G, our purpose is to map the nodes v_i ∈ V to low-dimensional vectors z_i ∈ R^d with the formal format f : (A, X) → Z, where z_i^T is the i-th row of the matrix Z ∈ R^{n×d}, n is the number of nodes, and d is the dimension of the embedding. We take Z as the embedding matrix, and the embeddings should well preserve both the topological structure A and the content information X.

3.1 Overall Framework

Our objective is to learn a robust embedding Z given a graph G = {V, E, X}. To this end, we leverage an adversarial architecture with a graph autoencoder to directly process the entire graph and learn a robust embedding. Figure 1 demonstrates the workflow of ARGA, which consists of two modules: the graph autoencoder and the adversarial network.

  • Graph convolutional autoencoder. The autoencoder takes the structure A of a graph and the node content X as inputs to learn a latent representation Z, and then reconstructs the graph structure A from Z.

  • Adversarial regularization. The adversarial network forces the latent codes to match a prior distribution through an adversarial training module, which discriminates whether the current latent code z_i comes from the encoder or from the prior distribution.

4 Proposed Algorithm

4.1 Graph convolutional autoencoder

The graph convolutional autoencoder aims to embed a graph G = {V, E, X} in a low-dimensional space. Two key questions arise: (1) how can both graph structure and node content be integrated in an encoder, and (2) what sort of information should be reconstructed via a decoder?

Graph Convolutional Encoder Model G(X, A).   To represent both graph structure A and node content X in a unified framework, we develop a variant of the graph convolutional network (GCN) [Kipf and Welling2016a] as a graph encoder. Our graph convolutional network extends the operation of convolution to graph data in the spectral domain, and learns a layer-wise transformation by a spectral convolution function f(Z^{(l)}, A | W^{(l)}):

Z^{(l+1)} = f(Z^{(l)}, A | W^{(l)}).    (1)

Here, Z^{(l)} is the input for convolution, and Z^{(l+1)} is the output after convolution. We have Z^{(0)} = X ∈ R^{n×m} (n nodes and m features) for our problem. W^{(l)} is a matrix of filter parameters we need to learn in the neural network. If f(Z^{(l)}, A | W^{(l)}) is well defined, we can build arbitrarily deep convolutional neural networks efficiently.

Each layer of our graph convolutional network can be expressed with the function f(Z^{(l)}, A | W^{(l)}) as follows:

f(Z^{(l)}, A | W^{(l)}) = φ( D̃^{-1/2} Ã D̃^{-1/2} Z^{(l)} W^{(l)} ),    (2)

where Ã = A + I and D̃_{ii} = Σ_j Ã_{ij}. Here I is the identity matrix of the same size as A, and φ is an activation function such as ReLU(x) = max(0, x) or sigmoid(x) = 1 / (1 + e^{-x}). Overall, the graph encoder is constructed with a two-layer GCN. In our paper, we develop two variants of the encoder, i.e., the Graph Encoder and the Variational Graph Encoder.
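As a minimal illustration, the spectral convolution of Eq. (2) can be sketched in NumPy as below, assuming a dense adjacency matrix; the function and variable names are ours for exposition, not from the authors' released code.

import numpy as np

def normalize_adj(A):
    """Symmetric normalization of Eq. (2): D~^{-1/2} (A + I) D~^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])                 # A~ = A + I (add self-loops)
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))  # D~_ii = sum_j A~_ij
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(Z, A_norm, W, activation=lambda x: np.maximum(x, 0.0)):
    """One spectral convolution f(Z, A | W) = phi(A_norm Z W); default phi = ReLU."""
    return activation(A_norm @ Z @ W)

The normalized adjacency is computed once and reused at every layer, which is what makes stacking such layers efficient.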

The Graph Encoder is constructed as follows:

Z^{(1)} = f_ReLU(X, A | W^{(0)});    (3)
Z^{(2)} = f_linear(Z^{(1)}, A | W^{(1)}).    (4)

ReLU(·) and linear activation functions are used for the first and second layers, respectively. Our graph convolutional encoder encodes both graph structure and node content into a representation Z = Z^{(2)}.
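A hedged sketch of the two-layer encoder in Eqs. (3)-(4), reusing gcn_layer from the sketch above; under the paper's parameter settings, W0 would have 32 output columns and W1 would have 16.

def graph_encoder(X, A_norm, W0, W1):
    Z1 = gcn_layer(X, A_norm, W0)                              # Eq. (3): ReLU layer
    return gcn_layer(Z1, A_norm, W1, activation=lambda x: x)   # Eq. (4): linear layer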

A Variational Graph Encoder is defined by an inference model:

q(Z | X, A) = ∏_{i=1}^{n} q(z_i | X, A),    (5)
q(z_i | X, A) = N(z_i | μ_i, diag(σ_i^2)).    (6)

Here, μ = Z^{(2)} is the matrix of mean vectors μ_i; similarly, log σ = f_linear(Z^{(1)}, A | W'^{(1)}), which shares the weights W^{(0)} with μ in the first layer in Eq. (3).
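The following sketch realizes Eqs. (5)-(6) with the standard reparameterization trick; W1_mu and W1_logsig are our names for the two second-layer weight matrices (the first layer W0 is shared, as in Eq. (3)), and rng can be np.random.default_rng().

def variational_graph_encoder(X, A_norm, W0, W1_mu, W1_logsig, rng):
    identity = lambda x: x
    Z1 = gcn_layer(X, A_norm, W0)                             # shared first layer, Eq. (3)
    mu = gcn_layer(Z1, A_norm, W1_mu, activation=identity)    # matrix of means
    log_sigma = gcn_layer(Z1, A_norm, W1_logsig, activation=identity)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps, mu, log_sigma        # z_i = mu_i + sigma_i * eps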

Decoder Model.   Our decoder model is used to reconstruct the graph data. We can reconstruct either the graph structure A, the content information X, or both. In our paper, we propose to reconstruct the graph structure A, which provides more flexibility in the sense that our algorithm will still function properly even if there is no content information X available (e.g., X = I). Our decoder p(Â | Z) predicts whether there is a link between two nodes. More specifically, we train a link prediction layer based on the graph embedding:

p(Â | Z) = ∏_{i=1}^{n} ∏_{j=1}^{n} p(Â_{ij} | z_i, z_j),    (7)
p(Â_{ij} = 1 | z_i, z_j) = sigmoid(z_i^T z_j).    (8)
Graph Autoencoder Model.   The embedding Z and the reconstructed graph Â can be presented as follows:

Â = sigmoid(Z Z^T), where Z = q(Z | X, A).    (9)

Optimization.   For the graph encoder, we minimize the reconstruction error of the graph data by:

L_0 = E_{q(Z | X, A)}[ log p(Â | Z) ].    (10)
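One way to realize the reconstruction cost of Eq. (10) is a binary cross-entropy between the observed adjacency and the decoded probabilities; the helper below is our sketch (the paper does not spell out the exact weighting).

def reconstruction_loss(A_hat, A_target, eps=1e-10):
    """Binary cross-entropy between decoded probabilities and the target adjacency."""
    return -np.mean(A_target * np.log(A_hat + eps)
                    + (1.0 - A_target) * np.log(1.0 - A_hat + eps))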

For the variational graph encoder, we optimize the variational lower bound as follows:

L_1 = E_{q(Z | X, A)}[ log p(Â | Z) ] − KL[ q(Z | X, A) ‖ p(Z) ],    (11)

where KL[q(·) ‖ p(·)] is the Kullback-Leibler divergence between q(·) and p(·). We also take a Gaussian prior p(Z) = ∏_i p(z_i) = ∏_i N(z_i | 0, I).
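For the diagonal Gaussian posterior of Eq. (6) against the standard normal prior, the KL term in Eq. (11) has the usual closed form; a sketch:

def kl_to_standard_normal(mu, log_sigma):
    """Closed-form KL[N(mu, diag(sigma^2)) || N(0, I)], averaged over nodes."""
    return -0.5 * np.mean(
        np.sum(1.0 + 2.0 * log_sigma - mu**2 - np.exp(2.0 * log_sigma), axis=1))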

4.2 Adversarial Model

The key idea of our model is to enforce the latent representation Z to match a prior distribution, which is achieved by an adversarial training model. The adversarial model is built on a standard multi-layer perceptron (MLP) whose output layer has a single dimension with a sigmoid function. The adversarial model acts as a discriminator to distinguish whether a latent code is from the prior p_z (positive) or from the graph encoder (negative). By minimizing the cross-entropy cost for training the binary classifier, the embedding will finally be regularized and improved during the training process. The cost can be computed as follows:

−(1/2) E_{z∼p_z}[ log D(z) ] − (1/2) E_X[ log(1 − D(G(X, A))) ].    (12)

In our paper, we use a simple Gaussian distribution as p_z.

Adversarial Graph Autoencoder Model.   The equation for training the encoder model with the discriminator D(Z) can be written as follows:

min_G max_D  E_{z∼p_z}[ log D(z) ] + E_x[ log(1 − D(G(X, A))) ],    (13)

where G(X, A) and D(Z) indicate the generator and discriminator explained above.
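A hedged sketch of the adversarial module: a small MLP discriminator plus the two cross-entropy costs behind Eqs. (12)-(13). The (16, 64)-neuron hidden layers follow the paper's parameter settings; the parameter packing and names are ours.

def mlp_discriminator(z, params):
    """Two hidden ReLU layers and a one-dimensional sigmoid output."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = np.maximum(z @ W1 + b1, 0.0)
    h2 = np.maximum(h1 @ W2 + b2, 0.0)
    return 1.0 / (1.0 + np.exp(-(h2 @ W3 + b3)))   # P(sample comes from the prior)

def discriminator_loss(d_real, d_fake, eps=1e-10):
    """The cross-entropy cost of Eq. (12)."""
    return (-0.5 * np.mean(np.log(d_real + eps))
            - 0.5 * np.mean(np.log(1.0 - d_fake + eps)))

def encoder_adversarial_loss(d_fake, eps=1e-10):
    """Encoder side of Eq. (13): make encoder outputs look like prior samples."""
    return -np.mean(np.log(d_fake + eps))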

Input:  G = {V, E, X}: a graph with links and features;
        T: the number of iterations;
        K: the number of steps for iterating the discriminator;
        d: the dimension of the latent variable
Output: Z ∈ R^{n×d}
1:  for iterator = 1, 2, 3, …, T do
2:     Generate the latent variables matrix Z through Eq. (4);
3:     for k = 1, 2, …, K do
4:        Sample m entities {z^{(1)}, …, z^{(m)}} from the latent matrix Z;
5:        Sample m entities {a^{(1)}, …, a^{(m)}} from the prior distribution p_z;
6:        Update the discriminator with its stochastic gradient:
          ∇ (1/m) Σ_{i=1}^{m} [ log D(a^{(i)}) + log(1 − D(z^{(i)})) ];
7:     Update the graph autoencoder with its stochastic gradient by Eq. (10) for ARGA or Eq. (11) for ARVGA;
8:  return Z ∈ R^{n×d}
Algorithm 1: Adversarially Regularized Graph Embedding

4.3 Algorithm Explanation

Algorithm 1 is our proposed framework. Given a graph G, step 2 obtains the latent variables matrix Z from the graph convolutional encoder. We then take the same number of samples from the generated data and from the real data distribution in steps 4 and 5, respectively, to update the discriminator with the cross-entropy cost computed in step 6. After K runs of training the discriminator, the graph encoder tries to confuse the trained discriminator and updates itself with the generated gradient in step 7. We use Eq. (10) to train the adversarially regularized graph autoencoder (ARGA), or Eq. (11) to train the adversarially regularized variational graph autoencoder (ARVGA). Finally, we return the graph embedding Z in step 8.
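The loop structure of Algorithm 1 can be summarized as below. The gradient updates themselves are delegated to caller-supplied callables, since a faithful implementation would rely on an autodiff framework; T, K, and d = 16 follow the paper's settings, while the mini-batch size m = 64 and the remaining names are our assumptions.

def train_arga(A, X, encode, disc_step, ae_step, T=200, K=1, m=64, d=16, rng=None):
    """Loop structure of Algorithm 1; the three callables perform the actual updates."""
    rng = rng or np.random.default_rng()
    A_norm = normalize_adj(A)
    Z = None
    for _ in range(T):                                           # step 1
        Z = encode(X, A_norm)                                    # step 2: Eq. (4)
        for _ in range(K):                                       # step 3
            fake = Z[rng.choice(len(Z), size=m, replace=False)]  # step 4: sample from Z
            real = rng.standard_normal((m, d))                   # step 5: Gaussian prior p_z
            disc_step(real, fake)                                # step 6: gradient of Eq. (12)
        ae_step(X, A_norm)                                       # step 7: Eq. (10) or Eq. (11)
    return Z                                                     # step 8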

5 Experiments

We report our results on three unsupervised graph analytic tasks: link prediction, node clustering, and graph visualization. The benchmark graph datasets used in the paper are summarized in Table 1. Each data set consists of scientific publications as nodes and citation relationships as edges. The features are unique words in each document.

Data Set # Nodes # Links # Content Words # Features
Cora 2,708 5,429 3,880,564 1,433
Citeseer 3,327 4,732 12,274,336 3,703
PubMed 19,717 44,338 9,858,500 500
Table 1: Real-world Graph Datasets Used in the Paper

5.1 Link Prediction

Baselines.   We compared our algorithms against state-of-the-art algorithms for the link prediction task:

  • DeepWalk [Perozzi et al.2014]: is a network representation approach which encodes social relations into a continuous vector space.

  • Spectral Clustering [Tang and Liu2011]: is an effective approach for learning social embedding.

  • GAE [Kipf and Welling2016b]: is the most recent autoencoder-based unsupervised framework for graph data, which naturally leverages both topological and content information.

  • VGAE [Kipf and Welling2016b]: is a variational graph autoencoder approach for graph embedding with both topological and content information.

  • ARGA: Our proposed adversarially regularized autoencoder algorithm, which uses a graph autoencoder to learn the embedding.

  • ARVGA: Our proposed algorithm, which uses a variational graph autoencoder to learn the embedding.

Metric.   We report the results in terms of AUC score (the area under a receiver operating characteristic curve) and average precision (AP) score [Kipf and Welling2016b]. We conduct each experiment 10 times and report the mean values with the standard errors as the final scores. Each dataset is separated into a training set, a test set, and a validation set. The validation set contains 5% of the citation edges for hyperparameter optimization, the test set holds 10% of the citation edges to verify the performance, and the rest are used for training.
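As an evaluation sketch (assuming scikit-learn; the paper does not name an implementation), with y_true marking held-out edges (label 1) versus sampled non-edges (label 0) and y_score the decoder probabilities from Eq. (8):

from sklearn.metrics import average_precision_score, roc_auc_score

def link_prediction_scores(y_true, y_score):
    """AUC and AP over held-out edges and sampled non-edges."""
    return roc_auc_score(y_true, y_score), average_precision_score(y_true, y_score)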

Parameter Settings.   For the Cora and Citeseer data sets, we train all autoencoder-related models for 200 iterations and optimize them with the Adam algorithm. Both the autoencoder learning rate and the discriminator learning rate are set to 0.001. As the PubMed data set is relatively large (around 20,000 nodes), we iterate 2,000 times for adequate training, with a 0.008 discriminator learning rate and a 0.001 learning rate. We construct encoders with a 32-neuron hidden layer and a 16-neuron embedding layer for all the experiments, and all the discriminators are built with two hidden layers (16 neurons and 64 neurons, respectively). For the rest of the baselines, we retain the settings described in the corresponding papers.

Experimental Results.   The details of the experimental results on link prediction are shown in Table 2. The results show that by incorporating an effective adversarial training module into our graph convolutional autoencoder, ARGA and ARVGA achieve outstanding performance: all AP and AUC scores reach roughly 92% or higher on all three data sets. On the large PubMed data set, ARGE improves the AP score by around 2.5% compared with VGAE using node features, by 11% compared with VGAE without node features, and by 15.5% and 10.6% compared with DeepWalk and Spectral Clustering, respectively.

Parameter Study.   We vary the dimension of the embedding from 8 neurons to 1,024 neurons and report the results in Fig. 2.

The results in both Fig. 2 (A) and (B) reveal similar trends: when increasing the dimension of the embedding from 8 neurons to 16 neurons, the performance on link prediction steadily rises; when we further increase the number of neurons at the embedding layer to 32, the performance fluctuates, although the results for both the AP score and the AUC score remain good.

It is worth mentioning that if we continue to add more neurons, for example 64, 128, or 1,024, the performance rises markedly.

Figure 2: Average performance on different dimensions of the embedding. (A) Average Precision score; (B) AUC score.
Approaches | Cora AUC | Cora AP | Citeseer AUC | Citeseer AP | PubMed AUC | PubMed AP
SC | 84.6 ± 0.01 | 88.5 ± 0.00 | 80.5 ± 0.01 | 85.0 ± 0.01 | 84.2 ± 0.02 | 87.8 ± 0.01
DW | 83.1 ± 0.01 | 85.0 ± 0.00 | 80.5 ± 0.02 | 83.6 ± 0.01 | 84.4 ± 0.00 | 84.1 ± 0.00
GAE* | 84.3 ± 0.02 | 88.1 ± 0.01 | 78.7 ± 0.02 | 84.1 ± 0.02 | 82.2 ± 0.01 | 87.4 ± 0.00
VGAE* | 84.0 ± 0.02 | 87.7 ± 0.01 | 78.9 ± 0.03 | 84.1 ± 0.02 | 82.7 ± 0.01 | 87.5 ± 0.01
GAE | 91.0 ± 0.02 | 92.0 ± 0.03 | 89.5 ± 0.04 | 89.9 ± 0.05 | 96.4 ± 0.00 | 96.5 ± 0.00
VGAE | 91.4 ± 0.01 | 92.6 ± 0.01 | 90.8 ± 0.02 | 92.0 ± 0.02 | 94.4 ± 0.02 | 94.7 ± 0.02
ARGE | 92.4 ± 0.003 | 93.2 ± 0.003 | 91.9 ± 0.003 | 93.0 ± 0.003 | 96.8 ± 0.001 | 97.1 ± 0.001
ARVGE | 92.4 ± 0.004 | 92.6 ± 0.004 | 92.4 ± 0.003 | 93.0 ± 0.003 | 96.5 ± 0.001 | 96.8 ± 0.001
Table 2: Results for Link Prediction. GAE* and VGAE* are variants of GAE and VGAE which only explore the topological structure, i.e., X = I.
Figure 3: The Cora data visualization comparison. From left to right: embeddings learned by our ARGA, VGAE, GAE, DeepWalk, and Spectral Clustering. Different colors represent different groups.

5.2 Node Clustering

For the node clustering task, we first learn the graph embedding, and then perform the K-means clustering algorithm on the embedding, as sketched below.
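A minimal sketch of this two-stage pipeline, assuming scikit-learn's K-means (the paper does not name its K-means implementation):

from sklearn.cluster import KMeans

def cluster_nodes(Z, n_clusters):
    """Run K-means on the learned embedding and return one cluster label per node."""
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z)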

Baselines.   We compare both embedding-based approaches and approaches designed directly for graph clustering. Besides the baselines compared for link prediction, we also include baselines designed specifically for clustering:

  1. K-means is a classical method and also the foundation of many clustering algorithms.

  2. Graph Encoder [Tian et al.2014] learns graph embedding for spectral graph clustering.

  3. DNGR [Cao et al.2016] trains a stacked denoising autoencoder for graph embedding.

  4. RTM [Chang and Blei2009] learns the topic distributions of each document from both text and citation.

  5. RMSC [Xia et al.2014] employs a multi-view learning approach for graph clustering.

  6. TADW [Yang et al.2015] applies matrix factorization for network representation learning.

Here the first three algorithms only exploit the graph structures, while the last three algorithms use both graph structure and node content for the graph clustering task.

Metrics.   Following [Xia et al.2014], we employ five metrics to validate the clustering results: accuracy (Acc), normalized mutual information (NMI), precision, F-score (F1), and adjusted rand index (ARI).

Experimental Results.   The clustering results on the Cora and Citeseer data sets are given in Table 3 and Table 4. The results show that ARGA and ARVGA achieve a dramatic improvement on all five metrics compared with all the other baselines. For instance, on Citeseer, ARGA improves the accuracy by between 6.1% (over K-means) and 154.7% (over GraphEncoder); improves the F1 score by between 31.9% (over TADW) and 102.2% (over DeepWalk); and improves NMI by between 14.8% (over K-means) and 124.4% (over VGAE). The wide margin between ARGE and GAE (and the other baselines) further demonstrates the superiority of our adversarially regularized graph autoencoder.

Cora Acc NMI F1 Precision ARI
K-means 0.492 0.321 0.368 0.369 0.230
Spectral 0.367 0.127 0.318 0.193 0.031
GraphEncoder 0.325 0.109 0.298 0.182 0.006
DeepWalk 0.484 0.327 0.392 0.361 0.243
DNGR 0.419 0.318 0.340 0.266 0.142
RTM 0.440 0.230 0.307 0.332 0.169
RMSC 0.407 0.255 0.331 0.227 0.090
TADW 0.560 0.441 0.481 0.396 0.332
GAE 0.596 0.429 0.595 0.596 0.347
VGAE 0.609 0.436 0.609 0.609 0.346
ARGE 0.640 0.449 0.619 0.646 0.352
ARVGE 0.638 0.450 0.627 0.624 0.374
Table 3: Clustering Results on Cora
Citeseer Acc NMI F1 Precision ARI
K-means 0.540 0.305 0.409 0.405 0.279
Spectral 0.239 0.056 0.299 0.179 0.010
GraphEncoder 0.225 0.033 0.301 0.179 0.010
DeepWalk 0.337 0.088 0.270 0.248 0.092
DNGR 0.326 0.180 0.300 0.200 0.044
RTM 0.451 0.239 0.342 0.349 0.203
RMSC 0.295 0.139 0.320 0.204 0.049
TADW 0.455 0.291 0.414 0.312 0.228
GAE 0.408 0.176 0.372 0.418 0.124
VGAE 0.344 0.156 0.308 0.349 0.093
ARGE 0.573 0.350 0.546 0.573 0.341
ARVGE 0.544 0.261 0.529 0.549 0.245
Table 4: Clustering Results on Citeseer

5.3 Graph Visualization

We visualize the Cora data in a two-dimensional space by applying the t-SNE algorithm [Van Der Maaten2014] to the learned embedding. The results in Fig. 3 validate that, by applying adversarial training to the graph data, we can obtain a more meaningful layout of the graph data.
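A sketch of this visualization step, assuming scikit-learn's TSNE and matplotlib (these implementation choices are ours; the paper only cites [Van Der Maaten2014]):

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def visualize_embedding(Z, labels):
    """Project the embedding to 2-D with t-SNE and color nodes by group."""
    xy = TSNE(n_components=2).fit_transform(Z)
    plt.scatter(xy[:, 0], xy[:, 1], c=labels, s=5)
    plt.show()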

6 Conclusion

In this paper, we proposed a novel adversarial graph embedding framework for graph data. We argued that most existing graph embedding algorithms are unregularized methods that ignore the data distribution of the latent representation and suffer from inferior embedding in real-world graph data. We proposed an adversarial training scheme to regularize the latent codes and enforce them to match a prior distribution. The adversarial module is jointly learned with a graph convolutional autoencoder to produce a robust representation. Experimental results demonstrated that our algorithms ARGA and ARVGA outperform baselines in link prediction, node clustering, and graph visualization tasks.

References