Graphs are essential tools to capture and model complicated relationships among data. In a variety of graph applications, including protein-protein interaction networks, social media, and citation networks, analyzing graph data plays an important role in various data mining tasks including node classification [Kipf and Welling2016a], link prediction [Wang et al.2017c], and node clustering [Wang et al.2017a]. However, the high computational complexity, low parallelizability, and inapplicability of machine learning methods to graph data have made these graph analytic tasks profoundly challenging [Cui et al.2017]. Recently, graph embedding has emerged as a general approach to these problems.
Graph embedding converts graph data into a low dimensional, compact, and continuous feature space. The key idea is to preserve the topological structure, vertex content, and other side information [Zhang et al.2017a]. This new learning paradigm has shifted the tasks of seeking complex models for classification, clustering, and link prediction to learning a robust representation of the graph data, so that any graph analytic task can be easily performed by employing simple traditional models (e.g., a linear SVM for the classification task). This merit has motivated a number of studies in this area [Cai et al.2017, Goyal and Ferrara2017].
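Graph embedding algorithms can be classified into three categories: probabilistic models, matrix factorization-based algorithms, and deep learning-based algorithms. Probabilistic models such as DeepWalk [Perozzi et al.2014], node2vec [Grover and Leskovec2016], and LINE [Tang et al.2015] attempt to learn graph embedding by extracting different patterns from the graph. The captured patterns or walks include global structural equivalence, local neighborhood connectivities, and other various order proximities. Compared with classical methods such as Spectral Clustering [Tang and Liu2011], these graph embedding algorithms perform more effectively and are scalable to large graphs.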
Matrix factorization-based algorithms, such as GraRep [Cao et al.2015], HOPE [Ou et al.2016], and M-NMF [Wang et al.2017b], pre-process the graph structure into an adjacency matrix and obtain the embedding by decomposing the adjacency matrix. Recently, it has been shown that many probabilistic algorithms are equivalent to matrix factorization approaches [Qiu et al.2017]. Deep learning approaches, especially autoencoder-based methods, are also widely studied for graph embedding. SDNE [Wang et al.2016] and DNGR [Cao et al.2016] employ deep autoencoders to preserve the graph proximities and model positive pointwise mutual information (PPMI). The MGAE algorithm utilizes a marginalized single layer autoencoder to learn representation for clustering [Wang et al.2017a].
The approaches above are typically unregularized approaches which mainly focus on preserving the structure relationship (probabilistic approaches), or minimizing the reconstruction error (matrix factorization or deep learning methods). They have mostly ignored the data distribution of the latent codes. In practice, unregularized embedding approaches often learn a degenerate identity mapping where the latent code space is free of any structure [Makhzani et al.2015], and can easily result in poor representation when dealing with real-world sparse and noisy graph data. One common way to handle this problem is to introduce some regularization to the latent codes and force them to follow some prior data distribution [Makhzani et al.2015]. Recently, generative adversarial based frameworks [Donahue et al.2016, Zhao et al.2016, Dumoulin et al.2016, Radford et al.2015] have also been developed for learning robust latent representation. However, none of these frameworks is designed specifically for graph data, where both topological structure and content information must be embedded into a latent space.
In this paper, we propose a novel adversarial framework with two variants, namely adversarially regularized graph autoencoder (ARGA) and adversarially regularized variational graph autoencoder (ARVGA), for graph embedding. The theme of our framework is not only to minimize the reconstruction errors of the graph structure but also to enforce the latent codes to match a prior distribution. By exploiting both graph structure and node content with a graph convolutional network, our algorithms encode the graph data in the latent space. With a decoder aiming at reconstructing the topological graph information, we further incorporate an adversarial training scheme to regularize the latent codes to learn a robust graph representation. The adversarial training module aims to discriminate whether the latent codes are from a real prior distribution or from the graph encoder. The graph encoder learning and adversarial regularization learning are jointly optimized in a unified framework so that each can be beneficial to the other and finally lead to a better graph embedding. The experimental results on benchmark datasets demonstrate the strong performance of our algorithms on three unsupervised graph analytic tasks, namely link prediction, node clustering, and graph visualization. Our contributions can be summarized as follows:
- We propose a novel adversarially regularized framework for graph embedding, which represents topological structure and node content in a continuous vector space. Our framework learns the embedding to minimize the reconstruction error while enforcing the latent codes to match a prior distribution.
- We develop two variants of adversarial approaches, adversarially regularized graph autoencoder (ARGA) and adversarially regularized variational graph autoencoder (ARVGA), to learn the graph embedding.
- Experiments on benchmark graph datasets demonstrate that our graph embedding approaches outperform the others on three unsupervised tasks.
2 Related Work
Graph Embedding Models. From the perspective of information exploration, graph embedding algorithms can also be separated into two groups: topological embedding approaches and content enhanced embedding methods.
Topological embedding approaches assume that there is only topological structure information available, and the learning objective is to preserve the topological information as fully as possible. Inspired by the word embedding approach [Mikolov et al.2013], Perozzi et al. proposed the DeepWalk model to learn the node embedding from a collection of random walks [Perozzi et al.2014]. Since then, a number of probabilistic models such as node2vec [Grover and Leskovec2016] and LINE [Tang et al.2015] have been developed. As a graph can be mathematically represented as an adjacency matrix, many matrix factorization approaches such as GraRep [Cao et al.2015], HOPE [Ou et al.2016], and M-NMF [Wang et al.2017b] have been proposed to learn the latent representation for a graph. Recently, deep learning models have been widely exploited to learn the graph embedding. These algorithms preserve the first and second order of proximities [Wang et al.2016], or reconstruct the positive pointwise mutual information (PPMI) [Cao et al.2016] via different variants of autoencoders.
Content enhanced embedding methods assume node content information is available and exploit both topological information and content features simultaneously. TADW [Yang et al.2015] proved that DeepWalk can be interpreted as a factorization approach and proposed an extension to DeepWalk to explore node features. TriDNR [Pan et al.2016] captures structure, node content, and label information via a tri-party neural network architecture. UPP-SNE employs an approximated kernel mapping scheme to exploit user profile features to enhance the embedding learning of users in social networks [Zhang et al.2017b].
Although these algorithms are well-designed for graph-structured data, they largely ignore the latent distribution of the embedding, which may result in poor representation in practice. In this paper, we explore adversarial training methods to address this issue.
Adversarial Models. Our method is motivated by the generative adversarial network (GAN) [Goodfellow et al.2014]. GAN plays an adversarial game with two linked models: the generator and the discriminator. The discriminator can be a multi-layer perceptron which discriminates whether an input sample comes from the data distribution or from the generator we built. Simultaneously, the generator is trained to generate the samples to convince the discriminator that the generated samples come from the prior data distribution. Due to its effectiveness in many unsupervised tasks, a number of algorithms have recently been proposed, including BiGAN [Donahue et al.2016], EBGAN [Zhao et al.2016], and ALI [Dumoulin et al.2016].
Recently Makhzani et al. proposed an adversarial autoencoder (AAE) to learn the latent embedding by merging the adversarial mechanism into the autoencoder [Makhzani et al.2015]. However, it is designed for general data rather than graph data. Dai et al. applied the adversarial mechanism to graphs. However, their approach can only exploit the topological information [Dai et al.2017]. In contrast, our algorithm is more flexible and can handle both topological and content information for graph data.
3 Problem Definition and Framework
A graph is represented as $G = \{V, E, X\}$, where $V = \{v_i\}_{i=1,\dots,n}$ consists of a set of nodes in a graph and $e_{i,j} = \langle v_i, v_j \rangle \in E$ represents a linkage encoding the citation edge between the nodes. The topological structure of graph $G$ can be represented by an adjacency matrix $A$, where $A_{i,j} = 1$ if $e_{i,j} \in E$, otherwise $A_{i,j} = 0$. $x_i \in X$ indicates the content features associated with each node $v_i$.
Given a graph $G$, our purpose is to map the nodes $v_i \in V$ to low-dimensional vectors $z_i \in \mathbb{R}^d$ with the formal format as follows: $f: (A, X) \mapsto Z$, where $z_i^\top$ is the $i$-th row of the matrix $Z \in \mathbb{R}^{n \times d}$. $n$ is the number of nodes and $d$ is the dimension of embedding. We take $Z$ as the embedding matrix, and the embeddings should well preserve the topological structure $A$ as well as content information $X$.
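As a minimal sketch of the inputs defined above, the adjacency matrix and the content feature matrix can be built as follows. The toy graph, the bag-of-words features, the helper name `build_adjacency`, and the choice to treat citation links as undirected are all assumptions made for illustration.

```python
import numpy as np

def build_adjacency(n_nodes, edges):
    """Return the 0/1 adjacency matrix A for a list of (i, j) edges."""
    A = np.zeros((n_nodes, n_nodes), dtype=np.float64)
    for i, j in edges:
        A[i, j] = 1.0
        A[j, i] = 1.0  # assumption: citation links are treated as undirected
    return A

# Hypothetical toy graph: 4 nodes, 3 citation edges, 5 word features per node.
edges = [(0, 1), (1, 2), (2, 3)]
A = build_adjacency(4, edges)
X = np.array([[1, 0, 0, 1, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 0, 1]], dtype=np.float64)
```

Here each row of `X` plays the role of the content features of one node, and the embedding to be learned would have one row per node as well.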
3.1 Overall Framework
Our objective is to learn a robust embedding given a graph $G = \{V, E, X\}$. To this end, we leverage an adversarial architecture with a graph autoencoder to directly process the entire graph and learn a robust embedding. Figure 1 illustrates the workflow of ARGA, which consists of two modules: the graph autoencoder and the adversarial network.
Graph convolutional autoencoder. The autoencoder takes in the structure of graph $A$ and the node content $X$ as inputs to learn a latent representation $Z$, and then reconstructs the graph structure $A$ from $Z$.
Adversarial regularization. The adversarial network forces the latent codes to match a prior distribution by an adversarial training module, which discriminates whether the current latent code $z_i \in Z$ comes from the encoder or from the prior distribution.
4 Proposed Algorithm
4.1 Graph convolutional autoencoder
The graph convolutional autoencoder aims to embed a graph $G = \{V, E, X\}$ in a low-dimensional space. Two key questions arise: (1) how to integrate both graph structure and node content in an encoder, and (2) what sort of information should be reconstructed via a decoder?
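Graph Convolutional Encoder Model $\mathcal{G}(X, A)$. To represent both graph structure and node content in a unified framework, we develop a variant of the graph convolutional network (GCN) [Kipf and Welling2016a] as a graph encoder. Our graph convolutional network (GCN) extends the operation of convolution to graph data in the spectral domain, and learns a layer-wise transformation by a spectral convolution function $f(Z^{(l)}, A \mid W^{(l)})$:

$$Z^{(l+1)} = f(Z^{(l)}, A \mid W^{(l)}) \tag{1}$$

Here, $Z^{(l)}$ is the input for convolution, and $Z^{(l+1)}$ is the output after convolution. We have $Z^{(0)} = X \in \mathbb{R}^{n \times m}$ ($n$ nodes and $m$ features) for our problem. $W^{(l)}$ is a matrix of filter parameters we need to learn in the neural network. If $f(Z^{(l)}, A \mid W^{(l)})$ is well defined, we can build arbitrary deep convolutional neural networks efficiently.
Each layer of our graph convolutional network can be expressed with the function $f(Z^{(l)}, A \mid W^{(l)})$ as follows:

$$f(Z^{(l)}, A \mid W^{(l)}) = \phi\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} Z^{(l)} W^{(l)}\right) \tag{2}$$

where $\tilde{A} = A + I$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. $I$ is the identity matrix of the same size as $A$, and $\phi$ is an activation function such as $\mathrm{ReLU}(t) = \max(0, t)$ or $\mathrm{sigmoid}(t) = \frac{1}{1 + e^{-t}}$. Overall, the graph encoder is constructed with a two-layer GCN. In our paper, we develop two variants of the encoder, namely the Graph Encoder and the Variational Graph Encoder.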
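The Graph Encoder is constructed as follows:

$$Z^{(1)} = f_{\mathrm{ReLU}}(X, A \mid W^{(0)}) \tag{3}$$

$$Z^{(2)} = f_{\mathrm{linear}}(Z^{(1)}, A \mid W^{(1)}) \tag{4}$$

$\mathrm{ReLU}(\cdot)$ and linear activation functions are used for the first and second layers. Our graph convolutional encoder $\mathcal{G}(X, A) = q(Z \mid X, A)$ encodes both graph structure and node content into a representation $Z = q(Z \mid X, A) = Z^{(2)}$.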
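The two-layer encoder described above can be sketched in NumPy as follows. This is an illustration rather than the authors' implementation: it assumes the symmetric normalization with self-loops used by the GCN of [Kipf and Welling2016a], randomly initialized weights, and hypothetical helper names; the layer widths (32 hidden units, 16 embedding dimensions) follow the experimental settings reported later in the paper.

```python
import numpy as np

def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    A_tilde = A + np.eye(A.shape[0])      # add self-loops
    deg = A_tilde.sum(axis=1)             # every degree is >= 1 because of self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def relu(t):
    return np.maximum(0.0, t)

def gcn_layer(Z, A_norm, W, activation=None):
    """One graph convolution: activation(A_norm @ Z @ W); linear if activation is None."""
    H = A_norm @ Z @ W
    return activation(H) if activation is not None else H

def graph_encoder(X, A, W0, W1):
    """Two-layer GCN encoder: ReLU in the first layer, linear in the second."""
    A_norm = normalize_adjacency(A)
    Z1 = gcn_layer(X, A_norm, W0, activation=relu)
    return gcn_layer(Z1, A_norm, W1)

# Hypothetical toy inputs: a 4-node path graph with 5 features per node.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=np.float64)
X = rng.random((4, 5))
W0 = rng.normal(scale=0.1, size=(5, 32))   # first-layer filter parameters
W1 = rng.normal(scale=0.1, size=(32, 16))  # second-layer filter parameters
Z = graph_encoder(X, A, W0, W1)            # one 16-dimensional code per node
```

Because the normalized adjacency multiplies the features in every layer, each node's code mixes its own content with that of its neighbors, which is how structure and content enter the same representation.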
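A Variational Graph Encoder is defined by an inference model:

$$q(Z \mid X, A) = \prod_{i=1}^{n} q(z_i \mid X, A) \tag{5}$$

$$q(z_i \mid X, A) = \mathcal{N}(z_i \mid \mu_i, \mathrm{diag}(\sigma_i^2)) \tag{6}$$

Here, $\mu = Z^{(2)}$ is the matrix of mean vectors $\mu_i$; similarly $\log \sigma = f_{\mathrm{linear}}(Z^{(1)}, A \mid W'^{(1)})$, which shares the weights $W^{(0)}$ with $\mu$ in the first layer in Eq. (3).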
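To make the inference model concrete, the sketch below draws latent codes from a diagonal Gaussian given a matrix of means and a matrix of log standard deviations. It assumes the reparameterization trick commonly used with variational autoencoders and a log-sigma parameterization as in [Kipf and Welling2016b]; the function name and the toy values are hypothetical.

```python
import numpy as np

def sample_latent(mu, log_sigma, rng):
    """Draw Z ~ N(mu, diag(sigma^2)) as mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps

rng = np.random.default_rng(0)
mu = np.zeros((4, 16))               # stand-in for the mean output of the encoder
log_sigma = np.full((4, 16), -1.0)   # stand-in for the log standard deviation output
Z = sample_latent(mu, log_sigma, rng)
```

Writing the sample as a deterministic function of the encoder outputs plus independent noise is what allows gradients to flow back into the encoder weights during training.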
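Decoder Model. Our decoder model is used to reconstruct the graph data. We can reconstruct either the graph structure $A$, content information $X$, or both. In our paper, we propose to reconstruct graph structure $A$, which provides more flexibility in the sense that our algorithm will still function properly even if there is no content information $X$ available (e.g., $X = I$). Our decoder $p(\hat{A} \mid Z)$ predicts whether there is a link between two nodes. More specifically, we train a link prediction layer based on the graph embedding:

$$p(\hat{A} \mid Z) = \prod_{i=1}^{n} \prod_{j=1}^{n} p(\hat{A}_{ij} \mid z_i, z_j) \tag{7}$$

$$p(\hat{A}_{ij} = 1 \mid z_i, z_j) = \mathrm{sigmoid}(z_i^\top z_j) \tag{8}$$

Graph Autoencoder Model. The embedding $Z$ and the reconstructed graph $\hat{A}$ can be presented as follows:

$$\hat{A} = \mathrm{sigmoid}(Z Z^\top), \quad \text{where } Z = q(Z \mid X, A) \tag{9}$$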
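A link prediction layer of this kind can be sketched as an inner-product decoder. The sketch assumes that the link probability between two nodes is the sigmoid of the inner product of their codes, as in the graph autoencoder of [Kipf and Welling2016b]; the toy embedding is hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def decode(Z):
    """Return the matrix of predicted link probabilities sigmoid(Z Z^T)."""
    return sigmoid(Z @ Z.T)

# Hypothetical 2-dimensional codes: nodes 0 and 1 are aligned, node 2 points away.
Z = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [-1.0, 0.0]])
A_hat = decode(Z)
```

In this toy case the predicted probability of a link between nodes 0 and 1 is above 0.5, while the probability between nodes 0 and 2 is below 0.5, because their codes have a negative inner product.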
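Optimization. For the graph encoder, we minimize the reconstruction error of the graph data by:

$$\mathcal{L}_0 = \mathbb{E}_{q(Z \mid (X, A))}[\log p(\hat{A} \mid Z)] \tag{10}$$

For the variational graph encoder, we optimize the variational lower bound as follows:

$$\mathcal{L}_1 = \mathbb{E}_{q(Z \mid (X, A))}[\log p(\hat{A} \mid Z)] - \mathrm{KL}[q(Z \mid X, A) \,\|\, p(Z)] \tag{11}$$

where $\mathrm{KL}[q(\cdot) \,\|\, p(\cdot)]$ is the Kullback-Leibler divergence between $q(\cdot)$ and $p(\cdot)$. We also take a Gaussian prior $p(Z) = \prod_i p(z_i) = \prod_i \mathcal{N}(z_i \mid 0, I)$.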
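The two terms of the objective can be sketched numerically as follows. The reconstruction term is written as a binary cross-entropy between the adjacency matrix and the predicted link probabilities, which is the negative log-likelihood of a Bernoulli link model; the KL term uses the closed form for a diagonal Gaussian against a standard normal prior. Averaging over entries and nodes, the clipping constant, and the function names are assumptions of this sketch.

```python
import numpy as np

def reconstruction_loss(A, A_hat, eps=1e-10):
    """Mean binary cross-entropy between A and predicted link probabilities A_hat."""
    A_hat = np.clip(A_hat, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(A * np.log(A_hat) + (1.0 - A) * np.log(1.0 - A_hat))

def kl_to_standard_normal(mu, log_sigma):
    """KL[N(mu, diag(sigma^2)) || N(0, I)], summed over dimensions, averaged over nodes."""
    per_node = -0.5 * np.sum(
        1.0 + 2.0 * log_sigma - mu ** 2 - np.exp(2.0 * log_sigma), axis=1
    )
    return np.mean(per_node)

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
uninformative = np.full((2, 2), 0.5)          # a decoder that always predicts 0.5
loss_rec = reconstruction_loss(A, uninformative)
loss_kl = kl_to_standard_normal(np.ones((3, 2)), np.zeros((3, 2)))
```

Minimizing the cross-entropy is equivalent to maximizing the expected log-likelihood of the observed links, and the KL term vanishes exactly when the approximate posterior equals the prior.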
4.2 Adversarial Model
The key idea of our model is to enforce the latent representation $Z$ to match a prior distribution, which is achieved by an adversarial training model. The adversarial model is built on a standard multi-layer perceptron (MLP) where the output layer only has one dimension with a sigmoid function. The adversarial model acts as a discriminator to distinguish whether a latent code is from the prior $p_z$ (positive) or from the graph encoder $\mathcal{G}(X, A)$ (negative). By minimizing the cross-entropy cost for training the binary classifier, the embedding will finally be regularized and improved during the training process. The cost can be computed as follows:

$$-\frac{1}{2} \mathbb{E}_{z \sim p_z}[\log \mathcal{D}(Z)] - \frac{1}{2} \mathbb{E}_{X}[\log(1 - \mathcal{D}(\mathcal{G}(X, A)))] \tag{12}$$

In our paper, we use a simple Gaussian distribution as $p_z$.
Adversarial Graph Autoencoder Model. The equation for training the encoder model with the discriminator $\mathcal{D}(Z)$ can be written as follows:

$$\min_{\mathcal{G}} \max_{\mathcal{D}} \; \mathbb{E}_{z \sim p_z}[\log \mathcal{D}(Z)] + \mathbb{E}_{x \sim p(x)}[\log(1 - \mathcal{D}(\mathcal{G}(X, A)))] \tag{13}$$

where $\mathcal{G}(X, A)$ and $\mathcal{D}(Z)$ indicate the generator and discriminator explained above.
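The discriminator and its cross-entropy cost can be sketched as a small MLP forward pass. The hidden-layer widths (16 and 64) follow the experimental settings reported later in the paper; the ReLU hidden activations, the random initialization, and the function names are assumptions, and a random matrix stands in for the encoder output.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def discriminator(Z, params):
    """MLP with two hidden ReLU layers and a one-dimensional sigmoid output."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = np.maximum(0.0, Z @ W1 + b1)
    h2 = np.maximum(0.0, h1 @ W2 + b2)
    return sigmoid(h2 @ W3 + b3).ravel()

def discriminator_cost(d_real, d_fake, eps=1e-10):
    """Cross-entropy: prior samples are positives, encoder codes are negatives."""
    d_real = np.clip(d_real, eps, 1.0 - eps)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return -0.5 * np.mean(np.log(d_real)) - 0.5 * np.mean(np.log(1.0 - d_fake))

rng = np.random.default_rng(0)
d = 16  # embedding dimension
params = (rng.normal(scale=0.1, size=(d, 16)), np.zeros(16),
          rng.normal(scale=0.1, size=(16, 64)), np.zeros(64),
          rng.normal(scale=0.1, size=(64, 1)), np.zeros(1))
Z_fake = rng.normal(loc=2.0, size=(4, d))  # stand-in for codes from the graph encoder
Z_real = rng.standard_normal((4, d))       # samples from the Gaussian prior
cost = discriminator_cost(discriminator(Z_real, params),
                          discriminator(Z_fake, params))
```

A discriminator that outputs 0.5 for every input yields a cost of log 2, which is the value the encoder pushes toward when its codes become indistinguishable from prior samples.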
4.3 Algorithm Explanation
Algorithm 1 is our proposed framework. Given a graph $G$, step 2 gets the latent variables matrix $Z$ from the graph convolutional encoder. Then we take the same number of samples from the generated $Z$ and the real data distribution $p_z$ in steps 4 and 5, respectively, to update the discriminator with the cross-entropy cost computed in step 6. After $K$ runs of training the discriminator, the graph encoder will try to confuse the trained discriminator and update itself with the generated gradient in step 7. We can update Eq. (10) to train the adversarially regularized graph autoencoder (ARGA), or Eq. (11) to train the adversarially regularized variational graph autoencoder (ARVGA), respectively. Finally, we return the graph embedding $Z$ in step 8.
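The alternating schedule described in this paragraph can be sketched as a plain control loop. Only the order of the steps is taken from the text; the function name, the argument names `T`, `K`, and `m`, and all callables are hypothetical placeholders, and the stubs below merely count how often each update runs instead of computing gradients.

```python
def train_adversarial_autoencoder(T, K, m, encode, sample_codes, sample_prior,
                                  update_discriminator, update_autoencoder):
    """Run T outer iterations, each with K discriminator updates and one encoder update."""
    for _ in range(T):
        Z = encode()                          # latent matrix from the graph encoder
        for _ in range(K):
            z_fake = sample_codes(Z, m)       # m codes drawn from the encoder output
            z_real = sample_prior(m)          # m samples drawn from the prior
            update_discriminator(z_real, z_fake)
        update_autoencoder()                  # reconstruction plus adversarial gradient
    return encode()                           # final embedding after the last update

# Stub callables, used only to show the call pattern.
calls = {"discriminator": 0, "autoencoder": 0}

def encode():
    return [[0.0, 0.0]] * 4

def sample_codes(Z, m):
    return Z[:m]

def sample_prior(m):
    return [[0.0, 0.0]] * m

def update_discriminator(z_real, z_fake):
    calls["discriminator"] += 1

def update_autoencoder():
    calls["autoencoder"] += 1

Z = train_adversarial_autoencoder(2, 3, 2, encode, sample_codes, sample_prior,
                                  update_discriminator, update_autoencoder)
```

With two outer iterations and three inner runs, the discriminator is updated six times and the autoencoder twice. In a real implementation the two update callables would apply stochastic gradient steps on the discriminator cost and on the autoencoder objective, respectively.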
5 Experiments
We report our results on three unsupervised graph analytic tasks: link prediction, node clustering, and graph visualization. The benchmark graph datasets used in the paper are summarized in Table 1. Each data set consists of scientific publications as nodes and citation relationships as edges. The features are unique words in each document.
| Data Set | # Nodes | # Links | # Content Words | # Features |
5.1 Link Prediction
Baselines. We compared our algorithms against state-of-the-art algorithms for the link prediction task:
- DeepWalk [Perozzi et al.2014] is a network representation approach which encodes social relations into a continuous vector space.
- Spectral Clustering [Tang and Liu2011] is an effective approach for learning social embedding.
- GAE [Kipf and Welling2016b] is the most recent autoencoder-based unsupervised framework for graph data, which naturally leverages both topological and content information.
- VGAE [Kipf and Welling2016b] is a variational graph autoencoder approach for graph embedding with both topological and content information.
- ARGA is our proposed adversarially regularized autoencoder algorithm, which uses a graph autoencoder to learn the embedding.
- ARVGA is our proposed algorithm, which uses a variational graph autoencoder to learn the embedding.
We report the results in terms of AUC score (the area under a receiver operating characteristic curve) and average precision (AP) score [Kipf and Welling2016b]. We conduct each experiment 10 times and report the mean values with the standard errors as the final scores. Each dataset is separated into a training set, a validation set, and a test set. The validation set contains 5% of the citation edges for hyperparameter optimization, the test set holds 10% of the citation edges to verify the performance, and the rest are used for training.
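The edge split described above can be sketched as a random partition of the edge list. The proportions follow the text; the function name, the fixed seed, and the toy edge list are assumptions, and the sampling of non-edges needed as negatives for AUC and AP is omitted.

```python
import numpy as np

def split_edges(edges, val_frac=0.05, test_frac=0.10, seed=0):
    """Randomly split edges into training, validation, and test sets."""
    edges = np.asarray(edges)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(edges))
    n_val = int(round(val_frac * len(edges)))
    n_test = int(round(test_frac * len(edges)))
    val = edges[perm[:n_val]]
    test = edges[perm[n_val:n_val + n_test]]
    train = edges[perm[n_val + n_test:]]
    return train, val, test

# Hypothetical edge list with 100 edges.
edges = [(i, i + 1) for i in range(100)]
train, val, test = split_edges(edges)
```

For 100 edges this yields 85 training, 5 validation, and 10 test edges, with no edge shared between the three sets.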
For the Cora and Citeseer data sets, we train all autoencoder-related models for 200 iterations and optimize them with the Adam algorithm. Both the learning rate and the discriminator learning rate are set to 0.001. As the PubMed data set is relatively large (around 20,000 nodes), we iterate 2,000 times for adequate training with a 0.008 discriminator learning rate and a 0.001 learning rate. We construct encoders with a 32-neuron hidden layer and a 16-neuron embedding layer for all the experiments, and all the discriminators are built with two hidden layers (16 neurons and 64 neurons, respectively). For the rest of the baselines, we retain the settings described in the corresponding papers.
Experimental Results. The details of the experimental results on link prediction are shown in Table 2. The results show that by incorporating an effective adversarial training module into our graph convolutional autoencoder, ARGA and ARVGA achieve outstanding performance: all AP and AUC scores are around 92% or higher on all three data sets. On the large PubMed data set, ARGA improves the AP score by about 2.5% relative to VGAE with node features, 11% relative to VGAE without node features, and 15.5% and 10.6% relative to DeepWalk and Spectral Clustering, respectively.
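The percentages quoted for PubMed can be reproduced from the AP column of Table 2 as relative improvements. The sketch assumes that the table row reporting a PubMed AP of 87.5 corresponds to VGAE without node features; the function name is hypothetical.

```python
def relative_gain(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent."""
    return 100.0 * (ours - baseline) / baseline

ours_ap = 97.1  # PubMed AP of the adversarially regularized graph autoencoder
baselines = {
    "VGAE (with node features)": 94.7,
    "VGAE (without node features)": 87.5,  # assumption: the row reporting 87.5
    "DeepWalk": 84.1,
    "Spectral Clustering": 87.8,
}
gains = {name: round(relative_gain(ours_ap, ap), 1) for name, ap in baselines.items()}
print(gains)
```

The computed gains are 2.5, 11.0, 15.5, and 10.6 percent, which matches the figures in the text and indicates that they are relative rather than absolute differences.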
Parameter Study. We vary the dimension of embedding from 8 neurons to 1024 and report the results in Fig 2.
The results from both Fig 2 (A) and (B) reveal similar trends: when the dimension of embedding increases from 8 to 16 neurons, the performance of embedding on link prediction steadily rises; when we further increase the number of neurons at the embedding layer to 32, the performance fluctuates, although both the AP score and the AUC score remain good.
It is worth mentioning that if we continue to add neurons, for example 64, 128, and 1024 neurons, the performance rises markedly.
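Table 2: Link prediction results (AUC and AP, mean ± standard error). SC is Spectral Clustering and DW is DeepWalk; GAE* and VGAE* are the variants without node features.

| Approaches | Cora AUC | Cora AP | Citeseer AUC | Citeseer AP | PubMed AUC | PubMed AP |
|---|---|---|---|---|---|---|
| SC | 84.6 ± 0.01 | 88.5 ± 0.00 | 80.5 ± 0.01 | 85.0 ± 0.01 | 84.2 ± 0.02 | 87.8 ± 0.01 |
| DW | 83.1 ± 0.01 | 85.0 ± 0.00 | 80.5 ± 0.02 | 83.6 ± 0.01 | 84.4 ± 0.00 | 84.1 ± 0.00 |
| GAE* | 84.3 ± 0.02 | 88.1 ± 0.01 | 78.7 ± 0.02 | 84.1 ± 0.02 | 82.2 ± 0.01 | 87.4 ± 0.00 |
| VGAE* | 84.0 ± 0.02 | 87.7 ± 0.01 | 78.9 ± 0.03 | 84.1 ± 0.02 | 82.7 ± 0.01 | 87.5 ± 0.01 |
| GAE | 91.0 ± 0.02 | 92.0 ± 0.03 | 89.5 ± 0.04 | 89.9 ± 0.05 | 96.4 ± 0.00 | 96.5 ± 0.00 |
| VGAE | 91.4 ± 0.01 | 92.6 ± 0.01 | 90.8 ± 0.02 | 92.0 ± 0.02 | 94.4 ± 0.02 | 94.7 ± 0.02 |
| ARGA | 92.4 ± 0.003 | 93.2 ± 0.003 | 91.9 ± 0.003 | 93.0 ± 0.003 | 96.8 ± 0.001 | 97.1 ± 0.001 |
| ARVGA | 92.4 ± 0.004 | 92.6 ± 0.004 | 92.4 ± 0.003 | 93.0 ± 0.003 | 96.5 ± 0.001 | 96.8 ± 0.001 |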
Figure 3: The Cora data visualization comparison. From left to right: graphs from our ARGA, VGAE, GAE, DeepWalk, and Spectral Clustering. The different colors represent different groups.
5.2 Node Clustering
For the node clustering task, we first learn the graph embedding, and then run the K-means clustering algorithm on the embedding.
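The clustering step can be illustrated with a minimal K-means (Lloyd's algorithm) applied to an embedding matrix. The toy embedding, the initialization from randomly chosen rows, and the stopping rule are assumptions of this sketch; a library implementation would normally be used in practice.

```python
import numpy as np

def kmeans(Z, k, n_iter=100, seed=0):
    """Cluster the rows of Z into k groups with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Distance from every embedding to every center, shape (n, k).
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster becomes empty.
        new_centers = np.array([
            Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical 2-dimensional embedding with two well-separated groups.
Z = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels, centers = kmeans(Z, k=2)
```

On this toy input the first two rows receive one label and the last two rows the other, whichever rows are picked as initial centers.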
Baselines. We compare against both embedding-based approaches and approaches designed directly for graph clustering. In addition to the baselines used for link prediction, we also include baselines which are designed for clustering:
- K-means is a classical method and also the foundation of many clustering algorithms.
- Graph Encoder [Tian et al.2014] learns graph embedding for spectral graph clustering.
- RTM [Chang and Blei2009] learns the topic distributions of each document from both text and citation.
- RMSC [Xia et al.2014] employs a multi-view learning approach for graph clustering.
- TADW [Yang et al.2015] applies matrix factorization for network representation learning.
Here the first three algorithms only exploit the graph structures, while the last three algorithms use both graph structure and node content for the graph clustering task.
Metrics: Following [Xia et al.2014], we employ five metrics to validate the clustering results: accuracy (Acc), normalized mutual information (NMI), precision, F-score (F1) and average rand index (ARI).
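As an illustration of one of these metrics, the sketch below computes normalized mutual information between ground-truth and predicted cluster labels. The normalization by the geometric mean of the two entropies is an assumption, since several NMI variants exist and the text does not state which one is used; returning 0.0 when either labeling has zero entropy is also a convention chosen here.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in nats) of a label assignment."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def nmi(true, pred):
    """Mutual information normalized by sqrt(H(true) * H(pred))."""
    true, pred = np.asarray(true), np.asarray(pred)
    n = len(true)
    mi = 0.0
    for t in np.unique(true):
        for c in np.unique(pred):
            n_tc = np.sum((true == t) & (pred == c))
            if n_tc > 0:
                n_t, n_c = np.sum(true == t), np.sum(pred == c)
                mi += (n_tc / n) * np.log(n * n_tc / (n_t * n_c))
    h_true, h_pred = entropy(true), entropy(pred)
    if h_true == 0.0 or h_pred == 0.0:
        return 0.0  # convention chosen for this sketch
    return mi / np.sqrt(h_true * h_pred)

score_same = nmi([0, 0, 1, 1], [1, 1, 0, 0])   # same partition, permuted label names
score_indep = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # partitions carrying no shared information
```

Because NMI depends only on the partition and not on the label names, the first score is 1 even though the label values are swapped, while the second is 0.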
Experimental Results. The clustering results on the Cora and Citeseer data sets are given in Table 3 and Table 4. The results show that ARGA and ARVGA achieve a dramatic improvement on all five metrics compared with all the other baselines. For instance, on Citeseer, the accuracy gain of ARGA ranges from 6.1% over K-means to 154.7% over GraphEncoder; the F1 gain ranges from 31.9% over TADW to 102.2% over DeepWalk; and the NMI gain ranges from 14.8% over K-means to 124.4% over VGAE. The wide margin between ARGA and GAE (and the others) further supports the superiority of our adversarially regularized graph autoencoder.
5.3 Graph Visualization
We visualize the Cora data in a two-dimensional space by applying the t-SNE algorithm [Van Der Maaten2014] on the learned embedding. The results in Fig 3 validate that by applying adversarial training to the graph data, we can obtain a more meaningful layout of the graph data.
6 Conclusion
In this paper, we proposed a novel adversarial graph embedding framework for graph data. We argue that most existing graph embedding algorithms are unregularized methods that ignore the data distributions of the latent representation and suffer from inferior embedding in real-world graph data. We proposed an adversarial training scheme to regularize the latent codes and enforce the latent codes to match a prior distribution. The adversarial module is jointly learned with a graph convolutional autoencoder to produce a robust representation. Experimental results demonstrated that our algorithms ARGA and ARVGA outperform baselines in link prediction, node clustering, and graph visualization tasks.
References
- [Cai et al.2017] Hongyun Cai, Vincent W Zheng, and Kevin Chen-Chuan Chang. A comprehensive survey of graph embedding: Problems, techniques and applications. arXiv preprint arXiv:1709.07604, 2017.
- [Cao et al.2015] S. Cao, W. Lu, and Q. Xu. Grarep: Learning graph representations with global structural information. In CIKM, pages 891–900. ACM, 2015.
- [Cao et al.2016] S. Cao, W. Lu, and Q. Xu. Deep neural networks for learning graph representations. In AAAI, pages 1145–1152, 2016.
- [Chang and Blei2009] J. Chang and D. Blei. Relational topic models for document networks. In Artificial Intelligence and Statistics, pages 81–88, 2009.
- [Cui et al.2017] Peng Cui, Xiao Wang, Jian Pei, and Wenwu Zhu. A survey on network embedding. arXiv preprint arXiv:1711.08752, 2017.
- [Dai et al.2017] Q. Dai, Q. Li, J. Tang, et al. Adversarial network embedding. arXiv preprint arXiv:1711.07838, 2017.
- [Donahue et al.2016] J. Donahue, P. Krähenbühl, and T. Darrell. Adversarial feature learning. arXiv preprint arXiv:1605.09782, 2016.
- [Dumoulin et al.2016] V. Dumoulin, I. Belghazi, B. Poole, et al. Adversarially learned inference. arXiv preprint arXiv:1606.00704, 2016.
- [Goodfellow et al.2014] I. Goodfellow, J. Pouget-Abadie, M. Mirza, et al. Generative adversarial nets. In NIPS, pages 2672–2680, 2014.
- [Goyal and Ferrara2017] Palash Goyal and Emilio Ferrara. Graph embedding techniques, applications, and performance: A survey. arXiv preprint arXiv:1705.02801, 2017.
- [Grover and Leskovec2016] A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. In SIGKDD, pages 855–864. ACM, 2016.
- [Kipf and Welling2016a] T. N Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
- [Kipf and Welling2016b] T. N Kipf and M. Welling. Variational graph auto-encoders. NIPS, 2016.
- [Makhzani et al.2015] A. Makhzani, J. Shlens, N. Jaitly, et al. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.
- [Mikolov et al.2013] T. Mikolov, K. Chen, G. Corrado, et al. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
- [Ou et al.2016] M. Ou, P. Cui, J. Pei, et al. Asymmetric transitivity preserving graph embedding. In KDD, pages 1105–1114, 2016.
- [Pan et al.2016] S. Pan, J. Wu, X. Zhu, et al. Tri-party deep network representation. Network, 11(9):12, 2016.
- [Perozzi et al.2014] B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online learning of social representations. In SIGKDD, pages 701–710. ACM, 2014.
- [Qiu et al.2017] Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. arXiv preprint arXiv:1710.02971, 2017.
- [Radford et al.2015] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
- [Tang and Liu2011] L. Tang and H. Liu. Leveraging social media networks for classification. DMKD, 23(3):447–478, 2011.
- [Tang et al.2015] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: large-scale information network embedding. In WWW, pages 1067–1077, 2015.
- [Tian et al.2014] F. Tian, B. Gao, Q. Cui, et al. Learning deep representations for graph clustering. In AAAI, pages 1293–1299, 2014.
- [Van Der Maaten2014] Laurens Van Der Maaten. Accelerating t-sne using tree-based algorithms. Journal of machine learning research, 15(1):3221–3245, 2014.
- [Wang et al.2016] D. Wang, P. Cui, and W. Zhu. Structural deep network embedding. In SIGKDD, pages 1225–1234. ACM, 2016.
- [Wang et al.2017a] Chun Wang, Shirui Pan, Guodong Long, Xingquan Zhu, and Jing Jiang. Mgae: Marginalized graph autoencoder for graph clustering. In CIKM, pages 889–898. ACM, 2017.
- [Wang et al.2017b] X. Wang, P. Cui, J. Wang, et al. Community preserving network embedding. In AAAI, pages 203–209, 2017.
- [Wang et al.2017c] Zhitao Wang, Chengyao Chen, and Wenjie Li. Predictive network representation learning for link prediction. In SIGIR, pages 969–972. ACM, 2017.
- [Xia et al.2014] R. Xia, Y. Pan, L. Du, et al. Robust multi-view spectral clustering via low-rank and sparse decomposition. In AAAI, pages 2149–2155, 2014.
- [Yang et al.2015] C. Yang, Z. Liu, D. Zhao, et al. Network representation learning with rich text information. In IJCAI, pages 2111–2117, 2015.
- [Zhang et al.2017a] Daokun Zhang, Jie Yin, Xingquan Zhu, and Chengqi Zhang. Network representation learning: A survey. arXiv preprint arXiv:1801.05852, 2017.
- [Zhang et al.2017b] Daokun Zhang, Jie Yin, Xingquan Zhu, and Chengqi Zhang. User profile preserving social network embedding. In IJCAI, pages 3378–3384. AAAI Press, 2017.
- [Zhao et al.2016] J. Zhao, M. Mathieu, and Y. LeCun. Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126, 2016.