Graphs are essential tools for capturing and modeling complicated relationships among data. In a variety of graph applications, such as social networks, citation networks, and protein-protein interaction networks, graph data analysis plays an important role in data mining tasks including node classification, node clustering, node recommendation [3, 4], and graph classification [5, 6]. However, high computational complexity, low parallelizability, and the inapplicability of standard machine learning methods to graph data make these graph analytic tasks profoundly challenging. Graph embedding has recently emerged as a general approach to these problems.
Graph embedding maps graph data into a low-dimensional, compact, and continuous feature space. The fundamental idea is to preserve the topological structure, vertex content, and other side information. This learning paradigm shifts the task from seeking complex models for classification, clustering, and link prediction to learning a compact and informative representation of the graph data, so that many graph mining tasks can be performed with simple traditional models (e.g., a linear SVM for classification). This merit has motivated many studies in this area [4, 9].
Probabilistic models such as DeepWalk, LINE, and node2vec attempt to learn graph embedding by extracting different patterns from the graph. The captured patterns or walks include global structural equivalence, local neighborhood connectivities, and other proximities of various orders. Compared with classical methods such as Spectral Clustering, these graph embedding algorithms perform more effectively and scale to large graphs.
Matrix factorization-based algorithms, such as GraRep, HOPE, and M-NMF, pre-process the graph structure into an adjacency matrix and obtain the embedding by factorizing that matrix. It has recently been shown that many probabilistic algorithms, including DeepWalk, LINE, and node2vec, are equivalent to matrix factorization approaches, and Qiu et al. propose a unified matrix factorization approach, NetMF, for graph embedding. Deep learning approaches, especially autoencoder-based methods, have also been studied for graph embedding (a recent survey on graph neural networks gives an up-to-date overview). SDNE and DNGR employ deep autoencoders to preserve the graph proximities and to model the positive pointwise mutual information (PPMI). The MGAE algorithm uses a marginalized single-layer autoencoder to learn representations for graph clustering.
The approaches above are typically unregularized: they mainly focus on preserving the structural relationships (probabilistic approaches) or minimizing the reconstruction error (matrix factorization or deep learning methods), and they have mostly ignored the latent data distribution of the representation. In practice, unregularized embedding approaches often learn a degenerate identity mapping in which the latent code space is free of any structure, which can easily result in poor representations of real-world sparse and noisy graph data. One standard way to handle this problem is to regularize the latent codes and enforce them to follow some prior data distribution. Recently, generative adversarial frameworks [22, 23, 24, 25] have also been developed for learning robust latent representations. However, none of these frameworks is designed specifically for graph data, where both topological structure and content information must be represented in a latent space.
In this paper, we propose a novel adversarially regularized algorithm with two variants for graph embedding: the adversarially regularized graph autoencoder (ARGA) and its variational version, the adversarially regularized variational graph autoencoder (ARVGA). The theme of our framework is not only to minimize the reconstruction error of the topological structure but also to enforce the learned latent embedding to match a prior distribution. By exploiting both graph structure and node content with a graph convolutional network, our algorithms encode the graph data in the latent space. With a decoder aiming at reconstructing the topological graph information, we further incorporate an adversarial training scheme to regularize the latent codes and learn a robust graph representation. The adversarial training module aims to discriminate whether the latent codes come from a real prior distribution or from the graph encoder. The graph encoder learning and the adversarial regularization are jointly optimized in a unified framework so that each benefits the other, leading to a better graph embedding. To gain further insight into the influence of the prior distribution, we vary it between the Gaussian distribution and the Uniform distribution for all models and tasks. Moreover, we examine different ways of constructing the graph decoder as well as different reconstruction targets. By doing so, we obtain a comprehensive view of the most influential factors of the adversarially regularized graph autoencoder models for different tasks. The experimental results on three benchmark graph datasets demonstrate the strong performance of our algorithms on two unsupervised graph analytic tasks, namely link prediction and node clustering. Our contributions are summarized below:
We propose a novel adversarially regularized framework for graph embedding, which represents topological structure and node content in a continuous vector space. Our framework learns the embedding to minimize the reconstruction error while enforcing the latent codes to match a prior distribution.
We develop two variants of adversarial approaches, adversarially regularized graph autoencoder (ARGA) and adversarially regularized variational graph autoencoder (ARVGA) to learn the graph embedding.
We examine different prior distributions, ways of constructing the decoder, and reconstruction targets to identify how each factor influences the adversarially regularized graph autoencoder models on various tasks.
Experiments on benchmark graph datasets demonstrate that our graph embedding approaches outperform the others on different unsupervised tasks.
The paper is structured as follows. Section 2 reviews the related work. Section 3 outlines the problem definition and our overall framework. Section 4 presents the proposed algorithm and Section 5 describes the experimental results. We conclude the paper in Section 6.
2 Related Work
2.1 Graph Embedding Models
Graph embedding, also known as network embedding or network representation learning, transfers a graph into vectors. From the perspective of information exploration, graph embedding algorithms can be separated into two groups: topological network embedding approaches and content enhanced network embedding methods.
Topological network embedding approaches. Topological network embedding approaches assume that only topological structure information is available, and the learning objective is to preserve the topological information as fully as possible [26, 27]. Inspired by the word embedding approach, Perozzi et al. propose the DeepWalk model to learn node embeddings from a collection of random walks. Since then, many probabilistic models have been developed. Specifically, Grover et al. propose a biased random walk approach, node2vec, which employs both breadth-first sampling (BFS) and depth-first sampling (DFS) strategies to generate random walk sequences for network embedding. Tang et al. propose the LINE algorithm to handle large-scale information networks while preserving both first-order and second-order proximity. Other random walk variants include the hierarchical representation learning approach (HARP), discriminative deep random walk (DDRW), and Walklets.
Because a graph can be mathematically represented as an adjacency matrix, many matrix factorization approaches have been proposed to learn the latent representation of a graph. GraRep integrates the global topological information of the graph into the learning process to represent each node in a low-dimensional space; HOPE preserves asymmetric transitivity by approximating high-order proximity, which better captures the topological information of graphs and supports reconstruction from partially observed graphs; DNE aims to learn discrete embeddings, which reduces storage and computational cost. Recently, deep learning models have been exploited to learn graph embeddings. These algorithms preserve the first- and second-order proximities, or reconstruct the positive pointwise mutual information (PPMI), via different variants of autoencoders.
Content enhanced network embedding methods. Content enhanced embedding methods assume that node content information is available and exploit topological information and content features simultaneously. TADW proved that DeepWalk can be interpreted as a factorization approach and proposed an extension of DeepWalk that exploits node features. TriDNR captures structure, node content, and label information via a tri-party neural network architecture. UPP-SNE employs an approximated kernel mapping scheme to exploit user profile features to enhance the embedding learning of users in social networks. SNE learns a neural network model to capture both structural proximity and attribute proximity for attributed social networks. DANE deals with dynamic environments through an incremental matrix factorization approach, and LANE incorporates label information into the optimization process to learn a better embedding. Recently, BANE was proposed to learn binarized embeddings for attributed graphs, which has the potential to increase the efficiency of subsequent graph analytic tasks.
Although these algorithms are well designed for graph-structured data, they have largely ignored the embedding distribution, which may result in poor representations of real-world graph data. In this paper, we explore adversarial training approaches to address this issue.
2.2 Adversarial Models
Our method is motivated by the generative adversarial network (GAN). GAN plays an adversarial game with two linked models: the generator and the discriminator. The discriminator decides whether an input sample comes from the prior data distribution or from the generator. Simultaneously, the generator is trained to produce samples that convince the discriminator that they come from the prior data distribution. Typically, the training process is split into two steps: (1) train the discriminator for several iterations to distinguish samples drawn from the expected data distribution from samples produced by the generator; then (2) train the generator to confuse the discriminator with its generated data. However, the original GAN does not fit unsupervised data encoding, because it lacks an explicit structure for inference. To bring the adversarial structure to learning data embeddings, existing works such as BiGAN, EBGAN, and ALI extend the original adversarial framework with external structures for inference, which have achieved notable performance in applications such as document retrieval and image classification. Other solutions generate the embedding from the discriminator or generator for semi-supervised and supervised tasks via reconstructed layers. For example, DCGAN bridges the gap between convolutional networks and generative adversarial networks with particular architectural constraints for unsupervised learning, and ANE combines a structure-preserving component and an adversarial learning scheme to learn a robust embedding.
Makhzani et al. proposed an adversarial autoencoder (AAE) to learn the latent embedding by merging the adversarial mechanism into the autoencoder. However, AAE is designed for general data rather than graph data. Recently, there have been some studies on applying adversarial mechanisms to graphs. However, these approaches can only exploit the topological information [42, 43]. In contrast, our algorithm is more flexible and can handle both topological and content information for graph data.
Though many adversarial models have achieved impressive success in computer vision, they cannot directly and effectively handle graph-structured data. Building on a preliminary study, we thoroughly explore combining graph convolutional models with different adversarial models to learn a robust graph embedding in this paper.
2.3 Graph Convolutional Nets based Models
Graph convolutional networks (GCN) are a semi-supervised framework based on a variant of convolutional neural networks that operates directly on graphs. Specifically, GCN represents the graph structure and the relationship between nodes and features with an adjacency matrix $A$ and a node-feature matrix $X$. Hence, GCN can directly embed the graph structure with a spectral convolutional function for each layer and train the model on a supervised target for all labelled nodes. Because of the spectral function on the adjacency matrix of the graph, the model can distribute the gradient from the supervised cost and learn the embedding of both the labelled and unlabelled nodes. While GCN is powerful on graph-structured data sets for semi-supervised tasks like node classification, the variational graph autoencoder (VGAE) extends it to unsupervised scenarios. Specifically, VGAE integrates the GCN into the variational autoencoder framework by framing the encoder with graph convolutional layers and remolding the decoder with a link prediction layer. Taking advantage of GCN layers, VGAE can naturally leverage the information of node features, which markedly strengthens its predictive performance.
3 Problem Definition and Framework
A graph is represented as $G = \{V, E, X\}$, where $V = \{v_i\}_{i=1,\dots,n}$ consists of the set of nodes in the graph and $e_{i,j} = \langle v_i, v_j \rangle \in E$ represents a linkage encoding the citation edge between the papers (nodes). The topological structure of graph $G$ can be represented by an adjacency matrix $A$, where $A_{i,j} = 1$ if $e_{i,j} \in E$, otherwise $A_{i,j} = 0$. $x_i \in X$ encodes the textual content features associated with each node $v_i$.
Given a graph $G$, our purpose is to map the nodes $v_i \in V$ to low-dimensional vectors $z_i \in \mathbb{R}^d$ with the formal format as follows: $f: (A, X) \to Z$, where $z_i^\top$ is the $i$-th row of the matrix $Z \in \mathbb{R}^{n \times d}$. $n$ is the number of nodes and $d$ is the dimension of the embedding. We take $Z$ as the embedding matrix, and the embeddings should preserve the topological structure $A$ as well as the content information $X$.
3.1 Overall Framework
The objective is to learn a robust embedding for a given graph $G = \{V, E, X\}$. To this end, we leverage an adversarial architecture with a graph autoencoder to directly process the entire graph and learn a robust embedding. Figure 1 illustrates the workflow of ARGA, which consists of two modules: the graph autoencoder and the adversarial network.
Graph convolutional autoencoder. The autoencoder takes in the structure of the graph $A$ and the node content $X$ as inputs to learn a latent representation $Z$, and then reconstructs the graph structure $A$ from $Z$. We further explore other variants of the graph autoencoder in Section 4.4.
Adversarial regularization. The adversarial network forces the latent codes to match a prior distribution by an adversarial training module, which discriminates whether the current latent code comes from the encoder or from the prior distribution.
4 Proposed Algorithm
4.1 Graph Convolutional Autoencoder
Our graph convolutional autoencoder aims to embed a graph in a low-dimensional space. Two fundamental questions arise: (1) how to simultaneously integrate graph structure and content features in an encoder, and (2) what sort of information should be reconstructed via a decoder?
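Graph Convolutional Encoder Model $G(X, A)$. To represent both graph structure $A$ and node content $X$ in a unified framework, we develop a variant of the graph convolutional network (GCN) as a graph encoder. GCN introduces the convolutional operation to graph data in the spectral domain and leverages a spectral convolution function $f(Z^{(l)}, A \mid W^{(l)})$ to build a layer-wise transformation:
$$Z^{(l+1)} = f(Z^{(l)}, A \mid W^{(l)}) \tag{1}$$
Here, $Z^{(l)}$ and $Z^{(l+1)}$ are the input and output of the convolution, respectively. We set $Z^{(0)} = X \in \mathbb{R}^{n \times m}$ ($n$ indicates the number of nodes and $m$ indicates the number of features) for our problem. We need to learn a filter parameter matrix $W^{(l)}$ in the neural network, and if the spectral convolution function is well defined, we can efficiently construct arbitrarily deep convolutional neural networks.
Each layer of our graph convolutional network can be expressed with the spectral convolution function $f(Z^{(l)}, A \mid W^{(l)})$ as follows:
$$f(Z^{(l)}, A \mid W^{(l)}) = \phi\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} Z^{(l)} W^{(l)}\right) \tag{2}$$
where $\tilde{A} = A + I$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. $I$ is the identity matrix of the same size as $A$, and $\phi$ is an activation function such as $\mathrm{Relu}(t) = \max(0, t)$ or $\mathrm{sigmoid}(t) = \frac{1}{1 + e^{-t}}$. Overall, the graph encoder is constructed with a two-layer GCN. In our paper, we develop two variants of the encoder, namely the Graph Encoder and the Variational Graph Encoder.
The Graph Encoder is constructed as follows:
$$Z^{(1)} = f_{\mathrm{Relu}}(X, A \mid W^{(0)}) \tag{3}$$
$$Z^{(2)} = f_{\mathrm{linear}}(Z^{(1)}, A \mid W^{(1)}) \tag{4}$$
$\mathrm{Relu}(\cdot)$ and linear activation functions are used for the first and second layers, respectively. Our graph convolutional encoder $G(X, A) = q(Z \mid X, A)$ encodes both graph structure and node content into a representation $Z = q(Z \mid X, A) = Z^{(2)}$.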
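The two-layer encoder described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the helper names (`normalize_adjacency`, `gcn_layer`, `graph_encoder`), the toy graph, and the hand-picked weights are all assumptions made for the example, and a real model would learn the weights by gradient descent.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Return D~^{-1/2} (A + I) D~^{-1/2}, the symmetric normalized adjacency."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    return [[inv_sqrt_deg[i] * a_tilde[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]

def relu(t):
    return max(0.0, t)

def linear(t):
    return t

def gcn_layer(a_norm, h, w, activation):
    """One graph convolution: activation(A_norm @ H @ W), applied element-wise."""
    out = matmul(matmul(a_norm, h), w)
    return [[activation(v) for v in row] for row in out]

def graph_encoder(adj, x, w0, w1):
    """Two-layer encoder: ReLU in the first layer, linear in the second."""
    a_norm = normalize_adjacency(adj)
    hidden = gcn_layer(a_norm, x, w0, relu)
    return gcn_layer(a_norm, hidden, w1, linear)

# Toy path graph 0 - 1 - 2 with one-hot node features (illustrative values only).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
w0 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]  # 3 features -> 2 hidden units
w1 = [[1.0], [1.0]]                          # 2 hidden units -> 1 embedding dim
z = graph_encoder(adj, x, w0, w1)            # one embedding row per node
```

Adding the identity before normalizing means every node also aggregates its own features, which is why the diagonal of the normalized matrix is non-zero even though the raw adjacency has no self-loops.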
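A Variational Graph Encoder is defined by an inference model:
$$q(Z \mid X, A) = \prod_{i=1}^{n} q(z_i \mid X, A) \tag{5}$$
$$q(z_i \mid X, A) = \mathcal{N}\left(z_i \mid \mu_i, \mathrm{diag}(\sigma^2)\right) \tag{6}$$
Here, $\mu = Z^{(2)}$ is the matrix of mean vectors $\mu_i$; similarly, $\log \sigma = f_{\mathrm{linear}}(Z^{(1)}, A \mid W'^{(1)})$, which shares the weights $W^{(0)}$ with $\mu$ in the first layer in Eq. (3).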
Decoder Model. Our decoder model is used to reconstruct the graph data. We can reconstruct either the graph structure $A$, the content information $X$, or both. In the basic version of our model (ARGA), we propose to reconstruct the graph structure $A$, which provides more flexibility in the sense that our algorithm will still function properly even if there is no content information $X$ available (e.g., $X = I$). We provide several variants of the decoder model in Section 4.4. Here the ARGA decoder $p(\hat{A} \mid Z)$ predicts whether there is a link between two nodes. More specifically, we train a link prediction layer based on the graph embedding:
$$p(\hat{A} \mid Z) = \prod_{i=1}^{n} \prod_{j=1}^{n} p(\hat{A}_{ij} \mid z_i, z_j) \tag{7}$$
$$p(\hat{A}_{ij} = 1 \mid z_i, z_j) = \mathrm{sigmoid}(z_i^\top z_j) \tag{8}$$
Here the prediction $\hat{A}$ should be close to the ground truth $A$.
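A link prediction layer of this kind can be illustrated with an inner product followed by a sigmoid. The sketch below is an assumption-laden toy (the function name `inner_product_decoder` and the embedding values are invented for illustration); it only shows how pairwise scores are turned into link probabilities.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def inner_product_decoder(z):
    """Return a matrix whose (i, j) entry is sigmoid(z_i . z_j)."""
    return [[sigmoid(sum(a * b for a, b in zip(zi, zj))) for zj in z] for zi in z]

# Illustrative 2-dimensional embeddings for three nodes.
z = [[2.0, 0.0], [1.5, 0.5], [-2.0, 0.0]]
a_hat = inner_product_decoder(z)
# Nodes 0 and 1 point in similar directions, so their predicted link
# probability is high; nodes 0 and 2 point in opposite directions, so it is low.
```

Because the score depends only on the dot product, the predicted matrix is symmetric, which suits undirected graphs such as the citation networks used in the experiments.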
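Graph Autoencoder Model. The embedding $Z$ and the reconstructed graph $\hat{A}$ can be presented as follows:
$$\hat{A} = \mathrm{sigmoid}(Z Z^\top), \quad \text{where } Z = q(Z \mid X, A) \tag{9}$$
Optimization. For the graph encoder, we minimize the reconstruction error of the graph data by:
$$\mathcal{L}_0 = -\mathbb{E}_{q(Z \mid X, A)}\left[\log p(\hat{A} \mid Z)\right] \tag{10}$$
For the variational graph encoder, we optimize the variational lower bound as follows:
$$\mathcal{L}_1 = \mathbb{E}_{q(Z \mid X, A)}\left[\log p(\hat{A} \mid Z)\right] - \mathrm{KL}\left[q(Z \mid X, A) \,\|\, p(Z)\right] \tag{11}$$
where $\mathrm{KL}[q(\cdot) \,\|\, p(\cdot)]$ is the Kullback-Leibler divergence between $q(\cdot)$ and $p(\cdot)$. $p(\cdot)$ is a prior distribution, which can be a uniform distribution or a Gaussian distribution in practice.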
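The two terms of the variational objective can be computed explicitly in a simple special case. The sketch below assumes a Bernoulli likelihood for each adjacency entry and a standard normal prior, for which the KL divergence of a diagonal Gaussian has a well-known closed form; both function names are hypothetical, and averaging over node pairs or nodes is a normalization choice made for the example.

```python
import math

def bernoulli_log_likelihood(adj, a_hat, eps=1e-12):
    """Mean log-likelihood of the observed adjacency under predicted link probabilities.

    Negating this value gives a cross-entropy style reconstruction error.
    """
    n = len(adj)
    total = 0.0
    for i in range(n):
        for j in range(n):
            p = min(max(a_hat[i][j], eps), 1.0 - eps)  # clamp to avoid log(0)
            total += adj[i][j] * math.log(p) + (1 - adj[i][j]) * math.log(1.0 - p)
    return total / (n * n)

def kl_to_standard_normal(mu, log_sigma):
    """KL[N(mu, diag(sigma^2)) || N(0, I)], summed over dimensions, averaged over nodes."""
    total = 0.0
    for mu_row, ls_row in zip(mu, log_sigma):
        for m, s in zip(mu_row, ls_row):
            total += -0.5 * (1.0 + 2.0 * s - m * m - math.exp(2.0 * s))
    return total / len(mu)

def variational_lower_bound(adj, a_hat, mu, log_sigma):
    """Log-likelihood term minus the KL term (to be maximized)."""
    return bernoulli_log_likelihood(adj, a_hat) - kl_to_standard_normal(mu, log_sigma)
```

When the encoder output already equals the prior (zero mean, unit variance) the KL term vanishes and the bound reduces to the reconstruction term alone.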
4.2 Adversarial Model
The fundamental idea of our model is to enforce the latent representation $Z$ to match a prior distribution, which is achieved by an adversarial training model. The adversarial model is built on a standard multi-layer perceptron (MLP) whose output layer has a single dimension with a sigmoid function. The adversarial model acts as a discriminator that distinguishes whether a latent code is from the prior $p_z$ (positive) or from the graph encoder $G(X, A)$ (negative). By minimizing the cross-entropy cost for training the binary classifier, the embedding is regularized and improved during the training process. The cost can be computed as follows:
$$-\frac{1}{2} \mathbb{E}_{z \sim p_z} \log D(Z) - \frac{1}{2} \mathbb{E}_{X} \log\left(1 - D(G(X, A))\right) \tag{12}$$
In our paper, we have examined both the Gaussian distribution and the Uniform distribution as $p_z$ for all models and tasks.
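The cross-entropy cost of the discriminator can be made concrete with a small sketch. It assumes the discriminator outputs are already available as probabilities in (0, 1); the function name `discriminator_cost` and the equal one-half weighting of the two terms are assumptions of this example rather than details confirmed by the text.

```python
import math

def discriminator_cost(d_prior, d_encoder, eps=1e-12):
    """Binary cross-entropy of a discriminator.

    d_prior:   discriminator outputs on codes sampled from the prior (label 1)
    d_encoder: discriminator outputs on codes produced by the encoder (label 0)
    """
    prior_term = sum(math.log(max(p, eps)) for p in d_prior) / len(d_prior)
    encoder_term = sum(math.log(max(1.0 - p, eps)) for p in d_encoder) / len(d_encoder)
    return -0.5 * prior_term - 0.5 * encoder_term

# A discriminator that cannot tell the two sources apart outputs 0.5 everywhere.
confused = discriminator_cost([0.5, 0.5], [0.5, 0.5])
# A nearly perfect discriminator is confident and correct on both sources.
confident = discriminator_cost([0.99, 0.99], [0.01, 0.01])
```

The confused case costs log 2, which is the value the encoder tries to push the discriminator towards, while the confident case costs close to zero.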
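Adversarial Graph Autoencoder Model. The equation for training the encoder model with the discriminator $D(Z)$ can be written as follows:
$$\min_{G} \max_{D} \; \mathbb{E}_{z \sim p_z}\left[\log D(Z)\right] + \mathbb{E}_{x \sim p(x)}\left[\log\left(1 - D(G(X, A))\right)\right] \tag{13}$$
where $G(X, A)$ and $D(Z)$ indicate the generator and discriminator explained above.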
4.3 Algorithm Explanation
Algorithm 1 is our proposed framework. Given a graph $G$, step 2 gets the latent variable matrix $Z$ from the graph convolutional encoder. Then we take the same number of samples from the generated $Z$ and from the real data distribution $p_z$ in steps 4 and 5, respectively, to update the discriminator with the cross-entropy cost computed in step 6. After $K$ runs of training the discriminator, the graph encoder tries to confuse the trained discriminator and updates itself with the generated gradient in step 8. We can update Eq. (10) to train the adversarially regularized graph autoencoder (ARGA), or Eq. (11) to train the adversarially regularized variational graph autoencoder (ARVGA). Finally, we return the graph embedding $Z$ in step 9.
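The alternating schedule described above can be sketched as a control-flow skeleton. The function below is not the authors' code: the callables `encode`, `sample_prior`, `update_discriminator`, and `update_encoder` are hypothetical placeholders that a real implementation would back with a neural network library, and the step numbers in the comments follow the description in the text.

```python
import random

def train_adversarial_autoencoder(encode, sample_prior, update_discriminator,
                                  update_encoder, num_iterations, k_steps,
                                  batch_size, rng):
    """Alternate k_steps discriminator updates with one encoder update."""
    for _ in range(num_iterations):
        z = encode()                                    # step 2: latent codes
        for _ in range(k_steps):
            fake = rng.sample(z, batch_size)            # step 4: codes from the encoder
            real = [sample_prior(rng) for _ in range(batch_size)]  # step 5: prior samples
            update_discriminator(real, fake)            # step 6: cross-entropy update
        update_encoder()                                # step 8: reconstruction + adversarial
    return encode()                                     # step 9: final embedding

# Stub components that only count how often they are called.
calls = {"discriminator": 0, "encoder": 0}
embedding = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

def encode():
    return embedding

def sample_prior(rng):
    return [rng.gauss(0.0, 1.0) for _ in range(2)]  # Gaussian prior, 2 dimensions

def update_discriminator(real, fake):
    calls["discriminator"] += 1

def update_encoder():
    calls["encoder"] += 1

result = train_adversarial_autoencoder(encode, sample_prior, update_discriminator,
                                       update_encoder, num_iterations=3, k_steps=5,
                                       batch_size=2, rng=random.Random(0))
```

Swapping `sample_prior` for a function that draws from a uniform distribution is all that is needed to change the prior in this skeleton.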
4.4 Decoder Variations
In the ARGA and ARVGA models, the decoder is merely a link prediction layer that computes a dot product of the embedding $Z$. In practice, the decoder can also be a graph convolutional layer or a combination of a link prediction layer and a graph convolutional decoder layer.
GCN Decoder for Graph Structure Reconstruction (ARGA_DG). We modify the architecture by adding two graph convolutional layers as the decoder to reconstruct the graph structure. This variant is named ARGA_DG. Fig. 2 illustrates the architecture of ARGA_DG. In this approach, the input of the decoder is the embedding from the encoder, and the graph convolutional decoder is constructed as follows:
$$Z_D = f_{\mathrm{linear}}(Z, A \mid W_D^{(1)}) \tag{14}$$
$$O = f_{\mathrm{linear}}(Z_D, A \mid W_D^{(2)}) \tag{15}$$
where $Z$ is the embedding learned from the graph encoder, while $Z_D$ and $O$ are the outputs from the first and second layers of the graph decoder. The number of columns of $O$ is equal to the number of nodes. Then we calculate the reconstruction error as follows:
$$\mathcal{L}_{ARGA\_DG} = -\mathbb{E}_{q(O \mid X, A)}\left[\log p(\hat{A} \mid O)\right] \tag{16}$$
GCN Decoder for both Graph Structure and Content Information Reconstruction (ARGA_AX). We further modify our graph convolutional decoder to reconstruct both the graph structure $A$ and the content information $X$. The architecture is illustrated in Fig. 3. We fix the dimension of the second graph convolutional layer to the number of features associated with every node, so that the output of the second layer is $O \in \mathbb{R}^{n \times m}$. In this case, the reconstruction loss is composed of two errors. First, the reconstruction error of the graph structure can be minimized as follows:
$$\mathcal{L}_A = -\mathbb{E}_{q(O \mid X, A)}\left[\log p(\hat{A} \mid O)\right] \tag{17}$$
Then the reconstruction error of the node content can be minimized with a similar formula:
$$\mathcal{L}_X = -\mathbb{E}_{q(O \mid X, A)}\left[\log p(\hat{X} \mid O)\right] \tag{18}$$
The final reconstruction error is the sum of the reconstruction errors of the graph structure and the node content:
$$\mathcal{L}_0 = \mathcal{L}_A + \mathcal{L}_X \tag{19}$$
5 Experiments
We report our results on both link prediction and node clustering tasks. The benchmark graph datasets used in the paper, Cora, Citeseer, and PubMed, are summarized in Table 1. Each dataset consists of scientific publications as nodes and citation relationships as edges. The features are the unique words in each document.
Table 1: Benchmark graph datasets.
|Data Set|# Nodes|# Links|# Content Words|# Features|
Table 2: Link prediction results in terms of AUC and AP (mean ± standard error, in %).
|Approaches|Cora AUC|Cora AP|Citeseer AUC|Citeseer AP|PubMed AUC|PubMed AP|
|SC|84.6 ± 0.01|88.5 ± 0.00|80.5 ± 0.01|85.0 ± 0.01|84.2 ± 0.02|87.8 ± 0.01|
|DW|83.1 ± 0.01|85.0 ± 0.00|80.5 ± 0.02|83.6 ± 0.01|84.4 ± 0.00|84.1 ± 0.00|
|GAE*|84.3 ± 0.02|88.1 ± 0.01|78.7 ± 0.02|84.1 ± 0.02|82.2 ± 0.01|87.4 ± 0.00|
|VGAE*|84.0 ± 0.02|87.7 ± 0.01|78.9 ± 0.03|84.1 ± 0.02|82.7 ± 0.01|87.5 ± 0.01|
|GAE|91.0 ± 0.02|92.0 ± 0.03|89.5 ± 0.04|89.9 ± 0.05|96.4 ± 0.00|96.5 ± 0.00|
|VGAE|91.4 ± 0.01|92.6 ± 0.01|90.8 ± 0.02|92.0 ± 0.02|94.4 ± 0.02|94.7 ± 0.02|
|ARGA|92.4 ± 0.003|93.2 ± 0.003|91.9 ± 0.003|93.0 ± 0.003|96.8 ± 0.001|97.1 ± 0.001|
|ARVGA|92.4 ± 0.004|92.6 ± 0.004|92.4 ± 0.003|93.0 ± 0.003|96.5 ± 0.001|96.8 ± 0.001|
|ARGA_DG|77.9 ± 0.003|78.9 ± 0.003|74.4 ± 0.003|76.2 ± 0.003|95.1 ± 0.001|95.2 ± 0.001|
|ARVGA_DG|88.0 ± 0.004|87.9 ± 0.004|89.7 ± 0.003|90.5 ± 0.003|93.2 ± 0.001|93.6 ± 0.001|
|ARGA_AX|91.3 ± 0.003|91.3 ± 0.003|91.9 ± 0.003|93.4 ± 0.003|96.6 ± 0.001|96.7 ± 0.001|
|ARVGA_AX|90.2 ± 0.004|89.2 ± 0.004|89.8 ± 0.003|90.4 ± 0.003|96.7 ± 0.001|97.1 ± 0.001|
5.1 Link Prediction
Baselines. Twelve algorithms in total are compared for the link prediction task:
DeepWalk is a network representation approach which encodes social relations into a continuous vector space.
Spectral Clustering is an effective approach for learning social embeddings.
GAE is the most recent autoencoder-based unsupervised framework for graph data, which naturally leverages both topological structure and content information. GAE* is the version of GAE which only considers the topological information $A$, i.e., $X = I$.
VGAE is the variational graph autoencoder for graph embedding with both topological and content information. Likewise, VGAE* is a simplified version of VGAE which only leverages the topological information.
ARGA is our proposed adversarially regularized autoencoder algorithm which uses graph autoencoder to learn the embedding.
ARVGA is our proposed algorithm, which uses a variational graph autoencoder to learn the embedding.
ARGA_DG is a variant of our proposed ARGA which takes graph convolutional layers as its decoder to reconstruct graph structure. ARVGA_DG is the variational version of ARGA_DG.
ARGA_AX is a variant of our proposed ARGA which takes graph convolutional layers as its decoder to simultaneously reconstruct graph structure and node content. ARVGA_AX is the variational version of ARGA_AX.
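We report the results in terms of the AUC score (the area under the receiver operating characteristic curve) and the average precision (AP) score. The AUC can be computed as follows:
$$\mathrm{AUC} = \frac{\sum_{i=1}^{n_{+}} \sum_{j=1}^{n_{-}} \mathbb{1}\left[\mathrm{pred}(x_i^{+}) > \mathrm{pred}(x_j^{-})\right]}{n_{+} \, n_{-}} \tag{20}$$
where $\mathrm{pred}(\cdot)$ is the output of the predictor, and $n_{+}$ and $n_{-}$ are the number of positive samples and the number of negative samples, respectively. We also report the average precision (AP), which indicates the area under the precision-recall curve:
$$\mathrm{AP} = \frac{\sum_{k} \mathrm{precision}(k)}{n_{+}} \tag{21}$$
where $k$ is an index over the positive samples in the ranked list and $\mathrm{precision}(k)$ is the precision at the rank of the $k$-th positive sample.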
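Both metrics can be computed directly from the predictor scores of positive (existing) and negative (non-existing) edges. The following is a small sketch using the standard pairwise definition of AUC and the usual ranking-based definition of AP; counting ties as half a win is a common convention adopted here as an assumption, and the function names are hypothetical.

```python
def auc_score(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def average_precision(pos_scores, neg_scores):
    """Mean of the precision values measured at the rank of each positive sample."""
    ranked = sorted([(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores],
                    key=lambda item: item[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / len(pos_scores)

# Two true edges scored 0.9 and 0.4, two non-edges scored 0.5 and 0.1.
auc = auc_score([0.9, 0.4], [0.5, 0.1])         # 3 of 4 pairs correct
ap = average_precision([0.9, 0.4], [0.5, 0.1])  # (1/1 + 2/3) / 2
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for illustration but would normally be replaced by a rank-based computation on large test sets.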
We conduct each experiment 10 times and report the mean values with the standard errors as the final scores. Each dataset is split into a training set, a validation set, and a test set. The validation set contains 5% of the citation edges and is used for hyperparameter optimization, the test set holds 10% of the citation edges to verify the performance, and the rest are used for training.
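The edge split described above can be sketched as a shuffle followed by slicing. The function name, the seed parameter, and the rounding rule are assumptions of this example; it also leaves out the sampling of non-existing edges as negative examples, which link prediction evaluation additionally requires.

```python
import random

def split_edges(edges, val_fraction=0.05, test_fraction=0.10, seed=0):
    """Shuffle the edge list and split it into train, validation, and test parts."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    n_test = round(len(shuffled) * test_fraction)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test

# A toy edge list with 100 edges between consecutive node ids.
edges = [(i, i + 1) for i in range(100)]
train, val, test = split_edges(edges)
```

Using a fixed seed per run makes each of the repeated experiments reproducible while still allowing different splits across runs.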
For the Cora and Citeseer data sets, we train all autoencoder-related models for 200 iterations and optimize them with the Adam algorithm. Both the learning rate and the discriminator learning rate are set to 0.001. As the PubMed dataset is relatively large (around 20,000 nodes), we iterate 2,000 times for adequate training, with a discriminator learning rate of 0.008 and a learning rate of 0.001. We construct encoders with a 32-neuron hidden layer and a 16-neuron embedding layer for all the experiments, and all the discriminators are built with two hidden layers (16 neurons and 64 neurons, respectively). For the rest of the baselines, we retain the settings described in the corresponding papers.
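For reference, the settings stated above can be collected in a small configuration structure. The key names and the `settings_for` helper are hypothetical; only the numeric values and the Adam optimizer for Cora and Citeseer come from the text.

```python
# Settings shared by all experiments (key names are illustrative).
SHARED_SETTINGS = {
    "encoder_hidden_units": 32,
    "embedding_units": 16,
    "discriminator_hidden_units": (16, 64),
    "learning_rate": 0.001,
}

# Per-dataset settings. The optimizer for PubMed is not stated explicitly
# in the text, so it is left out rather than guessed.
DATASET_SETTINGS = {
    "cora": {"iterations": 200, "discriminator_learning_rate": 0.001, "optimizer": "adam"},
    "citeseer": {"iterations": 200, "discriminator_learning_rate": 0.001, "optimizer": "adam"},
    "pubmed": {"iterations": 2000, "discriminator_learning_rate": 0.008},
}

def settings_for(dataset):
    """Merge the shared settings with the dataset-specific ones."""
    merged = dict(SHARED_SETTINGS)
    merged.update(DATASET_SETTINGS[dataset.lower()])
    return merged

pubmed_settings = settings_for("PubMed")
```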
Experimental Results. The details of the experimental results on link prediction are shown in Table 2. The results show that, by incorporating an effective adversarial training module into our graph convolutional autoencoder, ARGA and ARVGA achieve strong performance: their AUC and AP scores are around 92% or higher on all three data sets. On the large PubMed data set, ARGA improves the AP score by around 2.5% relative to VGAE with node features, 11% relative to VGAE without node features, and 15.5% and 10.6% relative to DeepWalk and Spectral Clustering, respectively.
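The percentages quoted for PubMed are relative improvements, which can be checked against the PubMed AP column of Table 2. In the sketch below the AP values are read from that table; attributing the 87.5 entry to VGAE without node features is an inference from the row order, and the function name is hypothetical.

```python
def relative_improvement(ours, baseline):
    """Relative gain of `ours` over `baseline`, in percent."""
    return 100.0 * (ours - baseline) / baseline

arga_ap = 97.1  # ARGA, PubMed AP
baseline_ap = {
    "VGAE": 94.7,
    "VGAE without node features": 87.5,
    "DeepWalk": 84.1,
    "Spectral Clustering": 87.8,
}
gains = {name: round(relative_improvement(arga_ap, ap), 1)
         for name, ap in baseline_ap.items()}
# gains: VGAE 2.5, VGAE without node features 11.0,
#        DeepWalk 15.5, Spectral Clustering 10.6
```

The same helper applies to the clustering comparisons, where improvements are likewise quoted relative to each baseline's score.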
The approaches which use both node content and topological information consistently perform better than those which consider only the graph structure. The gap between the ARGA and GAE models demonstrates that regularizing the latent codes helps to learn a robust embedding. The impact of the various distributions, decoder architectures, and reconstruction targets is discussed in Section 5.3.
Parameter Study. We conducted experiments on the Cora dataset by varying the embedding dimension from 8 to 1024 neurons and report the results in Fig. 4.
The results in Fig. 4 (A) and (B) reveal similar trends: when the embedding dimension increases from 8 to 16 neurons, the link prediction performance rises steadily; when we further increase the embedding layer to 32 neurons, the performance fluctuates, although both the AP score and the AUC score remain good.
It is worth mentioning that if we continue to add neurons, for example 64, 128, and 1024 neurons, the performance rises dramatically.
5.2 Node Clustering
For the node clustering task, we first learn the graph embedding, and after that, we perform the K-means clustering method based on the embedding.
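The clustering step can be illustrated with a compact K-means run on a toy embedding. This sketch is not the experimental code: it uses a deterministic farthest-point initialization (an assumption chosen so the example is reproducible), and the function names and embedding values are invented for illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points,
                       key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until labels stop changing."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy 2-dimensional embedding with two well-separated groups of nodes.
embedding = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
             [5.0, 5.1], [5.2, 5.0], [5.1, 4.9]]
labels, centroids = kmeans(embedding, k=2)
```

In the experiments the input would be the learned embedding matrix, with the number of clusters set to the number of classes in the dataset.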
Baselines. We compare both embedding-based approaches and approaches designed directly for graph clustering. In addition to the baselines compared for link prediction, we also include baselines which are designed for clustering. Twenty approaches in total are compared in the experiments. For a comprehensive validation, we consider algorithms which use only one source of information, either network structure or node content, as well as algorithms which use both.
Node Content or Graph Structure Only:
K-means is a classical method and also the foundation of many clustering algorithms.
BigClam is a community detection algorithm based on NMF.
GraphEncoder learns graph embedding for spectral graph clustering.
Both Content and Structure:
Circles is an overlapping graph clustering algorithm which treats each node as an ego and builds the ego graph with the linkages between the ego’s friends.
RTM learns the topic distributions of each document from both text and citation.
TADW applies matrix factorization for network representation learning.
Table 3 gives a detailed comparison of most of the baselines. To save space, we do not list the variational versions of our models. The columns Recovering $A$ and Recovering $X$ in the table indicate whether the model reconstructs the graph structure ($A$) and the node content ($X$). Please note that we do not report the clustering results of Circles on the PubMed dataset, as a single experiment ran for more than three days without producing any outcome or error. We think this is because of the large size of the PubMed dataset (around 20,000 nodes). Note that the Circles algorithm works well on the other two datasets.
Metrics: Following prior work, we employ five metrics to validate the clustering results: accuracy (Acc), F1 score (F1), normalized mutual information (NMI), precision, and adjusted Rand index (ARI).
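Two of these metrics can be sketched compactly. Clustering accuracy needs a matching between predicted cluster ids and true class ids; the sketch below finds the best matching by brute force over permutations, which is only practical for a handful of clusters and assumes there are no more predicted clusters than true classes. The NMI variant shown normalizes by the arithmetic mean of the two entropies, which is one of several conventions. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

def clustering_accuracy(true_labels, pred_labels):
    """Best accuracy over all one-to-one mappings from predicted clusters to classes."""
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(pred_labels))
    best = 0
    for perm in permutations(true_ids, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        correct = sum(1 for t, p in zip(true_labels, pred_labels) if mapping[p] == t)
        best = max(best, correct)
    return best / len(true_labels)

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def normalized_mutual_information(true_labels, pred_labels):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)
    mi = sum((c / n) * math.log(c * n / (true_counts[t] * pred_counts[p]))
             for (t, p), c in joint.items())
    denom = 0.5 * (entropy(true_labels) + entropy(pred_labels))
    return mi / denom if denom > 0 else 1.0

truth = [0, 0, 1, 1]
swapped = [1, 1, 0, 0]      # same partition with the cluster ids exchanged
independent = [0, 1, 0, 1]  # partition unrelated to the truth
acc_swapped = clustering_accuracy(truth, swapped)
nmi_swapped = normalized_mutual_information(truth, swapped)
nmi_independent = normalized_mutual_information(truth, independent)
```

Both metrics are invariant to how the cluster ids are named, which is why the swapped labelling scores as a perfect clustering.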
Experimental Results. The clustering results on the Cora, Citeseer, and PubMed data sets are given in Table 4, Table 5, and Table 6. The results show that ARGA and ARVGA achieve a dramatic improvement on all five metrics compared with all the other baselines. For instance, on Citeseer, ARGA's relative improvement in accuracy ranges from 6.1% over K-means to 154.7% over GraphEncoder; in F1 score, from 31.9% over TADW to 102.2% over DeepWalk; and in NMI, from 14.8% over K-means to 124.4% over VGAE.
Furthermore, as the three tables show, the clustering results of approaches such as BigClam and DeepWalk, which consider only one source of information about the graph, are inferior to those of approaches which consider both the topological information and the node content of the graph. However, both the purely GCN-based approaches and the methods considering multi-view information still obtain only sub-optimal results compared to the adversarially regularized graph convolutional models.
The wide margin in the results between ARGA and GAE (and the others) has further demonstrated the superiority of our adversarially regularized graph autoencoder.
Parameter Study. We conducted experiments on the Cora dataset by varying the embedding dimension from 8 to 1024 neurons and report the results in Fig. 5. All metrics show similar fluctuations as the embedding dimension increases. We cannot identify a clear trend relating the embedding dimension to the score of each clustering metric. This observation indicates that the unsupervised clustering task is more sensitive to the parameters than supervised learning tasks (e.g., link prediction in Section 5.1).
5.3 ARGA Architectures Comparison
In this section, we construct six versions of the model: the adversarially regularized graph autoencoder (ARGA), the adversarially regularized graph autoencoder with a graph convolutional decoder (ARGA_DG), the adversarially regularized graph autoencoder for reconstructing both graph structure and node content (ARGA_AX), and their variational versions. We conduct all experiments with a Gaussian prior and a Uniform prior, respectively, for every model. We analyze the comparison experiments and discuss the reasons behind the results. The experimental results are illustrated in Figs. 6 and 7.
Gaussian Distribution vs Uniform Distribution. The performance of the proposed models is not very sensitive to the prior distribution, especially for the node clustering task. As shown in Fig. 6, when we compare the results of the two distributions on the same metric, the results from the same model are, in most cases, very similar.
As for link prediction (Fig. 7), the Uniform distribution dramatically lowers the performance of ARGA_DG on all datasets and metrics compared to the Gaussian distribution. ARGA and its variational version are less sensitive to the choice of distribution than the ARGA_DG models. The standard version of ARGA with the Gaussian distribution slightly outperforms the one with the Uniform distribution, while the situation is reversed for the variational ARGA models.
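To make the two priors concrete, the following minimal sketch draws the latent codes that the adversarial module would treat as "real" samples under each setting. The function name `sample_prior`, the standard normal parameters, and the Uniform range [-1, 1] are illustrative assumptions, not values stated in this section.

```python
import random


def sample_prior(kind, num_nodes, dim, seed=0):
    """Draw `num_nodes` latent codes of length `dim` from the chosen prior.

    Assumptions (not taken from the paper): the Gaussian prior is a
    standard normal and the Uniform prior covers the range [-1, 1].
    """
    rng = random.Random(seed)
    if kind == "gaussian":
        draw = lambda: rng.gauss(0.0, 1.0)
    elif kind == "uniform":
        draw = lambda: rng.uniform(-1.0, 1.0)
    else:
        raise ValueError(f"unknown prior: {kind!r}")
    # One row per node, one column per latent dimension.
    return [[draw() for _ in range(dim)] for _ in range(num_nodes)]


# Toy sizes chosen only for illustration.
gaussian_codes = sample_prior("gaussian", num_nodes=4, dim=3)
uniform_codes = sample_prior("uniform", num_nodes=4, dim=3)
```

Under this sketch, switching the prior changes only the samples fed to the discriminator; the encoder and decoder are untouched, which is one way to read the comparison above as isolating the effect of the prior.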
Decoders and Reconstructions. As shown in Fig. 7, ARGA with the Gaussian distribution and an inner product decoder for reconstructing the graph structure has a significant advantage in link prediction, since it is designed to predict whether there is a link between two nodes. Simply replacing the decoder with graph convolutional layers to reconstruct the adjacency matrix (ARGA_DG) yields sub-optimal link prediction performance compared to ARGA. According to the statistics in Fig. 6, although the clustering performance of ARGA_DG is comparable with that of the original ARGA, there is still a gap between the two variants. A likely explanation is that two graph convolutional layers in the decoder cannot effectively decode the topological information of the graph, which leads to the sub-optimal results. The model with a graph convolutional decoder for reconstructing both topological information and node content (ARGA_AX) supports this hypothesis. As can be seen in Figs. 6 and 7, ARGA_AX dramatically improves performance on both link prediction and clustering compared to ARGA_DG, which reconstructs only the topological structure. ARGA and ARGA_AX perform very similarly on both link prediction and clustering. The variational version of ARGA_AX (ARVGA_AX) performs particularly well on clustering, achieving a 12.2% improvement in clustering accuracy on the Cora dataset and a 5.4% improvement on the Citeseer dataset compared to ARVGA.
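As a rough illustration of why an inner product decoder suits link prediction, the sketch below scores every node pair by the sigmoid of the inner product of their embeddings, so each output is directly a link probability. The function name `inner_product_decoder` and the toy embedding matrix `Z` are hypothetical and used only for this example.

```python
import math


def inner_product_decoder(Z):
    """Return link probabilities sigmoid(z_i . z_j) for all node pairs.

    Z is a list of node embeddings (one list of floats per node).
    """
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    return [
        [sigmoid(sum(a * b for a, b in zip(z_i, z_j))) for z_j in Z]
        for z_i in Z
    ]


# Toy embeddings (made up): nodes 0 and 1 point in similar directions,
# node 2 points roughly the opposite way.
Z = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]]
probs = inner_product_decoder(Z)
```

In this toy case, nodes 0 and 1 receive a link probability above 0.5, while nodes 0 and 2 fall below it. The decoder has no trainable parameters of its own, so all of the structural information has to be carried by the embeddings.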
6 Conclusion and Future Work
In this paper, we proposed a novel adversarial graph embedding framework for graph data. We argue that most existing graph embedding algorithms are unregularized methods that ignore the data distribution of the latent representation and therefore suffer from inferior embeddings on real-world graph data. We proposed an adversarial training scheme that regularizes the latent codes and enforces them to match a prior distribution. The adversarial module is jointly learned with a graph convolutional autoencoder to produce a robust representation. We also explored variations of ARGA, such as ARGA_DG and ARGA_AX, to discuss the impact of a graph convolutional decoder and of reconstructing both graph structure and node content. Experimental results demonstrated that our algorithms ARGA and ARVGA outperform the baselines in link prediction and node clustering tasks.
There are several directions for future work on the adversarially regularized graph autoencoders (ARGA). We will investigate how to use the ARGA model to generate realistic graphs, which may help discover new drugs in biological domains. We will also study how to incorporate label information into ARGA to learn robust graph embeddings.
References
- [1] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.
- [2] C. Wang, S. Pan, G. Long, X. Zhu, and J. Jiang, “Mgae: Marginalized graph autoencoder for graph clustering,” in CIKM. ACM, 2017, pp. 889–898.
- [3] F. Xiong, X. Wang, S. Pan, H. Yang, H. Wang, and C. Zhang, “Social recommendation with evolutionary opinion dynamics,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, no. 99, pp. 1–13, 2018.
- [4] H. Cai, V. W. Zheng, and K. C.-C. Chang, “A comprehensive survey of graph embedding: Problems, techniques and applications,” IEEE Transactions on Knowledge and Data Engineering, 2018.
- [5] S. Pan, J. Wu, and X. Zhu, “Cogboost: Boosting for fast cost-sensitive graph classification,” IEEE Trans. Knowl. Data Eng., vol. 27, no. 11, pp. 2933–2946, 2015.
- [6] S. Pan, J. Wu, X. Zhu, C. Zhang, and P. S. Yu, “Joint structure feature exploration and regularization for multi-task graph classification,” IEEE Trans. Knowl. Data Eng., vol. 28, no. 3, pp. 715–728, 2016.
- [7] P. Cui, X. Wang, J. Pei, and W. Zhu, “A survey on network embedding,” IEEE Transactions on Knowledge and Data Engineering, 2018.
- [8] D. Zhang, J. Yin, X. Zhu, and C. Zhang, “Network representation learning: A survey,” IEEE Transactions on Big Data, 2018.
- [9] P. Goyal and E. Ferrara, “Graph embedding techniques, applications, and performance: A survey,” arXiv preprint arXiv:1705.02801, 2017.
- [10] B. Perozzi, R. Al-Rfou, and S. Skiena, “Deepwalk: Online learning of social representations,” in SIGKDD. ACM, 2014, pp. 701–710.
- [11] A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in SIGKDD. ACM, 2016, pp. 855–864.
- [12] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “LINE: large-scale information network embedding,” in WWW, 2015, pp. 1067–1077.
- [13] L. Tang and H. Liu, “Leveraging social media networks for classification,” DMKD, vol. 23, no. 3, pp. 447–478, 2011.
- [14] S. Cao, W. Lu, and Q. Xu, “Grarep: Learning graph representations with global structural information,” in CIKM. ACM, 2015, pp. 891–900.
- [15] M. Ou, P. Cui, J. Pei, et al., “Asymmetric transitivity preserving graph embedding,” in KDD, 2016, pp. 1105–1114.
- [16] X. Wang, P. Cui, J. Wang, et al., “Community preserving network embedding,” in AAAI, 2017, pp. 203–209.
- [17] J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang, “Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec,” arXiv preprint arXiv:1710.02971, 2017.
- [18] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive survey on graph neural networks,” arXiv preprint arXiv:1901.00596, 2019.
- [19] D. Wang, P. Cui, and W. Zhu, “Structural deep network embedding,” in SIGKDD. ACM, 2016, pp. 1225–1234.
- [20] S. Cao, W. Lu, and Q. Xu, “Deep neural networks for learning graph representations,” in AAAI, 2016, pp. 1145–1152.
- [21] A. Makhzani, J. Shlens, N. Jaitly, et al., “Adversarial autoencoders,” arXiv preprint arXiv:1511.05644, 2015.
- [22] J. Donahue, P. Krähenbühl, and T. Darrell, “Adversarial feature learning,” arXiv preprint arXiv:1605.09782, 2016.
- [23] J. Zhao, M. Mathieu, and Y. LeCun, “Energy-based generative adversarial network,” arXiv preprint arXiv:1609.03126, 2016.
- [24] V. Dumoulin, I. Belghazi, B. Poole, et al., “Adversarially learned inference,” arXiv preprint arXiv:1606.00704, 2016.
- [25] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv:1511.06434, 2015.
- [26] D. Zhu, P. Cui, Z. Zhang, J. Pei, and W. Zhu, “High-order proximity preserved embedding for dynamic networks,” IEEE Transactions on Knowledge and Data Engineering, 2018.
- [27] H. Gui, J. Liu, F. Tao, M. Jiang, B. Norick, L. Kaplan, and J. Han, “Embedding learning with events in heterogeneous information networks,” IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 11, pp. 2428–2441, 2017.
- [28] T. Mikolov, K. Chen, G. Corrado, et al., “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
- [29] H. Chen, B. Perozzi, Y. Hu, and S. Skiena, “HARP: hierarchical representation learning for networks,” in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2-7, 2018.
- [30] J. Li, J. Zhu, and B. Zhang, “Discriminative deep random walk for network classification,” in Proceedings of the Annual Meeting of the Association for Computational Linguistics, vol. 1, 2016, pp. 1004–1013.
- [31] B. Perozzi, V. Kulkarni, and S. Skiena, “Walklets: Multiscale graph embeddings for interpretable network classification,” arXiv preprint arXiv:1605.02115, 2016.
- [32] X. Shen, S. Pan, W. Liu, Y. Ong, and Q. Sun, “Discrete network embedding,” in Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, 2018, pp. 3549–3555.
- [33] C. Yang, Z. Liu, D. Zhao, et al., “Network representation learning with rich text information,” in IJCAI, 2015, pp. 2111–2117.
- [34] S. Pan, J. Wu, X. Zhu, et al., “Tri-party deep network representation,” Network, vol. 11, no. 9, p. 12, 2016.
- [35] D. Zhang, J. Yin, X. Zhu, and C. Zhang, “User profile preserving social network embedding,” in IJCAI. AAAI Press, 2017, pp. 3378–3384.
- [36] L. Liao, X. He, H. Zhang, and T.-S. Chua, “Attributed social network embedding,” IEEE Transactions on Knowledge and Data Engineering, pp. 1–1, 2018.
- [37] J. Li, H. Dani, X. Hu, J. Tang, Y. Chang, and H. Liu, “Attributed network embedding for learning in a dynamic environment,” in Proceedings of the ACM International Conference on Information and Knowledge Management, 2017, pp. 387–396.
- [38] X. Huang, J. Li, and X. Hu, “Label informed attributed network embedding,” in WSDM. ACM, 2017, pp. 731–739.
- [39] H. Yang, S. Pan, P. Zhang, L. Chen, D. Lian, and C. Zhang, “Binarized attributed network embedding,” in ICDM, 2018.
- [40] I. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., “Generative adversarial nets,” in NIPS, 2014, pp. 2672–2680.
- [41] J. Glover, “Modeling documents with generative adversarial networks,” arXiv preprint arXiv:1612.09122, 2016.
- [42] Q. Dai, Q. Li, J. Tang, et al., “Adversarial network embedding,” arXiv preprint arXiv:1711.07838, 2017.
- [43] H. Wang, J. Wang, J. Wang, M. Zhao, W. Zhang, F. Zhang, X. Xie, and M. Guo, “Graphgan: Graph representation learning with generative adversarial nets,” arXiv preprint arXiv:1711.08267, 2017.
- [44] S. Pan, R. Hu, G. Long, J. Jiang, L. Yao, and C. Zhang, “Adversarially regularized graph autoencoder for graph embedding,” in IJCAI, 2018, pp. 2609–2615.
- [45] T. N. Kipf and M. Welling, “Variational graph auto-encoders,” NIPS, 2016.
- [46] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” arXiv preprint arXiv:1312.6114, 2013.
- [47] Q. Lu and L. Getoor, “Link-based classification,” in ICML, 2003, pp. 496–503.
- [48] P. Sen, G. Namata, M. Bilgic, et al., “Collective classification in network data,” AI Magazine, vol. 29, no. 3, p. 93, 2008.
- [49] G. Namata, B. London, L. Getoor, et al., “Query-driven active surveying for collective classification,” in MLG, 2012.
- [50] F. Tian, B. Gao, Q. Cui, et al., “Learning deep representations for graph clustering,” in AAAI, 2014, pp. 1293–1299.
- [51] J. Leskovec and J. J. Mcauley, “Learning to discover social circles in ego networks,” in NIPS, 2012, pp. 539–547.
- [52] J. Chang and D. Blei, “Relational topic models for document networks,” in Artificial Intelligence and Statistics, 2009, pp. 81–88.
- [53] R. Xia, Y. Pan, L. Du, et al., “Robust multi-view spectral clustering via low-rank and sparse decomposition,” in AAAI, 2014, pp. 2149–2155.
- [54] J. You, R. Ying, X. Ren, W. Hamilton, and J. Leskovec, “Graphrnn: Generating realistic graphs with deep auto-regressive models,” in International Conference on Machine Learning, 2018, pp. 5694–5703.