Learning Graph Embedding with Adversarial Training Methods

01/04/2019 ∙ by Shirui Pan, et al. ∙ City University of Hong Kong ∙ University of Technology Sydney ∙ Monash University

Graph embedding aims to transfer a graph into vectors to facilitate subsequent graph analytics tasks like link prediction and graph clustering. Most approaches to graph embedding focus on preserving the graph structure or minimizing the reconstruction errors for graph data. They have mostly overlooked the embedding distribution of the latent codes, which unfortunately may lead to inferior representation in many cases. In this paper, we present a novel adversarially regularized framework for graph embedding. By employing the graph convolutional network as an encoder, our framework embeds the topological information and node content into a vector representation, from which a graph decoder is further built to reconstruct the input graph. The adversarial training principle is applied to enforce our latent codes to match a prior Gaussian or Uniform distribution. Based on this framework, we derive two variants of adversarial models, the adversarially regularized graph autoencoder (ARGA) and its variational version, the adversarially regularized variational graph autoencoder (ARVGA), to learn the graph embedding effectively. We also explore other potential variations of ARGA and ARVGA to get a deeper understanding of our designs. Experimental results compared among twelve algorithms for link prediction and twenty algorithms for graph clustering validate our solutions.


1 Introduction

Graphs are essential tools to capture and model complicated relationships among data. In a variety of graph applications, such as social networks, citation networks, and protein-protein interaction networks, graph data analysis plays an important role in various data mining tasks including node classification [1], node clustering [2], node recommendation [3, 4], and graph classification [5, 6]. However, the high computational complexity, low parallelizability, and inapplicability of machine learning methods to graph data have made these graph analytic tasks profoundly challenging [7]. Graph embedding has recently emerged as a general approach to these problems.

Graph embedding transfers graph data into a low dimensional, compact, and continuous feature space. The fundamental idea is to preserve the topological structure, vertex content, and other side information [8]. This new learning paradigm has shifted the tasks of seeking complex models for classification, clustering, and link prediction to learning a compact and informative representation for the graph data, so that many graph mining tasks can be easily performed by employing simple traditional models (e.g., a linear SVM for the classification task). This merit has motivated many studies in this area [4, 9].

Graph embedding algorithms can be classified into three categories: probabilistic models, matrix factorization-based algorithms, and deep learning-based algorithms. Probabilistic models like DeepWalk [10], node2vec [11] and LINE [12] attempt to learn graph embedding by extracting different patterns from the graph. The captured patterns or walks include global structural equivalence, local neighborhood connectivities, and other various order proximities. Compared with classical methods such as Spectral Clustering [13], these graph embedding algorithms perform more effectively and are scalable to large graphs.

Matrix factorization-based algorithms, such as GraRep [14], HOPE [15], and M-NMF [16], pre-process the graph structure into an adjacency matrix and obtain the embedding by factorizing the adjacency matrix. It has recently been shown that many probabilistic algorithms, including DeepWalk [10], LINE [12], and node2vec [11], are equivalent to matrix factorization approaches [17], and Qiu et al. propose a unified matrix factorization approach, NetMF [17], for graph embedding. Deep learning approaches, especially autoencoder-based methods, are also studied for graph embedding (an up-to-date survey on graph neural networks can be found in [18]). SDNE [19] and DNGR [20] employ deep autoencoders to preserve the graph proximities and model the positive pointwise mutual information (PPMI). The MGAE algorithm utilizes a marginalized single layer autoencoder to learn representation for graph clustering [2].

The approaches above are typically unregularized approaches which mainly focus on preserving the structure relationship (probabilistic approaches) or minimizing the reconstruction error (matrix factorization or deep learning methods). They have mostly ignored the latent data distribution of the representation. In practice, unregularized embedding approaches often learn a degenerate identity mapping where the latent code space is free of any structure [21], and can easily result in poor representation in dealing with real-world sparse and noisy graph data. One standard way to handle this problem is to introduce some regularization to the latent codes and enforce them to follow some prior data distribution [21]. Recently generative adversarial based frameworks [22, 23, 24, 25] have also been developed for learning robust latent representation. However, none of these frameworks is specifically for graph data, where both topological structure and content information are required to be represented into a latent space.

In this paper, we propose a novel adversarially regularized algorithm with two variants, the adversarially regularized graph autoencoder (ARGA) and its variational version, the adversarially regularized variational graph autoencoder (ARVGA), for graph embedding. The theme of our framework is not only to minimize the reconstruction errors of the topological structure but also to enforce the learned latent embedding to match a prior distribution. By exploiting both graph structure and node content with a graph convolutional network, our algorithms encode the graph data in the latent space. With a decoder aiming at reconstructing the topological graph information, we further incorporate an adversarial training scheme to regularize the latent codes to learn a robust graph representation. The adversarial training module aims to discriminate if the latent codes are from a real prior distribution or the graph encoder. The graph encoder learning and adversarial regularization learning are jointly optimized in a unified framework so that each can be beneficial to the other and finally lead to a better graph embedding. To get further insight into the influence of the prior distribution, we have varied it between the Gaussian distribution and the Uniform distribution for all models and tasks. Moreover, we have examined different ways to construct the graph decoders as well as the target of the reconstructions. By doing so, we have obtained a comprehensive view of the most influential factors of the adversarially regularized graph autoencoder models for different tasks. The experimental results on three benchmark graph datasets demonstrate the superb performance of our algorithms on two unsupervised graph analytic tasks, namely link prediction and node clustering. Our contributions can be summarized below:

  • We propose a novel adversarially regularized framework for graph embedding, which represents topological structure and node content in a continuous vector space. Our framework learns the embedding to minimize the reconstruction error while enforcing the latent codes to match a prior distribution.

  • We develop two variants of adversarial approaches, adversarially regularized graph autoencoder (ARGA) and adversarially regularized variational graph autoencoder (ARVGA) to learn the graph embedding.

  • We have examined different prior distributions, the ways to construct decoders, and the targets of the reconstructions to point out the influence of the factors of the adversarially regularized graph autoencoder models on various tasks.

  • Experiments on benchmark graph datasets demonstrate that our graph embedding approaches outperform the others on different unsupervised tasks.

The paper is structured as follows. Section 2 reviews the related work. Section 3 outlines the problem definition and our overall framework. Section 4 presents the proposed algorithm and Section 5 describes the experimental results. We conclude the paper in Section 6.

2 Related Work

2.1 Graph Embedding Models

Graph embedding, also known as network embedding [4], or network representation learning [8], transfers a graph into vectors. From the perspective of information exploration, graph embedding algorithms can be separated into two groups: topological network embedding approaches and content enhanced network embedding methods.

Topological network embedding approaches.   Topological network embedding approaches assume that there is only topological structure information available, and the learning objective is to preserve the topological information as fully as possible [26, 27]. Inspired by the word embedding approach [28], Perozzi et al. propose a DeepWalk model to learn the node embedding from a collection of random walks [10]. Since then, many probabilistic models have been developed. Specifically, Grover et al. propose a biased random walk approach, node2vec [11], which employs both breadth-first sampling (BFS) and depth-first sampling (DFS) strategies to generate random walk sequences for network embedding. Tang et al. propose a LINE algorithm [12] to handle large-scale information networks while preserving both first-order and second-order proximity. Other random walk variants include the hierarchical representation learning approach (HARP) [29], discriminative deep random walk (DDRW) [30], and Walklets [31].

Because a graph can be mathematically represented as an adjacency matrix, many matrix factorization approaches are proposed to learn the latent representation for a graph. GraRep [14] integrates the global topological information of the graph into the learning process to represent each node into a low dimensional space; HOPE [15] preserves the asymmetric transitivity by approximating high-order proximity for a better performance on capturing topological information of graphs and reconstructing from partially observed graphs; DNE [32] aims to learn discrete embedding which reduces the storage and computational cost. Recently deep learning models have been exploited to learn the graph embedding. These algorithms preserve the first and second order of proximities [19], or reconstruct the positive pointwise mutual information (PPMI) [20] via different variants of autoencoders.

Content enhanced network embedding methods.   Content enhanced embedding methods assume node content information is available and exploit both topological information and content features simultaneously. TADW [33] proved that DeepWalk can be interpreted as a factorization approach and proposed an extension to DeepWalk to explore node features. TriDNR [34] captures structure, node content, and label information via a tri-party neural network architecture. UPP-SNE [35] employs an approximated kernel mapping scheme to exploit user profile features to enhance the embedding learning of users in social networks. SNE [36] learns a neural network model to capture both structural proximity and attribute proximity for attributed social networks. DANE [37] deals with the dynamic environment with an incremental matrix factorization approach, and LANE [38] incorporates label information into the optimization process to learn a better embedding. Recently, BANE [39] was proposed to learn binarized embeddings for attributed graphs, which has the potential to increase the efficiency of later graph analytic tasks.

Although these algorithms are well-designed for graph-structured data, they have largely ignored the embedding distribution, which may result in poor representation on real-world graph data. In this paper, we explore adversarial training approaches to address this issue.

2.2 Adversarial Models

Our method is motivated by the generative adversarial network (GAN) [40]. GAN plays an adversarial game with two linked models: the generator $\mathcal{G}$ and the discriminator $\mathcal{D}$. The discriminator discriminates if an input sample comes from the prior data distribution or from the generator we built. Simultaneously, the generator is trained to generate samples that convince the discriminator that they come from the prior data distribution. Typically, the training process is split into two steps: (1) train the discriminator $\mathcal{D}$ for a number of iterations to distinguish the samples from the expected data distribution from the samples generated via the generator; then (2) train the generator to confuse the discriminator with its generated data. However, the original GAN does not fit unsupervised data encoding because it lacks a precise structure for inference. To implement the adversarial structure in learning data embedding, existing works like BiGAN [22], EBGAN [23] and ALI [24] extend the original adversarial framework with external structures for inference, and have achieved non-negligible performance in applications such as document retrieval [41] and image classification [22]. Other solutions manage to generate the embedding from the discriminator or generator for semi-supervised and supervised tasks via reconstructed layers. For example, DCGAN [25] bridges the gap between convolutional networks and generative adversarial networks with particular architectural constraints for unsupervised learning, and ANE [42] combines a structure-preserving component and an adversarial learning scheme to learn a robust embedding.

Makhzani et al. proposed an adversarial autoencoder (AAE) to learn the latent embedding by merging the adversarial mechanism into the autoencoder [21]. However, AAE is designed for general data rather than graph data. Recently there have been some studies on applying the adversarial mechanism to graphs. However, these approaches can only exploit the topological information [42, 43]. In contrast, our algorithm is more flexible and can handle both topological and content information for graph data.

Though many adversarial models have achieved impressive success in computer vision, they cannot effectively and directly handle graph-structured data. Building on a preliminary study in [44], we try to thoroughly exploit the graph convolutional models with different adversarial models to learn a robust graph embedding in this paper.

2.3 Graph Convolutional Nets based Models

Graph convolutional networks (GCN) [1] are a semi-supervised framework based on a variant of convolutional neural networks that operates directly on graphs. Specifically, the GCN represents the graph structure and the interrelationship between nodes and features with an adjacency matrix $\mathbf{A}$ and a node-feature matrix $\mathbf{X}$. Hence, GCN can directly embed the graph structure with a spectral convolutional function for each layer and train the model on a supervised target for all labelled nodes. Because of the spectral function on the adjacency matrix of the graph, the model can distribute the gradient from the supervised cost and learn the embedding of both the labelled and unlabelled nodes. GCN is powerful on graph-structured data sets for semi-supervised tasks like node classification, and the variational graph autoencoder (VGAE) [45] extends it to unsupervised scenarios. Specifically, VGAE integrates the GCN into the variational autoencoder framework [46] by framing the encoder with graph convolutional layers and remolding the decoder with a link prediction layer. Taking advantage of GCN layers, VGAE can naturally leverage the information of node features, which significantly boosts the predictive performance.

Fig. 1: The architecture of the adversarially regularized graph autoencoder (ARGA). The upper tier is a graph convolutional autoencoder that reconstructs a graph $\mathbf{A}$ from an embedding $\mathbf{Z}$, which is generated by the encoder that exploits the graph structure $\mathbf{A}$ and the node content matrix $\mathbf{X}$. The lower tier is an adversarial network trained to discriminate if a sample is generated from the embedding or from a prior distribution. The adversarially regularized variational graph autoencoder (ARVGA) is similar to ARGA except that it employs a variational graph autoencoder in the upper tier (see Algorithm 1 for details).

3 Problem Definition and Framework

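A graph is represented as $\mathbf{G} = \{\mathbf{V}, \mathbf{E}, \mathbf{X}\}$, where $\mathbf{V} = \{v_i\}_{i=1,\dots,n}$ consists of a set of nodes in a graph and $e_{i,j} = \langle v_i, v_j \rangle \in \mathbf{E}$ represents a linkage encoding the citation edge between the papers (nodes). The topological structure of graph $\mathbf{G}$ can be represented by an adjacency matrix $\mathbf{A}$, where $\mathbf{A}_{i,j} = 1$ if $e_{i,j} \in \mathbf{E}$, otherwise $\mathbf{A}_{i,j} = 0$. $\mathbf{x}_i \in \mathbf{X}$ encodes the textual content features associated with each node $v_i$.

Given a graph $\mathbf{G}$, our purpose is to map the nodes $v_i \in \mathbf{V}$ to low-dimensional vectors $\mathbf{z}_i \in \mathbb{R}^d$ with the formal format as follows: $f: (\mathbf{A}, \mathbf{X}) \rightarrow \mathbf{Z}$, where $\mathbf{z}_i^\top$ is the $i$-th row of the matrix $\mathbf{Z} \in \mathbb{R}^{n \times d}$. $n$ is the number of nodes and $d$ is the dimension of the embedding. We take $\mathbf{Z}$ as the embedding matrix, and the embeddings should well preserve the topological structure $\mathbf{A}$ as well as the content information $\mathbf{X}$.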

3.1 Overall Framework

The objective is to learn a robust embedding for a given graph $\mathbf{G} = \{\mathbf{V}, \mathbf{E}, \mathbf{X}\}$. To this end, we leverage an adversarial architecture with a graph autoencoder to directly process the entire graph and learn a robust embedding. Figure 1 demonstrates the workflow of ARGA, which consists of two modules: the graph autoencoder and the adversarial network.

  • Graph convolutional autoencoder. The autoencoder takes in the structure of graph $\mathbf{A}$ and the node content $\mathbf{X}$ as inputs to learn a latent representation $\mathbf{Z}$, and then reconstructs the graph structure $\mathbf{A}$ from $\mathbf{Z}$. We will further explore other variants of the graph autoencoder in Section 4.4.

  • Adversarial regularization. The adversarial network forces the latent codes to match a prior distribution by an adversarial training module, which discriminates whether the current latent code comes from the encoder or from the prior distribution.

4 Proposed Algorithm

4.1 Graph Convolutional Autoencoder

Our graph convolutional autoencoder aims to embed a graph $\mathbf{G}$ in a low-dimensional space. Two fundamental questions arise: (1) how to simultaneously integrate graph structure $\mathbf{A}$ and content features $\mathbf{X}$ in an encoder, and (2) what sort of information should be reconstructed via a decoder?

Graph Convolutional Encoder Model $\mathcal{G}(\mathbf{X}, \mathbf{A})$.   To represent both graph structure $\mathbf{A}$ and node content $\mathbf{X}$ in a unified framework, we develop a variant of the graph convolutional network (GCN) [1] as a graph encoder. GCN introduces the convolutional operation to graph data in the spectral domain, and leverages a spectral convolutional function $f(\mathbf{Z}^{(l)}, \mathbf{A} \mid \mathbf{W}^{(l)})$ to build a layer-wise transformation:

$$\mathbf{Z}^{(l+1)} = f(\mathbf{Z}^{(l)}, \mathbf{A} \mid \mathbf{W}^{(l)}) \tag{1}$$

Here, $\mathbf{Z}^{(l)}$ and $\mathbf{Z}^{(l+1)}$ are the input and output of the convolution respectively. We set $\mathbf{Z}^{(0)} = \mathbf{X} \in \mathbb{R}^{n \times m}$ ($n$ indicates the number of nodes and $m$ indicates the number of features) for our problem. We need to learn a filter parameter matrix $\mathbf{W}^{(l)}$ in the neural network, and if the spectral convolution function is well defined, we can efficiently construct arbitrarily deep convolutional neural networks.

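Each layer of our graph convolutional network can be expressed with the spectral convolution function $f(\mathbf{Z}^{(l)}, \mathbf{A} \mid \mathbf{W}^{(l)})$ as follows:

$$f(\mathbf{Z}^{(l)}, \mathbf{A} \mid \mathbf{W}^{(l)}) = \phi\left(\tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}} \mathbf{Z}^{(l)} \mathbf{W}^{(l)}\right) \tag{2}$$

where $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}$ and $\tilde{\mathbf{D}}_{ii} = \sum_j \tilde{\mathbf{A}}_{ij}$. $\mathbf{I}$ is the identity matrix of the same size as $\mathbf{A}$, and $\phi$ is an activation function such as $\mathrm{Relu}(t) = \max(0, t)$ or $\mathrm{sigmoid}(t) = \frac{1}{1 + e^{-t}}$. Overall, the graph encoder is constructed with a two-layer GCN. In our paper, we develop two variants of the encoder, i.e., the Graph Encoder and the Variational Graph Encoder.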

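The Graph Encoder is constructed as follows:

$$\mathbf{Z}^{(1)} = f_{\mathrm{Relu}}(\mathbf{X}, \mathbf{A} \mid \mathbf{W}^{(0)}) \tag{3}$$
$$\mathbf{Z}^{(2)} = f_{\mathrm{linear}}(\mathbf{Z}^{(1)}, \mathbf{A} \mid \mathbf{W}^{(1)}) \tag{4}$$

$\mathrm{Relu}(\cdot)$ and linear activation functions are used for the first and second layers. Our graph convolutional encoder $\mathcal{G}(\mathbf{X}, \mathbf{A}) = q(\mathbf{Z} \mid \mathbf{X}, \mathbf{A})$ encodes both graph structure and node content into a representation $\mathbf{Z} = \mathbf{Z}^{(2)}$.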
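The layer-wise rule in Eq. (2) and the two-layer Graph Encoder in Eqs. (3)-(4) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dense adjacency matrix, the random weight initialization, and the toy path graph are all hypothetical. The layer sizes (32 hidden, 16 embedding) follow the parameter settings reported in Section 5.1.

```python
import numpy as np

def normalize_adjacency(A):
    """Return D~^{-1/2} (A + I) D~^{-1/2} for a dense adjacency matrix A."""
    A_tilde = A + np.eye(A.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))  # degrees are >= 1
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def relu(t):
    return np.maximum(0.0, t)

def gcn_layer(Z, A_norm, W, activation=None):
    """One spectral convolution layer: phi(A_norm @ Z @ W)."""
    out = A_norm @ Z @ W
    return activation(out) if activation is not None else out

def graph_encoder(X, A, W0, W1):
    """Two-layer encoder: ReLU on the first layer, linear on the second."""
    A_norm = normalize_adjacency(A)
    Z1 = gcn_layer(X, A_norm, W0, activation=relu)
    return gcn_layer(Z1, A_norm, W1)

# Toy example (hypothetical data): a 4-node path graph with X = I.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.eye(4)
W0 = rng.normal(scale=0.1, size=(4, 32))   # untrained weights, illustration only
W1 = rng.normal(scale=0.1, size=(32, 16))
Z = graph_encoder(X, A, W0, W1)
print(Z.shape)
```

In a real model the weight matrices would be learned by gradient descent and the adjacency matrix would typically be stored in sparse form.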

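A Variational Graph Encoder is defined by an inference model:

$$q(\mathbf{Z} \mid \mathbf{X}, \mathbf{A}) = \prod_{i=1}^{n} q(\mathbf{z}_i \mid \mathbf{X}, \mathbf{A}) \tag{5}$$
$$q(\mathbf{z}_i \mid \mathbf{X}, \mathbf{A}) = \mathcal{N}(\mathbf{z}_i \mid \boldsymbol{\mu}_i, \mathrm{diag}(\boldsymbol{\sigma}^2)) \tag{6}$$

Here, $\boldsymbol{\mu} = \mathbf{Z}^{(2)}$ is the matrix of mean vectors $\boldsymbol{\mu}_i$; similarly, $\log \boldsymbol{\sigma} = f_{\mathrm{linear}}(\mathbf{Z}^{(1)}, \mathbf{A} \mid \mathbf{W}'^{(1)})$, which shares the weights $\mathbf{W}^{(0)}$ with $\boldsymbol{\mu}$ in the first layer in Eq. (3).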
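A variational encoder of this kind, with a shared first layer and two linear heads for the mean and the log standard deviation, might be sketched as below. The sampling step uses the standard reparameterization trick from the variational autoencoder literature, which is an assumption here because the text does not spell out how samples are drawn. The function name, the weight names, and the identity matrix used in place of the normalized adjacency are hypothetical.

```python
import numpy as np

def variational_encode(X, A_norm, W0, W_mu, W_sigma, rng):
    """Sample Z from a Gaussian whose parameters come from two GCN heads."""
    hidden = np.maximum(0.0, A_norm @ X @ W0)   # shared first layer (ReLU)
    mu = A_norm @ hidden @ W_mu                 # matrix of mean vectors
    log_sigma = A_norm @ hidden @ W_sigma       # log standard deviations
    eps = rng.standard_normal(mu.shape)
    Z = mu + np.exp(log_sigma) * eps            # reparameterization (assumed)
    return Z, mu, log_sigma

# Hypothetical toy inputs; A_norm = I stands in for a normalized adjacency.
rng = np.random.default_rng(0)
n, m, h, d = 5, 8, 32, 16
X = rng.standard_normal((n, m))
A_norm = np.eye(n)
W0 = rng.normal(scale=0.1, size=(m, h))
W_mu = rng.normal(scale=0.1, size=(h, d))
W_sigma = np.zeros((h, d))                      # gives log_sigma = 0, sigma = 1
Z, mu, log_sigma = variational_encode(X, A_norm, W0, W_mu, W_sigma, rng)
print(Z.shape, mu.shape)
```

Writing the sample as a deterministic function of the mean, the standard deviation, and external noise is what lets gradients flow back into both heads during training.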

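Decoder Model.   Our decoder model is used to reconstruct the graph data. We can reconstruct either the graph structure $\mathbf{A}$, the content information $\mathbf{X}$, or both. In the basic version of our model (ARGA), we propose to reconstruct the graph structure $\mathbf{A}$, which provides more flexibility in the sense that our algorithm will still function properly even if there is no content information $\mathbf{X}$ available (e.g., $\mathbf{X} = \mathbf{I}$). We will provide several variants of the decoder model in Section 4.4. Here the ARGA decoder $p(\hat{\mathbf{A}} \mid \mathbf{Z})$ predicts whether there is a link between two nodes. More specifically, we train a link prediction layer based on the graph embedding:

$$p(\hat{\mathbf{A}} \mid \mathbf{Z}) = \prod_{i=1}^{n} \prod_{j=1}^{n} p(\hat{\mathbf{A}}_{ij} \mid \mathbf{z}_i, \mathbf{z}_j) \tag{7}$$
$$p(\hat{\mathbf{A}}_{ij} = 1 \mid \mathbf{z}_i, \mathbf{z}_j) = \mathrm{sigmoid}(\mathbf{z}_i^\top \mathbf{z}_j) \tag{8}$$

Here the prediction $\hat{\mathbf{A}}$ should be close to the ground truth $\mathbf{A}$.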

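Graph Autoencoder Model.   The embedding $\mathbf{Z}$ and the reconstructed graph $\hat{\mathbf{A}}$ can be presented as follows:

$$\hat{\mathbf{A}} = \mathrm{sigmoid}(\mathbf{Z}\mathbf{Z}^\top), \quad \text{here } \mathbf{Z} = q(\mathbf{Z} \mid \mathbf{X}, \mathbf{A}) \tag{9}$$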
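The link prediction layer amounts to a sigmoid applied to the pairwise inner products of the node embeddings. A minimal NumPy sketch, with hypothetical function names and a hand-picked toy embedding, is:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def inner_product_decoder(Z):
    """Predicted link probabilities: entry (i, j) is sigmoid(z_i . z_j)."""
    return sigmoid(Z @ Z.T)

# Toy embedding (hypothetical): nodes 0 and 1 point the same way, node 2 opposite.
Z = np.array([[2.0, 0.0],
              [2.0, 0.0],
              [-2.0, 0.0]])
A_hat = inner_product_decoder(Z)
print(np.round(A_hat, 3))
```

Nodes whose embeddings are aligned receive a link probability close to 1, while nodes with opposing embeddings receive a probability close to 0. The decoder has no trainable parameters of its own, so all learning happens in the encoder.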

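Optimization.   For the graph encoder, we minimize the reconstruction error of the graph data, which corresponds to maximizing the expected log-likelihood:

$$\mathcal{L}_0 = \mathbb{E}_{q(\mathbf{Z} \mid (\mathbf{X}, \mathbf{A}))}\left[\log p(\hat{\mathbf{A}} \mid \mathbf{Z})\right] \tag{10}$$

For the variational graph encoder, we optimize the variational lower bound as follows:

$$\mathcal{L}_1 = \mathbb{E}_{q(\mathbf{Z} \mid (\mathbf{X}, \mathbf{A}))}\left[\log p(\hat{\mathbf{A}} \mid \mathbf{Z})\right] - \mathrm{KL}\left[q(\mathbf{Z} \mid \mathbf{X}, \mathbf{A}) \,\|\, p(\mathbf{Z})\right] \tag{11}$$

where $\mathrm{KL}[q(\cdot) \,\|\, p(\cdot)]$ is the Kullback-Leibler divergence between $q(\cdot)$ and $p(\cdot)$. $p(\cdot)$ is a prior distribution which can be a uniform distribution or a Gaussian distribution in practice.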
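As a rough illustration of the two objectives, the sketch below computes a reconstruction loss as the mean binary cross-entropy between the adjacency matrix and its reconstruction (the negative of an expected log-likelihood under a Bernoulli link model), and the closed-form KL divergence between a diagonal Gaussian and a standard normal prior. Both function names are hypothetical, the unweighted averaging over all node pairs is a simplifying assumption, and the closed form applies only to the Gaussian prior, not the uniform one.

```python
import numpy as np

def reconstruction_loss(A, A_hat, eps=1e-10):
    """Mean binary cross-entropy between A and the reconstruction A_hat."""
    A_hat = np.clip(A_hat, eps, 1.0 - eps)      # avoid log(0)
    return -np.mean(A * np.log(A_hat) + (1.0 - A) * np.log(1.0 - A_hat))

def kl_to_standard_normal(mu, log_sigma):
    """KL[N(mu, diag(sigma^2)) || N(0, I)], averaged over nodes."""
    per_node = 0.5 * np.sum(
        np.exp(2.0 * log_sigma) + mu ** 2 - 1.0 - 2.0 * log_sigma, axis=1)
    return np.mean(per_node)

A = np.array([[0.0, 1.0], [1.0, 0.0]])
uninformative = np.full((2, 2), 0.5)            # predicts 0.5 for every pair
print(reconstruction_loss(A, uninformative))    # equals log(2)
print(kl_to_standard_normal(np.zeros((2, 3)), np.zeros((2, 3))))  # equals 0
```

Minimizing the reconstruction loss plus the KL term corresponds to maximizing a variational lower bound of the form used for the variational encoder.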

4.2 Adversarial Model

The fundamental idea of our model is to enforce the latent representation $\mathbf{Z}$ to match a prior distribution, which is achieved by an adversarial training model. The adversarial model is built on a standard multi-layer perceptron (MLP) where the output layer only has one dimension with a sigmoid function. The adversarial model acts as a discriminator to distinguish whether a latent code is from the prior $p_z$ (positive) or from the graph encoder $\mathcal{G}(\mathbf{X}, \mathbf{A})$ (negative). By minimizing the cross-entropy cost for training the binary classifier, the embedding will finally be regularized and improved during the training process. The cost can be computed as follows:

$$-\frac{1}{2}\mathbb{E}_{\mathbf{z} \sim p_z}\left[\log \mathcal{D}(\mathbf{Z})\right] - \frac{1}{2}\mathbb{E}_{\mathbf{X}}\left[\log\left(1 - \mathcal{D}(\mathcal{G}(\mathbf{X}, \mathbf{A}))\right)\right] \tag{12}$$

In our paper, we have examined both the Gaussian distribution and the Uniform distribution as $p_z$ for all models and tasks.
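The discriminator and its cross-entropy cost could be sketched as follows. The ReLU activation on the hidden layers and all function names are assumptions; the text only specifies an MLP with a one-dimensional sigmoid output. The hidden sizes (16 and 64) follow the parameter settings in Section 5.1. Zero weights are used purely so that the toy output is easy to verify by hand.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def discriminator(Z, weights, biases):
    """MLP with ReLU hidden layers (assumed) and one sigmoid output unit."""
    h = Z
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)
    return sigmoid(h @ weights[-1] + biases[-1]).ravel()

def discriminator_loss(d_prior, d_encoder, eps=1e-10):
    """Cross-entropy cost: prior samples are positives, encoder codes negatives."""
    d_prior = np.clip(d_prior, eps, 1.0 - eps)
    d_encoder = np.clip(d_encoder, eps, 1.0 - eps)
    return (-0.5 * np.mean(np.log(d_prior))
            - 0.5 * np.mean(np.log(1.0 - d_encoder)))

rng = np.random.default_rng(0)
sizes = [16, 16, 64, 1]                          # input, two hidden layers, output
weights = [np.zeros((a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]
a = rng.standard_normal((8, 16))                 # samples from a Gaussian prior
z = rng.uniform(-1.0, 1.0, size=(8, 16))         # stand-in for encoder outputs
loss = discriminator_loss(discriminator(a, weights, biases),
                          discriminator(z, weights, biases))
print(loss)
```

With all weights at zero the discriminator outputs 0.5 for every input, so the loss equals log 2, the value of a classifier that cannot tell the two sources apart.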

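Adversarial Graph Autoencoder Model.   The equation for training the encoder model with the discriminator $\mathcal{D}(\mathbf{Z})$ can be written as follows:

$$\min_{\mathcal{G}} \max_{\mathcal{D}} \; \mathbb{E}_{\mathbf{z} \sim p_z}\left[\log \mathcal{D}(\mathbf{Z})\right] + \mathbb{E}_{\mathbf{x} \sim p(\mathbf{x})}\left[\log\left(1 - \mathcal{D}(\mathcal{G}(\mathbf{X}, \mathbf{A}))\right)\right] \tag{13}$$

where $\mathcal{G}(\mathbf{X}, \mathbf{A})$ and $\mathcal{D}(\mathbf{Z})$ indicate the generator and discriminator explained above.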

Require:  $\mathbf{G} = \{\mathbf{V}, \mathbf{E}, \mathbf{X}\}$: a graph with links and features; $T$: the number of iterations; $K$: the number of steps for iterating the discriminator; $d$: the dimension of the latent variable
Ensure:  $\mathbf{Z} \in \mathbb{R}^{n \times d}$
1:  for iterator = 1, 2, 3, ..., $T$ do
2:     Generate latent variables matrix $\mathbf{Z}$ through Eq. (4);
3:     for k = 1, 2, ..., $K$ do
4:         Sample $m$ entities {$\mathbf{z}^{(1)}$, …, $\mathbf{z}^{(m)}$} from latent matrix $\mathbf{Z}$
5:         Sample $m$ entities {$\mathbf{a}^{(1)}$, …, $\mathbf{a}^{(m)}$} from the prior distribution $p_z$
6:         Update the discriminator with its stochastic gradient: $\nabla \frac{1}{m} \sum_{i=1}^{m} \left[\log \mathcal{D}(\mathbf{a}^{(i)}) + \log\left(1 - \mathcal{D}(\mathbf{z}^{(i)})\right)\right]$
7:     end for
8:     Update the graph autoencoder with its stochastic gradient by Eq. (10) for ARGA or Eq. (11) for ARVGA;
9:  end for
10:  return  $\mathbf{Z} \in \mathbb{R}^{n \times d}$
Algorithm 1 Adversarially Regularized Graph Embedding

Fig. 2: The architecture of the adversarially regularized graph autoencoder with a graph convolutional decoder (ARGA_GD) to reconstruct the topological structure $\mathbf{A}$. The upper tier is a standard graph convolutional autoencoder. The decoder employs graph convolutional networks. The lower tier stays the same, with both the Gaussian distribution and the Uniform distribution as priors. ARVGA_GD is similar to ARGA_GD except that it employs a variational graph autoencoder in the upper tier.

4.3 Algorithm Explanation

Algorithm 1 is our proposed framework. Given a graph $\mathbf{G}$, step 2 gets the latent variables matrix $\mathbf{Z}$ from the graph convolutional encoder. Then we take the same number of samples from the generated $\mathbf{Z}$ and the real data distribution $p_z$ in steps 4 and 5 respectively, to update the discriminator with the cross-entropy cost computed in step 6. After $K$ runs of training the discriminator, the graph encoder will try to confuse the trained discriminator and update itself with the generated gradient in step 8. We can update Eq. (10) to train the adversarially regularized graph autoencoder (ARGA), or Eq. (11) to train the adversarially regularized variational graph autoencoder (ARVGA), respectively. Finally, we return the graph embedding $\mathbf{Z}$ in step 10.
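The control flow of Algorithm 1 can be sketched as a skeleton in which the model-specific parts are passed in as callables. All of the hooks (`encode`, `update_discriminator`, `update_autoencoder`, `sample_prior`) are hypothetical placeholders rather than part of the paper, sampling rows without replacement is an assumption, and the final re-encoding after the last update is one reasonable reading of the return step.

```python
import numpy as np

def train_adversarial(encode, update_discriminator, update_autoencoder,
                      sample_prior, T, K, m, rng):
    """Skeleton of the alternating training loop; all hooks are hypothetical."""
    for _ in range(T):                                   # step 1
        Z = encode()                                     # step 2: latent matrix
        for _ in range(K):                               # step 3
            idx = rng.choice(Z.shape[0], size=m, replace=False)
            z_batch = Z[idx]                             # step 4: encoder samples
            a_batch = sample_prior(m, Z.shape[1])        # step 5: prior samples
            update_discriminator(a_batch, z_batch)       # step 6
        update_autoencoder()                             # step 8
    return encode()                                      # step 10: final embedding

# Usage with stub hooks that only count how often they are called.
calls = {"disc": 0, "ae": 0}
rng = np.random.default_rng(0)
Z_fixed = rng.standard_normal((10, 16))

def encode():
    return Z_fixed

def update_discriminator(a_batch, z_batch):
    calls["disc"] += 1

def update_autoencoder():
    calls["ae"] += 1

def sample_prior(m, d):
    return rng.standard_normal((m, d))                   # Gaussian prior

Z_out = train_adversarial(encode, update_discriminator, update_autoencoder,
                          sample_prior, T=3, K=2, m=4, rng=rng)
print(calls, Z_out.shape)
```

With T = 3 outer iterations and K = 2 discriminator steps, the discriminator hook runs six times and the autoencoder hook three times, which mirrors the nesting of the two loops.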

Fig. 3: The architecture of ARGA_AX, which simultaneously reconstructs the graph topological structure $\mathbf{A}$ and the node content matrix $\mathbf{X}$. The lower tier stays the same, and we also explore the variational version, ARVGA_AX.

4.4 Decoder Variations

In the ARGA and ARVGA models, the decoder is merely a link prediction layer which computes the dot product of the embedding $\mathbf{Z}$ with itself. In practice, the decoder can also be a graph convolutional layer or a combination of a link prediction layer and a graph convolutional decoder layer.

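GCN Decoder for Graph Structure Reconstruction (ARGA_GD).   We have modified the architecture by adding two graph convolutional layers after the encoder to reconstruct the graph structure. This variant is named ARGA_GD. Fig. 2 demonstrates the architecture of ARGA_GD. In this approach, the input of the decoder is the embedding from the encoder, and the graph convolutional decoder is constructed as follows:

$$\mathbf{Z}_D = f_{\mathrm{linear}}(\mathbf{Z}, \mathbf{A} \mid \mathbf{W}_D^{(1)}) \tag{14}$$
$$\mathbf{O} = f_{\mathrm{linear}}(\mathbf{Z}_D, \mathbf{A} \mid \mathbf{W}_D^{(2)}) \tag{15}$$

where $\mathbf{Z}$ is the embedding learned from the graph encoder while $\mathbf{Z}_D$ and $\mathbf{O}$ are the outputs from the first and second layer of the graph decoder. The horizontal dimension (number of columns) of $\mathbf{O}$ is equal to the number of nodes. Then we calculate the reconstruction error as follows:

$$\mathcal{L}_{ARGA\_GD} = \mathbb{E}_{q(\mathbf{O} \mid (\mathbf{X}, \mathbf{A}))}\left[\log p(\hat{\mathbf{A}} \mid \mathbf{O})\right] \tag{16}$$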

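GCN Decoder for both Graph Structure and Content Information Reconstruction (ARGA_AX).   We have further modified our graph convolutional decoder to reconstruct both the graph structure $\mathbf{A}$ and the content information $\mathbf{X}$. The architecture is illustrated in Fig. 3. We fix the dimension of the second graph convolutional layer to the number of features associated with every node, thus the output from the second layer, $\mathbf{O} \in \mathbb{R}^{n \times m}$, has the same shape as $\mathbf{X}$. In this case, the reconstruction loss is composed of two errors. First, the reconstruction error of the graph structure can be minimized as follows:

$$\mathcal{L}_A = \mathbb{E}_{q(\mathbf{Z} \mid (\mathbf{X}, \mathbf{A}))}\left[\log p(\hat{\mathbf{A}} \mid \mathbf{Z})\right] \tag{17}$$

Then the reconstruction error of the node content can be minimized with a similar formula:

$$\mathcal{L}_X = \mathbb{E}_{q(\mathbf{Z} \mid (\mathbf{X}, \mathbf{A}))}\left[\log p(\hat{\mathbf{X}} \mid \mathbf{Z})\right] \tag{18}$$

The final reconstruction error is the sum of the reconstruction errors of the graph structure and the node content:

$$\mathcal{L}_0 = \mathcal{L}_A + \mathcal{L}_X \tag{19}$$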

5 Experiments

We report our results on both link prediction and node clustering tasks. The benchmark graph datasets used in the paper, Cora [47], Citeseer [48] and PubMed [49], are summarized in Table I. Each dataset consists of scientific publications as nodes and citation relationships as edges. The features are unique words in each document.

Data Set    # Nodes    # Links    # Content Words    # Features
Cora          2,708      5,429          3,880,564         1,433
Citeseer      3,327      4,732         12,274,336         3,703
PubMed       19,717     44,338          9,858,500           500
TABLE I: Real-world Graph Datasets Used in the Paper

Approaches   Cora                           Citeseer                       PubMed
             AUC            AP              AUC            AP              AUC            AP
SC           84.6 ± 0.01    88.5 ± 0.00     80.5 ± 0.01    85.0 ± 0.01     84.2 ± 0.02    87.8 ± 0.01
DW           83.1 ± 0.01    85.0 ± 0.00     80.5 ± 0.02    83.6 ± 0.01     84.4 ± 0.00    84.1 ± 0.00
GAE*         84.3 ± 0.02    88.1 ± 0.01     78.7 ± 0.02    84.1 ± 0.02     82.2 ± 0.01    87.4 ± 0.00
VGAE*        84.0 ± 0.02    87.7 ± 0.01     78.9 ± 0.03    84.1 ± 0.02     82.7 ± 0.01    87.5 ± 0.01
GAE          91.0 ± 0.02    92.0 ± 0.03     89.5 ± 0.04    89.9 ± 0.05     96.4 ± 0.00    96.5 ± 0.00
VGAE         91.4 ± 0.01    92.6 ± 0.01     90.8 ± 0.02    92.0 ± 0.02     94.4 ± 0.02    94.7 ± 0.02
ARGA         92.4 ± 0.003   93.2 ± 0.003    91.9 ± 0.003   93.0 ± 0.003    96.8 ± 0.001   97.1 ± 0.001
ARVGA        92.4 ± 0.004   92.6 ± 0.004    92.4 ± 0.003   93.0 ± 0.003    96.5 ± 0.001   96.8 ± 0.001
ARGA_DG      77.9 ± 0.003   78.9 ± 0.003    74.4 ± 0.003   76.2 ± 0.003    95.1 ± 0.001   95.2 ± 0.001
ARVGA_DG     88.0 ± 0.004   87.9 ± 0.004    89.7 ± 0.003   90.5 ± 0.003    93.2 ± 0.001   93.6 ± 0.001
ARGA_AX      91.3 ± 0.003   91.3 ± 0.003    91.9 ± 0.003   93.4 ± 0.003    96.6 ± 0.001   96.7 ± 0.001
ARVGA_AX     90.2 ± 0.004   89.2 ± 0.004    89.8 ± 0.003   90.4 ± 0.003    96.7 ± 0.001   97.1 ± 0.001
TABLE II: Results for Link Prediction. GAE* and VGAE* are variants of GAE and VGAE, which only explore topological structure, i.e., $\mathbf{X} = \mathbf{I}$.

5.1 Link Prediction

Baselines.   Twelve algorithms in total are compared for the link prediction task:

  • DeepWalk [10] is a network representation approach which encodes social relations into a continuous vector space.

  • Spectral Clustering [13] is an effective approach to learn social embedding.

  • GAE [45] is the most recent autoencoder-based unsupervised framework for graph data, which naturally leverages both topological structure $\mathbf{A}$ and content information $\mathbf{X}$. GAE* is the version of GAE which only considers the topological information $\mathbf{A}$, i.e., $\mathbf{X} = \mathbf{I}$.

  • VGAE [45] is the variational graph autoencoder for graph embedding with both topological and content information. Likewise, VGAE* is a simplified version of VGAE which only leverages the topological information.

  • ARGA is our proposed adversarially regularized autoencoder algorithm which uses graph autoencoder to learn the embedding.

  • ARVGA is our proposed algorithm, which uses a variational graph autoencoder to learn the embedding.

  • ARGA_DG (named ARGA_GD in Section 4.4) is a variant of our proposed ARGA which takes graph convolutional layers as its decoder to reconstruct graph structure. ARVGA_DG is the variational version of ARGA_DG.

  • ARGA_AX is a variant of our proposed ARGA which takes graph convolutional layers as its decoder to simultaneously reconstruct graph structure and node content. ARVGA_AX is the variational version of ARGA_AX.

Metric.  

We report the results in terms of the AUC score (the area under the receiver operating characteristic curve) and the average precision (AP) score [45]. The AUC score can be computed as follows:

where is the outputs from the predictor and and are the number of positive samples and the number of negative samples respectively. We also report the Average Precision (AP) which indicates the area under the precision-recall curve:

where is an index for the class .
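Both metrics can be computed directly from the predictor's scores on positive (existing) and negative (non-existing) edges. The sketch below uses the standard pairwise-ranking form of AUC and the rank-based form of average precision; the function names and toy scores are hypothetical, and these standard definitions are not necessarily identical in notation to the formulas used in the paper.

```python
def auc_score(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def average_precision(pos_scores, neg_scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted([(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores],
                    key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / len(pos_scores)

pos = [0.9, 0.6]   # scores of held-out true edges (toy values)
neg = [0.7, 0.1]   # scores of sampled non-edges (toy values)
print(auc_score(pos, neg))          # 0.75
print(average_precision(pos, neg))  # 0.8333...
```

In the toy example three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75, and the positives sit at ranks 1 and 3, giving an AP of (1/1 + 2/3) / 2.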

We conduct each experiment 10 times and report the mean values with the standard errors as the final scores. Each dataset is separated into a training set, a test set, and a validation set. The validation set contains 5% of the citation edges for hyperparameter optimization, the test set holds 10% of the citation edges to verify the performance, and the rest are used for training.
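The edge split described above can be sketched as a random partition of the edge list. The function name, the seed handling, and the toy edge list are hypothetical, and the sketch covers only the positive edges; sampling of non-edges for evaluation is omitted.

```python
import random

def split_edges(edges, val_frac=0.05, test_frac=0.10, seed=0):
    """Randomly partition edges into training, validation, and test sets."""
    edges = list(edges)
    random.Random(seed).shuffle(edges)      # reproducible shuffle
    n_val = int(len(edges) * val_frac)
    n_test = int(len(edges) * test_frac)
    val = edges[:n_val]
    test = edges[n_val:n_val + n_test]
    train = edges[n_val + n_test:]
    return train, val, test

# Toy edge list (hypothetical): a path over 101 nodes, i.e. 100 edges.
edges = [(i, i + 1) for i in range(100)]
train, val, test = split_edges(edges)
print(len(train), len(val), len(test))      # 85 5 10
```

Repeating the split with different seeds is one way to obtain the ten runs over which means and standard errors are reported.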

Parameter Settings.

For the Cora and Citeseer data sets, we train all autoencoder-related models for 200 iterations and optimize them with the Adam algorithm. Both the learning rate and the discriminator learning rate are set to 0.001. As the PubMed dataset is relatively large (around 20,000 nodes), we iterate 2,000 times for adequate training with a 0.008 discriminator learning rate and a 0.001 learning rate. We construct encoders with a 32-neuron hidden layer and a 16-neuron embedding layer for all the experiments, and all the discriminators are built with two hidden layers (16 and 64 neurons, respectively). For the rest of the baselines, we retain the settings described in the corresponding papers.

K-means Spectral BigClam GraphEncoder DeepWalk DNGR Circles RTM RMSC TADW GAE ARGA
Content
Structure
Adversarial
GCN encoder
GCN decoder
Recover A
Recover X
TABLE III: Algorithm Comparison

Experimental Results.   The details of the experimental results on link prediction are shown in Table II. The results show that by incorporating an effective adversarial training module into our graph convolutional autoencoder, ARGA and ARVGA achieve outstanding performance: their AP and AUC scores are around 92% or higher on all three data sets. On the large PubMed data set, ARGA improves the AP score by around 2.5% relative to VGAE with node features, 11% relative to VGAE without node features, and 15.5% and 10.6% relative to DeepWalk and Spectral Clustering, respectively.

The approaches which use both node content and topological information consistently outperform those that only consider graph structure. The gap between the ARGA and GAE models demonstrates that regularization on the latent codes helps to learn a robust embedding. The impact of various distributions, architectures of the decoder, and reconstruction targets will be discussed in Section 5.3 (ARGA Architectures Comparison).

Fig. 4: Average performance on different dimensions of the embedding. (A) Average Precision score; (B) AUC score.

Parameter Study.   We conducted experiments on the Cora dataset by varying the dimension of the embedding from 8 neurons to 1024 and report the results in Fig. 4.

The results from both Fig. 4 (A) and (B) reveal similar trends: when increasing the dimension of the embedding from 8 to 16 neurons, the performance of the embedding on link prediction steadily rises; when we further increase the number of neurons at the embedding layer to 32, the performance fluctuates, but the results for both the AP score and the AUC score remain good.

It is worth mentioning that if we continue to add more neurons, for example 64, 128 and 1024 neurons, the performance rises dramatically.

5.2 Node Clustering

For the node clustering task, we first learn the graph embedding and then apply the K-means clustering method to the embedding.
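The second step of this pipeline, clustering the learned embedding vectors, can be sketched with a minimal K-means (Lloyd's algorithm). The code below is an illustrative plain-Python version under simple assumptions (random initialization from the data points, squared Euclidean distance, a fixed seed); the paper does not specify its K-means implementation, and the function name and parameters are hypothetical.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Cluster a list of equal-length vectors into k groups; returns one label per point."""
    rng = random.Random(seed)
    # Initialize the centers with k distinct data points.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

In the setting described above, `points` would be the rows of the learned embedding matrix and `k` the number of ground-truth classes of the data set.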

Baselines.   We compare both embedding-based approaches and approaches designed directly for graph clustering. In addition to the baselines used for link prediction, we include baselines designed specifically for clustering. Twenty approaches in total are compared in the experiments. For a comprehensive validation, we include algorithms that consider only one source of information (network structure or node content) as well as algorithms that consider both.

Node Content or Graph Structure Only:

  1. K-means is a classical method and also the foundation of many clustering algorithms.

  2. Big-Clam [13] is a community detection algorithm based on NMF.

  3. Graph Encoder [50] learns graph embedding for spectral graph clustering.

  4. DNGR [20] trains a stacked denoising autoencoder for graph embedding.

Both Content and Structure

  1. Circles [51] is an overlapping graph clustering algorithm which treats each node as ego and builds the ego graph with the linkages between the ego’s friends.

  2. RTM [52] learns the topic distributions of each document from both text and citation.

  3. RMSC [53] is a multi-view clustering algorithm which recovers the shared low-rank transition probability matrix from each view for clustering. In this paper, we treat node content and topological structure as two different views.

  4. TADW [33] applies matrix factorization for network representation learning.

Table III gives a detailed comparison of most of the baselines. To save space, we do not list the variational versions of our models. Recover A and Recover X in the table indicate whether the model reconstructs the graph structure (A) and the node content (X). Please note that we do not report clustering results for Circles on the PubMed data set, as a single experiment ran for more than three days without producing any result or error. We believe this is due to the large size of the PubMed data set (around 20,000 nodes). Note that the Circles algorithm works well on the other two data sets.

Cora Acc NMI F1 Precision ARI
K-means 0.492 0.321 0.368 0.369 0.230
Spectral 0.367 0.127 0.318 0.193 0.031
BigClam 0.272 0.007 0.281 0.180 0.001
GraphEncoder 0.325 0.109 0.298 0.182 0.006
DeepWalk 0.484 0.327 0.392 0.361 0.243
DNGR 0.419 0.318 0.340 0.266 0.142
Circles 0.607 0.404 0.469 0.501 0.362
RTM 0.440 0.230 0.307 0.332 0.169
RMSC 0.407 0.255 0.331 0.227 0.090
TADW 0.560 0.441 0.481 0.396 0.332
0.439 0.291 0.417 0.453 0.209
0.443 0.239 0.425 0.430 0.175
GAE 0.596 0.429 0.595 0.596 0.347
VGAE 0.609 0.436 0.609 0.609 0.346
ARGA 0.640 0.449 0.619 0.646 0.352
ARVGA 0.638 0.450 0.627 0.624 0.374
0.604 0.425 0.594 0.600 0.373
0.463 0.387 0.455 0.524 0.265
0.597 0.455 0.579 0.593 0.366
0.711 0.526 0.693 0.710 0.495
TABLE IV: Clustering Results on Cora
Citeseer Acc NMI F1 Precision ARI
K-means 0.540 0.305 0.409 0.405 0.279
Spectral 0.239 0.056 0.299 0.179 0.010
BigClam 0.250 0.036 0.288 0.182 0.007
GraphEncoder 0.225 0.033 0.301 0.179 0.010
DeepWalk 0.337 0.088 0.270 0.248 0.092
DNGR 0.326 0.180 0.300 0.200 0.044
Circles 0.572 0.301 0.424 0.409 0.293
RTM 0.451 0.239 0.342 0.349 0.203
RMSC 0.295 0.139 0.320 0.204 0.049
TADW 0.455 0.291 0.414 0.312 0.228
0.281 0.066 0.277 0.315 0.038
0.304 0.086 0.292 0.331 0.053
GAE 0.408 0.176 0.372 0.418 0.124
VGAE 0.344 0.156 0.308 0.349 0.093
ARGA 0.573 0.350 0.546 0.573 0.341
ARVGA 0.544 0.261 0.529 0.549 0.245
0.479 0.231 0.446 0.456 0.203
0.448 0.256 0.410 0.496 0.149
0.547 0.263 0.527 0.549 0.243
0.581 0.338 0.525 0.537 0.301
TABLE V: Clustering Results on Citeseer
Pubmed Acc NMI F1 Precision ARI
K-means 0.398 0.001 0.195 0.579 0.002
Spectral 0.403 0.042 0.271 0.498 0.002
BigClam 0.394 0.006 0.223 0.361 0.003
GraphEncoder 0.531 0.209 0.506 0.456 0.184
DeepWalk 0.684 0.279 0.670 0.686 0.299
DNGR 0.458 0.155 0.467 0.629 0.054
RTM 0.574 0.194 0.444 0.455 0.148
RMSC 0.576 0.255 0.521 0.482 0.222
TADW 0.354 0.001 0.335 0.336 0.001
0.581 0.196 0.569 0.636 0.162
0.504 0.162 0.504 0.631 0.088
GAE 0.672 0.277 0.660 0.684 0.279
VGAE 0.630 0.229 0.634 0.630 0.213
ARGA 0.668 0.305 0.656 0.699 0.295
ARVGA 0.690 0.290 0.678 0.694 0.306
0.630 0.212 0.629 0.631 0.209
0.630 0.226 0.632 0.629 0.212
0.637 0.245 0.639 0.642 0.231
0.640 0.239 0.644 0.639 0.226
TABLE VI: Clustering Results on Pubmed
Fig. 5: Average node clustering performance on different dimensions of the embedding.

Metrics:   Following [53], we employ five metrics to validate the clustering results: accuracy (Acc), F1 score (F1), normalized mutual information (NMI), precision, and adjusted Rand index (ARI).
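Two of these metrics can be illustrated with a short sketch. Clustering accuracy needs a mapping from cluster ids to class labels, because cluster ids are arbitrary; the code below searches all one-to-one mappings by brute force, which is only practical for a handful of clusters (the Hungarian algorithm is the usual choice at scale). The NMI variant shown normalizes by the arithmetic mean of the two entropies, which is an assumption, since the text does not state which normalization is used. All function names are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import log


def clustering_accuracy(true_labels, pred_labels):
    """Accuracy under the best one-to-one mapping of cluster ids to class labels."""
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(pred_labels))
    if len(pred_ids) > len(true_ids):
        raise ValueError("this sketch expects no more clusters than classes")
    best_hits = 0
    for perm in permutations(true_ids, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        hits = sum(mapping[p] == t for t, p in zip(true_labels, pred_labels))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


def nmi(true_labels, pred_labels):
    """Mutual information divided by the mean of the two label entropies (assumed variant)."""
    n = len(true_labels)
    count_t = Counter(true_labels)
    count_p = Counter(pred_labels)
    joint = Counter(zip(true_labels, pred_labels))
    mutual_info = sum(
        (c / n) * log(c * n / (count_t[t] * count_p[p])) for (t, p), c in joint.items()
    )
    entropy_t = -sum((c / n) * log(c / n) for c in count_t.values())
    entropy_p = -sum((c / n) * log(c / n) for c in count_p.values())
    mean_entropy = (entropy_t + entropy_p) / 2.0
    # Both labelings are a single cluster: treat them as identical by convention.
    return mutual_info / mean_entropy if mean_entropy > 0 else 1.0
```

Both functions are invariant to renaming the cluster ids, so a clustering that matches the classes up to a permutation of ids scores 1 on both.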

Experimental Results.   The clustering results on the Cora, Citeseer, and PubMed data sets are given in Table IV, Table V, and Table VI. The results show that ARGA and ARVGA achieve a dramatic improvement on all five metrics compared with the other baselines. For instance, on Citeseer, ARGA improves accuracy by between 6.1% (over K-means) and 154.7% (over GraphEncoder), the F1 score by between 31.9% (over TADW) and 102.2% (over DeepWalk), and NMI by between 14.8% (over K-means) and 124.4% (over VGAE).
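The percentages quoted for Citeseer are relative improvements, that is, the gain over a baseline divided by the baseline score. The short check below recomputes them from the ARGA row and the corresponding baseline rows of Table V; the helper name `relative_gain` is hypothetical.

```python
def relative_gain(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent."""
    return 100.0 * (ours - baseline) / baseline


# Citeseer scores taken from Table V (ARGA versus the named baseline).
accuracy_vs_kmeans = relative_gain(0.573, 0.540)
accuracy_vs_graphencoder = relative_gain(0.573, 0.225)
f1_vs_tadw = relative_gain(0.546, 0.414)
f1_vs_deepwalk = relative_gain(0.546, 0.270)
nmi_vs_kmeans = relative_gain(0.350, 0.305)
nmi_vs_vgae = relative_gain(0.350, 0.156)
```

Rounded to one decimal place, these six values reproduce the figures quoted in the paragraph above.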

Furthermore, as the three tables show, the clustering results of approaches such as BigClam and DeepWalk, which consider only one type of information about the graph, are inferior to those of approaches that consider both the topological information and the node content of the graph. However, both the purely GCN-based approaches and the methods that consider multi-view information still obtain only sub-optimal results compared with the adversarially regularized graph convolutional models.

The wide margin between the results of ARGA and GAE (and the others) further demonstrates the superiority of our adversarially regularized graph autoencoder.

Fig. 6: The ARGA related models comparison on the clustering task with different prior distributions.
Fig. 7: The ARGA related models comparison on the link prediction task with different prior distributions.

Parameter Study.   We conducted experiments on the Cora data set by varying the embedding dimension from 8 to 1024 neurons and report the results in Fig. 5. All metrics show similar fluctuations as the embedding dimension increases. We cannot extract a clear trend relating the embedding dimension to the score of each clustering metric. This observation indicates that the unsupervised clustering task is more sensitive to the parameters than supervised learning tasks (e.g., link prediction in Section 5.1).

5.3 ARGA Architectures Comparison

In this section, we construct six versions of the model: the adversarially regularized graph autoencoder (ARGA), the adversarially regularized graph autoencoder with a graph convolutional decoder (ARGA_DG), the adversarially regularized graph autoencoder that reconstructs both graph structure and node content (ARGA_AX), and their variational versions. We run all experiments for every model with both a Gaussian prior and a Uniform prior. We then analyze the comparison and try to identify the reasons behind the results. The experimental results are illustrated in Figs. 6 and 7.

Gaussian Distribution vs Uniform Distribution.   The performance of the proposed models is not very sensitive to the prior distribution, especially for the node clustering task. As shown in Fig. 6, when we compare the results of the two distributions on the same metric, the results for a given model are, in most cases, very similar.

As for link prediction (Fig. 7), the Uniform distribution dramatically lowers the performance of ARGA_DG on all data sets and metrics compared with the Gaussian distribution. ARGA and its variational version are less sensitive to the choice of distribution than the ARGA_DG models. The standard version of ARGA with the Gaussian distribution slightly outperforms the one with the Uniform distribution. The situation is reversed for the variational ARGA models.
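In an adversarially regularized model, the prior enters training only through the samples that are shown to the discriminator as "real" latent codes, so switching between the two priors amounts to changing one sampling routine. The sketch below illustrates this with the Python standard library. The parameters (zero mean and unit variance for the Gaussian, the range [-1, 1] for the Uniform) are assumptions made for illustration and are not stated in this section; the function name is hypothetical.

```python
import random


def sample_prior(n_nodes, dim, kind="gaussian", seed=0):
    """Draw n_nodes latent codes of length dim from the chosen prior distribution."""
    rng = random.Random(seed)
    if kind == "gaussian":
        def draw():
            return rng.gauss(0.0, 1.0)    # assumed: standard normal
    elif kind == "uniform":
        def draw():
            return rng.uniform(-1.0, 1.0)  # assumed range
    else:
        raise ValueError("kind must be 'gaussian' or 'uniform'")
    return [[draw() for _ in range(dim)] for _ in range(n_nodes)]
```

With a 16-neuron embedding layer, `sample_prior(n, 16, "uniform")` would provide one 16-dimensional sample for each of `n` nodes in a training step.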

Decoders and Reconstructions.   As shown in Fig. 7, ARGA with the Gaussian distribution and an inner product decoder for reconstructing the graph structure has a significant advantage in link prediction, since it is designed to predict whether there is a link between two nodes. Simply replacing the decoder with graph convolutional layers to reconstruct the adjacency matrix (ARGA_DG) yields sub-optimal performance on link prediction compared with ARGA. According to the statistics in Fig. 6, although the clustering performance of ARGA_DG is comparable with that of the original ARGA, there is still a gap between the two variants. Two graph convolutional layers in the decoder cannot effectively decode the topological information of the graph, which leads to the sub-optimal results. The model with a graph convolutional decoder that reconstructs both topological information and node content (ARGA_AX) may support this hypothesis. As can be seen in Figs. 6 and 7, ARGA_AX dramatically improves the performance on both link prediction and clustering compared with ARGA_DG, which reconstructs only the topological structure. ARGA and ARGA_AX perform very similarly on both link prediction and clustering. The variational version of ARGA_AX (ARVGA_AX) performs outstandingly on clustering, achieving a 12.2% improvement in clustering accuracy on the Cora data set and a 5.4% improvement on the Citeseer data set compared with ARVGA.
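The inner product decoder mentioned at the start of this paragraph can be sketched in a few lines. The version below assumes the form commonly used in graph autoencoders, where the predicted probability of a link between nodes i and j is the sigmoid of the inner product of their embeddings; it is a plain-Python illustration with hypothetical function names, not the authors' implementation.

```python
from math import exp


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    e = exp(x)
    return e / (1.0 + e)


def inner_product_decoder(embeddings):
    """Return the matrix of link probabilities sigmoid(z_i . z_j) for all node pairs."""
    return [
        [sigmoid(sum(a * b for a, b in zip(z_i, z_j))) for z_j in embeddings]
        for z_i in embeddings
    ]
```

Because the decoder has no parameters of its own, every link score is determined directly by how well two embeddings align, which matches the argument above that this design is tailored to predicting links.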

6 Conclusion and Future Work

In this paper, we proposed a novel adversarial graph embedding framework for graph data. We argue that most existing graph embedding algorithms are unregularized methods that ignore the data distribution of the latent representation and suffer from inferior embeddings on real-world graph data. We proposed an adversarial training scheme to regularize the latent codes and enforce them to match a prior distribution. The adversarial module is jointly learned with a graph convolutional autoencoder to produce a robust representation. We also explored some interesting variations of ARGA, such as ARGA_DG and ARGA_AX, to examine the impact of a graph convolutional decoder and of reconstructing both graph structure and node content. Experimental results demonstrated that our algorithms ARGA and ARVGA outperform the baselines in link prediction and node clustering tasks.

There are several future directions for the adversarially regularized graph autoencoders (ARGA). We will investigate how to use the ARGA model to generate realistic graphs [54], which may help discover new drugs in biological domains. We will also study how to incorporate label information into ARGA to learn robust graph embeddings.

References