1. Introduction
Representation learning, which provides low-dimensional vector-space representations for data, is an important research field in machine learning since it can significantly simplify downstream algorithms. Representation learning for networks, a.k.a. network embedding, is particularly important for applications with massive amounts of network-style data, such as social networks and email graphs. The purpose of network embedding is to generate informative numerical representations of nodes and edges, which in turn enable further inference on the network data, such as link prediction and visualization.
Most existing methods in network embedding explicitly define the representative structures of the network as numerical/computational measurements during the representation learning process. For example, LINE (Tang et al., 2015) and SDNE (Wang et al., 2016) use local structure (first-order proximity) and/or global structure (second-order proximity), while DeepWalk (Perozzi et al., 2014) uses network community structure. The learned representations are then fed to machine learning toolkits to guide a specific discriminative task, such as link prediction with classification or regression models. That is, the network embedding is learned separately from the actual tasks. Therefore, the learned representations may not be capable of optimizing the objective functions of the tasks directly.
Alternative solutions utilize deep models to retrieve low-dimensional network representations. For example, Li et al. (Li et al., 2017a) use a variational autoencoder to represent an information network; the representations obtained right before the decoder layer are taken as the learned representations once the reconstruction loss converges. HNE
(Chang et al., 2015) studies network embedding for heterogeneous data. It integrates deep models into a unified framework to solve the similarity prediction problem. Similarly, the output of the layer before the prediction layer in HNE is treated as the embedding. In these approaches, the network embedding is a byproduct of applying deep models to a specific task. However, the network embeddings in existing solutions are somewhat handcrafted structures. Moreover, there is no systematic support to exhaustively explore potential structures of networks. Therefore, this paper proposes to incorporate Generative Adversarial Networks (GANs) into network embedding.
GANs (Goodfellow et al., 2014) are promising frameworks for various learning tasks, especially in the computer vision area. Technically, a GAN consists of two components: a discriminator trying to predict whether a given data instance is produced by the generator or not, and a generator trying to fool the discriminator. Both components are trained by playing a minimax game from game theory. Various extensions have been proposed to address theoretical limitations of GANs. For example, InfoGAN (Chen et al., 2016) and SeqGAN (Yu et al., 2017) introduced modifications to the GAN objectives, WGAN (Arjovsky et al., 2017) changed the distribution measurement to improve training stability, and IRGAN (Wang et al., 2017b) extended the application domain to information retrieval tasks. Motivated by the empirical success of the adversarial training process, we propose Generative Adversarial Network Embedding (GANE) to perform the network embedding task. To simplify the discussion, GANE only considers single-relational networks, in which all edges are of the same type, in contrast to multi-relational networks with various types of edges. Hereafter, network embedding stands for network embedding for single-relational networks.
In GANE, the generator tries to generate potential edges for a vertex and construct the representations for these edges, while the discriminator tries to distinguish the generated edges from real ones in the network and constructs its own representations. Besides using cosine similarity, we also adopt the first-order proximity to define the loss function for the discriminator and to measure the structural information of the network preserved in the low-dimensional embedded vector space. Under the principles of the minimax game, the generator tries to simulate the structures of the network with hints from the discriminator, and the discriminator in turn exploits the underlying structure to recover missing links in the network. The Wasserstein-1 distance is adopted to train the generator with improved stability, as suggested in WGAN (Arjovsky et al., 2017). To the best of our knowledge, this is the first attempt to learn network embedding in a generative adversarial manner. Experiments on link prediction and clustering tasks were executed to evaluate the performance of GANE. The results show that network embeddings learned in GANE can significantly improve the performance of supervised discrimination tasks in comparison with existing solutions. The main contributions of this paper can be summarized as follows:
We develop a generative adversarial framework for network embedding. It is capable of performing feature representation learning and link prediction simultaneously under the adversarial minimax game principles.

We discuss three variations of the model, including naive GANE, which applies cosine similarity; GANE-O1, which preserves the first-order proximity; and GANE-O2, which preserves the second-order proximity of the network in the low-dimensional embedded vector space.

We evaluate the proposed models with detailed experiments on link prediction and clustering tasks. Results demonstrate significant and robust improvements in comparison with other state-of-the-art network embedding approaches.
The rest of the paper is organized as follows. Section 2 summarizes the related work. Section 3 illustrates the design and algorithms in GANEs. Section 4 presents the experimental design and results. Section 5 concludes the paper.
2. Related Work
The paper is rooted in two research fields: network embedding and generative adversarial networks.
2.1. Network Embedding
Extensive efforts have been devoted to research on network embedding in recent years. Graph Factorization (Ahmed et al., 2013) represents the network as an affinity matrix of the graph, and then utilizes a distributed matrix factorization algorithm to find the low-dimensional representations of the graph. DeepWalk (Perozzi et al., 2014) utilizes the distribution of node degrees to model topological structures of the network via random walks and the skip-gram model to infer the latent representations of vertices. Tang et al. proposed LINE (Tang et al., 2015) to preserve both local (first-order) structures and global (second-order) structures during the embedding process by minimizing the KL-divergence between the distributions of structures in the original network and in the embedded space. LINE has been considered one of the most popular network embedding approaches in the past two years. Thereafter, Wang et al. proposed modularized non-negative matrix factorization to incorporate community structures and preserve them during representation learning (Wang et al., 2017a). SDNE (Wang et al., 2016) applies a semi-supervised deep learning framework for network embedding, in which the first-order proximity is preserved by penalizing similar vertices that lie far apart in the embedded space, and the second-order proximity is preserved by a deep autoencoder. Li et al. (Li et al., 2017b) incorporated the text information and structure of the network into the embedding representations by employing the variational autoencoder (VAE) (Kingma and Welling, 2013). Chang et al. proposed HNE (Chang et al., 2015) to address network embedding tasks with heterogeneous information, in which deep models for content feature learning and structural feature learning are integrated in a unified framework.
In summary, existing solutions in network embedding use either shallow or deep models to preserve different structural properties in the low-dimensional space, such as the connectivity between vertices (the first-order proximity), neighborhood connectivity patterns (the second-order proximity), and other high-order proximities (e.g., the community structure). However, these solutions employ handcrafted structures for the network embedding, and it is hard to exhaustively explore potential structures of networks due to the lack of systematic support. Therefore, we propose to leverage the generative adversarial framework to explore potential structures in networks and achieve more informative representations.
2.2. Generative Adversarial Networks
Recent advances in Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) have proven GANs to be a powerful framework for learning complex data distributions. The core idea is to define the generator and the discriminator as minimax game players competing with each other, pushing the generator to produce high-quality data to fool the discriminator.
Mirza & Osindero introduced conditional GANs (Mirza and Osindero, 2014) to control the data generation by setting conditional constraints on the model. InfoGAN (Chen et al., 2016), an information-theoretic extension of the GAN model, maximizes the mutual information between a small subset of the latent variables and the observations to learn interpretable and meaningful hidden representations on image datasets. SeqGAN (Yu et al., 2017) models the data generator as a stochastic policy in reinforcement learning and uses the policy gradient to guide the learning process, bypassing the generator differentiation problem for discrete data output.
Despite their successes, GANs are notoriously difficult to train and prone to mode collapse (Arjovsky and Bottou, 2017), especially for discrete data. Energy-based GAN (EBGAN) (Zhao et al., 2016) tries to achieve a stable training process by viewing the discriminator as an energy function that attributes low energies to the regions near the data manifold and higher energies to other regions. However, EBGANs, which measure the distribution distance with the Jensen-Shannon (JS) divergence, share with classical GANs the problem that the discriminator cannot be trained well enough, as the adopted distance cannot offer useful gradients everywhere. Replacing JS with the Earth Mover (EM) distance, Wasserstein GAN (Arjovsky et al., 2017) theoretically and experimentally addresses this model fragility.
GANs have been successfully applied in the field of computer vision for tasks such as generating sample images. However, there are few attempts to apply GANs to other machine learning tasks. Recently, IRGAN (Wang et al., 2017b) was proposed as an information retrieval model in which the generator focuses on predicting relevant documents given a query and the discriminator focuses on distinguishing whether the generated documents are relevant. It showed superior performance over state-of-the-art information retrieval approaches.
In this paper, we propose to explore the strength of generative adversarial models for network embedding. The proposed framework, GANE, performs feature representation learning and link prediction simultaneously under the adversarial minimax game principles. The Wasserstein-1 distance is adopted to define the overall objective function (Arjovsky et al., 2017) to overcome the notoriously unstable training of conventional GANs.
3. GANE: Generative Adversarial Network Embedding
A network can be modeled as a set of vertices $V$ and a set of edges $E$. That is, a network can be represented as $G = (V, E)$. The primal task of network embedding is to learn a low-dimensional vector-space representation $u_i \in \mathbb{R}^d$ for each vertex $v_i \in V$. Unlike existing approaches, which need to train the embedding representation before applying it to predictive tasks, we facilitate predictions and the embedding learning process in a unified framework by leveraging a generative adversarial model.
3.1. Naive GANE Discriminator
Two components, the generator $G$ and the discriminator $D$, are defined in GANE to play the minimax game. The task of $G$ is to predict the well-matched edge $(v_i, v_j)$ for a given vertex $v_i$, while the task of $D$ is to identify the observed edges from the "unobserved" edges, where the "unobserved" edges are generated by $G$. The overall architecture and data flow of GANE are depicted in Fig. 1.
To avoid the unstable training of conventional GAN models, which are prone to mode collapse, the Earth-Mover (also called Wasserstein) distance $W(p_{\mathrm{true}}, p_G)$ is utilized to define the overall objective function, as suggested in WGAN (Arjovsky et al., 2017). $W(p_{\mathrm{true}}, p_G)$ can be informally defined as the minimum cost of transporting mass to transform the distribution $p_G$ into the distribution $p_{\mathrm{true}}$. Under mild assumptions, $W(p_{\mathrm{true}}, p_G)$ is continuous and differentiable almost everywhere. Following WGAN, the objective function is defined after the Kantorovich-Rubinstein duality (Villani, 2009):
(1)  \min_G \max_{\|f\|_L \le 1} \; \mathbb{E}_{e \sim p_{\mathrm{true}}(e)}\big[f(e)\big] - \mathbb{E}_{e \sim p_G(e)}\big[f(e)\big]

where $p_{\mathrm{true}}(e)$ is the distribution of observed edges, and $p_G(e)$ is the distribution of unobserved edges generated by $G$. That is, $e \sim p_{\mathrm{true}}(e)$ for observed edges and $e \sim p_G(e)$ for generated edges. $f$ is the utility function which computes the trust score (a scalar) for a given edge $e = (v_i, v_j)$.
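The critic objective of Eq.(1) can be sketched numerically. The following is a minimal NumPy illustration with toy embeddings and an inner-product score standing in for the utility function f; all names and data here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(6, 4))  # toy embeddings for 6 vertices, dimension 4

def f(edges, U):
    """Trust score of each edge (v_i, v_j): inner product of the two embeddings."""
    i, j = edges[:, 0], edges[:, 1]
    return np.sum(U[i] * U[j], axis=1)

real_edges = np.array([[0, 1], [1, 2], [2, 3]])   # observed edges
fake_edges = np.array([[0, 4], [1, 5], [3, 5]])   # edges proposed by the generator

# Eq.(1): the critic maximizes E_real[f] - E_fake[f];
# the loss minimized in practice is its negation.
critic_loss = -(f(real_edges, U).mean() - f(fake_edges, U).mean())
```

In a full model the score f would be constrained to be 1-Lipschitz (e.g., via the weight clipping discussed in Section 3.4).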
A naive version of GANE, denoted GANE, is defined with the following scoring function, which does not consider the structural information of the network:

(2)  f(e) = \cos(u_i, u_j) = \frac{u_i^\top u_j}{\|u_i\| \, \|u_j\|}

where $u_i$ and $u_j$ are the embedding representations of vertices $v_i$ and $v_j$, respectively.
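The cosine trust score of Eq.(2) is a one-liner; a sketch (the function name is ours):

```python
import numpy as np

def trust_score(u_i, u_j):
    """Naive GANE trust score: cosine similarity of two vertex embeddings."""
    return float(np.dot(u_i, u_j) / (np.linalg.norm(u_i) * np.linalg.norm(u_j)))
```

A vector scores 1 with itself, 0 with an orthogonal vector, and -1 with its negation.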
The discriminator is trained to assign a high trust score to an observed edge but a lower score to an unobserved edge generated by $G$, while the generator is trained to produce contrastive edges with maximal trust scores. Theoretically, there is a Nash equilibrium in which the generator perfectly fits the distribution of observed edges in the network (i.e., $p_G = p_{\mathrm{true}}$), and the discriminator cannot distinguish true observed edges from the generated ones. However, it is computationally infeasible to reach such an equilibrium because the distribution of embeddings in the low-dimensional space keeps changing dynamically along with the model training process. Consequently, the generator tends to learn the distribution $p_G$ to model the network structure as accurately as possible, while the discriminator tends to accept the potentially true (unobserved but in all probability true) edges.
3.2. Structurepreserving Discriminator
Structural information of the network may provide valuable guidance in the model learning process. Therefore, we propose to extend the discriminator definition in GANE with the concepts of first-order and second-order proximity, which were introduced in LINE (Tang et al., 2015).
Definition 3.1 (First-order Proximity). The first-order proximity in a network describes the local pairwise proximity between two vertices. The strength of the tie between two vertices $v_i$ and $v_j$ is denoted as $w_{ij}$. $w_{ij} = 0$ indicates that there is no edge observed between $v_i$ and $v_j$.
The intuitive solution is to embed vertices with strong ties (i.e., high $w_{ij}$) close to each other in the low-dimensional space. Therefore, $w_{ij}$ can be used as a weighting factor to evaluate the embedding representation.
For network embedding, the goal is to minimize the difference between the probability distribution of the edges in the original space and that in the embedded space. The distance between the empirical probability distribution $\hat{p}_1(\cdot, \cdot)$ and the resulting probability distribution $p_1(\cdot, \cdot)$ in the network embedding can be defined as

(3)  O_1 = d\big(\hat{p}_1(\cdot, \cdot),\; p_1(\cdot, \cdot)\big)

where $p_1(v_i, v_j)$ is the joint probability between vertices $v_i$ and $v_j$, and $E$ is the set of observed edges in the network. The empirical probability is defined as $\hat{p}_1(i, j) = w_{ij} / W$, where $W = \sum_{(i,j) \in E} w_{ij}$. For each edge $(v_i, v_j)$, $p_1(v_i, v_j)$ is defined as

(4)  p_1(v_i, v_j) = \frac{1}{1 + \exp(-u_i^\top u_j)}
Following Eq.(1), it is equivalent for the discriminator to minimize the loss of GANE-O1, which is GANE with first-order proximity:

(5)  L_D^{O1} = -\Big(\mathbb{E}_{e \sim p_{\mathrm{true}}}\big[\log p_1(v_i, v_j)\big] - \mathbb{E}_{e \sim p_G}\big[\log p_1(v_i, v_j)\big]\Big)
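The first-order pieces above can be sketched numerically, assuming the sigmoid form of Eq.(4) and a log-probability scoring function plugged into the Eq.(1) objective; names and data are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p1(U, i, j):
    """First-order proximity in the embedded space, Eq.(4)."""
    return sigmoid(np.dot(U[i], U[j]))

def gane_o1_loss(U, real_edges, fake_edges):
    """Discriminator loss sketch: log p1 used as the scoring function in Eq.(1)."""
    real = np.mean([np.log(p1(U, i, j)) for i, j in real_edges])
    fake = np.mean([np.log(p1(U, i, j)) for i, j in fake_edges])
    return -(real - fake)
```

With embeddings where real edges connect aligned vectors and fake edges connect orthogonal ones, the loss is negative, i.e. the critic already separates the two sets.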
Definition 3.2 (Second-order Proximity). The second-order proximity between a pair of vertices describes the similarity between their neighborhood structures in the network. Let $p_1(v_i, \cdot) = (w_{i1}, \ldots, w_{i|V|})$ denote the first-order proximity of $v_i$ with the other vertices. Then, the second-order proximity between $v_i$ and $v_j$ is determined by the similarity between $p_1(v_i, \cdot)$ and $p_1(v_j, \cdot)$.
The intuitive solution is to embed vertices with high second-order proximity close to each other in the low-dimensional space. By analogy with a corpus in natural language processing (NLP), the neighbors of a vertex $v_i$ can be treated as its context (nearby words), and vertices with similar contexts are considered to be similar. Similar to the skip-gram model (Mikolov et al., 2013), the probability of "context" $v_j$ being generated by $v_i$ is defined with the softmax function as

(6)  p_2(v_j \mid v_i) = \frac{\exp({u'_j}^\top u_i)}{\sum_{k=1}^{|V|} \exp({u'_k}^\top u_i)}

where $u'_j$ is the representation of $v_j$ when it is treated as context.
The objective function, which is the distance between the empirical conditional distribution $\hat{p}_2(\cdot \mid v_i)$ and the resulting conditional distribution $p_2(\cdot \mid v_i)$ in the embedded space, is defined as

(7)  O_2 = \sum_{v_i \in V} \lambda_i \, d\big(\hat{p}_2(\cdot \mid v_i),\; p_2(\cdot \mid v_i)\big)
where $\lambda_i$ denotes the prestige of $v_i$ in the network. For simplicity, the sum of the weights of the edges incident to $v_i$ is used as its prestige. That is, $\lambda_i = d_i = \sum_{j} w_{ij}$. The empirical distribution is defined as

(8)  \hat{p}_2(v_j \mid v_i) = \frac{w_{ij}}{d_i}
Then, Eq.(7) can be rewritten as

(9)  O_2 = -\sum_{(i,j) \in E} w_{ij} \log p_2(v_j \mid v_i)
However, the computation of the objective function in Eq.(9) remains expensive because the softmax term sums over all vertices of the network. A general solution is to apply negative sampling (Dyer, 2014) to bypass the summation. This solution is based on Noise Contrastive Estimation (NCE) (Gutmann and Hyvärinen, 2012), which shows that a good model should be able to differentiate data from noise by means of logistic regression. With the method of negative sampling, $\log p_2(v_j \mid v_i)$ is approximated by

(10)  \log \sigma({u'_j}^\top u_i) + \sum_{n=1}^{K} \mathbb{E}_{v_n \sim P_n(v)}\big[\log \sigma(-{u'_n}^\top u_i)\big]

where $\sigma(x) = 1/(1 + e^{-x})$, $K$ is the number of negative samples, and $P_n(v)$ is the noise distribution.
By replacing the scoring function in Eq.(1) with the negative-sampling counterpart in Eq.(10), it is equivalent for the discriminator to minimize the loss of GANE-O2, which is GANE with second-order proximity:

(11)  L_D^{O2} = -\Big(\mathbb{E}_{e \sim p_{\mathrm{true}}}\big[f_2(e)\big] - \mathbb{E}_{e \sim p_G}\big[f_2(e)\big]\Big)

where $f_2(e)$ denotes the negative-sampling objective of Eq.(10) for the edge $e = (v_i, v_j)$.
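The per-edge negative-sampling objective of Eq.(10) can be sketched as follows, assuming separate vertex and context embeddings as in the skip-gram model; the names are ours:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_score(u_i, ctx_j, ctx_neg):
    """Eq.(10) for one edge (v_i, v_j):
    log sigma(u'_j . u_i) plus, for each of the K sampled noise vertices,
    log sigma(-u'_n . u_i)."""
    pos = np.log(sigmoid(np.dot(ctx_j, u_i)))
    neg = sum(np.log(sigmoid(-np.dot(ctx_n, u_i))) for ctx_n in ctx_neg)
    return pos + neg
```

The score is higher when the source embedding points toward its true context and away from the sampled noise vertices, which is what the discriminator rewards.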
Table 1. Models in comparison and their distribution measurements.

Models | Distribution Measurement
LINE-O1 | KL-divergence
LINE-O2 | KL-divergence
LINE(O1+O2) | KL-divergence
IRGAN | JS-divergence
GANE | Wasserstein distance
GANE-O1 | Wasserstein distance
The noise distribution is empirically set to the 3/4 power of the vertex degree (Mikolov et al., 2013)(Tang et al., 2015). That is, $P_n(v) \propto d_v^{3/4}$, where $d_v = \sum_{j} w_{vj}$. In general, the larger the number of negative samples $K$, the better the performance of the model. Moreover, as $K$ grows, the negative-sampling term contributes equally to both expectations in Eq.(11) and cancels. Therefore, Eq.(11) can be updated as

(12)  L_D^{O2} = -\Big(\mathbb{E}_{e \sim p_{\mathrm{true}}}\big[\log \sigma({u'_j}^\top u_i)\big] - \mathbb{E}_{e \sim p_G}\big[\log \sigma({u'_j}^\top u_i)\big]\Big)

which shows that GANE-O1 (Eq.(5)) and GANE-O2 have the same form of objective function. For this reason, the rest of the paper only experiments with and discusses GANE and GANE-O1.
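The degree-based noise distribution $P_n(v) \propto d_v^{3/4}$ can be sketched as a sampling table; function names are ours:

```python
import numpy as np

def noise_distribution(degrees, power=0.75):
    """Noise distribution P_n(v) proportional to d_v^(3/4)."""
    weights = np.asarray(degrees, dtype=float) ** power
    return weights / weights.sum()

def sample_negatives(degrees, k, rng=None):
    """Draw k noise vertices (indices) from P_n."""
    if rng is None:
        rng = np.random.default_rng()
    p = noise_distribution(degrees)
    return rng.choice(len(degrees), size=k, p=p)
```

The 3/4 exponent flattens the degree distribution: a vertex with 16 times the degree of another is only 8 times more likely to be sampled.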
3.3. Generator Optimization
In the minimax game, the generator plays as an adversary of the discriminator, and it needs to minimize the loss function defined as (referring to Eq.(1)):

(13)  L_G = -\mathbb{E}_{e \sim p_G(e)}\big[f(e)\big]

The generator of GANE is in charge of generating unobserved edges. Different from sampling random variables during the generating process in conventional GANs (Goodfellow et al., 2014; Mirza and Osindero, 2014), GANE requires $v_j$ to be a real vertex in the network when it generates/predicts an unobserved edge $(v_i, v_j)$ for a given $v_i$. As the sampling of vertices is discrete, Eq.(13) cannot be optimized directly. Inspired by SeqGAN (Yu et al., 2017), the policy gradient frequently used in reinforcement learning (Williams, 1992) is applied. The policy gradient for the GANE generator is derived as

(14)  \nabla_\theta L_G = -\nabla_\theta \mathbb{E}_{e \sim p_G}\big[f(e)\big] = -\mathbb{E}_{e \sim p_G}\big[f(e) \nabla_\theta \log p_G(e)\big] \approx -\frac{1}{K} \sum_{k=1}^{K} f(e_k) \nabla_\theta \log p_G(e_k)

where a sampling approximation is used in the last step. $e_k = (v_i, v_j^{(k)})$ is a sample edge starting from a given source vertex $v_i$ following $p_G$, which is the distribution induced by the current version of the generator. The distribution $p_G$ is determined by the parameters $\theta$ of the generator; every time $\theta$ is updated during model training, a new version of the distribution is obtained. $K$ is the number of samples.
In reinforcement learning terminology (Sutton et al., 2000), the discriminator acts as the environment for the generator, feeding a reward to the generator when it takes an action, such as generating/predicting an edge for a given $v_i$.
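A REINFORCE-style estimate of the sampled policy gradient in Eq.(14) can be sketched as follows. This is a toy sketch, not the paper's implementation: the generator is parameterized directly by vertex embeddings, and for brevity the gradient is taken only with respect to the source row:

```python
import numpy as np

rng = np.random.default_rng(0)
n_vertices, dim = 8, 4
theta = rng.normal(scale=0.1, size=(n_vertices, dim))  # generator parameters

def p_g(theta, i):
    """Generator distribution over target vertices v_j for a source v_i (softmax)."""
    logits = theta @ theta[i]
    logits[i] = -np.inf              # forbid self-loops
    m = logits[np.isfinite(logits)].max()
    e = np.exp(logits - m)           # exp(-inf) underflows cleanly to 0
    return e / e.sum()

def reinforce_grad(theta, i, reward_fn, n_samples=16):
    """Sampled estimate of E[f(e_k) * grad log p_G(e_k)] from Eq.(14);
    the generator-loss gradient is its negation (ascend reward = descend L_G).
    Only the gradient w.r.t. the source embedding theta[i] is computed here."""
    probs = p_g(theta, i)
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        j = rng.choice(len(probs), p=probs)
        reward = reward_fn(i, j)     # trust score fed back by the discriminator
        # d/d theta[i] of log softmax at j: theta[j] - E_{k~probs}[theta[k]]
        grad[i] += reward * (theta[j] - probs @ theta) / n_samples
    return grad
```

In the full training loop of Section 3.4 this update would be interleaved with several critic steps and with weight clipping on the discriminator.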
Table 2. Accuracy of binary classification for link prediction with different percentages of edges used for training.

Models | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90%
LINE-O1 (%) | 73.12 | 75.77 | 77.51 | 78.39 | 79.18 | 79.40 | 79.92 | 80.09 | 80.33
LINE-O2 (%) | 77.83 | 83.71 | 86.90 | 86.19 | 87.08 | 89.25 | 89.21 | 88.91 | 89.99
LINE(O1+O2) (%) | 82.18 | 86.73 | 85.03 | 89.40 | 91.74 | 90.65 | 92.32 | 92.44 | 93.06
IRGAN (%) | 58.52 | 62.07 | 63.06 | 62.52 | 64.48 | 58.54 | 66.78 | 63.71 | 63.42
GANE (%) | 93.85 | 94.61 | 95.04 | 94.99 | 95.11 | 95.23 | 95.32 | 95.46 | 95.09
GANE-O1 (%) | 80.49 | 83.24 | 86.82 | 84.92 | 85.72 | 82.43 | 83.87 | 86.34 | 85.90
3.4. Model Training
We randomly sample 90% of the edges from the network as the training set and enforce the requirement that these samples cover all vertices; therefore, the embeddings of all vertices can be learned in our models. In each training iteration, the discriminator is trained for $n_{\mathrm{critic}}$ steps, while the generator is trained just once. Mini-batch Stochastic Gradient Descent and the RMSProp (Tieleman and Hinton, 2012) optimizer, which is based on the momentum method, are applied as they perform well even on highly non-stationary problems. In order to keep the parameters in a compact space, we experimented with simple variants, which made little difference, and settled on parameter clipping. For more details, please refer to (Arjovsky et al., 2017). The overall algorithm for GANEs is provided in Algorithm 1.
4. Experimental Evaluation
To evaluate the proposed models, we applied the embedding representations to two task categories: link prediction and clustering. For each category, we compared the proposed GANE and GANE-O1 with several state-of-the-art approaches for network embedding.
The list of models in comparison includes:

LINE (Tang et al., 2015). LINE is a very popular, state-of-the-art model for network embedding. Three variations of LINE were evaluated: LINE-O1, LINE-O2, and LINE(O1+O2). LINE-O1 and LINE-O2 consider only the first-order proximity and the second-order proximity, respectively. LINE(O1+O2) utilizes the concatenated vectors from the outputs of LINE-O1 and LINE-O2.

IRGAN (Wang et al., 2017b). IRGAN was selected as a representative of GAN-related models. IRGAN is designed as a minimax game to improve the performance of information retrieval tasks. To enable comparison, we turned IRGAN into a model for network embedding by treating its parameters as the low-dimensional representations of the network.

GANE. The naive GANE model was defined in Section 3.1. It evaluates the trust score of an edge with the cosine similarity between two vertices in the low-dimensional vector space.

GANE-O1. The GANE model with the first-order proximity of the network was defined in Section 3.2.
An overview of the key technical definitions for these models is provided in Table 1.
The dataset used in the experiments is the co-author network constructed from the DBLP dataset (Tang et al., 2008), available at http://arnetminer.org/citation. The co-author network records the number of papers co-published by authors. A co-author relation is considered an undirected edge between two vertices (authors) in the network. Furthermore, the network we constructed covers three different research fields: data mining, machine learning, and computer vision. It includes authors and edges. Each vertex (author) in the network is labeled according to the research areas of the papers published by that author. The dimensionality of the embedding vectors is set to 128 for all models.
Table 3. Ranking performance for link prediction. Each "Improve" row reports the relative improvement of GANE-O1 over the model above it.

Metric | P@1 | P@3 | P@5 | P@10 | MAP | R@1 | R@3 | R@5 | R@10 | R@15 | R@20
LINE-O1 | 0.0185 | 0.0378 | 0.0378 | 0.0326 | 0.0812 | 0.0120 | 0.0754 | 0.1230 | 0.2100 | 0.2694 | 0.3203
Improve | 563.78% | 456.88% | 470.37% | 361.96% | 310.47% | 453.33% | 361.14% | 399.92% | 283.24% | 215.70% | 170.96%
LINE-O2 | 0.0 | 0.1124 | 0.1247 | 0.0921 | 0.2073 | 0.0 | 0.2483 | 0.4554 | 0.6409 | 0.6973 | 0.7278
Improve | N/A | 87.28% | 72.89% | 63.52% | 60.78% | N/A | 40.03% | 35.02% | 25.57% | 21.97% | 19.25%
LINE(O1+O2) | 0.0 | 0.0928 | 0.0905 | 0.0650 | 0.1535 | 0.0 | 0.1971 | 0.3128 | 0.4323 | 0.4917 | 0.5282
Improve | N/A | 126.83% | 138.23% | 131.69% | 117.13% | N/A | 76.41% | 96.58% | 86.17% | 72.97% | 64.31%
IRGAN | 0.0231 | 0.1554 | 0.1665 | 0.1160 | 0.2681 | 0.0102 | 0.3311 | 0.5898 | 0.7750 | 0.8193 | 0.8406
Improve | 431.60% | 35.46% | 29.49% | 29.83% | 24.32% | 550.98% | 5.01% | 4.26% | 3.85% | 3.81% | 3.25%
GANE | 0.0 | 0.1864 | 0.2208 | 0.1598 | 0.2978 | 0.0 | 0.3080 | 0.6236 | 0.8459 | 0.8913 | 0.9112
Improve | N/A | 12.93% | 2.36% | 5.76% | 11.92% | N/A | 12.89% | 1.40% | 4.86% | 4.58% | 4.75%
GANE-O1 | 0.1228 | 0.2105 | 0.2156 | 0.1506 | 0.3333 | 0.0664 | 0.3477 | 0.6149 | 0.8048 | 0.8505 | 0.8679
4.1. Link Prediction
Link prediction tries to predict the missing neighbor in an unobserved edge for a given vertex of the network, or to predict the likelihood of an association between two vertices. It is worth noting that the proposed models GANE and GANE-O1 both have implied answers for link prediction, as the generator is trained to produce the best $v_j$ for a given $v_i$. Therefore, there is no need to train a binary classifier for link prediction, or to sort the candidates by a specific metric as most models usually do.
To make fair and impartial comparisons, we evaluated the link prediction task in two aspects:

a binary classification problem by employing the embedding representations learned in models, and

a ranking problem by scoring the pair of vertices which is represented as a lowdimensional vector.
4.1.1. Classification Evaluation
For binary classification evaluation, we used a Multilayer Perceptron (MLP) (Rumelhart et al., 1985) classifier to separate positive from negative samples. We randomly sampled different percentages of the edges as the training set and used the rest as the test set for the evaluation. In the training stage, the observed edges in the network were used as the positive samples, and the same number of unobserved edges were randomly sampled as negative samples. The embedding representations of the two vertices of an edge were then concatenated as the input to the MLP classifier.
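The classification setup can be sketched with scikit-learn's MLPClassifier standing in for the MLP; the embeddings and edge sets below are toy, hypothetical data, not the paper's:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Toy embeddings: two communities concentrated around different axes.
U = np.vstack([rng.normal([1, 0], 0.1, size=(10, 2)),
               rng.normal([0, 1], 0.1, size=(10, 2))])

pos = [(i, j) for i in range(10) for j in range(10) if i < j]          # within community
neg = [(i, j + 10) for i in range(10) for j in range(10)][:len(pos)]   # across communities

# Concatenate the two vertex embeddings of each edge as the classifier input.
X = np.array([np.concatenate([U[i], U[j]]) for i, j in pos + neg])
y = np.array([1] * len(pos) + [0] * len(neg))

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X, y)
```

On such separable toy data the classifier fits the training edges almost perfectly; the interesting quantity in the paper is of course accuracy on held-out edges.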
In the test stage, records in the test set were fed into the classifier. Table 2 reports the accuracy of the binary classification achieved by different models. The results show that our models (both GANE and GANE-O1) outperform all the baselines consistently and significantly across different training-test splits. Moreover, they are quite robust and insensitive to the size of the training set in comparison with other approaches. Both GANE and GANE-O1 perform better than IRGAN, which demonstrates the effectiveness of adopting the Wasserstein-1 distance in GAN models.
LINE(O1+O2) has the best performance among the three variations of LINE, as it explores both the first-order proximity and the second-order proximity, which are the representative structures in the co-author network as suggested in (Tang et al., 2015). Among the models that explicitly adopt the first-order proximity, GANE-O1 performs better than LINE-O1. Our guess is that the proposed generative adversarial framework for network embedding is capable of capturing and preserving more complex network structures implicitly.
It is worth noting that GANE shows its full strength on the link prediction task even though it is simpler than GANE-O1 and does not consider the relationships between vertices in the network. This may be attributed to the fact that the co-author network is quite sparse.
4.1.2. Ranking Evaluation
Multi-relational network embedding approaches, such as TransE (Bordes et al., 2013) and TransH (Wang et al., 2014), usually utilize the metric $\|h + r - t\|$, where $h$, $r$, and $t$ denote the representation vectors for head, relation, and tail respectively, to select a set of candidates for the ranking in link prediction. Unfortunately, single-relational network embedding, which is discussed in this paper, usually utilizes a binary classifier to determine the results of link prediction, as shown in Sec. 4.1.1, because there is no metric directly available as a ranking criterion. Thus, a pool of candidates cannot be provided for some special tasks, e.g., aligning users across social networks (Liu et al., 2016).
Alternatively, we used the probability of the existence of an edge, which can be implicitly computed by the network embedding model, to evaluate the ranking for candidate selection. We used the records that never appeared in the network embedding training process as the test set. Fig. 2 depicts the training sets and test sets we used in each task (embedding, classification, ranking).
A pair of vertices is scored by tracking the optimization direction of each model, as detailed in Table 1. Technically, we utilized the inner product ($u_i^\top u_j$) of two vertex vectors as the scoring criterion, since it constitutes the main part of the probability of the existence of an edge and the sigmoid function is strictly monotonically increasing. Then, we ranked all pairs $(v_i, v_j)$ for a given $v_i$ based on the score. We used precision (Wikipedia, 2017), recall (Wikipedia, 2017), and Mean Average Precision (MAP) (Baeza-Yates et al., 1999) to evaluate the prediction performance.
Table 3 shows the ranking performance of all models. Our models (GANE and GANE-O1) outperform the others again in terms of all evaluation metrics. Surprisingly, GANE-O1 provides quite impressive precision @1 whereas the other models present rather unpleasant results.
Even though both IRGAN and the GANEs are based on GAN models, the GANEs consistently perform better than IRGAN. Moreover, GANE and GANE-O1 converge rapidly in comparison with IRGAN. Fig. 3 illustrates P@3 as the number of iterations increases. This may be attributed to the application of the Wasserstein-1 distance.
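The ranking metrics used above can be sketched in plain Python (function names are ours; MAP is the mean of per-query average precision over all source vertices):

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked vertices that are true neighbors."""
    return len(set(ranked[:k]) & set(relevant)) / k

def average_precision(ranked, relevant):
    """Average of precision@k taken at each rank k where a hit occurs."""
    hits, total = 0, 0.0
    for k, v in enumerate(ranked, start=1):
        if v in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0
```

For example, with ranking [3, 1, 2] and true neighbors {1, 2}, P@1 = 0, P@2 = 0.5, and the average precision is (1/2 + 2/3) / 2.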
4.2. Clustering
4.2.1. Visualization  A Qualitative Analysis
We first investigated the quality of the embedding representations in an intuitive way by visualization. PCA (Hotelling, 1933) was adopted to facilitate dimension reduction. In this paper, we selected three components obtained by PCA to visualize the vertices of the network in 3D space. The resulting visualizations with different embedding models are illustrated in Fig. 5. Only the visualizations of GANE and GANE-O1 present a relatively clear pattern for the labeled vertices, where the authors devoted to the same research area are clustered together. GANE produces the most favorable layout in terms of a clear clustering pattern. The LINE variations do not perform well, as they require a rather dense network for model training.
4.2.2. Quantitative Analysis
We applied k-means (Lloyd, 1982) to cluster all vertices in the low-dimensional vector space and set the number of clusters to 3. We utilized majority vote to label the three clusters as "data mining", "machine learning", or "computer vision". Then, we quantitatively computed the accuracy of each cluster, defined as the proportion of "accurately" clustered vertices to the size of the cluster. Fig. 4 illustrates the clustering accuracy achieved by different models on each cluster. Again, GANE produces the best accuracy, which is consistent with the visualization. We argue that GANE can effectively preserve the proximities among vertices in the low-dimensional space.
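The majority-vote accuracy described above can be sketched as follows (a plain-Python sketch with illustrative names):

```python
from collections import Counter

def majority_vote_accuracy(cluster_ids, true_labels):
    """Label each cluster by the majority label of its members, then report the
    per-cluster accuracy: the share of members carrying that majority label."""
    acc = {}
    for c in set(cluster_ids):
        members = [t for cid, t in zip(cluster_ids, true_labels) if cid == c]
        _, top_count = Counter(members).most_common(1)[0]
        acc[c] = top_count / len(members)
    return acc
```

For instance, a cluster whose members are labeled {dm, dm, ml} is labeled "dm" by majority vote and scores an accuracy of 2/3.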
In summary, our GANEs (GANE and GANE-O1) achieve the best performance on both link prediction and clustering tasks, which demonstrates the strength of the generative adversarial framework. The first-order proximity intentionally adopted in GANE-O1 does not significantly boost the embedding performance, as seen from the comparison between GANE and GANE-O1. We think that purposely preserving handcrafted structures may lead the embedding to overlook other underlying latent complex structures in the network, as it is impossible to explore all structures with conventional methods. However, GANE may provide a way to explore the underlying structures as completely as possible by incorporating link prediction as a component of the generative adversarial framework.
5. Conclusion
This paper proposes a generative adversarial framework for network embedding. To the best of our knowledge, it is the first attempt to learn network embedding in a generative adversarial manner.
We present three variations of the solution, including GANE, which applies cosine similarity; GANE-O1, which preserves the first-order proximity; and GANE-O2, which preserves the second-order proximity of the network in the low-dimensional embedded vector space. The Wasserstein-1 distance is adopted to train the generator with improved stability. We also prove that GANE-O2 has the same objective function as GANE-O1 when negative sampling is applied to simplify the training process in GANE-O2.
Experiments on link prediction and clustering tasks demonstrate that our models consistently outperform state-of-the-art solutions for network embedding. Moreover, our models are capable of performing representation learning and link prediction simultaneously under the adversarial minimax game principle. The results also demonstrate the feasibility and strength of generative adversarial models for exploring the underlying complex structures of networks.
In the future, we plan to study the application of the generative adversarial framework to multi-relational network embedding. We also plan to gain more insight into the mechanisms by which GANs can direct the exploration and discovery of underlying complex structures in networks.
References
 Ahmed et al. (2013) Amr Ahmed, Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, and Alexander J. Smola. 2013. Distributed large-scale natural graph factorization. In Proceedings of the 22nd International Conference on World Wide Web (WWW '13). 37–48.
 Arjovsky and Bottou (2017) Martin Arjovsky and Léon Bottou. 2017. Towards principled methods for training generative adversarial networks. arXiv preprint arXiv:1701.04862 (2017).
 Arjovsky et al. (2017) Martin Arjovsky, Soumith Chintala, and Léon Bottou. 2017. Wasserstein GAN. arXiv preprint arXiv:1701.07875 (2017).
 Baeza-Yates et al. (1999) Ricardo Baeza-Yates, Berthier Ribeiro-Neto, et al. 1999. Modern information retrieval. Vol. 463. ACM press New York.
 Bordes et al. (2013) Antoine Bordes, Nicolas Usunier, Alberto García-Durán, Jason Weston, and Oksana Yakhnenko. 2013. Translating Embeddings for Modeling Multi-relational Data. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States. 2787–2795.
 Chang et al. (2015) Shiyu Chang, Wei Han, Jiliang Tang, GuoJun Qi, Charu C. Aggarwal, and Thomas S. Huang. 2015. Heterogeneous Network Embedding via Deep Architectures. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’15). ACM, New York, NY, USA, 119–128. https://doi.org/10.1145/2783258.2783296
 Chen et al. (2016) Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. 2016. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems. 2172–2180.
 Dyer (2014) Chris Dyer. 2014. Notes on Noise Contrastive Estimation and Negative Sampling. CoRR abs/1410.8251 (2014). arXiv:1410.8251 http://arxiv.org/abs/1410.8251
 Goodfellow et al. (2014) Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 2672–2680. http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
 Gutmann and Hyvärinen (2012) Michael U Gutmann and Aapo Hyvärinen. 2012. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. Journal of Machine Learning Research 13, Feb (2012), 307–361.
 Hotelling (1933) Harold Hotelling. 1933. Analysis of a complex of statistical variables into principal components. Journal of educational psychology 24, 6 (1933), 417.
 Kingma and Welling (2013) Diederik P Kingma and Max Welling. 2013. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114 (2013).
 Li et al. (2017a) Hang Li, Haozheng Wang, Zhenglu Yang, and Haochen Liu. 2017a. Effective Representing of Information Network by Variational Autoencoder. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017. 2103–2109. https://doi.org/10.24963/ijcai.2017/292
 Liu et al. (2016) Li Liu, William K. Cheung, Xin Li, and Lejian Liao. 2016. Aligning Users across Social Networks Using Network Embedding. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016. 1774–1780. http://www.ijcai.org/Abstract/16/254
 Lloyd (1982) Stuart Lloyd. 1982. Least squares quantization in PCM. IEEE transactions on information theory 28, 2 (1982), 129–137.
 Mikolov et al. (2013) Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 3111–3119.
 Mirza and Osindero (2014) Mehdi Mirza and Simon Osindero. 2014. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
 Perozzi et al. (2014) Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 701–710.
 Rumelhart et al. (1985) David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. 1985. Learning internal representations by error propagation. Technical Report. California Univ San Diego La Jolla Inst for Cognitive Science.
 Sutton et al. (2000) Richard S Sutton, David A McAllester, Satinder P Singh, and Yishay Mansour. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems. 1057–1063.
 Tang et al. (2015) Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1067–1077.
 Tang et al. (2008) Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. Arnetminer: extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 990–998.
 Tieleman and Hinton (2012) Tijmen Tieleman and Geoffrey Hinton. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning 4, 2 (2012), 26–31.
 Villani (2009) Cédric Villani. 2009. Optimal transport: old and new. Grundlehren der mathematischen Wissenschaften, Vol. 338. Springer (2009).
 Wang et al. (2016) Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural Deep Network Embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16). ACM, New York, NY, USA, 1225–1234. https://doi.org/10.1145/2939672.2939753
 Wang et al. (2017b) Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang, and Dell Zhang. 2017b. IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models. arXiv preprint arXiv:1705.10513 (2017).
 Wang et al. (2017a) Xiao Wang, Peng Cui, Jing Wang, Jian Pei, Wenwu Zhu, and Shiqiang Yang. 2017a. Community Preserving Network Embedding. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA. 203–209. http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14589
 Wang et al. (2014) Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge Graph Embedding by Translating on Hyperplanes. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, July 27-31, 2014, Québec City, Québec, Canada. 1112–1119. http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8531
 Wikipedia (2017) Wikipedia. 2017. Precision and recall — Wikipedia, The Free Encyclopedia. (2017). https://en.wikipedia.org/w/index.php?title=Precision_and_recall&oldid=803649212 [Online; accessed 25 October 2017].
 Williams (1992) Ronald J Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning 8, 3-4 (1992), 229–256.
 Yu et al. (2017) Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. 2017. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. In AAAI. 2852–2858.
 Zhao et al. (2016) Junbo Zhao, Michael Mathieu, and Yann LeCun. 2016. Energybased generative adversarial network. arXiv preprint arXiv:1609.03126 (2016).