Introduction
Understanding and analyzing graphs is an essential topic that has been widely studied over the past decades. Many real-world problems can be formulated as link prediction in graphs [Gehrke, Ginsparg, and Kleinberg2003, Freeman2000, Theocharidis et al.2009, Goyal, Sapienza, and Ferrara2018]. For example, link prediction in an author collaboration network [Gehrke, Ginsparg, and Kleinberg2003] can be used to predict potential future author collaborations. Similarly, new connections between proteins can be discovered using protein interaction networks [Pavlopoulos, Wegener, and Schneider2008], and new friendships can be predicted using social networks [Wasserman and Faust1994]. Recent work on obtaining such predictions uses graph representation learning. These methods represent each node in the network with a fixed-dimensional embedding, and map link prediction in the network space to a nearest neighbor search in the embedding space [Goyal and Ferrara2018]. It has been shown that such techniques can outperform traditional link prediction methods on graphs [Grover and Leskovec2016, Ou et al.2016a].
Existing works on graph representation learning primarily focus on static graphs of two types: (i) aggregated, consisting of all edges until time $T$; and (ii) snapshot, which comprises the edges at the current time step $t$. These models learn latent representations of the static graph and use them to predict missing links [Ahmed et al.2013, Perozzi, AlRfou, and Skiena2014, Cao, Lu, and Xu2015, Tang et al.2015, Grover and Leskovec2016, Ou et al.2016a, Goyal et al.2018a]. However, real networks often have complex dynamics which govern their evolution. As an illustration, consider the social network shown in Figure 1. In this example, user A moves from one friend to another in such a way that only a friend of a friend is followed, while making sure not to befriend an old friend. Methods based on static networks can only observe the network at a single time step and cannot ascertain if A will befriend B or D in the next time step. Instead, observing multiple snapshots can capture the network dynamics and predict A’s connection to D with high certainty.
In this work, we aim to capture the underlying dynamics of network evolution. Given temporal snapshots of graphs, our goal is to learn a representation of nodes at each time step while capturing the dynamics such that we can predict their future connections. Learning such representations is a challenging task. Firstly, the temporal patterns may exist over varying period lengths. For example, in Figure 1, user A may hold on to each friend for a varying length of time. Secondly, different vertices may have different patterns. In Figure 1, user A may break ties with friends whereas other users continue with their ties. Capturing such variations is extremely challenging. Existing research builds upon simplifying assumptions to overcome these challenges. Methods including DynamicTriad [Zhou et al.2018], DynGEM [Goyal et al.2018b] and TIMERS [Zhang et al.2017] assume that the patterns are of short duration (length 2) and only consider the previous time step graph to predict new links. Furthermore, DynGEM and TIMERS assume that the changes are smooth and use regularization to disallow rapid changes.
In this work, we present a model which overcomes the above challenges. dyngraph2vec uses multiple nonlinear layers to learn structural patterns in each network. Furthermore, it uses recurrent layers to learn the temporal transitions in the network. The look back parameter in the recurrent layers controls the length of temporal patterns learned. We focus our experiments on the task of link prediction. We compare dyngraph2vec with the state-of-the-art algorithms for dynamic graph embedding and show its performance on several real-world networks including collaboration networks and social networks. Our experiments show that using a deep model with recurrent layers can capture temporal dynamics of the networks and significantly outperform the state-of-the-art methods on link prediction.
Overall, our paper makes the following contributions:

We propose dyngraph2vec, a dynamic graph embedding model which captures temporal dynamics.

We demonstrate that capturing network dynamics can significantly improve the performance on link prediction.

We present variations of our model to show the key advantages and differences.

We publish a library, DynamicGEM (www.anonymousurl.com), implementing the variations of our model and state-of-the-art dynamic embedding approaches.
Related Work
Graph representation learning techniques can be broadly divided into two categories: (i) static graph embedding, which represents each node in the graph with a single vector; and (ii) dynamic graph embedding, which considers multiple snapshots of a graph and obtains a time series of vectors for each node. Most analysis has been done on static graph embedding. Recently, however, some works have been devoted to studying dynamic graph embedding.
Static Graph Embedding
Methods to represent nodes of a graph typically aim to preserve certain properties of the original graph in the embedding space. Based on this observation, methods can be divided into: (i) distance preserving, and (ii) structure preserving. Distance preserving methods devise objective functions such that the distance between nodes in the original graph and the embedding space have similar rankings. For example, Laplacian Eigenmaps [Belkin and Niyogi2001] minimizes the sum of the distance between the embeddings of neighboring nodes under the constraints of translational invariance, thus keeping the nodes close in the embedding space. Similarly, Graph Factorization [Ahmed et al.2013] approximates the edge weight with the dot product of the nodes’ embeddings, thus preserving distance in the inner product space. Recent methods have gone further to preserve higher order distances. Higher Order Proximity Embedding (HOPE) [Ou et al.2016a] uses multiple higher order functions to compute a similarity matrix from a graph’s adjacency matrix and uses Singular Value Decomposition (SVD) to learn the representation. GraRep [Cao, Lu, and Xu2015] considers the node transition matrix and its higher powers to construct a similarity matrix.
On the other hand, structure preserving methods aim to preserve the roles of individual nodes in the graph. node2vec [Grover and Leskovec2016] uses a combination of breadth first search and depth first search to find nodes similar to a node in terms of distance and role. Recently, deep learning methods to learn network representations have been proposed. These methods inherently preserve the higher order graph properties including distance and structure. SDNE [Wang, Cui, and Zhu2016], DNGR [Cao, Lu, and Xu2016] and VGAE [Kipf and Welling2016b] use deep autoencoders for this purpose. Some other recent approaches use graph convolutional networks to learn inherent graph structure [Kipf and Welling2016a, Bruna et al.2013, Henaff, Bruna, and LeCun2015].
Dynamic Graph Embedding
Embedding dynamic graphs is an emerging topic still under investigation. Some methods have been proposed to extend static graph embedding approaches by adding regularization [Zhu et al.2016, Zhang et al.2017]. DynGEM [Goyal et al.2017] uses the learned embedding from previous time step graphs to initialize the current time step embedding. Although it does not explicitly use regularization, such initialization implicitly keeps the new embedding close to the previous one. DynamicTriad [Zhou et al.2018] relaxes the temporal smoothness assumption but only considers patterns spanning two time steps. Our model uses recurrent layers to learn temporal patterns over long sequences of graphs and multiple fully connected layers to capture intricate patterns at each time step.
Motivating Example
We consider a toy example to motivate the idea of capturing network dynamics. Consider an evolution of graph $G$, $\mathcal{G} = \{G_1, \ldots, G_T\}$, where $G_t$ represents the state of the graph at time $t$. The initial graph $G_1$ is generated using the Stochastic Block Model [Wang and Wong1987] with 2 communities (represented by colors indigo and yellow in Figure 9), each with 500 nodes. The in-block and cross-block probabilities are set to 0.1 and 0.01 respectively. The evolution pattern can be defined as a three-step process. In the first step (shown in Figure 9(a)), we select 10 nodes (colored red in Figure 9) uniformly at random from the yellow community. In step two (shown in Figure 9(b)), we add 30 edges between each of the nodes selected in step one and random nodes in the indigo community. This gives the selected nodes a connection probability to the indigo community that is higher than the cross-block probability but lower than the in-block probability. In step three (shown in Figure 9(c)), the community membership of the nodes selected in step one is changed from yellow to indigo. Similarly, the edges (colored red in Figure 9) are either removed or added to reflect the cross-block and in-block connection probabilities. Then, for the next time step (shown in Figure 9(d)), the same three steps are repeated to evolve the graph. Informally, this can be interpreted as a two-step movement of users from one community to another by initially increasing friends in the other community and subsequently moving to it.
Our task is to learn embeddings that are predictive of the change in community of the 10 nodes. Figure 8 shows the results of the state-of-the-art dynamic graph embedding techniques (DynGEM, optimalSVD, and DynamicTriad) and the three variations of our model: dyngraph2vecAE, dyngraph2vecRNN and dyngraph2vecAERNN (see the Methodology section for the description of the methods). It shows the embeddings of nodes after the first step of evolution. The nodes selected for community shift are colored in red. We show the results for 4 runs of the model to ensure robustness. Figure 8(a) shows that DynGEM brings the red nodes closer to the edge of the yellow community but does not move any of the nodes to the other community. Similarly, the DynamicTriad results in Figure 8(c) show that it shifts only 1 to 4 nodes to their actual community in the next step. The optimalSVD method in Figure 8(b) is not able to shift any nodes. However, our dyngraph2vecAE, dyngraph2vecRNN, and dyngraph2vecAERNN (shown in Figure 8(d-f)) successfully capture the dynamics and move the embedding of most of the 10 selected nodes to the indigo community, keeping the rest of the nodes intact. This shows that capturing dynamics is critical in understanding the evolution of networks.
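The three-step evolution process of the toy example can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the generator used for the figures: the function names `sample_sbm` and `evolve`, the label encoding (0 for indigo, 1 for yellow), and the choice to resample all edges of a migrated node in step three are assumptions made here.

```python
import numpy as np

def sample_sbm(labels, p_in, p_out, rng):
    """Sample a symmetric 0/1 adjacency matrix from a stochastic block model."""
    n = len(labels)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # sample each pair once
    return (upper | upper.T).astype(np.int8)

def evolve(adj, labels, rng, n_move=10, n_new_edges=30, p_in=0.1, p_out=0.01):
    """Run one round of the three-step process; return both resulting graphs."""
    src, dst = 1, 0  # assumed encoding: 1 = yellow, 0 = indigo
    # Step one: pick the migrating nodes uniformly from the yellow community.
    movers = rng.choice(np.flatnonzero(labels == src), size=n_move, replace=False)
    # Step two: connect each mover to random indigo nodes.
    mid = adj.copy()
    indigo = np.flatnonzero(labels == dst)
    for u in movers:
        targets = rng.choice(indigo, size=n_new_edges, replace=False)
        mid[u, targets] = mid[targets, u] = 1
    # Step three: relabel the movers and resample their edges so that they
    # follow the in-block / cross-block probabilities of the new community.
    new_labels = labels.copy()
    new_labels[movers] = dst
    final = mid.copy()
    for u in movers:
        probs = np.where(new_labels == dst, p_in, p_out)
        row = (rng.random(len(labels)) < probs).astype(np.int8)
        row[u] = 0  # no self loops
        final[u, :] = row
        final[:, u] = row
    return mid, final, new_labels, movers

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 500)          # two communities of 500 nodes
g0 = sample_sbm(labels, 0.1, 0.01, rng)  # initial graph
g1, g2, labels2, movers = evolve(g0, labels, rng)
```

Calling `evolve` repeatedly on its own output yields a sequence of snapshots in which one community steadily loses nodes to the other.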
Methodology
In this section, we define the problem statement. We then explain multiple variations of deep learning models capable of capturing temporal patterns in dynamic graphs. Finally, we design the loss functions and optimization approach.
Problem Statement
Consider a weighted graph $G(V, E)$, with $V$ and $E$ as the set of vertices and edges respectively. We denote the adjacency matrix of $G$ by $A$, i.e. for an edge $(i, j) \in E$, $A_{ij}$ denotes its weight, else $A_{ij} = 0$. An evolution of graph $G$ is denoted as $\mathcal{G} = \{G_1, \ldots, G_T\}$, where $G_t$ represents the state of the graph at time $t$.
We define our problem as follows: Given an evolution of graph $G$, $\mathcal{G}$, we aim to represent each node $v$ in a series of low-dimensional vector spaces $y_{v_1}, \ldots, y_{v_t}$ by learning mappings $f_t : \{V_1, \ldots, V_t, E_1, \ldots, E_t\} \rightarrow \mathbb{R}^d$ and $y_{v_i} = f_i(v_1, \ldots, v_i, E_1, \ldots, E_i)$ such that $y_{v_i}$ can capture temporal patterns required to predict $y_{v_{i+1}}$. In other words, the embedding function at each time step uses information from graph evolution to capture network dynamics and can thus predict links with higher precision.
dyngraph2vec
Our dyngraph2vec is a deep learning model that takes as input a set of previous graphs and generates as output the graph at the next time step, thus capturing highly nonlinear interactions between vertices at each time step and across multiple time steps. The embedding thus learned is predictive of new links. The model learns the network embedding at time step $t$ by optimizing the following loss function:
$$L_{t+l} = \|(\hat{A}_{t+l+1} - A_{t+l+1}) \odot B\|_F^2 = \|(f(A_t, \ldots, A_{t+l}) - A_{t+l+1}) \odot B\|_F^2. \tag{1}$$
Here we penalize the incorrect reconstruction of edges at time $t+l+1$ by using the embedding at time step $t+l$. The embedding at time step $t+l$ is a function of the graphs at time steps $t, t+1, \ldots, t+l$ where $l$ is the temporal look back. We use a weighting matrix $B$ to weight the reconstruction of observed edges higher than unobserved links as traditionally used in the literature [Wang, Cui, and Zhu2016]. Here, $B_{ij} = \beta$ for $(i, j) \in E_{t+l+1}$, else 1.
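As a small worked example, the weighted reconstruction loss can be evaluated directly on a pair of matrices. The sketch below assumes dense NumPy arrays; the function name and the parameter name `beta` (the weight placed on observed edges) are illustrative, and the value 2.0 is chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

def weighted_reconstruction_loss(a_pred, a_true, beta):
    """Squared Frobenius norm of the elementwise-weighted reconstruction error.

    Entries that are observed edges in a_true get weight beta, all others 1.
    """
    b = np.where(a_true != 0, beta, 1.0)
    return float(np.sum(((a_pred - a_true) * b) ** 2))

a_true = np.array([[0.0, 1.0],
                   [1.0, 0.0]])
a_pred = np.array([[0.1, 0.5],
                   [0.5, 0.0]])
# Weighted errors: [[0.1, -1.0], [-1.0, 0.0]], so the loss is 0.01 + 1 + 1 = 2.01.
loss = weighted_reconstruction_loss(a_pred, a_true, beta=2.0)
```

With `beta` larger than 1, missing an observed edge costs more than predicting a spurious one, which matters because most entries of a sparse adjacency matrix are zero.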
We propose three variations of our model based on the architecture of deep learning models as shown in Figure 10: (i) dyngraph2vecAE, (ii) dyngraph2vecRNN, and (iii) dyngraph2vecAERNN. Our three methods differ in the formulation of the function $f(\cdot)$.
To model the interconnection of nodes within and across time, our model dyngraph2vecAE uses multiple fully connected layers. Thus, for a node $u$ with neighborhood vector set $u_{1..t} = [a_{u_t}, \ldots, a_{u_{t+l}}]$, the hidden representation of the first layer is learned as:
$$y_{u_t}^{(1)} = f_a(W_{AE}^{(1)} u_{1..t} + b^{(1)}), \tag{2}$$
where $f_a$ is the activation function, $W_{AE}^{(1)} \in \mathbb{R}^{d^{(1)} \times nl}$, and $d^{(1)}$, $n$ and $l$ are the dimensions of representation learned by the first layer, number of nodes in the graph, and look back, respectively. The representation of the $k^{th}$ layer is defined as:
$$y_{u_t}^{(k)} = f_a(W_{AE}^{(k)} y_{u_t}^{(k-1)} + b^{(k)}). \tag{3}$$
Note that dyngraph2vecAE has $O(nld^{(1)})$ parameters. As most real-world graphs are sparse, learning the parameters can be challenging.
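The fully connected encoder can be illustrated with a short NumPy forward pass. This is a minimal sketch under stated assumptions: the ReLU activation, the layer sizes, and the random weights are placeholders chosen for illustration, and `encode` is a hypothetical helper rather than a function of the published library.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def encode(neigh_vectors, weights, biases, activation=relu):
    """Map one node's stacked neighbourhood vectors to a low-dimensional vector.

    neigh_vectors: array of shape (l, n), one adjacency row per look-back step.
    weights, biases: one (d_k, d_{k-1}) matrix and one (d_k,) vector per layer.
    """
    y = neigh_vectors.reshape(-1)  # concatenate into a vector of length n * l
    for w, b in zip(weights, biases):
        y = activation(w @ y + b)  # one fully connected layer
    return y

rng = np.random.default_rng(0)
n, l, dims = 6, 2, [4, 3]          # toy sizes: 6 nodes, look back 2, two layers
sizes = [n * l] + dims
weights = [rng.normal(scale=0.1, size=(sizes[k + 1], sizes[k]))
           for k in range(len(dims))]
biases = [np.zeros(d) for d in dims]
rows = rng.integers(0, 2, size=(l, n)).astype(float)

embedding = encode(rows, weights, biases)
first_layer_params = weights[0].size  # d1 * n * l weights in the first layer
```

The first weight matrix alone has `d1 * n * l` entries, which is where the linear growth in both the number of nodes and the look back comes from.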
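To reduce the number of model parameters and achieve more efficient temporal learning, we propose dyngraph2vecRNN and dyngraph2vecAERNN. In dyngraph2vecRNN we use sparsely connected Long Short Term Memory (LSTM) networks to learn the embedding. LSTM is a type of Recurrent Neural Network (RNN) capable of handling long-term dependency problems. In dynamic graphs, there can be long-term dependencies which may not be captured by fully connected autoencoders. The hidden state representation of a single LSTM network is defined as:
$$y_{u_t}^{(1)} = o_{u_t}^{(1)} * \tanh(C_{u_t}^{(1)}) \tag{4a}$$
$$o_{u_t}^{(1)} = \sigma(W_{RNN}^{(1)} [y_{u_{t-1}}^{(1)}, u_{1..t}] + b_o^{(1)}) \tag{4b}$$
$$C_{u_t}^{(1)} = f_{u_t}^{(1)} * C_{u_{t-1}}^{(1)} + i_{u_t}^{(1)} * \tilde{C}_{u_t}^{(1)} \tag{4c}$$
$$\tilde{C}_{u_t}^{(1)} = \tanh(W_C^{(1)} [y_{u_{t-1}}^{(1)}, u_{1..t}] + b_c^{(1)}) \tag{4d}$$
$$i_{u_t}^{(1)} = \sigma(W_i^{(1)} [y_{u_{t-1}}^{(1)}, u_{1..t}] + b_i^{(1)}) \tag{4e}$$
$$f_{u_t}^{(1)} = \sigma(W_f^{(1)} [y_{u_{t-1}}^{(1)}, u_{1..t}] + b_f^{(1)}) \tag{4f}$$
where $C_{u_t}$ represents the cell states of LSTM, $f_{u_t}$ is the value to trigger the forget gate, $o_{u_t}$ is the value to trigger the output gate, $i_{u_t}$ represents the value to trigger the update gate of the LSTM, $\tilde{C}_{u_t}$ represents the new estimated candidate state, and $b$ represents the biases. There can be $l$ LSTM networks connected in the first layer, where the cell states and hidden representation are passed in a chain from the $(t-l)^{th}$ to the $t^{th}$ LSTM network. The representation of the $k^{th}$ layer is then given as follows:
$$y_{u_t}^{(k)} = o_{u_t}^{(k)} * \tanh(C_{u_t}^{(k)}) \tag{5a}$$
$$o_{u_t}^{(k)} = \sigma(W_{RNN}^{(k)} [y_{u_{t-1}}^{(k)}, y_{u_t}^{(k-1)}] + b_o^{(k)}) \tag{5b}$$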
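The gate computations of a single LSTM step, and the chaining of steps over the look-back window, can be written out explicitly. The following is a sketch of a standard LSTM cell in NumPy; the dictionary layout of the parameters, the zero initial states, and the toy sizes are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step on input x given the previous hidden and cell state."""
    z = np.concatenate([h_prev, x])                       # [previous hidden, input]
    f = sigmoid(params["W_f"] @ z + params["b_f"])        # forget gate
    i = sigmoid(params["W_i"] @ z + params["b_i"])        # update (input) gate
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])  # candidate state
    c = f * c_prev + i * c_tilde                          # new cell state
    o = sigmoid(params["W_o"] @ z + params["b_o"])        # output gate
    h = o * np.tanh(c)                                    # hidden representation
    return h, c

def run_chain(inputs, params, hidden):
    """Pass hidden and cell state through one LSTM step per look-back snapshot."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in inputs:
        h, c = lstm_step(x, h, c, params)
    return h

rng = np.random.default_rng(0)
n, hidden, lookback = 5, 3, 4  # toy sizes
params = {}
for gate in "fico":
    params["W_" + gate] = rng.normal(scale=0.1, size=(hidden, hidden + n))
    params["b_" + gate] = np.zeros(hidden)
rows = rng.integers(0, 2, size=(lookback, n)).astype(float)  # one node's rows over time
h = run_chain(rows, params, hidden)
```

Because the same weights are reused at every step, the number of parameters does not grow with the look back, unlike the fully connected encoder on concatenated inputs.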
The problem with passing the sparse neighbourhood vector $u_{1..t}$ of node $u$ to the LSTM network is that the LSTM model parameters (such as the number of memory cells, number of input units, output units, etc.) needed to learn a low-dimensional representation become large. Rather, the LSTM network may be able to better learn the temporal representation if the sparse neighbourhood vector is reduced to a low-dimensional representation. To achieve this, we propose a variation of the dyngraph2vec model called dyngraph2vecAERNN. In dyngraph2vecAERNN, instead of passing the sparse neighbourhood vector, we use a fully connected encoder to initially acquire a low-dimensional hidden representation given as follows:
$$y_{u_t}^{(p)} = f_a(W_{AERNN}^{(p)} y_{u_t}^{(p-1)} + b^{(p)}), \tag{6}$$
where $p$ represents the output layer of the fully connected encoder. This representation is then passed to the LSTM networks.
$$y_{u_t}^{(p+1)} = o_{u_t}^{(p+1)} * \tanh(C_{u_t}^{(p+1)}) \tag{7a}$$
$$o_{u_t}^{(p+1)} = \sigma(W_{AERNN}^{(p+1)} [y_{u_{t-1}}^{(p+1)}, y_{u_t}^{(p)}] + b_o^{(p+1)}) \tag{7b}$$
Then the hidden representation generated by the LSTM network is passed to a fully connected decoder.
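To make the parameter argument concrete, the first-layer parameter counts of the three variants can be compared with simple arithmetic. The counts below use the standard formulas for a dense layer and for an LSTM layer with four gate blocks; the sizes (1000 nodes as in the SBM graph, look back 3, hidden size 128) are illustrative assumptions and do not reproduce the architecture used in the experiments.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def lstm_params(n_in, n_hidden):
    """Four gate/candidate blocks, each acting on [previous hidden state, input]."""
    return 4 * ((n_in + n_hidden) * n_hidden + n_hidden)

n, l, d = 1000, 3, 128  # illustrative sizes only

ae_first = dense_params(n * l, d)                     # dense layer on concatenated rows
rnn_first = lstm_params(n, d)                         # LSTM fed the sparse n-dim row
aernn_first = dense_params(n, d) + lstm_params(d, d)  # dense encoder, then LSTM

counts = {"AE": ae_first, "RNN": rnn_first, "AERNN": aernn_first}
```

Under these particular sizes, compressing each row from `n` to `d` dimensions before the LSTM is what shrinks the recurrent layer; how the other two variants compare depends on the look back and the hidden sizes chosen.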
Optimization
We optimize the loss function defined above to get the optimal model parameters. By applying the gradient with respect to the decoder weights on equation 1, we get:
$$\frac{\partial L_{t+l}}{\partial W_*^{(K)}} = \left[2(\hat{A}_{t+l+1} - A_{t+l+1}) \odot B \odot B\right] \left[\frac{\partial f_a(Y^{(K-1)} W_*^{(K)} + b^{(K)})}{\partial W_*^{(K)}}\right],$$
where $W_*^{(K)}$ is the weight matrix of the penultimate layer for all the three models. For each individual model, we back propagate the gradients based on the neural units to get the derivatives for all previous layers. For the LSTM based dyngraph2vec models, back propagation through time is performed to update the weights of the LSTM networks.
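The first factor of this chain rule, the derivative of the weighted squared reconstruction error with respect to each reconstructed entry, is $2(\hat{A}_{ij} - A_{ij})B_{ij}^2$. The sketch below checks that expression against central finite differences on a small random example; the matrix size and the edge weight of 5.0 are arbitrary illustrative choices.

```python
import numpy as np

def loss(a_pred, a_true, b):
    """Squared Frobenius norm of the elementwise-weighted reconstruction error."""
    return np.sum(((a_pred - a_true) * b) ** 2)

def grad_wrt_prediction(a_pred, a_true, b):
    """Analytic derivative of the loss with respect to each predicted entry."""
    return 2.0 * (a_pred - a_true) * b * b

rng = np.random.default_rng(0)
a_true = (rng.random((4, 4)) < 0.3).astype(float)
a_pred = rng.random((4, 4))
b = np.where(a_true != 0, 5.0, 1.0)  # observed edges weighted higher

analytic = grad_wrt_prediction(a_pred, a_true, b)

# Central finite differences, one entry at a time.
numeric = np.zeros_like(a_pred)
eps = 1e-6
for idx in np.ndindex(a_pred.shape):
    bump = np.zeros_like(a_pred)
    bump[idx] = eps
    numeric[idx] = (loss(a_pred + bump, a_true, b)
                    - loss(a_pred - bump, a_true, b)) / (2 * eps)

max_error = np.max(np.abs(analytic - numeric))
```

In practice an automatic differentiation framework computes this factor and the remaining layer derivatives, so a check like this is mainly useful when implementing the loss by hand.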
After obtaining the derivatives, we optimize the model using stochastic gradient descent (SGD) [Rumelhart, Hinton, and Williams1988] with Adaptive Moment Estimation (Adam) [Kingma and Ba2014].
Experiments
In this section, we describe the datasets used and establish the baselines for comparison. Furthermore, we define the evaluation metrics for our experiments and parameter settings. All the experiments were performed on a 64-bit Ubuntu 16.04.1 LTS system with an Intel(R) Core(TM) i9-7900X CPU with 19 processors, 10 CPU cores, 3.30 GHz CPU clock frequency, 64 GB RAM, and two Nvidia Titan X GPUs, each with 12 GB memory.
Table 1: Dataset statistics.

| Name | SBM | Hep-th | AS |
|---|---|---|---|
| Nodes | 1000 | 150-14446 | 7716 |
| Edges | 56016 | 268-48274 | 487-26467 |
| Time steps | 10 | 136 | 733 |
Datasets
We conduct experiments on two realworld datasets and a synthetic dataset to evaluate our proposed algorithm.
The datasets are summarized in Table 1.
Stochastic Block Model (SBM), community diminishing: In order to test the performance of various static and dynamic graph embedding algorithms, we generated synthetic SBM data with two communities and a total of 1000 nodes. The cross-block connectivity probability is 0.01 and the in-block connectivity probability is set to 0.1. One of the communities is continuously diminished by migrating 10-20 nodes to the other community. A total of 10 dynamic graphs are generated for the evaluation.
Hep-th [Gehrke, Ginsparg, and Kleinberg2003]: The first real-world dataset used to test the dynamic graph embedding algorithms is the collaboration graph of authors in the High Energy Physics Theory conference. The original dataset contains abstracts of papers in the High Energy Physics Theory conference in the period from January 1993 to April 2003. For our evaluation, we consider the last 50 snapshots of this dataset.
Autonomous Systems (AS) [Leskovec, Kleinberg, and Faloutsos2005]: The second real-world dataset utilized is a communication network of who-talks-to-whom from the BGP (Border Gateway Protocol) logs. The dataset contains 733 instances spanning from November 8, 1997 to January 2, 2000. For our evaluation, we consider a subset of this dataset which contains the last 50 snapshots.
Baselines
We compare our model with the following state-of-the-art static and dynamic graph embedding methods:

Optimal Singular Value Decomposition (OptimalSVD) [Ou et al.2016b]: It uses the singular value decomposition of the adjacency matrix or its variation (i.e., the transition matrix) to represent the individual nodes in the graph. The low-rank SVD decomposition with the largest singular values is then used for graph structure matching, clustering, etc.

Incremental Singular Value Decomposition (IncSVD) [Brand2006]: It utilizes a perturbation matrix which captures the changing dynamics of the graphs and performs additive modification on the SVD.

Rerun Singular Value Decomposition (RerunSVD or TIMERS) [Zhang et al.2017]: It utilizes the incremental SVD to get the dynamic graph embedding. In addition, it uses a tolerance threshold to restart the optimal SVD calculation when the incremental graph embedding starts to deviate.

Dynamic Embedding using Dynamic Triad Closure Process (dynamicTriad) [Zhou et al.2018]: It utilizes the triadic closure process to generate a graph embedding that preserves structural and evolution patterns of the graph.

Deep Embedding Method for Dynamic Graphs (dynGEM) [Goyal et al.2018b]: It utilizes deep autoencoders to incrementally generate the embedding of a dynamic graph at snapshot $t$ by using only the snapshot at time $t-1$.
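The idea shared by the SVD-based baselines, factorizing the adjacency matrix and scoring unobserved node pairs by the low-rank reconstruction, can be sketched with NumPy. This is a generic illustration rather than the cited implementations: the helper names, the symmetric split of the singular values between the two factors, and the small example graph are assumptions made here.

```python
import numpy as np

def svd_embedding(adj, dim):
    """Rank-`dim` factorisation adj ~ source @ target.T from the largest singular values."""
    u, s, vt = np.linalg.svd(adj, full_matrices=False)
    root = np.sqrt(s[:dim])
    return u[:, :dim] * root, vt[:dim].T * root

def top_k_links(source, target, observed, k):
    """Return the k unobserved pairs (i < j) with the highest reconstructed score."""
    scores = source @ target.T
    candidates = np.triu(np.ones_like(observed, dtype=bool), k=1) & (observed == 0)
    scores = np.where(candidates, scores, -np.inf)  # exclude observed edges
    flat = np.argsort(scores, axis=None)[::-1][:k]
    return [tuple(int(x) for x in np.unravel_index(f, scores.shape)) for f in flat]

# Small undirected example graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

source, target = svd_embedding(adj, dim=2)
predicted = top_k_links(source, target, adj, k=3)
```

Incremental variants avoid recomputing the decomposition from scratch at every snapshot; the scoring step stays the same.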
Table 2: Average P@k values over all embedding sizes.

| Method | SBM P@100 | SBM P@500 | SBM P@1000 | Hep-th P@100 | Hep-th P@500 | Hep-th P@1000 | AS P@100 | AS P@500 | AS P@1000 |
|---|---|---|---|---|---|---|---|---|---|
| IncrementalSVD | 0.9881 | 0.9832 | 0.9152 | 0.9835 | 0.9578 | 0.8919 | 0.9524 | 0.9468 | 0.9433 |
| rerunSVD | 0.9967 | 0.9897 | 0.9248 | 0.9842 | 0.9589 | 0.8932 | 0.9602 | 0.9596 | 0.9578 |
| optimalSVD | 0.9996 | 0.9879 | 0.9176 | 1.0000 | 0.9856 | 0.9140 | 0.8290 | 0.7397 | 0.6988 |
| dynamicTriad | 0.1044 | 0.1096 | 0.1047 | 0.6663 | 0.5340 | 0.4805 | 0.8665 | 0.8543 | 0.8024 |
| dynGEM | 0.9633 | 0.9656 | 0.9673 | 1.0000 | 0.9990 | 0.9784 | 0.9321 | 0.9448 | 0.9377 |
| dyngraph2vecAE | 0.9800 | 0.9851 | 0.9869 | 0.9755 | 0.9638 | 0.9208 | 0.8007 | 0.8028 | 0.7546 |
| dyngraph2vecRNN | 0.9927 | 0.9905 | 0.9898 | 0.8741 | 0.8827 | 0.8836 | 0.8514 | 0.7955 | 0.7768 |
| dyngraph2vecAERNN | 0.9800 | 0.9887 | 0.9917 | 0.9971 | 0.9917 | 0.9785 | 0.8591 | 0.8620 | 0.8577 |
Evaluation Metrics
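In our experiments, we evaluate our model on link prediction at time step $t+1$ by using all graphs until the time step $t$. We use precision@$k$ ($P@k$) and Mean Average Precision (MAP) as our metrics. $P@k$ is the fraction of correct predictions in the top $k$ predictions. It is defined as $P@k = \frac{|E_{pred}(k) \cap E_{gt}|}{k}$, where $E_{pred}$ and $E_{gt}$ are the predicted and ground truth edges respectively. MAP averages the precision over all nodes. It can be written as $\frac{\sum_i AP(i)}{|V|}$ where $AP(i) = \frac{\sum_k precision@k(i) \cdot \mathbb{I}\{E_{pred_i}(k) \in E_{gt_i}\}}{|\{k : E_{pred_i}(k) \in E_{gt_i}\}|}$ and $precision@k(i) = \frac{|E_{pred_i}(1:k) \cap E_{gt_i}|}{k}$.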
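Both metrics are straightforward to compute from ranked predictions. The sketch below is a plain-Python illustration; the function names and the dictionary-based input format (each node mapped to its ranked candidate neighbours and to its set of true neighbours) are assumptions made for this example.

```python
def precision_at_k(ranked_edges, true_edges, k):
    """Fraction of the top-k predicted edges that appear in the ground truth."""
    return sum(1 for e in ranked_edges[:k] if e in true_edges) / k

def average_precision(ranked_neighbors, true_neighbors):
    """Mean of precision@rank over the ranks at which a true neighbour appears."""
    hits, total = 0, 0.0
    for rank, v in enumerate(ranked_neighbors, start=1):
        if v in true_neighbors:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0

def mean_average_precision(rankings, truth):
    """Average the per-node average precision over all nodes in `rankings`."""
    nodes = list(rankings)
    return sum(average_precision(rankings[u], truth.get(u, set()))
               for u in nodes) / len(nodes)

# Edge-level example: 1 of the top 2 predictions is correct.
ranked = [(0, 1), (0, 2), (1, 3), (2, 3)]
true_edges = {(0, 1), (2, 3)}
p_at_2 = precision_at_k(ranked, true_edges, 2)  # 0.5

# Node-level example: AP(0) = (1/1 + 2/3) / 2 = 5/6 and AP(1) = 1/2.
rankings = {0: [1, 2, 3], 1: [0, 2]}
truth = {0: {1, 3}, 1: {2}}
map_score = mean_average_precision(rankings, truth)  # (5/6 + 1/2) / 2 = 2/3
```

P@k only looks at the head of a single global ranking, whereas MAP rewards placing every node's true neighbours early in that node's own ranking, so the two metrics can order methods differently.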
Results And Analysis
In this section, we present the performance results of the various models for link prediction on the different datasets.
SBM Dataset
The MAP values for the various algorithms on the SBM dataset with a diminishing community are shown in Figure 11. The MAP values shown are for link prediction with embedding sizes 64, 128 and 256. This figure shows that our methods dyngraph2vecAE, dyngraph2vecRNN and dyngraph2vecAERNN all have higher MAP values than the baselines, except for dynGEM, which has the highest MAP values of all the algorithms. This is due to the fact that dynGEM also generates the embedding of the graph at snapshot $t+1$ using the graph at snapshot $t$. Since in our SBM dataset the node-migration criteria are introduced only one time step earlier, the dynGEM node embedding technique is able to capture these dynamics. Notice that the MAP values of the SVD-based methods increase as the embedding size increases. However, this is not the case for dynamicTriad.
Hepth Dataset
The link prediction results for the Hep-th dataset are shown in Figure 12. The proposed dyngraph2vec algorithms outperform all the other state-of-the-art static and dynamic algorithms. Among the proposed algorithms, dyngraph2vecAERNN has the highest MAP values, followed by dyngraph2vecRNN and dyngraph2vecAE, respectively. dynamicTriad performs better than the SVD-based algorithms. Notice that dynGEM does not achieve higher MAP values than the dyngraph2vec algorithms on the Hep-th dataset.
AS Dataset
The MAP values for link prediction with the various algorithms on the AS dataset are shown in Figure 13. dyngraph2vecAERNN outperforms all the state-of-the-art algorithms. The algorithm with the second highest MAP score is dyngraph2vecRNN. However, dyngraph2vecAE achieves a higher MAP only with the smaller embedding size of 64. The SVD methods are able to improve their MAP values by increasing the embedding size. However, they are not able to outperform the dyngraph2vec algorithms.
Precision@k and MAP exploration
The average $P@k$ values over all embedding sizes for the various datasets are shown in Table 2. The proposed dyngraph2vec algorithms generally have higher $P@k$ values at lower $k$ while also having higher overall MAP values. On the other hand, other algorithms have higher $P@k$ values at lower $k$ but lower MAP values calculated over all the nodes of the graph.
The summary of MAP values for different embedding sizes (64, 128 and 256) for the different datasets is presented in Table 3. The top three MAP values are highlighted in bold. For the synthetic SBM dataset, the top three algorithms with the highest MAP values are dynGEM, dyngraph2vecAERNN, and dyngraph2vecRNN, respectively. For the Hep-th dataset, the top three algorithms with the highest MAP values are dyngraph2vecAERNN, dyngraph2vecRNN, and dyngraph2vecAE, respectively. For the AS dataset, the top three algorithms with the highest MAP values are dyngraph2vecAERNN, dyngraph2vecRNN, and dyngraph2vecAE, respectively. These results show that the dyngraph2vec variants capture the graph dynamics better than most of the state-of-the-art algorithms.
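Table 3: Average MAP values over different embedding sizes.

| Method | SBM | Hep-th | AS |
|---|---|---|---|
| IncrementalSVD | 0.4421 | 0.2518 | 0.1452 |
| rerunSVD | 0.5474 | 0.2541 | 0.1607 |
| optimalSVD | 0.5831 | 0.2419 | 0.1152 |
| dynamicTriad | 0.1509 | 0.3606 | 0.0677 |
| dynGEM | **0.9648** | 0.2587 | 0.0975 |
| dyngraph2vecAE | 0.9500 | **0.3951** | **0.1825** |
| dyngraph2vecRNN | **0.9567** | **0.5451** | **0.2350** |
| dyngraph2vecAERNN | **0.9581** | **0.5952** | **0.3274** |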
Hyperparameter Sensitivity: Look back
One of the important parameters for time-series analysis is how far into the past the method looks to predict the future. To analyze the effect of look back on the MAP score, we trained the dyngraph2vec algorithms with various look back values. The embedding dimension is fixed to 128. The look back size is varied from 1 to 3 with a step size of 1. We then tested the change in MAP values on the real-world datasets.
Figure 16 presents the results of the look back variation. These results show that MAP scores increase as the look back parameter is increased. The highest MAP value of 0.6155 is achieved for the Hep-th dataset by dyngraph2vecAERNN with a look back of 3. Similarly, the highest MAP value of 0.3464 is achieved for the AS dataset by dyngraph2vecAERNN with a look back of 3.
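How the look back shapes the training data can be illustrated by slicing a sequence of snapshots into input windows and prediction targets. The sketch below treats the look back as the number of consecutive input snapshots, which is an assumption about the convention; `lookback_windows` is a hypothetical helper and the snapshot labels stand in for adjacency matrices.

```python
def lookback_windows(snapshots, lookback):
    """Yield (inputs, target): `lookback` consecutive snapshots and the one that follows."""
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    for start in range(len(snapshots) - lookback):
        yield snapshots[start:start + lookback], snapshots[start + lookback]

snapshots = ["G1", "G2", "G3", "G4", "G5"]  # placeholders for adjacency matrices
pairs = {lb: list(lookback_windows(snapshots, lb)) for lb in (1, 2, 3)}
# With look back 3, the first training pair is (["G1", "G2", "G3"], "G4").
```

A longer look back gives the model more history per example but leaves fewer training pairs from the same sequence, which is one reason the parameter has to be tuned per dataset.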
Discussion
Other Datasets: We have validated our algorithms on a synthetic dynamic SBM and two real-world datasets, Hep-th and AS. We leave testing on further datasets as future work.
Hyperparameters: We have evaluated the proposed algorithms with embedding sizes of 64, 128 and 256. We leave the exhaustive evaluation of the proposed algorithms over broader ranges of embedding size and look back size for future work.
Evaluation: We have demonstrated the effectiveness of the proposed algorithms for predicting the links of the next time step. However, dynamic graphs admit other evaluation tasks, such as node classification. We leave them as future work.
Conclusion
This paper introduced dyngraph2vec, a model for capturing temporal patterns in dynamic networks. It learns the evolution patterns of individual nodes and provides an embedding capable of predicting future links with higher precision. We propose three variations of our model based on the architecture with varying capabilities. The experiments show that our model can capture temporal patterns on synthetic and real datasets and outperform state-of-the-art methods in link prediction. There are several directions for future work: (1) interpretability by extending the model to provide more insight into network dynamics and better understand temporal dynamics; (2) automatic hyperparameter optimization for higher accuracy; and (3) graph convolutions to learn from node attributes and reduce the number of parameters.
References
 [Ahmed et al.2013] Ahmed, A.; Shervashidze, N.; Narayanamurthy, S.; Josifovski, V.; and Smola, A. J. 2013. Distributed large-scale natural graph factorization. In Proceedings of the 22nd international conference on World Wide Web, 37–48. ACM.
 [Belkin and Niyogi2001] Belkin, M., and Niyogi, P. 2001. Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, volume 14, 585–591.
 [Brand2006] Brand, M. 2006. Fast low-rank modifications of the thin singular value decomposition. Linear algebra and its applications 415(1):20–30.
 [Bruna et al.2013] Bruna, J.; Zaremba, W.; Szlam, A.; and LeCun, Y. 2013. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203.
 [Cao, Lu, and Xu2015] Cao, S.; Lu, W.; and Xu, Q. 2015. Grarep: Learning graph representations with global structural information. In KDD15, 891–900.

 [Cao, Lu, and Xu2016] Cao, S.; Lu, W.; and Xu, Q. 2016. Deep neural networks for learning graph representations. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 1145–1152. AAAI Press.
 [Freeman2000] Freeman, L. C. 2000. Visualizing social networks. Journal of social structure 1(1):4.
 [Gehrke, Ginsparg, and Kleinberg2003] Gehrke, J.; Ginsparg, P.; and Kleinberg, J. 2003. Overview of the 2003 kdd cup. ACM SIGKDD Explorations 5(2).
 [Goyal and Ferrara2018] Goyal, P., and Ferrara, E. 2018. Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Systems.
 [Goyal et al.2017] Goyal, P.; Kamra, N.; He, X.; and Liu, Y. 2017. Dyngem: Deep embedding method for dynamic graphs. In IJCAI International Workshop on Representation Learning for Graphs.
 [Goyal et al.2018a] Goyal, P.; Hosseinmardi, H.; Ferrara, E.; and Galstyan, A. 2018a. Embedding networks with edge attributes. In Proceedings of the 29th on Hypertext and Social Media, 38–42. ACM.
 [Goyal et al.2018b] Goyal, P.; Kamra, N.; He, X.; and Liu, Y. 2018b. Dyngem: Deep embedding method for dynamic graphs. arXiv preprint arXiv:1805.11273.
 [Goyal, Sapienza, and Ferrara2018] Goyal, P.; Sapienza, A.; and Ferrara, E. 2018. Recommending teammates with deep neural networks. In Proceedings of the 29th on Hypertext and Social Media, 57–61. ACM.
 [Grover and Leskovec2016] Grover, A., and Leskovec, J. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd International Conference on Knowledge Discovery and Data Mining, 855–864. ACM.
 [Henaff, Bruna, and LeCun2015] Henaff, M.; Bruna, J.; and LeCun, Y. 2015. Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163.
 [Kingma and Ba2014] Kingma, D., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
 [Kipf and Welling2016a] Kipf, T. N., and Welling, M. 2016a. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
 [Kipf and Welling2016b] Kipf, T. N., and Welling, M. 2016b. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308.
 [Leskovec, Kleinberg, and Faloutsos2005] Leskovec, J.; Kleinberg, J.; and Faloutsos, C. 2005. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, 177–187. ACM.
 [Ou et al.2016a] Ou, M.; Cui, P.; Pei, J.; Zhang, Z.; and Zhu, W. 2016a. Asymmetric transitivity preserving graph embedding. In Proc. of ACM SIGKDD, 1105–1114.
 [Ou et al.2016b] Ou, M.; Cui, P.; Pei, J.; Zhang, Z.; and Zhu, W. 2016b. Asymmetric transitivity preserving graph embedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, 1105–1114. ACM.
 [Pavlopoulos, Wegener, and Schneider2008] Pavlopoulos, G. A.; Wegener, A.-L.; and Schneider, R. 2008. A survey of visualization tools for biological network analysis. Biodata mining 1(1):12.
 [Perozzi, AlRfou, and Skiena2014] Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. Deepwalk: Online learning of social representations. In Proceedings 20th international conference on Knowledge discovery and data mining, 701–710.
 [Rumelhart, Hinton, and Williams1988] Rumelhart, D. E.; Hinton, G. E.; and Williams, R. J. 1988. Neurocomputing: Foundations of research. JA Anderson and E. Rosenfeld, Eds 696–699.
 [Tang et al.2015] Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Yan, J.; and Mei, Q. 2015. Line: Large-scale information network embedding. In Proceedings 24th International Conference on World Wide Web, 1067–1077.
 [Theocharidis et al.2009] Theocharidis, A.; Van Dongen, S.; Enright, A.; and Freeman, T. 2009. Network visualization and analysis of gene expression data using biolayout express3d. Nature protocols 4:1535–1550.
 [Wang and Wong1987] Wang, Y. J., and Wong, G. Y. 1987. Stochastic blockmodels for directed graphs. Journal of the American Statistical Association 82(397):8–19.
 [Wang, Cui, and Zhu2016] Wang, D.; Cui, P.; and Zhu, W. 2016. Structural deep network embedding. In Proceedings of the 22nd International Conference on Knowledge Discovery and Data Mining, 1225–1234. ACM.
 [Wasserman and Faust1994] Wasserman, S., and Faust, K. 1994. Social network analysis: Methods and applications, volume 8. Cambridge university press.
 [Zhang et al.2017] Zhang, Z.; Cui, P.; Pei, J.; Wang, X.; and Zhu, W. 2017. Timers: Error-bounded svd restart on dynamic networks. arXiv preprint arXiv:1711.09541.
 [Zhou et al.2018] Zhou, L.; Yang, Y.; Ren, X.; Wu, F.; and Zhuang, Y. 2018. Dynamic Network Embedding by Modelling Triadic Closure Process. In AAAI.
 [Zhu et al.2016] Zhu, L.; Guo, D.; Yin, J.; Ver Steeg, G.; and Galstyan, A. 2016. Scalable temporal latent space inference for link prediction in dynamic social networks. IEEE Transactions on Knowledge and Data Engineering 28(10):2765–2777.