E-LSTM-D: A Deep Learning Framework for Dynamic Network Link Prediction

02/22/2019 ∙ by Jinyin Chen, et al. ∙ City University of Hong Kong Microsoft 0

Predicting the potential relations between nodes in networks, known as link prediction, has long been a challenge in network science. However, most studies have focused on link prediction in static networks, while real-world networks evolve over time with the appearance and disappearance of nodes and links. Dynamic network link prediction has therefore been attracting more and more attention since it can better capture the evolving nature of networks, but most algorithms still fail to achieve satisfactory prediction accuracy. Motivated by the excellent performance of Long Short-Term Memory (LSTM) in processing time series, in this paper we propose a novel Encoder-LSTM-Decoder (E-LSTM-D) deep learning model to predict dynamic links end to end. It can handle long-term prediction problems and suits networks of different scales with a fine-tuned structure. To the best of our knowledge, this is the first time that LSTM, together with an encoder-decoder architecture, has been applied to link prediction in dynamic networks. The model automatically learns structural and temporal features in a unified framework and can predict links that have never appeared in the network before. Extensive experiments show that our E-LSTM-D model significantly outperforms recently proposed dynamic network link prediction methods and obtains state-of-the-art results.




I Introduction

Networks are often used to describe complex systems in various areas, such as social science [1, 2], biology [3], electric systems [4] and economics [5]. The vast majority of real-world systems evolve with time and can be modeled as dynamic networks [6, 7], where nodes may come and go and links may vanish and recover as time goes by. Links, representing the interactions between different entities, are of particular significance in the analysis of dynamic networks.

Link prediction in a dynamic network [8, 9] tries to predict the future structure of the network based on historical data, which helps us better understand network evolution and, further, the relationships between topologies and functions. For instance, in online social networks [10, 11, 12], we can predict which links are going to be established in the near future. That is, we can infer with what kind of people, or even with which particular person, the target user is likely to make friends, based on their historical behaviors. It can also be applied to studies of disease contagion [13], protein-protein interactions [14] and many other fields where the evolution matters.

Similarity indices, like Common Neighbor (CN) [15] and Resource Allocation Index (RA) [16], are widely used in link prediction of static networks [17], but they can hardly deal with changes of the network structure directly. To learn temporal dependencies, Yao et al. [18] assigned time-varied weights to previous graphs and then executed the link prediction task using a refined CN that considers the neighbors within two hops. Similarly, Zhang et al. [19] proposed an improved RA-based dynamic network link prediction algorithm, which updates the similarity between pairwise nodes when the network structure changes. These methods, however, mostly depend on simple statistics of networks and thus cannot effectively deal with high non-linearity.

To tackle this problem, a number of network embedding techniques were proposed to learn representations of networks that preserve high-order proximity. Random walk based methods, such as DeepWalk [20] and node2vec [21], sample sequences of nodes and obtain node vectors by applying skip-gram. Furthermore, with the development of deep learning [22, 23, 24], methods like structural deep network embedding (SDNE) [25] and Graph Convolution Network (GCN) [26] can automatically learn node representations end to end. The embedding vectors ensure that nodes with similar structural properties stay close in the embedding space. These embedding methods are powerful but still lack the ability to analyze the evolution of networks. To learn such temporal dependencies, some recent works take the evolution of the network into consideration. Ahmed et al. [27] assigned damping weights to the snapshots, ensuring that more recent snapshots are more important, and combined them into a weighted graph on which to do local random walk. As an extension of [27], Ahmed and Chen [28] proposed Time Series Random Walk (TS-RW) to integrate temporal and global information. There are also some methods based on the Restricted Boltzmann Machine (RBM), which regard the evolution of a network as a special case of a Markov random field with two-layer variables. Conditional temporal RBM [29], namely ctRBM, considers not only neighboring connections but also temporal connections, and thus has the ability to predict future links. Zhou et al. [30] modeled the network evolution as a triadic closure process, which however is limited to undirected networks. Following the idea of SDNE, Li et al. [31] used a Gated Recurrent Unit (GRU) [32] as the encoder to learn both spatial and temporal information. Most of these methods, however, are limited to predicting added links, which reflects only a part of network evolution. Moreover, they have to obtain a representation of links and then train a separate binary classification model, which is less unified.

In this paper, we address the problem of predicting the global structure of networks in the near future, focusing on the links that are going to appear or disappear. We propose a novel end-to-end Encoder-LSTM-Decoder (E-LSTM-D) deep learning model for link prediction in dynamic networks, which takes advantage of the encoder-decoder architecture and a stacked Long Short-Term Memory (LSTM). The model can thus effectively handle the problems of high dimension, non-linearity and sparsity. Thanks to the encoder-decoder architecture, the model can automatically learn representations of networks, as well as reconstruct a graph from the extracted information. Relatively low dimensional representations of the sequences of graphs can be learned by the stacked LSTM module placed right behind the encoder. Considering that network sparsity may seriously affect the performance of the model, we amplify the effect of existing links in the training process, forcing the model to account for existing links more than missing/nonexistent ones. We conduct comprehensive experiments on five real-world datasets. The results show that our model significantly outperforms the current state-of-the-art methods. In particular, we make the following main contributions.

  • We propose a general end-to-end deep learning framework, namely E-LSTM-D, for link prediction in dynamic networks, where the encoder-decoder architecture automatically learns representations of networks and the stacked LSTM module enhances the ability of learning temporal features.

  • Our newly proposed E-LSTM-D model can handle long-term prediction tasks with only a slight drop in performance. It suits networks of different scales by fine-tuning the model structure, i.e., changing the number of units in different layers. Besides, it can predict the links that are going to appear or disappear, while most existing methods only focus on the former.

  • We define a new metric, Error Rate, to measure the performance of dynamic network link prediction, which is a good addition to the Area Under the ROC Curve (AUC), so that the evaluation is more comprehensive.

  • We conduct extensive experiments, comparing our E-LSTM-D model with five baseline methods on various metrics. The results show that our model outperforms the others and obtains state-of-the-art results.

The rest of the paper is organized as follows. In Section II, we provide a rigorous definition of dynamic network link prediction and a detailed description of our E-LSTM-D model. Comprehensive experiments are presented in Section III, with the results carefully discussed. Finally, we conclude the paper and outline some future work in Section IV.

Fig. 1: An illustration of network evolution. The structure of the network changes over time. At the later time step, two links emerge while one vanishes, which is reflected in the change of the adjacency matrix, with those elements equal to 1 represented by filled squares.

II Methodology

In this section, we will introduce our E-LSTM-D model used to predict the evolution of dynamic networks.

II-A Problem Definition

A dynamic network is modeled as a sequence of snapshot graphs taken at a fixed interval.

Definition 1 (Dynamic Networks)

Given a sequence of graphs, $\{G_1, \dots, G_T\}$, where $G_k = (V, E_k)$ denotes the $k$-th snapshot of a dynamic network. Let $V$ be the set of all vertices and $E_k$ the set of temporal links within the fixed timespan $[t_{k-1}, t_k]$. The adjacency matrix of $G_k$ is denoted by $A_k$, with the element $a_{k;i,j} = 1$ if there is a directed link from $v_i$ to $v_j$ and $a_{k;i,j} = 0$ otherwise.

In a static network, link prediction aims to find edges that actually exist according to the distribution of observed edges. Similarly, link prediction in a dynamic network makes full use of the information extracted from previous graphs to reveal the underlying patterns of network evolution, so as to predict the future status of the network. Since the adjacency matrix can precisely describe the structure of a network, it is ideal to use it as the input and output of the prediction model. We could infer $G_t$ based only on $G_{t-1}$, due to the strong relationship between successive snapshots of the dynamic network. However, the information contained in $G_{t-1}$ may be too little for precise inference. In fact, not only the structure itself but also the way the structure changes over time matters in network evolution. Thus, we prefer to use a sequence of length $N$, i.e., $\{G_{t-N}, \dots, G_{t-1}\}$, to predict $G_t$.

Definition 2 (Dynamic Network Link Prediction)

Given a sequence of graphs with length $N$, $S = \{G_{t-N}, \dots, G_{t-1}\}$, Dynamic Network Link Prediction (DNLP) aims to learn a function that maps the input sequence $S$ to $G_t$.

The structure of a dynamic network evolves with time. As shown in Fig. 1, some links may emerge while others may vanish, which is reflected in the changes of the adjacency matrix over time. The goal is to find the links of the network that are most likely to appear or disappear in the next timespan. Mathematically, it can also be interpreted as an optimization problem of finding a matrix, whose elements are either 0 or 1, that best fits the ground truth.
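To make the snapshot representation concrete, the following is a minimal sketch of how timestamped directed contacts could be binned into a sequence of 0/1 adjacency matrices. The function name `build_snapshots`, its parameters and the toy edge list are hypothetical and chosen for illustration only; the paper does not specify its preprocessing code.

```python
import numpy as np

def build_snapshots(edges, num_nodes, t_start, interval, num_snapshots):
    """Bin timestamped directed edges (src, dst, time) into 0/1 adjacency matrices."""
    snapshots = np.zeros((num_snapshots, num_nodes, num_nodes), dtype=np.int8)
    for src, dst, t in edges:
        k = int((t - t_start) // interval)  # index of the snapshot covering time t
        if 0 <= k < num_snapshots:
            snapshots[k, src, dst] = 1  # repeated contacts collapse to a single link
    return snapshots

# Toy data (hypothetical): 3 nodes, 2 snapshots of 10 time units each.
toy_edges = [(0, 1, 0.0), (0, 1, 3.0), (1, 2, 12.0), (2, 0, 25.0)]
A = build_snapshots(toy_edges, num_nodes=3, t_start=0.0, interval=10.0, num_snapshots=2)
```

Here `A[k]` plays the role of the adjacency matrix of the k-th snapshot; the contact at time 25.0 falls outside the two requested snapshots and is ignored.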

Fig. 2: The overall framework of the E-LSTM-D model. Given a sequence of graphs with length $N$, $\{G_{t-N}, \dots, G_{t-1}\}$, the encoder maps them into a lower dimensional latent space. Each graph is transformed into a matrix that represents the structural features. The stacked LSTM, composed of multiple LSTM cells, then learns network evolution patterns from the extracted features. The decoder projects the received feature maps back to the original space to get $G_t$. Here, $\sigma$ in the LSTM cells is an activation function and we use sigmoid in this paper.

II-B E-LSTM-D Framework

Here, we propose a novel deep learning model, namely E-LSTM-D, combining the architecture of encoder-decoder and stacked LSTM, with the overall framework shown in Fig. 2. Specifically, the encoder is placed at the entrance of the model to learn the highly non-linear network structures and the decoder converts the extracted features back to the original space. Such encoder-decoder architecture is capable of dealing with spatial non-linearity and sparsity, while the stacked LSTM between the encoder and decoder can learn temporal dependencies. The well designed end-to-end model thus can learn both structural and temporal features and do link prediction in a unified way.

We first introduce terms and notations that will be frequently used later, all of which are listed in TABLE I. Other notations will be explained along with the corresponding equations. Notice that a single LSTM cell can be regarded as a layer, in which the terms with subscript f are the parameters of the forget gate, the terms with subscripts i and C are the parameters of the input gate, and those with subscript o are the parameters of the output gate.

II-B1 Encoder-decoder architecture

An autoencoder can efficiently learn representations of data in an unsupervised way. Inspired by this, we place an encoder at the entrance of the model to capture the highly non-linear network structure and a graph reconstructor at the end to transform the latent features back into a matrix of fixed shape. Here, however, the whole process is supervised, which is different from an autoencoder, since we have labeled data ($A_t$) to guide the decoder to build matrices that better fit the target distributions. In particular, the encoder, composed of multiple non-linear perceptrons, projects the high dimensional graph data into a relatively lower dimensional vector space. Therefore, the obtained vectors can characterize the local structure of vertices in the network. This process can be characterized as

$$
\begin{aligned}
y_{e,i}^{(1)} &= \mathrm{ReLU}\big(W_e^{(1)} s_i + b_e^{(1)}\big), \\
y_{e,i}^{(k)} &= \mathrm{ReLU}\big(W_e^{(k)} y_{e,i}^{(k-1)} + b_e^{(k)}\big), \\
Y_e^{(k)} &= \big[y_{e,1}^{(k)}, \dots, y_{e,N}^{(k)}\big],
\end{aligned}
\tag{1}
$$

where $s_i$ represents the $i$-th graph in the input sequence $S$. For an input sequence, each encoder layer processes every term separately and then concatenates all the activations in the order of time. Here, we use ReLU as the activation function for each encoder/decoder layer to accelerate convergence.

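| Symbol | Definition |
|---|---|
| $K$ | number of encoder/decoder layers |
| $L$ | number of LSTM cells |
| $\hat{A}_t$ | output of the decoder |
| $H$ | output of the stacked LSTM |
| $Y_e^{(k)}$, $Y_d^{(k)}$ | output of the $k$-th encoder/decoder layer |
| $W_e^{(k)}$, $W_d^{(k)}$ | weight of the $k$-th encoder/decoder layer |
| $b_e^{(k)}$, $b_d^{(k)}$ | bias of the $k$-th encoder/decoder layer |
| $W_{f,i,C,o}^{(l)}$ | weight of the $l$-th LSTM layer |
| $b_{f,i,C,o}^{(l)}$ | bias of the $l$-th LSTM layer |

TABLE I: Terms and notations used in the framework.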

The decoder, with the mirror structure of the encoder, receives the latent features and maps them into the reconstruction space under the supervision of $A_t$, represented by

$$
\begin{aligned}
Y_d^{(1)} &= \mathrm{ReLU}\big(W_d^{(1)} H + b_d^{(1)}\big), \\
Y_d^{(k)} &= \mathrm{ReLU}\big(W_d^{(k)} Y_d^{(k-1)} + b_d^{(k)}\big),
\end{aligned}
\tag{2}
$$

where $H$ is generated by the stacked LSTM and represents the features of the target snapshot rather than a sequence of features of all previous snapshots as used in the encoder. Another difference is that the last layer of the decoder, i.e., the output layer, uses sigmoid as the activation function rather than ReLU. The number of units of the output layer always equals the number of nodes.
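The encoder and decoder described above can be sketched with plain NumPy as stacks of dense layers. This is a minimal illustration under stated assumptions, not the authors' implementation: hidden layers are assumed to use ReLU, each row of an adjacency matrix is treated as one node's input vector, the weights are random, and the last encoded snapshot stands in for the LSTM output that the decoder would really receive.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(seq, weights, biases):
    """Apply the same dense ReLU layers to every snapshot; seq has shape (N, n, n)."""
    y = seq.astype(float)
    for W, b in zip(weights, biases):
        y = relu(y @ W + b)  # each snapshot is processed separately, order is kept
    return y                 # shape (N, n, d)

def decode(h, weights, biases):
    """Map latent features of the target snapshot back to an n x n score matrix."""
    y = h
    for W, b in zip(weights[:-1], biases[:-1]):
        y = relu(y @ W + b)
    return sigmoid(y @ weights[-1] + biases[-1])  # output layer uses sigmoid

# Toy sizes (hypothetical): N = 3 snapshots, n = 4 nodes, latent dimension d = 2.
rng = np.random.default_rng(0)
seq = rng.integers(0, 2, size=(3, 4, 4))
feats = encode(seq, [rng.normal(size=(4, 2))], [np.zeros(2)])
# Stand-in for the stacked LSTM output H: here simply the last encoded snapshot.
A_hat = decode(feats[-1], [rng.normal(size=(2, 4))], [np.zeros(4)])
```

The output layer has as many units as there are nodes, so `A_hat` has the same shape as an adjacency matrix and every entry lies between 0 and 1.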

II-B2 Stacked LSTM

Although the encoder-decoder architecture can deal with high non-linearity, it is not able to capture time-varying characteristics. LSTM [33], as a special kind of recurrent neural network (RNN) [34, 35], can learn long-term dependencies and is introduced here to solve this problem. An LSTM consists of three gates, i.e., a forget gate, an input gate and an output gate. The first step is to decide what information is going to be thrown away from the previous cell state. The operation is performed by the forget gate, which is defined as

$$
f_t = \sigma\big(W_f \cdot [h_{t-1}, Y_e^{(K)}] + b_f\big),
\tag{3}
$$
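where $h_{t-1}$ represents the output at time $t-1$. Then the input gate decides what new information should be added to the cell state. First, a sigmoid layer decides which part of the input information, $i_t$, should be updated. Second, a tanh layer generates a vector of candidate state values, $\tilde{C}_t$, which could be added to the cell state. The combination of $i_t$ and $\tilde{C}_t$ represents the current memory that can be used for updating $C_t$. The operation is defined as

$$
\begin{aligned}
i_t &= \sigma\big(W_i \cdot [h_{t-1}, Y_e^{(K)}] + b_i\big), \\
\tilde{C}_t &= \tanh\big(W_C \cdot [h_{t-1}, Y_e^{(K)}] + b_C\big), \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tilde{C}_t,
\end{aligned}
\tag{4}
$$

where $\circ$ denotes element-wise multiplication.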
Benefiting from the forget gate and the input gate, an LSTM cell can not only store long-term memory but also filter out useless information. The output of the LSTM cell is based on $C_t$ and is controlled by the output gate, which decides what information, $o_t$, should be exported. The process is described as

$$
\begin{aligned}
o_t &= \sigma\big(W_o \cdot [h_{t-1}, Y_e^{(K)}] + b_o\big), \\
h_t &= o_t \circ \tanh(C_t).
\end{aligned}
\tag{5}
$$
A single LSTM cell is capable of learning time dependencies, but a chain-like LSTM module, namely a stacked LSTM, is more suitable for processing time sequence data. A stacked LSTM consists of multiple LSTM cells that take signals as input in the order of time. We place the stacked LSTM between the encoder and the decoder to learn the patterns under which the network evolves. After receiving the features extracted at time $t$, the LSTM module turns them into $h_t$ and then feeds $h_t$ back to the model at the next training step. This helps the model make use of the remaining information of previous training data. Note that the numbers of units in the encoder, the LSTM cells and the decoder vary with the number of nodes: the larger the network, the more units we need in the model.

The encoder at the entrance reduces the dimension of each graph and thus keeps the computation of the stacked LSTM at a reasonable cost. In turn, the stacked LSTM, which is well suited to temporal and sequential data, complements the encoder.
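The gate operations described in this subsection follow the standard LSTM cell, which can be sketched in a few lines of NumPy. This is a generic, minimal illustration rather than the authors' code: the parameter dictionary, the toy sizes and the name `x_t` for the encoder features fed to the cell are assumptions made here for readability.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM cell update: forget gate, input gate, cell state, output gate."""
    z = np.concatenate([h_prev, x_t], axis=-1)                # [h_{t-1}, x_t]
    f_t = sigmoid(z @ params["W_f"] + params["b_f"])          # what to discard
    i_t = sigmoid(z @ params["W_i"] + params["b_i"])          # what to update
    c_tilde = np.tanh(z @ params["W_C"] + params["b_C"])      # candidate values
    c_t = f_t * c_prev + i_t * c_tilde                        # new cell state
    o_t = sigmoid(z @ params["W_o"] + params["b_o"])          # what to export
    h_t = o_t * np.tanh(c_t)                                  # cell output
    return h_t, c_t

# Toy sizes (hypothetical): n = 4 nodes, input dim 2, hidden dim 3, N = 5 steps.
rng = np.random.default_rng(0)
d_in, d_hid, n, N = 2, 3, 4, 5
params = {f"W_{g}": rng.normal(scale=0.1, size=(d_hid + d_in, d_hid)) for g in "fiCo"}
params.update({f"b_{g}": np.zeros(d_hid) for g in "fiCo"})
h, c = np.zeros((n, d_hid)), np.zeros((n, d_hid))
for x_t in rng.normal(size=(N, n, d_in)):  # features in the order of time
    h, c = lstm_step(x_t, h, c, params)
```

After the loop, `h` summarizes the whole input sequence for each node; in the model described above, such a summary is what the decoder receives.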

II-C Balanced Training Process

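The $L_2$ distance, often applied in regression, can measure the similarity between two samples. But if we simply use it as the loss function in the proposed model, the cost may fail to converge to an expected range, or the model may overfit, due to the sparsity of the network. There are far more zero elements than non-zero elements in $A_t$, making the decoder prone to reconstructing zero elements. To address this sparsity problem, we should focus more on existing links than on nonexistent links in back propagation. We define a new loss function as

$$
\mathcal{L} = \big\| (A_t - \hat{A}_t) \circ P \big\|_F^2,
\tag{6}
$$

where $\circ$ means the Hadamard product. For each training process, the entries of the penalty matrix $P$ are $p_{i,j} = 1$ if $a_{i,j} = 0$ and $p_{i,j} = \beta > 1$ otherwise. Such a penalty matrix exerts more penalty on non-zero elements so that the model can avoid overfitting to a certain extent. Finally, we use the mixed loss function

$$
\mathcal{L}_{total} = \mathcal{L} + \alpha \mathcal{L}_{reg},
\tag{7}
$$

where $\mathcal{L}_{reg}$, defined in Eq. (8), is a regularizer to prevent the model from overfitting and $\alpha$ is a tradeoff parameter.

$$
\mathcal{L}_{reg} = \frac{1}{2} \sum_{k=1}^{K} \Big( \big\|W_e^{(k)}\big\|_F^2 + \big\|W_d^{(k)}\big\|_F^2 \Big) + \frac{1}{2} \sum_{l=1}^{L} \Big( \big\|W_f^{(l)}\big\|_F^2 + \big\|W_i^{(l)}\big\|_F^2 + \big\|W_C^{(l)}\big\|_F^2 + \big\|W_o^{(l)}\big\|_F^2 \Big)
\tag{8}
$$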
The value of each element in $A_t$ is either 0 or 1. The output values, however, are not binary. They are decimals and could, in theory, grow without bound in either direction. To get a valid adjacency matrix, we impose a sigmoid function at the output layer and then round the values to 0 or 1 with 0.5 as the demarcation point. That is, there exists a link between $i$ and $j$ if $\hat{a}_{i,j} \geq 0.5$ and there is no link otherwise. To optimize the proposed model, we first make a forward propagation to obtain the loss and then do back propagation to update all the parameters. In particular, the key operation is to calculate the partial derivatives of $\mathcal{L}_{total}$ with respect to $W_{e,d}^{(k)}$ and $W_{f,i,C,o}^{(l)}$.
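The balanced loss and the 0.5 cut-off can be illustrated with a small NumPy sketch. Several details are assumptions made here rather than facts from the paper: the loss is written as the squared Frobenius norm of the penalty-weighted difference, the penalty for existing links is called `beta`, and a score of exactly 0.5 is counted as a link.

```python
import numpy as np

def balanced_loss(A_true, A_pred, beta):
    """Squared error with weight beta on existing links and 1 on nonexistent ones."""
    P = np.where(A_true == 0, 1.0, beta)  # penalty matrix
    return float(np.sum(((A_true - A_pred) * P) ** 2))

def binarize(A_pred, threshold=0.5):
    """Turn sigmoid outputs into a 0/1 adjacency matrix."""
    return (A_pred >= threshold).astype(np.int8)

# Toy example (hypothetical values): one existing link, predicted too weakly.
A_true = np.array([[0, 1], [0, 0]])
A_pred = np.array([[0.1, 0.4], [0.6, 0.2]])
plain = balanced_loss(A_true, A_pred, beta=1.0)     # ordinary squared error
weighted = balanced_loss(A_true, A_pred, beta=2.0)  # missed link costs more
A_bin = binarize(A_pred)
```

With `beta = 1` the loss reduces to the ordinary squared error; raising `beta` makes the missed link at position (0, 1) dominate the loss, which is the intended effect of the penalty matrix on sparse networks.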

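We take the calculation of $\partial \mathcal{L}_{total} / \partial W_e^{(k)}$ as an example. Taking the partial derivative of Eq. (7) with respect to $W_e^{(k)}$, we have

$$
\frac{\partial \mathcal{L}_{total}}{\partial W_e^{(k)}} = \frac{\partial \mathcal{L}}{\partial W_e^{(k)}} + \alpha \frac{\partial \mathcal{L}_{reg}}{\partial W_e^{(k)}}.
\tag{9}
$$

According to Eq. (6), we can easily obtain

$$
\frac{\partial \mathcal{L}}{\partial W_e^{(k)}} = \frac{\partial \mathcal{L}}{\partial \hat{A}_t} \cdot \frac{\partial \hat{A}_t}{\partial W_e^{(k)}}.
\tag{10}
$$

To calculate $\partial \hat{A}_t / \partial W_e^{(k)}$, we should iteratively take the partial derivative with respect to $W_e^{(k)}$ on both sides of Eq. (1). After getting $\partial \mathcal{L}_{total} / \partial W_e^{(k)}$, we update the weight by

$$
W_e^{(k)} \leftarrow W_e^{(k)} - \lambda \frac{\partial \mathcal{L}_{total}}{\partial W_e^{(k)}},
\tag{11}
$$

where $\lambda$ is the learning rate, which is set as 1e-3 in the following experiments.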

As for $W_d^{(k)}$ and $W_{f,i,C,o}^{(l)}$, the calculation of the partial derivatives follows almost the same procedure, though it is a little more complicated for the weights in LSTM cells. This is because the recurrent network makes use of cell states at every forward propagation cycle.

III Experiments

The proposed E-LSTM-D is then evaluated on five benchmark datasets and compared with five baseline methods.

III-A Datasets

We perform the experiments on five real-world dynamic networks, all of which are human contact networks, where nodes denote humans and links stand for their contacts. The contacts could be face-to-face proximity, emailing and so on. The detailed descriptions of these datasets are listed below.

  • contact [36]: It is a human contact dynamic network of face-to-face proximity. The data are collected through wireless devices carried by people. A link between person $i$ (source) and person $j$ (target) emerges along with a timestamp if $i$ gets in touch with $j$. The data are recorded every 20 seconds and multiple edges may be shown at the same time if multiple contacts are observed in a given interval.

  • enron [37] and radoslaw [38]: They are email networks and each node represents an employee in a mid-sized company. A link occurs every time an e-mail is sent from one employee to another. enron records email interactions for nearly 6 months and radoslaw lasts for nearly 9 months.

  • fb-forum [39]: The data were obtained from a Facebook-like online forum of students at University of California, Irvine, in 2004. It is an online social network where nodes are users and links represent interactions (e.g., messages) between them. The records span more than 5 months.

  • lkml [40]: The data were collected from the Linux kernel mailing list. The nodes represent users, who are identified by their email addresses, and each link denotes a reply from one user to another. We only focus on the 2210 users that were recorded from 2007-01-01 to 2007-04-01 and then construct a dynamic network based on the links between these users that appeared from 2007-04-01 to 2013-12-01.

All the experiments are implemented in both long-term and short-term networks. The basic statistics of the five datasets are summarized in TABLE II.

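| Dataset | Number of nodes | Number of links | Average degree | Maximum degree | Timespan (days) |
|---|---|---|---|---|---|
| contact | 274 | 28.2K | 206.2 | 2,092 | 4.0 |
| enron | 151 | 50.5K | 669.8 | 1,841 | 164.5 |
| radoslaw | 167 | 82.9K | 993.1 | 9,053 | 271.2 |
| fb-forum | 899 | 50.5K | 669.8 | 5,177 | 164.5 |
| lkml | 2210 | 422.4K | 34.6 | 47,995 | 2,436.3 |

TABLE II: The basic statistics of the five datasets.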

Before training, we take snapshots of each dataset at a fixed interval and then sort them in ascending order of time. Considering that the connections between people are probably temporary, we remove the links that do not show up again in the following 8 intervals; the length of each interval may vary for different timespans. To obtain enough samples, we split each dataset into 320 snapshots with different intervals and set $N = 10$. In this case, $\{G_{t-10}, \dots, G_{t-1}, G_t\}$ is treated as a sample with the first ten snapshots as the input and the last one as the output. As a result, we get 310 samples in total. We then group the first 230 samples, with $t$ varying from 11 to 240, as the training set, and the remaining 80 samples, with $t$ varying from 241 to 320, as the test set.
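The sample construction described above amounts to a sliding window over the snapshot sequence. The short sketch below reproduces the counts stated in the text (320 snapshots, window of ten, 310 samples, 230 for training and 80 for testing); the helper name `make_samples` is hypothetical and snapshot indices are 1-based to match the text.

```python
def make_samples(num_snapshots, window):
    """Return (input_indices, target_index) pairs using 1-based snapshot indices."""
    samples = []
    for t in range(window + 1, num_snapshots + 1):
        inputs = list(range(t - window, t))  # the `window` snapshots preceding t
        samples.append((inputs, t))
    return samples

samples = make_samples(num_snapshots=320, window=10)
train, test = samples[:230], samples[230:]  # chronological split, no shuffling
```

Because the split is chronological, every test target lies strictly after every training target, which is what makes the later short-term versus long-term comparison meaningful.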

III-B Baseline Methods

To validate the effectiveness of our E-LSTM-D model, we compare it with node2vec, a widely used baseline network embedding method, as well as four state-of-the-art DNLP methods that can handle time dependencies: Temporal Network Embedding (TNE) [41], conditional temporal RBM (ctRBM) [29], Gradient boosting decision tree based Temporal RBM (GTRBM) [42] and Deep Dynamic Network Embedding (DDNE) [31]. In particular, the five baselines are introduced as follows.

  • node2vec [21]: As a network embedding method, it maps the nodes of a network from a high dimensional space to a lower dimensional vector space. A pair of nodes tend to be connected with a higher probability, i.e., they are more similar, if the corresponding vectors are of shorter distance.

  • TNE [41]: It models network evolution as a Markov process and then uses matrix factorization to get the embedding vector for each node.

  • ctRBM [29]: It is a generative model based on temporal RBM. It first generates a vector for each node based on temporal connections and then predicts future linkages by integrating neighbor information.

  • GTRBM [42]: It takes the advantages of both tRBM and GBDT to effectively learn the hidden dynamic patterns.

  • DDNE [31]: Similar to autoencoder, it uses a GRU as an encoder to read historical information and decodes the concatenated embeddings of previous snapshot into future network structure.

When implementing node2vec, we set the dimension of the embedding vector as 80 for contact, enron and radoslaw, which have fewer than 500 nodes. For fb-forum and lkml, which are larger, we set the dimension as 256. We grid search over {0.5, 1, 1.5, 2} to find the optimal values for the hyper-parameters $p$ and $q$, and then use Weighted-L2 [21] to obtain the vector $e_{u,v}$ for each pair of nodes $u$ and $v$, with each element defined as

$$
e_{u,v}^{(i)} = |u_i - v_i|^2,
\tag{12}
$$

where $u_i$ and $v_i$ are the $i$-th elements of the embedding vectors of nodes $u$ and $v$, respectively. For TNE, we set the dimension as 80 for contact, enron and radoslaw and 200 for fb-forum and lkml. The parameters of ctRBM and GTRBM are mainly the numbers of visible units and hidden units in tRBM. The number of visible units always equals the number of nodes of the corresponding network, and we set the dimension of the hidden layers as 128 for the smaller datasets contact, enron and radoslaw and 256 for the rest. For DDNE, we set the dimension as 128 for the first three smaller datasets and 512 for the rest. When implementing our proposed model, E-LSTM-D, we choose the parameters accordingly: for the first three smaller datasets, we set $K = 1$ and $L = 2$, and we add an additional layer to both the encoder and the decoder, i.e., $K = 2$, for the two larger datasets. The details of the parameters are given in TABLE III. Note that these parameters are chosen to get the best performance for each method, so as to make a fair comparison.
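The Weighted-L2 operator used to turn two node embeddings into one link feature vector is simple enough to state as code. The sketch below follows the element-wise definition from the node2vec paper (squared absolute difference per dimension); the function name and the toy vectors are illustrative only.

```python
import numpy as np

def weighted_l2(emb_u, emb_v):
    """Element-wise squared difference between two node embedding vectors."""
    return np.abs(emb_u - emb_v) ** 2

# Toy embeddings (hypothetical values) for a pair of nodes u and v.
u = np.array([1.0, 2.0, 3.0])
v = np.array([1.0, 0.0, 5.0])
edge_feature = weighted_l2(u, v)
```

The resulting vector has the same dimension as the node embeddings and is symmetric in `u` and `v`; it is the input on which a separate link classifier is trained for node2vec.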

| Dataset | No. units in encoder | No. units in stacked LSTM | No. units in decoder |
|---|---|---|---|
| contact | 128 | 256, 256 | 274 |
| enron | 128 | 256, 256 | 151 |
| radoslaw | 128 | 256, 256 | 167 |
| fb-forum | 512, 256 | 384, 384 | 256, 899 |
| lkml | 1024, 512 | 384, 384 | 512, 2210 |

TABLE III: The parameters of E-LSTM-D in the 5 datasets.

| Metric | Method | contact (20) | contact (80) | enron (20) | enron (80) | radoslaw (20) | radoslaw (80) | fb-forum (20) | fb-forum (80) | lkml (20) | lkml (80) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AUC | node2vec | 0.5212 | 0.5126 | 0.7659 | 0.6806 | 0.6103 | 0.7676 | 0.5142 | 0.5095 | 0.6348 | 0.5892 |
| AUC | TNE | 0.9443 | 0.9297 | 0.8096 | 0.8314 | 0.8841 | 0.8801 | 0.9810 | 0.9749 | 0.9861 | 0.9867 |
| AUC | ctRBM | 0.9385 | 0.9109 | 0.8468 | 0.8295 | 0.8834 | 0.8590 | 0.8728 | 0.8349 | 0.8091 | 0.7729 |
| AUC | GTRBM | 0.9451 | 0.9327 | 0.8527 | 0.8491 | 0.9237 | 0.9104 | 0.9023 | 0.8749 | 0.8547 | 0.8329 |
| AUC | DDNE | 0.9347 | 0.9433 | 0.7985 | 0.7638 | 0.9027 | 0.8974 | 0.9238 | 0.8729 | 0.9328 | 0.9115 |
| AUC | E-LSTM-D | 0.9908 | 0.9893 | 0.8931 | 0.8734 | 0.9814 | 0.9782 | 0.9670 | 0.9650 | 0.9572 | 0.9553 |
| GMAUC | node2vec | 0.1805 | 0.1398 | 0.4069 | 0.5417 | 0.7241 | 0.7203 | 0.2744 | 0.2886 | 0.2309 | 0.2193 |
| GMAUC | TNE | 0.9083 | 0.8958 | 0.8233 | 0.7974 | 0.8282 | 0.8251 | 0.9689 | 0.9629 | 0.9839 | 0.9778 |
| GMAUC | ctRBM | 0.9126 | 0.8893 | 0.7207 | 0.6921 | 0.8004 | 0.7998 | 0.8926 | 0.8632 | 0.7723 | 0.7206 |
| GMAUC | GTRBM | 0.9240 | 0.9136 | 0.9148 | 0.8675 | 0.9157 | 0.8849 | 0.9329 | 0.9117 | 0.6529 | 0.6038 |
| GMAUC | DDNE | 0.8925 | 0.8684 | 0.8724 | 0.8476 | 0.8938 | 0.8724 | 0.9126 | 0.9023 | 0.7894 | 0.7809 |
| GMAUC | E-LSTM-D | 0.9940 | 0.9902 | 0.9077 | 0.8763 | 0.9956 | 0.9938 | 0.9926 | 0.9865 | 0.8657 | 0.8511 |
| Error Rate | node2vec | 44.7753 | 25.2278 | 23.9053 | 24.8060 | 20.7240 | 21.2489 | 40.5109 | 48.5376 | 53.2895 | 61.0274 |
| Error Rate | TNE | 13.1410 | 7.1556 | 23.1276 | 19.9167 | 16.7078 | 16.7175 | 19.1058 | 24.4350 | 18.5702 | 18.2091 |
| Error Rate | ctRBM | 1.8976 | 1.9046 | 2.4890 | 2.7328 | 1.8920 | 2.0937 | 3.4509 | 3.6782 | 2.9903 | 3.3089 |
| Error Rate | GTRBM | 1.5843 | 1.6953 | 1.5947 | 1.8836 | 1.9079 | 2.0031 | 2.2347 | 2.4396 | 2.5351 | 2.7942 |
| Error Rate | DDNE | 1.1780 | 1.6036 | 1.7664 | 1.9014 | 1.6316 | 1.5941 | 1.9014 | 1.8266 | 2.0134 | 2.2258 |
| Error Rate | E-LSTM-D | 0.4011 | 0.5735 | 0.9038 | 0.9880 | 0.3392 | 0.3938 | 0.5583 | 0.5777 | 0.9840 | 1.0093 |

TABLE IV: DNLP performances on AUC, GMAUC and Error Rate for the first 20 samples and all the 80 samples.

III-C Evaluation Metrics

There are few metrics specifically designed for the evaluation of DNLP. Usually, the evaluation metrics used in static link prediction are also employed for DNLP. The Area Under the ROC Curve (AUC) is commonly used to measure the performance of a dynamic link predictor. AUC equals the probability that the predictor gives a higher score to a randomly chosen existing link than to a randomly chosen nonexistent one. The predictor is considered more informative if its AUC value is closer to 1. Other measurements, such as precision, Mean Average Precision (MAP), F1-score and accuracy, evaluate link prediction methods from the perspective of binary classification. All of them suffer from the sparsity problem and cannot measure dynamic performance. The Area Under the Precision-Recall Curve (PRAUC) [43], developed from AUC, is designed to deal with the sparsity of networks. However, the links removed in the near future, a significant aspect of DNLP, are not characterized by the PR curve, and thus PRAUC may lose its effectiveness in this case. Junuthula et al. [44] restricted the measurements to only part of the node pairs and proposed the Geometric Mean of AUC and PRAUC (GMAUC) for the added and removed links, which better reflects the dynamic performance. Li et al. [29] use SumD, which counts the differences between the predicted network and the true one, evaluating link prediction methods in a stricter way. But the absolute difference can be misleading. For example, suppose two dynamic link predictors both achieve a SumD of 5, but one mispredicts 5 links in 10 while the other mispredicts 5 in 100. The latter obviously performs better than the former, but SumD cannot tell them apart.

In our experiments, we choose AUC and GMAUC, and also define a new metric, Error Rate, to evaluate our E-LSTM-D model and other baseline methods.

  • AUC: If among $n$ independent comparisons, there are $n'$ times that the existing link gets a higher score than the nonexistent link and $n''$ times they get the same score, then we have

    $$AUC = \frac{n' + 0.5\, n''}{n}. \tag{13}$$

    Before calculation, we randomly sample as many nonexistent links as there are existing links to ease the impact of sparsity.

  • GMAUC: It is a metric specifically designed for measuring the performance of DNLP. It combines PRAUC (the area under the Precision-Recall curve) and AUC by taking the geometric mean of the two quantities, which is defined as

    $$GMAUC = \left( \frac{PRAUC_{new} - \frac{L_A}{L_A + L_R}}{1 - \frac{L_A}{L_A + L_R}} \cdot 2\,(AUC_{prev} - 0.5) \right)^{1/2}, \tag{14}$$

    where $L_A$ and $L_R$ refer to the numbers of added and removed edges, respectively. $PRAUC_{new}$ is the PRAUC value calculated among the new links and $AUC_{prev}$ represents the AUC for the observed links.

  • Error Rate: It is defined as the ratio of the number of mispredicted links, denoted by $N_{false}$, to the total number of truly existing links, denoted by $N_{true}$, which is represented by

    $$Error\ Rate = \frac{N_{false}}{N_{true}}. \tag{15}$$

    Different from SumD, which only counts the absolute number of different links in two graphs, Error Rate takes the number of truly existing links into consideration to avoid misleading comparisons.
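Two of the three metrics above are easy to sketch in NumPy. The code below is an illustration under stated assumptions, not the evaluation script used in the paper: the AUC compares every sampled existing link with every sampled nonexistent link instead of drawing independent random pairs, it assumes there are at least as many nonexistent links as existing ones, and the function names and toy matrices are hypothetical.

```python
import numpy as np

def sampled_auc(scores, A_true, rng):
    """AUC over existing links and an equally sized random sample of nonexistent ones."""
    pos = scores[A_true == 1]
    neg = rng.choice(scores[A_true == 0], size=len(pos), replace=False)
    higher = np.sum(pos[:, None] > neg[None, :])  # existing link scored higher
    ties = np.sum(pos[:, None] == neg[None, :])   # both scored the same
    return float((higher + 0.5 * ties) / (len(pos) * len(neg)))

def error_rate(A_pred, A_true):
    """Number of mispredicted entries divided by the number of truly existing links."""
    n_false = np.sum(A_pred != A_true)
    n_true = np.sum(A_true == 1)
    return float(n_false / n_true)

# Toy example (hypothetical values): 3 existing links, 2 mispredicted entries.
A_true = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]])
A_pred = np.array([[0, 1, 0], [1, 0, 1], [0, 0, 0]])
scores = np.array([[0.1, 0.9, 0.8], [0.2, 0.1, 0.7], [0.3, 0.1, 0.2]])
rng = np.random.default_rng(0)
auc = sampled_auc(scores, A_true, rng)
err = error_rate(A_pred, A_true)
```

In this toy case every existing link is scored above every nonexistent one, so the AUC is 1, while the binarized prediction misses one link and adds a spurious one, giving an Error Rate of 2/3. An Error Rate above 1, as reported for some baselines, simply means that more entries were mispredicted than there are true links.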

III-D Experimental Results

For each epoch, we feed 10 historical snapshots, $\{G_{t-10}, \dots, G_{t-1}\}$, to E-LSTM-D and infer $G_t$. The same holds for the other four DNLP approaches. For methods that cannot deal with time dependencies, e.g., node2vec, there are two typical treatments: 1) only using $G_{t-1}$ to infer $G_t$ [18]; or 2) aggregating the previous 10 snapshots into a single network and then doing link prediction [45, 31]. We choose the former when implementing node2vec, because the relatively long sequence of historical snapshots here may carry some disturbing information that node2vec cannot handle, leading to even poorer performance.

We compare our E-LSTM-D model with the five baseline methods on the performance metrics AUC, GMAUC and Error Rate. Since the patterns of network evolution may change with time, a model trained on historical data may not capture the pattern in the remote future. To investigate both short-term and long-term prediction performance, we report the average values of the three performance metrics for both the first 20 test samples and all the 80 samples. The results are presented in TABLE IV, where we can see that the E-LSTM-D model outperforms the baseline methods in almost all cases, whether the network is large or small, dense or sparse, and for both short-term and long-term prediction. In particular, for the metrics of AUC and GMAUC, the poor performance of node2vec indicates that methods designed for static networks are indeed not suitable for DNLP. In contrast, E-LSTM-D and the other DNLP baselines achieve much better performance, due to their dynamic nature.

Fig. 3: DNLP performance on AUC, GMAUC and Error Rate, obtained by our E-LSTM-D model, as functions of the test-sample offset $\Delta$ for the five datasets. The dashed lines represent the changing tendencies.
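
| Link importance measure | Method | contact (20) | contact (80) | enron (20) | enron (80) | radoslaw (20) | radoslaw (80) | fb-forum (20) | fb-forum (80) | lkml (20) | lkml (80) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Degree centrality | node2vec | 0.6279 | 0.6297 | 0.4900 | 0.4524 | 0.4735 | 0.5203 | 0.3873 | 0.3454 | 0.5034 | 0.5289 |
| Degree centrality | TNE | 0.9622 | 0.9551 | 0.3446 | 0.3315 | 0.5068 | 0.4413 | 0.0595 | 0.0558 | 0.6390 | 0.6288 |
| Degree centrality | ctRBM | 0.2739 | 0.3307 | 0.4193 | 0.4410 | 0.3028 | 0.3097 | 0.1095 | 0.1137 | 0.3291 | 0.3341 |
| Degree centrality | GTRBM | 0.2209 | 0.2390 | 0.4098 | 0.4322 | 0.2109 | 0.2198 | 0.1127 | 0.1239 | 0.2973 | 0.3030 |
| Degree centrality | DDNE | 0.1293 | 0.1359 | 0.2270 | 0.2133 | 0.0803 | 0.1249 | 0.1190 | 0.1088 | 0.1653 | 0.1821 |
| Degree centrality | E-LSTM-D | 0.0484 | 0.1109 | 0.2182 | 0.2096 | 0.0516 | 0.0761 | 0.0160 | 0.0222 | 0.1863 | 0.1992 |
| Edge betweenness | node2vec | 0.6747 | 0.6509 | 0.4607 | 0.5953 | 0.4657 | 0.4397 | 0.6517 | 0.6799 | 0.8729 | 0.8698 |
| Edge betweenness | TNE | 0.9998 | 0.9987 | 0.9598 | 0.9590 | 1.0000 | 1.0000 | 1.0000 | 0.9986 | 1.0000 | 0.9992 |
| Edge betweenness | ctRBM | 0.5396 | 0.5619 | 0.6512 | 0.7381 | 0.2165 | 0.2291 | 0.4432 | 0.4508 | 0.7279 | 0.7503 |
| Edge betweenness | GTRBM | 0.4418 | 0.4573 | 0.6906 | 0.7420 | 0.2399 | 0.2511 | 0.4507 | 0.4529 | 0.6370 | 0.6524 |
| Edge betweenness | DDNE | 0.2713 | 0.2849 | 0.4988 | 0.5471 | 0.2083 | 0.2508 | 0.2697 | 0.3014 | 0.6435 | 0.6614 |
| Edge betweenness | E-LSTM-D | 0.2004 | 0.2547 | 0.5067 | 0.6157 | 0.1617 | 0.2159 | 0.2643 | 0.2825 | 0.5820 | 0.6126 |

TABLE V: Error Rate of the top 10% important links, in terms of DC and EBC, for the first 20 test samples and all the 80 samples.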
Fig. 4: The network structural properties, average degree and average clustering coefficient, as functions of for the five datasets.

Moreover, for each predicted snapshot, we also compare the predicted links with the truly existing ones to obtain the Error Rate. We find that node2vec tends to predict many more links than truly exist, leading to relatively large Error Rates. We argue that this might be attributed to the classification process, in that the pre-trained linear regression model is not suitable for classifying the embedding vectors. As presented in TABLE IV, the results again demonstrate the best performance of our E-LSTM-D model on DNLP. TNE performs poorly on Error Rate, because it does not specifically fit the distribution of the network as the deep learning based methods do. The dramatic difference in Error Rate between E-LSTM-D and TNE indicates that this metric is a good complement to AUC for comprehensively measuring the performance of DNLP. Other deep learning based methods, like ctRBM and DDNE, have similar performances, but they cannot compete with E-LSTM-D in most cases. It is worth noting that TNE outperforms the others on lkml in terms of the traditional AUC and GMAUC, which shows its robustness to the scale of networks on these metrics; however, it has a much larger Error Rate than the other DNLP methods.

For the 80 test samples with as the output, where varies from 1 to 80, we plot the DNLP performances on the three metrics, obtained by E-LSTM-D, as functions of for the five datasets to see how far ahead it can predict network evolution with satisfactory performance. The results are shown in Fig. 3, where we can see that, generally, AUC and GMAUC decrease, while Error Rate increases, as increases, indicating that long-term prediction of structure is indeed relatively difficult for most dynamic networks. Interestingly, for radoslaw, fb-forum and lkml, the prediction performances are relatively stable, which might be because their network structures evolve periodically, making the sequence of snapshots easier to predict, especially when LSTM is integrated in our deep learning framework. To further illustrate this, we investigate the changing trends of the most common structural properties, i.e., average degree and average clustering coefficient, of the five networks as increases. The results are shown in Fig. 4, where we can see that these two properties change dramatically for contact and enron, while they are relatively stable for radoslaw, fb-forum and lkml. These results explain why we can make better long-term prediction on the last two dynamic networks.

As described above, although some methods have excellent performances on AUC, they might mispredict many links. In most real-world scenarios, however, we may only focus on the most important links. Therefore, we further evaluate our model on the subset of links that are of particular significance in the network. Here, we use two metrics, degree centrality (DC) and edge betweenness centrality (EBC), to measure the importance of each link. DC is originally used to measure the importance of nodes according to their number of neighbors. To measure the importance of a link, we use the sum of the degree centralities of its two terminal nodes (source and target). We then calculate the Error Rate when predicting the top 10% important links. The results are presented in TABLE V, which again demonstrate the outstanding performance of our E-LSTM-D model in predicting important links. They also show that the E-LSTM-D model is more capable of learning the networks' features, i.e. degree distribution and edge betweenness, which could partly account for its effectiveness. Moreover, comparing TABLE IV and TABLE V, we find that the Error Rates on the top 10% important links are much smaller than those on all the links in the five networks, whichever method is adopted. This indicates that the more important links are also easier to predict.
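The selection of important links described above can be sketched with NetworkX. This is a minimal illustration under stated assumptions: the function names are hypothetical, the karate club graph stands in for a network snapshot, and `missed_fraction` is a simplified stand-in (the share of important links absent from a prediction), not the Error Rate used in the tables.

```python
import networkx as nx

def top_important_links(graph, fraction=0.1, metric="dc"):
    """Return the top `fraction` of links ranked by importance.

    metric="dc":  sum of the degree centralities of the two end nodes.
    metric="ebc": edge betweenness centrality of the link.
    """
    if metric == "dc":
        dc = nx.degree_centrality(graph)
        score = {(u, v): dc[u] + dc[v] for u, v in graph.edges()}
    elif metric == "ebc":
        score = nx.edge_betweenness_centrality(graph)
    else:
        raise ValueError("metric must be 'dc' or 'ebc'")
    k = max(1, int(round(fraction * graph.number_of_edges())))
    ranked = sorted(score, key=score.get, reverse=True)
    return ranked[:k]

def missed_fraction(important_links, predicted_graph):
    """Share of important links that the predicted network fails to contain."""
    missed = sum(1 for u, v in important_links
                 if not predicted_graph.has_edge(u, v))
    return missed / len(important_links)

graph = nx.karate_club_graph()            # stand-in for a true snapshot
top_dc = top_important_links(graph, 0.1, "dc")
top_ebc = top_important_links(graph, 0.1, "ebc")

# Hypothetical prediction that misses exactly one important link.
predicted = graph.copy()
predicted.remove_edge(*top_dc[0])
miss_rate = missed_fraction(top_dc, predicted)
```

Ranking by the sum of end-node degree centralities favors links between hubs, while edge betweenness favors links that bridge otherwise distant parts of the network, so the two top-10% sets generally differ.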

III-E Beyond Link Prediction

Our E-LSTM-D model learns a low dimensional representation for each node in the process of link prediction. These vectors, like those generated by other network embedding methods, contain local or global structural information that can be used in other tasks such as node classification. To illustrate this, we conduct an experiment on the karate club dataset, with the network structure shown in Fig. 5 (a). We first obtain by randomly removing 10 links from the original network and then use it to predict the original network . After training, we use the output of the stacked LSTM as the input to the visualization method t-SNE [46]. Besides obtaining excellent performance on link prediction, we also visualize the embedding vectors, as shown in Fig. 5 (b), where we can see that the nodes of the same class are close to each other while those of different classes are relatively far apart. This indicates that the embedding vectors obtained by our E-LSTM-D model on link prediction can also be used to effectively solve the node classification problem, validating the outstanding transferability of the model.

Fig. 5: (a) The structure of karate club network. (b) The t-SNE visualization of the embedding features obtained by our E-LSTM-D model.
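The data preparation for this experiment can be sketched as follows. It is a sketch only: the helper name, the fixed random seed and the variable names are assumptions, and the training and t-SNE steps are not shown.

```python
import random
import networkx as nx

def remove_random_links(graph, n_remove, seed=0):
    """Return a copy of `graph` with `n_remove` randomly chosen links removed."""
    rng = random.Random(seed)
    removed = rng.sample(list(graph.edges()), n_remove)
    damaged = graph.copy()
    damaged.remove_edges_from(removed)
    return damaged, removed

original = nx.karate_club_graph()                 # 34 nodes, 78 links
damaged, removed = remove_random_links(original, 10)

# Binary adjacency matrices: the damaged network is the model input,
# the original network is the prediction target.
nodes = sorted(original.nodes())
A_input = nx.to_numpy_array(damaged, nodelist=nodes, weight=None)
A_target = nx.to_numpy_array(original, nodelist=nodes, weight=None)
```

Because the graph is undirected, each removed link clears two symmetric entries of the adjacency matrix, so the input and target differ in 20 entries.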

III-F Parameter Sensitivity

The performance of our E-LSTM-D model is mainly determined by three factors: the structure of the model, the length of historical snapshots , and the penalty coefficient . In the following, we investigate their influence on the model performance.

Fig. 6: Performance on the 5 datasets with different number of units of the first encoder layer. For each dataset, the number of the units of the first encoder layer increases at the step of 64 from left to right.
Fig. 7: Parameter sensitivity analysis of our E-LSTM-D model on five datasets. (a) The performances on AUC, GMAUC and Error Rates as functions of historical snapshot length . (b) The performances as functions of penalty coefficient .

III-F1 Influence of the model’s structure

The results shown in TABLE IV are obtained by models with selected structures. The number of units in each layer and the number of layers are set with both computational complexity and model performance in mind. We test the model with different numbers of units and encoder layers to validate these structures. Fig. 6 shows that the performance drops slightly as the number of units in the first encoder layer is reduced, while further increasing the complexity contributes little to the performance and may even lead to worse results. TABLE VI reports the difference in performance between the model with an additional encoder layer, which shares the structure of the previous layer, and the original model. The results show no significant improvement on AUC and GMAUC with an additional layer, although the increased model complexity does lower the Error Rate. Overall, the general structure of E-LSTM-D can achieve state-of-the-art performance in most cases.

| Dataset | AUC | GMAUC | Error Rate |
|---|---|---|---|
| contact | 0.0038 | 0.0024 | -0.1037 |
| enron | 0.0119 | 0.0206 | -0.0397 |
| radoslaw | 0.0035 | -0.0054 | -0.0920 |
| fb-forum | 0.0029 | 0.0011 | -0.1033 |
| lkml | -0.0079 | 0.0108 | -0.1375 |

TABLE VI: Difference of the performance with different number of encoder layers

III-F2 Influence of historical snapshot length

Usually, a longer sequence of historical snapshots contains more information and thus may improve link prediction performance. On the other hand, snapshots from long ago might have little influence on the current snapshot, while more historical snapshots increase the computational complexity. Therefore, it is necessary to find a proper length that balances efficiency and performance. We thus vary the length of historical snapshots from 5 to 25 with a regular interval 5. The results are shown in Fig. 7 (a), which show that more historical snapshots can indeed improve the performance of our model, i.e., leading to larger AUC and GMAUC and smaller Error Rate. Moreover, it seems that AUC and GMAUC increase most when changes from 1 to 10, while Error Rate decreases most when changes from 1 to 20. Thereafter, for most dynamic networks, these metrics remain almost the same as further increases. This observation leads us to choose in the previous experiments.
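How the history length shapes the training samples can be sketched with a sliding window. This is an illustrative sketch with hypothetical names (`make_samples`, `history_len`) and synthetic data; it assumes each sample pairs a fixed-length run of consecutive snapshots with the snapshot that immediately follows.

```python
import numpy as np

def make_samples(snapshots, history_len):
    """Pair every run of `history_len` consecutive snapshots with the next one."""
    snapshots = np.asarray(snapshots)
    inputs, targets = [], []
    for t in range(history_len, len(snapshots)):
        inputs.append(snapshots[t - history_len:t])   # history window
        targets.append(snapshots[t])                  # snapshot to predict
    return np.stack(inputs), np.stack(targets)

# Toy sequence: 30 random snapshots of a 4-node network (purely synthetic).
rng = np.random.default_rng(0)
sequence = (rng.random((30, 4, 4)) < 0.3).astype(int)

x10, y10 = make_samples(sequence, history_len=10)
x25, y25 = make_samples(sequence, history_len=25)
```

A longer history gives each sample more context but also yields fewer samples from the same sequence and a larger input per sample, which reflects the trade-off between performance and computational cost discussed above.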

III-F3 Influence of the penalty coefficient

The penalty coefficient is applied in the objective to avoid overfitting and accelerate convergence. When , the objective simply equals the distance. In practice, is usually larger than 1 to help the model focus more on existing links in the training process. As shown in Fig. 7 (b), the performance is relatively stable as varies. However, for some datasets, increasing the penalty coefficient could actually lead to slightly larger GMAUC but smaller Error Rate, while it has little effect on AUC. As further increases, both GMAUC and Error Rate remain relatively stable. These results lead us to choose a relatively small , i.e., in the experiments, varying for different datasets to obtain the optimal results.
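The effect of the penalty coefficient can be illustrated with a weighted reconstruction loss. This is a sketch under assumptions rather than the exact objective: it assumes a squared-error reconstruction term in which entries of existing links are weighted by a coefficient called `beta` here and all other entries by 1, and it omits any regularization term.

```python
import numpy as np

def penalized_reconstruction_loss(a_true, a_pred, beta=2.0):
    """Squared error in which existing links (a_true > 0) are weighted by beta."""
    weights = np.where(a_true > 0, beta, 1.0)
    return float(np.sum(weights * (a_true - a_pred) ** 2))

a_true = np.array([[0.0, 1.0],
                   [1.0, 0.0]])
a_pred = np.array([[0.1, 0.6],
                   [0.6, 0.1]])

plain = penalized_reconstruction_loss(a_true, a_pred, beta=1.0)     # unweighted
weighted = penalized_reconstruction_loss(a_true, a_pred, beta=2.0)  # links count double
```

With `beta` equal to 1 the loss reduces to the plain squared error. With a larger `beta`, errors on existing links dominate, so on a sparse adjacency matrix the model is not rewarded for simply predicting all zeros.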

IV Conclusion

In this paper, we propose a new deep learning model, namely E-LSTM-D, for DNLP. Specifically, to predict future links, we design an end-to-end model that integrates a stacked LSTM into the encoder-decoder architecture, which can make full use of historical information. The proposed model learns not only the low dimensional representations and non-linearity but also the time dependencies between successive network snapshots. As a result, it can better capture the patterns of network evolution. To cope with the problem of sparsity, we impose a larger penalty on existing links in the objective, which also helps to preserve local structure and accelerate convergence. Empirically, we conduct extensive experiments to compare our model with traditional link prediction methods on a variety of datasets. The results demonstrate that our model outperforms the others and achieves state-of-the-art performance. Moreover, we show that the latent features generated by our model in link prediction can well characterize the global and local structure of the nodes in a network and thus may also benefit other tasks, such as node classification.

Our future research will focus on predicting the evolution of layered dynamic networks. Besides, we will make efforts to reduce the computational complexity of our E-LSTM-D model to make it suitable for large-scale networks. We will also study the transferability of our model to various tasks by conducting more comprehensive experiments.


  • [1] D. Ediger, K. Jiang, J. Riedy, D. A. Bader, and C. Corley, “Massive social network analysis: Mining twitter for social good,” in Parallel Processing (ICPP), 2010 39th International Conference on.   IEEE, 2010, pp. 583–593.
  • [2] C. Fu, J. Wang, Y. Xiang, Z. Wu, L. Yu, and Q. Xuan, “Pinning control of clustered complex networks with different size,” Physica A: Statistical Mechanics and its Applications, vol. 479, pp. 184–192, 2017.
  • [3] L. Wang and J. Orchard, “Investigating the evolution of a neuroplasticity network for learning,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2018, doi:10.1109/TSMC.2017.2755066.
  • [4] J. Gao, Y. Xiao, J. Liu, W. Liang, and C. P. Chen, “A survey of communication/networking in smart grids,” Future Generation Computer Systems, vol. 28, no. 2, pp. 391 – 404, 2012.
  • [5] M. Kazemilari and M. A. Djauhari, “Correlation network analysis for multi-dimensional data in stocks market,” Physica A: Statistical Mechanics and its Applications, vol. 429, pp. 62–75, 2015.
  • [6] J. Sun, Y. Yang, N. N. Xiong, L. Dai, X. Peng, and J. Luo, “Complex network construction of multivariate time series using information geometry,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 1, pp. 107–122, Jan 2019.
  • [7] H. Liu, X. Xu, J.-A. Lu, G. Chen, and Z. Zeng, “Optimizing pinning control of complex dynamical networks based on spectral properties of grounded laplacian matrices,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2018, doi:10.1109/TSMC.2018.2882620.
  • [8] N. M. A. Ibrahim and L. Chen, “Link prediction in dynamic social networks by integrating different types of information,” Applied Intelligence, vol. 42, no. 4, pp. 738–750, 2015.
  • [9] Q. Xuan, H. Fang, C. Fu, and V. Filkov, “Temporal motifs reveal collaboration patterns in online task-oriented networks,” Physical Review E, vol. 91, no. 5, p. 052813, 2015.
  • [10] Q. Xuan, Z.-Y. Zhang, C. Fu, H.-X. Hu, and V. Filkov, “Social synchrony on complex networks,” IEEE transactions on cybernetics, vol. 48, no. 5, pp. 1420–1431, 2018.
  • [11] Q. Xuan, M. Zhou, Z.-Y. Zhang, C. Fu, Y. Xiang, Z. Wu, and V. Filkov, “Modern food foraging patterns: Geography and cuisine choices of restaurant patrons on yelp,” IEEE Transactions on Computational Social Systems, vol. 5, no. 2, pp. 508–517, 2018.
  • [12] C. Fu, M. Zhao, L. Fan, X. Chen, J. Chen, Z. Wu, Y. Xia, and Q. Xuan, “Link weight prediction using supervised learning methods and its application to yelp layered network,” IEEE Transactions on Knowledge and Data Engineering, 2018.
  • [13] H. H. Lentz, A. Koher, P. Hövel, J. Gethmann, C. Sauter-Louis, T. Selhorst, and F. J. Conraths, “Disease spread through animal movements: a static and temporal network analysis of pig trade in germany,” PloS one, vol. 11, no. 5, p. e0155196, 2016.
  • [14] A. Theocharidis, S. Van Dongen, A. J. Enright, and T. C. Freeman, “Network visualization and analysis of gene expression data using biolayout express 3d,” Nature protocols, vol. 4, no. 10, p. 1535, 2009.
  • [15] M. E. Newman, “Clustering and preferential attachment in growing networks,” Physical review E, vol. 64, no. 2, p. 025102, 2001.
  • [16] T. Zhou, L. Lü, and Y.-C. Zhang, “Predicting missing links via local information,” The European Physical Journal B, vol. 71, no. 4, pp. 623–630, 2009.
  • [17] L. Lü and T. Zhou, “Link prediction in complex networks: A survey,” Physica A: statistical mechanics and its applications, vol. 390, no. 6, pp. 1150–1170, 2011.
  • [18] L. Yao, L. Wang, L. Pan, and K. Yao, “Link prediction based on common-neighbors for dynamic social network,” Procedia Computer Science, vol. 83, pp. 82–89, 2016.
  • [19] Z. Zhang, J. Wen, L. Sun, Q. Deng, S. Su, and P. Yao, “Efficient incremental dynamic link prediction algorithms in social network,” Knowledge-Based Systems, vol. 132, pp. 226–235, 2017.
  • [20] B. Perozzi, R. Al-Rfou, and S. Skiena, “Deepwalk: Online learning of social representations,” in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM, 2014, pp. 701–710.
  • [21] A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM, 2016, pp. 855–864.
  • [22] Z. Han, Z. Liu, C.-M. Vong, Y.-S. Liu, S. Bu, J. Han, and C. P. Chen, “Deep spatiality: Unsupervised learning of spatially-enhanced global and local 3d features by deep neural network with coupled softmax,” IEEE Transactions on Image Processing, vol. 27, no. 6, pp. 3049–3063, 2018.
  • [23] Q. Xuan, B. Fang, Y. Liu, J. Wang, J. Zhang, Y. Zheng, and G. Bao, “Automatic pearl classification machine based on a multistream convolutional neural network,” IEEE Transactions on Industrial Electronics, vol. 65, no. 8, pp. 6538–6547, 2018.
  • [24] Q. Xuan, H. Xiao, C. Fu, and Y. Liu, “Evolving convolutional neural network and its application in fine-grained visual categorization,” IEEE Access, 2018.
  • [25] D. Wang, P. Cui, and W. Zhu, “Structural deep network embedding,” in Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM, 2016, pp. 1225–1234.
  • [26] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.
  • [27] N. M. Ahmed, L. Chen, Y. Wang, B. Li, Y. Li, and W. Liu, “Sampling-based algorithm for link prediction in temporal networks,” Information Sciences, vol. 374, pp. 1–14, 2016.
  • [28] N. M. Ahmed and L. Chen, “An efficient algorithm for link prediction in temporal uncertain social networks,” Information Sciences, vol. 331, pp. 120–136, 2016.
  • [29] X. Li, N. Du, H. Li, K. Li, J. Gao, and A. Zhang, “A deep learning approach to link prediction in dynamic networks,” in Proceedings of the 2014 SIAM International Conference on Data Mining.   SIAM, 2014, pp. 289–297.
  • [30] L. Zhou, Y. Yang, X. Ren, F. Wu, and Y. Zhuang, “Dynamic Network Embedding by Modelling Triadic Closure Process,” in AAAI, 2018.
  • [31] T. Li, J. Zhang, S. Y. Philip, Y. Zhang, and Y. Yan, “Deep dynamic network embedding for link prediction,” IEEE Access, 2018.
  • [32] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555, 2014.
  • [33] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: Continual prediction with lstm,” 1999.
  • [34] Z. C. Lipton, J. Berkowitz, and C. Elkan, “A critical review of recurrent neural networks for sequence learning,” arXiv preprint arXiv:1506.00019, 2015.
  • [35] Z. Han, M. Shang, Z. Liu, C.-M. Vong, Y.-S. Liu, M. Zwicker, J. Han, and C. P. Chen, “Seqviews2seqlabels: Learning 3d global features via aggregating sequential views by rnn with attention,” IEEE Transactions on Image Processing, vol. 28, no. 2, pp. 658–672, 2019.
  • [36] “Haggle network dataset – KONECT,” Apr. 2017. [Online]. Available: http://konect.uni-koblenz.de/networks/contact
  • [37] R. A. Rossi and N. K. Ahmed, “The network data repository with interactive graph analytics and visualization,” in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015. [Online]. Available: http://networkrepository.com
  • [38] R. Michalski, S. Palus, and P. Kazienko, “Matching organizational structure and social network extracted from email communication,” in Lecture Notes in Business Information Processing, vol. 87.   Springer Berlin Heidelberg, 2011, pp. 197–206.
  • [39] “Facebook wall posts network dataset – KONECT,” Apr. 2017. [Online]. Available: http://konect.uni-koblenz.de/networks/facebook-wosn-wall
  • [40] “Linux kernel mailing list replies network dataset – KONECT,” Apr. 2017. [Online]. Available: http://konect.uni-koblenz.de/networks/lkml-reply
  • [41] L. Zhu, D. Guo, J. Yin, G. Ver Steeg, and A. Galstyan, “Scalable temporal latent space inference for link prediction in dynamic social networks,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 10, pp. 2765–2777, 2016.
  • [42] T. Li, B. Wang, Y. Jiang, Y. Zhang, and Y. Yan, “Restricted boltzmann machine-based approaches for link prediction in dynamic networks,” IEEE Access, 2018.
  • [43] Y. Yang, R. N. Lichtenwalter, and N. V. Chawla, “Evaluating link prediction methods,” Knowledge and Information Systems, vol. 45, no. 3, pp. 751–782, 2015.
  • [44] R. R. Junuthula, K. S. Xu, and V. K. Devabhaktuni, “Evaluating link prediction accuracy in dynamic networks with added and removed edges,” in Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom)(BDCloud-SocialCom-SustainCom), 2016 IEEE International Conferences on.   IEEE, 2016, pp. 377–384.
  • [45] G. H. Nguyen, J. B. Lee, R. A. Rossi, N. K. Ahmed, E. Koh, and S. Kim, “Continuous-time dynamic network embeddings,” in 3rd International Workshop on Learning Representations for Big Networks (WWW BigNet), 2018.
  • [46] L. Maaten and G. Hinton, “Visualizing data using t-sne,” Journal of machine learning research, vol. 9, no. Nov, pp. 2579–2605, 2008.