Generative Temporal Link Prediction via Self-tokenized Sequence Modeling

by Yue Wang, et al.

We formalize networks with evolving structures as temporal networks and propose a generative link prediction model, Generative Link Sequence Modeling (GLSM), to predict future links in temporal networks. GLSM captures temporal link formation patterns from the observed links with a sequence modeling framework and can generate emerging links by inferring from the probability distribution over potential future links. To avoid the overfitting caused by treating each link as a unique token, we propose a self-tokenization mechanism that automatically transforms each raw link in the network into an abstract aggregation token. The self-tokenization is seamlessly integrated into the sequence modeling framework, which gives GLSM the generalization capability to discover link formation patterns beyond raw link sequences. We compare GLSM with existing state-of-the-art methods on five real-world datasets. The experimental results demonstrate that GLSM obtains future positive links effectively in a generative fashion while achieving the best performance (2-10% improvement in AUC) among the alternatives.



1 Introduction

Many real-world applications can be modeled as link prediction problems. For example, a recommendation system can be treated as a network system that learns to connect user nodes with product nodes Esslimani et al. (2011); friend recommendation in social media is the prediction of future links based on the current social network structure Liben-Nowell and Kleinberg (2007); even financial risk can be analyzed through the link formation probabilities between financial organizations in an economic network Hisano (2018). Two mainstream categories in link prediction are based either on the statistical patterns of the link formation behaviors of the network Jeong et al. (2003); Adamic and Adar (2003); Liben-Nowell and Kleinberg (2007) or on graph representation learning Tang et al. (2015); Wang et al. (2018), which embeds nodes as vectors with respect to the network topological information. Most of these methods are discriminative models that verify whether an unknown link given at test time is rational by training a classifier on existing links and negative samples Menon and Elkan (2011a). These methods show moderate performance by learning the decision boundary between positive samples (usually the observed links) and negative samples (usually random links between two arbitrary nodes). However, the temporal information of how links appear in chronological order, which embodies rich information and is useful in practical applications, is completely ignored. To further improve modeling ability, recent works study temporal link prediction Dunlavy et al. (2011), which improves prediction performance based on the temporal information captured by time-dependent methods Yu et al. (2017). However, since these methods do not consider the contextual relationship Wörgötter and Porr (2005) contained in the chronological link sequence, they hardly capture the accurate network formation dynamics Hauert and Nowak (2005) (or link formation patterns) of the future links. Ignoring the chronological link sequence during the formation of evolving networks (e.g., the evolving social network Kossinets and Watts (2006) and the evolving economic network Kirman (1997)) raises the following two challenges for temporal link prediction.

  • Network dynamics. Most classic methods are based on node-level empirical statistical rules Jeong et al. (2003); Adamic and Adar (2003) and do not consider the network formation dynamics. This may result in performance degradation when the statistical rules vary over time.

  • Network model bias. Since it is difficult to model link formation patterns directly, graph representation methods model the general latent link patterns from the observed network without considering the chronological order in which links are observed. Therefore, they hardly capture the link formation patterns directly, and the network they reconstruct from historical data may deviate from the current network, whose structure has already evolved as new links are added Yu et al. (2017). This hampers the accuracy of the prediction results of graph representation learning methods.

One way to address these challenges is to sort the links into a link sequence by their emerging time and learn the link formation patterns from the obtained sequence. Inspired by the framework of neural language modeling Sutskever et al. (2014), which studies the contextual relationship between observed words and the succeeding word in NLP, we adopt sequence modeling techniques for temporal link prediction. Carrying the idea of neural language modeling in NLP over to temporal link prediction in graph mining, we formalize the link formation pattern as a conditional probability distribution and propose a neural network model, Generative Link Sequence Modeling (GLSM), which learns the temporal link formation patterns from chronologically ordered link sequences with an RNN-based sequence modeling framework Mikolov et al. (2010). Unlike previous discriminative counterparts, GLSM introduces a generative perspective that models not only the existence of different links but also the order in which they are observed. It first learns the conditional probability distribution between the preceding and succeeding sequences enumerated from the observed link sequence and then predicts the future links with a generating process that samples potential future links from the learned distribution.

However, simply adopting raw links for sequence modeling leads to several issues. Since a link is a tuple of nodes that encodes the binary relationship between a source and a destination node, this relationship is discarded when we directly encode links as unary tokens like the text tokens in NLP models. Besides, overly specific tokens, e.g., one token for the raw link between every two nodes, break down the dependencies among links with similar behaviors in the resulting token sequence. This can lead to serious overfitting, so the RNN hardly captures any useful contextual relationship. To obtain a token sequence suitable for the sequence modeling framework, we propose a self-tokenization mechanism that automatically controls the granularity of the obtained tokens and the degree of contextual correlation in the resulting sequence. The self-tokenization mechanism consists of a clustering process that produces an abstract aggregation token alphabet and a mapping process that generates tokens from the resulting alphabet. With a differentiable clustering distance function, the self-tokenization loss is incorporated into the sequence modeling loss, so the model not only learns to self-tokenize the raw link sequences in a way that preserves inter-link dependencies but also encodes the temporal information among self-tokenized sequences via sequence modeling.

In the experiment sections, we verify that GLSM captures the useful contextual relationship between preceding and succeeding link sequences and that the generated future links cover the ground-truth positive links effectively on five real-world temporal networks. We also compare GLSM with state-of-the-art methods on temporal link prediction tasks under different parameters, where GLSM outperforms the existing methods. Moreover, the experimental results in the case study indicate that the self-tokenization mechanism helps GLSM capture the link formation patterns between the different communities of the temporal networks.

In summary, this work includes the following contributions:

  • We introduce temporal link prediction by a sequence modeling framework to discover the conditional distribution (defined as the temporal link formation pattern) between the preceding and the succeeding link sequences.

  • We propose a self-tokenization mechanism that encodes links as tokens with respect to the clusters obtained by a clustering process on the original network while keeping the chronological order. This mechanism allows our method to capture the correct network formation dynamics from the observed network and thus alleviates the network model bias problem.

  • We propose a two-step sampling link generator to generate potential future links based on the learned temporal link formation patterns from the observed network.

  • We compare GLSM with state-of-the-art methods on five real-world temporal network datasets, and the results show that sequence modeling, along with the proposed self-tokenization mechanism, achieves the best performance on temporal link prediction tasks.

2 Preliminary

In this section, we formalize the notation for temporal networks and define the temporal link prediction problem within the sequence modeling framework.

2.1 Temporal Network

We model the temporal network as a graph G = (V, E_T) with a fixed node set V and a link sequence E_T = (e_1, e_2, ..., e_n). Here T is the observing time for E_T, the links in E_T are sorted by the order in which they emerge in the physical world, and each link e_i (1 ≤ i ≤ n) is a tuple (u_i, v_i, t_i), where u_i and v_i are nodes from the set V and t_i is the timestamp at which e_i emerges between u_i and v_i (t_1 ≤ t_2 ≤ ... ≤ t_n ≤ T). Figure 1 illustrates an example of a temporal network in this link sequence form.

Figure 1: Example of the temporal network

In this setting, without loss of generality, newly emerging nodes can be treated as an “unknown” node that is also included in the node set V.

2.2 Temporal Link Prediction

In practical application scenarios, the intuitive requirement of temporal link prediction is to predict the potential future links given the historical links (e.g., given the current purchasing records in an e-commerce system, how do we predict users' future purchases?). To meet this requirement, we divide temporal link prediction into two steps: first, learn the temporal link formation pattern from the historical link sequence, and then infer the future links based on the learned patterns.

Suppose the observed links of the temporal network form the link sequence E_T sorted in chronological order; then the temporal link formation pattern describes the probability of observing a succeeding link sequence given a preceding link sequence. We formalize the temporal link formation pattern in Definition 1.

Definition 1

Temporal Link Formation Pattern. Given a temporal network G = (V, E_T) at time T, the temporal link formation pattern is defined as the conditional probability distribution P(e_{t+1}, ..., e_n | e_1, ..., e_t), i.e., the emerging probabilities of the links observed after time step t given the initial link sequence (e_1, ..., e_t). This conditional probability is computed in the following way:

P(e_{t+1}, ..., e_n | e_1, ..., e_t) = ∏_{k=t+1}^{n} P(e_k | e_1, ..., e_{k-1})    (1)
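As a small numerical illustration of this chain-rule factorization (the conditional probability values below are hypothetical, not taken from any dataset):

```python
# Chain-rule factorization of a sequence probability, as in Definition 1.
# The per-link conditionals P(e_k | e_1, ..., e_{k-1}) are made-up values.
conditionals = [0.5, 0.4, 0.2]

# Probability of observing the whole succeeding sequence given the prefix:
# the product of the per-link conditional probabilities.
p_sequence = 1.0
for p in conditionals:
    p_sequence *= p

print(p_sequence)
```

The same product structure is what an RNN evaluates step by step, one conditional per position.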
With the definition for the temporal link formation pattern, the problem to learn the temporal link formation patterns from the observation is formalized as the following.

Definition 2

Temporal Link Formation Pattern Learning. Given a temporal network G = (V, E_T) at time T, temporal link formation pattern learning can be defined as the problem of estimating the conditional probability distribution P̂ through the optimization of the following equation:

L_seq = -∑_{(S_p, S_s)} P(S_s | S_p) log P̂(S_s | S_p)    (2)

where P̂(S_s | S_p) is the estimated probability of the link sequence S_s given the preceding sequence S_p, and P(S_s | S_p) is the probability of S_s measured from the observation.

In the remainder of the paper, the estimated probability is abbreviated as P̂. Note that Equation (2) is a cross-entropy function de Boer et al. (2005) measuring the difference between the estimated probabilities and the observed probabilities. In this work, Equation (2) allows a model to learn the evolving pattern between the sequence of estimated future links and the sequence of historical links. Therefore, with the objective function in Equation (2), the learned temporal link formation pattern captures both the rules of the network formation dynamics and the evolving structure of the network, and thus alleviates the network model bias problem. This is a sequence modeling problem Sutskever et al. (2014), which RNNs are well suited to, so we propose an RNN-based neural network model to solve it. After training the RNN, we enumerate the future links based on the learned temporal link formation pattern.
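The cross-entropy objective can be sketched numerically as follows; the observed and estimated distributions below are illustrative placeholders, not learned from any dataset:

```python
import math

# Hypothetical observed probabilities P(s) of three candidate succeeding
# sequences given the same preceding sequence, and a model's estimates.
p_obs = [0.6, 0.3, 0.1]
p_hat = [0.5, 0.4, 0.1]

# Cross entropy between the observed and estimated distributions,
# mirroring the shape of the sequence-modeling objective.
cross_entropy = -sum(p * math.log(q) for p, q in zip(p_obs, p_hat))

# Cross entropy is minimized when the estimate matches the observation
# exactly, at which point it equals the entropy of the observed distribution.
entropy = -sum(p * math.log(p) for p in p_obs)
assert cross_entropy >= entropy
```

Minimizing this quantity therefore drives the estimated distribution toward the observed one.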

3 Our Framework

In this section, we propose Generative Link Sequence Modeling (GLSM) to learn the temporal link formation pattern and generate future links. As shown in Figure 2, GLSM consists of a training process that learns the temporal link formation patterns and a generating process that generates the future links. We give the details of GLSM in the rest of this section.

(a) Training process of GLSM
(b) Generating process of GLSM
Figure 2: Framework of GLSM

3.1 Temporal Link Sequence Modeling via Self-tokenization

In this section, we introduce the training process in Figure 2 (a), which learns the temporal link formation patterns with the sequence modeling framework.

The key to implementing temporal link prediction with the sequence modeling framework is to convert the link sequence into a unary token sequence that fits the input of an RNN. We formalize this process in the following definitions.

Definition 3

Basic link sequence tokenization. Given a temporal network G = (V, E_T), the target of tokenization is to establish a tokenization map f : E_T → Σ and produce a new sequence S = (a_1, ..., a_n) (a_i = f(e_i)), where Σ is an alphabet containing all the tokens (a_i ∈ Σ).

A naive tokenization method maps every “node-to-node” link to a unique token. It results in an alphabet consisting of all the possible links between any two nodes in V. This leads to a serious problem: since all the tokens in the tokenized sequence are different, an RNN fed with such a sequence easily overfits and outputs patterns without any connection between the preceding and succeeding sequences. The experimental results in Figure 4 verify this analysis. To solve this problem, we propose the self-tokenization mechanism, which generates the token sequence with a clustering process.

Definition 4

Self-tokenization Mechanism. Given a temporal network G = (V, E_T), suppose C = {c_1, ..., c_K} is the set of subgraphs (or communities) obtained by clustering G. The target of self-tokenization is to automatically establish a tokenization map g : E_T → Σ and produce a unary token sequence S = (a_1, ..., a_n) (a_i = g(e_i)), where Σ is an alphabet containing all the tokens (a_i ∈ Σ).

This tokenization method first clusters the network into different subgraphs (communities) and then maps the source and destination nodes of each link to their corresponding subgraphs (communities). This transforms the original “node-to-node” links into “community-to-community” links. By constructing and referring to the alphabet of all the resulting “community-to-community” links, we finally obtain a unary token sequence. For example, given a link sequence (e_1, e_2, e_3), where e_1 = (v_1, v_4), e_2 = (v_2, v_5) and e_3 = (v_3, v_6), suppose the clustering generates three communities: nodes v_1, v_2 and v_3 are in the 1st community c_1, node v_4 is in the 2nd community c_2, and nodes v_5 and v_6 are in the 3rd community c_3. The original links then convert to (c_1, c_2), (c_1, c_3) and (c_1, c_3), the alphabet is Σ = {(c_1, c_2), (c_1, c_3)}, and the original link sequence is tokenized to ((c_1, c_2), (c_1, c_3), (c_1, c_3)) according to Σ.
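The mapping from raw links to community-pair tokens can be sketched in code as follows; the node and community names are illustrative:

```python
# Sketch of the self-tokenization mapping: raw "node-to-node" links are
# rewritten as "community-to-community" tokens. The node-to-community
# assignment below is a hypothetical example, not the paper's data.
community = {"v1": "c1", "v2": "c1", "v3": "c1",
             "v4": "c2", "v5": "c3", "v6": "c3"}

links = [("v1", "v4"), ("v2", "v5"), ("v3", "v6")]

# Map each link to its (source community, destination community) token.
tokens = [(community[u], community[v]) for u, v in links]

# The alphabet is the set of distinct tokens; repeated community pairs
# collapse onto the same token, which restores shared structure that a
# one-token-per-link encoding would destroy.
alphabet = sorted(set(tokens))

print(tokens)
print(alphabet)
```

Note how two different raw links map to the same token, which is exactly what gives the RNN repeated symbols to learn from.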

This mechanism applies the hypergraph partitioning method Selvakkumaran and Karypis (2006), which is effective in controlling the granularity of the graph for graph mining methods. Therefore, the resulting token sequence is coarser in granularity and more general than the original link sequence. Furthermore, since the clustering produces non-overlapping communities consisting of similar nodes, the resulting sequence also preserves network topological information. After the clustering process, the “unknown” nodes can also be labeled as a single community before the sampling process.

With this setting, the number of communities K serves as a parameter that controls the size of the output alphabet and thus determines the generality of the resulting token sequence. In the special case K = |V|, where every node forms its own community, the self-tokenization mechanism reduces to the basic link sequence tokenization in Definition 3.

To generate suitable communities (or subgraphs) for this task, we integrate a self-clustering process Rozemberczki et al. (2018) into Equation (2), resulting in the following loss function:

L = L_seq + λ ∑_{k=1}^{K} ∑_{v ∈ c_k} d(v, μ_k)    (3)

where C = {c_1, ..., c_K} is the set of communities, μ_k is the center node of the k-th community, d(·, ·) can be any differentiable clustering distance function between two nodes, and λ is the weight of the clustering loss. Note that the links in Equation (3) have already been transformed into “community-to-community” tokens, unlike the “node-to-node” links in Equation (2). As illustrated in Figure 2 (a), where the tokenization of the sequences is based on the communities obtained from the clustering process, the temporal link formation patterns and the clustering are learned simultaneously.
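A toy computation of such a combined loss might look as follows; the embeddings, centers, assignments, and the sequence-loss value are all hypothetical placeholders:

```python
# Sketch of a combined loss: sequence cross entropy plus a weighted
# clustering distance term. All numbers below are illustrative.

# 2-D node embeddings and two community center points (hypothetical).
embeddings = {"v1": (0.0, 0.1), "v2": (0.1, 0.0), "v3": (1.0, 0.9)}
centers = {"c1": (0.0, 0.0), "c2": (1.0, 1.0)}
assignment = {"v1": "c1", "v2": "c1", "v3": "c2"}

def sq_dist(a, b):
    # Squared Euclidean distance: a differentiable clustering distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))

seq_loss = 0.92   # cross-entropy term, assumed given for this sketch
lam = 0.5         # clustering weight

# Sum of distances from each node to its assigned community center.
cluster_loss = sum(sq_dist(embeddings[v], centers[c])
                   for v, c in assignment.items())
total_loss = seq_loss + lam * cluster_loss
```

Because both terms share the node embeddings, gradient descent on the total loss updates the sequence model and the clustering jointly, which is the design choice the paragraph above describes.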

The pseudocode of the forward algorithm for temporal link formation modeling, which corresponds to the training part of Figure 2 (a), is listed in Algorithm 1.

Data: Temporal network G, cluster number K, clustering weight λ, epoch threshold N, chunk size l, stride d
Result: The trained RNN, the tokenization map g
1 begin
2        Initialize the hidden state h_0
3        for epoch = 1 to N do
4               Compute the map g and the average clustering distance D for all v in V with center nodes μ_1, ..., μ_K
5               Generate the token sequence S given E_T and g (Def. 3.2)
6               Enumerate a chunk T from S with size l randomly
7               Get the preceding sequence S_p and the succeeding sequence S_s from T given d
8               Infer the conditional distribution P̂ by Eq. (4)
9               Compute the loss L_seq given P̂, P by Eq. (2)
10              Compute the final loss L given L_seq, D and λ by Eq. (3)
11       end for
12       Output the final loss L, the tokenization map g
13 end
Algorithm 1 Temporal link sequence modeling

Line 4 partitions the nodes into K clusters and Line 5 tokenizes the original link sequence into the unary token sequence with the self-tokenization mechanism. To learn the temporal link formation patterns from the tokenized sequences, for each epoch, Lines 6 and 7 generate a preceding sequence S_p and a succeeding sequence S_s randomly from the input sequence, where the stride d is a parameter controlling the overlapping portion between S_p and S_s. After that, Line 8 computes the conditional distribution with the RNN by Equation (4):

P̂(a_{k+1} | a_1, ..., a_k) = softmax(W h_k)    (4)

where h_k is the hidden state value at step k and W is the output weight of the RNN. Line 9 computes the cross-entropy loss L_seq with P̂ and P by Equation (2). The final loss L is calculated by Equation (3). Since Equation (3) incorporates the cross entropy and the clustering loss (distance) together, the model trains the RNN and performs the clustering simultaneously. The training applies backpropagation with the Adam optimizer Kingma and Ba (2014). Moreover, since the cost of Algorithm 1 is proportional to the epoch threshold, its time complexity is linear in the number of epochs N.

When the training process is complete, we obtain a trained RNN containing the temporal link formation patterns and the tokenization map g, which records the community labels of all the nodes in V. Since the obtained RNN captures the network formation dynamics, we can generate the future possible network structure with it as the prediction for the evolving network, which alleviates the network model bias problem.

3.2 Generating Links with the Two-step Sampling Link Generator

Data: a trained RNN and tokenization map g, the link sequence E_T, generation round R
Result: a set of predicted positive links E_new
1 begin
2        Generate the token sequence S given E_T and g (Def. 3.2)
3        Randomly enumerate a 1-length sequence a_0 from S
4        Initialize the hidden states h_0
5        for r = 1 to R do
6               Infer the conditional distribution P̂ with a_{r-1} and h_{r-1} by Eq. (4)
7               Token-sampling: draw the next token a_r from a multinomial distribution experiment given P̂
8               Link-sampling: draw the next link e_r given a_r
9               if e_r is in neither E_T nor E_new then
10                      Append e_r to E_new
11              end if
12       end for
13      Output E_new
14 end
Algorithm 2 Two-step sampling link generator

With the trained RNN, we propose the two-step sampling link generator to generate new links; this process is shown in Algorithm 2 and corresponds to the generating process in Figure 2 (b). Its basic idea is to sample links according to the conditional probabilities learned by the trained RNN. Since a token obtained by the self-tokenization mechanism refers to an abstract “community-to-community” link, Algorithm 2 consists of two sampling steps to obtain “node-to-node” links. The first sampling step starts with a randomly chosen “seed” link sequence in Line 3. With the seed sequence, Line 6 iteratively infers the probability distribution over all tokens of the alphabet Σ. Line 7 then generates the next token according to this distribution with a multinomial sampling without replacement.

Link-sampling. Since a token a is actually the token of a specific “community-to-community” link, to obtain a “node-to-node” link that may appear in the original network, Line 8 samples a “node-to-node” link given the token a. The sampling is implemented by drawing source and destination nodes from the two communities related to a, respectively. Note that this link-sampling is a general process that can use any sampling method, such as weighted random sampling, greedy sampling, beam sampling, etc. The candidate link set is generated by enumerating the complete combination of all possible links between the nodes of the source community and those of the destination community for the token a. The linkage probabilities are computed by Equation (5):

p(u, v) = σ(x_u · x_v), u ∈ V_s, v ∈ V_d    (5)

where V_s and V_d are the node sets of the source and destination communities related to a, and x_u, x_v are rows of the node embedding layer X trained by the clustering process in Algorithm 1. X contains the latent features of all the nodes, and the multiplication of two embedding vectors in Equation (5) computes the linkage probabilities for the related nodes between the two corresponding communities. This process prunes the search space of the link probabilities for the next links and incorporates the temporal information into the basic graph representation framework. We verify in the experiments that this method indeed improves the quality of the generated links. For each iteration of Algorithm 2, a newly generated link included in neither E_T nor the already generated link set is appended to the end of the link sequence and used as the input for the next sampling. The complexity of Algorithm 2 is O(R), which is positively proportional to the round number R.
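The two sampling steps can be sketched as follows, with the RNN output distribution, community membership, and node embeddings all mocked as hypothetical values:

```python
import random

random.seed(0)

# Mock alphabet of "community-to-community" tokens and a mocked
# RNN output distribution over them (in GLSM this comes from Eq. (4)).
alphabet = [("c1", "c2"), ("c1", "c3")]
token_probs = [0.7, 0.3]

# Step 1: token sampling -- draw the next abstract token.
token = random.choices(alphabet, weights=token_probs)[0]

# Step 2: link sampling -- draw a concrete node pair from the two
# communities, weighted by embedding dot products as in Eq. (5).
communities = {"c1": ["v1", "v2"], "c2": ["v3"], "c3": ["v4", "v5"]}
emb = {"v1": [1.0, 0.0], "v2": [0.0, 1.0], "v3": [0.9, 0.1],
       "v4": [0.2, 0.8], "v5": [0.5, 0.5]}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

src_comm, dst_comm = token
candidates = [(u, v) for u in communities[src_comm]
              for v in communities[dst_comm]]
weights = [max(dot(emb[u], emb[v]), 1e-9) for u, v in candidates]
link = random.choices(candidates, weights=weights)[0]

# The sampled link always connects the two communities named by the token.
assert link[0] in communities[src_comm]
assert link[1] in communities[dst_comm]
```

Restricting the candidate pairs to the two communities of the sampled token is what prunes the quadratic node-pair search space.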

4 Experiment and Discussion

4.1 Dataset

We compare our methods with the existing methods on five real-world datasets. Their details are shown in Table 1.

CollegeMsg Movielens Bitcoin AskUbuntu Epinions
Total links 59,835 100,000 35,592 100,000 100,000
Source card. 1,350 944 4,814 10,016 6,718
Dest. card. 1,862 1,683 5,858 10,001 22,406
Node number 1,899 1,682 5,881 12,513 27,370
Edge density 0.024 0.126 0.0025 0.0020 0.0004
Rating range 0-1 0-5 0-20 0-1 0-5
Days covered 193 214 1,903 1,038 4,326
Start 2004.04 1997.09 2010.11 2001.09 1999.07
End 2004.10 1998.04 2016.01 2003.06 2011.05
Table 1: Dataset statistics

Our datasets cover different applications in recommendation systems and social networks. “Movielens”, “Netflix” and “Epinions” Tang et al. (2012) are classic datasets for testing the performance of link prediction or recommendation models. “CollegeMsg” is a binary online social network from Panzarasa et al. (2009). “Bitcoin” is from Kumar et al. (2016) and records the trust scores between users of the Bitcoin online marketplaces in the corresponding transactions. All datasets are in the format “source, destination, rating, timestamp”. All ratings are adjusted to 0 or 1 since our method only deals with the binary prediction problem. The node number is the number of distinct nodes after merging the same nodes appearing in the source and destination positions of the records. To test the temporal link prediction performance of all the methods, we order the links in each dataset chronologically to simulate the link formation in the real scenario; we then select a ratio of links, the training ratio, as the training set (historical links) and leave the remaining links as the testing set (future links). With this setting, we test the performance of the state-of-the-art methods in predicting the real ground-truth future links.
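The chronological split described here can be sketched as follows; the records are hypothetical and use the "source, destination, rating, timestamp" format from the text:

```python
# Sketch of a chronological train/test split: sort links by timestamp
# and take the earliest fraction as history. Records are made up.
records = [
    ("u1", "p2", 1, 300),   # (source, destination, rating, timestamp)
    ("u1", "p1", 1, 100),
    ("u2", "p3", 0, 200),
    ("u3", "p1", 1, 400),
    ("u2", "p2", 1, 500),
]
train_ratio = 0.6

ordered = sorted(records, key=lambda r: r[3])   # chronological order
cut = int(len(ordered) * train_ratio)
train, test = ordered[:cut], ordered[cut:]      # history vs. future links

# Every training link precedes every test link in time.
assert max(r[3] for r in train) <= min(r[3] for r in test)
```

Unlike a random split, this never leaks future links into the training set, which is essential for a fair temporal evaluation.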

4.2 Experiment settings and benchmark.

Comparison methods.

Our methods are compared with the state-of-the-art link prediction methods which are used in most related studies.

  • Jaccard Coefficient (JC) Liben-Nowell and Kleinberg (2007) and Adamic Adar (AA) Adamic and Adar (2003). JC and AA are classic link prediction methods based on statistical similarity scores. They assume that, in a social network, two unconnected nodes with high statistical scores have a high probability of being linked together in the future. Their similarities are computed, in different forms, from the number of common neighbors Newman (2001) between two nodes.

  • Matrix Factorization (MF) Menon and Elkan (2011b). MF factorizes the adjacency matrix of the network into two matrices with a latent feature dimension. Since its factorized matrices can easily be interpreted through the relationships between latent user features, it is applied as the recommendation algorithm in many real-world network systems.

  • Temporal Matrix Factorization (TMF) Yu et al. (2017). TMF uses a time-dependent matrix factorization method to improve the temporal link prediction performance over the original MF method.

  • Graph GAN (GG) Wang et al. (2018). GG is a neural network model based on the framework of graph representation learning and GAN Goodfellow et al. (2014). Through the dynamic game between the generator and the discriminator, this model reaches higher performance than previous methods such as LINE Tang et al. (2015) and DeepWalk Perozzi et al. (2014). We therefore pick GG as the representative of the graph representation learning methods.

  • Graph AutoEncoder (GAE) Kipf and Welling (2016). GAE applies graph convolution to extend basic graph representation vectors to higher-order dimensions. This allows it to capture the generality of the changing patterns of the links in a network, giving it better performance than the basic graph representation learning methods.

Experiment settings.

To make the comparison fair, we implement all these methods in our prototype system with the GPU version of PyTorch, so that all methods are compared on the same data and prediction task platform. In our system, each method generates a link set with the corresponding emerging probabilities for the links. For our method, GLSM, after it generates future positive links, we also sample negative future links from the currently non-existing links, excluding the generated positive links. The set of negative links is set to the same size as the set of positive links.
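The negative sampling described above can be sketched as follows; the toy node set and link sets are hypothetical:

```python
import itertools
import random

random.seed(1)

# Hypothetical node set, observed links, and generated positive links.
nodes = ["v1", "v2", "v3", "v4"]
existing = {("v1", "v2"), ("v2", "v3")}
generated_pos = {("v3", "v4")}

# Candidate negatives: currently non-existing directed links, excluding
# the links the generator already predicted as positive.
all_pairs = set(itertools.permutations(nodes, 2))
candidates = list(all_pairs - existing - generated_pos)

# Negative set matched in size to the generated positive set.
negatives = random.sample(candidates, k=len(generated_pos))
```

Matching the negative-set size to the positive set keeps the test-time classification balanced.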

During all the experiments, GLSM uses a Long Short-Term Memory (LSTM) Sundermeyer et al. (2012) version of the RNN with 128 hidden states and 2 layers. We set the clustering weight λ to 0.5 and use k-means clustering for the self-tokenization process. For the link-sampling process of GLSM, we use weighted random sampling since it is the easiest to implement. All results of GLSM are generated after a training process of 20 epochs.

Our experiment is twofold. First, we compare all the mentioned methods on five real-world datasets with temporal link prediction tasks under different parameters. Then, we perform two case studies to analyze the practical meaning of the clusters obtained after training.

4.3 Effectiveness Experiment

In this experiment, we split each dataset into different windows and test the methods on all the windows. We first compare the prediction performance of the mentioned methods on the different datasets; then, we analyze the hitting ratios (measuring output quality) of the generated links for all methods on all the datasets. Finally, we analyze the sensitivity of GLSM to different parameters.

To further compare the prediction performance, we use different training ratios to test the capability of the methods to predict the correct future links. Since the training ratio decides the fraction of the data used as the training set (historical links), with the remaining data as the testing set (future links), the smaller the training ratio, the bigger the test set of future links. This shows how far into the future these methods can predict with good performance.

Temporal link prediction performance

We compare the temporal link prediction performance of all the mentioned methods with ROC-AUC and RMSE (since the output of GLSM is binary, we set the weight of every link to 1 in the RMSE test) on all the datasets with a window size of 10,000 links. The ROC-AUC results show the capability of the methods to distinguish positive and negative links, and the RMSE results show the accuracy of the output linkage probabilities. Tables 2 and 3 list the ROC-AUC and RMSE results. The training ratio is the same for all the datasets.

Model CollegeMsg Movielens Bitcoin AskUbuntu Epinions
GLSM 0.7132±0.0031 0.7521±0.0007 0.7509±0.0016 0.7508±0.0001 0.7504±0.0012
GAE 0.6634±0.0031 0.6931±0.0016 0.7372±0.0016 0.7241±0.0014 0.6591±0.0007
AA 0.5501±0.0003 0.7447±0.0006 0.6033±0.0002 0.6129±0.0023 0.5111±0.0000
MF 0.5395±0.0003 0.6055±0.0021 0.5854±0.0011 0.5717±0.0009 0.5123±0.0001
GG 0.5659±0.0006 0.5354±0.0006 0.5611±0.0013 0.5550±0.0018 0.5245±0.0003
TMF 0.5175±0.0001 0.5086±0.0001 0.5451±0.0002 0.5249±0.0002 0.5163±0.0002
JC 0.5390±0.0002 0.6748±0.0019 0.5791±0.0000 0.5960±0.0015 0.5110±0.0000
Table 2: AUC results with 10,000 link window
Model CollegeMsg Movielens Bitcoin AskUbuntu Epinions
GLSM 0.20±0.10 1.96±0.72 9.37±0.03 0.03±0.01 2.53±0.43
GAE 0.25±0.00 2.60±0.04 9.97±3.46 0.31±0.01 2.56±0.20
AA 0.90±0.00 3.27±0.02 11.54±0.16 0.97±0.01 4.19±0.01
MF 0.58±0.00 3.02±0.03 11.09±0.24 0.56±0.01 3.71±0.01
GG 0.61±0.00 3.34±0.03 11.23±0.22 0.59±0.01 3.71±0.01
TMF 0.66±0.00 3.23±0.04 11.22±0.23 0.67±0.02 3.74±0.01
JC 0.98±0.00 3.57±0.05 11.76±0.19 0.99±0.00 4.20±0.01
Table 3: RMSE results with 10,000 link window

We observe from Tables 2 and 3 that GLSM achieves the best performance of all the methods, and its results remain stable even on sparse datasets with an edge density below 0.002. This indicates that GLSM captures the temporal link formation patterns correctly, and the obtained patterns are indeed helpful in generating the potential future links.
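The two evaluation metrics can be computed without any library support; the sketch below is a minimal pure-Python version (the pairwise AUC formulation is a standard equivalent of the ROC curve area, and the function names are ours, not the paper's):

```python
import math

def roc_auc(scores, labels):
    """ROC-AUC as the probability that a random positive link is
    scored above a random negative one (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rmse(predicted, actual):
    """Root mean squared error between predicted and true link weights;
    a binary predictor such as GLSM uses weight 1 for every output link."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                     / len(predicted))

print(roc_auc([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0]))  # 0.75
```

Three of the four positive-negative pairs are correctly ordered in the toy example, hence the 0.75.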

Performance comparison under different parameters.

Figure 3 compares the ROC-AUC results of all the methods on the “Epinions” data with different window sizes and training ratios, respectively. The links in our datasets are ordered chronologically to simulate the temporal link prediction task. Since most of the baselines cannot directly capture this temporal information, they perform barely better than random (AUC around 0.5) in this experiment.

(a) Diff. window sizes
(b) Diff. training ratios
Figure 3: Comparison in different parameters

We observe from Figure 3 that GLSM performs the best under all the parameter settings. Figure 3 (a) shows the models' sensitivity to the window size: GLSM reaches a relatively high AUC score even with a relatively small window size (2,000 links). Moreover, Figure 3 (b) demonstrates the models' ability to capture future link patterns from a small fraction of training data: GLSM predicts the future links well even with a small training ratio (under 0.5).

Hitting ratio analysis.

Since GLSM iteratively generates the future links, which differs from the prediction scoring of the other methods, we analyze the quality of GLSM’s generated links over different iteration rounds. We introduce a metric, the hitting ratio, to measure the fraction of GLSM’s output links that hit real future links. This experiment runs on all the datasets with a 10,000-link window size, and the training ratio is set to 0.7. The results are listed in Table 4, whose last row reports the edge density of the network constructed from the testing data. In this setting, the hitting ratio of a random method equals the corresponding edge density of the given dataset.

Iteration CollegeMsg Movielens Bitcoin AskUbuntu Epinions
500 0.0054 0.0407 0.0061 0.0054 0.0007
1000 0.0063 0.0456 0.0049 0.0050 0.0019
1500 0.0075 0.0499 0.0042 0.0043 0.0016
2000 0.0053 0.0460 0.0035 0.0042 0.0017
2500 0.0044 0.0446 0.0025 0.0038 0.0009
3000 0.0043 0.0474 0.0030 0.0049 0.0006
Density 0.0041 0.0046 0.0015 0.0026 0.0002
Table 4: Comparison on hitting ratios in all datasets

From Table 4, we observe that the hitting ratios of the links generated by GLSM significantly exceed the corresponding edge densities, and that the hitting ratios reach their maximum within 1,500 iterations on all the datasets. This shows that the links generated by GLSM cover the true positive future links well.
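One plausible reading of the hitting ratio, consistent with the random baseline equaling the edge density, is the fraction of generated links that appear among the real future links. A minimal sketch under that assumption (the function name is ours):

```python
def hitting_ratio(generated, future):
    """Fraction of generated links that are real future links.
    Both arguments are iterables of (source, target) pairs."""
    future_set = set(future)
    gen = list(generated)
    hits = sum(1 for link in gen if link in future_set)
    return hits / len(gen)

ratio = hitting_ratio([("a", "b"), ("b", "c"), ("c", "d")],
                      [("a", "b"), ("x", "y")])
# One of the three generated links hits a real future link.
```

A generator that emits uniformly random node pairs would, in expectation, hit at the edge density of the future graph, matching the last row of Table 4.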

Sensitivity to different parameters.

We analyze the influence of different cluster and chunk numbers on the AUC performance. This experiment is run on the “Movielens” dataset within a 1,000-link window that involves 586 nodes. We set the epoch number to 20 and the iteration round to 1,500. The result is shown in Figure 4. In this experiment, GLSM reaches its best performance when the chunk number is between 50 and 100, and its performance degrades when the cluster number is under 50 or above 250. This supports the discussion in Section 3.1 on the difference between the basic tokenization and the self-tokenization mechanism: when the cluster number approaches the node number, the self-tokenization mechanism reduces to the basic method in Definition 3. Moreover, a cluster number that is too small leads to an alphabet with low generality, which also degrades the prediction performance.

Figure 4: Sensitivity
Figure 5: Case study

4.4 Case study: Early Users Are More Popular

We analyze the clustering result obtained by the self-tokenization process of GLSM on the one-mode network from the “CollegeMsg” dataset. In this study, we order all users in “CollegeMsg” ascendingly by their registration time, so early users receive high (small) ranks and vice versa. Then, we compute the average user rank for each cluster after training GLSM. We use the first 1,000 links of the “CollegeMsg” network, which involve 237 users, and set the cluster number to 5. Based on these settings, we analyze the relationship between the size of a cluster and the average user rank within it. Figure 5 shows the results over five independent training runs. We observe that the average user ranks of the clusters differ significantly, and that the average user rank tends strongly to diminish as the cluster size grows. Since the cluster size reflects the popularity of the corresponding cluster (community), this result agrees with the conclusion in Panzarasa et al. that the early users (with small average user rank) in the social network are more popular. We further find that the variance of the user ranks within each cluster is smaller than the variance of the user ranks over all users. This indicates that the users within each cluster registered at times close to each other. Therefore, the clustering result of the self-tokenization process helps GLSM capture the link formation patterns between the different communities of users (or nodes) in the temporal network.
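The per-cluster average rank used in this case study is straightforward to compute; the sketch below uses hypothetical `user_rank` and `cluster_of` mappings standing in for the registration-time ordering and the self-tokenization clustering result:

```python
from collections import defaultdict

def average_rank_per_cluster(user_rank, cluster_of):
    """Average the registration-time rank of the users in each cluster."""
    totals, counts = defaultdict(float), defaultdict(int)
    for user, cluster in cluster_of.items():
        totals[cluster] += user_rank[user]
        counts[cluster] += 1
    return {c: totals[c] / counts[c] for c in totals}

ranks = {"u1": 1, "u2": 2, "u3": 3, "u4": 4}        # u1 registered first
clusters = {"u1": 0, "u2": 0, "u3": 1, "u4": 1}     # toy clustering
print(average_rank_per_cluster(ranks, clusters))    # {0: 1.5, 1: 3.5}
```

In the case study, these per-cluster averages are then plotted against the cluster sizes to reveal the early-users-are-more-popular trend.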

4.5 Case study: General Recommendation

We analyze the clustering result obtained by the self-tokenization process of GLSM on the network from the “Movielens” dataset. In this study, we use the first 500 links of the “Movielens” dataset, which involve 286 nodes (both users and movies), and set the cluster number to 4. Since “Movielens” is a bipartite network and our model cannot be applied to bipartite problems directly, we transform the bipartite network into a one-mode graph by labeling the nodes of the two sides with unique identities. To preserve the bipartite information, we retain the map between the obtained identities and the original nodes. Based on this map, we further divide the clustering results into user communities and movie communities, obtaining 8 communities after the training process of GLSM. We highlight 2 user communities and 2 movie communities from the results in Figure 6. We observe that users or movies with similar network topological structures are classified into the same communities. After tokenization with these clustering results, during generation, the first sampling step of Algorithm 2 samples a token corresponding to a pair of communities of different types (e.g., community 1 and community 2), and the second sampling step generates a specific “user-to-movie” link (recommendation) based on the communities selected in the first step. Thus the self-tokenization prunes the complete combination of all “user-to-movie” links: it utilizes the network topological structure to obtain a candidate link set and then generates specific links from that candidate set. Consequently, besides generating the abstract token sequence, the self-tokenization mechanism also improves GLSM’s prediction performance through the network topological information.
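The bipartite-to-one-mode transformation described above can be sketched as follows; `relabel_bipartite` and the side tags are illustrative names we introduce, not the paper's code:

```python
def relabel_bipartite(links):
    """Map users and movies to disjoint integer identities, keeping the
    reverse map so clusters can later be split back into user and movie
    communities. `links` is an iterable of (user, movie) pairs."""
    identity, node_of = {}, {}

    def ident(node, side):
        key = (side, node)
        if key not in identity:
            identity[key] = len(identity)
            node_of[identity[key]] = key
        return identity[key]

    edges = [(ident(u, "user"), ident(m, "movie")) for u, m in links]
    return edges, node_of

edges, node_of = relabel_bipartite([("u1", "m1"), ("u2", "m1")])
# u1 -> 0, m1 -> 1, u2 -> 2; node_of recovers the original side and name.
```

Because the side tag travels with each identity, a cluster of integer nodes can be partitioned back into its user part and its movie part, which is how the 4 clusters yield 8 communities.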

(a) Community 1
(b) Community 2
(c) Community 3
(d) Community 4
Figure 6: Recommendation among different communities. The red nodes are the highlighted communities for each result.

5 Related works

Link prediction is a ubiquitous problem in recommender systems Esslimani et al. (2011), social media Hisano (2018), medicine Lichtenwalter et al. (2010), and even finance Hisano (2018). Most existing methods are discriminative models that classify unknown links as existing or non-existing for the target networks Menon and Elkan (2011a). The mainstream methods for link prediction include classic methods based on statistical empirical evidence and machine learning methods. The classic methods Liben-Nowell and Kleinberg (2007) Adamic and Adar (2003) Jeong et al. (2003) measure similarities between pairs of nodes through their numbers of common neighbors Newman (2001). The power-law distribution of complex networks Barabási and Albert (1999) helps these similarity scoring methods distinguish positive and negative links well statistically. However, since they lack a structure for capturing the network formation dynamics, they cannot predict links accurately in a real-world network when the statistical rules are not significant enough. To make accurate predictions at the level of a single node, many machine learning methods apply matrix factorization Menon and Elkan (2011b) or graph representation learning Perozzi et al. (2014). The resulting latent representations of users improve the prediction performance at the user level. Some works further improve the prediction performance by applying a GAN framework to supervise the representation quality Wang et al. (2018). To increase the generality of graph representation learning, recent methods combine graph convolution and the graph autoencoder (GAE) Kipf and Welling (2016) to extend the embedding vectors to a higher-dimensional latent space, and recent work also applies graph neural networks to the link prediction problem Zhang and Chen (2018). However, since most mainstream methods ignore the temporal information in evolving networks, it is difficult for them to accurately distinguish future positive and negative links with the limited data in a relatively small window. To utilize the temporal information, some work Yu et al. (2017) uses a sliding-window-style time-dependent method to address the temporal link prediction issue. Several works further discuss related problems such as temporal graph mining Yang et al. (2016) Li et al. (2018) by treating the links as streams Viard et al. (2016).

Nevertheless, the existing works ignore the important information contained in the chronological order in which links emerge. We order the links in the sequence by their chronological emergence to simulate the real scenario and apply the sequence modeling framework Sutskever et al. (2014) to capture the temporal link formation patterns. The experiments show that this framework captures effective temporal link formation patterns and yields good performance in predicting the future links based on the observed links.
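For concreteness, the classic common-neighbor similarity scoring mentioned above can be sketched as follows; this is the standard Adamic-Adar index, with our own helper names:

```python
import math
from collections import defaultdict

def neighbor_map(edges):
    """Build an undirected adjacency map from an edge list."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs

def adamic_adar(nbrs, u, v):
    """Adamic-Adar score: sum of 1/log(degree) over common neighbors,
    so rare shared neighbors contribute more than hub neighbors."""
    return sum(1.0 / math.log(len(nbrs[w])) for w in nbrs[u] & nbrs[v])

nbrs = neighbor_map([("a", "c"), ("b", "c"), ("c", "d"),
                     ("a", "d"), ("b", "d")])
score = adamic_adar(nbrs, "a", "b")  # shares neighbors c and d
```

Such scores rank candidate links statically; they carry no notion of the order in which the observed links arrived, which is exactly the gap GLSM targets.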

6 Conclusion

In this work, we propose Generative Link Sequence Modeling (GLSM) to predict future links based on historical observations. GLSM combines an RNN process, which learns the temporal link formation patterns within the sequence modeling framework, with a two-step sampling process that generates the future links. To cast temporal link prediction into the sequence modeling framework, we propose the self-tokenization mechanism, which converts the binary link sequence into a unary token sequence at a proper granularity. The self-tokenization process incorporates a clustering step that allows it to generate the token sequence automatically; this clustering also helps the resulting token sequence capture the network topological information. The RNN process of GLSM learns the temporal link formation pattern from the resulting token sequence, and since it depends on the token sequence produced by self-tokenization, the RNN and the self-tokenization process can be trained simultaneously. With the learned temporal link formation pattern, GLSM generates future links via the two-step sampling process. Experimental results show that GLSM performs the best among all the mentioned methods on real-world temporal networks, which verifies that the temporal information contained in the chronological order of the links is useful in designing link prediction models.

7 Acknowledgments

This work is done while Yue Wang is visiting the University of Illinois at Chicago. This work is supported by the National Natural Science Foundation of China (Grant No.61503422, 61602535), The Open Project Program of the National Laboratory of Pattern Recognition (NLPR) and Program for Innovation Research in Central University of Finance and Economics. This work is also supported in part by NSF through grants IIS-1526499, IIS-1763325, and CNS-1626432, and NSFC 61672313.


  • [1] L. A. Adamic and E. Adar (2003) Friends and neighbors on the web. Social networks 25 (3), pp. 211–230. Cited by: 1st item, §1, 1st item, §5.
  • [2] A. Barabási and R. Albert (1999) Emergence of scaling in random networks. Science 286 (5439), pp. 509–512. Cited by: §5.
  • [3] P. de Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein (2005-02-01) A tutorial on the cross-entropy method. Annals of Operations Research 134 (1), pp. 19–67. External Links: ISSN 1572-9338 Cited by: §2.2.
  • [4] D. M. Dunlavy, T. G. Kolda, and E. Acar (2011-02) Temporal link prediction using matrix and tensor factorizations. ACM Trans. Knowl. Discov. Data 5 (2), pp. 10:1–10:27. External Links: ISSN 1556-4681 Cited by: §1.
  • [5] I. Esslimani, A. Brun, and A. Boyer (2011) Densifying a behavioral recommender system by social networks link prediction methods. Social Network Analysis and Mining 1 (3), pp. 159–172. Cited by: §1, §5.
  • [6] (2002) Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and its Applications 311 (3), pp. 590 – 614. Cited by: 1st item.
  • [7] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.), pp. 2672–2680. Cited by: 4th item.
  • [8] C. Hauert and M. A. Nowak (2005) Evolutionary dynamics on graphs. Nature 433, pp. 312–316. Cited by: §1.
  • [9] R. Hisano (2018) Semi-supervised graph embedding approach to dynamic link prediction. In Complex Networks IX, S. Cornelius, K. Coronges, B. Gonçalves, R. Sinatra, and A. Vespignani (Eds.), Cham, pp. 109–121. Cited by: §1, §5.
  • [10] H. Jeong, Z. Néda, and A. L. Barabási (2003) Measuring preferential attachment in evolving networks. EPL (Europhysics Letters) 61 (4), pp. 567. Cited by: 1st item, §1, §5.
  • [11] D. P. Kingma and J. Ba (2014) Adam: A method for stochastic optimization. CoRR abs/1412.6980. Cited by: §3.1.
  • [12] T. N. Kipf and M. Welling (2016) Variational graph auto-encoders. CoRR abs/1611.07308. Cited by: 5th item, §5.
  • [13] A. Kirman (1997-12-01) The economy as an evolving network. Journal of Evolutionary Economics 7 (4), pp. 339–353. Cited by: §1.
  • [14] G. Kossinets and D. J. Watts (2006) Empirical analysis of an evolving social network. Science 311 (5757), pp. 88–90. External Links: Document Cited by: §1.
  • [15] S. Kumar, F. Spezzano, V. Subrahmanian, and C. Faloutsos (2016) Edge weight prediction in weighted signed networks. In Data Mining (ICDM), 2016 IEEE 16th International Conference on, pp. 221–230. Cited by: §4.1.
  • [16] R. Li, J. Su, L. Qin, J. X. Yu, and Q. Dai (2018) Persistent community search in temporal networks. In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018, pp. 797–808. Cited by: §5.
  • [17] D. Liben-Nowell and J. Kleinberg (2007) The link-prediction problem for social networks. journal of the Association for Information Science and Technology 58 (7), pp. 1019–1031. Cited by: §1, 1st item, §5.
  • [18] R. Lichtenwalter, J. T. Lussier, and N. V. Chawla (2010) New perspectives and methods in link prediction. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, July 25-28, 2010, pp. 243–252. Cited by: §5.
  • [19] A. Menon and C. Elkan (2011) Link prediction via matrix factorization. Machine Learning and Knowledge Discovery in Databases, pp. 437–452. Cited by: §1, §5.
  • [20] A. K. Menon and C. Elkan (2011) Link prediction via matrix factorization. In Machine Learning and Knowledge Discovery in Databases, D. Gunopulos, T. Hofmann, D. Malerba, and M. Vazirgiannis (Eds.), Berlin, Heidelberg, pp. 437–452. External Links: ISBN 978-3-642-23783-6 Cited by: 2nd item, §5.
  • [21] T. Mikolov, M. Karafiát, L. Burget, J. Cernocký, and S. Khudanpur (2010) Recurrent neural network based language model. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010, pp. 1045–1048. Cited by: §1.
  • [22] M. E. J. Newman (2001) The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences 98 (2), pp. 404–409. External Links: Document Cited by: 1st item, §5.
  • [23] P. Panzarasa, T. Opsahl, and K. M. Carley Patterns and dynamics of users’ behavior and interaction: network analysis of an online community. Journal of the American Society for Information Science and Technology 60 (5), pp. 911–932. Cited by: §4.4.
  • [24] P. Panzarasa, T. Opsahl, and K. M. Carley (2009) Patterns and dynamics of users’ behavior and interaction: network analysis of an online community. JASIST 60 (5), pp. 911–932. Cited by: §4.1.
  • [25] B. Perozzi, R. Al-Rfou, and S. Skiena (2014) DeepWalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, New York, NY, USA, pp. 701–710. External Links: ISBN 978-1-4503-2956-9 Cited by: 4th item, §5.
  • [26] B. Rozemberczki, R. Davies, R. Sarkar, and C. A. Sutton (2018) GEMSEC: graph embedding with self clustering. CoRR abs/1802.03997. Cited by: §3.1.
  • [27] N. Selvakkumaran and G. Karypis (2006-03) Multiobjective hypergraph-partitioning algorithms for cut and maximum subdomain-degree minimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 25 (3), pp. 504–517. External Links: Document, ISSN 0278-0070 Cited by: §3.1.
  • [28] M. Sundermeyer, R. Schlüter, and H. Ney (2012) LSTM neural networks for language modeling. In INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 194–197. Cited by: §4.2.
  • [29] I. Sutskever, O. Vinyals, and Q. V. Le (2014) Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.), pp. 3104–3112. Cited by: §1, §2.2, §5.
  • [31] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei (2015) LINE: large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, WWW ’15, Republic and Canton of Geneva, Switzerland, pp. 1067–1077. External Links: ISBN 978-1-4503-3469-3 Cited by: §1, 4th item.
  • [32] T. Viard, M. Latapy, and C. Magnien (2016) Computing maximal cliques in link streams. Theor. Comput. Sci. 609, pp. 245–252. Cited by: §5.
  • [33] H. Wang, J. Wang, J. Wang, M. Zhao, W. Zhang, F. Zhang, X. Xie, and M. Guo (2018) GraphGAN: graph representation learning with generative adversarial nets. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2-7, 2018. Cited by: §1, 4th item, §5.
  • [34] F. Wörgötter and B. Porr (2005) Temporal sequence learning, prediction, and control: A review of different models and their relation to biological mechanisms. Neural Computation 17 (2), pp. 245–319. Cited by: §1.
  • [35] Y. Yang, D. Yan, H. Wu, J. Cheng, S. Zhou, and J. C. S. Lui (2016) Diversified temporal subgraph pattern mining. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pp. 1965–1974. Cited by: §5.
  • [36] W. Yu, C. C. Aggarwal, and W. Wang (2017) Temporally factorized network modeling for evolutionary network analysis. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, WSDM 2017, Cambridge, United Kingdom, February 6-10, 2017, pp. 455–464. Cited by: 2nd item, §1, 3rd item, §5.
  • [37] M. Zhang and Y. Chen (2018) Link prediction based on graph neural networks. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada., pp. 5171–5181. Cited by: §5.