1 Introduction
Many real-world applications can be modeled as link prediction problems. For example, a recommendation system can be treated as a network system that learns to connect user nodes with product nodes Esslimani et al. (2011); friend recommendation in social media is the prediction of future links based on the current social network structure Liben-Nowell and Kleinberg (2007); even financial risk can be analyzed through the link formation probabilities between financial organizations in an economic network Hisano (2018). The two mainstream categories of link prediction methods are based either on the statistical patterns of the link formation behaviors of the network Jeong et al. (2003); Adamic and Adar (2003); Liben-Nowell and Kleinberg (2007) or on graph representation learning Tang et al. (2015); Wang et al. (2018), which embeds nodes as vectors with respect to the network's topological information. Most of these methods are discriminative models that verify whether an unknown link given at test time is rational by training a classifier on existing links and negative samples Menon and Elkan (2011a). These methods show moderate performance by learning the decision boundary between positive samples (usually the observed links) and negative samples (usually random links between two arbitrary nodes). However, the temporal information of how links appear in chronological order, which embodies rich information useful in practical applications, is completely ignored. To further improve modeling ability, recent works study temporal link prediction Dunlavy et al. (2011), which improves prediction performance based on temporal information captured by time-dependent methods Yu et al. (2017). However, since they do not consider the contextual relationship Wörgötter and Porr (2005) contained in the chronological link sequence, they hardly capture the accurate network formation dynamics Hauert and Nowak (2005) (or link formation patterns) of the future links. Ignoring the chronological link sequence during the formation of evolving networks (e.g. evolving social networks Kossinets and Watts (2006) and evolving economic networks Kirman (1997)) brings challenges such as the following for temporal link prediction.
Network model bias. Since it is difficult to model link formation patterns directly, graph representation methods model the general latent link patterns of the observed network without considering the chronological order in which links are observed. Therefore, they hardly capture the link formation patterns directly, and the network they reconstruct from historical data may deviate from the current network, whose structure has already evolved as new links are added Yu et al. (2017). This hampers the accuracy of the prediction results of graph representation learning methods.
One way to address this challenge is to sort the links into a link sequence in their emerging time order and learn the link formation patterns from the obtained sequence. Inspired by the framework of neural language modeling Sutskever et al. (2014), which studies the contextual relationship between observed words and the succeeding word in NLP, we adopt sequence modeling techniques for temporal link prediction. By transferring the idea of neural language modeling in NLP to temporal link prediction in graph mining, we formalize the link formation pattern as a conditional probability distribution and propose a neural network model, Generative Link Sequence Modeling (GLSM), to learn the temporal link formation patterns from chronologically ordered link sequences with an RNN-based sequence modeling framework Mikolov et al. (2010). Unlike previous discriminative counterparts, GLSM takes a generative perspective which models not only the existence of different links but also the order in which they are observed. It first learns the conditional probability distribution between the preceding and succeeding sequences enumerated from the observed link sequence, and then predicts the future links with a generating process that samples potential future links from the learned distribution.

However, simply adopting raw links for sequence modeling leads to several issues. Since a link is a tuple of nodes that encodes the binary relationship between a source and a destination node, this relationship is discarded when we directly encode links as unary tokens like the text tokens in NLP models. Besides, overly specific tokens, e.g. one per raw link between every two nodes, break the dependencies among links with similar behaviors in the resulting token sequence. This may lead to serious overfitting, so that the RNN can hardly capture any useful contextual relationship. To obtain a token sequence suitable for the sequence modeling framework, we propose a self-tokenization mechanism that automatically controls the granularity of the obtained tokens and the degree of contextual correlation in the resulting sequence. The self-tokenization mechanism consists of a clustering process that produces an abstract aggregated token alphabet and a mapping process that generates the tokens from the resulting alphabet. With a differentiable clustering distance function, the self-tokenization loss is incorporated into the sequence modeling loss, so that the model not only learns to self-tokenize the raw link sequences in a way that preserves inter-link dependencies but also encodes the temporal information among self-tokenized sequences via sequence modeling.
In the experiment sections, we verify that GLSM captures the useful contextual relationship between the preceding and succeeding link sequences and that the generated future links effectively cover the ground-truth positive links on five real-world temporal networks. We also compare GLSM with state-of-the-art methods on temporal link prediction tasks under different parameters, where GLSM outperforms the existing methods. Moreover, the experimental results in the case studies indicate that the self-tokenization mechanism helps GLSM capture the link formation patterns between the different communities of the temporal networks.
In summary, this work includes the following contributions:

We introduce a sequence modeling framework for temporal link prediction that discovers the conditional distribution (defined as the temporal link formation pattern) between the preceding and the succeeding link sequences.

We propose a self-tokenization mechanism that encodes links as tokens with respect to the clusters obtained by a clustering process on the original network while keeping the chronological order. This mechanism allows our method to capture the correct network formation dynamics from the observed network and thus alleviates the network model bias problem.

We propose a two-step sampling link generator to generate potential future links based on the temporal link formation patterns learned from the observed network.

We compare GLSM with state-of-the-art methods on five real-world temporal network datasets, and the results show that sequence modeling, together with the proposed self-tokenization mechanism, achieves the best performance on temporal link prediction tasks.
2 Preliminary
In this section, we formalize the notation for temporal networks and define the temporal link prediction problem under the sequence modeling framework.
2.1 Temporal Network
We model a temporal network as a graph $G = (V, S)$ with a fixed node set $V$ and a link sequence $S = (l_1, l_2, \dots, l_n)$, where $n$ is the observation time for $G$. The links in $S$ are sorted in the emerging time order of the physical world, and each link $l_i$ ($1 \le i \le n$) is a tuple $(u_i, v_i, t_i)$, where $u_i$ and $v_i$ are nodes from the set $V$ and $t_i$ is the timestamp at which $l_i$ emerges between $u_i$ and $v_i$ ($t_1 \le t_2 \le \dots \le t_n$). Figure 1 illustrates an example of a temporal network in this link sequence form.
In this setting, without loss of generality, newly emerging nodes can be treated as an "unknown" node that is also included in the node set $V$.
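As a concrete illustration, the link-sequence representation above can be sketched in a few lines of Python. The class and function names here are our own and purely illustrative, not taken from the paper's implementation:

```python
from dataclasses import dataclass

# Minimal link record: source node, destination node, emerging timestamp.
@dataclass(frozen=True)
class Link:
    src: str
    dst: str
    ts: float

def as_link_sequence(records):
    """Sort raw (src, dst, ts) records into a chronologically ordered link sequence."""
    return sorted((Link(s, d, t) for s, d, t in records), key=lambda l: l.ts)

# Records arrive in arbitrary order; the sequence restores chronological order.
seq = as_link_sequence([("u2", "v1", 5.0), ("u1", "v1", 1.0), ("u1", "v2", 3.0)])
```

In this form the "unknown" node mentioned above would simply be one more identifier in the node set.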
2.2 Temporal Link Prediction
In practical application scenarios, the intuitive requirement of temporal link prediction is to predict potential future links given the historical links (e.g. given the current purchasing records in an e-commerce system, how do we predict users' future purchases?). To meet this requirement, we divide temporal link prediction into two steps: first, learn the temporal link formation pattern from the historical link sequence; then, infer the future links based on the learned patterns.
Suppose the observed links of the temporal network form the link sequence $S$ sorted in chronological order; then the temporal link formation pattern describes the probability of observing a succeeding link sequence given a preceding link sequence. We formalize the temporal link formation pattern in Definition 1.
Definition 1
Temporal Link Formation Pattern. Given a temporal network $G$ at time $t$, the temporal link formation pattern is defined as the conditional probability distribution $P(l_{t+1}, \dots, l_{t+k} \mid l_1, \dots, l_t)$, i.e. the emerging probabilities of the links in the succeeding sequence given the initial link sequence $(l_1, \dots, l_t)$. This conditional probability is computed in the following way.

$$P(l_{t+1}, \dots, l_{t+k} \mid l_1, \dots, l_t) = \prod_{i=1}^{k} P(l_{t+i} \mid l_1, \dots, l_{t+i-1}) \qquad (1)$$
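The chain-rule factorization of the temporal link formation pattern can be sketched directly in code. The toy conditional model below is hypothetical; only the factorization itself reflects the definition:

```python
import math

def sequence_log_prob(cond_prob, history, future):
    """Log-probability of a succeeding link sequence under the chain rule:
    log P(l_{t+1..t+k} | l_{1..t}) = sum_i log P(l_{t+i} | l_{1..t+i-1})."""
    ctx = list(history)
    total = 0.0
    for link in future:
        total += math.log(cond_prob(tuple(ctx), link))  # condition on everything so far
        ctx.append(link)                                # grow the context by one link
    return total

# Hypothetical conditional model: the next link is "a" with prob. 0.7, else "b".
toy = lambda ctx, link: 0.7 if link == "a" else 0.3
lp = sequence_log_prob(toy, ["a"], ["a", "b"])  # log(0.7) + log(0.3)
```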
With the definition of the temporal link formation pattern, the problem of learning temporal link formation patterns from observations is formalized as follows.
Definition 2
Temporal Link Formation Pattern Learning. Given a temporal network $G$ at time $t$, temporal link formation pattern learning is defined as the problem of estimating the conditional probability distribution $\hat{P}(S_{suc} \mid S_{pre})$ through the optimization of the following equation.

$$\min \; -\sum_{(S_{pre},\, S_{suc})} P(S_{suc} \mid S_{pre}) \log \hat{P}(S_{suc} \mid S_{pre}) \qquad (2)$$

where $\hat{P}(S_{suc} \mid S_{pre})$ is the estimated probability of the link sequence $S_{suc}$ given the preceding sequence $S_{pre}$, and $P(S_{suc} \mid S_{pre})$ is the probability of $S_{suc}$ measured from the observation.
The estimated probability $\hat{P}(S_{suc} \mid S_{pre})$ is abbreviated as $\hat{P}$ in the remainder of the paper. Note that Equation (2) is a cross-entropy function de Boer et al. (2005) measuring the difference between the estimated probabilities and the observed probabilities. In this work, Equation (2) in effect allows a model to learn the evolving pattern between the sequence of estimated future links and the sequence of historical links. Therefore, with the objective function in Equation (2), the learned temporal link formation pattern captures both the rules of the network formation dynamics and the evolving structure of networks, and thus alleviates the network model bias problem. This is a sequence modeling problem Sutskever et al. (2014), and RNNs are well suited to such problems. Therefore, we propose an RNN-based neural network model to solve it. After training the RNN, we enumerate the future links based on the learned temporal link formation pattern.
3 Our Framework
In this section, we propose Generative Link Sequence Modeling (GLSM) to learn the temporal link formation pattern and generate future links. As shown in Figure 2, GLSM consists of a training process that learns the temporal link formation patterns and a generating process that generates the future links. We give the details of GLSM in the remainder of this section.
3.1 Temporal Link Sequence Modeling via Self-tokenization
In this section, we introduce the training process in Figure 2 (a), which learns the temporal link formation patterns with the sequence modeling framework.
The key to implementing temporal link prediction with the sequence modeling framework is to convert the link sequence into a unary token sequence that fits the input of an RNN. We formalize this process in the following definitions.
Definition 3
Basic link sequence tokenization. Given a temporal network $G = (V, S)$, the target of tokenization is to establish a tokenization map $f: S \to \Sigma$ resulting in a new sequence $T = (w_1, \dots, w_n)$ with $w_i = f(l_i)$, where $\Sigma$ is an alphabet containing all the tokens ($w_i \in \Sigma$).
A naive tokenization method is to map every "node-to-node" link to a unique token. It results in an alphabet consisting of all possible links between any two nodes in $V$. This leads to a serious problem: since all the tokens in the tokenized sequence are different, an RNN fed with such a sequence easily overfits and outputs patterns without any connection between the preceding and succeeding sequences. The experimental results in Figure 4 verify this analysis. To solve this problem, we propose the self-tokenization mechanism, which generates the token sequence with a clustering process.
Definition 4
Self-tokenization Mechanism. Given a temporal network $G = (V, S)$, suppose $C = \{c_1, \dots, c_k\}$ is the set of subgraphs (or communities) obtained by clustering $V$ ($c_i \cap c_j = \emptyset$ for $i \ne j$). The target of self-tokenization is to automatically establish a tokenization map $g: S \to \Sigma_C$ resulting in a unary token sequence $T = (w_1, \dots, w_n)$, where $\Sigma_C$ is an alphabet containing all the tokens ($w_i \in \Sigma_C$).
This tokenization method first clusters the network into different subgraphs (communities) and then maps the source and destination nodes of each link to their corresponding subgraphs (communities). This transforms the original "node-to-node" links into "community-to-community" links. By constructing and referring to the alphabet of all resulting "community-to-community" links, we finally obtain a unary token sequence. For example, given a link sequence $S = (l_1, l_2, l_3)$ with $l_1 = (v_1, v_4)$, $l_2 = (v_2, v_5)$ and $l_3 = (v_3, v_6)$, suppose the clustering generates three communities: the nodes $v_1$, $v_2$ and $v_3$ are in the 1st community $c_1$, the node $v_4$ is in the 2nd community $c_2$, and the nodes $v_5$ and $v_6$ are in the 3rd community $c_3$. The original links are then converted to $(c_1, c_2)$, $(c_1, c_3)$ and $(c_1, c_3)$, giving the alphabet $\Sigma_C = \{(c_1, c_2), (c_1, c_3)\}$. Finally, the original link sequence is tokenized to $((c_1, c_2), (c_1, c_3), (c_1, c_3))$ according to $\Sigma_C$.
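The mapping from node-level links to community-level tokens can be sketched in a few lines of Python. The node and community labels below are a toy configuration of our own, not data from the paper:

```python
def self_tokenize(link_seq, community_of):
    """Map each node-to-node link to a community-to-community token,
    preserving chronological order, and build the token alphabet."""
    tokens = [(community_of[u], community_of[v]) for u, v, *_ in link_seq]
    alphabet = sorted(set(tokens))  # distinct community-pair tokens
    return tokens, alphabet

# Toy assignment: three communities over six nodes (hypothetical labels).
comm = {"v1": 1, "v2": 1, "v3": 1, "v4": 2, "v5": 3, "v6": 3}
links = [("v1", "v4"), ("v2", "v5"), ("v3", "v6")]
tokens, alphabet = self_tokenize(links, comm)
# tokens   -> [(1, 2), (1, 3), (1, 3)]
# alphabet -> [(1, 2), (1, 3)]
```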
This mechanism applies the hypergraph partitioning method Selvakkumaran and Karypis (2006), which is effective in controlling the granularity of the graph for graph mining methods. Therefore, the resulting token sequence is coarser in granularity and more general than the original link sequence. Furthermore, since the clustering yields non-overlapping communities of similar nodes, the resulting sequence also contains the network's topological information. After the clustering process, the "unknown" nodes can also be labeled as a single community before the sampling process.
With this setting, the number of communities $k$ serves as a parameter that controls the size of the output alphabet and thus determines the generality of the resulting token sequence. In the special case $k = |V|$, where every node forms its own community, the self-tokenization mechanism reduces to the basic link sequence tokenization in Definition 3.
To generate communities (or subgraphs) suited to this task, we integrate a self-clustering process Rozemberczki et al. (2018) into Equation (2), which results in the following loss function.

$$\mathcal{L} = -\sum_{(S_{pre},\, S_{suc})} P(S_{suc} \mid S_{pre}) \log \hat{P}(S_{suc} \mid S_{pre}) + \lambda \sum_{j=1}^{k} \sum_{v \in c_j} d(v, \mu_j) \qquad (3)$$

where $C = \{c_1, \dots, c_k\}$ is the set of communities, $\mu_j$ is the center node of the $j$-th community, $d(\cdot, \cdot)$ can be any differentiable clustering distance function between two nodes, and $\lambda$ is the weight of the clustering loss. Note that the links in Equation (3) have already been transformed into "community-to-community" tokens, which differ from the "node-to-node" links in Equation (2). As illustrated in Figure 2 (a), where the tokenization of the sequences is based on the communities obtained from the clustering process, the temporal link formation patterns and the clustering are learned simultaneously.
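A minimal, framework-free sketch of this joint objective follows: token-level cross-entropy plus a weighted sum of distances from each node embedding to its community center. All names and the squared-Euclidean choice of distance are our own illustrative assumptions:

```python
import math

def joint_loss(pred_probs, target_ids, embeddings, centers, assign, lam=0.5):
    """Joint objective in the style of Equation (3): cross-entropy over
    predicted token distributions plus lam * clustering distance.
    pred_probs: per-step probability vectors; target_ids: observed token ids;
    embeddings/centers/assign: node vectors, community centers, memberships."""
    ce = -sum(math.log(p[t]) for p, t in zip(pred_probs, target_ids))
    cluster = sum(
        sum((x - c) ** 2 for x, c in zip(embeddings[v], centers[assign[v]]))
        for v in embeddings
    )
    return ce + lam * cluster

# One prediction step, one node: ce = -log(0.5), cluster distance = 1.0.
val = joint_loss([[0.5, 0.5]], [0], {"a": [1.0, 0.0]}, {0: [0.0, 0.0]}, {"a": 0})
```

In the actual model both terms are differentiable, so a single backward pass updates the RNN and the node embeddings together.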
The pseudocode of the forward algorithm for temporal link formation modeling, corresponding to the training part in Figure 2 (a), is listed in Algorithm 1.
Line 4 clusters the nodes into different clusters and Line 5 tokenizes the original link sequence into a unary token sequence with the self-tokenization mechanism. To learn the temporal link formation patterns from the tokenized sequences, for each epoch, Lines 6 and 7 randomly generate a preceding sequence and a succeeding sequence from the input sequence, where the stride parameter controls the overlapping portion between them. After that, Line 8 computes the conditional distribution with the RNN by Equation (4).
$$P(w_{i+1} \mid w_1, \dots, w_i) = \mathrm{softmax}(W h_i + b) \qquad (4)$$

where $h_i$ is the hidden state at step $i$ and $W$, $b$ are the output layer parameters. Line 9 computes the cross-entropy loss between the predicted and observed distributions by Equation (2). The final loss is calculated by Equation (3). Since Equation (3) combines the cross-entropy and the clustering loss (distance), the model trains the RNN and performs the clustering simultaneously. The training applies backpropagation with the Adam optimizer Kingma and Ba (2014). Moreover, since the complexity of Algorithm 1 is proportional to the epoch threshold $E$, its time complexity is $O(E)$.

When the training process is complete, we obtain a trained RNN containing the temporal link formation patterns and the tokenization map, which records the community labels of all the nodes in $V$. Since the obtained RNN captures the network formation dynamics, we can generate the future possible network structure with it as the prediction for the evolving network, which alleviates the network model bias problem.
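The softmax readout that turns a hidden state into a distribution over the token alphabet can be sketched without any deep learning framework. `W` and `b` stand in for the RNN's output layer parameters; in the real model they are learned:

```python
import math

def next_token_distribution(h_t, W, b):
    """Softmax over the alphabet from hidden state h_t, in the style of
    Equation (4). W has one row per token in the alphabet; b is the bias."""
    logits = [sum(w_i * h for w_i, h in zip(row, h_t)) + b_j
              for row, b_j in zip(W, b)]
    m = max(logits)                         # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Two-token alphabet with zero weights: the distribution is uniform.
probs = next_token_distribution([1.0], [[0.0], [0.0]], [0.0, 0.0])
```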
3.2 Generating Links with the Two-step Sampling Link Generator
With the trained RNN, we propose the two-step sampling link generator to generate new links; this process is shown in Algorithm 2 and corresponds to the generating process in Figure 2 (b). Its basic idea is to sample links according to the conditional probabilities learned by the trained RNN. Since a token obtained by the self-tokenization mechanism refers to an abstract "community-to-community" link, Algorithm 2 consists of two sampling steps to obtain "node-to-node" links. The first sampling step starts with a randomly chosen "seed" link sequence in Line 3. With the seed link sequence, Line 6 iteratively infers the probability distribution of the next token over the alphabet. Line 7 generates the next token from this distribution with multinomial sampling without replacement.
The second sampling step maps the generated token back to a concrete link. Since the token denotes a specific "community-to-community" link, to obtain a "node-to-node" link that may appear in the original network, Line 8 samples a "node-to-node" link given the token. The sampling draws source and destination nodes from the two communities associated with the token respectively. Note that this sampling is a general process that could use any sampling method, such as weighted random sampling, greedy sampling, beam sampling, etc. The candidate link set is generated by enumerating all possible links between the nodes in the source community and the destination community of the token. The linkage probabilities are computed by Equation (5).
$$P(u, v) = \sigma(\mathbf{e}_u^{\top} \mathbf{e}_v), \quad u \in V_s, \; v \in V_d \qquad (5)$$

where $V_s$ and $V_d$ are the node sets of the source and destination communities related to the token, and $\mathbf{e}$ is the embedding layer of the nodes trained by the clustering process in Algorithm 1. The embedding layer contains the latent features of all the nodes, and the multiplication of two embedding vectors in Equation (5) computes the linkage probabilities of the related nodes between the two corresponding communities. This process prunes the search space of the link probabilities for the next links and incorporates the temporal information into the basic graph representation framework. We verify in the experiments that this method indeed improves the quality of the generated links. In each iteration of Algorithm 2, a newly generated link that is included in neither the generated set nor the existing link set is appended to the end of the link sequence and used as the input for the next sampling. The complexity of Algorithm 2 is $O(R)$, which is positively proportional to the round number $R$.
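The two sampling steps can be sketched as follows. This is a simplified, self-contained version: the community membership table, the embedding dictionary, and the use of plain weighted sampling (rather than the paper's full candidate enumeration with probability scores) are our own illustrative choices:

```python
import random

def two_step_sample(token_probs, alphabet, members, emb, rng):
    """Two-step sampling sketch: (1) draw a community-to-community token
    from the RNN's output distribution; (2) draw a node-to-node link from
    the token's source/destination communities, weighted by embedding
    dot products in the style of Equation (5)."""
    token = rng.choices(alphabet, weights=token_probs, k=1)[0]
    src_c, dst_c = token
    candidates = [(u, v) for u in members[src_c] for v in members[dst_c]]
    weights = [max(1e-9, sum(a * b for a, b in zip(emb[u], emb[v])))
               for u, v in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

rng = random.Random(0)
link = two_step_sample([1.0], [(0, 1)],                  # single possible token
                       {0: ["a"], 1: ["b"]},             # community members
                       {"a": [1.0], "b": [1.0]}, rng)    # toy embeddings
```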
4 Experiment and Discussion
4.1 Dataset
We compare our methods with the existing methods on five real-world datasets. Their details are shown in Table 1.
CollegeMsg  Movielens  Bitcoin  AskUbuntu  Epinions  

Total links  59,835  100,000  35,592  100,000  100,000 
Source card.  1,350  944  4,814  10,016  6,718 
Destination card.  1,862  1,683  5,858  10,001  22,406 
Node number  1,899  1,682  5,881  12,513  27,370 
Edge density  0.024  0.126  0.0025  0.0020  0.0004 
Rating range  0-1  0-5  0-20  0-1  0-5 
Days covered  193  214  1,903  1,038  4,326 
Start  2004.04  1997.09  2010.11  2001.09  1999.07 
End  2004.10  1998.04  2016.01  2003.06  2011.05 
Our datasets cover different applications in recommendation systems and social networks. "Movielens" (https://grouplens.org/datasets/movielens/), "Netflix" (https://www.kaggle.com/netflix-inc/netflix-prize-data) and "Epinions" Tang et al. (2012) are classic datasets for testing the performance of link prediction or recommendation models. "CollegeMsg" is a binary online social network from Panzarasa et al. (2009). "Bitcoin" is from Kumar et al. (2016) and records the trust scores between users of the Bitcoin OTC online marketplace (https://bitcoin-otc.com/) in the corresponding transactions. All datasets are in the format "source, destination, rating, timestamp". All ratings are adjusted to 0 or 1 since our method only handles the binary prediction problem. The node number is the number of distinct nodes after merging the same nodes appearing in the source and destination positions of the records. To test the temporal link prediction performance of all methods, we order the links in each dataset chronologically to simulate link formation in the real scenario; then we select a ratio of links, the training ratio, as the training set (historical links) and leave the remaining links as the testing set (future links). With this setting, we test the performance of state-of-the-art methods in predicting the real ground-truth future links.
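The chronological train/test split described above can be sketched in a few lines. The record layout `(src, dst, rating, ts)` follows the dataset format quoted in the text:

```python
def chronological_split(links, train_ratio):
    """Order links by timestamp and split into historical (train) and
    future (test) sets, mirroring the evaluation protocol described above."""
    ordered = sorted(links, key=lambda l: l[-1])  # sort by trailing timestamp
    cut = int(len(ordered) * train_ratio)
    return ordered[:cut], ordered[cut:]

records = [("a", "b", 1, 30), ("c", "d", 1, 10),
           ("e", "f", 1, 20), ("g", "h", 1, 40)]
train, test = chronological_split(records, 0.5)
# every training link precedes every test link in time
```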
4.2 Experiment settings and benchmark.
Comparison methods.
Our methods are compared with the state-of-the-art link prediction methods used in most related studies.

Jaccard Coefficient (JC) Liben-Nowell and Kleinberg (2007) and Adamic-Adar (AA) Adamic and Adar (2003). JC and AA are classic link prediction methods based on statistical similarity scores. They assume that, in a social network, two unconnected nodes with high statistical scores have a large probability of being linked in the future. Their similarities are computed in different forms based on the number of common neighbors Newman (2001) between two nodes.

Matrix Factorization (MF) Menon and Elkan (2011b). MF factorizes the adjacency matrix of the network into two matrices with a latent feature dimension. Since its factorized matrices can easily be interpreted in terms of latent user features, it is applied as the recommendation algorithm in many real-world network systems.

Temporal Matrix Factorization (TMF) Yu et al. (2017). TMF uses a time-dependent matrix factorization method to improve the temporal link prediction performance of the original MF method.

GraphGAN (GG) Wang et al. (2018). GG is a neural network model based on the frameworks of graph representation learning and GANs Goodfellow et al. (2014). With the dynamic game between the generator and discriminator, this model reaches higher performance than previous methods such as LINE Tang et al. (2015), DeepWalk Perozzi et al. (2014), etc. Therefore, we pick GG as the representative of the graph representation learning methods.

Graph Auto-Encoder (GAE) Kipf and Welling (2016). GAE applies graph convolution to extend the basic graph representation vectors to higher-order dimensions. This allows it to capture the generality of the changing patterns of the links in a network, giving it better performance than the basic graph representation learning methods.
Experiment settings.
To make the comparison fair, we implement all these methods in our prototype system (https://github.com/hkharryking/tlp) with the GPU version of PyTorch, so these methods are compared on the same data and prediction task platform. In our system, each method generates a link set with the corresponding emerging probabilities for the links. For our method, GLSM, after it generates the future positive links, we also sample negative future links from the set of currently non-existing links minus the generated positive links. The set of negative links is set to the same size as the set of positive links.
During all the experiments, GLSM uses a Long Short-Term Memory (LSTM) Sundermeyer et al. (2012) version of the RNN with 128 hidden states and 2 layers. We set the clustering weight $\lambda$ to 0.5 and use k-means clustering for the self-tokenization process. For the sampling process of GLSM, we use weighted random sampling since it is the easiest to implement. All results of GLSM are generated after a training process of 20 epochs.

Our experiment is twofold. First, we compare all the mentioned methods on five real-world datasets with temporal link prediction tasks under different parameters. Then, we perform two case studies to analyze the practical meaning of the clusters obtained after training.
4.3 Effectiveness Experiment
In this experiment, we split each dataset into different windows and test the methods on all the windows. We first compare the prediction performance of the mentioned methods on the different datasets; then we analyze the hit ratios (which measure the output quality) of the generated links for all methods on all the datasets. Finally, we analyze the sensitivity of GLSM to different parameters.
To further compare the prediction performance, we use different training ratios to test the capability of the methods to find the correct future links. Since the training ratio decides the fraction of the data used as the training set (historical links), with the remaining data as the testing set (future links), the smaller the training ratio, the larger the test set of future links. This shows how far into the future these methods can predict with good performance.
Temporal link prediction performance
We compare the temporal link prediction performance of all the mentioned methods with ROC-AUC and RMSE (since the result of GLSM is binary, we set the weight of every link to 1 in the RMSE test) on all the datasets with a 10,000-link window size. The ROC-AUC results show the capability of the methods to distinguish positive and negative links, and the RMSE results show the accuracy of the output linkage probabilities. Tables 2 and 3 list the ROC-AUC and RMSE results. We use the same training ratio for all the datasets.
Model  CollegeMsg  Movielens  Bitcoin  AskUbuntu  Epinions 

GLSM  0.7132±0.0031  0.7521±0.0007  0.7509±0.0016  0.7508±0.0001  0.7504±0.0012 
GAE  0.6634±0.0031  0.6931±0.0016  0.7372±0.0016  0.7241±0.0014  0.6591±0.0007 
AA  0.5501±0.0003  0.7447±0.0006  0.6033±0.0002  0.6129±0.0023  0.5111±0.0000 
MF  0.5395±0.0003  0.6055±0.0021  0.5854±0.0011  0.5717±0.0009  0.5123±0.0001 
GG  0.5659±0.0006  0.5354±0.0006  0.5611±0.0013  0.5550±0.0018  0.5245±0.0003 
TMF  0.5175±0.0001  0.5086±0.0001  0.5451±0.0002  0.5249±0.0002  0.5163±0.0002 
JC  0.5390±0.0002  0.6748±0.0019  0.5791±0.0000  0.5960±0.0015  0.5110±0.0000 
Model  CollegeMsg  Movielens  Bitcoin  AskUbuntu  Epinions 

GLSM  0.20±0.10  1.96±0.72  9.37±0.03  0.03±0.01  2.53±0.43 
GAE  0.25±0.00  2.60±0.04  9.97±3.46  0.31±0.01  2.56±0.20 
AA  0.90±0.00  3.27±0.02  11.54±0.16  0.97±0.01  4.19±0.01 
MF  0.58±0.00  3.02±0.03  11.09±0.24  0.56±0.01  3.71±0.01 
GG  0.61±0.00  3.34±0.03  11.23±0.22  0.59±0.01  3.71±0.01 
TMF  0.66±0.00  3.23±0.04  11.22±0.23  0.67±0.02  3.74±0.01 
JC  0.98±0.00  3.57±0.05  11.76±0.19  0.99±0.00  4.20±0.01 
We observe from Tables 2 and 3 that GLSM achieves the best performance among all the methods and that its results are stable even on the sparse datasets with an edge density lower than 0.002. This indicates that GLSM captures the temporal link formation pattern correctly, and the obtained patterns are indeed helpful in generating the potential future links.
Performance comparison in different parameters.
Figure 3 compares the ROC-AUC results of all the methods on the "Epinions" data with different window sizes and training ratios respectively. The links in our datasets are ordered chronologically to simulate the temporal link prediction task. Since most of the baselines cannot directly capture this information, they perform poorly (with AUC around 0.5) in this experiment.
We observe from Figure 3 that GLSM performs best under all the different parameters. Figure 3 (a) indicates the sensitivity of the models to different window sizes: GLSM reaches a relatively high AUC score even with a relatively small window size (2,000 links). Moreover, Figure 3 (b) demonstrates the ability of the models to capture the future link patterns from a small ratio of training data. In Figure 3 (b), GLSM predicts the future links well even with a small training data ratio (under 0.5).
Hitting ratio analysis.
Since GLSM iteratively generates the future links, which differs from the prediction scoring of the other methods, we analyze the quality of GLSM's generated links over different iteration rounds. We use a metric, the hitting ratio, to measure the fraction of real future links captured by the output of GLSM. This experiment runs on all the datasets with a 10,000-link window size. The training ratio is set to 0.7. The results are listed in Table 4, where $\rho$ is the edge density of the network constructed from the testing data. In this setting, the hitting ratio of a random method equals the corresponding $\rho$ of a given dataset.
Iteration  CollegeMsg  Movielens  Bitcoin  AskUbuntu  Epinions 

500  0.0054  0.0407  0.0061  0.0054  0.0007 
1000  0.0063  0.0456  0.0049  0.0050  0.0019 
1500  0.0075  0.0499  0.0042  0.0043  0.0016 
2000  0.0053  0.0460  0.0035  0.0042  0.0017 
2500  0.0044  0.0446  0.0025  0.0038  0.0009 
3000  0.0043  0.0474  0.0030  0.0049  0.0006 
ρ  0.0041  0.0046  0.0015  0.0026  0.0002 
From Table 4, we observe that the hitting ratios of the links generated by GLSM significantly exceed the corresponding $\rho$, and the hitting ratios reach their maximum within 1,500 iterations on all the datasets. This shows that the links generated by GLSM cover the true positive future links well.
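Under one plausible reading of the metric described above (the fraction of ground-truth future links that appear in the generated set), the hitting ratio can be computed as:

```python
def hitting_ratio(generated, future_links):
    """Fraction of ground-truth future links covered by the generated links
    (a sketch of the hitting-ratio metric; links are hashable (src, dst) pairs)."""
    future = set(future_links)
    return len(future & set(generated)) / len(future)

ratio = hitting_ratio({(1, 2), (3, 4)}, {(1, 2), (5, 6)})  # one of two hit
```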
Sensitivity in different parameters.
We analyze the influence of different cluster numbers on the AUC performance. This experiment is performed on the "Movielens" dataset with a 1,000-link window involving 586 nodes. We set the epoch number to 20 and the iteration round to 1,500. The result is shown in Figure 4. In this experiment, GLSM reaches its best performance when the cluster number is between 50 and 100, and its performance degrades when the cluster number is under 50 or above 250. This supports the discussion in Section 3.1 about the difference between the basic tokenization and the self-tokenization mechanism: when the cluster number approaches the node number, the self-tokenization mechanism reduces to the basic method in Definition 3. Moreover, a too small cluster number leads to an overly coarse alphabet, which also degrades the prediction performance.
4.4 Case study: Early Users Are More Popular
We analyze the clustering result obtained by the self-tokenization process of GLSM on the one-mode network from the “CollegeMsg” dataset. In this study, we order all users in “CollegeMsg” in ascending order of their registration time, so early users have small rank numbers and vice versa. Then, we compute the average user rank for each cluster after training GLSM. We use the first 1,000 links of the “CollegeMsg” network, which involve 237 users, and set the cluster number to 5. Based on these settings, we analyze the relationship between the size of each cluster and the average user rank in that cluster. We show the results of five independent training runs in Figure 5. We observe that the average user ranks of the clusters differ significantly, and the average user rank has a strong tendency to decrease as the cluster size increases. Since the cluster size reflects the popularity of the corresponding cluster (community), this result agrees with the conclusion in Panzarasa et al. (2009) that the early users (with small average user rank) in the social network are more popular. We further find that the variance of the user ranks within each cluster is smaller than the variance of the ranks over all users. This indicates that the registration times of the users in each cluster are close to each other. Therefore, the clustering result of the self-tokenization process helps GLSM capture the link formation patterns between the different communities of users (or nodes) in the temporal network.
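The per-cluster statistics in this case study can be sketched as follows. The data and cluster assignment below are made up for illustration; in the paper the assignment comes from the self-tokenization step and the ranks from the "CollegeMsg" registration order.

```python
from collections import defaultdict
from statistics import mean, pvariance

# user id -> rank by registration time (rank 1 = earliest user)
ranks = {u: r for r, u in enumerate(["a", "b", "c", "d", "e", "f"], start=1)}
# hypothetical cluster assignment produced by the self-tokenization step
clusters = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1, "f": 1}

# Group user ranks by cluster.
per_cluster = defaultdict(list)
for user, cluster in clusters.items():
    per_cluster[cluster].append(ranks[user])

# Cluster id, cluster size, average user rank, within-cluster variance.
for cluster, rs in sorted(per_cluster.items()):
    print(cluster, len(rs), mean(rs), pvariance(rs))

# Variance of the ranks over all users, for comparison.
print(pvariance(list(ranks.values())))
```

Comparing the within-cluster variances against the population variance on the last line mirrors the check that users in the same cluster registered at similar times.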
4.5 Case study: General Recommendation
We analyze the clustering result obtained by the self-tokenization process of GLSM on the network from the “Movielens” dataset. In this study, we use the first 500 links of the “Movielens” dataset, which involve 286 nodes (including both users and movies), and set the cluster number to 4. Since “Movielens” is a bipartite network and our model cannot be applied to the bipartite setting directly, we transform the bipartite network into a one-mode graph by labeling the nodes in the two sets with unique identities. To keep the bipartite information, we retain the map between the obtained identities and the original nodes. Based on this map, we further divide the clustering results into user communities and movie communities, obtaining 8 communities after the training process of GLSM. We highlight 2 user communities and 2 movie communities from the results in Figure 6. We observe that users or movies with similar network topological structures are classified into the same communities. After tokenization with these clustering results, in the generating process, the 1st sampling step of Algorithm 2 samples a token relating two communities of different types (e.g., community 1 and community 2), and the 2nd sampling step of Algorithm 2 generates a specific “user-to-movie” link (recommendation) based on the communities selected in the 1st step. This shows that the self-tokenization prunes the complete set of “user-to-movie” combinations: it utilizes the network topological structure to obtain a candidate link set and then generates the specific links from that candidate set. Consequently, besides generating the abstract token sequence, the self-tokenization mechanism also helps improve the prediction performance of GLSM with the network topological information.
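The two ideas in this case study, relabeling the bipartite network as a one-mode graph while keeping a map back to the original node sets, and then generating a link with two sampling steps, can be sketched as below. All names are illustrative, and the token choice is hard-coded where the paper's model would sample from a learned distribution.

```python
import random

users, movies = ["u1", "u2"], ["m1", "m2"]

# Step 1: assign unique one-mode identities and remember each node's origin.
ids = {node: i for i, node in enumerate(users + movies)}
origin = {i: ("user" if node in users else "movie") for node, i in ids.items()}

# Hypothetical communities over the one-mode ids (one user community,
# one movie community), standing in for the self-tokenization clusters.
communities = {"U0": [ids["u1"], ids["u2"]], "M0": [ids["m1"], ids["m2"]]}

def sample_link(rng):
    # 1st sampling: a token naming two communities of different types.
    # (Hard-coded here; the model would sample it from the RNN output.)
    token = ("U0", "M0")
    # 2nd sampling: a concrete link between the chosen communities.
    u = rng.choice(communities[token[0]])
    m = rng.choice(communities[token[1]])
    return u, m

rng = random.Random(0)
u, m = sample_link(rng)
# The origin map recovers the bipartite roles of the generated pair.
assert origin[u] == "user" and origin[m] == "movie"
```

The first step restricts candidates to one community pair, so the second step never has to consider the full set of user-movie combinations.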
5 Related works
Link prediction is a ubiquitous problem in recommendation systems Esslimani et al. (2011), social media Liben-Nowell and Kleinberg (2007), medicine Lichtenwalter et al. (2010) and even finance Hisano (2018). Most existing methods are discriminative models that classify unknown links as existing or non-existing links for the target networks Menon and Elkan (2011a). The mainstream methods for link prediction include the classic methods based on statistical empirical evidence and the machine learning methods. The classic methods Liben-Nowell and Kleinberg (2007); Adamic and Adar (2003); Jeong et al. (2003) measure the similarities between pairs of nodes through their common neighbor numbers Newman (2001). The power-law degree distribution of complex networks Barabási and Albert (1999) helps these similarity scoring methods perform well at distinguishing positive and negative links statistically. However, since they lack a structure that captures the network formation dynamics, they cannot identify the accurate links for a real-world network when the statistical rules are not significant enough. To make accurate predictions for a single node, many machine learning methods apply matrix factorization Menon and Elkan (2011b) or graph representation learning methods Perozzi et al. (2014). The resulting latent representations for the users improve the prediction performance at the user level. Some works further improve the prediction performance by applying a GAN framework to supervise the representation quality Wang et al. (2018). To increase the generality of the graph representation learning methods, recent methods combine graph convolution and the graph auto-encoder (GAE) Kipf and Welling (2016) to extend the embedding vectors to a higher-dimensional latent space. Recent work also applies graph neural networks to the link prediction problem Zhang and Chen (2018). However, since most mainstream methods ignore the temporal information in evolving networks, it is difficult for them to accurately distinguish the future positive and negative links with the limited data in a relatively small window. To utilize the temporal information, some work Yu et al. (2017) uses a sliding-window style time-dependent method to address the temporal link prediction issue. Furthermore, several works discuss related problems such as temporal graph mining Yang et al. (2016); Li et al. (2018) by considering the links as streams Viard et al. (2016).
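The classic similarity scores surveyed above can be sketched in a few lines. The toy graph is ours; the two scores are the standard common-neighbors count and the Adamic-Adar index, which down-weights shared neighbors that have many connections.

```python
import math
from collections import defaultdict

# A small undirected toy graph as an edge list.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)]
nbrs = defaultdict(set)
for u, v in edges:
    nbrs[u].add(v)
    nbrs[v].add(u)

def common_neighbors(u, v):
    """Number of neighbors shared by u and v."""
    return len(nbrs[u] & nbrs[v])

def adamic_adar(u, v):
    """Each shared neighbor z contributes 1 / log(deg(z))."""
    return sum(1.0 / math.log(len(nbrs[z])) for z in nbrs[u] & nbrs[v])

print(common_neighbors(1, 4))  # nodes 2 and 3 are shared -> 2
print(round(adamic_adar(1, 4), 3))
```

Ranking all non-edges by such a score and thresholding is exactly the statistical link-prediction recipe these classic methods follow.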
However, the existing works ignore the important information contained in the chronological order in which links emerge. We order the links in a sequence by their chronological emerging order to simulate the real scenario, and apply the sequence modeling framework Sutskever et al. (2014) to capture the temporal link formation patterns. The experiments show that this framework captures effective temporal link formation patterns and yields good performance in predicting the future links based on the observed links.

6 Conclusion
In this work, we propose Generative Link Sequence Modeling (GLSM) to predict future links based on historical observations. GLSM combines an RNN process, which learns the temporal link formation patterns within the sequence modeling framework, with a two-step sampling link generation process that generates the future links. To cast temporal link prediction into the sequence modeling framework, we propose the self-tokenization mechanism, which converts the binary link sequence into a unary token sequence with the proper granularity. The self-tokenization process incorporates a clustering process, which allows it to generate the token sequence automatically; the clustering process also helps the resulting token sequence capture the network topological information. The RNN process of GLSM learns the temporal link formation pattern from the resulting token sequence. Since the RNN process depends on the token sequence obtained from the self-tokenization process, the RNN and self-tokenization processes can be trained simultaneously. With the learned temporal link formation pattern, GLSM generates the future links with the two-step sampling link generation process. Experimental results show that GLSM performs the best among all the compared methods on the real-world temporal networks, which verifies that the temporal information contained in the chronological order of the links is useful in designing link prediction models.
7 Acknowledgments
This work was done while Yue Wang was visiting the University of Illinois at Chicago. This work is supported by the National Natural Science Foundation of China (Grant No. 61503422, 61602535), the Open Project Program of the National Laboratory of Pattern Recognition (NLPR), and the Program for Innovation Research in Central University of Finance and Economics. This work is also supported in part by NSF through grants IIS-1526499, IIS-1763325, and CNS-1626432, and NSFC 61672313.
References
 [1] (2003) Friends and neighbors on the web. Social networks 25 (3), pp. 211–230. Cited by: 1st item, §1, 1st item, §5.
 [2] (1999) Emergence of scaling in random networks. Science 286 (5439), pp. 509–512. Cited by: §5.
 [3] (2005) A tutorial on the cross-entropy method. Annals of Operations Research 134 (1), pp. 19–67. External Links: ISSN 1572-9338 Cited by: §2.2.
 [4] (2011) Temporal link prediction using matrix and tensor factorizations. ACM Trans. Knowl. Discov. Data 5 (2), pp. 10:1–10:27. External Links: ISSN 1556-4681 Cited by: §1.
 [5] (2011) Densifying a behavioral recommender system by social networks link prediction methods. Social Network Analysis and Mining 1 (3), pp. 159–172. Cited by: §1, §5.
 [6] (2002) Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and its Applications 311 (3), pp. 590 – 614. Cited by: 1st item.
 [7] (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.), pp. 2672–2680. Cited by: 4th item.
 [8] (2005) Evolutionary dynamics on graphs. Nature 433, pp. 312–316. Cited by: §1.
 [9] (2018) Semi-supervised graph embedding approach to dynamic link prediction. In Complex Networks IX, S. Cornelius, K. Coronges, B. Gonçalves, R. Sinatra, and A. Vespignani (Eds.), Cham, pp. 109–121. Cited by: §1, §5.
 [10] (2003) Measuring preferential attachment in evolving networks. EPL (Europhysics Letters) 61 (4), pp. 567. Cited by: 1st item, §1, §5.
 [11] (2014) Adam: A method for stochastic optimization. CoRR abs/1412.6980. Cited by: §3.1.
 [12] (2016) Variational graph autoencoders. CoRR abs/1611.07308. Cited by: 5th item, §5.
 [13] (1997) The economy as an evolving network. Journal of Evolutionary Economics 7 (4), pp. 339–353. Cited by: §1.
 [14] (2006) Empirical analysis of an evolving social network. Science 311 (5757), pp. 88–90. External Links: Document Cited by: §1.
 [15] (2016) Edge weight prediction in weighted signed networks. In Data Mining (ICDM), 2016 IEEE 16th International Conference on, pp. 221–230. Cited by: §4.1.
 [16] (2018) Persistent community search in temporal networks. In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018, pp. 797–808. Cited by: §5.
 [17] (2007) The link-prediction problem for social networks. Journal of the Association for Information Science and Technology 58 (7), pp. 1019–1031. Cited by: §1, 1st item, §5.
 [18] (2010) New perspectives and methods in link prediction. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, July 25-28, 2010, pp. 243–252. Cited by: §5.
 [19] (2011) Link prediction via matrix factorization. Machine Learning and Knowledge Discovery in Databases, pp. 437–452. Cited by: §1, §5.
 [20] (2011) Link prediction via matrix factorization. In Machine Learning and Knowledge Discovery in Databases, D. Gunopulos, T. Hofmann, D. Malerba, and M. Vazirgiannis (Eds.), Berlin, Heidelberg, pp. 437–452. External Links: ISBN 978-3-642-23783-6 Cited by: 2nd item, §5.
 [21] (2010) Recurrent neural network based language model. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010, pp. 1045–1048. Cited by: §1.
 [22] (2001) The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences 98 (2), pp. 404–409. External Links: Document Cited by: 1st item, §5.
 [23] (2009) Patterns and dynamics of users’ behavior and interaction: network analysis of an online community. Journal of the American Society for Information Science and Technology 60 (5), pp. 911–932. Cited by: §4.4.
 [24] (2009) Patterns and dynamics of users’ behavior and interaction: network analysis of an online community. JASIST 60 (5), pp. 911–932. Cited by: §4.1.
 [25] (2014) DeepWalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, New York, NY, USA, pp. 701–710. External Links: ISBN 978-1-4503-2956-9 Cited by: 4th item, §5.
 [26] (2018) GEMSEC: graph embedding with self clustering. CoRR abs/1802.03997. Cited by: §3.1.
 [27] (2006) Multi-objective hypergraph-partitioning algorithms for cut and maximum subdomain-degree minimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 25 (3), pp. 504–517. External Links: Document, ISSN 0278-0070 Cited by: §3.1.
 [28] (2012) LSTM neural networks for language modeling. In INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 194–197. Cited by: §4.2.
 [29] (2014) Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.), pp. 3104–3112. Cited by: §1, §2.2, §5.
 [30] Cited by: §4.1.
 [31] (2015) LINE: large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, WWW ’15, Republic and Canton of Geneva, Switzerland, pp. 1067–1077. External Links: ISBN 978-1-4503-3469-3 Cited by: §1, 4th item.
 [32] (2016) Computing maximal cliques in link streams. Theor. Comput. Sci. 609, pp. 245–252. Cited by: §5.
 [33] (2018) GraphGAN: graph representation learning with generative adversarial nets. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2-7, 2018. Cited by: §1, 4th item, §5.
 [34] (2005) Temporal sequence learning, prediction, and control: A review of different models and their relation to biological mechanisms. Neural Computation 17 (2), pp. 245–319. Cited by: §1.
 [35] (2016) Diversified temporal subgraph pattern mining. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pp. 1965–1974. Cited by: §5.
 [36] (2017) Temporally factorized network modeling for evolutionary network analysis. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, WSDM 2017, Cambridge, United Kingdom, February 6-10, 2017, pp. 455–464. Cited by: 2nd item, §1, 3rd item, §5.
 [37] (2018) Link prediction based on graph neural networks. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pp. 5171–5181. Cited by: §5.