Time-aware Random Walk Diffusion to Improve Dynamic Graph Learning

How can we augment a dynamic graph for improving the performance of dynamic graph neural networks? Graph augmentation has been widely utilized to boost the learning performance of GNN-based models. However, most existing approaches only enhance spatial structure within an input static graph by transforming the graph, and do not consider dynamics caused by time such as temporal locality, i.e., recent edges are more influential than earlier ones, which remains challenging for dynamic graph augmentation. In this work, we propose TiaRa (Time-aware Random Walk Diffusion), a novel diffusion-based method for augmenting a dynamic graph represented as a discrete-time sequence of graph snapshots. For this purpose, we first design a time-aware random walk proximity so that a surfer can walk along the time dimension as well as edges, resulting in spatially and temporally localized scores. We then derive our diffusion matrices based on the time-aware random walk, and show that they become enhanced adjacency matrices in which both spatial and temporal localities are augmented. Through extensive experiments, we demonstrate that TiaRa effectively augments a given dynamic graph, and leads to significant improvements in dynamic GNN models for various graph datasets and tasks.



Introduction

Dynamic graphs represent various real-world relationships that occur over time, e.g., friendships in an online social service, citations of scholarly papers, traffic flow on road networks, and financial transactions between traders. Learning such dynamic graphs has recently attracted considerable attention from machine learning communities Skarding et al. (2021); Han et al. (2021), and plays a crucial role in diverse applications such as link prediction Yang et al. (2021); Pareja et al. (2020), node or edge classification Xu et al. (2019); Pareja et al. (2020), time-series traffic forecasting Wu et al. (2020); Guo et al. (2019), knowledge completion Jung et al. (2021), and pandemic forecasting Panagopoulos et al. (2021). In recent years, many researchers have put tremendous effort into developing methods that fuse GNNs with recurrent neural networks (RNNs) or attention mechanisms for continuous-time Xu et al. (2020); Rossi et al. (2020) and discrete-time Seo et al. (2018); Pareja et al. (2020); Yang et al. (2021) dynamic graphs.

With the astonishing progress of deep neural networks for graph data, diverse augmentation techniques Zhao et al. (2022); Yoo et al. (2022b) have been proposed to increase the generalization power of GNN models, especially on a static graph. Previous approaches mainly transform the topological structure of the input graph. For example, drop-based methods stochastically remove a certain number of edges Rong et al. (2020) or nodes Feng et al. (2020) at each training epoch in a manner similar to dropout regularization. In contrast, diffusion-based methods Klicpera et al. (2019) insert additional edges whose weights are scored by graph diffusions such as Personalized PageRank (PPR), thereby augmenting a spatial locality around each node and improving graph convolution.

However, the aforementioned techniques are designed to augment data within a static graph, and the dynamic graph augmentation problem has not yet been comprehensively studied, especially for dynamic graphs represented in the discrete-time domain. Unlike static graphs, dynamic graphs change or evolve over time by nature; thus, dynamic graph augmentation needs to consider temporal dynamics and spatial structure simultaneously. More specifically, as verified in previous works Rossi et al. (2020); Shin (2017); Lee et al. (2020), real-world dynamic graphs exhibit temporal locality, meaning that graph objects such as nodes and triangles tend to be more affected by recent edges than by older ones, i.e., edges closer to a specific object in time are more likely to provide important information. Naively applying a static augmentation method to each time step cannot capture such temporal locality.

In this work, we propose TiaRa (Time-aware Random Walk Diffusion), a novel diffusion-based augmentation method for a discrete-time dynamic graph represented by a temporal sequence of graph snapshots. TiaRa aims to augment both spatial and temporal localities of each graph snapshot. For this purpose, we design a time-aware random walk in which a surfer randomly moves around nodes or along the time axis to measure spatially and temporally localized scores. We then derive time-aware random walk diffusion from the scores, and interpret it as the combination of spatial and temporal augmenters. Our diffusion matrices are used as augmented adjacency matrices for any dynamic GNN model in the discrete-time domain. We further adopt approximation techniques such as power iteration and sparsification to reduce the heavy cost of computing the diffusion matrices.

Our contributions are summarized as follows:

  • Method. We propose TiaRa, a novel method for augmenting a dynamic graph based on time-aware random walks. TiaRa strengthens not only a spatial locality but also a temporal locality of the input dynamic graph so that dynamic GNNs perform better.

  • Analysis. We analyze how TiaRa augments both spatial and temporal localities (Theorem 1), and its time and space complexities (Theorem 2) in real dynamic graphs.

  • Experiments. We demonstrate that TiaRa effectively augments a given dynamic graph, and leads to consistent improvements in GNNs for temporal link prediction and node classification tasks.

Related Work

Augmentation for Static GNNs. Graph data augmentation Zhao et al. (2022) aims to reduce over-fitting when training GNN models by modifying an input graph. As representative approaches, DropEdge Rong et al. (2020) stochastically drops edges, and DropNode Feng et al. (2020) removes arbitrary nodes and their adjacent edges at each epoch. These increase the diversity of the input graph by randomly creating different copies sampled from the graph. GDC Klicpera et al. (2019) adds new edges weighted by a graph diffusion derived from node proximities. GDC boosts a spatial locality of the graph so that a GNN can consider adjacent nodes as well as distant ones during its convolutions, enhancing its representation power. Most existing methods, including the aforementioned ones, are of limited use for augmenting dynamic graphs because they do not consider temporal properties.

GNNs and Augmentation for Dynamic Graphs. Dynamic graphs Kazemi et al. (2020) are categorized into two representations: discrete-time dynamic graphs (DTDG) and continuous-time dynamic graphs (CTDG). A DTDG is represented as a sequence of graph snapshots over multiple discrete time steps, while a CTDG is represented as a set of temporal edges whose time-stamps have continuous values. It is straightforward to convert a CTDG to a DTDG by distributing the continuous-time edges into multiple bins in chronological order, but the reverse is not possible because continuous-time values are generally missing in most DTDGs Sankar et al. (2020); Yang et al. (2021), i.e., models for DTDGs can be applied to CTDGs, but the reverse is rather limited. Hence, we narrow our focus to representation learning on DTDGs in this work.

Dynamic GNNs have rapidly advanced under the framework that closely integrates GNNs and temporal sequence models such as RNNs to capture spatial and temporal relations on dynamic graphs Skarding et al. (2021). GCRN Seo et al. (2018) uses a GCN to produce node embeddings on each graph snapshot, and then forwards them to an LSTM for modeling temporal dynamics. STAR Xu et al. (2019) utilizes a GRU combined with spatial and temporal attentions. DySat Sankar et al. (2020) employs a self-attention strategy to aggregate spatial neighborhood and temporal dynamics. EvolveGCN Pareja et al. (2020) evolves the parameters of GCNs using RNNs. To consider hierarchical properties in real graphs, HTGN Yang et al. (2021) extends the framework to hyperbolic space.

Compared to the impressive progress of dynamic GNNs, augmenting dynamic graphs for the purpose of improving such models has not yet been extensively explored. As a related method, MeTA Wang et al. (2021) adaptively augments a temporal graph based on predictions of a temporal graph network, perturbing time and removing or adding edges. However, it is difficult to employ MeTA for the aforementioned DTDG models because MeTA is designed for CTDGs and requires continuous-time values.


Preliminaries

Random Walk with Restart (RWR). Our work is closely related to RWR Tong et al. (2006), which measures node-to-node similarity scores w.r.t. a seed node $s$. The scores are spatially localized to $s$ Nassar et al. (2015), i.e., scores of nearby nodes highly associated with $s$ are high while those of distant nodes are low. Based on this, diffusion methods such as GDC exploit RWR to augment a spatial locality.

Let $\mathbf{x}_s$ be a vector of RWR scores w.r.t. the seed node $s$. Given a row-normalized adjacency matrix $\tilde{\mathbf{A}}$ and a restart probability $c$, the vector $\mathbf{x}_s$ is represented as follows:

$$\mathbf{x}_s = (1-c)\,\tilde{\mathbf{A}}^\top \mathbf{x}_s + c\,\mathbf{i}_s \;\Leftrightarrow\; \mathbf{x}_s = c\,\mathbf{L}^{-1}\mathbf{i}_s = \mathcal{L}\,\mathbf{i}_s$$

where $\mathbf{L} = \mathbf{I} - (1-c)\,\tilde{\mathbf{A}}^\top$ is the random-walk normalized Laplacian matrix, and $\mathbf{i}_s$ is the $s$-th unit vector. Notice that $\mathcal{L} = c\,\mathbf{L}^{-1}$ is a column-stochastic transition matrix interpreted as a diffusion kernel that diffuses a given distribution such as $\mathbf{i}_s$ on the graph through RWR.
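The RWR kernel described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `rwr_kernel`, the symbol `c` for the restart probability, the toy path graph, and the value `c = 0.5` are all assumptions made for the example.

```python
import numpy as np

def rwr_kernel(adj, c=0.5):
    """Return c * (I - (1 - c) * A_row^T)^{-1}, the RWR diffusion kernel.

    `adj` is a dense adjacency matrix whose rows are all non-empty
    (self-loops guarantee this); `c` is the restart probability.
    """
    adj = np.asarray(adj, dtype=float)
    a_row = adj / adj.sum(axis=1, keepdims=True)   # row-normalized adjacency
    n = adj.shape[0]
    lap = np.eye(n) - (1.0 - c) * a_row.T          # I - (1 - c) * A_row^T
    return c * np.linalg.inv(lap)

# Toy path graph 0-1-2-3 with self-loops (hypothetical example data).
adj = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]])
kernel = rwr_kernel(adj, c=0.5)
scores = kernel[:, 0]   # RWR scores of all nodes w.r.t. seed node 0
```

Each column of `kernel` is the score vector for one seed node, so the columns sum to one, which is the column-stochastic property stated in the text.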

Problem Formulation. A discrete-time dynamic graph (DTDG) $\mathcal{G}$ is represented as a sequence $\{G_1, \dots, G_T\}$ of snapshots in chronological order, where $T$ is the number of time steps Skarding et al. (2021). Each snapshot $G_t$ is a weighted undirected graph with a shared set $V$ of nodes and a set $E_t$ of edges at time $t$, where $n = |V|$ is the number of nodes. $\mathbf{F}_t \in \mathbb{R}^{n \times d}$ is an initial node feature matrix where $d$ is a feature dimension, and $\mathbf{A}_t \in \mathbb{R}^{n \times n}$ denotes the sparse and self-looped adjacency matrix of $G_t$. Node representation learning on the dynamic graph $\mathcal{G}$ aims to learn a function $f_\Theta$ parameterized by $\Theta$ and produce low-dimensional hidden node embeddings $\mathbf{H}_t$ for each time $t$, represented as:

$$\mathbf{H}_t = f_\Theta(\tilde{\mathbf{A}}_t, \mathbf{F}_t, \mathbf{H}_{t-1}) \qquad (1)$$

where $\tilde{\mathbf{A}}_t$ is a normalized adjacency matrix of $\mathbf{A}_t$, and $\mathbf{H}_{t-1}$ contains the latest hidden embeddings before time $t$. Note that the definition of the DTDG and the learning scheme of Equation (1) are generally adopted in most existing methods Seo et al. (2018); Pareja et al. (2020); Yang et al. (2021); Sankar et al. (2020) for learning DTDGs, where $f_\Theta$ is usually designed as a combination of GNNs and RNNs.

Figure 1: Overall architecture of TiaRa. Given the adjacency matrix $\mathbf{A}_t$ at time $t$, TiaRa outputs a time-aware random walk diffusion matrix $\mathbf{X}_t$ that combines the spatial augmenter and the temporal augmenter, followed by sparsification.

Our goal is to improve the performance of dynamic GNNs by augmenting the input data where the formal definition of the problem is described in Problem 1.

Problem 1 (Dynamic Graph Augmentation).

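Given a temporal sequence $\{\mathbf{A}_1, \dots, \mathbf{A}_T\}$ of adjacency matrices, the problem is to generate a sequence $\{\mathbf{X}_1, \dots, \mathbf{X}_T\}$ of new adjacency matrices improving the performance of a model $f_\Theta$. ∎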

Proposed Method

Figure 2: Illustration of how a time-aware random surfer moves around a dynamic graph where dashed arrows indicate virtually connected edges along the time-axis.

We depict the overall framework of our method TiaRa in Figure 1. Given a temporal sequence of sparse adjacency matrices $\mathbf{A}_t$ in a dynamic graph $\mathcal{G}$, TiaRa aims to produce a time-aware random walk diffusion matrix $\mathbf{X}_t$ for each time step $t$ using two diffusion-based modules, called spatial and temporal augmenters.

The spatial augmenter results in a spatial diffusion matrix $\mathcal{S}_t$ that enhances a spatial locality of $\mathbf{A}_t$ through random walks. The temporal augmenter receives the previous $\mathbf{X}_{t-1}$, which contains information squashed from the initial time to $t-1$, and then disseminates it through $\mathbf{A}_t$ at the current time $t$. This leads to a temporal diffusion matrix $\mathcal{T}_t$ in which a temporal locality is magnified. Finally, TiaRa linearly combines $\mathcal{S}_t$ and $\mathcal{T}_t$, and sparsifies the result to form $\mathbf{X}_t$. We replace each adjacency matrix $\mathbf{A}_t$ with $\mathbf{X}_t$ for the inputs of dynamic GNN models. If necessary, we simply use the edges of the graph represented by $\mathbf{X}_t$ without weights, or make the graph undirected by symmetrizing $\mathbf{X}_t$ after the sparsification.

Time-aware Random Walk with Restart

As described in the preliminaries section, RWR is used to measure node proximities in a graph. However, RWR cannot be directly employed in a dynamic graph because it measures only spatially localized scores in a single static graph. In this section, we extend RWR to Time-aware Random Walk with Restart (TRWR) so that TRWR produces node-to-node scores which are spatially and temporally localized in the dynamic graph.

Our simple idea for TRWR is to virtually connect identical nodes from $G_t$ to $G_{t+1}$ for each time step $t$ as shown in Figure 2, so that a random surfer not only moves around the current $G_t$ but also jumps to the next $G_{t+1}$, and the surfer becomes time-aware. In the beginning, the surfer starts from a seed node $s$ at the initial time step. After a few movements, suppose the surfer is at node $u$ in a graph snapshot $G_t$. Then, the surfer takes one of the following actions:

  • Action 1) Random walk. The surfer randomly moves to one of the neighbors of node $u$ in the current graph $G_t$ with probability $1 - \alpha - \beta$.

  • Action 2) Restart. The surfer goes back to the seed node $s$ in $G_t$ with probability $\alpha$.

  • Action 3) Time travel. The surfer does time travel from node $u$ in $G_t$ to that node in $G_{t+1}$ with probability $\beta$.

where $\alpha$ and $\beta$ are called restart and time travel probabilities, respectively, and $0 < \alpha + \beta < 1$. Note that we do not allow the surfer to move backward from $G_{t+1}$ to $G_t$ because information from the future time $t+1$ must not be used when we make a prediction at time $t$.

Through TRWR, the vector $\mathbf{x}_{t,s}$ of stationary probabilities that the surfer visits each node from the seed node $s$ in $G_t$ is recursively represented as follows:

$$\mathbf{x}_{t,s} = (1-\alpha-\beta)\,\tilde{\mathbf{A}}_t^\top \mathbf{x}_{t,s} + \alpha\,\mathbf{i}_s + \beta\,\mathbf{x}_{t-1,s} \qquad (2)$$

where $\mathbf{i}_s$ is the $s$-th unit vector of size $n$. $\tilde{\mathbf{A}}_t$ is a row-normalized matrix of $\mathbf{A}_t$ (i.e., $\tilde{\mathbf{A}}_t = \mathbf{D}_t^{-1}\mathbf{A}_t$ where $\mathbf{A}_t$ is a self-looped adjacency matrix and $\mathbf{D}_t$ is a diagonal out-degree matrix of $\mathbf{A}_t$). If $t = 0$, we define $\mathbf{x}_{0,s}$ as $\mathbf{i}_s$.

In the above equation, the random walk part $(1-\alpha-\beta)\,\tilde{\mathbf{A}}_t^\top \mathbf{x}_{t,s}$ propagates scores of $\mathbf{x}_{t,s}$ over $G_t$. The restart part $\alpha\,\mathbf{i}_s$ makes the scores spatially localized around the seed node $s$, which is controlled by $\alpha$. The time travel part $\beta\,\mathbf{x}_{t-1,s}$ injects scores of the previous $\mathbf{x}_{t-1,s}$ to make $\mathbf{x}_{t,s}$ temporally localized, which is controlled by $\beta$. Notice TRWR extends RWR to a discrete-time dynamic graph, i.e., $\beta = 0$ leads to RWR scores on each graph snapshot without considering temporal information.
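The recursion can be illustrated for a single seed node with a simple fixed-point iteration. The sketch below is an assumption-laden illustration, not the authors' code: it writes the restart and time travel probabilities as `alpha` and `beta`, takes the score vector before the first snapshot to be the seed's unit vector, and uses two hypothetical 3-node snapshots.

```python
import numpy as np

def row_normalize(adj):
    """Divide each row by its sum (rows are assumed to be non-empty)."""
    adj = np.asarray(adj, dtype=float)
    return adj / adj.sum(axis=1, keepdims=True)

def trwr_scores(snapshots, seed, alpha=0.2, beta=0.3, iters=200):
    """Return one time-aware score vector per snapshot for a seed node.

    At each time step the scores solve
        x_t = (1 - alpha - beta) * A_t^T x_t + alpha * i_s + beta * x_{t-1},
    which is iterated to a fixed point (the map is a contraction).
    """
    n = snapshots[0].shape[0]
    unit = np.zeros(n)
    unit[seed] = 1.0
    prev = unit.copy()                      # scores before the first snapshot
    history = []
    for adj in snapshots:
        p = row_normalize(adj).T            # column-stochastic transitions
        x = unit.copy()
        for _ in range(iters):
            x = (1 - alpha - beta) * (p @ x) + alpha * unit + beta * prev
        history.append(x)
        prev = x
    return history

# Hypothetical snapshots with self-loops: edge 0-1 at time 1, edge 1-2 at time 2.
snap1 = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]])
snap2 = np.array([[1, 0, 0], [0, 1, 1], [0, 1, 1]])
with_time = trwr_scores([snap1, snap2], seed=0, alpha=0.2, beta=0.3)
without_time = trwr_scores([snap1, snap2], seed=0, alpha=0.2, beta=0.0)
```

In the second snapshot the seed node 0 has no neighbors, so with `beta = 0.0` node 2 receives no score, whereas with time travel enabled the mass that reached node 1 at time 1 is carried forward and spreads to node 2.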

Time-aware Random Walk Diffusion Matrices

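In Equation (2), $\mathbf{x}_{t,s}$ is an $n$-dimensional column vector of a probability distribution w.r.t. a seed node $s$. For all seed nodes $s \in V$, we horizontally stack $\mathbf{x}_{t,s}$ to form $\mathbf{X}_t$ such that $\mathbf{x}_{t,s}$ is the $s$-th column vector of $\mathbf{X}_t$, i.e., $\mathbf{X}_t = [\mathbf{x}_{t,1}, \dots, \mathbf{x}_{t,n}]$. We call $\mathbf{X}_t$ a time-aware random walk diffusion matrix at time $t$. The derivation of $\mathbf{X}_t$ starts by moving the term of the random walk to the left side in Equation (2) as follows:

$$\left(\mathbf{I} - (1-\alpha-\beta)\,\tilde{\mathbf{A}}_t^\top\right)\mathbf{x}_{t,s} = \alpha\,\mathbf{i}_s + \beta\,\mathbf{x}_{t-1,s}$$

Let $\mathbf{L}_t = \mathbf{I} - (1-\alpha-\beta)\,\tilde{\mathbf{A}}_t^\top$ where $\mathbf{I}$ is an identity matrix, and stack the columns as described above. Thus, the above equation is written in matrix form as the following:

$$\mathbf{X}_t = \alpha\,\mathbf{L}_t^{-1} + \beta\,\mathbf{L}_t^{-1}\mathbf{X}_{t-1} \qquad (3)$$

where $t \geq 1$, and $\mathbf{X}_0 = \mathbf{I}$ because $\mathbf{x}_{0,s}$ is defined as $\mathbf{i}_s$.

Spatial and Temporal Augmenters. We obtain the recurrence relation of $\mathbf{X}_t$ from Equation (3), and further rearrange it to interpret the process as follows:

$$\mathbf{X}_t = (\alpha+\beta)\,\mathbf{L}_t^{-1}\left(\frac{\alpha}{\alpha+\beta}\,\mathbf{I} + \frac{\beta}{\alpha+\beta}\,\mathbf{X}_{t-1}\right)$$

In the above, we set $\tilde{\mathcal{L}}_t = (\alpha+\beta)\,\mathbf{L}_t^{-1}$, which is the diffusion kernel by RWR on the graph $G_t$ where its restart probability is $\alpha+\beta$. Let $\gamma = \beta/(\alpha+\beta)$, so that $1-\gamma = \alpha/(\alpha+\beta)$; then, $\mathbf{X}_t$ is represented as follows:

$$\mathbf{X}_t = (1-\gamma)\,\tilde{\mathcal{L}}_t + \gamma\,\tilde{\mathcal{L}}_t\mathbf{X}_{t-1} = (1-\gamma)\,\mathcal{S}_t + \gamma\,\mathcal{T}_t \qquad (4)$$

where $\mathcal{S}_t = \tilde{\mathcal{L}}_t$ is a spatial diffusion matrix, and $\mathcal{T}_t = \tilde{\mathcal{L}}_t\mathbf{X}_{t-1}$ is a temporal diffusion matrix. In TiaRa, the spatial augmenter computes $\mathcal{S}_t$, and the temporal augmenter computes $\mathcal{T}_t$.

The meaning of $\mathcal{S}_t = \tilde{\mathcal{L}}_t\mathbf{I}$ is the result of diffusing the $s$-th column $\mathbf{i}_s$ of $\mathbf{I}$ through $\tilde{\mathcal{L}}_t$ for each node $s$. This is interpreted as the augmentation of a spatial locality of each node through RWR within $G_t$. On the other hand, $\mathcal{T}_t$ is the result of diffusing $\mathbf{x}_{t-1,s}$ of $\mathbf{X}_{t-1}$ through $\tilde{\mathcal{L}}_t$ for each node $s$. Note that $\mathbf{x}_{t-1,s}$ contains the probabilities that the surfer visits each node starting from node $s$ during the travel from the initial time to $t-1$. Thus, it spreads the past proximities of $\mathbf{x}_{t-1,s}$ in the current $G_t$ through $\tilde{\mathcal{L}}_t$, which consequently reflects the temporal information in $\mathcal{T}_t$.

The final diffusion matrix $\mathbf{X}_t$ is a convex combination of $\mathcal{S}_t$ and $\mathcal{T}_t$ w.r.t. $\gamma$, which corresponds to the combination step in Figure 1. Notice that $\mathbf{X}_t$ is a column-stochastic transition matrix for every time step $t$, which is proved in Lemma 1, implying that, as an augmented adjacency matrix, $\mathbf{X}_t$ can replace $\tilde{\mathbf{A}}_t$ as the input of GNNs in Equation (1).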
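To make the roles of the two augmenters concrete, the following sketch computes the diffusion matrices exactly with a dense matrix inverse on toy data. It assumes the notation `alpha` (restart), `beta` (time travel), and `gamma = beta / (alpha + beta)`, starts from the identity matrix before the first snapshot, and uses hypothetical function names; it is meant for small graphs only and is not the approximate procedure the paper uses in practice.

```python
import numpy as np

def rwr_kernel(adj, restart):
    """RWR kernel restart * (I - (1 - restart) * A_row^T)^{-1} of one snapshot."""
    adj = np.asarray(adj, dtype=float)
    p = (adj / adj.sum(axis=1, keepdims=True)).T
    n = adj.shape[0]
    return restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * p)

def tiara_exact(snapshots, alpha=0.2, beta=0.3):
    """Return the list of diffusion matrices, one per snapshot."""
    n = snapshots[0].shape[0]
    gamma = beta / (alpha + beta)
    x_prev = np.eye(n)                       # diffusion matrix before time 1
    out = []
    for adj in snapshots:
        kernel = rwr_kernel(adj, alpha + beta)
        spatial = kernel                     # spatial augmenter
        temporal = kernel @ x_prev           # temporal augmenter
        x_t = (1.0 - gamma) * spatial + gamma * temporal
        out.append(x_t)
        x_prev = x_t
    return out

# Hypothetical snapshots with self-loops.
snap1 = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]])
snap2 = np.array([[1, 0, 0], [0, 1, 1], [0, 1, 1]])
mats = tiara_exact([snap1, snap2], alpha=0.2, beta=0.3)
```

Because both the kernel and the previous diffusion matrix are column-stochastic, every matrix in `mats` is column-stochastic as well, which is the property that lets it stand in for a normalized adjacency matrix.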

Interpretation. We further analyze how reflects the spatial and temporal information of the input dynamic graph. For this purpose, we first obtain the closed-form expression of which is described in Theorem 1.

Theorem 1.

The closed-form expression of is:


where for , and .


It is proved by mathematical induction, and the detailed proof is described in Appendix A. ∎

In the theorem, indicates a random walk diffusion traveling from former time toward latter time . According to Equation (5), is concisely represented as:

Note that is more affected by the information close to time than that passed from the distant past. The influence of is decayed by as time is further away from time , while it is emphasized as time is near to time where is interpreted as a temporal decay ratio. This explanation is consistent with the temporal locality, i.e., the tendency that recent edges are more influential than older ones. Combined with the spatial diffusion , the result of augments both spatial and temporal localities in .
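The decay described above can be seen by unrolling the recurrence by hand. The derivation below is a sketch under assumed notation, not a restatement of Theorem 1: it writes $\tilde{\mathcal{L}}_t$ for the RWR kernel of the snapshot at time $t$, $\gamma$ for the temporal decay ratio, $\mathbf{X}_t$ for the diffusion matrix, and takes $\mathbf{X}_0 = \mathbf{I}$. Its form may differ from the closed-form expression given in the theorem.

```latex
\begin{aligned}
\mathbf{X}_t
  &= (1-\gamma)\,\tilde{\mathcal{L}}_t + \gamma\,\tilde{\mathcal{L}}_t\,\mathbf{X}_{t-1} \\
  &= (1-\gamma)\,\tilde{\mathcal{L}}_t
     + \gamma(1-\gamma)\,\tilde{\mathcal{L}}_t\tilde{\mathcal{L}}_{t-1}
     + \gamma^{2}\,\tilde{\mathcal{L}}_t\tilde{\mathcal{L}}_{t-1}\,\mathbf{X}_{t-2} \\
  &= (1-\gamma)\sum_{k=0}^{t-1}\gamma^{k}\,
     \tilde{\mathcal{L}}_t\tilde{\mathcal{L}}_{t-1}\cdots\tilde{\mathcal{L}}_{t-k}
     \;+\; \gamma^{t}\,\tilde{\mathcal{L}}_t\tilde{\mathcal{L}}_{t-1}\cdots\tilde{\mathcal{L}}_{1}
\end{aligned}
```

The term that reaches back $k$ steps carries the weight $\gamma^{k}$, so contributions from older snapshots shrink geometrically, and the weights sum to one since $(1-\gamma)\sum_{k=0}^{t-1}\gamma^{k} + \gamma^{t} = 1$.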

Discussion. TiaRa is a generalization of GDC with the PPR kernel to dynamic graphs, since TiaRa with $\beta = 0$ spatially augments data within a single snapshot at each time, which is exactly what GDC does. However, GDC does not consider temporal information for its augmentation, and it performs worse than TiaRa as shown in Tables 2 and 3. Discussions on addition/deletion of nodes/edges w.r.t. augmentation are described in Appendix B.

Algorithm for TiaRa

Most graph diffusions involve heavy computational cost, especially for a large graph, and result in a dense matrix. The computation of also exhibits the same issue, and thus we adopt approximate techniques to alleviate the problem. Including the approximate strategies, the procedure of TiaRa is summarized in Algorithm 1 where is set to .

Power iteration. The main bottleneck for obtaining is to compute the matrix inversion of in Equation (4), which requires time. Instead of directly calculating the inversion, we use power iteration (lines ) based on the following Yoon et al. (2018):

where , and is the number of iterations. Let be the result after iterations; then, it is recursively represented as follows:

where and . Note is a normalized adjacency matrix (line 1) in which self-loops are added, as traditional GNNs usually do. The approximation error is bounded by , and converges to as as shown in Lemma 2. After that, we set (line 13).
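The idea of replacing the matrix inversion with repeated multiplications can be sketched with a standard truncated Neumann series. This is an illustration under assumptions, not necessarily the exact iteration used in the paper: `restart` stands for the kernel's restart probability (the sum of the restart and time travel probabilities in TiaRa), `p` is a column-stochastic transition matrix, and the random test graph is hypothetical.

```python
import numpy as np

def power_iteration_kernel(p, restart, num_iters):
    """Approximate restart * (I - (1 - restart) * p)^{-1} without inversion.

    After K iterations the result equals
        restart * sum_{k=0..K} (1 - restart)^k * p^k.
    """
    n = p.shape[0]
    base = restart * np.eye(n)
    approx = base.copy()
    for _ in range(num_iters):
        approx = base + (1.0 - restart) * (p @ approx)
    return approx

# Hypothetical undirected graph with self-loops on 6 nodes.
rng = np.random.default_rng(0)
adj = (rng.random((6, 6)) < 0.4).astype(float)
adj = np.maximum(adj, adj.T)
np.fill_diagonal(adj, 1.0)
p = (adj / adj.sum(axis=1, keepdims=True)).T     # column-stochastic

restart, num_iters = 0.5, 20
approx = power_iteration_kernel(p, restart, num_iters)
exact = restart * np.linalg.inv(np.eye(6) - (1.0 - restart) * p)
max_col_error = np.abs(exact - approx).sum(axis=0).max()
```

For a column-stochastic `p`, the truncated tail has column sums of exactly `(1 - restart) ** (num_iters + 1)`, so the error shrinks geometrically with the number of iterations. With a sparse matrix type in place of the dense arrays, each iteration costs a sparse product instead of a dense one.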

At a glance, each iteration seems to take time for the matrix multiplication, but it is much faster than that since each snapshot is sparse in most cases. More specifically, only a few nodes form edges at each time step in real graphs. We call such nodes activated where is the set of activated nodes at time , and . In each , a surfer can move only between activated nodes, i.e., only pairs of nodes in are diffused. As seen in Table 1, the average of over time is smaller than except the Brain dataset.

This allows us to do the power iteration on the sub-matrix of for nodes in where is the number of non-zeros of . Then, an iteration takes time for a sparse matrix multiplication. Note is linearly proportional to in real graphs, i.e., where is a constant. Let be the average number of edges over time. As seen in Table 1, is smaller than . Thus, each iteration takes time; overall, it takes time and space for (as only nodes are diffused). More details are in Lemma 4.

Sparsification. Another bottleneck is that $\mathbf{X}_t$ is likely to become dense through the repeated multiplication $\tilde{\mathcal{L}}_t\mathbf{X}_{t-1}$ (line 4) as time $t$ increases. This could be problematic in terms of space as well as running time, especially for graph convolutions since $\mathbf{X}_t$ is used as an adjacency matrix. To alleviate this issue, we adopt a sparsification technique suggested in Klicpera et al. (2019). As established in Theorem 1, the graph structure of $\mathbf{X}_t$ is spatially and temporally localized, which allows us to drop small entries of $\mathbf{X}_t$, resulting in a sparse matrix. For this purpose, we use a filtering threshold $\epsilon$ to set values of $\mathbf{X}_t$ below $\epsilon$ to zero (line 6). This strategy has two advantages. First, it keeps $\mathbf{X}_t$ sparse at each time. Second, it reduces the cost of computing the next diffusion matrix, since the matrices being multiplied are sparse. After the sparsification, we normalize $\mathbf{X}_t$ (line 7) column-wise. As seen in Figure 4, this sparsification makes the augmentation process fast and lightweight with tiny errors, while it does not harm predictive accuracy much and can even improve it.
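The filtering and renormalization steps can be sketched as follows. The function name `sparsify_and_normalize`, the parameter name `eps`, and the small example matrix are assumptions for illustration; a practical implementation would operate on a sparse matrix format rather than a dense array.

```python
import numpy as np

def sparsify_and_normalize(x, eps=1e-3):
    """Zero out entries below `eps`, then rescale each column to sum to one."""
    x = np.where(x < eps, 0.0, x)            # drop small diffusion weights
    col_sums = x.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0.0] = 1.0          # leave all-zero columns untouched
    return x / col_sums

# Hypothetical column-stochastic diffusion matrix with two tiny entries.
dense = np.array([[0.9000, 0.0005],
                  [0.0995, 0.4995],
                  [0.0005, 0.5000]])
sparse = sparsify_and_normalize(dense, eps=1e-3)
```

Renormalizing after filtering restores the column-stochastic property that thresholding breaks, so the result can still be used as a normalized adjacency matrix.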

Theorem 2 (Complexity Analysis).

For each time step , TiaRa takes time on average, and produces consuming space where is the number of total nodes, is the number of activated nodes at time , is the number of iterations, and is a filtering threshold.


The proof is provided in Appendix A. ∎

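Algorithm 1 TiaRa at time $t$
Input: adjacency matrix $\mathbf{A}_t$, previous time-aware diffusion matrix $\mathbf{X}_{t-1}$, restart probability $\alpha$, time travel probability $\beta$, number $K$ of iterations, filtering threshold $\epsilon$
Output: time-aware diffusion matrix $\mathbf{X}_t$
1: $\tilde{\mathbf{A}}_t \leftarrow \mathbf{D}_t^{-1}\mathbf{A}_t$ where $\mathbf{A}_t$ is self-looped
2: $\tilde{\mathcal{L}}_t \leftarrow$ Power-Iteration($\tilde{\mathbf{A}}_t$, $\alpha$, $\beta$, $K$)
3: $\mathcal{S}_t \leftarrow \tilde{\mathcal{L}}_t$ (Spatial augmenter)
4: $\mathcal{T}_t \leftarrow \tilde{\mathcal{L}}_t\mathbf{X}_{t-1}$ (Temporal augmenter)
5: $\mathbf{X}_t \leftarrow (1-\gamma)\,\mathcal{S}_t + \gamma\,\mathcal{T}_t$ where $\gamma = \beta/(\alpha+\beta)$
6: filter entries of $\mathbf{X}_t$ if their weights are $< \epsilon$
7: normalize $\mathbf{X}_t$ column-wise
9: function Power-Iteration($\tilde{\mathbf{A}}_t$, $\alpha$, $\beta$, $K$)
10:     set and
11:     for $k \leftarrow 1$ to $K$ do
13:      where
14:     normalize column-wise and return
15: end function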
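| Dataset | Nodes | Edges | Time steps | Labels | Avg. activated nodes | Avg. edges per activated node |
|---|---|---|---|---|---|---|
| BitcoinAlpha | 3,783 | 31,748 | 138 | 2 | 105 | 2.2 |
| WikiElec | 7,125 | 212,854 | 100 | 2 | 354 | 6.0 |
| RedditBody | 35,776 | 484,460 | 88 | 2 | 2,465 | 2.2 |
| Brain | 5,000 | 1,955,488 | 12 | 10 | 5,000 | 32.6 |
| DBLP-3 | 4,257 | 23,540 | 10 | 3 | 782 | 3.0 |
| DBLP-5 | 6,606 | 42,815 | 10 | 5 | 1,212 | 3.5 |
| Reddit | 8,291 | 264,050 | 10 | 4 | 2,071 | 12.8 |

Table 1: Summary of datasets. Nodes and Edges are the total numbers of nodes and edges, and Time steps and Labels are the numbers of time steps and labels. Avg. activated nodes is the average number of activated nodes over time, and the last column is the ratio of the average number of edges per time step to the average number of activated nodes. The first 3 datasets are used for link prediction, and the others are for node classification.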

Discussion. Theorem 2 implies TiaRa runs faster than , and uses less space than for storing in most real dynamic graphs. Nevertheless, its time complexity can reach for a graph such as the Brain dataset; thus, for larger graphs, its scalability can be limited. However, TiaRa is based on matrix operations that are easy to accelerate using GPUs, and other diffusion methods such as GDC have the same complexity. Furthermore, there is extensive work on efficient RWR computation Andersen et al. (2006); Jung et al. (2017); Shin et al. (2015); Wang et al. (2017); Hou et al. (2021) and on accelerated multiplication of sparse matrices Srivastava et al. (2020), which can make TiaRa scalable. In this work, we focus on effectively augmenting a dynamic graph, and leave further computational optimization of the augmentation as future work.


Experiments

In this section, we evaluate TiaRa to show its effectiveness for the augmentation problem for dynamic graphs.

Experimental Setting

| AUC | BitcoinAlpha: GCN | BitcoinAlpha: GCRN | BitcoinAlpha: EGCN | WikiElec: GCN | WikiElec: GCRN | WikiElec: EGCN | RedditBody: GCN | RedditBody: GCRN | RedditBody: EGCN |
|---|---|---|---|---|---|---|---|---|---|
| None | 57.3±1.6 | 80.3±6.0 | 58.8±1.1 | 59.9±0.9 | 72.1±2.4 | 66.9±3.7 | 77.6±0.4 | 88.9±0.3 | 77.6±0.2 |
| DropEdge | 56.3±1.0 | 73.9±2.2 | 57.4±0.9 | 50.1±1.0 | 56.0±9.3 | 47.9±6.4 | 73.0±0.4 | 77.0±1.7 | 71.9±0.7 |
| GDC | 57.5±1.6 | 77.3±6.5 | 57.4±1.2 | 62.8±0.8 | 67.9±1.0 | 63.1±0.7 | 74.6±0.0 | 86.4±0.3 | 73.8±0.3 |
| Merge | 66.8±2.6 | 93.1±0.4 | 61.0±9.2 | 60.6±1.7 | 68.4±3.2 | 60.7±1.3 | 69.7±0.7 | 89.8±0.5 | 80.3±0.5 |
| TiaRa | 76.0±1.3 | 94.6±0.8 | 77.2±1.4 | 69.0±1.2 | 73.4±2.2 | 69.1±0.3 | 80.8±0.6 | 90.2±0.4 | 82.0±0.1 |

Table 2: Temporal link prediction accuracy (AUC). None is the result without augmentation, against which improvement or degradation is measured. TiaRa shows consistent improvement across most models and datasets.

Datasets. Table 1 summarizes public datasets used in this work. BitcoinAlpha is a social network between bitcoin users Kumar et al. (2016, 2018b). WikiElec is a voting network for Wikipedia adminship elections Leskovec et al. (2010). RedditBody is a hyperlink network of connections between two subreddits Kumar et al. (2018a). For node classification, we use the following datasets evaluated in Xu et al. (2019). Brain is a network of brain tissues where edges indicate their connectivity. DBLP-3 and DBLP-5 are co-authorship networks extracted from DBLP. Reddit is a post network where two posts are connected if they contain similar keywords.

Baseline augmentation methods. We compare TiaRa to the following baselines. None indicates the result of a model without any augmentation. DropEdge is a drop-based method randomly removing edges at each epoch. GDC is a graph diffusion method where we use PPR for this as our approach is based on random walks. Merge is a simple baseline merging adjacency matrices from time to when training a model at time . We apply DropEdge and GDC to each snapshot since they are designed for a static graph.

Baseline GNNs. We use GCN Kipf and Welling (2017), GCRN Seo et al. (2018) and EvolveGCN Pareja et al. (2020), abbreviated to EGCN, for performing graph tasks. We naively apply a static GCN to each graph snapshot to verify how informative temporal information is. We choose GCRN and EvolveGCN, lightweight and popular dynamic GNN models showing decent performance, to observe practical gains from augmentation. We adopt GCN layers for GCRN's graph convolution. Note that any GNN model following Problem 1 can utilize TiaRa because our approach is model-agnostic.

Training details. For each dataset, we tune the hyperparameters of all models on the original graph (marked as None) and modified graphs of augmentation methods separately through a combination of grid and random search on a validation set, and report test accuracy at the best validation epoch. For TiaRa, we fix to , search for in , and tune and in s.t. . We use the Adam optimizer with weight decay , and the learning rate is tuned in with decay factor . The dropout probability is searched in . We repeat each experiment times with different random seeds, and report the average and standard deviation of test values. We use PyTorch to implement all models. All experiments were done at workstations with Intel Xeon 4215R and RTX 3090. Details about datasets and hyperparameters are reported in Appendix D.

Temporal Link Prediction Task

This aims to predict whether an edge exists or not at time using the information up to time . As a standard setting Pareja et al. (2020), we follow a chronological split with ratios of training (70%), validation (10%), and test (20%) sets. We sample the same amount of negative samples (edges) to positive samples (edges) for each time, and use AUC as a representative measure. We set the number of epochs to with early stopping of patience .
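A chronological split with the stated ratios can be sketched as below. The sketch assumes that the split is taken over snapshot indices, which the text does not spell out, and the function name `chronological_split` is hypothetical; the 138 time steps correspond to BitcoinAlpha in Table 1.

```python
def chronological_split(num_steps, train=0.7, valid=0.1):
    """Split time-step indices 0..num_steps-1 into train/valid/test by time.

    Earlier steps go to training and later steps to validation and test,
    so that no future snapshot is seen during training.
    """
    train_end = int(num_steps * train)
    valid_end = int(num_steps * (train + valid))
    steps = list(range(num_steps))
    return steps[:train_end], steps[train_end:valid_end], steps[valid_end:]

train_steps, valid_steps, test_steps = chronological_split(138)
```

Unlike a random split, this keeps every validation and test snapshot strictly later than the training snapshots, which matches the requirement that predictions use only past information.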

As shown in Table 2, TiaRa consistently improves the performance of dynamic GNN models such as GCRN and EGCN compared to None (i.e., without augmentation) while static augmentations of DropEdge and GDC do not. TiaRa also outperforms the static methods on all models and datasets. This indicates it is not beneficial to only spatially augment the graphs for this task. TiaRa even improves static GCN, which is competitive with EGCN, implying that effectively and temporally augmented data can even make static GNNs learn dynamic graphs well. In addition, Merge also improves the accuracy of the tested models on many datasets. This confirms the need to utilize temporal information when it comes to dynamic graph augmentation in this task. However, Merge performs worse than TiaRa in most cases because TiaRa can effectively augment both spatial and temporal localities at once while Merge does not have a mechanism to enhance such localities.


Table 3: Node classification accuracy (Macro F1-score). None is the result without augmentation, against which improvement or degradation is measured. TiaRa shows consistent improvement across most models and datasets.

| Macro F1 | Brain: GCN | Brain: GCRN | Brain: EGCN | Reddit: GCN | Reddit: GCRN | Reddit: EGCN | DBLP-3: GCN | DBLP-3: GCRN | DBLP-3: EGCN | DBLP-5: GCN | DBLP-5: GCRN | DBLP-5: EGCN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| None | 44.7±0.8 | 66.8±1.0 | 43.4±0.7 | 18.2±2.9 | 40.4±1.6 | 18.6±2.3 | 53.4±2.6 | 83.1±0.6 | 51.3±2.7 | 69.6±0.9 | 75.4±0.7 | 68.5±0.6 |
| DropEdge | 35.2±1.7 | 67.8±0.6 | 39.7±1.8 | 19.4±0.8 | 40.3±1.4 | 18.0±2.7 | 55.8±1.9 | 84.3±0.6 | 52.4±1.7 | 70.5±0.5 | 75.6±0.7 | 68.0±0.7 |
| GDC | 63.2±1.2 | 88.0±1.5 | 67.3±1.3 | 17.5±2.3 | 41.0±1.6 | 18.5±2.8 | 53.4±2.1 | 84.7±0.5 | 52.8±2.2 | 70.0±0.7 | 75.5±1.2 | 69.1±1.0 |
| Merge | 34.4±3.4 | 63.2±1.6 | 53.0±0.9 | 19.3±3.0 | 39.6±0.8 | 20.4±3.0 | 54.9±3.1 | 83.0±1.4 | 53.3±1.2 | 70.8±0.4 | 74.5±0.8 | 69.7±1.6 |
| TiaRa | 68.7±1.2 | 91.3±1.0 | 72.0±0.6 | 18.4±3.0 | 41.5±1.5 | 21.9±1.6 | 57.5±2.2 | 84.9±1.6 | 56.4±1.8 | 71.1±0.6 | 77.9±0.4 | 70.1±1.0 |

Node Classification Task

This task classifies the label of each node where a graph and its features change over time. Following Xu et al. (2019), we split all nodes into training, validation, and test sets by the 7:1:2 ratio. We feed the node embeddings of each model forward to a softmax classifier, and use Macro F1-score because labels are imbalanced in each dataset. We set the number of epochs to with early stopping of patience .

Table 3 shows TiaRa consistently improves the accuracies of GNNs on most datasets. In particular, TiaRa significantly enhances the accuracies on the Brain dataset, as the other diffusion method GDC does, but TiaRa achieves better accuracy than GDC, implying that augmenting a temporal locality is effective for performance. For the other datasets, TiaRa slightly improves each model, but overall it performs better than the other augmentations. Note GCN and EGCN are worse than a random classifier with score $1/L$ (0.25 for $L = 4$) on the Reddit dataset, where $L$ is the number of labels, and all tested augmentations fail to beat that score, implying even these augmentations could not boost a poor model in this task.

Effect of Hyperparameters

We analyze the effects of temporal decay ratio and filtering threshold that mainly affect TiaRa’s results. We fix the number of iterations to for the power iteration, which leads to sufficiently accurate results for .

Figure 3: Effect of the temporal decay ratio .

Effect of the temporal decay ratio . As TiaRa’s hyperparameters, and should be analyzed, but our preliminary experiments showed that patterns vary by models and datasets in the changes of and . Instead, we narrow our focus to in Equation (4) where . For this experiment, we vary from to by tweaking and s.t. is fixed to . Figure 3 shows that too small or large values of can degrade link prediction accuracy except GCRN with TiaRa in BitcoinAlpha. This implies that it is important to properly mix spatial and temporal information about the performance, which is controlled by .

Figure 4: Effect of the filtering threshold .

Effect of the filtering threshold . Figure 4 shows the effects of in terms of approximate error, time, space, and accuracy of link prediction in BitcoinAlpha and WikiElec. We fix and to 0.25, and vary from to for this experiment. We measure the approximate error of eigenvalues where and are vectors of eigenvalues of (i.e., ) and , respectively, as similarly analyzed in Klicpera et al. (2019). The right y-axis of Figures 4(a) and (b) is the error, and the left y-axis is the number of edges in . As time increases, the errors (red and blue lines) remain small, and do not explode, implying errors incurred by repeated sparsifications are not excessively accumulated over time. Rather, the errors tend to be proportional to at each time.

Figures 4(c) and (d) show the space measured by (left y-axis) and the augmentation time (right y-axis) of TiaRa by between and . As the strength of sparsification increases (i.e., becomes larger), the number of produced non-zeros and the augmentation time decrease. On the other hand, most of the accuracies remain similar except as shown in Figures 4(e) and (f). Note that it is not effective to truncate too many entries (e.g., ), or too dense can worsen the performance, as for GCN in WikiElec. Thus, sparsification with a proper such as or provides a good trade-off between error, time, space, and accuracy.
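The trade-off described above can be illustrated with a minimal sketch. It assumes that sparsification means zeroing entries below a threshold and rescaling the surviving columns, and that the input is a generic RWR-style diffusion matrix. The names `sparsify`, `eigenvalue_gap`, `eps`, and `c`, the random graph, and the threshold values are all hypothetical choices for illustration, not the settings or the error measure used in the experiments.

```python
import numpy as np

def column_normalize(M):
    """Scale each column to sum to 1 (every column is assumed non-zero)."""
    return M / M.sum(axis=0, keepdims=True)

def sparsify(X, eps):
    """Zero out entries smaller than eps, then rescale non-empty columns to sum to 1."""
    X_sparse = np.where(X >= eps, X, 0.0)
    col_sums = X_sparse.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0.0] = 1.0  # avoid dividing by zero for emptied columns
    return X_sparse / col_sums

def eigenvalue_gap(X, X_sparse):
    """L2 distance between sorted eigenvalue magnitudes (illustrative measure only)."""
    ev = np.sort(np.abs(np.linalg.eigvals(X)))
    ev_sparse = np.sort(np.abs(np.linalg.eigvals(X_sparse)))
    return float(np.linalg.norm(ev - ev_sparse))

rng = np.random.default_rng(0)
n, c = 30, 0.25  # hypothetical graph size and restart-style weight
A = (rng.random((n, n)) < 0.1).astype(float)
A = column_normalize(np.maximum(A, np.eye(n)))  # self-loops keep every column non-zero

# Dense RWR-style diffusion matrix, used here only as a stand-in input.
X = c * np.linalg.inv(np.eye(n) - (1.0 - c) * A)

thresholds = [0.0, 1e-3, 1e-2, 1e-1]  # hypothetical sweep, not the paper's values
nnz = [int(np.count_nonzero(sparsify(X, eps))) for eps in thresholds]
gaps = [eigenvalue_gap(X, sparsify(X, eps)) for eps in thresholds]

for eps, k, g in zip(thresholds, nnz, gaps):
    print(f"eps={eps:g}  non-zeros={k}  eigenvalue gap={g:.4f}")
```

A larger threshold can only remove entries, so the non-zero count never grows along the sweep. How the eigenvalue gap and downstream accuracy respond depends on the graph, which is why the threshold is tuned empirically.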


Conclusion

In this work, we propose TiaRa, a novel diffusion-based method for augmenting a dynamic graph to improve dynamic GNN models. We first extend Random Walk with Restart to Time-aware RWR so that it produces spatially and temporally localized scores. We then formulate time-aware random walk diffusion matrices, and analyze how our diffusion approach augments both spatial and temporal localities in the dynamic graph. As graph diffusions lead to dense matrices, we further employ approximation techniques such as power iteration and sparsification, and analyze how effectively they achieve a good trade-off between error, time, space, and predictive accuracy. Our experiments on various real-world dynamic graphs show that TiaRa helps GNN models achieve better performance on temporal link prediction and node classification tasks.


References

  • R. Andersen, F. Chung, and K. Lang (2006) Local graph partitioning using pagerank vectors. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), Cited by: Algorithm for TiaRa.
  • M. Defferrard, X. Bresson, and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, Cited by: 2nd item.
  • W. Feng, J. Zhang, Y. Dong, Y. Han, H. Luan, Q. Xu, Q. Yang, E. Kharlamov, and J. Tang (2020) Graph random neural networks for semi-supervised learning on graphs. Advances in neural information processing systems. Cited by: Introduction, Related Work.
  • S. Guo, Y. Lin, N. Feng, C. Song, and H. Wan (2019) Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the AAAI conference on artificial intelligence, Cited by: Appendix B, Introduction.
  • Y. Han, G. Huang, S. Song, L. Yang, H. Wang, and Y. Wang (2021) Dynamic neural networks: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence. Cited by: Introduction.
  • G. Hou, X. Chen, S. Wang, and Z. Wei (2021) Massively parallel algorithms for personalized pagerank. Proceedings of the VLDB Endowment 14 (9), pp. 1668–1680. Cited by: Algorithm for TiaRa.
  • J. Jung, J. Jung, and U. Kang (2021) Learning to walk across time for interpretable temporal knowledge graph completion. In KDD ’21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 786–795. Cited by: Introduction.
  • J. Jung, W. Jin, L. Sael, and U. Kang (2016) Personalized ranking in signed networks using signed random walk with restart. In IEEE 16th International Conference on Data Mining, ICDM, F. Bonchi, J. Domingo-Ferrer, R. Baeza-Yates, Z. Zhou, and X. Wu (Eds.), pp. 973–978. Cited by: Appendix B.
  • J. Jung, N. Park, L. Sael, and U. Kang (2017) BePI: fast and memory-efficient method for billion-scale random walk with restart. In Proceedings of the 2017 ACM SIGMOD International Conference on Management of Data, pp. 789–804. Cited by: Algorithm for TiaRa.
  • J. Jung, J. Yoo, and U. Kang (2022) Signed random walk diffusion for effective representation learning in signed graphs. PLoS ONE 17 (3), pp. e0265001. Cited by: Appendix B.
  • S. M. Kazemi, R. Goel, K. Jain, I. Kobyzev, A. Sethi, P. Forsyth, and P. Poupart (2020) Representation learning for dynamic graphs: a survey. J. Mach. Learn. Res. 21 (70). Cited by: Related Work.
  • T. N. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), Cited by: 1st item, 2nd item, Experimental Setting.
  • J. Klicpera, S. Weißenberger, and S. Günnemann (2019) Diffusion improves graph learning. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Cited by: 2nd item, Appendix D, Introduction, Related Work, Algorithm for TiaRa, Effect of Hyperparameters.
  • S. Kumar, W. L. Hamilton, J. Leskovec, and D. Jurafsky (2018a) Community interaction and conflict on the web. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, Cited by: 3rd item, Experimental Setting.
  • S. Kumar, B. Hooi, D. Makhija, M. Kumar, C. Faloutsos, and V. Subrahmanian (2018b) Rev2: fraudulent user prediction in rating platforms. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, Cited by: 1st item, Experimental Setting.
  • S. Kumar, F. Spezzano, V. Subrahmanian, and C. Faloutsos (2016) Edge weight prediction in weighted signed networks. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 221–230. Cited by: Experimental Setting.
  • D. Lee, K. Shin, and C. Faloutsos (2020) Temporal locality-aware sampling for accurate triangle counting in real graph streams. VLDB J. 29 (6), pp. 1501–1525. Cited by: Introduction.
  • J. Leskovec, D. Huttenlocher, and J. Kleinberg (2010) Predicting positive and negative links in online social networks. In Proceedings of the 19th international conference on World wide web, Cited by: 2nd item, Experimental Setting.
  • H. Nassar, K. Kloster, and D. F. Gleich (2015) Strong localization in personalized pagerank vectors. In International Workshop on Algorithms and Models for the Web-Graph, Cited by: Preliminaries.
  • G. Panagopoulos, G. Nikolentzos, and M. Vazirgiannis (2021) Transfer graph neural networks for pandemic forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Cited by: Introduction.
  • A. Pareja, G. Domeniconi, J. Chen, T. Ma, T. Suzumura, H. Kanezashi, T. Kaler, T. B. Schardl, and C. E. Leiserson (2020) EvolveGCN: evolving graph convolutional networks for dynamic graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, Cited by: 3rd item, Introduction, Related Work, Preliminaries, Experimental Setting, Temporal Link Prediction Task.
  • M. Raghavendra, K. Sharma, S. Kumar, et al. (2022) Signed link representation in continuous-time dynamic signed networks. arXiv preprint arXiv:2207.03408. Cited by: Appendix B.
  • Y. Rong, W. Huang, T. Xu, and J. Huang (2020) DropEdge: towards deep graph convolutional networks on node classification. In International Conference on Learning Representations, Cited by: 1st item, Introduction, Related Work.
  • E. Rossi, B. Chamberlain, F. Frasca, D. Eynard, F. Monti, and M. Bronstein (2020) Temporal graph networks for deep learning on dynamic graphs. In ICML 2020 Workshop on Graph Representation Learning, Cited by: Introduction, Introduction.
  • A. Sankar, Y. Wu, L. Gou, W. Zhang, and H. Yang (2020) DySAT: deep neural representation learning on dynamic graphs via self-attention networks. In Proceedings of the 13th International Conference on Web Search and Data Mining, Cited by: Related Work, Related Work, Preliminaries.
  • Y. Seo, M. Defferrard, P. Vandergheynst, and X. Bresson (2018) Structured sequence modeling with graph convolutional recurrent networks. In International conference on neural information processing, Cited by: 2nd item, Introduction, Related Work, Preliminaries, Experimental Setting.
  • K. Shin, J. Jung, L. Sael, and U. Kang (2015) BEAR: block elimination approach for random walk with restart on large graphs. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1571–1585. Cited by: Algorithm for TiaRa.
  • K. Shin (2017) Wrs: waiting room sampling for accurate triangle counting in real graph streams. In 2017 IEEE International Conference on Data Mining (ICDM), Cited by: Introduction.
  • J. Skarding, B. Gabrys, and K. Musial (2021) Foundations and modeling of dynamic networks using dynamic graph neural networks: a survey. IEEE Access. Cited by: Appendix B, Introduction, Related Work, Preliminaries.
  • N. Srivastava, H. Jin, J. Liu, D. Albonesi, and Z. Zhang (2020) Matraptor: a sparse-sparse matrix multiplication accelerator based on row-wise product. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture, Cited by: Algorithm for TiaRa.
  • H. Tong, C. Faloutsos, and J. Pan (2006) Fast random walk with restart and its applications. In Sixth international conference on data mining (ICDM’06), Cited by: Preliminaries.
  • S. Wang, R. Yang, X. Xiao, Z. Wei, and Y. Yang (2017) FORA: simple and effective approximate single-source personalized pagerank. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 505–514. Cited by: Algorithm for TiaRa.
  • Y. Wang, Y. Cai, Y. Liang, H. Ding, C. Wang, S. Bhatia, and B. Hooi (2021) Adaptive data augmentation on temporal graphs. Advances in Neural Information Processing Systems. Cited by: Related Work.
  • Z. Wu, S. Pan, G. Long, J. Jiang, X. Chang, and C. Zhang (2020) Connecting the dots: multivariate time series forecasting with graph neural networks. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, Cited by: Appendix B, Introduction.
  • D. Xu, C. Ruan, E. Körpeoglu, S. Kumar, and K. Achan (2020) Inductive representation learning on temporal graphs. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020, Cited by: Introduction.
  • D. Xu, W. Cheng, D. Luo, X. Liu, and X. Zhang (2019) Spatio-temporal attentive rnn for node classification in temporal attributed graphs. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI, Cited by: 1st item, 2nd item, 3rd item, Appendix D, Appendix D, Introduction, Related Work, Experimental Setting, Node Classification Task.
  • M. Yang, M. Zhou, M. Kalander, Z. Huang, and I. King (2021) Discrete-time temporal network embedding via implicit hierarchical learning in hyperbolic space. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Cited by: Introduction, Related Work, Related Work, Preliminaries.
  • J. Yoo, H. Jeon, J. Jung, and U. Kang (2022a) Accurate node feature estimation with structured variational graph autoencoder. In KDD ’22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2336–2346. Cited by: Appendix D.
  • J. Yoo, S. Shim, and U. Kang (2022b) Model-agnostic augmentation for accurate graph classification. In WWW ’22: The ACM Web Conference 2022, pp. 1281–1291. Cited by: Introduction.
  • M. Yoon, J. Jung, and U. Kang (2018) Tpa: fast, scalable, and accurate method for approximate random walk with restart on billion scale graphs. In 2018 IEEE 34th International Conference on Data Engineering (ICDE), Cited by: Algorithm for TiaRa.
  • B. Yu, H. Yin, and Z. Zhu (2018) Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. In IJCAI, Cited by: Appendix B.
  • T. Zhao, G. Liu, S. Günnemann, and M. Jiang (2022) Graph data augmentation for graph machine learning: a survey. arXiv preprint arXiv:2202.08871. Cited by: Introduction, Related Work.

Appendix A Proofs

Lemma 1.

For every time step , is column stochastic.


Proof. As a base case, for , which is trivially column stochastic. Assume is column stochastic, i.e., where is a column vector of ones. Then, is written as follows:

where . Therefore, the claim holds for every by mathematical induction. ∎
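The induction above can be checked numerically on a toy example. The sketch below assumes an RWR-style recursion in which the diffusion matrix at each step mixes a walk on the current snapshot, a restart term, and a carry-over from the previous step, starting from the identity. This form and the names `diffuse_step`, `alpha`, and `beta` are illustrative assumptions and are not claimed to reproduce Equation (4) exactly.

```python
import numpy as np

def column_normalize(M):
    """Scale each column to sum to 1 (every column is assumed non-zero)."""
    return M / M.sum(axis=0, keepdims=True)

def diffuse_step(A_t, X_prev, alpha, beta):
    """Solve X_t = (1 - alpha - beta) A_t X_t + alpha I + beta X_prev for X_t.

    Assumed recursion for illustration: A_t is a column-stochastic snapshot,
    alpha weights the restart term, and beta weights the previous matrix.
    """
    n = A_t.shape[0]
    identity = np.eye(n)
    lhs = identity - (1.0 - alpha - beta) * A_t
    rhs = alpha * identity + beta * X_prev
    return np.linalg.solve(lhs, rhs)

rng = np.random.default_rng(1)
n, alpha, beta = 5, 0.25, 0.25  # hypothetical sizes and weights
X = np.eye(n)                   # assumed base case: the identity matrix
column_sums = []
for t in range(4):
    # Hypothetical random snapshot; self-loops keep every column non-zero.
    A_t = column_normalize(rng.random((n, n)) + np.eye(n))
    X = diffuse_step(A_t, X, alpha, beta)
    column_sums.append(X.sum(axis=0))

print(np.round(np.array(column_sums), 6))
```

Under this assumed recursion, multiplying both sides by a row vector of ones shows why each column sum stays at one: the walk term preserves the column sums, and the restart and carry-over weights together supply the remaining mass.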

Proof of Theorem 1. We prove Theorem 1 as follows:


We begin the derivation from the following equation:


As a base case, is ( in the above) as follows:

where . This trivially satisfies Equation (5). Assume that Equation (5) holds at . Then, by substituting of Equation (5) into Equation (4), is:

Suppose ; then, is represented as follows:

Note that the equation of has the same form as that of in Equation (4). This indicates Equation (4) also holds at . Thus, the claim holds for every by mathematical induction. ∎

Lemma 2.

Suppose and . The approximate error of the following power iteration is bounded by , and converges to as :

where is column stochastic, and .
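A small numerical experiment can make the convergence claim concrete. The sketch below assumes the iteration has the standard RWR form, X ← (1 − c) A X + c B, with column-stochastic A and B and the iteration started from B. The symbols `A`, `B`, `c`, and the helper names are illustrative assumptions. Under these assumptions the induced L1 error after k steps is at most 2(1 − c)^k, which is the bound the code checks; it is not claimed to be the exact expression in the lemma.

```python
import numpy as np

def column_normalize(M):
    """Scale each column to sum to 1 (every column is assumed non-zero)."""
    return M / M.sum(axis=0, keepdims=True)

def power_iteration(A, B, c, num_iters):
    """Iterate X <- (1 - c) * A @ X + c * B, starting from X = B."""
    X = B.copy()
    for _ in range(num_iters):
        X = (1.0 - c) * (A @ X) + c * B
    return X

def l1_norm(M):
    """Induced L1 matrix norm: the maximum absolute column sum."""
    return float(np.abs(M).sum(axis=0).max())

rng = np.random.default_rng(0)
n, c = 6, 0.3                             # hypothetical size and weight in (0, 1)
A = column_normalize(rng.random((n, n)))  # hypothetical column-stochastic matrix
B = np.eye(n)                             # hypothetical column-stochastic start

# Exact fixed point of X = (1 - c) A X + c B, computed only for comparison.
X_star = c * np.linalg.solve(np.eye(n) - (1.0 - c) * A, B)

steps = list(range(0, 21, 5))
errors = [l1_norm(power_iteration(A, B, c, k) - X_star) for k in steps]
bounds = [2.0 * (1.0 - c) ** k for k in steps]

for k, err, bound in zip(steps, errors, bounds):
    print(f"k={k:2d}  error={err:.3e}  bound={bound:.3e}")
```

The error contracts by at least a factor of (1 − c) per step because multiplying by a column-stochastic matrix does not increase the induced L1 norm, so a modest fixed number of iterations is typically sufficient in practice.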


Proof. Let be the stationary matrix of the equation. Then, the iteration is represented as , implying . Then, the error is represented as follows:

where is L1 norm of a matrix. Note