Introduction
Dynamic graphs represent various real-world relationships that occur and evolve over time, e.g., friendships in an online social service, citations of scholarly papers, traffic flow on road networks, and financial transactions between traders. Learning such dynamic graphs has recently attracted considerable attention from machine learning communities
Skarding et al. (2021); Han et al. (2021), and plays a crucial role in diverse applications such as link prediction Yang et al. (2021); Pareja et al. (2020), node or edge classification Xu et al. (2019); Pareja et al. (2020), time-series traffic forecasting Wu et al. (2020); Guo et al. (2019), knowledge completion Jung et al. (2021), and pandemic forecasting Panagopoulos et al. (2021). Over recent years, many researchers have put tremendous effort into developing effective methods that carefully fuse GNNs with recurrent neural networks (RNNs) or attention mechanisms for continuous-time Xu et al. (2020); Rossi et al. (2020) and discrete-time Seo et al. (2018); Pareja et al. (2020); Yang et al. (2021) dynamic graphs.

With the astonishing progress of deep neural networks for graph data, diverse augmentation techniques Zhao et al. (2022); Yoo et al. (2022b) have been proposed to increase the generalization power of GNN models, especially on static graphs. Previous approaches mainly transform the topological structure of the input graph. For example, drop-based methods stochastically remove a certain number of edges Rong et al. (2020) or nodes Feng et al. (2020)
at each training epoch in a similar manner to dropout regularization. On the contrary, diffusion-based methods
Klicpera et al. (2019) insert additional edges whose weights are scored by graph diffusions such as Personalized PageRank (PPR), thereby augmenting a spatial locality around each node and improving graph convolution.

However, the aforementioned techniques assume a static graph, and the problem of dynamic graph augmentation has not yet been comprehensively studied, especially for dynamic graphs represented in the discrete-time domain. Unlike static graphs, dynamic graphs change or evolve over time by their nature; thus, dynamic graph augmentation needs to simultaneously consider temporal dynamics as well as spatial structure. More specifically, as verified in previous works Rossi et al. (2020); Shin (2017); Lee et al. (2020), real-world dynamic graphs exhibit temporal locality: graph objects such as nodes and triangles tend to be more affected by recent edges than by older ones, i.e., edges closer to a specific object in time are more likely to provide important information. Naively applying a static augmentation method to each time step cannot capture such temporal locality.
In this work, we propose TiaRa (Time-aware Random Walk Diffusion), a novel diffusion-based augmentation method for a discrete-time dynamic graph, which is represented by a temporal sequence of graph snapshots. TiaRa aims to augment both spatial and temporal localities of each graph snapshot. For this purpose, we design a time-aware random walk in which a surfer randomly moves between nodes or along the time axis to measure spatially and temporally localized scores. We then derive a time-aware random walk diffusion from the scores, and interpret it as the combination of spatial and temporal augmenters. Our diffusion matrices are used as augmented adjacency matrices for any dynamic GNN model in the discrete-time domain. We further adopt approximation techniques such as power iteration and sparsification to reduce the heavy cost of computing the diffusion matrices.
Our contributions are summarized as follows:

Method. We propose TiaRa, a novel method for augmenting a dynamic graph based on time-aware random walks. TiaRa strengthens not only the spatial locality but also the temporal locality of the input dynamic graph so that dynamic GNNs perform better.

Experiments. We demonstrate that TiaRa effectively augments a given dynamic graph, and leads to consistent improvements in GNNs for temporal link prediction and node classification tasks.
Related Work
Augmentation for Static GNNs. Graph data augmentation Zhao et al. (2022) aims to reduce overfitting when training GNN models by modifying an input graph. As representative approaches, DropEdge Rong et al. (2020) stochastically drops edges, and DropNode Feng et al. (2020) removes arbitrary nodes and their adjacent edges at each epoch. These augment the diversity of the input graph by randomly creating different copies sampled from the graph. GDC Klicpera et al. (2019) adds new edges weighted by a graph diffusion derived from node proximities. GDC boosts a spatial locality of the graph so that a GNN can consider distant nodes as well as adjacent ones during its convolutions, enhancing its representation power. Most existing methods, including the aforementioned ones, are of limited use for augmenting dynamic graphs because they do not consider temporal properties.
GNNs and Augmentation for Dynamic Graphs. Dynamic graphs Kazemi et al. (2020) are categorized into two representations: discrete-time dynamic graphs (DTDG) and continuous-time dynamic graphs (CTDG), where a DTDG is represented as a sequence of graph snapshots over multiple discrete time steps while a CTDG is represented as a set of temporal edges whose timestamps have continuous values. It is straightforward to convert a CTDG to a DTDG by distributing the continuous-time edges into multiple bins in chronological order, but the reverse is not possible because continuous-time values are generally lacking in most DTDGs Sankar et al. (2020); Yang et al. (2021), i.e., models for DTDGs can be applied to CTDGs, but the reverse is rather limited. Hence, we narrow our focus to representation learning on DTDGs in this work.
Dynamic GNNs have rapidly advanced under the framework that closely integrates GNNs and temporal sequence models such as RNNs to capture spatial and temporal relations on dynamic graphs Skarding et al. (2021). GCRN Seo et al. (2018) uses a GCN to produce node embeddings on each graph snapshot, and then forwards them to an LSTM for modeling temporal dynamics. STAR Xu et al. (2019) utilizes a GRU combined with spatial and temporal attentions. DySAT Sankar et al. (2020) employs a self-attention strategy to aggregate spatial neighborhoods and temporal dynamics. EvolveGCN Pareja et al. (2020) evolves the parameters of GCNs using RNNs. To consider hierarchical properties in real graphs, HTGN Yang et al. (2021) extends the framework to hyperbolic space.
Compared to the impressive progress of dynamic GNNs, augmenting dynamic graphs with the purpose of improving such models has not yet been extensively explored. As a related method, MeTA Wang et al. (2021) adaptively augments a temporal graph based on predictions of a temporal graph network, perturbing time and removing or adding edges. However, it is difficult to employ MeTA for the aforementioned DTDG models because MeTA is designed for CTDGs, which require continuous-time values.
Preliminaries
Random Walk with Restart (RWR). Our work is closely related to RWR Tong et al. (2006), which measures node-to-node similarity scores w.r.t. a seed node $s$. The scores are spatially localized around $s$ Nassar et al. (2015), i.e., scores of nearby nodes highly associated with $s$ are high while those of distant nodes are low. Based on this, diffusion methods such as GDC exploit RWR to augment a spatial locality.
Let $\mathbf{r} \in \mathbb{R}^{n}$ be a vector of RWR scores w.r.t. the seed node $s$. Given a row-normalized adjacency matrix $\tilde{\mathbf{A}}$ and a restart probability $c$, the vector is represented as follows:

$$\mathbf{r} = (1-c)\,\tilde{\mathbf{A}}^{\top}\mathbf{r} + c\,\mathbf{e}_{s} \;\;\Longleftrightarrow\;\; \mathbf{r} = c\,\mathbf{L}^{-1}\mathbf{e}_{s},$$

where $\mathbf{L} := \mathbf{I}_{n} - (1-c)\tilde{\mathbf{A}}^{\top}$ is a restart-scaled variant of the random-walk normalized Laplacian matrix, and $\mathbf{e}_{s}$ is the $s$-th unit vector. Notice that $c\,\mathbf{L}^{-1}$ is a column-stochastic transition matrix interpreted as a diffusion kernel that diffuses a given distribution such as $\mathbf{e}_{s}$ on the graph through RWR.
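To make the preliminaries concrete, here is a minimal numerical sketch of RWR on a toy graph; the graph, the restart probability, and all variable names are illustrative choices, not the paper's setup.

```python
import numpy as np

# Toy RWR sketch (illustrative graph and names; c = 0.15 is an assumption).
A = np.array([[0., 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
A_tilde = A / A.sum(axis=1, keepdims=True)   # row-normalized adjacency matrix

c = 0.15                                     # restart probability
n = A.shape[0]
e_s = np.zeros(n)
e_s[0] = 1.0                                 # unit vector of the seed node s = 0

# Closed form: r = c (I - (1 - c) A_tilde^T)^{-1} e_s
r = c * np.linalg.solve(np.eye(n) - (1 - c) * A_tilde.T, e_s)

# r is a probability distribution; the seed's score is the largest, while the
# distant low-degree node 3 receives the smallest mass (spatial locality)
assert np.isclose(r.sum(), 1.0)
```

Solving the linear system replaces an explicit matrix inversion; both yield the same scores on such a small example.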
Problem Formulation. A discrete-time dynamic graph (DTDG) is represented as a sequence $\{\mathcal{G}_{1}, \cdots, \mathcal{G}_{T}\}$ of snapshots in chronological order where $T$ is the number of time steps Skarding et al. (2021). Each snapshot $\mathcal{G}_{t} = (\mathcal{V}, \mathcal{E}_{t})$ is a weighted undirected graph with a shared set $\mathcal{V}$ of nodes and a set $\mathcal{E}_{t}$ of edges at time $t$, where $n = |\mathcal{V}|$ is the number of nodes. $\mathbf{X} \in \mathbb{R}^{n \times d}$ is an initial node feature matrix where $d$ is a feature dimension, and $\mathbf{A}_{t}$ denotes the sparse and self-looped adjacency matrix of $\mathcal{G}_{t}$. Node representation learning on the dynamic graph aims to learn a function $f$ parameterized by $\theta$ and produce low-dimensional hidden node embeddings $\mathbf{H}_{t}$ for each time $t$, represented as:

$$\mathbf{H}_{t} = f(\tilde{\mathbf{A}}_{t}, \mathbf{X}; \mathbf{H}_{t-1}, \theta) \qquad (1)$$

where $\tilde{\mathbf{A}}_{t}$ is a normalized adjacency matrix of $\mathbf{A}_{t}$, and $\mathbf{H}_{t-1}$ contains the latest hidden embeddings before time $t$. Note that the definition of the DTDG and the learning scheme of Equation (1) are generally adopted in most existing methods Seo et al. (2018); Pareja et al. (2020); Yang et al. (2021); Sankar et al. (2020) for learning DTDGs, where $f$ is usually designed as a combination of GNNs and RNNs.
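As a minimal illustration of this learning scheme, the sketch below mimics Equation (1) with a one-layer GCN-style aggregation and a naive recurrent update; the weights, dimensions, and the tanh update are placeholder assumptions rather than any particular model from the literature.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_h = 4, 3, 2
X = rng.random((n, d_in))                  # initial node features
W_g = rng.random((d_in, d_h))              # graph-convolution weight (placeholder)
W_h = rng.random((d_h, d_h))               # recurrent weight (placeholder)

def step(A_norm, X, H_prev):
    # H_t = f(A_t, X; H_{t-1}): mix a GCN-style aggregation of the features
    # with the previous hidden state, as dynamic GNNs broadly do
    return np.tanh(A_norm @ X @ W_g + H_prev @ W_h)

A = np.array([[1., 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]])               # self-looped toy adjacency matrix
A_norm = A / A.sum(axis=1, keepdims=True)

H = np.zeros((n, d_h))
for _ in range(3):                         # three time steps, same toy snapshot
    H = step(A_norm, X, H)

assert H.shape == (n, d_h)                 # per-node hidden embeddings
```

Any augmentation method following this scheme only has to swap the normalized adjacency matrix fed to `step` at each time.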
Our goal is to improve the performance of dynamic GNNs by augmenting the input data; the formal definition of the problem is described in Problem 1.
Problem 1 (Dynamic Graph Augmentation).
Given a temporal sequence $\{\mathbf{A}_{1}, \cdots, \mathbf{A}_{T}\}$ of adjacency matrices, the problem is to generate a sequence of new adjacency matrices that improves the performance of a model $f$. ∎
Proposed Method
We depict the overall framework of our method TiaRa in Figure 1. Given the temporal sequence of sparse adjacency matrices in a dynamic graph, TiaRa aims to produce a time-aware random walk diffusion matrix $\mathbf{T}_{t}$ for each time step $t$ using two diffusion-based modules, called the spatial and temporal augmenters.

The spatial augmenter produces a spatial diffusion matrix $\mathbf{S}_{t}$ that enhances a spatial locality of $\mathcal{G}_{t}$ through random walks. The temporal augmenter receives the previous $\mathbf{T}_{t-1}$, which contains information squashed from the initial time to $t-1$, and then disseminates it through $\mathcal{G}_{t}$ at the current time $t$. This leads to a temporal diffusion matrix in which a temporal locality is magnified. Finally, TiaRa linearly combines the two diffusion matrices and sparsifies the result to form $\mathbf{T}_{t}$. We replace each adjacency matrix $\mathbf{A}_{t}$ with $\mathbf{T}_{t}$ for the inputs of dynamic GNN models. If necessary, we simply use the edges of the graph represented by $\mathbf{T}_{t}$ without weights, or make the graph undirected after the sparsification.
Time-aware Random Walk with Restart
As described in the preliminaries section, RWR is used to measure node proximities in a graph. However, it is difficult to directly employ RWR in a dynamic graph because RWR measures only spatially localized scores in a single static graph. In this section, we extend RWR to Time-aware Random Walk with Restart (TRWR) so that TRWR produces node-to-node scores which are spatially and temporally localized in the dynamic graph.
Our simple idea for TRWR is to virtually connect each node in $\mathcal{G}_{t}$ to its identical node in $\mathcal{G}_{t+1}$ for each time step $t$, as shown in Figure 2, so that a random surfer not only moves around the current $\mathcal{G}_{t}$ but also jumps to the next $\mathcal{G}_{t+1}$, making the surfer time-aware. In the beginning, the surfer starts from a seed node $s$ at the initial time step (e.g., $t = 1$). After a few movements, suppose the surfer is at node $u$ in a graph snapshot $\mathcal{G}_{t}$. Then, the surfer takes one of the following actions:

Action 1) Random walk. The surfer randomly moves to one of the neighbors of node $u$ in the current graph $\mathcal{G}_{t}$ with probability $1 - \alpha - \beta$.

Action 2) Restart. The surfer goes back to the seed node $s$ in $\mathcal{G}_{t}$ with probability $\alpha$.

Action 3) Time travel. The surfer travels in time from node $u$ in $\mathcal{G}_{t}$ to the same node in $\mathcal{G}_{t+1}$ with probability $\beta$.
where $\alpha$ and $\beta$ are called restart and time travel probabilities, respectively, and $\alpha + \beta \leq 1$. Note that we do not allow the surfer to move backward from $\mathcal{G}_{t+1}$ to $\mathcal{G}_{t}$ because the future information at time $t+1$ must not be used when we make a prediction for time $t+1$.
Through TRWR, the vector $\mathbf{r}_{t}$ of stationary probabilities that the surfer visits each node from the seed node $s$ in $\mathcal{G}_{t}$ is recursively represented as follows:

$$\mathbf{r}_{t} = (1 - \alpha - \beta)\,\tilde{\mathbf{A}}_{t}^{\top}\mathbf{r}_{t} + \alpha\,\mathbf{e}_{s} + \beta\,\mathbf{r}_{t-1} \qquad (2)$$

where $\mathbf{e}_{s}$ is the $s$-th unit vector of size $n$, and $\tilde{\mathbf{A}}_{t}$ is a row-normalized matrix of $\mathbf{A}_{t}$ (i.e., $\tilde{\mathbf{A}}_{t} = \mathbf{D}_{t}^{-1}\mathbf{A}_{t}$ where $\mathbf{A}_{t}$ is a self-looped adjacency matrix and $\mathbf{D}_{t}$ is the diagonal out-degree matrix of $\mathbf{A}_{t}$). If $t = 0$, we define $\mathbf{r}_{0}$ as $\mathbf{e}_{s}$.
In the above equation, the random walk part propagates the scores of $\mathbf{r}_{t}$ over $\mathcal{G}_{t}$. The restart part makes the scores spatially localized around the seed node $s$, which is controlled by $\alpha$. The time travel part injects the scores of the previous $\mathbf{r}_{t-1}$ to make $\mathbf{r}_{t}$ temporally localized, which is controlled by $\beta$. Notice that TRWR extends RWR to a discrete-time dynamic graph, i.e., $\beta = 0$ leads to RWR scores on each graph snapshot without considering temporal information.
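A small numerical sketch of the TRWR recurrence may help: each snapshot's scores are obtained by solving Equation (2) directly, with the previous scores fed forward. The two toy snapshots, the probabilities, and all names are illustrative assumptions.

```python
import numpy as np

def row_normalize(A):
    A = A + np.eye(A.shape[0])            # add self-loops, as in Equation (2)
    return A / A.sum(axis=1, keepdims=True)

alpha, beta = 0.3, 0.2                    # restart / time travel probabilities
A1 = np.array([[0., 1, 0], [1, 0, 1], [0, 1, 0]])   # snapshot at t = 1
A2 = np.array([[0., 1, 1], [1, 0, 0], [1, 0, 0]])   # snapshot at t = 2
n = 3
e_s = np.eye(n)[0]                        # seed node s = 0
r = e_s                                   # r_0 is defined as e_s

for A in (A1, A2):
    At = row_normalize(A)
    # Solve r_t = (1 - alpha - beta) At^T r_t + alpha e_s + beta r_{t-1}
    L = np.eye(n) - (1 - alpha - beta) * At.T
    r = np.linalg.solve(L, alpha * e_s + beta * r)

assert np.isclose(r.sum(), 1.0)           # r_t stays a probability distribution
```

Summing both sides of Equation (2) shows why the scores remain a distribution: the column-stochastic transition preserves the total mass injected by the restart and time travel parts.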
Time-aware Random Walk Diffusion Matrices
In Equation (2), $\mathbf{r}_{t}$ is an $n$-dimensional column vector of a probability distribution w.r.t. a seed node $s$. For all seed nodes $s \in \mathcal{V}$, we horizontally stack the vectors to form $\mathbf{T}_{t}$ such that $\mathbf{r}_{t}$ is the $s$-th column vector of $\mathbf{T}_{t}$. We call $\mathbf{T}_{t}$ a time-aware random walk diffusion matrix at time $t$. The derivation of $\mathbf{T}_{t}$ starts by moving the random walk term to the left side in Equation (2) as follows:

$$\big(\mathbf{I}_{n} - (1 - \alpha - \beta)\tilde{\mathbf{A}}_{t}^{\top}\big)\,\mathbf{r}_{t} = \alpha\,\mathbf{e}_{s} + \beta\,\mathbf{r}_{t-1}.$$

Let $\mathbf{L}_{t} := \mathbf{I}_{n} - (1 - \alpha - \beta)\tilde{\mathbf{A}}_{t}^{\top}$ where $\mathbf{I}_{n}$ is an identity matrix, and stack the columns for all seed nodes as described above. Thus, $\mathbf{T}_{t}$ of the above equation is written as the following:

$$\mathbf{T}_{t} = \alpha\,\mathbf{L}_{t}^{-1} + \beta\,\mathbf{L}_{t}^{-1}\mathbf{T}_{t-1} \qquad (3)$$

where the relation holds for $t \geq 1$, and $\mathbf{T}_{0} = \mathbf{I}_{n}$ because $\mathbf{r}_{0}$ is defined as $\mathbf{e}_{s}$.
Spatial and Temporal Augmenters. We obtain the recurrence relation of $\mathbf{T}_{t}$ from Equation (3), and further rearrange it to interpret the process as follows:

$$\mathbf{T}_{t} = \frac{\alpha}{\alpha + \beta}\,\mathbf{S}_{t} + \frac{\beta}{\alpha + \beta}\,\mathbf{S}_{t}\mathbf{T}_{t-1}.$$

In the above, we set $\mathbf{S}_{t} := (\alpha + \beta)\,\mathbf{L}_{t}^{-1}$, which is the diffusion kernel by RWR on the graph $\mathcal{G}_{t}$ where its restart probability is $\alpha + \beta$. Let $\tilde{\beta} := \beta / (\alpha + \beta)$ where $0 \leq \tilde{\beta} \leq 1$; then, $\mathbf{T}_{t}$ is represented as follows:

$$\mathbf{T}_{t} = (1 - \tilde{\beta})\,\mathbf{S}_{t} + \tilde{\beta}\,\mathbf{S}_{t}\mathbf{T}_{t-1} \qquad (4)$$

where $\mathbf{S}_{t}$ is a spatial diffusion matrix, and $\mathbf{S}_{t}\mathbf{T}_{t-1}$ is a temporal diffusion matrix. In TiaRa, the spatial augmenter computes $\mathbf{S}_{t}$, and the temporal augmenter computes $\mathbf{S}_{t}\mathbf{T}_{t-1}$.
The meaning of $\mathbf{S}_{t}$ is the result of diffusing the $s$-th column $\mathbf{e}_{s}$ of $\mathbf{I}_{n}$ through $\mathbf{S}_{t}$ for each node $s$. This is interpreted as the augmentation of a spatial locality of each node through RWR within $\mathcal{G}_{t}$. On the other hand, $\mathbf{S}_{t}\mathbf{T}_{t-1}$ is the result of diffusing the $s$-th column $\mathbf{r}_{t-1}$ of $\mathbf{T}_{t-1}$ through $\mathbf{S}_{t}$ for each node $s$. Note that $\mathbf{r}_{t-1}$ contains the probabilities that the surfer visits each node starting from node $s$ during the travel from the initial time to $t-1$. Thus, it spreads the past proximities of $\mathbf{T}_{t-1}$ in the current $\mathcal{G}_{t}$ through $\mathbf{S}_{t}$, which consequently reflects the temporal information in $\mathbf{T}_{t}$.
The final diffusion matrix $\mathbf{T}_{t}$ is a convex combination of $\mathbf{S}_{t}$ and $\mathbf{S}_{t}\mathbf{T}_{t-1}$ w.r.t. $\tilde{\beta}$, which is depicted in Figure 1. Notice that $\mathbf{T}_{t}$ is a column-stochastic transition matrix for every time step $t$, which is proved in Lemma 1, implying that, as an augmented adjacency matrix, $\mathbf{T}_{t}$ can replace $\tilde{\mathbf{A}}_{t}$ for the input of GNNs in Equation (1).
Interpretation. We further analyze how $\mathbf{T}_{t}$ reflects the spatial and temporal information of the input dynamic graph. For this purpose, we first obtain the closed-form expression of $\mathbf{T}_{t}$, which is described in Theorem 1.
Theorem 1.
The closed-form expression of $\mathbf{T}_{t}$ is:

$$\mathbf{T}_{t} = \sum_{i=1}^{t} w_{i}\,\mathbf{S}_{t:i}, \quad \text{where } \mathbf{S}_{t:i} := \mathbf{S}_{t}\mathbf{S}_{t-1}\cdots\mathbf{S}_{i} \qquad (5)$$

where $w_{i} = (1 - \tilde{\beta})\,\tilde{\beta}^{\,t-i}$ for $i > 1$, and $w_{1} = \tilde{\beta}^{\,t-1}$.
Proof.
It is proved by mathematical induction, and the detailed proof is described in Appendix A. ∎
In the theorem, $\mathbf{S}_{t:i}$ indicates a random walk diffusion traveling from the former time $i$ toward the latter time $t$. According to Equation (5), $\mathbf{T}_{t}$ is concisely represented as a weighted sum of the diffusions $\mathbf{S}_{t:i}$ over all times $i \leq t$.
Note that $\mathbf{T}_{t}$ is more affected by the information close to time $t$ than by that passed from the distant past. The influence of $\mathbf{S}_{t:i}$ is decayed by $\tilde{\beta}^{\,t-i}$ as time $i$ moves further away from time $t$, while it is emphasized as time $i$ nears time $t$, where $\tilde{\beta}$ is interpreted as a temporal decay ratio. This explanation is consistent with the temporal locality, i.e., the tendency that recent edges are more influential than older ones. Combined with the spatial diffusion $\mathbf{S}_{t}$, the resulting $\mathbf{T}_{t}$ augments both spatial and temporal localities in $\mathcal{G}_{t}$.
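The closed form of Theorem 1 can be checked numerically against the recurrence of Equation (4) on random toy kernels; the sizes, the seed, and the decay ratio below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def ppr_kernel(A, c):
    A = A + np.eye(len(A))                 # self-loops
    At = A / A.sum(axis=1, keepdims=True)
    return c * np.linalg.inv(np.eye(len(A)) - (1 - c) * At.T)

n, t_max, beta_t = 4, 3, 0.4
S = [ppr_kernel((rng.random((n, n)) < 0.5).astype(float), 0.5)
     for _ in range(t_max)]                # kernels S_1, ..., S_t

# Recurrence of Equation (4)
T = np.eye(n)                              # T_0 = I
for S_t in S:
    T = (1 - beta_t) * S_t + beta_t * S_t @ T

# Closed form of Theorem 1: T_t = sum_{i=1}^{t} w_i * (S_t S_{t-1} ... S_i)
T_closed = np.zeros((n, n))
prod = np.eye(n)
for i in range(t_max, 0, -1):              # i = t, t-1, ..., 1
    prod = prod @ S[i - 1]                 # now prod = S_t ... S_i
    w = beta_t ** (t_max - 1) if i == 1 else (1 - beta_t) * beta_t ** (t_max - i)
    T_closed += w * prod

assert np.allclose(T, T_closed)
```

Unrolling the recurrence by hand for t = 2 gives (1 - b) S_2 + b S_2 S_1, matching the weights w_2 = 1 - b and w_1 = b of the closed form.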
Discussion. TiaRa is a generalization of GDC with the PPR kernel to dynamic graphs, since TiaRa with $\beta = 0$ spatially augments data within a single $\mathcal{G}_{t}$ at each time, which is exactly what GDC does. However, GDC does not consider temporal information for its augmentation, and it performs worse than TiaRa as shown in Tables 2 and 3. Discussions on the addition and deletion of nodes and edges w.r.t. augmentation are provided in Appendix B.
Algorithm for TiaRa
Most graph diffusions involve a heavy computational cost, especially for a large graph, and result in a dense matrix. The computation of $\mathbf{T}_{t}$ exhibits the same issue, and thus we adopt approximation techniques to alleviate the problem. Including these approximation strategies, the procedure of TiaRa is summarized in Algorithm 1, where $\tilde{\beta}$ is set to $\beta/(\alpha+\beta)$.
Power iteration. The main bottleneck in obtaining $\mathbf{T}_{t}$ is computing the matrix inversion of $\mathbf{L}_{t}$ in Equation (4), which requires $O(n^{3})$ time. Instead of directly calculating the inversion, we use power iteration based on the following geometric series Yoon et al. (2018):

$$\mathbf{S}_{t} = (\alpha + \beta)\,\mathbf{L}_{t}^{-1} = c \sum_{k=0}^{\infty} \big((1 - c)\,\tilde{\mathbf{A}}_{t}^{\top}\big)^{k},$$

where $c = \alpha + \beta$, and $K$ is the number of iterations. Let $\mathbf{S}_{t}^{(k)}$ be the result after $k$ iterations; then, it is recursively represented as follows:

$$\mathbf{S}_{t}^{(k)} = (1 - c)\,\tilde{\mathbf{A}}_{t}^{\top}\mathbf{S}_{t}^{(k-1)} + c\,\mathbf{I}_{n},$$

where $\mathbf{S}_{t}^{(0)} = c\,\mathbf{I}_{n}$. Note that $\tilde{\mathbf{A}}_{t}$ is a normalized adjacency matrix (line 1) to which self-loops are added, as traditional GNNs usually do. The approximation error is bounded by $(1-c)^{K+1}$, and $\mathbf{S}_{t}^{(K)}$ converges to $\mathbf{S}_{t}$ as $K \to \infty$, as shown in Lemma 2. After that, we set $\mathbf{S}_{t} \leftarrow \mathbf{S}_{t}^{(K)}$ (line 13).
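A quick sketch of this power iteration on a toy snapshot, with the truncation error measured per column; the graph, c = 0.5, and K = 20 are arbitrary assumptions, and the initialization S^(0) = cI follows the geometric-series form above.

```python
import numpy as np

A = np.array([[0., 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
A = A + np.eye(4)                          # self-loops
At = A / A.sum(axis=1, keepdims=True)      # row-normalized adjacency matrix
c = 0.5                                    # restart probability (assumed)
n = 4

S_exact = c * np.linalg.inv(np.eye(n) - (1 - c) * At.T)

S_k = c * np.eye(n)                        # S^(0) = c I
K = 20
for _ in range(K):
    S_k = (1 - c) * At.T @ S_k + c * np.eye(n)

# per-column L1 truncation error; the geometric tail gives (1 - c)^(K + 1)
err = np.abs(S_exact - S_k).sum(axis=0).max()
assert err <= (1 - c) ** (K + 1) + 1e-12
```

Since the transition matrix is column-stochastic, each discarded term contributes exactly c(1 - c)^k of column mass, which sums to the stated bound.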
At a glance, each iteration seems to take $O(n^{3})$ time for the matrix multiplication, but it is much faster than that in practice since each snapshot is sparse in most cases. More specifically, only a few nodes form edges at each time step in real graphs. We call such nodes activated, where $\mathcal{V}_{t}$ is the set of activated nodes at time $t$ and $n_{t} = |\mathcal{V}_{t}|$. In each $\mathcal{G}_{t}$, a surfer can move only between activated nodes, i.e., only pairs of nodes in $\mathcal{V}_{t}$ are diffused. As seen in Table 1, the average of $n_{t}$ over time is smaller than $n$ except for the Brain dataset.
This allows us to run the power iteration on the submatrix of $\tilde{\mathbf{A}}_{t}$ for the nodes in $\mathcal{V}_{t}$, where $m_{t}$ denotes the number of nonzeros of $\mathbf{A}_{t}$. Then, an iteration takes $O(m_{t} n_{t})$ time for a sparse matrix multiplication. Note that $m_{t}$ is linearly proportional to $n_{t}$ in real graphs, i.e., $m_{t} = b\,n_{t}$ where $b$ is a constant. As seen in Table 1, the average of $b$ over time is small. Thus, each iteration takes $O(b\,n_{t}^{2})$ time; overall, the power iteration takes $O(K b\,n_{t}^{2})$ time and $O(n_{t}^{2})$ space for $\mathbf{S}_{t}$ (as only $n_{t}$ nodes are diffused). More details are given in Lemma 4.
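The restriction to activated nodes amounts to slicing out the submatrix over nodes that touch any edge at time t; a tiny sketch with illustrative names:

```python
import numpy as np

A = np.zeros((6, 6))
A[0, 1] = A[1, 0] = A[1, 2] = A[2, 1] = 1.0   # only nodes {0, 1, 2} have edges

active = np.unique(np.nonzero(A)[0])           # activated nodes at this step
sub = A[np.ix_(active, active)]                # submatrix the surfer can reach

assert sub.shape == (len(active), len(active))
```

Running the iteration on `sub` instead of `A` shrinks both the multiplication cost and the memory of the intermediate diffusion.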
Sparsification. Another bottleneck is that $\mathbf{T}_{t}$ is likely to become dense through the repeated multiplications (line 4) as time increases. This could be problematic in terms of space as well as running time, especially for graph convolutions, since $\mathbf{T}_{t}$ is used as an adjacency matrix. To alleviate this issue, we adopt a sparsification technique suggested in Klicpera et al. (2019). As established in Theorem 1, the graph structure of $\mathbf{T}_{t}$ is spatially and temporally localized, which allows us to drop small entries of $\mathbf{T}_{t}$, resulting in a sparse matrix. For this purpose, we use a filtering threshold $\epsilon$ to set values of $\mathbf{T}_{t}$ below $\epsilon$ to zero (line 6). This strategy has two advantages. First, it keeps $\mathbf{T}_{t}$ sparse at each time. Second, it reduces the cost of the subsequent processing since the matrices being multiplied remain sparse. After the sparsification, we normalize $\mathbf{T}_{t}$ column-wise (line 7). As seen in Figure 4, this sparsification makes the augmentation process fast and lightweight with tiny errors, while it does not harm predictive accuracy much and can even improve it.
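The sparsification step itself is short: threshold, then renormalize each column so the matrix stays column-stochastic. The dense toy matrix and the threshold value below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.random((5, 5))
T = T / T.sum(axis=0, keepdims=True)       # a column-stochastic diffusion matrix

eps = 0.1                                  # filtering threshold (assumed value)
T_sparse = np.where(T < eps, 0.0, T)       # drop small entries
T_sparse = T_sparse / T_sparse.sum(axis=0, keepdims=True)   # renormalize columns

assert np.allclose(T_sparse.sum(axis=0), 1.0)
```

Because each column sums to one, at most 1/eps entries per column can exceed the threshold, which is what keeps the matrix sparse after filtering.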
Theorem 2 (Complexity Analysis).
For each time step $t$, TiaRa takes $O(K m_{t} n_{t})$ time on average, and produces $\mathbf{T}_{t}$ consuming $O(n/\epsilon)$ space, where $n$ is the number of total nodes, $n_{t}$ is the number of activated nodes at time $t$, $K$ is the number of iterations, and $\epsilon$ is a filtering threshold.
Proof.
The proof is provided in Appendix A. ∎
Datasets  # nodes  # edges  # time steps  # labels  avg. $n_t$  avg. $b$
BitcoinAlpha  3,783  31,748  138  2  105  2.2
WikiElec  7,125  212,854  100  2  354  6.0
RedditBody  35,776  484,460  88  2  2,465  2.2
Brain  5,000  1,955,488  12  10  5,000  32.6
DBLP3  4,257  23,540  10  3  782  3.0
DBLP5  6,606  42,815  10  5  1,212  3.5
Reddit  8,291  264,050  10  4  2,071  12.8
Discussion. Theorem 2 implies that TiaRa runs faster, and uses less space for storing $\mathbf{T}_{t}$, than a naive dense computation in most real dynamic graphs. Nevertheless, its time complexity can approach that of the dense case for a graph such as the Brain dataset where almost all nodes are activated at every step; thus, for larger graphs, its scalability can be limited. However, TiaRa is based on matrix operations which are easy to accelerate using GPUs, and other diffusion methods such as GDC have the same complexity. Furthermore, there are extensive works on efficient RWR computation Andersen et al. (2006); Jung et al. (2017); Shin et al. (2015); Wang et al. (2017); Hou et al. (2021) and accelerated multiplications of sparse matrices Srivastava et al. (2020), which can make TiaRa scalable. In this work, we focus on effectively augmenting a dynamic graph, and leave further computational optimization of the augmentation as future work.
Experiment
In this section, we evaluate TiaRa to show its effectiveness on the dynamic graph augmentation problem.
Experimental Setting
Datasets. Table 1 summarizes the public datasets used in this work. BitcoinAlpha is a social network between Bitcoin users Kumar et al. (2016, 2018b). WikiElec is a voting network for Wikipedia adminship elections Leskovec et al. (2010). RedditBody is a hyperlink network of connections between two subreddits Kumar et al. (2018a). For node classification, we use the following datasets evaluated in Xu et al. (2019). Brain is a network of brain tissues where edges indicate their connectivities. DBLP3 and DBLP5 are co-authorship networks extracted from DBLP. Reddit is a post network where two posts are connected if they contain similar keywords.
Baseline augmentation methods. We compare TiaRa to the following baselines. None indicates the result of a model without any augmentation. DropEdge is a drop-based method randomly removing edges at each epoch. GDC is a graph diffusion method, for which we use PPR since our approach is based on random walks. Merge is a simple baseline merging the adjacency matrices from time $1$ to $t$ when training a model at time $t$. We apply DropEdge and GDC to each snapshot since they are designed for a static graph.
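For concreteness, the Merge baseline can be sketched as an element-wise union of all snapshots up to the current step; whether the original baseline takes a union or a weighted sum is not specified here, so the union below is an assumption.

```python
import numpy as np

snapshots = [np.array([[0., 1], [1, 0]]),   # edge (0, 1) at t = 1
             np.array([[0., 0], [0, 0]]),   # no edges at t = 2
             np.array([[0., 1], [1, 0]])]   # edge (0, 1) again at t = 3

def merge_upto(snapshots, t):
    """Element-wise union of the adjacency matrices A_1, ..., A_t."""
    merged = np.zeros_like(snapshots[0])
    for A in snapshots[:t]:
        merged = np.maximum(merged, (A > 0).astype(float))
    return merged

# at t = 2, the edge observed at t = 1 is still present in the merged graph
assert merge_upto(snapshots, 2)[0, 1] == 1.0
```

Unlike TiaRa, this treats every past edge equally, with no spatial diffusion and no temporal decay.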
Baseline GNNs. We use GCN Kipf and Welling (2017), GCRN Seo et al. (2018), and EvolveGCN Pareja et al. (2020), abbreviated to EGCN, for performing the graph tasks. We naively apply a static GCN to each graph snapshot to verify how informative the temporal information is. We choose GCRN and EvolveGCN, lightweight and popular dynamic GNN models showing decent performance, to observe practical gains from augmentation. We adopt GCN layers for GCRN's graph convolution. Note that any GNN model following Problem 1 can utilize TiaRa because our approach is model-agnostic.
Training details.
For each dataset, we tune the hyperparameters of all models on the original graph (marked as None) and the modified graphs of the augmentation methods separately through a combination of grid and random search on a validation set, and report test accuracy at the best validation epoch. For TiaRa, we fix the number of iterations $K$, search for the filtering threshold $\epsilon$, and tune $\alpha$ and $\beta$ s.t. $\alpha + \beta \leq 1$. We use the Adam optimizer with weight decay, and the learning rate is tuned with a decay factor. The dropout probability is also searched. We repeat each experiment with different random seeds, and report the average and standard deviation of the test values. We use PyTorch to implement all models. All experiments were done on workstations with Intel Xeon 4215R CPUs and RTX 3090 GPUs. Details about datasets and hyperparameters are reported in Appendix D.

Temporal Link Prediction Task
This task aims to predict whether an edge exists at time $t+1$ using the information up to time $t$. As a standard setting Pareja et al. (2020), we follow a chronological split with ratios of training (70%), validation (10%), and test (20%) sets. We sample the same number of negative samples (non-edges) as positive samples (edges) for each time, and use AUC as a representative measure. We cap the number of training epochs and use early stopping on the validation set.
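The evaluation protocol can be sketched as follows: a chronological split of the time steps and per-step sampling of as many negative pairs as positive edges. The helper names and the rejection-sampling strategy are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T_steps = 10
train_end = int(0.7 * T_steps)            # steps 0..6 used for training
val_end = int(0.8 * T_steps)              # step 7 for validation, rest for test

def sample_negatives(edges, n, k, rng):
    """Sample k node pairs that do not appear in `edges`."""
    edge_set = set(edges)
    neg = []
    while len(neg) < k:
        u, v = rng.integers(0, n, size=2)
        if u != v and (u, v) not in edge_set:
            neg.append((int(u), int(v)))
    return neg

pos = [(0, 1), (1, 2)]                    # positive edges at one time step
neg = sample_negatives(pos, n=5, k=len(pos), rng=rng)
assert len(neg) == len(pos)
```

Balancing positives and negatives per step keeps AUC comparable across time steps with very different edge counts.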
As shown in Table 2, TiaRa consistently improves the performance of dynamic GNN models such as GCRN and EGCN compared to None (i.e., without augmentation), while the static augmentations DropEdge and GDC do not. TiaRa also outperforms the static methods on all models and datasets. This indicates that it is not beneficial to augment the graphs only spatially for this task. TiaRa even improves static GCN, which then becomes competitive with EGCN, implying that effectively and temporally augmented data can make even static GNNs learn dynamic graphs well. In addition, Merge improves the accuracy of the tested models on many datasets. This confirms the need to utilize temporal information for dynamic graph augmentation in this task. However, Merge performs worse than TiaRa in most cases because TiaRa can effectively augment both spatial and temporal localities at once, while Merge has no mechanism to enhance such localities.
Macro F1  Brain  DBLP3  DBLP5  Reddit  

GCN  GCRN  EGCN  GCN  GCRN  EGCN  GCN  GCRN  EGCN  GCN  GCRN  EGCN  
None  44.7±0.8  66.8±1.0  43.4±0.7  18.2±2.9  40.4±1.6  18.6±2.3  53.4±2.6  83.1±0.6  51.3±2.7  69.6±0.9  75.4±0.7  68.5±0.6 
DropEdge  35.2±1.7  67.8±0.6  39.7±1.8  19.4±0.8  40.3±1.4  18.0±2.7  55.8±1.9  84.3±0.6  52.4±1.7  70.5±0.5  75.6±0.7  68.0±0.7 
GDC  63.2±1.2  88.0±1.5  67.3±1.3  17.5±2.3  41.0±1.6  18.5±2.8  53.4±2.1  84.7±0.5  52.8±2.2  70.0±0.7  75.5±1.2  69.1±1.0 
Merge  34.4±3.4  63.2±1.6  53.0±0.9  19.3±3.0  39.6±0.8  20.4±3.0  54.9±3.1  83.0±1.4  53.3±1.2  70.8±0.4  74.5±0.8  69.7±1.6 
TiaRa  68.7±1.2  91.3±1.0  72.0±0.6  18.4±3.0  41.5±1.5  21.9±1.6  57.5±2.2  84.9±1.6  56.4±1.8  71.1±0.6  77.9±0.4  70.1±1.0 
Node Classification Task
This task is to classify the label of each node where the graph and features change over time. Following
Xu et al. (2019), we split all nodes into training, validation, and test sets by a 7:1:2 ratio. We feed the node embeddings of each model to a softmax classifier, and use the Macro F1-score because labels are imbalanced in each dataset. We cap the number of training epochs and use early stopping.

Table 3 shows that TiaRa consistently improves the accuracies of GNNs on most datasets. In particular, TiaRa significantly enhances the accuracies on the Brain dataset, as the other diffusion method GDC does, but TiaRa shows better accuracy than GDC, implying that augmenting a temporal locality is effective for the performance. For the other datasets, TiaRa slightly improves each model, but it overall performs better than the other augmentations. Note that GCN and EGCN are worse than a random classifier of score $1/L$ (0.25 for Reddit) on the Reddit dataset, where $L$ is the number of labels, and all tested augmentations fail to beat that score, implying that even these augmentations could not boost a poor model in this task.
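As a side note on the metric choice, Macro F1 averages per-class F1 scores with equal weight, so a classifier that ignores a rare class is penalized; the tiny sketch below illustrates this with an assumed label distribution.

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))

y_true = np.array([0, 0, 0, 0, 1])        # imbalanced: class 1 is rare
always_majority = np.zeros(5, dtype=int)  # a classifier ignoring the rare class
assert macro_f1(y_true, always_majority, 2) < 0.5
```

Even though the majority classifier is 80% accurate here, its Macro F1 is pulled down by the zero F1 of the rare class, which is exactly the behavior desired on imbalanced datasets.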
Effect of Hyperparameters
We analyze the effects of the temporal decay ratio $\tilde{\beta}$ and the filtering threshold $\epsilon$ that mainly affect TiaRa's results. We fix the number of iterations $K$ for the power iteration to a value that leads to sufficiently accurate results.
Effect of the temporal decay ratio $\tilde{\beta}$. As TiaRa's hyperparameters, $\alpha$ and $\beta$ should be analyzed, but our preliminary experiments showed that the patterns under changes of $\alpha$ and $\beta$ vary by models and datasets. Instead, we narrow our focus to $\tilde{\beta}$ in Equation (4) where $\tilde{\beta} = \beta/(\alpha+\beta)$. For this experiment, we vary $\tilde{\beta}$ from $0$ to $1$ by tweaking $\alpha$ and $\beta$ s.t. $\alpha + \beta$ is fixed. Figure 3 shows that too small or too large values of $\tilde{\beta}$ can degrade link prediction accuracy, except for GCRN with TiaRa on BitcoinAlpha. This implies that it is important for the performance to properly mix spatial and temporal information, which is controlled by $\tilde{\beta}$.
Effect of the filtering threshold $\epsilon$. Figure 4 shows the effects of $\epsilon$ in terms of approximation error, time, space, and accuracy of link prediction on BitcoinAlpha and WikiElec. We fix $\alpha$ and $\beta$ to 0.25, and vary $\epsilon$ for this experiment. We measure the approximation error of eigenvalues $\|\boldsymbol{\lambda} - \tilde{\boldsymbol{\lambda}}\|$, where $\boldsymbol{\lambda}$ and $\tilde{\boldsymbol{\lambda}}$ are vectors of eigenvalues of the exact $\mathbf{T}_{t}$ (i.e., $\epsilon = 0$) and the sparsified $\mathbf{T}_{t}$, respectively, as similarly analyzed in Klicpera et al. (2019). The right y-axis of Figures 4(a) and (b) is the error, and the left y-axis is the number of edges in $\mathbf{T}_{t}$. As time increases, the errors (red and blue lines) remain small and do not explode, implying that the errors incurred by repeated sparsifications are not excessively accumulated over time. Rather, the errors tend to be proportional to $\epsilon$ at each time.

Figures 4(c) and (d) show the space measured by the number of nonzeros (left y-axis) and the augmentation time (right y-axis) of TiaRa as $\epsilon$ varies. As the strength of the sparsification increases (i.e., $\epsilon$ becomes larger), the produced nonzeros and the augmentation time decrease. On the other hand, most of the accuracies remain similar, as shown in Figures 4(e) and (f). Note that it is not effective to truncate too many entries (i.e., too large an $\epsilon$), while a too dense $\mathbf{T}_{t}$ can worsen the performance, as for GCN on WikiElec. Thus, the sparsification with a proper $\epsilon$ provides a good trade-off between error, time, space, and accuracy.
Conclusion
In this work, we propose TiaRa, a novel diffusion-based method for augmenting a dynamic graph with the purpose of improving dynamic GNN models. We first extend Random Walk with Restart to Time-aware RWR so that it produces spatially and temporally localized scores. We then formulate time-aware random walk diffusion matrices, and analyze how our diffusion approach augments both spatial and temporal localities in the dynamic graph. As graph diffusions lead to dense matrices, we further employ approximation techniques such as power iteration and sparsification, and analyze how they achieve a good trade-off between error, time, space, and predictive accuracy. Our experiments on various real-world dynamic graphs show that TiaRa helps GNN models provide better performance on temporal link prediction and node classification tasks.
References
Local graph partitioning using PageRank vectors. In 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2006).
Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems.
Graph random neural networks for semi-supervised learning on graphs. In Advances in Neural Information Processing Systems.
Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence.
Dynamic neural networks: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Massively parallel algorithms for personalized PageRank. Proceedings of the VLDB Endowment 14(9), pp. 1668-1680.
Learning to walk across time for interpretable temporal knowledge graph completion. In KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 786-795.
Personalized ranking in signed networks using signed random walk with restart. In IEEE 16th International Conference on Data Mining (ICDM), pp. 973-978.
BePI: fast and memory-efficient method for billion-scale random walk with restart. In Proceedings of the 2017 ACM SIGMOD International Conference on Management of Data, pp. 789-804.
Signed random walk diffusion for effective representation learning in signed graphs. PLOS ONE 17(3), e0265001.
Representation learning for dynamic graphs: a survey. Journal of Machine Learning Research 21(70).
Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR).
Diffusion improves graph learning. In Proceedings of the 33rd International Conference on Neural Information Processing Systems.
Community interaction and conflict on the web. In Proceedings of the 2018 World Wide Web Conference.
REV2: fraudulent user prediction in rating platforms. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining.
Edge weight prediction in weighted signed networks. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 221-230.
Temporal locality-aware sampling for accurate triangle counting in real graph streams. The VLDB Journal 29(6), pp. 1501-1525.
Predicting positive and negative links in online social networks. In Proceedings of the 19th International Conference on World Wide Web.
Strong localization in personalized PageRank vectors. In International Workshop on Algorithms and Models for the Web Graph.
Transfer graph neural networks for pandemic forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence.
EvolveGCN: evolving graph convolutional networks for dynamic graphs. In Proceedings of the AAAI Conference on Artificial Intelligence.
Signed link representation in continuous-time dynamic signed networks. arXiv preprint arXiv:2207.03408.
DropEdge: towards deep graph convolutional networks on node classification. In International Conference on Learning Representations.
Temporal graph networks for deep learning on dynamic graphs. In ICML 2020 Workshop on Graph Representation Learning.
DySAT: deep neural representation learning on dynamic graphs via self-attention networks. In Proceedings of the 13th International Conference on Web Search and Data Mining.
Structured sequence modeling with graph convolutional recurrent networks. In International Conference on Neural Information Processing.
BEAR: block elimination approach for random walk with restart on large graphs. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1571-1585.
WRS: waiting room sampling for accurate triangle counting in real graph streams. In 2017 IEEE International Conference on Data Mining (ICDM).
Foundations and modeling of dynamic networks using dynamic graph neural networks: a survey. IEEE Access.
 Matraptor: a sparsesparse matrix multiplication accelerator based on rowwise product. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture, Cited by: Algorithm for TiaRa.
 Fast random walk with restart and its applications. In Sixth international conference on data mining (ICDM’06), Cited by: Preliminaries.
 FORA: simple and effective approximate singlesource personalized pagerank. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 505–514. Cited by: Algorithm for TiaRa.
 Adaptive data augmentation on temporal graphs. Advances in Neural Information Processing Systems. Cited by: Related Work.
 Connecting the dots: multivariate time series forecasting with graph neural networks. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, Cited by: Appendix B, Introduction.
 Inductive representation learning on temporal graphs. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 2630, 2020, Cited by: Introduction.
 Spatiotemporal attentive rnn for node classification in temporal attributed graphs. In Proceedings of the TwentyEighth International Joint Conference on Artificial Intelligence, IJCAI, Cited by: 1st item, 2nd item, 3rd item, Appendix D, Appendix D, Introduction, Related Work, Experimental Setting, Node Classification Task.
 Discretetime temporal network embedding via implicit hierarchical learning in hyperbolic space. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Cited by: Introduction, Related Work, Related Work, Preliminaries.

Accurate node feature estimation with structured variational graph autoencoder
. In KDD ’22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2336–2346. Cited by: Appendix D.  Modelagnostic augmentation for accurate graph classification. In WWW ’22: The ACM Web Conference 2022, pp. 1281–1291. Cited by: Introduction.
 Tpa: fast, scalable, and accurate method for approximate random walk with restart on billion scale graphs. In 2018 IEEE 34th International Conference on Data Engineering (ICDE), Cited by: Algorithm for TiaRa.
 Spatiotemporal graph convolutional networks: a deep learning framework for traffic forecasting. In IJCAI, Cited by: Appendix B.
 Graph data augmentation for graph machine learning: a survey. arXiv preprint arXiv:2202.08871. Cited by: Introduction, Related Work.
Appendix A Proofs
Lemma 1.
For every time step $t$, $\tilde{\mathbf{A}}_{t}$ is column stochastic.
Proof.
As a base case, the claim holds for $t = 1$, since $\tilde{\mathbf{A}}_{1}$ is trivially column stochastic by construction. Assume $\tilde{\mathbf{A}}_{t-1}$ is column stochastic, i.e., $\mathbf{1}^{\top}\tilde{\mathbf{A}}_{t-1} = \mathbf{1}^{\top}$ where $\mathbf{1}$ is a column vector of ones. Then, left-multiplying the definition of $\tilde{\mathbf{A}}_{t}$ by $\mathbf{1}^{\top}$ and applying the inductive hypothesis yields $\mathbf{1}^{\top}\tilde{\mathbf{A}}_{t} = \mathbf{1}^{\top}$, i.e., every column of $\tilde{\mathbf{A}}_{t}$ sums to $1$. Therefore, the claim holds for every $t$ by mathematical induction. ∎
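The inductive step relies only on the fact that products and convex combinations of column-stochastic matrices remain column stochastic. The sketch below checks this numerically; it is a minimal illustration, not the paper's implementation, and the particular update combining `A_prev` and `A_curr` (with a hypothetical weight `beta`) is a stand-in, since only the column-stochasticity of the ingredients matters for the check.

```python
import numpy as np

def random_column_stochastic(n, rng):
    """Random nonnegative n x n matrix whose columns each sum to 1."""
    M = rng.random((n, n))
    return M / M.sum(axis=0, keepdims=True)

rng = np.random.default_rng(7)
n, beta = 6, 0.3

# Inductive hypothesis: the previous matrix is column stochastic.
A_prev = random_column_stochastic(n, rng)
# Column-stochastic transition matrix of the current snapshot.
A_curr = random_column_stochastic(n, rng)

# Hypothetical update: a convex combination of column-stochastic terms.
# 1^T (A_curr A_prev) = (1^T A_curr) A_prev = 1^T A_prev = 1^T, and
# 1^T ((1-b) P + b Q) = (1-b) 1^T + b 1^T = 1^T.
A_t = (1 - beta) * (A_curr @ A_prev) + beta * A_curr

ones = np.ones(n)
# 1^T M = 1^T is exactly the column-stochasticity condition.
assert np.allclose(ones @ A_prev, ones)
assert np.allclose(ones @ A_t, ones)
print("column sums of A_t:", np.round(A_t.sum(axis=0), 6))
```

Any concrete definition of the update that is built from column-stochastic blocks in this way passes the same check, which is why the induction goes through regardless of the snapshot.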
Proof.
We begin the derivation from the following equation:
(4) 
As a base case, is ( in the above) as follows:
where . This trivially satisfies Equation (5). Let us assume that Equation (5) holds at . Then, by substituting of Equation (5) into Equation (4), is:
Suppose ; then, is represented as follows:
Lemma 2.
Suppose $0 < c < 1$ and let $\mathbf{X}^{(0)}$ be an arbitrary initial matrix. The approximate error of the following power iteration is bounded by $(1-c)^{k} \lVert \mathbf{X}^{(0)} - \mathbf{X}^{(\infty)} \rVert_{1}$, and converges to $0$ as $k \rightarrow \infty$:
$\mathbf{X}^{(k)} \leftarrow (1-c)\tilde{\mathbf{A}}\mathbf{X}^{(k-1)} + c\mathbf{I}$
where $\tilde{\mathbf{A}}$ is column stochastic, and $\mathbf{X}^{(\infty)}$ denotes the exact solution of the iteration.
Proof.
Let $\mathbf{X}^{(\infty)}$ be the stationary matrix of the equation, i.e., $\mathbf{X}^{(\infty)} = (1-c)\tilde{\mathbf{A}}\mathbf{X}^{(\infty)} + c\mathbf{I}$. Then, the iteration is represented as $\mathbf{X}^{(k)} - \mathbf{X}^{(\infty)} = (1-c)\tilde{\mathbf{A}}(\mathbf{X}^{(k-1)} - \mathbf{X}^{(\infty)})$, implying $\mathbf{X}^{(k)} - \mathbf{X}^{(\infty)} = (1-c)^{k}\tilde{\mathbf{A}}^{k}(\mathbf{X}^{(0)} - \mathbf{X}^{(\infty)})$. Then, the error is represented as follows:
$\lVert \mathbf{X}^{(k)} - \mathbf{X}^{(\infty)} \rVert_{1} = (1-c)^{k} \lVert \tilde{\mathbf{A}}^{k}(\mathbf{X}^{(0)} - \mathbf{X}^{(\infty)}) \rVert_{1} \leq (1-c)^{k} \lVert \mathbf{X}^{(0)} - \mathbf{X}^{(\infty)} \rVert_{1}$
where $\lVert \cdot \rVert_{1}$ is the L1 norm of a matrix. Note