1. Introduction
The study of centrality measures is a fundamental primitive in the analysis of networked datasets (Borgatti2006; Newman2010), and plays a key role in social network analysis (Das2018). A centrality measure informally captures how important a node is for a given network according to structural properties of the network. Central nodes are crucial in many applications such as analyses of coauthorship networks (Liu2005; Yan2009), biological networks (Wuchty2003; Koschuetzki2008), and ontology summarization (Zhang2007).
One of the most important centrality measures is the betweenness centrality (Freeman1977; Freeman1978), which informally captures the fraction of shortest paths going through a specific node. The betweenness centrality has found applications in many scenarios such as community detection (Fortunato2010), link prediction (Ahmad2020), and network vulnerability analysis (Holme2002). The exact computation of the betweenness centrality of each node of a network is an extremely challenging task on modern networks, both in terms of running time and memory costs. Therefore, sampling algorithms have been proposed to provide provable high-quality approximations of the betweenness centrality values, while remarkably reducing the computational costs (Riondato2016; Riondato2018; Brandes2007).
Modern networks, in addition to being large, also carry richer information about their edges. In particular, one of the most important and easily accessible pieces of information is the time at which edges occur. Such networks are often called temporal networks (Holme2019). The analysis of temporal networks provides novel insights compared to those obtained from static networks (i.e., networks without temporal information), as, for example, in the study of subgraph patterns (Paranjape2017; Kovanen2011), community detection (Lehmann2019), and network clustering (Fu2020). As for static networks, the study of the temporal betweenness centrality in temporal networks aims at identifying the nodes that are visited by a high number of optimal paths (Holme2012; Buss2020). In temporal networks, the definition of optimal paths has to consider the information about the timing of the edges, making the possible definitions of optimal paths much richer than in static networks (Rymar2021).
In this work, a temporal path is valid if it is time respecting, i.e., if all the interactions within the path occur at increasing timestamps (see Figures 1(b) and 1(c)). We consider two different optimality criteria for temporal paths, chosen for their relevance (Holme2012): (i) the shortest temporal path (STP) criterion, a commonly used criterion for which a path is optimal if it uses the minimum number of interactions to connect a given pair of nodes; (ii) the restless temporal path (RTP) criterion, for which a path is optimal if, in addition to being shortest, all its consecutive interactions occur within at most a given user-specified time duration parameter $\delta$ (see Figure 1(c)). The RTP criterion finds application, for example, in the study of spreading processes over complex networks (Pan2011), where information about the timing of consecutive interactions is fundamental. The exact computation of the temporal betweenness centrality under the STP and RTP optimality criteria becomes impractical (both in terms of running time and memory usage) for even moderately-sized networks. Furthermore, as for static networks, obtaining a high-quality approximation of the temporal betweenness centrality of a node is often sufficient in many applications. Thus, we propose ONBRA, the first algorithm to compute rigOrous estimatioN of temporal Betweenness centRality values in temporAl networks (footnote 1: https://vec.wikipedia.org/wiki/Onbra), providing sharp guarantees on the quality of its output. As for many data-mining algorithms, ONBRA's output is a function of two parameters: $\varepsilon \in (0,1)$, controlling the estimates' accuracy; and $\eta \in (0,1)$, controlling the confidence. The algorithmic problems arising from accounting for temporal information are considerably more challenging than in the static network scenario, even though ONBRA shares a high-level sampling strategy similar to (Riondato2018).
Finally, we show that in practice ONBRA, besides providing high-quality estimates while reducing computational costs, also enables analyses that cannot otherwise be performed with existing state-of-the-art algorithms. Our main contributions are the following:

We propose ONBRA, the first sampling-based algorithm that outputs high-quality approximations of the temporal betweenness centrality values of the nodes of a temporal network. ONBRA leverages an advanced data-dependent and variance-aware concentration inequality to provide sharp probabilistic guarantees on the quality of its estimates.

We show that ONBRA is able to compute high-quality temporal betweenness estimates for two optimality criteria of the paths, i.e., the STP and RTP criteria. In particular, we developed specific algorithms for ONBRA to address the computation of the estimates according to such optimality criteria.

We perform an extensive experimental evaluation with several goals: (i) under the STP criterion, show that studying the temporal betweenness centrality provides novel insights compared to the static version; (ii) under the STP criterion, show that ONBRA provides high-quality estimates, while significantly reducing the computational costs compared to the state-of-the-art exact algorithm, and that it enables the study of large datasets that cannot practically be analyzed by the existing exact algorithm; (iii) show that ONBRA is able to estimate the temporal betweenness centrality under the RTP optimality criterion by varying $\delta$.

2. Preliminaries
In this section we introduce the fundamental notions needed throughout the development of our work and formalize the problem of approximating the temporal betweenness centrality of the nodes in a temporal network.
We start by introducing temporal networks.
Definition 2.1.
A temporal network $T$ is a pair $T = (V, E)$, where $V$ is a set of $n$ nodes (or vertices), and $E = \{(u, v, t) : u, v \in V, u \neq v, t \in \mathbb{R}^{+}\}$ is a set of $m$ directed edges. (Footnote 2: ONBRA can be easily adapted to work on undirected temporal networks with minor modifications. Footnote 3: W.l.o.g. we assume the edges to be sorted by increasing timestamps.)
Each edge $e = (u, v, t) \in E$ of the network represents an interaction from node $u \in V$ to node $v \in V$ at time $t$, which is the timestamp of the edge. Figure 1(a) (left) provides an example of a temporal network $T$. Next, we define temporal paths.
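Definition 2.2.
Given a temporal network $T$, a temporal path $P$ is a sequence $P = \langle e_1 = (u_1, v_1, t_1), e_2 = (u_2, v_2, t_2), \dots, e_k = (u_k, v_k, t_k) \rangle$ of edges of $T$ ordered by increasing timestamps, i.e., $t_i < t_{i+1}$ for $i \in \{1, \dots, k-1\}$, such that the node $v_i$ of edge $e_i$ is equal to the node $u_{i+1}$ of the consecutive edge $e_{i+1}$, i.e., $v_i = u_{i+1}$ for $i \in \{1, \dots, k-1\}$, and each node $v \in V$ is visited by $P$ at most once. (Footnote 4: Our work can be easily adapted to deal with non-strict ascending timestamps, i.e., with $\le$ constraints.)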
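Given a temporal path $P$ made of $k$ edges, we define its length as $|P| = k$. An example of temporal path of length is given by Figure 1(c). Given a source node $s \in V$ and a destination node $z \in V$, $z \neq s$, a shortest temporal path between $s$ and $z$ is a temporal path $P_{s,z}$ of length $|P_{s,z}|$ such that in $T$ there is no temporal path $P'_{s,z}$ connecting $s$ to $z$ of length $|P'_{s,z}| < |P_{s,z}|$. Given a shortest temporal path $P_{s,z}$ connecting $s$ and $z$, we define $\mathrm{Int}(P_{s,z})$ as the set of nodes internal to the path $P_{s,z}$, i.e., the nodes visited by $P_{s,z}$ other than $s$ and $z$. Let $\sigma^{sh}_{s,z}$ be the number of shortest temporal paths between nodes $s$ and $z$. Given a node $v \in V$, we denote with $\sigma^{sh}_{s,z}(v)$ the number of shortest temporal paths connecting $s$ and $z$ for which $v$ is an internal node, i.e., $\sigma^{sh}_{s,z}(v) = |\{P_{s,z} : v \in \mathrm{Int}(P_{s,z})\}|$. Now we introduce the temporal betweenness centrality of a node $v \in V$, which intuitively captures the fraction of shortest temporal paths visiting $v$.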
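The conditions of Definition 2.2 can be checked mechanically. The following is a minimal Python sketch, not part of the original implementation; the edge representation as `(u, v, t)` tuples and the function name `is_temporal_path` are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical edge representation: (u, v, t) is an interaction from u to v at time t.
Edge = Tuple[str, str, float]


def is_temporal_path(edges: List[Edge]) -> bool:
    """Check the three conditions of a temporal path:
    strictly increasing timestamps, consecutive edges that chain
    (v_i == u_{i+1}), and every node visited at most once."""
    if not edges:
        return False
    visited = {edges[0][0]}  # the source node counts as visited
    for i, (u, v, t) in enumerate(edges):
        if i > 0:
            _, prev_v, prev_t = edges[i - 1]
            if prev_v != u or prev_t >= t:
                return False  # broken chain or non-increasing timestamp
        if v in visited:
            return False  # a node would be visited twice
        visited.add(v)
    return True
```

The length of a valid path is then simply `len(edges)`.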
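Definition 2.3.
We define the temporal betweenness centrality $b(v)$ of a node $v \in V$ as
$$b(v) = \frac{1}{n(n-1)} \sum_{\substack{s, z \in V \\ s \neq z}} \frac{\sigma^{sh}_{s,z}(v)}{\sigma^{sh}_{s,z}},$$
where a term of the sum is taken to be $0$ when $\sigma^{sh}_{s,z} = 0$.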
Let $B(T) = \{(v, b(v)) : v \in V\}$ be the set of pairs composed of a node and its temporal betweenness value. Since the exact computation of the set $B(T)$ using state-of-the-art exact algorithms, e.g., (Buss2020; Rymar2021), is impractical on even moderately-sized temporal networks (see Section 5 for experimental evaluations), in our work we aim at providing high-quality approximations of the temporal betweenness centrality values of all the nodes of the temporal network. That is, we compute the set $\tilde{B}(T) = \{(v, \tilde{b}(v)) : v \in V\}$, where $\tilde{b}(v)$ is an accurate estimate of $b(v)$, controlled by two parameters $\varepsilon, \eta \in (0,1)$ (accuracy and confidence). We want $\tilde{B}(T)$ to be an absolute $(\varepsilon, \eta)$-approximation set of $B(T)$, as commonly adopted in data-mining algorithms (e.g., in (Riondato2018)): that is, $\tilde{B}(T)$ is an approximation set such that
$$\mathbb{P}\Big[\sup_{v \in V} |\tilde{b}(v) - b(v)| \le \varepsilon\Big] \ge 1 - \eta.$$
Note that in an absolute $(\varepsilon, \eta)$-approximation set, for each node $v \in V$, the estimate $\tilde{b}(v)$ of the temporal betweenness value deviates from the actual value $b(v)$ by at most $\varepsilon$, with probability at least $1 - \eta$. Finally, let us state the main computational problem addressed in this work.
Problem 1.
Given a temporal network $T$ and two parameters $\varepsilon, \eta \in (0,1)$, compute the set $\tilde{B}(T)$, i.e., an absolute $(\varepsilon, \eta)$-approximation set of $B(T)$.
3. Related Works
Given the importance of the betweenness centrality for network analysis, many algorithms have been proposed to compute it in different scenarios. In this section we focus on those scenarios most relevant to our work, grouped as follows.
Approximation Algorithms for Static Networks. Recently, many algorithms to approximate the betweenness centrality in static networks have been proposed, most of which employ randomized sampling approaches (Riondato2016; Riondato2018; Brandes2007). The existing algorithms differ from each other mainly in the sampling strategy they adopt and in the probabilistic guarantees they offer. Among these works, the one that shares similar ideas to our work is (Riondato2018) by Riondato and Upfal, where the authors proposed to sample pairs of nodes $(s, z)$, compute all the shortest paths from $s$ to $z$, and update the estimates of the betweenness centrality values of the nodes internal to such paths. The authors developed a suite of algorithms to output an approximation set of the set of betweenness centrality values. Their work cannot be easily adapted to temporal networks. In fact, static and temporal paths in general are not related in any way, and the temporal scenario introduces many novel challenges: (i) computing the optimal temporal paths, and (ii) updating the betweenness centrality values. Therefore, our algorithm employs the idea of the estimator provided by (Riondato2018), while using novel algorithms designed for the context of temporal networks. Furthermore, the probabilistic guarantees provided by our algorithm leverage the variance of the estimates, differently from (Riondato2018), which used bounds based on the Rademacher averages. Our choice to use a variance-aware concentration inequality is motivated by the recent interest in providing sharp guarantees employing the empirical variance of the estimates (Cousins2021; Pellegrina2021).
Algorithms for Dynamic Networks. In this setting the algorithm keeps track of the betweenness centrality value of each node for every timestamp observed in the network (Lee2012; Hanauer2021). Note that this is extremely different from estimating the temporal betweenness centrality values in temporal networks. In the dynamic scenario the paths considered are not required to be time respecting. For example, in the dynamic scenario, if we consider the network in Figure 1(a) (left) at any time , the shortest path from to is the one highlighted in purple in Figure 1(a) (right). Instead, in the temporal setting such path is not time respecting. We think that it is very challenging to adapt the algorithms for dynamic networks to work in the context of temporal networks, which further motivates us to propose ONBRA.
Exact Algorithms for Temporal Networks. Several exact approaches have been proposed in the literature (Tsalouchidou2020; Alsayed2015; Kim2012). The algorithm most relevant to our work was presented in (Buss2020), where the authors extended the well-known Brandes algorithm (Brandes2001) to the temporal network scenario considering the STP criterion (among several other criteria). They showed that the time complexity of their algorithm is , which is often impractical on even moderately-sized networks. Recently, (Rymar2021) discussed conditions on temporal paths under which the temporal betweenness centrality can be computed in polynomial time, showing a general algorithm running in even under the RTP criterion, which is again very far from being practical on modern networks.
We conclude by observing that, to the best of our knowledge, no approximation algorithms exist for estimating the temporal betweenness centrality in temporal networks.
4. Method, Algorithm, and Analysis
In this section we discuss ONBRA, our novel algorithm for computing high-quality approximations of the temporal betweenness centrality values of the nodes of a temporal network. We first discuss the sampling strategy used in ONBRA, then we present the algorithm, and finally we show the theoretical guarantees on the quality of the estimates of ONBRA.
4.1. ONBRA: Sampling Strategy
In this section we discuss the sampling strategy adopted by ONBRA, which is independent of the optimality criterion of the paths. However, for the sake of presentation, we discuss the sampling strategy for the STP-based temporal betweenness centrality estimation.
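ONBRA samples pairs of nodes $(s, z)$ and computes all the shortest temporal paths from $s$ to $z$. More formally, let $D = \{(u, v) \in V \times V : u \neq v\}$, and let $\ell \in \mathbb{N}^{+}$, $\ell \ge 2$, be a user-specified parameter. ONBRA first collects $\ell$ pairs of nodes $(s_i, z_i)$, $i = 1, \dots, \ell$, sampled uniformly at random from $D$. Next, for each pair $(s, z)$ it computes $\Gamma_{s,z}$, i.e., the set of shortest temporal paths from $s$ to $z$. Then, for each node $v$ s.t. $v \in \mathrm{Int}(P)$ with $P \in \Gamma_{s,z}$, i.e., for each node that is internal to a shortest temporal path of $\Gamma_{s,z}$, ONBRA computes the estimate $\tilde{b}'(v) = \sigma^{sh}_{s,z}(v) / \sigma^{sh}_{s,z}$, which is an unbiased estimator of the temporal betweenness centrality value $b(v)$ (i.e., $\mathbb{E}[\tilde{b}'(v)] = b(v)$, see Lemma A.1 in Appendix A). Finally, after processing the $\ell$ pairs of nodes randomly selected, ONBRA computes for each node $v \in V$ the (unbiased) estimate $\tilde{b}(v)$ of the actual temporal betweenness centrality $b(v)$ by averaging over the $\ell$ sampling steps: $\tilde{b}(v) = \frac{1}{\ell} \sum_{i=1}^{\ell} \tilde{b}'(v)_i$, where $\tilde{b}'(v)_i$ is the estimate of $b(v)$ obtained by analyzing the $i$-th sample, $i \in \{1, \dots, \ell\}$. We will discuss the theoretical guarantees on the quality of $\tilde{b}(v)$ in Section 4.4.
4.2. Algorithm Description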
Sampling Algorithm:
ONBRA is presented in Algorithm 1. We first initialize the set $D$ of objects to be sampled, where each object is a pair of distinct nodes from $V$. Next, we initialize the matrix (with one entry per node and per iteration) that stores the estimates $\tilde{b}'(v)_i$ of $b(v)$ for each node $v$ at the various iterations, needed to compute their empirical variance and the final estimates. Then we start the main loop, which iterates $\ell$ times. In each iteration we first select a pair $(s, z)$ sampled uniformly at random from $D$. We then compute all the shortest temporal paths from $s$ to $z$ by executing Algorithm 2, which is described in detail later in this section. This algorithm computes all the shortest temporal paths from $s$ to $z$, adopting some pruning criteria to speed up the computation. If at least one STP between $s$ and $z$ exists, then for each node $v$ internal to a path in $\Gamma_{s,z}$ we update the corresponding estimate for the current iteration by computing $\tilde{b}'(v)$ using Algorithm 3. While in static networks this step can be done with a simple recursive formula (Riondato2018), in our scenario we need a specific algorithm to deal with the more challenging fact that a node may appear at different distances from a given source across different shortest temporal paths. We discuss this algorithm in detail later in this section. At the end of the $\ell$ iterations of the main loop, ONBRA computes: (i) the set $\tilde{B}$ of unbiased estimates; and (ii) a tight bound $\xi$ on $\sup_{v \in V} |\tilde{b}(v) - b(v)|$, which leverages the empirical variance of the estimates. We observe that $\xi$ is such that the set $\tilde{B}$ is an absolute $(\xi, \eta)$-approximation set of $B(T)$. We discuss the computation of this bound in Section 4.4. Finally, ONBRA returns $(\tilde{B}, \xi)$.
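The main loop described above can be illustrated with a small Python sketch. It is not the authors' implementation: to stay self-contained it replaces Algorithms 2 and 3 with a brute-force enumeration of temporal paths (only viable on tiny networks), and the function names, the `seed` parameter and the `(u, v, t)` edge tuples are assumptions.

```python
import random
from itertools import permutations


def temporal_paths(edges, s, z):
    """Brute force: enumerate every temporal path from s to z as a node list."""
    paths = []

    def extend(node, time, nodes):
        if node == z:
            paths.append(nodes)
            return
        for u, v, t in edges:
            # follow only later edges that do not revisit a node
            if u == node and t > time and v not in nodes:
                extend(v, t, nodes + [v])

    extend(s, float("-inf"), [s])
    return paths


def pair_estimate(edges, s, z):
    """Per-sample estimate: fraction of shortest temporal s-z paths through each node."""
    paths = temporal_paths(edges, s, z)
    if not paths:
        return {}
    d = min(len(p) for p in paths)
    shortest = [p for p in paths if len(p) == d]
    est = {}
    for p in shortest:
        for v in p[1:-1]:  # internal nodes only
            est[v] = est.get(v, 0.0) + 1.0 / len(shortest)
    return est


def sample_estimates(edges, nodes, ell, seed=0):
    """Draw ell pairs uniformly from D and average the per-sample estimates."""
    rng = random.Random(seed)
    pairs = list(permutations(nodes, 2))  # the set D of ordered pairs of distinct nodes
    per_sample = {v: [0.0] * ell for v in nodes}
    for i in range(ell):
        s, z = rng.choice(pairs)
        for v, value in pair_estimate(edges, s, z).items():
            per_sample[v][i] = value
    estimates = {v: sum(vals) / ell for v, vals in per_sample.items()}
    return estimates, per_sample
```

The per-sample values are kept (rather than only their running sum) because the bound of Section 4.4 needs their empirical variance.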
Subroutines
We now describe the subroutines employed in Algorithm 1 focusing on the STP criterion. Then, in Section 4.3, we discuss how to deal with the RTP criterion.
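Source Destination Shortest Paths Computation. We start by introducing some definitions needed throughout this section. First, we say that a pair $(v, t) \in V \times \mathbb{R}^{+}$ is a vertex appearance (VA) if $\exists (u, v, t) \in E$. Next, given a VA $(v, t)$ we say that a VA $(w, t')$ is a predecessor of $(v, t)$ if $\exists (w, v, t) \in E$ with $t' < t$. Finally, given a VA $(v, t)$ we define its set of out-neighbouring VAs as $\{(w, t') : \exists (v, w, t') \in E, t < t'\}$.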
We now describe Algorithm 2, which computes the shortest temporal paths between a source node $s$ and a destination node $z$ (it is invoked by ONBRA in Algorithm 1). The computation is optimized to prune the search space once the destination $z$ has been found. The algorithm initializes the data structures needed to keep track of the shortest temporal paths that, starting from $s$, reach a node in $V$, i.e., two arrays that contain, for each node $v$, respectively, the minimum distance to reach $v$ and the number of shortest temporal paths reaching $v$. We also initialize three structures indexed by VAs: one that keeps track of the minimum distance of a VA from the source $s$, one that maintains the number of shortest temporal paths reaching a VA from $s$, and one keeping the set of predecessors of a VA across the shortest temporal paths explored. After initializing the values of the data structures for the source $s$ and a variable keeping the minimum distance to reach $z$, we initialize the queue that keeps the VAs to be visited in a BFS fashion (observe that, since the temporal paths need to be time-respecting, all the paths need to account for the time at which each node is visited). Next, the algorithm explores the network in BFS order, extracting a VA from the queue, which corresponds to a node and the time at which that node is visited, and processing it by collecting its set of out-neighbouring VAs. If an out-neighbouring VA $(w, t')$ was not already explored, then we update the minimum distance to reach $w$ at time $t'$, the minimum distance of the vertex $w$ if it was not already visited, and, if $w$ is the destination node $z$, the minimum distance to reach $z$. Observe that the distance to reach $z$ is used as a pruning criterion (clearly, if a VA appears at a distance greater than the minimum distance to reach $z$, then it cannot be on a shortest temporal path from $s$ to $z$).
After updating the VAs to be visited by inserting them in the queue, if the current temporal path is shortest for the VA analyzed, we update the number of shortest temporal paths leading to it, its set of predecessors, and the number of shortest temporal paths reaching the node.
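A BFS over vertex appearances of the kind described above can be sketched as follows. This is a simplified illustration, not the paper's Algorithm 2: it keeps only the per-VA structures, it assumes distinct `(u, v, t)` edge tuples, and it uses a hypothetical start time of minus infinity for the source.

```python
from collections import defaultdict, deque


def count_shortest_temporal_paths(edges, s, z):
    """BFS over vertex appearances (node, time).
    Returns (d_min, total, sigma, preds): the length of a shortest temporal
    path from s to z (None if z is unreachable), the number of such paths,
    the per-VA path counts, and the per-VA predecessor sets."""
    out = defaultdict(list)
    for u, v, t in edges:
        out[u].append((v, t))

    source = (s, float("-inf"))  # assumption: the source is "visited" before any edge
    dist = {source: 0}
    sigma = {source: 1}
    preds = defaultdict(set)
    d_min = None
    queue = deque([source])

    while queue:
        va = queue.popleft()
        node, time = va
        if d_min is not None and dist[va] >= d_min:
            continue  # pruning: extending this VA cannot yield a shortest path to z
        for w, t in out[node]:
            if t <= time or w == s:
                continue  # not time respecting, or returns to the source
            nxt = (w, t)
            if nxt not in dist:  # first time this VA is reached
                dist[nxt] = dist[va] + 1
                sigma[nxt] = 0
                queue.append(nxt)
                if w == z and d_min is None:
                    d_min = dist[nxt]
            if dist[nxt] == dist[va] + 1:  # the current path is shortest for nxt
                sigma[nxt] += sigma[va]
                preds[nxt].add(va)

    total = sum(sigma[va] for va in dist if va[0] == z and dist[va] == d_min)
    return d_min, total, sigma, preds
```

Indexing the search by (node, time) rather than by node is what lets the same node carry different distances at different arrival times.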
Update Estimates: STP criterion. Now we describe Algorithm 3, which updates the temporal betweenness estimates of each node internal to a path in $\Gamma_{s,z}$ already computed. With Algorithm 2 we computed for each VA $(w, t)$ the number of shortest temporal paths from $s$ reaching $(w, t)$. Now, in Algorithm 3 we need to combine such counts to compute the total number of shortest temporal paths leading to each VA appearing in a path in $\Gamma_{s,z}$, allowing us to compute the estimate of $b(v)$ for each node $v \in V$.
At the end of Algorithm 2 there are in total $\sigma^{sh}_{s,z}$ shortest temporal paths reaching $z$ from $s$. Now we need to compute, for each node $w$ internal to a path in $\Gamma_{s,z}$ and for each VA $(w, t)$, the number of shortest temporal paths leading from $w$ to $z$ at a time greater than $t$. Then, the fraction of paths containing the node $w$ is computed with a simple formula: for each VA $(w, t)$ on a path in $\Gamma_{s,z}$, multiply the number of shortest temporal paths from $s$ reaching $(w, t)$ by the number of shortest temporal paths leading from $(w, t)$ to $z$, sum these products over the VAs of $w$, and divide by $\sigma^{sh}_{s,z}$. The whole procedure is described in Algorithm 3. We start by initializing the structure that stores for each VA $(w, t)$ the number of shortest temporal paths reaching $z$ at a time greater than $t$ starting from $w$, and a boolean matrix that keeps track for each VA of whether it has already been considered. We then initialize a queue that will be used to explore the VAs appearing along the paths in $\Gamma_{s,z}$ in reverse order of distance from $s$, starting from the destination node $z$. Then we initialize the count for each VA reaching $z$ at a given time, and we insert each such VA in the queue only once. The algorithm then starts its main loop, exploring the VAs in decreasing order of distance starting from $z$. We take the next VA $(w, t)$ to be explored from the queue. If $w$ differs from $s$ and $z$ (i.e., $w$ is an internal node), then we update its temporal betweenness estimate by adding the contribution of $(w, t)$. As we did in the initialization step, we then process each predecessor of $(w, t)$ across the paths in $\Gamma_{s,z}$, update the count of the paths from the predecessor to $z$ by summing the number of paths passing through $(w, t)$ and reaching $z$, and enqueue the predecessor only if it was not already considered. So, the algorithm terminates by having properly computed for each node $v \in V$ the estimate $\tilde{b}'(v)_i$ for each iteration $i \in \{1, \dots, \ell\}$.
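The backward accumulation can be sketched in Python as below. This is an illustrative reading of the description above, not the paper's Algorithm 3: the inputs `sigma` (paths from the source to each VA), `preds` (predecessor VAs) and `targets` (the VAs of the destination at minimum distance) are assumed to come from a forward BFS, and all names are hypothetical.

```python
from collections import defaultdict, deque


def backward_fractions(sigma, preds, targets, s, z):
    """Fraction of shortest temporal s-z paths through each internal node.
    sigma[va]: number of shortest paths from s reaching the VA `va`.
    preds[va]: predecessor VAs of `va` on shortest paths.
    targets:   VAs (z, t) reached at the minimum distance."""
    total = sum(sigma[va] for va in targets)
    to_z = defaultdict(int)  # shortest paths leading from a VA to z
    seen = set(targets)
    queue = deque(targets)
    for va in targets:
        to_z[va] = 1

    fractions = defaultdict(float)
    # FIFO order visits VAs by decreasing distance from s, so to_z[va]
    # is complete by the time va is popped.
    while queue:
        va = queue.popleft()
        node = va[0]
        if node != s and node != z:
            fractions[node] += sigma[va] * to_z[va] / total
        for p in preds.get(va, ()):
            to_z[p] += to_z[va]
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return dict(fractions)
```

On a network with two shortest paths s-a-z and s-b-z that both arrive at z at time 3, each of a and b receives 0.5.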
4.3. Restless Temporal Betweenness
In this section we present the algorithms that are used in ONBRA when considering the RTP criterion for the optimal paths to compute the temporal betweenness centrality values.
Recall that, in this scenario, a temporal path $P$ is considered optimal if and only if $P$, in addition to being shortest, is such that, given $\delta \in \mathbb{R}^{+}$, it holds that $t_{i+1} \le t_i + \delta$ for $i = 1, \dots, k-1$. Considering the RTP criterion, we need to relax the definition of shortest temporal paths and, instead, consider shortest temporal walks. Intuitively, a walk is a path where we drop the constraint that a node must be visited at most once. We provide an intuition of why we need such requirement in Figure 2. Given $\delta$, we refer to a shortest temporal walk as shortest restless temporal walk.
In order to properly work under the RTP criterion, ONBRA needs novel algorithms to compute the optimal walks and update the betweenness estimates. Note that to compute the shortest restless temporal walks we can use Algorithm 2, provided that we add the condition $t' \le t + \delta$ when collecting the out-neighbouring VAs $(w, t')$ of a VA $(v, t)$.
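The restless constraint can be stated as a short predicate. The sketch below assumes that the condition is $t_i < t_{i+1} \le t_i + \delta$ between consecutive edges, and the function name is hypothetical; note that, as for walks, it does not require nodes to be distinct.

```python
def is_restless_walk(edges, delta):
    """True if consecutive edges chain, timestamps strictly increase,
    and each waiting time between consecutive edges is at most delta."""
    for (u1, v1, t1), (u2, v2, t2) in zip(edges, edges[1:]):
        if v1 != u2 or not (t1 < t2 <= t1 + delta):
            return False
    return bool(edges)  # an empty sequence is not a walk
```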
More interestingly, the biggest computational problem arises when updating the temporal betweenness values of the various nodes on the optimal walks. Note that, to do so, we cannot use Algorithm 3 because it does not account for cycles (i.e., when vertices appear multiple times across a walk). We therefore introduce Algorithm 4 (pseudocode in Appendix B) that works in the presence of cycles. The main intuition behind Algorithm 4 is that we need to recreate backwards all the optimal walks obtained through the RTP version of Algorithm 2. For each walk we maintain a set that keeps track of the nodes already visited up to the current point of the exploration of the walk, updating a node's estimate if and only if we see that node for the first time. This is based on the simple observation that a cycle cannot alter the value of the betweenness centrality of a node on a fixed walk, allowing us to account only once for the node's appearance along the walk.
We now describe Algorithm 4 by discussing its differences with Algorithm 3. Instead of maintaining a matrix keeping track of the presence of a VA in the queue, we now initialize a matrix that keeps the number of times a VA is in the queue. The queue keeps pairs, where the first entry is a VA to be explored and the second entry is the set of nodes already visited backwards along the walk leading to that vertex appearance. While visiting each walk backwards, we check if the nodes are visited for the first time on that walk: if so, we update the betweenness values by accounting for the number of times we will visit that VA across other walks. Next, we update the set of nodes visited. Finally, we update the count of the walks leading from the predecessor of the current VA to the destination, the number of times that predecessor will be visited, and enqueue the predecessor to be explored, together with the additional information of the set of nodes explored up to that point.
To conclude, note that Algorithm 4 is more expensive than Algorithm 3 since it recreates all the optimal walks, while Algorithm 3 avoids this step given the absence of cycles.
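4.4. ONBRA: Theoretical Guarantees
In order to address Problem 1, ONBRA bounds the deviation between the estimates $\tilde{b}(v)$ and the actual values $b(v)$, for every node $v \in V$. To do so, we leverage the so-called empirical Bernstein bound, which we adapted to ONBRA.
Given a node $v \in V$, let $\tilde{b}(v) = \frac{1}{\ell} \sum_{i=1}^{\ell} \tilde{b}'(v)_i$, where $\tilde{b}'(v)_i$ is the estimate of $b(v)$ obtained by analysing the $i$-th sample, $i \in \{1, \dots, \ell\}$. Let $\mathbf{V}(\tilde{b}(v))$ be the empirical variance of $\tilde{b}(v)$:
$$\mathbf{V}(\tilde{b}(v)) = \frac{1}{\ell(\ell - 1)} \sum_{1 \le i < j \le \ell} \big(\tilde{b}'(v)_i - \tilde{b}'(v)_j\big)^2.$$
We use the empirical Bernstein bound to limit the deviation between the $\tilde{b}(v)$'s and the $b(v)$'s. The bound is Corollary 5 of (maurer2009empirical) adapted to our framework, since Corollary 5 of (maurer2009empirical) is formulated for generic random variables taking values in $[0, 1]$ and for an arbitrary set of functions.
Theorem 4.1 (Corollary 5, (maurer2009empirical)).
Let $\ell \ge 2$ be the number of samples, and $\eta \in (0, 1)$ be the confidence parameter. Let $\tilde{b}'(v)_i$ be the estimate of $b(v)$ obtained by analysing the $i$-th sample, $i \in \{1, \dots, \ell\}$, and $v \in V$. Let $\tilde{b}(v) = \frac{1}{\ell} \sum_{i=1}^{\ell} \tilde{b}'(v)_i$, and $\mathbf{V}(\tilde{b}(v))$ be its empirical variance. With probability at least $1 - \eta$, and for every node $v \in V$, we have that
$$|\tilde{b}(v) - b(v)| \le \sqrt{\frac{2 \mathbf{V}(\tilde{b}(v)) \ln(4n/\eta)}{\ell}} + \frac{7 \ln(4n/\eta)}{3(\ell - 1)}.$$
The right-hand side of the inequality of the previous theorem differs from Corollary 5 of (maurer2009empirical) by a factor of $2$ in the arguments of the natural logarithms, since in (maurer2009empirical) the bound is not stated in the symmetric form reported in Theorem 4.1. Finally, the result about the guarantees on the quality of the estimates provided by ONBRA follows.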
Corollary 4.2.
Given a temporal network $T$, the pair $(\tilde{B}, \xi)$ in output from ONBRA is such that, with probability at least $1 - \eta$, it holds that $\tilde{B}$ is an absolute $(\xi, \eta)$-approximation set of $B(T)$.
Observe that Corollary 4.2 is independent of the structure of the optimal paths considered by ONBRA, therefore such guarantees hold for both the criteria considered in our work.
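As a worked illustration of how such a bound can be evaluated from the per-sample estimates, the following Python sketch computes the largest deviation bound over all nodes. The exact form used here, $\sqrt{2 \mathbf{V} \ln(4n/\eta)/\ell} + 7\ln(4n/\eta)/(3(\ell-1))$, is an assumption reconstructed from the symmetric form of the empirical Bernstein bound with a union bound over the $n$ nodes; the function name and input layout are hypothetical.

```python
import math
import statistics


def empirical_bernstein_bound(per_sample, eta):
    """per_sample maps each node to its list of ell per-sample estimates
    (values in [0, 1]). Returns the largest deviation bound over all nodes."""
    n = len(per_sample)
    ell = len(next(iter(per_sample.values())))
    if ell < 2:
        raise ValueError("at least two samples are needed")
    log_term = math.log(4 * n / eta)  # assumed constant: symmetric bound, union over n nodes
    worst = 0.0
    for values in per_sample.values():
        # statistics.variance is the unbiased sample variance, which equals
        # the pairwise form sum_{i<j} (x_i - x_j)^2 / (ell * (ell - 1)).
        var = statistics.variance(values)
        bound = math.sqrt(2 * var * log_term / ell) + 7 * log_term / (3 * (ell - 1))
        worst = max(worst, bound)
    return worst
```

Since the variance term shrinks as $1/\sqrt{\ell}$ and the second term as $1/\ell$, nodes whose per-sample estimates barely vary get a bound dominated by the faster-decaying term, which is the benefit of a variance-aware inequality.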
5. Experimental Evaluation
In this section we present our experimental evaluation, which has the following goals: (i) motivate the study of the temporal betweenness centrality by showing two real-world temporal networks on which the temporal betweenness provides novel insights compared to the static betweenness computed on their associated static networks; (ii) assess, considering the STP criterion, the accuracy of ONBRA's estimates, and the benefit of using ONBRA instead of the state-of-the-art exact approach (Buss2020), both in terms of running time and memory usage; (iii) finally, show how ONBRA can be used on a real-world temporal network to analyze the RTP-based betweenness centrality values.
5.1. Setup
Name  n  m  Granularity  Timespan

HighSchool2012 (HS)  180  45K  20 sec  7 (days) 
CollegeMsg  1.9K  59.8K  1 sec  193 (days) 
EmailEu  986  332K  1 sec  803 (days) 
FBWall (FB)  35.9K  199.8K  1 sec  100 (days) 
Sms  44K  544.8K  1 sec  338 (days) 
Mathoverflow  24.8K  390K  1 sec  6.4 (years) 
Askubuntu  157K  727K  1 sec  7.2 (years) 
Superuser  192K  1.1M  1 sec  7.6 (years) 
We implemented ONBRA in C++20 and compiled it using gcc 9. The code is freely available (footnote 5: https://github.com/iliesarpe/ONBRA). All the experiments were performed sequentially on a 72 core Intel Xeon Gold 5520 @ 2.2GHz machine with 1008GB of RAM available. The real-world datasets we used are described in Table 1; they are mostly social or message networks from different domains. Such datasets are publicly available online (footnote 6: http://www.sociopatterns.org/ and https://snap.stanford.edu/temporal-motifs/data.html). For detailed descriptions of such datasets we refer to the links reported and to (Paranjape2017). To obtain the FBWall dataset we extracted the last 200K edges from the original dataset (Viswanath2009), which has more than 800K edges. Such cut is done to allow the exact algorithm to complete its execution without exceeding the available memory.
5.2. Temporal vs Static Betweenness
In this section we show that the temporal betweenness centrality of the nodes of a temporal network provides novel insights compared to its static version. To do so, we computed for two datasets, from different domains, the exact ranking of the various nodes according to their betweenness values. The goal of this experiment is to compare the two rankings (i.e., temporal and static) and understand if the relative orderings are preserved, i.e., verify if the most central nodes in the static network are also the most central nodes in the temporal network. To this end, given a temporal network $T$, let $G_T$ be its associated static network. We used the following two real-world networks: (i) HS, which is a temporal network representing a face-to-face interaction network among students; and (ii) FB, which is a Facebook user-activity network (Viswanath2009) (see Table 1 for further details).
We first computed the exact temporal and static betweenness values of the different nodes of the two networks. Then, we ranked the nodes by descending betweenness values. We now discuss how the top-ranked nodes vary from temporal to static on the two networks. We report in Table 3 (in Appendix C) the Jaccard similarity between the sets containing the top-$k$ nodes of the static and temporal networks. On HS, for $k = 25$, only 11 nodes are top ranked in both the rankings, which means that less than half of the top-25 nodes are central if only the static information is considered. The value of the intersection increases to 36 for $k = 50$, since the network has only 180 nodes. More interestingly, also on the Facebook network only a few temporally central nodes can be detected by considering only static information: only 9 of the top-25 nodes and 15 of the top-50 nodes. In order to better visualize the top-ranked nodes, we show their betweenness values in Figure 3(a): note that there are many top temporally ranked nodes having small static betweenness values, and vice versa.
These experiments show the importance of studying the temporal betweenness centrality, which provides novel insights compared to the static version.
Dataset  Avg. Error  Sample rate (%)  (sec)  (sec)  MEM (GB)  MEM (GB)  

CollegeMsg  0.083  231  148  12.0  0.13  
EmailEu  0.093  7211  1808  23.9  2.1  
Mathoverflow  0.005  79492  36983  1004.3  6.8  
FBWall  0.003  11489  3145  738.0  11.1  
Askubuntu  ✗  ✗  0.00006  ✗  35585  1008  20.3  
Sms  ✗  ✗  0.00231  ✗  13020  1008  16.2  
Superuser  ✗  ✗  0.00003  ✗  41856  1008  16.7 
5.3. Accuracy and Resources of ONBRA
In this section we first assess the accuracy of the estimates provided by ONBRA considering only the STP criterion, since for the RTP criterion no implemented exact algorithm exists. Then, we show the reduction of computational resources induced by ONBRA compared to the exact algorithm in (Buss2020).
To assess ONBRA's accuracy and its computational cost, we used four datasets, i.e., CollegeMsg, EmailEu, Mathoverflow, and FBWall. We first executed the exact algorithm, and then we set the parameters of ONBRA so that it runs within a fraction of the time required by the exact algorithm. The results we now present, which are described in detail in Table 2, are all averaged over 10 runs (except for the RAM peak, which is measured over one single execution of the algorithms).
Remarkably, even using less than of the overall pairs of nodes as sample size, ONBRA is able to estimate the temporal betweenness centrality values with very small average deviations between and , while obtaining a significant running time speedup between and with respect to the exact algorithm (Buss2020). Additionally, the amount of RAM used by ONBRA is significantly smaller than that of the exact algorithm in (Buss2020): e.g., on the Mathoverflow dataset ONBRA requires only GB of RAM peak, which is less than the GB required by the exact state-of-the-art algorithm (Buss2020). Furthermore, in all the experiments we found that the maximum deviation is at most one order of magnitude away from the theoretical upper bound guaranteed by Corollary 4.2. Surprisingly, for two datasets (EmailEu and Mathoverflow) the maximum deviation and the upper bound are even of the same order of magnitude. Therefore we can conclude that the guarantees provided by Corollary 4.2 are often very sharp. In addition, ONBRA's accuracy is demonstrated by the fact that the deviation between the actual temporal betweenness centrality value of a node and its estimate obtained using ONBRA is about one order of magnitude less than the actual value, as we show in Figure 3(b) and Figure 4 (in Appendix C).
Finally, we show in Table 2 that on the large datasets Askubuntu, Sms, and Superuser the exact algorithm (Buss2020) is not able to conclude the computation on our machine (denoted with ✗) since it requires more than 1008GB of RAM. Instead, ONBRA provides estimates of the temporal betweenness centrality values in less than K (sec) and GB of RAM.
To conclude, ONBRA is able to estimate the temporal betweenness centrality with high accuracy, providing rigorous and sharp guarantees, while significantly reducing the computational resources required by the exact algorithm in (Buss2020).
5.4. ONBRA on RTP-based Betweenness
In this section we discuss how ONBRA can be used to analyze real-world networks by estimating the centrality values of the nodes for the temporal betweenness under the RTP criterion.
We used the FB network, on which we computed a tight approximation of the temporal betweenness values () of the nodes for different values of $\delta$, i.e., $\delta$ = 1 day, $\delta$ = 15 days, and $\delta$ = 1 month. For $\delta$ = 1 day, we found only 4 nodes with temporal betweenness value different from 0, which is surprising since it highlights that the information spreading across wall posts through RTPs in 2008 on Facebook required more than 1 day of time between consecutive interactions (i.e., slow spreading). We present the results for the other values of $\delta$ in Figures 3(c) and 3(d), comparing them to the (exact) STP-based betweenness. Interestingly, 15 days are still not sufficient to capture most of the betweenness values based on STPs of the different nodes, while with $\delta$ = 1 month the betweenness values are much closer to the STP-based values. While this behaviour is to be expected with increasing $\delta$, finding such values of $\delta$ helps to better characterize the dynamics over the network.
To conclude, ONBRA also enables novel analyses that cannot otherwise be performed with existing tools.
6. Discussion
In this work we presented ONBRA, the first algorithm that provides high-quality approximations of the temporal betweenness centrality values of the nodes in a temporal network, with rigorous probabilistic guarantees. ONBRA works under two different optimality criteria for the paths on which the temporal betweenness centrality is defined: the shortest and restless temporal path (STP, RTP) criteria. To the best of our knowledge, ONBRA is the first algorithm enabling a practical computation under the RTP criterion. Our experimental evaluation shows that ONBRA provides high-quality estimates with tight guarantees, while remarkably reducing the computational costs compared to the state-of-the-art in (Buss2020), enabling analyses that would not otherwise be possible to perform.
Finally, several interesting directions could be explored in the future, such as dealing with different optimality criteria for the paths, and employing sharper concentration inequalities to provide tighter guarantees on the quality of the estimates.
Acknowledgements.
Part of this work was supported by the Italian Ministry of Education, University and Research (MIUR), under PRIN Project n. 20174LF3T8 “AHeAD” (efficient Algorithms for HArnessing networked Data) and the initiative “Departments of Excellence” (Law 232/2016), and by University of Padova under project “SID 2020: RATED-X”.
References
Appendix A Missing Proofs
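Lemma A.1.
Let $\tilde{b}'(v) = \sigma^{sh}_{s,z}(v) / \sigma^{sh}_{s,z}$ for a pair $(s, z)$ sampled uniformly at random from $D$; then $\tilde{b}'(v)$ is an unbiased estimator of $b(v)$.
Proof.
Let $X_{s,z}$ be a Bernoulli random variable that takes value $1$ if the pair of nodes $(s, z)$ is sampled, and $0$ otherwise. Since $\mathbb{E}[X_{s,z}] = \frac{1}{n(n-1)}$, then by the linearity of expectation,
$$\mathbb{E}[\tilde{b}'(v)] = \mathbb{E}\Big[\sum_{(s,z) \in D} X_{s,z} \frac{\sigma^{sh}_{s,z}(v)}{\sigma^{sh}_{s,z}}\Big] = \sum_{(s,z) \in D} \mathbb{E}[X_{s,z}] \frac{\sigma^{sh}_{s,z}(v)}{\sigma^{sh}_{s,z}} = \frac{1}{n(n-1)} \sum_{(s,z) \in D} \frac{\sigma^{sh}_{s,z}(v)}{\sigma^{sh}_{s,z}} = b(v).$$
∎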
Appendix B Missing Algorithms
Appendix C Supplementary Data
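Given a temporal network $T$ and its associated static network, consider the set of the top-$k$ nodes ranked by their temporal betweenness values and the set of the top-$k$ nodes ranked by their static betweenness values. We report in Table 3 the Jaccard similarity between these two sets for two different values of $k$ (the size of the intersection is reported in parentheses).
Name  k = 25  k = 50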

HS  0.28 (11)  0.56 (36) 
FB  0.22 (9)  0.18 (15) 
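The entries of Table 3 follow from the intersection sizes alone, since both sets have $k$ elements. The short Python sketch below shows the computation; the helper names `top_k` and `jaccard` are illustrative and not taken from the original code.

```python
def top_k(scores, k):
    """Set of the k nodes with the largest score (ties broken arbitrarily)."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

For instance, two top-25 sets sharing 11 nodes have a union of 39 nodes, giving 11/39, approximately 0.28, which matches the HS entry for $k = 25$.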