ONBRA: Rigorous Estimation of the Temporal Betweenness Centrality in Temporal Networks

In network analysis, the betweenness centrality of a node informally captures the fraction of shortest paths visiting that node. The computation of the betweenness centrality measure is a fundamental task in the analysis of modern networks, enabling the identification of the most central nodes in such networks. Besides being massive, modern networks also contain information about the time at which their events occur. Such networks are often called temporal networks. The temporal information makes the study of the betweenness centrality in temporal networks (i.e., temporal betweenness centrality) much more challenging than in static networks (i.e., networks without temporal information). Moreover, the exact computation of the temporal betweenness centrality is often impractical on even moderately-sized networks, given its extremely high computational cost. A natural approach to reduce such computational cost is to obtain high-quality estimates of the exact values of the temporal betweenness centrality. In this work we present ONBRA, the first sampling-based approximation algorithm for estimating the temporal betweenness centrality values of the nodes in a temporal network, providing rigorous probabilistic guarantees on the quality of its output. ONBRA is able to compute the estimates of the temporal betweenness centrality values under two different optimality criteria for the shortest paths of the temporal network. In addition, ONBRA outputs high-quality estimates with sharp theoretical guarantees by leveraging the empirical Bernstein bound, an advanced concentration inequality. Finally, our experimental evaluation shows that ONBRA significantly reduces the computational resources required by the exact computation of the temporal betweenness centrality on several real-world networks, while reporting high-quality estimates with rigorous guarantees.

1. Introduction

The study of centrality measures is a fundamental primitive in the analysis of networked datasets (Borgatti2006; Newman2010), and plays a key role in social network analysis (Das2018). A centrality measure informally captures how important a node is for a given network according to structural properties of the network. Central nodes are crucial in many applications such as analyses of co-authorship networks (Liu2005; Yan2009), biological networks (Wuchty2003; Koschuetzki2008), and ontology summarization (Zhang2007).

One of the most important centrality measures is the betweenness centrality (Freeman1977; Freeman1978), which informally captures the fraction of shortest paths going through a specific node. The betweenness centrality has found applications in many scenarios such as community detection (Fortunato2010), link prediction (Ahmad2020), and network vulnerability analysis (Holme2002). The exact computation of the betweenness centrality of each node of a network is an extremely challenging task on modern networks, both in terms of running time and memory costs. Therefore, sampling algorithms have been proposed to provide provable high-quality approximations of the betweenness centrality values, while remarkably reducing the computational costs (Riondato2016; Riondato2018; Brandes2007).

Modern networks, besides being large, also carry richer information about their edges. In particular, one of the most important and easily accessible pieces of information is the time at which edges occur. Such networks are often called temporal networks (Holme2019). The analysis of temporal networks provides novel insights compared to those obtained by the analysis of static networks (i.e., networks without temporal information), as, for example, in the study of subgraph patterns (Paranjape2017; Kovanen2011), community detection (Lehmann2019), and network clustering (Fu2020). As for static networks, the study of the temporal betweenness centrality in temporal networks aims at identifying the nodes that are visited by a high number of optimal paths (Holme2012; Buss2020). In temporal networks, the definition of optimal paths has to consider the information about the timing of the edges, making the possible definitions of optimal paths much richer than in static networks (Rymar2021).

In this work, a temporal path is valid if it is time respecting, i.e., if all the interactions within the path occur at increasing timestamps (see Figures 1(b)-1(c)). We consider two different optimality criteria for temporal paths, chosen for their relevance (Holme2012): (i) the shortest temporal path (STP) criterion, a commonly used criterion for which a path is optimal if it uses the minimum number of interactions to connect a given pair of nodes; (ii) the restless temporal path (RTP) criterion, for which a path is optimal if, in addition to being shortest, all its consecutive interactions occur within a user-specified time duration parameter $\delta$ (see Figure 1(c)). The RTP criterion finds application, for example, in the study of spreading processes over complex networks (Pan2011), where information about the timing of consecutive interactions is fundamental. The exact computation of the temporal betweenness centrality under the STP and RTP optimality criteria becomes impractical (both in terms of running time and memory usage) for even moderately-sized networks. Furthermore, as for static networks, a high-quality approximation of the temporal betweenness centrality of a node is sufficient in many applications. Thus, we propose ONBRA, the first algorithm to compute rigOrous estimatioN of temporal Betweenness centRality values in temporAl networks (see https://vec.wikipedia.org/wiki/Onbra), providing sharp guarantees on the quality of its output. As for many data-mining algorithms, ONBRA's output is a function of two parameters: $\varepsilon \in (0,1)$, controlling the estimates' accuracy, and $\eta \in (0,1)$, controlling the confidence. The algorithmic problems arising from accounting for temporal information are considerably more challenging than in the static network scenario, although ONBRA shares a high-level sampling strategy similar to (Riondato2018). Finally, we show that in practice ONBRA, besides providing high-quality estimates while reducing computational costs, also enables analyses that cannot otherwise be performed with existing state-of-the-art algorithms. Our main contributions are the following:

  • We propose ONBRA, the first sampling-based algorithm that outputs high-quality approximations of the temporal betweenness centrality values of the nodes of a temporal network.

  • ONBRA leverages an advanced data-dependent and variance-aware concentration inequality to provide sharp probabilistic guarantees on the quality of its estimates.

  • We show that ONBRA is able to compute high-quality temporal betweenness estimates for two optimality criteria of the paths, i.e., the STP and RTP criteria. In particular, we developed specific algorithms for ONBRA to address the computation of the estimates according to such optimality criteria.

  • We perform an extensive experimental evaluation with several goals: (i) under the STP criterion, show that studying the temporal betweenness centrality provides novel insights compared to the static version; (ii) under the STP criterion, show that ONBRA provides high-quality estimates, while significantly reducing the computational costs compared to the state-of-the-art exact algorithm, and that it enables the study of large datasets that cannot practically be analyzed by the existing exact algorithm; (iii) show that ONBRA is able to estimate the temporal betweenness centrality under the RTP optimality criterion by varying $\delta$.

Figure 1. (a): (left) a temporal network $T$ with its nodes and edges, (right) its associated static network, obtained from $T$ by removing temporal information. A shortest temporal path cannot be identified by a shortest path in the static network: e.g., the shortest paths between the same pair of nodes, coloured in green in the temporal network and in purple in the static network, are different. (b): A path that is not time respecting. (c): A time-respecting path that is also shortest in $T$. For a suitable value of $\delta$, such path is also a shortest $\delta$-restless path.

2. Preliminaries

In this section we introduce the fundamental notions needed throughout the development of our work and formalize the problem of approximating the temporal betweenness centrality of the nodes in a temporal network.

We start by introducing temporal networks.

Definition 2.1.

A temporal network $T$ is a pair $T=(V,E)$, where $V$ is a set of $n$ nodes (or vertices), and $E=\{(u,v,t) : u,v \in V, u \neq v, t \in \mathbb{R}^{+}\}$ is a set of $m$ directed edges. (ONBRA can be easily adapted to work on undirected temporal networks with minor modifications. W.l.o.g. we assume the edges to be sorted by increasing timestamps.)

Each edge $e=(u,v,t) \in E$ of the network represents an interaction from node $u$ to node $v$ at time $t$, which is the timestamp of the edge. Figure 1(a) (left) provides an example of a temporal network $T$. Next, we define temporal paths.

Definition 2.2.

Given a temporal network $T$, a temporal path $P$ is a sequence $P=\langle e_1=(u_1,v_1,t_1), \dots, e_k=(u_k,v_k,t_k) \rangle$ of edges of $T$ ordered by increasing timestamps, i.e., $t_i < t_{i+1}$ for $i \in \{1,\dots,k-1\}$ (our work can be easily adapted to deal with non-strict ascending timestamps, i.e., with $\leq$ constraints), such that the node $v_i$ of edge $e_i$ is equal to the node $u_{i+1}$ of the consecutive edge $e_{i+1}$, i.e., $v_i = u_{i+1}$, and each node is visited by $P$ at most once.

Given a temporal path $P$ made of $k$ edges, we define its length as $\ell_P = k$. An example of a temporal path is given by Figure 1(c). Given a source node $s$ and a destination node $z \neq s$, a shortest temporal path between $s$ and $z$ is a temporal path $P_{s,z}$ of length $\ell_{P_{s,z}}$ such that in $T$ there is no temporal path $P'_{s,z}$ connecting $s$ to $z$ of length $\ell_{P'_{s,z}} < \ell_{P_{s,z}}$. Given a shortest temporal path $P_{s,z}$ connecting $s$ and $z$, we define $\mathrm{Int}(P_{s,z})$ as the set of nodes internal to the path $P_{s,z}$, i.e., all the nodes it visits except $s$ and $z$. Let $\sigma_{s,z}$ be the number of shortest temporal paths between nodes $s$ and $z$. Given a node $v$, we denote with $\sigma_{s,z}(v)$ the number of shortest temporal paths connecting $s$ and $z$ for which $v$ is an internal node, i.e., $\sigma_{s,z}(v) = |\{P_{s,z} : v \in \mathrm{Int}(P_{s,z})\}|$. Now we introduce the temporal betweenness centrality of a node $v$, which intuitively captures the fraction of shortest temporal paths visiting $v$.
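As an illustration of Definition 2.2, the following minimal Python sketch checks whether a sequence of edges is a valid temporal path. The function name `is_temporal_path` and the representation of edges as `(u, v, t)` tuples are assumptions made for this example and are not taken from the authors' implementation.

```python
def is_temporal_path(path):
    """Return True iff `path`, a list of (u, v, t) edges, is a temporal path.

    Checks the three conditions of the definition: consecutive edges are
    chained, timestamps strictly increase, and no node is visited twice.
    """
    if not path:
        return False
    visited = [path[0][0]]  # the source node of the first edge
    for i, (u, v, t) in enumerate(path):
        if i > 0:
            _, prev_v, prev_t = path[i - 1]
            # Edge i must start where edge i-1 ended, at a later time.
            if u != prev_v or t <= prev_t:
                return False
        visited.append(v)
    # Each node may be visited at most once.
    return len(set(visited)) == len(visited)
```

The length of a valid path is simply `len(path)`, i.e., the number of edges it uses.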

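Definition 2.3.

We define the temporal betweenness centrality $b(v)$ of a node $v \in V$ as

$$b(v) = \frac{1}{n(n-1)} \sum_{s,z \in V,\ s \neq z \neq v} \frac{\sigma_{s,z}(v)}{\sigma_{s,z}},$$

where $\sigma_{s,z}$ is the number of shortest temporal paths from $s$ to $z$, $\sigma_{s,z}(v)$ is the number of those paths having $v$ as an internal node, and terms with $\sigma_{s,z}=0$ are taken to be $0$.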

Let $B = \{(v, b(v)) : v \in V\}$ be the set of pairs composed of a node and its temporal betweenness value. Since the exact computation of the set $B$ using state-of-the-art exact algorithms, e.g., (Buss2020; Rymar2021), is impractical on even moderately-sized temporal networks (see Section 5 for experimental evaluations), in our work we aim at providing high-quality approximations of the temporal betweenness centrality values of all the nodes of the temporal network. That is, we compute the set $\tilde{B} = \{(v, \tilde{b}(v)) : v \in V\}$, where $\tilde{b}(v)$ is an accurate estimate of $b(v)$, controlled by two parameters $\varepsilon, \eta \in (0,1)$ (accuracy and confidence). We want $\tilde{B}$ to be an absolute $(\varepsilon,\eta)$-approximation set of $B$, as commonly adopted in data-mining algorithms (e.g., in (Riondato2018)): that is, $\tilde{B}$ is an approximation set such that

$$\mathbb{P}\left[ \sup_{v \in V} |\tilde{b}(v) - b(v)| \leq \varepsilon \right] \geq 1 - \eta.$$

Note that in an absolute $(\varepsilon,\eta)$-approximation set, for each node $v \in V$, the estimate $\tilde{b}(v)$ of the temporal betweenness value deviates from the actual value $b(v)$ by at most $\varepsilon$, with probability at least $1-\eta$. Finally, let us state the main computational problem addressed in this work.

Problem 1.

Given a temporal network $T$ and two parameters $\varepsilon, \eta \in (0,1)$, compute the set $\tilde{B}$, i.e., an absolute $(\varepsilon,\eta)$-approximation set of $B$.

3. Related Works

Given the importance of the betweenness centrality for network analysis, many algorithms have been proposed to compute it in different scenarios. In this section we focus on those scenarios most relevant to our work, grouped as follows.

Approximation Algorithms for Static Networks. Recently, many algorithms to approximate the betweenness centrality in static networks have been proposed, most of which employ randomized sampling approaches (Riondato2016; Riondato2018; Brandes2007). The existing algorithms differ from each other mainly in the sampling strategy they adopt and in the probabilistic guarantees they offer. Among these works, the one that shares similar ideas with ours is (Riondato2018) by Riondato and Upfal, where the authors proposed to sample pairs of nodes $(s,z)$, compute all the shortest paths from $s$ to $z$, and update the estimates of the betweenness centrality values of the nodes internal to such paths. The authors developed a suite of algorithms to output an $(\varepsilon,\eta)$-approximation set of the set of betweenness centrality values. Their work cannot be easily adapted to temporal networks. In fact, static and temporal paths are in general not related in any way, and the temporal scenario introduces novel challenges: (i) computing the optimal temporal paths, and (ii) updating the betweenness centrality values. Therefore, our algorithm ONBRA employs the idea of the estimator provided by (Riondato2018), while using novel algorithms designed for the context of temporal networks. Furthermore, the probabilistic guarantees provided by ONBRA leverage the variance of the estimates, differently from (Riondato2018), which used bounds based on Rademacher averages. Our choice to use a variance-aware concentration inequality is motivated by the recent interest in providing sharp guarantees employing the empirical variance of the estimates (Cousins2021; Pellegrina2021).

Algorithms for Dynamic Networks. In this setting the algorithm keeps track of the betweenness centrality value of each node for every timestamp observed in the network (Lee2012; Hanauer2021). Note that this is very different from estimating the temporal betweenness centrality values in temporal networks: in the dynamic scenario the paths considered are not required to be time respecting. For example, in the dynamic scenario, if we consider the network in Figure 1(a) (left) at any sufficiently late time, the shortest path between the two highlighted nodes is the one highlighted in purple in Figure 1(a) (right). Instead, in the temporal setting such path is not time respecting. We think that it is very challenging to adapt the algorithms for dynamic networks to work in the context of temporal networks, which further motivates us to propose ONBRA.

Exact Algorithms for Temporal Networks. Several exact approaches have been proposed in the literature (Tsalouchidou2020; Alsayed2015; Kim2012). The algorithm most relevant to our work was presented in (Buss2020), where the authors extended the well-known Brandes algorithm (Brandes2001) to the temporal network scenario considering the STP criterion (among several other criteria). They showed that the time complexity of their algorithm is , which is often impractical on even moderately-sized networks. Recently, (Rymar2021) discussed conditions on temporal paths under which the temporal betweenness centrality can be computed in polynomial time, showing a general algorithm running in even under the RTP criterion, which is again very far from being practical on modern networks.

We conclude by observing that, to the best of our knowledge, no approximation algorithms exist for estimating the temporal betweenness centrality in temporal networks.

4. Method, Algorithm, and Analysis

In this section we discuss ONBRA, our novel algorithm for computing high-quality approximations of the temporal betweenness centrality values of the nodes of a temporal network. We first discuss the sampling strategy used in ONBRA, then we present the algorithm, and finally we show the theoretical guarantees on the quality of the estimates of ONBRA.

4.1. ONBRA - Sampling Strategy

In this section we discuss the sampling strategy adopted by ONBRA, which is independent of the optimality criterion of the paths. However, for the sake of presentation, we discuss the sampling strategy for the STP-based temporal betweenness centrality estimation.

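ONBRA samples pairs of nodes $(s,z)$ and computes all the shortest temporal paths from $s$ to $z$. More formally, let $D = \{(u,v) \in V \times V : u \neq v\}$, and let $r > 1$ be a user-specified parameter. ONBRA first collects $r$ pairs of nodes $(s_i,z_i)$, $i=1,\dots,r$, sampled uniformly at random from $D$. Next, for each pair $(s,z)$ it computes $\mathcal{P}_{s,z}$, i.e., the set of shortest temporal paths from $s$ to $z$. Then, for each node $v$ s.t. $v \in \mathrm{Int}(P_{s,z})$ with $P_{s,z} \in \mathcal{P}_{s,z}$, i.e., for each node $v$ that is internal to a shortest temporal path of $\mathcal{P}_{s,z}$, ONBRA computes the estimate $\tilde{b}'(v) = \sigma_{s,z}(v)/\sigma_{s,z}$, which is an unbiased estimator of the temporal betweenness centrality value $b(v)$ (i.e., $\mathbb{E}[\tilde{b}'(v)] = b(v)$, see Lemma A.1 in Appendix A). Finally, after processing the $r$ pairs of nodes randomly selected, ONBRA computes for each node $v \in V$ the (unbiased) estimate $\tilde{b}(v)$ of the actual temporal betweenness centrality $b(v)$ by averaging over the $r$ sampling steps: $\tilde{b}(v) = \frac{1}{r}\sum_{i=1}^{r} \tilde{b}'(v)_i$, where $\tilde{b}'(v)_i$ is the estimate of $b(v)$ obtained by analyzing the $i$-th sample, $i \in \{1,\dots,r\}$. We will discuss the theoretical guarantees on the quality of $\tilde{b}(v)$ in Section 4.4.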
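The sampling strategy above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `pair_contribution`, `exact_tbc`, and `sampled_tbc` are hypothetical, edges are `(u, v, t)` tuples, and shortest temporal paths are enumerated by brute force, which is only viable on tiny networks.

```python
import random
from collections import defaultdict


def pair_contribution(edges, s, z):
    """Fraction of shortest temporal paths s -> z through each internal node.

    Assumption: brute-force enumeration of all temporal paths (tiny inputs).
    """
    out = defaultdict(list)
    for u, v, t in edges:
        out[u].append((v, t))
    found = []  # node sequences, one entry per distinct temporal path

    def extend(node, time, visited):
        if node == z:
            found.append(tuple(visited))
            return
        for w, t in out[node]:
            if t > time and w not in visited:  # time respecting, no repeats
                extend(w, t, visited + [w])

    extend(s, float("-inf"), [s])
    if not found:
        return {}
    best = min(len(p) for p in found)
    shortest = [p for p in found if len(p) == best]
    frac = defaultdict(float)
    for p in shortest:
        for v in p[1:-1]:  # internal nodes only
            frac[v] += 1.0 / len(shortest)
    return dict(frac)


def exact_tbc(edges, nodes):
    """Exact temporal betweenness: average contribution over all ordered pairs."""
    pairs = [(s, z) for s in nodes for z in nodes if s != z]
    b = {v: 0.0 for v in nodes}
    for s, z in pairs:
        for v, f in pair_contribution(edges, s, z).items():
            b[v] += f
    return {v: x / len(pairs) for v, x in b.items()}


def sampled_tbc(edges, nodes, r, seed=0):
    """Estimate obtained by averaging over r uniformly sampled ordered pairs."""
    rng = random.Random(seed)
    pairs = [(s, z) for s in nodes for z in nodes if s != z]
    est = {v: 0.0 for v in nodes}
    for _ in range(r):
        s, z = rng.choice(pairs)
        for v, f in pair_contribution(edges, s, z).items():
            est[v] += f
    return {v: x / r for v, x in est.items()}
```

On the toy network with edges `("a", "b", 1)` and `("b", "c", 2)`, only the pair `("a", "c")` has an internal node, so the exact value for `"b"` is 1/6, and the sampled estimate concentrates around it as `r` grows.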

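Input: Temporal network $T=(V,E)$, confidence parameter $\eta \in (0,1)$, number of samples $r > 1$.
Output: Pair $(\varepsilon', \tilde{B})$ s.t. $\tilde{B}$ is an absolute $(\varepsilon',\eta)$-approximation set of $B$.
$D \leftarrow \{(u,v) \in V \times V : u \neq v\}$
$\tilde{B}' \leftarrow$ zero matrix of size $|V| \times r$
for $i \leftarrow 1$ to $r$ do
      $(s,z) \leftarrow$ uniformRandomSample($D$)
      SourceDestinationSTPComputation($T$, $s$, $z$)
      if reached $z$ then
            updateSTPEstimates($\tilde{B}'$, $i$)
$\tilde{B} \leftarrow \{(v, \frac{1}{r}\sum_{i=1}^{r} \tilde{b}'(v)_i) : v \in V\}$
$\varepsilon' \leftarrow$ bound on $\sup_{v \in V} |\tilde{b}(v) - b(v)|$ (see Section 4.4)
return $(\varepsilon', \tilde{B})$
Algorithm 1 ONBRA.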

4.2. Algorithm Description

Sampling Algorithm:

ONBRA is presented in Algorithm 1. We first initialize the set $D$ of objects to be sampled, where each object is a pair of distinct nodes from $V$. Next, we initialize the matrix $\tilde{B}'$ of size $|V| \times r$ to store the estimates of $b(v)$ for each node $v$ at the various iterations, needed to compute their empirical variance and the final estimates. Then we start the main loop, which iterates $r$ times. In such loop we first select a pair $(s,z)$ sampled uniformly at random from $D$. We then compute all the shortest temporal paths from $s$ to $z$ by executing Algorithm 2, which is described in detail later in this section. Such algorithm computes all the shortest temporal paths from $s$ to $z$, adopting some pruning criteria to speed up the computation. If at least one STP between $s$ and $z$ exists, then for each node $v$ internal to a path in $\mathcal{P}_{s,z}$ we update the corresponding estimate for the current iteration by computing $\tilde{b}'(v)_i$ using Algorithm 3. While in static networks this step can be done with a simple recursive formula (Riondato2018), in our scenario we need a specific algorithm to deal with the more challenging fact that a node may appear at different distances from a given source across different shortest temporal paths. We will discuss such algorithm in detail later in this section. At the end of the $r$ iterations of the main loop, ONBRA computes: (i) the set $\tilde{B}$ of unbiased estimates; and (ii) a tight bound $\varepsilon'$ on $\sup_{v \in V} |\tilde{b}(v) - b(v)|$, which leverages the empirical variance of the estimates. We observe that $\varepsilon'$ is such that the set $\tilde{B}$ is an absolute $(\varepsilon',\eta)$-approximation set of $B$. We discuss the computation of such bound in Section 4.4. Finally, ONBRA returns $(\varepsilon', \tilde{B})$.

Input: , source node , destination node
1 for  do
2       ;
3for  do
4      
5; ; ; empty queue; enqueue while empty do
6       dequeue if  then
7             for  do
8                   if  then
9                         if  then
10                               if  then
11                                    
12                              
13                         enqueue
14                  if  then
15                         if  then
16                              
17                        
18                  
19            
20      
Algorithm 2 Source-Destination STP computation.

Subroutines

We now describe the subroutines employed in Algorithm 1 focusing on the STP criterion. Then, in Section 4.3, we discuss how to deal with the RTP criterion.

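Source Destination Shortest Paths Computation. We start by introducing some definitions needed throughout this section. First, we say that a pair $(v,t) \in V \times \mathbb{R}^{+}$ is a vertex appearance (VA) if $\exists\, (u,v,t) \in E$. Next, given a VA $(v,t)$ we say that a VA $(w,t')$ is a predecessor of $(v,t)$ if $\exists\, (w,v,t) \in E$ with $t' < t$. Finally, given a VA $(v,t)$ we define its set of out-neighbouring VAs as $\mathcal{N}^{out}(v,t) = \{(w,t') : \exists\, (v,w,t') \in E,\ t < t'\}$.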

We now describe Algorithm 2, which computes the shortest temporal paths between a source node $s$ and a destination node $z$ (it is invoked by ONBRA in Algorithm 1). Such computation is optimized to prune the search space once the destination $z$ has been found. The algorithm initializes the data structures needed to keep track of the shortest temporal paths that, starting from $s$, reach a node in $V$, i.e., the arrays $\mathrm{dist}_v$ and $\sigma_v$ that contain for each node $v$, respectively, the minimum distance to reach $v$ and the number of shortest temporal paths reaching $v$. We then initialize $\mathrm{dist}_{v,t}$, which keeps track of the minimum distance of a VA $(v,t)$ from the source $s$, $\sigma_{v,t}$, which maintains the number of shortest temporal paths reaching a VA $(v,t)$ from $s$, and $P_{v,t}$, which keeps the set of predecessors of a VA $(v,t)$ across the shortest temporal paths explored. After initializing the values of the data structures for the source $s$ and the variable $d^{\min}_z$ keeping the minimum distance to reach $z$, we initialize the queue $Q$ that keeps the VAs to be visited in a BFS fashion (observe that, since the temporal paths need to be time respecting, all the paths need to account for the time at which each node is visited). Next, the algorithm explores the network in BFS order, extracting a VA $(v,t)$ from the queue, which corresponds to a node and the time at which such node is visited, and processing it by collecting its set $\mathcal{N}^{out}(v,t)$ of out-neighbouring VAs. If a VA $(w,t')$ was not already explored, then we update the minimum distance to reach $w$ at time $t'$, the minimum distance of the vertex $w$ if it was not already visited, and, if $w$ is the destination node $z$, we update $d^{\min}_z$. Observe that the distance to reach $z$ is used as a pruning criterion (clearly, if a VA appears at a distance greater than $d^{\min}_z$, then it cannot be on a shortest temporal path from $s$ to $z$). After updating the VAs to be visited by inserting them in $Q$, if the current temporal path is shortest for the VA analyzed, we update the number of shortest temporal paths leading to it, its set of predecessors, and the number of shortest temporal paths reaching the node.
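The BFS over vertex appearances described above can be sketched as follows. This is an illustrative Python reading of the prose, not the authors' C++ code: the function name `source_destination_stp`, the dictionary-based data structures, and the use of time $-\infty$ for the source's vertex appearance are all assumptions of this sketch.

```python
from collections import defaultdict, deque


def source_destination_stp(edges, s, z):
    """BFS over vertex appearances (node, time) from s, pruned at z's distance.

    Returns (sigma_z, d_min, dist_va, sigma_va, preds), where sigma_z is the
    number of shortest temporal paths from s to z and d_min their length.
    """
    out = defaultdict(set)
    for u, v, t in edges:
        out[u].add((v, t))
    start = (s, float("-inf"))  # assumption: the source precedes every edge
    dist_va = {start: 0}                      # distance of each VA from s
    sigma_va = defaultdict(int, {start: 1})   # shortest paths reaching each VA
    preds = defaultdict(set)                  # predecessors on shortest paths
    dist_node = {s: 0}                        # minimum distance of each node
    sigma_z, d_min = 0, float("inf")
    queue = deque([start])
    while queue:
        v, t = queue.popleft()
        d = dist_va[(v, t)]
        if d >= d_min:  # pruning: this VA cannot precede z on a shortest path
            continue
        for w, t2 in out[v]:
            if t2 <= t:  # not time respecting
                continue
            if (w, t2) not in dist_va:  # first time this VA is reached
                dist_va[(w, t2)] = d + 1
                dist_node.setdefault(w, d + 1)
                if w == z:
                    d_min = min(d_min, d + 1)
                queue.append((w, t2))
            if dist_va[(w, t2)] == d + 1:  # current path is shortest for the VA
                sigma_va[(w, t2)] += sigma_va[(v, t)]
                preds[(w, t2)].add((v, t))
                if w == z and d + 1 == d_min:
                    sigma_z += sigma_va[(v, t)]
    return sigma_z, d_min, dist_va, dict(sigma_va), dict(preds)
```

Tracking vertex appearances rather than nodes is what allows the same node to be reached at different times, which matters because a later arrival restricts the edges that can still be used.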

Update Estimates: STP criterion. Now we describe Algorithm 3, which updates the temporal betweenness estimates of each node internal to a path in $\mathcal{P}_{s,z}$ already computed. With Algorithm 2 we computed for each VA $(v,t)$ the number $\sigma_{v,t}$ of shortest temporal paths from $s$ reaching $(v,t)$. Now, in Algorithm 3 we need to combine such counts to compute the total number of shortest temporal paths leading to each VA appearing in a path in $\mathcal{P}_{s,z}$, allowing us to compute the estimate of $b(v)$ for each node $v$.

At the end of Algorithm 2 there are in total $\sigma_z$ shortest temporal paths reaching $z$ from $s$. Now we need to compute, for each node $v$ internal to a path in $\mathcal{P}_{s,z}$ and for each VA $(v,t)$, the number $\sigma^{z}_{v,t}$ of shortest temporal paths leading from $v$ to $z$ at a time greater than $t$. Then, the fraction of paths containing the node $v$ is computed with a simple formula, i.e., $\sum_{t} \sigma_{v,t} \cdot \sigma^{z}_{v,t} / \sigma_z$, where the sum ranges over the VAs $(v,t)$ of $v$. The whole procedure is described in Algorithm 3. We start by initializing $\sigma^{z}_{v,t}$, which stores for each VA $(v,t)$ the number of shortest temporal paths reaching $z$ at a time greater than $t$ starting from $v$, and a boolean matrix that keeps track, for each VA, of whether it has already been considered. We then initialize a queue $Q$ that will be used to explore the VAs appearing along the paths in $\mathcal{P}_{s,z}$ in reverse order of distance from $s$, starting from the destination node $z$. Then we initialize $\sigma^{z}_{w,t'}$ for each VA $(w,t')$ reaching $z$ at a given time, and we insert each such VA in the queue only once. The algorithm then starts its main loop, exploring the VAs in decreasing order of distance starting from $z$. We take the VA $(w,t')$ to be explored from the queue. If $w$ differs from $s$ (i.e., $w$ is an internal node), then we update its temporal betweenness estimate by adding $\sigma_{w,t'} \cdot \sigma^{z}_{w,t'} / \sigma_z$. As in the initialization step, we then process each predecessor of $(w,t')$ across the paths in $\mathcal{P}_{s,z}$, update the count of the paths from the predecessor to $z$ by summing the number of paths passing through $(w,t')$ and reaching $z$, and enqueue the predecessor only if it was not already considered. So, the algorithm terminates having properly computed for each node $v$ the estimate $\tilde{b}'(v)_i$ for each iteration $i$.
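The backward accumulation described above can be sketched in Python as follows. The function name `update_stp_estimates` and its argument layout are hypothetical choices for this example; it assumes that the counts `sigma_va` and the predecessor sets `preds` were produced by a forward pass such as the one described for Algorithm 2, with vertex appearances represented as `(node, time)` tuples.

```python
from collections import defaultdict, deque


def update_stp_estimates(s, z_vas, sigma_va, preds, sigma_z):
    """Backward pass from the destination's vertex appearances (VAs).

    z_vas: VAs of the destination at minimum distance from s.
    sigma_va[va]: number of shortest temporal paths from s reaching va.
    preds[va]: set of predecessor VAs of va on shortest temporal paths.
    sigma_z: total number of shortest temporal paths from s to z.
    Returns {node: fraction of shortest s-z paths having it as internal node}.
    """
    to_z = defaultdict(int)  # shortest paths from a VA onwards to z
    queue, seen = deque(), set()

    def push(pred, count):
        # Accumulate the paths through `pred`; enqueue it only once.
        to_z[pred] += count
        if pred not in seen:
            seen.add(pred)
            queue.append(pred)

    for va in z_vas:
        for pred in preds.get(va, ()):
            push(pred, 1)
    estimate = defaultdict(float)
    while queue:  # VAs are dequeued in decreasing order of distance from s
        va = queue.popleft()
        node = va[0]
        if node == s:
            continue  # the source is not an internal node
        estimate[node] += sigma_va[va] * to_z[va] / sigma_z
        for pred in preds.get(va, ()):
            push(pred, to_z[va])
    return dict(estimate)
```

Because the queue is filled level by level, the count `to_z[va]` is complete by the time `va` is dequeued, so each VA needs to be processed only once.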

Input: .
1 for  do
2       ;
3 empty queue; foreach  do
4       for  do
5             if  then
6                   .enqueue();
7            
8      
9while empty do
10       dequeue if  then
11             for  do
12                   if  then
13                         .enqueue();
14                  
15            
16      
Algorithm 3 Update betweenness estimates - STP.

4.3. Restless Temporal Betweenness

In this section we present the algorithms that are used in ONBRA when considering the RTP criterion for the optimal paths to compute the temporal betweenness centrality values.

Recall that, in such scenario, a temporal path $P = \langle e_1, \dots, e_k \rangle$ is considered optimal if and only if $P$, in addition to being shortest, is such that, given $\delta \in \mathbb{R}^{+}$, it holds $t_{i+1} \leq t_i + \delta$ for $i = 1, \dots, k-1$. Considering the RTP criterion, we need to relax the definition of shortest temporal paths and, instead, consider shortest temporal walks. Intuitively, a walk is a path where we drop the constraint that a node must be visited at most once. We provide an intuition of why we need such requirement in Figure 2. Given $\delta$, we refer to a shortest temporal walk satisfying the timing constraint as a shortest $\delta$-restless temporal walk.

In order to properly work under the RTP criterion, ONBRA needs novel algorithms to compute the optimal walks and update the betweenness estimates. Note that to compute the shortest $\delta$-restless temporal walks we can use Algorithm 2, provided that we add the condition $t' - t \leq \delta$ when collecting the out-neighbouring VAs $(w,t')$ of a VA $(v,t)$.
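The additional timing condition can be illustrated with a short Python sketch. The helper name `restless_out_neighbours` is hypothetical, and treating the source as unconstrained (time `None`) is an assumption of this example rather than something stated in the text.

```python
def restless_out_neighbours(out_edges, t, delta):
    """Out-neighbouring vertex appearances allowed under the restless constraint.

    out_edges: iterable of (w, t2) pairs leaving the current node.
    t: time of the current vertex appearance, or None for the source
       (assumption: the first edge of a walk is not constrained).
    delta: maximum waiting time between consecutive interactions.
    """
    if t is None:
        return sorted(out_edges)
    # Keep edges that are time respecting and leave within delta time units.
    return sorted((w, t2) for w, t2 in out_edges if t < t2 <= t + delta)
```

For example, from a vertex appearance at time 2 with `delta = 4`, an edge at time 6 is still usable while an edge at time 10 is not.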

More interestingly, the biggest computational problem arises when updating the temporal betweenness values of the various nodes on the optimal walks. Note that, to do so, we cannot use Algorithm 3 because it does not account for cycles (i.e., when vertices appear multiple times across a walk). We therefore introduce Algorithm 4 (pseudocode in Appendix B) that works in the presence of cycles. The main intuition behind Algorithm 4 is that we need to recreate backwards all the optimal walks obtained through the RTP version of Algorithm 2. For each walk we maintain a set that keeps track of the nodes already visited up to the current point of the exploration of the walk, updating a node's estimate if and only if we see such node for the first time. This is based on the simple observation that a cycle cannot alter the value of the betweenness centrality of a node on a fixed walk, allowing us to account only once for the node's appearance along the walk.

We now describe Algorithm 4 by discussing its differences with Algorithm 3. Instead of maintaining a matrix keeping track of the presence of a VA in the queue, we now initialize a matrix that keeps the number of times a VA is in the queue. The queue keeps elements of the form $\langle (v,t), S \rangle$, where the first entry is a VA to be explored and the second entry is the set of nodes already visited backwards along the walk leading to such vertex appearance. While visiting each walk backwards, we check if the nodes are visited for the first time on such walk: if so, we update the betweenness values by accounting for the number of times we will visit such VA across other walks. Next, we update the set of nodes visited. Finally, we update the count of the walks leading from the predecessor of the current VA to $z$ and the number of times such predecessor will be visited, and we enqueue the predecessor to be explored, together with the additional information of the set of nodes explored up to that point.

To conclude, note that Algorithm 4 is more expensive than Algorithm 3 since it recreates all the optimal walks, while Algorithm 3 avoids such step given the absence of cycles.

Figure 2. Considering the temporal network in Figure 1 and a given value of $\delta$, the paths on the left between the considered pair of nodes are not shortest $\delta$-restless, since both violate the timing constraint. Instead, the walk on the right is shortest and meets the timing constraint: so, it is a shortest $\delta$-restless walk.

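4.4. ONBRA - Theoretical Guarantees

In order to address Problem 1, ONBRA bounds the deviation between the estimates $\tilde{b}(v)$ and the actual values $b(v)$, for every node $v \in V$. To do so, we leverage the so-called empirical Bernstein bound, which we adapted to ONBRA.

Given a node $v \in V$, let $\tilde{\mathbf{b}}(v) = (\tilde{b}'(v)_1, \dots, \tilde{b}'(v)_r)$, where $\tilde{b}'(v)_i$ is the estimate of $b(v)$ by analysing the $i$-th sample, $i \in \{1,\dots,r\}$. Let $\mathbf{V}(\tilde{\mathbf{b}}(v))$ be the empirical variance of $\tilde{\mathbf{b}}(v)$:

$$\mathbf{V}(\tilde{\mathbf{b}}(v)) = \frac{1}{r(r-1)} \sum_{1 \leq i < j \leq r} \left( \tilde{b}'(v)_i - \tilde{b}'(v)_j \right)^2.$$

We use the empirical Bernstein bound to limit the deviation between the $\tilde{b}(v)$'s and the $b(v)$'s. The bound is Corollary 5 of (maurer2009empirical) adapted to our framework, since Corollary 5 of (maurer2009empirical) is formulated for generic random variables taking values in $[0,1]$ and for an arbitrary set of functions.

Theorem 4.1 (Corollary 5, (maurer2009empirical)).

Let $r \geq 2$ be the number of samples, and $\eta \in (0,1)$ be the confidence parameter. Let $\tilde{b}'(v)_i$ be the estimate of $b(v)$ by analysing the $i$-th sample, $i \in \{1,\dots,r\}$, and $v \in V$. Let $\tilde{\mathbf{b}}(v) = (\tilde{b}'(v)_1, \dots, \tilde{b}'(v)_r)$, and $\mathbf{V}(\tilde{\mathbf{b}}(v))$ be its empirical variance. With probability at least $1-\eta$, and for every node $v \in V$, we have that

$$|\tilde{b}(v) - b(v)| \leq \sqrt{\frac{2 \mathbf{V}(\tilde{\mathbf{b}}(v)) \ln(4n/\eta)}{r}} + \frac{7 \ln(4n/\eta)}{3(r-1)}.$$

The right-hand side of the inequality of the previous theorem differs from Corollary 5 of (maurer2009empirical) by a factor of $2$ in the arguments of the natural logarithms, since in (maurer2009empirical) the bound is not stated in the symmetric form reported in Theorem 4.1. Finally, the result about the guarantees on the quality of the estimates provided by ONBRA follows.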

Corollary 4.2 ().

Given a temporal network $T$, the pair $(\varepsilon', \tilde{B})$ in output from ONBRA is such that, with probability at least $1-\eta$, it holds that $\tilde{B}$ is an absolute $\varepsilon'$-approximation set of $B$.

Observe that Corollary 4.2 is independent of the structure of the optimal paths considered by ONBRA, therefore such guarantees hold for both the criteria considered in our work.
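To make the bound concrete, the following Python sketch computes a data-dependent accuracy from the per-sample estimates of each node. It assumes the standard form of the empirical Bernstein bound of Maurer and Pontil with a union bound over the $n$ nodes and a two-sided statement, i.e., logarithmic terms $\ln(4n/\eta)$; the function names and the dictionary-based input are hypothetical.

```python
import math


def empirical_variance(xs):
    """Sample variance; equals 1/(r(r-1)) * sum over i<j of (x_i - x_j)^2."""
    r = len(xs)
    mean = sum(xs) / r
    return sum((x - mean) ** 2 for x in xs) / (r - 1)


def bernstein_epsilon(per_node_samples, eta):
    """Largest empirical Bernstein deviation bound over all nodes.

    per_node_samples: {node: list of r per-sample estimates in [0, 1]}.
    eta: confidence parameter in (0, 1).
    Assumption: log term ln(4n/eta), with n the number of nodes.
    """
    n = len(per_node_samples)
    r = len(next(iter(per_node_samples.values())))
    log_term = math.log(4 * n / eta)
    return max(
        math.sqrt(2 * empirical_variance(xs) * log_term / r)
        + 7 * log_term / (3 * (r - 1))
        for xs in per_node_samples.values()
    )
```

When all per-sample estimates of a node coincide, its empirical variance is zero and only the second term, which decays as $1/r$, remains; this is where the variance-aware bound improves over bounds that decay as $1/\sqrt{r}$ regardless of the data.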

5. Experimental Evaluation

In this section we present our experimental evaluation, which has the following goals: (i) motivate the study of the temporal betweenness centrality by showing two real-world temporal networks on which the temporal betweenness provides novel insights compared to the static betweenness computed on their associated static networks; (ii) assess, considering the STP criterion, the accuracy of ONBRA's estimates, and the benefit of using ONBRA instead of the state-of-the-art exact approach (Buss2020), both in terms of running time and memory usage; (iii) finally, show how ONBRA can be used on a real-world temporal network to analyze the RTP-based betweenness centrality values.

5.1. Setup

Name                  Nodes   Edges    Granularity   Timespan
HighSchool2012 (HS)   180     45K      20 sec        7 (days)
CollegeMsg            1.9K    59.8K    1 sec         193 (days)
EmailEu               986     332K     1 sec         803 (days)
FBWall (FB)           35.9K   199.8K   1 sec         100 (days)
Sms                   44K     544.8K   1 sec         338 (days)
Mathoverflow          24.8K   390K     1 sec         6.4 (years)
Askubuntu             157K    727K     1 sec         7.2 (years)
Superuser             192K    1.1M     1 sec         7.6 (years)
Table 1. Datasets used and their statistics.

We implemented ONBRA in C++20 and compiled it using gcc 9. The code is freely available at https://github.com/iliesarpe/ONBRA. All the experiments were performed sequentially on a 72 core Intel Xeon Gold 5520 @ 2.2GHz machine with 1008GB of RAM available. The real-world datasets we used, which are mostly social or message networks from different domains, are described in Table 1. Such datasets are publicly available online at http://www.sociopatterns.org/ and https://snap.stanford.edu/temporal-motifs/data.html. For detailed descriptions of such datasets we refer to the links reported and to (Paranjape2017). To obtain the FBWall dataset we kept only the last 200K edges of the original dataset (Viswanath2009), which has more than 800K edges. This truncation allows the exact algorithm to complete its execution without exceeding the available memory.

5.2. Temporal vs Static Betweenness

In this section we show that the temporal betweenness centrality of the nodes of a temporal network provides novel insights compared to its static version. To do so, for two datasets from different domains we computed the exact ranking of the nodes according to their betweenness values. The goal of this experiment is to compare the two rankings (i.e., temporal and static) and understand whether the relative orderings are preserved, i.e., whether the most central nodes in the static network are also the most central nodes in the temporal network. To this end, each temporal network is compared with its associated static network. We used the following two real-world networks: (i) HS, a temporal network representing face-to-face interactions among students; and (ii) FB, a Facebook user-activity network (Viswanath2009) (see Table 1 for further details).

We first computed the exact temporal and static betweenness values of the nodes of the two networks. Then, we ranked the nodes by descending betweenness values. We now discuss how the top-k ranked nodes vary from temporal to static on the two networks. We report in Table 3 (in Appendix C) the Jaccard similarity between the sets containing the top-k nodes of the static and temporal networks. On HS, for k=25, only 11 nodes are top ranked in both rankings, which means that less than half of the top-25 temporally central nodes are identified if only the static information is considered. The size of the intersection increases to 36 for k=50, since the network has only 180 nodes. More interestingly, on the Facebook network as well only a few temporally central nodes can be detected by considering only static information: only 9 of the top-25 nodes and 15 of the top-50 nodes. To better visualize the top-k ranked nodes, we show their betweenness values in Figure 3(a): note that many top-k temporally ranked nodes have small static betweenness values, and vice versa.

These experiments show the importance of studying the temporal betweenness centrality, which provides novel insights compared to the static version.

Figure 3. (a): static and temporal betweenness values of the top-50 ranked nodes of the dataset FB; (b): for dataset EmailEu, the deviations (or absolute errors) between the estimates and the actual values of the temporal betweenness centrality, sorted in decreasing order of the actual values; (c), (d): comparison between the temporal betweenness values based on STP and RTP, for an RTP time bound of 15 days (left) and 1 month (right).
Dataset | Avg. Error | Max. Error | Bound | Sample rate (%) | Time exact (sec) | Time ONBRA (sec) | MEM exact (GB) | MEM ONBRA (GB)
CollegeMsg | | | | 0.083 | 231 | 148 | 12.0 | 0.13
EmailEu | | | | 0.093 | 7211 | 1808 | 23.9 | 2.1
Mathoverflow | | | | 0.005 | 79492 | 36983 | 1004.3 | 6.8
FBWall | | | | 0.003 | 11489 | 3145 | 738.0 | 11.1
Askubuntu | | | | 0.00006 | ✗ | 35585 | >1008 | 20.3
Sms | | | | 0.00231 | ✗ | 13020 | >1008 | 16.2
Superuser | | | | 0.00003 | ✗ | 41856 | >1008 | 16.7
Table 2. For each dataset: the average and maximum deviation between the estimate and the actual temporal betweenness value over all nodes and runs, the theoretical upper bound on the deviation, the sample rate (% of pairs of nodes sampled), the running time and peak RAM required by the exact approach (Buss2020), and the running time and peak RAM required by ONBRA. The symbol ✗ denotes that the exact computation of (Buss2020) is not able to conclude on our machine.

5.3. Accuracy and Resources of ONBRA

In this section we first assess the accuracy of the estimates provided by ONBRA considering only the STP criterion, since for the RTP criterion no implemented exact algorithm exists. Then, we show the reduction of computational resources achieved by ONBRA compared to the exact algorithm in (Buss2020).

To assess ONBRA's accuracy and its computational cost, we used four datasets, i.e., CollegeMsg, EmailEu, Mathoverflow, and FBWall. We first executed the exact algorithm, and then fixed ONBRA's parameters so that it runs within a fraction of the time required by the exact algorithm. The results we now present, which are described in detail in Table 2, are all averaged over 10 runs (except for the RAM peak, which is measured over a single execution of the algorithms).

Remarkably, even when sampling only a small fraction of the overall pairs of nodes (see the sample rates in Table 2), ONBRA estimates the temporal betweenness centrality values with very small average deviations, while obtaining a running-time speed-up between roughly 1.5× and 4× with respect to the exact algorithm (Buss2020). Additionally, the amount of RAM used by ONBRA is significantly smaller than that of the exact algorithm in (Buss2020): e.g., on the Mathoverflow dataset ONBRA requires only 6.8 GB of peak RAM, far less than the 1004.3 GB required by the exact state-of-the-art algorithm (Buss2020). Furthermore, in all the experiments we found that the maximum deviation is within one order of magnitude of the theoretical upper bound guaranteed by Corollary 4.2. Surprisingly, for two datasets (EmailEu and Mathoverflow) the maximum deviation and the upper bound are even of the same order of magnitude. We can therefore conclude that the guarantees provided by Corollary 4.2 are often very sharp. In addition, ONBRA's accuracy is demonstrated by the fact that the deviation between the actual temporal betweenness centrality value of a node and its estimate obtained using ONBRA is about one order of magnitude smaller than the actual value, as we show in Figure 3(b) and Figure 4 (in Appendix C).
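The speed-up and memory ratios discussed above can be recomputed directly from Table 2. The following is a minimal sketch, assuming the four numeric columns are read as exact running time, ONBRA running time, exact peak memory, and ONBRA peak memory; the dictionary `table2` and the helper `ratios` are illustrative names, not part of the ONBRA code base.

```python
# Values copied from Table 2, assumed column order:
# (exact time in sec, ONBRA time in sec, exact peak RAM in GB, ONBRA peak RAM in GB)
table2 = {
    "CollegeMsg": (231, 148, 12.0, 0.13),
    "EmailEu": (7211, 1808, 23.9, 2.1),
    "Mathoverflow": (79492, 36983, 1004.3, 6.8),
    "FBWall": (11489, 3145, 738.0, 11.1),
}


def ratios(row):
    """Return (running-time speed-up, memory reduction factor) for one table row."""
    t_exact, t_onbra, mem_exact, mem_onbra = row
    return t_exact / t_onbra, mem_exact / mem_onbra


for name, row in table2.items():
    speedup, mem_factor = ratios(row)
    print(f"{name}: {speedup:.2f}x faster, {mem_factor:.1f}x less peak memory")
```

Under this reading of the columns, every speed-up falls between 1.5× and 4×, and the largest memory reduction is on Mathoverflow.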

Finally, we show in Table 2 that on the large datasets Askubuntu, Sms, and Superuser the exact algorithm (Buss2020) is not able to conclude the computation on our machine (denoted with ✗), since it requires more than 1008GB of RAM. Instead, ONBRA provides estimates of the temporal betweenness centrality values in at most 41856 sec and with at most 20.3 GB of RAM (see Table 2).

To conclude, ONBRA is able to estimate the temporal betweenness centrality with high accuracy, providing rigorous and sharp guarantees, while significantly reducing the computational resources required by the exact algorithm in (Buss2020).

5.4. ONBRA on RTP-based Betweenness

In this section we discuss how ONBRA can be used to analyze real-world networks by estimating the centrality values of the nodes for the temporal betweenness under the RTP criterion.

We used the FB network, on which we computed a tight approximation of the temporal betweenness values of the nodes for different values of the RTP time bound, i.e., 1 day, 15 days, and 1 month. For a bound of 1 day, we found only 4 nodes with temporal betweenness value different from 0, which is surprising since it highlights that the information spreading across wall posts through RTPs in 2008 on Facebook required more than 1 day between consecutive interactions (i.e., slow spreading). We present the results for the other values of the bound in Figures 3(c) and 3(d), comparing them to the (exact) STP-based betweenness. Interestingly, 15 days are still not sufficient to capture most of the STP-based betweenness values of the nodes, while with a bound of 1 month the betweenness values are much closer to the STP-based values. While this behaviour is to be expected as the bound increases, finding such values of the bound helps to better characterize the dynamics over the network.
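To make the role of the RTP time bound concrete, the sketch below checks whether a sequence of interaction timestamps along a path respects a given bound. It is an illustration only: the function name `is_restless` and the exact rule (each interaction strictly later than the previous one, and at most `delta` after it) are assumptions, and the formal definition of restless temporal paths given earlier in the paper should be taken as authoritative.

```python
DAY = 24 * 60 * 60  # seconds, matching the 1-second granularity of FB in Table 1


def is_restless(timestamps, delta):
    """Check an assumed restlessness rule on a temporal path.

    timestamps: times of the consecutive edges of the path, in path order.
    delta: maximum time allowed between two consecutive interactions.
    Assumption: each edge occurs strictly after, and at most delta after, the previous one.
    """
    return all(0 < nxt - cur <= delta for cur, nxt in zip(timestamps, timestamps[1:]))


# Hypothetical path whose consecutive interactions are 3 and 7 days apart.
path = [0, 3 * DAY, 10 * DAY]
print(is_restless(path, 1 * DAY))   # False: both gaps exceed 1 day
print(is_restless(path, 15 * DAY))  # True: both gaps are within 15 days
```

A path like this one contributes nothing to the RTP-based betweenness for a bound of 1 day but is admissible for a bound of 15 days, which is the effect observed when the bound grows.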

To conclude, ONBRA also enables novel analyses that cannot otherwise be performed with existing tools.

6. Discussion

In this work we presented ONBRA, the first algorithm that provides high-quality approximations of the temporal betweenness centrality values of the nodes in a temporal network, with rigorous probabilistic guarantees. ONBRA works under two different optimality criteria for the paths on which the temporal betweenness centrality is defined: the shortest and the restless temporal path (STP, RTP) criteria. To the best of our knowledge, ONBRA is the first algorithm enabling a practical computation under the RTP criterion. Our experimental evaluation shows that ONBRA provides high-quality estimates with tight guarantees, while remarkably reducing the computational costs compared to the state-of-the-art in (Buss2020), enabling analyses that would not otherwise be possible to perform.

Finally, several interesting directions could be explored in the future, such as dealing with different optimality criteria for the paths, and employing sharper concentration inequalities to provide tighter guarantees on the quality of the estimates.

Acknowledgements.
Part of this work was supported by the Italian Ministry of Education, University and Research (MIUR), under PRIN Project n. 20174LF3T8 “AHeAD” (efficient Algorithms for HArnessing networked Data) and the initiative “Departments of Excellence” (Law 232/2016), and by University of Padova under project “SID 2020: RATED-X”.

References

Appendix A Missing Proofs

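Lemma A.1.

For every node, the estimate of its temporal betweenness centrality computed by ONBRA is an unbiased estimator of the exact value.

Proof.

Consider, for each pair of nodes, a Bernoulli random variable that takes value 1 if the pair is sampled, and 0 otherwise. The claim follows from the expectation of these variables and the linearity of expectation.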
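For reference, the standard unbiasedness argument for a pair-sampling estimator can be sketched as follows. The notation here is an assumption made for illustration and may differ from the paper's: $n$ is the number of nodes, $\ell$ pairs $(s_i, z_i)$ are drawn independently and uniformly among the $n(n-1)$ ordered pairs of distinct nodes, $f_v(s,z) \in [0,1]$ is the fraction of optimal temporal paths from $s$ to $z$ that pass through $v$, $b_v = \frac{1}{n(n-1)} \sum_{s \neq z} f_v(s,z)$ is the exact value, and $\tilde{b}_v = \frac{1}{\ell} \sum_{i=1}^{\ell} f_v(s_i, z_i)$ is the estimate.

```latex
\begin{align*}
\mathbb{E}\bigl[\tilde{b}_v\bigr]
  &= \frac{1}{\ell} \sum_{i=1}^{\ell} \mathbb{E}\bigl[f_v(s_i, z_i)\bigr]
     && \text{(linearity of expectation)} \\
  &= \frac{1}{\ell} \sum_{i=1}^{\ell} \sum_{s \neq z}
     \Pr\bigl[(s_i, z_i) = (s, z)\bigr] \, f_v(s, z) \\
  &= \frac{1}{\ell} \sum_{i=1}^{\ell} \frac{1}{n(n-1)} \sum_{s \neq z} f_v(s, z)
   \;=\; b_v .
\end{align*}
```

Under these assumptions each sampled pair contributes $b_v$ in expectation, so averaging over the $\ell$ samples leaves the expectation unchanged.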

Appendix B Missing Algorithms

In this section we present Algorithm 4, which is used to compute the estimates of the temporal betweenness values of the nodes under the RTP criterion. It is discussed in detail in Section 4.3.

Input: .
for  do
      ;
 empty queue;
foreach  do
      for  do
            .enqueue();
while  empty do
      dequeue;
      if  then
            if  then
                  
            for  do
                  .enqueue();
Algorithm 4 Update betweenness estimates - RTP.

Appendix C Supplementary Data

Figure 4. (a): static and temporal betweenness values of the top-50 ranked nodes of the dataset HS; (b), (c), and (d): respectively for datasets CollegeMsg, FBWall, and Mathoverflow, the deviations (or absolute errors) between the estimates and the actual values of the temporal betweenness centrality, sorted in decreasing order of the actual values.

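Given a temporal network and its associated static network, consider the set of the top-k nodes ranked by their temporal betweenness values and the set of the top-k nodes ranked by their static betweenness values. We report in Table 3 the Jaccard similarity between these two sets (the size of their intersection divided by the size of their union) for two different values of k, namely k=25 and k=50.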

Name | k=25 | k=50
HS | 0.28 (11) | 0.56 (36)
FB | 0.22 (9) | 0.18 (15)
Table 3. Static vs temporal top-k nodes Jaccard similarity. We also report the size of the intersection (in parentheses).
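The comparison behind Table 3 can be sketched in a few lines. This is a minimal illustration, not the authors' code: the helper names `top_k`, `jaccard`, and `jaccard_from_overlap` and the toy scores are hypothetical. Since both sets have exactly k elements, an intersection of size i gives a Jaccard similarity of i / (2k - i), which reproduces the table entries from the reported intersection sizes.

```python
def top_k(scores, k):
    """Return the set of the k nodes with the largest score (ties broken by node id)."""
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def jaccard_from_overlap(k, i):
    """Jaccard similarity of two sets of size k sharing exactly i elements."""
    return i / (2 * k - i)


# Toy betweenness scores for five nodes (illustrative values only).
temporal = {"a": 0.9, "b": 0.5, "c": 0.4, "d": 0.1, "e": 0.0}
static = {"a": 0.2, "b": 0.8, "c": 0.0, "d": 0.7, "e": 0.1}
print(jaccard(top_k(temporal, 2), top_k(static, 2)))  # top-2 sets {a, b} and {b, d} share one node

# Intersection sizes reported in Table 3.
for name, k, i in [("HS", 25, 11), ("HS", 50, 36), ("FB", 25, 9), ("FB", 50, 15)]:
    print(name, k, round(jaccard_from_overlap(k, i), 2))
```

For example, 11 shared nodes among two top-25 sets give 11 / 39 ≈ 0.28, the HS entry for k=25.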