Eccentricity queries and beyond using Hub Labels

10/29/2020 ∙ by Guillaume Ducoffe, et al.

Hub labeling schemes are popular methods for computing distances on road networks and other large complex networks, often answering a query within a few microseconds for graphs with millions of edges. In this work, we study their algorithmic applications beyond distance queries. We focus on eccentricity queries and distance-sum queries, for several versions of these problems on directed weighted graphs; this focus is in part motivated by their importance in facility location problems. On the negative side, we show conditional lower bounds for the above problems on unweighted undirected sparse graphs, via standard constructions from "fine-grained" complexity. However, things take a different turn when the hub labels have a sublogarithmic size. Indeed, given a hub labeling of maximum label size ≤ k, after pre-processing the labels in total 2^O(k) · |V|^{1+o(1)} time, we can compute both the eccentricity and the distance-sum of any vertex in 2^O(k) · |V|^{o(1)} time. Our approach can also be applied to the fast global computation of some topological indices. Finally, as a by-product of our approach, on any fixed class of unweighted graphs with bounded expansion, we can decide whether the diameter of an n-vertex graph in the class is at most k in f(k) · n^{1+o(1)} time, for some "explicit" function f.


1 Introduction

We refer to [10, 24] for basic notions and terminology in Graph Theory. Real-world networks can be conveniently represented as a weighted graph G = (V, E, w), with typical interpretations of the weight function being the latency of a link, the length of a road between two locations, etc. If w ≡ 1, then we can drop the weight function, calling the graph unweighted and writing G = (V, E). Unless stated otherwise, all graphs considered in this note are finite, weighted, directed and strongly connected (the strongly connected assumption could easily be dropped, but it makes the presentation simpler by avoiding the pathological case of infinite distances). For every ordered pair of vertices (u, v), the distance dist_G(u, v) is defined as the smallest weight of a path from u to v. We omit the subscript if the graph is clear from the context. We note that distance computation is an important primitive in many scenarios, ranging from Satellite Navigation devices to forwarding network packets. However, in some practical or theoretical applications, the "right" notion of distance is a twist of the one presented above. Let us give a few examples (taken from [2]).

  • Source distance: dist(u, v). This is the standard distance.

  • Max distance: max{dist(u, v), dist(v, u)}. It turns the distances in a directed graph into a metric (i.e., it is symmetric and it satisfies the triangle inequality).

  • Min distance: min{dist(u, v), dist(v, u)}. As a concrete application of the latter, Abboud et al. [2] cited the problem of optimally locating a hospital (minimizing the time for either the patient coming to the hospital, or an ambulance being sent to the patient's home).

  • Roundtrip distance: dist(u, v) + dist(v, u). It also turns the distances in a directed graph into a metric. Many distance computation techniques for undirected graphs can be generalized to roundtrip distances, such as the existence of low-stretch sparse spanners [51]. A short sketch illustrating these four notions follows this list.
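For illustration, the following minimal Python sketch derives the four notions above from a one-way distance oracle dist(u, v) (e.g., one shortest-path computation per call). The type aliases and function names are ours, introduced only for this sketch.

```python
from typing import Callable

Vertex = int
Dist = Callable[[Vertex, Vertex], float]  # one-way distance oracle dist(u, v)

def source_distance(dist: Dist, u: Vertex, v: Vertex) -> float:
    # The standard (one-way) distance from u to v.
    return dist(u, v)

def max_distance(dist: Dist, u: Vertex, v: Vertex) -> float:
    # Symmetrizes the directed distance; yields a metric on directed graphs.
    return max(dist(u, v), dist(v, u))

def min_distance(dist: Dist, u: Vertex, v: Vertex) -> float:
    # E.g., time for either the patient reaching the hospital
    # or the ambulance reaching the patient.
    return min(dist(u, v), dist(v, u))

def roundtrip_distance(dist: Dist, u: Vertex, v: Vertex) -> float:
    # Cost of going from u to v and back; also a metric on directed graphs.
    return dist(u, v) + dist(v, u)
```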

A distance oracle is a data structure for computing distances. Next, we present a well-known family of distance oracles, whose algorithmic properties we explore in this work.

Hub labeling. A 2-hopset or, as we call it here, a hub labeling, assigns to every vertex v an ordered pair (L^+(v), L^-(v)) of vertex-subsets, along with the distances dist(v, x), for every x ∈ L^+(v), and dist(y, v), for every y ∈ L^-(v). This labeling must ensure that, for any ordered pair of vertices (u, v), we have dist(u, v) = min{dist(u, x) + dist(x, v) : x ∈ L^+(u) ∩ L^-(v)}. In particular, we can compute dist(u, v) – and more generally, the source, max-, min- and roundtrip distances from u to v – in O(|L^+(u)| + |L^-(v)|) time. It has been observed in [43] that most distance labeling constructions are based on hub label schemes, thereby further motivating our study of the latter as a first step toward a more general analysis of the algorithmic properties of distance oracles.
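As a concrete (if naive) rendering of the query just described, the sketch below assumes that every vertex stores its out-hubs L^+(u) and in-hubs L^-(v) as lists of (hub, distance) pairs sorted by hub identifier, so that a single merge-like scan returns dist(u, v). This data layout is an assumption made for illustration, not a prescribed format.

```python
import math
from typing import Dict, List, Tuple

Vertex = int
# For each vertex: a list of (hub, distance) pairs, sorted by hub identifier.
OutLabels = Dict[Vertex, List[Tuple[Vertex, float]]]  # hubs x with dist(u, x)
InLabels = Dict[Vertex, List[Tuple[Vertex, float]]]   # hubs x with dist(x, v)

def hub_query(L_out: OutLabels, L_in: InLabels, u: Vertex, v: Vertex) -> float:
    """Return dist(u, v) = min over common hubs x of dist(u, x) + dist(x, v).

    With both label lists sorted by hub identifier, one merge-like scan
    answers the query in O(|L_out[u]| + |L_in[v]|) time.
    """
    a, b = L_out[u], L_in[v]
    i = j = 0
    best = math.inf
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:
            best = min(best, a[i][1] + b[j][1])
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            i += 1
        else:
            j += 1
    return best
```

The min-, max- and roundtrip distances between u and v then follow from the two calls hub_query(L_out, L_in, u, v) and hub_query(L_out, L_in, v, u).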

The maximum label size is defined as the maximum, over all vertices v, of max{|L^+(v)|, |L^-(v)|}, and it is the main complexity measure for hub labelings (the bit size is larger by at least a logarithmic factor, because we need to encode the vertices' IDs and the distance values for all the vertices inside the labels). Every graph admits a hub labeling of maximum label size O(n), and this is essentially optimal [34]. However, it has been experimentally verified that many real-world graphs, and especially road networks, admit hub labelings with small maximum label size [17]. From a more theoretical point of view, every n-node planar graph admits a hub labeling with maximum label size O(√n) [34]. Bounded-treewidth graphs and graphs of bounded highway dimension – a parameter conjectured to be small for road networks – admit hub labelings with (poly)logarithmic maximum label size [4, 34]. Graphs of bounded tree-depth even have hub labelings with constant maximum label size, that can be computed in quasi linear time [40].

Eccentricity and Distance-sum computations. Since many hub labeling schemes are readily implemented [6, 3, 5, 22, 44], our approach consists in assuming a graph to be given with a hub labeling, and studying whether this additional information helps in solving certain distance problems faster. Specifically, let dist represent any distance function on directed graphs (source, min-, max- or roundtrip). The eccentricity of a vertex u, denoted by ecc(u), is the maximum distance from u to any other vertex, i.e., ecc(u) = max_{v ∈ V} dist(u, v). The diameter and the radius of a graph – w.r.t. dist – are, respectively, the maximum and minimum eccentricities amongst its vertices. The distance-sum of a vertex u, that we denote by s(u), is the sum of all the distances from u to any other vertex, i.e., s(u) = ∑_{v ∈ V} dist(u, v). A median is a vertex minimizing its distance-sum, while the Wiener index is equal to ∑_{u ∈ V} s(u). We stress that both computing the radius and the median set of a graph are fundamental facility location problems, with applications to the optimal placement of schools, hospitals and other important facilities on a network. The problems of computing the diameter and the Wiener index found applications in network design and chemistry [53], respectively. In Sec. 3, we consider other topological indices from chemical graph theory.
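The brute-force Python definitions below merely restate the above quantities, assuming the full distance matrix is available (hence Θ(n^2) space, unlike the approach developed in this paper); they can serve as a reference for testing faster implementations.

```python
from typing import Dict, Set

Vertex = int
DistMatrix = Dict[Vertex, Dict[Vertex, float]]  # dist[u][v] for every ordered pair

def eccentricity(dist: DistMatrix, u: Vertex) -> float:
    # ecc(u) = max over v of dist(u, v)
    return max(dist[u].values())

def distance_sum(dist: DistMatrix, u: Vertex) -> float:
    # s(u) = sum over v of dist(u, v)
    return sum(dist[u].values())

def diameter(dist: DistMatrix) -> float:
    # Maximum eccentricity over all vertices.
    return max(eccentricity(dist, u) for u in dist)

def radius(dist: DistMatrix) -> float:
    # Minimum eccentricity over all vertices.
    return min(eccentricity(dist, u) for u in dist)

def median_set(dist: DistMatrix) -> Set[Vertex]:
    # All vertices whose distance-sum is minimum.
    best = min(distance_sum(dist, u) for u in dist)
    return {u for u in dist if distance_sum(dist, u) == best}

def wiener_index(dist: DistMatrix) -> float:
    # Sum of the distances over all ordered pairs of vertices.
    return sum(distance_sum(dist, u) for u in dist)
```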

1.1 Related work

Computing the eccentricities in a graph is a notoriously "hard" problem in P, for which no quasi linear-time algorithm is likely to exist. For weighted n-vertex graphs, the problem is subcubic equivalent to All-Pairs Shortest-Paths (APSP). It is conjectured that we cannot solve APSP in O(n^{3-ε}) time, for any ε > 0 [1]. For the special case of unweighted graphs, we can compute the eccentricities in truly subcubic time using fast matrix multiplication [55, 62]. However, if we assume the Strong Exponential-Time Hypothesis (SETH), then this cannot be improved to O(n^{2-ε}) time, for any ε > 0 [52]. Under a related hypothesis, the so-called "Hitting Set Conjecture", similar hardness results were proved for median computation [2]. For all that, some heuristics and exact algorithms for diameter, radius and eccentricities computations do perform well in practice [11, 21, 56, 57], which has been partially justified from a theoretical viewpoint [25, 59, 60].

These hardness results have renewed interest in characterizing the graph classes for which some of the above distance problems can be solved in truly subquadratic time, or even in quasi linear time. In this respect, several recent results were proved by using classical tools from Computational Geometry [2, 13, 14, 15, 26, 28, 27, 35]. Perhaps the most impressive such result is the truly subquadratic algorithm of Cabello for computing the diameter and the Wiener index of planar graphs [15]. It is worth mentioning that Cabello's techniques were the starting point of recent breakthrough results for exact distance oracles on planar graphs [18, 36]. In this paper, we ask whether, conversely, having at hand an exact distance oracle with certain properties can be helpful in the design of fast algorithms for some distance problems on graphs. We give a partial answer for hub labelings.

Hub Labelings.

Computing a hub labeling with optimal maximum label size is NP-hard [7], but it can be approximated up to a logarithmic factor in polynomial time [17]. The use of hub labels beyond distance queries was already considered in [31] and [63], for k-nearest neighbours queries and shortest-path counting, respectively. Several generalizations of hub labels were proposed. For instance, a 3-hopset assigns a pair of labels to every vertex (as for hub labelings), but it additionally stores a global set of distances between hubs; the distance between any two vertices u and v must then be realized by going from u to one of its hubs, then to a hub of v, and finally to v [37]. A distance labeling scheme is given by an encoding function and a decoding function: it assigns a label to every vertex such that, for any two vertices u and v, dist(u, v) can be computed from the labels of u and v alone [34]. These extensions are to be investigated in future work.
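For concreteness, a distance labeling scheme can be thought of as the following hypothetical interface, where encode may inspect the whole graph but decode only sees the two labels; the interface below is our own sketch, not an existing API.

```python
from typing import Protocol

Vertex = int
Label = bytes  # any self-contained encoding of a vertex's label

class DistanceLabelingScheme(Protocol):
    def encode(self, graph: object, v: Vertex) -> Label:
        """Assign a label to vertex v, with access to the whole graph."""
        ...

    def decode(self, label_u: Label, label_v: Label) -> float:
        """Recover dist(u, v) from the two labels alone."""
        ...
```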

1.2 Results

We first address the following types of queries on the vertices of a graph G = (V, E, w).

  • Eccentricity query: compute the eccentricity of a given vertex v;

  • Distance-sum query: compute the distance-sum of a given vertex v.

Before stating our results, let us discuss naive solutions for these two types of queries. If we compute APSP, in Õ(nm + n^2) time, then we can compute all eccentricities and distance-sums in additional O(n^2) time. In doing so, we get a trivial data structure with O(n) space and O(1) query time. Therefore, the main challenge here is to decrease the pre-processing time. A hub labeling (or more generally, a distance labeling) of maximum label size k is an implicit representation of the distance matrix in O(kn) space. It allows us to answer the above two types of queries in O(kn) time. However, this query time is in Ω(n), even in the very favourable case of k = O(1). If, for instance, the graph considered is sparse, then this is no better than computing a shortest-path tree from scratch (see the sketch below). Our first observation is that, in general, hub labelings do not help. Namely:
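The Ω(n) query time of this naive approach is made explicit in the sketch below, which issues one distance query per target vertex; the dist callable could be the hub_query sketch given earlier (partially applied to the labels), or any shortest-path computation. The helper names are ours.

```python
from typing import Callable, Iterable

Vertex = int

def naive_eccentricity(dist: Callable[[Vertex, Vertex], float],
                       u: Vertex, vertices: Iterable[Vertex]) -> float:
    # One distance query per target vertex: Omega(n) time,
    # even with constant-size hub labels.
    return max(dist(u, v) for v in vertices)

def naive_distance_sum(dist: Callable[[Vertex, Vertex], float],
                       u: Vertex, vertices: Iterable[Vertex]) -> float:
    return sum(dist(u, v) for v in vertices)
```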

Theorem 1.

Under SETH, for any ε > 0, any data structure for answering eccentricity queries or distance-sum queries on n-vertex graphs requires either Ω(n^{2-ε}) pre-processing time or Ω(n^{1-ε}) query time. This holds even if a hub labeling of maximum label size O(log n) is given as part of the input.

Proof.

We consider undirected graphs, for which all the aforementioned distances are equivalent (up to a factor 2 for the roundtrip distance). In particular, a split graph is a graph whose vertex-set can be bipartitioned into a clique K and a stable set S. We observe that any unweighted split graph admits a hub labeling with maximum label size |K| + 1, that can be computed in O(n · |K|) time: simply store, for each vertex, itself together with its distance from and to any vertex of K, that is either 1 or 2 (and 0 for the vertex itself). If we further assume |K| = O(log n), then this trivial hub labeling can be computed in O(n log n) time.

Under SETH, for any ε > 0, we cannot compute the diameter of n-vertex split graphs in O(n^{2-ε}) time, and this result holds even if |K| = O(log n) [12]. Assume the existence of a data structure for eccentricity queries, with pre-processing time P(n) and query time Q(n). In particular, we can compute the diameter in P(n) + n · Q(n) time. Therefore, P(n) = Ω(n^{2-ε}) or Q(n) = Ω(n^{1-ε}).

In the same way, assume the existence of a data structure for distance-sum queries, with pre-processing time P(n) and query time Q(n). Observe that the diameter of a non-complete split graph is either 2 or 3. It is folklore (see, e.g., [13]) that an undirected unweighted n-vertex graph has diameter at most 2 if and only if the distance-sum of every vertex v is equal to 2n - 2 - deg(v). In particular, we can compute the diameter of an unweighted undirected split graph in P(n) + n · Q(n) time. As a result, under SETH we must also have P(n) = Ω(n^{2-ε}) or Q(n) = Ω(n^{1-ε}). ∎
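The folklore equivalence invoked in this proof can be made concrete as follows. The sketch assumes a connected, undirected, unweighted graph given by adjacency sets, together with a hypothetical distance_sum oracle answering the queries of the data structure under consideration.

```python
from typing import Callable, Dict, Set

Vertex = int
Adjacency = Dict[Vertex, Set[Vertex]]

def has_diameter_at_most_two(adj: Adjacency,
                             distance_sum: Callable[[Vertex], float]) -> bool:
    """Folklore check for connected, undirected, unweighted graphs.

    ecc(v) <= 2 holds iff s(v) = deg(v) * 1 + (n - 1 - deg(v)) * 2
                              = 2 * n - 2 - deg(v),
    so n distance-sum queries decide whether the diameter is at most 2.
    """
    n = len(adj)
    return all(distance_sum(v) == 2 * n - 2 - len(adj[v]) for v in adj)
```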

A framework for fast query computation. Positive results can be derived for hub labelings with sublogarithmic labels. Our algorithm is a novel application of a popular framework for fast diameter computation, using orthogonal range queries [2, 13, 14, 29, 27].

Theorem 2.

For the source, min-, max- and roundtrip distances, for every graph G = (V, E, w), if we are given a hub labeling with maximum label size ≤ k, then we can compute a data structure for answering eccentricity queries and distance-sum queries with 2^O(k) · |V|^{o(1)} query time. This takes 2^O(k) · |V|^{1+o(1)} pre-processing time.

We stress that in practice, the bottleneck of Theorem 2 is the computation of a hub labeling. For this task, we are bound to use heuristics [22]. In a few restricted classes, such as graphs of bounded tree-depth, constant-size hub labels can be computed in quasi linear time [40]. – Below, we give an application of this result to graph classes of bounded expansion. – Let us cite, as another example of graphs with small hub labels, the graphs of bounded vertex-cover. Indeed, we can easily derive a hub labeling of maximum label size k + 1 from a vertex-cover of cardinality k (see the sketch below). However, for the unweighted graphs of bounded vertex-cover, there exist slightly faster methods for eccentricities and distance-sum computations than our Theorem 2 [13, 19]. We hope that our results will encourage the quest for other graph classes that admit constant-size hub labels. For instance, it was conjectured in [4] that all graphs of highway dimension at most h admit a hub labeling with maximum label size O(h). Our results in this paper show that solving this conjecture in the affirmative would be a first step toward fast diameter computation within these graphs, with applications to road networks.
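A minimal sketch of the vertex-cover observation, for undirected unweighted graphs: every vertex stores itself plus its BFS distances to the cover, giving labels of size at most |cover| + 1. The helper names are ours and the layout is only illustrative.

```python
from collections import deque
from typing import Dict, Iterable, Set

Vertex = int
Adjacency = Dict[Vertex, Set[Vertex]]

def bfs_distances(adj: Adjacency, source: Vertex) -> Dict[Vertex, int]:
    # Unweighted single-source distances from `source`.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def hub_labels_from_vertex_cover(adj: Adjacency,
                                 cover: Iterable[Vertex]) -> Dict[Vertex, Dict[Vertex, int]]:
    """Label of v = {v} plus the whole cover, with the corresponding distances.

    Every path with at least one edge meets the cover, so for u != v some
    shortest u-v path goes through a common hub; label size <= |cover| + 1.
    """
    labels: Dict[Vertex, Dict[Vertex, int]] = {v: {v: 0} for v in adj}
    for c in cover:
        for v, d in bfs_distances(adj, c).items():
            labels[v][c] = d
    return labels
```

Distance queries over these labels are then answered by minimizing over common hubs, exactly as in the hub_query sketch above.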

Our techniques are inspired from those in [1] for bounded-treewidth graphs, of which they are to some extent a generalization. Indeed, the algorithms in [1] for graphs of treewidth at most k parse a hub labeling of maximum label size O(k · log n), but where every label is composed of O(log n) levels, each containing O(k) vertices. Roughly, this allows the authors of [1] to build an algorithm of logarithmic recursion depth, where at each call we may assume to be given hub labels of maximum size O(k). However, in [1], all the nodes considered have the same O(k) vertices in their labels (the latter forming a balanced separator). This is no longer true for general hub labels of maximum size k, a case which requires more complex range queries than in [1]. Incidentally, for roundtrip distances, we improve along the way on the algorithm from [1] for directed graphs of treewidth at most k:

Corollary 1.

For the roundtrip distance, for any n-vertex directed graph of treewidth at most k, we can compute all the eccentricities and distance-sums in 2^O(k) · n^{1+o(1)} time.

Application to graph classes of bounded expansion. Our initial motivation for this work was to study constant diameter computation within graph classes of bounded expansion. Without entering too much into technical details, the graph classes of bounded expansion are exactly those whose so-called shallow minors are all sparse (see Sec. 4 for a formal definition). In particular, this notion generalizes bounded-degree graphs, bounded-treewidth graphs and, more generally, proper minor-closed graph classes [46]. In [23], evidence was given that many classes of complex networks exhibit a bounded-expansion structure. Among those networks, social networks are well-known to obey the so-called "small-world" property [61], which implies a relatively small diameter. In this context, our theoretical results for constant-diameter computation within graph classes of bounded expansion might be a first step toward more practical algorithms for computing the diameter of social networks and other complex networks with similar properties.

Theorem 3.

For every class of graphs of bounded expansion, for every n-vertex unweighted graph G in the class and every positive integer k, we can decide whether the diameter of G is at most k in f(k) · n^{1+o(1)} time, for some function f.

Diameter computation within unweighted graph classes of bounded expansion has already received some attention in the literature. In particular, for the special case of undirected graphs of bounded maximum degree, Dahlgaard and Evald proved that under SETH, we cannot compute the diameter in truly subquadratic time [32]. But their hardness results hold for bounded-degree graphs of super-logarithmic diameter. In contrast to this negative result, testing whether a graph has diameter at most some constant k can be written as a first-order formula of size O(k). Therefore, in any class of unweighted graphs of bounded expansion, we can derive from a prior work of Dvořák et al. a quasi linear-time parameterized algorithm for constant diameter computation [30]. Unfortunately, the hidden dependency on the parameter is rather huge due to the use of Courcelle's theorem [20]. Recently, we proposed a different approach for constant diameter computation within nowhere dense graph classes – a broad generalization of the graph classes of bounded expansion –, based on a VC-dimension argument [28]. However, the running time of our algorithm for deciding whether the diameter of an n-vertex graph is at most k was of order f(k) · n^{2-ε}, for some super-exponential function f.

Our Theorem 3 improves on these previous works by using low tree-depth decompositions: a covering of the n-vertex graphs of bounded expansion by relatively few subgraphs of bounded tree-depth, so that for a fixed p, every p-vertex subgraph is contained in at least one subgraph of this covering [46]. Combined with previous results on hub labelings within bounded tree-depth graph classes [40], this allows us to prove the existence, for every fixed k, of f(k)-size hub labels for the pairs of vertices at distance at most k, for some "explicit" function f – about a tower of exponentials of height four. Then, Theorem 3 follows from our Theorem 2. We leave open whether our approach could be generalized to nowhere dense graph classes.

1.3 Organization of the paper

Our main technical contribution (Theorem 2) is proved in Sec. 2. In Sec. 3.1 and 3.2, we discuss applications of Theorem 2 to, respectively, bounded-treewidth graphs (Corollary 1) and the computation of topological indices. In particular, in Sec. 3.2, we exploit recent results of Cabello from [16], and we present a new type of distance information which can be computed from the orthogonal range query framework. Finally, we conclude this paper in Sec. 4 with our results for graph classes of bounded expansion (Theorem 3).

2 Properties of Hub Labels

2.1 Range Queries

We first review some important terminology and prior results. Let P be a set of d-dimensional points, where each point p ∈ P is assigned some value val(p). A box is the cartesian product of d intervals I_i, for 1 ≤ i ≤ d, denoted by I_1 × I_2 × … × I_d. Note that we allow each of the intervals to exclude either of its ends, and that we allow these ends to be infinite. Furthermore, a point (x_1, x_2, …, x_d) belongs to the box if and only if x_i ∈ I_i for every 1 ≤ i ≤ d. A range query asks for some information about the points within a given box. We use the following types of range queries:

  • Max-Query: compute a point p of the box maximizing val(p);

  • Sum-Query: compute the sum of val(p) over all points p of the box;

  • Count-Query: count the number of points in the box (the latter can be obtained from the above sum-query by setting val(p) = 1 for every point). A naive illustration of these three query types is sketched right after this list.
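To fix the semantics of these queries (and only for that purpose; the efficient data structure is the range tree of Lemma 1 below), here is a naive linear-scan sketch, where boxes are simplified to closed intervals with ±infinity for unbounded ends.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]
Box = List[Tuple[float, float]]  # one closed interval [lo, hi] per coordinate

def in_box(p: Point, box: Box) -> bool:
    # A point lies in the box iff each coordinate lies in its interval.
    return all(lo <= x <= hi for x, (lo, hi) in zip(p, box))

def max_query(points: List[Tuple[Point, float]], box: Box) -> Optional[Tuple[Point, float]]:
    # A point of maximum value inside the box (None if the box contains no point).
    inside = [(p, val) for p, val in points if in_box(p, box)]
    return max(inside, key=lambda pv: pv[1]) if inside else None

def sum_query(points: List[Tuple[Point, float]], box: Box) -> float:
    return sum(val for p, val in points if in_box(p, box))

def count_query(points: List[Tuple[Point, float]], box: Box) -> int:
    # Equivalently, a sum-query with all values set to 1.
    return sum(1 for p, _val in points if in_box(p, box))
```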

We refer to [13] for a thorough treatment of range queries and their applications to distance problems on graphs. The d-dimensional range tree is a classic data structure for answering range queries efficiently on static point sets. Evidence of its practicality for graph problems was given in [45].

Lemma 1 ([13]).

For all the aforementioned types of range queries, for every d-dimensional point set of size n, we can construct a d-dimensional range tree in 2^O(d) · n^{1+o(1)} time, that allows us to answer any query in 2^O(d) · n^{o(1)} time.

2.2 Proof of Theorem 2

We are now ready to prove our main algorithmic tool for this paper. Our approach is essentially independent of the type of query and of the distance considered. We first present our results for eccentricity queries, postponing the slight changes to be made for distance-sum queries until the end of this section. Similarly, we postpone the specific parts for each distance function to the end of the section. In what follows, let dist be an arbitrary distance function (i.e., source, min-, max- or roundtrip).

Decomposition-based techniques. Let us fix a hub labeling for of maximum label size . For every , let . In particular, is a partition of . For a fixed choice of and any vertex , we want to compute: Indeed, . For that, let us fix a total ordering over . For any vertex and , we now define:

We observe that , that follows from the definition of hub labelings. Finally, let be arbitrary. Recall that we totally ordered the vertices of the graph. We further reduce the computation of all eccentricities to computing:

for all and . As before we can observe that . Furthermore, because the maximum label size is , for any fixed vertex there are at most tuples to consider. We will show how to reduce the computation of to range queries. The following technical lemma is the gist of our approach in the paper.

Lemma 2.

Let a hub labeling with maximum label size k be given for a graph G = (V, E, w). In 2^O(k) · n^{1+o(1)} total time, we can map every vertex u to an O(k)-dimensional point-set.

Moreover, for every and , let (resp., ) be the set of all vertices s.t.: , , is the least vertex of on a shortest -path, is the least vertex of on a shortest -path, and (resp., ). In time, we can compute a family of boxes where , , and we have:

If , then there exists a unique

Proof.

For every and , the point is defined as follows:

  • The elements are the vertices in totally ordered. In particular, these first coordinates form an -dimensional point that is a common prefix to all the points .

  • The elements are the vertices in totally ordered. These consecutive coordinates form a -dimensional point that is also common to all the points .

  • The elements are equal to the values: for . In particular, these coordinates may be different between and , for .

  • The elements are equal to: for . In particular, these coordinates may be different between and , for .

  • Finally, we set the last coordinate to .

Then, let and be fixed. In order to restrict ourselves to the vertices s.t. , we define a family of range queries over the points .

  1. We encode the set of indices in which the nodes of must be found. That is, let be totally ordered. For a fixed choice of , we restrict ourselves to the vertices such that . We stress that there are only possibilities.

  2. In order to exclude from , we encode, for each vertex in this subset, the least index s.t. , and so, all subsequent vertices of , is greater than ; if no such index exists, then by convention we associate to the value . Specifically, for every , we pick and we add the following range constraints for our query:

    Since we totally ordered , each possibility is fully characterized by: (i) the set of indices ; and (ii) an ordered partition of such that two vertices are in the same group if and only if . Hence, the number of possibilities here is at most .

Furthermore, in the exact same way as above, in order to restrict ourselves to the points s.t. , we can define a family of range queries over the points . For the remainder of the proof, let us fix one query over the points , and one query over the points . Note that we can represent the latter as two sequences of indices and . We want to further restrict ourselves to the vertices s.t. is the least vertex of on a shortest -path, and in the same way is the least vertex of on a shortest -path. The above condition on is equivalent to having:

which can be rewritten as:

and, under this form, it can be encoded as additional range constraints over the points , for the indices between and . We proceed similarly for the desired condition on the vertex . Finally, in order to complete these inequalities into a range query for , we further impose: Indeed, in this situation, . We proceed similarly for by changing the direction of the last inequality. ∎

We split the proof for eccentricity queries into two lemmas, so as to take into account additional technicalities for the min- and max-distance. Specifically:

Lemma 3.

Let a hub labeling with maximum label size k be given for a graph G. If dist is the source distance or the roundtrip distance, then after a pre-processing in 2^O(k) · n^{1+o(1)} time, for any vertex u we can compute ecc(u) in 2^O(k) · n^{o(1)} time.

Proof.

We create range trees, for point sets of various dimensions. More specifically, let be fixed. We create different -dimensional range trees, that are indexed by all possible pairs . For every vertex , we insert the point (defined in Lemma 2) in the range tree with the same index . The corresponding value depends on the distance considered; this will be discussed at the end of the proof. By Lemma 2 and Lemma 1 (the latter applied once per range tree), this overall pre-processing phase can be executed within the claimed total time.

Answering a query. In what follows, let and be fixed. Applying Lemma 2, we compute boxes for the points to which the vertices were mapped. For every such box , let be such that and is maximized.

  • If is the source distance, then we set . In particular, we have: . Note that in this special case, are irrelevant.

  • If is the roundtrip distance, then we set . In particular, we have:

We are done by applying Lemma 1 for max-queries. ∎

Lemma 4.

Let a hub labeling with maximum label size k be given for a graph G. If dist is either the min-distance or the max-distance, then after a pre-processing in 2^O(k) · n^{1+o(1)} time, for any vertex u we can compute ecc(u) in 2^O(k) · n^{o(1)} time.

Proof.

We only detail the necessary modifications to the proof of Lemma 3. Let be fixed. We create different -dimensional range trees, that are indexed by and for all possible pairs . – In particular, we need twice as many range trees as for Lemma 3. – For every and , we insert two identical copies of in the range trees that are indexed by and , with different associated values:

  • For the index , ;

  • For the index , .

Let and be fixed. Applying Lemma 2, we compute a family of boxes for the points to which the vertices were mapped. In the same way, we compute a family of boxes for the points to which the vertices were mapped. In doing so, if is the min-distance then we have: . It is straightforward to adapt the above to the max-distance, i.e., by reversing the respective roles of and . ∎

Lemmas 3 and 4 complete the proof of Theorem 2 for eccentricity queries. For adapting our approach to distance-sum queries, the key observation is that, for any fixed , the sets and , over all possible tuples , form a partition of . In particular, for any distance we have: . Then, for a fixed , applying Lemma 2 we compute a family of boxes for the points to which the vertices were mapped. We have that , for all above boxes , is a partition of