Fast approximation of centrality and distances in hyperbolic graphs

05/17/2018 ∙ by Victor Chepoi, et al.

We show that the eccentricities (and thus the centrality indices) of all vertices of a δ-hyperbolic graph G=(V,E) can be computed in linear time with an additive one-sided error of at most cδ, i.e., after a linear time preprocessing, for every vertex v of G one can compute in O(1) time an estimate ê(v) of its eccentricity ecc_G(v) such that ecc_G(v)≤ê(v)≤ ecc_G(v)+ cδ for a small constant c. We prove that every δ-hyperbolic graph G has a shortest path tree, constructible in linear time, such that for every vertex v of G, ecc_G(v)≤ ecc_T(v)≤ ecc_G(v)+ cδ. These results are based on an interesting monotonicity property of the eccentricity function of hyperbolic graphs: the closer a vertex is to the center of G, the smaller its eccentricity is. We also show that the distance matrix of G with an additive one-sided error of at most c'δ can be computed in O(|V|^2 log^2 |V|) time, where c'< c is a small constant. Recent empirical studies show that many real-world graphs (including Internet application networks, web networks, collaboration networks, social networks, biological networks, and others) have small hyperbolicity. So, we analyze the performance of our algorithms for approximating centrality and distance matrix on a number of real-world networks. Our experimental results show that the obtained estimates are even better than the theoretical bounds.


1 Introduction

The diameter and the radius of a graph are two fundamental metric parameters that have many important practical applications in real world networks. The problem of finding the center of a graph is often studied as a facility location problem for networks where one needs to select a single vertex to place a facility so that the maximum distance from any demand vertex in the network is minimized. In the analysis of social networks (e.g., citation networks or recommendation networks), biological systems (e.g., protein interaction networks), computer networks (e.g., the Internet or peer-to-peer networks), transportation networks (e.g., public transportation or road networks), etc., the eccentricity $\mathrm{ecc}(v)$ of a vertex $v$ is used to measure the importance of $v$ in the network: the centrality index of $v$ [69] is defined as $1/\mathrm{ecc}(v)$.

Being able to compute efficiently the diameter, center, radius, and vertex centralities of a given graph has become an increasingly important problem in the analysis of large networks. The algorithmic complexity of the diameter and radius problems is very well-studied. For some special classes of graphs there are efficient algorithms [8, 18, 25, 30, 33, 38, 42, 53, 56, 62, 79]. However, for general graphs, the only known algorithms computing the diameter and the radius exactly compute the distance between every pair of vertices in the graph, thus solving the all-pairs shortest paths problem (APSP) and hence computing all eccentricities. In view of recent negative results [8, 21, 83], this seems to be the best one can do, since even for graphs with $m = O(n)$ (where $m$ is the number of edges and $n$ is the number of vertices) the existence of a subquadratic time (that is, $O(n^{2-\epsilon})$ time for some $\epsilon > 0$) algorithm for the diameter or the radius problem would refute the well known Strong Exponential Time Hypothesis (SETH). Furthermore, recent work [9] shows that if the radius of a possibly dense graph ($m = O(n^2)$) can be computed in subcubic time ($O(n^{3-\epsilon})$ for some $\epsilon > 0$), then APSP also admits a subcubic algorithm. Such an algorithm for APSP has long eluded researchers, and it is often conjectured that it does not exist (see, e.g., [84, 90]).

Motivated by these negative results, researchers started devoting more attention to the development of fast approximation algorithms. In the analysis of large-scale networks, for fast estimations of diameter, center, radius, and centrality indices, linear or almost linear time algorithms are desirable. One hopes also for the all-pairs shortest paths problem to have time small-constant-factor approximation algorithms. In general graphs, both diameter and radius can be 2-approximated by a simple linear time algorithm which picks any node and reports its eccentricity. A 3/2-approximation algorithm for the diameter and the radius which runs in time (with a polylog factor hidden in the bound) was recently obtained in [31] (see also [12] for an earlier time algorithm and [83] for a randomized time algorithm). For sparse graphs, this is an time approximation algorithm. Furthermore, under plausible assumptions, no time algorithm can exist that -approximates (for ) the diameter [83] and the radius [8] in sparse graphs. Similar results are known also for all eccentricities: a 5/3-approximation to the eccentricities of all vertices can be computed in time [31] and, under plausible assumptions, no time algorithm can exist that -approximates (for ) the eccentricities of all vertices in sparse graphs [8]. Better approximation algorithms are known for some special classes of graphs [27, 34, 35, 42, 43, 50, 51, 54, 94]. A number of heuristics for approximating diameters, radii and eccentricities in real-world graphs were proposed and investigated in [10, 21, 22, 23, 69, 24, 52].
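The simple 2-approximation mentioned above can be sketched as follows. This is a minimal illustration, assuming an unweighted connected graph stored as an adjacency dictionary; the helper names `bfs_distances` and `eccentricity` are ours, not the paper's. Since rad(G) ≤ ecc(v) ≤ diam(G) ≤ 2·rad(G) for any vertex v, a single BFS already yields a value within a factor 2 of both parameters.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source (adj: vertex -> list of neighbours)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def eccentricity(adj, v):
    """Largest distance from v to any vertex of a connected graph."""
    return max(bfs_distances(adj, v).values())

# Path 0-1-2-3-4: the radius is 2 and the diameter is 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
e = eccentricity(path, 1)  # an arbitrary start vertex
# e lies between the radius and the diameter, and the diameter is at most 2 * e
```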

Approximability of APSP is also extensively investigated. An additive -approximation for APSP in unweighted undirected graphs (the graphs we consider in this paper) was presented in [46]. It runs in time and hence improves the runtime of an earlier algorithm from [12]. In [19], an time algorithm was designed which computes an approximation of all distances with a multiplicative error of 2 and an additive error of 1. Furthermore, [19] gives an time algorithm that computes an approximation of all distances with a multiplicative error of and an additive error of 2. The latter improves an earlier algorithm from [58]. Better algorithms are known for some special classes of graphs (see [25, 35, 49, 89] and papers cited therein).

The need for fast approximation algorithms for estimating diameters, radii, centrality indices, or all pairs shortest paths in large-scale complex networks calls for identifying geometric and topological properties of those networks and exploiting them algorithmically. The classical relationships between the diameter, radius, and center of trees and the folklore linear time algorithms for their computation are one of the points of departure of this research. A result from 1869 by C. Jordan [66] asserts that the radius of a tree is roughly equal to half of its diameter and the center is either the middle vertex or the middle edge of any diametral path. The diameter and a diametral pair of a tree $T$ can be computed (in linear time) by a simple but elegant procedure: pick any vertex $x$, find any vertex $u$ furthest from $x$, and find once more a vertex $v$ furthest from $u$; then return $\{u,v\}$ as a diametral pair. One computation of a furthest vertex is called an FP scan; hence the diameter of a tree can be computed via two FP scans. This two FP scans procedure can be extended to exact or approximate computation of the diameter and radius in many classes of tree-like graphs. For example, this approach was used to compute the radius and a central vertex of a chordal graph in linear time [33]. In this case, the center of $G$ is still close to the middle of all $(u,v)$-shortest paths, and $d(u,v)$ is not the diameter but is still a good approximation of it. Even better, the diameter of any chordal graph can be approximated in linear time with an additive error 1 [54]. But it turns out that the exact computation of diameters of chordal graphs is as difficult as the general diameter problem: it is even difficult to decide if the diameter of a split graph is 2 or 3.
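The two FP scans procedure described above can be sketched in a few lines. This is an illustrative sketch, assuming an adjacency-dictionary representation; the names `bfs_distances` and `two_fp_scans` are ours. On a tree the returned pair is diametral; on other graphs the returned distance is only a lower bound on the diameter.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source (adj: vertex -> list of neighbours)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def two_fp_scans(adj, start):
    """Return (u, v, d(u, v)): u is furthest from start, v is furthest from u."""
    d_start = bfs_distances(adj, start)
    u = max(d_start, key=d_start.get)   # first FP scan
    d_u = bfs_distances(adj, u)
    v = max(d_u, key=d_u.get)           # second FP scan
    return u, v, d_u[v]

# A small tree with edges 0-1, 1-2, 1-3, 3-4, 4-5.
tree = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3, 5], 5: [4]}
u, v, d = two_fp_scans(tree, 2)
```

Each scan is one breadth-first search, so the whole procedure runs in time linear in the size of the graph.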

The experience with chordal graphs shows that one has to abandon the hope of having fast exact algorithms, even for very simple (from a metric point of view) graph classes, and to search for fast algorithms approximating with a small additive constant depending only on the coarse geometry of the graph. Gromov hyperbolicity or the negative curvature of a graph (and, more generally, of a metric space) is one such constant. A graph $G=(V,E)$ is $\delta$-hyperbolic [14, 59, 28, 60] if for any four vertices $w,v,x,y$ of $G$, the two largest of the three distance sums $d(w,v)+d(x,y)$, $d(w,x)+d(v,y)$, $d(w,y)+d(v,x)$ differ by at most $2\delta \ge 0$. The hyperbolicity $\delta(G)$ of a graph $G$ is the smallest number $\delta$ such that $G$ is $\delta$-hyperbolic. The hyperbolicity can be viewed as a local measure of how close a graph is metrically to a tree: the smaller the hyperbolicity is, the closer its metric is to a tree-metric (trees are 0-hyperbolic and chordal graphs are 1-hyperbolic).
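To make the four-point condition concrete, here is a brute-force sketch that computes the hyperbolicity of a small graph. It assumes the normalization in which the two largest distance sums differ by at most 2δ, and it enumerates all quadruples, so it is meant only for tiny examples; the function names are ours.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source (adj: vertex -> list of neighbours)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def hyperbolicity(adj):
    """Smallest delta such that every quadruple satisfies the four-point condition."""
    dist = {v: bfs_distances(adj, v) for v in adj}
    delta = 0.0
    for w, v, x, y in combinations(adj, 4):
        sums = sorted([dist[w][v] + dist[x][y],
                       dist[w][x] + dist[v][y],
                       dist[w][y] + dist[v][x]])
        # Half the gap between the two largest sums (assumed 2*delta convention).
        delta = max(delta, (sums[2] - sums[1]) / 2)
    return delta

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}   # a tree
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}      # the 4-cycle
```

On the path (a tree) the function returns 0, and on the 4-cycle it returns 1.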

Recent empirical studies showed that many real-world graphs (including Internet application networks, web networks, collaboration networks, social networks, biological networks, and others) are tree-like from a metric point of view [10, 11, 20] or have small hyperbolicity [67, 77, 85]. It has been suggested in [77], and recently formally proved in [39], that the property, observed in real-world networks, in which traffic between nodes tends to go through a relatively small core of the network, as if the shortest paths between them are curved inwards, is due to the hyperbolicity of the network. The bending property of the eccentricity function in hyperbolic graphs was used in [16, 15] to identify core-periphery structures in biological networks. Small hyperbolicity in real-world graphs also provides many algorithmic advantages. Efficient approximate solutions are attainable for a number of optimization problems [35, 36, 37, 39, 40, 44, 57, 92].

In [35] we initiated the investigation of diameter, center, and radius problems for -hyperbolic graphs and we showed that the existing approach for trees can be extended to this general framework. Namely, it is shown in [35] that if is a -hyperbolic graph and is the pair returned after two FP scans, then , , , and is contained in a small ball centered at a middle vertex of any shortest -path. Consequently, we obtained linear time algorithms for the diameter and radius problems with additive errors linearly depending on the input graph’s hyperbolicity.

In this paper, we advance this line of research and provide a linear time algorithm for approximate computation of the eccentricities (and thus of centrality indices) of all vertices of a $\delta$-hyperbolic graph $G$, i.e., we compute the approximate values of all eccentricities within the same time bounds as one computes the approximation of the largest or the smallest eccentricity ($\mathrm{diam}(G)$ or $\mathrm{rad}(G)$). Namely, the algorithm outputs for every vertex $v$ of $G$ an estimate $\hat{e}(v)$ of $\mathrm{ecc}_G(v)$ such that $\mathrm{ecc}_G(v) \le \hat{e}(v) \le \mathrm{ecc}_G(v) + c\delta$, where $c$ is a small constant. In fact, we demonstrate that $G$ has a shortest path tree $T$, constructible in linear time, such that for every vertex $v$ of $G$, $\mathrm{ecc}_G(v) \le \mathrm{ecc}_T(v) \le \mathrm{ecc}_G(v) + c\delta$ (a so-called eccentricity $c\delta$-approximating spanning tree). This is the first main result of this paper, and the main ingredient in proving it is the following interesting dependency between the eccentricities of vertices of $G$ and their distances to the center $C(G)$: up to an additive error linearly depending on $\delta$, $\mathrm{ecc}_G(v)$ is equal to $d(v, C(G))$ plus $\mathrm{rad}(G)$. To establish this new result, we have to revisit the results of [35] about diameters, radii, and centers, by simplifying their proofs and extending them to all eccentricities.

Eccentricity $k$-approximating spanning trees were introduced by Prisner in [81]. A spanning tree $T$ of a graph $G$ is called an eccentricity $k$-approximating spanning tree if $\mathrm{ecc}_T(v) \le \mathrm{ecc}_G(v) + k$ holds for every vertex $v$ of $G$ [81]. Prisner observed that any graph admitting an additive tree $k$-spanner (that is, a spanning tree $T$ such that $d_T(x,y) \le d_G(x,y) + k$ for every pair $x,y$) admits also an eccentricity $k$-approximating spanning tree. Therefore, for suitable constant $k$, eccentricity $k$-approximating spanning trees exist in interval graphs [70, 75, 80], in asteroidal-triple-free graphs [70], in strongly chordal graphs [26], and in dually chordal graphs [26]. On the other hand, although for every $k$ there is a chordal graph without an additive tree $k$-spanner [70, 80], Prisner demonstrated in [81] that every chordal graph has an eccentricity 2-approximating spanning tree. Later this result was extended in [51] to a larger family of graphs which includes all chordal graphs and all plane triangulations with inner vertices of degree at least 7. Both those classes belong to the class of 1-hyperbolic graphs. Thus, our result extends the result of [81] to all $\delta$-hyperbolic graphs.
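The notion of an eccentricity $k$-approximating spanning tree can be checked directly on small examples. The sketch below assumes the definition ecc_T(v) ≤ ecc_G(v) + k and uses a plain BFS tree from an arbitrary root, which is not the carefully rooted shortest path tree constructed in this paper; all function names are ours.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source (adj: vertex -> list of neighbours)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def bfs_tree(adj, root):
    """Adjacency dictionary of a BFS (shortest path) tree rooted at root."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    tree = {v: [] for v in adj}
    for v, p in parent.items():
        if p is not None:
            tree[v].append(p)
            tree[p].append(v)
    return tree

def eccentricities(adj):
    return {v: max(bfs_distances(adj, v).values()) for v in adj}

def approximation_error(adj, root):
    """Smallest k with ecc_T(v) <= ecc_G(v) + k for the BFS tree T rooted at root."""
    ecc_g = eccentricities(adj)
    ecc_t = eccentricities(bfs_tree(adj, root))
    return max(ecc_t[v] - ecc_g[v] for v in adj)

# The 6-cycle: every vertex has eccentricity 3 in the graph.
cycle6 = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0]}
k = approximation_error(cycle6, 0)
```

For the 6-cycle the BFS tree is a path on six vertices, whose end vertices have eccentricity 5, so the error is 2.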

As our second main result, we show that in every $\delta$-hyperbolic graph $G=(V,E)$ all distances with an additive one-sided error of at most $c'\delta$ can be found in $O(|V|^2\log^2|V|)$ time, where $c' < c$ is a small constant. With a recent result in [32], this demonstrates an equivalence between approximating the hyperbolicity and approximating the distances in graphs. Note that every $\delta$-hyperbolic graph $G$ admits a distance approximating tree $T$ [35, 36, 37], that is, a tree $T$ (which is not necessarily a spanning tree) such that for every pair . Such a tree can be used to compute all distances in $G$ with an additive one-sided error of at most in time. Our new result removes the dependency of the additive error on and has a much smaller constant in front of $\delta$. Note also that the tree $T$ may use edges not present in $G$ (it is not a spanning tree of $G$) and thus cannot serve as an eccentricity $k$-approximating spanning tree. Furthermore, as chordal graphs are 1-hyperbolic, for every $k$ there is a 1-hyperbolic graph without an additive tree $k$-spanner [70, 80].

At the conclusion of this paper, we analyze the performance of our algorithms for approximating eccentricities and distances on a number of real-world networks. Our experimental results show that the estimates on eccentricities and distances obtained are even better than the theoretical bounds proved.

2 Preliminaries

2.1 Center, diameter, centrality

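All graphs $G=(V,E)$ occurring in this paper are finite, undirected, connected, without loops or multiple edges. We use $n$ and $|V|$ interchangeably to denote the number of vertices and $m$ and $|E|$ to denote the number of edges in $G$. The length of a path from a vertex $v$ to a vertex $u$ is the number of edges in the path. The distance $d_G(u,v)$ between vertices $u$ and $v$ is the length of a shortest path connecting $u$ and $v$ in $G$. The eccentricity of a vertex $v$, denoted by $\mathrm{ecc}_G(v)$, is the largest distance from $v$ to any other vertex, i.e., $\mathrm{ecc}_G(v) = \max_{u \in V} d_G(v,u)$. The centrality index of $v$ is $1/\mathrm{ecc}_G(v)$. The radius $\mathrm{rad}(G)$ of a graph $G$ is the minimum eccentricity of a vertex in $G$, i.e., $\mathrm{rad}(G) = \min_{v \in V} \mathrm{ecc}_G(v)$. The diameter $\mathrm{diam}(G)$ of a graph $G$ is the maximum eccentricity of a vertex in $G$, i.e., $\mathrm{diam}(G) = \max_{v \in V} \mathrm{ecc}_G(v)$. The center $C(G) = \{c \in V : \mathrm{ecc}_G(c) = \mathrm{rad}(G)\}$ of a graph $G$ is the set of vertices with minimum eccentricity.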
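The definitions of this subsection translate directly into an exact (quadratic-time) computation by one BFS per vertex. This is only a baseline sketch to fix the notions, assuming an adjacency-dictionary representation; the function names are ours.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source (adj: vertex -> list of neighbours)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def graph_parameters(adj):
    """Return (eccentricities, radius, diameter, center) of a connected graph."""
    ecc = {v: max(bfs_distances(adj, v).values()) for v in adj}
    rad = min(ecc.values())
    diam = max(ecc.values())
    center = {v for v in adj if ecc[v] == rad}
    return ecc, rad, diam, center

# Path 0-1-2-3-4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
ecc, rad, diam, center = graph_parameters(path)
```

This exact computation costs one BFS per vertex, which is precisely the cost the approximation algorithms of this paper aim to avoid.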

2.2 Gromov hyperbolicity and thin geodesic triangles

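Let $(X,d)$ be a metric space. The Gromov product of $y,z \in X$ with respect to $w$ is defined to be

\[ (y|z)_w = \tfrac{1}{2}\bigl(d(y,w) + d(z,w) - d(y,z)\bigr). \]

A metric space $(X,d)$ is said to be $\delta$-hyperbolic [60] for $\delta \ge 0$ if

\[ (x|y)_w \ge \min\{(x|z)_w, (y|z)_w\} - \delta \]

for all $w,x,y,z \in X$. Equivalently, $(X,d)$ is $\delta$-hyperbolic if for any four points $u,v,x,y$ of $X$, the two largest of the three distance sums $d(u,v)+d(x,y)$, $d(u,x)+d(v,y)$, $d(u,y)+d(v,x)$ differ by at most $2\delta \ge 0$. A connected graph $G=(V,E)$ is $\delta$-hyperbolic (or of hyperbolicity $\delta$) if the metric space $(V,d_G)$ is $\delta$-hyperbolic, where $d_G$ is the standard shortest path metric defined on $G$.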
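The Gromov-product form of the definition can also be evaluated by brute force. The sketch below assumes the standard formula (y|z)_w = (d(y,w) + d(z,w) − d(y,z))/2 and the condition (x|y)_w ≥ min{(x|z)_w, (y|z)_w} − δ; it enumerates all ordered quadruples and is intended only for very small graphs. The function names are ours.

```python
from collections import deque
from itertools import product

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source (adj: vertex -> list of neighbours)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def gromov_product(dist, y, z, w):
    """(y|z)_w under the assumed standard definition."""
    return (dist[y][w] + dist[z][w] - dist[y][z]) / 2

def gromov_delta(adj):
    """Smallest delta with (x|y)_w >= min((x|z)_w, (y|z)_w) - delta for all w, x, y, z."""
    dist = {v: bfs_distances(adj, v) for v in adj}
    delta = 0.0
    for w, x, y, z in product(adj, repeat=4):
        lower = min(gromov_product(dist, x, z, w), gromov_product(dist, y, z, w))
        delta = max(delta, lower - gromov_product(dist, x, y, w))
    return delta

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
```

The two formulations are equivalent under the 2δ convention of the four-point condition, so both give hyperbolicity 1 for the 4-cycle and 0 for a tree.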

$\delta$-Hyperbolic graphs generalize $k$-chordal graphs and graphs of bounded tree-length: each $k$-chordal graph has the tree-length at most [47] and each tree-length $\lambda$ graph has hyperbolicity at most [35, 36]. Recall that a graph is $k$-chordal if its induced cycles are of length at most $k$, and it is of tree-length $\lambda$ if it has a Robertson-Seymour tree-decomposition into bags of diameter at most $\lambda$ [47].

For geodesic metric spaces and graphs there exist several equivalent definitions of $\delta$-hyperbolicity involving different but comparable values of $\delta$ [14, 28, 59, 60]. In this paper, we will use the definition via thin geodesic triangles. Let $(X,d)$ be a metric space. A geodesic joining two points $x$ and $y$ from $X$ is a (continuous) map $f$ from the segment $[a,b]$ of $\mathbb{R}$ of length $|a-b| = d(x,y)$ to $X$ such that $f(a)=x$, $f(b)=y$, and $d(f(s),f(t)) = |s-t|$ for all $s,t \in [a,b]$. A metric space $(X,d)$ is geodesic if every pair of points in $X$ can be joined by a geodesic. Every unweighted graph $G=(V,E)$ equipped with its standard distance $d_G$ can be transformed into a geodesic (network-like) space $(X_G,d)$ by replacing every edge $e=uv$ by a segment $[u,v]$ of length 1; the segments may intersect only at common ends. Then $(V,d_G)$ is isometrically embedded in a natural way in $(X_G,d)$. The restrictions of geodesics of $X_G$ to the vertices $V$ of $G$ are the shortest paths of $G$.

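Let $(X,d)$ be a geodesic metric space. A geodesic triangle $\Delta(x,y,z)$ with $x,y,z \in X$ is the union $[x,y] \cup [x,z] \cup [y,z]$ of three geodesic segments connecting these vertices. Let $m_x$ be the point of the geodesic segment $[y,z]$ located at distance $\alpha_y := (x|z)_y = (d(y,x)+d(y,z)-d(x,z))/2$ from $y$. Then $m_x$ is located at distance $\alpha_z := (y|x)_z = (d(z,y)+d(z,x)-d(y,x))/2$ from $z$ because $\alpha_y + \alpha_z = d(y,z)$. Analogously, define the points $m_y \in [x,z]$ and $m_z \in [x,y]$ both located at distance $\alpha_x := (y|z)_x = (d(x,y)+d(x,z)-d(y,z))/2$ from $x$; see Fig. 1 for an illustration. There exists a unique isometry $\varphi$ which maps $\Delta(x,y,z)$ to a tripod $T(x,y,z)$ consisting of three solid segments $[x,m]$, $[y,m]$, and $[z,m]$ of lengths $\alpha_x$, $\alpha_y$, and $\alpha_z$, respectively. This isometry maps the vertices $x,y,z$ of $\Delta(x,y,z)$ to the respective leaves of $T(x,y,z)$ and the points $m_x$, $m_y$, and $m_z$ to the center $m$ of this tripod. Any other point of $T(x,y,z)$ is the image of exactly two points of $\Delta(x,y,z)$. A geodesic triangle $\Delta(x,y,z)$ is called $\delta$-thin if for all points $u,v \in \Delta(x,y,z)$, $\varphi(u) = \varphi(v)$ implies $d(u,v) \le \delta$. A graph $G=(V,E)$ whose all geodesic triangles $\Delta(u,v,w)$, $u,v,w \in V$, are $\delta$-thin is called a graph with $\delta$-thin triangles, and $\delta$ is called the thinness parameter of $G$.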

Figure 1: A geodesic triangle $\Delta(x,y,z)$, the points $m_x, m_y, m_z$, and the tripod $T(x,y,z)$.

The following result shows that hyperbolicity of a geodesic space or a graph is equivalent to having thin geodesic triangles.

Proposition 1 ([14, 28, 59, 60])

Geodesic triangles of geodesic -hyperbolic spaces or graphs are -thin. Conversely, geodesic spaces or graphs with -thin triangles are -hyperbolic.

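In what follows, we will need a few more notions and notations. Let $G=(V,E)$ be a graph. By $[x,y]$ we denote a shortest path connecting vertices $x$ and $y$ in $G$; we call $[x,y]$ a geodesic between $x$ and $y$. A ball $B(s,r)$ of $G$ centered at vertex $s \in V$ and with radius $r$ is the set of all vertices with distance no more than $r$ from $s$ (i.e., $B(s,r) = \{v \in V : d_G(v,s) \le r\}$). The $k$th-power of a graph $G=(V,E)$ is the graph $G^k=(V,E')$ such that $xy \in E'$ if and only if $0 < d_G(x,y) \le k$. Denote by $F(s) = \{v \in V : d_G(s,v) = \mathrm{ecc}_G(s)\}$ the set of all vertices of $G$ that are most distant from $s$. Vertices $u$ and $v$ of $G$ are called mutually distant if $u \in F(v)$ and $v \in F(u)$, i.e., $\mathrm{ecc}_G(u) = \mathrm{ecc}_G(v) = d_G(u,v)$.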
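The notions of ball, set of furthest vertices, and mutually distant pair can be sketched as follows. The sketch assumes that a pair is mutually distant when each vertex is among the vertices most distant from the other; the function names `ball`, `furthest`, and `mutually_distant` are ours.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source (adj: vertex -> list of neighbours)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def ball(adj, s, r):
    """All vertices at distance at most r from s."""
    return {v for v, d in bfs_distances(adj, s).items() if d <= r}

def furthest(adj, s):
    """All vertices at maximum distance from s."""
    dist = bfs_distances(adj, s)
    ecc = max(dist.values())
    return {v for v, d in dist.items() if d == ecc}

def mutually_distant(adj, u, v):
    """True if u is furthest from v and v is furthest from u."""
    return v in furthest(adj, u) and u in furthest(adj, v)

# Path 0-1-2-3-4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

On the path, the end vertices 0 and 4 are mutually distant, while 1 and 4 are not: 4 is furthest from 1, but the vertex furthest from 4 is 0.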

3 Fast approximation of eccentricities

In this section, we give linear and almost linear time algorithms for sharp estimation of the diameters, the radii, the centers and the eccentricities of all vertices in graphs with $\delta$-thin triangles. Before presenting those algorithms, we establish some conditional lower bounds on complexities of computing the diameters and the radii in those graphs.

3.1 Conditional lower bounds on complexities

Recent work has revealed convincing evidence that solving the diameter problem in subquadratic time might not be possible, even in very special classes of graphs. Roditty and Vassilevska W. [83] showed that an algorithm that can distinguish between diameter 2 and 3 in a sparse graph in subquadratic time refutes the following widely believed conjecture.

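The Orthogonal Vectors Conjecture: There is no $\epsilon > 0$ such that for all $c \ge 1$, there is an algorithm that given two lists of $n$ binary vectors $A, B \subseteq \{0,1\}^d$ where $d = c \log n$ can determine if there is an orthogonal pair $a \in A$, $b \in B$ in $O(n^{2-\epsilon})$ time.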
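For reference, the trivial quadratic-time algorithm that the conjecture asserts cannot be substantially improved looks as follows. This is a minimal sketch; the function name `has_orthogonal_pair` and the example lists are ours.

```python
def has_orthogonal_pair(A, B):
    """Check all pairs (a, b) from A x B for a zero inner product."""
    return any(all(x * y == 0 for x, y in zip(a, b)) for a in A for b in B)

A = [(1, 0, 1), (0, 1, 1)]
B = [(1, 1, 0), (0, 1, 0)]
found = has_orthogonal_pair(A, B)  # (1, 0, 1) and (0, 1, 0) are orthogonal
```

With n vectors per list and dimension d, this test takes O(n^2 d) time.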

Williams [95] showed that the Orthogonal Vectors (OV) Conjecture is implied by the well-known Strong Exponential Time Hypothesis (SETH) of Impagliazzo, Paturi, and Zane [64, 63]. Nowadays many papers base the hardness of problems on SETH and the OV conjecture (see, e.g., [8, 21, 91] and papers cited therein).

Since all geodesic triangles of a graph constructed in the reduction in [83] are 2-thin, we can rephrase the result from [83] as follows.

Statement 1

If for some $\epsilon > 0$, there is an algorithm that can determine if a given graph with 2-thin triangles, $n$ vertices and $m = O(n)$ edges has diameter 2 or 3 in $O(n^{2-\epsilon})$ time, then the Orthogonal Vectors Conjecture is false.

To prove a similar lower bound result for the radius problem, recently Abboud et al. [8] suggested to use the following natural and plausible variant of the OV conjecture.

The Hitting Set Conjecture: There is no $\epsilon > 0$ such that for all $c \ge 1$, there is an algorithm that given two lists $A, B$ of $n$ subsets of a universe $U$ of size $c \log n$, can decide in $O(n^{2-\epsilon})$ time if there is a set in the first list that intersects every set in the second list, i.e., a hitting set.
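The corresponding brute-force test for the Hitting Set problem is equally short. This is a minimal sketch of the quadratic baseline, with a function name and example lists of our own choosing.

```python
def has_hitting_set(first, second):
    """True if some set in `first` intersects every set in `second`."""
    return any(all(s & t for t in second) for s in first)

first = [{1, 2}, {3}]
second = [{1, 4}, {2, 5}]
found = has_hitting_set(first, second)  # {1, 2} meets both {1, 4} and {2, 5}
```

Note the quantifier structure (exists a set that meets all sets), which differs from the exists-exists structure of the Orthogonal Vectors problem.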

Abboud et al. [8] showed that an algorithm that can distinguish between radius 2 and 3 in a sparse graph in subquadratic time refutes the Hitting Set Conjecture. Since all geodesic triangles of a graph constructed in the reduction in [8] are 2-thin, we can rephrase that result from [8] as follows.

Statement 2

If for some $\epsilon > 0$, there is an algorithm that can determine if a given graph with 2-thin triangles, $n$ vertices, and $m = O(n)$ edges has radius 2 or 3 in $O(n^{2-\epsilon})$ time, then the Hitting Set Conjecture is false.

3.2 Fast additive approximations

In this subsection, we show that in a graph $G$ with $\delta$-thin triangles the eccentricities of all vertices can be computed in total linear time with an additive error depending on $\delta$. We establish that the eccentricity of a vertex is determined (up to a small error) by how far the vertex is from the center $C(G)$ of $G$. Finally, we show how to construct a spanning tree $T$ of $G$ in which the eccentricity of any vertex is its eccentricity in $G$ up to an additive error depending only on $\delta$. For these purposes, we revisit and extend several results from our previous paper [35] concerning the linear time approximation of diameter, radius, and centers of $\delta$-hyperbolic graphs. For these particular cases, we provide simplified proofs, leading to better additive errors due to the use of thinness of triangles instead of the four point condition and to the computation in time of a pair of mutually distant vertices.

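Define the eccentricity layers of a graph $G$ as follows: for $k = 0, \dots, \mathrm{diam}(G) - \mathrm{rad}(G)$ set

\[ C^k(G) = \{v \in V : \mathrm{ecc}_G(v) = \mathrm{rad}(G) + k\}. \]

With this notation, the center of a graph is $C(G) = C^0(G)$. In what follows, it will be convenient to define also the eccentricity of the middle point of any edge of $G$; set .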
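The eccentricity layers can be computed directly once all eccentricities are known. The sketch below assumes that the k-th layer consists of the vertices whose eccentricity equals rad(G) + k, so that layer 0 is the center; the function names are ours.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source (adj: vertex -> list of neighbours)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def eccentricity_layers(adj):
    """Map k -> set of vertices with eccentricity rad(G) + k (assumed layer definition)."""
    ecc = {v: max(bfs_distances(adj, v).values()) for v in adj}
    rad = min(ecc.values())
    layers = {}
    for v, e in ecc.items():
        layers.setdefault(e - rad, set()).add(v)
    return layers

# Path 0-1-2-3-4: the center is {2}.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
layers = eccentricity_layers(path)
```

For the path, layer 0 is {2}, layer 1 is {1, 3}, and layer 2 is {0, 4}.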

We start with a proposition showing that, in a graph $G$ with $\delta$-thin triangles, a middle vertex of any geodesic between two mutually distant vertices has eccentricity close to $\mathrm{rad}(G)$ and is not too far from the center $C(G)$ of $G$.

Proposition 2

Let $G$ be a graph with $\delta$-thin triangles, $u,v$ be a pair of mutually distant vertices of $G$.

  (a) If is the middle point of any $(u,v)$-geodesic, then .

  (b) If is a middle vertex of any $(u,v)$-geodesic, then .

  (c) . In particular,

  (d) If is a middle vertex of any $(u,v)$-geodesic and then . In particular, .

Proof

Let be an arbitrary vertex of and be a geodesic triangle, where are arbitrary geodesics connecting with and . Let be a point on which is at distance from and hence at distance from . Since and are mutually distant, we can assume, without loss of generality, that is located on between and , i.e., , and hence . Since , we also get .

(a) By the triangle inequality and since , we get

(b) Since when is even and when is odd, we have . In addition to the proof of (a), one needs only to consider the case when is odd. We know that the middle point sees all vertices of $G$ within distance at most . Hence, both ends of the edge of the $(u,v)$-geodesic containing the point in the middle have eccentricities at most

(c) Since a middle vertex of any -geodesic sees all vertices of within distance at most , if , then

which is impossible.

(d) In the proof of (a), instead of an arbitrary vertex , consider any vertex from . By the triangle inequality and since and both are at most , we get

Consequently, On the other hand, since and , by statement (a), we get

As an easy consequence of Proposition 2(d), we get that the eccentricity $\mathrm{ecc}_G(v)$ of any vertex $v$ is equal, up to an additive one-sided error linear in $\delta$, to $d_G(v, C(G))$ plus $\mathrm{rad}(G)$.

Corollary 1

For every vertex of a graph with -thin triangles,

Proof

Consider an arbitrary vertex in and assume that . Let be a vertex from closest to . By Proposition 2(d), and . Hence,

and

Combining both inequalities, we get

Note also that, by the triangle inequality, $\mathrm{ecc}_G(v) \le d_G(v, C(G)) + \mathrm{rad}(G)$ (that is, the right-hand inequality holds for all graphs). ∎
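The relation between eccentricity and distance to the center can be inspected numerically. The sketch below computes, for every vertex v, the slack d(v, C(G)) + rad(G) − ecc(v), which is nonnegative in every graph by the triangle inequality; by Corollary 1 it is bounded in terms of δ for graphs with δ-thin triangles. The function names are ours, and the all-pairs computation is for illustration only.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source (adj: vertex -> list of neighbours)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def center_slack(adj):
    """Map v -> d(v, C(G)) + rad(G) - ecc(v), computed exactly from all-pairs distances."""
    dist = {v: bfs_distances(adj, v) for v in adj}
    ecc = {v: max(dist[v].values()) for v in adj}
    rad = min(ecc.values())
    center = [c for c in adj if ecc[c] == rad]
    return {v: min(dist[v][c] for c in center) + rad - ecc[v] for v in adj}

# Path 0-1-2-3-4: a tree, so the eccentricity function is unimodal.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
slack = center_slack(path)
```

On a tree the slack is 0 at every vertex, which matches the unimodality of the eccentricity function.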

It is interesting to note that the equality holds for every vertex of a graph if and only if the eccentricity function on is unimodal (that is, every local minimum is a global minimum)[48]. A slightly weaker condition holds for all chordal graphs [51]: for every vertex of a chordal graph , .

Proposition 3

Let $G$ be a graph with $\delta$-thin triangles and be a pair of vertices of $G$ such that .

  (a) If is a vertex of a -geodesic at distance from , then .

  (b) For every pair of vertices , .

  (c) .

  (d) If is a vertex of a -geodesic at distance from and , then and . In particular, .

Proof

(a) Let be a vertex of with . Let be a geodesic triangle, where are arbitrary geodesics connecting with and . Let be a point on which is at distance from and hence at distance from . We distinguish between two cases: is between and or is between and in .

In the first case, by the triangle inequality and (and hence, ), we get

In the second case, by the triangle inequality and since , we get

(b) Consider an arbitrary -geodesic . Let be a geodesic triangle, where are arbitrary geodesics connecting with and . Let be a geodesic triangle, where are arbitrary geodesics connecting with and .

Let be a point on which is at distance from and hence at distance from . Let be a point on which is at distance from and hence at distance from . Without loss of generality, assume that is on between and .

Since (as ), we have . By the triangle inequality, we get

Consequently,

(c) Now, if is a diametral pair, i.e., , then, by (b) and Proposition 2(c),

(d) Consider any -geodesic and let be the middle point of it, be a vertex of at distance from , and be a vertex of at distance from . We know by (a) that . Furthermore, since (by (c)), . Hence,

implying

Let now be an arbitrary vertex from , i.e., , for some integer . Consider a geodesic triangle , where are arbitrary geodesics connecting with and . Let be a point on which is at distance from and hence at distance from . Since, in what follows, we will use only the fact that , we can assume, without loss of generality, that is located on between and , i.e., .

By the triangle inequality and since and both and are at most , we get

Hence, On the other hand, since and , we get

Proposition 4

For every graph with -thin triangles, In particular,

Proof

Let be two vertices of such that . Pick any -geodesic and consider the middle point of it. Let be a vertex of such that