An exact -hopset for a weighted graph is a weighted edge set, whose addition to the graph guarantees that every pair of vertices has a path between them with at most edges (hops) and whose length is exactly the length of shortest path between the vertices.
The concept of a hopset was first explicitly described by Cohen  in its approximate setting, in which the length of -hop path between a pair of vertices in the hopset should approximate the length of the shortest path in . Hopsets wer introduced in the context of parallel computation of approximate shortest paths. In this paper, we study hopsets in their exact version, with the general objective of optimizing exact shortest path queries.
Data structure which allow for querying distance between any pair of vertices of a graph have been intensively studied under the name of distance oracles. The efficiency of an exact distance oracle is typically measured by the interplay between the space requirement of the representation of the data structure and its decoding time. It is a well-established empirical fact that many real-world networks admit efficient (i.e., low-space and fast) distance oracles [6, 23]. A key example here concerns transportation networks, and specifically road networks, which are empirically known [33, 32, 5] to be augmentable by carefully tailored sets of shortcut edges, allowing for shortest-path computation. These sets of shortcuts may be hopsets (as is the case for the hub-labeling approach which effectively implements a -hopset), but may also be considered in some related (and frequently more involved) framework, such as contraction hierarchies  or transit-node routing .
An interesting theoretical insight due to Abraham et al. [4, 2, 5] provides theoretical bounds on the number of shortcuts required in all of the above-mentioned frameworks. They introduce a parameter describing the structure of shortest paths within ball neighborhoods of a graph, called highway dimension , and express the number of shortcuts that need to be added for each node so as to achieve shortest-path queries in a graph of nodes with weighted diameter as a polynomial of , , and ; this approach has been extended in subsequent work [3, 36]. The value of is known to be small in practice (e.g., typically for continental-sized road networks ), and does indeed appear to be inherently linked to the size of the required shortcut sets. In fact, empirical tests have suggested that the (average) number of necessary shortcuts per node is in fact very close to , laying open the question of whether the additional dependence of the number of shortcuts on logarithmic factors in and may be an artifact of the theoretical analysis of the oracles, which for each node require a separate shortcut for every “scale” of distance.
In this work, we provide strong evidence that the dependence of the number of shortcuts on such logarithmic factors in and is indeed not essential, and we design a simple distance oracle based on a -hopset in which the number of shortcuts per node depends only on , , and the logarithm of the average edge length. This result is in fact shown in the framework of a strictly broader class of graphs, namely, graphs with a bounded value of a parameter known as skeleton dimension (), describing the width of the shortest-path tree of a node after pruning all branches at a constant fraction of their depth. We also show a similar result for -hopsets in graphs of treewidth , obtaining a distance oracle in which the number of shortcuts per node is a function of and . (For the case of bounded treewidth, different constructions with comparable performance were previously known, cf. ). The space and time-bounds of oracles based on -hopsets are presented in Table 1, and compared with the corresponding parameters of oracles based on -hopsets. For the case of constant skeleton dimension or constant treewidth, we remark that using a -hopset instead of a -hopset reduces the number of shortcuts per node from to while achieving a query time of .
|Distance oracle||Treewidth||Skeleton dimension|
|-hopset (hubs):||11footnotemark: 1|
The query time of a simple -hopset data structure is . In a centralized setting, it can be reduced to by application in combination of a -hopset with a given -factor approximation of the distance, which can be provided by an auxiliary fast approximate distance oracle (e.g. ).
1.1 Results and Organization of the Paper
The rest of the paper is organized as follows. In Section 2, we introduce the necessary notions related to -hopset and give a general approach for how a -hopset can be used as a distance oracle, focusing on the special case of . As a warmup to the main results, in Section 3, we show how to construct efficient -hopsets for bounded treewidth graphs. We also consider the query time for -hopsets for weighted trees and, more generally, for bounded treewidth graphs. Then, in Section 4, we provide the first of our main results, using -hopsets to obtain improved (smaller) distance oracles in USP graphs with bounded skeleton dimension.
In the second part of the paper, we consider LP-based approximation algorithms for constructing -hopsets in unique shortest path graphs. A unique shortest path graph (USP) is a graph such that, given any two nodes and , there is a unique shortest path between them. In practice, this common assumption can be made without loss of generality, as one can perturb the input to ensure uniqueness; however such a perturbation may significantly change the size of the required distance oracle. Our construction builds on and significantly extends the LP-formulation of 2-hopsets and the framework of prehub labelings introduced in .
In Section 5, we provide a ILP formulation for the problem of finding a -hopset of minimal size. For the case of USP graphs, we then show how this ILP formulation is related to its LP relaxation, namely, that the problem has an at most polylogarithmic integrality gap. We extend the same approach to provide an algorihtm which constructs -hopsets which are ready to use as distance oracles in USP graphs, with (approximate) optimality guarantees on size and query time of the oracle.
Our work is presented in the context of weighted undirected graphs, but all results can easily be extended to weighted directed graphs.
1.2 Other Related Work
Exact hopsets were implicitly constructed in the context of single-source shortest paths parallel computation [42, 35, 19, 39]. Such works study the work versus time trade-offs of such computation. Cohen  explicitly introduced the notion of -hopset of as set of weighted edges such that paths of hops at most in have length within of the corresponding shortest path in . The parameter is called the hopbound. For any graph and , she proposed a construction of -hopset of with size . More recently, Elkin et al.  proposed the construction of -hopset with edges for any and integral . Abboud et al.  recently showed the optimiality of the Elkin et al.  result. In particular, they showed that for any and integer , any hopset of size less than must have hop bound , where is a constant depending only on . As far as we know, exact hopsets (with ) have not been explicitly studied. However, they are related to the following well studied notion.
Hopsets vs. TC-spanners.
In directed graphs, a hopset can be seen as a special case of an -transitive-closure spanner (-TC-spanner), i.e., an unweighted directed graph with the same transitive closure as a given unweighted directed graph , having hop diameter at most . Hopsets and TC-spanners are a fundamental graph-theoretic objects and are widely used in various settings from distance oracles to pre-processing for range queries in sequential or parallel setting or even in property testing.
The concept of adding transitive arcs to a digraph in order to reduce its diameter was introduced by Thorup  in the context of parallel processing. Bhattacharyya et al.  define a -transitive-closure spanner (-TC-spanner for short) of an unweighted digraph as a digraph with same transitive closure as and diameter at most . They note that this is a central concept in a long line of work around pre-processing a tree for range queries [7, 16, 41]. A TC-spanner can also be defined as a spanner (for the classical spanner definition ) of the transitive closure of a graph that has bounded diameter. We will see that an exact -hopset defines a -TC-spanner but that the converse is not necessarily true. Bhattacharyya et al.  proposed a construction of -TC-spanner of size for -minor-free graphs (where denotes the th-row inverse Ackermann function, cf. Section 3).
Exact Distance Oracles.
A long line of research studies the interplay between data structure space and query decoding time. A lot of attention has been given to distance oracles for planar graphs [24, 10, 18, 14, 26, 22, 30], and it has recently been shown that a distance oracle with space and query-time is possible . In the context of weighted directed graphs with treewidth , Chaudhuri and Zaroliagis  propose a distance oracle using space and query time for integral where is the th-row inverse Ackermann function (as defined in Subsection 3). In the context of unweighted graphs with treewidth , Farzan and Kamali  obtain distance oracles with query time using optimal space (within low order terms). This construction heavily relies on the unweighted setting as exhaustive look-up tables are constructed for handling graphs with polylogarithmic size.
Distance Labelings and 2-Hopsets.
The distance labeling problem is a special case of a distributed distance oracle, and consists of assigning labels to the nodes of a graph such that the distance between two nodes and can be computed from the labels of and (see, e.g., ).
The notion of 2-hopset studied in this work coincides with the special case of two-hop distance labeling (also called hub-labeling), where labels are constructed from hub sets: in hub-labeling, a small hub set is assigned to each node of a graph such that or any pair of nodes, the intersection of hub sets contains a node on a shortest path. Such a construction is proposed by Gavoille et al.  and applies to graphs of treewidth with labels of size and allows to answer distance queries in time; the hub sets have a hierarchical structure, which allows for an improvement of query time to time by a binary search over levels. Hub labelings are the best currently known distance labelings for sparse graphs, achieving sublinear node label size [8, 29], and may also be used to provide a 2-additive-approximation for distance labeling in general graphs using sublinear-space labels .
In graphs of bounded highway dimension, hub labels were among the first identified distance oracles to provide label size and query time polynomial in the highway dimension and polylogarithmic in other graph parameters . This result was then extended to the more general class of graphs with bounded skeleton dimension .
Hub sets with near to optimal size can be constructed in polynomial time. A greedy setcover-type -approximation algorithm (with respect to average size of a hub set) was proposed by Cohen et al. . For the case of USP graphs, this approximation ratio was improved by Angelidakis et al.  to the logarithm of the graph hop-diameter, i.e., the maximum number of hops of a shortest path in .
We are given a weighted undirected graph where associates a weight with each edge of . For a positive integer parameter and a pair , the -limited distance between and , denoted , is defined as the length of the shortest path that contains at most edges (aka hops). The usual shortest path distance can be defined as . For the sake of brevity, we often let denote the pair representing an edge from to .
An (exact) -hopset for a weighted graph is a set of edges such that for all in where is the graph augmented with edges of the hopset with weights for and for . The parameter is called the hopbound of the hopset. Edges from set are called shortcuts in .
By convention, we will assume that all self-loops at nodes of are included in . Thus, is a graph whose -th power in the algebra on matrices of edge weights corresponds to the transitive closure of the weight matrix of graph .
Equivalently, a -hopset can be defined as a set of edges such that for any pair , there exists a path of edges at most from to in and a shortest path from to in such that all nodes of belong to and appear in the same order. Note that a -hopset is completely specified by its set of edges as the associated weights are deduced from distances in the graph.
2.2 Using a Hopset as a Distance Oracle
Hopsets may be used to answer shortest-path queries in a graph . In general, given a hopset , the naïve way to approach a query for for a given node pair is to perform a bidirectional Dijkstra search in graph from this node pair, limited to a maximum of hops distance from each of these nodes. We have, in particular for any pair :
Different optimizations of this technique are possible.
In this paper, we focus only on the time complexity of the case of , where we perform the following optimization of query execution. We represent set as the union of two (not necessarily disjoint) sets of shortcuts, , where an edge belongs to if it is used as the first or third (last) hop on a shortest path in , and it belongs to if it is used as the second hop on such a path. By convention, we assume that self-loops at nodes are added to , thus e.g. a -hop path between a pair of adjacent nodes in is constructed by taking a self-loop from , the correct edge from , and another self-loop from . (Note that we never directly use edges of as first or last hops in the hopset; if such an edge is required for correctness of construction, it should be explicitly added to set .) We further apply an orientation to the shortcuts in , constructing a corresponding set of arcs , such that, for any node pair , there exist such that , , , and:
We note that , since each shortcut from corresponds to at most a pair of symmetric arcs in . For a node , let represent the out-neighborhood of in the graph . To perform shortest path queries on , we now store for each node the lists . We also store a hash map, mapping all node pairs to the length of the respective link, . Now, we answer the distance query for a node pair as follows:
Using the given data structures, the query is then processed using hashmap look-ups, one for each pair , i.e., in time . Time is simply referred to as the query time for the considered node pair in the -hopset oracle . Assuming uniform query density over all node pairs, the uniform-average query time is given as: Thus, in the uniform density setting (which we refer to in Section 5 only), the average time of processing a query is proportional to the square of the average degree of a node with respect to edge set .
The size of set affects only the size of the data structure required by the distance oracle, which is given as at most edges, with each edge represented using bits.
In the 3-hopset distance oracles described in the following sections, we will confine ourselves to describing shortcut sets and , noting that the correct orientation of will follow naturally from the details of the provided constructions.
3 Warmup: Bounded Treewidth Graphs
As a warm-up, we provide a -hopset construction for bounded treewidth graphs and use it to design a distance oracle in the case .
We first consider the case of (weighted) trees. The construction of -hopsets for trees is classical. It is implicit in [7, 16], explicit for unweighted trees in  and directed trees in . We provide a short construction which fine-grains the dependence of the hopset size on (e.g., replacing by
with respect to the asymptotic analysis in). The construction is based on the following lemma for splitting the tree into smaller sub-trees.
Given a tree with nodes and a value , there exists a set of nodes at most such that each connected component of contains less than nodes and is connected to at most two nodes in . Set can be computed in linear time through a bottom-up traversal of the tree.
Start with and root at some arbitrary node . As long as the connected component of containing root has nodes or more, add to a node from this component such that the subtree rooted as has size or more while for all descendants of . This results in a set of at most nodes such that the connected components of have size less than . Define as the set of lowest common ancestors of any two nodes . The size of is at most since its nodes correspond to the internal nodes with two children ore more in the minimal sub-tree containing which has at most leaves. Let be the union of and . For any connected component of , there exist at most two nodes in that are connected to nodes of in : at most one is connected to the root of ( is considered as a sub-tree of ) and at most one has its parent in (if there were two such nodes, their lowest common ancestor would be in and not in , contradicting the connectivity of ). ∎
-hopset construction for trees.
A 1-hopset in a tree is obtained by adding all pairs as edges with appropriate weight. For , we recursively define a -hopset of as follows. Select a set of nodes at most with according to Lemma 1. (The number is suitably chosen according to the -row inverse Ackerman function defined next.) When , we add an edge from each node of to each node in . When , we consider the forest induced by nodes in : it has node set and edges such that is the closest ancestor of in that belongs to . The weight of such an edge is defined as . We then add a -hopset of to the construction. Additionally, we add one or two edges per node not in : for each connected component of , add an edge for each node and each connected to . Note that Lemma 1 ensures that there are at most two such nodes for a given component . In both cases (), we construct recursively a -hopset of each sub-tree induced by a connected component of . In the special case of , the -hopsets contribute to while all edges connecting to a node in some selected set contribute to according to the convention introduced in the Preliminaries.
Notation: Ackermann function.
To analyze the construction, following , we introduce the following variants of the Ackermann function:
The th-row inverse Ackermann function is defined by and for . Equivalently, we have , and where we define for any function : , for , and . Note that , , and .
The inverse Ackermann function is defined as . Note that we have . We are now ready to state the parameters of the designed hopset.
For any integer and weighted tree with nodes, a -hopset of with edges can be computed in time. A linear size -hopset can be computed in time. In the case , the constructed hopset allows to obtain a distance oracle using space of edges of bits and having query time .
The correctness of the constructed -hopset comes from the fact two nodes in two different connected components of both have an hopset edge to a node in on the path from to according to Lemma 1. Let and denote the nodes in that are linked to and respectively (). For , the -hopset added in the construction implies that a path of hops at most links to in and we thus have . For , we also have (and ), and implies .
We claim that the resulting -hopset has edges for . Recall that a -hopset has edges. Note that the choice of in our construction implies that connected components created by Lemma 1 have size at most. The components created in a recursive call with recursion depth will have size . The number of recursion levels is thus . We now show that edges are added to the construction at each recursion level. For , we have and the number of edges added at each recursion level is thus at most . For , we have and the -hopset constructed on nodes at most has edges. For , we proceed by induction on : we assume that the -hopset constructed for a tree with nodes at most has edges that is edges for (note that is non-decreasing for any ).
Query time for -hopsets.
For the special case of , we have , and the size required to represent the -hop data structure is edges. Moreover, following the convention introduced in the Preliminaries, we note that in the adopted construction for all . A bound of query time follows the above analysis.
Linear size hopset.
We can obtain a linear size -hopset by splitting into sub-trees of size at most using Lemma 1 with . Two nodes in a connected component of are thus obviouly linked by a path of length at most in . Similary as before, we link every node to the (at most 2) nodes in connected to its component and add a -hopset for the forest induced by nodes in . We thus obtain a -hopset with edges.
We note that the hopset size is indeed tight for some trees. If is a path with nodes from to , any -hopset can be seen as a covering of intervals in where denotes the interval of integers. More precisely, a set of intervals -covers when every interval is the union of at most intervals in . We can easily obtain a -covering from any -hopset of the path by associating each edge of to the interval . A lower bound of for the size of a -covering of is proved in .
Recall that a graph has treewidth if there exists a tree whose nodes are subsets of called bags such that: for all ; for all edges , there exists a bag containing both and (); and for all nodes , the bags containing form a sub-tree of . Without loss of generality, we assume that each bag contains exactly nodes, and that two neighboring bags share exactly nodes (the decomposition is standard). This implies as each bag brings one new node. Note that removing a non-leaf bag separates the graph into several connected components. We consider that all edges of have weight 1. For convenience, we assume that is root at some bag and define for each node the root bag of as the bag containing which is closest to the root.
-hopset construction for bounded treewidth graphs.
Consider a graph with treewidth and an associated tree . The general idea is to follow the construction of a -hopset of with slight modifications. Similarly to the tree case, we select a set of bags at most with according to Lemma 1. We then construct a -hopset of the forest induced by bags in according to the tree construction. For each edge in , we add an edge to the graph hopset for all and . Such edges are called tree-hopset edges. Now for each node such that its root bag falls in a connected component of , we consider the (at most 2) bags that are connected to that component and add an edge to the graph hopset for all . Such edges are called separator edges. We then recurse on each component of until we reach subtrees of size . We then pursue with and so on recursively until reaching components of size 1 at most. Finally, for each node , we add an edge to the graph hopset for all . Such edges are called bag edges. To construct a linear size hopset, we use a single step with and a -hopset of . For each tree edge inside components of we add an edge to the construction for all and such that and . Such edges are also considered as tree-hopset edges.
For all , any graph with treewidth has a -hopset with edges and a -hopset with edges.
Correctness of construction.
Let denote the hopset constructed for a graph with treewidth and associated tree . Consider a shortest path for some integer . First consider the case where a bag of contains both and . We can assume without loss of generality than is an ancestor of . As lies on the path from to , it must contain and edge is in according to the last step of the above construction. Now suppose that no bag contains both and . Consider the first recursion call where splitting a subtree with a set of bags separates and . Consider the path from to in . Let (resp. ) be the first (resp. last) bag in on that path. Either is in or contains separator edges from to all nodes in . Similarly, either is in or contains a separator edge from to all nodes in . The -hopset considered during that recursion call contains a path of hops from to . If two consecutive bags contain and respectively, then contains edge as a tree-hopset edge. Otherwise, let (resp. Y’) be the first bag in not containing (resp. ). By treewidth definition, there exists bags containing edges respectively (i.e., contains and for all ). The shortest path corresponds to a walk in from to , then to and so on. All bags on the path (in ) from to must contain . As that walk must go through , we can define the highest index such that . Similarly, we can define the smallest index such that . Our construction then contains separator edges and . When , contains a path of 2 hops at most with same length as . If two consecutive bags of contain and respectively, then contains a tree-hopset edge . Otherwise, we can similarly define indexes and with and . Our construction then contains tree-hopset edges . In all cases, contains a path of hops at most and same length as .
In the first recursion levels, a subtree of size is split into subtrees smaller than . At recursion depth , we thus obtain subtrees of size at most. Deeper recursion calls are similar to the tree case. The total number of recursion levels is thus . When processing a subtree of size , we build a -hopset for a forest of bags at most using edges according to Proposition 1. For , we use and thus produce tree-hopset edges at most. For , we use . However, for a given bag , there are at most nodes not in among the other bags. We thus produce at most tree-hopset edges per bag. In both cases, each recursion level thus brings tree-hopset edges as well as separator edges. There are bag edges at most in total. We can thus obtain a -hopset with edges for any graph of treewidth . In the linear size construction, we use a single step using a -hopset for with edges. We thus have tree-hopset edges and separator edges.
Query time for 3-hopsets.
For the special case of , we have , and the size required to represent the -hop data structure is edges. Following the convention
, we classify tree-hopset edges inwhile both separator edges and bag edges are classified in . For any , we thus have . The following bound on the query time follows.
Any graph with treewidth admits a -hopset distance oracle represented on edges of bits, with a query time of . ∎
4 Bounded Skeleton Dimension
In this Section we consider graphs with unique shortest paths (USP), only. A formal definition of the notion of skeleton dimension relies on the concept of the geometric realization of a graph, cf. . The geometric realization of can be seen as the “continuous” graph where each edge is seen as infinitely many vertices of degree two with infinitely small edges, such that for any and , there is a node in at distance from on edge . We define the reach of as . We then define the skeleton of as the subtree of induced by nodes with reach at least half their distance from the root. More precisely, is the subtree of induced by . The width of a tree with root is defined as the maximum number of nodes (points) in at a given distance from its root. More precisely, the width of is where is the set of nodes with .
The skeleton dimension of a graph is now defined as the maximum width of the skeleton of a shortest path tree, that is , where denotes the shortest path tree of obtained as the union of shortest paths from to all .
For the definition of the related concept of highway dimension we refer the reader to . We note that if the geometric realization of a graph has highway dimension , then has skeleton dimension ; hence, in all subsequent asymptotic analyses, upper bounds expressed in terms of skeleton dimension can be replaced by analogous bounds in terms of highway dimension.
4.1 Construction of the 3-Hopset
We denote by the maximum length of an edge in graph . The construction of the 3-hopset is obtained by taking a union of sets of shortcuts, each of which covers sets of node pairs within a given distance range. The first shortcut set covers all node pairs with , for some choice of distance bound , whereas each of the subsequent shortcut sets covers nodes at a distance in an exponential increasing distance range, , where is suitably chosen. We then put:
Construction of set .
We note that a construction of -hopsets for graphs of skeleton dimension was performed in . As a direct corollary of [Lem. 2, Cor. 1,2], given a distance bound , there exists a randomized polynomial-time construction of a set of shortcuts for graph with the property that for any pair of nodes with , we have , such that , and moreover for all , we have and . We use set directly for the value , considering as a -hopset for node pairs with . We have:
and for all :
We remark that, without loss of generality, in asymptotic analysis one may assume that , where is the average edge length in , noting that edges longer than can be subdivided into edges of length at most by inserting additional vertices, increasing the number of nodes of the graph only by a multiplicative constant. Thus, in the above bounds, we can replace by .
Construction of set .
We now proceed to construct a -hopset for node pairs with . The construction of set is randomized and completely determined by assignment of real values to each node , uniformly and independently at random. We condition all subsequent considerations on the event that all values are distinct,
, which holds with probability.
Now, hopset is defined as , where following our usual notation, is the set of first and last hops, and is the set of middle hops.
Set of first and last hops.
For , let be the set of nodes which lie on a shortest path of length at least which has one of its endpoints at , and which have minimum value of among all vertices on this path at distance in from :
We now put: .
Set of middle hops.
We put in links between all pairs of nodes which have a small value of , satisfy the natural upper bound of on distance between them, and have sufficiently large reach, i.e., the shortest path between them can be extended by at least :
The validity of as a -hopset is immediate to verify from the construction.
4.2 Bound on 3-Hopset Size and Oracle Time
Fix and . We have: .
By the fact that the size of the cut of the skeleton tree for node at distance from is upper-bounded by the skeleton dimension , we have that the set of paths , where , has at least at most distinct paths, . The bound on the size of set now follows directly from its definition. ∎
From the above Lemma, it follows that for any we have , thus summing over the levels of the construction, we successively obtain:
We now proceed to bound the size of the set of middle hopsets.
Fix . With probability , it holds that for all , for all , we have .
As noted in the proof of Lemma 2, to be included in , a node must be the minimum element along one of the at most possible paths . Each such path includes all nodes on the path at distance in the range from , where we recall that , for some sufficiently large choice of constant . It follows that each path contains
nodes. Now, taking note of the independence of the choice of random variables, we have by a simple concentration bound that