1. Introduction
Finding the shortest-path distance between a pair of vertices is a fundamental task in graph theory and has a broad range of applications (Backstrom et al., 2006; Freeman, 1977; Sabidussi, 1966; Yahia et al., 2008; Ukkonen et al., 2008; Vieira et al., 2007; Maniu and Cautis, 2013). For example, in web graphs, ranking web pages by their distances to recently visited web pages helps in finding the more relevant pages and is referred to as context-aware search (Ukkonen et al., 2008). In social network analysis, distance is used as a core measure in many problems such as centrality (Freeman, 1977; Sabidussi, 1966) and community identification (Backstrom et al., 2006), which require distances to be computed for a large number of vertex pairs. However, despite extensive efforts in addressing the shortest-path distance problem for many years, there is still a high demand for scalable solutions that can be used to support analysis tasks over large and ever-growing networks.
Traditionally, one can use the Dijkstra algorithm (Tarjan, 1983) for weighted graphs or a breadth-first search (BFS) algorithm for unweighted graphs to query shortest-path distances. However, these algorithms are not scalable: for large graphs with billions of vertices and edges, they may take seconds or even longer to find the shortest-path distance between one pair of vertices, which is not acceptable for large-scale network applications where distances need to be provided in the order of milliseconds. To improve query time, a well-established approach is to precompute and store shortest-path distances between all pairs of vertices in an index, also called distance labelling, and then answer a distance query (i.e., find the distance between two vertices) in constant time with a single lookup in the index. Recent work (Hayashi et al., 2016) shows that such labelling-based methods are the fastest known exact distance querying methods on moderately large (million-scale) graphs having millions of edges, but still fail to scale to large (billion-scale) graphs due to quadratic space requirements and prohibitively long index construction time.
Thus, the question is still open as to how scalable solutions to answer exact distance queries in billion-scale networks can be developed. Essentially, there are three computational factors to be considered concerning the performance of algorithms for answering distance queries: construction time, index size, and query time. Much of the existing work has focused on exploring trade-offs among these computational factors (Abraham et al., 2011, 2012; Akiba et al., 2013, 2012; Wei, 2010; Hayashi et al., 2016; Tretyakov et al., 2011; Potamias et al., 2009; Fu et al., 2013; Jin et al., 2012; Qiao et al., 2014; Gubichev et al., 2010; Li et al., 2017; Chang et al., 2012), especially for the 2-hop cover distance labelling (Cohen et al., 2003; Akiba et al., 2013). Nonetheless, to handle large graphs, we believe that a scalable solution for answering exact distance queries needs to have the following desirable characteristics: (1) the construction time of a distance labelling is scalable with the size of a network; (2) the size of a distance labelling is minimized so as to reduce the space overhead; (3) the query time remains in the order of milliseconds, even in graphs with billions of nodes and edges.
In this work, we aim to develop a scalable solution for exact distance queries which can meet the aforementioned characteristics. Our solution is based on two ingredients: (i) a scalable labelling algorithm for constructing minimal distance labelling, and (ii) a querying framework that supports fast distance-bounded search on a sparsified graph. More specifically, we first develop a novel labelling algorithm that can scale to graphs at the billion-scale. We observed that, for a given number of landmarks, the distance entries from these landmarks to other vertices in a graph can be further minimized if the definition of 2-hop cover distance labelling is relaxed. Thus, we formulate a relaxed notion for labelling in this paper, called the highway cover distance labelling, and develop a simple yet scalable labelling algorithm that adds only a very small number of distance entries into the label of each vertex. We prove that the distance labelling constructed by our labelling algorithm is minimal, and also experimentally verify that the construction process is scalable.
Then, we formalize a querying framework for exact distance queries, which combines our proposed highway cover distance labelling with distance-bounded searches to enable fast distance computation. This querying framework is capable of balancing the trade-off between construction time, index size and query time through an offline component (i.e. the proposed highway cover distance labelling) and an online component (i.e. distance-bounded searches). The basic idea is to select a small number of highly central landmarks that allow us to efficiently compute the upper bounds of distances between all pairs of vertices using an offline distance labelling, and then conduct distance-bounded search over a sparsified graph to find exact distances efficiently. Our experimental results show that the query time of distance queries within this framework remains in the order of milliseconds for large graphs with billions of vertices and edges.
Figure 1 summarizes the performance of the state-of-the-art methods for exact distance queries (Akiba et al., 2013; Fu et al., 2013; Abraham et al., 2012; Hayashi et al., 2016; Jiang et al., 2014; Tarjan, 1983; Pohl, 1971; Chang et al., 2012), as well as our proposed method in this paper, denoted as HL. In Figure 1(a)-1(b), we can see that the labelling-based methods PLL (Akiba et al., 2013), HDB (Jiang et al., 2014), and HHL (Abraham et al., 2012) can answer distance queries in a considerably small amount of time. However, they have very large space requirements and very long labelling construction time. In contrast, traditional online search methods such as Dijkstra (Tarjan, 1983) and bidirectional BFS (denoted as Bi-BFS) (Pohl, 1971) are not applicable to large-scale networks where distances need to be provided in the order of milliseconds because of their very high response time. The hybrid methods FD (Hayashi et al., 2016), IS-L (Fu et al., 2013) and HL (our method) combine an offline labelling and an online graph traversal technique, which can often provide better trade-offs between query response time and labelling size. In Figure 1(b), we can also see that only our proposed method HL can handle networks of size 8B, and is scalable to perform distance queries on networks with billions of vertices and billions of edges.
Figure 1(c) presents a high-level overview of several important properties of labelling methods. The column "ordering-dependent" refers to whether a distance labelling depends on the ordering of landmarks when being constructed by a method. Only our method HL and FD are not ordering-dependent. The columns "2HC-minimal" and "HWC-minimal" refer to whether a distance labelling constructed by a method is minimal in terms of the 2-hop cover (2HC) and highway cover (HWC) properties, respectively. PLL is 2HC-minimal, but not HWC-minimal. Our method HL is the only method that is HWC-minimal. The column "Parallel" refers to what kind of parallelism a method can support. FD and PLL support bit-parallelism for up to 64 neighbours of a landmark. Our method HL supports parallel computation for multiple landmarks, depending on the number of processors. Other methods did not mention any parallelism.
In summary, our contributions in this paper are as follows:

We introduce a new labelling property, namely highway cover labelling, which relaxes the standard notion of 2-hop cover labelling. Based on this new labelling property, we propose a highly scalable labelling algorithm that can scale to construct labellings for graphs with billions of vertices and billions of edges.

We prove that the proposed labelling algorithm constructs HWC-minimal labellings, which are independent of any ordering of landmarks. Then, due to this deterministic nature of labelling, we further develop a parallel algorithm which can run parallel BFSs from multiple landmarks to speed up labelling construction.

We combine our novel labelling algorithm with online bounded-distance graph traversal to efficiently answer exact distance queries. This querying framework enables us to balance the trade-offs among construction time, labelling size and query time.

We have experimentally verified the performance of our methods on 12 large-scale complex networks. The results show that our methods can not only handle networks with billions of vertices, but also be up to 70 times faster in constructing labelling and save up to 90% of labelling space.
The rest of the paper is organized as follows. In Section 2, we present basic notations and definitions used in this paper. Then, we discuss a novel labelling algorithm in Section 3, formulate the querying framework in Section 4, and introduce several optimization techniques in Section 5. In Section 6 we present our experimental results and in Section 7 we discuss other works that are related to our work here. The paper is concluded in Section 8.
2. Preliminaries
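Let $G = (V, E)$ be a graph where $V$ is a set of vertices and $E \subseteq V \times V$ is a set of edges. We have $n = |V|$ and $m = |E|$. Without loss of generality, we assume that the graph $G$ is connected and undirected in this paper. Let $V' \subseteq V$ be a subset of vertices of $G$. Then the induced subgraph $G[V']$ is a graph whose vertex set is $V'$ and whose edge set consists of all of the edges in $E$ that have both endpoints in $V'$. Let $N_G(v) = \{u \in V \mid (u, v) \in E\}$ denote the set of neighbors of a vertex $v \in V$ in $G$.
The distance between two vertices $s$ and $t$ in $G$, denoted as $d_G(s, t)$, is the length of the shortest path from $s$ to $t$. We consider $d_G(s, t) = \infty$, if there does not exist a path from $s$ to $t$. For any three vertices $s, t, u \in V$, the following triangle inequalities are satisfied:
(1)  $d_G(s, t) \le d_G(s, u) + d_G(u, t)$
(2)  $d_G(s, t) \ge |d_G(s, u) - d_G(u, t)|$
If $u$ belongs to one of the shortest paths from $s$ to $t$, then $d_G(s, t) = d_G(s, u) + d_G(u, t)$ holds.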
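Given a special subset of vertices $R \subseteq V$ of $G$, so-called landmarks, a label $L(v)$ for each vertex $v \in V$ can be precomputed, which is a set of distance entries $\{(r_1, \delta_L(r_1, v)), \dots, (r_k, \delta_L(r_k, v))\}$ where $r_i \in R$ and $\delta_L(r_i, v) = d_G(r_i, v)$ for $i = 1, \dots, k$. The set of labels $L = \{L(v)\}_{v \in V}$ is called a distance labelling over $G$. The size of a distance labelling $L$ is defined as $size(L) = \sum_{v \in V} |L(v)|$.
Using such a distance labelling $L$, we can query the distance between any pair of vertices $s, t \in V$ in graph $G$ as follows,
(3)  $Q(s, t, L) = \min\{\delta_L(r, s) + \delta_L(r, t) \mid (r, \delta_L(r, s)) \in L(s), (r, \delta_L(r, t)) \in L(t)\}$
We define $Q(s, t, L) = \infty$, if $L(s)$ and $L(t)$ do not share any landmark. If $Q(s, t, L) = d_G(s, t)$ holds for any two vertices $s$ and $t$ of $G$, $L$ is called a 2-hop cover distance labelling over $G$ (Cohen et al., 2003; Abraham et al., 2012).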
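The query in Equation 3 and the labelling size can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: labels are stored as dictionaries mapping a landmark to its distance, and the names `query_2hop` and `labelling_size` as well as the toy path graph are hypothetical, not taken from the paper.

```python
import math

def query_2hop(label_s, label_t):
    """Equation (3): minimise delta(r, s) + delta(r, t) over landmarks r
    that appear in both labels; infinity if no landmark is shared."""
    shared = label_s.keys() & label_t.keys()
    if not shared:
        return math.inf
    return min(label_s[r] + label_t[r] for r in shared)

def labelling_size(labels):
    """size(L): total number of distance entries over all vertices."""
    return sum(len(entries) for entries in labels.values())

# Hypothetical toy input: path graph a - b - c - d with the single landmark b.
labels = {"a": {"b": 1}, "b": {"b": 0}, "c": {"b": 1}, "d": {"b": 2}}

print(query_2hop(labels["a"], labels["d"]))  # distance a..d through b
print(labelling_size(labels))
```

On this toy graph every shortest path passes through the landmark, so the label lookup alone returns exact distances; in general this only holds when the labelling is a 2-hop cover.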
Given a graph $G$ and a set of landmarks $R \subseteq V$, the distance querying problem is to efficiently compute the shortest path distance $d_G(s, t)$ between any two vertices $s$ and $t$ in $G$, using a distance labelling $L$ over $G$ in which labels may contain distance entries from landmarks in $R$.
3. Highway Cover Labelling
In this section, we formulate the highway cover labelling problem and propose a novel algorithm to efficiently construct the highway cover distance labelling over graphs. Then, we provide theoretical analysis of our proposed algorithm.
3.1. Highway Cover Labelling Problem
We begin with the definitions of highway and highway cover.
Definition 3.1.
(Highway) A highway $H$ is a pair $(R, \delta_H)$, where $R$ is a set of landmarks and $\delta_H$ is a distance decoding function, i.e. $\delta_H : R \times R \to \mathbb{N}^+$, such that for any $\{r_1, r_2\} \subseteq R$ we have $\delta_H(r_1, r_2) = d_G(r_1, r_2)$.
Given a landmark $r \in R$ and two vertices $s, t \in V \setminus R$ (i.e. $\{s, t\} \cap R = \emptyset$), an $r$-constrained shortest path between $s$ and $t$ is a path between $s$ and $t$ satisfying two conditions: (1) it goes through the landmark $r$, and (2) it has the minimum length among all paths between $s$ and $t$ that go through $r$. We use $P_{st}$ to denote the set of vertices in a shortest path between $s$ and $t$, and $P^r_{st}$ to denote the set of vertices in an $r$-constrained shortest path between $s$ and $t$.
Definition 3.2.
(Highway Cover) Let $G = (V, E)$ be a graph and $H = (R, \delta_H)$ a highway. Then for any two vertices $s, t \in V \setminus R$ and for any $r \in R$, there exist $(r_i, \delta_L(r_i, s)) \in L(s)$ and $(r_j, \delta_L(r_j, t)) \in L(t)$ such that $r_i \in P_{rs}$ and $r_j \in P_{rt}$, where $r_i$ and $r_j$ may be equal to $r$.
If the label of a vertex $v$ contains a distance entry $(r, \delta_L(r, v))$, we also say that the vertex $v$ is covered by the landmark $r$ in the distance labelling $L$. Intuitively, the highway cover property guarantees that, given a highway $H$ with a set of landmarks $R$ and $r \in R$, any $r$-constrained shortest path distance between two vertices $s$ and $t$ can be found using only the labels of these two vertices and the given highway. A distance labelling $L$ is called a highway cover distance labelling if $L$ satisfies the highway cover property.
Example 3.3.
Consider the graph depicted in Figure 2(a), the highway has three landmarks as highlighted in red in Figure 2(b). Based on the graph in Figure 2(a) and the highway in Figure 2(b), we have which is a shortest path between the vertices and constrained by the landmark , i.e. constrained shortest path between and . In contrast, neither of the paths and is a constrained shortest path between and .
In Figure 2(b), the outgoing arrows from each landmark point to vertices in that are covered by this landmark in the highway. The distance labelling in Figure 2(c) satisfies the highway cover property because for any two vertices that are not landmarks and any landmark , we can find the constrained shortest path distance between these two vertices using their labels and the highway.
Definition 3.4.
(Highway Cover Labelling Problem) Given a graph $G$ and a highway $H$ over $G$, the highway cover labelling problem is to efficiently construct a highway cover distance labelling $L$.
Several choices naturally come up: (1) One is to add a distance entry for each landmark into the label of every vertex in $V \setminus R$, as the approach proposed in (Hayashi et al., 2016); (2) Another is to use the pruned landmark labelling approach (Akiba et al., 2013) to add the distance entry of a landmark $r$ into the labels of vertices in $V \setminus R$ if the landmark has not been pruned during a BFS rooted at $r$; (3) Alternatively, we can also extend the pruned landmark labelling approach to construct the highway cover labelling by replacing the 2-hop cover pruning condition with the one required by the highway cover as defined in Definition 3.2 at each step of checking possible labels to be pruned.
In all these cases, the labelling construction process would be neither scalable nor suitable for large-scale complex networks with billions of vertices and edges. Moreover, these approaches would potentially lead to the construction of distance labellings with different sizes. A question arising naturally is how to construct a minimal highway cover distance labelling without redundant labels. In a nutshell, it is a challenging task to construct a highway cover distance labelling that can scale to very large networks, ideally in linear time, while also having the minimal labelling size.
3.2. A Novel Algorithm
We propose a novel algorithm for solving the highway cover labelling problem, which can construct labellings in linear time.
The key idea of our algorithm is to construct a label $L(v)$ for vertex $v \in V \setminus R$ such that the distance entry $(r_i, \delta_L(r_i, v))$ of each landmark $r_i \in R$ is only added into the label $L(v)$ iff there does not exist any other landmark that appears in the shortest path between $r_i$ and $v$, i.e. $P_{r_i v} \cap R = \{r_i\}$. In other words, if there is another landmark $r_j \in R$ and $r_j$ is in the shortest path between $r_i$ and $v$, then $(r_j, \delta_L(r_j, v))$ is added into $L(v)$ iff $r_j$ is the "closest" landmark to $v$. To compute such labels efficiently, we conduct a breadth-first search from every landmark and add distance entries into labels of vertices that do not have any other landmark in their shortest paths from that landmark.
Example 3.5.
Consider vertex in Figure 2(c), the label contains the distance entries of landmarks , but no distance entry of landmark . This is because and are the closest landmarks to vertex 7 in the shortest paths and , respectively. However, for either of two shortest paths and between and , there is another landmark (i.e. or ) that is closer to compared with in these shortest paths. Thus the distance entry of landmark 1 is not added into .
Our highway cover labelling approach is described in Algorithm 1. Given a graph $G$ and a highway $H$ over $G$, we start with an empty highway cover distance labelling $L$, where $L(v) = \emptyset$ for every $v \in V \setminus R$. Then, for each landmark $r \in R$, we compute the corresponding distance entries as follows. We use two queues $Q_{label}$ and $Q_{prune}$ to process vertices to be labelled or pruned at each level of a breadth-first search (BFS) tree, respectively. We start by processing vertices in $Q_{label}$. For each vertex $u \in Q_{label}$ at depth $n$, we examine the children of $u$ at depth $n+1$ that are unvisited. For each unvisited child vertex $v$ at depth $n+1$, if $v \in R$ then we prune $v$, i.e., we do not add a distance entry of the current landmark $r$ into $L(v)$ and we also enqueue $v$ to the pruned queue $Q_{prune}$ (Line 11). Otherwise, we add $r$ to the label of $v$, i.e., we add $(r, \delta_{BFS}(r, v))$ into $L(v)$ and we also enqueue $v$ to the labelled queue $Q_{label}$ (Lines 13-14). Here, $\delta_{BFS}(r, v)$ refers to the BFS decoded distance from root $r$ to $v$. Then we process the pruned vertices in $Q_{prune}$. These vertices are either landmarks or have landmarks in their shortest paths from $r$, and thus do not need to be labelled. Therefore, for each vertex $v \in Q_{prune}$ at depth $n$, we enqueue all unvisited children of $v$ at depth $n+1$ to the pruned queue $Q_{prune}$. We keep processing these two queues, one after the other, until $Q_{label}$ is empty.
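The pruned BFS described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `highway_cover_labelling`, the adjacency-dictionary graph representation and the per-level lists standing in for the labelled and pruned queues are assumptions made for the example. Following the description, the labelled frontier is expanded before the pruned frontier at each depth, and the search from a landmark stops once no labelled vertices remain.

```python
def highway_cover_labelling(adj, landmarks):
    """adj: dict mapping each vertex to a list of neighbours (undirected).
    landmarks: iterable of landmark vertices, processed in the given order.
    Returns a dict: non-landmark vertex -> {landmark: distance}."""
    landmark_set = set(landmarks)
    labels = {v: {} for v in adj if v not in landmark_set}
    for r in landmarks:
        visited = {r}
        to_label, to_prune = [r], []   # frontiers at the current depth
        depth = 0
        while to_label:                # pruned vertices never create labels
            depth += 1
            next_label, next_prune = [], []
            # Children of labelled vertices: label them unless they are landmarks.
            for u in to_label:
                for v in adj[u]:
                    if v in visited:
                        continue
                    visited.add(v)
                    if v in landmark_set:
                        next_prune.append(v)
                    else:
                        labels[v][r] = depth
                        next_label.append(v)
            # Children of pruned vertices are pruned as well.
            for u in to_prune:
                for v in adj[u]:
                    if v not in visited:
                        visited.add(v)
                        next_prune.append(v)
            to_label, to_prune = next_label, next_prune
    return labels

# Hypothetical toy input: path graph 1 - 2 - 3 - 4 - 5 with landmarks 1 and 3.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(highway_cover_labelling(path, [1, 3]))
```

In this toy graph, vertices 4 and 5 receive no entry for landmark 1 because landmark 3 lies on their only shortest path to it, which mirrors the pruning rule in the text.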
Example 3.6.
We illustrate how our algorithm conducts pruned BFSs in Figure 3. The pruned BFS from landmark is depicted in Figure 3(a), which labels only four vertices because the other vertices are either landmarks or contain other landmarks in their shortest paths to landmark . Similarly, in the pruned BFS from landmark depicted in Figure 3(b), only vertices are labelled, and none of the vertices , , and is labelled because of the presence of landmark in their shortest paths to landmark . Indeed, we can get the distance between landmark to these vertices by using the highway, i.e. , and distance entries in their labels to landmark . The pruned BFS from landmark 9 is depicted in Figure 3(c), which works in a similar fashion.
3.3. Correctness
Here we prove the correctness of our labelling algorithm.
Lemma 3.7.
In Algorithm 1, for each pruned BFS rooted at $r \in R$, $(r, \delta_{BFS}(r, v))$ is added into the label of a vertex $v \in V \setminus R$ iff there is no other landmark appearing in the shortest path between $r$ and $v$, i.e., $P_{rv} \cap R = \{r\}$.
Proof.
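Suppose that Algorithm 1 is conducting a pruned BFS rooted at $r$ and $v$ is an unvisited child of another vertex $u$ in $Q_{label}$ (starting from $Q_{label} = \{r\}$) (Lines 6-9). If $v \in R$ (Line 10), then we have $v \in Q_{prune}$ (Lines 11, 19-21), and $r$ cannot be added into the label of any child of $v$, i.e., these children are put into $Q_{prune}$. Otherwise, since $v \notin R$ and $v$ is an unvisited child of a vertex in $Q_{label}$ (Lines 8-9), we know that $P_{rv} \cap R = \{r\}$ and thus $(r, \delta_{BFS}(r, v))$ is added into $L(v)$ (Lines 12-14). ∎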
Then, by Lemma 3.7, we have the following corollary.
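Corollary 3.8.
Let $r \in R$ be a landmark, $v \in V \setminus R$ a vertex, and $L$ a distance labelling constructed by Algorithm 1. If $(r, \delta_L(r, v)) \notin L(v)$, then there must exist a landmark $r'$ such that $(r', \delta_L(r', v)) \in L(v)$ and $d_G(r, v) = \delta_L(r', v) + \delta_H(r, r')$.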
Theorem 3.9.
The highway cover distance labelling $L$ over $(G, H)$ constructed using Algorithm 1 satisfies the highway cover property over $(G, H)$.
Proof.
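To prove that, for any two vertices $s, t \in V \setminus R$ and for any $r \in R$, there exist $(r_i, \delta_L(r_i, s)) \in L(s)$ and $(r_j, \delta_L(r_j, t)) \in L(t)$ such that $r_i \in P_{rs}$ and $r_j \in P_{rt}$, we consider the following 4 cases: (1) If $(r, \delta_L(r, s)) \in L(s)$ and $(r, \delta_L(r, t)) \in L(t)$, then $r_i = r_j = r$. (2) If $(r, \delta_L(r, s)) \in L(s)$ and $(r, \delta_L(r, t)) \notin L(t)$, then $r_i = r$ and by Corollary 3.8, there exists another landmark $r_j$ such that $r_j$ is in the shortest path between $t$ and $r$ and $(r_j, \delta_L(r_j, t)) \in L(t)$. (3) If $(r, \delta_L(r, s)) \notin L(s)$ and $(r, \delta_L(r, t)) \in L(t)$, then similarly we have $r_j = r$, and by Corollary 3.8, there exists another landmark $r_i$ such that $r_i$ is in the shortest path between $s$ and $r$ and $(r_i, \delta_L(r_i, s)) \in L(s)$. (4) If $(r, \delta_L(r, s)) \notin L(s)$ and $(r, \delta_L(r, t)) \notin L(t)$, then by Corollary 3.8 there exist another two landmarks $r_i$ and $r_j$ such that $r_i$ is in the shortest path between $s$ and $r$ and $(r_i, \delta_L(r_i, s)) \in L(s)$, and $r_j$ is in the shortest path between $t$ and $r$ and $(r_j, \delta_L(r_j, t)) \in L(t)$. The proof is done. ∎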
3.4. Order Independence
In previous studies (Abraham et al., 2011; Akiba et al., 2013; Abraham et al., 2012; Cohen et al., 2003), given a graph $G$, a distance labelling algorithm builds a unique canonical distance labelling subject to a labelling order (i.e., the order of landmarks used for constructing a distance labelling). It is well known that such a labelling order is decisive in determining the size of the constructed distance labelling (Qin et al., 2017). For the same set of landmarks, when using different labelling orders, the sizes of the constructed distance labelling may vary significantly.
The following example shows how different labelling orders in the pruned landmark labelling approach (Akiba et al., 2013) can lead to the distance labelling of different sizes.
Example 3.10.
In Figure 4, the size of the distance labelling constructed using the labelling order in Figure 4(a)-4(c) is different from the size of the distance labelling constructed using the labelling order in Figure 4(d)-4(f). In both cases, the first BFS adds the distance entry of the current landmark into all the vertices in the graph. Then, the following BFSs check, for each visited vertex, whether the shortest path distance between the current landmark and the visited vertex can be computed via the 2-hop cover property based on their labels added by the previous BFSs. A distance entry is only added into the label of a vertex if the shortest path distance cannot be computed by applying the 2-hop cover over the existing labels. Thus, the choice of the labelling order could affect the size of labels significantly. Take the vertex for example, its label contains only one distance entry using the labelling order depicted in Figure 4(a)-4(c), but contains three distance entries , , and when the labelling order depicted in Figure 4(d)-4(f) is used.
Unlike all previous approaches taken with distance labelling, our highway cover labelling algorithm is order-invariant. That is, regardless of the labelling order, the distance labellings constructed by our algorithm using different labelling orders over the same set of landmarks always have the same size. In fact, we can show that our algorithm has the following stronger property: the distance labelling constructed using our algorithm is deterministic (i.e., the same label for each vertex) for a given set of landmarks.
Lemma 3.11.
Let $G$ be a graph and $H$ a highway over $G$. For any two different labelling orders over $R$, the highway cover distance labellings $L_1$ and $L_2$ over $(G, H)$ constructed by these two different labelling orders using Algorithm 1 satisfy $L_1(v) = L_2(v)$ for every $v \in V \setminus R$.
Proof.
3.5. Minimality
Here we discuss the question of minimality, i.e., whether the highway cover distance labelling constructed by our algorithm is always minimal in terms of the labelling size. We first prove the following theorem.
Theorem 3.12.
The highway cover distance labelling $L$ over $(G, H)$ constructed using Algorithm 1 is minimal, i.e., for any highway cover distance labelling $L'$ over $(G, H)$, $size(L') \ge size(L)$ must hold.
Proof.
We prove this by contradiction. Let us assume that there is a highway cover distance labelling $L'$ with $size(L') < size(L)$. Then, this would imply that there must exist a vertex $v \in V \setminus R$ and a landmark $r \in R$ such that $(r, \delta_L(r, v)) \in L(v)$ and $(r, \delta_{L'}(r, v)) \notin L'(v)$. By Lemma 3.7 and $(r, \delta_L(r, v)) \in L(v)$, we know that there is no other landmark in $R$ that is in the shortest path between $r$ and $v$. However, by the definition of the highway cover property (i.e. Definition 3.2) and $(r, \delta_{L'}(r, v)) \notin L'(v)$, we also know that there must exist another landmark $r'$ with $r' \in P_{rv}$, which contradicts the previous conclusion that there is no other landmark in the shortest path between $r$ and $v$. Thus, $size(L') \ge size(L)$ must hold for any highway cover distance labelling $L'$. ∎
The state-of-the-art approaches for distance labelling are primarily based on the idea of 2-hop cover (Akiba et al., 2013; Fu et al., 2013; Abraham et al., 2011). One may ask the question: how is the highway cover labelling different from the 2-hop cover labelling, such as the pruned landmark labelling (Akiba et al., 2013)? It is easy to verify the following lemma, which states that each pruned landmark labelling satisfies the highway cover property for the same set of landmarks.
Lemma 3.13.
Let $L$ be a pruned landmark labelling over graph $G$ constructed using a set of landmarks $R$. Then $L$ also satisfies the highway cover property over $(G, H)$ where $H = (R, \delta_H)$.
As the pruned landmark labelling algorithm (Akiba et al., 2013) prunes labels based on the 2-hop cover property, whereas our highway cover labelling algorithm prunes labels based on the property described in Lemma 3.7, by Theorem 3.12 and Lemma 3.13 we also have the following corollary, stating that, for the same set of landmarks, the size of the highway cover labelling is never larger than the size of any pruned landmark labelling.
Corollary 3.14.
For a highway cover distance labelling $L_1$ produced by Algorithm 1 over $(G, H)$, where $H = (R, \delta_H)$, and a pruned landmark labelling $L_2$ over $G$ using any labelling order over $R$, we always have $size(L_1) \le size(L_2)$.
Example 3.15.
Figure 4 shows the labelling size (LS) of the pruned landmark labelling at the top right corner, which is constructed using two different orderings. The first ordering labels 25 vertices whereas the second ordering labels 30 vertices. On the other hand, the LS of the highway cover distance labelling is 13 as shown in Figure 3. Note that the LS of the highway cover distance labelling does not change, irrespective of ordering. Since the highway cover distance labelling constructed by our algorithm is always minimal, the LS of the highway cover distance labelling in Figure 3 is much smaller than the LS of either pruned landmark labelling in Figure 4.
4. Bounded Distance Querying
In this section, we describe a bounded distance querying framework that allows us to efficiently compute exact shortest-path distances between two arbitrary vertices in a massive network.
4.1. Querying Framework
We start by presenting a high-level overview of our querying framework. To compute the shortest path distance between two vertices $s$ and $t$ in graph $G$, our querying framework proceeds in two steps: (1) an upper bound of the shortest path distance between $s$ and $t$ is computed using the highway cover distance labelling; (2) the exact shortest path distance between $s$ and $t$ is computed using a distance-bounded shortest-path search over a sparsified graph from $G$.
Given a graph $G$ and a highway $H = (R, \delta_H)$ over $G$, we can precompute a highway cover distance labelling $L$ using the landmarks in $R$, which enables us to efficiently compute the length of any $r$-constrained shortest path between two vertices in $V \setminus R$. The length of such a constrained shortest path must be greater than or equal to the exact shortest path distance between these two vertices and can thus serve as an upper bound in Step (1). On the other hand, since the length of such a constrained shortest path between two vertices in $V \setminus R$ can always be efficiently computed by the highway cover distance labelling $L$, the distance-bounded shortest-path search only needs to be conducted over a sparsified graph $G'$ obtained by removing all landmarks in $R$ from $G$, i.e. $G' = G[V \setminus R]$.
More precisely, we define the bounded distance querying problem in the following.
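Definition 4.1.
(Bounded Distance Querying Problem) Given a sparsified graph $G' = G[V \setminus R]$, a pair of vertices $\{s, t\} \subseteq V \setminus R$, and an upper (distance) bound $d^{\top}_{st}$, the bounded distance querying problem is to efficiently compute the shortest path distance $d$ between $s$ and $t$ over $G'$ under the upper bound $d^{\top}_{st}$ such that,
$d = \begin{cases} d_{G'}(s, t) & \text{if } d_{G'}(s, t) \le d^{\top}_{st}, \\ d^{\top}_{st} & \text{otherwise.} \end{cases}$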
In the following, we discuss the two steps of this framework in detail.
4.2. Computing Upper Bounds
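Given any two vertices $s$ and $t$, we can use a highway cover distance labelling $L$ to compute an upper bound $d^{\top}_{st}$ for the shortest path distance between $s$ and $t$ as follows,
(4)  $d^{\top}_{st} = \min\{\delta_L(r_i, s) + \delta_H(r_i, r_j) + \delta_L(r_j, t) \mid (r_i, \delta_L(r_i, s)) \in L(s), (r_j, \delta_L(r_j, t)) \in L(t)\}$
This corresponds to the length of a shortest path from $s$ to $t$ passing through landmarks $r_i$ and $r_j$, where $\delta_L(r_i, s)$ is the shortest path distance from $r_i$ to $s$ in $L(s)$, $\delta_H(r_i, r_j)$ is the shortest path distance from $r_i$ to $r_j$ through highway $H$, and $\delta_L(r_j, t)$ is the shortest path distance from $r_j$ to $t$ in $L(t)$.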
Example 4.2.
Consider the graph in Figure 2(a), we may use the labels and to compute the upper bound for the shortest path distance between two vertices and . There are two cases: (1) for the path that goes through landmarks 5 and 1, we have , and (2) for the path that goes through landmarks 9 and 1, we have . Thus, we take the minimum of these two distances as the upper bound, which is 3 in this case.
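The upper bound computation of Equation 4 can be sketched in Python as below. The sketch assumes labels stored as dictionaries from landmark to distance and the highway stored as a dictionary keyed by ordered landmark pairs; the function name `upper_bound` and the toy data (a path graph 1 - 2 - 3 - 4 - 5 with landmarks 1 and 3, not the graph of Figure 2) are hypothetical.

```python
import math

def upper_bound(label_s, label_t, highway):
    """Equation (4): minimise delta(ri, s) + delta_H(ri, rj) + delta(rj, t)
    over all entries ri in L(s) and rj in L(t)."""
    best = math.inf
    for ri, dist_s in label_s.items():
        for rj, dist_t in label_t.items():
            # A landmark is at distance 0 from itself on the highway.
            hop = 0 if ri == rj else highway[(ri, rj)]
            best = min(best, dist_s + hop + dist_t)
    return best

# Hypothetical labels and highway for the path 1 - 2 - 3 - 4 - 5, landmarks {1, 3}.
labels = {2: {1: 1, 3: 1}, 4: {3: 1}, 5: {3: 2}}
highway = {(1, 3): 2, (3, 1): 2}

print(upper_bound(labels[2], labels[5], highway))  # path 2-3-4-5 goes through landmark 3
print(upper_bound(labels[4], labels[5], highway))  # detour via landmark 3; true distance is 1
```

The second query shows why the value is only an upper bound: vertices 4 and 5 are adjacent, but every path through a landmark is longer, so the exact distance has to come from the search on the sparsified graph.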
4.3. Distance-Bounded Shortest Path Search
We conduct a bidirectional search on the sparsified graph $G'$ which is bounded by the upper bound $d^{\top}_{st}$ from the highway cover distance labelling. For a pair of vertices $\{s, t\} \subseteq V \setminus R$, we run a breadth-first search algorithm from $s$ and $t$ simultaneously (Hayashi et al., 2016). Algorithm 2 shows the pseudocode of our distance-bounded shortest path search algorithm. We use two sets of vertices $P_s$ and $P_t$ to keep track of visited vertices from $s$ and $t$. We use two queues $Q_s$ and $Q_t$ to conduct both a forward search from $s$ and a reverse search from $t$. Furthermore, we use two integers $d_s$ and $d_t$ to maintain the current distances from $s$ and $t$, respectively.
During initialization, we set $P_s$ and $P_t$ to $\{s\}$ and $\{t\}$, and enqueue $s$ and $t$ into $Q_s$ and $Q_t$, respectively. In each iteration, we increment $d_s$ or $d_t$ and expand $P_s$ or $P_t$ by running either a forward search (FS) or a reverse search (RS), as long as $P_s$ and $P_t$ have no common vertex, $d_s + d_t$ is less than the upper bound $d^{\top}_{st}$, and $Q_s$ and $Q_t$ are not empty. In the forward search from $s$, we examine the neighbors of each vertex $v \in Q_s$. Suppose we are visiting a neighbor $w$ of $v$. If $w$ is included in the vertex set $P_t$, then we have found a shortest path between $s$ and $t$ through $w$ of length $d_s + d_t + 1$, because the reverse search from $t$ had already visited $w$ with distance $d_t$. At this stage, we return $d_s + d_t + 1$ as the answer since we already know $d_s + d_t + 1 \le d^{\top}_{st}$. Otherwise, we add vertex $w$ to $P_s$ and enqueue $w$ into a new queue $Q_{new}$. When we could not find the shortest distance in the iteration, we replace $Q_s$ with $Q_{new}$ and increase $d_s$ by 1, and check if $d_s + d_t = d^{\top}_{st}$. If it holds, then we return $d^{\top}_{st}$ since $d_{G'}(s, t) \ge d^{\top}_{st}$.
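A distance-bounded bidirectional BFS of this kind can be sketched in Python as follows. This is an illustrative sketch, not Algorithm 2 itself: the function name `bounded_distance`, the adjacency-dictionary representation, and the choice to always expand the smaller frontier are assumptions. Landmarks are skipped during expansion, which corresponds to searching the sparsified graph without materialising it, and both `s` and `t` are assumed to be non-landmark vertices.

```python
def bounded_distance(adj, landmarks, s, t, bound):
    """Return min(d_{G'}(s, t), bound), where G' is the graph without landmarks."""
    if s == t:
        return 0
    seen_s, seen_t = {s}, {t}
    frontier_s, frontier_t = [s], [t]
    d_s = d_t = 0
    while frontier_s and frontier_t:
        # The two visited sets are disjoint, so d_{G'}(s, t) >= d_s + d_t + 1.
        if d_s + d_t + 1 >= bound:
            return bound
        forward = len(frontier_s) <= len(frontier_t)   # assumed heuristic
        frontier = frontier_s if forward else frontier_t
        seen, other = (seen_s, seen_t) if forward else (seen_t, seen_s)
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v in landmarks or v in seen:
                    continue
                if v in other:                         # the two searches meet
                    return d_s + d_t + 1
                seen.add(v)
                next_frontier.append(v)
        if forward:
            frontier_s, d_s = next_frontier, d_s + 1
        else:
            frontier_t, d_t = next_frontier, d_t + 1
    return bound   # s and t are disconnected once landmarks are removed

# Hypothetical toy inputs.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

print(bounded_distance(path, {3}, 4, 5, 3))   # direct edge beats the bound
print(bounded_distance(ring, {0}, 1, 5, 2))   # search is cut off at the bound
```

Expanding the smaller frontier is one common way to balance the two searches; strictly alternating the forward and reverse searches would be an equally valid reading of the description.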
Example 4.3.
In Figure 5(b), the upper distance bound between vertices 2 and 11 is 3, as computed in Example 4.2. Suppose that we run BFSs from vertices 2 and 11 respectively. First, a forward search from 2 enqueues its neighbors 7, 12 and 14 into and increases by 1. Then a reverse search from 11 enqueues 4 and 10 into and also sets to 1. At this stage, we have not found any common vertex between and , and which is less than the upper bound . Therefore, we continue to start a search from the vertices in , which enqueues 5 into and increments to 2. Now, we have reaching the upper bound, hence we do not need to continue our search.
4.4. Correctness
The correctness of our querying framework can be proven based on the following two lemmas. More specifically, Lemma 4.4 can be derived from the highway cover property and the definition of $d^{\top}_{st}$. Lemma 4.5 can also be proven by the property of shortest paths and the definition of the sparsified graph $G'$.
Lemma 4.4.
For a highway cover distance labelling $L$ over $(G, H)$, we have $d^{\top}_{st} \ge d_G(s, t)$ for any two vertices $s$ and $t$ of $G$, where $d^{\top}_{st}$ is computed using $L$ and $H$.
Lemma 4.5.
For any two vertices $s, t \in V \setminus R$, if there is a shortest path between $s$ and $t$ in $G$ that does not include any vertex in $R$, then $d_{G'}(s, t) = d_G(s, t)$ holds.
Thus, the following theorem holds:
Theorem 4.6.
Let $G$ be a graph, $H$ a highway over $G$ and $L$ a highway cover distance labelling. Then, for any two vertices $s, t \in V \setminus R$, the querying framework over $(G, H, L)$ yields $d_G(s, t)$.
5. Optimization Techniques
In this section, we discuss optimization techniques for label construction, label compression, and query processing.
| Dataset | Network | Type | $n$ | $m$ | $m/n$ | avg. deg | max. deg | graph size | Sources |
| Skitter | computer | undirected | 1.7M | 11M | 6.5 | 13.081 | 35455 | 85 MB | (the Koblenz Network Collection, 2017) |
| Flickr | social | undirected | 1.7M | 16M | 9.1 | 18.133 | 27224 | 119 MB | (the Koblenz Network Collection, 2017) |
| Hollywood | social | undirected | 1.1M | 114M | 49.5 | 98.913 | 11467 | 430 MB | (Boldi and Vigna, 2004; Boldi et al., 2011) |
| Orkut | social | undirected | 3.1M | 117M | 38.1 | 76.281 | 33313 | 894 MB | (the Koblenz Network Collection, 2017) |
| enwiki-2013 | social | directed | 4.2M | 101M | 21.9 | 43.746 | 432260 | 701 MB | (Boldi and Vigna, 2004; Boldi et al., 2011) |
| LiveJournal | social | directed | 4.8M | 69M | 8.8 | 17.679 | 20333 | 327 MB | (the Koblenz Network Collection, 2017) |
| Indochina | web | directed | 7.4M | 194M | 20.4 | 40.725 | 256425 | 1.1 GB | (Boldi and Vigna, 2004; Boldi et al., 2011) |
| it-2004 | web | directed | 41M | 1.2B | 24.9 | 49.768 | 1326744 | 7.7 GB | (Boldi and Vigna, 2004; Boldi et al., 2011) |
|  | social | directed | 42M | 1.5B | 28.9 | 57.741 | 2997487 | 9.0 GB | (Boldi and Vigna, 2004; Boldi et al., 2011) |
| Friendster | social | undirected | 66M | 1.8B | 22.5 | 45.041 | 4006 | 13 GB | (Leskovec and Krevl, 2015) |
| uk-2007 | web | directed | 106M | 3.7B | 31.4 | 62.772 | 979738 | 25 GB | (Boldi and Vigna, 2004; Boldi et al., 2011) |
| ClueWeb09 | computer | directed | 2B | 8B | 5.98 | 11.959 | 599981958 | 55 GB | (Rossi and Ahmed, 2015) |
5.1. Label Construction
A technique called Bit-Parallelism (BP) has been used in several methods (Akiba et al., 2013; Hayashi et al., 2016) to speed up label construction. The key idea of BP is to perform BFSs from a given landmark and up to 64 of its neighbors simultaneously, and to encode the relative distances (-1, 0 or 1) of these neighbors, w.r.t. the shortest paths between the landmark and each vertex, into a 64-bit unsigned integer. In (Akiba et al., 2013), BP was applied to construct bit-parallel labels from a set of initial vertices without pruning, with the aim of using the information in these bit-parallel labels to cover more shortest paths between vertices. Then, both bit-parallel labels and normal labels are constructed in the pruned BFSs. The work in (Hayashi et al., 2016) also used BP to construct thousands of bit-parallel shortest-path trees (SPTs), because constructing thousands of normal SPTs in memory is very costly owing to their prohibitively large space requirements and very long construction time.
In our work, we develop a simple yet rigorous parallel algorithm (HLP) that runs BFSs from multiple landmarks in parallel (how many depends on the number of processors) to construct the labelling for massive networks in much less time, as will be demonstrated in our experiments.
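The idea of running one BFS per landmark in parallel can be sketched as follows. This is only an illustration under assumptions: the pruning rule (a vertex receives no entry for a landmark if some shortest path from that landmark to it passes through another landmark) is inferred from the highway cover property, and the names `pruned_bfs`, `build_labels` and `PATH_ADJ` are hypothetical. Python threads are used purely to show that the per-landmark searches are independent; they are not expected to give a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical example: a path 0 - 1 - 2 - 3 - 4 with landmarks 1 and 3.
PATH_ADJ = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}


def pruned_bfs(adj, landmarks, root):
    """Return {vertex: distance} for vertices labelled by this landmark."""
    visited = {root}
    label_frontier, prune_frontier = [root], []
    entries, depth = {}, 0
    while label_frontier or prune_frontier:
        depth += 1
        next_label, next_prune = [], []
        # Expand pruned vertices first: anything they reach is pruned too.
        for u in prune_frontier:
            for v in adj[u]:
                if v not in visited:
                    visited.add(v)
                    next_prune.append(v)
        for u in label_frontier:
            for v in adj[u]:
                if v in visited:
                    continue
                visited.add(v)
                if v in landmarks:
                    next_prune.append(v)   # another landmark: stop labelling
                else:
                    entries[v] = depth
                    next_label.append(v)
        label_frontier, prune_frontier = next_label, next_prune
    return entries


def build_labels(adj, landmarks, workers=4):
    """Run one pruned BFS per landmark in a thread pool and merge the results."""
    landmark_set = set(landmarks)
    labels = {v: {} for v in adj if v not in landmark_set}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda r: pruned_bfs(adj, landmark_set, r), landmarks)
        for r, entries in zip(landmarks, results):
            for v, d in entries.items():
                labels[v][r] = d
    return labels
```

Because each search reads only the graph and the landmark set, the merged result does not depend on the order in which the landmarks are processed.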
5.2. Label Compression
The choice of data structure for labels may significantly affect index size and memory usage. As noted in (Li et al., 2017), some works (Abraham et al., 2012; Delling et al., 2014) did not elaborate on the data structure they used for representing labels. Nonetheless, the works most relevant to ours, such as FD (Hayashi et al., 2016) and PLL (Akiba et al., 2013), used 32-bit integers to represent vertices and 8-bit integers to represent distances for normal labels. In addition, they used 64 bits to encode the distances from a landmark to up to 64 of its neighbors in their shortest paths to other vertices. Since our approach selects only a very small number of landmarks to construct the highway cover labelling (usually no more than 100), we may use 8 bits to represent landmarks and another 8 bits to store distances in labels. To compare methods fairly from different aspects, we have implemented our methods using both 32 bits and 8 bits for representing vertices in labels. However, unlike the BP technique, which uses 64 bits to encode the distance information of up to 64 neighbours of a landmark, our parallel algorithm (HLP) does not use a different data structure for labels constructed in parallel BFSs.
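The effect of the two representations on index size can be estimated with a short calculation. The sketch below is a rough illustration, not the authors' storage format: it assumes entries are stored back to back without padding, and the names `WIDE`, `NARROW`, `pack_label` and `estimated_index_bytes` are hypothetical.

```python
import struct

# Assumed packed layouts (little-endian, no padding).
WIDE = struct.Struct("<IB")    # 32-bit vertex id + 8-bit distance = 5 bytes
NARROW = struct.Struct("<BB")  # 8-bit landmark index + 8-bit distance = 2 bytes


def pack_label(entries, layout):
    """Serialise a list of (landmark, distance) pairs with the given layout."""
    return b"".join(layout.pack(landmark, dist) for landmark, dist in entries)


def estimated_index_bytes(num_vertices, avg_label_size, layout):
    """Rough index size: vertices x average entries per label x bytes per entry."""
    return num_vertices * avg_label_size * layout.size
```

For example, with the rounded figures reported for Orkut (3.1M vertices, average label size 11), `estimated_index_bytes(3_100_000, 11, WIDE)` gives about 170 million bytes and the 8-bit layout about 68 million bytes, which is in the same range as the 170MB and 70MB reported for HL and HL(8). The inputs are rounded, so this is only a plausibility check.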
5.3. Query Processing
The computation of the upper bound can be optimized based on an observation captured by the following lemma.
Lemma 5.1.
For a highway cover distance labelling over , where and , and any , if a landmark appears in both and , then holds for any other .
Proof.
By the definition of the highway cover property, we know that is not in the shortest path between and . Then by triangle inequality in Equation 1, this lemma can be proven. ∎
Thus, to compute the upper bound efficiently, for any landmark that appears in the labels of both vertices we compute the constrained shortest-path distance between the two vertices using Equation 2, while for a landmark that appears in only one of the two labels we use Equation 4.2. This leads to more efficient query computation when landmarks appear in the labels of both vertices.
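The optimization can be sketched as follows. This is a hedged illustration rather than the paper's exact equations: it assumes each label maps landmarks to exact distances, that `highway[ri][rj]` holds the exact distance between two landmarks, and that the upper bound is the minimum over landmark pairs of label distance plus highway distance plus label distance. The names `upper_bound`, `upper_bound_all_pairs`, `LABELS` and `HIGHWAY` are hypothetical.

```python
# Hypothetical labelling for the path 0 - 1 - 2 - 3 - 4 with landmarks 1 and 3.
LABELS = {0: {1: 1}, 2: {1: 1, 3: 1}, 4: {3: 1}}
HIGHWAY = {1: {1: 0, 3: 2}, 3: {1: 2, 3: 0}}


def upper_bound(label_s, label_t, highway):
    """Upper bound that handles shared landmarks without a highway lookup."""
    common = label_s.keys() & label_t.keys()
    best = float("inf")
    # Shared landmark: the two label distances already form a valid path.
    for r in common:
        best = min(best, label_s[r] + label_t[r])
    # Remaining landmarks: go through the highway between the two landmarks.
    for ri, ds in label_s.items():
        if ri in common:
            continue
        for rj, dt in label_t.items():
            if rj in common:
                continue
            best = min(best, ds + highway[ri][rj] + dt)
    return best


def upper_bound_all_pairs(label_s, label_t, highway):
    """Reference version that tries every pair of landmarks."""
    return min(
        (ds + highway[ri][rj] + dt
         for ri, ds in label_s.items()
         for rj, dt in label_t.items()),
        default=float("inf"),
    )
```

Skipping pairs that involve a shared landmark is safe under the stated assumptions because, by the triangle inequality, any such pair is at least as long as the direct sum through the shared landmark.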
6. Experiments
To compare the proposed method with baseline approaches, we implemented our method in C++11 using the STL and compiled it with gcc 5.5.0 using the -O3 option. We performed all the experiments on a Linux server (with 64 AMD Opteron(tm) 6376 processors at 2.30GHz and 512GB of main memory), using a single thread for the sequential version of the proposed method and up to 64 threads for the parallel version.
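Table 2. Construction time (CT, in seconds), average query time (QT, in milliseconds) and average label size (ALS).

| Dataset | CT HLP | CT HL | CT FD | CT PLL | CT ISL | QT HL | QT FD | QT PLL | QT ISL | QT BiBFS | ALS HL | ALS FD | ALS PLL | ALS ISL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Skitter | 2 | 13 | 30 | 638 | 1042 | 0.067 | 0.043 | 0.008 | 3.556 | 3.504 | 12 | 20+64 | 138+50 | 51 |
| Flickr | 2 | 14 | 41 | 1330 | 8359 | 0.015 | 0.028 | 0.01 | 33.760 | 4.155 | 10 | 20+64 | 290+50 | 50 |
| Hollywood | 3 | 17 | 107 | 31855 | DNF | 0.047 | 0.075 | 0.051 | - | 6.956 | 12 | 20+64 | 2206+50 | - |
| Orkut | 10 | 62 | 366 | DNF | DNF | 0.224 | 0.251 | - | - | 21.086 | 11 | 20+64 | - | - |
| enwiki2013 | 9 | 77 | 308 | 22080 | DNF | 0.190 | 0.131 | 0.027 | - | 19.423 | 10 | 20+64 | 471+50 | - |
| LiveJournal | 9 | 77 | 166 | DNF | 20583 | 0.088 | 0.111 | - | 56.847 | 17.264 | 13 | 20+64 | - | 69 |
| Indochina | 8 | 50 | 144 | 9456 | DNF | 1.905 | 1.803 | 0.02 | - | 9.734 | 5 | 20+64 | 441+50 | - |
| it2004 | 66 | 304 | 1623 | DNF | DNF | 2.684 | 2.118 | - | - | 92.187 | 10 | 20+64 | - | - |
| Twitter | 133 | 1380 | 1838 | DNF | DNF | 1.424 | 0.432 | - | - | 426.949 | 14 | 20+64 | - | - |
| Friendster | 135 | 2229 | 9661 | DNF | DNF | 1.091 | 1.435 | - | - | 534.576 | 19 | 20+64 | - | - |
| uk2007 | 110 | 1124 | 6201 | DNF | DNF | 11.841 | 18.979 | - | - | 355.688 | 8 | 20+64 | - | - |
| ClueWeb09 | 4236 | 28124 | DNF | DNF | DNF | 0.309 | - | - | - | - | 2 | - | - | - |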
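Table 3. Labelling sizes.

| Dataset | HL(8) | HL | FD | PLL | ISL |
|---|---|---|---|---|---|
| Skitter | 42MB | 102MB | 202MB | 2.5GB | 507MB |
| Flickr | 34MB | 81MB | 178MB | 3.7GB | 679MB |
| Hollywood | 28MB | 67MB | 293MB | 13GB | - |
| Orkut | 70MB | 170MB | 756MB | - | - |
| enwiki2013 | 83MB | 200MB | 743MB | 12GB | - |
| LiveJournal | 123MB | 299MB | 778MB | - | 3.8GB |
| Indochina | 81MB | 192MB | 999MB | 21GB | - |
| it2004 | 855MB | 2GB | 5.6GB | - | - |
| Twitter | 1.2GB | 2.8GB | 4.8GB | - | - |
| Friendster | 2.5GB | 5.2GB | 11.8GB | - | - |
| uk2007 | 1.8GB | 4.3GB | 14.1GB | - | - |
| ClueWeb09 | 4.7GB | 9GB | - | - | - |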
6.1. Datasets
In our experiments, we used 12 large-scale real-world complex networks, which are detailed in Table 1. These networks have vertices and edges ranging from millions to billions. Among them, the largest network is ClueWeb09, which has 2 billion vertices and 8 billion edges. We included this network to evaluate the robustness and scalability of the proposed method. In previous works, the largest dataset that has been reported is uk2007, which has only around 100 million vertices and 3.7 billion edges. We treated all these networks as undirected and unweighted graphs.
To investigate the query time of finding the distance between two vertices, we randomly sampled 100,000 pairs of vertices from all pairs of vertices in each network. The distance distribution of these 100,000 randomly sampled pairs is shown in Figure 6(a)-6(b), from which we can confirm that most pairs of vertices in these networks have a small distance, ranging from 2 to 8.
6.2. Baseline Methods
We compared our proposed method with three state-of-the-art methods. Two of them, fully dynamic (FD) (Hayashi et al., 2016) and IS-Label (ISL) (Fu et al., 2013), combine a distance labelling algorithm with a graph traversal algorithm for distance queries on complex networks. The third, pruned landmark labelling (PLL) (Akiba et al., 2013), answers distance queries entirely from a distance labelling. There are a number of other methods for answering distance queries, such as HDB (Jiang et al., 2014), RXL and CRXL (Delling et al., 2014), HCL (Jin et al., 2012), HHL (Abraham et al., 2012) and TEDI (Wei, 2010). However, since the experimental results of previous works (Hayashi et al., 2016; Akiba et al., 2013) have shown that FD outperforms HDB, RXL and CRXL, and that PLL outperforms HCL, HHL and TEDI, we omit the comparison with these methods.
In our experiments, the implementations of the baseline methods FD, ISL and PLL were provided by their authors and were all written in C++. We ran these methods with the parameter settings suggested by their authors. For instance, the number of landmarks is set to 20 for FD (Hayashi et al., 2016), the number of bit-parallel BFSs is set to 50 for PLL (Akiba et al., 2013), and the ISL parameter is set to 6 for graphs with more than 1 million vertices (Fu et al., 2013).
6.3. Comparison with Baseline Methods
To evaluate the performance of our proposed approach, we compared it with the baseline methods in terms of the construction time of the labelling, the size of the labelling, and query time. The experimental results are presented in Tables 2 and 3, where DNF denotes that a method did not finish within one day or ran out of memory. To make a consistent comparison with the baseline methods (Hayashi et al., 2016; Akiba et al., 2013; Fu et al., 2013), we chose the top 20 vertices as landmarks after sorting the vertices in decreasing order of degree, and also used 32-bit integers to represent vertices and 8-bit integers to represent distances.
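Degree-based landmark selection is simple to express in code. The sketch below is illustrative only: it assumes an adjacency-dict graph, and the function name `top_degree_landmarks` and the tie-breaking rule (smaller vertex id first) are assumptions not stated in the text.

```python
def top_degree_landmarks(adj, k=20):
    """Return the k vertices of highest degree, ties broken by smaller id."""
    ranked = sorted(adj, key=lambda v: (-len(adj[v]), v))
    return ranked[:k]
```

On a graph where vertex 0 has three neighbours and vertices 1 and 2 have two each, `top_degree_landmarks(adj, k=2)` selects vertex 0 and then vertex 1.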
6.3.1. Construction Time
As shown in Table 2, our proposed method (HL) successfully constructed the distance labelling on all the datasets in significantly less time than the state-of-the-art methods. Compared with FD, our method is on average 5 times faster and has results on all the datasets, whereas FD failed to construct a labelling for the largest dataset, ClueWeb09. PLL failed on 7 out of 12 datasets, including Orkut and LiveJournal, which have fewer than 120 million edges, due to its prohibitively high preprocessing time and memory requirements for building the labelling. ISL failed on 9 out of 12 datasets, namely all the datasets with more than 100 million edges, due to the very high cost of computing independent sets on massive networks. We can also see from Table 2 that the parallel version of our method (HLP) is much faster than the sequential version (HL). Compared with FD, HLP is 50-70 times faster on the two large datasets Friendster and uk2007. This confirms that our method can construct labellings very efficiently and is scalable on large networks with billions of vertices and edges.
6.3.2. Labelling Size
As we can see from Table 3, the labelling sizes produced by the proposed method on all the datasets are significantly smaller than those of FD and much smaller than those of PLL and IS-Label. Specifically, our labelling sizes using a 32-bit representation of vertices (HL) are 2-5 times smaller than those of FD except on ClueWeb09 (as discussed before, FD failed to construct a labelling for ClueWeb09), 7 times smaller than those of IS-Label on Skitter, Flickr and LiveJournal, and more than 60 times smaller than those of PLL on Skitter, Flickr, Hollywood, enwiki2013 and Indochina. The compressed version of our method, which uses an 8-bit representation of vertices (i.e., HL(8)), produces even smaller index sizes than the uncompressed version (HL). It is important to note that the labelling sizes of almost all the datasets are also significantly smaller than the original sizes of the datasets shown in Table 1. This also shows that our method is highly scalable on large networks in terms of labelling sizes.
6.3.3. Query Time
The average query times of our method (HL) are comparable with those of FD and PLL and faster than those of ISL. In particular, the average query time of our method on Hollywood is even lower than those of FD and PLL. This is due to a very small average label size (i.e., 12) compared with FD and PLL (i.e., 20+64 and 2206+50, respectively) and a very small average distance. The average query time of HL on Twitter is 3 times slower than that of FD. This may be because FD covers a large portion of pairs, as shown in Figure 9, which contributes to an effective bounded traversal on the sparsified network, since the landmarks of Twitter have very high degrees and the average distance is also very small. Moreover, the average query times of HL and FD on Indochina, it2004, Friendster and uk2007 are more than 1ms because their average distances are larger than those of the other datasets, as shown in Figure 6(b). Note that none of the baseline methods is scalable enough to have results for ClueWeb09, and the average query time of our method HL on ClueWeb09 is small because of a very large portion of covered pairs and a small average label size. We also report in Table 2 the average query time of the online bidirectional BFS algorithm (BiBFS) on 1,000 randomly selected pairs of vertices. BiBFS has considerably longer query times, which are not practical for applications that perform distance queries in real time.
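For reference, an online bidirectional BFS of the kind used as the BiBFS baseline can be sketched as follows. This is a generic textbook-style version for undirected, unweighted graphs, not the code used in the experiments; the names `bibfs_distance` and `PATH_ADJ` are hypothetical.

```python
# Hypothetical example: a path 0 - 1 - 2 - 3 - 4 plus an isolated vertex 5.
PATH_ADJ = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3], 5: []}


def bibfs_distance(adj, s, t):
    """Exact distance between s and t, or None if they are disconnected."""
    if s == t:
        return 0
    dist_s, dist_t = {s: 0}, {t: 0}
    frontier_s, frontier_t = [s], [t]
    while frontier_s and frontier_t:
        # Always expand the smaller frontier by one complete level.
        expand_s = len(frontier_s) <= len(frontier_t)
        this, other = (dist_s, dist_t) if expand_s else (dist_t, dist_s)
        frontier = frontier_s if expand_s else frontier_t
        best, next_frontier = None, []
        for u in frontier:
            for v in adj[u]:
                if v in other:
                    # The two searches meet across the edge (u, v).
                    candidate = this[u] + 1 + other[v]
                    best = candidate if best is None else min(best, candidate)
                elif v not in this:
                    this[v] = this[u] + 1
                    next_frontier.append(v)
        if best is not None:
            return best
        if expand_s:
            frontier_s = next_frontier
        else:
            frontier_t = next_frontier
    return None
```

Finishing the whole level before returning matters: the first meeting edge found within a level is not necessarily on a shortest path, but the minimum over the level is.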
6.4. Performance under Varying Landmarks
We have also evaluated the performance of our method (HL) by varying the number of landmarks between 10 and 50, which are again selected based on highest degrees.
6.4.1. Construction Time
The construction times of our method HL for different numbers of landmarks (from 10 to 50) are shown in Figure 7(a)-7(d). We can see that the construction times grow linearly with the number of landmarks, which confirms the scalability of our method. In Figure 7(a)-7(b), our method constructs labellings for 7 datasets with 50 landmarks in 20 seconds to 2 minutes, which is not possible with any state-of-the-art method. In Figure 7(c), the construction time of HL using 50 landmarks is 3 times faster on Friendster and 4 times faster on uk2007 than that of FD using only 20 landmarks, as shown in Table 2. Figure 7(d) shows the construction time for ClueWeb09, which has 2 billion vertices and 8 billion edges. The significant improvement in construction time allows us to compute labellings for a large number of landmarks, leading to better pair coverage ratios that tighten upper distance bounds (further discussed in Section 6.4.4).
6.4.2. Labelling Size
Figure 9 shows the labelling sizes of HL using 10, 20, 30, 40 and 50 landmarks on all the datasets, and of FD using only 20 landmarks on all the datasets except ClueWeb09 (as discussed before, FD failed to construct a labelling for ClueWeb09). It can be seen that the labelling sizes of HL increase linearly with the number of landmarks, and that even the labelling sizes of HL using 50 landmarks are almost always smaller than those constructed by FD using only 20 landmarks. This reduction in labelling size saves space and memory, and thus makes our method scalable on large networks.
6.4.3. Query Time
Figure 7 shows the impact of using different numbers of landmarks between 10 and 50 on the average query time of our method. The average query times either decrease or remain the same when the number of landmarks increases, except for Orkut when using 30 landmarks and for Friendster when using more than 20 landmarks. In particular, on Friendster, labelling sizes are very large and the fraction of covered pairs (i.e., the pair coverage ratio) is very small, as shown in Figure 9, which may have slowed down our query processing due to a longer time for computing upper distance bounds and ineffective use of bounded-distance traversal.
6.4.4. Pair Coverage
Figure 9 presents the ratios of pairs of vertices covered by at least one landmark (i.e., pair coverage ratios) in HL using 10-50 landmarks and in FD using 20 landmarks. We can observe that the pair coverage ratios for HL increase with the number of landmarks, and that 40 turns out to be the better choice for the number of landmarks on most of the datasets. Specifically, pair coverage ratios on Orkut, enwiki2013, Indochina and uk2007 with 40 landmarks are good, resulting in better query times than using 20 landmarks, as shown in Figure 7. On datasets such as Hollywood and it2004, 30 landmarks are a better option than 40 landmarks because the pair coverage ratios and query times differ only slightly from those with 40 landmarks, while the labelling sizes are reduced. The pair coverage ratios of FD are greater than those of HL on all the datasets except ClueWeb09, which may be the reason behind its better query times on some datasets, as shown in Table 2. Note that, on ClueWeb09, we obtain almost 100 percent pair coverage due to its very high degree landmarks.
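A pair coverage ratio of this kind can be computed by brute force on a small graph. The sketch below rests on an assumption about the definition: a pair counts as covered when at least one landmark lies on some shortest path between its two vertices. The names `bfs_distances`, `pair_coverage_ratio` and `TRIANGLE_TAIL` are hypothetical, and the approach is far too slow for the networks in Table 1.

```python
from collections import deque

# Hypothetical example: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
TRIANGLE_TAIL = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}


def bfs_distances(adj, src):
    """Distances from src to every reachable vertex (unweighted graph)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def pair_coverage_ratio(adj, landmarks, pairs):
    """Fraction of pairs whose distance is realised by a path via a landmark."""
    from_landmark = [bfs_distances(adj, r) for r in landmarks]
    covered = 0
    for s, t in pairs:
        d_st = bfs_distances(adj, s).get(t)
        if d_st is None:
            continue  # disconnected pairs are counted as not covered
        if any(s in d and t in d and d[s] + d[t] == d_st for d in from_landmark):
            covered += 1
    return covered / len(pairs)
```

With vertex 2 as the only landmark, the pairs (0, 3) and (1, 3) are covered while (0, 1) is not, because the direct edge between 0 and 1 is shorter than any path through vertex 2.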
7. Related Work
A naive solution for exact shortest-path distance computation is to run Dijkstra's algorithm for weighted graphs, or BFS for unweighted graphs, from a source vertex to a destination vertex (Tarjan, 1983). To improve search efficiency, a bidirectional scheme can be used to run two such searches: one from the source vertex and the other from the destination vertex (Pohl, 1971). Later, Goldberg and Harrelson (Goldberg and Harrelson, 2005) combined the bidirectional search technique with the A* algorithm to further improve search performance. Their method precomputes a labelling based on landmarks to estimate lower bounds, and uses these estimates in a bidirectional A* search for efficient computation of shortest-path distances. However, this method is known to work only for road networks and does not scale well on complex networks (Hayashi et al., 2016).
To efficiently answer exact shortest-path distance queries on graphs, labelling-based methods have been developed with great success (Akiba et al., 2013; Abraham et al., 2012; Fu et al., 2013; Jin et al., 2012; Abraham et al., 2011; Li et al., 2017). Most of them construct a labelling based on the idea of 2-hop cover (Cohen et al., 2003). It has also been shown that computing a minimal 2-hop cover labelling is NP-hard (Abraham et al., 2012; Cohen et al., 2003). In (Abraham et al., 2011), the authors proposed a hub-based labelling algorithm (HL) which constructs a hub labelling by processing contraction hierarchies (CH) and is among the fastest known algorithms for distance queries in road networks. However, the method is not feasible for complex networks, as reported by the same authors, who thus proposed a hierarchical hub-labelling (HHL) algorithm for complex networks in (Abraham et al., 2012). In that work, a top-down method was used to maintain a shortest-path tree for every vertex in order to indicate all uncovered shortest paths at each vertex. Due to very high storage and computation requirements, the method is also not scalable to large graphs. Another method, Highway Centric Labeling (HCL), was proposed by Jin et al. (Jin et al., 2012) and exploits the highway structure of a graph. This method aims to find a spanning tree that can assist in optimal distance labelling and uses that spanning tree as a highway to compute a highway-based 2-hop labelling for fast distance computation. After that, Akiba et al. (Akiba et al., 2013) proposed the pruned landmark labelling (PLL) method, which precomputes a distance-aware 2-hop cover index by performing a pruned breadth-first search (BFS) from every vertex. The idea is to prune vertices whose distance information can be obtained from the partially available 2-hop index constructed by previous BFSs. This helps to achieve a low construction cost and a smaller index size, due to the reduced search space, on million-scale networks. It has been shown that PLL outperforms other state-of-the-art methods available at the time of publication, including HHL (Abraham et al., 2012), HCL (Jin et al., 2012) and TEDI (Wei, 2010). However, PLL is still not feasible for constructing 2-hop cover indices for billion-scale networks due to a very high memory requirement for labelling construction.
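The pruning idea behind PLL can be illustrated with a compact sketch. This follows the general description above rather than the authors' optimized implementation (no bit-parallel labels, no compact arrays); the names `query`, `pruned_landmark_labelling`, `bfs_distances` and `EXAMPLE_ADJ` are hypothetical, and the graph is assumed undirected and unweighted.

```python
from collections import deque

# Hypothetical example graph (undirected, integer vertex ids).
EXAMPLE_ADJ = {
    0: [1, 2, 3], 1: [0, 2, 4], 2: [0, 1, 5], 3: [0, 5],
    4: [1, 6], 5: [2, 3, 6], 6: [4, 5, 7], 7: [6],
}


def bfs_distances(adj, src):
    """Plain BFS distances, used only to check the labelling."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def query(labels, u, v):
    """2-hop query: best distance through a hub present in both labels."""
    small, large = labels[u], labels[v]
    if len(small) > len(large):
        small, large = large, small
    return min((d + large[h] for h, d in small.items() if h in large),
               default=float("inf"))


def pruned_landmark_labelling(adj, order):
    """Build 2-hop labels with one pruned BFS per vertex, in the given order."""
    labels = {v: {} for v in adj}
    for root in order:
        dist = {root: 0}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            # Prune: earlier labels already give a distance that is no longer.
            if query(labels, root, u) <= dist[u]:
                continue
            labels[u][root] = dist[u]
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
    return labels
```

Processing high-degree vertices first, for example with `order = sorted(adj, key=lambda v: -len(adj[v]))`, tends to make later searches prune early, which is what keeps the index small on million-scale graphs.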
Fu et al. (Fu et al., 2013) proposed IS-Label (ISL), which gained significant scalability in precomputing 2-hop cover distance labellings for large graphs with hundreds of millions of vertices and edges. ISL uses the notion of an independent set of vertices in a graph. First, it computes an independent set of vertices from the graph; then it recursively constructs a new graph by removing the independent set from the previous graph and adding edges that preserve distance information after the removal. All the vertices in the remaining graph preserve their distance information to and from each other. Generally, ISL is regarded as a hybrid method that combines distance labelling with graph traversal for complex networks (Li et al., 2017). Following the same line of thought, very recently, Hayashi et al. (Hayashi et al., 2016) proposed a method to accelerate shortest-path distance computation on large-scale complex networks. To the best of our knowledge, this work is the most closely related to ours. The key idea of the method in (Hayashi et al., 2016) is to select a small set of landmarks and precompute shortest-path trees (SPTs) rooted at each landmark. Given any two vertices, it first computes an upper bound by taking the minimum length among the paths between them that pass through a landmark. Then a bidirectional BFS between the two vertices is conducted on the subgraph without the landmarks to compute the shortest-path distance over paths that do not pass through any landmark, and the minimum of these two results is taken as the answer to an exact distance query. The experiments in (Hayashi et al., 2016) showed that this method can scale to graphs with millions of vertices and billions of edges, and outperforms the state-of-the-art exact methods PLL (Akiba et al., 2013), HDB (Jiang et al., 2014), RXL and CRXL (Delling et al., 2014) with significantly reduced construction time and index size, while the query times are higher but still remain between 0.01 and 0.06 ms for most graphs with fewer than 5M vertices.
Although the method proposed in (Hayashi et al., 2016) has been tested on a large network with millions of vertices and billions of edges, it still fails to construct labellings on billion-scale networks in general, particularly those with billions of vertices. In contrast, our proposed method not only constructs labellings in time linear in the number of landmarks on large networks with billions of vertices, but also keeps the sizes of labellings significantly smaller than the original network sizes. In addition, the deterministic nature of the labelling allows us to achieve further gains in computational efficiency using parallel BFSs over multiple landmarks, which is highly scalable for handling billion-scale networks.
8. Conclusion
We have presented a scalable solution for answering exact shortest-path distance queries on very large (billion-scale) complex networks. The proposed method is based on a novel labelling algorithm that can scale to graphs at the billion scale, and a querying framework that combines a highway cover distance labelling with distance-bounded searches to enable fast distance computation. We have proven that the proposed labelling algorithm constructs HWC-minimal labellings that are independent of the ordering of landmarks, and have further developed a parallel labelling method that speeds up labelling construction by conducting BFSs simultaneously for multiple landmarks. The experimental results showed that the proposed methods significantly outperform the state-of-the-art methods. For future work, we plan to investigate landmark selection strategies to further improve the performance of labelling methods.
References
 Abraham et al. (2011) Ittai Abraham, Daniel Delling, Andrew V Goldberg, and Renato F Werneck. 2011. A hub-based labeling algorithm for shortest paths in road networks. In SEA. 230–241.
 Abraham et al. (2012) Ittai Abraham, Daniel Delling, Andrew V Goldberg, and Renato F Werneck. 2012. Hierarchical hub labelings for shortest paths. In ESA. 24–35.
 Akiba et al. (2013) Takuya Akiba, Yoichi Iwata, and Yuichi Yoshida. 2013. Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In ACM SIGMOD. 349–360.
 Akiba et al. (2012) Takuya Akiba, Christian Sommer, and Ken-ichi Kawarabayashi. 2012. Shortest-path queries for complex networks: exploiting low tree-width outside the core. In EDBT. 144–155.
 Backstrom et al. (2006) Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. 2006. Group formation in large social networks: membership, growth, and evolution. In ACM SIGKDD. 44–54.
 Boldi et al. (2011) Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. 2011. Layered Label Propagation: A Multi-Resolution Coordinate-Free Ordering for Compressing Social Networks. In WWW. 587–596.
 Boldi and Vigna (2004) Paolo Boldi and Sebastiano Vigna. 2004. The WebGraph Framework I: Compression Techniques. In WWW. 595–601.
 Chang et al. (2012) Lijun Chang, Jeffrey Xu Yu, Lu Qin, Hong Cheng, and Miao Qiao. 2012. The exact distance to destination in undirected world. The VLDB Journal 21, 6 (2012), 869–888.
 Cohen et al. (2003) Edith Cohen, Eran Halperin, Haim Kaplan, and Uri Zwick. 2003. Reachability and distance queries via 2-hop labels. SIAM J. Comput. 32, 5 (2003), 1338–1355.
 Delling et al. (2014) Daniel Delling, Andrew V Goldberg, Thomas Pajor, and Renato F Werneck. 2014. Robust distance queries on massive networks. In ESA. 321–333.
 Freeman (1977) Linton C Freeman. 1977. A set of measures of centrality based on betweenness. Sociometry (1977), 35–41.
 Fu et al. (2013) Ada Wai-Chee Fu, Huanhuan Wu, James Cheng, and Raymond Chi-Wing Wong. 2013. IS-Label: an independent-set based labeling scheme for point-to-point distance querying. VLDB 6, 6 (2013), 457–468.
 Goldberg and Harrelson (2005) Andrew V Goldberg and Chris Harrelson. 2005. Computing the shortest path: A search meets graph theory. In SODA. 156–165.
 Gubichev et al. (2010) Andrey Gubichev, Srikanta Bedathur, Stephan Seufert, and Gerhard Weikum. 2010. Fast and accurate estimation of shortest paths in large graphs. In CIKM. 499–508.
 Hayashi et al. (2016) Takanori Hayashi, Takuya Akiba, and Ken-ichi Kawarabayashi. 2016. Fully Dynamic Shortest-Path Distance Query Acceleration on Massive Networks. In CIKM. 1533–1542.
 Jiang et al. (2014) Minhao Jiang, Ada Wai-Chee Fu, Raymond Chi-Wing Wong, and Yanyan Xu. 2014. Hop doubling label indexing for point-to-point distance querying on scale-free networks. VLDB 7, 12 (2014), 1203–1214.
 Jin et al. (2012) Ruoming Jin, Ning Ruan, Yang Xiang, and Victor Lee. 2012. A highway-centric labeling approach for answering distance queries on large sparse graphs. In ACM SIGMOD. 445–456.
 Leskovec and Krevl (2015) Jure Leskovec and Andrej Krevl. 2015. SNAP Datasets: Stanford Large Network Dataset Collection. (2015).
 Li et al. (2017) Ye Li, Man Lung Yiu, Ngai Meng Kou, et al. 2017. An experimental study on hub labeling based shortest path algorithms. VLDB 11, 4 (2017), 445–457.
 Maniu and Cautis (2013) Silviu Maniu and Bogdan Cautis. 2013. Network-aware search in social tagging applications: instance optimality versus efficiency. In CIKM. 939–948.
 Pohl (1971) Ira Pohl. 1971. Bi-directional search. Machine intelligence 6 (1971), 127–140.
 Potamias et al. (2009) Michalis Potamias, Francesco Bonchi, Carlos Castillo, and Aristides Gionis. 2009. Fast shortest path distance estimation in large networks. In CIKM. 867–876.
 Qiao et al. (2014) Miao Qiao, Hong Cheng, Lijun Chang, and Jeffrey Xu Yu. 2014. Approximate shortest distance computing: A query-dependent local landmark scheme. IEEE TKDE 26, 1 (2014), 55–68.
 Qin et al. (2017) Yongrui Qin, Quan Z Sheng, Nickolas JG Falkner, Lina Yao, and Simon Parkinson. 2017. Efficient computation of distance labeling for decremental updates in large dynamic graphs. WWW 20, 5 (2017), 915–937.
 Rossi and Ahmed (2015) Ryan A. Rossi and Nesreen K. Ahmed. 2015. The Network Data Repository with Interactive Graph Analytics and Visualization. In AAAI. http://networkrepository.com
 Sabidussi (1966) Gert Sabidussi. 1966. The centrality index of a graph. Psychometrika 31, 4 (1966), 581–603.
 Tarjan (1983) Robert Endre Tarjan. 1983. Data structures and network algorithms. Vol. 44. SIAM.
 the Koblenz Network Collection (2017) KONECT the Koblenz Network Collection. 2017. http://konect.uni-koblenz.de/networks/
 Tretyakov et al. (2011) Konstantin Tretyakov, Abel Armas-Cervantes, Luciano García-Bañuelos, Jaak Vilo, and Marlon Dumas. 2011. Fast fully dynamic landmark-based estimation of shortest path distances in very large graphs. In CIKM. 1785–1794.
 Ukkonen et al. (2008) Antti Ukkonen, Carlos Castillo, Debora Donato, and Aristides Gionis. 2008. Searching the wikipedia with contextual information. In CIKM. 1351–1352.
 Vieira et al. (2007) Monique V Vieira, Bruno M Fonseca, Rodrigo Damazio, Paulo B Golgher, Davi de Castro Reis, and Berthier Ribeiro-Neto. 2007. Efficient search ranking in social networks. In CIKM. 563–572.
 Wei (2010) Fang Wei. 2010. TEDI: efficient shortest path query answering on graphs. In ACM SIGMOD. 99–110.
 Yahia et al. (2008) Sihem Amer-Yahia, Michael Benedikt, Laks VS Lakshmanan, and Julia Stoyanovich. 2008. Efficient network aware search in collaborative tagging sites. VLDB 1, 1 (2008), 710–721.