Optimal sequenced route (OSR) querying [32, 33], a.k.a. generalized shortest path querying , aims to find a route with the minimum total cost (e.g., travel distance or travel time) that passes through a number of vertex categories (e.g., restaurants, banks, gas stations) in a particular order (e.g., visiting banks before restaurants). This problem has many practical applications in route planning [14, 19], crisis management, supply chain management, video surveillance, mobility-as-a-service , and logistics [32, 29]. However, the optimal sequenced route with the minimum total cost is often not the best choice for every user, since different users may have different personal preferences [28, 10, 36].
Consider the example shown in Figure 1, where each vertex represents a point of interest associated with a category, e.g., shopping mall (), restaurant (), or cinema (), and edge weights represent travel costs, e.g., travel time or fuel consumption. Suppose that Alice plans a trip that starts from location , passes through a shopping mall, a restaurant, and then a cinema, and finally reaches destination . This plan can be formalized as an OSR query with category sequence . The optimal sequenced route for Alice is with a cost of 20. However, if Alice prefers restaurant to restaurant , route with a cost of 21 is preferable. In addition, if the shopping mall at vertex has sales promotions, route with a cost of 22 can also be a good candidate. In these cases, returning only the optimal sequenced route may not satisfy users’ varying preferences. This motivates us to study top-k optimal sequenced route (KOSR) querying, which returns the k routes that satisfy the given category order and have the least total costs.
In this paper, we focus on finding the top-k optimal sequenced routes in general graphs, where edge weights may not satisfy the triangle inequality. Although the OSR problem has been studied extensively, the KOSR problem on general graphs has not been carefully addressed before. In , the progressive neighbor exploration algorithm PNE is proposed to solve the OSR problem on general graphs. In , a dynamic programming based algorithm, GSP, is formulated; it significantly outperforms PNE and is considered the state of the art for the OSR problem on general graphs.
However, simply extending existing OSR solutions is unlikely, if not impossible, to yield efficient solutions for the KOSR problem. In particular, the dynamic programming based GSP cannot be extended to KOSR, because it does not retain sufficient information about sequenced routes other than the optimal one. Although PNE can be extended to KOSR by iteratively finding the next optimal sequenced route, this is inefficient: all partially explored sequenced routes whose costs are less than the cost of the k-th optimal sequenced route must be examined, even though most of them need not be extended.
Devising an efficient solution for KOSR is non-trivial because of two challenges. The first is how to filter out unnecessary partially explored sequenced routes when exploring the graph. To address this challenge, we propose a dominance relationship between two partially explored sequenced routes and . If dominates , the optimal (i.e., least-cost) feasible sequenced route extended from is always at least as good as that extended from . Thus, the exploration of routes extended from can be postponed until a complete sequenced route extended from appears in the result set. Furthermore, inspired by the A* algorithm , we estimate the cost from each partially explored sequenced route to the destination and explore partially explored routes in order of their estimated total costs, which further reduces the search space.
The second challenge is how to efficiently find the -th nearest neighbor in a category, not merely the nearest one, as this operation is invoked frequently when solving KOSR. For example, suppose we want to recommend the top-3 optimal sequenced routes to Alice in Figure 1. Then more than one nearest neighbor in category for vertex , i.e., and , must be explored. A simple and intuitive implementation of this operation applies Dijkstra’s algorithm, which is, however, very costly. To overcome this weakness, we build an inverted label index for each category off-line by applying the 2-hop labeling technique [9, 1, 2, 4, 5] to the original graph. The -th nearest neighbor in a category can then be identified efficiently on-line by simply looking up the inverted label index.
To the best of our knowledge, this is the first comprehensive work to study the KOSR problem. The paper makes four contributions. First, we propose a dominance relationship between partially explored sequenced routes and develop an algorithm based on it that significantly reduces the search space when solving the KOSR problem. Second, we propose a heuristic that estimates the minimal total cost of partially explored sequenced routes, which enables an A*-like algorithm that further reduces the search space. Third, we propose an inverted label index that speeds up the operation of identifying the -th nearest neighbor in a category for a given vertex, which improves the efficiency of both algorithms. Finally, we report on a comprehensive empirical study over different real-world graphs, showing that the proposed algorithms significantly outperform the baseline method for KOSR and the state-of-the-art method for OSR.
II Related Work
We categorize relevant studies on sequenced route querying in Table I along three aspects. First, we consider whether an algorithm works for general graphs. When edge weights represent Euclidean distances between vertices, they satisfy the triangle inequality; we call such graphs Euclidean graphs. When edge weights represent other costs, such as travel time or fuel consumption [18, 35], they do not necessarily satisfy the triangle inequality; we call such graphs general graphs. Note that Euclidean distances and indexing structures based on Euclidean space, such as R-trees, cannot be utilized in general graphs. The algorithms proposed in this paper work for general graphs. Second, we consider whether an algorithm supports returning the top-k optimal sequenced routes. Most existing studies only handle the case where only the top-1 optimal sequenced route is required. Third, we consider whether a specific category order is given. Table I shows that this paper is the first comprehensive study of the sequenced route problem on general graphs with a specific category order and top-k results, i.e., the top-k optimal sequenced route (KOSR) problem.
| |Euclidean Graphs, specific order|Euclidean Graphs, arbitrary order|General Graphs, specific order|General Graphs, arbitrary order|
|Top-1| |[8, 27, 26]|[32, 33, 29]|[30, 7]|
|Top-k|[31, 22, 23]| |This paper| |
Optimal sequenced route querying [32, 33], a.k.a. generalized shortest path querying , is the most relevant problem. The first work on this problem  proposes three algorithms, namely LORD, R-LORD, and PNE. LORD and R-LORD are designed for Euclidean edge weights, where R-trees can be utilized for efficient query processing, whereas PNE works for general graphs. In this paper, we extend PNE to solve the KOSR problem and use it as the baseline method. Another work  aims to improve the efficiency of optimal sequenced route querying on general graphs by pre-constructing a series of additively weighted Voronoi diagrams (AWVDs). However, this approach requires prior knowledge of the category sequence of a query, which limits its applicability to online queries, because pre-constructing AWVDs for all possible category sequences is prohibitive. GSP  addresses optimal sequenced route queries on general graphs with a dynamic programming formulation, in which the optimal costs of all vertices in each category (the least cost of reaching a vertex from the source after passing through all preceding categories in order) are computed by a transition function between consecutive categories. The contraction hierarchy technique  is utilized to compute the optimal costs of the vertices in the next category according to this recurrence. Though efficient, this approach cannot be extended to KOSR queries, because the transition function only captures the optimal cost.
The group optimal sequenced route problem [31, 22, 23] is also relevant to KOSR. Given a group of users with different sources and destinations and a set of ordered categories, group optimal sequenced route querying aims to find the top-k optimal sequenced routes that pass through the categories in order and minimize the aggregate travel cost of the group. In particular, when the group has only one user, the problem reduces to the KOSR problem. However, all existing methods rely on Euclidean space and thus cannot be applied to general graphs.
Studies [8, 27, 26, 7] address finding the optimal route that visits a given set of categories without a specific category order. Additional constraints, such as a partial order [27, 8] or a budget limit , are sometimes also considered. Such problems are NP-hard and can be reduced to the generalized traveling salesman problem ; therefore, approximate methods are proposed to solve them. Because the problems differ in nature, these methods cannot be directly applied to KOSR. Other advanced routing strategies [16, 37, 39], e.g., skyline routing [18, 35, 17], stochastic routing [11, 34, 24, 25], and personalized routing [36, 10], also differ from KOSR.
We formalize the KOSR problem and introduce a baseline solution. Frequently used notations are summarized in Table II.
|A route from to|
|A category sequence ,|
|The number of categories in category sequence|
|The vertex set of category|
|The number of vertices that belong to category , i.e.,|
|Witness , such that for|
|The number of vertices in route or witness|
|The weight of route or witness|
|The least cost from vertex to|
|The number of top results required|
III-A Problem Definition
Definition 1 (Graph)
A directed weighted graph consists of a vertex set and an edge set . The category function takes a vertex as input and returns a set of categories , where denotes the set of all possible categories. The weight function takes an edge as input and returns a non-negative cost of the edge , e.g., the travel time for traversing edge .
For example, in Figure 1, we have , , and . Note that the edge weights can be arbitrary non-negative values and may not satisfy the triangle inequality.
Definition 2 (Route)
A route from vertex to vertex in graph is a sequence of vertices in which every two consecutive vertices are connected by an edge, denoted by . Let be the weight, or cost, of route , and let be the size of route , which equals the number of vertices in route .
Definition 3 (Category Sequence)
A category sequence , represents the order in which the categories must be visited, where each , , represents a specific category in category set , and each corresponds to a vertex set . We refer to and as the size of the category sequence and the size of , respectively.
Definition 4 (Feasible Route)
Given a source-destination pair , and a category sequence , a route is feasible if and only if there exists a subsequence of vertices from , such that and for , or . We call the witness of w.r.t. category sequence , denoted as . (Note that a witness may not represent a route according to Definition 2, as consecutive vertices in a witness may not be connected by an edge.)
In many cases, multiple feasible routes exist for a given source-destination pair and category sequence. We distinguish feasible routes by their witnesses: if two feasible routes share the same witness w.r.t. a category sequence, they are regarded as the same feasible route, and only the one with the smaller cost is considered. Formally, for a witness , its cost is defined as , where is the least cost from vertex to .
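The witness cost can be made concrete with a short sketch. It assumes a hypothetical adjacency-list representation (`Graph`, mapping each vertex to `(neighbor, weight)` pairs), a hypothetical `members` map from each category name to its vertex set, and plain Dijkstra for the least cost between consecutive witness vertices. The names `least_cost`, `is_witness`, and `witness_cost` are illustrative only and are not taken from the paper.

```python
import heapq
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical representation: vertex -> list of (neighbor, non-negative weight).
Graph = Dict[str, List[Tuple[str, float]]]
INF = float("inf")


def least_cost(graph: Graph, src: str, dst: str) -> float:
    """Least cost from src to dst via Dijkstra's algorithm (INF if unreachable)."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return INF


def is_witness(witness: Sequence[str], source: str, target: str,
               members: Dict[str, Set[str]], sequence: Sequence[str]) -> bool:
    """Source, then one vertex per category in the given order, then the destination."""
    return (len(witness) == len(sequence) + 2
            and witness[0] == source and witness[-1] == target
            and all(witness[i + 1] in members[c] for i, c in enumerate(sequence)))


def witness_cost(graph: Graph, witness: Sequence[str]) -> float:
    """Sum of the least costs between consecutive witness vertices."""
    return sum(least_cost(graph, a, b) for a, b in zip(witness, witness[1:]))
```

On a small made-up graph (not Figure 1) with malls `m1`, `m2` and restaurants `r1`, `r2`, the witness `["s", "m2", "r2", "t"]` costs 2 + 1 + 1 = 4. No edge between consecutive witness vertices is required, only reachability.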
Definition 5 (KOSR query)
Given a graph , a top-k optimal sequenced routes (KOSR) query is a quadruple , where denotes a source-destination pair, is a category sequence, and is a positive integer. The query returns a set of k distinct feasible routes w.r.t. , , such that there does not exist any other feasible route in where .
Consider the graph in Figure 1: the KOSR query returns =, , which includes routes with costs of 20, 21, and 22, and no other feasible route has a cost smaller than 22.
To simplify the discussion, we focus on identifying the witnesses of the top-k optimal sequenced routes rather than the actual routes; given a witness, its actual route can be easily reconstructed by concatenating the least-cost paths between consecutive witness vertices. For simplicity, all routes discussed in the following sections refer to witnesses unless stated otherwise. Moreover, given a category sequence , we introduce two dummy categories and that contain the source vertex and the destination vertex , respectively.
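The reconstruction step can be sketched under the same hypothetical graph representation as before. Each leg between consecutive witness vertices is expanded into one least-cost path, found by Dijkstra with parent pointers, and the legs are concatenated without repeating the shared endpoints. The function names are illustrative assumptions, not the paper's.

```python
import heapq
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical representation: vertex -> list of (neighbor, non-negative weight).
Graph = Dict[str, List[Tuple[str, float]]]


def shortest_path(graph: Graph, src: str, dst: str) -> List[str]:
    """Vertices of one least-cost path from src to dst (empty list if unreachable)."""
    dist: Dict[str, float] = {src: 0.0}
    parent: Dict[str, Optional[str]] = {src: None}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path: List[str] = []
            node: Optional[str] = u
            while node is not None:  # walk parent pointers back to src
                path.append(node)
                node = parent[node]
            return path[::-1]
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return []


def route_from_witness(graph: Graph, witness: Sequence[str]) -> List[str]:
    """Concatenate least-cost paths between consecutive witness vertices."""
    route = [witness[0]]
    for a, b in zip(witness, witness[1:]):
        leg = shortest_path(graph, a, b)
        if not leg:
            return []  # some leg is unreachable, so no route realizes this witness
        route.extend(leg[1:])  # drop the endpoint shared with the previous leg
    return route
```

For example, the witness `["s", "r1", "t"]` on a graph where `r1` is reached from `s` only via `m1` expands to the route `["s", "m1", "r1", "t"]`.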
III-B Baseline Solution
Since OSR is a special case of KOSR with k set to 1, we first review PNE and GSP, two state-of-the-art methods for OSR, and then present the baseline KPNE, which extends PNE to solve KOSR.
The progressive neighbor exploration (PNE) algorithm  finds the optimal sequenced route in general graphs. Algorithm 1 sketches PNE. Throughout the processing, a priority queue of partially explored routes (witnesses) is maintained. At each iteration, the route with the minimal cost in the priority queue is examined, where for each . To extend the route, we need to consider the vertices in the next category . Instead of extending the route via all its neighbors in category , only the nearest neighbor of , such that , is considered. Moreover, to guarantee correctness, another candidate route derived from is incrementally generated by extending via ’s next nearest neighbor in , such that and . The algorithm returns the optimal route once a route passes through all categories in order and reaches the destination. Since a vertex can have as many as neighbors in the next category, computing the least costs from the vertex to all of them is impractical. By progressively extending a route via nearest neighbors and generating the candidate routes derived from it, PNE examines all possible partially explored candidate routes on demand to find the optimal sequenced route. PNE can be extended to solve the KOSR problem by adding a result set: each time an optimal sequenced route is found (line 5), it is added to the result set, and when the result set contains k routes or the priority queue becomes empty, the set is returned as the KOSR result. We refer to this method as KPNE.
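KPNE can be sketched in a few dozen lines of Python. This is a minimal illustration, not Algorithm 1 line by line, and it rests on several assumptions. It uses the same hypothetical `Graph`/`members` representation. FindNN is implemented naively with a fresh Dijkstra search per call. Each queue entry is `(cost, tie-breaker, witness, x, cost of the witness without its last vertex)`, where `x` is the rank of the last vertex among its predecessor's nearest neighbors in that category. The destination is treated as a final one-vertex category.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Sequence, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]
Witness = Tuple[str, ...]


def find_nn(graph: Graph, v: str, members: Set[str], x: int) -> Optional[Tuple[str, float]]:
    """x-th nearest neighbor (1-based) of v among `members`, by a fresh Dijkstra search."""
    dist, done, found = {v: 0.0}, set(), 0
    heap = [(0.0, v)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u in members:
            found += 1
            if found == x:
                return u, d
        for nxt, w in graph.get(u, []):
            if d + w < dist.get(nxt, float("inf")):
                dist[nxt] = d + w
                heapq.heappush(heap, (d + w, nxt))
    return None


def kpne(graph: Graph, members: Dict[str, Set[str]], source: str, target: str,
         sequence: Sequence[str], k: int) -> List[Tuple[float, Witness]]:
    layers = [members[c] for c in sequence] + [{target}]  # destination as last "category"
    full = len(layers) + 1                                 # size of a complete witness
    tick = itertools.count()                               # tie-breaker for equal costs
    queue = [(0.0, next(tick), (source,), 0, 0.0)]
    results: List[Tuple[float, Witness]] = []
    while queue and len(results) < k:
        cost, _, wit, x, prefix_cost = heapq.heappop(queue)
        if len(wit) == full:                               # the next optimal sequenced route
            results.append((cost, wit))
            continue
        # Extend via the nearest neighbor of the last vertex in the next category.
        nn = find_nn(graph, wit[-1], layers[len(wit) - 1], 1)
        if nn:
            heapq.heappush(queue, (cost + nn[1], next(tick), wit + (nn[0],), 1, cost))
        # Derived candidate: replace the last vertex by the predecessor's (x+1)-th neighbor.
        if len(wit) > 1:
            nn = find_nn(graph, wit[-2], layers[len(wit) - 2], x + 1)
            if nn:
                heapq.heappush(queue, (prefix_cost + nn[1], next(tick),
                                       wit[:-1] + (nn[0],), x + 1, prefix_cost))
    return results
```

Because every witness is generated exactly once (either as a nearest-neighbor extension or as a derived candidate of its sibling) and costs never decrease along a chain, complete witnesses leave the queue in non-decreasing cost order. However, every cheaper partial witness must still be popped first, which is the inefficiency discussed below.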
Another state-of-the-art method for the optimal sequenced route problem on graphs, GSP, is proposed in , where a dynamic programming solution is formulated as follows:
where records the least cost of a route to the -th vertex (indexed from 0) in the -th category that starts from the source and passes through all the categories before it, and is the least cost from the -th vertex in category to the -th vertex in category . As a result, the cost of the optimal sequenced route is . To compute the matrix efficiently, the contraction hierarchy technique  is utilized to compute the least costs of the vertices in the next category according to the above recurrence. By running forward searches (Dijkstra-based searches using the contraction hierarchy) and backward searches (DFS-based) times, together with pruning optimizations, GSP can efficiently find the optimal sequenced route. However, since only the least cost of each vertex is retained and the above recurrence only applies to the least cost, GSP cannot be directly extended to the KOSR problem.
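The recurrence can be illustrated as a layer-by-layer computation. The cost of reaching each vertex of the next category is the minimum, over the vertices of the previous category, of their cost plus the least cost between the two. The sketch below evaluates each layer with one multi-source Dijkstra search seeded with the previous layer's costs. This is a simple stand-in for GSP's contraction-hierarchy searches and pruning, not GSP itself, and the function names are hypothetical. It keeps one back-pointer per vertex to recover the optimal witness. Because it retains exactly one value per vertex, it illustrates why the recurrence does not carry over to top-k.

```python
import heapq
from typing import Dict, List, Sequence, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]
INF = float("inf")


def next_layer(graph: Graph, prev: Dict[str, float],
               layer: Set[str]) -> Dict[str, Tuple[float, str]]:
    """One DP step: for each v in `layer`, min over u of prev[u] + d(u, v) and the
    minimizing u, computed by a single multi-source Dijkstra search."""
    settled: Dict[str, Tuple[float, str]] = {}   # vertex -> (cost, originating u)
    heap = [(c, u, u) for u, c in prev.items()]
    heapq.heapify(heap)
    while heap:
        d, v, origin = heapq.heappop(heap)
        if v in settled:
            continue
        settled[v] = (d, origin)
        for nxt, w in graph.get(v, []):
            if nxt not in settled:
                heapq.heappush(heap, (d + w, nxt, origin))
    return {v: settled[v] for v in layer if v in settled}


def gsp_like_osr(graph: Graph, members: Dict[str, Set[str]], source: str, target: str,
                 sequence: Sequence[str]) -> Tuple[float, List[str]]:
    layers = [members[c] for c in sequence] + [{target}]
    costs = {source: 0.0}
    back: List[Dict[str, str]] = []               # per layer: vertex -> predecessor
    for layer in layers:
        step = next_layer(graph, costs, layer)
        if not step:
            return INF, []                        # some category is unreachable
        costs = {v: c for v, (c, _) in step.items()}
        back.append({v: u for v, (_, u) in step.items()})
    witness = [target]
    for pointers in reversed(back):               # follow back-pointers to the source
        witness.append(pointers[witness[-1]])
    return costs[target], witness[::-1]
```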
Although KPNE, which extends PNE , can solve KOSR on general graphs, it is inefficient, since all partially explored candidate routes whose costs are smaller than the cost of the k-th optimal sequenced route must be examined. In the worst case, the number of partially explored candidate routes examined at category can reach ; as a result, the total number of routes examined by KPNE can be , which is prohibitive on large graphs.
IV Proposed Solutions for KOSR
In this section, we propose two efficient methods for KOSR. In Section IV-A, we first describe a method that uses a route dominance relationship to filter out unnecessary partially explored candidate routes, which reduces the search space. We then demonstrate the extensibility of this method by incorporating an optimization technique that efficiently finds the -th nearest neighbor in a category for a given vertex. Subsequently, in Section IV-B, we further reduce the search space by integrating a heuristic estimation in the manner of A*.
IV-A Dominance-Based Algorithm
We first illustrate the intuition behind the route dominance relationship. Consider a KOSR query in Figure 1. To find the first optimal sequenced route, with a cost of 20 (denoted as ), KPNE will examine and extend both and , because both have a smaller cost than . However, there is no need to extend to find , because the cost of the optimal feasible route extended from cannot be smaller than that of (i.e., ). Hence, the extension of can be postponed until the optimal sequenced route is found. In this case, we say that is dominated by . Next, we formally define the dominance relationship.
Definition 6 (Dominance)
Consider a given category sequence and two partially explored candidate routes (witnesses) and (). If and hold, dominates w.r.t. , denoted as .
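A minimal sketch of the dominance test follows. It rests on our reading of Definition 6, which is an assumption since the exact conditions are elided above: the two witnesses have the same size, end at the same vertex, and the dominating one is no more expensive. The function name `dominates` is hypothetical.

```python
from typing import Tuple

Witness = Tuple[str, ...]


def dominates(r: Witness, r_cost: float, r2: Witness, r2_cost: float) -> bool:
    """Assumed reading of Definition 6: same size (same number of categories
    visited), same last vertex, and r costs no more than r2."""
    return len(r) == len(r2) and r[-1] == r2[-1] and r_cost <= r2_cost
```

The intuition behind Lemma 1 is visible here. Both witnesses stand at the same vertex with the same remaining categories, so any suffix that completes one also completes the other, and the cheaper prefix always yields the cheaper best completion.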
Lemma 1. Given a KOSR query = and two partially explored routes and , if , then , where and are the optimal feasible routes extended from and , respectively.
Proof. Suppose =, =, and =. Since is the optimal feasible route extended from , must be the optimal sequenced route for the category subsequence from to . Because , we have and ; thus, can be represented as . Then and ; since , we have .
According to Lemma 1, there is no need to extend dominated partially explored routes until the optimal feasible route extended from their dominating route becomes one of the top-k optimal sequenced routes. This is because a partially explored candidate route that is dominated by another one with a smaller cost can never be extended into the next optimal sequenced route before its dominating route is. On the other hand, after an optimal sequenced route is found, we need to reconsider the routes dominated by it, so that they can be extended into the next optimal sequenced routes. Based on the dominance relationship, we propose the PruningKOSR method (Algorithm 2).
To check the dominance relationship and maintain the dominated routes, we introduce two hash tables of key-value pairs for each vertex . One is HT for dominating routes, where the key is the size of the partially explored dominating route that has been extended at , and the value is the route itself. The other is HT for dominated routes, where the key is the size of the dominated routes, and the value is a priority queue of the routes of that size that have reached and been dominated, ordered by cost in ascending order. We also maintain a result set for the top-k optimal sequenced routes and a global priority queue of partially explored routes (witnesses) sorted by cost in ascending order. Moreover, for each route , we introduce an additional attribute indicating that is the -th nearest neighbor of in category when generating . Initially, only the source with is added to the queue . We then loop until is empty or the top-k optimal sequenced routes have been found.
Pruning dominated routes: At each iteration, the route with the minimum cost is examined. If it has reached the destination, we add it to the result set and reconsider the dominated routes (lines 6–12). Otherwise, we check whether it is dominated. If the route being examined is the first route of size that reaches vertex , we add it to the HT of and extend it via ’s nearest neighbor in category (lines 14–17). Otherwise, if its size already appears in the HT of , another route of size with a smaller cost has already reached and been extended at , so is dominated. According to Lemma 1, there is no need to extend ; therefore, we insert it into the HT of instead of the priority queue (line 19). Subsequently, we generate a new candidate route derived from . Since the candidate route via the -th nearest neighbor of was generated in a previous iteration, we find ’s -th nearest neighbor in category , , by invoking algorithm FindNN, create the candidate route with incremented , and insert it into the priority queue (lines 20–22).
Reconsider dominated routes: After an optimal sequenced route has been found, we need to reconsider the partially explored routes dominated by sub-routes of , since these routes can now possibly be extended into the next optimal sequenced route. Therefore, for each vertex in , if dominates routes of size in the HT of (line 9), we reconsider only the dominated route with the least cost in the HT of , because the other routes in the HT of are dominated by it. This is also why a priority queue is used as the value in hash table HT. Since the candidate route derived from was already generated when was dominated, we set its to ‘-’ (meaning that no further candidate route needs to be derived from ) and re-add it to the priority queue (lines 10–11). Meanwhile, we remove from the HT of , so that the next candidate route that reaches can be extended (line 12).
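A compact Python sketch of this dominance-based loop follows, under stated assumptions. It uses hypothetical `Graph`/`members` representations and a naive Dijkstra-based `find_nn` standing in for FindNN. The two hash tables are per-vertex dictionaries keyed by witness size, and `x = None` plays the role of ‘-’. One deliberate choice: when a result is found, we drop the dominating entry of each of its sub-routes whether or not any route is parked under it. This is our conservative reading of lines 9–12, since keeping the entry could delay a route that is parked there later. This is not the authors' implementation.

```python
import heapq
import itertools
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]
Witness = Tuple[str, ...]


def find_nn(graph: Graph, v: str, members: Set[str], x: int) -> Optional[Tuple[str, float]]:
    """x-th nearest neighbor (1-based) of v among `members`, by a fresh Dijkstra search."""
    dist, done, found = {v: 0.0}, set(), 0
    heap = [(0.0, v)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u in members:
            found += 1
            if found == x:
                return u, d
        for nxt, w in graph.get(u, []):
            if d + w < dist.get(nxt, float("inf")):
                dist[nxt] = d + w
                heapq.heappush(heap, (d + w, nxt))
    return None


def pruning_kosr(graph: Graph, members: Dict[str, Set[str]], source: str, target: str,
                 sequence: Sequence[str], k: int) -> List[Tuple[float, Witness]]:
    layers = [members[c] for c in sequence] + [{target}]
    n = len(layers)                                    # categories plus the destination
    tick = itertools.count()                           # tie-breaker for equal costs
    # Entry: (cost, tick, witness, x, cost of witness[:-1]); x=None stands for '-'.
    queue = [(0.0, next(tick), (source,), None, 0.0)]
    dominating: Dict[str, Dict[int, Witness]] = defaultdict(dict)
    dominated = defaultdict(lambda: defaultdict(list))  # vertex -> size -> heap of routes
    results: List[Tuple[float, Witness]] = []
    while queue and len(results) < k:
        cost, _, wit, x, prefix_cost = heapq.heappop(queue)
        v, size = wit[-1], len(wit)
        if size == n + 1:                              # reached the destination
            results.append((cost, wit))
            for j in range(1, n):                      # each sub-route ending at wit[j]
                u, sub = wit[j], wit[: j + 1]
                if dominating[u].get(j + 1) != sub:
                    continue
                del dominating[u][j + 1]
                parked = dominated[u][j + 1]
                if parked:                             # release the cheapest parked route
                    c, _, w, pc = heapq.heappop(parked)
                    heapq.heappush(queue, (c, next(tick), w, None, pc))
            continue
        if size not in dominating[v]:                  # first of its size at v: extend
            dominating[v][size] = wit
            nn = find_nn(graph, v, layers[size - 1], 1)
            if nn:
                heapq.heappush(queue, (cost + nn[1], next(tick), wit + (nn[0],), 1, cost))
        else:                                          # dominated: park instead of extending
            heapq.heappush(dominated[v][size], (cost, next(tick), wit, prefix_cost))
        if x is not None:                              # predecessor's (x+1)-th neighbor
            nn = find_nn(graph, wit[-2], layers[size - 2], x + 1)
            if nn:
                heapq.heappush(queue, (prefix_cost + nn[1], next(tick),
                                       wit[:-1] + (nn[0],), x + 1, prefix_cost))
    return results
```

On the small made-up graph used earlier, the route `(s, m2, r1)` arrives at `r1` after `(s, m1, r1)` and is parked. It is released only once `(s, m1, r1, t)` enters the result set. The output is the same as KPNE's.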
Lemma 2. Algorithm 2 returns the correct result for a KOSR query.
Proof. To find the next optimal sequenced route, all possible partially explored candidate routes are considered (lines 14–17 and 20–22) except for the dominated routes (line 19), which can be excluded from extension according to Lemma 1. After an optimal sequenced route is found, the dominated routes that can be extended into the next optimal sequenced route are reconsidered. Therefore, Algorithm 2 returns the correct result for a KOSR query.
Consider Figure 1 and suppose the given query is . Table III shows the routes in the priority queue at each step and the hash tables of vertex at different steps. At step 1, route is added to the queue; it is then extended via (’s nearest neighbor in category ), and no candidate route can be generated. At step 2, is examined; it is extended via (’s nearest neighbor in category ), and candidate route is generated via ’s 2nd nearest neighbor in category . At step 4, is examined and extended at , and we insert it into the HT of . Subsequently, at step 6, since is dominated by in the HT of , is not extended at ; instead, we insert into the HT of and generate candidate route via ’s 2nd nearest neighbor in category . At step 9, the first optimal sequenced route is found. Since and in the HTs of and , respectively, are dominated by and in the HTs of and , respectively, we re-add them to the queue with =‘-’ and remove the corresponding dominating routes from the HTs. Finally, at step 13, the second optimal sequenced route is found, and we return as the result.
By pruning dominated routes and the candidate routes derived from them, both the size of the priority queue and the search space are reduced, which improves efficiency. Given a KOSR query , to find the first optimal sequenced route, for each vertex in , at most one route of size (plus the source) is extended at (line 16 in Algorithm 2), and at most candidate routes can be generated via ’s next nearest neighbors in category (line 21 in Algorithm 2). As a result, in the worst case, Algorithm 2 examines routes to find the first optimal sequenced route, of which routes are extended. Then, for each of the next optimal sequenced routes, at most dominated routes are reconsidered once an optimal sequenced route is found, which results in at most examined routes, of which at most routes are extended at different categories, respectively. That is, to find the top-k optimal sequenced routes, at most partially explored routes need to be examined, of which routes are extended. Compared with KPNE, the search space is reduced from exponential complexity () to polynomial complexity (). Lemma 3 gives the time complexity of Algorithm 2.
Lemma 3. Given a KOSR query , let . The time complexity of Algorithm 2 is , where is the time complexity of Algorithm FindNN.
Proof. Since at most partially explored candidate routes are generated during Algorithm 2, Algorithm FindNN is called at most times, and at most routes are extended via the nearest neighbor. Hence, the complexity of this part is . In addition, each time we examine a candidate route from the priority queue, if the route is extended via its nearest neighbor, two candidate routes are generated in total, so the size of the priority queue increases by 1. Otherwise, if the route is dominated, it is not extended and only one candidate route is generated via the next nearest neighbor, so the size of the priority queue does not change. Since at most candidate routes are extended via their nearest neighbors, the size of the priority queue is at most . As a result, the complexity of maintaining the priority queue is . In summary, the total time complexity of Algorithm 2 is .
Finding the -th nearest neighbor. Next, we introduce how to find the -th nearest neighbor, i.e., the core operation FindNN in PruningKOSR. A straightforward way to find the -th nearest neighbor of vertex in category is Dijkstra’s search: we start from and expand vertices via their adjacent vertices until the -th vertex in is settled. However, each time we look for the -th nearest neighbor, Dijkstra’s search in effect finds all of the top- nearest neighbors from scratch, which duplicates search effort across the graph. Moreover, since FindNN is invoked frequently, repeated Dijkstra’s searches on large graphs are inefficient in practice. Hence, a more efficient method that avoids duplicate searches is needed. To this end, we propose a method that uses the 2-hop labeling technique [9, 1, 2] to find the -th nearest neighbor.
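The duplicated effort can be made concrete. The sketch below implements the straightforward Dijkstra-based FindNN and additionally reports how many vertices each call settles. The graph, the category, and the name `find_nn_dijkstra` are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def find_nn_dijkstra(graph: Graph, v: str, members: Set[str],
                     x: int) -> Optional[Tuple[str, float, int]]:
    """Return (x-th nearest neighbor of v in members, its cost, #vertices settled),
    or None if fewer than x members are reachable. Every call restarts from v."""
    dist = {v: 0.0}
    done: Set[str] = set()
    found = 0
    heap = [(0.0, v)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u in members:
            found += 1
            if found == x:
                return u, d, len(done)
        for nxt, w in graph.get(u, []):
            if d + w < dist.get(nxt, float("inf")):
                dist[nxt] = d + w
                heapq.heappush(heap, (d + w, nxt))
    return None
```

On a path v → a → b → c → d with unit weights and category {a, c, d}, asking for the 1st, 2nd, and 3rd nearest neighbors settles 2 + 4 + 5 = 11 vertices in total. A single search that simply kept going would settle only 5.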
Given a directed weighted graph , for each vertex , 2-hop labeling maintains two labels, and . In particular, consists of a set of label entries of the form , where is a vertex that can reach , and =. Similarly, consists of a set of label entries of the form , where is a vertex that can be reached from , and =. Note that ’s entries may contain only a subset of the vertices that can reach ; similarly, ’s entries may contain only a subset of the vertices reachable from . In addition, the labels must satisfy the cover property: for any two vertices and , there exists a vertex on a shortest path from to that belongs to both and . Based on this property, the least cost from to is computed as follows:
Hence, the least cost from to can be computed by scanning and to find their matching label entries. If the label entries in each label set are sorted by vertex, can be computed in time using a merge-join-like algorithm.
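The merge-join evaluation can be sketched as follows. The sketch assumes each label is a list of `(hub, distance)` pairs sorted by hub identifier; strings are used here for illustration. It returns the minimum of d(s, h) + d(h, t) over hubs h that appear in both the out-label of s and the in-label of t, or infinity when no hub is shared. Each list is scanned once, so the time is linear in the two label sizes. The name `two_hop_query` is hypothetical.

```python
from typing import List, Tuple

Label = List[Tuple[str, float]]  # (hub vertex, distance), sorted by hub


def two_hop_query(out_s: Label, in_t: Label) -> float:
    """Least cost from s to t given out-label of s and in-label of t (merge join)."""
    best = float("inf")
    i = j = 0
    while i < len(out_s) and j < len(in_t):
        hub_s, d_s = out_s[i]
        hub_t, d_t = in_t[j]
        if hub_s == hub_t:              # common hub: candidate path s -> hub -> t
            best = min(best, d_s + d_t)
            i += 1
            j += 1
        elif hub_s < hub_t:             # advance whichever side has the smaller hub
            i += 1
        else:
            j += 1
    return best
```

For instance, with out-label `[("a", 2.0), ("s", 0.0)]` for s and in-label `[("a", 3.0), ("s", 10.0), ("t", 0.0)]` for t, the hubs `a` and `s` match. The query returns min(2 + 3, 0 + 10) = 5.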
We note that building a 2-hop labeling of minimal size (where the index size is defined as ) that satisfies the cover property is NP-hard . Thus, existing methods [9, 1, 4, 2] all use heuristics to approximate the minimal 2-hop labeling index. Alternatively, an all-pairs shortest path algorithm could be used to generate the index. Although this works, it requires an index of size , which is unacceptable for large graphs.
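Since the text leaves the construction heuristic open, the sketch below uses pruned landmark labeling (PLL), one well-known heuristic for building 2-hop labels on directed graphs with non-negative weights. We make no claim that it is the construction used here. Vertices are processed in a fixed order, assumed here to be by descending degree. Each vertex runs a forward and a backward Dijkstra search that stops expanding at any vertex whose distance the labels built so far already answer. Labels are stored as dictionaries rather than sorted lists for brevity, and all names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]
Labels = Dict[str, Dict[str, float]]   # vertex -> {hub: distance}
INF = float("inf")


def label_query(l_out: Labels, l_in: Labels, u: str, v: str) -> float:
    """2-hop query: min over common hubs h of d(u, h) + d(h, v)."""
    out_u, in_v = l_out.get(u, {}), l_in.get(v, {})
    return min((out_u[h] + in_v[h] for h in out_u.keys() & in_v.keys()), default=INF)


def build_pll(graph: Graph) -> Tuple[Labels, Labels]:
    """Pruned landmark labeling for a directed graph with non-negative weights."""
    rev: Graph = {}
    for u, edges in graph.items():
        for v, w in edges:
            rev.setdefault(v, []).append((u, w))
    vertices = set(graph) | set(rev)
    degree = {v: len(graph.get(v, [])) + len(rev.get(v, [])) for v in vertices}
    order = sorted(vertices, key=lambda v: (-degree[v], v))   # high degree first
    l_out: Labels = {v: {} for v in vertices}
    l_in: Labels = {v: {} for v in vertices}
    for hub in order:
        # Forward pass stores d(hub, u) in l_in[u]; backward pass stores d(u, hub) in l_out[u].
        for adj, forward in ((graph, True), (rev, False)):
            dist, done = {hub: 0.0}, set()
            heap = [(0.0, hub)]
            while heap:
                d, u = heapq.heappop(heap)
                if u in done:
                    continue
                done.add(u)
                if forward:
                    known = label_query(l_out, l_in, hub, u)
                else:
                    known = label_query(l_out, l_in, u, hub)
                if known <= d:
                    continue              # already covered by earlier hubs: prune here
                if forward:
                    l_in[u][hub] = d
                else:
                    l_out[u][hub] = d
                for v, w in adj.get(u, []):
                    if d + w < dist.get(v, INF):
                        dist[v] = d + w
                        heapq.heappush(heap, (d + w, v))
    return l_out, l_in
```

Pruning keeps the labels small. A search from a later hub stops as soon as an earlier hub already certifies the distance, yet every pair remains covered by some shared hub.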
|Inverted label||Label entries|