Recently, technological advances in devices such as smartphones and automobile navigation systems have allowed users to obtain real-time location information easily. This has triggered the development of location-based services such as Foursquare, which exploit rich location information to improve service quality. Users of location-based services often want to find short routes that pass through multiple Points-of-Interest (PoIs); consequently, trip planning queries that find the shortest route passing through user-specified categories have attracted considerable attention (dai2016personalized; li2005trip). If multiple PoI categories, e.g., restaurant and shopping mall, are given as an ordered list (i.e., a category sequence), the trip planning query searches for a sequenced route that passes through PoIs matching the user-specified categories in order.
Example 1.1.
Figure 1 shows a road network with the following PoIs: “Asian restaurant”, “Italian restaurant”, “Gift shop”, “Hobby shop”, and “Arts & Entertainment (AE)”. Assume that a user wants to go to an Asian restaurant, an AE place, and a gift shop in this order from start point . The sequenced route query outputs route because it is the shortest route from that satisfies the user requirement (Asian restaurant, AE, gift shop).
Existing approaches find the shortest route based on the user query. However, such approaches may find an unexpectedly long route because the found PoIs may be distant from the start point. A major problem with the existing approaches is that they only output routes that perfectly match the given categories (eisner2012sequenced; ohsawa2012sequenced; sharifzadeh2008optimal). To overcome this problem, we introduce flexible similarity matching based on PoI category classification to find shorter routes. In the real world, category classification often forms a semantic hierarchy, which we refer to as a category tree. For example, in Foursquare (https://developer.foursquare.com/categorytree), the “Food” category tree includes “Asian restaurant,” “Italian restaurant,” and “Bakery” as subcategories, and the “Shop & Service” category tree includes “Gift shop,” “Hobby shop,” and “Clothing store” as subcategories (Figure 2). We employ this semantic hierarchy to evaluate routes in terms of two aspects, i.e., route length and the semantic similarity between the categories of the PoIs in the route and those specified in the user query. As a result, we can find effective sequenced routes that semantically match the user requirement. For example, in Figure 1, route satisfies the user requirement because it semantically matches the category sequence: Italian and Asian restaurants are in the same category tree. However, this approach may find a very large number of sequenced routes because the number of PoIs that flexibly match the given categories increases significantly. To reduce the number of routes to be output, we employ the skyline concept (borzsony2001skyline), i.e., we restrict ourselves to searching for the routes that are not worse than any other routes in terms of their scores (i.e., numerical values to evaluate the routes).
Based on this concept, we propose the skyline sequenced route (SkySR) query, which applies the skyline concept to the route length and semantic similarity (i.e., we consider route length and semantic similarity as route scores). Given a start point and a sequence of PoI categories, a SkySR query searches for sequenced routes that are no worse than any other routes in terms of length and semantic similarity.
Example 1.2.
Table 1 shows real-world examples of sequenced routes in New York City where a user plans to go to a cupcake shop, an art museum, and then a jazz club in this order. The existing approaches output a single route that matches the user’s requirement perfectly. The proposed approach outputs three additional routes that are shorter than the route found by the existing approaches. Note that the additional routes also satisfy the user query semantically. The user can select a preferred route among the four routes depending on how far they are willing to walk or how much time they have.
|Approach||Length||PoI categories visited in order|
|Existing||3239 meters||Cupcake Shop → Art Museum → Jazz Club|
|Proposed||3239 meters||Cupcake Shop → Art Museum → Jazz Club|
|Proposed||1858 meters||Dessert Shop → Art Museum → Jazz Club|
|Proposed||1392 meters||Dessert Shop → Museum → Jazz Club|
|Proposed||823 meters||Dessert Shop → Museum → Music Venue|
|SkySR (proposed)||Network||Total||Yes or No||Exact||Length and semantic|
|Optimal sequenced route (OSR) (sharifzadeh2008optimal)||Euclidean or Network||Total||Yes or No||Exact||Length|
|Sequenced route (eisner2012sequenced; ohsawa2012sequenced)||Network||Total||Yes||Exact||Length|
|Personalized sequenced route (dai2016personalized)||Euclidean||Total||No||Approximate||Length and rating|
|Trip planning (li2005trip)||Euclidean or Network||Non||Yes||Approximate||Length|
|Multi rule partial sequenced route (chen2008multi)||Euclidean||Partial||No||Approximate||Length|
|Multi rule partial sequenced route (li2013optimal)||Euclidean||Partial||No||Exact||Length|
|Multi-type nearest neighbor (ma2006exploiting)||Euclidean||Non||No||Exact||Length|
The SkySR query can provide effective trip plans; however, it incurs significant computational cost because a large number of routes can match the user requirement. Therefore, the SkySR query requires an efficient algorithm. The challenge is to search for SkySRs efficiently by reducing the search space without sacrificing the exactness of the result. We propose the bulk SkySR algorithm (BSSR for short), which finds exact SkySRs efficiently. Recall that a feature of SkySRs is that their scores are no worse than those of other sequenced routes. BSSR exploits the branch-and-bound algorithm (lawler1966branch), which effectively prunes unnecessary routes based on the upper and lower bounds of route scores. In addition, to further improve efficiency, we employ four techniques to optimize BSSR: (1) we initially find sequenced routes to calculate the upper bound; (2) we tighten the upper bound by arranging the priority queue; (3) we tighten the lower bound by introducing minimum distances; and (4) we keep intermediate results for later processing, which we refer to as on-the-fly caching. Our approach significantly outperforms existing approaches in terms of response time (up to four orders of magnitude) without increasing memory usage or sacrificing the exactness of the result.
The main contributions of this paper are as follows.
We introduce a semantic hierarchy to the route search query, which allows us to search for routes flexibly.
We propose the skyline sequenced route (SkySR) query, which finds all preferred routes related to a specified category sequence with a semantic hierarchy (Section 4).
We propose an exact and efficient algorithm and its optimization techniques to process SkySR queries (Section 5).
We discuss variations and extensions of the SkySR query. The SkySR query can be applied to various user requirements and environments (Section 6).
We demonstrate that the proposed approach works well in terms of response time and memory usage by performing extensive experiments (Section 7).
We develop a prototype service that employs the SkySR query and conduct a user test to evaluate the usefulness of the SkySR query (Section 8).
The remainder of this paper is organized as follows. Section 2 introduces related work. Section 3 describes the problem formulation, and Section 4 defines the SkySR query. Section 5 presents the proposed algorithm. In Section 6, we discuss variations and extensions of the SkySR query. Sections 7 and 8 present experiment and user test results, respectively, and Section 9 concludes the paper.
2. Related work
First, we review trip planning query studies related to the SkySR query. Then, we review some studies related to the skyline operator. To the best of our knowledge, no study has considered a skyline sequenced route; thus, our problem cannot be solved efficiently using existing approaches.
Trip planning: We categorize trip planning queries in Table 2. Note that all existing trip planning queries only output routes that perfectly match the user-specified category sequences. Moreover, since most trip planning queries assume Euclidean distance, they cannot find SkySRs, in which road network distance is assumed. Dai et al. (dai2016personalized) proposed a personalized sequenced route and assumed that PoIs have ratings as well as categories and that users assign weighting factors as preferences. Although this personalized sequenced route considers route lengths and ratings, it only outputs the route that perfectly matches the given categories and has the best score based on lengths, ratings, and preferences. Only the optimal sequenced route (OSR) is applicable to find SkySRs without modification because the OSR and SkySR are based on the same settings (except for scoring). Sharifzadeh et al. (sharifzadeh2008optimal) proposed two algorithms to find OSRs in road networks: the Dijkstra-based solution and the Progressive Neighbor Exploration (PNE) approach. The main difference between these algorithms is that the Dijkstra-based solution employs the Dijkstra algorithm to search for PoIs and the PNE approach employs the nearest neighbor search. It has been reported that these algorithms are comparable in terms of performance (sharifzadeh2008optimal). Thus, we consider both algorithms to verify the performance of the proposed approach.
Skyline: The skyline operator was proposed previously (borzsony2001skyline). Few studies have considered the skyline concept for route searches. Recently, the skyline route (or skyline path) has received considerable attention (aljubayrin2015skyline; hansen1980bicriterion; kriegel2010route; martins1984multicriteria; shekelyan2015linear; tian2009finding; yang2014stochastic). A skyline route assumes that edges on road networks are associated with multiple costs, such as distance, travel time, and tolls. Here, the objective is to find skyline routes from a start point to a destination considering these multiple costs. However, since we specify a category sequence rather than a destination, we cannot apply conventional algorithms to find SkySRs. The continuous skyline query in road networks (e.g., (huang2005route)) searches for the skyline PoIs for a moving object considering both the PoI category and the distances to the moving object. Because continuous skyline queries search for a single PoI category, these solutions are not applicable to SkySR queries, which obtain routes that pass through multiple PoIs.
|Set of vertices|
|Set of PoI vertices|
|Set of edges|
|Set of categories|
|Category of PoI vertex|
|Category tree of|
|Set of PoI vertices associated with|
|Set of PoI vertices associated with|
|Category sequence (sequence of categories)|
|Route (sequence of PoI vertices)|
|Sequential PoI categories in|
|Length score of|
|Semantic score of|
|Set of routes|
|Set of super-routes of|
|Minimal set of sequenced routes|
|Category sequence specified by user|
|Start point specified by user|
Table 3 summarizes the notations used in this paper. We assume a connected graph , where , , and represent the sets of vertices, PoI vertices, and edges, respectively. This graph corresponds to a road network that contains PoIs. The numbers of vertices, PoI vertices, and edges are denoted , , and , respectively. PoI vertex is associated with category , where is the set of categories. We denote the category of PoI vertex as , and assume that each PoI is associated with a single category. Each category is associated with category tree , and we denote the category tree of category as . We denote the set of PoI vertices associated with and the set of PoI vertices associated with the category tree as and , respectively. If a PoI vertex is associated with category , it is also associated with all ancestor categories of in . Each edge in is associated with a weight . The weight can represent either travel duration or distance. Next, we define several terms required to introduce the skyline sequenced route (SkySR).
Definition 3.1.
(Category sequence) A category sequence is a sequence of categories, where is the size of . denotes the -th category in . A super-category sequence of is a category sequence where each -th category is either or an ancestor of () in the category tree.
Definition 3.2.
(Route) A route is a sequence of PoI vertices in a road network, where and denote the -th PoI vertex in and the size of , respectively. denotes the category sequence of (i.e., ). In addition, we define a super-route of as an extended route of , such as . In other words, a super-route of is obtained by adding a sequence of PoI vertices to the end of . and denote a set of routes and a set of super-routes of , respectively. Moreover, given a route and a PoI vertex , we define .
Definition 3.3.
(Category similarity) Given two categories and , the similarity is calculated by an arbitrary function such as the Wu and Palmer similarity or path length (resnik1995using; wu1994verbs). We assume the following relations in the similarity.
is irrelevant to if both exist in different category trees; thus, we obtain .
semantically matches if and are in the same category tree; thus, we obtain .
perfectly matches if and are the same; thus, we obtain .
Note that a semantic match subsumes a perfect match.
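As an illustration of one admissible similarity function, the following minimal sketch computes the Wu and Palmer similarity on a small category forest. The `PARENT` mapping, the function names, and the convention that a root has depth 1 are assumptions made for this example; with that convention, two categories in the same tree always receive a similarity greater than 0, identical categories receive 1, and categories in different trees receive 0, which matches the three relations listed above.

```python
# Hypothetical category forest, stored as child -> parent (roots map to None).
PARENT = {
    "Food": None,
    "Asian restaurant": "Food",
    "Italian restaurant": "Food",
    "Bakery": "Food",
    "Shop & Service": None,
    "Gift shop": "Shop & Service",
    "Hobby shop": "Shop & Service",
}


def ancestors(category):
    """Return the chain [category, parent, ..., root]."""
    chain = []
    while category is not None:
        chain.append(category)
        category = PARENT[category]
    return chain


def wu_palmer(c1, c2):
    """Wu and Palmer similarity with the root at depth 1 (assumed convention).

    Returns 0.0 when the two categories belong to different category trees.
    """
    chain1, chain2 = ancestors(c1), ancestors(c2)
    if chain1[-1] != chain2[-1]:
        return 0.0  # different roots: irrelevant categories
    common = set(chain2)
    # The first shared node when walking up from c1 is the lowest common ancestor.
    lca = next(c for c in chain1 if c in common)
    depth_lca = len(ancestors(lca))
    return 2.0 * depth_lca / (len(chain1) + len(chain2))
```

Under these assumptions, "Asian restaurant" and "Italian restaurant" share the ancestor "Food" and obtain a similarity of 0.5, whereas "Asian restaurant" and "Gift shop" obtain 0.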
We define a sequenced route using the above definitions. The difference between our definition of sequenced route and the previous definition (sharifzadeh2008optimal) is that we consider category similarity.
Definition 3.4.
(Sequenced route) Given category sequence , is a sequenced route of category sequence if and only if it satisfies (i) , (ii) semantically matches for all such that , and (iii) all PoI vertices in differ from each other.
Definition 3.5.
(Route scores) Given category sequence and vertex as a start point, we define two scores for route : length score and semantic score . We define the length score as follows:
where denotes the smallest weight sum of the edges on the routes between vertices (or PoIs) and . The semantic score is calculated by an aggregation function as follows:
where denotes . We assume that, if all , , i.e., if all PoI vertices in a route perfectly match the categories, the semantic score of the given route is 0. We also assume that is the possible minimum semantic score of when it is a sequenced route. Without loss of generality, preferred routes have small length and semantic scores.
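The two scores can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's exact formulas: `dist` is a hypothetical lookup table of precomputed shortest-path distances, and the aggregation function is assumed to be one minus the product of the per-position similarities, which is only one choice that yields 0 for a perfectly matching route.

```python
from math import prod


def length_score(start, route, dist):
    """Sum of shortest-path distances along start -> p_1 -> ... -> p_k.

    `dist[(a, b)]` is assumed to hold the network distance between a and b.
    """
    stops = [start] + list(route)
    return sum(dist[(a, b)] for a, b in zip(stops, stops[1:]))


def semantic_score(route_categories, query_categories, sim):
    """Assumed aggregation: 1 - product of similarities (0 means perfect match).

    Only the positions already visited contribute, so for a partial route the
    value is the smallest score any extension of that route could still reach.
    """
    pairs = zip(route_categories, query_categories)
    return 1.0 - prod(sim(c, q) for c, q in pairs)
```

For example, with `dist = {("s", "a"): 2, ("a", "b"): 3}` the route `["a", "b"]` from `"s"` has a length score of 5.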
4. The skyline sequenced route query
Here, we define the SkySR query. Intuitively, a SkySR is a potential route that may be the best route related to the user’s requirement. A potential route is a route that is not dominated by any other route; the notion of dominance is used in the skyline operator (borzsony2001skyline). We define dominance for sequenced routes and the SkySR query below.
Definition 4.1.
(Dominance) Let be the set of all sequenced routes starting from point for category sequence . For two sequenced routes , we say that dominates if we have (i) and or (ii) and . If two sequenced routes have the same length and semantic scores, the routes are equivalent in the dominance, and a set of sequenced routes is minimal if it has no equivalent routes.
Definition 4.2.
(SkySR query) Given vertex as a start point and category sequence , a skyline sequenced route is a sequenced route not dominated by other routes. Let be the set of all sequenced routes from start point for category sequence , and let be a minimal set of the sequenced routes. The SkySR query returns that includes sequenced routes such that all are SkySRs and all are dominated by or equivalent to some of .
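The dominance test and the minimal result set can be sketched as follows. The sketch assumes the usual skyline reading of dominance (no worse in both scores and strictly better in at least one) and represents each route by a hypothetical `(length_score, semantic_score)` pair; the route names and numbers in the usage note are invented for illustration.

```python
def dominates(a, b):
    """True if score pair `a` dominates `b` (smaller values are better)."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def skyline(scored_routes):
    """Return a minimal skyline from a list of (route, (length, semantic)).

    Routes with identical score pairs are equivalent, so only the first one
    encountered is kept as the representative.
    """
    result = {}
    for route, score in scored_routes:
        if any(dominates(other, score) for _, other in scored_routes):
            continue  # some other route is at least as good in both scores
        result.setdefault(score, route)
    return [(route, score) for score, route in result.items()]
```

Given the candidates `A = (10, 0.0)`, `B = (6, 0.5)`, `C = (8, 0.7)`, and `D = (6, 0.5)`, the sketch keeps `A` and `B`: `C` is dominated by `B`, and `D` is equivalent to `B`. This quadratic filter only illustrates the definition; it is not the search strategy proposed in Section 5.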
A naive solution to find SkySRs is to first enumerate SkySR candidates by iteratively executing OSR queries for all super-category sequences of and then check the dominance among the routes. The number of super-category sequences of increases exponentially as the depth of the category in the category tree and the size of increase. Thus, although OSR algorithms can find a sequenced route efficiently, we must repeat many searches. As a result, the naive solution incurs a very high computational cost to find SkySRs.
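To make the growth concrete, the following sketch enumerates the super-category sequences that the naive solution would have to query one by one. The `parent` mapping and the function name are hypothetical; the count is the product of the category depths, so it grows exponentially with the length of the category sequence.

```python
from itertools import product


def super_category_sequences(sequence, parent):
    """Enumerate all super-category sequences of `sequence`.

    `parent` maps each category to its parent (None for a root). Each position
    may keep its category or replace it with any of its ancestors.
    """
    chains = []
    for category in sequence:
        chain = []
        while category is not None:
            chain.append(category)
            category = parent[category]
        chains.append(chain)
    return list(product(*chains))
```

With two categories of depth 2 each, this yields 2 x 2 = 4 sequences; with ten categories of depth 3 each, it would already yield 3^10 = 59049 OSR queries.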
5. Proposed Algorithm
In this section, we present the proposed approach, the bulk SkySR algorithm (BSSR), which finds SkySRs efficiently. Section 5.1 presents the BSSR design policy, and Section 5.2 explains the BSSR procedure. In Section 5.3, we propose optimization techniques for BSSR. We also theoretically analyze its performance in Section 5.4. Finally, we show a running example of BSSR in Section 5.5. In Section 5, we assume undirected graphs in which each PoI vertex is associated with only one category and that users give sequences of single PoI categories. However, in a real application, the graphs would be directed, each PoI vertex could be associated with multiple categories, and users may specify complex categories. Section 6 describes how we handle these conditions.
5.1. Design Policy
Our idea to improve efficiency is to find sequenced routes simultaneously (i.e., by searching for sequenced routes in bulk) in order to reduce the search space. We have two choices as the basis for our approach: the Dijkstra-based and the nearest neighbor-based approaches (sharifzadeh2008optimal). We use the Dijkstra-based approach as the basis of our algorithm. Recall that a SkySR query has two scores for a route, i.e., length and semantic scores. To find all SkySRs, we must find routes that have small semantic scores even if the routes have large length scores. However, PoIs that are included in the routes with small semantic scores could be distant from the start point. Although the nearest neighbor-based approach finds the closest PoIs, it cannot efficiently find such distant PoIs. On the other hand, the Dijkstra-based approach searches for all PoI vertices that match a PoI category. Therefore, the Dijkstra-based approach is more suitable for the SkySR query than the nearest neighbor-based approach.
Although our approach finds sequenced routes simultaneously, it entails a large number of executions of the Dijkstra algorithm. This is because the number of possible routes grows with the number of PoI candidates, so bulk search alone does not reduce the search space effectively. To effectively reduce the search space, we exploit the branch-and-bound algorithm, which uses the upper and lower bounds of a branch of the search space to solve an optimization problem effectively. With BSSR, each branch corresponds to a route. We compute the upper and lower bounds while finding the set of SkySRs. Specifically, we compute the upper bound of a route from the already found sequenced routes, and we compute the lower bound from the route currently being searched (i.e., not yet a sequenced route). With the upper and lower bounds, we can safely prune unnecessary routes to improve efficiency.
To further increase efficiency, we propose optimization techniques for BSSR. In order to exploit the branch-and-bound algorithm, it is necessary to initialize the upper bound. Thus, we first search for a sequenced route to initialize the upper bound. However, finding a sequenced route can itself incur a high computational cost. Therefore, we propose a nearest neighbor-based initial search method (NNinit) that finds sequenced routes efficiently by greedily finding PoI vertices. In addition, to effectively update the upper bound, we assign a priority to each route and use the priority queue to efficiently find routes that are likely to give an effective upper bound. To tighten the lower bound, we compute the possible minimum distance and add it to the length score of a route to safely prune unnecessary routes. Moreover, to avoid executing the Dijkstra algorithm iteratively from the same vertices, we materialize search results of the Dijkstra algorithm and reuse them to search for the PoI vertices. By using BSSR with these optimization techniques, we can perform the SkySR query efficiently.
5.2. Bulk SkySR algorithm
The bulk SkySR algorithm (BSSR) finds all SkySRs by searching for sequenced routes simultaneously while checking dominance on demand. The naive solution must execute OSR queries for all super-category sequences of one by one because it only searches for the PoIs that perfectly match the given category. In contrast, BSSR searches for all PoIs that semantically match the given category.
The basic process of BSSR is simple, as shown in Algorithm 1: (i) start searching the PoI vertices that match the first category from start point and insert the routes found into priority queue , which stores all found routes (line 4); (ii) fetch a route from (line 6); (iii) search for the next PoI vertices that semantically match the next category from PoI vertex , which is the end of the fetched route, and insert the fetched route extended with each of the found PoI vertices into (lines 7–9); and (iv) if is not empty, return to (ii); otherwise, output the minimal set of sequenced routes (line 10). In steps (i) and (iii), we find PoI vertices from the end of the fetched route using a Dijkstra algorithm modified for the SkySR query, as described in Section 5.2.2.
We search for sequenced routes simultaneously to reduce the search space. Our idea to safely reduce the search space is to exploit the branch-and-bound algorithm, which can reduce unnecessary search space. This section describes the theoretical background of using the branch-and-bound algorithm. We use the following three lemmas to reduce the search space:
Lemma 5.1.
Let be a minimum set of sequenced routes while searching for SkySRs and be the minimum set of sequenced routes after finding SkySRs. If sequenced route is dominated by a sequenced route in , cannot be included in .
proof: From Definition 4.2, we search for a set of SkySRs, which are not dominated by the other sequenced routes. If we find a sequenced route not dominated by any sequenced routes in , we update by inserting the new sequenced route and deleting a sequenced route dominated by the new one. Therefore, any sequenced routes in after the update are not dominated by any sequenced routes in prior to the update. As a result, sequenced routes in are not dominated by any sequenced routes in . In other words, is not included in if we have sequenced route in such that and .
Lemma 5.2.
Let be a set of super-routes of starting from the same start point. For any route in , the length and semantic scores and cannot be less than and , respectively.
Therefore, we have . is the possible minimum semantic score of when it becomes a sequenced route. Thus, even if PoI vertices are added to , we have . As a result, we have and .
Lemma 5.3.
(pruning condition) If (i) is a sequenced route included in the set of sequenced routes and (ii) and , any routes in cannot be included in .
proof: If we have and , is not included in (Lemma 5.1). From Lemma 5.2, the scores of cannot become less than and even if we expand . Therefore, any routes in cannot be included in because is dominated by or equivalent to the sequenced route with and .
Lemma 5.3 gives us the length score threshold for a route, and, if the length score of a route is greater than this threshold, we can prune the given route. We define the length score threshold of a route as follows:
Definition 5.4.
The threshold of the length score of route is given by the following equation:
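A minimal sketch of this threshold follows, assuming the reading suggested by Lemma 5.3 and Section 5.3.1: the threshold of a route is the smallest length score among the already found sequenced routes whose semantic scores are not greater than that of the route, and it is undefined (treated here as infinity) when no such sequenced route exists. The function names and the `(length, semantic)` pair representation are hypothetical.

```python
import math


def length_threshold(route_semantic, found):
    """Smallest length score among found routes that are semantically no worse.

    `found` holds (length_score, semantic_score) pairs of sequenced routes.
    Returns infinity when no found route qualifies, i.e. nothing can be pruned.
    """
    candidates = [l for l, s in found if s <= route_semantic]
    return min(candidates, default=math.inf)


def can_prune(route_length, route_semantic, found):
    """A route is pruned once its length score reaches its threshold."""
    return route_length >= length_threshold(route_semantic, found)
```

For instance, with found routes `(15, 0.0)` and `(12, 0.5)`, a partial route with semantic score 0.5 is pruned as soon as its length score reaches 12, whereas a partial route with semantic score 0.0 is pruned only at 15.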
5.2.2. The modified Dijkstra Algorithm
We search the next PoI vertices that semantically match the next PoI category using the modified Dijkstra algorithm. The modified Dijkstra algorithm can prune unnecessary routes based on Lemma 5.3. Moreover, based on the following lemma, it terminates unnecessary traversal of the graph and avoids inserting unnecessary routes.
Lemma 5.5.
Let be a route and be a PoI vertex on a path between and . Route must be dominated by or equivalent to another route if we have .
proof: Let be a route such that the difference between and is only in and . Since the PoI vertex is on the path between and , we have based on triangle inequality (i.e., ). Moreover, if , we have . Therefore, is dominated by or equivalent to because and .
Lemma 5.5 gives us two properties for the SkySR query: (i) if the path to a found PoI vertex passes through another PoI vertex whose category similarity is at least as good, we can ignore the found PoI vertex, and (ii) if we find a PoI vertex that perfectly matches the given category, we do not need to traverse the graph beyond that PoI vertex. As a result, using Lemmas 5.3 and 5.5, we can efficiently find the next PoI vertices.
Algorithm 2 shows the pseudocode for the modified Dijkstra algorithm, which is used to find PoI vertices that semantically match from . In priority queue for the modified Dijkstra algorithm, the top vertex is the closest vertex to . The queue is initialized to (line 3). The closest vertex to is dequeued from (line 5). is a route expanded from , which is with fetched vertex (line 7). If the length score of is greater than or equal to the threshold of , the modified Dijkstra algorithm terminates the process (Lemma 5.3) (line 8). We check whether (i) semantically matches and (ii) does not proceed through another PoI vertex whose category similarity is greater than or equal to that of (line 9). If we satisfy the above conditions and the length score of is less than its threshold (line 10), we insert into the priority queue or the set of sequenced routes (lines 10–12). Otherwise, we skip the process to insert (Lemma 5.3 and 5.5). The neighbor vertices of are inserted into unless perfectly matches (Lemma 5.5) (lines 13–17).
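The following is a simplified sketch of such a search, not a transcription of Algorithm 2. It assumes an adjacency-list graph, a hypothetical `sim` mapping from PoI vertices to their similarity with the next category (1.0 for a perfect match, absent or 0 for no match), and a single `budget` that stands in for the length threshold; in the paper the threshold depends on the semantic score of each extended route, and PoIs already in the route are excluded, which the sketch omits.

```python
import heapq
import math


def next_pois(graph, source, sim, budget=math.inf):
    """Find candidate next PoIs from `source` as (vertex, distance, similarity).

    graph:  {vertex: [(neighbor, weight), ...]}
    sim:    {poi_vertex: similarity to the next category}
    budget: stop once the distance reaches this value (stand-in for Lemma 5.3)
    """
    dist = {source: 0.0}
    passed = {source: 0.0}  # best similarity of a PoI on the path before v
    heap = [(0.0, source)]
    settled = set()
    found = []
    while heap:
        d, v = heapq.heappop(heap)
        if v in settled:
            continue
        settled.add(v)
        if d >= budget:
            break  # every remaining vertex is at least this far away
        s = 0.0 if v == source else sim.get(v, 0.0)
        if s > passed[v]:
            found.append((v, d, s))  # not preceded by an equally good PoI
        if s >= 1.0:
            continue  # perfect match: no need to traverse beyond it
        carry = max(passed[v], s)
        for w, weight in graph.get(v, ()):
            nd = d + weight
            if w not in settled and nd < dist.get(w, math.inf):
                dist[w] = nd
                passed[w] = carry
                heapq.heappush(heap, (nd, w))
    return found
```

On a path graph `s - a - b - c - d` with unit weights and similarities `a = 0.5`, `b = 0.5`, `c = 1.0`, `d = 1.0`, the sketch reports `a` and `c` only: `b` lies behind the equally similar `a`, and `d` lies behind the perfect match `c`.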
5.3. Optimization techniques
In this section, we propose four optimization techniques for BSSR. Section 5.3.1 explains an initial search for sequenced routes and proposes NNinit. We then explain tightening the upper and the lower bounds in Section 5.3.2 and Section 5.3.3, respectively. Furthermore, in Section 5.3.4 we propose an on-the-fly caching technique to reuse previous search results of the modified Dijkstra algorithm.
5.3.1. Initial search
We prune unnecessary routes efficiently using the branch-and-bound algorithm. However, based on Equation (3), we cannot calculate the threshold of if there are no sequenced routes in whose semantic scores are not greater than that of . Therefore, initially, we search for the sequenced route whose semantic score is 0. However, the length score of the sequenced route can be large if its semantic score is 0. To tighten the threshold, we also search for sequenced routes whose semantic scores are greater than 0 because their length scores are less than that of the sequenced route with a semantic score of 0. We initially find several sequenced routes to tighten the upper bound.
We propose NNinit, which searches for several sequenced routes efficiently. NNinit performs a nearest neighbor search repeatedly to find PoI vertices that perfectly match the given categories. With this process, we can find a sequenced route whose semantic score is 0. Moreover, NNinit can find the PoI vertex that semantically matches the given category during the nearest neighbor search. When we find the last visited PoI vertex, we may find PoI vertices that semantically match the last category in . Therefore, we can obtain sequenced routes whose semantic scores are greater than 0 and length scores are small. As a result, NNinit can find several sequenced routes without incurring additional cost, and one of the sequenced routes has a semantic score of 0.
We present the pseudocode for NNinit in Algorithm 3. Here, priority queue is initialized to start point (line 3). NNinit repeats the Dijkstra algorithm times to find sequenced routes (line 4). The Dijkstra algorithm is executed to search for the closest PoI vertex that perfectly matches from the initial vertex (the first initial vertex is ) (lines 5–19). Here, the closest vertex to the initial vertex is dequeued from (line 7). If the algorithm finds a PoI vertex that perfectly matches , this vertex is added to and is initialized to the PoI vertex (lines 12–15). When it finds the last PoI vertex that semantically matches , it inserts the sequenced route into (lines 9–11). Finally, we obtain a set of sequenced routes, and one of the sequenced routes in has a semantic score of 0.
Example 5.6.
We show an example of NNinit using Example 1.1, which searches for an Asian restaurant, an AE place, and a gift shop in this order from start point . NNinit executes the Dijkstra algorithm three times because the size of category sequence is three. First, NNinit searches for PoI vertices that perfectly match Asian restaurant from . Then, it finds that is the closest PoI to that perfectly matches Asian restaurant. Next, it searches for the closest PoI vertex to that perfectly matches AE and then finds . In the final search, NNinit inserts sequenced routes into when it finds PoI vertices that semantically match gift shop. NNinit finds , whose category is Shop & Service (i.e., a semantic match), and thus inserts into . After finding , it finds , which perfectly matches gift shop, and inserts into . Finally, NNinit returns including . The length score of is 12, which is less than the length score of of 15.
5.3.2. Tightening upper bound: Arranging routes in the priority queue
We use the upper bound to prune unnecessary routes. The upper bound is computed from the obtained sequenced routes. To tighten the upper bound, it is important to efficiently find sequenced routes that have small length and semantic scores. BSSR extends a route at the top of the priority queue to search for a sequenced route, as shown in Algorithm 1. Note that priority queues in existing algorithms conventionally consider only distances (i.e., a distance-based priority queue). If we use a distance-based priority queue, BSSR preferentially extends a route with a small length score. Although we must increase the size of a route to to find a sequenced route, a route that has a small length score likely has a small size. Therefore, it is difficult to search for sequenced routes efficiently using a distance-based priority queue.
To search for sequenced routes efficiently, we preferentially extend a route that has a large size. Here, since many routes in the priority queue could have the same size, we must consider an additional priority, which is expected to affect performance. If multiple routes in the priority queue are the same size, we preferentially extend the route with the smallest semantic score. We can reduce the search space by searching for sequenced routes in ascending order of semantic score. Moreover, if routes are the same size and have the same semantic score, we preferentially extend the route with the smallest length score. As a result, we can efficiently obtain sequenced routes with small length and semantic scores.
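This three-level priority maps directly onto a tuple key in a binary heap. The sketch below uses Python's `heapq`; the helper names are hypothetical, and routes are represented as tuples of PoI identifiers so that entries with fully tied keys remain comparable.

```python
import heapq


def push(queue, route, semantic, length):
    """Queue a route: larger size first, then smaller semantic, then length."""
    heapq.heappush(queue, ((-len(route), semantic, length), route))


def pop(queue):
    """Fetch the route that should be extended next."""
    return heapq.heappop(queue)[1]


def drain_order(entries):
    """Return the routes of `entries` in the order BSSR would extend them."""
    queue = []
    for route, semantic, length in entries:
        push(queue, route, semantic, length)
    return [pop(queue) for _ in range(len(queue))]
```

Negating the route size turns the min-heap into "largest size first" while keeping both scores in ascending order. A distance-based queue would instead use `length` alone as the key, which favors short, small routes.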
5.3.3. Tightening lower bound: Possible minimum length score
As described in Section 5.2.1, we use the length scores of routes as the lower bound, i.e., we prune a route if the length score of the route is not less than the threshold. Note that the length score of the route increases as the route size increases. This indicates that it is difficult to prune routes before the route size increases. Our approach to tighten the lower bound of the route is to estimate the increase of the length score. However, if we carelessly estimate a future length score, we may sacrifice the exactness of the result.
The basic idea of this estimation is to calculate the possible minimum distance. Here, we compute the smallest distance among any pair of PoI vertices in sets of PoI vertices. We use the following two minimum distances, semantic-match minimum distance and perfect-match minimum distance :
Definition 5.7.
We compute the semantic-match minimum distance based on the distance to the PoI vertices that semantically match the next category. We can safely add the semantic-match minimum distance to the current length score without restriction. However, the semantic-match minimum distance is usually much less than the threshold, so it alone rarely improves pruning performance. We therefore also use the perfect-match minimum distance, which is computed based on the distance to the PoI vertices that perfectly match the next category. Because the perfect-match minimum distance is much greater than the semantic-match minimum distance, it tightens the lower bound further. However, we can use the perfect-match minimum distance only in a special case, i.e., where a route must pass only through PoIs that perfectly match the given categories so as not to be dominated. The perfect-match minimum distance works well if the number of sequenced routes in is large because this constraint is more often satisfied as the number of sequenced routes in increases.
Lemma 5.8 ().
Let and be sequenced routes in and be a route such that (i) and and (ii) and . Let be the minimum increment of a semantic score (the least increase of the semantic score is computed from the category tree; specifically, we can compute the least increase from the category that is most similar, but not equal, to the next category). We can prune if we have (a) and and (b) and .
proof: First, we consider case (a). If we have and , is dominated by or equivalent to if its semantic score increases. Therefore, must pass only through PoI vertices that perfectly match the given categories so as not to be dominated. If passes through only PoI vertices that perfectly match the given categories, the length score of increases by at least . For case (b), if we have and , is dominated by or equivalent to if its length score increases by . As a result, if we have two routes and such that (i) and and (ii) and , is dominated by or equivalent to at least one of and .
To compute the estimation of the lower bound, we compute two types of possible minimum distances and . A naive approach computes all minimum distances from the PoI vertices that semantically match to for by iteratively executing the Dijkstra algorithm. However, this has a high computational cost. To reduce the cost, we execute a multi-source multi-destination Dijkstra algorithm. In this algorithm, all start points are inserted into the same priority queue. Then, the algorithm dequeues vertices in the same manner as the conventional Dijkstra algorithm. Here, the process is terminated if the top of the priority queue becomes one of the destinations. This approach only needs times to compute the possible minimum distance. The multi-source multi-destination Dijkstra algorithm guarantees the minimum distance by the following lemma:
Lemma 5.9 ().
The multi-source multi-destination Dijkstra algorithm guarantees the minimum distance from the start points to the destinations.
proof: We first insert multiple start points into the priority queue, and their distances from the start points are initialized as 0. If we find a vertex, it is inserted into the queue and the distance to the vertex is updated from the closest start point to the vertex. The vertex with the smallest distance from the start point in the priority queue is dequeued from the priority queue. If the top vertex in the priority queue is one of the destinations, there are no destinations with smaller distance than the top one. Therefore, we can guarantee the minimum distance from the start points to the destinations.
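The multi-source multi-destination search described above can be sketched as follows. This is a minimal sketch under stated assumptions rather than the paper's Algorithm 4: the adjacency-list representation, the function name `multi_source_min_distance`, and the toy graph are hypothetical, and edge weights are assumed to be non-negative.

```python
import heapq
from typing import Dict, Hashable, Iterable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def multi_source_min_distance(graph: Graph,
                              sources: Iterable[Hashable],
                              destinations: Iterable[Hashable]) -> float:
    """Smallest distance from any source to any destination.

    All sources enter one priority queue with distance 0; the search stops as
    soon as a destination reaches the top of the queue.
    """
    dest = set(destinations)
    dist = {s: 0.0 for s in sources}
    heap = [(0.0, s) for s in dist]
    heapq.heapify(heap)
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale queue entry
        settled.add(u)
        if u in dest:
            return d  # no destination can be closer than the queue top
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")  # no destination is reachable


def undirected(edges: Iterable[Tuple[Hashable, Hashable, float]]) -> Graph:
    """Build an undirected adjacency list from (u, v, weight) triples."""
    graph: Graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


# Toy graph (hypothetical): sources {a, b}, destinations {c, d}.
toy = undirected([("a", "x", 4), ("b", "x", 1), ("x", "c", 2),
                  ("x", "d", 5), ("a", "d", 7)])
closest = multi_source_min_distance(toy, ["a", "b"], ["c", "d"])
```

In the toy graph the closest source-destination pair is b and c (path b, x, c), so a single search replaces one Dijkstra run per source.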
Algorithm 4 shows the pseudocode to compute the semantic-match minimum distance. The estimation of the lower bound is executed after line 4 in Algorithm 1. Here, we initialize and (lines 3–4). denotes the threshold for a route whose semantic score is 0. The difference between computing the semantic-match and perfect-match minimum distances is whether the PoI vertices in semantically or perfectly match the given category.
Example 5.10 ().
We show an example of computing the semantic-match minimum distance using Example 1.1. , , and include , , and , respectively. First, the PoI vertices in are inserted into priority queue , and the set of destinations is . By running the Dijkstra algorithm, we compute possible minimum distance (from to ). Next, we search from the PoI vertices that semantically match AE to those that semantically match gift shop. Then, we compute (from to ). Finally, we obtain semantic-match minimum distance . We can compute the perfect-match minimum distance in the same way and obtain , which is greater than .
5.3.4. Reuse of the temporal result: On-the-fly caching technique
Although BSSR efficiently prunes unnecessary routes, it may repeatedly execute the modified Dijkstra algorithm from the same vertex because, in Algorithm 1 (line 8), could be the same as in former executions of the modified Dijkstra algorithm. Thus, we reuse the result starting at the same PoI vertex by materializing the result of the modified Dijkstra algorithm (i.e., keeping PoI vertices matching and distances from to the PoI vertices), which we refer to as on-the-fly caching.
After finding SkySRs, on-the-fly caching frees the results of the modified Dijkstra algorithms (this is why we call it on-the-fly), because the search space rarely overlaps across different inputs (i.e., and differ).
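On-the-fly caching can be sketched as a per-query memoization table. The class below is an illustrative assumption, not the authors' data structure: the name `OnTheFlyCache`, the `(vertex, category)` cache key, and the `search` callback standing in for the modified Dijkstra algorithm are hypothetical, and the sketch ignores that the threshold may differ between executions.

```python
from typing import Callable, Dict, Hashable, List, Tuple

# A search result: PoI vertices matching the category, with their distances.
Result = List[Tuple[Hashable, float]]


class OnTheFlyCache:
    """Materialize search results for the duration of one query."""

    def __init__(self, search: Callable[[Hashable, str], Result]) -> None:
        self._search = search  # stand-in for the modified Dijkstra algorithm
        self._cache: Dict[Tuple[Hashable, str], Result] = {}
        self.misses = 0  # number of searches actually executed

    def lookup(self, vertex: Hashable, category: str) -> Result:
        """Return the cached result, running the search only on a miss."""
        key = (vertex, category)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._search(vertex, category)
        return self._cache[key]

    def clear(self) -> None:
        """Free all materialized results once the SkySRs have been found."""
        self._cache.clear()
```

Calling `clear()` after each query mirrors the observation above that cached results are rarely useful across different inputs, so memory is not held between queries.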
5.4. Theoretical Analysis
In this section, we theoretically analyze the cost and correctness of the proposed BSSR.
Theorem 1 ().
(Time complexity) Let be a ratio of pruning and be a ratio of the size of a graph to find the SkySRs. The time complexity of BSSR is .
proof: The time complexity of the Dijkstra algorithm is if the number of vertices is . In our setting, we have vertices because we have two types of vertices. In addition, we do not need to search the whole graph because the threshold reduces the portion of the graph that is searched. Therefore, the time complexity of the modified Dijkstra algorithm is . The time complexity of BSSR depends on the number of times the modified Dijkstra algorithm is executed, which is equal to the number of potential routes . Recall that we can reduce the number of routes using the branch-and-bound algorithm. Finally, the time complexity of BSSR is .
In our approach, and depend on the upper and lower bounds. These are affected by the graph structure, the category trees, and the ratio of PoI vertices, and the time complexity of BSSR depends on these factors.
Theorem 2 ().
(Space complexity) Let be the pruning ratio, and be the ratio of the size of the graph to find the SkySRs. The space complexity of BSSR is .
proof: We store the whole graph of size . We also store routes in the priority queue and , and the maximum number of routes is . We can reduce the number of routes using the branch-and-bound algorithm. The size of the routes is proportional to . Therefore, the space complexity of BSSR is .
If the number of routes in the priority queue is small, the graph size becomes the main factor related to the memory usage. Otherwise, the number of routes in the priority queue is the main factor.
Theorem 3 ().
(Correctness) BSSR guarantees the exact result.
proof: BSSR prunes routes based on the upper and lower bounds. BSSR safely prunes routes dominated by or equivalent to the obtained sequenced routes. As a result, BSSR does not sacrifice the exactness of the search result.
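The "dominated by or equivalent to" test that underlies this argument can be sketched on pairs of scores. This is a minimal sketch under assumptions: each route is reduced to a hypothetical `(length score, semantic score)` pair in which smaller values are better for both, and the function names are illustrative.

```python
from typing import List, Tuple

Scores = Tuple[float, float]  # (length score, semantic score); smaller is better


def dominates_or_equals(a: Scores, b: Scores) -> bool:
    """True if b is dominated by or equivalent to a."""
    return a[0] <= b[0] and a[1] <= b[1]


def update_skyline(skyline: List[Scores], candidate: Scores) -> List[Scores]:
    """Insert candidate unless an existing route already covers it.

    Routes that the candidate dominates (or equals) are removed, so the
    returned list never contains a route that is worse than another one.
    """
    if any(dominates_or_equals(s, candidate) for s in skyline):
        return skyline
    kept = [s for s in skyline if not dominates_or_equals(candidate, s)]
    return kept + [candidate]
```

For instance, starting from an empty set and inserting the illustrative pairs (10, 0.0), (7, 0.4), and (6, 0.4) in turn leaves (10, 0.0) and (6, 0.4): the third pair replaces the second, while the first survives because its semantic score is better.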
5.5. Running Example
We demonstrate BSSR with optimization techniques using Example 1.1. Table 4 shows routes in priority queue and sequenced routes in . To compute category similarity and semantic score, we use Equations (6) and (7), respectively.
First, we process NNinit, and initially includes , .
1st step: BSSR starts to find PoI vertices that semantically match Asian restaurant from with the threshold of 15. Then, it finds , , , , and . Both ’s and ’s category similarities are 1, and their lengths are 6 and 8, respectively. Thus, comes to the top of .
2nd step: BSSR searches for PoI vertices that semantically match Arts & Entertainment from , and finds . Since passes through and is more than 15, neither route is inserted into .
3rd step: As the top route is , BSSR searches for PoI vertices that semantically match gift shop from . BSSR does not find any routes due to the threshold.
4th step: BSSR fetches from and inserts two routes and into .
5th step: BSSR fetches and finds sequenced route . Since dominates , is deleted from .
6th step: The top route is deleted from because its length score is not smaller than the threshold of 13.
7th step: BSSR fetches and inserts and .
8th step: BSSR fetches and finds a sequenced route . is inserted into , and is deleted from .
9th step: is deleted due to the threshold.
10th step: BSSR fetches and finds a route .
11th step: BSSR finds a sequenced route , and the route dominates .
12th step: The distance from to the PoI vertices that match AE is larger than the threshold.
Finally, BSSR returns the set of SkySRs .