People spend a significant amount of their time in indoor spaces, often in unfamiliar buildings such as shopping malls, airports, and libraries (Jenkins et al., 1992). Recent advances in indoor positioning technology, cheap wireless networks, and the availability of geo-tagged data have created a huge demand for indoor location-based services such as finding nearby indoor objects, indoor navigation, and route planning. Route planning is one of the most popular services among both indoor and outdoor users; it assists them in planning a route that satisfies their preferences. Specifically, a user may issue a route planning query by providing a source location and a target location along with her preferences as a set of keywords (e.g., restaurant, salon, supermarket). A route planning query returns an optimal route that starts from the source location, passes through at least one location matching each given preference, and ends at the target location.
Due to its popularity, the route planning query has been extensively studied in the past few years (Li et al., 2005; Sharifzadeh et al., 2008; Cao et al., 2012; Zeng et al., 2015; Yao et al., 2011). However, all these techniques are specifically designed for outdoor spaces and cannot be efficiently extended to indoor spaces because they fail to exploit the unique properties of indoor venues. For example, indoor graphs have a much higher out-degree than road networks (Shao et al., 2016). Furthermore, the object density is much higher in indoor venues: the number of POIs (e.g., restaurants, fuel stations) on the vertices of road networks is typically small, whereas the number of objects in a single room of an indoor venue (e.g., products in a supermarket) may be in the thousands. Thus, specialized techniques are required to answer route planning queries in indoor venues.
Inspired by the above, in this paper, we provide the first set of techniques to answer an important route planning query with applications in a variety of scenarios. Consider a user who is in the car park of a large shopping center and has a list of items to buy (e.g., a wine bottle, a bunch of flowers, a cake, and a wrist watch). She may want to find an optimal route such that the total distance she needs to walk and the total price she pays to purchase all these items are minimized. She may use a Category Aware Multi-criteria route planning query, denoted as CAM, which takes as input a set of categories (e.g., the list of items she wants to purchase) and a scoring function, and returns the route that passes through at least one object of each category and has the minimum score. The score of each route is computed using the user-defined scoring function, which considers the total length of the route and the total price of the items along the route.
In contrast to traditional route planning queries that only consider a single criterion (i.e., distance), Category Aware Multi-criteria route planning queries can retrieve an optimal route considering multiple criteria such as the total length of the route, total price, total rating of items, and total waiting time for the activities. Consider another example of a user in an airport who is running late for a flight and needs to withdraw money from an ATM, grab a coffee, and go to a service desk before she checks in. For such a user, the total length of the route is important, as is the total waiting time at the ATM, coffee machine, and service desk. Therefore, she may issue a CAM query where the scoring function computes the score of a route considering its total length and the total waiting time at each facility (i.e., ATM, coffee machine, and service desk) along the route.
To the best of our knowledge, we are the first to study route planning queries where the score of a route is computed using not only its total length but also other relevant attributes such as total price and total waiting time. We show that answering a CAM query is NP-hard in the number of categories and propose an approximation algorithm to solve it efficiently. Although it is possible to extend existing outdoor techniques to solve the CAM query in indoor venues, they fail to exploit the properties specific to indoor venues such as the high density of objects in indoor partitions (e.g., thousands of objects in a single store). To address this issue, we present an efficient algorithm that utilizes a novel dominance-based pruning to significantly reduce the number of possibilities while maintaining high-quality results. Our extensive experimental study shows the effectiveness of our proposed algorithm. We summarize our contributions below.
We propose the category aware multi-criteria route planning (CAM) query and show that it is NP-hard.
We present an efficient approximation algorithm to retrieve high-quality results for CAM queries.
We conduct an extensive set of experiments on a real-world shopping center containing real products. The experiments demonstrate that our algorithm outperforms the state-of-the-art technique in terms of running time and quality of results. Furthermore, our experiments show that the cost of the route generated by our algorithm is at most higher than the cost of the optimal route.
2. Related Work
2.1. Query processing in indoor space
The existing outdoor query processing techniques fall short in indoor space as they do not consider the unique properties of an indoor space such as hallways and rooms. Hence, efficient query processing in indoor space has received great attention in recent years, and many indexing structures and query processing techniques have been proposed. A comprehensive taxonomy for querying indoor data, as well as shortest distance/path, range, and k nearest neighbour queries under various settings, can be found in (Lu et al., 2011; Xie et al., 2015; Yang et al., 2009; Yuan and Schneider, 2010). RTR-Tree and R-tree (Jensen et al., 2009) are extensions of the R-tree that index trajectories of indoor moving objects. Xie et al. (Xie et al., 2013) develop a composite indexing structure called R-tree that indexes indoor entities into different layers. The D2D graph (Yang et al., 2010) is one of the most notable techniques and has been used in most studies in the literature, since it enables various query processing techniques for road networks (Zhong et al., 2015; Lee et al., 2012) to be applied in indoor space. The D2D graph represents doors in the indoor space as vertices. A weighted edge between two vertices is created if the corresponding doors are connected to the same indoor partition (e.g., room, hallway), where the edge weight is the indoor distance between the two doors. Lu et al. (Lu et al., 2012) propose a distance aware indoor space data model along with efficient distance computation algorithms.
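The D2D graph construction described above can be illustrated with a minimal Python sketch. The function name `build_d2d_graph`, the input layout, and the use of straight-line distance between doors are assumptions made for illustration; an actual implementation would use the indoor walking distance inside each partition.

```python
from itertools import combinations
from math import dist

def build_d2d_graph(partitions, door_xy):
    """Build a door-to-door graph.

    partitions: partition id -> list of door ids on its boundary (hypothetical layout).
    door_xy:    door id -> (x, y) coordinates.
    Returns an adjacency dict: door -> {neighbouring door: edge weight}.
    """
    graph = {door: {} for door in door_xy}
    for doors in partitions.values():
        # Two doors are adjacent when they belong to the same partition.
        for a, b in combinations(doors, 2):
            # Assumption: straight-line distance stands in for indoor distance.
            weight = dist(door_xy[a], door_xy[b])
            # Keep the smaller weight if two doors share more than one partition.
            if weight < graph[a].get(b, float("inf")):
                graph[a][b] = weight
                graph[b][a] = weight
    return graph

partitions = {"hallway": ["d1", "d2", "d3"], "room": ["d3", "d4"]}
door_xy = {"d1": (0, 0), "d2": (3, 4), "d3": (6, 0), "d4": (6, 5)}
d2d = build_d2d_graph(partitions, door_xy)
```

In this toy layout, `d1` and `d4` share no partition, so they are connected only through `d3`; a shortest-path algorithm for road networks could then run on `d2d` unchanged.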
Shao et al. (Shao et al., 2016) introduce an efficient indexing structure called IP-tree that takes into account unique indoor properties in tree construction and query processing. In an IP-tree, adjacent indoor partitions (e.g., rooms, hallways, staircases) are combined to form leaf nodes. Then, the adjacent leaf nodes are combined to form intermediate nodes. This process is iteratively continued until all nodes are combined into a single node (i.e., root node). VIP-tree (Shao et al., 2016) is an improvement of the IP-tree. Compared to the existing indexing techniques, VIP-tree has demonstrated more efficiency and higher scalability.
2.2. Route Planning Queries
A large body of research has been done on developing efficient techniques to process route planning queries. The trip planning query (TPQ) (Li et al., 2005) takes source and target locations and a set of categories, and finds the shortest route that starts at the source location, passes through at least one object from each given category, and ends at the target location. The authors propose two fast algorithms (a greedy and an integer programming algorithm) based on the triangle inequality property of the metric space. These solutions take into account only the distance in finding an optimal route, while CAM considers multiple criteria such as static cost. Hence, these solutions cannot be used to process CAM queries.
Sharifzadeh et al. (Sharifzadeh et al., 2008) introduce a variant of TPQ called the optimal sequenced route (OSR) query, which visits the categories in a particular order given by the user. Several works (Kanza et al., 2009, 2010) in the literature study OSR queries. CAM differs from OSR since it does not impose a visiting order on the categories. Therefore, these algorithms are not applicable to CAM queries. Cao et al. (Cao et al., 2012) introduce another variant called keyword aware optimal route (KOR) search, which covers all user-given keywords while satisfying a user-specified budget constraint and optimizing the objective score of the route. Zeng et al. (Zeng et al., 2015) find an optimal route such that the keyword coverage is maximized without exceeding a budget constraint. The purpose of such a route is to optimally satisfy the user's weighted preferences. Chen et al. (Chen et al., 2008) study a type of route planning query called the multi-rule partial sequenced route (MRPSR) query, in which users set travelling preferences/restrictions when they issue a query. These works have different aims from the CAM problem.
Yao et al. (Yao et al., 2011) study another variant of the route planning query, the multi-approximate-keyword routing (MAKR) query. A MAKR query finds a route with the shortest length such that it covers at least one matching object per given keyword while satisfying string similarity constraints. MAKR studies a problem similar to CAM. Thus, we employ an extension of their approximation solution in our experiments to evaluate our proposed solution. Shao et al. (Shao et al., 2017) are the first to study indoor trip planning queries. They propose an exact solution called VIP-tree neighbour expansion (VNE) that exploits unique indoor features such as rooms and hallways. However, we find that an extension of their solution is inefficient in answering a CAM query.
3. Problem Definition
In this section, we formulate the problem of category aware multi-criteria route planning query and prove the hardness of the problem.
Notations used in this paper are summarized in Table 1.
Definition 3.1 (Indoor objects).
Let be an indoor point representing an indoor object. Each point is associated with a category and a static score denoted by .
Definition 3.2 (Route).
A route denotes a path from indoor point to where is the shortest path between two points.
Definition 3.3 (Travel cost).
Given a route , the travel cost of route is computed as follows
where denotes the indoor distance between two points in route .
Definition 3.4 (Static cost).
Given a route , let be the set of categories covered by where and denotes an indoor point that covers . Hence, the static cost is computed as follows,
where denotes the static score of the indoor point .
Definition 3.5 (Cost function).
We determine the cost of a route in terms of travel cost and static cost, as follows,
Here, is a query parameter (user-defined) that lies between 0 and 1 to control the preference of travel cost and static cost.
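Definitions 3.3 to 3.5 can be sketched in Python as follows. The exact formulas are not reproduced in this text, so the weighted-sum form of `route_cost` (and which term the parameter `alpha` multiplies) is an assumption, as are the `Point` fields and the use of straight-line distance in place of indoor distance.

```python
from math import dist
from typing import NamedTuple, Optional, Tuple

class Point(NamedTuple):
    xy: Tuple[float, float]          # position (hypothetical field)
    category: Optional[str] = None   # None for the source and target points
    score: float = 0.0               # static score, e.g., price or waiting time

def travel_cost(route):
    # Sum of distances between consecutive points of the route.
    # Assumption: straight-line distance stands in for indoor distance.
    return sum(dist(a.xy, b.xy) for a, b in zip(route, route[1:]))

def static_cost(route):
    # Sum of the static scores of the points that cover a category.
    return sum(p.score for p in route if p.category is not None)

def route_cost(route, alpha):
    # Assumed weighted sum: alpha weighs travel cost, (1 - alpha) static cost.
    return alpha * travel_cost(route) + (1 - alpha) * static_cost(route)

route = [Point((0, 0)), Point((3, 4), "wine", 10.0), Point((3, 8))]
```

Under this assumed form, setting `alpha` to 1 ranks routes by length only, while setting it to 0 ranks them by static cost only.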
Table 1. Summary of notations.
|An indoor point|
|The static score of point|
|A CAM query|
|A door in indoor space|
|An indoor partition|
|The start/end point of a route|
|A set of categories|
|The set of indoor points of category|
|The point dominates w.r.t door|
|The route dominates|
|The dominated set of point w.r.t door|
Definition 3.6 (Category Aware Multi-criteria route planning (CAM) query).
Given an indoor space, a category aware multi-criteria route planning query where denotes the source point and the target point of the route, and denotes a set of unique categories that describes the user preferences. A route from the point to the point , that passes through at least one indoor point from each given category, is called a complete candidate route. Moreover, a CAM query returns a route subject to:
where is the collection of all complete candidate routes for the given query .
Theorem 3.7.
The problem of solving a CAM query is NP-hard.
We reduce the classical travelling salesman problem (TSP), which is NP-hard, to this problem. Given a graph in which each edge has a length, let both the start and end points be equal to a node , let each given category be covered by a node with where , and let all the other nodes contain non-query categories. Clearly, solving the CAM query on this instance is identical to solving the TSP. Thus, the problem of solving a CAM query is NP-hard. ∎
4. Our Solutions
4.1. GCNN Algorithm
A CAM query can be answered using a brute force approach by conducting an exhaustive search. Even though the brute force method guarantees the optimal solution, the exhaustive search is prohibitively expensive in practice. We devise a novel approximation algorithm called Global Category Nearest Neighbour (GCNN) algorithm to quickly answer a CAM query.
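To make the cost of the exhaustive search concrete, the following Python sketch enumerates every choice of one point per category and every visiting order. It is an illustration only: the tuple layout, the toy coordinates, the straight-line distance, and the weighted-sum cost with parameter `alpha` are all assumptions.

```python
from itertools import permutations, product
from math import dist

def route_cost(stops, alpha):
    """stops: list of (name, (x, y), static_score); assumed weighted-sum cost."""
    travel = sum(dist(a[1], b[1]) for a, b in zip(stops, stops[1:]))
    static = sum(s[2] for s in stops)
    return alpha * travel + (1 - alpha) * static

def brute_force_cam(source, target, by_category, alpha):
    best_route, best_cost = None, float("inf")
    # Choose one point per category, then try every visiting order.
    for choice in product(*by_category.values()):
        for order in permutations(choice):
            route = [source, *order, target]
            cost = route_cost(route, alpha)
            if cost < best_cost:
                best_route, best_cost = route, cost
    return best_route, best_cost

source, target = ("s", (0, 0), 0.0), ("t", (10, 0), 0.0)
by_category = {
    "wine": [("w1", (2, 0), 20.0), ("w2", (5, 5), 5.0)],
    "cake": [("c1", (8, 0), 10.0), ("c2", (5, -5), 1.0)],
}
best_route, best_cost = brute_force_cam(source, target, by_category, alpha=0.5)
```

With n points per category and k categories, this loop evaluates n^k · k! routes (8 in the toy instance), which is why the exhaustive search quickly becomes impractical.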
The GCNN algorithm greedily adds an indoor point to an existing partial candidate route so as to minimize the route cost w.r.t travel and static costs. GCNN starts from the source point and progressively constructs a candidate route by inserting an indoor point covering one of the uncovered categories. For a given partial candidate route , the algorithm finds such a point subject to:
where returns the category nearest neighbour point for a given category w.r.t an indoor point . We describe the process of obtaining a category nearest neighbour point in Section 4.1.1. Then, the globally best category nearest neighbour point for the current point is determined using Equation (7) and is updated to . The algorithm terminates when turns into a complete route in which all the query categories are covered. In order to determine such an optimal route, we maintain a min-priority queue into which a partial candidate route is enqueued with a key value determined as follows: where is the recently inserted point. Whenever a candidate route is dequeued from the queue, we find the category nearest neighbour point for each uncovered category and generate new candidate routes. The new candidate routes are then enqueued into the queue. Intuitively, the candidate route that is dequeued first in the next iteration is the answer to Equation (7).
As Algorithm 1 illustrates, initially, we enqueue a route with zero as the key value. We terminate the algorithm either when the queue is empty (line 4) or when an optimal route is found (lines 8-10). In each iteration, we dequeue a candidate route from the queue (line 5), which essentially provides the answer to Equation (7) of the previous iteration. After a candidate route is dequeued, we clear the min-priority queue by dequeuing all the remaining routes (line 6). This allows us to maintain only the current optimal partial candidate route in each iteration. Next, the set of uncovered categories, i.e., , is obtained (line 7). Then, for each uncovered category, we get the category nearest neighbour point using Equation (6) and generate a new candidate route by inserting that point into the current candidate route (lines 11-13). The key value of a route is determined by taking into account both the route cost and the distances between point and the start/end points, i.e., and , (line 14). Each route is then enqueued into the queue with its key value (line 15). Finally, the optimal route for the given CAM query is returned (line 16).
For example, Figure 1 shows a route where . Let and be indoor points where belong to category , i.e., , and belongs to category , i.e., . The score of each point w.r.t Equation (5) is mentioned next to the point. Assume that and be the recently dequeued candidate route. Then GCNN algorithm finds the category nearest neighbour point for each uncovered category, i.e., , using Equation (6). Hence, the points and are selected as and respectively. Then, new candidate routes and are generated accordingly and enqueued into the queue. In the next iteration, is dequeued first satisfying Equation (7).
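The greedy loop described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the paper's Algorithm 1: the ranking function `step_cost`, the key value (partial cost plus weighted distance to the target), the `Point` fields, and the straight-line distance are all hypothetical stand-ins for the equations that are not reproduced here.

```python
import heapq
from math import dist
from typing import NamedTuple, Optional, Tuple

class Point(NamedTuple):
    name: str
    xy: Tuple[float, float]
    category: Optional[str] = None
    score: float = 0.0

def step_cost(p, q, alpha):
    # Assumed ranking of q as seen from p: weighted distance plus static score.
    return alpha * dist(p.xy, q.xy) + (1 - alpha) * q.score

def category_nn(p, candidates, alpha):
    # Category nearest neighbour: the candidate with the minimum ranking score.
    return min(candidates, key=lambda q: step_cost(p, q, alpha))

def gcnn(source, target, by_category, alpha):
    # Queue entries: (key, tie-breaker, cost so far, route, covered categories).
    queue = [(0.0, 0, 0.0, [source], frozenset())]
    counter = 1
    while queue:
        _, _, cost, route, covered = heapq.heappop(queue)
        queue.clear()  # keep only the current best partial route
        uncovered = set(by_category) - covered
        if not uncovered:
            return route + [target], cost + alpha * dist(route[-1].xy, target.xy)
        for cat in sorted(uncovered):
            nn = category_nn(route[-1], by_category[cat], alpha)
            new_cost = cost + step_cost(route[-1], nn, alpha)
            key = new_cost + alpha * dist(nn.xy, target.xy)  # assumed key value
            heapq.heappush(queue, (key, counter, new_cost, route + [nn], covered | {cat}))
            counter += 1
    return None, float("inf")

s, t = Point("s", (0, 0)), Point("t", (10, 0))
by_category = {
    "wine": [Point("w1", (2, 0), "wine", 20.0), Point("w2", (5, 5), "wine", 5.0)],
    "cake": [Point("c1", (8, 0), "cake", 10.0), Point("c2", (5, -5), "cake", 1.0)],
}
route, cost = gcnn(s, t, by_category, alpha=0.5)
```

On this toy instance the sketch first commits to `c2` and then to `w2`. Because the queue is cleared after every dequeue, only one partial route survives per iteration, which keeps the search fast but means the returned route is not guaranteed to be optimal.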
4.1.1. Category Nearest Neighbour
The category nearest neighbour of point is the closest point to w.r.t both travel and static costs. In order to obtain the category nearest neighbour point covering category for a given point , i.e., , every indoor point that belongs to the particular category , i.e., , is ranked using Equation (5). As Equation (6) depicts, the point with the minimum ranking score is then selected.
Determining an optimal route in the GCNN algorithm is very expensive since all the related indoor points belonging to uncovered categories are ranked in each iteration to find category nearest neighbours. Thus, the number of times the ranking operation is executed in obtaining an optimal route is , where is the average number of indoor points per category and is the total number of query categories. Clearly, the performance of the GCNN algorithm degrades as and increase. Since is a query parameter, the algorithm can be accelerated by reducing the number of related indoor points (i.e., ) in the indoor space. Thus, we introduce a novel pruning technique that eliminates the indoor points which are highly unlikely to be selected in determining an optimal route. We use an extension of the VIP-tree (Shao et al., 2016), called the inverted VIP-tree, as our indexing structure. In order to support category-based filtering, we modify the VIP-tree by implementing an inverted file at each tree node. For example, an inverted file in a leaf node consists of a list of all the unique categories that appear in any indoor partition, and for each category, a list of the indoor partitions in which it appears. Moreover, we maintain an additional list at each tree node which consists of the minimum static score of each category appearing in the tree node. This enables simultaneous travel cost and static cost based filtering. Thus, a category nearest neighbour point is retrieved in an efficient manner.
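The per-node bookkeeping described above can be illustrated with a small Python sketch for a single leaf node. The function names, the input layout, and the lower-bound formula (weighted minimum distance plus weighted minimum static score, with parameter `alpha`) are assumptions for illustration; they are not taken from the VIP-tree implementation.

```python
from collections import defaultdict

def build_leaf_inverted_file(leaf_partitions):
    """leaf_partitions: partition id -> list of (category, static_score) pairs.

    Returns (inverted, min_score):
      inverted:  category -> list of partitions in which the category appears
      min_score: category -> minimum static score of that category in the node
    """
    inverted = defaultdict(list)
    min_score = {}
    for pid, objects in leaf_partitions.items():
        for category, score in objects:
            if pid not in inverted[category]:
                inverted[category].append(pid)
            min_score[category] = min(score, min_score.get(category, float("inf")))
    return dict(inverted), min_score

def lower_bound(min_dist_to_node, category, min_score, alpha):
    # Assumed bound: no point of `category` in the node can rank better than this.
    return alpha * min_dist_to_node + (1 - alpha) * min_score[category]

leaf = {"P1": [("wine", 20.0), ("cake", 10.0)], "P2": [("wine", 5.0)]}
inverted, min_score = build_leaf_inverted_file(leaf)
```

A search for a category nearest neighbour could skip a node when the category is absent from `inverted`, or when `lower_bound` already exceeds the ranking score of the best point found so far.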
4.2. Dominance-based Pruning
As discussed in the previous section, the performance of the GCNN algorithm can be improved by reducing the number of indoor points visited during query processing. Thus, we introduce a novel pruning technique called dominance-based pruning that eliminates objects that are highly unlikely to be selected in constructing the optimal solution. The technique exploits unique properties of an indoor space, such as its partitions: it identifies the incompetent points in each indoor partition and prunes them accordingly. Before we present the pruning technique, we introduce the following definitions.
Definition 4.1 (Point Dominance).
Let and be points belonging to category , i.e., , that reside in an indoor partition . Let be one of the doors of . Then, the indoor point dominates w.r.t door , denoted by , if and only if and .
Definition 4.2 (Dominated Set).
Let be a point belonging to category , i.e., . Then the dominated set of the point w.r.t door is defined as follows,
The dominance of a point over another point can be decided only if both points belong to the same category and reside in the same indoor partition. As Definition 4.1 depicts, for a given door and two indoor points , if the point is closer to the door than the point and also has a static cost less than , then dominates w.r.t the door . Moreover, according to Definition 4.2, the point belongs to the dominated set of w.r.t door .
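Point dominance and the dominated set can be sketched in Python as follows. The `Point` fields and the straight-line distance are assumptions, and both comparisons are taken as strict, following the prose above ("closer" and "less than"); the formal definition may use non-strict inequalities.

```python
from math import dist
from typing import NamedTuple, Tuple

class Point(NamedTuple):
    name: str
    xy: Tuple[float, float]
    category: str
    score: float       # static score
    partition: str     # indoor partition containing the point

def dominates(p, q, door_xy):
    # p dominates q w.r.t. a door when both are comparable (same category,
    # same partition) and p is both closer to the door and cheaper than q.
    return (
        p.category == q.category
        and p.partition == q.partition
        and dist(door_xy, p.xy) < dist(door_xy, q.xy)
        and p.score < q.score
    )

def dominated_set(p, points, door_xy):
    # All points dominated by p w.r.t. the given door.
    return [q for q in points if dominates(p, q, door_xy)]

door = (0, 0)
a = Point("a", (1, 0), "wine", 5.0, "P1")
b = Point("b", (3, 0), "wine", 8.0, "P1")
c = Point("c", (2, 0), "wine", 3.0, "P1")
d = Point("d", (4, 0), "cake", 9.0, "P1")
points = [a, b, c, d]
```

Here `a` dominates `b` but not `c` (which is cheaper) or `d` (which belongs to another category), so dominance is only a partial order over the points of a partition.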
Definition 4.3 (Route Dominance).
Let and be routes inside an indoor partition that start from door and end at door where . Then, dominates (denoted by ) if and only if .
A route can dominate another route only if both routes are inside the same partition, start and end at the same doors, and cover the same set of categories. According to Definition 4.3, if the cost of route is less than the cost of route , then route dominates . Next, we present four important theorems that help to derive our pruning rules. Note that, for all these theorems and pruning rules, we assume that and that an indoor partition consists of two doors . Also, when we say , we mean the set of points of the indoor partition that belong to category .
Theorem 4.4.
Let routes and where and . Then, only if and .
For the given routes and , if dominates then . Also, we know that, . By adding both inequalities, . Furthermore, . And, . Hence, . ∎
Theorem 4.5.
Let routes and where and , and . Then, only if and where .
We prove this by contradiction. Assume and . Then, . Hence, . Also, we know . By the above two inequalities, . Therefore, our assumption must be false. So when and . ∎
For given , the dominance of route over can be guaranteed if the point dominates and is closer to than (See Figure 2(a)).
Theorem 4.5 takes into account an instance where the point is closer to than . In this case, can be guaranteed only if the point resides outside the distance threshold as Figure 2(b) illustrates.
For multiple objects: assume that there is another point within the distance , where . If where , then point can be ignored since a route via does not dominate . If the indoor points, i.e., , are visited in dominance order, then the distance threshold, i.e., , is guaranteed to be an upper bound as is always a lower bound. Visiting the indoor points in dominance order means that a point is always visited before any point dominated by . Moreover, if , then needs to be updated w.r.t point and checked for . Similarly, all remaining points need to be verified if there are more. Then we can guarantee that dominates where , .
Theorem 4.6.
Let routes and where and . Then, only if , and .
For the given routes and , if dominates and dominates , then and respectively. Also, we know that . By adding them, . And, . Hence, . ∎
Theorem 4.7.
Let routes and where and , and . Then, only if , and where .
We prove this by contradiction. Assume , and . Then, and . Hence, . Also, we know . By the above two inequalities, . Therefore, our assumption must be false. So when , and . ∎
As Figure 3(a) shows, Theorem 4.6 guarantees that a route via points and , i.e., , cannot dominate a route via corresponding points and , i.e., when the distance between points and is less than the distance between points in corresponding dominated sets. Theorem 4.7 explains an instance (see Figure 3(b)) where . In this case dominates if the distance between the points in dominated sets is greater than the particular distance threshold, i.e., .
Next, we proceed to introduce our pruning rules which are derived from the aforementioned theorems. These pruning rules help to filter all the points in an indoor partition that are highly unlikely to be selected in generating an optimal route.
Pruning Rule 1.
Let and where and . Then, the points are selected and a point is pruned only if and .
Pruning Rule 2.
Let and where and . Then, the points are selected and a point is pruned when and only if where and .
Pruning Rule 3.
Let and where and . Then, the points are selected and a point is pruned when and only if where and .
Pruning Rule 4.
Let and where and . Then, the points are selected and a point is pruned when and only if where and for given .
Definition 4.8 (Dominant point).
An indoor point is called a dominant point if it is highly likely to be selected in generating an optimal route.
Simply put, a dominant point is a point that is selected by a pruning rule, while a non-dominant point is a point that is never selected by any pruning rule. Accordingly, the pruning rules identify the dominant points and prune the non-dominant points, as the latter are incapable of generating better routes than the routes of dominant points, i.e., dominant routes. Due to space limitations, we provide an example only for Pruning Rule 1. Let be an indoor partition consisting of two doors and three indoor points and where . Assume that a user who wants to find a route from to covering categories visits the point first. Then either the point or needs to be visited before visiting door to get a complete route. Assume that the user visits the point . Then, according to Pruning Rule 1, the point can be pruned only if . Because, the route dominates