I Introduction
Emerging transportation innovations, such as mobility-on-demand services and autonomous driving, call for stochastic routing, where travel time distributions, rather than just average travel times, are utilized [1, 2]. For example, consider two paths $P_1$ and $P_2$ that connect the same source and destination. The travel time distributions of the two paths are shown in Table I.
Travel time (mins) | 40 | 50 | 60 | 70 |
---|---|---|---|---|
$P_1$ | 0.5 | 0.2 | 0.2 | 0.1 |
$P_2$ | 0 | 0.8 | 0.2 | 0 |
If only average travel times are considered, $P_1$ is always the better choice, because its average travel time is 49 minutes, whereas $P_2$'s average travel time is 52 minutes. However, if a person needs to reach the destination within 60 minutes, e.g., to catch a flight, then $P_2$ is the better choice, as it guarantees that the person arrives at the destination within 60 minutes. In contrast, taking $P_1$ runs the risk of missing the flight, because $P_1$ takes 70 minutes with probability 0.1.
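The comparison above can be reproduced with a few lines of Python. This sketch encodes Table I as dictionaries mapping a travel time (mins) to its probability; the names `P1`, `P2`, `mean_time`, and `on_time_prob` are illustrative and not part of the proposed method.

```python
# Table I as {travel time in mins: probability}.
P1 = {40: 0.5, 50: 0.2, 60: 0.2, 70: 0.1}
P2 = {50: 0.8, 60: 0.2}


def mean_time(dist):
    """Expected travel time of a discrete travel time distribution."""
    return sum(t * p for t, p in dist.items())


def on_time_prob(dist, budget):
    """Probability that the travel time does not exceed the budget."""
    return sum(p for t, p in dist.items() if t <= budget)


for name, dist in (("P1", P1), ("P2", P2)):
    print(name, round(mean_time(dist), 2), round(on_time_prob(dist, 60), 2))
```

It reports averages of 49 and 52 minutes, but on-time probabilities of 0.9 and 1.0 for a 60-minute budget, which is exactly the gap between average-based routing and SPOTAR.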
In this paper, we consider stochastic routing that takes into account travel time distributions. In particular, we investigate the shortest path with on-time arrival reliability (SPOTAR) problem. Given a source, a destination, and a travel time budget, e.g., 60 minutes, the SPOTAR problem aims at finding a path that maximizes the on-time arrival probability, i.e., the probability of arriving at the destination within the time budget.
Although the SPOTAR problem has been studied in the literature [4, 5], existing studies all consider a classical uncertain road network model, where uncertain travel times are assigned only to edges and the uncertain travel times on different edges are assumed to be independent. We call this classical model the edge-centric model. However, recent studies suggest that the travel times on different edges are often highly dependent [2, 3]. The edge-centric model ignores this dependency and thus computes inaccurate travel time distributions for paths, especially for long paths. To contend with this, a PAth-CEntric model (PACE) has been proposed [2, 3]. In the PACE model, not only edges but also some paths are associated with uncertain travel times, and the uncertain travel times associated with paths capture the dependency among the travel times of the edges in those paths. Thus, the PACE model provides much more accurate travel time distributions [2, 3]. However, existing algorithms designed for the edge-centric model cannot be applied directly to the PACE model. In this paper, we investigate how to solve the SPOTAR problem under the PACE model.
Contributions:
To the best of our knowledge, this is the first paper to study the SPOTAR problem in the PACE model, which exploits travel time dependencies among edges. First, we define the SPOTAR problem in the PACE model. Second, we propose a generic algorithm with different speed-up heuristics to solve the problem. Third, we report on comprehensive experiments based on real-world trajectory data.
II Related work
We first review two uncertain road network models and then discuss existing studies on stochastic shortest path finding.
Uncertain Road Network Modeling: The edge-centric uncertain road network model has been applied extensively in stochastic routing [4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]. In the edge-centric model, each edge is associated with an uncertain travel time, and the uncertain travel times of different edges are independent. Given a path, the uncertain travel time of the path is derived by convolving the uncertain travel times of the edges in the path. However, the uncertain travel times of different edges in a path are often dependent rather than independent. The edge-centric model is unable to capture this dependency and thus results in inaccurate uncertain travel times for paths. To contend with this, the PAth-CEntric (PACE) model was proposed recently [2, 3]. In addition to edges, PACE also associates some paths with joint travel time distributions that capture the dependency among the uncertain travel times of the edges in these paths.
Stochastic shortest path finding: There exist two categories of stochastic shortest path finding: the shortest path with on-time arrival reliability (SPOTAR) problem and the stochastic on-time arrival (SOTA) problem [4, 5]. SPOTAR identifies an a priori path that maximizes the on-time arrival reliability of reaching a destination within a pre-defined time budget. SOTA considers a scenario where a vehicle keeps moving and needs to choose which edge to follow at each intersection, based on the travel time that has already been spent. SOTA tries to find an optimal policy that guides a vehicle to choose the next optimal edge at each intersection. We study the SPOTAR problem in the PACE model, while existing studies on SPOTAR all consider the edge-centric model.
III Preliminaries
We use a graph to model a road network. In particular, a road network is modeled as a directed graph $G = (V, E)$, where $V$ is a set of nodes that represent intersections and $E$ is a set of edges that represent road segments. For example, Figure 1 shows a graph that models a road network with 6 intersections (i.e., nodes) and 9 road segments (i.e., edges).
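A path $P = \langle e_1, e_2, \ldots, e_n \rangle$ is a sequence of adjacent edges, where $e_i \in E$ for $1 \le i \le n$ and the end node of $e_i$ is the start node of $e_{i+1}$ for $1 \le i < n$. In addition, we require that the edges in path $P$ are unique, meaning that $e_i \neq e_j$ if $i \neq j$ and $1 \le i, j \le n$. Next, we define a subsequence of consecutive edges in path $P$ as a sub-path $P'$ of $P$. Formally, $P' = \langle e_i, e_{i+1}, \ldots, e_j \rangle$, where $1 \le i \le j \le n$. For example, $\langle e_2, e_3 \rangle$ is a sub-path of $\langle e_1, e_2, e_3 \rangle$; Figure 1 contains concrete instances of such paths.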
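The two definitions translate into simple checks. This sketch assumes edges are represented as `(start node, end node)` pairs; the helper names and the node names in the example are hypothetical and are not taken from Figure 1.

```python
from typing import List, Tuple

Edge = Tuple[str, str]  # (start node, end node)


def is_path(edges: List[Edge]) -> bool:
    """Non-empty, consecutive edges are adjacent, and no edge is repeated."""
    adjacent = all(a[1] == b[0] for a, b in zip(edges, edges[1:]))
    return len(edges) > 0 and adjacent and len(set(edges)) == len(edges)


def is_subpath(sub: List[Edge], path: List[Edge]) -> bool:
    """True if `sub` equals <e_i, ..., e_j> of `path` for some 1 <= i <= j <= n."""
    k = len(sub)
    return k > 0 and any(path[i:i + k] == sub for i in range(len(path) - k + 1))


P = [("s", "a"), ("a", "b"), ("b", "d")]
print(is_path(P), is_subpath([("a", "b"), ("b", "d")], P))
```

Note that a sub-path must be contiguous: dropping a middle edge yields a sequence that is neither a sub-path nor, in general, a path.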
Based on the above definitions, we proceed to introduce the classic edge-centric uncertain road network model and the path-centric uncertain road network model (i.e., PACE), respectively.
Edge-centric uncertain road network model: We maintain a weight function $W$ that takes as input an edge $e \in E$ and returns an uncertain travel time $W(e)$, i.e., a travel time distribution, for the edge; thus, every edge has an uncertain travel time. The edge-centric model is considered the de facto model in stochastic routing [4, 5, 6, 7, 8, 9, 16].
The uncertain edge weights are often instantiated using trajectories that occurred on the road network. Given a set of trajectories, we first break the trajectories into small pieces that fit the underlying edges and then use the small pieces to instantiate edge weights.
Assume that we have 200 trajectories, as shown in Table II. The first row of Table II means that 80 trajectories traversed path $\langle e_1, e_2 \rangle$, each spending 8 mins on $e_1$ and 6 mins on $e_2$.
Traversed Path | Costs on edges (mins) | # of trajectories |
---|---|---|
$\langle e_1, e_2 \rangle$ | 8, 6 | 80 |
$\langle e_1, e_2 \rangle$ | 10, 10 | 20 |
$\langle e_1 \rangle$ | 8 | 100 |
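Now, we are able to instantiate $W(e_1) = \{(8, 0.9), (10, 0.1)\}$, meaning that $e_1$ may take 8 mins with probability 0.9 and 10 mins with probability 0.1, as shown in Fig. 1. This is because, out of the 200 trajectories that traversed $e_1$, 180 took 8 mins and 20 took 10 mins. Similarly, we have $W(e_2) = \{(6, 0.8), (10, 0.2)\}$, since 80 of the 100 trajectories that traversed $e_2$ took 6 mins and 20 took 10 mins.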
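The instantiation above amounts to counting, per edge, how many trajectory pieces took each travel time. This sketch assumes the trajectories have already been map-matched and split per edge, as in Table II; `edge_weights` is a hypothetical helper, and `Fraction` keeps the probabilities exact.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Table II: (traversed edges, per-edge travel times in mins, number of trajectories).
TRAJECTORIES = [
    (("e1", "e2"), (8, 6), 80),
    (("e1", "e2"), (10, 10), 20),
    (("e1",), (8,), 100),
]


def edge_weights(trajectories):
    """Split each trajectory into per-edge pieces and build W(e) = {time: probability}."""
    counts = defaultdict(Counter)
    for edges, times, n in trajectories:
        for e, t in zip(edges, times):
            counts[e][t] += n  # n trajectories spent t mins on edge e
    return {
        e: {t: Fraction(k, sum(ctr.values())) for t, k in sorted(ctr.items())}
        for e, ctr in counts.items()
    }


W = edge_weights(TRAJECTORIES)
print(W["e1"])  # {8: Fraction(9, 10), 10: Fraction(1, 10)}
print(W["e2"])  # {6: Fraction(4, 5), 10: Fraction(1, 5)}
```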
In the edge-centric model, the uncertain travel times of edges are independent. Thus, the uncertain travel time of a path is computed as the convolution of the uncertain travel times of the edges in the path [4, 5, 6, 7, 8, 9, 16]. For example, assume that we have instantiated the edge weights as shown in Figure 1, and consider path $\langle e_1, e_2 \rangle$. Since we assume that the travel times of $e_1$ and $e_2$ are independent, we first compute the joint travel time distribution $\{((8, 6), 0.72), ((8, 10), 0.18), ((10, 6), 0.08), ((10, 10), 0.02)\}$. Then, based on the joint distribution, we compute the total travel time distribution $\{(14, 0.72), (16, 0.08), (18, 0.18), (20, 0.02)\}$, which is shown in Table V. Note that this computed total travel time distribution of path $\langle e_1, e_2 \rangle$ is inconsistent with the “ground truth” distribution reflected by the trajectories in Table II. The ground truth distribution suggests that $\langle e_1, e_2 \rangle$ took 14 mins with probability 0.8 (because out of 100 trajectories, 80 took 14 mins to traverse it) and 20 mins with probability 0.2 (because out of 100 trajectories, 20 took 20 mins to traverse it). This example shows that the travel time dependency, i.e., traveling fast (slow) on $e_1$ and also traveling fast (slow) on $e_2$, is broken when using the edge-centric model, which results in inaccurate uncertain travel times for paths. It also shows that it is important to use trajectories that traverse the whole path, rather than only its individual edges, when calculating the travel cost distribution of a path.
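The edge-centric computation can be sketched as a discrete convolution. This is a minimal illustration that reuses $W(e_1)$ and $W(e_2)$ from Table II; `convolve` is a hypothetical helper that multiplies probabilities because it assumes independence, which is precisely the assumption the example shows to be wrong.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

W_e1 = {8: Fraction(9, 10), 10: Fraction(1, 10)}
W_e2 = {6: Fraction(4, 5), 10: Fraction(1, 5)}


def convolve(*dists):
    """Total travel time distribution of a path whose edge times are independent."""
    total = defaultdict(Fraction)
    for combo in product(*(d.items() for d in dists)):
        cost = sum(c for c, _ in combo)  # total travel time of this outcome
        prob = Fraction(1)
        for _, p in combo:               # joint probability = product (independence)
            prob *= p
        total[cost] += prob
    return dict(sorted(total.items()))


edge_centric = convolve(W_e1, W_e2)
ground_truth = {14: Fraction(4, 5), 20: Fraction(1, 5)}  # from Table II
print({c: float(p) for c, p in edge_centric.items()})
# {14: 0.72, 16: 0.08, 18: 0.18, 20: 0.02}
```

For a 14-minute budget, the edge-centric estimate is 0.72, whereas the trajectories show 0.8.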
PAth-CEntric (PACE) uncertain road network model: In the PACE model [2, 3], not only edges but also some paths are associated with uncertain travel times. The uncertain travel times associated with paths model the joint travel time distributions of the edges in those paths.
Following the running example, PACE maintains not only the uncertain travel times of edges, i.e., $W(e_1)$ and $W(e_2)$, but also the joint travel time distribution of path $\langle e_1, e_2 \rangle$, i.e., $W(\langle e_1, e_2 \rangle) = \{((8, 6), 0.8), ((10, 10), 0.2)\}$, as shown in Table VIII(a). The joint travel time distributions maintained in PACE are derived directly from trajectories. Based on $W(\langle e_1, e_2 \rangle)$, we derive the total travel time distribution $\{(14, 0.8), (20, 0.2)\}$ of $\langle e_1, e_2 \rangle$, which aligns with the ground truth distribution.
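Deriving the total travel time distribution from a PACE joint distribution only requires summing the per-edge times of each joint outcome; no independence assumption is involved. A minimal sketch, with the hypothetical helper `total_time`:

```python
from collections import defaultdict
from fractions import Fraction

# W(<e1, e2>) in PACE: {(time on e1, time on e2): probability}, from Table II.
W_e1e2 = {(8, 6): Fraction(4, 5), (10, 10): Fraction(1, 5)}


def total_time(joint):
    """Collapse a joint distribution over per-edge times into a total-time distribution."""
    total = defaultdict(Fraction)
    for times, p in joint.items():
        total[sum(times)] += p
    return dict(sorted(total.items()))


print(total_time(W_e1e2))  # {14: Fraction(4, 5), 20: Fraction(1, 5)}
```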
Note that, in PACE, only some paths, not all paths, are associated with uncertain travel times. A road network may have a huge number of meaningful paths, and thus we cannot afford to maintain uncertain travel times for all paths in PACE. In addition, we often lack sufficient trajectories to cover all paths. In practice, we maintain joint distributions for those paths that have been traversed by a sufficient number of trajectories, e.g., popular paths.
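Next, we present how to compute the travel time distribution of a path in PACE using a concrete example. Assume that, in addition to the travel time distribution $W(\langle e_1, e_2 \rangle)$, PACE also maintains a travel time distribution $W(\langle e_2, e_3 \rangle)$ for path $\langle e_2, e_3 \rangle$. Then, given a path $Q = \langle e_1, e_2, e_3 \rangle$, there exists more than one combination of weights such that each combination covers $Q$. For example, we may use $\langle W(e_1), W(e_2), W(e_3) \rangle$, $\langle W(\langle e_1, e_2 \rangle), W(e_3) \rangle$, $\langle W(e_1), W(\langle e_2, e_3 \rangle) \rangle$, or $\langle W(\langle e_1, e_2 \rangle), W(\langle e_2, e_3 \rangle) \rangle$ to compute $Q$'s uncertain travel time. It has been shown that the coarsest combination, i.e., the combination with the longest sub-paths, gives the most accurate uncertain travel time [2, 3]. In our example, it is $\langle W(\langle e_1, e_2 \rangle), W(\langle e_2, e_3 \rangle) \rangle$.
After identifying the coarsest combination $\langle W(P_1), W(P_2), \ldots, W(P_k) \rangle$ for a query path $Q$, we use Equation 1 to compute the joint distribution $J_Q$ of $Q$ [2, 3].
$$J_Q = \frac{\prod_{i=1}^{k} W(P_i)}{\prod_{i=1}^{k-1} W(P_i \cap P_{i+1})} \tag{1}$$
where $P_i \cap P_{i+1}$ denotes the overlapping path of paths $P_i$ and $P_{i+1}$. Consider the running example, where $Q = \langle e_1, e_2, e_3 \rangle$. We have $J_Q = \frac{W(\langle e_1, e_2 \rangle) \cdot W(\langle e_2, e_3 \rangle)}{W(e_2)}$, since $\langle e_1, e_2 \rangle \cap \langle e_2, e_3 \rangle = \langle e_2 \rangle$.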
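Equation 1 can be sketched for two overlapping sub-paths ($k = 2$). Joint distributions are dictionaries from per-edge time tuples to probabilities, and only outcomes that agree on the shared edges are combined. The helper `combine` is hypothetical, and the distribution assumed for $W(\langle e_2, e_3 \rangle)$ is made up for illustration (chosen to agree with $W(e_2)$). Longer combinations could, assuming only consecutive sub-paths overlap, be handled by applying the same step repeatedly from left to right.

```python
from fractions import Fraction


def combine(edges_a, joint_a, edges_b, joint_b, overlap):
    """Equation (1) with k = 2: J_Q = W(P_a) * W(P_b) / W(P_a ∩ P_b).

    edges_a, edges_b: edge ids of P_a and P_b, where a suffix of P_a equals a
    prefix of P_b. joint_a, joint_b: {per-edge time tuple: probability}.
    overlap: distribution of the shared edges, {time tuple: probability}.
    """
    # Length of the longest suffix of P_a that is also a prefix of P_b.
    k = max(n for n in range(min(len(edges_a), len(edges_b)) + 1)
            if tuple(edges_a[len(edges_a) - n:]) == tuple(edges_b[:n]))
    joint_q = {}
    for times_a, pa in joint_a.items():
        for times_b, pb in joint_b.items():
            shared = times_b[:k]
            # Both outcomes must agree on the travel times of the shared edges.
            if times_a[len(times_a) - k:] != shared or overlap.get(shared, 0) == 0:
                continue
            times_q = times_a + times_b[k:]
            joint_q[times_q] = joint_q.get(times_q, 0) + pa * pb / overlap[shared]
    return tuple(edges_a) + tuple(edges_b[k:]), joint_q


W_a = {(8, 6): Fraction(4, 5), (10, 10): Fraction(1, 5)}  # W(<e1, e2>), Table II
W_b = {(6, 5): Fraction(3, 5), (6, 7): Fraction(1, 5), (10, 9): Fraction(1, 5)}  # assumed
W_e2 = {(6,): Fraction(4, 5), (10,): Fraction(1, 5)}
edges_q, J_q = combine(("e1", "e2"), W_a, ("e2", "e3"), W_b, W_e2)
print(edges_q, J_q)
```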
Problem Definition: Given a source vertex $s$, a destination vertex $d$, and a travel time budget $T$, we aim at identifying a path $P^*$ that goes from $s$ to $d$ and has the largest probability of arriving at the destination within the time budget $T$. That is, $P^* = \arg\max_{P \in \mathcal{P}} \Pr(C_P \le T)$, where set $\mathcal{P}$ contains all paths from $s$ to $d$ and the travel time distribution $C_P$ of path $P$ is computed using the PACE model.
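The definition can be turned directly into an exhaustive reference solver. It is useful only for checking faster algorithms on small graphs, because the number of simple paths grows exponentially. This sketch enumerates simple paths by depth-first search and takes `cost_dist`, a hypothetical callable that returns $C_P$ (under PACE it would apply Equation 1). The demo graph is hypothetical; its two $s$-$d$ paths carry the distributions of Table I.

```python
from fractions import Fraction


def simple_paths(graph, s, d, path=None, seen=None):
    """Yield every simple s-d path (no repeated vertex) as a tuple of edges."""
    if path is None:
        path, seen = [], {s}
    if s == d:
        yield tuple(path)
        return
    for v in graph.get(s, ()):
        if v not in seen:
            path.append((s, v))
            seen.add(v)
            yield from simple_paths(graph, v, d, path, seen)
            path.pop()  # backtrack
            seen.discard(v)


def on_time_prob(dist, budget):
    return sum(p for t, p in dist.items() if t <= budget)


def spotar_brute_force(graph, s, d, budget, cost_dist):
    """P* = argmax over all simple s-d paths P of Pr(C_P <= budget)."""
    best, best_p = None, -1
    for path in simple_paths(graph, s, d):
        p = on_time_prob(cost_dist(path), budget)
        if p > best_p:
            best, best_p = path, p
    return best, best_p


graph = {"s": ["a", "b"], "a": ["d"], "b": ["d"]}
TABLE_I = {
    (("s", "a"), ("a", "d")): {40: Fraction(1, 2), 50: Fraction(1, 5),
                               60: Fraction(1, 5), 70: Fraction(1, 10)},
    (("s", "b"), ("b", "d")): {50: Fraction(4, 5), 60: Fraction(1, 5)},
}
print(spotar_brute_force(graph, "s", "d", 60, TABLE_I.__getitem__))
```

With a 60-minute budget the solver picks the path via `b` (probability 1); with a 45-minute budget it switches to the path via `a`.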
IV Proposed Solution
We propose an algorithm for solving the SPOTAR problem based on the PACE model. The algorithm is based on a heuristic function that estimates the least possible travel time from a vertex to the destination vertex. We first introduce the heuristic function and then introduce the algorithm using the heuristic function.
IV-A Heuristic Function
Motivated by the A* algorithm, to decide which edge to explore next when searching for the SPOTAR path, we maintain a heuristic function $h(v)$ that takes as input a vertex $v$ and returns the least possible travel time from $v$ to the destination vertex $d$.
A baseline heuristic function can simply return a travel time that equals the Euclidean distance between the argument vertex and $d$ divided by the maximum speed limit in the road network. However, this heuristic is too optimistic and thus loose. We aim at introducing a tighter heuristic.
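For reference, the baseline heuristic (BA in Section VI) is a one-liner. This sketch assumes planar coordinates in metres and a speed limit in metres per second; the coordinates and the 25 m/s limit are hypothetical.

```python
import math


def baseline_heuristic(coords, v, d, max_speed):
    """Euclidean distance from v to d divided by the maximum speed limit."""
    (x1, y1), (x2, y2) = coords[v], coords[d]
    return math.hypot(x2 - x1, y2 - y1) / max_speed


coords = {"v": (0.0, 0.0), "d": (3000.0, 4000.0)}
print(baseline_heuristic(coords, "v", "d", 25.0))  # 200.0 seconds
```

It never overestimates as long as no vehicle exceeds the maximum speed limit, but it ignores the road network entirely. That is why it is loose.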
To this end, we first introduce a reversed graph $G^r$ based on the original graph $G$ and then compute a shortest path tree that is rooted at $d$ in $G^r$. In particular, $G^r$ contains the same vertices as $G$, and the edges in $G^r$ have the reversed directions of the edges in $G$. Fig. 2 shows the reversed graph of the graph shown in Fig. 1. In the reversed graph, the edge weights are deterministic. In particular, for an edge $(v_j, v_i)$ in $G^r$, its weight equals the least travel time in the uncertain travel time of edge $(v_i, v_j)$ in the original graph. For example, the reversed edge of $e_1$ in $G^r$ has weight 8, since the least travel time of edge $e_1$ in $G$ is 8.
Next, we compute a shortest path tree that is rooted at the destination vertex $d$ in the reversed graph using Dijkstra’s algorithm. Note that Dijkstra’s algorithm terminates as soon as the distance from the destination vertex to the next vertex to be settled exceeds the time budget $T$. In other words, we do not need to compute a full shortest path tree, but only the part of the shortest path tree that contains the vertices whose distances to $d$ are not larger than $T$. We use function $h(v)$ to return the minimum travel time needed to travel from $v$ to destination $d$.
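The following is a sketch of the budget-bounded Dijkstra search described above, using Python's `heapq`. It assumes edge weights are given as `{(u, v): {time: probability}}`; `min_time_to_dest` and the small example graph are hypothetical. Vertices missing from the result are farther than the budget and can be treated as $h(v) = \infty$.

```python
import heapq
from collections import defaultdict


def min_time_to_dest(edge_weights, d, budget):
    """Budget-bounded Dijkstra from d on the reversed graph G^r; returns {v: h(v)}."""
    reversed_adj = defaultdict(list)
    for (u, v), dist in edge_weights.items():
        # Edge v -> u in G^r, weighted by the least travel time of (u, v) in G.
        reversed_adj[v].append((u, min(dist)))
    h = {}
    heap = [(0, d)]
    while heap:
        dist_v, v = heapq.heappop(heap)
        if dist_v > budget:
            break     # every unsettled vertex is farther than the budget
        if v in h:
            continue  # already settled via a shorter route
        h[v] = dist_v
        for u, w in reversed_adj[v]:
            if u not in h:
                heapq.heappush(heap, (dist_v + w, u))
    return h


EDGES = {
    ("s", "a"): {8: 0.9, 10: 0.1},
    ("a", "d"): {6: 0.8, 10: 0.2},
    ("s", "b"): {5: 0.5, 15: 0.5},
    ("b", "d"): {5: 1.0},
}
print(min_time_to_dest(EDGES, "d", 30))  # {'d': 0, 'b': 5, 'a': 6, 's': 10}
```

With a budget of 8, vertex `s` (at distance 10) is never settled, so only part of the tree is built.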
Figure 3 shows the reversed shortest path tree rooted at vertex $d$ with budget $T$. If the budget is 15, the shortest path tree excludes edge . We also have , meaning that it takes at least 11 mins to travel from to destination $d$.
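Based on the reversed shortest path tree, we derive the “best” travel time distribution of reaching destination vertex $d$ from any vertex $v$ within $t$ time units as $B_v(t)$, which is shown in Equation 2.
$$B_v(t) = \begin{cases} 1, & \text{if } h(v) \le t \\ 0, & \text{otherwise} \end{cases} \tag{2}$$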
Given a path $P$ from source vertex $s$ to vertex $v$, we are able to compute the cost distribution $C_P$ of path $P$ using the PACE model. Now, based on the best travel time distribution $B_v(\cdot)$, we are able to estimate the best travel time distribution if we follow $P$ and then continue to the destination $d$. In particular, we use $r(P, T)$ to denote the largest probability of arriving at the destination within $T$ mins if we follow path $P$ from $s$ to $v$ and then continue from $v$ to $d$.
$$r(P, T) = \sum_{x} \Pr(C_P = x) \cdot B_v(T - x) \tag{3}$$
where $\Pr(C_P = x)$ indicates the probability that $P$ takes $x$ mins and $B_v(T - x)$ indicates the probability that it takes no more than $T - x$ mins to go from $v$ to $d$.
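Equations 2 and 3 can be sketched in a few lines. In this sketch, `h` is a dictionary of least travel times to $d$ (hypothetical values for a small example graph, not those of Figure 3), and a missing vertex is treated as unreachable within the budget. The helper names are illustrative.

```python
from fractions import Fraction


def best_prob(h, v, t):
    """Equation (2): B_v(t) = 1 if h(v) <= t, else 0 (missing h(v) means infinity)."""
    return 1 if h.get(v, float("inf")) <= t else 0


def r_value(cost_dist, h, v, budget):
    """Equation (3): r(P, T) = sum over x of Pr(C_P = x) * B_v(T - x)."""
    return sum(p * best_prob(h, v, budget - x) for x, p in cost_dist.items())


h = {"d": 0, "a": 6, "b": 5, "s": 10}            # least travel times to d
C_P = {8: Fraction(9, 10), 10: Fraction(1, 10)}  # P = <(s, a)>, ending at a
print(r_value(C_P, h, "a", 15), r_value(C_P, h, "a", 16))  # 9/10 1
```

Since $h(d) = 0$, for a path that already ends at $d$, $r(P, T)$ reduces to the on-time probability $\Pr(C_P \le T)$.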
IV-B Proposed Algorithm
The algorithm for solving the SPOTAR problem is shown in Algorithm 1.
We start by initializing a priority queue $PQ$ that holds elements of the form $(P, r, J_P, C_P)$, where $P$ is a path from source $s$ to some vertex $v$, $r$ is the largest probability of arriving at $d$ within time budget $T$ using path $P$ (i.e., $r(P, T)$ from Equation 3), $J_P$ is the joint distribution of $P$, and $C_P$ is the total travel time distribution of $P$. We also initialize $(P^*, p^*)$, which maintains a candidate SPOTAR path and its probability of arriving at destination $d$ within $T$.
We first check all vertices that can be reached from vertex $s$ via one outgoing edge (lines 3 to 7). Since the candidate path has only one edge at this point, its joint distribution and cost distribution are the same (line 4). Next, if the minimum travel time of reaching $d$ via the candidate path is already larger than the time budget $T$, we do not insert the path into the priority queue. Otherwise, we insert it.
We then proceed to the main while loop at line 10: in each iteration, we extract the element with the largest $r$-value, until the priority queue is empty.
Assume that the element with the largest $r$-value represents path $P$. We check whether $P$ has already arrived at destination $d$. If yes, we check whether $P$'s $r$-value is larger than the current maximum probability $p^*$. If it is, we update the candidate SPOTAR path $(P^*, p^*)$ and remove from $PQ$ all elements whose $r$-values are smaller than the new $p^*$. If it is not, we can stop the algorithm, since all the remaining paths in the priority queue have $r$-values no larger than $p^*$. This means that it makes no sense to keep exploring the remaining paths, since their probabilities of arriving within the time budget $T$ cannot be greater than $p^*$.
On the other hand, if $P$ has not arrived at $d$ yet, i.e., $P$ ends at a vertex $v \neq d$, we extend $P$ by one more edge $(v, u)$ to get an extended path $P'$, which arrives at vertex $u$. Here, we only consider simple paths without cycles, as it has been shown that a path with cycles cannot be a SPOTAR path [6].
If it is possible to follow $P'$ to arrive at $d$ within the time budget $T$, we compute its joint distribution $J_{P'}$ using the PACE model and its total cost distribution $C_{P'}$. We next apply Equation 3 to compute the $r$-value of the extended path $P'$.
Before inserting the extended path $P'$ into the priority queue $PQ$, we conduct the following stochastic dominance check. If $PQ$ already contains a path $P''$ that ends at the same vertex as $P'$, we check whether $P'$'s cost distribution stochastically dominates [8] $P''$'s cost distribution. If yes, we remove $P''$'s element from $PQ$ and insert $(P', r(P', T), J_{P'}, C_{P'})$ into $PQ$. Similarly, if $P''$ stochastically dominates $P'$, we do not insert $P'$ into $PQ$. If neither path dominates the other, both are kept. Note that only checking whether $P'$'s $r$-value is larger than $P''$'s $r$-value is insufficient [4, 8].
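The dominance check can be sketched as a comparison of cumulative distribution functions. This sketch assumes first-order stochastic dominance with lower travel time being better: $a$ dominates $b$ if $\Pr(a \le t) \ge \Pr(b \le t)$ for every $t$, strictly for some $t$. The example distributions reuse the two totals computed for $\langle e_1, e_2 \rangle$ in Section III, plus a hypothetical deterministic one.

```python
from fractions import Fraction


def cdf(dist, t):
    """Pr(C <= t) for a discrete travel time distribution {time: probability}."""
    return sum(p for x, p in dist.items() if x <= t)


def dominates(a, b):
    """True if `a` first-order stochastically dominates `b` (lower cost is better)."""
    points = sorted(set(a) | set(b))  # the CDFs only change at support points
    diffs = [cdf(a, t) - cdf(b, t) for t in points]
    return all(x >= 0 for x in diffs) and any(x > 0 for x in diffs)


c1 = {14: Fraction(4, 5), 20: Fraction(1, 5)}
c2 = {14: Fraction(18, 25), 16: Fraction(2, 25), 18: Fraction(9, 50), 20: Fraction(1, 50)}
c3 = {14: Fraction(1)}
print(dominates(c1, c2), dominates(c2, c1), dominates(c3, c1))  # False False True
```

The CDFs of `c1` and `c2` cross (0.8 vs 0.72 at 14, but 0.8 vs 0.98 at 18), so neither dominates the other. A path with either distribution could be better depending on the remaining budget, which is why comparing $r$-values alone is not enough.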
Finally, we return the SPOTAR path $P^*$ along with its maximum probability $p^*$ of arriving at $d$ within time budget $T$.
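The following Python sketch combines the steps of this subsection. It is an illustration under stated assumptions, not the authors' implementation:

- `cost_dist(path)` stands in for the PACE computation of Equation 1. The demo uses independent convolution, i.e., the edge-centric model, because it needs no path weights.
- `h` is assumed to be precomputed as in Section IV-A.
- Dominated entries are deleted lazily.
- Following the text's argument that $r$ bounds the on-time probability of any extension, the removal of entries below $p^*$ and the stopping test are folded into a single check.

All names and the demo graph are hypothetical.

```python
import heapq
import itertools
from fractions import Fraction


def cdf(dist, t):
    return sum(p for x, p in dist.items() if x <= t)


def dominates(a, b):
    """First-order stochastic dominance of cost distribution a over b."""
    diffs = [cdf(a, t) - cdf(b, t) for t in sorted(set(a) | set(b))]
    return all(x >= 0 for x in diffs) and any(x > 0 for x in diffs)


def r_value(dist, h, v, budget):
    """Equations (2) and (3) combined: Pr(C_P + h(v) <= T)."""
    hv = h.get(v, float("inf"))
    return sum(p for x, p in dist.items() if x + hv <= budget)


def spotar(graph, s, d, budget, h, cost_dist):
    """Best-first search over simple paths from s, ordered by r-value."""
    tie = itertools.count()  # tie-breaker so heapq never compares paths
    pq = []                  # max-heap via negated keys: (-r, tie, path, dist)
    live = {}                # end vertex -> {path: dist} of non-dominated entries

    def push(path, dist):
        u = path[-1][1]
        r = r_value(dist, h, u, budget)
        if r == 0:
            return                  # d is unreachable within the budget
        entries = live.setdefault(u, {})
        for other, other_dist in list(entries.items()):
            if dominates(other_dist, dist):
                return              # an existing path to u dominates this one
            if dominates(dist, other_dist):
                del entries[other]  # lazily drop the dominated entry
        entries[path] = dist
        heapq.heappush(pq, (-r, next(tie), path, dist))

    for v in graph.get(s, ()):      # one-edge paths leaving s
        push(((s, v),), cost_dist(((s, v),)))

    best, best_p = None, 0
    while pq:
        neg_r, _, path, dist = heapq.heappop(pq)
        r, u = -neg_r, path[-1][1]
        if path not in live.get(u, {}):
            continue                # entry was dominated after it was pushed
        del live[u][path]
        if r <= best_p:
            break                   # no remaining entry can beat best_p
        if u == d:
            best, best_p = path, r  # new candidate SPOTAR path
            continue
        visited = {s} | {e[1] for e in path}
        for v in graph.get(u, ()):
            if v not in visited:    # simple paths only
                ext = path + ((u, v),)
                push(ext, cost_dist(ext))
    return best, best_p


def convolve(dists):
    """Independent convolution: a stand-in for PACE's Equation (1) in this demo."""
    total = {0: Fraction(1)}
    for dist in dists:
        nxt = {}
        for x, p in total.items():
            for y, q in dist.items():
                nxt[x + y] = nxt.get(x + y, 0) + p * q
        total = nxt
    return total


EDGES = {("s", "a"): {8: Fraction(9, 10), 10: Fraction(1, 10)},
         ("a", "d"): {6: Fraction(4, 5), 10: Fraction(1, 5)},
         ("a", "b"): {2: Fraction(1)},
         ("s", "b"): {5: Fraction(1, 2), 15: Fraction(1, 2)},
         ("b", "d"): {5: Fraction(1)}}
GRAPH = {"s": ["a", "b"], "a": ["d", "b"], "b": ["d"]}
H = {"d": 0, "b": 5, "a": 6, "s": 10}  # least travel times to d


def cost(path):
    return convolve(EDGES[e] for e in path)


print(spotar(GRAPH, "s", "d", 15, H, cost))
```

With $T = 15$ the search returns the path through `a` and `b` with probability 0.9. With $T = 14$ that path is pruned by $h$, and the direct path through `a` wins with probability 0.72.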
V Running example
This section presents a concrete example of the SPOTAR problem solved by the proposed algorithm. The graph from Figure 1 serves as a running example of the algorithm for computing SPOTAR under the PACE model [1]. The joint distributions and the derived cost distributions of the two paths whose joint distributions are maintained for the graph in Fig. 1 are shown in Tables XI and XIV, respectively. For example, one of these joint distributions has been derived from the trajectories shown in Table II.
Problem: Consider the problem of finding the path with the highest probability of reaching destination $d$ from source $s$ in graph $G$ within a time budget $T$. In addition, we have ; and the joint distributions of the following paths are known a priori and are maintained in the path weight function $W$: in Table XI, in Table XIV.
Solution: The reversed graph with minimum travel times for all edges is shown in Figure 2. Next, the algorithm computes the shortest path tree on the reversed graph and labels each node with its distance to the destination node $d$, as shown in Figure 3.
For each node $v$, we define $B_v(t)$ as the highest probability of reaching the destination node $d$ from node $v$ within $t$, i.e., Equation 4 as stated in [1], where $h(v)$ denotes the minimum travel time from node $v$ to destination node $d$ and $t$ is a given time budget.
$$B_v(t) = \begin{cases} 1, & \text{if } h(v) \le t \\ 0, & \text{otherwise} \end{cases} \tag{4}$$
The algorithm begins by initializing the priority queue $PQ$. Next, it iterates over all adjacent arcs of $s$, i.e. and . For arc the algorithm computes which is given as follows:
Next, the value of is computed using as follows:
After the value of has been computed, edge is added to the path created so far, i.e., . The path, along with its value, is pushed to the priority queue $PQ$.
The second adjacent arc of node that has been considered by the algorithm is arc . The algorithm again starts by computing the joint probability for the path discovered until now (i.e. ), which is shown as follows.
Next, the algorithm computes values for node using .
Once the r-value of has been computed, edge is added to the path created so far, and the path, along with its value, is pushed to the priority queue $PQ$.
Next, the algorithm has to pop the value with the maximum key from the priority queue . Until now there are two elements in i.e. edge and edge both with key value equal to 1. The algorithm decides to pop the element which is associated with the path based on the key .
After that, the algorithm iterates over all adjacent arcs of node i.e. arcs and .
For arc , conditional independence holds because there are no overlapping sub-paths that contain and the path constructed so far by the algorithm. Therefore, the algorithm computes the joint distribution and then derives the associated cost distribution, as shown in Table XVII.
Next, the algorithm prunes path based on the minimum travel time to the destination, i.e., ; therefore, nothing is added to the priority queue.
The next adjacent arc of the path that ends in node is . It passes through path ; therefore, the algorithm uses the given joint and cost distributions from Table XI, and then has been computed, where .
The path has been added to the priority queue .
Until now, there are two paths in the priority queue i.e. path and path with keys 1.0 and 0.8, respectively.
The algorithm pops the element associated with path from the priority queue based on the element keys since .
Then the algorithm iterates over all adjacent arcs of i.e. and . Arc can be pruned based on minimum travel time calculated as follows . Since arc passes through path , the algorithm uses the given probability in Table XIV in order to compute a value of , where as follows:
Since path ends in node q, and path also ends in node q and has already been added to the priority queue, we try to prune one of the two paths based on stochastic dominance, as explained in [1].
As can be seen from Figure 4, neither path stochastically dominates the other; therefore, we cannot prune either of them, and the algorithm adds path to the priority queue.
At this point, the priority queue contains two paths, and , with the following associated objects:
Path is popped from the priority queue based on its key 0.8, and the algorithm considers all adjacent edges of node , i.e., arcs and .
Arc is pruned by the algorithm based on its minimum travel time to the destination, calculated as follows:
Arc is explored by the algorithm: it computes the travel cost distribution of path , for which independence between and holds; the result is shown in Table XX.
The search reaches the destination node $d$, yielding Solution 1: path with probability 0.32 of reaching the destination within a time budget of 22 units. After obtaining a solution, the algorithm checks the keys (i.e., the r-values) of all elements in the priority queue. If a key is less than the probability of the solution , we prune the corresponding entry from the priority queue. Since , we cannot prune path , which is in the priority queue.
Pop based on key . Among the adjacent arcs of node , first, can be pruned based on the minimum travel time . Second, for arc , the algorithm computes the travel cost for the path . Independence holds; therefore, we have , and the result is shown in Table XXIII.
The search reaches destination node $d$, yielding Solution 2: path with probability of reaching the destination node within a time budget of 22 units. We check the keys of all elements in the priority queue; if a key is less than , we eliminate the corresponding entry from the priority queue. The priority queue is empty, so there is nothing to prune. Since Solution 2 with path has a higher probability of reaching the destination node within the time budget of 22 units, it is the final solution.
VI Experimental results
We report on experimental results using real world GPS trajectories.
VI-A Experimental Setup
We use the road network of Aalborg, Denmark, which contains 6,253 nodes and 10,716 edges. We use 37 million GPS records that occurred in Aalborg from Jan 2007 to Dec 2008 to instantiate the PACE model. The sampling rate of the GPS records is 1 Hz (i.e., one GPS record per second). If an edge is not covered by GPS data, we use the length of the edge and the speed limit on the edge to derive a travel time. If a path has been traversed by at least 10 trajectories, we maintain a joint distribution for the path. Figure 5 visualizes the edges that are covered and not covered by the GPS trajectories.
Queries: We consider different settings to generate SPOTAR queries. We vary the time budget (seconds) among 300, 500, 700, and 1,000. We also vary the Euclidean distance (km) between source-destination pairs: , , , and . For each setting, we randomly generate 20 source-destination pairs.
Methods: We consider two heuristic functions: (1) the proposed heuristic, i.e., the minimum travel time to the destination computed using shortest path trees (SP); and (2) the baseline heuristic, i.e., the Euclidean distance divided by the maximum speed limit (BA). We apply both heuristics to both PACE and the edge-centric model (EDGE), resulting in four methods: SP+PACE, SP+EDGE, BA+PACE, and BA+EDGE.
Evaluation Metrics: We report average runtimes and sizes of search space for running SPOTAR in different settings.
We conduct experiments on a computer with an Intel® Core™ i5-4210U CPU @ 1.70GHz × 4 processors and 12GB RAM, running 64-bit Linux Fedora 25. The code was implemented in Python 3.
VI-B Experimental Results
Runtime: Figure 10 shows the runtimes of the four methods under different settings. When the distance between a source-destination pair increases, the runtimes of all methods also increase for all time budgets. The SP heuristic achieves significantly better runtimes than the BA heuristic under all settings on both models. In addition, the runtime of the BA heuristic grows much faster than that of the SP heuristic as the distance between a source and a destination increases. Under the same heuristic function and time budget, methods based on PACE have runtimes similar to those of methods based on EDGE. This suggests that although PACE provides more accurate results than EDGE, it does not take longer to run, i.e., PACE is both accurate and efficient.
Search space: We investigate the sizes of search spaces that different methods explore. In particular, we define the search space as the edges that have been explored by a method.
We compare the search spaces explored by different methods. Figure 15 shows that the search space of BA is much larger than that of SP in all settings, which is consistent with the runtimes shown in Figure 10.
Next, we visualize the edges explored by both heuristics for a specific source-destination pair (see the caption of Fig. 20). It is clear that: (1) when using the same model, SP explores far fewer edges than BA does, indicating that the SP heuristic is effective (see the green edges in Fig. 20); (2) the path returned under the PACE model differs from the path returned under the EDGE model (see the red paths in Fig. 20), indicating that the more accurate path distributions captured by PACE do make a difference to the returned paths.
VII Conclusions and Outlook
Arriving on time is an important problem that has many applications in intelligent transportation systems. We present an effective algorithm that solves the SPOTAR problem on the novel path-centric PACE model that considers the dependencies among travel times on different edges. Experimental results on real-world trajectories suggest that the proposed algorithm is effective. In the future, we plan to study further speed-up strategies, e.g., using contraction hierarchies and hub labeling, on top of the PACE model, also possibly using parallel computing [17].
References
- [1] C. Guo, C. S. Jensen, and B. Yang, “Towards Total Traffic Awareness,” SIGMOD Record, vol. 43, no. 3, pp. 18–23, 2014.
- [2] J. Dai, B. Yang, C. Guo, C. S. Jensen, and J. Hu, “Path Cost Distribution Estimation Using Trajectory Data,” PVLDB 10(3): 85-96, 2016.
- [3] B. Yang, J. Dai, C. Guo, C. S. Jensen, and J. Hu, “PACE: a PAth-CEntric paradigm for stochastic path finding,” VLDB Journal 27(2): 153-178, 2018.
- [4] M. Niknami, S. Samaranayake, “Tractable Pathfinding for the Stochastic On-Time Arrival Problem,” SEA, 231-245, 2016.
- [5] G. Sabran, S. Samaranayake, A. Bayen, “Precomputation techniques for the stochastic on-time arrival problem,” ALENEX, 138-146, 2014.
- [6] Yu Nie and Xing Wu, “Shortest path problem considering on-time arrival probability,” Transportation Research Part B: Methodological, 2009.
- [7] C. Guo, B. Yang, O. Andersen, et al. “EcoSky: Reducing vehicular environmental impact through eco-routing.” ICDE, 1412-1415, 2015.
- [8] B. Yang, C. Guo, C. S. Jensen, et al. “Stochastic skyline route planning under time-varying uncertainty.” ICDE, 136–147, 2014.
- [9] B. Yang, M. Kaul, and C. S. Jensen. “Using incomplete information for complete weight annotation of road networks.” TKDE, 26(5):1267–1279, 2014.
- [10] C. Guo, B. Yang, J. Hu, C. S. Jensen. “Learning to Route with Sparse Trajectory Sets.” ICDE, 12 pages, 2018.
- [11] H. Liu, C. Jin, B. Yang, A. Zhou. “Finding Top-k Optimal Sequenced Routes.” ICDE, 12 pages, 2018.
- [12] H. Liu, C. Jin, B. Yang, A. Zhou. “Finding Top-k Shortest Paths with Diversity.” TKDE, 30(3):488-502, 2018.
- [13] J. Hu, B. Yang, C. Guo, C. S. Jensen. “Risk-aware path selection with time-varying, uncertain travel costs: a time series approach.” The VLDB Journal 27(2):179-200, 2018.
- [14] J. Hu, B. Yang, C. S. Jensen, Y. Ma. “Enabling time-dependent uncertain eco-weights for road networks.” GeoInformatica 21(1):57-88, 2017.
- [15] Z. Ding, B. Yang, R. H. Güting, Y. Li. “Network-Matched Trajectory-Based Moving-Object Database: Models and Applications.” IEEE Trans. Intelligent Transportation Systems 16(4):1918-1928, 2015.
- [16] B. Yang, C. Guo, Y. Ma, and C. S. Jensen. “Toward personalized, context-aware routing.” The VLDB Journal, 24(2):297–318, 2015.
- [17] B. Yang, Q. Ma, W. Qian, A. Zhou. “TRUSTER: TRajectory Data Processing on ClUSTERs.” DASFAA, 768-771, 2009.
- [18] Z. Ding, B. Yang, Y. Chi, L. Guo. “Enabling Smart Transportation Systems: A Parallel Spatio-Temporal Database Approach.” IEEE Trans. Computers 65(5): 1377-1391, 2016.