Learning to Route with Sparse Trajectory Sets---Extended Version

02/22/2018 ∙ by Chenjuan Guo, et al. ∙ Aalborg University

Motivated by the increasing availability of vehicle trajectory data, we propose learn-to-route, a comprehensive trajectory-based routing solution. Specifically, we first construct a graph-like structure from trajectories as the routing infrastructure. Second, we enable trajectory-based routing given an arbitrary (source, destination) pair. In the first step, given a road network and a collection of trajectories, we propose a trajectory-based clustering method that identifies regions in a road network. If a pair of regions are connected by trajectories, we maintain the paths used by these trajectories and learn a routing preference for travel between the regions. As trajectories are skewed and sparse, many region pairs are not connected by trajectories. We thus transfer routing preferences from region pairs with sufficient trajectories to such region pairs and then use the transferred preferences to identify paths between the regions. In the second step, we exploit the above graph-like structure to achieve a comprehensive trajectory-based routing solution. Empirical studies with two substantial trajectory data sets offer insight into the proposed solution, indicating that it is practical. A comparison with a leading routing service offers evidence that the paper's proposal is able to enhance routing quality. This is an extended version of "Learning to Route with Sparse Trajectory Sets" [1], to appear in IEEE ICDE 2018.




I Introduction

Vehicular transportation is an important aspect of the daily lives of many people and is essential to many businesses as well as society as a whole [2, 3]. As a part of the continued digitization of societal processes, more and more data is becoming available in the form of trajectories that capture the movements of vehicles [4, 5]. This data offers a foundation for improving vehicular transportation, including vehicle routing.

Traditional routing is cost-centric and aims at returning paths with minimal costs, e.g., distance, travel time, or fuel consumption. The cost of a path is computed from edge costs in edge-based cost modeling [6, 7, 8, 9, 10, 11] or sub-path costs in path-based cost modeling [12, 13, 14, 15]. In such routing, trajectory data is often used for annotating the edges or sub-paths with travel costs such as travel times; and routing services employ shortest path algorithms, e.g., Dijkstra’s algorithm or contraction hierarchies [16], to return fastest, or simply shortest, paths. However, an existing study [17] suggests that local drivers who drive passenger vehicles follow paths that differ substantially from the paths computed using cost-centric routing and are often neither fastest nor shortest. Our paper also focuses on trajectory data that was generated from passenger vehicles.

We study a very different routing approach that relies on the availability of trajectories from local drivers. Assuming that local drivers implicitly take into account a multitude of factors, such as traffic conditions, turns, travel time, fuel consumption, road types, and traffic lights, when making routing decisions and thus know best which paths are preferable, we propose a methodology that utilizes paths found in historical trajectories to construct new paths between arbitrary (source, destination) pairs. We call this trajectory-based routing.

If historical trajectories show that many drivers traveling from a source to a destination follow a particular path, it is straightforward to recommend that path to drivers asking for directions between the same source and destination. The big challenge is how to benefit from historical trajectories when none of them capture a path from the source to the destination. This is important because any set of historical trajectories is sparse in the sense that it is unlikely to provide paths for all (source, destination) pairs. For example, the road network of Denmark, a small country, contains some 1.6 million edges. Thus, if all edges are candidate sources and destinations, trajectories covering a minimum of 2.6 trillion pairs are needed. Given that the distribution of trajectories in a road network is skewed, an enormous set of trajectories (e.g., trillions for Denmark and quadrillions for Germany) would be needed before routing could be done by simply looking up paths of past trajectories for any pair.
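The pair count quoted above can be checked with a line of arithmetic. The snippet below is only a back-of-the-envelope sketch; it assumes, as the text does, that every edge may serve as both a source and a destination.

```python
# Rough check of the number of (source, destination) pairs for Denmark.
num_edges = 1_600_000           # approximate edge count quoted in the text
num_pairs = num_edges ** 2      # every edge as candidate source and destination
print(f"{num_pairs:.2e} pairs") # about 2.6 trillion after rounding
```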

Figure 1 exemplifies the problem setting. The solid edges and filled vertices are covered by a set of five trajectories, while the dashed edges and unfilled vertices are not covered by any trajectories. For example, trajectory visited and then , , , and before reaching . If routing from to is requested, the path , as captured by trajectory , can be recommended directly. The challenge is to enable routing for pairs that are not connected by trajectories, e.g., , and , .

Fig. 1: Motivating Example

To enable trajectory-based routing with massive, but still sparse, sets of historical trajectories, we propose means that are able to generalize the cases where historical trajectories can be utilized for routing. This includes three steps. In the first step, we cluster vertices into regions and thus map a road network graph into a region graph. Trajectories that originally connect vertices in the road network graph now connect regions in the region graph. This arrangement generalizes the cases where trajectories can be used for routing from being between specific vertex pairs to being between region pairs. As regions include multiple vertices, this arrangement contributes to solving the data sparseness problem.

For example, in Figure 1, and are clustered into region , and and are clustered into region . Now, although no trajectories connect and , connects regions and that are close to and . Thus, the path of can be used for recommending a path from to . For instance, a user may go from to , then follow the path used by to reach , and then go to . This enables trajectory-based routing between regions connected by trajectories. However, in the region graph, some region pairs are still not connected by any trajectories, e.g., regions and in Figure 1.

In the second step, we learn routing preferences from available historical trajectories that connect some region pairs and then transfer these preferences to similar region pairs that are not connected by trajectories. Based on the transferred preferences, we identify paths for the non-covered region pairs. Note that the routing preferences are learned for different region pairs, not for different individual drivers. Assume that , is similar to , , e.g., because both are from a residential area to a business district. Next, we extract a routing preference from the trajectories connecting and that explains the choice of paths from to . We transfer this routing preference to driving from to and then identify paths connecting and , upon which trajectory-based routing from to is possible.

In the third step, we provide a unified routing solution, called learn-to-route (L2R), which performs path finding on the region graph, thus enabling routing between arbitrary pairs in the original road network graph.

To the best of our knowledge, this is the first solution that learns routing preferences from historical trajectories and transfers the learned preferences to the part of a road network that is not covered by trajectories, thus supporting comprehensive trajectory-based routing for arbitrary pairs.

The paper makes four contributions. First, it presents a trajectory-based road network clustering algorithm that produces the data foundation—the region graph. Second, it presents a general routing preference model, including an algorithm that extracts preferences from historical trajectories and an algorithm that transfers preference to similar region pairs. Third, it presents a unified routing algorithm for the region graph. Fourth, it reports on an empirical evaluation that offers insight into the proposed solution, indicating that it is capable of efficiently computing paths that match those of local drivers better than do traditional routing services.

Paper Outline: Section II covers related work. Section III covers preliminaries. Section IV presents Step 1, region graph generation. Section V presents Step 2, preference learning and transfer. Section VI presents Step 3, unified routing. Section VII reports on empirical evaluations. Section VIII concludes.

II Related Work

We first review studies on employing historical trajectories for path recommendation, considering three cases.

Case 1: Given a source and a destination, complete trajectories exist that connect the source to the destination. For example, given and in Figure 1, trajectory went from to . Then, the path of trajectory is recommended. When multiple paths exist, the path with the highest popularity is recommended, where the popularity can be defined using different strategies [18, 19, 20]. This is the simplest case, which is also considered in our proposal.

Case 2: Given a source and a destination, no complete trajectories exist that connect the source to the destination, but trajectories exist that can be spliced such that the spliced trajectories connect the source to the destination. In Figure 1, given and , sub-paths from , from , and from can be spliced to form a path from to . Alternatively, and can also be spliced to enable a different path from to . To determine which spliced path is “best”, absorbing Markov chains and hidden Markov models are employed to estimate the probabilities that different spliced paths may occur based on historical trajectories. The spliced path with the highest probability is chosen. In contrast, we learn routing preference vectors from trajectories and apply the preference vectors to identify best paths.

Case 3: Neither complete nor spliced trajectories are able to connect a source to a destination. In the example, consider, e.g., to , to , and to . Here, existing methods [18, 19, 20, 21] no longer work. In this paper, the use of the proposed region graph, together with the mechanism of learning and transferring routing preferences captured by past trajectories, makes it possible to extend the situations where historical trajectories can be utilized to cover also Case 3.

Next, we review related work on road network clustering. Gonzalez et al. [22] propose a graph partition method based on prior knowledge of the road network hierarchy with a given number of levels, which may vary from country to country. Wei et al. [23] propose a grid-based method for constructing regions using trajectories, where two adjacent grid cells are merged if more than a threshold number of trajectories passed through them. These studies rely heavily on “appropriate” parameters, e.g., the number of hierarchy levels and the trajectory threshold. Tuning such parameters is non-trivial. Based on recent advances in modularity-based graph clustering, we propose a generic, parameter-free region generation method, where such parameters are not needed. Our proposal is also different from POI clustering [24].

Finally, we consider learning of routing preferences [25, 26, 27, 28]. Methods [26, 25] compare the paths used by trajectories to skyline paths [9] to identify different users’ dominating factors when choosing paths, e.g., travel time, fuel consumption, or distance. TRIP [27] uses the ratios between individual drivers’ travel time and average travel time to model personalized travel times. A recent study from Microsoft presents an algorithm that learns driver-specific parameters for Bing Maps’ ranking function for candidate paths based on individual drivers’ past trajectories [28]. However, all existing methods work only when trajectories are available. In contrast, our proposal is also able to transfer routing preferences to places without trajectories, where existing methods do not apply.

III Preliminaries

We cover the definitions of important concepts, introduce the problem, and present a solution overview.

A road network is a weighted graph , where vertex set consists of vertices representing road intersections, edge set consists of edges representing road segments, and is a set of weight functions, where each function has signature . For specificity, we maintain four functions in . Functions , , , and return the distance (DI), travel time (TT), fuel consumption (FC), and road type (RT) of the argument edge, respectively.

A path is a sequence of vertices where two consecutive vertices are connected by an edge.
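As an illustration of these two definitions, the following minimal sketch stores a road network as a dictionary of edges carrying the four weights and checks whether a vertex sequence is a path. The class and field names, the directed-edge representation, and the units are assumptions made for the example; they are not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class EdgeWeights:
    di: float   # distance (DI)
    tt: float   # travel time (TT)
    fc: float   # fuel consumption (FC)
    rt: int     # road type (RT), encoded here as an integer category

@dataclass
class RoadNetwork:
    # edges[(u, v)] holds the weights of the edge from u to v (assumed directed)
    edges: dict = field(default_factory=dict)

    def add_edge(self, u, v, weights):
        self.edges[(u, v)] = weights

    def is_path(self, vertices):
        """A path is a vertex sequence whose consecutive vertices share an edge."""
        return all((u, v) in self.edges for u, v in zip(vertices, vertices[1:]))

    def path_cost(self, vertices, cost):
        """Sum one numeric weight ('di', 'tt', or 'fc') over the edges of a path."""
        return sum(getattr(self.edges[(u, v)], cost)
                   for u, v in zip(vertices, vertices[1:]))

g = RoadNetwork()
g.add_edge("a", "b", EdgeWeights(di=1.0, tt=60.0, fc=0.1, rt=1))
g.add_edge("b", "c", EdgeWeights(di=2.0, tt=90.0, fc=0.2, rt=2))
print(g.is_path(["a", "b", "c"]), g.path_cost(["a", "b", "c"], "tt"))
```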

A trajectory is a time-ordered sequence of GPS records capturing the movement of an object, where a GPS record captures the location of the object at a time point. The time gap between two consecutive GPS records in trajectories varies, from a few seconds (a.k.a., high-frequency trajectories) to tens of seconds or a few minutes (a.k.a., low-frequency trajectories). In the experiments, we test the proposed method on both a high-frequency and a low-frequency GPS data set. Map matching [29] is able to align a trajectory with the road-network path that the trajectory traversed. For example, the path used by trajectory is .

Problem Setting. We study a new routing methodology—trajectory-based routing. Specifically, we study how to best utilize the paths found in trajectories to enable routing for arbitrary source and destination pairs such that the identified paths are similar to the paths chosen by local drivers.

Sparseness. The sparseness considered in the paper means that past trajectories cannot cover paths between all possible (source, destination) pairs, so simply looking up paths of past trajectories for a given pair does not work. Although it may be possible that a substantial set of trajectories cover the roads in a road network, e.g., the 1.6 million edges in Denmark, it is almost impossible to cover all possible pairs with paths. Having just one path for each pair in Denmark calls for 2.6 trillion trajectories. The key challenge is to conquer data sparseness by making it possible to benefit from historical trajectories for routing from a source to a destination when no trajectories capture paths between them.

Solution Overview. We propose a three-step procedure to conquer the data sparseness problem, as outlined in Figure 2.

Fig. 2: Solution Overview

Given a road network and a set of trajectories , the clustering module employs modularity-based clustering to cluster vertices into regions, thus obtaining a region graph . We partition the edges in a region graph into T-edges and B-edges, according to whether they are traversed or not traversed by trajectories, respectively. For each T-edge, the preference learning module learns a routing preference. The resulting preferences are fed into the preference transfer module as training data, and the preference transfer module transfers the preferences from T-edges to similar B-edges. Based on the learned and transferred preferences, the routing module recommends paths for user-specified pairs.

Scope of the paper. (1) To account for time-dependent traffic conditions, we construct peak and off-peak region graphs using trajectories that occurred in peak and off-peak periods, respectively. These are constructed the same way, so we disregard the distinction in the presentation. Depending on the departure time, one of the two region graphs is chosen for routing. Modeling time-dependent traffic conditions at a finer granularity and building a dynamic region graph are interesting extensions that are left for future work.

(2) L2R utilizes trajectories from multiple drivers to recommend paths, and thus is not a personalized routing approach. In Section VII-C, we empirically compare L2R with state-of-the-art personalized routing approaches. L2R can also be adapted to support personalized routing by only using the trajectories from specific drivers, which we also leave as future work.

IV Building the Region Graph

We propose a trajectory-based method for clustering the vertices of a road network into regions (Section IV-A). Then, we build a region graph that connects pertinent regions (Section IV-B). The region graph extends the cases where trajectories can be used for recommending paths between an arbitrary pair of source and destination, thus providing a foundation for the final routing module.

IV-A Clustering Vertices to Regions

A region is a set of homogeneous vertices where the homogeneity is defined based on two properties that are used in urban planning [30, 31]: (i) the numbers of trajectories associated with the vertices in a region are similar [30]; (ii) the edges connecting the vertices have the same road type [31]. The intuition is as follows. A region with vertices connected by edges of residential-road type may capture a residential area; and by taking into account the number of trajectories associated with the vertices, we can distinguish a residential area in the city from one in a suburban area because the former has more trajectories.

Consider Figure 3, where the label : on an edge indicates that trajectories occurred on the edge and that the road type of the edge is . For example, 100 trajectories occurred on edge , a type 1 road. According to the above two properties, vertices , , , and can be regarded as a region because they have more trajectories than the other vertices and are connected by road type 1 edges. Similarly, vertices , , and can be regarded as a region.

Fig. 3: An Example of Regions

Based on the two properties, we propose a modularity-based method that clusters vertices connected by the same road types into regions. The setting is a trajectory graph that consists of vertices and edges that are traversed by trajectories. Figure 3 shows the trajectory graph of the road network in Figure 1. A trajectory graph may not be a connected graph.

Next, we define popularity values for the edges and vertices in a trajectory graph. The popularity of an edge is the number of trajectories that occurred on the edge. The popularity of a vertex is the sum of the popularity values of the edges that are incident to the vertex. Finally, we define the total popularity as the sum of the popularity values of all edges in the trajectory graph.
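The popularity definitions can be sketched in a few lines. The example below assumes that trajectories have already been map matched to vertex sequences and that edges are undirected; the function name and the input format are hypothetical.

```python
from collections import Counter

def popularity(trajectory_paths):
    """Return (edge_pop, vertex_pop, total) for map-matched vertex sequences."""
    edge_pop = Counter()
    for path in trajectory_paths:
        for u, v in zip(path, path[1:]):
            edge_pop[frozenset((u, v))] += 1   # undirected edge key (assumption)
    vertex_pop = Counter()
    for edge, count in edge_pop.items():
        for vertex in edge:
            vertex_pop[vertex] += count        # sum over incident edges
    total = sum(edge_pop.values())             # sum of all edge popularity values
    return edge_pop, vertex_pop, total

paths = [["a", "b", "c"], ["a", "b"], ["b", "c", "d"]]
edge_pop, vertex_pop, total = popularity(paths)
print(dict(vertex_pop), total)
```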

Modularity, which is used widely in the network analysis literature [32, 33], quantifies the quality of the clusters in a graph from a global perspective. In our context, the modularity is high if the popularity of edges inside clusters is high and the popularity of edges between clusters is low, which is desired by property (i) of regions.

We define modularity gain [32, 33, 34] to quantify the benefit of merging vertices and into a cluster:

It has been shown that if merging two vertices gives a non-positive modularity gain, the two vertices should not be merged [34]. If the modularity gain is positive, the two vertices are merged into an aggregate vertex whose popularity equals the sum of the popularity values of the two merged vertices.
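The displayed formula is not available in this version of the text. As a hedged illustration, the sketch below uses the standard modularity gain from the greedy modularity literature, 2(e_ij - a_i a_j), where e_ij is the popularity of the connecting edge divided by twice the total popularity and a_i is the popularity of a vertex divided by twice the total popularity. The normalization used in the paper may differ; only the sign of the gain matters for the merge decision described above.

```python
def modularity_gain(edge_pop_ij, pop_i, pop_j, total_pop):
    """Standard greedy-modularity gain for merging two adjacent vertices.

    Assumed form (not confirmed by the source text): 2 * (e_ij - a_i * a_j).
    """
    two_s = 2.0 * total_pop
    e_ij = edge_pop_ij / two_s      # share of popularity on the connecting edge
    a_i = pop_i / two_s             # share of popularity attached to vertex i
    a_j = pop_j / two_s             # share of popularity attached to vertex j
    return 2.0 * (e_ij - a_i * a_j)

strong = modularity_gain(edge_pop_ij=100, pop_i=120, pop_j=110, total_pop=200)
weak = modularity_gain(edge_pop_ij=5, pop_i=120, pop_j=90, total_pop=200)
print(strong > 0, weak > 0)   # merge across the popular edge, not the weak one
```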

To take into account property (ii) of regions, i.e., the road type constraint, we also associate a road type attribute with an aggregate vertex that records the road type of edge .

We proceed to propose a hierarchical clustering method that follows a bottom-up, agglomerative clustering strategy [35]. In the beginning, each vertex is treated as a cluster. The clustering method keeps merging clusters into larger clusters until no more clusters can be merged.

Merging vertices. We call an original, non-merged vertex a simple vertex and refer to a cluster that contains merged vertices as an aggregate vertex. We differentiate the processes of merging adjacent simple and aggregate vertices.

Merging two simple vertices: Given two adjacent simple vertices , if , we merge the two vertices into an aggregate vertex , whose popularity is the sum of the popularity values of both and . In addition, we set the road type of the aggregate vertex as the road type of edge , i.e., .

After merging and , the topology of the graph is adjusted. First, and are removed from , and the aggregate vertex is added to . Second, the edge is removed from and any edges that used to connect to and are now connected to . We use function to denote the procedure of merging two simple vertices and .

Merging an aggregate vertex and a simple vertex: We use function to denote the procedure of merging an aggregate vertex and a simple vertex . If the modularity gain and if the road type is consistent with the road type of the aggregate vertex , the two vertices are merged in a way similar to merging two simple vertices. Otherwise, the two vertices cannot be merged, and edge is removed from edge set .

Merging two aggregate vertices: We use function , to denote the procedure of merging two aggregate vertices and . If and if the road type is consistent with , the two vertices are merged similarly to the process of merging two simple vertices. Otherwise, they are not merged, and edge is removed from edge set .

Clustering process. The clustering method always chooses a vertex with the highest popularity, regardless of whether it is an aggregate or a simple vertex, to merge with its adjacent vertices. If has no adjacent vertex, forms a region. If has adjacent vertices, we compose a vertex set that consists of all the adjacent vertices, which are candidates for being merged with . Next, we further filter the vertices in to identify the vertices that are final candidates for being merged with , which forms vertex set .

Checking qualification: Function , checks whether can be merged with an adjacent vertex in . It returns true if and can be merged. We distinguish cases according to whether vertices and are simple or aggregate vertices. All cases should satisfy the condition . In addition, we state the additional road type related conditions that vertices and must satisfy to be merged in Table I.

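Both vertices simple: no additional condition.
One simple vertex and one aggregate vertex: the road type of the connecting edge must be consistent with the road type of the aggregate vertex.
Both vertices aggregate: the road types of the two aggregate vertices must be consistent.
TABLE I: Additional Condition for Merging Two Adjacent Vertices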

Merging selection: After filtering, vertex set consists of ’s adjacent vertices that have passed the qualification check. We call function that returns a subset of vertices that should be merged with .

If is an aggregate vertex, has a road type . All vertices in are returned by , i.e., . This is so because Table I enforces that , if is a simple vertex, and that , if is an aggregate vertex.

If is a simple vertex, the edges between and its adjacent vertices may have different road types, i.e., , , and may be different. Thus, returns the largest subset of vertices in such that their edges that are incident to have the same road type. For example, let , , , and , then returns .
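The selection rule can be sketched as follows. The function name and the input format (a list of pairs of neighbor and road type of the connecting edge) are hypothetical, and ties between equally large groups are broken arbitrarily here, which the text does not specify.

```python
from collections import defaultdict

def select_for_merge(v_is_aggregate, qualified):
    """qualified: list of (neighbor, road_type_of_connecting_edge) pairs."""
    if v_is_aggregate:
        # The qualification check already enforced a consistent road type.
        return [neighbor for neighbor, _ in qualified]
    groups = defaultdict(list)
    for neighbor, road_type in qualified:
        groups[road_type].append(neighbor)
    if not groups:
        return []
    # Simple vertex: keep the largest group of neighbors sharing one road type.
    return max(groups.values(), key=len)

print(select_for_merge(False, [("b", 1), ("c", 2), ("d", 1)]))
```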

Given a trajectory graph , where vertices in are those in the road network traversed by trajectories, the pseudo code of the algorithm is presented in Algorithm 1.

Input: A trajectory graph: ;
Output: A Cluster Set ;
1 Initialize a priority queue based on vertex popularity;
2 Insert all vertices in into ;
3 while  is not empty do
4        Vertex ;
5        Vertex set ;
6        if  then
7               Qualified vertex set ;
8               for each vertex  do
9                      if ,  then
12              ;
13               for each vertex  do
14                      Remove edge from ;
15              for each vertex  do
16                      Remove from ;
17                      Call one of , , and depending on whether and are simple or aggregate vertices and generate a new aggregate vertex .
18              Insert the new aggregate vertex into ;
19       else
return ;
Algorithm 1 BottomUpClustering

We utilize a priority queue to order simple and aggregate vertices according to their popularity values. The priority queue returns the vertex with the highest popularity to start a merge iteration (lines 1–4). If has no adjacent vertices, it becomes a cluster (line 19). Otherwise, we consider whether should be merged with its adjacent vertices. We first find those of ’s adjacent, qualified vertices for merging with (lines 8–10). Then we identify which includes the vertices that will actually be merged with and cut the graph between and the vertices in (lines 11–13). Finally, we merge with its adjacent vertices in and add it back to the priority queue (lines 14–17).
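The following is a compact, hedged sketch of the clustering loop, not the authors' implementation. It assumes an undirected trajectory graph given as a dictionary of edges, uses the standard modularity gain 2(e_ij - a_i a_j), and, when a merge creates parallel edges to the same neighbor, sums their popularity and keeps the road type of the more popular edge. These choices and all names are assumptions.

```python
import heapq
from itertools import count

def bottom_up_clustering(edges):
    """edges: {(u, v): (popularity, road_type)}; returns a list of vertex sets."""
    adj, pop = {}, {}
    for (u, v), (p, rt) in edges.items():
        for a, b in ((u, v), (v, u)):
            adj.setdefault(a, {})[b] = (p, rt)
            pop[a] = pop.get(a, 0) + p
    members = {v: {v} for v in adj}
    rtype = {v: None for v in adj}             # None marks a simple vertex
    total = sum(p for p, _ in edges.values())
    ids = count()
    heap = [(-pop[v], next(ids), v) for v in adj]
    heapq.heapify(heap)                        # max-heap on popularity
    clusters = []

    def qualifies(v, u):
        p, rt = adj[v][u]
        two_s = 2.0 * total
        gain = 2.0 * (p / two_s - pop[v] * pop[u] / two_s ** 2)
        return gain > 0 and all(t in (None, rt) for t in (rtype[v], rtype[u]))

    while heap:
        _, _, v = heapq.heappop(heap)
        if v not in adj:                       # stale entry: merged away earlier
            continue
        chosen = [u for u in adj[v] if qualifies(v, u)]
        if rtype[v] is None and chosen:        # simple vertex: largest same-type group
            groups = {}
            for u in chosen:
                groups.setdefault(adj[v][u][1], []).append(u)
            chosen = max(groups.values(), key=len)
        for u in [u for u in adj[v] if u not in chosen]:
            del adj[v][u], adj[u][v]           # cut edges to vertices not merged
        if not chosen:                         # nothing to merge: v is a region
            clusters.append(members[v])
            del adj[v]
            continue
        group = [v] + chosen
        new = ("agg", next(ids))
        members[new] = set().union(*(members[x] for x in group))
        pop[new] = sum(pop[x] for x in group)
        rtype[new] = adj[v][chosen[0]][1]
        adj[new] = {}
        for x in group:                        # rewire outside neighbors to new
            for y, (p, rt) in adj.pop(x).items():
                if y in group:
                    continue
                del adj[y][x]
                old_p, old_rt = adj[new].get(y, (0, rt))
                adj[new][y] = adj[y][new] = (old_p + p, rt if p > old_p else old_rt)
        heapq.heappush(heap, (-pop[new], next(ids), new))
    return clusters

edges = {("a", "b"): (100, 1), ("b", "c"): (100, 1),
         ("c", "d"): (5, 2), ("d", "e"): (80, 2)}
clusters = bottom_up_clustering(edges)
print(sorted(sorted(c) for c in clusters))
```

In this toy graph, the weakly used type 2 edge between c and d separates the two regions, which mirrors the role of low-popularity edges between regions described above.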

A simple example of the clustering algorithm is shown in Figure 4.

Fig. 4: An Example of the Clustering Process

In the beginning, simple vertex has the largest popularity and is first popped from the priority queue. We compute the modularity gains between and , , and , respectively. The modularity gains are also shown in Figure 4. Vertices and are not merged with because their modularity gains are negative. Edges and are removed from . Since vertices and have positive modularity gains and the road types of and are both 1, an aggregated vertex is formed with road type 1.

In the 2nd merging iteration, aggregate vertex has the largest popularity and thus is considered with its adjacent vertices. Vertices and cannot be merged with since the road types are inconsistent. Since the modularity gain between and is positive and the road type of edge is consistent with the road type of , a new aggregate vertex with road type 1 is created.

During the clustering, we need not manually control the size of clusters, as a cluster “ends” automatically when merging it with its neighbors gives non-positive modularity gains or the road types differ. This naturally prevents clusters of extremely large sizes. In addition, we maintain paths used by trajectories inside regions (see “inner-region paths” in Section IV-B). This design is useful when the source and destination in a routing request are inside a region, which is common for large regions.

Based on the above, we are able to form regions in a trajectory graph where both properties (i) and (ii) are satisfied. For example, the dashed circles in Figure 3 indicate regions. The popularity of edges in region is high, while the popularity of the edge between regions and is low; region has road type 1 edges, while the edge between regions and has road type 2.

IV-B Region Graph

We build a region graph based on the obtained regions, which serves as a foundation for routing. The region graph can be regarded as a backbone of the road network graph. To distinguish it from the road network graph, we call a vertex in the region graph a region vertex and an edge in the region graph a region edge. In particular, a region vertex represents a region. We proceed to show how to construct region edges by connecting region vertices, using a combination of two different strategies.

Constructing region edges from trajectories: Having identified regions, trajectories that originally connected vertices in the road network are now utilized to connect regions. If a trajectory exists that went through a vertex in region and a vertex in region , we construct a region edge . Note that a trajectory may produce more than one region edge. In particular, if a trajectory went through vertices in regions, up to region edges can be constructed. For example, in Figure 3, trajectory went through vertices in , , and , and we are able to construct region edges , , and , as shown in Figure 5(a).

Fig. 5: Region Graph

Each region edge is associated with a set of paths, where each path in was traversed by at least one trajectory that left at vertex and entered at vertex . A vertex at which a trajectory enters or leaves a region is called a transfer center, e.g., and .

For example, region edge is associated with path because trajectory left at vertex and entered at vertex , and thus and are transfer centers. Similarly, region edge is associated with path , where and are transfer centers; and region edge is associated with path , where and are transfer centers.
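The construction of T-edges and their associated paths can be sketched as follows. The sketch assumes a map-matched trajectory given as a vertex sequence and a hypothetical mapping from vertices to region identifiers, where vertices outside every region are absent from the mapping. Each pair of regions visited in order yields a region edge whose path runs from the vertex where the trajectory left the first region to the vertex where it entered the second.

```python
from itertools import combinations, groupby

def region_edges(path, region_of):
    """Return {(r1, r2): [paths]} for one trajectory's vertex sequence."""
    # Collapse the path into visits: (region, first index, last index).
    indexed = [(i, region_of.get(v)) for i, v in enumerate(path)]
    visits = []
    for region, run in groupby(indexed, key=lambda item: item[1]):
        run = list(run)
        if region is not None:
            visits.append((region, run[0][0], run[-1][0]))
    edges = {}
    for (r1, _, exit_idx), (r2, entry_idx, _) in combinations(visits, 2):
        if r1 != r2:
            # Path between the two transfer centers, endpoints included.
            edges.setdefault((r1, r2), []).append(path[exit_idx:entry_idx + 1])
    return edges

region_of = {"a": "R1", "b": "R1", "c": "R2", "d": "R2", "e": "R3"}
result = region_edges(["a", "b", "x", "c", "d", "y", "e"], region_of)
print(result)
```

A trajectory that visits three regions yields three region edges here, matching the example given earlier in this section.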

For each region, we also maintain inner-region paths based on trajectories. Specifically, given a region and a trajectory , if entered at and left at , the path that was traversed by in is recorded as an inner-region path of . For example, regions and have inner-region paths and , respectively.

However, when only using trajectories for constructing region edges, the resulting region graph may not be a connected graph. For example, in Figure 3, region is not connected with any other regions since no trajectory went through and other regions. Thus, we get the region graph in Figure 5(a). To enable the region graph to serve as a foundation for routing, we need to ensure that the region graph is connected. To this end, we apply a breadth first search (BFS) based procedure to make the region graph connected.

To ease the following discussion, we call the region edges that are constructed from trajectories T-edges and the region edges that are constructed from the BFS procedure B-edges.

BFS construction of region edges: We consider the original road network graph . We conduct a BFS for each vertex in a region . When the search reaches a vertex in a different region , we stop further exploring ’s neighbors so that the search does not enter another region via . If no T-edge or B-edge exists between regions and , we build a B-edge as their region edge. We repeat the same procedure until all vertices in region are traversed. The method of obtaining specific paths for B-edges will be discussed in detail in Section V.

For instance, consider vertex in region in Figure 3 and the original road network graph in Figure 1. A BFS starting from visits vertices and . Since vertex is in region , a region edge is constructed as a B-edge. Similarly, since vertex is in region , a region edge is constructed as a B-edge. The same procedure is applied to the other vertex in region , i.e., vertex , but it does not produce any new B-edges. After applying the same procedure to each region, we obtain the final region graph shown in Figure 5(b).
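The BFS-based procedure can be sketched as follows. The inputs (an adjacency dictionary for the road network, a vertex-to-region mapping, and the set of existing T-edges) and the treatment of region edges as unordered pairs are assumptions made for the example.

```python
from collections import deque

def bfs_b_edges(adjacency, region_of, t_edges):
    """Return B-edges as frozensets of two region ids.

    adjacency: vertex -> iterable of neighbor vertices in the road network.
    region_of: vertex -> region id (absent if the vertex is in no region).
    t_edges:   set of frozenset({r1, r2}) already connected by trajectories.
    """
    b_edges = set()
    for start, home in region_of.items():
        seen, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for w in adjacency.get(u, ()):
                if w in seen:
                    continue
                seen.add(w)
                other = region_of.get(w)
                if other is not None and other != home:
                    edge = frozenset((home, other))
                    if edge not in t_edges:
                        b_edges.add(edge)
                    continue              # do not search beyond another region
                queue.append(w)
    return b_edges

adjacency = {"a": ["b"], "b": ["a", "x"], "x": ["b", "c"],
             "c": ["x", "y"], "y": ["c", "d"], "d": ["y"]}
region_of = {"a": "R1", "b": "R1", "c": "R2", "d": "R3"}
b_edges = bfs_b_edges(adjacency, region_of, t_edges=set())
print(b_edges)
```

Because the search stops at the first vertex of another region, R1 and R3 are not connected directly in this example; each region is linked only to its nearby regions.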

Unlike T-edges, which are composed of trajectory paths, B-edges have no path information because no trajectories connect the regions joined by a B-edge. To enable routing on top of the region graph, we need to know the paths when traveling between two regions that are connected by B-edges. To this end, in Section V, we study how to learn and transfer appropriate paths for B-edges.

An alternative way to make the region graph connected is to connect every region pair, i.e., making the region graph fully connected. However, the BFS based procedure has two benefits. First, it guarantees that there are no disconnected regions. Second, it tries to connect a disconnected region to its nearby regions, which makes the region graph simple.

V Identifying Paths for B-Edges

To enable routing using the region graph, we associate appropriate paths with all B-edges using a three-step method. First, for each T-edge, we learn a routing preference from the set of paths that are associated with the T-edge, which explains why drivers choose specific paths. Second, we quantify the similarity between T-edges and B-edges, and then we transfer routing preferences from T-edges to B-edges based on similarity. Third, we apply the transferred routing preferences to identify appropriate paths for B-edges.

V-A Step 1: Learning routing preferences for T-Edges

Each T-edge is associated with a set of paths (see Section IV-B) that connect its two regions. We learn a representative routing preference vector for each T-edge that explains why drivers chose the paths in this set.

We consider two categories of features that may affect a driver’s travel decisions—travel costs and road conditions. Travel cost features describe the travel costs that drivers want to minimize. Road condition features describe drivers’ preferences or restrictions relating to road conditions. For example, we may consider three different travel cost features, travel time (TT), distance (DI), and fuel consumption (FC); and three road condition features, e.g., highways, residential roads, and highways and residential roads.

Based on the above, we use a two-dimensional vector to represent a routing preference, where the so-called master dimension corresponds to travel cost features and the so-called slave dimension corresponds to road condition features. For example, vector ⟨TT, Highway⟩ indicates a preference for minimizing travel time and using highways.

Based on this routing preference model, we aim to identify an appropriate preference vector for a T-edge from its path set. Given the source and destination of a path in the set and a candidate preference vector, we can construct a path that connects the same source and destination according to that vector. If the vector captures the driver’s preferences well, the constructed path should largely match the actual, or ground-truth, path. Thus, we aim to identify the routing preference vector whose constructed paths match the paths in the set as closely as possible. Equivalently, we solve an optimization problem that maximizes, over a set of possible vectors, the total similarity between each ground-truth path and its constructed counterpart, where a path similarity function evaluates the similarity between two paths.

We use a popular path similarity function [26, 36]:


The intuition is two-fold: first, the more edges the constructed path shares with the ground-truth path , the more similar the two paths are; second, the longer the shared edges are, the more similar the two paths are.
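The two-fold intuition can be illustrated with a minimal Python sketch of a length-weighted overlap between a constructed path and a ground-truth path. Using the total length of the ground-truth path as the denominator is an assumption made here for illustration, and the edge identifiers and the `edge_length` mapping are hypothetical.

```python
def path_similarity(gt_path, constructed_path, edge_length):
    """Length-weighted share of the ground-truth path covered by the constructed path.

    Assumption of this sketch: the denominator is the total length of the
    ground-truth path. Paths are sequences of edge identifiers.
    """
    gt_edges = set(gt_path)
    shared = gt_edges & set(constructed_path)
    gt_length = sum(edge_length[e] for e in gt_edges)
    if gt_length == 0:
        return 0.0
    return sum(edge_length[e] for e in shared) / gt_length


# Hypothetical edge lengths in meters.
edge_length = {"e1": 100.0, "e2": 300.0, "e3": 200.0, "e4": 400.0}
gt = ["e1", "e2", "e3"]          # ground-truth path, 600 m in total
candidate = ["e1", "e2", "e4"]   # shares e1 and e2 (400 m) with the ground truth
similarity = path_similarity(gt, candidate, edge_length)
```

In this toy case, more shared edges and longer shared edges both raise the score, which is the behavior the intuition asks for.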

A naive way of solving the optimization problem is to search the whole space, i.e., all combinations of features in the master and slave dimensions. However, the search space can be very large, thus rendering the learning algorithm inefficient. We instead propose an efficient learning algorithm that is inspired by coordinate descent. In short, we first identify the best travel cost feature in the master dimension, and next, based on the chosen travel cost feature, we identify the best road condition features in the slave dimension.

Specifically, given the source and destination of each ground-truth path, we obtain a lowest-cost path for each cost type. This yields three lowest-cost paths, one each for distance, travel time, and fuel consumption [37, 38]. We then measure the similarity between the ground-truth path and each of the three lowest-cost paths and choose as the optimal cost type the one whose lowest-cost path has the highest similarity. Next, we identify the optimal road condition feature. For each road condition feature, we compute a new lowest-cost path based on the optimal cost type while making sure that the road condition feature is also satisfied, and we check whether the similarity between the new path and the ground-truth path improves. The road condition feature that gives the largest improvement is chosen as the optimal road condition feature.

For example, if the optimal cost type is distance, we test whether the shortest path with a preference for highways or residential roads yields a higher similarity than the shortest path without any road type preference. If so, we choose the road type that gives the largest similarity improvement. Otherwise, all road condition features are ignored.
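The two-stage search described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `route` stands in for a lowest-cost router that the paper does not spell out, `edge_overlap` is a simplified unweighted similarity, and the toy routes are made up.

```python
def learn_preference(gt_paths, route, similarity, cost_features, road_features):
    """Pick the travel cost feature first, then an optional road condition feature.

    route(gt_path, cost, road) is a hypothetical router that returns the
    lowest-cost path for the same source and destination as gt_path;
    road=None means that no road condition is imposed.
    """
    def total(cost, road):
        return sum(similarity(p, route(p, cost, road)) for p in gt_paths)

    # Stage 1 (master dimension): best travel cost feature without road condition.
    best_cost = max(cost_features, key=lambda c: total(c, None))
    best_score = total(best_cost, None)

    # Stage 2 (slave dimension): keep a road condition only if it improves the score.
    best_road = None
    for road in road_features:
        score = total(best_cost, road)
        if score > best_score:
            best_road, best_score = road, score
    return best_cost, best_road


def edge_overlap(gt, other):
    # Simplified similarity for this sketch: fraction of ground-truth edges shared.
    gt_set = set(gt)
    return len(gt_set & set(other)) / len(gt_set)


# Made-up router output for a single source-destination pair.
toy_routes = {
    ("DI", None): ["a", "b", "x"],
    ("TT", None): ["a", "b", "c", "y"],
    ("FC", None): ["z"],
    ("TT", "highway"): ["a", "b", "c", "d"],
    ("TT", "residential"): ["a", "y"],
}


def toy_route(gt_path, cost, road):
    return toy_routes.get((cost, road), [])


preference = learn_preference(
    [["a", "b", "c", "d"]], toy_route, edge_overlap,
    ["DI", "TT", "FC"], ["highway", "residential"],
)
```

With three cost features and two road condition features, this evaluates five candidate vectors instead of all nine combinations, which is the saving that the coordinate-descent-style search aims for.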

Next, we provide statistical evidence to justify our design choice of choosing only a single representative preference for each T-edge. Given a T-edge, we learn a routing preference for each path in its path set and count the number of unique preferences. The curve in Figure 6 shows that for more than 70% of all T-edges, we obtain a single preference, although the path set often contains multiple paths. Thus, we choose to learn a single routing preference for each T-edge. On the other hand, Figure 6 also suggests that a T-edge may have more than one preference; we leave the modeling of multiple preferences per T-edge as future research.

Fig. 6: Statistical Evidence for Design Choices. (a) Distribution of preferences; (b) region edge similarity vs. preference similarity.

We also show the distribution of the learned routing preferences as bars in Figure 6. We aggregate more than 200 unique routing preferences based on their travel cost features, i.e., DI, TT, and FC. The bars show that the routing preferences are distributed almost uniformly, indicating that T-edges do have different routing preferences.

V-B Step 2: Transferring routing preferences

So far, we have identified preference vectors for T-edges. The next step is to associate preference vectors with B-edges, which can then be used to identify appropriate paths for B-edges. To this end, we transfer the routing preferences of T-edges to similar B-edges, which follows the intuition that when two region edges are similar, they also have similar routing preferences. For example, if most local drivers choose the fastest paths with a preference for main roads to travel between a region in the city center and a northern suburb residential area, it is also likely that local drivers have this preference when traveling between another region in the city center and a southern suburb residential area.

Based on the above intuition, we first introduce the similarity function that quantifies the similarity between two region edges and then provide an algorithm that transfers routing preferences between similar region edges.

Similarity between two region edges: Any region edge, whether a T-edge or a B-edge, connects two regions. A region edge is described by the features of its two regions. In particular, we describe a region edge by two elements: a distance element and a region function element.

The distance element is a real value that gives the Euclidean distance between the centroids of the two regions connected by the region edge. Distance is an influential factor when drivers choose their paths. For example, drivers may prefer the fastest paths when they travel long distances, but they may prefer the shortest paths when traveling shorter distances.

The region function element describes the functionalities of the two regions. This element is also essential because, for example, drivers may have different preferences when traveling between two business districts than when traveling between a residential area and a city center. In particular, we use a set of road types to describe the functionality of a region [31]. For each region, we consider all edges that are incident to the vertices in the region and select the top-k road types of these edges as the region’s road type set. For example, suppose that two regions have the top-2 road type sets {TP1, TP2} and {TP3, TP4}, respectively. Then the region edge that connects them has a region function element that is the Cartesian product of the two road type sets: {(TP1, TP3), (TP1, TP4), (TP2, TP3), (TP2, TP4)}.

Based on the above, we define the similarity between two region edges in terms of the similarity of their feature vectors.

The similarity function is the sum of a distance similarity and a region function similarity. For distance similarity, the more similar the two distances are, the larger the similarity is. This captures the intuition that trips between region pairs that are similarly far apart tend to have similar routing preferences. For region function similarity, we use Jaccard similarity to evaluate the similarity between the region functions. If the two region edges share more function features, meaning that they connect similar region pairs, trips on the two region edges are expected to have similar routing preferences.
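As a rough sketch of such a similarity function, the Python code below adds a distance similarity to a Jaccard similarity over region function features. The concrete distance similarity used here (ratio of the shorter to the longer centroid distance) is an assumption of this sketch, as are the distances and road type names in the example.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def region_edge_similarity(dist_a, funcs_a, dist_b, funcs_b):
    # Assumed distance similarity: ratio of the shorter to the longer distance,
    # so that equal distances give 1 and very different distances approach 0.
    longer = max(dist_a, dist_b)
    distance_sim = min(dist_a, dist_b) / longer if longer > 0 else 1.0
    return distance_sim + jaccard(funcs_a, funcs_b)


# Region function elements as Cartesian products of top-2 road type sets.
funcs_1 = set(product({"TP1", "TP2"}, {"TP3", "TP4"}))
funcs_2 = set(product({"TP1", "TP2"}, {"TP3", "TP5"}))

# Hypothetical centroid distances of 8 km and 10 km.
sim = region_edge_similarity(8.0, funcs_1, 10.0, funcs_2)
```

Under these assumptions the sum lies between 0 and 2; dividing it by two would rescale it to the range between 0 and 1.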

To justify the design choices that (i) similar region edges have similar routing preferences and (ii) the proposed region edge similarity function is effective, we show the results of an experiment using preferences learned from T-edges in Figure 6. First, the “Similarity” bars show that similar T-edges have similar routing preferences, while dissimilar T-edges have dissimilar routing preferences. Second, the “Percentage” bars show the percentages of T-edge pairs that fall into different T-edge similarity ranges. There are many similar T-edges (e.g., similarity above 0.5), although there are few highly similar T-edges (e.g., similarity above 0.9). This makes it possible to transfer routing preferences among region edges, and these observations support the design choices.

Transferring preferences among similar region edges: We adopt the idea of graph-based transduction learning [39, 40] to transfer routing preferences from T-edges to similar B-edges. First, we build an undirected, weighted graph, where a vertex represents a region edge, which can be a T-edge or a B-edge. Given n region edges in total, we use an n × n adjacency matrix to record the edge weights of the graph. Specifically, the entry in row i and column j equals the similarity between the i-th and the j-th region edge.

Next, we introduce an adjacency matrix reduction threshold. In the adjacency matrix, we keep only the values that exceed the threshold and set all other values to 0. This way, the adjacency matrix only captures “sufficiently” similar region edge pairs, which enables us to control the accuracy of the transferred preferences. The resulting sparser matrix also improves efficiency (see Figure 9(b) in the experiments).

Figure 7 shows a graph with four vertices that represent two T-edges and two B-edges, together with the corresponding adjacency matrix. For example, an entry of 0.9 indicates that the similarity between the two corresponding region edges is 0.9, and an entry of 0 indicates that the similarity between the two corresponding region edges is below the threshold.

In the next step, we use an n × m matrix to denote the initial routing preferences of the region edges. Here, n is the total number of region edges, including T-edges and B-edges, and m is the total number of travel cost and road condition features that are used for modeling routing preferences in Section V-A.

To illustrate, we consider two travel cost features, DI and TT, and three road condition features that indicate preferences for road types TP1, TP2, and both, i.e., TP1+2. In this setup, the matrix has five columns that represent features DI, TT, TP1, TP2, and TP1+2.

Each row of the matrix corresponds to a region edge’s routing preference. For a T-edge, the features that correspond to its learned routing preference are set to 1. For example, assuming that the first T-edge has preference vector ⟨DI, TP1⟩, the first row is set to (1, 0, 1, 0, 0). Similarly, if the second T-edge has preference vector ⟨TT, TP2⟩, the second row is set to (0, 1, 0, 1, 0), as shown in Figure 7. Next, since the routing preferences of the B-edges are unknown, the rows that represent B-edges are set to (0, 0, 0, 0, 0).

Fig. 7: Transferring routing preferences

The transduction learning yields a second matrix of the same size that records the transferred routing preferences for the B-edges. Specifically, the entry in row i and column j indicates the probability that the i-th region edge has the j-th routing preference feature. For a B-edge, the travel cost feature with the largest probability is used as the final travel cost feature. In the example in Figure 7, this is DI for the first B-edge and TT for the second. Likewise, the road type feature with the largest probability is used as the final road type feature. In the example in Figure 7, this is TP1 for the first B-edge and TP2 for the second. Finally, the two B-edges obtain the transferred routing preferences ⟨DI, TP1⟩ and ⟨TT, TP2⟩, respectively.

The remaining question is how to obtain the transferred preference matrix, which is the core of the transduction learning.

Obtaining the transferred preference matrix: We obtain the matrix by minimizing an objective function (Equation 2) that sums three terms over the feature columns. The terms are expressed using the k-th columns of the initial and the transferred preference matrices, and two hyper-parameters control the relative influences of the second and third terms, respectively.

The intuition of each term of the objective function is as follows. First, the T-edges should keep the routing preferences that are learned in step 1. The T-edges’ learned routing preferences serve as training data in the transduction learning process.

An auxiliary diagonal matrix indicates which region edges are T-edges. In particular, we order the region edges such that the T-edges come first and the B-edges follow. The diagonal entries that correspond to T-edges are set to 1, and the remaining diagonal entries are set to 0. In our example, the diagonal is (1, 1, 0, 0) because the first two region edges are T-edges.

With this indicator matrix, the first term computes the sum of the squared differences between the initial and the transferred preference matrices over the rows that represent T-edges. Minimizing the first term thus favors a transferred matrix in which the T-edges keep the preferences learned in step 1. On the other hand, the first term places no constraint on the rows that represent B-edges.

Second, the T-edges’ routing preferences are transferred to B-edges. The transfer process ensures that the more similar two region edges are, the more similar their routing preferences are. This is realized by the use of the unnormalized graph Laplacian matrix in the second term of Equation 2. In particular, $L = D - W$, where $W$ is the adjacency matrix and $D$ is a diagonal matrix with $D_{ii} = \sum_{j} W_{ij}$ and $D_{ij} = 0$ if $i \neq j$. The Laplacian for our example follows from the adjacency matrix in Figure 7.

With the help of the Laplacian, the second term computes the sum of the products of the similarity of two region edges and the difference between their routing preferences. When the similarity of two region edges is high, even a small difference between their routing preferences makes the product significant. Minimizing the second term in the objective function has the effect of smoothly spreading the routing preferences from T-edges to B-edges such that (1) two region edges with high similarity have highly similar routing preferences and (2) two region edges with low similarity may have dissimilar routing preferences.

Third, we conduct L2 regularization [39, 40] to avoid over-fitting.
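Putting the three terms together, one plausible form of the objective is shown below. The symbol names are assumptions of this sketch rather than the paper's notation: $Y$ and $\hat{Y}$ denote the initial and the transferred preference matrices with $k$-th columns $Y_k$ and $\hat{Y}_k$, $S$ is the diagonal T-edge indicator matrix, $L$ is the unnormalized graph Laplacian, $m$ is the number of features, and $\mu_1$, $\mu_2$ are the two hyper-parameters.

```latex
\min_{\hat{Y}} \; \sum_{k=1}^{m} \Big[
    (Y_k - \hat{Y}_k)^{\top} S \, (Y_k - \hat{Y}_k)
    \;+\; \mu_1 \, \hat{Y}_k^{\top} L \, \hat{Y}_k
    \;+\; \mu_2 \, \lVert \hat{Y}_k \rVert_2^{2}
\Big]
```

For a symmetric adjacency matrix $W$ and $L = D - W$, the second term expands to $\hat{Y}_k^{\top} L \hat{Y}_k = \tfrac{1}{2} \sum_{i,j} W_{ij} (\hat{Y}_{ik} - \hat{Y}_{jk})^2$, which is the similarity-weighted sum of preference differences described above.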

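Next, we need to minimize the objective function. By differentiating Equation 2 with respect to each column of the transferred preference matrix and setting the derivative to 0, we obtain one linear system per column (Equation 3).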


Equation 3 is a linear system that can be solved by iterative approximation algorithms [41], e.g., the Jacobi method [39] or the conjugate gradient method [42]. We solve Equation 3 once for each feature column, each time obtaining one column of the transferred preference matrix. Assembling the columns yields the final matrix.
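To make the column-by-column solution concrete, here is a minimal pure-Python sketch of the Jacobi method. It assumes that the linear system has the form $(S + \mu_1 L + \mu_2 I)\,\hat{Y}_k = S\,Y_k$, with $S$ the T-edge indicator matrix and $L = D - W$; this form, the parameter names `mu1`, `mu2`, and `threshold`, and all similarity values in the example are assumptions made for illustration, not the Junto implementation used in the experiments.

```python
def transfer_preferences(W, Y, is_t_edge, mu1=1.0, mu2=0.01, threshold=0.5,
                         iterations=200):
    """Jacobi iteration for (S + mu1*L + mu2*I) y_hat = S y, one column at a time."""
    n, m = len(Y), len(Y[0])
    # Keep only sufficiently similar pairs; self-similarity cancels out in L = D - W.
    W = [[W[i][j] if i != j and W[i][j] > threshold else 0.0 for j in range(n)]
         for i in range(n)]
    degree = [sum(row) for row in W]
    s = [1.0 if t else 0.0 for t in is_t_edge]
    Y_hat = [[0.0] * m for _ in range(n)]
    for k in range(m):
        y_hat = [0.0] * n
        for _ in range(iterations):
            # All entries are updated from the previous iterate (Jacobi update).
            y_hat = [(s[i] * Y[i][k] + mu1 * sum(W[i][j] * y_hat[j] for j in range(n)))
                     / (s[i] + mu1 * degree[i] + mu2)
                     for i in range(n)]
        for i in range(n):
            Y_hat[i][k] = y_hat[i]
    return Y_hat


features = ["DI", "TT", "TP1", "TP2", "TP1+2"]


def pick(row):
    """Most probable travel cost feature (first two columns) and road type feature."""
    cost = max(range(0, 2), key=lambda j: row[j])
    road = max(range(2, 5), key=lambda j: row[j])
    return features[cost], features[road]


# Two T-edges followed by two B-edges; similarity values are made up.
W = [[1.0, 0.3, 0.9, 0.2],
     [0.3, 1.0, 0.3, 0.8],
     [0.9, 0.3, 1.0, 0.4],
     [0.2, 0.8, 0.4, 1.0]]
Y = [[1, 0, 1, 0, 0],   # T-edge with preference <DI, TP1>
     [0, 1, 0, 1, 0],   # T-edge with preference <TT, TP2>
     [0, 0, 0, 0, 0],   # B-edge, unknown
     [0, 0, 0, 0, 0]]   # B-edge, unknown
Y_hat = transfer_preferences(W, Y, [True, True, False, False])
b_edge_preferences = [pick(Y_hat[2]), pick(Y_hat[3])]
```

With `mu2 > 0` the system matrix is strictly diagonally dominant, so the Jacobi iteration converges. A B-edge that has no neighbor above the threshold keeps an all-zero row, which corresponds to a null preference.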

V-C Step 3: Applying Transferred Preferences

After step 2, each B-edge has a transferred preference vector. For each B-edge, we now identify a few appropriate paths according to its transferred preference vector. Consider a B-edge that connects two regions. Recall that a region has transfer centers where trajectories enter and leave the region (see Section IV-B). For each pair that consists of a transfer center of the first region and a transfer center of the second region, we identify a path according to the preference vector. Finally, the identified paths are associated with the B-edge.

We proceed to modify Dijkstra’s algorithm to accommodate the preference, as shown in Algorithm 2. To ease the discussion, we assume that a B-edge is associated with the transferred routing preference vector ⟨DI, TP1⟩, meaning that minimizing travel distance and using road type TP1 are preferred. Recall that the first dimension is the master dimension and the second dimension is the slave dimension.

The overall procedure is similar to the classical Dijkstra’s algorithm. Each vertex is associated with two attributes: a cost attribute that records the cost of travel from the source to the vertex and a parent attribute that records the parent vertex of the vertex. We use a priority queue to control the order in which vertices are visited (lines 1–4).

Here, the cost value maintained in a vertex corresponds to the specific cost type feature for the master dimension of a given preference vector. For example, when considering preference vector ⟨DI, TP1⟩, each vertex is associated with a cost that equals the distance (according to DI) from the source vertex to the vertex.

The algorithm always chooses the vertex with the lowest cost to continue exploring (line 6). When exploring from this vertex, we differentiate two cases (lines 7–14): (i) at least one of its edges satisfies the slave preference, and (ii) none of its edges satisfies the slave preference. For case (i), only edges that satisfy the slave preference are explored. For case (ii), all adjacent vertices are explored. This way, we make sure that the preferences on both the master and slave dimensions are accommodated by the algorithm.

Input: Preference vector ⟨master, slave⟩; source and destination vertices v_s, v_d; road network G;
Output: Path P that connects v_s and v_d
1 for each vertex v in G do
2        v.cost ← ∞; v.parent ← null;
3 v_s.cost ← 0;
4 Initialize a priority queue Q and add all vertices to Q;
5 while v_d is still in Q do
6        u ← the vertex in Q with the lowest cost; remove u from Q;
7        Boolean fallback ← false;
8        if there does not exist a vertex x such that x is u’s adjacent vertex and edge (u, x)’s road type satisfies slave then
9               fallback ← true;
10       for each vertex x that is adjacent to u do
11              if fallback or edge (u, x)’s road type satisfies slave then
12                     if u.cost + cost(u, x) < x.cost then
13                            x.cost ← u.cost + cost(u, x);
14                            x.parent ← u;
15 Construct P from v_d using the parent attributes and return P;
Algorithm 2 ApplyingPreferencesModifiedDijkstra
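The following Python sketch shows one way to realize the two exploration cases on a small adjacency-list graph. It is an illustration rather than the authors' Java implementation: the graph encoding, the attribute names `"DI"` and `"type"`, and the toy road network are assumptions.

```python
import heapq


def preferred_path(graph, source, target, cost_key, road_type):
    """Dijkstra variant: at each vertex, follow only edges of the preferred road
    type if any exist (case i); otherwise fall back to all edges (case ii).

    graph maps a vertex to a list of (neighbor, attributes) pairs, where the
    attributes hold the master cost under cost_key and the road type under "type".
    """
    cost = {source: 0.0}
    parent = {source: None}
    done = set()
    queue = [(0.0, source)]
    while queue:
        c, u = heapq.heappop(queue)
        if u in done:
            continue  # stale queue entry
        done.add(u)
        if u == target:
            break
        edges = graph.get(u, [])
        preferred = [(v, a) for v, a in edges if a["type"] == road_type]
        for v, attrs in (preferred or edges):
            new_cost = c + attrs[cost_key]
            if new_cost < cost.get(v, float("inf")):
                cost[v] = new_cost
                parent[v] = u
                heapq.heappush(queue, (new_cost, v))
    if target not in parent:
        return None  # target not reachable under the restricted exploration
    path, v = [], target
    while v is not None:
        path.append(v)
        v = parent[v]
    return path[::-1]


# Toy road network: the direct edge s -> t is short but has road type TP2.
graph = {
    "s": [("a", {"DI": 1.0, "type": "TP1"}), ("t", {"DI": 1.0, "type": "TP2"})],
    "a": [("t", {"DI": 2.0, "type": "TP1"})],
    "t": [],
}
path_with_tp1 = preferred_path(graph, "s", "t", "DI", "TP1")
path_without_match = preferred_path(graph, "s", "t", "DI", "TP3")
```

With preference ⟨DI, TP1⟩ the sketch returns the longer route via `a`, because the TP2 edge is never explored from `s`. When no edge matches the road type, it behaves like the classical Dijkstra's algorithm.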

The three steps yield a region graph where each region edge has a set of paths, meaning that the region graph can serve as a foundation for routing.

VI Routing on Region Graphs

Given an arbitrary pair of a source and a destination in the original road network graph, we present a routing algorithm that uses the region graph to recommend a path that connects them. We distinguish two cases.

Case 1: Both the source and the destination are in regions. If both vertices are in the same region, then, since we maintain inner-region paths inside regions, we check whether trajectories exist that travel from the source to the destination. If so, we return the path with the largest number of trajectory traversals; otherwise, we return the fastest path.

If the vertices are not in the same region, we first identify a region path based on the region graph and then map the region path to a path in the original road network.

Routing on the region graph: The intuition is to find a region path that follows few region edges to reach the destination region. This is because a region path that consists of many region edges involves stitching together many paths from different trajectories, which may not represent coherent routing preferences. Thus, in the routing procedure, we always prefer to follow a region edge that leads to a region that is geometrically close to the destination region. When a region edge exists that directly connects the source and destination regions, we always use that region edge. Otherwise, we give higher priority to region edges that lead to regions that are closer to the destination region.

To illustrate, consider the region graph shown in Figure 5(b), and assume that the figure also reflects the physical locations of the regions. Given a source region and a destination region, when exploring from the source region, the neighboring region that is much closer to the destination region is preferred over the other neighboring regions. The region path obtained by repeating this choice is returned.

Recall that each region edge corresponds to some paths in the original road network graph. Based on this, a region path can be mapped back to a path in the road network graph, which is then returned as the result.
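The region-level search can be sketched as a greedy best-first search that ranks regions by the distance of their centroids to the destination region. The paper does not fix the exact search procedure, so the code below is one possible reading; the region names, the centroid coordinates, and the adjacency-list encoding are hypothetical.

```python
import heapq
import math


def region_path(region_graph, centroids, source, target):
    """Greedy best-first search on the region graph.

    Regions closer (by centroid distance) to the target region are explored
    first, and a direct region edge to the target is always taken.
    """
    def dist_to_target(region):
        (x1, y1), (x2, y2) = centroids[region], centroids[target]
        return math.hypot(x1 - x2, y1 - y2)

    parent = {source: None}
    queue = [(dist_to_target(source), source)]
    while queue and target not in parent:
        _, region = heapq.heappop(queue)
        neighbors = region_graph.get(region, [])
        if target in neighbors:
            parent[target] = region  # a direct region edge is always used
            break
        for nb in neighbors:
            if nb not in parent:
                parent[nb] = region
                heapq.heappush(queue, (dist_to_target(nb), nb))
    if target not in parent:
        return None
    path, region = [], target
    while region is not None:
        path.append(region)
        region = parent[region]
    return path[::-1]


# Hypothetical region graph and centroid coordinates (in km).
region_graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": ["E"], "E": []}
centroids = {"A": (0, 0), "B": (0, 5), "C": (4, 1), "D": (3, 6), "E": (8, 0)}
found = region_path(region_graph, centroids, "A", "E")
```

Each consecutive pair of regions in the returned region path corresponds to a region edge, whose associated road network paths can then be stitched together.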

Case 2: At least one of the source and the destination is not in a region. In this case, we first find appropriate regions for the source and/or the destination and then apply the procedure from case 1.

To this end, we run a fastest-path algorithm from the source toward the destination on the road network graph. If the algorithm visits a region, we consider that region a candidate source region. Similarly, we identify a candidate destination region. We then apply the procedure from case 1 with the candidate source and destination regions to identify a path between them. Finally, we return a path that consists of three sub-paths, as shown in Figure 8: the fastest path from the source to the candidate source region, the path that connects the two candidate regions, and the fastest path from the candidate destination region to the destination. If only one or no candidate region exists, we simply return the fastest path; Figure 8 also shows such a case.

Fig. 8: Routing, Case 2

VII Empirical Study

We conduct a comprehensive empirical study on two substantial GPS trajectory data sets.

VII-A Experimental Setup

Road Network and GPS Trajectories: We use two road networks, both obtained from OpenStreetMap (openstreetmap.org). The first represents the road network of Denmark and includes 667,950 vertices and 1,636,040 edges, which are contained in a 320 km × 370 km rectangular region. The second represents the road network of Chengdu, China, and includes 27,671 vertices and 77,444 edges, which are contained in a 33 km × 25 km rectangular region.

We use two GPS data sets, one from each road network. The Denmark data set consists of more than 180 million high-frequency GPS records that were collected by 183 vehicles at 1 Hz (i.e., one GPS record per second) in 2007 and 2008. The Chengdu data set consists of 100 million low-frequency GPS records that were collected by 10,864 taxis from August 3rd to 30th, 2014, with sampling rates that vary from 0.03 Hz to 0.1 Hz. We only use the parts of trajectories during which the taxis carry passengers. We map match [29] the GPS records onto the corresponding road networks, obtaining 466,305 trajectories for Denmark and 185,284 trajectories for Chengdu, where the latter represent trips with passengers. The travel distance distributions of the trajectories are shown in Table II.

Denmark data set
Distance (km)      (0,10]     (10,50]    (50,100]   (100,500]
# Trajectories     427,430    35,271     2,263      1,341
Percentage (%)     91.6       7.6        0.5        0.3

Chengdu data set
Distance (km)      (0,2]      (2,5]      (5,10]     (10,35]
# Trajectories     29,256     105,503    43,473     7,052
Percentage (%)     15.8       56.9       23.5       3.8
TABLE II: Statistics of Trajectories

Training and Testing Data: We partition the trajectories in both data sets into training data and testing data. Specifically, trajectories from the first 18 months of the Denmark data set and the first 21 days of the Chengdu data set are used as training data, and trajectories from the last 6 months of the Denmark data set and the last 7 days of the Chengdu data set are used as testing data.

Evaluation Criteria: For each trajectory in the testing data Test, we record its source and destination, departure time, and the actual path used by the trajectory. Since the aim of the paper is to reuse local drivers’ routing intelligence to recommend paths, the paths used by the local drivers are considered as the ground truth (GT) paths.

In the experiments, we run learn-to-route (L2R) on each pair of source and destination in Test, and we compare the returned path with the GT path, using the path similarity function in Section V-A. In addition, we also identify the shortest path, the fastest path, the paths returned by two personalized routing algorithms and by Google Maps, and compare them with the GT path. The departure time is used when identifying the L2R paths, the fastest paths and the Google Maps paths.

We also report accuracy using a different but also popular path similarity function [26, 36] (Equation 4) to evaluate the similarity between the returned path and the ground-truth path.


It follows the intuition of the similarity function in Section V-A, but utilizes the union of segments in the constructed path and the ground-truth path as the denominator.
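In symbols chosen here for illustration ($P_{GT}$ for the ground-truth path, $P$ for the returned path, and $\mathit{len}(e)$ for the length of edge $e$; none of these names are taken from the paper), a similarity of this kind can be written as:

```latex
\frac{\sum_{e \,\in\, P_{GT} \cap P} \mathit{len}(e)}
     {\sum_{e \,\in\, P_{GT} \cup P} \mathit{len}(e)}
```

Under this reading, a returned path that contains the whole ground-truth path plus a long detour is penalized, because the detour edges enlarge the union in the denominator.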

Results are categorized according to the lengths of the GT paths and according to whether the source or destination of a GT path belongs to a region in the obtained region graph. If both the source and the destination of a GT path are in regions, the path is in category InRegion. If exactly one of the source and the destination is in a region, the path is in category InOutRegion. If neither the source nor the destination belongs to a region, the path is in category OutRegion.

Implementation Details: All algorithms are implemented in Java using JDK 1.8. We conduct experiments on a server with a 64-core AMD Opteron(tm) 2.24 GHz CPU and 528 GB of main memory, running Ubuntu Linux. We use distance, travel time, and fuel consumption as the travel cost features, where the fuel consumption is computed based on speed limits using vehicular environmental impact models [37]. We use six commonly used road types from OpenStreetMap as road condition features: motorway, trunk, primary, secondary, tertiary, and residential. The transduction learning algorithm for transferring preferences (cf. Section V-B) is implemented using the Junto library (github.com/parthatalukdar/junto).

VII-B Evaluation of Design Choices

We evaluate the design choices made for L2R. In particular, we show the effect of important parameters by varying them according to Table III, where default values are shown in bold. When we vary one parameter, we keep the remaining parameters at their default values. We show results for both data sets in most empirical studies, but omit some results for one data set when they differ little from those for the other.

Parameters     Values
# T-edges      1X, 2X, 3X, 4X, 5X
Threshold      0.5, 0.6, 0.7, 0.8, 0.9
TABLE III: Parameters of L2R

Region Sizes: We report the sizes of the obtained regions by computing their convex hulls and then reporting their areas (in km²) and maximum diameters (in km).

Denmark network
Area (km²)           (0,2]           (2,10]         (10,100]       >100
# Regions            3,357 (78.6%)   539 (12.6%)    304 (7.12%)    70 (1.63%)
Max diameter (km)    9.5             15.8           29.9           304.1

Chengdu network
Area (km²)           (0,2]           (2,5]          (5,10]         >10
# Regions            388 (72.1%)     127 (23.6%)    19 (3.53%)     4 (0.74%)
Max diameter (km)    4.24            8.17           8.59           6.22
TABLE IV: Region Sizes

Table IV reports the numbers of regions whose areas fall in the given ranges and the maximum diameter of the regions in each range. There are a few large regions, but most regions have areas of at most 2 km². This indicates that the proposed modularity-based clustering is able to control the region size and avoids very large regions. The Denmark network has a few large regions, which represent backbone highways. Since we maintain inner-region paths for regions, large regions do not affect the final routing quality.

Transferring Preferences: We study the accuracy of transferring preferences from T-edges to B-edges in Step 2. As we have no ground-truth preferences for B-edges, we cannot evaluate the accuracy of the transferred preferences in a straightforward manner. Instead, we randomly partition the preferences of T-edges into 5 partitions and reserve one partition as the ground truth. Next, we use partition 1; partitions 1 and 2; partitions 1, 2, and 3; and partitions 1, 2, 3, and 4 to conduct the preference transfer. For each T-edge in the reserved partition, we obtain a transferred preference, which we compare with the ground-truth preference. We report the accuracy of the transferred preference against the ground-truth preference using Jaccard similarity.

Figure 9(a) shows the accuracy when using 1, 2, 3, and 4 partitions, labeled as 1X, 2X, 3X, and 4X. The results indicate that the more T-edge preferences are used, the better the accuracy becomes. Therefore, we use all the preferences of T-edges (i.e., 5X) in the remaining experiments.

Next, we consider the effect of the adjacency matrix reduction threshold on the transfer process. Since Figure 6 already suggests that the preference vectors of two region edges are dissimilar when their similarity is low, we vary the threshold from 0.5 to 0.9 and ignore smaller values.

We use 4 partitions of T-edge preferences to build the adjacency matrix and use the last partition as the ground truth preferences. We report the accuracy of the transferred preferences against the ground truth measured using Jaccard similarity, the null rate (N-rate), i.e., the percentage of transferred preferences that get null values, and the run-time in Figure 9(b).

The accuracy of the transfer process increases slightly as the threshold increases and is not sensitive to the threshold once it exceeds 0.5. This is intuitive because a large threshold allows routing preferences to be transferred from T-edges only to highly similar B-edges. However, as the threshold increases, the graph used in the graph-based transduction learning may become disconnected. Some B-edges then cannot be associated with transferred preferences and get a null preference vector. A smaller threshold leaves many edges in the graph and thus leads to a longer run-time. The default threshold gives the best trade-off, i.e., relatively high accuracy and efficiency and a low null rate, and we use it in the remaining experiments. We simply associate fastest paths with B-edges that have null preference vectors.

Fig. 9: Parameters of Preference Transfer. (a) Varying # T-edges; (b) varying the threshold.
Fig. 10: Accuracy using Equation 1. (a), (c) By distance; (b), (d) by region.
Fig. 11: Accuracy using Equation 4. (a), (c) By distance; (b), (d) by region.
Fig. 12: Efficiency. (a), (c) By distance; (b), (d) by region.
Fig. 13: Comparison with Google Maps. (a), (c) By distance; (b), (d) by region.

VII-C Comparisons with Other Routing Algorithms

We proceed to compare L2R with the shortest and fastest routing algorithms and with two personalized routing algorithms. We apply Dijkstra’s algorithm to identify the shortest (Shortest) and fastest (Fastest) paths. We do not apply advanced speed-up techniques for routing, e.g., contraction hierarchies [16], since they improve only query efficiency, not accuracy. When such speed-up techniques are applied, the efficiency of computing all paths, including the L2R paths, can be improved consistently. We leave such performance improvements as an interesting future research direction.

We also consider two personalized routing algorithms, Dom [26] and TRIP [27], that are able to find personalized “shortest” paths between an arbitrary source and destination for individual drivers. The algorithms first learn a global routing preference for each driver from the driver’s historical trajectories (rather than a routing preference for each region pair, as in this paper), then use the learned preference to obtain new, personalized weights for all edges, and finally apply shortest-path finding using the new edge weights. Specifically, Dom utilizes a routing preference that considers distance, travel time, and fuel consumption, whereas TRIP uses a routing preference that considers only travel time. In the experiment, we apply each algorithm to learn a routing preference from a driver’s trajectories in the training data. For each trajectory in the testing data, we obtain the source, the destination, and the driver id. Then we apply Dom and TRIP to compute the personalized shortest path that connects the source and the destination for that driver. Other routing algorithms that use historical trajectories, e.g., [18, 21, 19], do not support routing between an arbitrary source and destination and thus are not comparable to L2R.

Accuracy: The accuracies of L2R, Shortest, Fastest, Dom, and TRIP are calculated using the path similarity functions (see Equations 1 and 4), and are reported in Figures 10 and 11.

Shortest’s accuracy drops as the travel distance increases. This is because Shortest tends to find a path that approximates the straight line segment from a source to a destination, and such paths are often not preferred by drivers. When traveling longer distances, drivers usually prefer highways. However, since using highways often yields longer travel distances, Shortest does not return such paths. Therefore, the accuracy of Shortest is poor for longer distances.

The accuracy of Fastest is comparable to that of Shortest for small travel distances. However, Fastest achieves much higher accuracy when travel distance is longer. When travelling longer distances, highways usually offer the lowest travel times and are therefore returned by Fastest. Thus, Fastest achieves much better accuracy than does Shortest.

Dom achieves higher accuracy than the other routing methods, except L2R, because it learns routing preferences that consider the trade-off among distance, travel time, and fuel consumption for individual drivers. However, as it conducts an expensive multi-objective skyline routing process, it requires significantly more running time than other methods (see Figure 12). TRIP is slightly more accurate than Fastest due to the personal ratio learned for each driver, and it needs similar running time to Shortest and Fastest.

L2R achieves the highest accuracy in all settings. The accuracy increases as the travel distance becomes longer; this is achieved by capturing the preferences for different travel costs and road types in the region graph.

The accuracy of L2R decreases when sources and/or destinations are not in regions. This is intuitive because when no historical trajectories are available for path finding, an L2R path simply coincides with the fastest path. However, when historical trajectories can be utilized, L2R improves on the accuracy of the fastest path (see InOutRegion and OutRegion in Figure 10(b)).

Online Running Time: Run-times are reported in Figure 12. In all settings, L2R is the most efficient. This is because the path finding process is conducted on the region graph, which is much smaller than the original road network graph. When sources and/or destinations are not in regions, the run-time of L2R increases because it needs extra time to identify the fastest paths from the source (destination) to a region.

The personalized routing algorithm Dom requires significantly more running time because it conducts an expensive multi-objective skyline routing process. In contrast, TRIP has a running time similar to those of Shortest and Fastest because all three perform single-objective routing; TRIP merely uses personalized edge weights.

Offline Processing Time for L2R: When using all training data and default parameters, the offline processing time for constructing the region graph (Section IV) and for executing steps 1–3 to learn and transfer routing preferences (Section V) for are 21, 245, 106, and 7 minutes, respectively, and for are 9, 10, 29, and 0.06 minutes, respectively. Note that such offline processing is parallelizable, e.g., by MapReduce [43, 44].

VII-D Comparison with Google Maps

We also compare L2R with Google Maps. We query the Google Directions API using a source, a destination, and the departure time from the testing set as arguments to obtain a Google path, which consists of a sequence of waypoints, represented by longitude-latitude coordinates.

We follow an existing methodology [19] to compute the similarity between a Google path and a GT path. We first represent a GT path as a polyline in the longitude-latitude coordinate system. We call this polyline a GT path polyline. Next, we introduce two polylines that are parallel to the GT path polyline and are 10 meters away on each side. We thus obtain a band around the GT path polyline. The solid line in Figure 14 shows the GT path polyline, and the two dashed lines indicate the band. When a Google waypoint is within the band, it is a matched waypoint. We project each matched waypoint onto the GT path polyline to obtain a projection point. If two consecutive waypoints are matched waypoints, we regard the edges between their projection points as the edges on which the Google path is matched to the GT path (see Figure 14). The above enables us to use the path similarity function from Section V-A to measure the similarity between Google and GT paths.
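The band-matching procedure can be sketched as follows. The sketch assumes that all coordinates have already been projected from longitude-latitude into a planar coordinate system measured in meters, approximates the band by a distance threshold of 10 meters to the polyline, and uses polyline segments as a stand-in for road network edges. The function names and the sample coordinates are hypothetical.

```python
import math

def project_onto_segment(p, a, b):
    """Return the point on segment ab closest to p and its distance to p."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    t = 0.0 if seg_len_sq == 0 else ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))  # clamp so the projection stays on the segment
    qx, qy = ax + t * dx, ay + t * dy
    return (qx, qy), math.hypot(px - qx, py - qy)

def match_waypoints(polyline, waypoints, band=10.0):
    """Map each waypoint to (segment index, projection point), or None if outside the band."""
    matches = []
    for p in waypoints:
        best = None
        for i in range(len(polyline) - 1):
            q, d = project_onto_segment(p, polyline[i], polyline[i + 1])
            if d <= band and (best is None or d < best[2]):
                best = (i, q, d)
        matches.append(None if best is None else (best[0], best[1]))
    return matches

def matched_segments(matches):
    """Segments lying between the projections of consecutive matched waypoints."""
    covered = set()
    for prev, cur in zip(matches, matches[1:]):
        if prev is not None and cur is not None:
            lo, hi = sorted((prev[0], cur[0]))
            covered.update(range(lo, hi + 1))
    return covered

# Hypothetical GT path polyline and Google waypoints, in meters.
gt_polyline = [(0, 0), (100, 0), (200, 0), (300, 0)]
google_waypoints = [(50, 4), (150, -6), (250, 30)]
matches = match_waypoints(gt_polyline, google_waypoints)
covered = matched_segments(matches)
```

In this example, the first two waypoints lie within the band and the third lies 30 meters away, so only the first two segments count as matched. The matched segments can then be passed to a path similarity function in place of the edges of the Google path.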

Fig. 14: Google Accuracy

We report the accuracy of Google vs. L2R paths in Figure 13. In particular, the accuracy of Google paths lies between 60% and 85%, and the accuracy increases with the travel distance. However, Google paths show no pattern when we categorize according to whether the source and destination belong to regions. In all settings, L2R achieves higher accuracy, indicating that L2R has the potential to improve the quality of state-of-the-art routing services.

VIII Conclusion and Outlook

We propose a learn-to-route solution that enables comprehensive trajectory-based routing. The solution encompasses an algorithm that clusters road intersections into regions, yielding a derived region graph. It learns routing preferences for region pairs with sufficient trajectories and transfers these preferences to region pairs with insufficient trajectories. It then utilizes the learned and transferred preferences to enable routing. Empirical studies offer evidence that the solution is practical and is able to compute high-quality routes. In future work, it is of interest to consider finer-grained modeling of time-dependency, e.g., using a time-varying region graph, real-time region graph updates when receiving new trajectories, and the modeling of more than one preference for each T-edge.


  • [1] C. Guo, B. Yang, J. Hu, and C. S. Jensen, “Learning to route with sparse trajectory sets,” in ICDE, 2018, 12 pages.
  • [2] C. Guo, C. S. Jensen, and B. Yang, “Towards total traffic awareness,” SIGMOD Record, vol. 43, no. 3, pp. 18–23, 2014.
  • [3] Z. Ding, B. Yang, Y. Chi, and L. Guo, “Enabling smart transportation systems: A parallel spatio-temporal database approach,” IEEE Trans. Computers, vol. 65, no. 5, pp. 1377–1391, 2016.
  • [4] J. Hu, B. Yang, C. Guo, and C. S. Jensen, “Risk-aware path selection with time-varying, uncertain travel costs: a time series approach,” VLDB Journal, to appear, 2018.
  • [5] Z. Ding, B. Yang, R. H. Güting, and Y. Li, “Network-matched trajectory-based moving-object database: Models and applications,” IEEE Trans. Intelligent Transportation Systems, vol. 16, no. 4, pp. 1918–1928, 2015.
  • [6] C. Guo, B. Yang, O. Andersen, C. S. Jensen, and K. Torp, “Ecosky: Reducing vehicular environmental impact through eco-routing,” in ICDE, 2015, pp. 1412–1415.
  • [7] M. Hua and J. Pei, “Probabilistic path queries in road networks: traffic uncertainty aware path selection,” in EDBT, 2010, pp. 347–358.
  • [8] H. Liu, C. Jin, B. Yang, and A. Zhou, “Finding top-k shortest paths with diversity,” IEEE Trans. Knowl. Data Eng., vol. 30, no. 3, pp. 488–502, 2018.
  • [9] B. Yang, C. Guo, C. S. Jensen, M. Kaul, and S. Shang, “Stochastic skyline route planning under time-varying uncertainty,” in ICDE, 2014, pp. 136–147.
  • [10] H. Liu, C. Jin, B. Yang, and A. Zhou, “Finding top-k optimal sequenced routes,” in ICDE, 2018, 12 pages.
  • [11] B. Yang, C. Guo, and C. S. Jensen, “Travel cost inference from sparse, spatio-temporally correlated time series using markov models,” PVLDB, vol. 6, no. 9, pp. 769–780, 2013.
  • [12] J. Dai, B. Yang, C. Guo, C. S. Jensen, and J. Hu, “Path cost distribution estimation using trajectory data,” PVLDB, vol. 10, no. 3, pp. 85–96, 2016.
  • [13] B. Yang, J. Dai, C. Guo, and C. S. Jensen, “PACE: A PAth-CEntric paradigm for stochastic path finding,” VLDB Journal, online first, 2017.
  • [14] S. Aljubayrin, B. Yang, C. S. Jensen, and R. Zhang, “Finding non-dominated paths in uncertain road networks,” in SIGSPATIAL, 2016, pp. 15:1–15:10.
  • [15] J. Hu, B. Yang, C. S. Jensen, and Y. Ma, “Enabling time-dependent uncertain eco-weights for road networks,” GeoInformatica, vol. 21, no. 1, pp. 57–88, 2017.
  • [16] R. Geisberger, P. Sanders, D. Schultes, and D. Delling, “Contraction hierarchies: Faster and simpler hierarchical routing in road networks,” in WEA, 2008, pp. 319–333.
  • [17] V. Ceikute and C. S. Jensen, “Routing service quality - local driver behavior versus routing services,” in MDM, 2013, pp. 97–106.
  • [18] Z. Chen, H. T. Shen, and X. Zhou, “Discovering popular routes from trajectories,” in ICDE, 2011, pp. 900–911.
  • [19] V. Ceikute and C. S. Jensen, “Vehicle routing with user-generated trajectory data,” in MDM, 2015, pp. 14–23.
  • [20] W. Luo, H. Tan, L. Chen, and L. M. Ni, “Finding time period-based most frequent path in big trajectory data,” in SIGMOD, 2013, pp. 713–724.
  • [21] J. Dai, B. Yang, C. Guo, and Z. Ding, “Personalized route recommendation using big trajectory data,” in ICDE, 2015, pp. 543–554.
  • [22] H. Gonzalez, J. Han, X. Li, M. Myslinska, and J. P. Sondag, “Adaptive fastest path computation on a road network: A traffic mining approach,” in VLDB, 2007, pp. 794–805.
  • [23] L. Wei, Y. Zheng, and W. Peng, “Constructing popular routes from uncertain trajectories,” in SIGKDD, 2012, pp. 195–203.
  • [24] S. Shang, K. Zheng, C. S. Jensen, B. Yang, P. Kalnis, G. Li, and J. Wen, “Discovery of path nearby clusters in spatial networks,” IEEE Trans. Knowl. Data Eng., vol. 27, no. 6, pp. 1505–1518, 2015.
  • [25] A. Balteanu, G. Jossé, and M. Schubert, “Mining driving preferences in multi-cost networks,” in SSTD, 2013, pp. 74–91.
  • [26] B. Yang, C. Guo, Y. Ma, and C. S. Jensen, “Toward personalized, context-aware routing,” VLDB Journal, vol. 24, no. 2, pp. 297–318, 2015.
  • [27] J. Letchner, J. Krumm, and E. Horvitz, “Trip router with individualized preferences (TRIP): incorporating personalization into route planning,” in AAAI, 2006, pp. 1795–1800.
  • [28] D. Delling, A. V. Goldberg, M. Goldszmidt, J. Krumm, K. Talwar, and R. F. Werneck, “Navigation made personal: inferring driving preferences from GPS traces,” in SIGSPATIAL, 2015, pp. 31:1–31:9.
  • [29] P. Newson and J. Krumm, “Hidden Markov map matching through noise and sparseness,” in SIGSPATIAL, 2009, pp. 336–343.
  • [30] X. Liang, J. Zhao, L. Dong, and K. Xu, “Unraveling the origin of exponential law in intra-urban human mobility,” arXiv:1305.6364, 2013.
  • [31] G. Forbes, “Urban roadway classification,” in Urban Street Symposium, 1999, pp. B-6/1–B-6/8.
  • [32] M. E. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical review E, vol. 69, no. 2, p. 026113, 2004.
  • [33] M. E. Newman, “Analysis of weighted networks,” Physical review E, vol. 70, no. 5, p. 056131, 2004.
  • [34] H. Shiokawa, Y. Fujiwara, and M. Onizuka, “Fast algorithm for modularity-based graph clustering,” in AAAI, 2013, pp. 1170–1176.
  • [35] J. Han, J. Pei, and M. Kamber, Data mining: concepts and techniques.   Elsevier, 2011.
  • [36] E. Erkut and V. Verter, “Modeling of transport risk for hazardous materials,” Operations Research, vol. 46, no. 5, pp. 625–642, 1998.
  • [37] C. Guo, B. Yang, O. Andersen, C. S. Jensen, and K. Torp, “Ecomark 2.0: empowering eco-routing with vehicular environmental models and actual vehicle fuel consumption data,” GeoInformatica, vol. 19, no. 3, pp. 567–599, 2015.
  • [38] C. Guo, Y. Ma, B. Yang, C. S. Jensen, and M. Kaul, “Ecomark: evaluating models of vehicular environmental impact,” in SIGSPATIAL, 2012, pp. 269–278.
  • [39] P. P. Talukdar and K. Crammer, “New regularized algorithms for transductive learning,” in ECML/PKDD, 2009, pp. 442–457.
  • [40] Y. Wang, B. Yang, L. Qu, M. Spaniol, and G. Weikum, “Harvesting facts from textual web sources by constrained label propagation,” in CIKM, 2011, pp. 837–846.
  • [41] G. H. Golub and C. F. Van Loan, Matrix computations.   JHU Press, vol. 3, 2012.
  • [42] B. Yang, M. Kaul, and C. S. Jensen, “Using incomplete information for complete weight annotation of road networks,” IEEE Trans. Knowl. Data Eng., vol. 26, no. 5, pp. 1267–1279, 2014.
  • [43] B. Yang, Q. Ma, W. Qian, and A. Zhou, “TRUSTER: trajectory data processing on clusters,” in DASFAA, 2009, pp. 768–771.
  • [44] P. Yuan, C. Sha, X. Wang, B. Yang, A. Zhou, and S. Yang, “XML structural similarity search using mapreduce,” in WAIM, 2010, pp. 169–181.