Street-level Travel-time Estimation via Aggregated Uber Data

01/13/2020 ∙ by Kelsey Maass, et al. ∙ PNNL

Estimating temporal patterns in travel times along road segments in urban settings is of central importance to traffic engineers and city planners. In this work, we propose a methodology to leverage coarse-grained and aggregated travel time data to estimate the street-level travel times of a given metropolitan area. Our main focus is to estimate travel times along the arterial road segments where relevant data are often unavailable. The central idea of our approach is to leverage easy-to-obtain, aggregated data sets with broad spatial coverage, such as the data published by Uber Movement, as the fabric over which other expensive, fine-grained datasets, such as loop counter and probe data, can be overlaid. Our proposed methodology uses a graph representation of the road network and combines several techniques such as graph-based routing, trip sampling, graph sparsification, and least-squares optimization to estimate the street-level travel times. Using sampled trips and weighted shortest-path routing, we iteratively solve constrained least-squares problems to obtain the travel time estimates. We demonstrate our method on the Los Angeles metropolitan-area street network, where aggregated travel time data is available for trips between traffic analysis zones. Additionally, we present techniques to scale our approach via a novel graph pseudo-sparsification technique.




1 Introduction

Data-driven mobility modeling and prediction are important aspects of modern urban planning. With respect to travel forecasting, the two major areas of research are demand modeling and travel-time estimation [1], where demand modeling involves generating accurate statistics of the number of trips between origin–destination (O–D) pairs, and travel-time estimation involves predicting the travel times for trips between O–D pairs. The focus of this work is on the latter, specifically, street-level travel-time estimation. In existing research on travel-time estimation, interstate link models have received disproportionate attention from the transportation research community, due primarily to the availability of large amounts of freeway sensor data. Although equally important, the same is not true for arterial models, where coverage is limited due to the costs related to installing probe sensors and associated infrastructure. Under these circumstances, significant insights can be gained at a fraction of the cost with coarse-grained data sets such as the summary statistics of Uber trips.

Uber Movement datasets [22] provide anonymized, aggregated, and coarse-grained O–D travel time statistics at the TAZ (traffic analysis zone) level for many metropolitan areas around the world. TAZs are small geographical units into which a given metropolitan area is divided, characterized by factors such as the total population, type of population, and employment. While Uber Movement datasets can be coarse-grained and have high uncertainty, they cover large geographical regions and are available for multiple metropolitan areas, allowing for generalizability. Similar datasets are also available through other sources (for specific cities) such as the New York City taxi-cab data [14].

Our work is concerned with filling the gap in arterial travel-time estimation. While our prior work [20] focused on travel-time estimation at the TAZ level, this work is concerned with travel-time estimation at the street level. We start with the graph representation of a given road network, where intersections and road segments are represented as vertices and edges, respectively. Next, using the Uber Movement data and the street network graphs as inputs, we iteratively estimate the travel time on each edge for a given time window. At each iteration, we solve a constrained least-squares problem on the pseudo-sparsified graph. While the examples here utilize the coarse-grained, high-coverage Uber Movement data, the developed approach could seamlessly incorporate high-quality, low-uncertainty datasets, such as those from loop counters or probe sensors, by including these elements as constraints. While this work uses methods similar to those in [2], which combines shortest-path routing with a convex optimization formulation, we make the following contributions:

  • We utilize trip sampling in order to leverage coarse-grained, aggregated TAZ-level data. Finer-grained data from sensors as well as street-level trip data can easily be absorbed into our approach.

  • We propose a biasing scheme for sampling travel times from the statistics of the aggregated data, which can significantly improve predicted travel time distributions.

  • We make use of constraints and the convex combinations of sequential iterates as a means of stabilizing solutions and improving convergence rates.

  • We demonstrate the efficacy of a graph pseudo-sparsification technique that can improve scalability with little loss of accuracy. We are currently building on this approach to scale to much larger networks that encompass entire metropolitan areas with the aid of high-performance computing resources.

The paper is organized as follows. Section 2 describes the Uber Movement and LA road network data used in this work. Section 3 describes the forward model and simulated trips, and Section 4 the optimization methodology. Section 5 presents our principal experimental results, followed by the methods and scaling results related to our pseudo-sparsification technique in Section 6. Section 7 describes related work in the area of graph analytics, optimization, and machine learning applied to road networks. We conclude the paper with a note on future work in Section 8.


2 Datasets

Our primary data sources for this work are from Uber Movement [22] and the road network for the LA area. Uber has released a trove of aggregated and anonymized data on travel-time and average-speed statistics for a large number of cities around the world [22]. Because gathering transportation data is an expensive and cumbersome process, leveraging these surrogate datasets is expected to help researchers and city planners conduct quick and fairly detailed studies of the various aspects of vehicle mobility in urban settings. In this work, we focus on Uber’s travel time data, which includes statistics for travel times between pairs of TAZs or census tracts for each hour of the day and day of the week.

The graph representation of the LA city road network was made available to us as part of the project. Similar analyses are possible with the open-source maps provided by OpenStreetMap [16]. Table 1 lists some basic network properties of the full LA road network, along with sub-networks formed with different radii around downtown LA. Other than the number of TAZs, most of the structural properties are similar across all of the networks. The number of TAZs included increases with the size of the graph.

Area       Vertices    Edges    TAZs   min(Deg)   max(Deg)   avg(Deg)   Clust. Coeff.
LA_DT+1        3239     7138      25          2         12       2.2            0.018
LA_DT+2        9879    22756      79          2         12       2.3            0.025
LA_DT+3       16906    40929     159          2         12       2.4            0.026
LA_full      368419   905622    2205          2         14       2.46           0.034
Table 1: Structural properties of LA road networks of varying sizes: one-, two-, and three-mile radii around downtown LA (DT) and the full LA network.

3 The graph-based forward model

In this section we describe a forward model for travel-time prediction, and in the following section we show how the parameters of this model can be estimated by means of an optimizer. Let $P_{ij}$ denote a set of edges that forms a path from vertex $i$ to vertex $j$ in the road network graph. The expected travel time between the vertices $i$ and $j$ can be computed as

$$t_{ij} = \sum_{e \in P_{ij}} x_e = a_{ij}^\top x, \tag{1}$$

where $x_e$ represents the expected travel time along edge $e$, the vector $x$ represents the travel time along all edges in the graph, and the binary vector $a_{ij}$ encodes the edges in $P_{ij}$. The choice of optimal routing ($P_{ij}$) between the origin and destination vertices is a variable in the model. Routing is typically done with special routing software such as the Open Source Routing Machine [13]. However, in this work, we use shortest-path routes where the edges are weighted by travel times [2, 12, 24]. Specifically, for initialization, we use weights determined by free-flow travel times $x^{\mathrm{ff}}$, the time it takes to travel along a road segment free of congestion, computed by dividing the length of the road segment by the posted speed limit.
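As a minimal sketch of the forward model, assuming a toy four-vertex graph whose segment lengths and speed limits are invented for illustration, the snippet below weights each edge by its free-flow time, finds the shortest path with Dijkstra's algorithm, and sums the edge times along that path as in Eq. 1. The helper names `free_flow_time` and `shortest_path` are hypothetical and are not taken from the authors' code.

```python
import heapq

def free_flow_time(length_miles, speed_limit_mph):
    """Free-flow travel time in seconds: segment length divided by posted speed."""
    return 3600.0 * length_miles / speed_limit_mph

def shortest_path(adj, source, target):
    """Dijkstra's algorithm; returns (total travel time, list of edges on the path)."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        raise ValueError("target is not reachable from source")
    # Walk the predecessor links back from the target to recover the path edges.
    path = []
    node = target
    while node != source:
        path.append((prev[node], node))
        node = prev[node]
    path.reverse()
    return dist[target], path

# Toy network (illustrative values): edge -> (length in miles, speed limit in mph).
edges = {
    ("A", "B"): (0.5, 30.0),
    ("B", "D"): (0.5, 30.0),
    ("A", "C"): (0.25, 45.0),
    ("C", "D"): (1.0, 45.0),
}
x = {e: free_flow_time(*attrs) for e, attrs in edges.items()}  # edge travel times

adj = {}
for (u, v), t in x.items():
    adj.setdefault(u, {})[v] = t

total, path = shortest_path(adj, "A", "D")
path_time = sum(x[e] for e in path)  # sum of x_e over the edges in the path
```

Representing the path as a list of edges plays the role of the binary vector in Eq. 1: summing the travel times of the listed edges is the same as taking the inner product with an indicator vector.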

3.1 Training and testing tasks

Eq. 1 is based on O–D pairs in the road network graph. However, the Uber Movement data is provided at the much coarser granularity of TAZ O–D pairs, consisting of travel-time statistics computed over all trips originating at a given TAZ and ending in another during a particular hour of the day. Furthermore, not all TAZ pairs are included in the data.

For a given hour of the day and geographic area, we first collect all the TAZ O–D pair statistics available from Uber Movement. We then choose a random 90-10 split of the available TAZ O–D pairs for training and testing, respectively. For each of the TAZ O–D pairs, we simulate trips by sampling vertices from the origin and destination TAZs to form vertex O–D pairs, assigning trip times by sampling from a log-normal distribution based on the geometric mean and geometric standard deviation travel times given in the Uber dataset.

After estimating edge travel times using our vertex-level training data, testing is done at the TAZ level using the geometric mean travel time for all vertex O–D pairs present in a given test TAZ O–D pair. For example, suppose there are $n_{IJ}$ simulated trips from TAZ $I$ to TAZ $J$. The estimated geometric mean travel time would then be computed as

$$\hat{T}_{IJ} = \exp\left( \frac{1}{n_{IJ}} \sum_{k=1}^{n_{IJ}} \log\left( a_k^\top x \right) \right), \tag{2}$$

where vector $a_k$ encodes the edges in the weighted shortest path between sampled vertex O–D pair $k$. The vertex O–D sampling and travel-time sampling are described in detail below.
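The two ingredients of this section, log-normal sampling of trip times and the geometric mean used for testing, can be sketched as follows. This is an illustration only: the geometric mean of 600 seconds and geometric standard deviation of 1.5 are made-up values, and the function names are hypothetical. It relies on the fact that a log-normal variable with log-mean $\mu$ and log-standard-deviation $\sigma$ has geometric mean $e^{\mu}$ and geometric standard deviation $e^{\sigma}$.

```python
import math
import random

def sample_trip_times(geo_mean, geo_std, n, rng):
    """Draw n travel times from a log-normal distribution specified by its
    geometric mean and geometric standard deviation (geo_std must exceed 1)."""
    mu = math.log(geo_mean)    # mean of the log travel time
    sigma = math.log(geo_std)  # standard deviation of the log travel time
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]

def geometric_mean(times):
    """Geometric mean: exponential of the average log travel time (as in Eq. 2)."""
    return math.exp(sum(math.log(t) for t in times) / len(times))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
times = sample_trip_times(600.0, 1.5, 20000, rng)
estimate = geometric_mean(times)  # should land close to the 600-second target
```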

3.1.1 Vertex sampling

For each iteration of our edge travel-time estimation algorithm, we sample $N$ vertex O–D pairs each from our training and test sets, letting the number of simulated trips for each TAZ O–D pair be proportional to the size of the two TAZs. Specifically, for each TAZ O–D pair $(I, J)$ in the current subset of the Uber dataset, we let $s_{IJ}$ be the product of the number of vertices in origin TAZ $I$ and destination TAZ $J$, then we set the number of simulated trips $n_{IJ}$ in proportion to $s_{IJ}$.

We sample the $n_{IJ}$ origin and destination vertices for a given TAZ O–D pair uniformly from all vertices within their respective TAZs. We note that this process selects the shortest-path edges with a probability proportional to their local betweenness centrality, which in turn correlates with the importance of the edge with respect to traffic flow [5, 10, 17].
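A possible implementation of this allocation and sampling step is sketched below. The rounding rule (nearest integer, with at least one trip per pair) is an assumption, since only proportionality is stated; the TAZ names, vertex ids, and function names are likewise invented for illustration.

```python
import random

def allocate_trips(taz_vertices, od_pairs, total_trips):
    """Number of simulated trips per TAZ O-D pair, proportional to the product
    of the vertex counts of the two TAZs. Rounding rule is an assumption."""
    weights = {
        (i, j): len(taz_vertices[i]) * len(taz_vertices[j]) for (i, j) in od_pairs
    }
    total_weight = sum(weights.values())
    return {
        pair: max(1, round(total_trips * w / total_weight))
        for pair, w in weights.items()
    }

def sample_vertex_pairs(taz_vertices, pair, n, rng):
    """Sample n (origin vertex, destination vertex) pairs uniformly within each TAZ."""
    origin, dest = pair
    return [
        (rng.choice(taz_vertices[origin]), rng.choice(taz_vertices[dest]))
        for _ in range(n)
    ]

# Hypothetical TAZs with 4, 2, and 6 vertices.
taz_vertices = {"I": [0, 1, 2, 3], "J": [4, 5], "K": [6, 7, 8, 9, 10, 11]}
od_pairs = [("I", "J"), ("I", "K"), ("J", "K")]

counts = allocate_trips(taz_vertices, od_pairs, total_trips=110)
rng = random.Random(1)
trips = sample_vertex_pairs(taz_vertices, ("I", "K"), counts[("I", "K")], rng)
```

With vertex-count products of 8, 24, and 12, the 110 trips split as 20, 60, and 30.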

3.1.2 Travel-time sampling

Let $A^{\mathrm{ff}}$ represent the free-flow shortest-path matrix for our $N$ simulated trips. We assign travel times to trips based on the ordering of free-flow shortest-path travel times for all vertex O–D pairs within a TAZ O–D pair, summarized in Algorithm 1 below. Here $x^{\mathrm{ff}}$ denotes the vector of free-flow travel times for each of the $m$ edges.

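Input: Free-flow shortest-paths matrix $A^{\mathrm{ff}}$
Output: Sampled travel times vector $b$

  for all TAZ O–D pairs $(I, J)$ do
     Sample $n_{IJ}$ travel times $\tau$ from log-normal distribution
     Get indices $p$ and $q$ of longest to shortest travel times for $\tau$ and $A^{\mathrm{ff}}_{IJ} x^{\mathrm{ff}}$, respectively
     for $k = 1, \dots, n_{IJ}$ do
        Assign $b_{q_k} \leftarrow \tau_{p_k}$
     end for
  end for
Algorithm 1 Travel-time sampling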
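One reading of the ordering-based assignment is rank matching: within a TAZ O–D pair, the trip with the longest free-flow time receives the longest sampled time, the second longest receives the second longest, and so on. The sketch below implements that reading; it is an interpretation of the description above rather than the authors' code, and the numbers are invented.

```python
def assign_biased_times(free_flow_times, sampled_times):
    """Rank-match sampled travel times to trips: the k-th longest sampled time
    goes to the trip with the k-th longest free-flow shortest-path time."""
    if len(free_flow_times) != len(sampled_times):
        raise ValueError("need one sampled time per trip")
    # Trip indices ordered from longest to shortest free-flow time.
    trip_order = sorted(
        range(len(free_flow_times)), key=lambda k: free_flow_times[k], reverse=True
    )
    # Sampled times ordered from longest to shortest.
    ordered_samples = sorted(sampled_times, reverse=True)
    assigned = [0.0] * len(free_flow_times)
    for trip, time in zip(trip_order, ordered_samples):
        assigned[trip] = time
    return assigned

# Three trips within one TAZ O-D pair (seconds, illustrative values).
free_flow = [300.0, 120.0, 480.0]
samples = [200.0, 650.0, 410.0]
b = assign_biased_times(free_flow, samples)
```

The assignment only permutes the sampled values, so the travel-time statistics of the TAZ O–D pair are unchanged; what changes is which trip receives which time.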

In practice, biased travel-time sampling appears to improve results when there is a relatively large number of trips for a given TAZ O–D pair. For example, Fig. 1 illustrates the difference between estimated trip travel-time distributions with and without biased travel-time sampling. For 2000 trip times sampled from the log-normal distribution of a given TAZ O–D pair (left), the distribution of estimated trip travel times deviates less from the target distribution (red line, same across panels) when we bias the travel times assigned to each vertex O–D pair (center) than when we do not (right).

Figure 1: Illustrating the effect of biasing travel-time assignments for simulated trips for a TAZ O–D pair with 2000 simulated trips. (Left) Distribution of simulated travel times according to the log-normal distribution drawn in red. (Center) Distribution of estimated trip travel times computed using biased sampling. (Right) Distribution of estimated trip travel times computed using unbiased sampling.

4 The optimization process

Our approach solves a series of constrained least-squares problems to fit edge travel times to simulated trips (vertex O–D pairs, travel times, and routes) with travel-time statistics consistent with the Uber dataset. For each iteration $k$, we sample $N$ vertex O–D pairs with sampled travel times vector $b^{(k)}$ and weighted shortest-paths matrix $A^{(k)}$. Our goal is then to estimate the edge travel times vector $x$ so that $A^{(k)} x \approx b^{(k)}$, satisfying our forward model in Eq. 1. Thus we have a system of $N$ equations with $m$ unknowns, where $m$ is the number of road segments (or edges in the graph). The estimates are then updated using a convex combination of the previous estimates and the solution found by minimizing the mean-squared error with constraints on the unknown coefficients, illustrated by Eqs. 3-5 below.

$$\tilde{x}^{(k)} = \arg\min_{x} \frac{1}{N} \left\| A^{(k)} x - b^{(k)} \right\|_2^2 \tag{3}$$

$$\text{subject to } l \le x \le u \tag{4}$$

$$x^{(k)} = \alpha_k x^{(k-1)} + (1 - \alpha_k) \tilde{x}^{(k)} \tag{5}$$


In our implementation, we initialize with the free-flow estimates $x^{(0)} = x^{\mathrm{ff}}$, we constrain the elements of the vector $x$ to be bounded below by a lower bound $l$ and above by an upper bound $u$, and we update our weight using the constant $\alpha_k = 0.9$ for all $k$. Our weighted shortest-paths matrix $A^{(k)}$ is recomputed each iteration using our current estimates as weights, and our constrained least-squares sub-problems are solved using the lsq_linear function from the Python scientific computing library SciPy. We run the optimizer until the estimates converge (measured by the magnitude of the average change in the solution vector between iterations, Eq. 6) or we reach a maximum number of iterations. This iterative scheme is similar to the one proposed in [2] but with a choice of optimizer that favors scalability in conjunction with a number of heuristics that improve convergence. The workflow of our proposed travel-time estimation algorithm is given in Algorithm 2.



Input: Travel time statistics for TAZ O–D pairs; Road network $G$; Number of trips $N$
Output: Estimated travel times along edges $x$

  Determine number of trips $n_{IJ}$ for each TAZ O–D pair (Section 3.1.1)
  Initialize edges with free-flow travel times $x^{(0)} = x^{\mathrm{ff}}$ and $k = 0$
  while  Eq. 6 not satisfied do
     $k \leftarrow k + 1$
     Sample $N$ vertex O–D pairs
     Compute shortest-path matrix $A^{(k)}$ using weights $x^{(k-1)}$
     Compute $b^{(k)}$ using biased travel-time sampling (Algorithm 1)
     Compute $x^{(k)}$ according to Eqs. 3-5
  end while
Algorithm 2 Travel-time Estimation
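The structure of the iteration (bounded least-squares solve, convex-combination update, convergence check) can be sketched in a few lines. Several simplifications are assumptions made to keep the example self-contained: the route matrix and trip times are held fixed rather than resampled and rerouted each iteration, a small projected-gradient routine stands in for SciPy's lsq_linear, the lower bound is set to the free-flow time and the upper bound to five times that value, and the weight 0.9 is placed on the previous estimate. None of these choices is confirmed by the text.

```python
def bounded_least_squares(A, b, lower, upper, x0, steps=500):
    """Projected gradient descent for min ||Ax - b||^2 with lower <= x <= upper.
    A pure-Python stand-in for a bounded solver such as scipy.optimize.lsq_linear."""
    rows, cols = len(A), len(A[0])
    # 2 * ||A||_F^2 bounds the Lipschitz constant of the gradient, so 1/lip is a safe step.
    lip = 2.0 * sum(a * a for row in A for a in row)
    x = list(x0)
    for _ in range(steps):
        resid = [sum(A[i][j] * x[j] for j in range(cols)) - b[i] for i in range(rows)]
        grad = [2.0 * sum(A[i][j] * resid[i] for i in range(rows)) for j in range(cols)]
        # Gradient step followed by projection onto the box [lower, upper].
        x = [min(upper[j], max(lower[j], x[j] - grad[j] / lip)) for j in range(cols)]
    return x

def estimate_travel_times(A, b, x_ff, upper, alpha=0.9, tol=0.05, max_iter=100):
    """Iterate: solve the bounded problem, blend with the previous estimate,
    stop when the mean absolute change per edge falls below tol."""
    x = list(x_ff)  # initialize with free-flow travel times
    for iteration in range(1, max_iter + 1):
        x_tilde = bounded_least_squares(A, b, x_ff, upper, x)
        # Convex combination; alpha weights the previous estimate (assumption).
        x_next = [alpha * p + (1.0 - alpha) * q for p, q in zip(x, x_tilde)]
        change = sum(abs(p - q) for p, q in zip(x_next, x)) / len(x)
        x = x_next
        if change < tol:
            break
    return x, iteration

# Three edges, three trips; each row marks the edges on one trip's route.
A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
x_true = [30.0, 50.0, 40.0]  # congested edge times used to generate trip times
b = [80.0, 90.0, 70.0]       # trip times, equal to A applied to x_true
x_ff = [20.0, 40.0, 30.0]    # free-flow edge times (also the lower bound here)
upper = [5.0 * t for t in x_ff]

x_hat, n_iter = estimate_travel_times(A, b, x_ff, upper)
```

Because the weight on the previous estimate is large, each iteration moves the estimate only a tenth of the way toward the current least-squares solution, which damps the effect of any single batch of trips.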

5 Experimental results

We implement the workflow in Algorithm 2 on a road network encompassing a three-mile radius of downtown Los Angeles (network LA_DT+3 in Table 1). Fig. 2 shows the convergence of the travel-time error (both training and test sets) for the road network at 3am and 6pm, measured by

$$E = \sqrt{ \frac{1}{n} \sum_{(I,J)} n_{IJ} \left( \log \hat{T}_{IJ} - \log T_{IJ} \right)^2 }, \tag{7}$$

where $n$ is the total number of trips in the training (test) data, $n_{IJ}$ is the number of trips for TAZ O–D pair $(I, J)$ in the training (test) subset, $\hat{T}_{IJ}$ is the current estimated geometric mean travel time (Eq. 2), and $T_{IJ}$ is the geometric mean travel time from the Uber Movement dataset. This RMSLE metric represents the root-mean-squared error in the logarithm of the estimated geometric mean travel time [2]. Fig. 3 shows the estimated travel times (as a percent of free-flow travel time $x^{\mathrm{ff}}$) for both 3am and 6pm, where congested regions (red) are clearly visible in the 6pm plot.
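A short sketch of this error metric follows, assuming it is the trip-count-weighted RMSLE that the symbol definitions above suggest. The function name and the sample values are illustrative.

```python
import math

def weighted_rmsle(estimated, observed, trip_counts):
    """Root-mean-squared log error between estimated and observed geometric mean
    travel times, with each TAZ O-D pair weighted by its number of trips."""
    total_trips = sum(trip_counts)
    weighted_sq = sum(
        n * (math.log(est) - math.log(obs)) ** 2
        for est, obs, n in zip(estimated, observed, trip_counts)
    )
    return math.sqrt(weighted_sq / total_trips)

# Two TAZ O-D pairs whose estimates are both twice the observed value.
error = weighted_rmsle([200.0, 800.0], [100.0, 400.0], [3, 7])
```

Since the metric works on logarithms, an estimate that is off by the same multiplicative factor for every pair yields an error equal to the log of that factor, regardless of how long the trips are.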

Figure 2: Travel-time estimation results for downtown LA for off-peak (3am) and peak traffic (6pm). Top: Convergence of travel-time error. Bottom: Geometric mean travel times calculated using free-flow edge weights and estimated edge weights compared to ground-truth data from Uber.
Figure 3: Estimated travel time as a percent of free-flow travel time in downtown LA for off-peak traffic (3am, left) and peak traffic (6pm, right).
Figure 4: Scaling results for travel time estimation in downtown LA for peak traffic (6pm). Left column: Results for one-, two-, and three-mile radii of downtown. Right column: Results for 1000, 3000, and 5000 simulated trips per iteration. Top row: The average time per iteration scales proportionally with both the number of edges and number of trips. Bottom row: The travel time errors for both training and test sets achieve their lowest values for larger problem sizes.

6 Scalability and graph sparsification

The computational complexity of our proposed model is proportional to the number of trips and the number of edges (Fig. 4). Given the amount of data available, including the 368,419 vertices and 905,622 edges of the full LA road network graph (Table 1) and the TAZ O–D pairs in the 2019 Quarter 1 Weekdays-Only Uber dataset, whose count differs between 3am and 6pm, the constrained least-squares optimizer becomes a computational bottleneck, limiting our ability to solve large systems (i.e., street-level travel-time estimation of a large geographical area). Since our goal is to generate a data-driven solution, we aim to use as much data as possible, which leads us to systematically reduce the number of unknowns instead. One strategy would be to sparsify the graph by dropping edges according to a pre-determined criterion while maintaining graph connectivity. This strategy, however, changes routing, which can result in skewed travel-time estimations. For example, consider two paths between vertices $a$ and $d$: the shortest route through intermediate vertices $b$ and $c$, and an alternative, but longer, route through vertices $e$ and $f$. If a sparsification algorithm removes the road segment between $b$ and $c$ based on some metric (e.g., low edge betweenness centrality), our model would incorrectly estimate the travel time from $a$ to $d$ using the available route through $e$ and $f$.

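In light of this, our approach is to pseudo-sparsify the underlying road network. Rather than remove edges from the graph, we sort road segments into two sets based on their significance with respect to traffic flow (e.g., betweenness centrality). We then estimate the travel time along the edges with the top $p\%$ significance, setting the travel times along the remaining edges to the free-flow travel times. In our implementation, we sort edges based on their betweenness centrality, computed using shortest paths weighted by free-flow travel times. For instance, if $S$ are the indices of the edges with the top $p\%$ betweenness and $S^c$ are the indices of the remaining edges, we solve a modified version of Eqs. 3-4 each iteration:

$$\tilde{x}_S^{(k)} = \arg\min_{x_S} \frac{1}{N} \left\| A_S^{(k)} x_S - \left( b^{(k)} - A_{S^c}^{(k)} x^{\mathrm{ff}}_{S^c} \right) \right\|_2^2 \quad \text{subject to } l_S \le x_S \le u_S,$$

where $A_S^{(k)}$ and $A_{S^c}^{(k)}$ collect the columns of the weighted shortest-paths matrix $A^{(k)}$ indexed by $S$ and $S^c$, $b^{(k)}$ is the vector of sampled travel times, and $x^{\mathrm{ff}}$ holds the free-flow travel times.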


In this way, we can control the number of unknowns and scale our method for larger geographical areas.
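The reduction can be illustrated with a small sketch: pick the edges to estimate by betweenness rank, keep only their columns, and move the contribution of the fixed free-flow edges to the right-hand side. The function names, betweenness scores, and route matrix below are invented for illustration, and the rounding used to turn a fraction into an edge count is an assumption.

```python
def top_fraction(scores, fraction):
    """Indices of the edges with the highest scores (e.g. betweenness centrality).
    Rounding to the nearest count, with at least one edge, is an assumption."""
    k = max(1, round(fraction * len(scores)))
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(order[:k])

def reduce_system(A, b, x_ff, keep):
    """Keep only the columns in `keep`; subtract the free-flow travel time of
    every other edge from the trip times so the reduced system is equivalent."""
    keep_set = set(keep)
    fixed = [j for j in range(len(x_ff)) if j not in keep_set]
    A_keep = [[row[j] for j in keep] for row in A]
    b_reduced = [
        b_i - sum(row[j] * x_ff[j] for j in fixed) for row, b_i in zip(A, b)
    ]
    return A_keep, b_reduced

# Three edges; the third has low betweenness and is pinned to free flow.
betweenness = [0.5, 0.3, 0.05]
A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]  # routes of two simulated trips
b = [80.0, 95.0]                        # sampled trip times (seconds)
x_ff = [20.0, 40.0, 30.0]               # free-flow edge times (seconds)

keep = top_fraction(betweenness, 2 / 3)
A_keep, b_reduced = reduce_system(A, b, x_ff, keep)
```

The reduced system has as many unknowns as there are kept edges, which is what lets the number of estimated edges be tuned independently of the size of the graph used for routing.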

We test our pseudo-sparsification approach using different values of the percentage $p$ of edges estimated (Figs. 5-6). In Fig. 5 (left), we first run our algorithm on the full graph (i.e., $p = 100$), and plot the relative difference between our estimated travel times and the initial free-flow travel times versus the percentile of the edges sorted by betweenness centrality. We observe that many of the lowest-betweenness edges retain their free-flow travel time as the final estimate, suggesting that we can set these edges to their free-flow travel times to reduce the number of unknowns.

Of course, drivers rarely maintain the exact speed limit. To account for this uncertainty in free-flow travel times, we sort the edges into bins based on their betweenness percentiles and plot the fraction of edges within each bin that have a relative difference within a factor of $\delta$ of their free-flow travel times (Fig. 5, right). We observe that the high-betweenness edges are less likely to have an estimated travel time close to their free-flow times irrespective of the level of uncertainty $\delta$. Therefore, we can pseudo-sparsify the graph to different extents depending upon the level of uncertainty that can be tolerated.
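The binning analysis can be sketched as follows, assuming that "within a factor of the tolerance" means the absolute relative difference from free flow does not exceed the tolerance. Equal-count bins, the function name, and the four-edge data set are all illustrative choices rather than details taken from the text.

```python
def binned_fraction_near_free_flow(betweenness, estimated, free_flow, delta, n_bins):
    """Sort edges by betweenness, split them into n_bins equal-count bins
    (lowest betweenness first), and return for each bin the fraction of edges
    whose estimate is within a relative tolerance delta of free flow.
    Requires n_bins <= number of edges so that no bin is empty."""
    order = sorted(range(len(betweenness)), key=lambda j: betweenness[j])
    fractions = []
    for bin_index in range(n_bins):
        start = bin_index * len(order) // n_bins
        stop = (bin_index + 1) * len(order) // n_bins
        members = order[start:stop]
        near = sum(
            1
            for j in members
            if abs(estimated[j] - free_flow[j]) <= delta * free_flow[j]
        )
        fractions.append(near / len(members))
    return fractions

# Four edges: the two with low betweenness stay near free flow.
betweenness = [0.1, 0.4, 0.2, 0.9]
free_flow = [10.0, 10.0, 10.0, 10.0]
estimated = [10.0, 14.0, 10.5, 20.0]

fractions = binned_fraction_near_free_flow(
    betweenness, estimated, free_flow, delta=0.1, n_bins=2
)
```

In this toy case every low-betweenness edge is within 10% of free flow and no high-betweenness edge is, which is the qualitative pattern the paragraph describes.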

In Fig. 6, we vary the number of edges by estimating only the top 50%, 75%, and 100% of edges by betweenness. Here the time per iteration scales proportionally with the number of unknown edges included in the problem, and the travel-time error given by Eq. 7 increases as we assign more edges to their free-flow travel times.

Figure 5: Relative difference between estimated travel time and free-flow travel time by percentile of betweenness for peak downtown LA traffic (left) and fraction of estimated edge travel times that are within $\delta$ of free-flow, where edges are binned by percentile (right). Edge betweenness centrality is calculated using shortest paths weighted by free-flow travel time.
Figure 6: Scaling results for travel time estimation in downtown LA for peak traffic (6pm) solved using pseudo-sparsified graphs, where 50%, 75%, and 100% of the edges were estimated and the remainder set to free-flow travel times. Left: The average time per iteration scales proportionally with the percent of edges estimated. Right: Travel time errors decrease as fewer edges are set to free-flow travel time.

7 Related work

Urban traffic modeling has seen a recent surge of interest in the use of graph analytics and machine learning methods. Reference [23] provides an overview of many of these methods. In this section we summarize the prior work related to both of these themes as applied to road transportation networks.

The authors in [21] use graph analytics on networks with three different weighting schemes to perform a statistical characterization of the Beijing road network. Many prior publications consider betweenness centrality [6] to be an important metric when applied to road networks, as it is argued to be a direct predictor of important links in urban transport. It has been shown that betweenness centrality is highly correlated with the traffic flow count on a road network [5, 10, 17], a natural result of including travel time as a factor when selecting trip routes. In real-world scenarios, however, route choices are also influenced by time of day and other socio-economic factors. Using these observations, the authors in [18] define an augmented betweenness centrality measure where shortest paths are weighted according to a traffic demand model based on census tracts and traffic analysis zones. The authors show that this new centrality measure correlates better with traffic flow than other centrality measures. Similarly, in [4] the authors employ analytics in the form of novel graph centrality measures to derive insights into the traffic flow patterns in Singapore. Graph models, along with heterogeneous data sources, were leveraged to understand the urban traffic patterns in [15]. Finally, the authors in [7] utilize a grid-based fabric and cellular automata for modeling arterial traffic, resulting in gains in computational efficiency.

Many approaches make use of machine learning models and optimization methods to model various aspects of urban traffic flow. In [11], the authors leverage a deep-learning approach in the form of a diffusion convolutional recurrent neural network (DCRNN) to forecast short-term freeway traffic counts in the LA and San Francisco Bay Area networks. The authors in [3] also propose a deep-learning approach that brings together convolutional neural networks and recurrent neural networks with long short-term memory (LSTM) units, utilizing their architecture for short-term traffic count extrapolation at 349 locations on the Beijing road network. In a set of articles [19, 25], the authors leverage data from Bluetooth and GPS probe sensors for travel-time estimation and validation. Coupled hidden Markov models (CHMM) were used in [8] to model the evolution of traffic states, applied to a sparse taxi-fleet dataset for the San Francisco Bay Area road network. In a subsequent publication [9] leveraging the same dataset, the authors employ a dynamic Bayesian network to learn arterial dynamics.

8 Conclusions and future work

In this work we leveraged coarse-grained Uber Movement data in the form of TAZ O–D pair summary statistics to provide estimates of fine-grained, street-level travel times. Our techniques for trip simulation and biased travel-time sampling were used in conjunction with weighted shortest-path routing to set up a system of linear equations with unknown edge travel times. The travel times were iteratively refined using a constrained least-squares optimizer for multiple batches of simulated trips. In our largest road network (40,522 edges and 24,472 TAZ O–D pairs), we achieved an RMSLE of 0.28 on our test set after 13 iterations with 4.7 hours total runtime (32-core Intel Xeon E7-8860 system with 2.27 GHz processor speed and 512 GB primary memory). We demonstrated a graph pseudo-sparsification algorithm in order to improve the computational efficiency of our estimation routines. Our future work involves improving the optimization process, formalizing the sparsification approach, and scaling the approach to metropolitan-sized road networks using high-performance computing systems.


  • [1] E. Beimborn et al., A transportation modeling primer, (2006).
  • [2] D. Bertsimas, A. Delarue, P. Jaillet, and S. Martin, Travel time estimation in the age of big data, Operations Research, 67 (2019), pp. 498–515.
  • [3] X. Cheng, R. Zhang, J. Zhou, and W. Xu, Deeptransport: Learning spatial-temporal dependency for traffic condition forecasting, in 2018 International Joint Conference on Neural Networks (IJCNN), IEEE, 2018, pp. 1–8.
  • [4] Y.-Y. Cheng, R. K.-W. Lee, E.-P. Lim, and F. Zhu, Measuring centralities for transportation networks beyond structures, in Applications of social media and social network analysis, Springer, 2015, pp. 23–39.
  • [5] P. Crucitti, V. Latora, and S. Porta, Centrality in networks of urban streets, Chaos: an interdisciplinary journal of nonlinear science, 16 (2006), p. 015113.
  • [6] L. C. Freeman, A set of measures of centrality based on betweenness, Sociometry, (1977), pp. 35–41.
  • [7] P. Gundaliya, T. V. Mathew, and S. L. Dhingra, Heterogeneous traffic flow modelling for an arterial using grid based approach, Journal of Advanced Transportation, 42 (2008), pp. 467–491.
  • [8] R. Herring, A. Hofleitner, P. Abbeel, and A. Bayen, Estimating arterial traffic conditions using sparse probe data, in 13th International IEEE Conference on Intelligent Transportation Systems, IEEE, 2010, pp. 929–936.
  • [9] A. Hofleitner, R. Herring, P. Abbeel, and A. Bayen, Learning the dynamics of arterial traffic from probe data using a dynamic bayesian network, IEEE Transactions on Intelligent Transportation Systems, 13 (2012), pp. 1679–1693.
  • [10] B. Jiang, Street hierarchies: a minority of streets account for a majority of traffic flow, International Journal of Geographical Information Science, 23 (2009), pp. 1033–1048.
  • [11] Y. Li, R. Yu, C. Shahabi, and Y. Liu, Diffusion convolutional recurrent neural network: Data-driven traffic forecasting, in International Conference on Learning Representations, 2018.
  • [12] E. H.-C. Lu, C.-C. Lin, and V. S. Tseng, Mining the shortest path within a travel time constraint in road network environments, in 2008 11th International IEEE Conference on Intelligent Transportation Systems, IEEE, 2008, pp. 593–598.
  • [13] D. Luxen and C. Vetter, Real-time routing with openstreetmap data, in Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS ’11, New York, NY, USA, 2011, ACM, pp. 513–516.
  • [14] NYC Taxi and Limousine Commission, NYC yellow and green taxi trip records, 2019.
  • [15] K. S. Oberoi, G. Del Mondo, Y. Dupuis, and P. Vasseur, Spatial modeling of urban road traffic using graph theory, in Spatial Analysis and GEOmatics 2017, 2017.
  • [16] OpenStreetMap contributors, Planet dump, 2019.
  • [17] S. Porta, P. Crucitti, and V. Latora, The network analysis of urban streets: a primal approach, Environment and Planning B: planning and design, 33 (2006), pp. 705–725.
  • [18] R. Puzis, Y. Altshuler, Y. Elovici, S. Bekhor, Y. Shiftan, and A. Pentland, Augmented betweenness centrality for environmentally aware traffic monitoring in transportation networks, Journal of Intelligent Transportation Systems, 17 (2013), pp. 91–105.
  • [19] H. A. Rakha, H. Chen, A. Haghani, X. Zhang, M. Hamedi, et al., Use of probe data for arterial roadway travel time estimation and freeway medium-term travel time prediction, tech. report, Mid-Atlantic Universities Transportation Center, 2015.
  • [20] A. V. Sathanur, V. Amatya, A. Khan, R. Rallo, and K. Maass, Graph analytics and optimization methods for insights from the uber movement data, in Proceedings of the 2nd ACM/EIGSCC Symposium on Smart Cities and Communities, SCC ’19, ACM, 2019, pp. 2:1–2:7.
  • [21] Z. Tian, L. Jia, H. Dong, F. Su, and Z. Zhang, Analysis of urban road traffic network based on complex network, Procedia engineering, 137 (2016), pp. 537–546.
  • [22] Uber Technologies, Inc., Data retrieved from Uber Movement, (c) 2019.
  • [23] E. I. Vlahogianni, M. G. Karlaftis, and J. C. Golias, Short-term traffic forecasting: Where we are and where we’re going, Transportation Research Part C: Emerging Technologies, 43 (2014), pp. 3–19.
  • [24] L. Wu, X. Xiao, D. Deng, G. Cong, A. D. Zhu, and S. Zhou, Shortest path and distance queries on road networks: An experimental evaluation, Proceedings of the VLDB Endowment, 5 (2012), pp. 406–417.
  • [25] X. Zhang, M. Hamedi, and A. Haghani, Arterial travel time validation and augmentation with two independent data sources, Transportation Research Record, 2526 (2015), pp. 79–89.