Taxi demand forecasting: A HEDGE based tessellation strategy for improved accuracy

A key problem in location-based modeling and forecasting lies in identifying suitable spatial and temporal resolutions. In particular, judicious spatial partitioning can play a significant role in enhancing the performance of location-based forecasting models. In this work, we investigate two widely used tessellation strategies for partitioning city space, in the context of real-time taxi demand forecasting. Our study compares (i) Geohash tessellation, and (ii) Voronoi tessellation, using two distinct taxi demand datasets, over multiple time scales. For the purpose of comparison, we employ classical time-series tools to model the spatio-temporal demand. Our study finds that the performance of each tessellation strategy is highly dependent on the city geography, spatial distribution of the data, and the time of the day, and that neither strategy is found to perform optimally across the forecast horizon. We propose a hybrid tessellation algorithm that picks the best tessellation strategy at each instant, based on their performance in the recent past. Our hybrid algorithm is a non-stationary variant of the well-known HEDGE algorithm for choosing the best advice from multiple experts. We show that the hybrid tessellation strategy performs consistently better than either of the two strategies across the data sets considered, at multiple time scales, and with different performance metrics. We achieve an average accuracy of above 80 km^2 for both data sets considered at 60 minute aggregation levels.



There are no comments yet.


page 4


Taxi Demand-Supply Forecasting: Impact of Spatial Partitioning on the Performance of Neural Networks

In this paper, we investigate the significance of choosing an appropriat...

Predicting Short-Term Uber Demand Using Spatio-Temporal Modeling: A New York City Case Study

The demand for e-hailing services is growing rapidly, especially in larg...

Balancing Taxi Distribution in City-Scale Dynamic Ridesharing Service: A Hybrid Solution Based on Demand Learning

In this paper, we study the challenging problem of how to balance taxi d...

A Context Integrated Relational Spatio-Temporal Model for Demand and Supply Forecasting

Traditional methods for demand forecasting only focus on modeling the te...

Grids versus Graphs: Partitioning Space for Improved Taxi Demand-Supply Forecasts

Accurate taxi demand-supply forecasting is a challenging application of ...

To aggregate or not to aggregate: Forecasting of finite autocorrelated demand

Temporal aggregation is an intuitively appealing approach to deal with d...

Intermittent Demand Forecasting with Renewal Processes

Intermittency is a common and challenging problem in demand forecasting....
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Mobile application based e-hailing taxi services are gaining huge popularity across the world due to the advancement in GPS based smart phone technologies. These services can be used to supplement the use of public transit and other traditional modes of transportation. A key challenge faced by these fast growing taxi services is the demand - supply mismatch problem. During peak hours, the demand for taxis surpass the available supply, creating unmet demand. During off-peak hours, the scenario reverses and the vacant taxis cruise for longer periods to find passengers. These issues lead to dynamic surge pricing, along with reduced customer satisfaction and low driver utilization. Therefore, it is crucial to devise accurate location-specific demand forecasting algorithms, so as to gain prior knowledge of under-supplied and over-supplied areas. This knowledge can be used to mitigate the demand-supply imbalance by re-routing vacant taxi drivers, to compensate for the unmet demand. One of the key steps towards devising efficient location-based forecasting algorithms is the selection of proper spatial resolution for forecasting. The spatial resolution should not be too high, each cell should contain sufficient demand density for prediction to be effective. At the same time, the resolution should not be too low so that the driver has to cruise a large distance before finding a passenger. Hence, a carefully planned tessellation strategy is a key step in any location-based modeling exercise.

I-a Related works

Several attempts have already been made to study the supply-demand levels and imbalances in taxi services [1, 2]. Identification and modeling of passenger hot spots for rerouting taxi drivers is also a widely researched area. Various methodologies have been proposed to this end, with Auto Regressive Integrated Moving Average model (ARIMA) and its variants [3, 4], Exponential Weighted Average models [5], Nearest Neighbour clustering [6, 7]

, Neural Networks (NN)

[8] being some of the commonly used modeling techniques. Irrespective of the modeling technique used, the preliminary step in modeling a location based entity such as passenger demand is to spatially partition the space. In the transportation literature, two common tessellation strategies are used. Spatial aggregations are either performed using grids, where the space is partitioned into square or rectangular grids of fixed area [3, 8, 9], or using polygons, where the space is partitioned into regular or irregular polygons of variable area [10, 11, 12]. It is common practice in the transportation literature to consider either one of these tessellation styles for spatial partitioning, without motivating the choice of the tessellation technique.

In our previous work [13], we tessellated the city of Bengaluru, India, into fixed-sized partitions known as geohashes. We observed that for those regions with low demand density, fixed sized partitions resulted in data scarcity, which led to low model accuracy. To improve the accuracy, we explored the spatial correlation between the neighbouring geohashes to enhance the performance of the models. The performance limitation due to the chosen tessellation strategy motivated us to conduct a comparison study of various tessellation strategies. In this scenario, a few significant questions arise: (i) How can one decide the tessellation scheme to be used?, (ii) How sensitive is the performance of the models to the tessellation strategy?, (iii) Can we arrive at a tessellation strategy that works for a broad range of datasets? We aim to address these questions through this paper. To the best of our knowledge, an extensive study of the tessellation techniques and their effects on the model performance have not been conducted in the past. In this work, we explore the relationships between tessellation strategies, demand densities and city geographies. This is one of the features that distinguishes our work from the existing literature.

For comparing fixed and variable sized tessellation styles, we choose Geohash tessellation, and Centroidal Voronoi tessellation with K-Means respectively. Intriguingly, each tessellation strategy has been shown to outperform the other

[14, 15]. In [14], the authors claimed that Geohash technique works better at partitioning data than Voronoi technique for their data set, while in [15], authors preferred Voronoi over Geohash for forecasting supply of drivers in the top localities. In this work, we favour a partition based clustering technique such as K-Means [16] over other clustering techniques. The reader is referred to [17] and [18] for a comprehensive survey of various clustering algorithms. In the survey paper [18], K-Means is listed as a potential clustering algorithm for large scale data, due to its low time complexity, and high computing efficiency. K-Means has a linear memory and time complexity, which is ideal for our very large data sets. It has also been shown that K-Means performs reasonably well in comparison with other clustering techniques; for example, DBSCAN [20]

and Hierarchical clustering

[19], among others. In addition, K-Means has been widely used in the literature to generate Centroidal Voronoi polygons [21]. Of the widely used modeling techniques, Exponential Smoothing and ARIMA models are known for their simplicity in implementation and high computational speed [22]

. These techniques have also been observed to perform satisfactorily on comparison with other conventional methods such as Bayesian networks, SVR, and Artifical Neural Networks (ANN)

[23, 24]. Authors in [3] proposed an improved ARIMA based prediction method to forecast the spatial-temporal variation of passengers in the hot spots with a prediction error of 5.8%. Using a time varying Poisson process and ARIMA model, the work in [4] obtained an accuracy of 76% for passenger demand on taxi services. Clustering along with Exponential Weighted Moving Average models were used to predict passenger demand hot spots with a 79.6% hit ratio in [5]. The subway ridership demand was analysed in [25] using a combined ARIMA-GARCH model with a maximum error of 7%. This promising performance of regression and smoothing based models in the context of passenger demand modeling, along with high computational speeds, make them ideal choices for our comparison study.

Thus, we aim to perform an extensive city wide spatio-temporal analysis and compare the two tessellation techniques for two independent datasets – an e-hailing mobile application based demand data set in Bengaluru, India and a street hailing based taxi demand data set in New York, USA. Next, in addition to comparing the two strategies, we also propose a hybrid tessellation strategy by combining the two tessellation strategies based on their past performances, that works for any data set. No similar effort has been made in the transportation literature to combine tessellation strategies, and this feature also sets our work apart from the existing works.

I-B Our contributions

The demand data points are segregated into clusters using K-Means clustering algorithm. The obtained cluster centroids are used to partition the city into geohashes and Voronoi cells. In the past, road intersections [10, 12] and bus stops [26] are employed to act as tessellation centers. In contrast, we use K-Means cluster centroids to act as the tessellation centers. Different time-series modeling techniques are applied to the Geohash aggregated data and Voronoi aggregated data to compare the two techniques. The data is aggregated and analyzed at two different time-scales; at 15 minutes to facilitate response to real-time decisions and, at 60 minutes to observe the high-level patterns. While comparing the two tessellation strategies, we observe that performances of the strategies do not remain consistent. Performances of the strategies are found to vary with different data sets. They also showed time dependent variations within each data set. In order to deal with this apparent non-stationarity, we need a combining algorithm that can pick the best tessellation strategy at every time instant.

The method of using advises from multiple experts111In our case, the two “experts” refer to the two tessellation strategies. was first introduced in [27], and later generalized in [28] to arrive at the well-known HEDGE algorithm. Specifically, by adopting a multiplicative update of weight parameter, the authors of [28] were able to produce algorithms performing almost as well as the best expert in the pool. The authors in [29] extend the HEDGE algorithm to include non-stationary experts, by introducing a discounting factor. We modify the discounted variant of HEDGE to arrive at a hybrid tessellation strategy for enhanced taxi demand forecasts.

The key contributions and findings of this paper can be summarized as follows:

  1. We conduct a comparative study of Voronoi and Geohash tessellation strategies for spatial demand partitioning.

  2. We highlight the dependence of the tessellation strategy on time of the day, and on the properties of the data set.

  3. We find that models based on Voronoi tessellation have a superior performance compared to models based on Geohash tessellation for demand scarce cells. On the other hand, Geohash tessellation performs better than Voronoi tessellation in demand dense cells.

  4. We develop a hybrid tessellation strategy using a HEDGE based combining algorithm. We find that our hybrid strategy always performs at least as good as the best strategy out of the list of experts.

  5. Specifically, our algorithm picks the best tessellation strategy for each instant in the forecasting horizon, across various temporal and spatial resolutions.

At a broader level, this work demonstrates the potential of improving location-based forecasts by efficiently partitioning large-scale temporal and spatial data for taxi hailing services. The rest of the paper is organized as follows: Section II defines the problem, followed by a brief overview of the data, and the tessellation strategies in Section III. The time-series modelling procedure is explained, along with preliminary results in Section IV. The proposed model combining algorithm and the results are provided in Section V, and we conclude our work in Section VI.

Ii Problem setting

Let be the number of taxi bookings ( demand) observed at time in a particular region . In order to model , we can represent the bookings as a time-sequence, where the sequence contains points aggregated over the space at time . The space can be fixed-sized or variable-sized. In our case, we denote them as geohash and Voronoi cell respectively. Geohash partitioning of a city is trivial, as it does not incorporate neighbour information or data volume. On the other hand, to perform Voronoi partitioning, we need to run a clustering algorithm to cluster nearby points together. In order to limit vacant taxi cruising, we aim to route drivers to regions of area no more than 1 . The parameters in the clustering algorithm are set accordingly. We define the of a K-Means cluster centroid as the number of data points closer to that centroid than to any other centroid. We assume that a unique smallest distance K-Means centroid exists for each data point.

For an idle driver, we aim to provide him/her with the location of nearest suitable K-Means centroid. The demand aggregated from a Voronoi cell may vary in magnitufr from the demand aggregated from its corresponding geohash. Hence, for a standard comparison of the two techniques, we normalize the demand by the area of the partition to obtain demand per . The area normalized demand for geohash/Voronoi partition is computed as follows:


where refers to the demand in P, is the sampling period of 15 minutes or 60 minutes. This is used to generate time-sequences for further analysis. The SMAPE and MASE [30] are the error performance metrics that are considered in this work, and are defined in Section IV. See Figure 1 for a schematic representation of the work to be conducted. We are mainly interested in the Levels I and III of the flowchart, with nodes in Level II acting as tools for comparison. The data sets used for this study are mentioned below.

Ii-a Data set description

The Bengaluru taxi demand data is acquired from a leading Indian e-hailing taxi service provider. The data contains GPS traces of taxi passengers booking a taxi by logging into the mobile app. The data is available for a period of two months, of January 2016 to of February 2016. The data set contains latitude-longitude coordinates of the logged in customer, along with his/her identification number, session duration and time stamp. The latitude and longitude coordinates of the city is 12.9716° N, 77.5946° E, with an area of approximately 740 .

The New York yellow taxi cab data set is publicly available at [31]. The data set contains GPS traces of government-run street hailing Yellow taxis. This data set differs from our mobile app based data set, both in terms of data volume and city structure. The geographical structure of Bengaluru city is radial, while that of New York city is linear. We considered the period of January-February 2016, for analysis. We extract the pick up locations and time stamps from the data to form the demand set. The latitude and longitude coordinates of New York city is 40.7128° N, 74.0059° W, with an area of approximately 780 .







Level I


Level II

Level III

Hybrid tessellation strategy
Figure 1: Flow chart of the study.







(a) Voronoi tessellation







(b) Geohash tessellation
Figure 2: Heatmaps obtained after partitioning Bengaluru city into Voronoi cells and 6-level geohashes.







(a) Voronoi tessellation







(b) Geohash tessellation
Figure 3: Heatmaps obtained after partitioning New York city into Voronoi cells and 6-level geohashes.

Iii Tessellation strategies

As motivated in Section II, we perform spatial partitioning of the city using two tessellation strategies for the purpose of modeling and comparison. First, we generate clusters of demand points for each city. Later, the centroids of these clusters aid in the generation of tessellation cells.

Iii-a K-Means algorithm

K-Means [16]

is a widely used unsupervised learning algorithm to classify a given data set through a certain number of clusters (assume

clusters) fixed apriori. This algorithm aims at minimizing the squared error function given as:


where is the Euclidean distance between a data point and its center , is the total number of data points. For ensuring that the average cluster area remains around 1 , the number of centers is set to 740 and 780 for clustering Bengaluru and New York city respectively. For each data point, the algorithm calculates the distance from the data point to each cluster. If the data point is closest to its own cluster, leave it where it is, else, move it into the closest cluster. The algorithm stops when no data point is reassigned. By sorting these clusters in descending order of density, we can find locations that generate relatively higher demand (hot spots) compared to other locations.

Iii-B Voronoi tessellation

Voronoi tessellation is a spatial partitioning method that divides space according to the nearest neighbour-rule. Specifically, each point, called a seed or site, is associated with the region ( Voronoi cell) that is closer to it than to all other points in the space. A Voronoi tessellation is called centroidal when the generating site of each Voronoi cell is also its mean (center of mass). In our work, these sites are obtained from the K-Means algorithm. Based on the closeness of sites, this tessellation strategy produces polygon partitions of varying areas. For eg., the 740 demand centers result in 740 variable-sized quadrilateral partitions for Bengaluru city. We remark that the overall time complexity of Voronoi and K-Means algorithm is O(nlogn). Refer Figures 1(a) and 2(a) for heat maps generated from voronoi tessellations of Bengaluru and New York city. The partitioned cells are color-coded according to their sample size, for ease of representation on the color scale.

Iii-C Geohash tessellation

Geohash is a technique that hashes the latitude and longitude coordinates into a character string. It is an extension of grid based method, with a simple naming convention. Note that with Geohash (first G capitalized), we refer to the technique of encoding a coordinate pair into a single string, while geohash (all small letters) refers to the string itself. Geohash can be visualized as a division of Earth into 32 planes, each of which can be divided again into 32 planes, and so on. We refer to these divisions as Geohash levels by defining level x as the division that results in geohashes of length x. A 6-level geohash spans a grid of area 1.2 km 0.6 km, covering approximately 1 , where a 5-level geohash spans an area of 4.9 km 4.9 km, covering approximately 25 . We choose 6-level grids over 5-level as it is more sensible to route drivers to a smaller area. In terms of time complexity, this algorithm is O(1). Refer Figures 1(b) and 2(b) for heat maps obtained from geohash tessellations of Bengaluru and New York city.

area Area (in ) freqFrequency cluVoronoi geoGeohash 00 1 22 3 44 5 20 4040 60 8080 100

(a) Cell area (Bengaluru)

sSample size freqFrequency cluVoronoi geoGeohash 00 3000030000 6000060000 5050 100100 150

(b) Cell volume (Bengaluru)

area Area (in ) freqFrequency cluVoronoi geoGeohash 0.00 0.5 1.01 1.5 2.02 00 5050 100100 150

(c) Cell area (New York)

sSample size freqFrequency cluVoronoi geoGeohash 00 50 100100 150 200200 00 4000040000 8000080000

(d) Cell volume (New York)
Figure 4: Histogram to show the distribution of area per partition and samples per partition for the two tessellation techniques.

Iii-D Observations

Iii-D1 Bengaluru data set

On referring Figure 2

for the heat maps generating by the two tessellation techniques, we make the following inferences. Geohash is a region-oriented city map partition approach and therefore, results in some cells that are highly dense and in some cells that are highly sparse. On the other hand, Voronoi cells are more uniformly distributed in terms of density. Similar inferences can be made from the histogram plots (refer Figures

3(a) and 3(b)

). The cell sample sizes follow a normal distribution for Voronoi cells in Figure

3(b). The tessellation technique tries to uniformly distribute samples among the partitions. This, in fact, increases the chance of finding a passenger in a cell as there are less “demand scarce” cells. On the other hand, this process of uniformly distributing samples increases the partition size to above 1 for some Voronoi cells, as seen in Figure 3(a). As a result, the driver might have to traverse a larger distance to find a passenger. The size of the smallest Voronoi partition observed is 0.10 and 85% of the partitions are below 2 . The area of a geohash cell remains constant at 0.72 . So when the driver is re-routed to a cell, if sufficient demand density exists in the cell, a vacant taxi driver has to spend less time searching in a geohash when compared to its corresponding Voronoi cell. At times, this advantage comes at the expense of masking the actual demand hot spots. In some locations, typically near the city center, there can be more than one hot spot in one area. Due to the inflexible structure of a geohash, these multiple hot spots are considered as a single hot spot. Note that if the service providers are more inclined towards re-allocating drivers to fixed sized locations, multiple hot spots in a geohash might not be an issue. With Voronoi cells, we have the added flexibility of sub-setting a geohash, into further cells. So from a hot spot identification view point, Voronoi tessellation seems to be a better strategy than Geohash tessellation for the Bengaluru data set.

Iii-D2 New York data set

Figure 3 is a zoomed-in heat map of borough of Manhattan. We note that 92.5% of the total generated demand originated from Manhattan over the period of study. Thus, the spatial data distribution is highly non-linear, which results in very closely packed Voronoi cells in the Manhattan borough. In fact, the smallest partition is of the size 0.08 and 75% of the partitions are below 0.10 , as observed in Figure 3(c). Routing drivers to such a small area is infeasible and economically not viable for any service provider. The Geohash strategy, on the other hand, is not a density-dependent technique and hence, assigns geohashes uniformly. From Figure 2(a), we observe that there are inter-island tessellations, even though all the centroids are in mainland. This does not make sense from a driver re-routing view point. This problem does not arise in Figure 2(b) and the 6-level geohashes are confined to either sides of the river. Here, Voronoi tessellation fails as a spatial tessellation strategy because of the non-uniform spatial data distribution, and the city geography, which is spread over multiple islands. If Voronoi tessellation is to perform better, extensive parameter tuning of the K-Means algorithm has to be conducted. The Voronoi tessellation has to be performed on each borough to avoid inter-island tessellations, which adds to further complexity. For this data set, the sheer simplicity of the Geohash tessellation makes it the winning strategy.

f Frequency rResidual Magnitude -2-2 -1-1 00 11

(a) Sample histogram of residuals

acf ACF lLag 0.00 0.5 1.01 1.5 2.02

(b) Significance levels

mag Magnitude tSample index cluVoronoi geoGeohash 2.01 3.5 5.01440 -2-2 -1-1 00 11


Resemblance to white noise

Figure 5: Plots obtained by performing residual diagnostics for model evaluation.

The inferences and comparisons made above are based on the spatial aggregation of the data. In the next section, we perform comparisons based on the temporal aggregated data. The spatially aggregated data for each Voronoi and Geohash partition is temporally aggregated and modeled using time-series techniques.

Iv Time-series modeling

Before modeling the data, we remove duplicate user IDs, if they appear multiple times in a 30 minute interval per partition. We assume that it is highly unlikely for a passenger to book multiple taxis within that time, from the same cell. A Box-Cox transformation is applied to stabilize the variance of the raw data, thus making the data amenable for linear processing

[32]. Each K-Means demand centroid has two time-series associated with it; one using the demand aggregated at the Voronoi cell level, and the other with the demand aggregated at the geohash level. When these sequences are plotted, we observe that the series shows significant trend and seasonal patterns. On performing a spectral analysis, we find strong daily and weekly seasonality. A computationally efficient approach to real time forecasting is to use linear time series models. Hence, the data is trained using single and double-seasonal linear parametric time-series models. The city comprises of various activity zones such as residential, entertainment, office, school zones, etc. The demand patterns for each of these zones differ and hence, a single time-series model may not be a satisfactory fit for all the time-sequences. We perform the time-series modelling exercise for each tessellation strategy separately, at two time scales: 15 minutes and 60 minutes.

Iv-a Shortlisted models

Holt-Winters, Auto-Regressive and Seasonal Auto-regressive Integrated Moving Average (ARIMA and SARIMA) models, Seasonal and Trend Decomposition using LOESS (STL) combined with non-seasonal exponential smoothing [32, Chapters 6-8] are some of the widely used single-seasonal exponential smoothing models. Double Seasonal Holt-Winters (DSHW)  [32] and Trigonometric BATS (TBATS) [33] are some common double-seasonal models. In order to make sure that these models work better than nive alternatives, we compare the aforementioned models against a simple averaging model, nive and seasonal nive model, and random drift model. It was observed that in a few cells, the simple alternatives indeed performed better than the parametric time-series models. We consider a simple seasonal averaging model as the baseline model for our analysis. If a model performs better than the baseline model, it is kept; else it is discarded. Below, we briefly explain the models with which almost all the activity zones are well-modeled for our data.

  1. Baseline model: In order to forecast for a particular time step, the mean of all the previous season samples corresponding to that time step is computed. If weekly forecast is considered, forecast for Sunday 12 noon is the average of all Sunday 12 noon demand in the past.


    where is the demand at time , is the number of seasons and is the seasonality period.

  2. TBATS: TBATS model is a state space model introduced in  [33] for forecasting time-series with multiple seasonal periods, high frequency seasonal periods, non integer seasonality, and calender effects. TBATS is an acronym for Trigonometric (T), Box-Cox transform (B), ARMA errors (A), Trend (T) and Seasonal components (S). The seasonal components are represented using trigonometric fourier series. The equations for a -step ahead additive trend, multiplicative seasonality TBATS model prediction are as follows:


    where , , and are the smoothing parameters, and , are the seasonal periods.

  3. Seasonal and Trend decomposition using Loess (STL): STL is a time-series decomposition technique where the data is decomposed into seasonal and non-seasonal components. The non seasonal component is modeled using techniques such as ARIMA, exponential smoothing models (ETS), random drift etc. The seasonal component, after a Seasonal Nive forecast for the seasons, is added to the non-seasonal component to form the final model.


    Here, denotes the trend, is the irregular component, is the integer part of variable , is the prediction horizon, and is the seasonal period.

Iv-B Model validation

The suitability of the models for the data can be analyzed as follows:

  • Plotting the histogram of the residuals: The histogram should follow a Gaussian distribution with residual mean as the mean of the distribution.

  • Plotting the Auto Correlation Function (ACF) of the residuals: The residuals should lie within the significant band of the ACF plot, i.e, within band, where is the sample size.

  • Resemblance with white noise: The residuals should appear to be uncorrelated and should pass a statistical test for correlation; for eg., the Box-Ljung test. The Box-Ljung statistic is a function of the accumulated sample auto correlations up to any specified time lag.

We refer the reader to  [32, Chapter 2, Section ] for an elaborate explanation of the afore-mentioned residual diagnostic tools. Results of residual diagnostics performed on the Voronoi time-series generated from an entertainment zone in Bengaluru are plotted in Figure 5. Here, STL decomposition with ARIMA(2,1,2) performed satisfactorily and the residuals from the fit appear to be uncorrelated.

Strategy Bengaluru New York
Table I: Best performing models for the datasets, where a.e = additive errors, and d.t = damped trend.
Data set
60 min. 15 min.
Vor Geo Vor Geo
Bengaluru SMAPE (%) 17.6 21.7 37 40.4
MASE 0.75 0.73 0.52 0.53
New York SMAPE (%) 15.2 15.6 33.6 31.2
MASE 0.60 0.59 0.62 0.47
Table II: Preliminary results for the Bengaluru and New York data sets.

The performance of the model can further be evaluated by employing performance metrics such as SMAPE and MASE. The SMAPE is a symmetrized version of the Mean Absolute Percent Error, which is defined if, at all future time points, the point forecasts and actuals are both not zero. For a forecast horizon considered, SMAPE is defined as follows:


where is the actual demand and is the forecast at time . Even though this metric is widely used in industry, SMAPE is a scale-dependent error, and not well-suited for intermittent demand. Hence, we also evaluate our models against Mean Absolute Scaled Error (MASE). For a time-series of forecast horizon and seasonal period , MASE is defined as:


For a non-seasonal time-series, . The denominator of the equation is the mean absolute error of the one-step seasonal nive forecast method on the training set. This error metric compares the models with the standard random drift model, and ensures that the models perform better than a nive technique. MASE can be used to compare forecasts across data sets with different scales, as the metric is independent of data scale. Smaller the value of the metric, the better is the model. The selected models passed the residual diagnostics and performed satisfactorily with SMAPE and MASE. In Table II, we list the shortlisted models with which Bengaluru and New York are well-modelled at 60 min. aggregation levels. The errors encountered by the individual models are not listed because the modeling technique that works for one scenario might not provide good performance with another scenario.

no Sample size (norm.) smapeSMAPE cluVoronoi geoGeohash 0 0 275 550 1 9 9 18 18 27 27

(a) Bengaluru

nopoints Sample size (norm.) smapeSMAPE cluVoronoi geoGeohash 4 4 11 11 18 18 0 0 1 1

(b) Manhattan-Bronx
Figure 6: The variation of SMAPE with the partition size at 60 minute aggregation level.

Iv-C Preliminary results

After tessellating the city space into geohashes and Voronoi cells, we ran the time-series models mentioned in the previous section on the spatially and temporally aggregated area-normalized demand. In general, We observe strong heterogeneity in city demand across both spatial and temporal dimensions. It was observed that for cells with low density, Voronoi tessellation performs better than Geohash tessellation (Figure 6). The optimal density based partition in Voronoi cells appears to be the reason for this behavior. For high density cells, Geohash based models perform at least as good as Voronoi based models. The comparable performance along with the low computational complexity makes Geohash tessellation the preferred choice for data dense locations. The numerical results are summarized below:

  1. Using the models mentioned in Section IV-A, we achieve an average accuracy of about 80% per for both data sets, at 60 min. aggregation levels.

  2. On referring Table II for the performance evaluation of Geohash and Voronoi tessellation based models on the 740 Bengaluru city and 780 New York city demand centers, we notice the absence of an universal winner.

  3. On average, Voronoi tessellation appears to outperform Geohash tessellation for Bengaluru city. On the other hand, Geohash tessellation emerges as the superior technique for most of the tested use cases for the New York city.

smape SMAPE cdfCDF c1V-60 g1G-60 c2V-15 g2G-15 0.00 0.50.5 1.01 00 2020 4040

(a) Bengaluru

smape MASE cdfCDF c1V-60 g1G-60 c2V-15 g2G-15 0.00 0.50.5 1.01 0l0 11 22

(b) Bengaluru

smape SMAPE cdfCDF c1V-60 g1G-60 c2V-15 g2G-15 0.00 0.50.5 1.01 0l0 2020 4040

(c) New York

smape MASE cdfCDF c1V-60 g1G-60 c2V-15 g2G-15 0.00 0.50.5 1.01 0l0 0.750.75 1.51.5

(d) New York
Figure 7: ECDF plots of Voronoi and Geohash partition techniques using MASE and SMAPE as the metrics. V-XX and G-XX correspond to performance of Voronoi and Geohash based models at XX min. aggregation levels.

tTime step maseSMAPE cluVoronoi geoGeohash 1 1 22 22 48 48 53 53 96 96 84 84

(a) Without HEDGE

t Time step maseSMAPE cluVoronoi geoGeohash hedgeHEDGE 1 1 22 22 48 48 53 53 96 96 84 84

(b) HEDGE without discounting

t Time step maseSMAPE cluVoronoi geoGeohash h2dHEDGE(0.1,0.7) 1 1 22 22 48 48 53 53 96 96 84 84

(c) HEDGE with discounting
Figure 8: The performances of models based on tessellation techniques without HEDGE, after original HEDGE, and after HEDGE with discounting.

Further, the Empirical Cumulative Distribution Function (ECDF) of the average errors obtained from the models based on all the Voronoi and Geohash partitions are plotted in Figure

7. We can see a dominance of Voronoi technique over Geohash technique at 15 and 60 minute aggregation levels in Figures 6(a) and 6(b) for Bengaluru city. The errors from Voronoi based models reach the upper bound faster than errors from Geohash based models. On the other hand, for New York city, Figures 6(c) and 6(d) show a dominance of Geohash technique over Voronoi technique. In addition, we conduct the Kolmogorov-Smirnov test (K-S test) [34] to check if the ECDFs differ significantly. The results obtained provide sufficient evidence that Voronoi technique is statistically better than Geohash technique for Bengaluru city and vice versa for New York city, consistent with the inferences from Figure 7. Hence, we find that Geohash tessellation performs better than Voronoi for the Bengaluru city, while Voronoi tessellation has a superior performance over Geohash for New York city. Further, we proceed to plot the instantaneous errors obtained from the models based on the two tessellation techniques in Figure 7(a) for Bengaluru at 15 minute aggregation. Each point on the graph corresponds to the mean of errors obtained from all geohash/Voronoi cells at that time instant. We can see that early mornings and late nights, Geohash based models work better compared to Voronoi based models. For the rest of the day, i.e., the daylight hours, Voronoi based models tend to produce better forecasts compared to Geohash based models. It is surprising to note that even within a single city, a unique tessellation technique does not have superior performance over the other at all time steps. This non-stationary behavior of the models prompts us to combine the models for better location-based forecasts. To that end, we propose a modified version of discounted HEDGE algorithm [29] in the next section.

V Modified HEDGE for combining models

Consider a learning framework, where model provides recommendations. Each model is referred to as an . Assuming that the predictor has complete knowledge about all the past decisions and performances of experts, the goal is to perform as good as the best expert in the pool. This exercise belongs to the class of ensemble learning where we use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Here, whenever Geohash tessellation performs better than Voronoi tessellation at a time step , the ideal decision is to choose Geohash technique as the tessellation strategy for that . For all demand centers, the forecasts at that instant should then be made from their geohash cells.

On observing the performances of both strategies at 15 and 60 minute aggregation levels, we see that a single tessellation technique is unable to yield optimal results for the entire forecast horizon. The best strategy varies with time. A sensible approach towards solving this issue is to combine the two experts. As an initial step towards that goal, we apply the classic HEDGE algorithm as proposed in [28] to combine the models (Figure 7(b)). We see that the decision maker is able to follow the best expert, model based on Voronoi in this case, till a cross over occurs. The algorithm is unable to adapt to the shifting nature of the experts. In order to adapt to such a non stationary environment, dHEDGE [29] was proposed, where the authors modify the conventional HEDGE to include an exponential discounting factor. By exponentially filtering the past performances of the experts, dHEDGE takes into account the non-stationarity of the process. By normalizing the weights of each expert at every instant, dHEDGE provides a set of coefficients for experts. This can then be used to generate a convex combination of expert opinion. Our problem do not require normalized weights. Since there are only two experts, at every instant we can pick the expert which has maximum weight,

, the expert with the minimum loss. Hence, we modify the dHEDGE algorithm to pick an expert if it has a higher weight than the other expert. The original algorithm has a generic loss function

. For illustrative purposes, they have used a discrete loss function , where is the indicator function, is the observed symbol and is the predicted symbol. For their cellular LTE network application, that loss function outputs a for all the experts who predicted the symbol correctly at that instant, and output a for all other experts. On the other hand, we build our loss functions based on the errors observed for each strategy. Our modified dHEDGE algorithm is given in Algorithm 1.

11.35 Parameters: Learning para. , Discounting factor
2 Initialization: Set for s.t
3 for  do
4      Choose expert with index =
5       Error = MEAN(forecast errors from models based on Voronoi at )
6       Error = MEAN(forecast errors from models based on Geohash at )
7       Loss L at , ,
8       Update weights as
9 end for
Algorithm 1 Modified dHEDGE for choosing the best expert

t Time step maseSMAPE cmsCum. Mean Error cluVoronoi geoGeohash h1 dHEDGE (0.1,0.1) h2 dHEDGE (0.1,0.4) h3 dHEDGE (0.1,0.1) h4 dHEDGE (0.1,0.4) 11 11 19 19 51 51 88 88 13 13 27 27 25 25 58 58 42 42 60 60 1 1 48 48 12 12 96 96 24 24

(a) Bengaluru data with SMAPE as the metric

t Time step maseMASE cmsCum. Mean Error cluVoronoi geoGeohash h1 dHEDGE (0.5,0.7) h2 dHEDGE (0.1,0.3) h3 dHEDGE (0.5,0.7) h4 dHEDGE (0.1,0.3) 1 1 48 48 12 12 96 96 24 24 0.3 0.3 1.4 1.4 1.1 1.1 0.4 0.4 0.6 0.6 1.0 1.0 0.5 0.5 0.7 0.7 0.9 0.9 0.30 0.3 0.45 0.45 0.60 0.6

(b) Bengaluru data with MASE as the metric

t Time step maseSMAPE cmsCum. Mean Error cluVoronoi geoGeohash h1 dHEDGE (0.1,0.1) h2 dHEDGE (0.1,0.1) h3 dHEDGE (0.1,0.1) h4 dHEDGE (0.1,0.1) 1 1 48 48 12 12 96 96 24 24 10 10 35 35 20 20 16 16 45 45 71 71 32 32 54 54 30 30 42 42 55 55

(c) New York data with SMAPE as the metric

t Time step maseMASE cmsCum. Mean Error cluVoronoi geoGeohash h1 dHEDGE (0.1,0.1) h2 dHEDGE (0.1,0.5) h3 dHEDGE (0.1,0.1) h4 dHEDGE (0.1,0.5) 1 1 48 48 12 12 96 96 24 24 0.3 0.3 0.8 0.8 1.3 1.3 0.2 0.2 0.4 0.4 1.6 1.6 0.5 0.5 1.4 1.4 1.2 1.2 0.9 0.9

(d) New York data with MASE as the metric
(e) Manhattan-Bronx and Brooklyn-Queens data sets with SMAPE and MASE as metrics
Figure 9: The prediction performance of dHEDGE (,) algorithm based model against single strategy based models at 15 and 60 minute aggregation levels for the two cities with SMAPE and MASE as metrics are plotted in the first two sub figures of (a)-(d). Cumulative mean error performance of models based on Geohash, Voronoi and hybrid strategies are provided in the last two subfigures of (a)-(d). The first two and the last two subfigures of (e) corresponds to cumulative mean error performances on Manhattan-Bronx and Brooklyn-Queens boroughs respectively at 15 minute aggregation levels.

The weights can be initialized either uniformly or based on some prior knowledge about the experts. The factor gives more leverage to observations made in the recent past compared to the observations in the distant past. Thus, the expert which have been performing well in the recent past is boosted. By tuning the parameter , we can manipulate the dHEDGE algorithm to respond to sudden changes in the average error performance. By setting = 1, this algorithm becomes HEDGE with no forgetting factor. When modified dHEDGE is used, we observe that the algorithm tends to choose the best expert quite rapidly compared to the original HEDGE (Figure 7(c)).

Data set Tessellation size Sampling period
5 15 30 60
Vor Geo dHEDGE Vor Geo dHEDGE Vor Geo dHEDGE Vor Geo dHEDGE
6-level geohash
& K= 740
63.2 60.8 60.7 37.1 40.4 36.5 25.6 29.6 25.6 17.6 21.7 17.6
(0.1,0.4) (0.1,0.4) (0.1,0.1) (0.1,0.1)
sub 5-level geohash
& K= 114
23 20.7 20.5 12.8 19.3 12.6 9.7 16.7 9.6 6.9 12.8 7.0
(0.1,0.1) (0.1,0.6) (0.1,0.1) (0.1,0.3)
5-level geohash
& K = 30
10.9 10.5 9.7 7.4 9.4 7.3 6.4 7.4 6.4 5.4 5.8 5.4
(0.1,0.9) (0.1,0.7) 0.9,0.7 0.9,0.9
New York
6-level geohash
& K= 780
61.4 40.0 40.0 33.6 31.2 31 24.8 24.4 23.8 15.2 15.6 15.2
(0.1,0.1) (0.1,0.1) (0.1,0.6) (0.1,0.1)
sub 5-level geohash
& K= 120
23.7 15.9 15.5 13.9 12.4 12.1 11 11.6 10.6 9.4 11.9 9.3
(0.8,0.8) (0.7,0.2) (0.1,0.3) (0.1,0.4)
5-level geohash
& K= 31
10.9 10.5 10.4 9.7 8.7 8.3 11.2 7.7 7.9 10.6 12.8 10.8
(0.1,0.9) (0.1,0.7) (0.1,0.1) (0.1,0.8)
6-level geohash
& K= 169
29.9 20.6 20.6 17 16.5 16.5 13.2 11.1 11.1 9.15 10.5 9.15
(0.1,0.2) (0.1,0.3) (0.1,0.1) (0.1,0.1)
sub 5-level geohash
& K= 26
11.6 14.7 11.4 8.3 12.9 8.4 7.4 10.2 7.5 6.8 9.3 6.9
(0.1,0.8) (0.1,0.3) (0.1,0.2) (0.1,0.1)
5-level geohash
& K = 7
7.9 7.2 6.9 6.2 6.7 6.2 5.5 5.7 5.1 5.9 5.4 5.5
(0.1,0.7) (0.7,0.9) (0.7,0.2) (0.6,0.9)
6-level geohash
& K = 460
90.2 57.2 57.2 97.6 74.9 74.2 73.9 41.8 41.8 62.6 52.9 52.9
(0.1,0.1) (0.1,0.1) (0.1,0.1) (0.1,0.1)
sub 5-level geohash
& K= 71
73.0 35.1 35.1 49.3 27.2 27.2 37 22.9 22.9 26.2 15.9 15.8
(0.1,0.1) (0.1,0.1) (0.1,0.4) (0.1,0.3)
5-level geohash
& K = 18
42.4 19.5 19.5 23.1 28.8 23.1 18.2 22.8 18.2 12.9 18 12.9
(0.1,0.4) (0.1,0.8) (0.1,0.8) (0.9,0.9)
Table III: The table contains the performance comparison of Voronoi, Geohash and dHEDGE using SMAPE, for the entire set of simulations done. We observe that dHEDGE performs superiorly for all the tested combinations on the 4 data sets. The unique set of parameters (,) obtained for each test case are mentioned below the errors encountered using dHEDGE algorithm.

The modified dHEDGE algorithm was run on both datasets at both sampling periods. The parameters and for each scenario are chosen based on an independent validation set spanning over 24 hours. The performance of the algorithm can be seen in the Figure 9. We can see that the algorithm picks up the best shifting expert, by giving more weightage to the behavior of that expert in the recent past. Note that in Figure 8(a), the plots of Voronoi and dHEDGE overlap as Voronoi is consistently the best performer throughout the forecasting horizon. The last two sub figures in each of the Figures 8(a)-8(d) show the cumulative mean error behaviors of the strategies. The cumulative mean error plots reveal that the modified dHEDGE performs at least as good as the best expert. Whenever the experts in consideration have similar cumulative errors, the hybrid strategy has an error lower than both experts; else, the hybrid strategy consistently has a cumulative error equal to the best performing expert in the recent past. We also analyze the performance of the strategies on various boroughs of New York city, to avoid the issue of inter-island Voronoi tessellations. To further demonstrate the flexibility of our algorithm in adapting to various scenarios, we have repeated the simulations for temporal aggregation levels of 5 and 30 minutes. Multiple spatial resolutions are also explored to validate the applicability of our algorithm. Irrespective of all these nuances, when dHEDGE was applied, the hybrid strategy still turned out to be the winner for all the datasets considered, with different metrics at multiple temporal and spatial scales. Table III summarizes the performances of the strategies on the datasets, with SMAPE as the metric. Similar results was observed using MASE, and hence,are not included. We observe that the modified dHEDGE improves the prediction accuracy by picking the best strategy for that particular instant, for that city, at that temporal and spatial resolution. Please note that our policy does not switch continuously between strategies every minutes. This is because the dHEDGE chooses the winner strategy for an instant based on neither the current nor the immediate past performance, but on the recent past performance of both experts. The dHEDGE switches only if one strategy performs better than the other for a period of time in the past. On tracking the performance of our strategy for all the test cases mentioned in Table III, we observe an average of only 1.5 switches per day at 60 min. aggregation levels. In other words, for 24 time steps, the hybrid strategy made less than 2 switches on average for all tested scenarios. For 30 min. (i.e., 48 steps) and 15 min. (i.e., 96 steps) aggregation levels, the average switches made were only 3.2 and 8.4 per day. With finer temporal resolutions, the number of switches increases, which is inevitable, considering the increase in the number of time steps in the forecast horizon. We feel that the observed number of switches is admissible, in return for a significant improvement in accuracy.

Therefore, based on our observations from two independent datasets, we find that there is no universal winning tessellation technique that works for all datasets. However, a hybrid technique can be developed for enhanced prediction performances on a broad range of datasets.

Vi Conclusions

Efficient spatial partitioning is a key step towards better location-based demand modeling and forecasting. To that end, we performed a comparison of two widely used spatial partitioning techniques, Voronoi tessellation and Geohashing. The study was conducted using two distinct datasets; mobile application log-in based taxi demand data set from Bengaluru, India and government-run street hailing Yellow taxi services data set from New York, USA. A K-Means clustering algorithm was employed to generate demand clusters, that act as generating sites for the tessellations. In order to compare Voronoi based models and Geohash based models at multiple time scales, time-series techniques were applied to the datasets at temporal aggregation levels of 15 and 60 minutes. STL decomposition with ARIMA, ETS, and TBATS were shortlisted as the suitable models. We noticed that neither Geohash nor Voronoi tessellation technique individually proved to be optimal for the entire forecast horizon. Additionally, models based on Voronoi had a superior performance over models based on Geohash when the demand density is low, and vice versa. While Voronoi tessellation appears to be the recommended strategy for tessellating Bengaluru, Geohash tessellation was the winning strategy for the New York city. Thus, we conclude that the tessellation strategy is heavily dependent on the demand density in each partition, and the geography of the city.

The lack of a clear winning strategy prompted us to devise a hybrid tessellation strategy by combining models based on the well known HEDGE algorithm. We modified dHEDGE, which is a discounted version of the HEDGE algorithm, to suit our application. Our algorithm is shown to always pick up the best possible strategy at each time step in the forecast horizon. The tessellation performance is no longer affected by the time of the day, or the features of the underlying data set. Our hybrid tessellation strategy was clearly the winning strategy for all the datasets considered, performing consistently better at multiple time scales with different performance metrics.

This work is directed towards a comparison of tessellation strategies, where we have focused on combining the strategies. More tessellation experts can be added to check the robustness of our dHEDGE algorithm. Recent developments in Recurrent Neural Networks have shown promises in the prediction domain


, which can be utilized to improve predictions. As part of future works, non-parametric techniques such as Support Vector Regression, Artificial Neural Networks, and Bayesian analysis may be applied to model the tessellation data, to check their effectiveness. Extensive parameter tuning of K-Means should be done, if possible, to improve the clustering. K-Means algorithm can also be replaced by density based clustering algorithms; for example, DBSCAN and OPTICS, among others.


The work is undertaken as part of the Cognizant-Ernet IIT Madras project. The authors are also thankful to Vishnu Raj, Gopal Krishna Kamath, and Sudharsan Parthasarathy for many helpful discussions.


  • [1]

    D. Shao, W. Wu, S. Xiang, and Y. Lu, “Estimating taxi demand-supply level using taxi trajectory data stream,” in

    Proc. of IEEE International Conference on Data Mining Workshop, pp. 407–413, 2015.
  • [2] C. Kamga, M. A. Yazici, and A. Singhal, “Hailing in the rain: temporal and weather-related variations in taxi ridership and taxi demand-supply equilibrium,” in Transportation Research Board 92nd Annual Meeting, no. 13-3131, 2013.
  • [3] X. Li, G. Pan, Z. Wu, G. Qi, S. Li, D. Zhang, W. Zhang, and Z. Wang, “Prediction of urban human mobility using large-scale taxi traces and its applications,” Frontiers of Computer Science, vol. 6, no. 1, pp. 111–121, 2012.
  • [4] L. Moreira-Matias, J. Gama, M. Ferreira, J. Mendes-Moreira, and L. Damas, “Predicting taxi–passenger demand using streaming data,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1393–1402, 2013.
  • [5] K. Zhang, Z. Feng, S. Chen, K. Huang, and G. Wang, “A framework for passengers demand prediction and recommendation,” in Proc. of IEEE International Conference on Services Computing, pp. 340–347, 2016.
  • [6] J. Lee, I. Shin, and G.-L. Park, “Analysis of the passenger pick-up pattern for taxi location recommendation,” in Proc. of International Conference on Networked Computing and Advanced Information Management, pp. 199–204, 2008.
  • [7] Y. Dong, S. Qian, K. Zhang, and Y. Zhai, “A novel passenger hotspots searching algorithm for taxis in urban area,” in

    Proc. of IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing

    , pp. 175–180, 2017.
  • [8] J. Xu, R. Rahmatizadeh, L. Bölöni, and D. Turgut, “Real-time prediction of taxi demand using recurrent neural networks,” IEEE Transactions on Intelligent Transportation Systems (Early Access), 2017.
  • [9] S. Phithakkitnukoon, M. Veloso, C. Bento, A. Biderman, and C. Ratti, “Taxi-aware map: Identifying and predicting vacant taxis in the city.” in Proc. of International Joint Conference on Ambient Intelligence, pp. 86–95, 2010.
  • [10] Y. Ding, Y. Li, K. Deng, H. Tan, M. Yuan, and L. M. Ni, “Dissecting regional weather-traffic sensitivity throughout a city,” in Proc. of IEEE International Conference on Data Mining, pp. 739–744, 2015.
  • [11] J. Shen, X. Liu, and M. Chen, “Discovering spatial and temporal patterns from taxi-based floating car data: a case study from nanjing,” GIScience & Remote Sensing, vol. 54, no. 5, pp. 1–22, 2017.
  • [12] Y. Zhang, B. Li, and K. Ramayya, “Learning individual behavior using sensor data: The case of gps traces and taxi drivers,” 2016. [Online]. Available:
  • [13] N. Davis, G. Raina, and K. Jagannathan, “A multi-level clustering approach for forecasting taxi travel demand,” in Proc. of IEEE International Conference on Intelligent Transportation Systems, pp. 223–228, 2016.
  • [14] R. Outersterp, “Partitioning of spatial data in publish-subscribe messaging systems,” Master’s thesis, University of Twente, 2016.
  • [15] R. Gelda, K. Jagannathan, and G. Raina, “Forecasting supply in voronoi regions for app-based taxi hailing services.” in International Conference on Advanced Logistics and Transport, 2017.
  • [16] J. MacQueen et al., “Some methods for classification and analysis of multivariate observations,” in

    Proc. of the Berkeley Symposium on Mathematical Statistics and Probability

    , pp. 281–297, 1967.
  • [17] A. Fahad, N. Alshatri, Z. Tari, A. Alamri, I. Khalil, A. Y. Zomaya, S. Foufou, and A. Bouras, “A survey of clustering algorithms for big data: Taxonomy and empirical analysis,” IEEE Transactions on Emerging Topics in Computing, vol. 2, no. 3, pp. 267–279, 2014.
  • [18] D. Xu and Y. Tian, “A comprehensive survey of clustering algorithms,”

    Annals of Data Science

    , vol. 2, no. 2, pp. 165–193, 2015.
  • [19] M. Kaushik and M. B. Mathur, “Comparative study of k-means and hierarchical clustering techniques,” International Journal of Software & Hardware Research in Engineering, vol. 2, no. 6, 2014.
  • [20] S. Chakraborty, N. Nagwani, and L. Dey, “Performance comparison of incremental k-means and incremental dbscan algorithms,” arXiv preprint arXiv:1406.4751, 2014.
  • [21] J. Burns, “Centroidal voronoi tessellations,” 2009. [Online]. Available:
  • [22] L. Moreira-Matias, J. Mendes-Moreira, J. F. de Sousa, and J. Gama, “Improving mass transit operations by using avl-based systems: A survey,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 4, pp. 1636–1653, 2015.
  • [23] C. Chen, Y. Wang, L. Li, J. Hu, and Z. Zhang, “The retrieval of intra-day trend and its influence on traffic prediction,” Transportation Research part C: Emerging Technologies, vol. 22, pp. 103–118, 2012.
  • [24]

    M. Lippi, M. Bertini, and P. Frasconi, “Short-term traffic flow forecasting: An experimental comparison of time-series analysis and supervised learning,”

    IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 2, pp. 871–882, 2013.
  • [25] C. Ding, J. Duan, Y. Zhang, X. Wu, and G. Yu, “Using an arima-garch modeling approach to improve subway short-term ridership forecasting accounting for dynamic volatility,” IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 4, pp. 1054–1064, 2017.
  • [26] Y. Liu, C. Liu, N. J. Yuan, L. Duan, Y. Fu, H. Xiong, S. Xu, and J. Wu, “Intelligent bus routing with heterogeneous human mobility patterns,” Knowledge and Information Systems, vol. 50, no. 2, pp. 383–415, 2017.
  • [27] V. G. Vovk, “Aggregating strategies,” in

    Proc. of Computational Learning Theory

    , pp. 371–383, 1990.
  • [28] Y. Freund and R. E. Schapire, “A desicion-theoretic generalization of on-line learning and an application to boosting,” in Proc. of European Conference on Computational Learning Theory, pp. 23–37, 1995.
  • [29] V. Raj and S. Kalyani, “An aggregating strategy for shifting experts in discrete sequence prediction,” arXiv preprint arXiv:1708.01744, 2017.
  • [30] R. J. Hyndman et al., “Another look at forecast-accuracy metrics for intermittent demand,” Foresight: The International Journal of Applied Forecasting, vol. 4, no. 4, pp. 43–46, 2006.
  • [31] N. Y. C. Taxi & Limousine Commission, “Tlc trip record data,” 2017. [Online]. Available: about/trip_record_data.shtml.
  • [32] R. J. Hyndman and G. Athanasopoulos, Forecasting: principles and practice.   OTexts, 2014.
  • [33] A. M. De Livera, R. J. Hyndman, and R. D. Snyder, “Forecasting time series with complex seasonal patterns using exponential smoothing,” Journal of the American Statistical Association, vol. 106, no. 496, pp. 1513–1527, 2011.
  • [34] F. J. Massey Jr, “The kolmogorov-smirnov test for goodness of fit,” Journal of the American Statistical Association, vol. 46, no. 253, pp. 68–78, 1951.