I Introduction
Mobile application based ehailing taxi services are gaining huge popularity across the world due to the advancement in GPS based smart phone technologies. These services can be used to supplement the use of public transit and other traditional modes of transportation. A key challenge faced by these fast growing taxi services is the demand  supply mismatch problem. During peak hours, the demand for taxis surpass the available supply, creating unmet demand. During offpeak hours, the scenario reverses and the vacant taxis cruise for longer periods to find passengers. These issues lead to dynamic surge pricing, along with reduced customer satisfaction and low driver utilization. Therefore, it is crucial to devise accurate locationspecific demand forecasting algorithms, so as to gain prior knowledge of undersupplied and oversupplied areas. This knowledge can be used to mitigate the demandsupply imbalance by rerouting vacant taxi drivers, to compensate for the unmet demand. One of the key steps towards devising efficient locationbased forecasting algorithms is the selection of proper spatial resolution for forecasting. The spatial resolution should not be too high, each cell should contain sufficient demand density for prediction to be effective. At the same time, the resolution should not be too low so that the driver has to cruise a large distance before finding a passenger. Hence, a carefully planned tessellation strategy is a key step in any locationbased modeling exercise.
Ia Related works
Several attempts have already been made to study the supplydemand levels and imbalances in taxi services [1, 2]. Identification and modeling of passenger hot spots for rerouting taxi drivers is also a widely researched area. Various methodologies have been proposed to this end, with Auto Regressive Integrated Moving Average model (ARIMA) and its variants [3, 4], Exponential Weighted Average models [5], Nearest Neighbour clustering [6, 7]
, Neural Networks (NN)
[8] being some of the commonly used modeling techniques. Irrespective of the modeling technique used, the preliminary step in modeling a location based entity such as passenger demand is to spatially partition the space. In the transportation literature, two common tessellation strategies are used. Spatial aggregations are either performed using grids, where the space is partitioned into square or rectangular grids of fixed area [3, 8, 9], or using polygons, where the space is partitioned into regular or irregular polygons of variable area [10, 11, 12]. It is common practice in the transportation literature to consider either one of these tessellation styles for spatial partitioning, without motivating the choice of the tessellation technique.In our previous work [13], we tessellated the city of Bengaluru, India, into fixedsized partitions known as geohashes. We observed that for those regions with low demand density, fixed sized partitions resulted in data scarcity, which led to low model accuracy. To improve the accuracy, we explored the spatial correlation between the neighbouring geohashes to enhance the performance of the models. The performance limitation due to the chosen tessellation strategy motivated us to conduct a comparison study of various tessellation strategies. In this scenario, a few significant questions arise: (i) How can one decide the tessellation scheme to be used?, (ii) How sensitive is the performance of the models to the tessellation strategy?, (iii) Can we arrive at a tessellation strategy that works for a broad range of datasets? We aim to address these questions through this paper. To the best of our knowledge, an extensive study of the tessellation techniques and their effects on the model performance have not been conducted in the past. In this work, we explore the relationships between tessellation strategies, demand densities and city geographies. This is one of the features that distinguishes our work from the existing literature.
For comparing fixed and variable sized tessellation styles, we choose Geohash tessellation, and Centroidal Voronoi tessellation with KMeans respectively. Intriguingly, each tessellation strategy has been shown to outperform the other
[14, 15]. In [14], the authors claimed that Geohash technique works better at partitioning data than Voronoi technique for their data set, while in [15], authors preferred Voronoi over Geohash for forecasting supply of drivers in the top localities. In this work, we favour a partition based clustering technique such as KMeans [16] over other clustering techniques. The reader is referred to [17] and [18] for a comprehensive survey of various clustering algorithms. In the survey paper [18], KMeans is listed as a potential clustering algorithm for large scale data, due to its low time complexity, and high computing efficiency. KMeans has a linear memory and time complexity, which is ideal for our very large data sets. It has also been shown that KMeans performs reasonably well in comparison with other clustering techniques; for example, DBSCAN [20][19], among others. In addition, KMeans has been widely used in the literature to generate Centroidal Voronoi polygons [21]. Of the widely used modeling techniques, Exponential Smoothing and ARIMA models are known for their simplicity in implementation and high computational speed [22]. These techniques have also been observed to perform satisfactorily on comparison with other conventional methods such as Bayesian networks, SVR, and Artifical Neural Networks (ANN)
[23, 24]. Authors in [3] proposed an improved ARIMA based prediction method to forecast the spatialtemporal variation of passengers in the hot spots with a prediction error of 5.8%. Using a time varying Poisson process and ARIMA model, the work in [4] obtained an accuracy of 76% for passenger demand on taxi services. Clustering along with Exponential Weighted Moving Average models were used to predict passenger demand hot spots with a 79.6% hit ratio in [5]. The subway ridership demand was analysed in [25] using a combined ARIMAGARCH model with a maximum error of 7%. This promising performance of regression and smoothing based models in the context of passenger demand modeling, along with high computational speeds, make them ideal choices for our comparison study.Thus, we aim to perform an extensive city wide spatiotemporal analysis and compare the two tessellation techniques for two independent datasets – an ehailing mobile application based demand data set in Bengaluru, India and a street hailing based taxi demand data set in New York, USA. Next, in addition to comparing the two strategies, we also propose a hybrid tessellation strategy by combining the two tessellation strategies based on their past performances, that works for any data set. No similar effort has been made in the transportation literature to combine tessellation strategies, and this feature also sets our work apart from the existing works.
IB Our contributions
The demand data points are segregated into clusters using KMeans clustering algorithm. The obtained cluster centroids are used to partition the city into geohashes and Voronoi cells. In the past, road intersections [10, 12] and bus stops [26] are employed to act as tessellation centers. In contrast, we use KMeans cluster centroids to act as the tessellation centers. Different timeseries modeling techniques are applied to the Geohash aggregated data and Voronoi aggregated data to compare the two techniques. The data is aggregated and analyzed at two different timescales; at 15 minutes to facilitate response to realtime decisions and, at 60 minutes to observe the highlevel patterns. While comparing the two tessellation strategies, we observe that performances of the strategies do not remain consistent. Performances of the strategies are found to vary with different data sets. They also showed time dependent variations within each data set. In order to deal with this apparent nonstationarity, we need a combining algorithm that can pick the best tessellation strategy at every time instant.
The method of using advises from multiple experts^{1}^{1}1In our case, the two “experts” refer to the two tessellation strategies. was first introduced in [27], and later generalized in [28] to arrive at the wellknown HEDGE algorithm. Specifically, by adopting a multiplicative update of weight parameter, the authors of [28] were able to produce algorithms performing almost as well as the best expert in the pool. The authors in [29] extend the HEDGE algorithm to include nonstationary experts, by introducing a discounting factor. We modify the discounted variant of HEDGE to arrive at a hybrid tessellation strategy for enhanced taxi demand forecasts.
The key contributions and findings of this paper can be summarized as follows:

We conduct a comparative study of Voronoi and Geohash tessellation strategies for spatial demand partitioning.

We highlight the dependence of the tessellation strategy on time of the day, and on the properties of the data set.

We find that models based on Voronoi tessellation have a superior performance compared to models based on Geohash tessellation for demand scarce cells. On the other hand, Geohash tessellation performs better than Voronoi tessellation in demand dense cells.

We develop a hybrid tessellation strategy using a HEDGE based combining algorithm. We find that our hybrid strategy always performs at least as good as the best strategy out of the list of experts.

Specifically, our algorithm picks the best tessellation strategy for each instant in the forecasting horizon, across various temporal and spatial resolutions.
At a broader level, this work demonstrates the potential of improving locationbased forecasts by efficiently partitioning largescale temporal and spatial data for taxi hailing services. The rest of the paper is organized as follows: Section II defines the problem, followed by a brief overview of the data, and the tessellation strategies in Section III. The timeseries modelling procedure is explained, along with preliminary results in Section IV. The proposed model combining algorithm and the results are provided in Section V, and we conclude our work in Section VI.
Ii Problem setting
Let be the number of taxi bookings ( demand) observed at time in a particular region . In order to model , we can represent the bookings as a timesequence, where the sequence contains points aggregated over the space at time . The space can be fixedsized or variablesized. In our case, we denote them as geohash and Voronoi cell respectively. Geohash partitioning of a city is trivial, as it does not incorporate neighbour information or data volume. On the other hand, to perform Voronoi partitioning, we need to run a clustering algorithm to cluster nearby points together. In order to limit vacant taxi cruising, we aim to route drivers to regions of area no more than 1 . The parameters in the clustering algorithm are set accordingly. We define the of a KMeans cluster centroid as the number of data points closer to that centroid than to any other centroid. We assume that a unique smallest distance KMeans centroid exists for each data point.
For an idle driver, we aim to provide him/her with the location of nearest suitable KMeans centroid. The demand aggregated from a Voronoi cell may vary in magnitufr from the demand aggregated from its corresponding geohash. Hence, for a standard comparison of the two techniques, we normalize the demand by the area of the partition to obtain demand per . The area normalized demand for geohash/Voronoi partition is computed as follows:
(1) 
where refers to the demand in P, is the sampling period of 15 minutes or 60 minutes. This is used to generate timesequences for further analysis. The SMAPE and MASE [30] are the error performance metrics that are considered in this work, and are defined in Section IV. See Figure 1 for a schematic representation of the work to be conducted. We are mainly interested in the Levels I and III of the flowchart, with nodes in Level II acting as tools for comparison. The data sets used for this study are mentioned below.
Iia Data set description
The Bengaluru taxi demand data is acquired from a leading Indian ehailing taxi service provider. The data contains GPS traces of taxi passengers booking a taxi by logging into the mobile app. The data is available for a period of two months, of January 2016 to of February 2016. The data set contains latitudelongitude coordinates of the logged in customer, along with his/her identification number, session duration and time stamp. The latitude and longitude coordinates of the city is 12.9716° N, 77.5946° E, with an area of approximately 740 .
The New York yellow taxi cab data set is publicly available at [31]. The data set contains GPS traces of governmentrun street hailing Yellow taxis. This data set differs from our mobile app based data set, both in terms of data volume and city structure. The geographical structure of Bengaluru city is radial, while that of New York city is linear. We considered the period of JanuaryFebruary 2016, for analysis. We extract the pick up locations and time stamps from the data to form the demand set. The latitude and longitude coordinates of New York city is 40.7128° N, 74.0059° W, with an area of approximately 780 .
Iii Tessellation strategies
As motivated in Section II, we perform spatial partitioning of the city using two tessellation strategies for the purpose of modeling and comparison. First, we generate clusters of demand points for each city. Later, the centroids of these clusters aid in the generation of tessellation cells.
Iiia KMeans algorithm
KMeans [16]
is a widely used unsupervised learning algorithm to classify a given data set through a certain number of clusters (assume
clusters) fixed apriori. This algorithm aims at minimizing the squared error function given as:(2) 
where is the Euclidean distance between a data point and its center , is the total number of data points. For ensuring that the average cluster area remains around 1 , the number of centers is set to 740 and 780 for clustering Bengaluru and New York city respectively. For each data point, the algorithm calculates the distance from the data point to each cluster. If the data point is closest to its own cluster, leave it where it is, else, move it into the closest cluster. The algorithm stops when no data point is reassigned. By sorting these clusters in descending order of density, we can find locations that generate relatively higher demand (hot spots) compared to other locations.
IiiB Voronoi tessellation
Voronoi tessellation is a spatial partitioning method that divides space according to the nearest neighbourrule. Specifically, each point, called a seed or site, is associated with the region ( Voronoi cell) that is closer to it than to all other points in the space. A Voronoi tessellation is called centroidal when the generating site of each Voronoi cell is also its mean (center of mass). In our work, these sites are obtained from the KMeans algorithm. Based on the closeness of sites, this tessellation strategy produces polygon partitions of varying areas. For eg., the 740 demand centers result in 740 variablesized quadrilateral partitions for Bengaluru city. We remark that the overall time complexity of Voronoi and KMeans algorithm is O(nlogn). Refer Figures 1(a) and 2(a) for heat maps generated from voronoi tessellations of Bengaluru and New York city. The partitioned cells are colorcoded according to their sample size, for ease of representation on the color scale.
IiiC Geohash tessellation
Geohash is a technique that hashes the latitude and longitude coordinates into a character string. It is an extension of grid based method, with a simple naming convention. Note that with Geohash (first G capitalized), we refer to the technique of encoding a coordinate pair into a single string, while geohash (all small letters) refers to the string itself. Geohash can be visualized as a division of Earth into 32 planes, each of which can be divided again into 32 planes, and so on. We refer to these divisions as Geohash levels by defining level x as the division that results in geohashes of length x. A 6level geohash spans a grid of area 1.2 km 0.6 km, covering approximately 1 , where a 5level geohash spans an area of 4.9 km 4.9 km, covering approximately 25 . We choose 6level grids over 5level as it is more sensible to route drivers to a smaller area. In terms of time complexity, this algorithm is O(1). Refer Figures 1(b) and 2(b) for heat maps obtained from geohash tessellations of Bengaluru and New York city.
IiiD Observations
IiiD1 Bengaluru data set
On referring Figure 2
for the heat maps generating by the two tessellation techniques, we make the following inferences. Geohash is a regionoriented city map partition approach and therefore, results in some cells that are highly dense and in some cells that are highly sparse. On the other hand, Voronoi cells are more uniformly distributed in terms of density. Similar inferences can be made from the histogram plots (refer Figures
3(a) and 3(b)). The cell sample sizes follow a normal distribution for Voronoi cells in Figure
3(b). The tessellation technique tries to uniformly distribute samples among the partitions. This, in fact, increases the chance of finding a passenger in a cell as there are less “demand scarce” cells. On the other hand, this process of uniformly distributing samples increases the partition size to above 1 for some Voronoi cells, as seen in Figure 3(a). As a result, the driver might have to traverse a larger distance to find a passenger. The size of the smallest Voronoi partition observed is 0.10 and 85% of the partitions are below 2 . The area of a geohash cell remains constant at 0.72 . So when the driver is rerouted to a cell, if sufficient demand density exists in the cell, a vacant taxi driver has to spend less time searching in a geohash when compared to its corresponding Voronoi cell. At times, this advantage comes at the expense of masking the actual demand hot spots. In some locations, typically near the city center, there can be more than one hot spot in one area. Due to the inflexible structure of a geohash, these multiple hot spots are considered as a single hot spot. Note that if the service providers are more inclined towards reallocating drivers to fixed sized locations, multiple hot spots in a geohash might not be an issue. With Voronoi cells, we have the added flexibility of subsetting a geohash, into further cells. So from a hot spot identification view point, Voronoi tessellation seems to be a better strategy than Geohash tessellation for the Bengaluru data set.IiiD2 New York data set
Figure 3 is a zoomedin heat map of borough of Manhattan. We note that 92.5% of the total generated demand originated from Manhattan over the period of study. Thus, the spatial data distribution is highly nonlinear, which results in very closely packed Voronoi cells in the Manhattan borough. In fact, the smallest partition is of the size 0.08 and 75% of the partitions are below 0.10 , as observed in Figure 3(c). Routing drivers to such a small area is infeasible and economically not viable for any service provider. The Geohash strategy, on the other hand, is not a densitydependent technique and hence, assigns geohashes uniformly. From Figure 2(a), we observe that there are interisland tessellations, even though all the centroids are in mainland. This does not make sense from a driver rerouting view point. This problem does not arise in Figure 2(b) and the 6level geohashes are confined to either sides of the river. Here, Voronoi tessellation fails as a spatial tessellation strategy because of the nonuniform spatial data distribution, and the city geography, which is spread over multiple islands. If Voronoi tessellation is to perform better, extensive parameter tuning of the KMeans algorithm has to be conducted. The Voronoi tessellation has to be performed on each borough to avoid interisland tessellations, which adds to further complexity. For this data set, the sheer simplicity of the Geohash tessellation makes it the winning strategy.
The inferences and comparisons made above are based on the spatial aggregation of the data. In the next section, we perform comparisons based on the temporal aggregated data. The spatially aggregated data for each Voronoi and Geohash partition is temporally aggregated and modeled using timeseries techniques.
Iv Timeseries modeling
Before modeling the data, we remove duplicate user IDs, if they appear multiple times in a 30 minute interval per partition. We assume that it is highly unlikely for a passenger to book multiple taxis within that time, from the same cell. A BoxCox transformation is applied to stabilize the variance of the raw data, thus making the data amenable for linear processing
[32]. Each KMeans demand centroid has two timeseries associated with it; one using the demand aggregated at the Voronoi cell level, and the other with the demand aggregated at the geohash level. When these sequences are plotted, we observe that the series shows significant trend and seasonal patterns. On performing a spectral analysis, we find strong daily and weekly seasonality. A computationally efficient approach to real time forecasting is to use linear time series models. Hence, the data is trained using single and doubleseasonal linear parametric timeseries models. The city comprises of various activity zones such as residential, entertainment, office, school zones, etc. The demand patterns for each of these zones differ and hence, a single timeseries model may not be a satisfactory fit for all the timesequences. We perform the timeseries modelling exercise for each tessellation strategy separately, at two time scales: 15 minutes and 60 minutes.Iva Shortlisted models
HoltWinters, AutoRegressive and Seasonal Autoregressive Integrated Moving Average (ARIMA and SARIMA) models, Seasonal and Trend Decomposition using LOESS (STL) combined with nonseasonal exponential smoothing [32, Chapters 68] are some of the widely used singleseasonal exponential smoothing models. Double Seasonal HoltWinters (DSHW) [32] and Trigonometric BATS (TBATS) [33] are some common doubleseasonal models. In order to make sure that these models work better than nive alternatives, we compare the aforementioned models against a simple averaging model, nive and seasonal nive model, and random drift model. It was observed that in a few cells, the simple alternatives indeed performed better than the parametric timeseries models. We consider a simple seasonal averaging model as the baseline model for our analysis. If a model performs better than the baseline model, it is kept; else it is discarded. Below, we briefly explain the models with which almost all the activity zones are wellmodeled for our data.

Baseline model: In order to forecast for a particular time step, the mean of all the previous season samples corresponding to that time step is computed. If weekly forecast is considered, forecast for Sunday 12 noon is the average of all Sunday 12 noon demand in the past.
(3) where is the demand at time , is the number of seasons and is the seasonality period.

TBATS: TBATS model is a state space model introduced in [33] for forecasting timeseries with multiple seasonal periods, high frequency seasonal periods, non integer seasonality, and calender effects. TBATS is an acronym for Trigonometric (T), BoxCox transform (B), ARMA errors (A), Trend (T) and Seasonal components (S). The seasonal components are represented using trigonometric fourier series. The equations for a step ahead additive trend, multiplicative seasonality TBATS model prediction are as follows:
(4) where , , and are the smoothing parameters, and , are the seasonal periods.

Seasonal and Trend decomposition using Loess (STL): STL is a timeseries decomposition technique where the data is decomposed into seasonal and nonseasonal components. The non seasonal component is modeled using techniques such as ARIMA, exponential smoothing models (ETS), random drift etc. The seasonal component, after a Seasonal Nive forecast for the seasons, is added to the nonseasonal component to form the final model.
(5a) (5b) (5c) Here, denotes the trend, is the irregular component, is the integer part of variable , is the prediction horizon, and is the seasonal period.
IvB Model validation
The suitability of the models for the data can be analyzed as follows:

Plotting the histogram of the residuals: The histogram should follow a Gaussian distribution with residual mean as the mean of the distribution.

Plotting the Auto Correlation Function (ACF) of the residuals: The residuals should lie within the significant band of the ACF plot, i.e, within band, where is the sample size.

Resemblance with white noise: The residuals should appear to be uncorrelated and should pass a statistical test for correlation; for eg., the BoxLjung test. The BoxLjung statistic is a function of the accumulated sample auto correlations up to any specified time lag.
We refer the reader to [32, Chapter 2, Section ] for an elaborate explanation of the aforementioned residual diagnostic tools. Results of residual diagnostics performed on the Voronoi timeseries generated from an entertainment zone in Bengaluru are plotted in Figure 5. Here, STL decomposition with ARIMA(2,1,2) performed satisfactorily and the residuals from the fit appear to be uncorrelated.
Strategy  Bengaluru  New York  

Geohash 



Voronoi 


Data set 

60 min.  15 min.  

Vor  Geo  Vor  Geo  
Bengaluru  SMAPE (%)  17.6  21.7  37  40.4  
MASE  0.75  0.73  0.52  0.53  
New York  SMAPE (%)  15.2  15.6  33.6  31.2  
MASE  0.60  0.59  0.62  0.47 
The performance of the model can further be evaluated by employing performance metrics such as SMAPE and MASE. The SMAPE is a symmetrized version of the Mean Absolute Percent Error, which is defined if, at all future time points, the point forecasts and actuals are both not zero. For a forecast horizon considered, SMAPE is defined as follows:
(6) 
where is the actual demand and is the forecast at time . Even though this metric is widely used in industry, SMAPE is a scaledependent error, and not wellsuited for intermittent demand. Hence, we also evaluate our models against Mean Absolute Scaled Error (MASE). For a timeseries of forecast horizon and seasonal period , MASE is defined as:
(7) 
For a nonseasonal timeseries, . The denominator of the equation is the mean absolute error of the onestep seasonal nive forecast method on the training set. This error metric compares the models with the standard random drift model, and ensures that the models perform better than a nive technique. MASE can be used to compare forecasts across data sets with different scales, as the metric is independent of data scale. Smaller the value of the metric, the better is the model. The selected models passed the residual diagnostics and performed satisfactorily with SMAPE and MASE. In Table II, we list the shortlisted models with which Bengaluru and New York are wellmodelled at 60 min. aggregation levels. The errors encountered by the individual models are not listed because the modeling technique that works for one scenario might not provide good performance with another scenario.
IvC Preliminary results
After tessellating the city space into geohashes and Voronoi cells, we ran the timeseries models mentioned in the previous section on the spatially and temporally aggregated areanormalized demand. In general, We observe strong heterogeneity in city demand across both spatial and temporal dimensions. It was observed that for cells with low density, Voronoi tessellation performs better than Geohash tessellation (Figure 6). The optimal density based partition in Voronoi cells appears to be the reason for this behavior. For high density cells, Geohash based models perform at least as good as Voronoi based models. The comparable performance along with the low computational complexity makes Geohash tessellation the preferred choice for data dense locations. The numerical results are summarized below:

Using the models mentioned in Section IVA, we achieve an average accuracy of about 80% per for both data sets, at 60 min. aggregation levels.

On referring Table II for the performance evaluation of Geohash and Voronoi tessellation based models on the 740 Bengaluru city and 780 New York city demand centers, we notice the absence of an universal winner.

On average, Voronoi tessellation appears to outperform Geohash tessellation for Bengaluru city. On the other hand, Geohash tessellation emerges as the superior technique for most of the tested use cases for the New York city.
Further, the Empirical Cumulative Distribution Function (ECDF) of the average errors obtained from the models based on all the Voronoi and Geohash partitions are plotted in Figure
7. We can see a dominance of Voronoi technique over Geohash technique at 15 and 60 minute aggregation levels in Figures 6(a) and 6(b) for Bengaluru city. The errors from Voronoi based models reach the upper bound faster than errors from Geohash based models. On the other hand, for New York city, Figures 6(c) and 6(d) show a dominance of Geohash technique over Voronoi technique. In addition, we conduct the KolmogorovSmirnov test (KS test) [34] to check if the ECDFs differ significantly. The results obtained provide sufficient evidence that Voronoi technique is statistically better than Geohash technique for Bengaluru city and vice versa for New York city, consistent with the inferences from Figure 7. Hence, we find that Geohash tessellation performs better than Voronoi for the Bengaluru city, while Voronoi tessellation has a superior performance over Geohash for New York city. Further, we proceed to plot the instantaneous errors obtained from the models based on the two tessellation techniques in Figure 7(a) for Bengaluru at 15 minute aggregation. Each point on the graph corresponds to the mean of errors obtained from all geohash/Voronoi cells at that time instant. We can see that early mornings and late nights, Geohash based models work better compared to Voronoi based models. For the rest of the day, i.e., the daylight hours, Voronoi based models tend to produce better forecasts compared to Geohash based models. It is surprising to note that even within a single city, a unique tessellation technique does not have superior performance over the other at all time steps. This nonstationary behavior of the models prompts us to combine the models for better locationbased forecasts. To that end, we propose a modified version of discounted HEDGE algorithm [29] in the next section.V Modified HEDGE for combining models
Consider a learning framework, where model provides recommendations. Each model is referred to as an . Assuming that the predictor has complete knowledge about all the past decisions and performances of experts, the goal is to perform as good as the best expert in the pool. This exercise belongs to the class of ensemble learning where we use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Here, whenever Geohash tessellation performs better than Voronoi tessellation at a time step , the ideal decision is to choose Geohash technique as the tessellation strategy for that . For all demand centers, the forecasts at that instant should then be made from their geohash cells.
On observing the performances of both strategies at 15 and 60 minute aggregation levels, we see that a single tessellation technique is unable to yield optimal results for the entire forecast horizon. The best strategy varies with time. A sensible approach towards solving this issue is to combine the two experts. As an initial step towards that goal, we apply the classic HEDGE algorithm as proposed in [28] to combine the models (Figure 7(b)). We see that the decision maker is able to follow the best expert, model based on Voronoi in this case, till a cross over occurs. The algorithm is unable to adapt to the shifting nature of the experts. In order to adapt to such a non stationary environment, dHEDGE [29] was proposed, where the authors modify the conventional HEDGE to include an exponential discounting factor. By exponentially filtering the past performances of the experts, dHEDGE takes into account the nonstationarity of the process. By normalizing the weights of each expert at every instant, dHEDGE provides a set of coefficients for experts. This can then be used to generate a convex combination of expert opinion. Our problem do not require normalized weights. Since there are only two experts, at every instant we can pick the expert which has maximum weight,
, the expert with the minimum loss. Hence, we modify the dHEDGE algorithm to pick an expert if it has a higher weight than the other expert. The original algorithm has a generic loss function
. For illustrative purposes, they have used a discrete loss function , where is the indicator function, is the observed symbol and is the predicted symbol. For their cellular LTE network application, that loss function outputs a for all the experts who predicted the symbol correctly at that instant, and output a for all other experts. On the other hand, we build our loss functions based on the errors observed for each strategy. Our modified dHEDGE algorithm is given in Algorithm 1.

The weights can be initialized either uniformly or based on some prior knowledge about the experts. The factor gives more leverage to observations made in the recent past compared to the observations in the distant past. Thus, the expert which have been performing well in the recent past is boosted. By tuning the parameter , we can manipulate the dHEDGE algorithm to respond to sudden changes in the average error performance. By setting = 1, this algorithm becomes HEDGE with no forgetting factor. When modified dHEDGE is used, we observe that the algorithm tends to choose the best expert quite rapidly compared to the original HEDGE (Figure 7(c)).
Data set  Tessellation size  Sampling period  
5  15  30  60  
Vor  Geo  dHEDGE  Vor  Geo  dHEDGE  Vor  Geo  dHEDGE  Vor  Geo  dHEDGE  
Bengaluru 

63.2  60.8  60.7  37.1  40.4  36.5  25.6  29.6  25.6  17.6  21.7  17.6  
(0.1,0.4)  (0.1,0.4)  (0.1,0.1)  (0.1,0.1)  

23  20.7  20.5  12.8  19.3  12.6  9.7  16.7  9.6  6.9  12.8  7.0  
(0.1,0.1)  (0.1,0.6)  (0.1,0.1)  (0.1,0.3)  

10.9  10.5  9.7  7.4  9.4  7.3  6.4  7.4  6.4  5.4  5.8  5.4  
(0.1,0.9)  (0.1,0.7)  0.9,0.7  0.9,0.9  
New York 

61.4  40.0  40.0  33.6  31.2  31  24.8  24.4  23.8  15.2  15.6  15.2  
(0.1,0.1)  (0.1,0.1)  (0.1,0.6)  (0.1,0.1)  

23.7  15.9  15.5  13.9  12.4  12.1  11  11.6  10.6  9.4  11.9  9.3  
(0.8,0.8)  (0.7,0.2)  (0.1,0.3)  (0.1,0.4)  

10.9  10.5  10.4  9.7  8.7  8.3  11.2  7.7  7.9  10.6  12.8  10.8  
(0.1,0.9)  (0.1,0.7)  (0.1,0.1)  (0.1,0.8)  
ManhattanBronx 

29.9  20.6  20.6  17  16.5  16.5  13.2  11.1  11.1  9.15  10.5  9.15  
(0.1,0.2)  (0.1,0.3)  (0.1,0.1)  (0.1,0.1)  

11.6  14.7  11.4  8.3  12.9  8.4  7.4  10.2  7.5  6.8  9.3  6.9  
(0.1,0.8)  (0.1,0.3)  (0.1,0.2)  (0.1,0.1)  

7.9  7.2  6.9  6.2  6.7  6.2  5.5  5.7  5.1  5.9  5.4  5.5  
(0.1,0.7)  (0.7,0.9)  (0.7,0.2)  (0.6,0.9)  
BrooklynQueens 

90.2  57.2  57.2  97.6  74.9  74.2  73.9  41.8  41.8  62.6  52.9  52.9  
(0.1,0.1)  (0.1,0.1)  (0.1,0.1)  (0.1,0.1)  

73.0  35.1  35.1  49.3  27.2  27.2  37  22.9  22.9  26.2  15.9  15.8  
(0.1,0.1)  (0.1,0.1)  (0.1,0.4)  (0.1,0.3)  

42.4  19.5  19.5  23.1  28.8  23.1  18.2  22.8  18.2  12.9  18  12.9  
(0.1,0.4)  (0.1,0.8)  (0.1,0.8)  (0.9,0.9) 
The modified dHEDGE algorithm was run on both datasets at both sampling periods. The parameters and for each scenario are chosen based on an independent validation set spanning over 24 hours. The performance of the algorithm can be seen in the Figure 9. We can see that the algorithm picks up the best shifting expert, by giving more weightage to the behavior of that expert in the recent past. Note that in Figure 8(a), the plots of Voronoi and dHEDGE overlap as Voronoi is consistently the best performer throughout the forecasting horizon. The last two sub figures in each of the Figures 8(a)8(d) show the cumulative mean error behaviors of the strategies. The cumulative mean error plots reveal that the modified dHEDGE performs at least as good as the best expert. Whenever the experts in consideration have similar cumulative errors, the hybrid strategy has an error lower than both experts; else, the hybrid strategy consistently has a cumulative error equal to the best performing expert in the recent past. We also analyze the performance of the strategies on various boroughs of New York city, to avoid the issue of interisland Voronoi tessellations. To further demonstrate the flexibility of our algorithm in adapting to various scenarios, we have repeated the simulations for temporal aggregation levels of 5 and 30 minutes. Multiple spatial resolutions are also explored to validate the applicability of our algorithm. Irrespective of all these nuances, when dHEDGE was applied, the hybrid strategy still turned out to be the winner for all the datasets considered, with different metrics at multiple temporal and spatial scales. Table III summarizes the performances of the strategies on the datasets, with SMAPE as the metric. Similar results was observed using MASE, and hence,are not included. We observe that the modified dHEDGE improves the prediction accuracy by picking the best strategy for that particular instant, for that city, at that temporal and spatial resolution. Please note that our policy does not switch continuously between strategies every minutes. This is because the dHEDGE chooses the winner strategy for an instant based on neither the current nor the immediate past performance, but on the recent past performance of both experts. The dHEDGE switches only if one strategy performs better than the other for a period of time in the past. On tracking the performance of our strategy for all the test cases mentioned in Table III, we observe an average of only 1.5 switches per day at 60 min. aggregation levels. In other words, for 24 time steps, the hybrid strategy made less than 2 switches on average for all tested scenarios. For 30 min. (i.e., 48 steps) and 15 min. (i.e., 96 steps) aggregation levels, the average switches made were only 3.2 and 8.4 per day. With finer temporal resolutions, the number of switches increases, which is inevitable, considering the increase in the number of time steps in the forecast horizon. We feel that the observed number of switches is admissible, in return for a significant improvement in accuracy.
Therefore, based on our observations from two independent datasets, we find that there is no universal winning tessellation technique that works for all datasets. However, a hybrid technique can be developed for enhanced prediction performances on a broad range of datasets.
Vi Conclusions
Efficient spatial partitioning is a key step towards better locationbased demand modeling and forecasting. To that end, we performed a comparison of two widely used spatial partitioning techniques, Voronoi tessellation and Geohashing. The study was conducted using two distinct datasets; mobile application login based taxi demand data set from Bengaluru, India and governmentrun street hailing Yellow taxi services data set from New York, USA. A KMeans clustering algorithm was employed to generate demand clusters, that act as generating sites for the tessellations. In order to compare Voronoi based models and Geohash based models at multiple time scales, timeseries techniques were applied to the datasets at temporal aggregation levels of 15 and 60 minutes. STL decomposition with ARIMA, ETS, and TBATS were shortlisted as the suitable models. We noticed that neither Geohash nor Voronoi tessellation technique individually proved to be optimal for the entire forecast horizon. Additionally, models based on Voronoi had a superior performance over models based on Geohash when the demand density is low, and vice versa. While Voronoi tessellation appears to be the recommended strategy for tessellating Bengaluru, Geohash tessellation was the winning strategy for the New York city. Thus, we conclude that the tessellation strategy is heavily dependent on the demand density in each partition, and the geography of the city.
The lack of a clear winning strategy prompted us to devise a hybrid tessellation strategy by combining models based on the well known HEDGE algorithm. We modified dHEDGE, which is a discounted version of the HEDGE algorithm, to suit our application. Our algorithm is shown to always pick up the best possible strategy at each time step in the forecast horizon. The tessellation performance is no longer affected by the time of the day, or the features of the underlying data set. Our hybrid tessellation strategy was clearly the winning strategy for all the datasets considered, performing consistently better at multiple time scales with different performance metrics.
This work is directed towards a comparison of tessellation strategies, where we have focused on combining the strategies. More tessellation experts can be added to check the robustness of our dHEDGE algorithm. Recent developments in Recurrent Neural Networks have shown promises in the prediction domain
[8], which can be utilized to improve predictions. As part of future works, nonparametric techniques such as Support Vector Regression, Artificial Neural Networks, and Bayesian analysis may be applied to model the tessellation data, to check their effectiveness. Extensive parameter tuning of KMeans should be done, if possible, to improve the clustering. KMeans algorithm can also be replaced by density based clustering algorithms; for example, DBSCAN and OPTICS, among others.
Acknowledgements
The work is undertaken as part of the CognizantErnet IIT Madras project. The authors are also thankful to Vishnu Raj, Gopal Krishna Kamath, and Sudharsan Parthasarathy for many helpful discussions.
References

[1]
D. Shao, W. Wu, S. Xiang, and Y. Lu, “Estimating taxi demandsupply level using taxi trajectory data stream,” in
Proc. of IEEE International Conference on Data Mining Workshop, pp. 407–413, 2015.  [2] C. Kamga, M. A. Yazici, and A. Singhal, “Hailing in the rain: temporal and weatherrelated variations in taxi ridership and taxi demandsupply equilibrium,” in Transportation Research Board 92nd Annual Meeting, no. 133131, 2013.
 [3] X. Li, G. Pan, Z. Wu, G. Qi, S. Li, D. Zhang, W. Zhang, and Z. Wang, “Prediction of urban human mobility using largescale taxi traces and its applications,” Frontiers of Computer Science, vol. 6, no. 1, pp. 111–121, 2012.
 [4] L. MoreiraMatias, J. Gama, M. Ferreira, J. MendesMoreira, and L. Damas, “Predicting taxi–passenger demand using streaming data,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1393–1402, 2013.
 [5] K. Zhang, Z. Feng, S. Chen, K. Huang, and G. Wang, “A framework for passengers demand prediction and recommendation,” in Proc. of IEEE International Conference on Services Computing, pp. 340–347, 2016.
 [6] J. Lee, I. Shin, and G.L. Park, “Analysis of the passenger pickup pattern for taxi location recommendation,” in Proc. of International Conference on Networked Computing and Advanced Information Management, pp. 199–204, 2008.

[7]
Y. Dong, S. Qian, K. Zhang, and Y. Zhai, “A novel passenger hotspots searching
algorithm for taxis in urban area,” in
Proc. of IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing
, pp. 175–180, 2017.  [8] J. Xu, R. Rahmatizadeh, L. Bölöni, and D. Turgut, “Realtime prediction of taxi demand using recurrent neural networks,” IEEE Transactions on Intelligent Transportation Systems (Early Access), 2017.
 [9] S. Phithakkitnukoon, M. Veloso, C. Bento, A. Biderman, and C. Ratti, “Taxiaware map: Identifying and predicting vacant taxis in the city.” in Proc. of International Joint Conference on Ambient Intelligence, pp. 86–95, 2010.
 [10] Y. Ding, Y. Li, K. Deng, H. Tan, M. Yuan, and L. M. Ni, “Dissecting regional weathertraffic sensitivity throughout a city,” in Proc. of IEEE International Conference on Data Mining, pp. 739–744, 2015.
 [11] J. Shen, X. Liu, and M. Chen, “Discovering spatial and temporal patterns from taxibased floating car data: a case study from nanjing,” GIScience & Remote Sensing, vol. 54, no. 5, pp. 1–22, 2017.
 [12] Y. Zhang, B. Li, and K. Ramayya, “Learning individual behavior using sensor data: The case of gps traces and taxi drivers,” 2016. [Online]. Available: https://ssrn.com/abstract=2779328.
 [13] N. Davis, G. Raina, and K. Jagannathan, “A multilevel clustering approach for forecasting taxi travel demand,” in Proc. of IEEE International Conference on Intelligent Transportation Systems, pp. 223–228, 2016.
 [14] R. Outersterp, “Partitioning of spatial data in publishsubscribe messaging systems,” Master’s thesis, University of Twente, 2016.
 [15] R. Gelda, K. Jagannathan, and G. Raina, “Forecasting supply in voronoi regions for appbased taxi hailing services.” in International Conference on Advanced Logistics and Transport, 2017.

[16]
J. MacQueen et al., “Some methods for classification and analysis of
multivariate observations,” in
Proc. of the Berkeley Symposium on Mathematical Statistics and Probability
, pp. 281–297, 1967.  [17] A. Fahad, N. Alshatri, Z. Tari, A. Alamri, I. Khalil, A. Y. Zomaya, S. Foufou, and A. Bouras, “A survey of clustering algorithms for big data: Taxonomy and empirical analysis,” IEEE Transactions on Emerging Topics in Computing, vol. 2, no. 3, pp. 267–279, 2014.

[18]
D. Xu and Y. Tian, “A comprehensive survey of clustering algorithms,”
Annals of Data Science
, vol. 2, no. 2, pp. 165–193, 2015.  [19] M. Kaushik and M. B. Mathur, “Comparative study of kmeans and hierarchical clustering techniques,” International Journal of Software & Hardware Research in Engineering, vol. 2, no. 6, 2014.
 [20] S. Chakraborty, N. Nagwani, and L. Dey, “Performance comparison of incremental kmeans and incremental dbscan algorithms,” arXiv preprint arXiv:1406.4751, 2014.
 [21] J. Burns, “Centroidal voronoi tessellations,” 2009. [Online]. Available: https://www.whitman.edu/Documents/Academics/Mathematics/burns.pdf.
 [22] L. MoreiraMatias, J. MendesMoreira, J. F. de Sousa, and J. Gama, “Improving mass transit operations by using avlbased systems: A survey,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 4, pp. 1636–1653, 2015.
 [23] C. Chen, Y. Wang, L. Li, J. Hu, and Z. Zhang, “The retrieval of intraday trend and its influence on traffic prediction,” Transportation Research part C: Emerging Technologies, vol. 22, pp. 103–118, 2012.

[24]
M. Lippi, M. Bertini, and P. Frasconi, “Shortterm traffic flow forecasting: An experimental comparison of timeseries analysis and supervised learning,”
IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 2, pp. 871–882, 2013.  [25] C. Ding, J. Duan, Y. Zhang, X. Wu, and G. Yu, “Using an arimagarch modeling approach to improve subway shortterm ridership forecasting accounting for dynamic volatility,” IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 4, pp. 1054–1064, 2017.
 [26] Y. Liu, C. Liu, N. J. Yuan, L. Duan, Y. Fu, H. Xiong, S. Xu, and J. Wu, “Intelligent bus routing with heterogeneous human mobility patterns,” Knowledge and Information Systems, vol. 50, no. 2, pp. 383–415, 2017.

[27]
V. G. Vovk, “Aggregating strategies,” in
Proc. of Computational Learning Theory
, pp. 371–383, 1990.  [28] Y. Freund and R. E. Schapire, “A desiciontheoretic generalization of online learning and an application to boosting,” in Proc. of European Conference on Computational Learning Theory, pp. 23–37, 1995.
 [29] V. Raj and S. Kalyani, “An aggregating strategy for shifting experts in discrete sequence prediction,” arXiv preprint arXiv:1708.01744, 2017.
 [30] R. J. Hyndman et al., “Another look at forecastaccuracy metrics for intermittent demand,” Foresight: The International Journal of Applied Forecasting, vol. 4, no. 4, pp. 43–46, 2006.
 [31] N. Y. C. Taxi & Limousine Commission, “Tlc trip record data,” 2017. [Online]. Available: http://www.nyc.gov/html/tlc/html/ about/trip_record_data.shtml.
 [32] R. J. Hyndman and G. Athanasopoulos, Forecasting: principles and practice. OTexts, 2014.
 [33] A. M. De Livera, R. J. Hyndman, and R. D. Snyder, “Forecasting time series with complex seasonal patterns using exponential smoothing,” Journal of the American Statistical Association, vol. 106, no. 496, pp. 1513–1527, 2011.
 [34] F. J. Massey Jr, “The kolmogorovsmirnov test for goodness of fit,” Journal of the American Statistical Association, vol. 46, no. 253, pp. 68–78, 1951.
Comments
There are no comments yet.