1 Introduction
Online car-hailing apps have emerged as a novel and popular way to provide on-demand transportation via mobile apps. Compared with traditional modes of transportation such as subways and buses, the online car-hailing service is far more convenient and flexible for passengers. Furthermore, by incentivizing private car owners to provide car-hailing services, it promotes the sharing economy and enlarges the transportation capacity of cities. Several car-hailing mobile apps have gained great popularity all over the world, such as Uber, Didi, and Lyft. A large number of passengers are served and a significant volume of car-hailing orders is generated routinely every day. For example, TAP30, one of the largest online car-hailing service providers in Iran, handles hundreds of thousands of orders per day across the country.
These platforms serve as coordinators that match requesting orders from passengers (demand) with vacant registered cars (supply). There is an abundance of levers to influence drivers' and passengers' preferences and behavior, and thereby affect both demand and supply, in order to maximize the platform's profit or the social welfare. A better understanding of short-term passenger demand over different spatial zones is of great importance to the platform or operator, who can incentivize drivers to move to zones with higher potential demand and thus improve the utilization rate of the registered cars. However, in metropolises like Tehran, it is common to see passengers seeking taxis at the roadside while taxi drivers cruise idly in the streets. This contradiction reveals a supply-demand disequilibrium with two scenarios: Scenario 1, demand exceeds supply, so passengers' needs are not met in a timely manner; Scenario 2, supply exceeds demand, so drivers spend overly long times seeking passengers. To mitigate this disequilibrium, an overall prediction of passenger demand in different zones provides a global distribution of passengers, based on which car-hailing providers can adjust prices and supply dispatch policies dynamically and in advance. We define the taxi-demand prediction problem as follows: given historical taxi demand data in a region, predict the number of ride requests that will emerge in that region during the next time interval.
Over the past few decades, many data-analysis models have been proposed to solve the short-term traffic forecasting problem, including probabilistic models [1], time-series forecasting methods [2][3] and decision-tree-based methods [4]. Recently, approaches based on neural networks have gained noticeable attention in studies related to traffic flow prediction [5][6][7]. One of the most popular kinds of NNs in this context is the recurrent neural network (RNN) [8][9]. Since 2015, when [8] proposed long short-term memory (LSTM) NNs for traffic flow prediction and showed that LSTMs (thanks to their ability to memorize long-term dependencies) outperform other methods in this particular context, almost every study that attempted to use RNNs for demand prediction has utilized LSTMs [9][10][11]. In this paper, the performance of different types of RNNs is evaluated and compared both with each other and with other powerful methods such as eXtreme Gradient Boosting (XGBoost) [12] and the least absolute shrinkage and selection operator (LASSO) [13]. Experimental results demonstrate that RNNs outperform the other methods on the chosen metrics; however, when comparing the RNNs with each other, simple RNN units and the gated recurrent unit (GRU) beat LSTM in terms of both performance and training time. The best non-RNN method (XGBoost) reached error rates of 3.78 and 40.8% in terms of RMSE and MAPE, respectively; simple RNN units reduced these errors to 3.22 and 37.42%. Besides outperforming the non-RNN methods and LSTM, simple RNN units require approximately 0.13 and 0.1 of the time needed to train XGBoost and LSTM, respectively. Although the experiments show that simple RNN units and GRU perform nearly the same, there is a significant difference in their training time: simple RNN units train nearly 13 times faster than GRU.
2 Related work
Although there have been many efforts to predict traffic flow using spatiotemporal data, the studies most related to the demand prediction problem show that the most commonly implemented methods consist of probabilistic models such as the Poisson model [1], time-series forecasting methods such as the autoregressive integrated moving average (ARIMA) [2][3], and neural networks [5][6][7]. Among the time-series forecasting methods, ARIMA is the most prevalent because of its performance in short-term forecasting. [2] presented an improved ARIMA-based method to forecast the spatial-temporal distribution of passengers in an urban environment: first, urban regions with high demand are detected; then demand in the next hour is predicted in those regions using ARIMA; and finally, demand is forecast with an improved ARIMA-based method that uses both the time and the type of day. [3] raises the point that ARIMA is not necessarily the best method to forecast demand. They propose an end-to-end framework to predict the number of services that will occur at taxi stands by applying a time-varying Poisson model and ARIMA, and use a sliding-window ensemble framework to produce a prediction by combining the output of each model weighted by its accuracy. Their dataset was generated from 441 vehicles and 63 taxi stands in the city of Porto. [1] presented an algorithm based on a Poisson model that recommends to taxi drivers the points where they are most likely to find passengers in the shortest time.
[14] proposed a multilevel clustering technique to improve the accuracy of linear time-series model fitting by exploiting the correlation between adjacent geohashes. Recently, the success of deep learning in the fields of computer vision and natural language processing [15][16] motivated researchers to apply deep learning techniques to traffic prediction problems. [5] is one of the first studies to implement NNs to forecast taxi demand, using a multi-layer perceptron for this purpose.
[6] introduced a new parameter named "maximum predictability" and showed that different predictors (a Markov predictor, a probability-based algorithm; the Lempel-Ziv-Welch predictor, a sequence-based algorithm; and a neural network predictor, a machine-learning-based algorithm) perform differently depending on the maximum predictability of a region: in regions with a more random demand pattern, NNs perform better, while in regions with less randomness in their demand pattern, the Markov predictor beats the others. [7] proposed an end-to-end framework named DeepSD, based on a novel deep neural network structure that automatically discovers the complicated supply-demand patterns in historical order, weather and traffic data with a minimal amount of hand-crafted features. In 2015, [8] proposed long short-term memory NNs (LSTMs) for traffic flow prediction and showed that LSTMs (due to their ability to memorize long-term dependencies) perform better than the other methods in this particular context. Since then, almost every study using recurrent neural networks to predict demand has used LSTMs [9][10][11]. In this paper we compare the performance of different types of RNNs with each other, and also evaluate them against other powerful methods such as XGBoost and LASSO.
3 Material and Methods
In this section we first explain how the dataset was cleaned and prepared for modeling; second, the features used in the models are introduced; and finally, the three different types of recurrent neural networks used as models are explained in detail.
3.1 Data Processing
The dataset used in this study consists of real-world ride requests of the TAP30 corporation from September 1st to December 20th, 2017. The details of the raw data taken from the database are shown in Table 1.
Data type  Description
Ride Request ID  The unique ID of the ride request
Passenger ID  The unique ID of the passenger who made the ride request
Timestamp  Timestamp of the ride request
Latitude/Longitude  GPS location of the origin of the ride request
The urban area is partitioned uniformly into a 16×16 grid, where each cell refers to a region, and all variables in this paper are aggregated over 15-minute time intervals. We removed the ride requests canceled within 5 seconds, because they are not considered real demand and are potentially noise. Furthermore, the ride requests made by a single passenger (identified by the unique passenger ID) within one 15-minute interval are aggregated into a single request. The number of unique ride requests represents the demand, which we aggregated for all 256 regions, every 15 minutes. To obtain robust and interpretable results, we kept only the regions with at least 300 ride requests per day on average (nearly 3 ride requests per time interval on average). After eliminating the regions that did not satisfy this threshold, 64 regions were left.
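The cleaning steps above (dropping quickly-canceled requests, collapsing repeated requests by the same passenger within one 15-minute slot, and counting unique requests per region) can be sketched in pandas. The column names below are illustrative assumptions, not the actual TAP30 schema:

```python
import pandas as pd

def build_demand_series(rides: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw ride requests into 15-minute demand counts per region.

    Expects columns: 'passenger_id', 'timestamp' (datetime64), 'region'
    (grid-cell index), and 'cancel_secs' (seconds until cancellation,
    NaN if the request was not canceled).
    """
    # Drop requests canceled within 5 seconds (treated as noise,
    # not real demand); NaN comparisons are False, so kept rows survive.
    rides = rides[~(rides["cancel_secs"] < 5)]
    # Snap each request to the start of its 15-minute timeslot.
    rides = rides.assign(timeslot=rides["timestamp"].dt.floor("15min"))
    # Collapse repeated requests by the same passenger in one slot.
    rides = rides.drop_duplicates(subset=["passenger_id", "region", "timeslot"])
    # Demand = number of unique requests per (region, timeslot).
    return rides.groupby(["region", "timeslot"]).size().rename("demand").reset_index()
```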
3.2 Features
There are 68 main features in the predictive model: each data point in our final cleaned data has 4 temporal features and 64 spatial features.
3.2.1 Temporal Features
We extracted 4 main temporal features from the timestamps of the cleaned raw data. To exploit the cyclic nature of the timeslot feature, we first converted the timeslot number to an angle and used its sine and cosine as features. Table 2 lists the temporal features and their descriptions.
Feature  Description
Day of week  The ID of the day of the week
National holiday  Whether the day is a national holiday or not
Timeslot sine  sin(2π × timeslot number / 96)
Timeslot cosine  cos(2π × timeslot number / 96)
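The cyclic encoding in Table 2 can be computed as follows; this is a small illustrative helper, not code from the paper:

```python
import math

def timeslot_features(timeslot: int) -> tuple:
    """Encode a 15-minute timeslot number (0..95) cyclically, so that
    slot 95 (23:45) and slot 0 (00:00) end up close in feature space."""
    angle = 2 * math.pi * timeslot / 96
    return math.sin(angle), math.cos(angle)
```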
3.2.2 Spatial Features
Since the demand in one region is correlated with the demand in the other regions, we used the demand of all regions in the previous timeslots as features. For example, to predict the demand in a given timeslot and region, we used not only the demand in the previous timeslots of that region, but also the demand in all other regions, as features in our models.
3.3 Methods
In this section, we briefly describe the recurrent neural networks selected for the aforementioned task: the simple RNN, the gated recurrent unit (GRU) and long short-term memory (LSTM).
3.3.1 Simple RNN
A recurrent neuron is a special kind of artificial neuron that has a backward connection to neurons in previous layers. RNNs have an internal memory which allows them to operate effectively over sequential data. This property has made RNNs among the most popular models for sequential tasks such as handwriting recognition [17], NLP [18] and time series forecasting [19]. Figure 1 shows the structure of an RNN and Figure 2 illustrates an unrolled RNN and how it deals with sequential data. Given an input sequence $X = \{x_1, x_2, \ldots, x_T\}$, the RNN computes the hidden state sequence $H = \{h_1, h_2, \ldots, h_T\}$ and the output sequence $Y = \{y_1, y_2, \ldots, y_T\}$ using Equations 1 and 2.
$$h_t = \sigma_h(W_{xh} x_t + W_{hh} h_{t-1} + b_h) \quad (1)$$
$$y_t = \sigma_y(W_{hy} h_t + b_y) \quad (2)$$
In Equations 1 and 2, $W_{xh}$, $W_{hh}$ and $W_{hy}$ denote the input-to-hidden, hidden-to-hidden and hidden-to-output weight matrices, respectively; $b_h$ and $b_y$ are the hidden-layer and output-layer bias vectors; and $\sigma_h$ and $\sigma_y$ are the activation functions of the hidden layer and the output layer, respectively. The hidden state of each time step is passed on to the hidden state of the next time step.
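A minimal NumPy sketch of the forward pass described by Equations 1 and 2, assuming tanh as the hidden activation (as in the experiments later) and a linear output layer, which is a reasonable choice for regression though the paper does not state its output activation:

```python
import numpy as np

def simple_rnn_forward(X, W_xh, W_hh, W_hy, b_h, b_y):
    """Forward pass of a simple RNN: for each step t,
    h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) and y_t = W_hy h_t + b_y.
    X has shape (T, d_in); the weight shapes follow from the hidden size."""
    T = X.shape[0]
    h = np.zeros(W_hh.shape[0])  # h_0 = 0
    H, Y = [], []
    for t in range(T):
        h = np.tanh(W_xh @ X[t] + W_hh @ h + b_h)  # Equation 1
        H.append(h)
        Y.append(W_hy @ h + b_y)                   # Equation 2
    return np.array(H), np.array(Y)
```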
3.3.2 Long short term memory
Long short-term memory networks are a special kind of RNN capable of learning long-term dependencies. They were introduced by Hochreiter and Schmidhuber (1997) [20] and were refined and popularized by many researchers in different contexts. LSTMs are explicitly designed to avoid the long-term dependency problem. Compared to the simple RNN, the LSTM has a more complicated structure and contains three kinds of gates: the input gate, the forget gate and the output gate. Figure 3 illustrates an LSTM cell.
Forget gate: Given the previous hidden state $h_{t-1}$ and the current input $x_t$, the forget gate decides what must be removed from the cell state, keeping only the relevant information. A sigmoid function squashes its output between 0 and 1 (Equation 3):
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \quad (3)$$
Input gate: The input gate decides what new information from the present input to add to the cell state, scaled by how much we wish to add it. A sigmoid layer decides which values will be updated, and a tanh layer creates a vector of new candidate values to be added to the cell state (Equations 4 and 5):
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \quad (4)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \quad (5)$$
Then the cell state is updated by Equation 6:
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \quad (6)$$
Output gate: Finally, a sigmoid function decides what to output from the cell state. We pass the cell state through tanh to squash its values between -1 and 1, then multiply it by the output of the sigmoid gate so that we only output the parts we want (Equations 7 and 8):
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \quad (7)$$
$$h_t = o_t * \tanh(C_t) \quad (8)$$
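The gate computations of Equations 3 to 8 can be written out as a single NumPy step function. The concatenation of the previous hidden state with the current input follows the text above; the variable names are our own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM cell update; each W_* multiplies the concatenation
    [h_{t-1}, x_t], so it has shape (hidden, hidden + input)."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W_f @ z + b_f)        # forget gate (Eq. 3)
    i = sigmoid(W_i @ z + b_i)        # input gate (Eq. 4)
    c_tilde = np.tanh(W_c @ z + b_c)  # candidate values (Eq. 5)
    c = f * c_prev + i * c_tilde      # new cell state (Eq. 6)
    o = sigmoid(W_o @ z + b_o)        # output gate (Eq. 7)
    h = o * np.tanh(c)                # new hidden state (Eq. 8)
    return h, c
```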
3.3.3 Gated recurrent unit
The GRU was proposed by Cho et al. in 2014 [21]. It is similar to the LSTM in structure but simpler to compute and implement; the difference lies in the gating mechanism. The GRU combines the forget and input gates into a single update gate, and also merges the cell state with the hidden state. Its reset gate plays a role similar to the LSTM's forget gate. Since the structure of the GRU is very similar to that of the LSTM, we do not go into the detailed formulas. The structure of a GRU cell is shown in Figure 4.
3.4 Methods for Comparison
We compared the results obtained from the recurrent neural networks with a tree-based regression method (XGBoost), a linear regression method (LASSO) and a moving-average time series forecasting method (DEMA). We tuned the parameters of all these methods before reporting the results. Since these methods cannot process sequentially formed data, the demand intensity of the 4 previous timeslots (the sequence length chosen for the RNNs) was fed to them as features.
3.4.1 DEMA
The double exponential moving average is a well-known method for time series forecasting. It attempts to remove the inherent lag associated with moving averages by placing more weight on recent values. Despite what the name suggests, it is not obtained by applying exponential smoothing twice; rather, the "double" comes from the fact that the value of an EMA (exponential moving average) is doubled, and the "EMA of the EMA" is subtracted from it to keep it in line with the actual data and remove the lag (Equation 9):
$$\mathrm{DEMA}_t = 2 \cdot \mathrm{EMA}_t - \mathrm{EMA}(\mathrm{EMA})_t \quad (9)$$
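A minimal pandas sketch of this smoother, using the exponentially weighted mean; the smoothing span is a free parameter, and the value used in the paper's experiments is not stated:

```python
import pandas as pd

def dema(series: pd.Series, span: int) -> pd.Series:
    """DEMA_t = 2 * EMA_t - EMA(EMA)_t: doubling the EMA and subtracting
    the EMA of the EMA cancels most of the single EMA's lag."""
    ema = series.ewm(span=span, adjust=False).mean()
    ema_of_ema = ema.ewm(span=span, adjust=False).mean()
    return 2 * ema - ema_of_ema
```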
3.4.2 LASSO
The least absolute shrinkage and selection operator (LASSO) is a linear regression method that adds an L1 penalty on the coefficients, performing variable selection and regularization at the same time [13]. We use it as a strong linear baseline.
3.4.3 XGBoost
eXtreme Gradient Boosting (XGBoost) is a powerful ensemble method based on boosted trees and is widely used in data mining applications for both classification and regression problems. We use the implementation from the XGBoost Python package [12].
4 Results
In this section, we describe our RNNs' specifications and introduce the metrics on which the evaluations are based. We then evaluate the different RNN models on our dataset and examine how well they predict future requests. In addition, we compare them with 3 baselines and show that the RNNs outperform all of them.
4.1 Experimental Setup
Our dataset was obtained from the ride requests of TAP30 Co. in Tehran from September 1st to December 20th, 2017. We used the first 80 days to train the models and the last 30 days for validation. All three kinds of recurrent neural networks (simple RNN, GRU, LSTM) were implemented with the Keras API built on top of TensorFlow. Although recurrent neural networks can accept sequences of any length as input, the nature of our problem required a constant sequence length. Due to our constrained computational power, we used each hour of data as one sequence; since the time interval of each data point is 15 minutes, each sequence consists of four data points. As the data contains records for 110 days, the shape of the data is (110*24, 4, 68). Table 3 lists the parameters used in the experiment for all three types of RNNs.

Data of each sequence  1 hour of data
Timestep length  15 mins
Sequence length  4
Number of regions  64
Number of features  68
Number of hidden layers  2
Number of neurons in each hidden layer  1500-2000
Activation function of hidden recurrent layers  tanh
Loss function  Mean squared error
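For the RNNs, the cleaned feature matrix must be sliced into sequences of four 15-minute slots, giving the (110*24, 4, 68) shape noted above. A NumPy sketch of this windowing follows; the target layout (the next slot's 64 per-region demand values placed in the first 64 feature columns) is an assumption for illustration:

```python
import numpy as np

def make_sequences(features: np.ndarray, seq_len: int = 4):
    """Slice a (num_timeslots, num_features) matrix into non-overlapping
    hourly sequences of four 15-minute slots, shaped for a Keras RNN as
    (num_sequences, seq_len, num_features). Each sequence's target is
    the slot that immediately follows it."""
    n = (features.shape[0] - 1) // seq_len  # keep room for a target slot
    X = np.stack([features[i * seq_len:(i + 1) * seq_len] for i in range(n)])
    y = np.stack([features[(i + 1) * seq_len, :64] for i in range(n)])
    return X, y
```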
4.2 Evaluation metrics
We use the root mean squared error (RMSE) and the mean absolute percentage error (MAPE) to evaluate the models. These metrics are defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2} \quad (10)$$
$$\mathrm{MAPE} = \frac{100}{N}\sum_{i=1}^{N}\frac{|y_i - \hat{y}_i|}{y_i} \quad (11)$$
where $y_i$ and $\hat{y}_i$ denote the real and predicted demand values of the region and time interval indexed by $i$, and $N$ denotes the total number of samples.
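The two metrics of Equations 10 and 11 can be implemented directly; note that MAPE is undefined when the true demand is zero, which underlies its instability during low-demand night hours:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error (Equation 10)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error (Equation 11), in percent.
    Requires all y_true values to be nonzero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / y_true)
```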
4.3 Experimental Results
First we report the performance of the RNNs (RMSE and MAPE) over the entire city (all selected regions), and then we report the errors for each category of regions.
4.3.1 Performance over the entire city
To evaluate the prediction performance over the entire city, which includes 64 regions, we compare the RNNs with the other methods described in Section 3.4 in terms of the RMSE and MAPE of Equations 10 and 11. We report the RMSE and MAPE over the entire city during daily hours in Figures 5 and 6.
As can be seen in Figures 5 and 6, all methods share common patterns in both metrics: for instance, they reach their minimum values at about 3 am and their maximum values at about 7 pm. All three kinds of RNNs perform better than the other methods; among them, the simple RNN and GRU have nearly the same error values during the day and beat LSTM by a considerable margin. There is an erratic pattern between 12:00 am and 6:00 am in Figure 6. As Equation 11 shows, MAPE is a very sensitive metric that depends on the range of the real values; since the number of ride requests during these hours is extremely low, the metric follows no specific pattern there. Predicting demand intensity during rush hours (about 8 am and 5 pm) is considered more crucial than at other times, and according to both RMSE and MAPE the RNNs demonstrate considerably better performance than the other methods.
Method  RMSE  MAPE (%)  Training time
DEMA  4.37  48.54  -
LASSO  3.87  41.42  4 min 37 s
XGBoost  3.78  40.80  120 min 53 s
LSTM  3.46  39.04  146 min 43 s
Simple RNN  3.22  37.42  16 min 40 s
GRU  3.21  37.50  119 min 19 s
Table 4 shows the detailed error values over the entire city for each method. Training was performed on a Core i7-7700HQ CPU with 16 GB of RAM.
4.3.2 Performance over categorized regions
We categorized the 64 regions of Tehran into 5 distinct categories: regions with more than 1600 average ride requests per day are labeled very crowded, regions with fewer than 400 are labeled very uncrowded, and the remaining 3 categories lie between these two. Figures 7 and 8 illustrate the performance over these 5 categories in terms of RMSE and MAPE, respectively. As we move from the very uncrowded regions to the very crowded ones, the real demand values grow, so the range of RMSE increases while the range of MAPE decreases. Over all 5 categories, however, the RNNs show better performance, with the simple RNN and GRU being the best models.
5 Conclusion
In this paper, different types of recurrent neural networks were implemented to forecast short-term demand in different regions using an online car-hailing company's data. We compared the prediction performance of three types of RNNs (simple RNN, GRU and LSTM) with tree-based models (XGBoost and random forest), a powerful linear regression model (LASSO) and moving-average time series forecasting models (SMA, DEMA). The results indicated that all three types of RNNs outperformed the other methods, with the simple RNN and GRU showing the best results among the RNNs. Compared to the best non-RNN method (XGBoost), GRU and the simple RNN reduced RMSE by about 15% and MAPE by nearly 8%. Since the demand prediction problem for traffic flow depends mostly on short-term history, the simpler types of RNNs performed better than long short-term memory networks: not only is the LSTM networks' performance worse than that of the other RNNs, they also take more time to train due to their complexity.
6 Acknowledgments
This research was financially supported by TAP30 Co., and the authors are grateful to TAP30 Co. for providing the sample data.
References
 [1] N. J. Yuan, Y. Zheng, L. Zhang, X. Xie, Tfinder: A recommender system for finding passengers and vacant taxis, IEEE Transactions on knowledge and data engineering 25 (10) (2013) 2390–2403 (2013).
 [2] X. Li, G. Pan, G. Qi, S. Li, Predicting urban human mobility using largescale taxi traces, in: Proceedings of the First Workshop on Pervasive Urban Applications, 2011 (2011).
 [3] L. MoreiraMatias, J. Gama, M. Ferreira, J. MendesMoreira, L. Damas, Predicting taxi–passenger demand using streaming data, IEEE Transactions on Intelligent Transportation Systems 14 (3) (2013) 1393–1402 (2013).
 [4] X. Zhang, X. Wang, W. Chen, J. Tao, W. Huang, T. Wang, A taxi gap prediction method via double ensemble gradient boosting decision tree, in: Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS), 2017 IEEE 3rd International Conference on, IEEE, 2017, pp. 255–260 (2017).
 [5] N. Mukai, N. Yoden, Taxi demand forecasting based on taxi probe data by neural network, in: Intelligent Interactive Multimedia: Systems and Services, Springer, 2012, pp. 589–597 (2012).
 [6] K. Zhao, D. Khryashchev, J. Freire, C. Silva, H. Vo, Predicting taxi demand at high spatial resolution: approaching the limit of predictability, in: Big Data (Big Data), 2016 IEEE International Conference on, IEEE, 2016, pp. 833–842 (2016).
 [7] D. Wang, W. Cao, J. Li, J. Ye, Deepsd: supplydemand prediction for online carhailing services using deep neural networks, in: 2017 IEEE 33rd International Conference on Data Engineering (ICDE), IEEE, 2017, pp. 243–254 (2017).
 [8] Y. Tian, L. Pan, Predicting shortterm traffic flow by long shortterm memory recurrent neural network, in: Smart City/SocialCom/SustainCom (SmartCity), 2015 IEEE International Conference on, IEEE, 2015, pp. 153–158 (2015).
 [9] Z. Zhao, W. Chen, X. Wu, P. C. Chen, J. Liu, Lstm network: a deep learning approach for shortterm traffic forecast, IET Intelligent Transport Systems 11 (2) (2017) 68–75 (2017).
 [10] J. Xu, R. Rahmatizadeh, L. Bölöni, D. Turgut, Realtime prediction of taxi demand using recurrent neural networks, IEEE Transactions on Intelligent Transportation Systems 19 (8) (2018) 2572–2581 (2018).
 [11] J. Ke, H. Zheng, H. Yang, X. M. Chen, Shortterm forecasting of passenger demand under ondemand ride services: A spatiotemporal deep learning approach, Transportation Research Part C: Emerging Technologies 85 (2017) 591–608 (2017).
 [12] T. Chen, T. He, M. Benesty, V. Khotilovich, Y. Tang, Xgboost: extreme gradient boosting, R package version 0.42 (2015) 1–4 (2015).
 [13] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological) (1996) 267–288 (1996).
 [14] N. Davis, G. Raina, K. Jagannathan, A multilevel clustering approach for forecasting taxi travel demand, in: Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on, IEEE, 2016, pp. 223–228 (2016).
 [15] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444 (2015).
 [16] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105 (2012).
 [17] A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, J. Schmidhuber, A novel connectionist system for unconstrained handwriting recognition, IEEE transactions on pattern analysis and machine intelligence 31 (5) (2009) 855–868 (2009).
 [18] A. Graves, A.r. Mohamed, G. Hinton, Speech recognition with deep recurrent neural networks, in: Acoustics, speech and signal processing (icassp), 2013 ieee international conference on, IEEE, 2013, pp. 6645–6649 (2013).
 [19] J. T. Connor, R. D. Martin, L. E. Atlas, Recurrent neural networks and robust time series prediction, IEEE transactions on neural networks 5 (2) (1994) 240–254 (1994).
 [20] S. Hochreiter, J. Schmidhuber, Long shortterm memory, Neural computation 9 (8) (1997) 1735–1780 (1997).
 [21] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using rnn encoderdecoder for statistical machine translation, arXiv preprint arXiv:1406.1078 (2014).