Short-term Demand Forecasting for Online Car-hailing Services using Recurrent Neural Networks

by   Alireza Nejadettehad, et al.
University of Tehran

Short-term traffic flow prediction is one of the crucial issues in intelligent transportation system, which is an important part of smart cities. Accurate predictions can enable both the drivers and the passengers to make better decisions about their travel route, departure time and travel origin selection, which can be helpful in traffic management. Multiple models and algorithms based on time series prediction and machine learning were applied to this issue and achieved acceptable results. Recently, the availability of sufficient data and computational power, motivates us to improve the prediction accuracy via deep-learning approaches. Recurrent neural networks have become one of the most popular methods for time series forecasting, however, due to the variety of these networks, the question that which type is the most appropriate one for this task remains unsolved. In this paper, we use three kinds of recurrent neural networks including simple RNN units, GRU and LSTM neural network to predict short-term traffic flow. The dataset from TAP30 Corporation is used for building the models and comparing RNNs with several well-known models, such as DEMA, LASSO and XGBoost. The results show that all three types of RNNs outperform the others, however, more simple RNNs such as simple recurrent units and GRU perform work better than LSTM in terms of accuracy and training time.



There are no comments yet.


page 5

page 8


Comparison between DeepESNs and gated RNNs on multivariate time-series prediction

We propose an experimental comparison between Deep Echo State Networks (...

Comparative Analysis of Machine Learning Models for Predicting Travel Time

In this paper, five different deep learning models are being compared fo...

Short-term traffic prediction using physics-aware neural networks

In this work, we propose an algorithm performing short-term predictions ...

Short Term Blood Glucose Prediction based on Continuous Glucose Monitoring Data

Continuous Glucose Monitoring (CGM) has enabled important opportunities ...

Predictive Precompute with Recurrent Neural Networks

In both mobile and web applications, speeding up user interface response...

An overview and comparative analysis of Recurrent Neural Networks for Short Term Load Forecasting

The key component in forecasting demand and consumption of resources in ...

Network-Scale Traffic Modeling and Forecasting with Graphical Lasso and Neural Networks

Traffic flow forecasting, especially the short-term case, is an importan...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Online car-hailing apps have evolved as novel and popular services to provide on-demand transportation service via mobile apps. Comparing with the traditional transportation means such as the subways and buses, the online car-hailing service is much more convenient and flexible for the passengers. Furthermore, by incentivizing private cars owners to provide car-hailing services, it promotes the sharing economy and enlarges the transportation capacities of the cities. Several car-hailing mobile apps have gained great popularities all over the world, such as Uber, Didi, and Lyft. Large number of passengers are served and a significant volume of car-hailing orders are generated routinely every day. For example, TAP30, one of the largest online car-hailing service providers in Iran, handles hundreds of thousands of orders per day all over Iran.

These platforms serve as a coordinator who matches requesting orders from passengers (demand) and vacant registered cars (supply). There exists an abundance of leverages to influence drivers’ and passengers’ preference and behavior, and thus affect both the demand and supply, to maximize profits of the platform or achieve maximum social welfare. Having better understanding of the short-term passenger demand over different spatial zones is of great importance to the platform or the operator, who can incentivize drivers to the zones with more potential passenger demands, and improve the utilization rate of the registered cars. However, in metropolises like Tehran, it is common to see passengers seeking for taxicabs roadside while some taxi drivers are cruising idly on the street. This contradiction reveals the supply-demand disequilibrium with the following two scenarios: Scenario 1, demand exceeds supply, where passengers’ needs would not be met in a timely response. Scenario 2, supply exceeds demand, where drivers would spend overly long time in seeking for passengers. To solve the problem of disequilibrium, an overall prediction for passenger demand in different zones, provides a global distribution of passengers, upon which providers of car-hailing services can adjust prices and dispatch policies of supply dynamically in advance. We define the taxi-demand prediction problem as follows: Given historical taxi demand data in a region , we want to predict the number of ride requests that will emerge within during the next time interval.

Over the past few decades, many data analysis models have been proposed to solve the short-term traffic forecasting problem, including probabilistic models [1], time-series forecasting methods [2][3]

and decision tree based methods

[4]. Recently approaches based on neural networks gained noticeable attention in studies related to traffic flow prediction[5][6][7]. One of the most popular kinds of NNs in this context is Recurrent Neural Networks (RNNs) [8][9]. Since 2015, when [8] proposed long-short term memory (LSTM) NNs for traffic flow prediction and showed that LSTMs (due to their excellent ability to memorize long-term dependencies) outperform other methods in this particular context, almost every study that attempted to use RNNs for demand prediction, has utilized LSTMs [9][10][11]

. In this paper, the performance of different types of RNNs are evaluated and compared with some other powerful methods such as eXtreme Gradient Boosting (XGBoost)

[12] and least absolute shrinkage and selection operator (LASSO)[13] and also with each other. Experimental results demonstrate that RNNs outperform the other methods according to the metrics chosen for comparison; However when it comes to the comparison between RNNs, Simple RNN units and Gated recurrent unit (GRU) defeat LSTM in terms of performance and computational(training) time.

The results obtained from experiments show that the best non-RNN method (XGBoost) reached error rates 3.78 and 40.8% according to RMSE and MAPE, respectively. However these errors were reduced to 3.22 and 37.42% by simple RNN units. In addition to the fact that simple RNN units outperformed other non-RNN methods and LSTM, computation time required for simple RNN units is approximately 0.13 and 0.1 the time needed to train XGBoost and LSTM, respectively. Although the experimental results denote that simple RNN units and GRU perform nearly the same, there is a significant difference between their training time and simple RNN units train nearly 13 times faster than GRU.

2 Related work

Although there has been many efforts to predict traffic flow using spatiotemporal data; the most related studies to the demand prediction problem shows that the most implemented methods consists of probabilistic models such as Poisson [1], time-series forecasting methods such as auto regression integrated moving average (ARIMA) [2][3] and neural networks [5][6][7]. Between the time-series forecasting methods, ARIMA is more prevalent because of its performance in short-term forecasting. [2] presented an improved ARIMA-based method to forecast the spatial-temporal distribution of passengers in urban environment. First, urban regions with high demand are detected; then demand in next hour are predicted in those regions using ARIMA and finally, demand is forecasted using an improved ARIMA-based method that uses both time and type of the day. [3] proposes the challenge that ARIMA is not necessarily the best method to forecast demand. They propose an end-to-end framework to predict the number of services that will happen at taxi stands by applying the time-varying Poisson model and ARIMA. Moreover, they used sliding-window ensemble framework to originate a prediction by combining the prediction of each model accuracy. The dataset was generated from 441 vehicles with 63 taxi stands in the city of Porto. [1]

presented and algorithm based on Poisson model to recommend the most probable points to find passengers for taxi drivers in shortest time.

[14] proposed a multi-level clustering technique to improve the accuracy of linear time-series model fitting, by exploring the correlation between adjacent Geo-hashes.

Recently, the success of deep learning in the fields of computer vision and natural language processing

[15][16], motivated researchers to apply deep learning techniques on traffic prediction problems. [5]

is one of the first studies that implemented NNs in order to forecast taxi demand. They have used a multilayer perceptron to achieve this target.

[6] introduced a new parameter named "Maximum predictability" showed that different predictors (Markov predictor (a probability-based predictive algorithm), the Lempel-Ziv-Welch predictor (a sequence-based predictive algorithm), and the Neural Network predictor (a predictive algorithm that uses machine learning)), perform differently according to the maximum predictability of a region. They showed that considering maximum predictability, in the regions with more random demand pattern, NNs perform better and in the regions with lower randomness in their demand pattern, Markov predictor beats the others. [7] proposed an end-to-end framework named DeepSD, based on a novel deep neural network structure that automatically discovers the complicated supply-demand patterns in historical order, weather and traffic data, with minimal amount of hand-crafted features.

In 2015 [8] proposed long-short term memory NNs (LSTMs) for traffic flow prediction and showed that LSTMs (due to their excellent ability to memorize long-term dependencies) perform better in comparison to the other methods in this particular context. Since then, almost every study that used Recurrent neural networks to predict demand, used LSTMs [9][10][11]. In this paper we are going to compare the performance of different types of RNNs and also evaluate their performance in comparison to some other powerful methods such as XGBoost and LASSO.

3 Material and Methods

In this section, first, we explain how we cleaned the dataset and prepared it for modeling. Second, the features used in the models are introduced and finally, three different types of recurrent neural networks that we have used as models are explained in details.

3.1 Data Processing

The dataset used in this study is real-world data from TAP30 corporation ride requests from September 1st to December 20th, 2017. The details of raw data taken from database is shown in Table 1.

Data type Description
Ride Request ID The unique ID of the ride request
Passenger ID The unique ID of the passenger that made the ride request
Timestamp Timestamp of the ride request
Latitude/Longitude GPS location of origin of the ride request
Table 1: Details of raw data

The urban area is partitioned into 1616 grids uniformly where each grid refers to a region. On the other hand, we consider variables aggregated in a 15 minutes time interval in this paper. We have removed the ride requests canceled in 5 seconds, because there are not considered to be real demand and potentially are noisy data. And also the ride requests that a passenger with his/her unique passenger id has made in a time interval of 15 minutes length are aggregated to become a single request. The number of unique ride requests made, represents the demand. We aggregated the number of unique ride requests for all 256 regions, every 15 minutes. In order to obtain robust and interpretable results, we decided to consider only the regions that at least 300 ride requests per day on average(nearly 3 ride requests in each time interval on average) had been made in them. After eliminating the regions that does not satisfy our limit, 64 regions were left.

3.2 Features

There are 68 main features for the predictive model. Each data point in our final cleaned data has 4 temporal features and 64 spatial features.

3.2.1 Temporal Features

We have extracted 4 main temporal features from the timestamps of the cleaned raw data. In order to use the continuous nature of the timeslot feature, first, we converted the timeslot number to triangular format and used its sine and cosine as features. Table 2 includes the temporal features and their description.

Feature Description
Day of week The ID of the day of week
National holiday Whether the day is a national holiday or not
Timeslot Sineunus sin(2timeslot number/96)
Timeslot Cosineus cos(2timeslot number/96)
Table 2: Temporal features

3.2.2 Spatial Features

Since there are correlations between the amount of demand in a region and the other regions, we used the amount of demand in all regions in the previous timeslots as features. For example to predict the demand in timeslot in region number , not only we used the demand in previous timeslots in that region, but also we used the demand in all other regions as features in our models.

3.3 Methods

In this section, we briefly describe our selected recurrent neural networks for the aforementioned task, which are Simple RNN, GRU (Gated recurrent unit) and LSTM (Long short term memory).

3.3.1 Simple RNN

A recurrent neuron is a special kind of artificial neuron which has a backward connection to the neurons in previous layers. RNNs have internal memory which allows them to operate over sequential data effectively. This feature made the RNNs one of the most popular models for dealing with sequential tasks such as handwriting recognition

[17], NLP[18] and time series forecasting[19].

Figure 1: A recurrent neural network
Figure 2: An unrolled recurrent neural network

Figure 2 shows the structure of an RNN and Figure 2 illustrates an unrolled RNN an how it deals with sequential data. Given a sequence X = {, , , …, } as input, RNN computes the hidden state sequence H = {, , , …, } and output sequence Y = {, , , …, } using Equations 1 and 2.


In Equations 1 and 2 , and denote the input-to-hidden, hidden-to-hidden and hidden-to-output weight matrices, respectively. and

are hidden layer bias and output layer bias vectors.


are the activation functions of the hidden layer and output layer respectively. The hidden state of each time step is passed to the next time step’s hidden state.

3.3.2 Long short term memory

Long Short Term Memory networks are a special kind of RNN, capable of learning long-term dependencies. They were introduced by Hochreiter and Schmidhuber (1997)[20], and were refined and popularized by many researchers in different contexts. LSTMs are explicitly designed to avoid the long-term dependency problem. In comparison to simple RNN, LSTM has a more complicated structure and contains three kinds of gates: input gate, forget gate and cell state gate. Figure 4 illustrates an LSTM cell.

Figure 3: An LSTM Cell
Figure 4: A GRU Cell

Forget gate: After getting the output of previous state, , Forget gate helps to take decisions about what must be removed from

state and thus keeping only relevant stuff. It is surrounded by a sigmoid function which helps to crush the input between 0 and 1. (Equation



Input Gate: In the input gate, we decide to add new stuff from the present input to our present cell state scaled by how much we wish to add them. Sigmoid layer decides which values to be updated and layer creates a vector for new candidates to added to present cell state. (Equations 4 and 5):


Then the cell state is calculated by Equation 6:


Output Gate: Finally the sigmoid function decides what to output from the cell state as shown in Equation 7. We multiply the input with “tanh” to crush the values between (-1) and 1, then multiply it with the output of sigmoid function so that we only output what we want to. (Equations 7 and 8)


3.3.3 Gated recurrent unit

GRU was proposed by Cho et al. in 2014[21]. It is similar to LSTM in structure but simpler to compute and implement. The difference between a GRU cell and an LSTM cell is in the gating mechanism. It combines the forget and input gates into a single update gate. It also merges the cell state and the hidden state. The function of reset gate is similar to forget gate of LSTM. Since the structure of GRU is very similar to LSTM, we will not get into the detailed formula. The structure of a GRU cell is shown in Figure 4.

3.4 Methods for Comparison

We compared the results obtained from recurrent neural networks with a tree-based regression method (XGBoost), one linear regression method (LASSO) and one moving average time series forecasting method (DEMA). We have tuned the parameters for all these methods, then reported the results. Since these methods are not able to process sequentially formed data, demand intensity for 4 previous timeslots (the sequence length chosen for RNNs) were fed to them as features.

3.4.1 Dema

Double exponential moving average is a well-known method for time series forecasting problems. It attempts to remove the inherent lag associated to Moving Averages by placing more weight on recent values. The name suggests this is achieved by applying a double exponential smoothing which is not the case. The name double comes from the fact that the value of an EMA (Exponential Moving Average) is doubled. To keep it in line with the actual data and to remove the lag the value "EMA of EMA" is subtracted from the previously doubled EMA.


3.4.2 Lasso

Least absolute shrinkage and selection operator(LASSO) is a linear model that estimates sparse coefficients. It usually produces better prediction result than simple linear regression. We use the LASSO implementation from the scikit-learn library.


3.4.3 XGBoost

eXtreme Gradient Boosting(XGBoost) is a powerful ensemble boosting tree based method and is widely used in data mining applications both for classification and regression problems. We use the XGBoost implementation from XGBoost python package.[12]

4 Results

In this section, we declare our RNNs’ specifications and introduce the metrics that evaluations are performed based on them. Then, we evaluate different RNN models on our dataset and see how well they can predict the requests in the future. In addition, we compare our model with 3 other baselines and show that RNNs outperform all.

4.1 Experimental Setup

Our dataset is obtained from TAP30 Co. ride requests in Tehran from September 1st to December 20th, 2017. We used the first prior 80 days to train the models and last 30 days for validation. All three kinds of recurrent neural networks (Simple RNN, GRU, LSTM) were implemented in Keras API built on top of Tensorflow. Although recurrent neural networks can accept sequences with any length as input, because of the nature of our problem we had to choose a constant sequence length. Due to the constrained computational power we had, we used every hour data as a sequence. Because the time interval for each data point is 15 minutes, each sequence consists of four data points. Since the data contains records for 110 days, the shape of data would be (110*24, 4, 68). Table 

3 includes the list of parameters used in the experiment for all three types of RNNs.

Data of each sequence 1 hour data
Time-step length 15 mins
Sequence length 4
Number of regions 64
Number of features 68
Number of hidden layers 2
Number of neurons in each hidden layer 1500-2000
Activation function of hidden recurrent layers tanh
Loss function Mean squared error
Table 3: Experimental Parameters

4.2 Evaluation metrics

We use root mean absolute error (RMSE) and mean absolute percentage error (MAPE) to evaluate the models. These metrics are defined as follows:


Where and mean the real and prediction value for demand in region for time interval and denotes total number of samples.

4.3 Experimental Results

First we report the performance of RNNs (RMSE and MAPE) over the entire city (all selected regions) and then we report the errors on each category of regions.

4.3.1 Performance over the entire city

To evaluate the prediction performance over the entire city which includes 64 regions, we compare the performance of RNNs with other methods described in 3.4 in terms of RMSE and MAPE from Equations 10 and 11. We report the RMSE and MAPE over the entire city during daily hours in Figures 6 and 6.

Figure 5: Prediction performance at different hours according to RMSE
Figure 6: Prediction performance at different hours according to MAPE

As it can be seen in Figures 6 and 6, all methods share common patterns through both metrics. For instance, they reach their minimum values at about 3am and maximum values at about 7pm. All three kinds of RNNs show better performance than the other methods, but between them, RNN and GRU have nearly the same error values during the day and are better than LSTM with a considerable difference. There is a haphazard pattern between hours 12:00am and 6:00am in Figure 6. According to the Equation 11, MAPE is a very sensitive metric and depends on the real value’s range. Since the amount of ride requests through these hours are extremely low, this metric fails to have a specific pattern during these hours. Predicting demand intensity during rush hours (about 8am and 5pm) is considered more crucial than the other times. According error rates both RMSE and MAPE, it can be observed that RNNs demonstrate considerably better performance in comparison to the others.

Method RMSE MAPE (%) Training time
DEMA 4.37 48.54 -
LASSO 3.87 41.42 4 mins/37 secs
XGBoost 3.78 40.80 120 mins/53 secs
LSTM 3.46 39.04 146 mins/43 secs
Simple RNN 3.22 37.42 16 mins/40 secs
GRU 3.21 37.50 119 mins/19 secs
Table 4: Errors over the entire city

Table 4 shows the detailed values of errors over the entire city for each method. Training was performed on a core-i7-7700HQ CPU with 16 GBs of RAM.

4.3.2 Performance over categorized regions

We have categorized 64 regions in Tehran, to 5 distinct categories. The regions with average ride requests per day greater than 1600, are categorized as very crowded regions and the regions with average ride requests per day less than 400, are categorized as very uncrowded regions and the other 3 categories are placed between these 2 categories. Figures 8 and 8 illustrate the performance in terms of RMSE and MAPE respectively over these 5 categories. As we move from the very uncrowded regions to very crowded ones, since the real value of demand gets greater, the range for RMSE gets greater and the range for MAPE becomes less. But over all 5 categories, RNNs show a better performance. Especially simple RNN and GRU are the best models.

Figure 7: Prediction performance in different region categories according to RMSE
Figure 8: Prediction performance in different region categories according to MAPE

5 Conclusion

In this paper different types of recurrent neural networks were implemented and used in order to forecast short-term demand in different regions on an online car-hailing company’s data. We compared the performance of prediction between three types of RNNs including simple RNN, GRU and LSTM with tree based models (XGBoost and Random forest), a very powerful linear regression model (LASSO) and time series forecasting models based on moving averages (SMA, DEMA). The results indicated that all three types of RNNs outperformed the other methods but the simple RNN and GRU showed the best results between RNNs. Compared to the best non-RNN method (XGBoost), GRU and Simple RNN reduced RMSE about 15% and reduced MAPE nearly 8%. Since the nature of the demand prediction problem for traffic flow is a short-term history dependent kind, more simple types of RNN’s performed better than long-short term memory networks (LSTM). Not only LSTM networks’ performance is worse than other RNNs, but also it takes more time for training due to the complexity of these networks.

6 Acknowledgments

This research is financially supported by TAP30 Co. and also the authors are grateful to TAP30 Co. for providing sample data.


  • [1] N. J. Yuan, Y. Zheng, L. Zhang, X. Xie, T-finder: A recommender system for finding passengers and vacant taxis, IEEE Transactions on knowledge and data engineering 25 (10) (2013) 2390–2403 (2013).
  • [2] X. Li, G. Pan, G. Qi, S. Li, Predicting urban human mobility using large-scale taxi traces, in: Proceedings of the First Workshop on Pervasive Urban Applications, 2011 (2011).
  • [3] L. Moreira-Matias, J. Gama, M. Ferreira, J. Mendes-Moreira, L. Damas, Predicting taxi–passenger demand using streaming data, IEEE Transactions on Intelligent Transportation Systems 14 (3) (2013) 1393–1402 (2013).
  • [4] X. Zhang, X. Wang, W. Chen, J. Tao, W. Huang, T. Wang, A taxi gap prediction method via double ensemble gradient boosting decision tree, in: Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS), 2017 IEEE 3rd International Conference on, IEEE, 2017, pp. 255–260 (2017).
  • [5] N. Mukai, N. Yoden, Taxi demand forecasting based on taxi probe data by neural network, in: Intelligent Interactive Multimedia: Systems and Services, Springer, 2012, pp. 589–597 (2012).
  • [6] K. Zhao, D. Khryashchev, J. Freire, C. Silva, H. Vo, Predicting taxi demand at high spatial resolution: approaching the limit of predictability, in: Big Data (Big Data), 2016 IEEE International Conference on, IEEE, 2016, pp. 833–842 (2016).
  • [7] D. Wang, W. Cao, J. Li, J. Ye, Deepsd: supply-demand prediction for online car-hailing services using deep neural networks, in: 2017 IEEE 33rd International Conference on Data Engineering (ICDE), IEEE, 2017, pp. 243–254 (2017).
  • [8] Y. Tian, L. Pan, Predicting short-term traffic flow by long short-term memory recurrent neural network, in: Smart City/SocialCom/SustainCom (SmartCity), 2015 IEEE International Conference on, IEEE, 2015, pp. 153–158 (2015).
  • [9] Z. Zhao, W. Chen, X. Wu, P. C. Chen, J. Liu, Lstm network: a deep learning approach for short-term traffic forecast, IET Intelligent Transport Systems 11 (2) (2017) 68–75 (2017).
  • [10] J. Xu, R. Rahmatizadeh, L. Bölöni, D. Turgut, Real-time prediction of taxi demand using recurrent neural networks, IEEE Transactions on Intelligent Transportation Systems 19 (8) (2018) 2572–2581 (2018).
  • [11] J. Ke, H. Zheng, H. Yang, X. M. Chen, Short-term forecasting of passenger demand under on-demand ride services: A spatio-temporal deep learning approach, Transportation Research Part C: Emerging Technologies 85 (2017) 591–608 (2017).
  • [12] T. Chen, T. He, M. Benesty, V. Khotilovich, Y. Tang, Xgboost: extreme gradient boosting, R package version 0.4-2 (2015) 1–4 (2015).
  • [13] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological) (1996) 267–288 (1996).
  • [14] N. Davis, G. Raina, K. Jagannathan, A multi-level clustering approach for forecasting taxi travel demand, in: Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on, IEEE, 2016, pp. 223–228 (2016).
  • [15] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, nature 521 (7553) (2015) 436 (2015).
  • [16]

    A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in neural information processing systems, 2012, pp. 1097–1105 (2012).

  • [17] A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, J. Schmidhuber, A novel connectionist system for unconstrained handwriting recognition, IEEE transactions on pattern analysis and machine intelligence 31 (5) (2009) 855–868 (2009).
  • [18] A. Graves, A.-r. Mohamed, G. Hinton, Speech recognition with deep recurrent neural networks, in: Acoustics, speech and signal processing (icassp), 2013 ieee international conference on, IEEE, 2013, pp. 6645–6649 (2013).
  • [19] J. T. Connor, R. D. Martin, L. E. Atlas, Recurrent neural networks and robust time series prediction, IEEE transactions on neural networks 5 (2) (1994) 240–254 (1994).
  • [20] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural computation 9 (8) (1997) 1735–1780 (1997).
  • [21] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using rnn encoder-decoder for statistical machine translation, arXiv preprint arXiv:1406.1078 (2014).