Taxi, as one of the most common travel modes for urban residents, has become deeply embedded in people’s daily life. Online taxicab requesting platforms, such as Didi Chuxing (http://www.didichuxing.com/en/), Uber (https://www.uber.com) and Grab (https://www.grab.com/), have recently expanded rapidly due to the convenience they bring to daily travel. However, this huge industry still suffers from inefficient operations (e.g., long passenger waiting times and excessive vacant trips). The main problem stems from the mismatch between supply and demand caused by inaccurate taxi demand prediction, which results in a large number of taxis gathering in some busy areas and causing oversupply, while in other remote areas the distribution of taxis is extremely sparse. The solution to this issue involves taxi demand prediction, which estimates the future taxi demand and helps to allocate taxis to each region in advance.
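|Method||Task and Scope|
|Zhang et al.||Traffic Inflow and Outflow Prediction in all regions|
|Jin et al.||Traffic Inflow and Outflow Prediction in all regions|
|Tong et al.||Taxi Demand Prediction in all regions|
|Yao et al.||Taxi Demand Prediction in all regions|
|Toqu et al.||Traffic Flow or Demand Prediction between some well-designed positions (e.g., highway toll booths, subway and bus stations)|
|Azzouni et al.||Traffic Flow or Demand Prediction between some well-designed positions (e.g., highway toll booths, subway and bus stations)|
|Yang et al.||Traffic Flow or Demand Prediction between some well-designed positions (e.g., highway toll booths, subway and bus stations)|
|Zhou et al.||Passenger Pickup/Dropoff Demand Prediction in all regions|
|Ours||Taxi Demand Prediction between all regions|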
As a crucial task in intelligent transportation systems (ITS), taxi demand prediction has attracted a wide range of research interest and achieved notable successes [12, 13, 14, 6, 7]. However, most existing methods only model taxi demand at the departure place and estimate the requests for taxis in all regions or at some specific locations, ignoring the influence of passenger destinations. We believe that information about passengers’ destinations is critical for taxi preallocation systems. Without considering the distribution of passengers’ destinations, a taxi preallocation system deploys taxis in advance based solely on the predicted taxi origin demand, which may suffer from the following issues:
Limited by city management rules (such as the driving restriction policy in Beijing, see http://zhengce.beijing.gov.cn/library/192/33/50/438650/1552930/index.html), some drivers are only allowed to drive in specified regions. If a taxi driver is assigned to a region where most passengers are heading to a place where the driver is restricted, he/she cannot take those orders, which may result in a waste of resources.
Some drivers prefer to carry passengers in regions they are familiar with. Meanwhile, some drivers are unwilling to take short-trip orders because of the small profit. If the destinations of most passengers in the driver’s preallocated region are outside his/her operating regions or too close to the pickup locations, the driver may reject those requests.
If a driver is dispatched to a region where most passengers will travel to regions unfamiliar to him/her, the driver may spend more time carrying the passengers to their destinations, even when guided by GPS navigation. This reduces the operating efficiency of the taxi market and the level of passenger satisfaction.
In the literature, some works [8, 9, 10] have been proposed to estimate the traffic flow or demand between some well-designed positions, such as highway toll booths, subway and bus stations. However, taxi passengers can be anywhere, and these traffic flow estimation systems for a limited set of positions may not be suitable for citywide taxi preallocation. Therefore, it becomes desirable to predict the taxi demand between every two regions and optimize the taxi allocation mechanism.
In this paper, we propose a challenging taxi origin-destination demand prediction task, which aims to predict the future taxi demands between any two regions. If the taxi origin demand and the destinations of passengers are well predicted, we can preallocate taxis more efficiently to meet passengers’ requests and simultaneously avoid all of the above issues. The key challenge of the proposed task lies in how to capture the diverse spatial-temporal contextual information to learn the demand patterns. For example, spatially adjacent regions usually have similar demand patterns (e.g., the number of taxi requests and the demand trends), which is called local spatial context (LSC) in our work. Moreover, even though two regions are spatially distant, their demand patterns may still be related if they share similar functionality (e.g., both of them are residential districts). We call this relationship between two far-apart regions global correlation context (GCC). Finally, taxi demand is a time-varying process and its evolution is related to various factors, such as its current state and the ever-changing meteorology, which is formulated as temporal evolution context (TEC).
Recently, deep neural networks have facilitated great advances in context modeling [15, 16, 17]. Inspired by this, we address the problem of taxi origin-destination demand prediction with a novel Contextualized Spatial-Temporal Network (CSTN), which integrates the local spatial context, temporal evolution context, and global correlation context into a unified framework. Specifically, our proposed network consists of three components, an LSC module, a TEC module and a GCC module, one for each of the three types of context modeling. Firstly, the LSC module utilizes two convolutional neural networks to learn the local spatial dependencies of taxi demand from the origin view and the destination view respectively. The outputs of the two networks are combined to generate the final local spatial feature, which carries the hybrid information of taxi demand patterns from both views. Secondly, the TEC module feeds both the local spatial features of taxi demand and the meteorological information into a ConvLSTM (convolutional long short-term memory) network for the analysis of taxi demand evolution. Thirdly, to capture the correlation between far-apart regions, the GCC module computes the similarity between any two regions and generates the global correlation feature of each region by summing the features of all regions with the similarity weights. In this way, each region contains the information of all regions and is mainly relevant to the regions that have high similarities with it. Finally, we integrate the local spatial-temporal feature generated by the TEC module and the global correlation feature generated by the GCC module to predict the future taxi origin-destination demand.
The main contributions of this work are three-fold:
We extend the existing taxi demand prediction to the task of taxi origin-destination demand prediction, which is more worth exploring for intelligent transportation systems. To the best of our knowledge, we are the first to study interregional taxi demand prediction.
We propose a novel Contextualized Spatial-Temporal Network to address this task, which well integrates the local spatial context, temporal evolution context and global correlation context into a unified framework.
Extensive experiments on a large-scale benchmark of taxi origin-destination demand prediction demonstrate that our approach outperforms existing state-of-the-art methods by a margin.
The rest of the paper is organized as follows. We first review some related works in Section II and define the taxi origin-destination demand problem with some notations in Section III. We then introduce the proposed Contextualized Spatial-Temporal Network in Section IV and conduct extensive experiments in Section V. Finally, we conclude this paper in Section VI.
II Related Works
In this paper, we utilize deep neural networks to forecast the interregional taxi demand, which is closely related to taxi demand prediction and origin-destination estimation. We review the relevant works of these two categories of research in the following subsections.
II-A Taxi Demand Prediction
Due to its huge potential application in ITS, taxi demand prediction has been extensively studied [19, 20, 21, 22, 23]. Moreira-Matias et al. proposed to aggregate the GPS signals into histogram time series and applied them to predict the demand with a Poisson Model and an AutoRegressive Moving Average model. Yuan et al. presented a recommender that provides taxi drivers with accurate locations to pick up passengers quickly, using historical GPS trajectories of taxicabs. Li et al. forecast the spatio-temporal variations of passengers at a given hotspot with an improved ARIMA-based prediction model. All of the above methods require taxi trajectories. Trajectory-free prediction has recently attracted increasing attention. A pioneering work was proposed by Tong et al., in which they utilized the taxi-calling records from some online taxicab requesting platforms to predict the taxi demand with a unified linear regression model.
Recently, the success of deep learning on various computer vision tasks [26, 27, 28, 29, 30, 31] has motivated researchers to adopt deep neural networks for this task. Wang et al. designed a neural network framework using context data from multiple sources to predict the gap between taxi supply and demand. Xu et al. proposed a sequence learning model that can predict future taxi requests in each area of a city based on the recent demand and other relevant information. Yao et al. proposed a Deep Multi-View Spatial-Temporal Network framework to model both spatial and temporal relations of taxi demand. Rodrigues et al. combined time-series and textual data to forecast the taxi demand in event areas with two hybrid deep learning architectures. Recently, Zhou et al. built an attention-based neural network to predict the passenger pickup/dropoff demand in each region, but it is still not applicable to taxi demand prediction between region pairs.
All the above methods only forecast the taxi demand per unit time in each region or at some specific locations. In contrast, our method attempts to predict interregional taxi demand, which can help taxi preallocation systems to allocate the taxis more efficiently. Moreover, our proposed CSTN explicitly captures the local spatial context, temporal evolution context and global correlation context in one united framework to infer more accurate taxi demand patterns.
II-B Origin-Destination Estimation
Origin-Destination Estimation [34, 35, 36] aims to estimate the flow between the endpoints of the studied traffic network, given the flow count and other observations of several traffic links. Existing research on this task can be divided into two categories: static estimation and dynamic estimation. The static approaches [37, 38] consider the traffic flow as time-independent and estimate the average demand, which is suitable for long-term transportation planning and design purposes. On the other hand, dynamic approaches [39, 40] estimate the time-variant flow between each origin and destination, which can be used for short-term route guidance and dynamic traffic assignment. These works generally take as input the flow count of some links collected from well-designed positions (e.g., highway toll booths, the intersections of main streets, express roads, subway and bus stations) and some prior information (e.g., the proportion of different origin-destination pairs). When a huge number of positions is considered, the OD matrix becomes high-dimensional and hard to compute. Some previous studies [10, 41] attempted to resolve this issue through dimension reduction techniques.
However, these methods are designed to estimate the traffic flow between some specific positions and are not effective for citywide taxi preallocation, as taxi passengers can be located in any area. In contrast, our method divides a city into multiple regions and forecasts the taxi demand between these regions.
In this section, we first define some notations and then formulate the taxi origin-destination demand prediction problem based on these notations.
Region Partition: In this work, we focus on the taxi origin-destination demand prediction between regions, rather than between specific positions. There are many ways to divide a city into multiple regions in terms of different granularities and semantic meanings, such as the road network and zip code tabulation areas. Inspired by the previous works [4, 7], we partition a city into a non-overlapping grid map based on the longitude and latitude. Each rectangular grid cell represents a different geographical region in the city. The region in the $i$-th row and $j$-th column of the grid map is denoted as $R(i, j)$ in the following sections. Figure 1(a) illustrates the partitioned regions of Manhattan in New York City. With this simple partition method, the raw taxi request records can be directly transformed into a matrix or tensor, which is the most common input format for deep neural networks.
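The grid partition can be illustrated with a minimal sketch. The function name `region_of`, the bounding box, the zero-based indices and the choice that row 0 is the lowest-latitude band are all assumptions made for this example; the text does not specify them.

```python
def region_of(lat, lon, box, n_rows, n_cols):
    """Map a point to the zero-based (row, col) of its grid cell.

    box = (lat_min, lat_max, lon_min, lon_max) is a hypothetical bounding box
    of the study area. Points outside the box return None.
    """
    lat_min, lat_max, lon_min, lon_max = box
    if not (lat_min <= lat < lat_max and lon_min <= lon < lon_max):
        return None
    # Fraction of the way across the box, scaled to a cell index.
    row = int((lat - lat_min) / (lat_max - lat_min) * n_rows)
    col = int((lon - lon_min) / (lon_max - lon_min) * n_cols)
    # Guard against floating-point round-up at the far edge.
    return min(row, n_rows - 1), min(col, n_cols - 1)


# Toy 3 x 2 grid over an illustrative box (not real coordinates).
box = (0.0, 3.0, 0.0, 2.0)
cell = region_of(1.5, 0.25, box, n_rows=3, n_cols=2)
```

Because every cell has the same extent in degrees, a request's origin and destination can each be converted to a cell with two divisions, which is what makes the later tensor construction straightforward.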
Taxi Origin-Destination Demand: In taxi calling industry, the taxicab companies or online platforms, such as Didi Chuxing and Uber, would receive a large number of taxi requests from passengers every second. Each raw taxi request contains the origin location, destination location, timestamp and other information (e.g., user identification and phone number) of the passengers. In our work, the taxi origin-destination demand is defined as the total number of taxi requests from the origin region to the destination region in each time interval.
We denote the taxi origin-destination demand in time interval $t$ as a 3D matrix $X_t \in \mathbb{R}^{H \times W \times N}$, where $H$ and $W$ are the height and width of the city grid map respectively. $N$ is the total number of regions in the city and it is equal to $H \times W$. Specifically, $X_t(i, j, n)$, in which the destination index $n$ is equal to $i' \times W + j'$ (with zero-based row and column indices), is the demand from origin region $R(i, j)$ to destination region $R(i', j')$. The value of $X_t(i, j, n)$ can be measured from the taxi request records in time interval $t$. In particular, the $n$-th channel of $X_t$, denoted as $X_t^{n} \in \mathbb{R}^{H \times W}$, is the taxi demand from all regions to region $R(i', j')$. Figure 1(b) shows some channels of $X_t$ by mapping the passengers’ pick-up locations back to the geo-coordinates on Google Maps. The taxi origin demand, denoted as $O_t \in \mathbb{R}^{H \times W}$, can be easily calculated by $O_t = \sum_{n} X_t^{n}$.
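As a hedged illustration of the definition above, the following sketch counts requests into an origin-destination tensor for one time interval and sums over channels to obtain the origin demand. The channel-last layout, the zero-based destination index `i2 * W + j2`, and the helper name `build_od_tensor` are assumptions of this example.

```python
import numpy as np


def build_od_tensor(trips, H, W):
    """Count trips of one time interval into a tensor of shape (H, W, N).

    trips: iterable of ((i, j), (i2, j2)) pairs of zero-based origin and
    destination cells. Channel n = i2 * W + j2 indexes the destination.
    """
    N = H * W
    X = np.zeros((H, W, N), dtype=np.int64)
    for (i, j), (i2, j2) in trips:
        X[i, j, i2 * W + j2] += 1
    return X


# Three toy requests on a 3 x 3 grid.
trips = [((0, 0), (1, 2)), ((0, 0), (1, 2)), ((2, 1), (0, 0))]
X = build_od_tensor(trips, H=3, W=3)

# Origin demand: total requests leaving each region, summed over destinations.
O = X.sum(axis=2)
```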
Taxi Origin-Destination Demand Prediction: The taxi origin-destination demand prediction problem in our work aims to predict the taxi origin-destination demand $X_{t+1}$ in time interval $t+1$, given the data until time interval $t$. As shown in Figure 2, the taxi demand is seriously affected by the meteorological conditions, so we also incorporate the historical meteorological data to handle this task, and we denote the meteorological data in time interval $t$ as $M_t$. The collection and preprocessing of taxi demand data and meteorological data are described in Section V-A. Therefore, our final goal is to predict $X_{t+1}$ with the historical demand data $\{X_{t-k+1}, \dots, X_t\}$ and meteorological data $\{M_{t-k+1}, \dots, M_t\}$, where $k$ is the sequence length of time intervals.
IV The Proposed Method
In this section, we propose a novel Contextualized Spatial-Temporal Network (CSTN) for taxi origin-destination demand prediction. As shown in Figure 3, our network consists of three components for three types of context modeling respectively. First, the LSC module utilizes two convolutional neural networks to learn the local spatial context of taxi demand from origin view and destination view. Second, the TEC module incorporates both the local spatial features of taxi demand and the meteorological information to a ConvLSTM for the analysis of taxi demand evolution. Third, the GCC module generates the global correlation feature of each region by summing the features of all regions with the calculated similarity weights.
IV-A Local Spatial Context Modeling
Taxi demand is generally related to the local spatial location, and spatially adjacent regions may have similar demand patterns. For instance, people tend to depart from residential regions and head to employment regions in the morning rush hours. In this case, most of the residential regions in the city suburbs have high origin demands, while most of the working areas in the city center have high destination demands. The opposite holds in the evening rush hours. Recently, Yao et al. modeled the local spatial context of taxi origin demand with convolutional layers, but they neglected the context of destination demand.
In this work, our proposed LSC module simultaneously captures the local spatial context of taxi demand from both the origin view and the destination view. As described in Section III, each channel of the OD matrix $X_t$ is the taxi demand from all origin regions to the corresponding destination region, thus we define the convolution operations on $X_t$ as origin view modeling. To model the local spatial context from the destination view, we generate a DO matrix $X_t^{T}$ from $X_t$ with the transformation process described in Figure 4. Specifically, we first reshape $X_t$ into a 2D matrix of size $N \times N$ and then conduct the common transposition operation. Finally, the transposed matrix is reorganized into a 3D tensor $X_t^{T} \in \mathbb{R}^{H \times W \times N}$. Each channel of $X_t^{T}$ is the taxi demand from the corresponding origin region to all destination regions.
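The reshape, transpose and reorganize steps can be sketched in NumPy as follows. This assumes a channel-last OD tensor of shape (H, W, N) with row-major region indices; the function name `od_to_do` is introduced only for this example.

```python
import numpy as np


def od_to_do(X):
    """Turn an OD tensor (H, W, N) into a DO tensor of the same shape.

    In the input, X[i, j, n] is the demand from origin (i, j) to destination n.
    In the output, T[i2, j2, m] is the demand from origin m to destination
    (i2, j2), so each channel holds one origin's demand towards all regions.
    """
    H, W, N = X.shape
    assert N == H * W, "the channel count must equal the number of regions"
    M = X.reshape(N, N)            # M[origin, destination]
    return M.T.reshape(H, W, N)    # rows now index destinations


rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(2, 3, 6))
T = od_to_do(X)
```

Applying the transformation twice returns the original tensor, since the underlying operation is a matrix transpose.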
As shown at the bottom of Figure 3, our LSC module is implemented by a Two-View ConvNet, which takes $X_t$ and $X_t^{T}$ as input to respectively capture the local spatial context from the two views. The origin view CNN contains a stack of convolutional layers (the layer number is given in Section IV-D). Each convolutional layer has 16 filters and is followed by a Rectified Linear Unit (ReLU). To maintain the same spatial resolution, the strides of all convolutional layers are set to 1 and no pooling layers are adopted in the network. The destination view CNN has the same network structure as the origin view CNN. In time interval $t$, the origin view CNN takes $X_t$ as input and its output feature $F_t^{o}$ only contains the local spatial context of the origin view. Meanwhile, the destination view CNN takes $X_t^{T}$ as input and its output feature $F_t^{d}$ only contains the local spatial context of the destination view. To capture the integrated local spatial context, we finally fuse these two features using a convolutional layer with 32 filters. The whole pipeline of our LSC module can be expressed as:

$$F_t^{o} = \mathrm{CNN}_o(X_t;\, \theta_o), \qquad F_t^{d} = \mathrm{CNN}_d(X_t^{T};\, \theta_d), \qquad F_t^{od} = \mathrm{Conv}(F_t^{o} \oplus F_t^{d};\, w_{od}) \tag{1}$$

where $\theta_o$ and $\theta_d$ are the parameters of the origin view CNN and destination view CNN respectively. $w_{od}$ denotes the parameters of the fusion convolutional layer and $\oplus$ denotes the feature concatenation operation. $F_t^{od}$ is the final local spatial feature, which contains the local spatial context of taxi demand from both the origin view and the destination view.
IV-B Temporal Evolution Context Modeling
Taxi demand is a time-varying process and it is usually affected by diverse complicated factors. Besides its own internal states, the meteorological conditions also impact the future demand. For instance, a sustained snowfall may seriously weaken the travel willingness of residents and cause a decrease in taxi demand, as shown in Figure 2. Therefore, we incorporate the historical demand feature and the ever-changing meteorological conditions to grasp the evolving tendency of taxi demand along the temporal dimension.
Fully Connected Long Short-term Memory Network (FC-LSTM) has been proven to be powerful for temporal context modeling, but it fails to preserve the local spatial context captured by the aforementioned Two-View ConvNet. In this work, we model the temporal evolution context of taxi demand with an advanced Convolutional LSTM (ConvLSTM). Compared with FC-LSTM, ConvLSTM can preserve the structural locality of the input feature, with convolutional structures in both the input-to-state and state-to-state connections. Moreover, it can effectively accumulate the previous sequential information by maintaining a memory cell. Specifically, at iteration $\tau$, given the input $x_\tau$, the ConvLSTM updates its memory cell $c_\tau$ with an input gate $i_\tau$ and a forget gate $f_\tau$, and controls its hidden state $h_\tau$ with an output gate $o_\tau$. Its formulation can be expressed as follows:

$$
\begin{aligned}
i_\tau &= \sigma(w_{xi} * x_\tau + w_{hi} * h_{\tau-1}) \\
f_\tau &= \sigma(w_{xf} * x_\tau + w_{hf} * h_{\tau-1}) \\
o_\tau &= \sigma(w_{xo} * x_\tau + w_{ho} * h_{\tau-1}) \\
c_\tau &= f_\tau \circ c_{\tau-1} + i_\tau \circ \tanh(w_{xc} * x_\tau + w_{hc} * h_{\tau-1}) \\
h_\tau &= o_\tau \circ \tanh(c_\tau)
\end{aligned}
\tag{2}
$$

where $\circ$ denotes the Hadamard product, and $\sigma$ is the logistic sigmoid function. Symbol $*$ denotes the convolutional operator and $w_{x\cdot}$, $w_{h\cdot}$ are the parameters of the convolutional layers in ConvLSTM.
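To make the gate mechanics concrete, here is a minimal NumPy sketch of one ConvLSTM update. It assumes the common formulation without peephole connections and without bias terms, and it stacks the four gate kernels along the output-channel axis; whether the network described here uses exactly this variant is an assumption, and all function names are introduced only for illustration.

```python
import numpy as np


def conv2d_same(x, w):
    """Stride-1, zero-padded 2D convolution (cross-correlation form).

    x: (H, W, C_in); w: (k, k, C_in, C_out) with odd k. Returns (H, W, C_out).
    """
    k = w.shape[0]
    p = k // 2
    H, W, _ = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros((H, W, w.shape[3]))
    for di in range(k):
        for dj in range(k):
            # Shifted view of the padded input times one kernel tap.
            out += xp[di:di + H, dj:dj + W, :] @ w[di, dj]
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def convlstm_step(x, h_prev, c_prev, w_x, w_h):
    """One ConvLSTM update.

    w_x: (k, k, C_in, 4 * C_hid) input-to-state kernels,
    w_h: (k, k, C_hid, 4 * C_hid) state-to-state kernels,
    each holding the input, forget, output and candidate kernels in that order.
    """
    gates = conv2d_same(x, w_x) + conv2d_same(h_prev, w_h)
    i, f, o, g = np.split(gates, 4, axis=-1)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g      # Hadamard products update the memory cell
    h = o * np.tanh(c)          # the output gate controls the hidden state
    return h, c
```

Because every gate is produced by a convolution, the hidden state and memory cell keep the H x W layout of the grid map, which is the property that FC-LSTM lacks.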
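We aim to predict the taxi demand with the historical demand and the meteorological conditions of the previous $k$ time intervals. For the meteorological data $M_t$, we replicate it $H \times W$ times and construct a 3D meteorological feature $F_t^{m}$ with the same spatial size as the grid map. We combine the local spatial feature $F_t^{od}$ and $F_t^{m}$ with a convolutional layer, which is expressed as:

$$F_t^{om} = \mathrm{Conv}(F_t^{od} \oplus F_t^{m};\, w_{om}) \tag{3}$$

where $\oplus$ is the feature concatenation operation and $w_{om}$ denotes the parameters of the convolutional layer with 32 filters. $F_t^{om}$ is the local spatial feature that integrates the meteorological information.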
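A small sketch of how a per-interval meteorological vector could be replicated over the grid and concatenated with a spatial feature map before the fusion convolution. The grid size, channel counts and the name `tile_meteorology` are illustrative assumptions, and any learned transformation of the vector before tiling is omitted.

```python
import numpy as np


def tile_meteorology(m, H, W):
    """Copy a meteorological vector of length D to every cell: (H, W, D)."""
    return np.broadcast_to(m, (H, W, m.shape[0])).copy()


H, W = 4, 3                       # toy grid, not the paper's grid
F_od = np.zeros((H, W, 32))       # stand-in for the local spatial feature
m = np.array([0.2, 0.7, 0.0, 1.0])

F_m = tile_meteorology(m, H, W)
# Channel-wise concatenation, i.e. the input of the fusion convolution.
fusion_input = np.concatenate([F_od, F_m], axis=-1)
```

Tiling keeps the weather information aligned with every grid cell, so a convolution can mix it with the demand features without breaking the spatial layout.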
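We feed the meteorology-integrated features $\{F_{t-k+\tau}^{om} \mid \tau = 1, \dots, k\}$ into the ConvLSTM sequentially. At iteration $\tau$, the ConvLSTM takes $F_{t-k+\tau}^{om}$ as input and accumulates the previous sequential information to the memory cell with Eq.(2). After $k$ iterations, the hidden state of ConvLSTM is denoted as $h_k$. We generate the local spatial-temporal feature $F_{st}$ by feeding $h_k$ into a convolutional layer with $C_{st}$ filters, which is expressed as:

$$F_{st} = \mathrm{Conv}(h_k;\, w_{st}) \tag{4}$$

where $w_{st}$ denotes the parameters of the convolutional layer and $F_{st} \in \mathbb{R}^{H \times W \times C_{st}}$ encodes the temporal evolution context of the taxi demand.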
IV-C Global Correlation Context Modeling
In the above two modules, the ConvNets and ConvLSTM only capture and maintain the local context of taxi demand. However, the taxi demand distribution is also related to the attributes of the regions, e.g., most of the residential regions in different areas of the city may have high taxi demands in the morning rush hours. Therefore, even if two regions are far apart, they may still have similar taxi demand patterns as long as their attributes are consistent. We call this kind of correlation global correlation context.
Inspired by recent work, we capture the global correlation between all regions with a global feature fusion operation. Specifically, we generate the global correlation feature of each region as a weighted sum of all regional features, with the weights being calculated as the similarity between the corresponding region pairs. In this way, each region contains the information of all regions and is mainly relevant to the regions that have high similarities with it.
We detail each step of our GCC module as follows. Firstly, we feed the local spatial-temporal feature $F_{st}$ into a convolutional layer with $C_e$ filters to generate an embedded feature and then reshape it into a 2D matrix $F_e$, which can be expressed as:

$$F_e = \mathrm{Reshape}(\mathrm{Conv}(F_{st};\, w_e)) \in \mathbb{R}^{C_e \times N} \tag{5}$$

where $N$ is equal to $H \times W$ and $w_e$ denotes the convolutional parameters. Each column of $F_e$ stands for the feature of a region. We further calculate the similarity matrix $S \in \mathbb{R}^{N \times N}$ as a dot-product operation between the transposed matrix $F_e^{\top}$ and $F_e$, and perform the Softmax operation on each column of $S$, which is expressed as:

$$S = \mathrm{Softmax}(F_e^{\top} \cdot F_e) \tag{6}$$

where $\cdot$ denotes the dot product operation. $S(i, j)$ is the normalized similarity weight between the two regions with index $i$ and index $j$.
After obtaining the similarity matrix $S$, we compute the global correlation feature of each region by summing the features of all regions with the calculated similarity weights. We implement this process with a dot-product operation. We reshape $F_{st}$ to dimension $C_{st} \times N$ and then dot-product $F_{st}$ and $S$ to compute the global feature $F_g$, which is further reshaped to dimension $H \times W \times C_{st}$. The entire process can be expressed as:

$$F_g = \mathrm{Reshape}(\mathrm{Reshape}(F_{st}) \cdot S) \tag{7}$$
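The similarity-weighted fusion can be sketched in NumPy as below. The embedding convolution is left out and the embedded feature is passed in directly; the channel-last layout, the subtraction of the column maximum for numerical stability, and the name `global_correlation` are assumptions of this example.

```python
import numpy as np


def global_correlation(F_st, F_emb):
    """Similarity-weighted sum of regional features.

    F_st:  (H, W, C_st) local spatial-temporal feature.
    F_emb: (H, W, C_e) embedded feature used only to measure similarity.
    Returns the global feature (H, W, C_st) and the weights S of shape (N, N).
    """
    H, W, C_st = F_st.shape
    N = H * W
    E = F_emb.reshape(N, -1).T                 # (C_e, N): one column per region
    S = E.T @ E                                # pairwise dot-product similarity
    S = S - S.max(axis=0, keepdims=True)       # stabilise the exponentials
    S = np.exp(S)
    S = S / S.sum(axis=0, keepdims=True)       # Softmax over each column
    V = F_st.reshape(N, C_st).T                # (C_st, N)
    G = V @ S                                  # column j mixes all regions
    return G.T.reshape(H, W, C_st), S


rng = np.random.default_rng(0)
F_st = rng.normal(size=(2, 3, 4))
F_emb = rng.normal(size=(2, 3, 5))
F_g, S = global_correlation(F_st, F_emb)
```

Each column of `S` sums to one, so every region's global feature is a convex combination of all regional features, dominated by the regions most similar to it.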
The feature $F_g$ encodes the global correlation context, but lacks structural locality, which would cause performance degradation. Therefore, we generate a new feature $F_{sgt}$ by concatenating $F_{st}$ and $F_g$. The feature $F_{sgt}$ thus carries hybrid information of the local spatial context, temporal evolution context and global correlation context.
Finally, we predict the taxi origin-destination demand in time interval $t+1$, denoted as $\hat{X}_{t+1}$, by feeding $F_{sgt}$ into a linear regression, which can be formulated as:

$$\hat{X}_{t+1} = \tanh(\mathrm{Conv}(F_{sgt};\, w_p)) \tag{8}$$

where $\mathrm{Conv}(\cdot\,;\, w_p)$ is the linear regression implemented by a convolutional layer with $N$ filters and the hyperbolic tangent ensures that the output values are between -1 and 1. (During training, we use the Min-Max linear normalization method to scale the origin-destination demand matrices into the range $[-1, 1]$. During evaluation, we re-scale the predicted values back to the original values and then compare them with the ground truth.)
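A minimal sketch of Min-Max scaling into the range of the hyperbolic tangent and of the inverse mapping used at evaluation time. The function names and the choice of taking the bounds from the training data are assumptions of this example.

```python
import numpy as np


def minmax_scale(x, lo, hi):
    """Linearly map values from [lo, hi] to [-1, 1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def minmax_unscale(y, lo, hi):
    """Inverse mapping from [-1, 1] back to the original demand range."""
    return (y + 1.0) / 2.0 * (hi - lo) + lo


demand = np.array([0.0, 5.0, 20.0, 40.0])
lo, hi = demand.min(), demand.max()      # bounds assumed to come from training data
scaled = minmax_scale(demand, lo, hi)
restored = minmax_unscale(scaled, lo, hi)
```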
IV-D Implementation Details
We implement our Contextualized Spatial-Temporal Network with TensorFlow. In LSC module, the layer number is set to 3, which means each ConvNet consists of three convolutional layers. In TEC module, all convolutional layers in ConvLSTM have 32 filters and the channel number of feature is set to 75. In GCC module, the channel number of feature is set to 64. For the whole model, the filter parameters of all convolutional layers and the fully-connected layers are initialized by the Xavier method. The size of a minibatch is set to 64. The learning rate is decayed from its initial value by a factor of 0.1 every 200 epochs. We optimize our network in an end-to-end manner via Adam optimization by minimizing the Euclidean loss between the ground truth and the predicted result. It takes 7 hours to train our network for 700 epochs with an NVIDIA K80 GPU.
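The step-wise learning rate decay described above can be written as a one-line schedule. The initial rate is a free parameter in this sketch (the value `1.0` below is a placeholder chosen only to make the decay factors visible), and `step_lr` is a hypothetical helper name.

```python
def step_lr(epoch, base_lr, factor=0.1, step=200):
    """Learning rate after multiplying by `factor` every `step` epochs."""
    return base_lr * factor ** (epoch // step)


# Decay factors over a 700-epoch run, relative to a placeholder base rate.
schedule = [step_lr(e, base_lr=1.0) for e in (0, 199, 200, 400, 699)]
```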
In this section, we first build a large scale benchmark of taxi origin-destination demand prediction. We then introduce the evaluation metrics of this task and further compare our proposed method with several state-of-the-art methods. Finally, we conduct extensive component analysis to demonstrate the effectiveness of each module of our model.
V-A NYC-TOD Dataset
To the best of our knowledge, there are no public datasets for the citywide taxi origin-destination demand prediction. To evaluate the performances of all compared methods and further promote the relevant research, we also create the first benchmark for this task, denoted as NYC-TOD. It is composed of two data categories, including taxi origin-destination demand data and meteorological data of the New York City in 2014. We choose the data of the last sixty days as the testing set, and all data before that as the training set.
Taxi Origin-Destination Demand Data: New York City (NYC) is one of the most prosperous cities in the world and its taxi industry is extremely developed. The origin and destination locations of most NYC taxi trips are in the Manhattan borough, therefore we choose Manhattan as the study area in our work. As discussed in Section III, we first divide Manhattan into a grid map based on the longitude and latitude. Each grid cell represents a geographical region. The detailed partitioned regions of Manhattan are shown in Figure 1.
We use the NYC yellow taxi trip records in 2014 to construct our taxi origin-destination demand prediction dataset. These data were collected by the New York City Taxi and Limousine Commission (NYCTLC, http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml). Each raw trip record contains the timestamp and the geo-coordinates of the origin and destination locations. After excluding the trips whose origin or destination locations are not in the Manhattan borough, we obtain 132 million taxi trip records. Finally, we generate the taxi origin-destination demand matrix in each time interval by counting the number of taxi trips between all regions according to the timestamps and geo-coordinates of the taxi trip records. Each time interval is set to half an hour in this dataset. The total number of taxi demands in each day is summarized in Figure 5, which shows that more than ten million taxi requests are made in NYC per month. The spatial distribution of taxi demand is shown in Figure 1(b), and we can observe that most taxi demands gather in the city center and traffic hubs.
Meteorological Data: We collect the NYC meteorological data in each time interval from Wunderground (https://www.wunderground.com/), which is a well-known meteorological information provider. As the meteorological conditions of all regions are quite similar in the same time interval, we treat the meteorological data observed at the Central Park Station as that of the whole Manhattan borough. We consider the effect of temperature, windchill, humidity, visibility, wind speed, precipitation and weather conditions in our study. The categories of weather condition and the ranges of the other six meteorological indicators are shown in Table II. Further, the weather condition is digitized with One-Hot Encoding, while the other six numeric indicators are scaled into the range [0,1] with Min-Max linear normalization. Finally, the meteorological data in time interval $t$ can be denoted as a vector $M_t$.
|Meteorological Indicator||Range / Categories|
|Temperature / ℃||[-18.3, 35.6]|
|Windchill / ℃||[-28.4, 38.5]|
|Humidity / %||[9, 100]|
|Visibility / km||[0.4, 16.1]|
|Wind Speed / km/h||[0.0, 137.0]|
|Precipitation / mm||[0.0, 28.7]|
|Weather Condition||23 types (e.g., Sunny, Rainy, Snowy and Unknown)|
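The preprocessing of one meteorological record can be sketched as follows. The numeric ranges are taken from Table II; the list of weather conditions is a four-item placeholder, since only four of the 23 types are named in the text, and the dictionary keys and the function name `encode_meteorology` are hypothetical.

```python
import numpy as np

# Ranges of the six numeric indicators, as listed in Table II.
RANGES = {
    "temperature": (-18.3, 35.6),
    "windchill": (-28.4, 38.5),
    "humidity": (9.0, 100.0),
    "visibility": (0.4, 16.1),
    "wind_speed": (0.0, 137.0),
    "precipitation": (0.0, 28.7),
}
# Placeholder subset of the 23 weather condition types.
CONDITIONS = ["Sunny", "Rainy", "Snowy", "Unknown"]


def encode_meteorology(numeric, condition):
    """Min-Max scale the numeric indicators to [0, 1] and one-hot the condition."""
    scaled = [(numeric[k] - lo) / (hi - lo) for k, (lo, hi) in RANGES.items()]
    one_hot = [1.0 if c == condition else 0.0 for c in CONDITIONS]
    return np.array(scaled + one_hot)


sample = {"temperature": 35.6, "windchill": -28.4, "humidity": 54.5,
          "visibility": 16.1, "wind_speed": 0.0, "precipitation": 0.0}
m = encode_meteorology(sample, "Rainy")
```

With the full list of 23 condition types in place of the placeholder, the same layout would give a vector of 6 + 23 = 29 entries.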
V-B Evaluation Metric
Following the previous works [7, 4], we adopt the Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE) as the metrics to evaluate the performance of all methods, which are defined as:

$$\mathrm{MAPE} = \frac{1}{z} \sum_{t=1}^{z} \frac{|\hat{X}_t - X_t|}{X_t}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{z} \sum_{t=1}^{z} (\hat{X}_t - X_t)^2}$$

where $z$ is the total number of testing samples, and $\hat{X}_t$ and $X_t$ are the predicted taxi demand and the corresponding ground truth in time interval $t$ respectively. As described in Section IV, the input and output of our proposed network are normalized into the range $[-1, 1]$ during training, so when evaluating, we re-scale the predicted values back to the original values and then compare them with the ground truth.
In our experiment, we not only evaluate the performance on the task of taxi origin-destination demand prediction, but also consider the task of taxi origin demand prediction. As described in Section III, the predicted origin demand $\hat{O}_t$ can be calculated from the predicted origin-destination demand $\hat{X}_t$ by summing over all destination channels, i.e., $\hat{O}_t = \sum_{n} \hat{X}_t^{n}$. For convenience in the following sections, the MAPE and RMSE of the former task are denoted as OD-MAPE and OD-RMSE, while these two metrics of the latter task are denoted as O-MAPE and O-RMSE. When evaluating, we follow previous work and filter out the origin-destination pairs or the origin regions with ground truth less than 5 in each time interval, since such low taxi demand is usually negligible in real-world applications.
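The filtered metrics can be sketched as below, treating every retained origin-destination pair (or origin region) in every interval as one sample. The function name `mape_rmse` and the elementwise treatment are assumptions of this example; the threshold of 5 follows the text.

```python
import numpy as np


def mape_rmse(pred, truth, min_demand=5):
    """MAPE and RMSE over the entries whose ground truth is at least min_demand."""
    mask = truth >= min_demand
    p = pred[mask].astype(float)
    g = truth[mask].astype(float)
    mape = np.mean(np.abs(p - g) / g)
    rmse = np.sqrt(np.mean((p - g) ** 2))
    return mape, rmse


truth = np.array([10.0, 20.0, 2.0])
pred = np.array([12.0, 18.0, 100.0])
od_mape, od_rmse = mape_rmse(pred, truth)   # the entry with truth 2 is ignored

# For the origin demand task, sum a (H, W, N) tensor over its channel axis first.
X_true = np.full((2, 2, 4), 3.0)
X_pred = np.full((2, 2, 4), 3.5)
o_mape, o_rmse = mape_rmse(X_pred.sum(axis=-1), X_true.sum(axis=-1))
```

Filtering also keeps the MAPE well defined, since entries with zero ground truth would otherwise cause a division by zero.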
V-C Comparison with the State-of-the-Art
We compare the performance of our proposed method with the following basic and advanced methods. We tune the parameters of all methods and report their best performance.
Historical Average (HA): Historical Average predicts the future demand by averaging the historical demands. There are two implemented variants: (1) HA-All averages the historical demands in the same time interval of every day over the whole training set; (2) HA-Rec averages the taxi demands of the previous $k$ time intervals.
Linear Regression (LR): We implement two typical linear regression methods: Ordinary Least Squares Regression (OLSR) and Lasso Regression with $\ell_1$-norm regularization. They take the concatenation of the demand matrices of the previous $k$ time intervals as input and predict the taxi demand between any two regions.
Multiple Layer Perceptron (MLP): A neural network consisting of four fully connected layers with 128, 128, 64 and 75 neurons respectively. The MLP forecasts every channel of the future demand matrix by taking the corresponding channels of the demand matrices of the previous $k$ time intervals as input.
ST-ResNet: ST-ResNet is a deep learning based method that predicts the future traffic inflow and outflow. We utilize its released code (https://github.com/lucktroy/DeepST/tree/master/scripts/papers/AAAI17) to predict the taxi origin-destination demand.
ConvLSTM: ConvLSTM is our LSC module + TEC module. Specifically, the LSC module in this network only contains the origin view ConvNet and takes the OD matrix $X_t$ as input to learn the local spatial context.
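Of the baselines listed above, the two Historical Average variants are simple enough to sketch directly. The array layout (time first), the alignment of index 0 with the first slot of a day, and the function names are assumptions of this example; with half-hour intervals a day has 48 slots.

```python
import numpy as np


def ha_all(history, slots_per_day, target_slot):
    """Average the demand observed in the same daily slot over all training days.

    history: array of shape (T, ...) whose index 0 is slot 0 of the first day.
    """
    return history[target_slot % slots_per_day::slots_per_day].mean(axis=0)


def ha_rec(history, k):
    """Average the demand of the k most recent time intervals."""
    return history[-k:].mean(axis=0)


# Toy history: 2 days x 3 slots per day, each demand matrix filled with its index.
history = np.stack([np.full((2, 2), float(t)) for t in range(6)])
same_slot = ha_all(history, slots_per_day=3, target_slot=1)   # uses t = 1 and t = 4
recent = ha_rec(history, k=2)                                  # uses t = 4 and t = 5
```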
[Table III: comparison of all methods on the whole NYC-TOD testing set]
Performance on the Whole Testing Set: We first compare our proposed method with the other methods on the whole NYC-TOD testing set. The results of all methods are summarized in Table III, and our method outperforms all competing methods. Specifically, our method achieves the lowest MAPE and RMSE on the task of taxi origin-destination demand prediction. Moreover, for taxi origin demand prediction, our method achieves 7.1% and 5.6% relative improvements in O-MAPE and O-RMSE over the best-performing existing method, ConvLSTM. Figure 7 shows the taxi origin-destination demands and taxi origin demands predicted by our CSTN. We can observe that our method robustly forecasts taxi demands of different scales.
Although some competing methods (such as MLP, ST-ResNet and ConvLSTM) also adopt deep learning techniques to predict the taxi demand, they perform worse than our CSTN. The main reasons are that MLP fails to capture the local spatial context, ST-ResNet does not explicitly learn the temporal evolution context, and ConvLSTM does not model the global correlation context. In contrast, our method integrates all of these contexts into a unified framework to predict the taxi demand in future time intervals.
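For readers who want to reproduce the four reported quantities (OD-MAPE, OD-RMSE, O-MAPE, O-RMSE), the sketch below uses the standard definitions of MAPE and RMSE. It rests on assumptions that are not taken from the paper: the OD demand is stored as a nested list `od[origin][destination]`, the origin demand of a region is the row sum of that matrix, and entries with zero ground truth are skipped in the MAPE. All numbers are toy values.

```python
import math


def origin_demand(od):
    """Origin demand per region, assumed to be the row sum of the OD matrix."""
    return [sum(row) for row in od]


def flatten(od):
    """Flatten an OD matrix into a list of pairwise demands."""
    return [v for row in od for v in row]


def rmse(pred, true):
    """Root mean squared error over all entries."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def mape(pred, true):
    """Mean absolute percentage error (in %), skipping zero ground truth."""
    terms = [abs(p - t) / t for p, t in zip(pred, true) if t != 0]
    return 100.0 * sum(terms) / len(terms)


# Toy 2-region example (hypothetical values).
true_od = [[10, 20], [5, 0]]
pred_od = [[12, 18], [5, 3]]

od_mape = mape(flatten(pred_od), flatten(true_od))
od_rmse = rmse(flatten(pred_od), flatten(true_od))
o_mape = mape(origin_demand(pred_od), origin_demand(true_od))
o_rmse = rmse(origin_demand(pred_od), origin_demand(true_od))
```

Skipping zero-demand entries is only one possible convention; a demand threshold would be an equally plausible choice and would change the reported percentages.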
Table IV: Results on the high-demand regions.
Performance on the High-Demand Regions: As shown in Figure 1(b), the spatial distribution of taxi demand is not uniform and most of the demand is concentrated in a few regions, so taxicab companies may give priority to meeting the demand of these regions. In this section, we evaluate the performance of all compared methods on the high-demand regions. We first measure the taxi origin demand of each region on the whole training set of NYC-TOD and then choose the twenty regions with the highest demands. These regions cover about 70% of the taxi demand in Manhattan. We evaluate only the origin-destination demand between these regions and the origin demand within these regions. As shown in Table IV, our method achieves the best performance on the high-demand regions. Specifically, our method outperforms ConvLSTM by about 1% in MAPE for both types of taxi demand prediction.
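The region selection described above can be sketched as follows. This is an illustrative sketch under assumptions: each training sample is a nested list `od[origin][destination]`, and the helper names `top_k_regions` and `restrict` are hypothetical. In the paper's setting `k` would be 20.

```python
def top_k_regions(train_od_matrices, k):
    """Return the indices of the k regions with the largest total origin demand."""
    n = len(train_od_matrices[0])
    totals = [0.0] * n
    for od in train_od_matrices:
        for origin, row in enumerate(od):
            totals[origin] += sum(row)
    ranked = sorted(range(n), key=lambda r: totals[r], reverse=True)
    return sorted(ranked[:k])


def restrict(od, regions):
    """Keep only OD pairs whose origin and destination are both selected."""
    return [[od[o][d] for d in regions] for o in regions]


# Toy training set with 3 regions and 2 time intervals (hypothetical values).
train = [
    [[5, 1, 0], [0, 1, 0], [2, 3, 4]],
    [[6, 2, 1], [1, 0, 0], [3, 3, 3]],
]
hot = top_k_regions(train, k=2)   # regions 0 and 2 have the largest totals
sub = restrict(train[0], hot)     # OD demand between the selected regions
```

Evaluation metrics would then be computed on the restricted matrices only.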
Performance on Different Days: In this section, we compare the performance of all methods on different days of the week. We exclude HA-All and HA-Rec from the following experiments because they perform very poorly. Here we report only the OD-MAPE of the remaining methods; a similar phenomenon also occurs for O-MAPE. As shown in Figure 6, our method consistently outperforms the other competing methods on all days of the week. Furthermore, we also average the performance of all methods over weekdays and weekends. The results are summarized in Table V, and our method still achieves the best performance. We can observe that the performance of the three shallow methods on weekdays is comparable to that on weekends. In contrast, the four deep learning based methods perform better on weekdays than on weekends. Yao et al. also found this phenomenon, and one main reason is that the taxi demand patterns are less regular on weekends. We can conclude that the deep learning based methods are better at capturing the regular patterns on weekdays than the less conspicuous patterns on weekends.
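The weekday/weekend averaging can be illustrated with a short sketch. The per-day error values below are hypothetical placeholders, not results from the paper, and the function name is an assumption for the example.

```python
from collections import defaultdict
from datetime import date


def mean_error_by_day_type(daily_errors):
    """Average per-day errors (e.g. OD-MAPE) separately for weekdays and weekends."""
    buckets = defaultdict(list)
    for day, err in daily_errors.items():
        # date.weekday() returns 5 for Saturday and 6 for Sunday.
        key = "weekend" if day.weekday() >= 5 else "weekday"
        buckets[key].append(err)
    return {k: sum(v) / len(v) for k, v in buckets.items()}


# Hypothetical per-day errors in percent.
errors = {
    date(2015, 12, 4): 27.0,  # Friday
    date(2015, 12, 5): 29.0,  # Saturday
    date(2015, 12, 6): 31.0,  # Sunday
    date(2015, 12, 7): 25.0,  # Monday
}
summary = mean_error_by_day_type(errors)
```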
V-D Component Analysis
Influence of Different Context: Our full model consists of three components for three types of context modeling. To explore the influence of different context on taxi demand prediction, we implement the following variants of our model with different components:
LSC Net: This network only contains the LSC module and it directly concatenates the local spatial features of each time interval to predict the future taxi demand with a convolutional layer.
LSC+TEC Net: This network contains the LSC module and TEC module, but without the GCC module. It feeds the last hidden state of the TEC module into a convolutional layer to predict the taxi demand.
LSC+TEC+GCC Net: As the full version of CSTN, this network integrates the local spatial context, temporal evolution context and global correlation context to predict the taxi demand.
As shown in Table VI, the LSC Net achieves an OD-MAPE of 28.54% and an O-MAPE of 20.80%. It outperforms ST-ResNet, which has more convolutional layers, because our LSC module adequately captures the local spatial context with the Two-View ConvNet. When the temporal evolution context of taxi demand is explicitly modeled with LSTM, the LSC+TEC Net obtains an OD-MAPE of 27.80% and an O-MAPE of 19.41%, a clear improvement over the LSC Net. After integrating the global correlation context with the GCC module, the LSC+TEC+GCC Net further decreases the OD-MAPE to 27.27% and the O-MAPE to 18.48%, with 2.5% relative performance improvement on average. The experimental results show that our network achieves notable performance improvements by modeling these contexts, which also indicates their effectiveness for the task of taxi demand prediction.
|Method||OD-MAPE||O-MAPE|
|Origin View + Destination View||28.54%||20.80%|
Effectiveness of the Two-View ConvNet in the LSC module: As described in Section IV-A, we use a Two-View ConvNet to model the local spatial context from the origin view and the destination view. To validate the effectiveness of the Two-View ConvNet, we train a variant of LSC Net that takes only the OD matrix as input and learns the local spatial context from the origin view. As shown in Table VII, the LSC Net performs noticeably worse with only the origin view ConvNet. Adding the destination view improves the OD-MAPE and O-MAPE by 0.5% and 2.03% respectively. The experimental results show that the destination view context is also beneficial for taxi demand prediction and that our LSC module captures the local spatial context effectively.
Influence of Local and Global Context: As described in Section IV-C, we forecast the taxi demand with the concatenation of the local feature and the global feature. To analyze how these two features contribute to the performance, we train two other variants of CSTN that predict the taxi demand with only the local feature or only the global feature. As shown in Table VIII, the local feature performs better than the global feature, which indicates that the local feature is more effective for this task. When the local and global features are combined for the final prediction, our method achieves the best performance, which shows that the local context and the global context are complementary for taxi demand prediction.
Influence of Meteorology: In this section, we explore the influence of meteorology on taxi demand prediction. We train another LSC+TEC Net and another CSTN without the meteorological data. As shown in Table IX, without the meteorological data, the LSC+TEC Net and CSTN obtain an O-MAPE of 20.03% and 19.72% respectively. In contrast, when predicting the taxi demand with meteorological data, the LSC+TEC Net and CSTN decrease the O-MAPE to 19.41% and 18.48%, with 3.1% and 6.23% relative performance improvement. Moreover, we also verify the relevance of each variable in the meteorological data by filtering it out at inference time. When exploring the effect of the weather condition, we set it to the type Unknown for all time intervals. Each of the other numeric indicators is set to its mean value over the whole training set. The changes in performance are shown in Table X, and we can see that the OD-MAPE and O-MAPE increase to some extent when the meteorological variables are filtered. These experiments show that the meteorological information helps to improve the performance of taxi demand prediction.
|Method||OD-MAPE||O-MAPE|
|LSC+TEC Net W/O Meteorology||28.08%||20.03%|
|LSC+TEC Net W/ Meteorology||27.80%||19.41%|
|CSTN W/O Meteorology||27.69%||19.72%|
|CSTN W/ Meteorology||27.37%||18.48%|
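The relative improvements quoted for these meteorology results are computed as the error reduction divided by the baseline error. A minimal sketch, assuming this standard definition (the function name is hypothetical), reproduces the figure for the LSC+TEC Net:

```python
def relative_improvement(baseline_error, improved_error):
    """Relative error reduction in percent: 100 * (baseline - improved) / baseline."""
    return 100.0 * (baseline_error - improved_error) / baseline_error


# O-MAPE of the LSC+TEC Net without and with meteorological data.
lsc_tec_gain = relative_improvement(20.03, 19.41)
lsc_tec_gain_rounded = round(lsc_tec_gain, 1)  # 3.1
```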
Table X: Changes in performance when filtering each variable in meteorological data.
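The variable-filtering procedure described above (type Unknown for the weather condition, training-set mean for each numeric indicator) can be sketched as follows. The record layout and the variable names `condition`, `temperature` and `wind_speed` are hypothetical and chosen only for illustration.

```python
def filter_variable(records, name, train_records):
    """Return copies of `records` in which variable `name` carries no information."""
    if name == "condition":
        # Categorical weather condition: replace with the type Unknown.
        fill = "Unknown"
    else:
        # Numeric indicator: replace with its mean over the training set.
        fill = sum(r[name] for r in train_records) / len(train_records)
    return [{**r, name: fill} for r in records]


# Hypothetical meteorological records.
train_weather = [
    {"condition": "Rain", "temperature": 10.0, "wind_speed": 4.0},
    {"condition": "Clear", "temperature": 20.0, "wind_speed": 2.0},
]
test_weather = [{"condition": "Snow", "temperature": -1.0, "wind_speed": 9.0}]

no_temperature = filter_variable(test_weather, "temperature", train_weather)
no_condition = filter_variable(test_weather, "condition", train_weather)
```

The trained model would then be evaluated on the filtered records, and the increase in error indicates how much the model relies on that variable.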
Influence of Sequence Length: As described in Section IV, our model can be implemented with different sequence lengths of time intervals. To explore the influence of the sequence length, we train our model with different sequence lengths. As shown in Table XI, the OD-MAPE and O-MAPE gradually decrease as the sequence length increases. Our method achieves the best performance with five time intervals (2.5 hours), and longer sequences hardly yield any further improvement. One potential reason is that the future taxi demand is more relevant to the short-term tendency. Therefore, feeding an overly long sequence into the network no longer helps to boost the performance, and we set the sequence length to 5 in all experiments.
V-E Further Discussion
Table XII: Running times on an NVIDIA 1080 GPU.
Runtime Efficiency: In this subsection, we compare the running times of different methods for taxi origin-destination demand prediction. As shown in Table XII, all deep learning based methods achieve practical runtime efficiency on an NVIDIA 1080 GPU. Specifically, our CSTN takes only 1.187 ms to predict the taxi demand of the next time interval, which is acceptable for industrial applications. The traditional methods (such as Lasso, OLSR and XGBoost) are implemented on CPU, so we evaluate their running times on an Intel Xeon E5-2620 2.40GHz CPU. Lasso and OLSR can conduct a prediction within 0.381 ms, while XGBoost requires 5.666 ms for each inference, as it processes the prediction of each region pair independently. In summary, all compared methods run in real time and runtime efficiency is not the bottleneck of this task; thus more attention should be paid to improving the accuracy of taxi demand prediction.
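A per-inference latency such as the ones reported here is typically obtained by averaging over many repeated calls after a warm-up phase. The sketch below shows one way to do this; `dummy_predict` is a hypothetical stand-in for a trained model, and the measurement protocol is an assumption rather than the one used in the paper.

```python
import time


def mean_latency_ms(predict, sample, repeats=100, warmup=10):
    """Average wall-clock time of one call to `predict`, in milliseconds."""
    for _ in range(warmup):
        predict(sample)  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for _ in range(repeats):
        predict(sample)
    return (time.perf_counter() - start) * 1000.0 / repeats


def dummy_predict(x):
    """Hypothetical stand-in for a trained demand predictor."""
    return [v * 2 for v in x]


latency = mean_latency_ms(dummy_predict, list(range(75)))
```

When timing a GPU model, the device should be synchronized before reading the clock, since GPU kernels are usually launched asynchronously.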
Long-Term Taxi Demand Prediction: In this subsection, we extend our CSTN to predict the long-term taxi demand. Here, we take the historical demand data and meteorological data of several previous time intervals to forecast the demand of multiple future time intervals, where the numbers of input and predicted time intervals are set to 5 and 6 respectively in our experiment. The long-term prediction version of CSTN is denoted as L-CSTN and its architecture is shown in Figure 8. L-CSTN first encodes the historical data with the LSC module and generates a temporal feature with the TEC module. This feature is then fed into decoding ConvLSTM units, each of which is followed by a GCC module to predict the taxi demand. The performance of long-term taxi demand prediction is shown in Table XIII. Our L-CSTN achieves an OD-MAPE of 28.63% and an O-MAPE of 20.50% for the first predicted time interval. As the prediction horizon increases, the performance gradually drops. For the last predicted time interval, although the OD-MAPE and O-MAPE increase to 30.86% and 24.85%, the estimated result is still practical for taxi preallocation.
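The training samples for such a multi-step setting can be built with a sliding window over the demand sequence. The following sketch assumes 5 input intervals and 6 target intervals per sample; the function name and the use of integers in place of demand matrices are illustrative assumptions.

```python
def make_windows(series, n_in, n_out):
    """Slide over `series` and return (inputs, targets) pairs of the given lengths."""
    samples = []
    for start in range(len(series) - n_in - n_out + 1):
        inputs = series[start:start + n_in]
        targets = series[start + n_in:start + n_in + n_out]
        samples.append((inputs, targets))
    return samples


# Each element stands in for the demand matrix of one time interval.
series = list(range(12))
samples = make_windows(series, n_in=5, n_out=6)
```

With 12 intervals this yields two samples; setting `n_out=1` recovers the single-step setting used in the other experiments.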
Different Region Partition Manners: In this subsection, we explore the performance of different region partition manners. Geographical coordinates (longitude and latitude) are widely used to generate rectangular regions [4, 7, 53], while land use homogeneity is another good foundation for region partition. According to the PLUTO (Primary Land Use Tax Lot Output) dataset (https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page), we visualize the land types of each building block of the Manhattan borough in Figure 9 and find that multiple categories of land may exist in a local region. Inspired by previous work, we first generate multiple areas on the basis of the Zip Code Tabular (ZCT) and then manually adjust their boundaries according to the land use homogeneity. The final 44 regions with different shapes are shown in Figure 9, and each region has relatively consistent land use. In this case, the historical taxi origin-destination demand in each time interval is organized as a 2D matrix of dimension N × N, where N = 44 is the number of regions and the entry (i, j) denotes the demand from origin region i to destination region j. We reconstruct the NYC-TOD dataset with the new region coordinates and retrain the compared methods. Specifically, since this matrix lacks spatial information, our CSTN utilizes a CNN with four convolutional layers to encode it and then feeds the resulting feature to the TEC module.
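The 2D organization of the OD demand under this partition can be illustrated by counting trips per (origin region, destination region) pair. This is a sketch under assumptions: each trip is already mapped to a pair of region indices, and the function name is hypothetical. The toy example uses 3 regions, whereas the partition in the text has 44.

```python
def build_od_matrix(trips, num_regions):
    """Count trips per (origin region, destination region) pair in one interval."""
    od = [[0] * num_regions for _ in range(num_regions)]
    for origin, destination in trips:
        od[origin][destination] += 1
    return od


# Hypothetical trips of one time interval, given as (origin, destination) indices.
trips = [(0, 2), (0, 2), (1, 0), (2, 2)]
od = build_od_matrix(trips, num_regions=3)
```

Because rows and columns are ordered by region index rather than by position on a grid, neighbouring entries of this matrix do not necessarily correspond to neighbouring regions, which is why a plain convolution over it captures little local spatial context.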
The OD-MAPE of the four deep learning based methods under different region partition manners is shown in Table XIV. We can observe that convincing performance can be achieved with the manner “ZCT + Land Use Homogeneity”, but it is still slightly worse than with the manner “Geographical Coordinate”. The main reason is that the N × N data organization format of “ZCT + Land Use Homogeneity”, where N is the total number of regions, cannot preserve the local spatial information of the taxi demand well. How to boost the performance with both the local spatial information and the land use homogeneity is worth exploring in future work.
|Manner||Geographical Coordinate||ZCT + Land Use Homogeneity|
In this paper, we introduce a task that deserves more exploration, taxi origin-destination demand prediction, which aims at predicting the taxi demand between all regions in future time intervals. We argue that the information of passengers' destinations is also critical for taxi preallocation systems, since some factors (e.g., the city management rules and the individual preferences of drivers) may affect the supply of available taxis between two regions, as mentioned in Section I. Therefore, it is essential to combine the predicted taxi OD demand and the aforementioned external factors to optimize the taxi preallocation scheme.
We address this problem with a Contextualized Spatial-Temporal Network (CSTN), which integrates local spatial context, temporal evolution context and global correlation context in one unified framework. By learning the taxi demand patterns from historical data, the proposed CSTN can make taxi demand predictions for all region pairs. 132 million taxi trip records of New York City are used to train and evaluate our model. Experimental results show that our model achieves an OD-MAPE of 24.93% and an O-MAPE of 12.92%, outperforming other state-of-the-art methods on both tasks of taxi OD demand prediction and origin demand prediction. Further, we extend our CSTN to predict the long-term taxi demand, and our method achieves very practical performance.
How to divide a city into different regions is still an open problem. In future work, we will explore a better region partition manner, with which the spatial information and the land use homogeneity information can be used efficiently at the same time. Meanwhile, our work can be extended by adding more information to the network, such as the periodic taxi demand and the Points of Interest (POI) in each region, which may help to further boost the performance. Finally, we will cooperate with some taxicab requesting platforms and optimize their taxi preallocation systems with the predicted OD demand and the aforementioned external factors. Such systems are expected to reduce the inefficient operations of the taxi industry.
-  X. Zhan, X. Qian, and S. V. Ukkusuri, “A graph-based approach to measuring the efficiency of an urban taxi service system,” TITS, vol. 17, no. 9, pp. 2479–2489, 2016.
-  L. Zhang, T. Hu, Y. Min, G. Wu, J. Zhang, P. Feng, P. Gong, and J. Ye, “A taxi order dispatch model based on combinatorial optimization,” in ACM SIGKDD. ACM, 2017, pp. 2151–2159.
-  H. Yang, Y. W. Lau, S. C. Wong, and H. K. Lo, “A macroscopic taxi model for passenger demand, taxi utilization and level of services,” Transportation, vol. 27, no. 3, pp. 317–340, 2000.
-  J. Zhang, Y. Zheng, and D. Qi, “Deep spatio-temporal residual networks for citywide crowd flows prediction.” in AAAI, 2017, pp. 1655–1661.
-  W. Jin, Y. Lin, Z. Wu, and H. Wan, “Spatio-temporal recurrent convolutional networks for citywide short-term crowd flows prediction,” in ICCDA. ACM, 2018, pp. 28–35.
-  Y. Tong, Y. Chen, Z. Zhou, L. Chen, J. Wang, Q. Yang, J. Ye, and W. Lv, “The simpler the better: a unified approach to predicting original taxi demands based on large-scale online platforms,” in ACM SIGKDD. ACM, 2017, pp. 1653–1662.
-  H. Yao, F. Wu, J. Ke, X. Tang, Y. Jia, S. Lu, P. Gong, and J. Ye, “Deep multi-view spatial-temporal network for taxi demand prediction,” arXiv preprint arXiv:1802.08714, 2018.
-  F. Toqué, E. Côme, M. K. El Mahrsi, and L. Oukhellou, “Forecasting dynamic public transport origin-destination matrices with long-short term memory recurrent neural networks,” in ITSC. IEEE, 2016, pp. 1071–1076.
-  A. Azzouni and G. Pujolle, “A long short-term memory recurrent neural network framework for network traffic matrix prediction,” arXiv preprint arXiv:1705.05690, 2017.
-  C. Yang, F. Yan, and X. Xu, “Daily metro origin-destination pattern recognition using dimensionality reduction and clustering methods,” in ITSC. IEEE, 2017, pp. 548–553.
-  X. Zhou, Y. Shen, Y. Zhu, and L. Huang, “Predicting multi-step citywide passenger demands using attention-based neural networks,” in ICWSDM. ACM, 2018, pp. 736–744.
-  L. Moreira-Matias, J. Gama, M. Ferreira, J. Mendes-Moreira, and L. Damas, “Predicting taxi passenger demand using streaming data,” TITS, vol. 14, no. 3, pp. 1393–1402, 2013.
-  X. Qian, S. V. Ukkusuri, C. Yang, and F. Yan, “Forecasting short-term taxi demand using boosting-gcrf,” 2017.
-  J. Ke, H. Zheng, H. Yang, and X. M. Chen, “Short-term forecasting of passenger demand under on-demand ride services: A spatio-temporal deep learning approach,” Transportation Research Part C: Emerging Technologies, vol. 85, pp. 591–608, 2017.
-  X. Liang, C. Xu, X. Shen, J. Yang, S. Liu, J. Tang, L. Lin, and S. Yan, “Human parsing with contextualized convolutional neural network,” in ICCV, 2015, pp. 1386–1394.
-  R. Zhao, W. Ouyang, H. Li, and X. Wang, “Saliency detection by multi-context deep learning,” in CVPR, 2015, pp. 1265–1274.
-  Z. Li, Y. Gan, X. Liang, Y. Yu, H. Cheng, and L. Lin, “Lstm-cf: Unifying context modeling and fusion with lstms for rgb-d scene labeling,” in ECCV. Springer, 2016, pp. 541–557.
-  S. Xingjian, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c. Woo, “Convolutional lstm network: A machine learning approach for precipitation nowcasting,” in NIPS, 2015.
-  J. Sun, D. Papadias, Y. Tao, and B. Liu, “Querying about the past, the present, and the future in spatio-temporal databases,” in ICDE. IEEE, 2004, pp. 202–213.
-  A. Anwar, M. Volkov, and D. Rus, “Changinow: A mobile application for efficient taxi allocation at airports,” in ITSC. IEEE, 2013, pp. 694–701.
-  K. Zhang, Z. Feng, S. Chen, K. Huang, and G. Wang, “A framework for passengers demand prediction and recommendation,” in SCC. IEEE, 2016, pp. 340–347.
-  Z. Qiu, L. Liu, G. Li, Q. Wang, N. Xiao, and L. Lin, “Taxi origin-destination demand prediction with contextualized spatial-temporal network,” in ICME, 2019.
-  J. Xu, R. Rahmatizadeh, L. Bölöni, and D. Turgut, “Real-time prediction of taxi demand using recurrent neural networks,” TITS, 2017.
-  J. Yuan, Y. Zheng, L. Zhang, X. Xie, and G. Sun, “Where to find my next passenger,” in ICUC. ACM, 2011, pp. 109–118.
-  X. Li, G. Pan, Z. Wu, G. Qi, S. Li, D. Zhang, W. Zhang, and Z. Wang, “Prediction of urban human mobility using large-scale taxi traces and its applications,” Frontiers of Computer Science, vol. 6, no. 1, pp. 111–121, 2012.
-  T. Chen, L. Lin, L. Liu, X. Luo, and X. Li, “Disc: Deep image saliency computing via progressive representation learning,” TNNLS, vol. 27, no. 6, pp. 1135–1149, 2016.
-  L. Liu, H. Wang, G. Li, W. Ouyang, and L. Lin, “Crowd counting using deep recurrent spatial-aware network,” in IJCAI, 2018.
-  G. Li and Y. Yu, “Contrast-oriented deep neural networks for salient object detection,” IEEE Transactions on Neural Networks and Learning Systems, 2018.
-  W. Ouyang, H. Zhou, H. Li, Q. Li, J. Yan, and X. Wang, “Jointly learning deep features, deformable parts, occlusion and classification for pedestrian detection,” TPAMI, pp. 1874–1887, 2018.
-  Z. Qiu, L. Liu, G. Li, Q. Wang, N. Xiao, and L. Lin, “Crowd counting via multi-view scale aggregation networks,” in ICME, 2019.
-  L. Liu, G. Li, Y. Xie, Y. Yu, Q. Wang, and L. Lin, “Facial landmark machines: A backbone-branches architecture with progressive representation learning,” TMM, 2019.
-  D. Wang, W. Cao, J. Li, and J. Ye, “Deepsd: supply-demand prediction for online car-hailing services using deep neural networks,” in ICDE. IEEE, 2017, pp. 243–254.
-  F. Rodrigues, I. Markou, and F. C. Pereira, “Combining time-series and textual data for taxi demand prediction in event areas: a deep learning approach,” Information Fusion, vol. 49, pp. 120–129, 2019.
-  O. Tamin and L. Willumsen, “Transport demand model estimation from traffic counts,” Transportation, vol. 16, no. 1, pp. 3–26, 1989.
-  E. Cascetta and S. Nguyen, “A unified framework for estimating or updating origin/destination matrices from traffic counts,” Transportation Research Part B: Methodological, vol. 22, no. 6, pp. 437–455, 1988.
-  X. Zhou and H. S. Mahmassani, “Dynamic origin-destination demand estimation using automatic vehicle identification data,” TITS, vol. 7, no. 1, pp. 105–114, 2006.
-  O. Z. Tamin, H. Hidayat, and A. K. Indriastuti, “The development of maximum-entropy (me) and bayesian-inference (bi) estimation methods for calibrating transport demand models based on link volume information,” in EASTS, vol. 4, 2003, pp. 630–647.
-  M. L. Hazelton, “Some comments on origin–destination matrix estimation,” Transportation Research Part A: Policy and Practice, vol. 37, no. 10, pp. 811–822, 2003.
-  X. Zhou, X. Qin, and H. Mahmassani, “Dynamic origin-destination demand estimation with multiday link traffic counts for planning applications,” Transportation Research Record: Journal of the Transportation Research Board, no. 1831, pp. 30–38, 2003.
-  M. L. Hazelton, “Statistical inference for time varying origin-destination matrices,” Transportation Research Part B: Methodological, vol. 42, no. 6, pp. 542–552, 2008.
-  X. Li, J. Kurths, C. Gao, J. Zhang, Z. Wang, and Z. Zhang, “A hybrid algorithm for estimating origin-destination flows,” IEEE Access, vol. PP, no. 99, pp. 1–1, 2017.
-  D. Deng, C. Shahabi, U. Demiryurek, L. Zhu, R. Yu, and Y. Liu, “Latent space model for road networks to predict time-varying traffic,” KDD, pp. 1525–1534, 2016.
-  S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
-  X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” arXiv preprint arXiv:1711.07971, 2017.
-  M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin et al., “Tensorflow: Large-scale machine learning on heterogeneous distributed systems,” arXiv preprint arXiv:1603.04467, 2016.
-  X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in AISTATS, 2010, pp. 249–256.
-  D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
-  X. Qian, X. Zhan, and S. V. Ukkusuri, “Characterizing urban dynamics using large scale taxicab data,” in Engineering and Applied Sciences Optimization. Springer, 2015, pp. 17–32.
-  D. Harris and S. Harris, Digital design and computer architecture. Morgan Kaufmann, 2010.
-  B. Craven and S. Islam, Ordinary least-squares regression. Sage Publications, 2011.
-  R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 267–288, 1996.
-  T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in ACM KDD. ACM, 2016, pp. 785–794.
-  L. Liu, R. Zhang, J. Peng, G. Li, B. Du, and L. Lin, “Attentive crowd flow machines,” in ACM MM. ACM, 2018, pp. 1553–1561.