An Effective Dynamic Spatio-temporal Framework with Multi-Source Information for Traffic Prediction

05/08/2020 ∙ by Jichen Wang, et al. ∙ Beijing Jiaotong University

Traffic prediction is necessary not only for management departments to dispatch vehicles but also for drivers to avoid congested roads. Many traffic forecasting methods based on deep learning have been proposed in recent years, and their main aim is to solve the problem of spatial dependencies and temporal dynamics. In this paper, we propose a useful dynamic model to predict urban traffic volume by combining a fully bidirectional LSTM, a more elaborate attention mechanism, and external features including weather conditions and events. First, we adopt the bidirectional LSTM to obtain temporal dependencies of traffic volume dynamically in each layer, which is different from hybrid methods combining bidirectional and unidirectional layers; second, we use a more elaborate attention mechanism to learn short-term and long-term periodic temporal dependencies; and finally, we collect weather conditions and events as external features to further improve the prediction precision. The experimental results show that the proposed model improves the prediction precision by approximately 3-7 percent on the NYC-Taxi and NYC-Bike datasets compared to the most recently developed method, making it a useful tool for urban traffic prediction.


1 Introduction

In recent years, due to the surge in car ownership and the limited carrying capacity of roads, the contradiction between road supply and demand has become increasingly prominent. Traffic volume is one of the main parameters reflecting the running state of a road. If the traffic volume on the road can be monitored and predicted accurately and in time, vehicles can be guided in advance. This not only improves the operational capability and efficiency of the road network but is also of considerable significance to traffic managers, operators, and participants.

Over the past three decades, there has been a dramatic increase in traffic forecasting research. Many methods Cui et al. (2018) have been proposed for traffic prediction, and they can be divided into three categories: traditional time series, traditional machine learning, and deep learning techniques. The first category includes the autoregressive integrated moving average (ARIMA) Williams and Hoel (2003), the seasonal autoregressive integrated moving average (SARIMA) Szeto et al. (2009), and Kalman filtering Xie et al. (2007); Guo and Williams (2010) with its variants, all of which have been used in many traffic prediction applications. However, these classical statistical methods require the input data to meet certain premise assumptions. They are therefore insufficient for processing complex, non-linear traffic data and cannot model real circumstances meticulously Yao et al. (2019). Researchers have added extra information, such as human activity, weather data, and holiday schedules, to these methods to make better decisions Wu and Tan (2016); Tong et al. (2017); Zhang et al. (2017); Ye et al. (2019).

The second category includes traditional machine learning approaches, such as KNN Van Lint and Van Hinsbergen (2012); Luo et al. (2019) and SVM Jeong et al. (2013); Sun et al. (2015). Traditional machine learning approaches and traditional time series methods are similar in data processing: they work well when the types and features of the traffic data are known, but it is challenging to set their parameters in advance Guo et al. (2019). Moreover, although these machine learning methods can model more complex traffic data, they do not take the temporal and spatial dependencies of traffic data into account simultaneously. Deep learning techniques are therefore more useful and accessible for traffic prediction.

Recently, deep learning has achieved the best precision in many challenging tasks, such as the game of Go Silver et al. (2016), speech recognition, and object detection, among many other domains LeCun et al. (2015). Inspired by this success, more and more researchers are applying deep learning techniques to traffic prediction. For example, Lv et al. use stacked autoencoders to learn generic traffic flow features Lv et al. (2015). Cui et al. transfer city-wide traffic to a heatmap image to capture the non-linear spatial dependency using a convolutional neural network (CNN) Cui et al. (2018); Yao et al. (2018); Zhang et al. (2018). At the same time, some researchers use recurrent neural networks (RNN) Yao et al. (2019); Cui et al. (2018) to capture non-linear temporal dependency. The ST-ResNet model, designed on the basis of the residual convolution unit, predicts traffic flow Zhang et al. (2017). Besides, graph convolutional networks (GCN) and graph neural networks (GNN), as supplements to CNN, have been applied to traffic flow prediction together with CNN Yu et al. (2018); Zhao et al. (2018); Guo et al. (2019). A unified multi-view model is proposed to learn the similarity between locations dynamically for taxi demand prediction Yao et al. (2018), and the Spatial-Temporal Dynamic Network (STDN) is later put forward to solve dynamic problems Yao et al. (2019).

The traffic volume forecast depends to some extent on long-term periodicity. Moreover, the prediction of traffic volume is affected not only by the forward dependency on preceding observations; later traffic conditions also carry information about earlier ones, so backward dependency should be exploited as well. In addition to these factors, weather conditions have a tremendous impact on traffic volume, and their impact differs across travel modes. For example, heavy rain has a more critical impact on bicycles than on buses or taxis.

In this paper, we propose a novel framework based on a more elaborate attention mechanism and bidirectional LSTM combined with weather conditions and other external features. We use the proposed framework to conduct the experiments on the NYC-Taxi and NYC-Bike datasets. The experimental results show that the prediction accuracy of our framework is better than the existing baselines. The main contributions of this paper are as follows:

  • We utilize a bidirectional LSTM based on two-way feature dependency to extract comprehensive temporal features of traffic volume.

  • We propose a more elaborate multi-scale attention mechanism that combines the acquisition of long-term and short-term periodic dynamic temporal dependencies.

  • We add the external features such as weather, holidays, weekdays, and weekends into our model to improve the prediction accuracy of traffic volume.

2 Related Work

Traffic flow theory treats traffic as a continuous fluid Greenberg (1959). Traffic volume covers not only the operations of vehicles but also the activities of bicycles and pedestrians. Traffic volume forms a time series that evolves dynamically over time and is embedded in continuous space, so traffic volume forecasting is a typical spatio-temporal data mining problem. In this section, we focus on the work related to traffic volume forecasting.

Among the traditional time series prediction methods, ARIMA Williams and Hoel (2003) and SARIMA Szeto et al. (2009) have been applied to linear data prediction problems. However, the accuracy of these methods on non-linear traffic data cannot be guaranteed. Although the KNN Van Lint and Van Hinsbergen (2012) and SVM Jeong et al. (2013) methods are capable of handling more complex data, the data types and features must be known and the corresponding parameters set in advance. Inspired by the great success of deep learning, more and more researchers use deep learning techniques to predict traffic data, and a wide variety of networks have been designed and applied to traffic volume prediction. Although several methods, such as MLFNN Dougherty and Cobbett (1997), RNN Yasdi (1999); Zhang (2000), and WNN Boto-Giralda et al. (2010), can model non-linear traffic data to some extent, their network structures consider only partial temporal dependency of the traffic data.

To solve this problem, CNN has been used to predict traffic volume in recent years Yu et al. (2017); Yao et al. (2018); Zhang et al. (2018). Shi et al. proposed an LSTM extension, the ConvLSTM, which outperforms FC-LSTM and is suitable for spatio-temporal data Shi et al. (2015). Three residual neural networks are composed dynamically for forecasting the flow of crowds Zhang et al. (2017). Similarly, a highly flexible and extendible end-to-end framework, DeepSD, is proposed for modeling the car-hailing service supply-demand Wang et al. (2017). DeepSD combines a deep residual network with an embedding method to learn patterns from spatial-temporal attributes. These works regard the entire city as a map and divide it into regions, either equally or according to traffic checkpoints such as bus stops, traffic lights, and highway toll stations. These algorithms take both spatial and temporal dependencies into account but do not consider the dynamic relationship of the spatial-temporal dependence of traffic data.

Yao et al. designed a multi-view spatial-temporal network (DMVST-Net) Yao et al. (2018), which combines spatial, temporal, and semantic attributes. The local CNN and graph embedding methods are used in DMVST-Net to get more views of the data. They further designed the framework called Spatial-Temporal Dynamic Network (STDN) Yao et al. (2019), which can extract the similarity between different regions dynamically. However, their method still gives insufficient consideration to dynamic temporal dependency. For example, the traffic volume during the morning peak can affect traffic conditions during the evening peak, and traffic conditions differ across seasons. Besides, most methods do not take into account the impact of external features such as weather, holidays, weekdays, weekends, and seasons on traffic data.

In summary, we propose a novel model that differs from those in the literature. Our framework combines a new multi-scale attention mechanism, forward and backward dependency information on traffic data, and external features to improve the prediction accuracy.

Figure 1: The framework of the network. Input data include traffic flow, traffic volume, and external features. The traffic features are extracted by local CNN and BDLSTM with an attention mechanism. The external features are combined with the obtained traffic data features, and the final output is obtained through a fully connected layer.

3 Methods

In this section, we focus on the proposed network framework, shown in Figure 1. The traffic volume represents the number of vehicles arriving at or departing from a region within a fixed time interval. The traffic flow represents the number of vehicles moving into other regions within a fixed time interval; it reflects the association between different regions.

3.1 Local CNN

We use a local CNN to obtain local spatial dependency. The main idea is to weaken the reliance on regions weakly correlated with the prediction region and to enhance the dependence on strongly correlated regions Yao et al. (2018, 2019). Experiments have shown that regions weakly correlated with the prediction region often reduce the accuracy of traffic prediction.

For each time interval $t$, we handle the target region $i$ and its surrounding neighbors as an image $Y_t^{i}$ with two channels. The first channel represents traffic conditions at the start of the time interval, and the other channel represents traffic conditions at the end of the time interval. The target region is the center point of the image. Then, the local CNN takes $Y_t^{i,0} = Y_t^{i}$ as input, and the expression of each convolution layer is:

$$Y_t^{i,k} = \mathrm{ReLU}(W^{k} * Y_t^{i,k-1} + b^{k}) \qquad (1)$$

where $*$ represents the convolution operation, and $W^{k}$ and $b^{k}$ are the learned parameters of the $k$-th layer. After a total of $K$ convolutions, the features of the target region are obtained and transmitted to the fully connected layer.
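As a minimal sketch (not the authors' code), one local-CNN layer of Eq. (1) can be written in plain NumPy; the 7x7 neighbourhood, kernel size, and channel counts here are illustrative assumptions:

```python
import numpy as np

def conv2d_relu(x, w, b):
    """One local-CNN layer: 'same'-padded 2D convolution followed by ReLU.

    x : (H, W, C_in) image patch centred on the target region
    w : (k, k, C_in, C_out) learned kernel
    b : (C_out,) learned bias
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    H, W, _ = x.shape
    out = np.zeros((H, W, w.shape[3]))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k, :]  # k x k receptive field
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2])) + b
    return np.maximum(out, 0.0)  # ReLU

# Hypothetical 7x7 neighbourhood with 2 channels (start/end volume),
# stacked for the paper's three convolution layers.
rng = np.random.default_rng(0)
x = rng.random((7, 7, 2))
for _ in range(3):
    w = rng.standard_normal((3, 3, x.shape[2], 4)) * 0.1
    b = np.zeros(4)
    x = conv2d_relu(x, w, b)
print(x.shape)  # (7, 7, 4)
```

The per-pixel loop is written for clarity; in practice a framework convolution (e.g. Keras `Conv2D`) would replace it.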

3.2 Masking Mechanism

The traffic data may contain missing values or outliers that are less than zero or greater than 0.5, and the bidirectional LSTM (BDLSTM) model cannot be trained on these values. If we set them to zero or to fixed values, the prediction accuracy of the framework is significantly reduced. Similar to Cui et al. (2018), we use a masking mechanism to reduce the impact of these values on model training. The operation of the masking mechanism is shown in Figure 2. Initially, we assume that all data are reasonable. They are then fed into the local CNN, and the masking operation is applied to the convolution output. For example, if the data at time $t$ is a missing value or an outlier, the BDLSTM training step at time $t$ is skipped, and the result of step $t-1$ is fed into step $t+1$. If the data at time $t+1$ is still a missing value or an outlier, that step is skipped as well, until a reasonable value is reached. In this way, the missing values and outliers need not be set to zero or fixed values, and the BDLSTM training process is not affected by them.
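The skip-over-invalid-steps behaviour can be sketched as follows; the `step` function is a hypothetical stand-in for one BDLSTM cell update, and the validity bounds mirror the ones stated above:

```python
def masked_recurrence(values, step, h0, valid=lambda v: 0.0 <= v <= 0.5):
    """Run a recurrent step only on valid observations.

    Invalid entries (missing, < 0, or > 0.5) are skipped entirely:
    the previous hidden state is carried forward unchanged, mimicking
    the masking mechanism of Cui et al. (2018).
    """
    h = h0
    for v in values:
        if v is None or not valid(v):
            continue  # skip this time step; h passes through untouched
        h = step(h, v)
    return h

# Toy step: exponential moving average standing in for the LSTM update.
# The None and 0.9 entries are skipped, so only 0.2 and 0.4 update h.
h = masked_recurrence([0.2, None, 0.9, 0.4], lambda h, v: 0.5 * h + 0.5 * v, 0.0)
print(h)  # 0.25
```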

Figure 2: Masking mechanism for traffic volume with missing values and outliers.
Figure 3: Detailed architecture of BDLSTM.

3.3 BDLSTM

We use a BDLSTM network Cui et al. (2018) to obtain the temporal dependency of traffic data, as shown in Figure 3. The main reasons for using BDLSTM are as follows:

  • Generally, the input data of an LSTM is arranged in temporal order. However, the LSTM only utilizes the forward dependency of the data, which may cause useful features to be filtered out or to fail to pass through the chain-like gated structure effectively.

  • Besides, analyzing the temporal periodicity of traffic data from a forward and backward perspective can help us find the recurring traffic patterns and improve prediction precision.

  • In fact, traffic data does not increase or decrease suddenly. We can utilize BDLSTM to smooth the data so that the predicted results are closer to the ground truth.

The BDLSTM used in this paper contains a forward LSTM and a backward LSTM. We use $\overrightarrow{h}_t^{i}$ and $\overleftarrow{h}_t^{i}$ to represent the hidden states iteratively calculated over the positive sequence and the reversed sequence for the target region $i$ at time $t$, respectively. The output is defined as follows:

$$h_t^{i} = \overrightarrow{h}_t^{i} \oplus \overleftarrow{h}_t^{i} \qquad (2)$$

where $\oplus$ denotes concatenation and $h_t^{i}$ represents the features of the target region $i$ at time $t$.
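A minimal sketch of this bidirectional concatenation, using a plain tanh recurrence as a stand-in for the LSTM cells (the sizes and weights below are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

def simple_rnn(xs, w_x, w_h, b):
    """Minimal tanh recurrence standing in for one LSTM direction."""
    h = np.zeros(w_h.shape[0])
    hs = []
    for x in xs:
        h = np.tanh(w_x @ x + w_h @ h + b)
        hs.append(h)
    return hs

def bidirectional(xs, params_fw, params_bw):
    """Concatenate forward-pass and re-aligned backward-pass states (Eq. 2)."""
    fw = simple_rnn(xs, *params_fw)
    bw = simple_rnn(xs[::-1], *params_bw)[::-1]  # re-align to original order
    return [np.concatenate([f, b]) for f, b in zip(fw, bw)]

rng = np.random.default_rng(1)
xs = [rng.random(4) for _ in range(6)]  # 6 time steps, 4 features each
make_params = lambda: (rng.standard_normal((8, 4)) * 0.1,   # input weights
                       rng.standard_normal((8, 8)) * 0.1,   # recurrent weights
                       np.zeros(8))                          # bias
hs = bidirectional(xs, make_params(), make_params())
print(len(hs), hs[0].shape)  # 6 (16,)
```

In Keras the same structure is obtained by wrapping an `LSTM` layer in `Bidirectional` with concatenated merge mode.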

3.4 Attention Mechanism

Typically, scholars adopt an attention mechanism to capture the temporal shifting in long-term traffic volume at daily or weekly periodicity Yao et al. (2019); Guo et al. (2019). To improve the prediction accuracy, we add more information to the attention mechanism, covering three kinds of time intervals: hourly (a window of recent half-hour intervals), daily, and weekly periodicity. For the hourly level, the traffic data in the window immediately preceding time $t$ is used to predict the traffic volume at time $t$. For the daily (resp. weekly) level, the traffic data in the corresponding window around time $t$ on the previous days (resp. weeks) is used to predict the traffic volume at time $t$.

Moreover, through a large amount of data analysis and comparative experiments, we found that there are strong relationships between traffic conditions at the morning peak and the ones at the evening peak each day, as shown in Figure 4.

Figure 4: Traffic volume of three days selected randomly.

So, considering the traffic volume around 9:00 is useful for predicting the traffic volume between 20:00 and 20:30. Therefore, we regard the traffic volume a fixed number of hours earlier (the peak interval, tuned experimentally in Section 5.3) as an essential reliance for specific periods to improve the prediction accuracy of the model. We use the BDLSTM to learn the correlation of these different levels, which is defined as follows:

$$h_t^{i,h} = \mathrm{BDLSTM}(X_t^{i,h}), \quad h_t^{i,d} = \mathrm{BDLSTM}(X_t^{i,d}), \quad h_t^{i,w} = \mathrm{BDLSTM}(X_t^{i,w}) \qquad (3)$$

where $i$ represents the target region, $t$ represents the time to be predicted, $h_t^{i,h}$ represents the output for the hourly period (each interval is half an hour), and $h_t^{i,d}$ (resp. $h_t^{i,w}$) represents the output for the corresponding period on the previous days (resp. weeks). $X_t^{i,h}$ represents the input of the hourly period, and $X_t^{i,d}$ (resp. $X_t^{i,w}$) represents the input of the corresponding period on the previous days (resp. weeks). We use an attention mechanism to capture the dynamic temporal dependency and obtain the weighted representations of the previous hours, days, and weeks as follows:

$$H_t^{i,p} = \sum_{s} \alpha_{t,s}^{i,p} \, h_{s}^{i,p}, \qquad p \in \{h, d, w\} \qquad (4)$$

where $H_t^{i,h}$, $H_t^{i,d}$, and $H_t^{i,w}$ represent the weighted sums of the output features of the previous hours, days, and weeks for region $i$ at time $t$, respectively. The weights $\alpha_{t,s}^{i,h}$, $\alpha_{t,s}^{i,d}$, and $\alpha_{t,s}^{i,w}$ measure the importance of each time interval $s$ and are defined as follows:

$$\alpha_{t,s}^{i,p} = \frac{\exp(\mathrm{score}(h_{s}^{i,p}, h_t^{i}))}{\sum_{s'} \exp(\mathrm{score}(h_{s'}^{i,p}, h_t^{i}))} \qquad (5)$$

The score function is defined as:

$$\mathrm{score}(h_{s}^{i,p}, h_t^{i}) = (v^{p})^{\top} \tanh(W_H^{p} h_{s}^{i,p} + W_X^{p} h_t^{i} + b^{p}) \qquad (6)$$

where $W_H^{p}$, $W_X^{p}$, $b^{p}$, and $v^{p}$ (one set for each of the hourly, daily, and weekly levels) are learned parameters, and $(v^{p})^{\top}$ is the transposition of $v^{p}$. For the previous hours, days, and weeks, we obtain $H_t^{i,h}$, $H_t^{i,d}$, and $H_t^{i,w}$, respectively. Then, we use these representations as the input of a following BDLSTM to preserve the periodic information:

$$\hat{h}_t^{i,p} = \mathrm{BDLSTM}(H_t^{i,p}), \qquad p \in \{h, d, w\} \qquad (7)$$

The final outputs $\hat{h}_t^{i,h}$, $\hat{h}_t^{i,d}$, and $\hat{h}_t^{i,w}$ represent the dynamic temporal dependence.
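The weighted-sum step of the attention mechanism can be sketched as below. The additive score form $v^{\top}\tanh(W_1 h + W_2 h_t + b)$ is our reading of the garbled Eq. (6), and all dimensions and weights are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def temporal_attention(H, h_t, W1, W2, v, b):
    """Additive attention over periodic BDLSTM outputs (Eqs. 4-6, sketched).

    H   : (T, d) candidate hidden states (hourly/daily/weekly outputs)
    h_t : (d,)   representation of the time to be predicted
    Returns the attention-weighted context vector.
    """
    scores = np.array([v @ np.tanh(W1 @ h + W2 @ h_t + b) for h in H])
    alpha = softmax(scores)  # importance weights (Eq. 5)
    return alpha @ H         # weighted sum (Eq. 4)

rng = np.random.default_rng(2)
d, T = 8, 5
ctx = temporal_attention(rng.random((T, d)), rng.random(d),
                         rng.standard_normal((d, d)) * 0.1,
                         rng.standard_normal((d, d)) * 0.1,
                         rng.standard_normal(d) * 0.1,
                         np.zeros(d))
print(ctx.shape)  # (8,)
```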

3.5 Integration with External Features

We first capture the weather conditions of New York City from the website https://darksky.net and extract the other external features, including holidays, weekdays, and weekends. This information is pre-processed and used as the external features of the target region $i$ at time $t$, denoted $e_t^{i}$. Then, we integrate the short-term output with the three long-term outputs $\hat{h}_t^{i,h}$, $\hat{h}_t^{i,d}$, and $\hat{h}_t^{i,w}$, denoted together as $c_t^{i}$, where we adopt shorter intervals in the short-term component to capture detail information. Finally, $c_t^{i}$ is combined with the external features $e_t^{i}$, and the prediction results $\hat{y}_t^{i,start}$ and $\hat{y}_t^{i,end}$ are obtained through a fully connected layer, defined as follows:

$$[\hat{y}_t^{i,start}, \hat{y}_t^{i,end}] = \tanh(W_f\,[c_t^{i} \oplus e_t^{i}] + b_f) \qquad (8)$$

where $W_f$ and $b_f$ are learned parameters. $\hat{y}_t^{i,start}$ and $\hat{y}_t^{i,end}$ indicate the prediction results for the target region $i$ at the start and the end of time interval $t$, respectively.
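A minimal sketch of this fusion layer, under the assumption that Eq. (8) is a single tanh-activated dense layer over the concatenated features (vector sizes below are illustrative):

```python
import numpy as np

def fuse_and_predict(c, e, W, b):
    """Fully connected fusion of traffic features with external features (Eq. 8).

    c : concatenated short- and long-term BDLSTM outputs
    e : external-feature vector (weather readings, holiday flags, ...)
    Returns the two-element prediction [y_start, y_end], squashed by tanh.
    """
    z = np.concatenate([c, e])
    return np.tanh(W @ z + b)

rng = np.random.default_rng(3)
c, e = rng.random(12), rng.random(5)
y = fuse_and_predict(c, e, rng.standard_normal((2, 17)) * 0.1, np.zeros(2))
print(y.shape)  # (2,)
```

Because tanh bounds the output to [-1, 1], this pairs naturally with the Max-Min normalization of the targets described in Section 4.2.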

4 Experiment

4.1 Experimental Data

We use two datasets, NYC-Taxi and NYC-Bike, to verify the effectiveness of our prediction model; their statistics are shown in Table 1. The NYC-Taxi dataset contains 22,349,490 taxi trip records from 01/01/2015 to 03/01/2015. The NYC-Bike dataset contains 2,605,648 bike trip records from 07/01/2016 to 08/29/2016. For both datasets, we choose the last twenty days of data as the test set, which also plays the role of the validation set, and the remaining data as the training set. The external features we collected include weather conditions, holidays, weekends, and weekdays; their details are shown in Table 2.

Dataset          NYC-Taxi                NYC-Bike
Time span        01/01/2015-03/01/2015   07/01/2016-08/29/2016
Time interval    30 minutes              30 minutes
Grid map size    (10, 20)                (10, 20)
Table 1: The NYC-Taxi and NYC-Bike datasets.
Dataset                    NYC-Taxi       NYC-Bike
Temperature/F              [1, 53]        [58, 95]
Wind speed/mph             [0, 17]        [0, 8]
Humidity/%                 [30, 89]       [31, 95]
UV index                   [0, 4]         [0, 10]
Precip. probability/%      [0, 100]       [0, 100]
Atmospheric pressure/mb    [991, 1044]    [1005, 1027]
Visibility/mi              [0, 11]        [3, 10]
Holidays                   3              1
Weekend days               17             17
Table 2: The external features of the NYC-Taxi and NYC-Bike datasets.

4.2 Preprocessing

First, we normalize the training data to [0, 1] by applying Max-Min normalization fitted on the training set. In the evaluation, we re-scale the predicted values back to the original units and compare them with the ground truth. For the external features, we first apply a de-unit process to remove the unit information and then normalize them to [0, 1] with Max-Min normalization. Besides, we use one-hot coding to transform holiday, weekday, and weekend conditions into binary vectors.
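The scaling and encoding steps above can be sketched as follows (a minimal illustration; the three-slot calendar encoding is an assumed layout, not the paper's exact scheme):

```python
import numpy as np

def max_min_scale(train, x):
    """Scale x into [0, 1] using the training set's min/max (fit on train only)."""
    lo, hi = train.min(), train.max()
    return (x - lo) / (hi - lo)

def inverse_scale(train, y):
    """Map scaled predictions back to the original units for evaluation."""
    lo, hi = train.min(), train.max()
    return y * (hi - lo) + lo

def one_hot_day(is_holiday, is_weekend):
    """Binary indicator vector for calendar features: [holiday, weekend, weekday]."""
    return np.array([int(is_holiday), int(is_weekend),
                     int(not (is_holiday or is_weekend))])

train = np.array([10.0, 30.0, 50.0])
print(max_min_scale(train, train))  # [0.  0.5 1. ]
print(one_hot_day(False, True))     # [0 1 0]
```

Fitting the min/max on the training set only (and reusing it at test time) avoids leaking test statistics into the model.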

4.3 Parameters

The Python library Keras is used to construct the BDLSTM in our method. We set up three convolution layers to extract the spatial dependency of the traffic volume; each layer uses 3*3 convolution kernels with 64 filters, and the dimension of the hidden representation of the BDLSTM is 128. For the length of the short-term traffic data, we use the previous 1.5 hours (three 30-minute intervals). For the length of the long-term traffic data, we use the corresponding peak (i.e., from 9.5 to 12.5 hours earlier), the previous three days, and the previous one week. We choose 2/3 of the data as the training set and the remaining data as the test set. The batch size is 128, and the optimizer used in our experiment is Adagrad. We fine-tune the CNN network after 20 epochs and train for up to 150 epochs, with early stopping if the loss has not improved over the last 6 epochs.

4.4 Evaluation Methods

We use the metrics of Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) to evaluate our proposed model, defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (9)$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|y_i - \hat{y}_i\right|}{y_i} \qquad (10)$$

where $n$ is the total number of samples, and $y_i$ and $\hat{y}_i$ represent the ground truth and the prediction result, respectively.
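The two metrics translate directly into code; a short reference implementation of Eqs. (9) and (10):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error (Eq. 9)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error (Eq. 10). Assumes y_true > 0."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred) / y_true)

print(rmse([10, 20], [8, 24]))  # sqrt((4 + 16) / 2) ~ 3.162
print(mape([10, 20], [8, 24]))  # (0.2 + 0.2) / 2 = 0.2
```

Note that MAPE is undefined for zero ground-truth values, which is one reason low-volume regions are often filtered before evaluation.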

4.5 Baselines

We compare our model with the following conventional methods that employ a neural network to predict traffic volume.

  • ConvLSTM Shi et al. (2015): It uses convolution to obtain the spatial dependency of the data and then adopts LSTM to obtain temporal dependencies.

  • DeepSD Wang et al. (2017): An end-to-end deep learning framework that automatically learns the spatial-temporal features.

  • ST-ResNet Zhang et al. (2017): ST-ResNet uses a convolution-based residual network to model the spatial-temporal dependency of any two regions in a city.

  • DMVST-Net Yao et al. (2018): A unified multi-view model that jointly considers the spatial, temporal, and semantic relations.

  • STDN Yao et al. (2019): STDN uses a flow gating mechanism to obtain the dynamic spatial dependency of traffic data and an attention mechanism to obtain the dynamic temporal dependency.

5 Results and Discussion

In this section, we will analyze the experimental results on the datasets of NYC-Bike and NYC-Taxi.

5.1 Results on NYC-Bike

The experimental results on the NYC-Bike dataset are shown in Table 3. It compares the proposed model with the five baselines mentioned above, including the latest method, STDN, together with three ablation variants of our model. The comparison shows that the different factors we added influence the overall experimental results to varying degrees. Overall, compared to STDN, our model reduces the RMSE and MAPE values by 5.99% and 3.43% at the start of the period, respectively, and by 6.01% and 5.03% at the end of the period.

model RMSE-START MAPE-START RMSE-END MAPE-END
ConvLSTM
DeepSD
ST-ResNet
DMVST-Net
STDN
LSTM+attention
LSTM+external
BDLSTM
Our model 8.32 21.09% 7.66 19.82%
Table 3: Experimental results on the NYC-Bike dataset. The results of ConvLSTM, DeepSD, ST-ResNet, DMVST-Net, and STDN are quoted from Yao et al. (2019).
Predict Period Model RMSE-START RMSE-END MAPE-START MAPE-END
Weekend STDN
LSTM+attention
LSTM+external
BDLSTM
Our model
Weekdays STDN
LSTM+attention
LSTM+external
BDLSTM
Our model
Off-peak period STDN
LSTM+attention
LSTM+external
BDLSTM
Our model
Peak period STDN
LSTM+attention
LSTM+external
BDLSTM
Our model
All period STDN
LSTM+attention
LSTM+external
BDLSTM
Our model
Table 4: Experimental results for different periods on the NYC-Bike dataset.

To further analyze the advantages of our model, we conduct a comparative experiment against STDN at special time intervals, e.g., peak versus off-peak hours and weekends versus weekdays. The experimental results are shown in Table 4, where the values in parentheses are the relative error increments. Our framework shows the best prediction performance on weekends and weekdays as well as in off-peak and peak periods. In addition, each factor we added influences prediction accuracy differently. From the data in Table 4, we can see that the multi-scale attention mechanism has the most significant impact on forecasting traffic conditions on weekends, the BDLSTM has a greater impact on weekdays, and the external features have a greater impact during off-peak periods. The BDLSTM and the multi-scale attention mechanism together have a more significant effect during peak periods.

Figure 5: The experimental results of our model for different prediction intervals on the NYC-Bike dataset. The two panels show the RMSE and MAPE from 0.5 to 2 hours at the start and end, respectively.
model RMSE-START MAPE-START RMSE-END MAPE-END
ConvLSTM
DeepSD
ST-ResNet
DMVST-Net
STDN
LSTM+attention
LSTM+external
BDLSTM
Our model 22.37 15.45% 17.78 15.52%
Table 5: Experimental results on the NYC-Taxi dataset. The results of ConvLSTM, DeepSD, ST-ResNet, DMVST-Net, and STDN are quoted from Yao et al. (2019).
Predict Period Model RMSE-START RMSE-END MAPE-START MAPE-END
Weekend STDN
LSTM+attention
LSTM+external
BDLSTM
Our model
Weekdays STDN
LSTM+attention
LSTM+external
BDLSTM
Our model
Off-peak period STDN
LSTM+attention
LSTM+external
BDLSTM
Our model
Peak period STDN
LSTM+attention
LSTM+external
BDLSTM
Our model
All period STDN
LSTM+attention
LSTM+external
BDLSTM
Our model
Table 6: Experimental results for different periods on the NYC-Taxi dataset.

In order to test the performance of our framework at different time intervals, we select four prediction scenarios with prediction periods of 0.5h, 1h, 1.5h, and 2h, respectively. Figure 5 shows the RMSE and MAPE of STDN and our model in different prediction intervals. As the prediction interval increases, the experimental error gradually increases from 0.5 to 2 hours. It is worth noting that as the prediction interval of our framework increases, the rate of error rise (i.e., the slope of the red and purple polylines) is significantly lower than that of STDN. The result shows that our framework is useful not only in the short-term prediction interval but also in the case of long-term traffic condition prediction.

5.2 Results on NYC-Taxi

The experimental results on the NYC-Taxi dataset are shown in Table 5. Similarly, it compares the proposed model with the five baselines mentioned above, including the latest method, STDN, together with three ablation variants of our model. From the experimental results in Table 5, we can see that the three factors play a decisive role in improving prediction accuracy. In particular, compared with STDN, our model reduces the RMSE and MAPE values by 7.18% and 5.21% at the start of the period, respectively, and by 6.67% and 4.49% at the end of the period.

Figure 6: The experimental results of our model for different prediction intervals on the NYC-Taxi dataset. The two panels show the RMSE and MAPE from 0.5 to 2 hours at the start and end, respectively.

Similarly, in order to further analyze the advantages of our model, we conduct a comparative experiment with STDN at special time intervals, e.g., peak hours versus off-peak hours, weekends versus weekdays. The experimental results are shown in Table 6. From it, we can see that for the NYC-Taxi dataset, BDLSTM has a higher degree of reduction in the prediction error rate for the weekends and off-peak periods. External features have a higher degree of influence on weekdays, off-peak periods, and peak periods. In addition, when making predictions for all periods, external features can greatly reduce prediction errors.

Similarly, Figure 6 shows the RMSE and MAPE of STDN and our model in different prediction intervals. As the prediction interval increases, the experimental error from 0.5 to 2 hours gradually increases. It is worth noting that as the prediction interval of our network framework increases, the rate of error rise (i.e., the slope of the red and purple polylines) is significantly lower than that of STDN.

5.3 Effects of Different Components

External Features: From Tables 3, 4, 5, and 6, we can conclude that the external features (weather conditions, holidays, etc.) have a notable influence on improving the prediction accuracy for short-term traffic conditions. For the holiday factor, we randomly select the traffic conditions for one week, as shown in Figure 7. The traffic volume on weekends fluctuates significantly, while the traffic volume on weekdays maintains an underlying regularity. In addition, for the weather factor (rainy versus sunny days), we randomly select the traffic volume for two weekdays, as shown in Figure 8. The trends of traffic volume on rainy and sunny days are significantly different.

Figure 7: Normalized traffic volume of one random week. The dotted lines represent weekday data, and the solid lines represent weekend data.
Figure 8: Normalized traffic volume of sunny and rainy days.

We deduce that the main reason is that bicycle and taxi usage rates are strongly affected by weather and other external characteristics. For example, heavy rainfall or a high UV level significantly reduces bicycle usage, while in extreme weather conditions the likelihood of taking a taxi increases dramatically. The influence of weather features on traffic prediction is therefore undeniable. Moreover, the trends of traffic volume on holidays, weekends, and weekdays differ considerably, so these factors affect the prediction accuracy significantly. Figures 9 and 10 show the RMSE and MAPE of our framework for predicting traffic volume in different periods on the NYC-Bike and NYC-Taxi datasets, respectively. We believe that the framework achieves superior performance in predicting weekday traffic conditions.

Figure 9: The experimental results of our integrated framework for different prediction periods on the NYC-Bike dataset.
Figure 10: The experimental results of our integrated framework for different prediction periods on the NYC-Taxi dataset.

BDLSTM: From Tables 3 and 5, we can see that the BDLSTM performs better than the LSTM. The reason is that the chain-like gated structure of the BDLSTM takes both forward and backward dependencies of traffic volume into consideration. In other words, the BDLSTM provides double protection, preventing critical information from being filtered out and ensuring that it passes through the chain-like gated structures efficiently. Besides, due to the periodicity of traffic data, the BDLSTM can learn forward feature dependencies in chronological order and reverse feature dependencies in reverse chronological order.

Figure 11: The RMSE and MAPE of the LSTM with attention mechanism on the NYC-Bike dataset.
Figure 12: The RMSE and MAPE of the LSTM with attention mechanism on the NYC-Taxi dataset.

New Attention Mechanism: The experimental results show that the LSTM with the new attention mechanism performs better than STDN Yao et al. (2019). The error rates at the start and end of the traffic volume are reduced after the new attention mechanism is adopted. One reason is that we have added more information to the attention mechanism, including hourly, daily, and weekly periodic dependence. The other reason is that we have considered the relationship between the morning peak and the evening peak of the daily traffic volume. We performed extensive data analysis and comparative experiments on the selection of the peak interval, with results shown in Figures 11 and 12. When the peak interval is set to 11 hours, the RMSE and MAPE values at the start and the end of the period are minimal. This indicates that the traffic volume of the target region at a given time is related to the traffic conditions at the corresponding peak time.

5.4 Time Complexity

The most time-consuming part of the model is the capture of temporal feature dependencies. Because STDN uses LSTM with a traditional attention mechanism, whereas our model uses the more complex gate structure of BDLSTM together with the multi-scale attention mechanism, each training iteration of our model takes 1.52 times as long as that of STDN, and the execution time for prediction is 1.58 times that of STDN. We implemented the proposed model in Python 3.6 and tested it successfully on an Ubuntu 18.04 platform, running on a PC with an Intel Core i7-7820X CPU @ 3.60 GHz, 80 GB RAM, and an Nvidia TITAN Xp graphics card. Training the model takes about 6.5 hours, and predicting the traffic volume takes about 3.3 minutes.

6 Conclusion

Many effective models have been designed to predict traffic conditions; the latest, STDN, combines LSTM with an attention mechanism over day-scale time intervals. In this paper, after analyzing the multi-scale correlation of traffic volume, we first propose a multi-scale modeling scheme covering four levels, i.e., time intervals, hours, days, and weeks, to improve the accuracy of traffic volume forecasts during peak hours. This multi-scale model is not only useful for traffic volume in off-peak periods but also makes more accurate predictions for peak periods. Secondly, we propose a way to combine the multi-scale attention mechanism with BDLSTM. Finally, multi-source information is added as external features to further improve the prediction accuracy of traffic volume. The experimental results on the NYC-Bike and NYC-Taxi datasets show that the model achieves more accurate predictions than STDN. Moreover, since our model is superior to the five baselines in terms of all the evaluation metrics on different traffic datasets, we believe it has a certain degree of generalization ability and can be applied to other time series prediction tasks. In the future, we plan to improve the real-time performance of the model so that it can be used to predict actual road conditions.

Acknowledgements.
This research is supported by the National Natural Science Foundation of China (NSFC 61572005, 61672086, 61702030, 61771058).

References

  • D. Boto-Giralda, F. J. Díaz-Pernas, D. González-Ortega, J. F. Díez-Higuera, M. Antón-Rodríguez, M. Martínez-Zarzuela, and I. Torre-Díez (2010) Wavelet-based denoising for traffic volume time series forecasting with self-organizing neural networks. Computer-Aided Civil and Infrastructure Engineering 25 (7), pp. 530–545. Note: doi: 10.1111/j.1467-8667.2010.00668.x Cited by: §2.
  • Z. Cui, K. Henrickson, R. Ke, and Y. Wang (2018) Traffic graph convolutional recurrent neural network: a deep learning framework for network-scale traffic learning and forecasting. CoRR abs/1802.07007, pp. 1–11. Cited by: §1, §1.
  • Z. Cui, R. Ke, and Y. Wang (2018) Deep bidirectional and unidirectional LSTM recurrent neural network for network-wide traffic speed prediction. CoRR abs/1801.02143, pp. 1–12. Cited by: §1, §3.2, §3.3.
  • M. S. Dougherty and M. R. Cobbett (1997) Short-term inter-urban traffic forecasts using neural networks. International Journal of Forecasting 13 (1), pp. 21–31. Cited by: §2.
  • H. Greenberg (1959) An analysis of traffic flow. Operations Research 7 (1), pp. 79–85. Note: doi: 10.1287/opre.7.1.79 Cited by: §2.
  • J. Guo and B. M. Williams (2010) Real-time short-term traffic speed level forecasting and uncertainty quantification using layered kalman filters. Transportation Research Record 2175 (1), pp. 28–37. Note: doi: 10.3141/2175-04 Cited by: §1.
  • S. Guo, Y. Lin, N. Feng, C. Song, and H. Wan (2019) Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. Proceedings of the AAAI Conference on Artificial Intelligence 33, pp. 922–929. Cited by: §1, §1, §3.4.
  • Y. Jeong, Y. Byon, M. M. Castro-Neto, and S. M. Easa (2013) Supervised weighting-online learning algorithm for short-term traffic flow prediction. IEEE Transactions on Intelligent Transportation Systems 14 (4), pp. 1700–1707. Cited by: §1, §2.
  • Y. LeCun, Y. Bengio, and G. E. Hinton (2015) Deep learning. Nature 521 (7553), pp. 436–444. Cited by: §1.
  • X. Luo, D. Li, Y. Yang, and S. Zhang (2019) Spatiotemporal traffic flow prediction with KNN and LSTM. Journal of Advanced Transportation 2019, pp. 10. Cited by: §1.
  • Y. Lv, Y. Duan, W. Kang, Z. Li, and F. Wang (2015) Traffic flow prediction with big data: a deep learning approach. IEEE Transactions on Intelligent Transportation Systems 16 (2), pp. 865–873. Cited by: §1.
  • X. Shi, Z. Chen, H. Wang, D. Yeung, W. Wong, and W. Woo (2015) Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In Advances in Neural Information Processing Systems, pp. 802–810. Cited by: §2, 1st item.
  • D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis (2016) Mastering the game of go with deep neural networks and tree search. Nature 529, pp. 484–489. Cited by: §1.
  • Y. Sun, B. Leng, and W. Guan (2015) A novel wavelet-SVM short-time passenger flow prediction in Beijing subway system. Neurocomputing 166, pp. 109–121. Cited by: §1.
  • W. Y. Szeto, B. Ghosh, B. Basu, and M. O’Mahony (2009) Multivariate traffic forecasting technique using cell transmission model and SARIMA model. Journal of Transportation Engineering 135 (9), pp. 658–667. Note: doi: 10.1061/(ASCE)0733-947X(2009)135:9(658) Cited by: §1, §2.
  • Y. Tong, Y. Chen, Z. Zhou, L. Chen, J. Wang, Q. Yang, J. Ye, and W. Lv (2017) The simpler the better: a unified approach to predicting original taxi demands based on large-scale online platforms. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1653–1662. Cited by: §1.
  • J. Van Lint and C. Van Hinsbergen (2012) Short-term traffic and travel time prediction models. Artificial Intelligence Applications to Critical Transportation Issues 22 (1), pp. 22–41. Cited by: §1, §2.
  • D. Wang, W. Cao, J. Li, and J. Ye (2017) DeepSD: supply-demand prediction for online car-hailing services using deep neural networks. In 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 243–254. Cited by: §2, 2nd item.
  • B. M. Williams and L. A. Hoel (2003) Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: theoretical basis and empirical results. Journal of Transportation Engineering 129 (6), pp. 664–672. Note: doi: 10.1061/(ASCE)0733-947X(2003)129:6(664) Cited by: §1, §2.
  • Y. Wu and H. Tan (2016) Short-term traffic flow forecasting with spatial-temporal correlation in a hybrid deep learning framework. CoRR abs/1612.01022, pp. 1–14. Cited by: §1.
  • Y. Xie, Y. Zhang, and Z. Ye (2007) Short-term traffic volume forecasting using kalman filter with discrete wavelet decomposition. Computer-Aided Civil and Infrastructure Engineering 22 (5), pp. 326–334. Note: doi: 10.1111/j.1467-8667.2007.00489.x Cited by: §1.
  • H. Yao, X. Tang, H. Wei, G. Zheng, and Z. Li (2019) Revisiting spatial-temporal similarity: a deep learning framework for traffic prediction. Proceedings of the AAAI Conference on Artificial Intelligence 33, pp. 5668–5675. Cited by: §1, §1, §2, §3.1, §3.4, 5th item, §5.3, Table 3, Table 5.
  • H. Yao, F. Wu, J. Ke, X. Tang, Y. Jia, S. Lu, P. Gong, J. Ye, and Z. Li (2018) Deep multi-view spatial-temporal network for taxi demand prediction. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pp. 2588–2595. Cited by: §1, §2, §2, §3.1, 4th item.
  • R. Yasdi (1999) Prediction of road traffic using a neural network approach. Neural Computing & Applications 8 (2), pp. 135–142. Cited by: §2.
  • J. Ye, L. Sun, B. Du, Y. Fu, X. Tong, and H. Xiong (2019) Co-prediction of multiple transportation demands based on deep spatio-temporal neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 305–313. Note: doi: 10.1145/3292500.3330887 Cited by: §1.
  • B. Yu, H. Yin, and Z. Zhu (2018) Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pp. 3634–3640. Note: doi: 10.24963/ijcai.2018/505 Cited by: §1.
  • H. Yu, Z. Wu, S. Wang, Y. Wang, and X. Ma (2017) Spatiotemporal recurrent convolutional networks for traffic prediction in transportation networks. Sensors 17 (7), pp. 1501. Cited by: §2.
  • H. M. Zhang (2000) Recursive prediction of traffic conditions with neural network models. Journal of Transportation Engineering 126 (6), pp. 472–481. Note: doi: 10.1061/(ASCE)0733-947X(2000)126:6(472) Cited by: §2.
  • J. Zhang, Y. Zheng, D. Qi, R. Li, X. Yi, and T. Li (2018) Predicting citywide crowd flows using deep spatio-temporal residual networks. Artificial Intelligence 259, pp. 147–166. Cited by: §1, §2.
  • J. Zhang, Y. Zheng, and D. Qi (2017) Deep spatio-temporal residual networks for citywide crowd flows prediction. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 1655–1661. Cited by: §1, §1, §2, 3rd item.
  • L. Zhao, Y. Song, M. Deng, and H. Li (2018) Temporal graph convolutional network for urban traffic flow prediction method. CoRR abs/1811.05320, pp. 1–10. Cited by: §1.