1 Introduction
Precise and timely traffic flow prediction is one of the major tasks of intelligent transportation systems (ITSs). It has practical significance for individuals, companies and governments, who make decisions based on real-time traffic flow. However, accurate short-term traffic flow prediction has remained challenging for decades because of the stochastic and nonlinear characteristics of traffic flow. In early traffic flow forecasting research, researchers mainly used linear methods, such as the autoregressive integrated moving average (ARIMA) (Ahmed and Cook, 1979). Some researchers still prefer linear models for their simplicity and convenience, and improved linear models such as multivariable linear regression (MVLR) (L. Li and Zhang, 2015) have been proposed. Later, as machine learning algorithms showed good performance in many tasks, researchers began to apply algorithms such as support vector regression (SVR) (X. Jin and Yao, 2007) and k-nearest neighbors (kNN) (Davis and Nihan, 1991). Despite their good performance, these approaches cannot capture all the characteristics of traffic flow, and their accuracy remains unsatisfactory.

Recently, with the development of deep learning, several deep learning models have been proposed for traffic flow prediction, such as the Recurrent Neural Network (RNN) (Kyunghyun Cho and Bengio, 2014), Long Short-Term Memory network (LSTM) (Tian and Pan, 2015), Gated Recurrent Unit (GRU) (Kyunghyun Cho, 2014), Stacked AutoEncoders (SAEs) (Y. Lv and Wang, 2014) and Deep Belief Network (DBN) (W. Huang and Xie, 2014). In general, these models have complex architectures that can capture the nonlinearities in traffic flow, so they forecast traffic flow better than traditional models. Even so, there is still room to improve them.

Among these deep learning models, RNN (Kyunghyun Cho and Bengio, 2014), LSTM (Tian and Pan, 2015) and GRU (Kyunghyun Cho, 2014) are the most commonly used. However, their architectures are so complex that training them takes a long time. Because they are hard to train, practitioners often stack fewer layers and use fewer training epochs, which lowers accuracy; regularization techniques such as dropout, which randomly deactivates units during training, may not raise accuracy either when the data set is not large enough. This makes these models difficult to use in practical applications. Moreover, previous studies were mainly based on static models, that is, a model is not updated once it has been trained. Traffic flow, however, reflects a constantly changing real world, and its spatial and temporal relationships also change over time. Therefore, a real-time updating mechanism is needed.
In this paper, we improve the deep residual network (DRN) and propose a Dynamic Improved Deep Residual Network (DIDRN), which continuously updates its training set as newly observed data become available and is therefore better suited to practical applications. DIDRN adapts well when road conditions change over time or when it is deployed at different locations.
The rest of this paper is organized as follows. Section 2 reviews existing work on short-term traffic flow prediction. Section 3.1 introduces the basic ideas of DRN. Sections 3.2 and 3.3 explain how we further improve DRN and how we design the dynamic model, respectively, and Section 3.4 describes the data processing. Section 4 presents experiments comparing our model's performance with several popular models. Finally, Section 5 concludes the paper and discusses future work.
2 Literature Review
Since traffic flow prediction is one of the main tasks of ITSs, research on it has been ongoing for many years, and new models are constantly being proposed. Although there are numerous traffic flow prediction methods, they can generally be divided into two main categories: parametric methods and nonparametric methods.
Parametric methods such as ARIMA (Ahmed and Cook, 1979) and MVLR (L. Li and Zhang, 2015) are widely used. These approaches require a predetermined model structure whose parameters are estimated from empirical data (Xingyuan Dai, 2017). Improved ARIMA-based models such as seasonal ARIMA (SARIMA) (Billy M. Williams and Lester A. Hoel, 2003) have also been proposed. These models have a simple and explicit structure, but they require a large amount of data and assume that traffic follows a stationary process. Hence, they may be unusable when sufficient data are unavailable. Moreover, traffic flow data are in most cases stochastic and nonlinear, so these parametric approaches do not perform very well.
Since the 1990s, researchers have paid more attention to nonparametric approaches such as kNN (Davis and Nihan, 1991), SVR (X. Jin and Yao, 2007), random forests regression (RF) (Leshem and Ritov, 2007) and gradient boosting regression (Friedman, 2001). In addition, because Artificial Neural Networks (ANNs) are good at capturing nonlinearities, they have attracted growing attention since their introduction. Recently, with the rapid development of deep learning, many deep learning models have been proposed. Powerful models such as SAE (Y. Lv and Wang, 2014), DBN (W. Huang and Xie, 2014), RNN (Kyunghyun Cho and Bengio, 2014), LSTM (Tian and Pan, 2015) and GRU (Kyunghyun Cho, 2014) have been introduced to traffic flow prediction and have shown superior performance.
Sequential data are correlated with their context, but common feedforward neural networks cannot capture this correlation. RNNs were developed in the 1980s to address this problem (Hochreiter, 1991). However, although an RNN can capture correlation among data, it quickly forgets it; that is, it lacks long-term memory. In 1997, long short-term memory (LSTM) networks (Sepp Hochreiter, 1997) were invented, and they have since set accuracy records in various application domains. LSTM can retain correlations over long periods and thus performs outstandingly in sequence modeling. However, both RNN and LSTM are hard to train because of their complex architectures; training them once can take hours or even days. Despite their good performance, using them in practice is therefore sometimes infeasible.
To reduce the number of mathematical operations and the total running time, the gated recurrent unit (GRU) was developed (Kyunghyun Cho, 2014), and R. Fu and Li (2016) applied it to traffic flow prediction. Unfortunately, in practice, although GRU reduces training time, the cost is still high. In addition, Klaus Greff (2017) conducted a thorough comparison of LSTM variants and found that none of them significantly outperforms the standard LSTM architecture.
Besides RNN and LSTM, researchers have also tried some other deep learning models.
Gang Song (2017) proposed a novel double deep ELMs ensemble system (DDELMsES), which demonstrates better generalization performance than several state-of-the-art algorithms. However, their model only focuses on one-step forecasting. Xingyuan Dai (2017) proposed DeepTrend, which decomposes the original traffic flow data into trend and residual components. They demonstrate that DeepTrend noticeably improves prediction performance and outperforms many traditional models and LSTM, but they do not take spatial correlation into account. Xiaochuan Sun (2017) proposed a deep belief echo-state network (DBEN) to address slow convergence and local optima in time series prediction. Experimental results demonstrate that DBEN performs well in learning speed and short-term memory capacity. However, their model has a large number of parameters, so parameter tuning would be difficult.
He et al. (2016) proposed the deep residual network (DRN), which substantially outperformed other models and won first place in the ILSVRC classification competition. Typically, as networks become deeper, they become difficult to train, for example because of vanishing gradients. DRN remains easy to train even when it is several times deeper than other networks. Plain deep neural networks used in practice have typically had fewer than 100 layers, whereas residual networks with several hundred layers can be trained effectively.
3 Model Development
3.1 DRN
When deep learning began to take off, people simply stacked layers, expecting that deeper is better. They fed the network an input $x$ and let the intermediate layers fit a mapping $\mathcal{H}(x)$ to produce the final output. Believing that deeper networks would perform better, they stacked more layers to fit the desired mapping in the hope of higher accuracy. However, it turns out that as the depth of the network increases, both training and test accuracy saturate and then degrade; this is the degradation problem (He et al., 2016). Figure 2 shows the architecture of a plainly stacked neural network.
DRN was proposed to solve the degradation problem. Instead of hoping that each stack of layers directly fits a desired mapping, He et al. (2016) let these layers fit a residual mapping.
Concretely, suppose we want the network to fit the desired mapping $\mathcal{H}(x)$. We let the stacked layers fit another mapping $\mathcal{F}(x) := \mathcal{H}(x) - x$, so that the desired mapping is recast as $\mathcal{H}(x) = \mathcal{F}(x) + x$.
The hypothesis is that it is easier to fit the residual mapping $\mathcal{F}(x)$ than the original mapping $\mathcal{H}(x)$. In the extreme case where the optimal mapping is the identity, it is easier to push the residual to zero than to fit an identity mapping with a stack of nonlinear layers. Figure 2 shows this idea.
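A minimal NumPy sketch of a single residual unit may make the reformulation $\mathcal{H}(x) = \mathcal{F}(x) + x$ concrete. It assumes a vector input, two weight matrices with the hypothetical names `w1` and `w2`, and a ReLU between them; it illustrates the idea of He et al. (2016) rather than reproducing their convolutional architecture or the network used in this paper.

```python
import numpy as np

def relu(z):
    """Element-wise rectified linear unit."""
    return np.maximum(z, 0.0)

def residual_unit(x, w1, w2):
    """Two-layer residual unit: the stacked layers compute the residual
    F(x) = w2 @ relu(w1 @ x), and the identity shortcut adds x back, so the
    unit outputs H(x) = F(x) + x. The ReLU that He et al. (2016) apply after
    the addition is omitted so that zero weights give an exact identity."""
    f = w2 @ relu(w1 @ x)   # residual mapping F(x)
    return f + x            # desired mapping H(x) = F(x) + x

rng = np.random.default_rng(0)
x = rng.normal(size=4)

# If the optimal mapping is the identity, the unit only needs F(x) = 0,
# which all-zero weights achieve trivially.
zero = np.zeros((4, 4))
identity_out = residual_unit(x, zero, zero)

# With nonzero weights the unit computes a correction around the identity.
w1, w2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
out = residual_unit(x, w1, w2)
```

The zero-weight case is the point of the design: a plain stack would have to learn the identity explicitly, whereas the residual unit gets it for free.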
This simple change makes a substantial difference: the network becomes much easier to train. The deep residual network helped He et al. win first place in the ILSVRC classification competition as well as several other competitions. Earlier ILSVRC winners were much shallower, whereas the winning residual network had 152 layers and achieved a significantly lower error rate.
3.2 Improved DRN
Inspired by the deep residual network of He et al., we further improve the DRN architecture so that more information is preserved as it is transferred between layers, with the aim of higher accuracy, and apply it to traffic flow prediction. Figure 3 shows the architecture of our improved network.
The network works as follows:

Denote the input as $x$; the first layer fits the function $\mathcal{F}_1(x)$.

We then add the input and the first-layer output together to obtain $y_1 = x + \mathcal{F}_1(x)$.

The second layer takes $y_1$ as input and maps it to $\mathcal{F}_2(y_1)$.

Similarly, we then add the output of the first layer and the output of the second layer to obtain the final output $y_2$.
Through these steps, we divide the task of fitting one complex function into fitting two simpler mappings, $\mathcal{F}_1$ and $\mathcal{F}_2$. Because each of these mappings is easier to fit, we expect the layers to work more efficiently, which should reduce the fitting error of each mapping and, in turn, the overall error rate.
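The steps above can be sketched in NumPy as follows. This is an illustrative reading, not the authors' code: the tanh activations, the square weight matrices and the helper names `dense` and `improved_block` are assumptions. Because the text does not pin down which "output of the first layer" enters the final sum, the sketch takes it to be the raw first-layer output (before the shortcut addition) and flags the alternative in a comment.

```python
import numpy as np

def dense(z, w, b):
    """One fully connected layer; the tanh activation is an assumption."""
    return np.tanh(w @ z + b)

def improved_block(x, w1, b1, w2, b2):
    """Hypothetical sketch of the two-step structure described in Section 3.2.

    ASSUMPTION: in the last step, 'the output of the first layer' is read as
    the raw first-layer output f1. If it instead means y1, replace the final
    sum with y1 + f2. Returns (y1, y2)."""
    f1 = dense(x, w1, b1)    # first layer fits F1(x)
    y1 = x + f1              # add the input and the first-layer output
    f2 = dense(y1, w2, b2)   # second layer maps y1 to F2(y1)
    y2 = f1 + f2             # add the first- and second-layer outputs
    return y1, y2

d = 3
x = np.array([0.5, -1.0, 2.0])
zero_w, zero_b = np.zeros((d, d)), np.zeros(d)

# With all-zero parameters each layer outputs tanh(0) = 0, so y1 = x and y2 = 0.
y1, y2 = improved_block(x, zero_w, zero_b, zero_w, zero_b)
```

In a Keras model the same wiring would be expressed with `Dense` layers and `Add` merge layers; the NumPy version only shows the data flow.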
3.3 Dynamic Model
In practice, transportation management departments will not necessarily use this model at the location where our data were collected, so the model needs to transfer easily between locations. In addition, traffic flow on a given road may change over time: the traffic flow one month or one year from now may differ significantly from the current flow. Because the model's weights and parameters were obtained from old training data, they fit the old data well but may not fit new data, so the previously trained model may perform poorly on new data. It is therefore unreasonable to keep using the old model without updating it, especially when the traffic flow has changed significantly. Hence, traffic flow prediction models should be updated continually, and for this reason we design a dynamic model.
To update the pretrained model, we adopt the basic idea of incremental learning to implement a dynamic model.
The complete process is as follows:

We use the collected data to pretrain a basic model.

We then use the model to make predictions.

After a number of prediction steps, newly observed traffic flow data become available.

We add these new data to the training set and retrain the model on the updated training set.
Figure 4 shows the process of these steps.
Through these steps, the model is continuously updated as it absorbs new data, so it should fit practical conditions better and give more accurate predictions.
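The four-step update loop can be sketched as below. To keep the example self-contained, an ordinary least-squares fit on lag-1 pairs stands in for DIDRN; `fit_model`, `predict`, `dynamic_forecast` and the `update_every` retraining period are hypothetical names and choices, not part of the original method description.

```python
import numpy as np

def fit_model(u, v):
    """Stand-in for (re)training DIDRN: least squares v ~ a*u + c.
    Hypothetical helper; any model with fit/predict would do."""
    A = np.column_stack([u, np.ones_like(u)])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return coef

def predict(coef, u):
    return coef[0] * u + coef[1]

def dynamic_forecast(series, n_init, update_every):
    """One-step-ahead forecasts with periodic retraining on all data seen so far."""
    u_all, v_all = series[:-1], series[1:]              # lag-1 supervised pairs
    coef = fit_model(u_all[:n_init], v_all[:n_init])    # step 1: pretrain
    preds = []
    for i in range(n_init, len(u_all)):
        preds.append(predict(coef, u_all[i]))           # step 2: predict
        # step 3: the true value v_all[i] has now been observed
        if (i - n_init + 1) % update_every == 0:
            # step 4: retrain on the enlarged training set
            coef = fit_model(u_all[: i + 1], v_all[: i + 1])
    return np.array(preds)

series = 2.0 * np.arange(20)   # toy series, not the Beijing data
preds = dynamic_forecast(series, n_init=5, update_every=3)
```

Here retraining uses the full enlarged training set, as in step 4; a sliding window that drops the oldest observations would be an alternative when old traffic patterns stop being relevant, but the text does not say which variant the authors used.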
3.4 Model Implementation
In this section, we describe the complete data processing pipeline.
We divide the process into several steps:

Denote the raw data as $X = (x_1, x_2, \dots, x_n)^{\top}$, a column vector in $\mathbb{R}^{n}$.
We assume that the traffic flow data contain a trend component, so we subtract adjacent observations to obtain the difference vector $D = (d_1, d_2, \dots, d_{n-1})^{\top}$, where
$$d_i = x_{i+1} - x_i, \qquad i = 1, 2, \dots, n-1.$$
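First-order differencing, and the inverse step needed to turn a predicted difference back into a traffic flow level, can be sketched with NumPy's `np.diff`. The inversion step (the forecast for time $i+1$ is $x_i$ plus the predicted difference) is our assumption about how final predictions are recovered, and the flow values are toy numbers, not data from the paper.

```python
import numpy as np

def difference(x):
    """d_i = x_{i+1} - x_i; the result has one element fewer than x."""
    return np.diff(x)

def invert_difference(last_obs, d_hat):
    """Recover level forecasts: x_hat_{i+1} = x_i + d_hat_i."""
    return last_obs + d_hat

x = np.array([120.0, 135.0, 150.0, 140.0])   # toy flows
d = difference(x)                            # [15., 15., -10.]
x_rec = invert_difference(x[:-1], d)         # exact differences rebuild x[1:]
```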

However, supervised learning cannot be applied to the difference vector $D$ directly, so we further transform it into a supervised data set. First, we choose the timestep, which is the number of previous data points used to predict the next data point. After parameter tuning, we set timestep = 1.
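We then denote by $U$ the feature vector and by $V$ the vector of values to be predicted, defined as
$$U = (u_1, u_2, \dots, u_{n-2})^{\top}, \qquad V = (v_1, v_2, \dots, v_{n-2})^{\top},$$
where, with $d_i$ the $i$-th element of the difference vector,
$$u_i = d_i, \qquad v_i = d_{i+1}, \qquad i = 1, 2, \dots, n-2.$$
This gives the entire data set $\{(u_i, v_i)\}_{i=1}^{n-2}$.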
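With timestep = 1, the framing reduces to pairing each difference with the next one. A short sketch that also accepts larger timesteps is shown below; the function name `to_supervised` is hypothetical and the input values are toy numbers.

```python
import numpy as np

def to_supervised(d, timestep=1):
    """Frame a series as (features, target) pairs: row i of `u` holds
    `timestep` consecutive values and v[i] is the value that follows them."""
    n = len(d) - timestep
    u = np.stack([d[i : i + timestep] for i in range(n)])
    v = d[timestep:]
    return u, v

d = np.array([15.0, 15.0, -10.0, 5.0])
u, v = to_supervised(d)   # timestep = 1, as chosen in the text
```

With timestep = 1, `u` has shape `(len(d) - 1, 1)`; a larger timestep would feed the model several past differences per prediction.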

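Normalization is a simple technique commonly used to speed up the training and prediction of deep learning models. Since the feature vector $U$ and the target vector $V$ contain both positive and negative elements, we scale them into $[-1, 1]$ and denote the scaled vectors by $\tilde{U}$ and $\tilde{V}$, with elements
$$\tilde{u}_i = \frac{2\,(u_i - u_{\min})}{u_{\max} - u_{\min}} - 1, \qquad \tilde{v}_i = \frac{2\,(v_i - v_{\min})}{v_{\max} - v_{\min}} - 1,$$
where
$$u_{\min} = \min_i u_i, \qquad u_{\max} = \max_i u_i,$$
and
$$v_{\min} = \min_i v_i, \qquad v_{\max} = \max_i v_i.$$
The data set then becomes $\{(\tilde{u}_i, \tilde{v}_i)\}_{i=1}^{n-2}$.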
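A minimal sketch of min-max scaling and its inverse follows, assuming the target range is $[-1, 1]$; scikit-learn's `MinMaxScaler(feature_range=(-1, 1))` performs the equivalent column-wise transform. Fitting the minimum and maximum on the training portion only, to avoid leaking test statistics, is our suggestion; the text does not say which portion the authors used.

```python
import numpy as np

def fit_minmax(z):
    """Return (min, max) of the data the scaler is fitted on."""
    return float(np.min(z)), float(np.max(z))

def scale(z, z_min, z_max):
    """Map [z_min, z_max] linearly onto [-1, 1]."""
    return 2.0 * (z - z_min) / (z_max - z_min) - 1.0

def unscale(s, z_min, z_max):
    """Inverse of `scale`, used to map model outputs back to differences."""
    return (s + 1.0) * (z_max - z_min) / 2.0 + z_min

u_train = np.array([-10.0, 0.0, 15.0])   # toy values
lo, hi = fit_minmax(u_train)
s = scale(u_train, lo, hi)               # [-1.0, -0.2, 1.0]
```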
4 Experiments and Results
4.1 Data Source
Our data come from the ring roads of Beijing, China. Traffic flow data are collected by 14 microwave detectors at 10-minute intervals. Each detector provides 8,784 observations spanning 61 days from June to July 2013. For each series, we use the first 7,200 observations as training data and the rest as test data. We use the detected traffic flow as input and obtain the predicted traffic flow at specified time points. The experimental results verify the feasibility and effectiveness of our proposed model.
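The reported series length and split can be checked with a few lines of arithmetic; the constant names and the `chronological_split` helper below are illustrative, not taken from the authors' code.

```python
SAMPLES_PER_DAY = 24 * 60 // 10          # 10-minute intervals -> 144 per day
DAYS = 30 + 31                           # June and July 2013
TOTAL = SAMPLES_PER_DAY * DAYS           # 8784, the length reported above
N_TRAIN = 7200
N_TEST = TOTAL - N_TRAIN                 # 1584

train_days = N_TRAIN / SAMPLES_PER_DAY   # 50.0 days of training data
test_days = N_TEST / SAMPLES_PER_DAY     # 11.0 days of test data

def chronological_split(series, n_train=N_TRAIN):
    """First n_train points for training, the rest for testing (no shuffling)."""
    return series[:n_train], series[n_train:]
```

Splitting chronologically rather than randomly keeps future observations out of the training set, which matters for time series.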
4.2 Performance Indexes
To evaluate the performance of our proposed model, we conduct experiments on it and on three comparable models: a one-layer LSTM, a deep LSTM, and DRN. The one-layer LSTM serves as the baseline to show the accuracy improvement of our model. The deep LSTM is used to illustrate that DIDRN outperforms a state-of-the-art and widely used model. Finally, DRN is included to demonstrate that our improvements to it are effective.
We use Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE) and Mean Absolute Error (MAE) to evaluate the performance of these four models.
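RMSE, MAPE and MAE are defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2},$$
$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N} \left|\frac{y_i - \hat{y}_i}{y_i}\right|,$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left|y_i - \hat{y}_i\right|,$$
where $y_i$ is the real value, $\hat{y}_i$ is the forecast value, and $N$ is the number of test samples.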
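The three metrics translate directly into NumPy. The sketch below implements the standard definitions, with MAPE reported in percent; MAPE is undefined when any observed flow is zero, and the sketch does not guard against that. The sample values are toy numbers.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    """Mean absolute percentage error, in percent (y must be nonzero)."""
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)

y = np.array([100.0, 200.0, 400.0])      # observed flows (toy)
y_hat = np.array([110.0, 190.0, 400.0])  # forecasts (toy)
```

For these values the absolute errors are 10, 10 and 0, so MAE is 20/3 while MAPE is 5%: the same 10-vehicle error counts twice as much, in relative terms, at a flow of 100 as at 200.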
RMSE and MAE measure the deviation between the predicted values and the detected values. They grow with the range of the data, so our model cannot be evaluated on the absolute values of RMSE and MAE alone; the range of the data must be considered at the same time. MAPE, by contrast, is a relative measure of the deviation between the predicted values and the ground truth. As a percentage error, it does not depend on the scale of the data. We therefore use these three indexes to evaluate two aspects of the selected models: 1) absolute error and 2) relative error.
4.3 Performance Analysis
Except for the one-layer LSTM, all models have 16 layers. To match dimensions, we add some layers to both DRN and DIDRN, but they still have 16 layers in total.
We evaluate models by 14 different sets of traffic flow data from 14 detectors. They are numbered as 2010, 2011, 2013, 2023, 2030, 2033, 2052, 3034, 3035, 4004, 4005, 4050, 4051, 5062.
All experiments are run on a desktop with a 2.60 GHz processor, 8.0 GB RAM and an NVIDIA GeForce GTX 960M. All models are implemented in Python 3.6 with Keras and TensorFlow and run in Jupyter Notebook under Anaconda.
We first use these models to predict short-term traffic flow. Specifically, the time interval between data points is 10 minutes, and we use the traffic flow at one time point to predict the traffic flow 10 minutes later.
Table 1 shows the performance.
Detector  Metric  One-Layer LSTM  Deep LSTM  DRN  DIDRN
2010  RMSE  101.07  75.65  75.64  74.35
2010  MAPE  11.58%  7.05%  7.04%  6.99%
2010  MAE  84.60  59.13  59.12  53.73
2011  RMSE  123.79  104.13  104.13  101.53
2011  MAPE  11.15%  7.90%  7.90%  7.44%
2011  MAE  98.61  77.70  77.71  75.07
2012  RMSE  122.05  96.47  96.37  94.46
2012  MAPE  15.56%  9.46%  9.43%  9.38%
2012  MAE  98.82  71.23  71.13  69.07
2023  RMSE  99.99  78.28  78.02  74.26
2023  MAPE  11.19%  7.02%  6.97%  6.96%
2023  MAE  81.86  58.90  58.60  55.72
2030  RMSE  104.95  76.00  75.96  67.13
2030  MAPE  14.63%  7.43%  7.42%  7.38%
2030  MAE  86.94  57.34  57.29  49.66
2033  RMSE  79.84  59.75  59.75  58.40
2033  MAPE  12.63%  6.89%  6.89%  6.80%
2033  MAE  66.70  45.11  45.11  44.11
2052  RMSE  109.56  85.99  85.99  83.28
2052  MAPE  11.63%  7.46%  7.46%  7.20%
2052  MAE  88.31  65.52  65.52  62.77
3034  RMSE  99.08  74.32  74.04  72.27
3034  MAPE  10.45%  6.97%  6.89%  6.89%
3034  MAE  83.05  57.07  56.99  53.67
3035  RMSE  98.09  68.36  68.36  64.74
3035  MAPE  15.15%  7.64%  7.64%  7.59%
3035  MAE  82.82  51.15  51.15  46.68
4004  RMSE  94.20  61.89  61.89  57.30
4004  MAPE  16.44%  8.21%  8.21%  6.80%
4004  MAE  80.25  47.01  47.01  42.39
4005  RMSE  106.9  87.65  87.65  86.02
4005  MAPE  10.02%  6.34%  6.34%  6.27%
4005  MAE  87.13  65.63  65.63  64.35
4050  RMSE  93.53  57.85  57.85  56.48
4050  MAPE  37.96%  19.36%  19.36%  18.58%
4050  MAE  80.57  37.89  37.89  36.11
4051  RMSE  112.35  98.13  98.13  98.60
4051  MAPE  12.04%  8.66%  8.66%  8.45%
4051  MAE  83.94  65.56  65.56  65.20
5062  RMSE  90.63  60.06  60.06  56.18
5062  MAPE  17.89%  8.23%  8.23%  8.06%
5062  MAE  77.86  45.87  45.87  42.24
Table 1 shows that DIDRN achieves the lowest (or tied-lowest) error for almost every detector and metric; the only exception is RMSE at detector 4051, where deep LSTM and DRN are slightly better. The one-layer LSTM performs much worse than the other models because it is shallower. Deep LSTM performs satisfactorily, but on our machine it takes nearly 1 hour to train before it converges. DRN has an error nearly identical to, and mostly slightly lower than, that of deep LSTM, and it is much easier to train, taking only about 20 minutes. Although DIDRN is derived from DRN, it has a lower error rate than DRN, and it is also not hard to train: we can finish training it within 30 minutes, which is similar to DRN and much shorter than deep LSTM.
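To make the size of DIDRN's advantage over DRN concrete, the snippet below recomputes the per-detector RMSE and MAPE differences from the Table 1 values (copied verbatim; no new data). It shows that the largest gains occur at detector 2030 for RMSE and detector 4004 for MAPE, and that detector 4051 is the one case where DIDRN's RMSE is higher.

```python
# Values copied from Table 1: (RMSE_DRN, RMSE_DIDRN, MAPE_DRN %, MAPE_DIDRN %)
TABLE1 = {
    "2010": (75.64, 74.35, 7.04, 6.99),
    "2011": (104.13, 101.53, 7.90, 7.44),
    "2012": (96.37, 94.46, 9.43, 9.38),
    "2023": (78.02, 74.26, 6.97, 6.96),
    "2030": (75.96, 67.13, 7.42, 7.38),
    "2033": (59.75, 58.40, 6.89, 6.80),
    "2052": (85.99, 83.28, 7.46, 7.20),
    "3034": (74.04, 72.27, 6.89, 6.89),
    "3035": (68.36, 64.74, 7.64, 7.59),
    "4004": (61.89, 57.30, 8.21, 6.80),
    "4005": (87.65, 86.02, 6.34, 6.27),
    "4050": (57.85, 56.48, 19.36, 18.58),
    "4051": (98.13, 98.60, 8.66, 8.45),
    "5062": (60.06, 56.18, 8.23, 8.06),
}

# Positive values mean DIDRN is better than DRN.
rmse_gain = {k: v[0] - v[1] for k, v in TABLE1.items()}
mape_gain = {k: v[2] - v[3] for k, v in TABLE1.items()}   # percentage points

best_rmse = max(rmse_gain, key=rmse_gain.get)
best_mape = max(mape_gain, key=mape_gain.get)
worse_rmse = [k for k, g in rmse_gain.items() if g < 0]
```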
Figure 5 shows a comparison of predictions by different models.
Figure 6 shows part of the predictions of DIDRN.
To evaluate the stability of our proposed model, we change the time interval to 1 hour, 2 hours and 24 hours and conduct experiments with DIDRN. Figure 7 and Table 2 show the results.
Time Interval (hour)  RMSE  MAPE  MAE 

1  202.87  18.45%  141.34 
2  342.96  32.67%  233.05 
24  135.24  10.88%  94.92 
In the following discussion, we base our analysis on MAPE because it is simple and intuitive, and we use the data of detector 2010 to demonstrate the results.
From Table 2, we can see that as the time interval grows from 10 minutes to 1 hour and 2 hours, the accuracy decreases monotonically. This agrees with our intuition: when the time interval becomes larger, more information is needed to predict the traffic flow, and the traffic flow at the previous time point is less related to the traffic flow at the predicted time point, so the accuracy is expected to drop. However, when we use the traffic flow of the previous day to predict that of the next day, MAPE improves significantly to 10.88%. This is unsurprising. On the one hand, since our data are stable and periodic, the traffic flow at the same time on the previous day is quite similar to that on the predicted day. On the other hand, the traffic flow 1 or 2 hours later can differ substantially from that at the current time point. For example, traffic flow at 7:00 a.m. differs significantly from traffic flow at 8:00 a.m. and 9:00 a.m., which are peak hours: people commute to work during peak hours, causing a sudden increase in traffic flow. The model does not have enough information to predict this sudden increase, so its predicted values deviate more from the detected values.
We explore in more detail how the accuracy decreases and then increases as the time interval grows from 10 minutes to 1 hour and from 1 hour to 24 hours.
Figure 8 shows MAPE as the time interval increases from 10 minutes to 1 hour; over this range, MAPE grows approximately linearly with the time interval. This holds when the step between time intervals is small, but when we increase the step to 1 hour, the linear relation no longer holds. Figure 9 shows the MAPE of DIDRN as the time interval increases from 1 hour to 24 hours. The curve is hump-shaped: MAPE first rises from about 18% to over 100% and then falls to below 11%. Already at a time interval of 2 hours the error rate is unacceptable, and the highest error rate exceeds 105%, which means the model essentially cannot estimate the traffic flow when the time interval is large. The reason is that we use only the current traffic flow to estimate the traffic flow at the next time point. Traffic flow does not change significantly when the time interval is short, but when the interval spans several hours, the current traffic flow differs greatly from the flow to be predicted, so the input carries little information about the target, much as if the series had been reshuffled. The peak-hour example above illustrates this.
In order to further analyze the performance of our model, we gradually increase the time interval to 7 days.
Table 3 shows the results. When the time interval ranges from 2 days to 6 days, the error rate decreases from 13.96% to 10.45%. This is consistent with the weekly periodicity of traffic flow discussed below: as the time interval approaches one week, the day being predicted becomes more similar to the input day.
There is a large drop in accuracy between 1 day and 2 days (MAPE rises from 10.88% to 13.96%). Examining the prediction process explains why. When the time interval equals 1 day, the traffic flow of Friday and Saturday is used to predict that of Saturday and Sunday, respectively. When the time interval equals 2 days, however, the traffic flow of Friday and Saturday is used to predict that of Sunday and Monday, respectively. Saturday and Sunday are weekend days, whereas Monday and Friday are workdays. Because fewer people go to work at weekends, weekend and workday traffic flows differ significantly. Hence, the accuracy decreases when the time interval equals 2 days.
There is a large increase in accuracy between 6 days and 7 days (MAPE falls from 10.45% to 7.28%). By symmetry, the error rate at 6 days should be similar to that at 1 day, which our results confirm (10.45% versus 10.88%). Since traffic flow is periodic with a period of exactly 7 days, it is unsurprising that the flow one week later is similar to the current flow. This property allows our model to perform better and achieve higher accuracy.
Time Interval (days)  RMSE  MAPE  MAE

1  135.24  10.88%  94.92 
2  158.27  13.96%  113.70 
3  154.55  13.81%  109.36 
4  145.62  13.22%  104.54 
5  142.73  10.85%  99.14 
6  143.36  10.45%  98.12 
7  100.56  7.28%  69.48 
In summary, our model performs well in short-term traffic flow prediction, where the traffic flow at one time point is used to predict the traffic flow at the next time point. If more information were used as input, that is, not only the current traffic flow but also the flow at one or more previous time points, our model would likely perform better.
5 Conclusion and Future work
In this paper, we present the basic ideas of the deep residual network, which is much simpler to train and performs excellently. We then explain how DRN inspired our work and how we improve it, show the architecture of our network and how it works, and propose our dynamic model DIDRN and explain why it makes sense.
We describe the data processing step by step, explaining how we transform the raw data into data suitable for supervised learning and how we obtain the final predictions. We then conduct several experiments and compare the performance of different models. DIDRN performs better than several popular models: in terms of MAPE, it improves on deep LSTM and DRN by up to 1.41 percentage points, and it reduces RMSE by up to about 9 and MAE by up to about 7.7 (both at detector 2030).
To summarize, our main contributions are as follows:

We apply the deep residual network to traffic flow prediction and improve its architecture.

We take practical applications into account and propose a dynamic model called DIDRN.

The results show that our model outperforms several commonly used models.
Despite its good performance, our model still has some shortcomings. We only take temporal patterns into account in this paper. In future work, we will consider spatial patterns and make our model learn spatial-temporal dependence.
In addition, our model predicts traffic flow well when the time interval is short or equals the period of our data, but it performs poorly when the time interval is moderately larger. Future work could extend the model to a more general version that performs well for both short and long time intervals.
Moreover, since weather conditions affect traffic flow, future work could train our network to learn the correlation between weather conditions and traffic flow.
References
Davis and Nihan [1991] G. A. Davis and N. L. Nihan. Nonparametric regression and short-term freeway traffic forecasting. Journal of Transportation Engineering, vol. 117(no. 2):pp. 178–188, 1991. URL https://ascelibrary.org/doi/abs/10.1061/(ASCE)0733947X(1991)117:2(178).
Ahmed and Cook [1979] M. S. Ahmed and A. R. Cook. Analysis of freeway traffic time-series data by using Box-Jenkins techniques. Transportation Research Record, (no. 722):pp. 1–9, 1979. URL http://onlinepubs.trb.org/Onlinepubs/trr/1979/722/722001.pdf.
Billy M. Williams and Lester A. Hoel [2003] Billy M. Williams and Lester A. Hoel. Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results. Journal of Transportation Engineering, vol. 129(no. 6), 2003. URL https://ascelibrary.org/doi/abs/10.1061/%28ASCE%290733947X%282003%29129%3A6%28664%29.
Friedman [2001] J. H. Friedman. Greedy function approximation: A gradient boosting machine. The Annals of Statistics, vol. 29(no. 5):pp. 1189–1232, 2001. URL http://www.jstor.org/stable/2699986.
Gang Song [2017] Gang Song and Qun Dai. A novel double deep ELMs ensemble system for time series forecasting. Knowledge-Based Systems, 2017. URL https://www.sciencedirect.com/science/article/pii/S0950705117303295.
He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016. URL http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf.
Hochreiter [1991] Sepp Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Technische Universität München, 1991. URL https://link.springer.com/article/10.1007%2FBF03227308.
Klaus Greff [2017] Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, and Jürgen Schmidhuber. LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, vol. 28:pp. 2222–2232, 2017. URL https://ieeexplore.ieee.org/document/7508408/.
Kyunghyun Cho [2014] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014. URL https://arxiv.org/abs/1406.1078.
Kyunghyun Cho and Bengio [2014] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014. URL http://arxiv.org/abs/1406.1078.
L. Li and Zhang [2015] L. Li, X. Su, and Y. Zhang. Trend modeling for traffic time series analysis: An integrated study. IEEE Transactions on Intelligent Transportation Systems, vol. 16(no. 6):pp. 1–10, 2015. URL https://ieeexplore.ieee.org/abstract/document/7180371/.
Leshem and Ritov [2007] G. Leshem and Y. Ritov. Traffic flow prediction using AdaBoost algorithm with random forests as a weak learner. International Journal of Mathematical, Computational, Physical, Electrical and Computer Engineering, vol. 1(no. 1):pp. 2–7, 2007. URL https://pdfs.semanticscholar.org/d013/049dc55f011651cbc6c4ba27097be5e0bfad.pdf.
R. Fu and Li [2016] R. Fu, Z. Zhang, and L. Li. Using LSTM and GRU neural network methods for traffic flow prediction. In 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), pp. 324–328, 2016. URL https://ieeexplore.ieee.org/abstract/document/7804912/.
Sepp Hochreiter [1997] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, vol. 9(no. 8):pp. 1735–1780, 1997. URL https://www.mitpressjournals.org/doi/abs/10.1162/neco.1997.9.8.1735.
Tian and Pan [2015] Y. Tian and L. Pan. Predicting short-term traffic flow by long short-term memory recurrent neural network. In 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity), pp. 153–158, 2015. URL https://ieeexplore.ieee.org/abstract/document/7463717/.
W. Huang and Xie [2014] W. Huang, G. Song, H. Hong, and K. Xie. Deep architecture for traffic flow prediction: Deep belief networks with multitask learning. IEEE Transactions on Intelligent Transportation Systems, vol. 15(no. 5):pp. 2191–2201, 2014. URL https://ieeexplore.ieee.org/abstract/document/6786503/.
X. Jin and Yao [2007] X. Jin, Y. Zhang, and D. Yao. Simultaneously prediction of network traffic flow based on PCA-SVR. Advances in Neural Networks – ISNN, vol. 4492 (Part II):pp. 1022–1031, 2007. URL https://link.springer.com/chapter/10.1007%2F9783540723936_121.
Xiaochuan Sun [2017] Xiaochuan Sun, Tao Li, Qun Li, Yue Huang, and Yingqi Li. Deep belief echo-state network and its application to time series prediction. Knowledge-Based Systems, 2017. URL https://www.sciencedirect.com/science/article/pii/S0950705117302459.
Xingyuan Dai [2017] Xingyuan Dai, Rui Fu, and Yilun Lin. DeepTrend: A deep hierarchical neural network for traffic flow prediction. arXiv preprint arXiv:1707.03213, 2017. URL https://arxiv.org/abs/1707.03213.
Y. Lv and Wang [2014] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang. Traffic flow prediction with big data: A deep learning approach. IEEE Transactions on Intelligent Transportation Systems, vol. 16(no. 2):pp. 1–9, 2014. URL https://ieeexplore.ieee.org/abstract/document/6894591/.