1 Introduction
Urban life has changed considerably as local communities have developed. The accompanying growth in transport demand causes traffic congestion: clogged roads, slower speeds, longer trip times, and longer vehicle queues on most urban and suburban routes in the world. Congestion in turn triggers further problems, such as air and noise pollution, and contributes substantially to a lower quality of life. Governments therefore treat intelligent traffic flow control systems as a priority. Traffic flow forecasting is a crucial step toward optimizing timing in adaptive public traffic control systems.
Traffic flow prediction matters both to transport managers and to drivers and the general public. For managers, prediction methods help identify heavy traffic across the country, and predefined procedures and protocols can then prevent long traffic jams. Drivers and the general public, in turn, can make better decisions based on the predictions and thereby help reduce traffic levels. Predicting the traffic flow characteristics of a geographical area is therefore one of the most important inputs to decision-making and policymaking in urban traffic management. Traffic flow prediction is mainly divided into three categories [1]:

Short-term forecasting (intervals of 5 to 30 minutes)

Medium-term forecasting (intervals of 30 minutes to several hours)

Long-term forecasting (ranges of one day to several days)
The ultimate goal in this domain is to predict the traffic flow in a particular region from historical traffic data before it occurs. However, unpredictable disturbances, including internal events on the transport routes (such as an accident or the collapse of part of a route) and unexpected external events (such as a flood or storm), make long-term forecasting inaccurate. Medium-term and short-term forecasting, by contrast, can be reliable if they are set up correctly.
This research considers the short-term case. The hybrid deep learning method predicts the flow with a complex generative model learned from the data, which can capture the spatial and temporal correlations within a sequence of traffic flows over a particular range. The proposed model is then compared with other state-of-the-art models.
The contribution of this paper can be summarized as follows:

Presenting a novel hybrid deep learning model based on a Variational Long Short-Term Memory Encoder (VLSTME)

Taking the distribution of the data into account to forecast short-term traffic flow

Handling missing data caused by sensor failures by means of the learned data distribution
The paper is organized as follows. The next section briefly describes the terminology, challenges, and existing methods of short-term traffic forecasting, with a focus on neural network techniques. Section 3 introduces the background of the model, and Section 4 presents the proposed model. The dataset is described in Section 5, and the results and performance evaluation are presented in Section 6. Finally, conclusions and future research are given in Section 7.
2 Related Works
Traffic flow forecasting is one of the most useful tools in intelligent transportation systems (ITS). It lets the system operate under automatic control and anticipate events before they occur: the system can predict and assess traffic states, prepare for logical decision-making at the machine level, and manage conditions according to human-made protocols [2]. Within ITS, short-term prediction of traffic flow is more critical than the other two categories above, and it has attracted a great deal of research and development in both academia and practice [2]. Research on short-term forecasting models can be classified into two main categories:

Nonparametric: These models, which have nonlinear foundations, aim to find the model with the most receptive learning features. Many studies have obtained remarkable results with this approach, using nonparametric regression techniques [12, 13, 14], k-nearest neighbor models [15], fuzzy techniques [16, 17, 18], neural networks [19, 20, 21, 22, 23], and support vector machines [24, 25, 26].
The real-time spatio-temporal information collected by traffic sensors around a country is a sign of technological progress that offers valuable resources to its transportation systems. This information contains a vast number of patterns of road transport in a geographic location, and its direct and indirect effects lay the foundation for applying deep learning networks. Deep learning is a branch of machine learning that produces short-term traffic flow forecasts by finding latent dependencies in sets of patterns with high-dimensional explanatory variables. Such models try to detect extreme disturbances in the traffic flow within the pool of latent relations provided by real-time sensors [27, 28]. Nevertheless, it is not clear which type of deep learning model is the most appropriate for forecasting traffic flows; each model captures part of these latent relations with a different structure.
For example, the Stacked Autoencoders model, which considers temporal and spatial correlations, was able to learn the general characteristics of the traffic flow [29]. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks achieved better performance [30]; they provide a way to obtain better results as the length of the input sequences increases. It is also necessary to take greater account of the effects of earlier and later times on each day, since the performance of these models drops significantly as errors accumulate. The LSTM+ model in [31] achieved better performance by considering these effects.
Beyond predicting traffic flow behavior, there is a data-quality challenge: traffic sensors are usually controlled manually, so the data they collect come with varying lengths, irregular sampling, and missing values. These irregularities complicate prediction. To address this challenge, a model based on Long Short-Term Memory was proposed in [32]. Convolutional Neural Network models, which have proven their ability on image problems, have also been used in this domain and have produced excellent results in traffic flow prediction [33].
3 Background
The core of the proposed model consists of two parts, the variational part and the Long Short-Term Memory (LSTM) part. Each is introduced in detail below.
3.1 Long Short-Term Memory
Long Short-Term Memory (LSTM), shown in Fig (1) and proposed by [34], is a recurrent neural network architecture that is capable of learning long-term dependencies. It was developed to deal with the vanishing gradient problem and can be regarded as a neural network architecture that is deep over time. The main component of an LSTM layer is the memory cell.
A memory cell consists of four main elements: an input gate, a neuron with a self-recurrent connection, a forget gate, and an output gate. The following equations show, step by step, the operation of a layer of memory cells for an input time series $x = (x_1, \dots, x_T)$, hidden states $h = (h_1, \dots, h_T)$, and memory cells $c = (c_1, \dots, c_T)$.
$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1}) \tag{1}$$
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1}) \tag{2}$$
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1}) \tag{3}$$
$$\tilde{c}_t = \tanh(W_{xc} x_t + W_{hc} h_{t-1}) \tag{4}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{5}$$
$$h_t = o_t \odot \tanh(c_t) \tag{6}$$
The sign $\odot$ denotes elementwise multiplication, and the bias terms are omitted to show how the hidden layer is calculated at time $t$. In the calculations above:

$i_t$, $f_t$, and $o_t$ are called the input, forget, and output gates, respectively.

$W_{x\cdot}$ are the weights that connect the input at time $t$ to the hidden layer at time $t$.

$W_{h\cdot}$ are the weights that connect the hidden layer at time $t-1$ to the recurrent layer at time $t$.
At the end of the weighted calculation in each gate, the result passes through a sigmoid activation function. Because the sigmoid function in Eq (7) has a range from 0 to 1, it can act as a gate that is open or closed.
$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{7}$$
In Long Short-Term Memory networks, the objective function depends on the structure of the problem; cross-entropy, softmax, and quadratic ($\ell_2$) losses are commonly used.
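The memory-cell update described in this section can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names, the stacked weight layout, and the layer sizes are assumptions made for the example, and bias terms are left out as in the text.

```python
import numpy as np

def sigmoid(x):
    """Logistic function; squashes values into (0, 1) so they can act as gates."""
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W_x, W_h):
    """One bias-free LSTM step.

    W_x has shape (4 * hidden, input_dim) and W_h has shape (4 * hidden, hidden).
    The four row blocks hold the input gate, forget gate, output gate, and
    candidate weights, in that order (a layout chosen for this sketch).
    """
    hidden = h_prev.shape[0]
    pre = W_x @ x_t + W_h @ h_prev              # all four pre-activations at once
    i_t = sigmoid(pre[0 * hidden:1 * hidden])   # input gate
    f_t = sigmoid(pre[1 * hidden:2 * hidden])   # forget gate
    o_t = sigmoid(pre[2 * hidden:3 * hidden])   # output gate
    g_t = np.tanh(pre[3 * hidden:4 * hidden])   # candidate cell state
    c_t = f_t * c_prev + i_t * g_t              # elementwise memory update
    h_t = o_t * np.tanh(c_t)                    # new hidden state
    return h_t, c_t

# Run a 12-step scalar sequence through a cell with 4 hidden units.
rng = np.random.default_rng(0)
input_dim, hidden = 1, 4
W_x = rng.normal(scale=0.5, size=(4 * hidden, input_dim))
W_h = rng.normal(scale=0.5, size=(4 * hidden, hidden))
h, c = np.zeros(hidden), np.zeros(hidden)
for x in rng.random((12, input_dim)):
    h, c = lstm_step(x, h, c, W_x, W_h)
```

Because the output gate lies in (0, 1) and the hyperbolic tangent lies in (-1, 1), every entry of the hidden state stays strictly between -1 and 1.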
3.2 Variational Autoencoders
Before turning to the variational part, it is necessary to introduce the concept of an autoencoder [35]. An autoencoder is a two-part neural network that learns to compress information: an encoder network is forced to map the input $x$ to a low-dimensional representation $z$, which is then consumed by a decoder network to reconstruct the original data, as shown in Fig (2).
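The goal of the variational part [36] is a model whose reconstructions do not depend only on the data. A Variational Autoencoder decodes from a known probability distribution, in this case a Gaussian distribution produced by the encoder, so that it generates reasonable outputs even when they are not encodings of actual data, as shown in Fig (3).
Suppose $x$ is a set of observed variables and $z$ is a set of hidden variables with joint distribution $p(x, z)$. Label this distribution $p_\theta$, as it is parameterized by $\theta$. Sampling from it generates a sample that looks like a real data point, as shown in Fig (4). The inference problem is then to calculate the conditional distribution of the hidden variables given the observations, that is, $p_\theta(z \mid x)$, which can be written as shown in Eq (8).
$$p_\theta(z \mid x) = \frac{p_\theta(x \mid z)\, p_\theta(z)}{p_\theta(x)} \tag{8}$$
Unfortunately, computing $p_\theta(x) = \int p_\theta(x \mid z)\, p_\theta(z)\, dz$ is quite difficult, because it is very expensive to check all possible values of $z$ and sum them up. To solve this issue, $p_\theta(z \mid x)$ is approximated by another distribution $q_\phi(z \mid x)$, which allows approximate inference of the intractable distribution. To ensure that $q_\phi(z \mid x)$ and $p_\theta(z \mid x)$ are similar to each other, we minimize the KL divergence between the two distributions, as shown in Eq (9).
$$D_{KL}\big(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\big) = \log p_\theta(x) + D_{KL}\big(q_\phi(z \mid x) \,\|\, p_\theta(z)\big) - \mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \tag{9}$$
Rearranging the left-hand and right-hand sides of the equation gives Eq (10). The loss function is then the negative of the variational lower bound, also called the evidence lower bound, as shown in Eq (11).
$$\log p_\theta(x) - D_{KL}\big(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\big) = \mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{KL}\big(q_\phi(z \mid x) \,\|\, p_\theta(z)\big) \tag{10}$$
$$L(\theta, \phi) = -\mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] + D_{KL}\big(q_\phi(z \mid x) \,\|\, p_\theta(z)\big) \tag{11}$$
Therefore, by minimizing the loss, we maximize the lower bound of the probability of generating real data samples, as shown in Eq (12).
$$-L(\theta, \phi) = \log p_\theta(x) - D_{KL}\big(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\big) \le \log p_\theta(x) \tag{12}$$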
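For a Gaussian encoder and a standard normal prior, the KL term of the variational loss has a closed form, and sampling can be written so that it stays differentiable. The NumPy sketch below illustrates both under stated assumptions: a diagonal Gaussian posterior, a standard normal prior, and a squared-error reconstruction term (equivalent to a fixed-variance Gaussian likelihood up to scaling and additive constants). The function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def negative_elbo(x, x_recon, mu, log_var):
    """Reconstruction term (squared error, an assumption) plus the KL term."""
    recon = np.sum((x - x_recon) ** 2, axis=-1)
    return recon + gaussian_kl(mu, log_var)

rng = np.random.default_rng(0)
mu = np.array([1.0, 0.0])
log_var = np.zeros(2)
z = reparameterize(mu, log_var, rng)      # one latent sample
kl = gaussian_kl(mu, log_var)             # KL of N([1, 0], I) from N(0, I)
loss = negative_elbo(np.array([0.3, 0.7]), np.array([0.3, 0.7]), mu, log_var)
```

With a perfect reconstruction the loss reduces to the KL term alone, which is zero only when the posterior equals the prior.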
4 Proposed Method
Building on the previous approaches, the proposed model includes a Variational Autoencoder that uses LSTM networks as its encoder and decoder, as shown in Fig (5). The LSTM exploits both past and future information. Finally, a multilayer perceptron (MLP) network maps the samples of the distribution learned by the VLSTME to the target.
In this approach, the network simultaneously learns the distribution of the data, draws samples from that distribution, and feeds them into the multilayer perceptron model to estimate the traffic flow.
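The data path described above (LSTM encoder, Gaussian latent sample, MLP head) can be sketched as a forward pass in NumPy. This is a rough illustration rather than the authors' implementation: the decoder and the training loop are omitted, the weights are random and untrained, and all names and layer sizes (`W_mu`, `W_lv`, `W_1`, `w_2`, 8 hidden units, 2 latent dimensions) are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_encode(window, W_x, W_h):
    """Run a bias-free LSTM over a (steps, features) window; return the last hidden state."""
    hidden = W_h.shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x_t in window:
        pre = W_x @ x_t + W_h @ h
        i, f, o = (sigmoid(pre[k * hidden:(k + 1) * hidden]) for k in range(3))
        g = np.tanh(pre[3 * hidden:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

def forecast(window, params, rng):
    """Encode a window, sample a latent vector, and map it to a flow value in (0, 1)."""
    h = lstm_encode(window, params["W_x"], params["W_h"])
    mu = params["W_mu"] @ h                     # mean of the latent Gaussian
    log_var = params["W_lv"] @ h                # log-variance of the latent Gaussian
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
    hidden_mlp = np.tanh(params["W_1"] @ z)     # single hidden MLP layer
    return sigmoid(params["w_2"] @ hidden_mlp)  # sigmoid output layer

rng = np.random.default_rng(0)
hidden, latent, mlp_units = 8, 2, 4             # hypothetical sizes
params = {
    "W_x": rng.normal(scale=0.3, size=(4 * hidden, 1)),
    "W_h": rng.normal(scale=0.3, size=(4 * hidden, hidden)),
    "W_mu": rng.normal(scale=0.3, size=(latent, hidden)),
    "W_lv": rng.normal(scale=0.3, size=(latent, hidden)),
    "W_1": rng.normal(scale=0.3, size=(mlp_units, latent)),
    "w_2": rng.normal(scale=0.3, size=mlp_units),
}
window = rng.random((12, 1))                    # 12 scaled look-back readings
y_hat = forecast(window, params, rng)
```

Because the MLP consumes a sample rather than the raw window, repeated calls with the same input give slightly different forecasts, which reflects the learned distribution rather than a single point encoding.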
5 Experiments
5.1 Dataset
The Caltrans Performance Measurement System (PeMS) is used as a public dataset. Its data are collected in real time by more than 39,000 individual detectors across all major metropolitan areas of the state of California. PeMS provides a wide variety of traffic data integrated from Caltrans and other local agency systems.
In this paper, the traffic flow dataset consists of sensor readings from District 7 in California between 2019-01-01 and 2019-05-30, recorded at five-minute intervals. When sensors fail, some records have no values (missing data). To handle this, a combination of spline interpolation and averaging over 15-minute intervals is applied during preprocessing, which helps the model learn the underlying patterns. The proposed model is tested on the traffic flows at two stations, 716076 and 717060, as shown in Fig (6).
Then, for each record at time $t$, the data from times $t-1, \dots, t-12$ are selected as additional features; in other words, each sample includes the 12 earlier records as a look-back window. The data are then scaled with a min-max scaler. The data between 2019-01-01 00:00:00 and 2019-03-31 23:59:00 are used as the training set and the rest for testing, as shown in Table (1). Typical daily traffic flow charts for the training and testing parts at the two stations are presented in Fig (7).
Stations  X Train  Y Train  X Test  Y Test 

716076  8628 x 12 x 1  8628 x 1  5778 x 12 x 1  5778 x 1 
717060  8628 x 12 x 1  8628 x 1  6187 x 12 x 1  6187 x 1 
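The preprocessing steps of this section (filling gaps, averaging to 15 minutes, min-max scaling, and building 12-step look-back windows) can be sketched with NumPy as follows. Several points are assumptions made for the example: the data are synthetic rather than PeMS readings, linear interpolation via `np.interp` stands in for the spline interpolation, gaps are filled before averaging, and the helper names are hypothetical.

```python
import numpy as np

def fill_missing(values):
    """Fill NaNs by linear interpolation (a simple stand-in for the spline step)."""
    values = np.asarray(values, dtype=float)
    idx = np.arange(len(values))
    known = ~np.isnan(values)
    return np.interp(idx, idx[known], values[known])

def aggregate(values, factor=3):
    """Average consecutive groups of `factor` readings (3 x 5 min = 15 min)."""
    usable = len(values) - len(values) % factor
    return values[:usable].reshape(-1, factor).mean(axis=1)

def min_max_scale(values, lo, hi):
    """Scale to [0, 1] using the given bounds (e.g. from the training split)."""
    return (values - lo) / (hi - lo)

def make_windows(series, look_back=12):
    """Build X of shape (n, look_back, 1) and y of shape (n, 1)."""
    X = np.stack([series[i:i + look_back] for i in range(len(series) - look_back)])
    y = series[look_back:]
    return X[..., None], y[:, None]

# Synthetic 5-minute flow for 90 days (288 readings per day) with random gaps.
rng = np.random.default_rng(0)
t = np.arange(90 * 288)
raw = 200 + 150 * np.sin(2 * np.pi * t / 288) + rng.normal(0, 10, t.size)
raw[rng.choice(t.size, 500, replace=False)] = np.nan

series = aggregate(fill_missing(raw))
scaled = min_max_scale(series, series.min(), series.max())
X, y = make_windows(scaled)
```

With 90 days of data, which is the length of the training period from 2019-01-01 to 2019-03-31, the sketch yields 8,640 fifteen-minute averages and therefore 8,628 windows, the same count as the training rows in Table (1).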
5.2 Parametric Settings
In terms of hardware, we use a Tesla K80 GPU provided by Google Colab [37]. The proposed VLSTME architecture and the comparison networks were implemented on the TensorFlow platform (v1.14.0) [38]. The learning rate is 0.0001, the batch size is 256, and the sigmoid function is used as the activation of the last layer for both.
5.3 Index of Performance
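Four measurements are used in this paper to evaluate the effectiveness of the proposed model: the mean absolute percentage error (MAPE), the mean absolute error (MAE), the mean squared error (MSE), and the root mean squared error (RMSE), defined as follows:
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \tag{13}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{14}$$
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \tag{15}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{16}$$
(17) 
where $n$ is the number of test samples, $y_i$ is the real traffic flow in sample $i$, and $\hat{y}_i$ denotes the predicted traffic flow.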
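The four error measures reported in the result tables (MAPE, MAE, MSE, RMSE) are straightforward to compute. The following NumPy sketch uses their standard definitions; the function names and the three-value example are illustrative only and do not come from the paper's data.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (undefined if y_true contains 0)."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    """Mean squared error."""
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return np.sqrt(mse(y_true, y_pred))

# Toy example on min-max scaled flows.
y_true = np.array([0.50, 0.40, 0.80])
y_pred = np.array([0.45, 0.44, 0.80])
scores = {
    "MAPE [%]": mape(y_true, y_pred),
    "MAE": mae(y_true, y_pred),
    "MSE": mse(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
}
```

Note that MAPE divides by the true value, so it is sensitive to very small flows; on min-max scaled data, samples near zero can dominate the score.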
6 Results
The evaluation results and traffic flow forecasts are presented below for VLSTME (Table (6), Fig (8)), LSTM (Table (6), Fig (9)), MCNNM (Table (6), Fig (10)), and SAEs (Table (6), Fig (11)), respectively.
VLSTME
Station ID  MAPE [%]  MAE  MSE  RMSE 
716076  9.5954  0.0312  0.0018  0.0422 
717060  8.8625  0.0276  0.0015  0.0381 

LSTM
Station ID  MAPE [%]  MAE  MSE  RMSE 
716076  10.2718  0.0341  0.0024  0.0490 
717060  10.8174  0.0366  0.0022  0.0464 

MCNNM
Station ID  MAPE [%]  MAE  MSE  RMSE 
716076  31.0840  0.0757  0.0129  0.1136 
717060  24.0724  0.0603  0.0082  0.0905 

SAEs
Station ID  MAPE [%]  MAE  MSE  RMSE 
716076  9.9421  0.0326  0.0020  0.0449 
717060  18.4939  0.0560  0.0040  0.0635 
As the results show, the proposed model, VLSTME, improves on conventional models such as Stacked Autoencoders, Long Short-Term Memory, and Multiple Convolutional Neural Networks, which were introduced in 2015 [29], 2016 [30], and 2019 [33], respectively. To make this advantage clearer, the average of the results for each evaluation criterion is presented in Table (6), which shows that the MSE score of the VLSTME is 0.0016.
Average
Models  MAPE [%]  MAE  MSE  RMSE 
VLSTME  9.2290  0.0294  0.0016  0.0402 
LSTM [30]  10.5446  0.0353  0.0023  0.0477 
MCNNM [33]  27.5782  0.0680  0.0106  0.1021 
SAEs [29]  14.2180  0.0443  0.0030  0.0542 
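The averaged scores follow directly from the per-station results reported above. As a small check, the following Python sketch recomputes them as the plain mean over the two stations; the variable and function names are chosen for this example only.

```python
# Per-station scores copied from the result tables: (MAPE %, MAE, MSE, RMSE)
# for stations 716076 and 717060, in that order.
per_station = {
    "VLSTME": [(9.5954, 0.0312, 0.0018, 0.0422), (8.8625, 0.0276, 0.0015, 0.0381)],
    "LSTM": [(10.2718, 0.0341, 0.0024, 0.0490), (10.8174, 0.0366, 0.0022, 0.0464)],
    "MCNNM": [(31.0840, 0.0757, 0.0129, 0.1136), (24.0724, 0.0603, 0.0082, 0.0905)],
    "SAEs": [(9.9421, 0.0326, 0.0020, 0.0449), (18.4939, 0.0560, 0.0040, 0.0635)],
}

def average_scores(rows):
    """Column-wise mean over the stations of one model."""
    return tuple(sum(column) / len(column) for column in zip(*rows))

averages = {model: average_scores(rows) for model, rows in per_station.items()}
best_by_mape = min(averages, key=lambda model: averages[model][0])

for model, scores in averages.items():
    print(model, ["%.4f" % s for s in scores])
```

Small differences in the last digit relative to the published averages can arise because the per-station values are themselves rounded to four decimals.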
Figures (12, 13) show the prediction results for the two stations, 716076 and 717060, on the test dataset for April 20, 2019. At both stations, the VLSTME curve estimates the traffic flow better than the other curves. When the traffic flow fluctuates at high traffic volumes, the model quickly converges to that behavior. At low volumes with volatility, it also tracks the flow better than the Long Short-Term Memory model. A possible reason for this improvement lies in the data: in some cases the sensors at the stations cannot detect an observation, or the observation is not highly accurate. In other words, the sensors may fail to detect vehicles, which causes missing values. Because the model learns the distribution of the data and feeds samples of this distribution into the network, it can reduce the adverse effects of these missing data on the learning process and yield better results than other models such as Long Short-Term Memory.
7 Conclusions
This paper presents a deep learning approach with a Variational Long Short-Term Memory Encoder to predict short-term traffic flow. In contrast to previous approaches [30], the model considers the distribution of the data and provides a solution for missing data. It therefore achieves better results on the four evaluation criteria than the earlier models [29, 30, 33]. The model is evaluated on the PeMS dataset. In future work, it would be interesting to apply it to other datasets in which the stations and their sensors produce missing or low-quality information. Other distributions, such as the Dirichlet distribution, may also help improve the sampled distribution of the traffic flow.
References
[1] Zhongsheng Hou and Xingyi Li. Repeatability and similarity of freeway traffic flow and long-term prediction under big data. IEEE Transactions on Intelligent Transportation Systems, 17:1786–1796, 2016.
[2] Se-do Oh, Young-jin Kim, and Ji-sun Hong. Urban traffic flow prediction system using a multifactor pattern recognition model. IEEE Transactions on Intelligent Transportation Systems, 16:2744–2755, 2015.
[3] Anthony Stathopoulos and Matthew G. Karlaftis. A multivariate state space approach for urban traffic flow modeling and prediction. Transportation Research Part C: Emerging Technologies, 11(2):121–135, April 2003.
[4] Teng Zhou, Dazhi Jiang, Zhizhe Lin, Guoqiang Han, Xuemiao Xu, and Jing Qin. Hybrid dual Kalman filtering model for short-term traffic flow forecasting. IET Intelligent Transport Systems, 13(6):1023–1032, June 2019.
[5] Yanru Zhang, Yunlong Zhang, and Ali Haghani. A hybrid short-term traffic flow forecasting method based on spectral analysis and statistical volatility model. Transportation Research Part C: Emerging Technologies, 43:65–78, June 2014.
[6] Milan Krbálek, Jiří Apeltauer, and František Šeba. Traffic flow merging – statistical and numerical modeling of microstructure. Journal of Computational Science, 32:99–105, March 2019.
[7] Xianglong Luo, Liyao Niu, and Shengrui Zhang. An algorithm for traffic flow prediction based on improved SARIMA and GA. KSCE Journal of Civil Engineering, 22(10):4107–4115, Oct 2018.
[8] Qinzhong Hou, Junqiang Leng, Guosheng Ma, Weiyi Liu, and Yuxing Cheng. An adaptive hybrid model for short-term urban traffic flow prediction. Physica A: Statistical Mechanics and its Applications, 527:121065, August 2019.
[9] Chukwutoo C. Ihueze and Uchendu O. Onwurah. Road traffic accidents prediction modelling: An analysis of Anambra State, Nigeria. Accident Analysis & Prevention, 112:21–29, March 2018.
[10] Guangyu Zhu, Kang Song, Peng Zhang, and Li Wang. A traffic flow state transition model for urban road network based on hidden Markov model. Neurocomputing, 214:567–574, November 2016.
[11] Liguo Zhang and Christophe Prieur. Stochastic stability of Markov jump hyperbolic systems with application to traffic flow control. Automatica, 86:29–37, December 2017.

[12] Darong Huang and Xing-rong Bai. A wavelet neural network optimal control model for traffic-flow prediction in intelligent transport systems. In Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence, pages 1233–1244. Springer Berlin Heidelberg, 2007.
[13] Shaurya Agarwal, Pushkin Kachroo, and Emma Regentova. A hybrid model using logistic regression and wavelet transformation to detect traffic incidents. IATSS Research, 40(1):56–63, July 2016.
[14] Dick Apronti, Khaled Ksaibati, Kenneth Gerow, and Jaime Jo Hepner. Estimating traffic volume on Wyoming low volume roads using linear and logistic regression methods. Journal of Traffic and Transportation Engineering (English Edition), 3(6):493–506, December 2016.
[15] Pinlong Cai, Yunpeng Wang, Guangquan Lu, Peng Chen, Chuan Ding, and Jianping Sun. A spatiotemporal correlative k-nearest neighbor model for short-term traffic multi-step forecasting. Transportation Research Part C: Emerging Technologies, 62:21–34, January 2016.
[16] A. Sharma, R. Vijay, G. L. Bodhe, and L. G. Malik. An adaptive neuro-fuzzy interface system model for traffic classification and noise prediction. Soft Computing, 22(6):1891–1902, November 2016.
[17] Jianhua Guo, Zhao Liu, Wei Huang, Yun Wei, and Jinde Cao. Short-term traffic flow prediction using fuzzy information granulation approach under different time intervals. IET Intelligent Transport Systems, 12(2):143–150, March 2018.
[18] Weihong Chen, Jiyao An, Renfa Li, Li Fu, Guoqi Xie, Md Zakirul Alam Bhuiyan, and Keqin Li. A novel fuzzy deep-learning approach to traffic flow prediction with uncertain spatial–temporal data features. Future Generation Computer Systems, 89:78–88, December 2018.
[19] Carl Goves, Robin North, Ryan Johnston, and Graham Fletcher. Short term traffic prediction on the UK motorway network using neural networks. Transportation Research Procedia, 13:184–195, 2016.
[20] Jithin Raj, Hareesh Bahuleyan, and Lelitha Devi Vanajakshi. Application of data mining techniques for traffic density estimation and prediction. Transportation Research Procedia, 17:321–330, 2016.
[21] Kui-Lin Li, Chun-Jie Zhai, and Jian-Min Xu. Short-term traffic flow prediction using a methodology based on ARIMA and RBF-ANN. In 2017 Chinese Automation Congress (CAC). IEEE, October 2017.
[22] Bharti Sharma, Sachin Kumar, Prayag Tiwari, Pranay Yadav, and Marina I. Nezhurina. ANN based short-term traffic flow forecasting in undivided two lane highway. Journal of Big Data, 5(1), December 2018.
[23] Jingyuan Wang, Yukun Cao, Ye Du, and Li Li. DST: A deep urban traffic flow prediction framework based on spatial-temporal features. In Knowledge Science, Engineering and Management, pages 417–427. Springer International Publishing, 2019.
[24] Anyu Cheng, Xiao Jiang, Yongfu Li, Chao Zhang, and Hao Zhu. Multiple sources and multiple measures based traffic flow prediction using the chaos theory and support vector regression method. Physica A: Statistical Mechanics and its Applications, 466:422–434, January 2017.
[25] Yuxing Sun, Biao Leng, and Wei Guan. A novel wavelet-SVM short-time passenger flow prediction in Beijing subway system. Neurocomputing, 166:109–121, October 2015.
[26] Jianli Xiao, Chao Wei, and Yuncai Liu. Speed estimation of traffic flow using multiple kernel support vector regression. Physica A: Statistical Mechanics and its Applications, 509:989–997, November 2018.
[27] Nicholas G. Polson and Vadim O. Sokolov. Deep learning for short-term traffic flow prediction. Transportation Research Part C: Emerging Technologies, 79:1–17, June 2017.
[28] Yuankai Wu, Huachun Tan, Lingqiao Qin, Bin Ran, and Zhuxi Jiang. A hybrid deep learning based traffic flow prediction method and its understanding. Transportation Research Part C: Emerging Technologies, 90:166–180, May 2018.
[29] Yisheng Lv, Yanjie Duan, Wenwen Kang, Zhengxi Li, and Fei-Yue Wang. Traffic flow prediction with big data: A deep learning approach. IEEE Transactions on Intelligent Transportation Systems, pages 1–9, 2014.
[30] Rui Fu, Zuo Zhang, and Li Li. Using LSTM and GRU neural network methods for traffic flow prediction. In 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC). IEEE, November 2016.
[31] Bailin Yang, Shulin Sun, Jianyuan Li, Xianxuan Lin, and Yan Tian. Traffic flow prediction using LSTM with feature enhancement. Neurocomputing, 332:320–327, March 2019.
[32] Yan Tian, Kaili Zhang, Jianyuan Li, Xianxuan Lin, and Bailin Yang. LSTM-based traffic flow prediction with missing data. Neurocomputing, 318:297–305, November 2018.
[33] Kang Wang, Kenli Li, Liqian Zhou, Yikun Hu, Zhongyao Cheng, Jing Liu, and Cen Chen. Multiple convolutional neural networks for multivariate time series prediction. Neurocomputing, May 2019.
[34] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, November 1997.
[35] Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks: the official journal of the International Neural Network Society, 61:85–117, 2015.
[36] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. CoRR, abs/1312.6114, 2014.
[37] Google Colab.
[38] TensorFlow.