A Dynamic Model for Traffic Flow Prediction Using Improved DRN

by Zeren Tan, et al.

Real-time traffic flow prediction can not only provide travelers with reliable traffic information, saving them time, but also help traffic management departments manage the transportation system, greatly improving its efficiency. Traditional traffic flow prediction methods usually need a huge amount of data, yet still perform poorly. With the development of deep learning, researchers have begun to pay attention to artificial neural networks (ANNs) such as RNN and LSTM. However, these ANNs are very time-consuming to train. In this article, we improve the Deep Residual Network and build a dynamic model, an approach that previous researchers have rarely used. Our results show that our model can be trained efficiently and also achieves higher accuracy. Additionally, our dynamic model is better suited to practical applications.




1 Introduction

Precise and timely traffic flow prediction is one of the major tasks of intelligent transportation systems (ITSs). It has practical significance for individuals, companies and governments, who make decisions based on real-time traffic flow. However, accurate short-term traffic flow prediction has remained challenging for decades because traffic flow is stochastic and nonlinear. In the early days of traffic flow forecasting research, researchers mainly used linear methods such as the autoregressive integrated moving average (ARIMA) model (Ahmed and Cook, 1979). Some researchers still prefer linear models for their simplicity and convenience, and improved linear models such as multi-variable linear regression (MVLR) (L. Li and Zhang, 2015) have been proposed.

About 40 years ago, machine learning algorithms began to show good performance in many tasks, and researchers started to apply algorithms such as support vector regression (SVR) (X. Jin and Yao, 2007) and k-nearest neighbor (k-NN) (Davis and Nihan, 1991) to traffic flow prediction. Despite their good performance, these approaches cannot capture all the characteristics of traffic flow, and their accuracy remains unsatisfactory.

Recently, with the development of deep learning, several deep learning models for traffic flow prediction have been put forward. These include the Recurrent Neural Network (RNN) (Kyunghyun Cho and Bengio, 2014), Long Short-Term Memory network (LSTM) (Tian and Pan, 2015), Gated Recurrent Unit (GRU) (Kyunghyun Cho, 2014), Stacked Auto-Encoders (SAEs) (Y. Lv and Wang, 2014) and Deep Belief Network (DBN) (W. Huang and Xie, 2014). In general, these models have complex network architectures that can capture the nonlinearities in traffic flow. As a result, they forecast traffic flow better than traditional models. Even so, these models can be improved in many ways, and their performance can be better still.

Among these deep learning models, RNN (Kyunghyun Cho and Bengio, 2014), LSTM (Tian and Pan, 2015) and GRU (Kyunghyun Cho, 2014) are the most commonly used. However, their architectures are so complicated that training them takes a lot of time. Because they are hard to train, people often stack fewer layers and limit the number of training epochs, which lowers accuracy. Alternatively, they shrink the training and test sets, which prevents higher accuracy when the data set is not large enough. This makes these models difficult to use in practical applications. In addition, previous studies were mainly based on static models; that is, a model is never updated once it has been trained. Traffic flow, however, reflects real-world conditions, and its spatial and temporal relationships change all the time. We therefore need a mechanism for real-time updating.

In this paper, we improve the deep residual network (DRN) and propose a Dynamic Improved Deep Residual Network (DIDRN). DIDRN continuously updates its training set as new real data become available, which makes it better suited to practical applications. It adapts well when road conditions, or the places where it is applied, change over time.

The rest of this paper is organized as follows. Section 2 reviews existing work on short-term traffic flow prediction. Section 3.1 presents the basic ideas of DRN. Sections 3.2 and 3.3 explain how we further improve DRN and how we design the dynamic model, respectively, and Section 3.4 describes how the data are processed. Section 4 presents experiments that compare our model's performance with several popular models. Finally, Section 5 concludes the paper and discusses future work.

2 Literature Review

Since traffic flow prediction is one of the main tasks of ITSs, research on it has been ongoing for many years, and new models are constantly being proposed. Although there are numerous traffic flow prediction methods, they can generally be divided into two main categories: parametric methods and non-parametric methods.

Parametric methods such as ARIMA (Ahmed and Cook, 1979) and MVLR (L. Li and Zhang, 2015) are widely used. These approaches require a predetermined model structure whose parameters are estimated from empirical data (Xingyuan Dai, 2017). Improved ARIMA-based models such as SARIMA (Billy M. Williams and Lester A. Hoel, 2003) were proposed later. These models have simple and explicit structures. However, they require a huge amount of data and assume that traffic follows a stationary process, so they may be unusable when sufficient data are unavailable. In most cases, traffic flow data are stochastic and nonlinear, so parametric approaches cannot perform very well.

Since the late 20th century, researchers have paid more attention to non-parametric approaches such as k-NN (Davis and Nihan, 1991), SVR (X. Jin and Yao, 2007), random forest regression (RF) (Leshem and Ritov, 2007) and gradient boosting regression (Friedman, 2001). In addition, because Artificial Neural Networks (ANNs) are excellent at capturing nonlinearities, researchers have paid increasing attention to them ever since they were introduced.

Recently, with the rapid development of deep learning, many deep learning models have been proposed by computer scientists. Powerful models such as SAE (Y. Lv and Wang, 2014), DBN (W. Huang and Xie, 2014), RNN (Kyunghyun Cho and Bengio, 2014), LSTM (Tian and Pan, 2015) and GRU (Kyunghyun Cho, 2014) were introduced to traffic flow prediction and showed superior performance.

Each element of a sequence is correlated with its context, but ordinary feed-forward neural networks cannot capture this correlation. RNN (Hochreiter, 1991) was developed in the 1980s to solve this problem. However, although RNN can capture correlation among data, it soon forgets it; that is, it lacks long-term memory. In 1997, long short-term memory (LSTM) networks (Sepp Hochreiter, 1997) were invented and went on to set accuracy records in various application domains. LSTM can retain correlations for a long time and thus performs outstandingly in sequence modeling. However, both RNN and LSTM are very hard to train because their architectures are complex, and a single training run often takes hours or even days. So despite their good performance, using them in practice is sometimes infeasible.

To reduce the number of mathematical operations and the total running time, the gated recurrent unit (GRU) (Kyunghyun Cho, 2014) was developed, and R. Fu and Li (2016) applied it to traffic flow prediction. Unfortunately, although GRU reduces training time in practice, the cost is still unacceptable. In addition, Klaus Greff (2017) made a thorough comparison of LSTM variants and found that they all perform about the same.

Besides RNN and LSTM, researchers have also tried some other deep learning models.

Gang Song (2017) proposed a novel double deep ELMs ensemble system (DD-ELMs-ES), which shows better generalization performance than some state-of-the-art algorithms. However, their model only addresses one-step forecasting. Xingyuan Dai (2017) implemented DeepTrend, which decomposes the original traffic flow data into trend and residual components. They show that DeepTrend noticeably improves prediction performance and outperforms many traditional models as well as LSTM, but it does not take spatial correlation into account. Xiaochuan Sun (2017) proposed a deep belief echo-state network (DBEN) to address slow convergence and local optima in time series prediction. Experimental results show that DBEN performs well in learning speed and short-term memory capacity. However, their model has a large number of parameters, which makes parameter tuning difficult.

He et al. (2016) proposed the deep residual network (DRN) and entered it in ILSVRC, where it substantially outperformed other models and won first place. Typically, as networks become deeper, they become difficult to train and suffer from vanishing gradients. DRN is easy to train and mitigates vanishing gradients effectively, even when the network is several times deeper than other networks. The ordinary deep neural networks that people had used were less than 100 layers deep. A DRN, by contrast, can easily be built with several hundred layers and trained in even less time.

3 Model Development

3.1 DRN

When deep learning began to boom, people simply stacked layers in the belief that deeper is better. Given an input $x$, the intermediate layers were trained to fit a mapping $H(x)$ that produces the final output. Believing that deeper networks perform better, people stacked more layers to fit the desired mapping and expected higher accuracy. However, it turns out that as the network gets deeper, both training and test accuracy saturate and then degrade; this is the degradation problem (He et al., 2016). Figure 1 shows the architecture of a simply stacked neural network.

DRN was proposed to solve the degradation problem. Instead of hoping that each stack of layers fits a desired mapping directly, He et al. let some layers fit a residual mapping (He et al., 2016).

Concretely, if we want the network to fit $H(x)$, we let the stacked layers fit another mapping $F(x) := H(x) - x$. The desired mapping can then be written as $H(x) = F(x) + x$.

We hypothesize that it is easier to fit the residual mapping $F(x)$ than the original mapping $H(x)$. If the optimal mapping is the identity, it is easy to push the residual $F(x)$ to zero. Figure 2 shows this idea.
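The residual idea can be illustrated in plain Python with a minimal sketch; this is not the authors' Keras code. A block computes H(x) = F(x) + x, where F is the residual mapping learned by the stacked layers. When all of F's weights are zero, the block is exactly the identity, which is the case He et al. argue is easy to reach. The helpers `dense` and `residual` are hypothetical, and activations are omitted for clarity.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def dense(weights: List[List[float]], bias: Vector) -> Layer:
    """Fully connected layer x -> W x + b (activation omitted for clarity)."""
    def layer(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return layer


def residual(f: Layer) -> Layer:
    """Wrap a residual mapping F so that the block outputs H(x) = F(x) + x."""
    def block(x: Vector) -> Vector:
        return [fx + xi for fx, xi in zip(f(x), x)]
    return block


# If the optimal mapping is the identity, the residual branch only has to output zero.
zero_branch = dense([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
identity_block = residual(zero_branch)
print(identity_block([3.0, -1.5]))  # [3.0, -1.5]
```

A plain stacked layer, by contrast, would have to learn W equal to the identity matrix exactly in order to reproduce its input.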

Figure 1: Architecture of common neural network
Figure 2: Architecture of residual neural network

This simple change makes a big difference: the network becomes much easier to train. The deep residual network helped Kaiming He and his team win first prize in the ILSVRC classification competition and many other competitions. Before DRN, the winning ILSVRC network had at most 32 layers, whereas the DRN entry had 152 layers, and accuracy increased significantly.

3.2 Improved DRN

Inspired by He et al.'s deep residual network, we further improve the DRN architecture to preserve more information as it passes between layers, and thus obtain higher accuracy. We then apply the improved network to traffic flow prediction. Figure 3 shows the architecture of our improved network.

Figure 3: Our improved residual network

The network works as follows:

  • Denote the input as $x$; the first layer fits the function $F_1(x)$.

  • We then add the input and the first layer's output to get $y_1 = x + F_1(x)$.

  • The second layer takes $y_1$ as input and maps it to $F_2(y_1)$.

  • Similarly, we add the output of the first layer and the output of the second layer to get $y_2 = F_1(x) + F_2(y_1)$.

These steps divide the task of fitting one complex function $H(x)$ into fitting two simpler mappings, $F_1$ and $F_2$. Because each simple mapping is easier to fit, the layers work more efficiently. This reduces the error in fitting each mapping and, in turn, the overall error.
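Figure 3 is not reproduced here, so the sketch below follows only the textual steps of this section. It assumes "the output of the first layer" means F1(x), so one improved block computes y1 = x + F1(x) and returns F1(x) + F2(y1). Both this reading and the toy layers `double` and `negate` are assumptions made for illustration, not the authors' trained layers.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two vectors of equal length."""
    return [ai + bi for ai, bi in zip(a, b)]


def improved_block(f1: Layer, f2: Layer) -> Layer:
    """One reading of the improved residual block described in Section 3.2."""
    def block(x: Vector) -> Vector:
        h1 = f1(x)          # first layer: F1(x)
        y1 = add(x, h1)     # first addition: y1 = x + F1(x)
        h2 = f2(y1)         # second layer: F2(y1)
        return add(h1, h2)  # second addition: F1(x) + F2(y1)
    return block


# Toy stand-ins for trained layers (hypothetical, chosen so the arithmetic is easy to follow).
def double(v: Vector) -> Vector:
    return [2.0 * vi for vi in v]


def negate(v: Vector) -> Vector:
    return [-vi for vi in v]


block = improved_block(double, negate)
print(block([1.0, 2.0]))  # F1 = [2, 4], y1 = [3, 6], F2 = [-3, -6] -> [-1.0, -2.0]
```

In a real network, `f1` and `f2` would be trainable layers, and the two additions would be merge operations in the framework of choice.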

3.3 Dynamic Model

In practice, transportation management departments will not use this model only at the specific place our data come from, so the model must transfer easily between places. Moreover, traffic flow on a given road may change over time: the traffic flow one month or one year later may differ significantly from the current one. The weights and parameters obtained from the old training data fit the old data very well but may not fit new data, so the previous model would perform poorly on new data. It is therefore unreasonable to keep using the old model without updating it, especially when the traffic flow has changed significantly. Traffic flow prediction models should instead be updated continually, and this is why we design a dynamic model.

To update the pre-trained model, we use the basic idea of incremental learning to implement a dynamic model.

The complete process is as follows:

  1. We use the collected data to pre-train a basic model.

  2. We use the model to make predictions.

  3. After a number of prediction steps, new observed data become available.

  4. We merge the new data into the training set and train the model again on the updated set.

  5. Repeat Steps 2 to 4.

Figure 4 shows the process of these steps.

Figure 4: Dynamic Model

With these steps, the model will be continuously updated through absorbing new data. Therefore, this model will fit the practical conditions better and give more accurate predictions.
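The update loop in Steps 1 to 5 can be sketched as follows. So that the example runs without Keras, a least-squares line stands in for DIDRN. The `LinearModel` class, the `run_dynamic` function and its `update_every` parameter are all hypothetical. The sketch also assumes that all past data are kept when retraining; the text does not say whether old data are kept or a sliding window is used.

```python
from typing import List, Tuple

Pairs = List[Tuple[float, float]]


class LinearModel:
    """Stand-in for DIDRN: y ~ a * x + b fitted by ordinary least squares (hypothetical)."""

    def __init__(self) -> None:
        self.a, self.b = 0.0, 0.0

    def fit(self, pairs: Pairs) -> None:
        n = len(pairs)
        mx = sum(x for x, _ in pairs) / n
        my = sum(y for _, y in pairs) / n
        sxx = sum((x - mx) ** 2 for x, _ in pairs)
        sxy = sum((x - mx) * (y - my) for x, y in pairs)
        self.a = sxy / sxx if sxx else 0.0
        self.b = my - self.a * mx

    def predict(self, x: float) -> float:
        return self.a * x + self.b


def run_dynamic(series: List[float], n_pretrain: int, update_every: int) -> List[float]:
    """Predict series[t + 1] from series[t], retraining after every `update_every` predictions."""
    pairs = [(series[t], series[t + 1]) for t in range(n_pretrain - 1)]
    model = LinearModel()
    model.fit(pairs)                                  # Step 1: pre-train a basic model
    predictions = []
    for t in range(n_pretrain - 1, len(series) - 1):
        predictions.append(model.predict(series[t]))  # Step 2: predict the next point
        pairs.append((series[t], series[t + 1]))      # Step 3: the real value arrives
        if len(predictions) % update_every == 0:
            model.fit(pairs)                          # Step 4: retrain on the enlarged set
    return predictions                                # Step 5: the loop repeats Steps 2 to 4
```

For a perfectly linear series such as `[float(i) for i in range(20)]`, calling `run_dynamic(series, 10, 3)` returns the ten values 10.0 to 19.0. With real data, the retraining step is what lets the model follow changes in the flow.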

3.4 Model Implementation

In this section, we describe the complete data processing pipeline.

We divide the process into several steps:

  1. Denote the raw data as $X = (x_1, x_2, \dots, x_n)^T$, a column vector of length $n$.

    We reasonably assume that traffic flow data contain trends, so we subtract adjacent values to get the difference vector $D = (d_1, d_2, \dots, d_{n-1})^T$, where

    $$d_i = x_{i+1} - x_i, \qquad i = 1, 2, \dots, n-1.$$

  2. However, we cannot apply supervised learning to $D$ directly, so we further convert it into supervised data.

    First, we choose the time-step, which is the number of previous data points used to predict the next one. After parameter tuning, we choose a time-step of 1.

    We then denote $F$ as the feature vector and $Y$ as the vector to be predicted, defined as follows:

    $$F = (d_1, d_2, \dots, d_{n-2})^T, \qquad Y = (d_2, d_3, \dots, d_{n-1})^T,$$

    so that $f_i = d_i$ and $y_i = d_{i+1}$.

    This gives the entire data set $\{(f_i, y_i)\}_{i=1}^{n-2}$.

  3. To speed up training and prediction, we normalize the data, a simple approach that is common when developing deep learning models. Since some elements of $D$ are positive and others are negative, we scale $F$ and $Y$ into $[-1, 1]$. We denote the resulting vectors as $\tilde{F}$ and $\tilde{Y}$, whose elements are

    $$\tilde{f}_i = \frac{2\,(f_i - \min F)}{\max F - \min F} - 1, \qquad \tilde{y}_i = \frac{2\,(y_i - \min Y)}{\max Y - \min Y} - 1.$$

    The data set then becomes $\{(\tilde{f}_i, \tilde{y}_i)\}_{i=1}^{n-2}$.

  4. When predicting, we invert Step 3 and then Step 1 to get the final prediction. First, the scaled output is mapped back to a difference $\hat{y}_i$. That difference is then added to the last observed value, giving $\hat{x}_{i+2} = x_{i+1} + \hat{y}_i$.
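Steps 1 to 4 can be sketched in pure Python under a few assumptions. Min-max scaling onto [-1, 1] uses the minimum and maximum of the vector being scaled; the text does not say whether the authors fitted the scaler on the training portion only. The flow values and all function and class names are made up for illustration. In practice, a library scaler such as scikit-learn's `MinMaxScaler(feature_range=(-1, 1))` plays the same role as `RangeScaler` here.

```python
from typing import List, Tuple


def difference(x: List[float]) -> List[float]:
    """Step 1: d_i = x_{i+1} - x_i for adjacent raw values."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def to_supervised(d: List[float], lag: int = 1) -> Tuple[List[List[float]], List[float]]:
    """Step 2: each target d_i is paired with the `lag` differences before it."""
    features = [d[i - lag:i] for i in range(lag, len(d))]
    targets = [d[i] for i in range(lag, len(d))]
    return features, targets


class RangeScaler:
    """Step 3: linear min-max scaling onto [-1, 1], plus its inverse."""

    def fit(self, values: List[float]) -> "RangeScaler":
        self.lo, self.hi = min(values), max(values)
        return self

    def transform(self, v: float) -> float:
        return 2.0 * (v - self.lo) / (self.hi - self.lo) - 1.0

    def inverse(self, s: float) -> float:
        return (s + 1.0) * (self.hi - self.lo) / 2.0 + self.lo


def invert_forecast(scaled_pred: float, scaler: RangeScaler, last_observed: float) -> float:
    """Step 4: undo Step 3 (unscale the difference), then undo Step 1 (add it back)."""
    return last_observed + scaler.inverse(scaled_pred)


raw = [120.0, 135.0, 128.0, 150.0, 149.0]   # made-up flow values
d = difference(raw)                          # [15.0, -7.0, 22.0, -1.0]
features, targets = to_supervised(d)         # [[15.0], [-7.0], [22.0]] and [-7.0, 22.0, -1.0]
# Time-step (lag) = 1, as in the paper, so each feature row holds a single value.
feature_scaler = RangeScaler().fit([f[0] for f in features])
target_scaler = RangeScaler().fit(targets)
scaled = [(feature_scaler.transform(f[0]), target_scaler.transform(y))
          for f, y in zip(features, targets)]

# Pretend the model predicted the last scaled target exactly; recover raw[4] from raw[3].
print(invert_forecast(scaled[-1][1], target_scaler, raw[3]))  # approximately 149.0
```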

4 Experiments and Results

4.1 Data Source

Our data come from the ring roads of Beijing, China, and were collected by 14 microwave detectors at 10-minute intervals. Each detector provides 8784 records covering 61 days, from June to July 2013 (61 days × 144 records per day). For each series, we use the first 7200 records as training data and the rest as test data. The detected traffic flow is the input, and the output is the predicted traffic flow at specified time points. The results verify the feasibility and effectiveness of our proposed model.
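The record counts in this section agree with each other, and the split corresponds to whole days. The short check below uses only the figures stated above. It assumes a chronological split, which is how "the first 7200 records" reads; the helper `chronological_split` is illustrative.

```python
from typing import List, Sequence, Tuple

MINUTES_PER_RECORD = 10
RECORDS_PER_DAY = 24 * 60 // MINUTES_PER_RECORD   # 144
N_DAYS = 61                                        # June (30 days) + July (31 days) 2013
N_TRAIN = 7200


def chronological_split(series: Sequence[float], n_train: int = N_TRAIN) -> Tuple[List[float], List[float]]:
    """First n_train records for training, the remainder for testing (no shuffling)."""
    return list(series[:n_train]), list(series[n_train:])


n_total = N_DAYS * RECORDS_PER_DAY
n_test = n_total - N_TRAIN
print(n_total, n_test)                                       # 8784 1584
print(N_TRAIN / RECORDS_PER_DAY, n_test / RECORDS_PER_DAY)   # 50.0 11.0
```

So each detector contributes 50 days of training data and 11 days of test data.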

4.2 Performance Indexes

To evaluate our proposed model, we conduct experiments on it and on several similar models: one-layer LSTM, deep LSTM and DRN. One-layer LSTM serves as the baseline that shows how much accuracy our model gains. Deep LSTM shows that DIDRN outperforms some state-of-the-art and widely used models. Finally, DRN is included to demonstrate that our improvements to it are effective.

We use Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE) and Mean Absolute Error (MAE) to evaluate the performance of these four models.

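RMSE, MAPE and MAE are defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \qquad \mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|,$$

where $y_i$ is the real value, $\hat{y}_i$ is the forecast value and $n$ is the number of predictions.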

RMSE and MAE measure the deviation between the predicted values and the detected values in the units of the data, so they grow as the range of the data grows. We therefore cannot evaluate our model on the absolute values of RMSE and MAE alone; the range of the data must be considered at the same time. MAPE, by contrast, is a relative measure of the deviation between the predicted values and the ground truth. Because it is a percentage error, it does not depend on the range of the data. We use these three indexes to evaluate two aspects of the selected models: 1) absolute error and 2) relative error.
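The point about scale dependence can be checked directly. The sketch below implements the three definitions above in plain Python and applies them to made-up values. Multiplying both the detected and forecast series by 10 multiplies RMSE and MAE by 10 but leaves MAPE unchanged.

```python
import math
from typing import Sequence


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Percentage error; undefined when a real value y_i is zero."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)


y = [100.0, 200.0, 400.0]         # made-up detected flows
y_hat = [110.0, 190.0, 380.0]     # made-up forecasts
for k in (1, 10):                  # multiply both series by k to widen the data range
    ys, ps = [k * v for v in y], [k * v for v in y_hat]
    print(k, round(rmse(ys, ps), 2), round(mae(ys, ps), 2), round(mape(ys, ps), 2))
# 1 14.14 13.33 6.67
# 10 141.42 133.33 6.67
```

MAPE divides by the real value, so it cannot be computed when a detector reports zero flow. Intervals with zero flow therefore need special handling in practice.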

4.3 Performance Analysis

Except for the one-layer LSTM, all models have 16 layers. To match dimensions, we add some layers to both DRN and DIDRN, but they still have 16 layers in total.

We evaluate the models on 14 different sets of traffic flow data from 14 detectors, numbered 2010, 2011, 2012, 2023, 2030, 2033, 2052, 3034, 3035, 4004, 4005, 4050, 4051 and 5062.

All experiments are run on a desktop with an @2.60 GHz processor, 8.0 GB RAM and an NVIDIA GeForce GTX 960M. All models were coded in Python 3.6 with Keras and TensorFlow and were run in Jupyter Notebook from the Anaconda distribution.

We first use these models to predict short-term traffic flow. Concretely, data points are 10 minutes apart, and we use the current traffic flow to predict the traffic flow 10 minutes later.

Table 1 shows the performance.

Detector  Metric  One-Layer LSTM  Deep LSTM  DRN     DIDRN
2010      RMSE    101.07          75.65      75.64   74.35
          MAPE    11.58%          7.05%      7.04%   6.99%
          MAE     84.60           59.13      59.12   53.73
2011      RMSE    123.79          104.13     104.13  101.53
          MAPE    11.15%          7.90%      7.90%   7.44%
          MAE     98.61           77.70      77.71   75.07
2012      RMSE    122.05          96.47      96.37   94.46
          MAPE    15.56%          9.46%      9.43%   9.38%
          MAE     98.82           71.23      71.13   69.07
2023      RMSE    99.99           78.28      78.02   74.26
          MAPE    11.19%          7.02%      6.97%   6.96%
          MAE     81.86           58.90      58.60   55.72
2030      RMSE    104.95          76.00      75.96   67.13
          MAPE    14.63%          7.43%      7.42%   7.38%
          MAE     86.94           57.34      57.29   49.66
2033      RMSE    79.84           59.75      59.75   58.40
          MAPE    12.63%          6.89%      6.89%   6.80%
          MAE     66.70           45.11      45.11   44.11
2052      RMSE    109.56          85.99      85.99   83.28
          MAPE    11.63%          7.46%      7.46%   7.20%
          MAE     88.31           65.52      65.52   62.77
3034      RMSE    99.08           74.32      74.04   72.27
          MAPE    10.45%          6.97%      6.89%   6.89%
          MAE     83.05           57.07      56.99   53.67
3035      RMSE    98.09           68.36      68.36   64.74
          MAPE    15.15%          7.64%      7.64%   7.59%
          MAE     82.82           51.15      51.15   46.68
4004      RMSE    94.20           61.89      61.89   57.30
          MAPE    16.44%          8.21%      8.21%   6.80%
          MAE     80.25           47.01      47.01   42.39
4005      RMSE    106.9           87.65      87.65   86.02
          MAPE    10.02%          6.34%      6.34%   6.27%
          MAE     87.13           65.63      65.63   64.35
4050      RMSE    93.53           57.85      57.85   56.48
          MAPE    37.96%          19.36%     19.36%  18.58%
          MAE     80.57           37.89      37.89   36.11
4051      RMSE    112.35          98.13      98.13   98.60
          MAPE    12.04%          8.66%      8.66%   8.45%
          MAE     83.94           65.56      65.56   65.20
5062      RMSE    90.63           60.06      60.06   56.18
          MAPE    17.89%          8.23%      8.23%   8.06%
          MAE     77.86           45.87      45.87   42.24
Table 1: Performance comparison of different models

Table 1 shows that DIDRN outperforms the other models on nearly every detector and metric. There are two exceptions: at detector 4051 its RMSE (98.60) is slightly higher than that of deep LSTM and DRN (98.13), and at detector 3034 its MAPE ties with DRN. One-layer LSTM performs much worse than the other models because it is shallower. Deep LSTM seems to perform satisfactorily, but on our machine it takes nearly 1 hour to train before it converges. DRN's error rates are similar to, and mostly slightly lower than, those of deep LSTM, and DRN is much easier to train, taking only about 20 minutes. Although DIDRN is derived from DRN, it has lower error rates than DRN. It is also easy to train: training finishes within 30 minutes, similar to DRN and much shorter than deep LSTM.

Figure 5 shows a comparison of predictions by different models.

Figure 5: Predictions

Figure 6 shows part of the predictions of DIDRN.

Figure 6: Raw data and predictions

To evaluate the stability of our proposed model, we change the time interval to 1 hour, 2 hours and 24 hours and repeat the experiments with DIDRN. Figure 7 and Table 2 show the results.

(a) 1 hour
(b) 2 hours
(c) 24 hours
Figure 7: Predictions of Different Time Intervals
Time Interval (hour) RMSE MAPE MAE
1 202.87 18.45% 141.34
2 342.96 32.67% 233.05
24 135.24 10.88% 94.92
Table 2: Performance of DIDRN with Different Time Intervals
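With 10-minute sampling, each time interval corresponds to a fixed number of records between input and target. The sketch below converts intervals into record counts and builds input-target pairs. It assumes an interval is implemented by pairing the current flow with the flow h records later. The text does not specify how this interacts with the differencing of Section 3.4, and the helper names are hypothetical.

```python
from typing import List, Tuple

MINUTES_PER_RECORD = 10  # sampling interval of the detector data


def horizon_steps(minutes: int) -> int:
    """Number of 10-minute records between the input and the predicted value."""
    if minutes % MINUTES_PER_RECORD:
        raise ValueError("the interval must be a multiple of the sampling interval")
    return minutes // MINUTES_PER_RECORD


def pairs_at_interval(series: List[float], minutes: int) -> List[Tuple[float, float]]:
    """(x_t, x_{t+h}) pairs: the current flow is the input, the flow h records later the target."""
    h = horizon_steps(minutes)
    return [(series[t], series[t + h]) for t in range(len(series) - h)]


for label, minutes in [("10 min", 10), ("1 h", 60), ("2 h", 120), ("24 h", 24 * 60), ("7 days", 7 * 24 * 60)]:
    print(label, horizon_steps(minutes))
```

The loop prints 1, 6, 12, 144 and 1008 records. At the 24-hour and 7-day intervals, the input and target fall at the same time of day, which is where the periodicity discussed below comes into play.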

In the following discussion, we base our analysis on MAPE because it is simple and intuitive, and we use the data of detector 2010 to demonstrate the results.

Table 2 shows that as the time interval grows from 10 minutes to 1 hour and 2 hours, accuracy decreases monotonically. This agrees with intuition. With a larger interval, more information is needed to predict the traffic flow, and the flow at the previous time point is less related to the flow at the predicted time point, so accuracy naturally decreases. However, when we use the traffic flow of the previous day to predict that of the next day, accuracy improves significantly, with MAPE dropping to 10.88%. This is not surprising. On the one hand, our data are stable and periodic, so the traffic flow at a given time on the previous day is quite similar to that at the same time on the predicted day. On the other hand, traffic flow 1 or 2 hours later can be very different from the flow at the current time point. For example, traffic flow at 7:00 a.m. differs significantly from traffic flow at 8:00 a.m. and 9:00 a.m., which are peak hours. People commute to work during peak hours, causing a sudden increase in traffic flow. The model does not have enough information to predict this sudden increase, so its predictions deviate more from the detected values.

We now examine in more detail how accuracy decreases as the time interval grows from 10 minutes to 1 hour, and how it changes as the interval grows from 1 hour to 24 hours.

Figure 8 plots MAPE against the time interval in minutes; over this range, MAPE grows linearly with the interval. This holds while the step between intervals is small, but once the step is increased to 1 hour, the linear relation no longer holds. Figure 9 shows the MAPE of DIDRN as the time interval increases from 1 hour to 24 hours. The curve first rises and then falls: MAPE increases from about 18% to over 100% and then decreases to below 11%. Already at 2 hours, the error rate is unacceptable. The peak error rate exceeds 105%, which means the model can hardly estimate the traffic flow when the time interval is large. The reason is that we use only the current traffic flow to estimate the traffic flow at the next time point. Traffic flow does not change significantly over a short interval. Over several hours, however, the current flow can be very different from the flow to be predicted, so the input says little more about the target than a random reshuffle would. The peak-hour example above illustrates this.

Figure 8: MAPE for Different Time Intervals (minutes)
Figure 9: MAPE for Different Time Intervals (hours)

In order to further analyze the performance of our model, we gradually increase the time interval to 7 days.

Table 3 shows the results. As the time interval ranges from 2 days to 6 days, the error rate decreases from 13.96% to 10.45%. This is because as the time interval becomes larger, the predicted traffic flow is less related to the input traffic flow. As a result, the model has less information with which to update its weights so that it performs well on both the training and test sets.

Accuracy drops sharply between intervals of 1 day and 2 days, and the prediction process explains why. When the time interval is 1 day, the traffic flows of Friday and Saturday are used to predict those of Saturday and Sunday, respectively. When the time interval is 2 days, however, the traffic flows of Friday and Saturday are used to predict those of Sunday and Monday, respectively. Saturday and Sunday are weekend days, while Monday and Friday are workdays. Because fewer people go to work at weekends, weekend and workday traffic flows differ significantly. It is therefore not hard to see why accuracy decreases somewhat when the time interval is 2 days.

Accuracy increases sharply between intervals of 6 days and 7 days. By symmetry, the error rate at 6 days should be similar to that at 1 day, and our results confirm this. Since traffic flow is periodic with a period of exactly 7 days, it is not surprising that the flow one week later is similar to the current flow. Thanks to this property, our model performs better and achieves higher accuracy.

Time Interval (day) RMSE MAPE MAE
1 135.24 10.88% 94.92
2 158.27 13.96% 113.70
3 154.55 13.81% 109.36
4 145.62 13.22% 104.54
5 142.73 10.85% 99.14
6 143.36 10.45% 98.12
7 100.56 7.28% 69.48
Table 3: Performance of DIDRN with Different Time Intervals (Days)
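The weekday argument and the symmetry between 1 and 6 days can be made concrete with a small calendar count. For an interval of k days, the sketch counts how many of the seven weekdays map to a target day of the same type (workday or weekend). It assumes Saturday and Sunday are the only non-working days and ignores public holidays. The count is a calendar illustration only; it does not model the prediction error itself.

```python
WEEKEND = {5, 6}  # Saturday and Sunday, with Monday = 0; public holidays are ignored


def same_day_type(lag_days: int) -> int:
    """How many of the 7 weekdays keep their day type (workday/weekend) lag_days later."""
    return sum((day in WEEKEND) == ((day + lag_days) % 7 in WEEKEND) for day in range(7))


for k in range(1, 8):
    print(f"{k} day(s): {same_day_type(k)}/7 input-target pairs share a day type")
```

The count is 5/7 for intervals of 1 and 6 days, 3/7 for 2 to 5 days and 7/7 for 7 days. The counts for k and 7 − k days are always equal, and only a full week aligns every day.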
Figure panels (a) to (n): Detectors 2010, 2011, 2012, 2023, 2030, 2033, 2052, 3034, 3035, 4004, 4005, 4050, 4051 and 5062

In summary, our model performs well in short-term traffic flow prediction, where we use the traffic flow at one time point to predict the traffic flow at the next time point. If more information is used as input, that is, not only the current traffic flow but also that of one or more previous time points, our model is likely to perform even better.

5 Conclusion and Future Work

In this paper, we present the basic ideas of the deep residual network. DRN turns out to be much easier to train and to perform excellently. We then explain how DRN inspired our work and how we improve it, showing the architecture of our network and how it works. After that, we propose our dynamic model, DIDRN, and explain why it makes sense.

We describe the data processing step by step, explaining how we convert the raw data into data suitable for supervised learning and how we obtain the final predictions. We then conduct several experiments to compare the performance of different models, which show that DIDRN outperforms some popular models. In terms of MAPE, DIDRN improves on deep LSTM and DRN by at most 1.41 percentage points. In terms of RMSE and MAE, it reduces the error by at most about 9.
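The "at most" figures can be re-derived from Table 1. The sketch below copies the deep LSTM, DRN and DIDRN columns into plain Python. For each detector and metric, it computes DIDRN's error reduction relative to the better of the two baselines, then reports the largest and smallest reduction. Comparing against the better baseline for each detector is an assumption about how the summary figures were obtained.

```python
from typing import Dict

# Detector -> (RMSE, MAPE in %, MAE); each entry is (deep LSTM, DRN, DIDRN), copied from Table 1.
TABLE1 = {
    "2010": ((75.65, 75.64, 74.35), (7.05, 7.04, 6.99), (59.13, 59.12, 53.73)),
    "2011": ((104.13, 104.13, 101.53), (7.90, 7.90, 7.44), (77.70, 77.71, 75.07)),
    "2012": ((96.47, 96.37, 94.46), (9.46, 9.43, 9.38), (71.23, 71.13, 69.07)),
    "2023": ((78.28, 78.02, 74.26), (7.02, 6.97, 6.96), (58.90, 58.60, 55.72)),
    "2030": ((76.00, 75.96, 67.13), (7.43, 7.42, 7.38), (57.34, 57.29, 49.66)),
    "2033": ((59.75, 59.75, 58.40), (6.89, 6.89, 6.80), (45.11, 45.11, 44.11)),
    "2052": ((85.99, 85.99, 83.28), (7.46, 7.46, 7.20), (65.52, 65.52, 62.77)),
    "3034": ((74.32, 74.04, 72.27), (6.97, 6.89, 6.89), (57.07, 56.99, 53.67)),
    "3035": ((68.36, 68.36, 64.74), (7.64, 7.64, 7.59), (51.15, 51.15, 46.68)),
    "4004": ((61.89, 61.89, 57.30), (8.21, 8.21, 6.80), (47.01, 47.01, 42.39)),
    "4005": ((87.65, 87.65, 86.02), (6.34, 6.34, 6.27), (65.63, 65.63, 64.35)),
    "4050": ((57.85, 57.85, 56.48), (19.36, 19.36, 18.58), (37.89, 37.89, 36.11)),
    "4051": ((98.13, 98.13, 98.60), (8.66, 8.66, 8.45), (65.56, 65.56, 65.20)),
    "5062": ((60.06, 60.06, 56.18), (8.23, 8.23, 8.06), (45.87, 45.87, 42.24)),
}
METRICS = ("RMSE", "MAPE", "MAE")


def reductions(metric: int) -> Dict[str, float]:
    """Per-detector error reduction of DIDRN versus the better of deep LSTM and DRN."""
    return {det: round(min(rows[metric][0], rows[metric][1]) - rows[metric][2], 2)
            for det, rows in TABLE1.items()}


for i, name in enumerate(METRICS):
    r = reductions(i)
    best, worst = max(r, key=r.get), min(r, key=r.get)
    print(f"{name}: largest reduction {r[best]} (detector {best}), smallest {r[worst]} (detector {worst})")
```

On these values, the largest reductions are 1.41 percentage points of MAPE (detector 4004), 8.83 in RMSE and 7.63 in MAE (both at detector 2030). The output also shows the one case where DIDRN is worse: an RMSE increase of 0.47 at detector 4051.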

To summarize, our main contributions are as follows:

  • We apply the deep residual network to traffic flow prediction and improve it.

  • We take practical applications into account and propose a dynamic model called DIDRN.

  • The results show that our model outperforms other commonly used models.

Despite its good performance, our model still has some shortcomings. In this paper, we only take temporal patterns into account. In future work, we will consider spatial patterns and make our model learn spatial-temporal dependence.

In addition, our model predicts traffic flow well when the time interval is short or equals the period of our data, but it performs poorly when the time interval is somewhat larger. Future work could extend the model to a more general version that performs well for both short and long time intervals.

Moreover, since weather conditions certainly affect traffic flow, another direction is to make our network learn the correlation between weather conditions and traffic flow.