1 Introduction
The goal of traffic forecasting is to predict future traffic based on previous traffic flow measured by sensors. In traffic flow theory, speed, volume and density are the fundamental variables for indicating traffic condition [Mathew and Rao2017]. Forecasting these traffic variables is crucial to route planning, traffic control and management, and is therefore an essential component of Intelligent Transportation Systems (ITS).
Accurate traffic forecasting, however, is a challenging task. The main challenges come from two aspects: 1) The traffic prediction problem is usually very large in scale. It is hard to model the traffic flow generated by thousands of sensors in a traffic network. 2) The spatiotemporal dependency of traffic flow is complex and dynamic. The future traffic flow of a region is related to the previous flow of many nearby regions in a very complex way, since the traffic network is unstructured and large-scale. Moreover, the traffic condition is time-variant, making the relations between previous flow and future flow dynamic as well.
Advanced algorithms that can model the interactions within the traffic network are required to predict its future trends. In the literature, data-driven approaches have attracted much research attention. For example, statistical methods such as the autoregressive integrated moving average (ARIMA) [Davis et al.1990] and its variants [Williams and Hoel2003] are well studied. The performance of such methods is limited because their capacity is insufficient to model the large-scale traffic network. Besides, they cannot capture the complex nonlinear dependency within the traffic network in either the spatial or the temporal context. Recently, deep learning methods have shown promising results in dynamic prediction over sequential data, including stacked autoencoders (SAE) [Lv et al.2015], DBNs [Huang, Song, and Hong2014], LSTMs [Dai et al.2017] and CNNs [Zhang, Zheng, and Qi2016]. Although these methods have made some progress in modeling complex patterns in sequential data, they have not yet modeled both the spatial and the temporal dependency of the traffic network in an integrated fashion. Several methods [Li et al.2018, Yu, Yin, and Zhu2017] attempt to model the traffic network by unrolling static graphs through time, where each vertex denotes the reading of traffic data at a given location and edges represent the connectivity between traffic locations. These works show that the graph structure is capable of describing the spatiotemporal dependency of traffic. However, they usually neglect the dynamic graph structure by assuming that the affinity matrix of the constructed graph, i.e. the node proximity, does not change over time. This implies that traffic conditions are time-invariant, which is not true in the real world.
To address the above-mentioned challenges, we propose a dynamic spatiotemporal graph-based CNN (DSTGCNN), which can model both the dynamics of traffic flow and the graph structure. The contributions of this paper are fourfold:

We propose a novel spatiotemporal graph-based convolutional layer that is able to jointly extract both spatial and temporal information from the traffic data. This layer consists of two factorized convolutions applied to the spatial and temporal dimensions respectively, which significantly reduces computation and can be implemented in a parallel way. We then build a hierarchy of stacked graph-based convolutional layers to extract expressive features and make traffic predictions.

We also learn evolving graph structures that adapt to the dynamic traffic condition over time. The learned graph structures can be seamlessly integrated with the stacked graph-based convolutional layers to make accurate traffic predictions.

We propose a novel two-step prediction scheme. The scheme first predicts traffic flow at close future time steps based on previous traffic flow. Afterwards, the flow at a later future step is predicted according to the predicted close future flow and the actual previous flow. This two-step prediction scheme splits the prediction task into two simpler subtasks, so that long-horizon prediction accuracy is improved.

We evaluate the proposed model on two challenging real-world datasets. Experimental results demonstrate that DSTGCNN outperforms the state-of-the-art methods.
2 Related Work
The study of traffic forecasting can be traced back to the 1970s [Larry1995]. Since then, a large number of methods have been proposed, and a recent survey comprehensively summarizes them [Vlahogianni, Karlaftis, and Golias2014]. Early methods were often based on simulations, which were computationally demanding and required careful tuning of model parameters. With modern real-time traffic data collection systems, data-driven approaches have attracted more research attention. In statistics, a family of autoregressive integrated moving average (ARIMA) models [Davis et al.1990]
are proposed to predict traffic data. However, these autoregressive models rely on a stationarity assumption on sequential data, which fails to hold in real traffic conditions that vary over time. In [Hoang, Zheng, and Singh2016], Intrinsic Gaussian Markov Random Fields (IGMRF) are developed to model both the season flow and the trend flow, and are shown to be robust against noise and missing data. Some conventional learning methods, including linear SVR [Jin, Zhang, and Yao2007] and random forest regression [Leshem and Ritov2007], have also been tailored to the traffic prediction problem. Nevertheless, these shallow models depend on handcrafted features and cannot fully explore the complex spatiotemporal patterns in big traffic data, which greatly limits their performance.
With the development of deep learning, various network architectures have been proposed for traffic prediction. CNN-based methods [Zhang, Zheng, and Qi2016, Ma et al.2017, Tan and Li2018] and RNN-based methods [Sutskever, Vinyals, and Le2014, Cui, Ke, and Wang2016] model the spatial and the temporal dependency of traffic separately, so the two dependencies are not considered simultaneously. To close this gap, hybrid models in which temporal models such as LSTM and GRU are combined with spatial models like 1D-CNNs [Wu and Tan2016, Du et al.2018] and graphs [Li et al.2018] have been proposed and achieve impressive performance. Nevertheless, these recurrent models must process sequential data successively, one step after another, which limits the parallelization of the underlying computations. In contrast, the proposed model uses convolutions to capture both spatial and temporal data dependencies, which is much more efficient than the compared recurrent models. Similar to our work, [Yu, Yin, and Zhu2017] also models spatial and temporal dynamics using graph CNNs. However, their method does not consider the dynamics of the graph structure, which is important information for traffic prediction. Our proposed model includes an extra stream to model the dynamic graph structure, so that traffic prediction can benefit from dynamic graph prediction.
3 Preliminaries
3.1 Traffic Prediction Problem Formulation
Traffic prediction aims to predict future traffic flow (e.g., traffic speed, volume) given a series of historical traffic flow. More specifically, given the historical flow of the last $T$ time steps, the goal is to predict the future flow $T'$ time steps ahead.
We represent the traffic network as a weighted undirected graph $G = (V, A)$, where $V \in \mathbb{R}^{N \times C}$ denotes the vertices of the graph, representing the $C$-dimensional observations at $N$ locations, and $A \in \mathbb{R}^{N \times N}$ is the affinity matrix depicting the proximity between vertices. The traffic prediction problem can then be represented as learning the mapping function $f$ that maps the historical flow into the future flow:

$V_{t+T'} = f(V_{t-T+1}, V_{t-T+2}, \ldots, V_{t})$   (1)
For simplicity, we use a tensor $\mathcal{V}_{t} \in \mathbb{R}^{T \times N \times C}$ to denote $(V_{t-T+1}, \ldots, V_{t})$. Without ambiguity, the subscript $t$ may be omitted in the rest of the paper. We derive the affinity matrix from travel times such that $A_{ij} = \exp(-d_{ij}/\sigma)$, where $d_{ij}$ is the travel time between locations $i$ and $j$ and $\sigma$ is a scale factor. Travel time is a direct reflection of the traffic condition between locations, and can therefore be adopted as the measurement of graph node affinity.
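As a concrete illustration, the affinity construction from travel times can be sketched as follows. The exponential kernel $\exp(-d_{ij}/\sigma)$ and the function name are our assumptions for illustration (the source only states that the affinity is derived from travel time; the scale factor is set to 500 in the implementation details):

```python
import math

def affinity_matrix(travel_time, sigma=500.0):
    """Build a graph affinity matrix from pairwise travel times.

    Assumed kernel: A[i][j] = exp(-d_ij / sigma), so a short travel time
    between two locations yields a high affinity. `sigma` is the scale
    factor mentioned in the implementation details.
    """
    n = len(travel_time)
    return [[math.exp(-travel_time[i][j] / sigma) for j in range(n)]
            for i in range(n)]

# Example: three locations; diagonal travel time is 0, so self-affinity is 1.
d = [[0.0, 250.0, 1000.0],
     [250.0, 0.0, 500.0],
     [1000.0, 500.0, 0.0]]
A = affinity_matrix(d)
```

Closer locations (in travel time) end up with larger affinity entries, which is the behavior the graph Laplacian construction below relies on.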
4 Method
The proposed DSTGCNN framework can model both the complex spatiotemporal dependency in the traffic network and the fast-evolving traffic conditions. It takes three inputs: the previous traffic flow represented as stacked graph frames, the previous traffic conditions represented as a series of affinity matrices, and auxiliary information. These inputs are fed to a two-stream network: the graph prediction stream predicts the traffic conditions, while the flow prediction stream forecasts the evolution of traffic flow given the predicted traffic conditions. The overall architecture of DSTGCNN is presented in Figure 1. In the following subsections, we describe the two streams in detail.
4.1 Flow Prediction Stream
In this subsection, we introduce the structure of the flow prediction stream, the main subnetwork that performs prediction. First, we present the building block of this stream, a novel Spatiotemporal Graph-based Convolutional layer (STC) that works with spatiotemporal graph data. Then we build a two-step hierarchical model using STC layers to predict traffic data.
The STC layer factorizes convolution into a spatial graph convolution and a temporal convolution. On the one hand, the computation can be carried out efficiently in parallel, which addresses the challenge of the large-scale traffic network. On the other hand, the factorized convolutions separately model the spatial and temporal dynamics, which addresses the challenge of spatiotemporal dependency. The two-step hierarchical learning scheme splits the whole prediction problem into two easier subproblems, breaking down the difficulty of long-horizon prediction and boosting the accuracy of future flow prediction.
4.1.1 Spatiotemporal Graphbased Convolution
The CNN is a popular tool in computer vision because it extracts hierarchical features that are expressive in many high-level recognition and prediction tasks. However, it cannot be directly applied to graph-structured data like that in our task. Therefore, we propose a novel layer that works with spatiotemporal graph data and is as efficient as conventional convolutions.
Inspired by [Howard et al.2017], which factorizes convolutions along two separate dimensions, we also present two factorized convolutions applied to the spatial and temporal dimensions respectively, in the hope of reducing computational overhead. Together they form the proposed Spatiotemporal Graph-based Convolutional layer (STC), whose structure is shown in Figure 2. The input to an STC layer is a sequence of graph-structured feature maps organized by their timestamps and channels. Each graph is first convolved spatially to extract its spatial feature representation, and then the features of multiple graphs are fused by a temporal convolution in a sliding time window. In this way, both spatial and temporal information are merged to yield a dynamic feature representation for predicting future traffic data.

Spatial Convolution
Let us first define the spatial convolution on a given graph. The diagonal degree matrix $D$ and the graph Laplacian $L$ are defined as $D_{ii} = \sum_{j} A_{ij}$ and $L = D - A$ respectively. Then the Singular Value Decomposition (SVD) is applied to the Laplacian as $L = U \Lambda U^{T}$, where $U$ consists of eigenvectors and $\Lambda$ is a diagonal matrix of eigenvalues. The matrix $U^{T}$ is the graph Fourier transform matrix, which transforms a signal to its frequency domain. $U^{T}v$ is the graph Fourier transform of input graph signal $v$, and $U^{T}w$ is the graph Fourier transform of filter $w$. With the same notation as in [Henaff, Bruna, and LeCun2015], the convolution of a graph signal $v$ with filter $w$ on $G$ is defined as

$v *_{G} w = U\big((U^{T}v) \odot (U^{T}w)\big)$   (2)

where $\odot$ is the element-wise product.
Let us define $\hat{W} = \mathrm{diag}(U^{T}w)$ as the filter in the frequency domain; then the convolution can be rewritten as

$v *_{G} w = U \hat{W} U^{T} v$   (3)

The above graph convolution requires the filter $\hat{W}$ to have the same size as the input signal $v$, which would be inefficient and hard to train when the graph is large. To make the filter "localized" as in CNNs, $\hat{W}$ can be approximated by polynomials of $\Lambda$ [Defferrard, Bresson, and Vandergheynst2016], i.e. $\hat{W} \approx \sum_{k=0}^{K-1} \theta_{k} \Lambda^{k}$, so that $U \hat{W} U^{T} \approx \sum_{k=0}^{K-1} \theta_{k} L^{k}$ and Eq. 3 can be rewritten as

$v *_{G} w \approx \sum_{k=0}^{K-1} \theta_{k} L^{k} v$   (4)

Now the trainable parameters become $\theta \in \mathbb{R}^{K}$, whose size is restricted to $K$. In addition, a node is only supported by its $(K{-}1)$-hop neighbors [Hammond, Vandergheynst, and Gribonval2011].
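As a minimal sketch of the localized filter of Eq. 4, the following pure-Python routine applies $\sum_{k} \theta_{k} L^{k} v$ to a signal on a 3-node path graph; the helper names and fixed coefficients are ours for illustration (in the model, the coefficients are trained):

```python
def matvec(M, v):
    """Dense matrix-vector product for small examples."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def graph_conv(L, v, theta):
    """Localized spectral filter of Eq. 4: sum_k theta[k] * (L^k v).

    `L` is the graph Laplacian, `v` a signal on the nodes, and `theta`
    the K filter coefficients. L^k v is accumulated iteratively, so each
    added order only widens the support by one hop.
    """
    out = [0.0] * len(v)
    Lk_v = list(v)                    # L^0 v = v
    for k, th in enumerate(theta):
        if k > 0:
            Lk_v = matvec(L, Lk_v)    # L^k v from L^(k-1) v
        out = [o + th * x for o, x in zip(out, Lk_v)]
    return out

# Laplacian of a 3-node path graph (L = D - A) and a unit impulse at node 0.
L_path = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
impulse = [1.0, 0.0, 0.0]
filtered = graph_conv(L_path, impulse, [0.5, 0.5])   # K = 2: 1-hop support
```

With $K = 2$ the response to an impulse at node 0 stays zero at node 2, illustrating the $(K{-}1)$-hop locality claimed above.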
Then we use the convolution operation above to define the spatial convolution in the STC layer. When computing the spatial convolution between feature map $\mathcal{V}^{(l)} \in \mathbb{R}^{T \times N \times C_{l}}$ and kernel $\Theta^{(l)}$ in the $l$-th layer of DSTGCNN, where $C_{l}$ is the channel number, the graph-based convolution defined above is applied to each graph frame separately. Specifically, each graph feature $v_{t,c}$ at the $c$-th channel and $t$-th time step is individually filtered such that

$u_{t,c} = \sum_{k=0}^{K-1} \theta_{t,c,k} L^{k} v_{t,c}$   (5)

where $\theta_{t,c,\cdot}$ and $u_{t,c}$ are the individual kernel and filtered output at the $c$-th channel and $t$-th time step, while tensor $\mathcal{U}^{(l)}$ is the whole output.

Temporal Convolution
At each time step, after the spatial convolution, traffic flows are fused on the underlying graph, resulting in a multi-layered feature tensor that compactly represents the individual traffic flows and their spatial interactions.
However, information across time steps is still isolated. To obtain spatiotemporal features, many previous methods [Jain et al.2016, Dai et al.2017, Sutskever, Vinyals, and Le2014] are based on recurrent models, which process sequential data iteratively, step by step. Consequently, the information of the current step can be processed only after the information of all previous steps has been processed, which limits the efficiency of recurrent models.
To make temporal operations as efficient as a convolution, we perform a conventional convolution along the time dimension to extract temporal relations, which we call temporal convolution. For a feature tensor $\mathcal{U}$ of size $T \times N \times C$, its convolution with a kernel $\mathcal{W}$ of size $t_{w} \times 1 \times C \times C'$ is performed,

$\mathcal{Z} = \mathcal{U} * \mathcal{W}$   (6)

where $t_{w}$ is the size of the time window. To keep the size of the time dimension unchanged, we pad $(t_{w}-1)/2$ zeros on both sides of the time dimension.
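For a single node and channel, the temporal convolution with "same" zero padding can be sketched as below; the moving-average kernel is purely illustrative (the real kernels are trained), and the function name is ours:

```python
def temporal_conv(u, w):
    """1-D convolution along the time axis with 'same' zero padding.

    `u` is a length-T feature sequence for one node/channel and `w` a
    kernel of odd size t_w; (t_w - 1) / 2 zeros are padded on both sides
    so the output keeps length T, matching the padding rule of Eq. 6.
    """
    t_w = len(w)
    pad = (t_w - 1) // 2
    padded = [0.0] * pad + list(u) + [0.0] * pad
    return [sum(w[j] * padded[i + j] for j in range(t_w))
            for i in range(len(u))]

# A moving-average kernel over a window of t_w = 3 time steps.
out = temporal_conv([3.0, 6.0, 9.0, 6.0], [1/3, 1/3, 1/3])
```

The output has the same length as the input, so stacked STC layers never shrink the time dimension.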
Putting It Together
By combining Eq. 5 and Eq. 6, we obtain the following definition of the spatiotemporal graph-based convolution:

$\mathcal{Z}^{(l)} = \big(\mathcal{V}^{(l)} *_{G} \Theta^{(l)}\big) * \mathcal{W}^{(l)}$   (7)

whose structure is shown in Figure 2.
We now analyse the efficiency of our factorized convolution. Without such factorization, one would need to build a graph with $NT$ nodes to capture both spatial and temporal structures, making the graph convolution in Eq. 3 have complexity $O(N^{2}T^{2})$. Our STC layer instead builds graphs with $N$ nodes and separates the spatial and temporal convolutions, which has complexity $O(N^{2}T + NTt_{w})$ and is therefore much more efficient.
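To make the factorization gain concrete, the sketch below counts multiply operations under a dense-matrix assumption. The sensor count comes from the METR-LA experiments (207 sensors); the 12-step history and window of 3 are our illustrative assumptions, not values from the paper:

```python
def joint_graph_cost(n, t):
    """Graph convolution on one joint spatiotemporal graph of n*t nodes
    (Eq. 3 without factorization): one dense (n*t) x (n*t) matrix-vector
    product, i.e. (n*t)^2 multiplies."""
    return (n * t) ** 2

def factorized_cost(n, t, t_w):
    """STC layer: t spatial convolutions on n-node graphs (n^2 multiplies
    each) plus a temporal convolution with window t_w at every node and
    time step."""
    return t * n ** 2 + t * n * t_w

n, t, t_w = 207, 12, 3   # sensors, history length, temporal window (assumed)
speedup = joint_graph_cost(n, t) / factorized_cost(n, t, t_w)
```

Under these sizes the factorized form needs roughly an order of magnitude fewer multiplies, and the gap widens as $N$ and $T$ grow.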
4.1.2 Twostep Prediction
The STC layers jointly extract spatial and temporal information from the sequence of traffic flow. We can build a hierarchical model using such layers to extract features and predict future traffic from the previous flow. A straightforward way is to directly predict the traffic $T'$ intervals ahead, as in existing methods [Dai et al.2017, Yu, Yin, and Zhu2017, Defferrard, Bresson, and Vandergheynst2016]. This one-step prediction scheme is simple but has two disadvantages. First, it only uses ground truth data at time $t+T'$ to train the model and neglects the data between $t$ and $t+T'$. Second, when $T'$ is large, it is hard for one-step methods to capture traffic trends over such a long horizon, since the input and the future flow may be very different.
To solve these issues, we propose a new prediction scheme that divides the prediction problem into two steps. In the first step, we use the previous flow to predict the traffic flow between $t+1$ and $t+T'-1$, which we call the "close future flow". During the training phase, the predicted "close future flow" is supervised by the ground truth of the corresponding time period. As a result, ground truth data between $t$ and $t+T'$ is incorporated into the training procedure. In the second step, the "target future flow" at time $t+T'$ is predicted by considering both the previous flow and the predicted "close future flow". Compared with one-step methods, the prediction of the "target future flow" is now easier, since it utilizes the "close future flow" and only predicts one step further. The two-step prediction scheme is shown in the second path of Figure 1.
Let us denote the models of the first and the second step as $f_{1}$ and $f_{2}$ respectively. Both models stack several STC layers for prediction. The loss function of the two-step prediction can be written as:

$L_{flow} = \|f_{1}(\mathcal{V}_{t}; W_{1}) - \mathcal{V}_{c}\|_{2}^{2} + \|f_{2}(\mathcal{V}_{t}, \hat{\mathcal{V}}_{c}; W_{2}) - V_{t+T'}\|_{2}^{2}$   (8)

where $\hat{\mathcal{V}}_{c} = f_{1}(\mathcal{V}_{t}; W_{1})$ is the predicted "close future flow" and $\mathcal{V}_{c}$ is its ground truth. $W_{1}$ and $W_{2}$ are the parameters of the two models respectively.
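The control flow of the two-step scheme can be sketched with plain functions standing in for the two stacked-STC models; the linear-extrapolation stand-ins below are toys of our own, used only to show how the second step consumes both the history and the predicted close future:

```python
def two_step_predict(history, step1, step2):
    """Two-step prediction scheme, sketched with plain callables.

    Step 1: `step1` maps the previous flow to the "close future flow".
    Step 2: `step2` maps (previous flow, predicted close future flow)
    to the "target future flow" one step further ahead.
    """
    close_future = step1(history)
    target_future = step2(history, close_future)
    return close_future, target_future

# Toy stand-ins (NOT the paper's models): step1 extrapolates one step
# linearly; step2 continues the trend using history and close future.
step1 = lambda h: [2 * h[-1] - h[-2]]
step2 = lambda h, c: 2 * c[-1] - h[-1]
close, target = two_step_predict([1.0, 2.0, 3.0], step1, step2)
```

In training, both outputs are supervised, which is exactly what lets the intermediate ground truth between the two horizons contribute to the loss in Eq. 8.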
4.1.3 Auxiliary Information Embedding
Apart from the previous flow (e.g., traffic speed, volume), auxiliary information such as the time of day, the day of the week and the weather is also useful for predicting future traffic flow. The influence of such information is studied in [Hoang, Zheng, and Singh2016, Zhang, Zheng, and Qi2016, Liao et al.2018]. For example, weekdays and weekends have very different transit patterns, and a thunderstorm can suddenly reduce the traffic flow.
To make full use of such auxiliary information, we embed it into the traffic flow prediction network. We first encode the information into one-hot vectors. For example, time is encoded into a one-hot vector of length 48, which represents the index of the half hour within the day, and the day of the week is encoded into a vector of length 7. These one-hot vectors are concatenated, and several fully connected layers are used to extract a feature vector. The feature vector is later reshaped so that it can be concatenated with the traffic flow feature maps. Finally, the concatenated features are fed into the prediction modules, as shown in Figure 1.
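The one-hot encoding described above can be sketched as follows; the function name and the minute-of-day interface are our assumptions, while the vector lengths (48 half-hour slots plus 7 weekdays) come from the text:

```python
def encode_auxiliary(minute_of_day, day_of_week):
    """One-hot auxiliary encoding: a length-48 vector indexing the half
    hour of the day, concatenated with a length-7 day-of-week vector
    (55 entries total). The result would feed the fully connected
    embedding layers."""
    half_hour = [0] * 48
    half_hour[minute_of_day // 30] = 1
    dow = [0] * 7
    dow[day_of_week] = 1
    return half_hour + dow

# 09:15 on a Wednesday (day index 2): half-hour slot 18 is set.
vec = encode_auxiliary(9 * 60 + 15, 2)
```

Weather categories (good/bad, available in TaxiBJ) could be appended the same way as another short one-hot segment.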
4.2 Graph Prediction Stream
In this subsection, we introduce the other stream in the framework, which we name the graph prediction stream. Previous methods [Henaff, Bruna, and LeCun2015, Jain et al.2016, Yu, Yin, and Zhu2017] that model spatiotemporal graphs assume that the graph structure of the spatiotemporal data is fixed, without temporal evolution. However, in real-world applications graph structures are dynamic. For instance, in the traffic prediction problem, traffic conditions are time-variant, implying that the proximity between vertices changes over time. In order to model such dynamics, we introduce a stream that predicts these time-variant graph structures. The dynamic graph structure learned from previous graphs can represent the future traffic condition better than a static graph. When fed into the flow prediction stream, the learned graph provides strong guidance for future flow prediction.
In particular, at each time $t$ we have a graph structure $G_{t}$ for the STC layers as a function of time. It reflects the average traffic condition in the period between time $t$ and $t+T'$. One way to obtain $G_{t}$ is to first compute the average travel time in the corresponding period,

$\bar{d}_{ij}^{\,t} = \frac{1}{T'} \sum_{\tau=t+1}^{t+T'} d_{ij}^{\tau}$   (9)

Then we have the average affinity matrix $\bar{A}_{t}$ and the corresponding Laplacian. However, in the test phase $\bar{A}_{t}$ cannot be computed directly, since the future travel times from $t$ to $t+T'$ are unavailable. To address this problem, we introduce another path in the network to predict the graph structure from previous travel time data. In other words, we predict the average traffic condition during $t$ to $t+T'$ using previous data from $t-T+1$ to $t$. Specifically, the previous travel times are first converted to affinity matrices to construct a tensor $\mathcal{A}_{t} \in \mathbb{R}^{T \times N \times N}$, which is then fed into a subnetwork $g$ to predict a new affinity matrix $\hat{A}_{t} = g(\mathcal{A}_{t}; W_{g})$ representing the average traffic condition between $t$ and $t+T'$, where $W_{g}$ is the parameter of $g$.
During training, the graph prediction stream is supervised by minimizing the following loss function

$L_{graph} = \|g(\mathcal{A}_{t}; W_{g}) - \bar{A}_{t}\|_{1}$   (10)

where $\bar{A}_{t}$ is the ground truth average affinity matrix between $t$ and $t+T'$. The $\ell_{1}$ norm is used to prevent the loss from being dominated by a few large errors. The Laplacian of the predicted $\hat{A}_{t}$ is then computed and fed into the STC layers. In this way, the prediction model takes the dynamic traffic conditions into consideration and is thus able to make more accurate future predictions.
To model the relations among previous affinity matrices, a model with a global field of view is required, since entries of affinity matrices have "global" correlations. For instance, $A_{ij}$ and $A_{ji}$ are closely related no matter how far apart they are located in $A$. Stacked fully connected layers might be preferred because of their global view, but they are hard to train because they have a large number of parameters. In addition, affinity matrices are sparse, which makes many parameters in fully connected layers redundant.
To handle this issue, we use convolutional layers instead of fully connected layers. In particular, multiple pairs of convolutional layers are stacked, where each pair consists of convolutional layers with kernel sizes $1 \times n$ and $n \times 1$ respectively to obtain a large spatial extent. Here $n$ is the number of vertices in the graph. In our experiments, such convolutional layers achieve better performance than fully connected layers.
4.3 The Whole Model
By combining the two streams, we obtain the full DSTGCNN model shown in Figure 1. The loss function of the complete model is

$L = L_{flow} + L_{graph}$   (11)

i.e., the network is trained with two losses: one for dynamic graph learning, defined in Eq. 10, and the other for traffic flow prediction, defined in Eq. 8.
It is worth noting that DSTGCNN is a general method for extracting features from spatiotemporal graph-structured data. It can be applied not only to traffic prediction tasks such as speed or volume prediction, but also to other regression or classification tasks on graph data, especially when the graph structure is dynamic. For instance, it can be adapted to skeleton-based action recognition or pose forecasting with minor modifications.
5 Experiments
In this section, we present a series of experiments to assess the performance of the proposed method. We first describe the two datasets we experiment with and the implementation details of DSTGCNN. We then conduct ablation experiments to evaluate the effectiveness of the components of DSTGCNN. Finally, we compare results on the two datasets against state-of-the-art methods.
5.1 Dataset and Evaluation Metrics
We evaluate our method on two public traffic datasets: METR-LA [Jagadish et al.2014] and the TaxiBJ dataset [Hoang, Zheng, and Singh2016]. METR-LA is a large-scale dataset collected from 1500 traffic loop detectors on the Los Angeles County road network. It includes speed, volume and occupancy data at a rate of 1 reading/sensor/min, covering approximately 3,420 miles. Following [Li et al.2018], we choose four months of traffic speed data from Mar 1st 2012 to Jun 30th 2012 recorded by 207 sensors for our experiments. The traffic data are aggregated every 5 minutes, with one direction.
The traffic volume and travel time data of the TaxiBJ dataset [Hoang, Zheng, and Singh2016] are obtained from taxis' GPS trajectories in Beijing from March 1st 2015 to June 30th 2015. The authors partition Beijing into 26 high-level regions, and traffic volumes are aggregated every 30 minutes, with two directions {In, Out}. Besides traffic volumes and travel time, the dataset also includes weather conditions, categorized into good weather (sunny, cloudy) and bad weather (rainy, storm, dusty).
For evaluation, we use three metrics: the Root Mean Squared Error (RMSE), the Mean Absolute Percentage Error (MAPE) and the Mean Absolute Error (MAE), which are defined as below:

$\mathrm{RMSE} = \sqrt{\frac{1}{TN} \sum_{t,i} (\hat{v}_{t,i} - v_{t,i})^{2}}$   (12)

$\mathrm{MAPE} = \frac{1}{TN} \sum_{t,i} \frac{|\hat{v}_{t,i} - v_{t,i}|}{v_{t,i}} \times 100\%$   (13)

$\mathrm{MAE} = \frac{1}{TN} \sum_{t,i} |\hat{v}_{t,i} - v_{t,i}|$   (14)

where $\hat{v}_{t,i}$ and $v_{t,i}$ are the predicted and ground truth traffic volumes at time $t$ and location $i$.
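A minimal implementation of the three metrics, with predictions and ground truth flattened over all time steps and locations; note that MAPE assumes nonzero ground-truth volumes:

```python
import math

def rmse(pred, truth):
    """Root mean squared error over flattened (time, location) pairs."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))

def mae(pred, truth):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)

def mape(pred, truth):
    """Mean absolute percentage error; ground-truth values must be nonzero."""
    return sum(abs(p - t) / abs(t) for p, t in zip(pred, truth)) / len(pred) * 100.0

pred = [10.0, 20.0, 30.0]
truth = [12.0, 20.0, 24.0]
```

RMSE penalizes large errors more heavily than MAE, which is why the two can rank methods differently in the tables below.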
5.2 Implementation Details
The models of the first and second prediction steps presented in subsection 4.1.2 each consist of three STC layers with , , channels respectively. A ReLU layer is inserted between two STC layers to introduce nonlinearity, as in CNNs. Another ReLU layer is added after the last STC layer to ensure nonnegative predictions. In the spatial convolution of the STC layer, the order of the polynomial approximation is set to 5, and the temporal convolution kernel size is set to . The graph prediction stream consists of three pairs of and convolutional layers with channels. The auxiliary information is encoded by two fully connected layers with and output neurons respectively, so that the output can be reshaped and concatenated with the flow features. The scale factor for constructing the affinity matrix is set to 500. In the training procedure, we first pretrain the dynamic graph learning subnetwork for epochs and then jointly train the whole model for epochs. The model is trained by SGD with momentum. The first 50 epochs use a learning rate of and the last 50 epochs use . Finally, the framework is implemented in PyTorch [Paszke et al.2017].
5.3 Ablation Study
To investigate the effectiveness of each component, we first build a plain baseline model that stacks three STC layers, uses the one-step prediction scheme, keeps the graph structure fixed and does not use auxiliary information. The static graph structure is calculated by averaging all travel times in the training set. Then different configurations are tested, including:
(1) the baseline model denoted as Basel;
(2) the baseline model with auxiliary information embedding (AE), denoted by Basel+AE;
(3) the above configuration plus graph prediction stream (GP), denoted by Basel+AE+GP;
(4) the above configuration plus two-step prediction (TP), which is the full model, denoted by Basel+AE+GP+TP or simply DSTGCNN for short.
The experimental results of all configurations, evaluated on the TaxiBJ test set, are reported in Table 1. We predict two time steps ahead in all configurations. Each proposed component consistently reduces the prediction errors, and the full model achieves the best performance. The results demonstrate that the auxiliary information embedding, the graph prediction stream and the two-step prediction scheme are all beneficial and complementary to each other; their combination accumulates these advantages and therefore achieves the best performance.
Table 1: Ablation study on the TaxiBJ test set.

Method       | Out Volumes            | In Volumes
             | MAE    RMSE   MAPE     | MAE    RMSE   MAPE
Basel        | 10.49  13.48  13.11%   | 10.71  14.44  14.46%
Basel+AE     | 10.24  13.18  12.81%   | 10.41  13.96  14.35%
Basel+AE+GP  | 10.03  12.88  12.75%   | 10.40  13.94  14.39%
DSTGCNN      | 9.93   12.78  12.56%   | 10.24  13.78  14.02%
5.4 Experiments on METRLA Dataset
In this subsection, we evaluate the prediction performance of DSTGCNN and the compared methods on the METR-LA dataset. We compare DSTGCNN with four methods: 1) Auto-Regressive Integrated Moving Average (ARIMA), a well-known method for time-series forecasting that is widely used in traffic prediction; 2) Linear Support Vector Regression (SVR) [Pedregosa et al.2011]; in order to make use of spatiotemporal data, for each node we use the historical observations of the node itself and its neighbors to learn a Linear-SVR model; 3) Recurrent Neural Network with fully connected LSTM hidden units (FC-LSTM) [Sutskever, Vinyals, and Le2014], with training details following [Li et al.2018]; 4) DCRNN [Li et al.2018], a recent method which utilizes diffusion convolution and achieves decent results on METR-LA.

Table 2: Comparison on the METR-LA dataset.

T       | Metric | ARIMA  | SVR    | FC-LSTM | DCRNN | DSTGCNN
15 min  | MAE    | 3.99   | 3.99   | 3.44    | 2.77  | 2.68
        | RMSE   | 8.21   | 8.45   | 6.30    | 5.38  | 5.35
        | MAPE   | 9.6%   | 9.3%   | 9.6%    | 7.3%  | 7.2%
30 min  | MAE    | 5.15   | 5.05   | 3.77    | 3.15  | 3.01
        | RMSE   | 10.45  | 10.87  | 7.23    | 6.45  | 6.23
        | MAPE   | 12.7%  | 12.1%  | 10.9%   | 8.8%  | 8.52%
1 hour  | MAE    | 6.90   | 6.72   | 4.37    | 3.60  | 3.41
        | RMSE   | 13.23  | 13.76  | 8.69    | 7.59  | 7.47
        | MAPE   | 17.4%  | 16.7%  | 13.2%   | 10.5% | 10.25%
Table 2 shows the comparison results on the METR-LA dataset. For all prediction horizons and all metrics, our method outperforms both the traditional statistical approaches and the deep learning based approaches, demonstrating the consistency of our method's performance for both short-term and long-term prediction.
In Figure 3, we also show a qualitative comparison of predictions over one day. DSTGCNN captures the trends of the morning peak and evening rush hour better: it predicts start and end times of the peak hours that are closer to the ground truth, whereas DCRNN lags behind the changes in the traffic data.
5.5 Experiments on TaxiBJ Dataset
We also compare the proposed method with the state of the art on the TaxiBJ dataset [Hoang, Zheng, and Singh2016]. The compared methods include: 1) Seasonal ARIMA (SARIMA); 2) the vector autoregression model (VAR); 3) FCCF [Hoang, Zheng, and Singh2016]; 4) FC-LSTM [Sutskever, Vinyals, and Le2014]; and 5) DCRNN [Li et al.2018]. FCCF utilizes both volume data and auxiliary information including time and weather. Note that we follow the experiments in FCCF and only predict the volume at the next step (30 min later), so the two-step prediction in our model is not applied. The results of FCCF, SARIMA and VAR were reported in [Hoang, Zheng, and Singh2016]. Since only RMSE results are available for SARIMA, VAR and FCCF, we compare with these three methods in terms of the RMSE metric. For FC-LSTM and DCRNN, the default experimental settings from [Sutskever, Vinyals, and Le2014] and [Li et al.2018] are used, and the results are compared in terms of RMSE, MAE and MAPE. The results are shown in Table 3.
Table 3: Comparison on the TaxiBJ dataset.

Method   | Out Volumes           | In Volumes
         | MAE    RMSE   MAPE    | MAE    RMSE   MAPE
SARIMA   | -      21.2   -       | -      18.9   -
VAR      | -      15.8   -       | -      15.8   -
FCCF     | -      14.2   -       | -      14.1   -
FC-LSTM  | 11.32  14.4   13.67%  | 11.92  15.3   17.3%
DCRNN    | 10.49  13.8   13.11%  | 10.71  14.5   14.46%
DSTGCNN  | 9.38   12.0   11.9%   | 9.30   12.62  13.27%
From Table 3, we can see that the proposed DSTGCNN achieves the best performance. The comparison suggests that the proposed STC layer combined with the graph prediction stream is very effective for future traffic prediction. Although the two-step prediction strategy is not utilized when predicting only one step ahead, our method still models the spatiotemporal dependency and the dynamic graph structure robustly.
5.6 Experimental Results Analysis
Our method achieves the new state of the art for the following reasons. First, compared with traditional methods, our deep model has a larger capacity to describe the complex data dependency in the traffic network. Second, our method takes the dynamic topology of the traffic network into consideration while existing methods do not; as a result, it can better capture the propagation of traffic trends. Finally, our network is carefully designed for traffic prediction: the two-step prediction scheme breaks a long-horizon prediction into two short-horizon predictions and makes the predictions easier.
6 Conclusion and Future Work
In this paper, we propose an effective and efficient framework, DSTGCNN, that predicts future traffic flow from previous traffic flow. DSTGCNN incorporates both spatial and temporal correlations in the traffic data and is able to capture both the dynamics and the complexity of traffic. Predicting the dynamic graph further enables DSTGCNN to adapt to fast-evolving traffic conditions. Experiments on two large-scale datasets indicate that our method outperforms other state-of-the-art methods. In the future, we plan to apply the proposed framework to other traffic prediction tasks such as pedestrian crowd prediction.
References
 [Cui, Ke, and Wang2016] Cui, Z.; Ke, R.; and Wang, Y. 2016. Deep stacked bidirectional and unidirectional lstm recurrent neural network for networkwide traffic speed prediction. In 6th International Workshop on Urban Computing (UrbComp 2017).
 [Dai et al.2017] Dai, X.; Fu, R.; Lin, Y.; Li, L.; and Wang, F.Y. 2017. Deeptrend: A deep hierarchical neural network for traffic flow prediction. arXiv preprint arXiv:1707.03213.
 [Davis et al.1990] Davis, G. A.; Nihan, N. L.; Hamed, M. M.; and Jacobson, L. N. 1990. Adaptive forecasting of freeway traffic congestion. Transportation Research Record (1287).
 [Defferrard, Bresson, and Vandergheynst2016] Defferrard, M.; Bresson, X.; and Vandergheynst, P. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, 3844–3852.
 [Du et al.2018] Du, S.; Li, T.; Gong, X.; Yu, Z.; and Horng, S.J. 2018. A hybrid method for traffic flow forecasting using multimodal deep learning. arXiv preprint arXiv:1803.02099.
 [Hammond, Vandergheynst, and Gribonval2011] Hammond, D. K.; Vandergheynst, P.; and Gribonval, R. 2011. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis 30(2):129–150.
 [Henaff, Bruna, and LeCun2015] Henaff, M.; Bruna, J.; and LeCun, Y. 2015. Deep convolutional networks on graphstructured data. arXiv preprint arXiv:1506.05163.
 [Hoang, Zheng, and Singh2016] Hoang, M. X.; Zheng, Y.; and Singh, A. K. 2016. FCCF: Forecasting citywide crowd flows based on big data. In Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 6. ACM.
 [Howard et al.2017] Howard, A. G.; Zhu, M.; Chen, B.; Kalenichenko, D.; et al. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
 [Huang, Song, and Hong2014] Huang, W.; Song, G.; and Hong, H.; et al. 2014. Deep architecture for traffic flow prediction: Deep belief networks with multitask learning. IEEE Transactions on Intelligent Transportation Systems 15(5):2191–2201.
 [Jagadish et al.2014] Jagadish, H. V.; Gehrke, J.; Labrinidis, A.; Papakonstantinou, Y.; Patel, J. M.; Ramakrishnan, R.; and Shahabi, C. 2014. Big data and its technical challenges. Communications of the ACM 57(7):86–94.
 [Jain et al.2016] Jain, A.; Zamir, A. R.; Savarese, S.; and Saxena, A. 2016. Structuralrnn: Deep learning on spatiotemporal graphs. In CVPR, 5308–5317.
 [Jin, Zhang, and Yao2007] Jin, X.; Zhang, Y.; and Yao, D. 2007. Simultaneously prediction of network traffic flow based on PCA-SVR. Advances in Neural Networks–ISNN 2007 1022–1031.
 [Larry1995] Larry, H. K. 1995. Event-based short-term traffic flow prediction model. Transportation Research Record 1510:125–143.
 [Leshem and Ritov2007] Leshem, G., and Ritov, Y. 2007. Traffic flow prediction using adaboost algorithm with random forests as a weak learner. In Proceedings of World Academy of Science, Engineering and Technology, volume 19, 193–198.
 [Li et al.2018] Li, Y.; Yu, R.; Shahabi, C.; and Liu, Y. 2018. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. In International Conference on Learning Representations (ICLR ’18).
 [Liao et al.2018] Liao, B.; Zhang, J.; Wu, C.; McIlwraith, D.; Chen, T.; Yang, S.; Guo, Y.; and Wu, F. 2018. Deep sequence learning with auxiliary information for traffic prediction. arXiv preprint arXiv:1806.07380.
 [Lv et al.2015] Lv, Y.; Duan, Y.; Kang, W.; Li, Z.; and Wang, F. 2015. Traffic flow prediction with big data: A deep learning approach. IEEE Transactions on Intelligent Transportation Systems 16(2):865–873.
 [Ma et al.2017] Ma, X.; Dai, Z.; He, Z.; Ma, J.; Wang, Y.; and Wang, Y. 2017. Learning traffic as images: A deep convolutional neural network for large-scale transportation network speed prediction. Sensors 17(4):818.
 [Mathew and Rao2017] Mathew, T. V., and Rao, K. K. 2017. Fundamental relations of traffic flow. Lecture Notes in Transportation Systems Engineering. Bombay, India: Department of Civil Engineering, Indian Institute of Technology Bombay.
 [Paszke et al.2017] Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; and Lerer, A. 2017. Automatic differentiation in PyTorch.
 [Pedregosa et al.2011] Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12(Oct):2825–2830.
 [Sutskever, Vinyals, and Le2014] Sutskever, I.; Vinyals, O.; and Le, Q. V. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, 3104–3112.
 [Tan and Li2018] Tan, Z., and Li, R. 2018. A dynamic model for traffic flow prediction using improved DRN. arXiv preprint arXiv:1805.00868.
 [Vlahogianni, Karlaftis, and Golias2014] Vlahogianni, E. I.; Karlaftis, M. G.; and Golias, J. C. 2014. Short-term traffic forecasting: Where we are and where we’re going. Transportation Research Part C: Emerging Technologies 43:3–19.
 [Williams and Hoel2003] Williams, B. M., and Hoel, L. A. 2003. Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results. Journal of Transportation Engineering 129(6):664–672.
 [Wu and Tan2016] Wu, Y., and Tan, H. 2016. Short-term traffic flow forecasting with spatial-temporal correlation in a hybrid deep learning framework. arXiv preprint arXiv:1612.01022.
 [Yu, Yin, and Zhu2017] Yu, B.; Yin, H.; and Zhu, Z. 2017. Spatio-temporal graph convolutional neural network: A deep learning framework for traffic forecasting. arXiv preprint arXiv:1709.04875.
 [Zhang, Zheng, and Qi2016] Zhang, J.; Zheng, Y.; and Qi, D. 2016. Deep spatio-temporal residual networks for citywide crowd flows prediction. In AAAI Conference on Artificial Intelligence, 1655–1661.