RGCRN
Traffic flow forecasting is essential for traffic planning, control, and management. The main challenge of traffic forecasting tasks is accurately capturing the spatial and temporal correlation of traffic networks. Although many traffic forecasting methods exist, most still have limitations in capturing spatial and temporal correlations. To improve traffic forecasting accuracy, we propose a new spatial-temporal forecasting model, the Residual Graph Convolutional Recurrent Network (RGCRN). The model uses our proposed Residual Graph Convolutional Network (ResGCN) to capture the fine-grained spatial correlation of the traffic road network and then uses a Bidirectional Gated Recurrent Unit (BiGRU) to model the time series with spatial information, obtaining the temporal correlation by analysing the change in information transfer between the forward and reverse neurons of the time-series data. Our comparative experimental results on two real datasets show that RGCRN improves on average by 20.66% over the best baseline model. Our source code and data are available at https://github.com/zhangshqii/RGCRN.
Traffic flow forecasting is an integral part of Intelligent Transportation Systems (ITS) [ref1] and is of great significance for road planning and construction and for smart city traffic management. The task of traffic flow forecasting can be described as analysing the trend of traffic changes in a future period based on historical road traffic conditions. However, forecasting traffic data accurately is very challenging. First, each road in a traffic network has complex spatial correlations in the spatial dimension: the more connected and proximate roads are, the more similar their traffic conditions are and the more likely they are to influence each other. Second, the dynamic variability and uncertainty of traffic data in the time dimension make modelling temporal correlation difficult.
Facing the above challenges, traditional time-series forecasting models [ref2] focus only on the temporal correlation of traffic time-series data while ignoring the spatial correlation of traffic road networks. Therefore, some studies [ref3; ref4; ref5] have attempted to integrate Convolutional Neural Networks (CNN) [ref6] with Recurrent Neural Networks (RNN) [ref7] to propose CNN-based spatial-temporal forecasting models. Specifically, these models use a CNN to model the traffic road network as a regular grid to capture the spatial correlation of the traffic network. They then use an RNN or its successors (e.g., the Long Short-Term Memory (LSTM) [ref8] network and the Gated Recurrent Unit (GRU) [ref9]) to capture the temporal correlation of traffic data. However, CNNs do not apply to traffic road networks with non-Euclidean structures [ref10], which makes the improvement in forecasting accuracy of CNN-based spatial-temporal models over traditional time-series models insignificant. To solve this problem, researchers replaced the CNN with a Graph Neural Network (GNN)
[ref11] to model traffic road networks. With its excellent modelling capability for non-Euclidean structured data, a GNN can effectively capture spatial correlation information and significantly improve the accuracy of traffic forecasting. However, we believe two critical aspects of GNN-based traffic forecasting methods are still overlooked.

First, some GNN-based methods [ref12; ref13] for modelling unstructured data are trained end-to-end [ref14] on weighted graphs carrying both node feature information and graph topology. This makes GNN-based methods applicable to a variety of irregular graph-structured data. Meanwhile, the Graph Convolutional Network (GCN) [ref12] has a layered structure similar to that of a CNN. As shown in Figure 1, each convolutional layer handles only first-order neighbourhood information, and information transfer across multi-order neighbourhoods can be achieved by stacking several convolutional layers. Like CNNs, GCNs can capture more fine-grained feature information by stacking more convolutional layers. However, as the model goes deeper, the layers propagate noisy data from the extended neighbourhood [ref12], resulting in a degradation of the neural network's performance. Although some researchers have tried to use residual connections [ref15] to overcome this problem, network degradation still occurs as the network depth increases. This limits the ability of GNNs to extract fine-grained spatial features and reduces accuracy.

Second, although RNN-based temporal forecasting models can use their self-looping mechanism to learn the temporal correlation information of a time series, they still have some limitations. On the one hand, the RNN is a strictly time-ordered model: the output at a particular moment is related only to the inputs before that moment and cannot be correlated with data after it. However, traffic events have strong contextual relevance [ref16] in the time dimension. For example, an unexpected traffic accident on a road may cause traffic conditions on that road to remain congested for some time to come. On the other hand, the chain structure of the RNN forces signals in the network to be delivered along long cyclic paths [ref17; ref18]. As the length of the cyclic path increases, the RNN becomes more likely to lose vital information. This makes RNNs ineffective at extracting temporal feature information when modelling long time sequences, which reduces the model's forecasting accuracy.

Given the above discussion, a new spatial-temporal forecasting model is proposed in this paper. The model obtains the spatial correlation of traffic road networks more effectively, mines the potential contextual correlation in traffic events, and captures the long-term temporal correlation of traffic data. In summary, the main contributions of our work are the following three points.
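The neighbourhood-expansion behaviour of stacked graph convolution layers described above can be illustrated numerically: with self-looped adjacency Ã = A + I, the nonzero pattern of Ã^k marks exactly the nodes reachable within k hops, i.e. the receptive field of k stacked first-order layers. A minimal NumPy sketch (the 5-node path graph is purely illustrative):

```python
import numpy as np

# 5-node path graph: 0 - 1 - 2 - 3 - 4
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1

A_tilde = A + np.eye(5)  # adjacency with self-loops, as in GCN

def receptive_field(k, node=0):
    """Nodes whose features can reach `node` after k stacked
    first-order graph convolution layers (support of A_tilde^k)."""
    reach = np.linalg.matrix_power(A_tilde, k)
    return {j for j in range(5) if reach[node, j] > 0}

assert receptive_field(1) == {0, 1}           # one layer: 1-hop
assert receptive_field(2) == {0, 1, 2}        # two layers: 2-hop
assert receptive_field(4) == {0, 1, 2, 3, 4}  # four layers: whole path
```

This is why deeper stacks capture finer-grained, longer-range spatial structure, and also why noise from distant neighbourhoods eventually leaks in.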
We propose a new Residual Graph Convolutional Network (ResGCN) that uses one-dimensional convolution to aggregate the feature information of graph nodes at once, captures the spatial correlation of traffic road networks, and solves the degradation problem of deep convolutional networks through residual connections.
We use a Bidirectional Gated Recurrent Unit (BiGRU) for forward and backward modelling of traffic time-series data, mining the hidden contextual relationships in the data and capturing the long-term temporal correlation of traffic time-series information.
We propose a new spatial-temporal forecasting model, the Residual Graph Convolutional Recurrent Network (RGCRN). The model specialises in modelling spatial-temporal data with complex topology and high nonlinearity to capture potential spatial-temporal correlations. Experiments show that our model achieves state-of-the-art results on all metrics.
The rest of this paper is organised as follows. In Section 2, we review the literature related to traffic forecasting. In Section 3, the structure and implementation of RGCRN are explained in detail. In Section 4, we compare RGCRN with other models on two real traffic datasets to evaluate the predictive power of the proposed model. In Section 5, we analyse and summarise the proposed model.
The task of traffic flow forecasting has long been one of the important research directions in intelligent transportation. Traffic flow forecasting methods can be classified into two categories: model-driven and data-driven approaches. First, the model-driven approach requires modelling the traffic network based on a priori knowledge to describe the transient and steady-state relationships of road traffic conditions. The main approaches are the queueing theory model [ref19] and the traffic speed model [ref20]. However, these theoretical approaches struggle to describe the complex changes of traffic data in real traffic scenarios and cannot accurately predict road traffic conditions.

Second, the data-driven approach aims to discover the patterns of traffic flow changes from historical traffic data and finally complete the forecasting of road traffic conditions. Early research methods, based on statistics and machine learning, mainly include the Historical Average model (HA) [ref21], the Autoregressive Integrated Moving Average model (ARIMA) [ref22], the Vector AutoRegression model (VAR) [ref23], and the Support Vector Regression model (SVR) [ref24]. With the rapid development of deep learning, models such as LSTM [ref8] and GRU [ref9] have also been applied to traffic forecasting and have received attention for their excellent results. In recent years, some researchers have proposed the Temporal Convolutional Network (TCN) [ref25; ref26], capable of processing very long sequences in less time with substantially improved performance. With the wide application of the Transformer [ref27] to sequence data, some studies [ref28] have proposed Transformer-based time-series forecasting models, such as Informer [ref29] and Autoformer [ref30]. However, all these methods treat the traffic time series from different roads as independent data streams, ignoring the spatial correlation between traffic road nodes, which makes it impossible to predict road traffic conditions accurately.

Accurately capturing the spatial and temporal correlation of traffic networks is the key to traffic flow forecasting. Therefore, spatial-temporal forecasting models that integrate GNNs and RNNs have been widely used for traffic forecasting tasks, and many studies have sought to describe the spatial-temporal correlation of traffic networks better. For example, the DCRNN [ref31] model captures the spatial correlation through diffusion convolution with bidirectional random walks on the graph and the temporal correlation with a recurrent encoder-decoder. STGCN [ref32] combines graph convolution and 1D convolution, capturing the spatial correlation of traffic road networks through GCN and then capturing the temporal correlation using 1D convolutional networks. The TGCN [ref33] model combines GCN and GRU to capture the spatial and temporal correlation of traffic data. GMAN [ref34] introduces a spatial-temporal attention mechanism to capture the spatial and temporal correlation of traffic data dynamically. MTGNN [ref35] uses a mix-hop propagation layer and a dilated inception layer to capture spatial and temporal correlations. Graph WaveNet [ref36] combines GCN with dilated causal convolution and proposes an adaptive adjacency matrix to complement the predefined adjacency matrix when capturing spatial-temporal correlation. GWNET-conv [ref37] introduces a new covariance loss into Graph WaveNet, significantly improving its forecasting accuracy. DGCRN [ref38] uses a hypernetwork to generate an adjacency matrix and merges it with the original road network matrix to capture spatial correlations dynamically. The STGNN [ref39] model provides a learnable spatial graph neural network with a positional attention mechanism and captures local and global temporal correlations using GRU and Transformer layers. STFGNN [ref40] effectively learns spatial and temporal correlations by fusing multiple spatial and temporal graphs. STCGAT [ref41] dynamically captures the spatial correlations of traffic road networks through a Graph Attention Network (GAT) [ref13] and uses the proposed Causal Temporal Convolutional Network (CTCN) to capture the causal and temporal correlations of traffic data.
Inspired by the above methods, this paper proposes a new traffic forecasting method that not only captures the fine-grained spatial correlation of traffic networks but also captures the temporal correlation of time series more accurately by analysing the hidden contextual correlation in traffic time-series data, thus substantially improving the accuracy of traffic flow forecasting.
In this paper, traffic forecasting is the analysis of trends in traffic conditions over time based on the historical traffic conditions of the traffic road network. Traffic condition is a general concept covering traffic speed, flow, and density; we use traffic speed as the traffic condition information.
Road network G: We consider each road as a node in the graph and use a weighted graph G = (V, E, W) to express the topological structure of the traffic road network, where V denotes the set of N road nodes, E is the set of edges connecting these road nodes, and W denotes the distance information between connected road nodes. The adjacency matrix A ∈ R^{N×N} represents the connection relationship between road nodes: for any two nodes v_i and v_j, the values of A_ij and A_ji are 1 if the two nodes are connected and 0 if they are not.
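The adjacency construction just described can be sketched in a few lines of Python; the node count and edge list below are illustrative, not from either dataset:

```python
# Build a binary adjacency matrix A from an undirected edge list:
# A[i][j] = A[j][i] = 1 if roads i and j are connected, 0 otherwise.
def build_adjacency(num_nodes, edges):
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1  # the road graph is undirected
    return A

A = build_adjacency(4, [(0, 1), (1, 2), (2, 3)])
assert A[0][1] == A[1][0] == 1   # connected pair
assert A[0][3] == 0              # not connected
```

A distance-weighted variant would store W[i][j] (e.g. road distance) instead of 1, matching the weighted graph G = (V, E, W).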
Feature matrix X: We use the traffic speed on each road as the feature information and represent it with the feature matrix X ∈ R^{N×P}, where P denotes the number of node feature attributes (the length of the historical time series) and X_t ∈ R^N indicates the feature vector of all nodes at moment t.
According to the above definitions, the traffic forecasting problem aims to learn a mapping function f that maps the weighted graph G and the historical traffic data of the past n steps to the traffic data of the next T' steps, as shown in Equation (1):

(1)  [X_{t+1}, …, X_{t+T'}] = f( G; (X_{t-n+1}, …, X_t) )
The framework of the proposed RGCRN is shown in Figure 2. RGCRN consists of two main components: ResGCN and BiGRU. Specifically, we first model the spatial correlation of the traffic road network using ResGCN and then model the spatially correlated time series output by ResGCN using BiGRU, obtaining the temporal correlation by analysing the change in information transfer between the forward and reverse neurons of the time-series data. Finally, the forecasting results are output by a fully connected layer.
In traffic forecasting, it is essential to capture the spatial correlation of traffic road networks. Therefore, we propose ResGCN to capture the spatial correlation of traffic road networks in a fine-grained manner. As shown in Figure 2(b), we discard the multiple linear transform operations of the graph convolution layer, aggregate the feature vectors of all nodes in the graph at once, and then process them with a one-dimensional convolutional network to obtain the newly generated feature vectors. In addition, we use residual connections to avoid the network degradation problem caused by stacking convolutional layers. As shown in Figure 2(c), each ResGCN layer contains two 1D graph convolution modules, and Weight Norm and Dropout regularisation are applied to the computed results. Specifically, we input the feature set H_t^(l) of all nodes at moment t into the ResGCN layer at layer l, and the output H_t^(l+1) of this layer is obtained after calculation, as shown in Equation (2):

(2)  H_t^(l+1) = F( σ( Γ_2 * σ( Γ_1 * ( L̂ H_t^(l) ) ) ) ) + H_t^(l)

where L̂ = D̃^{-1/2} Ã D̃^{-1/2} is the normalised Laplacian matrix expressing the proximity relationship between road nodes, Ã = A + I_N, I_N is the unit diagonal matrix, D̃ is the degree matrix of Ã, Γ_1 and Γ_2 are the convolution kernels of the first and second 1D-CNN, * denotes the convolution operation, F is the regularisation function (Weight Norm followed by Dropout), σ represents the GELU activation function, and H_t^(l+1) ∈ R^{N×C} is the set of feature vectors output by the residual graph convolution layer, with C the number of channels of the 1D convolutional network. Once the residual graph convolution layers at all moments up to time t complete their computation, we aggregate all the output results to obtain the final output X_G of ResGCN, as shown in Equation (3):

(3)  X_G = ( H_1, H_2, …, H_t )
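One propagation step of this kind can be sketched in NumPy under our reading of the description: normalise the self-looped adjacency, aggregate neighbour features, apply a transform with GELU activation, and add the residual. The 1D convolution, Weight Norm, and Dropout of the full layer are omitted here, and the linear transform W stands in for the convolution kernels; dimensions and weights are illustrative.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def normalized_laplacian(A):
    # L_hat = D~^{-1/2} (A + I) D~^{-1/2}, symmetric normalisation
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def resgcn_step(A, H, W):
    # One residual graph-convolution step (simplified sketch):
    # aggregate neighbours -> transform -> activate -> add residual
    L_hat = normalized_laplacian(A)
    return gelu(L_hat @ H @ W) + H

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node path
H = rng.standard_normal((3, 8))        # 3 nodes, 8 features each
W = rng.standard_normal((8, 8)) * 0.1  # stand-in for the conv kernels
H_out = resgcn_step(A, H, W)
assert H_out.shape == H.shape  # residual branch keeps dimensions
```

The shape-preserving residual branch is what allows several such layers to be stacked without the degradation discussed in the introduction.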
To capture the spatial-temporal correlation of the traffic network simultaneously, we use BiGRU to capture the temporal correlation of the time-series data. To facilitate the description, we take the forward (rightward) GRU as an example. Figure 2(d) shows that the GRU consists of several structurally identical neurons connected sequentially. There are two gating units within each neuron: 1) the reset gate, which controls how much information from the last moment needs to be forgotten; 2) the update gate, which controls how much information from the previous and current moments can be saved. Specifically, we take the output X_G in Equation (3) as the input x_t of the neuron at time t. The calculation procedure is shown in Equation (4):

(4)  r_t = σ( W_r · [h_{t-1}, x_t] + b_r )
     u_t = σ( W_u · [h_{t-1}, x_t] + b_u )
     c_t = tanh( W_c · [r_t ⊙ h_{t-1}, x_t] + b_c )
     h_t = u_t ⊙ h_{t-1} + (1 − u_t) ⊙ c_t

where h_{t-1} denotes the hidden state at the previous moment, r_t is the reset gate operation, u_t is the update gate operation, c_t is the memory content stored at time t, and h_t is the output state of the neuron at time t; σ is the sigmoid function, W_r, W_u, W_c are the weight matrices, b_r, b_u, b_c are the bias terms, and ⊙ is the element-wise multiplication. When the last neuron finishes its operation, we obtain the output h⃗ ∈ R^{N×d} of the forward GRU, where d is the number of GRU hidden units. Like the forward GRU, the reverse (leftward) GRU processes the input time-series data from the opposite direction to obtain h⃖. Finally, we splice the outputs of the forward and reverse GRU to obtain the output of BiGRU, as shown in Equation (5).
(5)  H = σ( h⃗ ⊕ h⃖ )

where H is the final output, σ represents the GELU activation function, and ⊕ is the splicing (concatenation) operation.
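The gating computation of Equation (4) and the bidirectional concatenation of Equation (5) can be sketched in NumPy as follows. This is a minimal illustration with random weights and biases omitted for brevity; a real implementation would use a learned recurrent layer (e.g. `torch.nn.GRU` with `bidirectional=True`).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell following Equation (4): reset gate r_t,
    update gate u_t, candidate state c_t, hidden state h_t."""
    def __init__(self, in_dim, hid_dim, rng):
        s = 0.1
        self.Wr = rng.standard_normal((in_dim + hid_dim, hid_dim)) * s
        self.Wu = rng.standard_normal((in_dim + hid_dim, hid_dim)) * s
        self.Wc = rng.standard_normal((in_dim + hid_dim, hid_dim)) * s
        self.hid_dim = hid_dim

    def step(self, x, h):
        xh = np.concatenate([x, h])
        r = sigmoid(xh @ self.Wr)                          # reset gate
        u = sigmoid(xh @ self.Wu)                          # update gate
        c = np.tanh(np.concatenate([x, r * h]) @ self.Wc)  # candidate state
        return u * h + (1 - u) * c                         # new hidden state

    def run(self, xs):
        h = np.zeros(self.hid_dim)
        for x in xs:
            h = self.step(x, h)
        return h

rng = np.random.default_rng(0)
T, in_dim, hid_dim = 12, 8, 16
xs = [rng.standard_normal(in_dim) for _ in range(T)]

fwd, bwd = GRUCell(in_dim, hid_dim, rng), GRUCell(in_dim, hid_dim, rng)
h_fwd = fwd.run(xs)                    # forward pass over the sequence
h_bwd = bwd.run(xs[::-1])              # backward pass over the reversed sequence
h_bi = np.concatenate([h_fwd, h_bwd])  # concatenation, as in Equation (5)
assert h_bi.shape == (2 * hid_dim,)
```

Running the same sequence in both directions is what lets the model use context after an event (e.g. the lingering congestion following an accident) as well as before it.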
Finally, this paper uses a fully connected neural network to implement the multi-step traffic flow forecasting task. Specifically, H in Equation (5) is used as the input data, and the final forecasting result Ŷ is obtained by learning a weight matrix W_f, as shown in Equation (6):

(6)  Ŷ = f( H W_f )

where f denotes the mapping function of the fully connected layer and Ŷ ∈ R^{N×T'}, with T' the time length of the forecast.
During training, to minimise the error between the forecasting result Ŷ and the actual value Y, this paper chooses the L1 loss function to optimise the neural network model, as shown in Equation (7):

(7)  Loss = | Y − Ŷ |
In this paper, experiments are conducted on the following two real traffic datasets.
METR-LA: This is a dataset of traffic speeds in miles per hour derived from 207 roadway loop detectors on Los Angeles freeways. The recording period is from March 1, 2012, to June 30, 2012.
PEMS-BAY: This is a dataset of traffic speeds in miles per hour recorded by 325 roadway sensors in the Bay Area, selected from the Performance Measurement System (PeMS). The recording period is from January 1, 2017, to May 31, 2017.
In both datasets, the traffic data are aggregated every 5 minutes. We divide each dataset into training, validation, and test sets and then process the split data with a sliding window of length n + T', where n denotes the size of the historical data and T' denotes the length of the data to be predicted. In our experiments, both n and T' were set to 12. Table 1 shows statistical information about the datasets.
Table 1: Statistics of the datasets.

Dataset  | Samples | Sensors | Unit  | Input length | Forecast length
METR-LA  | 34272   | 207     | 5 min | 12           | 12
PEMS-BAY | 52116   | 325     | 5 min | 12           | 12
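The sliding-window preparation described above can be sketched in pure Python; the toy series below is illustrative, with n and T' both 12 as in our experiments:

```python
# Slice a long series into (history, target) pairs with a sliding
# window: n past steps as input, t_pred future steps as the label.
def sliding_windows(series, n=12, t_pred=12):
    pairs = []
    for start in range(len(series) - n - t_pred + 1):
        x = series[start:start + n]               # historical input
        y = series[start + n:start + n + t_pred]  # forecast target
        pairs.append((x, y))
    return pairs

data = list(range(30))                 # toy series of 30 time steps
pairs = sliding_windows(data)
assert len(pairs) == 30 - 12 - 12 + 1  # 7 windows
assert pairs[0][0] == list(range(12))
assert pairs[0][1] == list(range(12, 24))
```

In the real datasets, each window element is the feature vector of all N sensors at one 5-minute step rather than a scalar.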
We compared RGCRN with some advanced traffic forecasting models in recent years, and these baseline models are described below.
HA [ref21]: The method averages the historical traffic speed as the forecast result.
ARIMA [ref22]: It combines autoregressive and moving average models.
VAR [ref23]: This is a classical linear time-series forecasting model that captures the relationships among multiple variables.
SVR [ref24]: Support Vector Regression with a linear kernel is selected for training.
FC-LSTM [ref8]: LSTM with a fully connected network.
DCRNN [ref31]: The model uses bidirectional diffusion convolution to model the spatial information of the traffic network and GRU to model the time-series data.
STGCN [ref32]: The model combines graph convolution with one-dimensional convolution to capture spatial-temporal correlations.
TGCN [ref33]: The model captures the spatial and temporal correlation of the traffic network using GCN and GRU, respectively.
GMAN [ref34]: This model is based on attention mechanisms that incorporate spatial, temporal, and transform attention.
MTGNN [ref35]: This is a forecasting model for multivariate time series from a graph perspective using graph neural networks.
Graph WaveNet [ref36]: The model combines diffusion graph convolution with gated one-dimensional dilated convolution and proposes an adaptive adjacency matrix.
GWNET-conv [ref37]: A new loss function, the covariance loss, is introduced on top of Graph WaveNet.
DGCRN [ref38]: The model proposes a hypernetwork that adaptively generates dynamic adjacency matrices step by step, which significantly improves forecasting performance.
STGNN [ref39]: The model provides a learnable positional attention mechanism and a sequential component to capture spatial and temporal correlations.
STFGNN [ref40]: The model assembles a dilated CNN module in parallel with a spatial-temporal fusion graph module to extract long-range spatial-temporal correlations.
STCGAT [ref41]: The model consists of GAT and CTCN, dynamically capturing spatial correlation and catching potential causal temporal correlation.
RGCRN(S): This variant of RGCRN simplifies the processing of the residual graph convolution module: it fuses the time dimension and feature dimension of each node and aggregates the feature information of the sliding time window at one time, further improving forecasting efficiency.
The model was implemented in PyTorch 1.10.0, all experiments were performed on an Nvidia GeForce RTX 2080Ti GPU, and each experiment was repeated ten times, with the best result reported. In addition, we used the same hyperparameter settings for both METR-LA and PEMS-BAY. Specifically, we set the size of the 1D convolutional kernel to 3, the number of GRU hidden units to 128, the batch size to 64, and the learning rate to 0.001, with a maximum of 150 iterations, using the Adam optimiser.
To measure the model's forecasting performance, we evaluate the labelled values y_i against the predicted values ŷ_i using the following three metrics.

Mean Absolute Error (MAE):

(8)  MAE = (1/n) Σ_{i=1}^{n} | y_i − ŷ_i |

Root Mean Squared Error (RMSE):

(9)  RMSE = √( (1/n) Σ_{i=1}^{n} ( y_i − ŷ_i )² )

Mean Absolute Percentage Error (MAPE):

(10)  MAPE = (100% / n) Σ_{i=1}^{n} | y_i − ŷ_i | / y_i

The smaller the value of each metric, the better the forecasting model's performance.
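The three metrics are straightforward to compute; a pure-Python sketch with a small worked example:

```python
import math

def mae(y, y_hat):
    # Mean Absolute Error, Equation (8)
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    # Root Mean Squared Error, Equation (9)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mape(y, y_hat):
    # Mean Absolute Percentage Error, Equation (10);
    # assumes no zero ground-truth values
    return 100.0 * sum(abs(a - b) / abs(a) for a, b in zip(y, y_hat)) / len(y)

y, y_hat = [10.0, 20.0, 40.0], [12.0, 18.0, 40.0]
assert mae(y, y_hat) == (2 + 2 + 0) / 3
assert abs(rmse(y, y_hat) - math.sqrt(8 / 3)) < 1e-12
assert abs(mape(y, y_hat) - 100 * (0.2 + 0.1 + 0) / 3) < 1e-12
```

Note that MAPE is undefined at zero speeds, which is why traffic benchmarks typically mask or exclude zero readings before evaluation.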
As shown in Tables 2 and 3, RGCRN achieved the best predictive performance on all metrics in all forecasting periods in the comparison experiments with all baseline models. In addition, we found that the forecasting performance of RGCRN on METR-LA was weaker than on PEMS-BAY, indicating that METR-LA is the more challenging dataset. Therefore, we performed the following analysis using the forecasting results on METR-LA.
Table 2: Forecasting performance comparison on the METR-LA dataset (MAE / RMSE / MAPE).

Model         | 15 Min               | 30 Min               | 60 Min
HA            | 4.16 / 7.80 / 13.00% | 4.16 / 7.80 / 13.00% | 4.16 / 7.80 / 13.00%
ARIMA         | 3.99 / 8.21 / 9.60%  | 5.15 / 10.45 / 12.70% | 6.90 / 13.23 / 17.40%
VAR           | 4.42 / 7.89 / 10.20% | 5.41 / 9.13 / 12.70%  | 6.52 / 10.11 / 15.80%
SVR           | 3.99 / 8.45 / 9.30%  | 5.05 / 10.87 / 12.10% | 6.72 / 13.76 / 16.70%
FC-LSTM       | 3.44 / 6.30 / 9.60%  | 3.77 / 7.23 / 10.90%  | 4.37 / 8.69 / 13.20%
DCRNN         | 2.77 / 5.38 / 7.30%  | 3.15 / 6.45 / 8.80%   | 3.60 / 7.60 / 10.50%
STGCN         | 2.88 / 5.74 / 7.62%  | 3.47 / 7.24 / 9.57%   | 4.59 / 9.40 / 12.70%
TGCN          | 3.03 / 5.26 / 7.81%  | 3.52 / 6.12 / 9.45%   | 4.30 / 7.31 / 11.80%
GMAN          | 2.80 / 5.55 / 7.41%  | 3.12 / 6.49 / 8.73%   | 3.44 / 7.35 / 10.07%
MTGNN         | 2.69 / 5.18 / 6.86%  | 3.05 / 6.17 / 8.19%   | 3.49 / 7.23 / 9.87%
Graph WaveNet | 2.69 / 5.15 / 6.90%  | 3.07 / 6.22 / 8.37%   | 3.53 / 7.37 / 10.01%
GWNET-conv    | 2.69 / 5.14 / 6.83%  | 3.07 / 6.17 / 8.26%   | 3.53 / 7.27 / 9.85%
DGCRN         | 2.62 / 5.01 / 6.63%  | 2.99 / 6.05 / 8.02%   | 3.44 / 7.19 / 9.73%
STGNN         | 2.62 / 4.99 / 6.55%  | 2.98 / 5.88 / 7.77%   | 3.49 / 6.94 / 9.69%
STFGNN        | 2.56 / 4.70 / 6.46%  | 2.83 / 5.44 / 7.45%   | 3.17 / 6.37 / 8.73%
STCGAT        | 0.42 / 1.33 / 1.00%  | 0.60 / 2.30 / 1.46%   | 1.48 / 5.96 / 2.97%
Table 3: Forecasting performance comparison on the PEMS-BAY dataset (MAE / RMSE / MAPE).

Model         | 15 Min              | 30 Min              | 60 Min
HA            | 2.88 / 5.59 / 6.80% | 2.88 / 5.59 / 6.80% | 2.88 / 5.59 / 6.80%
ARIMA         | 1.62 / 3.30 / 3.50% | 2.33 / 4.76 / 5.40% | 3.38 / 6.50 / 8.30%
VAR           | 1.74 / 3.16 / 3.60% | 2.32 / 4.25 / 5.00% | 2.93 / 5.44 / 6.50%
SVR           | 1.85 / 3.59 / 3.80% | 2.48 / 5.18 / 5.50% | 3.28 / 7.08 / 8.00%
FC-LSTM       | 2.05 / 4.19 / 4.80% | 2.20 / 4.55 / 5.20% | 2.37 / 4.96 / 5.70%
DCRNN         | 1.38 / 2.95 / 2.90% | 1.74 / 3.97 / 3.90% | 2.07 / 4.74 / 4.90%
STGCN         | 1.36 / 2.96 / 2.90% | 1.81 / 4.27 / 4.17% | 2.49 / 5.69 / 5.79%
TGCN          | 1.50 / 2.83 / 3.14% | 1.73 / 3.40 / 3.76% | 2.18 / 4.35 / 4.94%
GMAN          | 1.34 / 2.91 / 2.86% | 1.63 / 3.76 / 3.68% | 1.86 / 4.32 / 4.37%
MTGNN         | 1.32 / 2.79 / 2.77% | 1.65 / 3.74 / 3.69% | 1.94 / 4.49 / 4.53%
Graph WaveNet | 1.30 / 2.74 / 2.73% | 1.63 / 3.70 / 3.67% | 1.95 / 4.52 / 4.63%
GWNET-conv    | 1.30 / 2.73 / 2.69% | 1.62 / 3.67 / 3.59% | 1.91 / 4.40 / 4.47%
DGCRN         | 1.28 / 2.69 / 2.66% | 1.59 / 3.63 / 3.55% | 1.89 / 4.42 / 4.43%
STGNN         | 1.17 / 2.43 / 2.34% | 1.46 / 3.27 / 3.09% | 1.83 / 4.20 / 4.15%
STFGNN        | 1.15 / 2.31 / 2.39% | 1.38 / 3.00 / 3.01% | 1.66 / 3.71 / 3.74%
STCGAT        | 0.30 / 0.72 / 0.66% | 0.36 / 1.19 / 0.81% | 0.61 / 2.27 / 1.38%
Comparison with baselines: We find that on both datasets, deep-learning-based forecasting methods usually achieve better forecasting accuracy than methods based on probabilistic statistics and classical machine learning (e.g., HA, ARIMA, VAR, and SVR). This is due to the difficulty these traditional baselines have in handling complex nonlinear time-series data. Among the spatial-temporal forecasting models, RGCRN(S) reduces MAE, RMSE, and MAPE on the 15-minute, 30-minute, and 60-minute traffic forecasting tasks compared with the best baseline method, STCGAT.

From Table 2, we can observe that RGCRN outperforms the other spatial-temporal forecasting methods in both short-term and long-term traffic forecasting. Among the compared methods, GMAN uses a self-attention-based architecture and a spatial-temporal embedding module that facilitates long-range forecasting. However, self-attention does not capture local sequence correlations, and the spatial-temporal embedding is relatively simple for modelling complex spatial-temporal correlation, which reduces short-term forecasting performance. Graph WaveNet and MTGNN can capture the spatial correlation of traffic networks from an adaptive adjacency matrix. However, their adaptive graphs are still static and cannot capture the dynamic spatial correlation at each time step as time changes, making their long-term spatial-temporal forecasting relatively poor. In addition, most of the other baselines cannot model the dynamic characteristics of the traffic network structure, limiting their representation capability. In contrast, STCGAT has the best spatial-temporal forecasting performance among the baseline methods, mainly because it uses GAT to dynamically capture the spatial correlation of the traffic road network and then integrates BiGRU with TCN to capture the causal temporal correlation of the time series.
RGCRN vs. RGCRN(S): RGCRN(S) is a simplified variant of RGCRN. Precisely, RGCRN(S) no longer stacks equal-sized ResGCN layers along the length of the time window. Instead, the time dimension and feature dimension of the time-series data input to the residual graph convolution module are fused so that the time series of all moments is processed at once. From Table 2, we can observe that the forecasting performance of RGCRN on the METR-LA dataset is overall better than that of RGCRN(S). However, on the PEMS-BAY dataset, RGCRN did not perform as well as RGCRN(S). This may be because the more complex structure of RGCRN, which stacks multiple ResGCN layers, is better suited to data with complex structure, giving RGCRN a stronger high-level feature extraction capability. However, as shown in Table
4, RGCRN requires more computational resources and training time than RGCRN(S).

Table 4: Training time per epoch.

Model    | METR-LA        | PEMS-BAY
RGCRN    | 0.75 min/epoch | 1.45 min/epoch
RGCRN(S) | 0.17 min/epoch | 0.30 min/epoch
To analyse the validity of the components of our model, we designed several variants of the RGCRN model and performed ablation experiments on the METR-LA dataset.

w/o CNN: The model removes the one-dimensional convolutional layer from RGCRN and replaces it with a linear layer.
w/o RGC: The model replaces the ResGCN in RGCRN with a double-layer GCN.
w/o GRU: The model removes the reverse GRU from RGCRN and captures the temporal correlation of the time series using a single-layer GRU.
w/o Res: The model removes the residual connections of RGCRN.
Table 5: Ablation results on the METR-LA dataset (MAE / RMSE / MAPE).

Model   | 15 Min              | 30 Min              | 60 Min
w/o CNN | 1.75 / 2.72 / 3.90% | 1.80 / 3.09 / 4.16% | 1.99 / 4.38 / 5.08%
w/o RGC | 0.61 / 1.51 / 1.31% | 0.80 / 2.57 / 1.78% | 1.42 / 4.46 / 3.29%
w/o GRU | 1.98 / 3.47 / 4.09% | 2.34 / 4.34 / 4.78% | 3.04 / 5.92 / 6.61%
w/o Res | 0.31 / 1.36 / 0.61% | 0.47 / 2.41 / 1.24% | 0.98 / 4.12 / 2.57%
RGCRN   | 0.22 / 1.25 / 0.60% | 0.42 / 2.26 / 1.12% | 0.89 / 4.08 / 2.44%
Table 5 presents the predictive performance metrics for RGCRN and its four variants at three different time steps. We can observe that the w/o GRU model has the worst forecasting performance, proving the necessity of using BiGRU to analyse the contextual relevance of the time series. From the results of w/o CNN, we can observe that using a one-dimensional convolutional network is more advantageous than using a traditional linear layer. Meanwhile, the comparison with w/o RGC shows that our proposed ResGCN captures spatial correlation better than a two-layer GCN. Finally, we find that RGCRN and w/o Res are similar on the three metrics, perhaps because the network is not deep enough for degradation to occur. Nevertheless, RGCRN is slightly better than w/o Res in all cases, indicating that the residual connections are effective and improve the predictive power of RGCRN. In addition, Figure 3 shows the average metric data for RGCRN and the other baseline models on the one-hour forecasting task; RGCRN also has the best overall performance, demonstrating the effectiveness of its components.
In this paper, we propose a new traffic forecasting model. The spatial correlation of the traffic road network is captured using our proposed residual graph convolutional network, and the temporal correlation of the traffic flow sequence is captured using a bidirectional gated recurrent unit. We conducted comparison experiments with other advanced baseline models on two real traffic datasets to verify the model's effectiveness. The experimental results show that our model achieves state-of-the-art results on all metrics, and its forecasting accuracy and performance are substantially improved over existing methods, bringing a new solution to the traffic forecasting task that is essential for constructing intelligent transportation systems. In future work, we will continue to focus on two issues: (1) how to adaptively capture the spatial correlation of dynamic graphs when the structure of the traffic road network graph changes; and (2) how to model unexpected traffic incidents and random traffic events to capture global temporal correlation.