Residual Graph Convolutional Recurrent Networks For Multi-step Traffic Flow Forecasting

Traffic flow forecasting is essential for traffic planning, control and management. The main challenge of traffic forecasting tasks is accurately capturing the spatial and temporal correlation of traffic networks. Although there are many traffic forecasting methods, most of them still have limitations in capturing spatial and temporal correlations. To improve traffic forecasting accuracy, we propose a new spatial-temporal forecasting model, the Residual Graph Convolutional Recurrent Network (RGCRN). The model uses our proposed Residual Graph Convolutional Network (ResGCN) to capture the fine-grained spatial correlation of the traffic road network, and then uses a Bi-directional Gated Recurrent Unit (BiGRU) to model the time series enriched with spatial information, obtaining the temporal correlation by analysing the change in information transfer between the forward and reverse neurons of the time series data. Our comparative experimental results on two real datasets show that RGCRN improves on average by 20.66%. Our source code and data are publicly available.



1 Introduction

Traffic flow forecasting is an integral part of Intelligent Transportation Systems (ITS) [ref1] and is of great significance for road planning and construction and for smart city traffic management in the new era. The task of traffic flow forecasting can be described as analysing the trend of traffic changes over a future period based on historical road traffic conditions. However, it is very challenging to forecast traffic data accurately. First, each road in a traffic network has complex spatial correlations in the spatial dimension: the more connected and spatially proximate roads are, the more similar their traffic conditions and the more likely they are to influence each other. Second, the dynamic variability and uncertainty of traffic data in the time dimension make modelling temporal correlation challenging.

Facing the above challenges, traditional time series forecasting models [ref2] focus only on the temporal correlation of traffic time series data while ignoring the spatial correlation of traffic road networks. Therefore, some studies [ref3; ref4; ref5] have attempted to integrate Convolutional Neural Networks (CNN) with Recurrent Neural Networks (RNN) to propose CNN-based spatial-temporal forecasting models. Specifically, these models use a CNN to model the traffic road network as a regular grid to capture the spatial correlation of the traffic network. They then use an RNN or its successors (e.g., the Long Short-Term Memory (LSTM) network [ref8] and the Gated Recurrent Unit (GRU) [ref9]) to capture the temporal correlation of traffic data. However, CNNs do not apply to traffic road networks with non-Euclidean structures [ref10], which makes the improvement in forecasting accuracy of CNN-based spatial-temporal models over traditional time-series forecasting models insignificant. To solve this problem, researchers replaced the CNN with a Graph Neural Network (GNN) [ref11] to model traffic road networks. With its excellent modelling capability for non-Euclidean structured data, a GNN can effectively capture spatial correlation information and significantly improve the accuracy of traffic forecasting. However, we believe that two critical aspects of GNN-based traffic forecasting methods are still overlooked.

Figure 1: N-layer Graph Convolutional Network Framework.

First, some GNN-based methods [ref12; ref13] for modelling unstructured data are trained end-to-end [ref14] on weighted graphs carrying both node feature information and graph topology. This makes GNN-based methods applicable to a variety of irregular graph-structured data. Meanwhile, the Graph Convolutional Network (GCN) [ref12] has a layered structure similar to that of a CNN. As shown in Figure 1, each convolutional layer handles only first-order neighbourhood information, and information transfer across multi-order neighbourhoods can be achieved by stacking several convolutional layers. Like a CNN, a GCN can capture more fine-grained feature information by stacking more convolutional layers. However, as the model goes deeper, the layers propagate noisy data from the extended neighbourhood [ref12], resulting in a degradation of the neural network's performance. Although some researchers have tried to use residual connections [ref15] to overcome this problem, network degradation still occurs as the network depth grows. This limits the ability of GNNs to extract fine-grained spatial features and reduces accuracy.
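The degradation issue and its residual fix can be sketched in a few lines of PyTorch. This toy layer (names, sizes, and the plain linear transform are illustrative assumptions, not the paper's implementation) simply adds the input back to each graph-convolution output so that stacking layers does not wash out the original signal:

```python
import torch
import torch.nn as nn

class ResidualGCNLayer(nn.Module):
    """Hypothetical GCN layer with a residual (skip) connection."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.act = nn.ReLU()

    def forward(self, x, a_hat):
        # a_hat: normalized adjacency (N, N); x: node features (N, dim)
        h = self.act(self.linear(a_hat @ x))
        return x + h  # residual connection preserves the original signal

# Stacking L layers widens the receptive field to L-hop neighbourhoods.
layers = nn.ModuleList([ResidualGCNLayer(16) for _ in range(4)])
x = torch.randn(207, 16)   # 207 nodes, as in the METR-LA road network
a_hat = torch.eye(207)     # placeholder normalized adjacency
for layer in layers:
    x = layer(x, a_hat)
```

Each layer only has to learn a correction on top of the identity mapping, which is what makes deeper stacks trainable.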

Second, although RNN-based temporal forecasting models can use their self-looping mechanism to learn the temporal correlation of time series, they still have some limitations. On the one hand, the RNN is a strictly time-ordered model: the output at a particular moment is related only to the inputs before that moment and cannot be correlated with the data after it. However, traffic events have strong contextual relevance [ref16] in the time dimension. For example, an unexpected traffic accident on a road may cause traffic conditions on that road to remain congested for some time to come. On the other hand, the chain structure of the RNN forces signals to be delivered along long cyclic paths [ref17; ref18]. As the length of the cyclic path increases, the RNN becomes more likely to lose vital information in the network. This makes RNNs ineffective at extracting temporal feature information when modelling long time sequences, which reduces the model's forecasting accuracy.

Given the above discussion, a new spatial-temporal forecasting model is proposed in this paper. The model can more effectively obtain the spatial correlation of traffic road networks, mine the potential contextual correlation in traffic events, and capture the long-term temporal correlation of traffic data. In summary, the main contributions of our work are the following three points.

  1. We propose a new Residual Graph Convolutional Network (ResGCN) that uses one-dimensional convolution to aggregate the feature information of graph nodes at once, captures the spatial correlation of traffic road networks, and solves the degradation problem of deep convolutional networks through residual connections.

  2. We use a Bi-directional Gated Recurrent Unit (BiGRU) for forward and backward modelling of traffic time series data, mining the hidden contextual relationships in the data and capturing the long-term temporal correlation of traffic time series.

  3. We propose a new spatial-temporal forecasting model: the Residual Graph Convolutional Recurrent Network (RGCRN). The model specialises in modelling spatial-temporal data with complex topology and high nonlinearity to capture potential spatial-temporal correlations. Experiments show that our model achieves state-of-the-art results on all metrics.

The rest of this paper is organised as follows. In Section 2, we present the literature related to traffic forecasting. In Section 3, the structure and implementation of RGCRN are explained in detail. In Section 4, we conduct experiments with RGCRN on two real traffic datasets with other models to evaluate the predictive power of the proposed model. In Section 5, we analyse and summarise the proposed model.

2 Related Works

The task of traffic flow forecasting has long been one of the important research directions in intelligent transportation. Traffic flow forecasting methods can be classified into two categories: model-driven and data-driven approaches. First, the model-driven approach requires modelling the traffic network based on a priori knowledge to describe the transient and steady-state relationships of road traffic conditions. The main approaches are the queueing theory model [ref19] and the traffic speed model [ref20]. However, these theoretical approaches struggle to describe the complex changes of traffic data in real traffic scenarios and cannot accurately predict road traffic conditions.

Second, the data-driven approach aims to discover the patterns of traffic flow changes from historical traffic data and thus forecast road traffic conditions. Early research methods, based on statistics and machine learning, mainly include the Historical Average model (HA) [ref21], the Autoregressive Integrated Moving Average model (ARIMA) [ref22], the Vector Auto-Regression model (VAR) [ref23], and the Support Vector Regression model (SVR) [ref24]. With the rapid development of deep learning, models such as LSTM [ref8] and GRU [ref9] have also been applied to traffic forecasting and have received attention for their excellent results. In recent years, some researchers have proposed the Temporal Convolutional Network (TCN) [ref25; ref26], which can process very long sequences in less time with substantially improved performance. With the wide application of the Transformer [ref27] to sequence data, some studies [ref28] have proposed Transformer-based time series forecasting models, such as Informer [ref29] and Autoformer [ref30]. However, all these methods treat traffic time series from different roads as independent data streams, ignoring the spatial correlation between traffic road nodes, making it impossible to predict road traffic conditions accurately.

Accurately capturing the spatial and temporal correlation of traffic networks is the key to traffic flow forecasting. Therefore, spatial-temporal forecasting models that integrate GNNs and RNNs have been widely used for traffic forecasting tasks, and many studies have sought to describe the spatial-temporal correlation of traffic networks better. For example, the DCRNN [ref31] model captures spatial correlation with diffusion convolution based on bidirectional random walks on the graph and captures temporal correlation with recurrent units. STGCN [ref32] combines graph convolution and 1D convolution, capturing the spatial correlation of traffic road networks through GCN and the temporal correlation through 1D convolutional networks. The T-GCN [ref33] model combines GCN and GRU to capture the spatial and temporal correlation of traffic data. GMAN [ref34] introduces a spatial-temporal attention mechanism to capture the spatial and temporal correlation of traffic data dynamically. MTGNN [ref35] uses a mix-hop propagation layer and a dilated inception layer to capture spatial and temporal correlations. Graph WaveNet [ref36] combines GCN with dilated causal convolution and proposes an adaptive adjacency matrix to complement the predefined adjacency matrix when capturing spatial-temporal correlation. GWNET-conv [ref37] introduces a new covariance loss on Graph WaveNet that significantly improves its forecasting accuracy. DGCRN [ref38] uses a hypernetwork to generate the adjacency matrix and merges it with the original road network matrix to capture spatial correlations dynamically. The STGNN [ref39] model provides a learnable spatial graph neural network with a location attention mechanism and captures local and global temporal correlations using GRU and Transformer layers. STFGNN [ref40] effectively learns spatial and temporal correlations by fusing multiple spatial and temporal graphs.
STCGAT [ref41] dynamically captures the spatial correlations of traffic road networks through a Graph Attention Network (GAT) [ref13] and uses the proposed Causal Temporal Convolutional Network (CTCN) to capture the causal and temporal correlations of traffic data.

Inspired by the above methods, this paper proposes a new traffic forecasting method that not only captures the fine-grained spatial correlation of traffic networks but also captures the temporal correlation of time series more accurately by analysing the hidden contextual correlation in traffic time-series data, thus substantially improving the accuracy of traffic flow forecasting.

3 Methods

3.1 Problem Description and Definition

In this paper, traffic forecasting is the analysis of trends in traffic conditions over a future period based on the historical traffic conditions of the traffic road network. Traffic condition is a general concept covering traffic speed, flow, and density; we use traffic speed as the traffic condition information.

Road network G: We consider each road as a node in the graph and use a weighted graph G = (V, E) to express the topological structure of the traffic road network, where V = {v_1, v_2, …, v_N} denotes the set of N road nodes, E is the set of edges connecting these road nodes, and the edge weights encode the distance between connected road nodes. The adjacency matrix A ∈ R^{N×N} represents the connection relationship between road nodes: for any two nodes v_i and v_j, A_ij = A_ji = 1 if the two nodes are connected, and 0 if they are not.
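The binary adjacency matrix and its symmetrically normalized form (used later for graph convolution) can be built in a few lines of NumPy; the four-node network and edge list here are purely illustrative:

```python
import numpy as np

# Hypothetical 4-node road network; edges list the connected node pairs.
num_nodes = 4
edges = [(0, 1), (1, 2), (2, 3)]

# Binary adjacency: A[i, j] = 1 if roads i and j are connected, else 0.
A = np.zeros((num_nodes, num_nodes))
for i, j in edges:
    A[i, j] = A[j, i] = 1

# Self-loops plus symmetric normalization:
# A_hat = D~^{-1/2} (A + I) D~^{-1/2}
A_tilde = A + np.eye(num_nodes)
d = A_tilde.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt
```

Node 0 touches itself and node 1, so its degree in A_tilde is 2 and the diagonal entry A_hat[0, 0] works out to 0.5.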

Feature matrix X: We use the traffic speed on each road as feature information and represent it with the feature matrix X ∈ R^{N×P}, where P denotes the number of node feature attributes (the length of the historical time series) and X_t ∈ R^N denotes the feature vector of all nodes at moment t.

According to the above definitions, the traffic forecasting problem aims to learn a mapping function f that maps the weighted graph G and the historical traffic data with step size n to the traffic data of the subsequent T steps, as shown in Equation (1):

[X_{t+1}, …, X_{t+T}] = f(G; [X_{t−n+1}, …, X_t])   (1)
3.2 The Model Architecture

The framework of the proposed RGCRN is shown in Figure 2. RGCRN consists of two main components: ResGCN and BiGRU. Specifically, we first model the spatial correlation of the traffic road network using ResGCN, and then model the spatially correlated time series output by ResGCN using BiGRU, obtaining the temporal correlation by analysing the change in information transfer between the forward and reverse neurons of the time series data. Finally, the forecasting results are output by a fully connected layer.

Figure 2: Residual Graph Convolutional Recurrent Network framework.

3.2.1 Spatial Correlation Modeling

In traffic forecasting, it is essential to capture the spatial correlation of traffic road networks. Therefore, we propose ResGCN to capture the spatial correlation of traffic road networks in a fine-grained manner. As shown in Figure 2(b), we discard the multiple linear transform operations of the graph convolution layer, aggregate the feature vectors of all nodes in the graph at once, and then process them with a one-dimensional convolutional network to obtain the newly generated feature vectors. In addition, we use residual connections to avoid the network degradation problem caused by stacking convolutional layers. As shown in Figure 2(c), each ResGCN layer contains two 1D graph convolution modules, and Weight Norm and Dropout regularisation are applied to the computed results. Specifically, we input the feature set of all nodes at moment t into the ResGCN layer at layer l, and the output of this layer is obtained after calculation. The specific calculation process is shown in Equation (2):

H_t^l = H_t^{l−1} + F(Â H_t^{l−1}),  F(x) = Dropout(WN(Θ_2 ∗ σ(Dropout(WN(Θ_1 ∗ x)))))   (2)

where Â = D̃^{−1/2} Ã D̃^{−1/2} is the normalized Laplacian matrix expressing the proximity relationship between road nodes, Ã = A + I_N, I_N is the identity matrix, and D̃ is the degree matrix of Ã. Θ_1 and Θ_2 are the convolution kernels of the first and second 1D-CNN, ∗ denotes the convolution operation, WN(·) is the Weight Norm regularisation function, σ(·) represents the GELU activation function, F(·) indicates the mapping function of the neural network, H_t^l ∈ R^{N×C} is the set of feature vectors output by the residual graph convolution layer, and C is the number of channels of the 1D convolutional network.

When the residual graph convolution layers at all moments have completed the computation, we aggregate all the output results to obtain the final output of ResGCN, S. The calculation process is shown in Equation (3):

S = [H_{t−n+1}^L, H_{t−n+2}^L, …, H_t^L]   (3)
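One possible PyTorch rendering of a single ResGCN layer, under the assumption that node features are laid out as convolution channels; the operator ordering follows the textual description and is a sketch, not the authors' code:

```python
import torch
import torch.nn as nn

class ResGCNLayer(nn.Module):
    """Sketch of one ResGCN layer: aggregate node features with the
    normalized adjacency A_hat, pass them through two Weight-Norm 1D
    convolutions with GELU and Dropout, then add a residual connection."""
    def __init__(self, channels, kernel_size=3, dropout=0.1):
        super().__init__()
        pad = kernel_size // 2
        self.conv1 = nn.utils.weight_norm(
            nn.Conv1d(channels, channels, kernel_size, padding=pad))
        self.conv2 = nn.utils.weight_norm(
            nn.Conv1d(channels, channels, kernel_size, padding=pad))
        self.act = nn.GELU()
        self.drop = nn.Dropout(dropout)

    def forward(self, x, a_hat):
        # x: (batch, channels, num_nodes); a_hat: (num_nodes, num_nodes)
        h = x @ a_hat                       # neighbourhood aggregation
        h = self.drop(self.act(self.conv1(h)))
        h = self.drop(self.conv2(h))
        return x + h                        # residual connection

layer = ResGCNLayer(channels=8)
x = torch.randn(1, 8, 207)                  # 207 road nodes, 8 channels
out = layer(x, torch.eye(207))
```

Convolving over the node axis keeps the per-layer cost linear in the number of nodes while the residual path preserves the input features.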
3.2.2 Temporal Correlation Modeling

To capture the spatial-temporal correlation of the traffic network simultaneously, we use BiGRU to capture the temporal correlation of the time series data. To facilitate the description of the processing in BiGRU, we present the forward (rightward) GRU as an example. As shown in Figure 2(d), the GRU consists of several structurally identical neurons connected sequentially. There are two gating units within each neuron: 1) the Reset Gate controls how much information needs to be forgotten from the last moment; 2) the Update Gate controls how much information can be saved from the previous moment and the current moment. Specifically, we take the output result s_t in S from Equation (3) as the input data of the neuron at time t. The calculation procedure is shown in Equation (4):

r_t = σ(W_r · [h_{t−1}, s_t] + b_r)
z_t = σ(W_z · [h_{t−1}, s_t] + b_z)
h̃_t = tanh(W_h · [r_t ⊙ h_{t−1}, s_t] + b_h)
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t   (4)

where s_t ∈ R^{N×C} is the input at time t, h_{t−1} denotes the hidden state at the previous moment, r_t is the reset gate operation, z_t is the update gate operation, h̃_t is the memory content stored at time t, and h_t is the output state of the neuron at time t. σ is the sigmoid function, W_r, W_z, W_h are the weight matrices, b_r, b_z, b_h are the bias terms, and ⊙ is the element-wise multiplication operation. When the last neuron finishes its operation, we obtain the output of the forward GRU, H_f ∈ R^{n×d}, where d is the number of GRU hidden units.
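The gating equations above can be checked with a minimal hand-rolled GRU step; the random weights are placeholders, and note that some GRU variants swap the roles of z_t and 1 − z_t in the final interpolation:

```python
import torch

def gru_step(s_t, h_prev, W_r, W_z, W_h, b_r, b_z, b_h):
    """One GRU step following Equation (4): reset gate r_t, update gate
    z_t, candidate memory h~_t, and the new hidden state h_t."""
    x = torch.cat([h_prev, s_t], dim=-1)
    r_t = torch.sigmoid(x @ W_r + b_r)
    z_t = torch.sigmoid(x @ W_z + b_z)
    x_r = torch.cat([r_t * h_prev, s_t], dim=-1)
    h_tilde = torch.tanh(x_r @ W_h + b_h)
    return (1 - z_t) * h_prev + z_t * h_tilde

d_in, d_hid = 4, 8
W_r = torch.randn(d_hid + d_in, d_hid)
W_z = torch.randn(d_hid + d_in, d_hid)
W_h = torch.randn(d_hid + d_in, d_hid)
b_r = b_z = b_h = torch.zeros(d_hid)
h = gru_step(torch.randn(1, d_in), torch.zeros(1, d_hid),
             W_r, W_z, W_h, b_r, b_z, b_h)
```

Because h_t is a convex combination of the previous state and a tanh-bounded candidate, every component of the new state stays in [−1, 1] when starting from a zero state.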

Like the forward GRU, the reverse (leftward) GRU processes the input time series from the opposite direction to obtain H_b. Finally, we splice the output results of the forward and reverse GRUs to obtain the output of BiGRU, as shown in Equation (5):

H = σ(H_f ⊕ H_b)   (5)

where H is the final output, σ represents the GELU activation function, and ⊕ is the splicing (concatenation) operation.
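In practice the forward and reverse passes need not be written by hand: PyTorch's `nn.GRU` with `bidirectional=True` runs both directions and concatenates their hidden states per time step, mirroring the splicing operation above (the sizes below are illustrative, with 128 hidden units as in Section 4.3):

```python
import torch
import torch.nn as nn

num_nodes, seq_len, feat, hidden = 207, 12, 8, 128
bigru = nn.GRU(input_size=feat, hidden_size=hidden,
               batch_first=True, bidirectional=True)

s = torch.randn(num_nodes, seq_len, feat)   # spatially-enriched sequences
out, _ = bigru(s)                           # (num_nodes, seq_len, 2 * hidden)
```

The last dimension doubles because the forward and backward hidden states are concatenated, so a downstream layer sees both past and future context at every step.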

3.3 Multi-step Traffic Forecasting

Finally, this paper uses a fully connected neural network to implement the multi-step traffic flow forecasting task. Specifically, H from Equation (5) is used as the input data, and the final forecasting result Ŷ is obtained by learning a weight matrix W. The specific calculation procedure is shown in Equation (6):

Ŷ = g(H W)   (6)

where g(·) denotes the mapping function and T is the time length of the forecast, with Ŷ ∈ R^{N×T}.

In the training process, to minimise the error between the forecasting result Ŷ and the actual value Y, this paper chooses the L1 loss function to optimise the neural network model, as shown in Equation (7):

Loss = |Y − Ŷ|   (7)
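A minimal training step consistent with the settings reported in Section 4.3 (Adam, learning rate 0.001, batch size 64) might look as follows; the linear model here is only a stand-in for RGCRN:

```python
import torch
import torch.nn as nn

model = nn.Linear(24, 12)                        # placeholder for RGCRN
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.L1Loss()                          # mean absolute error (L1)

x = torch.randn(64, 24)                          # batch of 64 samples
y = torch.randn(64, 12)                          # 12-step targets

optimizer.zero_grad()
loss = criterion(model(x), y)                    # Equation (7) per batch
loss.backward()
optimizer.step()
```

`nn.L1Loss` averages the absolute errors over the batch by default, which matches optimising the mean absolute error between prediction and ground truth.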

4 Experiment

4.1 Datasets

In this paper, experiments are conducted on the following two real traffic datasets.

  • METR-LA: This is a dataset of traffic speeds in miles per hour derived from 207 roadway loop detectors on Los Angeles freeways. The recording period is from March 1, 2012, to June 30, 2012.

  • PEMS-BAY: This is a dataset of traffic speeds in miles per hour recorded by 325 roadway sensors in the Bay Area, selected from the Performance Measurement System (PeMS). The recording period is from January 1, 2017, to May 31, 2017.

For both datasets, we aggregate the traffic data every 5 minutes, divide the data into training, validation, and test sets, and then process the split data with a sliding window of length n + T, where n denotes the size of the historical data and T denotes the length of the data to be predicted. In our experiments, both n and T were set to 12. Table 1 shows statistical information about the datasets.

Dataset    Samples  Sensors  Unit   Input length  Forecast length
METR-LA    34272    207      5 min  12            12
PEMS-BAY   52116    325      5 min  12            12
Table 1: Dataset Statistics.
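The sliding-window preprocessing described above can be sketched as follows; `sliding_windows` is a hypothetical helper, and the random array stands in for the real speed recordings:

```python
import numpy as np

def sliding_windows(series, n_hist=12, n_pred=12):
    """Split a (T, N) multivariate series into (X, Y) pairs: each X holds
    n_hist past steps, each Y the following n_pred steps."""
    xs, ys = [], []
    for start in range(len(series) - n_hist - n_pred + 1):
        xs.append(series[start:start + n_hist])
        ys.append(series[start + n_hist:start + n_hist + n_pred])
    return np.stack(xs), np.stack(ys)

speeds = np.random.rand(100, 207)   # 100 five-minute steps, 207 sensors
X, Y = sliding_windows(speeds)
# 100 - 12 - 12 + 1 = 77 window positions
```

With n = T = 12 a series of length 100 yields 77 overlapping samples, each pairing 12 input steps with the 12 steps that follow.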

4.2 Baseline Methods

We compared RGCRN with some advanced traffic forecasting models in recent years, and these baseline models are described below.

  • HAref21: The method averages the historical traffic speed as the forecast result.

  • ARIMAref22: It combines autoregressive and moving average models.

  • VARref23: This is a classical multivariate time-series forecasting model.

  • SVRref24: Support Vector Regression; a linear kernel is selected for training.

  • FC-LSTMref8: LSTM with fully connected network.

  • DCRNNref31: The model uses bidirectional diffusion graph convolution to model the spatial information of the traffic network and GRU to model the time-series data.

  • STGCNref32: The model combines graph convolution with one-dimensional convolution to capture Spatial-temporal correlations.

  • T-GCNref33: The model captures the spatial and temporal correlation of the traffic network using GCN and GRU, respectively.

  • GMANref34: This model is based on attention mechanisms, incorporating spatial, temporal, and transform attention.

  • MTGNNref35: This is a forecasting model for multivariate time series from a graph perspective using graph neural networks.

  • Graph WaveNetref36: The model combines diffusion graph convolution with gated one-dimensional dilated convolution and proposes an adaptive adjacency matrix.

  • GWNET-convref37: This model introduces a new loss function, namely the covariance loss, on top of Graph WaveNet.

  • DGCRNref38: The model proposes a supernetwork with adaptive step-by-step generation of dynamic adjacency matrices, which significantly improves the forecasting performance.

  • STGNNref39: The model provides a learnable location attention mechanism and a sequence component to capture spatial and temporal correlations.

  • STFGNNref40: The model assembles a dilated CNN module in parallel with a spatial-temporal fusion graph module to extract long-range spatial-temporal correlations.

  • STCGATref41: The model consists of GAT and CTCN that dynamically captures spatial correlation and catches potential causal time correlation.

  • RGCRN(S): A variant of RGCRN that simplifies the processing of the residual graph convolution module by fusing the time and feature dimensions of each node and aggregating the feature information of the whole sliding time window at once, further improving forecasting efficiency.

4.3 Experimental Settings and Evaluation Metrics

The model was implemented in PyTorch 1.10.0. All experiments were performed on an Nvidia GeForce RTX 2080Ti GPU and repeated ten times, with the best results reported. In addition, we used the same hyperparameter settings for both METR-LA and PEMS-BAY. Specifically, we set the size of the 1D convolutional kernel to 3, the number of GRU hidden units to 128, the batch size to 64, and the learning rate to 0.001, with a maximum of 150 iterations, using the Adam optimiser.

To measure the model's forecasting performance, we evaluated the labelled value y_i and the predicted value ŷ_i using the following three metrics.

  • Mean Absolute Error (MAE): MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|

  • Root Mean Squared Error (RMSE): RMSE = sqrt((1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²)

  • Mean Absolute Percentage Error (MAPE): MAPE = (100%/n) Σ_{i=1}^{n} |y_i − ŷ_i| / |y_i|
The smaller the value of the above three metrics, the better the forecasting model’s performance.
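As a sanity check, the three metrics can be implemented directly in NumPy; the sample values below are illustrative:

```python
import numpy as np

def mae(y, y_hat):
    """Mean Absolute Error."""
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root Mean Squared Error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    """Mean Absolute Percentage Error; assumes y has no zeros."""
    return np.mean(np.abs((y - y_hat) / y)) * 100

y = np.array([60.0, 50.0, 40.0])        # ground-truth speeds (mph)
y_hat = np.array([58.0, 52.0, 44.0])    # predicted speeds
# mae -> (2 + 2 + 4) / 3; rmse -> sqrt((4 + 4 + 16) / 3)
```

MAPE is undefined when a true value is zero, which is why speed datasets are usually filtered to non-zero readings before evaluation.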

4.4 Experiment Results and Analysis

As shown in Tables 2 and 3, RGCRN achieved the best predictive performance on all metrics and all forecasting horizons in the comparison experiments with the baseline models. In addition, we found that the forecasting performance of RGCRN on METR-LA was weaker than on PEMS-BAY, suggesting that METR-LA is the more challenging dataset. Therefore, we performed the following analysis using the forecasting results on METR-LA.

Model           15 Min                30 Min                60 Min
                MAE   RMSE  MAPE      MAE   RMSE  MAPE      MAE   RMSE  MAPE
HA              4.16  7.80  13.00%    4.16  7.80  13.00%    4.16  7.80  13.00%
ARIMA           3.99  8.21  9.60%     5.15  10.45 12.70%    6.90  13.23 17.40%
VAR             4.42  7.89  10.20%    5.41  9.13  12.70%    6.52  10.11 15.80%
SVR             3.99  8.45  9.30%     5.05  10.87 12.10%    6.72  13.76 16.70%
FC-LSTM         3.44  6.30  9.60%     3.77  7.23  10.90%    4.37  8.69  13.20%
DCRNN           2.77  5.38  7.30%     3.15  6.45  8.80%     3.60  7.60  10.50%
STGCN           2.88  5.74  7.62%     3.47  7.24  9.57%     4.59  9.40  12.70%
T-GCN           3.03  5.26  7.81%     3.52  6.12  9.45%     4.30  7.31  11.80%
GMAN            2.80  5.55  7.41%     3.12  6.49  8.73%     3.44  7.35  10.07%
MTGNN           2.69  5.18  6.86%     3.05  6.17  8.19%     3.49  7.23  9.87%
Graph WaveNet   2.69  5.15  6.90%     3.07  6.22  8.37%     3.53  7.37  10.01%
GWNET-conv      2.69  5.14  6.83%     3.07  6.17  8.26%     3.53  7.27  9.85%
DGCRN           2.62  5.01  6.63%     2.99  6.05  8.02%     3.44  7.19  9.73%
STGNN           2.62  4.99  6.55%     2.98  5.88  7.77%     3.49  6.94  9.69%
STCGAT          2.56  4.70  6.46%     2.83  5.44  7.45%     3.17  6.37  8.73%
RGCRN           0.42  1.33  1.00%     0.60  2.30  1.46%     1.48  5.96  2.97%

Table 2: Experimental Results On METR-LA Dataset.
Model           15 Min                30 Min                60 Min
                MAE   RMSE  MAPE      MAE   RMSE  MAPE      MAE   RMSE  MAPE
HA              2.88  5.59  6.80%     2.88  5.59  6.80%     2.88  5.59  6.80%
ARIMA           1.62  3.30  3.50%     2.33  4.76  5.40%     3.38  6.50  8.30%
VAR             1.74  3.16  3.60%     2.32  4.25  5.00%     2.93  5.44  6.50%
SVR             1.85  3.59  3.80%     2.48  5.18  5.50%     3.28  7.08  8.00%
FC-LSTM         2.05  4.19  4.80%     2.20  4.55  5.20%     2.37  4.96  5.70%
DCRNN           1.38  2.95  2.90%     1.74  3.97  3.90%     2.07  4.74  4.90%
STGCN           1.36  2.96  2.90%     1.81  4.27  4.17%     2.49  5.69  5.79%
T-GCN           1.50  2.83  3.14%     1.73  3.40  3.76%     2.18  4.35  4.94%
GMAN            1.34  2.91  2.86%     1.63  3.76  3.68%     1.86  4.32  4.37%
MTGNN           1.32  2.79  2.77%     1.65  3.74  3.69%     1.94  4.49  4.53%
Graph WaveNet   1.30  2.74  2.73%     1.63  3.70  3.67%     1.95  4.52  4.63%
GWNET-conv      1.30  2.73  2.69%     1.62  3.67  3.59%     1.91  4.40  4.47%
DGCRN           1.28  2.69  2.66%     1.59  3.63  3.55%     1.89  4.42  4.43%
STGNN           1.17  2.43  2.34%     1.46  3.27  3.09%     1.83  4.20  4.15%
STCGAT          1.15  2.31  2.39%     1.38  3.00  3.01%     1.66  3.71  3.74%
RGCRN           0.30  0.72  0.66%     0.36  1.19  0.81%     0.61  2.27  1.38%

Table 3: Experimental Results On PEMS-BAY Dataset.
  • We can find that deep-learning-based forecasting methods usually achieve better accuracy on both datasets than the probabilistic-statistics and machine-learning baselines (e.g., HA, ARIMA, VAR, and SVR), because those traditional methods struggle to handle complex nonlinear time series data. Among the spatial-temporal forecasting models, RGCRN(S) substantially reduces MAE, RMSE, and MAPE on the 15-minute, 30-minute, and 60-minute traffic forecasting tasks compared with the optimal baseline method, STCGAT.

  • From Table 2, we can observe that RGCRN outperforms the other spatial-temporal forecasting methods in both short-term and long-term traffic forecasting. Among the compared methods, GMAN uses a self-attention-based architecture and a spatial-temporal embedding module that facilitates long-range forecasting. However, self-attention does not capture local sequence correlations, and the spatial-temporal embedding is relatively simple for modelling complex spatial-temporal correlation, which reduces short-term forecasting performance. Graph WaveNet and MTGNN can accurately capture the spatial correlation of traffic networks from the adaptive adjacency matrix. However, the adaptive graphs are still static and cannot capture dynamic spatial correlation at each time step as time changes, making their long-term spatial-temporal forecasting relatively poor. In addition, most of the other baselines cannot model the dynamic characteristics of the traffic network structure, limiting their representation capability. In contrast, STCGAT has the best spatial-temporal forecasting performance among the baseline methods, mainly because it uses GAT to dynamically capture the spatial correlation of the traffic road network and then combines BiGRU with its causal temporal convolutional network to capture the causal temporal correlation of the time series.

  • RGCRN(S) is a simplified variant of RGCRN. Precisely, RGCRN(S) no longer stacks equal-sized ResGCN layers according to the length of the time window; instead, the time and feature dimensions of the input time series are fused so that the residual graph convolution module processes the time series of all moments at once. From Table 2, we can observe that the forecasting performance of RGCRN on the METR-LA dataset is better than that of RGCRN(S) overall. However, on the PEMS-BAY dataset, RGCRN did not perform as well as RGCRN(S). This may be because the more complex structure of RGCRN, which stacks multiple ResGCN layers, is better suited to data with complex structure, giving RGCRN stronger high-level feature extraction capability. However, as shown in Table 4, RGCRN requires more resources and time for training than RGCRN(S).


    Model      METR-LA          PEMS-BAY
    RGCRN      0.75 min/epoch   1.45 min/epoch
    RGCRN(S)   0.17 min/epoch   0.30 min/epoch
    Table 4: Time Required For RGCRN and RGCRN(S) To Complete An Epoch.

4.5 Ablation Study on Model Architecture

To analyse the validity of the components of our model, we designed several variants of the RGCRN model and performed ablation experiments on the METR-LA dataset.

  • w/o CNN: The model removes the one-dimensional convolutional layer from RGCRN and replaces it with a linear layer.

  • w/o RGC: The model replaces the ResGCN in RGCRN with a double-layer GCN.

  • w/o GRU: The model removes the reverse GRU from RGCRN and captures the temporal correlation of the time series using a single-layer GRU.

  • w/o Res: The model removes the residual connections of RGCRN.

Model     15 Min               30 Min               60 Min
          MAE   RMSE  MAPE     MAE   RMSE  MAPE     MAE   RMSE  MAPE
w/o CNN   1.75  2.72  3.90%    1.80  3.09  4.16%    1.99  4.38  5.08%
w/o RGC   0.61  1.51  1.31%    0.80  2.57  1.78%    1.42  4.46  3.29%
w/o GRU   1.98  3.47  4.09%    2.34  4.34  4.78%    3.04  5.92  6.61%
w/o Res   0.31  1.36  0.61%    0.47  2.41  1.24%    0.98  4.12  2.57%
RGCRN     0.22  1.25  0.60%    0.42  2.26  1.12%    0.89  4.08  2.44%
Table 5: Results Of Ablation Experiments On METR-LA Dataset.

Table 5 presents the predictive performance metrics for RGCRN and the four variants at three different time steps. We can observe that the w/o GRU model has the worst forecasting performance, proving the necessity of using BiGRU to analyse the contextual relevance of the time series. From the results of w/o CNN, we can observe that using a one-dimensional convolutional network is more advantageous than using a traditional linear layer. Meanwhile, our proposed ResGCN for capturing spatial correlation is superior to the two-layer GCN used in w/o RGC. Finally, we can see that RGCRN and w/o Res are similar on all three metrics. This may be because the network is not deep, so no network degradation occurs.

Meanwhile, RGCRN is slightly better than w/o Res in all cases, indicating the effectiveness of residual connections, which improve the predictive power of RGCRN. In addition, Figure 3 shows the average metric data for RGCRN and the other baseline models on the one-hour forecasting task; RGCRN again has the best overall performance, demonstrating the effectiveness of its components.

Figure 3: Average Predictive Performance Metrics.

5 Conclusion

In this paper, we propose a new traffic forecasting model. The spatial correlation of the traffic road network is captured using our proposed residual graph convolutional network, and the temporal correlation of the traffic flow sequence is captured using a bi-directional gated recurrent unit. We conducted comparison experiments with other advanced baseline models on two real traffic datasets to verify the model's effectiveness. The experimental results show that our model achieves state-of-the-art results on all metrics. The forecasting accuracy and performance are substantially improved compared with existing methods, bringing a new solution to the traffic forecasting task, which is essential for constructing intelligent transportation systems. In future work, we will continue to focus on the following two issues: (1) how to adaptively capture the spatial correlation of dynamic graphs when the structure of the traffic road network graph changes; (2) how to model unexpected traffic incidents and random traffic events to capture global temporal correlation.