KST-GCN: A Knowledge-Driven Spatial-Temporal Graph Convolutional Network for Traffic Forecasting

by Jiawei Zhu, et al.
Central South University

When considering the spatial and temporal features of traffic, capturing the impacts of various external factors on travel is an important step towards achieving accurate traffic forecasting. The impacts of external factors on the traffic flow have complex correlations. However, existing studies seldom consider external factors, or they neglect the effect of the complex correlations among external factors on traffic. Intuitively, knowledge graphs can naturally describe these correlations, but knowledge graphs and traffic networks are essentially heterogeneous networks; thus, it is a challenging problem to integrate the information in both networks. We propose a knowledge representation-driven traffic forecasting method based on spatiotemporal graph convolutional networks. We first construct a city knowledge graph for traffic forecasting, then use KS-Cells to combine the information from the knowledge graph and the traffic network, and finally capture the temporal changes of the traffic state with a GRU. Testing on real-world datasets shows that the KST-GCN has higher accuracy than the baseline traffic forecasting methods at various prediction horizons. We provide a new way to integrate knowledge and the spatiotemporal features of data for traffic forecasting tasks. Without any loss of generality, the proposed method can also be extended to other spatiotemporal forecasting tasks.




I Introduction

With the steady increase in vehicle ownership, the demands on transportation also gradually increase. As a result, a series of problems such as traffic congestion and traffic accidents become significant. The emergence of intelligent transportation systems (ITSs) can effectively solve these problems [35]. Traffic flow forecasting, as one of the key technologies in the field of intelligent transportation, has become a popular research topic. First, it provides data support and suggestions for urban management. Second, it also provides travelers with reliable traffic prediction reports to develop optimal routes, which saves travelers time and improves travel efficiency.

The task of traffic forecasting is to predict the traffic flow states over a future period of time based on historical traffic information. Traffic flow states have strong spatial and temporal correlations [21, 2]: they are affected not only by the previous traffic conditions of the upstream and the downstream traffic flows of the current monitoring point but also by the historical traffic conditions of the neighboring roads. With the development of deep learning, a large number of researchers have used recurrent neural networks, such as the long short-term memory (LSTM) network [19, 27] and the gated recurrent unit (GRU) network, to model the temporal dependence of traffic flows. To characterize the spatial dependence of traffic flows, some studies use convolutional neural networks (CNNs) to extract spatial information and combine them with LSTM [17] to improve the prediction accuracy. In recent years, graph neural networks (GNNs) [38], which are applicable to non-Euclidean structures such as road networks, have emerged to better model the spatial dependence of roads and thereby improve the prediction accuracy [36].

In addition, traffic information may be affected by multiple external factors, such as weather conditions, the presence of transportation stations, emergency events, holidays, and the distribution of nearby POIs [13]. These external factors have either direct or indirect relations with the traffic information that can influence the traffic conditions in the city. However, the few existing studies that consider external factors [14, 33] simply consider external factors and ignore the influence of the interrelationships between traffic information and external factors on traffic. For example, the weather changes over time, and the traffic flows in different weather conditions may have different states. Nonetheless, road sections are not uniformly affected by the weather, and we should consider other attributes of road sections. The less popular road sections with fewer surrounding facilities have less road load, so they are less affected by heavy rain in contrast to popular road sections downtown. How to integrate the semantic correlation of multisource data is the key to improving the ability to predict traffic flows. In recent years, the emergence of knowledge graphs has provided broader ideas for the above problem.

To address these issues, we represent traffic information and multiple external factors as a heterogeneous semantic network, that is, a city knowledge graph, and then adopt knowledge graph representation methods to capture the knowledge structures and the semantic relationships between traffic information and external factors. Our contributions are as follows:

  1. In order to consider the semantic correlations among various external factors that may affect traffic flows, we propose a traffic flow forecasting method. We adopt knowledge graph representation methods to incorporate external factors in the prediction model. Comparative experiments with models that directly use external factors as input demonstrate that it is effective to introduce semantic correlations.

  2. We evaluate the proposed model on real-world datasets. Under various prediction horizons, the prediction accuracies of the KST-GCN are higher than those of the current baseline methods.

  3. We conduct ablation experiments to further prove the validity of interattribute knowledge structure information and semantic relationships in traffic forecasting and verify the influence of dynamic and static external factors on traffic forecasting.

II Related Work

II-A Traffic Flow Forecasting

Conventional prediction models, including historical averages, time series, and Kalman filtering, often use statistical analysis to predict traffic conditions. The historical average model directly utilizes the average value of historical data as the prediction result. The time series model uses the relationship between current data and historical data and considers the periodicity and the tendency of the data to make predictions. The ARMA model [1] proposed in 1979 is an important method to study time series. It consists of an autoregressive (AR) model and a moving average (MA) model. The AR model uses the autocorrelation function to find model parameters and predict the time series using original historical data, while the MA model accumulates the error term of the autocorrelation function. The ARIMA model [8] is a generalized version of the ARMA with an additional differencing (integration) component, and both the ARIMA and ARMA models take the stationarity of the time series as a starting point. The Kalman filtering model uses a state space defined by a state equation and an observation equation to filter out the noise to make predictions.

With the continuous development of machine learning and deep learning, the advantages of intelligent forecasting models are becoming increasingly prominent. These models take a large quantity of collected historical traffic data as the input and then automatically learn the potential patterns and features in the data to predict traffic states. Intelligent forecasting models can be mainly divided into two categories: conventional machine learning approaches and deep learning approaches. Neural networks, as one of the most used approaches, learn the nonlinear relations in the input data to make predictions. Artificial neural networks (ANNs) and the support vector regression (SVR) are two common models for practical prediction tasks. The SVR learns nonlinear statistical patterns using sufficient features from historical data. The k-nearest neighbors and fuzzy logic models are two additional examples of nonlinear parametric models. Alternatively, an ANN adjusts its weights and biases via backpropagation or a radial basis function (RBF) and obtains linear prediction results after applying a nonlinear activation function.

The models introduced above use historical traffic state data to predict the future. For a road network composed of many road section nodes, the adjacency between road section nodes will either directly or indirectly affect the final prediction. The Bayesian network (BN) analyzes the adjacency relationships in road networks to predict traffic conditions. Another model that uses topological information from road networks is the graph convolutional network (GCN) whose input consists of an adjacency matrix and a feature matrix. The adjacency matrix provides the topological features of a road network, and the feature matrix includes traffic information. The GCN captures the connection relationships between road section nodes to forecast future traffic conditions. However, these models only retain information about the spatial relationships in road networks and lack the capability to capture the temporal relationships in the feature matrix. Correspondingly, models such as the feed forward NN [23], the DBN [9], the RNN [37] and the RNN variants GRU [5] and LSTM [28] capture the tendencies and periodicity of traffic features, but they ignore the intrinsic topological characteristics of the urban traffic network. Many researchers have noticed this issue, and numerous spatiotemporal forecasting models that fully utilize both the topological structures of networks and the temporal dependence in traffic data have been proposed. Such models include the ST-ResNet [34], SAE [18], FCL-Net [31], DCRNN [32] and T-GCN [36], among others.

In addition to historical traffic information, traffic states may be affected by a variety of external factors, such as weather conditions, metro station and bus stop information, POIs, and other factors. The main challenge of the current traffic forecasting task is to integrate external factor information into prediction models. Some methods that consider multisource data have been proposed in previous studies. Liao B et al. [14] encoded external information by using an encoder based on LSTM [28] and treated the integrated multimodal data as the input sequence of the prediction model. Based on GRU, the model proposed by Da Z et al. [33] fuses the input traffic features and weather information.

II-B Relation Mining in Multisource Data

Relations in multisource data are mostly presented in the form of networks, and mining the structural and relational information contained in networks through representation vectors has become the main approach to capturing network information. In general, networks can be divided into homogeneous networks and heterogeneous networks according to the types of nodes. Homogeneous networks only consider one type of data, that is, the types of nodes must be identical; however, the majority of real networks have different types of nodes. To overcome the limitation of homogeneous nodes, heterogeneous networks were proposed to represent the information of different types of nodes and the relationships between nodes. PTE [26] classifies texts, words, and labels and represents their pairwise relations to construct heterogeneous networks. [6] and [7] propose the HEBE embedding framework, which models strongly correlated events as a whole to construct heterogeneous networks of events. A major drawback of heterogeneous networks is that accurate metapaths must be constructed when representing the relations between nodes, and specific metapaths may restrict a heterogeneous network to the framework of a particular network. In recent years, the emergence of knowledge graphs has provided broader ideas for the above problem. The modern concept of the knowledge graph was first proposed by Google and then began to be applied in various fields. Because of the power of knowledge graphs in processing graph structures and information, more and more researchers have begun to apply knowledge graphs in areas such as social networks [22], search engines [11], intelligent Q&A systems and intelligent recommendations [24]. Knowledge graphs are also applied in industries such as e-commerce [29] and play roles in transportation, such as in site selection [25] and traffic accident analysis [30, 20].

III Methods

III-A Framework

In order to predict traffic considering the correlation between traffic information and external factors, we combine the knowledge graph embedding model and the spatiotemporal graph convolution network and propose a knowledge-driven spatiotemporal graph convolutional network (KST-GCN).

Fig. 1: The framework of the KST-GCN.

As shown in Figure 1, the KST-GCN first captures the information of the knowledge structures and semantic relations between traffic information and attributes through the knowledge graph representation method and then combines the GCN and GRU to capture the spatial and temporal dependence of traffic features. Thus, the KST-GCN is able to capture the spatiotemporal features of traffic in addition to the knowledge structures and semantic relations between traffic information and attributes.

The traffic forecasting problem can be considered as learning the function $f$ to calculate the traffic characteristics of a future period $T$ given the traffic network topology $G$, the feature matrix $X$, and the city knowledge graph $KG$:

$$[x_{t+1}, x_{t+2}, \ldots, x_{t+T}] = f(G, X \mid KG) \quad (1)$$

where $A$ is the adjacency matrix of a road network. For each element in $A$, 1 means that the corresponding two road sections are connected and 0 means that they are not connected. Feature matrix $X$ represents the inherent properties of each node in the urban road network, and we choose the traffic speed as the inherent property to be predicted in this paper, i.e., $x_t^i$ is the speed of the traffic on the $i$-th road section at moment $t$. For more details on city knowledge graph $KG$, refer to Sections III-B2 and IV-A1.
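To make the inputs of this formulation concrete, the following minimal sketch builds a toy adjacency matrix and a synthetic speed matrix and cuts the series into (history, future) pairs. The function name `make_windows`, the window lengths, and the random data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def make_windows(speeds, seq_len, horizon):
    """Split a (time, nodes) speed matrix into (history, future) pairs."""
    inputs, targets = [], []
    for start in range(speeds.shape[0] - seq_len - horizon + 1):
        inputs.append(speeds[start:start + seq_len])
        targets.append(speeds[start + seq_len:start + seq_len + horizon])
    return np.stack(inputs), np.stack(targets)

# Toy road network with 4 road sections; A[i, j] = 1 means sections i and j are connected.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

# Synthetic speeds (assumed data): 20 time steps x 4 road sections.
rng = np.random.default_rng(0)
X = rng.uniform(20.0, 60.0, size=(20, 4))

# 12 historical steps are used to predict the next 4 steps.
inputs, targets = make_windows(X, seq_len=12, horizon=4)
print(inputs.shape, targets.shape)
```

With 15 min sampling, a horizon of 4 steps would correspond to the 60 min prediction horizon used later in the experiments.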

III-B Knowledge Graph Representation Learning

III-B1 Knowledge Graph

A knowledge graph, which enables the fusion of data from multiple sources while preserving the original information, can be defined as a knowledge network composed of multiple triples (head, relation, tail) with semantic information and network structures [4]. The head and the tail in a triple are both entities, and the relation is a semantic relation between entities, which can represent heterogeneous nodes and multi-relational information. An illustrative example of a knowledge graph is shown in Figure 2. Hierarchical and semantic relations between the concepts can be discovered from the figure. For example, “Shenzhen belongs to China” can be represented as a (Shenzhen city, belongs to, China) triple.

Fig. 2: An illustrative example of a knowledge graph.

III-B2 Knowledge Representation

Most knowledge graphs (KGs) use entity-relation-based representations, in which the relation in a triple is used to represent both attributes and the relationships between entities. In fact, however, a relation is a semantic relationship between two entities, whereas an attribute is a property of an entity itself. Usually, attributes and entities have one-to-many, many-to-one, or many-to-many relationships, and the properties of attributes and entities are not identical. The relationships between attribute factors and the traffic road network are many-to-one and one-to-many. Thus, in principle, knowledge graph representations that distinguish attribute and relational information are more suitable for capturing traffic information, attribute knowledge structures, and semantic information. Therefore, we adopt the entity-attribute-relationship-based knowledge graph representation model KR-EAR [16] to capture the knowledge structure and semantic information between road sections and external factors.

Fig. 3: Entity-relationship-based representation method (left) and entity-attribute-relationship-based representation method (right). The circles indicate entities while the squares and pentagons indicate the values of two different types of attributes.

Specifically, as shown in Figure 3, the left dashed box is the knowledge graph constructed based on the entity-relationship-based representation, and the right dashed box is the entity-attribute-relationship-based knowledge graph representation adopted by KR-EAR. is the class of relationships in the knowledge graph, where and are the attribute classes that connect the two classes of attribute values to entities. In the entity-relationship-based model, all triples are modeled for learning representations, while in the entity-attribute-relationship-based model, the triples are split into attribute triples and relation triples (as shown in the upper and lower parts on the right side of Figure 3). For example, is an attribute triple and is a relation triple.

In this paper, roads, attributes, and the relationships between them are expressed in the form of triples of , where R is the relation triple that represents the adjacency between road sections, as shown in Equation 2. is the attribute triple that represents the correspondence between roads and attributes and attribute values, as shown in Equation 3. The copresent relations between attributes are defined as Equation 4.


In the equations above, denotes the adjacency of road sections, and denotes attributes such as weather and numbers of POIs.

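$v$ denotes the value of an attribute (e.g., weather, sunny). Then, based on the entity-attribute-relationship knowledge representation, KR-EAR defines the objective function by maximizing the joint probability of relation triples and attribute triples given the embedding vector $X$:

$$P(S, Y \mid X) = \prod_{(h,r,t) \in S} P\big((h,r,t) \mid X\big) \prod_{(e,a,v) \in Y} P\big((e,a,v) \mid X\big) \quad (5)$$

where $S$ and $Y$ are the sets of relation triples and attribute triples, $P((h,r,t) \mid X)$ denotes the conditional probability of the relation triple $(h,r,t)$ and $P((e,a,v) \mid X)$ denotes the conditional probability of the attribute triple $(e,a,v)$. In terms of relation triple encoding, the structural and semantic relations between entities are encoded via TransE [3] and TransR [15]. The equation for calculating the conditional probability of a relation triple in the objective function can be defined as:

$$P(h \mid r, t, X) = \frac{\exp\big(g(h,r,t)\big)}{\sum_{\hat{h} \in E} \exp\big(g(\hat{h},r,t)\big)} \quad (6)$$

$$g(h,r,t) = -\lVert h + r - t \rVert_{L_1/L_2} + b_1 \quad (7)$$

$$g(h,r,t) = -\lVert h M_r + r - t M_r \rVert_{L_1/L_2} + b_1 \quad (8)$$

Equations 7 and 8 are the scoring functions of TransE and TransR, respectively. $b_1$ is a bias term, and $M_r$ in Equation 8 stands for the transfer matrix.

A classification model is applied to capture the correlation between entities and attributes for attribute triple encoding. The equation for calculating the conditional probability of an attribute triple in the objective function can be defined as:

$$P(v \mid e, a, X) = \frac{\exp\big(h(e,a,v)\big)}{\sum_{\hat{v} \in V_a} \exp\big(h(e,a,\hat{v})\big)}, \qquad h(e,a,v) = -\lVert f(e W_a + b_a) - V_{av} \rVert_{L_1/L_2} + b_2$$

where $f(\cdot)$ is a nonlinear function, $V_{av}$ is the embedding vector of attribute value $v$ and $b_2$ is a bias term. In this way, KR-EAR represents relations and attributes independently and strengthens the constraints between attributes.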
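The scoring idea behind the relation and attribute encoders can be sketched numerically. The snippet below is a minimal illustration, assuming a TransE-style translation score with an L1 distance, a softmax over candidate head entities, and a tanh projection for the attribute score; all function names, dimensions, and random embeddings are hypothetical and are not the authors' implementation.

```python
import numpy as np

def transe_score(h, r, t, bias=0.0):
    """TransE-style score: larger when h + r is close to t (L1 distance)."""
    return -np.sum(np.abs(h + r - t), axis=-1) + bias

def head_probability(r, t, entity_emb, head_index):
    """Softmax over all candidate heads, i.e. P(h | r, t, X)."""
    scores = transe_score(entity_emb, r, t)   # one score per candidate head
    scores = scores - scores.max()            # shift for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[head_index]

def attribute_score(e, W_a, b_a, v_emb, bias=0.0):
    """Attribute score: compare a nonlinear projection of the entity with a value embedding."""
    return -np.sum(np.abs(np.tanh(e @ W_a + b_a) - v_emb), axis=-1) + bias

rng = np.random.default_rng(0)
dim, n_entities = 8, 5
entity_emb = rng.normal(size=(n_entities, dim))
relation_emb = rng.normal(size=dim)

# Make entity 2 the ideal head for the triple (?, relation, entity 0).
entity_emb[2] = entity_emb[0] - relation_emb
probs = [head_probability(relation_emb, entity_emb[0], entity_emb, i)
         for i in range(n_entities)]

# Attribute score for one entity and one candidate attribute value (assumed sizes).
W_a = rng.normal(size=(dim, 4))
b_a = np.zeros(4)
value_emb = rng.normal(size=4)
attr = attribute_score(entity_emb[1], W_a, b_a, value_emb)
```

In this toy setup the constructed head receives the largest probability because its translation error is exactly zero; in training, the embeddings would instead be learned so that observed triples score highly.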

III-C KS-Cell

To obtain information about the relationship between road sections and attribute knowledge and the correlation between the attributes, and to model the spatial dependence of traffic flows based on this information, we design KS-Cell, which combines KR-EAR and a graph convolutional network. The structure of KS-Cell is shown in Figure 4.

Fig. 4: KS-Cell.

The inputs to KS-Cell are traffic features , geographic knowledge graph KG and road network at time . The outputs are traffic state at time . Due to the diversity of external factors, we divided the external factors into two categories: the static factors and the dynamic factors. and in Figure 4 denote the node representation vectors of the static and dynamic external factors, respectively, after applying KR-EAR. and are the weight and bias parameters, respectively.

To model the spatial dependence of a traffic flow based on knowledge representation, we use the node features that fuse knowledge representations and the adjacency matrix as the input of the GCN. Compared to the CNN, the GCN is more suitable for non-Euclidean structured data such as traffic networks. It is based on the idea that each node in the network is influenced by itself and its neighboring nodes. The GCN uses spectral graph theory to capture the topological relations and features in the network and obtain the representation vector of each node:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$$

where $\sigma(\cdot)$ is an activation function, $A$ is the adjacency matrix, $X$ is the feature matrix, $\tilde{A} = A + I_N$ is the adjacency matrix with self-connections, $\tilde{D}$ is the degree matrix of $\tilde{A}$, $W^{(l)}$ is the weight matrix of the $l$-th convolution layer, and $H^{(l)}$ is the output of the nonlinear combination of node features of the $l$-th layer. In the first layer, $H^{(0)}$ has an initial value of feature matrix $X$.
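The graph convolution described above can be written in a few lines of NumPy. This is a minimal sketch under assumed settings (a three-node path graph, random features and weights, and ReLU as the activation); the helper names `normalize_adjacency` and `gcn_layer` are hypothetical.

```python
import numpy as np

def normalize_adjacency(A):
    """Add self-connections and apply symmetric degree normalization."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_hat, H, W):
    """One graph convolution layer with a ReLU activation."""
    return np.maximum(A_hat @ H @ W, 0.0)

# Path graph 0 - 1 - 2 standing in for three connected road sections.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))    # node features (assumed: 4 features per node)
W0 = rng.normal(size=(4, 2))   # layer weights (assumed: 2 output units)

A_hat = normalize_adjacency(A)
H1 = gcn_layer(A_hat, X, W0)   # first-layer output, starting from H0 = X
```

Because the normalized adjacency is fixed for a given road network, it can be computed once and reused across layers and time steps.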

In summary, the result of KS-Cell can be described as . represents the process of knowledge representation process, and represents the convolution process of the GCN.

III-D GRU

The output of KS-Cell is input into the GRU model to capture the temporal features of a traffic flow. The GRU model consists of a reset gate and an update gate. For our task, $x_t$ is the output feature of KS-Cell at time $t$. $h_{t-1}$, $h_t$, and $h_{t+1}$ represent states at time $t-1$, $t$, and $t+1$, respectively. $r_t$ is the reset gate, which combines the information in the memory and the information at the current time step. $u_t$ is the update gate, which can select or forget memories. $\sigma$ is the sigmoid function used to produce the gated signal, and $c_t$ is the candidate state at the current time step. In the memory updating phase, when $u_t$ acts as a forget gate, $u_t \odot h_{t-1}$ forgets unimportant previous information to update the memory; and when $1 - u_t$ acts as a memory gate, $(1 - u_t) \odot c_t$ remembers important information from the current node. $h_t$ indicates the state information at the current time step after adding the information of $c_t$. The corresponding update equations are:

$$u_t = \sigma(W_u [x_t, h_{t-1}] + b_u)$$

$$r_t = \sigma(W_r [x_t, h_{t-1}] + b_r)$$

$$c_t = \tanh(W_c [x_t, (r_t \odot h_{t-1})] + b_c)$$

$$h_t = u_t \odot h_{t-1} + (1 - u_t) \odot c_t$$
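As a rough illustration of the temporal step, the sketch below implements one standard GRU update in NumPy and rolls it over a short sequence. The parameter names (`W_u`, `W_r`, `W_c`), the layer sizes, and the random inputs standing in for the spatial-cell outputs are assumptions for this example only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU update applied to all road sections at once.

    x_t:    (nodes, features) input at time t
    h_prev: (nodes, hidden) state from time t-1
    """
    xh = np.concatenate([x_t, h_prev], axis=1)
    u = sigmoid(xh @ p["W_u"] + p["b_u"])            # update gate
    r = sigmoid(xh @ p["W_r"] + p["b_r"])            # reset gate
    xrh = np.concatenate([x_t, r * h_prev], axis=1)
    c = np.tanh(xrh @ p["W_c"] + p["b_c"])           # candidate state
    return u * h_prev + (1.0 - u) * c                # new state h_t

rng = np.random.default_rng(0)
nodes, features, hidden, steps = 4, 3, 5, 6

# Randomly initialized parameters (assumed shapes, not trained).
params = {}
for gate in ("u", "r", "c"):
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(features + hidden, hidden))
    params[f"b_{gate}"] = np.zeros(hidden)

# Roll the cell over a short synthetic input sequence.
h = np.zeros((nodes, hidden))
for x_t in rng.normal(size=(steps, nodes, features)):
    h = gru_step(x_t, h, params)
```

Since each new state is a gate-weighted mix of the previous state and a tanh-bounded candidate, the hidden state stays within (-1, 1) when it starts from zero.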


IV Experiments

IV-A Data Description

The experiments in this paper are all based on taxi tracking data and multisource data from Luohu district, Shenzhen. The dataset contains taxi track data from January 1 to January 31, 2015 and the road network data, weather data, and POI data of each street. Our study area includes 156 road sections and 9 types of POIs: food services, enterprises, shopping services, transportation services, education services, living services, medical services, accommodation services, and others. In addition, the weather data, including the temperature, weather conditions, wind speed, humidity, barometric pressure, and visibility, of the study area at 15 min intervals in January were crawled as auxiliary data. We classify weather conditions into five categories: sunny, cloudy, foggy, light rain, and heavy rain. Due to the difficulty of constructing the knowledge graph, we only conduct experiments on one validation dataset. Due to the generality of our experimental setup, the results can be extrapolated to the traffic data of other cities.

IV-A1 Knowledge Graph

We count the number of POIs on each section and then use the road sections, categories, and numbers of POIs to construct attribute triples. For example, (road section 1, restaurant, 15) and (road section 1, school, 6) indicate that there are 6 schools and 15 restaurants on road section 1. Time, weather conditions, and their correlations, such as (road section ID, weather condition, moment) and (moment t, weather, light rain), are used to construct the city weather knowledge graph. In addition, the input data need to be preprocessed before being input into the knowledge graph embedding model. The input data include the road network triples (head entity, relation, tail entity), external factor triples (entity, attribute, attribute value), and attribute co-occurrence triples (attribute 1, attribute 2, co-occurrence probability). The attribute co-occurrence probability describes the probability of two attributes existing on the same road section. The data structures of these triples are shown in Table I, Table II, and Table III.

Head Entity | Relation | Tail Entity
90217 | adj | 95968
90225 | adj2 | 95504
TABLE I: Road Network Triples

Entity | Attribute | Attribute Value
90217 | transportation service | 4
90217 | food services | 31
TABLE II: External Factor Triples

Attribute 1 | Attribute 2 | Co-occurrence Probability
transportation service | food service | 0.016
shopping service | food service | 0.255
TABLE III: Attribute Co-occurrence Triples
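The three kinds of triples can be assembled from simple per-road records. The sketch below uses hypothetical road IDs and POI counts, and it assumes that the co-occurrence probability of two attributes is the fraction of road sections on which both are present; the paper does not spell out the exact estimator, so this definition is an assumption.

```python
from itertools import combinations

# Hypothetical POI counts per road section (IDs and numbers are illustrative only).
poi_counts = {
    "road_1": {"food service": 15, "education service": 6},
    "road_2": {"food service": 3, "shopping service": 7},
    "road_3": {"shopping service": 2},
}
adjacency = [("road_1", "road_2"), ("road_2", "road_3")]

# Road network triples: (head entity, relation, tail entity).
relation_triples = [(head, "adj", tail) for head, tail in adjacency]

# External factor triples: (entity, attribute, attribute value).
attribute_triples = [(road, attr, count)
                     for road, counts in poi_counts.items()
                     for attr, count in counts.items()]

def cooccurrence_triples(counts_by_road):
    """Return (attribute 1, attribute 2, probability) triples.

    Assumption: probability = share of road sections containing both attributes.
    """
    n_roads = len(counts_by_road)
    attrs = sorted({a for counts in counts_by_road.values() for a in counts})
    triples = []
    for a1, a2 in combinations(attrs, 2):
        both = sum(1 for counts in counts_by_road.values()
                   if a1 in counts and a2 in counts)
        triples.append((a1, a2, both / n_roads))
    return triples

cooc = cooccurrence_triples(poi_counts)
for triple in relation_triples + attribute_triples + cooc:
    print(triple)
```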

Part of the final Shenzhen city knowledge graph is shown in Figure 5.

Fig. 5: The city knowledge graph of Shenzhen.

IV-A2 Hyperparameter Settings

The hyperparameters of the KST-GCN mainly include the learning rate, the number of training epochs, the number of hidden units, the knowledge graph embedding dimension, the batch size, and the proportion of the dataset used for training. Based on experience, the learning rate is set to 0.001, and the batch size is set to 64. The number of hidden units and the embedding dimension are two important hyperparameters for our model, as they have the greatest impacts on the prediction results. We conduct experiments to choose the appropriate values for these two hyperparameters.

Fig. 6: Performance under different hyperparameter settings

  1. Embedding dimension: We choose the embedding dimension from [5, 10, 15, 20, 30] to analyze the model performances. Figure 6 (a) shows the RMSE, MAE, accuracy, and $R^2$ of the models with different embedding dimension settings. It can be seen that the KST-GCN has the best performance when the embedding dimension is set to 20.

  2. The number of hidden units: We fix the embedding dimension to 20 and choose the number of hidden units from [18, 32, 64, 128, 256]. Figure 6 (b) shows the RMSE, MAE, accuracy, and $R^2$ of the models with different numbers of hidden units. When the number of hidden units is set to 128 or 256, the KST-GCN has the best performance. Considering the training cost, we choose 128 as the number of hidden units in the following experiments.

IV-B Results and Analysis

Metric | SVR | ARIMA | GCN | GRU | DCRNN | T-GCN | KST-GCN
RMSE | 7.2203 | 6.7708 | 5.6419 | 5.0649 | 4.5000 | 4.0696 | 4.0443
MAE | 4.7762 | 4.6656 | 4.2265 | 2.5988 | 3.1700 | 2.7460 | 2.7090
Accuracy | 0.706 | 0.3852 | 0.6119 | 0.7243 | 0.2913 | 0.7165 | 0.7306
R2 | 0.8367 | * | 0.6678 | 0.8322 | 0.8391 | 0.8388 | 0.8400
var | 0.8375 | 0.0111 | 0.6679 | 0.8322 | 0.8391 | 0.8388 | 0.8400
TABLE IV: Experimental Comparison with Baselines

IV-B1 Prediction Accuracy

We compare the performance of the KST-GCN with those of the baseline methods with a prediction horizon of 15 minutes. The results in Table IV indicate that the results of the KST-GCN are better than those of the baseline methods in all five evaluation metrics, which proves that the KST-GCN is effective in capturing the knowledge structure and semantic information between roads and attributes.
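The five evaluation metrics are not defined in this part of the text. The sketch below shows one common way to compute them, assuming the definitions used in related T-GCN-style work: accuracy as one minus the ratio of Frobenius norms, and var as the explained variance score. These definitions are an assumption rather than a statement of the authors' exact formulas, and the function name `evaluate` is hypothetical.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Compute RMSE, MAE, accuracy, R2 and explained variance for two arrays."""
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    # Assumed definition: 1 - ||Y - Y_hat||_F / ||Y||_F
    accuracy = 1.0 - np.linalg.norm(err) / np.linalg.norm(y_true)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    var = 1.0 - np.var(err) / np.var(y_true)
    return {"rmse": rmse, "mae": mae, "accuracy": accuracy, "r2": r2, "var": var}

# Tiny example: every prediction is exactly 1 unit too low.
y_true = np.array([[1.0, 2.0], [3.0, 4.0]])
y_pred = y_true - 1.0
metrics = evaluate(y_true, y_pred)
print(metrics)
```

In this example the constant offset leaves the explained variance at 1 while lowering R2, which shows why the two scores can differ for biased predictions.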

Table IV shows that the KST-GCN, DCRNN, and GRU all have higher accuracy than the SVR and ARIMA, with the KST-GCN reducing the RMSE by 10.12% and 0.62% compared to the DCRNN and T-GCN, respectively, verifying that the knowledge structure between roads and attributes helps the KST-GCN improve the prediction results. Compared to the GCN, which captures spatial structures, the prediction error of the KST-GCN is lowered by 27.91%, indicating that the co-occurrence relations between attributes impact the prediction. The KST-GCN reduced the prediction errors by 43.98% and 40.27% compared to those of the SVR and ARIMA, respectively, demonstrating that both the adjacency between roads and the knowledge structures between roads and attributes can directly or indirectly affect the traffic prediction results.
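The relative improvements quoted above follow from the RMSE values as (baseline - KST-GCN) / baseline. The short sketch below recomputes this for several baselines; the mapping of RMSE values to model names is an assumption inferred from the percentages reported in the text, and the last digit may differ slightly from the quoted figures because of rounding.

```python
# RMSE values at the 15 min horizon (model-to-value mapping is inferred, see lead-in).
rmse = {
    "SVR": 7.2203,
    "ARIMA": 6.7708,
    "DCRNN": 4.5000,
    "T-GCN": 4.0696,
    "KST-GCN": 4.0443,
}

def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return (baseline - ours) / baseline * 100.0

reductions = {name: relative_reduction(value, rmse["KST-GCN"])
              for name, value in rmse.items() if name != "KST-GCN"}

for name, pct in reductions.items():
    print(f"{name}: {pct:.2f}% lower RMSE with KST-GCN")
```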

IV-B2 Prediction Horizons

Fig. 7: Performance of Different Prediction Horizons

In order to investigate the ability of the KST-GCN to perceive temporal urban traffic features and attribute features, we take 80% of the traffic data with attribute knowledge structures as the input to the KST-GCN, and change the prediction horizon (15 min, 30 min, 45 min, and 60 min) to predict traffic features and calculate the deviation of the predicted values from the ground truth. Figure 7 shows the change tendency of the accuracy of the KST-GCN for different prediction horizons. As the prediction horizon increases, the variation in the prediction error and accuracy decreases, which confirms that the KST-GCN has long-term prediction ability.

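Horizon | Metric | T-GCN | DCRNN | KST-GCN
15min | RMSE | 4.0696 | 4.5000 | 4.0443
15min | MAE | 2.7460 | 3.1700 | 2.7090
15min | Accuracy | 0.7165 | 0.2913 | 0.7206
15min | R2 | 0.8388 | 0.8391 | 0.8400
15min | var | 0.8388 | 0.8391 | 0.8400
30min | RMSE | 4.077 | 4.5600 | 4.0687
30min | MAE | 2.747 | 3.2300 | 2.7228
30min | Accuracy | 0.7159 | 0.2970 | 0.7201
30min | R2 | 0.8377 | 0.8332 | 0.8372
30min | var | 0.8377 | 0.8360 | 0.8374
45min | RMSE | 4.1035 | 4.6000 | 4.0775
45min | MAE | 2.7788 | 3.2700 | 2.7698
45min | Accuracy | 0.7141 | 0.3021 | 0.7195
45min | R2 | 0.8357 | 0.8275 | 0.8365
45min | var | 0.8357 | 0.8314 | 0.8365
60min | RMSE | 4.266 | 4.6400 | 4.0798
60min | MAE | 2.7911 | 3.3100 | 2.7768
60min | Accuracy | 0.7125 | 0.3069 | 0.7194
60min | R2 | 0.8339 | 0.8219 | 0.8363
60min | var | 0.834 | 0.8267 | 0.8364
TABLE V: Performance under Different Prediction Horizons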

Table V compares the prediction accuracy between the KST-GCN and the baselines. We select two representative models (the T-GCN and the DCRNN) to illustrate that the KST-GCN works better under various prediction horizons. Therefore, it can be confirmed that the KST-GCN remains stable in long-term prediction. For the long-term prediction, the KST-GCN performs better, with a prediction error 0.63% lower than that of the T-GCN and 11.35% lower than that of the DCRNN at a prediction horizon of 45 min. The prediction error of the KST-GCN is 1.14% lower than that of the T-GCN and 12.07% lower than that of the DCRNN at a prediction horizon of 60 min.

IV-B3 Knowledge Representation

In this section, we verify the validity of the knowledge representation based on a knowledge graph. We compare the KST-GCN with the AST-GCN [39], a model that enhances the feature matrix by directly concatenating external factor information without any knowledge translation. The following experiments analyze the prediction errors and accuracies of the KST-GCN and AST-GCN under different prediction horizons. The results are shown in Figure 8.

Fig. 8: Performance of KST-GCN and AST-GCN

Figure 8 shows that the accuracy of the KST-GCN is higher than that of the AST-GCN at all prediction horizons, and the gap between the KST-GCN and AST-GCN increases as the prediction horizon increases. At a prediction horizon of 60 min, the KST-GCN outperforms the AST-GCN by 54.1%. The RMSE of the KST-GCN, which represents the prediction error, is slightly higher at prediction horizons of 15 min and 30 min, but it is lower for long-term prediction, and has a milder error fluctuation. The above experimental results confirm the validity and stability of the KST-GCN in long-term prediction, and the addition of the attribute knowledge structure can assist the model in perceiving the network structure from a global perspective.

IV-B4 Knowledge Structures

We conduct ablation experiments to further analyze the effects of knowledge structure information and semantic relations between traffic information and attributes on the traffic prediction task. In the experiments, traffic feature data with POI knowledge structure information, with weather knowledge structure information, with a fusion of weather and POI knowledge structure information, and without any additional attribute information are input into models to conduct a comparison.

Metric     T-GCN    DCRNN    KST-GCN (Weather)  KST-GCN (POI)  KST-GCN (KG)
RMSE       4.0696   4.5000   4.0501             4.0489         4.0443
MAE        2.7460   3.1700   2.7357             2.7428         2.7090
Accuracy   0.7165   0.2913   0.7215             0.7208         0.7206
r2         0.8388   0.8391   0.8388             0.8381         0.8400
var        0.8388   0.8391   0.8389             0.8381         0.8400
TABLE VI: Performance of KST-GCN with Different Knowledge Structures

Table VI shows that, when roads are fused with dynamic attribute knowledge structure information, the prediction error of the KST-GCN (weather) is 0.28% and 10% lower than that of the T-GCN and DCRNN, respectively. When roads are fused with static attribute knowledge structure information, the prediction error of the KST-GCN (POI) is 0.32% and 10.02% lower than that of the T-GCN and DCRNN, respectively. When roads are fused with both static and dynamic attribute knowledge structure information, the KST-GCN (weather + POI) has a 0.62% and 10.12% lower prediction error than the T-GCN and DCRNN, respectively. The model that incorporates static attribute knowledge structure information outperforms the model that incorporates dynamic attribute knowledge structure information, indicating that the knowledge structure between roads and static attributes is more pronounced in traffic prediction tasks. Overall, fusing roads with attribute knowledge structure information assists the prediction model to some extent and improves the prediction accuracy.
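The percentage figures above can be related to the table entries with a short calculation. The sketch below assumes that "lower prediction error" means a relative reduction in RMSE and takes the first two RMSE values in Table VI as the T-GCN and DCRNN baselines; both are assumptions, as the text does not state the formula or which metric each percentage is derived from. The helper `relative_reduction` is a hypothetical name.

```python
def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# RMSE values copied from Table VI (column attribution is an assumption).
rmse = {
    "T-GCN": 4.0696,
    "DCRNN": 4.5000,
    "KST-GCN (KG)": 4.0443,
}

vs_tgcn = relative_reduction(rmse["T-GCN"], rmse["KST-GCN (KG)"])
vs_dcrnn = relative_reduction(rmse["DCRNN"], rmse["KST-GCN (KG)"])

if __name__ == "__main__":
    print(f"vs T-GCN: {vs_tgcn:.2f}%  vs DCRNN: {vs_dcrnn:.2f}%")
```

Under these assumptions the calculation gives roughly 0.62% against the T-GCN and roughly 10.1% against the DCRNN for the fused model. The other percentages quoted in the text may rest on a different metric or rounding convention.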

IV-B5 Robustness Analysis

Real-world urban data are rich in information but contain noise. To understand the effect of noise on the KST-GCN, Gaussian noise and Poisson noise are added to the original urban traffic data; the added noise obeys a Gaussian distribution and a Poisson distribution, respectively. The experimental results after adding noise are shown in Figure 9 and Figure 10. From the results, we can conclude that adding noise does not significantly affect the performance of the KST-GCN and that the KST-GCN is robust to the possible noise in the data.
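A perturbation test of this kind can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' procedure: the noise parameters `sigma` and `lam`, the zero mean of the Gaussian noise, the function names, and the choice to add raw (unnormalized) noise are all assumptions, since the distribution parameters are not given in this section.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (suitable for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def add_gaussian_noise(speeds, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to every entry."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in speeds]


def add_poisson_noise(speeds, lam, rng):
    """Add Poisson(lam) noise to every entry."""
    return [[v + poisson_sample(lam, rng) for v in row] for row in speeds]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the perturbation is repeatable
    speeds = [[30.0, 42.5], [18.0, 55.0]]  # toy (time step x road) speed matrix
    print(add_gaussian_noise(speeds, sigma=0.4, rng=rng))
    print(add_poisson_noise(speeds, lam=2.0, rng=rng))
```

The perturbed matrix would then be fed to the trained model in place of the original data, and the metrics recomputed for each noise level.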

Fig. 9: Gaussian perturbation.
Fig. 10: Poisson perturbation.

IV-C Model Interpretation

In this section, we compare the predictions of the KST-GCN with the ground truth values in the test set via visualization. We analyze the prediction ability of the KST-GCN from two perspectives: (1) long-term prediction, and (2) the importance of the attribute knowledge structure.

IV-C1 Long-term prediction

We predict traffic speeds for the next 15 min, 30 min, 45 min, and 60 min based on historical data over a period of time using the KST-GCN, and the prediction results are shown in Figure 11 to Figure 14. These results show the following:

Fig. 11: Result for prediction horizon of 15 minutes.
Fig. 12: Result for prediction horizon of 30 minutes.
Fig. 13: Result for prediction horizon of 45 minutes.
Fig. 14: Result for prediction horizon of 60 minutes.
  1. For all prediction horizons, the KST-GCN better captures the change tendency of traffic data and is thus able to predict more accurately.

  2. The short-range prediction results of the KST-GCN are better than the long-range ones. The prediction result and the ground truth are closer in short-term prediction (15 min) than in long-term prediction (60 min), as shown in Figure 11 and Figure 14, indicating that the traffic characteristics are more important in short-term prediction.

  3. The prediction of the KST-GCN has significant deviations at turning points. The reason may be that the variations at peaks are influenced not only by the weather and POIs but also by a combination of other factors, such as traffic contingencies.
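Evaluating several prediction horizons requires turning the speed series into (history, target) pairs whose target length matches the horizon. The sketch below shows one common way to do this with a sliding window. The sampling interval, the history length, and the function names are hypothetical; none of them is specified in this section.

```python
def horizon_to_steps(horizon_min, interval_min):
    """Convert a prediction horizon in minutes into a number of time steps."""
    if horizon_min % interval_min != 0:
        raise ValueError("horizon must be a multiple of the sampling interval")
    return horizon_min // interval_min


def make_windows(series, seq_len, pre_len):
    """Slide over the series and return (history, target) pairs.

    history holds seq_len consecutive observations and target holds the
    pre_len observations that immediately follow them.
    """
    samples = []
    for start in range(len(series) - seq_len - pre_len + 1):
        history = series[start:start + seq_len]
        target = series[start + seq_len:start + seq_len + pre_len]
        samples.append((history, target))
    return samples


if __name__ == "__main__":
    # Assumed 15-minute sampling interval, so a 60-minute horizon is 4 steps.
    steps = horizon_to_steps(60, 15)
    toy_series = list(range(12))
    for history, target in make_windows(toy_series, seq_len=4, pre_len=steps):
        print(history, "->", target)
```

Longer horizons leave fewer usable windows and ask the model to extrapolate further from the same history, which is consistent with the larger gap between prediction and ground truth at 60 min.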

IV-C2 Importance of the attribute knowledge structure

We visualize the prediction results to verify the effect of the structural information between static and dynamic attribute knowledge and traffic information on traffic prediction. Figure 15, Figure 16, and Figure 17 show the prediction results after adding static attribute knowledge structure information, dynamic attribute knowledge structure information, and dynamic-static combined attribute knowledge structure information, respectively. The results show the following:

Fig. 15: Result for adding static attribute knowledge structure information.
Fig. 16: Result for adding dynamic attribute knowledge structure information.
Fig. 17: Result for adding static and dynamic attribute knowledge structure information.
  1. The addition of static and dynamic attribute knowledge structure information can enhance the prediction performance. Figures 15 and 16 compare the prediction results obtained with static or dynamic attribute knowledge information against those obtained with no additional information. Adding either kind of attribute knowledge structure information enhances the model’s perception of semantics and time, bringing the prediction results closer to the real traffic state at each moment.

  2. The diversity of traffic information and attribute knowledge structure information can improve the model’s perception of peaks and turning points. Figure 17 compares the predictions of the KST-GCN (dynamic), KST-GCN (static), and KST-GCN (dynamic + static). Among the models, the KST-GCN (dynamic + static) performs better at turning points, probably because the traffic state at turning points is influenced by more factors, and the KST-GCN (dynamic + static) incorporates multiple attributes and their knowledge structure information.
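The visual observation about turning points could be quantified by comparing the error at local extrema of the ground truth with the error elsewhere. The sketch below is one hypothetical way to do so and is not part of the reported experiments; the definition of a turning point as a strict sign change in the first difference, and the function names, are assumptions.

```python
def turning_points(series):
    """Indices where the series switches between rising and falling."""
    return [
        i for i in range(1, len(series) - 1)
        if (series[i] - series[i - 1]) * (series[i + 1] - series[i]) < 0
    ]


def split_mae(y_true, y_pred):
    """Return (MAE at turning points, MAE at all other time steps)."""
    extrema = set(turning_points(y_true))
    at_turn, elsewhere = [], []
    for i, (t, p) in enumerate(zip(y_true, y_pred)):
        (at_turn if i in extrema else elsewhere).append(abs(t - p))

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(at_turn), mean(elsewhere)


if __name__ == "__main__":
    truth = [1.0, 3.0, 2.0, 4.0, 5.0]
    pred = [1.0, 2.0, 2.0, 4.0, 5.0]
    print(turning_points(truth))
    print(split_mae(truth, pred))
```

Computing this pair of values for each model variant would show whether the combined dynamic and static variant narrows the gap specifically at turning points.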

V Conclusion

In this study, we propose the KST-GCN, a traffic prediction model based on knowledge representation and a spatial-temporal graph convolutional network, to address the problem that conventional urban traffic prediction methods neglect the correlations between traffic information and external factors. The KST-GCN adopts a knowledge graph representation method to capture the knowledge structure and the semantic information between traffic information and external factors in a low-dimensional space, and uses a spatiotemporal graph convolutional network to capture the spatiotemporal characteristics of traffic data. The experimental results reveal that the KST-GCN reduces the prediction error by at least 0.62% (compared to the T-GCN) and at most 44.98% (compared to the SVR), which demonstrates the effectiveness of the KST-GCN. Moreover, the prediction results of the KST-GCN are better than those of models without additional knowledge structure information, indicating that adding knowledge structure information enhances the KST-GCN's perception of structural information and thus improves the prediction accuracy. Finally, the KST-GCN outperforms the baselines in both short- and long-term predictions, and its prediction results are generally stable.


The authors would like to thank…


  • [1] M. S. Ahmed and A. R. Cook (1979) Analysis of freeway traffic time-series data by using box-jenkins techniques. Cited by: §II-A.
  • [2] J. Barros, M. Araujo, and R. J. Rossetti (2015) Short-term real-time traffic prediction methods: a survey. In 2015 International Conference on Models and Technologies for Intelligent Transportation Systems (MT-ITS), pp. 132–139. Cited by: §I.
  • [3] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko (2013) Translating embeddings for modeling multi-relational data. In Advances in neural information processing systems, pp. 2787–2795. Cited by: §III-B2.
  • [4] X. Chen, S. Jia, and Y. Xiang (2020) A review: knowledge reasoning over knowledge graph. Expert Systems with Applications 141, pp. 112948. Cited by: §III-B1.
  • [5] R. Fu, Z. Zhang, and L. Li (2016) Using lstm and gru neural network methods for traffic flow prediction. In 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), pp. 324–328. Cited by: §I, §II-A.
  • [6] H. Gui, J. Liu, F. Tao, M. Jiang, B. Norick, and J. Han (2016) Large-scale embedding learning in heterogeneous event data. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 907–912. Cited by: §II-B.
  • [7] H. Gui, J. Liu, F. Tao, M. Jiang, B. Norick, L. Kaplan, and J. Han (2017) Embedding learning with events in heterogeneous information networks. IEEE transactions on knowledge and data engineering 29 (11), pp. 2428–2441. Cited by: §II-B.
  • [8] M. M. Hamed, H. R. Al-Masaeid, and Z. M. B. Said (1995) Short-term prediction of traffic volume in urban arterials. Journal of Transportation Engineering 121 (3), pp. 249–254. Cited by: §II-A.
  • [9] W. Huang, G. Song, H. Hong, and K. Xie (2014) Deep architecture for traffic flow prediction: deep belief networks with multitask learning. IEEE Transactions on Intelligent Transportation Systems 15 (5), pp. 2191–2201. Cited by: §II-A.
  • [10] M. Jun and M. Ying (2008) Research of traffic flow forecasting based on neural network. In 2008 Second International Symposium on Intelligent Information Technology Application, Vol. 2, pp. 104–108. Cited by: §II-A.
  • [11] G. Kasneci, F. M. Suchanek, G. Ifrim, M. Ramanath, and G. Weikum (2008) Naga: searching and ranking knowledge. In 2008 IEEE 24th International Conference on Data Engineering, pp. 953–962. Cited by: §II-B.
  • [12] A. Kuang and Z. Huang (2004) Short-term traffic flow prediction based on rbf neural network. Systems engineering 2. Cited by: §II-A.
  • [13] I. Lana, J. Del Ser, M. Velez, and E. I. Vlahogianni (2018) Road traffic forecasting: recent advances and new challenges. IEEE Intelligent Transportation Systems Magazine 10 (2), pp. 93–109. Cited by: §I.
  • [14] B. Liao, J. Zhang, C. Wu, D. McIlwraith, T. Chen, S. Yang, Y. Guo, and F. Wu (2018) Deep sequence learning with auxiliary information for traffic prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 537–546. Cited by: §I, §II-A.
  • [15] Y. Lin, Z. Liu, M. Sun, Y. Liu, and X. Zhu (2015) Learning entity and relation embeddings for knowledge graph completion. In Proceedings of AAAI’15. Cited by: §III-B2.
  • [16] Y. Lin, Z. Liu, and M. Sun (2016) Knowledge representation learning with entities, attributes and relations. ethnicity 1, pp. 41–52. Cited by: §III-B2.
  • [17] Y. Liu, H. Zheng, X. Feng, and Z. Chen (2017) Short-term traffic flow prediction with conv-lstm. In 2017 9th International Conference on Wireless Communications and Signal Processing (WCSP), pp. 1–6. Cited by: §I.
  • [18] Y. Lv, Y. Duan, W. Kang, Z. Li, and F. Wang (2014) Traffic flow prediction with big data: a deep learning approach. IEEE Transactions on Intelligent Transportation Systems 16 (2), pp. 865–873. Cited by: §II-A.
  • [19] X. Ma, Z. Tao, Y. Wang, H. Yu, and Y. Wang (2015) Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transportation Research Part C: Emerging Technologies 54, pp. 187–197. Cited by: §I.
  • [20] R. Muppalla, S. Lalithsena, T. Banerjee, and A. Sheth (2017) A knowledge graph framework for detecting traffic events using stationary cameras. In Proceedings of the 2017 ACM on Web Science Conference, pp. 431–436. Cited by: §II-B.
  • [21] A. M. Nagy and V. Simon (2018) Survey on traffic prediction in smart cities. Pervasive and Mobile Computing 50, pp. 148–163. Cited by: §I.
  • [22] N. Noy, Y. Gao, A. Jain, A. Narayanan, A. Patterson, and J. Taylor (2019) Industry-scale knowledge graphs: lessons and challenges. Queue 17 (2), pp. 48–75. Cited by: §II-B.
  • [23] D. Park and L. R. Rilett (1999) Forecasting freeway link travel times with a multilayer feedforward neural network. Computer-Aided Civil and Infrastructure Engineering 14 (5), pp. 357–367. Cited by: §II-A.
  • [24] X. Sha, Z. Sun, and J. Zhang (2019) Attentive knowledge graph embedding for personalized recommendation. arXiv preprint arXiv:1910.08288. Cited by: §II-B.
  • [25] S. Shan and B. Cao (2017) Follow a guide to solve urban problems: the creation and application of urban knowledge graph. IET Software 11 (3), pp. 126–134. Cited by: §II-B.
  • [26] J. Tang, M. Qu, and Q. Mei (2015) Pte: predictive text embedding through large-scale heterogeneous text networks. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1165–1174. Cited by: §II-B.
  • [27] Y. Tian and L. Pan (2015) Predicting short-term traffic flow by long short-term memory recurrent neural network. In 2015 IEEE international conference on smart city/SocialCom/SustainCom (SmartCity), pp. 153–158. Cited by: §I.
  • [28] J. Van Lint, S. Hoogendoorn, and H. J. van Zuylen (2002) Freeway travel time prediction with state-space neural networks: modeling state-space dynamics with recurrent neural networks. Transportation Research Record 1811 (1), pp. 30–39. Cited by: §II-A, §II-A.
  • [29] H. Wang, F. Zhang, M. Zhao, W. Li, X. Xie, and M. Guo (2019) Multi-task feature learning for knowledge graph enhanced recommendation. In The World Wide Web Conference, pp. 2000–2010. Cited by: §II-B.
  • [30] Z. Xu, H. Zhang, C. Hu, L. Mei, J. Xuan, K. R. Choo, V. Sugumaran, and Y. Zhu (2016) Building knowledge base of urban emergency events based on crowdsourcing of social media. Concurrency and Computation: Practice and experience 28 (15), pp. 4038–4052. Cited by: §II-B.
  • [31] B. Yu, H. Yin, and Z. Zhu (2017) Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. arXiv preprint arXiv:1709.04875. Cited by: §II-A.
  • [32] H. Yu, Z. Wu, S. Wang, Y. Wang, and X. Ma (2017) Spatiotemporal recurrent convolutional networks for traffic prediction in transportation networks. Sensors 17 (7), pp. 1501. Cited by: §II-A.
  • [33] D. Zhang and M. R. Kabuka (2018) Combining weather condition data to predict traffic flow: a gru-based deep learning approach. IET Intelligent Transport Systems 12 (7), pp. 578–585. Cited by: §I, §II-A.
  • [34] J. Zhang, Y. Zheng, and D. Qi (2016) Deep spatio-temporal residual networks for citywide crowd flows prediction. arXiv preprint arXiv:1610.00081. Cited by: §II-A.
  • [35] J. Zhang, F. Wang, K. Wang, W. Lin, X. Xu, and C. Chen (2011) Data-driven intelligent transportation systems: a survey. IEEE Transactions on Intelligent Transportation Systems 12 (4), pp. 1624–1639. Cited by: §I.
  • [36] L. Zhao, Y. Song, C. Zhang, Y. Liu, P. Wang, T. Lin, M. Deng, and H. Li (2019) T-gcn: a temporal graph convolutional network for traffic prediction. IEEE Transactions on Intelligent Transportation Systems. Cited by: §I, §II-A.
  • [37] Z. Zhene, P. Hao, L. Lin, X. Guixi, B. Du, M. Z. A. Bhuiyan, Y. Long, and D. Li (2018) Deep convolutional mesh rnn for urban traffic passenger flows prediction. In 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pp. 1305–1310. Cited by: §II-A.
  • [38] J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun (2018) Graph neural networks: a review of methods and applications. arXiv preprint arXiv:1812.08434. Cited by: §I.
  • [39] J. Zhu, C. Tao, H. Deng, L. Zhao, P. Wang, T. Lin, and H. Li (2020) AST-gcn: attribute-augmented spatiotemporal graph convolutional network for traffic forecasting. arXiv preprint arXiv:2011.11004. Cited by: §IV-B3.