Efficient Metropolitan Traffic Prediction Based on Graph Recurrent Neural Network

11/02/2018 ∙ by Xiaoyu Wang, et al. ∙ Shanghai Jiao Tong University

Traffic prediction is a fundamental and vital task in Intelligent Transportation Systems (ITS), but achieving high accuracy at low computational complexity is very challenging because of the spatiotemporal characteristics of traffic flow, especially in metropolitan settings. In this work, a new topological framework, called the Linkage Network, is proposed to model road networks and represent the propagation patterns of traffic flow. Based on the Linkage Network model, a novel online predictor, named Graph Recurrent Neural Network (GRNN), is designed to learn the propagation patterns in the graph. It predicts traffic flow for all road segments simultaneously, based on information gathered from the whole graph, which reduces the computational complexity significantly from O(nm) to O(n+m) while keeping high accuracy. Moreover, it can also predict the variations of traffic trends. Experiments on real-world data demonstrate that the proposed method outperforms existing prediction methods.





Accurate traffic prediction in a metropolitan setting is of great importance to the administration department. Taking the Transportation Information Center (TIC) of Shanghai as an example, high-accuracy traffic prediction helps to control the traffic flow. At the same time, large-scale traffic congestion always implies a gathering of citizens. Thus, traffic prediction also helps to prevent public or traffic accidents [Zheng et al.2014] by notifying administrators in advance, so that emergency response plans can be deployed promptly.

Previous research [Nguyen, Liu, and Chen2017] has focused on this problem but leaves some limitations. Firstly, most existing approaches, for example [Lippi, Bertini, and Frasconi2013, Fusco, Colombaroni, and Isaenko2016], treat traffic prediction as a time series problem and solve it with common methods from time series analysis and statistical learning. However, the traffic condition of one road segment is strongly correlated with that of the others, so these approaches ignore the global information of the whole traffic network. Secondly, the traffic condition of some segments has an obvious seasonal regularity, as in the example shown in Figure 1(a), but most segments do not have such characteristics (Figure 1(b)). This limits the performance of methods that only exploit numerical correlations and makes prediction more difficult. Thirdly, some approaches [Min and Wynter2011, Zhang et al.2016] introduce additional spatiotemporal data to assist the prediction. They address the global information to some extent, but the extra data also leads to a heavy computational cost. Moreover, the strong coupling of the road system in both time and space means that predicting each segment separately from local information is not equivalent to predicting all segments simultaneously from global information. Hence, traffic condition prediction remains a tough problem.

(a) Strong seasonal trend
(b) Weak seasonal trend
Figure 1: Seasonal trend of different road segments

In this paper, we propose a novel scheme to handle the limitations mentioned above. It consists of two key parts: the linkage network and the online regressor Graph Recurrent Neural Network (GRNN). First of all, we define the Linkage Network to enrich the properties that a graph of the road network can represent. The newly introduced linkage can carry and represent a significant property called the propagation pattern, which reveals the internal mechanism of traffic variation.

After that, GRNN is proposed to mine and learn this propagation pattern and make the prediction globally and synchronously. GRNN contains a propagation module that propagates the hidden states along the linkage network, just as traffic flow spreads along the road network. Since the propagation of traffic flow directly affects the variation of traffic, GRNN can generate the prediction results from the patterns it has already learned. In conclusion, our contributions can be summed up in four points:

  1. The Linkage Network is modeled to remove the useless redundancy in the traditionally defined road network, and its new element, the linkage, can carry and represent the vital feature called the propagation pattern, which is the major cause of traffic variation.

  2. GRNN, which can absorb information from the whole graph, is designed to mine and learn the propagation pattern in the linkage network, and it can further generate traffic predictions directly from the features it has learned.

  3. We derive the learning algorithm of GRNN and additionally prove that its computational complexity is lower than that of traditional approaches.

  4. We evaluate our scheme using taxi trajectory data from Shanghai. The experimental results demonstrate the advantages of the proposed scheme compared with five baselines.

Problem Formulation

In this section, we briefly introduce the traffic prediction problem. We first give the most commonly used definition of a road network.

Definition 1.

Road Network. A traditional road network is defined as a directed graph $G = (V, E)$, where $V$ is a set of intersections and $E$ is a set of road segments. Each vertex $v \in V$ is defined by the coordinates of its intersection, namely its longitude and latitude. Each edge $e \in E$ is a segment determined by its two endpoints.

Definition 2.

Traffic Condition Prediction. For each road segment in the road network, a time series represents the traffic condition of that segment in each time interval. Traffic condition prediction aims to predict the upcoming traffic condition of the segment from a feature vector using a map, and to minimize the following error:


In most traditional approaches, researchers aggregate the traffic conditions of the to-be-predicted segment in former time steps, together with other spatiotemporal data, into the feature vector and predict each segment separately, with one map per segment. In our approach, we predict all segments simultaneously from global information.

Definition 3.

Global Traffic Condition Prediction. In this task, we aim to form a map that takes the prior knowledge of the conditions of all segments in former steps to the prediction, and change Equation 1 as follows:

Here the condition vector contains the conditions of every road segment in the graph at a given time, and the loss function measures the deviation of the prediction from the actual value. The loss function will be introduced in the following.

Architecture and the Linkage Network

In this section, we present the architecture of the proposed prediction scheme and give a detailed explanation of the linkage network.

Architecture of the New Scheme

Our proposed traffic prediction scheme consists of two key models: the linkage network and GRNN. The architecture is outlined in Figure 2.

Figure 2: Architecture of the proposed scheme

In the following, we will always use a subgraph of the whole traffic system, as shown in Figure 3, as an example.

Figure 3: Example of a subgraph

The road network defined in Definition 1 is widely used. It is abstracted intuitively and directly from the real world, as shown in Figure 3. Vertices A to D in the graph represent the intersections at which roads converge, and the directed edges represent unidirectional road segments between intersections.

A road segment has a set of features that the graph can carry, usually including average speed, number of lanes, length, and so on. Meanwhile, two attributes of an intersection deserve specific analysis: its geographic location and the ‘linkage’ among road segments. The former is meaningless in topology-based research. We define the latter, the ‘linkage’, as the physical connection between two end-to-end road segments. For example, at a crossroad a vehicle has four choices: turn back, turn left, turn right, or go straight ahead. These four choices correspond to four linkages between the segment the vehicle is currently driving on and the four downstream road segments.

Additionally, we define the ‘propagation pattern’ as the proportion of vehicles that choose a certain linkage. Returning to the subgraph in Figure 3, under the traditional definition all linkages of intersection ‘C’ are coupled together, and so are their propagation patterns. Consequently, although the graph in Definition 1 can express the ‘linkage’, the ‘propagation pattern’ that a linkage carries can be represented by neither its edges nor its vertices. At the same time, a large number of vehicles congested on a road segment will eventually spread to the downstream segments and increase their burden, and the size of the traffic flow is strongly related to the traffic condition [Du et al.2013]. The traffic conditions of downstream road segments are therefore directly influenced by their upstream segments. Thus, the propagation pattern of the urban transportation system is the key to analyzing the internal mechanism of traffic variation. This vital feature has to be decoupled and expressed clearly and separately by elements of the graph.
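To make the definition of the propagation pattern concrete, the following minimal sketch computes, for each linkage, the share of vehicles leaving an upstream segment that choose it. The function name `propagation_pattern`, the segment labels, and the observed transitions are hypothetical. Note that GRNN does not count vehicles in this way; it learns the pattern implicitly from traffic conditions.

```python
from collections import Counter


def propagation_pattern(transitions):
    """Map each linkage (upstream, downstream) to the share of vehicles
    leaving `upstream` that continue onto `downstream`."""
    linkage_counts = Counter(transitions)
    upstream_totals = Counter(upstream for upstream, _ in transitions)
    return {
        (upstream, downstream): count / upstream_totals[upstream]
        for (upstream, downstream), count in linkage_counts.items()
    }


# Hypothetical observations: each tuple is one vehicle moving from one
# road segment to the next one.
observed = [("s1", "s2"), ("s1", "s2"), ("s1", "s3"), ("s1", "s2"), ("s4", "s2")]
pattern = propagation_pattern(observed)
print(pattern[("s1", "s2")])  # share of vehicles on s1 that continue onto s2
```

The shares of all linkages leaving the same upstream segment sum to one, which is what makes them a proportion in the sense of the definition above.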

Hence, we propose the linkage network to eliminate the useless redundancy of vertices and to represent the propagation pattern compactly in the graph. After that, based on the linkage network, we propose GRNN to mine and learn the propagation pattern from it and to predict traffic more efficiently.

Linkage Network Modeling

Here we give the definition of the linkage network:

Definition 4.

Linkage Network. A linkage network is an unweighted directed graph whose vertices represent the road segments and whose directed edges denote the linkages between contiguous segments. A directed edge from one segment to another is established if and only if the terminal intersection of the first segment and the initial intersection of the second are the same.

(a) Road network
(b) Linkage network
Figure 4: Differences between two modeling

In contrast to the road network in Figure 4(a), Figure 4(b) illustrates the graph structure of the linkage network. In Definition 4, the intersection, which carries no useful features, is dropped; the linkage is introduced as the edge of the graph, and the road segment becomes the vertex. Such a transformation removes the redundancy of vertices. At the same time, edges now represent linkages that carry the traffic propagation patterns, which is essential for the next part of the scheme, the GRNN model. Hence, the linkage network has two main advantages:

  1. The linkage network can carry more plentiful information, especially the propagation pattern.

  2. Only under the definition of the linkage network can we design an algorithm to learn the traffic pattern.

To avoid ambiguity, ‘road network’ in the following refers to the road system in the real world, and ‘linkage network’ refers to the graph we defined. The new topological structure can be easily obtained from the graph of Definition 1 using the following algorithm.

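Input: Graph in Definition 1
Output: Adjacency matrix of the linkage network
for each road segment in the graph do
     store the segment in the chained list of its initial intersection
end for          (save segments with the same initial point)
for each road segment in the graph do
     link the segment to every segment stored in the chained list of its terminal intersection
end for
Get the adjacency matrix from the chained list
Algorithm 1 Graph transformation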
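The transformation can be sketched in a few lines of Python. This is a minimal illustration of Definition 4 rather than the authors' implementation: the function name `road_to_linkage` and the representation of segments as (initial intersection, terminal intersection) pairs are assumptions made for the example.

```python
from collections import defaultdict


def road_to_linkage(segments):
    """Build the adjacency matrix of the linkage network.

    `segments` is a list of directed road segments, each given as a pair
    (initial_intersection, terminal_intersection). Entry [i][j] of the
    result is 1 if segment i ends where segment j starts, and 0 otherwise.
    """
    # Pass 1: group segment indices by their initial intersection.
    starts_at = defaultdict(list)
    for index, (initial, _terminal) in enumerate(segments):
        starts_at[initial].append(index)

    # Pass 2: link each segment to every segment that starts at its end.
    n = len(segments)
    adjacency = [[0] * n for _ in range(n)]
    for i, (_initial, terminal) in enumerate(segments):
        for j in starts_at.get(terminal, []):
            adjacency[i][j] = 1
    return adjacency


# Hypothetical road network: A->C, B->C, C->D, C->A.
segments = [("A", "C"), ("B", "C"), ("C", "D"), ("C", "A")]
linkage_adjacency = road_to_linkage(segments)
for row in linkage_adjacency:
    print(row)
```

In this example the segment A->C is linked to C->A as well, which corresponds to the ‘turn back’ choice at an intersection. Grouping by initial intersection first keeps the construction linear in the number of segments plus the number of linkages.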

Graph Recurrent Neural Network

Next, we need an algorithm that completes the global prediction task of Definition 3 by mining and learning the propagation pattern of traffic flow. Traffic flow is essentially the volume of traffic on each road segment, but the traffic monitoring data we recovered in [Du et al.2015, Wang et al.2018] is the average speed of vehicles. Fortunately, [Du et al.2013] shows that there is a strong relationship between traffic flow and average speed. Thus, we propose GRNN, based on Graph Neural Networks (GNNs) [Scarselli et al.2009], to learn and predict the traffic condition online in an end-to-end form. That is, GRNN learns the relation between those two metrics and mines the propagation patterns to achieve global prediction.

The propagation module in GNNs is unrolled along the time axis of training and propagates hidden states until they reach an equilibrium point, in order to capture the static relationships among vertices. However, propagation patterns in the transportation system are time-variant, so the propagation process does not need to run until it stabilizes. Therefore, we compress the propagation module into a single time step to capture the dynamic relations.

Additionally, the condition of a segment is affected by the upstream conditions not only in the last time step but also over the long- and short-term history. Thus, we concatenate multiple identical propagation modules end to end to handle the correlations along the real-world time axis. In other words, the propagation module sends its information back to itself in the next time step. The architecture of GRNN is illustrated in Figure 5, which shows the propagation and output modules. Under this construction, GRNN has two major features:

  1. GRNN becomes a sequence-to-sequence model and overcomes the difficulty that GNNs have in dealing with streaming data.

  2. GRNN can learn the propagation pattern represented by the linkage network and predict traffic condition globally and synchronously.

Finally, the Back Propagation Through Time (BPTT) algorithm [Werbos1988, Hochreiter and Schmidhuber1997] is used to train the whole GRNN.

Figure 5: Architecture of GRNN

Propagation Module

GRNN also uses GRU cells in the propagation module, as the Gated Graph Neural Network (GGNN) [Li et al.2016] does, to dynamically control which information gathered from former steps is kept and which is discarded. In the propagation module of GRNN, the hidden state matrix $H$, which represents the propagation patterns here, does not directly relate to the node annotations (time series of traffic condition); $D$ is the dimension of the hidden state of each node. Thus, $H$ cannot be initialized by padding the initial node annotations with a zero matrix. In GRNN, we randomly initialize $H$ from a normal distribution. Meanwhile, all edges in the linkage network are equal, and the differences in propagation pattern are represented by $H$. Therefore, edges share the same label, and all elements of the adjacency matrix $A$, which controls the propagation direction, are $0$ or $1$. Additionally, since all propagation processes are unidirectional by the definition of the linkage network, $A$ is only a single square matrix without any affiliation information. The propagation module of GRNN is formed as follows:


where the input at each time step, of a fixed feature dimension, enters together with bias matrices, $\sigma$ is the sigmoid function, and $\odot$ is element-wise multiplication. The traffic condition of a road segment depends not only on the condition of its upstream segments but also on its own condition in the last time step. GRNN therefore propagates the states through a matrix built from the adjacency matrix $A$, a hyperparameter that controls the decay of influence propagation, and an identity matrix of the same size as $A$. Taking the subgraph in Figure 3 as an example, the propagation among vertices is illustrated in Figure 6. GRNN propagates information and trains the model with new inputs as time goes by, which means that the hidden states will eventually contain information from the whole graph. At the same time, since GRNN learns the propagation pattern dynamically, it can be implemented online. Finally, Equations 3 to 6 determine how incoming information is remembered and old information is forgotten, following a GRU structure.

Figure 6: Information propagation in the graph, where black arrows denote the self-correlation and grey dashed lines indicate the mutual correlations.
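The aggregation part of one propagation step can be illustrated as follows. This is only a sketch under stated assumptions: it assumes that each segment keeps its own hidden state (the identity part) and adds the hidden states of its upstream segments scaled by a decay factor `gamma`, and it omits the GRU gating entirely. The exact way the adjacency matrix, the decay hyperparameter, and the identity matrix are combined in GRNN may differ, and the function name `propagate` is hypothetical.

```python
def propagate(adjacency, hidden, gamma):
    """One aggregation step over the linkage network.

    adjacency[i][j] == 1 means segment i is directly upstream of segment j.
    Each segment keeps its own hidden state and receives the hidden states
    of its upstream segments, scaled by `gamma` (an assumed form).
    """
    n, dim = len(hidden), len(hidden[0])
    new_hidden = [list(row) for row in hidden]  # self-correlation (identity part)
    for i in range(n):
        for j in range(n):
            if adjacency[i][j]:
                for d in range(dim):
                    new_hidden[j][d] += gamma * hidden[i][d]  # mutual correlation
    return new_hidden


# Hypothetical chain of three segments: 0 -> 1 -> 2.
adjacency = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
hidden = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
updated = propagate(adjacency, hidden, gamma=0.5)
```

After one step, segment 1 has received information from segment 0 only; information from segment 0 reaches segment 2 after a second step. This is why repeated propagation over time lets the hidden states absorb information from the whole graph.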

Output Module

The GNN framework provides a flexible way to handle the output. We can easily get node-level or graph-level outputs from the hidden state matrix with different output models. In our regression task, we focus on node-level predictions for the next time steps. Hence we directly construct a fully connected linear layer as the output module:


where the outputs are the prediction results for the corresponding road segments.


Learning Algorithm

In the proposed GRNN, information is propagated continuously as online training progresses. Hence its learning algorithm has to be adapted from BPTT. We formulate the BPTT algorithm for GRNN in matrix representation as follows. Firstly, we use the Mean Square Error (MSE) as our loss function:


where $T$ is the time span of propagation and the targets are the true values. $T$ can be either the span of the whole historical data or a hyperparameter value that truncates the back-propagation of the deviation to simplify the training process. For any time step within this span, the gradient of the loss is formulated as follows:


For the last time step, the gradients of the loss with respect to the weight matrices have the simplest forms. For the earlier time steps, the information of each node propagates to the others. Hence the gradients have different forms, which contain multiple parts of gradients from different nodes in later time steps. The full expressions are too complicated to write out, so we give the recursive form.

Under the representations of Equations 9 and 10, the gradients of the loss with respect to the weight matrices in the earlier time steps can be expressed as in Equation 11, and:

From the equations above, we can clearly see the correlations among the hidden states of all vertices in different time steps. Finally, the online training and prediction process of GRNN is described in Algorithm 2.

Input: Historical data; old model
Output: Predictions; updated model
Load the model trained in the last time steps
Predict with the loaded model through Equations 2 to 6
for each iteration epoch do
     Propagate forward through Equations 2 to 6
     Calculate the loss through Equation 8
     Update gradients through Equations 10 and 11
end for
return the predictions and the up-to-date model
Algorithm 2 GRNN Online Training and Prediction
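The control flow of Algorithm 2 can be sketched independently of the model itself. In the following illustration, `predict` and `update` are hypothetical callables standing in for the forward pass and for one training epoch, `window` plays the role of the truncation length, and the toy model at the bottom is only a placeholder so that the loop can be run; none of these names come from the paper or its code.

```python
from collections import deque


def run_online(stream, predict, update, window, epochs):
    """Each time new data arrives: predict with the current model, then
    refine the model on the most recent `window` observations."""
    history = deque(maxlen=window)  # older observations are dropped (truncation)
    predictions = []
    for observation in stream:
        history.append(observation)
        predictions.append(predict(list(history)))  # use the model as it stands
        for _ in range(epochs):                     # then train for a few epochs
            update(list(history))
    return predictions


# Toy stand-in for a model (hypothetical): a single level that moves
# halfway towards the newest observation on every update.
state = {"level": 0.0}


def toy_predict(history):
    return state["level"]


def toy_update(history):
    state["level"] += 0.5 * (history[-1] - state["level"])


predictions = run_online([2.0, 4.0], toy_predict, toy_update, window=2, epochs=1)
```

Bounding the history with `deque(maxlen=window)` keeps the cost of every update independent of how long the system has been running, which is the purpose of truncating back-propagation.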

Computational Complexity

Here we briefly analyze the computational complexity of GRNN and compare it with that of traditional single-segment prediction methods.

Space Complexity. Space complexity, in other words the usage of computer memory, mainly depends on the number of parameters to be learned in a model. If we use traditional predictors to predict the traffic conditions of all $n$ road segments in the whole city, $n$ models have to be trained separately. For each model, we write the memory usage of its weight matrices as $O(m)$, where $m$ represents the size of the model to be trained. As a result, the complexity of the $n$ predictors is $O(nm)$. As for GRNN, the memory usage of the weight matrices is also $O(m)$, while that of the hidden states is $O(n)$. Since GRNN shares the weight matrices among all nodes, the space complexities of these two parts simply add up to $O(n+m)$, which is far less than $O(nm)$. Considering the huge number of road segments in a metropolis such as Shanghai, such a reduction is very meaningful.
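The difference between the two growth rates can be made tangible with a small calculation. The numbers below are hypothetical and chosen only for illustration; the hidden-state dimension is treated as a constant factor, so the hidden states contribute a term proportional to the number of segments.

```python
def separate_models_params(num_segments, params_per_model):
    """One independent model per road segment: grows as n * m."""
    return num_segments * params_per_model


def shared_model_params(num_segments, params_per_model, hidden_dim):
    """One shared set of weights plus one hidden state per segment: grows as n + m."""
    return params_per_model + num_segments * hidden_dim


# Hypothetical sizes: 10,000 segments, 5,000 weights per model, hidden dimension 32.
separate = separate_models_params(10_000, 5_000)
shared = shared_model_params(10_000, 5_000, 32)
print(separate, shared, separate / shared)
```

With these assumed sizes the shared model stores roughly 150 times fewer values, and the gap widens as either the number of segments or the model size grows.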

Time Complexity. The comparison of time complexity is more involved. We again start with the traditional models. For simplicity, suppose that the time complexity of one model is $O(m)$; the complexity of $n$ models is then $O(nm)$. For GRNN, Equations 10 and 11 need a more elaborate explanation. The formulas given in those two equations are the simplest forms in matrix representation. If we split each of them into vector form, each gradient actually has to be updated $n$ times, whose complexity is similar to updating $n$ models once. However, GRNN updates the weights through matrix operations, so it updates the parameters only once in each step. Since the size of the matrices grows with $n$, the time complexity of GRNN is far less than $O(nm)$ but slightly larger than that of a single model, depending on the time complexity of matrix operations on the CPU or GPU. Since we cannot give an exact formula for the time complexity, a numerical comparison is presented in the next section.


Datasets and Settings

Datasets. The raw taxi trajectory dataset used in this research was obtained from TIC Shanghai, and the distribution of samples is illustrated in Figure 7. The data were gathered from taxis between Apr. 1, 2015 and Apr. 30, 2015 on a city-scale road network. Each taxi uploads a GPS report every 10 seconds. The raw trajectory data include the ID, geographical position, upload time stamp, carrying state, speed, orientation of the vehicle, and so on. We mined and recovered the traffic conditions of all segments in that time span in our previous work [Wang et al.2018], using a fixed time interval. Unfortunately, samples from most of the segments are too sparse; in other words, only a subset of the segments has complete time series of traffic conditions. Thus, we select a connected subgraph with the highest sampling density, shown in the attached map on the right side of Figure 7, as the testbed on which all following experiments are executed. Note that all the raw data are private, but the processed testbed is available on GitHub, together with the code of the proposed scheme and the tests: https://github.com/xxArbiter/grnn.

Figure 7: Distribution of raw taxi trajectory data. The attached map on the right side shows the subgraph we chose; the lighter a segment is, the more samples it has.

Baselines. We compare our scheme with the following five baselines:

  1. HA: Historical Average predicts the traffic condition as the average of the conditions in the corresponding time interval of each past day.

  2. ARIMA: Auto-Regressive Integrated Moving Average.

  3. GBRT: Gradient Boosted Regression Tree is an efficient ensemble model (referred to as GBDT below).

  4. SVR: Support Vector Regression.

  5. GGNN: GGNN is slightly modified to fit the structure of the data in the testbed.
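As an illustration of the simplest baseline, HA can be written in a few lines. The sketch below assumes that the series is a flat list with a fixed number of intervals per day and that it starts at the first interval of a day; the function name `historical_average` and the sample values are hypothetical.

```python
def historical_average(series, intervals_per_day, target_index):
    """Average of all earlier observations that fall in the same
    time-of-day interval as `target_index`."""
    first_slot = target_index % intervals_per_day
    same_slot = series[first_slot:target_index:intervals_per_day]
    if not same_slot:
        raise ValueError("no earlier day available for this interval")
    return sum(same_slot) / len(same_slot)


# Hypothetical data: two days with three intervals each.
series = [10.0, 20.0, 30.0, 14.0, 22.0, 34.0]
first_interval_of_day_three = historical_average(series, 3, 6)
second_interval_of_day_three = historical_average(series, 3, 7)
```

Because HA looks only at the same interval on past days, it cannot react to what happened a few minutes ago, which is consistent with it being the weakest baseline in the comparison.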


We use PyTorch [Paszke et al.2017] to construct our model and build ARIMA using Pyramid [Smith2018]. SVR and GBDT are implemented with scikit-learn. We split the data into a training part and a validation part. For SVR and GBDT, we fix the dimension of the input, and we bound the number of trees in each GBDT and the number of layers per tree. In addition, we modify GGNN [Li et al.2016] slightly to fit our data. Firstly, we add a sigmoid activation function to its output module, since we are facing a regression task. Then we put the data of all steps in the input window into the initial node annotation and set the dimension of the hidden state to be bigger than the size of the annotation, as GGNN requires. Our GRNN can learn online, which means that it learns, updates itself, and produces the next prediction each time a new set of data arrives, so we truncate the backpropagation process to $T$ time steps. Experiments are also executed to show the effect of the other two hyperparameters: the dimension of the hidden state $D$ and the number of iteration epochs $i$.

Evaluation metrics. We evaluate our method by MSE, which shares the definition in Equation 8, and by the Variance of Deviation (VD). Additionally, Running Time (RT) is used to judge the computational complexity of GRNN. VD is defined as

$$\mathrm{VD} = \frac{1}{K} \sum_{k=1}^{K} \left(e_k - \bar{e}\right)^2,$$

where $e_k$ is the prediction error of each vertex in each time interval, $K$ is the number of such errors, and $\bar{e}$ is the mean of all errors. VD measures the dispersion of the prediction deviation. A higher VD means that the model cannot track the true value promptly; in other words, it cannot forecast the peaks. The use of VD is explained in detail in the following.
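Both metrics can be computed directly from the prediction errors. The sketch below assumes that VD is the population variance of the signed errors, taken over all vertices and time intervals flattened into one list; the function names and sample values are hypothetical.

```python
def mse(predictions, targets):
    """Mean of the squared prediction errors."""
    errors = [p - t for p, t in zip(predictions, targets)]
    return sum(e * e for e in errors) / len(errors)


def variance_of_deviation(predictions, targets):
    """Variance of the signed prediction errors around their mean (assumed form)."""
    errors = [p - t for p, t in zip(predictions, targets)]
    mean_error = sum(errors) / len(errors)
    return sum((e - mean_error) ** 2 for e in errors) / len(errors)


# A constant offset gives a non-zero MSE but no dispersion at all.
biased_mse = mse([3.0, 5.0, 7.0], [2.0, 4.0, 6.0])
biased_vd = variance_of_deviation([3.0, 5.0, 7.0], [2.0, 4.0, 6.0])

# Errors that swing around zero show up fully in VD.
swinging_mse = mse([1.0, 3.0], [2.0, 2.0])
swinging_vd = variance_of_deviation([1.0, 3.0], [2.0, 2.0])
```

Under this assumption MSE equals VD plus the squared mean error, so VD can never exceed MSE, and the two coincide when the errors average to zero.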

Results of Experiments

We compare our model with the baselines on the metrics MSE and VD in Table 1.

| Model | MSE | VD |
| --- | --- | --- |
| HA | 22.920 | 22.878 |
| ARIMA | 14.750 | 14.396 |
| SVR | 22.230 | 18.135 |
| GBDT | 7.082 | 7.076 |
| Modified GGNN | 7.091 | 7.051 |
| GRNN-144T-32D-2i | 6.543 | 6.537 |
| GRNN-144T-32D-10i | 5.576 | 5.576 |
| GRNN-576T-32D-10i | 4.540 | 4.539 |
| GRNN-576T-64D-10i | 4.779 | 4.779 |
| GRNN-1008T-32D-10i | 4.850 | 4.850 |
| GRNN-1008T-64D-10i | 5.405 | 5.405 |

Table 1: Comparison among different methods

Results of several versions of GRNN with different hyperparameters are also listed. GRNN-576T-32D-10i has the best performance, but even the smallest model, GRNN-144T-32D-2i, outperforms the traditional time series analysis methods. We further analyze the influence of the different hyperparameters. Firstly, as the network scale grows, the number of iteration epochs has to be increased as well, since the information to be learned is more detailed, but it has an upper limit. For example, GRNN-144T-32D-100i always diverges after a certain time, which indicates that too many iterations make the model converge to a wrong equilibrium. Similarly, the dimension of the hidden state and the truncation length have to be scaled together, because the information to be learned becomes more detailed. Notably, the modified GGNN also achieves relatively good performance at low complexity, but our algorithm beats it with a smaller size.

(a) Prediction of GRNN
(b) Prediction of GBDT
Figure 8: Performance of GRNN and GBDT

Additionally, we chose a random road segment to show the results in detail. Here we compare GRNN-576T-32D-10i with GBDT, which achieves the highest accuracy among the traditional methods. Predictions from 8:00 to 16:00 on 27th April 2015 are shown in Figure 8. Besides the higher accuracy discussed above, GRNN tracks the ground truth more promptly. Specifically, GRNN predicts the peaks correctly at the positions marked by the red circles in Figure 8(a). Meanwhile, the peaks predicted by GBDT always have phase differences; in other words, the results of GBDT lag behind the true values. This phenomenon is also reflected in the metric VD defined above.

Numerical explanation of time complexity. Here we compare the running time of GRNN on graphs of different sizes, using a relatively small model, to give an intuitive sense of the time complexity of GRNN. We run experiments with 1, 10, and 156 nodes separately and choose the model GRNN-144T-32D-10i-0.01Lr. The results are summarized in Table 2, where MSE and VD are the metrics of one particular road segment rather than of the whole graph. These experiments show that the running time remains almost unchanged as the graph grows, which supports the conclusion of the previous section that the time complexity of GRNN is far less than that of training separate models and close to that of a single model. Additionally, the prediction accuracy improves when the subgraph is enlarged from 1 to 10 nodes, which supports the inference that the propagation patterns GRNN learns from the graph contribute to the effectiveness of prediction. The reduction of accuracy with 156 nodes can be explained by the superfluous information to be learned, which can be relieved by expanding the network.

| Number of nodes | 1 | 10 | 156 |
| --- | --- | --- | --- |
| MSE of road #1 | 5.4053 | 3.482 | 4.072 |
| VD of road #1 | 5.4050 | 3.478 | 3.715 |
| RT/min | 178.31 | 176.40 | 181.57 |

Table 2: Numerical experiment of time complexity
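The claim that the running time is almost independent of the graph size can be checked with a short calculation on the RT row of Table 2. The snippet below assumes that the three columns of the table correspond to graphs with 1, 10, and 156 nodes.

```python
# Running times in minutes, copied from the RT row of Table 2.
running_time_min = {1: 178.31, 10: 176.40, 156: 181.57}

baseline = running_time_min[1]
relative_change = {
    nodes: (minutes - baseline) / baseline
    for nodes, minutes in running_time_min.items()
}

for nodes, change in relative_change.items():
    print(f"{nodes:>3} nodes: {change:+.2%} relative to a single node")
```

The graph grows by a factor of 156 while the running time changes by less than 2% in either direction, which is the behaviour expected when the cost is dominated by a fixed number of matrix operations rather than by per-segment work.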

Related Work

Traffic condition prediction. Many previous works [Chen et al.2012] consider the traffic condition as a time series and predict different segments separately through time series analysis, for example with Auto-Regressive Moving Average (ARMA) based algorithms (ARIMA, SARIMA). Additionally, some research [Oh, Byon, and Yeo2016, Hu et al.2016] uses statistical learning methods such as Bayesian Network (BN), SVR and GBDT, and adds extra information to assist the training. [Fusco, Colombaroni, and Isaenko2016] compares those methods and shows that they perform similarly. In these approaches, the strong spatiotemporal couplings, which are particularly pronounced in metropolitan settings, force a trade-off between computational complexity and the sufficiency of input information.

[Li et al.2015] tries to mine the relationship between consecutive monitoring stations on the highway to predict the traffic condition. It is an improvement, but the correlation between stations is very intuitive. [Zhang, Zheng, and Qi2016, Polson and Sokolov2017] divide the urban area into grids and predict the flow of citizens with deep learning algorithms such as Convolutional Neural Network (CNN) and Residual Network (ResNet). Although these approaches can learn globally, gridding breaks the topological structure of the road network. [Liang, Jiang, and Zheng2017] infers the cascading pattern of traffic flow with tree searching and further forecasts the congestion, but it is ultimately a local learning method.

Learning of graph-structured data. A few frontier investigations focus on learning from graph-structured data. [Scarselli et al.2009] was the first to propose the GNN framework to mine the relationships in graph-structured data. [Shahsavari2015] applies it to the traffic prediction task and shows its effectiveness. [Li et al.2016] further extends GNN with GRU cells to simplify the propagation process. Additionally, some works develop another framework called Graph Convolutional Network (GCN) to handle graph-structured data in a different way. [Niepert, Ahmed, and Kutzkov2016] proposes an application of the convolution kernel in the graph domain. Meanwhile, [Defferrard, Bresson, and Vandergheynst2016, Seo et al.2017, Hamilton, Ying, and Leskovec2017] establish various implementations in the frequency domain.

Conclusion and Future Work

We model a new topological structure for the road network in metropolitan settings, which removes the useless redundancy and represents richer information and characteristics that the old definition cannot carry. Further, we propose a novel network, GRNN, to mine the potential propagation patterns of traffic flow in the redefined graph and achieve the final objective of global traffic condition prediction. The effectiveness of GRNN is shown in the experiments, and its efficiency is supported by the analysis of computational complexity.

In the future, we will extend GRNN with additional information to achieve higher performance and explore more application scenarios in which data are driven by potential propagation behaviors, for example economic systems and stocks. Moreover, we will visualize the patterns GRNN learns from graph-structured data.


  • [Chen et al.2012] Chen, C.; Wang, Y.; Li, L.; Hu, J.; and Zhang, Z. 2012. The retrieval of intra-day trend and its influence on traffic prediction. Transportation Research Part C 22(5):103–118.
  • [Defferrard, Bresson, and Vandergheynst2016] Defferrard, M.; Bresson, X.; and Vandergheynst, P. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. 3844–3852.
  • [Du et al.2013] Du, R.; Chen, C.; Yang, B.; and Guan, X. 2013. Vanet based traffic estimation: A matrix completion approach. In GLOBECOM, 30–35.
  • [Du et al.2015] Du, R.; Chen, C.; Yang, B.; Lu, N.; Guan, X.; and Shen, X. 2015. Effective urban traffic monitoring by vehicular sensor networks. IEEE Transactions on Vehicular Technology 64(1):273–286.
  • [Fusco, Colombaroni, and Isaenko2016] Fusco, G.; Colombaroni, C.; and Isaenko, N. 2016. Short-term speed predictions exploiting big data on large urban road networks. Transportation Research Part C 73:183–201.
  • [Hamilton, Ying, and Leskovec2017] Hamilton, W. L.; Ying, Z.; and Leskovec, J. 2017. Inductive representation learning on large graphs. 1024–1034.
  • [Hochreiter and Schmidhuber1997] Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.
  • [Hu et al.2016] Hu, H.; Li, G.; Bao, Z.; Cui, Y.; and Feng, J. 2016. Crowdsourcing-based real-time urban traffic speed estimation: From trends to speeds. In IEEE ICDE, 883–894.
  • [Li et al.2015] Li, L.; Su, X.; Wang, Y.; Lin, Y.; Li, Z.; and Li, Y. 2015. Robust causal dependence mining in big data network and its application to traffic flow predictions. Transportation Research Part C 58:292–307.
  • [Li et al.2016] Li, Y.; Tarlow, D.; Brockschmidt, M.; and Zemel, R. S. 2016. Gated graph sequence neural networks. arXiv: Learning.
  • [Liang, Jiang, and Zheng2017] Liang, Y.; Jiang, Z.; and Zheng, Y. 2017. Inferring traffic cascading patterns. In ACM SIG GIS.
  • [Lippi, Bertini, and Frasconi2013] Lippi, M.; Bertini, M.; and Frasconi, P. 2013. Short-term traffic flow forecasting: An experimental comparison of time-series analysis and supervised learning. IEEE Transactions on Intelligent Transportation Systems 14(2):871–882.
  • [Min and Wynter2011] Min, W., and Wynter, L. 2011. Real-time road traffic prediction with spatio-temporal correlations. Transportation Research Part C 19(4):606–616.
  • [Nguyen, Liu, and Chen2017] Nguyen, H.; Liu, W.; and Chen, F. 2017. Discovering congestion propagation patterns in spatio-temporal traffic data. IEEE Transactions on Big Data 3(2):169–180.
  • [Niepert, Ahmed, and Kutzkov2016] Niepert, M.; Ahmed, M. O.; and Kutzkov, K. 2016. Learning convolutional neural networks for graphs. 2014–2023.
  • [Oh, Byon, and Yeo2016] Oh, S.; Byon, Y. J.; and Yeo, H. 2016. Improvement of search strategy with k-nearest neighbors approach for traffic state prediction. IEEE Transactions on Intelligent Transportation Systems 17(4):1146–1156.
  • [Paszke et al.2017] Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; and Lerer, A. 2017. Automatic differentiation in pytorch. In NIPS workshop.
  • [Polson and Sokolov2017] Polson, N. G., and Sokolov, V. 2017. Deep learning for short-term traffic flow prediction. Transportation Research Part C 79:1–17.
  • [Scarselli et al.2009] Scarselli, F.; Gori, M.; Tsoi, A. C.; Hagenbuchner, M.; and Monfardini, G. 2009. The graph neural network model. IEEE Transactions on Neural Networks 20(1):61–80.
  • [Seo et al.2017] Seo, Y.; Defferrard, M.; Vandergheynst, P.; and Bresson, X. 2017. Structured sequence modeling with graph convolutional recurrent networks. arXiv: Machine Learning.
  • [Shahsavari2015] Shahsavari, B. 2015. Short-term traffic forecasting: Modeling and learning spatio-temporal relations in transportation networks using graph neural networks. Technical Report UCB/EECS-2015-243, Dept. of Electrical Engineering and Computer Sciences, Univ. California, Berkeley.
  • [Smith2018] Smith, T. 2018. Pyramid. https://github.com/tgsmith61591/pyramid.
  • [Wang et al.2018] Wang, X.; Chen, C.; Min, Y.; He, J.; and Zhang, Y. 2018. Vehicular transportation system enabling traffic monitoring: A heterogeneous data fusion method. In WCSP.
  • [Werbos1988] Werbos, P. J. 1988. Generalization of backpropagation with application to a recurrent gas market model. Neural Networks 1(4):339–356.
  • [Zhang et al.2016] Zhang, J.; Zheng, Y.; Qi, D.; Li, R.; and Yi, X. 2016. Dnn-based prediction model for spatio-temporal data. In ACM SIGSPATIAL.
  • [Zhang, Zheng, and Qi2016] Zhang, J.; Zheng, Y.; and Qi, D. 2016. Deep spatio-temporal residual networks for citywide crowd flows prediction. In AAAI, 1655–1661.
  • [Zheng et al.2014] Zheng, Y.; Capra, L.; Wolfson, O.; and Yang, H. 2014. Urban computing: Concepts, methodologies, and applications. ACM Transactions on Intelligent Systems and Technology 5(3):38.