"How do urban incidents affect traffic speed?" A Deep Graph Convolutional Network for Incident-driven Traffic Speed Prediction

12/03/2019 · Qinge Xie, et al. · Aalto University · The University of Chicago · Fudan University

Accurate traffic speed prediction is an important and challenging topic for transportation planning. Previous studies on traffic speed prediction have predominantly used spatio-temporal and context features, but they have not made good use of the impact of urban traffic incidents. In this work, we aim to exploit urban incident information to achieve better traffic speed prediction. Our incident-driven prediction framework consists of three processes. First, we propose a critical incident discovery method to identify urban traffic incidents with a high impact on traffic speed. Second, we design a binary classifier that uses deep learning methods to extract the latent incident impact features from its middle layer. Combining the above methods, we propose a Deep Incident-Aware Graph Convolutional Network (DIGC-Net) to effectively incorporate urban traffic incident, spatio-temporal, periodic and context features for traffic speed prediction. We conduct experiments on two real-world urban traffic datasets from San Francisco and New York City. The results demonstrate the superior performance of our model compared to the competing benchmarks.


1. Introduction

Traffic speed prediction has been a challenging problem for decades and has a wide range of traffic planning and related applications, including congestion control (Li et al., 2017), vehicle routing planning (Johnson et al., 2017), urban road planning (Rathore et al., 2016) and travel time estimation (Gao et al., 2019). The difficulty of the problem comes from the complex and highly dynamic nature of traffic and road conditions, as well as a variety of other unpredictable, ad hoc factors. Urban traffic incidents, including lane restrictions, road construction and traffic collisions, are among the most important of these factors: they tend to dramatically impact traffic for limited time periods, yet their frequency means their aggregate impact cannot be ignored when modeling and predicting traffic speed.

Despite a large amount of research on detecting urban traffic incidents (Zhang et al., 2018; Yuan et al., 2018), only a small number of recent works study their impact. (Miller and Gupta, 2012) proposed a system for predicting the cost and impact of highway incidents. (Javid and Javid, 2018) developed a framework to estimate the travel time variability caused by traffic incidents. (He et al., 2019) proposed using the ratio of speeds before and after an incident as a traffic impact coefficient to evaluate the traffic influence of an incident. These works demonstrate the significant impact of urban traffic incidents on traffic conditions. However, using traffic incidents to improve traffic speed prediction has not been well explored. Some previous works (Lin et al., 2017) use incident data collected from social networks (e.g., Twitter) via keywords to improve traffic prediction, but they fail to consider the different impact levels of urban traffic incidents and instead treat all incidents equally for speed prediction.

Today, the large majority of traffic speed prediction solutions, including traditional machine learning (Castro-Neto et al., 2009), matrix decomposition (Deng et al., 2016) and deep learning methods (Li et al., 2018; Lv et al., 2018; Yao et al., 2019), mainly use spatio-temporal features of the traffic network and context features such as weather data. These solutions do not factor in the impact of dynamic traffic incidents.

A number of questions naturally arise: how do different urban traffic incidents impact traffic flow speeds? Do high-impact traffic incidents have specific spatio-temporal patterns in the city? How can we use urban traffic incident data to improve traffic speed prediction? In this paper, our goal is to answer these questions and, in doing so, understand the impact of urban traffic incidents on traffic speeds and propose an effective framework that uses urban traffic incident information to improve traffic speed prediction. There are two main challenges in our incident-driven traffic speed prediction problem. First, the impact of urban traffic incidents is complex and varies significantly across incidents. For example, incidents that occur in the wee hours or in remote areas have little impact on adjacent roads, while incidents during rush hours and in high-traffic areas (e.g., downtown) are very likely to affect the surrounding traffic flows or even cause congestion (Pan et al., 2012). Therefore, it is unreasonable to treat all urban traffic incidents equally for traffic speed prediction, which may even hurt prediction performance. Second, the impact of urban traffic incidents on adjacent roads is affected by external factors such as the incident occurrence time, the incident type and the road topology. We need to extract the latent impact features of traffic incidents on traffic flows to improve traffic speed prediction.

To tackle the first challenge, we propose a critical incident discovery method to quantify the impact of urban traffic incidents on traffic flows. We consider both the anomalous degree and the speed variation of adjacent roads to discover critical traffic incidents. Next, to tackle the second challenge, we propose a binary classifier that uses deep learning methods to extract the latent impact features of incidents. The impact of incidents varies in degree and is neither binary nor strictly multi-class, so we extract the latent impact features from the middle layer of the classifier, where the latent features are continuous and filtered. We adopt the Graph Convolutional Network (GCN) (Bruna et al., 2014) to capture the spatial features of road networks: GCN is known to effectively capture topology features in non-Euclidean structures, and the complex road network is a typical non-Euclidean structure. Combining the above methods, we propose a Deep Incident-Aware Graph Convolutional Network (DIGC-Net) to improve traffic prediction using traffic incident data. DIGC-Net can effectively leverage traffic incident, spatio-temporal, periodic and context features for prediction.

We test our framework using two real-world urban traffic datasets from San Francisco and New York City. Experimental results empirically answer the above-mentioned questions and reveal distinctly different spatio-temporal distributions of critical and non-critical incidents. We compare DIGC-Net with state-of-the-art methods; the results demonstrate the superior performance of our model and verify that the incident learning component is the key to the improvement in prediction performance.

Figure 1. Traffic illustration of SFO: (a) road network of SFO; (b) the congestion incident; (c) speed curves of three candidate flows.

We summarize our key contributions as follows:

  • To quantify the impact of traffic incidents on traffic speeds, we propose a critical incident discovery method and discover critical incidents in the city. We further explore the spatio-temporal distributions of critical and non-critical incidents and find noteworthy differences.

  • In order to extract the latent incident impact features, we design a binary classifier and extract the latent impact features from its middle layer. We use the binary classifier as an internal component of our final framework to improve traffic speed prediction.

  • We propose DIGC-Net to effectively incorporate incident, spatio-temporal, periodic and context features for traffic speed prediction. We conduct experiments using two real-world urban traffic datasets, and the results show that DIGC-Net outperforms competing benchmarks and that the incident learning component is the key to the improvement in prediction performance. Meanwhile, the incident learning component can be flexibly inserted into other models as a general-purpose module for learning incident impact features.

2. Preliminaries

Before diving into details of the model, we begin with some preliminaries on our datasets and problem formulation in this section.

2.1. Datasets

We utilize two datasets, a traffic dataset and an attribute dataset (weather data). The traffic dataset consists of road network, speed and incident sub-datasets from two major metropolitan areas, San Francisco (SFO) and New York City (NYC), which have complex traffic conditions and varying physical features that may affect latent traffic patterns (Xie et al., 2018). We collect the weather dataset using the Yahoo Weather API (Yahoo, 2019); its fields include weather type, temperature and sunrise time. We collect the traffic dataset from a public API, HERE Traffic (Here, 2019). 1) Road Network: We set lat/lng bounding boxes (Figure 1(a)) on the two cities, SFO (37.707,-122.518/37.851,-122.337) and NYC (40.927,-74.258/40.495,-73.750), to gather the internal road networks. 2) Traffic Speed: We collect the real-time traffic speed of each flow in the areas described above and record it every 5 minutes. 3) Traffic Incident: We also collect the traffic incident data in the same areas every 5 minutes. For each incident, we obtain incident features such as type and location.

Component: Datasets description
Critical Incident Discovery: traffic incident, road network and speed sub-datasets. The incident and speed data are from Apr. 17 to Apr. 24, 2019.
Impact Features Extraction: traffic incident, road network and speed sub-datasets. The incident and speed data are from Apr. 17 to Apr. 24, 2019.
DIGC-Net: traffic incident, road network, speed sub-datasets and the weather dataset. The incident and speed data are from Apr. 4 to May 2, 2019 (4 weeks).
Table 1. Overall datasets

Flow.    The real-time speeds in different segments of a single road are distinct, and HERE divides every road into multiple segments. We denote one road segment as one flow. Every flow has a speed at each time slot, and we use the flow as the smallest unit of the road network.

2.2. Problem Formulation and Preprocessing

First, we denote a road network as an undirected graph G = (V, E), where each node represents an intersection or a split point on a road, and each edge represents a road segment.

Reconstruction of the road network.    As our task is to predict the speed of every road segment, we use the road segment as the node. More specifically, we use every flow as one node to build the road network: if two flows have a point of intersection, we add an edge to connect the corresponding nodes. Therefore, we build a new road network graph G' = (V', E'), where each node represents a flow and each edge represents an intersection of two flows or a split point on a flow. There are 2,416 nodes and 19,334 edges for SFO, and 13,028 nodes and 92,470 edges for NYC. We use this rebuilt road network graph in the rest of the paper.

Problem formulation.    We use $v_i^t$ to represent the speed of flow $f_i$ at time slot $t$. Every speed snapshot of the road network gives a vector over all flows $V_t = (v_1^t, \dots, v_N^t)$, where $N$ is the total number of flows. Given the rebuilt road graph G' and a T-length historical real-time speed sequence of all flows, our task is to predict the future speeds of every flow in the city, i.e., $\hat{V}_{t+1}, \dots, \hat{V}_{t+k}$, where $k$ is the prediction length. We are also given a set of urban traffic incidents that occur close to the predicted time $t$, more specifically, incidents that occur within a time range whose endpoints are the earliest and latest included incident occurrence times. We extract features of the impact of these incidents on traffic flows to improve speed prediction performance.
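As a quick illustration of this formulation, the sketch below shows the assumed data layout; the concrete array sizes are illustrative only and are not taken from the paper.

```python
import numpy as np

# Minimal illustration of the data layout in the problem formulation:
# a T-length history of speed snapshots over all N flows is used to
# predict the next k snapshots. The values below are illustrative.
T, N, k = 12, 2416, 1                 # history length, number of flows (SFO has 2,416), prediction steps
history = np.zeros((T, N))            # V_{t-T+1}, ..., V_t
prediction = np.zeros((k, N))         # \hat{V}_{t+1}, ..., \hat{V}_{t+k}
```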

3. Urban Critical Incident Discovery

The impact of urban traffic incidents is complex and is also influenced by other factors such as the topological structure of the urban road network, temporal features and the incident type. Treating all urban traffic incidents equally would add noise to the traffic speed prediction process. In this section, we focus on analyzing the impact of different urban traffic incidents and introduce our urban critical incident discovery methodology.

3.1. Methodology

Case Study: A Congestion Incident.    Figure 1(b) presents a congestion incident that occurred at 06:32 am on Apr. 17, 2019 in San Francisco. The marked point is the center of the incident, and we set a radius $r$ to represent the impact range; the circle with this center and radius $r$ stands for the region affected by the incident. We define that if the center of a flow falls inside the circle, the flow might be affected by the incident. The circle in Figure 1(b) presents the affected region for the chosen $r$. The blue, red and green lines represent three flows in San Francisco that might be affected by the incident. The speed curves of the three candidate flows are shown in Figure 1(c). We observe that during 6:00 am - 7:00 am, the speeds of two of the flows show a sharp reduction, while the variation of the third is relatively slight, although it still becomes choppier after the incident occurs.

Next, we analyze whether each candidate flow is truly affected by the incident. We use a variant of the method proposed in (Zhang et al., 2018) to compute the anomalous degree of each flow. That method divides the city area into several grid regions and computes the anomalous degree of each region to detect urban anomalies. The key idea is to compare a region with its historically similar regions in the city: a sudden drop in the speed similarity between a region and its historically similar regions indicates the occurrence of an urban anomaly, and well-designed experiments have verified the effectiveness of this detection method. In our problem, we use each flow as the unit rather than a grid region.

      Definition 1 (Pair-wise Similarity of Flows).

Given two flows $f_i$ and $f_j$ with speeds $v_i^t$ and $v_j^t$ at time slot $t$, for a time window of length $w$, the pair-wise similarity is calculated by:

$s_{i,j}^t = \mathrm{PCC}\big(\langle v_i^{t-w+1}, \dots, v_i^t \rangle, \langle v_j^{t-w+1}, \dots, v_j^t \rangle\big)$   (1)

where $\mathrm{PCC}(\cdot, \cdot)$ calculates the Pearson correlation coefficient (Lin, 1989) of two speed sequences. Then the similarity matrix of all flows at time slot $t$ is:

$S^t = \big[ s_{i,j}^t \big]_{N \times N}$   (2)

where $N$ is the total number of flows in the city.
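As a concrete illustration of Definitions 1-2, the following sketch computes the pair-wise similarity matrix with NumPy; the array layout (time slots as rows, flows as columns) is an assumption of this example.

```python
import numpy as np

def pairwise_similarity(speeds, t, w):
    """Pair-wise flow similarity matrix S^t (Eqs. 1-2), a minimal sketch.

    speeds : (T, N) array, speed of each of the N flows at every time slot
    t      : index of the current time slot
    w      : length of the look-back window
    Returns an (N, N) matrix of Pearson correlation coefficients between the
    w-length speed sequences of every pair of flows.
    """
    window = speeds[t - w + 1 : t + 1]      # (w, N) speed sequences in the window
    return np.corrcoef(window.T)            # rows are variables, hence the transpose
```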

      Definition 2 (Similarity Decrease Matrix (SD)).

Similar to (Zhang et al., 2018), we define the similarity decrease matrix $SD^t$, which represents the decreased similarity of each flow pair from time slot $t-1$ to $t$, computed element-wise as $SD^t = \max(S^{t-1} - S^t, 0)$. Entries below zero are set to zero because we only consider the case where the similarity goes down.

      Definition 3 (Anomalous Degree (AD)).

We then use the similarity matrix $S^t$ and the similarity decrease matrix $SD^t$ to compute the anomalous degree of flows at time slot $t$. We use a threshold parameter $\delta$ to capture historically similar flows: when the similarity of two flows is equal to or greater than $\delta$, we define them as historically similar. Given a flow $f_i$ at time slot $t$, its set of historically similar flows is denoted as $\mathcal{H}_i^t$. Pair-wise similarity is computed by the Pearson correlation coefficient (PCC), and a PCC in [0.5, 0.7] indicates that variables are moderately correlated according to (Rumsey, 2015). Therefore, we set $\delta = 0.5$ here to select historically similar flows that are at least moderately similar to flow $f_i$. The anomalous degree of flow $f_i$ at time slot $t$ is calculated by:

$AD_i^t = \frac{1}{|\mathcal{H}_i^t|} \sum_{f_j \in \mathcal{H}_i^t} SD_{i,j}^t$   (3)

i.e., the decrease in speed similarity between $f_i$ and its historically similar flows.
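The sketch below strings Definitions 2-3 together; it assumes historical similarity is judged on the previous similarity matrix and that the anomalous degree is the average similarity decrease over the historically similar set, matching the reconstruction of Eq. 3 above.

```python
import numpy as np

def anomalous_degree(S_prev, S_curr, delta=0.5):
    """Anomalous degree AD^t of every flow (Defs. 2-3), a minimal sketch.

    S_prev, S_curr : (N, N) similarity matrices at time slots t-1 and t
    delta          : threshold for selecting historically similar flows
    """
    SD = np.maximum(S_prev - S_curr, 0.0)       # similarity decrease matrix (Def. 2)
    similar = S_prev >= delta                   # historically similar flow pairs
    np.fill_diagonal(similar, False)            # a flow is not compared with itself
    counts = np.maximum(similar.sum(axis=1), 1) # avoid division by zero
    return (SD * similar).sum(axis=1) / counts  # average decrease over the similar set
```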

Figure 2. AD and RSV of the three candidate flows: (a) anomalous degree; (b) relative speed variation rate.

Local Anomalous Degree Algorithm.    The time complexity of computing the similarity matrix is $O(N^2 \cdot w)$, where $N$ is the number of flows and $w$ is the length of the historical speed sequences. For cities with complex traffic road networks such as New York City (13,028 flows), computing the similarity matrix $S$, the similarity decrease matrix $SD$ and the anomalous degree $AD$ is expensive. We therefore propose a local anomalous degree algorithm, based on spectral clustering (Yu and Shi, 2003), to speed up our method. Spectral clustering is able to identify spatial communities of nodes in graph structures. Following several studies (Tong et al., 2017; Yao et al., 2018; Zheng and Ni, 2013) that assume traffic in nearby locations should be similar, we also assume that flows in the same community, i.e., in spatially nearby regions, will be historically similar. Given the graph G', we perform spectral decomposition to obtain graph spatial features of each flow, and then use K-means (Dhillon et al., 2004), a common unsupervised clustering method, to cluster the flows into classes.

Input: road graph G' = (V', E')
  1. Compute the adjacency matrix A, the degree matrix D, and the normalized Laplacian matrix L.
  2. Compute the first k eigenvectors of L.
  3. Let U be the resulting feature matrix of all nodes in the graph; for each node, take its corresponding row of U as its spatial feature.
  4. Use the K-means method to cluster the nodes into classes (cluster labels).
  5. Compute the local similarity matrix and the local similarity decrease matrix within each cluster.
  6. Compute the local anomalous degree within each cluster.
ALGORITHM 1 Local Anomalous Degree Algorithm
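A compact sketch of steps 1-4 of Algorithm 1 is given below; it uses SciPy's graph Laplacian and scikit-learn's K-means, and the eigenvector and cluster counts are illustrative (the paper reports 10 districts for NYC).

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import KMeans

def cluster_flows(adj, k=10, n_clusters=10):
    """Spectral clustering of the flow graph (Algorithm 1, steps 1-4), a sketch.

    adj        : (N, N) adjacency matrix of the rebuilt road graph G'
    k          : number of leading eigenvectors used as spatial features
    n_clusters : number of local districts (NYC is divided into 10 in the paper)
    """
    L = laplacian(adj, normed=True)              # step 1: normalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)               # step 2: eigen-decomposition (ascending order)
    U = eigvecs[:, :k]                           # step 3: spatial feature of each node
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(U)  # step 4: cluster labels
```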

Validation of Local Algorithm.    Figure 3 shows the clustering result for NYC (clusters are marked by different colors). The result shows that the eigenvectors can effectively capture spatial graph features: our method divides New York City into 10 local districts that conform to real-world urban districts, e.g., the red area corresponds to Manhattan. We then only need to compute the local values of $S$, $SD$ and $AD$ within the same district.

Next, different from anomaly detection, we aim to explore the impact of different urban traffic incidents on traffic flows. Taking Figure 1(b) as an example again, a flawed scenario arises: the three candidate flows are historically similar to each other at time slot $t$, so the sharp variations of the two strongly affected flows also influence the anomalous degree of the third. Figure 2(a) shows their anomalous degrees from 4:00 am to 12:00 pm. Near 06:32 am, the mildly affected flow actually has a higher anomalous degree (0.198) than the two sharply affected flows (0.110 and 0.085). However, Figure 1(c) intuitively shows that, close to 06:32 am, the anomalous speed variations of the two sharply affected flows are more striking. The reason for this diametrically opposite result is that after the incident, the anomalous changes of the two strongly affected flows follow very similar tendencies, which leads to their low anomalous degrees. Therefore, to handle this scenario, we add another metric to amend our discovery method.

      Definition 4 (Relative Speed Variation (RSV)).

Given a flow $f_i$ at time slot $t$ and the historical speed sequence of $f_i$ in a $w$-length time window, we define the relative speed variation of $f_i$ as

(4)

We define a normalization time window and use the maximum value observed in that window to normalize the RSV. We use 24 hours (288 five-minute intervals) as the normalization window length.

Validation of RSV.    As a heuristic approach, we test different candidate computing methods of RSV as baselines for validation. We consider three related features: the slope of speed variation (Viovy et al., 1992), the recent speed and the historical average speed (Boriboonsomsin et al., 2012), corresponding to three candidate computing methods of RSV. They are listed as follows:

  • Consider all three features, with two parameters that control the ratio of the recent speed and the historical average speed, together with the historical average slope and the slope between adjacent time slots.

  • Consider the recent speed and the historical average speed.

  • Consider only the historical average speed.

We use the normalization term to normalize the three computing methods. We use the Pearson correlation coefficient to calculate the correlation between AD and each candidate RSV over all urban traffic incidents in our dataset (an hour before and after each incident). In order to use RSV to amend AD, we choose the most negatively correlated computing method as our RSV (the two ratio parameters are set to 0.5), i.e., the method that only considers the historical average speed. Figure 2(b) shows the result for the congestion incident. Near 06:32 am, in contrast to AD, the maximum RSV values of the two sharply affected flows (0.377 and 0.333) are both larger than that of the third flow. This conforms to the speed variation in Figure 1(c) and indicates that RSV can also capture anomalies well and effectively corrects the flaw of AD.
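Since the exact formula of the chosen RSV variant is not reproduced above, the sketch below encodes one plausible reading of it, assuming RSV compares the current speed against the historical average and is normalized by the maximum deviation seen in the 24-hour normalization window.

```python
import numpy as np

def relative_speed_variation(speeds, t, norm_window=288):
    """Relative speed variation (Def. 4), a sketch of one plausible reading.

    Assumes RSV_i^t = |mean_hist_i - v_i^t| normalized by the maximum deviation
    observed in a 24-hour (288-interval) window; this formula is an assumption,
    not the paper's exact definition.

    speeds : (T, N) array of flow speeds
    """
    window = speeds[t - norm_window + 1 : t + 1]   # normalization window
    hist_mean = window.mean(axis=0)                # historical average speed of each flow
    norm = np.abs(window - hist_mean).max(axis=0) + 1e-8
    return np.abs(hist_mean - speeds[t]) / norm
```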

Figure 3. Clusters of NYC
Figure 4. Number of critical incidents when varying the parameters: (a) SFO; (b) NYC
      Definition 5 (Incident Effect Score (IES)).

Due to the complementarity of the anomalous degree AD and the relative speed variation RSV, we combine both of them to compute the incident effect score. Given a flow $f_i$ at time slot $t$, the incident effect score is calculated by:

$IES_i^t = \gamma \cdot AD_i^t + (1 - \gamma) \cdot RSV_i^t$   (5)

where $\gamma$ is a parameter that controls the ratio of AD and RSV.

      Definition 6 (Critical Incident).

For incidents like mega-events, traffic flows might be affected before the incident begins; on the contrary, incidents like traffic collisions begin to affect traffic flows only after they occur. Therefore, given an incident $e$ with start time $t_e$, we first set a T-length "start to influence" window around $t_e$ and define the set of flows that are highly affected by $e$ as those whose incident effect score reaches a threshold parameter $\theta$ within the window.

When this set is non-empty, i.e., there is at least one flow highly affected by $e$, we call $e$ a critical incident. We define an incident that is not a critical incident as a non-critical incident.
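Putting Definitions 5-6 together, the following sketch checks whether an incident is critical; the weight gamma and the threshold theta are placeholder values, not the paper's settings.

```python
def is_critical(AD_win, RSV_win, gamma=0.5, theta=0.1):
    """Critical incident check (Defs. 5-6), a minimal sketch.

    AD_win, RSV_win : (T_win, N) anomalous degree / relative speed variation of
                      all flows over the incident's "start to influence" window
    gamma, theta    : placeholder values for the IES weight and the threshold
    """
    IES = gamma * AD_win + (1.0 - gamma) * RSV_win    # Eq. 5 at every slot in the window
    highly_affected = (IES >= theta).any(axis=0)      # flows exceeding the threshold at some slot
    return bool(highly_affected.any())                # critical iff at least one such flow exists
```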

3.2. Evaluation and Results

Parameter Setting.    The datasets we use here are listed in Table 1. We set the parameters as discussed below and use one hour as the length of the "start to influence" time window.

Varying the parameters.    Figure 4 shows the number of critical incidents discovered when varying the IES ratio parameter and the threshold. In SFO, when the threshold is low, most incidents are discovered as critical (1,706 out of 1,832 on average), which indicates that most incidents indeed have an impact on traffic flows. A small number of incidents have almost no impact (6.9% and 12.2% under two settings), which further shows that treating all traffic incidents equally for traffic speed prediction is unreasonable. When the threshold rises (0.10, 0.15 or 0.20), there is a sharp reduction in critical incidents, which indicates that the impact of incidents varies in degree. In order to discover incidents with high impact, we choose correspondingly strict parameter values for SFO. The results for NYC are similar to SFO: most incidents are discovered as critical when the threshold is set to 0 or 0.05, and reductions also appear as it rises. We choose the parameter values for NYC accordingly.

Figure 5. Spatial and temporal distributions of urban traffic incidents: (a) spatial, SFO; (b) temporal, SFO; (c) spatial, NYC; (d) temporal, NYC.

Spatial Distributions.    Figures 5(a) and 5(c) show the spatial distributions of incidents in SFO and NYC. An incident is plotted as a line between its origin and end points. In SFO, although most incidents of both types occur on main roads (the continuous parts), our method can still effectively discover critical incidents elsewhere (green circles). Moreover, when we inspect the critical incidents in the green circles, we find they are mostly of the Event type, which HERE records with a high severity level. In NYC, incidents of both types also gather on main roads. The number of urban traffic incidents in NYC is far larger than in SFO, but we can still observe differences. Critical incidents that did not occur on main roads are mainly located in Manhattan (the middle circle). In the left circle, we find that most incidents far from the city center are discovered as non-critical.

Temporal Distributions.    Figures 5(b) and 5(d) show the temporal distributions of critical and non-critical incidents in the two cities. In SFO, incidents mostly occur during rush hours (7-9 am and 4-7 pm), which is in line with daily routines. At about 12 pm (noon) and 3 pm on weekdays, the ratio of critical incidents drops while the ratio of non-critical incidents rises, which might be because both times are outside rush hours, so incidents are less likely to have a high impact. On weekends during mid-afternoon, there is also a drop in critical incidents and a rise in non-critical ones. We also find that incidents are more likely to occur in the early morning on weekends than on weekdays. In NYC, most critical incidents also occur during rush hours. Incidents that occur in the early morning tend to be non-critical in both cities. On weekends, NYC has only one incident peak (mid-afternoon); on weekdays, NYC does not have the mid-afternoon peak while SFO does.

Summary of Results.    The two parameters represent the threshold for discovering urban incidents with a high impact on traffic speeds: the lower they are, the lower the bar for marking an incident as critical. The results of varying them show that some urban incidents have almost no impact on traffic speeds and that the impact of urban incidents varies in degree, which indicates that it is unreasonable to use the features of all urban traffic incidents for traffic speed prediction. The spatio-temporal distributions show noteworthy differences between urban critical and non-critical incidents, which indicates that our urban critical incident discovery method can effectively discover incidents with a high impact on traffic speeds.

4. Extract the Latent Incident Impact Features

So far, we have shown that our discovery method can effectively discover urban critical and non-critical incidents. In this section, we propose to use deep learning methods to extract the latent incident impact features for traffic speed prediction. Taking two aspects into account, we design a binary classifier to extract the latent impact features:

  • Some urban incidents have almost no impact on traffic flows, and the features of low-impact incidents would even bring noise to the model. There are also noteworthy differences in the spatio-temporal features of critical and non-critical incidents, which inspires us to formulate a binary classification problem.

  • The impact of urban incidents on traffic speeds varies in degree and is neither binary nor strictly multi-class. Therefore, we should not use the binary result directly; instead, we extract the latent impact features from the middle layer of the binary classifier for traffic speed prediction, where the latent features are continuous and filtered.

4.1. Methodology

Figure 6. The architecture of the binary classifier

The task of the binary classifier is to predict whether an incident is critical or non-critical, i.e., whether an urban incident has a high or low impact on traffic speed. Considering that the impact of incidents is related to spatio-temporal and context features, and following previous works (Lv et al., 2018; Yao et al., 2018; Zhang et al., 2016) that use spatio-temporal and context features for traffic prediction (discussed in Section 6), our classifier consists of three components: a spatial learning component (GCN), a temporal learning component (LSTM) and a context learning component.

Spatial Learning: GCN (Figure 6(a)).    City road networks have latent traffic patterns and complex spatial dependencies (Li et al., 2018). We need to capture the road topological features, i.e., the spatial dependencies of the road network. Traditional methods divide the city into several grids and use a Convolutional Neural Network (CNN) to capture spatial features (Yao et al., 2018; Zhang et al., 2016). However, this neglects the road topological features and loses the spatial information within grids; moreover, graph-structure-related features are hard to use in a CNN for our problem. We therefore adopt the graph convolutional network (GCN) (Bruna et al., 2014) to learn the spatial topology features. GCN is known for being able to capture topology features in non-Euclidean structures, which makes it suitable for road networks. The GCN model follows the layer-wise propagation rule (Kipf and Welling, 2017):

$H^{(l+1)} = \sigma\big( \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)} \big)$   (6)

where $A$ is the adjacency matrix, $\tilde{A} = A + I_N$ is the adjacency matrix of the graph with added self-connections, $\tilde{D}$ is the degree matrix with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, and $\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ is the normalized Laplacian matrix of the graph. $\sigma(\cdot)$ denotes an activation function, $W^{(l)}$ is the trainable weight matrix and $H^{(l)}$ is the matrix of activations in the $l$-th layer, with $H^{(0)} = X$, where $X$ is the input feature matrix of the GCN.

We use the above-mentioned graph G'. At each time slot $t$, we obtain the real-time speed of every flow in G' and define the speed snapshot $V_t \in \mathbb{R}^{N}$, where $N$ is the total number of flows in the city. We also add another graph-structure-related feature, the distance of each flow from the incident, because the impact of an incident on a flow is strongly correlated with distance (Tong et al., 2017; Yao et al., 2018; Zheng and Ni, 2013). We define the distance of a flow from the incident as the Euclidean distance between the flow center and the incident center. Therefore, at each time slot $t$, the input feature matrix has one speed channel and one distance channel per flow. For an urban traffic incident, the time span of the input speed snapshots is the T-length "start to influence" window associated with the incident start time, as defined in Section 3.

For an input signal with $C$ input channels (here, the speed and distance channels) and $F$ filters or feature maps, the spectral convolution maps as follows (Kipf and Welling, 2017):

$Z = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X \Theta$   (7)

where $\Theta \in \mathbb{R}^{C \times F}$ is a matrix of filter parameters, $Z$ is the convolved signal matrix and $F$ is the number of filters or feature maps. Next, at each time slot $t$, after $k$ graph convolutional (GC) layers, we feed the intermediate states into fully connected (FC) layers to get the spatial learning output of each snapshot.
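As an illustration of Eqs. 6-7, here is a minimal PyTorch sketch of one graph convolutional layer; the class and variable names are our own and the layer sizes are not taken from the paper.

```python
import torch
import torch.nn as nn

class GCLayer(nn.Module):
    """One graph convolutional layer in the spirit of Eqs. 6-7 (a sketch)."""

    def __init__(self, adj, in_channels, out_channels):
        super().__init__()
        a_tilde = adj + torch.eye(adj.size(0))           # A + I: add self-connections
        d_inv_sqrt = a_tilde.sum(dim=1).pow(-0.5)        # D^{-1/2} of the self-looped graph
        a_hat = d_inv_sqrt.unsqueeze(1) * a_tilde * d_inv_sqrt.unsqueeze(0)
        self.register_buffer("a_hat", a_hat)             # renormalized adjacency, kept fixed
        self.theta = nn.Linear(in_channels, out_channels, bias=False)  # filter parameters
        self.act = nn.ReLU()

    def forward(self, x):                                # x: (N, C) node feature matrix
        return self.act(self.theta(self.a_hat @ x))      # sigma(A_hat X Theta)
```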

Temporal Learning: LSTM (Figure 6(b)).    We feed a sequence of graph speed snapshots to the GCN and obtain a sequence of spatial features, one per time slot of the window. We then adopt the Long Short-Term Memory (LSTM) model (Hochreiter and Schmidhuber, 1997) as our temporal learning component. LSTM is known for being able to learn long-term dependencies in time-related sequences; it can remove or add information to the cell state through well-designed gate structures. We extract the spatial features of each snapshot with the GCN, feed the resulting sequence into LSTM cells and iteratively obtain the output sequence. We use the output of the last LSTM cell as the output of the temporal learning part.
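A sketch of how the spatial and temporal components fit together is given below: the GCLayer from the previous sketch is applied per snapshot and an LSTM runs over the resulting sequence. The dimensions are illustrative, and the flattening step is an assumption about how per-node features are summarized per snapshot.

```python
import torch
import torch.nn as nn

class SpatioTemporalEncoder(nn.Module):
    """Spatial + temporal learning of the classifier (Fig. 6(a)-(b)), a sketch.

    Relies on the GCLayer class from the previous sketch.
    """

    def __init__(self, adj, in_channels=2, gcn_dim=64, lstm_dim=64):
        super().__init__()
        self.gcn = GCLayer(adj, in_channels, gcn_dim)
        self.fc = nn.Linear(adj.size(0) * gcn_dim, gcn_dim)   # summarize a snapshot into one vector
        self.lstm = nn.LSTM(input_size=gcn_dim, hidden_size=lstm_dim, batch_first=True)

    def forward(self, snapshots):                 # (T, N, C): one snapshot per slot in the window
        feats = [self.fc(self.gcn(x).flatten()) for x in snapshots]
        seq = torch.stack(feats).unsqueeze(0)     # (1, T, gcn_dim)
        out, _ = self.lstm(seq)
        return out[:, -1]                         # output of the last LSTM cell
```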

Context Learning (Figure 6(c)).    Incident context features are also important for prediction. We use the following features for context learning:

  • Incident type (e.g., traffic collision and event).

  • Road status: whether the urban incident leads to a road closure or not.

  • Start and end hour: HERE gives a start time and an anticipated end time for each incident.

  • Incident duration: the anticipated duration of the incident, i.e., the difference between the anticipated end time and the start time.

  • Weekday, Saturday or Sunday.

We use one-hot encoding to preprocess the categorical features and normalize the incident duration feature.

The context learning component is a Deep Neural Network (DNN) structure, more specifically an input layer and a fully connected layer (shown in Figure 6(c)). After embedding the context information, we feed the context embedding to a fully connected layer to obtain the context feature vector, which is the output of context learning.
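The context component is small; the sketch below assumes the categorical features have already been one-hot encoded and concatenated with the normalized duration, and the output width is illustrative.

```python
import torch
import torch.nn as nn

class ContextLearner(nn.Module):
    """Context learning component (Fig. 6(c)), a minimal sketch."""

    def __init__(self, context_dim, out_dim=64):
        super().__init__()
        self.fc = nn.Linear(context_dim, out_dim)   # single fully connected layer

    def forward(self, context):                     # context: one-hot features + normalized duration
        return torch.relu(self.fc(context))
```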

Latent incident impact features extraction.    After obtaining the context feature and the spatio-temporal feature, we concatenate them into a single representation of each incident and feed it into FC layers. We extract the output of the last FC layer before the output layer as the latent incident impact features, because the output layer uses exactly these features as input to predict whether the incident has a high impact on traffic flows. Finally, we obtain the predicted label and compute the loss against the ground-truth label.

Objective Function and Evaluation Metric.    The classifier is trained by minimizing the Binary Cross-Entropy Loss (BCELoss) between the predicted label and the ground-truth label. BCELoss is defined as follows:

$\mathcal{L}_{BCE} = -\frac{1}{n} \sum_{i=1}^{n} \big[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \big]$   (8)

where $\hat{y}_i$ is the predicted probability that incident $i$ is critical and $y_i$ is its ground-truth label. We use BCELoss and the F1-score to evaluate the binary classifier.

4.2. Middle Experiments

Parameter Setting.    The datasets we use here are listed in Table 1. We use the discovery results obtained in the last section as the ground truth. There are 1,061 positive samples (critical) and 771 negative samples (non-critical) for SFO, and 17,924 positive samples and 15,367 negative samples for NYC. We use 5 minutes as the time interval and train our classifier with the following hyper-parameter settings: a learning rate of 0.001 with the Adam optimizer. In the GCN, we set two GCN layers followed by one FC layer with a 64-dimensional output. The length of the "start to influence" window is set to one hour, i.e., the input size of the first GCN layer is 12. We use the ReLU activation function and add dropout in the GC layers. We use one LSTM layer with 64-dimensional hidden states. After concatenation, we adopt one 16-dimensional FC layer followed by the output FC layer with a sigmoid activation function. We use 70% of the data for training and the remaining 30% as the test set. Within the training set, we select 90% for training and 10% as the validation set for early stopping.

Results and Analysis.    Using the traffic incident and traffic speed sub-datasets for training, we finally obtain an F1-score of 0.8241 and a BCELoss of 0.4429 on the SFO test set, and an F1-score of 0.8000 and a BCELoss of 0.4731 on NYC. Our binary classifier can capture the latent impact features of different incidents on traffic flows; more specifically, we obtain an embedding of each input incident. The outputs of context learning and spatio-temporal learning are concatenated and fed into FC layers, and the latent impact features are extracted before the output layer. We will use the binary classifier in the next section as an internal component to help improve traffic speed prediction performance. Since the classifier serves as middleware of our incident-driven framework, we further evaluate the complete framework against competitive baselines in the next section.

5. Incident-driven Traffic Speed Prediction

So far, we can effectively capture the latent impact features of urban incidents on traffic flow speeds. Combining the above methods, we propose the Deep Incident-Aware Graph Convolutional Network (DIGC-Net) to improve traffic speed prediction using urban incident data.

Figure 7. The architecture of DIGC-Net

5.1. Methodology

DIGC-Net (Figure 7) consists of three components: spatio-temporal, incident and periodic learning. Our prediction problem is defined in Section 2.

Spatio-temporal Learning (Figure 7(a)).    Considering that traffic speed prediction is also related to the spatio-temporal patterns of the traffic network, and following previous works (Lv et al., 2018; Yao et al., 2018; Zhang et al., 2016) that use spatio-temporal features for traffic prediction (discussed in Section 6), we use a spatial and temporal learning structure similar to that of the binary classifier. The spatio-temporal plus context structure is commonly used in traffic prediction; here we use a GCN rather than a CNN to better capture the spatial features of the road network. The GCN captures spatial graph features and the LSTM captures the temporal evolution patterns of traffic speeds. The input feature of each node in the GCN is the speed of the corresponding flow at time slot $t$; more specifically, the input is the graph speed snapshot $V_t$. We feed a sequence of graph speed snapshots into the GCN and, similar to (Yao et al., 2018), concatenate the weather context at each time slot after the GCN part. We then feed the spatial feature sequence into LSTM cells to iteratively obtain the output sequence, and use learnable units to predict the future traffic speeds. The result is the output of spatio-temporal learning.

Incident Learning (Figure 7(b)).    To predict the traffic speed at time slot $t$, we select all incidents that occurred within the last two hours before $t$ as the incident learning inputs. We use the pre-trained binary classifier (trained in the last section) to extract the latent incident impact features of each incident. Because the number of incidents occurring within this time range is uncertain and incidents occur in a sequential order, we adopt a standard Recurrent Neural Network (RNN) (Mikolov et al., 2010) for incident learning. An RNN is a neural network that contains loops that allow information to be persisted. Previous incidents affect the traffic conditions, which may in turn lead to future incidents; using an RNN also helps us capture the interrelation of sequentially occurring urban traffic incidents, which is neglected by previous works (Lin et al., 2017). We take the output of the last RNN cell as the output of incident learning.
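The incident learning component can be sketched as follows; the feature and hidden sizes follow the dimensions reported in Sections 4.2 and 5.2, and the handling of an empty incident window is an assumption of this example.

```python
import torch
import torch.nn as nn

class IncidentLearner(nn.Module):
    """Incident learning component (Fig. 7(b)), a minimal sketch."""

    def __init__(self, impact_dim=16, hidden_dim=128):
        super().__init__()
        self.rnn = nn.RNN(input_size=impact_dim, hidden_size=hidden_dim, batch_first=True)

    def forward(self, impact_feats):           # (num_incidents, impact_dim), in time order,
                                               # extracted by the pre-trained binary classifier
        if impact_feats.size(0) == 0:          # no incidents in the last two hours (assumed handling)
            return torch.zeros(1, self.rnn.hidden_size)
        out, _ = self.rnn(impact_feats.unsqueeze(0))
        return out[:, -1]                      # output of the last RNN cell
```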

Periodic Learning (Figure 7(c)).    Traffic flow speeds change periodically, and we use a structure similar to (Lv et al., 2018) to learn long-term periodic patterns. We use the same time slots in the last 5 days to learn the periodic features, and a fully connected layer is adopted to capture them. Its output is the output of periodic learning.

Output.    After obtaining the spatio-temporal features, the incident impact features and the periodic features, we concatenate them and feed the result into FC layers. Finally, we obtain the predicted speeds and compute the loss against the real values.
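The fusion head can be sketched as below; the inputs are assumed to be the outputs of the three components described above, the 256-dimensional hidden layer follows Section 5.2, and the rest is illustrative.

```python
import torch
import torch.nn as nn

class DIGCNetHead(nn.Module):
    """Fusion of the three DIGC-Net components (Fig. 7), a minimal sketch."""

    def __init__(self, st_dim, incident_dim, periodic_dim, num_flows, k=1, hidden=256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(st_dim + incident_dim + periodic_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_flows * k),        # speeds of all flows for k future steps
        )

    def forward(self, st, incident, periodic):
        fused = torch.cat([st, incident, periodic], dim=-1)
        return self.head(fused)
```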

Objective Function and Evaluation Metric.    DIGC-Net is trained by minimizing the Mean Squared Error (MSE) between the predicted speed and the real value. We use the Mean Absolute Percentage Error (MAPE) to evaluate DIGC-Net; MAPE is defined as follows:

$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\hat{v}_i - v_i}{v_i} \right| \times 100\%$   (9)

where $n$ is the total number of flows, $\hat{v}_i$ is the predicted speed of flow $f_i$ and $v_i$ is its real speed.
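For completeness, a one-line implementation of Eq. 9:

```python
import numpy as np

def mape(pred, real):
    """Mean Absolute Percentage Error (Eq. 9)."""
    return float(np.mean(np.abs((pred - real) / real)) * 100.0)
```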

Method MAPE-SFO MAPE-NYC
ARIMA 26.70% 38.60%
SVR 28.24% 39.73%
LSTM 18.98% 30.26%
GC 15.69% 25.79%
LSM-RN 13.72% 21.53%
LC-RNN 12.26% 18.77%
DIGC-Net 11.02% 17.21%
Table 2. Evaluation among different methods

5.2. Evaluations

Parameter Setting.    The datasets we use here are listed in Table 1. We set 5 minutes as the time interval and the historical time window to 4 hours. We train our network with the following hyper-parameter settings: a learning rate of 0.001 with the Adam optimizer. In spatio-temporal learning, we set two GCN layers followed by one 64-dimensional FC layer, and the input size of the first GCN layer is 64. We use the ReLU activation function and add dropout in the GCN layers. In incident learning, we use one RNN layer with a 128-dimensional hidden state. In periodic learning, we use one FC layer with a 64-dimensional hidden state. After the concatenation, we adopt one 256-dimensional FC layer connected to the final output layer, using the ReLU activation function in the FC layers. We use the first three weeks of data for training and the remaining week as the test set. Within the training data, we select 90% for training and 10% as the validation set for early stopping.

Comparison with competitive benchmarks.    We compare our model with the following models in consideration of covering traditional machine learning, matrix decomposition and state-of-the-art deep learning methods:

  • ARIMA (Contreras et al., 2003): Autoregressive Integrated Moving Average is a classic linear model in time series forecasting.

  • SVR (Smola and Schölkopf, 1997): Support Vector Regression is based on computing a linear regression in a high-dimensional feature space and is widely used.

  • LSTM (Ma et al., 2015): This method uses LSTM to capture non-linear traffic dynamic to predict traffic speed.

  • GC (Michaël et al., 2016): GC uses graph convolution, pooling and fully connected layers to predict future traffic speed. GC is a variant of the basic GCN with efficient pooling.

  • LSM-RN (Deng et al., 2016): the Latent Space Model for Road Networks learns the attributes of vertices in latent spaces, mainly using matrix decomposition. It also considers the spatio-temporal effects of latent attributes and uses an incremental online algorithm to predict traffic speed.

  • LC-RNN (Lv et al., 2018): LC-RNN takes advantage of both RNN and CNN models and designs a look-up operation to capture complex traffic evolution patterns. It outperforms ST-ResNet (Zhang et al., 2017) and DCNN (Ma et al., 2017), so we do not further compare against those two here.

Table 2 shows the MAPE results of the different methods on SFO and NYC; all benchmarks in the table perform one-step prediction. DIGC-Net achieves the best performance in both cities, with a MAPE relatively 10.11% to 60.97% lower than the benchmarks in SFO and relatively 8.31% to 56.68% lower in NYC. We also note significant variance between SFO and NYC among all methods, likely due to large differences in the traffic road networks (NYC is much larger than SFO: 13,028 vs 2,416 nodes and 92,470 vs 19,334 edges). The results indicate that DIGC-Net can effectively incorporate incident, spatio-temporal, periodic and context features for traffic speed prediction.

Comparison with variants of DIGC-Net.    We also compare different variants of DIGC-Net: only the spatio-temporal component; the spatio-temporal and periodic components; and the whole DIGC-Net with all components (spatio-temporal, periodic and incident). The results are shown in Table 3. The first finding is that the performance improvement from periodic learning is relatively weak, with a difference of only 0.25% for SFO and 0.06% for NYC. One possible reason why the improvement margin for SFO is larger than for NYC is that SFO has a relatively simple road network and its traffic speed variations are more regular. The MAPE without incident learning (spatio-temporal + periodic) is 12.22% for SFO and 18.63% for NYC, which still outperforms all benchmarks (slightly outperforming LC-RNN). This also verifies that our incident learning component is the key to the improvement, contributing a 1.20% MAPE improvement for SFO and a 1.42% MAPE improvement for NYC.

Variant MAPE-SFO MAPE-NYC
Spatio-temporal 12.47% 18.69%
Spatio-temporal + periodic 12.22% 18.63%
DIGC-Net-all (spatio-temporal + periodic + incident) 11.02% 17.21%
Table 3. Evaluation among different variants of DIGC-Net

Comparison over different time periods.    As shown in Figure 5(b) and Figure 5(d), the number of incidents varies over time, with more incidents occurring at traffic peak periods; meanwhile, traffic speed variation is also time-sensitive. Therefore, we further select 2:00 - 3:00 am as the wee hour and 07:00 - 08:00 am as the rush hour, and take SFO as an illustration to evaluate the performance of different methods. Figure 8 shows the MAPE results in the wee hour and the rush hour. In the wee hour, our method has a relatively 2.08% to 64.43% lower MAPE than the benchmarks in SFO, and in the rush hour a relatively 10.78% to 89.50% lower MAPE. The performance of our method and LC-RNN is quite similar in the wee hour but shows a relatively clear gap in the rush hour, which stems from the more complex traffic patterns during rush hour.

Figure 8. Time-sensitive comparison of SFO: (a) wee hour; (b) rush hour.

Comparison for multi-step prediction.    We then present the comparison results for multi-step prediction. DIGC-Net can be used for multi-step speed prediction by setting multiple learnable units in the spatio-temporal learning component. We set the prediction length k = 1, 2, 3 (speeds of the next 5, 10 and 15 minutes) to evaluate the multi-step prediction case. The results are shown in Table 4. The performance of DIGC-Net remains stable as the prediction length increases (a relative drop of 3.09% for k=2 and 5.44% for k=3 compared to k=1 in SFO, and 3.88% for k=2 and 9.03% for k=3 compared to k=1 in NYC). When the prediction length is within three steps, DIGC-Net outperforms all one-step baselines in SFO; in NYC, only the one-step LC-RNN outperforms the three-step DIGC-Net. The multi-step results demonstrate that our model can be effectively applied to multi-step prediction within a certain time range.

Method MAPE-SFO MAPE-NYC
DIGC-Net, k=1 11.02 % 17.27 %
DIGC-Net, k=2 11.36 % 17.94 %
DIGC-Net, k=3 11.62 % 18.83 %
Table 4. Evaluation for multi-step prediction

6. Related Work

Traffic Speed Prediction.    A number of solutions have been proposed for traffic speed prediction. ARIMA (Contreras et al., 2003) is a classical model in this area, and regression methods (Castro-Neto et al., 2009) are also widely used for predicting traffic speed. There are also matrix decomposition models for traffic speed prediction: (Deng et al., 2016) proposed a latent space model to capture both topological and temporal properties. Recently, deep learning approaches have achieved great success in this space by using spatio-temporal and context features (Lv et al., 2014; Ma et al., 2017); the spatio-temporal plus context structure is commonly used in traffic prediction. (Zhang et al., 2016) divided the road network into grids and used a CNN to capture spatial dependencies. (Lv et al., 2018) proposed a model that integrates both RNN and CNN models. GCNs have recently begun to be used for traffic speed prediction because of their ability to effectively capture topology features in non-Euclidean structures. (Li et al., 2018) proposed to model traffic flow as a diffusion process on a directed graph. (Yu et al., 2017) proposed the STGCN model to tackle the time series prediction problem in the traffic domain. In our work, we effectively incorporate traffic incident, spatio-temporal, periodic and weather features for traffic speed prediction; our main contribution is the effective utilization of incident information to improve prediction performance.

Urban Incidents.    Research on urban anomalous incidents mainly focuses on the detection of incidents. (Gu et al., 2016) mined tweet texts to extract incident information for traffic incident detection. (Zhang et al., 2018) proposed an SVM-based algorithm that captures rare patterns to detect urban anomalies. (Yuan et al., 2018) proposed a ConvLSTM model for traffic incident prediction. There are also a few works that focus on mining the impact of incidents. (Miller and Gupta, 2012) proposed a system for predicting the cost and impact of highway incidents, in order to classify the duration of incident-induced delays and the magnitude of the incident impact. (Javid and Javid, 2018) developed a framework to estimate the travel time variability caused by traffic incidents using a series of robust regression methods. In our work, we extract the latent incident impact features for traffic speed prediction.

7. Conclusion

In this work, we investigate the problem of incident-driven traffic speed prediction. We first propose a critical incident discovery method to identify crucial urban incidents and their impact on traffic flows. We then design a binary classifier to extract the latent incident impact features for improving traffic speed prediction. Combining both processes, we propose the Deep Incident-Aware Graph Convolutional Network (DIGC-Net) to effectively incorporate traffic incident, spatio-temporal, periodic and weather features for traffic speed prediction. We evaluate DIGC-Net using two real-world urban traffic datasets of large cities (SFO and NYC). The results demonstrate the superior performance of DIGC-Net and validate the effectiveness of extracting latent incident features in our framework.

References

  • Boriboonsomsin et al. (2012) Kanok Boriboonsomsin, Matthew J Barth, Weihua Zhu, and Alexander Vu. 2012. Eco-routing navigation system based on multisource historical and real-time traffic information. IEEE Transactions on Intelligent Transportation Systems 13, 4 (2012), 1694–1704.
  • Bruna et al. (2014) Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2014. Spectral networks and locally connected networks on graphs. In Proceedings of 2nd International Conference on Learning Representations (ICLR ’14).
  • Castro-Neto et al. (2009) Manoel Castro-Neto, Young-Seon Jeong, Myong-Kee Jeong, and Lee D Han. 2009. Online-SVR for short-term traffic flow prediction under typical and atypical traffic conditions. Expert systems with applications 36, 3 (2009), 6164–6173.
  • Contreras et al. (2003) Javier Contreras, Rosario Espinola, Francisco J Nogales, and Antonio J Conejo. 2003. ARIMA models to predict next-day electricity prices. IEEE transactions on power systems 18, 3 (2003), 1014–1020.
  • Deng et al. (2016) Dingxiong Deng, Cyrus Shahabi, Ugur Demiryurek, Linhong Zhu, Rose Yu, and Yan Liu. 2016. Latent Space Model for Road Networks to Predict Time-Varying Traffic. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16). 1525–1534.
  • Dhillon et al. (2004) Inderjit S. Dhillon, Yuqiang Guan, and Brian Kulis. 2004. Kernel K-means: Spectral Clustering and Normalized Cuts. In Proceedings of the 10th ACM SIGKDD international conference on Knowledge Discovery and Data Dining (KDD ’04). 551–556.
  • Gao et al. (2019) Ruipeng Gao, Xiaoyu Guo, Fuyong Sun, Lin Dai, Jiayan Zhu, Chenxi Hu, and Haibo Li. 2019. Aggressive driving saves more time? multi-task learning for customized travel time estimation. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI ’19). AAAI Press, 1689–1696.
  • Gu et al. (2016) Yiming Gu, Zhen(Sean) Qian, and Feng Chen. 2016. From Twitter to detector: Real-time traffic incident detection using social media data. Transportation Research Part C: Emerging Technologies. 67 (2016), 321–342.
  • He et al. (2019) Ya-qin He, Yu-lun Rong, Zu-peng Liu, and Sheng-pin Du. 2019. Traffic Influence Degree of Urban Traffic Accident Based on Speed Ratio. Journal of Highway and Transportation Research and Development (English Edition) 13, 3 (2019), 96–102.
  • Here (2019) Here. 2019. https://developer.here.com/.
  • Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jurgen Schmidhuber. 1997. Long Short-term Memory. Neural computation. (1997), 1735–1780.
  • Javid and Javid (2018) Roxana J Javid and Ramina Jahanbakhsh Javid. 2018. A framework for travel time variability analysis using urban traffic incident data. IATSS research 42, 1 (2018), 30–38.
  • Johnson et al. (2017) Isaac Johnson, Jessica Henderson, Caitlin Perry, Johannes Schöning, and Brent Hecht. 2017. Beautiful… but at What Cost?: An Examination of Externalities in Geographic Vehicle Routing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (Ubicomp ’17) 1, 2 (2017), 15.
  • Kipf and Welling (2017) Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of 5th International Conference on Learning Representations (ICLR ’17).
  • Li et al. (2018) Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. 2018. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In Proceedings of 6th International Conference on Learning Representations (ICLR ’18).
  • Li et al. (2017) Zhibin Li, Pan Liu, Chengcheng Xu, Hui Duan, and Wei Wang. 2017. Reinforcement learning-based variable speed limit control strategy to reduce traffic congestion at freeway recurrent bottlenecks. IEEE transactions on intelligent transportation systems 18, 11 (2017), 3204–3217.
  • Lin et al. (2017) Lu Lin, Jianxin Li, Feng Chen, Jieping Ye, and Jinpeng Huai. 2017. Road traffic speed prediction: a probabilistic model fusing multi-source data. IEEE Transactions on Knowledge and Data Engineering 30, 7 (2017), 1310–1323.
  • Lin (1989) Lawrence I-Kuei Lin. 1989. A Concordance Correlation Coefficient to Evaluate Reproducibility. Biometrics. 67 (1989), 255–268.
  • Lv et al. (2014) Yisheng Lv, Yanjie Duan, Wenwen Kang, Zhengxi Li, and Fei-Yue Wang. 2014. Traffic flow prediction with big data: a deep learning approach. IEEE Transactions on Intelligent Transportation Systems 16, 2 (2014), 865–873.
  • Lv et al. (2018) Zhongjian Lv, Jiajie Xu, Kai Zheng, Hongzhi Yin, Pengpeng Zhao, and Xiaofang Zhou. 2018. LC-RNN: A Deep Learning Model for Traffic Speed Prediction. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI ’18).
  • Ma et al. (2017) Xiaolei Ma, Zhuang Dai, Zhengbing He, Jihui Ma, Yong Wang, and Yunpeng Wang. 2017. Learning traffic as images: a deep convolutional neural network for large-scale transportation network speed prediction. Sensors 17, 4 (2017), 818.
  • Ma et al. (2015) Xiaolei Ma, Zhimin Tao, Yinhai Wang, Haiyang Yu, and Yunpeng Wang. 2015. Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transportation Research Part C: Emerging Technologies 54 (2015), 187–197.
  • Michaël et al. (2016) Defferrard Michaël, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of Neural Information Processing Systems. (NIPS ’16).
  • Mikolov et al. (2010) Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černockỳ, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Eleventh annual conference of the international speech communication association.
  • Miller and Gupta (2012) Mahalia Miller and Chetan Gupta. 2012. Mining Traffic Incidents to Forecast Impact. In Proceedings of the ACM SIGKDD International Workshop on Urban Computing (Urbcomp ’12).
  • Pan et al. (2012) Bei Pan, Ugur Demiryurek, and Cyrus Shahabi. 2012. Utilizing Real-World Transportation Data for Accurate Traffic Prediction. In Proceedings of the 2012 IEEE 12th International Conference on Data Mining (ICDM ’12).
  • Rathore et al. (2016) M Mazhar Rathore, Awais Ahmad, Anand Paul, and Seungmin Rhob. 2016. Urban planning and building smart cities based on the internet of things using big data analytics. Computer Networks 101 (2016), 63–80.
  • Rumsey (2015) Deborah J Rumsey. 2015. U Can: statistics for dummies. (2015).
  • Smola and Schölkopf (1997) Alex J. Smola and Bernhard Schölkopf. 1997. A Tutorial on Support Vector Regression. Statistics and Computing. (1997), 199–222.
  • Tong et al. (2017) Yongxin Tong, Yuqiang Chen, Zimu Zhou, Lei Chen, Jie Wang, Qiang Yang, Jieping Ye, and Weifeng Lv. 2017. The simpler the better: a unified approach to predicting original taxi demands based on large-scale online platforms. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, 1653–1662.
  • Viovy et al. (1992) N Viovy, O Arino, and AS Belward. 1992. The Best Index Slope Extraction (BISE): A method for reducing noise in NDVI time-series. International Journal of remote sensing 13, 8 (1992), 1585–1590.
  • Xie et al. (2018) Rong Xie, Yang Chen, Yu Xiao, and Xin Wang. 2018. We Know Your Preferences in New Cities: Mining and Modeling the Behavior of Travelers. IEEE Communications Magazine (2018), pages 28–35.
  • Yahoo (2019) Yahoo. 2019. https://developer.yahoo.com/weather/.
  • Yao et al. (2019) Huaxiu Yao, Xianfeng Tang, Hua Wei, Guanjie Zheng, and Zhenhui Li. 2019. Revisiting spatial-temporal similarity: A deep learning framework for traffic prediction. In AAAI Conference on Artificial Intelligence (AAAI ’19).
  • Yao et al. (2018) Huaxiu Yao, Fei Wu, Jintao Ke, Xianfeng Tang, Yitian Jia, Siyu Lu, Pinghua Gong, Jieping Ye, and Zhenhui Li. 2018. Deep multi-view spatial-temporal network for taxi demand prediction. In Thirty-Second AAAI Conference on Artificial Intelligence (AAAI’ 18).
  • Yu et al. (2017) Bing Yu, Haoteng Yin, and Zhanxing Zhu. 2017. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv preprint arXiv:1709.04875 (2017).
  • Yu and Shi (2003) Stella X. Yu and Jianbo Shi. 2003. Multiclass spectral clustering. In Proceedings Ninth IEEE International Conference on Computer Vision (ICCV ’03).
  • Yuan et al. (2018) Zhuoning Yuan, Xun Zhou, and Tianbao Yang. 2018. Hetero-ConvLSTM: A deep learning approach to traffic accident prediction on heterogeneous spatio-temporal data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’18). 984–992.
  • Zhang et al. (2018) Huichu Zhang, Yu Zheng, and Yong Yu. 2018. Detecting urban anomalies using multiple Spatio-temporal data sources. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (Ubicomp ’18) 2, 1 (2018), 54.
  • Zhang et al. (2017) Junbo Zhang, Yu Zheng, and Dekang Qi. 2017. Deep spatio-temporal residual networks for citywide crowd flows prediction. In Proceedings of Thirty-First AAAI Conference on Artificial Intelligence (AAAI ’17).
  • Zhang et al. (2016) Junbo Zhang, Yu Zheng, Dekang Qi, Ruiyuan Li, and Xiuwen Yi. 2016. DNN-Based Prediction Model for Spatial-Temporal Data. In Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. (SIGSPATIAL ’16).
  • Zheng and Ni (2013) Jiangchuan Zheng and Lionel M Ni. 2013. Time-dependent trajectory regression on road networks via multi-task learning. In Twenty-Seventh AAAI Conference on Artificial Intelligence.