TITAN: A Spatiotemporal Feature Learning Framework for Traffic Incident Duration Prediction

Identifying critical incident stages and reasonably predicting traffic incident duration are essential in traffic incident management. In this paper, we propose a traffic incident duration prediction model that simultaneously predicts the impact of traffic incidents and identifies the critical groups of temporal features via a multi-task learning framework. First, we formulate a sparsity optimization problem that extracts low-level temporal features from traffic speed readings and then generalizes them into higher-level features that correspond to phases of traffic incidents. Second, we propose novel constraints on feature similarity that exploit prior knowledge about the spatial connectivity of the road network to predict the incident duration. The proposed problem is challenging to solve because of the orthogonality constraints, the non-convex objective, and the non-smooth penalties. We develop an algorithm based on the alternating direction method of multipliers (ADMM) framework to solve the proposed formulation. Extensive experiments and comparisons with other models on real-world traffic data and traffic incident records demonstrate the efficacy of our model.




1. Introduction

Detecting traffic incidents early and estimating the impact of the non-recurrent congestion they cause have become increasingly important research topics because of the significant social and economic losses involved. A one-minute reduction in congestion duration produces a gain of 65 US dollars per incident (Adler et al., 2013). Although non-recurrent congestion is hard to predict because of its randomness, the impact and duration of traffic incidents remain a major focus for traffic operators. The vast deployment of traffic speed sensors and Traffic Incident Management Systems (TIMS) makes traffic speed data and traffic incident records readily accessible to transportation operators. With this abundance of traffic data sources, an efficient multi-task learning model can be implemented to provide accurate predictions of incident duration.

Figure 1. From the perspectives of traffic management and transportation operations, the life cycle of a traffic incident is separated into five stages: Detection, Verification, Response, Clearance, and Recovery

Incident duration is the time elapsed from the incident occurrence until all evidence of the incident has been removed from the incident scene. From the perspective of traffic management and operation, the life cycle of a traffic incident is split into five stages: Detection, Verification, Response, Clearance, and Recovery (Ozbay and Kachroo, 1999). Figure 1 shows the life cycle of a traffic incident. However, the five-stage separation cannot be used directly as temporal features for traffic incident duration prediction. To accurately estimate the duration of a traffic incident in its early stages, transportation operators and first responders face three major challenges:

1) No explicit high-level temporal features: Although the conventional five-stage life cycle is effective for traffic management, the five stages cannot serve as temporal features in the traffic incident duration prediction task. It is important to group the critical time points in the early stages of the incident into higher-level time periods that serve as better indicators of the incident duration.

2) Hard to predict the influence of an incident: In Traffic Incident Management research, one of the most essential tasks is to estimate, at an early stage, the impact of a traffic incident in terms of its duration. However, the performance of conventional time-series-based methods is limited by their inability to identify higher-level temporal features.

3) Spatial connectivity of the road network is rarely considered: Traffic congestion cascades through the road network. As a consequence, incidents that are topologically close in the road network show similar traffic patterns in their early stages, and incidents that are spatiotemporally closer should share more similar traffic speed patterns. However, this spatial correlation between traffic incidents is rarely considered in previous studies (Li et al., 2018).

Existing methods mostly fail to address these challenges. Feature learning methods based on ℓ1-norm regularization, such as Lasso (Tibshirani et al., 2005), perform feature selection but require strong assumptions on the design matrix (Zhou et al., 2013a). Zhan et al. (Zhan et al., 2011) propose an M5P tree algorithm to predict the clearance time of traffic incidents based on geometric and traffic features. Feature learning algorithms for biomarker identification (Zhou et al., 2013b) and social event indicators (Zhao et al., 2018) have proved effective at finding higher-level features. However, most of them focus on learning important feature sets from attributes and do not apply to our problem because of their expensive computation. These studies use the duration of an incident to quantify its impact, but their quantification strategies are designed to capture the one-time impact of the incident rather than the time-varying nature of the impact at different locations. Multi-task learning based spatiotemporal models play an important role when the connectivity of the road network is considered. Such models focus on regression and classification problems such as county income prediction (Zhang et al., 2017), social unrest event forecasting (Zhao et al., 2016), and even service disruption detection for transit networks (Ji et al., 2018). However, none of the previously proposed methods is capable of modeling the spatial connectivity between features at a higher level. Therefore, most existing models are not suitable for our traffic incident duration prediction problem.

To address these challenges, we propose a Traffic IncidenT DurAtion PredictioN (TITAN) model based on both sparse feature learning and a multi-task learning framework. Our main contributions are:

Formulating a novel machine learning framework for traffic incident duration prediction using temporal features. In contrast to existing works, we formulate the problem of traffic incident duration prediction for transportation systems as a multi-task supervised learning problem. In the proposed method, models for different road segments are learned simultaneously by restricting all road segments to exploit a common set of features.

Modeling traffic speed similarity among road segments via spatial connectivity in feature space. Based on the cascading nature of the traffic congestion in road networks, specifically designed constraints are proposed to model traffic speed similarities among data for spatiotemporally correlated road segments. These similarities in feature space are driven by spatial connectivity.

Proposing a sparse feature learning process to identify groups of temporal features at a higher level. According to the nature of the traffic incidents, the traffic speed fluctuation in the early stages of the incidents is always important while estimating the impact and duration of the traffic incident. In the proposed model, constraints with sparsity and orthogonality are introduced to extract grouped important temporal features at a higher level.

Developing an efficient algorithm to train the proposed model. The underlying optimization problem of the proposed multi-task model is a non-smooth, multi-convex, and inequality-constrained problem, which is challenging to solve. By introducing auxiliary variables, we develop an effective ADMM-based algorithm that decouples the main problem into several sub-problems, which can be solved by block coordinate descent and proximal operators.

The rest of our paper is structured as follows. Related works are reviewed in Section 2. In Section 3, we describe the problem setup of our work. In Sections 4 and 5, we present a detailed discussion of our proposed TITAN model for predicting durations of traffic incidents, and its solution for parameter learning. In Section 6, extensive experiment evaluations and comparisons are presented. In the last section, we discuss our conclusion and directions for future work.

2. Related Works

In this section, we review the current state of research on traffic incident analysis. Three threads of work are related to this paper: traffic incident impact analysis, urban event forecasting, and spatiotemporal multi-task learning.

Traffic Impacts Analysis. Conventional statistical methods have proved effective for traffic incident duration prediction. These methods fall into several branches: Bayesian classifiers (Boyles et al., 2007), discrete choice models (Lin et al., 2004), linear and non-parametric regression (Peeta et al., 2000), and hazard-based duration models (Nam and Mannering, 2000). In the recent decade, Traffic Incident Management Systems (TIMS) have been deployed by traffic control centers in various cities and on highways to alleviate the influence of traffic incidents on traffic conditions (Owens et al., 2010). The historical traffic data that correspond to traffic incidents play an important role in predicting traffic incident durations. A new research field based on data-driven algorithms and supported by the availability of real-world traffic data has recently emerged for traffic incident duration prediction. Various data mining and machine learning approaches have been employed to estimate and predict traffic incident duration: Lee and Wei (Lee and Wei, 2010) proposed a genetic algorithm for traffic incident duration prediction; Kim et al. and Zhan et al. (Zhan et al., 2011) applied decision trees and classification tree models to the same problem and achieved improvements; Valenti et al. (Valenti et al., 2010) proposed a support vector machine based method that utilizes the temporal features of the traffic data; and artificial neural networks (Vlahogianni and Karlaftis, 2013) are another prominent direction for traffic incident duration prediction. In recent years, research on Intelligent Transportation Systems (ITS) has turned its attention to hybrid methods (Kim and Chang, 2012) for predicting traffic incident durations.

Urban Event Forecasting. Predicting and detecting the occurrence and impact of traffic incidents as urban events has received increasing attention in recent years. A large body of traditional work has focused on the early detection of events such as earthquakes (Sakaki et al., 2010), disease outbreaks (Zhao et al., 2015a), and transit service disruptions (Ji et al., 2018), whereas event forecasting methods predict the occurrence of such events in the future. Most existing event forecasting methods focus on temporal events, such as stock market movements (Bollen et al., 2011) and elections (O’Connor et al., 2010), with no interest in the geographical dimension. A handful of works have started to address urban event prediction at a spatiotemporal resolution. For example, Zhao et al. (Zhao et al., 2015b) proposed a multi-task learning framework that models forecasting tasks in related geo-locations concurrently, and Gerber (Gerber, 2014) utilized a logistic regression model for spatiotemporal event forecasting, producing urban event predictions with true spatiotemporal resolution. One limitation of these existing studies is that the temporal dimension is considered independent of the spatial dimension, and any interactions between the two are ignored. Our proposed TITAN model addresses the importance of the topology dimension, which is derived from the spatial dimension. We propose a multi-task learning framework with orthogonal constraints to model the interactions between the temporal and topological dimensions.

Spatiotemporal Multi-task Learning. Multi-task learning (MTL) refers to models that learn multiple related tasks simultaneously to improve overall performance. Recent decades have witnessed proposals for many MTL approaches (Zhou et al., 2011). Evgeniou and Pontil (Evgeniou and Pontil, 2004) proposed a regularized MTL formulation that constrains the models of each task to be close to each other. Task relatedness can also be modeled by constraining multiple tasks to share a common underlying structure, for example a common set of features (Argyriou et al., 2007) or a common subspace (Ando and Zhang, 2005). Zhao et al. (Zhao et al., 2015b) designed a multi-task learning framework that models forecasting tasks in related geolocations. MTL approaches have been applied in many domains, including computer vision and biomedical informatics. To the best of our knowledge, our work is the first to address the feasibility of combining multi-task learning and orthogonal regularization techniques to solve the traffic incident duration prediction and critical phase learning problems.

3. Problem Statement

Assume that we are given a collection of traffic incidents from the traffic incident management system (TIMS). For each traffic incident in the collection, we find the spatially correlated traffic sensor and its traffic speed reading in each time interval; the granularity of the time interval is 1 minute. Given an incident record and the traffic speed readings of its corresponding traffic speed sensor, the main objective of this paper is to predict the future impact of the incident in terms of its duration.

Definition I: Traffic speed in detection time and early verification time. Given the time interval in which a traffic incident is verified, we define and extract two time periods for feature construction: the detection time (the time between the incident occurrence and the incident verification time) and the early verification time (a short period after the incident verification time). The traffic speeds for the two periods are extracted as: (1) traffic speed in detection time: the readings immediately preceding the verification time; and (2) traffic speed in early verification time: the readings immediately succeeding the verification time.
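Definition I can be illustrated with a short sketch. It assumes a per-minute list of speed readings for one sensor; the function name `build_features`, the window lengths `n_before` and `n_after`, and the choice to count the verification minute itself as the first "succeeding" reading are all hypothetical choices made for illustration, not values taken from the paper.

```python
from typing import List, Sequence


def build_features(speeds: Sequence[float], t_verify: int,
                   n_before: int, n_after: int) -> List[float]:
    """Concatenate readings before and after the verification minute.

    speeds   : per-minute speed readings of the sensor tied to the incident
    t_verify : index of the minute in which the incident is verified
    n_before : number of detection-time readings (assumed window length)
    n_after  : number of early-verification readings (assumed window length)
    """
    if t_verify - n_before < 0 or t_verify + n_after > len(speeds):
        raise ValueError("not enough readings around the verification time")
    detection = list(speeds[t_verify - n_before:t_verify])
    early_verification = list(speeds[t_verify:t_verify + n_after])
    return detection + early_verification


# Toy readings in mph; the speed drop marks the incident.
speeds = [60, 58, 55, 40, 30, 25, 22, 20, 24, 35]
features = build_features(speeds, t_verify=5, n_before=3, n_after=2)
print(features)  # [55, 40, 30, 25, 22]
```

Each incident then contributes one such row to the training input of its arterial road, with the reported duration as the label.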

Given the collection of traffic incidents, we first filter it to a selection of arterial roads, which produces the targeted traffic incident collection. The targeted collection is then grouped into subcollections according to the arterial road on which each incident takes place.

We adopt a combination of traffic speed readings in detection time and early verification time as the training features. For each traffic incident subcollection , we construct the training input and the label . The problem is then formulated as solving the mapping:


where . is the number of traffic incident records for one arterial road; represents the feature dimension of the training data, which is a combination of the detection time and the verification time; is the learning model for inferring the traffic incident duration in the subcollection .

Because our problem is to predict the duration of traffic incidents given historical traffic speed readings for the corresponding collection of target traffic incidents, it falls within the scope of regression. For instance, learning the mapping function can be modeled as a regression problem with a least-squares loss function, and the model parameters can be learned by solving the following optimization problem:


where controls the sparsity of the grouped features, is the total number of data points in . Moreover, motivated by the spatial correlations of traffic incidents introduced by the connectivity between road segments, we extend the subproblem defined above to a regression problem under a multi-task learning framework. The proposed model should be encouraged to capture hidden patterns among road segments and to maintain sparsity in feature space. Mathematically, this consideration inspires us to use the ℓ2,1 norm (Argyriou et al., 2008) to perform joint feature selection:


where each column of , which is represented by , denotes the model parameters for . In this way, we can further model the relatedness among the road segments with parameter matrix . An overview of the TITAN model is shown in Figure 3. The following subsections address the details of the constraints on orthogonality and spatial connectivity.
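As a rough illustration of joint feature selection, the sketch below evaluates a multi-task least-squares objective with a row-wise group penalty. It assumes the penalty is the ℓ2,1 norm (the sum of the Euclidean norms of the rows of the parameter matrix, with rows indexing features and columns indexing road segments); the names `l21_norm`, `multitask_objective`, and `lam`, and the absence of any normalization by the number of data points, are assumptions made for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def l21_norm(W: Matrix) -> float:
    """Sum of the Euclidean norms of the rows of W (rows = features, columns = tasks)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in W)


def multitask_objective(Xs: List[Matrix], ys: List[List[float]],
                        W: Matrix, lam: float) -> float:
    """Squared error summed over all tasks plus lam times the l2,1 penalty."""
    total = 0.0
    for task, (X, y) in enumerate(zip(Xs, ys)):
        for row, target in zip(X, y):
            # Prediction for this sample uses column `task` of W.
            pred = sum(x * W[j][task] for j, x in enumerate(row))
            total += (pred - target) ** 2
    return total + lam * l21_norm(W)


# Two features, two tasks. The second feature row is all zeros, so that
# feature is dropped for every road at once.
W = [[3.0, 4.0],
     [0.0, 0.0]]
Xs = [[[1.0, 0.0]], [[0.0, 1.0]]]   # one sample per task
ys = [[3.0], [1.0]]

print(l21_norm(W))                        # 5.0
print(multitask_objective(Xs, ys, W, 0.1))
```

Because the penalty acts on whole rows, shrinking a row to zero removes that temporal feature from the models of all road segments simultaneously, which is the sense in which the tasks share a common feature set.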

4. Model

To identify the critical temporal features for traffic incident duration prediction, orthogonal constraints are applied to the TITAN model; to properly model the correlations between the traffic incidents based on the connectivity between the arterial roads, we apply a multi-task learning framework while designing the model.

4.1. Group Feature Learning

In studies of Traffic Incident Management (TIM), one important task is to identify the key response time points and periods of traffic incidents. Suppose that a two-vehicle collision occurs at 5:15 pm on a road segment of Interstate 66. Based on the traffic speed readings from the traffic sensor, the transportation agencies want to learn how much impact the incident will have on the local transportation system in terms of its duration. The traffic speed readings 5 minutes and 15 minutes after the incident's occurrence play an important role in predicting the duration of the traffic incident.

Definition II: Groups of key time points for a traffic incident. The group assignment information is represented in a vector, and the th group of time points is denoted by . If the th time point feature belongs to this group, then the th component of is non-zero and the relative magnitude represents the ‘importance’ of the feature in this group. For training data for one specific road segment, the new features generated by the group assignment is given by . Assume that there are groups of features and the group structure is denoted by , and the generalized new features are given by . To assign physical meaning to each generated group, the elements of have to be non-negative.

The new model vector for the grouped features is denoted by . The resulting formulation of the key feature group identification problem is then defined by:


where the parameter that controls the sparsity of each assigned group in . The -norm in the constraint determines the length of the column in to be , which makes the group matrix easy to be interpreted.

By solving Equation 4, the model learns the group structure of the data features. However, the learned groups may overlap substantially because the proposed constraints place no restriction on feature overlap. Such group overlap is not ideal in our setting: because our features come from a time sequence of traffic speed readings, groups of consecutive features carry a physical meaning.
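The overlap issue can be made concrete with a small sketch. For non-negative group vectors, two groups share a time point exactly when their inner product is positive, so requiring a zero inner product between groups rules out overlap. The helper names and the toy group vectors below are hypothetical.

```python
from typing import List, Set


def support(group: List[float], tol: float = 1e-12) -> Set[int]:
    """Indices of the time points that belong to a group (non-zero entries)."""
    return {i for i, v in enumerate(group) if abs(v) > tol}


def inner(a: List[float], b: List[float]) -> float:
    """Plain inner product of two group vectors."""
    return sum(x * y for x, y in zip(a, b))


def groups_overlap(groups: List[List[float]]) -> bool:
    """True if any two groups share at least one time point."""
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            if support(groups[a]) & support(groups[b]):
                return True
    return False


# Four time points, two non-negative groups each.
overlapping = [[0.6, 0.8, 0.0, 0.0],
               [0.0, 0.8, 0.6, 0.0]]   # both groups use time point 1
disjoint = [[0.6, 0.8, 0.0, 0.0],
            [0.0, 0.0, 0.6, 0.8]]      # groups cover separate time points

print(groups_overlap(overlapping), inner(*overlapping) > 0)  # True True
print(groups_overlap(disjoint), inner(*disjoint))            # False 0.0
```

This equivalence depends on non-negativity: with signed entries, two overlapping groups could still have a zero inner product through cancellation.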

In traffic incident management research, the lifetime of an incident generally consists of five stages: incident detection, verification, response, clearance, and recovery. Because these stages do not overlap with each other, we impose orthogonal constraints to prevent overlap among the groups. The original nonnegative constraint between all , is also applied. For simplicity and interpretability, we normalize the group assignments and assume that the columns of are of length 1 for norm. The constraint can further be expressed by . We use the norm regularization to control the sparsity on . The improved formulation of group feature learning can be given by:


4.2. Spatial Connectivity in Feature Space

In real-world transportation systems, different road segments are spatially related through intersections or interchanges. That is, two or more road segments may share similar traffic speed patterns during traffic incidents. For instance, traffic congestion on Interstate 495 can change traffic patterns not only on local road segments but also on other arterial roads with close spatial correlations (e.g., Interstate 66 and US Route 7). This spatial relatedness, caused by network failure cascades (Su et al., 2014; Kwee et al., 2018), results in similar traffic speed fluctuations and therefore in similar patterns of traffic incident durations.

Figure 2. Road Segments Connectivity Shown by Adjacency Matrix. The left figure shows an example of the road network, the edges represent the road and the vertices represent the intersections; the middle figure shows the converted line graph of the road network, the vertices represent the roads; the right figure shows the adjacency matrix generated from the line graph.
Figure 3. A Schematic View of the Traffic Incident Duration Prediction Model (TITAN). Similarities among temporal features are modeled by two major factors: spatial connectivity between arterial roads and the orthogonal constraint on . In particular, arterial roads connectivity constraints encourage the model to decrease differences between spatially related arterial roads in feature space. The orthogonal constraint encourages the model to identify groups of critical temporal features that are most influential to the prediction results.

Definition III: Traffic incident spatial correlations. Using prior knowledge such as the road network connectivity, we assume that traffic incidents are spatially correlated with each other. We represent the road network as a graph whose vertex set is the union of the intersections and interchanges and whose edge set is the collection of road segments. To model the connectivity of the road network, we transform this graph into its line graph, in which the vertices represent the roads and the edges represent the connectivity between roads. The adjacency matrix of the line graph reflects the overall connectivity of the roads. The road connectivity and the line graph transformation are shown in Figure 2. Mathematically, we improve the model with constraints on parameters among different tasks:


where each constraint with forces the Euclidean distance between model parameters for a specific pair of road segments to be within a range. As defined in Section 3, is the adjacency matrix that models the connectivity between road segments.
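The line graph transformation behind the adjacency matrix can be sketched as follows. Each road is given as a pair of endpoint intersections, and two roads are connected in the line graph when they share an endpoint. The function name and the toy network are hypothetical; a real road network would also need to handle roads that pass through several intersections.

```python
from typing import List, Tuple


def line_graph_adjacency(roads: List[Tuple[str, str]]) -> List[List[int]]:
    """Adjacency matrix of the line graph: entry (i, j) is 1 when road i and
    road j meet at a common intersection, and 0 otherwise."""
    n = len(roads)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if set(roads[i]) & set(roads[j]):
                A[i][j] = A[j][i] = 1   # undirected, so the matrix is symmetric
    return A


# Toy network: intersections a, b, c, d joined by four roads.
roads = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")]
A = line_graph_adjacency(roads)
for row in A:
    print(row)
# [0, 1, 0, 1]
# [1, 0, 1, 1]
# [0, 1, 0, 1]
# [1, 1, 1, 0]
```

Road 0 and road 2 do not touch, so their entry is 0, while road 3 meets every other road and its row is all ones off the diagonal.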

Combining the models represented by Equations 5 and 6, we obtain our proposed TITAN model. By moving the non-trivial constraints that are correlated to spatial connectivity into the objective function, we can obtain an equivalent regularized problem, which is easier to solve:


where is trade-off penalty balancing the value of the loss function and the regularizers. is the adjacency matrix representing the road connectivity; denotes the connectivity information between the -th road and the -th road. Because the line graph for road segments is undirected, the corresponding adjacency matrix is a symmetric matrix. The coefficient is introduced to eliminate the repeatedly added lower triangular matrix.
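A minimal sketch of a connectivity regularizer, assuming it is the adjacency-weighted sum of squared Euclidean distances between the parameter vectors of connected roads. The squared-distance form and the function names are assumptions for illustration. The sketch also shows why a halving coefficient appears: summing over all ordered pairs of a symmetric adjacency matrix counts every connected pair twice.

```python
from typing import List


def squared_distance(u: List[float], v: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def penalty_upper_triangle(W_cols: List[List[float]], A: List[List[int]]) -> float:
    """Sum over pairs i < j only, so each connected pair is counted once."""
    n = len(W_cols)
    return sum(A[i][j] * squared_distance(W_cols[i], W_cols[j])
               for i in range(n) for j in range(i + 1, n))


def penalty_full_sum(W_cols: List[List[float]], A: List[List[int]]) -> float:
    """Sum over all ordered pairs, halved to undo the double counting."""
    n = len(W_cols)
    return 0.5 * sum(A[i][j] * squared_distance(W_cols[i], W_cols[j])
                     for i in range(n) for j in range(n))


# One parameter vector per road; roads 0-1 and 1-2 are connected.
W_cols = [[1.0, 2.0], [1.0, 0.0], [4.0, 0.0]]
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

print(penalty_upper_triangle(W_cols, A))  # 13.0
print(penalty_full_sum(W_cols, A))        # 13.0
```

Roads 0 and 2 are not connected, so the large difference between their parameter vectors is not penalized.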

5. Parameter Learning for TITAN

The objective function in Equation 7 is multi-convex and the regularizer is non-smooth, which makes the problem difficult to solve. A traditional way to solve this kind of problem is proximal gradient descent, but this approach is slow to converge. Recently, the alternating direction method of multipliers (ADMM) (Boyd et al., 2011) has become popular as an efficient algorithmic framework that decouples the original problem into smaller subproblems that are easier to handle. Here we propose an ADMM-based solver, Algorithm 1, which optimizes the proposed model efficiently. In particular, primal variables are updated on Line 4, dual variables on Line 5, and Lagrange multipliers on Line 6. Line 7 calculates both primal and dual residuals.

Initialize , , , , , , ;
Initialize , , ;
for  do
    Update , with BCD using Equations 12 and 13;
    Update and with Equations 16;
    Update , , and with Equations 17;
    Compute and by Equations 18;
    if  and  then
    end if
end for
Algorithm 1 An ADMM-based solver for TITAN.
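The loop structure of Algorithm 1 (primal update, auxiliary update through a proximal step, multiplier update, residual check) can be seen on a much smaller problem. The sketch below is not the TITAN solver: it applies scaled-form ADMM to a one-dimensional lasso problem, minimizing 0.5 * (a*x - b)^2 + lam * |z| subject to x = z. All names, the penalty `rho`, and the tolerance are assumptions chosen for illustration.

```python
from typing import Tuple


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |.| evaluated at v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def admm_scalar_lasso(a: float, b: float, lam: float, rho: float = 1.0,
                      tol: float = 1e-9, max_iter: int = 500) -> Tuple[float, int]:
    """Scaled-form ADMM for min 0.5*(a*x - b)**2 + lam*|z|  s.t.  x = z."""
    x = z = u = 0.0
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Primal update: smooth quadratic subproblem, solved in closed form.
        x = (a * b + rho * (z - u)) / (a * a + rho)
        # Auxiliary update: proximal step on the non-smooth term.
        z_old = z
        z = soft_threshold(x + u, lam / rho)
        # Scaled multiplier update.
        u = u + x - z
        # Primal and dual residuals drive the stopping test.
        primal_res = abs(x - z)
        dual_res = rho * abs(z - z_old)
        if primal_res < tol and dual_res < tol:
            break
    return z, iteration


z_star, n_iter = admm_scalar_lasso(a=2.0, b=3.0, lam=2.0)
print(round(z_star, 6), n_iter)
```

For this toy problem the minimizer can be checked by hand: setting the derivative a*(a*x - b) + lam to zero for positive x gives x = (a*b - lam) / a^2 = 1, which the loop reaches after a few iterations.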

5.1. Augmented Lagrangian Scheme

First, we introduce an auxiliary variable and into the original problem 7 and obtain the following equivalent problem:


where is the set of variables to be optimized. Then we transform the above problem into its augmented Lagrangian form as follows:


where , , and are the Lagrangian multipliers. With this step, we decouple the original problem into two easier to handle problems in which seven variables , , , , , , and will be optimized individually. Note that the coefficient is omitted according to the optimization problem, and is the Frobenius norm.

5.2. Parameter Optimization

The Lagrangian form in Equation 9 is separated based on the primal variables and the dual variables, where the problem of solving the primal variables and is smooth and convex:


5.2.1. Update

We define Equation 10 as objective function which is multi-convex. In particular, of is convex where all other are fixed. This kind of problem can be decoupled into subproblems using block coordinate descent (BCD) (Xu and Yin, 2013), in which each is updated by solving the following sub-optimization problems:


is smooth and convex for each and can be solved by gradient descent as follows:


where according to the BCD algorithm, the is calculated in sequence, from to . And the is defined as follows:

where and are the -th columns of the corresponding Lagrangian multiplier and dual variable.

5.2.2. Update

Similarly, the objective function of is also smooth and convex. Because there are no constraints defined between the columns of , the problem can be solved by gradient descent directly based on the objective function in Equation 10, and the gradient of is calculated by:


and the primal variable is then updated with a step size of :


Now that the primal variable is taken care of, the dual variable is updated as follows:


Note that this problem has the form of a proximal operator, where is the non-smooth function . The proximal operator can be evaluated efficiently (Parikh et al., 2014).

5.2.3. Update Dual Variables

Now that primal variables and are taken care of, the dual variables and are updated as follows:


where is the non-smooth function and is the non-smooth function . Both proximal problems can be solved efficiently (Parikh et al., 2014).
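Two proximal operators commonly used for sparsity penalties have closed forms, sketched below under the assumption that the non-smooth functions are an ℓ1 penalty and a row-wise ℓ2,1 penalty. The function names and the threshold argument `t` are hypothetical; any non-negativity or orthogonality handling in the full model is outside this sketch.

```python
import math
from typing import List


def prox_l1(values: List[float], t: float) -> List[float]:
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    out = []
    for v in values:
        if v > t:
            out.append(v - t)
        elif v < -t:
            out.append(v + t)
        else:
            out.append(0.0)
    return out


def prox_l21(rows: List[List[float]], t: float) -> List[List[float]]:
    """Row-wise shrinkage: the proximal operator of t * ||.||_{2,1}.

    Each row is scaled toward zero by max(0, 1 - t / ||row||_2); rows whose
    norm is at most t are set to zero as a whole.
    """
    out = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out


print(prox_l1([3.0, -0.5, 1.5], 1.0))   # [2.0, 0.0, 0.5]
shrunk = prox_l21([[3.0, 4.0], [0.3, 0.4]], 1.0)
print(shrunk[1])                         # [0.0, 0.0]
```

The first row above has norm 5 and is scaled by 0.8, while the second row has norm 0.5, below the threshold, and is removed entirely. This whole-row removal is what produces joint feature selection across tasks.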

Next, the Lagrangian multipliers , , and are updated as follows:


Finally, primal and dual residuals are calculated with:


where is primal residual, and is dual residual.

6. Experiment

In this section, we present the experiment environment, the datasets, the evaluation metrics and comparison methods, an extensive experimental analysis of the predictive results, and a discussion of the learned features.

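| Method | I-270 RMSE | I-270 MAE | I-270 MAPE | I-295 RMSE | I-295 MAE | I-295 MAPE | I-395 RMSE | I-395 MAE | I-395 MAPE |
|---|---|---|---|---|---|---|---|---|---|
| Ridge | 92.4709 | 76.4666 | 96.3826 | 89.1404 | 69.1273 | 87.3530 | 84.6881 | 65.5869 | 83.3106 |
| LASSO | 90.8535 | 73.8732 | 90.3336 | 76.4372 | 58.8515 | 70.1599 | 72.4028 | 55.8695 | 68.8993 |
| SVR | 87.8016 | 72.9036 | 88.7639 | 72.4579 | 53.9583 | 68.6843 | 68.4456 | 50.0854 | 62.6849 |
| nMTL | 70.7942 | 59.9754 | 82.8141 | 55.4657 | 42.6052 | 55.3893 | 57.2953 | 43.3107 | 41.2034 |
| FeaFiner | 77.0080 | 57.5550 | 81.4397 | 63.3036 | 50.1060 | 62.6381 | 51.6727 | 40.8695 | 47.4805 |
| TITAN | 73.1291 | 59.5265 | 81.3789 | 46.0873 | 34.3043 | 52.9296 | 46.2329 | 38.9277 | 42.3794 |

| Method | I-495 RMSE | I-495 MAE | I-495 MAPE | I-66 RMSE | I-66 MAE | I-66 MAPE | I-95 RMSE | I-95 MAE | I-95 MAPE |
|---|---|---|---|---|---|---|---|---|---|
| Ridge | 69.9718 | 52.2384 | 81.2393 | 80.4118 | 62.5392 | 85.3443 | 76.0088 | 64.6172 | 80.1281 |
| LASSO | 60.0119 | 48.5583 | 75.6027 | 68.0900 | 60.7429 | 77.9394 | 84.5617 | 58.7706 | 69.6493 |
| SVR | 58.9676 | 46.7641 | 71.5021 | 72.7470 | 59.0808 | 71.1609 | 62.8689 | 54.7717 | 68.8999 |
| nMTL | 52.5722 | 40.5422 | 63.6820 | 60.6244 | 48.4900 | 58.4887 | 57.1166 | 45.1327 | 49.4991 |
| FeaFiner | 56.3049 | 44.0023 | 44.9048 | 62.5098 | 50.4090 | 56.4438 | 55.6806 | 46.0073 | 56.0013 |
| TITAN | 47.7131 | 31.7725 | 37.1649 | 53.7001 | 44.3786 | 40.9370 | 52.6403 | 40.5345 | 49.9848 |

Table 1. Traffic Incident Duration Prediction Comparisons (RMSE in minutes, MAE in minutes, MAPE in %)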
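For reference, the three metrics reported in Table 1 can be computed as in the sketch below, using their standard definitions. The function names and the toy durations are illustrative and are not taken from the experiments.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the unit of the durations (minutes)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the unit of the durations (minutes)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; true durations must be non-zero."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy durations in minutes.
y_true = [30.0, 60.0, 90.0]
y_pred = [33.0, 54.0, 99.0]

print(round(rmse(y_true, y_pred), 4))  # 6.4807
print(round(mae(y_true, y_pred), 4))   # 6.0
print(round(mape(y_true, y_pred), 4))  # 10.0
```

RMSE is never smaller than MAE on the same predictions, which matches the ordering of the first two values in every row of the table.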

6.1. Experiment Setup

6.1.1. Experiment Environment

We conducted our experiments on a machine with an Intel Core i7-4790 CPU at 3.6 GHz, whose computational power is 4.13 Gflops per core. For real-world traffic incident analysis, running time is an important factor. The most time-consuming part of our proposed TITAN model is the training stage, which learns the parameters for the temporal features and the orthogonal groups of the temporal features. Once trained, the model generates a prediction rapidly with a matrix multiplication. In the validation and testing stages, our prediction for a single data point is generated in less than seconds.

6.1.2. Dataset and Feature Settings

We evaluate our proposed traffic incident duration prediction model using two real-world traffic data sources. 1) Traffic incident records with reported duration. We collect 43,923 records of traffic incidents in the year 2018 from three major transportation agencies in the Washington DC Metropolitan area: the Washington DC, Virginia, and Maryland departments of transportation. From the collected records, we select 29,075 traffic incidents that took place on the six major arterial roads in the region: I-270, I-295, I-395, I-495, I-66, and I-95. In the selected data, the duration of each traffic incident is recorded in minutes, and we use this duration as the ground truth. Of the selected incidents, 80% serve as the training set and the rest serve as the testing set. 2) INRIX traffic speed data. We use the traffic speed readings from traffic sensors as the training features. Given the location and verification time of each traffic incident, we collect the traffic speed readings of nearby traffic sensors.
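The 80/20 partition can be sketched as below. The text does not say how records are assigned to each side, so the seeded random shuffle, the function name, and the use of integer identifiers in place of incident records are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(records: Sequence[T], train_fraction: float = 0.8,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle record indices with a fixed seed and cut at train_fraction."""
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)   # reproducible shuffle
    cut = int(round(train_fraction * len(records)))
    train = [records[i] for i in indices[:cut]]
    test = [records[i] for i in indices[cut:]]
    return train, test


# Integer ids stand in for the 29,075 selected incident records.
incident_ids = list(range(29075))
train_ids, test_ids = train_test_split(incident_ids)
print(len(train_ids), len(test_ids))  # 23260 5815
```

Fixing the seed keeps the partition identical across runs, so every comparison method is evaluated on the same held-out incidents.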

The connectivity of the road network determines the number of tunable parameters in our TITAN model. For the arterial roads selected in our experiment, seven hyperparameters can be tuned. During the experiment, we observe that the value of the loss function is significantly larger than that of the regularizers, which means that a large penalty should be used to balance the loss function and the regularizers.

6.2. Comparison Methods

To evaluate the performance of traffic incident duration prediction, five comparison methods are considered in our experiment: ℓ2-regularized linear regression (ridge regression), ℓ1-regularized linear regression (LASSO), support vector regression (SVR), a naïve multi-task learning model (nMTL), and a feature refiner method (FeaFiner).

Regularized Linear Regression (Ridge) (Peeta et al., 2000). Ridge regression is an extension of linear regression: a linear regression model regularized by the ℓ2 norm of the coefficients. Its regularization parameter is a scalar that controls the model complexity; the smaller the parameter is, the more complex the model will be. In our implementation, this parameter is selected by searching over a set of candidate values. This model considers only the temporal features for duration prediction; neither the multi-task structure for arterial road connectivity nor grouped temporal features are considered.
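To illustrate how the ridge parameter controls shrinkage, the sketch below uses the closed-form solution for a single feature without an intercept. The data and the candidate values are hypothetical and are not the search grid used in our experiments.

```python
def ridge_1d(xs, ys, lam):
    """Closed-form ridge solution for one feature and no intercept.

    Minimizes sum((y - w*x)^2) + lam * w^2, giving w = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # toy data generated by y = 2x

# Hypothetical candidate values; a larger lam shrinks the coefficient toward zero.
for lam in (0.0, 1.0, 10.0, 100.0):
    print(lam, round(ridge_1d(xs, ys, lam), 4))
```

With `lam = 0` the unregularized slope of 2 is recovered, and the coefficient decreases monotonically as `lam` grows.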

Regularized Linear Regression (LASSO) (Ramakrishnan et al., 2014; Tibshirani, 1996). This is a classic way to conduct cost-efficient regression by enforcing sparsity of the selected features. It has been shown to be effective in the field of event detection (Ramakrishnan et al., 2014). It includes a parameter that weights the regularization term; typically, the larger this parameter is, the fewer features are selected. In our experiment, this parameter is selected by searching over a set of candidate values. The feature configuration used by this model is the same as for the ridge regression model.
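The relation between the LASSO parameter and the number of selected features can be seen in the special case of an orthonormal design, where the LASSO solution reduces to soft-thresholding of the least-squares coefficients. The sketch below assumes that special case; the coefficient vector `z` and the candidate values are hypothetical.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam, clipping at zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_orthonormal(z, lam):
    """LASSO solution of 0.5*||y - Xw||^2 + lam*||w||_1 when X^T X = I and z = X^T y."""
    return [soft_threshold(zj, lam) for zj in z]

z = [3.0, -1.5, 0.4, 0.05]  # hypothetical least-squares coefficients

for lam in (0.0, 0.1, 0.5, 2.0):
    w = lasso_orthonormal(z, lam)
    selected = sum(1 for wj in w if wj != 0.0)
    print(lam, selected)
```

Each increase of `lam` in this toy example zeroes out one more coefficient, which is the sparsity behavior described above.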

Support Vector Regression (SVR) (Tibshirani, 1996). Support vector regression provides solutions for both linear and non-linear problems. In our implementation, we use a non-linear kernel function (the RBF kernel) to find the optimal solution for the incident duration prediction problem. The model parameters are selected with and . This model considers the same kind of temporal features as the ridge regression and LASSO methods; no multi-task features for connectivity are considered.

Naïve Multi-task Learning Model (nMTL) (Zhao et al., 2016). We implement the fundamental settings of the naïve multi-task learning model for event detection. This comparison method is regularized with a constraint between tasks. The training tasks of this model are split by arterial road. The correlations between tasks are constrained by a norm penalty, and within each task the importance of the features is likewise constrained by a norm penalty. The penalty parameter is selected by searching over a set of candidate values.

FeaFiner (Zhou et al., 2013b). FeaFiner is a regression model capable of learning feature clusters. This method learns an optimal sparse feature grouping for general regression problems; however, it does not support multi-task learning. In our implementation, we apply this method to the complete set of traffic incidents, and the temporal features are selected as the target features. In the parameter initialization, we select the parameter for the k-means clustering.

6.3. Evaluation Metrics

To quantify and validate model performance on traffic incident duration prediction, we adopt the root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). These metrics are widely used in traffic incident duration prediction studies (Li et al., 2018; Khattak et al., 2016; Park et al., 2016; Zou et al., 2016) and reflect the predictive performance of the proposed model. Equations 19, 20, and 21 define the selected evaluation metrics:


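$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} \tag{19}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right| \tag{20}$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right| \tag{21}$$

where $n$ is the total number of records; $\hat{\mathbf{y}}$ is the vector of predicted traffic incident durations; $\mathbf{y}$ is the vector of ground-truth values of the corresponding records; and $\hat{y}_i$ and $y_i$ are the predicted result and the ground-truth value of the $i$-th record, respectively.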
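As a minimal sketch, the three metrics can be computed as below, using their standard definitions. The function names and the toy durations (in minutes) are illustrative assumptions, and MAPE is reported here as a percentage.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / n

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; requires nonzero ground truth."""
    n = len(y_true)
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / n

# Toy example: ground-truth and predicted incident durations in minutes.
y_true = [30.0, 60.0, 120.0]
y_pred = [45.0, 50.0, 100.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```

Because MAPE divides by the ground-truth duration, short incidents contribute disproportionately to it, which is one reason to report it alongside RMSE and MAE.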

6.4. Incident Duration Prediction Analysis

6.4.1. TITAN Performance Analysis on Spatial Connectivity

Table 1 summarizes the comparison of our proposed method with the competing methods on the task of traffic incident duration prediction. The experimental results justify our use of a multi-task learning framework for predicting the incident duration. In general, TITAN outperforms the single-task models (LR, SVR, and FeaFiner) on RMSE, MAE, and MAPE. This result shows that the spatial correlations between the road segments can improve the performance of traffic incident duration prediction. The TITAN model also outperforms nMTL on RMSE and MAE. These results demonstrate that, for the traffic incident duration prediction problem, norm regularizers alone are insufficient; the detailed spatial connectivity between the road segments should also be considered.

6.4.2. TITAN Performance Analysis on Feature Groups Learning

Among the comparison methods, FeaFiner (Zhou et al., 2013b) considers an orthogonal constraint that is capable of grouping low-level features into a high-level feature representation. The original FeaFiner uses Ridge and LASSO as its original problem settings. Thus, the results presented in Table 1 can be categorized by whether orthogonal constraints are considered. The methods that consider orthogonal constraints are FeaFiner and TITAN; the methods that do not are Ridge, LASSO, and SVR. Comparing these two categories, we find that the overall performance of the methods that consider the orthogonal constraint is better than that of the methods that do not. However, this performance increase is not as significant as the increase from the spatial connectivity constraint introduced by the multi-task learning framework.

6.4.3. Performance Analysis between Training Tasks

The results in Table 1 show that the model performance for traffic incident duration prediction is not the same across different road segments. For instance, on the highway that has only one spatial connection to the rest of the road segments, the prediction performances of all the methods differ only slightly from each other. This is because the Euclidean distance constraint for that subtask links its column of the feature matrix to only a limited number of the other columns. In contrast, on the highway whose subtask shares feature similarity with all other subtasks, our model outperforms the comparison methods.

Figure 4. Feature learning results: (a) TITAN without orthogonal constraint; (b) TITAN with orthogonal constraint.
Figure 5. Illustration of how the number of grouped temporal features will affect the performance of the TITAN model. The performance is evaluated in terms of RMSE, MAE, and MAPE respectively.

6.5. Feature Groups Assignment Analysis

The orthogonal constraint enables the proposed model to learn groups of highlighted features that play an important role in predicting traffic incident durations. In our experiment, we also study the learned feature groups empirically. We set the number of groups to 10 and apply two conditions: 1) TITAN with the orthogonal constraint and 2) TITAN without the orthogonal constraint. Figure 4 shows the learned feature group assignments for both experimental conditions. The feature groups learned with the orthogonal constraint overlap less than those learned without it.
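One simple way to quantify how much the learned groups overlap is to sum the absolute inner products between distinct group assignment vectors, which is zero exactly when the groups are mutually orthogonal. The sketch below is illustrative only: the function name and the toy binary assignment vectors are assumptions, not the learned assignments shown in Figure 4.

```python
def group_overlap(groups):
    """Sum of |<g_a, g_b>| over all distinct pairs of group assignment vectors.

    The value is 0.0 exactly when every pair of groups is orthogonal.
    """
    total = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            total += abs(sum(u * v for u, v in zip(groups[a], groups[b])))
    return total

# Toy assignments of four low-level temporal features to two groups.
overlapping = [[1.0, 1.0, 1.0, 0.0], [0.0, 1.0, 1.0, 1.0]]  # features 2 and 3 shared
disjoint = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]     # no shared features

print(group_overlap(overlapping), group_overlap(disjoint))
```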

While experimenting without the orthogonal constraint, we found that the model prefers to group the low-level features into one feature assignment for every group. Figure 4(a) shows this single feature group assignment for the model without the orthogonal constraint. From Figure 4(a), we can see that, for the model without the orthogonal constraint, temporal features with higher indexes are assigned higher weights (>300). This result is reasonable, because it can be interpreted as follows: the duration of a traffic incident is better inferred from the most recent traffic speed readings.

For comparison, Figure 4(b) shows the learned feature group assignment of the model with the orthogonal constraint for several subtasks. We can find the most heavily weighted feature group by checking the learned weights. For example, Figure 4(b) shows the top-weighted group for three subtasks. The top assigned feature groups for different arterial roads differ slightly from each other, which shows that the most critical temporal features for predicting the traffic incident duration differ between roads. This observation is valuable for transportation operators and first responders. In Figure 4(b), we can also observe a 10-minute shift between the high-level features of two of the subtasks. This shift indicates that, for the road concerned, the traffic speed readings from 10 minutes earlier have higher importance for predicting the incident duration.

6.6. Case Studies

During the experiments, the proposed approach revealed several interesting facts. Here we discuss the identification of the critical phases of traffic incidents and the influence of the connectivity between the arterial roads.

6.6.1. Critical Phases Identification for Traffic Incidents

From the experimental results on the relationship between the number of groups and the performance of the TITAN model, we identify the optimal number of groups for the temporal features. In this experiment, the number of groups corresponds to the number of phases identified for the traffic incidents. As mentioned in Section 1, the life cycle of a traffic incident is conventionally divided into five phases: detection, verification, response, clearance, and recovery. Although such a grouping strategy is efficient from the perspective of transportation management and operations, it does not provide a useful grouping of temporal features for predicting traffic incident durations. In this experiment, we study how the performance of the TITAN model changes with the number of feature groups. Figure 5 shows the RMSE, MAE, and MAPE obtained by varying the number of groups from 1 to 50; the color-coded lines represent the different arterial roads in the experiment. From Figure 5, we learn that for most of the arterial roads, the TITAN model reaches its best performance when the number of groups is in the ranges of 18-20 and 40-43. This result indicates that the conventional five-phase definition of the traffic incident life cycle may not provide informative input to the traffic incident duration prediction problem.
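The sweep over the number of groups can be sketched as a simple search that evaluates each candidate and keeps the ones with the lowest error. In the sketch below, `best_group_counts` is a hypothetical helper and `toy_error` is a purely synthetic stand-in for training the model with a given number of groups and measuring its RMSE; it does not reproduce the curves in Figure 5.

```python
def best_group_counts(candidates, evaluate, top=3):
    """Return the `top` candidate group counts with the lowest error.

    Ties are broken in favor of the smaller group count.
    """
    scored = sorted((evaluate(k), k) for k in candidates)
    return [k for _, k in scored[:top]]

def toy_error(k):
    # Synthetic placeholder for "train with k groups, return validation RMSE".
    return abs(k - 7) + 1.0

# Sweep the number of groups from 1 to 50, as in the experiment described above.
print(best_group_counts(range(1, 51), toy_error))
```

In practice the `evaluate` callable would wrap one full training and validation run per road, so the sweep is the dominant cost of this analysis.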

6.6.2. Influences of Arterial Road Connectivity

The performance differences between the arterial roads can be observed in Figure 5. Overall, the prediction performance on Interstate 495 is better than on the rest of the arterial roads, and Interstate 270 has the worst duration prediction results. This comparison reveals that the connectivity between arterial roads plays an important role in predicting the traffic incident duration, because more connections with other arterial roads mean that more information is shared with the other training tasks in the training stage. Interstate 495 intersects with all other arterial roads, while Interstate 270 intersects only with Interstate 495.
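The effect of connectivity on information sharing can be illustrated with a small graph of road subtasks and a similarity penalty over connected pairs. This is a sketch under stated assumptions: only the Interstate 495 and Interstate 270 connections come from the text, `road_C` and `road_D` and the edge between them are hypothetical placeholders, the weight vectors are toy values, and the squared Euclidean distance is an assumed form of the similarity constraint.

```python
# Edges between road subtasks. I-495 touches every other road and I-270 touches
# only I-495 (from the text); road_C, road_D and their edge are hypothetical.
edges = [
    ("I-495", "I-270"),
    ("I-495", "road_C"),
    ("I-495", "road_D"),
    ("road_C", "road_D"),
]

def degree(road, edges):
    """Number of other subtasks that share a similarity constraint with `road`."""
    return sum(1 for a, b in edges if road in (a, b))

def connectivity_penalty(weights, edges):
    """Sum of squared Euclidean distances between weights of connected roads."""
    total = 0.0
    for a, b in edges:
        total += sum((u - v) ** 2 for u, v in zip(weights[a], weights[b]))
    return total

# Toy per-road weight vectors (two features each), for illustration only.
weights = {
    "I-495": [1.0, 0.5],
    "I-270": [0.8, 0.5],
    "road_C": [1.0, 0.0],
    "road_D": [0.0, 0.0],
}

print(degree("I-495", edges), degree("I-270", edges))
print(connectivity_penalty(weights, edges))
```

A road with a higher degree appears in more penalty terms, so its weights are tied to more of the other subtasks during training.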

7. Conclusion

This paper presents TITAN, a novel model for traffic incident duration prediction and feature learning. The proposed model combines a multi-task learning framework for prediction with a sparse feature learning framework for identifying higher-level feature groups. TITAN outperforms the existing traffic incident duration prediction models because of two major advantages in its design: 1) it considers the connectivity between road segments within the urban road network; 2) the learned higher-level features provide a better predictive pattern for traffic incident duration prediction. Extensive experiments on real-world datasets, with comparisons against the baseline methods, justify the performance of the TITAN model. By applying the orthogonal constraint, the proposed model is capable of identifying groups of higher-level features, which can be further regarded as the critical evolution stages of a traffic incident. This functionality is helpful for transportation operators and first responders when judging the influence of traffic incidents.


  • M. W. Adler, J. van Ommeren, and P. Rietveld (2013) Road congestion and incident duration. Economics of transportation 2 (4), pp. 109–118. Cited by: §1.
  • R. K. Ando and T. Zhang (2005) A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research 6 (Nov), pp. 1817–1853. Cited by: §2.
  • A. Argyriou, T. Evgeniou, and M. Pontil (2007) Multi-task feature learning. In Advances in neural information processing systems, pp. 41–48. Cited by: §2.
  • A. Argyriou, T. Evgeniou, and M. Pontil (2008) Convex multi-task feature learning. Machine Learning 73 (3), pp. 243–272. Cited by: §3.
  • J. Bollen, H. Mao, and X. Zeng (2011) Twitter mood predicts the stock market. Journal of computational science 2 (1), pp. 1–8. Cited by: §2.
  • S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, et al. (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning 3 (1), pp. 1–122. Cited by: §5.
  • S. Boyles, D. Fajardo, and S. T. Waller (2007) A naive bayesian classifier for incident duration prediction. In 86th Annual Meeting of the Transportation Research Board, Washington, DC, Cited by: §2.
  • T. Evgeniou and M. Pontil (2004) Regularized multi–task learning. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 109–117. Cited by: §2.
  • M. S. Gerber (2014) Predicting crime using twitter and kernel density estimation. Decision Support Systems 61, pp. 115–125. Cited by: §2.
  • T. Ji, K. Fu, N. Self, C. Lu, and N. Ramakrishnan (2018) Multi-task learning for transit service disruption detection. In 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 634–641. Cited by: §1, §2.
  • A. J. Khattak, J. Liu, B. Wali, X. Li, and M. Ng (2016) Modeling traffic incident duration using quantile regression. Transportation Research Record 2554 (1), pp. 139–148. Cited by: §6.3.
  • W. Kim and G. Chang (2012) Development of a hybrid prediction model for freeway incident duration: a case study in maryland. International journal of intelligent transportation systems research 10 (1), pp. 22–33. Cited by: §2.
  • A. T. Kwee, M. Chiang, P. K. Prasetyo, and E. Lim (2018) Traffic-cascade: mining and visualizing lifecycles of traffic congestion events using public bus trajectories. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 1955–1958. Cited by: §4.2.
  • Y. Lee and C. Wei (2010) A computerized feature selection method using genetic algorithms to forecast freeway accident duration times. Computer-Aided Civil and Infrastructure Engineering 25 (2), pp. 132–148. Cited by: §2.
  • R. Li, F. C. Pereira, and M. E. Ben-Akiva (2018) Overview of traffic incident duration analysis and prediction. European Transport Research Review 10 (2), pp. 22. Cited by: §1, §6.3.
  • P. Lin, N. Zou, and G. Chang (2004) Integration of a discrete choice model and a rule-based system for estimation of incident duration: a case study in maryland. In CD-ROM of Proceedings of the 83rd TRB Annual Meeting, Washington, DC, Cited by: §2.
  • D. Nam and F. Mannering (2000) An exploratory hazard-based analysis of highway incident duration. Transportation Research Part A: Policy and Practice 34 (2), pp. 85–102. Cited by: §2.
  • B. O’Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith (2010) From tweets to polls: linking text sentiment to public opinion time series. In Fourth International AAAI Conference on Weblogs and Social Media, Cited by: §2.
  • N. Owens, A. Armstrong, P. Sullivan, C. Mitchell, D. Newton, R. Brewster, and T. Trego (2010) Traffic incident management handbook. Technical report Cited by: §2.
  • K. Ozbay and P. Kachroo (1999) Incident management in intelligent transportation systems. Cited by: §1.
  • N. Parikh, S. Boyd, et al. (2014) Proximal algorithms. Foundations and Trends® in Optimization 1 (3), pp. 127–239. Cited by: §5.2.2, §5.2.3.
  • H. Park, A. Haghani, and X. Zhang (2016) Interpretation of bayesian neural networks for predicting the duration of detected incidents. Journal of Intelligent Transportation Systems 20 (4), pp. 385–400. Cited by: §6.3.
  • S. Peeta, J. L. Ramos, and S. Gedela (2000) Providing real-time traffic advisory and route guidance to manage borman incidents on-line using the hoosier helper program. Cited by: §2, §6.2.
  • N. Ramakrishnan, P. Butler, S. Muthiah, N. Self, R. Khandpur, P. Saraf, W. Wang, J. Cadena, A. Vullikanti, G. Korkmaz, et al. (2014) ’Beating the news’ with embers: forecasting civil unrest using open source indicators. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1799–1808. Cited by: §6.2.
  • T. Sakaki, M. Okazaki, and Y. Matsuo (2010) Earthquake shakes twitter users: real-time event detection by social sensors. In Proceedings of the 19th international conference on World wide web, pp. 851–860. Cited by: §2.
  • Z. Su, L. Li, H. Peng, J. Kurths, J. Xiao, and Y. Yang (2014) Robustness of interrelated traffic networks to cascading failures. Scientific reports 4, pp. 5413. Cited by: §4.2.
  • R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight (2005) Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (1), pp. 91–108. Cited by: §1.
  • R. Tibshirani (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58 (1), pp. 267–288. Cited by: §6.2, §6.2.
  • G. Valenti, M. Lelli, and D. Cucina (2010) A comparative study of models for the incident duration prediction. European Transport Research Review 2 (2), pp. 103–111. Cited by: §2.
  • E. I. Vlahogianni and M. G. Karlaftis (2013) Fuzzy-entropy neural network freeway incident duration modeling with single and competing uncertainties. Computer-Aided Civil and Infrastructure Engineering 28 (6), pp. 420–433. Cited by: §2.
  • Y. Xu and W. Yin (2013) A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM Journal on imaging sciences 6 (3), pp. 1758–1789. Cited by: §5.2.1.
  • C. Zhan, A. Gan, and M. Hadi (2011) Prediction of lane clearance time of freeway incidents using the m5p tree algorithm. IEEE Transactions on Intelligent Transportation Systems 12 (4), pp. 1549–1557. Cited by: §1, §2.
  • X. Zhang, L. Zhao, A. P. Boedihardjo, C. Lu, and N. Ramakrishnan (2017) Spatiotemporal event forecasting from incomplete hyper-local price data. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 507–516. Cited by: §1.
  • L. Zhao, J. Chen, F. Chen, W. Wang, C. Lu, and N. Ramakrishnan (2015a) Simnest: social media nested epidemic simulation via online semi-supervised deep learning. In 2015 IEEE International Conference on Data Mining, pp. 639–648. Cited by: §2.
  • L. Zhao, Q. Sun, J. Ye, F. Chen, C. Lu, and N. Ramakrishnan (2015b) Multi-task learning for spatio-temporal event forecasting. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1503–1512. Cited by: §2, §2.
  • L. Zhao, J. Wang, and X. Guo (2018) Distant-supervision of heterogeneous multitask learning for social event forecasting with multilingual indicators. In Thirty-Second AAAI Conference on Artificial Intelligence, Cited by: §1.
  • L. Zhao, J. Ye, F. Chen, C. Lu, and N. Ramakrishnan (2016) Hierarchical incomplete multi-source feature learning for spatiotemporal event forecasting. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2085–2094. Cited by: §1, §6.2.
  • J. Zhou, J. Chen, and J. Ye (2011) Malsar: multi-task learning via structural regularization. Arizona State University 21. Cited by: §2.
  • J. Zhou, J. Liu, V. A. Narayan, J. Ye, A. D. N. Initiative, et al. (2013a) Modeling disease progression via multi-task learning. NeuroImage 78, pp. 233–248. Cited by: §1.
  • J. Zhou, Z. Lu, J. Sun, L. Yuan, F. Wang, and J. Ye (2013b) Feafiner: biomarker identification from medical data through feature generalization and selection. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1034–1042. Cited by: §1, §6.2, §6.4.2.
  • Y. Zou, K. Henrickson, D. Lord, Y. Wang, and K. Xu (2016) Application of finite mixture models for analysing freeway incident clearance time. Transportmetrica A: Transport Science 12 (2), pp. 99–115. Cited by: §6.3.