RiskOracle: A Minute-level Citywide Traffic Accident Forecasting Framework

02/19/2020 ∙ by Zhengyang Zhou, et al. ∙ USTC

Real-time traffic accident forecasting is increasingly important for public safety and urban management (e.g., real-time safe route planning and emergency response deployment). Previous work on accident forecasting is often performed at the hour level, utilizing existing neural networks with static region-wise correlations taken into account. However, forecasting remains challenging as the granularity of the forecasting step becomes finer, because of the highly dynamic nature of the road network and the inherent rareness of accident records in one training sample, which lead to biased results and the zero-inflated issue. In this work, we propose a novel framework, RiskOracle, to improve the prediction granularity to the minute level. Specifically, we first transform the zero-risk values in labels to fit the training network. Then, we propose the Differential Time-varying Graph neural network (DTGN) to capture the immediate changes of traffic status and dynamic inter-subregion correlations. Furthermore, we adopt multi-task and region selection schemes to highlight the citywide most-likely accident subregions, bridging the gap between biased risk values and sporadic accident distribution. Extensive experiments on two real-world datasets demonstrate the effectiveness and scalability of our RiskOracle framework.


Introduction

Traffic accident forecasting is of great significance for urban safety. For example, with the deployment of the Tennessee accident prediction model, the fatality rate in Tennessee was reduced by 8.16% in 2016, according to statistics [Tennessee]. There is an increasing demand for accident prediction at a finer granularity, enabling more timely safe route recommendation for travelers and more accurate emergency response for emerging applications such as traffic intelligence and automatic driving.

Regarding the length of accident prediction periods, existing traffic accident forecasting tasks fall into two classes: long-term (day-level prediction) and mid-term (hour-level prediction). We summarize these related works in Table 1. Even though recent studies focus on day-level forecasting [11, 8, Tennessee] by modeling spatiotemporal heterogeneous data, and [8] reaches state-of-the-art performance, day-level prediction is less meaningful for emergency conditions.

Mid-term accident forecasting at the hour level can be further classified into classic learning-based and deep learning-based methods. Classic learning models include clustering-based [13], frequent tree-based [12] and Nonnegative Matrix Factorization-based [15] methods. Unfortunately, methods of this type ignore temporal relations and cannot model the complex nonlinear spatiotemporal interactions. Deep learning-based methods such as [10] utilize LSTM layers to learn temporal relations by feeding only historical traffic accident records into the training network; they lack multi-source real-time traffic inputs to support the forecasting, which leads to unsatisfactory prediction performance. Other works [6, 7, 9] investigate accident patterns through the existing deep learning frameworks SDAE, SDCAE and ConvLSTM, respectively, by incorporating real-time human mobility. However, they all fail to extract the time-varying inter-subregion and intra-subregion correlations across the whole city.

Although recent advances in deep learning models enable promising results in hour-level accident forecasting, we argue that three important issues are largely overlooked, resulting in poor prediction performance at the minute level. Firstly, as mentioned in [9], when the spatiotemporal resolution of the prediction task becomes finer, the zero-inflated problem occurs and the model predicts all results as zeros. Without any strategy to deal with this issue, the rare nonzero items in the training data prevent models from being effective [wang2018graph]. Secondly, although degrees of static subregion-wise correlations can be learned by a Convolutional Neural Network (CNN) [7, 9], time-varying subregion-wise correlations also play a vital role in citywide short-term accident prediction; e.g., two subregions tend to be strongly correlated in the morning and less correlated in the afternoon due to tidal flows. Thirdly, abnormal changes in traffic status within the same subregion during adjacent time intervals usually induce accidents or other events [chen2018radar, zheng2015detecting]. Without considering these spatiotemporal issues, the ability of previous hour-level prediction models is seriously hindered.

In this paper, we study the problem of minute-level citywide traffic accident prediction by proposing the three-stage framework RiskOracle, based on a Multi-task Differential Time-varying Graph convolution Network (Multi-task DTGN). In the data preprocessing stage, we propose a co-sensing strategy to infer the global traffic status as completely as possible, and then design a priori knowledge-based data enhancement to tackle the zero-inflated issue for short-term predictions. In the training stage, we propose Multi-task DTGN, where a time-varying overall affinity explicitly models the short-term dynamic subregion-wise correlations and a differential feature generator establishes high-level relationships between the immediate changes of traffic status and accidents. Since accidents and traffic volumes are often distributed unevenly across the city, the multi-task scheme is designed to address spatial heterogeneities in accident prediction. In the prediction stage, we then obtain a set of discrete most-likely subregions by taking advantage of the learned multi-scale accident distributions. Experiments on two datasets demonstrate that our framework surpasses state-of-the-art solutions on both 30-minute and 10-minute level prediction tasks.

Time granularity | Classic learning based method | Deep learning based method
Day-level | [11]; (Tennessee model 2017) | (Yuan, Zhou and Yang 2018)
Hour-level | [13]; [12]; [15] | [6]; [7]; [9]; [10]
Minute-level | - | Our work
Table 1: Summarization of traffic accident prediction

Preliminaries and Problem Definition

In this section, we present the preliminaries and basic definitions, then formally define the problem studied in this paper.

In our work, we find that directly modeling the whole study area as one square-shaped region and adopting a traditional CNN for spatiotemporal feature extraction leads to unnecessary redundancy, especially for real-time accident forecasting, as the contour of a city is usually irregular. As Figure 2(a) shows, we first divide the study area within the road network into medium-sized rectangular regions (rectangular regions for short). Each rectangular region consists of several small-sized square subregions (subregions for short). There are total subregions in the study area, and we model the subregions by the urban graph.

Definition 1 (Urban Graph.)

The study area can be defined as an undirected graph, called the urban graph . Here, the vertex set , where denotes the -th square-shaped urban subregion. Given two vertices , the edge between these two vertices indicates the connectedness between the two subregions, where

(1)

Note that in this paper, the traffic elements of a vertex consist of two aspects, static road network features and dynamic traffic features. We keep the connectedness of the whole urban graph to control the sparsity of affinity matrix and (introduced in the next section); the corresponding nonzero items in the affinity matrix then refer to the subregions with strong correlations.

The dynamic traffic features of subregion in a specific time interval can be modeled by three parts, (a) the intensity of human activities, represented by traffic volume ; (b) the traffic conditions, represented by the average traffic speed ; and (c) the level of traffic accident risks . The formal definition of dynamic traffic features is as follows.

Definition 2 (Static Road Network Features.)

For an urban subregion , the static features of the road network within the subregion cover the statistical spatial attributes (the numbers of road lanes, road types, road segment lengths and widths, snow removal priorities and the numbers of overhead electronic signs) of all road segments inside, and can be denoted as a fixed-length vector . The static road network features of the entire urban domain can be formulated as .

Definition 3 (Dynamic Traffic Features.)

For , the dynamic traffic features of within a given time interval can be formulated as . is the summation of the number of accidents weighted by the corresponding severity levels (footnote 1: we define three accident risk types, namely minor accidents, injured accidents, and fatal accidents [6], and assign weights 1, 2, and 3 to the three types, respectively). In particular, , where indicates the type of accident severity and denotes the number of accidents of type . So the accident risk distributions and the dynamic traffic features of the entire urban domain within can be represented by and respectively.
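As a minimal sketch of the severity-weighted risk in Definition 3, the snippet below sums accident counts using the weights 1, 2 and 3 stated in the footnote. The function name, the dictionary layout and the severity labels are illustrative assumptions, not notation from the paper.

```python
# Severity weights as stated in the footnote: minor = 1, injured = 2, fatal = 3.
SEVERITY_WEIGHTS = {"minor": 1, "injured": 2, "fatal": 3}


def accident_risk(counts):
    """Severity-weighted accident count for one subregion and one interval.

    `counts` maps a severity label to the number of accidents of that type.
    Missing labels count as zero. The dict layout is a hypothetical choice.
    """
    return sum(weight * counts.get(name, 0)
               for name, weight in SEVERITY_WEIGHTS.items())


# Two minor, one injured and one fatal accident: 2*1 + 1*2 + 1*3 = 7.
print(accident_risk({"minor": 2, "injured": 1, "fatal": 1}))
```

A subregion with no recorded accidents gets risk 0, which is exactly the case the zero-inflation handling later has to deal with.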

Definition 4 (Traffic Accident Prediction.)

Given static road network features and the historical dynamic traffic features , our purpose is to predict the distribution of the citywide traffic accident risks and select high-risk subregions , for the future time interval .

Minute-level Real-time Traffic Accident Forecasting

In this section, we first show the overview of our proposed framework RiskOracle, and then elaborate on each stage.

Framework Overview

As illustrated in Figure 1, our proposed framework RiskOracle includes three stages, Data preprocessing stage, Model training stage and Prediction stage.

Figure 1: Framework Overview of RiskOracle

Data Preprocessing

Addressing Spatial Heterogeneities in Accident Prediction

High risk values tend to be biased toward urban areas because most accidents and traffic volumes are concentrated downtown, leading to a serious spatial imbalance in risks and causing relatively high-risk regions in rural areas to be ignored. To perform citywide predictions, it is necessary to select the most-likely accident regions and address the spatial heterogeneities. Thus, as illustrated in Figure 2(a), the study area is organized hierarchically in our work: subregions and rectangular regions are responsible for collecting the fine- and coarse-grained accident distributions, respectively. Then subregions in each medium-sized rectangular region will be further highlighted separately. The multi-scale distributions can be considered as the hierarchical accident distributions.

Overcoming Zero-inflated Issue

Deep Neural Networks (DNNs) suffer from the zero-inflated issue and predict invalid results if the nonzero items in the training labels are extremely rare [wang2018graph, 9]. Only 6 accidents occur in the whole study area during a selected 10-minute interval in New York City (NYC), as Figure 2(b) shows, demonstrating the inherent rareness of short-term accidents. To overcome this issue in real-time accident prediction, we devise a priori knowledge-based data enhancement (PKDE) strategy to discriminate the risk values in the labels of the training dataset. Specifically, for interval , we transform zero items in to negative values. The transformation is done in two phases: a) the zero value is transformed into an accident risk indicator by Equation (2); b) the value of the indicator is transformed into a statistical accident intensity by Equation (3). Given subregion , we can calculate its accident risk indicator by

(2)

where is the total number of weeks in the training dataset, and indicates the total risk value of region during all time intervals in the -th week. Then, we can calculate the statistical accident intensity of region by

(3)

where and are the coefficients to maintain symmetry between the range of the absolute value of and the range of true risk values. With the logarithm transformation of values between 0 and 1, we can easily make the transformed data discriminating and suitable for training networks. The transformation has two properties: 1) the accident intensity value of a zero-item subregion is negative and thus smaller than the value of a nonzero-item subregion, reflecting the fact that a zero-item subregion has a lower accident risk; 2) a subregion with a lower accident risk indicator has a lower accident probability, preserving the ranks of actual accident risks.
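The two-phase transformation can be sketched as follows. Since Equations (2) and (3) are not reproduced here, the concrete forms are assumptions: the indicator is taken as the region's average share of the citywide weekly risk (so it lies between 0 and 1), and the intensity as a scaled base-2 logarithm of that indicator. The names `b1`, `b2`, `floor` and the data layout are hypothetical placeholders.

```python
import math


def risk_indicator(weekly_risk, region):
    """Average share of the citywide weekly risk that falls in `region`.

    `weekly_risk` holds one dict per training week, mapping a region id to
    that region's total risk in the week (assumed layout).
    """
    shares = []
    for week in weekly_risk:
        total = sum(week.values())
        shares.append(week.get(region, 0) / total if total > 0 else 0.0)
    return sum(shares) / len(shares)


def accident_intensity(indicator, b1=1.0, b2=0.0, floor=1e-6):
    """Map an indicator in [0, 1] to a non-positive value with a logarithm.

    `floor` avoids log(0) for regions that never saw an accident; b1 and b2
    are placeholder coefficients, not the values used by the authors.
    """
    return b1 * math.log2(max(indicator, floor)) + b2


def transform_labels(labels, weekly_risk, b1=1.0, b2=0.0):
    """Replace zero labels by the region's negative intensity; keep the rest."""
    transformed = {}
    for region, risk in labels.items():
        if risk == 0:
            indicator = risk_indicator(weekly_risk, region)
            transformed[region] = accident_intensity(indicator, b1, b2)
        else:
            transformed[region] = risk
    return transformed


weekly_risk = [{"a": 6, "b": 2, "c": 0}, {"a": 2, "b": 2, "c": 0}]
labels = {"a": 3, "b": 0, "c": 0}
print(transform_labels(labels, weekly_risk))
```

In this toy input, region `a` keeps its true risk, while the zero labels of `b` and `c` become negative, with the historically quieter region `c` receiving the lower value. This mirrors the two properties listed above.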

Figure 2: An example of NYC

Complementing Sparse Sensing Data

Real-time traffic information is usually insufficiently collected [wang2018real] for accident prediction, and dynamic traffic information tends to interact with the static spatial road network structures [18, 12]. Thus, we propose a co-sensing strategy that modifies xDeepFM [lian2018xdeepfm] into a SpatioTemporal Deep Factorization Machine (ST-DFM), taking advantage of the interaction operations of FM.

We first extract the road network similarities and connections between subregions by static affinity matrix where the item in denotes static affinity within subregion and and the affinity can be calculated by

(4)

Here, the function is the Jensen-Shannon divergence [lin1991divergence]:

(5)
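For illustration, the standard Jensen-Shannon divergence between two discrete distributions can be computed as below. How the divergence is turned into an affinity value is an assumption here (a decaying exponential, so identical feature profiles get affinity 1), as is the normalization of static feature vectors into distributions. The function names are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) of two discrete distributions."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    mixture = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


def normalize(vector):
    """Scale a non-negative vector with positive sum so that it sums to 1."""
    total = sum(vector)
    return [x / total for x in vector]


def static_affinity(features_i, features_j):
    """Assumed mapping from divergence to affinity: exp(-JS)."""
    divergence = js_divergence(normalize(features_i), normalize(features_j))
    return math.exp(-divergence)


print(static_affinity([2, 1, 4], [2, 1, 4]))  # identical profiles
print(static_affinity([5, 0, 1], [0, 3, 2]))  # dissimilar profiles
```

With the natural logarithm the divergence is bounded by ln 2, so under this assumed mapping every affinity lies between 0.5 and 1.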

As in xDeepFM, ST-DFM contains a Compressed Interaction Network (CIN) module and a DNN module. Three spatiotemporal fields, i.e. static spatial features, dynamic traffic features (footnote 2: for the dynamic traffic feature field in one subregion , we first select the most proximal subregions with by the static affinity matrix; the available dynamic traffic information within these subregions constitutes the dynamic traffic features in ) and timestamps, are embedded in ST-DFM. Then, ST-DFM learns the interactive relationships between different spatiotemporal features at the vector-wise level with the CIN module and the high-level representation of features with the DNN module, and finally obtains high-level feature combinations. We infer speed values by feeding traffic volumes at the corresponding subregion into ST-DFM and vice versa. Traffic information can thus be inferred to the maximum extent to obtain the global traffic status, by training on the data within the intersection of the two real-time traffic datasets.
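The neighbour selection described in the footnote can be sketched as a top-k lookup in the static affinity matrix, followed by collecting whatever dynamic readings those neighbours have. This covers only the input assembly, not ST-DFM itself. The value of `k`, the use of `None` for missing readings and both function names are assumptions made for illustration.

```python
def most_proximal(affinity, i, k):
    """Indices of the k subregions with the highest static affinity to i."""
    others = [j for j in range(len(affinity)) if j != i]
    others.sort(key=lambda j: affinity[i][j], reverse=True)
    return others[:k]


def neighbour_readings(dynamic, neighbours):
    """Collect the available (non-missing) dynamic readings of the neighbours."""
    return [dynamic[j] for j in neighbours if dynamic[j] is not None]


affinity = [
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
]
volumes = [None, 120, 80, None]  # subregions 0 and 3 have no sensor reading

nearest = most_proximal(affinity, 0, 2)
print(nearest, neighbour_readings(volumes, nearest))
```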

Multi-task DTGN for Accident Risk Prediction

SpatioTemporal DTGN

Accidents and congestion tend to interact and propagate through the road network, especially on holidays or in rush hours. Given the potential of GCN in modeling non-Euclidean subregion-wise propagations and correlations [18], we propose DTGN. We modify GCN by incorporating a time-varying overall affinity and a differential feature generator to tackle the challenges in minute-level accident prediction.

Time-varying overall affinity matrix with dynamic traffic features involved. Strong time-varying correlations have been demonstrated between the traffic conditions of different urban subregions [26, wang2014data]. Also, there exist strong spatiotemporal correlations between traffic accidents and urban traffic conditions [chen2018radar]. Therefore, for our minute-level accident prediction, it is indispensable to capture the inter-subregion time-varying traffic correlations of a specific time interval by an overall affinity matrix . The item in denotes the dynamic overall affinity between subregions and :

(6)

includes the traffic volume and average speed of subregion within the same time interval in each day of last week. Notice that we modify the weights of the static spatial attributes of subregions based on their different effects on accidents with an attention-based scheme [bahdanau2014neural]. Also, the accident-based static features of subregion can be denoted as . Further, a weighting factor is used to adjust the proportion that the dynamic traffic condition affinity contributes to the overall affinity matrix. With such an overall affinity, distant subregions that have potential accident-related correlations due to traffic characteristics can also be connected dynamically. To perform GCN in the spectral domain, we need to calculate the Laplacian matrix [23] with , which can be seen as the graph adjacency matrix. First, we derive :

(7)

where is the identity matrix of . Second, we calculate by

(8)

where and is the element in matrix . Then, we can obtain Laplacian matrix of by

(9)
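The three steps above (add the identity, build a degree matrix, normalize) match the renormalization commonly used for graph convolutions. The sketch below assumes that standard form, with self-loops added and symmetric scaling by the inverse square root of the degrees. It is not confirmed to be identical to Equations (7) to (9), and the function name is hypothetical.

```python
import math


def normalized_adjacency(affinity):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of an affinity matrix.

    Assumes non-negative affinities, so every degree is at least 1 once the
    self-loop has been added.
    """
    n = len(affinity)
    # Step 1: add self-loops (the identity matrix).
    a_hat = [[affinity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Step 2: degree of each vertex = row sum of the self-looped matrix.
    degree = [sum(row) for row in a_hat]
    inv_sqrt = [1.0 / math.sqrt(d) for d in degree]
    # Step 3: scale rows and columns by the inverse square-root degrees.
    return [[inv_sqrt[i] * a_hat[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


overall_affinity = [[0.0, 1.0], [1.0, 0.0]]
for row in normalized_adjacency(overall_affinity):
    print(row)
```

For this two-vertex graph every entry of the result is 0.5 up to floating-point rounding, so each subregion averages its own features with its neighbour's.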

Differential GCN for extracting spatiotemporal features. It has been generally accepted that the task of accident or event prediction is more relevant to abnormal variations of urban traffic conditions, compared with regular traffic conditions [chen2018radar, zheng2015detecting]. To this end, we introduce a differential feature generator to calculate differential images within adjacent time intervals. By feeding the differential dynamic traffic features into GCN, the propagations and interactions of abnormal changes in traffic can be modeled and the high-level correlations between the immediate traffic status variations and accidents are learned, especially benefiting minute-level accident forecasting. Given , the differential vector can be computed by

(10)

where and . For all subregions in , by combining their dynamic traffic features and the corresponding differential vectors, we generate a united feature tuple . As described in [2], urban traffic has obvious characteristics from three temporal perspectives: hourly closeness, daily periodicity and distant trend. To this end, when given , we select united feature tuples (footnote 3: according to the settings in [2], we set the value of as 3) for each temporal perspective as the inputs of DTGNs. Specifically, we select the last intervals of , and the same interval as on each of the last consecutive days, for the hourly closeness and daily periodicity perspectives, respectively. For the distant trend, we first select previous days at the frequency of every 10 days, and for each of the selected days we extract the same interval as . As illustrated in Figure 1, we then feed the united feature tuple sets of all three temporal perspectives into DTGNs independently. The detailed architecture of one DTGN is shown in Figure 3(a). For one specific temporal perspective, we denote the corresponding united feature tuple set as . We feed into a fully-connected (FC) network to encode all features into a lower-dimensional feature set, and then feed it into a GCN. The GCN works recursively,

(11)

Here indicates the -layer graph convolution, and denotes the weights of the -layer graph convolution kernel. Notice that, given one temporal perspective, we take the mean of the matrices of all selected time intervals as . We use Batch Normalization between every 2 GCN layers to avoid gradient explosion. Considering the negative values in the transformed dataset, we select Leaky_ReLU as the activation function. In addition, real-time dynamic external factors, i.e. timestamps and meteorological data, are embedded into a vector of fixed length consecutively, and then fused with the output of each GCN unit. For the three temporal perspectives, we denote the output feature maps of DTGN as , and respectively.
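A minimal sketch of the two ideas in this subsection follows: first-order differences between adjacent intervals are appended to the dynamic features, and one graph-convolution step propagates them over a normalized affinity matrix with a leaky activation. The subtraction form of the differential, the layer form `activation(L @ H @ W)`, the slope 0.01 and all names are assumptions. Batch normalization, the FC encoder and the external-factor fusion are omitted.

```python
def unite(current, previous):
    """Append each subregion's change since the previous interval to its features."""
    united = []
    for cur_row, prev_row in zip(current, previous):
        delta = [c - p for c, p in zip(cur_row, prev_row)]
        united.append(list(cur_row) + delta)
    return united


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def leaky_relu(x, slope=0.01):
    """Keeps a small gradient for negative inputs instead of clipping them."""
    return x if x > 0 else slope * x


def graph_conv(laplacian, features, weights, slope=0.01):
    """One graph-convolution layer: activation(L @ H @ W)."""
    propagated = matmul(matmul(laplacian, features), weights)
    return [[leaky_relu(v, slope) for v in row] for row in propagated]


# Two subregions, features = [traffic volume, average speed].
current = [[10, 30], [4, 50]]
previous = [[8, 35], [4, 50]]
united = unite(current, previous)          # adds [2, -5] and [0, 0]

laplacian = [[0.5, 0.5], [0.5, 0.5]]       # e.g. a normalized 2-vertex graph
weights = [[0.0], [0.0], [1.0], [1.0]]     # toy kernel reading only the deltas
print(united)
print(graph_conv(laplacian, united, weights))
```

The toy kernel produces a negative pre-activation, which the leaky activation scales down rather than zeroing. That is the property needed once labels may be negative.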

Figure 3: Details of Multi-task DTGN

Multi-task Learning for Accident Risk Prediction

In this subsection, we design the multi-task scheme, not only to enhance the deep representation, but also to learn hierarchical accident distributions and provide guidance for most-likely accident region selection. For forecasting accident risks of subregions, we first take the distribution of accident risks as the main task. Considering the prominent correlations between traffic accidents and the intensity of human activities, we take regional traffic volume prediction as the first auxiliary task to enhance the representation. To provide guidance for the hierarchical accident region selection, we take the total numbers of accidents within different rectangular regions as the second auxiliary task.

Specifically, we feed the output feature maps of DTGN including , and into a convolution-based fusion module as Figure 1 shows, then perform the multi-task learning. We visualize the flow chart of our multi-task scheme in Figure 3(b). First, we generate the predicted risk distribution feature map and the citywide traffic volumes as follows. We choose Leaky_ReLU in the main task because the risks in the labels are partly transformed into negative values, while the labels of the other tasks remain nonnegative.

(12)
(13)

With the additional fully-connected layer, we learn the total number of accidents within each rectangular region by:

(14)

Here, , and denote the fusion weights of accident risk, human activity intensity and the numbers of accidents within different medium rectangular regions in time interval , respectively. is the fusion weights of the fully-connected network in . can be viewed as the coarse-grained accident distribution; it is fed into another fully-connected layer to map it to the same shape as , and then integrated with the output accident risk distribution feature map , compelling both tasks to learn the relationship between multiple accident distributions adequately. Then can be updated by

(15)

where is the final output of the main task and denotes the weights of the fully-connected layer. So we have the total loss of this multi-task learning framework as

(16)

where , and are the losses of the main task and the two auxiliary tasks, respectively. We use L2 regularization to avoid overfitting, and use , , as the hyper-parameters of the loss function.
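The total loss can be read as a weighted sum of the three task losses plus an L2 penalty. The sketch below assumes mean squared error for every task and uses placeholder weights. The task names, the `l2` coefficient and the numbers are illustrative only.

```python
def mse(pred, target):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(preds, targets, task_weights, params, l2=0.0):
    """Weighted sum of per-task MSE losses plus an L2 penalty on `params`."""
    data_term = sum(weight * mse(preds[name], targets[name])
                    for name, weight in task_weights.items())
    penalty = l2 * sum(p * p for p in params)
    return data_term + penalty


preds = {"risk": [1.0, 2.0], "volume": [10.0, 10.0], "count": [3.0]}
targets = {"risk": [1.0, 4.0], "volume": [10.0, 12.0], "count": [2.0]}
task_weights = {"risk": 1.0, "volume": 0.5, "count": 0.5}  # placeholder values

# Task losses are 2.0, 2.0 and 1.0; the penalty is 0.1 * (1 + 4) = 0.5.
print(total_loss(preds, targets, task_weights, params=[1.0, 2.0], l2=0.1))
```

Fixing the main-task weight at 1 and tuning only the two auxiliary weights, as done in the hyper-parameter study, keeps the scale of the objective anchored to the risk prediction error.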

Hierarchical Most-likely Accident Region Selection

In a specific city, the coverage of accidents and traffic volumes is often imbalanced between rural and urban areas, inducing the issue of spatial heterogeneities. Thus, it is illogical to cut off high accident risks with a unified risk threshold when selecting most-likely accident subregions. We therefore propose a hierarchical most-likely accident region selection (HARS) strategy based on the hierarchical accident distributions learned in the multi-task scheme.

For each rectangular region , we select subregions with the highest risks, where the parameter equals the corresponding element in learned by the second auxiliary task. As a consequence, we obtain a set of most-likely accident regions. Also, the learned reduces the number of overpredicted regions and lets the model adapt to changes in time and weather, since external factors are involved.
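The selection rule can be sketched as a per-region top-k, where k comes from the predicted accident count of each rectangular region. Rounding the predicted count to an integer is an assumption, as are the function name and the data layout.

```python
def select_regions(risk, membership, predicted_counts):
    """Pick, inside each rectangular region, its highest-risk subregions.

    risk:             subregion id -> predicted risk
    membership:       rectangular region id -> list of its subregion ids
    predicted_counts: rectangular region id -> predicted number of accidents
    The count is rounded to the nearest integer (an assumed choice).
    """
    selected = []
    for rect, subregions in membership.items():
        k = max(0, round(predicted_counts.get(rect, 0)))
        ranked = sorted(subregions, key=lambda s: risk[s], reverse=True)
        selected.extend(ranked[:k])
    return selected


risk = {"u1": 0.9, "u2": 0.8, "u3": 0.7, "r1": 0.2, "r2": 0.1}
membership = {"urban": ["u1", "u2", "u3"], "rural": ["r1", "r2"]}
predicted_counts = {"urban": 2.2, "rural": 0.8}

print(select_regions(risk, membership, predicted_counts))
```

A single citywide threshold that keeps the three highest risks would return only urban subregions here, whereas the per-region rule also highlights the rural subregion `r1`.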

Empirical Studies

In this section, we conduct extensive empirical studies to evaluate our minute-level prediction framework by setting the temporal intervals as 30 minutes and 10 minutes.

Data Description

We conduct experiments on two real-world datasets: NYC Opendata and the Suzhou Industrial Park (SIP) dataset. For the NYC dataset, due to the lack of real-time traffic volumes, we utilize the taxi trip volumes in each subregion as the indicator of human mobility. The SIP dataset contains traffic flows and speeds; we integrate it with another traffic accident dataset collected from Sina Microblog, a social media platform. The statistics are shown in Table 2. More details are available at https://github.com/zzyy0929/AAAI2020-RiskOracle/.

Implementation Details

For experiments, we select 60%, 30% and 10% of each dataset for training, evaluation and validation, respectively. We generate the subregion set by partitioning the city map into small subregions of equal size, following common settings [18] and practices. We stack 9 GCN layers with 384 filters in each layer. The weights of the loss function are set as , , . The multi-task DTGN is trained with back propagation and the Adam method [kingma2014adam].

During the training period, dynamic traffic data and affinity matrices are aggregated into 3 groups and two-scale accident distributions are fed into Multi-task DTGN. For testing, we fetch the needed data and pass it through the model; the most-likely accident subregions are derived from the main task and the second auxiliary task. The high-risk subregions are highlighted and compared with real-world accident records within the same spatiotemporal scope.

City | Dataset | Time Span | # of Regions | # of Records
NYC | Accidents | 01/01/2017-05/31/2017 | 354 | 254k
NYC | Taxi Trips | | | 48,496k
NYC | Speed Values | | | 125k
NYC | Weathers | | | 604
NYC | Demographics | Investigated in 2016 | | 195
NYC | Road Network | | | 102k
SIP | Accidents | 01/01/2017-03/31/2017 | 108 | 183
SIP | Traffic Flows | | | 1,399k
SIP | Speed Values | | | 311k
SIP | Weathers | | | 180
Table 2: Datasets statistics
Note (footnote 5): "Dataset" refers to the different types of traffic-related records in the city.

Evaluation Metrics

We evaluate our proposed RiskOracle from two perspectives, a regression perspective and a classification perspective. (1) Regression perspective: Mean Square Error (MSE) of predicted risks. (2) Spatial classification perspective: a) Accuracy of top (Acc@) [liao2018predicting], which is widely applied in spatiotemporal ranking tasks, indicates the percentage of accurate predictions among the subregions with the highest risks. equals 20 and 6 for 30-minute and 10-minute evaluation on the NYC dataset according to the statistics (NYC Accident Records 2017). Similarly, on the SIP dataset, equals 5. b) Acc@, where is the summation of learned by the second auxiliary task. Note that Acc1 denotes the accuracy during hours with a high frequency of accidents, i.e. 7:00 a.m.-9:00 a.m. and 12:00 p.m.-4:00 p.m.
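One plausible reading of the top-ranked accuracy metric is sketched below: rank subregions by predicted risk, keep the top ones, and report the share of those in which an accident actually occurred. The exact definition in [liao2018predicting] may differ, so this form, the function name and the sample data are assumptions.

```python
def acc_at_k(predicted_risk, accident_regions, k):
    """Share of the k highest-risk predicted subregions that had an accident.

    predicted_risk:   subregion id -> predicted risk
    accident_regions: set of subregion ids with a recorded accident
    """
    ranked = sorted(predicted_risk, key=predicted_risk.get, reverse=True)
    top_k = ranked[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for region in top_k if region in accident_regions)
    return hits / len(top_k)


predicted_risk = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.1}
accident_regions = {"a", "c"}

print(acc_at_k(predicted_risk, accident_regions, 2))  # top-2 is a, b: one hit
```

The same function covers the second variant of the metric if `k` is set per interval to the summed region counts predicted by the second auxiliary task instead of a fixed constant.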

Baselines

Five baselines are as follows: (1) ARIMA, a classic machine learning algorithm for understanding and predicting future values, especially for time-series predictions; (2) Hetero-ConvLSTM, the state-of-the-art deep learning framework for traffic accident prediction [8] (footnote 6: we adjust the hyper-parameters to reach its best performance at 4 blocks with 16 filters, and a moving window of size 12×12 with step=6); (3) ST-ResNet, proposed in [2] for predicting traffic flows; (4) SDAE, proposed in [6] for real-time risk prediction by incorporating human mobility; (5) SDCAE, the latest method for citywide hour-level accident risk prediction, proposed in [7].

Evaluation Results and Analysis

Comparison Performances

Table 3 shows the performance comparisons on the NYC and SIP datasets with 30-minute and 10-minute interval settings. Encouragingly, our framework RiskOracle achieves the highest accuracy and outperforms the baselines on almost all metrics. With HARS, our model addresses the spatial heterogeneities and the overprediction issue in accident prediction by highlighting subregions in the -th rectangular region. Especially on the NYC dataset, RiskOracle improves Acc@20 by 22.49 percentage points over the best baseline (70.53% versus 48.04% for Hetero-ConvLSTM). Consequently, RiskOracle is more sensitive to accidents and extensible to sparse sensing data as well as short-term sporadic spatiotemporal forecasting. Additionally, our model performs better during high-risk hours, which is desirable for real applications in accident forecasting. The better performance on NYC than on SIP may be due to incomplete accident labels in SIP. We will report Acc@ later in the ablation studies.
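As a quick arithmetic check of the reported margin, the snippet below recomputes the gap between RiskOracle and the best baseline from the NYC accuracy values listed in Table 3. The dictionary layout and the function name are illustrative. The gap is an absolute difference in percentage points, not a relative improvement.

```python
# NYC accuracy values (in %) copied from Table 3.
NYC_ACC = {
    "Acc@20": {"ARIMA": 14.23, "Hetero-ConvLSTM": 48.04, "ST-ResNet": 31.06,
               "SDAE": 12.08, "SDCAE": 14.79, "RiskOracle": 70.53},
    "Acc@6": {"ARIMA": 8.62, "Hetero-ConvLSTM": 24.53, "ST-ResNet": 10.02,
              "SDAE": 8.33, "SDCAE": 13.48, "RiskOracle": 45.18},
}


def margin_over_best_baseline(metric, ours="RiskOracle"):
    """Return (best baseline name, gap in percentage points) for one metric."""
    scores = NYC_ACC[metric]
    baselines = {name: value for name, value in scores.items() if name != ours}
    best = max(baselines, key=baselines.get)
    return best, round(scores[ours] - baselines[best], 2)


print(margin_over_best_baseline("Acc@20"))  # ('Hetero-ConvLSTM', 22.49)
print(margin_over_best_baseline("Acc@6"))   # ('Hetero-ConvLSTM', 20.65)
```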

Overall, as the temporal granularity becomes finer, the performance of our framework decreases slightly while that of the baselines decreases sharply as they become trapped in the zero-inflated issue, which demonstrates the effectiveness and scalability of our proposal for short-term accident prediction. The improvements on both datasets verify the robustness and generality of our proposed RiskOracle even when the dataset in real applications includes rare accident records.

NYC Models | MSE (30-min) | Acc@20 (30-min) | Acc1@20 (30-min) | MSE (10-min) | Acc@6 (10-min) | Acc1@6 (10-min)
ARIMA | 0.6801 | 14.23% | 20.26% | 0.2380 | 8.62% | 10.05%
Hetero-ConvLSTM | 0.1129 | 48.04% | 58.01% | 0.0185 | 24.53% | 42.01%
ST-ResNet | 0.0627 | 31.06% | 40.93% | 0.0162 | 10.02% | 27.50%
SDAE | 0.2414 | 12.08% | 27.73% | 0.0435 | 8.33% | 12.74%
SDCAE | 0.2209 | 14.79% | 22.64% | 0.0076 | 13.48% | 31.48%
RiskOracle (Ours) | 0.1085 | 70.53% | 72.91% | 0.0452 | 45.18% | 69.22%

SIP Models | MSE (30-min) | Acc@5 (30-min) | Acc1@5 (30-min) | MSE (10-min) | Acc@5 (10-min) | Acc1@5 (10-min)
ARIMA | - | 19.68% | 18.42% | - | 23.68% | 28.62%
Hetero-ConvLSTM | 3.392 | 28.92% | 42.37% | 3.980 | 31.42% | 48.57%
ST-ResNet | 3.459 | 60.73% | 62.50% | 3.180 | 41.78% | 43.50%
SDAE | 3.322 | 60.26% | 36.72% | 3.312 | 12.88% | 20.83%
SDCAE | 3.210 | 58.68% | 67.50% | 3.455 | 26.31% | 37.50%
RiskOracle (Ours) | 3.270 | 63.15% | 65.24% | 3.029 | 46.30% | 48.91%
Table 3: Performance comparisons on NYC and SIP datasets
Note (footnote 7): for SIP, we report the MSE of accidents.

Evaluations on Acc@ and Ablation Studies

In Table 4, we report the results on Acc@, which we propose in this paper. For fair comparisons, we record in each interval, which is the summation of learned by our framework. It is reasonable that the results of Acc@20 and Acc@6 are slightly higher than Acc@, because the uniform threshold cannot adapt to real-time conditions and tends to overpredict the accidents. In contrast, our framework has the flexibility to approximate the number of accidents in each rectangular region with the multi-scale accident distribution forecasting. As observed, our framework outperforms the baselines and achieves an acceptable level of accuracy on Acc@ when compared to the results in Table 3, verifying the effectiveness of our hierarchical accident selection mechanism in the task.

Further, to investigate how each component contributes to high-quality results, we perform an ablation study to tease apart which components of RiskOracle are most important for its success. The prediction performance of the ablative variants of RiskOracle on the NYC dataset is shown in Table 4. RO-1 to RO-5 represent the variants obtained by removing, in turn, the following modules from the integrated RiskOracle: PKDE strategy, ST-DFM, overall affinity, differential feature generator and multi-task with HARS. The integrated model consistently outperforms the other variants at both the 30-minute and 10-minute levels. Specifically, the time-varying overall affinity and the PKDE strategy contribute the most remarkable improvements. According to Table 4, we can conclude that the well-designed components indeed result in significant improvements in short-term predictions.

Hyper-parameter Studies

Here we show the parameter studies at the 30-minute level in NYC. We adjust the number of layers and the number of filters in each layer, and reach the best performance at 9 layers with 384 filters. We fix the weight of the main task as 1 and adjust , ; Acc@ reaches its highest value, 53.82%, when = 0.8 and = 1. Also, we adjust the weight of the dynamic element in the overall affinity and reach the best performance when equals among {0, 0.5, 1.0, 1.2, 1.5}. And equals 18 among {9, 18, 33} when the MSE of the second auxiliary task reaches its lowest value. Note that our framework is trained offline and the learned parameters are utilized for online prediction. The computation workload is determined by the parameters and can be completed in several seconds, which sufficiently meets the real-time forecasting requirement.

Ablation Variants | MSE (30-min) | Acc@20 (30-min) | Acc@ (30-min) | MSE (10-min) | Acc@6 (10-min) | Acc@ (10-min)
RO-1 | 0.069 | 48% | 42% | 0.048 | 28% | 25%
RO-2 | 0.126 | 69% | 53% | 0.103 | 43% | 38%
RO-3 | 0.169 | 34% | 36% | 0.124 | 29% | 30%
RO-4 | 0.115 | 63% | 49% | 0.063 | 42% | 31%
RO-5 | 0.118 | 65% | - | 0.053 | 40% | -
Integrated | 0.108 | 70% | 57% | 0.045 | 45% | 46%
Table 4: Ablation studies on NYC dataset

Case Study

We visualize the accidents predicted by RiskOracle at selected 30-minute intervals on one day in Figure 4. Overall, the citywide risk maps generated by RiskOracle reveal discriminating risks, and the highlighted subregions show great spatial similarity with the ground truth. Here, accidents predicted at 7:00 a.m. are rare, because few people go out on Sunday morning. However, the number of accidents increases in the afternoon and becomes even higher in the evening, mainly because of the heavy rain that evening, which caused accident-prone road conditions. The results indicate that the auxiliary task and HARS learn to adjust inferences accordingly by capturing dynamic patterns of accident distributions with external factors, which brings better adaptivity than a unified threshold solution.

Figure 4: RiskOracle on May 22, 2017, NYC

Conclusion

In this paper, we tackle the challenges of minute-level citywide traffic accident forecasting by proposing the integrated framework RiskOracle based on Multi-task DTGN, providing a quantitative decision-making basis for urban safety in a more timely manner. We first propose two strategies to overcome the zero-inflated issue and sparse sensing. By incorporating the differential feature generator and the time-varying overall affinity in Multi-task DTGN, our framework is able to model sporadic spatiotemporal data and capture the short-term subregion-wise correlations. We also highlight most-likely accident regions to deal with spatial heterogeneities, using learnable multi-scale accident distributions in the multi-task scheme. Experiments on two real-world datasets verify that our framework outperforms the state-of-the-art solutions. Therefore, our work can serve as a paradigm for addressing spatiotemporal data mining tasks with sporadic labels and insufficient sensing data, e.g. predictions of crimes and epidemic outbreaks.

Acknowledgements

This paper is partially supported by the Anhui Science Foundation for Distinguished Young Scholars (No.1908085J24), NSFC (No.61672487, No.61772492), Jiangsu Natural Science Foundation (No.BK20171240, BK20191193) and CAS Pioneer Hundred Talents Program.

References