Introduction
Traffic accident forecasting is of great significance for urban safety. For example, with the deployment of the Tennessee accident prediction model, the state's fatality rate fell by 8.16% in 2016, according to statistics [Tennessee]. There is growing demand for accident prediction at a finer granularity, which would enable more timely safe-route recommendation for travelers and more accurate emergency response in emerging applications such as traffic intelligence and automatic driving.
With respect to the length of the prediction period, existing traffic accident forecasting tasks fall into two classes: long-term (day-level prediction) and mid-term (hour-level prediction). We summarize the related work in Table 1. Recent studies focus on day-level forecasting [11, 8, Tennessee] by modeling heterogeneous spatiotemporal data, and [8] reaches state-of-the-art performance; however, day-level forecasting is of limited use in emergency conditions.

Mid-term accident forecasting at the hour level can be further divided into classic learning and deep-learning-based methods. Classic learning models include clustering-based [13], frequent-tree-based [12] and Non-negative Matrix Factorization-based [15] methods. Unfortunately, these methods ignore temporal relations and cannot model complex nonlinear spatiotemporal interactions. Deep-learning-based methods such as [10] use LSTM layers to learn temporal relations but feed only historical traffic accident records into the network; without multi-source real-time traffic inputs to support the forecast, prediction performance is unsatisfactory. Other works [6, 7, 9] investigate accident patterns with the existing deep learning frameworks SDAE, SDCAE and ConvLSTM, respectively, by incorporating real-time human mobility. However, they all fail to extract the time-varying inter-subregion and intra-subregion correlations across the whole city.

Although recent advances in deep learning models enable promising results in hour-level accident forecasting, we argue that three important issues are largely overlooked, resulting in poor prediction performance at the minute level. First, as mentioned in [9], when the spatiotemporal resolution of the prediction task increases, the zero-inflated problem occurs and the model predicts all results as zeros. Without a strategy to deal with this issue, the rarity of non-zero items in the training data prevents models from learning effectively [wang2018graph]. Second, although static subregion-wise correlations can be learned by a Convolutional Neural Network (CNN) [7, 9], time-varying subregion-wise correlations also play a vital role in citywide short-term accident prediction; e.g. two subregions tend to be strongly correlated in the morning and less correlated in the afternoon because of tidal flows. Third, abnormal changes in traffic status within the same subregion during adjacent time intervals usually induce accidents or other events [chen2018radar, zheng2015detecting]. Without considering these spatiotemporal issues, previous hour-level prediction models are seriously hindered.

In this paper, we study the problem of minute-level citywide traffic accident prediction by proposing the three-stage framework RiskOracle, based on the Multi-task Differential Time-varying Graph convolution Network (Multi-task DTGN). In the data preprocessing stage, we propose a co-sensing strategy to infer as much of the global traffic status as possible, and then design an a priori knowledge-based data enhancement to tackle the zero-inflated issue in short-term prediction. In the training stage, we propose Multi-task DTGN, in which the time-varying overall affinity explicitly models short-term dynamic subregion-wise correlations and the differential feature generator establishes high-level relationships between immediate changes in traffic status and accidents. Because accidents and traffic volumes are often distributed unevenly across a city, the multi-task scheme is designed to address spatial heterogeneity in accident prediction. In the prediction stage, we obtain a set of discrete most-likely accident subregions by taking advantage of the learned multi-scale accident distributions. Experiments on two datasets demonstrate that our framework surpasses state-of-the-art solutions on both 30-minute and 10-minute prediction tasks.




[Table 1: related work grouped by prediction granularity (day-level, hour-level, minute-level). Our work is the minute-level entry.]
Preliminaries and Problem Definition
In this section, we present the preliminaries and basic definitions, then formally define the problem studied in this paper.
In our work, we find that directly modeling the whole study area as one square region and applying a traditional CNN for spatiotemporal feature extraction introduces unnecessary redundancy, especially for real-time accident forecasting, because the contour of a city is usually irregular. As Figure 2(a) shows, we first divide the study area within the road network into medium-sized rectangular regions (rectangular regions for short). Each rectangular region consists of several small square subregions (subregions for short). There are total subregions in the study area, and we model the subregions by the urban graph.
Definition 1 (Urban Graph.)
The study area can be defined as an undirected graph, called the urban graph . Here, the vertex set , where denotes the th square-shaped urban subregion. Given two vertices , the edge between them indicates the connectedness between the two subregions, where
(1) 
Note that in this paper, the traffic elements of a vertex consist of two aspects: static road network features and dynamic traffic features. We keep the connectedness of the whole urban graph to control the sparsity of affinity matrix and (introduced in the next section), so that the corresponding non-zero items in the affinity matrix refer to subregions with strong correlations.

The dynamic traffic features of subregion in a specific time interval can be modeled by three parts: (a) the intensity of human activities, represented by traffic volume ; (b) the traffic conditions, represented by the average traffic speed ; and (c) the level of traffic accident risks . The formal definitions are as follows.
Definition 2 (Static Road Network Features.)
For an urban subregion , the static road network features within the subregion cover statistical spatial attributes of all road segments inside it: the numbers of road lanes, road types, road segment lengths and widths, snow removal priorities, and the numbers of overhead electronic signs. They can be denoted as a fixed-length vector . The static road network features of the entire urban domain can be formulated as .
Definition 3 (Dynamic Traffic Features.)
For , the dynamic traffic features of within a given time interval can be formulated as . is the summation of the number of accidents weighted by the corresponding severity levels (footnote 1: we define three accident risk types, namely minor accidents, injury accidents, and fatal accidents [6], and assign them weights 1, 2, and 3, respectively). In particular, , where indicates the type of accident severity and denotes the number of accidents of type . The accident risk distributions and the dynamic traffic features of the entire urban domain within can therefore be represented by and , respectively.
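The severity-weighted risk in Definition 3 can be illustrated with a minimal Python sketch. The weights 1, 2 and 3 come from the footnote; the function name `accident_risk` and the dictionary layout of the accident counts are assumptions made only for this example.

```python
# Severity weights taken from the footnote: minor = 1, injury = 2, fatal = 3.
SEVERITY_WEIGHTS = {"minor": 1, "injury": 2, "fatal": 3}


def accident_risk(counts):
    """Severity-weighted accident risk of one subregion in one time interval.

    `counts` maps a severity type to the number of accidents of that type.
    This input layout is hypothetical and chosen only for illustration.
    """
    return sum(SEVERITY_WEIGHTS[kind] * n for kind, n in counts.items())


# Two minor accidents and one injury accident: 2*1 + 1*2 + 0*3 = 4.
print(accident_risk({"minor": 2, "injury": 1, "fatal": 0}))
```

A subregion with no recorded accidents gets risk 0, which is exactly the kind of zero label that the later data enhancement step transforms.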
Definition 4 (Traffic Accident Prediction.)
Given static road network features and the historical dynamic traffic features , our purpose is to predict the distribution of the citywide traffic accident risks and select high-risk subregions , for the future time interval .
Minute-level Real-time Traffic Accident Forecasting
In this section, we first show the overview of our proposed framework RiskOracle, and then elaborate on each stage.
Framework Overview
As illustrated in Figure 1, our proposed framework RiskOracle consists of three stages: data preprocessing, model training, and prediction.
Data Preprocessing
Addressing Spatial Heterogeneities in Accident Prediction
High risk values are biased toward urban areas because most accidents and traffic volume are concentrated downtown. This leads to a serious spatial imbalance in risk and causes relatively high-risk regions in rural areas to be ignored. To perform citywide predictions, it is necessary to select most-likely accident regions and address the spatial heterogeneity. Thus, as illustrated in Figure 2(a), the study area is organized hierarchically into subregions and rectangular regions, which collect fine-grained and coarse-grained accident distributions, respectively. Subregions in each medium-sized rectangular region are then highlighted separately. The multi-scale distributions can be considered as the hierarchical accident distributions.
Overcoming Zero-inflated Issue
Deep Neural Networks (DNNs) suffer from the zero-inflated issue and predict invalid results if the non-zero items in the training labels are extremely rare [wang2018graph, 9]. As Figure 2(b) shows, only 6 accidents occur in the whole study area during a selected 10-minute interval in New York City (NYC), which demonstrates the inherent rareness of short-term accidents. To overcome this issue in real-time accident prediction, we devise an a priori knowledge-based data enhancement (PKDE) strategy to discriminate the risk values in the labels of the training dataset. Specifically, for interval , we transform zero items in to negative values. The transformation is done in two phases: a) the zero value is transformed into an accident risk indicator by Equation (2); b) the value of the indicator is transformed into a statistical accident intensity by Equation (3). Given subregion , we can calculate its accident risk indicator by
(2) 
where is the total number of weeks in the training dataset, and indicates the total risk value of region during all time intervals in the th week. Then, we can calculate the statistical accident intensity of region by
(3) 
where and are the coefficients that maintain symmetry between the range of the absolute value of and the range of true risk values. Because the logarithm is applied to values between 0 and 1, the transformed data are discriminative and suitable for training networks. The transformation has two properties: 1) the accident intensity value of a zero-item subregion is negative and thus smaller than the value of a non-zero-item subregion, reflecting the fact that a zero-item subregion has a lower accident risk; 2) a subregion with a lower accident risk indicator has a lower accident probability, which preserves the ranks of the actual accident risks.
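The two-phase transformation can be sketched in Python as follows. Since Equations (2) and (3) are not legible in this text, the concrete formulas below are assumptions: the indicator is taken as a subregion's average weekly risk divided by the citywide sum (so it lies between 0 and 1), and the intensity is `b1 * log2(indicator) + b2`. The names `pkde_labels`, `b1`, `b2` and `eps` are hypothetical.

```python
import math


def pkde_labels(weekly_risk, current_risk, b1=1.0, b2=0.0, eps=1e-6):
    """Replace zero labels with negative, rank-preserving values.

    weekly_risk[i][w]: total risk of subregion i in week w of the training set.
    current_risk[i]:   risk label of subregion i in the target interval.
    b1, b2:            scaling coefficients (placeholder values, assumed).
    """
    n_weeks = len(weekly_risk[0])
    mean_risk = [sum(weeks) / n_weeks for weeks in weekly_risk]
    total = sum(mean_risk) or 1.0  # avoid division by zero for an empty history

    labels = []
    for avg, risk in zip(mean_risk, current_risk):
        if risk > 0:
            labels.append(float(risk))  # true non-zero risks are kept as they are
            continue
        # Phase (a): assumed indicator, clipped into the open interval (0, 1).
        indicator = min(max(avg / total, eps), 1.0 - eps)
        # Phase (b): the log of a value in (0, 1) is negative, so with b1 > 0
        # and b2 <= 0 every zero item becomes a negative label.
        labels.append(b1 * math.log2(indicator) + b2)
    return labels


history = [[4, 6], [1, 1], [0, 0]]  # three subregions, two weeks
print(pkde_labels(history, current_risk=[3, 0, 0]))
```

In this sketch the subregion with no accident history receives the most negative label, and the subregion with some history receives a less negative one, so the ordering of historical risk is preserved among the zero items.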
Complementing Sparse Sensing Data
Real-time traffic information is usually too sparsely collected for accident prediction [wang2018real], and dynamic traffic information tends to interact with the static spatial road network structure [18, 12]. Thus, we propose a co-sensing strategy that modifies xDeepFM [lian2018xdeepfm] into a Spatio-Temporal Deep Factorization Machine (STDFM), taking advantage of the interaction operations of FM.
We first extract the road network similarities and connections between subregions by the static affinity matrix , where the item in denotes the static affinity between subregions and ; the affinity can be calculated by
(4) 
Here, the function is the Jensen-Shannon divergence [lin1991divergence]:
(5) 
As in xDeepFM, STDFM contains a Compressed Interaction Network (CIN) module and a DNN module. Three spatiotemporal fields, i.e. static spatial features, dynamic traffic features and timestamps, are embedded in STDFM. (Footnote 2: for the dynamic traffic feature field of one subregion , we first select the most proximal subregions with by the static affinity matrix; the available dynamic traffic information within these subregions constitutes the dynamic traffic features of .) Then, STDFM learns the interactive relationships between different spatiotemporal features at the vector-wise level with the CIN module and the high-level representation of features with the DNN module, and finally obtains high-level feature combinations. We infer speed values by feeding traffic volumes at the corresponding subregion into STDFM, and vice versa. By training on the data within the intersection of the two real-time traffic datasets, we can infer as much traffic information as possible and obtain the global traffic status.
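For reference, the Jensen-Shannon divergence used for the static affinity can be computed as in the sketch below. The divergence itself follows its standard definition. The mapping from divergence to affinity, `exp(-JS)` for connected subregions and 0 otherwise, is an assumption, because Equation (4) is not legible in this text; the helper names are hypothetical.

```python
import math


def normalize(values):
    """Scale a non-negative feature vector so that it sums to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("feature vector must have a positive sum")
    return [v / total for v in values]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q), skipping zero entries of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def static_affinity(features_i, features_j, connected=True):
    """Assumed affinity: larger when two feature profiles are more similar."""
    if not connected:
        return 0.0
    p, q = normalize(features_i), normalize(features_j)
    return math.exp(-js_divergence(p, q))


print(static_affinity([3, 1, 0], [2, 1, 1]))
```

Identical feature profiles give a divergence of 0 and hence an affinity of 1 under this assumed mapping, while fully disjoint profiles give the maximum divergence ln 2.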
Multi-task DTGN for Accident Risk Prediction
Spatio-Temporal DTGN
Accidents and congestion tend to interact and propagate through the road network, especially on holidays or in rush hours. Given the potential of GCN in modeling non-Euclidean subregion-wise propagations and correlations [18], we propose DTGN. We modify GCN by incorporating a time-varying overall affinity and a differential feature generator to tackle the challenges of minute-level accident prediction.
Time-varying overall affinity matrix with dynamic traffic features involved. Strong time-varying correlations have been demonstrated between the traffic conditions of different urban subregions [26, wang2014data]. There also exist strong spatiotemporal correlations between traffic accidents and urban traffic conditions [chen2018radar]. Therefore, for minute-level accident prediction, it is indispensable to capture the inter-subregion time-varying traffic correlations of a specific time interval by an overall affinity matrix . The item in denotes the dynamic overall affinity between subregions and :
(6) 
includes the traffic volume and average speed of subregion within the same time interval in each day of last week. Notice that we modify the weights of the static spatial attributes of subregions based on their different effects on accidents with an attention-based scheme [bahdanau2014neural]. Also, the accident-based static features of subregion can be denoted as . Further, a weighting factor is used to adjust the proportion of the overall affinity matrix that the dynamic traffic condition affinity accounts for. With such an overall affinity, distant subregions that have potential accident-related correlations due to traffic characteristics can also be connected dynamically. To perform GCN in the spectral domain, we need to calculate the Laplacian matrix [23] with , which can be seen as the graph adjacency matrix. First, we derive :
(7) 
where is the identity matrix of . Second, we calculate by
(8)
where and is the element in matrix . Then, we can obtain Laplacian matrix of by
(9) 
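The three steps above resemble the renormalization commonly used for spectral GCNs: add self-loops to the adjacency matrix, compute the degree of each vertex, then apply symmetric degree normalization. The sketch below implements that common form in plain Python. Whether Equations (7) to (9) match it term by term cannot be confirmed from this text, so it should be read as an assumption; the function name is hypothetical.

```python
import math


def normalized_adjacency(affinity):
    """Return D^{-1/2} (A + I) D^{-1/2} for a non-negative affinity matrix A.

    `affinity` is a square list of lists. Adding the identity guarantees
    that every degree is at least 1, so the inverse square root is defined.
    """
    n = len(affinity)
    # Step 1: add self-loops, A_tilde = A + I.
    a_tilde = [[affinity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    # Step 2: degree of each vertex is the row sum of A_tilde.
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    # Step 3: symmetric normalization.
    return [[inv_sqrt_deg[i] * a_tilde[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


overall_affinity = [[0.0, 0.8, 0.0],
                    [0.8, 0.0, 0.3],
                    [0.0, 0.3, 0.0]]
for row in normalized_adjacency(overall_affinity):
    print([round(x, 3) for x in row])
```

Because the overall affinity changes with the time interval, this normalization would be recomputed for each interval rather than once for the whole graph.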
Differential GCN for extracting spatiotemporal features. It is generally accepted that accident or event prediction is more closely related to abnormal variations in urban traffic conditions than to regular traffic conditions [chen2018radar, zheng2015detecting]. To this end, we introduce a differential feature generator to calculate differential images between adjacent time intervals. By feeding the differential dynamic traffic features into the GCN, the propagations and interactions of abnormal changes in traffic can be modeled and the high-level correlations between immediate traffic status variations and accidents can be learned, which especially benefits minute-level accident forecasting. Given , the differential vector can be computed by
(10) 
where and . For all subregions in , by combining their dynamic traffic features and the corresponding differential vectors, we generate a united feature tuple .

As described in [2], urban traffic has obvious characteristics from three temporal perspectives: hourly closeness, daily periodicity and distant trend. To this end, when given , we select united feature tuples for each temporal perspective as the inputs of DTGNs (footnote 3: according to the settings in [2], we set the value of as 3). Specifically, for hourly closeness we select the last intervals of , and for daily periodicity the same interval as of the last continuous days. For distant trend, we first select previous days at a frequency of every 10 days and, for each selected day, extract the same interval as . As illustrated in Figure 1, we then feed the united feature tuple sets of all three temporal perspectives into DTGNs independently.

The detailed architecture of one DTGN is shown in Figure 3(a). For one specific temporal perspective, we denote the corresponding united feature tuple set as . We feed into a fully-connected (FC) network to encode all features into a lower-dimensional feature set, and then feed it into a GCN. The GCN works recursively,
(11) 
Here indicates the layer graph convolution, and denotes the weights of the layer graph convolution kernel. Note that, given one temporal perspective, we take the mean of the matrices of all selected time intervals as . We use Batch Normalization between every 2 GCN layers to avoid gradient explosion. Considering the negative values in the transformed dataset, we select as the activation function. In addition, real-time dynamic external factors, i.e. timestamps and meteorological data, are embedded consecutively into a vector of fixed length, and then fused with the output of each GCN unit. For the three temporal perspectives, we denote the output feature maps of DTGN as , and respectively.
Multi-task Learning for Accident Risk Prediction
In this subsection, we design the multi-task scheme not only to enhance the deep representation, but also to learn hierarchical accident distributions and provide guidance for most-likely accident region selection. For forecasting the accident risks of subregions, we take the distribution of accident risks as the main task. Considering the prominent correlations between traffic accidents and the intensity of human activities, we take regional traffic volume prediction as the first auxiliary task to enhance the representation. To provide guidance for the hierarchical accident region selection, we take the total numbers of accidents within different rectangular regions as the second auxiliary task.
Specifically, we feed the output feature maps of DTGN, including , and , into a convolution-based fusion module as Figure 1 shows, and then perform the multi-task learning. The flow chart of our multi-task scheme is shown in Figure 3(b). First, we generate the predicted risk distribution feature map and the citywide traffic volumes as follows. We choose Leaky_ReLU for the main task because the risks in its labels are partly transformed into negative values, whereas the labels of the other tasks remain non-negative.
(12) 
(13) 
With the additional fully-connected layer, we learn the total number of accidents within each rectangular region by:
(14) 
Here, , and denote the fusion weights of accident risk, human activity intensity and the numbers of accidents within different medium rectangular regions in time interval , respectively. is the fusion weights of the fully-connected network in . can be viewed as the coarse-grained accident distribution. It is fed into another fully-connected layer to map it to the same shape as , and then integrated with the accident risk distribution feature map , compelling both tasks to learn the relationship between the multiple accident distributions adequately. Then can be updated by
(15) 
where is the final output of the main task and denotes the weights of the fully-connected layer. The total loss of this multi-task learning framework is therefore
(16) 
where , and are the losses of the main task and the two auxiliary tasks, respectively. We use L2 regularization to avoid overfitting, and use , , as the hyperparameters of the loss function.
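A weighted multi-task loss of this kind can be sketched as below. Equation (16) is not legible in this text, so the exact form is an assumption: each task loss is taken to be a mean squared error, the three losses are combined with one weight each, and an L2 penalty on the parameters is added. The function names and the flat-list inputs are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(main, aux_volume, aux_count, weights, params=(), l2=0.0):
    """Weighted sum of three task losses plus an L2 penalty (assumed form).

    main, aux_volume, aux_count: (prediction, target) pairs for the risk map,
        the traffic volume map and the per-region accident counts.
    weights: one loss weight per task, in the same order.
    params:  flat iterable of model parameters used for the L2 term.
    """
    task_losses = [mse(*main), mse(*aux_volume), mse(*aux_count)]
    weighted = sum(w * loss for w, loss in zip(weights, task_losses))
    return weighted + l2 * sum(p * p for p in params)


loss = total_loss(
    main=([1.0, 2.0], [1.0, 4.0]),
    aux_volume=([0.0], [2.0]),
    aux_count=([1.0], [1.0]),
    weights=(1.0, 0.5, 2.0),
    params=[1.0, 2.0],
    l2=0.1,
)
print(loss)
```

Fixing the weight of the main task and tuning the two auxiliary weights, as described in the hyperparameter study, corresponds to varying the last two entries of `weights` in this sketch.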
Hierarchical Most-likely Accident Region Selection
In a given city, the coverage of accidents and traffic volumes is often imbalanced between rural and urban areas, which induces spatial heterogeneity. It is therefore unreasonable to select most-likely accident subregions by cutting off high accident risks with a single unified threshold. Instead, we propose a hierarchical most-likely accident region selection (HARS) strategy based on the hierarchical accident distributions learned in the multi-task scheme.
For each rectangular region , we select subregions with the highest risks, where the parameter equals the corresponding element in learned by the second auxiliary task. As a result, we obtain a set of most-likely accident regions. In addition, the learned reduces the number of over-predicted regions and, with external factors involved, keeps the model adaptive to changes in time and weather.
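The selection rule can be sketched in a few lines of Python. The data layout (dictionaries keyed by subregion and region identifiers) and the rounding of the predicted accident count to an integer are assumptions made for this example; the function name is hypothetical.

```python
def select_accident_subregions(risk, region_of, predicted_counts):
    """Pick the top-k riskiest subregions inside each rectangular region.

    risk:             subregion id -> predicted risk (main task output).
    region_of:        subregion id -> id of its rectangular region.
    predicted_counts: region id -> predicted number of accidents
                      (second auxiliary task output), rounded to get k.
    """
    by_region = {}
    for sub, value in risk.items():
        by_region.setdefault(region_of[sub], []).append((value, sub))

    selected = []
    for region, items in by_region.items():
        k = max(0, round(predicted_counts.get(region, 0)))
        items.sort(key=lambda pair: pair[0], reverse=True)
        selected.extend(sub for _, sub in items[:k])
    return selected


risk = {"a": 0.9, "b": 0.1, "c": 0.4, "d": 0.5}
region_of = {"a": "downtown", "b": "downtown", "c": "rural", "d": "rural"}
print(select_accident_subregions(risk, region_of, {"downtown": 1.2, "rural": 0.8}))
```

In this example a single global threshold of, say, 0.6 would select only subregion `a`, whereas the per-region rule also selects `d`, the riskiest subregion of the rural region.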
Empirical Studies
In this section, we conduct extensive empirical studies to evaluate our minute-level prediction framework with temporal intervals of 30 minutes and 10 minutes.
Data Description
We conduct experiments on two real-world datasets: NYC Opendata and the Suzhou Industrial Park (SIP) dataset. Because the NYC dataset lacks real-time traffic volumes, we use the taxi trip volumes in each subregion as the indicator of human mobility. The SIP dataset contains traffic flows and speeds; we integrate it with another traffic accident dataset collected from Sina Microblog, a social media platform. The statistics are shown in Table 2. More details are available on the website (footnote 4: https://github.com/zzyy0929/AAAI2020RiskOracle/).
Implementation Details
For experiments, we select 60%, 30% and 10% of the dataset for training, evaluation and validation, respectively. We generate the subregion set by partitioning the city map into small subregions of equal size, following common settings [18] and practices. We stack 9 GCN layers with 384 filters in each layer. The weights of the loss function are set as , , . The Multi-task DTGN is trained with backpropagation and the Adam method [kingma2014adam].
During training, dynamic traffic data and affinity matrices are aggregated into 3 groups, and two-scale accident distributions are fed into Multi-task DTGN. For testing, we fetch the needed data and pass it through the model; most-likely accident subregions are then derived from the main task and the second auxiliary task. The high-risk subregions are highlighted and compared with real-world accident records in the same spatiotemporal scope.
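Table 2: Statistics of the datasets. The headers of the last two columns are not legible in the source and are left blank.

City | Dataset (footnote 5) | Time Span                |     |
NYC  | Accidents            | 01/01/2017 to 05/31/2017 | 354 | 254k
     | Taxi Trips           |                          |     | 48,496k
     | Speed Values         |                          |     | 125k
     | Weathers             |                          |     | 604
     | Demographics         | Investigated in 2016     |     | 195
     | Road Network         |                          |     | 102k
SIP  | Accidents            | 01/01/2017 to 03/31/2017 | 108 | 183
     | Traffic Flows        |                          |     | 1,399k
     | Speed Values         |                          |     | 311k
     | Weathers             |                          |     | 180

(Footnote 5: "Dataset" refers to the different types of traffic-related records in the city.)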
Evaluation Metrics
We evaluate RiskOracle from two perspectives: regression and classification. (1) Regression perspective: Mean Square Error (MSE) of predicted risks. (2) Spatial classification perspective: a) Accuracy of top (Acc@) [liao2018predicting], which is widely applied in spatiotemporal ranking tasks and indicates the percentage of accurate predictions among the subregions with the highest risks. equals 20 and 6 for the 30-minute and 10-minute evaluations on the NYC dataset, according to the statistics (NYC Accident Records 2017); similarly, on the SIP dataset, equals 5. b) Acc@, where is the summation of learned by the second auxiliary task. Note that Acc1 denotes the accuracy during hours with a high frequency of accidents, i.e. 7:00 a.m. to 9:00 a.m. and 12:00 p.m. to 4:00 p.m.
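One plausible reading of the top-ranked accuracy metric is sketched below: rank subregions by predicted risk, take the highest-ranked ones, and report the fraction in which an accident actually occurred. The exact definition in [liao2018predicting] may differ, so this is an assumption; the function name `acc_at_k` and the symbol `k` for the cut-off are hypothetical.

```python
def acc_at_k(pred_risk, true_risk, k):
    """Fraction of the k highest-risk predictions that had a real accident.

    pred_risk: predicted risk per subregion.
    true_risk: observed risk per subregion (0 means no accident).
    k:         number of top-ranked subregions to evaluate, k >= 1.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(range(len(pred_risk)), key=lambda i: pred_risk[i], reverse=True)
    hits = sum(1 for i in ranked[:k] if true_risk[i] > 0)
    return hits / k


predicted = [0.9, 0.1, 0.8, 0.3]
observed = [1, 0, 0, 2]
print(acc_at_k(predicted, observed, k=2))  # top-2 are indices 0 and 2, one hit: 0.5
```

With a fixed cut-off such as 20 or 6 the denominator is the same in every interval, whereas the learned variant described in b) would pass the summed per-region counts as `k` for each interval.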
Baselines
Five baselines are as follows:
(1) ARIMA, a classic machine learning algorithm for understanding and predicting future values, especially in time series;
(2) Hetero-ConvLSTM, the state-of-the-art deep learning framework for traffic accident prediction [8] (footnote 6: we adjust the hyperparameters to reach its best performance at 4 blocks with 16 filters and a 12×12 moving window with step=6);
(3) ST-ResNet, proposed in [2] for predicting traffic flows;
(4) SDAE, proposed in [6] for real-time risk prediction by incorporating human mobility;
(5) SDCAE, the latest method for citywide hour-level accident risk prediction, proposed in [7].
Evaluation Results and Analysis
Comparison Performances
Table 3 shows the performance comparisons on the NYC and SIP datasets with 30-minute and 10-minute interval settings. Encouragingly, RiskOracle achieves the highest accuracy and outperforms the baselines on almost all metrics. With HARS, our model addresses the spatial heterogeneity and over-prediction issues in accident prediction by highlighting subregions in the th rectangular region. On the NYC dataset in particular, RiskOracle improves Acc@20 by 22.49 percentage points over the best baseline (70.53% versus 48.04% for Hetero-ConvLSTM). Consequently, RiskOracle is more sensitive to accidents and extends to sparse sensing data as well as short-term sporadic spatiotemporal forecasting. Additionally, the results show that our model performs better during high-risk hours, which is desirable in real accident forecasting applications. Performance on NYC may be better than on SIP because the accident labels in SIP are incomplete. We report Acc@ later in the ablation studies.
Overall, as the temporal granularity becomes finer, the performance of our framework decreases slightly while that of the baselines decreases sharply because they are trapped by the zero-inflated issue, which demonstrates the effectiveness and scalability of our proposal for short-term accident prediction. The improvements on both datasets verify the robustness and generality of RiskOracle even when the dataset in a real application contains few accident records.
Table 3: Performance comparisons on the NYC and SIP datasets.

NYC                | 30-minute Interval            | 10-minute Interval
Models             | MSE    | Acc@20 | Acc1@20     | MSE    | Acc@6  | Acc1@6
ARIMA              | 0.6801 | 14.23% | 20.26%      | 0.2380 | 8.62%  | 10.05%
Hetero-ConvLSTM    | 0.1129 | 48.04% | 58.01%      | 0.0185 | 24.53% | 42.01%
ST-ResNet          | 0.0627 | 31.06% | 40.93%      | 0.0162 | 10.02% | 27.50%
SDAE               | 0.2414 | 12.08% | 27.73%      | 0.0435 | 8.33%  | 12.74%
SDCAE              | 0.2209 | 14.79% | 22.64%      | 0.0076 | 13.48% | 31.48%
RiskOracle (Ours)  | 0.1085 | 70.53% | 72.91%      | 0.0452 | 45.18% | 69.22%

SIP                | 30-minute Interval            | 10-minute Interval
Models             | MSE (footnote 7) | Acc@5  | Acc1@5 | MSE   | Acc@5  | Acc1@5
ARIMA              | -      | 19.68% | 18.42%      | -      | 23.68% | 28.62%
Hetero-ConvLSTM    | 3.392  | 28.92% | 42.37%      | 3.980  | 31.42% | 48.57%
ST-ResNet          | 3.459  | 60.73% | 62.50%      | 3.180  | 41.78% | 43.50%
SDAE               | 3.322  | 60.26% | 36.72%      | 3.312  | 12.88% | 20.83%
SDCAE              | 3.210  | 58.68% | 67.50%      | 3.455  | 26.31% | 37.50%
RiskOracle (Ours)  | 3.270  | 63.15% | 65.24%      | 3.029  | 46.30% | 48.91%

(Footnote 7: here we report the MSE of accidents.)
Evaluations on Acc@ and Ablation Studies
In Table 4, we report the results on Acc@, which we propose in this paper. For fair comparison, we record in each interval, which is the summation of learned by our framework. It is reasonable that the results on Acc@20 and Acc@6 are slightly higher than those on Acc@, because a uniform threshold cannot adapt to real-time conditions and tends to over-predict accidents. In contrast, our framework has the flexibility to approximate the number of accidents in each rectangular region through multi-scale accident distribution forecasting. As observed, our framework outperforms the baselines and achieves an acceptable level of accuracy on Acc@ when compared with the results in Table 3, which verifies the effectiveness of our hierarchical accident selection mechanism.
Further, to investigate how each component contributes to the results, we perform an ablation study to identify which components of RiskOracle matter most. The prediction performance of the ablative variants of RiskOracle on the NYC dataset is shown in Table 4. RO1 to RO5 denote the variants obtained by removing, in turn, the following modules from the integrated RiskOracle: the PKDE strategy, STDFM, the overall affinity, the differential feature generator, and multi-task learning with HARS. The integrated model consistently outperforms the other variants at both the 30-minute and 10-minute levels. Specifically, the time-varying overall affinity and the PKDE strategy contribute the most remarkable gains. According to Table 4, the proposed components do yield significant improvements in short-term prediction.
Hyperparameter Studies
Here we show the parameter studies at the 30-minute level on NYC. We adjust the number of layers and the number of filters in each layer, and reach the best performance at 9 layers with 384 filters. We fix the weight of the main task as 1 and adjust , . Acc@ reaches its highest value of 53.82% when = 0.8 and = 1. We also adjust the weight of the dynamic element in the overall affinity and reach the best performance when equals among {0, 0.5, 1.0, 1.2, 1.5}. And equals 18 among {9, 18, 33} when the MSE of the second auxiliary task reaches its lowest. Note that our framework is trained offline and the learned parameters are used for online prediction. Given the learned parameters, the computation for a prediction completes within several seconds, which meets the real-time forecasting requirement.
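Table 4: Results on Acc@ and ablation studies on the NYC dataset.

                         | 30-minute Interval     | 10-minute Interval
Variant                  | MSE   | Acc@20 | Acc@  | MSE   | Acc@6 | Acc@
RO1                      | 0.069 | 48%    | 42%   | 0.048 | 28%   | 25%
RO2                      | 0.126 | 69%    | 53%   | 0.103 | 43%   | 38%
RO3                      | 0.169 | 34%    | 36%   | 0.124 | 29%   | 30%
RO4                      | 0.115 | 63%    | 49%   | 0.063 | 42%   | 31%
RO5                      | 0.118 | 65%    | -     | 0.053 | 40%   | -
RiskOracle (integrated)  | 0.108 | 70%    | 57%   | 0.045 | 45%   | 46%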
Case Study
We visualize the accidents predicted by RiskOracle at selected 30-minute intervals on one day in Figure 4. Overall, the citywide risk maps generated by RiskOracle reveal discriminating risks, and the highlighted subregions show great spatial similarity to the ground truth. Accidents predicted at 7:00 a.m. are rare because few people go out on Sunday morning. The number of accidents increases in the afternoon and grows further in the evening, mainly because of heavy rain that evening, which causes accident-prone road conditions. The results indicate that the auxiliary task and HARS learn to adjust inferences accordingly by capturing dynamic patterns of accident distributions with external factors, which gives better adaptivity than a unified threshold solution.
Conclusion
In this paper, we tackle the challenges of minute-level citywide traffic accident forecasting by proposing the integrated framework RiskOracle based on Multi-task DTGN, providing a quantitative decision-making basis for urban safety in a more timely manner. We first propose two strategies to overcome the zero-inflated issue and sparse sensing. By incorporating the differential feature generator and the time-varying overall affinity in Multi-task DTGN, our framework is able to model sporadic spatiotemporal data and capture short-term subregion-wise correlations. We also highlight most-likely accident regions to deal with spatial heterogeneity, using learnable multi-scale accident distributions in the multi-task scheme. Experiments on two real-world datasets verify that our framework outperforms state-of-the-art solutions. Therefore, our work can serve as a paradigm for spatiotemporal data mining tasks with sporadic labels and insufficient sensing data, e.g. predicting crimes and epidemic outbreaks.
Acknowledgements
This paper is partially supported by the Anhui Science Foundation for Distinguished Young Scholars (No.1908085J24), NSFC (No.61672487, No.61772492), Jiangsu Natural Science Foundation (No.BK20171240, BK20191193) and CAS Pioneer Hundred Talents Program.