Event detection is an increasing interest in urban studies [1, 2]. Efficiently analyzing the impact of large events can help us assess the performance of urban infrastructure and aid urban management. Nowadays, with the development of intelligent transportation systems, large scale traffic data is accumulating via loop detectors, GPS, high-resolution cameras, etc. The large amount of data provides us insight into the dynamics of urban environment in face of large scale events. The main objective of this article is to develop a method to detect extreme events in traffic datasets describing large urban areas, and to impute missing data during regular conditions.
, meaning there are a large number of entries for which the current traffic condition is not known. The missing data can be caused by the lack of measurements (e.g., no instrumented vehicles recently drove over the road segment), or due to senor failure (e.g., a traffic sensor which loses communication, power, is physically damaged). Missing data can heavily influence the performance of traffic estimation[6, 4, 7, 5], especially as the missing data rate increases. Naive imputation of the missing entries to create a complete dataset is problematic, because without a clear understanding of the overall pattern, an incorrect value can be imputed that will later degrade the performance of an outlier detection algorithm. Consequently, missing data should be carefully handled.
The second challenge is to fully capture and utilize the pattern of regular traffic, in order to correctly impute missing data and separate the outliers out from regular traffic. Studies have suggested that for regular traffic patterns, there exist systematic correlations in time and space [8, 9, 10, 11]. For example, due to daily commute patterns, traffic conditions during Monday morning rush hour are generally repeated but with small variation from one week to the next (e.g., the rush hour might start a little earlier or last a little longer). Also, traffic conditions are spatially structured due to the network connectivity, ending up in global patterns. For example, the traffic volume on one road segment should influence and be influenced by its down stream and upper stream traffic volume.
Most existing researches have not fully addressed these two challenges. On the one hand, missing data and outlier detection tend to be dealt with separately. Either it is assumed that the dataset is complete, for the purpose of outlier detection [12, 13], or it is assumed that the dataset is clean, for the purpose of missing data imputation [14, 15]. Yet in reality missing data and outliers often exist at the same time. On the other hand, currently most studies on traffic outliers consider only a single monitor spot or a small region [16, 17, 18, 19], not fully exploiting the spatio-temporal correlations. Only a few studies scale up to large regions to capture the urban-wise correlation, including the work of Yang et al.  proposing a coupled Bayesian robust principal component analysis
robust principal component analysis(robust PCA or RPCA) approach to detect road traffic events, and Liu et al.  constructing a spatio-temporal outlier tree to discover the causal interactions among outliers.
In this article, we tackle the traffic outlier event detection problem from a different perspective, taking into account missing data and spatio-temporal correlation. Specifically, we model a robust tensor decomposition problem, as illustrated in subsection 1.2
. Furthermore, we develop an efficient algorithm to solve it. We note that the application of our method is not limited to traffic event detection, but can also be applied to general pattern recognition and anomaly detection, where there exist multi-way correlations in the dataset, and in either full or partial observations.
1.2 Solution approach
In this subsection, we develop the robust tensor decomposition model for traffic extreme event detection problem, and develop a robust tensor completion problem to take partial observation into account.
To exploit these temporal and spatial structures, a tensor [21, 22] is introduced to represent the traffic data over time and space. We form a three-way tensor, as shown in Figure 1. The first dimension is the road segment, the second dimension is the time of the week, (Monday Midnight-1am all the way to Sunday 11pm-midnight, entries in total), and the third dimension is the week in the dataset. In this way, the temporal and spacial patterns along different dimensions are naturally encoded. One effective way to quantify this multi-dimensional correlation is the Tucker rank of the tensor [21, 22, 23], which is the generalization of matrix rank to higher dimensions.
As for the extreme events, we expect them to occur relatively infrequently. We encode the outliers in a sparse tensor, which is organized in the same way as the traffic data tensor. Furthermore, extreme events tend to affect the overall traffic of an urban area. That is, we assume that at the time when extreme event occurs, the traffic data of all road segments deviates from the normal pattern. Thus, the outliers occur sparsely as fibers along the road segment dimension with hour of week fixed. This sets it apart from random noises, which appear scattered over the whole tensor entries and unstructured. This fiber-wise sparsity problem is studied in 2D matrix cases , where the norm is used to control the column sparsity, and we adapt it for higher dimensions.
Putting these together, we organize the traffic data into a tensor , then decompose it into two parts,
where tensor contains the data describing the regular traffic patterns, and tensor denotes the outliers, as illustrated in Figure 2. Because the normal traffic patterns is assumed to have strong correlation in time and space, it is approximated by a low rank tensor , Similarly, because outliers are infrequent, we expect the tensor containing outliers, , to be sparse. With these ideas in mind, it is possible formulate the following optimization problem:
The objective function in problem (1) is the weighted cost of tensor rank of denoted as rank(), the fiber-wise sparsity of is denoted as sparsity(), and is a weighting parameter balancing the two costs. A more precise formulation of the problem is provided in Section 4.
In the presence of missing data, we require the decomposition to match the observation data only at the available entries, and come to the optimization problem:
) into convex programming problems, and solve them by extending from matrix case to tensor case a singular value thresholding algorithm[25, 26] based on alternating direction method of multipliers (ADMM) framework. Our algorithm can exactly recover the values of uncorrupted fibers of the low rank tensor, and find the positions of corrupted fibers, based on relatively mild condition of observation and corruption ratio.111The resulting source code is available at https://github.com/Lab-Work/Robust_tensor_recovery_for_traffic_events.
1.3 Contributions and outline
To summarize, this work has three main contributions:
We propose a new robust tensor recovery with fiber outliers model for traffic extreme event detection under full or partial observations, to take full advantage of spatial-temporal correlations in traffic pattern.
We propose ADMM based algorithms to solve the robust tensor recovery under fiber-sparse corruption. Our algorithm can exactly recover the values of uncorrupted fibers of the low rank tensor, and find the positions of corrupted fibers under mild conditions.
We apply our method on a large traffic dataset in downtown Nashville and successfully detect large events.
The rest of the paper is organized as follows. In Section 3, we provide a review of tensor basics and related robust PCA methods. In Section 4, we formulate the tensor outlier detection problem, and propose efficient algorithms to solve it. In Section 5, we demonstrate the effectiveness of our method by numerical experiments. In Section 6, we apply our method on real world data set and show its ability to find large scale events.
2 Related work
In this section, we describe literature on outlier detection. We also compare our methodology with other relevant works.
2.1 outlier detection
The outliers we are interested in this work are due to outliers caused by extreme events. Another related problem considers methods to detect outliers caused by data measurement errors, such as sensor malfunction, malicious tampering, or measurement error [27, 17, 18]. The latter methods can be seen as a part of a standard data cleaning or data pre-processing step. On the other hand, outliers caused by extreme traffic have valuable information for congestion management, and can provide agencies with insights into the performance of urban network. The works [28, 29, 20] explore the problem of outlier detection caused by events, while the works [1, 30, 31, 32] focus on determining the root causes of the outlier.
2.2 Low rank matrix and tensor learning
Low rank matrix and tensor learning has been widely used to utilize the inner structure of the data. Various application have benefited from matrix and tensor based methods, including data completion [33, 9], link prediction , network structure clustering , etc.
The most relevant works with ours are robust matrix and tensor PCA for outlier detection. norm regularized robust tensor recovery, as proposed by Goldfarb and Qin , is useful when data is polluted with unstructured random noises. Tan et al.  also used norm regularized tensor decomposition for traffic data recovery, in face of random noise corruption. But if outliers are structured, for example grouped in columns, norm regularization does not yield good results. In addition, although traffic is also modeled in tensor format in , only a single road segment is considered, not taking into account network spacial structures.
In face of large events, outliers tend to group in columns or fibers in the dataset, as illustrated in section 1.2. norm regularized decomposition is suitable for group outlier detection, as shown in [37, 24] for matrices, and [38, 39] for tensors. In addition, Li et al.  introduced a multi-view low-rank analysis framework for outlier detection, and Wen et al.  used discriminant tensor factorization for event analytics. Our methods differ from the existing tensor outliers pursuit [38, 39] in that they are dealing with slab outliers, i.e., outliers form an entire slice instead of fibers of the tensor. Moreover, compared with existing works, we take one step further and deal with partial observations. As stated in Section 1.1, without an overall understanding of the underlying pattern, we can easily impute the missing entries incorrectly and influence our decision about outliers. We will show in Section 5.3 simulation that our new algorithm can exactly detect the outliers even with 40% missing rate, conditioned on the outlier corruption rate and the rank of the low rank tensor.
In this section, we briefly review the mathematical preliminaries for tensor factorization, adopting the notation of Kolda and Bader , and Goldfarb and Qin . We also summarize robust PCA , since it serves as a foundation for our extension to higher-order tensor decomposition.
3.1 Tensor basics
In this article, a tensor is denoted by an Euler script letter (e.g., ); a matrix by a boldface capital letter (e.g.,
); a vector by a boldface lowercase letter (e.g.,); and a scalar by a lowercase letter (e.g., ). A tensor of order has dimensions, and can be equivalently described as an -way tensor or an -mode tensor. Thus, matrix is a second order tensor, and vector is a first order tensor.
A fiber is a column vector formed by fixing all indices of a tensor but one. In a matrix for example, each column can be viewed as a mode-one fiber, and each row a mode-two fiber.
The unfolding of a tensor in the mode results in a matrix , which is formed by rearranging the mode- fibers as its columns. This process is also called flattening or matricization. The inverse function of unfolding is denoted as , i.e.,
The inner product of two tensors is the sum of their element-wise product, similar to vector inner products. Let and denote the element of and respectively. Then tensor inner product can be expressed as
The tensor Frobenius norm is denoted by , and computed as
Multiplication of a tensor by a matrix in mode is performed by multiplying every mode- fiber of the tensor by the matrix. The mode- product of a tensor and a matrix is denoted by , where . It is also equivalently written via its mode- unfolding as .
in (3) is called the core tensor, where through are given integers. If for some in , the core tensor can be viewed as a compressed version of . The matrices are factor matrices, which are usually assumed to be orthogonal.
The n-rank of , denoted by , is the column rank of . In other words, it is the dimension of the vector space spanned by the mode- fibers. If we denote the -rank of the tensor as for , i.e., , then the set of the -ranks of , , is called the Tucker Rank . In Tucker decomposition (3), if for all in , then the Tucker decomposition is exact. In this case, we can easily conduct the decomposition by setting as the left singular matrix of . Otherwise, if , the decomposition holds only as an approximation, and will be harder to be solved.
3.2 Robust PCA
We briefly summarize robust variants of PCA in the matrix setting, which are extended to higher-order tensor settings in Section 4. Robust PCA belongs to the family of dimension-reduction methods aiming at combating the so-called curse of dimensionality that often appears when dealing with large, high dimensional datasets, by finding the best representing low-dimensional subspace. PCA is a widely used technique in this family, yet it is sensitive to corruptions . For example, if we have a large data matrix that comes from a low rank matrix randomly corrupted by large noises, i.e.,
where is the data matrix, is a low rank matrix, and is a sparse corruption matrix of arbitrary magnitude. In this setting traditional PCA fails at finding the subspace for given only .
To address the problem of gross corruption, Candès et al.  proposed a RPCA method known as Principle Component Pursuit (PCP):
with the matrix norm of given by:
and where denotes the th element of . The nuclear norm of a matrix is denoted as and is computed as the sum of the singular values of :
where the SVD of is .
The nuclear norm in (4) is proposed as the tightest convex relaxation of matrix rank ; and the norm is the convex approximation for element-wise matrix sparsity. Candès et al.  showed that PCP can exactly recover a low rank matrix when it is grossly corrupted at sparse entries. Moreover, by adopting ADMM algorithm, it is possible to solve (4) in polynomial time. The PCP formulation (4) requires incoherence of the column space of the sparse matrix [26, 24], and does not address the setting where entire columns are corrupted.
It is essentially a form of group lasso , where each column is treated as a group. Minimizing encourages the entire columns of to be zero or non-zero, and leads to fewer non-zero columns.
Note that it is hard to recover an uncorrupted column from a completely corrupted one. Therefore, instead of trying to recover the complete low rank matrix, Xu et al.  seeks instead to recover the exact low-dimensional subspace while identifying the location of the corrupted columns. Tang et al.  makes an assumption that if a column is corrupted (i.e., has nonzero entries in this column), then the entries of the corresponding column in the low-rank matrix are zero. This choice allows exact recovery of the low rank matrix in the non-corrupted columns.
Like PCA for a matrix, we note that the Tucker decomposition of a tensor is also sensitive to gross corruption . Motivated by the ideas of Candès et al.  and Tang  for robust matrix PCA, in the next section we address the problem of robust decomposition of tensors with gross fiber-wise corruption.
In this section, we define and pose the higher-order tensor decomposition problem in the presence of fiber outliers and its partial-observation variant as convex programs, and provide efficient algorithms to solve them.
4.1 Problem formulation
The precise setup of the problem is as follows. We are given a high dimensional data tensorwhich is composed of a low-rank tensor that is corrupted in a few fibers. In other words, we have , where is the sparse fiber outlier tensor. We know neither the rank of , nor the number and position of non-zero entries of . Given only , our goal is to reconstruct on the non-corrupted fibers, as well as identify the outlier location. Moreover, we might have only partial observations of , and we seek to complete the decomposition nevertheless.
We do assume knowledge of the mode along which the fiber outliers are distributed; without loss of generality let it be the first dimension. Then it is equivalent to say the mode-1 unfolding of the outlier tensor is column-wise sparse. Thus, we can formulate the problem as:
Computing the rank of a tensor , denoted by rank() is generally an NP-hard problem [42, 22]. One commonly used convex relaxation of the tensor rank is , which sums the nuclear norm of the tensor unfoldings in all modes . In this way, we generalize the matrix nuclear norm to the higher-order case, and explore the potential low rank structure in all dimensions. Problem (5) thus becomes
Next, we deal with the case when the data is only partially available, in addition to observation data being grossly corrupted. We only know the entries , where is an observation index set. Let denote the projection of onto the tensor subspace supported on . Then can be defined as
Then we can force the decomposition to match the observation data only at the available entries, and find the decomposition that minimizes the weighted cost of tensor rank and sparsity, leading to the following model:
In this section, we develop algorithm for tensor decomposition with fiber-wise corruption model formulated in Section 4.1. We first solve (6) for the full-observation setting, then for the partial-observation setting (7), adopting an ADMM method  for each.
4.2.1 Higher-order RPCA
Problem (6) is difficult to solve because the terms in the objective function are interdependent, since each is unfolded from the same tensor . Alternatively, we split into auxiliary variables, , and rewrite (6) as:
where are the unfoldings of in the mode. The constraints ensure that are all equal to the original in problem (6).
Here are the Lagrange multipliers, and is a positive scalar.
Under the ADMM framework, the approach is to iteratively update the three sets of variables . To be specific, at the start of the iteration, we fix and , then for each solve:
Then, we fix and to solve:
Finally we fix and , and update :
Using the property of the Frobenius norm, , problem (12) can be written as:
In the second line of (13), we change the Frobenius norm of a tensor into the Frobenius norm of its -th unfolding, which does not change the actual value of the norm. As a result, the objective function of problem (13) only involves matrices, so we can solve for using the well-established closed form solution (e.g., see proof in Cai et. al ): Then we fold the matrix back into a tensor, i.e., . The truncation operator in for a matrix is where
is the eigenvalue diagonal matrix for. The operation discards the eigenvalues less than , and shrinks the remaining eigenvalues by .
Following the same technique as earlier, (14) is equivalent to:
since they have the same first-order conditions. In order to simplify expression (16), we denote the term by a single variable . Thus,
where in the second line we use the same approach as in (13) in which we replace the tensor Frobenius norm by the equivalent Frobenius norm of the mode one unfolding. The objective function of problem (17) only involves matrices, and the closed form solution is :
where is the column of , is the column of , and the integer is the total number of columns in . This operation effectively sets a column of to zero if its norm is less than , and scales the elements down by a factor otherwise .
Note that compared with the ADMM method where we just update and once, the augmented Lagrangian multipliers (ALM) method  seeks to find the exact solutions for primal variables and before updating Lagrangian multipliers , yielding the framework as
As pointed out by Lin et al. , compared with ALM, not only is ADMM still able to converge to the optimal solution for and , but also the speed performance is better. It is also noted that while in ALM, the and are optimized jointly, in the ADMM implementation, they are in fact updated sequentially . It is often observed in the matrix settings (see e.g., Lin et al. ) that updating the term containing outliers before the low rank term (i.e., before in the tensor setting) the low rank term results in faster convergence. As a consequence this is the approach followed in the numerical implementation of Algorithm 1 used later in this work.
In the implementation of Algorithm 1, we set the convergence criterion as
Where is the tolerance.
4.2.2 Partial Observation
Now we provide an algorithm to solve problem (7). Similar to the matrix setting in Tang et al. , we set the fibers of the low rank tensor to be zero in the locations corresponding to outliers. We introduce a compensation tensor , which is zero for entries in the observation set , and can take any value outside . Thus using the same auxiliary variables technique as in (8), we can reformulate problem (7) as:
Since compensates for whatever the value is in the unobserved entries of , we only need to keep track of the indices of the unobserved entries, and can simply set the unobserved entries of to zero. The augmented Lagrangian function for problem (20) is
5 Numerical experiments
In this section, we apply Algorithms 1 and 2 developed in Section 4.2 on a series of test problems using synthetically generated datasets. We first conduct tensor decomposition on fiber-wise corrupted data, then we examine the case when the data is only partially observed and is also fiber-wise corrupted. For both cases, we compare the performance of our algorithms, which are norm constrained decomposition, with norm constrained decomposition [22, 36]. We demonstrate via the numerical experiments that our algorithms are able to exactly recover the original low-rank tensor, and identify the position of corrupted fibers. In comparison, norm constrained algorithms performs poorly in the fiber-wise corrupted settings, unable to achieve exact recoveries.
5.1 Performance measures and implementation
For each of the numerical experiments, the performance of the algorithms are measured by the relative error of the low rank tensor, as well as the precision and recall of the outlier fibers. Therelative error (RE) of low rank tensor is calculated as:
where is the true low rank tensor modified to take the value 0 in the entries corresponding to the corrupted fibers; is the estimated low rank tensor resulting from application of Algorithm 1 or 2, which also has the value 0 in the fibers that are estimated to be corrupted.
We compute the precision of the algorithm to assess the potential to correctly identify only the outlier fibers. It is computed as:
where the true positives (tp) corresponds the number of estimated outlier fibers which are true outliers, and the false positives (fp) corresponds to the number of estimated outlier fibers which are not true outliers.
The recall, which measures the ability to find all outlier fibers, is defined as
where the false negatives (fn) correspond to the number of true outlier fibers that were not correctly identified by the estimator.
For the convergence criterion we set , and we use an empirical value , where
. The hyperparameterin norm constrained decomposition algorithm is also tuned for its best performance in our settings.
All of the experiments are carried out on a Macbook Pro with quad-core 2.7GHz Intel i7 Processor and 16GB RAM, running Matlab R2018a. We modify and extend the code of Lin et. al , using PROPACK toolbox to efficiently calculate the SVD. The code is modified to update variables in line with the distinct problem formulation using the norm and to scale to tensors rather than matrices. The Tensor Toolbox for Matlab   is also used for tensor manipulations. The resulting source code is available at https://github.com/Lab-Work/Robust_tensor_recovery_for_traffic_events.
5.2 Tensor robust PCA
In this subsection we apply higher-order RPCA to the problems where we have fully observed data with fiber-wise corrupted entries.
5.2.1 Simulation conditions
We synthetically generate the observation data as , where and are the true or “ground truth” low-rank tensor and fiber-sparse tensor, respectively. We generate as a core tensor with size and tucker rank , multiplied in each mode by orthogonal matrices of corresponding dimensions, :
The entries of
are independently sampled from standard Gaussian distribution. The orthogonal matricesare generated via a Gram-Schmidt orthogonalization on vectors of size drawn from standard Gaussian distribution. The sparse tensor is formed by first generating a tensor
, whose entries are i.i.d uniform distribution(0,1). Then we randomly keep a fraction of the fibers of to form . Finally, the corresponding fibers of with respect to non-zero fibers in are set to zero.
5.2.2 Algorithm performance for varying problem sizes
We apply Algorithm 1 on of varying tensor sizes and underlying tucker rank , and predict and using Algorithm 1. We also apply norm constrained decomposition on the same settings. Table 1 compares the result. The corruption rate is set to , i.e., . In all cases, for our algorithm the relative residual errors are less than , which is the same precision that we set for convergence tolerance. That is to say, we can exactly recover the low rank tensors in this setting. The precision and recall are both 1.0, indicating that the outlier detection is also exact. Similar to the observation of Candès’ et al. , the iteration numbers tend to be constant (between 28 and 29 in this case) regardless of tensor size. This indicates that the number of of SVD computations might be limited and insensitive to the size, which is important since SVD is the computational bottleneck of the algorithm. Furthermore, this property is important to allow the problem to solve quickly even on datasets of moderate sizes, as will be shown in a case study in Section 6.
In comparison, although norm constrained decomposition can also detect outliers in high precision and recall, the relative residual errors are relatively, high, in the order of . This indicates that norm constrained decomposition can do an adequate job when corruption ratio is low (), but cannot achieve exact recoveries.
5.2.3 Influence of the corruption rate
Next, we investigate the performance of Algorithm 1 as the corruption ratio changes, and compare the result with norm constrained decomposition. We fix the low-rank tensor at size with a tucker rank of , then vary the gross corruption ratio from 0% to 60%. The results are shown in Figure 3 as an average over 10 trials. In this setting we see that as long as the corruption ratio is below 0.47, Algorithm 1 can precisely recover the low rank tensor, and correctly identify the outlier fibers. On the other hand, the relative error of norm constrained decomposition is constantly higher. After the corruption ratio exceeds 0.2, the estimation is no longer useful, with the relative error exceeding 100%.
5.3 Robust tensor completion
In this subsection we look at the performance of Algorithm 2, when the data is only partially observed, and compare it with norm constrained decomposition.
5.3.1 Simulation conditions
We first generate a full observation data in the same way as Section 5.2. is the low rank tensor, and is the sparse tensor. Then, we form the partial observation data by randomly keeping a fraction of the entries in . We record the indices of the unobserved entries, and set their values in as 0.
5.3.2 Influence of the corruption and observation ratios
First, we apply Algorithm 2 on simulated data with a varying corruption ratio and observation ratio. We fix the low-rank tensor at size with a tucker rank of . For gross corruption ratio at 0.05, 0.1, 0.2, we vary the observation ratio from 0.1 to 1 and run Algorithm 2. We run 10 times and average the results.
Figure 4 shows that for the detection of corrupted fibers, the recall stays at 1. The precision stays at 1 when is above 0.6 but drops dramatically for smaller . The relative error of low rank tensor is zero when the observation ratio with corruption ratio , and when the observation ratio with
. We observe a phase-transition behavior, in that the decomposition is exact when the observation ratiois above a critical threshold, but the performance drops dramatically below the threshold. This critical threshold on the observation ratio varies for each case, namely the algorithm can handle more missing entries as the number of outliers is reduced. But when the corruption ratio is too large, exact recovery is not guaranteed.
Overall, the performance is promising, since we can exactly identify the outlier positions and recover the low rank tensor at non-corrupted entries, even with relatively a large missing ratio, for example when 5% of the fibers are corrupted and 40% of the data is missing.