# Robust Tensor Recovery with Fiber Outliers for Traffic Events

Event detection is gaining increasing attention in smart cities research. Large-scale mobility data serves as an important tool to uncover the dynamics of urban transportation systems, and more often than not the dataset is incomplete. In this article, we develop a method to detect extreme events in large traffic datasets, and to impute missing data during regular conditions. Specifically, we propose a robust tensor recovery problem to recover low rank tensors under fiber-sparse corruptions with partial observations, and use it to identify events, and impute missing data under typical conditions. Our approach is scalable to large urban areas, taking full advantage of the spatio-temporal correlations in traffic patterns. We develop an efficient algorithm to solve the tensor recovery problem based on the alternating direction method of multipliers (ADMM) framework. Compared with existing l_1 norm regularized tensor decomposition methods, our algorithm can exactly recover the values of uncorrupted fibers of a low rank tensor and find the positions of corrupted fibers under mild conditions. Numerical experiments illustrate that our algorithm can exactly detect outliers even with missing data rates as high as 40 rank tensor. Finally, we apply our method on a real traffic dataset corresponding to downtown Nashville, TN, USA and successfully detect the events like severe car crashes, construction lane closures, and other large events that cause significant traffic disruptions.

## Authors

• 13 publications
• 1 publication
• ### A Nonconvex Low-Rank Tensor Completion Model for Spatiotemporal Traffic Data Imputation

Sparsity and missing data problems are very common in spatiotemporal tra...
03/23/2020 ∙ by Xinyu Chen, et al. ∙ 0

• ### Tensor N-tubal rank and its convex relaxation for low-rank tensor recovery

As low-rank modeling has achieved great success in tensor recovery, many...
12/03/2018 ∙ by Yu-Bang Zheng, et al. ∙ 0

• ### Robust Tensor Recovery using Low-Rank Tensor Ring

Robust tensor completion recoveries the low-rank and sparse parts from i...
03/31/2019 ∙ by Huyan Huang, et al. ∙ 0

• ### Recover the lost Phasor Measurement Unit Data Using Alternating Direction Multipliers Method

This paper presents a novel algorithm for recovering missing data of pha...
08/18/2017 ∙ by Mang Liao, et al. ∙ 0

• ### Exact Tensor Completion from Sparsely Corrupted Observations via Convex Optimization

This paper conducts a rigorous analysis for provable estimation of multi...
08/02/2017 ∙ by Jonathan Q. Jiang, et al. ∙ 0

• ### Simple and Efficient Parallelization for Probabilistic Temporal Tensor Factorization

Probabilistic Temporal Tensor Factorization (PTTF) is an effective algor...
11/11/2016 ∙ by Guangxi Li, et al. ∙ 0

• ### Understanding Urban Dynamics via Context-aware Tensor Factorization with Neighboring Regularization

Recent years have witnessed the world-wide emergence of mega-metropolise...
04/25/2019 ∙ by Jingyuan Wang, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

### 1.1 Motivation

Event detection is an increasing interest in urban studies [1, 2]. Efficiently analyzing the impact of large events can help us assess the performance of urban infrastructure and aid urban management. Nowadays, with the development of intelligent transportation systems, large scale traffic data is accumulating via loop detectors, GPS, high-resolution cameras, etc. The large amount of data provides us insight into the dynamics of urban environment in face of large scale events. The main objective of this article is to develop a method to detect extreme events in traffic datasets describing large urban areas, and to impute missing data during regular conditions.

There are two major challenges in event detecting outliers in traffic datasets. Firstly, most large traffic datasets are incomplete [3, 4, 5]

, meaning there are a large number of entries for which the current traffic condition is not known. The missing data can be caused by the lack of measurements (e.g., no instrumented vehicles recently drove over the road segment), or due to senor failure (e.g., a traffic sensor which loses communication, power, is physically damaged). Missing data can heavily influence the performance of traffic estimation

[6, 4, 7, 5], especially as the missing data rate increases. Naive imputation of the missing entries to create a complete dataset is problematic, because without a clear understanding of the overall pattern, an incorrect value can be imputed that will later degrade the performance of an outlier detection algorithm. Consequently, missing data should be carefully handled.

The second challenge is to fully capture and utilize the pattern of regular traffic, in order to correctly impute missing data and separate the outliers out from regular traffic. Studies have suggested that for regular traffic patterns, there exist systematic correlations in time and space [8, 9, 10, 11]. For example, due to daily commute patterns, traffic conditions during Monday morning rush hour are generally repeated but with small variation from one week to the next (e.g., the rush hour might start a little earlier or last a little longer). Also, traffic conditions are spatially structured due to the network connectivity, ending up in global patterns. For example, the traffic volume on one road segment should influence and be influenced by its down stream and upper stream traffic volume.

Most existing researches have not fully addressed these two challenges. On the one hand, missing data and outlier detection tend to be dealt with separately. Either it is assumed that the dataset is complete, for the purpose of outlier detection [12, 13], or it is assumed that the dataset is clean, for the purpose of missing data imputation [14, 15]. Yet in reality missing data and outliers often exist at the same time. On the other hand, currently most studies on traffic outliers consider only a single monitor spot or a small region [16, 17, 18, 19], not fully exploiting the spatio-temporal correlations. Only a few studies scale up to large regions to capture the urban-wise correlation, including the work of Yang et al. [8] proposing a coupled Bayesian

(robust PCA or RPCA) approach to detect road traffic events, and Liu et al. [20] constructing a spatio-temporal outlier tree to discover the causal interactions among outliers.

In this article, we tackle the traffic outlier event detection problem from a different perspective, taking into account missing data and spatio-temporal correlation. Specifically, we model a robust tensor decomposition problem, as illustrated in subsection 1.2

. Furthermore, we develop an efficient algorithm to solve it. We note that the application of our method is not limited to traffic event detection, but can also be applied to general pattern recognition and anomaly detection, where there exist multi-way correlations in the dataset, and in either full or partial observations.

### 1.2 Solution approach

In this subsection, we develop the robust tensor decomposition model for traffic extreme event detection problem, and develop a robust tensor completion problem to take partial observation into account.

To exploit these temporal and spatial structures, a tensor [21, 22] is introduced to represent the traffic data over time and space. We form a three-way tensor, as shown in Figure 1. The first dimension is the road segment, the second dimension is the time of the week, (Monday Midnight-1am all the way to Sunday 11pm-midnight, entries in total), and the third dimension is the week in the dataset. In this way, the temporal and spacial patterns along different dimensions are naturally encoded. One effective way to quantify this multi-dimensional correlation is the Tucker rank of the tensor [21, 22, 23], which is the generalization of matrix rank to higher dimensions.

As for the extreme events, we expect them to occur relatively infrequently. We encode the outliers in a sparse tensor, which is organized in the same way as the traffic data tensor. Furthermore, extreme events tend to affect the overall traffic of an urban area. That is, we assume that at the time when extreme event occurs, the traffic data of all road segments deviates from the normal pattern. Thus, the outliers occur sparsely as fibers along the road segment dimension with hour of week fixed. This sets it apart from random noises, which appear scattered over the whole tensor entries and unstructured. This fiber-wise sparsity problem is studied in 2D matrix cases [24], where the norm is used to control the column sparsity, and we adapt it for higher dimensions.

Putting these together, we organize the traffic data into a tensor , then decompose it into two parts,

 B=X+E,

where tensor contains the data describing the regular traffic patterns, and tensor denotes the outliers, as illustrated in Figure 2. Because the normal traffic patterns is assumed to have strong correlation in time and space, it is approximated by a low rank tensor , Similarly, because outliers are infrequent, we expect the tensor containing outliers, , to be sparse. With these ideas in mind, it is possible formulate the following optimization problem:

 minX,E rank(X)+λ sparsity(E) (1) s.t. B=X+E.

The objective function in problem (1) is the weighted cost of tensor rank of denoted as rank(), the fiber-wise sparsity of is denoted as sparsity(), and is a weighting parameter balancing the two costs. A more precise formulation of the problem is provided in Section 4.

In the presence of missing data, we require the decomposition to match the observation data only at the available entries, and come to the optimization problem:

 minX,E rank(X)+λ sparsity(E) (2) s.t. Bi1i2…iN=(X+E)i1i2…iN, where (i1,i2,…,iN) is an observed entry% .

In this paper, we turn problem (1) and (2

) into convex programming problems, and solve them by extending from matrix case to tensor case a singular value thresholding algorithm

[25, 26] based on alternating direction method of multipliers (ADMM) framework. Our algorithm can exactly recover the values of uncorrupted fibers of the low rank tensor, and find the positions of corrupted fibers, based on relatively mild condition of observation and corruption ratio.111The resulting source code is available at https://github.com/Lab-Work/Robust_tensor_recovery_for_traffic_events.

### 1.3 Contributions and outline

To summarize, this work has three main contributions:

1. We propose a new robust tensor recovery with fiber outliers model for traffic extreme event detection under full or partial observations, to take full advantage of spatial-temporal correlations in traffic pattern.

2. We propose ADMM based algorithms to solve the robust tensor recovery under fiber-sparse corruption. Our algorithm can exactly recover the values of uncorrupted fibers of the low rank tensor, and find the positions of corrupted fibers under mild conditions.

3. We apply our method on a large traffic dataset in downtown Nashville and successfully detect large events.

The rest of the paper is organized as follows. In Section 3, we provide a review of tensor basics and related robust PCA methods. In Section 4, we formulate the tensor outlier detection problem, and propose efficient algorithms to solve it. In Section 5, we demonstrate the effectiveness of our method by numerical experiments. In Section 6, we apply our method on real world data set and show its ability to find large scale events.

## 2 Related work

In this section, we describe literature on outlier detection. We also compare our methodology with other relevant works.

### 2.1 outlier detection

The outliers we are interested in this work are due to outliers caused by extreme events. Another related problem considers methods to detect outliers caused by data measurement errors, such as sensor malfunction, malicious tampering, or measurement error [27, 17, 18]. The latter methods can be seen as a part of a standard data cleaning or data pre-processing step. On the other hand, outliers caused by extreme traffic have valuable information for congestion management, and can provide agencies with insights into the performance of urban network. The works [28, 29, 20] explore the problem of outlier detection caused by events, while the works [1, 30, 31, 32] focus on determining the root causes of the outlier.

### 2.2 Low rank matrix and tensor learning

Low rank matrix and tensor learning has been widely used to utilize the inner structure of the data. Various application have benefited from matrix and tensor based methods, including data completion [33, 9], link prediction [34], network structure clustering [35], etc.

The most relevant works with ours are robust matrix and tensor PCA for outlier detection. norm regularized robust tensor recovery, as proposed by Goldfarb and Qin [22], is useful when data is polluted with unstructured random noises. Tan et al. [36] also used norm regularized tensor decomposition for traffic data recovery, in face of random noise corruption. But if outliers are structured, for example grouped in columns, norm regularization does not yield good results. In addition, although traffic is also modeled in tensor format in [36], only a single road segment is considered, not taking into account network spacial structures.

In face of large events, outliers tend to group in columns or fibers in the dataset, as illustrated in section 1.2. norm regularized decomposition is suitable for group outlier detection, as shown in [37, 24] for matrices, and [38, 39] for tensors. In addition, Li et al. [40] introduced a multi-view low-rank analysis framework for outlier detection, and Wen et al. [2] used discriminant tensor factorization for event analytics. Our methods differ from the existing tensor outliers pursuit [38, 39] in that they are dealing with slab outliers, i.e., outliers form an entire slice instead of fibers of the tensor. Moreover, compared with existing works, we take one step further and deal with partial observations. As stated in Section 1.1, without an overall understanding of the underlying pattern, we can easily impute the missing entries incorrectly and influence our decision about outliers. We will show in Section 5.3 simulation that our new algorithm can exactly detect the outliers even with 40% missing rate, conditioned on the outlier corruption rate and the rank of the low rank tensor.

## 3 Preliminaries

In this section, we briefly review the mathematical preliminaries for tensor factorization, adopting the notation of Kolda and Bader [21], and Goldfarb and Qin [22]. We also summarize robust PCA [26], since it serves as a foundation for our extension to higher-order tensor decomposition.

### 3.1 Tensor basics

In this article, a tensor is denoted by an Euler script letter (e.g., ); a matrix by a boldface capital letter (e.g.,

); a vector by a boldface lowercase letter (e.g.,

); and a scalar by a lowercase letter (e.g., ). A tensor of order has dimensions, and can be equivalently described as an -way tensor or an -mode tensor. Thus, matrix is a second order tensor, and vector is a first order tensor.

A fiber is a column vector formed by fixing all indices of a tensor but one. In a matrix for example, each column can be viewed as a mode-one fiber, and each row a mode-two fiber.

The unfolding of a tensor in the mode results in a matrix , which is formed by rearranging the mode- fibers as its columns. This process is also called flattening or matricization. The inverse function of unfolding is denoted as , i.e.,

 foldn(X(n))=X.

The inner product of two tensors is the sum of their element-wise product, similar to vector inner products. Let and denote the element of and respectively. Then tensor inner product can be expressed as

 ⟨X,Y⟩=I1∑i1=1I2∑i2=1…IN∑iN=1xi1i2…iNyi1i2…iN.

The tensor Frobenius norm is denoted by , and computed as

 ∥X∥F=√⟨X,X⟩.

Multiplication of a tensor by a matrix in mode is performed by multiplying every mode- fiber of the tensor by the matrix. The mode- product of a tensor and a matrix is denoted by , where . It is also equivalently written via its mode- unfolding as .

The Tucker decomposition [22, 21] is the generalization of matrix PCA in higher dimensions. It approximates a tensor as a core tensor multiplied in each mode by an appropriately sized matrix :

 X≈G×1U(1)×2U(2)×⋯×NU(N). (3)

in (3) is called the core tensor, where through are given integers. If for some in , the core tensor can be viewed as a compressed version of . The matrices are factor matrices, which are usually assumed to be orthogonal.

The n-rank of , denoted by , is the column rank of . In other words, it is the dimension of the vector space spanned by the mode- fibers. If we denote the -rank of the tensor as for , i.e., , then the set of the -ranks of , , is called the Tucker Rank [21]. In Tucker decomposition (3), if for all in , then the Tucker decomposition is exact. In this case, we can easily conduct the decomposition by setting as the left singular matrix of . Otherwise, if , the decomposition holds only as an approximation, and will be harder to be solved.

### 3.2 Robust PCA

We briefly summarize robust variants of PCA in the matrix setting, which are extended to higher-order tensor settings in Section 4. Robust PCA belongs to the family of dimension-reduction methods aiming at combating the so-called curse of dimensionality that often appears when dealing with large, high dimensional datasets, by finding the best representing low-dimensional subspace. PCA is a widely used technique in this family, yet it is sensitive to corruptions [26]. For example, if we have a large data matrix that comes from a low rank matrix randomly corrupted by large noises, i.e.,

 B=X+E,

where is the data matrix, is a low rank matrix, and is a sparse corruption matrix of arbitrary magnitude. In this setting traditional PCA fails at finding the subspace for given only .

To address the problem of gross corruption, Candès et al. [26] proposed a RPCA method known as Principle Component Pursuit (PCP):

 minX,E ∥X∥∗+λ∥E∥1 (4) s.t. B=X+E,

with the matrix norm of given by:

 ∥E∥1:=I1∑i=1I2∑j=1|eij|,

and where denotes the th element of . The nuclear norm of a matrix is denoted as and is computed as the sum of the singular values of :

 ∥X∥∗:=∑iσi.

where the SVD of is .

The nuclear norm in (4) is proposed as the tightest convex relaxation of matrix rank [25]; and the norm is the convex approximation for element-wise matrix sparsity. Candès et al. [26] showed that PCP can exactly recover a low rank matrix when it is grossly corrupted at sparse entries. Moreover, by adopting ADMM algorithm, it is possible to solve (4) in polynomial time. The PCP formulation (4) requires incoherence of the column space of the sparse matrix  [26, 24], and does not address the setting where entire columns are corrupted.

An alternative problem formulation using an norm on in (4) is introduced for matrix recovery with column-wise corruption [24, 37]. The norm of a matrix is defined as

 ∥E∥2,1=I2∑j=1 ⎷I1∑i=1(eij)2.

It is essentially a form of group lasso [41], where each column is treated as a group. Minimizing encourages the entire columns of to be zero or non-zero, and leads to fewer non-zero columns.

Note that it is hard to recover an uncorrupted column from a completely corrupted one. Therefore, instead of trying to recover the complete low rank matrix, Xu et al. [24] seeks instead to recover the exact low-dimensional subspace while identifying the location of the corrupted columns. Tang et al. [37] makes an assumption that if a column is corrupted (i.e., has nonzero entries in this column), then the entries of the corresponding column in the low-rank matrix are zero. This choice allows exact recovery of the low rank matrix in the non-corrupted columns.

Like PCA for a matrix, we note that the Tucker decomposition of a tensor is also sensitive to gross corruption [22]. Motivated by the ideas of Candès et al. [26] and Tang [37] for robust matrix PCA, in the next section we address the problem of robust decomposition of tensors with gross fiber-wise corruption.

## 4 Methods

In this section, we define and pose the higher-order tensor decomposition problem in the presence of fiber outliers and its partial-observation variant as convex programs, and provide efficient algorithms to solve them.

### 4.1 Problem formulation

The precise setup of the problem is as follows. We are given a high dimensional data tensor

which is composed of a low-rank tensor that is corrupted in a few fibers. In other words, we have , where is the sparse fiber outlier tensor. We know neither the rank of , nor the number and position of non-zero entries of . Given only , our goal is to reconstruct on the non-corrupted fibers, as well as identify the outlier location. Moreover, we might have only partial observations of , and we seek to complete the decomposition nevertheless.

We do assume knowledge of the mode along which the fiber outliers are distributed; without loss of generality let it be the first dimension. Then it is equivalent to say the mode-1 unfolding of the outlier tensor is column-wise sparse. Thus, we can formulate the problem as:

 minX,E rank(X)+λ∥E(1)∥2,1 (5) s.t. B=X+E.

Computing the rank of a tensor , denoted by rank() is generally an NP-hard problem [42, 22]. One commonly used convex relaxation of the tensor rank is , which sums the nuclear norm of the tensor unfoldings in all modes [22]. In this way, we generalize the matrix nuclear norm to the higher-order case, and explore the potential low rank structure in all dimensions. Problem (5) thus becomes

 minX,E N∑i=1∥X(i)∥∗+λ∥E(1)∥2,1 (6) s.t. B=X+E.

Next, we deal with the case when the data is only partially available, in addition to observation data being grossly corrupted. We only know the entries , where is an observation index set. Let denote the projection of onto the tensor subspace supported on . Then can be defined as

 XΩ={Xi1i2…iN,(i1,i2,…,iN)∈Ω0,otherwise.

Then we can force the decomposition to match the observation data only at the available entries, and find the decomposition that minimizes the weighted cost of tensor rank and sparsity, leading to the following model:

 minX,E N∑i=1∥X(i)∥∗+λ∥E(1)∥2,1 (7) s.t. BΩ=(X+E)Ω.

Note that related problems to (7) for the matrix setting are addressed in [26, 43]).

### 4.2 Algorithm

In this section, we develop algorithm for tensor decomposition with fiber-wise corruption model formulated in Section 4.1. We first solve (6) for the full-observation setting, then for the partial-observation setting (7), adopting an ADMM method [22] for each.

#### 4.2.1 Higher-order RPCA

Problem (6) is difficult to solve because the terms in the objective function are interdependent, since each is unfolded from the same tensor . Alternatively, we split into auxiliary variables, , and rewrite (6) as:

 minXi,E N∑i=1∥X\mathnormali(i)∥∗+λ∥E(1)∥2,1 (8) s.t. B=X\mathnormali+E,i=1,2,…, N,

where are the unfoldings of in the mode. The constraints ensure that are all equal to the original in problem (6).

Next, we proceed to solve problem (8) via an ADMM algorithm. A full explanation of the general ADMM framework can be found in [44]. The corresponding augmented Lagrangian function for problem (8) is

 L(X1,X2,…,XN,E,Y1,Y2,…,YN;μ)=N∑i=1∥Xi(i)∥∗+λ∥E(1)∥2,1+ N∑i=1(μ2∥X\mathnormali+E−B)∥2F−⟨Y\mathnormali,X\mathnormali+E−B⟩).

Here are the Lagrange multipliers, and is a positive scalar.

Under the ADMM framework, the approach is to iteratively update the three sets of variables . To be specific, at the start of the iteration, we fix and , then for each solve:

 Xk+1i=argmin XiL(Xi,Ek,Yki;μ). (9)

Then, we fix and to solve:

 Ek+1=argmin EL(Xk+1i,E,Yki;μ). (10)

Finally we fix and , and update :

 Yk+1i=Yki+μ(B−Xk+1\mathnormali−Ek+1). (11)

Next we derive closed form solutions for problem (9) and for problem (10). Problem (9), written out, reads:

 Xk+1i=argmin Xi∥Xi(i)∥∗+μ2∥X\mathnormali+E\mathnormalk−B)∥2F−⟨Y\mathnormalk\mathnormali,X\mathnormali+E\mathnormalk−B⟩. (12)

Using the property of the Frobenius norm, , problem (12) can be written as:

 Xk+1i = argmin Xi∥Xi(i)∥∗+μ2∥∥∥1μY\mathnormalk\mathnormali+B−E\mathnormalk−X\mathnormali∥∥∥2F (13) = argmin Xi∥Xi(i)∥∗+μ2∥∥∥1μYki(i)+B(i)−Ek(i)−X\mathnormali(i)∥∥∥2F.

In the second line of (13), we change the Frobenius norm of a tensor into the Frobenius norm of its -th unfolding, which does not change the actual value of the norm. As a result, the objective function of problem (13) only involves matrices, so we can solve for using the well-established closed form solution (e.g., see proof in Cai et. al [25]): Then we fold the matrix back into a tensor, i.e., . The truncation operator in for a matrix is where

is the eigenvalue diagonal matrix for

. The operation discards the eigenvalues less than , and shrinks the remaining eigenvalues by .

We now proceed to derive a closed form solution to update in problem (10). Problem (10) is equivalent to solving:

 Ek+1=argmin Eλ∥E(1)∥2,1+N∑i=1(μ∥Xk+1i+E−B)∥2F−⟨Yki,Xk+1i+E−B⟩). (14)

Following the same technique as earlier, (14) is equivalent to:

 Ek+1= argmin Eλ∥E(1)∥2,1+N∑i=1(μ2∥∥∥1μYki+B−Xk+1i−E∥∥∥2F). (15)

By the proof of Goldfarb and Qin [22], problem (15) shares the same solution with:

 Ek+1= (16)

since they have the same first-order conditions. In order to simplify expression (16), we denote the term by a single variable . Thus,

 Ek+1 = argmin Eλ∥E(1)∥2,1+μN2∥E−C∥2F (17) = argmin Eλ∥E(1)∥2,1+μN2∥E(1)−C(1)∥2F,

where in the second line we use the same approach as in (13) in which we replace the tensor Frobenius norm by the equivalent Frobenius norm of the mode one unfolding. The objective function of problem (17) only involves matrices, and the closed form solution is [37]:

 Ek+1(1)j=C(1)j max{0,1−λμN∥C(1)j∥2}, for j=1,2,…,p, (18)

where is the column of , is the column of , and the integer is the total number of columns in . This operation effectively sets a column of to zero if its norm is less than , and scales the elements down by a factor otherwise [37].

Note that compared with the ADMM method where we just update and once, the augmented Lagrangian multipliers (ALM) method [45] seeks to find the exact solutions for primal variables and before updating Lagrangian multipliers , yielding the framework as

 (Xk+1i,Ek+1)= argmin Xi,EL(Xi,E,Yki;μ) Yk+1i= Yki+μ(B−Xk+1\mathnormali−Ek+1).

As pointed out by Lin et al. [45], compared with ALM, not only is ADMM still able to converge to the optimal solution for and , but also the speed performance is better. It is also noted that while in ALM, the and are optimized jointly, in the ADMM implementation, they are in fact updated sequentially [44]. It is often observed in the matrix settings (see e.g., Lin et al. [45]) that updating the term containing outliers before the low rank term (i.e., before in the tensor setting) the low rank term results in faster convergence. As a consequence this is the approach followed in the numerical implementation of Algorithm 1 used later in this work.

In the implementation of Algorithm 1, we set the convergence criterion as

 ∥B−E−X∥F∥B∥F≤ϵ, (19)

Where is the tolerance.

#### 4.2.2 Partial Observation

Now we provide an algorithm to solve problem (7). Similar to the matrix setting in Tang et al. [37], we set the fibers of the low rank tensor to be zero in the locations corresponding to outliers. We introduce a compensation tensor , which is zero for entries in the observation set , and can take any value outside . Thus using the same auxiliary variables technique as in (8), we can reformulate problem (7) as:

 minXi,E N∑i=1∥Xi(i)∥∗+λ∥E(1)∥2,1 (20) s.t. B=X\mathnormali+E+O,i=1,2,…, N, OΩ=0.

Since compensates for whatever the value is in the unobserved entries of , we only need to keep track of the indices of the unobserved entries, and can simply set the unobserved entries of to zero. The augmented Lagrangian function for problem (20) is

 L(X1,X2,…,XN,E,O,Y1,Y2,…,YN;μ)=N∑i=1∥Xi(i)∥∗+λ∥E(1)∥2,1+ N∑i=1(μ2∥X\mathnormali+E+O−B)∥2F−⟨Y\mathnormali,X\mathnormali+E+O−B⟩).

We again use the ADMM framework now iteratively updating , , and . The proof of the closed form solution for updating , and is similar to Algorithm 1. For , we fix , and , to solve (21):

 Ok+1 = argminO (21) s.t. OΩ=0.

Following the same procedure as before (see (14), (15) and (16)), we have:

 Ok+1 = argmin O N∑i=1(μ2∥∥Xk+1i+Ek+1+O−B)∥∥2F−⟨Yki,Xk+1i+Ek+1+O−B⟩) (22) = argmin O N∑i=1(μ2∥∥∥1μYki+B−Xk+1i−Ek+1−O∥∥∥2F) = argmin O μN2∥∥ ∥∥O−1NN∑i=1(1μYki+B−Xk+1i−Ek+1)∥∥ ∥∥2F, s.t. OΩ=0.

For (22), we simply set for entries , and zero otherwise. The procedure is summarized in Algorithm 2.

We set the convergence criterion of Algorithm 2 as

 ∥B−E−X−O∥F∥B∥F≤ϵ,

which is similar to (19) but accounts for the additional tensor .

## 5 Numerical experiments

In this section, we apply Algorithms 1 and 2 developed in Section 4.2 on a series of test problems using synthetically generated datasets. We first conduct tensor decomposition on fiber-wise corrupted data, then we examine the case when the data is only partially observed and is also fiber-wise corrupted. For both cases, we compare the performance of our algorithms, which are norm constrained decomposition, with norm constrained decomposition [22, 36]. We demonstrate via the numerical experiments that our algorithms are able to exactly recover the original low-rank tensor, and identify the position of corrupted fibers. In comparison, norm constrained algorithms performs poorly in the fiber-wise corrupted settings, unable to achieve exact recoveries.

### 5.1 Performance measures and implementation

For each of the numerical experiments, the performance of the algorithms are measured by the relative error of the low rank tensor, as well as the precision and recall of the outlier fibers. The

relative error (RE) of low rank tensor is calculated as:

 RE=∥X0−^X∥F∥X0∥F,

where is the true low rank tensor modified to take the value 0 in the entries corresponding to the corrupted fibers; is the estimated low rank tensor resulting from application of Algorithm 1 or 2, which also has the value 0 in the fibers that are estimated to be corrupted.

We compute the precision of the algorithm to assess the potential to correctly identify only the outlier fibers. It is computed as:

 precision=tptp+fp,

where the true positives (tp) corresponds the number of estimated outlier fibers which are true outliers, and the false positives (fp) corresponds to the number of estimated outlier fibers which are not true outliers.

The recall, which measures the ability to find all outlier fibers, is defined as

 recall=tptp+fn,

where the false negatives (fn) correspond to the number of true outlier fibers that were not correctly identified by the estimator.

For the convergence criterion we set , and we use an empirical value , where

. The hyperparameter

in norm constrained decomposition algorithm is also tuned for its best performance in our settings.

All of the experiments are carried out on a Macbook Pro with quad-core 2.7GHz Intel i7 Processor and 16GB RAM, running Matlab R2018a. We modify and extend the code of Lin et. al [45], using PROPACK toolbox to efficiently calculate the SVD. The code is modified to update variables in line with the distinct problem formulation using the norm and to scale to tensors rather than matrices. The Tensor Toolbox for Matlab [46] [47] is also used for tensor manipulations. The resulting source code is available at https://github.com/Lab-Work/Robust_tensor_recovery_for_traffic_events.

### 5.2 Tensor robust PCA

In this subsection we apply higher-order RPCA to the problems where we have fully observed data with fiber-wise corrupted entries.

#### 5.2.1 Simulation conditions

We synthetically generate the observation data as , where and are the true or “ground truth” low-rank tensor and fiber-sparse tensor, respectively. We generate as a core tensor with size and tucker rank , multiplied in each mode by orthogonal matrices of corresponding dimensions, :

 X0=G×1U(1)×2U(2)×3U(3).

The entries of

are independently sampled from standard Gaussian distribution. The orthogonal matrices

are generated via a Gram-Schmidt orthogonalization on vectors of size drawn from standard Gaussian distribution. The sparse tensor is formed by first generating a tensor

, whose entries are i.i.d uniform distribution

(0,1). Then we randomly keep a fraction of the fibers of to form . Finally, the corresponding fibers of with respect to non-zero fibers in are set to zero.

#### 5.2.2 Algorithm performance for varying problem sizes

We apply Algorithm 1 on of varying tensor sizes and underlying tucker rank , and predict and using Algorithm 1. We also apply norm constrained decomposition on the same settings. Table 1 compares the result. The corruption rate is set to , i.e., . In all cases, for our algorithm the relative residual errors are less than , which is the same precision that we set for convergence tolerance. That is to say, we can exactly recover the low rank tensors in this setting. The precision and recall are both 1.0, indicating that the outlier detection is also exact. Similar to the observation of Candès’ et al. [26], the iteration numbers tend to be constant (between 28 and 29 in this case) regardless of tensor size. This indicates that the number of of SVD computations might be limited and insensitive to the size, which is important since SVD is the computational bottleneck of the algorithm. Furthermore, this property is important to allow the problem to solve quickly even on datasets of moderate sizes, as will be shown in a case study in Section 6.

In comparison, although norm constrained decomposition can also detect outliers in high precision and recall, the relative residual errors are relatively, high, in the order of . This indicates that norm constrained decomposition can do an adequate job when corruption ratio is low (), but cannot achieve exact recoveries.

#### 5.2.3 Influence of the corruption rate

Next, we investigate the performance of Algorithm 1 as the corruption ratio changes, and compare the result with norm constrained decomposition. We fix the low-rank tensor at size with a tucker rank of , then vary the gross corruption ratio from 0% to 60%. The results are shown in Figure 3 as an average over 10 trials. In this setting we see that as long as the corruption ratio is below 0.47, Algorithm 1 can precisely recover the low rank tensor, and correctly identify the outlier fibers. On the other hand, the relative error of norm constrained decomposition is constantly higher. After the corruption ratio exceeds 0.2, the estimation is no longer useful, with the relative error exceeding 100%.

### 5.3 Robust tensor completion

In this subsection we look at the performance of Algorithm 2, when the data is only partially observed, and compare it with norm constrained decomposition.

#### 5.3.1 Simulation conditions

We first generate a full observation data in the same way as Section 5.2. is the low rank tensor, and is the sparse tensor. Then, we form the partial observation data by randomly keeping a fraction of the entries in . We record the indices of the unobserved entries, and set their values in as 0.

#### 5.3.2 Influence of the corruption and observation ratios

First, we apply Algorithm 2 on simulated data with a varying corruption ratio and observation ratio. We fix the low-rank tensor at size with a tucker rank of . For gross corruption ratio at 0.05, 0.1, 0.2, we vary the observation ratio from 0.1 to 1 and run Algorithm 2. We run 10 times and average the results.

Figure 4 shows that for the detection of corrupted fibers, the recall stays at 1. The precision stays at 1 when is above 0.6 but drops dramatically for smaller . The relative error of low rank tensor is zero when the observation ratio with corruption ratio , and when the observation ratio with

. We observe a phase-transition behavior, in that the decomposition is exact when the observation ratio

is above a critical threshold, but the performance drops dramatically below the threshold. This critical threshold on the observation ratio varies for each case, namely the algorithm can handle more missing entries as the number of outliers is reduced. But when the corruption ratio is too large, exact recovery is not guaranteed.

Overall, the performance is promising, since we can exactly identify the outlier positions and recover the low rank tensor at non-corrupted entries, even with relatively a large missing ratio, for example when 5% of the fibers are corrupted and 40% of the data is missing.