Online Robust PCA
Robust PCA methods are typically batch algorithms that require loading all observations into memory before processing, which makes them inefficient for big data. In this paper, we develop an efficient online robust principal component method, namely online moving window robust principal component analysis (OMWRPCA). Unlike existing algorithms, OMWRPCA can successfully track not only slowly changing subspaces but also abruptly changed subspaces. By embedding hypothesis testing into the algorithm, OMWRPCA can detect change points of the underlying subspaces. Extensive simulation studies demonstrate the superior performance of OMWRPCA compared with other state-of-the-art approaches. We also apply the algorithm to real-time background subtraction of surveillance video.
Low-rank subspace models are powerful tools for analyzing high-dimensional data from dynamical systems. Applications include tracking in radar and sonar [2], recommender systems, cloud removal in satellite images [5] and background subtraction for surveillance video [6, 7, 8]. Principal component analysis (PCA) is the most widely used tool for obtaining a low-rank subspace from high-dimensional data. For a given rank r, PCA finds the r-dimensional linear subspace that minimizes the squared error loss between the data vectors and their projections onto that subspace. Although PCA is widely used, it has the drawback that it is not robust against outliers. Even with one grossly corrupted entry, the low-rank subspace estimated by classic PCA can be arbitrarily far from the true subspace. This shortcoming jeopardizes the practical use of PCA, as outliers are often observed in modern massive data. For example, data collected through sensors, cameras and websites are often very noisy and contain erroneous entries or outliers.
Various versions of robust PCA have been developed in the past few decades, including [9, 6, 10, 11, 12]. Among them, Robust PCA based on Principal Component Pursuit (RPCA-PCP) [6, 13] is the most promising, as it has both good practical performance and strong theoretical guarantees. A comprehensive review of the application of RPCA-PCP to surveillance video is available in the literature. RPCA-PCP decomposes the observed matrix into a low-rank matrix and a sparse matrix by solving Principal Component Pursuit. Under mild conditions, both the low-rank matrix and the sparse matrix can be recovered exactly.
Most robust PCA methods, including RPCA-PCP, are implemented in a batch manner. Batch algorithms not only require a huge amount of memory, as all observed data must be stored before processing, but also become too slow for big data. An online version of robust PCA is therefore highly desirable for processing incremental datasets, where the tracked subspace can be updated whenever a new sample is received. Online robust PCA can be applied to video analysis, change point detection, abnormal event detection and network monitoring.
There are mainly three versions of online robust PCA proposed in the literature: the Grassmannian Robust Adaptive Subspace Tracking Algorithm (GRASTA), Recursive Projected Compressive Sensing (ReProCS) [15, 16] and Online Robust PCA via Stochastic Optimization (RPCA-STOC). GRASTA applies an incremental gradient descent method on the Grassmannian manifold to perform robust PCA. It is built on Grassmannian Rank-One Update Subspace Estimation (GROUSE), a widely used online subspace tracking algorithm, but replaces the ℓ2 cost function of GROUSE with an ℓ1 cost function to gain robustness. In each step, GRASTA updates the subspace with gradient descent. ReProCS is designed mainly to solve the foreground/background separation problem for video sequences. ReProCS recursively finds the low-rank matrix and the sparse matrix through four steps: 1. perpendicular projection of the observed vector onto the low-rank subspace; 2. sparse recovery of the error via ℓ1 minimization; 3. recovery of the low-rank observation; 4. update of the low-rank subspace. Both GRASTA and Prac-ReProCS can only handle slowly changing subspaces. RPCA-STOC is based on stochastic optimization of a close reformulation of RPCA-PCP. In particular, RPCA-STOC splits the nuclear norm of the low-rank matrix into the sum of the squared Frobenius norms of two factor matrices, and iteratively updates the low-rank subspace and finds the sparse vector. RPCA-STOC works only with a stable subspace.
Most previously developed online robust PCA algorithms deal only with stable or slowly changing subspaces. In practice, however, the low-rank subspace may change suddenly, and the corresponding change points are usually of great interest. For example, in background subtraction from video, each scene change corresponds to a change point in the underlying subspace. Another application is failure detection in mechanical systems based on sensor readings from the equipment: when a failure happens, the underlying subspace changes, and accurate identification of the change point is useful for failure prediction. Other applications include human activity recognition, intrusion detection in computer networks, etc. In this paper, we propose an efficient online robust principal component method, namely online moving window robust principal component analysis (OMWRPCA), to handle both slowly changing and abruptly changed subspaces. The method, which embeds hypothesis testing into the online robust PCA framework, can accurately identify change points of the underlying subspace and simultaneously estimate the low-rank subspace and the sparse error. To the best of the authors' knowledge, OMWRPCA is the first algorithm able to simultaneously detect change points and compute RPCA in an online fashion, and the first online RPCA algorithm that can handle both slowly changing and abruptly changed subspaces.
The remainder of this paper is organized as follows. Section 2 gives the problem definition and introduces related methods. The OMWRPCA algorithm is developed in Section 3. In Section 4, we compare OMWRPCA with RPCA-STOC via extensive numerical experiments, and demonstrate the OMWRPCA algorithm with an application to real-world video surveillance data. Section 5 concludes the paper with some discussion.
We use bold letters to denote vectors and matrices. We use ‖v‖_1 and ‖v‖_2 to represent the ℓ1-norm and ℓ2-norm of a vector v, respectively. For an arbitrary real matrix M, ‖M‖_F denotes the Frobenius norm of M, ‖M‖_* denotes the nuclear norm of M (the sum of all its singular values), and ‖M‖_1 stands for the ℓ1-norm of M, where we treat M as a vector.
Let T denote the number of observed samples, and t the index of a sample instance. We assume that our inputs are streaming observations z_t ∈ R^m, t = 1, ..., T, which can be decomposed into two parts, z_t = l_t + s_t. The first part l_t is a vector from a low-rank subspace U_t, and the second part s_t is a sparse error with support size c_t, where c_t represents the number of nonzero elements of s_t. The underlying subspace U_t may or may not change with time t. When U_t does not change over time, we say the data are generated from a stable subspace; otherwise, the data are generated from a changing subspace. We assume the s_t satisfy the sparsity assumption, i.e., c_t is small relative to m for all t. We use M_t = [z_1, ..., z_t] to denote the matrix of observed samples up to time t, and let L_t and S_t denote [l_1, ..., l_t] and [s_1, ..., s_t], respectively.
Robust PCA based on Principal Component Pursuit (RPCA-PCP) [6, 13] is the most popular RPCA algorithm. It decomposes the observed matrix M into a low-rank matrix L and a sparse matrix S by solving Principal Component Pursuit:

    min_{L,S} ‖L‖_* + λ‖S‖_1   subject to   L + S = M.
Theoretically, we can prove that L and S can be recovered exactly if (a) the rank of L and the support size of S are small enough; (b) L is "dense", i.e., its singular vectors are incoherent with the standard basis; and (c) any element of the sparse matrix S is nonzero with probability ρ and zero with probability 1 − ρ, independently over time [6, 14]. RPCA-PCP is often applied to estimate the low-rank matrix that approximates the observed data when the inputs are corrupted with gross but sparse errors.
Various algorithms have been developed to solve RPCA-PCP, including the Accelerated Proximal Gradient (APG) and Augmented Lagrangian Multiplier (ALM) methods. Based on published experiments, ALM achieves higher accuracy than APG in fewer iterations, so we implement ALM in this paper. Let S_τ denote the shrinkage operator S_τ[x] = sgn(x) max(|x| − τ, 0), extended to matrices by applying it to each element. Let D_τ denote the singular value thresholding operator D_τ(X) = U S_τ(Σ) V^T, where X = U Σ V^T is any singular value decomposition. ALM is outlined in Algorithm 1.
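The two operators and the ALM iteration can be sketched in Python as follows. This is a minimal sketch, not the paper's released implementation; the default choices of λ and μ follow common conventions and are our assumptions.

```python
import numpy as np

def shrink(X, tau):
    """Shrinkage (soft-thresholding) operator, applied elementwise."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: shrink the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def rpca_pcp_alm(M, lam=None, mu=None, tol=1e-7, max_iter=500):
    """ALM sketch for Principal Component Pursuit:
    min ||L||_* + lam * ||S||_1  subject to  L + S = M."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))        # common default choice
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())  # common default choice
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                      # Lagrange multiplier
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)     # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)  # sparse update
        R = M - L - S                         # primal residual
        Y = Y + mu * R
        if np.linalg.norm(R, 'fro') <= tol * np.linalg.norm(M, 'fro'):
            break
    return L, S
```

On well-conditioned low-rank-plus-sparse inputs, this iteration typically recovers both components to high accuracy.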
RPCA-PCP is a batch algorithm. It iteratively computes SVD with all data. Thus, it requires a huge amount of memory and does not scale well with big data.
In practice, the underlying subspace can change over time. This makes RPCA-PCP infeasible after some time, as the rank of the matrix M_t (which includes all observed samples) will increase over time. One solution is Moving Window Robust Principal Component Analysis (MWRPCA), which applies a sliding window to the data and iteratively computes batch RPCA-PCP using only the latest n_win column vectors, where n_win is a user-specified window size.
MWRPCA generally performs well for subspace tracking: it can track both slowly changing and abruptly changed subspaces. However, MWRPCA often becomes too slow for real-world big-data problems. For example, suppose we want to track the subspace of a large data matrix with T columns. If we choose a window size n_win, we need to run RPCA-PCP roughly T times on matrices with n_win columns each, which is computationally too expensive.
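The moving window scheme can be sketched as follows, assuming a batch PCP solver is available; the function names and interface here are ours, purely for illustration.

```python
import numpy as np

def mwrpca(M, n_win, pcp_solver):
    """Moving Window RPCA sketch: re-run a batch PCP solver on the most
    recent n_win columns each time a new sample arrives.
    pcp_solver(X) is any function returning a decomposition (L, S)."""
    m, T = M.shape
    L_hat = np.zeros_like(M)
    S_hat = np.zeros_like(M)
    for t in range(n_win, T + 1):
        # solve batch PCP on the current window of n_win columns
        Lw, Sw = pcp_solver(M[:, t - n_win:t])
        # keep only the newest column's decomposition
        L_hat[:, t - 1] = Lw[:, -1]
        S_hat[:, t - 1] = Sw[:, -1]
    return L_hat, S_hat
```

The cost is one full batch solve per new column, which is exactly why the online variant developed below is needed.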
Feng et al. proposed an online algorithm to solve robust PCA based on Principal Component Pursuit. It starts by minimizing the following loss function:

    min_{L,S} (1/2)‖M − L − S‖_F² + λ1‖L‖_* + λ2‖S‖_1,

where λ1, λ2 are tuning parameters. The main difficulty in developing an online algorithm for this problem is that the nuclear norm couples all the samples tightly. Feng et al. solve the problem by using an equivalent form of the nuclear norm: for a matrix L whose rank is upper bounded by r,

    ‖L‖_* = min_{U ∈ R^{m×r}, V ∈ R^{r×T}: L = UV} (1/2)(‖U‖_F² + ‖V‖_F²),
where U can be seen as the basis of the low-rank subspace and V represents the coefficients of the observations with respect to that basis. They then propose the RPCA-STOC algorithm, which minimizes an empirical version of this loss function and processes one sample per time instance. Given a finite set of samples z_1, ..., z_t, the empirical loss at time point t is

    f_t(U) = (1/t) Σ_{i=1}^{t} ℓ(z_i, U) + (λ1/(2t)) ‖U‖_F²,

where the loss function for each sample is defined as

    ℓ(z, U) = min_{v,s} (1/2)‖z − Uv − s‖_2² + (λ1/2)‖v‖_2² + λ2‖s‖_1.
Fixing the basis U as U_{t−1}, the coefficients v_t and sparse vector s_t can be obtained by solving the optimization problem

    (v_t, s_t) = argmin_{v,s} (1/2)‖z_t − U_{t−1}v − s‖_2² + (λ1/2)‖v‖_2² + λ2‖s‖_1.
Assuming {v_i, s_i}_{i=1}^{t} are known, the basis U_t can be updated by minimizing the following function:

    g_t(U) = (1/t) Σ_{i=1}^{t} [ (1/2)‖z_i − Uv_i − s_i‖_2² + (λ1/2)‖v_i‖_2² + λ2‖s_i‖_1 ] + (λ1/(2t))‖U‖_F²,

which gives the explicit solution

    U_t = [ Σ_{i=1}^{t} (z_i − s_i)v_i^T ] [ Σ_{i=1}^{t} v_i v_i^T + λ1 I ]^{−1}.
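The two subproblems above can be sketched as follows; helper names and iteration counts are our own choices, not the paper's. The basis update uses block-coordinate descent on the columns of U, whose fixed point coincides with the explicit solution (the minimizer of 0.5·tr(UᵀUA) − tr(UᵀB) is U = BA⁻¹).

```python
import numpy as np

def solve_coeffs(z, U, lam1, lam2, n_iter=50):
    """Alternating minimization for one sample: given basis U, solve
    min_{v,s} 0.5||z - U v - s||^2 + lam1/2 ||v||^2 + lam2 ||s||_1."""
    m, r = U.shape
    # precompute the ridge-regression operator for the v-step
    G = np.linalg.inv(U.T @ U + lam1 * np.eye(r)) @ U.T
    s = np.zeros(m)
    for _ in range(n_iter):
        v = G @ (z - s)                 # ridge step for the coefficients
        resid = z - U @ v
        s = np.sign(resid) * np.maximum(np.abs(resid) - lam2, 0.0)  # soft-threshold
    return v, s

def update_basis(U, A, B, n_iter=1):
    """Block-coordinate descent on the columns of U for the surrogate
    min_U 0.5 tr(U^T U A) - tr(U^T B)."""
    r = U.shape[1]
    for _ in range(n_iter):
        for j in range(r):
            if A[j, j] > 1e-12:
                U[:, j] += (B[:, j] - U @ A[:, j]) / A[j, j]
    return U
```

Here A accumulates the terms v_i v_iᵀ (plus the λ1 I regularizer) and B accumulates (z_i − s_i)v_iᵀ, so only two small matrices, rather than the full history, must be stored.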
One limitation of RPCA-STOC is that the method assumes a stable subspace, which is generally too restrictive an assumption for applications such as failure detection in mechanical systems, intrusion detection in computer networks, fraud detection in financial transactions and background subtraction for surveillance video. At time t, RPCA-STOC updates the basis of the subspace by minimizing an empirical loss that involves all previously observed samples with equal weights. Thus, the subspace obtained at time t can be viewed as an "average" subspace of the observations from time 1 to t. This is clearly undesirable when the underlying subspace changes over time. We propose a new method that combines the ideas of moving window RPCA and RPCA-STOC to track a changing subspace. At time t, the proposed method updates U_t by minimizing an empirical loss based only on the most recent n_win samples, where n_win is a user-specified window size and the new empirical loss is defined as

    f_t(U) = (1/n_win) Σ_{i=t−n_win+1}^{t} ℓ(z_i, U) + (λ1/(2 n_win)) ‖U‖_F².
We call our new method Online Moving Window RPCA (OMWRPCA). The biggest advantage of OMWRPCA compared with RPCA-STOC is that OMWRPCA can quickly update the subspace when the underlying subspace is changing. There are two other minor implementation differences. In RPCA-STOC, the dimension r of the subspace is assumed known; in OMWRPCA, we estimate it by computing batch RPCA-PCP on a set of burn-in samples, whose size is another user-specified parameter. We also use the basis estimated from the burn-in samples as the initial value of U in OMWRPCA, while U is initialized as a zero matrix in RPCA-STOC. Specifically, in the initialization step, batch RPCA-PCP is computed on the burn-in samples to get the rank r, the basis U of the subspace, and the initial accumulation matrices. For simplicity, we also require the burn-in sample size to be no smaller than the window size. Details are given as follows.
Compute RPCA-PCP on the burn-in samples Z_burnin to obtain the decomposition Z_burnin = L̂ + Ŝ, where L̂ is low-rank and Ŝ is sparse.
From the SVD L̂ = Û Σ̂ V̂^T, take the rank r as the number of nonzero singular values, with Û ∈ R^{m×r}, Σ̂ ∈ R^{r×r} and V̂ having r columns.
Initialize the basis from Û (and Σ̂), and initialize the accumulation matrices accordingly.
OMWRPCA is summarized in Algorithm 4.
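Putting the pieces together, the OMWRPCA main loop can be sketched as below: accumulation matrices A and B collect the window's statistics, and the oldest sample's contribution is subtracted once the window is full. The per-sample and basis-update helpers are passed in; all names are illustrative, not taken from Algorithm 4.

```python
import numpy as np
from collections import deque

def omwrpca_stream(Z, U0, n_win, lam1, lam2, solve_coeffs, update_basis):
    """Online Moving Window RPCA sketch. U0 comes from batch RPCA-PCP on
    the burn-in samples. solve_coeffs(z, U, lam1, lam2) -> (v, s) and
    update_basis(U, A, B) -> U are hypothetical helper interfaces."""
    m, r = U0.shape
    U = U0.copy()
    A = np.zeros((r, r))
    B = np.zeros((m, r))
    window = deque()                    # (v, s, z) of the recent samples
    S_hat = []
    for t in range(Z.shape[1]):
        z = Z[:, t]
        v, s = solve_coeffs(z, U, lam1, lam2)
        A += np.outer(v, v)             # add the new sample's statistics
        B += np.outer(z - s, v)
        window.append((v, s, z))
        if len(window) > n_win:         # subtract the oldest sample's terms
            v0, s0, z0 = window.popleft()
            A -= np.outer(v0, v0)
            B -= np.outer(z0 - s0, v0)
        U = update_basis(U, A + lam1 * np.eye(r), B)
        S_hat.append(s)
    return U, np.array(S_hat).T
```

Because A and B are updated incrementally, the per-sample cost does not grow with the number of processed observations.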
One limitation of the basic OMWRPCA algorithm is that it can only deal with slowly changing subspaces. When the subspace changes suddenly, the basic OMWRPCA algorithm fails to update the subspace correctly: when the new subspace is dramatically different from the original one, the subspace either cannot be updated in an online fashion at all, or the update takes a long time. For example, when the new subspace has a higher dimension than the original subspace, the basic OMWRPCA algorithm can never update the subspace correctly, as the rank of the subspace is fixed to a constant. A similar drawback is shared by the majority of previously developed online RPCA algorithms. Furthermore, an online RPCA algorithm that can identify change points is highly desirable, since change point detection is crucial for subsequent analysis; moreover, we are sometimes more interested in pinpointing the change points than in estimating the underlying subspace. Thus, we propose another variant of our OMWRPCA algorithm to simultaneously detect change points and compute RPCA in an online fashion. The algorithm is robust to dramatic subspace changes. We achieve this goal by embedding hypothesis testing into the original OMWRPCA algorithm, and call the new algorithm OMWRPCA-CP.
OMWRPCA-CP is based on a simple yet important observation: when a new observation cannot be modeled well with the current subspace, the OMWRPCA algorithm often yields an estimated sparse vector with an abnormally large support size. By monitoring the support size c_t of the sparse vector computed by the OMWRPCA algorithm, we can identify a change point of the subspace, which usually appears as the starting point of a level increase in the time series of c_t. Figure 1 demonstrates this observation on four simulated cases. In each plot, the support sizes c_t of the estimated sparse vectors are plotted. For each case, the burn-in samples are between index −200 and 0, and two change points occur, at t = 1000 and t = 2000. The change points separate the timeline into three pieces; within each piece the underlying subspace is constant, with rank given in the panel title. The elements of the true sparse vector have probability ρ of being nonzero and are generated independently. The dimension of the samples is 400, and we choose window size 200 in OMWRPCA. Theoretically, when the underlying subspace is accurately estimated, the support sizes of the estimated sparse vectors should be around 4 and 40 for the ρ = 0.01 and ρ = 0.1 cases, respectively. For the upper two cases, c_t blows up after the first change point and never returns to the normal range. This is because after we fit RPCA-PCP on the burn-in samples, the dimension of the estimated subspace is fixed to 10; the estimated subspace can therefore never approximate the true subspaces in the later two pieces of the timeline, as they have larger dimensions. For the lower two cases, c_t blows up immediately after each change point and then drops back to the normal range once the estimated subspace has converged to the new true subspace.
We give an informal argument for why this phenomenon happens. Assume we are at time point t and the observation is z_t = U_t v_t + s_t. We estimate (v̂_t, ŝ_t) by computing

    (v̂_t, ŝ_t) = argmin_{v,s} (1/2)‖z_t − Û_{t−1} v − s‖_2² + (λ1/2)‖v‖_2² + λ2‖s‖_1,

where Û_{t−1} is the estimated basis of the subspace at time point t − 1. This optimization problem does not have an explicit solution and can be solved iteratively by alternating between fixing s and solving for v, and fixing v and solving for s. We here approximate the solution with a one-step update from the good initial point s = 0. We have v̂_t = (Û_{t−1}^T Û_{t−1} + λ1 I)^{−1} Û_{t−1}^T z_t and ŝ_t = S_{λ2}[z_t − Û_{t−1} v̂_t]. When λ1 is small, we approximately have v̂_t ≈ Û_{t−1}^† z_t and ŝ_t ≈ S_{λ2}[P_{Û⊥}(z_t)], where P_{Û⊥} represents the projection onto the orthogonal complement of the estimated subspace. A good choice of λ2 is on the order of 1/√m, which is small when the dimension m is large; we discuss the tuning of parameters in more detail later. Under the mild assumption that the nonzero elements of s_t are not too small, we have ĉ_t ≈ c_t when the true subspace U_t is well approximated by Û_{t−1}, and ĉ_t ≫ c_t when it is not. This analysis also provides a clear mathematical description of the term "abruptly changed subspace": for a piecewise constant subspace, we say that a change point exists at time t when the vector P_{Û⊥}(U_t v_t) is not close to zero.
We develop a change point detection algorithm by monitoring c_t. The algorithm declares a change point when it finds that c_t has been abnormally high for a while. Specifically, the user provides two parameters, N and p: when, among N consecutive observations, the algorithm finds more than pN observations with abnormally high c_t, it decides that a change point exists among these observations and traces back to locate it. We refer to periods when the estimated subspace approximates the true subspace well as "normal", and "abnormal" otherwise. The collected information about c_t from normal periods is stored in a count vector H, where the i-th element of H records the number of times that c_t equals i. When we compute ĉ_t for a new observation, we can flag the observation as normal or abnormal via hypothesis testing based on H. We compute the p-value

    p_t = ( Σ_{j ≥ ĉ_t} H_j ) / ( Σ_j H_j ),

which is the probability of observing a sparse vector with at least as many nonzero elements as the current observation, under the hypothesis that the observation is normal. If p_t < α, we flag the observation as abnormal; otherwise, we flag it as normal. Here α is a user-specified threshold, and we find a small value works very well in practice. Algorithm 5 displays OMWRPCA-CP in pseudocode. The key steps are:
Initialize H as a zero vector. Initialize the buffer list B_flag for flags and the buffer list B_count for counts as empty lists. In practice, B_flag and B_count can be implemented as queue structures (first in, first out).
Collect the burn-in samples, run RPCA-PCP on them, and start OMWRPCA.
For the first batch of observations of OMWRPCA, we wait for the subspace of the OMWRPCA algorithm to become stable. H, B_flag and B_count remain unchanged.
For the next batch of observations of OMWRPCA, we calculate c_t from ŝ_t and update H (H_{c_t} ← H_{c_t} + 1). B_flag and B_count remain unchanged. Hypothesis testing is not done at this stage, as the sample size of the test is still small and the test would not be accurate.
Now we can do hypothesis testing on c_t with H. We get the flag f_t for the t-th observation, where f_t = 1 if the t-th observation is abnormal, and f_t = 0 if it is normal. Append f_t and c_t to B_flag and B_count, respectively. Denote the size of buffer B_flag (equivalently, the size of buffer B_count) by n_b. If n_b exceeds N, pop one observation from the left-hand side of both B_flag and B_count; assuming the count popped from B_count is c, we then update H with H_c ← H_c + 1. If n_b ≤ N, we do nothing. Thus, the buffer size is always kept within N, and once it reaches N it remains at N. The buffers contain the flag information of the most recent observations and can be used to detect change points.
If n_b equals N, we compute the proportion of abnormal flags in B_flag and compare it with p. If the proportion exceeds p, we know a change point exists among the most recent N observations. We then loop over the list B_flag to find the change point, detected as the first instance of k consecutive abnormal cases. For example, if the flags corresponding to time points t0, t0 + 1, ..., t0 + k − 1 are the first instance of k consecutive abnormal cases, the change point is determined as t0. Here k is another user-specified parameter, and we find a small value works well in practice. After the change point is identified, OMWRPCA-CP can restart from the change point.
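The monitoring scheme in the steps above can be sketched as a small class. This is a simplified sketch: here the histogram H is updated immediately for observations flagged normal, rather than when observations leave the buffer as in Algorithm 5, and all parameter names are ours.

```python
import numpy as np
from collections import deque

class SupportSizeMonitor:
    """Sketch of OMWRPCA-CP's test: track the empirical distribution of
    support sizes c_t from normal periods and flag a change point when
    too many recent observations are abnormal."""
    def __init__(self, m, N=20, p=0.5, alpha=0.01, k=3):
        self.H = np.zeros(m + 1)       # H[c] = #times support size == c
        self.flags = deque(maxlen=N)   # 1 = abnormal, 0 = normal
        self.N, self.p, self.alpha, self.k = N, p, alpha, k

    def p_value(self, c):
        total = self.H.sum()
        return self.H[c:].sum() / total if total else 1.0

    def update(self, c):
        """Feed the support size c of the newest sparse vector. Returns
        the offset (within the buffer) of a detected change point, or None."""
        flag = int(self.p_value(c) < self.alpha)
        self.flags.append(flag)
        if not flag:
            self.H[c] += 1             # grow the normal-period histogram
        if len(self.flags) == self.N and sum(self.flags) > self.p * self.N:
            # trace back: first run of k consecutive abnormal flags
            run = 0
            for i, f in enumerate(self.flags):
                run = run + 1 if f else 0
                if run == self.k:
                    return i - self.k + 1
        return None
```

Feeding it a long stretch of small support sizes followed by abnormally large ones triggers detection at the start of the abnormal run.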
A good choice of tuning parameters is key to the success of the proposed algorithms. The pair (λ1, λ2) can be chosen by cross-validation; a rule of thumb is to set both on the order of 1/√m. The parameter k needs to be kept small relative to N to avoid missing a change point, but not too small, to avoid generating false alarms. The window size can be chosen based on the user's prior knowledge. Sometimes the assumption that the support size of the sparse vector remains stable and much smaller than m is violated in real-world data; for example, in video surveillance data the foreground may contain significant variation over time. In this case, we can add one additional positive tuning parameter that loosens the p-value computation, so that only support sizes exceeding the current one by a margin count as evidence of abnormality. This change makes the hypothesis test more conservative and keeps the algorithm from detecting too many change points (false alarms).
It is easy to show that OMWRPCA and OMWRPCA-CP (ignoring the burn-in training) have the same computational complexity as STOC-RPCA: the computational cost per new observation is independent of the sample size and linear in the dimension m of the observation. In contrast, RPCA-PCP computes an SVD and a thresholding operation in each iteration, at the cost of a full SVD of the data matrix. Based on published experiments, the proposed methods are also more efficient than other online RPCA algorithms such as GRASTA. The memory costs of OMWRPCA and OMWRPCA-CP are proportional to the window size (OMWRPCA-CP additionally stores the count vector and buffers), whereas the memory cost of RPCA-PCP grows with the total number of observed samples. Thus, the proposed methods are well suited to processing big data.
Lastly, the current version of OMWRPCA-CP performs hypothesis tests based on all historical information of c_t. We could easily change H to a queue structure that stores only the recent history of c_t, where the user specifies how far back to trace. This change would make the algorithm detect a change point only by comparison with recent history. We do not pursue this variation in this paper.
In this section, we compare OMWRPCA with RPCA-STOC via extensive numerical experiments and an application to real-world video surveillance data. To make a fair comparison between OMWRPCA and RPCA-STOC, we estimate the rank in RPCA-STOC with burn-in samples following the same steps as in OMWRPCA. We implement all algorithms in Python, and the code is available on GitHub at https://github.com/wxiao0421/onlineRPCA.git.
The simulation study has a setting similar to that of the RPCA-STOC paper. The observations are generated as Z = UV + S, where S is a sparse matrix with a fraction ρ of nonzero elements. The nonzero elements of S are randomly chosen and generated from a uniform distribution over a fixed interval. The low-rank part is generated as a product UV, where the sizes of U and V are m × r and r × T, respectively. The elements of both U and V are i.i.d. samples from the standard normal distribution. Here U is the basis of the constant subspace of dimension r. We fix the sample dimension m and the number of samples T, and also generate a set of burn-in samples. We consider four settings, varying the subspace rank r and the sparsity level ρ, and run 50 replications for each setting. The same tuning parameters (λ1, λ2, window size, burn-in size, and the change point detection parameters) are used in all simulation studies.
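Data generation for this study can be sketched as follows; the interval for the sparse entries and the default sizes are illustrative, since the original values did not survive extraction.

```python
import numpy as np

def simulate(m=400, T=3000, r=10, rho=0.01, spike=5.0, seed=0):
    """Generate Z = U V + S: a rank-r low-rank part plus a sparse part
    with a fraction rho of nonzero entries drawn uniformly."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, r))    # basis of the constant subspace
    V = rng.standard_normal((r, T))    # coefficients
    S = np.zeros((m, T))
    mask = rng.random((m, T)) < rho    # fraction rho of corrupted entries
    S[mask] = rng.uniform(-spike, spike, mask.sum())
    return U @ V + S, U @ V, S
```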
We compare three different methods, STOC-RPCA, OMWRPCA and OMWRPCA-CP. Three criteria are applied:
Here E_L = ‖L̂ − L‖_F / ‖L‖_F is the relative error of the low-rank matrix L, E_S = ‖Ŝ − S‖_F / ‖S‖_F is the relative error of the sparse matrix S, and E_supp is the proportion of correctly identified entries in the support of S.
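The three criteria can be computed as, for example:

```python
import numpy as np

def relative_error(X_hat, X):
    """Relative Frobenius error ||X_hat - X||_F / ||X||_F."""
    return np.linalg.norm(X_hat - X, 'fro') / np.linalg.norm(X, 'fro')

def support_accuracy(S_hat, S, tol=1e-6):
    """Proportion of entries whose support status (zero vs nonzero)
    is correctly identified."""
    return np.mean((np.abs(S_hat) > tol) == (np.abs(S) > tol))
```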
Box plots of E_L, E_S, E_supp and running times are shown in Figure 2. OMWRPCA-CP gives the same result as OMWRPCA, and no change point is detected by OMWRPCA-CP in any replication. All three methods, STOC-RPCA, OMWRPCA and OMWRPCA-CP, have comparable performance on E_supp and running times; STOC-RPCA has slightly better performance on one error criterion in some settings, while OMWRPCA and OMWRPCA-CP have slightly better performance on the other. All methods take approximately the same amount of time to run, and all are very fast (less than 1.1 minutes per replication in all settings). In contrast, MWRPCA takes around 100 minutes for three of the settings and more than 1,000 minutes for the remaining one.
We adopt almost all settings of Simulation Study 1, except that we make the underlying subspace change slowly over time. We first generate a basis with i.i.d. standard normal elements and generate the burn-in samples based on it. We then make the subspace change slowly by periodically adding independently generated matrices with i.i.d. normal elements to the first few columns of the basis. Specifically, the timeline is divided into blocks, and within each block the basis is perturbed from the previous one by such an additive update, with a scale parameter controlling the speed of change; we set this scale to a small value in this simulation.
Box plots of E_L, E_S, E_supp and running times are shown in Figure 3. OMWRPCA-CP gives the same result as OMWRPCA, and no change point is detected by OMWRPCA-CP in any replication. OMWRPCA and OMWRPCA-CP perform better than STOC-RPCA on all three criteria, since, as pointed out before, STOC-RPCA cannot efficiently track a changing subspace. In Figure 4, we plot the average of E_L over all replications as a function of the number of observations, to investigate how the performance of the different methods evolves. It shows that the performance of STOC-RPCA deteriorates over time, while the performance of OMWRPCA and OMWRPCA-CP remains quite stable.