Low-Rank and Sparse Tools for Background Modeling and Subtraction in Videos
Numerous applications in data mining and machine learning require recovering a matrix of minimal rank. Robust principal component analysis (RPCA) is a general framework for handling this kind of problems. Nuclear norm based convex surrogate of the rank function in RPCA is widely investigated. Under certain assumptions, it can recover the underlying true low rank matrix with high probability. However, those assumptions may not hold in real-world applications. Since the nuclear norm approximates the rank by adding all singular values together, which is essentially a ℓ_1-norm of the singular values, the resulting approximation error is not trivial and thus the resulting matrix estimator can be significantly biased. To seek a closer approximation and to alleviate the above-mentioned limitations of the nuclear norm, we propose a nonconvex rank approximation. This approximation to the matrix rank is tighter than the nuclear norm. To solve the associated nonconvex minimization problem, we develop an efficient augmented Lagrange multiplier based optimization algorithm. Experimental results demonstrate that our method outperforms current state-of-the-art algorithms in both accuracy and efficiency.READ FULL TEXT VIEW PDF
Matrix rank minimization problem is in general NP-hard. The nuclear norm...
Rank minimization is of interest in machine learning applications such a...
Multi-Task Learning (MTL) can enhance a classifier's generalization
Low-rank modeling has many important applications in computer vision and...
In many applications that require matrix solutions of minimal rank, the
The importance of accurate recommender systems has been widely recognize...
Several bandwise total variation (TV) regularized low-rank (LR)-based mo...
Low-Rank and Sparse Tools for Background Modeling and Subtraction in Videos
In many machine learning and data mining applications, the dimensionality of data is very high, such as digital images, video sequences, text documents, genomic data, social networks, and financial time series. Data mining on such data sets is challenging due to the curse of dimensionality. Dimensionality reduction techniques, which project the original high-dimensional feature space to a low-dimensional space, have been extensively explored. Among them, principal component analysis (PCA)
, which finds a small number of orthogonal basis vectors that characterize most of the variability of the data set, is well established and commonly used. However, PCA may fail spectacularly even when a single grossly corrupted entry exists in the data. To enhance its robustness to outliers or corrupted observations, early attempts on robust PCA (RPCA) have been made, , , , . Nevertheless, none of these algorithms yields a solution in polynomial-time with strong performance guarantees under broad conditions.
Due to the seminal work of , , a more recent version of RPCA becomes popular these days. The idea is to recover a low-rank matrix from highly corrupted observations . Entries in the sparse component
can have arbitrarily large magnitude. This has numerous applications ranging from recommender system design to anomaly detection in dynamic networks. For example, for videos and face images under varying illumination, the background and underlying clean face image are regarded as the low-rank component while the moving objects and shadows represent the sparse part; common words in a collection of text documents can be captured by a low-rank matrix while the few words that distinguish each document from others can be represented by a sparse matrix .
Mathematically, this kind of problem can be modeled as
where a weight parameter. Unfortunately, (1) is generally an NP-hard problem. By relaxing the nonconvex rank function and the -norm into the nuclear norm and -norm respectively, a convex formulation can be yielded
where ; i.e., the nuclear norm of is the sum of its singular values, and . Under incoherence assumptions, both low-rank and sparse components can be recovered exactly with an overwhelming probability .
Despite its convex formulation and ease of optimization, RPCA in (2) has two major limitations. First, the underlying matrix may have no incoherence guarantee  in practical scenarios, and the data may be grossly corrupted. Under these circumstances, the resulting global optimal solution to (2) may deviate significantly from the truth. Second, RPCA shrinks all the singular values equally. The nuclear norm is essentially an norm of the singular values and it is well known that norm has a shrinkage effect and leads to a biased estimator , . This implies that the nuclear norm over-penalizes large singular values, and consequently it may only find a much biased solution. Nonconvex penalties to norm such as smoothly clipped absolute deviation penalty , minimax concave penalty , capped- regularization , and truncated function  have shown that they provide better estimation accuracy and variable selection consistency . Recently, nonconvex relaxations to the nuclear norm have received increasing attention . Variations of the nuclear norm, e.g., weighted nuclear norm , , singular value thresholding , and truncated nuclear norm  are proposed and outperform the standard nuclear norm. However, their applications are still quite limited and they are often designed for specific applications.
In this paper, we propose a novel nonconvex function to directly approximate the rank, which provides a tighter approximation than the nuclear norm does. This is crucial to reveal the rank in low-rank matrix estimation. To solve this nonconvex model, we devise an Augmented Lagrange Multiplier (ALM) based optimization algorithm. Theoretical convergence analysis shows that our iterative optimization at least converges to a stationary point. Extensive experiments on three representative applications confirm the advantages of our approach.
The convex approach to RPCA in (2) has been studied thoroughly. It is proved that when the locations of nonzero entries of
are uniformly distributed, and when the rank ofand the sparsity of satisfy some mild conditions, and can be exactly recovered with a high probability . In the literature, numerous algorithms have been developed to solve (2), e.g., SVT , APGL , FISTA , and ALM . Among them, ALM based approach is the most popular. Although the theory is elegant, convex technique is still computationally quite expensive and has poor convergence rate . Furthermore, (2) breaks down when large errors concentrate only on a number of columns of , .
Here, can detect outliers with column-wise sparsity, while treats each entry independently. Theoretical analysis on this model is difficult. In this model, only the column space of and the column support of can be exactly recovered , . When the rank of the intrinsic matrix is comparable to the number of samples , the working range of outlier pursuit is limited.
To alleviate the deficiency of convex relaxations, capped norm based nonconvex RPCA (CNorm) has been proposed and it solves the following problem :
where , for some small parameters , , and denotes the level of Gaussian noise. If all singular values of are greater than and all absolute values of elements are greater than , then the objective function in (4) falls back to (1). However, it is hard to provide any convergence guarantee about this nonconvex method. More importantly, as we will show in the experimental part, it cannot deal with large scale data well.
By combining the simplicity of PCA and elegant theory of convex RPCA, a recent paper has proposed a new nonconvex RPCA . The idea is to project the residuals onto the set of low-rank and sparse matrices alternatively. Specifically, it proceeds in (the desired rank of ) stages, and compute rank- projection in each stage, where . During this process, sparse errors are suppressed by discarding matrix elements with large approximation errors. This method enjoys several nice properties, including low complexity, global convergence guarantee, fast convergence rate, and theoretical guarantee for exact recovery of the low-rank matrix. However, it needs the knowledge of three parameters: sparsity of , incoherence of , and rank of . Such knowledge is not always readily available.
In this section, we present a novel matrix rank approximation, and propose a nonconvex RPCA algorithm.
Consider the general framework for RPCA
where denotes a rank approximation which we term -norm, and represents a proper norm of noise and outliers.
We define -norm of matrix as
It can be observed that , , and it coincides with true rank with being all 0 and all 1. Furthermore, is unitarily invariant, that is, for any orthonormal and . Certainly it is not a real norm. Figure 1 plots several rank relaxations in the literature. Among them, a log-determinant function, , where is a very small constant (e.g., ), has been well studied . As we can see, our formulation ( is used in this figure and our experiments) closely matches the true rank, while the nuclear norm deviates considerably when the singular values depart from 1. As a result, the proposed -norm overcomes the imbalanced penalization by different singular values in convex nuclear norm. On the other hand, (5) is a nonconvex formulation, which is usually difficult to optimize. In the next section, we design an effective algorithm to solve it.
For problem (5), by introducing a Lagrange multiplier and a quadratic penalty term, we can remove the equality constraint and construct the augmented Lagrangian function:
where is the inner product of two matrices, that is, , and is a positive parameter. An iterative approach is applied to update , and iteratively. At the th step, we update by solving the following subproblem:
To solve (8), we first develop the following theorem and provide the proof in Appendix A.
Let be the SVD of and . Let be a unitarily invariant function and . Then an optimal solution to the following problem
is , where and . Here is the Moreau-Yosida operator, defined as
In our case, the new objective function in (10) is a combination of concave and convex functions. This intrinsic structure motivates us to use difference of convex (DC) programing . DC algorithm decomposes a nonconvex function as the difference of two convex functions and iteratively optimizes it by linearizing the concave term at each iteration. At the th inner iteration,
which admits a closed-form solution
where is the gradient of at and is the SVD of . After a number of iterations, it converges to a local optimal point . Then .
where and is the -th column of .
When modeled by norm, based on Lemma 4, we have
The updates of and are standard:
where . The complete procedure is outlined in Algorithm 1.
Convergence analysis of nonconvex optimization problem is usually difficult. In this section, we will show that our algorithm has at least a convergent subsequence which tends to a stationary point. While the final solution might not be a globally optimal one, all our experiments show that our algorithm converges to a solution that produces promising results.
For convenience, we write as in (7)
The sequence is bounded.
satisfies the first-order necessary local optimality condition,
For case, since is nonsmooth at , we redefine subgradient if . Then , hence is bounded. Similarly, it can be shown that is also bounded. Thus is bounded. ∎
and are bounded if .
With some algebra, we have the following equality
Iterating the inequality chain (21) times, we obtain
Since is bounded, all terms on the right-hand side of the above inequality are bounded, thus is upper bounded.
Since each term on the right-hand side is bounded, is bounded. By the last term on the right-hand of (23), is bounded. Therefore, and are both bounded. ∎
Let be the sequence generated in Algorithm 1 and be an accumulation point. Then is a stationary point of optimization problem (5) if and .
The sequence generated in Algorithm 1 is bounded as shown in Lemmas 1 and 2. By Bolzano-Weierstrass theorem, the sequence must have at least one accumulation point, e.g., . Without loss of generality, we assume that itself converges to .
Since , we have . Then . Thus the primal feasibility condition is satisfied.
For , it is true that
If the singular value decomposition ofis , according to Theorem 3 in Appendix B,
where if ; otherwise, it is . Since is finite, is bounded. Since is bounded, is bounded. Under the assumption that ,
Hence, satisfies the KKT conditions of . Thus is a stationary point of (5). ∎
In this section, we evaluate our algorithm by deploying it to three important practical applications: foreground-background separation on surveillance video, shadow removal from face images, and anomaly detection. All applications involve the recovery of intrinsically low-dimensional data from gross corruption. We compare our algorithm with other state-of-the-art methods, including convex RPCA , CNorm , and AltProj . All these three methods call PROPACK package . In addition, for convex RPCA , we use the state-of-the-art solver, viz., an inexact augmented Lagrange multiplier (IALM) method , . For CNorm, we use the fast alternating algorithm and the convex relaxation solutions from NSA  as its initial conditions. We perform all experiments with Matlab in Windows 7 based on Intel Xeon 2.33GHz CPU with 4G RAM111The code is available at: https://github.com/sckangz/noncvx-PRCA.
There are three parameters in our model: , , and . If is too large, the trivial solution of is obtained, which generates with high rank. On the other hand, small leads to . Similar to , can be selected from the neighborhood of . Experiments show that our results are insensitive to in a pretty broad range, so we just set through all our experiments. As for , a large value will lead to fast convergence, while a small value of will result in a more accurate solution. In the literature, a often used value is 1.1. As discussed in , can also affect in IALM. If is too large, will have a rank larger than the desired low rank. This provides a way to manipulate the desired rank. By this principle, we tune the value of . Finally, we set to , , and 4 for the following four experiments, respectively. In practice, these parameters can be chosen by cross validation. For fair comparison, we follow the experimental setting in AltProj  and stop the program when a relative error of is reached. For other methods, we follow the parameter settings used in the corresponding papers.
|(a) AltProj||(b) Our||(c) IALM||(d) CNorm|
Background subtraction from video sequences is a popular approach to detecting interesting activities in the scene. Surveillance videos from a fixed camera can be naturally modeled by our model due to their relatively static background and sparse foreground.
In this experiment, we use a benchmark data set escalator , which contains 3,417 frames of size 160 130. The data matrix is formed by vectorizing each frame and concatenating the vectors column-wisely. As a result, the size of is . For this data set, the background appears to be completely static, thus the ideal rank of the background matrix is one.
For escalator, unfortunately, CNorm cannot successfully separate the foreground from background with our current stopping criterion. Thus we relax its terminating relative error to be . For AltProj, we set the desired rank of to be one. The low rank ground truth is not available for these videos so we present a visual comparison of extraction results using different algorithms in Figure 2. As we can see, CNorm suffers from noticeable artifacts (shadows of people), which is due to overfitting in the low rank component. is missing since the sparse component is absorbed by the big relative error. Blur exists in IALM recovery image, which is also observed in many other work , . In contrast, both AltProj and our method obtain a clean background. Moreover, the steps of the escalator are also removed by these two methods, since they are moving and are supposed to be part of the dynamic foreground.
Table I gives the quantitative comparison results. In terms of computing time, our method is more than twice faster than AltProj, and 54 times faster than IALM222The experiments in  are conducted on a machine with Dual 8-core Xeon (E5-2650) 2.0GHz CPU with 192GB RAM . . One intuitive interpretation is that we observe that fewer iterations are required for our algorithm to converge. Therefore our method is efficient even when the matrix size is large. Furthermore, both AltProj and our algorithm can obtain the desired rank-one matrix . The nuclear norm based IALM results in with rank 2011, which implies that contains many blurred images. If we increase the size of data matrix , IALM performs even worse since it may incur errors from rank approximation. These results emphasize the significance of good rank approximation.
The purpose of this experiment is to demonstrate the effectiveness of our algorithm on dynamic backgrounds. In some cases, the background changes over time due to, e.g., illumination variation or weather change. Then the background can have a higher rank. Here we use sequences captured from a lobby. The size of is . On this occasion, background changes are mainly caused by switching on/off lights. Therefore, we expect the rank of to be 2. Again, we set the desired rank in Altproj to be 2. Two examples are shown for each method in Figure 3. The first example denotes the scene before the other three lights are turned on. In this example, except for some shadows of pants in one image recovered by IALM, all the recovered background images appear satisfactory.
We list the numerical results in Table II. For this experiment, our algorithm is almost five times faster than AltProj, 26 times faster than IALM. It is also noted that, the rank obtained from IALM is still high.
|(a) Original images||(b) AltProj||(c) Our||(d) IALM||(e) CNorm|
Removing shadows, specularities, and saturations from face images is another important application of RPCA 
. Face images taken under different lighting conditions often introduce errors to face recognition. These errors might be large in magnitude, but are supposed to be sparse in the spatial domain. Given enough face images of the same person, it is possible to reconstruct the true face image.
We use images from the Extended Yale B database . There are 38 subjects and each subject has 64 images of size taken under varying different illuminations. Images of each subject are heavily corrupted due to different illumination conditions. All images are converted to 32,256-dimensional column vectors, hence for each subject. Since the images are well aligned, should have a rank of 1.
Figure 4 illustrates the results of different algorithms on two images. The proposed algorithm removes the specularities and shadows well, while there are some artifacts left by using IALM and CNorm. Although the visual qualities are similar for AltProj and our method, the numerical measurements in Table III demonstrate that our algorithm is 22 times faster than AltProj. Similar to the results in , IALM and CNorm result in of high ranks.
It is widely known that images from the same subject reside in a low-dimensional subspace. If we inject some images of different subjects into a dominant number of images of the same subject, they will stand out as outliers or anomalies. To test this, we use images from USPS data set . We choose digits ’1’ and ’7’, since they share some similarities. And we treat each 1616 image as a 256-dimensional column vector. Then the data matrix is constructed to contain the first 190 samples from digit ’1’ and the last 10 samples from ’7’. Our goal is to identify all anomalies, including all ’7’s, in an unsupervised way. We apply our model to estimate and , which are expected to capture ’1’s and ’7’s, respectively. norm of each column in is used to identify anomalies. Ideally, ’7’s should give larger values than ’1’s.
Figure 5 shows the norm of columns in . For visual quality, we apply thresholding with a threshold of 4 to get rid of small values. We can see that all ’7’s are found. Besides, four ’1’s at columns 16, 22, 49 and 130 also appear. As shown in Figure 6, these four ’1’s are written in a way different from the rest of ’1’s.
This paper investigates a nonconvex approach to the robust principal component analysis (RPCA) problem. In particular, we provide a novel matrix rank approximation, which is more robust and less biased than the nuclear norm. We devise an augmented Lagrange multiplier framework to solve this nonconvex optimization problem. Extensive experimental results demonstrate that our proposed approach outperforms previous algorithms. Our algorithm can be used as a powerful tool to efficiently separate low-dimensional and sparse structure for high-dimensional data. It would be interesting to establish more theoretical properties to the proposed nonconvex approach, for example, the theoretical guarantees for the estimator to be consistent.
This work is supported by US National Science Foundation Grants IIS 1218712. The corresponding author is Qiang Cheng.
is , where and . Here is the Moreau-Yosida operator, defined as
Since , then . Denoting which has exactly the same singular values as , we have
Note that (29) hold since the Frobenius norm is unitarily invariant; (30) is due to the Hoffman-Wielandt inequality; and (31) holds as . Thus, (31) is a lower bound of (28). Because , the SVD of is . By minimizing (32), we get . Hence , which is the optimal solution of problem (26). ∎
 Let be a given matrix. If the optimal solution to
is , then the -th column of is
 The shrinkage-thresholding operator is defined as
where and are scalars.
 Suppose is represented as , where with SVD , , and is differentiable. The gradient of at is
F. De la Torre and M. J. Black, “Robust principal component analysis for computer vision,” inComputer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, vol. 1. IEEE, 2001, pp. 362–369.
Journal of Multivariate Analysis, vol. 95, no. 1, pp. 206–226, 2005.
S. Xiang, X. Tong, and J. Ye, “Efficient sparse group feature selection via nonconvex optimization,” inProceedings of the 30th International Conference on Machine Learning (ICML-13), 2013, pp. 284–292.
Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014, pp. 2862–2869.
M. Fazel, H. Hindi, and S. P. Boyd, “Log-det heuristic for matrix rank minimization with applications to hankel and euclidean distance matrices,” inAmerican Control Conference, 2003. Proceedings of the 2003, vol. 3. IEEE, 2003, pp. 2156–2162.