I Introduction
Principal component analysis (PCA) is a fundamental technique of exploratory data analysis and has been widely used in many data mining tasks. Given a data matrix , the classic PCA seeks the best rank approximation with complexity
. It is well known that PCA is sensitive to outliers. To combat this drawback, in the past decade, a number of approaches to robust PCA (RPCA) have been proposed, including alternating minimization
[1], random sampling techniques [2, 3], multivariate trimming [4], and others [5, 6]. More recently, a new type of RPCA method has emerged and become popular [7, 8]. It assumes that can be separated into two parts, i.e., a lowrank and a sparse , by solving the following problem:(1) 
where is the (pseudo) norm which counts the number of nonzero elements of a matrix, and is a balancing parameter. Because minimizing the rank and the norm in (1) is generally NPhard, in practice, (1) is often relaxed to the following convex optimization problem [8]:
(2) 
where is the nuclear norm of with denoting the th largest singular value of , and is the norm. Theoretically, under some mild conditions, (2) can exactly separate with the true rank from . A number of algorithms have been developed to solve (2), including singular value thresholding (SVT) [9], accelerated proximal gradient (APG) [10], and two versions of augmented Lagrange multipliers (ALM) based approaches [11]: exact ALM and inexact ALM (IALM). Among these algorithms, the ALM based are the stateoftheart ones for solving (2), which need to compute SVDs of matrices per iteration. To improve efficiency, another ALM based algorithm adopts PROPACK package [12], which solves only partial SVDs instead of full SVDs. However, this is still computationally costly when and are both large. Despite the elegant theory of the convex RPCA in (2), it has two major drawbacks: 1) When the underlying matrix has no incoherence guarantee [8]
, or the data get grossly corrupted, the results may be far from the underlying true ones; 2) The nuclear norm may lead to biased estimation of the rank
[13]. To combat these drawbacks, [14] fixes the rank of as a hard constraint, while [13] uses a nonconvex rank approximation to more accurately approximate the rank of , which needs to solve full SVDs. [14, 15] need only to solve partial SVDs, which significantly reduces the complexity compared to computation of full SVD; for example, AltProj has a complexity of [15]. However, if is not known a priori, [14, 15] may fail to recover correctly.To further reduce the complexity, enhance the scalability and alleviate the dependence on the knowledge of , in this paper, we propose a factorizationbased model for RPCA, where can be decomposed as with , , , and . This model relaxes the requirement on a priori knowledge of the rank of , assuming only that it is upper bounded by . Scalable algorithms are developed to optimize our models efficiently. After acceptance of this paper, it came to our attention that [16] also proposed a factorizationbased approach for such a problem. However, our model is distinct from [16]. Our model has two variants, one of which uses an explicit rank constraint by a matrix factorization in the case that the ground truth rank is known, while the other uses nonconvex rank approximation in the case that the ground truth rank is unknown. Both variants of our model differ starkly from [16], which only considers the second case and simply uses the nuclear norm.
We summarize the contributions of this paper as follows: We propose a factorizationbased model for RPCA, allowing the recovery of the lowrank component with or without a priori knowledge of its true rank. In the absence of true rank, a nonconvex rank approximation is adopted, which can approximate the rank of a matrix more accurately than the nuclear norm. Efficient ALMtype optimization algorithms are developed with scalability in both and . This is contrasted to AltProj whose cost is . This difference is important when and are large. Empirically, extensive experiments testify to the effectiveness of our model and algorithms both quantitatively and qualitatively in various applications.
Ii Related Work
The convex RPCA (2) has been thoroughly studied [9]. To exploit the examplewise sparsity of the sparse component, norm has been adopted [17, 18]:
(3) 
where is defined to be the sum of
norms of column vectors of a matrix. When a matrix has large singular values, the nuclear norm may be far from an accurate approximation of the rank. Nonconvex rank approximations have been considered in a number of applications, such as subspace clustering
[19, 20]. A nonconvex rank approximation has also been used in RPCA [13], which replaces the nuclear norm in (3) by a nonconvex rank approximation , with , and being the th largest singular value of . The above approaches usually need to solve SVDs. When the matrix involved is large, the computation of SVD, in general, is intensive. To reduce the complexity of RPCA, several approaches have been attempted. For example, AlgProj [15] uses nonconvex alternating minimization techniques in RPCA and admits complexity. [16] uses a factorization approach:(4) 
which solves SVDs of thin matrices and hence admits scalability, where
is identity matrix with proper size.
Iii Fast Factorizationbased RPCA
In this section, we formulate the Fast Factorizationbased RPCA (FFP). We model the data as , where is a sparse component and is a lowrank approximation of with being the core matrix, , satisfying with being an identity matrix of a proper size. It is seen that the factorization provides a natural upper bound for the rank of the lowrank component of . The upper bound, which is , can be used to relax the stringent requirement on the knowledge of the true rank by AltProj algorithm. To capture the sparse structure of , we adopt norm to obtain sparsity. Thus, we consider the following objective function:
(5) 
For many applications, the true rank of , which is , is known. In the presence of this prior knowledge, we let in 5. However, when this information on is not present, 5 may lack the desired capability of recovering with a (unknown) rank of while an arbitrary used. In this situation, we propose to marry the advantages of the convex and fixedrank RPCA approaches, yielding the following optimization problem:
(6)  
where is a balancing parameter. Even in the case that knowledge of the precise value of is not available, a proper can still be chosen because, in the worst case, we may let . Oftentimes, with domain information on the application at hand, an upper bound can be obtained. It has been shown that the nuclear norm can not approximate the true rank well if there are dominant singular values in a matrix, and nonconvex rank approximations may help improve learning performance [13]. Here, we adopt a logdeterminant rank approximation [20], , to obtain the following RPCA model:
(7)  
Due to the fact that
(8)  
model 7 can be reduced to the following model:
Iv Optimization
The augmented Lagrange functions of (5) and (9) are
(10) 
and
(11)  
respectively. The derivations for optimization are similar to those in [20]. We summarize the optimization in Algorithm 1. Here, we define and to be the left and right singular vectors of a matrix from thin SVD, and define the operator , with
(12) 
with , and .
Iva Complexity Analysis
Given that , both FFFP and UFFP have complexity .
V Experiments
To evaluate the proposed model and algorithms, we consider three important applications: foregroundbackground separation, shadow removal from face images, and anomaly detection. We compare our algorithms with the stateoftheart methods, including IALM
^{1}^{1}1http://perception.csl.illinois.edu/matrixrank/sample_code.html#RPCA. [8] and AltProj^{2}^{2}2http://www.personal.psu.edu/nsa10/codes.html. , both of which make use of the PROPACK package [12] to solve SVDs for efficiency. All experiments in this section are conducted using Matlab on a dualcore Intel Xeon E31240 V2 3.40 GHz Linux Server with 8 GB memory. For purpose of reproductivity, we provide our codes on the website^{3}^{3}3https://www.researchgate.net/publication/308174615_codes_icdm2016.Va Foregroundbackground separation
Foregroundbackground separation is to detect moving objects or interesting activities in a scene, and remove background(s) from a video sequence. For this task, we use 15 datasets, as listed in the first column of Table I, among which the first 11 contain a single background while the rest 4 have 2 backgrounds^{4}^{4}4Datasets used in this subsection can be found at:
http://perception.i2r.astar.edu.sg/bk_model/bk_index.html
http://limu.ait.kyushuu.ac.jp/dataset/en/
http://wordpressjodoin.dmi.usherb.ca/dataset2012/
http://research.microsoft.com/enus/um/people/jckrumm/wallflower/testimages.htm.. In this case, we have and for the first 11 and last 4 datasets, respectively. For each dataset, the data matrix is constructed by treating all vectorized frames as columns^{5}^{5}5For computational ease, downsampling is performed on Camera Parameter, Highway, Office, Shopping Mall, Pedestrian, and Time of Day data sets.. In the following, we test FFFP and UFFP in two cases according to whether is known.
VA1 Case 1 ( is known)
We set for FFFP and AltProj. We terminate all methods after 200 iterations or when is reached. For IALM, we fix and for all these data sets for fast convergence and good visual quality. The balancing parameter is chosen as the theoretical one [8]. For Altproj, the default parameters are used. For FFFP, we use the same and as IALM. Without specification, the parameter settings remain the same throughout this paper.
The numerical results are reported in Table I. It is observed that IALM separates more sparsely but fails to recover with low rank, while FFFP and AltProj recover properly. FFFP generates more sparse than AltProj for most of the datasets. All these methods have competitive fitting error. However, it is possible for FFFP to obtain more accurate fitting if the same iterations or time is provided as IALM or AltProj. Furthermore, FFFP needs the least amount of time on all these datasets.
Data  Method  Rank()  # of Iter.  # of SVDs  Time  
Highway  AltProj  1  0.9331  2.96e4  37  38  49.65 
IALM  539  0.8175  6.02e4  12  13  269.10  
FFFP  1  0.8854  5.74e4  24  24  14.83  
Office  AltProj  1  0.8018  9.40e4  51  52  84.43 
IALM  374  0.7582  9.46e4  11  12  230.53  
FFFP  1  0.8761  5.33e4  24  24  19.92  
PETS2006  AltProj  1  0.8590  5.20e4  35  36  44.64 
IALM  293  0.8649  5.63e4  12  13  144.26  
FFFP  1  0.8675  5.61e4  24  24  14.33  
Shopping Mall  AltProj  1  0.9853  3.91e5  45  46  45.35 
IALM  328  0.8158  9.37e4  11  12  123.99  
FFFP  1  0.9122  7.70e4  23  23  11.65  
Pedestrian  AltProj  1  0.5869  9.32e4  41  42  37.90 
IALM  35  0.8910  5.69e4  11  12  36.18  
FFFP  1  0.6023  9.98e4  22  22  10.53  
Bootstrap  AltProj  1  0.9747  1.17e4  44  45  107.15 
IALM  1146  0.8095  6.27e4  12  13  1182.92  
FFFP  1  0.9288  7.71e4  23  23  25.38  
Water Surface  AltProj  1  0.8890  3.97e4  47  48  27.27 
IALM  224  0.7861  5.32e4  12  13  51.00  
FFFP  1  0.8355  9.91e4  23  23  5.68  
Campus  AltProj  1  0.9790  9.50e5  41  42  54.1 
IALM  488  0.8136  9.30e4  11  12  242.59  
FFFP  1  0.9378  6.26e4  23  23  12.85  
Curtain  AltProj  1  0.8280  7.46e4  40  41  102.41 
IALM  834  0.7398  6.84e4  12  13  747.36  
FFFP  1  0.8680  6.28e4  24  24  27.51  
Fountain  AltProj  1  0.9113  2.91e4  50  51  23.90 
IALM  102  0.8272  4.91e4  12  13  25.62  
FFFP  1  0.8854  4.89e4  24  24  5.00  
Escalator Airport  AltProj  1  0.9152  2.29e4  40  41  110.75 
IALM  957  0.7744  7.76e4  11  12  1,040.91  
FFFP  1  0.8877  5.45e4  23  23  30.78  
Lobby  AltProj  2  0.9243  1.88e4  39  41  47.32 
IALM  223  0.8346  6.19e4  12  13  152.54  
FFFP  2  0.8524  6.42e4  24  24  15.20  
Light Switch2  AltProj  2  0.9050  2.24e4  47  49  87.35 
IALM  591  0.7921  7.93e4  12  13  613.98  
FFFP  2  0.8323  7.54e4  24  24  24.12  
Camera Parameter  AltProj  2  0.8806  5.34e4  47  49  84.99 
IALM  607  0.7750  6.86e4  12  13  433.47  
FFFP  2  0.8684  6.16e4  24  24  22.25  
Time Of Day  AltProj  2  0.8646  4.72e4  44  46  61.63 
IALM  351  0.6990  6.12e4  13  14  265.87  
FFFP  2  0.8441  6.81e4  25  25  18.49 
For IALM and AltProj, (partial) SVDs are for matrices. For FFFP, SVDs are for matrices, which are computationally far less expensive than those required by IALM and AltProj.
VA2 Case 2 ( is unknown)
is specified as a tight upper bound of based on domain knowledge on the video. In this test, we set on all datasets and compare UFFP with AltProj and IALM^{6}^{6}6Here the results of IALM are not shown since they are the same as Case 1.. For UFFP, is chosen from . We show the numerical results in Table II. It is observed that UFFP is still able to recover with the true rank, whereas AltProj fails in this case. Besides, the time cost of UFFP increases slightly by less than 1 second for most of the datasets while the time cost of AltProj increases by about 1020 seconds.
We also show some video frames in Figure 1 and the visual results in Figure 2. It is observed that the backgrounds recovered by IALM still have shadows; AltProj separates foreground and background well with known, but the backgrounds are not clean when is unknown; both FFFP and UFFP can successfully separate foreground and background from the video.
Data  Method  Rank()  # of Iter.  # of SVDs  Time  
Highway  AltProj  5  0.9007  3.66e4  43  48  75.60 
UFFP  1  0.8854  5.75e4  24  24+24+24  18.02  
Office  AltProj  5  0.7159  8.61e4  47  52  98.54 
UFFP  1  0.8761  5.40e4  24  24+24+24  24.40  
PETS2006  AltProj  5  0.8543  6.15e4  39  43  63.33 
UFFP  1  0.8675  5.61e4  24  24+24+24  17.29  
Shopping Mall  AltProj  5  0.9611  9.82e5  41  46  63.34 
UFFP  1  0.9122  7.70e4  23  23+23+23  14.37  
Pedestrian  AltProj  5  0.6202  6.37e4  44  49  58.10 
UFFP  1  0.6714  5.65e4  23  23+23+23  12.40  
Bootstrap  AltProj  5  0.9875  3.02e4  47  52  169.06 
UFFP  1  0.9288  7.70e4  23  23+23+23  31.03  
Water Surface  AltProj  5  0.9090  2.38e4  46  50  33.78 
UFFP  1  0.8355  9.77e4  23  23+23+23  7.27  
Campus  AltProj  5  0.9482  3.18e5  46  51  92.90 
UFFP  1  0.9377  6.26e4  23  23+23+24  16.29  
Curtain  AltProj  5  0.8079  8.82e4  36  39  101.79 
UFFP  1  0.8680  6.28e4  24  24+24  34.33  
Fountain  AltProj  5  0.7435  7.55e4  48  52  32.24 
UFFP  1  0.8852  4.91e4  24  24+24+24  6.26  
Escalator Airport  AltProj  5  0.8474  8.43e4  43  48  162.49 
UFFP  1  0.8877  5.45e4  23  23+23+23  39.70  
Lobby  AltProj  5  0.9176  1.71e4  40  44  61.50 
UFFP  2  0.8523  6.42e4  24  24+24+24  17.91  
Light Switch2  AltProj  5  0.8507  4.29e4  37  41  80.37 
UFFP  2  0.8329  7.57e4  24  24+24+24  28.53  
Camera Parameter  AltProj  5  0.7311  8.34e4  50  55  147.28 
UFFP  2  0.8689  6.26e4  24  24+24+24  26.35  
Time Of Day  AltProj  5  0.8651  4.61e4  46  51  73.35 
UFFP  2  0.8425  7.20e4  25  25+25+25  21.83 
For AltProj, (partial) SVDs are performed on matrices. For UFFP, SVDs are for , , and matrices, which are computationally far less expensive than those required by AltProj.
(1)  (2)  (3) 
VB Shadow removal from face images
Learning face is an important topic in pattern recognition; however, shadows make it more challenging. There are often shadows on face images due to varying lighting conditions
[21]. Therefore, it is crucial to handle shadows, peculiarities and saturations on face images, so as to improve the learning capability on face image data. This can be handled with RPCA, because clean images reside in a lowrank subspace while the shadows correspond to sparse components. Following [13, 8], we use Extended Yale B (EYaleB) dataset [22] for this test. Out of 38 persons in this dataset, we choose the first two subjects. For each subject, there are 64 heavily corrupted images of size taken under varying lighting conditions. Each image is vectorized into a column of a data matrix. It is natural to assume that the face images of the same person reside in a rank one space, hence we have .First, we consider the case where is known. Following [13, 8], IALM, AltProj, and FFFP are applied to each subject and the quantitative and visual results are shown in Table III and Figure 3, respectively. It is observed from Figure 3 that AltProj and FFFP can successfully remove shadows from face images. The majority of shadows can be removed by IALM, but some still remain. From Table III, we can see that AltProj and FFFP can recover the lowrank component with exactly rank 1, while IALM recovers which has a higher rank. Then, we consider the case where is unknown. Following the setting in VA, we set and apply AltProj and UFFP to each subject. The quantitative and visual results are shown in Table IV and Figure 4, respectively. It is observed from Table IV that UFFP has similar visual results as FFFP while AltProj appears unable to remove shadows completely. Quantitatively as shown in Table IV, AltProj gives an that has a higher rank while UFFP gives an that has the true rank. Besides, the proposed UFFP is the fastest among all these methods.
Data  Method  Rank()  # of Iter.  # of SVDs  Time  
Subject 1  AltProj  1  0.9553  8.18e4  50  51  4.62 
IALM  32  0.7745  6.28e4  25  26  2.43  
FFFP  1  0.9655  8.86e4  36  36  1.37  
Subject 2  AltProj  1  0.9755  2.34e4  49  50  5.00 
IALM  31  0.7656  6.47e4  25  26  2.66  
FFFP  1  0.9492  9.48e4  36  36  1.37 
(1)  (2)  (3)  (1)  (2)  (3) 
Data  Method  Rank()  # of Iter.  # of SVDs  Time  
Subject 1  AltProj  5  0.9309  3.93e4  51  55  6.08 
UFFP  5  0.9647  8.40e4  36  36+36+36  1.69  
Subject 2  AltProj  5  0.8903  6.40e4  54  58  7.92 
UFFP  1  0.9651  5.72e4  37  37+37+37  1.74 
(1)  (2)  (1)  (2) 
VC Anomaly Detection
Given a number of images from one subject, they form a lowdimensional subspace. Any image significantly different from the majority of the images can be regarded as an outlier; besides, fewer images from another subject can be regarded as outliers. Anomaly detection is to identify such kinds of outliers. USPS dataset contains 9,298 images of handwritten digits of size . Following [13], among these images, we select the first 190 images of ‘1’s and the last 10 of ‘7’s and construct a data matrix of size by regarding each vectorized image as a column. Since the number of ‘1’s is far greater than ‘7’s, the former is the dominant digit, while the images of the latter are outliers. The true rank of should be 1. Some examples of these selected images are shown in Figure 5. It is observed that besides ‘7’s, some ‘1’s are quite different from the majority. Therefore, anomaly detection, in this case, is to detect not only the ‘7’s, but also the anomaly of ‘1’s. After applying FFFP, the columns in that correspond to anomalies contain relatively larger values. We use norm to measure every column of and show the values in Figure 6. These outliers can be identified by finding the columns with the highest bars. For ease of visualization, we vanish all those values that are smaller than 5 in Figure 6. The corresponding digits are shown in Figure 7 as outliers, which include all ‘7’s and several unusual ‘1’s.
VD Scalability
To numerically illustrate the scalability of FFFP and UFFP, we test the time cost versus the values of and , respectively. To test how the computation time increases with , we uniformly sample a partition of the frames with the sampling rate in and keep all pixels for each frame. To test the relationship of time with respect to , we use all frames and downsample each frame with different rates varying over , to keep the spatial information of each frame. We run FFFP and UFFP with 50 iterations in 10 repeated runs and we report the average time in Figure 8. It is observed that the time cost increases essentially linearly with both and for both FFFP and UFFP.
Vi Conclusion
In this paper, we propose a new, factorizationbased RPCA model. Nonconvex rank approximation is used to minimize the rank when the ground truth is unknown. ALMtype optimization is developed for solving the model. Our model and algorithms admit scalability in both data dimension and sample size, suggesting a potential for real world applications. Extensive experiments testify to the effectiveness and scalability of the proposed model and algorithms both quantitatively and qualitatively.
Acknowledgment
Qiang Cheng is the corresponding author. This work is supported by National Science Foundation under grant IIS1218712, National Natural Science Foundation of China, under grant 11241005, and Shanxi Scholarship Council of China 2015093.
References
 [1] Q. Ke and T. Kanade, “Robust l norm factorization in the presence of outliers and missing data by alternative convex programming,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1. IEEE, 2005, pp. 739–746.
 [2] R. Maronna, D. Martin, and V. Yohai, Robust statistics. John Wiley & Sons, Chichester. ISBN, 2006.
 [3] F. De La Torre and M. J. Black, “A framework for robust subspace learning,” International Journal of Computer Vision, vol. 54, no. 13, pp. 117–142, 2003.

[4]
R. Gnanadesikan and J. R. Kettenring, “Robust estimates, residuals, and outlier detection with multiresponse data,”
Biometrics, pp. 81–124, 1972.  [5] L. Xu and A. L. Yuille, “Robust principal component analysis by selforganizing rules based on statistical physics approach,” Neural Networks, IEEE Transactions on, vol. 6, no. 1, pp. 131–143, 1995.
 [6] C. Croux and G. Haesbroeck, “Principal component analysis based on robust estimators of the covariance or correlation matrix: influence functions and efficiencies,” Biometrika, vol. 87, no. 3, pp. 603–618, 2000.
 [7] J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma, “Robust principal component analysis: Exact recovery of corrupted lowrank matrices via convex optimization,” in Advances in neural information processing systems, 2009, pp. 2080–2088.
 [8] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” Journal of the ACM (JACM), vol. 58, no. 3, p. 11, 2011.
 [9] J.F. Cai, E. J. Candès, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, 2010.
 [10] K.C. Toh and S. Yun, “An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems,” Pacific Journal of Optimization, vol. 6, no. 615640, p. 15, 2010.
 [11] Z. Lin, M. Chen, and Y. Ma, “The augmented lagrange multiplier method for exact recovery of corrupted lowrank matrices,” arXiv preprint arXiv:1009.5055, 2010.
 [12] X. Ding, L. He, and L. Carin, “Bayesian robust principal component analysis,” Image Processing, IEEE Transactions on, vol. 20, no. 12, pp. 3419–3430, 2011.
 [13] Z. Kang, C. Peng, and Q. Cheng, “Robust pca via nonconvex rank approximation,” in Data Mining (ICDM), 2015 IEEE International Conference on. IEEE, 2015, pp. 211–220.
 [14] W. K. Leow, Y. Cheng, L. Zhang, T. Sim, and L. Foo, “Background recovery by fixedrank robust principal component analysis,” in Computer Analysis of Images and Patterns. Springer, 2013, pp. 54–61.
 [15] P. Netrapalli, U. Niranjan, S. Sanghavi, A. Anandkumar, and P. Jain, “Nonconvex robust pca,” in Advances in Neural Information Processing Systems, 2014, pp. 1107–1115.
 [16] G. Liu and S. Yan, “Active subspace: Toward scalable lowrank learning,” Neural computation, vol. 24, no. 12, pp. 3371–3394, 2012.
 [17] H. Xu, C. Caramanis, and S. Sanghavi, “Robust pca via outlier pursuit,” in Advances in Neural Information Processing Systems, 2010, pp. 2496–2504.
 [18] M. McCoy, J. A. Tropp et al., “Two proposals for robust pca using semidefinite programming,” Electronic Journal of Statistics, vol. 5, pp. 1123–1160, 2011.
 [19] C. Peng, Z. Kang, H. Li, and Q. Cheng, “Subspace clustering using logdeterminant rank approximation,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015, pp. 925–934.

[20]
C. Peng, Z. Kang, M. Yang, and Q. Cheng, “Feature selection embedded subspace clustering,”
IEEE Signal Processing Letters, vol. 23, no. 7, pp. 1018–1022, July 2016.  [21] R. Basri and D. W. Jacobs, “Lambertian reflectance and linear subspaces,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 25, no. 2, pp. 218–233, 2003.

[22]
A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, “From few to many: Illumination cone models for face recognition under variable lighting and pose,”
Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 23, no. 6, pp. 643–660, 2001.
Comments
There are no comments yet.