I Introduction
What is PCA? Given a data matrix X ∈ R^{p×n} whose n columns are p-dimensional data vectors, classical PCA can be formulated as learning the projection Q of X in a d-dimensional linear space characterized by an orthonormal basis U (model 1 in Table I). Traditionally, U and Q are termed the principal directions and principal components, and the product UQ is known as the low-rank approximation of X.
Applications of PCA: PCA has been widely used for two important applications.

1. Dimensionality reduction or low-rank recovery.

2. Data clustering using the principal components in the low-dimensional space.
However, the classical PCA formulation is susceptible to errors in the data because of its quadratic loss term. Thus, an outlier in the data might result in erratic principal components, which can affect both the dimensionality reduction and the clustering.
Robust dimensionality reduction using PCA: Candès et al. [4] demonstrated that PCA can be made robust to outliers by exactly recovering the low-rank representation, even from grossly corrupted data, by solving a simple convex problem named Robust PCA (RPCA, model 2 in Table I). In this model, the data corruptions are represented by a sparse matrix S. Extensions of this model for inexact recovery [20] and outlier pursuit [17] have been proposed.
The clustering quality of PCA can be significantly improved by incorporating the data manifold information in the form of a graph [7, 19, 8, 15]. The underlying assumption is that the low-dimensional embedding of the data lies on a smooth manifold [1]. Let G = (V, E, A) be the graph with vertex set V the samples of X, E the set of pairwise edges between the vertices, and A the adjacency matrix which encodes the weights of the edges. Then the normalized graph Laplacian matrix Φ which characterizes the graph is defined as Φ = D^{-1/2} (D - A) D^{-1/2}, where D is the diagonal degree matrix with D_ii = Σ_j A_ij. Assuming that a nearest-neighbors graph is available, several methods to construct A have been proposed in the literature. The three major weighting methods are: 1) binary, 2) heat kernel and 3) correlation distance [8].
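As a concrete illustration of this construction, the sketch below builds a nearest-neighbors heat-kernel graph and its normalized Laplacian in NumPy. The neighborhood size k and the kernel width (set here to the mean pairwise distance) are illustrative choices, not the exact settings used in the experiments.

```python
import numpy as np

def normalized_laplacian(X, k=5):
    """Build a k-nearest-neighbor heat-kernel graph on the columns of X
    (the samples) and return its normalized Laplacian
    Phi = D^{-1/2} (D - A) D^{-1/2} = I - D^{-1/2} A D^{-1/2}."""
    n = X.shape[1]
    # pairwise Euclidean distances between samples
    d = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
    sigma = d[d > 0].mean()                      # illustrative kernel width
    A = np.exp(-(d ** 2) / sigma ** 2)           # heat-kernel weights
    np.fill_diagonal(A, 0.0)
    # keep only the k strongest edges per node, then symmetrize
    mask = np.zeros_like(A, dtype=bool)
    idx = np.argsort(-A, axis=1)[:, :k]
    mask[np.arange(n)[:, None], idx] = True
    A = np.where(mask | mask.T, A, 0.0)
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    return np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
```

The resulting matrix is symmetric positive semi-definite with eigenvalues in [0, 2], the standard properties of a normalized Laplacian.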
In this paper we propose a novel convex method to improve the clustering and low-rank recovery applications of PCA by incorporating spectral graph regularization into the Robust PCA framework. Extensive experiments reveal that our proposed model is more robust to occlusions and missing values than 10 state-of-the-art dimensionality reduction models.
Model  Graph?  Factors?  Convex?
1  PCA  no  yes  no
2  RPCA [4]  no  no  yes
3  PROPOSED  yes  no  yes
4  GLPCA [7]  yes  yes  no
5  RGLPCA [7]  yes  yes  no
6  MMF [19]  yes  yes  no
7  MMMF [15]  yes  yes  no
8  MHMF [8]  yes  yes  no
II Main Idea & Proposed Model
The main contributions of our work are:

1. Exact recovery of the low-rank representation L from grossly corrupted data X.
2. Recovery of the low-rank representation that also reveals the underlying class separation.
3. High cluster purity in the low-dimensional space even when no clean data is available.
4. A simple convex optimization problem with minimal parameters to achieve these objectives.
The figure on page 1 illustrates the main idea of our work.
Without any doubt, contributions 1 and 4 are given by the work of Candès et al. [4], namely Robust PCA (RPCA, model 2 in Table I). Thus, as a first step, we propose that instead of utilizing a classical PCA-like model so as to explicitly learn the principal directions U and principal components Q of the data X, one can directly recover the low-rank matrix L itself (L = UQ).
Secondly, and more importantly, to achieve contributions 2 and 3 we propose: the low-rank matrix L itself lies on a smooth manifold and it can be recovered directly on this manifold. Our proposed model is as follows:

min_{L,S} ||L||_* + λ||S||_1 + γ tr(L Φ L^T)   s.t.   X = L + S,     (1)

where the sparse errors in the data are modeled by S and L is the low-rank approximation of X. The parameters λ and γ control the amount of sparsity of S and the smoothness of L on the graph, respectively. We will define our graph in Section VI-C.
III Related Works: Factorized PCA Models
Both manifold regularization and robustness techniques for PCA have been proposed in the literature, either separately or combined [7, 19, 8, 15]. Following the classical PCA model, they explicitly learn two factors U and Q such that L = UQ. We will therefore refer to these models as factorized PCA models. Furthermore, unlike our model, some of these works [7] assume the graph smoothness of the principal components Q (instead of L).
Jiang et al. proposed Graph Laplacian PCA (GLPCA) [7] (model 4 in Table I), which leverages the graph regularization of the principal components using the term tr(Q Φ Q^T) for clustering in the low-dimensional space. They also proposed a robust version of their model (model 5 in Table I) and demonstrated the robustness of the principal components to occlusions in the data.
Zhang and Zhao [19] proposed Manifold Regularized Matrix Factorization (MMF, model 6 in Table I), which exploits the orthonormality constraint on the principal directions U (contrary to [7]) to acquire a unique low-rank matrix L for any optimal pair (U, Q). In this case we have tr(L Φ L^T) = tr(Q Φ Q^T), therefore this model implicitly assumes the graph smoothness of L. Extensions of this model with an ensemble of graph and hypergraph regularization terms have been proposed by Tao et al. [15] and Jin et al. [8], respectively (models 7 and 8 in Table I).
Shortcomings of the state-of-the-art: Although the models proposed in [7, 19, 8, 15] leverage the graph to learn enhanced class structures, they still suffer from numerous problems. Most of these models are not robust to data corruptions [19, 8, 15]. Those which leverage robustness suffer from non-convexity [7]. An ensemble of graphs or hypergraphs leverages the nonlinearity of data in an effective manner [8, 15]. However, it makes the models non-convex, and the resulting alternating direction methods can get stuck in local minima.
Notation & Terminology: Throughout this article, ||·||_F, ||·||_* and ||·||_1 denote the Frobenius, nuclear and ℓ1 norms, respectively. We will refer to the regularization term tr(Q Φ Q^T) as principal components graph regularization. Note that this graph regularization involves the principal components Q, not the principal directions U. We will also refer to the regularization term tr(L Φ L^T) as low-rank graph regularization. RPCA and our proposed model (models 2 and 3 in Table I), which perform exact low-rank recovery by splitting X = L + S, will be referred to as non-factorized PCA models. A comparison of the various PCA models introduced so far is presented in Table I. Note that only RPCA and our proposed model leverage convexity and enjoy a unique global optimum with guaranteed convergence.
IV Comparison with Related Works
The main differences between our model (1) and the various state-of-the-art factorized PCA models [7, 19, 12, 8, 15] are, as summarized in Table I, the following.
Non-factorized model: Instead of explicitly learning the principal directions U and principal components Q, it learns their product, i.e. the low-rank matrix L = UQ. Hence, (1) is a non-factorized PCA model.
Exact low-rank recovery: Unlike factorized models, we target exact low-rank recovery by modeling the data matrix X as the sum of a low-rank matrix L and a sparse matrix S.
Different graph regularization term: Model (1) is based on the assumption that it is the low-rank matrix L that is smooth on the graph, and not just the principal components matrix Q. Therefore we replace the principal components graph term tr(Q Φ Q^T) with the low-rank graph term tr(L Φ L^T). Note that, as explained in Section III, the two terms are only equivalent if orthonormality of U is further assumed, as in [19, 12, 8, 15] but not in [7].
IV-A Advantages over Factorized PCA Models
Robustness to gross corruptions for clustering & low-rank recovery: The low-rank graph term tr(L Φ L^T) can be more realistic than the principal components graph term tr(Q Φ Q^T). It allows the principal directions U to benefit from the graph regularization as well (recall that L = UQ). Thus, our model enjoys enhanced low-rank recovery and class separation even from grossly corrupted data. For details, please refer to Sections VII & VIII (also see Fig. 13 in the supplementary material).
Convexity: It is a strictly convex problem and a unique global optimum can be obtained by using standard methods like an Alternating Direction Method of Multipliers (ADMM) [2].
One model parameter only: Our model does not require the rank of L to be specified upfront. The nuclear norm relaxation enables the automatic selection of an appropriate rank based on the parameters λ and γ. Furthermore, as illustrated in our experiments, the value λ = 1/√(max(n, p)) proposed in [4] gives very good results. As a result, the only unknown parameter to be selected is γ, and for this we can use methods such as cross-validation (for additional details please see Figs. 16 & 17 in the supplementary material).
V Optimization Solution
We use the ADMM [2] to solve Problem (1). Introducing an auxiliary variable W for the graph term, the problem is rewritten as:

min_{L,S,W} ||L||_* + λ||S||_1 + γ tr(W Φ W^T)   s.t.   X = L + S,  L = W.

Thus, the augmented Lagrangian and iterative scheme are:

L_ρ(L, S, W, Z_1, Z_2) = ||L||_* + λ||S||_1 + γ tr(W Φ W^T) + <Z_1, X - L - S> + (ρ/2)||X - L - S||_F^2 + <Z_2, L - W> + (ρ/2)||L - W||_F^2,

Z_1^{k+1} = Z_1^k + ρ(X - L^{k+1} - S^{k+1}),   Z_2^{k+1} = Z_2^k + ρ(L^{k+1} - W^{k+1}),

where Z_1 and Z_2 are the Lagrange multipliers and k is the iteration index. Let H^k = (X - S^k + W^k + (Z_1^k - Z_2^k)/ρ)/2 and soft_τ(x) = sign(x) max(|x| - τ, 0); then this reduces to the following updates for L, S and W:

L^{k+1} = prox_{(1/2ρ)||·||_*}(H^k),   S^{k+1} = soft_{λ/ρ}(X - L^{k+1} + Z_1^k/ρ),   W^{k+1} = (ρL^{k+1} + Z_2^k)(2γΦ + ρI)^{-1},

where prox_F is the proximity operator of the convex function F as defined in [5]. The details of this solution, algorithm, convergence and computational complexity are given in the supplementary material, Sections A-A, A-B & A-C.
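A minimal NumPy sketch of such an ADMM scheme for the split X = L + S, L = W follows. It is an illustration, not the exact implementation: the penalty ρ is kept fixed, the stopping rule is a fixed iteration count, and the W-step solves its linear system directly rather than by Conjugate Gradient.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: prox of tau*||.||_* at M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ (np.maximum(s - tau, 0.0)[:, None] * Vt)

def soft(M, tau):
    """Elementwise soft thresholding: prox of tau*||.||_1 at M."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rpca_graph(X, Phi, lam, gamma, rho=1.0, n_iter=200):
    """ADMM sketch for min ||L||_* + lam*||S||_1 + gamma*tr(W Phi W^T)
    s.t. X = L + S, L = W.  rho and n_iter are illustrative choices."""
    n = X.shape[1]
    L = np.zeros_like(X); S = np.zeros_like(X); W = np.zeros_like(X)
    Z1 = np.zeros_like(X); Z2 = np.zeros_like(X)
    I = np.eye(n)
    for _ in range(n_iter):
        H = 0.5 * (X - S + W + (Z1 - Z2) / rho)
        L = svt(H, 1.0 / (2.0 * rho))                # nuclear prox
        S = soft(X - L + Z1 / rho, lam / rho)        # l1 prox
        # closed-form W step: W (2*gamma*Phi + rho*I) = rho*L + Z2
        W = np.linalg.solve((2.0 * gamma * Phi + rho * I).T,
                            (rho * L + Z2).T).T
        Z1 = Z1 + rho * (X - L - S)                  # dual ascent
        Z2 = Z2 + rho * (L - W)
    return L, S
```

With Φ = 0 the graph term vanishes and the iteration reduces to a plain RPCA-style splitting, which is a convenient sanity check.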
VI Experimental Setup
We use model (1) to solve two major PCA-based problems: 1) data clustering in the low-dimensional space and 2) low-rank recovery.
Our extensive experimental setup is designed to test the robustness and generalization capability of our model on a wide variety of datasets and corruptions for the above two applications. Precisely, we perform our experiments on 8 benchmark, 3 video and 2 artificial datasets with 10 different types of corruptions, and compare with 10 state-of-the-art dimensionality reduction models, as explained in Sections VI-A & VI-B.
VI-A Setup for Clustering
VI-A1 Datasets
All the datasets are well-known benchmarks. The 6 image databases include CMU PIE (http://vasc.ri.cmu.edu/idb/html/face/), ORL (http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html), YALE (http://vision.ucsd.edu/content/yalefacedatabase), COIL20 (http://www.cs.columbia.edu/CAVE/software/softlib/coil20.php), MNIST (http://yann.lecun.com/exdb/mnist/) and USPS. The MFeat database (https://archive.ics.uci.edu/ml/datasets/Multiple+Features) consists of features extracted from handwritten numerals, and the BCI database (http://olivier.chapelle.cc/sslbook/benchmarks.html) comprises features extracted from a Brain Computer Interface setup. Our choice of datasets is based on their various properties such as pose changes, rotation (for digits), data type and non-negativity, as presented in Table II in the supplementary material.
VI-A2 Comparison with 10 Models
We compare the clustering performance of our model with k-means on the original data and 9 other dimensionality reduction models. These models can be divided into two categories.
1. Models without graph: 1) classical Principal Component Analysis (PCA), 2) Non-negative Matrix Factorization (NMF) [10] and 3) Robust PCA (RPCA) [4].
2. Models with graph: These models can be further divided into two categories based on the graph type. a. Principal components graph: 1) Normalized Cuts (NCuts) [14], 2) Laplacian Eigenmaps (LE) [1], 3) Graph Laplacian PCA (GLPCA) [7], 4) Robust Graph Laplacian PCA (RGLPCA) [7], 5) Manifold Regularized Matrix Factorization (MMF) [19], 6) Graph Regularized Non-negative Matrix Factorization (GNMF) [3]. b. Low-rank graph: our proposed model.
VI-A3 Corruptions in Datasets
Corruptions in image databases: We introduce three types of corruptions in each of the 6 image databases:
1. No corruptions.
2. Fully corrupted data. Two types of corruptions are introduced in all the images of each database: a) block occlusions ranging from 10% to 40% of the image size, and b) missing pixels ranging from 10% to 40% of the total pixels in the image. These corruptions are modeled by placing zeros uniformly at random in the images.
3. Sample-specific corruptions. The above two types of corruptions (occlusions and missing pixels) are introduced in only 25% of the images of each database.
Corruptions in non-image databases: We introduce only full and sample-specific missing values in the non-image databases because block occlusions in non-image data correspond to an unrealistic assumption. Example missing pixels and block occlusions in images are shown in Fig. 10 in the supplementary material.
Preprocessing: For NCuts, LE, PCA, GLPCA, RGLPCA, MMF, RPCA and our model, we preprocess the datasets to zero mean and unit standard deviation along the features. Additionally, for MMF all the samples are made unit norm, as suggested in [19]. For NMF and GNMF we only preprocess the non-negative datasets to unit norm along the samples. We perform preprocessing after introducing the corruptions.
VI-A4 Clustering Metric
We use the clustering error to evaluate the clustering performance of all the models considered in this work. The clustering error is (1/n) Σ_c e_c, where e_c is the number of misclassified samples in cluster c and n is the total number of samples. We report the minimum clustering error from 10 runs of k-means (k = number of classes) on the principal components Q. This procedure reduces the bias introduced by the non-convex nature of k-means. For RPCA and our model we obtain the principal components via the SVD of the low-rank matrix L during the nuclear proximal update in every iteration. For more details of the clustering error evaluation and the parameter selection scheme for each of the models, please refer to Fig. 11 and Table III of the supplementary material.
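A common way to compute such a clustering error is to match predicted clusters to ground-truth classes with a one-to-one (Hungarian) assignment and count the samples that fall outside the matched clusters. The sketch below follows that convention, which may differ in detail from the exact evaluation script used here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_error(labels_true, labels_pred):
    """Fraction of misclassified samples under the best one-to-one
    matching between predicted clusters and ground-truth classes."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    classes = np.unique(labels_true)
    clusters = np.unique(labels_pred)
    # confusion[i, j] = number of samples of class i placed in cluster j
    confusion = np.zeros((len(classes), len(clusters)), dtype=int)
    for i, c in enumerate(classes):
        for j, k in enumerate(clusters):
            confusion[i, j] = np.sum((labels_true == c) & (labels_pred == k))
    # Hungarian matching maximizes the number of correctly matched samples
    row, col = linear_sum_assignment(-confusion)
    correct = confusion[row, col].sum()
    return 1.0 - correct / len(labels_true)
```

Running k-means several times and keeping the minimum of this error, as described above, reduces the sensitivity to k-means initialization.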
VI-B Setup for Low-Rank Recovery
Since the low-rank ground truth for the 8 benchmark datasets used for clustering is not available, we perform the following two types of experiments.
1. Quantitative evaluation of the normalized low-rank reconstruction error using corrupted artificial datasets.
2. Visualization of the recovered low-rank representations for 1) occluded images of the CMU PIE dataset and 2) the static background of 3 different videos (https://sites.google.com/site/backgroundsubtraction/testsequences).
VI-C Normalized Graph Laplacian
In order to construct the graph Laplacian Φ, the pairwise Euclidean distance is computed between each pair of vectorized data samples x_i. Let K be the matrix which contains all the pairwise distances; then K_ij, the distance between x_i and x_j, is computed as follows.
Case I: Block occlusions: K_ij is computed only over the pixels observed in both samples, i.e. using the vector mask corresponding to the intersection of uncorrupted values in x_i and x_j. Thus, we detect the block occlusions and consider only the observed pixels.
Case II: Random missing values: We use a Total Variation denoising procedure and calculate K using the cleaned images.
Let σ be the minimum of all the pairwise distances in K. Then the adjacency matrix A for the graph is constructed using the heat kernel A_ij = exp(-K_ij^2 / σ^2). Finally, the normalized graph Laplacian Φ = D^{-1/2} (D - A) D^{-1/2} is calculated, where D is the degree matrix. Note that different types of data might call for different distance metrics and values of σ; however, for all the experiments and datasets used in this work the Euclidean distance metric and this choice of σ work well. To clarify further for the reader, we present a detailed example of graph construction with corrupted and uncorrupted images in Fig. 12 of the supplementary material.
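Case I above can be sketched as follows. The rescaling by the number of commonly observed pixels is one reasonable normalization for comparing distances across different mask overlaps, an illustrative choice rather than the exact formula used in the experiments.

```python
import numpy as np

def masked_distance(x, y, mx, my):
    """Euclidean distance between two vectorized images restricted to
    pixels observed in BOTH (Case I).  mx and my are boolean masks of
    the observed pixels of x and y, respectively."""
    m = mx & my                       # intersection of observed pixels
    if not m.any():
        return np.inf                 # no common support: no evidence
    d2 = np.sum((x[m] - y[m]) ** 2)
    # rescale so distances stay comparable across different overlaps
    return np.sqrt(d2 * x.size / m.sum())
```

With fully observed masks this reduces to the ordinary Euclidean distance.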
VII Clustering in Low Dimensional Space
Fig. 2 presents experimental results for the first important application of our model, i.e. clustering. Fig. 3 illustrates the principal components for three classes of the occluded CMU PIE data set in 3-dimensional space. Our model outperforms the others in most of the cases, with different types and levels of occlusions (please refer to Tables IV, V & VI in the supplementary material for additional results). In the next few paragraphs we elaborate further on 1) the advantage of graph over non-graph models, and 2) the advantage of the low-rank graph over the principal components graph. Throughout the description of our results, the comparison with RPCA is of specific interest because it is a special case of our proposed model.
VII-A Is the Graph Useful?
Our model performs better than k-means, standard PCA, NMF and RPCA, which do not leverage the graph.
Example case I: CMU PIE database with no pose variation: Consider the case of the CMU PIE dataset in Fig. 2. This dataset does not suffer from pose changes, and we observe that our model attains as low as 0% error when there are no corruptions in the data. Furthermore, we attain the lowest error even with increasing levels of occlusions and missing pixels. This can also be observed visually from Fig. 3, where the principal components for our model are better separated than for the others.
Example case II: COIL20 database with pose variation: Our model outperforms RPCA and the other non-graph models also for the COIL20 database, which suffers from significant pose changes. Thus, even a graph constructed using the simple scheme of Section VI-C enhances the robustness of our model to gross data corruptions and pose changes. Similar conclusions can be drawn for all the other databases as well.
VII-B Low-Rank or Principal Components Graph?
We compare the performance of our model with NCuts, LE, GLPCA, RGLPCA, MMF & GNMF, which use the principal components graph. It is obvious from Fig. 2 that our model outperforms the others, even for the datasets with pose changes. Similar conclusions can be drawn for all the other databases with corruptions, and by visually comparing the principal components of these models in Fig. 3 as well.
Unlike in factorized models, the principal directions in the low-rank graph model benefit from the graph regularization as well and show robustness to corruptions. This leads to better clustering in the low-dimensional space even when the graph is constructed from corrupted data (please refer to Fig. 13 of the supplementary material for further experimental explanation).
VII-C Robustness to Graph Quality
We perform an experiment on the YALE dataset with 35% random missing pixels. Fig. 4 shows the variation in clustering error of different PCA models using a graph constructed with decreasing information about the corruption mask. Our model still outperforms the others even though the quality of the graph deteriorates with corrupted data. It is essential to note that in the worst-case scenario, when the mask is not known at all, our model performs on par with RPCA, but still better than those which use a principal components graph.
VIII Low-Rank Recovery
VIII-A Low-Rank Recovery from Artificial Datasets
To perform a quantitative comparison of exact low-rank recovery between RPCA and our model, we perform experiments on 2 artificial datasets, as in [4]. Note that only RPCA and our model perform exact low-rank recovery, so we do not present results for the other PCA models. We generate low-rank square matrices L = U V^T, for a range of ranks, with U and V independently chosen matrices with i.i.d. Gaussian entries of zero mean and variance 1/n. We introduce 6% to 30% errors in these matrices, with the support of S drawn from an i.i.d. Bernoulli distribution, using two sign schemes.
Random signs: The sign of each corrupted entry is chosen uniformly at random.
Coherent signs: The sign of each corrupted entry is coherent with the sign of the corresponding entry in L.
Fig. 5 compares the variation of the log normalized low-rank reconstruction error with rank (x-axis) and error (y-axis) between RPCA (b and d) and our model (a and c). The larger and darker the blue region, the better the reconstruction. The first two plots correspond to the random sign scheme and the next two to the coherent sign scheme. It can be seen that for each (rank, error) pair the reconstruction error for our model is less than that for RPCA. Hence, the low-rank graph helps in an enhanced low-rank recovery.
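The synthetic data of this section can be reproduced along the following lines; the ±1 magnitude of the corrupted entries in the random-sign scheme is an illustrative choice.

```python
import numpy as np

def make_lowrank_plus_sparse(n, rank, err_frac, coherent=False, seed=0):
    """Generate X = L0 + S0 as in the synthetic experiments:
    L0 = U V^T with U, V n-by-rank i.i.d. N(0, 1/n); the support of S0
    is i.i.d. Bernoulli with probability err_frac."""
    rng = np.random.default_rng(seed)
    U = rng.normal(0.0, np.sqrt(1.0 / n), size=(n, rank))
    V = rng.normal(0.0, np.sqrt(1.0 / n), size=(n, rank))
    L0 = U @ V.T
    support = rng.random((n, n)) < err_frac
    if coherent:
        S0 = support * np.sign(L0)                       # signs agree with L0
    else:
        S0 = support * rng.choice([-1.0, 1.0], size=(n, n))  # random signs
    return L0 + S0, L0, S0
```

Sweeping `rank` and `err_frac` over a grid and recording the normalized reconstruction error ||L - L0||_F / ||L0||_F reproduces the kind of phase plot shown in Fig. 5.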
VIII-B Low-Rank Recovery from Corrupted Faces
We use the PCA models to recover clean low-rank representations from a set of occluded images of the CMU PIE dataset, as shown in Fig. 6. None of the factorized models using the principal components graph is able to perfectly separate the block occlusion from the actual image. This is not surprising because these models (except RGLPCA) are not robust to gross errors. Our model separates the occlusion better than all the other models. Even though it inherits its robustness from RPCA, we observe that the robust recovery of the low-rank representation is greatly enhanced by using the low-rank graph. Please refer to Fig. 14 in the supplementary material for more results.
VIII-C Low-Rank Background Extraction from Videos
Static background separation from the dynamic foreground is an interesting application of low-rank recovery. We use 3 videos (1000 frames each) to recover the static low-rank background. The graph Laplacian is constructed between the frames of the videos without utilizing any prior information about the sparse errors. The low-rank ground truth is not available for these videos, so we present a visual comparison between RPCA and our model for one of the frames in Fig. 7. The RPCA model (middle) is unable to completely remove the person in the middle of the frame and his shadow from the low-rank frame. However, the presence of the graph in our model (right) helps in a better recovery of the static background, which in this case is the empty shopping mall lobby. The pictures are best viewed on a computer screen in the electronic version of this article. Complete videos for our model can be found at https://vid.me/GN0X, https://vid.me/vR6d and https://vid.me/RDgN. For more results please refer to Fig. 15 of the supplementary material.
IX Parameter Selection Scheme for Our Model
There are two parameters for our model: 1) the sparsity penalty λ and 2) the graph regularization penalty γ. The parameter λ can be set approximately equal to 1/√(max(n, p)), where n is the number of data samples and p is the dimension. This simple rule, as suggested by Candès et al. [4], works reasonably well for our clustering & low-rank recovery experiments. After fixing λ, the parameter γ can be easily selected by cross-validation. The minimum clustering error always occurs around this value of λ, irrespective of the size, number of classes and corruptions in the datasets. Due to lack of space we present the detailed results on the variation of clustering & low-rank recovery error over the (λ, γ) grid in Figs. 16 & 17 of the supplementary material.
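The rule of thumb above amounts to a couple of lines; the log-spaced range for the γ grid is an illustrative choice for the cross-validation search.

```python
import numpy as np

def default_parameters(n, p, n_gamma=10):
    """lambda fixed to 1/sqrt(max(n, p)) as suggested in [4];
    gamma searched over a log-spaced cross-validation grid."""
    lam = 1.0 / np.sqrt(max(n, p))
    gammas = np.logspace(-3, 1, n_gamma)   # illustrative search range
    return lam, gammas
```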
X Comparison of Computation Times
We corrupt different sizes of the CMU PIE dataset (n = 300, 600 and 1200) with 20% occlusions and compute the time for one run of each model which gives the minimum clustering error (Fig. 8). Clearly, RGLPCA has the highest computation time, followed by our proposed model. However, the trade-off between clustering error and computational time is worth observing from Figs. 2 and 8 (more details in the supplementary material, Table VII). The large computation time of our model is dominated by the expensive SVD step in every iteration. Our future work will be dedicated to reducing this cost by using randomized algorithms for SVD [16] and exploiting parallel processing capabilities [13].
XI Conclusion
We present 'Robust PCA on Graphs', a generalization of the Robust PCA framework which leverages spectral graph regularization on the low-rank representation. The proposed model targets exact low-rank recovery and enhanced clustering in the low-dimensional space from grossly corrupted datasets by solving a convex optimization problem with minimal parameters. Numerical experiments on several benchmark datasets reveal that clustering using our low-rank graph regularization scheme outperforms various other state-of-the-art factorized PCA-based models which use principal components graph regularization. Experiments on exact low-rank recovery from artificial datasets and static background separation from videos demonstrate its ability to perform an enhanced low-rank recovery. On top of that, it is also robust w.r.t. the graph construction strategy and performs well even when the graph is constructed from corrupted data.
Note: Due to space constraints we present only a few important results in the main text. However, we encourage the reader to see the supplementary material for additional results.
References
 [1] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation, 15(6):1373–1396, 2003.
 [2] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
 [3] D. Cai, X. He, J. Han, and T. S. Huang. Graph regularized nonnegative matrix factorization for data representation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(8):1548–1560, 2011.
 [4] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):11, 2011.
 [5] P. L. Combettes and J.C. Pesquet. Proximal splitting methods in signal processing. In Fixedpoint algorithms for inverse problems in science and engineering, pages 185–212. Springer, 2011.
 [6] D. Goldfarb and S. Ma. Convergence of fixedpoint continuation algorithms for matrix rank minimization. Foundations of Computational Mathematics, 11(2):183–210, 2011.
 [7] B. Jiang, C. Ding, and J. Tang. Graphlaplacian pca: Closedform solution and robustness. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 3492–3498. IEEE, 2013.
 [8] T. Jin, J. Yu, J. You, K. Zeng, C. Li, and Z. Yu. Lowrank matrix factorization with multiple hypergraph regularizers. Pattern Recognition, 2014.
 [9] S. Kontogiorgis and R. R. Meyer. A variablepenalty alternating directions method for convex optimization. Mathematical Programming, 83(13):29–53, 1998.
 [10] D. D. Lee and H. S. Seung. Learning the parts of objects by nonnegative matrix factorization. Nature, 401(6755):788–791, 1999.
 [11] P.L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis, 16(6):964–979, 1979.
 [12] X. Lu, Y. Wang, and Y. Yuan. Graphregularized lowrank representation for destriping of hyperspectral images. IEEE transactions on geoscience and remote sensing, 51(7):4009–4018, 2013.
 [13] A. Lucas, M. Stalzer, and J. Feo. Parallel implementation of fast randomized algorithms for low rank matrix decomposition. Parallel Processing Letters, 24(01), 2014.
 [14] J. Shi and J. Malik. Normalized cuts and image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(8):888–905, 2000.
 [15] L. Tao, H. H. Ip, Y. Wang, and X. Shu. Low rank approximation with sparse integration of multiple manifolds for data representation. Applied Intelligence, pages 1–17, 2014.
 [16] R. Witten and E. Candes. Randomized algorithms for lowrank matrix factorizations: sharp performance bounds. Algorithmica, pages 1–18, 2013.
 [17] H. Xu, C. Caramanis, and S. Sanghavi. Robust pca via outlier pursuit. In Advances in Neural Information Processing Systems, pages 2496–2504, 2010.
 [18] X. Yuan and J. Yang. Sparse and lowrank matrix decomposition via alternating direction methods. preprint, 2009.
 [19] Z. Zhang and K. Zhao. Lowrank matrix approximation with manifold regularization. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 35(7):1717–1729, 2013.
 [20] Z. Zhou, X. Li, J. Wright, E. Candes, and Y. Ma. Stable principal component pursuit. In Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on, pages 1518–1522, June 2010.
Appendix A Appendices
A-A Details of ADMM Solution
We use the Alternating Direction Method of Multipliers (ADMM) [2]. First, problem (1) is split by introducing an auxiliary variable W:

min_{L,S,W} ||L||_* + λ||S||_1 + γ tr(W Φ W^T)   s.t.   X = L + S,  L = W.
Second, the augmented Lagrangian and iterative scheme are introduced:

(2)  L_ρ(L, S, W, Z_1, Z_2) = ||L||_* + λ||S||_1 + γ tr(W Φ W^T) + <Z_1, X - L - S> + (ρ/2)||X - L - S||_F^2 + <Z_2, L - W> + (ρ/2)||L - W||_F^2,

(3)  (L^{k+1}, S^{k+1}, W^{k+1}) = argmin_{L,S,W} L_ρ(L, S, W, Z_1^k, Z_2^k),

(4)  Z_1^{k+1} = Z_1^k + ρ(X - L^{k+1} - S^{k+1}),   Z_2^{k+1} = Z_2^k + ρ(L^{k+1} - W^{k+1}),

where Z_1 and Z_2 are the Lagrange multipliers (dual variables) and k is the iteration index.
In order to update the primal variables L, S and W, we consider the definition of the proximity operator [5]. Let F ∈ Γ_0, where Γ_0 is the class of lower semi-continuous convex functions with non-empty domain. Then, for every Y, the minimization problem min_Z F(Z) + (1/2)||Z - Y||_F^2 admits a unique solution, which is known as the proximity operator of F, denoted by prox_F(Y).
Using this definition, the updates for L, S and W at iteration k+1 can be expressed via proximity operators.
Update L:
Keeping only the terms with L in Eq. (2), we get

L^{k+1} = argmin_L ||L||_* + ρ||L - H^k||_F^2,

where H^k = (X - S^k + W^k + (Z_1^k - Z_2^k)/ρ)/2. Let soft_τ(x) = sign(x) max(|x| - τ, 0) denote the elementwise soft-thresholding operator; then we can define SVT_τ(M) = U soft_τ(Σ) V^T as the singular value thresholding operator for a matrix M, where M = U Σ V^T is any singular value decomposition of M. Then

(5)  L^{k+1} = SVT_{1/(2ρ)}(H^k).
Update S:
Following a similar procedure, we can write the update for S as

(6)  S^{k+1} = soft_{λ/ρ}(X - L^{k+1} + Z_1^k/ρ).
Update W:
By keeping only the terms with W in Eq. (2), we get

W^{k+1} = argmin_W γ tr(W Φ W^T) + <Z_2^k, L^{k+1} - W> + (ρ/2)||L^{k+1} - W||_F^2,

which is a smooth function in W, so we can use the optimality condition to find a closed-form solution for W:

(7)  W^{k+1} (2γΦ + ρI) = ρL^{k+1} + Z_2^k.

A Projected Conjugate Gradient method was used to solve this linear system for W.
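For completeness, the two proximity operators behind updates (5) and (6) are easy to write down; the sketch below also brute-force checks the scalar prox property of soft-thresholding on a grid.

```python
import numpy as np

def soft(M, tau):
    """Elementwise soft-thresholding: prox of tau*||.||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svt(M, tau):
    """Singular value thresholding: prox of tau*||.||_*,
    obtained by soft-thresholding the singular values of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ (soft(s, tau)[:, None] * Vt)

# brute-force check of the scalar prox property:
# soft(y, tau) minimizes tau*|x| + 0.5*(x - y)^2 over x
y, tau = 1.3, 0.5
xs = np.linspace(-3.0, 3.0, 100001)
obj = tau * np.abs(xs) + 0.5 * (xs - y) ** 2
assert abs(xs[np.argmin(obj)] - soft(np.array([y]), tau)[0]) < 1e-3
```

On a diagonal matrix, svt simply shrinks the diagonal entries toward zero, which makes its rank-reducing effect easy to see.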
A-B Algorithm
A-C Convergence Analysis & Computational Complexity
Algorithm 1 is a special case of the Alternating Directions method [18], [2]. These methods are a subset of the more general class of methods known as Augmented Lagrange Multiplier methods. The convergence of these algorithms is well-studied [2], [11], [9]. This algorithm has been reported to perform reasonably well on a wide range of problems, and a small number of iterations is enough to achieve a good accuracy [4]. The dominant cost of each iteration corresponds to the computation of the nuclear proximal operator, which requires an SVD of L; the Conjugate Gradient update of W is comparatively cheap. Our future work will be dedicated to reducing this cost by utilizing a partial SVD or an approximate SVD, as suggested in [6]. Further improvements can be made by using randomized algorithms for SVD [16] and exploiting parallel processing capabilities [13]. Please refer to Section A-L for a detailed comparison of the computation time of this algorithm with the other models considered in this work.
A-D Properties of Various Models & Datasets
Image / non-image  Data Type  Datasets  Evaluated Models
Image  Faces  CMU PIE (no pose changes), ORL (pose changes), YALE (facial expressions)  all
Image  Objects  COIL20 (pose changes)  all
Image  Digits  MNIST (rotation), USPS  all
Non-image  Features  BCI (Brain Computer Interface), MFeat (handwritten numerals)  all except NMF & GNMF
A-E Evaluation Scheme and Parameter Selection for all Models
Model  Objective  Constraints  Parameters  Parameter Range  no. of runs for each  
parameter value  
1  NCut [14]  
2  LE [1]  10  
3  PCA  (nonconvex)  
4  GLPCA [7]  
5  RGLPCA [7]  using transformation in [7]  10  
(nonconvex)  
6  MMF [19]  
7  NMF [10]  10  
8  GNMF [3]  (nonconvex)  
9  RPCA [4]  1  
10  Our model  (convex) 
A-F Adjacency Matrix Construction from Corrupted & Uncorrupted Data
A-G Clustering Results on all Databases
Data Set  Model  No  Sample specific corruptions  Full Corruptions  
Corruptions  Occlusions  Missing values  Occlusions  Missing  
(25% of data)  (25% of data)  (% of image size)  (% of image pixels)  
25% image size  25% image pixels  15%  25%  40%  15%  25%  35%  50%  
CMU PIE ^{1}^{1}footnotemark: 1  kmeans  72.2  72.5  73.4  78.5  79.4  81.0  72.3  76.1  75.7  80.3 
(faces)  NCuts  41.7  36.0  40.0  42.0  40.0  44.0  35.0  41.3  40.7  46.7 
LE  41.7  40.0  43.3  49.3  54.0  51.3  46.3  46.0  48.3  55.0  
PCA  13.0  25.7  34.0  42.7  45.0  61.0  35.0  50.0  56.3  63.7  
GLPCA  29.8  27.7  38.3  38.0  40.2  41.3  39.3  37.5  42.7  48.2  
RGLPCA  28.0  28.3  31.7  35.0  41.3  39.0  34.3  33.7  28.7  42.0  
MMF  53.7  50.0  55.0  54.0  58.0  60.7  59.7  60.3  60.3  62.7  
NMF  49.0  68.0  70.3  76.0  80.0  79.3  49.0  76.0  80.0  79.3  
GNMF  52.7  66.0  68.3  71.3  75.0  75.3  52.7  71.3  75.0  75.3  
RPCA  3.3  4.3  11.0  6.7  13.3  23.3  9.7  8.3  17.0  37.3  
Our model  0.0  4.7  6.9  5.0  7.7  17.7  7.7  7.5  15.7  34.0  
ORL^{2}^{2}footnotemark: 2  kmeans  35.4  57.6  50.8  64.3  68.4  77.5  36.0  42.4  49.4  57.1 
(faces)  NCuts  47.0  35.7  43.0  42.3  53.0  61.0  49.7  45.7  42.7  46.7 
LE  45.3  38.0  34.7  39.0  45.3  54.3  42.7  45.0  44.0  42.7  
PCA  28.0  46.0  37.3  53.7  65.0  72.6  34.3  39.0  40.3  40.0  
GLPCA  27.6  28.3  29.7  33.6  34.1  44.3  30.3  27.9  27.6  33.3  
RGLPCA  28.3  28.7  29.0  28.3  36.3  37.3  26.0  29.3  26.7  26.3  
MMF  20.3  29.7  24.3  30.0  34.7  38.3  24.7  28.3  28.3  27.3  
NMF  29.0  31.7  27.7  78.0  79.0  81.3  29.7  39.3  50.3  53.3  
GNMF  22.7  29.0  25.3  35.0  37.0  39.3  25.3  26.7  28.0  26.7  
RPCA  18.6  27.7  20.3  26.0  35.0  71.4  24.7  26.0  27.0  36.0  
Our model  15.7  20.0  14.3  23.7  24.7  31.7  20.0  18.3  20.3  21.0  
YALE  kmeans  53.4  54.9  64.5  71.6  73.4  73.1  54.3  56.4  58.9  63.5 
(faces)  NCuts  54.5  57.6  64.4  66.7  64.2  68.5  56.4  56.9  61.2  61.8 
LE  57.6  57.0  61.8  63.6  68.5  69.1  57.8  59.4  60.6  66.7  
PCA  53.9  49.7  58.8  61.8  66.1  70.9  55.7  55.1  62.4  61.2  
GLPCA  49.1  50.9  53.9  54.5  61.2  58.2  50.9  52.7  48.5  56.9  
RGLPCA  48.5  50.0  50.9  54.5  55.2  58.8  50.3  53.9  49.7  50.9  
MMF  38.8  37.6  46.7  55.7  55.2  53.9  38.2  38.2  44.2  49.1  
NMF  63.0  63.6  70.3  72.1  71.5  72.1  64.8  61.2  61.8  66.7  
GNMF  56.9  55.8  57.6  60.6  64.8  60.6  56.9  56.9  58.2  59.4  
RPCA  39.4  40.6  45.5  61.8  67.9  63.0  42.4  39.4  43.6  63.6  
Our model  35.1  35.8  40.0  43.6  46.1  50.9  35.2  35.8  39.4  50.3  
COIL20  kmeans  32.0  55.2  30.2  34.9  45.8  57.8  36.3  34.8  37.8  42.8 
(objects)  NCuts  44.5  30.5  39.5  38.5  38.5  43.5  38.0  50.0  48.0  47.5 
LE  38.0  31.5  27.0  34.5  36.5  40.0  31.0  37.5  40.0  39.0  
PCA  30.5  43.5  28.5  28.0  37.0  50.0  24.0  30.0  33.5  31.0  
GLPCA  20.5  22.0  17.0  25.0  25.5  25.3  21.7  21.5  21.5  23.3  
RGLPCA  18.5  23.5  17.5  20.5  22.0  23.5  22.0  20.0  22.0  21.0  
MMF  18.0  19.0  11.5  19.0  18.0  25.0  19.0  18.0  19.0  18.5  
NMF  20.0  30.0  19.0  32.5  47.0  60.5  17.5  26.0  24.0  26.0  
GNMF  15.5  18.5  9.5  20.0  19.5  30.0  14.5  13.5  11.0  17.0  
RPCA  19.0  33.0  13.5  19.0  28.0  50.5  15.0  16.0  16.5  27.0  
Our model  15.5  18.0  9.0  18.0  17.5  19.5  8.5  12.0  16.5  15.0 
Data Set  Model  No Corruptions  Sample-specific corruptions (25% of data)  Full Corruptions  
    Occlusions (25% of image size)  Missing values (25% of image pixels)  Occlusions (% of image size)  Missing values (% of image pixels)  
    15%  25%  40%  15%  25%  35%  50%  
MNIST  kmeans  62.2  71.2  59.6  68.2  79.2  82.5  62.0  60.5  60.6  65.9 
(digits)  NCuts  69.3  86.7  59.0  85.3  87.3  86.0  77.7  78.7  83.7  86.7 
LE  71.0  88.0  55.3  88.0  89.3  89.0  68.7  55.3  86.3  89.3  
PCA  52.0  60.0  51.3  68.3  77.0  77.3  46.0  57.3  61.0  57.0  
GLPCA  49.3  63.7  44.7  59.3  71.7  79.0  45.3  51.7  56.3  57.1  
RGLPCA  46.3  64.7  51.0  55.3  65.3  73.7  52.7  53.0  53.7  44.3  
MMF  45.7  65.7  45.0  52.7  65.3  77.3  52.7  44.7  53.3  55.7  
NMF  46.3  67.3  33.7  60.0  72.3  72.0  52.0  47.7  49.7  47.0  
GNMF  49.7  72.7  35.3  51.7  71.0  83.7  50.7  42.7  44.7  40.7  
RPCA  39.0  53.3  43.0  64.7  73.7  78.0  44.7  44.7  52.7  70.7  
Our model  31.7  46.0  32.3  53.0  62.7  69.7  29.7  33.7  35.3  37.7  
USPS  kmeans  51.0  45.3  54.0  55.4  66.2  73.8  41.2  41.7  40.1  38.9 
(digits)  NCuts  54.0  49.3  61.3  68.3  72.7  75.7  53.0  47.7  55.7  55.3 
LE  64.7  52.7  67.7  74.3  81.7  83.0  65.0  71.7  66.0  63.3  
PCA  49.7  42.0  50.3  55.0  66.0  71.7  43.3  36.0  39.3  48.0  
GLPCA  45.8  26.7  45.0  46.9  56.7  68.9  39.7  38.3  33.5  32.5  
RGLPCA  42.0  33.3  46.7  45.0  59.0  63.7  33.3  26.7  35.7  37.0  
MMF  33.0  22.3  40.7  37.7  50.3  62.6  21.5  21.6  20.7  20.7  
RPCA  28.3  29.0  46.7  56.0  67.7  69.7  29.0  32.3  32.6  38.0  
Our model  21.3  21.7  35.3  42.0  48.0  56.0  21.5  21.3  20.5  27.3 
Data Set  Model  No Corruptions  Sample-specific corruptions (25% of data)  Full Corruptions  
    Missing values (25% of features per sample)  Missing values (% of features per sample)  
    15%  25%  35%  50%  
MFeat  kmeans  32.4  30.8  25.5  29.9  32.2  33.9 
NCuts  39.8  47.0  26.3  28.3  33.5  37.8  
LE  38.0  47.3  35.3  33.0  33.5  50.8  
PCA  7.3  7.3  10.0  11.8  9.0  13.8  
GLPCA  6.0  12.5  6.0  7.8  8.0  10.0  
RGLPCA  9.5  15.0  15.0  17.3  8.3  14.8  
MMF  7.3  6.3  6.0  5.3  7.0  7.0  
RPCA  5.0  4.3  3.8  6.3  9.5  15.0  
Our model  3.3  2.0  3.5  4.3  6.8  7.0  
BCI  kmeans  47.8  47.2  47.8  47.8  48.0  48.2 
NCuts  47.0  47.0  47.3  47.5  47.3  48.3  
LE  46.5  49.7  49.8  49.8  49.8  45.8  
PCA  45.3  45.2  42.0  43.0  43.0  45.5  
GLPCA  46.5  46.0  45.8  45.8  46.5  45.0  
RGLPCA  45.5  43.7  46.5  44.3  43.8  42.8  
MMF  47.25  47.2  47.0  47.5  47.3  48.3  
RPCA  39.8  43.0  40.0  42.8  43.5  43.0  
Our model  40.3  37.7  39.0  42.3  42.0  41.0 
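The clustering error reported throughout these tables is, under the usual convention, the percentage of misassigned samples once predicted clusters are optimally matched to the ground-truth classes. A brute-force sketch of that metric, assuming the number of clusters equals the number of classes (this is our reading of the convention, not the authors' exact evaluation code):

```python
import itertools
import numpy as np

def clustering_error(labels_true, labels_pred):
    """Clustering error (%): fraction of misassigned samples under the
    best one-to-one matching of predicted clusters to true classes.

    Brute-force over permutations, which is fine for the small numbers
    of clusters used in these experiments.
    """
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    classes = np.unique(labels_true)
    clusters = np.unique(labels_pred)
    best_correct = 0
    for perm in itertools.permutations(classes):
        # Map the i-th predicted cluster to the i-th class in `perm`.
        mapping = dict(zip(clusters, perm))
        mapped = np.array([mapping[c] for c in labels_pred])
        best_correct = max(best_correct, int((mapped == labels_true).sum()))
    return 100.0 * (1.0 - best_correct / labels_true.size)
```

For many classes, the permutation search would be replaced by the Hungarian algorithm, but the resulting score is identical.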
AH A Comparison of the Principal Directions Learned with Low-Rank and Principal Components Graph
AI Additional Low-Rank Recovery Results on CMU PIE Dataset
AJ Additional Results for Background Separation from Videos
AK Parameter Selection for the Proposed Model
AK1 Parameter Selection for Clustering
AK2 Parameter Selection for Low-Rank Recovery
AL Comparison of Computation Times
Model  n = 300  n = 600  n = 1200  
Time (secs)  Clustering error (%)  Time (secs)  Clustering error (%)  Time (secs)  Clustering error (%)  
1  NCuts  0.24  47.2  0.72  48.3  1.31  49.0 
2  LE  0.24  48.1  0.70  50.2  1.24  52.7 
3  PCA  0.11  34.3  0.17  35.9  0.20  38.3 
4  GLPCA  0.12  37.5  0.43  36.1  1.60  37.2 
5  RGLPCA  150.4  34.8  356.6  35.3  1187.6  37.2 
6  MMF  0.13  36.7  0.32  37.5  1.52  38.9 
7  NMF  0.15  43.9  0.62  45.6  1.30  47.4 
8  GNMF  1.20  41.2  0.62  40.3  1.30  45.9 
9  RPCA  59.8  10.5  159.3  12.6  678.3  14.9 
10  Our model  69.7  8.4  169.6  10.9  869.8  13.8 