Fast and Robust Fixed-Rank Matrix Recovery

03/10/2015 ∙ by German Ros, et al. ∙ Universitat Autònoma de Barcelona 0

We address the problem of efficient sparse fixed-rank (S-FR) matrix decomposition, i.e., splitting a corrupted matrix M into an uncorrupted matrix L of rank r and a sparse matrix of outliers S. Fixed-rank constraints are usually imposed by the physical restrictions of the system under study. Here we propose a method to perform accurate and very efficient S-FR decomposition that is more suitable for large-scale problems than existing approaches. Our method is a grateful combination of geometrical and algebraical techniques, which avoids the bottleneck caused by the Truncated SVD (TSVD). Instead, a polar factorization is used to exploit the manifold structure of fixed-rank problems as the product of two Stiefel and an SPD manifold, leading to a better convergence and stability. Then, closed-form projectors help to speed up each iteration of the method. We introduce a novel and fast projector for the SPD manifold and a proof of its validity. Further acceleration is achieved using a Nystrom scheme. Extensive experiments with synthetic and real data in the context of robust photometric stereo and spectral clustering show that our proposals outperform the state of the art.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 5

page 6

page 7

Code Repositories

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Systems with fixed-rank constraints exist in many applications within the fields of computer vision, machine learning and signal processing. Some examples are: photometric stereo, where depth is estimated from a still camera that acquires images of an object under different illumination conditions, leading to a rank constraint; motion estimation, where the type of motion of the objects defines a rank.

This paper addresses the problem of efficient sparse fixed-rank (S-FR) matrix decomposition, i.e.: given a matrix affected by outliers, this is, gross noise of unknown magnitude, we aim to recover an uncorrupted matrix and a sparse matrix such that and , with known beforehand, as defined in (1),

(1)

S-FR matrix recovery is intimately related to the sparse low-rank (S-LR) recovery problem (2

), for which algorithms such as Robust Principal Component Analysis (RPCA) 

[3] and Principal Component Pursuit (PCP) [30] are well known due to their extraordinary capabilities to solve it and their application to a wide range of problems.

(2)

Robust S-FR recovery might seem a simpler case of S-LR decomposition, or even a straightforward derivation. However, S-FR recovery is a hard problem that involves a highly non-convex constraint due to the rank imposition. This factor is not present in the S-LR decomposition problem due to the nuclear norm relaxation. Therefore, a careful design is needed in order to produce a stable S-FR decomposition method with a good convergence rate.

In addition to the convergence speed, achieving efficient and scalable S-FR decompositions requires algorithms with very low computational complexity per iteration. The main bottleneck of these algorithms is the enforcement of the correct rank or its minimization, a step that usually requires the use of a TSVD or an SVD per iteration, which complexity is for a matrix of rank . How to reduce this bottleneck is a line of research that has been recently targeted by several works such as [9][23][24], showing interesting ideas leading to algorithms with quadratic and linear complexities with respect to the input matrix size. The key lessons to learn from these works are two: (i) the factorization of large-scale problems into products of small size matrices [16]; and (ii) the use of a sub-sampled version of the input matrix to produce fast and accurate approximations of the solution [24].

Our work has been influenced by these concepts and several ideas drawn from state-of-the-art differential geometry techniques. We have experimented with the mentioned concepts and improved upon them in order to create an efficient and precise S-FR decomposition algorithm suitable for large scale problems. In this respect we present the following contributions: (i) an optimization method, named FR-ADM111Code is available at https://github.com/germanRos/FRADM (Fixed-Rank Alternating Direction Method), that solves S-FR problems following an ADM scheme to minimize an Augmented Lagrangian cost function; (ii) a novel procedure to impose fixed-rank constraints through a very efficient polar factorization, named FixedRankOptStep, which is superior in convergence, stability and speed than the bilinear counterparts used by state-of-the-art methods and; (iii) the use of a simple projector to impose SPD constraints efficiently along with a novel proof of its validity.

We show that our method, based on the FixedRankOptStep procedure, outperforms in time, accuracy and region of applicability current state-of-the-art methods. Furthermore, we also show that our proposal FR-ADM can benefit from Nystrom’s method [25] to improve its computational efficiency while maintaining a good level of accuracy. These results are supported by thorough experimentation in synthetic and real cases, as detailed in Sec. 5.

2 Summary of Notation

Capital letters, such as

represent matrices, while vectors are written in lower-case.

stands for the matrix transpose, for its pseudo-inverse and is the matrix trace operator. stands for the

-th largest singular value of a given matrix. The indexation of the

-th row and the -th column is defined as . Matrix sub-blocks of are referred to as to index from row to and column to . is the Frobenius norm and are the entry-wise -norm and the matrix nuclear norm, respectively. and are the square and the rectangular identity matrices. , is the Stiefel manifold of matrices with . and stand for the Symmetric (Semi-)Positive Definite matrices, respectively. is the fixed-rank manifold of matrices with and is the set of full-rank matrices. stands for the Orthogonal group, but be careful, since is also used to describe the complexity of algorithms in big-O notation. We also make use of some proximity operators and projectors defined as: , the symmetric part of M. for the standard soft-thresholding (promotes sparsity); for the projector onto the Stiefel manifold, and for the projector onto the SPD manifold (these are defined in Sec. 4.1).

3 Related Work

The Accelerated Proximal Gradient (APG) [13] will serve as our starting point within the plethora of methods present in the literature. This method, although it is not the first one proposed to solve the RPCA problem (see for instance FISTA [2]), is an appealing combination of concepts. It approximates the gradients of a given cost function to simplify its optimization and improves its convergence. It also includes Nesterov updates [20] and the critical continuation scheme [13], which all together lead to a method with sub-linear convergence , where is the number of iterations. Its computational complexity per iteration is for matrices. Afterwards, authors of [12] proposed the Augmented Lagrangian Multiplier method (ALM) in two flavours. First they present the exact ALM (eALM), which uses an Alternating Direction Method (ADM) to minimize an Augmented Lagrangian function in a traditional and exact way. Then, an inexact version is also proposed (iALM), which approximates the original algorithm to reduce the number of times the SVD is used. The convergence rate of eALM depends on the update of , the penalty parameter of the Augmented Lagrangian. When the sequence grows geometrically following the continuation principle, eALM is proven to converge Q-linearly . For iALM, there is not proof of convergence, but it is supposed to be Q-linear too. Both methods have computational complexity of per iteration.

Recently, ALM was extended in [14], which included a factorization technique along with a TSVD from the PROPACK suite [11] to achieve a complexity of per iteration. The bottleneck caused by the TSVD has also been addressed via random projections, leading to the efficient Randomized-TSVD (R-TSVD) [9]. However, although being more efficient than the regular TSVD, results are considerably less accurate. The idea of including a factorization of the data was then improved by LMAFIT [23], which uses a bi-linear factorization to produce two skinny matrices and , such that , to speed up the process. A similar concept was used in the Active Subspace method (AS) [16], but in this case the bi-linear factorization is given by and , such that . This formulation turns out to be very useful when , leading to a complexity per iteration of . Unfortunately, is an upper bound for the actual rank of and needs to be given by the user. This is not suitable for S-LR scenarios, but fits perfectly in the S-FR framework. Another point to highlight about LMAFIT and AS is the utilization of closed-form projectors to impose constraints like orthogonality, low-rank and sparsity. This algebraical way of optimizing functions differs from the geometrical counterparts in the literature on manifold optimization (see [1][18]). Substituting all the required machinery to perform differential geometry (e.g., retractions, lift maps, etc.) by projectors seems a good idea from the point of view of efficiency. However, this method is not absent of problems. The factorization in AS is highly non-convex, an issue that influences the number of iterations required for the convergence of the method, which is notably higher than eALM, despite having the same theoretical convergence rate .

One of the contributions of our work is to improve the convergence of fixed-rank projection methods. To this end we employ a polar decomposition as in [18]. This polar decomposition offers us the possibility of exploiting the manifold structure of fixed-rank problems as the product of two Stiefel and an SPD manifold. . However, we deviate from [18] to propose more efficient expressions that make use of projectors to speed up the process, giving rise to a better convergence. We also consider worth highlighting a key tool described in the recent work [24]. There, the authors follow a strategy that resembles the one described in [16], but they add a sub-sampling step based on the Nystrom’s method [25] that leads to a linear complexity per iteration. We borrow this idea to further speed up our optimization.

4 Sparse Fixed-Rank Decomposition

We propose the resolution of the non-convex program in (3) as a direct way to perform the sparse and fixed-rank decomposition—note that (3) is equivalent to the program (1) defined in Sec.1.

(3)

The optimization of (3) is carried out over an Augmented Lagrangian function, leading to (4). stands for the Lagrange multiplier and represents the fixed-rank manifold of rank .

(4)

To efficiently solve (4) we utilize an ADM scheme [12] endowed with a continuation step, as presented in Algorithm 1. The update of the fixed-rank matrix is obtained via the FixedRankOptStep algorithm that implements the proposed polar factorization. For the sparse matrix , the standard soft-thresholding is used. Notice that is updated following a geometric series (Alg.1.#9) in order to achieve a Q-linear converge rate  [12]. Despite having the same asymptotic convergence order as LMAFIT, AS and ROSL, our method FR-ADM takes less iterations to converge, due to the accuracy of the novel FixedRankOptStep. This is especially important for the cases where the magnitude of the entries of are similar to those of , a challenging situation that other state-of-the-art methods fail to address correctly. We provide empirical validation for this claim in Sec. 5.

0:  Data matrix , (rank)
1:  
2:  
3:  while not converged do
4:     
5:                
6:     
7:                
8:     
9:     
10:     
11:  end while
12:  return  , .
Algorithm 1 FR-ADM

An adapted version of FR-ADM, referred as FR-Nys, is also provided. FR-Nys exploits Nystrom’s subsampling method ([25][24]), to further speed up computations. This method is presented in Algorithm 2 and follows the recipe given in [24].

0:  Data matrix , (rank)
1:  
2:  ,  for
3:  ,   for
4:  ,  
5:  return  , .
Algorithm 2 FR-Nys

Nystrom’s scheme proceeds by randomly shuffling the rows of producing . Then, the top and the left blocks of are processed separately by using FR-ADM (Alg.1). Notice that these blocks are chosen of size and , where has to be a number larger than the expected matrix rank. In our case . Finally the independently recovered matrices and are combined to produce and .

4.1 Polar Factorization on the Fixed-Rank Manifold

Imposing rank constraints requires an efficient way of computing the projection of an arbitrary matrix with arbitrary rank onto the fixed-rank manifold . A simple solution is provided by the Eckartââ‚-¬â€œYoung theorem [5], which shows that the optimization problem (5):

(5)

is solved by the truncated SVD (TSVD) of . Despite the success of the TSVD as a tool for producing low-rank approximations, and the many available improvements, as for instance the usage of random projections[9], some problems require the computation of many TSVDs (typically one per iteration) of very large matrices. Thus an efficient alternative to the usual TSVD algorithm is required.

In this section we propose the method FixedRankOptStep (Algorithm 3), which computes a fast approximate solution to the projection of a matrix onto the fixed-rank manifold, like the given by the TSVD but much faster. Additionally, in the Appendix we also propose the method FixedRankOptFull (Algorithm 4) that can be seen as a series of iterations of the FixedRankOptStep algorithm, providing a solution with a prescribed accuracy and with Q-linear convergence rate to the minimization problem (5) living on . The FixedRankOptStep algorithm is suitable for large-scale problems where many TSVDs of large matrices are required, and an approximate solution is faster and enough for convergence, as we will show later.

Following [18], we use a polar factorization on suggested by the TSVD. Given a matrix of rank , its TSVD factorization is

(6)

where , and . Then, a transformation

(7)

where , does not change , and allows to write it as

(8)

where now , , and . Thus, the fixed-rank manifold can be seen as the quotient manifold . From this, we reformulate (5) in as the solution of (9).

(9)

The FixedRankOptStep algorithm performs a single step of an alternating directions minimization (ADM) on each of the submanifolds , and (Algorithm 3).

0:  Data matrix , previous values , ,
1:  
2:  
3:  
4:  return  , , .
Algorithm 3 FixedRankOptStep Algorithm

4.1.1 Minimization on the Stiefel Manifold

The minimization subproblems on Stiefel manifolds involving and in Algorithm 3, are not the standard Stiefel Procrustes Problem [6]. Here, the Stiefel matrix is left-multiplying instead of right-multiplying, as usually, which allows to provide a fast closed-form solution by using the Orthogonal Procrustes Problem (OPP) [22], as shown in (10):

(10)

where denotes the projector onto the Stiefel Manifold. This can be efficiently computed through a skinny SVD as . Alternatively, if (maximal rank, as we shall assume in the following), it can be computed as . This shows that always exists and it is unique. A similar result holds for the minimization of by simply transposing (10).

4.1.2 Minimization on the SPD manifold

The minimization subproblem on the SPD manifold is more challenging. The reason is that, although convex, the SPD manifold is an open manifold and therefore the existence of a global minimum is not guaranteed. Its closure is the SPSD manifold, and there the existence of a solution is neither guaranteed. However, we shall see that in our case there exists a minimun in . Let us analyse this by first introducing a novel projector onto the SPD manifold. To this end we consider the SPD Procrustes Problem [26] (11):

(11)

where the projector is simply given by . In general, the solution of the SPD Procrustes Problem requires solving a Lyapunov equation [26], but in our case is simpler since and are orthogonal. Although in general there is not guarantee that is positive definite, we can assure it for our formulation, see the Appendix.

4.2 Convergence Analysis of FR-ADM

Since the optimization problem (3) is highly non-convex, a global convergence theorem as in eALM [12] cannot be given. However, a weak covergence result similar to iALM or that of LMAFIT [23] (where there is no nuclear norm minimization) can be given. For that purpose, let us state the first-order optimality conditions for the constrained minimization problem (3):

(12)

where , and is a Lagrange multiplier. Then we can prove that:

Theorem 1

If the sequence of iterates generated by Algorithm 1 converges to a point , this point satisfies the conditions (4.2) and therefore is a local minimum of (3).

Proof: Using Algorithm 1, and given a projector , we have:

and from this the conditions (4.2) are easily derived. . As for the convergence rate, a similar argument to the one used in [12] shows that the convergence rate is .

5 Experimental Evaluation

Fig. 1: Convergence speed of different Fixed-rank projectors with respect to an exact TSVD.
Fig. 2: Phase transition diagrams for APG, eALM, iALM, , AS, LMAFIT, ROSL and FR-ADM, showing the percentage of error according to the percentage of outliers (y-axis) and the fraction of the matrix rank for

(x-axis). Probabilities are calculated as

, where is the number of repetitions and .

FR-ADM, and its Nystrom accelerated variant FR-Nys, are compared here against the selected methods of the state of the art, i.e.: Accelerated Proximal Gradients (APG) [13], inexact Augmented Lagrangian Multiplier (iALM), exact Augmented Lagrangian Multiplier (eALM) [12], Active Subspace method (AS) [16], Low-Rank Matrix Fitting (LMAFIT) [23] and Robust Orthonormal Subspace Learning (ROSL) [24]. These methods are good representatives of the evolution of S-LR and S-FR solutions, ranging from the fundamental proximal gradients of APG to sophisticated factorizations included in AS and ROSL, via ADMM optimization [12]. We also included a version of iALM that makes use of the Randomized TSVD (R-TSVD) [9] in order to show the benefits of our approach against simple randomization. It is critical for the correct understanding of our experiment to clarify that we have split the previous methods up into two categories, i.e., S-LR and S-FR techniques. APG, iALM and eALM represent S-LR methods, i.e., the rank is not known a priori; while R-TSVD, AS, LMAFIT, ROSL and FR-ADM represent S-FR solutions, i.e., a correct initialization of the rank is provided, since the specific application allows it. This assumption holds for the entire section. Experiments are conducted over synthetic and real data to show the capabilities of our technique in computer vision problems. All the algorithms have been configured according to the suggestions of their respective authors. The experiments were run on a desktop PC at 3.2GHz and 64GB of RAM.

5.1 Synthetic Data

We test the recovery accuracy and time performance of the methods with matrices of different dimensions and ranks. To this end, we generate full-rank matrices and

from a Gaussian distribution

, such that and . A sparse matrix representing outliers is created with a given percentage of its entries being non-zero and magnitudes in the range . Then, the final corrupted matrix is . We deliberately forced the sparse entries to have a magnitude similar to the one of the expected low-rank matrix. The reason for this is that usually the experiments presented in the literature impose a good differentiation between the magnitude of the entries of and , making the recovery problem almost trivial. Here, we remove that simplification, allowing for similar magnitudes of the corrupted entries, which makes the problem more interesting. We will show that, with this challenging setup, the performance of many state-of-the-art methods dramatically decreases, while our approach maintains a good recovery accuracy.

Our first test measures the recovery capabilities of the different methods under study when subjected to similar magnitudes of the entries of and . To this end we create corrupted matrices of increasing rank and an increasing fraction of outliers. The result of this experiment is shown in Figure 2 in form of phase transition diagrams, with rank fractions represented in the x-axis and outlier fraction in the y-axis. Colors represent the recovery (inverse) probability of each case, i.e., the lower error (cold colors, i.e. blue-ish) the better. From this plot, it can be seen that these conditions are very challenging for all the algorithms.

APG, eALM and iALM, making use of an SVD, end up with a very narrow recovery region (in blue). R-TSVD gets a narrow recovery region due to accuracy problems (see also Fig. 1 for further information). Notice that AS is not even able to converge beyond a of rank due to the strong non-convexity induced by its bi-linear factorization. LMAFIT shows a rather acceptable recovery region, while ROSL clearly suffers in obtaining a correct recovery for this sort of data. In our analysis, ROSL performs well when the magnitude of (the noisy entries) are several magnitudes bigger than those of . However, in other cases the recoverability of ROSL dramatically decay. Our proposal, FR-ADM, presents the best recovery for a wider region even in this challenging setup. This characteristic is critical for real applications where outliers might be either very large or very subtle.

We also evaluated one of the most critical aspects of these methods, i.e., the accuracy of a given method at providing a good low-rank approximations of a matrix . State-of-the-art approaches have gained in efficiency by replacing the SVD for a more convenient fixed-rank projection, as in the case of R-TSVD, AS, LMAFIT, ROSL and our proposal FR-ADM. However, as shown in Figure 1, different projection strategies lead to different convergence rates and speeds. In this way, when compared against an exact TSVD, the polar decomposition used by FR-ADM turns out to be superior to all its competitors, as derived from the reduced number of iterations required to achieve a relative error of . We would like to highlight that our approach even presents a better convergence behaviour than the well-known R-TSVD, which is considered one of the fastest methods for low-rank projection. Later, we will show that FR-ADM not only has a better convergence, but is also faster and more accurate.

Our second experiment uses matrices of increasing sizes (), ranging from to , while keeping the rank fixed, and the entries magnitudes as defined above. 10 repetitions are considered per each size. In this case the methods under evaluation are the APG, iALM, eALM, R-TSVD, AS, LMAFIT, ROSL, ROSL+ and our proposal FR-ADM, along with its equivalent accelerated version, FR-Nys. We have accelerated FR-ADM to present a counterpart to ROSL+[24]. In this way we can offer a fair comparison with our proposal and show that our method remains superior after the Nystrom’s speed-up. Results of this test are shown in Table I, considering the recovery error for both matrices and given by and , where and are the optimal matrices. We also consider computational time in seconds and the number of iterations used by each method. FR-ADM is the method with the best trade-off of high recovery accuracy and low computational time (the fastest of the non-accelerated methods). The efficiency of methods such as LMAFIT and ROSL considerably decreased due to their difficulties to face small sparse entries. APG, iALM and eALM also find troubles searching for the appropriate rank in this challenging conditions. For the case of the R-TSVD, its accuracy is lower than desired, and due to its lack of accuracy requires too many iterations to converge.

For the accelerated methods, FR-Nys has proven to be the fastest and the most accurate in all synthetic tests, despite the accuracy degradation provoked by the matrix sampling. The benefits of applying Nystrom’s acceleration are clear, specially for big matrices, as in the case, where the total time is reduced in two orders of magnitude. However, as we show in the next experiment, this acceleration is not convenient for problems with large matrix ranks.

In this third experiment methods performance is tested against matrices of increasing dimensions and rank. Matrices are created as described above, but their rank is established to be , where is the matrix size. Results are summarized in Table II. The first thing to notice is that the time of Nystrom-accelerated methods is bigger than their unaccelerated counterparts. This is due to the high rank of the problem and that the matrices resulting from the Nystrom’s sampling technique are of sizes and , with big enough (usually ). This leads to two matrices that are almost of the size of the original one, making the use of Nystrom counterproductive. Regarding the unaccelerated methods, FR-ADM performs almost twice faster than the second best approach, AS. In terms of recovery accuracy, all the methods present similar results, except for small matrices, where ROSL fails due to its sensibility to initialization parameters.

S-LR S-FR No Accel. S-FR Accel.
APG iALM eALM R-TSVD AS LMAFIT ROSL FR-ADM ROSL+ FR-Nys
Err. L 6.2e-7 6.3e-10 3.3e-9 1.0e-7 7.6e-9 1.3e-4 5.0e-10 2.21e-10 1.7e-2 6.8e-10
Err. S 8.4e-5 1.1e-7 5.8e-7 8.7e-6 3.8e-7 9.1e-3 7.5e-8 3.6e-8 1.2e+0 4.8e-8
500 iters 140 33 9 96 120 40 93 28 200 68
time 11.74 1.25 2.29 0.90 0.59 0.11 0.89 0.11 0.67 0.17
Err. L 4.7e-7 1.1e-9 2.5e-10 1.8e-7 1.4e-9 1.8e-7 1.3e-9 5.0e-10 1.2e-4 6.2e-10
Err. S 8.7e-5 5.1e-7 6.4e-8 1.6e-5 4.3e-7 1.2e-5 6.3e-7 2.0e-7 8.4e-3 8.7e-7
1K iters 142 34 10 95 133 65 98 28 200 65
time 61.75 5.87 11.04 1.57 2.39 0.75 4.27 0.46 1.29 0.22
Err. L 3.7e-7 8.1e-10 2.3e-10 4.4e-7 1.3e-9 2.1e-8 1.1e-9 3.0e-10 5.2e-8 3.1e-10
Err. S 9.4e-5 4.5e-7 7.8e-8 3.4e-5 6.3e-7 4.6e-7 6.9e-7 1.8e-7 3.6e-6 7.6e-8
2K iters 144 35 10 92 131 300 98 29 200 66
time 396.4 20.34 50.02 5.14 9.64 12.45 17.79 1.57 1.98 0.31
Err. L 2.6e-7 4.8e-10 2.5e-10 7.3e-7 9.4e-10 1.3e-8 6.8e-10 2.7e-10 4.7e-9 3.3e-10
Err. S 9.3e-5 4.2e-7 1.0e-7 5.9e-5 6.3e-7 2.9e-7 6.0e-7 2.2e-7 3.2e-7 2.2e-8
4K iters 147 36 10 91 135 300 99 29 140 62
time 3002 112 328 18 50.05 56.93 67.63 7.52 3.33 0.65
Err. L 1.9e-7 5.4e-10 2.7e-10 1.4e-6 6.7e-10 9.4e-9 4.9e-10 4.5e-10 5.4e-9 5.0e-10
Err. S 9.3e-5 6.5e-7 1.6e-7 1.3e-4 6.8e-7 2.0e-7 6.2e-7 7.6e-7 3.7e-7 2.8e-8
8K iters 150 36 10 88 138 300 100 28 139 61
time 22415 517 2214 63.6 243 202 261 27.42 6.90 1.24
TABLE I: Average evaluation of recovery accuracy and computational performance for matrices of different dimensions, with 10% outliers and across ten repetitions. Best time of an accelerated method is shown in red and the best time of an unaccelerated method is shown in blue.
S-LR S-FR No Accel. S-FR Accel.
APG iALM eALM R-TSVD AS LMAFIT ROSL FR-ADM ROSL+ FR-Nys
Err. L 1.3e-5 1.7e-4 1.1e-6 2.3e-7 1.5e-8 1.8e-4 1.2e-2 9.0e-9 1.7e-4 2.7e-10
500 Err. S 1.3e-4 1.7e-3 1.1e-5 4.1e-5 1.1e-7 2.8e-2 1.2e-1 1.1e-7 2.5e-2 4.1e-8
r=50 iters 175 37 11 88 85 48 138 37 200 66
time 13.77 3.61 15.44 1.70 0.75 0.42 6.43 0.34 8.99 0.97
Err. L 2.4e-6 6.9e-9 4.4e-7 6.7e-7 1.6e-8 2.8e-8 1.2e-8 9.9e-9 4.3e-8 8.1e-10
1K Err. S 3.4e-5 1.3e-7 6.2e-6 1.6e-4 1.7e-7 2.2e-6 8.8e-8 1.8e-7 9.4e-6 1.8e-7
r=100 iters 174 37 11 82 79 300 69 36 200 62
time 68.88 14.36 61.74 7.63 3.18 7.43 33.13 1.54 69.75 3.61
Err. L 6.8e-7 6.3e-9 2.0e-7 1.7e-6 1.1e-8 1.9e-8 1.1e-8 1.1e-8 4.5e-8 1.6e-9
2K Err. S 1.3e-5 1.7e-7 4.0e-6 5.9e-4 1.6e-7 2.0e-6 1.2e-7 3.0e-7 1.4e-5 4.8e-7
r=200 iters 175 37 11 77 85 300 67 35 200 60
time 461 80.49 332 32.74 17.32 30.77 325 7.53 653.13 16.93
Err. L 3.5e-7 7.5e-9 1.5e-7 5.0e-6 1.2e-8 1.3e-8 1.0e-8 6.8e-9 4.7e-8 5.3e-9
4K Err. S 1.0e-5 2.8e-7 4.4e-6 2.4e-3 2.7e-7 1.9e-6 1.8e-7 2.6e-7 2.0e-5 2.3e-6
r=400 iters 175 37 10 71 89 300 66 36 200 57
time 3453 529 2106 183.5 107 162 3586 43 7008 96.57
Err. L 5.9e-7 5.0e-9 4.3e-9 1.5e-5 6.3e-9 8.6e-9 7.1e-8 9.5e-9 4.8e-8 1.4e-8
8K Err. S 3.7e-4 5.5e-6 3.4e-6 1.0e-2 6.6e-6 1.8e-6 1.2e-5 1e-5 3.0e-5 8.6e-6
r=800 iters 143 35 10 65 130 300 7107 21 200 54
time 23130 2651 6394 1382 1075 1035 97397 166 91242 564
TABLE II: Average evaluation of recovery accuracy and computational performance for matrices of different dimensions, with 10% outliers and across ten repetitions. Best time of an accelerated method is shown in red and the best time of an unaccelerated method is shown in blue.

5.2 Robust Photometric Stereo

We have chosen photometric stereo [27] (PS) as our first example of fixed-rank problem. PS consists in estimating the normal map and depth of a still scene from several 2D images grabbed from the same position but under different light directions. The Lambertian reflectance model [27] is assumed, such that the light directions , the matrix of normals (unknowns) , and the matrix of pixel intensities are related via , where represents the albedo. The objective of recovering the normal map can be achieved by a Least-Squares (LS) method, but the quality of such a solution would suffer in the presence of outliers. Instead, robust decompositions can be used to get ride of outliers, as proposed in [28]. Since is a product of two rank-3 matrices, in ideal conditions its rank is at most 3. We make use of this rank property to recover an uncorrupted version of that leads to a better estimation of the map and consequently of the depth map.

In our tests we use a dataset of objects viewed under 20 different illuminations, provided in  [29]. From such images, we recover an uncorrupted version of the intensities . Then we run the Photometric Stereo Toolbox [29] to recover normal maps, depth maps, 3D models and some statistics. Table III shows the error in the normal maps after the recovery process with different methods. Here, we consider the reconstruction error, i.e., the normal map is re-rendered into a shading image and then compared with the captured images. From the resulting error map several statistics are computed (RMS, mean and maximum error). The classical LS approach is taken as a reference of non-robust approaches. As robust methods, APG, eALM and iALM, AS, LMAFIT, ROSL and FR-ADM are considered. R-TSVD has not been considered due to its observed reduced accuracy. Nystrom accelerated versions are excluded due to the small size of the observation matrices, a constraint that prevents speed-ups.

The comparison shows that AS, ROSL and FR-ADM are the most accurate methods, producing estimations of the normal map with reconstruction errors below . The remaining methods are far from the accuracy offered by these fixed-rank techniques, producing high residuals. Although AS consistently presents a lower error in the majority of the cases, the error differences below are of no impact for the application. This is shown in the error maps of Fig. 3(a). However, computational time is a critical factor for this problem, where, FR-ADM is one order of magnitude faster than ROSL and two orders faster than AS.

This figure displays the error maps of the considered approaches. As expected, LS leads to high errors due to outliers. APG, iALM and eALM improve LS results, but since they do not use the rank- constraint recovered matrices have an erroneous low-rank. Fixed-rank techniques, such as AS, ROSL and FR-ADM achieve very low residuals, making the error maps black. The recovered normal maps after the application of the FR-ADM technique are shown in Fig. 3(b) along with the 3D reconstruction of the objects. It can be concluded that S-FR techniques, can drastically benefit problems like photometric stereo and FR-ADM stands as the fastest alternative while offering a very high accuracy.

LS APG iALM eALM AS ROSL LMAFIT FR-ADM
RMS 1.4e-2 3.7e-3 3.9e-3 3.9e-3 1.2e-12 1.6e-11 2.3e-2 1.5e-11
Frog Mean Err. 1.1e-2 2.7e-3 2.7e-3 2.7e-3 1.2e-12 1.4e-11 7.9e-3 1.3e-11
Max Err. 1.6e-1 2.2e-2 2.1e-2 2.1e-2 1.8e-12 4.8e-11 2.1e-1 4.7e-11
Time(s) x 2.3e+2 1.4e+2 5.6e+2 3.1e+1 4.0e+1 1.4e+2 7.1e+0
RMS 1.4e-2 2.7e-3 2.5e-3 2.5e-3 4.3e-14 2.7e-11 9.6e-3 2.5e-11
Cat Mean Err. 9.3e-3 1.9e-3 1.8e-3 1.8e-3 4.1e-14 2.3e-11 3.9e-3 2.2e-11
Max Err. 2.2e-1 1.8e-2 1.4e-2 1.4e-2 6.4e-14 6.6e-11 1.4e-1 6.7e-11
Time(s) x 1.8e+2 1.1e+2 4.3e+2 2.4e+1 3.0e+1 1.1e+2 5.9e+0
RMS 1.5e-2 2.9e-3 2.8e-3 2.8e-3 6.0e-13 2.6e-11 1.4e-2 2.6e-11
Hippo Mean Err. 9.8e-3 1.6e-3 1.5e-3 1.5e-3 5.7e-13 2.4e-11 6.4e-3 2.3e-11
Max Err. 1.9e-1 2.3e-2 1.9e-2 1.9e-2 9.8e-13 8.1e-11 1.8e-1 8.4e-11
Time(s) x 1.9e+2 1.2e+2 4.7e+2 2.6e+1 3.2e+1 1.2e+2 6.0e+0
RMS 1.4e-2 4.0e-3 3.9e-3 3.6e-3 3.8e-12 1.8e-11 1.8e-2 1.5e-11
Lizard Mean Err. 1.2e-2 3.1e-3 3.0e-3 2.8e-3 3.5e-12 1.6e-11 6.2e-3 1.3e-11
Max Err. 1.7e-1 3.6e-2 2.7e-2 2.7e-2 1.2e-11 5.5e-11 2.2e-1 4.4e-11
Time(s) x 2.8e+2 1.6e+2 7.8e+2 3.7e+1 4.3e+1 1.6e+2 8.9e+0
RMS 1.0e-2 2.7e-3 2.5e-3 2.5e-3 1.4e-11 1.9e-14 1.5e-2 6.8e-11
Pig Mean Err. 7.9e-3 2.2e-3 2.1e-3 2.0e-3 1.4e-11 1.5e-14 5.1e-3 5.5e-11
Max Err 2.1e-1 1.2e-2 1.5e-2 1.4e-2 2.7e-11 8.7e-14 2.2e-1 3.1e-10
Time(s) x 2.3e+2 1.4e+2 5.2e+2 3.2e+1 3.7e+1 1.5e+2 7.7e+0
RMS 4.3e-2 1.1e-2 9.1e-3 9.9e-3 8.8e-13 2.8e-13 2.7e-2 1.3e-13
Scholar Mean Err. 3.3e-2 1.0e-2 8.4e-3 9.2e-3 7.9e-13 2.2e-13 1.5e-2 1.0e-13
Max Err. 3.3e-1 3.3e-2 2.2e-2 2.4e-2 1.9e-12 1.3e-12 2.4e-1 6.0e-13
Time(s) x 5.0e+2 3.0e+2 1.3e+3 6.5e+1 8.0e+1 3.1e+2 1.5e+1
TABLE III: Evaluation of the reconstruction error for the photometric stereo dataset [29]. The time taken for the LS method is not included in the evaluation.
Fig. 3: (a) Normal error maps after the reconstruction, with intensities scaled by for visualization. Notice that the errors of AS, ROSL and FR-ADM are insignificant, below . (b) 3D reconstruction of the objects after the application of the FR-ADM technique.

5.3 Robust Spectral Clustering

We address clustering as a fixed-rank optimization problem with a known number of clusters represented by the matrix rank, where such a rank can become very high. Here, S-FR methods can be easily added to the pipeline of Spectral Clustering approaches (SP)[4] to increase robustness to outliers and improve accuracy. We consider the problem of clustering faces given the number of categories for three face data sets, i.e., the Extended Yale Face Database B [7] (16128 images of 38 different subjects), the AR Face database [17] (4000 images of 126 different subjects) and the MUCT Face Database [19] (3755 images of 625 different subjects). All of them contain people under different illumination conditions. In addition, MUCT and AR include pose variations, and in the case of AR people use different outfits (see Fig.4 for some examples).

Fig. 4: Instances of males and females subjects of the different data sets used in our evaluation.

In our experiments we use the Parallel Spectral Clustering in Distributed Systems (PSCDS) [4] method as the base code for spectral clustering, but just employ a simple desktop machine. The different S-FR methods are incorporated to PSCDS as a preprocessing stage as follows. First, each image is described by the Gist [21] holistic descriptor with scales of orientations and blocks. This produces a vector of dimensions. The use of Gist instead of the original images has consistently produced an improvement in accuracy in the range of [15%, 20%]. Secondly, all the descriptors of a dataset are combined forming an observation matrix , where is the total number of images in the specific dataset. The rank of is the number of expected clusters . Then, the S-FR method under evaluation recovers a subspace of rank from . The matrix is then used in the pipeline of PSCDS to compute the distance matrix , considering five nearest neighbours per sample, followed by the spectral clustering.

We have considered LMAFIT, ROSL, ROSL+, FR-ADM and FR-Nys as representatives of the S-FR approaches. Additionally, we also compared against state-of-the-art clustering techniques such as the Robust Subspace Segmentation by Low-Rank Representation (LRR) [15] and the Smooth Representation Clustering (SMR) [10], specifically designed for clustering purposes. The results of our evaluation are presented in Table IV, including the average clustering error (ce); the base time, i.e., time taken by the specific S-FR method; and the total time, i.e., base time plus the time taken by the PSCDS. For LRR and SMR the total time is that produced by the method.

When considering the Yale-B and AR datasets FR-ADM obtains the lowest clustering errors, 2.7% and 6.65% respectively. Moreover, FR-ADM and FR-Nys present the best balance between accuracy and computational time for these datasets. MUCT, is the most challenging dataset with 625 classes, which is a very high rank in comparison to its matrix dimensions (). These conditions are beyond the recovery boundaries of S-FR methods, and even though FR-ADM accuracy is comparable to that obtained by the top method, LRR. Furthermore, FR-ADM computational performance is more than 20 times faster than LRR for this case, supporting the good accuracy-speed trade-off offered by the method.

PSCDS LRR SMR LMAFIT ROSL ROSL+ FR-ADM FR-Nys
Yale-B ce (%) 18.7% 13.8% 28.4% 18.8% 20.1% 30.4% 2.7% 2.89%
A=16128x5760 base time 5.17 64.8 351.6 2.4 274.6 7.6 6.8 0.58
=38 total time 5.17 64.8 351.6 5.6 275.5 8.7 8.8 2.5
AR ce (%) 17.2% 36.8% 39.7% 6.70% 7.17% 46.0% 6.65% 7.17%
A=4000x5760 base time 5.08 606.8 105.1 17.6 662.2 48.1 13.81 1.63
=126 total time 5.08 606.8 105.1 21.05 665.1 49.9 16.91 3.83
MUCT ce (%) 55.3% 53.4% 55.8% 56.2% 56.3% 78.4% 55.7% 62.8%
A=3755x5760 base time 76.7 3820 3995 101.2 17696 4890 85.5 67.2
=625 total time 76.7 3820 3995 190.9 17771 4977 175.2 156.3
TABLE IV: Clustering errors including time evaluation. Base time refers to the time used by the specific S-FR method, while total time refers to the time required to perform the full clustering task.

6 Conclusion and Future Work

In this paper we have proposed an efficient, stable and accurate technique, FR-ADM, to perform a robust decomposition of a corrupted matrix into its fixed-rank and sparse components. To this end we have based our algorithm on a polar factorization on a product manifold , combining key tools from manifold optimization and fast projectors. We also proposed a fast SPD projector to speed up computation, along with a proof of its validity in this context. Additionally, Nystrom’s sampling techniques have been used to further accelerate the results, achieving a linear complexity. The resulting algorithm has been tested on synthetic cases and the challenging problems of robust photometric stereo and spectral clustering, proving to be as accurate and more efficient than state-of-the-art approaches and paving the way towards large-scale problems.

Appendix: Convergence analysis of the FixedRankOptFull algorithm

In this Appendix we shall show that the minimization subproblem (9), i.e.

(13)

although highly non-convex, converges geometrically to the global minimum when optimized via the proposed FixedRankOptFull method (Algorithm 4).

0:  Data matrix , initial matrices , ,
1:  
2:  while not converged do
3:     
4:     
5:  end while
6:  return  , , such that is the TSVD of
Algorithm 4 FixedRankOptFull Algorithm

The FixedRankOptFull algorithm performs an alternating directions minimization (ADM) on each of the submanifolds , and (Algorithm 3). In each iteration it uses the algorithm FixedRankOptStep, described in Sec. 4.1, that performs a single step of the alternating directions minimization.

In Sec. 4.1 we provided the exact projectors on each of the submanifolds , and , and proved the validity of the ones corresponding to the Stiefel manifolds. For the case of the manifold, a careful analysis is required to prove its validity.

Given that , and considering and as the solutions of an OPP, then a unique solution in the SPD manifolds must exist. This solution is given in the following discussion, but we need some previous results.

Lemma 2

(see [6]) Let and be the solutions given by Algorithm 3. Suppose , then and are in .

Proof: Since , if , then and therefore , which is SPD since . A similar argument, but without any additional assumption on , since , shows that after minimizing with respect to .

Note that in Lemma 2 it is not neccesary that , only that it is invertible. Also, we conclude that .

, is in general not symmetric, although it can be written as a product of two SPD matrices, and therefore has positive eigenvalues. Even though from Lemma 

2 we have that and are in , we cannot directly prove that , but we can do it passing to the limit inside Algorithm 4. When passing to the limit the sequences defined by , and , both conditions are simultaneously met, as in Lemma 3:

Lemma 3

Suppose that the FixedRankOptFull Algorithm converges to a fixed point , then , , and .

Proof: Since , and are solutions to their respective OPPs and have to be in their respective Stiefel manifolds. Then, by applying Lemma 2, both and are in SPD. Since , we have that:

(14)

which is on SPD since it is a convex manifold. Then, by taking the square root of we have that .

Now, since the eigenvalues are continuous functions of the matrix entries, there exists such that all symmetric matrices in the open ball of radius centered at are contained in . Thus, if FixedRankOptFull converges, then there exists such that .

Let us now discuss the convergence of the FixedRankOptFull Algorithm. Given , then is the projector onto the column space of in . Note that , where . Then we have the following:

Theorem 4

If , the FixedRankOptFull algorithm converges Q-linearly to a global minimum of (9) given by such that is the unique projection of onto . The convergence is Q-linear, in the sense that and .

Proof: For each , denote by the projectors as defined before. Then it is easy to proof, using the alternative definition of , that , where . Thus the sequence of subspaces is the same as that produced by the Orthogonal Iteration [8] for the computation of the first

eigenvalues and eigenvectors of the symmetric matrix

. The Orthogonal Iteration converges Q-linearly in the sense that , with the eigenvalues of . Since , we have that . By a similar argument .

In our case, , with and is a perturbation matrix, then will be much smaller than and the error will be largely decreased in each iteration.

We would like to stress that although we do not provide an algebraic proof for due to its complexity, Lemma 3 along with the continuity of eigenvalues argument guarantee that when we are near an optimum. Starting with then for the first iteration we have , and according to Figure 1 very near the optimum, thus we can ensure that the whole sequence is in . This is not a complete proof, but Theorem 4 ensures global convergence despite the nature of . Thus at some point will be in , which is also shown to always occur in our extensive numerical experiments, even starting from random .

References

  • [1] P. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ, 2008.
  • [2] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Img. Sci., 2009.
  • [3] E. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? J. of the ACM, 2011.
  • [4] W.-Y. Chen, Y. Song, H. Bai, C.-J. Lin, and E. Y. Chang. Parallel spectral clustering in distributed systems. IEEE PAMI, 2011.
  • [5] C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1936.
  • [6] L. Eldén and H. Park. A procrustes problem on the stiefel manifold. Numerische Mathematik, 1999.
  • [7] A. Georghiades, P. Belhumeur, and D. Kriegman.

    From few to many: illumination cone models for face recognition under variable lighting and pose.

    Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2001.
  • [8] G. H. Golub and C. F. Van Loan. Matrix Computations (3rd Ed.). Johns Hopkins University Press, 1996.
  • [9] N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev., 53(2), May 2011.
  • [10] H. Hu, Z. L. J. Feng, and J. Zhou. Smooth representation clustering. In CVPR, 2014.
  • [11] R. M. Larsen. The PROPACK suite, http://sun.stanford.edu/~rmunk/PROPACK/, (last accessed June 2014).
  • [12] Z. Lin, M. Chen, and Y. Ma. The Augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv preprint, 2010.
  • [13] Z. Lin, A. Ganesh, J. Wright, L. Wu, M. Chen, and Y. Ma. Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix. In Workshop on Comp. Adv. in Multi-Sensor Adapt. Processing, 2009.
  • [14] Z. Lin, R. Liu, and Z. Su. Linearized alternating direction method with adaptive penalty for low-rank representation. In Proc. NIPS. 2011.
  • [15] G. Liu, Z. Lin, and Y. Yu. Robust subspace segmentation by low-rank representation. In Proceedings of the ICML, 2010.
  • [16] G. Liu and S. Yan. Active subspace: Toward scalable low-rank learning. J. Neural Computation, 2012.
  • [17] A. Martinez and R. Benavente. The AR face database,”. Technical Report 24, Computer Vision Center, 1998.
  • [18] G. Meyer, S. Bonnabel, and R. Sepulchre. Linear regression under fixed-rank constraints: A riemannian approach. In Proc. of the Int. Conf. on Machine Learning, 2011.
  • [19] S. Milborrow, J. Morkel, and F. Nicolls. The MUCT Landmarked Face Database. Pattern Recognition Association of South Africa, 2010. http://www.milbo.org/muct.
  • [20] Y. Nesterov. Smooth minimization of non-smooth functions. Math. Program., 2005.
  • [21] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. Int. J. Comput. Vision, 2001.
  • [22] P. Schönemann. A generalized solution of the orthogonal procrustes problem. Psychometrika, 1966.
  • [23] Y. Shen, Z. Wen, and Y. Zhang. Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization. J. Opt. Methods Software, 2014.
  • [24] X. Shu, F. Porikli, and N. Ahuja. Robust orthonormal subspace learning: Efficient recovery of corrupted low-rank matrices. In Proc. CVPR, 2014.
  • [25] C. K. I. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. In Proc. NIPS, 2001.
  • [26] K. G. Woodgate. A new algorithm for the positive semi-definite procrustes problem. In Proc. Conf. on Decision and Control, 1993.
  • [27] R. J. Woodham, Y. Iwahori, and R. A. Barman. Photometric stereo: Lambertian reflectance and light sources with unknown direction and strength, 1991.
  • [28] L. Wu, A. Ganesh, B. Shi, Y. Matsushita, Y. Wang, and Y. Ma. Robust photometric stereo via low-rank matrix completion and recovery. In Proc. ACCV, 2011.
  • [29] Y. Xiong. Psbox: Photometric stereo toolbox, https://github.com/yxiong/psbox, (last accessed June 2014).
  • [30] Z. Zhou, X. Li, J. Wright, E. J. Candès, and Y. Ma. Stable principal component pursuit. CoRR, abs/1001.2363, 2010.