# Structured Low-Rank Matrix Factorization with Missing and Grossly Corrupted Observations

Recovering low-rank and sparse matrices from incomplete or corrupted observations is an important problem in machine learning, statistics, bioinformatics, computer vision, as well as signal and image processing. In theory, this problem can be solved by the natural convex joint/mixed relaxations (i.e., l_1-norm and trace norm) under certain conditions. However, all current provable algorithms suffer from superlinear per-iteration cost, which severely limits their applicability to large-scale problems. In this paper, we propose a scalable, provable structured low-rank matrix factorization method to recover low-rank and sparse matrices from missing and grossly corrupted data, i.e., robust matrix completion (RMC) problems, or incomplete and grossly corrupted measurements, i.e., compressive principal component pursuit (CPCP) problems. Specifically, we first present two small-scale matrix trace norm regularized bilinear structured factorization models for RMC and CPCP problems, in which repetitively calculating SVD of a large-scale matrix is replaced by updating two much smaller factor matrices. Then, we apply the alternating direction method of multipliers (ADMM) to efficiently solve the RMC problems. Finally, we provide the convergence analysis of our algorithm, and extend it to address general CPCP problems. Experimental results verified both the efficiency and effectiveness of our method compared with the state-of-the-art methods.

• 33 publications
• 31 publications
• 37 publications
• 44 publications
• 21 publications
03/29/2014

### Scalable Robust Matrix Recovery: Frank-Wolfe Meets Proximal Methods

Recovering matrices from compressive and grossly corrupted observations ...
09/26/2021

### Provable Low Rank Plus Sparse Matrix Separation Via Nonconvex Regularizers

This paper considers a large class of problems where we seek to recover ...
08/26/2011

### Solving Principal Component Pursuit in Linear Time via l_1 Filtering

In the past decades, exactly recovering the intrinsic data structure fro...
07/19/2014

### Tight convex relaxations for sparse matrix factorization

Based on a new atomic norm, we propose a new convex formulation for spar...
10/07/2018

### Recovering Quantized Data with Missing Information Using Bilinear Factorization and Augmented Lagrangian Method

In this paper, we propose a novel approach in order to recover a quantiz...
11/27/2018

### Bilinear Parameterization For Differentiable Rank-Regularization

Low rank approximation is a commonly occurring problem in many computer ...
02/12/2016

### Pursuits in Structured Non-Convex Matrix Factorizations

Efficiently representing real world data in a succinct and parsimonious ...

## 1 Introduction

In recent years, recovering low-rank and sparse matrices from severely incomplete or even corrupted observations has received broad attention in many different fields, such as statistics wright:cpcp ; agarwal:nmd ; chen:lrmr , bioinformatics otazo:lrsd , machine learning wright:rpca ; waters:rcs ; tao:lrsc ; liu:nnr , computer vision candes:rpca ; zhou:gb ; zheng:rma ; cabral:nnbf ; shang:rpca , signal and image processing ma:lrmc ; peng:rasl ; li:lrml ; liu:mbf ; feng:lr

. In those areas, the data to be analyzed often have high dimensionality, which brings great challenges to data analysis, such as digital photographs, surveillance videos, text and web documents. Fortunately, the high-dimensional data are observed to have low intrinsic dimension, which is often much smaller than the dimension of the ambient space

liu:lrr .

For the high-dimensional data, principal component analysis (PCA) is one of the most popular analysis tools to recover a low-rank structure of the data mainly because it is simple to implement, can be solved efficiently, and is effective in many real-world applications such as face recognition and text clustering. However, one of the main challenges faced by PCA is that the observed data is often contaminated by outliers and missing values

favaro:ssc , or is a small set of linear measurements wright:cpcp . To address these issues, many compressive sensing or rank minimization based techniques and methods have been proposed, such as robust PCA wright:rpca ; xu:rpca ; shang:rpca (RPCA, also called PCP in candes:rpca and low-rank and sparse matrix decomposition in tao:lrsc ; yuan:slmd , LRSD) and low-rank matrix completion candes:emc ; chen:lrmr .

In many applications, we have to recover a matrix from only a small number of observed entries, for example collaborative filtering for recommender systems. This problem is often called matrix completion, where missing entries or outliers are presented at arbitrary location in the measurement matrix. Matrix completion has been used in a wide range of problems such as collaborative filtering candes:emc ; chen:lrmr , structure-from-motion eriksson:lrma ; zheng:rma , click prediction yu:cp , tag recommendation wang:at , and face reconstruction meng:cwm . In some other applications, we would like to recover low-rank and sparse matrices from corrupted data. For example, the face images of a person may be corrupted by glasses or shadows lee:fr . The classical PCA cannot address the issue as its least-squares fitting is sensitive to these gross outliers. Recovering a low-rank matrix in the presence of outliers has been extensively studied, which is often called RPCA, PCP or LRSD. The RPCA problem has been successfully applied in many important applications, such as latten semantic indexing min:pcp , video surveillance wright:rpca ; candes:rpca , and image alignment peng:rasl . In some more general applications, we also have to simultaneously recover both low-rank and sparse matrices from small sets of linear measurements, which is called compressive principal component pursuit (CPCP) in wright:cpcp .

In principle, those problems mentioned above can be exactly solved with high probability under mild assumptions via a hybrid convex program involving both the

-norm and the trace norm (also called the nuclear norm) minimization. In recent years, many new techniques and algorithms candes:emc ; chen:lrmr ; candes:rpca ; wright:rpca ; xu:rpca ; wright:cpcp for low-rank matrix recovery and completion have been proposed, and the theoretical guarantees have been derived in candes:emc ; candes:rpca ; wright:cpcp

. However, those provable algorithms all exploit a closed-form expression for the proximal operator of the trace norm, which involves the singular value decomposition (SVD). Hence, they all have high computational cost and are even not applicable for solving large-scale problems.

To address this issue, we propose a scalable robust bilinear structured factorization (RBF) method to recover low-rank and sparse matrices from incomplete, corrupted data or a small set of linear measurements, which is formulated as follows:

 minL,Sf(L,S)+λ∥L∥∗,s.t.,A(L+S)=y, (1)

where is a regularization parameter, is the trace norm of a low-rank matrix , i.e., the sum of its singular values, is a sparse error matrix, is the given linear measurements, is an underdetermined linear operator such as the linear projection operator , and

denotes the loss function such as the

-norm loss or the -norm loss functions.

Unlike existing robust low-rank matrix factorization approaches, our method not only takes into account the fact that the observation is contaminated by additive outliers (Fig. 1 shows an example) or missing data, i.e., robust matrix completion chen:lrmr ; li:cs (RMC, also called RPCA plus matrix completion) problems, but can also identify both low-rank and sparse noisy components from incomplete and grossly corrupted measurements, i.e., CPCP problems. We also present a robust bilateral factorization framework for both RMC and CPCP problems, in which repetitively calculating SVD of a large matrix in candes:emc ; candes:rpca ; wright:cpcp is replaced by updating two much smaller factor matrices. We verify with convincing experimental results both the efficiency and effectiveness of our RBF method.

The main contributions of this paper are summarized as follows:

1. We propose a scalable structured RBF framework to simultaneously recover both low-rank and sparse matrices for both RMC and CPCP problems. By imposing the orthogonality constraint, we convert the original RMC and CPCP models into two smaller-scale matrix trace norm regularized problems, respectively.

2. By the fact that the optimal solution , i.e., the values of at unobserved locations are zero, we reformulate the proposed RMC problem by replacing the linear projection operator constraint with a simple equality one.

3. Moreover, we propose an efficient alternating direction method of multipliers (ADMM) to solve our RMC problems, and then extend it to address CPCP problems with a linearization technique.

4. Finally, we theoretically analyze the suboptimality of the solution produced by our algorithm.

The remainder of this paper is organized as follows. We review background and related work in Section 2. In Section 3, we propose two scalable trace norm regularized RBF models for RMC and CPCP problems. We develop an efficient ADMM algorithm for solving RMC problems and then extend it to solve CPCP problems in Section 4. We provide the theoretical analysis of our algorithm in Section 5. We report empirical results in Section 6, and conclude this paper in Section 7.

## 2 Background

A low-rank structured matrix and a sparse one can be recovered from highly corrupted measurements via the following CPCP model,

 minL,S∥S∥1+λ∥L∥∗,s.t.,PQ(D=L0+S0)=PQ(L+S), (2)

where denotes the -norm of , i.e., , is a linear subspace, and is the projection operator onto that subspace. When , the model (2) is the robust matrix completion (RMC) problem, where is the index set of observed entries. Wright et al., wright:cpcp proved the following result.

###### Theorem 1.

Let , , with , and suppose that 0 is a -incoherent matrix of rank ,

 r≤c1nμlog2m,

and sign() is i.i.d. Bernoulli-Rademacher with non-zero probability . Let be a random subspace of dimension

 dim(Q)≥C1(ρmn+mr)log2m,

distributed according to the Haar measure, probabilistically independent of sign(). Then the minimizer to the problem (2) with is unique and equal to with probability at least , where , , and are positive numerical constants.

This theorem states that a commensurately small number of measurements are sufficient to accurately recover the low-rank and sparse matrices with high probability. Indeed, if is the entire space, the model (2) degenerates to the following RPCA problem wright:rpca ; candes:rpca

 minL,S∥S∥1+λ∥L∥∗,s.t.,D=L+S, (3)

where denotes the given observations. Several algorithms have been developed to solve the convex optimization problem (3), such as IALM lin:ladmm and LRSD yuan:slmd . Although both models (2) and (3) are convex optimization problems, and their algorithms converge to the globally optimal solution, they involve SVD at each iteration and suffer from a high computational cost of . While there have been many efforts towards fast SVD computation such as partial SVD lin:ladmm and approximate SVD ma:fpca , the performance of these existing methods is still unsatisfactory for many real applications. To address this problem, we propose a scalable, provable robust bilinear factorization method with missing and grossly corrupted observations.

## 3 Our Rbf Framework

Matrix factorization is one of the most useful tools in scientific computing and high dimensional data analysis, such as the QR decomposition, the LU decomposition, SVD, and NMF. In this paper, robust bilinear factorization (RBF) aims to find two smaller low-rank matrices

and whose product is equal to the desired matrix of low-rank, ,

 L=UVT,

where is an upper bound for the rank of , i.e., .

### 3.1 RMC Model

Suppose that the observed matrix is corrupted by outliers and missing data, the RMC problem is given by

 minL,S∥S∥1+λ∥L∥∗,s.t.,PΩ(D)=PΩ(L+S). (4)

From the optimization problem (4), we easily find the optimal solution , where is the complement of , i.e., the index set of unobserved entries. Consequently, we have the following lemma.

###### Lemma 2.

The RMC model (4) with the operator is equivalent to the following problem

 minL,S∥PΩ(S)∥1+λ∥L∥∗,s.t.,PΩ(D)=PΩ(L+S)andPΩC(S)=\emph0. (5)

The proof of this lemma can be found in APPENDIX A. From the incomplete and corrupted matrix , our RBF model is to find two smaller matrices, whose product approximates , can be formulated as follows:

 minU,V,S∥PΩ(S)∥1+λ∥UVT∥∗,s.t.,PΩ(D)=PΩ(UVT+S). (6)
###### Lemma 3.

Let and be two matrices of compatible dimensions, where has orthogonal columns, i.e., , then we have .

The proof of this lemma can be found in APPENDIX B. By imposing and substituting into (6), we arrive at a much smaller-scale matrix trace norm minimization problem

 minU,V,S∥PΩ(S)∥1+λ∥V∥∗,s.t.,PΩ(D)=PΩ(UVT+S),UTU=I. (7)
###### Theorem 4.

Suppose is a solution of the problem (5) with , then there exists the solution , and to the problem (7) with and , is also a solution to the problem (5).

The proof of this theorem can be found in APPENDIX C.

### 3.2 CPCP Model

From a small set of linear measurements , the CPCP problem is to recover low-rank and sparse matrices as follows,

 minU,V,S∥S∥1+λ∥V∥∗,s.t.,PQ(D)=PQ(UVT+S). (8)
###### Theorem 5.

Suppose is a solution of the problem (2) with , then there exists the solution , and to the problem (8) with , is also a solution to the problem (2).

We omit the proof of this theorem since it is very similar to that of Theorem 4. In the following, we will discuss how to solve the models (7) and (8). It is worth noting that the RPCA problem can be viewed as a special case of the RMC problem (7) when all entries of the corrupted matrix are directly observed. In the next section, we will mainly develop an efficient alternating direction method of multipliers (ADMM) solver for solving the non-convex problem (7

). It is also worth noting that although our algorithm will produce different estimations of

and , the estimation of is stable as guaranteed by Theorems 4 and 5, and the convexity of the problems (2) and (4).

### 3.3 Connections to Existing Approaches

According to the discussion above, it is clear that our RBF method is a scalable method for both RMC and CPCP problems. Compared with convex algorithms such as common RPCA candes:rpca and CPCP wright:cpcp methods, which have a computational complexity of and are impractical for solving relatively large-scale problems, our RBF method has a linear complexity and scales well to handle large-scale problems.

To understand better the superiority of our RBF method, we compare and relate RBF with several popular robust low-rank matrix factorization methods. It is clear that the model in shen:alad ; zhou:gb ; meng:cwm is a special case of our trace norm regularized model (7), while . Moreover, the models used in zheng:rma ; cabral:nnbf focus only on the desired low-rank matrix. In this sense, they can be viewed as special cases of our model (7). The other major difference is that SVD is used in zheng:rma , while QR factorizations are used in this paper. The use of QR factorizations also makes the update operation highly scalable on modern parallel architectures avron:ssgd . Regarding the complexity, it is clear that both schemes have the similar theory computational complexity. However, from the experimental results in Section 6, we can see that our algorithm usually runs much faster, but more accurate than the methods in zheng:rma ; cabral:nnbf . The following bilinear spectral regularized matrix factorization formulation in cabral:nnbf is one of the most similar models to our model (7),

 minL,U,V∥W⊙(D−L)∥1+λ2(∥U∥2F+∥V∥2F),s.t.,L=UVT,

where denotes the Hadamard product and is a weight matrix that can be used to denote missing data (i.e., ).

## 4 Optimization Algorithm

In this section, we propose an efficient alternating direction method of multipliers (ADMM) for solving the RMC problem (7), and then extend it for solving the CPCP problem (8). We provide the convergence analysis of our algorithm in Section 5.

### 4.1 Formulation

Recently, it has been shown in the literature boyd:admm ; wen:nsor that ADMM is very efficient for some convex or non-convex programming problems from various applications. We also refer to a recent survey boyd:admm for some recently exploited applications of ADMM. For efficiently solving the RMC problem (7), we can assume without loss of generality that the unknown entries of are simply set as zeros, i.e., =0, and may be any values such that . Therefore, the constraint with the operator in (7) is simplified into . Hence, we introduce the constraint into (7), and obtain the following equivalent form:

 minU,V,S∥PΩ(S)∥1+λ∥V∥∗,s.t.,D=UVT+S,UTU=I. (9)

The partial augmented Lagrangian function of (9) is

 Lα(U,V,S,Y)=λ∥V∥∗+∥PΩ(S)∥1+⟨Y,D−S−UVT⟩+α2∥D−S−UVT∥2F, (10)

where is a matrix of Lagrange multipliers, is a penalty parameter, and denotes the inner product between matrices and of equal sizes, i.e., .

### 4.2 Robust Bilateral Factorization Scheme

We will derive our scheme for solving the following subproblems with respect to , and , respectively,

 Uk+1=argminU∈Rm×dLαk(U,Vk,Sk,Yk),s.t.,UTU=I, (11)
 Vk+1=argminV∈Rn×dLαk(Uk+1,V,Sk,Yk), (12)
 Sk+1=argminS∈Rm×nLαk(Uk+1,Vk+1,S,Yk). (13)

#### 4.2.1 Updating U

Fixing and at their latest values, and by removing the terms that do not depend on and adding some proper terms that do not depend on , the problem (11) with respect to is reformulated as follows:

 minU∥UVTk−Pk∥2F,s.t.,UTU=I, (14)

where . In fact, the optimal solution can be given by the SVD of the matrix as in nick:mpp . To further speed-up the calculation, we introduce the idea in wen:nsor that uses a QR decomposition instead of SVD. The resulting iteration step is formulated as follows:

 Uk+1=Q,QR(PkVk)=QR, (15)

where is an orthogonal basis for the range space , i.e., . Although in (15) is not an optimal solution to (14), our iterative scheme and the one in liu:as are equivalent to solve (14) and (16), and their equivalent analysis is provided in Section 5. Moreover, the use of QR factorizations also makes our update scheme highly scalable on modern parallel architectures avron:ssgd .

#### 4.2.2 Updating V

Fixing and , the optimization problem (12) with respect to can be rewritten as:

 minVαk2∥Uk+1VT−Pk∥2F+λ∥V∥∗. (16)

To solve the convex problem (16), we first introduce the following definition cai:svt .

###### Definition 1.

For any given matrix whose rank is , and , the singular value thresholding (SVT) operator is defined as follows:

 SVTμ(M)=¯¯¯¯Udiag(max{σ−μ,0})¯VT,

where should be understood element-wise, , and are obtained by SVD of , i.e., .

###### Theorem 6.

The trace norm minimization problem (16) has a closed-form solution given by:

 Vk+1=\emphSVTλ/αk(PTkUk+1). (17)
###### Proof.

The first-order optimality condition for (16) is given by

 0∈λ∂∥V∥∗+αk(VUTk+1−PTk)Uk+1,

where is the set of subgradients of the trace norm. Since , the optimality condition for (16) is rewritten as follows:

 0∈λ∂∥V∥∗+αk(V−PTkUk+1). (18)

(18) is also the optimality condition for the following convex problem,

 minVαk2∥V−PTkUk+1∥2F+λ∥V∥∗. (19)

By Theorem 2.1 in cai:svt , then the optimal solution of (19) is given by (17). ∎

#### 4.2.3 Updating S

Fixing and , we can update by solving

 minS∥PΩ(S)∥1+αk2∥S+Uk+1VTk+1−D−Yk/αk∥2F. (20)

For solving the problem (20), we introduce the following soft-thresholding operator :

 Sτ(Aij):=⎧⎪⎨⎪⎩Aij−τ,Aij>τ,Aij+τ,Aij<−τ,0,otherwise.

Then the optimal solution can be obtained by solving the following two subproblems with respect to and , respectively. The optimization problem with respect to is first formulated as follows:

 minSΩαk2∥PΩ(S+Uk+1VTk+1−D−Yk/αk)∥2F+∥PΩ(S)∥1. (21)

By the operator and letting , the closed-form solution to the problem (21) is given by

 (Sk+1)Ω=Sτ((D−Uk+1VTk+1+Yk/αk)Ω). (22)

Moreover, the subproblem with respect to is formulated as follows:

 minSΩC∥PΩC(S+Uk+1VTk+1−D−Yk/αk)∥2F. (23)

We can easily obtain the closed-form solution by zeroing the gradient of the cost function (23) with respect to , i.e.,

 (Sk+1)ΩC=(D−Uk+1VTk+1+Yk/αk)ΩC. (24)

Summarizing the analysis above, we obtain an ADMM scheme to solve the RMC problem (7), as outlined in Algorithm 1. Our algorithm is essentially a Gauss-Seidel-type scheme of ADMM, and the update strategy of the Jacobi version of ADMM is easily implemented, well suited for parallel and distributed computing and hence is particularly attractive for solving large-scale problems shang:hotd . In addition, should be set to 0 for the expected output . This algorithm can also be accelerated by adaptively changing . An efficient strategy lin:ladmm is to let (the initialization in Algorithm 1) and increase iteratively by , where in general and is a small constant. Furthermore, is initialized with . Algorithm 1 is easily used to solve the RPCA problem (3), where all entries of the corrupted matrix are directly observed. Moreover, we introduce an adaptive rank adjusting strategy for our algorithm in Section 4.4.

### 4.3 Extension for CPCP

Algorithm 1 can be extended to solve the CPCP problem (8) with the complex operator , as outlined in Algorithm 2, which is to optimize the following augmented Lagrange function

 Fα(U,V,S,Y)=λ∥V∥∗+∥S∥1+⟨Y,y−PQ(S+UVT)⟩+α2∥y−PQ(S+UVT)∥22. (25)

We minimize with respect to using a recently proposed linearization technique yang:lalad , which resolves such problems where is not the identity operator. Specifically, for updating and , let and , then is approximated by

 g(T)≈g(Tk)+⟨∇g(Tk),T−Tk⟩+τ∥T−Tk∥2F, (26)

where , is the adjoint operator of , and is chosen as as in yang:lalad , and the spectral norm of a matrix, i.e., the largest singular value of a matrix.

Similarly, for updating , let and , then is approximated by

 h(S)≈h(Sk)+⟨∇h(Sk),S−Sk⟩+τ∥S−Sk∥2F, (27)

where .

### 4.4 Stopping Criteria and Rank Adjusting Strategy

As the stopping criteria for terminating our RBF algorithms, we employ some gap criteria; that is, we stop Algorithm 1 when the current gap is satisfied as a given tolerance , i.e., , and run Algorithm 2 when .

In Algorithms 1 and 2, is one of the most important parameters. If is too small, it can cause underfitting and a large estimation error; but if is too large, it can cause overfitting and large deviation to the underlying low-rank matrix . Fortunately, several works keshavan:mc ; wen:nsor have provided some matrix rank estimation strategies to compute a good value for the rank of the involved matrices. Thus, we only set a relatively large integer such that . In addition, we provide a scheme to dynamically adjust the rank parameter . Our scheme starts from an overestimated input, i.e., . Following keshavan:mc and shen:alad , we decease it aggressively once a dramatic change in the estimated rank of the product

is detected based on the eigenvalue decomposition which usually occurs after a few iterations. Specifically, we calculate the eigenvalues of

, which are assumed to be ordered as . Since the product and have the same nonzero eigenvalues, it is much more efficient to compute the eigenvalues of the product . Then we compute the quotient sequence . Suppose . If the condition

 gap=(d−1)¯λ^r∑i≠^r¯λi≥10,

is satisfied, which means a “big” jump between and , then we reduce to , and this adjustment is performed only once.

## 5 Theoretical Analysis and Applications

In this section, we will present several theoretical properties of Algorithm 1. First, we provide the equivalent relationship analysis for our iterative solving scheme and the one in liu:as , as shown by the following theorem.

###### Theorem 7.

Let be the solution of the subproblems (11), (12) and (13) at the k-th iteration, respectively, , and be generated by Algorithm 1 at the k-th iteration, . Then

1. such that and .

2. , , , and .

Remark: The proof of this theorem can be found in APPENDIX D. Since the Lagrange function (10) is determined by the product , , and , the different values of and are essentially equivalent as long as the same product and . Meanwhile, our scheme replaces SVD by the QR decomposition, and can avoid the SVD computation for solving the optimization problem with the orthogonal constraint.

### 5.1 Convergence Analysis

The global convergence of our derived algorithm is guaranteed, as shown in the following lemmas and theorems.

###### Lemma 8.

Let be a sequence generated by Algorithm 1, then we have the following conclusions:

1. approaches to a feasible solution, i.e., .

2. Both sequences and are Cauchy sequences.

The detailed proofs of this lemma, the following lemma and theorems can be found in APPENDIX E. Lemma 8 ensures only that the feasibility of each solution has been assessed. In this paper, we want to show that it is possible to prove the local optimality of the solution produced by Algorithm 1. Let be the number of iterations needed by Algorithm 1 to stop, and be defined by

 U∗=Uk∗+1,V∗=Vk∗+1,S∗=Sk∗+1.

In addition, let (resp. ) denote the Lagrange multiplier (resp. ) associated with , i.e., , where .

###### Lemma 9.

For the solution generated by Algorithm 1, then we have the following conclusion:

 ∥PΩ(S)∥1+λ∥V∥∗≥∥PΩ(S∗)∥1+λ∥V∗∥∗+⟨ˆY∗−Y∗,UVT−U∗(V∗)T⟩−mnε

holds for any feasible solution to (9).

To reach the global optimality of (9) based on the above lemma, it is required to show that the term diminishes. Since

 ∥Y∗−ˆY∗∥2≤√mn∥Y∗−ˆY∗∥∞,

and by Lemma 13 (Please see APPENDIX E), we have

 ∥Y∗−ˆY∗∥∞=∥PΩ(Y∗)−ˆY∗∥∞≤∥PΩ(Y∗)∥∞+∥ˆY∗∥∞≤1+λ,

which means that is bounded. By setting the parameter to be relatively small as in liu:as , is small, which means that is also small. Let , then we have the following theorems.

###### Theorem 10.

Let be the globally optimal objective function value of (9), and be the objective function value generated by Algorithm 1. We have that

 f∗≤fg+c1ε1+mnε,

where is a constant defined by

 c1=mnλ∥PΩ(D)∥F(ρ(1+ρ)ρ−1+12ρk∗)+∥PΩ(D)∥1λ.
###### Theorem 11.

Suppose is an optimal solution to the RMC problem (5), and . Let be the objective function value generated by Algorithm 1 with parameter . We have that

 f0≤f∗≤f0+c1ε1+mnε+(√mn−λ)σd+1max(r−d,0),

where are the singular values of .

Since the rank parameter is set to be higher than the rank of the optimal solution to the RMC problem (5), i.e., , Theorem 11 directly concludes that

 f0≤f∗≤f0+c1ε1+mnε.

Moreover, the value of can be set to be arbitrarily small, and the second term involving diminishes. Hence, for the solution generated by Algorithm 1, a solution to the RMC problem (5) can be achieved by computing .

### 5.2 Complexity Analysis

We also discuss the time complexity of our RBF algorithm. For the RMC problem (7), the main running time of our RBF algorithm is consumed by performing SVD on the small matrix of size , the QR decomposition of the matrix , and some matrix multiplications. In (17), the time complexity of performing SVD is . The time complexity of QR decomposition and matrix multiplications is . The total time complexity of our RBF algorithm for solving both problems (3) and (7) is (usually ), where is the number of iterations.

### 5.3 Applications of Matrix Completion

As our RBF framework introduced for robust matrix factorization is general, there are many possible extensions of our methodology. In this section, we outline a novel result and methodology for one extension we consider most important: low-rank matrix completion. The space limit refrains us from fully describing each development, but we try to give readers enough details to understand and use each of these applications.

By introducing an auxiliary variable , the low-rank matrix completion problem can be written into the following form,

 minU,V,L12∥PΩ(D)−PΩ(L)∥2F+λ∥V∥∗,s.t.,L=UVT,UTU=I. (28)

Similar to Algorithm 1, we can present an efficient ADMM scheme to solve the matrix completion problem (28). This algorithm can also be easily used to solve the low-rank matrix factorization problem, where all entries of the given matrix are observed.

## 6 Experimental Evaluation

We now evaluate the effectiveness and efficiency of our RBF method for RMC and CPCP problems such as text removal, background modeling, face reconstruction, and collaborative filtering. We ran experiments on an Intel(R) Core (TM) i5-4570 (3.20 GHz) PC running Windows 7 with 8GB main memory.

### 6.1 Text Removal

We first conduct an experiment by considering a simulated task on artificially generated data, whose goal is to remove some generated text from an image. The ground-truth image is of size

with rank equal to 10 for the data matrix. we then add to the image a short phase in text form which plays the role of outliers. Fig. 2 shows the image together with the clean image and outliers mask. For fairness, we set the rank of all the algorithms to 20, which is two times the true rank of the underlying matrix. The input data are generated by setting 30% of the randomly selected pixels of the image as missing entries. We compare our RBF method with the state-of-the-art methods, including PCP candes:rpca , SpaRCS waters:rcs , RegL1 zheng:rma and BF-ALM cabral:nnbf . We set the regularization parameter for RegL1 and RBF, and the stopping tolerance for all algorithms in this section.

The results obtained by different methods are visually shown in Fig. 3, where the outlier detection accuracy (the score Area Under the receiver operating characteristic Curve, AUC) and the error of low-rank component recovery (i.e.,

, where and denote the ground-truth image matrix and the recovered image matrix, respectively) are also presented. As far as low-rank matrix recovery is concerned, these RMC methods including SpaRCS, RegL1, BF-ALM and RBF, outperform PCP, not only visually but also quantitatively. For outlier detection, it can be seen that our RBF method significantly performs better than the other methods. In short, RBF significantly outperforms PCP, RegL1, BF-ALM and SpaRCS in terms of both low-rank matrix recovery and spare outlier identification. Moreover, the running time of PCP, SpaRCS, RegL1, BF-ALM and RBF is 36.25sec, 15.68sec, 26.85sec, 6.36sec and 0.87sec, respectively.

We further evaluate the robustness of our RBF method with respect to the regularization parameter and the given rank variations. We conduct some experiments on the artificially generated data, and illustrate the outlier detection accuracy (AUC) and the error (Error) of low-rank component recovery of PCP, SpaRCS, RegL1 and our RBF method, where the given rank of SpaRCS, RegL1 and our RBF method is chosen from , and the regularization parameter of PCP, RegL1 and RBF is chosen from the grid . Notice that because BF-ALM and RegL1 achieve very similar results, we do not provide the results of the former in the following. The average AUC and Error results of 10 independent runs are shown in Figs. 4 and 5, from which we can see that our RBF method performs much more robust than SpaRCS and RegL1 with respect to the given rank. Moreover, our RBF method is much more robust than PCP and RegL1 against the regularization parameter .

### 6.2 Background Modeling

In this experiment we test our RBF method on real surveillance videos for object detection and background subtraction as a RPCA plus matrix completion problem. Background modeling is a crucial task for motion segmentation in surveillance videos. A video sequence satisfies the low-rank and sparse structures, because the background of all the frames is controlled by few factors and hence exhibits low-rank property, and the foreground is detected by identifying spatially localized sparse residuals wright:rpca ; candes:rpca ; wang:wvs . We test our RBF method on four color surveillance videos: Bootstrap, Lobby, Hall and Mall databases. The data matrix D consists of the first 400 frames of size

. Since all the original videos have colors, we first reshape every frame of the video into a long column vector and then collect all the columns into a data matrix

D with size of . Moreover, the input data is generated by setting 10% of the randomly selected pixels of each frame as missing entries.

Fig. 6 illustrates the background extraction results on the Bootstrap data set, where the first and fourth columns represent the input images with missing data, the second and fifth columns show the low-rank recoveries, and the third and sixth columns show the sparse components. It is clear that the background can be effectively extracted by our RBF method, RegL1 and GRASTA he:online . Notice that SpaRCS could not yield experimental results on these databases because they ran out of memory. Moreover, we can see that the decomposition results of our RBF method, especially the recovered low-rank components, are slightly better than that of RegL1 and GRASTA. We also report the running time in Table 1, from which we can see that RBF is more than 3 times faster than GRASTA and more than 2 times faster than RegL1. This further shows that our RBF method has very good scalability and can address large-scale problems.

### 6.3 Face Reconstruction

We also test our RBF method for the face reconstruction problems with the incomplete and corrupted face data or a small set of linear measurements as in wright:cpcp , respectively. The face database used here is a part of Extended Yale Face Database B lee:fr with the large corruptions. The face images can often be decomposed as a low-rank part, capturing the face appearances under different illuminations, and a sparse component, representing varying illumination conditions and heavily “shadows”. The resolution of all images is and the pixel values are normalized to [0, 1], then the pixel values are used to form data vectors of dimension 32,256. The input data are generated by setting 40% of the randomly selected pixels of each image as missing entries.

Fig. 7 shows some original and reconstructed images by RBF, PCP, RegL1 and CWM meng:cwm , where the average computational time of all these algorithms on each people’s faces is presented. It can be observed that RBF performs better than the other methods not only visually but also efficiently, and effectively eliminates the heavy noise and “shadows” and simultaneously completes the missing entries. In other words, RBF can achieve the latent features underlying the original images regardless of the observed data corrupted by outliers or missing values.

Moreover, we implement a challenging problem to recover face images from incomplete line measurements. Considering the computational burden of the projection operator , we resize the original images into and normalize the raw pixel values to form data vectors of dimension 2016. Following wright:cpcp , the input data is , where is a subspace generated randomly with the dimension .

Fig. 8 illustrates some reconstructed images by CPCP wright:cpcp and RBF, respectively. It is clear that both CPCP and RBF effectively remove “shadows” from faces images and simultaneously successfully recover both low-rank and sparse components from the reduced measurements.

### 6.4 Collaborative Filtering

Collaborative filtering is a technique used by some recommender systems liu:cf ; lee:cf . One of the main purposes is to predict the unknown preference of a user on a set of unrated items, according to other similar users or similar items. In order to evaluate our RBF method, some matrix completion experiments are conducted on three widely used recommendation system data sets: MovieLens100K with 100K ratings, MovieLens1M (ML-1M) with 1M ratings and MovieLens10M (ML-10M) with 10M ratings. We randomly split these data sets to training and testing sets such that the ratio of the training set to testing set is 9:1, and the experimental results are reported over 10 independent runs. We also compare our RBF method with APG toh:apg

, Soft-Impute

mazumder:sr , OptSpace keshavan:mc and LMaFit wen:nsor , and two state-of-the-art manifold optimization methods: ScGrass ngo:gm and RTRMC10