# A Greedy Algorithm for Matrix Recovery with Subspace Prior Information

Matrix recovery is the problem of recovering a low-rank matrix from a few linear measurements. Recently, this problem has gained a lot of attention as it is employed in many applications such as Netflix prize problem, seismic data interpolation and collaborative filtering. In these applications, one might access to additional prior information about the column and row spaces of the matrix. These extra information can potentially enhance the matrix recovery performance. In this paper, we propose an efficient greedy algorithm that exploits prior information in the recovery procedure. The performance of the proposed algorithm is measured in terms of the rank restricted isometry property (R-RIP). Our proposed algorithm with prior subspace information converges under a more milder condition on the R-RIP in compared with the case that we do not use prior information. Additionally, our algorithm performs much better than nuclear norm minimization in terms of both computational complexity and success rate.

## Authors

• 1 publication
• 14 publications
• 17 publications
09/27/2018

### Optimal Exploitation of Subspace Prior Information in Matrix Sensing

Matrix sensing is the problem of reconstructing a low-rank matrix from a...
05/21/2020

### Multi-weight Nuclear Norm Minimization for Low-rank Matrix Recovery in Presence of Subspace Prior Information

Weighted nuclear norm minimization has been recently recognized as a tec...
05/23/2015

### Low-Rank Matrix Recovery from Row-and-Column Affine Measurements

We propose and study a row-and-column affine measurement scheme for low-...
12/05/2014

### Two step recovery of jointly sparse and low-rank matrices: theoretical guarantees

We introduce a two step algorithm with theoretical guarantees to recover...
10/30/2021

### Multi-weight Matrix Completion with Arbitrary Subspace Prior Information

Matrix completion refers to completing a low-rank matrix from a few obse...
09/23/2021

### Rank Overspecified Robust Matrix Recovery: Subgradient Method and Exact Recovery

We study the robust recovery of a low-rank matrix from sparsely and gros...
10/09/2019

### Subspace Estimation from Unbalanced and Incomplete Data Matrices: ℓ_2,∞ Statistical Guarantees

This paper is concerned with estimating the column space of an unknown l...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## I Introduction

The problem of recovering a low-rank matrix from undersampled measurements arises in many applications such as MRI[haldar2010spatiotemporal],[zhao2010low], collaborative filtering [srebro2010collaborative], Netflix [bennett2007netflix], exploration seismology [aravkin2014fast], and quantum state tomography[gross2010quantum].

An idealistic approach to solve this problem is to consider the problem111In this work, we consider only square matrices. However, the extension to non-square matrices is straightforward.

 minZ∈Rn×n rank(Z) s.t. y=A(Z), (1)

which is known to be NP-hard in general [recht2010guaranteed]. Here is a linear operator, and for measurement. Then problem (I) can be relaxed and converted to the following tractable convex problem:

 minZ∈Rn×n ∥Z∥∗ s.t. y=A(Z), (2)

where sums the singular values of a matrix and is called nuclear norm. There are many algorithms based on SVD, truncated SVD, and greedy methods for matrix recovery and weighted matrix recovery (see Section I-B for more explanations); however these algorithms are not capable to exploit prior subspace information. In this paper, we propose an algorithm that uses prior information for low-rank matrix recovery based on CoSaMP 222Compressive Sampling Matching Pursuit[needell2009cosamp]. Before introducing our algorithm for matrix recovery, it is necessary to know how to incorporate prior information into matrix recovery problem. Let and be the column and row space of the ground-truth matrix with rank , respectively. Suppose that we are given two subspaces and that form the principal angles333For the definition of principal angles between subspaces, see for example[daei2018optimal, Section II].

 θu=∠[Ur,˜Ur], θv=∠[Vr,˜Vr],

with and , respectively. We also define the subspaces and as the orthogonal complements of and , respectively. Then, the following problem is proposed for capturing both low-rankness and subspace prior information:

 minZ∈Rn×n rank(Q˜UrZQ˜Vr) s.t. y=A(Z), (3)

where

 Q˜Ur:=˜UrW1˜UHr+˜U⊥rW2˜U⊥Hr Q˜Vr:=˜VrW3˜VHr+˜V⊥rW4˜V⊥Hr, (4)

and are diagonal matrices. Here, and are some bases of the subspaces and , respectively (the same argument holds for and ). The diagonal entries of s specify how much the corresponding principal angles shall be penalized in the minimization problem. For instance, if is close to and far from the , then the values on the diagonal of are small while the values on the diagonal of shall be large.

### I-a Contributions

To highlight our contributions, we list them below:

1. Proposing a new optimization problem for matrix recovery and completion. As mentioned in (I), we design a novel optimization problem that encourages both rank and subspace information. The problem uses the principal angles between a given subspace and the matrix subspace.

2. Proposing a greedy-based algorithm for matrix recovery and completion. We propose new and efficient greedy-based algorithms named rank minimization with subspace prior information (RMSPI) and generalized RMSPI (GRMSPI) to solve (I). While RMSPI penalizes the principal angles with a single weight, GRMSPI is designed for a multi-weight scenario where each principal angle is penalized separately.

3. Convergence guaranty. We prove convergence guarantee results for RMSPI and GRMSPI. The convergence rate of our proposed algorithm is superior to that of [lee2010admira].

4. We present a performance guarantee for RMSPI and GRMSPI in terms of rank-restricted isometry property (R-RIP) given in Section III. Simulation results show that even when the measurement operator does not satisfy the R-RIP constraint (for example in matrix completion problem), our proposed algorithms are still capable of recovering the interested matrix exactly.

5. We examine our algorithms in presence of noisy measurements and numerically observe that it is robust to measurement noise.

### I-B Related Works and Key Differences

Recht et al. in [recht2010guaranteed] convert problem (I) to the relaxed form in (I). Following [recht2010guaranteed], Cai et al. in [cai2010singular] develop an algorithm for solving the problem

 minZ∈Rn×n∥A(Z)−y∥22+λ∥Z∥∗ (5)

which is a regularized version of (I). Specifically, they provide an iterative algorithm based on a soft singular value thresholding (SVT). Besides the fact that the algorithm is designed for the noiseless case, the computation cost is rather low, yet it lacks any convergence rate analysis.

Ji et al. in [ji2009accelerated], Liu et al. in [liu2012implementable], and Toh et al. in[toh2010accelerated] independently provide algorithms based on gradient method to solve the problem (5). However, the purpose of these papers is to improve SVT and to reduce the number of required iterations for matrix recovery.

Mazumder et al. in [mazumder2010spectral] propose a convex algorithm for minimizing the error in each iteration under the condition that the nuclear norm is bounded. This algorithm requires taking SVD in each step which is costly.

Ma et al. in [ma2011fixed] propose an iterative algorithm for minimizing nuclear norm by using the Bregman divergence and fixed point. This algorithm is very fast and powerful, yet suffers lack of convergence rate analysis. Also, it is only useful in the absence of noise.

In [mishra2013low], [wen2010low], and [recht2013parallel], the authors propose methods based on approximating the nuclear norm by using its variational features.

The above-mentioned methods need calculating SVD which is expensive in general. Henceforth, in [wang2015orthogonal] and [lee2010admira], the authors propose algorithms based on greedy methods. More explicitly, they extend the well-studied methods OMP444Orthogonal Matching Pursuit[tropp2007signal] and CoSaMP [needell2009cosamp] to the matrix case. These methods are observed to be more efficient than relaxation-based algorithm (e.g. nuclear norm minimization).

Despite the effectiveness in matrix recovery problem, only few works consider prior information [jain2013provable], [xu2013speedup], [eftekhari2018weighted]. Specifically, the common feature of these algorithms is to use prior information in such a way that the number of required measurements is minimized.

The authors in [jain2013provable], [xu2013speedup], employ side information to enhance the performance of nuclear norm minimization. Their side information was to completely know a few directions of the matrix subspace and differs from knowing the principal angles that we consider.

Aravkin et al. in [aravkin2014fast] and Eftekhari et al. in [eftekhari2018weighted] incorporate prior knowledge about the matrix column and row spaces into the recovery procedure. They consider the maximum principal angle between a given (e.g. ) and the ground-truth subspace (e.g. ). They show that as long as the given subspace is close to the interested one, the required number of measurements decreases compared to the regular nuclear norm minimization. Our model in (I) differs from the ones in [aravkin2014fast] and [eftekhari2018weighted] in that we penalize the given subspace with multiple weights instead of a single weight. Also, our model outperforms theirs when is either far or close to . Besides the generality of our model, RMSPI and GRMSPI are considerably superior to the ones in [aravkin2014fast, eftekhari2018weighted] in terms of computational complexity (see Section IV).

The authors in [mohan2010reweighted], propose an alternative for rank minimization:

 minZ∈Rn×nlog(det(Z)) Z∈C. (6)

where is the feasible set. In each iteration (e.g. th) of the algorithm solves

 minZ∈Rn×n ∥Wk1ZWk2∥ Z∈C, (7)

where and are some weighting matrices.

Finally, [rao2015collaborative] solves

 minZ∈Rn×n1m∥A(Z)−y∥2+λN∥AZB∥∗, (8)

in which and are some certain invertible matrices related to the interested subspace. This problem is also similar to [ji2009accelerated].

### I-C Outline and Notations

The paper is organized as follows: problem formulation and the proposed algorithm are given in Section II. In Section III, we explain the main result regarding the convergence rate of our algorithm. We compare our proposed methods with the state of the art algorithms in Section IV.

Throughout the paper, scalars are denoted by lowercase letters, vectors by lowercase boldface letters, and matrices by uppercase letters. The

th element of the vector is shown by or . The operators and are used to denote the trace and Hermitian of a matrix, respectively. represents the pseudo-inverse operator. The Frobenius inner product is denoted by . The orthogonal projection matrices onto the subspaces and are shown by

 PU:=UUH,

and

 PU⊥:=I−PU,

where is a basis fo r the subspace and

is the identity matrix. Also define the support of

by the linear subspace

 :=supp(X)

We define the linear operator as

 AX=[⟨X,A1⟩F,…,⟨X,Ap⟩F]T

for appropriate . Also, the adjoint operator of is defined as . is identity linear operator i.e. .

## Ii Algorithm Formulation

Let be a matrix with singular value decomposition (SVD) . Here, and are orthonormal bases. For an integer , we might have a rank truncation of as where and are obtained by retaining columns of and corresponding to the largest singular values i.e. and . We also denote the residual by . Another decomposition that we employ in this paper is the atomic decomposition.

###### Definition 1.

Let us denote an orthonormal set of rank-1 matrices in by . We define the smallest set of rank-1 matrices in that spans as

 atoms(X)=argminX{|Ψ|:Ψ⊂O,X∈span(Ψ)}, (9)

where and is the indicator function and returns the cardinality of a set.

As stated earlier, we have access to some prior information about the column and row spaces of . More explicitly, we are given the subspaces and that form the principal angles

 θu=∠[Ur,˜Ur], θv=∠[Vr,˜Vr],

with the column and row spaces of , respectively.

Diagonal values of s are supposed to be in the range of . Notice that and form distinct angles with each other. The directions corresponding to the principal angles i.e. are determined based on the angles i.e. if the angles are small, the corresponding directions shall be penalized less and vice versa. However, the diagonal values of and corresponding to other directions (not having effective role on the principal angles) are set to . We propose algorithms called RMSPI and GRMSPI to exploit this beneficial subspace information. Our proposed algorithms have very low computational cost. Indeed, our aim is to solve the following problem:

 ˆXrec:=Q−1˜UrargminZ∈Rn×nrank(Z)Q−1˜Vr s.t. y=A(Q−1˜UrZQ−1˜Vr):=B(Z). (10)

This problem is mathematically equivalent to (I).

In the first step of our greedy algorithms, we maximize the correlation between the residual matrix in each iteration and the atoms to update an estimate for the support of the true matrix i.e.

. To do so, we maximize the norm of the projection of the residual matrix over all the subspaces i.e.

 maxψ∈O{∥PψA∗(y−AˆXrec)∥F} =maxψ∈O|⟨(y−AˆXrec),Aψ⟩|,

to reach the new rank- matrix with support

 ´Ψ←argmaxΨ⊂O{∥PψA∗(y−AˆXrec)∥F:|Ψ|≤r}. (11)

We have provided the pseudo code of the proposed RMSPI and GRMSPI in Algorithm 1. The difference between RMSPI and GRMSPI is in choosing the matrices and . RMSPI uses (III) for and while GRMSPI uses (I). Given the principal angles, the optimal choice of weights in and

(either in RMSPI or GRMSPI) is challenging and beyond the scope of this paper. We, however, find the weights heuristically. In other words, we use the discussion following (

I).

The approach of our algorithm is based on the greedy method used in the vector case such as CoSaMP. At the beginning of each iteration (step 4), we obtain a set of atoms or support with size according to (11). In step 6, by solving the least squares problem, a good approximation of with rank and known support estimate is obtained; for more details of least squares method see Appendix A. The support in step 5 is obtained by concatenating the support estimate in step 4 and the support estimates in the previous iteration. In the final step, since our matrix has to be of rank , we retain only the directions corresponding to the largest correlations in step 7. In the final step, we use the least square solution to find the corresponding singular values.

## Iii Main Results

In this section, we investigate convergence guarantees for RMSPI and GRMSPI. We also compare our proposed algorithms with ADMiRA [lee2010admira].

Before stating our main theorem about convergence, we should emphasize that our proofs of convergence are completely different from [lee2010admira] which does not use subspace prior information. Similar to section II, we use the equivalent problem (II) instead of (I). First, we define the well-studied restricted isometry property (RIP) in the following.

###### Definition 2.

A linear operator satisfies RIP condition with constant if

 (1−δr(A))∥X∥2F≤∥AX∥22≤(1+δr(A))∥X∥2F (12)

holds for every that [foucart2017mathematical, Section 9.12].

In what follows, we include the relation between RIP constants of and .

###### Lemma 1.

Let and be defined as III and I Consider the operator

 B(⋅):=A(Q−1˜Ur⋅Q−1˜Vr). (13)

Then, we have:

 δ(A)≤δ(B). (14)

Proof. See Appendix B.

###### Theorem 1.

Let and be the ground-truth matrix and the solution of (II), respectively. Then, if satisfies

 δ4r(B)< ⎷√113−14≈0.4782,

we have that

 ∥(Xr−ˆXkrec)∥F≤ρk∥(Xr−X0)∥F + ⎷2(1+3δ24r(B))1−δ24r(B)∥PΦΔ´ΨB∗(´e)∥F +21−δ4r(B)∥P˜ΨB(´e)∥F,

where and .

Proof. See Appendix C.

Theorem 1 has some consequences which are included in the following remarks.

###### Remark 1.

In ADMiRA, it is proved that if then

 ∥(Xr−ˆXkrec)∥F≤2−k∥(Xr)∥F.

However, by Theorem 1, we find that if , then

 ∥(Xr−ˆXkrec)∥F≤0.05k∥(Xr)∥F,

which verifies the superiority of our algorithm in terms of convergence rate.

###### Remark 2.

With a fixed convergence rate, the RIP constant of our operator which is equipped with subspace prior information is more conservative than that in ADMiRA. For example, to reach a convergence rate of , in ADMiRA must be less than while in our algorithm, we need by Lemma 1. This in turn implies that the convergence rate of our algorithm is more than that in ADMiRA.

In some applications, we do not have access to all angles between and , and we are only aware of the maximum angle. Thus, we have to penalize the subspace or with only one weight. To follow this practical model, we occupationally use

 Q˜Ur:=w1P˜Ur+w2P˜U⊥r Q˜Vr:=w3P˜Vr+w4P˜V⊥r, (15)

in Algorithm 1 which we call RMSPI. This model might ignore the exact penalizations. This means that all the given subspace i.e. is penalized with a single weight. We investigate this simple model in Section IV.

## Iv Simulation Results

In this section, we provide numerical experiments which compares RMSPI and GRMSPI with ADMiRA [lee2010admira] in terms of the required iterations, success rate and computational complexity. Almost all of the experiments are implemented for with rank , except where otherwise stated. Each experiment is repeated times over the choice of .

Figures 1,1, 1 show the success rate for noisy matrix recovery, noiseless matrix recovery and noiseless matrix completion, respectively. In this experiment, we assume that and are close to and , so the principal angles are small; in other words and are counted as good estimates for and . Notice that we declare success when the normalized error in

iterations. We observe that GRMSPI needs much less measurements than ADMiRA to reach a fixed probability of success.

Figure 2,2, 2 show the success rate for noisy matrix recovery, noiseless matrix recovery and noiseless matrix completion, respectively. In this experiment we assume that and are close to and and the principal angles are small. We observe that the multi-weight scenario (GRMPSI) performs much better than the case of single weight (RMSPI).

Figures 3 and 4 show the cases for which and are either close to or far from and , respectively. As expected, the performance of RMSPI and GRMSPI are better than the others.

In Tables I to IV, we provide experiments in which we compare the algorithms in terms of , the number of required iterations for exact recovery in cases of different ranks. Notice that the considered cases for Tables I to IV are respectively the same as in Figures 1 to 4. Notice that we do not consider ADMiRA in Tables III, IV, since its performance is constant as it is not able to exploit the prior information.

In these experiments, we observe that if the number of measurements increases, we have a better SNR and the number of required iterations decreases. Also, the performance will increase if the matrix of interest has a lower rank.

In Table V, we compare RMSPI, GRMSPI, and nuclear norm minimization implemented by CVX [cvx] (succinctly shown by CVX) in terms of computational complexity and SNR. We observe that while CVX method outperforms the others in terms of SNR, its computational complexity is poorly large. Notice that [eftekhari2018weighted] does not consider the case of far subspaces. Thus, it has not been included in this experiment.

## Appendix A Proofs of least squares

###### Proof.

To solve least squares in step 6 by using (II), we solve the following least squares problem:

 ˜X←argminX{∥y−BX∥2:X∈span(˜Ψ)}. (16)

Due to the definition of operator, the least squares problem is converted to

 ˜x←argminx{∥y−Bx∥2}, (17)

where

 B=⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣vec(UHQ−1˜UrA1Q−1˜VrV)T⋮vec(UHQ−1˜UrAiQ−1˜VrV)T⋮vec(UHQ−1˜UrApQ−1˜VrV)T⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦

and

 ˜x=B†y.

Here, is the operation that vectorizes a matrix by concatenation of the columns. Finally, the solution of least squares reads

 ˜X=U reshape(˜x) VH, (18)

where write the vector in a matrix form with size … ∎

## Appendix B Proof of Lemma 1

Suppose that be the RIP constant of i.e. we have

 (1−δr(B))∥Z∥2F≤∥BZ∥22≤(1+δr(B))∥Z∥2F (19)

for every with in particular for any matrix with . Hence, we have:

 ∥B(Z′)∥2=∥A(Z)∥2≤(1+δ(B))∥Z′∥F≤ (1+δ(B))∥Q˜Ur∥2→2∥Z∥F∥Q˜Vr∥2→2≤(1+δ(B))∥Z∥F, (20)

where in the first relation, we used the definition of . The last step uses the relations

 ∥Q˜Ur∥2→2≤1 ∥Q˜Vr∥2→2≤1. (21)

Since is the smallest constant satisfying (12), it is straightforward from (B) to verify that .

## Appendix C Proof of convergence rate

###### Proof.

In this section, we provide a new method for proving the convergence rate of our algorithm. Define . We should highlight that due to the invertiblity of and , the convergence of to is equivalent to the convergence of to . Also, notice that the interested matrix might not be exactly of rank , hence we consider a rank-r truncation denoted by and the residual . We begin with steps 7 and 8 of Algorithm 1. We know that is a better estimate of (in terms of -term approximation) than . Due to the fact that , we conclude that

 (22)

Considering that and , we have

 ∥X1,r−ˆXk+1∥2F=∥P˜Ψ(X1,r−ˆXk+1)∥2F (23) +∥P˜Ψ⊥(X1,r−˜X)∥2F ≤4∥P˜Ψ(X1,r−˜X)∥2F+∥P˜Ψ⊥(X1,r−˜X)∥2F.

Let be a solution of the following least squares problem (step 6 of Algorithm 1).

 ˜X←argminZ{∥y−BZ∥2:Z∈span(˜Ψ)}.

Hence, by the properties of the least square solution, it holds that

 ⟨y−B˜X,BZ⟩=0∀Z: span(Z)⊂˜Ψ.

In particular, we have which is also equivalent to . Due to where