I. Introduction
The problem of recovering a low-rank matrix from undersampled measurements arises in many applications such as MRI [haldar2010spatiotemporal], [zhao2010low], collaborative filtering [srebro2010collaborative], the Netflix problem [bennett2007netflix], exploration seismology [aravkin2014fast], and quantum state tomography [gross2010quantum].
An idealized approach to this problem is to solve^{1} (^{1}In this work, we consider only square matrices. However, the extension to non-square matrices is straightforward.)
(1) $\min_{X} \operatorname{rank}(X) \quad \text{subject to} \quad \mathcal{A}(X) = b,$
which is known to be NP-hard in general [recht2010guaranteed]. Here, $\mathcal{A}: \mathbb{R}^{n \times n} \to \mathbb{R}^{m}$ is a linear measurement operator and $b \in \mathbb{R}^{m}$ is the measurement vector. Problem (1) can be relaxed and converted to the following tractable convex problem:
(2) $\min_{X} \|X\|_{*} \quad \text{subject to} \quad \mathcal{A}(X) = b,$
where $\|X\|_{*}$ sums the singular values of $X$ and is called the nuclear norm. There are many algorithms based on the SVD, truncated SVD, and greedy methods for matrix recovery and weighted matrix recovery (see Section I-B for more explanation); however, these algorithms are not capable of exploiting prior subspace information. In this paper, we propose an algorithm that uses prior information for low-rank matrix recovery based on CoSaMP^{2} (^{2}Compressive Sampling Matching Pursuit [needell2009cosamp]). Before introducing our algorithm for matrix recovery, it is necessary to know how to incorporate prior information into the matrix recovery problem. Let $\mathcal{U}_r$ and $\mathcal{V}_r$ be the column and row spaces of the ground-truth matrix $X^{\natural}$ with rank $r$, respectively. Suppose that we are given two subspaces $\widetilde{\mathcal{U}}_{r'}$ and $\widetilde{\mathcal{V}}_{r'}$ that form the principal angles^{3} $\theta_u$ and $\theta_v$ with $\mathcal{U}_r$ and $\mathcal{V}_r$, respectively (^{3}For the definition of principal angles between subspaces, see for example [daei2018optimal, Section II]). We also define the subspaces $\widetilde{\mathcal{U}}^{\perp}$ and $\widetilde{\mathcal{V}}^{\perp}$ as the orthogonal complements of $\widetilde{\mathcal{U}}$ and $\widetilde{\mathcal{V}}$, respectively. Then, the following problem is proposed for capturing both low-rankness and subspace prior information:
(3) $\min_{X} \|Q_{\widetilde{\mathcal{U}},\Lambda_1}\, X\, Q_{\widetilde{\mathcal{V}},\Lambda_2}\|_{*} \quad \text{subject to} \quad \mathcal{A}(X) = b,$
where
(4) $Q_{\widetilde{\mathcal{U}},\Lambda_1} := \widetilde{U} \Lambda_1 \widetilde{U}^{\mathrm{H}} + \widetilde{U}^{\perp} (\widetilde{U}^{\perp})^{\mathrm{H}}, \qquad Q_{\widetilde{\mathcal{V}},\Lambda_2} := \widetilde{V} \Lambda_2 \widetilde{V}^{\mathrm{H}} + \widetilde{V}^{\perp} (\widetilde{V}^{\perp})^{\mathrm{H}},$
and $\Lambda_1, \Lambda_2$ are diagonal matrices. Here, $\widetilde{U}$ and $\widetilde{U}^{\perp}$ are some bases of the subspaces $\widetilde{\mathcal{U}}$ and $\widetilde{\mathcal{U}}^{\perp}$, respectively (the same argument holds for $\widetilde{V}$ and $\widetilde{V}^{\perp}$). The diagonal entries of the $\Lambda$s specify how much the corresponding principal angles shall be penalized in the minimization problem. For instance, if $\widetilde{\mathcal{U}}$ is close to $\mathcal{U}_r$ and $\widetilde{\mathcal{V}}$ is far from $\mathcal{V}_r$, then the values on the diagonal of $\Lambda_1$ are small, while the values on the diagonal of $\Lambda_2$ shall be large.
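As a concrete illustration of how the weighting matrices in (4) can be assembled, the NumPy sketch below computes the principal angles between a prior subspace and the true column space via the SVD of $\widetilde{U}^{\mathrm{H}} U_r$ and builds the corresponding weighting matrix. The sin-based weighting rule and all dimensions are our own illustrative assumptions, not the paper's prescription.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 8, 2

# True column space U_r and a nearby prior estimate U_tilde (both orthonormal).
U_r = np.linalg.qr(rng.standard_normal((n, r)))[0]
U_tilde = np.linalg.qr(U_r + 0.1 * rng.standard_normal((n, r)))[0]

# Principal angles: the singular values of U_tilde^H U_r are their cosines.
W, cosines, _ = np.linalg.svd(U_tilde.T @ U_r)
theta = np.arccos(np.clip(cosines, -1.0, 1.0))

# Hypothetical weight rule: accurate directions (small angle) get small
# penalties, following the discussion after (4).
Lam = np.diag(np.sin(theta) + 1e-2)

# Align the basis of U_tilde with the principal directions, then build
# Q = B Lam B^H + P_{U_tilde^perp}  (cf. (4)).
B = U_tilde @ W
Q = B @ Lam @ B.T + (np.eye(n) - U_tilde @ U_tilde.T)
```

By construction, Q acts as the identity on the orthogonal complement of $\widetilde{\mathcal{U}}$ and shrinks each principal direction of $\widetilde{\mathcal{U}}$ according to how well it agrees with $\mathcal{U}_r$.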
I-A. Contributions
To highlight our contributions, we list them below:

Proposing a new optimization problem for matrix recovery and completion. As given in (3), we design a novel optimization problem that encourages both low-rankness and consistency with subspace prior information. The problem uses the principal angles between the given subspaces and the subspaces of the ground-truth matrix.

Proposing greedy algorithms for matrix recovery and completion. We propose new and efficient greedy algorithms named rank minimization with subspace prior information (RMSPI) and generalized RMSPI (GRMSPI) to solve (3). While RMSPI penalizes the principal angles with a single weight, GRMSPI is designed for a multi-weight scenario where each principal angle is penalized separately.

Convergence guarantee. We prove convergence guarantees for RMSPI and GRMSPI. The convergence rate of our proposed algorithms is superior to that of [lee2010admira].

We present a performance guarantee for RMSPI and GRMSPI in terms of the rank-restricted isometry property (RRIP) given in Section III. Simulation results show that even when the measurement operator does not satisfy the RRIP condition (for example, in the matrix completion problem), our proposed algorithms are still capable of recovering the matrix of interest exactly.

We examine our algorithms in the presence of noisy measurements and numerically observe that they are robust to measurement noise.
I-B. Related Works and Key Differences
Recht et al. in [recht2010guaranteed] convert problem (1) to the relaxed form (2). Following [recht2010guaranteed], Cai et al. in [cai2010singular] develop an algorithm for solving the problem
(5) $\min_{X} \ \tau \|X\|_{*} + \tfrac{1}{2} \|\mathcal{A}(X) - b\|_2^2,$
which is a regularized version of (2). Specifically, they provide an iterative algorithm based on soft singular value thresholding (SVT). Although the algorithm is designed for the noiseless case, its computational cost is rather low; however, it lacks any convergence rate analysis.
Ji et al. in [ji2009accelerated], Liu et al. in [liu2012implementable], and Toh et al. in [toh2010accelerated] independently provide algorithms based on gradient methods to solve problem (5). The purpose of these papers is to improve upon SVT and to reduce the number of iterations required for matrix recovery.
Mazumder et al. in [mazumder2010spectral] propose a convex algorithm for minimizing the error in each iteration under the condition that the nuclear norm is bounded. This algorithm requires computing an SVD in each step, which is costly.
Ma et al. in [ma2011fixed] propose an iterative algorithm for minimizing the nuclear norm by using Bregman divergences and a fixed-point iteration. This algorithm is fast and powerful, yet it lacks a convergence rate analysis. Also, it is only applicable in the absence of noise.
In [mishra2013low], [wen2010low], and [recht2013parallel], the authors propose methods based on approximating the nuclear norm by using its variational characterization.
The above-mentioned methods require calculating SVDs, which is expensive in general. Hence, in [wang2015orthogonal] and [lee2010admira], the authors propose algorithms based on greedy methods. More explicitly, they extend the well-studied methods OMP^{4} (^{4}Orthogonal Matching Pursuit [tropp2007signal]) and CoSaMP [needell2009cosamp] to the matrix case. These methods are observed to be more efficient than relaxation-based algorithms (e.g., nuclear norm minimization).
Despite their effectiveness in the matrix recovery problem, only a few works consider prior information [jain2013provable], [xu2013speedup], [eftekhari2018weighted]. The common feature of these algorithms is to use prior information in such a way that the number of required measurements is reduced.
The authors in [jain2013provable], [xu2013speedup] employ side information to enhance the performance of nuclear norm minimization. Their side information amounts to completely knowing a few directions of the matrix subspace, which differs from knowing the principal angles as we consider.
Aravkin et al. in [aravkin2014fast] and Eftekhari et al. in [eftekhari2018weighted] incorporate prior knowledge about the matrix column and row spaces into the recovery procedure. They consider the maximum principal angle between a given subspace (e.g., $\widetilde{\mathcal{U}}$) and the ground-truth subspace (e.g., $\mathcal{U}_r$). They show that as long as the given subspace is close to the true one, the required number of measurements decreases compared to regular nuclear norm minimization. Our model in (3) differs from the ones in [aravkin2014fast] and [eftekhari2018weighted] in that we penalize the given subspace with multiple weights instead of a single weight. Moreover, our model outperforms theirs whether $\widetilde{\mathcal{U}}$ is far from or close to $\mathcal{U}_r$. Besides the generality of our model, RMSPI and GRMSPI are considerably superior to the algorithms in [aravkin2014fast, eftekhari2018weighted] in terms of computational complexity (see Section IV).
The authors in [mohan2010reweighted] propose a reweighted alternative to rank minimization:
(6) $\min_{X \in \mathcal{C}} \ \|W_1 X W_2\|_{*},$
where $\mathcal{C}$ is the feasible set. In the $k$-th iteration, the algorithm solves
(7) $\min_{X \in \mathcal{C}} \ \|W_1^{k} X W_2^{k}\|_{*},$
where $W_1^{k}$ and $W_2^{k}$ are weighting matrices.
Finally, [rao2015collaborative] solves
(8) $\min_{X} \ \tfrac{1}{2}\|\mathcal{A}(X) - b\|_2^2 + \lambda \|P X Q\|_{*},$
in which $P$ and $Q$ are certain invertible matrices related to the subspace of interest. This problem is also similar to the one in [ji2009accelerated].
I-C. Outline and Notations
The paper is organized as follows: the problem formulation and the proposed algorithms are given in Section II. In Section III, we present the main result regarding the convergence rate of our algorithms. We compare our proposed methods with state-of-the-art algorithms in Section IV.
Throughout the paper, scalars are denoted by lowercase letters, vectors by lowercase boldface letters, and matrices by uppercase letters. The $i$-th element of the vector $x$ is shown by $x(i)$ or $x_i$. The operators $\operatorname{Tr}(\cdot)$ and $(\cdot)^{\mathrm{H}}$ are used to denote the trace and Hermitian of a matrix, respectively. $(\cdot)^{\dagger}$ represents the pseudo-inverse operator. The Frobenius inner product is denoted by $\langle X, Y \rangle_{\mathrm{F}} := \operatorname{Tr}(X Y^{\mathrm{H}})$. The orthogonal projection matrices onto a subspace $\mathcal{U}$ and its complement $\mathcal{U}^{\perp}$ are shown by $P_{\mathcal{U}} := U U^{\mathrm{H}}$ and $P_{\mathcal{U}^{\perp}} := I_n - U U^{\mathrm{H}}$, where $U$ is a basis for the subspace $\mathcal{U}$ and $I_n$ is the $n \times n$ identity matrix. We also define the support of $X$, denoted $\operatorname{supp}(X)$, as the linear subspace spanned by the atoms of $X$. We define the linear operator $\mathcal{P}_{\mathcal{S}}$ as the orthogonal projection onto a subspace $\mathcal{S}$ of matrices of appropriate size. The adjoint operator of $\mathcal{A}$ is denoted by $\mathcal{A}^{*}$. $\mathcal{I}$ is the identity linear operator, i.e., $\mathcal{I}(X) = X$.
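For concreteness, the operator $\mathcal{A}$ and its adjoint can be realized through a family of sensing matrices $A_1, \dots, A_m$; a minimal sketch, assuming i.i.d. Gaussian sensing matrices (an illustrative choice, not the only option):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 20

# Sensing matrices A_1, ..., A_m (here i.i.d. Gaussian, an assumption).
A_mats = rng.standard_normal((m, n, n)) / np.sqrt(m)

def A_op(X):
    """A(X)_i = <A_i, X>_F = Tr(A_i^H X)."""
    return np.tensordot(A_mats, X, axes=([1, 2], [0, 1]))

def A_adj(y):
    """Adjoint A*(y) = sum_i y_i A_i, so <A(X), y> = <X, A*(y)>_F."""
    return np.tensordot(y, A_mats, axes=(0, 0))
```

The defining adjoint identity $\langle \mathcal{A}(X), y \rangle = \langle X, \mathcal{A}^{*}(y) \rangle_{\mathrm{F}}$ can be verified numerically.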
II. Algorithm Formulation
Let $X \in \mathbb{R}^{n \times n}$ be a matrix with singular value decomposition (SVD) $X = U \Sigma V^{\mathrm{H}}$. Here, $U$ and $V$ are orthonormal bases. For an integer $r \le n$, we may form a rank-$r$ truncation of $X$ as $X_r = U_r \Sigma_r V_r^{\mathrm{H}}$, where $U_r$ and $V_r$ are obtained by retaining the $r$ columns of $U$ and $V$ corresponding to the $r$ largest singular values, i.e., $U_r = [u_1, \dots, u_r]$ and $V_r = [v_1, \dots, v_r]$. We also denote the residual by $X_{r,\perp} := X - X_r$. Another decomposition that we employ in this paper is the atomic decomposition.
Definition 1.
Let $\mathcal{O}$ denote an orthonormal set of rank-1 matrices in $\mathbb{R}^{n \times n}$. We define the smallest set of rank-1 matrices in $\mathcal{O}$ that spans $X$ as
(9) $\operatorname{supp}(X) := \underset{\mathcal{S} \subseteq \mathcal{O}}{\operatorname{argmin}} \ \{ |\mathcal{S}| : X \in \operatorname{span}(\mathcal{S}) \},$
where $|\cdot|$ returns the cardinality of a set.
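As a quick aside, the rank-$r$ truncation and residual introduced above follow directly from the SVD; a minimal NumPy sketch with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 8, 3

# A generic matrix with a full spectrum.
X = rng.standard_normal((n, n))

def rank_r_truncation(X, r):
    """Best rank-r approximation X_r = U_r S_r V_r^H (Eckart-Young)."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    # Keep only the r largest singular values and their directions.
    return (U[:, :r] * s[:r]) @ Vh[:r]

X_r = rank_r_truncation(X, r)
X_r_perp = X - X_r   # the residual X_{r,perp}
```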
As stated earlier, we have access to some prior information about the column and row spaces of $X^{\natural}$. More explicitly, we are given the subspaces $\widetilde{\mathcal{U}}_{r'}$ and $\widetilde{\mathcal{V}}_{r'}$ that form the principal angles $\theta_u$ and $\theta_v$ with the column and row spaces of $X^{\natural}$, respectively.
The diagonal values of the $\Lambda$s are supposed to be in the range $[0, 1]$. Notice that $\widetilde{\mathcal{U}}_{r'}$ and $\mathcal{U}_r$ form $r'$ distinct angles with each other. The weights corresponding to the principal directions are determined based on the angles, i.e., if the angles are small, the corresponding directions shall be penalized less, and vice versa. However, the diagonal values of $\Lambda_1$ and $\Lambda_2$ corresponding to the other directions (those having no effective role in the principal angles) are set to 1. We propose algorithms called RMSPI and GRMSPI to exploit this beneficial subspace information. Our proposed algorithms have very low computational cost. Indeed, our aim is to solve the following problem:
(10) $\min_{\bar{X}} \operatorname{rank}(\bar{X}) \quad \text{subject to} \quad \mathcal{A}\big(Q_{\widetilde{\mathcal{U}},\Lambda_1}^{-1} \bar{X} Q_{\widetilde{\mathcal{V}},\Lambda_2}^{-1}\big) = b, \qquad \bar{X} := Q_{\widetilde{\mathcal{U}},\Lambda_1} X Q_{\widetilde{\mathcal{V}},\Lambda_2}.$
This problem is mathematically equivalent to (3).
In the first step of our greedy algorithms, we maximize the correlation between the residual matrix in each iteration and the atoms in order to update an estimate of the support of the true matrix, i.e., $\operatorname{supp}(\bar{X}^{\natural})$. To do so, we maximize the norm of the projection of the back-projected residual over all candidate supports, to reach a new rank-$2r$ matrix with support
(11) $\mathcal{S}_k = \underset{\mathcal{S} \subseteq \mathcal{O},\ |\mathcal{S}| \le 2r}{\operatorname{argmax}} \ \big\| \mathcal{P}_{\mathcal{S}}\big( \mathcal{A}^{*}(b - \mathcal{A}(X_k)) \big) \big\|_{\mathrm{F}}.$
We have provided the pseudocode of the proposed RMSPI and GRMSPI in Algorithm 1. The difference between RMSPI and GRMSPI lies in the choice of the matrices $Q_{\widetilde{\mathcal{U}}}$ and $Q_{\widetilde{\mathcal{V}}}$: RMSPI uses (15), while GRMSPI uses (4). Given the principal angles, the optimal choice of the weights in $\Lambda_1$ and $\Lambda_2$ (either in RMSPI or GRMSPI) is challenging and beyond the scope of this paper. We therefore choose the weights heuristically, following the discussion after (4). The approach of our algorithms is based on the greedy methods used in the vector case, such as CoSaMP. At the beginning of each iteration (step 4), we obtain a set of atoms, i.e., a support estimate, of size $2r$ according to (11). The support in step 5 is obtained by concatenating the support estimate from step 4 with the support estimate from the previous iteration. In step 6, by solving a least squares problem, a good approximation with the known support estimate is obtained; for more details of the least squares step, see Appendix A. In step 7, since our estimate has to be of rank $r$, we retain only the $r$ directions corresponding to the largest correlations. In the final step, we use the least squares solution to find the corresponding singular values.
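To make the iteration concrete, here is a compact ADMiRA-style sketch of the greedy loop (steps 4-8) in NumPy. It is a simplified stand-in for Algorithm 1: the weighting by $Q_{\widetilde{\mathcal{U}}}$ and $Q_{\widetilde{\mathcal{V}}}$ is omitted (equivalently, all weights are set to 1), the operator is a generic $\mathcal{A}$ given by sensing matrices, and atom selection takes the top singular vectors of the back-projected residual.

```python
import numpy as np

def greedy_matrix_recovery(A_op, A_adj, b, n, r, n_iter=50):
    """CoSaMP/ADMiRA-style greedy loop (unweighted sketch of Algorithm 1).

    To emulate RMSPI/GRMSPI, one would run this on the weighted operator
    X -> A(Q_u^{-1} X Q_v^{-1}) and map the result back at the end.
    """
    X = np.zeros((n, n))
    old_U = np.zeros((n, 0))
    old_V = np.zeros((n, 0))
    for _ in range(n_iter):
        # Step 4: 2r new atoms from the back-projected residual.
        R = A_adj(b - A_op(X))
        U2, _, V2h = np.linalg.svd(R)
        # Step 5: merge with the support of the previous iterate.
        U_all = np.hstack([old_U, U2[:, :2 * r]])
        V_all = np.hstack([old_V, V2h[:2 * r].T])
        k = U_all.shape[1]
        # Step 6: least squares over the span of the selected rank-1 atoms.
        M = np.column_stack(
            [A_op(np.outer(U_all[:, i], V_all[:, i])) for i in range(k)])
        c, *_ = np.linalg.lstsq(M, b, rcond=None)
        X_tilde = sum(c[i] * np.outer(U_all[:, i], V_all[:, i])
                      for i in range(k))
        # Steps 7-8: truncate back to rank r and record the new support.
        U, s, Vh = np.linalg.svd(X_tilde)
        X = (U[:, :r] * s[:r]) @ Vh[:r]
        old_U, old_V = U[:, :r], Vh[:r].T
    return X
```

With an injective operator and enough measurements, this loop recovers a low-rank matrix from $b = \mathcal{A}(X^{\natural})$ in a few iterations.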
III. Main Results
In this section, we investigate convergence guarantees for RMSPI and GRMSPI. We also compare our proposed algorithms with ADMiRA [lee2010admira].
Before stating our main theorem about convergence, we should emphasize that our proofs of convergence are completely different from those of [lee2010admira], which does not use subspace prior information. As in Section II, we use the equivalent problem (10) instead of (3). First, we recall the well-studied restricted isometry property (RIP) in the following.
Definition 2.
A linear operator $\mathcal{A}$ satisfies the RIP condition with constant $\delta_r$ if
(12) $(1 - \delta_r)\, \|X\|_{\mathrm{F}}^2 \le \|\mathcal{A}(X)\|_2^2 \le (1 + \delta_r)\, \|X\|_{\mathrm{F}}^2$
holds for every $X$ with $\operatorname{rank}(X) \le r$ [foucart2017mathematical, Section 9.12].
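Definition 2 cannot be certified efficiently in general, but it can be probed numerically: sampling random rank-$r$ matrices yields an empirical lower bound on $\delta_r$. The Gaussian operator and sample sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, r = 8, 120, 2

# Gaussian operator scaled so that E ||A(X)||_2^2 = ||X||_F^2.
A_mats = rng.standard_normal((m, n, n)) / np.sqrt(m)

def A_op(X):
    return np.tensordot(A_mats, X, axes=([1, 2], [0, 1]))

# Empirical lower bound on delta_r: worst observed distortion of
# ||A(X)||_2^2 over random unit-Frobenius-norm rank-r matrices.
worst = 0.0
for _ in range(500):
    X = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
    X /= np.linalg.norm(X, 'fro')
    worst = max(worst, abs(np.linalg.norm(A_op(X)) ** 2 - 1.0))
```

Note that `worst` only underestimates $\delta_r$, since finitely many rank-$r$ matrices are tested; certifying the RIP would require a supremum over the whole rank-$r$ variety.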
In what follows, we include the relation between the RIP constants of $\mathcal{A}$ and the weighted operator $\mathcal{A}\big(Q_{\widetilde{\mathcal{U}},\Lambda_1}^{-1} \cdot Q_{\widetilde{\mathcal{V}},\Lambda_2}^{-1}\big)$ (Lemma 1).
Proof. See Appendix B.
Theorem 1.
Let $X^{\natural}$ and $\hat{X}$ be the ground-truth matrix and the solution of (10), respectively. Then, if the weighted operator satisfies the RIP condition of Definition 2 with a sufficiently small constant,
we have that the recovery error of the $k$-th iterate of Algorithm 1 decays geometrically in $k$, up to a residual term,
where the contraction factor and the constant of the residual term depend on the RIP constant and on the weights $\Lambda_1$ and $\Lambda_2$.
Proof. See Appendix C.
Theorem 1 has some consequences which are included in the following remarks.
Remark 1.
In ADMiRA, it is proved that the iterates converge geometrically provided that the RIP constant of $\mathcal{A}$ is below a certain threshold.
However, by Theorem 1, we find that under the corresponding condition on the weighted operator, our iterates converge at a faster geometric rate,
which verifies the superiority of our algorithm in terms of convergence rate.
Remark 2.
For a fixed convergence rate, the RIP condition required by our operator, which is equipped with subspace prior information, is less restrictive than that of ADMiRA. For example, to reach a convergence rate of 1/2, the RIP constant in ADMiRA must be below a certain threshold, while our algorithm tolerates a larger RIP constant by Lemma 1. This in turn implies that the convergence rate of our algorithm is higher than that of ADMiRA.
In some applications, we do not have access to all the principal angles between $\widetilde{\mathcal{U}}$ and $\mathcal{U}_r$, and we are only aware of the maximum angle. Thus, we have to penalize the subspace $\widetilde{\mathcal{U}}$ or $\widetilde{\mathcal{V}}$ with only one weight. To follow this practical model, we occasionally use
(15) $Q_{\widetilde{\mathcal{U}},\lambda} := \lambda\, \widetilde{U} \widetilde{U}^{\mathrm{H}} + \widetilde{U}^{\perp} (\widetilde{U}^{\perp})^{\mathrm{H}}, \qquad Q_{\widetilde{\mathcal{V}},\gamma} := \gamma\, \widetilde{V} \widetilde{V}^{\mathrm{H}} + \widetilde{V}^{\perp} (\widetilde{V}^{\perp})^{\mathrm{H}}, \qquad \lambda, \gamma \in (0, 1],$
in Algorithm 1, which we call RMSPI. This model might miss the exact per-direction penalization; that is, the whole given subspace $\widetilde{\mathcal{U}}$ is penalized with a single weight. We investigate this simple model in Section IV.
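The single-weight matrices in (15) are a special case of (4) with $\Lambda = \lambda I$; a short construction (the weight value and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 8, 2
lam = 0.3   # illustrative single weight, lam in (0, 1]

U_tilde = np.linalg.qr(rng.standard_normal((n, r)))[0]
P = U_tilde @ U_tilde.T                  # projection onto U_tilde
Q_single = lam * P + (np.eye(n) - P)     # cf. (15): one weight for all of U_tilde
```

Every direction of $\widetilde{\mathcal{U}}$ is shrunk by the same factor $\lambda$, which is what RMSPI uses when only the maximum principal angle is known.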
IV. Simulation Results
In this section, we provide numerical experiments that compare RMSPI and GRMSPI with ADMiRA [lee2010admira] in terms of the required number of iterations, success rate, and computational complexity. Unless otherwise stated, the experiments are performed on $n \times n$ matrices of rank $r$. Each experiment is repeated over independent draws of the measurement operator.
The panels of Figure 1 show the success rate for noisy matrix recovery, noiseless matrix recovery, and noiseless matrix completion, respectively. In this experiment, we assume that $\widetilde{\mathcal{U}}$ and $\widetilde{\mathcal{V}}$ are close to $\mathcal{U}_r$ and $\mathcal{V}_r$, so the principal angles are small; in other words, $\widetilde{\mathcal{U}}$ and $\widetilde{\mathcal{V}}$ count as good estimates of $\mathcal{U}_r$ and $\mathcal{V}_r$. Notice that we declare success when the normalized error falls below a fixed threshold within the allowed number of iterations. We observe that GRMSPI needs far fewer measurements than ADMiRA to reach a fixed probability of success.
The panels of Figure 2 show the success rate for noisy matrix recovery, noiseless matrix recovery, and noiseless matrix completion, respectively. In this experiment, we again assume that $\widetilde{\mathcal{U}}$ and $\widetilde{\mathcal{V}}$ are close to $\mathcal{U}_r$ and $\mathcal{V}_r$ and the principal angles are small. We observe that the multi-weight scenario (GRMSPI) performs much better than the single-weight one (RMSPI).
Figures 3 and 4 show the cases in which $\widetilde{\mathcal{U}}$ and $\widetilde{\mathcal{V}}$ are close to and far from $\mathcal{U}_r$ and $\mathcal{V}_r$, respectively. As expected, the performance of RMSPI and GRMSPI is better than that of the others.
In Tables I to IV, we provide experiments comparing the algorithms in terms of SNR and the number of iterations required for exact recovery for different ranks. The settings considered in Tables I to IV are respectively the same as in Figures 1 to 4. Notice that we do not include ADMiRA in Tables III and IV, since its performance is unchanged there: it is not able to exploit the prior information.
In these experiments, we observe that as the number of measurements increases, the SNR improves and the number of required iterations decreases. The performance also improves when the matrix of interest has lower rank.
In Table V, we compare RMSPI, GRMSPI, and nuclear norm minimization implemented in CVX [cvx] (denoted succinctly by CVX) in terms of computational complexity and SNR. We observe that while CVX outperforms the others in terms of SNR, its computational cost is prohibitively large. Notice that [eftekhari2018weighted] does not consider the case of far subspaces; thus, it is not included in this experiment.
Appendix A Proof of the Least Squares Step
Proof.
To carry out the least squares in step 6 using (10), we solve the following least squares problem:
(16) $\min_{X \in \operatorname{span}(\mathcal{S})} \ \|\mathcal{A}(X) - b\|_2^2.$
Due to the definition of the operator $\mathcal{A}$, the least squares problem is converted to
(17) $\min_{x} \ \|A x - b\|_2^2,$
where $A \in \mathbb{R}^{m \times n^2}$ is the matrix representation of $\mathcal{A}$ acting on vectorized matrices
and $x := \operatorname{vec}(X)$.
Here, $\operatorname{vec}(\cdot)$ is the operation that vectorizes a matrix by concatenating its columns. Finally, the solution of the least squares problem reads
(18) $x = A^{\dagger} b,$
and $\operatorname{vec}^{-1}(x)$ writes the vector $x$ in matrix form of size $n \times n$. ∎
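A sketch of this vectorization, with a hypothetical basis $B_1, \dots, B_k$ for the current support estimate (in Algorithm 1 these would be the selected rank-one atoms); the operator and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, k = 6, 30, 4

A_mats = rng.standard_normal((m, n, n)) / np.sqrt(m)
b = rng.standard_normal(m)

# Hypothetical basis matrices spanning the current support estimate.
B = [rng.standard_normal((n, n)) for _ in range(k)]

# Column j of Phi is A applied to the j-th basis matrix, so the
# constrained problem (16) becomes the ordinary least squares (17).
Phi = np.column_stack(
    [np.tensordot(A_mats, Bj, axes=([1, 2], [0, 1])) for Bj in B])

# Solve via the pseudo-inverse as in (18), then map back to matrix form.
c = np.linalg.pinv(Phi) @ b
X_ls = sum(cj * Bj for cj, Bj in zip(c, B))
```

Working in the coefficient space of the support basis keeps the linear system small ($m \times k$ instead of $m \times n^2$).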
Appendix B Proof of Lemma 1
Suppose that $\delta_r$ is the RIP constant of $\mathcal{A}$, i.e., we have
(19) $(1 - \delta_r)\, \|X\|_{\mathrm{F}}^2 \le \|\mathcal{A}(X)\|_2^2 \le (1 + \delta_r)\, \|X\|_{\mathrm{F}}^2$
for every $X$ with $\operatorname{rank}(X) \le r$, in particular for any matrix of the form $Q_{\widetilde{\mathcal{U}},\Lambda_1}^{-1} \bar{X} Q_{\widetilde{\mathcal{V}},\Lambda_2}^{-1}$ with $\operatorname{rank}(\bar{X}) \le r$. Hence, we have:
(20) 
where in the first relation we used the definition of the weighted operator $\mathcal{A}\big(Q_{\widetilde{\mathcal{U}},\Lambda_1}^{-1} \cdot Q_{\widetilde{\mathcal{V}},\Lambda_2}^{-1}\big)$. The last step uses the relations
(21) 
Since the RIP constant of the weighted operator is the smallest constant satisfying (12), it is straightforward from (20) to verify the claimed bound of Lemma 1. ∎
Appendix C Proof of convergence rate
Proof.
In this appendix, we provide a new method for proving the convergence rate of our algorithm. Define $\bar{X} := Q_{\widetilde{\mathcal{U}},\Lambda_1} X Q_{\widetilde{\mathcal{V}},\Lambda_2}$. We should highlight that, due to the invertibility of $Q_{\widetilde{\mathcal{U}},\Lambda_1}$ and $Q_{\widetilde{\mathcal{V}},\Lambda_2}$, the convergence of $\bar{X}_k$ to $\bar{X}^{\natural}$ is equivalent to the convergence of $X_k$ to $X^{\natural}$. Also, notice that the matrix of interest might not be exactly of rank $r$; hence, we consider a rank-$r$ truncation denoted by $\bar{X}^{\natural}_r$ and the residual $\bar{X}^{\natural}_{r,\perp}$. We begin with steps 7 and 8 of Algorithm 1. We know that the iterate produced in step 7 is a better rank-$r$ estimate (in the sense of best $r$-term approximation) than $\bar{X}^{\natural}_r$ is; from this, we conclude that
(22) 
Considering the preceding relations, we have
(23)  
Let $\tilde{X}$ be a solution of the least squares problem in step 6 of Algorithm 1.
Hence, by the optimality properties of the least squares solution, the residual $b - \mathcal{A}(\tilde{X})$ is orthogonal to the span of the selected atoms;
in particular, the normal equations hold on the selected support.