Rank minimization (RM) addresses the recovery of a low rank matrix from a set of linear measurements that project the matrix onto a lower dimensional space. The problem has gained extensive attention in the past few years due to its promising applicability in many practical problems. Suppose that $X_0$ is a low rank matrix of size $n_1\times n_2$ and let $m<n_1n_2$. Further, let $\mathcal{A}:\mathbb{R}^{n_1\times n_2}\to\mathbb{R}^m$ be a linear measurement operator. Given the measurements $\mathcal{A}(X_0)$, the problem is to recover $X_0$ using the knowledge that it is low rank. Provided that $X_0$ is the solution with the lowest rank, this problem can be formulated as the following minimization program:

$$\min_{X}\ \operatorname{rank}(X)\quad\text{subject to}\quad\mathcal{A}(X)=\mathcal{A}(X_0).\qquad(1)$$
The function $\operatorname{rank}(\cdot)$ is non-convex, and it turns out that (1) is NP-hard, so no efficient algorithm for it is known in general. Fazel et al. suggested replacing the rank with the nuclear norm heuristic, its tightest convex relaxation. The resulting convex program is called nuclear norm minimization (NNM):

$$\min_{X}\ \|X\|_\star\quad\text{subject to}\quad\mathcal{A}(X)=\mathcal{A}(X_0).\qquad(2)$$
Here $\|\cdot\|_\star$ denotes the nuclear norm of its argument, i.e., the sum of its singular values. Program (2) can be written as a semidefinite program (SDP) and can thus be solved in polynomial time. Recent works have studied sufficient conditions under which (2) recovers $X_0$ (i.e., $X_0$ is the unique minimizer of (2)). It has been shown that, similar to compressed sensing, the Restricted Isometry Property (RIP) is a sufficient condition for the success of (2), and that $O(rn\log n)$ measurements are enough to guarantee RIP with high probability for $n\times n$ matrices of rank $r$. Candes later extended these results and showed that a minimal sampling of $O(rn)$ measurements is in fact enough to have RIP and hence recovery. In later works [4, 12], necessary and sufficient null space conditions were derived and analyzed for Gaussian measurement operators, i.e., operators whose entries are i.i.d. Gaussian, leading to thresholds for the success of (2). These thresholds establish explicit relationships between the problem parameters, as opposed to the order-wise relationships that result from RIP techniques. However, these results are far from optimal in the low rank regime, which calls for a new approach. In particular, if the matrix is $n\times n$ and has rank $r$, then even when $r/n$ is very small, they still require a large minimum sampling rate for success. In this paper, we develop a novel null space analysis for the rank minimization problem and find significantly better thresholds than those of [4, 12]. Although the analysis is new for the rank minimization problem, we essentially follow the analysis developed for compressed sensing by Stojnic, which is based on a seminal result of Gordon (escape through a mesh). In addition to the analysis of general matrices, we give a separate analysis for positive semidefinite matrices, which play the role that nonnegative vectors play in compressed sensing. We also consider the case of unique positive semidefinite solutions, which was recently analyzed by Xu.
We make extensive use of Stojnic's results. Specifically, we slightly modify Lemmas 2, 5 and 7 of his work and apply them to the null space conditions for the NNM problem. The strength of this analysis comes from the facts that it is more accessible and that Stojnic's weak threshold matches the exact threshold previously obtained for compressed sensing. In fact, while it is not at all clear how to extend that earlier analysis from compressed sensing to NNM, it is relatively straightforward to do so for Stojnic's. Our simulation results also suggest that our thresholds for the NNM problem are tight. This is perhaps not surprising since, as we shall see, the null space conditions for NNM and compressed sensing are very similar.
2 Basic Definitions and Notations
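Denote the identity matrix of size $n\times n$ by $I_n$. We call $U\in\mathbb{R}^{n\times r}$ partial unitary if the columns of $U$ form an orthonormal set, i.e., $U^TU=I_r$. Clearly we need $r\le n$ for $U$ to be partial unitary. Also, for a partial unitary $U\in\mathbb{R}^{n\times r}$, let $U^\perp$ denote an arbitrary partial unitary of size $n\times(n-r)$ such that
$$\begin{bmatrix}U & U^\perp\end{bmatrix}$$
is a unitary matrix (i.e., its columns form a complete orthonormal basis of $\mathbb{R}^n$).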
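As a small numerical illustration (a sketch assuming NumPy; the helper name `orthogonal_complement` is ours, not from the text), a complement $U^\perp$ can be read off from a complete QR factorization of $U$. In the real case, "unitary" simply means orthogonal.

```python
import numpy as np

def orthogonal_complement(U):
    """Return U_perp (n x (n - r)) such that [U, U_perp] is orthogonal, for real U with orthonormal columns."""
    n, r = U.shape
    # Complete QR: the first r columns of Q span range(U), the remaining n - r span its complement.
    Q, _ = np.linalg.qr(U, mode="complete")
    return Q[:, r:]

rng = np.random.default_rng(0)
n, r = 6, 2
U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # a random partial unitary U
U_perp = orthogonal_complement(U)
full = np.hstack([U, U_perp])

print(np.allclose(U.T @ U, np.eye(r)))        # True: U is partial unitary
print(np.allclose(full.T @ full, np.eye(n)))  # True: [U, U_perp] is unitary
```

Any other choice of $U^\perp$ differs from this one by an $(n-r)\times(n-r)$ orthogonal rotation, which is why the text can call it arbitrary.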
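For a matrix $X\in\mathbb{R}^{n_1\times n_2}$, we denote the singular values by $\sigma_1(X)\ge\sigma_2(X)\ge\dots\ge\sigma_q(X)$, where $q=\min(n_1,n_2)$. The (skinny) singular value decomposition (SVD) of $X$ is written as $X=U\Sigma V^T$, where $U\in\mathbb{R}^{n_1\times r}$, $\Sigma\in\mathbb{R}^{r\times r}$ and $V\in\mathbb{R}^{n_2\times r}$, with $r=\operatorname{rank}(X)$. Note that $U$ and $V$ are partial unitary and $\Sigma$ is positive, diagonal and full rank. Also, let $\vec{\sigma}(X)$ denote the vector of increasingly ordered singular values of $X$, i.e., $\vec{\sigma}(X)=[\sigma_q(X),\dots,\sigma_1(X)]^T$.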
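The skinny SVD and the increasingly ordered vector of singular values can be computed with NumPy as follows. This is a minimal sketch; the helper `skinny_svd` and the tolerance used to decide which singular values count as nonzero are our own choices.

```python
import numpy as np

def skinny_svd(X, tol=1e-10):
    """Return U, s, V with X = U @ diag(s) @ V.T, keeping only singular values above tol."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)  # s is sorted in decreasing order
    r = int(np.sum(s > tol))                           # numerical rank
    return U[:, :r], s[:r], Vt[:r].T

rng = np.random.default_rng(1)
X = rng.standard_normal((7, 3)) @ rng.standard_normal((3, 5))  # rank 3 almost surely
U, s, V = skinny_svd(X)
print(U.shape, s.shape, V.shape)  # (7, 3) (3,) (5, 3)

# U and V are partial unitary; Sigma = diag(s) is positive, diagonal and full rank.
assert np.allclose(U.T @ U, np.eye(3)) and np.allclose(V.T @ V, np.eye(3))
assert np.allclose(U @ np.diag(s) @ V.T, X)

# All q = min(n1, n2) singular values, in increasing order.
sigma_inc = np.linalg.svd(X, compute_uv=False)[::-1]
```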
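The (Schatten) $\ell_p$ norm of $X$, denoted by $\|X\|_p$, is defined as $\|X\|_p=\left(\sum_{i=1}^{q}\sigma_i(X)^p\right)^{1/p}$. When $p=1$ it is called the nuclear norm, i.e., $\|X\|_\star=\|X\|_1$, and when $p=\infty$ it is equivalent to the spectral norm, denoted by $\|X\|=\sigma_1(X)$. Also, the Frobenius norm is denoted by $\|X\|_F=\|X\|_2$. Note that we always have:
$$\|X\|\le\|X\|_F\le\|X\|_\star\le\sqrt{\operatorname{rank}(X)}\,\|X\|_F.$$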
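For a linear operator $\mathcal{A}$ acting on a linear space, we denote the null space of $\mathcal{A}$ by $\mathcal{N}(\mathcal{A})$, i.e., $W\in\mathcal{N}(\mathcal{A})$ iff $\mathcal{A}(W)=0$. We denote by $\mathcal{G}(n_1,n_2)$ the ensemble of real $n_1\times n_2$ matrices in which the entries are i.i.d. $\mathcal{N}(0,1)$ (zero-mean, unit variance Gaussian).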
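The $\ell_p$ norms defined above are all functions of the singular values, so they can be computed from a single SVD. A NumPy sketch (the helper `schatten_norm` is hypothetical) that checks them against `np.linalg.norm` and checks the standard chain $\|X\|\le\|X\|_F\le\|X\|_\star\le\sqrt{\operatorname{rank}(X)}\,\|X\|_F$:

```python
import numpy as np

def schatten_norm(X, p):
    """l_p norm of the vector of singular values; p = np.inf gives the spectral norm."""
    s = np.linalg.svd(X, compute_uv=False)
    if np.isinf(p):
        return s.max()
    return (s**p).sum() ** (1.0 / p)

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 6))  # rank 2 almost surely
nuc, fro, spec = schatten_norm(X, 1), schatten_norm(X, 2), schatten_norm(X, np.inf)

# Agreement with NumPy's built-in matrix norms.
assert np.isclose(nuc, np.linalg.norm(X, "nuc"))
assert np.isclose(fro, np.linalg.norm(X, "fro"))
assert np.isclose(spec, np.linalg.norm(X, 2))

r = np.linalg.matrix_rank(X)
print(spec <= fro <= nuc <= np.sqrt(r) * fro)  # True
```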
It is well known that the normalized singular values of a square matrix with i.i.d. Gaussian entries asymptotically follow the quarter circle distribution. In other words, the histogram of the singular values (normalized by $\sqrt{n}$) converges to the function
$$\phi(x)=\frac{1}{\pi}\sqrt{4-x^2},\qquad 0\le x\le 2.$$
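Similarly, the distribution of the squares of the singular values (normalized by $n$) converges to the well-known Marcenko-Pastur distribution
$$\frac{1}{2\pi}\sqrt{\frac{4-x}{x}},\qquad 0<x\le 4.$$
Note that this is nothing but the distribution of the eigenvalues of $\frac{1}{n}GG^T$, where $G$ is a square matrix drawn from $\mathcal{G}(n,n)$. Let $\Psi$ be the cumulative distribution function of $\phi$, i.e.,
$$\Psi(x)=\int_0^x\phi(u)\,du.\qquad(6)$$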
Let . We define to be the asymptotic normalized expected value of the norm of a matrix drawn from , i.e.:
Similarly define to be the asymptotic normalized expected value of the norm of a matrix where is drawn from :
Note that these limits exist and is well defined .
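The quarter circle law and the normalized limits above can be checked numerically. The sketch below assumes NumPy; the density $\phi(x)=\frac{1}{\pi}\sqrt{4-x^2}$ on $[0,2]$ and its closed-form c.d.f. $\Psi(x)=\frac{1}{\pi}\left(\frac{x\sqrt{4-x^2}}{2}+2\arcsin\frac{x}{2}\right)$ are standard facts, while the matrix size and seed are arbitrary. As one instance of such a limit, it compares the normalized nuclear norm $\|G\|_\star/n^{3/2}$ with $\int_0^2x\,\phi(x)\,dx=\frac{8}{3\pi}$; we do not claim this is exactly the quantity defined above.

```python
import numpy as np

def quarter_circle_pdf(x):
    """phi(x) = sqrt(4 - x^2) / pi on [0, 2], and 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    inside = (x >= 0) & (x <= 2)
    return np.where(inside, np.sqrt(np.clip(4 - x**2, 0, None)) / np.pi, 0.0)

def quarter_circle_cdf(x):
    """Psi(x) = integral_0^x phi(u) du, in closed form."""
    x = np.clip(np.asarray(x, dtype=float), 0.0, 2.0)
    return (x * np.sqrt(4 - x**2) / 2 + 2 * np.arcsin(x / 2)) / np.pi

rng = np.random.default_rng(3)
n = 400
G = rng.standard_normal((n, n))                      # G drawn from G(n, n)
s = np.linalg.svd(G, compute_uv=False) / np.sqrt(n)  # normalized singular values

# Largest gap between the empirical c.d.f. and Psi (small for large n).
s_sorted = np.sort(s)
ks_gap = np.max(np.abs(np.arange(1, n + 1) / n - quarter_circle_cdf(s_sorted)))

# Squared normalized singular values are the eigenvalues of G G^T / n (Marcenko-Pastur).
eig = np.linalg.eigvalsh(G @ G.T / n)
assert np.allclose(np.sort(eig), np.sort(s**2), atol=1e-8)

# Normalized nuclear norm ||G||_* / n^{3/2} approaches the mean of phi, 8 / (3 pi).
print(ks_gap, s.mean(), 8 / (3 * np.pi))
```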
A function $f:\mathbb{R}^N\to\mathbb{R}$ is called $L$-Lipschitz if for all $x,y\in\mathbb{R}^N$ we have:
$$|f(x)-f(y)|\le L\,\|x-y\|_2.$$
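We say that an orthogonal projection pair $(P,Q)$ is a support of the matrix $X$ if $PXQ=X$. In particular, $(P,Q)$ is the unique support of $X$ if $P$ and $Q$ are orthogonal projectors with $\operatorname{rank}(P)=\operatorname{rank}(Q)=\operatorname{rank}(X)$ such that $PXQ=X$. In other words, if $X=U\Sigma V^T$ is the skinny SVD of $X$, then $P=UU^T$ and $Q=VV^T$.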
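The snippet below assumes that a pair $(P,Q)$ supports $X$ when $PXQ=X$, and builds the unique support from the skinny SVD as $P=UU^T$, $Q=VV^T$; this reading of the definition is our assumption. It is a NumPy sketch with arbitrary sizes.

```python
import numpy as np

rng = np.random.default_rng(11)
n1, n2, r = 6, 5, 2
X = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))   # rank-2 matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)
P = U[:, :r] @ U[:, :r].T      # orthogonal projector onto the column space of X
Q = Vt[:r].T @ Vt[:r]          # orthogonal projector onto the row space of X

assert np.allclose(P @ X @ Q, X)                     # (P, Q) supports X
assert np.linalg.matrix_rank(P) == np.linalg.matrix_rank(Q) == r

# A generic rank-r projector pair does not support X.
Z, _ = np.linalg.qr(rng.standard_normal((n1, r)))
P_bad = Z @ Z.T
print(np.allclose(P_bad @ X @ Q, X))   # False
```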
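We say $\mathcal{A}$ is a random Gaussian measurement operator if the $i$'th measurement is $\mathcal{A}(X)_i=\langle A_i,X\rangle=\operatorname{tr}(A_i^TX)$, where the $A_i$'s are i.i.d. matrices drawn from $\mathcal{G}(n_1,n_2)$ for all $1\le i\le m$. Note that this is equivalent to $\mathcal{A}(X)=A\operatorname{vec}(X)$, where $A\in\mathbb{R}^{m\times n_1n_2}$ has i.i.d. $\mathcal{N}(0,1)$ entries (its $i$'th row is $\operatorname{vec}(A_i)^T$) and $\operatorname{vec}(X)$ is obtained by stacking the columns of $X$ on top of each other to get a vector of size $n_1n_2$.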
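The two descriptions of a random Gaussian measurement operator agree, as the following NumPy sketch shows. The sizes are arbitrary, and `measure` and `vec` are illustrative helper names; stacking columns corresponds to NumPy's Fortran (`order="F"`) reshaping.

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2, m = 4, 5, 12
A_mats = rng.standard_normal((m, n1, n2))  # A_1, ..., A_m drawn i.i.d. from G(n1, n2)

def measure(X):
    """A(X)_i = <A_i, X> = tr(A_i^T X)."""
    return np.einsum("kij,ij->k", A_mats, X)

def vec(X):
    """Stack the columns of X into a single vector of length n1 * n2."""
    return X.reshape(-1, order="F")

# Row i of the matrix form is vec(A_i)^T, so A(X) = A_mat @ vec(X).
A_mat = np.stack([vec(Ai) for Ai in A_mats])
X = rng.standard_normal((n1, n2))
print(np.allclose(measure(X), A_mat @ vec(X)))  # True
```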
Model complexity is defined as the number of degrees of freedom of the matrix. For a matrix of size $n_1\times n_2$ and rank $r$, the model complexity is $r(n_1+n_2-r)$. We then define the normalized model complexity to be $\frac{r(n_1+n_2-r)}{n_1n_2}$.
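The count $r(n_1+n_2-r)$ is the dimension of the manifold of rank-$r$ matrices, i.e., of its tangent space $\{UC^T+BV^T\}$ at a point $X=U\Sigma V^T$. The sketch below (NumPy; helper names are hypothetical) checks this numerically by computing the rank of the map $(B,C)\mapsto UC^T+BV^T$. For square matrices with $r=\beta n$, the normalized model complexity is $\beta(2-\beta)$.

```python
import numpy as np

def model_complexity(n1, n2, r):
    """Degrees of freedom of an n1 x n2 matrix of rank r."""
    return r * (n1 + n2 - r)

def tangent_dimension(n1, n2, r, seed=0):
    """Numerical dimension of {U C^T + B V^T} at a random rank-r point X = U S V^T."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((n1, r)))
    V, _ = np.linalg.qr(rng.standard_normal((n2, r)))
    columns = []
    for k in range(n2 * r):                 # directions U C^T, C in R^{n2 x r}
        C = np.zeros(n2 * r)
        C[k] = 1.0
        columns.append((U @ C.reshape(n2, r).T).ravel())
    for k in range(n1 * r):                 # directions B V^T, B in R^{n1 x r}
        B = np.zeros(n1 * r)
        B[k] = 1.0
        columns.append((B.reshape(n1, r) @ V.T).ravel())
    return np.linalg.matrix_rank(np.column_stack(columns))

n1, n2, r = 6, 5, 2
print(model_complexity(n1, n2, r), tangent_dimension(n1, n2, r))  # 18 18
print(model_complexity(n1, n2, r) / (n1 * n2))                    # normalized: 0.6
```

The $(n_1+n_2)r$ parameters overcount by $r^2$, because $(B,C)=(UM,-VM^T)$ maps to zero for every $r\times r$ matrix $M$.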
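Finally, let $\succeq$ denote "greater than or equal to" in partially ordered sets. In particular, if $X,Y$ are Hermitian matrices, then $X\succeq Y$ iff $X-Y$ is positive semidefinite. Similarly, for two given vectors $x,y$ we write $x\succeq y$ iff $x_i\ge y_i$ for all $i$.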
3 Key Lemmas to be Used
In this section, we state several lemmas that we will make use of later. Proofs that are omitted can be found in the given references.
Proof omitted (see the references).
In case of vectors (i.e. matrices are diagonal) we have the following simple extension: Let be vectors. Let be ’th largest value of vector (i.e. ) then
Let . Let and let be a decreasingly ordered arrangement of . Then we have the following inequality:
In particular we have:
If matrix then we have:
Proof omitted (see the references).
Similarly, we have the following obvious inequality when is square ():
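The dual norm of the nuclear norm is the spectral norm, i.e., $\|X\|_\star=\max_{\|Y\|\le1}\langle X,Y\rangle$. Recall that $I$ denotes the identity matrix; since $\|I\|=1$, for square $X$ we have:
$$\operatorname{tr}(X)=\langle X,I\rangle\le\|X\|_\star.$$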
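The duality between the nuclear and spectral norms can be seen numerically: the maximizer of $\langle X,Y\rangle$ over $\|Y\|\le1$ is $Y^\star=UV^T$ from the skinny SVD $X=U\Sigma V^T$. A NumPy sketch (sizes and seed arbitrary), including the special case $Y=I$:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((5, 4))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
nuc = s.sum()                                  # ||X||_*

# Y* = U V^T has spectral norm 1 and attains <X, Y*> = ||X||_*.
Y_star = U @ Vt
assert np.isclose(np.linalg.norm(Y_star, 2), 1.0)
assert np.isclose(np.sum(X * Y_star), nuc)

# Random Y rescaled to spectral norm 1 never beats it.
best_random = 0.0
for _ in range(1000):
    Y = rng.standard_normal((5, 4))
    Y /= np.linalg.norm(Y, 2)
    best_random = max(best_random, np.sum(X * Y))
print(best_random <= nuc)  # True

# With Y = I (square case, ||I|| = 1): tr(Z) <= ||Z||_*.
Z = rng.standard_normal((4, 4))
print(np.trace(Z) <= np.linalg.norm(Z, "nuc"))  # True
```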
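(Escape through a mesh, Gordon) Let $S$ be a subset of the unit Euclidean sphere $S^{d-1}$ in $\mathbb{R}^d$. Let $Y$ be a random $(d-m)$-dimensional subspace of $\mathbb{R}^d$, distributed uniformly in the Grassmannian with respect to the Haar measure. Let
$$w(S)=\mathbb{E}\Big[\sup_{x\in S}h^Tx\Big],$$
where $h$ is a column vector drawn from $\mathcal{N}(0,I_d)$. Then if $w(S)<\sqrt{m}-\frac{1}{4\sqrt{m}}$, we have:
$$\mathbb{P}(Y\cap S=\emptyset)>1-3.5\exp\left(-\frac{\left(\sqrt{m}-\frac{1}{4\sqrt{m}}-w(S)\right)^2}{18}\right).$$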
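To make the Gaussian width $w(S)$ concrete, the sketch below (NumPy; function names hypothetical) estimates it by Monte Carlo for a toy set, the unit-norm vectors in $\mathbb{R}^d$ with at most $k$ nonzero entries, for which $\sup_{x\in S}h^Tx$ is the $\ell_2$ norm of the $k$ largest $|h_i|$. This is not the set used in the null space analysis. It then finds the smallest $m$ with $w(S)<\sqrt{m}-\frac{1}{4\sqrt{m}}$; at that $m$ the hypothesis holds but the probability bound is still vacuous, so a useful guarantee needs some margin.

```python
import numpy as np

def width_sparse_sphere(d, k, trials=2000, seed=0):
    """Monte Carlo estimate of w(S) for S = unit-norm vectors in R^d with at most k nonzeros."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((trials, d))          # rows are h ~ N(0, I_d)
    # For this S, sup_{x in S} h^T x equals the l2 norm of the k largest |h_i|.
    top_k = np.sort(np.abs(h), axis=1)[:, -k:]
    return np.sqrt((top_k**2).sum(axis=1)).mean()

def gordon_min_m(width):
    """Smallest integer m with width < sqrt(m) - 1/(4 sqrt(m))."""
    m = 1
    while np.sqrt(m) - 1 / (4 * np.sqrt(m)) <= width:
        m += 1
    return m

d, k = 200, 10
w = width_sparse_sphere(d, k)
m = gordon_min_m(w)
print(round(w, 2), m)
```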
For all , is a function of .
Let be such that and . But then from Lemma (2) we have:
For analyzing positive semidefinite matrices, we will introduce some more definitions and lemmas later on.
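The Lipschitz arguments in this section rely on perturbation bounds for singular values. A standard bound of this type is Mirsky's inequality, $\|\sigma(X)-\sigma(Y)\|_2\le\|X-Y\|_F$ with $\sigma(\cdot)$ the vector of sorted singular values, together with its vector analogue for sorted absolute values. The Monte Carlo check below (NumPy; sizes and seed arbitrary) only illustrates these inequalities and is not a proof; whether they match the exact form of the lemmas cited here is our assumption.

```python
import numpy as np

rng = np.random.default_rng(10)
worst_matrix, worst_vector = 0.0, 0.0
for _ in range(2000):
    X = rng.standard_normal((6, 4))
    Y = X + 0.5 * rng.standard_normal((6, 4))
    sx = np.linalg.svd(X, compute_uv=False)   # decreasing order
    sy = np.linalg.svd(Y, compute_uv=False)
    worst_matrix = max(worst_matrix, np.linalg.norm(sx - sy) / np.linalg.norm(X - Y, "fro"))

    # Vector analogue: sorting the absolute values is also 1-Lipschitz in l2.
    x, y = rng.standard_normal(8), rng.standard_normal(8)
    gap = np.linalg.norm(np.sort(np.abs(x)) - np.sort(np.abs(y)))
    worst_vector = max(worst_vector, gap / np.linalg.norm(x - y))

print(worst_matrix <= 1 + 1e-12, worst_vector <= 1 + 1e-12)  # True True
```

A consequence is that any function of the singular values that is $L$-Lipschitz in $\ell_2$ is also $L$-Lipschitz as a function of the matrix in the Frobenius norm.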
4 Thresholds for Square Matrices
In this section, we give and analyze strong, sectional and weak null space conditions for square matrices ($n_1=n_2=n$). With minor modifications, one can obtain equivalent results for rectangular matrices ($n_1\neq n_2$).
4.1 Strong Threshold
Strong recovery threshold.
Let $\mathcal{A}$ be a random Gaussian operator. We define $\beta$ ($0<\beta<1$) to be the strong recovery threshold if, with high probability, $\mathcal{A}$ satisfies the following property:
Any matrix $X$ with rank at most $\beta n$ can be recovered from the measurements $\mathcal{A}(X)$ via (2).
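Using (2), one can recover all matrices of rank at most $r$ if and only if for all $W\in\mathcal{N}(\mathcal{A})\setminus\{0\}$ we have
$$\sum_{i=1}^{r}\sigma_i(W)<\sum_{i=r+1}^{n}\sigma_i(W).\qquad(20)$$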
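Indeed, suppose (20) holds, let $X_0$ have rank at most $r$, and let $W\in\mathcal{N}(\mathcal{A})\setminus\{0\}$. Applying the singular value inequality $\|A-B\|_\star\ge\sum_i|\sigma_i(A)-\sigma_i(B)|$ with $A=X_0$ and $B=-W$ gives
$$\|X_0+W\|_\star\ge\sum_{i=1}^{r}\big(\sigma_i(X_0)-\sigma_i(W)\big)+\sum_{i=r+1}^{n}\sigma_i(W)>\|X_0\|_\star,$$
hence $X_0$ is the unique minimizer of (2). Conversely, if (20) does not hold for some $W$, choose $X_0=-W_r$, where $W_r$ is the matrix obtained by setting all but the $r$ largest singular values of $W$ to 0. Then $\|X_0+W\|_\star=\|W-W_r\|_\star=\sum_{i=r+1}^{n}\sigma_i(W)\le\sum_{i=1}^{r}\sigma_i(W)=\|X_0\|_\star$. Finally, $\mathcal{A}(X_0+W)=\mathcal{A}(X_0)$, but $X_0$ is not the unique minimizer.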
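Because condition (20) only involves the singular values of $W$, it is easy to test for a single matrix. The sketch below (NumPy; helper names hypothetical) checks it and reproduces the converse construction $X_0=-W_r$. It does not model $\mathcal{A}$: the point is only that $X_0+W$ has nuclear norm no larger than $X_0$, so if such a $W$ lay in $\mathcal{N}(\mathcal{A})$, $X_0$ could not be the unique minimizer of (2).

```python
import numpy as np

def strong_nsc_holds(W, r):
    """True if the sum of the r largest singular values of W is below the sum of the rest."""
    s = np.linalg.svd(W, compute_uv=False)   # decreasing order
    return s[:r].sum() < s[r:].sum()

def truncate_rank(W, r):
    """W_r: keep the r largest singular values of W and set the others to zero."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def nuc(M):
    return np.linalg.norm(M, "nuc")

rng = np.random.default_rng(6)
n, r = 8, 1
# A W dominated by one singular direction violates the condition for r = 1.
W = 10 * np.ones((n, n)) + 0.1 * rng.standard_normal((n, n))
print(strong_nsc_holds(W, r))  # False

# Converse construction: X0 = -W_r has rank r, and X0 + W is no worse in nuclear norm.
X0 = -truncate_rank(W, r)
print(nuc(X0 + W) <= nuc(X0))  # True

# A typical Gaussian W satisfies the condition for r = 1.
W_gauss = rng.standard_normal((n, n))
print(strong_nsc_holds(W_gauss, r))
```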
Now we can start analyzing the strong null space condition for the NNM problem. Let $\mathcal{A}$ be a random Gaussian operator; we analyze the linear regime where $r=\beta n$ and $m=\mu n^2$. Our aim is to determine the least $\mu$ ($0<\mu<1$) such that $\beta$ is a strong threshold for $\mathcal{A}$. As in compressed sensing, the null space of $\mathcal{A}$ is an $(n^2-m)$-dimensional random subspace of $\mathbb{R}^{n\times n}$, distributed uniformly in the Grassmannian w.r.t. the Haar measure. It can also be viewed as the span of $n^2-m$ matrices drawn i.i.d. from $\mathcal{G}(n,n)$. With this, we are in the same framework as Stojnic's analysis.
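A numerical picture of this setup: the null space of a random Gaussian operator can be computed from a full SVD of its $m\times n^2$ matrix, and random elements of it can be tested against condition (20). This NumPy sketch uses arbitrary parameters ($n=10$, $\beta=0.1$, $\mu=0.7$) and hypothetical helper names. Sampling cannot certify (20), which must hold for every $W$ in the null space, so the ratios it prints are a sanity check only.

```python
import numpy as np

def null_space_basis(A_mat, tol=1e-10):
    """Orthonormal basis (as columns) of the null space of A_mat, from a full SVD."""
    _, s, Vt = np.linalg.svd(A_mat, full_matrices=True)
    rank = int(np.sum(s > tol * s[0]))
    return Vt[rank:].T

rng = np.random.default_rng(7)
n, r, m = 10, 1, 70                          # beta = r / n = 0.1, mu = m / n^2 = 0.7
A_mat = rng.standard_normal((m, n * n))      # rows vec(A_i)^T with A_i from G(n, n)
N = null_space_basis(A_mat)                  # shape (n^2, n^2 - m)

def violation_ratio(w_vec):
    """sum_{i<=r} sigma_i(W) / sum_{i>r} sigma_i(W); (20) asks for this to be < 1."""
    s = np.linalg.svd(w_vec.reshape(n, n, order="F"), compute_uv=False)
    return s[:r].sum() / s[r:].sum()

samples = N @ rng.standard_normal((N.shape[1], 2000))   # random elements of N(A)
ratios = np.array([violation_ratio(w) for w in samples.T])
print(N.shape, ratios.max())
```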
As a first step, given a fixed we’ll calculate an upper bound on . Note that from Lemma 1 we have:
The careful reader will notice that actually we have equality in (23) because the set is unitarily invariant hence any value we can get on the right hand side, we can also get on the left hand side by aligning the singular vectors of and . Let , . Note that . Then since and any , we need to solve the following optimization problem given :
where such that and and such that . As long as we can find such . In addition, in order to minimize right hand side of (25), one should choose largest such .
In case of , the following is the obvious upper bound from Cauchy-Schwarz and the fact that
Let be defined same as in Lemma 7. Let be chosen from and let and . Then we have: where
where and is a such that
where can be arbitrarily small. Note that is deterministic. Secondly one can observe that .
Here $\Psi$ denotes the c.d.f. of the quarter circle distribution previously defined in (6).
4.1.1 Probabilistic Analysis of
The matrix is drawn from and . In the following discussion, we’ll focus on the case and we’ll declare failure (no recovery) else. This is reasonable since our approach will eventually lead to in case of . The reason is that, with high probability we’ll have and this will result in which is the worst upper bound.
Then, we’ll basically argue that whenever , asymptotically with probability one, we’ll have . Next, we’ll show that contribution of the region to the expectation of asymptotically converges to .
From the union bound, we have:
We’ll analyze the two components separately. Note that
is a function of singular values which is actually a Lipschitz function of the random matrixas we’ll argue in the following lemma.
Let and let and is as defined previously. Then:
is Lipschitz function of .
Let be such that . From Lemma (2) we have:
On the other hand we have: which implies finishing the proof.
Now, using the fact that is i.i.d. Gaussian and is the vector of singular values of , we have hence from Lemma 5 and from the fact that is i.i.d. Gaussian, we have:
if (which is equivalent to and ).
In particular we always have for any , (because for ). Hence converges to exponentially fast. One can actually show instead of however this won’t affect the results.
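Gaussian concentration for Lipschitz functions states that if $f$ is $L$-Lipschitz and $g$ is a standard Gaussian vector, then $\mathbb{P}(|f(g)-\mathbb{E}f(g)|\ge t)\le2e^{-t^2/(2L^2)}$. As a stand-in for the function analyzed above (an assumption; its exact form is not reproduced here), take $f(G)=n^{-3/2}\sum_{i\le\beta n}\sigma_i(G)$, which is $\sqrt{\beta}/n$-Lipschitz in the Frobenius norm by Mirsky's inequality and Cauchy-Schwarz. The NumPy sketch below (sizes and trial counts arbitrary) shows its spread shrinking as $n$ grows.

```python
import numpy as np

def top_r_sum(n, beta, trials, rng):
    """Samples of f(G) = n^{-3/2} * (sum of the beta*n largest singular values of G), G ~ G(n, n)."""
    r = max(1, int(round(beta * n)))
    out = np.empty(trials)
    for t in range(trials):
        s = np.linalg.svd(rng.standard_normal((n, n)), compute_uv=False)
        out[t] = s[:r].sum() / n**1.5
    return out

rng = np.random.default_rng(8)
stds = {}
for n in (20, 40, 80):
    samples = top_r_sum(n, 0.25, 200, rng)
    stds[n] = samples.std()
    print(n, round(samples.mean(), 4), round(stds[n], 5))
```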
Then since : . It remains to upper bound as follows:
Note that is linear function of (hence ) so if for any . In other words similar to the discussion in  for any value of , the fraction of the region on the sphere of radius will be constant. On the other hand since
is iid Gaussian, the probability distribution ofis just a function of i.e. for any matrix . As a result:
where is the area of a sphere in with radius . Hence
Using the exact same argument:
The last term clearly goes to zero for large . Then we need to calculate the second part which is:
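The last inequality is due to the following consequence of the Cauchy-Schwarz inequality: for a random variable (R.V.) $X$ and an event $E$,
$$\mathbb{E}[X\mathbf{1}_E]\le\sqrt{\mathbb{E}[X^2]\,\mathbb{P}(E)}.$$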
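A standard consequence of the Cauchy-Schwarz inequality, convenient for bounding the contribution of a low-probability region to an expectation, is $\mathbb{E}[X\mathbf{1}_E]\le\sqrt{\mathbb{E}[X^2]\,\mathbb{P}(E)}$. The Monte Carlo check below (NumPy; the exponential variable and the event $X>3$ are arbitrary choices) illustrates it; for the empirical distribution the inequality holds exactly, since it is Cauchy-Schwarz applied to the samples themselves.

```python
import numpy as np

rng = np.random.default_rng(9)
X = rng.exponential(size=1_000_000)            # any square-integrable R.V.
event = X > 3.0                                 # a low-probability event E
lhs = np.mean(X * event)                        # E[X 1_E]
rhs = np.sqrt(np.mean(X**2) * np.mean(event))   # sqrt(E[X^2] P(E))
print(lhs, rhs)
```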
Note that for large and fixed and we have
Then combining (34) and (38), it follows that (42) gives an upper bound for and thereby . To be able to calculate the required number of measurements we need to find and substitute in (42) because (42) will also be an upper bound on the minimum asymptotically.
If we consider (8), asymptotically will be solution of:
If then . Otherwise:
is sufficient sampling rate for to be strong threshold of random Gaussian operator . Here is solution of:
Next, we define and analyze the sectional threshold.
4.2 Sectional Threshold
Sectional recovery threshold.
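Let $\mathcal{A}$ be a random Gaussian operator and let $(P,Q)$ be an arbitrary orthogonal projection pair with $\operatorname{rank}(P)=\operatorname{rank}(Q)=\beta n$. Then we say that $\beta$ ($0<\beta<1$) is a sectional recovery threshold if, with high probability, $\mathcal{A}$ satisfies the following property:
Any matrix $X$ with support $(P,Q)$ can be recovered from the measurements $\mathcal{A}(X)$ via (2).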
Given a fixed $\beta$, our aim is to calculate the least $\mu$ such that $\beta$ is a sectional threshold for a random Gaussian operator $\mathcal{A}$.