 # New Null Space Results and Recovery Thresholds for Matrix Rank Minimization

Nuclear norm minimization (NNM) has recently gained significant attention for its use in rank minimization problems. Similar to compressed sensing, recovery thresholds for NNM have been studied using null space characterizations in [arxiv, Recht_Xu_Hassibi]. However, simulations show that these thresholds are far from optimal, especially in the low rank region. In this paper we apply the recent analysis of Stojnic for compressed sensing [mihailo] to the null space conditions of NNM. The resulting thresholds are significantly better, and in particular our weak threshold appears to match simulation results. Further, our curves suggest that for any rank growing linearly with the matrix size $n$ we need only three times oversampling (relative to the model complexity) for weak recovery. Similar to [arxiv], we analyze the conditions for weak, sectional and strong thresholds. Additionally, a separate analysis is given for the special case of positive semidefinite matrices. We conclude by discussing simulation results and future research directions.


## 1 Introduction

Rank minimization (RM) addresses the recovery of a low rank matrix from a set of linear measurements that project the matrix onto a lower dimensional space. The problem has gained extensive attention in the past few years, due to its promising applicability in many practical problems. Suppose that $X_0$ is a low rank matrix of size $m \times n$ and let $\operatorname{rank}(X_0) = r$. Further let $\mathcal{A}$ be a linear measurement operator. Given the measurements $y_0 = \mathcal{A}(X_0)$, the problem is to recover $X_0$, knowing that it is low rank. Provided that $X_0$ is the solution with lowest rank, this problem can be formulated with the following minimization program.

$$\min_X \ \operatorname{rank}(X) \quad \text{subject to} \quad \mathcal{A}(X) = y_0 \tag{1}$$

The function $\operatorname{rank}(\cdot)$ is non-convex, and it turns out that (1) is NP hard and cannot be solved efficiently. Fazel et al. suggested replacing the rank with the nuclear norm heuristic as the closest convex relaxation. The resulting convex optimization program is called nuclear norm minimization and is as follows.

$$\min_X \ \|X\|_\star \quad \text{subject to} \quad \mathcal{A}(X) = \mathcal{A}(X_0) \tag{2}$$

where $\|\cdot\|_\star$ refers to the nuclear norm of its argument, i.e., the sum of the singular values. (2) can be written as a semi-definite program (SDP) and thus be solved in polynomial time. Recent works have studied sufficient conditions under which (2) will recover $X_0$ (i.e. $X_0$ is the unique minimizer of (2)). It has been shown that, similar to compressed sensing, the Restricted Isometry Property (RIP) is a sufficient condition for the success of (2) and that $O(rn \log n)$ measurements are enough to guarantee RIP with high probability. Candes extended these results and showed that a minimal sampling of $O(rn)$ is in fact enough to have RIP and hence recovery. In later works [4, 12], necessary and sufficient null space conditions were derived and analyzed for Gaussian measurement operators, i.e., operators whose entries are i.i.d. Gaussian, leading to thresholds for the success of (2). These thresholds establish explicit relationships between the problem parameters, as opposed to the order-wise relationships that result from RIP techniques. However, these results are far from optimal in the low rank regime, which calls for a new approach. In particular, for an $n \times n$ matrix whose rank is a small fraction of $n$, they still require a nontrivial minimum sampling for success, however small that fraction is.

In this paper, we give a novel null space analysis for the rank minimization problem and we find significantly better thresholds than the results of [4, 12]. Although the analysis is novel for the rank minimization problem, we basically follow the analysis developed for compressed sensing by Stojnic, which is based on a seminal result of Gordon. In addition to the analysis of general matrices, we give a separate analysis for positive semidefinite matrices, which resemble nonnegative vectors in compressed sensing. We also consider the case of unique positive semidefinite solutions, which was recently analyzed by Xu.

We extensively use the results of Stojnic. Basically, we slightly modify Lemmas 2, 5, 7 of that work and use null space conditions for the NNM problem. The strength of this analysis comes from the facts that it is more accessible and that its weak threshold matches the exact threshold known for compressed sensing. In fact, while it is not at all clear how to extend the analysis behind that exact threshold from compressed sensing to NNM, it is relatively straightforward to do so for Stojnic's analysis. Our simulation results also indicate that our thresholds for the NNM problem are seemingly tight. This is perhaps not surprising since, as we shall see, the null space conditions for NNM and compressed sensing are very similar.

## 2 Basic Definitions and Notations

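Denote the identity matrix of size $n \times n$ by $I_n$. We call $U \in \mathbb{R}^{m \times n}$ partial unitary if the columns of $U$ form an orthonormal set, i.e. $U^T U = I_n$. Clearly we need $n \le m$ for $U$ to be partial unitary. Also for a partial unitary $U$, let $\bar{U}$ denote an arbitrary partial unitary of size $m \times (m-n)$ so that $[U \ \bar{U}]$ is a unitary matrix (i.e. its columns are a complete orthonormal basis of $\mathbb{R}^m$).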

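For a matrix $X \in \mathbb{R}^{m \times n}$, we denote the singular values by $\sigma_1(X) \ge \sigma_2(X) \ge \dots \ge \sigma_q(X)$ where $q = \min(m,n)$. The (skinny) singular value decomposition (SVD) of $X$ is written as $X = U \Sigma V^T$ where $U \in \mathbb{R}^{m \times r}$, $V \in \mathbb{R}^{n \times r}$ and $\Sigma \in \mathbb{R}^{r \times r}$, where $r = \operatorname{rank}(X)$. Note that $U$, $V$ are partial unitary and $\Sigma$ is positive, diagonal and full rank. Also let $\Sigma(X)$ denote the vector of increasingly ordered singular values of $X$, i.e. $\Sigma(X) = [\sigma_q(X) \ \dots \ \sigma_1(X)]^T$.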

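The Ky-Fan $k$ norm of $X$, denoted by $\|X\|_k$, is defined as $\|X\|_k = \sum_{i=1}^{k} \sigma_i(X)$. When $k = q$ it is called the nuclear norm, i.e. $\|X\|_\star$, and when $k = 1$ it is equivalent to the spectral norm, denoted by $\|X\|$. Also the Frobenius norm is denoted by $\|X\|_F$. Note that we always have: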

$$\|X\|_k = \sum_{i=1}^{k} \sigma_i(X) \le \sqrt{\sum_{i=1}^{k} 1 \, \sum_{i=1}^{k} \sigma_i^2(X)} \le \sqrt{k}\, \|X\|_F \tag{3}$$
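The norms above and the bound (3) can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper name `ky_fan_norm` and the test matrix are illustrative choices and not part of the original text.

```python
import numpy as np

def ky_fan_norm(X, k):
    """Sum of the k largest singular values of X (illustrative helper)."""
    s = np.linalg.svd(X, compute_uv=False)  # singular values, descending
    return float(np.sum(s[:k]))

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 8))
q = min(X.shape)

nuclear = ky_fan_norm(X, q)    # k = q gives the nuclear norm
spectral = ky_fan_norm(X, 1)   # k = 1 gives the spectral norm
fro = float(np.linalg.norm(X, "fro"))

# Inequality (3): ||X||_k <= sqrt(k) * ||X||_F for every k
bounds_hold = all(
    ky_fan_norm(X, k) <= np.sqrt(k) * fro + 1e-12 for k in range(1, q + 1)
)
```

The check covers one random matrix only; it illustrates the inequality and does not replace the Cauchy-Schwarz argument.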

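For a linear operator $\mathcal{A}$ acting on a linear space, we denote the null space of $\mathcal{A}$ by $\mathcal{N}(\mathcal{A})$, i.e. $W \in \mathcal{N}(\mathcal{A})$ iff $\mathcal{A}(W) = 0$. We also consider the ensemble of real matrices in which the entries are i.i.d. $\mathcal{N}(0,1)$ (zero-mean, unit variance Gaussian).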

It is a well known fact that the normalized singular values of a square matrix with i.i.d. Gaussian entries asymptotically follow the quarter circle distribution. In other words, the histogram of singular values (normalized by $\sqrt{n}$) converges to the function

$$\phi(x) = \frac{\sqrt{4 - x^2}}{\pi}, \qquad 0 \le x \le 2 \tag{4}$$

Similarly, the distribution of the squares of the singular values (normalized by $n$) converges to the well known Marcenko-Pastur distribution. Note that this is nothing but the distribution of the eigenvalues of $X^T X$ where $X$ is a square matrix with i.i.d. $\mathcal{N}(0,1)$ entries,

$$\phi_2(x) = \frac{\sqrt{4x - x^2}}{2 \pi x}, \qquad 0 \le x \le 4 \tag{5}$$
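A quick simulation makes the two limit laws concrete. This is a sketch under stated assumptions: NumPy, a single $400 \times 400$ draw, and the reference values $8/(3\pi)$ (mean of the density (4)) and $1$ (mean of the density (5)), which are computed here from the densities and are not quoted from the original text.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
H = rng.standard_normal((n, n))           # i.i.d. N(0,1) entries

# Singular values normalized by sqrt(n) should follow the quarter circle law
s = np.linalg.svd(H, compute_uv=False) / np.sqrt(n)

empirical_mean = float(s.mean())          # compare with the mean of (4)
empirical_second_moment = float((s ** 2).mean())  # compare with the mean of (5)
largest = float(s.max())                  # should sit near the edge x = 2

quarter_circle_mean = 8.0 / (3.0 * np.pi)  # integral of x * phi(x) over [0, 2]
```

For finite $n$ the agreement is only approximate; increasing `n` tightens it at the cost of a slower SVD.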

Let $F(x)$ be the cumulative distribution function of $\phi(x)$, i.e.,

$$F(x) = \int_0^x \phi(t)\, dt \tag{6}$$

Let $0 \le \beta \le 1$. We define $\gamma(\beta)$ to be the asymptotic normalized expected value of the $\|\cdot\|_{\beta n}$ norm of an $n \times n$ matrix with i.i.d. $\mathcal{N}(0,1)$ entries, i.e.:

$$\gamma(\beta) := \lim_{n \to \infty} \frac{E[\|X\|_{\beta n}]}{n^{3/2}} = \int_{F^{-1}(1-\beta)}^{2} x\, \phi(x)\, dx \tag{7}$$
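The quantities in (6) and (7) are easy to evaluate numerically. The sketch below uses closed-form antiderivatives of the quarter circle density that were worked out for this illustration (they are an assumption of the example, not formulas stated in the text), and inverts $F$ by bisection. The function names are hypothetical.

```python
import math

def F(x):
    """CDF (6) of the quarter circle density (4) on [0, 2]."""
    return (x * math.sqrt(4.0 - x * x) / 2.0 + 2.0 * math.asin(x / 2.0)) / math.pi

def F_inv(p, tol=1e-12):
    """Inverse of F on [0, 1], found by bisection since F is increasing."""
    lo, hi = 0.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma(beta):
    """gamma(beta) from (7): integral of x * phi(x) from F^{-1}(1 - beta) to 2."""
    q = F_inv(1.0 - beta)
    return max(0.0, 4.0 - q * q) ** 1.5 / (3.0 * math.pi)

gamma_full = gamma(1.0)   # all singular values: the normalized nuclear norm
gamma_half = gamma(0.5)   # largest half of the singular values
```

Here `gamma(1.0)` equals $8/(3\pi)$, the mean of the quarter circle law, and `gamma` is increasing in `beta` because a larger `beta` includes more singular values.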

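Similarly define $\gamma_2(\beta)$ to be the asymptotic normalized expected value of the $\|\cdot\|_{\beta n}$ norm of the matrix $X^T X$ where $X$ is an $n \times n$ matrix with i.i.d. $\mathcal{N}(0,1)$ entries:

$$\gamma_2(\beta) := \lim_{n \to \infty} \frac{E[\|X^T X\|_{\beta n}]}{n^{2}} = \int_{F^{-1}(1-\beta)}^{2} x^2\, \phi(x)\, dx \tag{8}$$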

Note that these limits exist and are well defined.

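A function $f$ is called $L$-Lipschitz if for all $x, y$ we have: $|f(x) - f(y)| \le L \|x - y\|_{\ell_2}$ (with the Frobenius norm in place of $\|\cdot\|_{\ell_2}$ when the arguments are matrices).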

We say an orthogonal projection pair is a support of the matrix if . In particular is the unique support of the matrix , if and are orthogonal projectors with such that . In other words, and .

We say is a random Gaussian measurement operator if the measurement is where ’s are i.i.d. matrices drawn from for all . Note that this is equivalent to where is obtained by putting columns of on top of each other to get a vector of size .
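The equivalence between the matrix form and the vectorized form of the measurement operator can be illustrated as follows. This sketch assumes that each measurement is the trace inner product $\operatorname{tr}(G_i^T X)$ with an i.i.d. Gaussian matrix $G_i$; that form, the sizes, and all variable names are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, num_meas = 4, 5, 7       # illustrative sizes

# One i.i.d. N(0,1) matrix per measurement
G = rng.standard_normal((num_meas, n1, n2))
X = rng.standard_normal((n1, n2))

# Matrix form: the i-th measurement is tr(G_i^T X)
y_trace = np.array([np.trace(G[i].T @ X) for i in range(num_meas)])

# Vector form: stack the columns of X (column-major order) and use a
# num_meas x (n1 * n2) matrix whose rows are the stacked G_i
A = np.stack([G[i].flatten(order="F") for i in range(num_meas)])
y_matrix = A @ X.flatten(order="F")
```

Both forms give the same measurement vector, so the null space of the operator can be studied as the null space of an ordinary Gaussian matrix.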

Model complexity is defined as the number of degrees of freedom of the matrix. For a matrix of size $m \times n$ and rank $r$, the model complexity is $r(m+n-r)$. Then we define the normalized model complexity to be $\theta = \frac{r(m+n-r)}{mn}$.

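Finally let $\succeq$ denote "greater than" in partially ordered sets. In particular, if $A, B$ are Hermitian matrices then $A \succeq B$ iff $A - B$ is positive semidefinite. Similarly, for two given vectors $u, v$ we write $u \succeq v$ iff $u_i \ge v_i$ for all $i$.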

## 3 Key Lemmas to be Used

In this section, we state several lemmas that we will make use of later. Proofs that are omitted can be found in the given references.

For Lemmas (1), (2), (3), let $X, Y \in \mathbb{R}^{m \times n}$ with $m \le n$.

###### Lemma 1.
$$\operatorname{tr}(X^T Y) \le \sum_{i=1}^{m} \sigma_i(X)\, \sigma_i(Y) = \Sigma(X)^T \Sigma(Y) \tag{9}$$
###### Proof.

Can be found in .

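In the case of vectors (i.e. the matrices are diagonal) we have the following simple extension: Let $x, y \in \mathbb{R}^m$ be vectors. Let $x_{[i]}$ be the $i$'th largest value of the vector $x$ (i.e. $x_{[1]} \ge x_{[2]} \ge \dots \ge x_{[m]}$); then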

$$\langle x, y \rangle \le \sum_{i=1}^{m} x_{[i]}\, y_{[i]} \tag{10}$$
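Both (9) and its vector version (10) can be sanity-checked on random inputs. The following is a small sketch assuming NumPy; the sizes and seeds are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 7
X = rng.standard_normal((m, n))
Y = rng.standard_normal((m, n))

# (9): tr(X^T Y) is at most the inner product of the sorted singular values
lhs = float(np.trace(X.T @ Y))
rhs = float(np.linalg.svd(X, compute_uv=False) @ np.linalg.svd(Y, compute_uv=False))

# (10): <x, y> is at most the inner product of the decreasingly sorted entries
x = rng.standard_normal(m)
y = rng.standard_normal(m)
vec_lhs = float(x @ y)
vec_rhs = float(np.sort(x)[::-1] @ np.sort(y)[::-1])
```

Since `np.linalg.svd` returns singular values in descending order for both matrices, the two vectors are already aligned index by index.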
###### Lemma 2.

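Let $Z = X - Y$. Let $s_i(X,Y) = |\sigma_i(X) - \sigma_i(Y)|$ and let $s_{[1]}(X,Y) \ge s_{[2]}(X,Y) \ge \dots \ge s_{[m]}(X,Y)$ be a decreasingly ordered arrangement of $\{s_i(X,Y)\}_{i=1}^{m}$. Then we have the following inequality: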

$$\forall\ m \ge k \ge 1: \quad \sum_{i=1}^{k} s_{[i]}(X,Y) \le \sum_{i=1}^{k} \sigma_i(Z) = \|Z\|_k \tag{11}$$

In particular we have:

$$\sum_{i=1}^{m} |\sigma_i(X) - \sigma_i(Y)| \le \|Z\|_\star \tag{12}$$
###### Proof.

Proof can be found in [16, 21]
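Lemma 2 can be illustrated numerically. The sketch below assumes $Z = X - Y$ and $s_i(X,Y) = |\sigma_i(X) - \sigma_i(Y)|$, which is how the lemma is applied later in the proof of Lemma 6; the sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 5, 7
X = rng.standard_normal((m, n))
Y = rng.standard_normal((m, n))
Z = X - Y

sx = np.linalg.svd(X, compute_uv=False)
sy = np.linalg.svd(Y, compute_uv=False)
sz = np.linalg.svd(Z, compute_uv=False)

# Decreasingly ordered gaps between corresponding singular values
s_sorted = np.sort(np.abs(sx - sy))[::-1]

# (11): every partial sum of the sorted gaps is bounded by the Ky-Fan norm of Z
partial_sums_ok = all(
    s_sorted[:k].sum() <= sz[:k].sum() + 1e-10 for k in range(1, m + 1)
)

# (12): the special case k = m gives the nuclear norm bound
gap_sum = float(np.abs(sx - sy).sum())
nuclear_Z = float(sz.sum())
```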

###### Lemma 3.

If the matrix $X = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix}$ then we have:

$$\|X\|_\star \ge \|X_{11}\|_\star + \|X_{22}\|_\star \tag{13}$$
###### Proof.

Proof can be found in .

Similarly, we have the following obvious inequality when $X$ is square ($m = n$):

$$\|X\|_\star \ge \operatorname{trace}(X) \tag{14}$$
###### Proof.

The dual norm of the nuclear norm is the spectral norm $\|\cdot\|$. Remember that $I_m$ is the identity, so $\|I_m\| = 1$. Then:

$$\|X\|_\star = \sup_{\|Y\| = 1} \langle X, Y \rangle \ge \langle X, I_m \rangle = \operatorname{trace}(X) \tag{15}$$
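The block inequality (13) and the trace inequality (14) can be checked on a random square matrix. This is a minimal sketch assuming NumPy and an even split into $3 \times 3$ blocks, which is an arbitrary choice made for the example.

```python
import numpy as np

def nuc(M):
    """Nuclear norm (sum of singular values)."""
    return float(np.linalg.norm(M, "nuc"))

rng = np.random.default_rng(5)
X = rng.standard_normal((6, 6))

# Diagonal blocks of X
X11, X22 = X[:3, :3], X[3:, 3:]

block_gap = nuc(X) - (nuc(X11) + nuc(X22))   # (13): should be nonnegative
trace_gap = nuc(X) - float(np.trace(X))      # (14): should be nonnegative
```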

###### Theorem 1.

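(Escape through a mesh) Let $S$ be a subset of the unit Euclidean sphere $S^{n-1}$ in $\mathbb{R}^n$. Let $Y$ be a random $(n-m)$-dimensional subspace of $\mathbb{R}^n$, distributed uniformly in the Grassmannian with respect to Haar measure. Let

$$\omega(S) = E \sup_{w \in S} (h^T w) \tag{16}$$

where $h$ is a column vector drawn from $\mathcal{N}(0, I_n)$. Then if $\omega(S) < \sqrt{m} - \frac{1}{4\sqrt{m}}$ we have:

$$P(Y \cap S = \emptyset) > 1 - 3.5 \exp\left( -\frac{\left( \sqrt{m} - \frac{1}{4\sqrt{m}} - \omega(S) \right)^2}{18} \right) \tag{17}$$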
###### Lemma 4.

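For all $k$, $\sigma_k(X)$ is a $1$-Lipschitz function of $X$.

###### Proof.

Let $X, \hat{X}$ be such that $\tilde{X} = X - \hat{X}$ and $\|\tilde{X}\|_F \le 1$. But then from Lemma (2) we have:

$$1 \ge \|\tilde{X}\|_F \ge \sigma_1(\tilde{X}) \ge s(X, \hat{X}) \ge |\sigma_k(X) - \sigma_k(\hat{X})| \tag{18}$$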

###### Lemma 5.

(from [4, 13]) Let $x$ be drawn from $\mathcal{N}(0, I)$ and let $f$ be a function with Lipschitz constant $L$. Then we have the following concentration inequality

$$P(|f(x) - E f(x)| \ge t) \le 2 \exp\left( -\frac{t^2}{2L^2} \right) \tag{19}$$

For analyzing positive semidefinite matrices, we will introduce some more definitions and lemmas later on.

## 4 Thresholds for Square Matrices

In the following section, we'll give and analyze strong, sectional and weak null space conditions for square matrices ($m = n$). With minor modifications, one can obtain the equivalent results for rectangular matrices ($m < n$).

### 4.1 Strong Threshold

###### Strong recovery threshold.

Let be a random Gaussian operator. We define () to be the strong recovery threshold if with high probability satisfies the following property:

Any matrix with rank at most can be recovered from measurements via (2).

###### Lemma 6.

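Using (2) one can recover all matrices $X$ of rank at most $r$ if and only if for all nonzero $W \in \mathcal{N}(\mathcal{A})$ we have

$$2\|W\|_r < \|W\|_\star \tag{20}$$
###### Proof.

If (20) holds then using Lemma (2) and the fact that $\sigma_i(X) = 0$ for any $i > r$ we have

$$\|X + W\|_\star \ge \sum_{i=1}^{n} |\sigma_i(X) - \sigma_i(W)| \ge \sum_{i=1}^{r} (\sigma_i(X) - \sigma_i(W)) + \sum_{i=r+1}^{n} \sigma_i(W) \tag{21}$$

$$\ge \|X\|_\star + \|W\|_\star - 2\|W\|_r > \|X\|_\star \tag{22}$$

Hence $X$ is the unique minimizer of (2). Conversely, if (20) doesn't hold for some $W$ then choose $X = -W_r$ where $W_r$ is the matrix induced by setting all but the largest $r$ singular values of $W$ to 0. Then we get: $\|X + W\|_\star = \|W\|_\star - \|W\|_r \le \|W\|_r = \|X\|_\star$. Finally we find that $\operatorname{rank}(X) \le r$ but $X$ is not the unique minimizer.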
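The condition (20) and the converse construction in the proof can be illustrated on explicit matrices. The sketch below assumes NumPy; the helper names and the two hand-picked singular value profiles are hypothetical, and the matrices are not drawn from the null space of any particular operator, so the example only demonstrates the norm arithmetic of the proof.

```python
import numpy as np

def ky_fan_norm(W, k):
    """Sum of the k largest singular values of W."""
    s = np.linalg.svd(W, compute_uv=False)
    return float(s[:k].sum())

def satisfies_strong_condition(W, r):
    """Check 2 * ||W||_r < ||W||_* for a single matrix W, as in (20)."""
    return 2.0 * ky_fan_norm(W, r) < ky_fan_norm(W, min(W.shape))

rng = np.random.default_rng(6)
n, r = 6, 2
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Spectrum concentrated on the top r values: 2 * (5 + 4) >= 12, so (20) fails
W_bad = U @ np.diag([5.0, 4.0, 1.0, 1.0, 0.5, 0.5]) @ V.T
# Flat spectrum: 2 * 2 < 6, so (20) holds
W_good = U @ V.T

# Converse step of the proof: X = -W_r keeps only the top r singular values
Us, s, Vt = np.linalg.svd(W_bad)
X = -(Us[:, :r] * s[:r]) @ Vt[:r, :]

nuc_X = float(np.linalg.norm(X, "nuc"))                   # equals ||W||_r
nuc_X_plus_W = float(np.linalg.norm(X + W_bad, "nuc"))    # ||W||_* - ||W||_r
```

If `W_bad` were in the null space of the measurement operator, `X + W_bad` would produce the same measurements as `X` with a nuclear norm that is no larger, which is exactly why recovery of `X` would fail.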

Now we can start analyzing the strong null space condition for the NNM problem. $\mathcal{A}$ is a random Gaussian operator and we'll analyze the linear regime where the number of measurements is $\mu n^2$ and $r = \beta n$. Our aim is to determine the least $\mu$ ($0 \le \mu \le 1$) so that $\beta$ is a strong threshold for $\mathcal{A}$. Similar to compressed sensing, the null space of $\mathcal{A}$ is a $(1-\mu)n^2$ dimensional random subspace of $\mathbb{R}^{n \times n}$ distributed uniformly in the Grassmannian w.r.t. Haar measure. This can also be viewed as the span of $(1-\mu)n^2$ matrices drawn i.i.d. with $\mathcal{N}(0,1)$ entries. Then, similar to the compressed sensing analysis, we have established the necessary framework.

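Let $S_s$ be the set of all matrices $W$ such that $\|W\|_F = 1$ and $2\|W\|_r \ge \|W\|_\star$. We need to make sure the null space of $\mathcal{A}$ has no intersection with $S_s$. We will first upper bound (16) in Theorem 1 and then choose the number of measurements (and $\mu$) accordingly.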

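As a first step, given a fixed $H \in \mathbb{R}^{n \times n}$ we'll calculate an upper bound on $f(H, S_s) = \sup_{W \in S_s} \langle H, W \rangle$. Note that from Lemma 1 we have:

$$f(H, S_s) = \sup_{W \in S_s} \langle H, W \rangle \le \sup_{W \in S_s} \Sigma(H)^T \Sigma(W) \tag{23}$$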

The careful reader will notice that actually we have equality in (23) because the set is unitarily invariant hence any value we can get on the right hand side, we can also get on the left hand side by aligning the singular vectors of and . Let , . Note that . Then since and any , we need to solve the following optimization problem given :

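$$\begin{aligned} \max_{y} \quad & h^T y \\ \text{subject to} \quad & y \succeq 0 \\ & \sum_{i=n-r+1}^{n} y_i \ge \sum_{i=1}^{n-r} y_i \\ & \|y\|_{\ell_2} \le 1 \end{aligned} \tag{24}$$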

Clearly the right hand side of (23) and the result of (24) are the same, because $h^T y$ will be maximized when the $y_i$ are sorted increasingly, due to Lemma 1.

Note that (24) is exactly the same as (10) of . Then we can use (22), (29) of  directly to get:

###### Lemma 7.

If $h^T z > 0$ then

$$f(H, S_s) \le \sqrt{ \sum_{i=c+1}^{n} h_i^2 - \frac{\left( (h^T z) - \sum_{i=1}^{c} h_i \right)^2}{n - c} } \tag{25}$$

where such that and and such that . As long as we can find such . In addition, in order to minimize right hand side of (25), one should choose largest such .

In the case of $h^T z \le 0$, the following is the obvious upper bound from Cauchy-Schwarz and the fact that $\|y\|_{\ell_2} \le 1$:

$$f(H, S_s) \le \|h\|_{\ell_2} = \sqrt{ \sum_{i=1}^{n} h_i^2 } \tag{26}$$

Similar to , for the escape through a mesh (ETM) analysis, using Lemma 7, we’ll consider the following worse upper bound:

###### Lemma 8.

Let be defined same as in Lemma 7. Let be chosen from and let and . Then we have: where

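$$B_s = \begin{cases} \|h\|_{\ell_2} & \text{if } g(H, c_s) \le 0 \\[4pt] \sqrt{ \sum_{i=c_s+1}^{n} h_i^2 - \dfrac{\left( (h^T z) - \sum_{i=1}^{c_s} h_i \right)^2}{n - c_s} } & \text{else} \end{cases}$$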

where and is a such that

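$$c_s = \begin{cases} 0 & \text{if } E[h^T z] \le 0 \\[4pt] \text{the solution of } \dfrac{(1-\epsilon)\, E\left[ (h^T z) - \sum_{i=1}^{c} h_i \right]}{\sqrt{n}\,(n - c)} = F^{-1}\!\left( \dfrac{(1+\epsilon)\, c}{n} \right) & \text{if } E[h^T z] > 0 \end{cases} \tag{27}$$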

where can be arbitrarily small. Note that is deterministic. Secondly one can observe that .

Here is the c.d.f. of the quarter circle distribution previously defined in (6).

#### 4.1.1 Probabilistic Analysis of E[Bs]

The matrix is drawn from and . In the following discussion, we’ll focus on the case and we’ll declare failure (no recovery) else. This is reasonable since our approach will eventually lead to in case of . The reason is that, with high probability we’ll have and this will result in which is the worst upper bound.

Then, we’ll basically argue that whenever , asymptotically with probability one, we’ll have . Next, we’ll show that contribution of the region to the expectation of asymptotically converges to .

From the union bound, we have:

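$$P(g(H, c_s) \le 0) \le P\left( h^T z - \sum_{i=1}^{c_s} h_i \le (1-\epsilon)\, E\left[ (h^T z) - \sum_{i=1}^{c_s} h_i \right] \right) + P\left( h_{c_s} \ge \sqrt{n}\, F^{-1}\!\left( \frac{(1+\epsilon)\, c_s}{n} \right) \right) \tag{28}$$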

We’ll analyze the two components separately. Note that

is a function of singular values which is actually a Lipschitz function of the random matrix

as we’ll argue in the following lemma.

###### Lemma 9.

Let and let and is as defined previously. Then:

$$f(H) = h^T z - \sum_{i=1}^{c_s} h_i \tag{29}$$

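is a $\sqrt{n - c_s}$-Lipschitz function of $H$.

###### Proof.

Let $H, \hat{H}$ be such that $\tilde{H} = H - \hat{H}$. From Lemma (2) we have:

$$\|\tilde{H}\|_{n-c_s} \ge \sum_{i=1}^{n-c_s} |\sigma_i(H) - \sigma_i(\hat{H})| \ge \left| \sum_{i=1}^{r} (\sigma_i(H) - \sigma_i(\hat{H})) \right| + \left| \sum_{i=r+1}^{n-c_s} (\sigma_i(\hat{H}) - \sigma_i(H)) \right| \tag{30}$$

$$\ge |h^T z - \hat{h}^T z| = |f(H) - f(\hat{H})| \tag{31}$$

On the other hand, from (3) we have $\|\tilde{H}\|_{n-c_s} \le \sqrt{n - c_s}\, \|\tilde{H}\|_F$, which implies $|f(H) - f(\hat{H})| \le \sqrt{n - c_s}\, \|\tilde{H}\|_F$, finishing the proof.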

Now, using the fact that is i.i.d. Gaussian and is the vector of singular values of , we have hence from Lemma 5 and from the fact that is i.i.d. Gaussian, we have:

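$$P_1 := P\left( h^T z - \sum_{i=1}^{c_s} h_i \le (1-\epsilon)\, E\left[ (h^T z) - \sum_{i=1}^{c_s} h_i \right] \right) \le \exp\left( -\frac{\epsilon^2 \left( \gamma(1-\delta_s) - 2\gamma(\beta) + o(1) \right)^2 n^2}{2(1-\delta_s)} \right) \tag{32}$$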

if (which is equivalent to and ).

Similarly from the quarter circle law we have . Using Lemmas 5, 4 we can find:

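$$P_2 := P\left( h_{c_s} \ge \sqrt{n}\, F^{-1}\!\left( \frac{(1+\epsilon)\, c_s}{n} \right) \right) \le \exp\left( -\frac{n}{2} \left( F^{-1}\!\left( \frac{(1+\epsilon)\, c_s}{n} \right) - F^{-1}\!\left( \frac{c_s}{n} \right) + o(1) \right)^2 \right) \tag{33}$$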

In particular we always have for any , (because for ). Hence converges to exponentially fast. One can actually show instead of however this won’t affect the results.

Then since : . It remains to upper bound as follows:

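$$E[B_s] \le \int_{g(H, c_s) \le 0} \|h\|_{\ell_2}\, p(H)\, dH + \int_{H} \sqrt{ \sum_{i=c_s+1}^{n} h_i^2 - \frac{\left( (h^T z) - \sum_{i=1}^{c_s} h_i \right)^2}{n - c_s} }\; p(H)\, dH \tag{34}$$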

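Note that $g(H, c_s)$ is a linear function of $h$ (hence of $H$), so $g(H, c_s) \le 0$ iff $g(aH, c_s) \le 0$ for any $a > 0$. In other words, for any value of $a$, the fraction of the region $g(H, c_s) \le 0$ on the sphere of radius $a$ will be constant. On the other hand, since $H$ is i.i.d. Gaussian, the probability distribution of $H$ is just a function of $\|H\|_F$, i.e. $p(H) = f(\|H\|_F)$ for any matrix $H$. As a result:

$$\int_{g(H, c_s) \le 0,\ \|H\|_F = a} dH = C_0 S_a \tag{35}$$

where $S_a$ is the area of a sphere in $\mathbb{R}^{n^2}$ with radius $a$. Hence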

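$$P(g(H, c_s) \le 0) = \int_{a \ge 0} \int_{g(H, c_s) \le 0,\ \|H\|_F = a} p(H)\, dH\, da = \int_{a \ge 0} \int_{g(H, c_s) \le 0,\ \|H\|_F = a} f(a)\, dH\, da \tag{36}$$

$$= C_0 \int_{a \ge 0} f(a)\, S_a\, da = C_0 \tag{37}$$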

Using the exact same argument:

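$$\begin{aligned} \int_{g(H, c_s) \le 0} \|H\|_F\, p(H)\, dH &= \int_{a=0}^{\infty} \int_{g(H, c_s) \le 0,\ \|H\|_F = a} \|H\|_F\, p(H)\, dH\, da \\ &= \int_{a=0}^{\infty} \int_{g(H, c_s) \le 0,\ \|H\|_F = a} a f(a)\, dH\, da \\ &= \int_{a=0}^{\infty} a f(a)\, C_0 S_a\, da = P(g(H, c_s) \le 0)\, E(\|H\|_F) \\ &\le \exp\left( -\frac{n}{8} \left( \pi \epsilon \delta_s + o(1) \right)^2 \right) n \end{aligned} \tag{38}$$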

The last term clearly goes to zero for large $n$. Then we need to calculate the second part, which is:

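$$\int_{H} \sqrt{ \sum_{i=c_s+1}^{n} h_i^2 - \frac{\left( (h^T z) - \sum_{i=1}^{c_s} h_i \right)^2}{n - c_s} }\; p(H)\, dH = E\left( \sqrt{ \sum_{i=c_s+1}^{n} h_i^2 - \frac{\left( (h^T z) - \sum_{i=1}^{c_s} h_i \right)^2}{n - c_s} } \right) \tag{39}$$

$$\le \sqrt{ E\left( \sum_{i=c_s+1}^{n} h_i^2 - \frac{\left( (h^T z) - \sum_{i=1}^{c_s} h_i \right)^2}{n - c_s} \right) } \tag{40}$$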

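The last inequality is due to the following application of Cauchy-Schwarz. For a nonnegative random variable (R.V.) $X$:

$$E(X) = \int_x x\, p(x)\, dx \int_x p(x)\, dx \ge \left( \int_x \sqrt{x\, p(x)^2}\, dx \right)^2 = E(\sqrt{X})^2 \tag{41}$$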

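Note that for large $n$ and fixed $\beta$ and $\delta_s$ we have

$$E\left( \sum_{i=c_s+1}^{n} h_i^2 - \frac{\left( (h^T z) - \sum_{i=1}^{c_s} h_i \right)^2}{n - c_s} \right) = \left( \gamma_2(1-\delta_s) - \frac{\left( \gamma(1-\delta_s) - 2\gamma(\beta) \right)^2}{1 - \delta_s} + o(1) \right) n^2 \tag{42}$$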

Then combining (34) and (38), it follows that (42) gives an upper bound for and thereby . To be able to calculate the required number of measurements we need to find and substitute in (42) because (42) will also be an upper bound on the minimum asymptotically.

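If we consider (8), asymptotically $\delta_s$ will be the solution of:

$$(1-\epsilon)\, \frac{\gamma(1-\delta_s) - 2\gamma(\beta)}{1 - \delta_s} = F^{-1}\left( (1+\epsilon)\, \delta_s \right) \tag{43}$$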

Then we can substitute this in (42) to solve for (and ). Using Theorem 1 and (43) we find:

###### Theorem 2.

If $\gamma(1) - 2\gamma(\beta) \le 0$ then $\mu = 1$. Otherwise:

$$\mu > \gamma_2(1-\delta_s) - \frac{\left( \gamma(1-\delta_s) - 2\gamma(\beta) \right)^2}{1 - \delta_s} \tag{44}$$

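is a sufficient sampling rate for $\beta$ to be a strong threshold of the random Gaussian operator $\mathcal{A}$. Here $\delta_s$ is the solution of:

$$(1-\epsilon)\, \frac{\gamma(1-\delta_s) - 2\gamma(\beta)}{1 - \delta_s} = F^{-1}\left( (1+\epsilon)\, \delta_s \right) \tag{45}$$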

In order to get the smallest $\mu$ we let $\epsilon \to 0$. Numerical calculations give the strong threshold in Figure 1. We found and plotted the least $\mu$ for a given $\beta$ (i.e. equality in (44)).
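The numerical calculation can be sketched as follows, taking $\epsilon \to 0$ in (45) and equality in (44). This is an illustrative sketch and not the authors' code: it assumes $\gamma_2$ is the second-moment analogue of (7), uses closed-form tail integrals of the quarter circle density worked out for this example, treats the case $\gamma(1) \le 2\gamma(\beta)$ as $\mu = 1$, and solves for the quantile $q = F^{-1}(\delta_s)$ by bisection. All function names are hypothetical.

```python
import math

def F(x):
    """CDF (6) of the quarter circle density on [0, 2]."""
    return (x * math.sqrt(4.0 - x * x) / 2.0 + 2.0 * math.asin(x / 2.0)) / math.pi

def F_inv(p):
    """Inverse CDF by bisection."""
    lo, hi = 0.0, 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if F(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tail_first_moment(q):
    """Integral of x * phi(x) over [q, 2]; equals gamma(1 - F(q))."""
    return max(0.0, 4.0 - q * q) ** 1.5 / (3.0 * math.pi)

def tail_second_moment(q):
    """Integral of x^2 * phi(x) over [q, 2]; assumed to equal gamma_2(1 - F(q))."""
    theta = math.asin(q / 2.0)
    return (math.pi - 2.0 * theta + math.sin(4.0 * theta) / 2.0) / math.pi

def strong_threshold(beta):
    """Least sampling rate mu from (44)-(45) with epsilon -> 0 (sketch)."""
    g_beta = tail_first_moment(F_inv(1.0 - beta))      # gamma(beta)
    if tail_first_moment(0.0) <= 2.0 * g_beta:         # gamma(1) <= 2 gamma(beta)
        return 1.0

    # (45) multiplied through by (1 - delta_s), written in terms of q
    def gap(q):
        return tail_first_moment(q) - 2.0 * g_beta - q * (1.0 - F(q))

    lo, hi = 0.0, 2.0          # gap(0) > 0 and gap(2) < 0 in this branch
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    q = 0.5 * (lo + hi)
    numerator = tail_first_moment(q) - 2.0 * g_beta
    return tail_second_moment(q) - numerator ** 2 / (1.0 - F(q))

mu_small = strong_threshold(0.05)
mu_larger = strong_threshold(0.10)
```

Sweeping `beta` over a grid and plotting `strong_threshold(beta)` would give a curve of sampling rate against normalized rank under these assumptions; whether it coincides with Figure 1 depends on the assumed form of $\gamma_2$.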

Next we define and analyze sectional threshold.

### 4.2 Sectional Threshold

###### Sectional recovery threshold.

Let be a random Gaussian operator and let be an arbitrary orthogonal projection pair with . Then we say that () is a sectional recovery threshold if with high probability satisfies the following property:

Any matrix with support can be recovered from measurements via (2).

Given a fixed , our aim is to calculate the least such that is sectional threshold for a random Gaussian operator .

###### Lemma 10.

Given support with one can recover all matrices with this support using (2) iff for all we have

$$\|(I - P)\, W\, (I - Q^T)\|_\star > \|P W Q^T\|_\star \tag{46}$$
###### Proof.

Note that in a suitable basis induced by we can write:

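$$X = \begin{bmatrix} X_{11} & 0 \\ 0 & 0 \end{bmatrix}, \qquad W = \begin{bmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{bmatrix} \tag{47}$$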

where . Now If (46) holds then using Lemma 3 we immediately have for all :

 ∥X+W∥⋆=[X11+W11W12W21W22]≥∥X11+W11