# An analysis of noise folding for low-rank matrix recovery

Previous work on low-rank matrix recovery has concentrated on scenarios in which the matrix is noise-free and only the measurements are corrupted by noise. In practical applications, however, the matrix itself is usually perturbed by random noise prior to measurement. This paper investigates this scenario and shows that, for most measurement schemes used in compressed sensing, the two models are equivalent, with the central difference that the noise associated with the perturbed-matrix model is larger by a factor proportional to mn/M, where m and n are the dimensions of the matrix and M is the number of measurements. Additionally, this paper discusses the reconstruction of low-rank matrices in this setting, presents sufficient conditions based on the associated null space property that guarantee robust recovery, and derives the required number of measurements. Furthermore, we extend the analysis to the non-Gaussian noise scenario and give the corresponding result. The simulation experiments show the effect of the noise variance on recovery performance and support the validity of the proposed model.



## 1 Introduction

In recent years, low-rank matrix recovery (LRMR) from noisy measurements, with applications in collaborative filtering, control, quantum tomography, recommender systems, and remote sensing, has gained significant interest. Formally, this problem considers linear measurements of an (approximately) low-rank matrix $X\in\mathbb{R}^{m\times n}$ of the following form

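$$y=\mathcal{A}(X)+w, \tag{1.1}$$

where $y\in\mathbb{R}^{M}$ is the observed vector, $w\in\mathbb{R}^{M}$ is an additive noise term, and $\mathcal{A}:\mathbb{R}^{m\times n}\to\mathbb{R}^{M}$ is a linear measurement map, which is determined by

$$\mathcal{A}(X)=\left[\operatorname{tr}(X^\top A^{(1)}),\ \operatorname{tr}(X^\top A^{(2)}),\ \cdots,\ \operatorname{tr}(X^\top A^{(M)})\right]^\top. \tag{1.2}$$

Here, $\operatorname{tr}(\cdot)$ is the trace function, $X^\top$ is the transpose of $X$, and $A^{(1)},\dots,A^{(M)}\in\mathbb{R}^{m\times n}$ are called measurement matrices. Each $\operatorname{vec}^\top(A^{(i)})$ can be regarded as a row of a compressive measurement matrix, so $\mathcal{A}(X)$ can be written as

$$\mathcal{A}(X)=\begin{bmatrix}\operatorname{vec}^\top(A^{(1)})\\ \operatorname{vec}^\top(A^{(2)})\\ \vdots\\ \operatorname{vec}^\top(A^{(M)})\end{bmatrix}\operatorname{vec}(X):=A\operatorname{vec}(X), \tag{1.3}$$

where $\operatorname{vec}(X)$ is a long vector obtained by stacking the columns of $X$, and $A$ is the $M\times mn$ matrix defined by (1.3), which is associated with the linear measurement map $\mathcal{A}$.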
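The equivalence between the trace form (1.2) and the matrix-vector form (1.3) can be checked numerically. The following is a minimal sketch, assuming small illustrative sizes and random Gaussian measurement matrices; the names `A_mats` and `vec` are hypothetical helpers, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, M = 4, 3, 5                          # illustrative sizes (assumed)
A_mats = rng.standard_normal((M, m, n))    # hypothetical measurement matrices A^(i)
X = rng.standard_normal((m, n))

def vec(Z):
    """Stack the columns of Z into one long vector (column-major order)."""
    return Z.reshape(-1, order="F")

# (1.2): the i-th measurement is tr(X^T A^(i))
y_trace = np.array([np.trace(X.T @ Ai) for Ai in A_mats])

# (1.3): the rows of A are vec(A^(i))^T, so the map is an M x mn matrix times vec(X)
A = np.stack([vec(Ai) for Ai in A_mats])
y_matrix = A @ vec(X)

print(np.allclose(y_trace, y_matrix))
```

Both forms agree because $\operatorname{tr}(X^\top A^{(i)})$ is the entrywise inner product of $X$ and $A^{(i)}$, which does not depend on the stacking order as long as the same order is used for both matrices.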

However, the aforementioned model (1.1) only considers the noise introduced at the measurement stage. In a variety of application scenarios, the matrix to be recovered may also be corrupted by noise. Such an issue arises in a great number of applications, such as the recovery of a video sequence, statistical modeling of hyperspectral imaging, robust matrix completion, and signal processing. Accordingly, it is appropriate to take the following model into account

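$$y=\mathcal{A}(X+Z)+w, \tag{1.4}$$

where $Z\in\mathbb{R}^{m\times n}$ denotes the noise on the original matrix. Throughout this paper, we suppose that $w$ is a white noise vector satisfying $\mathbb{E}(w)=0$ and $\operatorname{Cov}(w)=\sigma^2I_M$, and similarly $Z$ is a white noise matrix obeying $\mathbb{E}(\operatorname{vec}(Z))=0$ and $\operatorname{Cov}(\operatorname{vec}(Z))=\sigma_0^2I_{mn}$, independent of $w$. Here and elsewhere in this paper, $I_k$ stands for the identity matrix of order $k$. Under these hypotheses, in the next section, we will reveal that the model (1.4) is equivalent to

$$\tilde{y}=\mathcal{B}(X)+u, \tag{1.5}$$

where $\mathcal{B}$ is a linear measurement map whose restricted isometry property and spherical section property constants are very close to those of $\mathcal{A}$, and $u$ is white noise with mean zero and covariance matrix $\left(\sigma^2+\frac{mn}{M}\sigma_0^2\right)I_M$.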

When $m=n$ and the matrices $X$ and $Z$ are diagonal, the models (1.1) and (1.4) degenerate to the vector models

$$y=\tilde{A}x+w, \tag{1.6}$$

$$y=\tilde{A}(x+z)+w, \tag{1.7}$$

where $\tilde{A}$ is the measurement matrix and $z$ is the noise on the original signal $x$. As far as we know, most researchers so far either only discuss the situation $Z=0$ of the noise matrix in (1.4), or merely consider the vector model (1.7) and its associated sparse recovery problem. Specifically, Arias-Castro and Eldar considered the model (1.7) and showed that, for the vast majority of measurement schemes employed in compressed sensing, the two models (1.6) and (1.7) are equivalent, with the significant distinction that the signal-to-noise ratio (SNR) is divided by a factor proportional to the ratio of the signal dimension to the number of measurements. For the model (1.4) with $Z=0$, Recht et al. showed that the minimum-rank solution can be recovered by solving a convex optimization problem if a certain restricted isometry property holds for the linear transformation defining the constraints.

In this paper, our main work consists of the following parts. Firstly, we investigate the relation between the restricted isometry property constants and the spherical section property constants of $\mathcal{A}$ and $\mathcal{B}$ under a smallness condition on the quantity $\delta$ defined in (2.13). Secondly, based on certain properties of the null space of the linear measurement map, we establish a sufficient condition for stable and robust recovery of a low-rank matrix that is itself contaminated by noise, together with the corresponding upper bound on the recovery error. Thirdly, we obtain the minimal number of measurements for which the sufficient condition guaranteeing recovery via nuclear norm minimization holds. Finally, the results of numerical experiments show that nuclear norm minimization is effective in recovering low-rank matrices after the whitening treatment.

The rest of the paper is organized as follows. In Section 2, we discuss the relationship between the restricted isometry property constants and the spherical section property constants of $\mathcal{A}$ and $\mathcal{B}$ under certain conditions. In Section 3, the recovery of low-rank matrices via nuclear norm minimization is considered, and sufficient conditions are established to ensure robust reconstruction. In Section 4, the number of samples that ensures stable recovery based on the null space property is presented. Some simulation experiments are carried out in Section 5. The proofs of the main results are provided in Section 6. Finally, the conclusion is given in Section 7.

## 2 RIP and SSP Analysis

In order to derive our results, the model (1.4) can be rewritten as

$$y=\mathcal{A}(X)+v, \tag{2.8}$$

where $v$ is determined by

$$v=\mathcal{A}(Z)+w=A\operatorname{vec}(Z)+w. \tag{2.9}$$

Due to the assumption of white noise and independence, one can easily verify that the covariance of the noise vector $v$ is equal to $\Sigma:=\sigma_0^2AA^\top+\sigma^2I_M$. In general, $v$ is not white noise like the noise $w$, so the recovery analysis may become more complex.

Set $\theta=\sigma^2+\sigma_0^2mn/M$, $\Sigma_1=\Sigma/\theta$, $B=\Sigma_1^{-1/2}A$, $\tilde{y}=\Sigma_1^{-1/2}y$, $u=\Sigma_1^{-1/2}v$. In order to whiten the noise vector $v$, we multiply the equation (2.8) by $\Sigma_1^{-1/2}$ and derive the equivalent equation below

$$\tilde{y}=B\operatorname{vec}(X)+u. \tag{2.10}$$

By applying (1.3), the model (2.10) can be written as

$$\tilde{y}=\mathcal{B}(X)+u, \tag{2.11}$$

where

$$\mathcal{B}(X)=\left[\operatorname{tr}(X^\top B^{(1)}),\ \operatorname{tr}(X^\top B^{(2)}),\ \cdots,\ \operatorname{tr}(X^\top B^{(M)})\right]^\top, \tag{2.12}$$

$\operatorname{vec}^\top(B^{(i)})=B_{i\cdot}$, and $B_{i\cdot}$ denotes the $i$th row of the matrix $B$. Observe that the noise vector $u$ is white noise and its covariance matrix equals $\theta I_M$. In order to investigate (1.4), we can therefore use the results developed for (1.1), with the central difference that the noise corresponding to (1.4) is larger by a factor proportional to $mn/M$. In the case of $M\ll mn$, this gives rise to a large noise amplification, or noise folding. The reason is that the linear measurement map aggregates all the noise entries in $Z$, even those associated with zero entries in $X$; accordingly, it brings about a large increase in noise in the compressed samples.
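The whitening step can be sketched numerically. The code below assumes a Gaussian measurement matrix with entries of variance $1/M$ and illustrative noise levels; it uses the covariance $\sigma_0^2AA^\top+\sigma^2I_M$ of $v=A\operatorname{vec}(Z)+w$, the scaling $\theta=\sigma^2+\sigma_0^2mn/M$, and a symmetric inverse square root computed by eigendecomposition. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, M = 8, 8, 16                 # illustrative sizes (assumed)
sigma, sigma0 = 0.1, 0.5           # assumed noise standard deviations
A = rng.standard_normal((M, m * n)) / np.sqrt(M)   # assumed Gaussian ensemble

# Covariance of v = A vec(Z) + w under the white-noise assumptions
Sigma = sigma0**2 * (A @ A.T) + sigma**2 * np.eye(M)
theta = sigma**2 + sigma0**2 * m * n / M
Sigma1 = Sigma / theta

# Symmetric inverse square root of Sigma1 via its eigendecomposition
evals, evecs = np.linalg.eigh(Sigma1)
W = (evecs * evals**-0.5) @ evecs.T

B = W @ A                          # whitened measurement matrix
cov_u = W @ Sigma @ W.T            # covariance of u = W v

print(np.allclose(cov_u, theta * np.eye(M)))
print("noise variance ratio theta / sigma^2:", theta / sigma**2)
```

The covariance of the whitened noise is exactly $\theta I_M$, so the ratio $\theta/\sigma^2$ quantifies how much larger the effective noise is than in the model (1.1).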

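Our analysis depends on approximating $AA^\top$ by $\frac{mn}{M}I_M$. Set

$$\delta:=\left\|I_M-\frac{M}{mn}AA^\top\right\|. \tag{2.13}$$

Here, $\delta$ measures the quality of approximating $AA^\top$ by $\frac{mn}{M}I_M$, and $\|\cdot\|$ represents the operator norm on $\mathbb{R}^{M\times M}$. For the rest of this paper, suppose that $\delta$ is small. This assumption holds with high probability, as has been shown in prior work.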
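As a rough numerical illustration of (2.13), the sketch below draws a Gaussian matrix with i.i.d. zero-mean entries of variance $1/M$ (an assumed ensemble with illustrative sizes, not a setting taken from the paper) and evaluates $\delta$ as a spectral norm.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, M = 30, 30, 20                               # illustrative sizes with M << mn
A = rng.standard_normal((M, m * n)) / np.sqrt(M)   # assumed ensemble: variance 1/M

# delta = || I_M - (M / mn) A A^T || in the operator (spectral) norm, as in (2.13)
delta = np.linalg.norm(np.eye(M) - (M / (m * n)) * (A @ A.T), ord=2)

# delta / (1 - delta), the quantity that later bounds || Sigma_1^{-1} - I ||
delta1 = delta / (1.0 - delta)

print(f"delta = {delta:.3f}, delta_1 = {delta1:.3f}")
```

For this ensemble $\delta$ shrinks as $M/(mn)$ decreases, which is the regime where noise folding is most pronounced.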

In the following, we investigate the relationship between the restricted isometry constants of $\mathcal{A}$ and $\mathcal{B}$.

For each integer $r$, where $1\le r\le\min\{m,n\}$, we say that a linear measurement map $\mathcal{A}$ has the restricted isometry property (RIP) with constants $\mu_r,\nu_r$ if

$$\mu_r\|X\|_F^2\le\|\mathcal{A}(X)\|_2^2\le\nu_r\|X\|_F^2 \tag{2.14}$$

holds for all matrices $X$ of rank at most $r$ (abbreviated as $r$-rank), where $0<\mu_r\le\nu_r$. The theorem below presents the relationship between the RIP constants of $\mathcal{A}$ and $\mathcal{B}$. Set $\delta_1=\delta/(1-\delta)$.

###### Theorem 2.1.

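Suppose that $\delta<1/2$ in (2.13) and that the linear measurement map $\mathcal{A}$ fulfills the RIP of order $r$ with constants $\mu_r,\nu_r$. Then the linear measurement map $\mathcal{B}$ obeys the RIP of order $r$ with constants $\mu_r(1-\delta_1)$ and $\nu_r(1+\delta_1)$.

###### Remark 2.2.

The theorem shows that, under the assumption $\delta<1/2$, the RIP constants of $\mathcal{A}$ and $\mathcal{B}$ are comparable: they differ by at most the factors $1\pm\delta_1$.

###### Remark 2.3.

In the case of $m=n$ and a diagonal matrix $X$ with $\operatorname{diag}(X)$ being $r$-sparse (i.e., the number of non-zero elements in $\operatorname{diag}(X)$ is at most $r$), Theorem 2.1 coincides with the corresponding proposition for the vector model (1.7).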

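###### Proof of Theorem 2.1: method 1.

The idea is inspired by earlier work. In order to bound $\|\Sigma_1-I_M\|$ we use the definition of $\delta$ in (2.13):

$$\|\Sigma_1-I_M\|=\frac{\sigma_0^2mn}{\theta M}\left\|I_M-\frac{M}{mn}AA^\top\right\|=\frac{\sigma_0^2\frac{mn}{M}}{\sigma^2+\sigma_0^2\frac{mn}{M}}\,\delta\le\delta. \tag{2.15}$$

In the following, by applying the geometric series formula for $\Sigma_1^{-1}=[I-(I-\Sigma_1)]^{-1}$, $\Sigma_1^{-1}-I$ is expressed as

$$\Sigma_1^{-1}-I=[I-(I-\Sigma_1)]^{-1}-I=\sum_{k=1}^{\infty}(I-\Sigma_1)^k. \tag{2.16}$$

The above power series converges because $\|I-\Sigma_1\|\le\delta<1$. In order to bound $\|\Sigma_1^{-1}-I\|$, we take operator norms on both sides of the equality (2.16) and get

$$\|\Sigma_1^{-1}-I\|\overset{(a)}{\le}\sum_{k=1}^{\infty}\|(I-\Sigma_1)^k\|\overset{(b)}{\le}\sum_{k=1}^{\infty}\|I-\Sigma_1\|^k\overset{(c)}{\le}\sum_{k=1}^{\infty}\delta^k=\frac{\delta}{1-\delta}=:\delta_1, \tag{2.17}$$

where (a) follows from the triangle inequality, (b) uses the fact that $\|PQ\|\le\|P\|\|Q\|$ for all matrices $P$ and $Q$ in $\mathbb{R}^{M\times M}$, and (c) is due to (2.15).

Take any $X\in\mathbb{R}^{m\times n}$ of rank at most $r$. Note that

$$\|\mathcal{B}(X)\|_2^2-\|\mathcal{A}(X)\|_2^2=\|B\operatorname{vec}(X)\|_2^2-\|A\operatorname{vec}(X)\|_2^2=\operatorname{vec}^\top(X)A^\top(\Sigma_1^{-1}-I)A\operatorname{vec}(X). \tag{2.18}$$

By employing Hölder's inequality, the definition of the operator norm and (2.17), we get

$$\begin{aligned}\left|\operatorname{vec}^\top(X)A^\top(\Sigma_1^{-1}-I)A\operatorname{vec}(X)\right|&\le\|(A\operatorname{vec}(X))^\top\|_2\,\|(\Sigma_1^{-1}-I)A\operatorname{vec}(X)\|_2\\&\le\|\Sigma_1^{-1}-I\|\,\|A\operatorname{vec}(X)\|_2^2\\&\le\delta_1\|A\operatorname{vec}(X)\|_2^2=\delta_1\|\mathcal{A}(X)\|_2^2.\end{aligned} \tag{2.19}$$

Combining (2.18) and (2.19), we get

$$(1-\delta_1)\|\mathcal{A}(X)\|_2^2\le\|\mathcal{B}(X)\|_2^2\le(1+\delta_1)\|\mathcal{A}(X)\|_2^2. \tag{2.20}$$

Combining (2.20) with (2.14) implies

$$\mu_r(1-\delta_1)\|X\|_F^2\le\|\mathcal{B}(X)\|_2^2\le\nu_r(1+\delta_1)\|X\|_F^2.$$

This completes the proof. ∎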
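The two-sided bound (2.20) can be sanity-checked on random low-rank matrices. The sketch below is only an illustration under assumed settings (Gaussian measurement matrix with entries of variance $1/M$, illustrative sizes and noise levels); it forms $B=\Sigma_1^{-1/2}A$ with $\Sigma_1=(\sigma_0^2AA^\top+\sigma^2I_M)/\theta$ and $\theta=\sigma^2+\sigma_0^2mn/M$, and compares the energy ratio $\|\mathcal{B}(X)\|_2^2/\|\mathcal{A}(X)\|_2^2$ with $1\pm\delta_1$.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, M, r = 40, 40, 20, 2         # illustrative sizes (assumed)
sigma, sigma0 = 0.1, 0.3           # assumed noise standard deviations
A = rng.standard_normal((M, m * n)) / np.sqrt(M)

theta = sigma**2 + sigma0**2 * m * n / M
Sigma1 = (sigma0**2 * (A @ A.T) + sigma**2 * np.eye(M)) / theta
evals, evecs = np.linalg.eigh(Sigma1)
B = (evecs * evals**-0.5) @ evecs.T @ A            # B = Sigma1^{-1/2} A

delta = np.linalg.norm(np.eye(M) - (M / (m * n)) * (A @ A.T), ord=2)
delta1 = delta / (1.0 - delta)

ratios = []
for _ in range(200):
    # Random matrix of rank at most r
    X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
    x = X.reshape(-1, order="F")
    ratios.append(np.sum((B @ x) ** 2) / np.sum((A @ x) ** 2))
ratios = np.array(ratios)

print(f"delta_1 = {delta1:.3f}")
print(f"observed ratio range: [{ratios.min():.3f}, {ratios.max():.3f}]")
```

The observed ratios typically sit well inside the interval $[1-\delta_1,1+\delta_1]$, since the bound is a worst case over all directions.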

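###### Proof of Theorem 2.1: method 2.

We keep the preceding notation unless stated otherwise. Since vectorizing the matrix loses its structural information, we work with the matrices directly. Set $H=\operatorname{Cov}(\mathcal{A}(Z))$. By some calculations,

$$H=\sigma_0^2\begin{bmatrix}\langle A^{(1)},A^{(1)}\rangle&\langle A^{(1)},A^{(2)}\rangle&\cdots&\langle A^{(1)},A^{(M)}\rangle\\ \langle A^{(2)},A^{(1)}\rangle&\langle A^{(2)},A^{(2)}\rangle&\cdots&\langle A^{(2)},A^{(M)}\rangle\\ \vdots&\vdots&&\vdots\\ \langle A^{(M)},A^{(1)}\rangle&\langle A^{(M)},A^{(2)}\rangle&\cdots&\langle A^{(M)},A^{(M)}\rangle\end{bmatrix}:=\sigma_0^2G.$$

It follows that $\operatorname{Cov}(v)=\sigma_0^2G+\sigma^2I_M$. Set $\Sigma_1=(\sigma_0^2G+\sigma^2I_M)/\theta$ and

$$\delta:=\left\|I_M-\frac{M}{mn}G\right\|. \tag{2.21}$$

It is easy to check that $\operatorname{Cov}(u)=\theta I_M$, i.e., $u$ is white noise. Next we estimate the upper bound of $\|\Sigma_1-I_M\|$. By applying (2.21), we get

$$\|\Sigma_1-I_M\|=\frac{\sigma_0^2mn}{\theta M}\left\|I_M-\frac{M}{mn}G\right\|=\frac{\sigma_0^2\frac{mn}{M}}{\sigma^2+\sigma_0^2\frac{mn}{M}}\,\delta\le\delta. \tag{2.22}$$

For any $r$-rank matrix $X$, since $\mathcal{B}(X)=\Sigma_1^{-1/2}\mathcal{A}(X)$ and $\Sigma_1^{-1/2}$ is symmetric, we get

$$\begin{aligned}\|\mathcal{B}(X)\|_2^2-\|\mathcal{A}(X)\|_2^2&=\|\Sigma_1^{-1/2}\mathcal{A}(X)\|_2^2-\|\mathcal{A}(X)\|_2^2\\&=\langle\Sigma_1^{-1/2}\mathcal{A}(X),\Sigma_1^{-1/2}\mathcal{A}(X)\rangle-\langle\mathcal{A}(X),\mathcal{A}(X)\rangle\\&=\langle\mathcal{A}(X),(\Sigma_1^{-1/2})^*\Sigma_1^{-1/2}\mathcal{A}(X)\rangle-\langle\mathcal{A}(X),\mathcal{A}(X)\rangle\\&=\langle\mathcal{A}(X),\Sigma_1^{-1}\mathcal{A}(X)\rangle-\langle\mathcal{A}(X),\mathcal{A}(X)\rangle\\&=\langle\mathcal{A}(X),(\Sigma_1^{-1}-I)\mathcal{A}(X)\rangle,\end{aligned} \tag{2.23}$$

where $(\Sigma_1^{-1/2})^*$ is the adjoint (conjugate transpose) of $\Sigma_1^{-1/2}$. By using Hölder's inequality and (2.17), we get

$$\left|\langle\mathcal{A}(X),(\Sigma_1^{-1}-I)\mathcal{A}(X)\rangle\right|\le\|\mathcal{A}(X)\|_2\,\|(\Sigma_1^{-1}-I)\mathcal{A}(X)\|_2\le\|\Sigma_1^{-1}-I\|\,\|\mathcal{A}(X)\|_2^2\le\delta_1\|\mathcal{A}(X)\|_2^2. \tag{2.24}$$

The remainder of the proof is the same as in Method 1, and we omit it for brevity. ∎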

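Next, we present the concept of the spherical section property of a linear measurement map.

The spherical section constant of a linear measurement map $\mathcal{A}$ is defined as

$$\Delta(\mathcal{A})=\min_{X\in N(\mathcal{A})\setminus\{0\}}\frac{\|X\|_*^2}{\|X\|_F^2},$$

and we say that $\mathcal{A}$ satisfies the $\Delta$-spherical section property (SSP) if $\Delta(\mathcal{A})\ge\Delta$, where $N(\mathcal{A})$ is the null space of $\mathcal{A}$ and $\|X\|_*$ is the nuclear norm of the matrix $X$, i.e., the sum of its singular values. In the following proposition, we explore the connection between the SSP constants of $\mathcal{A}$ and $\mathcal{B}$.

###### Proposition 2.4.

Suppose that the linear measurement map $\mathcal{A}$ satisfies the $\Delta$-SSP with $\Delta(\mathcal{A})\ge\Delta$. Then the linear measurement map $\mathcal{B}$ obeys the $\Delta$-SSP with $\Delta(\mathcal{B})\ge\Delta$.

###### Remark 2.5.

The proposition indicates that the SSP constants of $\mathcal{A}$ and $\mathcal{B}$ are identical.

###### Proof of Proposition 2.4.

Firstly, we show that $N(\mathcal{A})=N(\mathcal{B})$.

For any $X\in N(\mathcal{A})$, we have $\mathcal{A}(X)=0$, i.e., $A\operatorname{vec}(X)=0$. Note that $B=\Sigma_1^{-1/2}A$. Hence, $B\operatorname{vec}(X)=0$, namely, $\mathcal{B}(X)=0$. Therefore, $N(\mathcal{A})\subseteq N(\mathcal{B})$. Similarly, since $\Sigma_1^{-1/2}$ is invertible, we can deduce that $N(\mathcal{B})\subseteq N(\mathcal{A})$. Combining the above facts, $N(\mathcal{A})=N(\mathcal{B})$.

Now, we calculate the SSP constant of $\mathcal{B}$. By making use of the definition of the SSP, we get

$$\Delta(\mathcal{B})=\min_{X\in N(\mathcal{B})\setminus\{0\}}\frac{\|X\|_*^2}{\|X\|_F^2}=\min_{X\in N(\mathcal{A})\setminus\{0\}}\frac{\|X\|_*^2}{\|X\|_F^2}=\Delta(\mathcal{A})\ge\Delta.$$

The proof is complete. ∎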
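The key step of this proof, that whitening leaves the null space unchanged, can be illustrated numerically. The sketch below assumes a small Gaussian measurement matrix and illustrative noise levels, builds $B$ as an invertible $M\times M$ matrix times $A$, and checks that an orthonormal basis of the null space of $A$ is also annihilated by $B$. It does not attempt to compute the SSP constant itself, which would require a minimization over the null space.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, M = 4, 4, 6                  # illustrative sizes (assumed)
sigma, sigma0 = 0.1, 0.3           # assumed noise standard deviations
A = rng.standard_normal((M, m * n)) / np.sqrt(M)

theta = sigma**2 + sigma0**2 * m * n / M
Sigma1 = (sigma0**2 * (A @ A.T) + sigma**2 * np.eye(M)) / theta
evals, evecs = np.linalg.eigh(Sigma1)
B = (evecs * evals**-0.5) @ evecs.T @ A            # invertible matrix times A

# Orthonormal basis of N(A): right singular vectors beyond the rank of A
_, s, Vt = np.linalg.svd(A)                        # Vt has shape (mn, mn)
rank_A = int(np.sum(s > 1e-10))
null_basis = Vt[rank_A:].T                         # shape (mn, mn - rank_A)

print("max |B z| over the basis of N(A):", np.abs(B @ null_basis).max())
print("rank(A), rank(B):", rank_A, np.linalg.matrix_rank(B))
```

Equal ranks together with $N(A)\subseteq N(B)$ imply that the two null spaces coincide, so the two minimizations in the proof range over the same set.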

## 3 The null space property for LRMR

For recovering $X$, a prominent approach is to solve the constrained nuclear norm minimization problem

$$\min_{\hat{X}\in\mathbb{R}^{m\times n}}\|\hat{X}\|_*\quad\text{subject to}\quad\|\mathcal{B}(\hat{X})-\tilde{y}\|_2\le\epsilon, \tag{3.25}$$

where $\epsilon$ stands for the noise level, and $\|u\|_2\le\epsilon$ holds with high probability; for more details, see Lemma 6.1. As one of the crucial tools for the analysis of LRMR, the Frobenius-robust rank null space property (FRRNSP) of a linear measurement map attracts specific interest.

###### Definition 3.1.

(FRRNSP) The linear measurement map $\mathcal{A}$ is said to satisfy the Frobenius-robust rank null space property of order $r$ with constants $0<\rho<1$ and $\tau>0$ if for any $X\in\mathbb{R}^{m\times n}$, the singular values of $X$ fulfill

$$\|X_{[r]}\|_F\le\frac{\rho}{\sqrt{r}}\|X_{[r]^c}\|_*+\tau\|\mathcal{A}(X)\|_2.$$

Here, the singular value decomposition (SVD) of $X$ is $X=\sum_{i}\sigma_i(X)u_iv_i^\top$ with $\sigma_1(X)\ge\sigma_2(X)\ge\cdots\ge0$, where $\sigma_i(X)$ is the $i$th largest singular value of $X$, and $u_i$ and $v_i$ are respectively the left and right singular vectors of $X$. In this situation, write $X=X_{[r]}+X_{[r]^c}$, where $X_{[r]}$ is the best $r$-rank approximation of $X$, i.e., $X_{[r]}=\sum_{i=1}^{r}\sigma_i(X)u_iv_i^\top$. Combining Definition 3.1 with (2.20), we obtain the FRRNSP of the linear measurement map $\mathcal{B}$ given by the following lemma.
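The splitting of a matrix into its best rank-$r$ approximation and the remainder can be computed with a truncated SVD. The following is a minimal sketch on a random matrix; the function name `split_best_rank_r` is a hypothetical helper, and the two printed quantities are the norms that appear in the null space property.

```python
import numpy as np

def split_best_rank_r(X, r):
    """Return (best rank-r approximation, remainder, singular values) of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_r = (U[:, :r] * s[:r]) @ Vt[:r, :]   # keep the r largest singular triplets
    return X_r, X - X_r, s

rng = np.random.default_rng(6)
X = rng.standard_normal((6, 5))            # illustrative matrix (assumed)
r = 2
X_r, X_rc, s = split_best_rank_r(X, r)

head_fro = np.linalg.norm(X_r, "fro")      # Frobenius norm of the rank-r part
tail_nuc = np.linalg.norm(X_rc, "nuc")     # nuclear norm of the remainder

print(f"head Frobenius norm: {head_fro:.4f}, tail nuclear norm: {tail_nuc:.4f}")
```

The Frobenius norm of the head depends only on the $r$ largest singular values and the nuclear norm of the tail only on the remaining ones, which is why the property is stated in terms of the singular values.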

###### Lemma 3.2.

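Set $\delta_1=\delta/(1-\delta)$. Under the assumptions of Definition 3.1 and $\delta<1/2$, the linear measurement map $\mathcal{B}$ obeys the FRRNSP of order $r$, namely, for all $X\in\mathbb{R}^{m\times n}$,

$$\sqrt{1-\delta_1}\,\|X_{[r]}\|_F\le\frac{\rho\sqrt{1-\delta_1}}{\sqrt{r}}\|X_{[r]^c}\|_*+\tau\|\mathcal{B}(X)\|_2$$

holds for the singular values of $X$.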

Based on the above notion and lemma, we will establish an FRRNSP condition for stable and robust recovery of low-rank matrix via the nuclear norm minimization and discuss the upper bound estimation of reconstruction error.

###### Theorem 3.3.

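Suppose that a linear measurement map $\mathcal{A}$ satisfies the Frobenius-robust rank null space property of order $r$ with constants $0<\rho<1$ and $\tau>0$. Set $\delta_1=\delta/(1-\delta)$. Assume that $\delta<1/2$. Then, for any $X\in\mathbb{R}^{m\times n}$, a solution $X^*$ of (3.25) with $\tilde{y}=\mathcal{B}(X)+u$ and $\|u\|_2\le\epsilon$ approximates the matrix $X$ with error

$$\|X-X^*\|_F\le C_1\frac{\|X_{[r]^c}\|_*}{\sqrt{r}}+C_2\epsilon, \tag{3.26}$$

where

$$C_1=\frac{2(1+\rho)^2}{1-\rho}$$

and

$$C_2=\frac{2(3+\rho)\tau}{(1-\rho)\sqrt{1-\delta_1}}.$$

###### Remark 3.4.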

The theorem gives a sufficient condition to ensure the stable and robust reconstruction of the low-rank matrices.
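To make the constants in (3.26) concrete, the sketch below evaluates $C_1=2(1+\rho)^2/(1-\rho)$, $C_2=2(3+\rho)\tau/((1-\rho)\sqrt{1-\delta_1})$ and the resulting right-hand side of (3.26). The function name and the example inputs are hypothetical; the inputs are chosen only to make the arithmetic easy to follow.

```python
import math

def frrnsp_error_bound(rho, tau, delta1, tail_nuclear, r, eps):
    """Evaluate C1, C2 and the right-hand side of (3.26).

    tail_nuclear is the nuclear norm of the part of X beyond its best
    rank-r approximation, and eps is the noise level.
    """
    if not (0 < rho < 1 and tau > 0 and 0 <= delta1 < 1 and r >= 1):
        raise ValueError("need 0 < rho < 1, tau > 0, 0 <= delta1 < 1, r >= 1")
    C1 = 2 * (1 + rho) ** 2 / (1 - rho)
    C2 = 2 * (3 + rho) * tau / ((1 - rho) * math.sqrt(1 - delta1))
    bound = C1 * tail_nuclear / math.sqrt(r) + C2 * eps
    return C1, C2, bound

# Example: an exactly rank-r matrix (zero tail) with noise level 0.1
C1, C2, bound = frrnsp_error_bound(rho=0.5, tau=1.0, delta1=0.0,
                                   tail_nuclear=0.0, r=4, eps=0.1)
print(C1, C2, bound)
```

Both constants grow without bound as $\rho\to1$, and $C_2$ also grows as $\delta_1\to1$, so the guarantee degrades when the null space property barely holds or when $AA^\top$ is far from a multiple of the identity.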

###### Remark 3.5.

The inequality (3.26) in Theorem 3.3 provides an upper bound on the reconstruction error of nuclear norm minimization. In particular, it shows that the reconstruction accuracy is controlled by the noise level $\epsilon$ and the best $r$-rank approximation error $\|X_{[r]^c}\|_*$, that is, by how far $X$ is from having rank $r$. In this sense, Theorem 3.3 demonstrates that, under certain conditions, an $r$-rank matrix can be robustly reconstructed by the method (3.25).

###### Remark 3.6.

When no noise is introduced, i.e., $Z=0$ and $w=0$, the bound yields exact recovery whenever the matrix is $r$-rank.

###### Remark 3.7.

By Lemma 6.1, we know that $u$ is Gaussian noise, so it is usually bounded in the $\ell_2$-norm. However, when $u$ is non-Gaussian noise, for example Gaussian mixture noise, it is more appropriate to bound the noise in the $\ell_p$-norm. Then the true matrix $X$ can be robustly recovered by

$$\min_{\hat{X}\in\mathbb{R}^{m\times n}}\|\hat{X}\|_*\quad\text{subject to}\quad\|\mathcal{B}(\hat{X})-\tilde{y}\|_p\le\epsilon, \tag{3.27}$$

where $\epsilon$ denotes the noise level, which varies according to the range of $p$, and $\|u\|_p\le\epsilon$ holds with high probability; for more details, see Lemma 6.2. In the following, we only consider the case $p\ge2$, because the remaining case is similar. In this case, assuming the conditions of Lemma 3.2 (with $\|\cdot\|_2$ replaced by $\|\cdot\|_p$), the linear map $\mathcal{B}$ satisfies the FRRNSP of order $r$, viz.,

$$\|X_{[r]}\|_F\le\frac{\rho}{\sqrt{r}}\|X_{[r]^c}\|_*+\frac{\tau M^{1/2-1/p}}{\sqrt{1-\delta_1}}\|\mathcal{B}(X)\|_p.$$

Under the assumptions of Theorem 3.3, the solution $X^*$ of (3.27) satisfies

$$\|X-X^*\|_F\le\frac{2(1+\rho)^2}{1-\rho}\frac{\|X_{[r]^c}\|_*}{\sqrt{r}}+\frac{2\tau(3+\rho)}{1-\rho}\frac{M^{1/2-1/p}}{\sqrt{1-\delta_1}}\epsilon,$$

$$\|X-X^*\|_p\le\frac{2(1+\rho)^2}{(1-\rho)r^{1-1/p}}\|X_{[r]^c}\|_*+\frac{2\tau(3+\rho)r^{1/p-1/2}}{1-\rho}\frac{M^{1/2-1/p}}{\sqrt{1-\delta_1}}\epsilon,$$

where with .

In the following, we present the stable rank null space property (SRNSP) of a linear measurement map, which is weaker than the Frobenius-robust rank null space property; an analogous definition exists for sparse signal reconstruction.

###### Definition 3.8.

(SRNSP) We say that the linear measurement map $\mathcal{A}$ satisfies the stable rank null space property of order $r$ with constants $0<\rho<1$ and $\tau>0$ if for any $X\in\mathbb{R}^{m\times n}$, the singular values of $X$ fulfill

$$\|X_{[r]}\|_*\le\rho\|X_{[r]^c}\|_*+\tau\|\mathcal{A}(X)\|_2.$$

Similar to Lemma 3.2, we derive the following result on the SRNSP of the linear measurement map $\mathcal{B}$.

###### Lemma 3.9.

Set $\delta_1=\delta/(1-\delta)$. Assume the conditions of Definition 3.8 and $\delta<1/2$. Then the linear measurement map $\mathcal{B}$ satisfies the SRNSP of order $r$, viz., for all $X\in\mathbb{R}^{m\times n}$,

$$\sqrt{1-\delta_1}\,\|X_{[r]}\|_*\le\rho\sqrt{1-\delta_1}\,\|X_{[r]^c}\|_*+\tau\|\mathcal{B}(X)\|_2$$

holds for the singular values of $X$.

With the preparation above, we now state the stability and robustness of the method (3.25) under the SRNSP of a linear measurement map.

###### Theorem 3.10.

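We assume that a linear measurement map $\mathcal{A}$ satisfies the stable rank null space property of order $r$ with constants $0<\rho<1$ and $\tau>0$. Set $\delta_1=\delta/(1-\delta)$ with $\delta<1/2$. Then, for any $X\in\mathbb{R}^{m\times n}$, a solution $X^*$ of (3.25) with $\tilde{y}=\mathcal{B}(X)+u$ and $\|u\|_2\le\epsilon$ approximates the matrix $X$ with error

$$\|X-X^*\|_F\le D_1\frac{\|X_{[r]^c}\|_*}{\sqrt{r}}+D_2\epsilon, \tag{3.28}$$

where

$$D_1=\frac{2(1+\rho)(\rho\sqrt{r}+1)}{1-\rho}$$

and

$$D_2=\frac{2[(1+\rho)\sqrt{r}+2]\tau}{(1-\rho)\sqrt{r(1-\delta_1)}}.$$

###### Corollary 3.11.

Under the same assumptions as in Theorem 3.10, suppose that $\epsilon=0$ and $X$ is $r$-rank. Then $X$ can be exactly reconstructed via the method (3.25).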

## 4 Measurement map with independent entries and four finite moments

In this section, we will determine how many measurement matrices with independent elements and four finite moments are needed for the FRRNSP condition to be fulfilled with high probability.

###### Theorem 4.1.

Set . Let , and is defined by (1.2), where the

are independent copies of a random matrix

with independent mean zero elements following

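$$\operatorname{Var}(X_{ij})=\frac{\sigma^2+\sigma_0^2}{\sigma^2+mn\sigma_0^2/M}$$

and

$$\mathbb{E}X_{ij}^4\le\frac{(\sigma^2+\sigma_0^2)^2}{(\sigma^2+mn\sigma_0^2/M)^2}\,C_4$$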

for all $i,j$ and some positive constant $C_4$. In addition, assume that the measurement matrices $A^{(i)}$ are mutually orthogonal, that is, the columns of $A^\top$ are mutually orthogonal, and that each of these columns has length $1$.

Then, for given and with , there exists relying on that are positive constants, such that satisfies the Frobenius-robust rank null space property with constants and with probability at least whenever

$$M\ge c_1r(m+n).$$

###### Proof of Theorem 4.1.

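By the assumptions of Theorem 4.1, we get

$$\operatorname{vec}^\top(A^{(i)})\operatorname{vec}(A^{(i)})=1,\quad i=1,2,\cdots,M,\qquad\operatorname{vec}^\top(A^{(i)})\operatorname{vec}(A^{(j)})=0,\quad i\ne j.$$

Consequently,

$$AA^\top=\begin{pmatrix}\operatorname{vec}^\top(A^{(1)})\\ \operatorname{vec}^\top(A^{(2)})\\ \vdots\\ \operatorname{vec}^\top(A^{(M)})\end{pmatrix}\left(\operatorname{vec}(A^{(1)}),\operatorname{vec}(A^{(2)}),\cdots,\operatorname{vec}(A^{(M)})\right)=I_M.$$

By employing the identity above and the definition of $B$, we get

$$B_{i\cdot}=(\Sigma_1^{-1/2})_{i\cdot}A=\left(\frac{\sigma^2+mn\sigma_0^2/M}{\sigma^2+\sigma_0^2}\right)^{1/2}A_{i\cdot},$$

i.e.,

$$\operatorname{vec}^\top(B^{(i)})=\left(\frac{\sigma^2+mn\sigma_0^2/M}{\sigma^2+\sigma_0^2}\right)^{1/2}\operatorname{vec}^\top(A^{(i)}).$$

By applying the conditions of Theorem 4.1, we get

$$\mathbb{E}(B_{ij})=\mathbb{E}\left[\left(\frac{\sigma^2+mn\sigma_0^2/M}{\sigma^2+\sigma_0^2}\right)^{1/2}A_{ij}\right]=0,$$

$$\operatorname{Var}(B_{ij})=\operatorname{Var}\left[\left(\frac{\sigma^2+mn\sigma_0^2/M}{\sigma^2+\sigma_0^2}\right)^{1/2}A_{ij}\right]=1,$$

$$\mathbb{E}(B_{ij}^4)=\mathbb{E}\left[\left(\frac{\sigma^2+mn\sigma_0^2/M}{\sigma^2+\sigma_0^2}\right)^{1/2}A_{ij}\right]^4\le C_4.$$

The remainder of the proof follows the same lines as the corresponding theorem in the cited reference and is omitted here for brevity. ∎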
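The scaling identity used in this proof can be checked numerically. The sketch below assumes a matrix $A$ with orthonormal rows obtained from a QR factorization (an illustrative construction, not the random model of the theorem) and verifies that $B=\Sigma_1^{-1/2}A$, with $\Sigma_1=(\sigma_0^2AA^\top+\sigma^2I_M)/(\sigma^2+mn\sigma_0^2/M)$, is a scalar multiple of $A$ with the stated factor.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, M = 5, 4, 6                  # illustrative sizes (assumed)
sigma, sigma0 = 0.2, 0.4           # assumed noise standard deviations

# Orthonormal rows: take the Q factor of a random mn x M matrix and transpose it
Q, _ = np.linalg.qr(rng.standard_normal((m * n, M)))
A = Q.T                            # now A A^T = I_M

theta = sigma**2 + sigma0**2 * m * n / M
Sigma1 = (sigma0**2 * (A @ A.T) + sigma**2 * np.eye(M)) / theta
evals, evecs = np.linalg.eigh(Sigma1)
B = (evecs * evals**-0.5) @ evecs.T @ A            # B = Sigma1^{-1/2} A

# Scaling factor from the proof of Theorem 4.1
scale = np.sqrt((sigma**2 + m * n * sigma0**2 / M) / (sigma**2 + sigma0**2))

print(np.allclose(B, scale * A), "scale =", scale)
```

Because $\Sigma_1$ is a multiple of the identity in this case, whitening only rescales the measurement matrix, which is why the moment conditions of the theorem carry over to $B$ after normalization.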

## 5 Numerical Simulations

In this section, we present the optimization details for the constrained problem (3.25). The regularized form of the problem (3.25) is

$$\min_{\hat{X}}\ \|\hat{X}\|_*+\frac{\lambda}{2}\|B\operatorname{vec}(\hat{X})-\tilde{y}\|_2^2, \tag{5.29}$$

where $\lambda>0$ is a regularization parameter, $B\in\mathbb{R}^{M\times mn}$, $\tilde{y}\in\mathbb{R}^{M}$, and $\operatorname{vec}(\hat{X})$ stands for the vectorization of $\hat{X}$. We solve the unconstrained problem (5.29) by using the alternating direction method of multipliers (ADMM). The problem (5.29) can be equivalently rewritten as

$$\min_{\hat{X},U}\ \|\hat{X}\|_*+\frac{\lambda}{2}\|B\operatorname{vec}(U)-\tilde{y}\|_2^2\quad\text{s.t.}\quad\hat{X}=U. \tag{5.30}$$

The associated augmented Lagrangian function is

$$L(\hat{X},U,W)=\|\hat{X}\|_*+\frac{\lambda}{2}\|B\operatorname{vec}(U)-\tilde{y}\|_2^2+\langle W,\hat{X}-U\rangle+\frac{\rho_1}{2}\|\hat{X}-U\|_F^2, \tag{5.31}$$

where $W$ indicates the Lagrangian multiplier, and $\rho_1$ is a positive scalar. Then $\hat{X}$ and $U$ can be obtained by minimizing over each variable alternately while fixing the other variables. The update details are summarized in Algorithm 5.1.
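Since Algorithm 5.1 is not reproduced in this section, the following is only a generic ADMM sketch for the split problem (5.30), not necessarily the authors' exact updates. It assumes the data term is placed on $U$ as in (5.30), uses singular value thresholding for the $\hat{X}$-update, a linear solve for the $U$-update, and a standard dual ascent step for $W$. The function names, parameter values and test problem are hypothetical.

```python
import numpy as np

def svt(Y, t):
    """Singular value thresholding: proximal map of t * nuclear norm at Y."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt

def admm_nuclear(B, y, shape, lam=10.0, rho1=1.0, iters=200):
    """Generic ADMM for min ||X||_* + lam/2 ||B vec(U) - y||^2 s.t. X = U."""
    m, n = shape
    X = np.zeros(shape)
    U = np.zeros(shape)
    W = np.zeros(shape)
    # The U-update solves the same linear system in every iteration
    lhs = lam * (B.T @ B) + rho1 * np.eye(m * n)
    rhs_fixed = lam * (B.T @ y)
    for _ in range(iters):
        # X-update: minimize ||X||_* + rho1/2 ||X - U + W/rho1||_F^2
        X = svt(U - W / rho1, 1.0 / rho1)
        # U-update: (lam B^T B + rho1 I) vec(U) = lam B^T y + vec(W + rho1 X)
        rhs = rhs_fixed + (W + rho1 * X).reshape(-1, order="F")
        U = np.linalg.solve(lhs, rhs).reshape(shape, order="F")
        # Dual update on the constraint X = U
        W = W + rho1 * (X - U)
    return X, U

# Small noise-free test problem (all sizes are illustrative assumptions)
rng = np.random.default_rng(7)
m, n, r, M = 8, 8, 1, 40
X_true = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
B = rng.standard_normal((M, m * n)) / np.sqrt(M)
y = B @ X_true.reshape(-1, order="F")

X_hat, U_hat = admm_nuclear(B, y, (m, n))
rel_err = np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)
print(f"relative error: {rel_err:.3e}")
```

The vectorization uses column-major order throughout so that it matches the column-stacking convention of (1.3); the quality of the estimate depends on the choices of `lam`, `rho1` and the iteration count, which are not tuned here.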

In our experiments, the measurement matrix is generated with its elements being i.i.d., zero-mean,

-variance Gaussian distribution. Next, the matrix

of rank is generated by , where and are with i.i.d. draw from a standard Gaussian distribution. The noise matrix and the measurement noise vector are then respectively generated with its entries being i.i.d., zero-mean, -variance Gaussian distribution () and -variance Gaussian distribution (). We choose and . With , , and , the measurement is produced by . Due to ,