# Robust recovery of low-rank matrices and low-tubal-rank tensors from noisy sketches

A common approach for compressing large-scale data is through matrix sketching. In this work, we consider the problem of recovering low-rank matrices from two noisy sketches using the double sketching algorithm discussed in Fazel et al. (2008). Using tools from non-asymptotic random matrix theory, we provide the first theoretical guarantees characterizing the error between the output of the double sketch algorithm and the ground truth low-rank matrix. We apply our result to the problems of low-rank matrix approximation and low-tubal-rank tensor recovery.

• 15 publications
• 9 publications
• 21 publications
05/21/2017

### Improved Algorithms for Matrix Recovery from Rank-One Projections

We consider the problem of estimation of a low-rank matrix from a limite...
12/05/2014

### Two step recovery of jointly sparse and low-rank matrices: theoretical guarantees

We introduce a two step algorithm with theoretical guarantees to recover...
06/14/2020

### Nonasymptotic Guarantees for Low-Rank Matrix Recovery with Generative Priors

Many problems in statistics and machine learning require the reconstruct...
05/12/2022

### Sketching sparse low-rank matrices with near-optimal sample- and time-complexity

We consider the problem of recovering an n_1 × n_2 low-rank matrix with ...
05/28/2019

### Polynomial Tensor Sketch for Element-wise Function of Low-Rank Matrix

This paper studies how to sketch element-wise functions of low-rank matr...
10/30/2019

### Learning-Based Low-Rank Approximations

We introduce a "learning-based" algorithm for the low-rank decomposition...
03/08/2017

### On Approximation Guarantees for Greedy Low Rank Optimization

We provide new approximation guarantees for greedy low rank matrix estim...

## 1 Introduction

The prevalence of large-scale data in data science applications has created immense demand for methods that can aid in reducing the computational cost of processing and storing said data. Oftentimes, data such as images, videos, and text documents, can be represented as a matrix, and thus, the ability to efficiently store matrices becomes an important task. One way to efficiently store a large-scale matrix is to store a sketch of the matrix, i.e., another matrix such that two goals are accomplished. Firstly, the sketch of must be cheaper to store than itself, i.e., we want . Second, the matrix must be recoverable from its sketch.

A variety of works have been produced for the setting in which is a low-rank matrix, and one wishes to recover using its sketch [13, 26]. However, in certain settings, one may only have access to noisy sketches. For example, suppose a sketch is stored on a hard disk drive. Over time, the hard drive experiences data degradation due to bits losing their magnetic orientation or extreme fluctuations in temperature affecting the physical hard drive itself [12]. As another example, the matrix being sketched can be a noisy version of the data one is trying to preserve [4]. One can even consider the low-rank approximation problem as one such instance.

In this work, we analyze the noisy double-sketch algorithm originally proposed but not theoretically studied in [13]. We show that when the sketching matrices are i.i.d. complex Gaussian random matrices, one can recover the original low-rank matrix

with high probability where the error on the approximation depends on the noise level for both sketches. Here, we do not assume that one has access to the exact rank of

but instead an approximate rank . We also remark on the utility of our theoretical guarantees for when the double sketch algorithm is used not for low-rank matrix recovery but instead for low-rank approximation with noise. Lastly, we present results for the application of this work to a more extreme large-scale data setting in which one wants to recover a low-tubal-rank tensor.

A key step in our robust recovery analysis is to control the perturbation error of a low-rank matrix under noise. A standard way is to apply Wedin’s theorem, or Davis-Kahan theorem [34, 8, 7, 22]

which results in a bound that depends on the condition number of the low-rank matrix. However, our proof is based on an exact formula to calculate the difference between output and the ground truth matrix and a detailed analysis of the random matrices (extreme singular value bounds of Gaussian matrices

[32, 25, 28] and the least singular value of truncated Haar unitary matrices [3, 9]) involved in the double sketch algorithm. This novel approach yields a bound independent of the condition number of the low-rank matrix (Theorem 2). Due to the Gaussian structure of our sensing matrices, our results are non-asymptotic, and all the constants involved in the probabilistic error bounds are explicit.

### 1.1 Low-rank matrix recovery

A double sketching algorithm was proposed in [13] to recover low-rank matrices. This approach was also called bilateral random projection and analyzed in [40] to obtain a low-rank approximation of matrix using two sketched matrices from in the noiseless situation. A similar approach was analyzed in [29]. The so-called problem of compressive PCA was studied in [26] and [1]. It can be interpreted as a variant of sketching where only the columns of a matrix are sketched. However, this problem is not directly comparable to the setting in the paper at hand, as in compressive PCA, a different sketching matrix is used for each column.

### 1.2 Low-tubal-rank tensor recovery

The notion of a low-tubal-rank tensor stems from the t-product, originally introduced by [18]. We state the relevant definitions for order-3 tensors, and more general definitions for tensors of higher orders can be found in [17, 19].

###### Definition 1 (Operations on tensors).

Let . The unfold of a tensor is defined to be the frontal slice stacking of that tensor. In other words,

 unfold(A)=⎛⎜ ⎜ ⎜ ⎜⎝A1A2⋮An3⎞⎟ ⎟ ⎟ ⎟⎠∈Cn1n3×n2,

where denotes the frontal slice of . We define the inverse of the as so that . The block circulant matrix of is:

 bcirc(A)=⎛⎜ ⎜ ⎜ ⎜ ⎜⎝A1An3An3−1…A2A2A1An3…A3⋮⋮⋮⋱⋮An3An3−1An3−2…A1⎞⎟ ⎟ ⎟ ⎟ ⎟⎠∈Cn1n3×n2n3.

The conjugate transpose of a tensor is the tensor obtained by conjugate transposing each of the frontal slice and then reversing the order of transposed frontal slices through .

###### Definition 2 (Tensor t-product).

Let and then the t-product between and , denoted , is a tensor of size as is computed as:

 A∗B=fold(bcirc(A)unfold(B)). (1)
###### Definition 3 (Mode-3 fast Fourier transformation (FFT)).

The mode-3 FFT of a tensor , denoted

, is obtained by applying the discrete Fourier Transform matrix,

, to each of :

 ˆAi,j,:=FAi,j,:. (2)

Here,

is a unitary matrix,

is an

-dimensional vector, and the product is the usual matrix-vector product.

###### Definition 4 (t-SVD).

The Tensor Singular Value Decomposition (t-SVD) of a tensor

is given by

 M=U∗S∗V∗, (3)

where and are unitary tensors and is a tubal tensor (a tensor in which each frontal slice is diagonal), and denotes the t-product.

###### Definition 5 (Tubal rank).

The tubal rank of a tensor is the number of non-zero singular tubes of .

###### Definition 6 (CP rank).

The CP rank of an order three tensor is the smallest integer such that is a sum of rank-1 tensor:

 M=r∑i=1ui⊗vi⊗wi,

where , .

###### Remark 1.

If a tensor has CP rank then its tubal rank is at most , see [39, Remark 2.3].

###### Definition 7 (Tensor Frobenius norm).

Let . The Frobenius norm of is given by

 ∥M∥F=⎛⎝∑i1,i2,i3|Mi1,i2,i3|2⎞⎠1/2.

Other low-rank tensor sketching approaches have been proposed for low-CP-rank tensors [16] and low-Tucker-rank tensors [27]. In the following, we focus on low-tubal-rank tensors since this is the topic of this paper.

In the related line of work [20, 38, 37], the authors consider recovering low-tubal-rank tensors through general linear Gaussian measurements of the form . This can be seen as a generalization of the low-rank matrix recovery problem [24] to low-tubal-rank tensors. The proof of tensor recovery under Gaussian measurements in [20, 38, 37] relies crucially on the assumption that the entries of the measurement matrix

are i.i.d. Gaussian. In this setting, it was shown that the tensor nuclear norm is an atomic norm, and a general theorem from

[6, Corollary 12] for i.i.d. measurements for atomic norms was used to establish recovery guarantees. In [33] a non-convex surrogate for the tensor nuclear norm was proposed and studied.

An extension of the matrix sketching algorithm in [30] to a low-tubal-rank approximation of tensors was considered in [23]. However, their setting does not cover noisy sketching, which is the topic of this paper. Streaming low-tubal-rank tensor approximation was considered in [36].

#### Notations

We define a standard complex Gaussian random variable

as , where and are independent. By we define the -th largest singular value of a matrix . By we denote the smallest non-zero singular value of a matrix . is the spectral norm of a matrix , and is a general norm of . is the complex conjugate, and is the pseudo-inverse of . Let be a matrix with orthonormal columns. is the orthogonal complement of , which means the column vectors of and the column vectors of form a complete orthonormal basis.

#### Organization of the paper

The rest of the paper is organized as follows. We state our main results and corollaries of robust recovery of low-rank matrix and low-tubal-rank tensors in Section 2. The proof of all theorems and corollaries are given in Section 3, and auxiliary lemmas are provided in Appendix.

## 2 Main Results

### 2.1 Low-rank matrix recovery

Let be a matrix of rank . be two independent complex Gaussian random matrices with . Define

 Y=SX0+Z,~Y=~SX∗0+~Z, (4)

where are of full rank, and is independent of . The double sketch algorithm outputs

 X=~Y∗(S~Y∗)†Y. (5)

When , we denote the output of (5) as . In this case, the output will be

 ¯¯¯¯¯X=X0~S∗(SX0~S∗)†SX0. (6)

We first show that without noise, the algorithm exactly recovers , i.e., with probability 1.

###### Theorem 1 (Exact recovery).

Let be two independent complex standard Gaussian random matrices. Furthermore, let be a matrix with rank . If and then with probability one , where is as defined in (6).

Our Theorem 1 generalized the exact recovery result [13, Lemma 6], where is assumed to be exactly . Our Theorem 1 implies the exact value of is not needed for the double sketch algorithm, and one can always use the parameter . In fact, our robust recovery result (Theorem 2) suggest choosing a larger makes the output of the double sketch algorithm more robust to noise.

When are not all zero, the robust recovery guarantee is given as follows.

###### Theorem 2 (Robust recovery).

Assume , is of rank and is independent of . For any , with probability at least , the output from the double sketch algorithm given in (5) satisfies

 \vvvertX−X0\vvvert ≤√r(n1−r)\vvvert~Z\vvvert√δ1(√r−√r0−√log(1/δ2))+√r\vvvertZ\vvvert√log(1/(1−ε)),

where is any matrix norm that satisfies for any two matrices and , , and . In particular, it holds for and .

###### Remark 2.

The condition that is of rank can be easily verified in different settings. For example, it holds when is independent of , or , where is of full rank. In the second case, when is a matrix with independent entries generated from a continuous distribution, we cover “low-rank plus noise” sketching.

###### Remark 3.

Our proof of Theorem 2 works for or , but the error bounds are slightly different.

• When , (14

) is replaced by the estimate that

with probability . This implies with probability at least ,

 \vvvertX−X0\vvvert ≤r√n1−r\vvvert~Z\vvvert√δ1(log(1/(1−δ2))+√r\vvvertZ\vvvert√log(1/(1−ε)).
• When , defined in (8) is a unitary matrix and is invertible. We find in (11) and following the rest of the proof we obtain with probability at least ,

 \vvvertX−X0\vvvert≤√n1\vvvertZ\vvvert√log(1/(1−ε)).

### 2.2 Low-rank matrix approximation

When is not low-rank, we can write , where is the best rank- approximation of . Letting , we can use the noisy double sketch model in (4) to consider the sketches

 Y =SX0+Z=SX1+(SE+Z), ~Y =~SX∗0+~Z=~SX∗1+(~SE+~Z).

When , such a problem was considered in [13, 35] using the double sketch algorithm and an extra step of truncated -term SVD of and . See [13] for more details. The noiseless version of the algorithm was also analyzed in [29] and a power scheme modification of the algorithm was analyzed in [40]. In the noiseless setting, a direct application of the double sketch algorithm without the truncation steps yields a weaker error bound from Corollary 1 when compared to [29]. We can handle noise in the double sketch, while the proofs from [29, 40] are not applicable. The proofs in [29, 40] heavily rely on the assumption to use properties of orthogonal projections, which only hold in this noiseless scenario. See for example [29, Fact A.2]. The proof of Corollary 1 is given in Section 3.3.

###### Corollary 1 (Low-rank approximation with noisy sketch).

Let . Let be an integer such that . Consider the algorithm

 Y=SX0+Z,~Y=~SX∗0+~Z,X=~Y∗(S~Y∗)†Y.

Suppose is of rank . For any and , with probability at least , the output satisfies

 ∥X−X0∥2→2≤σr1+1⋅ +√r(n1−r)∥~Z∥2→2√δ1(√r−√r1−√log(1/δ2))+√r∥Z∥2→2√log(1/(1−ε)).
###### Remark 4.

Although our error bound depends on , the output is a rank- approximation of the ground truth matrix . This bound is true for any . Therefore one can optimize to find the best bound in terms of the failure probability and the approximation error.

### 2.3 Application to sketching low-tubal-rank tensors

The approach set forth in (5) can be used to sketch and recover low-tubal-rank tensors. For such an application, one considers the low-tubal-rank tensor with tubal rank . Taking the mode-3 FFT of , one obtains which is composed of a collection of matrices (frontal slices) of dimension with rank at most . As such, (5) can be used to sketch each of the frontal slices of . Corollary 2 captures the approximation error for such an approach, and its proof is given in Section 3.3.

###### Corollary 2 (Recovering low tubal-rank tensors).

Let be a low-tubal-rank tensor with rank . Furthermore, let , be two independent complex standard Gaussian random matrices. Consider the measurements

 Y =S∗X0+Z,~Y=~S∗X∗0+~Z, (7)

where , and for all .

1. (Exact Recovery) If and then with probability 1, .

2. (Robust Recovery) If for all , is of rank , then for any , and , with probability at least ,

 ∥X−X0∥2F≤2r(n1−r)∥~Z∥2Fδ1(√r1−√r−√log(1/δ2))2+2r∥Z∥2Flog(1/(1−ε)).
###### Remark 5.

In the closely related work [23], the authors considered low-tubal-rank tensor approximation from noiseless sketches, extending the results from [29], while our setting here is to recover low-tubal-rank tensors from noisy sketches. In addition, [23] requires two sketching tensors with i.i.d. Gaussian entries in the sketching procedure, whereas here we only require two sensing matrices and with independent Gaussian entries to sketch the low-tubal-rank tensors.

## 3 Proof of main results

Our proof for robust matrix recovery, presented in Theorem 2, derives an upper bound for the difference between the output and the ground truth matrix . To accomplish this, the approximation error, , is decomposed into two components. The first part depends on and can be written as for a projection matrix , and we used the oblique projection matrix expression in Lemma 3 to simplify the expression. We then control the error by relating it to the smallest singular value of a truncated Haar unitary matrix. Here we use the crucial fact that when is full-rank,

is uniformly distributed on the Grassmannian of all

-dimensional subspaces in . When is not full rank, does not have such nice property, and our proof technique cannot be directly applied. This part of the proof is summarized in Lemma 1.

The second part of the component in the decomposition of the error depends on is simpler to handle. For this part, a lower bound on the smallest singular value of Gaussian random matrices is utilized.

###### Remark 6.

The distribution of the smallest singular value of truncated Haar unitary matrices was explicitly calculated in [9, 3]. For a more general class of random matrices (including Haar orthogonal matrices), such a distribution was derived in [9, 11, 2] in terms of generalized hypergeometric functions. By using the corresponding tail probability bound [2, Corollary 3.4] for truncated Haar orthogonal matrices, our analysis can be extended to real Gaussian sketching matrices and .

### 3.1 Proof of Theorem 1

###### Proof.

Let be the SVD of where is , is and is an invertible, diagonal matrix. Now we can write as

 ¯¯¯¯¯X =U0Σ0V∗0~S∗(SU0Σ0V∗0~S∗)†SU0Σ0V∗0.

Note that since are Gaussian matrices and since and are orthonormal, and have linearly independent columns with probability 1 and by Lemma 2,

 (SU0)†SU0=I,V∗0~S∗(V∗0~S∗)†=I.

So with probability 1,

 ¯¯¯¯¯X =U0Σ0V∗0~S∗(V∗0~S∗)†Σ−10(SU0)†(SU0)Σ0V∗0=U0ΣV∗0=X0.

### 3.2 Proof of Theorem 2

The double sketch algorithm outputs

 X=~Y∗(S~Y∗)−1(SX0+Z)=~X+~Y∗(S~Y∗)−1Z

where .

Let be such that

 A:=~Z∗+X0~S∗∈Cn1×r.

Since is of rank from the assumption in Theorem 2, we denote the SVD of as

 A =UAΣAV∗A, UA ∈Cn1×r,ΣA∈Cr×r,VA∈Cr×r. (8)

Furthermore, since and are independent, is invertible with probability . Therefore,

 ~Y∗(S~Y∗)† =A(SA)† =(UAΣAV∗A)(SUAΣAV∗A)† =UAΣAV∗AVAΣ−1A(SUA)−1 =UA(SUA)−1. (9)

Using this notation, the output of (5) simplifies to

 X=~X+~Y∗(S~Y∗)−1Z=~X+UA(SUA)−1Z.

Then

 X−X0=~X−X0+~Y∗(S~Y∗)−1Z=(~X−X0)+UA(SUA)−1Z,

and since our goal is to bound the approximation error, we consider

 \vvvertX−X0\vvvert ≤\vvvert~X−X0\vvvert+\vvvertUA(SUA)−1Z\vvvert.

Lemma 1 allows us to bound the first term in this inequality.

###### Lemma 1.

If , then with probability at least with , the output of the algorithm in (5) satisfies

 \vvvert~X−X0\vvvert≤√r(n1−r)\vvvert~Z\vvvert√δ1(√r−√r0−√log(1/δ2)). (10)
###### Proof.

From (9) and Theorem 1,

 ~X−X0 =~Y∗(S~Y∗)†SX0−X0 =UA(SUA)−1SX0−X0 (11) =−(I−UA(SUA)−1S)X0 =−PX0,

where we have set . We observe that is a projection, i.e. which satisfies

 ker P =UA, Im P =ker S=:~V∈Cn1×(n1−r).

Recall that is the SVD of . From Lemma 3,

 PX0 =~V(U∗A,⊥~V)−1U∗A,⊥X0 =~V(U∗A,⊥~V)−1U∗A,⊥U0Σ0V∗0.

Then we can bound

 \vvvert~X−¯¯¯¯¯X\vvvert =\vvvert~V(U∗A,⊥~V)−1U∗A,⊥U0Σ0V∗0\vvvert ≤∥(U∗A,⊥~V)−1∥2→2\vvvertU∗A,⊥U0Σ0V∗0\vvvert. (12)

We focus our efforts on simplifying the second term of (12). Writing in terms of the SVD of , one obtains

 A=~Z∗+X0~S∗=UAΣAV∗A=~Z+U0Σ0V∗0~S∗.

Rearranging terms yields

 U0Σ0V∗0~S∗=A−~Z∗.

Since is of size , is complex Gaussian and has orthonormal columns, then from Lemma 2, with probability , has linearly independent columns. Thus has linearly independent rows and

 (V∗0~S∗)(V∗0~S∗)†=I.

This implies that

We note that

 \vvvertU∗A,⊥U0Σ0V∗0\vvvert =\vvvertU∗A,⊥U0Σ0\vvvert =\vvvertU∗A,⊥(A−~Z∗)(V∗0~S∗)†\vvvert =\vvvertU∗A,⊥~Z∗(V∗0~S∗)†\vvvert ≤\vvvertU∗A,⊥~Z∗\vvvert∥(V∗0~S∗)†∥2→2 =\vvvertU∗A,⊥~Z∗\vvvertσmin(V∗0~S∗)≤\vvvert~Z\vvvertσmin(V∗0~S∗).

Using the fact that for the first term of (12), we then obtain the following bound:

 \vvvert~X−¯¯¯¯¯X\vvvert≤\vvvert~Z\vvvertσmin(U∗A,⊥~V)σmin(~SV0), (13)

which holds with probability 1.

We now derive a probabilistic bound from (13) using concentration inequalities from random matrix theory. Since can be seen as the submatrix of a Haar unitary matrix . By unitary invariance property,

 M:=[U∗A,⊥U∗A](~V,~V⊥)

is also a Haar unitary matrix, and is exactly the upper left corner of . We can apply Lemma 4 to get

 P(∥(U∗A,⊥~V)−1∥≤√r(n1−r)√δ1)≥1−δ1

for any . Since is distributed as a complex Gaussian random matrix, if , by Lemma 5, for any , with probability at least ,

 σmin(~SV0)≥√r−√r0−√log(1/δ2). (14)

Combining the two probability estimates, with probability at least ,

 \vvvert~X−¯¯¯¯¯X\vvvert≤√r(n1−r)\vvvert~Z\vvvert√δ1(√r−√r0−√log(1/δ2)).

For the second term, by the independence between and , is distributed as an standard complex Gaussian matrix it follows from Lemma 6 that with probability at least ,

 σmin(SUA)≥√log(1/(1−ε))r−1/2.

Thus, with probability at least ,

 \vvvertUA(SUA)−1Z\vvvert ≤∥UA∥2→2∥(SUA)−1∥2→2\vvvertZ\vvvert =1σmin(SUA)\vvvertZ\vvvert ≤√r√log(1/(1−ε))\vvvertZ\vvvert. (15)

Combining (10) and (15) yields the desired result.

### 3.3 Proof of Corollaries

###### Proof of Corollary 1.

Write , where is the best rank- approximation to . We obtain

 Y =SX0+Z=SX1+(SE+Z), ~Y =~SX∗0+~Z=~SX∗1+(~SE+~Z).

Applying Theorem 2 to and two error terms and , the following error bound holds with probability at least ,

 ∥X−X0∥2→2≤∥X−X1∥2→2+∥E∥2→2 ≤√r(n1−r)∥~SE+~Z∥2→2√δ1(√r−√r1−√log(1/δ2))+√r∥SE+Z∥2→2√log(1/(1−ε))+∥E∥2→2 ≤(√r(n1−r)∥~S∥2→2√δ1(√r−√r1−√log(1/δ2))+√r∥S∥2→2√log(1/(1−ε))+1)∥E∥2→2 +√r(n1−r)∥~Z∥2→2√δ1(√r−√r1−√log(1/δ2))+√r∥Z∥2→2√log(1/(1−ε)) =(√r(n1−r)∥~S∥2→2√δ1(√r−√r1−√log(1/δ2))+√r∥S∥2→2√log(1/(1−ε))+1)σr1+1(X0) +√r(n1−r)∥~Z∥2→2√δ1(√r−√r1−√log(1/δ2))+√r∥Z∥2→2√log(1/(1−ε)).

From concentration the of operator norm of and in Lemma 5, with probability , we have that

 ∥S∥2→2≤√r+√n1+√log(1/δ2),∥~S∥2→2≤√r+√n2+√log(1/δ2).

We then obtain the desired bound. ∎

###### Proof of Corollary 2.

Consider the double sketch tensor approach described in (7):

 Y =S∗X