1 Introduction
The prevalence of large-scale data in data science applications has created immense demand for methods that reduce the computational cost of processing and storing said data. Oftentimes, data such as images, videos, and text documents can be represented as a matrix, and thus the ability to efficiently store matrices becomes an important task. One way to efficiently store a large-scale matrix is to store a sketch of the matrix, i.e., another matrix such that two goals are accomplished. First, the sketch of must be cheaper to store than itself, i.e., we want . Second, the matrix must be recoverable from its sketch.
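For example (with illustrative sizes not taken from the paper), a pair of two-sided sketches of a rank-5 matrix occupies far less storage than the matrix itself. The sketch shapes m x n2 and n1 x m below are an assumption for illustration:

```python
import numpy as np

# Hypothetical sizes: X is n1 x n2 with rank r; sketches use m >= r.
n1, n2, r, m = 500, 400, 5, 10

rng = np.random.default_rng(0)
X = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))  # rank-r matrix

A = rng.standard_normal((m, n1))   # left sketching matrix
B = rng.standard_normal((n2, m))   # right sketching matrix
Y1, Y2 = A @ X, X @ B              # the two sketches

full_cost = X.size                 # entries needed to store X directly
sketch_cost = Y1.size + Y2.size    # entries needed to store both sketches
print(full_cost, sketch_cost)      # prints 200000 9000
```

Storing both sketches costs m(n1 + n2) entries instead of n1 n2, a large saving whenever m is much smaller than n1 and n2.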
A variety of works have been produced for the setting in which is a low-rank matrix and one wishes to recover it from its sketch [13, 26]. However, in certain settings, one may only have access to noisy sketches. For example, suppose a sketch is stored on a hard disk drive. Over time, the hard drive experiences data degradation due to bits losing their magnetic orientation or extreme fluctuations in temperature affecting the physical drive itself [12]. As another example, the matrix being sketched may itself be a noisy version of the data one is trying to preserve [4]. One can even consider the low-rank approximation problem as one such instance.
In this work, we analyze the noisy double sketch algorithm originally proposed, but not theoretically studied, in [13]. We show that when the sketching matrices are i.i.d. complex Gaussian random matrices, one can recover the original low-rank matrix
with high probability, where the error of the approximation depends on the noise level of both sketches. Here, we do not assume that one has access to the exact rank of
but instead only an approximate rank . We also remark on the utility of our theoretical guarantees when the double sketch algorithm is used not for low-rank matrix recovery but instead for low-rank approximation with noise. Lastly, we present results for the application of this work to a more extreme large-scale data setting in which one wants to recover a low-tubal-rank tensor.

A key step in our robust recovery analysis is to control the perturbation error of a low-rank matrix under noise. A standard approach is to apply Wedin's theorem or the Davis-Kahan theorem [34, 8, 7, 22],
which results in a bound that depends on the condition number of the low-rank matrix. However, our proof is based on an exact formula for the difference between the output and the ground-truth matrix, together with a detailed analysis of the random matrices involved in the double sketch algorithm (extreme singular value bounds for Gaussian matrices [32, 25, 28] and the least singular value of truncated Haar unitary matrices [3, 9]). This novel approach yields a bound independent of the condition number of the low-rank matrix (Theorem 2). Due to the Gaussian structure of our sensing matrices, our results are non-asymptotic, and all the constants involved in the probabilistic error bounds are explicit.

1.1 Low-rank matrix recovery
A double sketching algorithm was proposed in [13] to recover low-rank matrices. This approach, also called bilateral random projection, was analyzed in [40] to obtain a low-rank approximation of a matrix from two sketches in the noiseless situation. A similar approach was analyzed in [29]. The so-called problem of compressive PCA was studied in [26] and [1]. It can be interpreted as a variant of sketching in which only the columns of a matrix are sketched. However, this problem is not directly comparable to the setting in the paper at hand, as in compressive PCA a different sketching matrix is used for each column.
1.2 Low-tubal-rank tensor recovery
The notion of a low-tubal-rank tensor stems from the t-product, originally introduced in [18]. We state the relevant definitions for order-3 tensors; more general definitions for tensors of higher order can be found in [17, 19].
Definition 1 (Operations on tensors).
Let . The unfold of a tensor is defined to be the stacking of the frontal slices of that tensor. In other words,
where denotes the frontal slice of . We define the inverse operation so that . The block circulant matrix of is:
The conjugate transpose of a tensor is the tensor obtained by conjugate transposing each of the frontal slices and then reversing the order of the transposed frontal slices through .
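A minimal NumPy sketch of the unfold, fold, and block circulant operations, assuming frontal slices are indexed along the third axis (the function names are ours, for illustration):

```python
import numpy as np

def unfold(T):
    # Stack the frontal slices T[:, :, k] vertically: an (n1*n3) x n2 matrix.
    return np.concatenate([T[:, :, k] for k in range(T.shape[2])], axis=0)

def fold(M, n1, n2, n3):
    # Inverse of unfold: split the tall matrix back into frontal slices.
    T = np.zeros((n1, n2, n3), dtype=M.dtype)
    for k in range(n3):
        T[:, :, k] = M[k * n1:(k + 1) * n1, :]
    return T

def bcirc(T):
    # Block circulant matrix: block (i, j) is the frontal slice with index
    # (i - j) mod n3, i.e. the slices cyclically shifted down each block column.
    n1, n2, n3 = T.shape
    M = np.zeros((n1 * n3, n2 * n3), dtype=T.dtype)
    for j in range(n3):
        for i in range(n3):
            M[i * n1:(i + 1) * n1, j * n2:(j + 1) * n2] = T[:, :, (i - j) % n3]
    return M
```

With these conventions, the first block column of bcirc(T) coincides with unfold(T).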
Definition 2 (Tensor t-product).
Let and . Then the t-product between and , denoted , is a tensor of size and is computed as:
(1) 
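As a hedged illustration, the t-product can be computed through the standard block-circulant identity unfold(A ∗ B) = bcirc(A) · unfold(B); the sketch below assumes frontal slices indexed along the third axis:

```python
import numpy as np

def t_product(A, B):
    # Compute A * B via unfold(A * B) = bcirc(A) @ unfold(B).
    n1, n2, n3 = A.shape
    assert B.shape[0] == n2 and B.shape[2] == n3
    m2 = B.shape[1]
    # Block circulant of A: block (i, j) is the frontal slice (i - j) mod n3.
    bc = np.block([[A[:, :, (i - j) % n3] for j in range(n3)]
                   for i in range(n3)])
    unf_B = np.concatenate([B[:, :, k] for k in range(n3)], axis=0)
    prod = bc @ unf_B                       # (n1*n3) x m2
    # Fold back into frontal slices of the product tensor.
    return np.stack([prod[k * n1:(k + 1) * n1, :] for k in range(n3)], axis=2)
```

By the circular convolution theorem, this agrees with multiplying the Fourier-domain frontal slices pairwise, which mirrors Definition 3 below.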
Definition 3 (Mode-3 fast Fourier transform (FFT)).
The mode3 FFT of a tensor , denoted
, is obtained by applying the discrete Fourier Transform matrix,
, to each of :
(2)
Here,
is a unitary matrix,
is an -dimensional vector, and the product is the usual matrix-vector product.
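In NumPy, a mode-3 FFT is simply an FFT along the third axis; the hedged check below verifies that it agrees with multiplying each tube by the DFT matrix (note that NumPy's FFT is unnormalized, whereas the definition above uses a unitary DFT matrix, which differs only by a scale factor):

```python
import numpy as np

n1, n2, n3 = 3, 4, 5
rng = np.random.default_rng(1)
T = rng.standard_normal((n1, n2, n3))

# Mode-3 FFT: apply the DFT to each tube T[i, j, :].
T_hat = np.fft.fft(T, axis=2)

# Equivalent: multiply each tube by the (unnormalized) DFT matrix F.
F = np.fft.fft(np.eye(n3))
T_hat2 = np.einsum('kl,ijl->ijk', F, T)
print(np.allclose(T_hat, T_hat2))  # True
```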
Definition 4 (tSVD).
The Tensor Singular Value Decomposition (tSVD) of a tensor
is given by
(3)
where and are unitary tensors, is a tubal tensor (a tensor in which each frontal slice is diagonal), and denotes the t-product.
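One standard way to compute a tSVD, consistent with the definitions above, is to take an SVD of each frontal slice in the Fourier domain and transform back; a hedged NumPy sketch (using NumPy's unnormalized FFT convention):

```python
import numpy as np

def t_svd(T):
    # tSVD via the Fourier domain: SVD each frontal slice of fft(T, axis=2),
    # then invert the mode-3 FFT to obtain the factor tensors.
    n1, n2, n3 = T.shape
    T_hat = np.fft.fft(T, axis=2)
    U_hat = np.zeros((n1, n1, n3), dtype=complex)
    S_hat = np.zeros((n1, n2, n3), dtype=complex)
    V_hat = np.zeros((n2, n2, n3), dtype=complex)
    for k in range(n3):
        u, s, vh = np.linalg.svd(T_hat[:, :, k])
        U_hat[:, :, k], V_hat[:, :, k] = u, vh.conj().T
        S_hat[:len(s), :len(s), k] = np.diag(s)
    # U, S, V are the tensors in the factorization T = U * S * V^H.
    return (np.fft.ifft(U_hat, axis=2),
            np.fft.ifft(S_hat, axis=2),
            np.fft.ifft(V_hat, axis=2))
```

Multiplying the factors back together slice-wise in the Fourier domain recovers the original tensor.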
Definition 5 (Tubal rank).
The tubal rank of a tensor is the number of nonzero singular tubes of .
Definition 6 (CP rank).
The CP rank of an order-3 tensor is the smallest integer such that is a sum of rank-1 tensors:
where , .
Remark 1.
If a tensor has CP rank , then its tubal rank is at most ; see [39, Remark 2.3].
Definition 7 (Tensor Frobenius norm).
Let . The Frobenius norm of is given by
Other low-rank tensor sketching approaches have been proposed for low-CP-rank tensors [16] and low-Tucker-rank tensors [27]. In the following, we focus on low-tubal-rank tensors, since these are the topic of this paper.
In the related line of work [20, 38, 37], the authors consider recovering low-tubal-rank tensors through general linear Gaussian measurements of the form . This can be seen as a generalization of the low-rank matrix recovery problem [24] to low-tubal-rank tensors. The proofs of tensor recovery under Gaussian measurements in [20, 38, 37] rely crucially on the assumption that the entries of the measurement matrix
are i.i.d. Gaussian. In this setting, it was shown that the tensor nuclear norm is an atomic norm, and a general theorem from
[6, Corollary 12] for i.i.d. measurements and atomic norms was used to establish recovery guarantees. In [33], a non-convex surrogate for the tensor nuclear norm was proposed and studied.

An extension of the matrix sketching algorithm in [30] to low-tubal-rank approximation of tensors was considered in [23]. However, their setting does not cover noisy sketching, which is the topic of this paper. Streaming low-tubal-rank tensor approximation was considered in [36].
Notations
We define a standard complex Gaussian random variable
as , where and are independent. By we denote the th largest singular value of a matrix . By we denote the smallest nonzero singular value of a matrix . is the spectral norm of a matrix , and is a general norm of . is the complex conjugate, and is the pseudoinverse of . Let be a matrix with orthonormal columns. is the orthogonal complement of , which means that the column vectors of and the column vectors of together form a complete orthonormal basis.

Organization of the paper
2 Main Results
2.1 Low-rank matrix recovery
Let be a matrix of rank , and let be two independent complex Gaussian random matrices with . Define
(4) 
where are of full rank, and is independent of . The double sketch algorithm outputs
(5) 
When , we denote the output of (5) as . In this case, the output is
(6) 
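To illustrate the noiseless case of this algorithm, the NumPy sketch below assumes sketches of the form Y1 = AX and Y2 = XB together with the estimator X̂ = Y2 (A Y2)† Y1, the form used in the bilateral random projection literature; the exact normalizations in (4)-(6) may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r, m = 60, 50, 3, 8   # sketch size m at least the rank r

# A rank-r complex matrix X (sizes are illustrative).
X = (rng.standard_normal((n1, r)) + 1j * rng.standard_normal((n1, r))) @ \
    (rng.standard_normal((r, n2)) + 1j * rng.standard_normal((r, n2)))

# Standard complex Gaussian sensing matrices: entries U + iV with
# U, V independent N(0, 1/2), the usual convention so that E|Z|^2 = 1.
A = (rng.standard_normal((m, n1)) + 1j * rng.standard_normal((m, n1))) / np.sqrt(2)
B = (rng.standard_normal((n2, m)) + 1j * rng.standard_normal((n2, m))) / np.sqrt(2)

Y1 = A @ X   # left sketch  (noiseless: no noise term)
Y2 = X @ B   # right sketch (noiseless: no noise term)

# Double-sketch estimator X_hat = Y2 (A Y2)^+ Y1.
X_hat = Y2 @ np.linalg.pinv(A @ Y2) @ Y1
print(np.linalg.norm(X_hat - X) / np.linalg.norm(X))  # tiny: exact recovery
```

Note that only the rough bound m >= r is used, consistent with Theorem 1: the exact rank r is not needed to choose the sketch size.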
We first show that without noise, the algorithm exactly recovers , i.e., with probability 1.
Theorem 1 (Exact recovery).
Let be two independent complex standard Gaussian random matrices. Furthermore, let be a matrix with rank . If and , then with probability one, , where is as defined in (6).
Our Theorem 1 generalizes the exact recovery result [13, Lemma 6], where is assumed to be exactly . Our Theorem 1 implies that the exact value of is not needed for the double sketch algorithm, and one can always use the parameter . In fact, our robust recovery result (Theorem 2) suggests that choosing a larger makes the output of the double sketch algorithm more robust to noise.
When are not all zero, the robust recovery guarantee is given as follows.
Theorem 2 (Robust recovery).
Assume , is of rank and is independent of . For any , with probability at least , the output from the double sketch algorithm given in (5) satisfies
where is any matrix norm that satisfies for any two matrices and , , and . In particular, it holds for and .
Remark 2.
The condition that is of rank can be easily verified in different settings. For example, it holds when is independent of , or when , where is of full rank. In the second case, when is a matrix with independent entries generated from a continuous distribution, we cover the “low-rank plus noise” sketching setting.
Remark 3.
Our proof of Theorem 2 works for or , but the error bounds are slightly different.
2.2 Low-rank matrix approximation
When is not low-rank, we can write , where is the best rank approximation of . Letting , we can use the noisy double sketch model in (4) to consider the sketches
When , such a problem was considered in [13, 35] using the double sketch algorithm and an extra truncated term SVD step applied to and . See [13] for more details. The noiseless version of the algorithm was also analyzed in [29], and a power scheme modification of the algorithm was analyzed in [40]. In the noiseless setting, a direct application of the double sketch algorithm without the truncation steps yields a weaker error bound in Corollary 1 when compared to [29]. On the other hand, we can handle noise in the double sketch, while the proofs from [29, 40] are not applicable. The proofs in [29, 40] rely heavily on the assumption in order to use properties of orthogonal projections, which hold only in the noiseless scenario. See, for example, [29, Fact A.2]. The proof of Corollary 1 is given in Section 3.3.
Corollary 1 (Lowrank approximation with noisy sketch).
Let . Let be an integer such that . Consider the algorithm
Suppose is of rank . For any and , with probability at least , the output satisfies
Remark 4.
Although our error bound depends on , the output is a rank approximation of the ground-truth matrix . This bound holds for any . Therefore, one can optimize to find the best bound in terms of the failure probability and the approximation error.
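A hedged numerical illustration of this low-rank approximation setting (the estimator form, spectrum, and sizes below are our illustrative choices, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 100, 5

# Full-rank X with fast-decaying spectrum: X = (best rank-r part) + residual.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 2.0 ** -np.arange(n)              # singular values 1, 1/2, 1/4, ...
X = (U * s) @ V.T

m = 15                                # sketch size, m >= r
A = rng.standard_normal((m, n))       # left sketching matrix
B = rng.standard_normal((n, m))       # right sketching matrix
# Double-sketch output (noiseless form, without any truncation step).
X_hat = (X @ B) @ np.linalg.pinv(A @ X @ B) @ (A @ X)

best_r = (U[:, :r] * s[:r]) @ V[:, :r].T   # best rank-r approximation
err_alg = np.linalg.norm(X_hat - X)
err_best = np.linalg.norm(best_r - X)
print(err_alg, err_best)
```

With a rapidly decaying spectrum, the sketch-based output achieves a small approximation error even though X has full rank.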
2.3 Application to sketching low-tubal-rank tensors
The approach set forth in (5) can be used to sketch and recover low-tubal-rank tensors. For such an application, one considers a low-tubal-rank tensor with tubal rank . Taking the mode-3 FFT of , one obtains , which is composed of a collection of matrices (frontal slices) of dimension with rank at most . As such, (5) can be used to sketch each of the frontal slices of . Corollary 2 captures the approximation error for this approach, and its proof is given in Section 3.3.
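A minimal NumPy sketch of this slice-wise procedure. The per-slice estimator Y2 (A Y2)† Y1 is our assumption, carried over from the matrix case; only two sensing matrices A and B are used, shared across all slices:

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, n3, r, m = 20, 15, 4, 2, 5   # illustrative sizes, m >= r

# Build a real tensor whose Fourier-domain frontal slices all have rank r,
# i.e. a low-tubal-rank tensor (a t-product of two random factor tensors).
G1h = np.fft.fft(rng.standard_normal((n1, r, n3)), axis=2)
G2h = np.fft.fft(rng.standard_normal((r, n2, n3)), axis=2)
Th = np.stack([G1h[:, :, k] @ G2h[:, :, k] for k in range(n3)], axis=2)
T = np.fft.ifft(Th, axis=2).real

# Two complex Gaussian sensing matrices shared by all frontal slices.
A = (rng.standard_normal((m, n1)) + 1j * rng.standard_normal((m, n1))) / np.sqrt(2)
B = (rng.standard_normal((n2, m)) + 1j * rng.standard_normal((n2, m))) / np.sqrt(2)

# Sketch and recover each Fourier-domain slice, then invert the mode-3 FFT.
T_hat = np.fft.fft(T, axis=2)
rec_hat = np.zeros_like(T_hat)
for k in range(n3):
    Xk = T_hat[:, :, k]
    Y1, Y2 = A @ Xk, Xk @ B                        # per-slice double sketch
    rec_hat[:, :, k] = Y2 @ np.linalg.pinv(A @ Y2) @ Y1
T_rec = np.fft.ifft(rec_hat, axis=2).real
print(np.linalg.norm(T_rec - T) / np.linalg.norm(T))  # tiny: exact recovery
```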
Corollary 2 (Recovering low-tubal-rank tensors).
Let be a low-tubal-rank tensor with tubal rank . Furthermore, let , be two independent complex standard Gaussian random matrices. Consider the measurements
(7) 
where , and for all .

(Exact Recovery) If and then with probability 1, .

(Robust Recovery) If for all , is of rank , then for any , and , with probability at least ,
Remark 5.
In the closely related work [23], the authors considered low-tubal-rank tensor approximation from noiseless sketches, extending the results from [29], while our setting here is to recover low-tubal-rank tensors from noisy sketches. In addition, [23] requires two sketching tensors with i.i.d. Gaussian entries in the sketching procedure, whereas here we only require two sensing matrices and with independent Gaussian entries to sketch the low-tubal-rank tensors.
3 Proof of main results
Our proof of robust matrix recovery, presented in Theorem 2, derives an upper bound on the difference between the output and the ground-truth matrix . To accomplish this, the approximation error, , is decomposed into two components. The first part depends on and can be written as for a projection matrix , and we use the oblique projection matrix expression in Lemma 3 to simplify it. We then control the error by relating it to the smallest singular value of a truncated Haar unitary matrix. Here we use the crucial fact that when has full rank,
is uniformly distributed on the Grassmannian of all
dimensional subspaces in . When is not of full rank, does not have such a nice property, and our proof technique cannot be directly applied. This part of the proof is summarized in Lemma 1.

The second part of the error decomposition, which depends on , is simpler to handle. For this part, a lower bound on the smallest singular value of Gaussian random matrices is utilized.
Remark 6.
The distribution of the smallest singular value of truncated Haar unitary matrices was explicitly calculated in [9, 3]. For a more general class of random matrices (including Haar orthogonal matrices), such a distribution was derived in [9, 11, 2] in terms of generalized hypergeometric functions. By using the corresponding tail probability bound [2, Corollary 3.4] for truncated Haar orthogonal matrices, our analysis can be extended to real Gaussian sketching matrices and .
3.1 Proof of Theorem 1
Proof.
Let be the SVD of , where is , is , and is an invertible diagonal matrix. Now we can write as
Note that since are Gaussian matrices and since and are orthonormal, and have linearly independent columns with probability 1 and by Lemma 2,
So with probability 1,
∎
3.2 Proof of Theorem 2
The double sketch algorithm outputs
where .
Let be such that
Since is of rank from the assumption in Theorem 2, we denote the SVD of as
(8) 
Furthermore, since and are independent, is invertible with probability . Therefore,
(9) 
Using this notation, the output of (5) simplifies to
Then
and since our goal is to bound the approximation error, we consider
Lemma 1 allows us to bound the first term in this inequality.
Lemma 1.
If , then with probability at least with , the output of the algorithm in (5) satisfies
(10) 
Proof.
(11)  
where we have set . We observe that is a projection, i.e., , which satisfies
Recall that is the SVD of . From Lemma 3,
Then we can bound
(12) 
We focus our efforts on simplifying the second term of (12). Writing in terms of the SVD of , one obtains
Rearranging terms yields
Since is of size , is complex Gaussian, and has orthonormal columns, it follows from Lemma 2 that, with probability , has linearly independent columns. Thus has linearly independent rows and
This implies that
We note that
Using the fact that for the first term of (12), we then obtain the following bound:
(13) 
which holds with probability 1.
We now derive a probabilistic bound from (13) using concentration inequalities from random matrix theory. Note that can be seen as a submatrix of a Haar unitary matrix . By the unitary invariance property,
is also a Haar unitary matrix, and is exactly the upper left corner of . We can apply Lemma 4 to get
for any . Since is distributed as a complex Gaussian random matrix, if , by Lemma 5, for any , with probability at least ,
(14) 
Combining the two probability estimates, with probability at least ,
∎
3.3 Proof of Corollaries
Proof of Corollary 1.
Write , where is the best rank approximation to . We obtain