The prevalence of large-scale data in data science applications has created immense demand for methods that reduce the computational cost of processing and storing said data. Oftentimes, data such as images, videos, and text documents can be represented as a matrix, and thus the ability to store matrices efficiently becomes an important task. One way to efficiently store a large-scale matrix is to store a sketch of the matrix, i.e., another matrix chosen so that two goals are accomplished. First, the sketch must be cheaper to store than the original matrix. Second, the matrix must be recoverable from its sketch.
A variety of works have studied the setting in which is a low-rank matrix and one wishes to recover from its sketch [13, 26]. However, in certain settings, one may only have access to noisy sketches. For example, suppose a sketch is stored on a hard disk drive. Over time, the hard drive experiences data degradation due to bits losing their magnetic orientation or extreme temperature fluctuations affecting the physical drive itself. As another example, the matrix being sketched may itself be a noisy version of the data one is trying to preserve. One can even view the low-rank approximation problem as one such instance.
In this work, we analyze the noisy double-sketch algorithm originally proposed but not theoretically studied in . We show that when the sketching matrices are i.i.d. complex Gaussian random matrices, one can recover the original low-rank matrix
with high probability, where the approximation error depends on the noise level of both sketches. Here, we do not assume that one has access to the exact rank of but instead only an approximate rank . We also remark on the utility of our theoretical guarantees when the double sketch algorithm is used not for low-rank matrix recovery but instead for low-rank approximation with noise. Lastly, we present results on the application of this work to a more extreme large-scale data setting in which one wants to recover a low-tubal-rank tensor.
which results in a bound that depends on the condition number of the low-rank matrix. However, our proof is based on an exact formula for the difference between the output and the ground truth matrix, together with a detailed analysis of the random matrices involved in the double sketch algorithm (extreme singular value bounds for Gaussian matrices [32, 25, 28] and the least singular value of truncated Haar unitary matrices [3, 9]). This novel approach yields a bound independent of the condition number of the low-rank matrix (Theorem 2). Due to the Gaussian structure of our sensing matrices, our results are non-asymptotic, and all constants involved in the probabilistic error bounds are explicit.
1.1 Low-rank matrix recovery
A double sketching algorithm was proposed in  to recover low-rank matrices. This approach, also called bilateral random projection, was analyzed in  to obtain a low-rank approximation of a matrix from two sketches in the noiseless situation. A similar approach was analyzed in . The so-called compressive PCA problem was studied in  and ; it can be interpreted as a variant of sketching in which only the columns of a matrix are sketched. However, this problem is not directly comparable to the setting of the paper at hand, since in compressive PCA a different sketching matrix is used for each column.
1.2 Low-tubal-rank tensor recovery
The notion of a low-tubal-rank tensor stems from the t-product, originally introduced by . We state the relevant definitions for order-3 tensors, and more general definitions for tensors of higher orders can be found in [17, 19].
Definition 1 (Operations on tensors).
Let . The unfold of a tensor is defined to be the frontal slice stacking of that tensor. In other words,
where denotes the frontal slice of . We define the inverse of the unfold operation as , so that . The block circulant matrix of is:
The conjugate transpose of a tensor is the tensor obtained by conjugate transposing each of the frontal slices and then reversing the order of the transposed frontal slices from the second through the last.
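As a concrete reference for Definition 1, the following numpy sketch implements the unfold, fold, block-circulant, and conjugate-transpose operations for an order-3 tensor stored as an array whose frontal slices are A[:, :, k]; the names and the array convention are illustrative rather than the paper's notation.

import numpy as np

def unfold(A):
    """Stack the frontal slices A[:, :, k] of an n1 x n2 x n3 tensor into an (n1*n3) x n2 matrix."""
    n1, n2, n3 = A.shape
    return np.concatenate([A[:, :, k] for k in range(n3)], axis=0)

def fold(M, n3):
    """Inverse of unfold: split an (n1*n3) x n2 matrix back into n3 frontal slices."""
    n1 = M.shape[0] // n3
    return np.stack([M[k * n1:(k + 1) * n1, :] for k in range(n3)], axis=2)

def bcirc(A):
    """Block circulant matrix of A; its first block column is unfold(A)."""
    n3 = A.shape[2]
    return np.block([[A[:, :, (i - j) % n3] for j in range(n3)] for i in range(n3)])

def ttranspose(A):
    """Tensor conjugate transpose: conjugate-transpose every frontal slice and
    reverse the order of the transposed slices from the second through the last."""
    n1, n2, n3 = A.shape
    At = np.empty((n2, n1, n3), dtype=complex)
    At[:, :, 0] = A[:, :, 0].conj().T
    for k in range(1, n3):
        At[:, :, k] = A[:, :, n3 - k].conj().T
    return At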
Definition 2 (Tensor t-product).
Let and ; then the t-product between and , denoted , is a tensor of size and is computed as:
Definition 3 (Mode-3 fast Fourier transformation (FFT)).
The mode-3 FFT of a tensor is the tensor obtained by applying the discrete Fourier transform along the third mode, i.e., along the tubes of the tensor.
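To make Definitions 2 and 3 concrete, the following numpy sketch (reusing unfold, fold, and bcirc from the sketch above) computes the t-product both through the block-circulant matrix and slice by slice after a mode-3 FFT, and checks numerically that the two routes agree; the function names are illustrative, not the paper's notation.

import numpy as np

def tprod_bcirc(A, B):
    """t-product via the block-circulant route: fold(bcirc(A) @ unfold(B))."""
    return fold(bcirc(A) @ unfold(B), A.shape[2])

def tprod_fft(A, B):
    """t-product via the mode-3 FFT: multiply matching frontal slices in the Fourier
    domain, then transform back along the third mode."""
    Ah, Bh = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    Ch = np.einsum('ijk,jlk->ilk', Ah, Bh)   # slice-wise matrix products
    return np.fft.ifft(Ch, axis=2)

# The two routes agree up to floating-point error.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 5)) + 1j * rng.standard_normal((4, 3, 5))
B = rng.standard_normal((3, 2, 5)) + 1j * rng.standard_normal((3, 2, 5))
assert np.allclose(tprod_bcirc(A, B), tprod_fft(A, B))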
Definition 4 (t-SVD).
The Tensor Singular Value Decomposition (t-SVD) of a tensor is given by
where and are unitary tensors and is a tubal tensor (a tensor in which each frontal slice is diagonal), and denotes the t-product.
Definition 5 (Tubal rank).
The tubal rank of a tensor is the number of non-zero singular tubes of .
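A minimal sketch of Definitions 4 and 5 under the standard Fourier-domain construction of the t-SVD: take ordinary SVDs of the frontal slices after the mode-3 FFT, transform the factors back, and read the tubal rank off the singular tubes. The tolerance and names below are illustrative.

import numpy as np

def tsvd(A):
    """t-SVD A = U * S * V^H (t-product): slice-wise SVDs in the Fourier domain,
    transformed back along the third mode; S is a tubal tensor (each frontal slice diagonal)."""
    n1, n2, n3 = A.shape
    Ah = np.fft.fft(A, axis=2)
    Uh = np.zeros((n1, n1, n3), dtype=complex)
    Sh = np.zeros((n1, n2, n3), dtype=complex)
    Vh = np.zeros((n2, n2, n3), dtype=complex)
    r = min(n1, n2)
    for k in range(n3):
        u, s, vh = np.linalg.svd(Ah[:, :, k])
        Uh[:, :, k], Vh[:, :, k] = u, vh.conj().T
        Sh[:r, :r, k] = np.diag(s)
    return np.fft.ifft(Uh, axis=2), np.fft.ifft(Sh, axis=2), np.fft.ifft(Vh, axis=2)

def tubal_rank(A, tol=1e-10):
    """Number of singular tubes of A whose norm exceeds tol."""
    S = tsvd(A)[1]
    r = min(A.shape[0], A.shape[1])
    return int(sum(np.linalg.norm(S[i, i, :]) > tol for i in range(r)))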
Definition 6 (CP rank).
The CP rank of an order-three tensor is the smallest integer such that is a sum of rank-1 tensors:
where , .
If a tensor has CP rank , then its tubal rank is at most ; see [39, Remark 2.3].
Definition 7 (Tensor Frobenius norm).
Let . The Frobenius norm of is given by
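As a quick numerical check of Definition 7, the tensor Frobenius norm is the square root of the sum of the squared moduli of the entries, and the unnormalized mode-3 FFT scales it by the square root of the number of frontal slices; the snippet below is purely illustrative.

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3, 5)) + 1j * rng.standard_normal((4, 3, 5))
fro = np.sqrt(np.sum(np.abs(A) ** 2))                      # tensor Frobenius norm
assert np.isclose(fro, np.linalg.norm(np.fft.fft(A, axis=2)) / np.sqrt(5))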
Other low-rank tensor sketching approaches have been proposed for low-CP-rank tensors  and low-Tucker-rank tensors . In the following, we focus on low-tubal-rank tensors since this is the topic of this paper.
In the related line of work [20, 38, 37], the authors consider recovering low-tubal-rank tensors through general linear Gaussian measurements of the form . This can be seen as a generalization of the low-rank matrix recovery problem  to low-tubal-rank tensors. The proof of tensor recovery under Gaussian measurements in [20, 38, 37] relies crucially on the assumption that the entries of the measurement matrix
are i.i.d. Gaussian. In this setting, it was shown that the tensor nuclear norm is an atomic norm, and a general theorem from [6, Corollary 12] for i.i.d. measurements with atomic norms was used to establish recovery guarantees. In , a non-convex surrogate for the tensor nuclear norm was proposed and studied.
An extension of the matrix sketching algorithm in  to a low-tubal-rank approximation of tensors was considered in . However, their setting does not cover noisy sketching, which is the topic of this paper. Streaming low-tubal-rank tensor approximation was considered in .
We define a standard complex Gaussian random variable as , where and are independent. By we denote the -th largest singular value of a matrix . By we denote the smallest non-zero singular value of a matrix . denotes the spectral norm of a matrix , and denotes a general norm of . is the complex conjugate, and is the pseudo-inverse of . Let be a matrix with orthonormal columns. denotes the orthogonal complement of , meaning that the column vectors of and the column vectors of together form a complete orthonormal basis.
Organization of the paper
2 Main Results
2.1 Low-rank matrix recovery
Let be a matrix of rank , and let be two independent complex Gaussian random matrices with . Define
where are of full rank, and is independent of . The double sketch algorithm outputs
When , we denote the output of (5) as . In this case, the output will be
We first show that without noise, the algorithm exactly recovers , i.e., with probability 1.
Theorem 1 (Exact recovery).
Let be two independent complex standard Gaussian random matrices. Furthermore, let be a matrix with rank . If and , then with probability one, , where is as defined in (6).
Our Theorem 1 generalizes the exact recovery result [13, Lemma 6], where is assumed to be exactly . Our Theorem 1 implies that the exact value of is not needed for the double sketch algorithm, and one can always use the parameter . In fact, our robust recovery result (Theorem 2) suggests that choosing a larger makes the output of the double sketch algorithm more robust to noise.
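Since the display equations (4)-(6) are not reproduced here, the following numpy sketch uses one common form of a double-sketch (bilateral random projection) estimator purely as an illustration: a left and a right sketch with i.i.d. complex Gaussian sensing matrices and additive noise, and recovery through a pseudo-inverse. The paper's exact formulas may differ; the final assertion only mirrors the spirit of Theorem 1.

import numpy as np

rng = np.random.default_rng(1)

def complex_gaussian(shape, rng=rng):
    """i.i.d. standard complex Gaussian entries (independent real and imaginary parts)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def double_sketch(A, m1, m2, noise=0.0, rng=rng):
    """Noisy left and right sketches, Y1 = S1 A + E1 and Y2 = A S2 + E2 (assumed form)."""
    n1, n2 = A.shape
    S1 = complex_gaussian((m1, n1), rng)
    S2 = complex_gaussian((n2, m2), rng)
    Y1 = S1 @ A + noise * complex_gaussian((m1, n2), rng)
    Y2 = A @ S2 + noise * complex_gaussian((n1, m2), rng)
    return S1, S2, Y1, Y2

def recover(S1, Y1, Y2):
    """Double-sketch output in the assumed form: A_hat = Y2 (S1 Y2)^+ Y1."""
    return Y2 @ np.linalg.pinv(S1 @ Y2) @ Y1

# Noiseless sanity check in the spirit of Theorem 1: a rank-3 matrix is recovered
# up to floating-point error once both sketch sizes are at least the rank.
r, n1, n2 = 3, 60, 50
A = complex_gaussian((n1, r)) @ complex_gaussian((r, n2))
S1, S2, Y1, Y2 = double_sketch(A, m1=8, m2=8, noise=0.0)
assert np.linalg.norm(recover(S1, Y1, Y2) - A) / np.linalg.norm(A) < 1e-6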
When are not all zero, the robust recovery guarantee is given as follows.
Theorem 2 (Robust recovery).
Assume , is of rank and is independent of . For any , with probability at least , the output from the double sketch algorithm given in (5) satisfies
where is any matrix norm that satisfies for any two matrices and , , and . In particular, it holds for and .
The condition that is of rank can be easily verified in different settings. For example, it holds when is independent of , or , where is of full rank. In the second case, when is a matrix with independent entries generated from a continuous distribution, we cover “low-rank plus noise” sketching.
Our proof of Theorem 2 works for or , but the error bounds are slightly different.
2.2 Low-rank matrix approximation
When is not low-rank, we can write , where is the best rank- approximation of . Letting , we can use the noisy double sketch model in (4) to consider the sketches
When , such a problem was considered in [13, 35] using the double sketch algorithm together with an extra step of truncated -term SVD applied to and . See  for more details. The noiseless version of the algorithm was also analyzed in , and a power scheme modification of the algorithm was analyzed in . In the noiseless setting, a direct application of the double sketch algorithm without the truncation steps yields a weaker error bound (Corollary 1) than the one in . On the other hand, we can handle noise in the double sketch, while the proofs from [29, 40] are not applicable: they rely heavily on the assumption in order to use properties of orthogonal projections, which hold only in the noiseless scenario. See, for example, [29, Fact A.2]. The proof of Corollary 1 is given in Section 3.3.
Corollary 1 (Low-rank approximation with noisy sketch).
Let . Let be an integer such that . Consider the algorithm
Suppose is of rank . For any and , with probability at least , the output satisfies
Although our error bound depends on , the output is a rank- approximation of the ground truth matrix . This bound holds for any . Therefore, one can optimize over to obtain the best trade-off between the failure probability and the approximation error.
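To illustrate the setting of Corollary 1 under the same assumed estimator (and reusing complex_gaussian, double_sketch, recover, and rng from the sketch above), one can sketch a full-rank matrix with a decaying spectrum and compare the output against its best low-rank approximation:

import numpy as np

n1, n2, r = 60, 50, 5
U, _ = np.linalg.qr(complex_gaussian((n1, n1)))
V, _ = np.linalg.qr(complex_gaussian((n2, n2)))
spectrum = np.concatenate([np.linspace(10.0, 5.0, r), 1e-3 * np.ones(n2 - r)])
A = U[:, :n2] @ np.diag(spectrum) @ V.conj().T               # full rank, numerically close to rank r

S1, S2, Y1, Y2 = double_sketch(A, m1=15, m2=15, noise=1e-3)
A_hat = recover(S1, Y1, Y2)

A_r = U[:, :r] @ np.diag(spectrum[:r]) @ V[:, :r].conj().T   # best rank-r approximation
print(np.linalg.norm(A_hat - A_r) / np.linalg.norm(A_r))     # residual reflects the tail and the noise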
2.3 Application to sketching low-tubal-rank tensors
The approach set forth in (5) can be used to sketch and recover low-tubal-rank tensors. For such an application, one considers the low-tubal-rank tensor with tubal rank . Taking the mode-3 FFT of , one obtains which is composed of a collection of matrices (frontal slices) of dimension with rank at most . As such, (5) can be used to sketch each of the frontal slices of . Corollary 2 captures the approximation error for such an approach, and its proof is given in Section 3.3.
Corollary 2 (Recovering low tubal-rank tensors).
Let be a low-tubal-rank tensor with rank . Furthermore, let , be two independent complex standard Gaussian random matrices. Consider the measurements
where , and for all .
(Exact Recovery) If and then with probability 1, .
(Robust Recovery) If for all , is of rank , then for any , and , with probability at least ,
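The following numpy sketch mirrors the slice-wise procedure described before Corollary 2, again with the assumed estimator and the helpers complex_gaussian, double_sketch, recover, and rng from Section 2.1: take the mode-3 FFT, double-sketch and recover every frontal slice in the Fourier domain, and invert the FFT.

import numpy as np

def sketch_and_recover_tensor(T, m1, m2, noise=0.0):
    """Mode-3 FFT, double-sketch each frontal slice (assumed estimator), recover, invert the FFT."""
    Th = np.fft.fft(T, axis=2)
    Rh = np.zeros_like(Th)
    for k in range(T.shape[2]):
        S1, S2, Y1, Y2 = double_sketch(Th[:, :, k], m1, m2, noise=noise)
        Rh[:, :, k] = recover(S1, Y1, Y2)
    return np.fft.ifft(Rh, axis=2)

# A tensor of tubal rank at most 2, built slice-wise in the Fourier domain.
Lf = np.fft.fft(complex_gaussian((30, 2, 4)), axis=2)
Rf = np.fft.fft(complex_gaussian((2, 20, 4)), axis=2)
T = np.fft.ifft(np.einsum('ijk,jlk->ilk', Lf, Rf), axis=2)

# Noiseless sanity check in the spirit of the exact recovery statement above.
T_hat = sketch_and_recover_tensor(T, m1=6, m2=6, noise=0.0)
assert np.linalg.norm(T_hat - T) / np.linalg.norm(T) < 1e-6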
In the closely related work , the authors considered low-tubal-rank tensor approximation from noiseless sketches, extending the results from , while our setting here is to recover low-tubal-rank tensors from noisy sketches. In addition,  requires two sketching tensors with i.i.d. Gaussian entries in the sketching procedure, whereas here we only require two sensing matrices and with independent Gaussian entries to sketch the low-tubal-rank tensors.
3 Proof of main results
Our proof of the robust matrix recovery result, Theorem 2, derives an upper bound on the difference between the output and the ground truth matrix . To accomplish this, the approximation error, , is decomposed into two components. The first component depends on and can be written as for a projection matrix ; we use the oblique projection matrix expression in Lemma 3 to simplify it. We then control this term by relating it to the smallest singular value of a truncated Haar unitary matrix. Here we use the crucial fact that when is full rank,
is uniformly distributed on the Grassmannian of all -dimensional subspaces in . When is not full rank, does not have such a nice property, and our proof technique cannot be applied directly. This part of the proof is summarized in Lemma 1.
The second component in the error decomposition, which depends on , is simpler to handle. For this part, a lower bound on the smallest singular value of Gaussian random matrices is utilized.
The distribution of the smallest singular value of truncated Haar unitary matrices was explicitly calculated in [9, 3]. For a more general class of random matrices (including Haar orthogonal matrices), such a distribution was derived in [9, 11, 2] in terms of generalized hypergeometric functions. By using the corresponding tail probability bound [2, Corollary 3.4] for truncated Haar orthogonal matrices, our analysis can be extended to real Gaussian sketching matrices and .
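For readers who wish to experiment with the quantity behind Lemma 1, the snippet below samples a Haar unitary matrix via a phase-corrected QR factorization of a complex Gaussian matrix and records the smallest singular value of a rectangular upper-left corner; it is a simulation aid only and not part of the argument.

import numpy as np

def haar_unitary(n, rng):
    """Haar-distributed n x n unitary: QR of a complex Gaussian matrix, with the phases
    of R's diagonal absorbed into Q so that the distribution is exactly Haar."""
    G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(G)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

rng = np.random.default_rng(2)
n, r, m = 200, 5, 20
smins = [np.linalg.svd(haar_unitary(n, rng)[:r, :m], compute_uv=False)[-1]
         for _ in range(200)]
print(min(smins))   # the r x m corner is well conditioned in most draws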
3.1 Proof of Theorem 1
Let be the SVD of where is , is and is an invertible, diagonal matrix. Now we can write as
Note that since are Gaussian matrices and and are orthonormal, the matrices and have linearly independent columns with probability 1, and by Lemma 2,
So with probability 1,
3.2 Proof of Theorem 2
The double sketch algorithm outputs
Let be such that
Since is of rank from the assumption in Theorem 2, we denote the SVD of as
Furthermore, since and are independent, is invertible with probability . Therefore,
Using this notation, the output of (5) simplifies to
and since our goal is to bound the approximation error, we consider
Lemma 1 allows us to bound the first term in this inequality.
If , then with probability at least , with , the output of the algorithm in (5) satisfies
where we have set . We observe that is a projection, i.e., , which satisfies
Recall that is the SVD of . From Lemma 3,
Then we can bound
We focus our efforts on simplifying the second term of (12). Writing in terms of the SVD of , one obtains
Rearranging terms yields
Since is of size , is complex Gaussian, and has orthonormal columns, Lemma 2 implies that with probability , has linearly independent columns. Thus has linearly independent rows and
This implies that
We note that
Using the fact that for the first term of (12), we then obtain the following bound:
which holds with probability 1.
We now derive a probabilistic bound from (13) using concentration inequalities from random matrix theory. Note that can be seen as a submatrix of a Haar unitary matrix . By the unitary invariance property,
is also a Haar unitary matrix, and is exactly the upper left corner of . We can apply Lemma 4 to get
for any . Since is distributed as a complex Gaussian random matrix, if , by Lemma 5, for any , with probability at least ,
Combining the two probability estimates, with probability at least ,
3.3 Proof of Corollaries
Proof of Corollary 1.
Write , where is the best rank- approximation to . We obtain