# On Recoverability of Randomly Compressed Tensors with Low CP Rank

Our interest lies in the recoverability properties of compressed tensors under the canonical polyadic decomposition (CPD) model. The considered problem is well-motivated in many applications, e.g., hyperspectral image and video compression. Prior work studied this problem under somewhat special assumptions, e.g., that the latent factors of the tensor are sparse or drawn from absolutely continuous distributions. We offer an alternative result: we show that if the tensor is compressed by a subgaussian linear mapping, then the tensor is recoverable if the number of measurements is on the same order of magnitude as that of the model parameters, without strong assumptions on the latent factors. Our proof is based on deriving a restricted isometry property (R.I.P.) under the CPD model via set covering techniques, and thus exhibits a flavor of classic compressive sensing. The new recoverability result enriches the understanding of the compressed CP tensor recovery problem; it offers theoretical guarantees for recovering tensors whose elements are not necessarily continuous or sparse.


## 1 Introduction

Many signal processing problems boil down to an inverse problem. Consider a system of linear equations, i.e.,

$$\boldsymbol{y}=\boldsymbol{\Phi}\boldsymbol{x}, \tag{1}$$

where $\boldsymbol{\Phi}\in\mathbb{R}^{M\times D}$ denotes a sensing system, $\boldsymbol{y}\in\mathbb{R}^{M}$ is the observed measurement vector, and $\boldsymbol{x}\in\mathbb{R}^{D}$ is the signal of interest. The task of the inverse problem is to recover $\boldsymbol{x}$ from $\boldsymbol{y}$ with the knowledge of the sensing system $\boldsymbol{\Phi}$. In many cases, the number of measurements is much smaller than the signal dimension, i.e., $M\ll D$, which makes the inverse problem highly under-determined. In general, recovering $\boldsymbol{x}$ is impossible in such cases: infinitely many solutions exist because $\boldsymbol{\Phi}$ admits a nontrivial null space [1].
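A small numerical illustration of the null-space argument follows. It is a sketch using NumPy; the sizes `M`, `D` and the random seed are arbitrary choices, not taken from the paper. Any vector in the null space of an under-determined $\boldsymbol{\Phi}$ can be added to $\boldsymbol{x}$ without changing $\boldsymbol{y}$.

```python
import numpy as np

rng = np.random.default_rng(0)
M, D = 5, 12                          # M < D: fewer measurements than unknowns
Phi = rng.standard_normal((M, D))     # generic sensing matrix (full row rank a.s.)
x = rng.standard_normal(D)
y = Phi @ x

# The last D - M right singular vectors span the null space of Phi.
_, _, Vt = np.linalg.svd(Phi)         # Vt is D x D (full_matrices=True by default)
null_basis = Vt[M:]
z = x + 3.0 * null_basis[0]           # a different signal ...

print(np.allclose(Phi @ z, y))        # ... that explains y equally well: True
print(np.linalg.norm(z - x))          # approximately 3.0
```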

To recover $\boldsymbol{x}$ when $M\ll D$, one workaround is to exploit some special structure of $\boldsymbol{x}$. For example, in compressive sensing (CS) [2, 3, 4], it is now well known that if $\boldsymbol{x}$ is a sparse vector, recovery is possible under some conditions. This is not entirely surprising: if the number of nonzero elements in $\boldsymbol{x}$ is small, the system of linear equations in (1) is “essentially over-determined”. An extension of CS is low-rank matrix recovery (LMR) [5, 6]. Similarly, when $\boldsymbol{x}$ is the vectorization of a low-rank matrix, the number of unknowns (the degrees of freedom of the low-rank matrix) can be much smaller than $M$, which again makes the inverse problem virtually over-determined. Both CS and LMR have received tremendous attention due to their wide spectrum of applications [7, 8, 9, 10].

As a step further, tensor compression and recovery [11, 12, 13, 14, 15] is also well-motivated, since many real-world signals are naturally low-rank tensors. For example, remotely sensed hyperspectral images are third-order tensors (each data entry has two spatial coordinates and one spectral coordinate) [16, 17]. For sensing devices deployed on satellites or aircraft, compression is needed for transmitting the acquired data back to earth stations [11, 14], which substantially reduces the communication overhead. Much of the data arising in machine learning are also tensors, e.g., social network data [18] and traffic flow data [13]. Compressing such data saves storage space and transmission overhead.

A number of works have considered recoverability properties in tensor compression. The recent work [12] considers recovering tensors from random measurements under the Tucker, hierarchical Tucker (HT), and tensor train (TT) models. The works in [19, 20, 21] consider recovering compressed tensors under the canonical polyadic decomposition (CPD) model. Notably, [19] shows that tensors with low CP rank and sparse latent factors can be recovered from compressed measurements by solving a series of CS problems in the latent domain. The work [21] shows that if the latent factors are drawn from a certain joint continuous distribution, then the compressed tensor can be recovered almost surely if the number of measurements is larger than or equal to the number of parameters in the CPD model. These are all encouraging results, showing that recovering compressed tensors is viable under some conditions.

In this work, we offer a new result regarding recoverability of compressed tensors that follow the CPD model (CP tensors for short). Our result differs from the existing recoverability arguments in [19, 21] in that no sparsity or distributional assumption is imposed on the latent factors. Our technical approach is based on set covering and deriving a new restricted isometry property (R.I.P.) for CP tensors, which is similar to the route of proof in [12] for the Tucker, HT, and TT models. Showing that a compression system satisfies R.I.P. for CP tensors is challenging, since the latent factors of the CPD model cannot be orthogonalized in most cases, as CPD is essentially unique under mild conditions. In contrast, the R.I.P. proofs for Tucker/HT/TT tensors rely on the orthogonality of the latent factors. Nevertheless, we show that recovering a tensor with low CP rank from limited measurements is possible if the latent factors are reasonably well-conditioned. Unlike existing results, our recovery proof does not impose sparsity or continuity constraints on the latent factors of the CP tensor, and thus covers cases whose recoverability properties were previously unknown.

## 2 Problem Statement and Background

### 2.1 Tensor Preliminaries

An $N$th-order tensor $\underline{\mathbf{X}}\in\mathbb{R}^{I_1\times\cdots\times I_N}$ is an array whose elements are indexed by $N$ indices, namely, $\underline{\mathbf{X}}(i_1,\ldots,i_N)$; it can be considered a high-dimensional extension of a matrix. Unlike matrices, whose rank has a single definition, tensors admit many different definitions of rank [22, 23]. Among them, a popular and useful one is the CP rank. Specifically, $\mathrm{rank}_C(\underline{\mathbf{X}})=F$ means that $F$ is the smallest integer such that $\underline{\mathbf{X}}$ can be expressed as follows:

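$$\underline{\mathbf{X}}=\sum_{f=1}^{F}\mathbf{A}^{(1)}(:,f)\circ\cdots\circ\mathbf{A}^{(N)}(:,f)\in\mathbb{R}^{I_1\times\cdots\times I_N}, \tag{2}$$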

where $\mathbf{A}^{(n)}\in\mathbb{R}^{I_n\times F}$ denotes the mode-$n$ latent factor under CPD and “$\circ$” is the outer product; see details in [23]. The term $\mathbf{A}^{(1)}(:,f)\circ\cdots\circ\mathbf{A}^{(N)}(:,f)$ is called a rank-one tensor. CPD is seemingly similar to the matrix SVD, since the SVD can also be understood as a summation of rank-one matrices. However, the $\mathbf{A}^{(n)}$'s in (2) cannot always be orthogonalized as in the SVD case, because the CPD is essentially unique under mild conditions; see the tutorial [23] for details on CPD uniqueness.
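Eq. (2) can be evaluated directly by accumulating $F$ outer products. The following NumPy sketch (the helper name `cp_to_tensor` is ours, not from the paper) builds an $N$th-order tensor from a list of factor matrices, each of shape $I_n\times F$:

```python
import numpy as np

def cp_to_tensor(factors):
    """Evaluate Eq. (2): sum over f of A1[:, f] o A2[:, f] o ... o AN[:, f]."""
    F = factors[0].shape[1]
    shape = tuple(A.shape[0] for A in factors)
    X = np.zeros(shape)
    for f in range(F):
        rank_one = factors[0][:, f]
        for A in factors[1:]:
            rank_one = np.multiply.outer(rank_one, A[:, f])  # add one mode to the outer product
        X += rank_one
    return X

rng = np.random.default_rng(1)
factors = [rng.standard_normal((I, 3)) for I in (4, 5, 6)]   # F = 3, N = 3
X = cp_to_tensor(factors)
print(X.shape)                                                 # (4, 5, 6)
# Cross-check against an einsum formulation of the same sum.
print(np.allclose(X, np.einsum('if,jf,kf->ijk', *factors)))   # True
```

Each mode-$n$ unfolding of the result has matrix rank at most $F$, which is a quick sanity check on the construction.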

Besides CPD, many other tensor decomposition models exist in the literature. For example, the Tucker decomposition [24], hierarchical Tucker (HT) decomposition [12], and tensor train (TT) decomposition [25] are also useful for representing tensor data parsimoniously.

### 2.2 The Compressed Tensor Recovery Problem

Our interest lies in the following linear system:

$$\boldsymbol{y}=\mathcal{A}(\underline{\mathbf{X}}^{\natural}), \tag{3}$$

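where $\underline{\mathbf{X}}^{\natural}\in\mathbb{R}^{I_1\times\cdots\times I_N}$ is the “ground-truth signal” of interest and $\mathcal{A}:\mathbb{R}^{I_1\times\cdots\times I_N}\to\mathbb{R}^{M}$ is a linear mapping, i.e., $\mathcal{A}(\underline{\mathbf{X}})=\boldsymbol{\Phi}\,\mathrm{vec}(\underline{\mathbf{X}})$ where $\boldsymbol{\Phi}\in\mathbb{R}^{M\times\prod_{n=1}^{N}I_n}$. When $M<\prod_{n=1}^{N}I_n$, the inverse problem of recovering $\underline{\mathbf{X}}^{\natural}$ from $\boldsymbol{y}$ may have infinitely many solutions. However, if $\underline{\mathbf{X}}^{\natural}$ is a low CP rank tensor with $\mathrm{rank}_C(\underline{\mathbf{X}}^{\natural})=F$ and the number of linear measurements is larger than the number of unknown parameters (i.e., $M>F\sum_{n=1}^{N}I_n$), then the inverse problem is “essentially over-determined”, and recovering $\underline{\mathbf{X}}^{\natural}$ is possible. This is the starting point of our work.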
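The measurement model in (3) with $\mathcal{A}(\underline{\mathbf{X}})=\boldsymbol{\Phi}\,\mathrm{vec}(\underline{\mathbf{X}})$ can be sketched as follows. We assume column-major vectorization and a Gaussian $\boldsymbol{\Phi}$ scaled by $1/\sqrt{M}$ (one common subgaussian choice); the tensor sizes and the factor 3 in `M` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
dims, F = (6, 7, 8), 2
factors = [rng.standard_normal((I, F)) for I in dims]
X = np.einsum('if,jf,kf->ijk', *factors)       # rank-F CP tensor, Eq. (2)

n_entries = int(np.prod(dims))                 # ambient dimension prod(I_n) = 336
n_params = F * sum(dims)                       # CP parameter count F * sum(I_n) = 42
M = 3 * n_params                               # illustrative: F*sum(I_n) < M < prod(I_n)

Phi = rng.standard_normal((M, n_entries)) / np.sqrt(M)
y = Phi @ X.reshape(-1, order='F')             # A(X) = Phi vec(X), column-major vec
print(n_entries, n_params, M, y.shape)         # 336 42 126 (126,)
```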

Consider a recovery criterion as follows:

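Recovery Criterion:

$$
\begin{aligned}
&\text{find} && \underline{\mathbf{X}} && \text{(4a)}\\
&\text{subject to} && \boldsymbol{y}=\mathcal{A}(\underline{\mathbf{X}}),\quad \mathrm{rank}_C(\underline{\mathbf{X}})\le F. && \text{(4b)}
\end{aligned}
$$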

We are concerned with the recoverability properties of Criterion (4). Specifically, assume that one can solve Problem (4) to optimality using a certain algorithm. Does the optimal solution (denoted by $\underline{\mathbf{X}}^{\star}$) recover the uncompressed signal $\underline{\mathbf{X}}^{\natural}$ under some conditions on $\mathcal{A}$ and $\underline{\mathbf{X}}^{\natural}$? In addition, how many measurements are needed to recover $\underline{\mathbf{X}}^{\natural}$?

### 2.3 Related Work

#### 2.3.1 Tucker, HT, and TT Tensors

The recent work [12] considered a similar problem in which the tensors admit low-rank Tucker, HT, or TT representations. It showed that, if a subgaussian mapping is used and the number of measurements is on the same order of magnitude as the number of tensor parameters, then recovery is possible under the Tucker, HT, and TT models.

#### 2.3.2 CP Tensors

It is also of great interest to study the recoverability properties of CP tensors, since every tensor admits an exact CPD (with a sufficiently large $F$), so the model itself introduces no modeling error [23]. In addition, the CP representation is very economical in terms of the number of unknowns (i.e., $F\sum_{n=1}^{N}I_n$), which increases only linearly with the tensor order $N$ (whereas the number of Tucker parameters increases exponentially with $N$).
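To make the linear-versus-exponential comparison concrete, the sketch below counts parameters for a CP model and for a Tucker model. For illustration only, it assumes a Tucker core of multilinear rank $(F,\ldots,F)$ and equal mode sizes, and it ignores scaling redundancies.

```python
def cp_params(dims, F):
    """Number of entries in the N factor matrices of size I_n x F."""
    return F * sum(dims)

def tucker_params(dims, F):
    """Assumed Tucker model: an F x ... x F core plus N factor matrices I_n x F."""
    return F ** len(dims) + F * sum(dims)

for N in (3, 5, 8):
    dims = [10] * N
    print(N, cp_params(dims, 5), tucker_params(dims, 5))
# 3 150 275
# 5 250 3375
# 8 400 391025
```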

Several notable works on CP tensor recovery have appeared in recent years. Specifically, the work in [19] considers a case where the latent factors of $\underline{\mathbf{X}}^{\natural}$ are all sparse. Using a special sensing system $\boldsymbol{\Phi}=\boldsymbol{U}^{(N)}\otimes\cdots\otimes\boldsymbol{U}^{(1)}$, where the $\boldsymbol{U}^{(n)}$'s are per-mode compression matrices and “$\otimes$” denotes the Kronecker product, the tensor recovery problem can be recast as a series of classical CS problems in the latent factor domain, which helps establish the identifiability of the $\mathbf{A}^{(n)}$'s and thereby that of $\underline{\mathbf{X}}^{\natural}$. The work in [20] extends this latent factor recovery-based approach to dense $\mathbf{A}^{(n)}$'s, at the price of using many more compressed measurements in parallel. The works in [19, 20] both assume that the compressed measurements are small tensors that admit unique CPD. In [21], this assumption is relaxed: almost sure recoverability of $\underline{\mathbf{X}}^{\natural}$ is shown under the assumption that the $\mathbf{A}^{(n)}$'s and $\boldsymbol{\Phi}$ are drawn from certain joint continuous distributions. The sample complexity proved in [21] is appealing, as it equals the number of unknowns. The caveat is that the $\mathbf{A}^{(n)}$'s must follow a certain continuous distribution, which means that some important types of tensors (e.g., tensors with discrete latent factors, which have applications in machine learning [18, 26, 27]) may not be covered by the recoverability theorem in [21].

## 3 Main Result

In this work, we consider the recoverability problem for CP tensors as in [19, 20, 21]. Unlike these prior works, we neither require $\underline{\mathbf{X}}^{\natural}$ and its compressed versions to admit unique CPD nor assume that the latent factors of $\underline{\mathbf{X}}^{\natural}$ are drawn from joint continuous distributions. As a trade-off, we restrict the entries of the sensing matrix $\boldsymbol{\Phi}$ to be zero-mean i.i.d. subgaussian (see [28] for more details about subgaussian matrices). Subgaussian sensing matrices are widely used in compressive sensing and dimensionality reduction owing to their many appealing properties [28, 2, 5, 12]. Fortunately, in many scenarios the sensing/compressing matrix is under the control of the system designers (e.g., in communications), and thus assuming subgaussianity of $\boldsymbol{\Phi}$ is reasonable in such cases.

### 3.1 Recoverability under CP Tensor R.I.P.

Let us consider the following definition:

###### Definition 1

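(CP tensor R.I.P.) Assume that for all $\underline{\mathbf{X}}\in\mathbb{R}^{I_1\times\cdots\times I_N}$ with $\mathrm{rank}_C(\underline{\mathbf{X}})\le F$, the following holds:

$$(1-\delta_F)\|\underline{\mathbf{X}}\|_F^2\le\|\mathcal{A}(\underline{\mathbf{X}})\|_F^2\le(1+\delta_F)\|\underline{\mathbf{X}}\|_F^2. \tag{5}$$

Then, the mapping $\mathcal{A}$ is said to satisfy the restricted isometry property (R.I.P.) with parameter $\delta_F$ for tensors whose CP rank is at most $F$.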
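Definition 1 can be probed empirically, though not certified, since R.I.P. is a statement about all rank-$F$ tensors, by sampling random CP tensors and recording the ratio $\|\mathcal{A}(\underline{\mathbf{X}})\|_F^2/\|\underline{\mathbf{X}}\|_F^2$. The sketch below assumes a Gaussian $\boldsymbol{\Phi}$ with variance $1/M$ and Gaussian latent factors; `delta_hat` is only a lower estimate of $\delta_F$.

```python
import numpy as np

rng = np.random.default_rng(3)
dims, F, M = (8, 8, 8), 2, 200
Phi = rng.standard_normal((M, int(np.prod(dims)))) / np.sqrt(M)

ratios = []
for _ in range(500):
    factors = [rng.standard_normal((I, F)) for I in dims]
    X = np.einsum('if,jf,kf->ijk', *factors)
    # Any fixed vectorization order works here: permuting the columns of an
    # i.i.d. Gaussian Phi does not change its distribution.
    ratios.append(np.linalg.norm(Phi @ X.ravel()) ** 2 / np.linalg.norm(X) ** 2)

ratios = np.array(ratios)
delta_hat = np.max(np.abs(ratios - 1.0))   # empirical lower estimate of delta_F
print(ratios.min(), ratios.max(), delta_hat)
```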

If a mapping satisfies R.I.P. on a set of tensors, then recoverability of this set of tensors can be readily established:

###### Lemma 1

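(Recoverability under R.I.P.) Suppose that $\mathcal{A}$ satisfies R.I.P. for tensors whose CP rank is smaller than or equal to $2F$ with parameter $\delta_{2F}<1$, and that $\mathrm{rank}_C(\underline{\mathbf{X}}^{\natural})\le F$. Then, the optimal solution to Problem (4) is $\underline{\mathbf{X}}^{\natural}$.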

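Proof: The proof is the same as that in matrix recovery [5]. Assume that there is a tensor $\underline{\mathbf{Z}}\neq\underline{\mathbf{X}}^{\natural}$ with $\mathrm{rank}_C(\underline{\mathbf{Z}})\le F$ which satisfies $\mathcal{A}(\underline{\mathbf{Z}})=\boldsymbol{y}$. Then, we have

$$0=\|\mathcal{A}(\underline{\mathbf{X}}^{\natural}-\underline{\mathbf{Z}})\|_F^2\ge(1-\delta_{2F})\|\underline{\mathbf{X}}^{\natural}-\underline{\mathbf{Z}}\|_F^2>0,$$

which is a contradiction. In the above, we have used the facts that $\mathrm{rank}_C(\underline{\mathbf{X}}^{\natural}-\underline{\mathbf{Z}})\le 2F$ and that $\mathcal{A}$ satisfies R.I.P. for all CP tensors of rank at most $2F$.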
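The fact $\mathrm{rank}_C(\underline{\mathbf{X}}^{\natural}-\underline{\mathbf{Z}})\le 2F$ used above follows from concatenating the two sets of latent factors and negating one block in a single mode. A quick NumPy check of this construction, assuming third-order tensors with arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(4)
dims, F = (5, 6, 7), 2
A = [rng.standard_normal((I, F)) for I in dims]   # factors of X
B = [rng.standard_normal((I, F)) for I in dims]   # factors of Z
X = np.einsum('if,jf,kf->ijk', *A)
Z = np.einsum('if,jf,kf->ijk', *B)

# Stack the factors side by side; the minus sign in mode 1 turns the sum into X - Z.
C = [np.hstack([A[0], -B[0]])] + [np.hstack([a, b]) for a, b in zip(A[1:], B[1:])]
D = np.einsum('if,jf,kf->ijk', *C)               # 2F rank-one terms
print(C[0].shape[1], np.allclose(D, X - Z))      # 4 True
```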

From Lemma 1, one can see that if we could prove that some $\mathcal{A}$ satisfies R.I.P. with $\delta_{2F}<1$ for all tensors in $\{\underline{\mathbf{X}}\mid\mathrm{rank}_C(\underline{\mathbf{X}})\le 2F\}$, then recoverability would be established. Showing this for all such tensors is, unfortunately, challenging. As we will see, the conditioning of the $\mathbf{A}^{(n)}$'s plays an important role in establishing R.I.P. for low-rank CP tensors. This is quite different from the low-rank matrix (or Tucker/HT/TT tensor) case, where only the matrix size and rank matter. The contrast makes sense: the latent factors of a matrix under SVD are always orthonormal, so their condition numbers are constant. For CP tensors, however, the $\mathbf{A}^{(n)}$'s are essentially unique and not orthogonalizable in many cases, so the impact of their conditioning naturally shows up. To proceed, we define the following parameter:

###### Definition 2

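The condition number of the CP tensor $\underline{\mathbf{X}}=\sum_{f=1}^{F}\mathbf{A}^{(1)}(:,f)\circ\cdots\circ\mathbf{A}^{(N)}(:,f)$ is defined as follows:

$$\kappa(\underline{\mathbf{X}})=\frac{\prod_{n=1}^{N}\sigma_{\max}(\mathbf{A}^{(n)})}{\sigma_{\min}\left(\odot_{n=1}^{N}\mathbf{A}^{(n)}\right)}.$$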

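One can see that $\kappa(\underline{\mathbf{X}})<\infty$ implies that $\odot_{n=1}^{N}\mathbf{A}^{(n)}$ has full column rank, which is a necessary condition for the CPD of $\underline{\mathbf{X}}$ to be essentially unique [23]. The parameter $\kappa(\underline{\mathbf{X}})$ is clearly related to the condition numbers of the $\mathbf{A}^{(n)}$'s. This is clearer when $I_n\ge F$ and $\mathrm{rank}(\mathbf{A}^{(n)})=F$ for all $n$. In such cases, we have

$$
\begin{aligned}
\sigma_{\min}\left(\odot_{n=1}^{N}\mathbf{A}^{(n)}\right)
&=\min_{\|\boldsymbol{x}\|_2=1}\left\|\left(\odot_{n=1}^{N}\mathbf{A}^{(n)}\right)\boldsymbol{x}\right\|_2 && \text{(6a)}\\
&=\min_{\|\boldsymbol{x}\|_2=1}\left\|\left(\otimes_{n=1}^{N}\mathbf{A}^{(n)}\right)\boldsymbol{P}\boldsymbol{x}\right\|_2 && \text{(6b)}\\
&\ge\min_{\|\boldsymbol{x}\|_2=1}\prod_{n=1}^{N}\sigma_{\min}(\mathbf{A}^{(n)})\,\|\boldsymbol{P}\boldsymbol{x}\|_2 && \text{(6c)}\\
&=\prod_{n=1}^{N}\sigma_{\min}(\mathbf{A}^{(n)}), && \text{(6d)}
\end{aligned}
$$

where $\odot$ denotes the Khatri-Rao product and $\boldsymbol{P}$ is a column selection matrix (and thus $\|\boldsymbol{P}\boldsymbol{x}\|_2=\|\boldsymbol{x}\|_2$); we have used the facts that the columns of $\odot_{n=1}^{N}\mathbf{A}^{(n)}$ are a subset of the columns of $\otimes_{n=1}^{N}\mathbf{A}^{(n)}$ and that, since every $\mathbf{A}^{(n)}$ has full column rank, $\sigma_{\min}(\otimes_{n=1}^{N}\mathbf{A}^{(n)})=\prod_{n=1}^{N}\sigma_{\min}(\mathbf{A}^{(n)})$. The above leads to $\kappa(\underline{\mathbf{X}})\le\prod_{n=1}^{N}\mathrm{cond}(\mathbf{A}^{(n)})$, where $\mathrm{cond}(\mathbf{A})$ denotes the matrix condition number of $\mathbf{A}$. Hence, $\kappa(\underline{\mathbf{X}})$ can be understood as a parameter that reflects the conditioning of the latent factors. Another note is that $\sigma_{\min}(\odot_{n=1}^{N}\mathbf{A}^{(n)})\le\|\otimes_{n=1}^{N}\mathbf{A}^{(n)}\|_2=\prod_{n=1}^{N}\sigma_{\max}(\mathbf{A}^{(n)})$, resulting in $\kappa(\underline{\mathbf{X}})\ge 1$, which resembles the property of the matrix condition number, i.e., $\mathrm{cond}(\mathbf{A})\ge 1$ for any $\mathbf{A}$.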
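Definition 2 and the bounds $1\le\kappa(\underline{\mathbf{X}})\le\prod_n\mathrm{cond}(\mathbf{A}^{(n)})$ derived from (6) can be checked numerically. The sketch below (the helper names `khatri_rao` and `cp_condition_number` are ours) assumes tall, full-column-rank factors ($I_n\ge F$), as in the derivation. Note that $\kappa$ is computed from the given factors, not from the tensor alone.

```python
import numpy as np

def khatri_rao(mats):
    """Column-wise Kronecker product: column f is kron(A1[:, f], ..., AN[:, f])."""
    F = mats[0].shape[1]
    out = mats[0]
    for A in mats[1:]:
        out = np.einsum('if,jf->ijf', out, A).reshape(-1, F)
    return out

def cp_condition_number(factors):
    """kappa(X) from Definition 2, computed from the given latent factors."""
    numerator = np.prod([np.linalg.norm(A, 2) for A in factors])   # prod of sigma_max
    denominator = np.linalg.svd(khatri_rao(factors), compute_uv=False).min()
    return numerator / denominator

rng = np.random.default_rng(5)
factors = [rng.standard_normal((I, 3)) for I in (4, 5, 6)]    # I_n >= F = 3
kappa = cp_condition_number(factors)
upper = np.prod([np.linalg.cond(A) for A in factors])
print(1.0 <= kappa <= upper)                                   # True
```

Factors with orthonormal columns give $\kappa=1$, since the Khatri-Rao product of such factors again has orthonormal columns; this mirrors the SVD case discussed above.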

With this parameter defined, our main result is stated in the following theorem:

###### Theorem 1

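Assume that $\mathcal{A}$ is a mapping such that $\mathcal{A}(\underline{\mathbf{X}})=\boldsymbol{\Phi}\,\mathrm{vec}(\underline{\mathbf{X}})$, where $\boldsymbol{\Phi}\in\mathbb{R}^{M\times\prod_{n=1}^{N}I_n}$ has i.i.d. zero-mean $\alpha$-subgaussian entries. In addition, assume that $\mathrm{rank}_C(\underline{\mathbf{X}}^{\natural})\le F$ and $\kappa(\underline{\mathbf{X}}^{\natural})\le\tau$. Then, for a certain constant $C>0$, the criterion in (4) recovers $\underline{\mathbf{X}}^{\natural}$ at its optimal solution with probability larger than or equal to $1-\eta$ if

$$M>C\alpha^2\max\left\{\left(1+2F\sum_{n=1}^{N}I_n\right)\log\big(3(N+1)\tau\big),\ \log(\eta^{-1})\right\}.$$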

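Note that $\alpha$ is known as the subgaussian parameter, which is associated with the subgaussian distribution that generates the entries of $\boldsymbol{\Phi}$. For example, a zero-mean Gaussian random variable with variance $\sigma^2$ is $\sigma$-subgaussian; see more details in [28, 29]. Also note that since we only need to establish recoverability (i.e., $\delta_{2F}<1$), the lower bound on $M$ does not contain the R.I.P. parameter explicitly.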

From Theorem 1, one can see that recovering $\underline{\mathbf{X}}^{\natural}$ from compressed measurements is possible with $M$ on the order of $F\sum_{n=1}^{N}I_n\log\big((N+1)\tau\big)$. Up to the logarithmic factor, this $M$ and the number of unknowns have the same order of magnitude, which is encouraging. In addition, there is no sparsity or continuous distribution assumption on the $\mathbf{A}^{(n)}$'s, which means that Theorem 1 may cover cases that the previous recoverability results in [19, 21] do not.
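To see how the bound in Theorem 1 scales, the sketch below evaluates its right-hand side. The constant $C$ is not specified in the theorem, so `C=1.0` is a hypothetical placeholder, as is `alpha=1.0`; only relative comparisons (e.g., across $F$ or $\tau$) are meaningful.

```python
import math

def theorem1_rhs(dims, F, tau, eta, alpha=1.0, C=1.0):
    """Right-hand side of the sample-complexity bound in Theorem 1.
    C and alpha are hypothetical placeholders; the theorem only asserts some C exists."""
    N = len(dims)
    cp_term = (1 + 2 * F * sum(dims)) * math.log(3 * (N + 1) * tau)
    return C * alpha ** 2 * max(cp_term, math.log(1.0 / eta))

dims = (10, 10, 10)
for tau in (1.0, 10.0, 100.0):
    print(tau, round(theorem1_rhs(dims, F=3, tau=tau, eta=0.01)))
```

The bound grows only logarithmically in $\tau$, but linearly in $F$ and in each $I_n$.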

### 3.2 Proof of Theorem 1

In this section, we outline the proof of Theorem 1 in a concise way. Some of the details can be found in the supplementary materials. Consider the following set of low-rank tensors:

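$$\mathcal{S}_{F,\tau}=\left\{\widetilde{\underline{\mathbf{X}}}\ \middle|\ \widetilde{\underline{\mathbf{X}}}=\frac{\underline{\mathbf{X}}}{\|\underline{\mathbf{X}}\|_F},\ \mathrm{rank}_C(\underline{\mathbf{X}})\le F,\ \kappa(\underline{\mathbf{X}})\le\tau\right\}.$$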

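We will show that (5) holds for all $\widetilde{\underline{\mathbf{X}}}\in\mathcal{S}_{F,\tau}$ with high probability if $\boldsymbol{\Phi}$ is drawn from a subgaussian distribution. Since the mapping $\mathcal{A}$ in (5) is linear, this implies that (5) holds for all the $\underline{\mathbf{X}}$'s associated with $\mathcal{S}_{F,\tau}$. Note that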

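$$\frac{\underline{\mathbf{X}}(i_1,\ldots,i_N)}{\|\underline{\mathbf{X}}\|_F}=\sum_{f=1}^{F}\underbrace{\frac{\prod_{n=1}^{N}\sigma_{\max}(\mathbf{A}^{(n)})}{\|\underline{\mathbf{X}}\|_F}}_{\widetilde{\lambda}}\prod_{n=1}^{N}\underbrace{\frac{\mathbf{A}^{(n)}(i_n,f)}{\|\mathbf{A}^{(n)}\|_2}}_{\widetilde{\mathbf{A}}^{(n)}(i_n,f)}=\sum_{f=1}^{F}\widetilde{\lambda}\prod_{n=1}^{N}\widetilde{\mathbf{A}}^{(n)}(i_n,f).$$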

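Since $\|\underline{\mathbf{X}}\|_F=\|(\odot_{n=1}^{N}\mathbf{A}^{(n)})\mathbf{1}\|_2\ge\sqrt{F}\,\sigma_{\min}(\odot_{n=1}^{N}\mathbf{A}^{(n)})$, where $\mathbf{1}\in\mathbb{R}^{F}$ is the all-ones vector,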

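$$\widetilde{\lambda}=\frac{\prod_{n=1}^{N}\sigma_{\max}(\mathbf{A}^{(n)})}{\|\underline{\mathbf{X}}\|_F}\le\frac{\prod_{n=1}^{N}\sigma_{\max}(\mathbf{A}^{(n)})}{\sigma_{\min}(\odot_{n=1}^{N}\mathbf{A}^{(n)})\sqrt{F}}=\frac{\kappa(\underline{\mathbf{X}})}{\sqrt{F}}.$$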

Consequently, we have

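$$\sigma_{\max}(\widetilde{\mathbf{A}}^{(n)})=\|\widetilde{\mathbf{A}}^{(n)}\|_2=1,\qquad\widetilde{\lambda}\le\kappa(\underline{\mathbf{X}})/\sqrt{F}\le\tau/\sqrt{F}. \tag{7}$$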

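For notational simplicity, we now represent every tensor in $\mathcal{S}_{F,\tau}$ as $\underline{\mathbf{X}}=\sum_{f=1}^{F}\lambda\,\mathbf{A}^{(1)}(:,f)\circ\cdots\circ\mathbf{A}^{(N)}(:,f)$, where $\lambda$ and the $\mathbf{A}^{(n)}$'s satisfy (7) and $\|\underline{\mathbf{X}}\|_F=1$. The set $\mathcal{S}_{F,\tau}$ has infinitely many elements. To establish R.I.P., we construct an $\varepsilon$-net (w.r.t. the Euclidean norm) that covers $\mathcal{S}_{F,\tau}$. An $\varepsilon$-net of $\mathcal{S}_{F,\tau}$, denoted as $\overline{\mathcal{S}}_{F,\tau}$, is a finite set such that for any $\underline{\mathbf{X}}\in\mathcal{S}_{F,\tau}$, one can find an $\overline{\underline{\mathbf{X}}}\in\overline{\mathcal{S}}_{F,\tau}$ such that $\|\underline{\mathbf{X}}-\overline{\underline{\mathbf{X}}}\|_F\le\varepsilon$ [30]. We have the following proposition: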
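The normalization leading to (7) can be reproduced numerically: rescale each factor to unit spectral norm, collect the scales into $\widetilde{\lambda}$, and check $\widetilde{\lambda}\le\kappa(\underline{\mathbf{X}})/\sqrt{F}$. A NumPy sketch with arbitrary third-order sizes (the inline `khatri_rao` helper is ours):

```python
import numpy as np

def khatri_rao(mats):
    """Column-wise Kronecker product of matrices with a common number of columns."""
    F = mats[0].shape[1]
    out = mats[0]
    for A in mats[1:]:
        out = np.einsum('if,jf->ijf', out, A).reshape(-1, F)
    return out

rng = np.random.default_rng(6)
dims, F = (6, 6, 6), 3
factors = [rng.standard_normal((I, F)) for I in dims]
X = np.einsum('if,jf,kf->ijk', *factors)

norms = [np.linalg.norm(A, 2) for A in factors]            # sigma_max(A^(n))
tilde = [A / s for A, s in zip(factors, norms)]            # unit spectral norm
lam = np.prod(norms) / np.linalg.norm(X)                   # lambda~ in the text
X_tilde = lam * np.einsum('if,jf,kf->ijk', *tilde)         # equals X / ||X||_F

kappa = np.prod(norms) / np.linalg.svd(khatri_rao(factors), compute_uv=False).min()
print(np.allclose(X_tilde, X / np.linalg.norm(X)), lam <= kappa / np.sqrt(F))  # True True
```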

###### Proposition 1

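There exists an $\varepsilon$-net $\overline{\mathcal{S}}_{F,\tau}$ of $\mathcal{S}_{F,\tau}$ with respect to the Frobenius norm whose cardinality is upper bounded as follows:

$$|\overline{\mathcal{S}}_{F,\tau}|\le\left(\frac{3(N+1)\tau}{\varepsilon}\right)^{1+\sum_{n=1}^{N}I_nF}.$$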

The proof of Proposition 1 is given in the supplementary materials. With Proposition 1 in hand, we show the following:

###### Proposition 2

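For $\eta\in(0,1)$, $\delta_F\in(0,1)$, and a zero-mean $\alpha$-subgaussian sensing matrix $\boldsymbol{\Phi}$, the $\delta_F$-R.I.P. holds for $\mathcal{A}$ on $\mathcal{S}_{F,\tau}$ with probability larger than $1-\eta$, for a certain constant $C>0$, provided that

$$M\ge C\alpha^2\delta_F^{-2}\max\left\{\left(1+F\sum_{n=1}^{N}I_n\right)\log\big(3(N+1)\tau\big),\ \log(\eta^{-1})\right\}.$$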

Proposition 2 invokes Corollary 5.4 in [28] (see more details of the proof in the supplementary materials).

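Combining Propositions 1 and 2 with Lemma 1 yields Theorem 1: we apply Proposition 2 with $F$ replaced by $2F$ (as Lemma 1 requires R.I.P. for tensors of CP rank up to $2F$), and since recoverability only requires $\delta_{2F}<1$, the R.I.P. parameter can be fixed to a constant and absorbed into $C$.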

## 4 Numerical Validation

In this section, we present numerical results to validate Theorem 1. We randomly generate the latent factors of third-order tensors with CP rank $F$ such that the condition numbers of the latent factors take prescribed values. To generate a latent factor with a desired condition number, we first draw its entries uniformly at random. Then we change the singular values of the latent factor while keeping its singular vectors unchanged.
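One way to implement the factor generation described above is sketched below. The uniform draw and the "replace singular values, keep singular vectors" step follow the text; the geometric spacing of the new singular values between $1$ and $1/\mathrm{cond}$ is our assumption, since the text does not specify the profile. The helper name `factor_with_condition` is hypothetical.

```python
import numpy as np

def factor_with_condition(I, F, cond, rng):
    """I x F factor (I >= F) with condition number `cond` (hypothetical helper).

    Entries are first drawn uniformly at random; the singular values are then
    replaced (assumed geometric profile from 1 to 1/cond) while the singular
    vectors are kept unchanged.
    """
    A = rng.uniform(size=(I, F))
    U, _, Vt = np.linalg.svd(A, full_matrices=False)   # U: I x F, Vt: F x F
    s = np.geomspace(1.0, 1.0 / cond, num=F)
    return (U * s) @ Vt                                # same as U @ diag(s) @ Vt

rng = np.random.default_rng(7)
factors = [factor_with_condition(10, 4, 5.0, rng) for _ in range(3)]
print([round(float(np.linalg.cond(A)), 6) for A in factors])   # [5.0, 5.0, 5.0]
```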

Using the latent factors, we generate the tensor $\underline{\mathbf{X}}^{\natural}$ via Eq. (2). We employ $\mathcal{A}(\underline{\mathbf{X}})=\boldsymbol{\Phi}\,\mathrm{vec}(\underline{\mathbf{X}})$, where the entries of $\boldsymbol{\Phi}$ are drawn from the normal distribution with zero mean and a fixed variance $\sigma^2$; this makes the entries of $\boldsymbol{\Phi}$ i.i.d. zero-mean $\sigma$-subgaussian. The observations are then generated using Eq. (3). To solve the tensor recovery problem in (4), we employ the Gauss-Newton based algorithm proposed in [21]. We stop the algorithm when the relative change in the objective function is less than the machine accuracy. For each $\tau$, we run 100 random trials, and a trial counts as a successful recovery if the mean squared error (MSE) between $\underline{\mathbf{X}}^{\natural}$ and the recovered tensor $\widehat{\underline{\mathbf{X}}}$ is lower than a preset threshold.
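The experiments use the Gauss-Newton algorithm of [21], which we do not reproduce here. As a simpler, swapped-in stand-in for a least-squares version of Problem (4), the sketch below runs alternating least squares (ALS) on $\min\|\boldsymbol{y}-\mathcal{A}(\underline{\mathbf{X}})\|_2^2$ over the factors of a third-order CP model, exploiting the fact that $\mathcal{A}(\underline{\mathbf{X}})$ is linear in each factor when the others are fixed. Sizes, iteration count, and the random initialization are illustrative assumptions; ALS may stall in local minima, so it is not a substitute for the method of [21]. The helper names are ours.

```python
import numpy as np

def design_matrix(factors, n):
    """Matrix D with X.ravel() == D @ factors[n].ravel() (C order), for N = 3."""
    A, B, C = factors
    I, J, K = (T.shape[0] for T in factors)
    if n == 0:
        D = np.einsum('ab,jf,kf->ajkbf', np.eye(I), B, C)
    elif n == 1:
        D = np.einsum('if,ab,kf->iakbf', A, np.eye(J), C)
    else:
        D = np.einsum('if,jf,ab->ijabf', A, B, np.eye(K))
    return D.reshape(I * J * K, -1)

def als_recover(y, Phi, dims, F, iters=100, seed=0):
    """Alternating least squares for min ||y - Phi vec(X)||^2 with rank_C(X) <= F."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((I, F)) for I in dims]
    history = []
    for _ in range(iters):
        for n in range(3):
            G = Phi @ design_matrix(factors, n)          # y is linear in factors[n]
            sol, *_ = np.linalg.lstsq(G, y, rcond=None)  # exact block minimizer
            factors[n] = sol.reshape(dims[n], F)
        X_hat = np.einsum('if,jf,kf->ijk', *factors)
        history.append(float(np.sum((y - Phi @ X_hat.ravel()) ** 2)))
    return X_hat, history

rng = np.random.default_rng(8)
dims, F, M = (4, 5, 6), 2, 60
X_true = np.einsum('if,jf,kf->ijk', *[rng.standard_normal((I, F)) for I in dims])
Phi = rng.standard_normal((M, X_true.size)) / np.sqrt(M)
y = Phi @ X_true.ravel()

X_hat, history = als_recover(y, Phi, dims, F)
rel_err = np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)
print(history[-1], rel_err)   # a trial "succeeds" if rel_err is below a chosen threshold
```

Because every block update is an exact least-squares solve, the objective recorded in `history` cannot increase from one sweep to the next (up to rounding).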

Fig. 1 shows the number of successful recoveries against the condition number $\tau$ for two different problem settings. Note that the tensor recovery problem in (4) is NP-hard [31], and thus numerical optimizers may not necessarily output optimal solutions. However, for the problem sizes considered here, the algorithm in [21] solves the problem very well according to our extensive simulations, so the numerical results can serve as reasonably reliable references. It can be observed that as the condition number increases, successful tensor recovery becomes harder to attain under both settings. This is consistent with Theorem 1, which indicates that a larger $\tau$ requires a larger $M$ for successful tensor recovery.

## 5 Conclusion

In this work, we considered the recoverability problem for compressed CP tensors. Unlike previous works, which tackled this problem by leveraging CPD uniqueness of the compressed tensors or assumptions on the latent factors' distribution, we offered a recoverability theory without making such assumptions. The derived result can potentially cover more cases in practice. The proof also offers insights into how the conditioning of the latent factors of a CPD model affects the recoverability of compressed tensors. We also presented experimental results supporting our theoretical claims.

## References

• [1] G. H. Golub and C. F. Van Loan, Matrix Computations. The Johns Hopkins University Press, 1996.
• [2] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, “A simple proof of the restricted isometry property for random matrices,” Constructive Approximation, vol. 28, no. 3, pp. 253–263, 2008.
• [3] E. J. Candes, “The restricted isometry property and its implications for compressed sensing,” Comptes rendus mathematique, vol. 346, no. 9-10, pp. 589–592, 2008.
• [4] S. Foucart and H. Rauhut, A mathematical introduction to compressive sensing. Birkhäuser Basel, 2013, vol. 1, no. 3.
• [5] B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization,” SIAM review, vol. 52, no. 3, pp. 471–501, 2010.
• [6] E. J. Candes and Y. Plan, “Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements,” IEEE Trans. Inf. Theory, vol. 57, no. 4, pp. 2342–2359, 2011.
• [7] X. Shen and Y. Wu, “A unified approach to salient object detection via low rank matrix recovery,” in Proc. CVPR 2012. IEEE, 2012, pp. 853–860.
• [8] H. Zhang, W. He, L. Zhang, H. Shen, and Q. Yuan, “Hyperspectral image restoration using low-rank matrix recovery,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 8, pp. 4729–4743, 2014.
• [9] B. Zhao, J. P. Haldar, C. Brinegar, and Z.-P. Liang, “Low rank matrix recovery for real-time cardiac MRI,” in Proc. 2010 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2010, pp. 996–999.
• [10] X. Fu, W.-K. Ma, T.-H. Chan, and J. M. Bioucas-Dias, “Self-dictionary sparse regression for hyperspectral unmixing: Greedy pursuit and pure pixel search are related,” IEEE J. Sel. Topics Signal Process., vol. 9, no. 6, pp. 1128–1141, 2015.
• [11] Y. Wang, D. Meng, and M. Yuan, “Sparse recovery: from vectors to tensors,” National Science Review, 2017.
• [12] H. Rauhut, R. Schneider, and Ž. Stojanac, “Low rank tensor recovery via iterative hard thresholding,” Linear Algebra and its Applications, vol. 523, pp. 220–262, 2017.
• [13] Y. Yang, Y. Feng, and J. A. K. Suykens, “Robust low-rank tensor recovery with regularized redescending M-estimator,” IEEE Trans. Neural Net. Learning Sys., vol. 27, no. 9, pp. 1933–1946, Sept 2016.
• [14] Y. Wang, J. Peng, Q. Zhao, Y. Leung, X.-L. Zhao, and D. Meng, “Hyperspectral image restoration via total variation regularized low-rank tensor decomposition,” IEEE J. Sel. Topics Appl. Earth Observ., vol. 11, no. 4, pp. 1227–1243, 2018.
• [15] S. Friedland, Q. Li, and D. Schonfeld, “Compressive sensing of sparse tensors.” IEEE Trans. Image Process., vol. 23, no. 10, pp. 4438–4447, 2014.
• [16] J. M. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, and J. Chanussot, “Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches,” IEEE J. Sel. Topics Appl. Earth Observ.
• [17] W.-K. Ma, J. Bioucas-Dias, T.-H. Chan, N. Gillis, P. Gader, A. Plaza, A. Ambikapathi, and C.-Y. Chi, “A signal processing perspective on hyperspectral unmixing,” IEEE Signal Process. Mag., vol. 31, no. 1, pp. 67–81, Jan 2014.
• [18] E. E. Papalexakis, C. Faloutsos, and N. D. Sidiropoulos, “Tensors for data mining and data fusion: Models, applications, and scalable algorithms,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, p. 16, 2017.
• [19] N. D. Sidiropoulos and A. Kyrillidis, “Multi-way compressed sensing for sparse low-rank tensors,” IEEE Signal Process. Lett., vol. 19, no. 11, pp. 757–760, 2012.
• [20] N. D. Sidiropoulos, E. E. Papalexakis, and C. Faloutsos, “Parallel randomly compressed cubes: A scalable distributed architecture for big tensor decomposition,” IEEE Signal Process. Mag., vol. 31, no. 5, pp. 57–70, 2014.
• [21] M. Boussé, N. Vervliet, I. Domanov, O. Debals, and L. De Lathauwer, “Linear systems with a canonical polyadic decomposition constrained solution: Algorithms and applications,” Numerical Linear Algebra with Applications, vol. 25, 2018.
• [22] T. G. Kolda and B. W. Bader, “Tensor decompositions and applications,” SIAM review, vol. 51, no. 3, pp. 455–500, 2009.
• [23] N. D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang, E. E. Papalexakis, and C. Faloutsos, “Tensor decomposition for signal processing and machine learning,” IEEE Trans. Signal Process., vol. 65, no. 13, pp. 3551–3582, 2017.
• [24] L. R. Tucker, “Some mathematical notes on three-mode factor analysis,” Psychometrika, vol. 31, no. 3, pp. 279–311, 1966.
• [25] I. V. Oseledets, “Tensor-train decomposition,” SIAM Journal on Scientific Computing, vol. 33, no. 5, pp. 2295–2317, 2011.
• [26] B. Yang, X. Fu, and N. D. Sidiropoulos, “Learning from hidden traits: Joint factor analysis and latent clustering,” IEEE Transactions on Signal Processing, vol. 65, no. 1, pp. 256–269, 2017.
• [27] X. Fu, K. Huang, W.-K. Ma, N. Sidiropoulos, and R. Bro, “Joint tensor factorization and outlying slab suppression with applications,” IEEE Trans. Signal Process., vol. 63, no. 23, pp. 6315–6328, 2015.
• [28] S. Dirksen, “Dimensionality reduction with subgaussian matrices: a unified theory,” Foundations of Computational Mathematics, vol. 16, no. 5, pp. 1367–1396, 2016.
• [29] M. J. Wainwright, High-Dimensional Statistics: A Non-Asymptotic Viewpoint, ser. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2019.
• [30] R. Vershynin, “Introduction to the non-asymptotic analysis of random matrices.” Cambridge University Press, 2012, pp. 210–268.
• [31] C. J. Hillar and L.-H. Lim, “Most tensor problems are NP-hard,” Journal of the ACM (JACM), vol. 60, no. 6, p. 45, 2013.

## Appendix A Proof of Proposition 1

To proceed, we first show the following lemma:

###### Lemma 2

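Suppose that $\|\mathbf{A}^{(n)}\|_2\le 1$ for $n=1,\ldots,N$. For any integer $L$ such that $1\le L\le N$ and any subset of $\{1,\ldots,N\}$ with $L$ elements, i.e., $\{k_1,\ldots,k_L\}$, consider a term $\boldsymbol{W}$ that is either $\boldsymbol{U}\odot\mathbf{A}^{(k_1)}\odot\cdots\odot\mathbf{A}^{(k_L)}$ or $\mathbf{A}^{(k_1)}\odot\cdots\odot\mathbf{A}^{(k_L)}\odot\boldsymbol{U}$, where $\boldsymbol{U}$ is any matrix with $F$ columns. In the former case, $\boldsymbol{U}$ appears in the leftmost position of the term; in the latter, $\boldsymbol{U}$ is in the rightmost position. Then, we have the following: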

$$\|\boldsymbol{W}\|_2\le\|\boldsymbol{U}\|_2.$$

Proof: We prove the lemma with $\boldsymbol{U}$ in the leftmost position. For the rightmost case, the proof is almost identical.

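Note that $\boldsymbol{U}\odot(\mathbf{A}^{(k_1)}\odot\cdots\odot\mathbf{A}^{(k_L)})=\big(\boldsymbol{U}\otimes(\mathbf{A}^{(k_1)}\odot\cdots\odot\mathbf{A}^{(k_L)})\big)\boldsymbol{P}$, where $\boldsymbol{P}\in\mathbb{R}^{F^2\times F}$ is a submatrix of the identity matrix that performs column selection. We have the following chain of inequalities: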

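$$
\begin{aligned}
\|\boldsymbol{U}\odot\mathbf{A}^{(k_1)}\odot\cdots\odot\mathbf{A}^{(k_L)}\|_2
&=\|(\boldsymbol{U}\otimes(\mathbf{A}^{(k_1)}\odot\cdots\odot\mathbf{A}^{(k_L)}))\boldsymbol{P}\|_2\\
&\le\|\boldsymbol{U}\|_2\,\|\mathbf{A}^{(k_1)}\odot\cdots\odot\mathbf{A}^{(k_L)}\|_2\,\|\boldsymbol{P}\|_2\\
&=\|\boldsymbol{U}\|_2\,\|(\mathbf{A}^{(k_1)}\otimes\cdots\otimes\mathbf{A}^{(k_L)})\boldsymbol{P}'\|_2\,\|\boldsymbol{P}\|_2\\
&\le\|\boldsymbol{U}\|_2\,\|\mathbf{A}^{(k_1)}\otimes\cdots\otimes\mathbf{A}^{(k_L)}\|_2\,\|\boldsymbol{P}'\|_2\,\|\boldsymbol{P}\|_2\\
&=\|\boldsymbol{U}\|_2\,\|\mathbf{A}^{(k_1)}\|_2\cdots\|\mathbf{A}^{(k_L)}\|_2\,\|\boldsymbol{P}'\|_2\,\|\boldsymbol{P}\|_2\le\|\boldsymbol{U}\|_2,
\end{aligned}
$$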

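where $\boldsymbol{P}'$ is also a proper column selection matrix, and we have used $\|\mathbf{A}^{(k_l)}\|_2\le 1$ and $\|\boldsymbol{P}\|_2=\|\boldsymbol{P}'\|_2=1$. The last equality holds because the spectral norm is multiplicative under the Kronecker product, i.e., $\|\boldsymbol{B}\otimes\boldsymbol{C}\|_2=\|\boldsymbol{B}\|_2\|\boldsymbol{C}\|_2$.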
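Lemma 2 is easy to sanity-check numerically: with every $\mathbf{A}^{(k_l)}$ scaled to unit spectral norm, appending them to $\boldsymbol{U}$ on either side via the Khatri-Rao product should not increase the spectral norm. A NumPy sketch with arbitrary sizes (the `khatri_rao` helper is ours):

```python
import numpy as np

def khatri_rao(mats):
    """Column-wise Kronecker product of matrices with a common number of columns."""
    F = mats[0].shape[1]
    out = mats[0]
    for A in mats[1:]:
        out = np.einsum('if,jf->ijf', out, A).reshape(-1, F)
    return out

rng = np.random.default_rng(9)
F = 4
U = rng.standard_normal((5, F))
others = [rng.standard_normal((I, F)) for I in (3, 6, 2)]
others = [A / np.linalg.norm(A, 2) for A in others]      # enforce ||A^(k)||_2 = 1

norm_U = np.linalg.norm(U, 2)
norm_left = np.linalg.norm(khatri_rao([U] + others), 2)   # U in the leftmost position
norm_right = np.linalg.norm(khatri_rao(others + [U]), 2)  # U in the rightmost position
print(norm_left <= norm_U, norm_right <= norm_U)          # True True
```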

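Consider a tensor $\underline{\mathbf{X}}\in\mathcal{S}_{F,\tau}$. This tensor can be represented as $\underline{\mathbf{X}}=[\![\lambda;\mathbf{A}^{(1)},\ldots,\mathbf{A}^{(N)}]\!]$, which is a shorthand for $\sum_{f=1}^{F}\lambda\,\mathbf{A}^{(1)}(:,f)\circ\cdots\circ\mathbf{A}^{(N)}(:,f)$, i.e., the expression in (2) with the scaling $\lambda$. Now consider another tensor $\overline{\underline{\mathbf{X}}}=[\![\overline{\lambda};\overline{\mathbf{A}}^{(1)},\ldots,\overline{\mathbf{A}}^{(N)}]\!]$ whose factors also have unit-bounded spectral norms. The Euclidean distance between the two tensors is bounded as follows: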

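$$
\begin{aligned}
\|\overline{\underline{\mathbf{X}}}-\underline{\mathbf{X}}\|_F
&=\left\|\sum_{f=1}^{F}\overline{\lambda}\left(\circ_{n=1}^{N}\overline{\mathbf{A}}^{(n)}(:,f)\right)-\sum_{f=1}^{F}\lambda\left(\circ_{n=1}^{N}\mathbf{A}^{(n)}(:,f)\right)\right\|_F\\
&\le\left\|\sum_{f=1}^{F}(\overline{\lambda}-\lambda)\left(\circ_{n=1}^{N}\overline{\mathbf{A}}^{(n)}(:,f)\right)\right\|_F+\underbrace{\left\|\sum_{f=1}^{F}\lambda\left(\circ_{n=1}^{N}\overline{\mathbf{A}}^{(n)}(:,f)\right)-\sum_{f=1}^{F}\lambda\left(\circ_{n=1}^{N}\mathbf{A}^{(n)}(:,f)\right)\right\|_F}_{Q_\lambda}\\
&\le\left\|\overline{\mathbf{A}}^{(N)}\odot\cdots\odot\overline{\mathbf{A}}^{(1)}\right\|_2|\overline{\lambda}-\lambda|\sqrt{F}+Q_\lambda\\
&\le|\overline{\lambda}-\lambda|\sqrt{F}+Q_\lambda, && \text{(8)}
\end{aligned}
$$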

where the last step of (8) follows from Lemma 2. Now consider

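$$
\begin{aligned}
Q_\lambda
&=\left\|\sum_{f=1}^{F}\lambda\left(\circ_{n=1}^{N}\overline{\mathbf{A}}^{(n)}(:,f)\right)-\sum_{f=1}^{F}\lambda\left(\circ_{n=1}^{N}\mathbf{A}^{(n)}(:,f)\right)\right\|_F\\
&\le\left\|\sum_{f=1}^{F}\lambda\left(\overline{\mathbf{A}}^{(1)}(:,f)-\mathbf{A}^{(1)}(:,f)\right)\circ\left(\circ_{n=2}^{N}\overline{\mathbf{A}}^{(n)}(:,f)\right)\right\|_F+\underbrace{\left\|\sum_{f=1}^{F}\lambda\,\mathbf{A}^{(1)}(:,f)\circ\left(\circ_{n=2}^{N}\overline{\mathbf{A}}^{(n)}(:,f)-\circ_{n=2}^{N}\mathbf{A}^{(n)}(:,f)\right)\right\|_F}_{Q_{\mathbf{A}^{(1)}}}\\
&\le\left\|\overline{\mathbf{A}}^{(N)}\odot\cdots\odot\overline{\mathbf{A}}^{(2)}\odot\left(\overline{\mathbf{A}}^{(1)}-\mathbf{A}^{(1)}\right)\right\|_2|\lambda|\sqrt{F}+Q_{\mathbf{A}^{(1)}}\\
&\le\|\overline{\mathbf{A}}^{(1)}-\mathbf{A}^{(1)}\|_2\,\tau+Q_{\mathbf{A}^{(1)}}, && \text{(9)}
\end{aligned}
$$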

where the last step of (9) is obtained by invoking Lemma 2 and using the fact $|\lambda|\le\tau/\sqrt{F}$ given by Eq. (7). In this way, we can obtain similar inequalities for all the $Q_{\mathbf{A}^{(n)}}$'s, and finally establish the following relationship:

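$$\|\underline{\mathbf{X}}-\overline{\underline{\mathbf{X}}}\|_F\le\sum_{n=1}^{N}\|\overline{\mathbf{A}}^{(n)}-\mathbf{A}^{(n)}\|_2\,\tau+|\overline{\lambda}-\lambda|\sqrt{F}.$$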
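The perturbation bound above can be checked on random instances. The sketch below assumes third-order tensors, factors normalized to unit spectral norm as in (7), and $\tau$ set to the smallest value compatible with $\lambda\le\tau/\sqrt{F}$; the perturbation sizes are arbitrary.

```python
import numpy as np

def unit_spectral(A):
    """Rescale A so that its spectral norm is 1, as required by (7)."""
    return A / np.linalg.norm(A, 2)

rng = np.random.default_rng(10)
dims, F = (4, 5, 6), 3
A = [unit_spectral(rng.standard_normal((I, F))) for I in dims]
A_bar = [unit_spectral(a + 0.05 * rng.standard_normal(a.shape)) for a in A]
lam, lam_bar = 0.70, 0.72
tau = lam * np.sqrt(F)                        # smallest tau with lam <= tau / sqrt(F)

X = lam * np.einsum('if,jf,kf->ijk', *A)
X_bar = lam_bar * np.einsum('if,jf,kf->ijk', *A_bar)

lhs = np.linalg.norm(X - X_bar)
rhs = (tau * sum(np.linalg.norm(ab - a, 2) for a, ab in zip(A, A_bar))
       + abs(lam_bar - lam) * np.sqrt(F))
print(lhs <= rhs)                             # True
```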

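Hence, to show that there exists an $\varepsilon$-net covering $\mathcal{S}_{F,\tau}$, we only need to show that there exists a set covering each $\mathbf{A}^{(n)}$ with width $\varepsilon/((N+1)\tau)$ and a set covering $\lambda$ with width $\varepsilon/((N+1)\sqrt{F})$; the right-hand side above is then at most $N\varepsilon/(N+1)+\varepsilon/(N+1)=\varepsilon$. Since $\lambda\sqrt{F}/\tau\in[0,1]$ and each $\mathbf{A}^{(n)}$ lives in the unit matrix 2-norm ball, it is well known that there exist nets covering them with cardinalities bounded by $3(N+1)\tau/\varepsilon$ and $\left(3(N+1)\tau/\varepsilon\right)^{I_nF}$, respectively [28, 2, 5]. Overall, the $\varepsilon$-net of $\mathcal{S}_{F,\tau}$ has at most $\left(3(N+1)\tau/\varepsilon\right)^{1+\sum_{n=1}^{N}I_nF}$ points, which proves Proposition 1.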
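The bookkeeping behind Proposition 1 can be verified with a few lines of arithmetic: the chosen widths make the right-hand side of the perturbation bound equal to $\varepsilon$, and multiplying the net sizes for $\lambda$ and the $N$ factors reproduces the exponent $1+\sum_n I_nF$. The sketch assumes the standard volumetric bound $(3/w)^{d}$ for covering a $d$-dimensional unit ball at width $w\le 1$; the function names and numeric values are ours.

```python
import math

def net_widths(N, F, tau, eps):
    """Covering widths for each A^(n) and for lambda used in the proof."""
    w_factor = eps / ((N + 1) * tau)
    w_lambda = eps / ((N + 1) * math.sqrt(F))
    return w_factor, w_lambda

def log_net_size(dims, F, tau, eps):
    """log |net|: lambda-net size times the product of the factor-net sizes."""
    N = len(dims)
    w_factor, w_lambda = net_widths(N, F, tau, eps)
    # lambda lies in [0, tau / sqrt(F)]; rescaled to [0, 1], the width becomes
    # w_lambda * sqrt(F) / tau, so at most 3 / (that width) points are needed.
    log_lambda = math.log(3 * tau / (w_lambda * math.sqrt(F)))
    # each A^(n) lies in the unit spectral-norm ball of R^{I_n x F}
    log_factors = sum(I * F * math.log(3 / w_factor) for I in dims)
    return log_lambda + log_factors

dims, F, tau, eps = (10, 12, 14), 3, 5.0, 0.1
N = len(dims)
w_factor, w_lambda = net_widths(N, F, tau, eps)
print(N * w_factor * tau + w_lambda * math.sqrt(F))             # equals eps up to rounding
print(log_net_size(dims, F, tau, eps),
      (1 + F * sum(dims)) * math.log(3 * (N + 1) * tau / eps))  # the two agree
```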

## Appendix B Proof of Proposition 2

Consider the following lemma:

###### Lemma 3 (Corollary 5.4 [28])

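Let $U_1,\ldots,U_k$ be subsets of a Hilbert space $H$ and let $U=\bigcup_{i=1}^{k}U_i$, with each set normalized with respect to the induced norm on $H$. Suppose that each normalized set has covering dimension $K$ with parameter $c$ and base covering $N_0$ with respect to the induced metric on $H$. Let $\boldsymbol{\Phi}$ be an $\alpha$-subgaussian map from $H$ to $\mathbb{R}^{M}$. Then, for some constant $C>0$ and any $\eta\in(0,1)$, the restricted isometry constant $\delta_U$ of $\boldsymbol{\Phi}$ on $U$ satisfies $\delta_U\le\delta$ with probability at least $1-\eta$, provided that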

$$M\ge C\alpha^2\delta^{-2}\max\left\{\log k+\log N_0+K\log c,\ \log(\eta^{-1})\right\}.$$

In our case, $k=1$. The set $\mathcal{S}_{F,\tau}$ belongs to the Hilbert space $\mathbb{R}^{I_1\times\cdots\times I_N}$, i.e., a Euclidean space with the Euclidean distance as the induced metric. If the covering number of a set in a Hilbert space at scale $\varepsilon$ is bounded by an expression of the form $N_0\,(c/\varepsilon)^{K}$ for any $\varepsilon\in(0,1]$, then $N_0$ is the base covering and $K$ is the covering dimension with parameter $c$ (Def. 5.1, [28]). Therefore, according to Proposition 1, we have $N_0=1$, $K=1+\sum_{n=1}^{N}I_nF$, and $c=3(N+1)\tau$. Applying these parameters in Lemma 3 yields Proposition 2.