1 Introduction
The field of compressive sensing [1, 2]
has led to a rich corpus of results showcasing that intrinsically low-dimensional objects living in a high-dimensional ambient space can be recovered from a small number of linear measurements. Complementary to the so-called sparse vector recovery problem is the low-rank matrix recovery problem. Motivated by applications in signal processing (see e.g.
[3, 4, 5, 6]) and data science (see e.g.
[7, 8]), the latter asks for a low-rank matrix to be recovered from a small number of linear measurements or observations. Formally, one considers a matrix $X \in \mathbb{R}^{n_1 \times n_2}$ of (nearly) rank $r$ along with a linear operator $\mathcal{A} : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^m$ with $m \ll n_1 n_2$, and the goal is to recover $X$ from the measurements $y = \mathcal{A}(X)$. A common approach is to consider the relaxation of the NP-hard rank minimization [9], leading to the so-called nuclear-norm minimization problem [8, 10],
(1.1) $\hat{X} = \operatorname{argmin}_{Z} \|Z\|_* \quad \text{subject to} \quad \mathcal{A}(Z) = y,$
where the nuclear norm $\|Z\|_*$ is the $\ell_1$ norm of the singular values: $\|Z\|_* = \sum_i \sigma_i(Z)$. It has been shown that (1.1) yields exact recovery of $X$ (or approximate recovery when the measurements contain noise) when the measurement operator $\mathcal{A}$ obeys assumptions such as incoherence or restricted isometry, or when it is constructed from certain random models [11, 8, 10]. Examples include operators $\mathcal{A}$ constructed by taking matrix inner products with matrices containing i.i.d. (sub-)Gaussian entries, or operators that view entries of the matrix selected uniformly at random. In these and most other cases, the number of measurements required is on the order of $r(n_1 + n_2)$, up to logarithmic factors.

As in vector sparse recovery, an alternative to optimization-based programs like (1.1) is to use iterative methods that produce estimates converging to the solution $X$. Relevant to this paper is the Iterative Hard Thresholding (IHT) method [12, 13, 14], which can be described succinctly by the update
(1.2) $x^{j+1} = H_s\big(x^j + \mathcal{A}^*(y - \mathcal{A}(x^j))\big),$
where $x^0$ is chosen either as the zero vector/matrix or randomly. Here, $\mathcal{A}^*$ denotes the adjoint of the operator $\mathcal{A}$, and the function $H_s$ is a thresholding operator. In the vector sparse recovery setting, $H_s$ simply keeps the $s$ largest-in-magnitude entries of its input and sets the rest to zero, thereby returning the closest $s$-sparse vector to its input. In the matrix case, it returns the closest rank-$r$ matrix to its input. To guarantee recovery via the IHT method, one may consider the restricted isometry property (RIP) [15], which asks that the operator roughly preserve the geometry of sparse vectors (or low-rank matrices):
$(1-\delta)\|x\|_F^2 \le \|\mathcal{A}(x)\|_2^2 \le (1+\delta)\|x\|_F^2,$
where $\delta \in (0,1)$ is a controlled constant that may depend on the sparsity or rank, and $\|\cdot\|_F$ denotes the Frobenius norm. For example, when the operator $\mathcal{A}$ satisfies the RIP for sparse vectors with a sufficiently small constant $\delta$, then after a suitable number of iterations, IHT exactly recovers any $s$-sparse vector $x$ from the measurements $y = \mathcal{A}(x)$. Moreover, the result is robust: when the measurements are corrupted by noise and the vector is not exactly sparse but well approximated by its best $s$-sparse representation $x_s$, IHT still produces an accurate estimate of $x$, with error proportional to the noise level and $\|x - x_s\|$. See [13] for details.
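To make the update (1.2) concrete, here is a minimal NumPy sketch of IHT for sparse vector recovery. This is an illustration only, not the implementation used in the paper; the dimensions, sparsity level, and iteration count are arbitrary assumptions.

```python
import numpy as np

def hard_threshold(x, s):
    """H_s: keep the s largest-in-magnitude entries of x, zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-s:]
    out[idx] = x[idx]
    return out

def iht(A, y, s, n_iters=300):
    """IHT update x <- H_s(x + A^T (y - A x)), initialized at zero."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        x = hard_threshold(x + A.T @ (y - A @ x), s)
    return x

rng = np.random.default_rng(0)
n, m, s = 100, 80, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)   # normalized Gaussian: RIP w.h.p.
x_true = np.zeros(n)
x_true[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
y = A @ x_true                                  # noiseless measurements

x_hat = iht(A, y, s)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))  # relative error
```

Each iterate is $s$-sparse by construction, and for Gaussian $A$ with enough rows the relative error is driven to machine precision in the noiseless case.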
1.1 Extension to tensor recovery
Extending IHT from sparse vector recovery to low-rank matrix recovery is somewhat natural. Indeed, a matrix is low-rank if and only if its vector of singular values is sparse. Extensions to the tensor setting, however, yield some nontrivial challenges. Nonetheless, many applications motivate the use of tensors, ranging from video and longitudinal imaging [16, 17] to machine learning [18, 19] and differential equations [20, 21].

We will write an order-$d$ tensor as $\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$, where $n_i$ denotes the dimension in the $i$th mode. Unlike the matrix case, for order-$d$ tensors there is not one unique notion of rank. In fact, many notions of rank, along with their corresponding decompositions, have now been defined and studied, including the Tucker rank and higher-order SVD [22, 23], the CP-rank [24, 25], and the tubal rank [26, 27]. We refer the reader to [28] for a nice review of tensors and these various concepts. Succinctly, the Tucker format relies on unfoldings of the tensor, whereas the CP format relies on rank-1 tensor factors. For example, an order-$d$ tensor can be matricized in $d$ ways by unfolding it along each of the $d$ modes [28]. One can then consider a notion of rank for this tensor as a tuple $(r_1, \ldots, r_d)$, where $r_i$ is the rank of the $i$th unfolding. Such a notion is attractive since rank is well defined for matrices. The CP format, on the other hand, avoids the need to matricize or unfold the tensor. For the purposes of this work, we will focus on CP-decompositions and the CP-rank.
For vectors $x \in \mathbb{R}^{n_1}$ and $y \in \mathbb{R}^{n_2}$, denote by $x \circ y \in \mathbb{R}^{n_1 \times n_2}$ their outer product, and for any integer $n$, let $[n] = \{1, \ldots, n\}$. One can build a tensor in $\mathbb{R}^{n_1 \times \cdots \times n_d}$ by taking the outer product $x^{(1)} \circ \cdots \circ x^{(d)}$, where $x^{(k)} \in \mathbb{R}^{n_k}$; the result is a rank-1 tensor. This leads to the notion of a rank-$r$ tensor: considering vectors $x_i^{(k)} \in \mathbb{R}^{n_k}$ for $i \in [r]$ and $k \in [d]$, one takes the sum of $r$ rank-1 tensors:
$\mathcal{X} = \sum_{i=1}^r x_i^{(1)} \circ \cdots \circ x_i^{(d)}.$
When $\mathcal{X}$ can be written via this decomposition, $\mathcal{X}$ is at most rank $r$. The smallest number of rank-1 tensors that can be used to express a tensor is then defined to be the rank of the tensor, and a decomposition using that number of rank-1 tensors is its rank decomposition. Note that one may also ask that the vectors $x_i^{(k)}$ have unit norm and aggregate the magnitude information into constants $\lambda_i$, so that
$\mathcal{X} = \sum_{i=1}^r \lambda_i \, x_i^{(1)} \circ \cdots \circ x_i^{(d)}.$
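As a small illustration (a standalone NumPy sketch with arbitrary dimensions, not code from the paper), a CP tensor can be assembled from its factor vectors by summing outer products; every unfolding of the result then has matrix rank at most $r$:

```python
import numpy as np

def cp_tensor(factors, weights=None):
    """Build sum_i lambda_i x_i^(1) o ... o x_i^(d) from factor matrices.

    factors[k] is an (n_k, r) matrix whose i-th column is the vector x_i^(k+1).
    """
    r = factors[0].shape[1]
    if weights is None:
        weights = np.ones(r)
    X = np.zeros(tuple(f.shape[0] for f in factors))
    for i in range(r):
        rank1 = weights[i] * factors[0][:, i]
        for f in factors[1:]:
            rank1 = np.multiply.outer(rank1, f[:, i])  # successive outer products
        X += rank1
    return X

rng = np.random.default_rng(0)
r, dims = 2, (4, 5, 6)
X = cp_tensor([rng.standard_normal((n, r)) for n in dims])
print(np.linalg.matrix_rank(X.reshape(dims[0], -1)))  # rank of mode-1 unfolding → 2
```

Generic random factors give each unfolding rank exactly $r$, consistent with the Tucker-style rank tuple discussed above.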
Note that there are many differences between matrix rank and tensor rank. For example, the rank of a real-valued tensor may differ when considered over $\mathbb{R}$ versus $\mathbb{C}$ (i.e., if one allows the factor vectors above to be complex-valued rather than restricted to the reals). Throughout this paper, we will consider real-valued tensors in $\mathbb{R}^{n_1 \times \cdots \times n_d}$, but our analysis extends to the complex case as well. The CP-rank and CP-decomposition can be viewed as natural extensions of the matrix rank and SVD, and are well motivated by applications such as topic modeling, psychometrics, signal processing, linguistics, and many others [24, 25, 29].
Unfortunately, not only is rank minimization of tensors NP-hard, but even its relaxation to nuclear-norm minimization, using any of these notions of rank, is also NP-hard [30]. Therefore, it is even more crucial to consider other types of methods for tensor recovery. Fortunately, many iterative methods have natural extensions to the tensor setting.
Here, we will focus on the extension of the IHT method (1.2) to the tensor setting, as put forth in [31]. The authors prove accuracy of the tensor variant under a tensor RIP assumption. Likewise, we will consider measurements of the form $y = \mathcal{A}(\mathcal{X}) + e$, where $\mathcal{A} : \mathbb{R}^{n_1 \times \cdots \times n_d} \to \mathbb{R}^m$ is a linear operator. The tensor IHT method (TIHT) of [31] is summarized in Algorithm 1.
Note that the algorithm depends on the “thresholding” operator as input. (We note that this operator need not be a true “thresholding” operator; we use this nomenclature since the method is derived from the classical iterative hard thresholding method.) In the vector case this operator performs simple thresholding, hence the name of the algorithm. In the matrix case it typically performs a low-rank projection via the SVD. In the tensor case, there are several options for this operator. In [31], the authors ask that the operator compute an approximation to the closest rank-$r$ tensor to its input. Such an approximation is necessary since computing the best rank-$r$ tensor may be ill-defined or NP-hard, depending on the notion of rank used. For order-$d$ tensors, the approximation in [31] is asked to satisfy a bound of the form
$\|\mathcal{Z} - H_{r}(\mathcal{Z})\|_F \le C(d)\, \|\mathcal{Z} - \mathcal{Z}_{r}\|_F,$
where $\mathcal{Z}_{r}$ is the best rank-$r$ approximation to $\mathcal{Z}$ under the various notions of the Tucker rank. We will adapt this operator to the CP-rank and propose a valid approximation for our purposes later. As in [31], the recovery of low CP-rank tensors by TIHT will rely on a tensor variant of the RIP, defined below in Definition 2.1.
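For comparison, in the matrix case the thresholding operator is exactly computable: by the Eckart–Young theorem, truncating the SVD gives the closest rank-$r$ matrix in Frobenius norm. A short NumPy sketch with illustrative sizes:

```python
import numpy as np

def svd_threshold(Y, r):
    """Return the closest rank-r matrix to Y (Eckart-Young) via truncated SVD."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

rng = np.random.default_rng(0)
Y = rng.standard_normal((30, 20))
Z = svd_threshold(Y, 5)

s = np.linalg.svd(Y, compute_uv=False)
# The approximation error equals the norm of the discarded singular values
print(np.isclose(np.linalg.norm(Y - Z), np.sqrt((s[5:] ** 2).sum())))  # → True
```

No such exact, efficient projection is available for tensor rank, which is why an approximation operator is required in the tensor setting.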
We will utilize the TRIP for an appropriate set corresponding to normalized tensors with low CP-rank. In [31], the TRIP is utilized for several notions of Tucker-type rank, namely for tensors with low-rank higher-order SVD (HOSVD) decompositions, hierarchical Tucker (HT) decompositions, and tensor train (TT) decompositions. The authors then prove that various randomly constructed measurement maps satisfy those variations of the TRIP with high probability. Indeed, letting the rank $r$ bound the rank entries of the appropriate Tucker tuple (see [31] for details), the TRIP is satisfied when the number of measurements is on the order of $r^d + dnr$ for HOSVD decompositions and $(d-1)r^3 + dnr$ for TT/HT decompositions, up to logarithmic factors, where $n = \max_i n_i$. Such random constructions include those obtained by taking tensor inner products of $\mathcal{X}$ with tensors having i.i.d. (sub-)Gaussian entries. Under the TRIP assumption, the main result of [31] shows that TIHT provides recovery of low Tucker-rank tensors, as summarized by the following theorem.

Theorem 1.1 ([31]).
Let $\mathcal{A}$ satisfy the TRIP (for rank-$r$ HOSVD, TT, or HT tensors) with a sufficiently small parameter $\delta$, and run TIHT with noisy measurements $y = \mathcal{A}(\mathcal{X}) + e$. Assume that the following holds at each iteration $j$ of TIHT:
(1.3) $\|\mathcal{X}^{j+1} - \mathcal{Y}^j\|_F \le (1+\varepsilon)\, \|\mathcal{X} - \mathcal{Y}^j\|_F,$
where $\mathcal{Y}^j$ denotes the pre-thresholded iterate. Then the estimates produced by TIHT satisfy:
$\|\mathcal{X}^{j+1} - \mathcal{X}\|_F \le \kappa\, \|\mathcal{X}^{j} - \mathcal{X}\|_F + C\, \|e\|_2,$
where $\kappa < 1$ and $C$ denote constants that may depend on $\delta$ and $\varepsilon$.
As a consequence, after roughly $\log_{1/\kappa}\big(\|\mathcal{X}^0 - \mathcal{X}\|_F / \|e\|_2\big)$ iterations, the estimate satisfies
$\|\mathcal{X}^{j} - \mathcal{X}\|_F \le C'\, \|e\|_2,$
where $C'$ denotes a constant that may depend on $\delta$ and $\varepsilon$.
As the authors themselves point out, the difficulty with this result is that assumption (1.3) may be hard to verify in practice.
1.2 Contributions and Organization
The main contribution of this work is the extension and analysis of TIHT to low CP-rank tensors. Using recent work on low CP-rank tensor approximation, we provide theoretical guarantees for the recovery of low CP-rank tensors without requiring assumptions on the hard thresholding operation. We also show that tensor measurement maps with properly normalized Gaussian random variables satisfy a CP-rank version of the tensor RIP (TRIP) with high probability. These contributions are supported by experiments on synthetically generated data as well as real-world video data.
The remainder of the paper is organized as follows. Section 2 contains our main results. Theorem 2.4 establishes accurate recovery by TIHT for tensors with low CP-rank, under an appropriate TRIP assumption, without the need to verify an assumption like (1.3). In Section 2.1, we prove Theorem 2.5, showing that measurement maps satisfying our TRIP assumption can be obtained by random constructions. Section 3 showcases numerical results for real and synthetic tensors, and we conclude in Section 4.
2 Main Results for TIHT for CP-rank
First, let us formally define the set of tensors for which we will prove accurate recovery using TIHT. Given some $\eta > 0$, define the set of tensors
(2.1) $\mathcal{S}_{r,\eta} = \Big\{ \mathcal{X} = \sum_{i=1}^r x_i^{(1)} \circ \cdots \circ x_i^{(d)} \;:\; \|x_i^{(k)}\|_2 \le \eta \ \text{for all } i \in [r],\ k \in [d] \Big\}.$
In other words, $\mathcal{S}_{r,\eta}$ is the set of all tensors of CP-rank at most $r$ with bounded factors. Such tensors are not unusual and have been used to provide theoretical guarantees in previous works [32, 33].
We first define an analog of the tensor RIP (TRIP) for low CP-rank tensors.
Definition 2.1.
The measurement operator $\mathcal{A}$ satisfies the TRIP adapted to the set (2.1) with parameter $\delta$ when
$(1-\delta)\|\mathcal{X}\|_F^2 \le \|\mathcal{A}(\mathcal{X})\|_2^2 \le (1+\delta)\|\mathcal{X}\|_F^2$
for all $\mathcal{X}$ in the set defined in (2.1).
We will utilize the method and result from [32], which guarantees the following.
Theorem 2.2 ([32], Theorem 1.2).
Let $\mathcal{T}$ be an arbitrary order-$d$ tensor, let $\varepsilon > 0$, and let $r$ be a positive integer. Suppose there is a rank-$r$ tensor $\mathcal{X}$ close to $\mathcal{T}$ in Frobenius norm and whose CP factors have norms bounded by $\eta$. Then there is an efficient algorithm that outputs a tensor estimate $\widehat{\mathcal{T}}$ of slightly higher rank $r' \ge r$ such that
$\|\mathcal{T} - \widehat{\mathcal{T}}\|_F \le (1+\varepsilon)\, \|\mathcal{T} - \mathcal{X}\|_F.$
We will write this method as $\widehat{\mathcal{T}} = H_{r,\varepsilon}(\mathcal{T})$.
Our variant of the TIHT method will utilize this result. It can be summarized by the update steps, initialized with $\mathcal{X}^0 = 0$:
(2.2) $\mathcal{Y}^j = \mathcal{X}^j + \mathcal{A}^*\big(y - \mathcal{A}(\mathcal{X}^j)\big),$
(2.3) $\mathcal{X}^{j+1} = H_{r,\varepsilon}(\mathcal{Y}^j).$
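The following NumPy sketch illustrates the updates (2.2)–(2.3) for order-3 tensors. A crude alternating-least-squares (ALS) CP approximation stands in for the operator guaranteed by Theorem 2.2; all names, sizes, and the sweep/iteration counts are illustrative assumptions, and the ALS step carries no accuracy guarantee.

```python
import numpy as np

def cp_als_approx(T, r, sweeps=15, seed=0):
    """Crude rank-r CP approximation by alternating least squares.

    A stand-in for the approximation operator of Theorem 2.2.
    """
    rng = np.random.default_rng(seed)
    dims = T.shape
    A = [rng.standard_normal((n, r)) for n in dims]
    for _ in range(sweeps):
        for k in range(3):
            o1, o2 = [A[j] for j in range(3) if j != k]
            # Khatri-Rao product of the other two factor matrices
            kr = np.einsum('ir,jr->ijr', o1, o2).reshape(-1, r)
            Tk = np.moveaxis(T, k, 0).reshape(dims[k], -1)  # mode-k unfolding
            A[k] = Tk @ kr @ np.linalg.pinv(kr.T @ kr)      # least-squares update
    return np.einsum('ir,jr,kr->ijk', A[0], A[1], A[2])

def tiht(A_op, A_adj, y, r, shape, n_iters=30):
    """TIHT: gradient step (2.2) followed by an approximate CP projection (2.3)."""
    X = np.zeros(shape)
    for _ in range(n_iters):
        Y = X + A_adj(y - A_op(X))   # (2.2)
        X = cp_als_approx(Y, r)      # (2.3)
    return X
```

Here `A_op` and `A_adj` are the measurement map and its adjoint, e.g. a Gaussian matrix applied to the vectorized tensor as in Section 3.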
In our context, Theorem 2.2 means the following.
Corollary 2.3.
Let $\varepsilon > 0$, and let $\mathcal{X}$ be an arbitrary CP-rank-$r$ tensor with bounded factors; in particular, let $\mathcal{X} \in \mathcal{S}_{r,\eta}$. Assume $\mathcal{A}$ is a measurement operator satisfying the TRIP with parameter $\delta$.
Assume measurements $y = \mathcal{A}(\mathcal{X}) + e$ with bounded noise $e$. Using the notation in (2.2) and (2.3), we have
$\|\mathcal{Y}^j - \mathcal{X}^{j+1}\|_F \le (1+\varepsilon)\, \|\mathcal{Y}^j - \mathcal{X}\|_F,$
and $\mathcal{X}^{j+1}$ is of CP-rank $r'$ with bounded factors.
In addition, under the corresponding condition on the noise, the analogous bound holds, and $\mathcal{X}^{j+1}$ is again of CP-rank $r'$ with bounded factors.
Proof.
To apply Theorem 2.2, we verify that there is a rank-$r$ tensor with bounded factors close to $\mathcal{Y}^j$. Our choice for this tensor is precisely $\mathcal{X}$ itself. Indeed, we have
Thus, , so by Theorem 2.2, the output satisfies
To prove the second part, we proceed in the same way. Namely, we have
Thus, , so by Theorem 2.2, the output satisfies
∎
Our goal is to prove that the TIHT variant described in (2.2)–(2.3) provides accurate recovery of tensors in the set (2.1), provided that the measurement operator satisfies the CP-rank analog of the TRIP.
We now proceed with our main theorems.
Theorem 2.4 (TIHT with bounded low CP-rank).
Let $\mathcal{X} \in \mathcal{S}_{r,\eta}$. Consider the TIHT method described in (2.2)–(2.3), assume $\mathcal{A}$ satisfies the TRIP with parameter $\delta$ as in Definition 2.1, and run TIHT with noisy measurements $y = \mathcal{A}(\mathcal{X}) + e$, where the noise $e$ is bounded. Then TIHT produces iterates that satisfy
(2.4) $\|\mathcal{X}^{j+1} - \mathcal{X}\|_F \le \kappa^{\,j+1}\, \|\mathcal{X}^0 - \mathcal{X}\|_F + C\, \|e\|_2,$
where $\kappa < 1$ and $C$ are constants that may depend on $\delta$ and $\varepsilon$.
As a consequence, recovery error on the order of the upper bound of the noise, $\|e\|_2$, is achieved after roughly $\log_{1/\kappa}\big(\|\mathcal{X}^0 - \mathcal{X}\|_F / \|e\|_2\big)$ iterations.
Proof.
The proof follows that of Theorem 1 in [31] with some crucial modifications. In particular, instead of requiring assumption (31) in the proof of the aforementioned theorem, we have Corollary 2.3. As a direct result, we also obtain a convenient upper bound on the corresponding term, whereas Theorem 1 in [31] requires additional computation to upper bound it.
Starting from Corollary 2.3, we have that
Adding and subtracting the appropriate term on the right-hand side, rearranging terms, and substituting the value of $\mathcal{Y}^j$ from (2.2), we can write:
Using the fact that the relevant tensors have low CP-rank, we invoke the TRIP and the Cauchy–Schwarz inequality on the third term, and the upper bound shown in Corollary 2.3 on the fourth term in the summation, to obtain
Let $P$ be the orthogonal projection operator onto the subspace spanned by the tensors above, and adopt the corresponding shorthand for all $j$. Using this notation, we can rewrite the above inequality as:
where the final inequality combines the first two terms and applies the Cauchy–Schwarz inequality.
Introducing shorthand for the resulting quantities, we can write:
Dividing the first two inequalities by the common factor, taking the square root of both sides of the last inequality, and summing all three inequalities results in:
(2.5) 
Noting that the relevant quantity is nonnegative for the stated parameter range, we can drop the corresponding term from the preceding inequalities. With the bound from the proof of [31, Theorem 1], we obtain the inequality:
Finally, iterating the upper bound leads to the desired result:
∎
Note that the TIHT method and main theorem have been modified in two crucial ways. First, the “thresholding” operator has been replaced by the output guaranteed by Theorem 2.2. This allows us to obtain an efficient thresholding step at the price of outputting a tensor of slightly higher rank. This higher-rank output in turn requires a stricter assumption on the operator $\mathcal{A}$, namely that the TRIP be satisfied at that higher rank. However, we typically assume the rank is small and bounded, so the increase is not severe. That leads to the second modification: the TRIP is defined for low CP-rank tensors rather than for other notions of low-rankness. Thus, what remains to be proved is for which types of measurement operators this TRIP holds.
2.1 Measurement maps satisfying our TRIP
The following theorem shows that Gaussian measurement maps satisfy the desired TRIP with high probability.
Theorem 2.5.
Let $\mathcal{A}$ be represented by a tensor whose entries are properly normalized, i.i.d. Gaussian random variables. (This can easily be extended to mean-zero subgaussian random variables, as in [31], where the constant in (2.6) depends on the subgaussian parameter.) Then $\mathcal{A}$ satisfies the TRIP as in Definition 2.1 with parameter $\delta$ as long as
(2.6) 
Proof.
We proceed again as in [31], with the main change being the construction of the covering of the set (2.1). To that end, we first obtain a bound on the covering number of this set, defined to be the minimal cardinality of an $\varepsilon$-net, i.e., a set $N$ such that for any point $x$ in the set, there is a point $z \in N$ with $\|x - z\| \le \varepsilon$. See e.g. [34] for more details on covering numbers.
Lemma 2.6.
For any $\varepsilon > 0$, the covering number of the set (2.1) satisfies
Proof.
For simplicity, we will prove the result for $d = 3$ and $n_1 = n_2 = n_3 = n$; the general case follows similarly. Denote by $B$ the unit ball in $\mathbb{R}^n$. First, we obtain a net of the ball of radius $\eta$, with net radius $\bar\varepsilon$ to be chosen later. Classical results [35] utilizing volumetric estimates show that such a net exists with cardinality at most $(1 + 2\eta/\bar\varepsilon)^n$. Create the set of tensors
and note that its cardinality is at most
(2.7) 
Now let be given. Thus, can be written as for . For each and , choose so that , and note that . First, we want to bound a single term . For notational convenience, let us drop the subscript , and write to denote the th entry of the vector . In addition, let us define the gradually modified tensors , , , and . Our goal is thus to bound . To that end, we have
Similarly, we bound and . Thus,
Finally,
Thus, the set constructed above is a net for the set (2.1). Since our goal is to obtain an $\varepsilon$-net, we choose $\bar\varepsilon$ accordingly. By (2.7), this yields a covering number of
Note that in the case of arbitrary order tensors, the same proof yields
∎
With Lemma 2.6 in hand, we may now use classical concentration inequalities (see e.g. the proof of Theorem 2 of [31] for a complete argument in the tensor case), which show that the number of measurements needed for a sub-Gaussian operator to satisfy the RIP with parameter $\delta$ over a set scales like the logarithm of its covering number. This completes our proof. ∎
Since Corollary 2.3 and Theorem 2.4 require the TRIP with parameter $\delta$ at the higher rank $r'$, we will apply Theorem 2.5 with $r$ replaced by $r'$. Theorem 2.5 then immediately yields the desired guarantee. Note there is of course a tradeoff in choosing $\varepsilon$ small or large: a larger $\varepsilon$ allows for larger noise tolerance and a weaker TRIP requirement, but incurs additional runtime cost in utilizing Corollary 2.3.
3 Numerical Results
Here we present synthetic and real experiments that showcase the performance of TIHT when applied to low CP-rank tensors. All experiments are done on order-3 tensors, so $d = 3$. We utilize Gaussian measurements as motivated by Theorem 2.5. To be precise, for a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and a specified number of measurements $m$, we construct a matrix $\Phi \in \mathbb{R}^{m \times n_1 n_2 n_3}$ whose entries are i.i.d. Gaussian with mean zero and variance $1/m$. We then compute the measurements by applying the matrix $\Phi$ to the vectorized tensor $\mathrm{vec}(\mathcal{X})$. Note there are of course other natural ways of applying such Gaussian operators, and we do not seek optimality in terms of computation here. In addition, the “thresholding” step of (2.3) is performed using the cpd function from the Tensorlab package [36], which we note may be implemented differently than the method guaranteed by Theorem 2.2. We measure the relative recovery error as $\|\hat{\mathcal{X}} - \mathcal{X}\|_F / \|\mathcal{X}\|_F$ for a given tensor $\mathcal{X}$ and its estimate $\hat{\mathcal{X}}$. We refer to the measurement rate as the percentage of the total number of pixels; a rate of $p$ corresponds to $m = p \cdot n_1 n_2 n_3$ measurements. The rank reported is the rank $r$ used in the TIHT method (2.3). We present recovery results for this model here.

3.1 Synthetic data
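A minimal NumPy sketch of this measurement model (the tensor sizes and rate are illustrative assumptions; the variance normalization and relative-error metric follow the description above):

```python
import numpy as np

rng = np.random.default_rng(3)
shape = (8, 8, 8)                 # order-3 tensor
N = int(np.prod(shape))           # total number of entries
rate = 0.4                        # measurement rate (fraction of entries)
m = int(rate * N)

# Measurement matrix with i.i.d. N(0, 1/m) entries, applied to vec(X)
Phi = rng.standard_normal((m, N)) / np.sqrt(m)

def measure(X):
    """Apply the measurement operator: y = Phi @ vec(X)."""
    return Phi @ X.ravel()

def adjoint(z):
    """Apply the adjoint operator, reshaping back into a tensor."""
    return (Phi.T @ z).reshape(shape)

def rel_error(X_hat, X):
    """Relative recovery error in Frobenius norm."""
    return np.linalg.norm(X_hat - X) / np.linalg.norm(X)

X = rng.standard_normal(shape)
z = rng.standard_normal(m)
# Sanity check of the adjoint pair: <A(X), z> = <X, A*(z)>
print(np.isclose(measure(X) @ z, (X * adjoint(z)).sum()))  # → True
```

The pair `measure`/`adjoint` is exactly what the TIHT gradient step (2.2) consumes.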
For our first set of experiments, we create synthetic low-rank tensors and test the performance of TIHT while varying the tensor rank $r$, the number of measurements $m$, and the noise level. All tensors are created with i.i.d. standard normal entries, and the noise is also Gaussian, normalized as described. Figure 1 displays results showcasing the effect of varying the number of measurements, as well as of running TIHT on tensors of varying ranks. The left and center plots of this figure show that, as expected, smaller numbers of measurements lead to slower convergence rates in both the noiseless and noisy cases. In the noisy case (center), we also see different levels of the so-called “convergence horizon.” We see the same effect in the right plot, which shows that lower-rank tensors exhibit faster convergence, again as expected. We repeat these experiments in the tensor completion setting, where instead of taking Gaussian measurements we simply observe randomly chosen entries of the tensor. The results are displayed in Figure 2, where we observe similar behavior.
3.2 Real data
There is an abundance of real applications that involve tensors. Here, we consider only two, chosen because they present well and serve the sole purpose of verifying our theoretical findings. Namely, we consider tensors arising from color images and from grayscale video. For color (RGB) images, we view the image as a tensor in $\mathbb{R}^{n_1 \times n_2 \times 3}$, where the third mode represents the three color channels. Similarly, we use a tensor representation in $\mathbb{R}^{n_1 \times n_2 \times n_3}$ for a video with $n_3$ frames, each of size $n_1 \times n_2$. These experiments are meant to showcase that TIHT can indeed be reasonably applied to such real applications.
Figure 3 shows the relative recovery error as a function of TIHT iterations for the RGB images shown, which are already low-rank (by construction). Figure 4 shows the same result for a real image which is not made to be low-rank; we compute its approximate relative distance to its nearest rank-15 tensor using the Tensorlab cpd function. Note that we use small image sizes for the sake of computation (the image is actually a patch out of a larger image), which causes the lack of sharpness in the visualization of the original images as displayed. We see in Figure 4 that the relative error reaches approximately the optimal value, namely the relative distance to the low-rank representation; thus, this error can be viewed as reaching the noise floor. Figure 5 shows the same information for the videos whose frames are displayed, and we see again that the error decays until approximately the noise floor. Note that for computational reasons we consider only small image sizes, since the measurement maps themselves grow large quickly. Of course, other types of maps may mitigate this issue, although that is not the focus of this paper.
4 Conclusion
This work presents an iterative hard thresholding approach to low CP-rank tensor recovery from linear measurements. We show that the proposed algorithm converges to a low-rank tensor when the linear operator satisfies an RIP-type condition for low-rank tensors (the TRIP). In addition, we prove that Gaussian measurement maps satisfy the TRIP condition for low CP-rank signals. Our numerical experiments not only verify our theoretical findings but also highlight the potential of the proposed method. Future directions for this work include extensions to stochastic iterative hard thresholding [37] and utilizing different types of tensor measurement maps [38].
Acknowledgements
This material is based upon work supported by the National Security Agency under Grant No. H982301910119, The Lyda Hill Foundation, The McGovern Foundation, and Microsoft Research, while the authors were in residence at the Mathematical Sciences Research Institute in Berkeley, California, during the summer of 2019. In addition, Needell was funded by NSF CAREER DMS and NSF BIGDATA DMS . Li was supported by the NSF grants CCF1409258, CCF1704204, and the DARPA Lagrange Program under ONR/SPAWAR contract N660011824020. Qin is supported by the NSF grant DMS1941197.
References
 [1] S. Foucart and H. Rauhut, “A mathematical introduction to compressive sensing,” Applied and Numerical Harmonic Analysis, 2013.
 [2] Y. C. Eldar and G. Kutyniok, Compressed sensing: theory and applications. Cambridge University Press, 2012.
 [3] A. Ahmed and J. Romberg, “Compressive multiplexing of correlated signals,” IEEE Transactions on Information Theory, vol. 61, no. 1, pp. 479–498, 2014.
 [4] H. Zhang, W. He, L. Zhang, H. Shen, and Q. Yuan, “Hyperspectral image restoration using low-rank matrix recovery,” IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 8, pp. 4729–4743, 2013.
 [5] D. Gross, Y.K. Liu, S. T. Flammia, S. Becker, and J. Eisert, “Quantum state tomography via compressed sensing,” Physical review letters, vol. 105, no. 15, p. 150401, 2010.
 [6] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?,” Journal of the ACM (JACM), vol. 58, no. 3, p. 11, 2011.
 [7] R. Basri and D. W. Jacobs, “Lambertian reflectance and linear subspaces,” IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 2, pp. 218–233, 2003.
 [8] E. J. Candès and B. Recht, “Exact matrix completion via convex optimization,” Foundations of Computational mathematics, vol. 9, no. 6, pp. 717–772, 2009.
 [9] Y. Eldar, D. Needell, and Y. Plan, “Uniqueness conditions for low-rank matrix recovery,” Applied and Computational Harmonic Analysis, vol. 33, no. 2, pp. 309–314, 2012.
 [10] B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization,” SIAM Review, vol. 52, no. 3, pp. 471–501, 2010.
 [11] E. Candès and Y. Plan, “Tight oracle bounds for low-rank matrix recovery from a minimal number of noisy random measurements,” IEEE Trans. Inform. Theory, vol. 57, no. 4, pp. 2342–2359, 2011.
 [12] T. Blumensath and M. E. Davies, “Iterative hard thresholding for compressed sensing,” Appl. Comput. Harmon. A., vol. 27, no. 3, pp. 265–274, 2009.
 [13] T. Blumensath and M. E. Davies, “Normalized iterative hard thresholding: Guaranteed stability and performance,” IEEE Journal of selected topics in signal processing, vol. 4, no. 2, pp. 298–309, 2010.
 [14] J. Tanner and K. Wei, “Normalized iterative hard thresholding for matrix completion,” SIAM Journal on Scientific Computing, vol. 35, no. 5, pp. S104–S125, 2013.
 [15] E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE T. Inform. Theory, vol. 51, pp. 4203–4215, 2005.
 [16] J. Liu, P. Musialski, P. Wonka, and J. Ye, “Tensor completion for estimating missing values in visual data,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 208–220, 2012.
 [17] J. A. Bengua, H. N. Phien, H. D. Tuan, and M. N. Do, “Efficient tensor completion for color image and video recovery: Low-rank tensor train,” IEEE Transactions on Image Processing, vol. 26, no. 5, pp. 2466–2479, 2017.
 [18] B. RomeraParedes, H. Aung, N. BianchiBerthouze, and M. Pontil, “Multilinear multitask learning,” in International Conference on Machine Learning, pp. 1444–1452, 2013.
 [19] M. A. O. Vasilescu and D. Terzopoulos, “Multilinear independent components analysis,” in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 1, pp. 547–553, IEEE, 2005.
 [20] M. H. Beck, A. Jäckle, G. A. Worth, and H.-D. Meyer, “The multiconfiguration time-dependent Hartree (MCTDH) method: a highly efficient algorithm for propagating wavepackets,” Physics Reports, vol. 324, no. 1, pp. 1–105, 2000.
 [21] C. Lubich, From quantum to classical molecular dynamics: reduced models and numerical analysis. European Mathematical Society, 2008.
 [22] L. R. Tucker, “Some mathematical notes on threemode factor analysis,” Psychometrika, vol. 31, no. 3, pp. 279–311, 1966.
 [23] L. De Lathauwer, B. De Moor, and J. Vandewalle, “A multilinear singular value decomposition,” SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, pp. 1253–1278, 2000.
 [24] J. D. Carroll and J.-J. Chang, “Analysis of individual differences in multidimensional scaling via an n-way generalization of “Eckart–Young” decomposition,” Psychometrika, vol. 35, no. 3, pp. 283–319, 1970.
 [25] R. A. Harshman et al., “Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multimodal factor analysis,” 1970.
 [26] M. E. Kilmer and C. D. Martin, “Factorization strategies for thirdorder tensors,” Linear Algebra and its Applications, vol. 435, no. 3, pp. 641–658, 2011.
 [27] Z. Zhang, G. Ely, S. Aeron, N. Hao, and M. Kilmer, “Novel methods for multilinear data completion and denoising based on tensor-SVD,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3842–3849, 2014.
 [28] T. G. Kolda and B. W. Bader, “Tensor decompositions and applications,” SIAM review, vol. 51, no. 3, pp. 455–500, 2009.
 [29] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky, “Tensor decompositions for learning latent variable models,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 2773–2832, 2014.
 [30] C. J. Hillar and L.-H. Lim, “Most tensor problems are NP-hard,” Journal of the ACM (JACM), vol. 60, no. 6, p. 45, 2013.
 [31] H. Rauhut, R. Schneider, and Ž. Stojanac, “Low rank tensor recovery via iterative hard thresholding,” Linear Algebra and its Applications, vol. 523, pp. 220–262, 2017.
 [32] Z. Song, D. P. Woodruff, and P. Zhong, “Relative error tensor low rank approximation,” in Proceedings of the Thirtieth Annual ACMSIAM Symposium on Discrete Algorithms, pp. 2772–2789, Society for Industrial and Applied Mathematics, 2019.
 [33] A. Bhaskara, M. Charikar, and A. Vijayaraghavan, “Uniqueness of tensor decompositions with applications to polynomial identifiability,” in Conference on Learning Theory, pp. 742–778, 2014.
 [34] C. A. Rogers, Packing and covering. No. 54, University Press, 1964.
 [35] R. Vershynin, “Introduction to the nonasymptotic analysis of random matrices,” arXiv preprint arXiv:1011.3027, 2010.
 [36] N. Vervliet, O. Debals, and L. De Lathauwer, “Tensorlab 3.0 – numerical optimization strategies for largescale constrained and coupled matrix/tensor factorization,” in Asilomar Conference on Signals, Systems and Computers, pp. 1733–1738, IEEE, 2016.
 [37] N. Nguyen, D. Needell, and T. Woolf, “Linear convergence of stochastic iterative greedy algorithms with sparse constraints,” IEEE Transactions on Information Theory, vol. 63, no. 11, pp. 6869–6895, 2017.
 [38] I. Georgieva and C. Hofreither, “Greedy low-rank approximation in Tucker format of solutions of tensor linear systems,” Journal of Computational and Applied Mathematics, vol. 358, pp. 206–220, 2019.