has lead to a rich corpus of results showcasing that intrinsically low-dimensional objects living in large ambient dimensional space can be recovered from small numbers of linear measurements. As a complement to the so-called sparse vector recovery problem is the low-rank matrix recovery problem. Motivated by applications in signal processing (see e.g.[3, 4, 5, 6]
) and data science (see e.g.[7, 8]), the latter asks for a low-rank matrix to be recovered from a small number of linear measurements or observations. Formally, one considers a matrix of (nearly) rank along with a linear operator with , and the goal is to recover from the measurements . A common approach is to consider the relaxation of the NP-hard rank minimization , leading to the so-called nuclear-norm minimization problem [8, 10],
where the nuclear-norm is the L1 norm of the singular values:. It has been shown that (1.1) yields exact recovery of (or approximate when the measurements contain noise) when the measurement operator obeys some assumptions such as incoherence, restricted isometry, or is constructed from some random models [11, 8, 10]. Examples include when is constructed by taking matrix inner products with matrices containing i.i.d. (sub)-Gaussian entries or when views entries of the matrix selected uniformly at random. In these and most other cases, the number of measurements required is on the order of .
As in vector sparse recovery, an alternative to optimization based programs like (1.1
) is to use iterative methods that produce estimates that converge to the solution. Relevant to this paper is the Iterative Hard Thresholding (IHT) method [12, 13, 14], that can be described succinctly by the update
where is chosen either as the zero vector/matrix or randomly. Here, denotes the adjoint of the operator , and the function is a thresholding operator. In the vector sparse recovery setting, simply keeps the largest in magnitude entries of its input and sets the rest to zero, thereby returning the closest -sparse vector to its input. In the matrix case, it returns the closest rank- matrix to its input. To guarantee recovery via the IHT method, one may consider the restricted isometry property (RIP) , which asks that the operator roughly preserves the geometry of sparse/low-rank vectors/matrices:
where is a controlled constant that may depend on and denotes the Frobenius norm. For example, when the operator satisfies the RIP for -sparse vectors with constant , after a suitable number of iterations, IHT exactly recovers any -sparse vector from the measurements . Moreover, the result is robust and shows that when the measurements are corrupted by noise and the vector is not exactly sparse but well-approximated by its -sparse representation , that IHT still produces an accurate estimate to with error proportional to . See  for details.
1.1 Extension to tensor recovery
Extending IHT from sparse vector recovery to low-rank matrix recovery is somewhat natural. Indeed, a matrix is low-rank if and only if its vector of singular values is sparse. Extensions to the tensor setting, however, yield some non-trivial challenges. Nonetheless, there are many applications that motivate the use of tensors, ranging from video and longitudinal imaging [16, 17]18, 19] and differential equations [20, 21].
We will write a -order tensor as , where denotes the dimension in the th mode. Unlike the matrix case, for order tensors, there is not one unique notion of rank. In fact, many notions of rank along with their corresponding decompositions have now been defined and studied, including Tucker rank and higher order SVD [22, 23], CP-rank [24, 25], and tubal rank [26, 27]. We refer the reader to  for a nice review of tensors and these various concepts. Succinctly, the Tucker format relies on unfoldings of the tensor whereas the CP format relies on rank-1 tensor factors. For example, an order tensor can be matricized in ways by unfolding it along each of the modes . One can then consider a notion of rank for this tensor as a -tuple where is the rank of the th unfolding. Such a notion is attractive since rank is well-defined for matrices. The CP format on the other hand avoids the need to matricize or unfold the tensor. For the purposes of this work, we will focus on CP-decompositions and CP-rank.
For vectors and denote by their outer product and for any integer , let . Then one can build a tensor in by taking the outer product where and is a rank- tensor. This leads to the notation of a rank- tensor by considering vectors for and and considering the sum of rank- tensors:
When can be written via this decomposition, is at most rank . The smallest number of rank- tensors that can be used to express a tensor is then defined to be the rank of the tensor, and a decomposition using that number of rank- tensors is its rank decomposition. Note that often one may also ask that the vectors have unit norm, and aggregate the magnitude information into constants so that
Note that there are many differences between matrix rank and tensor rank. For example, the rank of a real-valued tensor may be different when considered over versus (i.e. if one allows the factor vectors above to be complex-valued or restricted to the reals). Throughout this paper, we will consider real-valued tensors in , but our analysis extends to the complex case as well. The CP-rank and CP-decompositions can be viewed as natural extensions of the matrix rank and SVD, and are well motivated by applications such as topic modeling, psychometrics, signal processing, linguistics and many others [24, 25, 29].
Unfortunately, not only is rank-minimization of tensors NP-Hard, but even the relaxation to the nuclear norm minimization using any of these notions of rank is also NP-Hard . Therefore, it is even more crucial to consider other types of methods for tensor recovery. Fortunately, many iterative methods have natural extensions to the tensor setting.
Here, we will focus on the extension of the IHT method (1.2) to the tensor setting, as put forth in . The authors prove accuracy of the tensor variant under a tensor RIP assumption. Likewise, we will consider measurements of the form , where is a linear operator. The tensor IHT method (TIHT) of  is summarized as in Algorithm 1.
Note that the algorithm depends on the “thresholding” operator111We note that this operator need not really be a true “thresholding” operator, but use this nomenclature since the method is derived from the classical iterative hard thresholding method. as input. In the vector case this operator performs simple thresholding, thus the name of the algorithm. In the matrix case it typically performs a low-rank projection via the SVD. In the tensor case, there are several options for this operator. In , the authors ask that computes an approximation to the closest rank- tensor to its input. Such an approximation is necessary since computing the best rank- tensor may be ill-defined or NP-Hard depending on the notion of rank used. For -order tensors, the approximation in  is asked to satisfy
where is the best rank- approximation to using various notions of the Tucker rank. We will adapt this operator to the CP-rank and propose a valid approximation for our purposes later. As in , the recovery of low CP-rank tensors by TIHT will rely on a tensor variant of the RIP, defined below in Definition 2.1.
We will utilize the TRIP for an appropriate set corresponding to normalized tensors with low CP-rank. In , the TRIP is utilized for several other notions of Tucker rank, namely for tensors with low rank higher order SVD (HOSVD) decompositions, hierarchical Tucker (HT) decompositions, and tensor train (TT) decompositions. The authors then prove that various randomly constructed measurement maps
satisfy those variations of the TRIP with high probability. Indeed, letting the rankbound the rank entries of the appropriate Tucker -tuple (see  for details), the TRIP is satisfied when the number of measurements is on the order of and or HOSVD and TT/HT decompositions, respectively. Such random constructions include those obtained by taking tensor inner products of with tensors having i.i.d. (sub)-Gaussian entries. Under the TRIP assumption, the main result of  shows that TIHT provides recovery of low Tucker rank tensors, as summarized by the following theorem.
Theorem 1.1 ().
Let satisfy the TRIP (for rank HOSVD or TT or HT tensors) with , and run TIHT with noisy measurements . Assume that the following holds at each iteration of TIHT:
Then the estimates produced by TIHT satisfy:
where and denote constants that may depend on .
As a consequence, after iterations, the estimate satisfies
where and denote constants that may depend on .
As the authors themselves point out, the challenge with this result is that (1.3) may be challenging to verify.
1.2 Contributions and Organization
The main contribution of this work is the extension and analysis of TIHT to low CP-rank tensors. Using recent work in low CP-rank tensor approximations, this work provides theoretical guarantees for the recovery of low CP-rank tensors without requiring assumptions on the hard thresholding operation
. We also show that tensor measurement maps with properly normalized Gaussian random variables satisfy a CP-rank version of the tensor RIP (TRIP) with high probability. These contributions are then supported by synthetically generated as well as real world experiments on video data.
The remainder of the paper is organized as follows. Section 2 contains our main results. Theorem 2.4 proves accurate recovery of TIHT for tensors with low CP-rank, under an appropriate TRIP assumption, without the need to verify an assumption like (1.3). In Section 2.1 we prove Theorem 2.5 showing that measurement maps satisfying our TRIP assumption can be obtained by random constructions. Section 3 showcases numerical results for real and synthetic tensors, and we conclude in Section 4.
2 Main Results for TIHT for CP-rank
First, let us formally define the set of tensors for which we will prove accurate recovery using TIHT. Given some , let us define the set of tensors:
We first define an analog of the tensor RIP (TRIP) for low CP-rank tensors.
The measurement operator satisfies the TRIP adapted to with parameter when
for all , defined in (2.1).
We will utilize the method and result from  that guarantees the following.
Theorem 2.2 (, Theorem 1.2).
Let be an arbitrary order- tensor, , and positive integer , and set
Suppose there is a rank- tensor satisfying and whose CP factors have norms bounded by . Then there is an efficient algorithm that outputs a rank- tensor estimate such that
We will write this method as .
Our variant of the TIHT method will utilize this result. It can be summarized by the update steps, initialized with :
In our context, Theorem 2.2 means the following.
Let , and let be an arbitrary CP-rank tensor with bounded factors; in particular let . Assume is a measurement operator satisfying the TRIP with parameter .
and is of CP-rank with factors bounded by .
In addition, if , then we have that
and is of CP-rank with factors bounded by .
To apply Theorem 2.2, we will verify there is a rank- tensor with bounded factors that satisfies . Our choice for is precisely the tensor . Indeed, we have
Thus, , so by Theorem 2.2, the output satisfies
To prove the second part, we proceed in the same way. Namely, we have
Thus, , so by Theorem 2.2, the output satisfies
Our goal will be to prove that the TIHT variant described in (2.2)-(2.3) provides accurate recovery of tensors in (2.1), provided that the measurement operator satisfies the CP-rank analog of the TRIP.
We now proceed with our main theorems.
Theorem 2.4 (TIHT with bounded low CP-rank).
Let . Consider the TIHT method described in (2.2)-(2.3), assume satisfies the TRIP with parameter as in Definition 2.1, and run TIHT with noisy measurements where the noise is bounded . Then TIHT has iterates that satisfy
As a consequence, recovery error on the order of the upper bound of the noise, , is achieved after roughly iterations.
The proof follows that of Theorem 1 in  with some crucial modifications. In particular, instead of requiring assumption (31) in the proof the aforementioned theorem, we have Corollary 2.3. As a direct result, we also have a nice upper bound on whereas Theorem 1 in  requires additional computation for upper bounding this term.
Starting from Corollary 2.3, we have that
Adding and subtracting to the right hand side, rearranging terms, and substituting the value of from (2.2), we can write:
Using the fact that , we invoke TRIP and the Cauchy-Schwarz inequality on the third term and the upper bound shown in Corollary 2.3 on the fourth term in the summation to obtain
Let by the orthogonal projection operator into the subspace spanned by , , and . Additionally denote for all . Using this notation, we can rewrite the above inequality as:
where the final inequality uses simplification to combine the first two terms and the Cauchy-Schwarz inequality.
Let , such that then we can write:
Dividing the first two inequalities by , square rooting both sides of the last inequality, and taking the sum of all three inequalities results in:
where . Noting that for , we can drop the term proceeding inequalities. With the bound from the proof of [Theorem 1], we have that leading to the inequality:
Finally, iterating the upper bound leads to the desired result:
Note that the TIHT method and main theorem have been modified in two crucial ways. First, the “thresholding” operator has been replaced by the output guaranteed by Theorem 2.2. This allows us to obtain an efficient thresholding step at the price of outputting a tensor of slightly higher rank. This higher rank output in turn requires a stricter assumption on the operator , namely that the TRIP is satisfied with that higher rank. However, we typically assume is small and bounded, so the increase is not severe. That leads to the second modification, which is that the TRIP is defined for low CP-rank matrices rather than other types of low rankness. Thus, what remains to be proved is for what types of measurement operators this TRIP holds.
2.1 Measurement maps satisfying our TRIP
The following theorem shows that Gaussian measurement maps satisfy the desired TRIP with high probability.
Let be represented by a tensor in whose entries are properly normalized, i.i.d. Gaussian random variables.222This can easily be extended to mean zero -subgaussian random variables, as in , where the constant in (2.6) depends on the subgaussian parameter . Then satisfies the TRIP as in Definition 2.1 with parameter as long as
We proceed again as in , with the main change being the construction of the covering of the set . To that end, we first obtain a bound on the covering number of this set, defined to the minimal cardinality of an -net such that for any point , there is a point such that . We denote this covering number as . See e.g.  for more details on covering numbers.
For any , the covering number of satisfies
For simplicity, we will prove the result for and , and the general case follows similarly. Denote by the unit ball in , namely . First, we begin by obtaining an -net of the ball of radius , namely ( will be chosen later). Classical results  utilizing volumetric estimates show that such an -net exists with cardinality at most . Create the set of tensors
and note that that its cardinality is at most
Now let be given. Thus, can be written as for . For each and , choose so that , and note that . First, we want to bound a single term . For notational convenience, let us drop the subscript , and write to denote the th entry of the vector . In addition, let us define the gradually modified tensors , , , and . Our goal is thus to bound . To that end, we have
Similarly, we bound and . Thus,
Thus, is a -net for . Since our goal is to obtain an -net, we choose . By (2.7), this yields a covering number of
Note that in the case of arbitrary -order tensors, the same proof yields
With Lemma 2.6 in tow, we may now use more classical results from concentration inequalities (see e.g. the proof of Theorem 2 of  for a complete proof in the tensor case) that show the number of measurements for a sub-Gaussian operator to satisfy the RIP with parameter over a space scales like . This completes our proof. ∎
Since Corollary 2.3 and Theorem 2.4 require the TRIP with parameter , we will apply Theorem 2.5 with (and replaced by ). Theorem 2.5 then immediately yields the following. Note there is of course a tradeoff in choosing small or large. Larger allows for larger noise tolerance and a weaker TRIP requirement, but incurs additional runtime cost in utilizing Corollary 2.3.
3 Numerical Results
Here we present some synthetic and real experiments that showcase the performance of TIHT when applied to low CP-rank tensors. All experiments are done on order-3 tensors, so . We utilize Gaussian measurements as motivated by Theorem 2.5. To be precise, for a tensor , we construct for specified number of measurements a matrix
whose entries are i.i.d. Gaussian with mean zero and variance. We then compute the measurements by applying the matrix to the vectorized tensor . Note there are of course other natural ways of applying such Gaussian operators and we do not seek optimality in terms of computation here. In addition, the “thresholding” step of (2.3) is performed using the cpd function from the Tensorlab package , which we note may be implemented differently than the method guaranteed by Theorem 2.2. We measure the relative recovery error as for a given tensor . We refer to the measurement rate as the percentage of the total number of pixels; a rate of corresponds to number of measurements . The rank reported is the rank used in the TIHT method (2.3). We present recovery results for this model here.
3.1 Synthetic data
For our first set of experiments, we create synthetic low-rank tensors and test the performance of TIHT with varying parameters , and , corresponding to the tensor rank, number of measurements, and the noise level, respectively. All tensors are created with i.i.d. standard normal entries, and the noise is also Gaussian, normalized as described. Figure 1 displays results showcasing the effect of varying the number of measurements as well as running TIHT on tensors of varying ranks. The left and center plots of this figure show that as expected, lower numbers of measurements lead to slower convergence rates in both the noiseless and noisy case. In the noisy case (center), we also see different levels of the so-called “convergence horizon.” We see the same effect in the right plot, showing that lower rank tensors exhibit faster convergence, again as expected. We repeat these experiments but for the tensor completion setting, where instead of taking Gaussian measurements we simply observe randomly chosen entries of the tensor. The results are displayed in Figure 2, where we observe similar behavior.
3.2 Real data
There is an abundance of real applications that involve tensors. Here, we consider only two, that are appealing for presentation reasons, and for the sole purpose of verifying our theoretical findings. Namely, we will consider tensors arising from color images and from grayscale video. For color (RGB) images, we view the image as a tensor in where the third mode represents the three color channels. Similarly, we use a tensor representation for a video with frames, each of size . These experiments are meant to showcase that TIHT can indeed be reasonably applied to such real applications.
Figure 3 shows the relative recovery error as a function of TIHT iterations for the RGB images shown, which are already low-rank (by construction). Figure 4 shows the same result for a real image which is not made to be low-rank. We compute its approximate distance to its nearest rank-15 tensor using the TensorLab cpd function to be . Note that we use small image sizes for sake of computation (the image is actually a patch out of a larger image), which causes the lack of sharpness in the visualization of the original images as displayed. We see in Figure 4 that the relative error reaches around the optimal, namely the relative distance to its low-rank representation. Thus, this error can be viewed as reaching the noise floor. Figure 5 shows the same information for the videos whose frames are displayed, and we see again that the error decays until approximately the noise floor. Note that for computational reasons we consider only small image sizes, since the measurement maps themselves grow quite large quickly. Of course, other types of maps may reduce this problem, although that is not the focus of this paper.
This work presents an Iterative Hard Thresholding approach to low CP-rank tensor approximation from linear measurements. We show that the proposed algorithm converges to a low-rank tensor when a linear operator satisfies an RIP-type condition for low rank tensors (TRIP). In addition, we prove that Gaussian measurements satisfy the TRIP condition for low CP-rank signals. Our numerical experiments not only verify our theoretical findings but also highlight the potential of the proposed method. Future directions for this work include extensions to stochastic iterative hard thresholding  and utilizing different types of tensor measurement maps .
This material is based upon work supported by the National Security Agency under Grant No. H98230-19-1-0119, The Lyda Hill Foundation, The McGovern Foundation, and Microsoft Research, while the authors were in residence at the Mathematical Sciences Research Institute in Berkeley, California, during the summer of 2019. In addition, Needell was funded by NSF CAREER DMS and NSF BIGDATA DMS . Li was supported by the NSF grants CCF-1409258, CCF-1704204, and the DARPA Lagrange Program under ONR/SPAWAR contract N660011824020. Qin is supported by the NSF grant DMS-1941197.
-  S. Foucart and H. Rauhut, “A mathematical introduction to compressive sensing,” Applied and Numerical Harmonic Analysis, 2013.
-  Y. C. Eldar and G. Kutyniok, Compressed sensing: theory and applications. Cambridge University Press, 2012.
-  A. Ahmed and J. Romberg, “Compressive multiplexing of correlated signals,” IEEE Transactions on Information Theory, vol. 61, no. 1, pp. 479–498, 2014.
-  H. Zhang, W. He, L. Zhang, H. Shen, and Q. Yuan, “Hyperspectral image restoration using low-rank matrix recovery,” IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 8, pp. 4729–4743, 2013.
-  D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, and J. Eisert, “Quantum state tomography via compressed sensing,” Physical review letters, vol. 105, no. 15, p. 150401, 2010.
E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?,”Journal of the ACM (JACM), vol. 58, no. 3, p. 11, 2011.
-  R. Basri and D. W. Jacobs, “Lambertian reflectance and linear subspaces,” IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 2, pp. 218–233, 2003.
-  E. J. Candès and B. Recht, “Exact matrix completion via convex optimization,” Foundations of Computational mathematics, vol. 9, no. 6, pp. 717–772, 2009.
-  Y. Eldar, D. Needell, and Y. Plan, “Uniqueness conditions for low-rank matrix recovery,” Applied and Computational Harmonic Analysis, vol. 33, no. 2, pp. 309–314, 2012.
-  B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization,” SIAM review, vol. 52, no. 3, pp. 471–501, 2010.
-  E. Candes and Y. Plan, “Tight oracle bounds for low-rank matrix recovery from a mininal number of noisy random measurements,” IEEE Trans. Inform. Theory, vol. 57, no. 4, pp. 2342–2359, 2011.
-  T. Blumensath and M. E. Davies, “Iterative hard thresholding for compressed sensing,” Appl. Comput. Harmon. A., vol. 27, no. 3, pp. 265–274, 2009.
-  T. Blumensath and M. E. Davies, “Normalized iterative hard thresholding: Guaranteed stability and performance,” IEEE Journal of selected topics in signal processing, vol. 4, no. 2, pp. 298–309, 2010.
-  J. Tanner and K. Wei, “Normalized iterative hard thresholding for matrix completion,” SIAM Journal on Scientific Computing, vol. 35, no. 5, pp. S104–S125, 2013.
E. J. Candès and T. Tao, “Decoding by linear programming,”IEEE T. Inform. Theory, vol. 51, pp. 4203–4215, 2005.
-  J. Liu, P. Musialski, P. Wonka, and J. Ye, “Tensor completion for estimating missing values in visual data,” IEEE transactions on pattern analysis and machine intelligence, vol. 35, no. 1, pp. 208–220, 2012.
-  J. A. Bengua, H. N. Phien, H. D. Tuan, and M. N. Do, “Efficient tensor completion for color image and video recovery: Low-rank tensor train,” IEEE Transactions on Image Processing, vol. 26, no. 5, pp. 2466–2479, 2017.
-  B. Romera-Paredes, H. Aung, N. Bianchi-Berthouze, and M. Pontil, “Multilinear multitask learning,” in International Conference on Machine Learning, pp. 1444–1452, 2013.
M. A. O. Vasilescu and D. Terzopoulos, “Multilinear independent components analysis,” in, vol. 1, pp. 547–553, IEEE, 2005.
-  M. H. Beck, A. Jäckle, G. A. Worth, and H.-D. Meyer, “The multiconfiguration time-dependent hartree (mctdh) method: a highly efficient algorithm for propagating wavepackets,” Physics reports, vol. 324, no. 1, pp. 1–105, 2000.
-  C. Lubich, From quantum to classical molecular dynamics: reduced models and numerical analysis. European Mathematical Society, 2008.
-  L. R. Tucker, “Some mathematical notes on three-mode factor analysis,” Psychometrika, vol. 31, no. 3, pp. 279–311, 1966.
L. De Lathauwer, B. De Moor, and J. Vandewalle, “A multilinear singular value decomposition,”SIAM journal on Matrix Analysis and Applications, vol. 21, no. 4, pp. 1253–1278, 2000.
-  J. D. Carroll and J.-J. Chang, “Analysis of individual differences in multidimensional scaling via an n-way generalization of “eckart-young” decomposition,” Psychometrika, vol. 35, no. 3, pp. 283–319, 1970.
-  R. A. Harshman et al., “Foundations of the parafac procedure: Models and conditions for an" explanatory" multimodal factor analysis,” 1970.
-  M. E. Kilmer and C. D. Martin, “Factorization strategies for third-order tensors,” Linear Algebra and its Applications, vol. 435, no. 3, pp. 641–658, 2011.
-  Z. Zhang, G. Ely, S. Aeron, N. Hao, and M. Kilmer, “Novel methods for multilinear data completion and de-noising based on tensor-svd,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3842–3849, 2014.
-  T. G. Kolda and B. W. Bader, “Tensor decompositions and applications,” SIAM review, vol. 51, no. 3, pp. 455–500, 2009.
-  A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky, “Tensor decompositions for learning latent variable models,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 2773–2832, 2014.
-  C. J. Hillar and L.-H. Lim, “Most tensor problems are np-hard,” Journal of the ACM (JACM), vol. 60, no. 6, p. 45, 2013.
-  H. Rauhut, R. Schneider, and Ž. Stojanac, “Low rank tensor recovery via iterative hard thresholding,” Linear Algebra and its Applications, vol. 523, pp. 220–262, 2017.
-  Z. Song, D. P. Woodruff, and P. Zhong, “Relative error tensor low rank approximation,” in Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 2772–2789, Society for Industrial and Applied Mathematics, 2019.
-  A. Bhaskara, M. Charikar, and A. Vijayaraghavan, “Uniqueness of tensor decompositions with applications to polynomial identifiability,” in Conference on Learning Theory, pp. 742–778, 2014.
-  C. A. Rogers, Packing and covering. No. 54, University Press, 1964.
-  R. Vershynin, “Introduction to the non-asymptotic analysis of random matrices,” arXiv preprint arXiv:1011.3027, 2010.
-  N. Vervliet, O. Debals, and