1. Introduction and prior work
Geometry-preserving dimension reduction has become important in a wide variety of applications over the last two decades due to improved sensing capabilities and the increasing prevalence of massive data sets. This is motivated in part by the fact that the data one collects often consists of high-dimensional representations of intrinsically simpler and effectively lower-dimensional data. In such settings, randomized linear projections have been demonstrated to preserve the intrinsic geometric structure of the collected data in a wide range of applications in both computer science (where one often deals with finite data sets [14, 1]) and signal processing (where manifold [4] and sparsity [17] assumptions are common). In this context, the vast majority of prior work has focused on recovering vector data taking values in a given set using random linear maps into a lower-dimensional space which are guaranteed to approximately preserve the norms of all elements of the set. The focus of this paper is extending this line of work to higher-order tensors.
In the vector case, uniform guarantees for approximate norm preservation for all sparse vectors, in the form of the restricted isometry property (RIP), have numerous applications. These include recovery algorithms that reconstruct all sparse vectors from a few linear measurements, such as $\ell_1$-minimization [12, 16, 29], orthogonal matching pursuit [40], CoSaMP [30, 16], iterative hard thresholding [8], and hard thresholding pursuit [15]. Extending these algorithms from sparse vector recovery to low-rank matrix or low-rank tensor recovery is very natural. Indeed, low-rank matrices (i.e., two-mode tensors) can be recovered from relatively few linear measurements [11, 17]. Extensions to the low-rank higher-order tensor setting, however, are less straightforward due to, e.g., the more complicated structure of the higher-order singular value decomposition and the non-unique definition of tensor rank. Still, there are many applications that motivate the use of tensors, ranging from video and longitudinal imaging [26, 6, 35, 37] to differential equations [5, 27]. Thus, while tensor applications are ubiquitous, and moreover the tensors arising in these applications are extremely large-scale, few methods exist that perform satisfactory tensor dimension reduction. Our goal here is thus to demonstrate a tensor dimension reduction technique that is computationally feasible (in terms of application and storage) and that guarantees preservation of geometry. As a motivating example, we consider the problem of tensor reconstruction from such dimension reduction measurements; in particular, the Tensor Iterative Hard Thresholding method is used for this purpose herein. In [33]
, the authors propose tensor extensions of the Iterative Hard Thresholding (IHT) method for several tensor decomposition formats, namely the higher-order singular value decomposition (HOSVD), the tensor train format, and the general hierarchical Tucker decomposition. Additionally, the recent papers
[20, 19] extend the Tensor IHT method (TIHT) to low Canonical Polyadic (CP) rank and low Tucker rank tensors, respectively. TIHT, as the name suggests, is an iterative method that consists of one step that applies the adjoint of the measurement operator to the remaining residual and a second step that thresholds that signal proxy to a low-rank tensor. This method enjoys provable guarantees for reconstruction under various geometry-preserving assumptions on the measurement maps [33, 20, 19]. All of these works, however, propose first reshaping a $d$-mode tensor into a vector and then multiplying by a dense measurement matrix. Unfortunately, this means that the measurement matrix must be even larger than the original tensors. The main goal of this paper is to propose a more memory-efficient alternative to this approach. In particular, we propose a modewise framework for low-rank tensor recovery. A general two-stage modewise linear operator takes the form
(1) $\quad \mathcal{L}(\mathcal{X}) = \mathcal{R}_2\left( \mathcal{R}_1(\mathcal{X}) \times_1 A_1 \times_2 \cdots \times_{d'} A_{d'} \right) \times_1 B_1 \times_2 \cdots \times_{d''} B_{d''},$

where $\mathcal{R}_1$ is a reshaping operator which reorganizes a $d$-mode tensor into a $d'$-mode tensor, after which each $A_j$ is applied to the reshaped tensor for $j \in [d']$ via a modewise product (reviewed in Section 2), followed by an additional reshaping via $\mathcal{R}_2$ into a $d''$-mode tensor, and finally additional mode products with the matrices $B_j$ for $j \in [d'']$. More general multi-stage modewise operators can be defined similarly. First analyzed in [21, 23] for aiding in the rapid computation of the CP decomposition, such modewise compression operators offer a wide variety of computational advantages over standard vector-based approaches (in which $\mathcal{R}_1$ is a vectorization operator, $A_1$ is a standard Johnson–Lindenstrauss map, and all remaining operators are the identity). In particular, when
$\mathcal{R}_1$ is a more modest reshaping (or even the identity), the resulting modewise linear transforms can be formed using significantly fewer random variables (effectively, independent random bits), and stored using less memory by avoiding the use of a single massive
matrix. In addition, such modewise linear operators also offer trivially parallelizable operations, faster serial data evaluations than standard vectorized approaches for structured data (see, e.g., [23]), and the ability to better respect the multimodal structure of the given tensor data.

Related Work and Contributions: This paper is directly motivated by recent work on low-rank tensor recovery using vectorized measurements [33, 20, 19]. Given the framework that modewise measurement operators (1) provide for creating highly structured and computationally efficient measurement maps, we aim to provide both theoretical guarantees and empirical evidence that several modewise maps allow for the efficient recovery of tensors with low-rank HOSVD decompositions. This represents the first study of such modewise maps for performing norm-preserving dimension reduction of nontrivial infinite sets of elements in (tensorized) Euclidean spaces, and so provides a general framework for generalizing the use of such maps to other types of, e.g., low-rank tensor models. See Section 3 for the specifics of our theoretical results, as well as Section 4 for an empirical demonstration of the good performance such modewise maps can provide for tensor recovery in practice.
Other recent work involving the analysis of modewise maps for tensor data includes, e.g., applications in kernel learning methods which effectively use modewise operators specialized to finite sets of rank-one tensors [2], as well as a variety of works in the computer science literature aimed at compressing finite sets of low-rank (with respect to, e.g., CP and tensor train decompositions [32]) tensors. More general results involving extensions of bounded orthonormal sampling results to the tensor setting [23, 3] apply to finite sets of arbitrary tensors. With respect to norm-preserving modewise embeddings of infinite sets, prior work has been limited to oblivious subspace embeddings (see, e.g., [21, 28]). Here, we extend these techniques to the set of all tensors with a low-rank HOSVD decomposition in order to obtain modewise embeddings with the Tensor Restricted Isometry Property (TRIP). Having obtained modewise TRIP operators, we then consider low-rank tensor recovery via Tensor IHT (TIHT).
Paper Outline: The rest of this paper is organized as follows. In Section 2, we provide a brief review of basic tensor definitions. In Section 3, we state our main results, which we then prove in Section 5. In Section 4, we discuss applications of our results to recovering low-rank tensors via TIHT, and present numerical results. In Section 6, we provide a short conclusion and a discussion of directions for future work. Proofs of auxiliary results are provided in the appendices.
2. Tensor prerequisites
In this section, we briefly review some basic definitions concerning tensors. For a further overview, we refer the reader to [24]. Let $d$ and $n_1, \dots, n_d$ be integers. For a multi-index $(i_1, \dots, i_d)$, we will denote the corresponding entry of a $d$-mode tensor $\mathcal{X}$ by $\mathcal{X}_{i_1, \dots, i_d}$. When convenient, we will also denote the entries by $\mathcal{X}(i_1, \dots, i_d)$. For the remainder of this work, we will use bold text to denote vectors (i.e., one-mode tensors), capital letters to denote matrices (i.e., two-mode tensors), and calligraphic text for all other tensors.
2.1. Modewise multiplication and mode products:
For $j \in [d]$, the mode-$j$ product of a $d$-mode tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ with a matrix $A \in \mathbb{R}^{m \times n_j}$ is another $d$-mode tensor $\mathcal{X} \times_j A \in \mathbb{R}^{n_1 \times \cdots \times n_{j-1} \times m \times n_{j+1} \times \cdots \times n_d}$. Its entries are given by

(2) $\quad (\mathcal{X} \times_j A)_{i_1, \dots, i_{j-1}, \ell, i_{j+1}, \dots, i_d} = \sum_{i_j = 1}^{n_j} A_{\ell, i_j} \mathcal{X}_{i_1, \dots, i_d}$

for all valid indices. If $\mathcal{X}$ is a rank-one tensor of the form $\mathcal{X} = \mathbf{x}_1 \circ \mathbf{x}_2 \circ \cdots \circ \mathbf{x}_d$, where $\circ$ denotes the outer product, then one may use (2) to see that

(3) $\quad \mathcal{X} \times_j A = \mathbf{x}_1 \circ \cdots \circ \mathbf{x}_{j-1} \circ (A \mathbf{x}_j) \circ \mathbf{x}_{j+1} \circ \cdots \circ \mathbf{x}_d.$
For further discussion of the properties of modewise products, please see [21, 24].
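For concreteness, the modewise product (2) can be implemented with the standard unfolding trick; the following NumPy sketch (function name and dimensions are ours, for illustration only) also verifies property (3) on a rank-one tensor.

```python
import numpy as np

def mode_product(X, A, j):
    """Mode-j product X x_j A: contracts the j-th mode of X with the
    columns of A, replacing dimension X.shape[j] by A.shape[0] (cf. (2))."""
    # Unfold: move mode j to the front and flatten the remaining modes.
    Xj = np.moveaxis(X, j, 0).reshape(X.shape[j], -1)
    Y = A @ Xj  # apply A to every mode-j fiber at once
    new_shape = (A.shape[0],) + tuple(np.delete(np.array(X.shape), j))
    return np.moveaxis(Y.reshape(new_shape), 0, j)

# Sanity check of property (3) on a rank-one tensor x1 o x2 o x3:
rng = np.random.default_rng(0)
a, b, c = rng.standard_normal(4), rng.standard_normal(5), rng.standard_normal(6)
X = np.einsum('i,j,k->ijk', a, b, c)
A = rng.standard_normal((3, 5))
assert np.allclose(mode_product(X, A, 1),
                   np.einsum('i,j,k->ijk', a, A @ b, c))
```

The unfolding/refolding pattern above is the same one used to compute modewise measurements efficiently in practice: the matrix is applied to all mode-$j$ fibers in a single matrix-matrix multiply.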
2.2. HOSVD decomposition and multilinear rank
Let $\mathbf{r} = (r_1, \dots, r_d)$ be a multi-index. We say that a $d$-mode tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ has multilinear rank (or HOSVD rank) at most $\mathbf{r}$ if there exist subspaces $\mathcal{U}_j \subseteq \mathbb{R}^{n_j}$ with $\dim(\mathcal{U}_j) = r_j$ such that

$\mathcal{X} \in \mathcal{U}_1 \otimes \mathcal{U}_2 \otimes \cdots \otimes \mathcal{U}_d,$

where $\otimes$ denotes the tensor product of the subspaces here. We note that a tensor $\mathcal{X}$ has rank at most $\mathbf{r}$ if and only if there exists a core tensor $\mathcal{G} \in \mathbb{R}^{r_1 \times \cdots \times r_d}$ such that

(4) $\quad \mathcal{X} = \mathcal{G} \times_1 U_1 \times_2 U_2 \times_3 \cdots \times_d U_d,$

where, for each $j$, the columns of $U_j$ form an orthonormal basis for $\mathcal{U}_j$. A factorization of the form (4) is called a Higher-Order Singular Value Decomposition (HOSVD) of the tensor $\mathcal{X}$. It is well-known (see, e.g., [33]) that we may assume that the core tensor $\mathcal{G}$ has orthogonal subtensors in the sense that, for all $j \in [d]$ and all $p \neq q$, we have

(5) $\quad \langle \mathcal{G}_{i_j = p}, \mathcal{G}_{i_j = q} \rangle = 0,$

where $\mathcal{G}_{i_j = p}$ denotes the subtensor of $\mathcal{G}$ obtained by fixing the $j$-th index to equal $p$. We also note that, since the columns of each $U_j$ form an orthonormal basis for $\mathcal{U}_j$, we have $\langle \mathcal{X}, \mathcal{X} \rangle = \langle \mathcal{G}, \mathcal{G} \rangle$, where $\langle \cdot, \cdot \rangle$ denotes the trace inner product.
Remark 1.
The Canonical Polyadic (CP) rank of a $d$-mode tensor is the minimum number of rank-one tensors (i.e., outer products of vectors) required to represent the tensor as a sum. If $\mathcal{X}$ has HOSVD rank $\mathbf{r}$, then (4) implies $\mathcal{X}$ has CP rank at most $\min_j \prod_{i \neq j} r_i$. (In particular, if $r_j = r$ for all $j$, then $\mathcal{X}$ has CP rank at most $r^{d-1}$.)
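A truncated HOSVD of the form (4) can be computed from the SVDs of the mode-$j$ unfoldings. The sketch below is a standard construction (function names and sizes are ours, not the paper's); it verifies that the factorization is exact for a tensor of exact multilinear rank.

```python
import numpy as np

def unfold(X, j):
    """Mode-j unfolding: mode j becomes the rows."""
    return np.moveaxis(X, j, 0).reshape(X.shape[j], -1)

def hosvd(X, ranks):
    """Truncated HOSVD (4): factor matrices U_j from the top left singular
    vectors of each unfolding; core G obtained by modewise products with
    the transposes U_j^T."""
    Us = [np.linalg.svd(unfold(X, j), full_matrices=False)[0][:, :r]
          for j, r in enumerate(ranks)]
    G = X
    for j, U in enumerate(Us):
        # G <- G x_j U^T  (contract mode j with the columns of U)
        G = np.moveaxis(np.tensordot(U.T, np.moveaxis(G, j, 0), axes=1), 0, j)
    return G, Us

# For a tensor of exact multilinear rank (2, 2, 2), the HOSVD is exact:
rng = np.random.default_rng(1)
core = rng.standard_normal((2, 2, 2))
Us_true = [np.linalg.qr(rng.standard_normal((7, 2)))[0] for _ in range(3)]
X = np.einsum('abc,ia,jb,kc->ijk', core, *Us_true)
G, Us = hosvd(X, (2, 2, 2))
X_rec = np.einsum('abc,ia,jb,kc->ijk', G, *Us)
assert np.allclose(X, X_rec)
```

For tensors of only approximately low multilinear rank, this same construction yields the quasi-optimal truncation used later in Section 4.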
2.3. Restricted Isometry Properties and Tensors
Definition 1.
[TRIP property] We say that a linear map $\mathcal{L}$ has the TRIP$(\mathbf{r}, \delta)$ property if, for all $\mathcal{X}$ with HOSVD rank at most $\mathbf{r}$, we have

(6) $\quad (1 - \delta) \|\mathcal{X}\|^2 \le \|\mathcal{L}(\mathcal{X})\|^2 \le (1 + \delta) \|\mathcal{X}\|^2.$
Definition 2.
[RIP property] We say that a linear map $\mathcal{L}$ has the RIP$(\delta, \mathcal{S})$ property if, for all elements $\mathbf{x} \in \mathcal{S}$,

(7) $\quad (1 - \delta) \|\mathbf{x}\|^2 \le \|\mathcal{L}(\mathbf{x})\|^2 \le (1 + \delta) \|\mathbf{x}\|^2.$
We emphasize that the set $\mathcal{S}$ in Definition 2 can be a subset of any vector space (not necessarily a tensor space).
2.4. Reshaping and the HOSVD
For simplicity, we will assume below, and for the rest of this paper, that there exist $n$ and $r$ such that $n_j = n$ and $r_j = r$ for all $j \in [d]$. We note that this assumption is made only for the sake of clarity, and all of our analysis can be extended to the general case.
We let $\kappa$ be an integer which divides $d$, and let $\tilde{d} = d / \kappa$. Consider the reshaping operator

$\mathcal{R} : \mathbb{R}^{n \times \cdots \times n} \to \mathbb{R}^{n^\kappa \times \cdots \times n^\kappa}$

that flattens every $\kappa$ modes of a tensor into one. Note that $\mathcal{R}$ decreases the total number of modes from $d$ to $\tilde{d}$. Formally, $\mathcal{R}$ is defined to be the unique linear operator such that, on rank-one tensors, it acts as

$\mathcal{R}(\mathbf{x}_1 \circ \cdots \circ \mathbf{x}_d) = (\mathbf{x}_1 \otimes \cdots \otimes \mathbf{x}_\kappa) \circ (\mathbf{x}_{\kappa+1} \otimes \cdots \otimes \mathbf{x}_{2\kappa}) \circ \cdots \circ (\mathbf{x}_{d - \kappa + 1} \otimes \cdots \otimes \mathbf{x}_d),$

where $\otimes$ denotes the Kronecker product when applied to vectors. We observe that if a tensor $\mathcal{X}$ has the form (4), then its reshaping $\mathcal{R}(\mathcal{X})$ is the $\tilde{d}$-mode tensor with HOSVD rank at most $(r^\kappa, \dots, r^\kappa)$ given by

(8) $\quad \mathcal{R}(\mathcal{X}) = \mathcal{R}(\mathcal{G}) \times_1 (U_1 \otimes \cdots \otimes U_\kappa) \times_2 \cdots \times_{\tilde{d}} (U_{d - \kappa + 1} \otimes \cdots \otimes U_d),$

where the new component vectors are obtained by taking Kronecker products of the appropriate columns of the $U_j$, and where $\mathcal{R}(\mathcal{G})$ is a reshaped version of the core tensor $\mathcal{G}$ from (4). Since the columns of each $U_j$ form an orthonormal basis for $\mathcal{U}_j$, it follows that the columns of $U_{(\ell - 1)\kappa + 1} \otimes \cdots \otimes U_{\ell \kappa}$ form an orthonormal basis for $\mathcal{U}_{(\ell - 1)\kappa + 1} \otimes \cdots \otimes \mathcal{U}_{\ell \kappa}$.
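With row-major (C-order) indexing, the reshaping operator $\mathcal{R}$ is a plain array reshape, and its action on rank-one tensors matches the Kronecker-product description above. A small NumPy check (our illustration, under that indexing assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, c, d = (rng.standard_normal(3) for _ in range(4))
X = np.einsum('i,j,k,l->ijkl', a, b, c, d)   # rank-one 4-mode tensor, kappa = 2

# Flattening every 2 modes into one (d = 4 -> d-tilde = 2) is a plain reshape
# in row-major order, and on rank-one tensors it yields Kronecker products of
# the corresponding factor vectors, as in (8).
RX = X.reshape(9, 9)
assert np.allclose(RX, np.outer(np.kron(a, b), np.kron(c, d)))
```

This is why $\mathcal{R}$ costs nothing to apply in practice: it is a relabeling of indices, not a data movement beyond what the memory layout already provides.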
3. Main results: modewise TRIP
For $j \in [\tilde{d}]$, let $A_j$ be an $m \times n^\kappa$ matrix, and let $\mathcal{L}_1$ be the linear map which acts modewise on $d$-mode tensors by

(9) $\quad \mathcal{L}_1(\mathcal{X}) = \mathcal{R}(\mathcal{X}) \times_1 A_1 \times_2 A_2 \times_3 \cdots \times_{\tilde{d}} A_{\tilde{d}}.$

Let $\mathcal{X}$ be a $d$-mode tensor with HOSVD decomposition given by (4). By (3) and (8), we have that

(10) $\quad \mathcal{L}_1(\mathcal{X}) = \mathcal{R}(\mathcal{G}) \times_1 A_1 (U_1 \otimes \cdots \otimes U_\kappa) \times_2 \cdots \times_{\tilde{d}} A_{\tilde{d}} (U_{d - \kappa + 1} \otimes \cdots \otimes U_d).$
Our first main result will show that $\mathcal{L}_1$ satisfies the TRIP property under the assumption that each of the $A_j$ satisfies a restricted isometry property on the set defined below.
Definition 3.
[The set $\mathcal{S}$] Consider the set of vectors

(11)

and let $\mathcal{S}$ denote this set for the remainder of the text.
More precisely, we will show that $\mathcal{L}_1$ satisfies the TRIP property under the assumption that each of the $A_j$ satisfies the RIP$(\delta', \mathcal{S})$ property, where $\delta'$ is a suitably chosen parameter depending on $\delta$. In the case where $\kappa = d$, so that $\tilde{d} = 1$, this is nearly trivial. Indeed, if $\kappa = d$ and $\mathcal{L}_1$ is the map defined in (9), then we have

$\mathcal{L}_1(\mathcal{X}) = A_1 \, \mathrm{vect}(\mathcal{X}).$

Therefore, since $\mathrm{vect}(\mathcal{X}) / \|\mathcal{X}\|$ belongs to $\mathcal{S}$ whenever $\mathcal{X}$ has HOSVD rank at most $\mathbf{r}$, we immediately obtain the following proposition.
Proposition 1.
Suppose that $\mathcal{L}_1$ is defined as per (9) with $\tilde{d} = 1$ and that $A_1$ has the RIP$(\delta, \mathcal{S})$ property. Then $\mathcal{L}_1$ satisfies the TRIP$(\mathbf{r}, \delta)$ property, that is,

$(1 - \delta) \|\mathcal{X}\|^2 \le \|\mathcal{L}_1(\mathcal{X})\|^2 \le (1 + \delta) \|\mathcal{X}\|^2$

for all $\mathcal{X}$ with HOSVD rank at most $\mathbf{r}$.
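The following numerical sketch is an empirical illustration of such norm preservation, not a proof: all dimensions, the rank, and the Gaussian scaling below are arbitrary choices of ours. It draws a rank-$(2,2,2)$ tensor as in (4) and checks that a modewise Gaussian map of the form (9) (with $\kappa = 1$) approximately preserves its norm.

```python
import numpy as np

def mode_product(X, A, j):
    # X x_j A via tensordot: contract mode j of X with the columns of A.
    return np.moveaxis(np.tensordot(A, np.moveaxis(X, j, 0), axes=1), 0, j)

rng = np.random.default_rng(3)
n, r, m = 40, 2, 20

# Random 3-mode tensor with HOSVD rank (2, 2, 2), built as in (4).
G = rng.standard_normal((r, r, r))
Us = [np.linalg.qr(rng.standard_normal((n, r)))[0] for _ in range(3)]
X = G
for j, U in enumerate(Us):
    X = mode_product(X, U, j)

# Modewise map L1 as in (9) with kappa = 1: one Gaussian matrix per mode,
# scaled by 1/sqrt(m) so each A_j preserves norms in expectation.
As = [rng.standard_normal((m, n)) / np.sqrt(m) for _ in range(3)]
LX = X
for j, A in enumerate(As):
    LX = mode_product(LX, A, j)

ratio = np.linalg.norm(LX) / np.linalg.norm(X)
assert 0.3 < ratio < 3.0   # norm preserved up to moderate distortion
```

Note that the distortion compounds multiplicatively across the modes, which is precisely why the theorems require the per-mode RIP constant $\delta'$ to shrink with $\tilde{d}$.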
Our first main result is the following theorem, which is the analogue of Proposition 1 for $\tilde{d} > 1$. It shows that if each of the $A_j$ satisfies the RIP$(\delta', \mathcal{S})$ property for a suitable value of $\delta'$, then $\mathcal{L}_1$ has the TRIP$(\mathbf{r}, \delta)$ property.
Theorem 1.
Suppose that $\mathcal{L}_1$ is defined as per (9) and that each of the $A_j$ has the RIP$(\delta', \mathcal{S})$ property for a suitably chosen $\delta'$ depending on $\delta$ and $\tilde{d}$. Then $\mathcal{L}_1$ satisfies the TRIP$(\mathbf{r}, \delta)$ property, i.e.,

(12) $\quad (1 - \delta) \|\mathcal{X}\|^2 \le \|\mathcal{L}_1(\mathcal{X})\|^2 \le (1 + \delta) \|\mathcal{X}\|^2$

for all $\mathcal{X}$ with HOSVD rank at most $\mathbf{r}$.
Proof.
See Section 5.3. ∎
The following corollary shows that we may pick the matrices $A_j$ to have i.i.d. subgaussian entries.
Corollary 1.
Let $\delta, \eta \in (0, 1)$. Suppose that $\mathcal{L}_1$ is defined as per (9) and that each of the $A_j$ has i.i.d. subgaussian entries, and let

(13)

for a sufficiently large constant $C$. Then $\mathcal{L}_1$ satisfies the TRIP$(\mathbf{r}, \delta)$ property (6) with probability at least $1 - \eta$.

Proof.

See Section 5.3. ∎
For another possible choice of the $A_j$, we consider the set of Subsampled Orthogonal with Random Sign matrices defined below. Note, in particular, that this class includes subsampled Fourier (i.e., discrete cosine and sine) matrices.
Definition 4 (Subsampled Orthogonal with Random Sign (SORS) matrices).
Let $U \in \mathbb{R}^{n \times n}$ denote an orthonormal matrix obeying

(14) $\quad \max_{i, j} |U_{i,j}| \le \frac{\Delta}{\sqrt{n}}$

for some $\Delta \ge 1$. Let $H \in \mathbb{R}^{m \times n}$ be a matrix whose rows are chosen i.i.d. uniformly at random from the rows of $U$. We define a Subsampled Orthogonal with Random Sign (SORS) measurement ensemble as $A = \sqrt{\frac{n}{m}}\, H D$, where $D$ is a random diagonal matrix whose diagonal entries are i.i.d. $\pm 1$ with equal probability.
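Definition 4 can be instantiated, for example, with the orthonormal DCT-II as the base matrix, since its entries are uniformly $O(1/\sqrt{n})$. The NumPy sketch below (function names and sizes are ours, for illustration) constructs such a SORS matrix.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its entries are O(1/sqrt(n)), so it
    satisfies the flatness condition (14) with a small constant."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    U = np.sqrt(2.0 / n) * np.cos(np.pi * (j + 0.5) * k / n)
    U[0, :] /= np.sqrt(2.0)   # first row scaled to make U orthonormal
    return U

def sors_matrix(m, n, rng):
    """SORS ensemble of Definition 4: m rows sampled i.i.d. uniformly from
    an orthonormal U, times a random diagonal sign matrix D, with the
    sqrt(n/m) scaling that makes A an isometry in expectation."""
    U = dct_matrix(n)
    rows = rng.integers(0, n, size=m)          # i.i.d. uniform row indices
    signs = rng.choice([-1.0, 1.0], size=n)    # diagonal entries of D
    return np.sqrt(n / m) * U[rows] * signs

rng = np.random.default_rng(4)
U = dct_matrix(16)
assert np.allclose(U @ U.T, np.eye(16))        # U is orthonormal
A = sors_matrix(8, 16, rng)
```

In practice, one would not form the dense matrix at all: a SORS measurement can be applied via a fast transform (e.g., an FFT-based DCT) in $O(n \log n)$ time, which is one of the main attractions of this ensemble.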
Analogously to Corollary 1, the following result shows that we may choose the matrices $A_j$ to be SORS matrices in Theorem 1.
Corollary 2.
Let $\delta, \eta \in (0, 1)$. Suppose that $\mathcal{L}_1$ is defined as per (9) and that each of the $A_j$ is a SORS matrix (as per Definition 4) with $\Delta$ bounded by a universal constant. Furthermore, let

(15)

where

(16)

and $C_1, C_2$ are sufficiently large absolute constants. Then $\mathcal{L}_1$ satisfies the TRIP$(\mathbf{r}, \delta)$ property (6) with probability at least $1 - \eta$.
Proof.
See Section 5.3. ∎
To further improve the embedding dimension provided by Corollaries 1 and 2, we can apply a secondary compression, analogous to the one used in [21], by letting

(17) $\quad \mathcal{L}_2(\mathcal{X}) = B \, \mathrm{vect}\left( \mathcal{L}_1(\mathcal{X}) \right),$

where $\mathrm{vect}$ is a vectorization operator and $B$ is a second measurement matrix. In this case, we again wish to show that $\mathcal{L}_2$ satisfies the TRIP for suitably chosen parameters. One of the key challenges in doing so is that, for any given $\mathcal{X}$, the new factor vectors defined as in (10) are no longer orthogonal to one another. Therefore, (10) is not an HOSVD decomposition of $\mathcal{L}_1(\mathcal{X})$, and the HOSVD rank of $\mathcal{L}_1(\mathcal{X})$ might be much larger than the HOSVD rank of $\mathcal{X}$. However, one may overcome this difficulty by observing that, with high probability, $\mathcal{L}_1(\mathcal{X})$ will belong to the following set of nearly orthogonal tensors.
Definition 5 (Nearly orthogonal tensors).
Our next main result is the following theorem, which shows that $\mathcal{L}_2$ satisfies the TRIP for suitably chosen parameters.
Theorem 2.
Proof.
See Section 5.4. ∎
The following two corollaries show that we may choose the matrices $A_j$ and $B$ to be either subgaussian or SORS matrices. We also note that it is possible to produce other variants of these corollaries where, for example, one takes each $A_j$ to be subgaussian and lets $B$ be a SORS matrix.
Corollary 3.
Let $\delta, \eta \in (0, 1)$. Suppose that $\mathcal{L}_1$ and $\mathcal{L}_2$ are defined as in (9) and (17), and that all of the $A_j$ have i.i.d. subgaussian entries. Let

(18)

and let $B$ also be a subgaussian matrix with i.i.d. entries with

(19)

Then, $\mathcal{L}_2$ satisfies the TRIP$(\mathbf{r}, \delta)$ property, i.e.,

$(1 - \delta) \|\mathcal{X}\|^2 \le \|\mathcal{L}_2(\mathcal{X})\|^2 \le (1 + \delta) \|\mathcal{X}\|^2$

for all $\mathcal{X}$ with HOSVD rank at most $\mathbf{r}$, with probability at least $1 - \eta$.
Proof.
See Section 5.4. ∎
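The two-stage map (17) is straightforward to prototype. The following sketch (all sizes and names are arbitrary choices of ours) reshapes a four-mode tensor, compresses each reshaped mode with a first-stage matrix, then vectorizes and applies a final matrix $B$:

```python
import numpy as np

def mode_product(X, A, j):
    # X x_j A: contract mode j of X with the columns of A.
    return np.moveaxis(np.tensordot(A, np.moveaxis(X, j, 0), axes=1), 0, j)

rng = np.random.default_rng(5)
n, kappa, m1, m2 = 8, 2, 20, 30

# 4-mode tensor, reshaped into a 2-mode tensor (kappa = 2), as in (9).
X = rng.standard_normal((n, n, n, n))
RX = X.reshape(n**kappa, n**kappa)

# First stage: one subgaussian matrix per reshaped mode.
As = [rng.standard_normal((m1, n**kappa)) / np.sqrt(m1) for _ in range(2)]
L1X = mode_product(mode_product(RX, As[0], 0), As[1], 1)

# Second stage (17): vectorize and compress once more with a single matrix B.
B = rng.standard_normal((m2, m1 * m1)) / np.sqrt(m2)
L2X = B @ L1X.ravel()

print(X.size, '->', L1X.size, '->', L2X.size)   # 4096 -> 400 -> 30
```

The point of the second stage is visible in the sizes: the modewise stage alone can only reach a product of per-mode dimensions, while the final matrix $B$, which is now small because it acts on the already-compressed data, reaches an arbitrary target dimension.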
Remark 2.
Note that applying the reshaping operator $\mathcal{R}$ (with $\kappa > 1$) is necessary in order for us to actually achieve dimension reduction in the first step. Indeed, if $\kappa = 1$, then (18) yields no compression in the first stage. We also note that, when the other parameters are held fixed, the required final dimension scales polynomially in $n$ and $r$ and exponentially in the number of modes $d$. While the dependence on the number of modes is exponential, we are primarily interested in cases where $n$ is large in comparison to the rank or the number of modes. In this case, the terms involving $n$ will dominate the terms involving $r$ and $d$. In Section 4.1, we will show that TRIP-dependent tensor recovery methods (e.g., tensor iterative hard thresholding, discussed in Section 4) work successfully in practice.
In [33], the authors considered i.i.d. subgaussian measurements applied to the vectorizations of low-rank tensors and proved that the TRIP will hold with high probability if the target dimension satisfies a suitable lower bound. We note that this bound has the same asymptotic dependence as ours with respect to several of the parameters. While their result has a much better dependence on others, here we are primarily interested in high-dimensional, low-rank tensors and are therefore primarily concerned with the dependence on the side length $n$.
Corollary 4.
Let $\delta, \eta \in (0, 1)$. Suppose that $\mathcal{L}_1$ and $\mathcal{L}_2$ are defined as in (9) and (17), and that all of the $A_j$ are SORS matrices (as per Definition 4). Furthermore, as in (15), let

(20)

Next, let $B$ also be a SORS matrix with

(21)

where the constants are as in (16). Then, $\mathcal{L}_2$ satisfies the TRIP$(\mathbf{r}, \delta)$ property, i.e.,

$(1 - \delta) \|\mathcal{X}\|^2 \le \|\mathcal{L}_2(\mathcal{X})\|^2 \le (1 + \delta) \|\mathcal{X}\|^2$

holds for all $\mathcal{X}$ with HOSVD rank at most $\mathbf{r}$, with probability at least $1 - \eta$.
Proof.
See Section 5.4. ∎
Remark 3.
Similarly to the subgaussian case, we note that reshaping (with $\kappa > 1$) is needed in order to achieve dimension reduction in the first compression. We also note that the final dimension again scales polynomially in $n$ and $r$ and exponentially in the number of modes $d$.
4. LowRank Tensor Recovery
Low-rank tensor recovery is the task of recovering a low-rank (or approximately low-rank) tensor from a comparatively small number of possibly noisy linear measurements. This problem serves as a nice motivating example of where the use of modewise maps with the TRIP property can help alleviate the burdensome storage requirements of maps which require vectorization. Indeed, when the goal is compression, storing very large maps in memory, as required by vectorization-based approaches, is counterproductive and often infeasible.
In the two-mode (matrix) case, the question of low-rank recovery from a small number of linear measurements is now well-known to be possible under various model assumptions on the measurements [11, 10, 34]. One of the standard approaches is so-called nuclear-norm minimization:

$\min_{Z} \|Z\|_* \quad \text{subject to} \quad \mathcal{A}(Z) = \mathbf{y}.$

Since the nuclear norm $\|\cdot\|_*$ is defined to be the sum of the singular values, it serves as a fairly good, computationally feasible proxy for rank. As in classical compressed sensing, an alternative to optimization-based reconstruction is the use of iterative solvers. One such approach is the Iterative Hard Thresholding (IHT) method [8, 9, 36], which finds a solution via the alternating updates

(22) $\quad \mathcal{W}^{j+1} = \mathcal{X}^{j} + \mu \, \mathcal{A}^* \left( \mathbf{y} - \mathcal{A}(\mathcal{X}^{j}) \right), \qquad \mathcal{X}^{j+1} = \mathbf{H}_{r}(\mathcal{W}^{j+1}),$

where $\mathcal{X}^{0}$ is initialized randomly. Here, $\mathcal{A}^*$ denotes the adjoint of the operator $\mathcal{A}$, and the function $\mathbf{H}_{r}$ is a thresholding operator, which returns the closest rank-$r$ matrix. Results for IHT prove that sparse or low-rank recovery is guaranteed when the measurement operator satisfies various properties. For example, in the case of sparse vector recovery, the restricted isometry property is enough to guarantee accurate reconstruction [8]. In the low-rank matrix case, the measurements can be taken to be Gaussian [13], or to satisfy various analogues of the restricted isometry property [36, 7, 39]. In what follows, for the sake of simplicity, we will focus on the case where the step size $\mu = 1$, which is referred to as Classical IHT. However, our results can also be extended to Normalized TIHT, where the step size takes a different value at each step. (See [33] and the references provided there.)
The iterative hard thresholding method has been extended to the tensor case ([18, 33, 19]). In this problem, one aims to recover an unknown tensor $\mathcal{X}$ with, e.g., HOSVD rank at most $\mathbf{r}$, from linear measurements of the form $\mathbf{y} = \mathcal{A}(\mathcal{X}) + \mathbf{e}$, where $\mathcal{A}$ is a linear map from the tensor space to $\mathbb{R}^m$ and $\mathbf{e}$ is an arbitrary noise vector. The iteration is given by the same updates as (22). The primary difference from the matrix case lies in the thresholding operator $\mathbf{H}_{\mathbf{r}}$, which approximately computes the best rank-$\mathbf{r}$ approximation of a given tensor. Unfortunately, exactly computing the best rank-$\mathbf{r}$ approximation of a general tensor is NP-hard. However, it is possible to construct an operator $\mathbf{H}_{\mathbf{r}}$ in a way such that

(23) $\quad \left\| \mathcal{X} - \mathbf{H}_{\mathbf{r}}(\mathcal{X}) \right\| \le C_{d} \left\| \mathcal{X} - \mathcal{X}_{\mathrm{best}} \right\|,$

where $\mathcal{X}_{\mathrm{best}}$ is the true best rank-$\mathbf{r}$ approximation of $\mathcal{X}$ and $C_d$ is a constant depending on $d$. (For details, please see [33] and the references therein.) For the rest of this section, we will always assume that $\mathbf{H}_{\mathbf{r}}$ is constructed in a way to satisfy (23).
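The TIHT iteration (22), combined with a truncated-HOSVD thresholding operator (one standard way to obtain an operator satisfying a quasi-optimality bound of the form (23)), can be sketched as follows. For simplicity, this sketch uses a measurement matrix acting on vectorized tensors and step size 1; all function names and dimensions are ours, for illustration only.

```python
import numpy as np

def unfold(X, j):
    return np.moveaxis(X, j, 0).reshape(X.shape[j], -1)

def hosvd_threshold(X, ranks):
    """H_r: project each mode onto the span of the top singular vectors of
    the corresponding unfolding (a quasi-best rank-r approximation, cf. (23))."""
    Y = X
    for j, r in enumerate(ranks):
        U = np.linalg.svd(unfold(X, j), full_matrices=False)[0][:, :r]
        # Y <- Y x_j (U U^T): orthogonal projection of mode j.
        Y = np.moveaxis(np.tensordot(U @ U.T, np.moveaxis(Y, j, 0), axes=1), 0, j)
    return Y

def tiht(A, y, shape, ranks, iters=50):
    """Classical TIHT (22) with step size 1: gradient step on the residual,
    followed by hard thresholding. A acts on vectorized tensors."""
    X = np.zeros(shape)
    for _ in range(iters):
        X = X + (A.T @ (y - A @ X.ravel())).reshape(shape)
        X = hosvd_threshold(X, ranks)
    return X

# Noiseless sanity check with identity measurements: one step recovers X.
rng = np.random.default_rng(6)
G = rng.standard_normal((2, 2, 2))
Us = [np.linalg.qr(rng.standard_normal((6, 2)))[0] for _ in range(3)]
X_true = np.einsum('abc,ia,jb,kc->ijk', G, *Us)
A = np.eye(6 ** 3)
X_hat = tiht(A, A @ X_true.ravel(), (6, 6, 6), (2, 2, 2), iters=1)
assert np.allclose(X_hat, X_true)
```

Replacing the identity above with a TRIP measurement operator (e.g., a modewise map as in (9) or (17), applied via its adjoint in the gradient step) yields the recovery scheme studied in the experiments of Section 4.1.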
The following theorem is the main result of [33]. It guarantees accurate reconstruction via TIHT when the measurement operator satisfies the TRIP property for a sufficiently small $\delta$. Unfortunately, the condition (24), required by this theorem, is a bit stronger than (23), which is guaranteed to hold. As noted in [33], getting rid of the condition (24) appears to be difficult, if not impossible. That said, (23) is a worst-case estimate, and $\mathbf{H}_{\mathbf{r}}$ typically returns much better estimates. Moreover, numerical experiments show that the TIHT algorithm works well in practice, and indeed the condition (24) does often hold, especially in the early iterations of the algorithm.

Theorem 3 ([33], Theorem 1).
Let $\delta$ be sufficiently small, let $\mathcal{A}$ satisfy the TRIP with constant $\delta$, and let $\mathcal{W}^{j}$ and $\mathcal{X}^{j}$ be defined as in (22). We assume that

(24)

Suppose that $\mathbf{y} = \mathcal{A}(\mathcal{X}) + \mathbf{e}$, where $\mathbf{e}$ is an arbitrary noise vector. Then

$\left\| \mathcal{X}^{j+1} - \mathcal{X} \right\| \le a^{j+1} \left\| \mathcal{X}^{0} - \mathcal{X} \right\| + b \left\| \mathbf{e} \right\|,$

where $a < 1$ and $b > 0$ are constants depending on $\delta$.
Theorem 3 shows that low-rank tensor recovery is possible when the measurements satisfy the TRIP property. In [33], the authors also show that it is possible to randomly construct maps which satisfy this property with high probability. Unfortunately, these maps require first vectorizing the input tensor into a very high-dimensional vector and then multiplying by a correspondingly large matrix. This greatly limits the practical use of such maps, since this matrix requires more memory than the original tensor. Thus, our results here for modewise TRIP are especially important and applicable in the tensor recovery setting. The following corollary, which shows that we may choose $\mathcal{A} = \mathcal{L}_1$ or $\mathcal{A} = \mathcal{L}_2$ (as in (9) or (17)), now follows immediately from combining Theorem 3 with Theorems 1 and 2.
Corollary 5.
Assume the operator $\mathcal{A}$ is defined in one of the following ways:

(i) $\mathcal{A} = \mathcal{L}_1$, where $\mathcal{L}_1$ is defined as per (9) and the matrices $A_j$ satisfy the RIP on $\mathcal{S}$;

(ii) $\mathcal{A} = \mathcal{L}_2$, defined as in (17), where its component matrices $A_j$ satisfy the RIP on $\mathcal{S}$ and $B$ satisfies a suitable RIP property.

Consider the recovery problem from the noisy measurements $\mathbf{y} = \mathcal{A}(\mathcal{X}) + \mathbf{e}$, where $\mathbf{e}$ is an arbitrary noise vector. Let $\mathcal{W}^{j}$ and $\mathcal{X}^{j}$ be defined as in (22), and assume that (24) holds. Then,

$\left\| \mathcal{X}^{j+1} - \mathcal{X} \right\| \le a^{j+1} \left\| \mathcal{X}^{0} - \mathcal{X} \right\| + b \left\| \mathbf{e} \right\|,$

where $a < 1$ and $b > 0$ are constants depending on the TRIP constant.
4.1. Experiments
In this section, we show that TIHT can be used with modewise measurement maps in order to successfully reconstruct low-rank tensors. Herein we present numerical results for the recovery of random four-mode tensors from both modewise Gaussian and SORS measurements.

We run the tensor iterative hard thresholding algorithm as defined in (22) to recover low-rank tensors from measurements for a variety of target dimensions $m$. We compare the percentage of successfully recovered tensors from a batch of randomly generated low-rank tensors, as well as the average number of iterations used for recovery on the successful runs. We call a recovery process successful if the initial error between the true low-rank tensor and its initial random approximation decreases by a prescribed factor within a prescribed maximum number of iterations. We compare standard vectorized measurements with the proposed two-stage partially modewise measurements that reshape a four-mode tensor into a matrix, perform modewise measurements reducing each of the two reshaped modes, and then vectorize that result and compress it further to the target dimension $m$. Herein we consider a variety of intermediate dimensions to demonstrate the stability of the advantage of the modewise measurements over the vectorized ones.

In addition to the smaller memory footprint required for storing modewise measurement matrices, we show (Figure 1) that modewise measurements are able to recover tensors from compressed representations at least as small as those required by standard vectorized measurements. Indeed, in the SORS case, modewise measurements can actually successfully recover low-rank tensors using a much smaller number of measurements (see Figure 1, right column). In Figure 2, we show that the described memory advantages do not result in the need for a substantially increased number of iterations in order to achieve our convergence criteria. In the Gaussian case, the number of iterations needed when using modewise measurements is at most twice the number needed when using vectorized measurements. In the SORS case, modewise measurements actually require fewer iterations. Thus, modewise measurements are an effective, memory-efficient method of dimension reduction.
5. Proofs and theoretical guarantees
In this section, we state auxiliary results that link the RIP property on a set $\mathcal{S}$ with the covering number of $\mathcal{S}$, and establish covering number estimates for the subsets of interest.
5.1. Auxiliary Results: RIP estimates
For a set $\mathcal{S}$, we let $N(\mathcal{S}, \varepsilon)$ denote its covering number, i.e., the minimal cardinality of an $\varepsilon$-net contained in $\mathcal{S}$, such that every element of $\mathcal{S}$ is within distance $\varepsilon$ of an element of the net. For further discussion of covering numbers, please see [38]. The following proposition shows that estimates on the covering number of $\mathcal{S}$ can be used to show that maps constructed from subgaussian matrices have the RIP property. Its proof, which is a generalization of the proof of [33, Theorem 2], can be found in Appendix C.
Proposition 2.
Suppose that the matrix $A$ has i.i.d. subgaussian entries. Let $\mathcal{S}$ be a subset of unit-norm $d$-mode tensors, and let $N(\mathcal{S}, \varepsilon)$ denote the covering number of $\mathcal{S}$. Then, for any $\delta \in (0, 1)$ and

(25)

for some suitably chosen constant $c$, with probability at least $1 - \eta$, the map has the RIP$(\delta, \mathcal{S})$ property, i.e.,

$(1 - \delta) \|\mathbf{x}\|^2 \le \|A \mathbf{x}\|^2 \le (1 + \delta) \|\mathbf{x}\|^2 \quad \text{for all } \mathbf{x} \in \mathcal{S}.$