Modewise Operators, the Tensor Restricted Isometry Property, and Low-Rank Tensor Recovery

Recovery of sparse vectors and low-rank matrices from a small number of linear measurements is well known to be possible under various model assumptions on the measurements. The key requirement on the measurement matrices is typically the restricted isometry property, that is, approximate orthonormality when acting on the subspace to be recovered. Among the most widely used random matrix measurement models are (a) independent sub-gaussian models and (b) randomized Fourier-based models, which allow for the efficient computation of the measurements. For the now ubiquitous tensor data, direct application of the known recovery algorithms to the vectorized or matricized tensor is awkward and memory-heavy because of the huge measurement matrices that must be constructed and stored. In this paper, we propose modewise measurement schemes based on sub-gaussian and randomized Fourier measurements. These modewise operators act on pairs or other small subsets of the tensor modes separately. They require significantly less memory than measurements acting on the vectorized tensor, provably satisfy the tensor restricted isometry property, and, in our experiments, recover tensor data from fewer measurements without requiring impractical amounts of storage.

1. Introduction and prior work

Geometry-preserving dimension reduction has become important in a wide variety of applications over the last two decades due to improved sensing capabilities and the increasing prevalence of massive data sets. This is motivated in part by the fact that the data one collects often consists of high-dimensional representations of intrinsically simpler and effectively lower-dimensional data. In such settings, randomized linear projections have been demonstrated to preserve the intrinsic geometric structure of the collected data in a wide range of applications in both computer science (where one often deals with finite data sets [14, 1]) and signal processing (where manifold [4] and sparsity [17] assumptions are common). In this context, the vast majority of prior work has focused on vector data taking values in a given set, using random linear maps into a lower-dimensional space which are guaranteed to approximately preserve the norms of all elements of that set. The focus of this paper is extending this line of work to higher-order tensors.

In the vector case, uniform guarantees of approximate norm preservation for all sparse vectors, in the form of the restricted isometry property (RIP), have numerous applications. They include recovery algorithms that reconstruct all sparse vectors from a few linear measurements, such as ℓ1-minimization [12, 16, 29], orthogonal matching pursuit [40], CoSaMP [30, 16], iterative hard thresholding [8], and hard thresholding pursuit [15]. Extending these algorithms from sparse vector recovery to low-rank matrix or low-rank tensor recovery is very natural. Indeed, low-rank matrices (i.e., two-mode tensors) can be recovered from a small number of linear measurements [11, 17]. Extensions to the low-rank higher-order tensor setting, however, are less straightforward due to, e.g., the more complicated structure of the higher-order singular value decomposition and the non-uniqueness of the definition of tensor rank. Still, there are many applications that motivate the use of tensors, ranging from video and longitudinal imaging [26, 6] to machine learning [35, 37] and differential equations [5, 27]. Thus, while tensor applications are ubiquitous, and moreover the tensors arising in these applications are extremely large-scale, few methods exist that perform satisfactory tensor dimension reduction. Our goal here is thus to demonstrate a tensor dimension reduction technique that is computationally feasible (in terms of application and storage) and that guarantees preservation of geometry. As a motivating example, we consider the problem of tensor reconstruction from such dimension-reducing measurements; in particular, the Tensor Iterative Hard Thresholding method is used for this purpose herein.

In [33], the authors propose tensor extensions of the Iterative Hard Thresholding (IHT) method for several tensor decomposition formats, namely the higher-order singular value decomposition (HOSVD), the tensor train format, and the general hierarchical Tucker decomposition. Additionally, the recent papers [20, 19] extend the Tensor IHT method (TIHT) to low Canonical Polyadic (CP) rank and low Tucker rank tensors, respectively. TIHT, as the name suggests, is an iterative method consisting of one step that applies the adjoint of the measurement operator to the current residual and a second step that thresholds the resulting signal proxy to a low-rank tensor. This method has provable reconstruction guarantees under various geometry-preserving assumptions on the measurement maps [33, 20, 19]. All of these works, however, propose first reshaping a d-mode tensor into a (very long) vector and then multiplying by a measurement matrix with as many columns as the tensor has entries. Unfortunately, this means that the matrix must be even larger than the original tensor. The main goal of this paper is to propose a more memory-efficient alternative to this approach.

In particular, we propose a modewise framework for low-rank tensor recovery. A general two-stage modewise linear operator takes the form

(1)

where the first operator is a reshaping operator that reorganizes the input tensor into a tensor with fewer (but larger) modes, after which a measurement matrix is applied to each mode of the reshaped tensor via a modewise product (reviewed in Section 2); this is followed by an additional reshaping into a tensor with yet fewer modes, and finally by additional modewise products with a second set of measurement matrices. More general multi-stage modewise operators can be defined similarly. First analyzed in [21, 23] as an aid to the rapid computation of the CP decomposition, such modewise compression operators offer a wide variety of computational advantages over standard vector-based approaches (in which the first reshaping is a vectorization operator, the matrix applied to the resulting vector is a standard Johnson-Lindenstrauss map, and all remaining operators are identities). In particular, when the initial reshaping is more modest (or even the identity), the resulting modewise linear transforms can be formed using significantly fewer random variables (effectively, independent random bits) and stored using less memory by avoiding the use of a single massive measurement matrix. In addition, such modewise linear operators offer trivially parallelizable operations, faster serial data evaluations than standard vectorized approaches provide for structured data (see, e.g., [23]), and the ability to better respect the multimodal structure of the given tensor data.
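To make the memory comparison concrete, the following minimal NumPy sketch (with hypothetical sizes, and with Gaussian matrices used only for illustration; it omits the reshaping stages and is not the exact operator analyzed below) applies one small measurement matrix per mode via repeated modewise products and compares the number of stored entries against a single measurement matrix acting on the vectorized tensor with the same overall compression.

```python
import numpy as np

def mode_product(X, A, mode):
    """j-mode product: multiply the tensor X along `mode` by the matrix A."""
    Xm = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
    Y = (A @ Xm).reshape((A.shape[0],) + tuple(np.delete(X.shape, mode)))
    return np.moveaxis(Y, 0, mode)

def modewise_compress(X, mats):
    """Apply one measurement matrix per mode (no reshaping stage, for simplicity)."""
    for j, A in enumerate(mats):
        X = mode_product(X, A, j)
    return X

rng = np.random.default_rng(0)
n, m, d = 50, 10, 4
X = rng.standard_normal((n,) * d)
mats = [rng.standard_normal((m, n)) / np.sqrt(m) for _ in range(d)]
Y = modewise_compress(X, mats)                  # compressed to shape (10, 10, 10, 10)

# Storage: d small matrices versus one huge matrix acting on the vectorized tensor.
print(sum(A.size for A in mats))                # 4 * 10 * 50 = 2,000 entries
print((m ** d) * (n ** d))                      # 10^4 * 50^4 = 62,500,000,000 entries
```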

Related Work and Contributions: This paper is directly motivated by recent work on low-rank tensor recovery using vectorized measurements [33, 20, 19]. Given the framework that modewise measurement operators (1) provide for creating highly structured and computationally efficient measurement maps, we aim to provide both theoretical guarantees and empirical evidence that several modewise maps allow for the efficient recovery of tensors with low-rank HOSVD decompositions. This represents the first study of such modewise maps for performing norm-preserving dimension reduction of nontrivial infinite sets of elements in (tensorized) Euclidean spaces, and so provides a general framework for generalizing the use of such maps to other types of, e.g., low-rank tensor models. See Section 3 for the specifics of our theoretical results as well as Section 4 for an empirical demonstration of the good performance such modewise maps can provide for tensor recovery in practice.

Other recent work involving the analysis of modewise maps for tensor data includes, e.g., applications in kernel learning methods which effectively use modewise operators specialized to finite sets of rank-one tensors [2], as well as a variety of works in the computer science literature aimed at compressing finite sets of low-rank tensors (with respect to, e.g., the CP and tensor train decompositions [32]). More general results involving extensions of bounded orthonormal sampling results to the tensor setting [23, 3] apply to finite sets of arbitrary tensors. With respect to norm-preserving modewise embeddings of infinite sets, prior work has been limited to oblivious subspace embeddings (see, e.g., [21, 28]). Here, we extend these techniques to the set of all tensors with a low-rank HOSVD decomposition in order to obtain modewise embeddings with the Tensor Restricted Isometry Property (TRIP). Having obtained modewise TRIP operators, we then consider low-rank tensor recovery via Tensor IHT (TIHT).

Paper Outline: The rest of this paper is organized as follows. In Section 2, we provide a brief review of basic tensor definitions. In Section 3, we state our main results, which we then prove in Section 5. In Section 4, we discuss applications of our results to recovering low-rank tensors via TIHT and present numerical results. In Section 6, we provide a short conclusion and a discussion of directions for future work. Proofs of auxiliary results are provided in the appendices.

2. Tensor prerequisites

In this section, we briefly review some basic definitions concerning tensors. For a more thorough overview, we refer the reader to [24]. Let , be integers, and for all . For a multi-index , we will denote the -th entry of a -mode tensor by . When convenient we will also denote the entries by , , or . For the remainder of this work, we will use bold text to denote vectors (i.e., one-mode tensors), capital letters to denote matrices (i.e., two-mode tensors), and calligraphic text for all other tensors.

2.1. Modewise multiplication and j-mode products

For $j \in \{1, \dots, d\}$, the $j$-mode product of a $d$-mode tensor $\mathcal{X}$ with a matrix $A$ whose number of columns matches the size of the $j$-th mode of $\mathcal{X}$ is another $d$-mode tensor, denoted $\mathcal{X} \times_j A$. Its entries are given by

(2) $\qquad \left( \mathcal{X} \times_j A \right)_{i_1, \dots, i_{j-1}, \ell, i_{j+1}, \dots, i_d} \;=\; \sum_{i_j} A_{\ell, i_j}\, \mathcal{X}_{i_1, \dots, i_d}$

for all admissible indices. If $\mathcal{X}$ is a rank-one tensor of the form $\mathcal{X} = \mathbf{x}_1 \circ \mathbf{x}_2 \circ \cdots \circ \mathbf{x}_d$, where $\circ$ denotes the outer product, then one may use (2) to see that

(3) $\qquad \mathcal{X} \times_j A \;=\; \mathbf{x}_1 \circ \cdots \circ \mathbf{x}_{j-1} \circ (A \mathbf{x}_j) \circ \mathbf{x}_{j+1} \circ \cdots \circ \mathbf{x}_d .$

For further discussion of the properties of modewise products, please see [21, 24].
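As a quick sanity check of the rank-one identity (3), the NumPy snippet below (illustrative only; the helper name and sizes are ours) multiplies a rank-one three-mode tensor along its second mode and compares the result with the outer product in which the corresponding factor vector has been replaced by its image under the matrix.

```python
import numpy as np

def mode_product(X, A, mode):
    """j-mode product of the tensor X with the matrix A along the given mode."""
    Xm = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
    Y = (A @ Xm).reshape((A.shape[0],) + tuple(np.delete(X.shape, mode)))
    return np.moveaxis(Y, 0, mode)

rng = np.random.default_rng(1)
x1, x2, x3 = rng.standard_normal(4), rng.standard_normal(5), rng.standard_normal(6)
X = np.einsum('i,j,k->ijk', x1, x2, x3)        # rank-one tensor x1 ∘ x2 ∘ x3
A = rng.standard_normal((3, 5))

lhs = mode_product(X, A, mode=1)               # multiply along the second mode
rhs = np.einsum('i,j,k->ijk', x1, A @ x2, x3)  # same factors with x2 replaced by A x2
print(np.allclose(lhs, rhs))                   # True
```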

2.2. HOSVD decomposition and multilinear rank

Let be a multi-index in . We say that a -mode tensor has multilinear rank or HOSVD rank at most if there exist subspaces such that

where denotes the tensor product of the subspaces here. We note that a tensor has rank at most if and only if there exists a core tensor such that

(4)

where, for each , is an orthonormal basis for and is the matrix . A factorization of the form (4) is called a Higher-Order Singular Value Decomposition (HOSVD) of the tensor. It is well known (see, e.g., [33]) that we may assume that the core tensor has orthogonal subtensors in the sense that for all , we have for all , i.e.,

(5)

We also note that, since each of the form an orthonormal basis for , we have , where here denotes the trace inner product.
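For orientation, a factorization of the kind described in (4) can be written in standard Tucker/HOSVD notation as follows; this is a generic restatement in our own symbols, not necessarily the exact notation of (4).

```latex
% A d-mode tensor with multilinear (HOSVD) rank at most (r_1, ..., r_d):
\mathcal{X} \;=\; \mathcal{C} \times_1 U^{(1)} \times_2 U^{(2)} \,\cdots\, \times_d U^{(d)},
\qquad
\mathcal{C} \in \mathbb{R}^{r_1 \times \cdots \times r_d},
\quad
U^{(j)} \in \mathbb{R}^{n_j \times r_j},
\quad
\big(U^{(j)}\big)^{\!\top} U^{(j)} = I_{r_j},
% i.e., each factor matrix has orthonormal columns spanning the mode-j subspace.
```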

Remark 1.

The Canonical Polyadic (CP) rank of a -mode tensor is the minimum number of rank-one tensors (i.e., outer products of vectors) required to represent the tensor as a sum. If has HOSVD rank , then (4) implies that has CP rank at most . (In particular, if for all , then has CP rank at most .)

2.3. Restricted Isometry Properties and Tensors

Definition 1.

[TRIP property] We say that a linear map has the TRIP property if for all with HOSVD rank at most we have

(6)
Definition 2.

[RIP property] We say that a linear map has the RIP property if for all elements

(7)

We emphasize that the set in Definition 2 can be a subset of any vector space (not necessarily a tensor space).
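Written out explicitly, both definitions require the usual two-sided norm bound; the following is a generic restatement with our own symbols for the distortion parameter and the maps, since the exact notation of (6) and (7) is not reproduced above.

```latex
% TRIP(r, \delta): for every tensor X with HOSVD rank at most r = (r_1, ..., r_d),
(1 - \delta)\,\|\mathcal{X}\|^2 \;\le\; \|\mathcal{L}(\mathcal{X})\|_2^2 \;\le\; (1 + \delta)\,\|\mathcal{X}\|^2 .

% RIP(S, \delta): the same two-sided bound, required only for the elements of a fixed set S,
(1 - \delta)\,\|x\|^2 \;\le\; \|A x\|_2^2 \;\le\; (1 + \delta)\,\|x\|^2
\qquad \text{for all } x \in S .
```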

2.4. Reshaping and the HOSVD

For simplicity we will assume below, and for the rest of this paper, that there exist such that for . We note that this assumption is made only for the sake of clarity, and all of our analysis can be extended to the general case.

We let be an integer which divides , and consider the reshaping operator that flattens every consecutive modes of a tensor into one, thereby decreasing the total number of modes. Formally, the reshaping operator is defined to be the unique linear operator which acts on rank-one tensors as

where denotes the Kronecker product when applied to vectors. We observe that if a tensor has the form (4), then its reshaping is the -mode tensor with HOSVD rank at most given by

(8)

where the new component vectors are obtained by taking Kronecker products of the appropriate , and where is a reshaped version of from (4). Since each of the forms an orthonormal basis for , it follows that is an orthonormal basis for .
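The following NumPy sketch (function name and sizes are ours) implements one such reshaping, merging every two consecutive modes, and verifies the Kronecker-product behavior on rank-one tensors described above.

```python
import numpy as np

def reshape_pairs(X):
    """Flatten every two consecutive modes of X into a single mode (even number of modes)."""
    new_shape = tuple(X.shape[j] * X.shape[j + 1] for j in range(0, X.ndim, 2))
    return X.reshape(new_shape)

rng = np.random.default_rng(2)
x1, x2, x3, x4 = (rng.standard_normal(n) for n in (3, 4, 5, 6))
X = np.einsum('i,j,k,l->ijkl', x1, x2, x3, x4)     # rank-one four-mode tensor

# Merging modes (1,2) and (3,4) yields the outer product of Kronecker products.
R = reshape_pairs(X)
expected = np.outer(np.kron(x1, x2), np.kron(x3, x4))
print(R.shape, np.allclose(R, expected))           # (12, 30) True
```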

3. Main results: modewise TRIP

For , let be an matrix, and let be the linear map which acts modewise on -mode tensors by

(9)

Let be a mode tensor with HOSVD decomposition given by (4). By (3) and (8), we have that

(10)

Our first main result will show that satisfies the TRIP property under the assumption that each of the satisfies a restricted isometry property on the set defined below.

Definition 3.

[The set ] Consider a set of vectors in

(11)

and let . For the rest of this text we will let , and note that .

More precisely, we will show that satisfies the TRIP property under the assumption that each of the satisfies the RIP(, ) property, where is a suitably chosen parameter depending on . In the case where , this is nearly trivial. Indeed, if , and is the map defined in (9), then we have

Therefore, since , we immediately obtain the following proposition.

Proposition 1.

Suppose that is defined as per (9) and that each of the has the RIP property. Let and assume that . Then satisfies the TRIP property, that is,

for all with HOSVD rank .

Our first main result is the following theorem, which is the analogue of Proposition 1 for . It shows that if each of the satisfies the RIP property for a suitable value of , then has the TRIP property.

Theorem 1.

Suppose that is defined as per (9) and that each of the has the RIP property. Let , let , and assume that . Then satisfies the TRIP property, i.e.,

(12)

for all with HOSVD rank less than .

Proof.

See Section 5.3. ∎

The following corollary shows that we may pick the matrices to have i.i.d. sub-gaussian entries.

Corollary 1.

Let , and let . Suppose that is defined as per (9) and that each of the has i.i.d. sub-gaussian entries, for all , where for , and suppose that . Let

(13)

for a sufficiently large constant . Then satisfies the TRIP property (6) with probability at least .

Proof.

See Section 5.3. ∎

For another possible choice of the we consider the set of Subsampled Orthogonal with Random Sign matrices defined below. Note, in particular, that this class includes subsampled Fourier (i.e., discrete cosine and sine) matrices.

Definition 4 (Subsampled Orthogonal with Random Sign (SORS) matrices).

Let denote an orthonormal matrix obeying

(14)

for some . Let be a matrix whose rows are chosen i.i.d. uniformly at random from the rows of . We define a Subsampled Orthogonal with Random Sign (SORS) measurement ensemble as , where is a random diagonal matrix whose diagonal entries are i.i.d. ±1 with equal probability.
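As an illustration, the snippet below builds a SORS matrix from the orthonormal DCT-II matrix (any orthonormal matrix with uniformly small entries would do): flip the sign of each column at random and keep rows sampled uniformly at random. The sqrt(n/m) normalization is our assumption, chosen so that the map preserves norms on average.

```python
import numpy as np
from scipy.fft import dct

def sors_matrix(m, n, rng):
    """Subsampled Orthogonal with Random Sign (SORS) sketch based on the orthonormal DCT."""
    H = dct(np.eye(n), norm='ortho', axis=0)     # n x n orthonormal DCT-II matrix
    signs = rng.choice([-1.0, 1.0], size=n)      # diagonal of the random sign matrix D
    rows = rng.integers(0, n, size=m)            # m rows chosen i.i.d. uniformly at random
    return np.sqrt(n / m) * H[rows, :] * signs   # equals sqrt(n/m) * H_rows @ D

rng = np.random.default_rng(3)
A = sors_matrix(200, 1024, rng)
x = rng.standard_normal(1024)
print(np.linalg.norm(A @ x) / np.linalg.norm(x))  # typically close to 1
```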

Analogous to Corollary 1, the following result shows that we may choose our matrices to be SORS matrices in Theorem 1.

Corollary 2.

Let and let . Suppose that is defined as per (9) and that each of the is a SORS matrix with for a universal constant , as per Definition 4, for all , where for . Furthermore, suppose that . Let

(15)

where

(16)

and are sufficiently large absolute constants. Then satisfies the TRIP property (6) with probability at least .

Proof.

See Section 5.3. ∎

To further improve the embedding dimension of provided by Corollaries 1 and 2, we can apply a secondary compression, analogous to the one used in [21], by letting

(17)

where vect is a vectorization operator. In this case, we again wish to show that satisfies for suitably chosen parameters. One of the key challenges in doing this is that, for any given , the new factor vectors defined as in (10) are no longer orthogonal to one another. Therefore, (10) is not an HOSVD decomposition of , and the HOSVD rank of might be much larger than the HOSVD rank of . However, one may overcome this difficulty by observing that, with high probability, will belong to the following set of nearly orthogonal tensors.

Definition 5 (Nearly orthogonal tensors ).

Let be the set of d-mode tensors in that may be written in standard form (4) such that

  1. for all and ,

  2. for all ,

  3. the core tensor satisfies ,

  4. has orthogonal subtensors in the sense that (5) holds for all ,

  5. .
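For concreteness, here is a small NumPy sketch of the two-stage map (17) with hypothetical Gaussian matrices and sizes: a modewise stage as in (9), followed by vectorization and one further compression to the final target dimension.

```python
import numpy as np

def mode_product(X, A, mode):
    """j-mode product of the tensor X with the matrix A."""
    Xm = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
    Y = (A @ Xm).reshape((A.shape[0],) + tuple(np.delete(X.shape, mode)))
    return np.moveaxis(Y, 0, mode)

def two_stage_measure(X, mode_mats, Phi):
    """Stage 1: modewise compression as in (9); stage 2: vectorize and apply Phi as in (17)."""
    for j, A in enumerate(mode_mats):
        X = mode_product(X, A, j)
    return Phi @ X.reshape(-1)

rng = np.random.default_rng(4)
X = rng.standard_normal((30, 30, 30))                            # 27,000 entries
mode_mats = [rng.standard_normal((8, 30)) / np.sqrt(8) for _ in range(3)]
Phi = rng.standard_normal((100, 8 ** 3)) / 10.0                  # final compression to R^100
y = two_stage_measure(X, mode_mats, Phi)
print(y.shape)                                                   # (100,)
```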

Our next main result is the following theorem which shows that satisfies for suitably chosen parameters.

Theorem 2.

Let and let . Suppose that and are defined as in (9) and (17). Let and assume that satisfies the RIP property for all , where . Assume that satisfies the RIP property, where . Then, will satisfy the TRIP() property, i.e.,

for all with HOSVD rank at most .

Proof.

See Section 5.4. ∎

The following two corollaries show that we may choose the matrices and to be either sub-gaussian or SORS matrices. We also note that it is possible to produce other variants of these corollaries where, for example, one takes each to be sub-gaussian and lets be a SORS matrix.

Corollary 3.

Let and let . Suppose that and are defined as in (9) and (17), and that all of the have i.i.d. sub-gaussian entries for all , where , and suppose that . Let

(18)

and let also be a sub-gaussian matrix with i.i.d. entries with

(19)

Then, satisfies the TRIP() property, i.e.,

for all with HOSVD rank at most with probability at least .

Proof.

See Section 5.4. ∎

Remark 2.

Note that applying the reshaping operator (with ) is necessary in order for us to actually achieve dimension reduction in the first step. Indeed, if , then (18) requires . We also note that when the other parameters are held fixed, the final dimension will be required to be , , or . While the dependence on the number of modes is exponential, we are primarily interested in cases where is large in comparison to the rank or the number of modes. In this case, the terms involving will dominate the terms involving . In Section 4.1, we will show that TRIP-dependent tensor recovery methods (e.g., tensor iterative hard thresholding, discussed in Section 4) successfully work for and .

In [33], the authors considered i.i.d. sub-gaussian measurements applied to the vectorizations of low-rank tensors and proved that the property will hold with probability at least if the target dimension satisfies

We note this bound has the same computational complexity as ours with respect to , , and . While their result has much better dependence on , here, we are primarily interested in high-dimensional, low-rank tensors and therefore are primarily concerned with the dependence on .

Corollary 4.

Let and let . Suppose that and are defined as in (9) and (17), and that all of the are SORS matrices (as per Definition 4) for all , where . Furthermore, suppose that and, as in (15), let

(20)

Next, let also be a SORS matrix with

(21)

where

Then, satisfies the TRIP() property, i.e.,

holds for all with HOSVD rank at most with probability at least .

Proof.

See Section 5.4. ∎

Remark 3.

Similar to the sub-gaussian case, we note that reshaping (with ) is needed in order for us to achieve dimension reduction in the first compression. We also note that the final dimension is , , and .

4. Low-Rank Tensor Recovery

Low-rank tensor recovery is the task of recovering a low-rank (or approximately low-rank) tensor from a comparatively small number of possibly noisy linear measurements. This problem serves as a nice motivating example of where the use of modewise maps with the TRIP property can help alleviate the burdensome storage requirements of maps which require vectorization. Indeed, when the goal is compression, storing very large maps in memory as required by vectorization-based approaches is counterintuitive and often infeasible.

In the two-mode (matrix) case, the question of low-rank recovery from a small number of linear measurements is now well known to be possible under various model assumptions on the measurements [11, 10, 34]. One of the standard approaches is so-called nuclear-norm minimization,

$\min_{Z} \|Z\|_* \quad \text{subject to} \quad \mathcal{A}(Z) = y .$

Since the nuclear norm $\|\cdot\|_*$ is defined to be the sum of the singular values, it serves as a fairly good, computationally feasible proxy for the rank. As in classical compressed sensing, an alternative to optimization-based reconstruction is the use of iterative solvers. One such approach is the Iterative Hard Thresholding (IHT) method [8, 9, 36], which finds a solution via the alternating updates

(22) $\qquad X^{j+1} \;=\; \mathcal{H}_r\!\left( X^{j} + \mu\, \mathcal{A}^{*}\!\big( y - \mathcal{A}(X^{j}) \big) \right),$

where $X^{0}$ is initiated randomly. Here, $\mathcal{A}^{*}$ denotes the adjoint of the operator $\mathcal{A}$, and the function $\mathcal{H}_r$ is a thresholding operator which returns the closest rank-$r$ matrix. Results for IHT prove that sparse or low-rank recovery is guaranteed when the measurement operator satisfies various properties. For example, in the case of sparse vector recovery, the restricted isometry property is enough to guarantee accurate reconstruction [8]. In the low-rank matrix case, the measurements can be taken to be Gaussian [13], or to satisfy various analogues of the restricted isometry property [36, 7, 39]. In what follows, for the sake of simplicity, we will focus on the case where the step size $\mu$ is equal to one, which is referred to as Classical IHT. However, our results can also be extended to Normalized TIHT, where the step size takes a different value at each step (see [33] and the references provided there).

The iterative hard thresholding method has been extended to the tensor case [18, 33, 19]. In this problem, one aims to recover an unknown tensor with, e.g., HOSVD rank , where , from linear measurements of the form , where is a linear map from , with , and is an arbitrary noise vector. The iterates are given by the same updates as in (22). The primary difference from the matrix case is in the thresholding operator, which now approximately computes the best rank approximation of a given tensor. Unfortunately, exactly computing the best rank approximation of a general tensor is NP-hard. However, it is possible to construct an operator in such a way that

(23)

where is the true best rank approximation of . (For details, please see [33] and the references therein.) For the rest of this section, we will always assume that is constructed so as to satisfy (23).
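The sketch below (our own minimal implementation, with a truncated HOSVD standing in for the quasi-optimal thresholding operator of (23)) spells out the TIHT iteration (22) for an arbitrary measurement map supplied as a forward/adjoint pair of functions; sizes and ranks are hypothetical.

```python
import numpy as np

def mode_product(X, A, mode):
    Xm = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
    Y = (A @ Xm).reshape((A.shape[0],) + tuple(np.delete(X.shape, mode)))
    return np.moveaxis(Y, 0, mode)

def hosvd_truncate(X, ranks):
    """Quasi-optimal rank-(r_1,...,r_d) thresholding via a truncated HOSVD:
    project each mode onto the span of its r leading left singular vectors."""
    for j, r in enumerate(ranks):
        Xm = np.moveaxis(X, j, 0).reshape(X.shape[j], -1)
        U = np.linalg.svd(Xm, full_matrices=False)[0][:, :r]
        X = mode_product(X, U @ U.T, j)          # orthogonal projection in mode j
    return X

def tiht(y, measure, adjoint, shape, ranks, n_iter=300):
    """Classical TIHT (step size 1): X <- H_r( X + A*(y - A(X)) )."""
    X = np.zeros(shape)                          # zero start for simplicity (random also possible)
    for _ in range(n_iter):
        X = hosvd_truncate(X + adjoint(y - measure(X)), ranks)
    return X

# Tiny demo with a dense Gaussian map on the vectorized tensor (hypothetical sizes).
rng = np.random.default_rng(5)
shape, ranks = (12, 12, 12), (2, 2, 2)
G = rng.standard_normal((400, 12 ** 3)) / 20.0
measure = lambda X: G @ X.reshape(-1)
adjoint = lambda v: (G.T @ v).reshape(shape)
X_true = hosvd_truncate(rng.standard_normal(shape), ranks)
X_hat = tiht(measure(X_true), measure, adjoint, shape, ranks)
print(np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true))   # relative error; small on success
```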

The following theorem is the main result of [33]. It guarantees accurate reconstruction via TIHT when the measurement operator satisfies the TRIP property with a sufficiently small constant. Unfortunately, the condition (24) required by this theorem is a bit stronger than (23), which is guaranteed to hold. As noted in [33], getting rid of condition (24) appears to be difficult if not impossible. That said, (23) is a worst-case estimate, and the thresholding operator typically returns much better estimates. Moreover, numerical experiments show that the TIHT algorithm works well in practice, and indeed the condition (24) does often hold, especially in early iterations of the algorithm.

Theorem 3 ([33], Theorem 1).

Let , let satisfy TRIP with for some , and let and be defined as in (22). We assume that

(24)

Suppose where is an arbitrary noise vector. Then

where

Theorem 3 shows that low-rank tensor recovery is possible when the measurements satisfy the TRIP property. In [33], the authors also show it is possible to randomly construct maps which satisfy this property with high probability. Unfortunately, these maps require first vectorizing the input tensor into a -dimensional vector and then multiplying by an matrix. This greatly limits the practical use of such maps, since this matrix requires more memory than the original tensor. Thus, our results here for modewise TRIP are especially important and applicable in the tensor recovery setting. The following corollary, which shows that we may choose or (as in (9) or (17)), now follows immediately by combining Theorem 3 with Theorems 1 and 2.

Corollary 5.

Assume the operator , is defined in one of the following ways:

  1. , where is defined as per (9) and the matrices satisfy the RIP, and .

  2. defined as in (17), its component matrices satisfy RIP property, , and satisfies the property.

Consider the recovery problem from the noisy measurements where is an arbitrary noise vector. Let and let , and be defined as in (22), and assume that (24) holds. Then,

where

4.1. Experiments

In this section, we show that TIHT can be used with modewise measurement maps in order to successfully reconstruct low-rank tensors. Herein we present numerical results for recovery of random four-mode tensors in from both modewise Gaussian and SORS measurements.

We run the tensor iterative hard thresholding algorithm as defined in (22) to recover low-rank tensors from measurements for a variety of values. We compare the percentage of successfully recovered tensors from a batch of randomly generated low-rank tensors, as well as the average number of iterations used for recovery on the successful runs. We call a recovery process successful if the error between the true low-rank tensor and its initial random approximation decreases by a factor of at least in at most iterations. We compare standard vectorized measurements with the proposed two-step partially modewise measurements that reshape a four-mode tensor into a matrix, perform modewise measurements reducing each of the two reshaped modes to and , and then vectorize that result and compress it further to the target dimension . Herein we consider a variety of intermediate dimensions to demonstrate the stability of the advantage of the modewise measurements over the vectorized ones.
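For reference, random low-rank test tensors of the kind used here can be generated as a random core tensor multiplied modewise by random orthonormal factor matrices; the sketch below uses hypothetical sizes and ranks, since the exact experimental values are not reproduced above.

```python
import numpy as np

def random_low_rank_tensor(shape, ranks, rng):
    """Random 4-mode tensor with HOSVD rank at most `ranks`, normalized to unit norm."""
    core = rng.standard_normal(ranks)
    Qs = [np.linalg.qr(rng.standard_normal((n, r)))[0] for n, r in zip(shape, ranks)]
    X = np.einsum('abcd,ia,jb,kc,ld->ijkl', core, *Qs)   # core x_1 Q1 x_2 Q2 x_3 Q3 x_4 Q4
    return X / np.linalg.norm(X)

rng = np.random.default_rng(6)
X = random_low_rank_tensor((10, 10, 10, 10), (2, 2, 2, 2), rng)
print(X.shape, round(float(np.linalg.norm(X)), 3))        # (10, 10, 10, 10) 1.0
```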

In addition to the smaller memory required for storing modewise measurement matrices, we show (Figure 1) that modewise measurements are able to recover tensors from a compressed representation at least as small as the one required by standard vectorized measurements. Indeed, in the SORS case, modewise measurements can actually recover low-rank tensors successfully using a much smaller number of measurements (see Figure 1, right column). In Figure 2 we show that the described memory advantages do not come at the cost of a substantially increased number of iterations needed to achieve our convergence criterion. In the Gaussian case, the number of iterations needed when using modewise measurements is at most twice that needed when using vectorized measurements. In the SORS case, modewise measurements actually require fewer iterations. Thus, modewise measurements are an effective, memory-efficient method of dimension reduction.

Figure 1. Fraction of successfully recovered random tensors out of a random sample of tensors in with various intermediate dimensions. A run is considered successful if the approximation error dropped below times the initial approximation error in at most iterations.
Figure 2. Average number of iterations until convergence among the successful runs. A run is considered successful if the approximation error dropped below times the initial approximation error in at most iterations.

5. Proofs and theoretical guarantees

In this section, we will state auxiliary results that link the RIP property on a set with the covering number of , and establish the covering number estimates for the subsets of interest.

5.1. Auxiliary Results: RIP estimates

For a set , we let denote its covering number, i.e., the minimal cardinality of a net contained in such that every element of is within distance of an element of the net. For further discussion of covering numbers, please see [38]. The following proposition shows that estimates on the covering number of can be used to show that maps constructed from sub-gaussian matrices have the RIP property. Its proof, which is a generalization of the proof of [33, Theorem 2], can be found in Appendix C.

Proposition 2.

Suppose has i.i.d. sub-gaussian entries. Let be a subset of unit norm -mode tensors and let denote the covering number of . Then for any and

(25)

for some suitably chosen constant , with probability at least , the map has the RIP property, i.e.,