, is being extensively applied in various fields of artificial intelligence, including computer vision, image processing 
and machine learning. LRTR aims at recovering a low-rank tensor from noisy linear measurements , where is a random map from to with and
is a vector of measurement errors with noise level. (For simplicity, this letter considers only third-order tensors; all results can be extended with minor modifications to higher-order tensors.)
Achieving this goal is not easy. On the one hand, the naive approach of solving the nonconvex program
is NP-hard in general, where the operation
acts as a sparsity regularization of tensor singular values of. On the other hand, some existing tensor ranks, such as the CP rank  and the Tucker rank , do not work well: calculating the CP rank of a tensor is usually NP-hard , and the convex surrogate of the Tucker rank, the Sum of Nuclear Norms (SNN) , is not the tightest convex relaxation. To avoid these defects, Lu et al.  first considered the novel tensor tubal rank of (see Definition 1), denoted as , induced by the tensor-tensor product (t-product) 
and tensor Singular Value Decomposition (t-SVD)
and consider the following convex Tensor Nuclear Norm Minimization (TNNM) model
where is referred to as the Tensor Nuclear Norm (TNN) (see Definition 1), which has been proved to be the convex envelope of the tensor average rank within the unit ball of the tensor spectral norm . (The reference  indicates that the low average rank assumption is weaker than the low tubal rank assumption, i.e., a tensor with low tubal rank always has low average rank; the definition of the average rank can be found in .) To facilitate algorithm design and to meet the needs of practical applications, Zhang et al., in previous work , first presented a theoretical analysis of the Regularized Tensor Nuclear Norm Minimization (RTNNM) model, which takes the form
) when the noise level is not given or cannot be accurately estimated. The tensor Restricted Isometry Property (t-RIP) was first defined based on t-SVD in as an analysis framework for LRTR via (3). For an integer , the -tensor restricted isometry constant of a linear map is defined as the smallest constant satisfying
for all tensors whose tubal rank is at most . Moreover, our Theorem 4.1 in  shows that if satisfies the t-RIP with for certain , then the solution to (3) can robustly recover the low-tubal-rank tensor .
Note that Zhang et al.  have derived a deterministic condition of robust recovery for the RTNNM model (3) based on the t-RIP. Unfortunately, it is unknown how to construct a linear map that satisfies the t-RIP. The purpose of this paper is precisely to show, using probabilistic arguments, that such maps exist under suitable conditions on the number of measurements in terms of the tubal rank and the size of the tensor . We consider sub-Gaussian measurement ensembles, all of whose elements (tensors of size ) are drawn independently according to a sub-Gaussian distribution. This class includes Gaussian, Bernoulli and all bounded distributions. For such linear maps, the t-RIP holds with high probability in the stated parameter regime.
In 2018, Lu et al.  provided an exact recovery result based on the Gaussian width for the TNNM model (2). Specifically, they pointed out that the unknown tensor of size with tubal rank can be exactly recovered with high probability by solving (2) when the given number of Gaussian measurements is of the order . In 2019, Wang et al.  presented a generalized tensor Dantzig selector for the low-tubal-rank tensor recovery problem with noisy measurements , where is the noise term. They showed that whenever the sample size , the solution of the generalized tensor Dantzig selector satisfies with high probability. In the noiseless setting (i.e., ), their result reduces to that of Lu et al. All the recovery results mentioned so far are probabilistic. Some deterministic results involving a tensor RIP have also emerged in LRTR. In 2013, Shi et al.  proposed the first deterministic tensor condition, a tensor RIP based on the Tucker decomposition , which can guarantee that a given linear map can be utilized for LRTR. They showed that a tensor with Tucker rank- can be exactly recovered in the noiseless case if the linear map satisfies the tensor RIP with the constant for . Such a tensor RIP is hardly practical because it depends on a rank tuple that differs greatly from the familiar definition of matrix rank, so that some existing analysis tools and techniques cannot be used in the tensor case. Moreover, which linear maps satisfy such a tensor RIP is still an open problem.
In previous work , Zhang et al. used the t-RIP to answer under what conditions the robust solution to model (3) can be obtained. In this paper, we continue that work and answer a fundamental question: which linear maps satisfy the t-RIP? Our main contributions are summarized as follows:
Using arguments based on covering numbers and chaos processes as well as concentration inequalities, we determine how many random measurements are sufficient for a linear map to satisfy the t-RIP with high probability.
We consider a large class of sub-Gaussian distributions that includes Gaussian, Bernoulli and all bounded distributions, which makes the conclusions of this paper more general.
In order to verify our conclusions, we carry out numerical experiments that study how the ratio of successful recoveries varies as the number of measurements increases.
The remainder of the paper is organized as follows. In Section 2, we introduce some notations and definitions. In Section 3, some probabilistic tools needed for the proofs are given. In Section 4, our main results and their proofs are presented and discussed. Section 5 reports some numerical experiments that support our analysis. Section 6 concludes the paper.
2 Notations and preliminaries
For the sake of brevity, we list the main notations used later in Table 1. For a third-order tensor , let
be the discrete Fourier transform (DFT) of along the third dimension, i.e., . Using the inverse DFT, can be calculated from by . Let be the block diagonal matrix whose diagonal blocks are the frontal slices of , and let be the block circulant matrix, i.e.,
The operator and its inverse operator are, respectively, defined as
The tensor transpose  of , denoted as , is obtained by transposing each frontal slice and then reversing the order of the transposed frontal slices 2 through . The identity tensor  is the tensor whose first frontal slice is the identity matrix and whose other frontal slices are all zeros. For tensors and , the tensor-tensor product (t-product) , , is defined to be a tensor of size . An orthogonal tensor  is a tensor which satisfies . A tensor is called F-diagonal  if each of its frontal slices is a diagonal matrix.
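The operations just described can be illustrated numerically. The following is a minimal NumPy sketch, assuming the standard t-product convention (block circulant matrix times unfolded tensor, equivalently slice-wise matrix products after a DFT along the third dimension). The function names `t_product`, `t_transpose`, `identity_tensor`, `bcirc`, `unfold` and `fold` are chosen here for illustration and are not taken from the references.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x n2 x n3) and B (n2 x l x n3), computed in the Fourier domain."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    # Multiply matching frontal slices of the two transformed tensors.
    C_hat = np.einsum("ijk,jlk->ilk", A_hat, B_hat)
    return np.real(np.fft.ifft(C_hat, axis=2))

def t_transpose(A):
    """Transpose each frontal slice, then reverse the order of slices 2 through n3."""
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

def identity_tensor(n, n3):
    """First frontal slice is the n x n identity, all other slices are zero."""
    I = np.zeros((n, n, n3))
    I[:, :, 0] = np.eye(n)
    return I

def bcirc(A):
    """Block circulant matrix of size (n1*n3) x (n2*n3) built from the frontal slices."""
    n1, n2, n3 = A.shape
    M = np.zeros((n1 * n3, n2 * n3))
    for i in range(n3):
        for j in range(n3):
            M[i * n1:(i + 1) * n1, j * n2:(j + 1) * n2] = A[:, :, (i - j) % n3]
    return M

def unfold(A):
    """Stack the frontal slices vertically into an (n1*n3) x n2 matrix."""
    return np.concatenate([A[:, :, k] for k in range(A.shape[2])], axis=0)

def fold(M, n3):
    """Inverse of unfold: rebuild the tensor from the stacked frontal slices."""
    n1 = M.shape[0] // n3
    return np.stack([M[k * n1:(k + 1) * n1, :] for k in range(n3)], axis=2)
```

For real tensors, `fold(bcirc(A) @ unfold(B), n3)` and `t_product(A, B)` should agree up to rounding, which is why the Fourier-domain form is usually preferred in computations.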
| | A tensor. | | A subset of . | | The -th lateral slice of . |
| | A matrix. | | An -net of . | | The tube fiber of . |
| | A vector. | | The identity tensor. | | The transpose of . |
| | A scalar. | or | The -th entry of . | | The DFT of . |
| | A set. | or | The -th frontal slice of . | | . |
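With the above notations, we first introduce three basic concepts of tensor algebra which will be used later. [t-SVD ] Let $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$. The t-SVD factorization of the tensor $\mathcal{X}$ is
$$\mathcal{X} = \mathcal{U} * \mathcal{S} * \mathcal{V}^{T},$$
where $\mathcal{U} \in \mathbb{R}^{n_1 \times n_1 \times n_3}$ and $\mathcal{V} \in \mathbb{R}^{n_2 \times n_2 \times n_3}$ are orthogonal, and $\mathcal{S} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is an F-diagonal tensor. Figure 1 illustrates the t-SVD factorization.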
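As an illustration of the t-SVD, the factorization can be computed by taking a matrix SVD of every frontal slice in the Fourier domain. The sketch below is one possible NumPy realization under that assumption, not the implementation used in the cited works. The helper names are hypothetical, and only the non-redundant half of the spectrum is processed so that the returned factors are real.

```python
import numpy as np

def t_product(A, B):
    """t-product computed slice by slice in the Fourier domain."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    return np.real(np.fft.ifft(np.einsum("ijk,jlk->ilk", A_hat, B_hat), axis=2))

def t_transpose(A):
    """Transpose each frontal slice and reverse the order of slices 2 through n3."""
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

def t_svd(X):
    """Return real tensors U, S, V such that X = U * S * V^T under the t-product."""
    n1, n2, n3 = X.shape
    k = min(n1, n2)
    X_hat = np.fft.rfft(X, axis=2)           # non-redundant half of the spectrum
    nf = X_hat.shape[2]
    U_hat = np.zeros((n1, n1, nf), dtype=complex)
    S_hat = np.zeros((n1, n2, nf), dtype=complex)
    V_hat = np.zeros((n2, n2, nf), dtype=complex)
    idx = np.arange(k)
    for f in range(nf):
        slice_f = X_hat[:, :, f]
        if f == 0 or 2 * f == n3:
            slice_f = slice_f.real           # these slices are real when X is real
        u, s, vh = np.linalg.svd(slice_f)
        U_hat[:, :, f] = u
        S_hat[idx, idx, f] = s
        V_hat[:, :, f] = vh.conj().T
    # irfft supplies the conjugate-symmetric half, so the factors come out real.
    U = np.fft.irfft(U_hat, n=n3, axis=2)
    S = np.fft.irfft(S_hat, n=n3, axis=2)
    V = np.fft.irfft(V_hat, n=n3, axis=2)
    return U, S, V
```

Calling `U, S, V = t_svd(X)` on a real array `X` and forming `t_product(t_product(U, S), t_transpose(V))` should reproduce `X` up to rounding, and every frontal slice of `S` is diagonal.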
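[Tensor tubal rank ] For $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, the tensor tubal rank, denoted as $\operatorname{rank}_t(\mathcal{X})$, is defined as the number of nonzero singular tubes of $\mathcal{S}$, where $\mathcal{S}$ is from the t-SVD of $\mathcal{X} = \mathcal{U} * \mathcal{S} * \mathcal{V}^{T}$. We can write
$$\operatorname{rank}_t(\mathcal{X}) = \#\{ i : \mathcal{S}(i, i, :) \neq 0 \}.$$
[Tensor nuclear norm ] Let $\mathcal{X} = \mathcal{U} * \mathcal{S} * \mathcal{V}^{T}$ be the t-SVD of $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$. The tensor nuclear norm of $\mathcal{X}$ is defined as
$$\|\mathcal{X}\|_{*} := \sum_{i=1}^{r} \mathcal{S}(i, i, 1), \quad r = \operatorname{rank}_t(\mathcal{X}).$$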
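The tubal rank and the tensor nuclear norm can both be read off from the singular tubes. The following NumPy sketch assumes the common normalization in which the TNN equals the sum of the entries $\mathcal{S}(i,i,1)$, that is, $1/n_3$ times the sum of the matrix nuclear norms of the Fourier-domain frontal slices. The function names and the tolerance `tol` are illustrative choices.

```python
import numpy as np

def t_product(A, B):
    """t-product computed slice by slice in the Fourier domain."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    return np.real(np.fft.ifft(np.einsum("ijk,jlk->ilk", A_hat, B_hat), axis=2))

def singular_tubes(X):
    """Array of shape (min(n1, n2), n3) whose i-th row is the tube S(i, i, :)."""
    X_hat = np.fft.fft(X, axis=2)
    # Singular values of every Fourier-domain frontal slice, shape (n3, k).
    sv_hat = np.linalg.svd(np.moveaxis(X_hat, 2, 0), compute_uv=False)
    # An inverse DFT along the third mode gives the diagonal tubes of S.
    return np.real(np.fft.ifft(sv_hat.T, axis=1))

def tubal_rank(X, tol=1e-10):
    """Number of singular tubes whose Euclidean norm exceeds tol (tol is an assumption)."""
    tubes = singular_tubes(X)
    return int(np.sum(np.linalg.norm(tubes, axis=1) > tol))

def tensor_nuclear_norm(X):
    """Sum of the first-frontal-slice entries S(i, i, 1) of the t-SVD."""
    return float(np.sum(singular_tubes(X)[:, 0]))
```

A tensor built as `t_product(P, Q)` with `P` of size n1 x r x n3 and `Q` of size r x n2 x n3 generically has tubal rank r, which gives a simple way to generate test data.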
3 Probabilistic tools
This paper aims to answer which linear maps
satisfy the t-RIP. We will analyze this question from a more general perspective by considering the class of sub-Gaussian distributions. To this end, we first introduce some probabilistic tools that will be required for our results. [Sub-Gaussian random variables]A random variable is called sub-Gaussian if there exists a number such that the inequality
holds for all , and we denote that satisfies the above formula by . Sub-Gaussian distributions form a wide class of distributions that contains Gaussian, Bernoulli and all bounded distributions. For example, if
is a Gaussian random variable with zero mean and variance , then is also a sub-Gaussian random variable, i.e., . Therefore, we require that the distribution of all elements (tensors with size ) of the measurement ensemble is a sub-Gaussian distribution.
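To make the sub-Gaussian condition concrete, the sketch below checks a moment generating function bound of the form $\mathbb{E}\exp(tX) \le \exp(c^2 t^2 / 2)$ on a grid of values of $t$. This particular form of the inequality is an assumption made here for illustration, since several equivalent normalizations are in use. The closed-form moment generating functions of a Rademacher (symmetric Bernoulli) variable, a uniform variable on $[-1, 1]$ and a centered Gaussian are standard.

```python
import numpy as np

def mgf_rademacher(t):
    """E exp(tX) for X = +1 or -1 with probability 1/2 each."""
    return np.cosh(t)

def mgf_uniform(t):
    """E exp(tX) for X uniform on [-1, 1]: sinh(t)/t, equal to 1 at t = 0."""
    t = np.asarray(t, dtype=float)
    safe = np.where(t == 0.0, 1.0, t)
    return np.where(t == 0.0, 1.0, np.sinh(safe) / safe)

def mgf_gaussian(t, sigma):
    """E exp(tX) for X ~ N(0, sigma^2)."""
    return np.exp(sigma**2 * t**2 / 2)

def subgaussian_bound(t, c):
    """Right-hand side exp(c^2 t^2 / 2) of the assumed sub-Gaussian inequality."""
    return np.exp(c**2 * t**2 / 2)

def satisfies_bound(mgf_values, t, c, rtol=1e-12):
    """Check the inequality on the grid, with a small relative slack for rounding."""
    return bool(np.all(mgf_values <= subgaussian_bound(t, c) * (1 + rtol)))

t = np.linspace(-5, 5, 201)
print(satisfies_bound(mgf_rademacher(t), t, c=1.0))
print(satisfies_bound(mgf_uniform(t), t, c=1.0))
print(satisfies_bound(mgf_gaussian(t, sigma=2.0), t, c=2.0))
```

A grid check of this kind is only a sanity test on a finite range of $t$; it does not replace the analytic argument.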
Next we provide some theoretical tools for the analysis of our main results, including -nets, covering numbers, the -functional and concentration inequalities. [-net ] For a metric space , and , if each element in is within distance () of some element of , i.e.
then the subset is referred to as an -net of , denoted .
Throughout the article, we consider that and is the Euclidean distance, i.e. .
[Covering numbers ] Let be a subset of a metric space . For , the covering number of is defined as the smallest possible cardinality of an -net of . [Covering numbers and volume ] Let be a subset of a metric space . Then for , we have
where is the volume in and is Euclidean ball with radius .
Note that when is a unit Euclidean ball in dimensions (or it is the surface of the unit Euclidean ball), is contained in the ball. If we assume that , then we have the following crucial inequality,
which will be used repeatedly.
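The volumetric argument can be illustrated by building a net greedily. The sketch below assumes the classical form of the bound, namely that a maximal $\epsilon$-separated subset of the unit Euclidean ball in $n$ dimensions is an $\epsilon$-net with at most $(1 + 2/\epsilon)^n \le (3/\epsilon)^n$ points for $\epsilon \le 1$. The function names and the sampling of the ball are illustrative only.

```python
import numpy as np

def sample_unit_ball(num, dim, rng):
    """Draw points uniformly from the unit Euclidean ball in R^dim."""
    g = rng.standard_normal((num, dim))
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    radius = rng.random(num) ** (1.0 / dim)
    return g * radius[:, None]

def greedy_net(points, eps):
    """Greedily keep points that are more than eps away from all points kept so far.

    The result is eps-separated, and every input point lies within eps of it,
    so it is an eps-net of the sampled point set.
    """
    net = []
    for p in points:
        if all(np.linalg.norm(p - q) > eps for q in net):
            net.append(p)
    return np.array(net)

rng = np.random.default_rng(3)
pts = sample_unit_ball(2000, 2, rng)
eps = 0.25
net = greedy_net(pts, eps)
print(len(net), (1 + 2 / eps) ** 2, (3 / eps) ** 2)
```

The net covers only the sampled points, not the whole ball, but its cardinality is still controlled by the same volume comparison because the net points are pairwise more than $\epsilon$ apart.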
It is useful to observe that the tensor restricted isometry constants can be expressed as a random variable as follows
where is a set of matrices and is a sub-Gaussian vector. In order to obtain deviation bounds for random variables of this form in terms of a complexity parameter of the set of matrices , we need to introduce this complexity parameter, namely Talagrand’s -functional. [-functional [28, 27, 26]] Given a metric space , a collection of subsets of , , is referred to as an admissible sequence if and for every . Then, for any , the -functional of is defined by
where the infimum is taken in regard to all admissible sequences of and .
In this paper, we mainly focus on the -functional of a set of matrices with the operator norm. The proof of our results requires bounding the -functional in terms of covering numbers. To do this, we will use the Schatten norms, which are defined as follows:
and are defined as the Schatten norms of a given matrix , and
is defined as the radius of any set of matrices. In particular, . With these notions, for a given metric space and with the covering number , the Dudley-type integral yields the following bound on the -functional
where is a universal constant.
In CS, the following concentration inequality, which involves the -functional, is often adopted to estimate the deviation bound of . We will also make use of this important result. []Suppose that is a random vector whose entries are independent sub-Gaussian random variables with mean 0 and variance 1. Let be a set of matrices, and
Then, there exist constants , depending only on such that for all ,
4 Main results
In this section, we will show that the t-RIP (4) holds with high probability for linear maps drawn from a large class of random distributions, provided that the number of measurements is sufficiently large. We first compute the covering number of the set of tensors whose tubal rank is at most and whose Frobenius norm is . [Covering number for low-tubal-rank tensors] For a set
there exists an -net in regard to the Frobenius norm obeying
Here we take the proof strategy of Lemma 3.1 in  and modify it to accommodate our t-SVD. For any , we have the skinny t-SVD
where and are two orthogonal tensors and is an F-diagonal tensor. Since
we have
We first construct -nets for sets of , and respectively, and then achieve the purpose of covering . Without loss of generality, we may assume that since the adjustments for the general case will be obvious.
Let be the set of F-diagonal tensors whose first frontal slice has nonnegative and nonincreasing diagonal entries. According to Lemma 3 and (5), there exists an -net for with . Next, we let and use the notation to denote the -th lateral slice of , i.e., a tensor in . Definition 3.6 in  shows that is an orthogonal tensor if and only if the lateral slices form an orthonormal set of matrices with . Therefore, it is easy to see that is a subset of the unit ball under the following norm
Hence, due to (5), there is an -net for satisfying . Then we can construct an -net
such that the covering number of the corresponding set satisfies
It remains to prove that is an -net for the set , i.e., . In other words, we need to prove that for any , there exists with .
Next, let with
where satisfy , , and . Then we have
where the first inequality uses the triangle inequality. Since Frobenius norm has the property of being invariant under orthogonal multiplication and , are two orthogonal tensors, we thus obtain
Similarly, we find that . Thus, we conclude that . This completes the proof. ∎
Lemma 4 provides the volumetric bound (7) on the covering number of the collection of low-tubal-rank tensors of interest, which plays a key role in the proof of Theorem 4. Besides, note that the proof of Lemma 4 is based on the t-product and the t-SVD, whose definitions are consistent with the matrix case. Thanks to the good properties of the t-product and the t-SVD, the bound (7) reduces to the corresponding result for low-rank matrices  when .
We are now in a position to state our main results. Fix and let be any given third-order tensor whose tubal rank is at most . Then a random draw of a sub-Gaussian measurement ensemble satisfies with probability at least provided that
where the constant only depends on the sub-Gaussian parameter.
Given a tensor and a measurement ensemble , we can construct a matrix of size as follows
with being the vectorized version of the tensor . The measurements are then obtained by means of an -dimensional random vector whose entries are independent sub-Gaussian random variables with mean 0 and variance 1, that is
Recall that the tensor restricted isometry constant can be expressed as
where . In order to apply Lemma 3 to estimate the probabilistic bound for the above expression, we define the set in Lemma 3. It remains to estimate the radii , , and of the set and the complexity parameter, Talagrand’s functional . Clearly, is on account of for all . In addition, since the operator norm of a block-diagonal matrix is the maximum of the operator norms of its diagonal blocks and the operator norm of a vector is its norm, we see that
Thus, we have . And because of
for all , we obtain
where is a universal constant. Let us now compute the constants , and in Lemma 3. This gives
Applying Lemma 3 and letting , we conclude that if
then and hold. This completes the proof. ∎
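The block-diagonal reformulation used in the proof can be checked numerically. The sketch below assumes that the matrix built from a tensor places its vectorization on each of the $m$ diagonal blocks and carries a $1/\sqrt{m}$ normalization. Both the normalization and the names `block_diag_from_tensor`, `xi` and `A` are assumptions made for illustration. Under this assumption the product with a long random vector reproduces ordinary random measurements, the Frobenius norm of the matrix equals the Frobenius norm of the tensor, and its operator norm is that value divided by $\sqrt{m}$.

```python
import numpy as np

def block_diag_from_tensor(X, m):
    """Return the m x (m*n) block-diagonal matrix with vec(X)^T / sqrt(m) in each block.

    n = n1*n2*n3. The vectorization order is irrelevant for the norms computed here.
    """
    x = X.reshape(-1)
    n = x.size
    V = np.zeros((m, m * n))
    for i in range(m):
        V[i, i * n:(i + 1) * n] = x
    return V / np.sqrt(m)

rng = np.random.default_rng(4)
X = rng.standard_normal((2, 3, 4))
m = 5
n = X.size
xi = rng.standard_normal(m * n)      # long random vector with mean 0 and variance 1
V = block_diag_from_tensor(X, m)
A = xi.reshape(m, n)                 # the same randomness viewed as an m x n matrix

print(np.allclose(V @ xi, A @ X.reshape(-1) / np.sqrt(m)))
print(np.linalg.norm(V, "fro"), np.linalg.norm(X))
print(np.linalg.norm(V, 2), np.linalg.norm(X) / np.sqrt(m))
```

Writing the measurements in this form is what allows a chaos-process concentration inequality for a set of matrices to be applied to the set of normalized low-tubal-rank tensors.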
Theorem 4 tells us that a random sub-Gaussian measurement ensemble obeys (4) with high probability. Sub-Gaussian distributions form a large class of random distributions, including Gaussian, Bernoulli and all bounded distributions. Thus, in some sense, Theorem 4 completely characterizes the behavior of numerous random measurement ensembles in terms of the t-RIP. Note that an tensor with tubal rank has at most degrees of freedom. So the required number of measurements is very reasonable and nearly optimal compared with the degrees of freedom. It is worth mentioning that there exists a similar conclusion (refer to Theorem 2 in ) motivated by some special tensor decompositions. However, our Theorem 4 improves on the result in  by a factor of ( denotes the order of a tensor) and implies that one only needs a constant number of measurements per degree of freedom of the underlying rank- tensor in order to obtain the t-RIP at rank . In addition, if , the third-order tensor reduces to a second-order tensor, i.e., a matrix. Accordingly, the tensor tubal rank reduces to the matrix rank, and the t-RIP reduces to Definition 2.1 in . Thus the required number of measurements for random sub-Gaussian measurement ensembles in Theorem 4 includes the result of Theorem 2.3 in  for LRMR.
The following corollary is immediate but covers an important special case of Theorem 4. Let be a Gaussian or Bernoulli measurement ensemble. Then there exists a universal constant such that the tensor restricted isometry constant of satisfies with probability at least provided that
In CS and LRMR, Gaussian and Bernoulli random matrices are often used as universal measurement matrices (ensembles) because they satisfy the vector RIP with high probability. Accordingly, Corollary 4 guarantees that Gaussian or Bernoulli measurement ensembles can also be used for LRTR. The proof of Corollary 4 is trivial and is omitted here.
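The near-isometry behaviour of Gaussian and Bernoulli ensembles can be observed empirically on a finite sample of low-tubal-rank tensors. The sketch below is only a sanity check under stated assumptions: the ensemble is represented as an $m \times n_1 n_2 n_3$ matrix with entries scaled by $1/\sqrt{m}$, the test tensors are t-products of Gaussian factors, and the function names are illustrative. It does not compute the restricted isometry constant itself, which is a supremum over all tensors of bounded tubal rank.

```python
import numpy as np

def low_tubal_rank_tensor(n1, n2, n3, r, rng):
    """Unit-Frobenius-norm tensor of tubal rank at most r, built as a t-product P * Q."""
    P_hat = np.fft.fft(rng.standard_normal((n1, r, n3)), axis=2)
    Q_hat = np.fft.fft(rng.standard_normal((r, n2, n3)), axis=2)
    X = np.real(np.fft.ifft(np.einsum("ijk,jlk->ilk", P_hat, Q_hat), axis=2))
    return X / np.linalg.norm(X)

def sample_ensemble(m, n, kind, rng):
    """m x n matrix with i.i.d. Gaussian or +/-1 Bernoulli entries, scaled by 1/sqrt(m)."""
    if kind == "gaussian":
        A = rng.standard_normal((m, n))
    elif kind == "bernoulli":
        A = rng.choice([-1.0, 1.0], size=(m, n))
    else:
        raise ValueError("kind must be 'gaussian' or 'bernoulli'")
    return A / np.sqrt(m)

def isometry_ratios(A, tensors):
    """Ratios ||A vec(X)||_2^2 / ||X||_F^2 for each tensor in the list."""
    return np.array([np.linalg.norm(A @ X.reshape(-1)) ** 2 / np.linalg.norm(X) ** 2
                     for X in tensors])

rng = np.random.default_rng(6)
tensors = [low_tubal_rank_tensor(8, 8, 4, 2, rng) for _ in range(20)]
for kind in ("gaussian", "bernoulli"):
    A = sample_ensemble(1500, 8 * 8 * 4, kind, rng)
    ratios = isometry_ratios(A, tensors)
    print(kind, ratios.min(), ratios.max())
```

With many more measurements than degrees of freedom, the printed ratios should cluster around 1, in line with the qualitative statement of the corollary.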
5 Numerical experiments
In CS, it has been proved that directly verifying the vector RIP  for a specific random matrix is NP-hard . Similarly, it seems very difficult to check whether a given instance of a random measurement ensemble obeys the t-RIP. Therefore, in this section we conduct several numerical experiments to corroborate our main results indirectly.
We present numerical results for the recovery of third-order tensors with different problem setups, i.e., different tensor sizes , tubal ranks , measurement ensembles and sampling rates . We compute to obtain the noisy linear measurements instead of , where is a long vector obtained by stacking the columns of . In all experiments,
is Gaussian white noise with mean and variance . We consider two sizes of and different tubal ranks: (a) , , , , ; (b) , , , , . is a measurement matrix with i.i.d. zero-mean Gaussian entries having variance or i.i.d. Bernoulli entries, i.e., . Then the RTNNM model (3) can be reformulated as
We adopt Algorithm 1 in  to solve (8). Following the experimental results in , the regularization parameter is set to . We regard the tensor as a successful reconstruction of the original tensor from the measurements if the relative error satisfies .
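The experimental protocol can be sketched as follows. This is a hypothetical harness, not the authors' code: the entry variance $1/m$, the column-stacking vectorization, the tolerance `tol` and all function names are assumptions, and the solver is passed in as a callable. A plain least-squares solver is used as a stand-in in the usage lines, purely to exercise the harness in an overdetermined noiseless setting; it is not the RTNNM algorithm referred to above.

```python
import numpy as np

def low_tubal_rank_tensor(n1, n2, n3, r, rng):
    """Tensor of tubal rank at most r, built as the t-product of two Gaussian tensors."""
    P_hat = np.fft.fft(rng.standard_normal((n1, r, n3)), axis=2)
    Q_hat = np.fft.fft(rng.standard_normal((r, n2, n3)), axis=2)
    return np.real(np.fft.ifft(np.einsum("ijk,jlk->ilk", P_hat, Q_hat), axis=2))

def make_problem(n1, n2, n3, r, m, sigma, rng, kind="gaussian"):
    """Return (X, M, y) with y = M vec(X) + w and vec stacking columns (Fortran order)."""
    X = low_tubal_rank_tensor(n1, n2, n3, r, rng)
    n = n1 * n2 * n3
    if kind == "gaussian":
        M = rng.standard_normal((m, n)) / np.sqrt(m)
    else:
        M = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)
    w = sigma * rng.standard_normal(m)
    y = M @ X.reshape(-1, order="F") + w
    return X, M, y

def relative_error(X_hat, X):
    """Frobenius-norm relative error between the estimate and the ground truth."""
    return np.linalg.norm(X_hat - X) / np.linalg.norm(X)

def success_rate(solver, trials, tol, seed=0, **problem):
    """Fraction of trials in which solver(M, y, shape) reaches relative error <= tol."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        X, M, y = make_problem(rng=rng, **problem)
        X_hat = solver(M, y, X.shape)
        hits += int(relative_error(X_hat, X) <= tol)
    return hits / trials

def lstsq_solver(M, y, shape):
    """Stand-in solver (ordinary least squares); only meaningful when m >= n1*n2*n3."""
    x_hat = np.linalg.lstsq(M, y, rcond=None)[0]
    return x_hat.reshape(shape, order="F")

rate = success_rate(lstsq_solver, trials=3, tol=1e-8,
                    n1=4, n2=4, n3=3, r=1, m=60, sigma=0.0)
print(rate)
```

To reproduce curves of success rate versus sampling rate, one would replace `lstsq_solver` by a solver for the regularized model and sweep `m` over the desired range.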
Figure 2 and Figure 3 show the success rate of recovery in trials versus the sampling rate for the random Gaussian measurement ensemble and the random Bernoulli measurement ensemble, respectively. The minimum sampling rate required by theory (the minimum required number of measurements, i.e., ) for successful recovery is indicated by the vertical lines. All of the cases consistently show that the unknown tensor of size with tubal rank can be successfully recovered by solving (3) when the given number of measurements . This conclusion, combined with Theorem 4.1 in , indirectly verifies our Theorem 1. However, Figure 2 and Figure 3 also show a small gap between the number of measurements required by theory and that required in the experiments. This gap is acceptable because many factors in the experiments, such as the choice of algorithm and the parameter settings, may contribute to it.