A novel nonconvex approach to recover the low-tubal-rank tensor data: when t-SVD meets PSSV

12/15/2017 ∙ by Tai-Xiang Jiang, et al.

In this paper, we focus on a recently developed tensor decomposition scheme named the tensor singular value decomposition (t-SVD). The t-SVD not only provides properties similar to those of the matrix SVD, but also converts tensor tubal-rank minimization into matrix rank minimization in the Fourier domain. Generally, minimizing the tensor nuclear norm (TNN) may cause some bias. In this paper, to alleviate this bias phenomenon, we propose to minimize the partial sum of the tensor nuclear norm (PSTNN) in place of the tensor nuclear norm. The novel PSTNN is applied to the problems of tensor completion (TC) and tensor robust principal component analysis (TRPCA). Experiments are conducted on synthetic data and real-world data, and the results demonstrate that the proposed algorithms outperform TNN-based methods.


1 Introduction

The tensor is an important format for multidimensional data, which plays an increasingly significant role in a wide range of real-world applications, e.g., color image and video processing (1; 2; 3; 4; 5), hyperspectral data processing (6; 7; 8; 9), personalized web search (10; 11), high-order web link analysis (12), magnetic resonance imaging (MRI) data recovery (13), seismic data reconstruction (14), and face recognition (15). How to characterize and utilize the internal structural information of these multidimensional data is of crucial importance.

In matrix processing, low-rank models can robustly and efficiently handle two-dimensional data from various sources (16; 17; 18; 19; 20; 21; 22; 23; 24). Generalized from the matrix format, a tensor can encode more essential structural information, making it a powerful tool for dealing with multi-modal and multi-relational data (25; 26; 27). Unfortunately, it is not easy to directly extend low-rankness from matrices to tensors. More precisely, there is no exact (or unique) definition of the tensor rank. The most popular rank definitions are the CANDECOMP/PARAFAC (CP) rank and the Tucker rank (28).

Figure 1: Illustrations of (a) the Tucker decomposition and (b) the CP factorization of a third-order tensor.

Actually, the CP rank and the Tucker rank are both defined based on their corresponding decompositions. For a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, its CP decomposition can be written as

$$\mathcal{X} = \sum_{r=1}^{R} a_r \circ b_r \circ c_r, \qquad (1)$$

where the symbol "$\circ$" represents the vector outer product, $R$ is a positive integer, and $a_r \in \mathbb{R}^{n_1}$, $b_r \in \mathbb{R}^{n_2}$, and $c_r \in \mathbb{R}^{n_3}$ for $r = 1, \dots, R$. Then, the smallest positive integer $R$, i.e., the smallest number of outer products of 3 vectors (denoted as rank-one tensors in (28)) that generate $\mathcal{X}$, is the CP rank of $\mathcal{X}$. Meanwhile, the Tucker decomposition of a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is

$$\mathcal{X} = \mathcal{G} \times_1 U_1 \times_2 U_2 \times_3 U_3, \qquad (2)$$

where the symbol "$\times_k$" stands for the mode-$k$ product (please see details in Section 2), $\mathcal{G}$ is called the core tensor, and $U_1$, $U_2$, and $U_3$ are factor matrices. Then, the Tucker rank (also denoted as the "$n$-rank" in some literature) is defined as the vector $\big(\operatorname{rank}(X_{(1)}), \operatorname{rank}(X_{(2)}), \operatorname{rank}(X_{(3)})\big)$, where $X_{(k)}$ is the mode-$k$ unfolding of $\mathcal{X}$. The Tucker decomposition and the CP decomposition are illustrated in Fig. 1.

In this paper, we fix attention on a recently developed novel tensor decomposition scheme named the tensor singular value decomposition (t-SVD), which has been well studied in (29; 30; 31; 32; 33). Furthermore, in (34; 35), the bounds and conditions for recovery of corrupted tensors have been well analyzed for the tensor completion and tensor robust principal component analysis problems, respectively. The t-SVD is based on a new definition of the tensor-tensor product, which enjoys many properties similar to the matrix case (please see Section 2.2 for details). For a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, its t-SVD is given by

$$\mathcal{X} = \mathcal{U} * \mathcal{S} * \mathcal{V}^H, \qquad (3)$$

where the symbol "$*$" denotes the tensor-tensor product (see more details in Sec. 2.2), $\mathcal{U} \in \mathbb{R}^{n_1 \times n_1 \times n_3}$, $\mathcal{S} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, and $\mathcal{V} \in \mathbb{R}^{n_2 \times n_2 \times n_3}$.

Figure 2: The t-SVD factorization of an $n_1 \times n_2 \times n_3$ tensor.

Figure 2 exhibits the t-SVD scheme. Then, the tensor tubal-rank is defined as the number of non-zero singular tubes of $\mathcal{S}$. Hence, the tensor nuclear norm (TNN, defined in Sec. 2.2) is adopted in (34; 35) as a convex relaxation of the tensor tubal-rank.

The relationship between the tubal-rank and the CP rank is that a low CP rank tensor is indeed a low tubal-rank tensor. Following the analysis in (34), if we take the FFT along the third dimension of a tensor $\mathcal{X}$ with CP rank $r$, each frontal slice of the resulting tensor $\bar{\mathcal{X}}$ has rank at most $r$. It implies that if a tensor is of CP rank $r$, its tubal-rank is at most $r$. Thus, for a third-order tensor with low CP rank, we can recover it using the t-SVD structure. The relationship between the tubal-rank and the Tucker rank is not explicit; therefore, the performance of a Tucker rank based method is brought into comparison in our numerical experiments. The experimental results reveal the superiority of the tubal-rank over the Tucker rank.

It should be noted that the t-SVD not only provides properties similar to the matrix case but also converts tensor tubal-rank minimization into matrix rank minimization in the Fourier domain. Meanwhile, although the matrix nuclear norm selected in the Fourier domain (34; 35) is tractable, it causes some unavoidable biases (23; 24). First, the nuclear norm minimizes not only the rank of an underlying matrix but also its variance, by simultaneously minimizing all of its singular values. Second, if the ground truth matrix has a large variance but a sparse distribution within the ground truth subspace, some inliers can be regarded as outliers in order to reduce the singular values within the target rank. For a more detailed analysis, please refer to (24). Therefore, there is still room to further enhance the potential capacity and efficiency of these t-SVD methods.

To alleviate this bias phenomenon caused by a convex surrogate, nonconvex relaxations of the matrix nuclear norm (36; 37) are reasonable options. In this paper, we propose to minimize the partial sum of the tensor nuclear norm (PSTNN) in place of the tensor nuclear norm.

The main contributions of this paper are threefold. First, on the foundation of a nonconvex surrogate of the matrix rank, we propose a novel nonconvex approximation of the tensor tubal-rank, the PSTNN, with superior performance to TNN. To the best of our knowledge, it is the first nonconvex approach under the t-SVD scheme. Second, to minimize the proposed PSTNN, we extend the partial singular value thresholding (PSVT) operator, originally proposed in (23), to matrices in the complex field, and show that it is the exact solution to the PSTNN minimization problem. Third, we apply the PSTNN to two typical tensor recovery problems and propose the PSTNN based tensor completion (PSTNN-TC) model and the PSTNN based tensor robust principal component analysis (PSTNN-RPCA) model. Two efficient alternating direction method of multipliers (ADMM) algorithms are designed to solve the models by using the PSVT solver. Moreover, numerical experiments are conducted on synthetic data and real-world data, and the experimental results demonstrate the effectiveness and robustness of the proposed PSTNN based models.

The outline of this paper is given as follows. In Section 2, some preliminary background on tensors is given. In Section 3, the main result is presented. Experimental results are reported in Section 4. Finally, we draw some conclusions in Section 5.

2 Notation and preliminaries

In this section, before going to the main result, we first briefly introduce basic tensor notations and definitions and then give the detailed definitions related to the t-SVD scheme.

2.1 Basic tensor notations and definitions

Following (28), we use lowercase letters for scalars, e.g., $x$, boldface lowercase letters for vectors, e.g., $\mathbf{x}$, boldface uppercase letters for matrices, e.g., $\mathbf{X}$, and boldface calligraphic letters for tensors, e.g., $\mathcal{X}$. Generally, an $N$-mode tensor is defined as $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_N}$, and $x_{i_1 i_2 \cdots i_N}$ is its $(i_1, i_2, \dots, i_N)$-th component.

Fibers are defined by fixing every index but one. Third-order tensors have column, row, and tube fibers, denoted by $\mathcal{X}(:, j, k)$, $\mathcal{X}(i, :, k)$, and $\mathcal{X}(i, j, :)$, respectively. When extracted from the tensor, fibers are always assumed to be oriented as column vectors.

Slices are two-dimensional sections of a tensor, defined by fixing all but two indices. The horizontal, lateral, and frontal slices of a third-order tensor $\mathcal{X}$ are denoted by $\mathcal{X}(i, :, :)$, $\mathcal{X}(:, j, :)$, and $\mathcal{X}(:, :, k)$, respectively. The $k$-th frontal slice of a third-order tensor, $\mathcal{X}(:, :, k)$, may alternatively be denoted as $\mathcal{X}^{(k)}$ in this paper.

The inner product of two same-sized tensors $\mathcal{X}$ and $\mathcal{Y}$ is defined as $\langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1, i_2, \dots, i_N} x_{i_1 i_2 \cdots i_N}\, y_{i_1 i_2 \cdots i_N}$. The corresponding norm (Frobenius norm) is then defined as $\|\mathcal{X}\|_F = \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle}$.

The mode-$k$ unfolding of a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_N}$ is denoted as $X_{(k)} \in \mathbb{R}^{n_k \times \prod_{j \ne k} n_j}$, where the tensor element $(i_1, \dots, i_N)$ maps to the matrix element $(i_k, j)$ satisfying $j = 1 + \sum_{l=1, l \ne k}^{N} (i_l - 1) J_l$ with $J_l = \prod_{m=1, m \ne k}^{l-1} n_m$. The inverse operator of unfolding is denoted as "fold", i.e., $\mathcal{X} = \operatorname{fold}_k(X_{(k)})$.

The $k$-mode (matrix) product of a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_N}$ with a matrix $U \in \mathbb{R}^{J \times n_k}$ is denoted by $\mathcal{X} \times_k U$ and is of size $n_1 \times \cdots \times n_{k-1} \times J \times n_{k+1} \times \cdots \times n_N$. Elementwise, we have

$$(\mathcal{X} \times_k U)_{i_1 \cdots i_{k-1}\, j\, i_{k+1} \cdots i_N} = \sum_{i_k=1}^{n_k} x_{i_1 i_2 \cdots i_N}\, u_{j i_k}. \qquad (4)$$

Each mode-$k$ fiber is multiplied by the matrix $U$. This idea can also be expressed in terms of unfolded tensors: $\mathcal{Y} = \mathcal{X} \times_k U$ is equivalent to $Y_{(k)} = U X_{(k)}$.

Please refer to (28) for a more extensive overview.
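To make the mode-$k$ product concrete, the following minimal NumPy sketch (not taken from the paper; the helper names are ours, and the unfolding column ordering may differ from the convention in (28), which does not affect the product itself) computes $\mathcal{Y} = \mathcal{X} \times_k U$ by unfolding, multiplying, and folding back.

```python
import numpy as np

def unfold(X, k):
    """Mode-k unfolding: move axis k to the front and flatten the remaining axes."""
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def fold(M, k, shape):
    """Inverse of unfold: reshape and move the first axis back to position k."""
    rest = [s for i, s in enumerate(shape) if i != k]
    return np.moveaxis(M.reshape([shape[k]] + rest), 0, k)

def mode_k_product(X, U, k):
    """Compute Y = X x_k U via the identity Y_(k) = U @ X_(k)."""
    new_shape = list(X.shape)
    new_shape[k] = U.shape[0]
    return fold(U @ unfold(X, k), k, new_shape)

# toy usage: a 3 x 4 x 5 tensor multiplied along mode 1 by a 6 x 4 matrix
X = np.random.rand(3, 4, 5)
U = np.random.rand(6, 4)
Y = mode_k_product(X, U, 1)   # Y has shape (3, 6, 5)
```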

2.2 Notations and definitions corresponding to t-SVD

For a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, by using the Matlab command fft, we denote by $\bar{\mathcal{X}}$ the result of the discrete Fourier transform of $\mathcal{X}$ along the third dimension, i.e., $\bar{\mathcal{X}} = \operatorname{fft}(\mathcal{X}, [\,], 3)$. Meanwhile, the inverse FFT gives $\mathcal{X} = \operatorname{ifft}(\bar{\mathcal{X}}, [\,], 3)$.

Definition 2.1 (tensor conjugate transpose (30))

The conjugate transpose of a tensor $\mathcal{X} \in \mathbb{C}^{n_1 \times n_2 \times n_3}$ is the tensor $\mathcal{X}^H \in \mathbb{C}^{n_2 \times n_1 \times n_3}$ obtained by conjugate transposing each of the frontal slices and then reversing the order of the transposed frontal slices 2 through $n_3$:

$$(\mathcal{X}^H)^{(1)} = \big(\mathcal{X}^{(1)}\big)^H, \qquad (\mathcal{X}^H)^{(k)} = \big(\mathcal{X}^{(n_3 + 2 - k)}\big)^H, \quad k = 2, \dots, n_3.$$

Definition 2.2 (t-product (30))

The t-product $\mathcal{A} * \mathcal{B}$ of $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and $\mathcal{B} \in \mathbb{R}^{n_2 \times n_4 \times n_3}$ is a tensor $\mathcal{C}$ of size $n_1 \times n_4 \times n_3$, where the $(i, j)$-th tube is given by

$$\mathcal{C}(i, j, :) = \sum_{k=1}^{n_2} \mathcal{A}(i, k, :) \star \mathcal{B}(k, j, :), \qquad (5)$$

where "$\star$" denotes the circular convolution between two tubes of the same size.

Interpreted in another way, a 3-D tensor of size $n_1 \times n_2 \times n_3$ can be viewed as an $n_1 \times n_2$ matrix of fibers (tubes), with each entry being a tube lying along the third dimension. So the t-product of two tensors can be regarded as a matrix-matrix multiplication, except that the multiplication operation between scalars is replaced by the circular convolution between tubes.
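Since circular convolution becomes elementwise multiplication after an FFT, the t-product can be computed slice-wise in the Fourier domain. The following NumPy sketch (ours, not from the paper) illustrates this equivalence.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x n2 x n3) and B (n2 x n4 x n3):
    FFT along the third axis, multiply matching frontal slices, inverse FFT."""
    n3 = A.shape[2]
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.empty((A.shape[0], B.shape[1], n3), dtype=complex)
    for k in range(n3):
        Cf[:, :, k] = Af[:, :, k] @ Bf[:, :, k]
    return np.real(np.fft.ifft(Cf, axis=2))

# sanity check: with n3 = 1 the t-product reduces to ordinary matrix multiplication
A = np.random.rand(3, 4, 1)
B = np.random.rand(4, 2, 1)
assert np.allclose(t_product(A, B)[:, :, 0], A[:, :, 0] @ B[:, :, 0])
```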

Definition 2.3 (identity tensor(30))

The identity tensor $\mathcal{I} \in \mathbb{R}^{n \times n \times n_3}$ is the tensor whose first frontal slice is the $n \times n$ identity matrix and whose other frontal slices are all zeros.

Definition 2.4 (orthogonal tensor(30))

A tensor $\mathcal{Q} \in \mathbb{R}^{n \times n \times n_3}$ is orthogonal if it satisfies

$$\mathcal{Q}^H * \mathcal{Q} = \mathcal{Q} * \mathcal{Q}^H = \mathcal{I}. \qquad (6)$$
Definition 2.5 (block diagonal form(33))

Let $\bar{X}$ denote the block-diagonal matrix of the tensor $\bar{\mathcal{X}}$ in the Fourier domain, i.e.,

$$\bar{X} = \operatorname{bdiag}(\bar{\mathcal{X}}) = \begin{bmatrix} \bar{\mathcal{X}}^{(1)} & & \\ & \ddots & \\ & & \bar{\mathcal{X}}^{(n_3)} \end{bmatrix}. \qquad (7)$$

It is easy to verify that the block-diagonal matrix of $\mathcal{X}^H$ is equal to the conjugate transpose of the block-diagonal matrix of $\mathcal{X}$, i.e., $\operatorname{bdiag}(\overline{\mathcal{X}^H}) = \bar{X}^H$. Furthermore, for any tensors $\mathcal{A}$ and $\mathcal{B}$ of compatible sizes, we have $\mathcal{C} = \mathcal{A} * \mathcal{B}$ if and only if $\bar{C} = \bar{A}\,\bar{B}$.

Definition 2.6 (f-diagonal tensor(30))

A tensor is called f-diagonal if each frontal slice is a diagonal matrix.

Theorem 2.1 (t-SVD(30; 32))

For $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, the t-SVD of $\mathcal{X}$ is given by

$$\mathcal{X} = \mathcal{U} * \mathcal{S} * \mathcal{V}^H, \qquad (8)$$

where $\mathcal{U} \in \mathbb{R}^{n_1 \times n_1 \times n_3}$ and $\mathcal{V} \in \mathbb{R}^{n_2 \times n_2 \times n_3}$ are orthogonal tensors, and $\mathcal{S} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is an f-diagonal tensor.

The illustration of the t-SVD decomposition is in Figure 2. Note that one can efficiently obtain this decomposition by computing matrix SVDs in the Fourier domain as shown in Algorithm 1.

1: Input: $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$
2: $\bar{\mathcal{X}} = \operatorname{fft}(\mathcal{X}, [\,], 3)$
3: for $k = 1$ to $n_3$ do
4:     $[U, S, V] = \operatorname{svd}(\bar{\mathcal{X}}^{(k)})$;
5:     $\bar{\mathcal{U}}^{(k)} = U$; $\bar{\mathcal{S}}^{(k)} = S$; $\bar{\mathcal{V}}^{(k)} = V$;
6: end for
7: $\mathcal{U} = \operatorname{ifft}(\bar{\mathcal{U}}, [\,], 3)$; $\mathcal{S} = \operatorname{ifft}(\bar{\mathcal{S}}, [\,], 3)$; $\mathcal{V} = \operatorname{ifft}(\bar{\mathcal{V}}, [\,], 3)$;
8: Output: $\mathcal{U}$, $\mathcal{S}$, $\mathcal{V}$.
Algorithm 1 t-SVD for third-order tensors
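A minimal NumPy sketch of Algorithm 1 (ours; it assumes a real input tensor and uses full SVDs of the Fourier-domain slices) could look as follows; its output can be checked by reassembling $\mathcal{U} * \mathcal{S} * \mathcal{V}^H$ with the t_product sketch given earlier.

```python
import numpy as np

def t_svd(X):
    """t-SVD of a real n1 x n2 x n3 tensor via slice-wise SVDs in the Fourier domain."""
    n1, n2, n3 = X.shape
    r = min(n1, n2)
    Xf = np.fft.fft(X, axis=2)
    Uf = np.zeros((n1, n1, n3), dtype=complex)
    Sf = np.zeros((n1, n2, n3), dtype=complex)
    Vf = np.zeros((n2, n2, n3), dtype=complex)
    for k in range(n3):
        U, s, Vh = np.linalg.svd(Xf[:, :, k])   # full SVD of the k-th frontal slice
        Uf[:, :, k] = U
        Sf[:r, :r, k] = np.diag(s)
        Vf[:, :, k] = Vh.conj().T
    ifft = lambda T: np.real(np.fft.ifft(T, axis=2))
    return ifft(Uf), ifft(Sf), ifft(Vf)

# toy usage
U, S, V = t_svd(np.random.rand(5, 4, 3))
```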
Definition 2.7 (tensor tubal-rank and multi-rank(33))

The tubal-rank of a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, denoted as $\operatorname{rank}_t(\mathcal{X})$, is defined to be the number of non-zero singular tubes of $\mathcal{S}$, where $\mathcal{S}$ comes from the t-SVD of $\mathcal{X}$: $\mathcal{X} = \mathcal{U} * \mathcal{S} * \mathcal{V}^H$. That is,

$$\operatorname{rank}_t(\mathcal{X}) = \#\{\, i : \mathcal{S}(i, i, :) \ne \mathbf{0} \,\}. \qquad (9)$$

The tensor multi-rank of $\mathcal{X}$ is a vector $r \in \mathbb{R}^{n_3}$ whose $i$-th element equals the rank of the $i$-th frontal slice of $\bar{\mathcal{X}}$.

Definition 2.8 (tensor-nuclear-norm (TNN))

The tubal nuclear norm of a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, denoted as $\|\mathcal{X}\|_{\mathrm{TNN}}$, is defined as the sum of the singular values of all the frontal slices of $\bar{\mathcal{X}}$. In particular,

$$\|\mathcal{X}\|_{\mathrm{TNN}} = \sum_{i=1}^{n_3} \big\|\bar{\mathcal{X}}^{(i)}\big\|_*. \qquad (10)$$

3 Main results

In this section, we first present the definition of the proposed PSTNN. Then the PSVT based solver of the PSTNN minimization model is presented. Furthermore, we propose the PSTNN based TC model and TRPCA model and their corresponding algorithms, respectively.

3.1 Partial sum of the tensor nuclear norm (PSTNN)

In (33; 34), the TNN is selected to characterize the low-tubal-rank structure of a tensor for the tensor completion problem. TNN is also chosen to approximate the low-rank part in the RPCA problem (35) and the outlier-RPCA problem (38). It is noteworthy that, for a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ with multi-rank $r$, there is a link between its tensor tubal-rank and multi-rank:

$$\operatorname{rank}_t(\mathcal{X}) = \max_i r_i. \qquad (11)$$

Meanwhile, according to Definition 2.7, the $i$-th element of the multi-rank is $r_i = \operatorname{rank}\big(\bar{\mathcal{X}}^{(i)}\big)$, and Definition 2.5 implies $\operatorname{rank}(\bar{X}) = \sum_{i=1}^{n_3} \operatorname{rank}\big(\bar{\mathcal{X}}^{(i)}\big)$. Thus, the $\ell_1$ norm of $\mathcal{X}$'s multi-rank equals the rank of its block-diagonal matrix in the Fourier domain, i.e.,

$$\|r\|_1 = \sum_{i=1}^{n_3} r_i = \operatorname{rank}(\bar{X}). \qquad (12)$$

More precisely, the TNN defined in (10) is herein a convex relaxation of the $\ell_1$ norm of a third-order tensor's multi-rank, i.e., $\|\mathcal{X}\|_{\mathrm{TNN}} = \|\bar{X}\|_*$.

Although the nuclear norm minimization problem can be easily solved by singular value thresholding (SVT) (39), nuclear norm based methods treat each singular value equally. However, the larger singular values are generally associated with the major information, and hence they should be shrunk less to preserve the major data information (40). Recent advances show that low-rank matrix factorization (41; 42) and the MCP function (43) outperform the nuclear norm. Therefore, we tend to apply a nonconvex relaxation instead of the nuclear norm.

We first give our novel nonconvex tensor tubal-rank approximation, which is derived from the partial sum of singular values (PSSV) (23; 24). The PSTNN of a third-order tensor $\mathcal{X}$ is defined as follows:

$$\|\mathcal{X}\|_{\mathrm{PSTNN}} = \sum_{i=1}^{n_3} \big\|\bar{\mathcal{X}}^{(i)}\big\|_{p=N}. \qquad (13)$$

In (13), $\|\cdot\|_{p=N}$ is the PSSV (23; 24), which is defined as $\|X\|_{p=N} = \sum_{i=N+1}^{\min(m,n)} \sigma_i(X)$ for a matrix $X \in \mathbb{C}^{m \times n}$, where $\sigma_i(X)$ denotes the $i$-th largest singular value of $X$. It is notable that, as illustrated in Figure 3, there is a link between the PSTNN of a tensor and the PSSV of a matrix. From Figure 3, we can find that the definition of PSTNN maintains a distinct meaning in the t-SVD scheme.

Figure 3: Illustration of the relation between the PSSV of a matrix (first row) and the PSTNN of a tensor (second row).
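As a concrete illustration of definition (13), the following NumPy sketch (ours, not from the paper) evaluates the PSTNN of a tensor by summing, over the Fourier-domain frontal slices, the singular values beyond the $N_i$ largest.

```python
import numpy as np

def pstnn(X, N):
    """Partial sum of the tensor nuclear norm of X (n1 x n2 x n3).
    N is the target multi-rank: a scalar or a length-n3 array."""
    n3 = X.shape[2]
    N = np.broadcast_to(np.asarray(N, dtype=int), (n3,))
    Xf = np.fft.fft(X, axis=2)
    value = 0.0
    for i in range(n3):
        s = np.linalg.svd(Xf[:, :, i], compute_uv=False)  # singular values, descending
        value += s[N[i]:].sum()                            # only the "tail" beyond N[i]
    return value
```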

3.2 The PSTNN minimization model

The fundamental PSTNN-based tensor recovery model aims at restoring a tensor from its observation with PSTNN regularization. For an observed tensor $\mathcal{O}$, the PSTNN-regularized tensor recovery model can be written as follows:

$$\min_{\mathcal{X}} \|\mathcal{X}\|_{\mathrm{PSTNN}} + \frac{\beta}{2}\|\mathcal{X} - \mathcal{O}\|_F^2, \qquad (14)$$

where $\mathcal{X}, \mathcal{O} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and $\beta > 0$.

If we take the FFT of $\mathcal{X}$ and $\mathcal{O}$ along the third mode, it is easy to see that solving the above optimization problem (14) is equivalent to solving $n_3$ matrix optimization problems in the Fourier domain,

$$\min_{\bar{\mathcal{X}}^{(i)}} \big\|\bar{\mathcal{X}}^{(i)}\big\|_{p=N_i} + \frac{\beta}{2}\big\|\bar{\mathcal{X}}^{(i)} - \bar{\mathcal{O}}^{(i)}\big\|_F^2, \quad i = 1, \dots, n_3, \qquad (15)$$

where $\bar{\mathcal{X}}^{(i)}$ and $\bar{\mathcal{O}}^{(i)}$ are the $i$-th frontal slices of $\bar{\mathcal{X}}$ and $\bar{\mathcal{O}}$. Thus, the tensor optimization problem (14) is transformed into the matrix optimization problems (15) in the Fourier transform domain. It should be noted that Oh et al. have proposed the closed-form solution of (15) in (23; 24) for real matrices. Hence, we restate the results of (23; 24) and generalize them to complex matrices in the following.

To minimize (15) in the real-matrix case, Oh et al. (23; 24) defined the PSVT operator $\mathcal{P}_{N,\tau}[\cdot]$. Before extending the PSVT operator to matrices in the complex field, we first restate von Neumann's lemma (44; 45; 46).

Lemma 3.0 (von Neumann)

If $A, B \in \mathbb{C}^{m \times n}$ are complex matrices with singular values

$$\sigma_1(A) \ge \cdots \ge \sigma_l(A), \qquad \sigma_1(B) \ge \cdots \ge \sigma_l(B), \qquad l = \min(m, n),$$

respectively, then

$$\big|\operatorname{tr}(A^H B)\big| \le \sum_{i=1}^{l} \sigma_i(A)\,\sigma_i(B). \qquad (16)$$

Moreover, equality holds in (16) if and only if there exists a simultaneous singular value decomposition of $A$ and $B$ of the following form:

$$A = U \Sigma_A V^H, \qquad B = U \Sigma_B V^H, \qquad (17)$$

where $U$ and $V$ are shared unitary matrices and $\Sigma_A$, $\Sigma_B$ contain the singular values of $A$ and $B$.

The von Neumann lemma shows that $\operatorname{tr}(A^H B)$ is always bounded by the inner product of the singular value vectors of $A$ and $B$. Notice that the maximum value of $\operatorname{tr}(A^H B)$ can only be achieved when $B$ has the same singular vector matrices $U$ and $V$ as $A$. This fact is useful to derive the PSVT.

Theorem 3.3 (PSVT)

Let $Y \in \mathbb{C}^{m \times n}$, $\tau > 0$, $l = \min(m, n)$, and let $Y = U_Y D_Y V_Y^H$ be the SVD of $Y$. $Y$ can be considered as the sum of two matrices, $Y = Y_1 + Y_2 = U_{Y_1} D_{Y_1} V_{Y_1}^H + U_{Y_2} D_{Y_2} V_{Y_2}^H$, where $Y_1$ collects the components corresponding to the $N$ largest singular values and $Y_2$ those from the $(N+1)$-th to the last singular value. Define a complex minimization problem for the PSSV as

$$\min_{X \in \mathbb{C}^{m \times n}} \frac{1}{2}\|X - Y\|_F^2 + \tau \|X\|_{p=N}. \qquad (18)$$

Then, the optimal solution of (18) can be expressed by the PSVT operator defined as

$$\mathcal{P}_{N,\tau}[Y] = U_Y \big(D_{Y_1} + \mathcal{S}_\tau[D_{Y_2}]\big) V_Y^H = Y_1 + U_{Y_2}\, \mathcal{S}_\tau[D_{Y_2}]\, V_{Y_2}^H, \qquad (19)$$

where $D_{Y_1} = \operatorname{diag}(\sigma_1, \dots, \sigma_N, 0, \dots, 0)$, $D_{Y_2} = \operatorname{diag}(0, \dots, 0, \sigma_{N+1}, \dots, \sigma_l)$, and $\mathcal{S}_\tau[\cdot]$ is the soft-thresholding operator, $\mathcal{S}_\tau[x] = \operatorname{sign}(x)\max(|x| - \tau, 0)$.

Proof 3.1

Let us consider $X = U_X D_X V_X^H$, where $D_X = \operatorname{diag}(\delta_1, \dots, \delta_l)$, and $Y = U_Y D_Y V_Y^H$, where $D_Y = \operatorname{diag}(\sigma_1, \dots, \sigma_l)$ and the singular values are sorted in non-increasing order. Also, we define the function $J(X)$ as the objective function of (18). The first term of (18) can be derived as follows:

$$\frac{1}{2}\|X - Y\|_F^2 = \frac{1}{2}\Big(\|X\|_F^2 - 2\operatorname{Re}\big(\operatorname{tr}(Y^H X)\big) + \|Y\|_F^2\Big). \qquad (20)$$

In the minimization of (20) with respect to $X$, the term $\|Y\|_F^2$ is regarded as a constant and thus can be ignored. For a more detailed representation, we change the parameterization of $X$ to $(U_X, D_X, V_X)$ and minimize the function:

$$J(U_X, D_X, V_X) = \frac{1}{2}\|D_X\|_F^2 - \operatorname{Re}\big(\operatorname{tr}(Y^H U_X D_X V_X^H)\big) + \tau \|D_X\|_{p=N}. \qquad (21)$$

From von Neumann's lemma, the upper bound of $\operatorname{Re}\big(\operatorname{tr}(Y^H U_X D_X V_X^H)\big)$ is given as $\sum_{i=1}^{l} \sigma_i \delta_i$ for all $X$, attained when $U_X = U_Y$ and $V_X = V_Y$. Then (21) becomes a function depending only on $D_X$ as follows:

$$J(D_X) = \frac{1}{2}\sum_{i=1}^{l} (\delta_i - \sigma_i)^2 + \tau \sum_{i=N+1}^{l} \delta_i + \text{const}. \qquad (22)$$

Since (22) consists of simple quadratic terms in each $\delta_i$ independently, it is trivial to show that the minimum of (22) is obtained at $D_X^* = \operatorname{diag}(\delta_1^*, \dots, \delta_l^*)$ by the first-order optimality condition in the feasible domain $\delta_i \ge 0$, where $\delta_i^*$ is defined as

$$\delta_i^* = \begin{cases} \sigma_i, & i \le N, \\ \max(\sigma_i - \tau, 0), & i > N. \end{cases} \qquad (23)$$

Hence, the solution of (18) is $X^* = U_Y D_X^* V_Y^H = \mathcal{P}_{N,\tau}[Y]$. This result exactly corresponds to the PSVT operator where a feasible solution exists.

Therefore, the solution of (15) is

$$\bar{\mathcal{X}}^{(i)} = \mathcal{P}_{N_i,\, 1/\beta}\big[\bar{\mathcal{O}}^{(i)}\big], \quad i = 1, \dots, n_3. \qquad (24)$$

Moreover, the pseudocode of the proposed algorithm to solve (14) is given in Algorithm 2.

1: Input: the observed tensor $\mathcal{O} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, the parameter $\beta$, the given multi-rank $N \in \mathbb{R}^{n_3}$
2: $\bar{\mathcal{O}} = \operatorname{fft}(\mathcal{O}, [\,], 3)$
3: for $i = 1$ to $n_3$ do
4:     $\bar{\mathcal{X}}^{(i)} = \mathcal{P}_{N_i,\, 1/\beta}\big[\bar{\mathcal{O}}^{(i)}\big]$
5: end for
6: $\mathcal{X} = \operatorname{ifft}(\bar{\mathcal{X}}, [\,], 3)$
7: Output: the recovered tensor $\mathcal{X}$
Algorithm 2 Solve (14) using PSVT
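A compact NumPy sketch of the PSVT operator of Theorem 3.3 and of Algorithm 2 might look as follows (ours, not the authors' released code; it assumes the threshold $1/\beta$ of (24) and ignores any additional FFT scaling constants).

```python
import numpy as np

def psvt(Y, N, tau):
    """Partial singular value thresholding (Theorem 3.3): keep the N largest
    singular values of Y and soft-threshold the remaining ones by tau."""
    U, s, Vh = np.linalg.svd(Y, full_matrices=False)
    s[N:] = np.maximum(s[N:] - tau, 0.0)
    return (U * s) @ Vh

def pstnn_prox(O, N, beta):
    """Algorithm 2: apply PSVT with threshold 1/beta to each Fourier-domain
    frontal slice of O, then transform back."""
    n3 = O.shape[2]
    N = np.broadcast_to(np.asarray(N, dtype=int), (n3,))
    Of = np.fft.fft(O, axis=2)
    Xf = np.empty_like(Of)
    for i in range(n3):
        Xf[:, :, i] = psvt(Of[:, :, i], int(N[i]), 1.0 / beta)
    return np.real(np.fft.ifft(Xf, axis=2))
```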

In the following subsections, based on the proposed rank approximation, we can easily give our proposed tensor completion model and tensor RPCA model.

3.3 Tensor completion using PSTNN

A tensor completion model using PSTNN can be formulated as

$$\min_{\mathcal{X}} \|\mathcal{X}\|_{\mathrm{PSTNN}} \quad \text{s.t.} \quad \mathcal{P}_\Omega(\mathcal{X}) = \mathcal{P}_\Omega(\mathcal{O}), \qquad (25)$$

where $\Omega$ is the index set of the observed entries and $\mathcal{P}_\Omega$ keeps the entries in $\Omega$ and zeros out the others. Let

$$\iota_{\mathcal{S}}(\mathcal{X}) = \begin{cases} 0, & \mathcal{X} \in \mathcal{S}, \\ +\infty, & \text{otherwise}, \end{cases} \qquad (26)$$

where $\mathcal{S} = \{\mathcal{X} : \mathcal{P}_\Omega(\mathcal{X}) = \mathcal{P}_\Omega(\mathcal{O})\}$. Thus, the problem (25) can be rewritten as the following unconstrained problem:

$$\min_{\mathcal{X}} \|\mathcal{X}\|_{\mathrm{PSTNN}} + \iota_{\mathcal{S}}(\mathcal{X}). \qquad (27)$$

Then, the problem (27) can be solved efficiently using ADMM (47; 7; 48; 49; 22).

After introducing an auxiliary tensor $\mathcal{Z}$, the problem (27) can be rewritten as follows:

$$\min_{\mathcal{X}, \mathcal{Z}} \|\mathcal{Z}\|_{\mathrm{PSTNN}} + \iota_{\mathcal{S}}(\mathcal{X}) \quad \text{s.t.} \quad \mathcal{X} = \mathcal{Z}. \qquad (28)$$

The augmented Lagrangian function of (28) is given by

$$L_\mu(\mathcal{X}, \mathcal{Z}, \mathcal{M}) = \|\mathcal{Z}\|_{\mathrm{PSTNN}} + \iota_{\mathcal{S}}(\mathcal{X}) + \langle \mathcal{M}, \mathcal{X} - \mathcal{Z} \rangle + \frac{\mu}{2}\|\mathcal{X} - \mathcal{Z}\|_F^2, \qquad (29)$$

where $\mathcal{M}$ is the Lagrangian multiplier and $\mu > 0$ is the penalty parameter for the violation of the linear constraint.

Then, the variables in (29) can be updated alternately as:

$$\begin{aligned}
\mathcal{Z}^{k+1} &= \arg\min_{\mathcal{Z}} \|\mathcal{Z}\|_{\mathrm{PSTNN}} + \frac{\mu}{2}\big\|\mathcal{Z} - (\mathcal{X}^{k} + \mathcal{M}^{k}/\mu)\big\|_F^2, \\
\mathcal{X}^{k+1} &= \mathcal{P}_\Omega(\mathcal{O}) + \mathcal{P}_{\Omega^c}\big(\mathcal{Z}^{k+1} - \mathcal{M}^{k}/\mu\big), \\
\mathcal{M}^{k+1} &= \mathcal{M}^{k} + \mu\big(\mathcal{X}^{k+1} - \mathcal{Z}^{k+1}\big),
\end{aligned} \qquad (30)$$

where $\Omega^c$ is the complement of $\Omega$. The $\mathcal{Z}$-subproblem is exactly of the form (14) and is solved by Algorithm 2 with $\beta = \mu$.

Algorithm 3 shows the pseudocode for the proposed PSTNN based tensor completion method.

1: Input: the observed tensor $\mathcal{O}$, the index set of the observed entries $\Omega$, the given multi-rank $N$, the stopping criterion $\varepsilon$, the penalty parameter $\mu$.
2: Initialize: $\mathcal{X}^0 = \mathcal{P}_\Omega(\mathcal{O})$, $\mathcal{Z}^0 = \mathcal{X}^0$, $\mathcal{M}^0 = \mathbf{0}$, $k = 0$.
3: while not converged do
4:     update $\mathcal{Z}^{k+1}$ by applying Algorithm 2 to $\mathcal{X}^{k} + \mathcal{M}^{k}/\mu$ with $\beta = \mu$
5:     $\mathcal{X}^{k+1} = \mathcal{P}_\Omega(\mathcal{O}) + \mathcal{P}_{\Omega^c}\big(\mathcal{Z}^{k+1} - \mathcal{M}^{k}/\mu\big)$
6:     $\mathcal{M}^{k+1} = \mathcal{M}^{k} + \mu\big(\mathcal{X}^{k+1} - \mathcal{Z}^{k+1}\big)$
7:     Check the convergence conditions: $\|\mathcal{X}^{k+1} - \mathcal{X}^{k}\|_\infty \le \varepsilon$, $\|\mathcal{Z}^{k+1} - \mathcal{Z}^{k}\|_\infty \le \varepsilon$, $\|\mathcal{X}^{k+1} - \mathcal{Z}^{k+1}\|_\infty \le \varepsilon$
8: end while
9: Output: the completed tensor $\mathcal{X}$.
Algorithm 3 Solve the PSTNN based TC model (25) by ADMM
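A minimal NumPy sketch of Algorithm 3 (ours; it reuses the pstnn_prox helper from the Algorithm 2 sketch above, fixes the penalty parameter, and uses a simple relative-change stopping rule instead of the exact conditions in line 7) is given below.

```python
import numpy as np

def pstnn_tc(O, mask, N, mu=1e-2, max_iter=300, tol=1e-6):
    """ADMM for the PSTNN tensor completion model (25).
    O: observed tensor (unobserved entries arbitrary);
    mask: boolean tensor, True at observed entries; N: target multi-rank."""
    X = np.where(mask, O, 0.0)
    Z = X.copy()
    M = np.zeros_like(X)
    for _ in range(max_iter):
        Z = pstnn_prox(X + M / mu, N, mu)          # PSTNN proximal step (Algorithm 2)
        X_new = np.where(mask, O, Z - M / mu)      # keep observed entries, fill the rest
        M = M + mu * (X_new - Z)                   # dual update
        rel_change = np.linalg.norm(X_new - X) / max(np.linalg.norm(X), 1e-12)
        X = X_new
        if rel_change < tol:
            break
    return X
```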

3.4 Tensor RPCA using PSTNN

A tensor RPCA model using PSTNN can be formulated as

$$\min_{\mathcal{L}, \mathcal{E}} \|\mathcal{L}\|_{\mathrm{PSTNN}} + \lambda \|\mathcal{E}\|_1 \quad \text{s.t.} \quad \mathcal{O} = \mathcal{L} + \mathcal{E}, \qquad (31)$$

where $\mathcal{O}$ is the observed tensor, $\mathcal{L}$ is the low-tubal-rank component, $\mathcal{E}$ is the sparse component, and $\lambda > 0$ is a tuning parameter. Its augmented Lagrangian function is

$$L_\mu(\mathcal{L}, \mathcal{E}, \mathcal{M}) = \|\mathcal{L}\|_{\mathrm{PSTNN}} + \lambda \|\mathcal{E}\|_1 + \langle \mathcal{M}, \mathcal{O} - \mathcal{L} - \mathcal{E} \rangle + \frac{\mu}{2}\|\mathcal{O} - \mathcal{L} - \mathcal{E}\|_F^2, \qquad (32)$$

where $\mathcal{M}$ is the Lagrangian multiplier and $\mu > 0$ is the penalty parameter for the violation of the linear constraint.

Then, the variables in (32) can be updated alternately as:

$$\begin{aligned}
\mathcal{L}^{k+1} &= \arg\min_{\mathcal{L}} \|\mathcal{L}\|_{\mathrm{PSTNN}} + \frac{\mu}{2}\big\|\mathcal{L} - (\mathcal{O} - \mathcal{E}^{k} + \mathcal{M}^{k}/\mu)\big\|_F^2, \\
\mathcal{E}^{k+1} &= \mathcal{S}_{\lambda/\mu}\big[\mathcal{O} - \mathcal{L}^{k+1} + \mathcal{M}^{k}/\mu\big], \\
\mathcal{M}^{k+1} &= \mathcal{M}^{k} + \mu\big(\mathcal{O} - \mathcal{L}^{k+1} - \mathcal{E}^{k+1}\big),
\end{aligned} \qquad (33)$$

where the $\mathcal{L}$-subproblem is solved by Algorithm 2 with $\beta = \mu$, and the tensor soft-thresholding operator $\mathcal{S}_\tau[\cdot]$ is applied elementwise as

$$\big(\mathcal{S}_\tau[\mathcal{X}]\big)_{ijk} = \operatorname{sign}(x_{ijk})\max(|x_{ijk}| - \tau, 0).$$

Algorithm 4 shows the pseudocode for the proposed PSTNN based tensor robust principal component analysis method.

1: Input: the observed tensor $\mathcal{O}$, the given multi-rank $N$, the parameter $\lambda$, the stopping criterion $\varepsilon$, the penalty parameter $\mu$.
2: Initialize: $\mathcal{L}^0 = \mathcal{E}^0 = \mathcal{M}^0 = \mathbf{0}$, $k = 0$.
3: while not converged do
4:     update $\mathcal{L}^{k+1}$ by applying Algorithm 2 to $\mathcal{O} - \mathcal{E}^{k} + \mathcal{M}^{k}/\mu$ with $\beta = \mu$
5:     $\mathcal{E}^{k+1} = \mathcal{S}_{\lambda/\mu}\big[\mathcal{O} - \mathcal{L}^{k+1} + \mathcal{M}^{k}/\mu\big]$
6:     $\mathcal{M}^{k+1} = \mathcal{M}^{k} + \mu\big(\mathcal{O} - \mathcal{L}^{k+1} - \mathcal{E}^{k+1}\big)$
7:     Check the convergence conditions: $\|\mathcal{L}^{k+1} - \mathcal{L}^{k}\|_\infty \le \varepsilon$, $\|\mathcal{E}^{k+1} - \mathcal{E}^{k}\|_\infty \le \varepsilon$, $\|\mathcal{O} - \mathcal{L}^{k+1} - \mathcal{E}^{k+1}\|_\infty \le \varepsilon$
8: end while
9: Output: the low-PSTNN tensor $\mathcal{L}$ and the sparse tensor $\mathcal{E}$.
Algorithm 4 Solve the PSTNN based TRPCA model (31) by ADMM
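The corresponding NumPy sketch of Algorithm 4 (ours; it again reuses pstnn_prox from the Algorithm 2 sketch and stops on the relative size of the constraint residual) is:

```python
import numpy as np

def soft_threshold(X, tau):
    """Elementwise soft-thresholding operator S_tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def pstnn_rpca(O, N, lam, mu=1e-2, max_iter=300, tol=1e-6):
    """ADMM for the PSTNN TRPCA model (31): O = L + E, L of low PSTNN and E sparse."""
    L = np.zeros_like(O)
    E = np.zeros_like(O)
    M = np.zeros_like(O)
    for _ in range(max_iter):
        L = pstnn_prox(O - E + M / mu, N, mu)            # low-PSTNN update (Algorithm 2)
        E = soft_threshold(O - L + M / mu, lam / mu)     # sparse update
        residual = O - L - E
        M = M + mu * residual                            # dual update
        if np.linalg.norm(residual) < tol * max(np.linalg.norm(O), 1e-12):
            break
    return L, E
```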

4 Experimental results

To validate the effectiveness and efficiency of the proposed method, we compare its performance with that of the tensor nuclear norm based methods on both synthetic data sets and real-world examples. To measure the reconstruction accuracy, we employ the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) (50). The PSNR is defined as

$$\mathrm{PSNR} = 10 \log_{10} \frac{n_1 n_2 n_3\, \mathcal{X}_{\max}^2}{\|\hat{\mathcal{X}} - \mathcal{X}\|_F^2},$$

where $\mathcal{X}$, $\mathcal{X}_{\max}$, and $\hat{\mathcal{X}}$ are the original tensor, the maximum pixel value of the original tensor, and the estimated tensor, respectively. SSIM measures the structural similarity of two images; please see (50) for details. Better completion results correspond to larger PSNR and SSIM values. All algorithms are implemented on the platform of Windows 10 and Matlab (R2017b) with an Intel(R) Core(TM) i5-4590 CPU at 3.30 GHz and 16 GB RAM. Our Matlab code is available at https://github.com/uestctensorgroup/PSTNN.
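For reference, a small helper consistent with the PSNR definition above (a sketch of ours, not the evaluation code released with the paper) is:

```python
import numpy as np

def psnr(X_true, X_est):
    """PSNR between the original tensor X_true and the estimate X_est;
    the peak value is taken as the maximum entry of X_true."""
    mse = np.mean((X_est - X_true) ** 2)
    return 10.0 * np.log10(X_true.max() ** 2 / mse)
```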

4.1 Synthetic data

To synthesize a ground-truth low-tubal-rank tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ of tubal-rank $r$, we perform the t-product $\mathcal{X} = \mathcal{A} * \mathcal{B}$, where the entries of $\mathcal{A} \in \mathbb{R}^{n_1 \times r \times n_3}$ and $\mathcal{B} \in \mathbb{R}^{r \times n_2 \times n_3}$ are independently sampled from an i.i.d. standard Gaussian distribution $\mathcal{N}(0, 1)$.
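A possible NumPy sketch of this construction (ours; the tensor sizes and the random seed are arbitrary) is:

```python
import numpy as np

def synth_low_tubal_rank(n1, n2, n3, r, seed=0):
    """Generate a tensor of tubal-rank at most r as the t-product of two
    i.i.d. Gaussian tensors A (n1 x r x n3) and B (r x n2 x n3)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n1, r, n3))
    B = rng.standard_normal((r, n2, n3))
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.einsum('ikz,kjz->ijz', Af, Bf)   # slice-wise products in the Fourier domain
    return np.real(np.fft.ifft(Cf, axis=2))

X = synth_low_tubal_rank(40, 40, 20, r=5)
```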

4.1.1 Tensor completion

Figure 4: Success ratio for synthetic data of two different sizes and varying tubal-ranks with varying sampling rates. The left figures illustrate the empirical recovery rate obtained by minimizing the TNN, while the right figures correspond to minimizing the PSTNN. The color magnitude represents the success ratio. The white dotted lines are provided as a guide for easier comparison.

For the tensor completion task, we try to recover $\mathcal{X}$ from a partial observation consisting of randomly sampled entries of $\mathcal{X}$. To verify the robustness of the TNN based TC method and the proposed PSTNN based TC method, we conduct experiments with respect to the data size, the tubal-rank $r$, and the sampling rate. We examine the performance by counting the number of successes. If the relative squared error between the recovered tensor $\hat{\mathcal{X}}$ and the ground truth $\mathcal{X}$, i.e., $\|\hat{\mathcal{X}} - \mathcal{X}\|_F^2 / \|\mathcal{X}\|_F^2$, is less than a prescribed tolerance, we claim that the recovery is successful. We repeat each case 10 times, and each cell in Figure 4 reflects the success percentage, computed as the number of successful trials divided by 10. Figure 4 illustrates that the proposed PSTNN based TC method is more robust than the TNN based TC method, as indicated by its larger high-success (brown) areas.

4.1.2 Tensor robust principal components analysis

Figure 5: Correct recovery for varying tubal-rank and sparsity. Each entry in the figures reflects the fraction of correct recoveries across 10 trials. The white dotted lines are provided as a guide for easier comparison.

For the tensor robust principal component analysis task, the ground-truth tensor $\mathcal{X}$ is corrupted by a sparse noise tensor with a given sparsity level and uniformly distributed values. We try to recover $\mathcal{X}$ using Algorithm 4 and the TNN based TRPCA method. The setting of the experiments in this part is similar to that in Section 4.1.1. We conduct the experiments with respect to the data size, the tubal-rank $r$, and the sparsity level, respectively. We examine the performance by counting the number of successes. We repeat each case 10 times, and each cell in Figure 5 reflects the success percentage, computed as the number of successful trials divided by 10. Figure 5 illustrates that the proposed PSTNN based TRPCA method is more robust than the TNN based TRPCA method, as indicated by its smaller failure (blue) areas.

4.1.3 Sensitivity to initialization

Figure 6: Distribution of residual errors with 1000 different random initializations for the TC task.

Since the proposed objective function is non-convex, the converged solution may differ for different initializations. To study the sensitivity of the optimization to the initialization, we conducted 1000 experiments with random initializations on a tensor with tubal-rank 5 and with missing entries for the TC task. The distribution of the rooted relative squared errors is shown in Figure 6. While the convergence of a non-convex problem to a global optimum is hard to guarantee, most solutions are concentrated in regions near the ground-truth solution with small errors.

4.2 Tensor completion for the real-world data

Data    Index   Observed   HaLRTC    TNN       PSTNN
video   PSNR    7.1475     22.6886   26.2793   26.7292
        SSIM    0.0459     0.6786    0.8187    0.8288
MRI     PSNR    10.3162    24.3162   26.9626   27.9680
        SSIM    0.0887     0.7175    0.8144    0.8236
MSI     PSNR    13.8113    24.0003   28.9523   30.8586
        SSIM    0.1353     0.6703    0.8695    0.9026
Table 1: Quantitative comparisons of the completion results of HaLRTC, TNN, and PSTNN on the real-world data.
Figure 7: Results of tensor completion on the real-world data (columns from left to right: original, observed, HaLRTC, TNN, PSTNN). From top to bottom: one frame of the video data, one slice of the MRI data, one band of the MSI data.

In this subsection, we compare our PSTNN based TC method with HaLRTC (2) and the TNN based TC method (34) on real-world data, including video data (http://www.changedetection.net), MRI data (http://brainweb.bic.mni.mcgill.ca/brainweb/selection_normal.html), and multispectral image (MSI) data (http://www1.cs.columbia.edu/CAVE/databases/multispectral). The ratio of the missing entries is set as 80%. Figure