In this paper, we propose a novel approach to recover the missing entries of
incomplete data represented by a high-dimension tensor. Tensor-train
decomposition, which has powerful tensor representation ability and is free
from ‘the curse of dimensionality’, is employed in our approach. Using the
observed entries of the incomplete data, we find the factors that capture
the latent features of the data and then reconstruct the missing entries. Under
a low-rank assumption on the original data, the tensor completion problem is
cast as an optimization problem. Gradient descent methods are applied to
optimize the core tensors of the tensor-train decomposition. We propose two
algorithms, Tensor-train Weighted Optimization (TTWOPT) and Tensor-train
Stochastic Gradient Descent (TTSGD), to solve tensor completion problems. A
high-order tensorization method named visual data tensorization (VDT) is
proposed to transform visual data to higher-order forms, by which the
performance of our algorithms can be improved. The synthetic data experiments
and visual data experiments show that our algorithms outperform the
state-of-the-art completion algorithms. The improvement is especially
pronounced for high-dimension, high-missing-rate and large-scale data.
1 Introduction
Tensors are the high-order generalizations of vectors and matrices. Representing data by tensors retains the high-dimensional form of the data and keeps its adjacent structure information. Most real-world data have more than two orders. For example, RGB images are order-three tensors (height×width×channel), videos are order-four tensors (height×width×channel×time) and electroencephalography (EEG) signals are order-three tensors (magnitude×trials×time). When facing data with more than two orders, traditional methods usually transform the data into matrices or vectors by concatenation, which leads to spatial redundancy and less efficient factorization shashua2005non . In recent years, many theories, algorithms and applications of tensor methodologies have been studied and proposed kolda2009tensor ; vasilescu2003multilinear ; franz2009triplerank . Owing to the high compression ability and data representation ability of tensor decomposition, many applications of tensor decomposition have been proposed in a variety of fields such as image and video completion acar2011scalable ; zhao2015bayesian , signal processing de2008blind ; muti2005multidimensional , brain-computer interface mocks1988topographic , image classification shashua2001linear , etc.
In practical situations, missing data is ubiquitous due to errors and noise in the data collection process, which produce outliers and unwanted data entries. Generally, the lynchpin of tensor completion is to find the correlations between the missing entries and the observed entries. Tensor decomposition decomposes tensor data into factors which can capture the latent features of the whole data. The basic concept of solving data completion problems by tensor decomposition is that we find the decomposition factors from the partially observed data, then we take advantage of the powerful feature representation ability of the factors to approximate the missing entries. The most studied and classical tensor decomposition models are the CANDECOMP/PARAFAC (CP) decomposition sorber2013optimization ; goulart2016tensor , and the Tucker decomposition tucker1966some ; de2000best ; tsai2016tensor . CP decomposition decomposes a tensor into a sum of rank-one tensors, and Tucker decomposition approximates a tensor by a core tensor and several factor matrices. Many proposed tensor completion methods employ these two tensor decomposition models. In acar2011scalable , CP weighted optimization (CPWOPT) is proposed. It formulates the tensor completion problem as a weighted least squares (WLS) problem and uses optimization algorithms to find the optimal CP factors. Fully Bayesian CP Factorization (FBCP) in zhao2015bayesian employs a Bayesian probabilistic model to find the optimal CP factors and CP-rank at the same time. Three algorithms based on nuclear norm minimization are proposed in liu2013tensor , i.e., SiLRTC, FaLRTC, and HaLRTC. They extend the nuclear norm regularization for matrix completion to tensor completion by minimizing the Tucker rank of the incomplete tensor. In filipovic2015tucker , Tucker low-n-rank tensor completion (TLnR) is proposed, and the experiments show better results than the traditional nuclear norm minimization methods.
Though CP and Tucker can obtain relatively high performance on low-order tensors, the natural limitations of these two models cause their performance to decrease rapidly on high-order tensors. In recent years, a matrix product state (MPS) model named tensor train (TT) has been proposed and has become popular oseledets2011tensor ; bengua2017efficient ; yang2017tensor . For an Nth-order tensor $\mathcal{X}\in\mathbb{R}^{I_1\times\cdots\times I_N}$, CP decomposition represents the data by $O(\sum_{n=1}^{N} I_n R)$ model parameters, the Tucker model needs $O(\sum_{n=1}^{N} I_n R + R^N)$ model parameters, and the TT model requires $O(\sum_{n=1}^{N} I_n R^2)$ parameters, where $R$ represents the rank of each decomposition model. TT decomposition scales linearly with the tensor order, the same as CP decomposition. Though the CP model is more compact for a given rank, it is difficult to find the optimal CP factors, especially when the tensor order is high. The Tucker model is more flexible and stable, but its number of model parameters grows exponentially as the tensor order increases. Tensor train is free from the ‘curse of dimensionality’, so it is a better model for processing high-order tensors. In addition to CP-based and Tucker-based tensor completion algorithms, there are several works on TT-based tensor completion. bengua2017efficient develops low-TT-rank algorithms for tensor completion. Under a tensor low-rank assumption based on TT-rank, nuclear norm regularizations are imposed on the more balanced unfoldings of the tensor, by which a performance improvement is obtained. TTALS is proposed in wang2016tensor , in which the authors employ the alternating least squares (ALS) method to find the TT decomposition factors and solve the tensor completion problem. A gradient-based completion algorithm is discussed in yuan2017completion , which finds the TT decomposition by a gradient descent method and shows high performance on high-order tensors and high-missing-rate tensor completion problems.
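To make the parameter counts above concrete, the following minimal sketch evaluates them for tensors whose modes all have the same size. The function names and the example sizes (mode size 4, rank 5) are illustrative assumptions, not values from the experiments; the TT count uses the exact core sizes with $R_0=R_N=1$, which is bounded by the $O(\sum_n I_n R^2)$ figure.

```python
def cp_params(dims, rank):
    """CP: one I_n x R factor matrix per mode."""
    return sum(i * rank for i in dims)


def tucker_params(dims, rank):
    """Tucker: N factor matrices plus an R x R x ... x R core tensor."""
    return sum(i * rank for i in dims) + rank ** len(dims)


def tt_params(dims, rank):
    """TT: cores of size R_{n-1} x I_n x R_n with R_0 = R_N = 1."""
    ranks = [1] + [rank] * (len(dims) - 1) + [1]
    return sum(ranks[n] * i * ranks[n + 1] for n, i in enumerate(dims))


if __name__ == "__main__":
    # Hypothetical sizes chosen only to show the growth with tensor order.
    for order in (3, 6, 9):
        dims = [4] * order
        print(order, cp_params(dims, 5), tucker_params(dims, 5), tt_params(dims, 5))
```

The Tucker count is dominated by the $R^N$ core term as the order grows, while the CP and TT counts grow linearly in $N$.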
There are also tensor completion algorithms based on other tensor decomposition models, i.e., tensor ring (TR) decomposition zhao2016tensor ; zhao2017learning and hierarchical Tucker (HT) decomposition. Based on TR decomposition, the works in wang2017efficient ; yuan2018higher ; yuan2018tensor propose algorithms named TRALS, TRWOPT and TRLRF, which apply ALS, gradient descent and nuclear norm minimization methods to solve various tensor completion problems. Moreover, using total variations (TV) and HT decomposition, liu2018image proposes a completion algorithm named STTC, which explores the global low-rank tensor structure and the local correlation structure of the data simultaneously.
In this paper, we mainly focus on developing efficient tensor completion algorithms based on TT decomposition. Though several tensor completion methods based on the TT model have been proposed recently bengua2017efficient ; wang2016tensor ; yuan2017completion , their applicability and effectiveness are limited. The main contributions of this paper are summarized as follows: 1) Based on optimization methodology and tensor-train decomposition, we propose two algorithms named Tensor-train Weighted Optimization (TTWOPT) and Tensor-train Stochastic Gradient Descent (TTSGD), which apply gradient-based optimization algorithms to solve tensor completion problems. 2) We conduct simulation experiments with different tensor orders and compare our algorithms to the state-of-the-art tensor completion algorithms. Our algorithms achieve superior performance on both low-order and high-order tensors. 3) We propose a tensorization method named Visual Data Tensorization (VDT) to transform visual data into higher-order tensors, by which the performance of our algorithms is improved. 4) We test the performance of our algorithms on benchmark RGB images, video data, and hyperspectral image data. Our algorithms show higher performance than the state-of-the-art algorithms.
The rest of the paper is organized as follows. In Section 2, we state the notations applied in this paper and introduce the tensor-train decomposition. In Section 3, we present the two tensor completion algorithms and analyze their computational complexities. In Section 4, various experiments are conducted on synthetic data and real-world data, in which the proposed algorithms are compared to the state-of-the-art algorithms. We conclude our work in Section 5.
2 Preliminaries and Related works
2.1 Notations
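Notations in kolda2009tensor are adopted in our paper. A scalar is denoted by a normal lowercase/uppercase letter, e.g., $x, X\in\mathbb{R}$, a vector is denoted by a boldface lowercase letter, e.g., $\mathbf{x}\in\mathbb{R}^{I}$, a matrix is denoted by a boldface capital letter, e.g., $\mathbf{X}\in\mathbb{R}^{I\times J}$, and a tensor of order $N\geq 3$ is denoted by an Euler script letter, e.g., $\mathcal{X}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$.
$\mathbf{x}^{(1)},\mathbf{x}^{(2)},\cdots,\mathbf{x}^{(N)}$ denotes a vector sequence, in which $\mathbf{x}^{(n)}$ denotes the nth vector in the sequence. Matrix sequences and tensor sequences are denoted in the same way. An element of tensor $\mathcal{X}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$ of index $\{i_1,i_2,\cdots,i_N\}$ is denoted by $x_{i_1 i_2\cdots i_N}$ or $\mathcal{X}(i_1,i_2,\cdots,i_N)$. The mode-n matricization (unfolding) of tensor $\mathcal{X}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$ is denoted by $\mathbf{X}_{(n)}\in\mathbb{R}^{I_n\times I_1\cdots I_{n-1}I_{n+1}\cdots I_N}$.
Furthermore, the inner product of two tensors $\mathcal{X}$, $\mathcal{Y}$ of the same size $\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$ is defined as $\langle\mathcal{X},\mathcal{Y}\rangle=\sum_{i_1}\sum_{i_2}\cdots\sum_{i_N}x_{i_1 i_2\cdots i_N}y_{i_1 i_2\cdots i_N}$. The Frobenius norm of $\mathcal{X}$ is defined by $\|\mathcal{X}\|_F=\sqrt{\langle\mathcal{X},\mathcal{X}\rangle}$. The Hadamard product is denoted by ‘$\ast$’ and it is an element-wise product of vectors, matrices or tensors of the same size. For instance, given tensors $\mathcal{X},\mathcal{Y}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$ and $\mathcal{Z}=\mathcal{X}\ast\mathcal{Y}$, then $\mathcal{Z}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$ and $z_{i_1 i_2\cdots i_N}=x_{i_1 i_2\cdots i_N}y_{i_1 i_2\cdots i_N}$. The Kronecker product of two matrices $\mathbf{X}\in\mathbb{R}^{I\times K}$ and $\mathbf{Y}\in\mathbb{R}^{J\times L}$ is $\mathbf{X}\otimes\mathbf{Y}\in\mathbb{R}^{IJ\times KL}$; see kolda2009tensor for more details.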
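The operations defined above map directly onto array operations. The sketch below uses NumPy as an assumed tool (the paper does not prescribe one); the helper name `unfold` and the example sizes are hypothetical. The column ordering of the unfolding follows the convention in which earlier modes vary fastest, which is one reading of the definition in kolda2009tensor.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 5))
Y = rng.standard_normal((3, 4, 5))

inner = np.sum(X * Y)           # inner product <X, Y>
fro = np.sqrt(np.sum(X * X))    # Frobenius norm, sqrt(<X, X>)
Z = X * Y                       # Hadamard (element-wise) product


def unfold(T, n):
    """Mode-n unfolding (n is zero-based): mode n indexes the rows and the
    remaining modes are flattened with earlier modes varying fastest."""
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")


A = rng.standard_normal((2, 3))
B = rng.standard_normal((4, 5))
K = np.kron(A, B)               # Kronecker product, size (2*4) x (3*5)
```

For example, `unfold(X, 1)` has shape `(4, 15)`, and its entry at row `i2`, column `i1 + 3 * i3` equals `X[i1, i2, i3]`.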
2.2 Tensor Train Decomposition
The most significant feature of TT decomposition is that the number of model parameters does not grow exponentially with the tensor order. TT decomposition decomposes a tensor into a sequence of order-three core tensors (factor tensors): $\mathcal{G}^{(1)},\mathcal{G}^{(2)},\cdots,\mathcal{G}^{(N)}$. The relation between the approximated tensor $\mathcal{X}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$ and the core tensors can be expressed as follows:
$$\mathcal{X}=\ll\mathcal{G}^{(1)},\mathcal{G}^{(2)},\cdots,\mathcal{G}^{(N)}\gg, \tag{1}$$
where for $n=1,\cdots,N$, $\mathcal{G}^{(n)}\in\mathbb{R}^{R_{n-1}\times I_n\times R_n}$, $R_0=R_N=1$, and the notation $\ll\cdot\gg$ is the operation that transforms the core tensors to the approximated tensor. It should be noted that, for overall expression convenience, $\mathcal{G}^{(1)}\in\mathbb{R}^{I_1\times R_1}$ and $\mathcal{G}^{(N)}\in\mathbb{R}^{R_{N-1}\times I_N}$ are considered as two order-two tensors. The sequence $R_0,R_1,\cdots,R_N$ is named the TT-rank, which limits the size of every core tensor. Furthermore, the $(i_1,i_2,\cdots,i_N)$th element of tensor $\mathcal{X}$ can be represented by the product of the corresponding mode-2 slices of the core tensors as:
$$x_{i_1 i_2\cdots i_N}=\prod_{n=1}^{N}\mathbf{G}^{(n)}_{i_n}, \tag{2}$$
where $\mathbf{G}^{(1)}_{i_1},\cdots,\mathbf{G}^{(N)}_{i_N}$ is the sequence of slices from each core tensor. For $n=1,2,\cdots,N$, $\mathbf{G}^{(n)}_{i_n}\in\mathbb{R}^{R_{n-1}\times R_n}$ is the mode-2 slice extracted from $\mathcal{G}^{(n)}$ according to each mode of the element index of $x_{i_1 i_2\cdots i_N}$. $\mathbf{G}^{(1)}_{i_1}\in\mathbb{R}^{R_1}$ and $\mathbf{G}^{(N)}_{i_N}\in\mathbb{R}^{R_{N-1}}$ are extracted from the first and last core tensors; they are considered as two vectors for overall expression convenience.
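As a minimal sketch of equations (1) and (2), the code below stores every core as a three-way array of shape $R_{n-1}\times I_n\times R_n$ (keeping the border ranks $R_0=R_N=1$ explicit rather than dropping them), evaluates a single entry as a product of mode-2 slices, and contracts all cores into the full tensor. NumPy and the function names are assumptions made for illustration.

```python
import numpy as np


def random_cores(dims, rank, rng):
    """Random TT cores of shape (R_{n-1}, I_n, R_n) with R_0 = R_N = 1."""
    ranks = [1] + [rank] * (len(dims) - 1) + [1]
    return [rng.standard_normal((ranks[n], dims[n], ranks[n + 1]))
            for n in range(len(dims))]


def tt_element(cores, index):
    """Equation (2): multiply the mode-2 slices selected by the index."""
    out = np.ones((1, 1))
    for G, i in zip(cores, index):
        out = out @ G[:, i, :]          # (1 x R_{n-1}) @ (R_{n-1} x R_n)
    return out[0, 0]


def tt_full(cores):
    """Equation (1): contract neighbouring cores over their shared rank."""
    full = cores[0]                     # shape (1, I_1, R_1)
    for G in cores[1:]:
        full = np.tensordot(full, G, axes=([-1], [0]))
    return full[0, ..., 0]              # drop the two border ranks of size 1
```

Evaluating one entry costs a chain of small matrix products and never forms the full tensor, which is the property the stochastic algorithm in Section 3 relies on.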
3 Gradient-based Tensor Train Completion
3.1 Tensor-train Weighted Optimization (TTWOPT)
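We define $\mathcal{Y}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$ as the partially observed tensor with missing entries, and $\mathcal{X}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$ as the tensor approximated by the core tensors of a TT decomposition. The missing entries of $\mathcal{Y}$ are filled with zeros so that $\mathcal{Y}$ is a real-valued tensor. To model the completion problem, the indices of the missing entries need to be specified. We define a binary tensor $\mathcal{W}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$ named the weight tensor, which records the indices of the missing entries and the observed entries of the incomplete tensor $\mathcal{Y}$. Every entry of $\mathcal{W}$ satisfies:
$$w_{i_1 i_2\cdots i_N}=\begin{cases}0 & \text{if } y_{i_1 i_2\cdots i_N} \text{ is a missing entry},\\ 1 & \text{if } y_{i_1 i_2\cdots i_N} \text{ is an observed entry}.\end{cases} \tag{3}$$
The problem of finding the decomposition factors of an incomplete tensor can be formulated as a weighted least squares (WLS) model. Define $\mathcal{Y}_w=\mathcal{W}\ast\mathcal{Y}$ and $\mathcal{X}_w=\mathcal{W}\ast\mathcal{X}$; then the WLS model for calculating the tensor decomposition factors is formulated by:
$$f(\mathcal{G}^{(1)},\mathcal{G}^{(2)},\cdots,\mathcal{G}^{(N)})=\frac{1}{2}\|\mathcal{Y}_w-\mathcal{X}_w\|_F^2. \tag{4}$$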
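The weight tensor of equation (3) and the objective of equation (4) can be sketched in a few lines. NumPy, the function names and the uniform random missing pattern are assumptions for illustration; any binary mask of the same shape as the data would serve as the weight tensor.

```python
import numpy as np


def weight_tensor(shape, missing_rate, rng):
    """Binary weight tensor: 1 marks an observed entry, 0 a missing one."""
    return (rng.random(shape) >= missing_rate).astype(float)


def wls_objective(X, Y, W):
    """Equation (4): half the squared Frobenius norm of W*Y - W*X."""
    return 0.5 * np.sum((W * Y - W * X) ** 2)
```

Because both tensors are masked by the same weight tensor, the value of the objective does not depend on what the approximation predicts at the missing positions.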
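This is an optimization objective function w.r.t. all the TT core tensors, and we aim to solve the model by gradient descent methods. The relation between the approximated tensor $\mathcal{X}$ and the TT core tensors can be deduced as the following equation cichocki2016tensor :
$$\mathbf{X}_{(n)}=\mathbf{G}^{(n)}_{(2)}\left(\mathbf{G}^{>n}_{(1)}\otimes\mathbf{G}^{<n}_{(n)}\right), \tag{5}$$
where for $n=1,\ldots,N$,
$$\mathcal{G}^{>n}=\ll\mathcal{G}^{(n+1)},\mathcal{G}^{(n+2)},\cdots,\mathcal{G}^{(N)}\gg\in\mathbb{R}^{R_n\times I_{n+1}\times\cdots\times I_N}, \tag{6}$$
$$\mathcal{G}^{<n}=\ll\mathcal{G}^{(1)},\mathcal{G}^{(2)},\cdots,\mathcal{G}^{(n-1)}\gg\in\mathbb{R}^{I_1\times\cdots\times I_{n-1}\times R_{n-1}}. \tag{7}$$
$\mathcal{G}^{>n}$ and $\mathcal{G}^{<n}$ are the tensors generated by merging the selected TT core tensors, and we define $\mathcal{G}^{>N}=\mathcal{G}^{<1}=1$.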
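By equation (5), for $n=1,\ldots,N$, the partial derivatives of the objective function (4) w.r.t. the mode-2 matricization of the nth core tensor $\mathcal{G}^{(n)}$ can be inferred as:
$$\frac{\partial f}{\partial\mathbf{G}^{(n)}_{(2)}}=\left(\mathbf{X}_{w(n)}-\mathbf{Y}_{w(n)}\right)\left(\mathbf{G}^{>n}_{(1)}\otimes\mathbf{G}^{<n}_{(n)}\right)^{\mathsf{T}}, \tag{8}$$
where $\mathbf{X}_{w(n)}$ and $\mathbf{Y}_{w(n)}$ are the mode-n matricizations of $\mathcal{X}_w$ and $\mathcal{Y}_w$.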
After the objective function and the gradients are obtained, we can apply various optimization algorithms to optimize the core tensors. The implementation procedure of TTWOPT to find the TT decomposition from incomplete tensor Y is listed in Algorithm 1.
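The following is a minimal sketch of computing objective (4) together with its gradient for every core. Instead of forming the Kronecker product of the unfoldings explicitly, it contracts the masked residual with the merged left and right cores, which is an equivalent tensor form of the same derivative; this formulation, NumPy, and all function names are assumptions made for illustration, and the optimizer that consumes the gradients is left out.

```python
import numpy as np


def contract(cores):
    """Merge consecutive TT cores; result has shape (R_first, I..., R_last)."""
    out = cores[0]
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=([-1], [0]))
    return out


def ttwopt_objective_and_grads(cores, Y, W):
    """Objective (4) and its gradient w.r.t. each core (R_{n-1} x I_n x R_n)."""
    X = contract(cores)[0, ..., 0]
    E = W * (X - Y)                     # residual on observed entries only
    f = 0.5 * np.sum(E ** 2)
    grads = []
    last = len(cores) - 1
    for n, G in enumerate(cores):
        r_prev, i_n, r_next = G.shape
        # Merged cores before and after position n; both are 1 at the borders.
        left = contract(cores[:n]).reshape(-1, r_prev) if n > 0 else np.ones((1, 1))
        right = contract(cores[n + 1:]).reshape(r_next, -1) if n < last else np.ones((1, 1))
        E3 = E.reshape(left.shape[0], i_n, right.shape[1])
        grads.append(np.einsum("pa,piq,bq->aib", left, E3, right))
    return f, grads
```

The returned gradients can be passed to any first-order optimizer. Note that every call touches the full tensor, so the cost grows with the total number of entries rather than with the number of observed entries.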
As seen from equation (4), TTWOPT computes the gradients over the whole tensor in every iteration. This computation can be redundant because the missing entries still occupy computational space. If the data is huge and the number of missing entries is high, only a small number of the observed entries are needed. In this situation, TTWOPT wastes much computational storage and the computation becomes time-consuming. To solve these problems of TTWOPT, we propose the TTSGD algorithm, which randomly samples only one observed entry to compute the gradients in every iteration.
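3.2 Tensor-train Stochastic Gradient Descent (TTSGD)
Stochastic Gradient Descent (SGD) has been applied in matrix and tensor decompositions gemulla2011large ; maehara2016expected ; wang2016online . In every optimization iteration, we use only one entry, randomly sampled from the observed entries, and one entry can only influence the gradient of part of the core tensors. For one observed entry of index $\{i_1,i_2,\cdots,i_N\}$, if the value approximated by the TT core tensors is $x_{i_1 i_2\cdots i_N}$ and the observed value (real value) is $y_{i_1 i_2\cdots i_N}$, then by considering equation (2), the objective function can be formulated by:
$$f(\mathbf{G}^{(1)}_{i_1},\mathbf{G}^{(2)}_{i_2},\cdots,\mathbf{G}^{(N)}_{i_N})=\frac{1}{2}\Big(y_{i_1 i_2\cdots i_N}-\prod_{n=1}^{N}\mathbf{G}^{(n)}_{i_n}\Big)^2. \tag{9}$$
For $n=1,\ldots,N$, the partial derivative of (9) w.r.t. the corresponding slice of the nth core tensor is
$$\frac{\partial f}{\partial\mathbf{G}^{(n)}_{i_n}}=\left(x_{i_1 i_2\cdots i_N}-y_{i_1 i_2\cdots i_N}\right)\left(\mathbf{G}^{>n}_{i}\,\mathbf{G}^{<n}_{i}\right)^{\mathsf{T}}, \tag{10}$$
where $\mathbf{G}^{>n}_{i}=\prod_{m=n+1}^{N}\mathbf{G}^{(m)}_{i_m}\in\mathbb{R}^{R_n\times 1}$ and $\mathbf{G}^{<n}_{i}=\prod_{m=1}^{n-1}\mathbf{G}^{(m)}_{i_m}\in\mathbb{R}^{1\times R_{n-1}}$ are the products of the slices after and before the nth one, with $\mathbf{G}^{>N}_{i}=\mathbf{G}^{<1}_{i}=1$.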
From the equation we can see that the computational complexity of TTSGD is not related to the scale of the observed tensor or the number of observed entries, so it can process large-scale data with much smaller computational complexity than TTWOPT. This algorithm is also suitable for online/real-time learning. The optimization process of TTSGD is listed in Algorithm 2:
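3: While the optimization stopping condition is not satisfied
4:   Randomly sample an observed entry $y_{i_1 i_2\cdots i_N}$ from $\mathcal{Y}$.
5:   For $n=1:N$
6:     Compute the gradients of the core tensors by equation (10).
7:   End
8:   Update $\mathbf{G}^{(1)}_{i_1},\mathbf{G}^{(2)}_{i_2},\cdots,\mathbf{G}^{(N)}_{i_N}$ by gradient descent method.
9: End while
10: Output: $\mathcal{G}^{(1)},\mathcal{G}^{(2)},\cdots,\mathcal{G}^{(N)}$.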
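The loop above can be sketched as follows. This is a minimal illustration under stated assumptions: cores are stored as arrays of shape $R_{n-1}\times I_n\times R_n$, the update is plain gradient descent with a fixed learning rate (the experiments in Section 4 use Adam instead), the stopping condition is reduced to a fixed iteration count, and the function names and default values are hypothetical. The gradient of the single-entry squared error with respect to each selected slice is the residual times the outer product of the slice products on either side of it.

```python
import numpy as np


def tt_entry(cores, idx):
    """Value of one tensor entry: product of the selected mode-2 slices."""
    out = np.ones((1, 1))
    for G, i in zip(cores, idx):
        out = out @ G[:, i, :]
    return out[0, 0]


def ttsgd(Y, W, cores, n_iter=10000, lr=0.01, seed=0):
    """Batch-one SGD with a fixed step size; updates `cores` in place."""
    rng = np.random.default_rng(seed)
    N = Y.ndim
    observed = np.argwhere(W > 0)               # indices of observed entries
    for _ in range(n_iter):
        idx = observed[rng.integers(len(observed))]
        slices = [cores[n][:, idx[n], :] for n in range(N)]
        # left[n]: product of slices before n, shape (1, R_{n-1})
        left = [np.ones((1, 1))]
        for n in range(N - 1):
            left.append(left[-1] @ slices[n])
        # right[n]: product of slices after n, shape (R_n, 1)
        right = [np.ones((1, 1))]
        for n in range(N - 1, 0, -1):
            right.insert(0, slices[n] @ right[0])
        err = (left[-1] @ slices[-1])[0, 0] - Y[tuple(idx)]
        for n in range(N):
            grad = err * (right[n] @ left[n]).T  # shape (R_{n-1}, R_n)
            cores[n][:, idx[n], :] -= lr * grad
    return cores
```

Each iteration touches only one slice per core, so its cost depends on the tensor order and the TT-rank but not on the tensor size.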
3.3 Computational Complexity
For tensor $\mathcal{X}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$, we assume that $I_1,I_2,\cdots,I_N$ are all equal to $I$, and $R_1=R_2=\cdots=R_{N-1}=R$. According to equations (8) and (10), the time complexities of TTWOPT and TTSGD are $O(NI^N+NI^{N-1}R^2)$ and $O(N^2R^3)$ respectively, and the space complexities of the two algorithms are $O(I^N+I^{N-1}R^2)$ and $O(R^2)$ respectively. Though TTWOPT has larger computational complexity, it has steady and fast convergence when processing normal-size data. The per-iteration complexity of TTSGD does not depend on the data dimensions and is extremely low, so it is more suitable for processing large-scale data. It should be noted that in every iteration of TTSGD, we can also apply the batch-based SGD method, which sums the gradients of a batch of entries. Though this can improve the stability of TTSGD and the algorithm might need fewer iterations to converge, the computational complexity increases and more computational time is needed for every iteration. In this paper, we only apply the batch-one SGD algorithm, and the synthetic experiment in the next section shows that our method can also achieve fast and stable convergence. The code of the proposed algorithms is available at https://github.com/yuanlonghao/T3C_tensor_completion.
4 Experiment results
In this section, simulation experiments are conducted to show the performance of our algorithms and the compared algorithms under various tensor orders. For the real-world data experiments, we test our algorithms on color images, video data and hyperspectral image data. TTWOPT and TTSGD are compared with several state-of-the-art algorithms: TTALS wang2016tensor , SiLRTCTT bengua2017efficient , TRALS wang2017efficient , STTC liu2018image , CPWOPT acar2011scalable , FBCP zhao2015bayesian , HaLRTC and FaLRTC liu2013tensor , and TLnR filipovic2015tucker . For all the compared algorithms, the input incomplete tensor is $\mathcal{W}\ast\mathcal{Y}$, where $\mathcal{Y}$ is the fully observed true tensor and $\mathcal{W}$ is the binary tensor recording the positions of the observed entries. The final completed tensor $\mathcal{Z}$ is calculated by $\mathcal{Z}=(1-\mathcal{W})\ast\mathcal{X}+\mathcal{W}\ast\mathcal{Y}$, where $\mathcal{X}$ is the output tensor obtained by each algorithm. We apply the relative squared error (RSE), defined as $\mathrm{RSE}=\|\mathcal{Z}-\mathcal{Y}\|_F/\|\mathcal{Y}\|_F$, to evaluate the completion performance of each algorithm. For the random missing cases, we randomly remove data points according to different missing rates $mr$, defined as $mr=1-M/\prod_{n=1}^{N}I_n$, where $M$ is the number of observed entries. Moreover, to evaluate the completion quality of visual data, we introduce PSNR (Peak Signal-to-Noise Ratio). PSNR is obtained by $\mathrm{PSNR}=10\log_{10}(255^2/\mathrm{MSE})$, where $\mathrm{MSE}=\|\mathcal{Z}-\mathcal{Y}\|_F^2/\mathrm{num}(\mathcal{Z})$ and $\mathrm{num}(\cdot)$ denotes the number of elements of the tensor.
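The evaluation quantities above can be sketched as follows. The RSE here assumes the common definition as the Frobenius norm of the error divided by the Frobenius norm of the true tensor; NumPy, the function names and the `peak` parameter are likewise assumptions for illustration.

```python
import numpy as np


def complete(X, Y, W):
    """Keep observed entries of Y and fill the rest from the estimate X."""
    return (1 - W) * X + W * Y


def rse(Z, Y):
    """Relative error, assuming RSE = ||Z - Y||_F / ||Y||_F."""
    return np.linalg.norm(Z - Y) / np.linalg.norm(Y)


def psnr(Z, Y, peak=255.0):
    """PSNR = 10 log10(peak^2 / MSE) with MSE averaged over all entries."""
    mse = np.sum((Z - Y) ** 2) / Z.size
    return 10 * np.log10(peak ** 2 / mse)


def missing_rate(W):
    """mr = 1 - (number of observed entries) / (total number of entries)."""
    return 1 - W.sum() / W.size
```

Since `complete` copies the observed entries unchanged, both metrics measure only the error made on the missing positions, normalized over the whole tensor.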
For the optimization method of TTWOPT, in order to have a clear comparison with CPWOPT, which is also based on gradient descent methods, we adopt the same optimization method as acar2011scalable . That paper applies nonlinear conjugate gradient (NCG) with Hestenes-Stiefel updates wright1999numerical and the Moré-Thuente line search method more1994line . The optimization method is implemented by an optimization toolbox named Poblano Toolbox dunlavy2010poblano . For TTSGD, we employ an algorithm named Adaptive Moment Estimation (Adam) as our gradient descent method; it performs prominently in stochastic-gradient-based optimization. In the Adam update, $t$ is the iteration index of the optimized variable $\theta$, $\eta$ and $\epsilon$ are hyperparameters, and $m_t$ and $v_t$ are the first moment estimate and second moment estimate of the gradient $g_t$ respectively: $m_t=\beta_1 m_{t-1}+(1-\beta_1)g_t$ and $v_t=\beta_2 v_{t-1}+(1-\beta_2)g_t^2$, where $\beta_1$ and $\beta_2$ are hyperparameters. For the hyperparameters of the Adam method, we adopt the reference values from kingma2014adam . The values of $\beta_1$, $\beta_2$ and $\epsilon$ are set as 0.9, 0.999 and $10^{-8}$ respectively. The selection of the learning rate is essential to the convergence speed and the performance of gradient-based algorithms. In our experiments, we empirically choose the learning rate $\eta$ from $\{0.0001, 0.0005, 0.001\}$ to obtain the best convergence speed and the best performance. In addition, all the data in our experiments are scaled to the range 0 to 1 to make the algorithms more effective.
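For reference, one Adam step can be sketched as below. The moment updates and default hyperparameters match the values quoted above; the bias-corrected form of the final update is the standard one from kingma2014adam and is an assumption about the exact variant used in the experiments. The function name and argument order are hypothetical.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based iteration index."""
    m = beta1 * m + (1 - beta1) * grad          # first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction (assumed variant)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

In a stochastic TT setting, one would keep separate `m` and `v` arrays with the same shapes as the core tensors and apply the step only to the slices touched by the sampled entry; how the untouched slices are handled is a design choice this sketch does not cover.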
We mainly adopt two optimization stopping conditions for all the compared completion algorithms. One is the change in the objective function value between two adjacent iterations: $|f_t-f_{t-1}|\leq tol$, where $f_t$ is the objective function value of the tth iteration and we set $tol=10^{-4}$ in our experiments. The other stopping condition is the maximum number of iterations, which is set according to the scale of the data and the algorithm, e.g., the maximum number of iterations for most algorithms is set as 500 and for TTSGD it is usually set from $10^5$ to $10^7$. If either of the two conditions is satisfied, the optimization is stopped. All the computations are conducted on a Mac PC with an Intel Core i7 and 16GB DDR3 memory, and the computational time of the algorithms is recorded in some experiments based on this configuration.
4.1 Synthetic Data
We apply synthetic data generated from a highly oscillating function, $f(x)=\sin\frac{x}{4}\cos(x^2)$ khoromskij2015tensor , in our simulation experiments. The synthetic data is expected to be well approximated by tensor decomposition models. We sample $I^N$ entries from the values generated by the function, then the sampled values are reshaped to the desired tensor size. We employ four different tensor structures: 26×26×26 (3D), 7×7×7×7×7 (5D), 4×4×4×4×4×4×4 (7D), and 3×3×3×3×3×3×3×3×3 (9D), then we test TTSGD, TTWOPT, TTALS, SiLRTCTT, TRALS, CPWOPT and HaLRTC on the synthetic data. For parameter settings, the hyperparameters of each algorithm are tuned to obtain the best performance. For simplicity, we set the values of each TT-rank and TR-rank identically, i.e., $R_1=\cdots=R_{N-1}$ for TT and $R_1=\cdots=R_N$ for TR. Moreover, the TT-rank, TR-rank and CP-rank are set as 12, 10 and 30 respectively under all the different tensor orders for the corresponding algorithms to make a clear comparison of the completion performance. In addition, the maximum number of iterations of TTSGD is set as $10^5$, and that of the other algorithms is set as 500.
The graphs of Figure 1 show the RSE values under different $mr$ (from 0.1 to 0.9) for the four tensor orders. From the figure, we can see that TTWOPT and TTSGD show high performance in all the cases. HaLRTC only shows high performance in the 3D tensor case, and CPWOPT and SiLRTCTT show stable but low performance in every case. Though TTALS and TRALS show higher performance than our algorithms in some low-missing-rate cases, their performance decreases drastically when the missing rate increases, whereas our algorithms always show high and stable performance.
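The data generation step can be sketched as follows, reading the oscillating function as $\sin(x/4)\cos(x^2)$. The sampling grid (uniform points on an interval ending at a hypothetical `x_max`) is an assumption, since the text does not state how the sample points are chosen; NumPy and the function name are also assumptions.

```python
import numpy as np


def synthetic_tensor(dims, x_max=100.0):
    """Sample sin(x/4) * cos(x^2) on a uniform grid and reshape to `dims`."""
    n = int(np.prod(dims))
    x = np.linspace(0.0, x_max, n)          # assumed sampling grid
    values = np.sin(x / 4) * np.cos(x ** 2)
    return values.reshape(dims)
```

For example, `synthetic_tensor((7,) * 5)` produces a 5D tensor with 16807 entries drawn from the same one-dimensional signal, so the four tensor structures differ only in how the signal is folded.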
For the next synthetic data experiment, we look into the convergence performance of the proposed TTSGD. The four tensors applied in the previous experiment are employed as the input data. We record the value of the loss function (i.e., $\frac{1}{2}\|\mathcal{Z}-\mathcal{Y}\|_F^2$) every $10^3$ iterations, and Figure 2 shows the convergence status of TTSGD when the missing rate is 0.1, 0.5 and 0.9 respectively. Though TTSGD needs a large number of iterations to converge, the computational complexity of each iteration is rather low (i.e., $O(N^2R^3)$), and only one entry is sampled to calculate the gradient in every iteration. For TTSGD, the running time to reach $10^5$ iterations for the 3D, 5D, 7D and 9D data under the parameter settings of the experiment is 10.09 seconds, 25.09 seconds, 45.86 seconds and 75.41 seconds respectively, while TTWOPT takes roughly 1.6 to 2.2 times as long (i.e., 18.80 seconds, 41.84 seconds, 100.02 seconds and 122.77 seconds) to converge to the same RSE values. The performance and computation time demonstrate the effectiveness of the TTSGD algorithm.
4.2 Visual Data Tensorization (VDT) method
From the simulation results we can see that our proposed algorithms achieve high and stable performance on high-order tensors. In this section, we provide a Visual Data Tensorization (VDT) method to transform a low-order tensor into a higher-order tensor and improve the performance of our algorithms. The VDT method is derived from an image compression and entanglement methodology latorre2005image , which transforms a grayscale image of size $2^l\times 2^l$ into a real ket of a Hilbert space. The method casts the image to a higher-order tensor structure with an appropriate block-structured addressing. A similar method named KA augmentation is proposed in bengua2017efficient , which extends the method in latorre2005image to order-three visual data of size $2^l\times 2^l\times 3$. Our VDT method is a generalization of the KA augmentation, and visual data of various sizes can be applied to our tensorization method. For visual data like RGB images, videos and hyperspectral images, the first two orders of the tensor (e.g., $\mathbf{Y}\in\mathbb{R}^{U\times V}$) are named the image modes. The 2D representation of the image modes cannot fully exploit the correlation and local structure of the data, so we propose the VDT method to strengthen the local structure correlation of visual data. The VDT method operates as follows: if the first two orders of a visual data tensor are $U\times V$ and can be reshaped to $u_1\times u_2\times\cdots\times u_l\times v_1\times v_2\times\cdots\times v_l$, then the VDT method permutes and reshapes the data to size $u_1v_1\times u_2v_2\times\cdots\times u_lv_l$ and obtains the higher-order representation of the visual data. This higher-order tensor is a new structure of the original data: the first order of this higher-order tensor corresponds to a $u_1\times v_1$ pixel block of the image, and the following orders $u_2v_2,\cdots,u_lv_l$ describe the expanding larger-scale partitions of the image. Based on the VDT method, TT-based algorithms can efficiently exploit the structure information of visual data and achieve a better low-rank representation.
After the tensorized data is calculated by the completion algorithms, a reverse operation of VDT is conducted to get the original image structure. The diagrams to explain the procedure of VDT are shown in Figure 3.
To verify the effectiveness of our VDT method, we choose the benchmark image ‘Lena’ with a 0.9 missing rate. We compare the performance of six algorithms (TTWOPT, TTSGD, CPWOPT, FBCP, HaLRTC and TLnR) under three different data structures: an order-three tensor, an order-nine tensor without VDT, and an order-nine tensor generated by the VDT method. The order-three tensor uses the original image data structure of size 256×256×3. The nine-order tensor without VDT is generated by directly reshaping the data to the size 4×4×4×4×4×4×4×4×3. For the nine-order tensor with the VDT method, the original data is first reshaped to an order-seventeen tensor of size 2×2×2×2×2×2×2×2×2×2×2×2×2×2×2×2×3 and then permuted according to the order {1 9 2 10 3 11 4 12 5 13 6 14 7 15 8 16 17}. Finally we reshape the tensor to a nine-order tensor of size 4×4×4×4×4×4×4×4×3. This nine-order tensor with VDT is considered to be a better structure of the image data. The first order of the nine-way tensor contains the data of a 2×2 pixel block of the image and the following orders of the tensor describe the expanding pixel blocks of the image. Most of the parameter settings follow the previous synthetic data experiments, and we tune the TT-rank, CP-rank and Tucker-rank of the corresponding algorithms to obtain the best performance. Figure 4 and Table 1 show the visual results and numerical results of the six algorithms under the three different data structures. We can see that in the three-order tensor case, the results among the algorithms are similar. However, for the nine-order cases, the other algorithms fail the completion task while TTWOPT and TTSGD perform well. Furthermore, when the image is transformed to a nine-order tensor by the VDT method, we see a distinct improvement of our two algorithms.
| Structure | Metric | TTWOPT | TTSGD | CPWOPT | FBCP | HaLRTC | TLnR |
|---|---|---|---|---|---|---|---|
| three-order | RSE | 0.2822 | 0.2604 | 0.3392 | 0.1942 | 0.1981 | 0.6552 |
| three-order | PSNR | 16.12 | 16.84 | 14.53 | 19.36 | 19.18 | 8.802 |
| nine-order | RSE | 0.1558 | 0.1793 | 0.2562 | 0.2682 | 0.9310 | 1.207 |
| nine-order | PSNR | 21.31 | 20.06 | 16.95 | 16.57 | 5.746 | 3.486 |
| nine-order VDT | RSE | 0.1262 | 0.1493 | 0.2573 | 0.2687 | 0.9301 | 0.7114 |
| nine-order VDT | PSNR | 23.21 | 21.77 | 16.97 | 16.57 | 5.751 | 10.84 |

Table 1: Numerical results of completion performance (RSE and PSNR) of six algorithms under three tensor structures of image ‘Lena’.
4.3 Benchmark Image Completion
From the previous experiments we can see that TT-based and TR-based algorithms can be applied to higher-order tensors, and a significant improvement of TT-based algorithms can be seen when the VDT method is applied to the image tensorization. However, for algorithms based on CP decomposition and Tucker decomposition, higher-order tensorization decreases the performance. In later experiments, we only apply the VDT method to TTWOPT, TTSGD, TTALS, SiLRTCTT and TRALS. For CPWOPT, FBCP, TLnR, STTC and HaLRTC, we keep the original data structure to get better results.
In this experiment, we consider several irregular missing cases (scratch missing, whole row missing and block missing) and some high random missing cases on benchmark RGB images. The parameter settings of each compared algorithm are tuned to get the best performance. From the completion results in Figure 5 and Table 2, we can see that our algorithms show high completion performance in all the missing cases. Moreover, for the irregular missing cases and the 0.8 random missing case, STTC and HaLRTC perform well and achieve low RSE values. However, the two algorithms fail to solve the completion task when the random missing rate is 0.9 and 0.99. This is because the nuclear-norm-based and total-variations-based algorithms cannot explore low-rank and local information when only a very small number of entries is observed. It should be noted that the 0.99 random missing case is a challenging task for all the image completion algorithms. Our two proposed algorithms with the VDT method can achieve high performance in this situation while the other algorithms fail.
| missing patterns | indices | TTSGD | TTWOPT | TTALS | SiLRTCTT | TRALS | CPWOPT | FBCP | TLnR | STTC | HaLRTC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Scratch | RSE | 0.09946 | 0.1455 | 0.1319 | 0.1522 | 0.1173 | 0.2160 | 0.1185 | 0.2871 | 0.1085 | 0.1168 |
| Scratch | PSNR | 24.93 | 21.62 | 22.48 | 21.23 | 23.49 | 18.19 | 23.40 | 15.72 | 24.17 | 23.53 |
| Row | RSE | 0.07319 | 0.09629 | 0.1385 | 0.1379 | 0.07325 | 0.1653 | 0.3605 | 0.1797 | 0.1069 | 0.3605 |
| Row | PSNR | 27.91 | 25.54 | 22.38 | 22.41 | 27.91 | 20.84 | 14.07 | 20.12 | 24.62 | 14.07 |
| Block | RSE | 0.08084 | 0.09196 | 0.09671 | 0.09511 | 0.08517 | 0.1391 | 0.1147 | 0.1579 | 0.07315 | 0.08167 |
| Block | PSNR | 27.20 | 26.09 | 25.65 | 29.86 | 26.75 | 22.49 | 24.17 | 21.39 | 28.07 | 27.12 |
| 0.9 random | RSE | 0.1444 | 0.1635 | 0.1891 | 0.1969 | 0.1090 | 0.3209 | 0.1967 | 0.5815 | 0.1845 | 0.1621 |
| 0.9 random | PSNR | 21.11 | 20.03 | 18.77 | 18.41 | 23.55 | 14.17 | 18.43 | 9.01 | 18.98 | 20.11 |
| 0.95 random | RSE | 0.1576 | 0.1797 | 0.2547 | 0.2865 | 0.2147 | 0.4045 | 0.2850 | 0.5557 | | 0.2820 |
| 0.95 random | PSNR | 21.17 | 20.32 | 17.00 | 15.98 | 18.49 | 12.98 | 16.02 | 10.23 | | 16.11 |
| 0.99 random | RSE | 0.3318 | 0.2520 | | 0.4049 | | 0.4749 | 0.4074 | 0.8545 | | 0.9129 |
| 0.99 random | PSNR | 15.81 | 15.30 | | 6.98 | | 12.70 | 14.03 | 7.30 | | 7.03 |

Table 2: Comparison of the inpainting performance (RSE and PSNR) of ten algorithms under six missing situations.
4.4 Video and Hyperspectral Image Completion
For the large-scale data completion task, we test a video and a hyperspectral image (HSI) in the following experiments. Of our proposed algorithms, we only test TTSGD because it is better suited to large-scale data than TTWOPT. In addition, when large-scale data is employed, many algorithms which work well on benchmark images become inefficient or ineffective, so we compare TTSGD to only a few algorithms (TTALS, CPWOPT, FBCP, and HaLRTC).
First, we test a video which records a moving train. The size of the data is 320×256×3×100 and the background of the video changes from frame to frame. By the VDT method, we first reshape the data to size 2×2×2×2×2×2×5×2×2×2×2×2×2×4×3×100, then permute it by index {1 8 2 9 3 10 4 11 5 12 6 13 7 14 15 16}, and finally reshape it to size 4×4×4×4×4×4×20×3×100 as the input tensor. We compare three random missing cases ($mr=0.7$, $mr=0.9$ and $mr=0.99$) in this experiment. Part of the visual results are shown in Figure 6, and the numerical results are shown in Table 3. TTSGD outperforms the other compared algorithms. More specifically, it can recover the video well even when only 1% of the entries are sampled, while the other compared algorithms fail in this high-missing-rate case. It should also be noted that the time cost of TTSGD is lower than that of the other compared algorithms, which shows the high efficiency of TTSGD.
mr=0.7
mr=0.9
mr=0.99
Algorithm
RSE PSNR Time
RSE PSNR Time
RSE PSNR Time
TTSGD
0.1459 22.67 680.94
0.2045 19.87 674.17
0.2185 19.24 698.11
TTALS
0.2116 19.48 7100.43
0.2400 18.39 1622.42
0.2557 17.8466 793.76
CPWOPT
0.2673 17.41 825.06
0.3264 15.67 790.60
0.3610 14.80 814.44
FBCP
0.2204 19.11 870.89
0.2547 17.86 920.78
0.3258 15.72 720.01
HaLRTC
0.1758 21.16 1132.05
0.2562 17.78 1044.88
0.8844 7.016 1121.37
Table 3: Numerical results (RSE and PSNR) on video completion experiments of five algorithms under three random missing cases.
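For reference, the RSE and PSNR columns can be computed along the following lines, assuming the standard definitions: RSE as the ratio of the Frobenius norm of the error to that of the ground truth, and PSNR derived from the mean squared error. The function names and the default peak value of 255 (8-bit data) are assumptions for illustration, not details taken from the experiments.

```python
import numpy as np


def rse(truth, estimate):
    """Relative error: ||estimate - truth||_F / ||truth||_F over all entries."""
    truth = np.asarray(truth, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    # With no ord/axis, np.linalg.norm returns the 2-norm of the flattened array.
    return np.linalg.norm(estimate - truth) / np.linalg.norm(truth)


def psnr(truth, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` is the assumed maximum pixel value."""
    truth = np.asarray(truth, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((estimate - truth) ** 2)
    if mse == 0:
        return float("inf")  # identical tensors
    return 10.0 * np.log10(peak ** 2 / mse)
```

Under these definitions a lower RSE and a higher PSNR both indicate a better reconstruction, which matches how the two columns move together in the table.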
Then we test TTSGD, CPWOPT, FBCP and HaLRTC on a hyperspectral image (HSI) of size 256×256×191 recorded by a satellite. Due to the poor working conditions of satellite sensors, the collected data often contain Gaussian noise, impulse noise, dead lines, and stripes zhang2014hyperspectral . In this experiment, we first consider the situation in which the HSI has ‘dead lines’, a common missing case in HSI recording. Then we consider the case in which only 1% of the data is observed, which is meaningful for data compression and transformation. We transform the HSI data to size 16×16×16×16×191 by the VDT method as the input for TTSGD, and use the original third-order tensor as the input for the other compared algorithms. We set the TT-ranks to 48 and 24 for the dead-line missing case and the 99% missing case, respectively. The visual completion results in Figure 7 show the image of the first channel of the HSI, and the numerical results evaluate the overall completion performance.
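One way to obtain the 16×16×16×16×191 input and a 99% random missing pattern is sketched below. The intermediate 4×4×4×4×4×4×4×4×191 factorization and the interleaving order are assumptions made by analogy with the video experiment, since the text gives only the final size. The function names, the column-major reshaping and the fixed seed are likewise illustrative.

```python
import numpy as np


def vdt_hsi(hsi):
    """Tensorize a 256 x 256 x B image into a 16 x 16 x 16 x 16 x B tensor."""
    height, width, bands = hsi.shape
    if (height, width) != (256, 256):
        raise ValueError("expected a 256 x 256 x B array")
    # Assumed step 1: factor height and width as 256 = 4 * 4 * 4 * 4.
    x = hsi.reshape((4,) * 8 + (bands,), order="F")
    # Assumed step 2: pair height factor k with width factor k.
    x = x.transpose(0, 4, 1, 5, 2, 6, 3, 7, 8)
    # Step 3: merge each pair into one mode of size 16.
    return x.reshape((16,) * 4 + (bands,), order="F")


def random_observation_mask(shape, missing_rate, seed=0):
    """Boolean mask, True where an entry is observed (illustrative sampler)."""
    rng = np.random.default_rng(seed)
    # Each entry is dropped independently with probability `missing_rate`.
    return rng.random(shape) >= missing_rate
```

With `missing_rate=0.99` the mask keeps roughly 1% of the entries, so the observed fraction is only approximately, not exactly, 1%.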
TTSGD performs best among the algorithms in both the dead-line missing case and the 99% random missing case. In the 99% random missing case, HaLRTC fails the completion task, while CPWOPT and FBCP obtain lower performance than TTSGD. In addition, it should be noted that the volume of the data is about 1.25×10^7, and the optimization of TTSGD has converged by the time the iteration count reaches 1×10^6 (16% of the total data). This indicates that TTSGD is fast and computationally efficient.
5 Conclusion
In this paper, to solve the tensor completion problem, we propose two tensor completion algorithms, TTWOPT and TTSGD, based on tensor-train decomposition and gradient descent. We first cast the completion problem as an optimization model, then use gradient descent methods to find the optimal core tensors of the TT decomposition. Finally, the TT core tensors are used to approximate the missing entries of the incomplete tensor. Furthermore, to improve the performance of the proposed algorithms, we propose the VDT method to tensorize visual data to a higher order. We conduct simulation experiments and visual data experiments to compare our algorithms with state-of-the-art algorithms. The simulation experiments show that the performance of our algorithms stays stable as the tensor order increases. Moreover, the visual data experiments show that higher-order tensorization by VDT improves the performance of our two algorithms. Our algorithms outperform the compared state-of-the-art algorithms in various missing situations, particularly when the tensor order is high and the missing rate is high. More specifically, our algorithms with the VDT method handle the extremely high random missing situation (i.e., 99% random missing) well, while the other algorithms fail. Besides, our proposed TTSGD achieves low computational complexity and high efficiency in processing large-scale data.
The high performance of the proposed algorithms shows that TT-based tensor completion is a promising direction. It should be noted that the TT-rank setting is essential for obtaining good experimental results, and that it is usually selected manually. In future work, we will extend our algorithms to choose the TT-rank automatically.
Acknowledgement
This work was supported by JSPS KAKENHI (Grant No. 17K00326, 15H04002, 18K04178), JST CREST (Grant No. JPMJCR1784) and the National Natural Science Foundation of China (Grant No. 61773129).
References
(1)
A. Shashua, T. Hazan, Non-negative tensor factorization with applications to
statistics and computer vision, in: Proceedings of the 22nd International
Conference on Machine Learning, ACM, 2005, pp. 792–799.
(2)
T. G. Kolda, B. W. Bader, Tensor decompositions and applications, SIAM review
51 (3) (2009) 455–500.
(3)
M. A. O. Vasilescu, D. Terzopoulos, Multilinear subspace analysis of image
ensembles, in: Computer Vision and Pattern Recognition, 2003. Proceedings.
2003 IEEE Computer Society Conference on, Vol. 2, IEEE, 2003, pp. II–93.
(4)
T. Franz, A. Schultz, S. Sizov, S. Staab, TripleRank: Ranking semantic web data
by tensor decomposition, The Semantic Web - ISWC 2009 (2009) 213–228.
(5)
E. Acar, D. M. Dunlavy, T. G. Kolda, M. Mørup, Scalable tensor
factorizations for incomplete data, Chemometrics and Intelligent Laboratory
Systems 106 (1) (2011) 41–56.
(6)
Q. Zhao, L. Zhang, A. Cichocki, Bayesian cp factorization of incomplete tensors
with automatic rank determination, IEEE transactions on pattern analysis and
machine intelligence 37 (9) (2015) 1751–1763.
(7)
L. De Lathauwer, J. Castaing, Blind identification of underdetermined mixtures
by simultaneous matrix diagonalization, IEEE Transactions on Signal
Processing 56 (3) (2008) 1096–1105.
(8)
D. Muti, S. Bourennane, Multidimensional filtering based on a tensor approach,
Signal Processing 85 (12) (2005) 2338–2353.
(9)
J. Mocks, Topographic components model for event-related potentials and some
biophysical considerations, IEEE Transactions on Biomedical Engineering
35 (6) (1988) 482–484.
(10)
A. Shashua, A. Levin, Linear image coding for regression and classification
using the tensor-rank principle, in: Computer Vision and Pattern Recognition,
2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on,
Vol. 1, IEEE, 2001, pp. I–I.
(11)
L. Sorber, M. Van Barel, L. De Lathauwer, Optimization-based algorithms for
tensor decompositions: Canonical polyadic decomposition, decomposition in
rank-(L_r, L_r, 1) terms, and a new generalization, SIAM Journal on
Optimization 23 (2) (2013) 695–720.
(12)
J. H. d. M. Goulart, M. Boizard, R. Boyer, G. Favier, P. Comon, Tensor cp
decomposition with structured factor matrices: Algorithms and performance,
IEEE Journal of Selected Topics in Signal Processing 10 (4) (2016) 757–769.
(13)
L. R. Tucker, Some mathematical notes on three-mode factor analysis,
Psychometrika 31 (3) (1966) 279–311.
(14)
L. De Lathauwer, B. De Moor, J. Vandewalle, On the best rank-1 and
rank-(R1, R2, …, RN) approximation of higher-order tensors, SIAM Journal on
Matrix Analysis and Applications 21 (4) (2000) 1324–1342.
(15)
C.-Y. Tsai, A. M. Saxe, D. Cox, Tensor switching networks, in: Advances in
Neural Information Processing Systems, 2016, pp. 2038–2046.
(16)
J. Liu, P. Musialski, P. Wonka, J. Ye, Tensor completion for estimating missing
values in visual data, IEEE Transactions on Pattern Analysis and Machine
Intelligence 35 (1) (2013) 208–220.
(17)
M. Filipović, A. Jukić, Tucker factorization with missing data with
application to low-n-rank tensor completion, Multidimensional Systems and
Signal Processing 26 (3) (2015) 677–692.
(18)
I. V. Oseledets, Tensor-train decomposition, SIAM Journal on Scientific
Computing 33 (5) (2011) 2295–2317.
(19)
J. A. Bengua, H. N. Phien, H. D. Tuan, M. N. Do, Efficient tensor completion
for color image and video recovery: Low-rank tensor train, IEEE Transactions
on Image Processing 26 (5) (2017) 2466–2479.
(20)
Y. Yang, D. Krompass, V. Tresp, Tensor-train recurrent neural networks for
video classification, in: International Conference on Machine Learning, 2017,
pp. 3891–3900.
(22)
L. Yuan, Q. Zhao, J. Cao, Completion of high order tensor data with missing
entries via tensor-train decomposition, in: International Conference on
Neural Information Processing, Springer, 2017, pp. 222–229.
(28)
Y. Liu, Z. Long, C. Zhu, Image completion using low tensor tree rank and total
variation minimization, IEEE Transactions on Multimedia.
(29)
A. Cichocki, N. Lee, I. Oseledets, A.-H. Phan, Q. Zhao, D. P. Mandic, et al.,
Tensor networks for dimensionality reduction and large-scale optimization:
Part 1 low-rank tensor decompositions, Foundations and
Trends® in Machine Learning 9 (4-5) (2016) 249–429.
(30)
R. Gemulla, E. Nijkamp, P. J. Haas, Y. Sismanis, Large-scale matrix
factorization with distributed stochastic gradient descent, in: Proceedings
of the 17th ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining, ACM, 2011, pp. 69–77.
(31)
T. Maehara, K. Hayashi, K.-i. Kawarabayashi, Expected tensor decomposition with
stochastic gradient descent, in: AAAI, 2016, pp. 1919–1925.
(32)
Y. Wang, A. Anandkumar, Online and differentially-private tensor decomposition,
in: Advances in Neural Information Processing Systems, 2016, pp. 3531–3539.
(33)
S. Wright, J. Nocedal, Numerical optimization, Springer Science 35 (67-68)
(1999) 7.
(34)
J. J. Moré, D. J. Thuente, Line search algorithms with guaranteed
sufficient decrease, ACM Transactions on Mathematical Software (TOMS) 20 (3)
(1994) 286–307.
(35)
D. M. Dunlavy, T. G. Kolda, E. Acar, Poblano v1.0: A Matlab toolbox for
gradient-based optimization, Sandia National Laboratories, Albuquerque, NM
and Livermore, CA, Tech. Rep. SAND2010-1422.
(38)
B. N. Khoromskij, Tensor numerical methods for multidimensional pdes:
theoretical analysis and initial applications, ESAIM: Proceedings and Surveys
48 (2015) 1–28.
(40)
A. Novikov, D. Podoprikhin, A. Osokin, D. P. Vetrov, Tensorizing neural
networks, in: Advances in Neural Information Processing Systems, 2015, pp.
442–450.
(41)
H. Zhang, W. He, L. Zhang, H. Shen, Q. Yuan, Hyperspectral image restoration
using low-rank matrix recovery, IEEE Transactions on Geoscience and Remote
Sensing 52 (8) (2014) 4729–4743.