 # High-dimension Tensor Completion via Gradient-based Optimization Under Tensor-train Format

In this paper, we propose a novel approach to recover the missing entries of incomplete data represented by a high-dimension tensor. Our approach employs the tensor-train decomposition, which has powerful tensor representation ability and is free from the 'curse of dimensionality'. Using the observed entries of the incomplete data, we find the factors that capture the latent features of the data and then reconstruct the missing entries. Under a low-rank assumption on the original data, the tensor completion problem is cast as an optimization model, and gradient descent methods are applied to optimize the core tensors of the tensor-train decomposition. We propose two algorithms to solve tensor completion problems: Tensor-train Weighted Optimization (TT-WOPT) and Tensor-train Stochastic Gradient Descent (TT-SGD). We also propose a high-order tensorization method named visual data tensorization (VDT), which transforms visual data into higher-order forms and thereby improves the performance of our algorithms. Experiments on synthetic data and visual data show that our algorithms outperform the state-of-the-art completion algorithms. The advantage is especially significant in high-dimension, high-missing-rate and large-scale data cases.

## 1 Introduction

Tensors are the high-order generalizations of vectors and matrices. Representing data as a tensor retains the high-dimensional form of the data and keeps the structural information between adjacent entries. Most real-world data have more than two orders. For example, RGB images are order-three tensors ($height \times width \times channel$), videos are order-four tensors ($height \times width \times channel \times time$) and electroencephalography (EEG) signals are order-three tensors ($channel \times time \times frequency$). When facing data with more than two orders, traditional methods usually transform the data into matrices or vectors by concatenation, which leads to spatial redundancy and less efficient factorization shashua2005non . In recent years, many theories, algorithms and applications of tensor methodologies have been studied and proposed kolda2009tensor ; vasilescu2003multilinear ; franz2009triplerank . Owing to the high compression ability and data representation ability of tensor decomposition, it has been applied in a variety of fields such as image and video completion acar2011scalable ; zhao2015bayesian , signal processing de2008blind ; muti2005multidimensional , brain-computer interface mocks1988topographic , image classification shashua2001linear , etc.

In practical situations, missing data is ubiquitous because of errors and noise in the data collection process, which produce outliers and unwanted entries. The key to tensor completion is to find the correlations between the missing entries and the observed entries. Tensor decomposition decomposes tensor data into factors that capture the latent features of the whole data. The basic idea of solving data completion problems by tensor decomposition is to find the decomposition factors from the partially observed data, and then to use the feature representation ability of the factors to approximate the missing entries. The most studied classical tensor decomposition models are the CANDECOMP/PARAFAC (CP) decomposition sorber2013optimization ; goulart2016tensor , and the Tucker decomposition tucker1966some ; de2000best ; tsai2016tensor . CP decomposition decomposes a tensor into a sum of rank-one tensors, and Tucker decomposition approximates a tensor by a core tensor and several factor matrices. Many tensor completion methods employ these two decomposition models. In acar2011scalable , CP weighted optimization (CP-WOPT) is proposed. It formulates the tensor completion problem as a weighted least squares (WLS) problem and uses optimization algorithms to find the optimal CP factors. Fully Bayesian CP Factorization (FBCP) in zhao2015bayesian employs a Bayesian probabilistic model to find the optimal CP factors and the CP-rank at the same time. Three algorithms based on nuclear norm minimization are proposed in liu2013tensor , i.e., SiLRTC, FaLRTC, and HaLRTC. They extend the nuclear norm regularization for matrix completion to tensor completion by minimizing the Tucker rank of the incomplete tensor. In filipovic2015tucker , Tucker low-$n$-rank tensor completion (TLnR) is proposed, and the experiments show better results than the traditional nuclear norm minimization methods.

Though CP and Tucker can obtain relatively high performance on low-order tensors, the natural limitations of these two models cause their performance to decrease rapidly on high-order tensors. In recent years, a matrix product state (MPS) model named tensor train (TT) has been proposed and has become popular oseledets2011tensor ; bengua2017efficient ; yang2017tensor . For an $N$th-order tensor $\mathcal{X}\in\mathbb{R}^{I\times I\times\cdots\times I}$, the CP decomposition represents the data by $\mathcal{O}(NIR)$ model parameters, the Tucker model needs $\mathcal{O}(NIR+R^N)$ model parameters, and the TT model requires $\mathcal{O}(NIR^2)$ parameters, where $R$ represents the rank of each decomposition model. Like the CP decomposition, the TT decomposition scales linearly with the tensor order. Though the CP model is more compact for a given rank, it is difficult to find the optimal CP factors, especially when the tensor order is high. The Tucker model is more flexible and stable, but its number of model parameters grows exponentially with the tensor order. Tensor train is free from the 'curse of dimensionality', so it is a better model for processing high-order tensors. In addition to CP-based and Tucker-based tensor completion algorithms, there are several works on TT-based tensor completion. bengua2017efficient develops low-TT-rank algorithms for tensor completion. Under a low-rank assumption based on the TT-rank, nuclear norm regularizations are imposed on the more balanced unfoldings of the tensor, which improves the performance. TT-ALS is proposed in wang2016tensor , in which the authors employ the alternating least squares (ALS) method to find the TT decomposition factors and solve the tensor completion problem. A gradient-based completion algorithm is discussed in yuan2017completion , which finds the TT decomposition by a gradient descent method and shows high performance in high-order and high-missing-rate tensor completion problems.

There are also tensor completion algorithms based on other tensor decomposition models, e.g., tensor ring (TR) decomposition zhao2016tensor ; zhao2017learning and hierarchical Tucker (HT) decomposition. Based on TR decomposition, the works in wang2017efficient ; yuan2018higher ; yuan2018tensor propose algorithms named TR-ALS, TR-WOPT and TRLRF, which apply ALS, gradient descent and nuclear norm minimization methods to solve various tensor completion problems. Moreover, using total variations (TV) and HT decomposition, liu2018image proposes a completion algorithm named STTC, which explores the global low-rank tensor structure and the local correlation structure of the data simultaneously.
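The parameter scaling of the CP, Tucker and TT models discussed above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, all mode sizes are assumed equal to `dim`, all ranks are assumed equal to `rank`, and the counts are the standard exact parameter counts behind the usual big-O expressions.

```python
def cp_params(order, dim, rank):
    # N factor matrices of size I x R.
    return order * dim * rank


def tucker_params(order, dim, rank):
    # N factor matrices of size I x R plus an R x R x ... x R core tensor.
    return order * dim * rank + rank ** order


def tt_params(order, dim, rank):
    # Two boundary cores of size I x R and N - 2 interior cores of size R x I x R.
    return 2 * dim * rank + (order - 2) * dim * rank ** 2


for order in (3, 6, 9):
    print(order,
          cp_params(order, 10, 5),
          tucker_params(order, 10, 5),
          tt_params(order, 10, 5))
```

The Tucker count is dominated by the `rank ** order` core term as the order grows, while the CP and TT counts grow linearly in the order.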

In this paper, we mainly focus on developing efficient tensor completion algorithms based on TT decomposition. Though several tensor completion methods based on the TT model have been proposed recently bengua2017efficient ; wang2016tensor ; yuan2017completion , their applicability and effectiveness are limited. The main contributions of this paper are summarized as follows: 1) Based on optimization methodology and tensor train decomposition, we propose two algorithms named Tensor train Weighted Optimization (TT-WOPT) and Tensor train Stochastic Gradient Descent (TT-SGD), which apply gradient-based optimization to solve tensor completion problems. 2) We conduct simulation experiments with different tensor orders and compare our algorithms to the state-of-the-art tensor completion algorithms. Our algorithms achieve superior performance on both low-order and high-order tensors. 3) We propose a tensorization method named Visual Data Tensorization (VDT) to transform visual data into higher-order tensors, which improves the performance of our algorithms. 4) We test our algorithms on benchmark RGB images, video data, and hyperspectral image data, where they achieve higher performance than the state-of-the-art algorithms.

The rest of the paper is organized as follows. In Section 2, we state the notations applied in this paper and introduce the tensor train decomposition. In Section 3, we present the two tensor completion algorithms and analyze the computational complexities of the algorithms. In Section 4, various experiments are conducted on synthetic data and real-world data, in which the proposed algorithms are compared to the state-of-the-art algorithms. We conclude our work in Section 5.

## 2 Preliminaries and Related works

### 2.1 Notations

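The notations in kolda2009tensor are adopted in this paper. A scalar is denoted by a normal lowercase/uppercase letter, e.g., $x, X\in\mathbb{R}$, a vector is denoted by a boldface lowercase letter, e.g., $\mathbf{x}\in\mathbb{R}^{I}$, a matrix is denoted by a boldface capital letter, e.g., $\mathbf{X}\in\mathbb{R}^{I\times J}$, and a tensor of order $N\geq 3$ is denoted by an Euler script letter, e.g., $\mathcal{X}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$.

$\{\mathbf{x}^{(n)}\}_{n=1}^{N}$ denotes a vector sequence, in which $\mathbf{x}^{(n)}$ denotes the $n$th vector in the sequence. Matrix sequences and tensor sequences are denoted in the same way. An element of tensor $\mathcal{X}$ of index $\{i_1,i_2,\ldots,i_N\}$ is denoted by $x_{i_1 i_2\cdots i_N}$ or $\mathcal{X}(i_1,i_2,\ldots,i_N)$. The mode-$n$ matricization (unfolding) of tensor $\mathcal{X}$ is denoted by $\mathbf{X}_{(n)}\in\mathbb{R}^{I_n\times I_1\cdots I_{n-1}I_{n+1}\cdots I_N}$.

Furthermore, the inner product of two tensors $\mathcal{X}$, $\mathcal{Y}$ of the same size is defined as $\langle\mathcal{X},\mathcal{Y}\rangle=\sum_{i_1}\sum_{i_2}\cdots\sum_{i_N}x_{i_1 i_2\cdots i_N}\,y_{i_1 i_2\cdots i_N}$. The Frobenius norm of $\mathcal{X}$ is defined by $\|\mathcal{X}\|_F=\sqrt{\langle\mathcal{X},\mathcal{X}\rangle}$. The Hadamard product is denoted by '$\circledast$' and is the element-wise product of vectors, matrices or tensors of the same size. For instance, given tensors $\mathcal{X},\mathcal{Y}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$, then $\mathcal{Z}=\mathcal{X}\circledast\mathcal{Y}$ satisfies $\mathcal{Z}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$ and $z_{i_1 i_2\cdots i_N}=x_{i_1 i_2\cdots i_N}\,y_{i_1 i_2\cdots i_N}$. The Kronecker product of two matrices $\mathbf{X}\in\mathbb{R}^{I\times K}$ and $\mathbf{Y}\in\mathbb{R}^{J\times L}$ is $\mathbf{X}\otimes\mathbf{Y}\in\mathbb{R}^{IJ\times KL}$; see kolda2009tensor for more details.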
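As a quick illustration of these notations, the following NumPy sketch computes the inner product, the Frobenius norm, the Hadamard product, a mode-$n$ unfolding and a Kronecker product. The helper name `unfold` is hypothetical, indices are 0-based, and the column ordering of the unfolding is assumed to follow the convention of kolda2009tensor (earlier modes vary fastest).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 5))
Y = rng.standard_normal((3, 4, 5))

inner = np.sum(X * Y)           # inner product <X, Y>
fro = np.sqrt(np.sum(X * X))    # Frobenius norm, sqrt(<X, X>)
Z = X * Y                       # Hadamard (element-wise) product


def unfold(T, n):
    """Mode-n unfolding (0-based n): rows index mode n, columns the other modes."""
    # Column-major flattening makes the earliest remaining mode vary fastest.
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")


A = rng.standard_normal((2, 3))
B = rng.standard_normal((4, 5))
K = np.kron(A, B)               # Kronecker product, shape (2 * 4, 3 * 5)
```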

### 2.2 Tensor Train Decomposition

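The most significant feature of TT decomposition is that the number of model parameters does not grow exponentially with the tensor order. TT decomposition decomposes a tensor into a sequence of order-three core tensors (factor tensors): $\mathcal{G}^{(1)},\mathcal{G}^{(2)},\ldots,\mathcal{G}^{(N)}$. The relation between the approximated tensor $\mathcal{X}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$ and the core tensors can be expressed as follows:

$$\mathcal{X}=\ll\mathcal{G}^{(1)},\mathcal{G}^{(2)},\cdots,\mathcal{G}^{(N)}\gg, \tag{1}$$

where for $n=1,\ldots,N$, $\mathcal{G}^{(n)}\in\mathbb{R}^{R_{n-1}\times I_n\times R_n}$, $R_0=R_N=1$, and the notation $\ll\cdot\gg$ is the operation that transforms the core tensors into the approximated tensor. It should be noted that, for convenience of expression, $\mathcal{G}^{(1)}$ and $\mathcal{G}^{(N)}$ are treated as order-three tensors although they are in fact order-two tensors. The sequence $\{R_0,R_1,\ldots,R_N\}$ is named the TT-rank, which limits the size of every core tensor. Furthermore, the $(i_1,i_2,\ldots,i_N)$th element of tensor $\mathcal{X}$ can be represented by the product of the corresponding mode-2 slices of the core tensors:

$$x_{i_1 i_2\cdots i_N}=\prod_{n=1}^{N}\mathbf{G}^{(n)}_{i_n}, \tag{2}$$

where $\{\mathbf{G}^{(1)}_{i_1},\mathbf{G}^{(2)}_{i_2},\ldots,\mathbf{G}^{(N)}_{i_N}\}$ is the sequence of slices from each core tensor. For $n=1,\ldots,N$, $\mathbf{G}^{(n)}_{i_n}\in\mathbb{R}^{R_{n-1}\times R_n}$ is the mode-2 slice extracted from $\mathcal{G}^{(n)}$ according to the corresponding mode of the element index of $x_{i_1 i_2\cdots i_N}$. $\mathbf{G}^{(1)}_{i_1}$ and $\mathbf{G}^{(N)}_{i_N}$ are extracted from the first and the last core tensor; they are in fact vectors and are treated as matrices for convenience of expression.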
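A minimal NumPy sketch of the two views of the TT format described above: contracting all cores into the full tensor, as in equation (1), and evaluating one element as a product of slices, as in equation (2). The function names are hypothetical, indices are 0-based, and every core is assumed to be stored as an array of shape $(R_{n-1}, I_n, R_n)$ with boundary ranks equal to one.

```python
import numpy as np


def random_tt_cores(dims, rank, rng):
    """Random TT cores of shape (R_{n-1}, I_n, R_n) with R_0 = R_N = 1."""
    ranks = [1] + [rank] * (len(dims) - 1) + [1]
    return [rng.standard_normal((ranks[n], dims[n], ranks[n + 1]))
            for n in range(len(dims))]


def tt_element(cores, index):
    """Equation (2): multiply the slices G^(n)[:, i_n, :] from left to right."""
    out = np.ones((1, 1))
    for core, i in zip(cores, index):
        out = out @ core[:, i, :]
    return out[0, 0]


def tt_full(cores):
    """Equation (1): contract neighbouring cores over their shared rank index."""
    full = cores[0]                     # shape (1, I_1, R_1)
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full[0, ..., 0]              # drop the boundary ranks R_0 = R_N = 1


rng = np.random.default_rng(0)
cores = random_tt_cores((3, 4, 5, 2), rank=3, rng=rng)
X = tt_full(cores)
```

Evaluating a single element only touches one slice per core, which is the property TT-SGD relies on later.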

## 3 Gradient-based Tensor Train Completion

### 3.1 Tensor train Weighted Optimization (TT-WOPT)

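We define $\mathcal{Y}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$ as the partially observed tensor with missing entries, and $\mathcal{X}$ as the tensor approximated by the core tensors of a TT decomposition. The missing entries of $\mathcal{Y}$ are filled with zeros so that $\mathcal{Y}$ is a real-valued tensor. To model the completion problem, the indices of the missing entries need to be specified. We define a binary tensor $\mathcal{W}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$, named the weight tensor, which records the indices of the missing entries and the observed entries of the incomplete tensor $\mathcal{Y}$. Every entry of $\mathcal{W}$ satisfies:

$$w_{i_1 i_2\cdots i_N}=\begin{cases}0 & \text{if } y_{i_1 i_2\cdots i_N} \text{ is a missing entry},\\ 1 & \text{if } y_{i_1 i_2\cdots i_N} \text{ is an observed entry}.\end{cases} \tag{3}$$

The problem of finding the decomposition factors of an incomplete tensor can be formulated as a weighted least squares (WLS) model. Define $\mathcal{Y}_w=\mathcal{W}\circledast\mathcal{Y}$ and $\mathcal{X}_w=\mathcal{W}\circledast\mathcal{X}$; then the WLS model for calculating the tensor decomposition factors is formulated by:

$$f(\mathcal{G}^{(1)},\mathcal{G}^{(2)},\cdots,\mathcal{G}^{(N)})=\frac{1}{2}\left\|\mathcal{Y}_w-\mathcal{X}_w\right\|_F^2. \tag{4}$$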

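This is an optimization objective function w.r.t. all the TT core tensors, and we aim to solve the model by gradient descent methods. The relation between the approximated tensor $\mathcal{X}$ and the TT core tensors can be deduced as the following equation cichocki2016tensor :

$$\mathbf{X}_{(n)}=\mathbf{G}^{(n)}_{(2)}\left(\mathbf{G}^{>n}_{(1)}\otimes\mathbf{G}^{<n}_{(n)}\right), \tag{5}$$

where for $n=1,\ldots,N$,

$$\mathcal{G}^{>n}=\ll\mathcal{G}^{(n+1)},\mathcal{G}^{(n+2)},\cdots,\mathcal{G}^{(N)}\gg\in\mathbb{R}^{R_n\times I_{n+1}\times\cdots\times I_N}, \tag{6}$$

$$\mathcal{G}^{<n}=\ll\mathcal{G}^{(1)},\mathcal{G}^{(2)},\cdots,\mathcal{G}^{(n-1)}\gg\in\mathbb{R}^{I_1\times\cdots\times I_{n-1}\times R_{n-1}}. \tag{7}$$

$\mathcal{G}^{>n}$ and $\mathcal{G}^{<n}$ are the tensors generated by merging the selected TT core tensors, and we define $\mathcal{G}^{>N}=\mathcal{G}^{<1}=1$.

By equation (5), for $n=1,\ldots,N$, the partial derivatives of the objective function (4) w.r.t. the mode-2 matricization of the $n$th core tensor can be inferred as:

$$\frac{\partial f}{\partial\mathbf{G}^{(n)}_{(2)}}=\left(\mathbf{X}_{w(n)}-\mathbf{Y}_{w(n)}\right)\left(\mathbf{G}^{>n}_{(1)}\otimes\mathbf{G}^{<n}_{(n)}\right)^{T}, \tag{8}$$

where $\mathbf{X}_{w(n)}$ and $\mathbf{Y}_{w(n)}$ are the mode-$n$ matricizations of $\mathcal{X}_w$ and $\mathcal{Y}_w$.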

After the objective function and the gradients are obtained, we can apply various optimization algorithms to optimize the core tensors. The implementation procedure of TT-WOPT to find the TT decomposition from incomplete tensor is listed in Algorithm 1.
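The weighted objective and its gradients can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the weight tensor `W` is assumed to be binary, the gradient is computed with `einsum` on merged left and right cores instead of the explicit Kronecker product of matricizations, and a plain fixed-step gradient update stands in for the optimizer.

```python
import numpy as np


def tt_full(cores):
    """Contract TT cores of shape (R_{n-1}, I_n, R_n) into the full tensor."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full[0, ..., 0]


def wopt_objective(cores, Y, W):
    """Weighted least squares objective: 0.5 * || W * (Y - X) ||_F^2."""
    return 0.5 * np.sum((W * (Y - tt_full(cores))) ** 2)


def wopt_gradients(cores, Y, W):
    """Gradient of the objective w.r.t. every core (same shapes as the cores)."""
    dims = Y.shape
    E = W * (tt_full(cores) - Y)        # masked residual X_w - Y_w (W is binary)
    grads = []
    for n in range(len(cores)):
        # Merge the cores on the left of core n into a (P, R_{n-1}) matrix.
        left = np.ones((1, 1))
        for c in cores[:n]:
            left = np.tensordot(left, c, axes=([-1], [0])).reshape(-1, c.shape[2])
        # Merge the cores on the right of core n into a (R_n, Q) matrix.
        right = np.ones((1, 1))
        for c in reversed(cores[n + 1:]):
            right = np.tensordot(c, right, axes=([-1], [0])).reshape(c.shape[0], -1)
        En = E.reshape(left.shape[0], dims[n], right.shape[1])
        grads.append(np.einsum("pa,piq,bq->aib", left, En, right))
    return grads


def wopt_step(cores, Y, W, lr):
    """One plain gradient-descent step (a simple stand-in optimizer)."""
    return [c - lr * g for c, g in zip(cores, wopt_gradients(cores, Y, W))]
```

Any first-order optimizer that accepts an objective and its gradient can be plugged in at the place of `wopt_step`.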

### 3.2 Tensor Train Stochastic Gradient Descent (TT-SGD)

As seen from equation (4), TT-WOPT computes the gradients by the whole scale of the tensor for every iteration. The computation can be redundant because the missing entries still occupy the computational space. If the scale of data is huge and the number of missing entries is high, then we only need to apply a small amount of the observed entries. In this situation, TT-WOPT can waste much computational storage and the computation will become time-consuming. In order to solve the problems of TT-WOPT as mentioned above, we propose the TT-SGD algorithm which only randomly samples one observed entry to compute the gradients for every iteration.

Stochastic Gradient Descent (SGD) has been applied in matrix and tensor decompositions gemulla2011large ; maehara2016expected ; wang2016online . In every optimization iteration, we use only one entry, which is randomly sampled from the observed entries, and one entry only influences the gradient of part of the core tensors. For one observed entry of index $\{i_1,i_2,\ldots,i_N\}$, if the value approximated by the TT core tensors is $x_{i_1 i_2\cdots i_N}$ and the observed value (real value) is $y_{i_1 i_2\cdots i_N}$, then by equation (2) the objective function can be formulated as:

$$f(\mathbf{G}^{(1)}_{i_1},\mathbf{G}^{(2)}_{i_2},\cdots,\mathbf{G}^{(N)}_{i_N})=\frac{1}{2}\left\|y_{i_1 i_2\cdots i_N}-\prod_{k=1}^{N}\mathbf{G}^{(k)}_{i_k}\right\|_F^2. \tag{9}$$

For $n=1,\ldots,N$, the partial derivative of the objective function w.r.t. the corresponding slice $\mathbf{G}^{(n)}_{i_n}$ is calculated as:

$$\frac{\partial f}{\partial\mathbf{G}^{(n)}_{i_n}}=\left(x_{i_1 i_2\cdots i_N}-y_{i_1 i_2\cdots i_N}\right)\left(\prod_{k=n+1}^{N}\mathbf{G}^{(k)}_{i_k}\prod_{k=1}^{n-1}\mathbf{G}^{(k)}_{i_k}\right)^{T}. \tag{10}$$

From this equation we can see that the computational complexity of TT-SGD is not related to the scale of the observed tensor or the number of observed entries, so it can process large-scale data with much lower computational complexity than TT-WOPT. The algorithm is also suitable for online/real-time learning. The optimization process of TT-SGD is listed in Algorithm 2.
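The per-entry gradient can be sketched in NumPy as below. This is a minimal illustration, not the authors' code: the function names are hypothetical, cores are assumed to be stored with shape $(R_{n-1}, I_n, R_n)$, indices are 0-based, and a plain SGD update with a fixed learning rate is used in place of any adaptive optimizer.

```python
import numpy as np


def sgd_slice_gradients(cores, index, y):
    """Residual x - y and the gradient of every slice touched by one entry."""
    slices = [core[:, i, :] for core, i in zip(cores, index)]
    N = len(slices)
    # left[n]: product of slices 0..n-1, a row vector of shape (1, R_n).
    left = [np.ones((1, 1))]
    for s in slices[:-1]:
        left.append(left[-1] @ s)
    # right[n]: product of slices n+1..N-1, a column vector of shape (R_{n+1}, 1).
    right = [np.ones((1, 1))]
    for s in reversed(slices[1:]):
        right.append(s @ right[-1])
    right = right[::-1]
    x = (left[-1] @ slices[-1])[0, 0]   # approximated entry
    grads = [(x - y) * (right[n] @ left[n]).T for n in range(N)]
    return x - y, grads


def sgd_step(cores, index, y, lr):
    """Update, in place, only the N slices selected by the sampled entry."""
    err, grads = sgd_slice_gradients(cores, index, y)
    for core, i, g in zip(cores, index, grads):
        core[:, i, :] -= lr * g
    return 0.5 * err ** 2               # loss of this entry before the update
```

In a training loop, `index` would be drawn at random from the observed positions and `y` read from the incomplete tensor at that position.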

### 3.3 Computational Complexity

For a tensor $\mathcal{X}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$, we assume all $I_n$ are equal to $I$, and $R_1=R_2=\cdots=R_{N-1}=R$. According to equations (8) and (10), TT-WOPT has higher time and space complexity than TT-SGD: every TT-WOPT iteration operates on the whole tensor, whereas every TT-SGD iteration only involves the slices selected by one sampled entry. Though TT-WOPT has larger computational complexity, it converges steadily and quickly when processing normal-size data. The per-iteration cost of TT-SGD does not depend on the data dimensionality and is extremely low, so it is more suitable for processing large-scale data. It should be noted that, for every iteration of TT-SGD, we can also apply the batch-based SGD method, which sums the gradients of a batch of entries in every iteration. Though this can improve the stability of TT-SGD and the algorithm might need fewer iterations to converge, the computational complexity increases and more computational time is needed for every iteration. In this paper, we only apply the batch-one SGD algorithm, and the synthetic experiments in the next section show that our method can also achieve fast and stable convergence. The code of the proposed algorithms is available at https://github.com/yuanlonghao/T3C_tensor_completion.

## 4 Experiment results

In this section, simulation experiments are conducted to show the performance of our algorithms and the compared algorithms under various tensor orders. For real-world data experiments, we test our algorithms on color images, video data and hyperspectral image data. TT-WOPT and TT-SGD are compared with several state-of-the-art algorithms: TT-ALS wang2016tensor , SiLRTC-TT bengua2017efficient , TR-ALS wang2017efficient , STTC liu2018image , CP-WOPT acar2011scalable , FBCP zhao2015bayesian , HaLRTC and FaLRTC liu2013tensor , and TLnR filipovic2015tucker . For all the compared algorithms, the input incomplete tensor is $\mathcal{Y}=\mathcal{W}\circledast\mathcal{T}$, where $\mathcal{T}$ is the fully observed true tensor and $\mathcal{W}$ is the binary tensor recording the positions of the observed entries. The final completed tensor is calculated by $\mathcal{Z}=\mathcal{W}\circledast\mathcal{T}+(\mathbf{1}-\mathcal{W})\circledast\mathcal{X}$, where $\mathcal{X}$ is the output tensor obtained by each algorithm and $\mathbf{1}$ is the all-ones tensor. We apply the relative squared error (RSE), defined as $\mathrm{RSE}=\|\mathcal{Z}-\mathcal{T}\|_F/\|\mathcal{T}\|_F$, to evaluate the completion performance of each algorithm. For the random missing experiments, we randomly remove data points according to different missing rates $mr$, defined as $mr=1-M/\prod_{n=1}^{N}I_n$, where $M$ is the number of observed entries. Moreover, to evaluate the completion quality of visual data, we use the PSNR (Peak Signal-to-noise Ratio). PSNR is obtained by $\mathrm{PSNR}=10\log_{10}(255^2/\mathrm{MSE})$, where MSE is given by $\mathrm{MSE}=\|\mathcal{Z}-\mathcal{T}\|_F^2/\mathrm{num}(\mathcal{T})$, and $\mathrm{num}(\cdot)$ denotes the number of elements of the tensor.
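The evaluation quantities described above can be sketched in NumPy as follows. The function names are hypothetical, and the exact forms are assumptions of this sketch: RSE is taken as a ratio of Frobenius norms, the completed tensor keeps the observed entries and fills the rest from the estimate, and the PSNR peak value defaults to 255 (it should be set to 1 for data scaled to $[0,1]$).

```python
import numpy as np


def complete(X, T, W):
    """Keep the observed entries of T and fill the missing ones from X."""
    return W * T + (1 - W) * X


def rse(Z, T):
    """Relative error ||Z - T||_F / ||T||_F (assumed form of the RSE)."""
    return np.linalg.norm(Z - T) / np.linalg.norm(T)


def psnr(Z, T, peak=255.0):
    """PSNR in dB from the mean squared error over all elements."""
    mse = np.sum((Z - T) ** 2) / T.size
    return 10 * np.log10(peak ** 2 / mse)


def missing_rate(W):
    """Fraction of entries that are not observed."""
    return 1 - W.sum() / W.size
```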

For the optimization method of TT-WOPT, in order to have a clear comparison with CP-WOPT, which is also based on gradient descent methods, we adopt the same optimization method as acar2011scalable . That paper applies nonlinear conjugate gradient (NCG) with Hestenes-Stiefel updates wright1999numerical and the Moré-Thuente line search method more1994line . The optimization method is implemented with an optimization toolbox named Poblano Toolbox dunlavy2010poblano . For TT-SGD, we employ Adaptive Moment Estimation (Adam) as our gradient descent method, which performs well in stochastic-gradient-based optimization kingma2014adam ; ruder2016overview . The update rule of Adam is as follows:

$$\theta_{t+1}=\theta_t-\frac{\eta}{\sqrt{v_t}+\epsilon}\,m_t, \tag{11}$$

where $t$ is the iteration number of the optimized value $\theta$, $\eta$ and $\epsilon$ are hyper parameters, and $m_t$ and $v_t$ are the first moment estimate and the second moment estimate of the gradient $g_t$ respectively: $m_t=\beta_1 m_{t-1}+(1-\beta_1)g_t$, $v_t=\beta_2 v_{t-1}+(1-\beta_2)g_t^2$, where $\beta_1$ and $\beta_2$ are hyper parameters. For the hyper parameters of the Adam method, we adopt the reference values from kingma2014adam . The values of $\beta_1$, $\beta_2$ and $\epsilon$ are set as 0.9, 0.999 and $10^{-8}$ respectively. The learning rate $\eta$ is essential to the convergence speed and the performance of gradient-based algorithms. In our experiments, we choose the learning rate empirically to obtain the best convergence speed and the best performance. In addition, all the data in our experiments are normalized to the range from 0 to 1 to make the algorithms more effective.
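The Adam update can be written as a small NumPy function. This is a sketch under stated assumptions: the function name is hypothetical, the update follows the form of equation (11) without the bias-correction terms that kingma2014adam also describes, and the default learning rate of `1e-3` is an assumption of this sketch rather than a value taken from the experiments.

```python
import numpy as np


def adam_update(theta, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step; returns the new parameters and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad          # first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment estimate
    theta = theta - lr / (np.sqrt(v) + eps) * m
    return theta, m, v


# Usage: keep one (m, v) pair per parameter array and update them together.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
theta, m, v = adam_update(theta, grad=2.0 * theta, m=m, v=v)
```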

We adopt two optimization stopping conditions for all the compared completion algorithms. One is a tolerance on the change of the objective function value between two adjacent iterations. The other is the maximum number of iterations, which is set according to the scale of the data and the algorithm, with separate limits for TT-SGD and for the other algorithms. The optimization stops as soon as one of the two conditions is satisfied. All the computations are conducted on a Mac PC with an Intel Core i7 and 16GB DDR3 memory, and the computational times reported in some experiments are based on this configuration.

### 4.1 Synthetic Data

In our simulation experiments, we apply synthetic data generated from a highly oscillating function khoromskij2015tensor . The synthetic data is expected to be well approximated by tensor decomposition models. We sample entries from the values generated by the function, and the sampled values are then reshaped to the desired tensor size. We employ four different tensor structures of order three (3D), five (5D), seven (7D) and nine (9D), and we test TT-SGD, TT-WOPT, TT-ALS, SiLRTC-TT, TR-ALS, CP-WOPT and HaLRTC on the synthetic data. The hyper-parameters of each algorithm are tuned to obtain the best performance. For simplicity, we set the values of each TT-rank and TR-rank identically, i.e., $R_1=R_2=\cdots=R_{N-1}$ for TT and $R_1=R_2=\cdots=R_N$ for TR. Moreover, the TT-rank, TR-rank and CP-rank of the corresponding algorithms are kept fixed across all the tensor orders to make a clear comparison of the completion performance. In addition, TT-SGD is given its own maximum number of iterations, while the other algorithms share a common one.

Figure 1 shows the RSE values as the missing rate varies from 0.1 to 0.9 under the four different tensor orders. From the figure, we can see that TT-WOPT and TT-SGD show high performance in all the cases. HaLRTC only shows high performance in the 3D tensor case, and CP-WOPT and SiLRTC-TT show stable but low performance in every case. Though TT-ALS and TR-ALS show higher performance than our algorithms in some low missing rate cases, their performance decreases drastically when the missing rate increases, whereas our algorithms always show high and stable performance.

Figure 1: RSE comparison of seven algorithms under four different tensor orders. The missing rate is tested from 0.1 to 0.9.

In the next synthetic data experiment, we look into the convergence behavior of the proposed TT-SGD. The four tensors applied in the previous experiment are employed as the input data. We record the value of the loss function at regular iteration intervals, and Figure 2 shows the convergence of TT-SGD when the missing rate is 0.1, 0.5 and 0.9 respectively. Though TT-SGD needs a large number of iterations to converge, the computational complexity of each iteration is rather low, and only one entry is sampled to calculate the gradient in every iteration. For TT-SGD, the running time for the 3D, 5D, 7D and 9D data under the parameter settings of this experiment is 10.09 seconds, 25.09 seconds, 45.86 seconds and 75.41 seconds respectively, while TT-WOPT takes roughly 1.6 to 2.2 times as long (i.e., 18.80 seconds, 41.84 seconds, 100.02 seconds and 122.77 seconds) to converge to the same RSE values. The performance and computation time demonstrate the effectiveness of the TT-SGD algorithm.

Figure 2: Convergence performance of TT-SGD under four different synthetic tensors. From left to right, the missing rate of the data in each figure is 0.1, 0.5 and 0.9 respectively.

### 4.2 Visual Data Tensorization (VDT) method

The simulation results show that our proposed algorithms achieve high and stable performance on high-order tensors. In this section, we provide a Visual Data Tensorization (VDT) method to transform a low-order tensor into a higher-order tensor and improve the performance of our algorithms. The VDT method is derived from an image compression and entanglement methodology latorre2005image , which transforms a gray-scale image of size $2^l\times 2^l$ into a real ket of a Hilbert space. The method casts the image to a higher-order tensor structure with an appropriate block-structured addressing. A similar method named KA augmentation is proposed in bengua2017efficient , which extends the method in latorre2005image to order-three visual data of size $2^l\times 2^l\times 3$. Our VDT method is a generalization of the KA augmentation, and visual data of various sizes can be applied to our tensorization method. For visual data like RGB images, videos and hyperspectral images, the first two orders of the tensor (e.g., $U\times V$) are named the image modes. The 2D representation of the image modes cannot fully exploit the correlation and local structure of the data, so we propose the VDT method to strengthen the local structure correlation of visual data. The VDT method operates as follows: if the first two orders of a visual data tensor are $U\times V$ and can be reshaped to $u_1\times u_2\times\cdots\times u_l\times v_1\times v_2\times\cdots\times v_l$, then the VDT method permutes and reshapes the data to size $u_1v_1\times u_2v_2\times\cdots\times u_lv_l$ and obtains the higher-order representation of the visual data. This higher-order tensor is a new structure of the original data: its first order corresponds to a $u_1\times v_1$ pixel block of the image, and the following orders $u_2v_2,\ldots,u_lv_l$ describe the expanding larger-scale partitions of the image. Based on the VDT method, TT-based algorithms can efficiently exploit the structure information of visual data and achieve a better low-rank representation. After the tensorized data is processed by the completion algorithms, the reverse operation of VDT is conducted to recover the original image structure.

The procedure of VDT is illustrated in Figure 3.

Figure 3: Illustration of the proposed VDT method. Figure (a) is the example of applying the VDT method on an $I\times I\times C$ tensor. Figure (b) and Figure (c) show the example of the VDT operation on a $256\times 256\times 3$ image.

To verify the effectiveness of our VDT method, we choose the benchmark image 'Lena' with a 0.9 random missing rate. We compare the performance of six algorithms (TT-WOPT, TT-SGD, CP-WOPT, FBCP, HaLRTC and TLnR) under three different data structures: the order-three tensor, the order-nine tensor without VDT, and the order-nine tensor generated by the VDT method. The order-three tensor uses the original image data structure of size $256\times 256\times 3$. The order-nine tensor without VDT is generated by directly reshaping the data to the size $4\times 4\times 4\times 4\times 4\times 4\times 4\times 4\times 3$. For the order-nine tensor with VDT, the original data is first reshaped to an order-seventeen tensor of size $2\times 2\times\cdots\times 2\times 3$ and then permuted according to the order $\{1,9,2,10,3,11,4,12,5,13,6,14,7,15,8,16,17\}$. Finally, we reshape the tensor to an order-nine tensor of size $4\times 4\times 4\times 4\times 4\times 4\times 4\times 4\times 3$. This order-nine tensor with VDT is considered to be a better structure of the image data. Its first order contains the data of a $2\times 2$ pixel block of the image, and the following orders describe the expanding pixel blocks of the image. Most of the parameter settings follow the previous synthetic data experiments, and we tune the TT-rank, CP-rank and Tucker-rank of the corresponding algorithms to obtain the best performance. Figure 4 and Table 1 show the visual results and numerical results of the six algorithms under the three different data structures. In the order-three case, the results of the algorithms are similar. However, in the order-nine cases, the other algorithms fail the completion task while TT-WOPT and TT-SGD perform well. Furthermore, when the image is transformed to an order-nine tensor by the VDT method, our two algorithms improve distinctly.

Figure 4: Visual results for completion of the 0.9 random missing 'Lena' image under six algorithms. The first row applies the original order-three tensor data, the second row applies order-nine tensor data without the VDT method, and the third row applies order-nine tensor data generated by the VDT method.
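The reshape, permute and reshape steps of VDT for a $256\times 256\times 3$ image can be sketched in NumPy as below. The function names are hypothetical and indices are 0-based. Two assumptions are made: the permutation interleaves the $k$th row factor with the $k$th column factor, and column-major (`order="F"`) reshapes are used so that the first mode of the result addresses a $2\times 2$ pixel block, as described in the text.

```python
import numpy as np


def vdt(img, bits=8):
    """Tensorize a (2**bits, 2**bits, C) array into an order-(bits + 1) tensor."""
    c = img.shape[2]
    # Column-major reshape: axis k holds the k-th binary digit of the row index
    # (least significant first), axis bits + k the same digit of the column index.
    t = np.reshape(img, (2,) * (2 * bits) + (c,), order="F")
    # Interleave row digit k with column digit k; the channel axis stays last.
    perm = [ax for k in range(bits) for ax in (k, bits + k)] + [2 * bits]
    t = np.transpose(t, perm)
    return np.reshape(t, (4,) * bits + (c,), order="F"), perm


def inverse_vdt(t, perm, bits=8):
    """Undo vdt and recover the original (2**bits, 2**bits, C) array."""
    c = t.shape[-1]
    x = np.reshape(t, (2,) * (2 * bits) + (c,), order="F")
    inverse = [perm.index(k) for k in range(len(perm))]
    x = np.transpose(x, inverse)
    return np.reshape(x, (2 ** bits, 2 ** bits, c), order="F")


img = np.arange(256 * 256 * 3, dtype=float).reshape(256, 256, 3)
tensor, perm = vdt(img)          # tensor.shape == (4, 4, 4, 4, 4, 4, 4, 4, 3)
restored = inverse_vdt(tensor, perm)
```

With the default row-major reshapes of NumPy, the same permutation would instead place the coarsest image partition in the first mode.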

### 4.3 Benchmark Image Completion

The previous experiments show that TT-based and TR-based algorithms can be applied to higher-order tensors, and that TT-based algorithms improve significantly when the VDT method is applied to the image tensorization. However, for algorithms based on CP decomposition and Tucker decomposition, higher-order tensorization decreases the performance. In the later experiments, we only apply the VDT method to TT-WOPT, TT-SGD, TT-ALS, SiLRTC-TT and TR-ALS. For CP-WOPT, FBCP, TLnR, STTC and HaLRTC, we keep the original data structure to get better results.

In this experiment, we consider several irregular missing cases (the scratch missing, the whole row missing and the block missing) and some high-random-missing cases on benchmark RGB images. The parameter settings of each compared algorithm are tuned to get the best performance. The completion results in Figure 5 and Table 2 show that our algorithms achieve high completion performance in all the missing cases. Moreover, for the irregular missing cases and the 0.8 random missing cases, STTC and HaLRTC perform well and achieve low RSE values. However, these two algorithms fail to solve the completion task when the random missing rate is 0.9 and 0.99, because the nuclear-norm-based and total-variation-based algorithms cannot explore low-rank and local information when only a very small number of entries is observed. It should be noted that the 0.99 random missing case is a challenging task for all the image completion algorithms. Our two proposed algorithms with the VDT method achieve high performance in this situation while the other algorithms fail.

Figure 5: The first and second rows of the figure are the fully observed benchmark images and the corresponding missing patterns respectively, below which the visual completion results of the ten algorithms under the different missing patterns (i.e., scratch missing, row missing, block missing, 0.9 random missing, 0.95 random missing, and 0.99 random missing) are shown.

### 4.4 Video and Hyperspectral Image Completion

For large-scale data completion task, we test a video and a hyperspectral image (HSI) in the following experiments. For our proposed algorithms, we only test TT-SGD because TT-SGD is better for large-scale data than TT-WOPT. In addition, when large-scale data is employed, many algorithms which work well on benchmark images will become inefficient or ineffective, so we compare TT-SGD to only several algorithms (TT-ALS, CP-WOPT, FBCP, and HaLRTC).

First, we test a video which records a moving train. The background of the video changes from frame to frame. Following the VDT method, we first reshape the data, then permute it, and finally reshape it again to obtain the input tensor. We compare three random missing cases in this experiment. Part of the visual results are shown in Figure 6, and the numerical results are shown in Table 3. TT-SGD outperforms the other compared algorithms. More specifically, it recovers the video well even when only 1% of the entries are sampled, while the other compared algorithms fail in this high missing rate case. It should also be noted that the time cost of TT-SGD is lower than that of the other compared algorithms, which shows the high efficiency of TT-SGD.

Figure 6: Video completion results of TT-SGD, TT-ALS, CP-WOPT, FBCP, and HaLRTC under random missing cases. The first row to the last row show the completion results of the 1st frame, the 75th frame and the 100th frame of the video respectively.

Then we test TT-SGD, CP-WOPT, FBCP and HaLRTC on a hyperspectral image (HSI) recorded by a satellite. Due to the inferior working conditions of satellite sensors, the collected data often has Gaussian noise, impulse noise, dead lines, and stripes zhang2014hyperspectral . In this experiment, we first consider the situation when the HSI has 'dead lines', which is a common missing case in HSI records. Then we consider the case when only 1% of the data is obtained, which is meaningful in data compression and transformation. We transform the HSI data to a higher-order tensor by the VDT method as the input for TT-SGD, and apply the original order-three tensor as the input for the other compared algorithms. The TT-ranks are set separately for the dead line missing case and the 99% missing case. The visual completion results in Figure 7 show the image of the first channel of the HSI, and the numerical results evaluate the overall completion performance.

Figure 7: HSI completion results of the four algorithms. We show the image of the first channel of the HSI. The first row is the original image, the segmentation to show the completion performance, the dead line missing pattern, and the 0.99 random missing pattern. The second row and the third row show the completion results.

TT-SGD performs best among the algorithms in both the dead line missing case and the 99% random missing case. In the 99% random missing case, HaLRTC fails the completion task, while CP-WOPT and FBCP obtain lower performance than TT-SGD. In addition, it should be noted that the optimization of TT-SGD has converged when the number of iterations reaches about 16% of the total number of data entries. This indicates that TT-SGD has fast and efficient computation.

## 5 Conclusion

In this paper, we propose two tensor completion algorithms, TT-WOPT and TT-SGD, which are based on tensor-train decomposition and gradient descent. We first cast the completion problem as an optimization model, then use gradient descent methods to find the optimal core tensors of the TT decomposition. Finally, the TT core tensors are used to approximate the missing entries of the incomplete tensor. Furthermore, to improve the performance of the proposed algorithms, we propose the VDT method to tensorize visual data to a higher order. We conduct simulation experiments and visual data experiments to compare our algorithms with state-of-the-art algorithms. The simulation experiments show that the performance of our algorithms remains stable as the tensor order increases. Moreover, the visual data experiments show that higher-order tensorization by VDT improves the performance of our two algorithms. Our algorithms outperform the compared state-of-the-art algorithms in various missing situations, particularly when both the tensor order and the missing rate are high. More specifically, our algorithms with the VDT method handle the extremely high random missing situation (i.e., 99% random missing) well, while the other algorithms fail. In addition, the proposed TT-SGD achieves low computational complexity and high efficiency in processing large-scale data.

The high performance of the proposed algorithms shows that TT-based tensor completion is a promising direction. It should be noted that the TT-rank setting is essential for obtaining good experimental results, and it is usually selected manually. In future work, we will extend our algorithms to choose the TT-rank automatically.

## Acknowledgement

This work was supported by JSPS KAKENHI (Grant No. 17K00326, 15H04002, 18K04178), JST CREST (Grant No. JPMJCR1784) and the National Natural Science Foundation of China (Grant No. 61773129).

## References

• (1) A. Shashua, T. Hazan, Non-negative tensor factorization with applications to statistics and computer vision, in: Proceedings of the 22nd international conference on Machine learning, ACM, 2005, pp. 792–799.
• (2) T. G. Kolda, B. W. Bader, Tensor decompositions and applications, SIAM review 51 (3) (2009) 455–500.
• (3) M. A. O. Vasilescu, D. Terzopoulos, Multilinear subspace analysis of image ensembles, in: Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, Vol. 2, IEEE, 2003, pp. II–93.
• (4) T. Franz, A. Schultz, S. Sizov, S. Staab, Triplerank: Ranking semantic web data by tensor decomposition, The Semantic Web-ISWC 2009 (2009) 213–228.
• (5) E. Acar, D. M. Dunlavy, T. G. Kolda, M. Mørup, Scalable tensor factorizations for incomplete data, Chemometrics and Intelligent Laboratory Systems 106 (1) (2011) 41–56.
• (6) Q. Zhao, L. Zhang, A. Cichocki, Bayesian cp factorization of incomplete tensors with automatic rank determination, IEEE transactions on pattern analysis and machine intelligence 37 (9) (2015) 1751–1763.
• (7) L. De Lathauwer, J. Castaing, Blind identification of underdetermined mixtures by simultaneous matrix diagonalization, IEEE Transactions on Signal Processing 56 (3) (2008) 1096–1105.
• (8) D. Muti, S. Bourennane, Multidimensional filtering based on a tensor approach, Signal Processing 85 (12) (2005) 2338–2353.
• (9) J. Mocks, Topographic components model for event-related potentials and some biophysical considerations, IEEE transactions on biomedical engineering 35 (6) (1988) 482–484.
• (10) A. Shashua, A. Levin, Linear image coding for regression and classification using the tensor-rank principle, in: Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, Vol. 1, IEEE, 2001, pp. I–I.
• (11) L. Sorber, M. Van Barel, L. De Lathauwer, Optimization-based algorithms for tensor decompositions: Canonical polyadic decomposition, decomposition in rank-(l_r,l_r,1) terms, and a new generalization, SIAM Journal on Optimization 23 (2) (2013) 695–720.
• (12) J. H. d. M. Goulart, M. Boizard, R. Boyer, G. Favier, P. Comon, Tensor cp decomposition with structured factor matrices: Algorithms and performance, IEEE Journal of Selected Topics in Signal Processing 10 (4) (2016) 757–769.
• (13) L. R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika 31 (3) (1966) 279–311.
• (14) L. De Lathauwer, B. De Moor, J. Vandewalle, On the best rank-1 and rank-(r_1, r_2, …, r_n) approximation of higher-order tensors, SIAM journal on Matrix Analysis and Applications 21 (4) (2000) 1324–1342.
• (15) C.-Y. Tsai, A. M. Saxe, D. Cox, Tensor switching networks, in: Advances in Neural Information Processing Systems, 2016, pp. 2038–2046.
• (16) J. Liu, P. Musialski, P. Wonka, J. Ye, Tensor completion for estimating missing values in visual data, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 208–220.
• (17) M. Filipović, A. Jukić, Tucker factorization with missing data with application to low-n-rank tensor completion, Multidimensional systems and signal processing 26 (3) (2015) 677–692.
• (18) I. V. Oseledets, Tensor-train decomposition, SIAM Journal on Scientific Computing 33 (5) (2011) 2295–2317.
• (19) J. A. Bengua, H. N. Phien, H. D. Tuan, M. N. Do, Efficient tensor completion for color image and video recovery: Low-rank tensor train, IEEE Transactions on Image Processing 26 (5) (2017) 2466–2479.
• (20) Y. Yang, D. Krompass, V. Tresp, Tensor-train recurrent neural networks for video classification, in: International Conference on Machine Learning, 2017, pp. 3891–3900.
• (21) W. Wang, V. Aggarwal, S. Aeron, Tensor completion by alternating minimization under the tensor train (tt) model, arXiv preprint arXiv:1609.05587.
• (22) L. Yuan, Q. Zhao, J. Cao, Completion of high order tensor data with missing entries via tensor-train decomposition, in: International Conference on Neural Information Processing, Springer, 2017, pp. 222–229.
• (23) Q. Zhao, G. Zhou, S. Xie, L. Zhang, A. Cichocki, Tensor ring decomposition, arXiv preprint arXiv:1606.05535.
• (24) Q. Zhao, M. Sugiyama, A. Cichocki, Learning efficient tensor representations with ring structure networks, arXiv preprint arXiv:1705.08286.
• (25) W. Wang, V. Aggarwal, S. Aeron, Efficient low rank tensor ring completion, in: Proceedings of the IEEE International Conference on Computer Vision, 2017.
• (26) L. Yuan, J. Cao, Q. Wu, Q. Zhao, Higher-dimension tensor completion via low-rank tensor ring decomposition, arXiv preprint arXiv:1807.01589.
• (27) L. Yuan, C. Li, D. Mandic, J. Cao, Q. Zhao, Tensor ring decomposition with rank minimization on latent space: An efficient approach for tensor completion, arXiv preprint arXiv:1809.02288.
• (28) Y. Liu, Z. Long, C. Zhu, Image completion using low tensor tree rank and total variation minimization, IEEE Transactions on Multimedia.
• (29) A. Cichocki, N. Lee, I. Oseledets, A.-H. Phan, Q. Zhao, D. P. Mandic, et al., Tensor networks for dimensionality reduction and large-scale optimization: Part 1 low-rank tensor decompositions, Foundations and Trends® in Machine Learning 9 (4-5) (2016) 249–429.
• (30) R. Gemulla, E. Nijkamp, P. J. Haas, Y. Sismanis, Large-scale matrix factorization with distributed stochastic gradient descent, in: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2011, pp. 69–77.
• (31) T. Maehara, K. Hayashi, K.-i. Kawarabayashi, Expected tensor decomposition with stochastic gradient descent, in: AAAI, 2016, pp. 1919–1925.
• (32) Y. Wang, A. Anandkumar, Online and differentially-private tensor decomposition, in: Advances in Neural Information Processing Systems, 2016, pp. 3531–3539.
• (33) S. Wright, J. Nocedal, Numerical optimization, Springer, 1999.
• (34) J. J. Moré, D. J. Thuente, Line search algorithms with guaranteed sufficient decrease, ACM Transactions on Mathematical Software (TOMS) 20 (3) (1994) 286–307.
• (35) D. M. Dunlavy, T. G. Kolda, E. Acar, Poblano v1.0: A matlab toolbox for gradient-based optimization, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Tech. Rep. SAND2010-1422.
• (36) D. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980.
• (37) S. Ruder, An overview of gradient descent optimization algorithms, arXiv preprint arXiv:1609.04747.
• (38) B. N. Khoromskij, Tensor numerical methods for multidimensional pdes: theoretical analysis and initial applications, ESAIM: Proceedings and Surveys 48 (2015) 1–28.
• (39) J. I. Latorre, Image compression and entanglement, arXiv preprint quant-ph/0510031.
• (40) A. Novikov, D. Podoprikhin, A. Osokin, D. P. Vetrov, Tensorizing neural networks, in: Advances in Neural Information Processing Systems, 2015, pp. 442–450.
• (41) H. Zhang, W. He, L. Zhang, H. Shen, Q. Yuan, Hyperspectral image restoration using low-rank matrix recovery, IEEE Transactions on Geoscience and Remote Sensing 52 (8) (2014) 4729–4743.