## 1 Introduction

Tensors are multi-dimensional arrays and the high-order generalization of vectors and matrices [1]. Most real-world data, such as color images, videos, and multichannel electroencephalography (EEG) signals, have more than two dimensions. Tensor representation keeps the original form of the data, which helps retain its high-dimensional structure and the adjacency relations among its entries. Owing to the flexibility and high compressibility of tensor decomposition, many tensor methodologies have been proposed in recent decades in fields such as image and video completion [2, 3], brain-computer interfaces [4], and signal processing [5, 6]. The main idea of tensor completion is to use the observed entries of incomplete data to find tensor decomposition factors that capture the latent features of the data, and then to use the powerful representation ability of these factors to approximate the missing entries. The most studied and popular decomposition models in recent years are CANDECOMP/PARAFAC (CP) decomposition [7] and Tucker decomposition [8], and both have been applied in many data completion methods. CP weighted optimization (CP-WOPT) [2] builds an objective function from the Frobenius norm of the difference between the weighted approximated tensor and the observed tensor, then applies optimization methods to find the optimal CP factor matrices from the observed data. Bayesian CP factorization [3] employs a Bayesian probabilistic model to find the best CP factor matrices while automatically determining the CP rank. The method in [9] recovers low-n-rank tensor data with its convex relaxation via the alternating direction method of multipliers (ADM). A low-n-rank Tucker completion method is used in [10], and the experiments show better results than other nuclear norm minimization methods.

Though CP and Tucker can reach relatively high performance on low-order tensors, owing to their inherent limitations, the performance of both decomposition models drops rapidly for high-order tensors and high missing rates. We therefore employ tensor-train (TT) decomposition [11], which is free from the "curse of dimensionality" and is a better model for processing high-order tensors. The contributions of this paper are summarized as follows: (a) We propose an algorithm named Sparse Tensor-train Optimization (STTO), which treats incomplete data as a sparse tensor and optimizes the factors of the tensor-train decomposition by gradient descent; optimizing the factors in sparse format significantly reduces the computational complexity, and the resulting factors are used to approximate the missing entries. (b) Using synthetic data, we conduct simulation experiments comparing our algorithm with state-of-the-art algorithms in four different dimensionalities. (c) We provide a dimension-ascending scheme for image data that improves the performance of our algorithm and is particularly useful for irregular missing patterns such as whole-row missing and block missing. (d) We carry out several real-world data experiments, and the results on simulation data and image data show that our method outperforms the state-of-the-art approaches.

## 2 Notations and Tensor-train Decomposition

### 2.1 Notations

In this paper, we adopt the notations from [1]. Scalars are denoted by normal lowercase letters, e.g., $x$; vectors are denoted by boldface lowercase letters, e.g., $\mathbf{x}$; matrices are denoted by boldface capital letters, e.g., $\mathbf{X}$; and tensors of order $N \geq 3$ are denoted by Euler script letters, e.g., $\mathcal{X}$. $\mathbf{X}^{(n)}$ denotes the $n$th matrix of a matrix sequence, and vector and tensor sequences are denoted in the same way. Given a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, the $(i_1, i_2, \ldots, i_N)$th element of $\mathcal{X}$ is denoted by $x_{i_1 i_2 \cdots i_N}$ or $\mathcal{X}(i_1, i_2, \ldots, i_N)$.

The inner product of two tensors $\mathcal{X}, \mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is defined as $\langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1=1}^{I_1} \cdots \sum_{i_N=1}^{I_N} x_{i_1 i_2 \cdots i_N}\, y_{i_1 i_2 \cdots i_N}$. Furthermore, the Frobenius norm of $\mathcal{X}$ is defined by $\| \mathcal{X} \|_F = \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle}$. The Hadamard product, denoted by $\ast$, is the element-wise product of vectors, matrices, or tensors of the same size. The Kronecker product of two matrices $\mathbf{A}$ and $\mathbf{B}$ is denoted by $\mathbf{A} \otimes \mathbf{B}$.
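These two definitions can be checked with a few lines of NumPy (the paper's experiments use MATLAB; this illustrative sketch simply verifies that the inner product and Frobenius norm above agree with NumPy's built-in norm):

```python
import numpy as np

# The inner product sums element-wise products; the Frobenius norm is the
# square root of a tensor's inner product with itself.
X = np.arange(24, dtype=float).reshape(2, 3, 4)
Y = np.ones((2, 3, 4))

inner = np.sum(X * Y)                 # <X, Y>
fro = np.sqrt(np.sum(X * X))          # ||X||_F

# NumPy's default norm of a raveled array is the same Frobenius norm.
assert np.isclose(fro, np.linalg.norm(X))
```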

### 2.2 Tensor-train Decomposition

The most prominent advantage of tensor-train decomposition is that the number of model parameters does not grow exponentially with the data dimension. It decomposes a tensor into a sequence of three-way tensor factors (core tensors). In particular, the TT decomposition of a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ can be expressed as follows:

$$\mathcal{X} = \ll \mathcal{G}^{(1)}, \mathcal{G}^{(2)}, \ldots, \mathcal{G}^{(N)} \gg, \tag{1}$$

where $\mathcal{G}^{(n)} \in \mathbb{R}^{r_{n-1} \times I_n \times r_n}$, $n = 1, \ldots, N$, is a sequence of three-way core tensors. The sequence $\{r_0, r_1, \ldots, r_N\}$, with $r_0 = r_N = 1$, is named the TT-rank; it limits the size of every core tensor. Furthermore, each element of the tensor $\mathcal{X}$ can be represented by the core tensors as follows:

$$x_{i_1 i_2 \cdots i_N} = \mathbf{G}^{(1)}_{i_1} \mathbf{G}^{(2)}_{i_2} \cdots \mathbf{G}^{(N)}_{i_N}, \tag{2}$$

where $\mathbf{G}^{(n)}_{i_n} \in \mathbb{R}^{r_{n-1} \times r_n}$ is the $i_n$th slice of the $n$th core tensor, $i_n = 1, \ldots, I_n$, $n = 1, \ldots, N$.
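Equation (2) can be sketched directly in NumPy: an element of a TT tensor is the product of one matrix slice per core, and since $r_0 = r_N = 1$ the running product collapses to a scalar. The toy shapes and randomly initialized cores below are illustrative, not from the paper:

```python
import numpy as np

# Evaluate one element x_{i1...iN} of a TT tensor as a product of matrix
# slices G^(n)[:, i_n, :], as in equation (2).
def tt_element(cores, index):
    v = np.ones((1, 1))
    for G, i in zip(cores, index):
        v = v @ G[:, i, :]            # (1, r_{n-1}) @ (r_{n-1}, r_n)
    return v.item()                    # final shape is (1, 1) since r_N = 1

# Toy 3-way TT tensor with shape (4, 5, 6) and TT-ranks (1, 2, 3, 1).
rng = np.random.default_rng(0)
ranks, shape = (1, 2, 3, 1), (4, 5, 6)
cores = [rng.standard_normal((ranks[n], shape[n], ranks[n + 1]))
         for n in range(3)]

# Cross-check against the dense tensor obtained by contracting all cores.
dense = np.einsum('aib,bjc,ckd->ijk', *cores)
assert np.isclose(tt_element(cores, (1, 2, 3)), dense[1, 2, 3])
```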

## 3 Sparse Tensor-train Optimization

### 3.1 Our Previous Work

In our previous work [12], we proposed an algorithm called Tensor-train Weighted OPTimization (TT-WOPT), which achieves high performance on data completion tasks. However, TT-WOPT treats all missing entries of the data as zero and computes the whole tensor in every iteration. When the data scale is huge and the missing rate is high, TT-WOPT consumes a large amount of memory and becomes inefficient, since it computes the whole tensor while only a small percentage of its entries is useful.

### 3.2 STTO Algorithm

In order to solve the problems of TT-WOPT mentioned in Section 3.1, we propose STTO, which uses only the observed entries to compute the gradient of every core tensor. Let $\mathcal{T}$ be the observed tensor with missing entries, let $\mathcal{X}$ be the tensor approximated by the core tensors, and let $M$ be the number of observed entries. Define the index of the $j$th observed entry as $(i_1^j, i_2^j, \ldots, i_N^j)$, $j = 1, \ldots, M$; we write $y_j = \mathcal{T}(i_1^j, \ldots, i_N^j)$ for the observed value and $\hat{y}_j = \mathcal{X}(i_1^j, \ldots, i_N^j)$ for its approximation. According to equation (2), $\hat{y}_j$ can be written as:

$$\hat{y}_j = \mathbf{G}^{(1)}_{i_1^j} \mathbf{G}^{(2)}_{i_2^j} \cdots \mathbf{G}^{(N)}_{i_N^j}. \tag{3}$$

For one observed entry $y_j$ of tensor $\mathcal{T}$, we formulate the objective function as:

$$f_j = \frac{1}{2} \left( y_j - \mathbf{G}^{(1)}_{i_1^j} \mathbf{G}^{(2)}_{i_2^j} \cdots \mathbf{G}^{(N)}_{i_N^j} \right)^2. \tag{4}$$

For $n = 1, \ldots, N$ and $j = 1, \ldots, M$, the partial derivative of $f_j$ with respect to every slice used by this entry is calculated by:

$$\frac{\partial f_j}{\partial \mathbf{G}^{(n)}_{i_n^j}} = (\hat{y}_j - y_j) \left( \mathbf{G}^{(>n)}_j \mathbf{G}^{(<n)}_j \right)^{\mathsf{T}}, \tag{5}$$

where $\mathbf{G}^{(<n)}_j = \mathbf{G}^{(1)}_{i_1^j} \cdots \mathbf{G}^{(n-1)}_{i_{n-1}^j}$ and $\mathbf{G}^{(>n)}_j = \mathbf{G}^{(n+1)}_{i_{n+1}^j} \cdots \mathbf{G}^{(N)}_{i_N^j}$. If we consider the incomplete tensor as a sparse tensor, only the observed entries need to be enumerated. We arrange all the observed entries into a vector $\mathbf{y} \in \mathbb{R}^M$ and the corresponding entries approximated by the core tensors into $\hat{\mathbf{y}} \in \mathbb{R}^M$. Then the optimization objective function over all observed entries can be formulated as:

$$f = \frac{1}{2} \left\| \mathbf{y} - \hat{\mathbf{y}} \right\|^2. \tag{6}$$

By equations (3) and (4), the optimization objective function can also be formulated as follows:

$$f = \frac{1}{2} \sum_{j=1}^{M} \left( y_j - \mathbf{G}^{(1)}_{i_1^j} \mathbf{G}^{(2)}_{i_2^j} \cdots \mathbf{G}^{(N)}_{i_N^j} \right)^2. \tag{7}$$

Thus the overall gradient of every slice of every core tensor is the accumulation of the slice gradients in equation (5) over observed entries with the same index, that is:

$$\frac{\partial f}{\partial \mathbf{G}^{(n)}_{i_n}} = \sum_{j:\, i_n^j = i_n} (\hat{y}_j - y_j) \left( \mathbf{G}^{(>n)}_j \mathbf{G}^{(<n)}_j \right)^{\mathsf{T}}, \tag{8}$$

for $n = 1, \ldots, N$ and $i_n = 1, \ldots, I_n$. After all the gradients of every slice of the core tensors are obtained, any first-order optimization method can be applied to the STTO algorithm. The whole process of STTO is summarized in Algorithm 1. Compared with TT-WOPT, whose per-iteration cost scales with the size of the whole tensor, the cost of STTO scales only with the number of observed entries $M$; STTO thus largely reduces the computational complexity and is free from the dimensionality of the tensor.

**Algorithm 1** Sparse Tensor-train Optimization (STTO)

1: **Input:** incomplete sparse tensor $\mathcal{T}$ and TT-rank $\{r_0, r_1, \ldots, r_N\}$.

2: **Initialization:** core tensors $\mathcal{G}^{(1)}, \ldots, \mathcal{G}^{(N)}$ of the approximated tensor $\mathcal{X}$.

3: **While** the optimization stopping condition is not satisfied

4: &emsp;**For** $n = 1, \ldots, N$

5: &emsp;&emsp;**For** $j = 1, \ldots, M$

6: &emsp;&emsp;&emsp;Compute $\partial f_j / \partial \mathbf{G}^{(n)}_{i_n^j}$ by equation (5).

7: &emsp;&emsp;**End for**

8: &emsp;**End for**

9: &emsp;Update $\mathcal{G}^{(1)}, \ldots, \mathcal{G}^{(N)}$ by a gradient descent method.

10: **End while**

11: **Output:** core tensors $\mathcal{G}^{(1)}, \ldots, \mathcal{G}^{(N)}$.
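The loop above can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: it uses plain fixed-step gradient descent (the experiments use NCG with line search), and the left/right partial-product caching of $\mathbf{G}^{(<n)}_j$ and $\mathbf{G}^{(>n)}_j$ is our own structuring of the computation:

```python
import numpy as np

def stto(indices, y, shape, ranks, lr=0.05, iters=200, seed=0):
    """Sketch of STTO: fit TT cores to observed entries only.

    indices: (M, N) integer array of observed multi-indices.
    y:       (M,) observed values.
    ranks:   TT-ranks (r_0, ..., r_N) with r_0 = r_N = 1.
    """
    rng = np.random.default_rng(seed)
    N = len(shape)
    cores = [0.1 * rng.standard_normal((ranks[n], shape[n], ranks[n + 1]))
             for n in range(N)]

    for _ in range(iters):
        grads = [np.zeros_like(G) for G in cores]
        for j in range(len(y)):
            idx = indices[j]
            slices = [cores[n][:, idx[n], :] for n in range(N)]
            # Left partial products G^(<n): left[n] = slices[0] ... slices[n-1].
            left = [np.ones((1, 1))]
            for S in slices[:-1]:
                left.append(left[-1] @ S)
            # Right partial products G^(>n): right[n] = slices[n+1] ... slices[N-1].
            right = [np.ones((1, 1))]
            for S in reversed(slices[1:]):
                right.append(S @ right[-1])
            right.reverse()
            y_hat = (left[-1] @ slices[-1]).item()   # equation (3)
            err = y_hat - y[j]
            for n in range(N):
                # Accumulate equation (5) into the slice gradient (equation (8)).
                grads[n][:, idx[n], :] += err * (right[n] @ left[n]).T
        # Gradient descent update of every core tensor.
        for n in range(N):
            cores[n] -= lr * grads[n]
    return cores
```

Any first-order optimizer can replace the last update step; only the gradient accumulation over observed entries is essential to the sparse formulation.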

## 4 Experiments

In this section, our proposed STTO is compared with two state-of-the-art algorithms: CP weighted optimization (CP-WOPT) [2] and fully Bayesian CP factorization (FBCP) [3]. Simulation experiments and color image experiments are conducted to validate the effectiveness of our algorithm. In addition, we provide a tensorization method that transforms visual data to a higher dimension; it enhances the structural relation information of the data and improves the performance of our algorithm.

For evaluation indices, we use the relative square error (RSE) for simulation data and image data, and the peak signal-to-noise ratio (PSNR) to measure the quality of reconstructed images. In order to have a clearer comparison with CP-WOPT, we adopt the same optimization method as [2]: nonlinear conjugate gradient (NCG) with Hestenes-Stiefel updates [13] and the Moré-Thuente line search algorithm [14]. All the methods are implemented with the Poblano Toolbox [15], and the optimization stopping condition is set as a maximum number of iterations.
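The RSE and PSNR formulas are not spelled out above; the short sketch below assumes the standard definitions (RSE as the ratio of Frobenius norms, PSNR in dB with a peak value of 255 for 8-bit images):

```python
import numpy as np

def rse(x_true, x_hat):
    """Relative square error: ||X_hat - X|| / ||X|| (standard definition)."""
    return np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)

def psnr(x_true, x_hat, peak=255.0):
    """Peak signal-to-noise ratio in dB, assuming the given peak pixel value."""
    mse = np.mean((x_hat - x_true) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

img = np.full((8, 8), 100.0)
noisy = img + 10.0                  # constant per-pixel error of 10
assert np.isclose(rse(img, noisy), 0.1)
```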

### 4.1 Simulations

We use values produced from a highly oscillating function [16] as simulation data, which is expected to be well approximated by all the tensor completion algorithms. Four data structures are tested: 3D, 5D, 7D, and 9D tensors. The TT-ranks and CP-ranks of the four simulations are set so that the numbers of model parameters of the three algorithms are as close as possible.

From the simulation results we can see that our method performs best among the three algorithms in almost every situation. In particular, as the dimensionality of the data increases, our algorithm maintains low RSE values while the performance of the other two algorithms falls quickly.

### 4.2 Image Data Completion

#### 4.2.1 Visual Data Tensorization Method

From the simulation results we can see that STTO performs well in high-order cases, so we provide the following method to transform visual data to a higher order and enhance the performance of our algorithm. First, the three-way image tensor is reshaped to a seventeen-way tensor and its modes are permuted accordingly. Then we reshape the result to a nine-way tensor. The first mode of the transformed tensor contains the data of a small pixel block of the image, and the following modes describe progressively expanding pixel blocks. This nine-way tensor is considered a better structure for the image data. The tensorization method is applied to STTO in all of the following image experiments; the other two algorithms use the original three-way tensor form because they perform better on low-order tensors.
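A hypothetical sketch of this dimension-ascending step is given below. The exact sizes and permutation are not fully specified above, so we assume a 256 × 256 × 3 image: since 256 = 2^8, the image unfolds into sixteen modes of size 2 plus the channel mode of size 3 (seventeen-way), and interleaving row and column bits before pairing them yields a nine-way tensor with eight modes of size 4 and the channel mode:

```python
import numpy as np

def tensorize(img):
    """Hypothetical dimension-ascending transform for a 256 x 256 x 3 image."""
    # Unfold into eight row bits, eight column bits, and the channel mode.
    t = img.reshape([2] * 16 + [3])
    # Interleave row bits and column bits so that neighboring modes index
    # progressively larger pixel blocks of the image.
    order = [i for pair in zip(range(8), range(8, 16)) for i in pair] + [16]
    t = t.transpose(order)
    # Merge each (row bit, column bit) pair into one mode of size 4,
    # giving the nine-way tensor fed to STTO.
    return t.reshape([4] * 8 + [3])

high = tensorize(np.zeros((256, 256, 3)))
assert high.shape == (4, 4, 4, 4, 4, 4, 4, 4, 3)
```

The transform is a pure reshape/permute, so it is lossless and invertible by applying the inverse permutation and reshapes.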

#### 4.2.2 Random Missing

We first adopt the benchmark image "Lena" to examine the best performance of all the algorithms in random missing cases. We only compare the three algorithms in high-missing-rate situations. TT-ranks and CP-ranks are set properly to obtain the best results. The visualized experiment results show that our STTO algorithm distinctly outperforms the other algorithms. In particular, when the missing rate reaches 98% and 99%, our algorithm combined with the visual data tensorization method can still recover the image well, while the other algorithms fail totally.

#### 4.2.3 Irregular Missing

In this experiment, images with whole rows missing or blocks missing are tested with the three algorithms. The visualized results and the RSE and PSNR values reported below show that STTO with the visual data tensorization method recovers images with whole-row missing and block missing well.

| Method | Metric | Lena (row) | Peppers (row) | Sailboat (row) | Lena (block) | Peppers (block) | Sailboat (block) |
|---|---|---|---|---|---|---|---|
| STTO | RSE / PSNR | 0.1138 / 24.00 | 0.1661 / 20.80 | 0.1767 / 19.93 | 0.1323 / 22.69 | 0.1611 / 21.06 | 0.1704 / 20.25 |
| CP-WOPT | RSE / PSNR | 0.5401 / 10.86 | 0.5546 / 10.85 | 0.5545 / 10.34 | 0.1746 / 20.61 | 0.2252 / 18.27 | 0.2082 / 19.00 |
| FBCP | RSE / PSNR | 0.5503 / 10.46 | 0.5594 / 10.58 | 0.5586 / 10.18 | 0.1498 / 21.66 | 0.1671 / 20.79 | 0.1764 / 20.01 |

## 5 Conclusions

In this paper, we first elaborate the basics of tensors and tensor-train decomposition. We then propose the STTO algorithm, which is efficient and has low computational complexity: it uses the observed entries of a sparse tensor to optimize the core tensors of the tensor-train model and recover the missing data. The simulation experiments show that our algorithm outperforms the state-of-the-art methods in both low-order and high-order cases. In addition, the image completion results prove that STTO with our tensorization scheme achieves high performance under high missing rates. The remarkable results on irregular missing cases also show the advantages of our algorithm and tensorization method. These results indicate that tensor-train decomposition with high-order tensorization can achieve high compression and representation ability. Furthermore, it should be noted that the performance of tensor-train decomposition is sensitive to the selection of TT-ranks; hence, in future work we will study how to optimize the tensor factors and TT-ranks simultaneously.

## 6 Acknowledgement

This work was supported by JSPS KAKENHI (Grant No. 17K00326, 15H04002), JST CREST (Grant No. JPMJCR1784) and the National Natural Science Foundation of China (Grant No. 61773129).

## References

- [1] Tamara G Kolda and Brett W Bader, “Tensor decompositions and applications,” SIAM review, vol. 51, no. 3, pp. 455–500, 2009.
- [2] Evrim Acar, Daniel M Dunlavy, Tamara G Kolda, and Morten Mørup, “Scalable tensor factorizations for incomplete data,” Chemometrics and Intelligent Laboratory Systems, vol. 106, no. 1, pp. 41–56, 2011.
- [3] Qibin Zhao, Liqing Zhang, and Andrzej Cichocki, “Bayesian CP factorization of incomplete tensors with automatic rank determination,” IEEE transactions on pattern analysis and machine intelligence, vol. 37, no. 9, pp. 1751–1763, 2015.
- [4] J Mocks, “Topographic components model for event-related potentials and some biophysical considerations,” IEEE transactions on biomedical engineering, vol. 35, no. 6, pp. 482–484, 1988.
- [5] Lieven De Lathauwer and Joséphine Castaing, “Blind identification of underdetermined mixtures by simultaneous matrix diagonalization,” IEEE Transactions on Signal Processing, vol. 56, no. 3, pp. 1096–1105, 2008.
- [6] Damien Muti and Salah Bourennane, “Multidimensional filtering based on a tensor approach,” Signal Processing, vol. 85, no. 12, pp. 2338–2353, 2005.
- [7] RA Harshman, “Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multimodal factor analysis,” UCLA Working Papers in Phonetics, vol. 16, pp. 1–84, 1970.
- [8] Ledyard R Tucker, “Some mathematical notes on three-mode factor analysis,” Psychometrika, vol. 31, no. 3, pp. 279–311, 1966.
- [9] Silvia Gandy, Benjamin Recht, and Isao Yamada, “Tensor completion and low-n-rank tensor recovery via convex optimization,” Inverse Problems, vol. 27, no. 2, pp. 025010, 2011.
- [10] Marko Filipović and Ante Jukić, “Tucker factorization with missing data with application to low-n-rank tensor completion,” Multidimensional systems and signal processing, vol. 26, no. 3, pp. 677–692, 2015.
- [11] Ivan V Oseledets, “Tensor-train decomposition,” SIAM Journal on Scientific Computing, vol. 33, no. 5, pp. 2295–2317, 2011.
- [12] Longhao Yuan, Qibin Zhao, and Jianting Cao, “Completion of high order tensor data with missing entries via tensor-train decomposition,” in Neural Information Processing: 24th International Conference, ICONIP 2017, Guangzhou, China, November 14-18, 2017, Proceedings. Springer, 2017, vol. 10634, p. 222.
- [13] Jorge Nocedal and Stephen Wright, Numerical optimization, Springer Science & Business Media, 2006.
- [14] Jorge J Moré and David J Thuente, “Line search algorithms with guaranteed sufficient decrease,” ACM Transactions on Mathematical Software (TOMS), vol. 20, no. 3, pp. 286–307, 1994.
- [15] Daniel M Dunlavy, Tamara G Kolda, and Evrim Acar, “Poblano v1.0: A Matlab toolbox for gradient-based optimization,” Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Tech. Rep. SAND2010-1422, 2010.
- [16] Boris N Khoromskij, “Tensor numerical methods for multidimensional PDEs: theoretical analysis and initial applications,” ESAIM: Proceedings and Surveys, vol. 48, pp. 1–28, 2015.