1 Introduction
In this paper we consider the tensor completion problem. We suppose that the tensor values are generated by some smooth function, i.e.,
$$\mathcal{A}[i_1, \ldots, i_d] = f(\mathbf{x}_{i_1 \ldots i_d}),$$
where $\mathbf{x}_{i_1 \ldots i_d}$ is a point on some multidimensional grid and $f$ is some unknown smooth function. However, the tensor values are known only at some small subset of the grid. The task is to complete the tensor, i.e., to reconstruct the tensor values at all points on the grid, taking into account the properties of the data generating process $f$.
This problem statement differs from the traditional one, which does not use any assumptions about the function $f$. Knowing some properties of the data generating function provides insight into how the tensor values relate to each other, and this, in turn, allows us to improve the results. In this work we assume that the function $f$ is smooth.
There are many practical applications that fit this statement: for example, modeling of physical processes, solutions of differential equations, and modeling of probability distributions.
In this paper we propose to model the smoothness of the data generating process by using Gaussian Process Regression (GPR). In GPR the assumptions about the function that we approximate are controlled via the kernel function. The GPR model is then used to construct the initial solution to the tensor completion problem.
In principle, such initialization can improve any other tensor completion technique. This means that, using the proposed initialization, state-of-the-art results can be obtained with some simple optimization procedure such as Stochastic Gradient Descent.
When the tensor order is high, the problem should be solved in some low-rank format because the number of tensor elements grows exponentially with the order. The proposed approach is based on the tensor train (TT) format for its computational efficiency and ability to handle large dimensions [15].
The contributions of this paper are as follows:

We introduce a new initialization algorithm which takes into account the tensor generating process. The proposed algorithm is described in Section 3.

The proposed initialization technique automatically selects the rank of the tensor; the details are given in Section 3.2.

We conducted empirical evaluations of the proposed approach and compared it with tensor completion techniques that do not use our initialization. The results, given in Section 4, show the superiority of the proposed algorithm.
2 Tensor completion
The formal problem statement is as follows. Suppose that $\mathcal{A} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ is a $d$-way tensor (by tensor here we mean a multidimensional array). Tensor values are known only at some subset of indices $\Omega$. By $P_\Omega$ we denote the projection onto the set $\Omega$, i.e.
$$P_\Omega(\mathcal{A})[i_1, \ldots, i_d] = \begin{cases} \mathcal{A}[i_1, \ldots, i_d], & (i_1, \ldots, i_d) \in \Omega, \\ 0, & \text{otherwise}. \end{cases}$$
We formulate tensor completion as the optimization problem
$$\min_{\mathcal{X}} \; \| P_\Omega(\mathcal{X}) - P_\Omega(\mathcal{A}) \|_F^2 \qquad (1)$$
subject to $\text{TT-rank}(\mathcal{X}) = \mathbf{r}$,
where $\text{TT-rank}(\mathcal{X})$ is the tensor train rank of $\mathcal{X}$ [16], which is a generalization of the matrix rank, and $\| \cdot \|_F$ is the Frobenius norm. A tensor $\mathcal{X}$ is said to be in the tensor train format if its elements are represented as
$$\mathcal{X}[i_1, \ldots, i_d] = G_1[i_1] \, G_2[i_2] \cdots G_d[i_d],$$
where $G_k$ is a three-way tensor core of size $r_{k-1} \times n_k \times r_k$, $r_0 = r_d = 1$, and $G_k[i_k]$ denotes its $i_k$-th lateral slice of size $r_{k-1} \times r_k$. The vector $\mathbf{r} = (r_0, r_1, \ldots, r_d)$ is called the TT-rank. The tensor train format assumes that the full tensor can be approximated by a set of $d$ three-way core tensors; the total number of elements in the core tensors is $O(d n r^2)$, where $n = \max_k n_k$ and $r = \max_k r_k$, which is much smaller than the $n_1 n_2 \cdots n_d$ elements of the full tensor.
In problem (1) we optimize the objective function directly with respect to the tensor cores while keeping their sizes fixed. Problem (1) is non-convex, so optimization methods can converge to a local minimum. To obtain an efficient solution we impose two requirements:

The initial tensor in the tensor train format should be as close to the optimum as possible.

An efficient optimization procedure should be available that will be launched from the obtained initial tensor.
These steps are independent, and one can apply any desired algorithm in each of them.
In this work we develop an initialization algorithm that allows obtaining an accurate initial tensor for the case when the tensor of interest is generated by some smooth function. The experimental section demonstrates that our initialization can improve the results of any optimization procedure, which makes our approach universal.
3 Initialization
We consider tensors that are generated by some function, i.e., the tensor values are computed as follows:
$$\mathcal{A}[i_1, \ldots, i_d] = f\big(x^{(1)}_{i_1}, \ldots, x^{(d)}_{i_d}\big), \quad 1 \le i_k \le n_k,$$
where $f$ is some unknown smooth function and $n_1, \ldots, n_d$ are the tensor sizes. The set of points $\big\{(x^{(1)}_{i_1}, \ldots, x^{(d)}_{i_d})\big\}$ is a full factorial Design of Experiments, i.e., a multidimensional grid, and we also assume that the grid is uniform.
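As a toy illustration of this setting, the sketch below builds a 3-way tensor from a hypothetical smooth function on a uniform grid and samples a small observed subset; the particular function, grid sizes, and 5% sampling rate are illustrative assumptions:

```python
import numpy as np

# hypothetical smooth generating function f on a uniform grid in [0, 1]^3
def f(x, y, z):
    return np.exp(-(x**2 + y**2 + z**2)) * np.sin(x + y + z)

n = (10, 12, 14)                               # tensor sizes n_1, n_2, n_3
grids = [np.linspace(0.0, 1.0, nk) for nk in n]
X, Y, Z = np.meshgrid(*grids, indexing='ij')   # full factorial design
A = f(X, Y, Z)                                 # A[i1, i2, i3] = f(x_{i1}, y_{i2}, z_{i3})

# observe only a small random subset Omega of the entries
rng = np.random.default_rng(42)
mask = rng.random(A.shape) < 0.05              # roughly 5% of entries observed
observed_idx = np.argwhere(mask)               # multi-indices in Omega
observed_val = A[mask]                         # corresponding tensor values
```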
In this setting, tensor completion can be considered as a regression problem and can be solved by any regression technique that guarantees the smoothness of the solution. However, in the tensor completion problem we are interested in the tensor of values of $f$ at a predefined finite grid of points, and the tensor should be in a low-rank format so that various operations with it (e.g., calculation of the norm of the tensor, dot products, and others) can be performed efficiently.
These observations suggest the following solution: build a regression model using the observed values of the tensor, then use the obtained approximation as a black box for the TT-cross approximation algorithm [15]. The last step yields a tensor in the TT format, which is a low-rank format and allows efficient computations. The next (optional) step is to improve the obtained solution by using it as the initialization for any other tensor completion technique.
Let us write down the set of observed tensor values into a vector $\mathbf{y}$ and the corresponding indices into a matrix $X$ (each row of $X$ is a vector of indices $(i_1, \ldots, i_d)$). Then the approach to tensor completion (in the TT format) can be written as follows:

Construct an initial tensor $\mathcal{X}_0$ in the TT format:

(a) Apply some regression technique to the given data set $(X, \mathbf{y})$ to construct an approximation $\hat{f}$ of the function that generates the tensor values.

(b) Apply the TT-cross method (see Section 3.2, [15]) to the constructed approximation $\hat{f}$ to obtain $\mathcal{X}_0$.


Apply some tensor completion technique using $\mathcal{X}_0$ as the initial value.
At step (a) the choice of the regression technique affects the result of the initialization, although in principle it can be arbitrary. The regression algorithm should be chosen so that it captures the peculiarities of the tensor we would like to recover. In this work we suppose that the tensor generating function is smooth (a common situation when modeling physical processes). Therefore, we choose a regression technique that is good at approximating smooth functions. A reasonable choice in this case is Gaussian Process (GP) Regression [17]. GP models are a popular tool in many engineering applications, as they have proved to be efficient, especially for problems where some smooth function must be modeled [3]. The points themselves are not given; all we know is that at the point with multi-index $(i_1, \ldots, i_d)$ on the grid the function value equals $\mathcal{A}[i_1, \ldots, i_d]$. To make the problem statement reasonable we assume that the indices are connected with the points through the uniform grid. So, as inputs for the approximation we use the multi-indices $(i_1, \ldots, i_d)$ with targets $\mathcal{A}[i_1, \ldots, i_d]$.
At step (b) we use TT-cross because it allows efficiently approximating a black-box function by a low-rank tensor in the TT format. Moreover, this approach can select the TT-rank automatically, which makes it even more attractive. More details on the technique are given in Section 3.2.
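As an illustration, step (a) can be sketched with an off-the-shelf GP implementation (here scikit-learn's `GaussianProcessRegressor` with an RBF kernel; the toy 2-way grid, the generating function, and the 15% observation rate are illustrative assumptions, and step (b) is only indicated in a comment):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# toy 2-way "tensor" generated by a smooth function on a 20 x 20 grid
n1, n2 = 20, 20
I, J = np.meshgrid(np.arange(n1), np.arange(n2), indexing='ij')
A = np.sin(I / n1 * 3.0) * np.cos(J / n2 * 3.0)

# observed subset Omega: about 15% of entries; inputs are indices scaled to [0, 1]
rng = np.random.default_rng(0)
mask = rng.random(A.shape) < 0.15
X_train = np.column_stack([I[mask] / n1, J[mask] / n2])
y_train = A[mask]

# step (a): GP regression approximates the generating function
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-6)
gp.fit(X_train, y_train)

# step (b) would hand a callable such as `lambda idx: gp.predict(scale(idx))`
# to TT-cross as a black box; here we just check that the GP reconstructs
# the held-out entries of the toy tensor
X_all = np.column_stack([I.ravel() / n1, J.ravel() / n2])
A_hat = gp.predict(X_all).reshape(n1, n2)
mse = np.mean((A_hat[~mask] - A[~mask]) ** 2)
```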
The described approach has the following benefits:

An initial tensor which is close to the optimum in terms of the reconstruction error at the observed values; this pushes the optimization towards faster convergence.

Better generalization ability: there are many degrees of freedom, so many different sets of tensor train factors can give a low reconstruction error at the observed positions while giving a large error at other locations. An accurate approximation model pushes the initial tensor closer to the original tensor at both the observed and unobserved positions.

The TT-cross technique chooses the rank automatically, so there is no need to tune the rank of the tensor manually.
The described approach leads to Algorithm 1. Steps 3 and 4 of the algorithm are described in Sections 3.1 and 3.2, respectively.
3.1 Gaussian Process Regression
One of the most efficient tools for approximating smooth functions is Gaussian Process (GP) Regression [6]. GP regression is a Bayesian approach where a prior distribution over continuous functions is assumed to be a Gaussian process, i.e.,
$$\mathbf{y} \mid X \sim \mathcal{N}\big(\boldsymbol{\mu}(X),\; K(X, X) + \sigma_n^2 I\big),$$
where $\mathbf{y}$ is a vector of outputs, $X$ is a matrix of inputs, $\sigma_n^2$ is a noise variance, $\boldsymbol{\mu}(X)$ is a mean vector modeled by some function $\mu(\cdot)$, $K(X, X)$ is a covariance matrix for some a priori selected covariance function $k(\cdot, \cdot)$, and $I$ is an identity matrix. An example of such a covariance function is the squared exponential kernel
$$k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\ell^2}\right),$$
where $\sigma_f$ and $\ell$ are parameters of the kernel (hyperparameters of the GP model). The hyperparameters should be chosen according to the given data set.
Without loss of generality we make the standard assumption of zero-mean data. Now, for a new unseen data point $\mathbf{x}_*$ we have
$$f_* \mid X, \mathbf{y}, \mathbf{x}_* \sim \mathcal{N}\big(\mathbf{k}_*^\top (K + \sigma_n^2 I)^{-1} \mathbf{y},\; k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^\top (K + \sigma_n^2 I)^{-1} \mathbf{k}_*\big), \qquad (2)$$
where $\mathbf{k}_* = \big(k(\mathbf{x}_*, \mathbf{x}_1), \ldots, k(\mathbf{x}_*, \mathbf{x}_N)\big)^\top$ and $K = K(X, X)$.
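A minimal NumPy sketch of the posterior mean in (2) under the squared exponential kernel; the kernel parameters and the 1-D toy data are illustrative assumptions, and for clarity no hyperparameter optimization is performed:

```python
import numpy as np

def sq_exp_kernel(X1, X2, sigma_f=1.0, ell=0.3):
    """Squared exponential kernel k(x, x') = sigma_f^2 exp(-|x - x'|^2 / (2 ell^2))."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return sigma_f**2 * np.exp(-0.5 * d2 / ell**2)

def gp_posterior_mean(X_train, y_train, X_test, noise=1e-6):
    """Posterior mean of (2): k_*^T (K + sigma_n^2 I)^{-1} y."""
    K = sq_exp_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = sq_exp_kernel(X_test, X_train)
    return K_star @ np.linalg.solve(K, y_train)

# 1-D demo: recover a smooth function from a handful of samples
X_train = np.linspace(0.0, 1.0, 15)[:, None]
y_train = np.sin(4 * X_train[:, 0])
X_test = np.linspace(0.0, 1.0, 50)[:, None]
mean = gp_posterior_mean(X_train, y_train, X_test)
```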
Let us denote the vector of hyperparameters ($\sigma_f$, $\ell$ and $\sigma_n$) by $\boldsymbol{\theta}$. To choose the hyperparameters of our model we consider the log-likelihood
$$\log p(\mathbf{y} \mid X, \boldsymbol{\theta}) = -\frac{1}{2} \mathbf{y}^\top (K + \sigma_n^2 I)^{-1} \mathbf{y} - \frac{1}{2} \log \det (K + \sigma_n^2 I) - \frac{N}{2} \log 2\pi$$
and optimize it over the hyperparameters [17]. The runtime complexity of learning GP regression is $O(N^3)$, as we need to calculate the inverse of $K + \sigma_n^2 I$, its determinant, and the derivatives of the log-likelihood. If the sample size ($|\Omega|$ in our case) is large, the computational complexity becomes an issue. There are several ways to overcome it. If the data set has a factorial structure (a multidimensional grid in the simplest case), we can use the algorithm from [4]. If the structure is factorial with a small number of missing values, the method from [5] should be applied. For general unstructured cases, an approximate GP model can be built using, for example, the model described in [14], or a subsample can be used as the training set.
After tuning the hyperparameters, we can use the mean of the posterior distribution (2) as the prediction of the model.
Note that the input points in our case are the indices of the observed values. For the GP model we scale each index to the unit interval.
3.2 Tensor train cross-approximation
To approximate the tensor generated by $\hat{f}$ we use the tensor train cross-approximation. First, let us consider the matrix case. Suppose that we are given a rank-$r$ matrix $A$ of size $m \times n$. A cross approximation of the matrix is represented as
$$A \approx C \hat{A}^{-1} R,$$
where $C$ and $R$ consist of some $r$ columns and $r$ rows of the matrix, and $\hat{A}$ is the $r \times r$ submatrix at the intersection of these rows and columns. To construct an accurate approximation it is required to find a submatrix $\hat{A}$ of large volume, which can be done efficiently by the algorithm of [21].
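A small NumPy illustration of the matrix cross approximation: for an exactly rank-$r$ matrix, any $r$ independent rows and columns reproduce the matrix, while the large-volume submatrix found by maxvol is what makes the scheme numerically reliable in practice. The naive row/column choice below is an illustrative assumption, not the maxvol procedure:

```python
import numpy as np

rng = np.random.default_rng(2)
# exactly rank-2 matrix of size 50 x 40
A = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))

r = 2
I = [0, 1]   # chosen rows (maxvol would pick a large-volume set; for a
J = [0, 1]   # generic rank-r matrix, any r rows/columns work here)
C = A[:, J]                              # selected columns
R = A[I, :]                              # selected rows
A_int = A[np.ix_(I, J)]                  # r x r intersection submatrix
A_cross = C @ np.linalg.solve(A_int, R)  # cross (CUR-type) approximation

max_err = np.max(np.abs(A_cross - A))    # exact up to round-off for rank-r A
```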
Now for a tensor $\mathcal{A}$ the procedure is the following. At the first step, consider the unfolding $A_1$ of size $n_1 \times (n_2 \cdots n_d)$, of rank $r_1$. Using the row-column alternating algorithm from [21] we can find $r_1$ linearly independent columns of the matrix $A_1$; these columns form a matrix $C_1$. After that, applying the maxvol procedure [21] to the matrix $C_1$, we can find a set of row indices, a matrix $R_1$ of the corresponding rows, and the $r_1 \times r_1$ intersection matrix $\hat{A}_1$ that give the cross approximation of the unfolding $A_1$:
$$A_1 \approx C_1 \hat{A}_1^{-1} R_1.$$
We set
$$G_1 = C_1 \hat{A}_1^{-1},$$
where $G_1$ is of size $n_1 \times r_1$. Next, let us form a tensor from the rows $R_1$ and reshape it into a tensor of size $(r_1 n_2) \times n_3 \times \cdots \times n_d$. The next step is to apply the same procedure to the unfolding of this tensor and obtain the matrices $C_2$, $\hat{A}_2$ and $G_2 = C_2 \hat{A}_2^{-1}$ of size $(r_1 n_2) \times r_2$.
Repeating the described procedure $d-1$ times, we end up with matrices $G_k$ of sizes $(r_{k-1} n_k) \times r_k$. Then each matrix $G_k$ can be reshaped into a three-way tensor of size $r_{k-1} \times n_k \times r_k$, and these can be used as the core tensors of the TT format. It turns out that such a representation is a TT decomposition of the initial tensor $\mathcal{A}$.
The exact ranks $r_1, \ldots, r_{d-1}$ are not known to us in general. They can only be estimated from above (e.g., by the maximal possible rank of the corresponding unfolding). If a rank is overestimated, then the calculation of the matrices $\hat{A}_k^{-1}$ becomes an unstable operation (because we obtain almost rank-deficient unfolding matrices). However, in [15] the authors suggest a simple modification that overcomes this issue. Therefore, we need to estimate the ranks from above, but the estimate should not be much larger than the real rank. So, the approach is to start from some small rank, construct the tensor in the TT format, and then apply recompression (see [16]). If some rank is not reduced by the recompression, then we underestimated it and should increase it and repeat the procedure.
4 Experimental results
In this section we present the results of applying our approach to engineering problems.
The experimental setup is the following. We try the following optimization algorithms:

SGD – stochastic gradient descent [25],

Ropt – Riemannian optimization [19],

TTWopt – weighted tensor train optimization [25],

ALS – alternating least squares [9].
We run each algorithm with random initialization and with the proposed GP-based initialization and then compare the results.
The quality of the methods is measured using the mean squared error (MSE)
$$\text{MSE} = \frac{1}{|\Omega_{\text{test}}|} \sum_{(i_1, \ldots, i_d) \in \Omega_{\text{test}}} \big(\mathcal{X}[i_1, \ldots, i_d] - \mathcal{A}[i_1, \ldots, i_d]\big)^2,$$
where $\Omega_{\text{test}}$ is some set of indices independent from the given observed set of indices $\Omega$, $|\Omega_{\text{test}}|$ is the size of this set, and $\mathcal{X}$ is the obtained approximation of the actual tensor $\mathcal{A}$.
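This error metric can be computed with a short helper; the index layout (one multi-index per row of the test set) is an assumption for illustration:

```python
import numpy as np

def completion_mse(A_true, A_approx, test_idx):
    """MSE between true and reconstructed tensors over a held-out index set.

    test_idx : integer array of shape (K, d); each row is a multi-index
               from Omega_test, disjoint from the observed set Omega.
    """
    sel = tuple(test_idx.T)   # fancy-index the tensors at the test positions
    return float(np.mean((A_approx[sel] - A_true[sel]) ** 2))

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 6, 7))
test_idx = np.array([[0, 1, 2], [4, 5, 6], [2, 3, 4]])
err = completion_mse(A, A, test_idx)   # identical tensors give zero error
```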
The algorithms listed above were compared on two problems: the CMOS oscillator model and the Cookie problem (see Sections 4.2 and 4.1, respectively). In the CMOS oscillator problem we ran each optimization several times with different random training sets and then calculated the average reconstruction error as well as its standard deviation. For the Cookie problem, however, the training set was the same during all runs, because this problem is more computationally intensive and generating several training sets takes more time. Note that when we use the GP-based initialization, the TT-rank of the tensor is selected automatically by the TT-cross algorithm, and its maximal value can be larger than the ranks tried with random initialization. The optimization algorithms with random initialization do not have a procedure for automatic rank selection, so we ran them with several different ranks and then chose the best one.
The TTWopt implementation (https://github.com/yuanlonghao/T3C_tensor_completion) does not support high-dimensional problems; for those, the authors of TTWopt propose to use SGD. The authors of TTWopt also propose a truncated-SVD-based initialization: the idea is to fill the missing values with the mean of the observed part of the tensor and then apply truncated SVD to obtain the TT cores. However, such an approach is only applicable to low-dimensional tensors, as it requires computing full matrices of large size.
For Ropt and ALS we used publicly available MATLAB codes (https://anchp.epfl.ch/indexhtml/software/ttemps/).
4.1 Cookie problem
Let us consider a parameter-dependent PDE [2, 20]. It is a heat equation on a domain containing $m$ disks ("cookies") arranged in a grid, where the heat conductivity on the $i$-th disk depends on a parameter $\rho_i$ (see illustration in Fig. 1), so the parameter vector $\rho$ is $m$-dimensional. We are interested in the average temperature over a fixed subdomain. If each parameter takes $n$ possible values, then there are $n^m$ possible values of the average temperature, which form an $m$-way tensor.
In this work we used the following setup for the Cookie problem: each parameter lies in a fixed interval with the same number of levels, the observed set is a small random subset of the grid, and for the test set we used independently generated points.
The results of tensor completion are presented in Table 1. One can see that the GP-based initialization gives lower reconstruction errors on both the training set and the test set, except for the ALS technique. The ALS method with the proposed initialization overfits: the error on the training set is close to machine precision, whereas the test error is much more significant. This is not surprising if we recall that the number of observed values is small, while the number of free parameters used to construct the TT representation is much higher.
4.2 CMOS ring oscillator
Let us consider the CMOS ring oscillator [24]. It is an electric circuit which consists of several stages of CMOS inverters. We are interested in the oscillation frequency of the oscillator. The characteristics of the electric circuit are described by a number of parameters, each of which can take one of several values, so the total size of the tensor grows exponentially with the number of parameters. A fixed number of observed values was used during the experiments; for the test set we used independently generated points.
The results of the experiments are given in Table 2. The table demonstrates that utilizing the GP-based initialization improves the results for all algorithms except ALS. ALS in this case overfits again: the training error is extremely small, whereas the test error is much larger, though it is still rather small compared to the other techniques and to ALS with random initialization.
[Table: errors and average numbers of iterations on the training and test sets for SGD, Ropt, TTWopt, and ALS, with random and GP-based initialization.]
[Table: errors and numbers of iterations on the training and test sets for SGD, Ropt, and ALS, with random and GP-based initialization.]
All in all, the obtained results show that GP-based initialization improves the tensor completion results in general; at the very least, it provides a better training error. As for the error on the test set, one should be more careful, as the number of degrees of freedom is large and there exist many solutions that give a small error at the observed values but large errors at other values.
5 Related works
One set of approaches to tensor completion is based on nuclear norm minimization. The nuclear norm of a matrix is defined as a sum of all singular values of the matrix. This objective function is a convex envelope of the rank function. For a tensor the nuclear norm is defined as a sum of singular values of matricizations of the tensor.
There are efficient off-the-shelf techniques for such problems based on interior-point methods. However, these are second-order methods and scale poorly with the dimensionality of the problem. Special optimization techniques were derived for nuclear norm minimization [8, 13, 18].
Such techniques are more often applied to matrices or low-dimensional tensors, as their straightforward formulation requires finding the full tensor, which becomes infeasible for high-dimensional problems.
The second type of approach is based on low-rank tensor decompositions [1, 7, 12, 19, 23]. There are several tensor decompositions, and each of these papers derives an optimization procedure for one of them, namely, the CP decomposition, the Tucker decomposition, or the TT/MPS decomposition. The simplest technique is alternating least squares [10]: it finds the solution iteratively, at each iteration minimizing the objective function with respect to one core while the other cores are fixed.
Another approach is based on Riemannian optimization, which seeks the optimal solution on the manifold of low-rank tensors of the given structure [19]. The same can be done using Stochastic Gradient Descent [23]. Riemannian optimization, TTWopt, ALS and its modifications (e.g., ADF, alternating directions fitting [9]) try to find the TT representation of the actual tensor iteratively: at each iteration the TT cores are optimized so that the resulting tensor approximates well the tensor which coincides with the real tensor at the observed indices and with the result of the previous iteration at the other indices. All these approaches need the rank to be specified manually. The authors of [26] apply a Bayesian framework to the CP decomposition, which allows them to select the rank of the decomposition automatically.
In some papers the objective is modified by introducing special regularizers to better suit the problem [22]. For example, in [7, 26] a special prior regularizer was utilized to obtain better results for visual data.
Our proposed algorithm is an initialization technique for tensor completion problems in the TT format and can be used with most of the algorithms solving such problems. If the assumptions from Section 3 (the tensor values are values of some rather smooth function of the tensor indices) are satisfied, the initial value will be close to the optimal one, providing better results. The question of a good initialization is rarely taken into account. In paper [11] a special initialization is proposed for visual data: the idea is to use some crude technique (like bilinear interpolation) to fill the missing values and after that apply an SVD-based tensor train decomposition. The drawback of this approach is that it can be applied only to small-dimensional tensors, as we need to fill in all missing values. In [9] a special initialization is proposed for the Alternating Directions Fitting (ADF) method; it is a general technique for tensor completion and does not take into account any assumptions on the data generating function.
6 Conclusions
We proposed a new initialization algorithm for high-dimensional tensor completion in the TT format. The approach is designed mostly for the case when the tensor values are generated by some smooth function. It can be combined with any optimization procedure used for tensor completion. Additionally, the TT-rank of the initial tensor is adjusted automatically by the TT-cross method and defines the resulting rank of the tensor, so the approach provides automatic rank selection. Our experimental study confirms that the proposed initialization delivers lower reconstruction errors for many of the optimization procedures.
Acknowledgments
We would like to thank Zheng Zhang, who kindly provided us with the CMOS ring oscillator data set.
References
 [1] E. Acar, D. M. Dunlavy, T. G. Kolda, and M. Mørup, Scalable tensor factorizations for incomplete data, Chemometrics and Intelligent Laboratory Systems, 106 (2011), pp. 41–56.
 [2] J. Ballani and L. Grasedyck, Hierarchical tensor approximation of output quantities of parameter-dependent PDEs, SIAM/ASA Journal on Uncertainty Quantification, 3 (2015), pp. 852–872.
 [3] M. Belyaev, E. Burnaev, E. Kapushev, M. Panov, P. Prikhodko, D. Vetrov, and D. Yarotsky, GTApprox: Surrogate modeling for industrial design, Advances in Engineering Software, 102 (2016), pp. 29–39.
 [4] M. Belyaev, E. Burnaev, and Y. Kapushev, Gaussian process regression for structured data sets, in International Symposium on Statistical Learning and Data Sciences, Springer, 2015, pp. 106–115.
 [5] M. Belyaev, E. Burnaev, and Y. Kapushev, Computationally efficient algorithm for Gaussian process regression in case of structured samples, Computational Mathematics and Mathematical Physics, 56 (2016), pp. 499–513.
 [6] E. Burnaev, M. Panov, and A. Zaytsev, Regression on the basis of nonstationary Gaussian processes with Bayesian regularization, Journal of Communications Technology and Electronics, 61 (2016), pp. 661–671.
 [7] Y.-L. Chen, C.-T. Hsu, and H.-Y. M. Liao, Simultaneous tensor decomposition and completion using factor priors, IEEE Transactions on Pattern Analysis and Machine Intelligence, (2013), p. 1.
 [8] S. Gandy, B. Recht, and I. Yamada, Tensor completion and low-n-rank tensor recovery via convex optimization, Inverse Problems, 27 (2011), p. 025010.
 [9] L. Grasedyck, M. Kluge, and S. Krämer, Alternating directions fitting (ADF) of hierarchical low rank tensors, Preprint, 149 (2013).
 [10] L. Grasedyck, M. Kluge, and S. Krämer, Alternating least squares tensor completion in the TT-format, arXiv preprint arXiv:1509.00311, (2015).
 [11] C.Y. Ko, K. Batselier, W. Yu, and N. Wong, Fast and accurate tensor completion with total variation regularized tensor trains, arXiv preprint arXiv:1804.06128, (2018).
 [12] D. Kressner, M. Steinlechner, and B. Vandereycken, Lowrank tensor completion by riemannian optimization, BIT Numerical Mathematics, 54 (2014), pp. 447–468.
 [13] J. Liu, P. Musialski, P. Wonka, and J. Ye, Tensor completion for estimating missing values in visual data, IEEE transactions on pattern analysis and machine intelligence, 35 (2013), pp. 208–220.
 [14] M. Munkhoeva, Y. Kapushev, E. Burnaev, and I. Oseledets, Quadrature-based features for kernel approximation, in Advances in Neural Information Processing Systems, 2018, pp. 9147–9156.
 [15] I. Oseledets and E. Tyrtyshnikov, TT-cross approximation for multidimensional arrays, Linear Algebra and its Applications, 432 (2010), pp. 70–88.
 [16] I. V. Oseledets, Tensor-train decomposition, SIAM Journal on Scientific Computing, 33 (2011), pp. 2295–2317.
 [17] C. E. Rasmussen, Gaussian processes in machine learning, in Advanced Lectures on Machine Learning, Springer, 2004, pp. 63–71.
 [18] B. Recht, M. Fazel, and P. A. Parrilo, Guaranteed minimumrank solutions of linear matrix equations via nuclear norm minimization, SIAM review, 52 (2010), pp. 471–501.
 [19] M. Steinlechner, Riemannian optimization for highdimensional tensor completion, SIAM Journal on Scientific Computing, 38 (2016), pp. S461–S484.
 [20] C. Tobler, Low-rank tensor methods for linear systems and eigenvalue problems, PhD thesis, ETH Zurich, 2012.
 [21] E. Tyrtyshnikov, Incomplete cross approximation in the mosaic-skeleton method, Computing, 64 (2000), pp. 367–380.
 [22] T. Yokota, Q. Zhao, and A. Cichocki, Smooth PARAFAC decomposition for tensor completion, IEEE Transactions on Signal Processing, 64 (2016), pp. 5423–5436.
 [23] L. Yuan, Q. Zhao, and J. Cao, Completion of high order tensor data with missing entries via tensor-train decomposition, in International Conference on Neural Information Processing, Springer, 2017, pp. 222–229.
 [24] Z. Zhang, T.-W. Weng, and L. Daniel, Big-data tensor recovery for high-dimensional uncertainty quantification of process variations, IEEE Transactions on Components, Packaging and Manufacturing Technology, 7 (2017), pp. 687–697.
 [25] L. Y. Zhao, J. Cao, et al., High-dimension tensor completion via gradient-based optimization under tensor-train format, arXiv preprint arXiv:1804.01983, (2018).
 [26] Q. Zhao, L. Zhang, and A. Cichocki, Bayesian CP factorization of incomplete tensors with automatic rank determination, IEEE Transactions on Pattern Analysis and Machine Intelligence, 37 (2015), pp. 1751–1763.