1 Introduction
Tensors are multidimensional or way arrays, which provide a natural way to represent multimodal data [CPZ17a, CPZ17b]. Lowrank tensor completion problem, in particular, aims to recover a lowrank tensor from partially observed tensor [ADKM11]. This problem has numerous applications in image/video inpainting [LMWY13, KSV14], linkprediction [EAC15], and recommendation systems [SNM08], to name a few.
In this work, we focus on trace norm regularized lowrank tensor completion problem of the form
(1) 
where is a partially observed  mode tensor, whose entries are only known for a subset of indices . , if and otherwise, is the Frobenius norm , is a lowrank promoting regularizer, and is the regularization parameter.
Similar to the matrix completion problem, the trace norm regularization has been used to enforce the lowrank constraint for the tensor completion problem. The works [THK10, TS13] discuss the overlapped and latent trace norm regularizations for tensors. In particular, [TS13, WST14] show that the latent trace norm has certain better tensor reconstruction bounds. The latent trace norm regularization learns the tensor as a sparse combination of different tensors. In our work, we empirically motivate the need for learning nonsparse combination of tensors and propose a variant of the latent trace norm that learns a nonsparse combination of tensors. We show a novel characterization of the solution space that allows for a compact storage of the tensor, thereby allowing to develop scalable optimization formulations. Concretely, we make the following contributions in this paper.

We propose a novel trace norm regularizer for lowrank tensor completion problem, which learns a tensor as a nonsparse combination of tensors. In contrast, the more popular latent trace norm regularizer [THK10, TS13, WST14] learns a highly sparse combination of tensors. Nonsparse combination helps in capturing information along all the modes.

We propose a dual framework for analyzing the problem formulation. This provides interesting insights into the solution space of the tensor completion problem, e.g., how the solutions along different modes are related, allowing a compact representation of the tensor.

Exploiting the characterization of the solution space, we develop a fixedrank formulation. Our optimization problem is on Riemannian spectrahedron manifolds and we propose computationally efficient trustregion algorithm for our formulation.
Numerical comparisons on realworld datasets for different applications such as video and hyperspectralimage completion, link prediction, and movie recommendation show that the proposed algorithm outperforms stateoftheart latent trace norm regularized algorithms.
The organization of the paper is as follows. Related works are discussed in Section 2. In Section 3, we study a particular trace norm based tensor completion formulation. Theorem 1 shows the characterization of the solution space. Building on this, we show two particular fixedrank formulations. Both of these problems have the structure of optimization on Riemannian manifolds. The optimization related ingredients and their computational cost are subsequently discussed in Section 4. In Section 5, numerical comparisons on real data sets for three different applications: video and hyperspectralimage completion, link prediction, and movie recommendation, show that our proposed algorithms outperform various stateoftheart latent trace norm regularized algorithms.
The Matlab codes are available at https://pratikjawanpuria.com/.
2 Related work
(a) Ribeira (b) Baboon (c) Ribeira (d) Baboon 
Trace norm regularized tensor completion formulations. The works [LMWY13, TS13, SDLS14, RpABbP13, CYZ16] discuss the overlapped trace norm regularization for tensor learning. The overlapped trace norm is motivated as a convex proxy for minimizing the Tucker (multilinear) rank of a tensor. The overlapped trace norm is defined as: , where is the mode matrix unfolding of the tensor [KB09] and denotes the trace norm regularizer. is a matrix obtained by concatenating mode
fibers (column vectors) of the form
[KB09].Latent trace norm is another convex regularizer used for lowrank tensor learning [THK10, TSHK11, TS13, WST14, GYK17]. In this setting, the tensor is modeled as sum of (unknown) tensors such that are lowrank matrices. The latent trace norm is defined as:
(2) 
A variant of the latent trace norm ( scaled by ) is analyzed in [WST14]. Latent trace norm and its scaled variant achieve better recovery bounds than overlapped trace norm [TS13, WST14]. Recently, [GYK17] proposed a scalable latent trace norm based FrankWolfe algorithm for tensor completion.
The latent trace norm (2) corresponds to the sparsity inducing norm penalization across . Hence, it learns as a sparse combination of . In case of high sparsity, it may result in selecting only one of the tensors as , i.e., for some , in which case is essentially learned as a lowrank matrix. In several realworld applications, tensor data cannot be mapped to a lowrank matrix structure and they require a higher order structure. Therefore, we propose a regularizer which learns a nonsparse combination of
. Nonsparse norms have led to better generalization performance in other machine learning settings
[CMR09, Suz11, JVN14].We show the benefit of learning a nonsparse mixture of tensors as against a sparse mixture on two datasets: Ribeira and Baboon (refer Section 5 for details). Figures 1(a) and 1(b) show the relative sparsity of the optimally learned tensors in the mixture as learned by the regularized latent trace norm based model (2) [TS13, WST14, GYK17] versus the proposed regularized model (discussed in Section 3). The relative sparsity for each in the mixture is computed as
. In both the datasets, our model learns a nonsparse combination of tensors, whereas the latent trace norm based model learns a highly skewed mixture of tensors. The proposed nonsparse tensor combination also leads to better generalization performance, as can be observed in the Figures
1(c) and 1(d). In the particular case of Baboon dataset, the latent trace norm essentially learns as a lowrank matrix () and consequently obtains poor generalization.Other tensor completion formulations. Other approaches for lowrank tensor completion include tensor decomposition methods like Tucker and CP [KB09, CPZ17a, CPZ17b]
. They generalize the notion of singular value decomposition of matrices to tensors. Recently,
[KSV14] exploits the Riemannian geometry of fixed multilinear rank to learn factor matrices and the core tensor. They propose a computationally efficient nonlinear conjugate gradient method for optimization over manifolds of tensors of fixed multilinear rank. [KM16] further propose an efficient preconditioner for lowrank tensor learning with the Tucker decomposition. [ZZC15] propose a Bayesian probabilistic CP model for performing tensor completion. Tensor completion algorithms based on tensor tubalrank have been recently proposed in [ZEA14, LAAW16].3 Nonsparse latent trace norm, duality, and novel formulations for lowrank tensor completion
We propose the following formulation for learning the lowrank tensor
(3) 
where is the learned tensor. It should be noted that the proposed regularizer in (3) employs the norm over . In contrast, the latent trace norm regularizer (2) has the norm over .
While the existing tensor completion approaches [KM16, GYK17, KSV14, LMWY13, TS13, SDLS14] mostly discuss a primal formulation similar to (1), we propose a novel dual framework for our analysis. The use of dual framework, e.g., in the matrix completion problem [XJ12, PTJY10, JM18], often leads to novel insights into the solution space of the primal problem.
We begin by discussing how to obtain the dual formulation of (3). Later, we explain how the insights from the dual framework motivate us to propose two novel fixedrank formulations for (3). As a first step, we exploit the following variational characterization of the trace norm studied in [AEP06, Theorem 4.1]. Given , the following result holds:
(4) 
where denotes the set of positive semidefinite matrices with unit trace, denotes the pseudoinverse of , , and is the inner product. The expression for optimal is [AEP06], and hence the ranks of and X are equal at optimality. Thus, (4) implicitly transfers the lowrank constraint on (due to trace norm) to an auxiliary variable . It is well known that positive semidefinite matrix with unit trace constraint implies the
norm constraint on the eigenvalues of
, leading to lowrankness of .Using the result (4) in (3) leads to auxiliary matrices, one corresponding to every (mode matrix unfolding of the tensor ). It should also be noted that are lowrank matrices. We now present the following theorem that states an equivalent minimax formulation of (3).
Theorem 1.
Proof.
From using the auxiliary s in (3), we obtain the following formulation:
(6) 
For deriving the dual, we introduce new variables which satisfy the constraints . We now introduce the dual variables corresponding to those additional constraints. The Lagrangian of (6) is given as:
(7) 
The dual function of (6) is defined as:
(8) 
By minimizing with respect to and , for all , we arrive at the following conditions:
(9) 
(10) 
where represents mode folding of a matrix P into a tensor.
Remark 1.
We next present the following result related to the form of the optimal solution of (3).
Corollary 1.
(Representer theorem) The optimal solution of the primal problem (3) admits a representation of the form: , where and .
Instead of solving the minimax problem (5) directly, we solve the following equivalent min optimization problem:
(13) 
where and is the convex function
(14) 
The problem formulation (13) allows to decouple the structures of the min and max problems in (5).
As discussed earlier in the section, the optimal is a lowrank positive semidefinite matrix for all . In spite of the lowrankness of the optimal solution, an algorithm for (5) need not produce intermediate iterates that are low rank. From the perspective of largescale applications, this observation as well as other computational efficiency concerns discussed below motivate to exploit a fixedrank parameterization of for all .
Fixedrank parameterization of
We propose to explicitly constrain the rank of to as follows:
(15) 
where and . In largescale tensor completion problems, it is common to set , where the fixedrank parameterization (15) of has a twofold advantage. First, the search space dimension of (13) drastically reduces from , which is quadratic in tensor dimensions, to , which is linear in tensor dimensions [JBAS10]. Second, enforcing the constraint costs , which is linear in tensor dimensions and is computationally much cheaper than enforcing that costs .
Fixedrank dual formulation
A first formulation is obtained by employing the parameterization (15) directly in the problems (13) and (14). We subsequently solve the resulting problem as a minimization problem as follows
(16) 
where and is the function
(17) 
It should be noted that though (16) is a nonconvex problem in , the optimization problem in (17) is strongly convex in for a given and has unique solution.
Fixedrank leastsquares formulation
The second formulation is motivated by the representer theorem (Corollary 1) and the fixedrank parameterization (15). However, instead of solving (3), we take a more practical approach of solving the following lowrank
(18) 
and is
(19) 
where and . It should be noted that the objective function in (18) does not have an explicit regularizer as in (3) to ensure nonsparse and lowrank . However, the regularizer is implicit since we employed the representer theorem and the fixedrank parameterization ( is common for all and ).
4 Optimization algorithms
In this section we discuss the optimization algorithms for (16) and (18). Both of these are of the type
(20) 
where is a smooth loss and is the constraint set. For Problem , and Problem , .
In order to propose numerically efficient algorithms for optimization over , we exploit the particular structure of the set , which is known as the spectrahedron manifold [JBAS10]. The spectrahedron manifold has the structure of a compact Riemannian quotient manifold [JBAS10]. Consequently, optimization on the spectrahedron manifold is handled in the Riemannian optimization framework. This allows to exploit the rotational invariance of the constraint naturally. The Riemannian manifold optimization framework embeds the constraint into the search space, thereby translating the constrained optimization problem into unconstrained optimization problem over the spectrahedron manifold. The Riemannian framework generalizes a number of classical first and secondorder (e.g., the conjugate gradient and trustregion algorithms) Euclidean algorithms to manifolds and provides concrete convergence guarantees [EAS98, AMS08, SKM17, ZRS16, SI13]. The work [AMS08] in particular shows a systematic way of implementing trustregion (TR) algorithms on quotient manifolds. A full list of optimizationrelated ingredients and their matrix characterizations for the spectrahedron manifold is in Section B. Overall, the constraint is endowed a Riemannian structure. Figure 2 depicts a general overview of a gradientbased algorithm on .
We implement the Riemannian TR (secondorder) algorithm for (20). To this end, we require the notions of the Riemannian gradient (the firstorder derivative of the objective function on the manifold), the Riemannian Hessian along a search direction (the covariant derivative of the Riemannian gradient along a tangential direction on the manifold), and the retraction operator which ensures that we always stay on the manifold (i.e., maintain strict feasibility). The Riemannian gradient and Hessian notions require computations of the standard (Euclidean) gradient and the directional derivative of this gradient along a given search direction denoted by , the expressions of which for (16) and 18) are shown in Lemma 1 and Lemma 2, respectively.
Lemma 1.
Let be the optimal solution of the convex problem (17) at . Let denote the gradient of at and denote the directional derivative of the gradient along . Let denote the directional derivative of along at . Then,
where and extracts the symmetric factor of a matrix, i.e., .
Input: , rank , regularization parameter , and tolerance . 
Initialize : . 
repeat 
1: Compute the gradient as given in Lemma 1 for Problem and Lemma 2 for Problem . 
2: Compute the search direction which minimizes the trustregion subproblem. It makes use 
of and its directional derivative presented in Lemma 1 and Lemma 2 for and , 
respectively. 
3: Update with the retraction step to maintain strict feasibility on . 
Specifically for the spectrahedron manifold, , 
where is the search direction. 
until . 
Output: 
Proof.
The gradient is computed by employing the Danskin’s theorem [Ber99, BS00]. The directional derivative of the gradient with respect to U
follows directly from the chain rule keeping
fixed (to the optimal solution of (17) for given U). ∎A key requirement in Lemma 1 is solving (17) for computationally efficiently for a given . It should be noted that (17) has a closedform sparse solution, which is equivalent to solving the linear system
(21) 
Solving the linear system (21) in a single step is computationally expensive (it involves the use of Kronecker products, vectorization of a sparse tensor, and a matrix inversion). Instead, we use an iterative solver that exploits the sparsity in the variable and the factorization form efficiently. Similarly, given and , can be computed by solving
(22) 
Lemma 2.
The gradient of in (19) at and their directional derivatives along , where and are given as follows.
where
(23) 
Proof.
The directional derivatives of the gradient with respect to U and follow directly from the chain rule. ∎
The Riemannian TR algorithm solves a Riemannian trustregion subproblem in every iteration [AMS08, Chapter 7]. The TR subproblem is a secondorder approximation of the objective function in a neighborhood, solution to which does not require inverting the full Hessian of the objective function. It makes use of the gradient and its directional derivative along a search direction. The TR subproblem is approximately solved with an iterative solver, e.g., the truncated conjugate gradient algorithm. The TR subproblem outputs a potential update candidate for , which is then accepted or rejected based on the amount of decrease in . Algorithm 1 summarizes the key steps of the TR algorithm for solving (20).
Remark 2: In contrast to Problem , where (17) is required to be solved for at a given , in Problem both and are updated simultaneously.
Computational and memory complexity
The periteration computational complexities of our Riemannian TR algorithms for both the formulations – (16) and (18) – are shown in Table 1. The computational complexity scales linearly with the number of known entries , denoted by .
In particular, the periteration computational cost depends on two sets of operations. First, on manifold related ingredients like the retraction operation. Second, the objective function related ingredients like the computation of the partial derivatives.
The manifold related operations cost for (16) and for (18). Specifically, the retraction on the spectrahedron manifold costs as it needs to project a matrix of size on to the set , which costs .
The following are the computational cost for the objective function related ingredients.

. It involves computation of matrix with mode unfolding of a sparse with nonzero entries. Such matrixmatrix operations are used in both (16) and (18).The computation of costs . It should be noted that although the dimension of is , only a maximum of columns have nonzero entries. We exploit this property of and have a compact memory storage of .

Computation of in (16) and its partial derivatives. The computation of relies on the solution of (21) and then explicitly computing the objective function in (17). This costs , where is the number of iterations to solve (21). The computation of requires the computation of terms like , and overall it costs . Given a search direction , the computation of costs .

Computation of in (18) and its partial derivatives. The computation of costs . Specifically, the computation of costs . The computations of and it directional derivatives require the computation of terms like , which costs . The same computational overhead is for and its directional derivative.
The memory cost for both algorithms is , where is linear in .
Convergence
The Riemannian TR algorithms come with rigorous convergence guarantees. [AMS08] discusses the rate of convergence analysis of manifold algorithms, which directly apply in our case. Specifically for trust regions, the global convergence to a firstorder critical point is discussed in [AMS08, Section 7.4.1] and the local convergence to local minima is discussed in [AMS08, Section 7.4.2]. From an implementation perspective, we follow the existing approaches [KSV14, KM16, GYK17] and upperlimit the number of TR iterations.
Numerical implementation
Our algorithms are implemented in the Manopt toolbox [BMAS14] in Matlab, which has offtheshelf generic TR implementation. For efficient computation of certain sparse matrixvector we make use of the mex support in Matlab. For solving (21) and (22), which are used in solving (16), we rely on the conjugate gradients solver of Matlab.
5 Experiments
In this section, we evaluate the generalization performance and efficiency of the proposed algorithms against stateoftheart in several tensor completion applications. We provide the experimental analysis of our proposed algorithms for the applications of video and hyperspectralimage completion, recommendation and link prediction. Table 2 provides the details of all the datasets we experiment with. We compare performance of the following trace norm based algorithms.

TRMM: our trustregion algorithm for the proposed formulation (16).

TRLS: our trustregion algorithm for the proposed formulation (18).

Latent^{1}^{1}1http://tomioka.dk/softwares/ [THK10]: a convex optimization based algorithm with latent trace norm.

Hard [SDLS14]: an algorithm based on a convex optimization framework with spectral regularization for tensor learning.

HaLRTC^{2}^{2}2http://www.cs.rochester.edu/u/jliu/code/TensorCompletion.zip [LMWY13]: an ADMM based algorithm for tensor completion.

FFW^{3}^{3}3http://home.cse.ust.hk/~qyaoaa/ [GYK17]: a FrankWolfe algorithm with scaled latent trace norm regularization.
We denote our algorithms for Problem (16) and Problem (18) as TRMM and TRLS, respectively. For both of our algorithms, we set for all , which implies that in is the only hyperparameter that needed to be tuned. We tune with fold crossvalidation on the train data of the split from the range .
Latent trace norm based algorithms such as FFW [GYK17] and Latent [THK10], and overlapped trace norm based algorithms such as HaLRTC [LMWY13] and Hard [SDLS14] are the closest to our approach. FFW is a recently proposed stateoftheart large scale tensor completion algorithm. Table 3 summarizes the baseline algorithms.
The proposed algorithms are implemented using Manopt toolbox [BMAS14] in Matlab. All the experiments are run on an Intel Xeon CPU with 32GB RAM and 2.40GHz processor.
Dataset  Dimensions  Task 

Ribeira  203 268 33  Image completion 
Tomato  242 320 167  Video completion 
Baboon  256 256 3  Image completion 
FB15k237  14541 14541 237  Link prediction 
YouTube (subset)  1509 1509 5  Link prediction 
YouTube (full)  15088 15088 5  Link prediction 
ML10M  71567 10681 731  Recommendation 
Algorithm  Modeling details 
Trace norm based algorithms  
FFW  Scaled latent trace norm (scaled norm) + 
Frank Wolfe optimization + basis size reduction  
Hard  Scaled overlapped trace norm + proximal gradient 
HaLRTC  Scaled overlapped trace norm + Alternating 
direction methods of multipliers (ADMM)  
Latent  Latent trace norm (norm) + ADMM 
Other algorithms  
Rprecon  Fixed multilinear rank + Riemannian CG 
with preconditioning  
BayesCP  Bayesian CP algorithm with rank tuning 
geomCG  Riemannian fixed multilinear rank CG algorithm 
Topt  Fixed multilinear rank CG algorithm 
Tsvd  Tensor tubalrank + ADMM 
Ribeira  Tomato  Baboon  MovieLens10M  

TRMM  
TRLS  
FFW  
Rprecon  
geomCG  
Hard    
Topt    
HaLRTC    
Latent    
Tsvd    
BayesCP   
5.1 Video and hyperspectralimage completion
We work with the following data sets for predicting missing values in multimedia data like videos and hyperspectral images.
Ribeira: A hyperspectral image data set^{4}^{4}4http://personalpages.manchester.ac.uk/staff/d.h.foster/Hyperspectral_images_of_natural_scenes_04.html [FNA04] of size , where each slice represents a particular image measured at a different wavelength. We resize this tensor to [SDLS14, KSV14, KM16]. Following [KM16], the incomplete tensor as training data is generated by randomly sampling of the entries and the evaluation (testing) of the learned tensor was done on another of the entries. The experimented is repeated times. Figure 3(a) shows the Ribeira image.
Tomato: A tomato video sequence data set^{5}^{5}5http://www.cs.rochester.edu/u/jliu/publications.html [LMWY13, CC14] of size . We follow the same experimental setup as for Ribeira data set discussed above.
Baboon: An RGB image ^{6}^{6}6https://github.com/qbzhao/BCPF of a baboon used in [ZZC15] which is modeled as a tensor. The experimental setup is same as that for Ribeira. Figure 3(b) shows the image.
Results. Table 4 reports the root mean squared error (RMSE) on the test set that is averaged over ten splits. Our algorithms, TRMM and TRLS, obtain the best results, outperforming other trace norm based algorithms on Ribeira, Tomato and Baboon data sets. Figure 4(a) shows the tradeoff between the test RMSE and the training time of all the algorithms on Ribeira. It can be observed that both our algorithms converge to the lowest RMSE at a significantly faster rate compared to the other baselines. It is evident from the results that learning a mixture of nonsparse tensors, as learned by the proposed algorithms, helps in achieving better performance compared to the algorithms that learn a skewed sparse mixture of tensors.
5.2 Link prediction
In this application, the task is to predict missing or new links in knowledge graphs, social networks, etc. We consider the following data sets.
FB15k237: FB15k237^{7}^{7}7http://kristinatoutanova.com/ [TCP15] is a subset of FB15k dataset [BUGD13], containing facts of the form subjectpredicateobject (RDF) triples from Freebase knowledge graph. The subject and object noun phrases are called entities and the predicates are the relationships among them. For instance, in the RDF triple {‘Obama’, ‘president of’, ‘United States’}, ‘Obama’ and ‘United States’ are entities, and ‘president of’ is a relationship between them. The task is to predict relationships (from a given set of relations) between a pair of entities. FB15k237 contains entities and relationships. It has observed relationships (links) between pairs of entities, which are positive samples. In addition, negative samples are generated following the procedure described in [BUGD13]. Negative samples are those {entity, relationship, entity} triples in which the relationship does not hold between entity and entity. We model this task as a tensor completion problem. implies that relationship exists between entity and entity (positive sample), and implies otherwise (negative sample). We keep of the observed entries for training and the remaining for testing.
YouTube: This is a link prediction data set^{8}^{8}8http://leitang.net/data/youtubedata.tar [TWL09] having 5 types of interactions between users. The task is to predict the interaction (from a given set of interactions) between a pair of users. We model it as a tensor completion problem. All the entries are known in this case. We randomly sample of the data for training [GYK17] and another for testing.
It should be noted that Hard, HaLRTC, and Latent do not scale to the full FB15k237 and YouTube data sets as they need to store full tensor in memory. Hence, we follow [GYK17] to create a subset of the YouTube data set of size in which 1509 users with most number of links are chosen. We randomly sample of the data for training and another for testing.
Results. All the above experiments are repeated on ten random traintest splits. Following [LZ11, GYK17], the generalization performance for link prediction task is measured by computing the area under the ROC curve on the test set (test AUC) for each algorithm. Table 5 report the average test AUC on YouTube (subset), Youtube (full) and FB15k237 data sets. The TRMM algorithm achieves the best performance in all the link prediction tasks, while TRLS achieves competitive performance. This shows that the nonsparse mixture of tensors learned by TRMM helps in achieving better performance. Figures 4 (b) & (c) plots the tradeoff between the test AUC and the training time for FB15k237 and YouTube, respectively. We can observe that TRMM is the fastest to converge to a good AUC and take only a few iterations.
We also conduct experiments to evaluate the performance of different algorithms in challenging scenarios when the amount of training data available is less. On the FB15k237 data set, we vary the size of training data from to of the observed entries, and the remaining of the observed entries is kept as the test set. Figure 4 (d) plots the performance of different algorithms on this experiment. We can observe that both our algorithms do significantly better than baselines in data scarce regimes.
YouTube (subset)  YouTube (full)  FB15k237  

TRMM  
TRLS  
FFW  
Rprecon  
geomCG  
Hard      
Topt      
HaLRTC      
Latent      
Tsvd  
BayesCP     
5.3 Movie recommendation
We also evaluate the algorithms on the MovieLens10M^{9}^{9}9http://files.grouplens.org/datasets/movielens/ml10mREADME.html data set [HK15]. This is a movie recommendation task — predict the ratings given to movies by various users. MovieLens10M contains ratings of movies given by users. Following [KM16], we split the time into 7days wide bins, forming a tensor of size . For our experiments, we generate random traintest splits, where of the observed entries is kept for training and remaining is used for testing. Table 4 reports the average test RMSE on this task. It can be observed that the proposed algorithms outperform stateoftheart latent and scaled latent trace norm based algorithms.
5.4 Results compared to other baseline algorithms
Dataset  TRMM/TRLS  Tuckerbased algorithms 

Ribeira  (5,5,5)  (15,15,6) 
Tomato  (10,10,10)  (15,15,15) 
Baboon  (4,4,3)  (4,4,3) 
FB15k237  (20,20,1)  (5,5,5) 
YouTube (subset)  (3,3,1)  (5,5,5) 
YouTube (full)  (3,3,1)  (5,5,5) 
ML10M  (20,10,1)  (4,4,4) 
In addition to the trace norm based algorithms, we also compare against the following tensor decomposition algorithms:

Rprecon^{10}^{10}10https://bamdevmishra.com/codes/tensorcompletion/ [KM16]: a Riemannian manifold preconditioning algorithm for the fixed multilinear rank lowrank tensor completion problem.

geomCG^{11}^{11}11http://anchp.epfl.ch/geomCG [KSV14]: a fixed multilinear rank lowrank tensor completion algorithm using Riemannian optimization on the manifold of fixed multilinear rank tensors.

Topt [FJ15]: a nonlinear conjugate gradient algorithm for Tucker decomposition.

Tsvd^{12}^{12}12http://www.ece.tufts.edu/~shuchin/software.html[ZEA14]: a tubal rank based tensor completion algorithm.

BayesCP^{13}^{13}13https://github.com/qbzhao/BCPF[ZZC15]:a Bayesian CP algorithm for tensor completion which has inbuilt rank tuning.
As can be observed from Tables 4 and 5, the proposed TRMM and TRLS achieve better results than the baseline algorithms. However for the movie recommendation task, Rprecon achieves better results. It should be noted that Topt, Tsvd, and BayesCP are not scalable for large scale datasets.
For all the datasets, we vary the rank between (3,3,3) and (25,25,25) for TRMM and TRLS algorithms. For Tucker based algorithms we vary the Tucker rank between (3,3,3) and (25,25,25). Table 6 shows the ranks at which the algorithms achieve best results with different datasets.
Algorithms based on the Tucker decomposition [FJ15, KM16, KSV14] model the tensor completion problem as a factorization problem. Tucker decomposition is a form of higher order PCA [KB09], it decomposes a tensor into a core tensor multiplied by a matrix along each mode. Given the in complete tensor , the Tucker decomposition based tensor completion problem is:
where the multilinear rank of the optimal solution is given by , which is a user provided parameter in practical settings. CP based algorithms like BayesCP [ZZC15] are a special case of Tucker based modeling with set to be super diagonal and , which is a user provided parameter in practical setting. In particular, [ZZC15] discusses a probabilistic model. Tsvd algorithm [ZEA14] models the tensor completion problem using the tensor singular value decomposition, and the rank here is the tensor tubal rank. Finally, algorithms based on latent norm regularization [GYK17, LMWY13, SDLS14, THK10] model the completion problem by approximating the input tensor as a sum of individual tensors (3). Each of these individual tensors is constrained to have a lowrank in one mode. The lowrank constraint is enforced on a different mode for each tensor. Due to the fundamental difference in modeling of these algorithms, the tensor ranks of the optimal solution obtained from algorithms that follow one approach cannot be compared with that of the algorithms that follow the other approach.
5.5 Results on outlier robustness
TRMM  TRLS  FFW  Rprecon  geomCG  Hard  Topt  HaLRTC  Latent  Tsvd  BayesCP  

Ribeira (RMSE)  
FB15k237 (AUC)  
Results on outlier robustness experiments. Our algorithm, TRMM, is more robust to outliers than the competing baselines. The symbol ‘
’ denotes the dataset is too large for the algorithm to generate result.We also evaluate TRMM and the baselines considered in Table 3 for outlier robustness on hyperspectal image completion and link prediction problems. In the Ribeira dataset, we add the standard Gaussian noise to randomly selected fraction of the entries in the training set. The minimum and the maximum value of entries in the (original) Ribeira are and , respectively. In FB15k237 dataset, we flip randomly selected fraction of the entries in the training set, i.e., the link is removed if present and viceversa. We experiment with and .
The results are reported in Table 7. We observe that our algorithm, TRMM, obtains the best generalization performance and, hence, is the most robust to outliers. We also observe that trace norm regularized algorithms (TRMM, FFW, Hard, HaLTRC, Latent) are relatively more robust to outliers than Tuckerdecomposition based algorithms (Rprecon, geomCG, Topt), CPdecomposition based algorithm (BayesCP), and tensor tubalrank based algorithm (Tsvd).
6 Conclusion
In this paper, we introduce a variant of trace norm regularizer for lowrank tensor completion problem which learns the tensor as a nonsparse combination of tensors. The number of tensors in this combination is equal to the number of modes (). Existing works [THK10, TS13, WST14, GYK17] learn a sparse combination of tensors, essentially learning the tensor as a lowrank matrix and losing higher order information in the available data. Hence, we recommend learning a nonsparse combination of tensors in trace norm regularized setting, especially since is typically a small integer in most realworld applications. In our experiments, we observe better generalization performance with the proposed regularization. Theoretically, we provide the following result on the reconstruction error in the context of recovering an unknown tensor from noisy observation.
Lemma 3.
Let be the true tensor to be recovered from observed , which is obtained as , where