 # Low-rank tensor completion: a Riemannian manifold preconditioning approach

We propose a novel Riemannian manifold preconditioning approach for the tensor completion problem with rank constraint. A novel Riemannian metric or inner product is proposed that exploits the least-squares structure of the cost function and takes into account the structured symmetry that exists in Tucker decomposition. The specific metric allows us to use the versatile framework of Riemannian optimization on quotient manifolds to develop preconditioned nonlinear conjugate gradient and stochastic gradient descent algorithms for batch and online setups, respectively. Concrete matrix representations of various optimization-related ingredients are listed. Numerical comparisons suggest that our proposed algorithms robustly outperform state-of-the-art algorithms across different synthetic and real-world datasets.


## 1 Introduction

This paper addresses the problem of low-rank tensor completion when the rank is a priori known or estimated. We focus on 3-order tensors in the paper, but the developments can be generalized to higher order tensors in a straightforward way. Given a tensor $\mathcal{X}^\star \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, whose entries $\mathcal{X}^\star_{i_1 i_2 i_3}$ are only known for some indices $(i_1, i_2, i_3) \in \Omega$, where $\Omega$ is a subset of the complete set of indices $\{1,\ldots,n_1\} \times \{1,\ldots,n_2\} \times \{1,\ldots,n_3\}$, the fixed-rank tensor completion problem is formulated as

$$\min_{\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}} \ \frac{1}{|\Omega|}\left\|\mathcal{P}_\Omega(\mathcal{X}) - \mathcal{P}_\Omega(\mathcal{X}^\star)\right\|_F^2 \quad \text{subject to} \quad \mathrm{rank}(\mathcal{X}) = \mathbf{r}, \tag{1}$$

where the operator $(\mathcal{P}_\Omega(\mathcal{X}))_{i_1 i_2 i_3} = \mathcal{X}_{i_1 i_2 i_3}$ if $(i_1, i_2, i_3) \in \Omega$ and $(\mathcal{P}_\Omega(\mathcal{X}))_{i_1 i_2 i_3} = 0$ otherwise, and (with a slight abuse of notation) $\|\cdot\|_F$ is the Frobenius norm. $|\Omega|$ is the number of known entries. $\mathrm{rank}(\mathcal{X}) = \mathbf{r} = (r_1, r_2, r_3)$, called the multilinear rank of $\mathcal{X}$, is the set of the ranks of each of the mode-$d$ unfolding matrices. $r_d \ll n_d$ enforces a low-rank structure. The mode-$d$ unfolding $\mathbf{X}_d$ is the matrix obtained by concatenating the mode-$d$ fibers along columns, and the mode-$d$ unfolding of a $3$-order tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is $\mathbf{X}_d \in \mathbb{R}^{n_d \times \prod_{d' \neq d} n_{d'}}$ for $d \in \{1, 2, 3\}$.
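As a quick illustration of the objective in (1), the following NumPy sketch evaluates the sampled least-squares cost; the function name `cost` and the encoding of $\Omega$ as a boolean mask are our own illustrative choices, not the paper's (Matlab) implementation.

```python
import numpy as np

def cost(X, Xstar, Omega):
    """Cost of (1): mean squared error over the observed entries.

    Omega is a boolean mask of the same shape as X (True = observed entry),
    a stand-in for the index set in the paper.
    """
    residual = np.where(Omega, X - Xstar, 0.0)   # P_Omega(X) - P_Omega(X*)
    return np.sum(residual**2) / np.count_nonzero(Omega)

# Toy instance: observe roughly half the entries of a random 4x5x6 tensor.
rng = np.random.default_rng(0)
Xstar = rng.standard_normal((4, 5, 6))
Omega = rng.random((4, 5, 6)) < 0.5
print(cost(Xstar, Xstar, Omega))  # 0.0 at the ground truth
```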

Problem (1) has many variants, and one of those is extending the nuclear norm regularization approach from the matrix case Candès & Recht (2009) to the tensor case. This results in a summation of nuclear norm regularization terms, each corresponding to one of the unfolding matrices of $\mathcal{X}$. While this generalization leads to good results Liu et al. (2013); Tomioka et al. (2011); Signoretto et al. (2014), its applicability to large-scale instances is not trivial, especially due to the necessity of high-dimensional singular value decomposition computations. A different approach exploits Tucker decomposition (Kolda & Bader, 2009, Section 4) of a low-rank tensor to develop large-scale algorithms for (1), e.g., in Filipović & Jukić (2013); Kressner et al. (2014).

The present paper exploits both the symmetry present in Tucker decomposition and the least-squares structure of the cost function of (1) to develop competitive algorithms. The multilinear rank constraint forms a smooth manifold (Kressner et al., 2014). To this end, we use the concept of manifold preconditioning. While preconditioning in unconstrained optimization is well studied (Nocedal & Wright, 2006, Chapter 5), preconditioning on constraints with symmetries, owing to the non-uniqueness of Tucker decomposition Kolda & Bader (2009), is not straightforward. We build upon the recent work Mishra & Sepulchre (2016), which suggests using preconditioning with a tailored metric (inner product) in the Riemannian optimization framework on quotient manifolds Absil et al. (2008); Edelman et al. (1998); Mishra & Sepulchre (2016). The differences with respect to the work of Kressner et al. (2014), which also exploits the manifold structure, are twofold. (i) Kressner et al. (2014) view the search space as an embedded submanifold of the Euclidean space, whereas we view it as a product of simpler search spaces with symmetries. Consequently, certain computations have a straightforward interpretation. (ii) Kressner et al. (2014) work with the standard Euclidean metric, whereas we use a metric that is tuned to the least-squares cost function, thereby inducing a preconditioning effect. This novel idea of using a tuned metric leads to a superior performance of our algorithms. Our algorithms also connect to state-of-the-art algorithms proposed in Ngo & Saad (2012); Wen et al. (2012); Mishra & Sepulchre (2014); Boumal & Absil (2015).

The paper is organized as follows. Section 2 discusses the two fundamental structures of symmetry and least-squares associated with (1) and proposes a novel metric that captures the relevant second order information of the problem. The optimization-related ingredients on the Tucker manifold are developed in Section 3. The cost function specific ingredients are developed in Section 4. The final formulas are listed in Table 1, which allow us to develop a preconditioned conjugate gradient descent algorithm in the batch setup and a stochastic gradient descent algorithm in the online setup. In Section 5, numerical comparisons with state-of-the-art algorithms on various synthetic and real-world benchmarks suggest a superior performance of our proposed algorithms. Our proposed algorithms are implemented in the Matlab toolbox Manopt Boumal et al. (2014). The concrete proofs of propositions, the development of optimization-related ingredients, and additional numerical experiments are given in Sections A and B of the supplementary material file. The Matlab codes for first and second order implementations, e.g., gradient descent and trust-region methods, are available at https://bamdevmishra.com/codes/tensorcompletion/.

## 2 Exploiting the problem structure

Construction of efficient algorithms depends on properly exploiting the problem structure. To this end, we focus on two fundamental structures in (1): symmetry in the constraints and the least-squares structure of the cost function. Finally, a novel metric is proposed.

The symmetry structure in Tucker decomposition. The Tucker decomposition of a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ of rank $\mathbf{r} = (r_1, r_2, r_3)$ is

$$\mathcal{X} = \mathcal{G} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \times_3 \mathbf{U}_3, \tag{2}$$

where $\mathbf{U}_d$ for $d \in \{1,2,3\}$ belongs to the Stiefel manifold $\mathrm{St}(r_d, n_d)$ of matrices of size $n_d \times r_d$ with orthonormal columns, and $\mathcal{G} \in \mathbb{R}^{r_1 \times r_2 \times r_3}$ Kolda & Bader (2009). Here, $\times_d$ computes the $d$-mode product of a tensor $\mathcal{G}$ and a matrix $\mathbf{U}_d$. Tucker decomposition (2) is not unique, as $\mathcal{X}$ remains unchanged under the transformation

$$(\mathbf{U}_1, \mathbf{U}_2, \mathbf{U}_3, \mathcal{G}) \mapsto (\mathbf{U}_1\mathbf{O}_1, \mathbf{U}_2\mathbf{O}_2, \mathbf{U}_3\mathbf{O}_3, \mathcal{G} \times_1 \mathbf{O}_1^T \times_2 \mathbf{O}_2^T \times_3 \mathbf{O}_3^T) \tag{3}$$

for all $\mathbf{O}_d \in \mathcal{O}(r_d)$, the set of orthogonal matrices of size $r_d \times r_d$. The classical remedy to remove this indeterminacy is to impose additional structures on $\mathcal{G}$, like sparsity or restricted orthogonal rotations (Kolda & Bader, 2009, Section 4.3). In contrast, we encode the transformation (3) in an abstract search space of equivalence classes, defined as

$$[(\mathbf{U}_1, \mathbf{U}_2, \mathbf{U}_3, \mathcal{G})] := \{(\mathbf{U}_1\mathbf{O}_1, \mathbf{U}_2\mathbf{O}_2, \mathbf{U}_3\mathbf{O}_3, \mathcal{G} \times_1 \mathbf{O}_1^T \times_2 \mathbf{O}_2^T \times_3 \mathbf{O}_3^T) : \mathbf{O}_d \in \mathcal{O}(r_d)\}. \tag{4}$$

The set of equivalence classes is the quotient manifold Lee (2003)

$$\mathcal{M}/\sim \ := \mathcal{M}/(\mathcal{O}(r_1) \times \mathcal{O}(r_2) \times \mathcal{O}(r_3)), \tag{5}$$

where $\mathcal{M}$ is called the total space (computational space), which is the product space

$$\mathcal{M} := \mathrm{St}(r_1, n_1) \times \mathrm{St}(r_2, n_2) \times \mathrm{St}(r_3, n_3) \times \mathbb{R}^{r_1 \times r_2 \times r_3}. \tag{6}$$

Due to the invariance (3), the local minima of (1) in $\mathcal{M}$ are not isolated, but they become isolated on $\mathcal{M}/\sim$. Consequently, the problem (1) is an optimization problem on a quotient manifold, for which systematic procedures are proposed in Absil et al. (2008); Edelman et al. (1998). A requirement is to endow $\mathcal{M}/\sim$ with a Riemannian structure, which conceptually translates (1) into an unconstrained optimization problem over the search space $\mathcal{M}/\sim$. We call $\mathcal{M}/\sim$, defined in (5), the Tucker manifold, as it results from Tucker decomposition.
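To make the symmetry concrete, the following NumPy sketch (illustrative only; the paper's implementation is in Matlab/Manopt, and the helper names are ours) builds a Tucker-form tensor via $d$-mode products and numerically confirms the invariance (3): rotating the factors by orthogonal matrices while counter-rotating the core leaves $\mathcal{X}$ unchanged, so both representatives belong to the same equivalence class (4).

```python
import numpy as np

def mode_product(G, U, d):
    """d-mode product G x_d U (modes are zero-based here, one-based in the paper):
    multiplies the matrix U into axis d of the tensor G."""
    return np.moveaxis(np.tensordot(U, G, axes=(1, d)), 0, d)

def tucker(G, Us):
    """Assemble X = G x_1 U1 x_2 U2 x_3 U3 from Tucker factors."""
    X = G
    for d in range(3):
        X = mode_product(X, Us[d], d)
    return X

rng = np.random.default_rng(2)
r, n = (2, 3, 2), (4, 5, 6)
G = rng.standard_normal(r)
# Points on the Stiefel manifolds St(r_d, n_d): orthonormal columns via QR.
Us = [np.linalg.qr(rng.standard_normal((n[d], r[d])))[0] for d in range(3)]
# Arbitrary orthogonal rotations O_d in O(r_d).
Os = [np.linalg.qr(rng.standard_normal((r[d], r[d])))[0] for d in range(3)]

# Transformed representative (3): (U1 O1, U2 O2, U3 O3, G x_1 O1^T x_2 O2^T x_3 O3^T).
Us2 = [Us[d] @ Os[d] for d in range(3)]
G2 = tucker(G, [O.T for O in Os])
print(np.allclose(tucker(G, Us), tucker(G2, Us2)))  # True: same tensor X
```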

The least-squares structure of the cost function. In unconstrained optimization, the Newton method is interpreted as a scaled steepest descent method, where the search space is endowed with a metric (inner product) induced by the Hessian of the cost function Nocedal & Wright (2006). This induced metric (or its approximation) resolves convergence issues of first order optimization algorithms. Analogously, finding a good inner product for (1) is of profound consequence. Specifically for the case of quadratic optimization with rank constraint (matrix case), Mishra and Sepulchre Mishra & Sepulchre (2016) propose a family of Riemannian metrics from the Hessian of the cost function. Applying this approach directly to the particular cost function of (1) is computationally costly. To circumvent the issue, we consider a simplified cost function by assuming that $\Omega$ contains the full set of indices, i.e., we focus on $\|\mathcal{X} - \mathcal{X}^\star\|_F^2$ to propose a metric candidate. Applying the metric tuning approach of Mishra & Sepulchre (2016) to the simplified cost function leads to a family of Riemannian metrics. A good trade-off between computational cost and simplicity is obtained by considering only the block diagonal elements of the Hessian of $\|\mathcal{X} - \mathcal{X}^\star\|_F^2$. It should be noted that the cost function is convex and quadratic in $\mathcal{X}$. Consequently, it is also convex and quadratic in the arguments $(\mathbf{U}_1, \mathbf{U}_2, \mathbf{U}_3, \mathcal{G})$ individually. Equivalently, the block diagonal approximation of the Hessian of $\|\mathcal{X} - \mathcal{X}^\star\|_F^2$ in $(\mathbf{U}_1, \mathbf{U}_2, \mathbf{U}_3, \mathcal{G})$ is

$$\left((\mathbf{G}_1\mathbf{G}_1^T) \otimes \mathbf{I}_{n_1}, \ (\mathbf{G}_2\mathbf{G}_2^T) \otimes \mathbf{I}_{n_2}, \ (\mathbf{G}_3\mathbf{G}_3^T) \otimes \mathbf{I}_{n_3}, \ \mathbf{I}_{r_1 r_2 r_3}\right), \tag{7}$$

where $\mathbf{G}_d$ is the mode-$d$ unfolding of $\mathcal{G}$ and is assumed to be full rank. $\otimes$ is the Kronecker product. The terms $\mathbf{G}_d\mathbf{G}_d^T$ for $d \in \{1,2,3\}$ are positive definite when $r_1 \leq r_2 r_3$, $r_2 \leq r_1 r_3$, and $r_3 \leq r_1 r_2$, which is a reasonable assumption.
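A small numerical check of this assumption can be sketched in NumPy (our own illustration, not the paper's code): for a random core whose rank tuple satisfies $r_d \leq \prod_{d' \neq d} r_{d'}$, each block $\mathbf{G}_d\mathbf{G}_d^T$ is positive definite.

```python
import numpy as np

def unfold(T, d):
    """Mode-d unfolding: mode-d fibers become the columns. The column ordering
    convention may differ from Kolda & Bader, which does not affect G_d G_d^T,
    since permuting columns of G_d leaves G_d G_d^T unchanged."""
    return np.moveaxis(T, d, 0).reshape(T.shape[d], -1)

rng = np.random.default_rng(6)
G = rng.standard_normal((2, 3, 2))  # r = (2, 3, 2) satisfies r_d <= product of the others
for d in range(3):
    Gd = unfold(G, d)
    M = Gd @ Gd.T                   # the G_d G_d^T factor of the d-th Hessian block in (7)
    print(d, np.all(np.linalg.eigvalsh(M) > 0))  # True: positive definite
```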

A novel Riemannian metric. An element $x$ in the total space $\mathcal{M}$ has the matrix representation $(\mathbf{U}_1, \mathbf{U}_2, \mathbf{U}_3, \mathcal{G})$. Consequently, the tangent space $T_x\mathcal{M}$ is the Cartesian product of the tangent spaces of the individual manifolds of (6), i.e., $T_x\mathcal{M}$ has the matrix characterization Edelman et al. (1998)

$$T_x\mathcal{M} = \{(\mathbf{Z}_{\mathbf{U}_1}, \mathbf{Z}_{\mathbf{U}_2}, \mathbf{Z}_{\mathbf{U}_3}, \mathbf{Z}_{\mathcal{G}}) \in \mathbb{R}^{n_1 \times r_1} \times \mathbb{R}^{n_2 \times r_2} \times \mathbb{R}^{n_3 \times r_3} \times \mathbb{R}^{r_1 \times r_2 \times r_3} : \mathbf{U}_d^T\mathbf{Z}_{\mathbf{U}_d} + \mathbf{Z}_{\mathbf{U}_d}^T\mathbf{U}_d = 0, \text{ for } d \in \{1,2,3\}\}. \tag{8}$$

From the earlier discussion on symmetry and least-squares structure, we propose the novel metric or inner product

$$g_x(\xi_x, \eta_x) = \langle \xi_{\mathbf{U}_1}, \eta_{\mathbf{U}_1}(\mathbf{G}_1\mathbf{G}_1^T) \rangle + \langle \xi_{\mathbf{U}_2}, \eta_{\mathbf{U}_2}(\mathbf{G}_2\mathbf{G}_2^T) \rangle + \langle \xi_{\mathbf{U}_3}, \eta_{\mathbf{U}_3}(\mathbf{G}_3\mathbf{G}_3^T) \rangle + \langle \xi_{\mathcal{G}}, \eta_{\mathcal{G}} \rangle, \tag{9}$$

where $\xi_x, \eta_x \in T_x\mathcal{M}$ are tangent vectors with matrix characterizations $(\xi_{\mathbf{U}_1}, \xi_{\mathbf{U}_2}, \xi_{\mathbf{U}_3}, \xi_{\mathcal{G}})$ and $(\eta_{\mathbf{U}_1}, \eta_{\mathbf{U}_2}, \eta_{\mathbf{U}_3}, \eta_{\mathcal{G}})$, respectively, shown in (8), and $\langle \cdot, \cdot \rangle$ is the Euclidean inner product. It should be emphasized that the proposed metric (9) is induced from (7).
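The metric (9) is straightforward to evaluate from the matrix characterizations. The following NumPy sketch (illustrative; the tuple encoding of $x$ and the helper names are our assumptions) computes it and checks that the weighting by the symmetric matrices $\mathbf{G}_d\mathbf{G}_d^T$ keeps the metric symmetric in its two arguments.

```python
import numpy as np

def unfold(T, d):
    """Mode-d unfolding: mode-d fibers become the columns."""
    return np.moveaxis(T, d, 0).reshape(T.shape[d], -1)

def metric(x, xi, eta):
    """The proposed metric (9) on the total space. x = (U1, U2, U3, G);
    xi and eta are tangent vectors with the same component shapes. The U_d
    components are weighted by G_d G_d^T; the core component carries the
    Euclidean inner product."""
    G = x[3]
    g = np.sum(xi[3] * eta[3])                       # <xi_G, eta_G>
    for d in range(3):
        Gd = unfold(G, d)
        g += np.sum(xi[d] * (eta[d] @ (Gd @ Gd.T)))  # <xi_Ud, eta_Ud (G_d G_d^T)>
    return g

rng = np.random.default_rng(3)
r, n = (2, 3, 2), (4, 5, 6)
x = tuple(rng.standard_normal((n[d], r[d])) for d in range(3)) + (rng.standard_normal(r),)
xi = tuple(rng.standard_normal(a.shape) for a in x)
eta = tuple(rng.standard_normal(a.shape) for a in x)
print(np.isclose(metric(x, xi, eta), metric(x, eta, xi)))  # True: g is symmetric
```

For a full-rank core, each weight $\mathbf{G}_d\mathbf{G}_d^T$ is positive definite, so $g_x$ is a genuine inner product.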

###### Proposition 1.

Let $\xi_x$ and $\eta_x$ be tangent vectors to the quotient manifold (5) at $x = (\mathbf{U}_1, \mathbf{U}_2, \mathbf{U}_3, \mathcal{G})$, and let the correspondingly rotated vectors be tangent vectors at $(\mathbf{U}_1\mathbf{O}_1, \mathbf{U}_2\mathbf{O}_2, \mathbf{U}_3\mathbf{O}_3, \mathcal{G} \times_1 \mathbf{O}_1^T \times_2 \mathbf{O}_2^T \times_3 \mathbf{O}_3^T)$. The metric (9) is invariant along the equivalence class (4), i.e.,

$$g_{(\mathbf{U}_1, \mathbf{U}_2, \mathbf{U}_3, \mathcal{G})}\big((\xi_{\mathbf{U}_1}, \xi_{\mathbf{U}_2}, \xi_{\mathbf{U}_3}, \xi_{\mathcal{G}}), (\eta_{\mathbf{U}_1}, \eta_{\mathbf{U}_2}, \eta_{\mathbf{U}_3}, \eta_{\mathcal{G}})\big) = g_{(\mathbf{U}_1\mathbf{O}_1, \mathbf{U}_2\mathbf{O}_2, \mathbf{U}_3\mathbf{O}_3, \mathcal{G} \times_1 \mathbf{O}_1^T \times_2 \mathbf{O}_2^T \times_3 \mathbf{O}_3^T)}\big((\xi_{\mathbf{U}_1}\mathbf{O}_1, \xi_{\mathbf{U}_2}\mathbf{O}_2, \xi_{\mathbf{U}_3}\mathbf{O}_3, \xi_{\mathcal{G}} \times_1 \mathbf{O}_1^T \times_2 \mathbf{O}_2^T \times_3 \mathbf{O}_3^T), (\eta_{\mathbf{U}_1}\mathbf{O}_1, \eta_{\mathbf{U}_2}\mathbf{O}_2, \eta_{\mathbf{U}_3}\mathbf{O}_3, \eta_{\mathcal{G}} \times_1 \mathbf{O}_1^T \times_2 \mathbf{O}_2^T \times_3 \mathbf{O}_3^T)\big).$$

## 3 Notions of manifold optimization

Figure 1: Riemannian optimization framework: geometric objects, shown in dotted lines, on the quotient manifold $\mathcal{M}/\sim$ call for matrix representatives, shown in solid lines, in the total space $\mathcal{M}$.

Each point on a quotient manifold represents an entire equivalence class of matrices in the total space. Abstract geometric objects on the quotient manifold $\mathcal{M}/\sim$ call for matrix representatives in the total space $\mathcal{M}$. Similarly, algorithms are run in the total space $\mathcal{M}$, but under appropriate compatibility between the Riemannian structure of $\mathcal{M}$ and the Riemannian structure of the quotient manifold $\mathcal{M}/\sim$, they define algorithms on the quotient manifold. The key is endowing $\mathcal{M}/\sim$ with a Riemannian structure. Once this is the case, a constrained optimization problem, for example (1), is conceptually transformed into an unconstrained optimization over the Riemannian quotient manifold (5). Below, we briefly show the development of various geometric objects that are required to optimize a smooth cost function on the quotient manifold (5) with first order methods, e.g., conjugate gradients.

Quotient manifold representation and horizontal lifts. Figure 1 illustrates a schematic view of optimization with equivalence classes, where the points $x$ and $y$ in $\mathcal{M}$ belong to the same equivalence class (shown in solid blue color) and represent a single point $[x]$ on the quotient manifold $\mathcal{M}/\sim$. The abstract tangent space $T_{[x]}(\mathcal{M}/\sim)$ at $[x]$ has the matrix representation in $T_x\mathcal{M}$, but restricted to the directions that do not induce a displacement along the equivalence class $[x]$. This is realized by decomposing $T_x\mathcal{M}$ into two complementary subspaces, the vertical and horizontal subspaces. The vertical space $\mathcal{V}_x$ is the tangent space of the equivalence class $[x]$. On the other hand, the horizontal space $\mathcal{H}_x$ is the subspace orthogonal to $\mathcal{V}_x$ in the sense of the metric (9). Equivalently, $T_x\mathcal{M} = \mathcal{V}_x \oplus \mathcal{H}_x$. The horizontal subspace provides a valid matrix representation of the abstract tangent space $T_{[x]}(\mathcal{M}/\sim)$. An abstract tangent vector $\xi_{[x]} \in T_{[x]}(\mathcal{M}/\sim)$ has a unique element $\xi_x \in \mathcal{H}_x$ that is called its horizontal lift.

A Riemannian metric $g_x$ at $x \in \mathcal{M}$ defines a Riemannian metric $g_{[x]}$, i.e., $g_{[x]}(\xi_{[x]}, \eta_{[x]}) := g_x(\xi_x, \eta_x)$, on the quotient manifold $\mathcal{M}/\sim$ if $g_x(\xi_x, \eta_x)$ does not depend on a specific representation along the equivalence class $[x]$. Here, $\xi_{[x]}$ and $\eta_{[x]}$ are tangent vectors in $T_{[x]}(\mathcal{M}/\sim)$, and $\xi_x$ and $\eta_x$ are their horizontal lifts in $\mathcal{H}_x$ at $x$, respectively. Equivalently, the definition of the Riemannian metric is well posed when $g_x(\xi_x, \eta_x) = g_y(\xi_y, \eta_y)$ for all $x, y \in [x]$, where $\xi_x, \eta_x \in \mathcal{H}_x$ and $\xi_y, \eta_y \in \mathcal{H}_y$ are the horizontal lifts of $\xi_{[x]}, \eta_{[x]}$ along the same equivalence class $[x]$. This holds true for the proposed metric (9), as shown in Proposition 1. From Absil et al. (2008), endowed with the Riemannian metric (9), the quotient manifold $\mathcal{M}/\sim$ is a Riemannian submersion of $\mathcal{M}$. The submersion principle allows us to work out concrete matrix representations of abstract objects on $\mathcal{M}/\sim$, e.g., the gradient of a smooth cost function Absil et al. (2008).

Starting from an arbitrary matrix tuple (with appropriate dimensions), two linear projections are needed: the first projection $\Psi_x$ is onto the tangent space $T_x\mathcal{M}$, while the second projection $\Pi_x$ is onto the horizontal subspace $\mathcal{H}_x$. Both projections admit efficient computation.

The tangent space projection is obtained by extracting the component normal to $T_x\mathcal{M}$ in the ambient space. With respect to the metric (9), the normal space has the matrix characterization $\{(\mathbf{U}_1\mathbf{S}_{\mathbf{U}_1}(\mathbf{G}_1\mathbf{G}_1^T)^{-1}, \mathbf{U}_2\mathbf{S}_{\mathbf{U}_2}(\mathbf{G}_2\mathbf{G}_2^T)^{-1}, \mathbf{U}_3\mathbf{S}_{\mathbf{U}_3}(\mathbf{G}_3\mathbf{G}_3^T)^{-1}, 0)\}$. Symmetric matrices $\mathbf{S}_{\mathbf{U}_d}$ for $d \in \{1,2,3\}$ parameterize the normal space. Finally, the operator $\Psi_x$ is given as follows.

###### Proposition 2.

The quotient manifold (5) endowed with the metric (9) admits the tangent space projector defined as

$$\Psi_x(\mathbf{Y}_{\mathbf{U}_1}, \mathbf{Y}_{\mathbf{U}_2}, \mathbf{Y}_{\mathbf{U}_3}, \mathbf{Y}_{\mathcal{G}}) = (\mathbf{Y}_{\mathbf{U}_1} - \mathbf{U}_1\mathbf{S}_{\mathbf{U}_1}(\mathbf{G}_1\mathbf{G}_1^T)^{-1}, \ \mathbf{Y}_{\mathbf{U}_2} - \mathbf{U}_2\mathbf{S}_{\mathbf{U}_2}(\mathbf{G}_2\mathbf{G}_2^T)^{-1}, \ \mathbf{Y}_{\mathbf{U}_3} - \mathbf{U}_3\mathbf{S}_{\mathbf{U}_3}(\mathbf{G}_3\mathbf{G}_3^T)^{-1}, \ \mathbf{Y}_{\mathcal{G}}), \tag{10}$$

where $\mathbf{S}_{\mathbf{U}_d}$ is the solution to the Lyapunov equation $\mathbf{G}_d\mathbf{G}_d^T\mathbf{S}_{\mathbf{U}_d} + \mathbf{S}_{\mathbf{U}_d}\mathbf{G}_d\mathbf{G}_d^T = \mathbf{G}_d\mathbf{G}_d^T(\mathbf{U}_d^T\mathbf{Y}_{\mathbf{U}_d} + \mathbf{Y}_{\mathbf{U}_d}^T\mathbf{U}_d)\mathbf{G}_d\mathbf{G}_d^T$ for $d \in \{1,2,3\}$.

The Lyapunov equations in Proposition 2 are solved efficiently with Matlab's lyap routine.
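As an illustrative Python analogue of Proposition 2 (our own sketch, not the paper's Matlab code), the Lyapunov equation can be handled with SciPy's `solve_sylvester`, since $\mathbf{M}\mathbf{S} + \mathbf{S}\mathbf{M} = \mathbf{Q}$ is a Sylvester equation with equal coefficient matrices; the projected result verifiably satisfies the tangency condition in (8).

```python
import numpy as np
from scipy.linalg import solve_sylvester

def unfold(T, d):
    """Mode-d unfolding: mode-d fibers become the columns."""
    return np.moveaxis(T, d, 0).reshape(T.shape[d], -1)

def project_tangent(x, Y):
    """Tangent space projector of Proposition 2. x = (U1, U2, U3, G);
    Y is an arbitrary ambient tuple of the same component shapes."""
    U, G = x[:3], x[3]
    Z = []
    for d in range(3):
        Gd = unfold(G, d)
        M = Gd @ Gd.T                                 # G_d G_d^T
        # Lyapunov equation: M S + S M = M (U^T Y + Y^T U) M.
        S = solve_sylvester(M, M, M @ (U[d].T @ Y[d] + Y[d].T @ U[d]) @ M)
        Z.append(Y[d] - U[d] @ S @ np.linalg.inv(M))  # subtract the normal component
    return tuple(Z) + (Y[3],)                          # the core component is untouched

rng = np.random.default_rng(4)
r, n = (2, 3, 2), (4, 5, 6)
U = [np.linalg.qr(rng.standard_normal((n[d], r[d])))[0] for d in range(3)]
x = (*U, rng.standard_normal(r))
Y = tuple(rng.standard_normal(a.shape) for a in x)
Z = project_tangent(x, Y)
# Tangency condition from (8): U_d^T Z_d + Z_d^T U_d = 0.
print(all(np.allclose(U[d].T @ Z[d] + Z[d].T @ U[d], 0) for d in range(3)))  # True
```

Since the right-hand side is symmetric, the unique solution $\mathbf{S}$ is symmetric, which is exactly what the tangency condition requires.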

The horizontal space projection of a tangent vector is obtained by removing the component along the vertical space. The vertical space $\mathcal{V}_x$ has the matrix characterization $\{(\mathbf{U}_1\boldsymbol{\Omega}_1, \mathbf{U}_2\boldsymbol{\Omega}_2, \mathbf{U}_3\boldsymbol{\Omega}_3, -(\mathcal{G} \times_1 \boldsymbol{\Omega}_1 + \mathcal{G} \times_2 \boldsymbol{\Omega}_2 + \mathcal{G} \times_3 \boldsymbol{\Omega}_3)) : \boldsymbol{\Omega}_d^T = -\boldsymbol{\Omega}_d\}$, obtained by differentiating the transformation (3). Skew-symmetric matrices $\boldsymbol{\Omega}_d \in \mathbb{R}^{r_d \times r_d}$ for $d \in \{1,2,3\}$ parameterize the vertical space. Finally, the horizontal projection operator is given as follows.

###### Proposition 3.

The quotient manifold (5) endowed with the metric (9) admits the horizontal projector defined as

$$\Pi_x(\eta_x) = (\eta_{\mathbf{U}_1} - \mathbf{U}_1\boldsymbol{\Omega}_1, \ \eta_{\mathbf{U}_2} - \mathbf{U}_2\boldsymbol{\Omega}_2, \ \eta_{\mathbf{U}_3} - \mathbf{U}_3\boldsymbol{\Omega}_3, \ \eta_{\mathcal{G}} + \mathcal{G} \times_1 \boldsymbol{\Omega}_1 + \mathcal{G} \times_2 \boldsymbol{\Omega}_2 + \mathcal{G} \times_3 \boldsymbol{\Omega}_3),$$

where $\eta_x = (\eta_{\mathbf{U}_1}, \eta_{\mathbf{U}_2}, \eta_{\mathbf{U}_3}, \eta_{\mathcal{G}}) \in T_x\mathcal{M}$ and $\boldsymbol{\Omega}_d$ is a skew-symmetric matrix of size $r_d \times r_d$ that is the solution to the coupled Lyapunov equations

$$\begin{cases} \mathbf{G}_1\mathbf{G}_1^T\boldsymbol{\Omega}_1 + \boldsymbol{\Omega}_1\mathbf{G}_1\mathbf{G}_1^T - \mathbf{G}_1(\mathbf{I}_{r_3} \otimes \boldsymbol{\Omega}_2)\mathbf{G}_1^T - \mathbf{G}_1(\boldsymbol{\Omega}_3 \otimes \mathbf{I}_{r_2})\mathbf{G}_1^T = \mathrm{Skew}(\mathbf{U}_1^T\eta_{\mathbf{U}_1}\mathbf{G}_1\mathbf{G}_1^T) + \mathrm{Skew}(\mathbf{G}_1\eta_{\mathbf{G}_1}^T), \\ \mathbf{G}_2\mathbf{G}_2^T\boldsymbol{\Omega}_2 + \boldsymbol{\Omega}_2\mathbf{G}_2\mathbf{G}_2^T - \mathbf{G}_2(\mathbf{I}_{r_3} \otimes \boldsymbol{\Omega}_1)\mathbf{G}_2^T - \mathbf{G}_2(\boldsymbol{\Omega}_3 \otimes \mathbf{I}_{r_1})\mathbf{G}_2^T = \mathrm{Skew}(\mathbf{U}_2^T\eta_{\mathbf{U}_2}\mathbf{G}_2\mathbf{G}_2^T) + \mathrm{Skew}(\mathbf{G}_2\eta_{\mathbf{G}_2}^T), \\ \mathbf{G}_3\mathbf{G}_3^T\boldsymbol{\Omega}_3 + \boldsymbol{\Omega}_3\mathbf{G}_3\mathbf{G}_3^T - \mathbf{G}_3(\mathbf{I}_{r_2} \otimes \boldsymbol{\Omega}_1)\mathbf{G}_3^T - \mathbf{G}_3(\boldsymbol{\Omega}_2 \otimes \mathbf{I}_{r_1})\mathbf{G}_3^T = \mathrm{Skew}(\mathbf{U}_3^T\eta_{\mathbf{U}_3}\mathbf{G}_3\mathbf{G}_3^T) + \mathrm{Skew}(\mathbf{G}_3\eta_{\mathbf{G}_3}^T), \end{cases} \tag{11}$$

where $\eta_{\mathbf{G}_d}$ is the mode-$d$ unfolding of $\eta_{\mathcal{G}}$ and $\mathrm{Skew}(\cdot)$ extracts the skew-symmetric part of a square matrix, i.e., $\mathrm{Skew}(\mathbf{D}) = (\mathbf{D} - \mathbf{D}^T)/2$.

The coupled Lyapunov equations (11) are solved efficiently with Matlab's pcg routine, combined with a specific symmetric preconditioner resulting from the Gauss–Seidel approximation of (11). For each variable $\boldsymbol{\Omega}_d$, the preconditioner solves the $d$-th equation of (11) for $\boldsymbol{\Omega}_d$ while the other variables are held fixed.

Retraction. A retraction is a mapping $R_x : \mathcal{H}_x \to \mathcal{M}$ that maps vectors in the horizontal space to points on the search space and satisfies the local rigidity condition Absil et al. (2008). It provides a natural way to move on the manifold along a search direction. Because the total space $\mathcal{M}$ has the product nature, we can choose a retraction by combining retractions on the individual manifolds, i.e., $R_x(\xi_x) = (\mathrm{uf}(\mathbf{U}_1 + \xi_{\mathbf{U}_1}), \mathrm{uf}(\mathbf{U}_2 + \xi_{\mathbf{U}_2}), \mathrm{uf}(\mathbf{U}_3 + \xi_{\mathbf{U}_3}), \mathcal{G} + \xi_{\mathcal{G}})$, where $\xi_x \in \mathcal{H}_x$ and $\mathrm{uf}(\cdot)$ extracts the orthogonal factor of a full column rank matrix, i.e., $\mathrm{uf}(\mathbf{A}) = \mathbf{A}(\mathbf{A}^T\mathbf{A})^{-1/2}$. The retraction $R_x$ defines a retraction $R_{[x]}(\xi_{[x]}) := [R_x(\xi_x)]$ on the quotient manifold $\mathcal{M}/\sim$, as the equivalence class $[R_x(\xi_x)]$ does not depend on specific matrix representations of $x$ and $\xi_x$, where $\xi_x$ is the horizontal lift of the abstract tangent vector $\xi_{[x]}$.
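The orthogonal-factor retraction can be sketched in NumPy as follows (our illustration; the polar factor $\mathbf{A}(\mathbf{A}^T\mathbf{A})^{-1/2}$ is computed stably via the thin SVD rather than by forming $\mathbf{A}^T\mathbf{A}$).

```python
import numpy as np

def uf(A):
    """Orthogonal factor of a full-column-rank matrix: uf(A) = A (A^T A)^{-1/2}.
    With the thin SVD A = W diag(s) V^T, this equals W V^T."""
    W, _, Vt = np.linalg.svd(A, full_matrices=False)
    return W @ Vt

def retract(x, xi):
    """Retraction on the total space: orthogonal-factor retraction on each
    Stiefel factor, plain addition on the Euclidean core factor."""
    return tuple(uf(x[d] + xi[d]) for d in range(3)) + (x[3] + xi[3],)

rng = np.random.default_rng(5)
r, n = (2, 3, 2), (4, 5, 6)
x = tuple(np.linalg.qr(rng.standard_normal((n[d], r[d])))[0] for d in range(3)) \
    + (rng.standard_normal(r),)
xi = tuple(0.1 * rng.standard_normal(a.shape) for a in x)
y = retract(x, xi)
# The updated factors stay on the Stiefel manifolds: U^T U = I.
print(all(np.allclose(y[d].T @ y[d], np.eye(r[d])) for d in range(3)))  # True
```

Note that $R_x(0) = x$, consistent with the local rigidity condition: $\mathrm{uf}(\mathbf{U}) = \mathbf{U}$ whenever $\mathbf{U}$ already has orthonormal columns.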

Vector transport. A vector transport on a manifold $\mathcal{M}$ is a smooth mapping that transports a tangent vector $\xi_x$ at $x \in \mathcal{M}$ to a vector in the tangent space at a point $R_x(\eta_x)$. It is denoted by $\mathcal{T}_{\eta_x}\xi_x$. It generalizes the classical concept of translation of vectors in the Euclidean space to manifolds (Absil et al., 2008, Section 8.1.4). The horizontal lift of the abstract vector transport on $\mathcal{M}/\sim$ has the matrix characterization $\Pi_{R_x(\eta_x)}(\Psi_{R_x(\eta_x)}(\xi_x))$, where $\xi_x$ and $\eta_x$ are the horizontal lifts in $\mathcal{H}_x$ of $\xi_{[x]}$ and $\eta_{[x]}$ that belong to $T_{[x]}(\mathcal{M}/\sim)$. $\Psi$ and $\Pi$ are the projectors defined in Propositions 2 and 3. The computational cost of transporting a vector solely depends on the projection and retraction operations.