# Polar decomposition based algorithms on the product of Stiefel manifolds with applications in tensor approximation

In this paper, based on the matrix polar decomposition, we propose a general algorithmic framework to solve a class of optimization problems on the product of Stiefel manifolds. We establish the weak convergence and global convergence of this general algorithmic approach based on the Łojasiewicz gradient inequality. This general algorithm and its convergence results are applied to the best rank-1 approximation, low rank orthogonal approximation and low multilinear rank approximation for higher order tensors. We also present a symmetric variant of this general algorithm to solve a symmetric variant of this class of optimization models, which essentially optimizes over a single Stiefel manifold. We establish its weak convergence and global convergence in a similar way. This symmetric variant and its convergence results are applied to the best symmetric rank-1 approximation and low rank symmetric orthogonal approximation for higher order tensors. It turns out that well-known algorithms such as HOPM, S-HOPM, LROAT, S-LROAT are all special cases of this general algorithmic framework and its symmetric variant.

## 1. Introduction

Theory and algorithms for optimization over manifolds have been developed and applied widely because of their practical relevance; see [3]. In particular, methods such as Newton-type and trust-region algorithms have been proposed for optimization over Stiefel manifolds, which encode orthogonality constraints [1, 3, 4, 18].

Let St(r,n) def= {U ∈ R^{n×r} : UTU = I_r} be the Stiefel manifold with r ≤ n. In this paper, we mainly study the optimization problem on the product of Stiefel manifolds, which is to maximize a smooth function

 f: Ω⟶R+, (1)

where

 Ωdef=St(r1,n1)×St(r2,n2)×⋯×St(rd,nd)

is the product of Stiefel manifolds (ri ≤ ni for 1 ≤ i ≤ d). Let ω = (U(1),U(2),⋯,U(d)) ∈ Ω. If ni = n and ri = r for 1 ≤ i ≤ d, then Ω becomes

 Ωsdef=St(r,n)×St(r,n)×⋯×St(r,n).

Assume that the cost function (1) is symmetric on Ωs, that is,

 f(ω)=f(π(ω)) (2)

for any ω ∈ Ωs and any permutation π of the d components of ω. In this paper, we also study the symmetric variant of problem (1), which is to maximize the smooth function

 g: St(r,n)⟶R+, U⟼f(U,U,⋯,U). (3)

Many low rank approximation problems for higher order tensors [8, 12, 29, 44], such as the best rank-1 approximation [15, 16, 17, 50], low rank orthogonal approximation [7, 28] and low multilinear rank approximation [15, 17, 23], can be formulated in the form of problem (1). These approximations have been widely used in various fields, including signal processing [12, 14, 20, 26, 38], numerical linear algebra [41, 42] and data analysis [5, 8, 29, 44]. In particular, the symmetric variant of the best rank-1 approximation corresponds to finding the largest Z-eigenvalue [42]. When the rank is equal to the dimension, the low rank orthogonal approximation boils down to the approximate orthogonal diagonalization [11, 34, 36, 37, 47] of symmetric tensors, which is central in Independent Component Analysis (ICA) [9, 10]. The low multilinear rank approximation is equivalent to the well-known Tucker decomposition [17, 29], which has been a popular method for data reduction in signal processing and machine learning. Therefore, it is desirable to develop a general algorithmic scheme to solve problem (1) and its symmetric version (3).

Contribution. In this paper, based on the matrix polar decomposition, we propose an approach called APDOI (alternating polar decomposition based orthogonal iteration) (Algorithm 1) to solve problem (1). Then we establish its weak convergence (every accumulation point is a critical point) and global convergence (for any starting point, the iterates converge as a whole sequence) based on the Łojasiewicz gradient inequality. We apply APDOI, together with its convergence results, to the best rank-1 approximation (Section 6.1), the low rank orthogonal approximation (Section 7.1) and the low multilinear rank approximation (Section 8) for higher order tensors. It turns out that the well-known methods HOPM [16, 17] (for best rank-1 approximation) and LROAT [7] (for low rank orthogonal approximation) are both special cases of APDOI. Our convergence results subsume those found in the literature for these special cases.

Algorithm APDOI specialized to the low multilinear rank approximation will be called LMPD (low multilinear rank approximation based on polar decomposition) in this paper. For this particular algorithm, we propose a shifted variant, LMPD-S, which has a better convergence property. We then conduct experiments showing that LMPD and LMPD-S are comparable in speed with the well-known HOOI [17] algorithm, and that LMPD-S has markedly better convergence behavior.

As the symmetric variant of APDOI, we propose the PDOI (polar decomposition based orthogonal iteration) approach (Algorithm 2) to solve problem (3), and establish its weak convergence and global convergence in a similar way. Algorithm PDOI and its convergence results are then applied to the best symmetric rank-1 approximation (Section 6.2) and the low rank symmetric orthogonal approximation (Section 7.2) for higher order tensors. It turns out that the well-known algorithms S-HOPM [17] (for best symmetric rank-1 approximation) and S-LROAT [7] (for low rank symmetric orthogonal approximation) are both special cases of PDOI. Our convergence results again subsume those found in the literature for these special cases.

Organization. The paper is organized as follows. In Section 2, we introduce notations and preliminaries. In Section 3, we state the condition under which problems (1) and (3) will be studied in this paper, and prove that this condition is satisfied by the best rank-1 approximation, low rank orthogonal approximation and low multilinear rank approximation for higher order tensors. In Section 4, we propose Algorithm APDOI to solve problem (1), and establish its convergence. In Section 5, as a symmetric variant of APDOI, we propose Algorithm PDOI to solve problem (3), and establish its convergence. In Sections 6, 7 and 8, we apply these convergence results to the best rank-1 approximation, low rank orthogonal approximation and low multilinear rank approximation for higher order tensors, respectively.

## 2. Notations and preliminaries

### 2.1. Notations

Let R^{n1×n2×⋯×nd} be the linear space of d-th order real tensors and S^{n×n×⋯×n} be the set of symmetric ones, whose entries do not change under any permutation of indices [13, 41]. Tensors, matrices, and vectors will be respectively denoted by bold calligraphic letters, e.g., A; by bold uppercase letters, e.g., U; and by bold lowercase letters, e.g., u; corresponding entries will be denoted by Ap1⋯pd, Uij, and ui. Let A ∈ R^{n1×n2×⋯×nd} and M ∈ R^{m×ni}. We use the i-mode product defined by

 (A∙iM)p1⋯j⋯pd=∑piAp1⋯pdMjpi.

The i-mode unfolding of A is denoted by A(i), which is the matrix obtained by reordering the i-mode fibers (the fibers of a tensor are obtained by fixing every index but one) of A in a fixed way [29]. We denote by ranki(A) the i-rank, that is, the rank of A(i). The vector (rank1(A),⋯,rankd(A)) is called the multilinear rank of A. The diagonal of a tensor A ∈ R^{n×⋯×n} is the vector diag(A) def= (A11⋯1,⋯,Ann⋯n).

Let O(n) be the orthogonal group. We denote by ∥·∥ the Frobenius norm of a tensor or a matrix, or the Euclidean norm of a vector. We denote by S^{n−1} the unit sphere of R^n. We denote by σmin(·) the minimal singular value of a matrix.
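For concreteness, the i-mode product and i-mode unfolding above can be sketched in a few lines of NumPy. The helper names `mode_product` and `unfold` and the 0-based mode indexing are ours, not the paper's; the final assertion checks the standard identity (A∙iM)(i) = M A(i).

```python
import numpy as np

def mode_product(A, M, i):
    """i-mode product (A . _i M): contracts mode i of A with the rows of M.

    A: d-way array, M: (m, n_i) matrix; the result replaces n_i by m.
    (0-based mode index; the paper's modes are 1-based.)
    """
    # tensordot contracts A's axis i with M's axis 1 and appends the new
    # axis at the end; moveaxis puts it back in position i.
    return np.moveaxis(np.tensordot(A, M, axes=(i, 1)), -1, i)

def unfold(A, i):
    """i-mode unfolding: mode-i fibers become the columns of a matrix."""
    return np.moveaxis(A, i, 0).reshape(A.shape[i], -1)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4, 5))
M = rng.standard_normal((2, 4))
B = mode_product(A, M, 1)          # shape (3, 2, 5)
# Characteristic identity: (A . _i M)_(i) = M A_(i)
assert np.allclose(unfold(B, 1), M @ unfold(A, 1))
```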

Let M ⊆ R^N be a Riemannian submanifold and f: M ⟶ R be a differentiable function. In this paper, for simplicity, we also denote by f a smooth extension of f to the ambient space R^N. Let x ∈ M and TxM be the tangent space of M at x. We denote by ∇f(x) the Euclidean gradient of f at x, and by grad f(x) the Riemannian gradient (see [3, Section 3.6] for a detailed definition) of f at x.

###### Definition 2.1.

([43, Definition 2.1]) The function f is said to satisfy a Łojasiewicz gradient inequality at x ∈ M if there exist C > 0, θ ∈ (0, 1/2] and a neighborhood U in M of x such that for all y ∈ U, it follows that

 |f(y)−f(x)|^{1−θ} ≤ C∥grad f(y)∥. (4)

###### Lemma 2.2 ([43, Proposition 2.2]).

Let M ⊆ R^N be an analytic submanifold (see [31, Definition 2.7.1] or [36, Definition 5.1] for a definition of an analytic submanifold) and f: M ⟶ R be a real analytic function. Then for any x ∈ M, f satisfies a Łojasiewicz gradient inequality (4) in a δ-neighborhood of x, for some δ > 0, C > 0 and θ ∈ (0, 1/2] (the values of δ, C and θ depend on the specific point).

Based on the discrete-time analogue of the classical Łojasiewicz theorem [2, 33, 39], the following result was proved in [43, Theorem 2.3].

###### Theorem 2.3.

Let M ⊆ R^N be an analytic submanifold and {xk} ⊆ M. Suppose that f is real analytic and, for large enough k,
(i) there exists σ > 0 such that

 f(x_{k+1})−f(x_k) ≥ σ∥grad f(x_k)∥∥x_{k+1}−x_k∥;

(ii) grad f(x_k) = 0 implies that x_{k+1} = x_k.
Then any two accumulation points of {xk} must in fact be the same; that is, the whole sequence converges to a single point.

###### Remark 2.4.

It can be verified, after checking the proof in [43], that condition (ii) in Theorem 2.3 can be replaced by the condition that x_{k+1} = x_k implies grad f(x_k) = 0.

## 3. Problem statement and tensor approximation examples

### 3.1. Problem statement

We first state the condition under which problems (1) and (3) will be studied in this paper. This condition is motivated by the tensor approximation problems in Sections 3.2.1, 3.3.1 and 3.4.

#### 3.1.1. General case

Let

 Ω(i)def=St(r1,n1)×⋯×St(ri−1,ni−1)×St(ri+1,ni+1)×⋯×St(rd,nd)

be the product of Stiefel manifolds with the i-th factor omitted, and let ν(i) def= (U(1),⋯,U(i−1),U(i+1),⋯,U(d)) ∈ Ω(i). Define

 h(i): St(ri,ni) ⟶R+, U ⟼f(ν(i),U)def=f(U(1),⋯,U(i−1),U,U(i+1),⋯,U(d)). (5)

In this paper, we assume that problem (1) always satisfies the restricted homogeneity condition, which means that there is some fixed α > 0 such that, for any 1 ≤ i ≤ d and ν(i) ∈ Ω(i), it holds that

 h(i)(U)=α⟨U,∇h(i)(U)⟩, (6)

for any U ∈ St(ri,ni). By the gradient inequality [6, (3.2)], it follows that the function (5) is convex if and only if

 ⟨U′−U,∇h(i)(U)⟩≤h(i)(U′)−h(i)(U),

for any U and U′, and then, by condition (6), if and only if

 ⟨U′,∇h(i)(U)⟩≤(1−α)⟨U,∇h(i)(U)⟩+α⟨U′,∇h(i)(U′)⟩, (7)

for any U and U′. We say that the cost function (1) is partially convex if the function (5) is convex for any 1 ≤ i ≤ d and ν(i) ∈ Ω(i).

#### 3.1.2. Symmetric case

In this paper, we assume that problem (3) always satisfies the homogeneity condition, which means that there is some fixed β > 0 such that

 g(U)=β⟨U,∇g(U)⟩ (8)

for any U ∈ St(r,n). By the gradient inequality [6, (3.2)], it follows that the cost function (3) is convex if and only if

 ⟨U′−U,∇g(U)⟩≤g(U′)−g(U),

for any U and U′, and then, by condition (8), if and only if

 ⟨U′,∇g(U)⟩≤(1−β)⟨U,∇g(U)⟩+β⟨U′,∇g(U′)⟩,

for any U and U′.

###### Remark 3.1.

Suppose that the cost function (1) is symmetric on Ωs and satisfies condition (6). Then the corresponding cost function (3) satisfies condition (8) with β = α/d. In fact, for any U0 ∈ St(r,n), we define

 h: St(r,n)⟶R+, U⟼f(U0,⋯,U0,U).

Then we get that

 ∇g(U0)=d∇h(U0) (9)

by (2). It follows from condition (6) that

 g(U0)=h(U0)=α⟨U0,∇h(U0)⟩=(α/d)⟨U0,∇g(U0)⟩.

Therefore, the cost function (3) satisfies condition (8) with β = α/d.

### 3.2. Example: best rank-1 approximation

#### 3.2.1. General case

Let A ∈ R^{n1×n2×⋯×nd}. The best rank-1 approximation problem [15, 16, 17, 50] is to find

 B∗=argmin∥A−B∥, (10)

where B = μ u(1)⊗u(2)⊗⋯⊗u(d) with μ ∈ R and u(i) ∈ S^{ni−1}. It was proved [15, 17] that problem (10) is equivalent to the maximization of

 f(u(1),u(2),⋯,u(d))=|A∙1u(1)T∙2u(2)T⋯∙du(d)T|2, (11)

where u(i) ∈ S^{ni−1} for 1 ≤ i ≤ d.

###### Lemma 3.2.

The cost function (11) satisfies (6) with α = 1/2 and is partially convex.

###### Proof.

Let ν(i) = (u(1),⋯,u(i−1),u(i+1),⋯,u(d)). Denote

 v(i)def=A∙1u(1)T⋯∙i−1u(i−1)T∙i+1u(i+1)T⋯∙du(d)T.

Let V(i) def= v(i)v(i)T and u ∈ S^{ni−1}. Then we have that

 h(i)(u)=|⟨v(i),u⟩|2=⟨u,V(i)u⟩, ∇h(i)(u)=2V(i)u=2⟨v(i),u⟩v(i). (12)

It is easy to see that the cost function (11) satisfies condition (6) with α = 1/2. Moreover, by the Cauchy–Schwarz inequality, we have that

 ⟨u′,∇h(i)(u)⟩≤(1/2)(⟨u,∇h(i)(u)⟩+⟨u′,∇h(i)(u′)⟩)

for any u and u′. Then the cost function (11) is partially convex by (7). The proof is complete. ∎
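The homogeneity computation in the proof above is easy to verify numerically. In the sketch below (variable names are ours), a random vector stands in for v(i), and we check condition (6) with α = 1/2 for the reduced cost h(i)(u) = ⟨v(i),u⟩².

```python
import numpy as np

# Numerical spot-check of the restricted homogeneity condition (6) with
# alpha = 1/2 for the rank-1 cost (11), as in Lemma 3.2: fixing all but
# one factor reduces h^(i) to u -> <v, u>^2 for a vector v.
rng = np.random.default_rng(1)
v = rng.standard_normal(6)                # plays the role of v^(i)
u = rng.standard_normal(6)
u /= np.linalg.norm(u)                    # u on the unit sphere

h = np.dot(v, u) ** 2                     # h^(i)(u) = <v, u>^2
grad = 2 * np.dot(v, u) * v               # grad h^(i)(u) = 2 <v, u> v, as in (12)
assert np.isclose(h, 0.5 * np.dot(u, grad))   # condition (6), alpha = 1/2
```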

#### 3.2.2. Symmetric case

Let A ∈ S^{n×n×⋯×n}. The best symmetric rank-1 approximation problem [17, 27, 30, 42, 51] is to find

 B∗=argmin∥A−B∥, (13)

where B = μ u⊗u⊗⋯⊗u with μ ∈ R and u ∈ S^{n−1}. It was proved [15, 17] that problem (13) is equivalent to the maximization of

 g(u)def=f(u,u,⋯,u)=|A∙1uT∙2uT⋯∙duT|2, (14)

where u ∈ S^{n−1}. By Remark 3.1 and Lemma 3.2, we see that the cost function (14) satisfies condition (8) with β = 1/(2d).

###### Lemma 3.3.

Suppose that d is even. Let

 C=D∙1HT∙2HT⋯∙dHT (15)

where H ∈ R^{m×n} and D ∈ R^{m×⋯×m} is a diagonal tensor satisfying that Djj⋯j = λj with λj ≥ 0. Let X ∈ R^{n×r} and W = C∙1XT∙2XT⋯∙dXT. Then, for any 1 ≤ i ≤ r, the function

 τi: R^{n×r}⟶R+, X⟼|Wii⋯i|2

is convex.

###### Proof.

Let L: R^{n×r} ⟶ R^{m×r}, X ⟼ HX, be the linear map and

 γi: R^{m×r}⟶R+, Y⟼|Wii⋯i|2,

where W = D∙1YT∙2YT⋯∙dYT. Note that τi = γi∘L and Wii⋯i = ∑j λjY^d_{ji}, which is a nonnegative convex function of Y since d is even and λj ≥ 0. We see that γi is convex, and thus τi is also convex. ∎
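The convexity argument can be spot-checked numerically. The sketch below (index conventions and names are ours) tests midpoint convexity of γi along random pairs for d = 4, using the representation Wii⋯i = ∑j λjY^d_{ji} from the proof, restricted to a single column of Y.

```python
import numpy as np

# Numerical spot-check of Lemma 3.3 (d even): with W_ii...i = sum_j
# lambda_j * Y_ji^d and lambda_j >= 0, the map gamma_i = |W_ii...i|^2 is
# midpoint convex.  Only the i-th column of Y matters, so we work with a
# single vector y.
rng = np.random.default_rng(6)
d = 4
lam = rng.uniform(0, 1, size=5)          # nonnegative diagonal of D

def gamma(y):
    # gamma_i restricted to one column: (sum_j lambda_j y_j^d)^2
    return float(np.dot(lam, y ** d) ** 2)

for _ in range(200):
    y1, y2 = rng.standard_normal(5), rng.standard_normal(5)
    mid = 0.5 * (y1 + y2)
    # midpoint convexity (with a small numerical tolerance)
    assert gamma(mid) <= 0.5 * (gamma(y1) + gamma(y2)) + 1e-9
```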

###### Remark 3.4.

By Lemma 3.3, if A takes the form in (15), then the cost function (14) is convex. This fact has also been proved by the method of square matrix unfolding in [27, (4.2)].

### 3.3. Example: low rank orthogonal approximation

#### 3.3.1. General case

Let A ∈ R^{n1×n2×⋯×nd} and r ≤ ni for 1 ≤ i ≤ d. The low rank orthogonal approximation problem [7, 22, 28, 49] is to find

 B∗=argmin∥A−B∥, (16)

where

 B=∑_{ℓ=1}^{r}μℓu(1)ℓ⊗⋯⊗u(d)ℓ

satisfies that ⟨u(i)ℓ,u(i)m⟩ = δℓm for any 1 ≤ i ≤ d and 1 ≤ ℓ,m ≤ r. Let U(i) def= [u(i)1,⋯,u(i)r] ∈ St(r,ni) for 1 ≤ i ≤ d. It is clear that

 B=D∙1U(1)∙2U(2)∙3⋯∙dU(d),

where D ∈ R^{r×⋯×r} is the diagonal tensor satisfying that Dℓℓ⋯ℓ = μℓ. It was proved [7] that problem (16) is equivalent to the maximization of

 f(U(1),U(2),⋯,U(d))=∥diag(W)∥2, (17)

where W def= A∙1U(1)T∙2U(2)T⋯∙dU(d)T.

###### Lemma 3.5.

The cost function (17) satisfies (6) with α = 1/2 and is partially convex.

###### Proof.

Let ν(i) = (U(1),⋯,U(i−1),U(i+1),⋯,U(d)). Denote

 V(i)def=A∙1U(1)T⋯∙i−1U(i−1)T∙i+1U(i+1)T⋯∙dU(d)T, v(i,q)def=A∙1u(1)qT⋯∙i−1u(i−1)qT∙i+1u(i+1)qT⋯∙du(d)qT∈R^{ni},

for 1 ≤ q ≤ r. Let U = [u1,⋯,ur] ∈ St(r,ni) and W = V(i)∙iUT. We can calculate that

 h(i)(U)=∥diag(W)∥2=∑_{q=1}^{r}⟨v(i,q),uq⟩2, ∇h(i)(U)=2[v(i,1),⋯,v(i,r)]diag(W1⋯1,⋯,Wr⋯r). (18)

Then it can be verified that

 h(i)(U)=(1/2)⟨U,∇h(i)(U)⟩.

Let U′ = [u′1,⋯,u′r] ∈ St(r,ni) and W′ be defined as W with U replaced by U′. By (18) and the Cauchy–Schwarz inequality, we get

 2⟨U′,∇h(i)(U)⟩=2∑_{q=1}^{r}W′q⋯qWq⋯q ≤∑_{q=1}^{r}W′^2_{q⋯q}+∑_{q=1}^{r}W^2_{q⋯q}=⟨U′,∇h(i)(U′)⟩+⟨U,∇h(i)(U)⟩.

Then the cost function (17) is partially convex by (7). The proof is complete. ∎
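The identities in (18) and the value α = 1/2 can likewise be checked numerically. In the sketch below, a random matrix stands in for [v(i,1),⋯,v(i,r)]; all helper names are ours.

```python
import numpy as np

# Spot-check of (18) and condition (6) for the cost (17): with columns
# v^(i,q) collected in Vmat, h^(i)(U) = sum_q <v^(i,q), u_q>^2 and
# grad h^(i)(U) = 2 * Vmat @ diag(w), where w_q = <v^(i,q), u_q>.
rng = np.random.default_rng(7)
Vmat = rng.standard_normal((6, 3))                 # stand-in for [v^(i,1),...,v^(i,r)]
U, _ = np.linalg.qr(rng.standard_normal((6, 3)))   # U in St(3, 6)

w = np.array([Vmat[:, q] @ U[:, q] for q in range(3)])   # diagonal of W
h = np.sum(w ** 2)                                 # h^(i)(U) = ||diag(W)||^2
grad = 2 * Vmat @ np.diag(w)                       # gradient as in (18)
assert np.isclose(h, 0.5 * np.sum(U * grad))       # condition (6), alpha = 1/2
```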

#### 3.3.2. Symmetric case

Let A ∈ S^{n×n×⋯×n} and r ≤ n. The low rank symmetric orthogonal approximation problem [7, 35, 40] is to find

 B∗=argmin∥A−B∥, (19)

where

 B=∑_{ℓ=1}^{r}μℓuℓ⊗⋯⊗uℓ

satisfies that ⟨uℓ,um⟩ = δℓm for 1 ≤ ℓ,m ≤ r. It was proved [7] that problem (19) is equivalent to the maximization of

 g(U)def=f(U,U,⋯,U)=∥diag(W)∥2, (20)

where W def= A∙1UT∙2UT⋯∙dUT and U = [u1,⋯,ur] ∈ St(r,n).

###### Remark 3.6.

By Lemma 3.3, if A takes the form in (15), then the cost function (20) is convex.

### 3.4. Example: low multilinear rank approximation

Let A ∈ R^{n1×n2×⋯×nd} and ri ≤ ni for 1 ≤ i ≤ d. The low multilinear rank approximation problem [15, 17, 32] is to find

 B∗=argmin∥A−B∥, (21)

where ranki(B) ≤ ri for 1 ≤ i ≤ d. Note that any such B can be decomposed as

 B=C∙1U(1)∙2U(2)∙3⋯∙dU(d),

with C ∈ R^{r1×⋯×rd} and U(i) ∈ St(ri,ni) for 1 ≤ i ≤ d. Problem (21) is in fact equivalent to the following Tucker decomposition [45, 15, 17] problem

 min∥A−C∙1U(1)∙2U(2)∙3⋯∙dU(d)∥. (22)

It was proved [15, 17] that problem (22) is equivalent to the maximization of

 f(U(1),U(2),⋯,U(d))=∥W∥2, (23)

where W def= A∙1U(1)T∙2U(2)T⋯∙dU(d)T and U(i) ∈ St(ri,ni) for 1 ≤ i ≤ d.

###### Lemma 3.7.

The cost function (23) satisfies (6) with α = 1/2 and is partially convex.

###### Proof.

Let ν(i) = (U(1),⋯,U(i−1),U(i+1),⋯,U(d)). Denote

 V(i)def=A∙1U(1)T⋯∙i−1U(i−1)T∙i+1U(i+1)T⋯∙dU(d)T.

Let U ∈ St(ri,ni) and W = V(i)∙iUT. By [29, Section 2.5], we see that

 h(i)(U)=∥W∥2=∥UTV(i)(i)∥2,

where V(i)(i) is the i-mode unfolding of the tensor V(i). Let V(i) def= V(i)(i)V(i)(i)T ∈ R^{ni×ni} denote the corresponding matrix, by a slight abuse of notation. It can be verified that

 h(i)(U)=⟨U,V(i)U⟩, ∇h(i)(U)=2V(i)U, (24)

and thus

 h(i)(U)=(1/2)⟨U,∇h(i)(U)⟩.

Let U′ ∈ St(ri,ni). By the Cauchy–Schwarz inequality, it is easy to see that

 ⟨U′,V(i)U⟩≤(1/2)(⟨U,V(i)U⟩+⟨U′,V(i)U′⟩).

Then the cost function (23) is partially convex by (7) and (24). The proof is complete. ∎
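The structure exploited in this proof, h(i)(U) = ⟨U,V(i)U⟩ with V(i) positive semidefinite, is easy to verify numerically. In the sketch below, a random matrix G stands in for the unfolding V(i)(i); all names are ours.

```python
import numpy as np

# Spot-check for Lemma 3.7: with V := G G^T (G standing in for the
# unfolding of V^(i)), h(U) = <U, V U> equals ||U^T G||^2, satisfies
# h(U) = (1/2) <U, grad h(U)> with grad h(U) = 2 V U, and V is PSD.
rng = np.random.default_rng(2)
G = rng.standard_normal((5, 7))
V = G @ G.T
U, _ = np.linalg.qr(rng.standard_normal((5, 3)))   # U in St(3, 5)

h = np.sum((U.T @ G) ** 2)                          # ||U^T G||^2
assert np.isclose(h, np.trace(U.T @ V @ U))         # h(U) = <U, V U>, as in (24)
grad = 2 * V @ U
assert np.isclose(h, 0.5 * np.sum(U * grad))        # condition (6), alpha = 1/2
assert np.all(np.linalg.eigvalsh(V) >= -1e-10)      # V is PSD (cf. Remark 3.8)
```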

###### Remark 3.8.

By (24), we see that the cost function (23) is such that the matrix V(i) = V(i)(i)V(i)(i)T is always positive semidefinite for any 1 ≤ i ≤ d and ν(i) ∈ Ω(i).

## 4. Algorithm APDOI on the product of Stiefel manifolds

In this section, based on the matrix polar decomposition, we propose a general algorithm to solve problem (1), and establish its convergence. The algorithm and convergence results in this section apply to the best rank-1 approximation in Section 3.2.1, low rank orthogonal approximation in Section 3.3.1 and low multilinear rank approximation in Section 3.4.

### 4.1. Algorithm APDOI

By [3, (3.35)], the Riemannian gradient of (5) at U(i) ∈ St(ri,ni) is

 grad h(i)(U(i))=∇h(i)(U(i))−U(i)sym(U(i)T∇h(i)(U(i))).

Therefore, if U(i)∗ is a maximal point of (5), it should satisfy that

 ∇h(i)(U(i)∗)=U(i)∗sym(U(i)∗T∇h(i)(U(i)∗)), (25)

which is close to a polar decomposition form of ∇h(i)(U(i)∗).
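The projection formula from [3, (3.35)] used here can be sanity-checked numerically: the projected gradient is tangent to the Stiefel manifold, i.e. sym(UT grad) = 0. The sketch below uses a random matrix G in place of ∇h(i)(U); the helper `sym` is ours.

```python
import numpy as np

# Sanity check of the formula from [3, (3.35)]: for any matrix G
# (standing in for the Euclidean gradient at U) and U in St(r, n), the
# projection G - U * sym(U^T G) satisfies sym(U^T grad) = 0, i.e. it
# lies in the tangent space of the Stiefel manifold at U.
def sym(M):
    return 0.5 * (M + M.T)

rng = np.random.default_rng(5)
U, _ = np.linalg.qr(rng.standard_normal((6, 3)))   # U in St(3, 6)
G = rng.standard_normal((6, 3))
grad = G - U @ sym(U.T @ G)
assert np.allclose(sym(U.T @ grad), np.zeros((3, 3)))
```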

###### Lemma 4.1.

([19, Theorem 9.4.1], [7, Lemma 5.3]) Let X ∈ R^{n×r} with r ≤ n. Then there exist U ∈ St(r,n) and a positive semidefinite matrix P ∈ R^{r×r} such that X has the polar decomposition X = UP. We say that U is the orthogonal polar factor and P is the symmetric polar factor. Moreover,
(i) for any U′ ∈ St(r,n), we have that

 ⟨U,X⟩≥⟨U′,X⟩;

(ii) U is the best orthogonal approximation to X, that is,

 ∥X−U∥≤∥X−U′∥

for any U′ ∈ St(r,n);
(iii) if rank(X) = r, then U and P are both unique.
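In practice, the polar decomposition of Lemma 4.1 is conveniently computed from a thin SVD; the sketch below (function name ours) also spot-checks the maximality property (i) against random Stiefel points.

```python
import numpy as np

# Polar decomposition via the thin SVD: if X = A S B^T, then
# U = A B^T is the orthogonal polar factor and P = B S B^T the
# symmetric (PSD) polar factor, so that X = U P.
def polar(X):
    """Return (U, P) with X = U P, U^T U = I, P symmetric PSD."""
    A, s, Bt = np.linalg.svd(X, full_matrices=False)
    return A @ Bt, Bt.T @ np.diag(s) @ Bt

rng = np.random.default_rng(3)
X = rng.standard_normal((6, 3))
U, P = polar(X)
assert np.allclose(U @ P, X)
assert np.allclose(U.T @ U, np.eye(3))
assert np.allclose(P, P.T)
# Lemma 4.1(i): U maximizes <., X> over the Stiefel manifold.
for _ in range(100):
    Up, _ = np.linalg.qr(rng.standard_normal((6, 3)))
    assert np.sum(U * X) >= np.sum(Up * X) - 1e-10
```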

Let ωk = (U(1)k,⋯,U(d)k) ∈ Ω and 1 ≤ i ≤ d. We denote

 ν(k,i)def=(U(1)k,⋯,U(i−1)k,U(i+1)k−1,⋯,U(d)k−1)∈Ω(i), ω(k,i)def=(U(1)k,⋯,U(i)k,U(i+1)k−1,⋯,U(d)k−1)∈Ω,

and ω(k,0) def= ωk−1. Define

 h(k,i): St(ri,ni) ⟶R+, U ⟼f(ν(k,i),U)def=f(U(1)k,⋯,U(i−1)k,U,U(i+1)k−1,⋯,U(d)k−1).

Inspired by the decomposition form in (25), we propose the following alternating polar decomposition based orthogonal iteration (APDOI) algorithm to solve problem (1). Without loss of generality, we assume that the starting point ω0 satisfies f(ω0) > 0.

###### Algorithm 1.

(APDOI)
Input: starting point ω0 = (U(1)0,⋯,U(d)0) ∈ Ω.
Output: ωk = (U(1)k,⋯,U(d)k), k ≥ 1.

• For k = 1, 2, ⋯

• For i = 1, 2, ⋯, d

• Compute the gradient ∇h(k,i)(U(i)k−1).

• Compute the polar decomposition of ∇h(k,i)(U(i)k−1).

• Update U(i)k to be the orthogonal polar factor.

• End for

• Until convergence
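As a minimal illustration, the following sketch runs the iteration above on the rank-1 cost (11) for a third-order tensor, where APDOI reduces to HOPM: on St(1,n) = S^{n−1} the orthogonal polar factor of ∇h(k,i) = 2⟨v,u⟩v amounts to normalizing v (up to a sign, which leaves the cost unchanged). All helper names are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 5, 6))
us = [rng.standard_normal(n) for n in A.shape]
us = [u / np.linalg.norm(u) for u in us]     # starting point on spheres

def f(us):
    # cost (11): squared full contraction of A with u(1), u(2), u(3)
    return np.einsum('pqr,p,q,r->', A, *us) ** 2

vals = [f(us)]
for k in range(50):                           # outer "For k" loop
    for i in range(3):                        # alternating sweep over modes
        specs = ['pqr,q,r->p', 'pqr,p,r->q', 'pqr,p,q->r']
        others = [u for j, u in enumerate(us) if j != i]
        v = np.einsum(specs[i], A, *others)   # v^(i) as in Lemma 3.2
        us[i] = v / np.linalg.norm(v)         # polar-factor update on S^{n-1}
    vals.append(f(us))

# The sweeps are monotone: f never decreases along the iteration.
assert all(b >= a - 1e-12 for a, b in zip(vals, vals[1:]))
```

Each inner update raises the reduced cost from ⟨v,u⟩² to ∥v∥², which is exactly the maximality property of the orthogonal polar factor in Lemma 4.1(i) specialized to r = 1.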

###### Remark 4.2.

Note that the cost function (1) is smooth and Ω is compact. Hence the gradients ∇h(k,i)(U(i)k−1) produced in Algorithm 1 are uniformly bounded.

###### Lemma 4.3.

Let h: R^{n×r} ⟶ R be a differentiable function and U ∈ St(r,n). Suppose that V is the orthogonal polar factor of ∇h(U).
(i) If V = U, then grad h(U) = 0.
(ii) If grad h(U) = 0, then UT∇h(U) is symmetric. Furthermore, if ∇h(U) is of full rank and UT∇h(U) is positive semidefinite, then V = U.

###### Proof.

(i) Since V = U, we see that ∇h(U) = UP is a polar decomposition. It follows that P = UT∇h(U) is symmetric, and thus

 ∇h(U)=UP=U sym(UT∇h(U)).

Therefore, by [3, (3.35)], we get that grad h(U) = 0.
(ii) By [3, (3.35)], we see that ∇h(U) = U sym(UT∇h(U)) and thus

 2UT∇h(U)=UT∇h(U)+∇h(U)TU,

which means that UT∇h(U) is symmetric. Note that UT∇h(U) is positive semidefinite and ∇h(U) is of full rank. Then ∇h(U) = U(UT∇h(U)) is the unique polar decomposition of ∇h(U) by Lemma 4.1(iii). Therefore, we get that V = U. ∎

###### Remark 4.4.

By Lemma 4.3, we see that, in Algorithm 1, if ∇h(k,i)(U(i)k−1) is of full rank and