# On the convergence of Jacobi-type algorithms for Independent Component Analysis

Jacobi-type algorithms for the simultaneous approximate diagonalization of symmetric real tensors (or partially symmetric complex tensors) are widely used in independent component analysis (ICA) because of their high performance. One natural way of choosing the index pairs in Jacobi-type algorithms is the classical cyclic ordering; the other is based on the Riemannian gradient at each iteration. In this paper, we review our recent results from a series of papers on the weak convergence and global convergence of these Jacobi-type algorithms, under both pair selection rules. These results are mainly based on the Łojasiewicz gradient inequality.


## I Introduction

Let M be a smooth submanifold, and

 f : M → R+

be a differentiable function. In this paper, we mainly study the problem of finding x∗ ∈ M such that

 x∗ = argmax_{x ∈ M} f(x), (1)

and review our recent results in [1, 2, 3, 4] about the weak convergence (every accumulation point is a stationary point) and global convergence (for any starting point, the iterates converge to a single limit point) of Jacobi-type algorithms in Section II. The examples from Independent Component Analysis (ICA) [5, 6, 7] in Example I.1 and Example I.2 are all special cases of problem (1), where M is the orthogonal group On or the unitary group Un.

We first set some notation. Let T^d(R^n) be the space of dth order real tensors and S^d(R^n) ⊆ T^d(R^n) the set of symmetric ones [8, 9], whose entries do not change under any permutation of indices. Let T^d(C^n) be the space of dth order complex tensors. For a tensor W in T^d(R^n) (or T^d(C^n)), we denote by diag{W} the vector of its diagonal elements. We denote by ‖·‖ the Frobenius norm of a tensor or a matrix, or the Euclidean norm of a vector. Tensor arrays, matrices, and vectors will be respectively denoted by bold calligraphic letters, e.g., A, bold uppercase letters, e.g., A, and bold lowercase letters, e.g., a; corresponding entries will be denoted by Aijk, Aij, and ai. The operator •p denotes contraction on the pth index of a tensor; when contracted with a matrix, summation is always performed on the second index of the matrix. For instance, (A •1 M)_{ijk} = ∑_{ℓ} M_{iℓ} A_{ℓjk}.
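As a concrete illustration of the contraction operator •p (our own sketch, not from the paper; the function name `contract` is ours), the convention above can be realized with `numpy.einsum`:

```python
import numpy as np

def contract(tensor, matrix, p):
    """Contract the p-th index (0-based) of `tensor` with the SECOND
    index of `matrix`, matching the convention described above:
    (A .p M)[..., i, ...] = sum_j M[i, j] * A[..., j, ...]."""
    d = tensor.ndim
    letters = [chr(ord('a') + q) for q in range(d)]
    out = letters.copy()
    out[p] = chr(ord('a') + d)  # fresh letter for the new index
    subscripts = f"{''.join(letters)},{out[p]}{letters[p]}->{''.join(out)}"
    return np.einsum(subscripts, tensor, matrix)
```

For d = 3 and p = 0 this builds the subscripts `'abc,da->dbc'`, i.e., summation over the first index of the tensor and the second index of the matrix.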

###### Example I.1 ([1, Section 1]).

Let A(1), …, A(L) ∈ S^d(R^n). The simultaneous approximate diagonalization of symmetric real tensors problem is to find Q ∈ On that maximizes

 f(Q) = ∑_{ℓ=1}^{L} ‖diag{W(ℓ)}‖², (2)

where W(ℓ) = A(ℓ) •1 Qᵀ •2 Qᵀ ⋯ •d Qᵀ. This problem has the following well-known problems in ICA as special cases:
(i) the approximate tensor diagonalization problem [10, 6], if L = 1 and d ≥ 3;
(ii) the simultaneous approximate matrix diagonalization problem [11], if d = 2 and L > 1.
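A minimal numpy sketch (our own illustration, with hypothetical helper names `multilinear` and `cost`) of the cost (2) for a single 3rd-order symmetric tensor (L = 1, d = 3), taking W to be the multilinear transform of A by Qᵀ on every mode:

```python
import numpy as np

def multilinear(A, Q):
    """W[i,j,k] = sum_{a,b,c} Q[a,i] Q[b,j] Q[c,k] A[a,b,c],
    i.e., A contracted with Q^T on each of its three modes."""
    return np.einsum('abc,ai,bj,ck->ijk', A, Q, Q, Q)

def cost(A, Q):
    """f(Q) = || diag{W} ||^2 for the transformed tensor W."""
    W = multilinear(A, Q)
    return float(np.sum(np.einsum('iii->i', W) ** 2))
```

Since the multilinear action of an orthogonal Q preserves the Frobenius norm, maximizing the squared diagonal is equivalent to minimizing the off-diagonal energy.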

###### Example I.2 ([3, Section 2.2]).

(i) Simultaneous approximate diagonalization of Hermitian matrices. Let M(1), …, M(L) ∈ C^{n×n} be Hermitian matrices. The cost function is defined as

 f(U) = ∑_{ℓ=1}^{L} ‖diag{W(ℓ)}‖², (3)

where W(ℓ) = Uᴴ M(ℓ) U.
(ii) Approximate diagonalization of a partially symmetric 3rd order tensor. Let A ∈ T³(C^n) satisfy the partial symmetry condition

 A_{ijk} = A_{ikj} (4)

for any 1 ≤ i, j, k ≤ n. The cost function is defined as

 f(U) = ‖diag{W}‖²,

where W is the multilinear transformation of A by U (see [3, Section 2.2] for the precise contractions).
(iii) Approximate diagonalization of a partially symmetric 4th order tensor. Let B ∈ T⁴(C^n) satisfy the partial symmetry conditions

 B_{ijkl} = B_{jikl}  and  B_{ijkl} = B*_{klij} (5)

for any 1 ≤ i, j, k, l ≤ n. The cost function is defined as

 f(U) = ∑_{p=1}^{n} V_{pppp},

where V is the multilinear transformation of B by U (see [3, Section 2.2]).

The motivation behind these cost functions comes from blind source separation [7]. In fact, the cost function (3) is used for simultaneous approximate diagonalization of covariance matrices [12, 13]. An example of a 3rd order tensor satisfying property (4) is the cumulant tensor of a complex random vector [14]. An example of a 4th order tensor satisfying property (5) is the cumulant tensor of a complex random vector [15], which may itself stem from a Fourier transform [16].

## II Jacobi-type algorithms

### II-A General Jacobi algorithm on On

Let θ ∈ R be an angle and (i, j) be a pair of indices with 1 ≤ i < j ≤ n. Denote by G(i, j, θ) the Givens rotation matrix, as defined, e.g., in [17, 7, 1]. The general Jacobi algorithm for problem (1) on On can be summarized as follows.

###### Algorithm 1.

(General Jacobi algorithm on On)
Input: A starting point Q0 ∈ On.
Output: Sequence of iterates {Qk}.

• For k = 1, 2, … until a stopping criterion is satisfied do

• Choose the pair (ik, jk) according to a certain pair selection rule.

• Compute θk that maximizes

 hk(θ) := f(Qk−1 G(ik, jk, θ)). (6)
• Update Qk = Qk−1 G(ik, jk, θk).

• End for
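Algorithm 1 can be sketched in Python as follows (our own illustration; the names `givens`, `jacobi`, and `select_pair` are ours, and the inner maximization of hk is done by brute-force angle sampling, whereas for polynomial costs it can be solved in closed form):

```python
import numpy as np

def givens(n, i, j, theta):
    """Givens rotation G(i, j, theta) acting in the (i, j) plane."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = G[j, j] = c
    G[i, j], G[j, i] = -s, s
    return G

def jacobi(f, Q0, select_pair, n_iter=100, n_grid=361):
    """Generic Jacobi iteration: at step k, rotate in one coordinate
    plane by the grid angle that maximizes f.  The grid contains 0,
    so the cost never decreases."""
    Q = Q0.copy()
    n = Q.shape[0]
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_grid)
    for k in range(n_iter):
        i, j = select_pair(k, Q)
        vals = [f(Q @ givens(n, i, j, t)) for t in thetas]
        Q = Q @ givens(n, i, j, thetas[int(np.argmax(vals))])
    return Q
```

Because every update is a product with an exact rotation, the iterates stay on On to machine precision; only the angle search is approximate here.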

### II-B General Jacobi algorithm on Un

Let Ψ be a 2 × 2 unitary matrix parameter and (i, j) be a pair of indices with 1 ≤ i < j ≤ n. Denote by G(i, j, Ψ) the complex Givens transformation matrix, as defined, e.g., in [3, 7]. The general Jacobi algorithm for problem (1) on Un can be summarized as follows.

###### Algorithm 2.

(General Jacobi algorithm on Un)
Input: A starting point U0 ∈ Un.
Output: Sequence of iterates {Uk}.

• For k = 1, 2, … until a stopping criterion is satisfied do

• Choose the pair (ik, jk) according to a certain pair selection rule.

• Compute Ψk that maximizes

 hk(Ψ) := f(Uk−1 G(ik, jk, Ψ)). (7)
• Update Uk = Uk−1 G(ik, jk, Ψk).

• End for

###### Remark II.1.

In Algorithm 1, if several equivalent maximizers are present in (6), we choose the one with the angle of smallest magnitude. The same convention applies to Algorithm 2.

### II-C Convergence of the Jacobi-C algorithm on On

One natural pair selection rule in Algorithm 1 and Algorithm 2 is the cyclic ordering [17, 7]:

 (1,2) → (1,3) → ⋯ → (1,n) → (2,3) → ⋯ → (2,n) → ⋯ → (n−1,n) → (1,2) → (1,3) → ⋯. (8)

We call the Jacobi algorithm with cyclic rule (8) the Jacobi-C algorithm. This cyclic rule is used in the well-known Jacobi CoM2 algorithm [5, 6, 10, 7] and JADE algorithm [11].
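The cyclic ordering (8) is straightforward to generate; a trivial helper (our own naming, using 1-based indices as in (8)):

```python
from itertools import combinations, cycle

def cyclic_pairs(n):
    """Endless generator of index pairs in the cyclic order (8):
    (1,2), (1,3), ..., (1,n), (2,3), ..., (n-1,n), (1,2), ..."""
    return cycle(combinations(range(1, n + 1), 2))
```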

###### Algorithm 3.

(Jacobi-C algorithm on On)
Input: A starting point Q0 ∈ On.
Output: Sequence of iterates {Qk}.

• For k = 1, 2, … until a stopping criterion is satisfied do

• Choose the pair (ik, jk) according to (8).

• Compute θk that maximizes (6).

• Update Qk = Qk−1 G(ik, jk, θk).

• End for

Let f be as in (2) with d = 3 and L = 1. The convergence properties of Algorithm 3 are presented mainly in [2, Section 6] and [4, Section 5]; we do not present them in detail here due to space constraints.

###### Remark II.2.

(i) It was shown in [2, Remark 6.5] that, if f is as in (2) with d = 3 and L = 1, then Algorithm 3 may converge to a saddle point of f.
(ii) Let f be as in [4, Equation (6)]. Then Algorithm 3 is the Jacobi-type algorithm for low rank orthogonal approximation of symmetric tensors in [4]. The global convergence of this algorithm is still unknown.

## III Convergence analysis of Jacobi-G algorithms

In this section, we present our recent results in [1, 2, 3, 4] about the weak convergence and global convergence of Algorithm 1 and Algorithm 2 under a gradient based pair selection rule. Before that, we need an important result based on the well-known Łojasiewicz gradient inequality [18, 19, 20, 21].

### III-A The Łojasiewicz gradient inequality

Let T_x M be the tangent space of M at x. Let ∇f(x) be the Euclidean gradient of f, and grad f(x) the projection of ∇f(x) onto T_x M, i.e., the Riemannian gradient (see [22, Section 3.6] for a detailed definition). The following results were proved in [21].

###### Lemma III.1.

Let M be an analytic submanifold (see [23, Definition 2.7.1] or [1, Definition 5.1] for a definition) and f : M → R be real analytic. Then any point x ∈ M satisfies a Łojasiewicz inequality for f, that is, there exist δ > 0, ζ ∈ (0, 1/2] and C > 0 such that for all y ∈ M with ‖y − x‖ < δ, it holds that

 |f(y) − f(x)|^{1−ζ} ≤ C ‖grad f(y)‖. (9)
###### Theorem III.2 ([21, Theorem 2.3]).

Let M be an analytic submanifold and {xk} ⊆ M be a sequence. Suppose that f is real analytic and, for large enough k,
(i) there exists σ > 0 such that

 f(xk+1) − f(xk) ≥ σ ‖grad f(xk)‖ ‖xk+1 − xk‖;

(ii) grad f(xk) = 0 implies that xk+1 = xk.
Then any accumulation point x∗ of {xk} is the only limit point.

If, in addition, for some κ > 0 and for large enough k it holds that

 ‖xk+1 − xk‖ ≥ κ ‖grad f(xk)‖,

then the following convergence rates apply:

 ‖xk − x∗‖ ≤ C e^{−ck} for some c > 0, if ζ = 1/2;
 ‖xk − x∗‖ ≤ C k^{−ζ/(1−2ζ)}, if 0 < ζ < 1/2,

where ζ is the parameter in (9) at the limit point x∗.

### III-B Convergence of the Jacobi-G algorithm on On

A gradient based pair selection rule for Algorithm 1 was proposed in [24]: at iteration k, choose a pair (ik, jk) satisfying

 |grad f(Qk−1)_{ik,jk}| ≥ δ ‖grad f(Qk−1)‖, (10)

where δ > 0 is a small positive constant. We call the Jacobi algorithm with this rule the Jacobi-G algorithm.
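A sketch of this selection rule (our own illustration; the names `riemannian_gradient` and `select_pair_G` are ours, and the skew-symmetric representation Λ = (Qᵀ∇f − ∇fᵀQ)/2 of the Riemannian gradient on On is a standard convention that may differ from [1] by a constant factor):

```python
import numpy as np

def riemannian_gradient(egrad, Q):
    """Skew-symmetric matrix Lambda representing the Riemannian
    gradient of f on the orthogonal group at Q (egrad = Euclidean
    gradient of f at Q)."""
    X = Q.T @ egrad
    return (X - X.T) / 2.0

def select_pair_G(egrad, Q, delta):
    """Return a pair (i, j), i < j, whose gradient entry satisfies
    |Lambda_ij| >= delta * ||Lambda||_F.  Such a pair always exists
    when delta <= 1/sqrt(n(n-1)), since the largest of the n(n-1)/2
    upper-triangular entries carries at least that fraction of the norm."""
    Lam = riemannian_gradient(egrad, Q)
    norm = np.linalg.norm(Lam)
    iu = np.triu_indices(Lam.shape[0], k=1)
    k = int(np.argmax(np.abs(Lam[iu])))
    i, j = int(iu[0][k]), int(iu[1][k])
    if norm > 0 and abs(Lam[i, j]) < delta * norm:
        raise ValueError("no pair satisfies the threshold")
    return i, j
```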

###### Algorithm 4.

(Jacobi-G algorithm on On)
Input: A starting point Q0 ∈ On, a positive constant δ.
Output: Sequence of iterates {Qk}.

• For k = 1, 2, … until a stopping criterion is satisfied do

• Choose the pair (ik, jk) satisfying (10).

• Compute θk that maximizes (6).

• Update Qk = Qk−1 G(ik, jk, θk).

• End for

It was shown in [1, Lemma 3.1] (an easy generalisation of [24, Lemma 5.2]) that, in Algorithm 4, we can always choose a pair satisfying (10). Combining this result with the proof of [24, Theorem 5.4], we readily obtain the following weak convergence result for Algorithm 4.

###### Theorem III.3 ([1, Theorem 3.3]).

Let f be continuously differentiable. Then every accumulation point of the iterates of Algorithm 4 is a stationary point of f.

Based on Theorem III.2, the following global convergence result of Algorithm 4 was proved.

###### Theorem III.4 ([1, Theorem 5.6]).

Let f be as in (2) with d = 3 or d = 4. Then Algorithm 4 converges to a stationary point of f, for any starting point Q0 ∈ On.

### III-C Convergence of the Jacobi-G algorithm on Un

Based on a similar idea as in Algorithm 4, the following Jacobi-G algorithm on Un was formulated in [3, Section 3].

###### Algorithm 5.

(Jacobi-G algorithm on Un)
Input: A starting point U0 ∈ Un, a positive constant δ.
Output: Sequence of iterates {Uk}.

• For k = 1, 2, … until a stopping criterion is satisfied do

• Choose an index pair (ik, jk) satisfying

 |grad f(Uk−1)_{ik,jk}| ≥ δ ‖grad f(Uk−1)‖. (11)

• Compute Ψk that maximizes (7).

• Update Uk = Uk−1 G(ik, jk, Ψk).

• End for

It was shown in [3, Corollary 3.3] that, in Algorithm 5, we can always choose such a pair satisfying (11). Then, based on [25, Theorem 2.5], the following weak convergence result of Algorithm 5 was proved in [3, Proposition 5.4].

###### Proposition III.5.

Suppose that f has Lipschitz continuous gradient in the convex hull of Un. Then every accumulation point of the iterates of Algorithm 5 is a stationary point of f.

Let h(i,j),U be defined as in [3, Section 2.4]. Assume that it always has the form

 h(i,j),U(c, s1, s2) = rᵀ Γ(i,j,U) r + C, (12)

for all (i, j) and U ∈ Un, where

 r := (2c² − 1, −2cs1, −2cs2)ᵀ,

the constant C does not depend on (c, s1, s2), and Γ(i,j,U) is a symmetric 3 × 3 matrix determined by f, i, j and U. It was shown in [3, Lemma 2.3] that all the cost functions in Example I.2 satisfy (12). Now, we define

 D(i,j)U := 2 ( [ Γ(i,j,U)_{2,2}  Γ(i,j,U)_{2,3} ; Γ(i,j,U)_{3,2}  Γ(i,j,U)_{3,3} ] − Γ(i,j,U)_{1,1} I2 ).
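In code, forming D(i,j)U from Γ(i,j,U) is a one-liner (our sketch, with Γ passed as a plain symmetric 3 × 3 array and 0-based indexing):

```python
import numpy as np

def D_matrix(Gamma):
    """D = 2 * (Gamma[2:3, 2:3 block] - Gamma[1,1] * I_2), written with
    0-based indices: the lower-right 2x2 block minus the (0,0) entry
    times the identity, all scaled by 2."""
    return 2.0 * (Gamma[1:3, 1:3] - Gamma[0, 0] * np.eye(2))
```

Checking negative definiteness of this 2 × 2 matrix (e.g., via its eigenvalues) is then the computable form of the assumption in the theorem below.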

Let Hess f(U) be the Riemannian Hessian (see [22, Section 5.5] for a detailed definition) of f at U, which is a linear map on the tangent space TU Un. Based on Theorem III.2, the following global convergence result of Algorithm 5 was proved in [3, Theorem 7.4].

###### Theorem III.6.

Suppose that f has Lipschitz continuous gradient in the convex hull of Un. Let U∗ be an accumulation point of the iterates of Algorithm 5 (and thus a stationary point by Proposition III.5), where (12) is satisfied. Assume that D(i,j)U∗ is negative definite for all pairs (i, j). Then
(i) U∗ is the only limit point, and the convergence rates in Theorem III.2 apply.
(ii) If the rank of the Riemannian Hessian Hess f(U∗) is maximal, then the speed of convergence is linear.

## IV Experiments

In this section, we conduct experiments to examine the convergence behaviour of the Jacobi-C and Jacobi-G algorithms on On.

###### Example IV.1.

We randomly generate symmetric tensors and let f be as in (2). We use the same starting point in Algorithm 3 and Algorithm 4, and a fixed δ in Algorithm 4. The results are shown in Figure 1.

## V Conclusion

Thanks to the Łojasiewicz gradient inequality, we were able to establish the global convergence of Jacobi-G algorithms on On and Un. However, for the moment, some open problems remain:

(i) If f is as in (2) with other values of d and L, the global convergence of Algorithm 4 is still unknown.
(ii) The global convergence of Algorithm 3 is still unknown.

## References

• [1] J. Li, K. Usevich, and P. Comon, “Globally convergent Jacobi-type algorithms for simultaneous orthogonal symmetric tensor diagonalization,” SIAM J. Matrix Anal. Appl., vol. 39, no. 1, pp. 1–22, 2018.
• [2] ——, “On approximate diagonalization of third order symmetric tensors by orthogonal transformations,” Linear Algebra and its Applications, vol. 576, pp. 324–351, 2019.
• [3] K. Usevich, J. Li, and P. Comon, “Approximate matrix and tensor diagonalization by unitary transformations: convergence of Jacobi-type algorithms,” arXiv:1905.12295, 2019.
• [4] J. Li, K. Usevich, and P. Comon, “Jacobi-type algorithm for low rank orthogonal approximation of symmetric tensors and its convergence analysis,” arXiv:1911.00659, 2019.
• [5] P. Comon, “Independent Component Analysis,” in Higher Order Statistics, J.-L. Lacoume, Ed.   Amsterdam, London: Elsevier, 1992, pp. 29–38.
• [6] ——, “Independent component analysis, a new concept?” Signal Processing, vol. 36, no. 3, pp. 287–314, 1994.
• [7] P. Comon and C. Jutten, Eds., Handbook of Blind Source Separation.   Oxford: Academic Press, 2010.
• [8] P. Comon, G. Golub, L.-H. Lim, and B. Mourrain, “Symmetric tensors and symmetric tensor rank,” SIAM Journal on Matrix Analysis and Applications, vol. 30, no. 3, pp. 1254–1279, 2008.
• [9] L. Qi and Z. Luo, Tensor analysis: Spectral theory and special tensors.   SIAM, 2017.
• [10] P. Comon, “Tensor Diagonalization, A useful Tool in Signal Processing,” in 10th IFAC Symposium on System Identification (IFAC-SYSID), M. Blanke and T. Soderstrom, Eds., vol. 1.   Copenhagen, Denmark: IEEE, Jul. 1994, pp. 77–82.
• [11] J.-F. Cardoso and A. Souloumiac, “Blind beamforming for non-Gaussian signals,” IEE Proceedings F (Radar and Signal Processing), vol. 140, no. 6, pp. 362–370, 1993.
• [12] ——, “Blind beamforming for non-Gaussian signals,” IEE Proceedings F (Radar and Signal Processing), vol. 140, no. 6, pp. 362–370, 1993.
• [13] ——, “Jacobi angles for simultaneous diagonalization,” SIAM Journal on Matrix Analysis and Applications, vol. 17, no. 1, pp. 161–164, 1996.
• [14] L. De Lathauwer, “Signal processing based on multilinear algebra,” Ph.D. dissertation, KUL, 1997.
• [15] P. Comon, “From source separation to blind equalization, contrast-based approaches,” in Int. Conf. on Image and Signal Processing (ICISP’01), Agadir, Morocco, May 2001, pp. 20–32, preprint: hal-01825729.
• [16] B. Emile, P. Comon, and J. Le Roux, “Estimation of time delays with fewer sensors than sources,” IEEE Transactions on Signal Processing, vol. 46, no. 7, pp. 2012–2015, July 1998.
• [17] G. Golub and C. Van Loan, Matrix Computations, 3rd ed.   Johns Hopkins University Press, 1996.
• [18] S. Łojasiewicz, “Ensembles semi-analytiques,” IHES notes, 1965.
• [19] S. Łojasiewicz, “Sur la géométrie semi- et sous-analytique,” in Annales de l’institut Fourier, vol. 43, no. 5, 1993, pp. 1575–1595.
• [20] P. A. Absil, R. Mahony, and B. Andrews, “Convergence of the iterates of descent methods for analytic cost functions,” SIAM Journal on Optimization, vol. 16, no. 2, pp. 531–547, 2005.
• [21] R. Schneider and A. Uschmajew, “Convergence results for projected line-search methods on varieties of low-rank matrices via Łojasiewicz inequality,” SIAM Journal on Optimization, vol. 25, no. 1, pp. 622–646, 2015.
• [22] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds.   Princeton, NJ: Princeton University Press, 2008.
• [23] S. Krantz and H. Parks, A Primer of Real Analytic Functions, ser. A Primer of Real Analytic Functions.   Birkhäuser Boston, 2002.
• [24] M. Ishteva, P.-A. Absil, and P. Van Dooren, “Jacobi algorithm for the best low multilinear rank approximation of symmetric tensors,” SIAM J. Matrix Anal. Appl., vol. 34, no. 2, pp. 651–672, 2013.
• [25] N. Boumal, P.-A. Absil, and C. Cartis, “Global rates of convergence for nonconvex optimization on manifolds,” IMA Journal of Numerical Analysis, vol. 39, no. 1, pp. 1–33, 2018.