I Introduction
Let $\mathcal{M}$ be a smooth submanifold of $\mathbb{R}^{n\times n}$ or $\mathbb{C}^{n\times n}$, and $f:\mathcal{M}\to\mathbb{R}$ be a differentiable function. In this paper, we mainly study the problem of finding $\boldsymbol{X}^{*}\in\mathcal{M}$ such that
$$\boldsymbol{X}^{*} = \operatorname*{argmax}_{\boldsymbol{X}\in\mathcal{M}} f(\boldsymbol{X}), \qquad (1)$$
and review our recent results in [1, 2, 3, 4] about weak convergence (every accumulation point is a stationary point) and global convergence (for any starting point, the iterations converge to a single limit point) of the Jacobi-type algorithms in Section II. The following examples from Independent Component Analysis (ICA) [5, 6, 7], given in Example I.1 and Example I.2, are all special cases of problem (1), where $\mathcal{M}$ is the orthogonal group $\mathrm{O}_n$ or the unitary group $\mathrm{U}_n$.
We first set some notation. Let $\mathrm{T}^d(\mathbb{R}^n)$ be the space of $d$th order real tensors and $\mathrm{S}^d(\mathbb{R}^n)\subseteq\mathrm{T}^d(\mathbb{R}^n)$ be the set of symmetric ones [8, 9], whose entries do not change under any permutation of indices. Let $\mathrm{T}^d(\mathbb{C}^n)$ be the space of $d$th order complex tensors. For $\mathcal{W}\in\mathrm{T}^d(\mathbb{R}^n)$ (or $\mathrm{T}^d(\mathbb{C}^n)$), we denote by
$$\operatorname{diag}(\mathcal{W}) := (\mathcal{W}_{11\cdots 1},\ldots,\mathcal{W}_{nn\cdots n})$$
the vector of its diagonal elements. We denote by $\|\cdot\|$ the Frobenius norm of a tensor or a matrix, or the Euclidean norm of a vector. Tensor arrays, matrices, and vectors will be respectively denoted by bold calligraphic letters, e.g., $\mathcal{A}$, by bold uppercase letters, e.g., $\boldsymbol{A}$, and by bold lowercase letters, e.g., $\boldsymbol{a}$; corresponding entries will be denoted by $\mathcal{A}_{ijk}$, $A_{ij}$, and $a_{i}$. The operator $\bullet_{\ell}$ denotes contraction on the $\ell$th index of a tensor; when contracted with a matrix, it is understood that summation is always performed on the second index of the matrix. For instance, $(\mathcal{A}\bullet_1\boldsymbol{M})_{i_1 i_2 i_3} = \sum_{j}\mathcal{A}_{j i_2 i_3} M_{i_1 j}$.
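For concreteness, this contraction can be realized in a few lines of NumPy (an illustrative sketch; the helper name `mode_contract` is ours, not from [1]):

```python
import numpy as np

def mode_contract(A, M, ell):
    """Contract the ell-th index (0-based) of tensor A with matrix M,
    summing over the second index of M, as in the bullet notation."""
    A_moved = np.moveaxis(A, ell, 0)               # bring index ell to the front
    W = np.tensordot(M, A_moved, axes=([1], [0]))  # W_{i...} = sum_j M_{ij} A_{j...}
    return np.moveaxis(W, 0, ell)                  # put the index back in place

# W = A bullet_1 M for a third-order tensor:
# W_{i1 i2 i3} = sum_j M_{i1 j} A_{j i2 i3}
A = np.random.randn(4, 4, 4)
M = np.random.randn(4, 4)
W = mode_contract(A, M, 0)
```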
Example I.1 ([1, Section 1]). Let $\mathcal{A}^{(\ell)}\in\mathrm{S}^d(\mathbb{R}^n)$ for $1\le\ell\le L$. The simultaneous approximate diagonalization of symmetric real tensors problem is to find $\boldsymbol{X}\in\mathrm{O}_n$ that maximizes
$$f(\boldsymbol{X}) = \sum_{\ell=1}^{L}\|\operatorname{diag}(\mathcal{W}^{(\ell)})\|^{2}, \qquad (2)$$
where $\mathcal{W}^{(\ell)} := \mathcal{A}^{(\ell)}\bullet_1\boldsymbol{X}^{\mathsf T}\bullet_2\boldsymbol{X}^{\mathsf T}\cdots\bullet_d\boldsymbol{X}^{\mathsf T}$.
This problem has the following well-known problems in ICA as special cases:
(i) the approximate tensor diagonalization problem [10, 6], if $L = 1$ and $d \ge 3$;
(ii) the simultaneous approximate matrix diagonalization problem [11], if $d = 2$ and $L \ge 2$.
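As an illustration of the case $d = 3$ above, the cost (2) can be evaluated directly with `numpy.einsum` (a minimal sketch under the notation of this section; the function name and random data are ours):

```python
import numpy as np

def diag_cost(X, tensors):
    """f(X) = sum_l ||diag(W^(l))||^2, where W^(l) is A^(l) transformed
    by X^T on every mode (third-order case, d = 3)."""
    total = 0.0
    for A in tensors:
        # W_{pqr} = sum_{ijk} A_{ijk} X_{ip} X_{jq} X_{kr}
        W = np.einsum('ijk,ip,jq,kr->pqr', A, X, X, X)
        total += np.sum(np.einsum('iii->i', W) ** 2)  # squared norm of diag(W)
    return total

# Illustrative use: one random symmetric tensor and a random orthogonal matrix.
n = 5
A = np.random.randn(n, n, n)
A = (A + A.transpose(0, 2, 1) + A.transpose(1, 0, 2) +
     A.transpose(1, 2, 0) + A.transpose(2, 0, 1) + A.transpose(2, 1, 0)) / 6
X, _ = np.linalg.qr(np.random.randn(n, n))
print(diag_cost(X, [A]))
```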
Example I.2 ([3, Section 2.2]).
(i) Simultaneous approximate diagonalization of Hermitian matrices. Let $\boldsymbol{A}^{(\ell)}\in\mathbb{C}^{n\times n}$, $1\le\ell\le L$, be Hermitian matrices. The cost function is defined as
$$f(\boldsymbol{U}) = \sum_{\ell=1}^{L}\|\operatorname{diag}(\boldsymbol{W}^{(\ell)})\|^{2}, \qquad (3)$$
where $\boldsymbol{W}^{(\ell)} := \boldsymbol{U}^{\mathsf H}\boldsymbol{A}^{(\ell)}\boldsymbol{U}$ and $\boldsymbol{U}\in\mathrm{U}_n$.
(ii) Approximate diagonalization of a partially symmetric 3rd order tensor. Let $\mathcal{A}\in\mathbb{C}^{n\times n\times n}$ satisfy the partial symmetry condition
$$\mathcal{A}_{ijk} = \mathcal{A}_{jik} \qquad (4)$$
for any $1\le i,j,k\le n$. The cost function is defined as
$$f(\boldsymbol{U}) = \|\operatorname{diag}(\mathcal{W})\|^{2},$$
where $\mathcal{W} := \mathcal{A}\bullet_1\boldsymbol{U}^{\mathsf H}\bullet_2\boldsymbol{U}^{\mathsf H}\bullet_3\boldsymbol{U}^{\mathsf T}$ and $\boldsymbol{U}\in\mathrm{U}_n$.
(iii) Approximate diagonalization of a partially symmetric 4th order tensor. Let $\mathcal{A}\in\mathbb{C}^{n\times n\times n\times n}$ satisfy the partial symmetry conditions
$$\mathcal{A}_{ijkl} = \mathcal{A}_{jikl} = \mathcal{A}_{ijlk}, \qquad \mathcal{A}_{ijkl} = \overline{\mathcal{A}_{klij}} \qquad (5)$$
for any $1\le i,j,k,l\le n$. The cost function is defined as
$$f(\boldsymbol{U}) = \|\operatorname{diag}(\mathcal{W})\|^{2},$$
where $\mathcal{W} := \mathcal{A}\bullet_1\boldsymbol{U}^{\mathsf H}\bullet_2\boldsymbol{U}^{\mathsf H}\bullet_3\boldsymbol{U}^{\mathsf T}\bullet_4\boldsymbol{U}^{\mathsf T}$ and $\boldsymbol{U}\in\mathrm{U}_n$.
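For instance, the Hermitian cost (3) can be evaluated as follows (a minimal sketch; the function name and random data are illustrative only):

```python
import numpy as np

def hermitian_diag_cost(U, matrices):
    """f(U) = sum_l ||diag(U^H A^(l) U)||^2 for Hermitian A^(l)."""
    total = 0.0
    for A in matrices:
        W = U.conj().T @ A @ U
        total += np.sum(np.abs(np.diag(W)) ** 2)
    return total

# Illustrative data: two random Hermitian matrices and a random unitary U.
n = 5
A1 = np.random.randn(n, n) + 1j * np.random.randn(n, n); A1 = (A1 + A1.conj().T) / 2
A2 = np.random.randn(n, n) + 1j * np.random.randn(n, n); A2 = (A2 + A2.conj().T) / 2
U, _ = np.linalg.qr(np.random.randn(n, n) + 1j * np.random.randn(n, n))
print(hermitian_diag_cost(U, [A1, A2]))
```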
The motivation behind these cost functions comes from blind source separation [7]. In fact, the cost function (3) is used for simultaneous approximate diagonalization of covariance matrices [12, 13]. An example of a 3rd order tensor satisfying property (4) is the cumulant tensor of a complex random vector [14]. An example of a 4th order tensor satisfying property (5) is the cumulant tensor of a complex random vector [15], which may itself stem from a Fourier transform [16].
II Jacobi-type algorithms
II-A General Jacobi algorithm on $\mathrm{O}_n$
Let $\theta\in\mathbb{R}$ be an angle and $(i,j)$ be a pair of indices with $1\le i<j\le n$. Denote by $\boldsymbol{G}^{(i,j,\theta)}\in\mathrm{O}_n$ the Givens rotation matrix, as defined e.g. in [17, 7, 1]. The general Jacobi algorithm for problem (1) on $\mathrm{O}_n$ can be summarized as follows.
Algorithm 1. (General Jacobi algorithm on $\mathrm{O}_n$)
Input: A starting point $\boldsymbol{X}_0\in\mathrm{O}_n$.
Output: Sequence of iterates $\{\boldsymbol{X}_k\}_{k\ge 1}$.
For $k = 1, 2, \ldots$ until a stopping criterion do
Choose the pair $(i_k, j_k)$ according to a certain pair selection rule.
Compute $\theta_k^{*}$ that maximizes
$$h_k(\theta) := f(\boldsymbol{X}_{k-1}\boldsymbol{G}^{(i_k,j_k,\theta)}) \qquad (6)$$
Update $\boldsymbol{X}_k = \boldsymbol{X}_{k-1}\boldsymbol{G}^{(i_k,j_k,\theta_k^{*})}$.
End for
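The following sketch illustrates one iteration of Algorithm 1 (names are ours; the inner maximization of (6) is done here by a crude grid search, whereas for costs such as (2) it admits closed-form or polynomial-rooting solutions [1, 7]):

```python
import numpy as np

def givens(n, i, j, theta):
    """Givens rotation G^(i,j,theta): identity except in the (i, j) plane."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = G[j, j] = c
    G[i, j], G[j, i] = -s, s
    return G

def jacobi_step(X, f, i, j, n_grid=720):
    """One update of Algorithm 1: pick theta maximizing f(X G^(i,j,theta))."""
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_grid)
    vals = [f(X @ givens(X.shape[0], i, j, t)) for t in thetas]
    return X @ givens(X.shape[0], i, j, thetas[int(np.argmax(vals))])
```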
II-B General Jacobi algorithm on $\mathrm{U}_n$
Let $(\theta,\phi)$ be a pair of angles and $(i,j)$ be a pair of indices with $1\le i<j\le n$. Denote by $\boldsymbol{G}^{(i,j,\theta,\phi)}\in\mathrm{U}_n$ the complex Givens transformation matrix, as defined e.g. in [3, 7]. The general Jacobi algorithm for problem (1) on $\mathrm{U}_n$ can be summarized as follows.
Algorithm 2. (General Jacobi algorithm on $\mathrm{U}_n$)
Input: A starting point $\boldsymbol{U}_0\in\mathrm{U}_n$.
Output: Sequence of iterates $\{\boldsymbol{U}_k\}_{k\ge 1}$.
For $k = 1, 2, \ldots$ until a stopping criterion do
Choose the pair $(i_k, j_k)$ according to a certain pair selection rule.
Compute $(\theta_k^{*}, \phi_k^{*})$ that maximizes
$$h_k(\theta,\phi) := f(\boldsymbol{U}_{k-1}\boldsymbol{G}^{(i_k,j_k,\theta,\phi)}) \qquad (7)$$
Update $\boldsymbol{U}_k = \boldsymbol{U}_{k-1}\boldsymbol{G}^{(i_k,j_k,\theta_k^{*},\phi_k^{*})}$.
End for
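One common parametrization of the complex Givens transformation is sketched below (conventions vary slightly across [3, 7]; this is an illustrative choice):

```python
import numpy as np

def complex_givens(n, i, j, theta, phi):
    """Complex Givens matrix: identity except for the (i, j) block
    [[cos(theta), -exp(-1j*phi)*sin(theta)],
     [exp(1j*phi)*sin(theta), cos(theta)]], which is unitary."""
    G = np.eye(n, dtype=complex)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = G[j, j] = c
    G[i, j] = -np.exp(-1j * phi) * s
    G[j, i] = np.exp(1j * phi) * s
    return G
```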
Remark II.1.
In Algorithm 1, if several equivalent maximizers exist in (6), we choose the one with the angle of smallest magnitude. The same convention applies to Algorithm 2.
II-C Convergence of Jacobi-C algorithm on $\mathrm{O}_n$
One natural pair selection rule in Algorithm 1 and Algorithm 2 is the cyclic fashion [17, 7]:
$$(1,2)\to(1,3)\to\cdots\to(1,n)\to(2,3)\to\cdots\to(n-1,n)\to(1,2)\to\cdots \qquad (8)$$
We call the Jacobi algorithm with the cyclic rule (8) the Jacobi-C algorithm. This cyclic rule is used in the well-known Jacobi CoM2 algorithm [5, 6, 10, 7] and the JADE algorithm [11].
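The cyclic rule (8) simply sweeps through all pairs in lexicographic order and repeats, e.g.:

```python
from itertools import cycle, combinations

def cyclic_pairs(n):
    """Yields (1,2), (1,3), ..., (1,n), (2,3), ..., (n-1,n), (1,2), ..."""
    return cycle(combinations(range(1, n + 1), 2))

pairs = cyclic_pairs(4)
print([next(pairs) for _ in range(8)])
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (1, 2), (1, 3)]
```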
Algorithm 3. (Jacobi-C algorithm on $\mathrm{O}_n$) Algorithm 1 with the cyclic pair selection rule (8).
Let $f$ be as in (2) with $d = 3$ and $L = 1$. The convergence properties of Algorithm 3 in this case can be found in [2, Section 6] and [4, Section 5]. We do not present them in detail here due to space constraints.
Remark II.2.
(i) It was shown in [2, Remark 6.5] that, if $f$ is as in (2) with $d = 3$ and $L = 1$, then Algorithm 3 may converge to a saddle point of $f$.
(ii) Let $f$ be as in [4, Equation (6)]. Then Algorithm 3 is the Jacobi-type algorithm for low-rank orthogonal approximation of symmetric tensors in [4]. The global convergence of this algorithm is still unknown.
III Convergence analysis of Jacobi-G algorithms
In this section, we present our recent results in [1, 2, 3, 4] about the weak convergence and global convergence of Algorithm 1 and Algorithm 2 under a gradient-based pair selection rule. Before that, we need an important result based on the well-known Łojasiewicz gradient inequality [18, 19, 20, 21].
III-A Łojasiewicz gradient inequality
Let $T_{\boldsymbol{X}}\mathcal{M}$ be the tangent space of $\mathcal{M}$ at $\boldsymbol{X}$. Let $\nabla f(\boldsymbol{X})$ be the Euclidean gradient of $f$, and $\operatorname{grad} f(\boldsymbol{X})$ be the projection of $\nabla f(\boldsymbol{X})$ onto $T_{\boldsymbol{X}}\mathcal{M}$, i.e., the Riemannian gradient (see [22, Section 3.6] for a detailed definition). The following results were proved in [21].
Lemma III.1 ([21, Proposition 2.2]). Let $\mathcal{M}\subseteq\mathbb{R}^m$ be an analytic submanifold and $f:\mathcal{M}\to\mathbb{R}$ be real analytic. Then for any $\boldsymbol{X}\in\mathcal{M}$, there exist $\delta>0$, $\sigma\in(0,1/2]$ and $\Lambda>0$ such that, for any $\boldsymbol{Y}\in\mathcal{M}$ with $\|\boldsymbol{Y}-\boldsymbol{X}\|<\delta$,
$$|f(\boldsymbol{Y})-f(\boldsymbol{X})|^{1-\sigma} \le \Lambda\,\|\operatorname{grad} f(\boldsymbol{Y})\|. \qquad (9)$$
Theorem III.2 ([21, Theorem 2.3]).
Let $\mathcal{M}\subseteq\mathbb{R}^m$ be an analytic submanifold
and $\{\boldsymbol{X}_k\}_{k\ge 1}\subseteq\mathcal{M}$ be a sequence.
Suppose that $f$ is real analytic (see [23] for a definition) and, for large enough $k$,
(i) there exists $\kappa>0$ such that $|f(\boldsymbol{X}_{k+1})-f(\boldsymbol{X}_k)| \ge \kappa\,\|\operatorname{grad} f(\boldsymbol{X}_k)\|\,\|\boldsymbol{X}_{k+1}-\boldsymbol{X}_k\|$;
(ii) $f(\boldsymbol{X}_{k+1}) = f(\boldsymbol{X}_k)$ implies that $\boldsymbol{X}_{k+1} = \boldsymbol{X}_k$.
Then any accumulation point $\boldsymbol{X}_*$ of $\{\boldsymbol{X}_k\}_{k\ge 1}$ is the only limit point.
If, in addition, for some $\kappa'>0$ and for large enough $k$ it holds that
$$\|\boldsymbol{X}_{k+1}-\boldsymbol{X}_k\| \ge \kappa'\,\|\operatorname{grad} f(\boldsymbol{X}_k)\|,$$
then the following convergence rates apply:
$$\|\boldsymbol{X}_k-\boldsymbol{X}_*\| \le C\begin{cases} e^{-ck} & \text{if } \sigma = 1/2, \\ k^{-\sigma/(1-2\sigma)} & \text{if } 0 < \sigma < 1/2, \end{cases}$$
for some constants $C, c > 0$, where $\sigma$ is the parameter in (9) at the limit point $\boldsymbol{X}_*$.
III-B Convergence of Jacobi-G algorithm on $\mathrm{O}_n$
A gradient-based pair selection rule for Algorithm 1 was proposed in [24]: choose a pair $(i_k, j_k)$ at each iteration satisfying
$$|h_k'(0)| \ge \delta\,\|\operatorname{grad} f(\boldsymbol{X}_{k-1})\|, \qquad (10)$$
where $h_k$ is as in (6) and $\delta > 0$ is a small positive constant. We call the Jacobi algorithm with this rule the Jacobi-G algorithm.
Algorithm 4. (Jacobi-G algorithm on $\mathrm{O}_n$) Algorithm 1 with the gradient-based pair selection rule (10).
It was shown in [1, Lemma 3.1] (an easy generalisation of [24, Lemma 5.2]) that, in Algorithm 4, we can always choose a pair satisfying (10), provided $\delta$ is small enough. By combining this result with the proof of [24, Theorem 5.4], we easily obtain the following result about the weak convergence of Algorithm 4.
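On $\mathrm{O}_n$, the Riemannian gradient is $\operatorname{grad} f(\boldsymbol{X}) = \boldsymbol{X}\boldsymbol{\Lambda}(\boldsymbol{X})$, where $\boldsymbol{\Lambda}(\boldsymbol{X}) = \frac{1}{2}(\boldsymbol{X}^{\mathsf T}\nabla f(\boldsymbol{X}) - \nabla f(\boldsymbol{X})^{\mathsf T}\boldsymbol{X})$ is skew-symmetric, and $h_k'(0) = 2\Lambda(\boldsymbol{X}_{k-1})_{j_k i_k}$ under the Givens convention sketched above. A possible implementation of rule (10) then reads (an illustrative sketch, assuming the Euclidean gradient is available; taking the largest off-diagonal entry always satisfies (10) when $\delta \le 2/\sqrt{n(n-1)}$):

```python
import numpy as np

def skew_gradient(X, grad_f):
    """Lambda(X) = skew(X^T grad_f): coordinates of the Riemannian gradient;
    since X is orthogonal, ||grad f(X)||_F = ||Lambda(X)||_F."""
    B = X.T @ grad_f
    return (B - B.T) / 2

def select_pair_G(X, grad_f, delta):
    """Return a pair (i, j) satisfying rule (10), i.e.
    2*|Lambda_ij| >= delta*||Lambda||_F (assumes grad f(X) != 0).
    The largest entry works whenever delta <= 2/sqrt(n*(n-1))."""
    Lam = skew_gradient(X, grad_f)
    i, j = np.unravel_index(np.argmax(np.abs(np.triu(Lam, 1))), Lam.shape)
    assert 2 * abs(Lam[i, j]) >= delta * np.linalg.norm(Lam)
    return i, j
```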
Theorem III.3 ([1, Theorem 3.3]).
Let $f:\mathrm{O}_n\to\mathbb{R}$ be continuously differentiable. Then every accumulation point of the iterates of Algorithm 4 is a stationary point of $f$.
Based on Theorem III.2, the following global convergence result for Algorithm 4 was proved.
Theorem III.4 ([1, Theorem 5.6]).
Let $f$ be as in (2) with $d = 3$ or $d = 4$. Then Algorithm 4 converges to a stationary point of $f$ for any starting point $\boldsymbol{X}_0\in\mathrm{O}_n$.
III-C Convergence of Jacobi-G algorithm on $\mathrm{U}_n$
Based on a similar idea as in Algorithm 4, the following Jacobi-G algorithm on $\mathrm{U}_n$ was formulated in [3, Section 3].
Algorithm 5. (Jacobi-G algorithm on $\mathrm{U}_n$)
Input: A starting point $\boldsymbol{U}_0\in\mathrm{U}_n$, a positive constant $\delta$.
Output: Sequence of iterates $\{\boldsymbol{U}_k\}_{k\ge 1}$.
For $k = 1, 2, \ldots$ until a stopping criterion do
Choose an index pair $(i_k, j_k)$ satisfying
$$\|\operatorname{grad} f(\boldsymbol{U}_{k-1})_{(i_k,j_k)}\| \ge \delta\,\|\operatorname{grad} f(\boldsymbol{U}_{k-1})\|, \qquad (11)$$
where $\operatorname{grad} f(\boldsymbol{U})_{(i,j)}$ denotes the restriction of the Riemannian gradient to the $(i,j)$ plane.
Compute $(\theta_k^{*}, \phi_k^{*})$ that maximizes (7).
Update $\boldsymbol{U}_k = \boldsymbol{U}_{k-1}\boldsymbol{G}^{(i_k,j_k,\theta_k^{*},\phi_k^{*})}$.
End for
It was shown in [3, Corollary 3.3] that, in Algorithm 5, we can always choose a pair satisfying (11). Then, based on [25, Theorem 2.5], the following weak convergence result for Algorithm 5 was proved in [3, Proposition 5.4].
Proposition III.5.
Suppose that $f$ has Lipschitz continuous gradient in the convex hull of $\mathrm{U}_n$. Then every accumulation point of the iterates of Algorithm 5 is a stationary point of $f$.
Let the elementary function $h:S^2\to\mathbb{R}$ be defined as in [3, Section 2.4], the complex Givens transformation for a pair $(i,j)$ being parametrized by a unit vector $\boldsymbol{x}\in S^2\subseteq\mathbb{R}^3$. Assume that it always has the form
$$h(\boldsymbol{x}) = c^{(i,j)} + \boldsymbol{x}^{\mathsf T}\boldsymbol{\Gamma}^{(i,j)}\boldsymbol{x} \qquad (12)$$
for all $\boldsymbol{x}\in S^2$ and $1\le i<j\le n$, where
the constant $c^{(i,j)}$ does not depend on $\boldsymbol{x}$, and $\boldsymbol{\Gamma}^{(i,j)}\in\mathbb{R}^{3\times 3}$ is a symmetric matrix determined by $(i,j)$ and the current iterate. It was shown in [3, Lemma 2.3] that all the cost functions in Example I.2 satisfy (12). Now, we define
$$\boldsymbol{H}^{(i,j)} := \boldsymbol{\Gamma}^{(i,j)} - (\boldsymbol{x}_0^{\mathsf T}\boldsymbol{\Gamma}^{(i,j)}\boldsymbol{x}_0)\,\boldsymbol{I}_3,$$
where $\boldsymbol{x}_0\in S^2$ is the parameter corresponding to the identity transformation.
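Under (12), the elementary maximization in Algorithm 5 reduces to a $3\times 3$ symmetric eigenproblem: the maximizer of $c + \boldsymbol{x}^{\mathsf T}\boldsymbol{\Gamma}\boldsymbol{x}$ over the unit sphere is a unit eigenvector for the largest eigenvalue of $\boldsymbol{\Gamma}$. A minimal sketch:

```python
import numpy as np

def max_quadratic_on_sphere(Gamma):
    """Unit vector x maximizing x^T Gamma x for symmetric Gamma:
    a top eigenvector of Gamma (Rayleigh quotient argument)."""
    _, V = np.linalg.eigh(Gamma)   # eigenvalues in ascending order
    return V[:, -1]

Gamma = np.random.randn(3, 3)
Gamma = (Gamma + Gamma.T) / 2
x = max_quadratic_on_sphere(Gamma)
print(x @ Gamma @ x, np.linalg.eigh(Gamma)[0][-1])  # agree up to rounding
```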
Let $\operatorname{Hess} f(\boldsymbol{U})$ be the Riemannian Hessian (see [22, Section 5.5] for a detailed definition) of $f$ at $\boldsymbol{U}$, which is a linear map $T_{\boldsymbol{U}}\mathrm{U}_n\to T_{\boldsymbol{U}}\mathrm{U}_n$. Based on Theorem III.2, the following global convergence result for Algorithm 5 was proved in [3, Theorem 7.4].
Theorem III.6.
Suppose that $f$ has Lipschitz continuous gradient in the convex hull of $\mathrm{U}_n$.
Let $\boldsymbol{U}_*$ be an accumulation point of the iterates of Algorithm 5 (and thus $\operatorname{grad} f(\boldsymbol{U}_*) = 0$ by Proposition III.5), where (12) is satisfied.
Assume that $\boldsymbol{H}^{(i,j)}$ is negative definite at $\boldsymbol{U}_*$ for all pairs $(i,j)$.
Then
(i) $\boldsymbol{U}_*$ is the only limit point, and the convergence rates in Theorem III.2 apply;
(ii) if the rank of the Riemannian Hessian $\operatorname{Hess} f(\boldsymbol{U}_*)$ is maximal at $\boldsymbol{U}_*$ (i.e., equal to $n^2 - n$), then the speed of convergence is linear.
IV Experiments
In this section, we conduct some experiments to illustrate the convergence behaviour of the Jacobi-C and Jacobi-G algorithms on $\mathrm{O}_n$.
Example IV.1.
We randomly generate symmetric tensors $\mathcal{A}^{(\ell)}\in\mathrm{S}^3(\mathbb{R}^n)$ and let $f$ be as in (2) with $d = 3$. We use the same starting point $\boldsymbol{X}_0\in\mathrm{O}_n$ in Algorithm 3 and Algorithm 4, and a small positive $\delta$ in Algorithm 4. The results are shown in Figure 1.
V Conclusion
Thanks to the Łojasiewicz gradient inequality, we could establish the global convergence of Jacobi-G algorithms on $\mathrm{O}_n$ and $\mathrm{U}_n$. However, for the moment, there are still some open problems:
(i) if $f$ is as in (2) with $d \ge 5$, the global convergence of Algorithm 4 is still unknown;
(ii) the global convergence of Algorithm 3 is still unknown.
References
[1] J. Li, K. Usevich, and P. Comon, "Globally convergent Jacobi-type algorithms for simultaneous orthogonal symmetric tensor diagonalization," SIAM J. Matrix Anal. Appl., vol. 39, no. 1, pp. 1–22, 2018.
 [2] ——, “On approximate diagonalization of third order symmetric tensors by orthogonal transformations,” Linear Algebra and its Applications, vol. 576, pp. 324–351, 2019.
[3] K. Usevich, J. Li, and P. Comon, "Approximate matrix and tensor diagonalization by unitary transformations: convergence of Jacobi-type algorithms," arXiv:1905.12295, 2019.
[4] J. Li, K. Usevich, and P. Comon, "Jacobi-type algorithm for low rank orthogonal approximation of symmetric tensors and its convergence analysis," arXiv:1911.00659, 2019.
[5] P. Comon, "Independent Component Analysis," in Higher Order Statistics, J.-L. Lacoume, Ed. Amsterdam, London: Elsevier, 1992, pp. 29–38.
 [6] ——, “Independent component analysis, a new concept?” Signal Processing, vol. 36, no. 3, pp. 287–314, 1994.
 [7] P. Comon and C. Jutten, Eds., Handbook of Blind Source Separation. Oxford: Academic Press, 2010.
 [8] P. Comon, G. Golub, L.H. Lim, and B. Mourrain, “Symmetric tensors and symmetric tensor rank,” SIAM Journal on Matrix Analysis and Applications, vol. 30, no. 3, pp. 1254–1279, 2008.
 [9] L. Qi and Z. Luo, Tensor analysis: Spectral theory and special tensors. SIAM, 2017.
[10] P. Comon, "Tensor Diagonalization, a useful tool in Signal Processing," in 10th IFAC Symposium on System Identification (IFAC-SYSID), M. Blanke and T. Soderstrom, Eds., vol. 1. Copenhagen, Denmark: IEEE, Jul. 1994, pp. 77–82.
[11] J.-F. Cardoso and A. Souloumiac, "Blind beamforming for non-Gaussian signals," IEE Proceedings F (Radar and Signal Processing), vol. 140, no. 6, pp. 362–370, 1993.
[12] ——, "Blind beamforming for non-Gaussian signals," IEE Proceedings F (Radar and Signal Processing), vol. 140, no. 6, pp. 362–370, 1993.
[13] ——, "Jacobi angles for simultaneous diagonalization," SIAM Journal on Matrix Analysis and Applications, vol. 17, no. 1, pp. 161–164, 1996.
[14] L. De Lathauwer, "Signal processing based on multilinear algebra," Ph.D. dissertation, K.U. Leuven, 1997.
[15] P. Comon, "From source separation to blind equalization, contrast-based approaches," in Int. Conf. on Image and Signal Processing (ICISP'01), Agadir, Morocco, May 2001, pp. 20–32, preprint: hal-01825729.

[16] B. Emile, P. Comon, and J. Le Roux, "Estimation of time delays with fewer sensors than sources," IEEE Transactions on Signal Processing, vol. 46, no. 7, pp. 2012–2015, July 1998.
[17] G. Golub and C. Van Loan, Matrix Computations, 3rd ed. Johns Hopkins University Press, 1996.
[18] S. Łojasiewicz, "Ensembles semi-analytiques," IHES notes, 1965.
[19] S. Łojasiewicz, "Sur la géométrie semi- et sous-analytique," Annales de l'institut Fourier, vol. 43, no. 5, pp. 1575–1595, 1993.
[20] P.-A. Absil, R. Mahony, and B. Andrews, "Convergence of the iterates of descent methods for analytic cost functions," SIAM Journal on Optimization, vol. 16, no. 2, pp. 531–547, 2005.
[21] R. Schneider and A. Uschmajew, "Convergence results for projected line-search methods on varieties of low-rank matrices via Łojasiewicz inequality," SIAM Journal on Optimization, vol. 25, no. 1, pp. 622–646, 2015.
 [22] P.A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds. Princeton, NJ: Princeton University Press, 2008.
[23] S. Krantz and H. Parks, A Primer of Real Analytic Functions, 2nd ed. Birkhäuser Boston, 2002.
[24] M. Ishteva, P.-A. Absil, and P. Van Dooren, "Jacobi algorithm for the best low multilinear rank approximation of symmetric tensors," SIAM J. Matrix Anal. Appl., vol. 34, no. 2, pp. 651–672, 2013.
 [25] N. Boumal, P.A. Absil, and C. Cartis, “Global rates of convergence for nonconvex optimization on manifolds,” IMA Journal of Numerical Analysis, vol. 39, no. 1, pp. 1–33, 2018.