Many problems in engineering, medicine and computer science make use of the geometric mean of two matrices , with and being positive definite matrices. These applications include the calculation of electrical networks 
, diffusion tensor imaging and image deblurring . For these applications there are methods to approximate a function of a matrix, as shown in  and , that are often used with acceptable results in term of computational cost.
But in the solution of elliptic partial differential equations with domain decomposition (such as in and ), in certain cases it is required to compute the geometric mean of two matrices times a vector , where and are often very large and sparse. A matrix of dimension is called sparse if defined as the number of its non-zero elements, it holds
The use of algorithms created to compute could lead to unsustainable computational times or sometimes lead to the impossibility of calculating the result, since these methods do not exploit the sparse structure of data, that instead should represent an advantage and a saving in terms of operations.
For these reasons, methods based on Krylov spaces are preferred. In  and , the usage of a generalized version of the Lanczos method is exploited (we briefly present it in Section 3). In this work, we will use methods based on both polynomial and rational Krylov spaces in order to compute the geometric mean of two matrices times a vector in an efficient way, by exploiting the sparse structure of the involved matrices. In fact, these methods are easier to use, because they only compute matrix-vector products and solve linear systems with the conjugate gradient method. Moreover, their use as iterative methods is another advantage to reduce the computational cost. Specifically, we will focus on the Arnoldi method and on some variations of the rational Arnoldi method, explained in Section 3 and Section 4 respectively. Experiments presented in Section 5 show how these methods, especially rational Arnoldi ones, are efficient in terms of approximation and computational time.
2 Geometric Mean of Two Matrices
The geometric mean of two positive definite matrices is defined in  as
Letand a diagonal matrix exist such that . The square root of M is defined as
that turns out to be still positive definite. The matrix is diagonalizable with positive eigenvalues, because it is similar to which is positive definite, so the notation makes sense. It can be shown that , then we use the notation without ambiguity. We observe also that
We also have that, if both and are positive definite matrices, then so is .
To get an efficient computation of it is important not to use expensive operations. The most problematic part is to find a good approximation of without explicitly computing or the square root of . We recall the fact that is a Markov function, i.e. a function of the form
with complex measure over the closed set , because
3 Polynomial Krylov Spaces
Given and , the -th Krylov polynomial space associated with them is defined as
With the increase of , we get Krylov spaces all nested one inside another, . It can be shown that if , there exists such that
For each step the space has a basis , but from a certain on the new term is linearly dependent with the other vectors in the basis, giving that is the same of .
3.1 Generalized Lanczos Method
First we briefly recall the variation of the Lanczos method presented in  and , called the generalized Lanczos method (see ,  and  for an introduction to the Lanczos method). This iterative method is adapted in  in order to compute the geometric mean of two positive definite matrices times a vector. Given two positive definite matrices , the method constructs at the step a new vector that is -orthogonal, that is with , such that for each step we have
with , the identity matrix and its -th column.
The matrix is the projection of into the space generated by the columns of the matrix . At the -th step this gives us the equalities
Using this method we find an expression for (that is the same of ) as
To compute , with , we can stop at the -th step, and taking , we have
because , that is a better approximation the closer is to .
3.2 Arnoldi Method
The Arnoldi method, exposed in , is a general version of the Lanczos method that does not need the starting matrices to be positive definite. This method computes the decomposition
where is unitary, and the vectors are its columns, and is upper Hessenberg. From the equality (13) we get that
The previous relation can be rewritten as
and, because is unitary, we have that
If , so , where . The Arnoldi method is a Krylov space method because
that is the vectors constructed by the Arnoldi method are a basis for the Krylov space . Used as an iterative method, at the -th step the Arnoldi method give us the factorization
where consist of the first column of , is a upper Hessenberg matrix and is the -th column of the identity matrix. Because
we have that is the projection of in . The orthonormalization of vectors comes from the modified Gram-Schmidt method (see ), but in finite precision arithmetic a loss of orthogonality can occur, as shown in .
4 Rational Krylov Spaces
The definition of rational Krylov spaces is similar to the polynomial one, except for the presence of a denominator (see  and  for an introduction). Given and , and the sequence of polynomials of degree
in which the values are called poles and are numbers in the extended complex plane different from all the eigenvalues of and , the rational Krylov space of order associated to them is defined as
In the case in which for some the corresponding factor in (23) is replaced by .
In the previous definition it is not taken into account the case , but we can exclude every other value simply by using the new values and . This can be done considering the new polynomial that uses poles and the new matrix instead of . Like the polynomial ones, rational Krylov spaces are also of increasing dimension and are nested until a certain dimension from which they do not change anymore. Polynomial Krylov spaces can be seen as a special case of the rational ones, when every pole is , that is when .
4.1 Rational Arnoldi Method
To compute a basis for the rational Krylov space we can use the rational Arnoldi method, as proposed in  and in . Starting with , in the following iterations the vector is generated orthogonalizing
against the previous orthonormal vectors. If , we have that
and , instead if we can proceed finding a new vector that is orthonormal to .
We also have that
that gives us the decomposition
in which , is the identity matrix and is its -th column. Defining
we have that
where . Finally, if the last pole is , the decomposition is simplified into
where is the upper part of .
Computed the basis of , we can approximate as
The benefit of this method is that, in many interesting cases, is a good approximation of also if is small: it is required to compute , but is small compared to itself. If the last pole is , it is not even necessary to compute , but it can be computed . Also and are small compared to , so computing or the product is possible. For our problem we have found the approximation
starting the method with .
From a Crouzeix theorem exposed in  we know that there exists a constant for which , where the second norm indicates the maximum absolute value of over the compact set on which the function is approximated, that in this case is a set containing the spectrum of . As explained in , this helps us to find a result of semi-optimality for the approximation given by the rational Arnoldi method: in fact, if is analytic in and we define with , where is a polynomial of degree , so it stands
It seems obvious that the choice of the poles is strongly connected to the function which we have to approximate. To find a small value of , it is necessary to search a sufficiently uniform approximation of over . Moreover, sometimes it is convenient to consider only a certain set for the poles disjoint from . For Markov functions, and especially for the function that appears in our problem, a good set is , and as we can use , with and respectively the minimum and maximum eigenvalues of (as above, see  for further informations). Computing the eigenvalues of can be computationally expensive, however only the extreme eigenvalues are required.
4.1.1 Extended Krylov Method
A special case of the rational Arnoldi method is the so-called extended Krylov method, proposed in  and then investigated in  and . It simply chooses alternately the poles as and . This method is good for approximate Markov functions. Its benefit is obvious: the choice of the poles is completely a priori, without information on the spectrum of . It had been observed in  that in some cases this is equivalent to a rational Arnoldi method in which all the poles are chosen as an unique asymptotically optimal value. However, finding the optimal pole needs some informations on the spectrum of .
4.1.2 Generalized Leja Points
An alternative way to choose poles, exposed in , is the so-called generalized Leja points method, that rely on some logarithmic potential theory’s instruments (for a complete mathematical explanation about it see  and ). Given two closed sets and , both of non-zero logarithmic capacity (the logarithmic capacity of a set is defined as , where is the Robin constant for ) and of positive distance, the pair is called a condenser, and we can associate a number to it called condenser capacity (still see  for further explanations). We now consider the sequence of functions
with nodes and poles . Our aim is to make this sequence in absolute value as large as possible over and as small as possible over . It can be shown that for this kind of sequences the relation
To find sequences of functions in which this inequality is true is called the generalized Zolotarev problem for the condenser .
A practical method to obtain such functions is the following greedy algorithm: starting with and as points of minimum distance from and , the following points and are recursively determined in a way such that lasts
The points are called generalized Leja points.
We want to approximate the Markov function . In order to archive this, in  it is suggested to choose sets and as and . These sets are optimal respectively for the choice of nodes and poles for that function.
4.1.3 Adaptive Poles
To approximate Markov functions, another way to choose the poles for the rational Arnoldi method is also qsuggested in : they are called adaptive poles. We consider the function
in which are the Ritz values of the projection of at the -th step, i.e. the eigenvalues of , and is the denominator associated with the corresponding rational Krylov space. Our aim is to minimize the error in the approximation for each step, and to do so we have to make uniformly large on the set choosing the next pole as
This method is substantially a black-box method, because we do not need any information on the spectrum of , but we only have to compute the eigenvalues of the matrix , that are considerably less than the eigenvalues of itself.
5 Experiments and Conclusions
We are now going to consider some experiments in order to test the accuracy of the methods presented above. The first important thing is to analyse their convergence rate, that is how close they are to approximate the value of . For this experiment we have used two positive definite matrices and and a vector randomly generated. We have run the methods up to 30 steps for each one.
The results are shown in the following graph, where we have on the x-axis the number of steps (or the number of poles considered for the Minimax method and the Gauss-Chebyshev quadrature, explained in  and , respectively) and on the y-axis the relative errors with respect to the real value of in a logarithmic scale.
Figure 1 shows how some methods, like the rational Arnoldi method with Leja points or adaptive poles, behave similarly to the Minimax method (they have almost the same convergence rate, with peaks in their graphics due to some numerical instabilities), but for the latter the knowledge of the full spectrum of the matrices is required.
Another important aspect we had to consider is how much time is required by these methods to compute the results. In fact, taking advantage of the sparse structure, Krylov methods use less operations than those methods that do not exploit it. We have used matrices and structured respectively as the finite-difference discretization matrices of the 1D and 2D Laplacian of variable dimension and a vector with every component equals to . Both matrices are implemented using the Octave CSR sparse format.
We have compared various methods: a method based on Schur decomposition presented in  (that directly compute and only later makes the product with ), the Minimax method presented in  (based on contour integrals) and two versions of the rational Arnoldi method, the one with Leja points and the adaptive one. Every method is run for 30 steps. We have measured the execution time of these methods with the Octave commands tic and toc (we have considered the time required to compute the extreme eigenvalues of with the Octave function eigs()
in the estimation for the rational Arnoldi method with Leja points).
Table 1 shows how the computational time required by the Minimax method and the method based on Schur decomposition grows very quickly. For matrices of dimension , the estimated times for the aforementioned methods become practically unsustainable, while methods based on Krylov spaces require only few seconds. These results show how important it is to preserve and exploit the sparse structure of the starting matrices in order to complete these computations in a reasonable time, gaining in terms of operations.
- Anderson et al.  W. N. Anderson, T. D. Morley, and G. E. Trapp. Ladder networks, fixpoints, and the geometric mean. Circuits Systems Signal Process, 2(3):259–268, 1983.
Arioli and Loghin 
M. Arioli and D. Loghin.
Discrete interpolation norms with applications.SIAM J. Numer. Anal., 47(4):2924–2951, 2009.
- Arioli et al.  M. Arioli, D. Kourounis, and D. Loghin. Discrete fractional Sobolev norms for domain decomposition preconditioning. IMA Journal of Numerical Analysis, 33(1):318–342, 2013.
- Beckermann and Reichel  B. Beckermann and L. Reichel. Error estimates and evaluation of matrix functions via the Faber transform. SIAM J. Numerical Analysis, 47(5):3849–3883, 2009.
- Bhatia  R. Bhatia. Positive definite matrices. Princeton University Press, 2007.
- Bini et al.  D. Bini, M. Capovani, and O. Menchi. Metodi numerici per l’algebra lineare. Zanichelli, 1988.
- Crouzeix  M. Crouzeix. Numerical range and functional calculus in Hilbert space. Journal of Functional Analysis, 244(2):668–690, 2006.
- Druskin and Knizhnerman  V. Druskin and L. Knizhnerman. Extended Krylov subspaces: approximation of the matrix square root and related functions. SIAM J. Numer. Anal., 19(3):755–771, 1998.
- Estatico and Benedetto  C. Estatico and F. Di Benedetto. Shift-invariant approximations of structured shift-variant blurring matrices. Numerical Algorithms, 62(4):615–635, 2013.
- Golub and Van Loan  G. H. Golub and C. F. Van Loan. Matrix computations. The Johns Hopkins University Press, 1989.
- Güttel  S. Güttel. Rational Krylov approximation of matrix functions: numerical methods and optimal pole selection. GAMM-Mitt., 36(1):8–31, 2013.
- Güttel and Knizhnerman  S. Güttel and L. Knizhnerman. Automated parameter selection for rational Arnoldi approximation of Markov functions. Proceedings in Applied Mathematics and Mechanics, 11:15–18, 2011.
- Hale et al.  N. Hale, N. J. Higham, and L. N. Trefethen. Computing , and related matrix functions by contour integrals. SIAM J. Numer. Anal., 46(5):2505–2523, 2008.
- Higham  N. J. Higham. Functions of matrices: theory and computation. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2008.
- Iannazzo  B. Iannazzo. The geometric mean of two matrices from a computational viewpoint. Technical report, 2011. Available on arXiv - http://arxiv.org/abs/1201.0101.
- Knizhnerman and Simoncini  L. Knizhnerman and V. Simoncini. A new investigation of the extended Krylov subspace method for matrix function evaluations. SIAM J. Numer. Linear Algebra Appl., 17(4):615–638, 2010.
- Moakher  M. Moakher. On the averaging of symmetric positive-definite tensors. Journal of Elasticity, 82(3):273–296, 2006.
- Parlett  B. N. Parlett. The symmetric eigenvalue problem. Prentice-Hall, 1980.
- Ransford  T. Ransford. Potential theory in the complex plane. Press Syndicate of the University of Cambridge, 1995.
- Saff  E. B. Saff. Logarithmic potential theory with applications to approximation theory. Technical report, 2010. Available on arXiv - arXiv:1010.3760.
- Simoncini  V. Simoncini. A new iterative method for solving large-scale Lyapunov matrix equations. SIAM J. Sci. Comput., 29(3):1268–1288, 2007.
- Trefethen and Bau  L. N. Trefethen and D. Bau. Numerical linear algebra. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1997.