 # Krylov Iterative Methods for the Geometric Mean of Two Matrices Times a Vector

In this work, we are presenting an efficient way to compute the geometric mean of two positive definite matrices times a vector. For this purpose, we are inspecting the application of methods based on Krylov spaces to compute the square root of a matrix. These methods, using only matrix-vector products, are capable of producing a good approximation of the result with a small computational cost.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Many problems in engineering, medicine and computer science make use of the geometric mean of two matrices , with and being positive definite matrices. These applications include the calculation of electrical networks 

, diffusion tensor imaging

 and image deblurring . For these applications there are methods to approximate a function of a matrix, as shown in  and , that are often used with acceptable results in term of computational cost.

But in the solution of elliptic partial differential equations with domain decomposition (such as in

 and ), in certain cases it is required to compute the geometric mean of two matrices times a vector , where and are often very large and sparse. A matrix of dimension is called sparse if defined as the number of its non-zero elements, it holds

 limn,m→∞S(n,m)nm=0. (1)

The use of algorithms created to compute could lead to unsustainable computational times or sometimes lead to the impossibility of calculating the result, since these methods do not exploit the sparse structure of data, that instead should represent an advantage and a saving in terms of operations.

For these reasons, methods based on Krylov spaces are preferred. In  and , the usage of a generalized version of the Lanczos method is exploited (we briefly present it in Section 3). In this work, we will use methods based on both polynomial and rational Krylov spaces in order to compute the geometric mean of two matrices times a vector in an efficient way, by exploiting the sparse structure of the involved matrices. In fact, these methods are easier to use, because they only compute matrix-vector products and solve linear systems with the conjugate gradient method. Moreover, their use as iterative methods is another advantage to reduce the computational cost. Specifically, we will focus on the Arnoldi method and on some variations of the rational Arnoldi method, explained in Section 3 and Section 4 respectively. Experiments presented in Section 5 show how these methods, especially rational Arnoldi ones, are efficient in terms of approximation and computational time.

## 2 Geometric Mean of Two Matrices

The geometric mean of two positive definite matrices is defined in  as

 A(A−1B)1/2. (2)

Let

be a diagonalizable matrix with positive eigenvalues, so that an invertible matrix

and a diagonal matrix exist such that . The square root of M is defined as

 M1/2=Kdiag(√λ1,…,√λn)K−1 (3)

that turns out to be still positive definite. The matrix is diagonalizable with positive eigenvalues, because it is similar to which is positive definite, so the notation makes sense. It can be shown that , then we use the notation without ambiguity. We observe also that

 A#B=A(A−1B)1/2=(BA−1)1/2A=B(B−1A)1/2B=B#A. (4)

We also have that, if both and are positive definite matrices, then so is .

To get an efficient computation of it is important not to use expensive operations. The most problematic part is to find a good approximation of without explicitly computing or the square root of . We recall the fact that is a Markov function, i.e. a function of the form

 f(z)=∫Γdγ(x)z−x, (5)

with complex measure over the closed set , because

 f(z)=z−1/2=∫0−∞1z−xdxπ√−x. (6)

## 3 Polynomial Krylov Spaces

Given and , the -th Krylov polynomial space associated with them is defined as

 Km(A,b)=span{b,Ab,A2b,…,Am−1b}⊆Cn. (7)

With the increase of , we get Krylov spaces all nested one inside another, . It can be shown that if , there exists such that

 dim Ki(A,b)={iif i

For each step the space has a basis , but from a certain on the new term is linearly dependent with the other vectors in the basis, giving that is the same of .

### 3.1 Generalized Lanczos Method

First we briefly recall the variation of the Lanczos method presented in  and , called the generalized Lanczos method (see ,  and  for an introduction to the Lanczos method). This iterative method is adapted in  in order to compute the geometric mean of two positive definite matrices times a vector. Given two positive definite matrices , the method constructs at the step a new vector that is -orthogonal, that is with , such that for each step we have

 AQk=BQkTk+βk+1Bqk+1eTkQ∗kBQk=Ik, (9)

with , the identity matrix and its -th column.

The matrix is the projection of into the space generated by the columns of the matrix . At the -th step this gives us the equalities

 Q∗AQ=T,Q∗BQ=I. (10)

Using this method we find an expression for (that is the same of ) as

 B#A=B(B−1A)1/2=(QT−1/2Q∗)−1=BQT1/2Q∗B. (11)

To compute , with , we can stop at the -th step, and taking , we have

 (B#A)v≈BQkT1/2ke1∥v∥B, (12)

because , that is a better approximation the closer is to .

### 3.2 Arnoldi Method

The Arnoldi method, exposed in , is a general version of the Lanczos method that does not need the starting matrices to be positive definite. This method computes the decomposition

 Q∗AQ=H, (13)

where is unitary, and the vectors are its columns, and is upper Hessenberg. From the equality (13) we get that

 Aqk=k+1∑i=1hikqi,with k=1,…,n−1. (14)

The previous relation can be rewritten as

 hk+1,kqk+1=Aqk−k∑i=1hikqi=rk, (15)

and, because is unitary, we have that

 hik=q∗iAqkwith i=1,…,k. (16)

If , so , where . The Arnoldi method is a Krylov space method because

 span{q1,q2,…,qk}=span{q1,Aq1,…,Ak−1q1}, (17)

that is the vectors constructed by the Arnoldi method are a basis for the Krylov space . Used as an iterative method, at the -th step the Arnoldi method give us the factorization

 AQk=QkHk+hk+1,kqk+1eTk, (18)

where consist of the first column of , is a upper Hessenberg matrix and is the -th column of the identity matrix. Because

 Q∗kAQk=Hk, (19)

we have that is the projection of in . The orthonormalization of vectors comes from the modified Gram-Schmidt method (see ), but in finite precision arithmetic a loss of orthogonality can occur, as shown in .

Using the Arnoldi method we can approximate , starting with the vector . As shown in  and  it stands that, at the -th step, the approximation is

 fk=Qkf(Hk)e1∥b∥2=Qkf(Hk)Q∗kb, (20)

that is the same of calculating into the Krylov space and then expanding the result into the original space . So an approximation for is

 (A#B)v≈AQkH−1/2kQ∗kv=AQkH−1/2ke1∥v∥2. (21)

## 4 Rational Krylov Spaces

The definition of rational Krylov spaces is similar to the polynomial one, except for the presence of a denominator (see  and  for an introduction). Given and , and the sequence of polynomials of degree

 qm−1(z)=m−1∏j=1(1−z/ξj), (22)

in which the values are called poles and are numbers in the extended complex plane different from all the eigenvalues of and , the rational Krylov space of order associated to them is defined as

 Qm(A,b)=qm−1(A)−1span{b,Ab,…,Am−1b}⊆Cn. (23)

In the case in which for some the corresponding factor in (23) is replaced by .

In the previous definition it is not taken into account the case , but we can exclude every other value simply by using the new values and . This can be done considering the new polynomial that uses poles and the new matrix instead of . Like the polynomial ones, rational Krylov spaces are also of increasing dimension and are nested until a certain dimension from which they do not change anymore. Polynomial Krylov spaces can be seen as a special case of the rational ones, when every pole is , that is when .

### 4.1 Rational Arnoldi Method

To compute a basis for the rational Krylov space we can use the rational Arnoldi method, as proposed in  and in . Starting with , in the following iterations the vector is generated orthogonalizing

 xj=(I−A/ξj)−1Avj (24)

against the previous orthonormal vectors

. If , we have that

 xj=j+1∑i=1vihi,j, (25)

and , instead if we can proceed finding a new vector that is orthonormal to .

We also have that

 A(vj+j+1∑i=1vihi,jξ−1j)=j+1∑i=1vihi,j, (26)

that gives us the decomposition

 AVm(Im+HmDm)+Avm+1hm+1,mξ−1meTm=VmHm+vm+1ξ−1meTm, (27)

in which , is the identity matrix and is its -th column. Defining

 Hm––––=[Hmhm+1,meTm]andKm––––=[Im+HmDmhm+1,mξ−1meTm] (28)

we have that

 AVm+1Km––––=Vm+1Hm––––, (29)

where . Finally, if the last pole is , the decomposition is simplified into

 AVmKm=Vm+1Hm––––, (30)

where is the upper part of .

Computed the basis of , we can approximate as

 fRAm=Vmf(Am)V∗mb,where Am=V∗mAVm∈Cm×m. (31)

The benefit of this method is that, in many interesting cases, is a good approximation of also if is small: it is required to compute , but is small compared to itself. If the last pole is , it is not even necessary to compute , but it can be computed . Also and are small compared to , so computing or the product is possible. For our problem we have found the approximation

 (A#B)v≈AVm(HmK−1m)−1/2V∗mv=AVm(HmK−1m)−1/2e1∥v∥2, (32)

starting the method with .

From a Crouzeix theorem exposed in  we know that there exists a constant for which , where the second norm indicates the maximum absolute value of over the compact set on which the function is approximated, that in this case is a set containing the spectrum of . As explained in , this helps us to find a result of semi-optimality for the approximation given by the rational Arnoldi method: in fact, if is analytic in and we define with , where is a polynomial of degree , so it stands

 ∥f(A)b−fRAm∥2≤2C∥b∥2minrm∈Pm−1/qm−1∥f−rm∥Σ. (33)

It seems obvious that the choice of the poles is strongly connected to the function which we have to approximate. To find a small value of , it is necessary to search a sufficiently uniform approximation of over . Moreover, sometimes it is convenient to consider only a certain set for the poles disjoint from . For Markov functions, and especially for the function that appears in our problem, a good set is , and as we can use , with and respectively the minimum and maximum eigenvalues of (as above, see  for further informations). Computing the eigenvalues of can be computationally expensive, however only the extreme eigenvalues are required.

#### 4.1.1 Extended Krylov Method

A special case of the rational Arnoldi method is the so-called extended Krylov method, proposed in  and then investigated in  and . It simply chooses alternately the poles as and . This method is good for approximate Markov functions. Its benefit is obvious: the choice of the poles is completely a priori, without information on the spectrum of . It had been observed in  that in some cases this is equivalent to a rational Arnoldi method in which all the poles are chosen as an unique asymptotically optimal value. However, finding the optimal pole needs some informations on the spectrum of .

#### 4.1.2 Generalized Leja Points

An alternative way to choose poles, exposed in , is the so-called generalized Leja points method, that rely on some logarithmic potential theory’s instruments (for a complete mathematical explanation about it see  and ). Given two closed sets and , both of non-zero logarithmic capacity (the logarithmic capacity of a set is defined as , where is the Robin constant for ) and of positive distance, the pair is called a condenser, and we can associate a number to it called condenser capacity (still see  for further explanations). We now consider the sequence of functions

 sm(z)=(z−σ1)⋯(z−σm)(1−z/ξ1)⋯(1−z/ξm),with m=1,…,n, (34)

with nodes and poles . Our aim is to make this sequence in absolute value as large as possible over and as small as possible over . It can be shown that for this kind of sequences the relation

 limm→∞sup(supz∈Σ|sm(z)|infz∈Ξ|sm(z)|)1/m≥e−1/cap(Σ,Ξ). (35)

lasts.

To find sequences of functions in which this inequality is true is called the generalized Zolotarev problem for the condenser .

A practical method to obtain such functions is the following greedy algorithm: starting with and as points of minimum distance from and , the following points and are recursively determined in a way such that lasts

 maxz∈Σ|sj(z)|=|sj(σj+1)|andminz∈Ξ|sj(z)|=|sj(ξj+1)|. (36)

The points are called generalized Leja points.

We want to approximate the Markov function . In order to archive this, in  it is suggested to choose sets and as and . These sets are optimal respectively for the choice of nodes and poles for that function.

To approximate Markov functions, another way to choose the poles for the rational Arnoldi method is also qsuggested in : they are called adaptive poles. We consider the function

 sm(z)=∏mk=1(z−σk)qm−1, (37)

in which are the Ritz values of the projection of at the -th step, i.e. the eigenvalues of , and is the denominator associated with the corresponding rational Krylov space. Our aim is to minimize the error in the approximation for each step, and to do so we have to make uniformly large on the set choosing the next pole as

 minz∈Ξ|sm(z)|=|sm(ξm)|. (38)

This method is substantially a black-box method, because we do not need any information on the spectrum of , but we only have to compute the eigenvalues of the matrix , that are considerably less than the eigenvalues of itself.

## 5 Experiments and Conclusions

We are now going to consider some experiments in order to test the accuracy of the methods presented above. The first important thing is to analyse their convergence rate, that is how close they are to approximate the value of . For this experiment we have used two positive definite matrices and and a vector randomly generated. We have run the methods up to 30 steps for each one.

The results are shown in the following graph, where we have on the x-axis the number of steps (or the number of poles considered for the Minimax method and the Gauss-Chebyshev quadrature, explained in  and , respectively) and on the y-axis the relative errors with respect to the real value of in a logarithmic scale.

Figure 1 shows how some methods, like the rational Arnoldi method with Leja points or adaptive poles, behave similarly to the Minimax method (they have almost the same convergence rate, with peaks in their graphics due to some numerical instabilities), but for the latter the knowledge of the full spectrum of the matrices is required.

Another important aspect we had to consider is how much time is required by these methods to compute the results. In fact, taking advantage of the sparse structure, Krylov methods use less operations than those methods that do not exploit it. We have used matrices and structured respectively as the finite-difference discretization matrices of the 1D and 2D Laplacian of variable dimension and a vector with every component equals to . Both matrices are implemented using the Octave CSR sparse format.

 A=⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣2−1−12−1⋱⋱⋱⋱⋱⋱−12−1−12⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦B=⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣4−1−1−14−1−1⋱⋱⋱−1−1⋱⋱⋱−1−14−1−1−14⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦

We have compared various methods: a method based on Schur decomposition presented in  (that directly compute and only later makes the product with ), the Minimax method presented in  (based on contour integrals) and two versions of the rational Arnoldi method, the one with Leja points and the adaptive one. Every method is run for 30 steps. We have measured the execution time of these methods with the Octave commands tic and toc (we have considered the time required to compute the extreme eigenvalues of with the Octave function eigs()

in the estimation for the rational Arnoldi method with Leja points).

Table 1 shows how the computational time required by the Minimax method and the method based on Schur decomposition grows very quickly. For matrices of dimension , the estimated times for the aforementioned methods become practically unsustainable, while methods based on Krylov spaces require only few seconds. These results show how important it is to preserve and exploit the sparse structure of the starting matrices in order to complete these computations in a reasonable time, gaining in terms of operations.

## References

• Anderson et al.  W. N. Anderson, T. D. Morley, and G. E. Trapp. Ladder networks, fixpoints, and the geometric mean. Circuits Systems Signal Process, 2(3):259–268, 1983.
• Arioli and Loghin  M. Arioli and D. Loghin.

Discrete interpolation norms with applications.

SIAM J. Numer. Anal., 47(4):2924–2951, 2009.
• Arioli et al.  M. Arioli, D. Kourounis, and D. Loghin. Discrete fractional Sobolev norms for domain decomposition preconditioning. IMA Journal of Numerical Analysis, 33(1):318–342, 2013.
• Beckermann and Reichel  B. Beckermann and L. Reichel. Error estimates and evaluation of matrix functions via the Faber transform. SIAM J. Numerical Analysis, 47(5):3849–3883, 2009.
• Bhatia  R. Bhatia. Positive definite matrices. Princeton University Press, 2007.
• Bini et al.  D. Bini, M. Capovani, and O. Menchi. Metodi numerici per l’algebra lineare. Zanichelli, 1988.
• Crouzeix  M. Crouzeix. Numerical range and functional calculus in Hilbert space. Journal of Functional Analysis, 244(2):668–690, 2006.
• Druskin and Knizhnerman  V. Druskin and L. Knizhnerman. Extended Krylov subspaces: approximation of the matrix square root and related functions. SIAM J. Numer. Anal., 19(3):755–771, 1998.
• Estatico and Benedetto  C. Estatico and F. Di Benedetto. Shift-invariant approximations of structured shift-variant blurring matrices. Numerical Algorithms, 62(4):615–635, 2013.
• Golub and Van Loan  G. H. Golub and C. F. Van Loan. Matrix computations. The Johns Hopkins University Press, 1989.
• Güttel  S. Güttel. Rational Krylov approximation of matrix functions: numerical methods and optimal pole selection. GAMM-Mitt., 36(1):8–31, 2013.
• Güttel and Knizhnerman  S. Güttel and L. Knizhnerman. Automated parameter selection for rational Arnoldi approximation of Markov functions. Proceedings in Applied Mathematics and Mechanics, 11:15–18, 2011.
• Hale et al.  N. Hale, N. J. Higham, and L. N. Trefethen. Computing , and related matrix functions by contour integrals. SIAM J. Numer. Anal., 46(5):2505–2523, 2008.
• Higham  N. J. Higham. Functions of matrices: theory and computation. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2008.
• Iannazzo  B. Iannazzo. The geometric mean of two matrices from a computational viewpoint. Technical report, 2011. Available on arXiv - http://arxiv.org/abs/1201.0101.
• Knizhnerman and Simoncini  L. Knizhnerman and V. Simoncini. A new investigation of the extended Krylov subspace method for matrix function evaluations. SIAM J. Numer. Linear Algebra Appl., 17(4):615–638, 2010.
• Moakher  M. Moakher. On the averaging of symmetric positive-definite tensors. Journal of Elasticity, 82(3):273–296, 2006.
• Parlett  B. N. Parlett. The symmetric eigenvalue problem. Prentice-Hall, 1980.
• Ransford  T. Ransford. Potential theory in the complex plane. Press Syndicate of the University of Cambridge, 1995.
• Saff  E. B. Saff. Logarithmic potential theory with applications to approximation theory. Technical report, 2010. Available on arXiv - arXiv:1010.3760.
• Simoncini  V. Simoncini. A new iterative method for solving large-scale Lyapunov matrix equations. SIAM J. Sci. Comput., 29(3):1268–1288, 2007.
• Trefethen and Bau  L. N. Trefethen and D. Bau. Numerical linear algebra. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1997.