Shifted Lanczos method for quadratic forms with Hermitian matrix resolvents

02/17/2020 ∙ by Keiichi Morikuni, et al. ∙ University of Tsukuba

Quadratic forms of Hermitian matrix resolvents involve the solutions of shifted linear systems. Efficient solutions use the shift-invariance property of Krylov subspaces. The Hermitian Lanczos method reduces a given vector and matrix to a Jacobi matrix (a real symmetric tridiagonal matrix with positive super- and sub-diagonal entries) and approximates the quadratic form with the Jacobi matrix. This study develops a shifted Lanczos method that deals directly with the Hermitian matrix resolvent to extend the scope of problems that the Lanczos method can solve. We derive a matrix representation of a linear operator that approximates the resolvent by solving a Vorobyev moment problem associated with the shifted Lanczos method. We show that the (1, 1) entry of the Jacobi matrix resolvent can approximate the quadratic form. We show the moment-matching property of the shifted Lanczos method and give a sufficient condition such that the method does not break down. Numerical experiments on matrices drawn from real-world applications compare the proposed method with previous methods and show that the proposed method outperforms a well-established method in solving some problems.


1 Introduction

Consider the computation of quadratic forms

(1.1)   $u^{\mathsf{H}} (A - z_\ell I)^{-1} u, \quad \ell = 1, 2, \dots, m,$

where $u^{\mathsf{H}}$ denotes the complex conjugate transpose of a vector $u \in \mathbb{C}^n$, $A \in \mathbb{C}^{n \times n}$ is a Hermitian matrix and $z_\ell \in \mathbb{C}$. Here, $A - z_\ell I$ is assumed to be invertible. This kind of problem arises in chemistry and physics (Liesen & Strakoš, 2013, Section 3.9), eigensolvers using complex moments (Sakurai & Tadano, 2007), the stochastic estimation of the number of eigenvalues (Maeda et al., 2015), samplers for determinantal point processes (Li et al., 2016, and references therein) and the approximation of Markov chains in Bayesian sampling (Johndrow et al., 2017). An extension of the present single-vector case in (1.1) for real $A$ to the multiple-vector case $u_1$, $u_2$, …, $u_p$, namely,

$U^{\mathsf{T}} (A - z_\ell I)^{-1} U, \quad U = [u_1, u_2, \dots, u_p],$

can be reduced to the solutions of bilinear forms $u_i^{\mathsf{T}} (A - z_\ell I)^{-1} u_j$, $i, j = 1, 2, \dots, p$, where $u_i$ is the $i$th column of $U$. The bilinear form can be reduced to the quadratic form

$u_i^{\mathsf{T}} (A - z_\ell I)^{-1} u_j = \tfrac{1}{4} \left[ w_+^{\mathsf{T}} (A - z_\ell I)^{-1} w_+ - w_-^{\mathsf{T}} (A - z_\ell I)^{-1} w_- \right],$

where $w_+ = u_i + u_j$ and $w_- = u_i - u_j$ (cf. Golub & Meurant, 2010, p. 114). This kind of problem arises in the analysis of dynamical systems (Bai & Golub, 2002).
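This polarization-type reduction of a bilinear form to two quadratic forms can be checked numerically. The following NumPy sketch uses an illustrative real symmetric matrix and $f(A) = A$ in place of the resolvent; the identity is the same for any symmetric matrix function, and all names below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
A = A + A.T                      # real symmetric test matrix
b = rng.standard_normal(n)
c = rng.standard_normal(n)

# Polarization identity for a symmetric matrix M:
#   b^T M c = ( (b+c)^T M (b+c) - (b-c)^T M (b-c) ) / 4,
# so one bilinear form costs two quadratic-form evaluations.
quad = lambda u: u @ A @ u
bilinear = (quad(b + c) - quad(b - c)) / 4

assert np.isclose(bilinear, b @ A @ c)
```

The same two-evaluation reduction applies with $(A - z_\ell I)^{-1}$ in place of $A$, since the resolvent of a real symmetric matrix is symmetric.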

A straightforward approach to the quadratic form (1.1) is to solve the shifted linear systems

(1.2)   $(A - z_\ell I) x^{(\ell)} = u, \quad \ell = 1, 2, \dots, m,$

and compute $u^{\mathsf{H}} x^{(\ell)}$ for $\ell = 1$, $2$, …, $m$. The development of efficient solutions for solving shifted linear systems (1.2) with real symmetric $A$ has been on-going for two decades; they include shifted Krylov subspace methods such as variants of the conjugate orthogonal conjugate gradient (COCG) method (van der Vorst & Melissen, 1990), the quasi-minimal residual (QMR) method (Freund & Nachtigal, 1991; Freund, 1992) and the conjugate orthogonal conjugate residual (COCR) method (Sogabe & Zhang, 2007), proposed by Takayama et al. (2006), Sogabe et al. (2008) and Sogabe & Zhang (2011), respectively. These methods use the shift-invariance property of the Krylov subspace $\mathcal{K}_k(A, u) = \mathrm{span}\{u, Au, \dots, A^{k-1} u\}$, i.e. $\mathcal{K}_k(A - zI, u) = \mathcal{K}_k(A, u)$, where $z \in \mathbb{C}$. Hence, a basis of the Krylov subspace is formed by using a real symmetric matrix $A$ instead of a shifted matrix $A - z_\ell I$. Thus, they determine an iterate for a seed linear system and then associated iterates for shifted linear systems by taking the iterative residual vectors of the shifted linear systems to be collinear to that of the seed linear system. These methods may suffer from breakdown, i.e. division by zero, although this happens very rarely in practice. Extensions of the MINRES method (Paige & Saunders, 1975) to the shifted linear system (1.2) work with Hermitian $A$ (Gu & Liu, 2013; Seito et al., 2019). The MINRES-based approaches do not break down for nonsingular $A - z_\ell I$ and use only real arithmetic for the matrix–vector product for real $A$ and $u$.
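The shift-invariance property underlying these methods can be verified directly: the Krylov subspace generated by $A - zI$ coincides with that generated by $A$. A small NumPy sketch with an illustrative random matrix and shift:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 4
A = rng.standard_normal((n, n))
A = A + A.T                      # real symmetric matrix
b = rng.standard_normal(n)
z = 0.3 + 0.1j                   # an illustrative complex shift

# Krylov basis {b, A b, ..., A^{k-1} b} and the same for A - z I
K = np.column_stack(
    [np.linalg.matrix_power(A, j) @ b for j in range(k)]
).astype(complex)
Az = A - z * np.eye(n)
Kz = np.column_stack([np.linalg.matrix_power(Az, j) @ b for j in range(k)])

# Every shifted basis vector lies in span(K): the least-squares
# residual of expressing Kz in terms of K vanishes.
coef, *_ = np.linalg.lstsq(K, Kz, rcond=None)
resid = Kz - K @ coef
assert np.linalg.norm(resid) < 1e-8 * np.linalg.norm(Kz)
```

A binomial expansion of $(A - zI)^j b$ shows each column of `Kz` is a linear combination of columns of `K`, which is why a single basis built from $A$ serves all shifts.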

The biconjugate gradient (BiCG) method does not exploit the symmetry of a coefficient matrix and requires two matrix–vector products per iteration for non-Hermitian matrices (Strakoš & Tichý, 2011), whereas the shifted COCG, COCR and MINRES methods require one matrix–vector product per iteration for (1.2).

In this study, we focus on computing the quadratic form (1.1) without directly solving the shifted linear systems (1.2). The key feature of our approach is the development of a shifted Lanczos method. The Hermitian Lanczos method projects an original model represented by $A$ and $u$ to a lower-dimensional model and matches the lowest-order moments of the original model with those of the reduced model, whereas the shifted Lanczos method retains such properties of the Hermitian Lanczos method. To derive the method, the Vorobyev moment problem (Vorobyev, 1965; Brezinski, 1996; Strakoš, 2009; Liesen & Strakoš, 2013) is useful. The Vorobyev moment problem gives a matrix representation of a linear operator that represents the reduced model to approximate the resolvent $(A - z_\ell I)^{-1}$. We show that the $(1, 1)$ entry of the Jacobi matrix resolvent can approximate the quadratic form (1.1) with eight additional scalar operations beyond the Lanczos method for each shift by using a recursive formula devised by Golub & Meurant (2010). Moreover, we give a sufficient condition such that the proposed method does not break down. To the best of the author’s knowledge, this study is the first of its kind to investigate the effectiveness of the Lanczos method for directly approximating the quadratic form of a Hermitian matrix resolvent.

The outline of this paper is as follows. In Section 2, we review the Lanczos method and its moment-matching property. In Section 3, we describe a shifted Lanczos method for computing the quadratic forms (1.1), show its moment-matching property and implementation, discuss related methods, and give a sufficient breakdown-free condition. In Section 4, we present the results of numerical experiments, comparing the shifted Lanczos method with previous methods, and provide an error estimate for the shifted Lanczos method. In Section 5, we conclude the paper.

2 Lanczos method

Consider the application of the Lanczos method (Lanczos, 1952) to $A$ and $u$ to approximate the resolvent $(A - zI)^{-1}$. For convenience, we omit the subscript $\ell$ of $z_\ell$. An algorithm for the Lanczos method is given as Algorithm 2.1.

0:  Input: $A \in \mathbb{C}^{n \times n}$, $u \in \mathbb{C}^n$
0:  Output: $\alpha_1, \alpha_2, \dots, \alpha_k$, $\beta_1, \beta_2, \dots, \beta_{k-1}$, $v_1, v_2, \dots, v_{k+1}$
1:  $\beta_0 = 0$, $v_0 = 0$, $v_1 = u / \| u \|$
2:  for $k = 1, 2, \dots$ do
3:     $\alpha_k = v_k^{\mathsf{H}} A v_k$, $w_k = A v_k - \alpha_k v_k - \beta_{k-1} v_{k-1}$
4:     if $\| w_k \| = 0$ then break
5:     $\beta_k = \| w_k \|$, $v_{k+1} = w_k / \beta_k$
6:  end for
Algorithm 2.1 Lanczos method

Here, $\| \cdot \|$ denotes the Euclidean norm. Denote the Lanczos decomposition of $A$ by

(2.1)   $A V_k = V_k T_k + \beta_k v_{k+1} e_k^{\mathsf{T}},$

where the columns of $V_k = [v_1, v_2, \dots, v_k]$ form an orthonormal basis of the Krylov subspace $\mathcal{K}_k(A, u)$ and $T_k$ is the Jacobi matrix (real symmetric tridiagonal matrix with positive super- and sub-diagonal entries)

$T_k = \begin{bmatrix} \alpha_1 & \beta_1 & & \\ \beta_1 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_{k-1} \\ & & \beta_{k-1} & \alpha_k \end{bmatrix},$

where $e_k$ is the $k$th standard basis vector. Then $T_k = V_k^{\mathsf{H}} A V_k$ holds.
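The three-term recurrence of Algorithm 2.1 and the decomposition (2.1) can be realized in a few lines of NumPy. The sketch below (illustrative data, no reorthogonalization) builds $V_k$ and $T_k$ and checks the Lanczos relation:

```python
import numpy as np

def lanczos(A, u, k):
    """Hermitian Lanczos: return V (n-by-k, orthonormal columns) and the
    k-by-k Jacobi matrix T with diagonal alpha and off-diagonal beta."""
    n = len(u)
    V = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(max(k - 1, 1))   # beta[j] couples columns j and j+1
    v, v_prev, b_prev = u / np.linalg.norm(u), np.zeros(n), 0.0
    for j in range(k):
        V[:, j] = v
        w = A @ v - b_prev * v_prev
        alpha[j] = v @ w
        w -= alpha[j] * v
        b = np.linalg.norm(w)
        if b == 0.0:                 # invariant subspace found: stop
            break
        if j < k - 1:
            beta[j] = b
        v_prev, v, b_prev = v, w / b, b
    T = np.diag(alpha) + np.diag(beta[:k-1], 1) + np.diag(beta[:k-1], -1)
    return V, T

rng = np.random.default_rng(2)
n, k = 20, 6
A = rng.standard_normal((n, n)); A = A + A.T   # real symmetric test matrix
u = rng.standard_normal(n)
V, T = lanczos(A, u, k)

# A V_k = V_k T_k + beta_k v_{k+1} e_k^T: every column of the residual
# except the last vanishes, and V_k has orthonormal columns.
R = A @ V - V @ T
assert np.linalg.norm(R[:, :k-1]) < 1e-8
assert np.allclose(V.T @ V, np.eye(k), atol=1e-8)
```

In exact arithmetic the columns of $V_k$ stay orthonormal automatically; in floating point, orthogonality degrades as Ritz values converge, which is why practical implementations often reorthogonalize.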

2.1 Hamburger moment problem

The Lanczos method projects the original model given by a Hermitian matrix $A$ and an initial vector $u$ to a lower-dimensional model given by $T_k$ and $\| u \| e_1$, matching the lowest-order moments of the original model and reduced model; i.e. it approximates a given matrix $A$ via moment matching. A thorough derivation of the moment-matching property (2.6) presented below is given in Liesen & Strakoš (2013, Chapter 3). Given a sequence of scalars $m_0$, $m_1$, $m_2$, …, $m_{2k-1}$, the problem of finding a non-decreasing real distribution function $\omega^{(k)}(\lambda)$, $\lambda \in \mathbb{R}$, with $k$ points of increase such that the Riemann–Stieltjes integral is equal to the given sequence of scalars

(2.2)   $\int_{-\infty}^{\infty} \lambda^j \, \mathrm{d}\omega^{(k)}(\lambda) = m_j, \quad j = 0, 1, \dots, 2k - 1,$

is called the Hamburger moment problem (Hamburger, 1919, 1920a, 1920b, 1921). The left-hand side of (2.2) is called the $j$th moment with respect to the distribution function $\omega^{(k)}$. Set another moment as

(2.3)   $m_j = \frac{u^{\mathsf{H}} A^j u}{\| u \|^2}, \quad j = 0, 1, \dots, 2k - 1,$

and the distribution function $\omega(\lambda)$ with $n$ points of increase to

(2.4)   $\omega(\lambda) = \sum_{i \colon \lambda_i \le \lambda} \omega_i$

associated with weights $\omega_i = | q_i^{\mathsf{H}} u |^2 / \| u \|^2$, $i = 1$, $2$, …, $n$, where $\lambda_1 < \lambda_2 < \dots < \lambda_n$ are the eigenvalues of $A$ and $q_i$, $i = 1$, $2$, …, $n$, are the corresponding normalized eigenvectors. Here, for clarity of exposition, we assume that the eigenvalues of $A$ are distinct without loss of generality. Then we can express the moment (2.3) as the Gauss–Christoffel quadrature and the quadratic form

$\int_{-\infty}^{\infty} \lambda^j \, \mathrm{d}\omega(\lambda) = \sum_{i=1}^{n} \omega_i \lambda_i^j = \frac{u^{\mathsf{H}} A^j u}{\| u \|^2} = m_j.$

Thus, the solution of (2.2) is given by

(2.5)   $\omega^{(k)}(\lambda) = \sum_{i \colon \theta_i^{(k)} \le \lambda} \omega_i^{(k)}$

associated with weights $\omega_i^{(k)} = | e_1^{\mathsf{T}} s_i^{(k)} |^2$, $i = 1$, $2$, …, $k$, where $\theta_1^{(k)} < \theta_2^{(k)} < \dots < \theta_k^{(k)}$ are the eigenvalues of $T_k$ and $s_i^{(k)}$, $i = 1$, $2$, …, $k$, are the corresponding normalized eigenvectors. Because the Gauss–Christoffel quadrature is exact for polynomials up to degree $2k - 1$, we have

$\int_{-\infty}^{\infty} \lambda^j \, \mathrm{d}\omega^{(k)}(\lambda) = \sum_{i=1}^{k} \omega_i^{(k)} \bigl( \theta_i^{(k)} \bigr)^j = e_1^{\mathsf{T}} T_k^j e_1,$

and the first $2k$ moments match

(2.6)   $\frac{u^{\mathsf{H}} A^j u}{\| u \|^2} = e_1^{\mathsf{T}} T_k^j e_1, \quad j = 0, 1, \dots, 2k - 1.$
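The moment-matching property (2.6) can be observed numerically. In the sketch below (illustrative data), the Krylov basis is built with full reorthogonalization, which for Hermitian $A$ yields the same projected matrix $T_k = V_k^{\mathsf{H}} A V_k$ as the three-term recurrence:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 12, 5
A = rng.standard_normal((n, n)); A = (A + A.T) / n   # modest spectral radius
u = rng.standard_normal(n)

# Orthonormal Krylov basis V = [v_1, ..., v_k] with v_1 = u / ||u||
V = np.zeros((n, k))
V[:, 0] = u / np.linalg.norm(u)
for j in range(1, k):
    w = A @ V[:, j - 1]
    for _ in range(2):                       # reorthogonalize twice
        w -= V[:, :j] @ (V[:, :j].T @ w)
    V[:, j] = w / np.linalg.norm(w)
T = V.T @ A @ V                              # projected Jacobi matrix

# First 2k moments match: u^T A^j u = ||u||^2 (T^j)_{11}, j = 0, ..., 2k-1
nrm2 = u @ u
for j in range(2 * k):
    m_full = u @ np.linalg.matrix_power(A, j) @ u
    m_red = nrm2 * np.linalg.matrix_power(T, j)[0, 0]
    assert np.isclose(m_full, m_red, rtol=1e-8, atol=1e-10)
```

The match generally fails at $j = 2k$, which is exactly the exactness limit of the underlying $k$-point Gauss–Christoffel quadrature.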

2.2 Model reduction via Vorobyev moment matching

We can state the problem of moment matching in the language of matrices via the Vorobyev moment problem. We follow Vorobyev (1965) for the derivation of a linear operator given by reducing the model order of $A$. (See also Brezinski (1996) and Strakoš (2009).) First, we derive a linear operator that reduces the model order of $A$. Let

$y_0 = u, \quad y_1 = A y_0, \quad \dots, \quad y_k = A y_{k-1},$

where $y_0$, $y_1$, …, $y_{k-1}$ are assumed to be linearly independent. Then, the Vorobyev moment problem involves determining a sequence of linear operators $A_k$ such that

$A_k y_0 = y_1, \quad A_k y_1 = y_2, \quad \dots, \quad A_k y_{k-2} = y_{k-1}, \quad A_k y_{k-1} = Q_k y_k,$

where $Q_k$ is the orthogonal projector onto $\mathcal{K}_k(A, u)$. A linear operator reducing the model order of $A$ is given by

(2.7)   $A_k = Q_k A Q_k = V_k T_k V_k^{\mathsf{H}},$

where the sequence $\{ A_k \}$ is strongly convergent to $A$ (Vorobyev, 1965, Theorem II). (See Brezinski (1996, Section 4.2) for the derivation of (2.7).) Hence, the first $2k$ moments of the reduced model match those of the original model:

(2.8)   $u^{\mathsf{H}} A_k^j u = u^{\mathsf{H}} A^j u, \quad j = 0, 1, \dots, 2k - 1.$

See Appendix A for the derivation of (2.8).

3 Shifted Lanczos method

Next, we formulate a shifted Lanczos method. Application of Vorobyev’s method of moments to the shifted matrix $A - zI$ and vector $u$ gives a matrix representation of a linear operator that represents the reduced model to approximate the resolvent $(A - zI)^{-1}$. We first solve the problem for the linear operator $A - zI$. Let

$x_0 = u, \quad x_1 = (A - zI) x_0, \quad \dots, \quad x_k = (A - zI) x_{k-1},$

where $x_0$, $x_1$, …, $x_{k-1}$ are assumed to be linearly independent. Then, the Vorobyev moment problem involves determining a sequence of linear operators $A_k^{(z)}$ such that

(3.1)   $A_k^{(z)} x_0 = x_1, \quad A_k^{(z)} x_1 = x_2, \quad \dots, \quad A_k^{(z)} x_{k-1} = Q_k x_k,$

where $Q_k$ is the orthogonal projector onto $\mathcal{K}_k(A - zI, u) = \mathcal{K}_k(A, u)$. Equations (3.1) can be written as

$A_k^{(z)} (A - zI)^j u = Q_k (A - zI)^{j+1} u, \quad j = 0, 1, \dots, k - 1.$

An arbitrary vector $x$ in $\mathcal{K}_k(A, u)$ has the expansion

$x = \sum_{j=0}^{k-1} c_j (A - zI)^j u,$

where $c_0$, $c_1$, …, $c_{k-1} \in \mathbb{C}$. Multiplying both sides by $A - zI$, we have

$(A - zI) x = \sum_{j=0}^{k-1} c_j (A - zI)^{j+1} u.$

Projecting this onto $\mathcal{K}_k(A, u)$, we have

$Q_k (A - zI) x = \sum_{j=0}^{k-1} c_j Q_k (A - zI)^{j+1} u = \sum_{j=0}^{k-1} c_j A_k^{(z)} (A - zI)^j u = A_k^{(z)} x.$

Here, the first equality is due to the linearity of $Q_k$. This shows that $A_k^{(z)} = Q_k (A - zI)$ on $\mathcal{K}_k(A, u)$. Since $Q_k x = x$ for $x \in \mathcal{K}_k(A, u)$, we can obtain the expression

$A_k^{(z)} = Q_k (A - zI) Q_k$

by extending the domain to the whole space $\mathbb{C}^n$.

Using the expression $A_k^{(z)} = Q_k (A - zI) Q_k$, we show that the moments of the original model with $A - zI$ and $u$ and those of the reduced model with $T_k - z I_k$ and $\| u \| e_1$ match. Subtracting (2.1) from the identity $z V_k = V_k (z I_k)$, we obtain the Lanczos-like decomposition

(3.2)   $(A - zI) V_k = V_k (T_k - z I_k) + \beta_k v_{k+1} e_k^{\mathsf{T}}.$

Note that direct application of the Lanczos method to $A - zI$ does not necessarily give the decomposition (3.2). Multiplying this by $V_k^{\mathsf{H}}$, we have

$V_k^{\mathsf{H}} (A - zI) V_k = T_k - z I_k.$

This gives the orthogonally projected restriction

$A_k^{(z)} = Q_k (A - zI) Q_k = V_k (T_k - z I_k) V_k^{\mathsf{H}}.$

Note that the sequence $\{ A_k^{(z)} \}$ is strongly convergent to $A - zI$ (Vorobyev, 1965, Theorem II). By using the matching moments (2.6) for $A$ and the binomial formula, we have

$u^{\mathsf{H}} (A - zI)^j u = \sum_{i=0}^{j} \binom{j}{i} (-z)^{j-i} u^{\mathsf{H}} A^i u = \| u \|^2 \sum_{i=0}^{j} \binom{j}{i} (-z)^{j-i} e_1^{\mathsf{T}} T_k^i e_1, \quad j = 0, 1, \dots, 2k - 1,$

where $\binom{j}{i}$ denotes the binomial coefficient. Therefore, the reduced model matches the first $2k$ moments of the original model

$u^{\mathsf{H}} (A - zI)^j u = \| u \|^2 e_1^{\mathsf{T}} (T_k - z I_k)^j e_1, \quad j = 0, 1, \dots, 2k - 1.$

Note that the left-hand side is equal to

$\| u \|^2 \int_{-\infty}^{\infty} (\lambda - z)^j \, \mathrm{d}\omega(\lambda)$

with distribution function (2.4) and the right-hand side is equal to

$\| u \|^2 \int_{-\infty}^{\infty} (\lambda - z)^j \, \mathrm{d}\omega^{(k)}(\lambda)$

with distribution function (2.5) (cf. Szegö, 1959, Chapter 15).

Let us now come back to the approximation of $(A - zI)^{-1}$. Because $(T_k - z I_k)^{-1}$ is the matrix representation of the inverse of the reduced-order operator restricted onto $\mathcal{K}_k(A, u)$ (Hoffman & Kunze, 1971, p. 79), an approximation is given by $(A - zI)^{-1} \approx V_k (T_k - z I_k)^{-1} V_k^{\mathsf{H}}$. Hence, we have

(3.3)   $u^{\mathsf{H}} (A - zI)^{-1} u \approx \| u \|^2 e_1^{\mathsf{T}} (T_k - z I_k)^{-1} e_1.$
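Approximation (3.3) can be sketched numerically. The example below uses an illustrative small matrix and a reorthogonalized Krylov basis; at $k = n$ the Krylov subspace is the whole space (barring breakdown), so (3.3) becomes exact there:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10
A = rng.standard_normal((n, n)); A = A + A.T   # real symmetric test matrix
u = rng.standard_normal(n)
z = 0.5 + 0.5j                                 # illustrative non-real shift

def jacobi_matrix(A, u, k):
    """T_k = V_k^T A V_k from a reorthogonalized Krylov basis."""
    V = np.zeros((len(u), k))
    V[:, 0] = u / np.linalg.norm(u)
    for j in range(1, k):
        w = A @ V[:, j - 1]
        for _ in range(2):
            w -= V[:, :j] @ (V[:, :j].T @ w)
        V[:, j] = w / np.linalg.norm(w)
    return V.T @ A @ V

exact = u @ np.linalg.solve(A - z * np.eye(n), u)

# (3.3): u^T (A - zI)^{-1} u ~ ||u||^2 [(T_k - zI)^{-1}]_{11}
for k in (4, n):
    T = jacobi_matrix(A, u, k)
    e1 = np.zeros(k); e1[0] = 1.0
    approx = (u @ u) * np.linalg.solve(T - z * np.eye(k), e1.astype(complex))[0]

# the final pass used k = n, where the approximation is exact
assert np.isclose(approx, exact, rtol=1e-8)
```

For $k < n$ the value is a Gauss-quadrature-type approximation that improves with $k$; no shifted linear system with $A$ itself is ever solved.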

3.1 Implementation

The quantity (3.3) is the $(1, 1)$ entry of the resolvent of a successively enlarging tridiagonal matrix. An efficient recursive formula for computing such an entry was developed in Golub & Meurant (2010, Section 3.4). The $(1, 1)$ entry $f_k^{(z)} = e_1^{\mathsf{T}} (T_k - z I_k)^{-1} e_1$ of the resolvent for the successively enlarging Jacobi matrix $T_k$ is recursively computed with $\delta_1^{(z)} = \alpha_1 - z$ and $c_1^{(z)} = f_1^{(z)} = 1 / (\alpha_1 - z)$, using

$\delta_k^{(z)} = \alpha_k - z - \frac{\beta_{k-1}^2}{\delta_{k-1}^{(z)}}, \qquad c_k^{(z)} = c_{k-1}^{(z)} \frac{\beta_{k-1}^2}{\delta_{k-1}^{(z)} \delta_k^{(z)}}, \qquad f_k^{(z)} = f_{k-1}^{(z)} + c_k^{(z)},$

where $\delta_k^{(z)}$ is the $k$th pivot of the $LDL^{\mathsf{T}}$ factorization of $T_k - z I_k$, as given in Algorithm 3.1. We summarise the procedures for approximating quadratic forms (1.1). The difference from Algorithm 2.1 is the addition of Lines 2 and 7. In particular, when $A$ is a real symmetric matrix and $u$ is a real vector in Algorithm 3.1, only real arithmetic is needed to compute Lines 1, 4 and 6, whereas Lines 2 and 7 require complex arithmetic.

0:  Input: $A \in \mathbb{C}^{n \times n}$, $u \in \mathbb{C}^n$, shifts $z_1$, $z_2$, …, $z_m$
0:  Output: approximations $\| u \|^2 f_k^{(z_1)}$, $\| u \|^2 f_k^{(z_2)}$, …, $\| u \|^2 f_k^{(z_m)}$ to (1.1)
1:  $\beta_0 = 0$, $v_0 = 0$, $v_1 = u / \| u \|$
2:  $\delta_0^{(z_\ell)} = 1$, $c_0^{(z_\ell)} = 1$, $f_0^{(z_\ell)} = 0$, $\tilde{\beta}_0 = 1$, $\ell = 1$, $2$, …, $m$
3:  for $k = 1, 2, \dots$, until convergence do
4:     $\alpha_k = v_k^{\mathsf{H}} A v_k$, $w_k = A v_k - \alpha_k v_k - \beta_{k-1} v_{k-1}$
5:     if $\| w_k \| = 0$ then break
6:     $\beta_k = \| w_k \|$, $v_{k+1} = w_k / \beta_k$, $\tilde{\beta}_k = \beta_k$
7:     $\delta_k^{(z_\ell)} = \alpha_k - z_\ell - \beta_{k-1}^2 / \delta_{k-1}^{(z_\ell)}$, $c_k^{(z_\ell)} = c_{k-1}^{(z_\ell)} \tilde{\beta}_{k-1}^2 / (\delta_{k-1}^{(z_\ell)} \delta_k^{(z_\ell)})$, $f_k^{(z_\ell)} = f_{k-1}^{(z_\ell)} + c_k^{(z_\ell)}$, $\ell = 1$, $2$, …, $m$
8:  end for
Algorithm 3.1 Shifted Lanczos method for quadratic forms
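The incremental update of the $(1, 1)$ resolvent entry can be sketched in NumPy. The function below implements the determinant-ratio ($LDL^{\mathsf{T}}$-pivot) recursion in the spirit of Golub & Meurant; the coefficients and shift are illustrative, and each partial value is checked against direct inversion of the corresponding leading Jacobi matrix:

```python
import numpy as np

def resolvent_11_entries(alpha, beta, z):
    """Return f_k = [(T_k - z I)^{-1}]_{11} for k = 1, ..., len(alpha),
    where T_k has diagonal alpha[:k] and off-diagonals beta[:k-1].
    Pivot recursion: delta_k = alpha_k - z - beta_{k-1}^2 / delta_{k-1},
    c_k = c_{k-1} beta_{k-1}^2 / (delta_{k-1} delta_k), f_k = f_{k-1} + c_k."""
    delta = alpha[0] - z
    c = 1.0 / delta
    f = c
    out = [f]
    for k in range(1, len(alpha)):
        delta_new = alpha[k] - z - beta[k - 1] ** 2 / delta
        c *= beta[k - 1] ** 2 / (delta * delta_new)
        f += c
        delta = delta_new
        out.append(f)
    return out

rng = np.random.default_rng(5)
m = 8
alpha = rng.standard_normal(m)                    # Jacobi matrix diagonal
beta = np.abs(rng.standard_normal(m - 1)) + 0.1   # positive off-diagonals
z = 0.2 + 0.7j                                    # illustrative non-real shift

f = resolvent_11_entries(alpha, beta, z)
for k in range(1, m + 1):
    Tk = (np.diag(alpha[:k]) + np.diag(beta[:k - 1], 1)
          + np.diag(beta[:k - 1], -1))
    direct = np.linalg.inv(Tk - z * np.eye(k))[0, 0]
    assert np.isclose(f[k - 1], direct)
```

Each step costs only a handful of scalar operations per shift, in line with the small per-shift overhead on top of the Lanczos recurrence noted earlier; for a non-real shift $z$ and real symmetric $T_k$, the pivots $\delta_k^{(z)}$ stay bounded away from zero.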

We compare the shifted Lanczos method with related methods in terms of computational cost. Algorithms 3.2, 3.3 and 3.4 give simple modifications of the shifted COCG, COCR and MINRES methods, respectively, for computing quadratic forms (1.1). Here, the seed shift can be chosen among the target shifts $z_1$, $z_2$, …, $z_m$. The modifications are given by applying $u^{\mathsf{H}}$ to the $k$th iterate and associated vectors. They produce their respective approximations to the quadratic form (1.1). Note that the shifted COCG and COCR methods use complex arithmetic throughout for real $A$ and non-real shifts, whereas the shifted MINRES method uses real arithmetic to compute the matrix–vector product for real $A$ and $u$, similarly to the shifted Lanczos method.

For simplicity of comparison, we count basic vector and matrix operations. The Lanczos method (Algorithm 2.1) needs one vector scaling (scale), one dot product (dot), one vector norm (norm), two scalar–vector additions (axpy) and one matrix–vector product (matvec) per iteration. Table 3.1 gives the number of basic vector and matrix operations of the shifted methods, and Table 3.2 gives the number of scalar operations for each shift per iteration. These tables show that in terms of the cost per iteration, the shifted Lanczos method is the cheapest of the methods compared.

0:  , , , , , , …,
0:  , , , …,
1:  , , ,
2:  , , , , , …,
3:  for , until convergence do
4:     
5:     if  or  then switch the seed
6:     , , ,
7:     for , , …,  do
8:        
9:        if  then output
10:        , , , ,
11:     end for
12:  end for
Algorithm 3.2 Shifted COCG method for quadratic forms
0:  , , , , , , …,
0:  , , , …,
1:  , , , ,
2:  , , , , , , , …,
3:  for , until convergence do
4:     
5:     if  then switch the seed
6:     
7:     for , , …,  do
8:        
9:        if  then output
10:        , , ,
11:     end for
12:     ,
13:     if  then switch the seed
14:     
15:  end for
Algorithm 3.3 Shifted COCR method for quadratic forms
0:  , , , , , , , …,
0:  , , , …,
1:  , , , ,
2:  , , , , , …,
3:  for , until convergence do
4:     , , , ,
5:     for , , …,  do
6:        , ,
7:        if  then update
8:        if  then update
9:        Compute and update such that , , , .
10:        if  then output
11:        ,