# Compound Krylov subspace methods for parametric linear systems

In this work, we propose a reduced basis method for efficient solution of parametric linear systems. The coefficient matrix is assumed to be a linear matrix-valued function that is symmetric and positive definite for admissible values of the parameter σ∈ℝ^s. We propose a solution strategy where one first computes a basis for the appropriate compound Krylov subspace and then uses this basis to compute a subspace solution for multiple σ. Three kinds of compound Krylov subspaces are discussed. Error estimate is given for the subspace solution from each of these spaces. Theoretical results are demonstrated by numerical examples related to solving parameter dependent elliptic PDEs using the finite element method (FEM).



## 1 Introduction

Denote by 𝕊ⁿ the set of real, symmetric, and positive definite (s.p.d.) n×n matrices. Let b∈R^n, let the parameter set S⊂R^s, and let A:R^s→R^{n×n} be a linear matrix-valued function such that A(σ)∈𝕊ⁿ for every σ∈S. This work concerns efficient solution of the linear system: find x(σ)∈R^n satisfying

 A(σ)x(σ)=b (1)

for multiple values of the parameter σ∈S. We call the function x:S→R^n defined point-wise by (1) the parameter-to-solution map.

Our motivation for studying (1) arises from the solution of elliptic PDEs with spatially varying coefficient functions using the finite element method (FEM), see Section 2.2. As parameter dependent PDEs are related to several interesting engineering problems, their solution has attracted considerable attention. Research has been done both before and after spatial discretisation. Parametric PDEs have especially been studied in the context of uncertainty quantification, where the parameter is typically related to a truncated polynomial chaos or Karhunen–Loève expansion of a random coefficient field, see [BaTeZo:04, BaNoTe:07, ScGi:11].

Currently, there exist three main approaches for the solution of (1) or the underlying parametric PDE. First, one can approximate the parameter-to-solution map using a Galerkin method in the parameter space, see [BaTeZo:04, ScGi:11]. These methods often combine discretisation of the spatial and parameter dimensions. The second alternative is to apply a collocation method, where the parameter-to-solution map is first evaluated at collocation points and then approximated by interpolation [BaNoTe:07]. To break the curse of dimensionality, several sparse and adaptive families of collocation points have been proposed [BaCoDa:17]. Finally, one can construct a reduced basis, i.e., a subspace of R^n that can accurately represent the solution for the desired σ∈S [QuMaNe:16]. The reduced basis is constructed by evaluating the solution at sampling points that are selected, e.g., by a greedy algorithm [BuMaPaPr:12]. The solution is then approximated point-wise by computing the subspace solution using the reduced basis.

In this work, we propose a reduced basis method for the solution of (1) that is inspired by the Conjugate Gradient (CG) method. The solution of the linear system (1) for a single σ∈S can be approximated efficiently using CG if the condition number of A(σ) is close to one. The CG method is an iteration for finding a sequence of approximate solutions to linear systems with s.p.d. coefficient matrices, see [HeSt:1953] and [axelsson:1994]. Each CG-iterate is the subspace solution from the Krylov subspace corresponding to the linear system and the iteration index. The jth Krylov subspace related to the model problem (1) is defined as

 Kj(A(σ),b):=span{b,A(σ)b,…,A(σ)^{j−1}b} for any j∈N. (2)

Observe that Kj(A(σ),b) is dependent on σ.
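To make this σ-dependence concrete, the following sketch builds orthonormal bases of K₃(A(σ),b) for two parameter values of a hypothetical random s.p.d. family A(σ)=σ₁A₁+σ₂A₂ (toy data, not the FEM matrices of Section 2); the two subspaces generally differ:

```python
import numpy as np

def krylov_basis(A, b, j):
    """Orthonormal basis of K_j(A, b) = span{b, A b, ..., A^(j-1) b}."""
    cols = [b]
    for _ in range(j - 1):
        cols.append(A @ cols[-1])
    return np.linalg.qr(np.column_stack(cols))[0]

# Hypothetical parametric family A(sigma) = sigma_1*A1 + sigma_2*A2, s.p.d. for sigma > 0
rng = np.random.default_rng(0)
M1, M2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
A1, A2 = M1 @ M1.T + 4 * np.eye(4), M2 @ M2.T + 4 * np.eye(4)
b = rng.standard_normal(4)

Q_a = krylov_basis(1.0 * A1 + 2.0 * A2, b, 3)  # basis of K_3 at one sigma
Q_b = krylov_basis(2.0 * A1 + 1.0 * A2, b, 3)  # ... and at another sigma
```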

The convergence of CG is well studied; the estimated number of iterations required to compute an approximate solution with desired error grows with the condition number. The condition numbers of coefficient matrices related to the FE-solution of elliptic PDEs are large and increase when the applied finite element mesh is refined. Hence, if the CG method is used in this setting, a preconditioner is required to improve convergence.

In this work, we propose one kind of exact and two kinds of approximate compound Krylov (CK) subspace methods to efficiently compute approximate solutions to (1) for multiple σ∈S. The computation proceeds in two stages:

1. Off-line stage: Compute a basis for the applied compound Krylov subspace.

2. On-line stage: Use the basis constructed in the off-line stage to compute the subspace solution to (1) for multiple σ∈S.

In practice, the computational cost of the first stage is large, whereas computing the subspace solution is fast for a method subspace of favorable dimension. Therefore our proposed method is most beneficial when the parameter-to-solution map is evaluated for a large number of parameter vectors σ∈S.

The family of exact compound Krylov subspaces is designed to satisfy the inclusion

 Kj(A(σ),b)⊂CKj(A,b) for all σ∈S and j∈N. (3)

Observe that CKj(A,b) is independent of σ but dependent on j as well as on the matrix-valued function A. Due to the inclusion in (3) and the best approximation property of subspace methods, the subspace solution to (1) from CKj(A,b) is at least as accurate as the jth iterate produced by the CG method for any σ∈S.

Constructing a subspace satisfying the inclusion (3) requires treating the σ-dependency of the linear matrix-valued function A. We use the linearity of A and define CKj(A,b) as the union of subspaces containing the range of the mapping

 σ↦A(σ)^k b for k∈{1,…,j}. (4)

Particularly, we reformulate the term A(σ)b as L(b)σ, where L(b) is a linearisation matrix independent of σ. This reformulation allows us to easily compute a range that contains A(σ)b for any σ∈S. Such a linearisation process can be repeated for the terms A(σ)^k b, and thus used to find a basis for the compound Krylov subspace CKj(A,b). Special care must be taken to cope with the exponentially growing column dimension of the linearisation matrices. The dimension and the computational cost are reduced by two kinds of approximate compound Krylov subspaces that are defined by including low-rank approximations in the linearisation process.

The proposed CK-solvers are subspace methods, and as such they produce the best possible approximation to the exact solution from the method subspace in the norm associated with the coefficient matrix A(σ). We take advantage of this property in the error analysis. Particularly, we show that both kinds of approximate CK-method subspaces approximately contain a solution candidate appearing in the error analysis of the CG method for any σ∈S. Our final error estimate guarantees that the CK-solutions have an error comparable with the CG method if sufficiently accurate low-rank approximations of the linearisation matrices are used.

This work is organised as follows. In Section 2 we give a brief review of subspace and CG methods and discuss how Problem (1) is related to the FE-solution of the Poisson equation with varying material data or geometry. Compound Krylov subspaces are discussed in Section 3. Implementation of the CK method is outlined in Section 4. The proposed methods and analytical results are illustrated in Section 5 by numerical examples. We conclude with a discussion of the obtained results and future work in Section 6.

## 2 Background

In this section, we first discuss linear matrix-valued functions and their representation. Then we give two examples of linear systems of the type (1) that are related to finite element solution of parametric PDEs. Finally, we briefly review subspace methods and CG error analysis that are a prerequisite for Section 3.

### 2.1 Linear matrix-valued functions

Linear matrix-valued functions are defined as usual:

###### Definition 1

Function F:R^s→R^{n×n} is called linear if for any σ1,σ2∈R^s and α∈R

 F(σ1+σ2)=F(σ1)+F(σ2) and F(ασ1)=αF(σ1).

Naturally, any linear matrix-valued function admits a representation in terms of matrices independent of σ. Particularly, there exist A1,…,As∈R^{n×n} independent of σ such that

 A(σ)=∑_{i=1}^s σiAi for any σ∈R^s. (5)
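A minimal sketch of evaluating this representation, with hypothetical 2×2 matrices A₁, A₂:

```python
import numpy as np

def A_of_sigma(sigma, A_list):
    """Evaluate A(sigma) = sum_i sigma_i * A_i from the representation (5)."""
    return sum(si * Ai for si, Ai in zip(sigma, A_list))

# Hypothetical coefficient matrices
A_list = [np.diag([2.0, 1.0]), np.array([[1.0, 0.5], [0.5, 1.0]])]
sigma = np.array([1.0, 3.0])
A_sigma = A_of_sigma(sigma, A_list)
```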

### 2.2 Application in FEM

The motivation for our proposed methods comes from solving parameter dependent partial differential equations using the finite element method. Let the domain Ω⊂R^d, where the dimension d is 2 or 3, have a sufficiently regular boundary, and let f∈L²(Ω). Consider the weak form of the modified Poisson equation: find u_σ∈H¹₀(Ω) satisfying

 ∫Ω ∇uσ⋅C(σ,x)∇v = ∫Ω fv for all v∈H¹₀(Ω). (6)

We are interested in solving this problem multiple times with different values of σ∈S using finite elements. Assume the function C(σ,x) is of the form

 C(σ,x)=∑_{i=1}^s ψi(x)σi, (7)

where the ψi are given coefficient functions. The function C is chosen so that C(σ,x) is s.p.d. for a.e. x∈Ω and all σ∈S. In this case, the Lax–Milgram lemma guarantees the existence of a unique solution u_σ to (6) for any σ∈S, see, e.g., [evans:1998].

The weak problem (6) is solved using FEM by restricting it to a finite element space V_FE⊂H¹₀(Ω). That is, one solves the problem: find u_{σ,FE}∈V_FE satisfying

 ∫Ω ∇u_{σ,FE}⋅C(σ,x)∇v = ∫Ω fv for all v∈V_FE. (8)

The FE-space is finite dimensional and has a basis {ϕ1,…,ϕn}. The basis functions are defined with the help of a mesh, a partition of the domain into subdomains called elements. The maximum diameter of these elements is called the mesh size. For an introduction to FEM, see [Braess:2007, brenner_scott:1994].

The problem (8) is equivalent to the matrix equation: find x̂(σ)∈R^n satisfying K(σ)x̂(σ)=b̂, where K(σ)∈R^{n×n} and b̂∈R^n are defined as

 K(σ)ij=∫Ω ∇ϕi⋅C(σ,x)∇ϕj and b̂i=∫Ω fϕi for i,j∈{1,…,n}. (9)

The accuracy of the approximate solution produced by the compound Krylov solver depends, among other things, on the condition number of the coefficient matrix, see Theorems 1 and 2. If the condition number is close to one, these methods converge rapidly. However, the condition number of K(σ) defined in (9) grows with decreasing mesh size and is typically much larger than one, leading to slow convergence of CK-methods, see [ToWi:05, Chapter B.6] for analysis in the case C=I. To speed up convergence we improve conditioning by applying a split preconditioner. Let K̄∈R^{n×n} satisfy

 K̄ij=∫Ω ∇ϕi⋅C̄(x)∇ϕj for i,j∈{1,…,n}, (10)

where the function C̄ satisfies the following: there exist α,β>0 such that

 α η^T C̄(x)η ≤ η^T C(σ,x)η ≤ β η^T C̄(x)η

for all η∈R^d, σ∈S, and for a.e. x∈Ω. In addition, let R be the Cholesky factor of K̄ so that K̄=RR^T. We can now write the linear system K(σ)x̂(σ)=b̂ as

 R^{−1}K(σ)R^{−T}R^T x̂(σ)=R^{−1}b̂.

Redefining A(σ):=R^{−1}K(σ)R^{−T}, x(σ):=R^T x̂(σ), and b:=R^{−1}b̂ yields the preconditioned linear system: find x(σ)∈R^n satisfying

 A(σ)x(σ)=b. (11)

The assumption (7) ensures that the matrix A(σ) is a linear matrix-valued function. That is, we have arrived at an instance of (1). Next, we give an estimate for the condition number of the coefficient matrix in (11).
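In matrix terms, the split-preconditioning step can be sketched as follows (a minimal numpy sketch; the helper name split_precondition and the small test matrix are ours):

```python
import numpy as np

def split_precondition(K_sigma, K_bar, b_hat):
    """Form A(sigma) = R^{-1} K(sigma) R^{-T} and b = R^{-1} b_hat,
    where K_bar = R R^T is the Cholesky factorisation of the preconditioner.
    Assumes K_sigma is symmetric."""
    R = np.linalg.cholesky(K_bar)              # lower triangular factor
    A = np.linalg.solve(R, np.linalg.solve(R, K_sigma).T).T
    b = np.linalg.solve(R, b_hat)
    return A, b

# Sanity check: preconditioning K_bar by itself yields the identity matrix
M = np.array([[2.0, 1.0], [1.0, 3.0]])
A_id, b_id = split_precondition(M, M, np.ones(2))
```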

###### Lemma 1

Let σ∈S and C(σ,x) be as in (7). Let K(σ) be as defined in (9), K̄ as defined in (10), R the Cholesky factor of K̄, and A(σ)=R^{−1}K(σ)R^{−T}. Assume that there exist α,β>0 such that

 α η^T C̄(x)η ≤ η^T C(σ,x)η ≤ β η^T C̄(x)η (12)

for a.e. x∈Ω and every η∈R^d and σ∈S. Then it holds for any σ∈S that κ₂(A(σ))≤β/α, where κ₂ is the condition number in the Euclidean norm.

###### Proof

We use the Rayleigh quotient y^T A(σ)y / y^T y. According to it, the smallest and largest eigenvalues of the matrix A(σ)=R^{−1}K(σ)R^{−T} are

 λmin=min_{y∈R^n} (y^T R^{−1}K(σ)R^{−T}y)/(y^T y), λmax=max_{y∈R^n} (y^T R^{−1}K(σ)R^{−T}y)/(y^T y). (13)

We apply the change of variables ŷ=R^{−T}y and use K̄=RR^T to write the quotient as

 (ŷ^T K(σ)ŷ)/(ŷ^T K̄ŷ).

Let v_ŷ∈V_FE be defined as v_ŷ=∑_{i=1}^n ŷiϕi for ŷ∈R^n. By (9) and (10), the quotient can be written as:

 (∫Ω ∇v_ŷ⋅C(σ,x)∇v_ŷ)/(∫Ω ∇v_ŷ⋅C̄(x)∇v_ŷ).

Using the assumptions in (12) we find that:

 α ≤ (∫Ω ∇v_ŷ⋅C(σ,x)∇v_ŷ)/(∫Ω ∇v_ŷ⋅C̄(x)∇v_ŷ) ≤ β for any ŷ∈R^n.

Combining this with (13) yields estimates for the smallest and the largest eigenvalues of A(σ):

 α ≤ λmin(A(σ)) and λmax(A(σ)) ≤ β

for any σ∈S. Recalling the definition of the condition number completes the proof.

#### 2.2.1 Example 1: Piecewise constant material parameter

Let the subdomains Ωi⊂Ω, i=1,…,s, be non-overlapping and satisfy ∪i Ω̄i=Ω̄. Let a>1 and S=[1,a]^s. We consider solving the problem: find u_σ∈H¹₀(Ω) satisfying

 ∑i ∫Ωi σi∇uσ⋅∇v = ∫Ω fv for all v∈H¹₀(Ω) (14)

for multiple σ∈S. The parameter σi may physically correspond, for example, to the electrical conductivity in the subdomain Ωi. Eq. (14) is an instance of the abstract problem in (8) with

 C(σ,x)=∑_{i=1}^s σi I XΩi(x), (15)

where XΩi is the characteristic function of the set Ωi.

As a demonstration we use checkerboard patterns where the domain is divided into s subdomains by slicing it into horizontal and vertical strips, see Fig. 1. We choose the preconditioning coefficient matrix C̄=I. An estimate for the condition number of A(σ) corresponding to (14) and C̄=I follows using Lemma 1. As σi∈[1,a], we have

 η^T I η ≤ η^T C(σ,x)η ≤ η^T (aI) η for a.e. x∈Ω and any η∈R^d.

Therefore α=1 and β=a in (12), and κ₂(A(σ))≤a.
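This bound can be checked numerically on a toy analogue, where hypothetical random s.p.d. blocks Kᵢ stand in for the subdomain stiffness matrices and K̄=∑ᵢKᵢ plays the role of the preconditioner built with C̄=I:

```python
import numpy as np

rng = np.random.default_rng(1)
s, n, a = 3, 6, 5.0
# Hypothetical s.p.d. blocks standing in for subdomain stiffness matrices
Ms = [rng.standard_normal((n, n)) for _ in range(s)]
Ks = [M @ M.T + np.eye(n) for M in Ms]
K_bar = sum(Ks)                                # preconditioner, "C_bar = I" analogue
R = np.linalg.cholesky(K_bar)

sigma = rng.uniform(1.0, a, size=s)            # sigma in S = [1, a]^s
K_sigma = sum(si * Ki for si, Ki in zip(sigma, Ks))
A = np.linalg.solve(R, np.linalg.solve(R, K_sigma).T).T   # R^{-1} K(sigma) R^{-T}
kappa = np.linalg.cond(A)                      # Lemma 1 predicts kappa <= beta/alpha = a
```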

#### 2.2.2 Example 2: Deformation of geometry

A slightly more complicated example of (1) is a problem where the parameter is related to a deformation of the domain. As an example we solve the Poisson equation in a rectangular domain with a spherical hole in multiple positions along the y-direction. Let Ω̂ denote the reference domain, a<1/3, and l∈[−a,a]. Consider solving the problem: find u_l∈H¹₀(Ωl) satisfying

 ∫Ωl ∇ul⋅∇vl = ∫Ωl vl for all vl∈H¹₀(Ωl) (16)

for multiple l∈[−a,a], where Ωl=Fl(Ω̂). We reduce the problem (16) to the reference domain Ω̂ by using the coordinate transformation Fl:Ω̂→Ωl defined as

 Fl(x̂)=x̂+e₂τ(x̂₂)l. (17)

Here l specifies the length of the translation and τ is a piecewise linear function defined as follows:

 τ(t) = 3t for t∈[0,1/3), 1 for t∈[1/3,2/3], −3t+3 for t∈(2/3,1].
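A direct transcription of τ (the checked values follow from the definition above):

```python
import numpy as np

def tau(t):
    """Piecewise linear profile tau: 3t on [0, 1/3), 1 on [1/3, 2/3], 3 - 3t on (2/3, 1]."""
    t = np.asarray(t, dtype=float)
    return np.where(t < 1/3, 3 * t, np.where(t <= 2/3, 1.0, 3 - 3 * t))

vals = tau([0.0, 1/6, 0.5, 5/6, 1.0])
```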

The domain before and after the transformation can be seen in Figure 2. For convenience, we name the three subdomains with different transformation rules from bottom to top as Ω̂1, Ω̂2, and Ω̂3. The Jacobian of the transformation is

 DFl(x̂) = diag(1, 1+τ′(x̂2)l).

Applying the change of variables in (16) and defining ûl=ul∘Fl yields the problem: find ûl∈H¹₀(Ω̂) satisfying

 ∫Ω̂ ∇̂ûl⋅(DFl)^{−1}(DFl)^{−T}det(DFl)∇̂v̂ = ∫Ω̂ v̂ det(DFl) (18)

for all v̂∈H¹₀(Ω̂). Observe that det(DFl)>0 for l∈[−a,a], hence, its absolute value can be omitted. Expanding the right-hand side of (18) gives

 ∫Ω̂ v̂ det(DFl) = ∫Ω̂ v̂ + l∫Ω̂ f v̂ for f(x̂)=τ′(x̂2). (19)

Hence, the problem (18) can be solved in two parts: find ûl0, ûl1∈H¹₀(Ω̂) satisfying:

 ∫Ω̂ ∇̂ûl0⋅(DFl)^{−1}(DFl)^{−T}det(DFl)∇̂v̂ = ∫Ω̂ v̂ (20)

and

 ∫Ω̂ ∇̂ûl1⋅(DFl)^{−1}(DFl)^{−T}det(DFl)∇̂v̂ = ∫Ω̂ f v̂, (21)

for any v̂∈H¹₀(Ω̂). Then ûl=ûl0+lûl1. Next, we reformulate the LHS so that both of these problems are instances of the abstract problem (6). The matrix induced by the change of variables on the LHS of (18) is:

 (DFl)−1(DFl)−Tdet(DFl)=(1+τ′(^x2)l00(1+τ′(^x2)l)−1).

Since τ is a piecewise linear function, the above matrix is piecewise constant with respect to the spatial variable and depends only on l. Explicitly,

 diag(1+3l, (1+3l)^{−1}) XΩ̂1 + I XΩ̂2 + diag(1−3l, (1−3l)^{−1}) XΩ̂3. (22)

We identify the parameters σi with the elements in the above equation as

 σ1=1+3l, σ2=(1+3l)^{−1}, σ5=1−3l, σ6=(1−3l)^{−1}. (23)

The elements σ3 and σ4 equal one since the Jacobian in the subdomain Ω̂2 is the identity. The above relations and the bound |l|≤a define the parameter set S. The LHS of (20) corresponds to

 ∫Ω̂ ∇ûl0⋅C(σ,x)∇v̂ (24)

for

 C(σ,x)=diag(σ1,σ2)XΩ̂1(x) + I XΩ̂2(x) + diag(σ5,σ6)XΩ̂3(x). (25)

Hence, it is an instance of the abstract problem (8). The same applies to (21).

Because the parameter σ1, and also σ5, varies symmetrically around one, we choose the preconditioning coefficient matrix C̄=I. We proceed to estimate the condition number of A(σ) corresponding to (24) and C̄=I. We obtain:

 (1−3a) η^T η ≤ η^T C(σ,x)η ≤ (1−3a)^{−1} η^T η for a.e. x∈Ω and η∈R^d.

That is, α=1−3a and β=(1−3a)^{−1} in Eq. (12). By Lemma 1 the condition number satisfies

 κ₂(A(σ)) ≤ 1/(1−3a)² for all σ∈S. (26)

It is worth noting that the condition number blows up when a approaches 1/3.

### 2.3 Subspace methods

Let B∈R^{n×n} be s.p.d., g∈R^n, V⊂R^n be a subspace, Q∈R^{n×m} a matrix whose columns form a basis of V, and y the solution of By=g. A subspace method computes an approximate solution ŷ to the linear system By=g by first solving the auxiliary problem: find z∈R^m satisfying

 Q^T BQz=Q^T g, and then setting ŷ=Qz. (27)

We call such ŷ the subspace solution from V.

Any s.p.d. B defines an inner product and an induced norm in R^n: for any z,w∈R^n let

 ⟨z,w⟩_B := z^T Bw and ∥z∥_B := ⟨z,z⟩_B^{1/2}.

The subspace solution ŷ from V is the B-orthogonal projection of y onto V. Thus ŷ depends only on V, not on the basis Q. For this reason, we call V the method subspace.

The error of the subspace solution is measured in the B-norm as ∥y−ŷ∥_B. Because ŷ is the B-orthogonal projection of the exact solution onto V, the B-norm of the error satisfies the best approximation property:

 ∥y−ŷ∥_B = min_{v∈V} ∥y−v∥_B. (28)
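The Galerkin step (27) and the resulting B-orthogonality of the error can be sketched as follows (toy random data; the helper name subspace_solution is ours):

```python
import numpy as np

def subspace_solution(B, g, Q):
    """Subspace solution (27): solve Q^T B Q z = Q^T g, then set y_hat = Q z."""
    z = np.linalg.solve(Q.T @ B @ Q, Q.T @ g)
    return Q @ z

rng = np.random.default_rng(2)
n = 5
M = rng.standard_normal((n, n))
B = M @ M.T + np.eye(n)                            # s.p.d. toy matrix
g = rng.standard_normal(n)
Q = np.linalg.qr(rng.standard_normal((n, 3)))[0]   # basis of a 3-dim subspace V

y = np.linalg.solve(B, g)                          # exact solution
y_hat = subspace_solution(B, g, Q)
```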

Let x(σ) be the exact solution to (1) and x̂(σ) the subspace solution to (1) from V, respectively. Our aim is to design a subspace V, independent of σ, such that

 ∥x̂(σ)−x(σ)∥_{A(σ)} ≤ tol for any σ∈S.

We take advantage of the best approximation property and design compound Krylov method subspaces that contain, either exactly or approximately, a solution candidate appearing in the CG error analysis. By the best approximation property, the error of the CK-solution is then bounded by the error of this solution candidate. The CG error analysis is discussed next.

### 2.4 The Conjugate Gradient Method

The CG method is an iteration for finding a sequence of approximate solutions to linear systems with s.p.d. coefficient matrices, see [HeSt:1953] and [axelsson:1994]. It can be understood as a line search method for minimising the energy functional associated with the linear system to be solved, or as a method finding a sequence of subspace solutions from the family of Krylov subspaces corresponding to the linear system.

Let B∈R^{n×n} be s.p.d., g∈R^n, and consider the linear system: find y∈R^n satisfying

 By=g. (29)

The family of Krylov subspaces corresponding to (29) is defined as

 Kj(B,g):=span{g,Bg,…,B^{j−1}g} for j∈N.

The CG method computes a sequence of approximate solutions {ŷj} to (29) such that ŷj is the subspace solution from Kj(B,g) for each j∈N. It does this without computing a basis for Kj(B,g), which makes the CG method very memory efficient. We proceed to outline the CG error analysis, i.e., study how the error depends on B, g, and the iteration index j. This material can be found, e.g., in [axelsson:1994].

First, observe the duality between vectors in Kj(B,g) and polynomials of degree j−1: each vj∈Kj(B,g) satisfies

 vj = p_{vj}(B)g for some p_{vj}∈P_{j−1}. (30)

Similarly, p(B)g∈Kj(B,g) for any p∈P_{j−1}.

It is well known that a bound for the error norm follows from the best approximation property (28) by constructing an approximation to y from Kj(B,g) utilising properties of the Chebyshev polynomials, see [axelsson:1994]. By (30),

 y−vj = [I−p_{vj}(B)B]y

for any j∈N and vj∈Kj(B,g) satisfying (30). Denote the set of eigenvalues of B by Λ(B). By standard arguments,

 ∥y−vj∥_B ≤ max_{t∈Λ(B)} |1−tp_{vj}(t)| ∥y∥_B for any vj∈Kj(B,g). (31)

The multiplier in (31) satisfies 1−tp_{vj}(t)∈Mj, where Mj is the space of degree-j polynomials taking the value one at the origin. One can verify that choosing appropriate vj yields all possible multipliers in Mj. The CG error estimate could be constructed by finding vj that minimises the multiplicative term max_{t∈Λ(B)}|1−tp_{vj}(t)|. As there does not exist a general solution to this optimisation problem, one instead finds the polynomial q*j that has the minimal maximum norm on [λmin,λmax] in the set Mj by translating and scaling the jth Chebyshev polynomial onto [λmin,λmax] as

 q*j(t) = Tj((λmax+λmin−2t)/(λmax−λmin)) / Tj((λmax+λmin)/(λmax−λmin)) = ∑_{k=0}^j γjk t^k. (32)

Note that q*j(0)=1 for any j∈N. Let κ(B) be the condition number of B, i.e., κ(B)=λmax/λmin. Using the properties of Chebyshev polynomials gives the estimate

 max_{t∈Λ(B)} |1−tp_{v*j}(t)| ≤ 2((√κ(B)−1)/(√κ(B)+1))^j. (33)
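The scaled Chebyshev polynomial (32) and the bound (33) can be checked numerically for hypothetical spectrum bounds λmin=1, λmax=9:

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

lmin, lmax, j = 1.0, 9.0, 4                    # hypothetical spectrum bounds and degree
Tj = Chebyshev.basis(j)                        # j-th Chebyshev polynomial T_j

def qstar(t):
    """Translated and scaled Chebyshev polynomial q*_j from (32)."""
    return Tj((lmax + lmin - 2 * t) / (lmax - lmin)) \
        / Tj((lmax + lmin) / (lmax - lmin))

t = np.linspace(lmin, lmax, 2001)
kappa = lmax / lmin
bound = 2 * ((np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)) ** j
max_q = np.max(np.abs(qstar(t)))               # maximum of |q*_j| on [lmin, lmax]
```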

The element v*j∈Kj(B,g) satisfying 1−tp_{v*j}(t)=q*j(t) is

 v*j = (I−q*j(B))B^{−1}g = −∑_{k=1}^j γjk B^{k−1}g. (34)

Choosing vj=v*j in (31), using the best approximation property, and (33) yields the error bound

 ∥y−ŷj∥_B ≤ 2((√κ(B)−1)/(√κ(B)+1))^j ∥y∥_B for any j∈N. (35)

## 3 Compound Krylov Subspaces

In this section, we define three families of compound Krylov subspaces that are used to solve (1). We begin by defining the family CKj(A,b) that satisfies

 Kj(A(σ),b)⊂CKj(A,b) for any σ∈S and j∈N. (36)

Due to the inclusion in (36) and the best approximation property (28), the subspace solution to (1) from CKj(A,b) is at least as accurate as the jth CG-iterate for any σ∈S.

The subspace satisfying (36) that has the smallest possible dimension is

 ⋃_{k=0}^{j−1} span{A(σ)^k b | σ∈S}. (37)

We take advantage of the linearity of A, and define a subspace CKj(A,b) containing (37) by linearisation: the terms A(σ)^k b are written as L^k(b)σ^⊗k, where L^k(b) is the kth linearisation matrix and the Kronecker product σ^⊗k is repeated k times, see Section 3.1. By definition, it holds that A(σ)^k b∈range(L^k(b)). Hence, we define

 CKj(A,b) := ⋃_{k=0}^{j−1} range(L^k(b)). (38)

The column dimension of L^k(b) depends exponentially on k whereas its row dimension is fixed. Thus, we compute range(L^k(b)) using a normal form that can be formed without ever constructing L^k(b), see Lemma 4 and Remark 2.

The computational cost of a subspace method depends on the dimension of the applied method subspace. To keep both the cost and the dimension small, we propose two families of approximate compound Krylov subspaces that have smaller dimensions but admit a similar error estimate as CKj(A,b). The first family is obtained by using the span of the most dominant right singular vectors of the linearisation matrix L^k(b) instead of range(L^k(b)) in (38). The second family is obtained by applying an approximate linearisation process that includes a low-rank approximation step. In both cases, the low-rank approximation can be implemented in a way that eliminates the exponential growth in the dimension of all involved matrices, see Lemma 4, Theorem 3, and numerical examples in Section 5.

### 3.1 Linearisation process

Next, we discuss the linearisation process and define the family of exact-CK subspaces satisfying (38). We begin with some notation.

###### Definition 2

Let σ∈R^s and k∈N. We write σ^⊗k for the k-times Kronecker product of σ,

 σ^⊗k := σ⊗σ^⊗(k−1) for k>1, and σ^⊗1 := σ.

The linearisation function of A is defined as follows:

###### Definition 3

Let the matrix-valued function A be linear and such that A(σ)=∑_{i=1}^s σiAi for any σ∈R^s. The linearisation function L of A is defined as

 L(C) = [A1C A2C … AsC]

for any C∈R^{n×m}.

We write L^k for the functional power, i.e., L^k(C)=L(L^{k−1}(C)) for k>1 and L^1(C)=L(C). Using induction, we obtain the following lemma that gives the linearisation of A(σ)^k b with respect to σ.

###### Lemma 2

Let A be linear, b∈R^n, and L be the linearisation function of A. Then

 A(σ)^k b = L^k(b)σ^⊗k for any k∈N and σ∈R^s.

Observe that the column dimension of L^k(b) grows exponentially with k; particularly, L^k(b)∈R^{n×s^k}.

###### Proof

The proof follows by induction. By (5) and Definition 3,

 A(σ)b = ∑_{i=1}^s σiAib = L(b)σ.

Let k∈N, and assume that A(σ)^k b = L^k(b)σ^⊗k holds. Then,

 A(σ)^{k+1}b = A(σ)L^k(b)σ^⊗k = ∑_{i=1}^s σiAiL^k(b)σ^⊗k,

and further

 A(σ)^{k+1}b = [A1L^k(b) ⋯ AsL^k(b)] [σ1σ^⊗k; … ; σsσ^⊗k].

Recalling the definitions of the Kronecker product and the linearisation function of A completes the proof.
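Lemma 2 can be checked numerically on a small hypothetical family (s=2, n=3, k=3); lin_step below is our transcription of one application of the linearisation function L:

```python
import numpy as np
from functools import reduce

def lin_step(C, A_list):
    """One application of the linearisation function: L(C) = [A_1 C, ..., A_s C]."""
    return np.hstack([Ai @ C for Ai in A_list])

rng = np.random.default_rng(4)
s, n, k = 2, 3, 3
A_list = [rng.standard_normal((n, n)) for _ in range(s)]   # hypothetical A_i
sigma = rng.standard_normal(s)
b = rng.standard_normal(n)

Lk = reduce(lambda C, _: lin_step(C, A_list), range(k), b.reshape(-1, 1))
sigma_kron = reduce(np.kron, [sigma] * k)      # sigma (x) sigma (x) sigma
A_sigma = sum(si * Ai for si, Ai in zip(sigma, A_list))
lhs = np.linalg.matrix_power(A_sigma, k) @ b   # A(sigma)^k b
rhs = Lk @ sigma_kron                          # L^k(b) sigma^{(x)k}
```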

In practical computation, the subspace CK_{j+1} is obtained by augmenting CKj with range(L^j(b)). Hence, we use the following recursive definition instead of (38):

###### Definition 4

Let b∈R^n, A be linear, and L be the linearisation function of A. Then the family of compound Krylov subspaces is defined as

 CK1 = span(b) and CK_{j+1} = range(L^j(b)) ⊕ CKj for j∈N.
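A rough sketch of this recursive construction (our own helper, with an SVD cut-off to discard numerically dependent columns; a simplification, not the normal-form implementation outlined in Section 4):

```python
import numpy as np

def compound_krylov_basis(A_list, b, j, tol=1e-10):
    """Orthonormal basis of CK_j built recursively (Definition 4):
    CK_1 = span(b), CK_{k+1} = range(L^k(b)) + CK_k."""
    V = b.reshape(-1, 1) / np.linalg.norm(b)
    C = b.reshape(-1, 1)
    for _ in range(j - 1):
        C = np.hstack([Ai @ C for Ai in A_list])   # C <- L(C); columns grow s-fold
        W = C - V @ (V.T @ C)                      # remove directions already in CK_k
        U, svals, _ = np.linalg.svd(W, full_matrices=False)
        if svals.size and svals[0] > 0:
            V = np.hstack([V, U[:, svals > tol * svals[0]]])
    return V

# Toy check of the inclusion K_j(A(sigma), b) ⊂ CK_j for one sigma
rng = np.random.default_rng(5)
A_list = [rng.standard_normal((8, 8)) for _ in range(2)]   # hypothetical A_i
b = rng.standard_normal(8)
V = compound_krylov_basis(A_list, b, 3)
sigma = np.array([0.7, 1.3])
A_s = sigma[0] * A_list[0] + sigma[1] * A_list[1]
v = A_s @ (A_s @ b)                                # A(sigma)^2 b ∈ K_3(A(sigma), b)
residual = np.linalg.norm(v - V @ (V.T @ v)) / np.linalg.norm(v)
```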

By Lemma 2, it holds that Kj(A(σ),b)⊂CKj for any σ∈S and j∈N. Let σ∈S and x̂j(σ) be the subspace solution to (1) from CKj. By the best approximation property (28), x̂j(σ) admits an identical error estimate with the CG method, that is,

 ∥x(σ)−