Denote by the set of real, symmetric, and positive definite (s.p.d.) – matrices. Let , parameter set , and be a linear matrix-valued function such that . This work concerns efficient solution of the linear system: find satisfying
for multiple values of the parameter . We call defined point-wise by (1) as the parameter-to-solution map.
Our motivation for studying (1) arises from the solution of elliptic PDEs with spatially varying coefficient functions using the finite element method (FEM), see Section 2.2. As parameter dependent PDEs are related to several interesting engineering problems, their solution has attracted lots of attention. Research has been done both before and after spatial discretization. Parametric PDEs have especially been studied in the context of uncertainty quantification, where the parameter is typically related to a truncated Polynomial Chaos or Karhunen-Loève expansion of a random coefficient field, see [BaTeZo:04, BaNoTe:07, ScGi:11].
Currently, there exist three main approaches for the solution of (1) or the underlying parametric PDE. One can approximate the parameter-to-solution map using a Galerkin method in the parameter space, see [BaTeZo:04, ScGi:11]. These methods often combine discretization of spatial and parameter dimensions. Second alternative is to apply a collocation method, where
is first evaluated at collocation points and then approximated by interpolation,[BaNoTe:07]
. To break the curse of dimensionality, several sparse and adaptive families of collocation points have been proposed[BaCoDa:17]. Finally, one can construct a reduced basis or a subspace of that can accurately represent the solution for desired [QuMaNe:16]. The reduced basis is constructed by evaluating the solution at sampling points that are selected, e.g., by a greedy algorithm [BuMaPaPr:12]. The solution is then approximated point-wise by computing subspace solution using the reduced basis.
In this work, we propose a reduced basis method for the solution of (1) that is inspired by the Conjugate Gradient (CG) method. The solution of the linear system (1) for a single can be approximated efficiently using CG if the condition number of is close to one. The CG method is an iteration for finding a sequence of approximate solutions to linear systems with s.p.d. coefficient matrices, see [HeSt:1953] and [axelsson:1994]. Each CG-iterate is the subspace solution from the Krylov subspace corresponding to the linear system and the iteration index. The th Krylov subspace related to the model problem (1) is defined as
Observe that is dependent on .
The convergence of CG is well studied; the estimated number of iterations required to compute an approximate solution with desired error grows with the condition number. The condition numbers of coefficient matrices related to the FE-solution of elliptic PDEs are large and increase when the applied finite element mesh is refined. Hence, if the CG method is used in this setting, a preconditioner is required to improve convergence.
In this work, we propose one kind of exact and two kinds of approximate compound Krylov (CK) subspace methods to efficiently compute approximate solutions to (1) for multiple . The computation proceeds in two stages:
Off-line stage: Compute a basis for the applied compound Krylov subspace.
On-line stage: Use the basis constructed in the off-line stage to compute subspace solution to (1) for multiple .
In practice, the computational cost related to the first stage is large whereas computing the subspace solution for a favorable
is fast. Therefore our proposed method is most beneficial when the parameter-to-solution map is evaluated for a large number of parameter vectors.
The family of exact compound Krylov subspaces is designed to satisfy the inclusion
Observe that is independent of but dependent on as well as on the matrix-valued function . Due to the inclusion in (3) and the best approximation property of subspace methods, the subspace solution to (1) from is at least as accurate as the th iterate produced by the CG method for any .
Constructing a subspace satisfying the inclusion (3) requires treating the –dependency of the linear matrix-valued function . We use the linearity of and define as the union of subspaces containing the range of the mapping
Particularly, we reformulate the terms as , where is a linearisation matrix independent of . This reformulation allows us to easily compute the range of that contains for any . Such linearisation process can be repeated for terms , and thus, to find a basis for the compound Krylov subspace . Special care must be taken to cope with the exponentially growing column dimension of the linearisation matrices. The dimension and the computational cost are reduced by two kinds of approximate compound Krylov subspaces that are defined by including low rank approximations to the linearisation process.
The proposed CK-solvers are subspace methods, and as such they produce the best possible approximation to the exact solution from the method subspace in the norm associated with the coefficient matrix . We take advantage of this property in error analysis. Particularly, we show that both kinds of approximate CK-method subspaces approximately contain a solution candidate appearing in the error analysis of the Conjugate Gradient (CG) method for any . Our final error estimate guarantees that the CK-solutions have a comparable error with the CG method if sufficiently accurate low rank approximations of the linearisation matrices are used.
This work is organised as follows. In Section 2 we give a brief review of subspace and Conjugate Gradient (CG) methods and discuss how Problem (1) is related to the FE-solution of the Poisson’s equation with varying material data or geometry. Compound Krylov subspaces are discussed in Section 3. Implementation of the CK method is outlined in Section 4. The proposed methods and analytical results are illustrated in Section 5 by numerical examples. We conclude with a discussion of the obtained results and future work in Section 6
In this section, we first discuss linear matrix-valued functions and their representation. Then we give two examples of linear systems of the type (1) that are related to finite element solution of parametric PDEs. Finally, we briefly review subspace methods and CG error analysis that are a prerequisite for Section 3.
2.1 Linear matrix-valued functions
Linear matrix-valued functions are defined as usual:
Function is called linear if for any and
Naturally, any linear matrix-valued function admits the representation for independent of . Particularly, there exists independent of such that
2.2 Application in FEM
The motivation for our proposed methods comes from solving parameter dependent partial differential equations using the finite element method. Let domain, where the dimension is or , have sufficiently regular boundary and . Consider the weak form of the modified Poisson’s equation: find satisfying
We are interested in solving this problem multiple times with different values of using finite elements. Assume the function is in the form
where . The function is chosen so that for a.e. and all . In this case, the Lax-Milgram lemma guarantees the existence of a unique solution to (6) for any , see, e.g. [evans:1998].
The weak problem (6) is solved using FEM by limiting it to some finite element space . This is, one solves the problem: find satisfying
The FE-space is finite dimensional and has a basis . The basis functions are defined with the help of a mesh, a partition of the domain to subdomains called elements. The maximum diameter of these elements is called the mesh size. For an introduction on FEM, see [Braess:2007, brenner_scott:1994].
The problem (8) is equivalent to the matrix equation: find satisfying , where and are defined as
The accuracy of the approximate solution produced by the compound Krylov solver depends, among other things, on the condition number of the coefficient matrix, see Theorems 1 and 2. If the condition number is close to one, these methods converge rapidly. However, the condition number of defined in (9) grows with decreasing mesh size and is typically much larger than one leading to slow convergence of CK-methods, see [ToWi:05, Chapter B.6] for analysis in the case . To speed up convergence we improve conditioning by applying a split preconditioner. Let satisfy
where the function satisfies the following: there exists such that
for all and for a.e. . In addition, let be the Cholesky factor of so that . We can now write the linear system as
Redefining , , and yields the preconditioned linear system: find satisfying
The assumption (7) ensures that the matrix is a linear matrix-valued function. This is, we have arrived to an instance of (1). Next, we give an estimate for the condition number of the coefficient matrix in (11).
We use the Rayleigh quotient
. According to it the smallest and largest eigenvalues of the matrixare
We apply a change of variables and write the quotient as
Let be defined as for . By (9), the quotient can be written as:
Using the assumptions in (12) we find that:
Combining this with (13) yields estimate for the smallest and the largest eigenvalue of
for any . Recalling the definition of the condition number completes the proof.
2.2.1 Example 1: Piecewise constant material parameter
Let the subdomains be non-overlapping and satisfy . Let , and . We consider solving the problem: find satisfying
is the characteristic function of the set.
As a demonstration we use - checkerboard patterns where the domain is divided into subdomains by slicing it into horizontal and vertical strips, see Fig. 1. We choose the preconditioning coefficient matrix . Estimate for the condition number of corresponding to (14) and follows using Lemma 1. As , we have
Therefore and in (12), and .
2.2.2 Example 2: Deformation of geometry
A slightly more complicated example of (1) is a problem where the parameter is related to deformation of the domain. As an example we solve the Poisson’s equation in a rectangular domain with a spherical hole in multiple positions along the y-direction. Let , , and . Consider solving the problem: find satisfying
for multiple , where , . We reduced the problem (16) to the reference domain by using the coordinate transformation defined as
Here specifies the length of the translation and is a piecewise linear function defined as follows:
The domain before and after the transformation can be seen in Figure 2. For convenience, we name the three subdomains with different transformation rules from bottom to top as , and . The Jacobian of the transformation is
Applying the change of variables in (16) and defining yields the problem: find satisfying
for all . Observe that for , hence, its absolute value can be omitted. Expanding the left hand side of (18) gives
Hence, the problem (18) can be solved in two parts: find satisfying:
Since is a piecewise linear function, the above matrix is piecewise constant with respect to the spatial variable and depends only on . Explicitly,
We identify the parameters with elements in the above equation as
The elements and equal to one since the Jacobian in the subdomain is the identity. The above relations and bound define the parameter set . The LHS of (20) corresponds to
Because the parameter , and also , vary symmetrically around , we choose the preconditioning coefficient matrix . We proceed to estimate the condition number of corresponding to (24) and . We obtain:
It’s worth noting that the condition number blows up when approaches .
2.3 Subspace methods
Let , , be a subspace, a basis of , and . A subspace method computes an approximate solution to the linear system by first solving the auxiliary problem: find satisfying
We call such as the subspace solution from .
Any defines an inner product and the induced norm in : for any let
The subspace solution from is the -orthogonal projection of onto . Thus depends only on , not on the basis . For this reason, we call as the method subspace.
The error of the subspace solution is measured in the -norm as . Because is the -orthogonal projection of the exact solution to , the -norm of the error satisfies the best approximation property:
Let be the exact solution and the subspace solution from to (1), respectively. Our aim is to design subspace , independent of , such that
We take advantage of the best approximation property and design compound Krylov method subspaces that contain either exactly or approximately a solution candidate appearing in the CG error analysis. By the best approximation property, the error of the CK-solution is then bounded by the error of this solution candidate. CG error analysis is discussed next.
2.4 The Conjugate Gradient Method
The CG method is an iteration for finding a sequence of approximate solutions to linear systems with s.p.d. coefficient matrices, see [HeSt:1953] and [axelsson:1994]. It can be understood as a line search method for minimising the energy functional associated to the linear system to be solved, or as a method finding a sequence of subspace solutions from the family of Krylov subspaces corresponding to the linear system.
Let , , and consider the linear system: find satisfying
The family of Krylov subspaces corresponding to (29) is defined as
The CG method computes a sequence of approximate solutions to (29) such that is the subspace solution from for each . It does this without computing a basis for , which makes CG method very memory efficient. We proceed to outline the CG error analysis, i.e., study how the error depends on , , and the iteration index . This material can be found, e.g., from [axelsson:1994].
First, observe the duality between vectors in and -degree polynomials: each satisfies
Similarly, for any .
It is well known that a bound for the error norm follows from the best approximation property (28) and constructing an approximation to from by utilising properties of the Chebychev polynomials, see [axelsson:1994]. By (30)
for any and satisfying . Denote the set of eigenvalues of by . By standard arguments,
The multiplier in (31) satisfies , where is the space of degree monic polynomials. One can verify that choosing appropriate yields all possible multipliers in . The CG error estimate could be constructed by finding that minimises the multiplicative term . As there does not exist a general solution for this optimisation problem, one instead finds that has the minimal norm in the set by translating and scaling the th Chebychev polynomial on as
Note that for any . Let be the condition number of , i.e., . Using the properties of Chebychev polynomials gives the identity
The element satisfying is
3 Compound Krylov Subspaces
In this section, we define three families of compound Krylov subspaces that are used to solve (1). We begin by defining the family , that satisfies
The subspace satisfying (36) that has the smallest possible dimension is
We take advantage of linearity of , and define the subspace containing (37) by linearisation: the terms are written as , where is the th linearisation matrix and the Kronecker product is repeated -times, see Section 3.1. By definition, it holds that . Hence, we define
The computational cost of a subspace method depends on the dimension of the applied method subspace. To keep both small, we propose two families of approximate compound Krylov subspaces, denoted by and , that have smaller dimensions but admit similar error estimate as . Spaces are obtained by using the span of the most dominant right singular vectors of the linearisation matrix instead of in (38). The space is obtained by applying an approximate linearisation process including a low-rank approximation step. In both cases, the low-rank approximation can be implemented in a way that eliminates the exponential growth in the dimension of all involved matrices for a favorable , see Lemma 4, Theorem 3, and numerical examples in Section 5.
3.1 Linearisation process
Next, we discuss the linearisation process and define the family of exact-CK subspaces satisfying (38). We begin with some notation.
Let and . We write for the -times Kronecker product of ,
The linearisation function of is defined as follows:
Let the matrix-valued function be linear and such that for any . The linearisation function of is defined as
for any .
We write for the functional power, i.e., for and . Using induction, we obtain the following Lemma that gives the linearisation of with respect to .
Let be linear, and be the linearisation function of . Then
Observe that the column dimension of grows exponentially with , particularly, .
In practical computation, the subspace is obtained by augmenting with . Hence, we use the following recursive definition instead of (38):
Let , be linear, and be the linearisation function of . Then the family of compound Krylov subspaces is defined as