A study of defect-based error estimates for the Krylov approximation of φ-functions

01/31/2020 · by Tobias Jawecki et al., TU Wien

Prior recent work, devoted to the study of polynomial Krylov techniques for the approximation of the action of the matrix exponential e^{tA}v, is extended to the case of the associated φ-functions (which occur within the class of exponential integrators). In particular, a posteriori error bounds and estimates, based on the notion of the defect (residual) of the Krylov approximation, are considered. Computable error bounds and estimates are discussed and analyzed. This includes a new error bound which compares favorably to existing error bounds in specific cases. The accuracy of various error bounds is characterized in relation to corresponding Ritz values of A. Ritz values yield properties of the spectrum of A (specific properties are known a priori, e.g. for Hermitian or skew-Hermitian matrices) in relation to the actual starting vector v, and they can be computed. This gives theoretical results together with criteria to quantify the achieved accuracy on the run. For other existing error estimates the reliability and performance is studied by similar techniques. Effects of finite precision (floating point arithmetic) are also taken into account.




1 Introduction

Overview of prior work.

The matrix exponential and associated φ-functions play a crucial role in some numerical methods for solving systems of differential equations. In practice this means that the vector e^{tA}v for a given matrix A and a given vector v, representing the time propagation for a linear initial value problem, is to be approximated. Similarly, the associated φ-functions (see (2.2) below) correspond to solutions of certain inhomogeneous differential equations. In particular, the evaluation of φ-functions is used in exponential integrators HO10.

If the matrix A is sparse and large, approximation of the action of these matrix functions in the class of Krylov subspaces is a general and well-established technique. For the matrix exponential and φ-functions this goes back to early works in the field of chemical physics NR83; PL86, parabolic problems GS92, some nonlinear problems FTDR89, etc. The case of a symmetric or skew-Hermitian matrix A is the most prominent one. Krylov approximations of the matrix exponential were studied early on for the symmetric case in DK89; DK91; Sa92, and together with φ-functions in a more general setting in HLS98; HL97.

Concerning different approaches for the numerical approximation of the matrix exponential see ML03. In Sa92 it is shown for the symmetric case that the Krylov approximation is equivalent to interpolation of the exponential function at associated Ritz values. This automatically results in a near-best approximation among other choices of interpolation nodes; see also DK89; SL96 and later work BR09 with similar results for the non-symmetric case. For other polynomial approaches approximating the matrix exponential we mention truncated Taylor series MH11 (and many earlier works), Chebyshev polynomial interpolation TK84, or the Leja method CKOR16.

In general, the Krylov approximation (or other polynomial approximations) results in an accurate approximation if the time step t in e^{tA}v is sufficiently small or the dimension of the Krylov subspace (i.e., the degree of the approximating matrix polynomial) is sufficiently large; see for instance HL97. The dimension of the Krylov subspace is limited in practice, and large time steps require a restart of the iteration generating the Krylov basis. A larger time step can be split into smaller substeps for which the Krylov approximation can be applied in a nested way. Such a restart strategy in the sense of a time integrator was already exploited in PL86. In particular we refer to the EXPOKIT package Si98. Similar ideas can be applied for the evaluation of φ-functions HLS98; NW12.

In practice, a posteriori error estimates are used to choose a proper Krylov dimension or proper (adaptive) substeps if the method is restarted as a time integrator. Different approaches for a posteriori error estimation make use of a series expansion of the error Sa92; Si98 or use a formulation via the defect (also called residual) of the Krylov approximation DGK98; CM97; BGH13. Further a priori as well as a posteriori error estimates are given in MN01; Lu08; DMR09; BR09; JL15; WY17; JAK19. Restarting via substeps based on different choices of error estimates is further discussed in JAK19. A restart with substeps together with a strategy to choose the Krylov dimension in terms of computational cost was presented in NW12; BK19. For various other approaches to restarting (without adapting the time step) we refer to CM97; EE06; Ta07; Ni07; AEEG08; EEG11; BGH13; Schwe15.

The influence of round-off errors on the construction of the Krylov basis in floating point arithmetic was studied early on for the symmetric case in Pa76; Par98. The orthogonalization procedure can behave in a numerically unstable manner, typically due to a loss of orthogonality. Nevertheless, the near-best approximation property and related a priori convergence results are not critically affected DK91; DGK98. Following DGK98, in the symmetric case the defect obtained in floating point arithmetic results in numerically stable error estimates.

Besides the polynomial Krylov method, further studies are devoted to the approximation of matrix functions using so-called extended Krylov subspaces DK98; KS10; GG13, rational Krylov subspaces MN04; EH06; Gu10, or polynomial Krylov subspaces with a harmonic Ritz approach HH05; Schwe15; WZX16.

Overview of the results presented here.

In Section 2 we introduce the problem setting and recapitulate basic properties of Krylov subspaces.

In Section 3 we introduce the defect associated with Krylov approximations to φ-functions, including the exponential function as the basic case. Our approach to the defect differs from WZX16 and is based on an inhomogeneous differential equation for the approximation error. This is used in Theorem 3.1 to obtain an integral representation of the error, also taking effects of floating point arithmetic into account. (Cf. DGK98 for the case of the matrix exponential.) In contrast to previous works (DGK98; JAK19), this result is extended to φ-functions here. Theorem 3.1 also includes an a priori upper bound on the error norm based on an integral of the defect norm.

This upper bound is further analyzed in Section 4 to obtain computable a posteriori bounds, in particular a new a posteriori bound (Theorem 4.1). We also study the accuracy of our and other existing defect-based bounds JAK19 with respect to spectral properties of the Krylov Hessenberg matrix (the representation of A in the orthogonal Krylov basis). To this end we use properties of divided differences, including a new asymptotic expansion for divided differences given in Appendix C. In Subsection 4.1 we recapitulate error estimates based on a quadrature estimate of the defect norm integral, e.g. the generalized residual estimate HLS98. We also discuss cases for which the defect norm behaves in an oscillatory manner and reliable quadrature estimates may be difficult to obtain. In Subsection 4.2 we specify a stopping criterion for the so-called lucky breakdown in floating point arithmetic which is justified by our a posteriori error bounds.

In Section 5 we illustrate our results via numerical experiments. Here we confine ourselves to the exponential function. Particular application problems where φ-functions are used will be documented elsewhere.

2 Problem statement and Krylov approximation

We discuss the approximation via Krylov techniques for the evaluation of the matrix exponential e^{tA}v, and in particular of the associated φ-functions, for a step size t > 0 and a matrix A ∈ ℂ^{n×n} applied to an initial vector v ∈ ℂ^n. Here,

    e^{tA} = ∑_{k=0}^∞ (tA)^k / k!.    (2.1)
The matrix exponential u(t) = e^{tA}v is the solution of the differential equation

    u′(t) = A u(t),  u(0) = v.
The associated φ-functions are given by

    φ_p(tA)v,  where  φ_p(z) = ∑_{k=0}^∞ z^k / (k + p)!,  p ≥ 0.    (2.2)
This includes the case φ_0(z) = e^z. The matrix functions (2.1) and (2.2) are defined according to their scalar counterparts. The following definitions of φ_p are equivalent to (2.2): For p ≥ 1 we have the recurrence φ_p(z) = (φ_{p−1}(z) − 1/(p−1)!)/z, and the integral representation

    φ_p(z) = ∫_0^1 e^{(1−θ)z} θ^{p−1}/(p−1)! dθ.    (2.3)
(See also (Hi08, Subsection 10.7.4).) The function u(t) = t^p φ_p(tA)v (p ≥ 1) is the solution of an inhomogeneous differential equation of the form

    u′(t) = A u(t) + (t^{p−1}/(p−1)!) v,  u(0) = 0,    (2.4)

see for instance NW12. This follows from (2.2) by termwise differentiation of the series.
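As a concrete illustration of the series and recurrence definitions above, consider the following minimal Python sketch. The truncation length `terms` is an arbitrary choice adequate for small |z|, not part of the paper; robust evaluation of φ-functions is discussed elsewhere (e.g. via the augmented-matrix technique of Theorem 3.3).

```python
import math

def phi(p, z, terms=30):
    """phi_p(z) = sum_{k>=0} z^k / (k+p)!, truncated Taylor series.
    Adequate for small |z| only; illustrative, not a production evaluator."""
    return sum(z**k / math.factorial(k + p) for k in range(terms))

z = 0.7
# phi_0 coincides with the exponential:
assert abs(phi(0, z) - math.exp(z)) < 1e-12
# recurrence phi_p(z) = (phi_{p-1}(z) - 1/(p-1)!)/z, here for p = 2:
assert abs(phi(2, z) - (phi(1, z) - 1.0) / z) < 1e-12
```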

The φ-functions appear, for instance, in the field of exponential integrators; see HO10.

For the case of A being a large and sparse matrix, e.g., the spatial discretization of a partial differential operator using a localized basis, Krylov subspace techniques are commonly used to approximate (2.2) in an efficient way.

Notation and properties of Krylov subspaces.

In the sequel, e_j denotes the j-th unit vector in ℝ^m or ℂ^m, respectively.

We briefly recapitulate the usual notation and properties of standard Krylov subspaces; see for instance Sa03. For a given matrix A ∈ ℂ^{n×n}, a starting vector v ∈ ℂ^n and Krylov dimension m ≤ n, the Krylov subspace K_m(A, v) is given by

    K_m(A, v) = span{v, Av, …, A^{m−1}v}.
Let V_m ∈ ℂ^{n×m} represent the orthonormal basis of K_m(A, v) with respect to the Hermitian inner product, constructed by the Arnoldi method and satisfying V_m^* V_m = I_{m×m}. Its first column is given by v_1 = v/β with β = ‖v‖_2. Here, the matrix

    H_m = V_m^* A V_m ∈ ℂ^{m×m}

is upper Hessenberg. We further use the notation h_{j,k} = (H_m)_{j,k}, and v_j ∈ ℂ^n for the j-th column of V_m, with v_{m+1} ∈ ℂ^n and h_{m+1,m} ≥ 0 obtained in the final Arnoldi step.

The Arnoldi decomposition (in exact arithmetic) can be expressed in matrix form,

    A V_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^T.    (2.5)
Remark 1

The numerical range W(A) = {x^* A x : x ∈ ℂ^n, ‖x‖_2 = 1} plays a role in our analysis. Note that W(H_m) ⊆ W(A) (see (A.1)).

Remark 2

The case h_{m+1,m} = 0 occurs if K_m(A, v) is an invariant subspace of A, whence the Krylov approximation given in (2.9) below is exact. This exceptional case is referred to as a lucky breakdown. In general we assume that no lucky breakdown occurs, whence the lower subdiagonal entries of H_m are real and positive, h_{j+1,j} > 0 for j = 1, …, m−1, and h_{m+1,m} > 0.

For the special case of a Hermitian or skew-Hermitian matrix A, the Arnoldi iteration simplifies to a three-term recurrence, the so-called Lanczos iteration. This case will be addressed in Remark 4 below.
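To make the construction above concrete, here is a minimal sketch of the Arnoldi method with modified Gram-Schmidt orthogonalization. This is an illustrative implementation, not the paper's code; it assumes no lucky breakdown occurs.

```python
import numpy as np

def arnoldi(A, v, m):
    """Arnoldi with modified Gram-Schmidt. Returns V (n x (m+1)) with
    orthonormal columns and the (m+1) x m upper Hessenberg H, so that
    A @ V[:, :m] = V @ H holds in exact arithmetic (cf. (2.5))."""
    n = v.shape[0]
    V = np.zeros((n, m + 1), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):              # orthogonalize against v_1, ..., v_{j+1}
            H[i, j] = np.vdot(V[:, i], w)   # Hermitian inner product
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)     # real and positive unless breakdown
        V[:, j + 1] = w / H[j + 1, j]       # assumes no lucky breakdown
    return V, H
```

In floating point arithmetic a single Gram-Schmidt pass can lose orthogonality, which is exactly the effect quantified in the next paragraph; reorthogonalization variants mitigate this at extra cost.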

Krylov subspaces in floating point arithmetic.

We proceed with some results for the Arnoldi decomposition in computer arithmetic, assuming complex floating point arithmetic with a relative machine precision ε; see also Hi02. For practical implementations, different variants of the Arnoldi procedure exist, using different orthogonalization schemes for the Krylov basis. These are based on classical Gram-Schmidt, modified Gram-Schmidt, the Householder algorithm, the Givens algorithm, or variants of Gram-Schmidt with reorthogonalization (see also (Sa03, Algorithms 6.1–6.3) and others). We refer to BLR00 and references therein for an overview of the stability properties of these different variants.

In the sequel the notation V_m, H_m, etc., will again be used for the result of the Arnoldi method in floating point arithmetic. We now adapt some statements formulated in the previous paragraph accordingly. By construction, H_m remains upper Hessenberg with positive lower subdiagonal entries. Assuming floating point arithmetic we use the notation F_m for the perturbation of the Arnoldi decomposition (2.5) caused by round-off, i.e.,

    A V_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^T + F_m.    (2.6)
An upper norm bound for F_m was first introduced in Pa76 for the Lanczos iteration in real arithmetic. For different variants of the Arnoldi or Lanczos iteration this is discussed in Ze03 and others. We assume ‖F_m‖_2 is bounded by a constant ε₁ which can depend on n and m in a moderate way and is sufficiently small in a typical setting,

    ‖F_m‖_2 ≤ ε₁.    (2.7a)
We further assume that the normalization of the columns of V_{m+1} is accurate, in particular that the (m+1)-st basis vector is normalized correctly up to round-off with a sufficiently small constant ε₂ (see e.g. (Pa76, eq. (14))),

    |‖v_{m+1}‖_2 − 1| ≤ ε₂.    (2.7b)
Concerning V_m, which represents an orthogonal basis in exact arithmetic, numerical loss of orthogonality has been well studied. Loss of orthogonality can be significant (see for instance Par98; BLR00 and others), depending on the starting vector v. Reorthogonalization schemes or orthogonalization via the Householder or Givens algorithm can be used to obtain orthogonality of V_m on a sufficiently accurate level.

The numerical range of H_m obtained in floating point arithmetic (see (2.6)) can be characterized as

    W(H_m) ⊆ W_{ε₃}(A),    (2.7c)

with W_{ε₃}(A) being the neighborhood of W(A) in ℂ with a distance ε₃. With the assumption that V_m is sufficiently close to orthogonal (e.g., semiorthogonal Si84), the constant ε₃ in (2.7c) (which also depends on ε₁, ε₂ and problem sizes) can be shown to be moderate-sized. Further details on this aspect are given in Appendix A.

Krylov approximation of -functions.

Remark concerning notation: 'large' objects live in ℂ^n, and 'small' objects live in ℂ^m.

Let V_m, H_m and v_{m+1} be the result of the Arnoldi method in floating point arithmetic for K_m(A, v) as described above. For a time-step t > 0 and p ≥ 0, the vector φ_p(tA)v can be approximated in the Krylov subspace by the Krylov propagator

The special case p = 0 reads

We remark that the small-dimensional problem φ_p(tH_m)e_1, typically with m ≪ n, can be evaluated cheaply by standard methods. In the sequel we denote


For p ≥ 1, the small-dimensional problem solves the differential equation


For later use we introduce the notation

which, according to (2.4), satisfies the differential equation
Remark 3

Although we take rounding effects in the Arnoldi decomposition into account, we do not give a full study of round-off errors at this point. Round-off errors in substeps, such as the evaluation of the small-dimensional matrix function or the matrix-vector multiplication with V_m, will be ignored. We refer to Hi02 for a more general study of these effects.
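The lifting step described above (evaluate the small matrix function densely, then map back to ℂ^n with the Krylov basis) can be sketched as follows for the exponential case. `Vm`, `Hm` and `beta` stand for the output of any Arnoldi routine; `scipy.linalg.expm` is assumed available as the dense small-scale evaluator.

```python
import numpy as np
from scipy.linalg import expm

def krylov_propagator(Vm, Hm, beta, t):
    """Approximate e^{tA} v by beta * V_m @ (expm(t H_m) e_1).
    Hm is m x m with m << n, so the dense matrix exponential is cheap;
    only the first column of expm(t Hm) is needed."""
    y = expm(t * Hm)[:, 0]      # expm(t H_m) applied to the first unit vector
    return beta * (Vm @ y)
```

The analogous evaluation for φ_p replaces the small exponential by a small φ_p evaluation (e.g. via the augmented matrix of Theorem 3.3 below).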

Remark 4

In the special cases A = H or A = iH for a Hermitian matrix H (with A being skew-Hermitian in the latter case), the orthogonalization of the Krylov basis of K_m(A, v) simplifies to a three-term recursion, the so-called Lanczos method. In the skew-Hermitian case (A = iH) the Krylov propagator (2.8a) can be evaluated via the Hermitian Krylov subspace, i.e., we approximate the function in the Krylov subspace K_m(H, v). The advantage is a cheaper computation of the Krylov subspace in terms of computational cost and better conservation of geometric properties. For details we refer to the notation as introduced in JAK19, with a Hermitian Krylov representation for the skew-Hermitian case.
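The three-term recurrence mentioned in Remark 4 can be sketched as follows for a real symmetric matrix. This is an illustrative single-pass Lanczos iteration (no reorthogonalization), so orthogonality of the computed basis degrades in floating point arithmetic, as discussed earlier.

```python
import numpy as np

def lanczos(A, v, m):
    """Symmetric Lanczos: three-term recurrence. Returns the basis V and
    the coefficients alpha (diagonal) and beta (subdiagonal) of the real
    tridiagonal matrix T_m = V_m^T A V_m (in exact arithmetic)."""
    n = v.shape[0]
    V = np.zeros((n, m + 1))
    alpha, beta = np.zeros(m), np.zeros(m)
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = A @ V[:, j]
        alpha[j] = V[:, j] @ w
        w = w - alpha[j] * V[:, j]
        if j > 0:
            w = w - beta[j - 1] * V[:, j - 1]   # three-term recurrence
        beta[j] = np.linalg.norm(w)
        V[:, j + 1] = w / beta[j]               # assumes no breakdown
    return V, alpha, beta
```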

The error of the Krylov propagator.

We denote the error of the Krylov propagator given in (2.9) by


We are further interested in computable a posteriori estimates for the error norm, which in the best case can be proven to be upper bounds on the error norm. Norm estimates of the error (2.12) can be used in practice to stop the Krylov iteration after m steps if the estimate satisfies (2.13) below, or to restrict the time-step t to obtain an accurate approximation and restart the method with the remaining time. For details on the total error with this restarting approach see also Si98; JAK19.

A prominent task is to test whether the error norm per unit step is bounded by a tolerance tol,


In case of the estimate being an upper bound on the error norm, this results in a reliable bound on the error norm (2.13).
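The per-unit-step acceptance test and the associated time-step restriction can be sketched as follows. The proportional form `err_est <= (t / T) * tol` is our reading of (2.13) for a total time interval T; `estimator` stands for any of the computable error estimates discussed in Section 4 and is a hypothetical callable here.

```python
def accept_substep(err_est, t, T, tol):
    """Test (2.13): accept a substep of size t out of a total interval T
    if the estimated error does not exceed the proportional share of the
    overall tolerance."""
    return err_est <= (t / T) * tol

def shrink_until_accepted(estimator, t, T, tol):
    """Halve the substep until the error estimate is acceptable; a crude
    stand-in for the adaptive substepping strategies of Si98 and JAK19."""
    while not accept_substep(estimator(t), t, T, tol):
        t /= 2
    return t
```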

3 An integral representation for the error of the Krylov propagator

We proceed with discussing the error of the Krylov propagator. To this end we first define its scalar defect by

and the defect integral by (this and the result of Theorem 3.1 remain valid in the case h_{m+1,m} = 0).
Theorem 3.1

Let the defect be defined as in (3.1a). For the Krylov propagator defined in (2.9) and a numerical perturbation of the Arnoldi decomposition (see (2.6)), we have:

  1. The error of the Krylov propagator (see (2.12)) enjoys the integral representation

  2. For given machine precision ε and constants ε₁, ε₂ representing round-off effects (see (2.7a), (2.7b)), the error norm is bounded by


    with the defect integral defined in (3.1b).


  1. For the exact matrix function we use the notation

    For the Krylov propagator we denote

    (see (2.9)), and we also define

    • For p ≥ 1, the respective functions satisfy the differential equations (see (2.4), (2.11b))

    • For p = 0, according to (2.10) we have

    Local error representation in terms of the defect.

    We define the scaled error

    • For p ≥ 1 the scaled error satisfies


      with the defect of the Krylov propagator with respect to the differential equation (3.3),

      Together with (2.6), the defect can be written as

    • For p = 0, in an analogous way we obtain

    We conclude


    with the scalar defect defined in (3.1a). Due to (3.4) we have

    and together with (3.5) this implies (3.2a).

  2. The representation (3.2a) implies the upper bound


    With the defect integral defined in (3.1b) we obtain the first term in (3.2b). For the second integral term we use the upper bound

    • For p ≥ 1 we apply the integral representation (2.3) to obtain the norm bound

    • For p = 0 we obtain (3.8) in a direct way.

    Combining (3.7) with (3.8) we obtain

    Combining these estimates with (3.6) we conclude (3.2b). ∎

Remark 5

The error norm of the Krylov propagator scales with t and β in a natural way. (Taking the maximum in the definition of the respective constants is necessary to cover the limiting case.) For the special case the upper norm bound given in Theorem 3.1 can be adapted accordingly. It is well known that

see for instance (Hi08, Theorem 10.11). Problems with a numerical range extending into the right complex half-plane can be arbitrarily ill-conditioned and difficult to solve with proper accuracy. (For further results on the stability of the matrix exponential see also ML03; Lo77.) We will not further discuss such problems and assume max Re W(A) ≤ 0. We refer to this as the dissipative case.

For the dissipative case with the error bound (3.2b) from Theorem 3.1 reads


The dissipative behavior of e^{tA} carries over to the Krylov propagator up to a perturbation which depends on round-off errors, including the loss of orthogonality of V_m. In terms of the numerical range we have W(H_m) ⊆ W_{ε₃}(A), for a constant ε₃ depending on round-off effects (2.7c). Thus max Re W(H_m) ≤ ε₃.

Our aim is to construct an upper norm bound for the error per unit step (2.13) via (3.9). Let the tolerance tol be given and t be a respective time step for (2.13). Then the round-off error terms in (3.9) are negligible if


Concerning the constants ε₁, ε₂ and ε₃, see (2.7). We recapitulate that ε₁ and ε₂ given in (2.7a) and (2.7b) can be considered small enough in a standard Krylov setting. The constant ε₃ can be larger in the case of a loss of orthogonality of the Krylov basis, which can however be avoided at the cost of additional computational effort. The constant ε₃ only appears as an exponential prefactor for the round-off term in (3.10) and is less critical compared to ε₁ and ε₂.

Taking the previous observation on the round-off errors in (3.9) into account, we consider the following upper bound to be stable in computer arithmetic in accordance with a proper value of t, see (3.10).

Corollary 1

For the dissipative case, and with the assumption that round-off errors are negligible, the error of the Krylov propagator is bounded by the defect integral,

Note that the defect norm cannot be integrated exactly in general. This point will be studied further in the sequel.
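For orientation, one computable proxy for the defect integral in the exponential case can be sketched numerically. This sketch assumes the scalar defect takes the standard form δ(s) = β h_{m+1,m} e_m^T e^{sH_m} e_1 (cf. DGK98 for the matrix exponential); the node count of the composite trapezoidal rule is an arbitrary choice, not the quadrature analyzed in Subsection 4.1, and oscillatory defects may require far more nodes.

```python
import numpy as np
from scipy.linalg import expm

def defect_integral_estimate(H, beta, t, nodes=16):
    """Approximate int_0^t |delta(s)| ds with (assumed form)
    delta(s) = beta * h_{m+1,m} * e_m^T expm(s H_m) e_1,
    by the composite trapezoidal rule. H is the (m+1) x m Hessenberg
    matrix from the Arnoldi iteration."""
    m = H.shape[1]
    hm1 = abs(H[m, m - 1])
    s = np.linspace(0.0, t, nodes)
    d = np.array([abs(expm(si * H[:m, :m])[m - 1, 0]) for si in s])
    return beta * hm1 * 0.5 * np.sum((d[1:] + d[:-1]) * np.diff(s))
```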

Representing the defect in terms of divided differences.

Divided differences play an essential role in this work. We use the notation

    f[x_0, …, x_k]

for the divided differences of a function f over the nodes x_0, …, x_k. (This is to be understood in the confluent sense for the case of multiple nodes, see for instance (Hi08, Section B.16).)
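For illustration, a minimal sketch of the standard recursive divided-difference table for distinct nodes (the confluent case would require derivative information and is not handled here). Note that for f = exp this scheme can lose accuracy for clustered nodes, which is one reason more careful evaluation strategies matter in this context.

```python
def divided_differences(f, nodes):
    """Return [f[x0], f[x0,x1], ..., f[x0,...,xk]] via the recursive table
    f[x_i,...,x_{i+k}] = (f[x_{i+1},...] - f[x_i,...]) / (x_{i+k} - x_i).
    Distinct nodes only; illustrative, not numerically robust."""
    n = len(nodes)
    col = [f(x) for x in nodes]
    out = [col[0]]
    for k in range(1, n):
        col = [(col[i + 1] - col[i]) / (nodes[i + k] - nodes[i])
               for i in range(n - k)]
        out.append(col[0])
    return out
```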

Theorem 3.2 (see for instance Cm97)

Let H ∈ ℂ^{m×m} be an upper Hessenberg matrix with positive secondary diagonal entries, h_{j+1,j} > 0 for j = 1, …, m−1, and eigenvalues λ_1, …, λ_m. Let f be an analytic function for which f(H) is well defined. Then,

    e_m^T f(H) e_1 = γ · f[λ_1, …, λ_m],

with γ = ∏_{j=1}^{m−1} h_{j+1,j} > 0.

For p = 1 we will also make use of the following result. (Theorem 3.3 can be generalized to the case p > 1, see (MH11, Theorem 2.1). The case p = 1 is sufficient for our purpose.)

Theorem 3.3 (Corollary 1 in Si98 )

(Expressing φ-functions via dilated φ-functions.) For p = 1,


The matrix H̃ in Theorem 3.3 is block triangular with eigenvalues equal to those of H_m and 0. Therefore, σ(H̃) = σ(H_m) ∪ {0}, with 0 as an eigenvalue of multiplicity (at least) one. In our context, H̃ is upper Hessenberg with a positive lower secondary diagonal. In accordance with Theorem 3.2 the result of Theorem 3.3 holds for divided differences in a similar manner,
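The mechanism behind Theorem 3.3 can be made concrete for p = 1: the action φ_1(tA)b can be read off a single exponential of an augmented matrix. The code below is an illustrative sketch of this well-known construction (cf. Si98), not the paper's algorithm; it relies on the fact that y(t) = t·φ_1(tA)b solves y′ = Ay + b, y(0) = 0.

```python
import numpy as np
from scipy.linalg import expm

def phi1_action(A, b, t):
    """Compute phi_1(tA) b from one augmented matrix exponential:
    expm(t * [[A, b], [0, 0]]) carries t * phi_1(tA) b in its
    top-right column."""
    n = b.shape[0]
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = A
    M[:n, n] = b
    return expm(t * M)[:n, n] / t
```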

With Theorems 3.2 and 3.3 the following equivalent formulations can be used to rewrite the scalar defect defined in (3.1a).

Corollary 2

Let the scalar defect be given as in (3.1a) for the upper Hessenberg matrix H_m with positive secondary diagonal entries and eigenvalues λ_1, …, λ_m. Let H̃ be given as in Theorem 3.3. For the scalar defect we obtain the following equivalent formulations:

  1. (Here we introduce the corresponding notation.)

We remark that the eigenvalues of the Krylov Hessenberg matrix H_m are also referred to as Ritz values (of A) in the literature.

4 Computable a posteriori error bounds for the Krylov propagator

The following two propositions are used for the proof of Theorem 4.1 below. (We use the notation introduced in the previous sections.)

Proposition 1

For arbitrary nodes x_0, …, x_k and t ≥ 0,


See Appendix B.

Proposition 2 (Lemma including eq. (5.1.1) in Mnp84 )

For arbitrary nodes x_0, …, x_k,


See Appendix B.

We now derive upper bounds for the error via its representation by the defect integral (3.1b).

Theorem 4.1

Let t > 0, and assume that round-off errors are sufficiently small (see Corollary 1). For the eigenvalues of H_m we write λ_1, …, λ_m. An upper bound on the error norm is given by


Due to Corollary 2, (iv),

The divided differences in (4.2a) span over the complex nodes λ_1, …, λ_m and 0, with real parts Re λ_j ≤ 0. Propositions 2 and 1 imply
From Corollary 2 we obtain

Eqs. (4.2a)–(4.2b) together with Corollary 1 imply (4.1). ∎

For the case of H_m having real eigenvalues, the assertion of Theorem 4.1 can be reformulated in the following way (see (JAK19, Proposition 6)).

Corollary 3

Assume t > 0 and that round-off errors are sufficiently small (see Corollary 1). For the case of H_m having real eigenvalues λ_1, …, λ_m, the upper bound on the error norm in Theorem 4.1 yields an exact evaluation of the defect integral. Hence,

As a further corollary we formulate an upper bound on the error norm which is cheaper to evaluate than the bound from Theorem 4.1 but may be less tight. Using the Mean Value Theorem ((Hi08, eq. (B.26)) or (Bo05, eq. (44))) for the divided differences in Theorem 4.1, eq. (4.1), we obtain the following result, which corresponds to (JAK19, Theorems 1 and 2). For the exponential of a skew-Hermitian matrix a similar error estimate has been used in KBC05; it is based on ideas of PL86 with an incomplete theoretical justification.

Corollary 4

Let t > 0, and assume that round-off errors are sufficiently small (see Corollary 1). With the eigenvalues of H_m as above, an upper bound on the error norm is given by