# IMEX error inhibiting schemes with post-processing

High order implicit-explicit (IMEX) methods are often desired when evolving the solution of an ordinary differential equation that has a stiff part that is linear and a non-stiff part that is nonlinear. This situation often arises in the semi-discretization of partial differential equations, and many such IMEX schemes have been considered in the literature. The methods considered usually have a global error that is of the same order as the local truncation error. More recently, methods with global errors that are one order higher than predicted by the local truncation error have been devised by Kulikov and Weiner, and by Ditkowski and Gottlieb. In prior work we investigated the interplay between the local truncation error and the global error to construct explicit and implicit error inhibiting schemes that control the accumulation of the local truncation error over time, resulting in a global error that is one order higher than expected from the local truncation error, and which can be post-processed to obtain a solution that is two orders higher than expected. In this work we extend the error inhibiting with post-processing framework introduced in our previous work to a class of additive general linear methods with multiple steps and stages. We provide sufficient conditions under which these methods with local truncation error of order p will produce solutions of order (p+1), which can be post-processed to order (p+2), and we describe the construction of one such post-processor. We apply this approach to obtain implicit-explicit (IMEX) methods with multiple steps and stages. We present some of our new IMEX methods, show their linear stability properties, and investigate how these methods perform in practice on some numerical test cases.


## 1 Introduction

Efficient high order numerical methods for evolving the solution of an ordinary differential equation (ODE) are frequently desired, especially for the time-evolution of PDEs that are semi-discretized in space. Higher order methods are desirable as they enable more accurate solutions for larger step-sizes. In this paper we consider numerical solvers for ordinary differential equations (ODEs)

 u_t = F(t, u), \quad t \ge 0, \qquad u(t_0) = u_0. \qquad (1)

(where F is sufficiently smooth). Any non-autonomous system of this form can be converted to an autonomous system (see [15]), so that without loss of generality we restrict ourselves to the autonomous case:

 u_t = F(u), \quad t \ge 0, \qquad u(t_0) = u_0. \qquad (2)

The forward Euler method is the simplest numerical method for evolving this problem forward in time with time-step \Delta t:

 v^{n+1} = v^n + \Delta t \, F(v^n),

where v^n approximates the exact solution u(t_n) at time t_n, and we let t_n = t_0 + n \Delta t. For this method, we have

 \tau^n = \Delta t \, \mathrm{LTE}^n = u(t_{n-1}) + \Delta t \, F(u(t_{n-1})) - u(t_n) = O(\Delta t^2),

where \mathrm{LTE}^n is called the local truncation error, and \tau^n the approximation error, at any given time t_n. At any such time we are interested in the global error E^n, which is the difference between the numerical and exact solutions. For the forward Euler method the global error is first order accurate:

 E^n = v^n - u(t_n) = O(\Delta t).

Observe that the global error and the local truncation error are of the same order, O(\Delta t). This matches the common experience that the one-step methods and linear multistep methods in typical use produce solutions that have a global error of the same order as the local truncation error.
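As a quick illustrative check (our own sketch, not taken from the paper), the following snippet applies forward Euler to the scalar test problem u_t = -u and estimates the observed convergence order from successive step-size halvings; the estimated orders should be close to one:

```python
import numpy as np

# Forward Euler on u' = -u, u(0) = 1, whose exact solution is exp(-t).
# Halving the step size should roughly halve the final-time error,
# confirming the first-order global accuracy discussed above.
def forward_euler(f, u0, t_final, dt):
    n_steps = int(round(t_final / dt))
    u = u0
    for _ in range(n_steps):
        u = u + dt * f(u)
    return u

f = lambda u: -u
errors = []
for dt in (0.1, 0.05, 0.025):
    err = abs(forward_euler(f, 1.0, 1.0, dt) - np.exp(-1.0))
    errors.append(err)

# Observed order log2(err(dt) / err(dt/2)) should be close to 1.
orders = [np.log2(errors[i] / errors[i + 1]) for i in range(len(errors) - 1)]
```

Running this gives observed orders near 1, consistent with the global error estimate above.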

To generate methods with higher order local truncation errors, we can generalize the forward Euler scheme by adding steps (e.g. linear multistep methods), stages (e.g. Runge–Kutta methods), or derivatives (e.g. Taylor series methods). The Dahlquist Equivalence Theorem [27] states that any zero-stable, consistent linear multistep method with truncation error of order p will have global error of order p, provided that the solution is sufficiently smooth. This behavior is usually seen in other types of schemes as well, and is the general expectation: so much so that the order of a numerical method is typically defined solely by the order conditions derived by Taylor series analysis of the local truncation error. However, the Lax-Richtmyer equivalence theorem (see e.g. [21], [12], [22]) states only that if the numerical scheme is stable then its global error is at least of the same order as its local truncation error. Recent work [17, 29, 18, 28, 5] highlights that while for a stable scheme the global error must be at least of the same order as its local truncation error, it may in fact be of higher order.

Schemes that have global errors that are higher order than predicted by the local truncation errors were devised by Kulikov [17] and by Weiner and colleagues [29], following the quasi-consistency theory first introduced by Skeel in 1978 [25]. These methods have truncation errors that are of order p but have global errors of order p+1. In [5] Ditkowski and Gottlieb derived sufficient conditions for a class of general linear methods (GLMs) under which one can control the accumulation of the local truncation error over the time evolution and produce methods that are one order higher than expected from the truncation error alone.

Additional conditions were derived in [18, 28, 19] that allow the computation of the precise form of the first surviving term in the global error (the vector multiplying \Delta t^{p+1}). In these works, this form was leveraged for error estimation. However, the conditions in [18, 28, 19] were overly restrictive and made it difficult to find methods. In [6] Ditkowski, Gottlieb, and Grant showed that under less restrictive conditions the form of the first surviving term in the global error can be computed explicitly, and they showed how the solution can be post-processed to obtain accuracy of order p+2. In that work, they produced implicit and explicit methods that have favorable stability properties. The coefficients of those methods can be downloaded from [8]. In [7] Ditkowski, Gottlieb, and Grant derived sufficient conditions under which two-derivative GLMs of a certain form can produce solutions of order p+1 and can be post-processed to obtain solutions of order p+2. The coefficients of those methods can be downloaded from [9].

In many problems, there is a component that is stiff and easy to invert (e.g. linear) and a component that is non-stiff but difficult to invert (e.g. nonlinear). Such examples include advection-diffusion PDEs, where the diffusion term requires a small time-step when used with an explicit time-stepping method but is linear, and the advection term is nonlinear but allows a relatively large time-step when used with an explicit time-stepping method. In such cases it is convenient to distinguish these two components and write the resulting autonomous ODE as

 u_t = F(u) + G(u), \quad t \ge 0, \qquad u(t_0) = u_0. \qquad (3)

(where we assume, as before, the required smoothness on F and G). Here, G is a stiff but 'simple to invert' operator, while F is non-stiff but complicated (or costly) to invert, and typically nonlinear.

Implicit-explicit (IMEX) methods were developed to treat the F component explicitly and the G component implicitly, thus alleviating the linear stability constraint on the stiff component while treating the difficult-to-invert component explicitly. IMEX methods were first introduced by Crouzeix in 1980 [4] for evolving parabolic equations. For time dependent PDEs (notably convection-diffusion equations), Ascher, Ruuth, and Wetton [1] introduced IMEX multistep methods, and Ascher, Ruuth, and Spiteri [2] introduced IMEX Runge–Kutta schemes. Implicit methods are often particularly desirable when applied to a linear component, in which case the order conditions may simplify: such methods were considered by Calvo, Frutos, and Novo in [3]. In 2003, Kennedy and Carpenter [16] derived IMEX Runge–Kutta methods based on singly diagonally implicit Runge–Kutta (SDIRK) methods. This work introduced sophisticated IMEX methods with good accuracy and stability properties, as well as high quality embedded methods for error control and other features that make these methods usable in complicated applications. Recently, there has been more interest in IMEX methods with multiple steps and stages, including [14], [26]. Optimized stability regions for IMEX methods with multiple steps and stages have also been of recent interest, as in [30] and [20].
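To make the splitting concrete, here is a minimal first-order IMEX (forward/backward) Euler sketch of our own, not a method from this paper: the stiff linear part G(u) = -Lu is treated implicitly and the non-stiff part F explicitly. The test problem u_t = sin(u) - 100u and the step size are illustrative choices:

```python
import math

# One IMEX Euler step for u' = F(u) + G(u) with G(u) = -L*u linear and stiff:
#   v^{n+1} = v^n + dt*F(v^n) - dt*L*v^{n+1},
# so each step needs only the solve (1 + dt*L) v^{n+1} = v^n + dt*F(v^n).
def imex_euler(F, L, u0, dt, n_steps):
    v = u0
    for _ in range(n_steps):
        v = (v + dt * F(v)) / (1.0 + dt * L)  # implicit solve for the stiff linear part
    return v

# IMEX Euler handles dt*L = 10 without difficulty and the solution decays.
v_imex = imex_euler(math.sin, 100.0, 1.0, 0.1, 20)

# For contrast, fully explicit Euler on the same problem is unstable at this
# step size (dt*L = 10 far exceeds the explicit stability limit of 2).
v_exp = 1.0
for _ in range(20):
    v_exp = v_exp + 0.1 * (math.sin(v_exp) - 100.0 * v_exp)
```

The implicit treatment of the stiff term keeps the step stable at a step size where the fully explicit iteration blows up, which is precisely the motivation for the IMEX splitting.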

In recent work, Schneider and colleagues [23, 24] produced super-convergent IMEX Peer methods (a Peer method is a GLM where each stage is of the same order). These methods satisfy error inhibiting conditions similar to those in [5] and so produce order although their truncation error is of order , so we shall refer to them here as IMEX-EIS schemes. In this work, we extend the error inhibiting with post-processing theory in [6] and [7] to IMEX methods. We proceed to devise error inhibiting IMEX methods of up to five steps and order six that have truncation errors or order but which can be post-processed to attain a solution of order . We refer to these as IMEX-EIS+ methods. We test these methods on a number of numerical examples to demonstrate their enhanced accuracy properties. This paper is structured as follows: in Section 2 we present our notation and some information about our IMEX methods which have multiple steps and stages. In Section 3.1 we provide sufficient conditions under which these methods with local truncation error of order will produce solutions of order , which can be post-processed to order . We provide the construction of one such post-processor in Section 3.2. In Section 4 we present some of our new methods and show their linear stability properties, and in Section 5 we show how these methods perform in practice on some numerical test cases.

## 2 Multistep multi-stage additive methods

In this work we consider the class of additive general linear methods (GLMs) of the form

 V^{n+1} = D V^n + \Delta t \left[ A_F F(V^n) + R_F F(V^{n+1}) + A_G G(V^n) + R_G G(V^{n+1}) \right], \qquad (4)

where V^n is a vector of length s that contains the numerical solution at the times t_n + c_j \Delta t for j = 1, \ldots, s:

 V^n = \left( v(t_n + c_1 \Delta t), v(t_n + c_2 \Delta t), \ldots, v(t_n + c_s \Delta t) \right)^T. \qquad (5)

The functions F(V^n) and G(V^n) are defined as the component-wise function evaluations on the vector V^n:

 F(V^n) = \left( F(v(t_n + c_1 \Delta t)), F(v(t_n + c_2 \Delta t)), \ldots, F(v(t_n + c_s \Delta t)) \right)^T \qquad (6)

and

 G(V^n) = \left( G(v(t_n + c_1 \Delta t)), G(v(t_n + c_2 \Delta t)), \ldots, G(v(t_n + c_s \Delta t)) \right)^T. \qquad (7)

For convenience, we select c_1 = 0 so that the first element in the vector V^n approximates the solution at time t_n, and the abscissas are non-decreasing: c_1 \le c_2 \le \ldots \le c_s. To initialize these methods, we define the first element in the initial solution vector as u_0, and the remaining elements are computed using a sufficiently accurate method.

Similarly, we define the projection of the exact solution of the ODE (3) onto the temporal grid by:

 U^n = \left( u(t_n + c_1 \Delta t), u(t_n + c_2 \Delta t), \ldots, u(t_n + c_s \Delta t) \right)^T, \qquad (8)

with F(U^n) and G(U^n) the component-wise function evaluations on the vector U^n. We define the global error as the difference between the vectors of the numerical and the exact solutions at time t_n:

 E^n = V^n - U^n. \qquad (9)
###### Remark

Note that the additive form (4) includes both explicit and implicit schemes, as V^{n+1} appears on both sides of the equation. However, if R_F and R_G are both strictly lower triangular, the scheme is explicit. A special case of additive methods is one that is explicit in F and implicit in G: such methods are called implicit-explicit or IMEX methods. In this work we are primarily interested in IMEX schemes, so we require that R_F in the form (4) is strictly lower triangular, and that R_G contains some non-zero diagonal or super-diagonal elements. However, in this section and in Section 3.1 we do not assume this is the case, and so our error inhibiting with post-processing theory holds for all additive methods of the form (4), which is a much more general class than IMEX methods.
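When both parts happen to be linear, F(u) = a u and G(u) = b u, one step of the additive form (4) reduces to a single linear solve, which the following sketch of our own illustrates. The coefficient matrices below are placeholder values chosen for illustration only, not a method from this paper (note that D is rank one with unit row sums):

```python
import numpy as np

# One step of the additive GLM (4) when both parts are linear, F(u) = a*u and
# G(u) = b*u. Componentwise evaluation on the stage vector then reduces the
# implicit step to the linear solve
#   (I - dt*(a*R_F + b*R_G)) V^{n+1} = (D + dt*(a*A_F + b*A_G)) V^n.
def glm_step_linear(D, AF, RF, AG, RG, a, b, dt, Vn):
    s = len(Vn)
    lhs = np.eye(s) - dt * (a * RF + b * RG)
    rhs = (D + dt * (a * AF + b * AG)) @ Vn
    return np.linalg.solve(lhs, rhs)

D = np.array([[0.0, 1.0], [0.0, 1.0]])   # rank one, row sums equal to one
AF = np.array([[0.5, 0.0], [0.0, 0.5]])  # explicit-part coefficients (placeholders)
RF = np.zeros((2, 2))                     # strictly lower triangular: explicit in F
AG = np.zeros((2, 2))
RG = np.array([[0.5, 0.0], [0.0, 0.5]])  # non-zero diagonal: implicit in G
Vn = np.array([1.0, 2.0])

Vnp1 = glm_step_linear(D, AF, RF, AG, RG, a=-1.0, b=-10.0, dt=0.1, Vn=Vn)
```

As a sanity check, setting a = b = 0 reduces the step to V^{n+1} = D V^n, as the form (4) requires.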

### 2.1 Truncation errors

A method of the form (4) has an approximation error at time t_n given by

 \tau^n = \left[ D U^{n-1} + \Delta t \left( A_F F(U^{n-1}) + R_F F(U^n) + A_G G(U^{n-1}) + R_G G(U^n) \right) \right] - U^n, \qquad (10)

where

 \tau^n = \sum_{j=0}^{\infty} \tau^n_j \Delta t^j = \tau_0 u(t_n) + \sum_{j=1}^{\infty} \Delta t^j \left( \tau_j \frac{d^j u}{dt^j}\Big|_{t=t_n} + \hat{\tau}_j \frac{d^{j-1} G(u)}{dt^{j-1}}\Big|_{t=t_n} \right), \qquad (11)

with

 \tau_0 = (D - I)\mathbf{1}, \qquad (12a)

 \tau_j = \frac{1}{(j-1)!} \left( \frac{1}{j} D (c - \mathbf{1})^j + A_F (c - \mathbf{1})^{j-1} + R_F c^{j-1} - \frac{1}{j} c^j \right) \quad \text{for } j > 0, \qquad (12b)

 \hat{\tau}_j = \frac{1}{(j-1)!} \left[ (A_G - A_F)(c - \mathbf{1})^{j-1} + (R_G - R_F) c^{j-1} \right] \quad \text{for } j = 1, 2, \ldots \qquad (12c)

We denote the vector of abscissas by c = (c_1, \ldots, c_s)^T, and the vector of ones by \mathbf{1}. Any terms of the form c^j are to be understood component-wise, c^j = (c_1^j, \ldots, c_s^j)^T. Note that this notation matches the one used in [20].

Alternatively, observing that u_t = F(u) + G(u), we can write this as:

 \tau^n = \tau_0 u(t_n) + \sum_{j=1}^{\infty} \Delta t^j \left( \tau_j \frac{d^j u}{dt^j}\Big|_{t=t_n} + \hat{\tau}_j \frac{d^{j-1} G(u)}{dt^{j-1}}\Big|_{t=t_n} \right)
      = \tau_0 u(t_n) + \sum_{j=1}^{\infty} \Delta t^j \left( \tau_j \frac{d^{j-1} u_t}{dt^{j-1}}\Big|_{t=t_n} + \hat{\tau}_j \frac{d^{j-1} G(u)}{dt^{j-1}}\Big|_{t=t_n} \right)
      = \tau_0 u(t_n) + \sum_{j=1}^{\infty} \Delta t^j \left( \tau_j \frac{d^{j-1} F(u)}{dt^{j-1}}\Big|_{t=t_n} + (\tau_j + \hat{\tau}_j) \frac{d^{j-1} G(u)}{dt^{j-1}}\Big|_{t=t_n} \right)
      = \tau_0 u(t_n) + \sum_{j=1}^{\infty} \Delta t^j \left( \tau^F_j \frac{d^{j-1} F(u)}{dt^{j-1}}\Big|_{t=t_n} + \tau^G_j \frac{d^{j-1} G(u)}{dt^{j-1}}\Big|_{t=t_n} \right),

where

 \tau_0 = (D - I)\mathbf{1}, \qquad (13a)

 \tau^F_j = \frac{1}{(j-1)!} \left( \frac{1}{j} D (c - \mathbf{1})^j + A_F (c - \mathbf{1})^{j-1} + R_F c^{j-1} - \frac{1}{j} c^j \right) \quad \text{for } j > 0, \qquad (13b)

 \tau^G_j = \frac{1}{(j-1)!} \left( \frac{1}{j} D (c - \mathbf{1})^j + A_G (c - \mathbf{1})^{j-1} + R_G c^{j-1} - \frac{1}{j} c^j \right) \quad \text{for } j > 0. \qquad (13c)

This notation is the same as used in [26].

Regardless of whether we use the notation (12) or (13), for a method to have truncation error of order p it must satisfy the order conditions

 \tau^n_j = 0 \quad \text{for } j \le p. \qquad (14)

As this must be true for all solutions u and all times t_n, these conditions become

 \tau_0 = 0, \quad \text{and} \quad \tau_j = \hat{\tau}_j = 0 \quad \forall j = 1, \ldots, p,

or, equivalently,

 \tau_0 = 0, \quad \text{and} \quad \tau^F_j = \tau^G_j = 0 \quad \forall j = 1, \ldots, p.

We are only interested in zero-stable methods. A sufficient condition for this is that the coefficient matrix D is a rank one matrix with row sums equal to one: this satisfies the consistency condition

 \tau_0 = (D - I)\mathbf{1} = 0.

For simplicity we assume this to be the case in the remainder of this work.
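As a sanity check, the order conditions (12a)-(12c) can be evaluated numerically. The sketch below (our own illustration, not code from the paper) implements them and verifies that IMEX Euler, written as a one-stage instance of (4) with D = (1), A_F = (1), R_F = (0), A_G = (0), R_G = (1), c = (1), satisfies the conditions for order one but not order two:

```python
import math
import numpy as np

# The order conditions (12a)-(12c) implemented literally. Powers such as
# (c - 1)^j act componentwise, and (c - 1)^0 is the vector of ones.
def tau0(D):
    return (D - np.eye(len(D))) @ np.ones(len(D))

def tau_j(D, AF, RF, c, j):
    e = np.ones_like(c)
    return ((D @ (c - e) ** j) / j + AF @ (c - e) ** (j - 1)
            + RF @ c ** (j - 1) - c ** j / j) / math.factorial(j - 1)

def tau_hat_j(AF, RF, AG, RG, c, j):
    e = np.ones_like(c)
    return ((AG - AF) @ (c - e) ** (j - 1)
            + (RG - RF) @ c ** (j - 1)) / math.factorial(j - 1)

# IMEX Euler, v^{n+1} = v^n + dt*F(v^n) + dt*G(v^{n+1}), as a one-stage GLM:
D = np.array([[1.0]]); c = np.array([1.0])
AF = np.array([[1.0]]); RF = np.array([[0.0]])
AG = np.array([[0.0]]); RG = np.array([[1.0]])
```

Evaluating these gives tau_0 = tau_1 = tau-hat_1 = 0 while tau_2 is non-zero, confirming that IMEX Euler has a first order truncation error.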

### 2.2 Preliminaries

In this subsection we make an observation that will be useful in the remainder of the paper.

Observation 1 Given the smoothness of F and G, and the assumption that \|E^n\| = O(\Delta t), we observe that

 F(U^n + E^n) = F(U^n) + F^n_y E^n + O(\Delta t) O(\|E^n\|), \qquad (15)

and

 G(U^n + E^n) = G(U^n) + G^n_y E^n + O(\Delta t) O(\|E^n\|), \qquad (16)

where F^n_y = F_y(u(t_n)) and G^n_y = G_y(u(t_n)).

###### Proof

This is simply due to the fact that F is smooth enough and E^n is small enough that we can expand:

 F(U^n + E^n) = F(U^n) + \begin{pmatrix} F_y(u(t_n + c_1 \Delta t)) \, e_{n+c_1} \\ F_y(u(t_n + c_2 \Delta t)) \, e_{n+c_2} \\ \vdots \\ F_y(u(t_n + c_s \Delta t)) \, e_{n+c_s} \end{pmatrix} + O(\|E^n\|^2)
             = F(U^n) + \begin{pmatrix} F^{n+c_1}_y & 0 & \cdots & 0 \\ 0 & F^{n+c_2}_y & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & F^{n+c_s}_y \end{pmatrix} E^n + O(\|E^n\|^2),

where the error vector is E^n = (e_{n+c_1}, e_{n+c_2}, \ldots, e_{n+c_s})^T, and we use the notation F^{n+c_j}_y = F_y(u(t_n + c_j \Delta t)). Each term can be expanded as

 F^{n+c_j}_y = F_y(u(t_n + c_j \Delta t)) = F_y(u(t_n)) + c_j \Delta t \, F_{yy}(u(t_n)) + O(\Delta t^2),

so that, using the assumption that \|E^n\| = O(\Delta t), we have

 F(U^n + E^n) = F(U^n) + \left( F^n_y I + O(\Delta t) \right) E^n + O(\|E^n\|^2) = F(U^n) + F^n_y E^n + O(\Delta t) O(\|E^n\|).

The proof for G is identical.

We now describe the growth of the error:

###### Lemma

Given a zero-stable method of the form (4) which satisfies the order conditions

 \tau^n_j = 0 \quad \text{for } j = 0, \ldots, p,

and where the functions F and G are smooth, the evolution of the error can be described by:

 \left( I - \Delta t (R_F F^n_y + R_G G^n_y) + O(\Delta t^2) \right) E^{n+1} = \left( D + \Delta t (A_F F^n_y + A_G G^n_y) \right) E^n + \tau^{n+1}.

###### Proof

Subtracting (10) from (4), we obtain

 V^{n+1} - U^{n+1} = D V^n - D U^n + \Delta t A_F \left( F(V^n) - F(U^n) \right) + \Delta t A_G \left( G(V^n) - G(U^n) \right) + \Delta t R_F \left( F(V^{n+1}) - F(U^{n+1}) \right) + \Delta t R_G \left( G(V^{n+1}) - G(U^{n+1}) \right) + \tau^{n+1}.

Using Observation 1 we have

 E^{n+1} = D E^n + \Delta t \left( A_F F^n_y E^n + R_F F^{n+1}_y E^{n+1} + A_G G^n_y E^n + R_G G^{n+1}_y E^{n+1} \right) + \tau^{n+1} + O(\Delta t) O(\|E^n\|^2, \|E^{n+1}\|^2).

The final term is very small, so we can neglect it. We are then left with an equation for the error which is essentially linear, because the terms F^n_y, F^{n+1}_y, G^n_y, and G^{n+1}_y are all known scalars that depend only on the exact solution of the ODE.

Now move the E^{n+1} terms to the left side:

 \left( I - \Delta t R_F F^{n+1}_y - \Delta t R_G G^{n+1}_y \right) E^{n+1} = D E^n + \Delta t A_F F^n_y E^n + \Delta t A_G G^n_y E^n + \tau^{n+1}.

Noting that F^{n+1}_y = F^n_y + O(\Delta t) and G^{n+1}_y = G^n_y + O(\Delta t), we have the desired result

 \left( I - \Delta t (R_F F^n_y + R_G G^n_y) + O(\Delta t^2) \right) E^{n+1} = \left( D + \Delta t (A_F F^n_y + A_G G^n_y) \right) E^n + \tau^{n+1}.

## 3 Constructing IMEX-EIS+ methods

Schneider, Lang, and Hundsdorfer [23] showed how to construct IMEX methods of the form (4) which satisfy the order conditions to order p but are error inhibiting, and so produce a solution of order p+1. In this section we show that under additional conditions we can express the exact form of the next term in the error and define an associated post-processor that allows us to recover order p+2 from a scheme that would otherwise be only (p+1)th-order accurate. We note that although our main interest is in IMEX methods, and we will focus on these later in the paper, the theory presented in this section is general enough for all additive methods.

### 3.1 IMEX error inhibiting schemes that can be post-processed

In this section we consider an additive method of the form (4) where we assume that D is a rank one matrix that satisfies the consistency condition (D - I)\mathbf{1} = 0, so that this scheme is zero-stable. Furthermore, we assume that the coefficient matrices D, A_F, R_F, A_G, R_G are such that the order conditions are satisfied to order p, so that the method will give us a numerical solution whose error is guaranteed to be of order p.

In [23] it was shown that if the truncation error vector \tau^n_{p+1} lives in the null-space of the operator D, then the error is of order p+1. In the following theorem we establish additional conditions on the coefficient matrices D, A_F, R_F, A_G, R_G which allow us to determine precisely what the leading term of this error will look like, and therefore remove it by post-processing.

###### Theorem 1

Consider a zero-stable additive general linear method of the form

 V^{n+1} = D V^n + \Delta t \left[ A_F F(V^n) + R_F F(V^{n+1}) + A_G G(V^n) + R_G G(V^{n+1}) \right], \qquad (18)

where D is a rank one matrix that satisfies the consistency condition (D - I)\mathbf{1} = 0, and the coefficient matrices A_F, R_F, A_G, R_G satisfy the order conditions

 \tau^n_j = 0 \quad \text{for } j = 1, \ldots, p, \ \text{and } n \ge 0,

for \tau^n defined by (10), as well as the error inhibiting condition

 D \tau^n_{p+1} = 0, \qquad (19a)

so that the numerical solution produced by this method will have error E^k = O(\Delta t^{p+1}). If the conditions

 D \tau^n_{p+2} = 0, \qquad (19b)

 D (A_F + R_F) \tau^n_{p+1} = 0, \qquad (19c)

 D (A_G + R_G) \tau^n_{p+1} = 0 \qquad (19d)

are satisfied, then the error vector will have the more precise form:

 E^n = \Delta t^{p+1} \tau^n_{p+1} + O(\Delta t^{p+2}). \qquad (20)

###### Proof

Using Lemma 1 we obtain the equation for the evolution of the error

 E^{n+1} = \left[ I - \Delta t (R_F F^n_y + R_G G^n_y) + O(\Delta t^2) \right]^{-1} \left[ \left( D + \Delta t (A_F F^n_y + A_G G^n_y) \right) E^n + \tau^{n+1} \right].

The fact that F and G are smooth assures us that \|\Delta t (R_F F^n_y + R_G G^n_y)\| = O(\Delta t), so that we can expand the first term:

 E^{n+1} = \left[ I + \Delta t (R_F F^n_y + R_G G^n_y) + O(\Delta t^2) \right] \left[ \left( D + \Delta t (A_F F^n_y + A_G G^n_y) \right) E^n + \tau^{n+1} \right]
        = \left( D + \Delta t F^n_y (R_F D + A_F) + \Delta t G^n_y (R_G D + A_G) + O(\Delta t^2) \right) E^n + \left[ I + \Delta t (R_F F^n_y + R_G G^n_y) + O(\Delta t^2) \right] \tau^{n+1}
        \equiv Q^n E^n + \Delta t T^n_e.

As we showed in Lemma 3 in [6], we can assume that \|Q^n\| \le 1 + c \Delta t for some constant c over a reasonably large time interval.

The discrete Duhamel principle (given as Lemma 5.1.1 in [12]) states that given an iterative process of this form, where Q^\mu is a linear operator, we have

 E^n = \prod_{\mu=0}^{n-1} Q^\mu E^0 + \Delta t \sum_{\nu=0}^{n-1} \left( \prod_{\mu=\nu+1}^{n-1} Q^\mu \right) T^\nu_e. \qquad (22)

To analyze the error (22), we separate it into four parts:

 E^n = \underbrace{\prod_{\mu=0}^{n-1} Q^\mu E^0}_{I} + \underbrace{\Delta t \, T^{n-1}_e}_{II} + \underbrace{\Delta t \, Q^{n-1} T^{n-2}_e}_{III} + \underbrace{\Delta t \sum_{\nu=0}^{n-3} \left( \prod_{\mu=\nu+1}^{n-1} Q^\mu \right) T^\nu_e}_{IV}

and discuss each part separately:

• The method is initialized so that the numerical solution vector is accurate enough to ensure that the initial error E^0 is negligible, so we can ignore the first term.

• The final term in the summation is \Delta t T^{n-1}_e. Recall that

 T^{n-1}_e = \Delta t^p \tau^n_{p+1} + \Delta t^{p+1} \left( \tau^n_{p+2} + (R_F F^{n-1}_y + R_G G^{n-1}_y) \tau^n_{p+1} \right) + O(\Delta t^{p+2}),

so that this term contributes to the final time error the term

 \Delta t T^{n-1}_e = \Delta t^{p+1} \tau^n_{p+1} + O(\Delta t^{p+2}).
• The term \Delta t Q^{n-1} T^{n-2}_e is a product of the operator

 Q^{n-1} = \left( D + \Delta t F^{n-1}_y (R_F D + A_F) + \Delta t G^{n-1}_y (R_G D + A_G) + O(\Delta t^2) \right)

and the approximation error

 T^{n-2}_e = \Delta t^p \tau^{n-1}_{p+1} + \Delta t^{p+1} \left( \tau^{n-1}_{p+2} + (R_F F^{n-2}_y + R_G G^{n-2}_y) \tau^{n-1}_{p+1} \right) + O(\Delta t^{p+2}),

so we obtain

 Q^{n-1} T^{n-2}_e = \Delta t^p D \tau^{n-1}_{p+1} + O(\Delta t^{p+1}) = O(\Delta t^{p+1}),

due to the condition (19a) that states that D \tau^{n-1}_{p+1} = 0.

The result in this theorem requires a closer look at the O(\Delta t^{p+1}) terms in this product:

 D \left[ \tau^{n-1}_{p+2} + (R_F F^{n-2}_y + R_G G^{n-2}_y) \tau^{n-1}_{p+1} \right] + \left[ F^{n-1}_y (R_F D + A_F) + G^{n-1}_y (R_G D + A_G) \right] \tau^{n-1}_{p+1} + O(\Delta t)
 = D \left[ \tau^{n-1}_{p+2} + (R_F F^{n-2}_y + R_G G^{n-2}_y) \tau^{n-1}_{p+1} \right] + \left[ F^{n-1}_y A_F + G^{n-1}_y A_G \right] \tau^{n-1}_{p+1} + O(\Delta t)
 = D \left[ \tau^{n-1}_{p+2} + (R_F F^{n-1}_y + R_G G^{n-1}_y) \tau^{n-1}_{p+1} \right] + \left[ F^{n-1}_y A_F + G^{n-1}_y A_G \right] \tau^{n-1}_{p+1} + O(\Delta t),

where we applied (19a) and used the observation that F^{n-2}_y = F^{n-1}_y + O(\Delta t) and G^{n-2}_y = G^{n-1}_y + O(\Delta t).

We now apply the conditions (19b) - (19d) of the theorem, which make these terms vanish and allow us to conclude that:

 Q^{n-1} T^{n-2}_e = O(\Delta t^{p+2}).
• Finally, we look at the rest of the sum and use the boundedness of the operators Q^\mu to observe

 \left\| \Delta t \sum_{\nu=0}^{n-3} \left( \prod_{\mu=\nu+1}^{n-1} Q^\mu \right) T^\nu_e \right\| = \left\| \Delta t \sum_{\nu=0}^{n-3} \left( \prod_{\mu=\nu+3}^{n-1} Q^\mu \right) \left( Q^{\nu+2} Q^{\nu+1} T^\nu_e \right) \right\|
 \le \Delta t \sum_{\nu=0}^{n-3} \left\| \prod_{\mu=\nu+3}^{n-1} Q^\mu \right\| \left\| Q^{\nu+2} Q^{\nu+1} T^\nu_e \right\|
 \le \Delta t \sum_{\nu=0}^{n-3} (1 + c \Delta t)^{n-\nu-3} \left\| Q^{\nu+2} Q^{\nu+1} T^\nu_e \right\|
 \le \frac{\exp(c t_n) - 1}{c} \max_{\nu=0,\ldots,n-3} \left\| Q^{\nu+2} Q^{\nu+1} T^\nu_e \right\|.

Clearly, the first factor here is a constant that depends only on the final time, so it is the product \|Q^{\nu+2} Q^{\nu+1} T^\nu_e\| we need to bound. Using the definition of the operators,

 Q^{\nu+2} Q^{\nu+1} = D^2 + \Delta t \left[ F^{\nu+1}_y D (R_F D + A_F) + G^{\nu+1}_y D (R_G D + A_G) + F^{\nu+2}_y (R_F D + A_F) D + G^{\nu+2}_y (R_G D + A_G) D \right] + O(\Delta t^2),

and the truncation error

 T^\nu_e = \Delta t^p \tau^{\nu+1}_{p+1} + \Delta t^{p+1} \left( \tau^{\nu+1}_{p+2} + (R_F F^\nu_y + R_G G^\nu_y) \tau^{\nu+1}_{p+1} \right) + O(\Delta t^{p+2}),

we have

 Q^{\nu+2} Q^{\nu+1} T^\nu_e = \Delta t^p \left[ D + \Delta t F^{\nu+2}_y (R_F D + A_F) + \Delta t G^{\nu+2}_y (R_G D + A_G) \right] D \tau^{\nu+1}_{p+1}
 + \Delta t^{p+1} \left[ F^{\nu+1}_y D (R_F D + A_F) + G^{\nu+1}_y D (R_G D + A_G) \right] \tau^{\nu+1}_{p+1}
 + \Delta t^{p+1} D^2 \left( \tau^{\nu+1}_{p+2} + (R_F F^\nu_y + R_G G^\nu_y) \tau^{\nu+1}_{p+1} \right) + O(\Delta t^{p+2}).

We use the facts that: (1) the form of D means that D^2 = D; (2) F^\nu_y = F^{\nu+1}_y + O(\Delta t) and G^\nu_y = G^{\nu+1}_y + O(\Delta t); and (3) D \tau^{\nu+1}_{p+1} = 0 for any integer \nu, to obtain

 Q^{\nu+2} Q^{\nu+1} T^\nu_e = \Delta t^{p+1} \left[ F^{\nu+1}_y D (R_F + A_F) + G^{\nu+1}_y D (R_G + A_G) \right] \tau^{\nu+1}_{p+1} + \Delta t^{p+1} D \tau^{\nu+1}_{p+2} + O(\Delta t^{p+2}).

The last terms disappear because of (19b),