1 Introduction
Very high-order methods have become ubiquitous in the field of numerical methods for hyperbolic partial differential equations. There are many methods which obtain an arbitrarily high order in their spatial discretization, namely, discontinuous Galerkin
[glaubitz2020stable, offner2019error], the spectral difference method [Liu2006, glaubitz2018application], etc. When considering time-dependent problems, the time integration must have the same order of convergence as the spatial discretization in order to formally guarantee high-order space-time convergence. Explicit Runge–Kutta (RK) methods have long been used for their simplicity and ease of implementation. However, in the context of conservation laws, it is desirable for the time integration to be Total Variation Diminishing (TVD), in order to avoid spurious oscillations. The Runge–Kutta time-stepping schemes that are TVD are the so-called Strong Stability Preserving (TVD/SSP-RK) schemes, and they hit an accuracy barrier: these methods cannot be of higher than fourth order [ruuth2002].
There are, however, time-stepping methods which promise arbitrarily high accuracy without the necessity of computing Butcher-like coefficient tables, such as Deferred Correction (DeC) [dutt2000dec], ADER [toro2001towards], SBP [nordstrom2013summation] or (continuous or discontinuous) Galerkin approaches in time.
The deferred correction method was first introduced in the context of ODEs and later formulated as a time-stepping scheme for PDEs in conjunction with finite element ansatzes [abgrall2017dec, liu2008strong]. The key idea of DeC is based on the Picard–Lindelöf theorem and, especially, on the Picard iteration. In every iteration step, the error of the numerical solution is decreased until a fixed bound is reached. Implicit and semi-implicit variants exist [minion2003dec], but we concentrate on the explicit version. From our point of view, some advantages of DeC compared with explicit RK are the possibility to use an FE ansatz in space while avoiding the inversion of the mass matrix in the full discretization of a PDE, as described in [abgrall2017dec], and the possibility to obtain arbitrarily high-order accuracy in time without checking order conditions.
The ADER approach, first introduced in [toro2001towards] and further developed in many other works, e.g. [dumbser2008unified, titarev2002ader, zanotti2015space], remains quite elusive and misunderstood in the broad community of numerical analysis for hyperbolic problems, despite being able to achieve arbitrarily high order in time without the well-known barriers found in, for example, TVD/SSP-RK methods.
The first ADER methods for linear hyperbolic equations were presented in [Schwartzkopff2002ADER, toro2001towards]. The (historical) ADER (Advection-Diffusion-Reaction) approach, described in the paper “ADER: Arbitrary High Order Godunov Approach” [titarev2002ader], extends the method to nonlinear hyperbolic systems, achieving arbitrarily high-order accuracy both in time and space. The key ingredient of this approach is the solution of a generalized Riemann problem [toro2009riemann].
Later, in [dumbser2008unified], a different formulation of ADER is presented, which can be interpreted as a space–time finite element method. This is the ADER approach that we consider in this paper, referring to it as the modern ADER (described in detail in section 2). Although used abundantly in many codes, the theoretical properties of ADER are not well studied. Some literature shows, for example, that for a linear homogeneous system a finite number of iterations is sufficient to reach convergence [hjackson2017]. However, questions such as how to choose the integration points or the polynomial basis have, to our knowledge, only been addressed empirically.
Since DeC and the modern ADER are both iterative approaches, the questions that motivated this paper are the following:

Is there a connection between the ADER time-stepping method and the DeC time-stepping method? And if yes, what is this connection?

Does the connection between ADER and DeC help us study properties of the ADER scheme?
In order to address these questions, we first introduce the two methods considered. In section 2, we describe the ADER approach and, in section 3, the Deferred Correction method. Then, in section 4, we show that these two methods are very similar and that DeC can be written as an ADER scheme and vice versa. This connection is also verified through numerical experiments shown in section 5. Finally, we conclude the paper with a discussion in section 6.
2 The (Modern) ADER Approach
In the following section, we give a short introduction to the ADER approach as we understand it. It is an iterative process to obtain the numerical solution of a given space–time PDE. For simplicity, we consider the following scalar (non)linear hyperbolic problem
(1) $\partial_t u(x,t) + \partial_x f\big(u(x,t)\big) = 0, \qquad (x,t) \in \Omega \times \mathbb{R}^+,$
Suppose that $u$ solves this equation under suitable boundary and initial conditions. The main idea of the modern ADER is based on the variational formulation in the finite element context. We approximate the field $u$ in a set of space nodes $\{x_i\}_{i=1}^I$ and we denote with $\mathbf{c}(t) := \big(c_1(t),\dots,c_I(t)\big)$ the vector of the semidiscretization in space of the function $u$^{1}^{1}1By switching from the PDE formulation to the ODE setting, we also change the used notation: $u$ is used in the PDE case and $\mathbf{c}$ when speaking about an ODE. For the time discretization, we consider a time interval $[t^n, t^{n+1}]$, and we represent $\mathbf{c}$ as a linear combination of basis functions in the time component:
(2) $\mathbf{c}(t) \approx \mathbf{c}_h(t) := \sum_{m=1}^{M} \phi_m(t)\, \hat{\mathbf{c}}^m = \boldsymbol{\phi}(t)^T \hat{\mathbf{c}},$
where $\boldsymbol{\phi}(t) = \big(\phi_1(t),\dots,\phi_M(t)\big)$ is the vector of time basis functions and $\hat{\mathbf{c}} = \big(\hat{\mathbf{c}}^1,\dots,\hat{\mathbf{c}}^M\big)$
is the vector of coefficients related to the time basis functions for each degree of freedom in space. Typically, the $\phi_m$
are Lagrange basis functions associated with some nodes $\{t^m\}_{m=1}^M \subset [t^n, t^{n+1}]$, such that (3) $\phi_m(t^l) = \delta_m^l$,
e.g. equispaced, Gauss–Lobatto or Gauss–Legendre nodes.
Proceeding with the weak formulation in time, we consider a smooth test function $\psi(t)$ and we integrate over the interval $[t^n, t^{n+1}]$.
We consider, from now on, only the time discretization of the hyperbolic equation (1). In other words, through the method of lines, we split space and time: the ADER approach will be used as the time integration method and it will be fully explicit. For the space discretization, one can use one's favorite numerical scheme, inter alia discontinuous Galerkin (DG), flux reconstruction (FR), finite volume (FV) or ENO/WENO reconstructions, which finally yield the different ADER variants found in the literature, such as ADER–WENO or ADER–DG. Moreover, no further issues, such as accuracy reduction, arise when using one of the space discretizations described above.
We consider, thus, the function $F$ to be the semidiscrete operator in space for all the degrees of freedom, which is given by the chosen spatial scheme. We can then focus on the resulting system of ODEs for $\mathbf{c}$:
(4) $\frac{d}{dt}\mathbf{c}(t) = F\big(\mathbf{c}(t)\big), \qquad t \in [t^n, t^{n+1}].$
Its variational form in time is given by
(5) $\int_{t^n}^{t^{n+1}} \psi(t) \left( \frac{d}{dt}\mathbf{c}(t) - F\big(\mathbf{c}(t)\big) \right) dt = 0, \qquad \forall \psi.$
Now, we replace the unknown $\mathbf{c}$ with its reconstruction $\mathbf{c}_h$ and we choose the test functions to be the basis functions themselves. This results in the following definition of the operator $\mathcal{L}^2$:
(6) $\mathcal{L}^2(\hat{\mathbf{c}})^m := \int_{t^n}^{t^{n+1}} \phi_m(t)\, \frac{d}{dt}\mathbf{c}_h(t)\, dt - \int_{t^n}^{t^{n+1}} \phi_m(t)\, F\big(\mathbf{c}_h(t)\big)\, dt, \qquad m = 1, \dots, M.$
Integrating by parts yields
(7) $\mathcal{L}^2(\hat{\mathbf{c}})^m = \phi_m(t^{n+1})\, \mathbf{c}_h(t^{n+1}) - \phi_m(t^n)\, \mathbf{c}(t^n) - \int_{t^n}^{t^{n+1}} \frac{d}{dt}\phi_m(t)\, \mathbf{c}_h(t)\, dt - \int_{t^n}^{t^{n+1}} \phi_m(t)\, F\big(\mathbf{c}_h(t)\big)\, dt.$
Now, we replace the integrals by quadratures. The quadrature nodes may or may not coincide with the ones defining the Lagrange polynomials^{2}^{2}2In this work we choose the quadrature nodes as equidistant, Gauss–Lobatto or Gauss–Legendre nodes. However, in many applications of ADER, Gauss–Legendre nodes are the typical choice to guarantee the exactness of the integration.. We denote them by $\{\tau_q\}_{q=1}^Q$ with respective weights $\{w_q\}_{q=1}^Q$. We choose the weights, the nodes and the basis functions to be scaled to the interval $[0,1]$, so that we explicitly highlight the presence of $\Delta t = t^{n+1} - t^n$. The integral terms reduce to
(8) $\int_{t^n}^{t^{n+1}} \frac{d}{dt}\phi_m(t)\, \mathbf{c}_h(t)\, dt \approx \sum_{q=1}^{Q} w_q\, \frac{d}{d\tau}\phi_m(\tau_q)\, \mathbf{c}_h(\tau_q), \qquad \int_{t^n}^{t^{n+1}} \phi_m(t)\, F\big(\mathbf{c}_h(t)\big)\, dt \approx \Delta t \sum_{q=1}^{Q} w_q\, \phi_m(\tau_q)\, F\big(\mathbf{c}_h(\tau_q)\big).$
Then, the approximation of (7) yields, for every test function indexed by $m$,
(9) $\phi_m(1)\, \mathbf{c}_h(1) - \phi_m(0)\, \mathbf{c}(t^n) - \sum_{q=1}^{Q} w_q\, \frac{d}{d\tau}\phi_m(\tau_q)\, \mathbf{c}_h(\tau_q) - \Delta t \sum_{q=1}^{Q} w_q\, \phi_m(\tau_q)\, F\big(\mathbf{c}_h(\tau_q)\big) = 0.$
This is a system of $M$ equations in the $M$ unknowns $\{\hat{\mathbf{c}}^m\}_{m=1}^M$ for every ODE of the system (4). Actually, what is expressed here is nothing more than a classical collocation method, cf. [wanner1996solving], or a high-order implicit RK method. The order depends on the quadrature rule used. Using, as an example, Gauss–Legendre or Gauss–Lobatto nodes both for the definition of the basis functions and for the quadrature results in a high-order quadrature formula. One can also use different points for the quadrature and the basis functions, resulting in further varieties of this scheme.
We can rewrite the system (9) in matrix form, using the mass matrix $\mathcal{M}$, given by
(10) $\mathcal{M}_{m,l} := \phi_m(1)\, \phi_l(1) - \sum_{q=1}^{Q} w_q\, \frac{d}{d\tau}\phi_m(\tau_q)\, \phi_l(\tau_q),$
and the right-hand side functional $r$, given by
(11) $r(\hat{\mathbf{c}})_m := \phi_m(0)\, \mathbf{c}(t^n) + \Delta t \sum_{q=1}^{Q} w_q\, \phi_m(\tau_q)\, F\big(\mathbf{c}_h(\tau_q)\big).$
We have moved to the right-hand side of (9) the nonlinear and explicit terms, while keeping the linear ones on the left-hand side. Finally, our system to be solved is given by
(12) $\mathcal{M}\hat{\mathbf{c}} = r(\hat{\mathbf{c}}).$
This equation is nothing else than a fixed-point problem. Its solution gives a high-order accurate solution in time. It cannot be solved directly when nonlinear fluxes are present, but it can be solved, under certain assumptions, with an iterative process in $p$. Defining the starting guess as $\hat{\mathbf{c}}^{(0)} \equiv \mathbf{c}(t^n)$, the algorithm proceeds iteratively as follows
(13) $\hat{\mathbf{c}}^{(p)} = \mathcal{M}^{-1} r\big(\hat{\mathbf{c}}^{(p-1)}\big), \qquad p = 1, 2, \dots$
Several questions arise naturally for (13): for example, how is the convergence of the method influenced by the mass matrix and, hence, by the node placement? By reinterpreting the modern ADER approach within the DeC framework in section 4, we can answer these and more questions, thanks to the definition of the operator in (12). This serves as the connection to the DeC procedure.
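To make the construction above concrete, the following sketch implements one ADER step (13) for a scalar ODE, building the mass matrix (10) and the right-hand side (11) from a Lagrange basis on Gauss–Legendre nodes. It is a minimal illustration with our own naming, not code from any reference implementation; it assumes the scaling of nodes, weights and basis functions to $[0,1]$ described above.

```python
import numpy as np

def lagrange_polys(nodes):
    """Lagrange cardinal polynomials on the given nodes, as np.poly1d objects."""
    polys = []
    for m, xm in enumerate(nodes):
        p = np.poly1d(np.delete(nodes, m), r=True)  # roots at the other nodes
        polys.append(p / p(xm))                     # normalize: phi_m(x_m) = 1
    return polys

def ader_step(F, u0, dt, n_nodes=2, iters=4):
    """One ADER step for the scalar ODE u' = F(u) on [t^n, t^n + dt].

    Gauss-Legendre nodes mapped to [0, 1] define both the Lagrange basis
    and the quadrature, so the quadrature of F collapses to nodal values.
    """
    x, w = np.polynomial.legendre.leggauss(n_nodes)
    tau = 0.5 * (x + 1.0)          # quadrature nodes on [0, 1]
    wq = 0.5 * w                   # quadrature weights on [0, 1]
    phi = lagrange_polys(tau)
    dphi = [p.deriv() for p in phi]
    M = np.empty((n_nodes, n_nodes))
    r0 = np.empty(n_nodes)
    for m in range(n_nodes):
        r0[m] = phi[m](0.0) * u0   # phi_m(0) * c(t^n), cf. (11)
        for l in range(n_nodes):
            # M_{m,l} = phi_m(1) phi_l(1) - sum_q w_q phi_m'(tau_q) phi_l(tau_q), cf. (10)
            M[m, l] = phi[m](1.0) * phi[l](1.0) - np.sum(wq * dphi[m](tau) * phi[l](tau))
    Minv = np.linalg.inv(M)
    u = np.full(n_nodes, float(u0))    # starting guess: constant in time
    for _ in range(iters):             # Picard iterations (13)
        u = Minv @ (r0 + dt * wq * F(u))
    return sum(phi[m](1.0) * u[m] for m in range(n_nodes))  # reconstruct at t^{n+1}
```

For $u' = -u$, a few iterations already reproduce the underlying Gauss collocation solution; how many iterations are needed for a given order of accuracy is exactly the question addressed in section 4.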
Before introducing the DeC method, we give the following simple example to get more familiar with the ADER method.
Example 2.1.
2nd order ADER method for ODEs. Let us demonstrate a concrete example of the methodology described above, considering a simple scalar ODE:
(14) $\frac{d}{dt} c(t) = F\big(c(t)\big), \qquad t \in [t^n, t^{n+1}],$
with initial value $c(t^n) = c^n$.
Let us consider the timestep interval $[t^n, t^{n+1}]$, rescaled to $[0,1]$. The time interpolation nodes and the quadrature nodes are given by the two Gauss–Legendre points (in the interval $[0,1]$) $\tau_{1,2} = \frac{1}{2} \mp \frac{\sqrt{3}}{6}$ with respective quadrature weights $w_{1,2} = \frac{1}{2}$. The time basis is given by the Lagrange interpolation polynomials built on the nodes $\tau_1, \tau_2$.
Then, the mass matrix is given by
$\mathcal{M} = \begin{pmatrix} 1 & \frac{\sqrt{3}-1}{2} \\[2pt] -\frac{\sqrt{3}+1}{2} & 1 \end{pmatrix},$
thanks to the definition of the Lagrange polynomials.
The right-hand side is given by^{3}^{3}3Note that the quadrature process is greatly simplified because of the choice of Gauss–Legendre nodes both for the quadrature and for the Lagrangian basis functions in (10) and (11). $r(\hat{c}) = \begin{pmatrix} \frac{1+\sqrt{3}}{2}\, c^n + \frac{\Delta t}{2}\, F(\hat{c}^1) \\[2pt] \frac{1-\sqrt{3}}{2}\, c^n + \frac{\Delta t}{2}\, F(\hat{c}^2) \end{pmatrix}.$
Then, the coefficients are given by the fixed-point iteration $\hat{c}^{(p)} = \mathcal{M}^{-1} r\big(\hat{c}^{(p-1)}\big)$, starting from $\hat{c}^{(0)} \equiv c^n$.
Finally, we use $\hat{c} = (\hat{c}^1, \hat{c}^2)$ to reconstruct the solution at the time step $t^{n+1}$: $c^{n+1} = \phi_1(1)\, \hat{c}^1 + \phi_2(1)\, \hat{c}^2 = \frac{1-\sqrt{3}}{2}\, \hat{c}^1 + \frac{1+\sqrt{3}}{2}\, \hat{c}^2.$
2.1 Historical ADER
In the community, the term ADER is often associated with the historical approach. In order to establish the difference between the modern version of ADER and the historical one, we explain the procedure as introduced in [titarev2002ader], highlighting the key differences between the two methods.
Solving the same problem as in (1), let us consider $t^n = 0$ as starting time and suppose that we have at our disposal the cell averages $\bar{u}_i^n$, for $i$ indexing the control volumes.
The method consists of three steps:

reconstruction of point-wise values from the cell averages (using some reconstruction function),

solution of the generalized Riemann problem at the cell interfaces,

evaluation of the intercell flux to be used in the conservative scheme.
The main difference lies in step two. In order to solve the generalized Riemann problem at the cell interfaces (for all local times $\tau > 0$), one writes the Taylor expansion of the interface state in time
(15) $u(x_{i+1/2}, \tau) \approx u(x_{i+1/2}, 0^+) + \sum_{k=1}^{K} \frac{\tau^k}{k!}\, \partial_t^k u(x_{i+1/2}, 0^+).$
Note that the leading term, $u(x_{i+1/2}, 0^+)$, accounts for the interaction of the boundary extrapolated values from the left and from the right, and it is the Godunov state of the conventional (piecewise constant data) Riemann problem. Typically, an exact or approximate Riemann solver is used to provide this first term of the approximation.
The subsequent terms are higher-order corrections to the 0th Godunov state. The high-order time derivatives are replaced by space derivatives by means of the Lax–Wendroff procedure [laxwendroff].
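As a minimal illustration of this step (our own example, for the simplest possible flux), consider linear advection, where the exchange of time and space derivatives is explicit:

```latex
% Lax-Wendroff procedure for linear advection  u_t + a u_x = 0:
\partial_t u = -a\,\partial_x u
\quad\Longrightarrow\quad
\partial_t^k u = (-a)^k\,\partial_x^k u ,
% so the Taylor expansion in time of the interface state becomes
u(x_{i+1/2},\tau) \;\approx\; \sum_{k=0}^{K} \frac{(-a\tau)^k}{k!}\,
\partial_x^k u(x_{i+1/2},0^+) .
```

For nonlinear fluxes, the same exchange is carried out recursively through the chain rule (the Cauchy–Kovalevskaya procedure), which is algebraically more involved but follows the same pattern.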
As in [grptoro], the space derivatives of the solution at the interface can be evaluated as the Godunov states of the following linearized generalized Riemann problems:
(16) $\partial_t \big(\partial_x^k u\big) + f'(u)\, \partial_x \big(\partial_x^k u\big) = 0, \qquad k = 1, \dots, K.$
The initial condition for the Riemann problem (16) above is given by differentiating the high-order reconstruction polynomial with respect to $x$. Having all time derivatives expressed in terms of their spatial counterparts, the Taylor expansion (15) can be evaluated as
(17) $u(x_{i+1/2}, \tau) \approx \sum_{k=0}^{K} C_k\, \tau^k,$
with $C_k$ containing all the constants not depending on $\tau$. This expression approximates the interface state to $(K+1)$th order of accuracy.
Finally, to evaluate the numerical flux (now depending on time as well), an appropriate Gaussian quadrature rule is used:
$F_{i+1/2} = \sum_{\alpha=1}^{N_q} w_\alpha\, f\big(u(x_{i+1/2}, \tau_\alpha)\big),$
where $\tau_\alpha$, $w_\alpha$ are the nodes and weights of the quadrature rule, and $N_q$ is the number of nodes.
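A sketch of this quadrature step (our own illustration, not code from the cited works): for linear advection, $f(u) = au$, the Lax–Wendroff relation $\partial_t^k u = (-a)^k \partial_x^k u$ makes the Taylor state explicit, and the time-averaged flux follows by Gauss–Legendre quadrature.

```python
import math
import numpy as np

def time_averaged_flux(u_derivs, a, dt, n_quad=3):
    """Time-averaged interface flux for u_t + a u_x = 0.

    u_derivs[k] is the k-th space derivative of the Godunov state at the
    interface at time t^n; the Lax-Wendroff procedure replaces time
    derivatives, d^k u/dt^k = (-a)^k d^k u/dx^k, so the Taylor series in
    time (15) is explicit and can be evaluated at the quadrature nodes.
    """
    x, w = np.polynomial.legendre.leggauss(n_quad)
    tau = 0.5 * (x + 1.0) * dt      # Gaussian nodes mapped to [0, dt]
    wq = 0.5 * w * dt               # corresponding weights
    u_tau = sum((-a * tau) ** k * d / math.factorial(k)
                for k, d in enumerate(u_derivs))   # interface state u(tau)
    return np.sum(wq * a * u_tau) / dt             # average of f(u) = a u
```

For example, feeding in the derivatives of $u(x) = e^x$ at the interface with $a = 1$, the result approaches the exact average $(1 - e^{-\Delta t})/\Delta t$ as more Taylor terms are kept.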
2.2 ADER space–time DG
Up to now, we have presented the modern and the historical ADER approaches from our point of view. However, to clarify that our description of the modern ADER in section 2 is indeed equivalent to the one used, inter alia, in [zanotti2015space], we focus on it again. We hope that the understanding of ADER will be much simpler with this different perspective.
We now follow [zanotti2015space], where ADER is presented as a space–time DG approach. There, the test functions are tensor products of basis functions in time and basis functions in space. We can write their formulation in our setting. Hence, we define the new basis functions
(18) $\theta_r(x,t) := \varphi_i(x)\, \phi_m(t), \qquad r = (i,m),$
where $\varphi_i$ are the basis functions in space and $\phi_m$ are the basis functions in time. With the Einstein summation convention, the reconstructed variable is
(19) $u_h(x,t) = \theta_r(x,t)\, \hat{u}^r.$
Let us consider a hyperbolic problem given as
(20) $\partial_t u + \nabla_x \cdot \mathbf{f}(u) = 0,$
with appropriate initial and boundary conditions. We consider the weak solution of (20) in space–time. Let us define a space–time cell $C_i^n := [t^n, t^{n+1}] \times V_i$, given by the tensor product of a timestep and a volume cell $V_i$. We multiply (20) by a test function $\theta_r$ and integrate over the defined control volume:
(21) $\int_{C_i^n} \theta_r(x,t)\, \big( \partial_t u_h(x,t) + \nabla_x \cdot \mathbf{f}(u_h(x,t)) \big)\, dx\, dt = 0.$
We also introduce the following definitions
(22) 
Now, we apply the Gauss theorem (i.e., integration by parts) in time and in space and obtain
(23) 
where the surface term involves a classical numerical two-point flux, evaluated from the value of the solution inside the cell and the value in the neighboring cell, with the normal vector pointing outwards. We define the spatial mass matrix and the time mass matrix as
(24) 
Splitting, as done in (9), into a linear left-hand side and a nonlinear right-hand side, we obtain
(25) 
Using the Picard iteration process, we obtain again
(26) 
where
(27)  
(28) 
Remark 2.2.
We presented here in detail the ADER–DG method. Considering (27), one notices that the full space–time matrix actually depends on several different mass matrices, i.e., on the spatial one and on the temporal ones. Changes in the space discretization affect only the spatial matrix; one can, for example, consider different schemes, like flux reconstruction, which work with the differential formulation of the PDE. Nevertheless, the time-integration structure of the matrix (27) given by the ADER approach stays the same. That is why we can consider ADER to be a time integration method.
3 Deferred Correction Methods
In this section, we focus on the Deferred Correction (DeC) method, which was introduced by Dutt et al. [dutt2000dec] and then reinterpreted by Abgrall [abgrall2017dec]. It is an explicit, arbitrarily high-order method for ODEs, but further extensions of DeC, including implicit, semi-implicit and modified Patankar versions, can be found in the literature [christlieb2010integral, minion2003dec, offner2019arbitrary].
Since we want to embed the modern ADER approach into this framework, we focus only on the explicit version. Therefore, we use the notation of DeC introduced by Abgrall in [abgrall2017dec] because, in our opinion, it is more convenient for proving accuracy than the one of previous works [dutt2000dec, christlieb2010integral, liu2008strong]. Abgrall focuses on DeC as a time integration scheme in the context of finite element methods. In particular, when using continuous Galerkin schemes for the space discretization and applying RK methods, a sparse mass matrix has to be inverted. Through the DeC approach, one can avoid this mass matrix inversion.
The core of all DeC algorithms is the same, and it is based on the Picard–Lindelöf theorem in the continuous setting. The theorem states the existence and uniqueness of solutions of ODEs. The classical proof makes use of the so-called Picard iterations to minimize the error and to prove convergence; this is nothing else than a fixed-point iteration. The foundation of DeC relies on mimicking the Picard iterations and the fixed-point iteration process at the discrete level.
Here, we already see some connection between the two approaches. Indeed, the approximation error decreases over several iteration steps. For the description of DeC, let us consider the system of ODEs as in (4), with the initial condition given by the known value $\mathbf{c}(t^n)$.
Abgrall introduces two operators: $\mathcal{L}^1$ and $\mathcal{L}^2$. The operator $\mathcal{L}^1$ represents a low-order, easy-to-solve numerical scheme, e.g. the explicit Euler method, and $\mathcal{L}^2$ is a high-order operator that can present difficulties in its practical solution, e.g. an implicit RK scheme or a collocation method. We use this nomenclature on purpose, since we will see that it is actually the key point in the connection. The DeC method can be written as a combination of these two operators.
Given a time interval $[t^n, t^{n+1}]$, we subdivide it into $M$ subintervals $\{[t^{n,m-1}, t^{n,m}]\}_{m=1}^M$, where $t^{n,0} = t^n$ and $t^{n,M} = t^{n+1}$, and we mimic, for every subinterval, the Picard–Lindelöf theorem for both operators $\mathcal{L}^1$ and $\mathcal{L}^2$. We drop the dependency on the timestep $n$ for the subtimesteps $t^m$ and the substates $\mathbf{c}^m$, as denoted in Figure 1.
Then, the $\mathcal{L}^2$ operator is given by
(29) $\mathcal{L}^2(\mathbf{c}^0, \dots, \mathbf{c}^M)^m := \mathbf{c}^m - \mathbf{c}^0 - \int_{t^0}^{t^m} \mathcal{I}_M\big(F(\mathbf{c}^0), \dots, F(\mathbf{c}^M)\big)(s)\, ds, \qquad m = 1, \dots, M.$
Here, the term $\mathcal{I}_M$ denotes the interpolation polynomial of degree $M$ in the points $\{t^m\}_{m=0}^M$. In particular, we use Lagrange polynomials $\{\phi_l\}_{l=0}^M$, where $\phi_l(t^m) = \delta_{l,m}$ and $\sum_{l=0}^M \phi_l(s) \equiv 1$ for any $s$. Using these properties, we can actually compute the integral of the interpolant exactly, through a quadrature rule in the same points with normalized weights $\theta_l^m := \frac{1}{\Delta t} \int_{t^0}^{t^m} \phi_l(s)\, ds$. We can rewrite
(30) $\mathcal{L}^2(\mathbf{c}^0, \dots, \mathbf{c}^M)^m = \mathbf{c}^m - \mathbf{c}^0 - \Delta t \sum_{l=0}^{M} \theta_l^m\, F(\mathbf{c}^l), \qquad m = 1, \dots, M.$
The $\mathcal{L}^2$ operator represents an $(M+1)$th order numerical scheme (a collocation method) if set equal to zero, i.e., $\mathcal{L}^2(\mathbf{c}^0, \dots, \mathbf{c}^M) = 0$. Unfortunately, the resulting scheme is implicit and, further, the terms $F(\mathbf{c}^l)$ may be nonlinear. Because of this, the $\mathcal{L}^2$ formulation alone is not explicit, and more effort has to be made to solve it.
For this purpose, we introduce a simplification of the $\mathcal{L}^2$ operator. Instead of using a quadrature formula at the points $\{t^l\}_{l=0}^M$, we evaluate the integral in equation (29) applying the left Riemann sum. The resulting $\mathcal{L}^1$ operator is given by the forward Euler discretization for each state in the time interval, i.e.,
(31) $\mathcal{L}^1(\mathbf{c}^0, \dots, \mathbf{c}^M)^m := \mathbf{c}^m - \mathbf{c}^0 - \Delta t\, \beta^m F(\mathbf{c}^0), \qquad m = 1, \dots, M,$
with coefficients $\beta^m := \frac{t^m - t^0}{\Delta t}$.
To simplify the notation and to describe DeC, as before, we introduce the vector of states for the variable $\mathbf{c}$ at all subtimesteps
(32) $\boldsymbol{\mathsf{c}} := (\mathbf{c}^0, \dots, \mathbf{c}^M),$
so that the operators can be written compactly as
(33) $\mathcal{L}^i(\boldsymbol{\mathsf{c}}) := \big(\mathcal{L}^i(\boldsymbol{\mathsf{c}})^1, \dots, \mathcal{L}^i(\boldsymbol{\mathsf{c}})^M\big), \qquad i = 1, 2.$
Now, the DeC algorithm uses a combination of the $\mathcal{L}^1$ and $\mathcal{L}^2$ operators to provide an iterative procedure. The aim is to recursively approximate $\boldsymbol{\mathsf{c}}^*$, the numerical solution of the scheme $\mathcal{L}^2(\boldsymbol{\mathsf{c}}^*) = 0$, similarly to the Picard iterations in the continuous setting. The successive states of the iteration process are denoted by the superscript $(k)$, where $k$ is the iteration index, e.g. $\boldsymbol{\mathsf{c}}^{(k)}$. The total number of iterations (also called correction steps) is denoted by $K$. To describe the procedure, we have to refer both to the $m$th subtimestep and to the $k$th iteration of the DeC algorithm; we indicate the variable by $\mathbf{c}^{m,(k)}$.
Finally, the DeC method can be written as
DeC Algorithm
(34) $\mathbf{c}^{0,(k)} := \mathbf{c}(t^n), \quad k = 0, \dots, K; \qquad \mathbf{c}^{m,(0)} := \mathbf{c}(t^n), \quad m = 1, \dots, M; \qquad \mathcal{L}^1(\boldsymbol{\mathsf{c}}^{(k)}) = \mathcal{L}^1(\boldsymbol{\mathsf{c}}^{(k-1)}) - \mathcal{L}^2(\boldsymbol{\mathsf{c}}^{(k-1)}), \quad k = 1, \dots, K,$
where $K$ is the number of iterations that we want to compute. Using the procedure (34), we need, in particular, as many iterations as the desired order of accuracy, i.e., $K = M + 1$.
Notice that, in every step, we solve equations for the unknown variables $\boldsymbol{\mathsf{c}}^{(k)}$, which appear only in the $\mathcal{L}^1$ formulation, the operator that can be easily inverted. Conversely, $\mathcal{L}^2$ is only applied to the already computed predictions of the solution $\boldsymbol{\mathsf{c}}^{(k-1)}$. Therefore, the scheme (34) is completely explicit and arbitrarily high-order, as stated in [abgrall2017dec] with the following proposition.
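A minimal sketch of one DeC step for a scalar ODE may help (our own naming; equispaced subtimesteps assumed). With the forward Euler operator (31), the correction step (34) reduces, after cancellation of the Euler terms, to a repeated quadrature update:

```python
import numpy as np

def dec_step(F, u0, dt, M=3, K=4):
    """One explicit DeC step for u' = F(u) on [t^n, t^n + dt].

    With L1 the forward Euler operator (31) and L2 the quadrature
    operator (30), the correction L1(c^(k)) = L1(c^(k-1)) - L2(c^(k-1))
    reduces to c^{m,(k)} = c^0 + dt * sum_l theta[m,l] * F(c^{l,(k-1)}).
    """
    beta = np.linspace(0.0, 1.0, M + 1)      # equispaced subtimesteps in [0,1]
    theta = np.zeros((M + 1, M + 1))         # theta[m,l] = int_0^{beta_m} phi_l
    for l in range(M + 1):
        p = np.poly1d(np.delete(beta, l), r=True)
        P = (p / p(beta[l])).integ()         # antiderivative of Lagrange phi_l
        theta[:, l] = P(beta) - P(0.0)
    u = np.full(M + 1, float(u0))            # iteration 0: all substates = c(t^n)
    for _ in range(K):                       # K correction steps
        u = u0 + dt * theta @ F(u)
    return u[-1]                             # approximation at t^n + dt
```

Each correction gains roughly one order of accuracy, so $K = M + 1$ iterations match the order of the underlying collocation scheme, in line with the proposition below.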
Proposition 3.1.
Let $\mathcal{L}^1$ and $\mathcal{L}^2$ be two operators defined on $\mathbb{R}^{M+1}$, which depend on the discretization scale $\Delta \sim \Delta t$, such that

$\mathcal{L}^1$ is coercive with respect to a norm, i.e., there exists $\alpha_1 > 0$, independent of $\Delta$, such that for any $\boldsymbol{\mathsf{c}}$, $\boldsymbol{\mathsf{d}}$ we have $\alpha_1 \| \boldsymbol{\mathsf{c}} - \boldsymbol{\mathsf{d}} \| \leq \| \mathcal{L}^1(\boldsymbol{\mathsf{c}}) - \mathcal{L}^1(\boldsymbol{\mathsf{d}}) \|,$

$\mathcal{L}^1 - \mathcal{L}^2$ is Lipschitz with constant $\alpha_2 \Delta$, uniformly with respect to $\Delta$, i.e., for any $\boldsymbol{\mathsf{c}}$, $\boldsymbol{\mathsf{d}}$, $\| \big(\mathcal{L}^1 - \mathcal{L}^2\big)(\boldsymbol{\mathsf{c}}) - \big(\mathcal{L}^1 - \mathcal{L}^2\big)(\boldsymbol{\mathsf{d}}) \| \leq \alpha_2 \Delta\, \| \boldsymbol{\mathsf{c}} - \boldsymbol{\mathsf{d}} \|.$
We also assume that there exists a unique $\boldsymbol{\mathsf{c}}^*$ such that $\mathcal{L}^2(\boldsymbol{\mathsf{c}}^*) = 0$. Then, if $\eta := \frac{\alpha_2}{\alpha_1} \Delta < 1$, the DeC is converging to $\boldsymbol{\mathsf{c}}^*$ and after $k$ iterations the error $\| \boldsymbol{\mathsf{c}}^{(k)} - \boldsymbol{\mathsf{c}}^* \|$ is smaller than $\eta^k \| \boldsymbol{\mathsf{c}}^{(0)} - \boldsymbol{\mathsf{c}}^* \|$.
Proofs of this proposition and of its hypotheses for operators $\mathcal{L}^1$ and $\mathcal{L}^2$ such as (31) and (30) can be found in [abgrall2017dec, abgrall2018asymptotic, offner2019arbitrary].
The condition $\eta < 1$ comes from the fixed-point theorem and has to be guaranteed for the iterative process to converge. Before we focus on the relation between DeC and ADER, we give the following remark.
Remark 3.2.

In the $\mathcal{L}^1$ operator, one can use higher-order time integration methods than the explicit Euler method. In principle, this increases the convergence rate of the iteration process. However, a drop in accuracy can be observed in some cases. This is due to the fact that the smoothness of the error is no longer guaranteed, cf. [christlieb2010integral].

In our description of DeC, both endpoints are included. However, Gauss–Legendre nodes, as in example 2.1, can also be used. Then, the approximation at the endpoint is done via extrapolation, see [dutt2000dec].

Finally, any DeC method can be interpreted as an RK scheme [christlieb2010integral]. The main difference between RK and DeC is that the latter gives a general approach to the time discretization and does not require specifying the coefficients for every order of accuracy.
4 Relation between DeC and ADER
What we have seen up to now was a review of the DeC and ADER approaches. Both are based on an iterative procedure and, essentially, their foundation is given by the same fixed-point iteration method. In the following, we point out the relation between these two methods, namely, we show how ADER can be expressed as DeC and vice versa. This relation can be used to prove a new theoretical result for the ADER algorithm: the number of iterations of the Picard process can be chosen equal to the order of accuracy we aim to reach. This result holds in a general setting, under few hypotheses on the operators, extending the result of [hjackson2017].
4.1 ADER as DeC
First, we show how the modern ADER can be put into the DeC framework which we have described in section 3. Therefore, let us rewrite the $\mathcal{L}^2$ operator from section 2, given in equations (7) and (12). It is
$\mathcal{L}^2(\hat{\mathbf{c}}) := \mathcal{M} \hat{\mathbf{c}} - r(\hat{\mathbf{c}}),$
where $\mathcal{M}$ is the previously defined invertible mass matrix. For the DeC algorithm (34), we further need a low-order explicit operator. For ADER, we choose a low-order operator analogous to the one of DeC, namely,
(35) $\mathcal{L}^1(\hat{\mathbf{c}}) := \mathcal{M} \hat{\mathbf{c}} - r\big(\mathbf{c}(t^n)\big).$
Actually, we have to mention that this operator is not unique and can be defined in different ways, since all the information is already included in the $\mathcal{L}^2$ operator for the ADER–DeC approach (see remark 4.1). Nevertheless, the choice of (35) is useful for the hypotheses of proposition 3.1, because, in this way, the difference of the two operators is Lipschitz continuous.
Then, we obtain the ADER-DeC algorithm:
$\mathcal{L}^1(\hat{\mathbf{c}}^{(k)}) = \mathcal{L}^1(\hat{\mathbf{c}}^{(k-1)}) - \mathcal{L}^2(\hat{\mathbf{c}}^{(k-1)}),$
defining $\hat{\mathbf{c}}^{(0)} \equiv \mathbf{c}(t^n)$, $k = 1, \dots, K$. Hence, we can explicitly write it as
$\hat{\mathbf{c}}^{(k)} = \mathcal{M}^{-1} r\big(\hat{\mathbf{c}}^{(k-1)}\big),$
which is nothing more than the discrete fixed-point problem in equation (12).
Remark 4.1.
The $\mathcal{L}^2$ operator already comprises the information needed in the fixed-point iteration. Nevertheless, the $\mathcal{L}^1$ operator serves us to easily prove the convergence of the process up to the required order of accuracy. Finally, we like to mention that one can also define the operators as
(36) $\mathcal{L}^1(\hat{\mathbf{c}}) := \hat{\mathbf{c}}, \qquad \mathcal{L}^2(\hat{\mathbf{c}}) := \hat{\mathbf{c}} - \mathcal{M}^{-1} r(\hat{\mathbf{c}}),$
and obtain an analogous result. We use this expression when demonstrating the accuracy property in subsection 4.3.
4.2 DeC as ADER
Here, we show that the DeC scheme can be derived from the ADER method. The essential difference lies in the choice of appropriate basis functions. This is mainly related to the definition of the $\mathcal{L}^2$ operator in the DeC framework. If we rewrite (29) and focus on the $m$th line, which reads
$\mathbf{c}^m - \mathbf{c}^0 - \sum_{l=0}^{M} \int_{t^0}^{t^m} \phi_l(s)\, F(\mathbf{c}^l)\, ds,$
we can rewrite the operator in the following form
(37) $\mathcal{L}^2(\boldsymbol{\mathsf{c}})^m = \mathbf{c}^m - \mathbf{c}^0 - \sum_{l=0}^{M} \int_{t^0}^{t^M} \chi_{[t^0, t^m]}(s)\, \phi_l(s)\, F(\mathbf{c}^l)\, ds,$
where $\chi_{[t^0, t^m]}$ is the characteristic function of the interval $[t^0, t^m]$, i.e.,
(38) $\chi_{[t^0, t^m]}(s) = 1$ if $s \in [t^0, t^m]$, and $\chi_{[t^0, t^m]}(s) = 0$ otherwise.
Therefore, we can actually include also the first two terms of equation (37) in the integration, resulting in
(39)  
(40) 
where the characteristic functions above play the role of the chosen test functions.
If we compare (40) with the initial ADER formulation (5) and (6), we notice that they differ just in the choice of the test functions: while for the ADER approach we chose the test functions to be the basis functions, in the DeC method we consider different test functions.
Inserting this in the DeC algorithm, we obtain