DeepAI
Log In Sign Up

Error bound analysis of the stochastic parareal algorithm

Stochastic parareal (SParareal) is a probabilistic variant of the popular parallel-in-time algorithm known as parareal. Similarly to parareal, it combines fine- and coarse-grained solutions to an ordinary differential equation (ODE) using a predictor-corrector (PC) scheme. The key difference is that carefully chosen random perturbations are added to the PC to try to accelerate the location of a stochastic solution to the ODE. In this paper, we derive superlinear and linear mean-square error bounds for SParareal applied to nonlinear systems of ODEs using different types of perturbations. We illustrate these bounds numerically on a linear system of ODEs and a scalar nonlinear ODE, showing a good match between theory and numerics.

READ FULL TEXT VIEW PDF

page 1

page 2

page 3

page 4

09/10/2019

A Multistep Lyapunov Approach for Finite-Time Analysis of Biased Stochastic Approximation

Motivated by the widespread use of temporal-difference (TD-) and Q-learn...
02/03/2019

Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning

We consider the dynamics of a linear stochastic approximation algorithm ...
09/10/2020

Mean-square contractivity of stochastic θ-methods

The paper is focused on the nonlinear stability analysis of stochastic θ...
10/25/2017

Learning Large Scale Ordinary Differential Equation Systems

Learning large scale nonlinear ordinary differential equation (ODE) syst...
07/24/2021

Theoretical Study and Comparison of SPSA and RDSA Algorithms

Stochastic approximation (SA) algorithms are widely used in system optim...

Code Repositories

StochasticPararealAnalysis

This repository contains sample code for the pre-print by Pentland, Tamborrino, and Sullivan - "Error bound analysis of the stochastic parareal algorithm".


view repo

1 Introduction

In recent years, there has been an increasing need to develop faster numerical integration methods for a broad range of time-dependent ordinary, partial, and stochastic differential equations (ODEs/PDEs/SDEs) that form the building blocks of mathematical models of real-world systems. Advances in high performance computing have fuelled the development of new parallel algorithms for solving such initial value problems (IVPs), with one particular area of increasing focus being parallel-in-time (PinT) methods111A collection of PinT research papers and software can found at https://parallel-in-time.org/.. Given the causal nature of time, PinT algorithms provide a way to, either iteratively or directly, simulate the solution states of an IVP across the entire time interval of integration concurrently. Since the seminal work of Nievergelt1964, a variety of different PinT methods have been developed over the last years—refer to gander2015 and ong2020 for reviews.

Of particular interest here is the increasingly popular multiple shooting-type/multigrid PinT algorithm know as parareal (lions2001). Dividing the time interval into ‘slices’, typically of fixed size , parareal iteratively combines solutions calculated by two serial numerical integrators (one coarse- and one fine-grained) on each slice using a predictor-corrector (PC) scheme. Using processors, one for each time slice, parareal runs the fine (expensive) computations in parallel, locating a solution to the IVP in a fixed number of iterations , after which one says the algorithm has ‘converged’. Parallel speedup from parareal is bounded above by and so, ideally, the algorithm finds a solution in iterations. Since it was proposed, parareal has been shown to provide speedup for a range of different IVPs in areas ranging from molecular dynamics (engblom2009; legoll2020; legoll2022) to plasma physics (samaddar2010; samaddar2019; grigori2021)—refer to (ong2020, Sec. 2) for an overview. Variants have also been developed to tackle issues with Hamiltonian systems (bal2008; dai2013), task scheduling (elwasif2011), and IVP stiffness (maday2020). Others, aimed at reducing the number iterations in parareal, have also emerged. Approaches include learning the correction term in the PC using Gaussian process emulators (pentland2022b) and building the coarse solver using a Krylov subspace of the set of PC solutions (gander2008krylov).

The focus of this paper is the stochastic parareal algorithm (henceforth SParareal), developed by pentland2022a

, in which random perturbations are added to the PC to increase the probability of locating the solution to the IVP in fewer iterations. In each time slice,

samples are taken from probability distributions constructed using different combinations of coarse and fine solution information (known as “sampling rules”) and propagated forward in time in parallel. From this set of perturbed solutions, the ones which generate the most continuous (fine) trajectory in solution space, starting from the exact known initial condition, are selected. Numerical simulations showed that as the number of samples increased, the probability of locating the exact solution increased and so convergence would occur in fewer iterations and therefore faster wallclock time. Numerical simulations also showed that, upon multiple simulations of SParareal, one could obtain a distribution of solutions to the IVP which were numerically accurate with respect to the serially located fine solution.

conrad2017 and lie2019; lie2022 developed similar (non-time-parallel) numerical schemes in which they perturb the solution states generated by serial numerical integrators to incorporate a measure of uncertainty quantification in the solution of ODEs.

1.1 Contribution and outline

Deriving rigorous error bounds for SParareal (and other PinT methods in general) is important in demonstrating that numerical solutions obtained in parallel are meaningful, accurate, and that they can be compared to one another (gander2022). The first convergence results for parareal assumed that was fixed, showing that the error of the scheme approached zero as the time slice size (lions2001; bal2002; bal2005). Here, we are interested in studying how numerical errors behave when is fixed and the number of iterations grows. gander2007 investigated this, deriving both linear and superlinear error bounds for the scalar linear ODE problem on unbounded and bounded time intervals, respectively. Following this, gander2008 used the generating function method to derive a superlinear bound for nonlinear systems of ODEs on bounded intervals. Whereas work has been done to derive error bounds for parareal applied to certain SDEs (bal2006; engblom2009), where the IVP itself contains randomness, those results cannot be applied here, as it is the SParareal scheme itself that contains randomness, not the solvers or the IVP.

In this paper, we extend the qualitative discussion of numerical convergence in pentland2022a by making use of convergence analysis from both the parareal literature (see above) and the sampling-based ODE solver presented in lie2019. We derive explicit mean-square error bounds for SParareal applied to nonlinear systems of ODEs (over a finite time interval), using both state-independent and state-dependent perturbations. The rest of the paper is organised as follows. In Section 2, we set up the IVP and provide an overview of the (classic) parareal algorithm. We then detail the SParareal scheme in Section 3, outlining how it works and defining the (state-dependent) sampling rules. In Section 4, we begin by outlining the assumptions on the fine and coarse integrators required to derive the error bounds. We first consider the state-independent

setting, where the random perturbations do not depend on the current time step or iteration and are assumed to have bounded absolute moments. In this setting we derive our main result (

Theorem 4.6), a superlinear bound on the mean-square error depending on the time step and iteration. Using this result, we can maximise the error over time (fixing the iteration), to derive a linear error bound (Corollary 4.7). In the state-dependent setting, we allow the perturbations to depend on the current state of the system, i.e. the coarse and fine solution information at the current time step and iteration, to analyse the convergence of the sampling rules proposed in the original work. We derive linear bounds (Corollaries 4.14 and 4.13) in this case. Following this, we verify these bounds by applying them numerically to a linear system of ODEs and a nonlinear scalar ODE in Section 5. We conclude with some brief remarks on the theoretical and numerical results and discuss possible directions for future study in Section 6. In the Appendix, we provide some technical results used to derive the aforementioned error bounds.

1.2 Notation

Let

denote the expected value of a random variable. Variables

will be

-dimensional real-valued vectors unless otherwise stated. Denote the component-wise absolute value of a vector as

and the Hadamard (component-wise) product as . Also note that will correspond to component-wise squaring, i.e. . We let denote the infinity (or uniform) norm, i.e. . The

-dimensional vector of ones and the identity matrix will be written as

and , respectively. Non-negative constants will be denoted throughout by , ,….

2 The parareal scheme

In this section, we describe the IVP under consideration and define the classic parareal scheme.

2.1 Problem setup

Consider the following system of autonomous ODEs

(2.1)

where is a nonlinear vector field, is the time-dependent solution, and is the initial value at time . We assume is sufficiently smooth such that (2.1) has a unique solution for all initial conditions of interest. We seek numerical solutions to the IVP (2.1) on a pre-defined mesh , where for fixed . is the number of processors required for parareal to compute a solution in parallel, i.e. one processor is assigned to each time slice , . Note that everything that follows extends to the non-autonomous case but is not discussed here to simplify explanation and notation.

2.2 The scheme

The parareal algorithm uses processors and two deterministic numerical flow maps (solvers) to locate solutions , at iteration and time , to system (2.1). These maps take an initial state at time and propagate it, over a time slice of size , to a terminal state at time . The fine solver, denoted as the flow map , returns a terminal state with high numerical accuracy at very high computational cost. The cost is high enough that trying to use to solve (2.1) sequentially, i.e. calculate

(2.2)

is computationally infeasible. The coarse solver, denoted similarly by , returns a terminal state with a much lower numerical accuracy, at a smaller computational cost than . Therefore, is fast and allowed to run serially at any time, whilst is expensive and must only be run in parallel.

Figure 2.1: First iteration of parareal that numerically approximates the exact solution (black line), obtained via repeated serial applications of the fine solver, i.e. (2.2) in the scalar case. The coarse initial guess found using (yellow lines and dots) is followed by the parallel runs of from these guesses (blue lines). The coarse predictions from (red lines) are then used in the prediction-correction update (2.5) (red dots). Figure adapted from pentland2022a (Fig. 2).
Definition 2.1 (Parareal).

For the two numerical flow maps and described above, the parareal scheme is given by

(2.3)
(2.4)
(2.5)

Parareal begins by propagating the exact initial condition (2.3) across serially using (2.4). Each of the solution states found in (2.4) are then propagated in parallel using . Equation (2.5), often referred to as the PC, is then run serially to update the solution states at each . This process of propagating using and updating using the PC can be repeated iteratively, stopping after iterations, once a pre-specified stopping criterion is met. The stopping criteria can take many different forms, one popular choice being that the time slices up to are considered converged if

(2.6)

for some user-specified tolerance . Once , parareal is said to have converged in iterations. An illustration of the first iteration of parareal is given in Figure 2.1.

Remark 2.2.

It is often assumed, and will be assumed in this paper, that , if it were computationally feasible to apply serially over , returns the ‘exact’ solution to (2.1), i.e. it returns , . Therefore, after iterations, parareal returns the exact solution for , meaning that parareal returns the exact solution over in at most iterations—albeit achieving no parallel speedup in this case. This is an important feature of parareal which is also preserved by SParareal.

3 The stochastic parareal scheme

In this section, we outline the SParareal scheme, how it works, and how we have slightly modified the scheme, compared to the original formulation in pentland2022a, to carry out the error bound analysis in Section 4. Following this, we describe the sampling rules tested in the original work.

3.1 The scheme

The intuition behind SParareal is to perturb the solution states in the classic parareal scheme, i.e. the PC (2.5), with some additive noise to reduce the number of iterations taken until the stopping tolerance (2.6) is met. Recall that a reduction in by even a single iteration can correspond to a large increase in numerical speedup—by approximately the runtime of a single run of .

First, let us formally define the scheme.

Definition 3.1 (SParareal).

For the two numerical flow maps, and , described in Section 2, the stochastic parareal scheme is given by

(3.1)
(3.2)
(3.3)
(3.4)

where are (possibly state-dependent) random variables. Note that when .

The first three stages of the scheme (3.1)-(3.3) are identical to the ‘zeroth’ and first iteration of parareal. The exact initial condition (3.1) is propagated forward in time using the coarse solver (3.2), then there is a first pass of the PC (3.3). Following this, the stochastic iterations begin (3.4), whereby random perturbations, i.e. a single draw from the random variable , are added to the PC solution. Notice that no random perturbation is added when to ensure that SParareal returns the exact solution up to time after iterations, just as parareal does, recall Remark 2.2. The reason the random perturbations are only included from iteration onward is because may depend on solution information from iteration . In Section 3.2, we will discuss the construction of these random variables in more detail. Note that the parareal scheme (Definition 2.1) can be recovered by setting for in (3.4).

The scheme in Definition 3.1 looks slightly different to the one presented in pentland2022a, where random perturbations were instead incorporated via random variables, denoted by , in the correction term, i.e. equation (3.4) was instead written as

(3.5)

The different state-dependent forms that can take are defined through the “sampling rules” in Section 3.2. Also note that when , which is equivalent to the condition that when in Definition 3.1. This version 3.5 of the scheme was designed so that samples could be drawn from each of the random variables to increase the probability of locating the exact solution state in fewer iterations. From the sets of samples generated at each , all having been propagated in parallel using and (recall (3.5)), those generating the most continuous trajectory across were then chosen as the “best” perturbations . Numerical experiments illustrated that increasing led to further and further reductions in , albeit at the cost of requiring more processors, specifically in SParareal vs in parareal.

To enable us to carry out the convergence analysis, we move the random perturbations in (3.5) outside the correction term, and express them using in (3.4). This new scheme is equivalent to the old scheme in the case where one sample () is drawn at each time . To obtain the appropriate values for , we equate (3.4) and (3.5) to find

(3.6)

Using the scheme in Definition 3.1, we can derive error bounds for state-independent perturbations, i.e. , that have bounded absolute moments. Using this result, we can then derive corresponding bounds for the state-dependent sampling rules summarised in Table 3.1 using relation (3.6). All of these bounds are derived assuming one sample () is drawn from each . The sample case is much more complex and out of the scope of the present work.

3.2 Sampling Rules

The sampling rules presented by pentland2022a describe the probability distributions that follow in the SParareal algorithm, see Table 3.1. These distributions were designed to vary with both iteration and time step , so that as the solution states get closer to

, their variances would decrease. The benefit of this property will be highlighted in

Section 5. The different rules were constructed to assess the performance of SParareal when the perturbations had different distribution families, marginal means, or correlations.

Sampling rule
Table 3.1: Sampling rules that the random variables follow. The quantities and are -dimensional Gaussian and uniform random vectors, respectively, whilst .

Sampling rules 1 and 2 correspond to multivariate Gaussian perturbations with marginal means and

, respectively, and marginal standard deviations

. The variable is a standard

-dimensional Gaussian random vector. Sampling rules 3 and 4 correspond to perturbations following a multivariate uniform distribution with the same marginal means and standard deviations as rules 1 and 2, respectively. The variable

is a standard -dimensional uniformly distributed random vector with independent components. Note that pentland2022a considered both correlated and uncorrelated random variables in their experiments, whereas here we carry out our analysis only for the uncorrelated case for simplicity.

4 Error bound analysis

In this section, we analyse the mean-square error

(4.1)

between the exact solutions and the stochastic numerical solutions located by the SParareal scheme. We also define the maximal mean-square error (over time) at iteration to be

(4.2)

Specifically, we analyse the mean-square error for the nonlinear autonomous system of ODEs in (2.1), first deriving superlinear (Theorem 4.6) and linear (Corollary 4.7) bounds using state-independent perturbations in SParareal. Then, using these results, we derive linear bounds for the state-dependent sampling rules 2 and 4 (Corollary 4.13) and 1 and 3 (Corollary 4.14). In the following, we introduce some assumptions on the flow maps (gander2008) and perturbations (lie2019) required to derive the error bounds.

Assumption 4.1 (Exact flow map).

The flow map solves (2.1) exactly such that

(4.3)

This assumption is made for simplicity, since SParareal is essentially trying to locate the solution that would be obtained by running the fine solver serially, i.e. (2.2), in parallel. If instead we were to consider to be a numerical method with some (very small) numerical error with respect to the exact solution, then the accuracy of would provide a lower bound on the accuracy of the SParareal scheme as a whole.

Assumption 4.2 (Coarse flow map).

The flow map is a one-step numerical method with uniform local truncation error , for , such that

(4.4)

for and continuously differentiable functions . Taking the difference of (4.4) evaluated at states , then applying norms and the triangle inequality, we can write

(4.5)

where is the Lipschitz constant for and we absorb terms into .

Assumption 4.3 (Lipschitz coarse flow).

The flow map satisfies the Lipschitz condition

(4.6)

for and Lipschitz constant .

Note that these assumptions do not restrict the choice of , as they are met when choosing any Runge-Kutta or Taylor method (hairer1993).

In addition to assumptions on the flow maps, we require an assumption on the absolute moments of the (state-independent) random variables, which will be needed to prove Theorem 4.6.

Assumption 4.4 (Bounded absolute moments of ).

For , , and independent of , , and , the -th absolute moments of satisfy

(4.7)

This assumption enables flexibility in defining the state-independent perturbations, in the sense that it does not require that the random variables be independent, identically distributed, or centred (lie2019). It also means that each could follow a different probability distribution, with the only requirement being that they share a common maximal bound on their absolute moments with respect to the norm. Note that we assume without loss of generality here, so that for increasing , the perturbations get smaller and smaller. For , we can simply take to be negative.

These assumptions will enable us to derive error bounds in the state-independent and state-dependent cases (using sampling rules 2 and 4). The sampling rule 1 and 3 case requires the following as well.

Assumption 4.5 (Lipschitz exact flow).

The flow map satisfies the Lipschitz condition

(4.8)

for and constant .

4.1 State-independent perturbations

In this section, we derive error bounds for SParareal when using the state-independent perturbations .

Theorem 4.6 (Superlinear error bound for state-independent perturbations).

Suppose the SParareal scheme (3.1)-(3.4) with satisfies Assumptions 4.4, 4.3, 4.2 and 4.1. Then, the mean-square error (4.1) of the solution to the nonlinear ODE system (2.1) at iteration and time satisfies

for and constants , , , and .

Proof. Using (3.4), that is the exact solver (4.3), and adding and subtracting , we see that

where , , and are given by

Then, using the triangle inequality and (A.1) for the cross terms (engblom2009, Sec. 4.2), we obtain

(4.9)

Using (4.5), we can bound

(4.10)

Applying the Lipschitz condition (4.6), we obtain

(4.11)

Using (4.7) with , we obtain

(4.12)

Plugging (4.10)-(4.12) into (4.9) and choosing , , and , we obtain the double recursion

(4.13)

where , , , and . Solving (4.13) using the generating function method in Lemma B.1 (see Appendix), we obtain the result.

One can make alternative choices for , , and , however, the choices given in the proof above seem to yield the smallest error bounds. If we were to maximise (4.13) over , we obtain the following linear error bound in the case that , i.e. .

Corollary 4.7 (Linear error bound for state-independent perturbations).

Suppose the SParareal scheme (3.1)-(3.4) with satisfies Assumptions 4.4, 4.3, 4.2 and 4.1. Then, the maximal mean-square error (4.2) of the solution to the nonlinear ODE system (2.1) at iteration satisfies

for and constants , , and .

Proof. Following the proof of Theorem 4.6, we maximise (4.13) over to obtain

(4.14)

where

Solving recursion (4.14) with initial condition , we obtain the desired result.

Remark 4.8.

The bounds in Theorem 4.6 and Corollary 4.7 hold for due to the design of the SParareal scheme. We can recover the bound for iteration (which is deterministic) by solving the second recursion in (4.13) with initial value such that

(4.15)

For the case when , the numerical error is zero, as will have propagated the exact initial condition at forward times without any perturbations, just like parareal (recall Remark 2.2).

Remark 4.9.

The bound in Theorem 4.6 (similarly for Corollary 4.7) can be written as

(4.16)

where is a function of and . Assuming and that is non-increasing in , the accuracy of SParareal should increase with each iteration proportional to the local truncation error of (i.e. the term ) up until the errors induced by the perturbations (i.e. ) become dominant. We illustrate this property numerically in Section 5.

Remark 4.10.

As , both error bounds go to zero as expected, as can be seen clearly in (4.16). The intuition being that as , the local truncation error of goes to zero, i.e. it ‘becomes’ the exact flow map , see (4.5).

Remark 4.11.

If we send (assuming ), the second moments of the random variables vanish and we recover the classic parareal scheme. This can be seen in both Theorem 4.6 and Corollary 4.7, where vanishes as , leading to bounds similar to those for classic parareal. These bounds are not identical to those calculated by gander2007, gander2008, and gander2022 because we are working with the mean-square error, not the (mean) absolute error.

Remark 4.12.

If we additionally assume that the random variables are centred, i.e. , and work in the -norm, i.e. , in the proof of Theorem 4.6, we can write (4.9) as

where the final two terms are equal to zero by independence of and with and using the fact that each is centred. Continuing the proof, we obtain the same bounds for Theorem 4.6 and Corollary 4.7 with slightly altered constants , , , and .

4.2 State-dependent perturbations (sampling rules)

We now use the previous results to derive the corresponding error bounds for the state-dependent sampling rules defined in Table 3.1.

Corollary 4.13 (Linear error bound for state-dependent sampling rules 2 and 4).

Suppose the SParareal scheme (3.1)-(3.4) satisfies Assumptions 4.3, 4.2 and 4.1, with defined using sampling rule 2 or 4. Then, the maximal mean-square error (4.2) of the solution to the nonlinear ODE system (2.1) at iteration satisfies

for and constants , , , and .

Proof for sampling rule 2. The proof follows in the same fashion as Theorem 4.6. Instead of using the bound (4.12), we obtain, using (3.6) and applying (4.5),

(4.17)

Substituting in for sampling rule 2 (Table 3.1) we get

The second inequality follows by Cauchy-Schwarz and independence of and . The third follows by plugging in , applying (4.6), and noting that . Next, we add and subtract inside the expectation and then apply (A.1), with , to get

(4.18)

Using the new bound for in (4.9), we obtain the double recurrence

(4.19)

where , , , and . Maximising over , we obtain

(4.20)

where

Recursion (4.20) can be solved using Lemma B.2 (see Appendix), resulting in the desired bound.

Proof for sampling rule 4. The proof follows in the same way as the proof for sampling rule 2, except that is used in place of .

Corollary 4.14 (Linear error bound for state-dependent sampling rules 1 and 3).

Suppose the SParareal scheme (3.1)-(3.4) satisfies Assumptions 4.5, 4.3, 4.2 and 4.1, with defined using sampling rule 1 or 3. Then, the maximal mean-square error (4.2) of the solution to the nonlinear ODE system (2.1) at iteration satisfies

for and constants , , , , and .

Proof for sampling rule 1. The proof follows in the same fashion as Corollary 4.13. Substituting for sampling rule 1 (Table 3.1) in (4.17), we get

(4.21)

The second inequality follows by applying (A.1) with . To bound the first term in (4.21), we add and subtract inside the expectation and apply (A.1) again with , obtaining

1st Term

The second inequality follows by applying the Lipschitz condition (4.8) and recalling that is the exact solver (4.3). The second term in (4.21) can be bounded as in (4.18) in Corollary 4.13,

Combining both terms in (4.21), we obtain

where , , and . Using the new bound for in (4.9), we obtain the following recurrence

(4.22)

where and . Maximising over , we obtain

(4.23)

where

Recursion (4.23) can be solved using Lemma B.2 (see Appendix), resulting in the desired bound.