Multilevel Sequential^2 Monte Carlo for Bayesian Inverse Problems

09/27/2017 ∙ by Jonas Latz, et al. ∙ Technische Universität München 0

The identification of parameters in mathematical models using noisy observations is a common task in uncertainty quantification. We employ the framework of Bayesian inversion: we combine monitoring and observational data with prior information to estimate the posterior distribution of a parameter. Specifically, we are interested in the distribution of a diffusion coefficient of an elliptic PDE. In this setting, the sample space is high-dimensional, and each sample of the PDE solution is expensive. To address these issues we propose and analyse a novel Sequential Monte Carlo (SMC) sampler for the approximation of the posterior distribution. Classical, single-level SMC constructs a sequence of measures, starting with the prior distribution, and finishing with the posterior distribution. The intermediate measures arise from a tempering of the liklihood, or, equivalently, a rescaling of the noise. The resolution of the PDE discretisation is fixed. In contrast, our estimator employs a hierarchy of PDE discretisations to decrease the computational cost. We construct a sequence of intermediate measures by decreasing the temperature or by increasing the discretisation level at the same time. This idea builds on and generalises the multi-resolution sampler proposed in [P.S. Koutsourelakis, J. Comput. Phys., 228 (2009), pp. 6184-6211] where a bridging scheme is used to transfer samples from coarse to fine discretisation levels. Importantly, our choice between tempering and bridging is fully adaptive. We present numerical experiments in 2D space, comparing our estimator to single-level SMC and the multi-resolution sampler.



There are no comments yet.


page 20

page 26

page 32

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

In science and engineering we use mathematical models to simulate and understand physical processes. These models require input parameters. Once the parameters are specified we can solve the so-called forward problem to obtain output quantities of interest. In this work we focus on models that involve partial differential equations (PDEs). To date approximate forward solvers are available for many PDE-based models, and output quantities of interest can be approximated efficiently. In contrast, the identification of input parameters (the inverse problem) is more challenging. Often the physical process is only given implicitly by observations (data, measurements). These measurements are typically noisy and/or sparse, and do not contain sufficient information on the underlying parameter or are disturbed in such a way that the true parameter cannot be recovered at all. The inverse problem is ill-posed.

A classical example is the simulation of steady-state groundwater flow to assess the safety of proposed long-term radioactive waste repositories. The quantity of interest is the travel time of radioactive particles to the boundary of a safety zone. The simulation requires the hydraulic conductivity of the ground; it can be observed implicitly by pumping tests, and by pressure measurements. The objective of the groundwater flow inverse problem is the identification of the conductivity. In this example, the mathematical model involves an elliptic PDE. The groundwater flow inverse problem is well known, see e.g. Dashti2011UncertaintyProblem ; Dashti2017TheProblems ; Marzouk2009DimensionalityProblems ; Richter1981AnEquation ; Schillings2017AnalysisProblems .

In contrast to deterministic regularisation techniques, the Bayesian approach to inverse problems uses the probabilistic framework of Bayesian inference. Bayesian inference is built on

Bayes’ Formula in the formulation given by Laplace (Laplace1812TheorieProbabilites, , II.1). We remark that other formulations are possible, see e.g. the work by Matthies et al. Matthies2016InverseSetting . We make use of the mathematical framework for Bayesian Inverse Problems (BIPs) given by Stuart Stuart2010InversePerspective

. Under weak assumptions – which we will give below – one can show that the BIP is well-posed. The solution of the BIP is the conditional probability measure of the unknown parameter given the observations.

The Bayesian framework is very general and can handle different types of forward models. However, in this work we consider PDE-based forward models, and in particular an elliptic PDE. The exact solution of the associated BIP is often inaccessible for two reasons: there is no closed form expression for the posterior measure, and the underlying PDE cannot be solved analytically. We focus on , and study efficient approximations to the full posterior measure. Alternatively, one could also only approximate the expectation of output quantities of interest with respect to the posterior measure, or estimate the model evidence, the normalization constant of the posterior measure.

Typically, BIPs are approached with sampling based methods, such as Markov Chain Monte Carlo (MCMC) or Importance Sampling. Classical MCMC samplers are the algorithms suggested by Metropolis et al.

Metropolis1953EquationMachines and the generalisation by Hastings Hastings1970MonteApplications . Advanced MCMC methods for BIP settings are Hamiltonian Monte Carlo Bui-Thanh2014SolvingCarlo and preconditioned Crank-Nicholson MCMC Beskos2008AnBridges ; Cotter2013MCMCFaster . A disadvantage of MCMC samplers is the fact that it is often difficult to assess their convergence after an initial burn-in phase. Importance Sampling Agapiou2015ImportanceCost on the other hand does not require burn-in. However, Importance Sampling is inefficient if the sampling density differs significantly from the target density. For these reasons we employ Sequential Monte Carlo (SMC) Chopin2002AModels ; DelMoral2006SequentialSamplers ; Neal2001AnnealedSampling to approximate the posterior measure. SMC was initially developed to approximate sequences of measures which arise from time-dependent estimation problems in data assimilation. In our setting, since the elliptic PDE models a steady-state process the SMC sequences are constructed artificially such that, starting from the prior measure, they gradually approach the posterior measure. Artificial sequences of measures arise also in simulated annealing DelMoral2006SequentialSamplers , the estimation of rare events Papaioannou2016SequentialAnalysis , model selection Zhou2016TowardApproach , and bridging Gelman1998SimulatingSampling .

In some situations it is convenient to determine the artificial sequences “on the fly” during the execution of the algorithm. The associated method is termed adaptive SMC; see Doucet2011ALater ; Jasra2011InferenceCarlo for a discussion, and Beskos2016OnMethods for a careful analysis. A well-known drawback is the fact that adaptive SMC returns a biased model evidence estimate, however, the model evidence is not the major focus of our work. The estimation of the model evidence with non-adaptive SMC is discussed in Gelman1998SimulatingSampling ; Neal2005EstimatingSampling .

The major advantage of SMC is its dimension-independent convergence which is often observed in practise and which can be proved e.g. for uniformly bounded update densities Beskos2015SequentialProblems . Thus SMC can be used in high- and infinite dimensional settings. See Rebeschini2015CanDimensionality

for a discussion of this point. Similar results are also known for the Ensemble Kalman Filter (EnKF) applied to linear inverse problems with a finite number of particles

Schillings2017AnalysisProblems . The EnKF is a linearised version of SMC and has been applied to linear and nonlinear inverse problems (see Iglesias2013EnsembleProblems ).

SMC has already been used to solve BIPs where the forward model is an elliptic Beskos2015SequentialProblems or Navier-Stokes equation Kantas2014SequentialEquations . The computational challenge is that PDE-based forward solves are in general very expensive. Thus every sample, required by standard solvers such as MCMC or SMC, is expensive. The total computational budget might allow only a few samples and thus the sample error can be considerably large. We handle this problem by constructing a multilevel SMC sampler. To do this we assume that the PDE can be discretised with multiple levels of accuracy. In our work these levels are associated with different mesh sizes in a spatial domain. However, it is also possible to consider e.g. different time step sizes, or target accuracies of Newton’s method.

Multilevel samplers enjoy considerable attention at the moment, and are available for various tasks in uncertainty quantification. Multilevel Monte Carlo is widely used in forward uncertainty quantification; see

Giles2015MultilevelMethods for an overview. In the pioneering work by Giles Giles2008MultilevelSimulation the multilevel idea is combined with standard Monte Carlo in a forward setting. However, it can be used with other samplers such as MCMC, SMC, and the EnKF, for the estimation of rare events, for filtering problems in data assimilation, and to solve Bayesian Inverse Problems. For example, multilevel Ensemble Kalman Filters have been proposed in Chernov2016MultilevelModels ; Hoel2016MultilevelFiltering . The authors in Hoel2016MultilevelFiltering consider continuous-time data assimilation with multiple time-step discretisations. In contrast, the work in Chernov2016MultilevelModels considers data assimilation for spatially extended models e.g. time dependent stochastic partial differential equation. The multilevel estimation of rare events with an SMC type method has been proposed in Ullmann2015MultilevelEvents .

For Bayesian Inverse Problems a multilevel MCMC method has been introduced in Dodwell2015AFlow . Multilevel Sequential Monte Carlo is introduced in Beskos2017MultilevelSamplers and further discussed in Beskos2017MultilevelProposals ; DelMoral2017MultilevelConstants ; DelMoral2017MultilevelConditions

. Both the multilevel MCMC and multilevel SMC use coarse PDE discretisations for variance reduction with the help of a telescoping sum expansion. Moreover, these multilevel samplers are built to integrate output quantities of interest with respect to the posterior. In contrast, in our work we do not rely on a telescoping sum, and we construct approximations to the full posterior measure.

We build on and generalise the work by Koutsourelakis in Koutsourelakis2009AParameters . As in Koutsourelakis2009AParameters we combine SMC with tempering on a fixed PDE discretisation level, and bridging between two consecutive discretisation levels. The major novel contribution of our work is a fully adaptive algorithm to decide when to increase the discretisation accuracy (bridging) and when to proceed with SMC (tempering). In numerical experiments we show that our sampler – termed multilevel Sequential Monte Carlo – gives an approximation accuracy similar to single-level SMC while decreasing the computational cost substantially. Our method works also consistently in the small noise limit. We note that a similar SMC based multilevel method has been proposed in Calvetti2017IterativeInversion ; in this work the model error is given as part of the measurement noise and is updated iteratively.

The remainder of this paper is organised as follows. In §2 we formulate the Bayesian inverse problem and discuss its discretisation. Moreover, we review the basic idea of sequential Monte Carlo. In §3 we give an overview of classical constructions for the intermediate SMC measures, in particular, tempering, bridging, and multilevel bridging. In §3.4 we discuss adaptive SMC samplers where the inverse temperature is selected to bound the effective sample size or, equivalently, the coefficient of variation of the sample weights. The major contribution of this work is presented in §4 where we introduce the Multilevel Sequential Monte Carlo sampler. We discuss its computational cost, and suggest an adaptive update scheme for the combination of bridging and tempering. This update scheme is consistent with the adaptive bridging and tempering discussed in §3.4. Finally, in §5 we present numerical experiments for a test problem in 2D space. In §6 we give a summary and an outlook of our work.

2 Background

2.1 Bayesian Inverse Problem

We consider engineering systems or models subject to an uncertain parameter . The parameter space is a separable Banach space. The forward response operator maps the parameter to the (finite-dimensional) data space . We are interested in models where is the composition of an observation operator , and a solution operator of a partial differential equation. In addition, we consider a quantity of interest which depends on the parameter .

Observations are often noisy. We model this by assuming that is a realisation of where is the true model parameter and is mean-zero Gaussian noise with non-singular covariance matrix

. Hence, using the data vector

, we wish to identify via the equation

This (inverse) problem is in general ill-posed in the sense of Hadamard Hadamard1902SurPhysique since often or . For this reason we employ the framework of Bayesian inverse problems. Instead of identifying the deterministic parameter we assume that is a square-integrable

-valued random variable distributed according to a

prior measure . Moreover, we assume that is independent of the noise . Then, the solution of the Bayesian Inverse Problem is the posterior measure of ,

Under Assumptions 2.1 and 2.2 it can be proved that exists and that it holds


In (2.1) the term is called likelihood,


is a so-called potential (the negative log-likelihood), and


denotes the normalising constant of , or so-called model evidence. The proof of the existence of the posterior measure is given in Stuart2010InversePerspective for Gaussian prior measures . It can also be proved that the posterior measure is Lipschitz continuous with respect to (w.r.t.) the data space . In this sense the BIP is well-posed. Now we state the assumptions on the prior measure.

Assumptions 2.1 (Prior measure)

Let , and let be a Hilbert space with . Let be a trace-class, positive definite and self-adjoint linear operator on . Furthermore, let and be chosen such that . Moreover, the prior measure is absolutely continuous with respect to and its Radon-Nikodym-derivative is -a.s. given by

where is a potential.

Note that if satisfies (Stuart2010InversePerspective, , Assumptions 2.9), then holds. We also remark that Assumptions 2.1 allow for certain non-Gaussian priors. Note that can be given in terms of the so-called precision . Then it can happen that is only densely defined on , however, Assumptions 2.1 are also satisfied in this case.

In addition, for any potential we consider the following assumptions.

Assumptions 2.2 (Potential)

A potential satisfies the following conditions:

  1. For every there is an such that

  2. For every there is a such that

  3. For every there is an such that

  4. For every there is a such that

The potential in (2.2) is a typical example in our setting with Gaussian noise. If we are not particularly interested in the data dependence, we sometimes drop this dependence and set for a specific . In this case, Assumptions 2.2 are satisfied if is the solution operator of an elliptic BVP (see §5), and the observation operator is linear. In general, one can also consider non-Gaussian noise, e.g. lognormal (multiplicative) noise (Kaipio2005StatisticalProblems, , §3.2.2), or other PDE operators, e.g. Navier-Stokes Iglesias2013EnsembleProblems ; Kantas2014SequentialEquations .

Note that if two potentials and satisfy Assumptions 2.2 then the sum does as well. Thus, we can also consider (posterior) measures with a -density that is proportional to . This situation occurs in our setting since the sequential Monte Carlo estimator approximates a posterior measure which is then used as prior measure in the next step of the estimation.

2.2 Discretisation

In most applications it is not possible to evaluate , or analytically. Furthermore, the parameter space is often infinite-dimensional. This motivates the need to study approximations to the solution of the BIP.

We begin by discretising the physical space associated with the PDE solution operator . Let denote an approximation of . Here, refers to a characteristic finite element mesh size. The associated approximate potential is given by

Analogously, we approximate the quantity of interest by .

Let . We approximate the parameter space by the -dimensional space . For example, if the prior measure is Gaussian, then the parameter space approximation can be constructed using a truncated Karhunen-Loève (KL) expansion. See Ghosal2017FundamentalsInference for details. In this case is the number of terms retained in the KL expansion.

Finally, we construct particle-based approximations to the posterior measure . For particles we define

where is the weight associated with . The sum of the weights is equal to one. Typically, the particles are random samples. We obtain the particles by either deterministic or non-deterministic transformations of i.i.d. samples from the prior measure. If the approximation of the posterior measure involves the discretised potential , then we write in place of .

2.3 Importance Sampling

Let denote probability measures on the measurable space and . importance sampling approximates expected values with respect to given samples from . Let denote a quantity of interest that is integrable w.r.t. . According to the Radon-Nikodym Theorem (Klenke2014ProbabilityCourse, , Cor. 7.34) the expected value of with respect to can be written as


The right-hand side of (2.4) is an integral with respect to . We approximate this integral by standard Monte Carlo with independent samples with measure . This gives the importance sampling estimator for ,


Often, the Radon-Nikodym derivative is known only up to a normalising constant. In this case, we use the normalized weights

Finally, the (normalized) weights can be used to approximate ,


If the variance of the importance sampling estimator is finite, then the Strong Law of Large Numbers implies that the estimator in (

2.4) converges (a.s.) to the desired expected value as . Sufficient conditions for a finite variance of this estimator are discussed in (Robert2004MonteMethods, , §3.3.2). In the setting of BIPs, is the posterior. A straighforward way to apply importance sampling is to choose as the prior. In this case is the unnormalized likelihood. It is easy to see that if is bounded, then the variance of the estimator is finite. If the posterior is concentrated in a small area of the prior, then nearly all importance sampling weights are close to zero. In this situation the estimator is extremely inaccurate given a fixed sample budget, or a large number of samples is required to obtain a desired accuracy. Sequential Monte Carlo overcomes this problem by using a sequence, not only a pair, of appropriate intermediate measures.

2.4 Sequential Monte Carlo

Consider a finite sequence of probability measures on , where , Sequential Monte Carlo approximates each measure in the sequence with weighted particles; these are constructed sequentially with (variants of) importance sampling. We denote the Radon-Nikodym derivatives of all measures w.r.t. by


where is -almost surely (a.s.) positive and is the normalising constant associated with . By the Radon-Nikodym Theorem it follows that


We assume that we can generate independent samples distributed according to . Then, we apply importance sampling sequentially to update . The measures are approximated as in (2.6). In practice, this can be inefficient, especially if and have a different mass concentration for . In this case, the approximation of would still rely on -distributed samples. Therefore, it is a good idea to apply a Markov kernel that is stationary with respect to to the -distributed particles. This moves the particles into the high-probability areas of the measure . Before applying the Markov move, the particles are resampled; this eliminates particles with small weights.

3 Construction of intermediate measures

3.1 Tempering

In BIPs the posterior measure is often concentrated in a small area of the high-dimensional parameter space . Tempering (T) is a widely-used method to approximate such measures. The fundamental idea – borrowed from Statistical Physics – is to adjust the temperature in the Boltzmann distribution.aaaThe Boltzmann distribution is a discrete probability measure on the set of energy states of some system of particles. Its -density is proportional to

where is the energy of state and is the Boltzmann constant. A large temperature allows the particles to move faster. See (Gibbs1902ElementaryMechanics, , Chapter VIII), (Liu2004MonteComputing, , §1.1) and Metropolis1953EquationMachines for details. In a Monte Carlo setting tempering is the systematic raising of a density to some power . Looking at the Boltzmann distribution this means that . If a probability measure is unimodal, increasing the temperature increases the variance of the measure. This makes it easier to approximate the measure by importance sampling.

We apply tempering in combination with an SMC sampler with intermediate steps. We start with the prior ; this is equivalent to an infinite temperature or an inverse temperature . In the subsequent steps we scale down the temperature successively until , and we have arrived at the posterior . Formally, we define a finite, strictly increasing sequence of inverse temperatures , where and . The SMC sequence of probability measures is then given by

The last term on the right-hand side above tells us that

Hence, an upscaling of the temperature is equivalent to an upscaling of the noise level in BIPs. Moreover corresponds to an infinitely large noise level, where the likelihood does not contain any information. Hence, is consistent.

The densities are strictly positive. Hence, the intermediate densities in the SMC sampler (see 2.8) are given by

We refer to this method as either SMC with Tempering or simply single-level SMC.

3.2 Standard Bridging

Bridging (B) is an SMC type method, where the sequence of probability measures represents a smooth transition from one probability measure to another probability measure . We assume that both these probability measures are defined on a common measurable space , that and . We also assume that and are absolutely continuous with respect to a -finite measure on . Then, the Radon-Nikodym Theorem tells us that and exist and are unique -almost everywhere. Moreover, these densities are strictly positive almost everywhere on the support of and .

Now, let and be based on functions which are proportional to the Radon-Nikodym derivatives given above. That is,

Let and be a strictly increasing finite sequence, where and . Then, the bridging sequence of measures is defined as

or, equivalently,

Note that and .

Now, we consider specific functions , associated with BIPs. We assume that

where , are (bounded) potentials which satisfy Assumption 2.2. Moreover, we assume that satisfies the same conditions as the prior measure in Assumption 2.1. Then, the bridging sequence is given by

with well-defined probability measures . Indeed,

The intermediate bridging measures are given in terms of potentials which satisfy Assumptions 2.2. Thus, the existence of is equivalent to the existence of the posterior measure in a BIP (see §2.1 for details).

3.3 Multilevel Bridging

It is possible to generalize the idea of standard bridging to a setting where the probability measures and depend on discretisation parameters , . In BIPs this is the case if the forward response operator is discretised using two different mesh sizes . Here, refers to a more accurate yet computationally more expensive PDE solve compared to .

Suppose that the BIP has been solved on a coarse mesh , and that we wish to obtain a more accurate solution with . This means that shall be refined to . In Koutsourelakis2009AParameters the author proposes to bridge between the two probability measures, that is, apply standard bridging for and . In fact, this idea is carried out in a multilevel way by bridging between a hierarchy of probability measures associated with a sequence of decreasing mesh sizes. For this reason we refer to the method as Multilevel Bridging (MLB). We briefly summarize the ideas given in Koutsourelakis2009AParameters .

Let and denote the hierarchy of mesh sizes, where is the desired final mesh size and are the intermediate mesh sizes. The sequence is strictly decreasing. Starting with the prior, we first use tempering to compute the posterior measure associated with the forward response operator . This step is based on the following densities:

where is the prior measure and is the vector of inverse temperatures. Then, we proceed iteratively by bridging for each . In every bridging update we use intermediate steps based on the (bridging) inverse temperatures . In particular,

where .

3.4 Adaptive Sequential Monte Carlo

The accuracy and computational cost of all SMC samplers such as tempering, standard and multilevel bridging, depend crucially on the number of intermediate probability measures (, respectively) and the choice of the inverse temperatures . Up to now we assumed that and are given a priori. However, we can also determine the inverse temperatures and associated intermediate probability measures adaptively “on the fly”. In the literature, several strategies for adapting the inverse temperatures are known. We review (and implement) methods based on the coefficient of variation of the update weights. In the remainder of this text we do not formally distinguish between SMC with fixed intermediate probability measures (as in §2.4) and adaptive SMC as it is often done in the literature. Moreover, adaptivity refers only to the choice of the inverse temperatures. We do not consider adaptive schemes for the Markov kernel in the MCMC step.

To simplify the notation we drop the subscript and consider the SMC update in the remainder of this section. Let denote the density of with respect to . The probability measures and are approximated by and , respectively, and are based on particles each. Then, the effective sample size (ESS) for the SMC update step is defined by



is the coefficient of variation of

. In general, one would consider the standard deviation

in place of the coefficient of variation . However, in the SMC setting this allows us to work with unnormalized weights, since if is normalised. Hence is equal to the standard deviation of the normalized weights.

Now, the inverse temperature associated with is known. Our task is to define the inverse temperature associated with . Clearly, the density of with respect to depends on . For this reason we write . Then, the ESS of the SMC update also depends on and we write

Note that can be computed without further evaluations of the (expensive) forward response operator, for any . In our implementation we choose such that is equal to some predefined target value . In practice, we would like to avoid inverse temperature choices that meet the target ESS, that is, , but do not increase the inverse temperature by at least some . Thus, we define


Note that the optimisation problem in (3.2) is equivalent to the following problem:


where . Hence the fitting of the effective sample size is equivalent to a fitting of the coefficient of variation of the weights.

4 Multilevel Sequential Monte Carlo

In this section we generalize Multilevel Bridging and propose the Multilevel Sequential Monte Carlo (MLSMC) sampler. We explain the advantages of this generalisation in §4.3, but before we do this, we introduce the sampler formally in §4.1 and discuss its accuracy and computational cost in §4.2.

MLSMC is a Sequential Monte Carlo method which combines Tempering and Multilevel Bridging. Sequential refers to two individual sequences in a Sequential Monte Carlo sampler, namely a sequence of inverse temperatures and a sequence of discretisation levels . Starting with the prior measure and discretisation level , the MLSMC update either increases the discretisation resolution () or the inverse temperature (). This process is repeated until we arrive at the inverse temperature and maximal discretisation level . See Figure 4.1 for an illustration.




MLB (Koutsourelakis2009AParameters and §3.3)

single-level SMC (Beskos2015SequentialProblems and §3.1)


Inv. Temp.

Discr. lvl.

(Target distr. )

(Prior distr. )

Figure 4.1: The update schemes associated with Multilevel Bridging, single-level SMC, and MLSMC.

4.1 Formal Introduction

We introduce a general framework to describe MLSMC update strategies. Let denote the total number of bridging steps and inverse temperature updates. Let denote a function, where


We refer to as update scheme. In each step of the algorithm refers to the inverse temperature and refers to the discretisation level. The update function is convenient for the discussion and analysis of various update schemes. If we consider only a single update scheme , we define and . Furthermore, if it is clear whether refers to or or to both, then we use the notation

Before we present the formal definition of MLSMC we give two examples for alternative update schemes. See Figure 4.1 for an illustration.

Example 4.1

Let and define the update scheme , where . Then, the associated sampler is equivalent to single-level SMC.

Example 4.2

Let , where

The corresponding sampler is equivalent to Multilevel Bridging.

Now, we define MLSMC as a Sequential Monte Carlo sampler (see §2.4). Hence, we construct a sequence of probability measures , where and . The intermediate probability measures are based on the update scheme and are given once again by the Radon-Nikodym Theorem:

In the MLSMC sampler we distinguish two update types. Let . If , then

We refer to this update as inverse temperature update (ITU). Otherwise, if , then the update is given by

We refer to this update as level update (LU). However, we usually perform more than one Bridging step from one discretisation level to the next (see §3.2). We can redefine the update by the following (telescoping) product of densities, each of which reflects a particular intermediate bridging measure that is based on bridging inverse temperatures :

For clarity of presentation we do not include the intermediate bridging measures in the update scheme . Furthermore, if it is clear which update scheme is used, we write .

4.2 Computational cost and accuracy

Before we propose an efficient update scheme for the MLSMC sampler we briefly discuss its computational cost and accuracy. Let denote the computational cost of one evaluation of (for . Moreover, we denote the total cost of the MLSMC sampler with associated update scheme by . We typically measure in terms of the number of floating point operations that are required to evaluate . One could also think of estimating the elapsed time of model evaluations or e.g. the memory requirement.

Example 4.3

Let denote a forward response operator, where is the solution operator of an elliptic boundary value problem in -dimensional space (). Furthermore, let , denote the mesh size of the discretised model , respectively the discretised potential . Then, the ratio of the computational cost associated with two consecutive levels in terms of floating point operations is

Given a maximal level , we normalize the values such that . We arrive at

In the following we discuss the computational cost of MLSMC in terms of the update scheme and the costs . To begin, we consider inverse temperature updates. If the Markov kernel update is performed by a Metropolis-Hastings scheme, then one PDE solve for each of the particles is required, to evaluate the acceptance probability. The acceptance step also requires the model evaluations of the current particles. This however should remain in the memory, until the particles are updated. Hence, the computational cost of the inverse temperature updates is given by

In Bridging, we also perform a Markov kernel step for each of the intermediate Bridging steps and each of the particles. Here, the evaluation of the Markov update density requires two model evaluations in total, namely one on each discretisation level and , respectively. Thus, we perform evaluations of the model on the two levels. In addition, we have to consider the first intermediate Bridging step. As opposed to the inverse temperature update, we do not yet have the model evaluation of the current particles on level . Thus, we need to add the cost of to each of the level updates. In summary, the computational cost for a level update is given by

Adding the costs for bridging and inverse temperature updates, respectively, we arrive at the following total cost.

Proposition 4.4

Let the Markov kernels in the MLSMC sampler be given in terms of a single Metropolis-Hastings MCMC update. Then, the total computational cost of the Multilevel Sequential Monte Carlo sampler is given by

Next we discuss the accuracy of the MLSMC sampler in terms of the following root mean square error type metric

where is the particle based MLSMC approximation of . Note that is a random measure. We make use of the following observation: In every MLSMC update we perform a Monte Carlo estimation with weighted samples. Hence, in each update the approximation accuracy measured in terms of the root mean square error is of order

Here, is the effective sample size defined in (3.1). We refer to Agapiou2015ImportanceCost ; Beskos2016OnMethods ; Beskos2015SequentialProblems ; Rebeschini2015CanDimensionality for details on the approximation accuracy of SMC type samplers. Recall that we choose the update steps adaptively (see §3.4). Thus, the is constant in every step. Hence, every Bridging and Tempering step has the same influence on the accuracy. Thus, the total accuracy is bounded by the sum of the individual accuracies associated with the update steps. For this reason, we can maximize the accuracy of the MLSMC approximation by minimizing the total number of MLSMC update steps. The latter is given by

In summary, we wish to design an update scheme which minimizes both and .

4.3 Is Multilevel Bridging optimal?

Now we discuss the computational cost of Multilevel Bridging (see §3.3 for details). We do this to motivate our generalisation, the MLSMC sampler. First, we state two assumptions on the inverse temperatures and number of intermediate bridging steps.

Assumptions 4.5

In the MLSMC sampler,

  1. the inverse temperature is independent of the discretisation level , for any , where