Data assimilation is the process by which observations (data) are integrated with mathematical models so that inference or prediction of the evolving state of the system can be made. For geoscience applications such as numerical weather prediction, it is an active area of research. There the typical global-scale state space dimension is of order , and observation data of dimension are assimilated every –
hours. Current established methods used in operation centres include 4DVar, (various extended versions of) ensemble Kalman filter (EnKF) and variational assimilation methods. However, for fully nonlinear systems and complex observation operators these approaches are unsatisfactory. Our work presented in this paper is part of the wider effort to tackle high dimensional nonlinear geoscience problems using particle filters, as can be seen from the survey paper(van Leeuwen et al., 2019) and the references therein.
The idea of modelling uncertainty using stochasticity in geophysical fluid applications is well established, see Buizza et al. (1999); Majda et al. (1999, 2001). In this paper we work with the stochastic advection by Lie transport (SALT) approach, first formulated in Holm (2015)
. It can be thought of as a framework for deriving stochastic partial differential equation (SPDE) models for geophysical fluid dynamics (GFD). The stochasticity is introduced into the advection part of the dynamics via a constrained variational principle called the Hamilton-Pontryagin principle. What results is a stochastic Euler-Poincaré equation, in which the local acceleration part of the transport operator is in the geometric form represented by the Lie derivative of velocity one-form in the direction of a stochastic vector field (in the form of a Stratonovich semimartingale). This approach for adding stochasticity into GFD models is different from the current state-of-the-art in numerical weather prediction (NWP), where stochastic models of uncertainty is introduced into the forcing; for examplestochastic perturbation by physical tendencies (SPPT) methodology, see Palmer (2018).
By adding stochasticity into the advection operator, one can model uncertain transport behaviour. In particular, the SALT stochastic term can be thought of as a model on the resolvable scale for the subgrid unresolvable fluid scales for transport. The main advantage of the SALT stochastic term is that it preserves the Kelvin’s Circulation Theorem (KCT). However energy is not conserved by SALT because as one can show, an extra term called line stretching results from the application of the Reynolds transport theorem to the time differential of the energy; and the extra term contributes positively to the rate of change of the energy. An alternative, energy conserving stochastic approach called Location Uncertainty (LU) has also been developed Mémin (2014), but LU models do not preserve circulation.
A fundamental ingredient in SPDEs with SALT noise is the spectrum of the velocity-velocity correlation tensor
velocity-velocity correlation tensor
, with eigenvectors denoted by, which appears in the Eulerian velocity field
Cotter et al. (2017) showed that taking the diffusive limit of a flow with two timescales leads to a stochastic differential equation in the form of (1), where should be rigorously understood as empirical orthogonal functions corresponding to the different modes of the fast flow.
For applications, the vector fields need to be supplied a priori. A data driven calibration methodology for obtaining is described in Cotter et al. (2018, 2019), in which the authors numerically investigated two example fluid systems: a damped and forced 2D Euler model with no-penetration boundary condition, and a two-layer 2D quasi-geostrophic model prescribed on a channel. In those works, the SPDE model is interpreted as a parameterisation for the antecedent partial differential equation (PDE) model. Using statistical uncertainty quantification tests, it is shown that by conditioning on a suitable initial prior, an ensemble of SPDE solutions is able to effectively capture the large scale behaviour of the deterministic system on a coarser resolution. It is important to stress the fact that the deterministic system has degrees of freedom, whilst the coarse stochastic system has degrees of freedom. Capturing large scale dynamics on coarse scales enables the reduction of the high resolution PDE system to the coarse scale as an SPDE. This motivates further investigation of the performance of SALT SPDEs using ensemble data assimilation algorithms, where the forecast model is the SPDE prescribed on coarse scales. This is the theme of the present work, where we utilise the calibrated described in Cotter et al. (2019) in a data assimilation set-up for the damped and forced 2D Euler dynamics.
For us, sequential data assimilation is mathematically formulated as a nonlinear filtering problem, which can be tackled using a particle filter, seeBain and Crisan (2009); Reich and Cotter (2015). A particle filter proceeds by alternating between forecast and analysis cycles. In each analysis cycle, observations of the current (and possibly past) state of a system are combined with the results from a prediction model (the forecast) to produce an analysis. The analysis step is typically performed either in the form of a “best estimate” or in terms of approximating conditional distributions. The model is then advanced in time and its result becomes the forecast in the next analysis cycle. However when applied to problems in high dimensions, without additional techniques a basic particle filter algorithm would almost certainly fail. This is due to the fact that in high dimensions the data are too informative.
In this paper, we describe three additional techniques: model reduction, tempering and jittering, which are incorporated into the basic bootstrap particle filter. The combined algorithm is applied to the damped and forced 2D Euler dynamics. These techniques are all necessary for the successful assimilation of data obtained from the true state of the system, which is modelled using a highly resolved numerical solution of degrees of freedom.
The rest of the paper is structured as follows. In section 2 we describe the damped and forced deterministic system and its SALT version. The deterministic system resolved on a fine resolution spatial grid is viewed as the simulated truth. The reduction to the SALT version is done via the variational approach formulated in Holm (2015). The numerical calibration of the subgrid parameters and the numerical methodologies for solving the two systems are described in Cotter et al. (2019).
In section 3 we formulate sequential data assimilation as a nonlinear filtering problem, in which the SALT equations are used as the signal. We describe in detail each algorithm: bootstrap particle filter, tempering and jittering, which are all required to tackle the high dimensional nonlinear filtering problem.
In section 4 we present and discuss the numerical experiments and results. Two main sets of experiments are considered. In the first set, which we call the perfect model scenario, the true underlying state is a realisation of the signal. In the second set, which we call the imperfect model scenario, data from the fine resolution true state is assimilated. All experiments were run on a modest workstation which has two Intel Xeon processors totalling logical cores and Gb memory. Additionally, an effective method for generating initial ensembles for SALT models is discussed.
Finally, section 5 concludes the present work and discusses the outlook for future research.
The following is a summary of the main numerical experiments contained in this paper:
Using particles, we ran the particle filter over a period of eddy turnover times (ett, see (46) for definition) separately for observation dimensions and , and assimilation intervals ett and ett. Each experiment was repeated 20 times. The mean square error (mse), ensemble spread (), effective sample size (ess, see (31) for definition) and number of tempering steps for the average is shown, in figure 5 for the perfect model scenario and in figure 11 for the imperfect model scenario.
Forecast reliability rank histograms for the particle filter using particles, observation dimension and assimilation interval ett are shown in figures 5(a) and 5(b) for the perfect model scenario, and in figures 11(a) and 11(b) for the imperfect model scenario. The reliability rank histograms are also computed for the results from using observation dimension ; this is shown in figure 7 for perfect model scenario and figure 13 for the imperfect model scenario.
Asymptotic particle filter behaviour was estimated using an experiment which ran over a period of ett, using particles and assimilation period ett. This amounts to a total of data assimilation steps. The results are shown in figures 8 – 10 for the perfect model scenario, and figures 14 – 16 for the imperfect model scenario.
2 Deterministic and stochastic advection by Lie transport GFD models
In this section, we describe the PDE and the SPDE models with Lie transport type stochastic terms. For the theory on SALT SPDEs see for example Holm (2015); Crisan et al. (2018). We follow Cotter et al. (2019) (also Cotter et al. (2018)) and use a data-driven approach to numerically model the ’s. Thus information regarding the stochastic dynamics is complete except for initial and boundary conditions. Viewed as a parameterisation of the subgrid scales, numerically the SPDE shall be prescribed on a coarse resolution grid and the PDE prescribed on a fine resolution grid.
The spread of the SPDE dynamics from using parameters calibrated with the data-driven approach described in Cotter et al. (2019) adequately captures the large scale features of the PDE dynamics. Those results indicate the feasibility of the calibrated SPDE as model reduction, thus providing the foundation for the present work where we utilise the SPDE as the signal process in a nonlinear filtering formulation. Nonlinear filtering will be the topic of discussion in section 3.
In the following, the domain is assumed for both deterministic and stochastic models.
2.1 Deterministic model
We consider the vorticity version of an incompressible Euler flow with forcing and damping. Let denote the velocity field. Let denote the vorticity of , denotes the -axis. Note that for incompressible flows in two dimensions, is a scalar field. For a scalar field we write Let denote the stream function. The stream function is related to the fluid velocity and vorticity by and respectively, where is the Laplacian operator in . The existence of the stream function is guaranteed by the incompressibility assumption.
We now write down the deterministic model,
We choose the forcing to be given by
where and are constants having the following roles: controls the strength of the forcing; is an integer interpreted as the number of gyres in the external forcing; and can be seen as the damping rate. denotes the Lie derivative of with respect to the vector field . When applied to scalar fields, is simply the directional derivative with respect to , see Chern et al. (1999)
We consider a no-penetration spatial boundary condition
to close the system. This system is a special case of a nonlinear, one-layer quasi-geostrophic (QG) model that is driven by winds above.
2.2 Stochastic model
Consider the space of continuous function whose value at is zero. It is equipped with the Wiener measure and its natural filtration . Let be the canonical Brownian motion on , that is for , is the evaluation map. We write to denote the ’th component of . The SALT version of the Euler fluid equation (2) as derived in Holm (2015). Cotter et al. (2019) introduced damping and forcing to facilitate statistical equilibrium in the underlying resolved system, leading to the following stochastic partial differential equation (SPDE),
where the vector fields represent scaled eigenvectors of the velocity-velocity correlation tensor .
Equation (7) arises from a time-scale separation assumption for the deterministic Eulerian transport velocity , leading to the following Stratonovich stochastic differential equation
where and are divergence free vector fields, from which (7) may be derived. Here one can intuitively think of as the “large” scale mean part of . In this present work since we are interested in the practicality of (7) for data assimilation, we follow Cotter et al. (2019) and make the approximation that the sum in (8) is finite. Hence the stochastic term in (7) also consists of terms.
Let denote the stream function of , and let denote the stream function of , i.e.
Note that is constant in time. The can be solved for and stored on the computer after the are obtained. For this the boundary condition
is enforced for each Then (8) can be expressed in terms of and ,
Expressing the transport velocity in this form is useful because it allows us to introduce stochastic perturbation (i.e. terms with ) via the stream function when solving the SPDE system numerically, thereby keeping the discretisation of (7) the same as the deterministic equation (2), see Cotter et al. (2019).
The full set of stochastic equations is
with boundary condition
The forcing term is the same as the deterministic case.
For the damped and forced stochastic system considered in this section, on the torus a global wellposedness theorem with the solution space is proved in Crisan and Lang (2019). In a forthcoming sequel to this work we also show the the wellposedness of the solution on the bounded domain with no-penetration boundary conditions. We make the following important assumption.
3 Nonlinear Filtering
In this section, we formulate data assimilation as a nonlinear filtering problem in which the aim is to utilise observed data to correct the distribution of predictive dynamics. We describe a particle filter methodology which incorporates three additional techniques that are required to effectively tackle this high dimensional data assimilation problem.
In nonlinear filtering terminology the predictive dynamics is often called the signal 333 Also known as the forecast model in statistics and meteorology literature, (see Reich and Cotter, 2015).. The signal in our setting corresponds to the SALT SPDE. Data is obtained via an observation process which represents noisy partial measurements of the underlying true system state. The goal is to determine the posterior distribution of the signal at time given the information accumulated from observations. This is known as the filtering problem. This is different to inversion problems (also called smoothing problems), where one is interested in obtaining the posterior distribution of the system’s initial condition, see for example Stuart, A M (2010).
The stochastic filtering framework enables us not just to provide a solution to the data assimilation problem, but also offer a clear language in which to explain the details and the intricacies of the problem. We detail below an elementary introduction to the filtering problem.
Let denote a given state space, and let denote the set of probability measures on the state space. In what follows the state space will be , where is the dimension of the space. To avoid technical complications we will assume in the following that time runs discretely . We shall work in a Bayesian setting, in other words we will assume that we know the distribution of the signal for , which will be denoted by for . We also assume that partial observations, denoted by , of dimension with are available to us at times and we wish to approximate the signal given the accumulated observations . Of course we could aim to approximate using an arbitrary -adapted estimator , where is the -algebra . However, the best estimator is the conditional expectation of given , . In this context, by the best estimator, we mean the minimiser of the mean square error , where is the standard Euclidian norm on . Of course we would not just want to compute/estimate , but also the error that we would make if we approximate with , i.e., for .
The quantiles of the approximation error will also be of interest. Therefore, in general, the filtering problem consists in determining the condition distribution of the signal given givendenoted by . Once
is determined, then its first moment (the mean vector) will give us, its covariance matrix can be used to compute the mean square error , etc. So one can adopt one of two different approaches of estimation the signal given partial observations.
Develop a data assimilation algorithm that results in a a point approximation of the signal using the data . The approximation may or may not be optimal and only, on rare occasions, an estimate of the error will be available.
Develop a data assimilation algorithm that results in an approximation of the conditional distribution of the signal using the data . This in turn will offer an approximation of the optimal estimator as well as the approximation of the error, quantiles, occupation mesures, etc.
Of course, algorithmically we expect the first problem to be a lot easier than the second. The computation, of an estimator that is an element of would be expected to be a lot easier that that of a probability measure over The first one is a finite dimensional object the latter is an infinite dimensional one. However, in the exceptional case when the signal is a linear time-series and the observation has linear dependence on the signal and they are driven by Gaussian noise the two approaches more or less coincide. The reason is that, in this case is Guassian and one can explicitly write the recurrence formula for the pair , where is the covariance matrix of . So on one hand one can compute directly the optimal estimator and on the other hand the Gaussianity ensure that is fully described by . This is the so-called Kalman-Filter. There are numerous extensions of this method to non-linear filter that attempt a similar methodology for the non-Gaussian conditional distribution. Such approaches are not optimal in the sense that they don’t offer a point estimator that is the optimal one and the corresponding “covariance” matrix that is produced is not the covariance matrix of . The existing literature in this direction is vast, we cite here (Reich and Cotter, 2015; Evensen, 2009; Ljung, 1979).
Particle filters are a class of numerical methods that can be used to implement the second approach. They have been highly successful for problems in which the dimension of the state space has been low to medium. However, in recent works (Kantas et al., 2014; Beskos et al., 2017, 2014) they have been shown to also work in high dimensions . In this paper, we tackle a state space with dimension of order . For a filtering perspective, we overcome here one other hurdle as we explain below.
Let us denote by , the (prior) distribution of the signal on the path space . The prior distribution of the signal and the observations , are the building blocks of , . To be more precise, one can show that there exists a mapping
such that . Under very general condition on the signal and the observation, this mapping is jointly continuous on the product space This would mean that will give a reasonable approximation of the conditional distribution of the signal as long as the distribution does not differ significantly from the one used to construct . The same will happen when the true law of the observation does not differ significantly from the chosen model. This property of the posterior distribution is crucial, see section 3.1.1 for details.
In the rest of this section we consider only the space-time discretised SPDE signal, of spatial dimension . The observation process is given by noisy spatial evaluations of an underlying true system state at discrete time steps. We consider two scenarios for the underlying true system state, henceforth called the truth.
In the first scenario, we aim to compute the conditional distribution of the signal given partial observations of a single realised trajectory of the SPDE system. In this case the predictive dynamics and the truth are from the same dynamical system. We call this the perfect model scenario (or twin experiment, see Reich and Cotter (2015)).
In the second scenario, we use instead noisy spatial evaluations of a space-time discretised solution corresponding to the PDE system (2) – (6). We call this the imperfect model scenario. The truth in this case is computed on a more refined grid than solutions of the SPDE. Nevertheless the solution of the SPDE converges to that of the PDE as the coarser grid converge, see Cotter et al. (2019). Similarly the corresponding observations will converge (provided the observation noise does not change). This ensures the successful assimilation of PDE data into the SPDE model, assurance from the uncertainty quantification tests shown in Cotter et al. (2019) is necessary to numerically guarantee that the mis-match between state spaces remains small.
To our knowledge, this is the first application of particle filters to the case where the signal is described by a SALT SPDE system. As we explain below a straight application of the classical bootstrap particle filter algorithm fails. To succeed we implement and incorporate the following procedures.
Model reduction – approximate a high dimensional system using a low dimensional system via stochastic modelling, the result of which can be further reduced by choosing a projection of the noise process onto a submanifold. This was accomplished in Cotter et al. (2019).
Tempering – compute a sequence of intermediate measures parameterised by a finite number of temperatures that control the smoothness of the density of . This procedure eases the problem of highly singular posteriors in high dimensions, which come from the fact that high dimensional observations are too informative.
Jittering – a Markov chain Monte Carlo (MCMC) based technique for recovering lost population diversity in particle filter algorithms.
These techniques are added to the basic bootstrap particle filter, and are demonstratively necessary, theoretically consistent and rigorously justified. In addition, we shall pay particular attention to the initialisation of the particle filter, though this is discussed in section 4.1.
Before proceeding to the problem formulation, we insert an important remark.
Our spatial discretisations for the PDE and SPDE fields are defined on appropriate finite element spaces, see Cotter et al. (2019) for details of the numerical methods we use for the models under consideration. Under assumption 1, it is important to understand that instead of the finite state space , the actual problem involves measures defined on infinite dimensional function spaces, in particular it is highly plausible that in theory the state space for the SPDE is Sobolev for . Discussions of these technical complications are not the focus of this work. And since in practise we work with numerical solutions anyhow, we setup our filtering problem in a finite dimensional setting. However, the methods we use are all theoretically consistent in the limit, see Stuart, A M (2010); Dashti and Stuart (2017).
In light of remark 2, henceforth we drop the word “discretised” when describing the state space, signal and observation processes.
3.1 Filtering problem formulation
Consider discrete times . Let be a discrete time Markov process called the signal. Let be a discrete time process called the observation process. We assume almost surely (a.s.). We consider Eulerian data assimilation where the observations correspond to fixed spatial points , for all As already mentioned in the section introduction, we denote the dimensions of and by and respectively.
We take and to correspond to the velocity vector field. Mathematically we could also consider the vorticity field or the stream function, but in real world scenarios those fields may be difficult to observe directly. We denote by and the path of the signal and of the observation process from time to time ,
Let and denote particular trajectories of and . For notational convenience, we may write in the subscripts to mean .
It is useful to introduce the following standard notation in the case when is a measure and is a measurable function, and is a Markov kernel
The marginal distribution of the signal changes according to
for , and is a probability transition kernel defined by the push-forward of using the (discretised) SPDE solution map from assumption 1.
In standard filtering theory the observation process is defined by
where is a Borel-measurable function, and for ,
are mutually independent Gaussian distributed random vectors with mean zero and covariance matrix. Thus
for and Gaussian density . For convenience, define which is commonly referred to as the likelihood function.
We can now define the filtering problem.
Problem (Filtering Problem).
For , we wish to determine the conditional distribution of the signal given the information accumulated from observations, i.e.
for all bounded measurable functions , with being the given initial probability distribution on the state space
being the given initial probability distribution on the state space. In particular when for we have .
In statistics and engineering literature, is often called the Bayesian posterior distribution. Note that is a random probability measure. For arbitrary , denote
We also introduce predicted conditional probability measures and defined by
We have -almost surely the following Bayes recurrence relation, see Bain and Crisan (2009). For and ,
is a normalising constant. Due to (24), we may also write , thus .
In the general case for any bounded measurable function , we have for problem 3.1 the recurrence relation
Except for a few rare examples of the signal, it is extremely difficult to directly evaluate because there are no “simple” expressions. In section 3.2 we shall describe the particle filter methodology that we employ to tackle the filtering problem. Note that in statistics and engineering literature, particle filters are often called sequential monte carlo (SMC) methods.
3.1.1 Two observation scenarios
For the numerical experiments, we consider two scenarios for the truth.
Perfect model: the observations correspond to a single path-wise solution of the SPDE. In this scenario the filtering problem formulation we have described follows through.
Imperfect model: the observations correspond to the solution of the PDE, i.e. (20) is changed to
where corresponds to the coarse grained PDE velocity field. Coarse graining is explained in section 4. Here there is mismatch between the truth and the signal. As shown in Cotter et al. (2019), the law of the SPDE discretised on the chosen grid converges to the law of the PDE as the discretisation grid gets refined. Implicity also the law of the sequence of true observations is close to the law of the model observations. As stated in (18), is a continuous function of the law of the signal and the observations so we expect a reasonable approximation of even when we don’t use the true law of the signal444The true law of the signal is the push-forward of , the initial distribution of the signal . In the case when is deterministic then the distribution of the signal is a Dirac delta distribution. but the model law.555For continuous time models, this property is called the robustness of the filter. See Clark and Crisan (2005) for results in this direction.
3.2 Particle filter
Particle filter methods are among the most successful and versatile methods for numerically tackling the filtering problem. A basic algorithm implements the Bayes recurrence relation by approximating the measure valued processes and by -particle empirical distributions. The position of each particle is updated using the signal’s transition kernel. At the same time, individual weights are kept up-to-date in accordance with the updated particle positions. It is in the weights updating step that we take into account the information provided by the observations: particles are reweighted using the likelihood function. A new set of particle positions can be sampled based on the updated weights and the procedure iterates.
Due to the high dimensional nature of the systems in consideration, additional techniques are necessary in order to make the basic algorithm work effectively. We provide a concise presentation of the algorithms employed, and note that these methods are all mathematically rigorous. For more thorough discussions we refer the reader to Bain and Crisan (2009); Reich and Cotter (2015); Dashti and Stuart (2017); Kantas et al. (2014); Beskos et al. (2014).
3.2.1 Bootstrap particle filter
The basic algorithm, called the bootstrap particle filter or the sampling importance resampling (SIR) algorithm, proceeds in accordance with the Bayes recurrence relation (23) – (24) by repeating prediction and update steps. To define the method, we write an -particle empirical approximation of . Thus at each , we have
where denotes Dirac measure. The discrete measure is completely determined by particle positions and weights , . We define the update rule
for advancing to to be given by the numerical implementation of the SPDE solution map G, see (17),
Note that each particle position is updated independently.
For the weights, suppose the particles , are independent samples from then we have equal weighting for each particle
This does not change in the prediction step, thus
To go from to , the weights need to be updated to take into account the observation data at time . This is done using the likelihood function (21),
Using (27) but with the collection of updated particle positions and normalised weights , we obtain .
In the above we assumed to have started with independent samples from before proceeding with prediction and update. Thus after we obtain we have to generate independent (approximate) samples from in order to iterate the above prediction and update steps for future times. This is done via selection and mutation steps. Otherwise the non-uniform weights are carried into future iterations until resampling is required.
In high dimensions, can easily become singular due to the observations being too informative. This means after the update step, most of the normalised weights are very small. Thus with a finite support, does not have enough particle positions in around the concentration of the true distribution . Therefore it is desirable to add a resampling step so that particles with low weights are discarded, and replaced with (possibly multiple copies of) higher weighted particles. This selection is done probabilistically; for example, one could draw uniform random numbers in the unit interval and select particles based on the size of , see Bain and Crisan (2009); Reich and Cotter (2015).
Since the resampling step can introduce duplicate particle positions into the ensemble, without reintroducing the lost diversity, repeated iterations of resampling will eventually lead to a degenerate distribution (i.e. measures whose support are singletons). To tackle this issue we apply jittering after every resampling step. Jittering is based on Markov Chain Monte Carlo (MCMC) whose invariant measure is the target . The jittering step shifts duplicate particle positions whilst preserving the target distribution. We discuss this in section 3.3.
After resampling is applied, we obtain a new ensemble with equal weights , i.e.
When we do not resample, then the particles in the ensemble keep the weights given by , and use (27) for .
The resampling step should be done only when necessary to reduce computational cost, because the jittering step requires evaluating the solution map
. Therefore we employ a test statistic to quantify the non-uniformity in the weights and only resample when the non-uniformity becomes unacceptable. For this we use theeffective sample size (ess) statistic. It is defined by the inverse -norm of the normalised weights ,
The ess statistic measures the variance of the weights. If the particles have near uniform weights then theess value is close to On the other hand if only a few particles have large weights then the ess value is close to In practice we resample whenever (31) falls below a given threshold
Algorithm 1 summarises the bootstrap particle filter. The algorithm starts with an empirical approximation of the initial prior and steps forward in time, assimilates observation data in repeating cycles of prediction-update steps. The ess statistic is employed. When resampling is required, selection-mutation steps are applied.
3.3 MCMC and jittering
In this section we describe an effective Metropolis-Hastings MCMC based method called jittering with the proposal step chosen specifically for our signal. Jittering reintroduces lost diversity due to resampling by replacing an ensemble of samples that contain duplicates with a new ensemble without duplicates, such that the distribution is preserved.
MCMC is a general iterative method for constructing ergodic time-homogeneous Markov chains with transition kernel , that are invariant with respect to some target distribution , i.e.
By the Birkhoff’s ergodic theorem, we have the following identity
for any integrable and measurable function Practically, this means starting from an initial , each with can be treated as samples from the target distribution .
A generic Metropolis-Hastings MCMC algorithm is described in algorithm 2. A Markov transition kernel defined on the state space is used to generate proposals. Together with the right conditions on the acceptance probability function to guarantee detailed-balance, the algorithm produces a Markov chain with kernel that is reversible with respect to the target measure , see Dashti and Stuart (2017). In the Gaussian case, a classic and widely used choice for and is
for any appropriate covariance operator and log likelihood function , see Kantas et al. (2014). The parameter controls the local exploration size of the Markov chain. In practise for high dimensional problems needs to be very close to 1 in order to achieve a reasonable average acceptance probability (an acceptance probability of 0.234 is reasoned to be optimal in Neal et al. (2006)). For bad choices of the MCMC chain may mix very slowly and would require a burn-in step size that makes the whole algorithm computationally unattractive.
Let be a given measure on the state space. Let . Generate a -invariant Markov chain as follows
With (32), algorithm 2 is known as the Preconditioned Crank Nicolson (pCN) and is wellposed in the mesh refinement limit, see Dashti and Stuart (2017); Kantas et al. (2014). Thus when applied to discretised problems the algorithm is robust under mesh-refinement. It is commonly applied in Bayesian inverse problems where the posterior is absolutely continuous with respect to a Gaussian prior on Banach spaces. It is important to note that here the design of the algorithm is important because in high dimensions measures tend to be mutually singular, but for Metropolis-Hastings algorithms the acceptance probability is defined as the Radon-Nikodym derivative given by the stationary Markov chain transitions.
Our choice (55) for the prior is not Gaussian. The distribution (19) is also not Gaussian for any . The distribution of the SPDE solution is investigated numerically in Cotter et al. (2019), in which it is noted that non-Gaussian scaling is interpreted as intermittency in turbulence theory. Because of this, pCN is not feasible here. Therefore it is important to choose and such that the following properties hold.
Robustness under mesh refinement. Although we are considering finite dimensional state spaces, in the limit the state spaces under assumption 1 are infinite dimensional function spaces.
The chain should mix and stablise sufficiently quickly so that the number of burn-in steps required is reasonable.
Then with the appropriately chosen and , we apply algorithm 2 as a jittering step to shift apart duplicate particles introduced into the ensemble by the resampling step.
Given these considerations, we use directly the SPDE solution map (17) to define the transition kernel . Let the target distribution be the posterior distribution , . With a slight abuse of the notation introduced in (17), we write
to mean the solution of the SPDE at time along a realised Brownian trajectory over the time interval starting from position . When we consider and interval as fixed, then we view as a function on .
Let be the empirical approximation of with particles . We consider each particle a child of some parent at time for a realised Brownian trajectory over the interval , i.e.
To jitter , set and (see algorithm 2). At the -th MCMC iteration, , propose
where is a Brownian trajectory over generated independently from .
We use the canonical Metropolis-Hastings accept-reject probability function
Otherwise the proposal is rejected, in which case set
and go to the next iteration in algorithm 2.
Algorithm 3 summarises our MCMC procedure. The algorithm includes tempering scaling of the accept-reject function (36). Tempering is explained in the next subsection. Practically, to save computation, we may apply jittering to just the duplicated particles after resampling, and run each jittering procedure for a fixed number of steps.