We focus on a stochastic filtering problem where a space- and time-varying hidden signal is observed at discrete times with noise. The non-linear filtering problem consists of computing the conditional probability law of the hidden stochastic process (the so-called signal) given observations of it collected in a sequential manner. In particular, we model the signal with a particular dissipative stochastic partial differential equation (SPDE), namely the stochastic Navier-Stokes Equation (NSE). This model, or a variant thereof, is often used in applications to model unknown quantities such as atmosphere or ocean velocity. In the spirit of data assimilation and uncertainty quantification, we wish to extract information about the trajectory of the hidden signal from noisy observations using a Bayesian approach. Typical applications include numerical weather forecasting in meteorology, oceanography and atmospheric sciences, geophysics, hydrology and petroleum engineering; see [2, 34, 5] for an overview.
We restrict to the setting where the state of interest is the time-varying velocity field in some 2D bounded set. The unknown state is modelled using the stochastic NSE
where is the Laplacian, a viscosity constant, a non-linear operator due to convection, a positive, self-adjoint, trace-class operator, a deterministic forcing and
a space-time white noise. This might appear as a restrictive choice for the dynamics, but the subsequent methodology is generic and could potentially be applied to other similar dissipative SPDEs, such as the stochastic Burgers' or Kuramoto–Sivashinsky equations [24, 7].
The evolution of the unknown state of the SPDE is observed at discrete times and generates a sequence of noisy observations
In order to perform accurate estimation and uncertainty quantification, we are interested not just in approximating a single trajectory estimate of the hidden state, but in the complete filtering distribution,
that is, the conditional distribution of the state given all the observations obtained up to the current time. The main objective is to compute the filtering distribution as it evolves with time, which is an instance of the stochastic filtering problem.
The solution of the problem can be formulated rigorously as a recursive Bayesian inference problem posed on an appropriate function space. In contrast to standard filtering problems, the problem setup here is particularly challenging: the prior consists of a complicated probability law generated by the SPDE, and observation likelihoods on the high-dimensional space of the signal tend to be very informative.
The aim of this paper is to propose Sequential Monte Carlo (SMC) methods (also known as Particle Filters (PF)) that can effectively approximate these conditional distributions. Computing the evolution of the filtering distribution is not analytically tractable, except in linear Gaussian settings. SMC is a generic Monte Carlo method that approximates the sequence of filtering distributions and their normalising constants (known in statistics as the marginal likelihood or evidence). This is achieved by obtaining samples known as particles and combining Importance Sampling (IS), resampling and parallel Markov Chain Monte Carlo (MCMC) steps. The main advantages of the methodology are: i) it is sequential and on-line in nature; ii) it does not require restrictive model assumptions such as Gaussian noise or linear dynamics and observations; iii) it is parallelisable, so one could gain significant speed-up using appropriate hardware (e.g. GPUs, computing clusters); iv) it is a well-studied, principled method with an extensive literature justifying its validity and theoretical properties, see e.g. [13, 12]. So far SMC has been extremely successful in typically low to moderate dimensions, but its application in high-dimensional settings has been very challenging, mainly due to the difficulty of performing IS efficiently in high dimensions. Despite this challenge, a few successful high-dimensional SMC implementations have appeared recently for applications with discrete-time signal dynamics [36, 46, 45, 47, 5, 8, 3].
We will formulate the filtering problem with discrete-time observations and continuous-time dynamics. This setup has appeared previously in [40, 39] for signals corresponding to low-dimensional stochastic differential equations (SDEs). The aim of this paper is to provide a novel, accurate and more efficient SMC design when the hidden signal is modelled by an SPDE with linear Gaussian observations. To achieve this challenging task, the particle filter will use computational tools that have previously been successful in similar high-dimensional problems, such as tempering and preconditioned Crank-Nicolson MCMC steps [23, 9]. Using such tools, we propose a particle algorithm that can be used to approximate the filtering distribution when the signal obeys the stochastic NSE and the observations are linear with additive noise. On a general level, the proposed algorithm has a similar structure to , but here we additionally adopt the use of IS. We will provide a detailed design of the necessary likelihood-informed importance proposals and the MCMC moves used. We extend known IS techniques for SDEs ([22, 48]) and MCMC moves for high-dimensional problems ([43, 23, 9]) to make them applicable to filtering problems involving the stochastic NSE or other dissipative SPDEs. In the context of particle filtering, our developments lead to an SMC algorithm that performs effectively for the high-dimensional problem at hand using a moderate number of particles.
The material presented in this paper can be viewed as an extension of some ideas in the authors' earlier work in . In  we considered the deterministic NSE with more general observation equations. In the present paper the model for the signal contains additive noise and we assume linear observation schemes. This allows for the possibility of using likelihood-informed importance proposals, and the MCMC steps need to be designed to be invariant with respect to a more complicated conditional law due to the SPDE dynamics. The organisation of this paper is as follows: in Section 2 we present some background on the stochastic NSE and in Section 3 we formulate the filtering problem of interest. In Section 4 we present the SMC algorithm and in Section 5 we present a numerical case study that illustrates the performance and efficiency of our method. Finally, in Section 6 we provide some concluding remarks.
2 Background on the Stochastic Navier-Stokes Equations
We present some background on the 2D stochastic NSE defined on an appropriate separable Hilbert space. We restrict the presentation to the case of periodic boundary conditions, following the treatment in .
This choice is motivated mainly by convenience of exposition and by the ability to perform numerical approximations using Fast Fourier Transforms (FFT). The formulation and properties of the stochastic NSE allow for the more technically demanding case of Dirichlet conditions on a smooth boundary [19, 23]. We stress that the subsequent particle filtering methodology is generic and does not rely on the choice of boundary conditions.
Let the region of interest be the torus, with denoting a point in the domain. The quantity of interest is a time-space varying velocity field, which is periodic due to the boundary conditions; here
denotes vector/matrix transpose. It is convenient to work with the Fourier characterisation of the function space of interest:
using the following orthonormal basis functions for ,
The deterministic NSE is given by the following functional evolution equation
Following standard notation, we denote for the Leray projector ( is the space of square-integrable periodic functions), for the Stokes operator, for the convection mapping and for the forcing.
One can introduce additive noise in the dynamics in a standard manner. First, we define the upper half-plane of wavenumbers
where are (independent) standard Brownian motions on . In the spirit of [11, Section 4.1], consider a covariance operator such that , for , . Then, we can define the -Wiener process as
under the requirement , . Thus, we are working with a diagonal covariance matrix (w.r.t. the relevant basis of interest), though other choices could easily be considered. We will also work under the scenario that , for some , so that , i.e. is a trace-class operator. Finally, we will use to denote the -Wiener measure on .
Having introduced the random component, we are now interested in weak solutions of the functional SDE,
with the solution understood pathwise on the probability space . More formally, following , we define the spaces
Since the operator is linear, bounded in and , [18, Theorem 6.1] implies that for and , there exists a unique solution of (6) such that . In [18, 19] one may also find more details on the existence of an invariant distribution, together with irreducibility and Feller properties of the corresponding Markov transition kernel.
2.2 Galerkin Projections and Computational Considerations
Using the Fourier basis (3), we can write the solution as
Hence, it is equivalent to consider the parameterisation of via . By taking the inner product with on both sides of (6), it is straightforward to obtain that the ’s obey the following infinite-dimensional SDE
Recall that, since the field is real, , . This parameterisation of is more convenient, as it allows performing inference on a vector (even if infinitely long), with coordinates evolving according to an SDE. For numerical purposes one is forced to use Galerkin discretisations, using projections of onto a finite-dimensional subspace instead. Consider the set of wavenumbers in
for some integer , and define the finite-dimensional subspace via the projection so that
Then, inferring the Galerkin projection for corresponds to inferring the vector that obeys the following finite-dimensional SDE
This high-dimensional SDE will provide an approximation to the infinite-dimensional SPDE. Such an inference problem is more standard, but is still challenging due to the high dimensionality of and the non-linearities involved in the summation term of the drift function in (8). Since (8) is only an approximation of (7), it will induce a bias in the inferential procedure. In this paper we do not study the size of this bias. Instead, we concentrate our efforts on designing an algorithm to approximate (in (2)) that is robust to mesh refinement. This means our method should perform well numerically when one increases (thereby reducing the bias in the numerical approximation of (7)). Naturally, this comes at the expense of a moderate amount of additional computational effort, which will depend on the particular numerical scheme used to approximate the solution of (8). For instance, for the FFT-based numerical schemes used in Section 5 the computational cost is .
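To make the Galerkin discretisation concrete, the following is a minimal sketch of one explicit Euler-Maruyama step for a Galerkin-truncated 2D stochastic NSE, written in vorticity form for simplicity and evaluating the convection term pseudo-spectrally with FFTs. This is illustrative only (no dealiasing, a naive diagonal noise model, and all parameter names are our own, not the paper's notation):

```python
import numpy as np

def nse_galerkin_step(w_hat, dt, nu=1e-2, sigma=1e-2, rng=None):
    """One explicit Euler-Maruyama step for a Galerkin-truncated 2D
    stochastic NSE in vorticity form: dw = (nu*Lap(w) - u.grad(w)) dt + dW.
    `w_hat`: (N, N) array of Fourier coefficients of the vorticity.
    Illustrative sketch only: no dealiasing, assumed diagonal Q-Wiener noise."""
    rng = np.random.default_rng() if rng is None else rng
    N = w_hat.shape[0]
    k = np.fft.fftfreq(N, d=1.0 / N)              # integer wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    k2_safe = np.where(k2 == 0, 1.0, k2)          # avoid division by zero at k = 0
    # velocity from the stream function: u = (d_y psi, -d_x psi), w = -Lap(psi)
    psi_hat = w_hat / k2_safe
    u = np.fft.ifft2(1j * ky * psi_hat).real
    v = np.fft.ifft2(-1j * kx * psi_hat).real
    # pseudo-spectral evaluation of the convection term u . grad(w)
    wx = np.fft.ifft2(1j * kx * w_hat).real
    wy = np.fft.ifft2(1j * ky * w_hat).real
    conv_hat = np.fft.fft2(u * wx + v * wy)
    # diagonal Q-Wiener increment: independent noise per Fourier mode,
    # variance decaying with |k| so the covariance is trace-class (assumed choice)
    noise = (sigma / k2_safe) * np.sqrt(dt) * (
        rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
    return w_hat + dt * (-nu * k2 * w_hat - conv_hat) + noise
```

The FFT evaluations give the cost per step mentioned above; the noise model here corresponds to a particular diagonal covariance choice and would be adapted to the operator used in (6).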
2.3 The Distribution of
We assume that the initial condition of is random and distributed according to the following Gaussian process prior:
with hyper-parameters affecting the roughness and magnitude of the initial vector field. This is a convenient but still flexible enough choice of a prior; see [11, Sections 2.3 and 4.1]
for more details on Gaussian distributions on Hilbert spaces. Notice that the initial condition admits the Karhunen-Loève expansion
with , , (so, necessarily , ) and
Since the covariance operator is determined via the Stokes operator , one can easily check that the choice implies that for , -a.s., thus the conditions for the existence of a weak solution of (6) in [18, Theorem 6.1] are satisfied a.s. in the initial condition. Notice that sampling from is straightforward.
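As a concrete illustration of how such a sample can be drawn, the following sketch generates a realisation from a Gaussian prior on the torus via a truncated Karhunen-Loève expansion: independent complex Gaussian Fourier modes with polynomially decaying variances, mapped back to physical space by an inverse FFT. The parameters `alpha` (roughness) and `tau` (magnitude) are our stand-ins for the paper's hyper-parameters:

```python
import numpy as np

def sample_initial_field(N=32, alpha=2.0, tau=1.0, rng=None):
    """Draw a sample from a Gaussian prior on the torus via a truncated
    Karhunen-Loeve expansion: independent complex-Gaussian Fourier modes
    with standard deviation tau * |k|^(-alpha); `alpha`, `tau` are
    illustrative hyper-parameters, not the paper's notation."""
    rng = np.random.default_rng() if rng is None else rng
    k = np.fft.fftfreq(N, d=1.0 / N)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    k2_safe = np.where(k2 == 0, 1.0, k2)
    std = tau * k2_safe ** (-alpha / 2.0)
    std[k2 == 0] = 0.0                       # drop the mean (zero) mode
    coeff = std * (rng.standard_normal((N, N)) +
                   1j * rng.standard_normal((N, N)))
    # taking the real part of the inverse FFT projects onto real-valued
    # fields (a standard shortcut in place of explicit conjugate symmetry)
    return np.fft.ifft2(coeff).real * N
```

Larger `alpha` gives faster coefficient decay and hence smoother initial fields, matching the role of the roughness hyper-parameter in the prior above.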
3 The Stochastic Filtering Problem
In Section 2 we defined the SPDE providing the unknown signal, i.e. the object upon which we are interested in performing Bayesian inference. In this section we present the non-linear filtering problem in detail. We begin by discussing the observations. We assume that the vector field is unknown, but generates a sequence of noisy observations at ordered discrete time instances, with for all , with , for . Each observation vector is further assumed to originate from the following observation equation
where is a bounded linear operator and is symmetric positive-definite. One can then write the observation likelihood at instance as
Using a linear observation model is restrictive, but it does include typical observation schemes used in practice. We focus our attention on the case where is a noisy measurement of the velocity field at distinct fixed stationary points , . This setting is often referred to as Eulerian data assimilation. In particular, we have that
with denoting a spatial average over a (typically small) region around , , say for some radius ; that is, is the following integral operator
with denoting the area of . In what follows, other integral operators could also be similarly used, such as , with being appropriate weighting functions that decay as grows.
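A minimal numerical version of this Eulerian observation operator can be sketched as follows: the gridded field is averaged over a small disk around each observation point (with periodic, i.e. torus, distances), and Gaussian noise is added. The function and argument names are illustrative:

```python
import numpy as np

def eulerian_observations(field, points, radius, noise_std=0.0, rng=None):
    """Observe a gridded field on the unit torus by averaging it over a
    small disk around each observation point, plus additive Gaussian noise.
    `field`: (N, N) array; `points`: list of (x, y) locations in [0, 1)^2.
    Illustrative version of the integral observation operator above."""
    rng = np.random.default_rng() if rng is None else rng
    N = field.shape[0]
    xs = np.arange(N) / N
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    obs = []
    for (px, py) in points:
        # periodic (torus) distance from each grid node to the observation point
        dx = np.minimum(np.abs(X - px), 1.0 - np.abs(X - px))
        dy = np.minimum(np.abs(Y - py), 1.0 - np.abs(Y - py))
        mask = dx**2 + dy**2 <= radius**2
        obs.append(field[mask].mean())       # spatial average over the disk
    obs = np.asarray(obs)
    return obs + noise_std * rng.standard_normal(obs.shape)
```

Replacing the hard disk mask with a smooth decaying weight function gives the alternative integral operators mentioned above.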
Earlier, in the introduction, the filtering problem was defined as the task of computing the conditional distribution . Due to the nature of the observations, it is clear that we are dealing with a discrete-time filtering problem. A particular challenge here (in common with other typical non-linear SPDEs) is that the distribution of the associated Markov transition kernel, , is intractable. Still, it is possible to simulate from the unconditional dynamics of given using standard time discretisation techniques. (The simulated path introduces a time discretisation bias, but its effect is ignored in this paper.)
We aim to infer the following posterior distribution, based on the continuous time signal
we also denote
This data augmentation approach, in which one applies importance sampling on the continuous-time path, has appeared in  for a related problem and in  for filtering problems involving certain multivariate SDEs. We proceed by writing the filtering recursion for . We denote the law of in (6) for the time interval between and as
Then, one may use Bayes rule to write recursively as
In addition, one can attempt to propose paths from an appropriate SPDE different from (6), say
where and is a -Wiener process on . We define
One needs to ensure that the change of drift is appropriately chosen so that a Girsanov theorem holds and is absolutely continuous with respect to for all relevant , with the recursion in (12) becoming
Here are assumed to be typical elements of the sample space of either of the two probability measures above (e.g. all such paths are assumed to possess relevant continuity properties at ).
In the context of particle filtering and IS, one aims to design in such a way that the proposed trajectories lie in locations where is higher. This in turn implies that the importance weights in (14) will exhibit much less variance than the ones from the prior signal dynamics; hence the design of the proposal is critical for generating effective Monte Carlo approximations.
4 Particle Filtering
We are interested in approximating the distribution using a particle filter approach. We present in Algorithm 1 a naive particle filter algorithm that provides the particle approximations:
Such a particle filter will typically be overwhelmed by the dimensionality of the problem and will not be able to provide accurate solutions at a moderate computational cost. When in (13), the algorithm corresponds to a standard bootstrap particle filter. For the latter, it is well known in the literature ([5, 41]) that it exhibits weight degeneracy in the presence of large dissimilarity between and , which in our context can be caused by the high dimensionality of the state space and the complexity of the SPDE dynamics. When is well designed, the particles can be guided into areas of larger importance weights and the algorithmic performance can be considerably improved, but this modification may still not be sufficient for obtaining a robust and efficient algorithm.
Initialise , .
For : sample independently
For : compute importance weights
For : resample
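The three steps of Algorithm 1 can be sketched generically as follows, with `propagate` standing in for a forward simulation of the (prior or guided) signal dynamics and `log_lik` for the Gaussian observation log-density; both are user-supplied placeholders, not the paper's specific operators:

```python
import numpy as np

def naive_filter_step(particles, y, propagate, log_lik, rng=None):
    """One assimilation step of the naive particle filter (Algorithm 1):
    propagate each particle under the signal dynamics, weight by the
    observation likelihood, and resample multinomially.
    `propagate(x, rng)` and `log_lik(y, x)` are user-supplied stand-ins."""
    rng = np.random.default_rng() if rng is None else rng
    particles = [propagate(x, rng) for x in particles]
    logw = np.array([log_lik(y, x) for x in particles])
    w = np.exp(logw - logw.max())            # stabilised exponentiation
    w /= w.sum()                             # self-normalised importance weights
    ess = 1.0 / np.sum(w**2)                 # effective sample size diagnostic
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return [particles[i] for i in idx], ess
```

The returned ESS makes the weight degeneracy discussed below directly observable: in high dimensions it tends to collapse towards one unless the proposal is well designed.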
In the remainder of this section, we will discuss how to improve upon this first attempt to tackle the high-dimensional filtering problem at hand using the following ingredients: (i) specifying a particular form of in (13) that results in gains of efficiency, (ii) using adaptive tempering, and (iii) MCMC moves. Guided proposals and tempering are employed to bridge the dissimilarity between and . The MCMC steps are required for injecting additional diversity in the particle population, which would otherwise diminish gradually due to successive resampling and tempering steps. The method is summarised in Algorithm 2. In the following subsections we explain in detail our implementation of (i)-(iii) mentioned above.
4.1 Likelihood-Informed Proposals
absolute continuity indeed holds and we have Radon-Nikodym derivative
see [11, Theorem 10.14] and [11, Lemma 10.15] for details. It remains to provide an effective design for . One can use proposals developed for problems whereby a finite-dimensional SDE generates linear Gaussian observations and one is interested in performing a similar IS method; see e.g. [22, 48, 37, 38, 44]. In this paper we use the proposal employed in  and set
where denotes the adjoint of . The guiding function can be interpreted as a one-step Euler approximation of the -transform needed to evolve conditional on the acquired observation within the interval . It is not hard to verify (15) for this choice of . Since , are invertible, exists via the Sherman-Morrison-Woodbury identity and is a bounded linear operator. Then (15) holds from [11, Proposition 10.18] and [30, Proposition 2.4.9], which imply that there exists a such that
which implies (15).
For the finite-dimensional SDE case, more elaborate guiding functions can be found in [48, 44], and some of these could potentially be extended so that they can be used in the SPDE setting instead of (16). The advantage of using (16) is that it provides a simple functional and can perform well for problems where is of moderate length, as also confirmed in the numerical examples of Section 5.
4.2 Bridging and With Adaptive Tempering
Guided proposals aim to bridge the dissimilarity between and by considering a Bayesian update from to . In a high-dimensional setting, even using well-designed likelihood-informed proposals is not sufficient to bridge the dissimilarity between the informed proposal and the target . As a result, the importance weights could still degenerate. To avoid this, more effort is required. One possibility is to allow for a progressive update via a sequence of intermediate artificial distributions between and , which we will denote as , with , and require that and . This is a well-known strategy to improve particle filters; see [oudjane2000progressive, 21] for some early works in this direction for low-dimensional problems.
To construct the sequence we will use a standard tempering scheme . Each can be defined using
for inverse temperatures . Note that each is defined on the same state space as and there are no natural stochastic dynamics connecting each . As a result, we will follow the framework of [14, 6] and use artificial dynamics provided by an MCMC transition kernel that is invariant with respect to . The details are provided in the next section. Using these MCMC proposals results in the weights at iteration being , which depends on and the proposed from the MCMC kernel.
The main issue that needs addressing for this scheme to be successful is how to determine the temperatures and their number . We propose to set these on-the-fly using an adaptive procedure introduced in . Assume we are at the -th step of the algorithm, have completed tempering steps, and we have equally weighted particles. The next temperature is determined by expressing the weights as a function of
and determining via a requirement based on a quality criterion for the particle population. Here we use the Effective Sample Size (ESS) and set
(under the convention that ) with a user-specified fraction . Equation (19) can be easily solved numerically using for instance a standard bisection method. This approach leads to a particle approximation for , say
we then propose to resample from so that one ends up with equally weighted particles.
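The adaptive rule (19) can be implemented by a simple bisection on the inverse temperature, as the following sketch shows for particles carrying log-likelihood values `loglik` (all names are illustrative):

```python
import numpy as np

def next_temperature(loglik, phi_prev, target_frac=0.5, tol=1e-6):
    """Find the next inverse temperature phi in (phi_prev, 1] such that the
    ESS of weights w_i proportional to exp((phi - phi_prev) * loglik_i)
    equals target_frac * N, by bisection; returns 1.0 if a full step to the
    next target already keeps the ESS above the threshold (sketch of (19))."""
    loglik = np.asarray(loglik, dtype=float)
    N = len(loglik)

    def ess(phi):
        logw = (phi - phi_prev) * loglik
        w = np.exp(logw - logw.max())        # stabilised exponentiation
        w /= w.sum()
        return 1.0 / np.sum(w**2)

    if ess(1.0) >= target_frac * N:
        return 1.0                           # no intermediate step needed
    lo, hi = phi_prev, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ess(mid) >= target_frac * N:
            lo = mid                         # ESS still acceptable: step further
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Iterating this rule until it returns 1.0, resampling and applying MCMC moves at each intermediate temperature, gives the adaptive schedule of step 3 in Algorithm 2.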
The adaptive tempering procedure is presented in step 3 of Algorithm 2. In steps 3(a)-3(c), (18)-(19) are followed by resampling and MCMC steps, and these steps are iterated until . The MCMC dynamics are denoted by and will be discussed below. For every , the output of step 4 of Algorithm 2 provides a particle approximation targeting . An interesting feature of this algorithm is that, when moving from to , it does not require a user-specified intermediate sequence of target distributions ; these are instead set adaptively according to the locations of the particles and (19). The number of steps required, , will be determined according to the difficulty of assimilating .
In Algorithm 2, for simplicity, we always resample once . This can be avoided, but then in the next time-step of the algorithm one should use
4.3 Adding Particle Diversity With MCMC kernels
Successive resampling due to the tempering steps leads to sample impoverishment unless the method re-injects sampling diversity. To achieve this, we propose using a small number of iterations of an MCMC procedure that leaves invariant. This is not the only possible choice, but it does lead to the simple weight expression seen above; see  for extensions and more details. We use a particular MCMC design, similar to , that is well defined on function spaces (based on theory for MCMC on general state spaces). The design is often referred to as preconditioned Crank-Nicolson, abbreviated here to pCN; see [42, 9] for a detailed review.
We begin with a basic description of the pCN scheme for a given target distribution ; for simplicity we will drop the subscripts here. We will denote the one step Markov probability kernel obtained from the MCMC procedure as
with denoting the proposal kernel and the acceptance probability in a standard Metropolis-Hastings framework. Let be a probability measure that is absolutely continuous with respect to with Radon-Nikodym derivative
provides a kernel which is -invariant (by [43, Theorem 2]).
Next we discuss implementing the pCN design for our problem. At iteration the target distribution for the MCMC kernels is , so let , and denote the corresponding MCMC kernel, proposal and acceptance ratio respectively. Note that the state space of is the space of paths , which grows with each observation time . We stress that for the purpose of particle filtering we are mainly interested in the invariance property of (with respect to ) and not necessarily its ergodic properties on the full space. With this in mind, can be a Markov kernel that generates proposals with for . This allows for on-line computation at each . At the same time, reversibility holds, as Proposition 1 and Theorem 2 in  still hold for such proposals. From a practical perspective, we are adding noise to the path of the hidden signal only within .
Then, we need to specify and . Recall that for a fixed the state space of each is the same for different , so need not vary with . One possibility is to let and suppose , with being the driving noise that generated . Note that we can assume that without loss of generality, since the -path uses the increments of . Suppose also that both and are stored in the computer's memory, so that
To simulate from a -preserving proposal one first generates a new noise sample
where is the noise driving and is the Q-Wiener measure. To return to the original space, we use the new noise to solve for in (13). A standard calculation shows that , which in turn implies that holds for the part of the proposal in . Reversibility with respect to is ensured using a simple conditioning and marginalisation argument.
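The essence of this move on the driving noise can be sketched in a few lines: a pCN proposal on the noise increments, which preserves the Gaussian reference measure, followed by a Metropolis-Hastings accept/reject that therefore only involves the change in the log potential. The functions below are a simplified finite-dimensional illustration, with a standard Gaussian standing in for the Q-Wiener increments:

```python
import numpy as np

def pcn_noise_proposal(xi, rho, rng=None):
    """Preconditioned Crank-Nicolson proposal on the driving noise:
    xi' = rho * xi + sqrt(1 - rho^2) * eta, with eta a fresh draw from the
    same Gaussian reference (here a stand-in for a Q-Wiener increment).
    `rho` in (0, 1) controls the step size; rho -> 1 gives small moves."""
    rng = np.random.default_rng() if rng is None else rng
    eta = rng.standard_normal(xi.shape)
    return rho * xi + np.sqrt(1.0 - rho**2) * eta

def pcn_accept(log_pot_current, log_pot_proposed, rng=None):
    """Metropolis-Hastings accept/reject for the pCN move: because the
    proposal preserves the Gaussian reference measure, the acceptance
    ratio reduces to the difference of the log potentials (e.g. the
    tempered log-likelihood terms)."""
    rng = np.random.default_rng() if rng is None else rng
    return np.log(rng.uniform()) < (log_pot_proposed - log_pot_current)
```

In the filtering context, the accepted noise would then be pushed through the SPDE solver in (13) to recover the proposed signal path, as described above.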
In Algorithm 2 we use iterations of (20) with specified as above. The corresponding m-iterate of the MCMC transition kernel is denoted as and is presented in algorithmic form in Algorithm 3. To simplify exposition, in Algorithm 2 for each iteration the simulated tempered path for particle is denoted as , and the MCMC mutation is presented jointly with resampling in step 3(c)ii.
At . For , sample i.i.d. , and set .
First, similarly to , one can extend the proposals by reducing the lower bound on the time at which we start adding noise (here ). This could be made smaller, which can be beneficial in terms of adding diversity, but for the sake of simplicity we do not pursue this further.
It is important to note that is based on adapting a very basic version of pCN-MCMC, as outlined in [42, 9, 23]. There, typically is chosen to be a Gaussian measure that coincides with a pre-chosen prior for a static Bayesian inference problem. The resulting MCMC kernel often exhibits slow mixing properties. This can be addressed by allowing a few selected coordinates to be proposed from a kernel invariant to a Gaussian approximation of the posterior distribution. The remaining coordinates are sampled as before (using kernels invariant to the prior), so that the scheme is valid for arbitrary dimensions. This results in more advanced pCN-MCMC algorithms with likelihood-informed proposals, such as the ones described in [10, 32]. In the context of SMC, one has the added benefit of being able to use particle approximations of the mean and covariance to construct likelihood-informed proposals, and this results in a simple and effective approach, as illustrated in [29, beskos2017multilevel].
A natural question to pose is how these ideas can be extended to construct a more efficient . Note that the filtering problem is more complicated, as the variables of interest are SPDE paths. Still, more advanced proposals can be implemented after a change of measure. For the MCMC above we chose . This choice was made because of its simplicity of implementation and its effectiveness in the numerical examples we considered, where the MCMC kernel in Algorithm 3 mixed well. When facing harder problems, one can extend the construction of and use, instead of , any measure that admits a Radon-Nikodym derivative with respect to it. For example, one could use instead of (21) a proposal like
with , where the Girsanov transformation between can be established rigorously as in [chang1996large, Propositions 4.1 and 4.2]. This construction is an alternative to Algorithm 3 that is more amenable to extensions along the lines of [29, beskos2017multilevel], as the reference measure is Gaussian. To follow [29, beskos2017multilevel], one should use a Gaussian measure whose covariance operator takes into account the likelihood for low frequencies. This means one should use in (22) a Gaussian measure different from : identical to for high frequencies, while for low frequencies the diffusion constants are computed from particle approximations of the posterior mean and covariance (given ) of a sequence obtained just before the MCMC mutation.