Unbiased Filtering of a Class of Partially Observed Diffusions

In this article we consider a Monte Carlo-based method to filter partially observed diffusions observed at regular and discrete times. Given access only to Euler discretizations of the diffusion process, we present a new procedure which can return online estimates of the filtering distribution with no discretization bias and finite variance. Our approach is based upon a novel double application of the randomization methods of Rhee Glynn (2015) along with the multilevel particle filter (MLPF) approach of Jasra et al (2017). A numerical comparison of our new approach with the MLPF, on a single processor, shows that similar errors are possible for a mild increase in computational cost. However, the new method scales strongly to arbitrarily many processors.

Authors

• 27 publications
• 3 publications
• 1 publication
• A Wasserstein Coupled Particle Filter for Multilevel Estimation

In this paper, we consider the filtering problem for partially observed ...
04/08/2020 ∙ by Marco Ballesio, et al. ∙ 0

• Multilevel Monte Carlo for Smoothing via Transport Methods

11/08/2017 ∙ by Jeremie Houssineau, et al. ∙ 0

• On Unbiased Score Estimation for Partially Observed Diffusions

We consider the problem of statistical inference for a class of partiall...
05/11/2021 ∙ by Jeremy Heng, et al. ∙ 0

• Unbiased Estimation of the Solution to Zakai's Equation

In the following article we consider the non-linear filtering problem in...
02/04/2020 ∙ by Hamza M. Ruzayqat, et al. ∙ 0

• Central Limit Theorems for Coupled Particle Filters

In this article we prove a new central limit theorem (CLT) for coupled p...
10/11/2018 ∙ by Ajay Jasra, et al. ∙ 0

• Unbiased Estimation of the Hessian for Partially Observed Diffusions

In this article we consider the development of unbiased estimators of th...
09/06/2021 ∙ by Neil K. Chada, et al. ∙ 0

• Online Smoothing for Diffusion Processes Observed with Noise

We introduce a methodology for online estimation of smoothing expectatio...
03/27/2020 ∙ by Shouto Yonekura, et al. ∙ 0

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

We consider the problem of estimating a hidden diffusion process, given access to only discrete time observations. It is assumed that the observation process is independent of all other random variables when conditioned on the hidden process at a given time. Such a model is often termed a hidden Markov or state-space model in the literature (e.g.

[4]) and has many real applications in engineering, finance and economics.

We are particularly concerned in the filtering problem: estimating the diffusion process online, that is, recursively in time as data arrive. Particle filters (PFs) are numerical methods which can provide exact approximations (consistent in the Monte Carlo sample size) of filtering distributions associated to state space models, with a fixed computational cost per observation time (see e.g. [5] and the references therein). The method can sometimes provide errors that are uniform in time (e.g. [5]

) and is most effective when the hidden state is in moderate dimension (1-15). In the context of diffusions, the problem is even more challenging than usual, because the transition density of the diffusion process is seldom available up-to a non-negative and unbiased estimator, which precludes the use of exact simulation methods such as in

[7]. As a result, it is common to adopt a time-discretization of the diffusion process, for instance using the Euler method, and perform inference using a PF with this biased model. This approach can be further enhanced by using a PF version of the popular multilevel Monte Carlo (MLMC) method of [8, 11], called the multilevel particle filter (MLPF); see e.g. [1, 13, 14]. The basic notion of this methodology is to introduce a hierarchy of time-discretized filters and a collapsing sum representation of the most precise time-discretization, and then to approximate the representation using independent coupled particle filters (CPFs). Using this approach, the cost to achieve a given mean square error (MSE) can be reduced quite significantly relative to a single level strategy, under appropriate assumptions. However, we note that this method will still produce estimates with a bias from the most precise time-discretization.

The objective of the present article is to develop a technique that can remove the time-discretization bias from filtering, even when we cannot sample from the exact unobserved diffusion and we do not have access to an estimate of the transition density which is non-negative and unbiased (as in [7]). The approach we follow is to consider randomization schemes as for instance found in [15, 16, 17]. In the context of estimating a class of expectations w.r.t. laws of diffusion processes, [16] show how to obtain unbiased estimates with only access to time-discretized approximations of the diffusion process. In terms of cost to obtain a given MSE (variance), it can perform better or worse than the MLMC method, depending on the context; the improvement depends upon both the time discretization and some underlying properties of the diffusion of interest. This approach cannot be routinely extended to the filtering of diffusion processes because one cannot in general obtain independent and online (fixed computational cost per observation time step) samples even from the discrete-time approximations. Here we develop a novel double randomization scheme, based upon that in [16], which uses the MLPF methodology above to yield unbiased and finite variance estimators of the filter. Moreover, our method is intrinsically parallelizable and has a computational cost that is comparable to the MLPF. This latter point is investigated both mathematically and empirically with numerical simulations. To the best of our knowledge, this is one of the first methods to achieve unbiasedness, when exact simulations of diffusions may not be possible. Moreover, the assumptions on the diffusion which are actually required to implement the procedure are minimal.

This article is structured as follows. In Section 2 we present our approach along with the details of the problem of interest. Alternative approaches are also discussed. In Section 3 we demonstrate that our estimate is both unbiased and of finite variance. In Section 4 our numerical results are presented. The proofs of our results in Sections 2 and 3 are given in Appendix A.

2 Approach

2.1 Notations

Let be a measurable space. For we write , and as the collection of bounded measurable, continuous, twice-differentiable and Lipschitz (finite Lipschitz constants) functions respectively. If then the metric for is (i.e.  and is the norm). For , we write the supremum norm .

denotes the collection of probability measures on

. For a finite measure on and a , the notation is used. For a measurable space and

a non-negative finite measure on this space, we use the tensor-product of function notations for

, , where the notation is used for . Let , be a non-negative kernel and be a measure then we use the notations and for , For the indicator is written . are the non-negative integers.

denotes a one dimensional Gaussian distribution of mean

and variance . denotes a Laplace distribution with location and scale .

2.2 General Problem

We begin by presenting the idea at a very high level, which will help to motivate the methodology to be described in the context of partially observed diffusions. Let be a probability measure of interest. We assume that we only are able to work with biased versions of , for example arising from discretization with time-step . In particular, consider , such that for any :

 liml→∞ηl(φ)=η(φ). (1)

Our objective is to introduce a Monte Carlo method that can deliver unbiased and finite variance estimates of , which in principle could be achieved in the following manner, using randomization approaches (e.g. [16], see also [17, Theorem 3]). Suppose that one can produce a sequence of independent random variables such that

 E[Ξl]=ηl(φ)−ηl−1(φ). (2)

where . Let be a positive probability mass function on . Sample from and consider the estimate

 ˆη(φ)1=ΞLPL(L). (3)

By [17, Theorem 3], is an unbiased and finite variance estimator of if

 ∑l∈Z+1PL(l)E[Ξ2l]<+∞. (4)

Notice the very important and very powerful fact that once we can obtain (3), we can construct independent and identically distributed (i.i.d.) samples in parallel by sampling independently from for . From these samples we are able to construct an unbiased estimator with mean squared error (assuming condition (4) holds) as follows

 ˆη(φ)=1MM∑i=1ΞLiPL(Li).

The main challenge in our application is to deliver an algorithm which can provide unbiased estimates of , and such that (2) and (4) are satisfied. We will construct such a method in our particular problem of interest.

2.3 Partially Observed Diffusions

 dZt = a(Zt)dt+b(Zt)dWt, (5)

with , (th element denoted ), (th element denoted ), and a dimensional Brownian motion. The following assumptions are made throughout the article. We set .

The coefficients , for . Also, and satisfy

• uniform ellipticity: is uniformly positive definite over ;

• globally Lipschitz: there is a such that for all , .

It is remarked that these assumptions are made for mathematical convenience. In general, all one really requires from a numerical perspective, is the existence of the solution of the stochastic differential equation. However, one must note that the numerical performance will be affected by the properties of the diffusion process.

The data are observed at regular unit time-intervals (i.e. in discrete time) , where . It is assumed that conditional on , is independent of all other random variables with strictly positive density . Let

be the transition density (assumed to exist) of the diffusion process (over unit time) and consider a discrete-time Markov chain

with initial distribution and transition density . Here we are creating a discrete-time Markov chain that corresponds to the discrete-time skeleton of the diffusion process at a time lag of 1. Since we condition on a given realization of observations , we will henceforth write instead of , and it is assumed that for every . Then we define, for

 γn(B):=∫Xn+1IB(xn)(n−1∏p=0Gp(xp))M(x∗,x0)n∏p=1M(xp−1,xp)dx0:n.

The predictor is which corresponds to the distribution associated to . The filter is

 ¯ηn(B)=ηn(GnIB)ηn(Gn).

Consider an Euler discretization of the diffusion with discretization , and write the associated transition over unit time as . Then we define, for

 γln(B):=∫Xn+1IB(xn)(n−1∏p=0Gp(xp))Ml(x∗,x0)n∏p=1Ml(xp−1,xp)dx0:n.

The predictor is which corresponds to the Euler approximation of the distribution associated to . The filter is

 ¯ηln(B)=ηln(GnIB)ηln(Gn).

Let . Throughout the article, it is assumed that there exists a Markov kernel such that for any , :

 ˇMl(B×X)(x,x′)=Ml(B)(x)ˇMl(X×B)(x,x′)=Ml−1(B)(x′).

This can be done by the coupling scheme used in [14] for example and is the one used in this article. We will use the notation as an exchangeable notation for .

We note the following (the proof is in Appendix A.1), which verifies that (1) will hold in our context.

Proposition 2.1.

For any ,

 liml→∞¯ηln(φ)=¯ηn(φ).

2.4 Strategy

We will now detail how one can obtain (as in Section 2.2) via unbiased estimates of and of , and fixed. This idea is based upon finding biased but consistent (in the Monte Carlo sample size) estimates of and . Our strategy is as follows. Let , be an increasing sequence of positive integers, with . Let (resp. ) be a Monte Carlo (type) estimate of (resp. ), of samples. By the consistency of the approximations, almost surely, we have

 limp→∞¯ηNp,0n(φ)=¯η0n(φ),andlimp→∞{[¯ηNp,ln−¯ηNp,l−1n](φ)}=[¯ηln−¯ηl−1n](φ)).

We do not require that or that and we use the convention , . Below we explain how the estimates of and can be used to obtain unbiased estimates of and of , hence the random variables (as in Section 2.2). Suppose that one has a positive probability mass function on . Define

 Ξl,p:=⎧⎪⎨⎪⎩1PP(p)[¯ηNp,0n−¯ηNp−1,0n](φ)if l=01PP(p)([¯ηNp,ln−¯ηNp,l−1n](φ)−[¯ηNp−1,ln−¯ηNp−1,l−1n](φ))otherwise.

Now set , where is sampled according to . Again using [17, Theorem 3], we have that

 E[Ξl]={¯η0n(φ)if l=0¯ηln(φ)−¯ηl−1n(φ)otherwise.

For each , has finite variance provided one has

 E[Ξ2l]=∑p≥0PP(p)E[Ξ2l,p]<+∞. (6)

The main objective is now to consider how one can compute and . We will comment on the various aspects of the approach in Section 2.5.

2.4.1 Approximating ¯η0n(φ)

We will now build a procedure, based upon particle filters, to approximate which will use samples. First we explain the PF with a fixed number of particles . The objective of PFs is to recursively in approximate . Let , and and define the probability measure:

 Φ0n(μ)(B)=μ(Gn−1M0(B))μ(Gn−1).

Note that for any , . The PF at time has law:

 PN(d(x1:N,00,…,x1:N,0n))=(N∏i=1η00(dxi,00))(n∏k=1N∏i=1Φ0k(ηN,0k−1)(dxi,0k))

where

 ηN,0k−1(dx)=1NN∑i=1δxi,0k−1(dx).

The PF is summarized in Algorithm 1.

To form our approximation of with samples, we run the PF as described above with samples. To form the approximation with samples, we run a PF independently of the first PF with samples and so on, for any (i.e. with samples). For a given , the joint law of the simulated samples is

 Pc(d(x1:Np,00,…,x1:Np,0n))=p∏q=0PNq−Nq−1(d(xNq−1+1:Nq,00,…,xNq−1+1:Nq,0n))

where . Expectations w.r.t.  will be written . Define now the notation

 ηN0:p,0n(φ) := p∑q=0(Nq−Nq−1Np)ηNq−Nq−1,0n(φ), ηNq−Nq−1,0n(φ) := 1Nq−Nq−1Nq∑i=Nq−1+1φ(xi,0n).

Here are generated from the first PF, independently from the second and so on. The procedure for sampling, in order to compute (7) is summarized in Algorithm 2. The approximation of is finally

 ¯ηNp,0n(φ)=ηN0:p0n(Gnφ)ηN0:p,0n(Gn). (7)

We remark that the strategy of running a CPF with samples and then running an additional one with would not suffice and lead to biased estimator, hence the strategy adopted,

2.4.2 Approximating [¯ηln−¯ηl−1n](φ)

Let be given. We will review a method to approximate using coupled PFs (CPFs). The objective of CPFs is to recursively approximate a coupling of . We describe the method in [14].

The following exposition is from [13, 14]. Let , and and define the probability measure:

 ˇΦln(μ)(B) = (μ⊗μ)({¯¯¯¯Fn−1,μ,l⊗¯¯¯¯Fn−1,μ,l−1}¯Ml(B))

where for

 ¯¯¯¯Fn−1,μ,l(x,x′) = Fn−1,μ,l(x,x′)−{Fn−1,μ,l(x,x′)∧Fn−1,μ,l−1(x,x′)}μ(Fn−1,μ,l−{Fn−1,μ,l∧Fn−1,μ,l−1}) ¯¯¯¯Fn−1,μ,l−1(x,x′) = Fn−1,μ,l−1(x,x′)−{Fn−1,μ,l(x,x′)∧Fn−1,μ,l−1(x,x′)}μ(Fn−1,μ,l−1−{Fn−1,μ,l∧Fn−1,μ,l−1}) Fn−1,μ,l(x,x′) = ˇGn−1,μ,l(x)⊗1 Fn−1,μ,l−1(x,x′) = 1⊗ˇGn−1,μ,l−1(x′) ˇGn−1,μ,l(x) = Gn−1(x)μ(Gn−1⊗1) ˇGn−1,μ,l−1(x′) = Gn−1(x′)μ(1⊗Gn−1)

and for and

 ¯Ml(B)((x,x′),(z,z′))=ˇMl(B)(x,z′).

Now we define, recursively, for any ,

 ˇηln(B)=ˇΦln(ˇηln−1)(B).

Note that by [14, Proposition A.1] we have for

 ˇηln(B×X)=ηln(B)andˇηln(X×B)=ηl−1n(B).

The CPF at time has the following law, with :

 ˇPN(d(u1:N0,…,u1:Nn))=(N∏i=1ˇηl0(dui0))(n∏k=1N∏i=1ˇΦlk(ˇηN,lk−1)(duik))

where for ,

 ˇηN,lk−1(du)=1NN∑i=1δuik(du).

and we set for

 ηN,sp−1(dx)=1NN∑i=1δxi,sp−1(dx).

To estimate , which will be critical in our forthcoming exposition, we have the estimate

 ηN,ln(Gnφ)ηN,ln(Gn)−ηN,l−1n(Gnφ)ηN,l−1n(Gn).

Note that this estimate is biased, but consistent, i.e. it converges in the limit as but has a bias for any finite . The CPF is summarized in Algorithm 3.

To form our approximation of of samples, we run the CPF as described above with samples. To form the approximation with samples, we run a CPF independently of the first CPF with samples and so on, for any (i.e. with samples). For a given , the joint law of the simulated samples is

 ˇPc(d(u1:Np0,…,u1:Npn))=p∏q=0ˇPNq−Nq−1(d(uNq−1+1:Nq0,…,uNq−1+1:Nqn))

and we will denote expectations w.r.t.  as . For and any we define

 ηN0:p,sn(φ) := p∑q=0(Nq−Nq−1Np)ηNq−Nq−1,sn(φ), ηNq−Nq−1,sn(φ) := 1Nq−Nq−1Nq∑i=Nq−1+1φ(xi,sn).

Finally the approximation of with samples is then

 [¯ηNp,ln−¯ηNp,l−1n](φ)=ηN0:p,ln(Gnφ)ηN0:p,ln(Gn)−ηN0:p,l−1n(Gnφ)ηN0:p,l−1n(Gn). (8)

The procedure for sampling, in order to compute (8) is summarized in Algorithm 4.

2.5 Algorithm

Our procedure for computing unbiased estimates, based on the components developed in the previous sections, is summarized in Algorithm 5.

A few remarks can help to clarify the algorithm.

• The terms in the differences in are not independent; they will share common samples that have been produced by the PF/CPF.

• Each sample in the estimate (9) can be computed in parallel; i.e. this is amenable to parallel computation.

• One can correlate them and . There is no reason why they need to be independent random variables.

• The algorithm is online, i.e. the computational cost per observation time is fixed. For each sample in the estimate (9), one can simply fix the and sampled at time and update the estimates of the filter as time progresses, by using the sequential nature of the PF/CPF algorithms.

• At this stage, we have still not established that the estimator is unbiased with finite variance, nor have we investigated the associated computational effort to compute the estimate; this is the topic of Section 3.

The scheme that has been proposed is a type of double randomization (or double ‘Rhee & Glynn’, following from the work [16]) where one randomizes twice; firstly with respect to the discretization level of the diffusion and secondly to obtain unbiased estimates of the increments. The first randomization seems necessary, given the current state-of-the-art of stochastic computation; we are assuming that unbiased simulation methods (e.g. [7] and the references therein) are not sensible in our problems of interest. For the second randomization, there are several alternatives which could be considered. The first is to replace the ’single-term’ estimator that we are currently using with the ‘coupled-sum’ estimator ([16]). Methodologically, this is not significantly different from what we have suggested, but the conditions for unbiasedness and finite variance change marginally. More precisely, one would use the term

 Ξli,pi=pi∑s=01∑∞q=sPP(q)([¯ηNs,lin−¯ηNs,li−1n](φ)−[¯ηNs−1,lin−¯ηNs−1,li−1n](φ))

in 3. of Algorithm 5, with a similar type expression in 2. of Algorithm 5. A second alternative would be to use an unbiased sampling scheme based upon Markov chain simulation (e.g. [10, 12]). Although these latter schemes would have to be modified and enhanced to be applicable in the context here, the main issue with applying them is that they are not intrinsically ‘online’. We note also that these schemes themselves are based upon randomization methods, and hence one would be using a double randomization again. We also remark that the approach to obtain , which is essentially a randomization on the number of samples, is similar to the approach in [3]. In [3] the authors also use an associated idea to unbiasedly estimate non-linear functions of expectations. The approach is related, except they rely on the using independent samples from a probability of interest; in this scenario one can use all the same samples in to construct both the fine and course approximations, whereas, this does not seem to be possible when the samples are not independent. This is related to the antithetic coupling described in [9].

One may attempt to construct an estimator using a single randomization. It is not clear how to construct an efficient method with this approach. In Algorithm 6 we present a potential single randomization strategy and the estimator is given in (10). The framework for this estimator is just a single-term estimator as discussed (for instance) in Section 2.2 and, as described there, one can establish that (10) is both unbiased and of finite variance if (4) is satisfied. In Section 3.2.1 we will explain why this estimator does not work well.