 # Multilevel Particle Filters for the Non-Linear Filtering Problem in Continuous Time

In this article we consider the numerical approximation of the non-linear filter in continuous time, where the observations and signal follow diffusion processes. Given access to high-frequency but discrete-time observations, we resort to a first-order time discretization of the non-linear filter, followed by an Euler discretization of the signal dynamics. In order to approximate the associated discretized non-linear filter, one can use a particle filter (PF). Under assumptions, this can achieve a mean square error of O(ϵ^2), for ϵ>0 arbitrary, at an associated cost of O(ϵ^-4). We prove, under assumptions, that the multilevel particle filter (MLPF) of Jasra et al. (2017) can achieve a mean square error of O(ϵ^2) at a cost of O(ϵ^-3). This is supported by numerical simulations in several examples.


## 1 Introduction

The non-linear filtering problem in continuous time is found in many applications in finance, economics and engineering. We consider the case where one seeks to filter an unobserved diffusion process (the signal) with access to an observation trajectory that is, in theory, continuous in time and follows a diffusion process itself. The non-linear filter is the solution to the Kallianpur-Striebel formula and typically has no analytical solution. This has led to a substantial literature on the numerical solution of the filtering problem; see for instance [1, 7].

In practice, one has access to very high-frequency observations, but not an entire trajectory, and this often means one has to time-discretize the functionals associated with the paths of the observation and signal. This latter task can be achieved by the approach used in this article, but improvements exist; see for instance [5, 6]. Even under such a time discretization, the filter is not available analytically for most problems of interest. From here one must often discretize the dynamics of the signal (e.g. by an Euler method), which in essence leads to a high-frequency discrete-time non-linear filter. This latter object can be approximated using particle filters in discrete time; this is the approach followed in this article. Alternatives exist, such as unbiased methods, and integration-by-parts and change-of-variables approaches combined with Feynman-Kac particle methods, but each of these schemes has its advantages and pitfalls versus the one followed in this paper.

Particle filters generate samples (or particles) in parallel and sequentially approximate non-linear filters using sampling and resampling. The algorithms are very well understood mathematically. Given the particle filter approximation of the discretized filter, using an Euler method for the signal, one can expect to obtain a mean square error (MSE), relative to the true filter, of $\mathcal{O}(\epsilon^2)$, for $\epsilon>0$ arbitrary, with an associated cost of $\mathcal{O}(\epsilon^{-4})$. This follows from standard results on discretizations and particle filters. In the related context of regular, discrete-time observations and dynamics, with the signal following a diffusion, it has been shown that when the MSE for a particle filter is $\mathcal{O}(\epsilon^2)$, the cost is $\mathcal{O}(\epsilon^{-3})$, and that one can improve particle filters using the multilevel Monte Carlo (MLMC) method [11, 12], as we now explain.

MLMC is an approach which can help to approximate expectations w.r.t. probability measures that are induced by discretizations, such as an Euler method. The idea is to create a collapsing-sum representation of an expectation w.r.t. an accurate discretization and interpolate with differences of expectations of increasingly coarse (in terms of the discretization) probability measures. Then, if one can sample from appropriate couplings of the pairs of probability measures in the differences of the expectations, one can reduce the computational effort needed to achieve a given MSE. In the case of the multilevel particle filter, one can achieve an MSE of $\mathcal{O}(\epsilon^2)$ at a reduced cost, for a class of processes.

In this paper we apply the methodology which combines particle filters with the MLMC methodology (termed the multilevel particle filter) to the non-linear filtering problem in continuous time. The main issue is that, in order to mathematically understand the application of this methodology in this new context, several new results are required. The main difference to the discrete-time case, other than the processes involved, is the fact that one averages over the data in the analysis of filters in continuous time. This requires one to analyze the properties of several time-discretized Feynman-Kac semigroups, in order to verify the mathematical improvements of the approach. Under assumptions, we prove that to achieve an MSE of $\mathcal{O}(\epsilon^2)$ one requires a cost of $\mathcal{O}(\epsilon^{-3})$. This is verified in several numerical examples. We remark that the mathematical results are of interest beyond the context of this article, for instance, in unbiased estimation.

This article is structured as follows. In Section 2 we formalize the problem of interest. Our approach is detailed in Section 3. The theoretical results are presented in Section 4. In Section 5 our numerical results are given. The proofs of our theoretical results are housed in the appendix.

## 2 Problem

### 2.1 Notations

Let $(\mathsf{X},\mathcal{X})$ be a measurable space. For $\varphi:\mathsf{X}\rightarrow\mathbb{R}$ we write $\mathcal{B}_b(\mathsf{X})$ as the collection of bounded measurable functions. Let $\varphi:\mathbb{R}^d\rightarrow\mathbb{R}$; $\textrm{Lip}(\mathbb{R}^d)$ denotes the collection of real-valued functions that are Lipschitz w.r.t. $\|\cdot\|_2$ ($\|\cdot\|_2$ denotes the $L_2$-norm of a vector $x\in\mathbb{R}^d$). That is, $\varphi\in\textrm{Lip}(\mathbb{R}^d)$ if there exists a $C<+\infty$ such that for any $(x,y)\in\mathbb{R}^{2d}$

$$|\varphi(x)-\varphi(y)|\leq C\|x-y\|_2.$$

We write $\|\varphi\|_{\textrm{Lip}}$ as the Lipschitz constant of a function $\varphi\in\textrm{Lip}(\mathbb{R}^d)$. For $\varphi\in\mathcal{B}_b(\mathsf{X})$, we write the supremum norm $\|\varphi\|=\sup_{x\in\mathsf{X}}|\varphi(x)|$. $\mathcal{P}(\mathsf{X})$ denotes the collection of probability measures on $(\mathsf{X},\mathcal{X})$. For a measure $\mu$ on $(\mathsf{X},\mathcal{X})$ and a $\varphi\in\mathcal{B}_b(\mathsf{X})$, the notation $\mu(\varphi)=\int_{\mathsf{X}}\varphi(x)\mu(dx)$ is used. $\mathcal{B}(\mathbb{R}^d)$ denotes the Borel sets on $\mathbb{R}^d$. $dx$ is used to denote the Lebesgue measure. For a measurable space $(\mathsf{X},\mathcal{X})$ and a non-negative measure $\mu$ on this space, we use the tensor-product of functions notation: for $(\varphi,\psi)\in\mathcal{B}_b(\mathsf{X})^2$, $\mu(\varphi\otimes\psi)=\int_{\mathsf{X}\times\mathsf{X}}\varphi(x)\psi(y)\mu(d(x,y))$. Let $K$ be a non-negative operator and $\mu$ be a measure; then we use the notations $\mu K(dy)=\int_{\mathsf{X}}\mu(dx)K(x,dy)$ and, for $\varphi\in\mathcal{B}_b(\mathsf{X})$, $K(\varphi)(x)=\int_{\mathsf{X}}\varphi(y)K(x,dy)$. For $A\in\mathcal{X}$ the indicator is written $\mathbb{I}_A(x)$. $\mathcal{N}_d(\mu,\Sigma)$ (resp. $\psi_d(x;\mu,\Sigma)$) denotes a $d$-dimensional Gaussian distribution (density evaluated at $x\in\mathbb{R}^d$) of mean $\mu$ and covariance $\Sigma$. If $d=1$ we omit the subscript $d$. For a vector/matrix $X$, $X^*$ is used to denote the transpose of $X$. For $x\in\mathsf{X}$, $\delta_x(dy)$ denotes the Dirac measure of $x$. For a vector-valued function in $n$ dimensions (resp. $n$-dimensional vector), $\varphi$ (resp. $x$) say, we write the $i^{th}$ component ($i\in\{1,\dots,n\}$) as $\varphi^{(i)}$ (resp. $x^{(i)}$). For a matrix $A$ we write the $(i,j)^{th}$ entry as $A^{(ij)}$. For $\mu\in\mathcal{P}(\mathsf{X})$ and a random variable $X$ on $\mathsf{X}$ with distribution associated to $\mu$, we use the notation $X\sim\mu(\cdot)$. For a finite set $A$, we write $\textrm{Card}(A)$ as the cardinality of $A$.

### 2.2 Model

Let $(\Omega,\mathcal{F})$ be a measurable space. On $(\Omega,\mathcal{F})$ consider the probability measure $\mathbb{P}$ and a pair of stochastic processes $\{Y_t\}_{t\geq 0}$, $\{X_t\}_{t\geq 0}$, with $Y_t\in\mathbb{R}^{d_y}$, $X_t\in\mathbb{R}^{d_x}$, $Y_0=0$, and with $X_0=x_*\in\mathbb{R}^{d_x}$ given:

$$dY_t = h(X_t)dt + dB_t \qquad (1)$$

$$dX_t = b(X_t)dt + \sigma(X_t)dW_t \qquad (2)$$

where $h:\mathbb{R}^{d_x}\rightarrow\mathbb{R}^{d_y}$, $b:\mathbb{R}^{d_x}\rightarrow\mathbb{R}^{d_x}$, $\sigma:\mathbb{R}^{d_x}\rightarrow\mathbb{R}^{d_x\times d_x}$, with $\sigma$ non-constant and of full rank, and $\{B_t\}_{t\geq 0}$, $\{W_t\}_{t\geq 0}$ are independent standard Brownian motions of dimension $d_y$ and $d_x$ respectively. To minimize certain technical difficulties, the following assumption is made throughout the paper:

• We have:

1. $h$ is bounded with $h^{(i)}\in\textrm{Lip}(\mathbb{R}^{d_x})$, $i\in\{1,\dots,d_y\}$, and $a:=\sigma\sigma^*$ is uniformly elliptic.

2. $b^{(i)}$, $\sigma^{(ij)}$ are bounded and $b^{(i)},\sigma^{(ij)}\in\textrm{Lip}(\mathbb{R}^{d_x})$, $i,j\in\{1,\dots,d_x\}$.
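As an illustration, the model (1)-(2) can be simulated on a fine time grid by an Euler-Maruyama scheme. The following scalar sketch uses hypothetical coefficients chosen only to satisfy the boundedness, Lipschitz and ellipticity conditions above ($b(x)=-\tanh(x)$, $\sigma(x)=1+\cos(x)/2$, $h(x)=\sin(x)$ are our choices, not from the paper):

```python
import numpy as np

def simulate(T=1.0, dt=1e-3, x0=0.0, seed=1):
    """Euler-Maruyama simulation of the signal (2) and observation (1)
    in the scalar case d_x = d_y = 1, with illustrative bounded
    coefficients: b(x) = -tanh(x), sigma(x) = 1 + 0.5*cos(x) (>= 0.5,
    hence uniformly elliptic) and h(x) = sin(x)."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    X = np.empty(n + 1)
    Y = np.empty(n + 1)
    X[0], Y[0] = x0, 0.0
    for k in range(n):
        dW = rng.normal(0.0, np.sqrt(dt))  # signal noise increment of W
        dB = rng.normal(0.0, np.sqrt(dt))  # observation noise increment of B
        X[k + 1] = X[k] - np.tanh(X[k]) * dt + (1.0 + 0.5 * np.cos(X[k])) * dW
        Y[k + 1] = Y[k] + np.sin(X[k]) * dt + dB
    return X, Y
```

The returned `Y` plays the role of the high-frequency observation path used in the discretizations below.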

Now, we introduce the probability measure $\overline{\mathbb{P}}$, which is equivalent to $\mathbb{P}$, defined by the Radon-Nikodym derivative

$$Z_T := \frac{d\mathbb{P}}{d\overline{\mathbb{P}}} = \exp\Big\{\int_0^T h(X_s)^*dY_s - \frac{1}{2}\int_0^T h(X_s)^*h(X_s)ds\Big\}$$

with, under $\overline{\mathbb{P}}$, $\{X_t\}_{t\geq 0}$ following the dynamics (2) and, independently, $\{Y_t\}_{t\geq 0}$ a standard Brownian motion. We have the solution to the Zakai equation, for $\varphi\in\mathcal{B}_b(\mathbb{R}^{d_x})$:

$$\gamma_t(\varphi) := \overline{\mathbb{E}}\Big[\varphi(X_t)\exp\Big\{\int_0^t h(X_s)^*dY_s - \frac{1}{2}\int_0^t h(X_s)^*h(X_s)ds\Big\}\Big|\mathscr{Y}_t\Big]$$

where $\mathscr{Y}_t$ is the filtration generated by the process $\{Y_s\}_{0\leq s\leq t}$. Our objective is to, recursively in time, estimate the filter, for $\varphi\in\mathcal{B}_b(\mathbb{R}^{d_x})$:

$$\eta_t(\varphi) := \frac{\gamma_t(\varphi)}{\gamma_t(1)}.$$

### 2.3 Discretized Model

In practice, we will have to work with a discretization of the model in (1)-(2), for several reasons:

1. One only has access to finite, but possibly very high-frequency, data.

2. $\eta_t(\varphi)$ is typically unavailable analytically.

3. There may not be a non-negative and unbiased estimator of the transition densities induced by the model (1)-(2).

We will assume access to a path of data which is observed at a high frequency, as mentioned above.

Let $l\geq 0$ be given and consider an Euler discretization of step-size $\Delta_l=2^{-l}$, for $k\in\{1,2,\dots,2^lT\}$, with $\widetilde{X}_0=x_*$:

$$\widetilde{X}_{k\Delta_l} = \widetilde{X}_{(k-1)\Delta_l} + b(\widetilde{X}_{(k-1)\Delta_l})\Delta_l + \sigma(\widetilde{X}_{(k-1)\Delta_l})[W_{k\Delta_l}-W_{(k-1)\Delta_l}]. \qquad (3)$$

It should be noted that the Brownian motion in (3) is the same as in (2), under both $\mathbb{P}$ and $\overline{\mathbb{P}}$. Then, for $k\in\{0,1,\dots,2^lT-1\}$, define:

$$G_k^l(x_{k\Delta_l}) := \exp\Big\{h(x_{k\Delta_l})^*(y_{(k+1)\Delta_l}-y_{k\Delta_l}) - \frac{\Delta_l}{2}h(x_{k\Delta_l})^*h(x_{k\Delta_l})\Big\}$$

and note that, for any $T\in\mathbb{N}$,

$$Z_T^l(x_0,x_{\Delta_l},\dots,x_{T-\Delta_l}) := \prod_{k=0}^{2^lT-1}G_k^l(x_{k\Delta_l}) = \exp\Big\{\sum_{k=0}^{2^lT-1}\Big[h(x_{k\Delta_l})^*(y_{(k+1)\Delta_l}-y_{k\Delta_l}) - \frac{\Delta_l}{2}h(x_{k\Delta_l})^*h(x_{k\Delta_l})\Big]\Big\}$$

is simply a discretization of $Z_T$. Then set, for $t\in\mathbb{N}$ and $\varphi\in\mathcal{B}_b(\mathbb{R}^{d_x})$:

$$\gamma_t^l(\varphi) := \overline{\mathbb{E}}\big[\varphi(\widetilde{X}_t)Z_t^l(\widetilde{X}_0,\widetilde{X}_{\Delta_l},\dots,\widetilde{X}_{t-\Delta_l})\big|\mathscr{Y}_t\big] \qquad \eta_t^l(\varphi) := \frac{\gamma_t^l(\varphi)}{\gamma_t^l(1)}.$$

For notational convenience we set $Z_0^l:=1$. For $p\in\mathbb{N}_0$ and $t\in(0,1)$ one can also set

$$\gamma_{p+t}^l(\varphi) := \overline{\mathbb{E}}\Big[\varphi(\widetilde{X}_{p+t})Z_p^l(\widetilde{X}_0,\widetilde{X}_{\Delta_l},\dots,\widetilde{X}_{p-\Delta_l})\Big(\prod_{k=0}^{t\Delta_l^{-1}-1}G_{p\Delta_l^{-1}+k}^l(\widetilde{X}_{p+k\Delta_l})\Big)\Big|\mathscr{Y}_{p+t}\Big] \qquad \eta_{p+t}^l(\varphi) := \frac{\gamma_{p+t}^l(\varphi)}{\gamma_{p+t}^l(1)}$$

where the product over an empty index set is defined to be equal to one.
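The per-increment weight $G_k^l$ above depends only on the current Euler state and the local observation increment, so its logarithm can be accumulated along a path. A minimal sketch in the scalar case, assuming the hypothetical choice $h(x)=\sin(x)$ from earlier:

```python
import numpy as np

def log_G(x, dy, delta_l, h=np.sin):
    """Log of the discretized Girsanov weight G_k^l in the scalar case:
    h(x) * (y_{(k+1)Delta} - y_{k Delta}) - (Delta/2) * h(x)^2.
    h = sin is an illustrative bounded, Lipschitz choice."""
    hx = h(x)
    return hx * dy - 0.5 * delta_l * hx * hx

def log_Z(path, y_increments, delta_l, h=np.sin):
    """Accumulate the log-weights along an Euler path: log of Z_T^l."""
    return sum(log_G(x, dy, delta_l, h) for x, dy in zip(path, y_increments))
```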

## 3 Approach

For notational convenience, throughout this section, we drop the $\widetilde{\cdot}$ notation for the Euler discretization (3).

### 3.1 Particle Filter

Let $l\in\mathbb{N}_0$ be given; we consider approximating $\{\eta_t^l\}_{t\geq 0}$ using a particle filter. For $p\in\mathbb{N}_0$ set

$$u_p^l := (x_p,x_{p+\Delta_l},\dots,x_{p+1})\in(\mathbb{R}^{d_x})^{\Delta_l^{-1}+1} =: E_l.$$

For $\varphi\in\mathcal{B}_b(\mathbb{R}^{d_x})$, we define, for any $(x_0,x_{\Delta_l},\dots,x_1)\in E_l$,

$$\varphi^l(x_0,x_{\Delta_l},\dots,x_1) := \varphi(x_1).$$

Set, with $u_p^l\in E_l$,

$$G_p^l(u_p^l) := \prod_{k=0}^{\Delta_l^{-1}-1}G_{p\Delta_l^{-1}+k}^l(x_{p+k\Delta_l}).$$

Denote by $M^l$ the joint Markov transition of $(x_{\Delta_l},\dots,x_1)$ defined via the Euler discretization (3) and a Dirac on a point $x\in\mathbb{R}^{d_x}$: for $\varphi\in\mathcal{B}_b(E_l)$,

$$M^l(\varphi)(x) := \int_{E_l}\varphi(x_0,x_{\Delta_l},\dots,x_1)\delta_x(dx_0)\Big[\prod_{k=1}^{\Delta_l^{-1}}\psi_{d_x}\big(x_{k\Delta_l};\,x_{(k-1)\Delta_l}+b(x_{(k-1)\Delta_l})\Delta_l,\,a(x_{(k-1)\Delta_l})\Delta_l\big)\Big]d(x_{\Delta_l},\dots,x_1).$$

For $p\in\mathbb{N}$, define the operator $\Phi_p^l:\mathcal{P}(E_l)\rightarrow\mathcal{P}(E_l)$, with $\mu\in\mathcal{P}(E_l)$, as:

$$\Phi_p^l(\mu)(\varphi) := \frac{\mu(G_{p-1}^lM^l(\varphi))}{\mu(G_{p-1}^l)} \qquad (4)$$

where, to clarify, $(G_{p-1}^lM^l(\varphi))(u) = G_{p-1}^l(u)M^l(\varphi)(x_1)$, with $x_1$ the final coordinate of $u\in E_l$. Now, define, for $p\in\mathbb{N}_0$,

$$\pi_p^l(\varphi) := \int_{\mathbb{R}^{d_x}}\eta_p^l(dx)\int_{E_l}M^l(x,du)\varphi(u).$$

Then one can establish that, for $p\in\mathbb{N}$,

$$\pi_p^l(\varphi) = \Phi_p^l(\pi_{p-1}^l)(\varphi).$$

Moreover, for $t\in\mathbb{N}$,

$$\eta_t^l(\varphi) = \frac{\pi_{t-1}^l(G_{t-1}^l\varphi^l)}{\pi_{t-1}^l(G_{t-1}^l)}. \qquad (5)$$

The objective of the PF is to provide an approximation of the formulae (4) and (5).

Let $N\geq 1$ be given; then the particle filter generates a system of random variables on $E_l^{N(n+1)}$ at a time $n$ according to the probability measure

$$\mathbb{Q}(d(u_0^{l,1:N},\dots,u_n^{l,1:N})) = \Big(\prod_{i=1}^NM^l(x_*,du_0^{l,i})\Big)\prod_{i=1}^N\prod_{p=1}^n\Phi_p^l(\pi_{p-1}^{l,N})(du_p^{l,i})$$

where, for $\varphi\in\mathcal{B}_b(E_l)$,

$$\pi_{p-1}^{l,N}(\varphi) := \frac{1}{N}\sum_{i=1}^N\varphi(u_{p-1}^{l,i}).$$

The particle filter is summarized in Algorithm 1. For $t\in\mathbb{N}$ one can approximate $\eta_t^l(\varphi)$ via

$$\eta_t^{l,N}(\varphi) := \frac{\pi_{t-1}^{l,N}(G_{t-1}^l\varphi^l)}{\pi_{t-1}^{l,N}(G_{t-1}^l)}. \qquad (6)$$

For $p\in\mathbb{N}_0$ one can also estimate the filter at a time $p+t$, $t\in(0,1)$, as

$$\eta_{p+t}^{l,N}(\varphi) := \frac{\sum_{i=1}^N\big(\prod_{k=0}^{t\Delta_l^{-1}-1}G_{p\Delta_l^{-1}+k}^l(x_{p+k\Delta_l}^{l,i})\big)\varphi(x_{p+t}^{l,i})}{\sum_{i=1}^N\prod_{k=0}^{t\Delta_l^{-1}-1}G_{p\Delta_l^{-1}+k}^l(x_{p+k\Delta_l}^{l,i})}.$$
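The sampling-weighting-resampling recursion just described can be sketched in a few lines. The following is a minimal scalar bootstrap-style implementation, under the hypothetical coefficients $b(x)=-\tanh(x)$, $\sigma(x)=1+\cos(x)/2$, $h(x)=\sin(x)$ used earlier; it is an illustration of the scheme, not the authors' exact implementation:

```python
import numpy as np

def particle_filter(y, l, T, N=500, x0=0.0, seed=2,
                    b=lambda x: -np.tanh(x),
                    sig=lambda x: 1.0 + 0.5 * np.cos(x),
                    h=np.sin):
    """Sketch of a PF approximating the recursion (4)-(5) at unit times.
    y: observation path on the level-l grid (length T * 2^l + 1).
    Returns the estimates eta_t^{l,N}(identity) for t = 1, ..., T."""
    rng = np.random.default_rng(seed)
    dl = 2.0 ** (-l)
    K = int(round(1.0 / dl))          # Euler steps per unit of time
    x = np.full(N, x0)
    estimates = []
    for p in range(T):
        logw = np.zeros(N)
        for k in range(K):
            j = p * K + k
            dy = y[j + 1] - y[j]
            # accumulate log G_k^l along the block (potential G_p^l)
            logw += h(x) * dy - 0.5 * dl * h(x) ** 2
            # Euler step (3) for each particle
            x = x + b(x) * dl + sig(x) * rng.normal(0.0, np.sqrt(dl), N)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        estimates.append(np.sum(w * x))   # weighted estimate of the filter mean
        idx = rng.choice(N, size=N, p=w)  # multinomial resampling
        x = x[idx]
    return np.array(estimates)
```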

### 3.2 Coupled Particle Filter

Let $L\in\mathbb{N}$ be given. In multilevel estimation, the basic idea is to approximate, for $t>0$,

$$\eta_t^L(\varphi) = \eta_t^0(\varphi) + \sum_{l=1}^L[\eta_t^l-\eta_t^{l-1}](\varphi).$$

Normally $L$ is chosen to target a specific bias and this is the strategy considered here. There is a complication, as the level also determines the frequency of the data that are used; this is discussed below. We focus on the terms $[\eta_t^l-\eta_t^{l-1}](\varphi)$, $l\in\{1,\dots,L\}$, as one can use the PF above for approximating the term $\eta_t^0(\varphi)$ (see (6)).
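The multilevel identity above is purely algebraic: the intermediate level estimates cancel in the telescoping sum. A toy numerical check (the level estimates below are hypothetical numbers):

```python
# Toy check of the multilevel telescoping identity: given per-level
# estimates e[l] of eta_t^l(phi), the sum e[0] + sum_l (e[l] - e[l-1])
# recovers e[L] exactly.
e = [1.00, 0.60, 0.42, 0.35]  # hypothetical level estimates, l = 0, ..., 3
ml = e[0] + sum(e[l] - e[l - 1] for l in range(1, len(e)))
assert abs(ml - e[-1]) < 1e-12
```

The computational gain of MLMC comes from estimating each difference with a coupled pair of particle systems, so that fewer samples are needed at the expensive fine levels.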

For $\varphi\in\mathcal{B}_b(\mathbb{R}^{2d_x})$, we define, for $((x_0,x_{\Delta_l},\dots,x_1),(x_0',x_{\Delta_{l-1}}',\dots,x_1'))\in E_l\times E_{l-1}$,

$$\check{\varphi}^l((x_0,x_{\Delta_l},\dots,x_1),(x_0',x_{\Delta_{l-1}}',\dots,x_1')) := \varphi(x_1,x_1').$$

The following exposition closely follows the construction of the multilevel particle filter, with modifications to the context here. Let $\check{M}^l$ be a Markov kernel on $E_l\times E_{l-1}$, with the pair of paths constructed by using the same Brownian increments in the discretization (3) (see e.g. [16, Section 3.3]). Note that for any $(u,v)\in E_l\times E_{l-1}$, $A\in\mathcal{B}(E_l)$ and $B\in\mathcal{B}(E_{l-1})$,

$$\check{M}^l((u,v),A\times E_{l-1}) = M^l(u,A)\quad\textrm{and}\quad\check{M}^l((u,v),E_l\times B) = M^{l-1}(v,B).$$
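One concrete way to realize such a kernel $\check{M}^l$ is to drive the coarse Euler path with sums of consecutive fine-path Brownian increments, so that both marginals are exact Euler discretizations. A scalar sketch (with user-supplied $b$ and $\sigma$; the function name is hypothetical):

```python
import numpy as np

def coupled_euler(x_fine, x_coarse, l, b, sig, rng):
    """One unit-time draw from a coupling of M^l and M^{l-1}: the coarse
    path (step 2*Delta_l) re-uses sums of consecutive fine Brownian
    increments (the synchronous coupling of (3)).
    Returns the fine and coarse Euler paths on [0, 1]."""
    dl = 2.0 ** (-l)
    K = int(round(1.0 / dl))          # number of fine steps (even)
    u, v = [x_fine], [x_coarse]
    for k in range(0, K, 2):
        dW1 = rng.normal(0.0, np.sqrt(dl))
        dW2 = rng.normal(0.0, np.sqrt(dl))
        xf = u[-1]                    # two fine steps with dW1, dW2
        xf = xf + b(xf) * dl + sig(xf) * dW1
        u.append(xf)
        xf = xf + b(xf) * dl + sig(xf) * dW2
        u.append(xf)
        xc = v[-1]                    # one coarse step with dW1 + dW2
        v.append(xc + b(xc) * (2 * dl) + sig(xc) * (dW1 + dW2))
    return np.array(u), np.array(v)
```

By construction, integrating out either coordinate recovers the corresponding single-level Euler kernel, matching the marginal property displayed above.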

Let $\mu\in\mathcal{P}(E_l\times E_{l-1})$ and define, for $\varphi\in\mathcal{B}_b(E_l\times E_{l-1})$, the probability measure:

$$\check{\Phi}_p^l(\mu)(\varphi) := \mu(\{F_{p-1,\mu,l}\wedge F_{p-1,\mu,l-1}\}\check{M}^l(\varphi)) + \big(1-\mu(\{F_{p-1,\mu,l}\wedge F_{p-1,\mu,l-1}\})\big)(\mu\otimes\mu)(\{\overline{F}_{p-1,\mu,l}\otimes\overline{F}_{p-1,\mu,l-1}\}\overline{M}^l(\varphi)) \qquad (7)$$

where, for $(u,v)\in E_l\times E_{l-1}$,

$$\overline{F}_{p-1,\mu,l}(u,v) = \frac{F_{p-1,\mu,l}(u,v)-\{F_{p-1,\mu,l}(u,v)\wedge F_{p-1,\mu,l-1}(u,v)\}}{\mu(F_{p-1,\mu,l}-\{F_{p-1,\mu,l}\wedge F_{p-1,\mu,l-1}\})}$$

$$\overline{F}_{p-1,\mu,l-1}(u,v) = \frac{F_{p-1,\mu,l-1}(u,v)-\{F_{p-1,\mu,l}(u,v)\wedge F_{p-1,\mu,l-1}(u,v)\}}{\mu(F_{p-1,\mu,l-1}-\{F_{p-1,\mu,l}\wedge F_{p-1,\mu,l-1}\})}$$

$$F_{p-1,\mu,l}(u,v) = \check{G}_{p-1,\mu,l}(u)\otimes 1 \qquad F_{p-1,\mu,l-1}(u,v) = 1\otimes\check{G}_{p-1,\mu,l-1}(v)$$

$$\check{G}_{p-1,\mu,l}(u) = \frac{G_{p-1}^l(u)}{\mu(G_{p-1}^l\otimes 1)} \qquad \check{G}_{p-1,\mu,l-1}(v) = \frac{G_{p-1}^{l-1}(v)}{\mu(1\otimes G_{p-1}^{l-1})}$$

and, for $((u,v),(u',v'))\in(E_l\times E_{l-1})^2$ and $\varphi\in\mathcal{B}_b(E_l\times E_{l-1})$,

$$\overline{M}^l(\varphi)((u,v),(u',v')) = \check{M}^l(\varphi)(u,v').$$
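The resampling step encoded in (7) is a maximal-coupling construction: with probability $\mu(F_{p-1,\mu,l}\wedge F_{p-1,\mu,l-1})$ both systems select a common ancestor from the minimum measure, and otherwise the ancestors are drawn independently from the normalized residuals. A minimal sketch for a single pair of already normalized weight vectors (the function name is hypothetical):

```python
import numpy as np

def coupled_resample(w_fine, w_coarse, rng):
    """Maximally coupled draw of one (fine, coarse) ancestor index pair
    from two normalized weight vectors: with probability sum_i min(wf_i, wc_i)
    a common index is drawn from the minimum measure; otherwise the two
    indices are drawn independently from the normalized residuals."""
    wf = np.asarray(w_fine)
    wc = np.asarray(w_coarse)
    m = np.minimum(wf, wc)
    alpha = m.sum()                      # coupling probability
    if rng.uniform() < alpha:
        i = rng.choice(len(wf), p=m / alpha)
        return i, i                      # common ancestor for both systems
    rf = (wf - m) / (1.0 - alpha)        # residual for the fine system
    rc = (wc - m) / (1.0 - alpha)        # residual for the coarse system
    return rng.choice(len(wf), p=rf), rng.choice(len(wc), p=rc)
```

The closer the two weight vectors are, the more often the two particle systems share ancestors, which is what keeps the coupled paths close and the variance of the level differences small.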

Define, for $\varphi\in\mathcal{B}_b(E_l\times E_{l-1})$,

$$\check{\pi}_0^l(\varphi) := \check{M}^l(\varphi)((x_*,x_*))$$

and, for $p\in\mathbb{N}$,

$$\check{\pi}_p^l(\varphi) = \check{\Phi}_p^l(\check{\pi}_{p-1}^l)(\varphi). \qquad (8)$$

Now it can be shown that, for $t\in\mathbb{N}$,

$$\eta_t^l(\varphi) = \frac{\check{\pi}_{t-1}^l((G_{t-1}^l\varphi^l)\otimes 1)}{\check{\pi}_{t-1}^l(G_{t-1}^l\otimes 1)}\quad\textrm{and}\quad\eta_t^{l-1}(\varphi) = \frac{\check{\pi}_{t-1}^l(1\otimes(G_{t-1}^{l-1}\varphi^{l-1}))}{\check{\pi}_{t-1}^l(1\otimes G_{t-1}^{l-1})} \qquad (9)$$

where $\varphi^{l-1}$ is defined on $E_{l-1}$ analogously to $\varphi^l$.
The objective of the coupled particle filter (CPF) is to provide an approximation of the formulae (7) and (9).

For $p\in\mathbb{N}_0$ set $w_p^l := (u_p^l,u_p^{l-1})\in E_l\times E_{l-1}$. A CPF for sequentially approximating $(\check{\pi}_p^l)_{p\geq 0}$ is then generated on $(E_l\times E_{l-1})^{N(n+1)}$ at a time $n$ according to the probability measure

$$\check{\mathbb{Q}}(d(w_0^{l,1:N},\dots,w_n^{l,1:N})) = \Big\{\prod_{i=1}^N\check{M}^l((x_*,x_*),dw_0^{l,i})\Big\}\prod_{i=1}^N\prod_{p=1}^n\check{\Phi}_p^l(\check{\pi}_{p-1}^{l,N})(dw_p^{l,i})$$

where, for $\varphi\in\mathcal{B}_b(E_l\times E_{l-1})$,

$$\check{\pi}_{p-1}^{l,N}(\varphi) := \frac{1}{N}\sum_{i=1}^N\varphi(w_{p-1}^{l,i}).$$

To run a CPF, one must understand how to sample from $\check{\Phi}_p^l(\check{\pi}_{p-1}^{l,N})(\cdot)$, which is detailed in Algorithm 2. The CPF is then described in Algorithm 3. Then one can approximate $[\eta_t^l-\eta_t^{l-1}](\varphi)$, via (9), as

$$[\eta_t^l-\eta_t^{l-1}]^N(\varphi) := \frac{\check{\pi}_{t-1}^{l,N}((G_{t-1}^l\varphi^l)\otimes 1)}{\check{\pi}_{t-1}^{l,N}(G_{t-1}^l\otimes 1)} - \frac{\check{\pi}_{t-1}^{l,N}(1\otimes(G_{t-1}^{l-1}\varphi^{l-1}))}{\check{\pi}_{t-1}^{l,N}(1\otimes G_{t-1}^{l-1})}. \qquad (10)$$

For $p\in\mathbb{N}_0$ one can also estimate the differences of the filter at a time $p+t$, $t\in(0,1)$, as

$$[\eta_{p+t}^l-\eta_{p+t}^{l-1}]^N(\varphi) := \frac{\sum_{i=1}^N\big(\prod_{k=0}^{t\Delta_l^{-1}-1}G_{p\Delta_l^{-1}+k}^l(x_{p+k\Delta_l}^{l,i})\big)\varphi(x_{p+t}^{l,i})}{\sum_{i=1}^N\prod_{k=0}^{t\Delta_l^{-1}-1}G_{p\Delta_l^{-1}+k}^l(x_{p+k\Delta_l}^{l,i})} - \frac{\sum_{i=1}^N\big(\prod_{k=0}^{t\Delta_{l-1}^{-1}-1}G_{p\Delta_{l-1}^{-1}+k}^{l-1}(\bar{x}_{p+k\Delta_{l-1}}^{l-1,i})\big)\varphi(\bar{x}_{p+t}^{l-1,i})}{\sum_{i=1}^N\prod_{k=0}^{t\Delta_{l-1}^{-1}-1}G_{p\Delta_{l-1}^{-1}+k}^{l-1}(\bar{x}_{p+k\Delta_{l-1}}^{l-1,i})}.$$