# Posterior contraction rates for non-parametric state and drift estimation

We consider a combined state and drift estimation problem for the linear stochastic heat equation. The infinite-dimensional Bayesian inference problem is formulated in terms of the Kalman-Bucy filter over an extended state space, and its long-time asymptotic properties are studied. Asymptotic posterior contraction rates in the unknown drift function are the main contribution of this paper. Such rates have been studied before for stationary non-parametric Bayesian inverse problems, and here we demonstrate the consistency of our time-dependent formulation with these previous results building upon scale separation and a slow manifold approximation.

## Authors

• 15 publications
• 2 publications
• ### Bayesian inverse problems with unknown operators

We consider the Bayesian approach to linear inverse problems when the un...
01/30/2018 ∙ by Mathias Trabs, et al. ∙ 0

• ### Efficient drift parameter estimation for ergodic solutions of backward SDEs

We derive consistency and asymptotic normality results for quasi-maximum...
09/17/2021 ∙ by Teppei Ogihara, et al. ∙ 0

• ### Posterior Contraction Rates for Gaussian Cox Processes with Non-identically Distributed Data

This paper considers the posterior contraction of non-parametric Bayesia...
06/20/2019 ∙ by James A. Grant, et al. ∙ 0

• ### Nonparametric Bayesian inference for reversible multi-dimensional diffusions

We study nonparametric Bayesian modelling of reversible multi-dimensiona...
12/22/2020 ∙ by Matteo Giordano, et al. ∙ 0

• ### All-at-once formulation meets the Bayesian approach: A study of two prototypical linear inverse problems

In this work, the Bayesian approach to inverse problems is formulated in...
01/14/2021 ∙ by Anna Schlintl, et al. ∙ 0

• ### Posterior Consistency in the Binomial (n,p) Model with Unknown n and p: A Numerical Study

Estimating the parameters from k independent Bin(n,p) random variables, ...
09/07/2018 ∙ by Laura Fee Schneider, et al. ∙ 0

• ### Discussion on Bayesian Cluster Analysis: Point Estimation and Credible Balls by Sara Wade and Zoubin Ghahramani

I begin my discussion by giving an overview of the main results. Then I ...
03/13/2018 ∙ by William Weimin Yoo, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1. Introduction

In this paper, we consider the combined state and drift estimation problem for the stochastic heat equation

 (1) dut=∂2xutdt+f∗dt+γ1/2dwt

over the domain and for , with zero Dirichlet boundary conditions and zero initial condition for all , without loss of generality. We denote by a cylindrical Wiener process where characterises the strength of the model error. The unknown drift function is assumed to belong to a Sobolev space of appropriate regularity. We specify the properties of and more precisely in Section 2 below.

Linear stochastic partial differential equations (SPDEs) of the form (

1) are well understood from analytical and numerical perspectives. See, for example, Da Prato & Zabczyk (1992); Hairer (2009); Lord et al. (2014). In this paper, we specifically focus on the estimation of the states , , as well as the unknown drift function from noisy state measurements given by

 (2) dyt=utdt+ρ1/2dvt,

where for all , denotes the strength of the measurement noise, and is another cylindrical Wiener process independent of the model error .

Our focus is on asymptotic posterior contraction rates, which have been widely studied in the context of non-parametric stationary inverse problems. See, for example, Knapik et al. (2011); Giné & Nickl (2016); Ghosal & van der Vaart (2017). We note that pure parameter estimation problems for SPDEs given exact observations of the states are also well-studied. See, for example, Cialenco (2018) for a recent survey. A first step towards non-parametric time-dependent inverse problems has been taken in Yan (2019). However, there they are framed as inference problems over a fixed time interval resulting in smoothing problems, whereas filtering problems are the focus of this paper. Furthermore, state and drift estimation are treated separately. In addition, parameter estimation for a class of linear SPDEs from noiseless local state measurements has been considered in Altmeyer & Reiß (2019), whereas we assume noisy observations of the full states throughout this paper.

In this paper, the infinite-dimensional, time-dependent combined state and drift estimation problem is formulated and analysed in terms of the well-known Kalman-Bucy filter equations for the posterior mean and covariance operator in an augmented state space. See, for example, Jazwinski (1970); Simon (2006); Curtain (1975) for an introduction to the Kalman-Bucy filter in finite and infinite dimensions. More recently, the Kalman-Bucy filter has been reformulated as a set of mean-field equations termed the ensemble Kalman-Bucy filter. See, for example Bergemann & Reich (2012); Wiljes et al. (2018); Nüsken et al. (2019). These mean-field equations allow for a concise formulation of our time-dependent estimation problem. Furthermore, although not considered in this paper, the Kalman-Bucy mean-field equations can be generalised to non-Gaussian estimation problems (Yang et al., 2013; Taghvaei et al., 2017). A key observation arising from the analysis of the Kalman-Bucy mean-field equations is the time-scale separation in the dynamics of the state and parameter variables, allowing us to apply the concept of slow manifolds (Verhulst, 2007).

The remainder of this paper is structured as follows. A mathematical formulation of the combined state and drift estimation problem in terms of Fourier modes is provided in Section 2. Furthermore, the time-dependent estimation problem is formulated in terms of Kalman–Bucy mean-field equations. This formulation is applied to a stationary drift estimation problem in Section 3. Asymptotic posterior contraction rates are then derived and compared to existing results for a closely related non-parametric Bayesian inference problem. These results are extended to the combined state and drift estimation problem in Section 4. The essential analysis of the single-mode Kalman-Bucy filter equations is carried out in Section 4.1. A numerical exploration of the combined state and parameter estimation problem is carried out in Section 5. The paper concludes with Section 6.

## 2. Mathematical problem formulation

In this paper, we analyse the state and drift estimation problem using a spectral representation in terms of Fourier modes. We provide the essential background in this section. It is well known that solutions of (1) can be expanded in Fourier modes , that is,

 (3) ut(x)=∑k≥1Ut(k)sin(kx).

Because of (1), the Fourier modes obey the stochastic differential equations (SDEs)

 (4) dUt(k)=−k2Ut(k)dt+F∗(k)dt+γ1/2dWt(k),k≥1.

Here , , denote the Fourier coefficients of , and independent standard Brownian motions. The initial conditions are for all . Therefore (1) can be viewed as a stochastic evolution equation on the separable Hilbert space of square integrable functions of sufficient spatial regularity with zero boundary conditions.

The measurement model (2) can also be transformed into Fourier space, yielding

 (5)

with representing independent standard Brownian motions, independent of the model errors for all . Throughout this paper, we will exclusively work with the formulation (4) and (5) of our state and drift estimation problem in Fourier space. We now proceed to formulate a continuous-time Bayesian inference framework for this problem, which is used to derive asymptotic posterior contraction rates in Section 4.

The posterior Bayesian random variables for the state and drift at time

, conditioned on the observed data , , are denoted by and , respectively. Here represents all the data up to time , where

 (6) Ys(k)=∫s0dYτ(k)

is an observation at time . The mean-field Kalman–Bucy equations (Wiljes et al., 2018; Nüsken et al., 2019) for the combined state and drift estimation problem in the variables are then given by

 (7a) dUt(k) =−k2Ut(k)dt+Ft(k)dt+γ1/2dWt(k)−Kut(k)dIt(k), (7b) dFt(k) =−Kft(k)dIt(k),

with the innovation given by

 (8) dIt(k):=12(Ut(k)+E[Ut(k)|Y(0,t](k)])dt−dYt(k),

and Kalman gains

 (9) Kut(k):=ρ−1E[ΔUt(k)2|Y(0,t](k)],Kft(k):=ρ−1E[ΔUt(k)ΔFt(k)|Y(0,t](k)],

where

 (10) ΔUt(k):=Ut(k)−E[Ut(k)|Y(0,t](k)]

and

 (11) ΔFt(k):=Ft(k)−E[Ft(k)|Y(0,t](k)].

Denoting the distribution of the joint random variable at time by , we define

 (12) E[g|Y(0,t](k)]=π(k)t[g]:=∫R2π(k)t(u,f)g(u,f)dudf

for any measurable function . Hence, for example, , and

 (13) Kut(k)=ρ−1π(k)t[(u−π(k)t[u])2]=σut(k)ρ

where

denotes the variance of

.

The prior assumptions are that almost surely, and that the , , are independent Gaussian random variables with mean zero and variance

 (14) σf0(k)=k−2α−1

for a suitable . Since (4) and (5) are linear in the unknowns, and remain Gaussian under the evolution equations (7), and we investigate their behaviour in terms of their mean and covariance for . We further assume that the true drift function has Sobolev regularity , that is

 (15) ∑k≥1F∗(k)2k−2β<∞.

For example, (15) is satisfied if

 (16) F∗(k)=ckk−β−1/2−δ

for any and any sequence of coefficients with bounded as . These coefficients can, for example, be realisations of i.i.d. uniform random variables from the interval .

In addition to an analysis of the continuous-time Bayesian inference problem (7), we also study the (linear) dependence of the estimators (Bayesian posterior means)

 (17) ¯¯¯¯Ut(k):=π(k)t[u],¯¯¯¯Ft(k):=π(k)t[f],

on the random measurement process in Section 4. We denote the expectation values of (17) with respect to the observation process for fixed by

 (18) mut(k):=E∗[¯¯¯¯Ut(k)]andmft(k):=E∗[¯¯¯¯Ft(k)].

These two quantities characterise the systematic bias in the estimators (17), while the implied variances

 (19) put(k):=E∗[(¯¯¯¯Ut(k)−mut(k))2],pft(k):=E∗[(¯Ft(k)−mft(k))2]

are a measure of the ‘frequentist’ uncertainty of the estimators (17).

According to Lemma 8.2 from Ghosal & van der Vaart (2017), the posterior contraction rate in the th Fourier mode of the drift estimator , that is,

 (20) E∗[P{Ft(k):|Ft(k)−F∗(k)|≥Mtεt(k)|Y(0,t](k)}]→0,

is provided by

 (21) εt(k)2=σft(k)+pft(k)+(mft(k)−F∗(k))2.

Equation (20) holds for any nonnegative function with . Here denotes the variance of .

The main contributions from our subsequent analysis consist of providing bounds for the three quantities , , and in terms of the statistical models (4) and (5), and additionally studying the asymptotic behaviour of the asymptotic contraction rate , defined by

 (22) ε2t:=∑k≥1εt(k)2,

in terms of the Sobolev regularity of the true drift function and the variance of the prior characterised by in (14). Here, the asymptotic contraction rate is to be understood in the sense that

 (23)

for any nonnegative function with . In physical space,

 (24) E∗[P{ft:1π∫Ω(ft(x)−f∗(x))2dx≥M2tε2t|Y(0,t]}]→0,

where

denotes the inverse Fourier transform of

.

We first discuss a stationary formulation of the drift estimation problem in Section 3, which is closely linked to available results for non-parametric Bayesian inverse problems (Knapik et al., 2011; Giné & Nickl, 2016; Ghosal & van der Vaart, 2017). The combined state and drift estimation problem is analysed in Section 4.

###### Remark 2.1.

A more general class of time-scaled prior variances over the unknown set of static parameters has been considered in Knapik et al. (2011). This more general class of prior distributions could be incorporated into our analysis as well. However, we restrict the subsequent analysis to the simpler case (14).

## 3. Stationary drift estimation problem

Before addressing the full time-dependent problem, we first consider the stationary drift estimation problem in which and in (4), giving rise to the forward model

 (25) U(k)=k−2F(k)

and observations

 (26a) dYt(k) =U∗(k)dt+ρ1/2dVt(k) (26b) =k−2F∗(k)dt+ρ1/2dVt(k)

for . The forward model (26) leads to a mildly ill-posed inverse problem. We follow a Bayesian approach by placing a Gaussian mean-zero prior with variance given by (14) on the Fourier coefficients of the unknown drift function. Let us summarise our assumptions for ease of reference in the following:

###### Assumption 3.1.

We assume that the Fourier coefficients of the true drift function satisfy (15). The prior coefficients, denoted by , are independent Gaussian random variables with mean zero, that is, , and with variance satisfying (14).

Let us denote the Bayesian posterior estimate of the drift at time by , . We recall that is Gaussian for all provided is Gaussian. Additionally, recall that we denote the mean of by and the variance by . The Kalman–Bucy evolution equations (7) reduce to the following equations for the mean and the variance of the th Fourier mode:

 (27a) d¯¯¯¯Ft(k) =−Kft(k)(k−2¯¯¯¯Ft(k)dt−dYt(k)), (27b) ddtσft(k) =−Kft(k)σft(k)k2,

where denotes the Kalman gain factor

 (28) Kft(k):=σft(k)ρk2.

The initial conditions are provided by Assumption 3.1.

###### Remark 3.2.

Alternatively, the Kalman–Bucy equations (27) for the mean and the variance can be reformulated as mean-field equations in directly, that is,

 (29) dFt(k)=−Kft(k)(12k2(Ft(k)+¯¯¯¯Ft(k))dt−dYt(k)).

These equations are a special case of (7), and provide the starting point for numerical implementation in the form of the ensemble Kalman–Bucy filter (Bergemann & Reich, 2012), as well as extensions to nonlinear and non-Gaussian Bayesian inference problems (Yang et al., 2013; Taghvaei et al., 2017).

The evolution equation (27b) for the posterior variance has the closed form solution

 (30) σft(k)=σf0(k)ρ−1k−4σf0(k)t+1.

The trace of the covariance of the full joint posterior process thus satisfies the asymptotic estimate

 (31) ∑k≥1σft(k)≍t−2α/(2α+5)

with the initial variances satisfying (14). This result follows from asymptotic bounds for infinite sums. See Lemma K.7 in Ghosal & van der Vaart (2017) in particular.

Next, we carry out a ‘frequentist’ analysis of the mean in terms of its bias with respect to the true , and its variance with respect to the measurement noise . We rewrite (27a) as

 (32) d¯¯¯¯Ft(k)=−Kft(k)(k−2(¯¯¯¯Ft(k)−F∗(k))dt−ρ1/2dVt(k)).

Denoting the expectation value of with respect to the measurement noise by , we obtain

 (33) ddtmft(k)=−k−2Kft(k)(mft(k)−F∗(k))

with initial condition . Equation (33) has the closed-form solution

 (34) mft(k)=F∗(k)−σft(k)σf0(k)F∗(k).

Hence, based on (16), the -norm of the frequentist bias satisfies the asymptotic estimate

 (35) ∑k≥1(mft(k)−F∗(k))2≍t−2β/(2α+5).

This result follows again from asymptotic bounds for infinite sums.

We finally investigate the time evolution of the frequentist variance . Starting from the stochastic differential equation

 (36) d(¯¯¯¯Ft(k)−mft(k))=−Kft(k)(k−2(¯¯¯¯Ft(k)−mft(k))dt−ρ1/2dVt(k)),

using Itô’s lemma yields

 (37a) ddtpft(k) =−2k2Kft(k)pft(k)+ρ(Kft(k))2 (37b) =−2σft(k)ρk4+(σft(k))2ρk4.

The initial variance is for all . In order to analyse the solution behaviour of (37), we introduce and find the associated evolution equation

 (38) ddtΔpft(k)=−2σft(k)ρk4Δptt(k)

from which we conclude that , and

 (39) limt→∞pft(k)=σft(k).

In fact, the explicit solution of (37) is

 (40) pft(k)=ρk4σf0(k)2t(ρk4+σf0(k)t)2.

The ‘frequentist’ uncertainty is characterised by

 (41) ¯¯¯¯Ft(k)−F∗(k)=OP(Δt(k)),Δt(k)∼N(mft(k)−F∗(k),pft(k)),

so the probability of the event

 (42) |¯¯¯¯Ft(k)−F∗(k)|>Mt|Δt(k)|

has probability tending to zero for any nonnegative function with . Here

denotes the Gaussian distribution with mean

and variance .

###### Theorem 3.3.

Under Assumption 3.1, the posterior contraction rate (22) is given by

 (43) ε2t=∑k≥1(σft(k)+pft(k)+(mft(k)−F∗(k))2)≍t−2min{α,β}/(2α+5)

for .

###### Proof.

According to Lemma 8.2 from Ghosal & van der Vaart (2017), the posterior contraction rate in each Fourier mode is provided by

 (44) εt(k)2=σft(k)+pft(k)+(mft(k)−F∗(k))2.

Because of (31) and (35) together with , the result (43) follows. ∎

###### Remark 3.4.

The following white noise forward model has been investigated in

Knapik et al. (2011):

 (45) Yn(k)=k−pF∗(k)+1n1/2Ξn(k),n∈Z+,

where are i.i.d. Gaussian random variables with mean zero and variance one. The associated inference problem corresponds to a sequence of measurements with measurement error decreasing as . In our continuous-time problem, we obtain the same asymptotic rates as in the case considered above with under the formal equivalence .

## 4. Time-dependent state and drift estimation

We now return to the full dynamic model (4) subject to observations (5). We primarily wish to estimate the drift function . However, because of the stochastic model errors in (4), we also need to estimate the states . We start with a careful analysis of the single mode system for both small- and large- Fourier modes.

### 4.1. Analysis of the single-mode filtering problem

In this section, we conduct a careful analysis of the single-mode filtering and parameter estimation problem. We suppress the dependence on the mode number , and introduce the parameter . The signal process is therefore given by

 (46) dUt=−ϵ−1Utdt+F∗dt+γ1/2dWt

with given initial condition almost surely. Observations of the process are given by

 (47) dYt=Utdt+ρ1/2dVt

with . The complete observation path up to time is denoted by .

Recall that the mean-field Kalman–Bucy equations are given by (7). We again drop the dependence on the Fourier mode number in the subsequent analysis. The mean-field equations (7) give rise to the following evolution equations in the conditional means and :

 (48a) d¯¯¯¯Ut =−ϵ−1¯¯¯¯Utdt+¯¯¯¯Ftdt−Kut(¯¯¯¯Utdt−dYt), (48b) d¯¯¯¯Ft =−Kft(¯¯¯¯Utdt−dYt).

The deviations and thus satisfy

 (49a) dΔUt =−ϵ−1ΔUtdt+ΔFtdt+γ1/2dWt−12KutΔUtdt, (49b) dΔFt =−12KftΔUtdt.

We can further decompose (48) by introducing the ‘frequentist’  expectation values and with respect to the observation process (47), and the deviations and . Upon using

 (50) E∗[dYt]=E∗[Ut]dt

and introducing the shorthand

, one is left with the ordinary differential equations

 (51a) ddtmut =−ϵ−1mut+mft−Kut(mut−μt), (51b) ddtmft =−Kft(mut−μt),

for the mean values , as well as

 (52a) dΔ¯¯¯¯Ut =−ϵ−1Δ¯¯¯¯Utdt+Δ¯¯¯¯Ftdt−Kut((Δ¯¯¯¯Ut−ΔUt)dt−ρ1/2dVt), (52b) dΔ¯¯¯¯Ft =−Kft((Δ¯¯¯¯Ut−ΔUt)dt−ρ1/2dVt)

for the deviations . The expectation value satisfies the evolution equation

 (53) ddtμt=−ϵ−1μt+F∗,

while the deviation satisfies

 (54) dΔUt=−ϵ−1ΔUtdt+γ1/2dWt.

Both equations follow from (46). We finally combine (52a) and (54) into a single equation for the new variable , and replace (52) by

 (55a) dˆUt =−ϵ−1ˆUtdt+Δ¯¯¯¯Ftdt−γ1/2dWt−Kut(ˆUtdt−ρ1/2dVt), (55b) dΔ¯¯¯¯Ft =−Kft(ˆUtdt−ρ1/2dVt).

We have thus decomposed the mean-field Kalman–Bucy equations into the analysis of the three subsystems (49), (51) and (55). Here, (51) describes the systematic bias in the Bayesian mean estimator, while (55) and (49) characterise the ‘frequentist’ and ‘Bayesian’ uncertainties, respectively.

We note that the equations in (49) are decoupled from (51) and (55). In fact, it is sufficient to look at the deterministic time evolution equations for the conditional variances

 (56) σut:=πt[ΔU2t],σft:=πt[ΔF2t],

and the covariance

 (57) σuft:=πt[ΔUtΔFt],

which are given by

 (58a) ddtσut =−2ϵ−1σut+2σuft+γ−ρ−1(σut)2, (58b) ddtσuft =−ϵ−1σuft+σft−ρ−1σutσuft, (58c) ddtσft =−ρ−1(σuft)2.

The time-dependent linear stochastic differential equations (55) can also be analysed in terms of the variances

 (59) put:=E∗[ˆU2t],pft:=E∗[Δ¯¯¯¯F2t],

and the covariance

 (60) puft:=E∗[ˆUtΔ¯¯¯¯Ft].

These quantities satisfy the linear time-dependent ordinary differential equations

 (61a) ddtput =−2ϵ−1put+2puft+γ−2Kutput+ρ(Kut)2, (61b) ddtpuft =−ϵ−1puft+pft−Kutpuft−Kftput+ρKutKft, (61c) ddtpft =−2Kftpuft+ρ(Kft)2.

Equations (58) and (61) are combined to to yield

 (62a) ddtΔput =−2(ϵ−1+Kut)Δput+2Δpuft, (62b) ddtΔpuft =−(ϵ−1+Kut)Δpuft+Δpft−KftΔput, (62c) ddtΔpft =−2KftΔpuft

in the variables

 (63) Δpuft:=puft−σuft,Δput:=put−σut,Δpft:=pft−σft.

The initial conditions are , and .

###### Lemma 4.1.

It holds that

 (64) put→σut,pft→σft,puft→σuft

as , that is, is a stable equilibrium point of (62).

###### Proof.

The lemma follows from the decay property of

 (65) Vt(Δput,Δpuft,Δpft):=12(Δpft)2+Kft(Δpuft)2+(Kft)22(Δput)2

and the fact that for sufficiently large with an appropriate constant . In particular,

 (66) ∂tVt={(Δpuft)2+Kft(Δput)2}ddtKft≤0,

 (67) ∇Vt⋅ddt⎛⎜ ⎜⎝ΔputΔpuftΔpft⎞⎟ ⎟⎠=−2(ϵ−1+Kut){Kft(Δpuft)2+(Δput)2}Kft≤0.

Therefore, the total time derivative satisfies

 (68) DDtVt(Δput,Δpuft,Δpft)≤0.

Furthermore, for large enough ,

 (69) puft≈(ϵ−1+Kut)−1pft,put≈(ϵ−1+Kut)−1puft≈(ϵ−1+Kut)−2pft,

and becomes small relative to (67) since . Hence, for again sufficiently large and with an appropriate constant ,

 (70) ddt(Δpft)2≈−Ct(Δpft)2,

and the desired convergence result follows. ∎

Lemma 4.1 implies that, as for the pure drift estimation problem, the Bayesian (filtering) variance asymptotically covers the ‘frequentist’ (data) variance.

#### 4.1.1. The large-k Fourier mode case

We now investigate the case in more detail. The equations (58) possess a slow manifold for sufficiently small (Verhulst, 2007; Shchepakina et al., 2014). To leading order, solutions on the slow manifold satisfy

 (71) σut=ϵγ2,σuft=ϵσft.

In other words, the long-time variance of is governed by , and of by

 (72) ddtσft=−Ktσft

up to higher-order terms in . Here we have introduced the new Kalman gain factor

 (73) Kt:=ϵ2ρσft.

We also notice that the Kalman gain factor satisfies . See the Appendix for more details on the derivation of the reduced system (72).

The combined time-dependent linear equations (53) and (51) also give rise to a slow manifold, with the slow dynamics governed by