    # Nonparametric estimation for linear SPDEs from local measurements

We estimate the coefficient function of the leading differential operator in a linear stochastic partial differential equation (SPDE). The estimation is based on continuous time observations which are localised in space. For the asymptotic regime with fixed time horizon and with the spatial resolution of the observations tending to zero, we provide rate-optimal estimators and establish scaling limits of the deterministic PDE and of the SPDE on growing domains. The estimators are robust to lower order perturbations of the underlying differential operator and achieve the parametric rate even in the nonparametric setup with a spatially varying coefficient.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

While there is a large amount of work on probabilistic, analytical and recently also computational aspects of stochastic partial differential equations (SPDEs), many natural statistical questions are open. With this work we want to enlarge the scope of statistical methodology in two major directions. First, we consider observations of a solution path that are local in space and we ask whether the underlying differential operator or rather its local characteristics can be estimated from this local information only. Second, we allow the coefficients in the differential operator to vary in space and we pursue nonparametric estimation of the coefficient functions, as opposed to parametric estimation approaches for finite-dimensional global parameters in the coefficients. Naturally, both directions are intimately connected.

As a concrete model we consider the parabolic SPDE

 dX(t)=AϑX(t)dt+BdW(t),t∈[0,T],

with the second-order differential operator on some bounded domain , see Section 2 for formal details. The coefficient functions are unknown on and we aim at estimating , which models the diffusivity in a stochastic heat equation. The functions as well as the operator

in front of the driving space-time white noise process

form an unknown nuisance part.

Measurements of a solution process necessarily have a minimal spatial resolution and we dispose of the observations , where is an averaging kernel with support of diameter around some . We keep the time span fixed and construct an estimator, called proxy MLE, which for the resolution asymptotics converges at rate to and satisfies a CLT. Another estimator, the so-called augmented MLE

, will even converge under far more general conditions and exhibit a smaller asymptotic variance, but requires a second local observation process

in terms of the Laplace operator . Clearly, if we have access to these observations around all , then both estimators can be used to estimate the diffusivity function nonparametrically on all of .

These results are statistically remarkable. First of all, even for the parametric case that is a constant, it is not immediately clear that is identified (i.e., exactly recovered) from local observations in a shrinking neighbourhood around some only. Probabilistically, this means that the local observation laws are mutually singular for different values of

. What is more, the bias-variance tradeoff paradigm in nonparametric statistics does not apply: asymptotic bias and standard deviation are both of order

and the CLT provides us even with a simple pointwise confidence interval for

. The robustness of the estimators to lower order parts in the differential operator and unknown is very attractive for applications. The rate is shown to be the best achievable rate in a minimax sense even for constant without nuisance parts.

The fundamental probabilistic structure behind these results is a universal scaling limit of the observation process for . At a highly localised level, the differential operator behaves like , as expressed in Corollary 3.6 below, and the construction of the estimators shows a certain scaling invariance with respect to . To study these scaling limits, we need to consider the deterministic PDE on growing domains via the stochastic Feynman-Kac approach and to deduce tight asymptotics for the action of the semigroup and the heat kernels. Further tools like the fourth moment theorem or the Feldman-Hajek Theorem rely on the underlying Gaussian structure, but extensions to semi-linear SPDEs seem possible.

Let us compare our localisation approach to the spectral approach, introduced by Huebner and Rozovskii (1995), for parametric estimation. In the simplest case for some and commuting with , the SPDE solution can be expressed in the eigenbasis of the Laplace operator . If the first coefficient processes (Fourier modes of ) are observed, then a maximum-likelihood estimator for is asymptotically efficient as . This approach has turned out to be very versatile, allowing also for estimating time-dependent nonparametrically (Huebner and Lototsky (2000)) or to cover nonlinear SPDEs as the stochastic Navier-Stokes equation (Cialenco and Glatt-Holtz (2011)). The methodology, however, is intrinsically bound to observations in the spectral domain and to operators

whose eigenfunctions, at least in the leading order, are independent of

. In contrast, we work with local observations in space and the unknown spectrum of the operators does not harm us. More conceptually, we rely on the local action of the differential operator , while the spectral approach also applies in an abstract operator in Hilbert space setting.

Our case of spatially varying coefficients has been considered first by Aihara and Sunahara (1988) (with ) in a filtering problem. The corresponding nonparametric estimation problem is then addressed by Aihara and Bagchi (1989) with a sieve least squares estimator, but they achieve consistency only for global observations with a growing time horizon . In a stationary one-dimensional setting Bibinger and Trabs (2017) ask whether the parameter can be estimated when observing the solution only at over a fixed time interval . Interestingly, in the case the parameter cannot be recovered if the level of the space-time white noise is unknown. For a recent and exhaustive survey on statistics for SPDEs we refer to Cialenco (2018).

In Section 2 the SPDE and the observation model are introduced and in Section 3 the scaling properties along with the resolution level are discussed. Section 4 derives our estimators via a least-squares and a likelihood approach and provides some basic insight into their error analysis. The main convergence results as well as a minimax lower bound are presented in Section 5. The findings are illustrated by a numerical example in Section 6. While the main steps in the proofs are presented together with the results, all more technical arguments are delegated to the Appendix.

## 2 The model

### 2.1 The SPDE model

Let be a bounded open set in with -boundary and consider with the usual -norm . for denotes the fractional Sobolev spaces and is the closure of in . We write for the Euclidean scalar product and for the norm. Let denote the divergence and the Laplace operator. denotes the semigroup on generated by with domain .

Define a second order elliptic operator with Dirichlet boundary conditions

 Aϑ=Δϑ+A0,D(Aϑ)=H10(Λ)∩H2(Λ),

where is the weighted Laplace operator with spatially varying diffusivity , , and with functions , , .

Throughout this work is fixed. Let

be a filtered probability space with a cylindrical Brownian motion

on ( is also referred to as space-time white noise), and let be a bounded linear operator, which is not assumed to be trace class. We study the linear stochastic partial differential equation

 ⎧⎪⎨⎪⎩dX(t)=AϑX(t)dt+BdW(t),0

with Dirichlet boundary conditions and deterministic initial value .

Let be the analytic semigroup on generated by , cf. Lunardi (1995), Theorem 3.1.3. If with Hilbert-Schmidt norm , then the SPDE (2.1) admits a unique weak solution taking values in such that for , satisfies

 l(t,z)=⟨X0,z⟩+∫t0l(s,A∗ϑz)ds+⟨BW(t),z⟩, (2.2)

cf. Da Prato and Zabczyk (2014), Theorem 5.4. is explicitly defined via the variation of constants formula:

 X(t)=Sϑ(t)X0+∫t0Sϑ(t−s)BdW(s). (2.3)

Our statistical analysis below relies on the functionals rather than the full solution , and so we employ a more general solution concept. Note for , that

 l(t,z)=⟨Sϑ(t)X0,z⟩+∫t0⟨S∗ϑ(t−s)z,BdW(s)⟩ (2.4)

is well-defined even if .

###### Proposition 2.1.

Define by (2.4). Then:

1. solves the SPDE (2.1) in the sense that (2.2) holds and that is linear and continuous.

2. is a Gaussian process with mean function and covariance function

 c((t,z),(t′,z′))=∫t∧t′0⟨B∗S∗ϑ(t−s)z,B∗S∗ϑ(t′−s)z′⟩ds. (2.5)

In the following, justified formally by (2.4), we write instead of , even if the weak solution does not exist.

### 2.2 Local observations

Throughout this work let be fixed. The following rescaling will be useful in the sequel: for and set

 Λδ,x0 :=δ−1(Λ−x0)={δ−1(x−x0):x∈Λ} and Λ0,x0:=Rd, zδ,x0(x) :=δ−d/2z(δ−1(x−x0)),x∈Rd.

Let have compact support in . The compact support ensures that is localized around and , for all small . Local measurements of (2.1) at with resolution level until time are described by the real-valued processes , ,

 Xδ,x0(t) =⟨X(t),Kδ,x0⟩, (2.6) XΔδ,x0(t) =⟨X(t),ΔKδ,x0⟩. (2.7)

Note that it is sufficient to observe for in a neighbourhood of in order to provide us with .

The process satisfies and

 dXδ,x0(t)=⟨X(t),A∗ϑKδ,x0⟩dt+∥B∗Kδ,x0∥d¯W(t) (2.8)

with the scalar Brownian motion .

## 3 Scaling assumptions

### 3.1 Rescaled operators and semigroups

In view of (2.8) we need to study and as . For smooth compactly supported it is clear that

 ΔKδ,x0=δ−2(ΔK)δ,x0

and similarly for . If for constant , this suggests formally

 S∗ϑ(t)Kδ,x0=(S∗ϑ,δ,x0(tδ−2)K)δ,x0,

where is the semigroup generated by on . Applying the semigroup to a localized function is therefore equivalent to rescaling the semigroup in time and space and keeping the test function fixed. The scaling exhibited here is the usual scaling for parabolic PDEs.

In order to make this heuristic precise note first that

with domain and . is the infinitesimal generator of the analytic semigroup on (Pazy (1983, Lemma 7.3.4)). Define similarly operators , with domains , where for

 A0,δ,x0z =δ⟨a(x0+δ∙),∇z⟩Rd+δ2b(x0+δ∙)z, (3.1) A∗0,δ,x0z =−δdiv(a(x0+δ∙)z)+δ2b(x0+δ∙)z. (3.2)

They generate correspondingly analytic semigroups and on . In Section A.2 we prove further:

###### Lemma 3.1.

For the following holds:

1. If , then .

2. If , then , .

### 3.2 Scaling of B

Just as with we also need that behaves nicely when applied to . For this we shall assume a scaling limit for , which does not degenerate in combination with .

###### Assumption 3.2.

There are bounded linear operators such that for with support in and for and . Introducing

 Ψ(z,z′) :=∫∞0⟨B∗0,x0esΔz,B∗0,x0esΔz′⟩L2(Rd)ds,z,z′∈L2(Rd), (3.3)

assume the non-degeneracy conditions , .

###### Remark 3.3.

We shall see that is going to be the limiting covariance in (2.5). is always nonnegative and finite because

 Ψ(ΔK,ΔK) ≤∥B∗0,x0∥2L2(Rd)∫∞0⟨e2sΔΔK,ΔK⟩L2(Rd)ds =−12∥B∗0,x0∥2L2(Rd)⟨K,ΔK⟩L2(Rd).
###### Example 3.4.

1. For a bounded continuous function define the multiplication operator , . With the SPDE in (2.1) can be written informally as

 ˙X(t,x)=AϑX(t,x)+σ(x)˙W(t,x),0

Note that commutes with only if is constant. For we find that and so . Then for , , and thus is the multiplication operator on with the constant . For , we have

 Ψ(Δz,z′)=σ2(x0)∫∞0⟨e2sΔΔz,z′⟩L2(Rd)ds=−σ2(x0)2⟨z,z′⟩L2(Rd), (3.4)

and thus by partial integration . The non-degeneracy conditions are clearly satisfied.

2. Let be as in (i) and consider with bounded , , the perturbed multiplication operator , . By functional calculus for with and for , . and are as in (i).

3. Assumption 3.2 excludes , , a typical choice to obtain smooth solutions , cf. Da Prato and Zabczyk (2014, Chapter 5.5). Indeed, by (ii) and so , violating the non-degeneracy conditions. This can be dealt with by modifying the test function . For example, if for constant and , then assume we have access to , instead of (2.6), (2.7). Since and commute, has the same distribution as , where corresponds to the SPDE (2.1) with and , and so Assumption 3.2 is satisfied.

### 3.3 The initial condition

###### Assumption 3.4(z;β).

For and with compact support in for all small , the initial condition satisfies

 ∫T0⟨Sϑ(t)X0,(Δz)δ,x0⟩2dt=o(ℓd,2(δ)−1δβ),δ→0,

where for and otherwise.

###### Lemma 3.5.

Let or . Then Assumption 3.3(;) is satisfied for

1. if for some , in particular if ,

2. if .

###### Proof.

This follows from Lemma A.10(ii,iii) below. ∎

### 3.4 From bounded to unbounded domains

Lemma 3.1 and Assumption 3.2 allow us to study the covariance function of :

 c((t,Kδ,x0),(t′,Kδ,x0)) (3.5) =δ2∫tδ−2∧t′δ−20⟨B∗δ,x0S∗ϑ,δ,x0(δ−2t−s)K,B∗δ,x0S∗ϑ,δ,x0(δ−2t′−s)K⟩L2(Λδ,x0)ds.

Let us see what happens as . From (3.1) we find in , which suggests formally for the semigroups . This means that , the solution of

 ddtu(δ)(t) =(A∗ϑ,δ,x0u(δ))(t),u(δ)(0)=K,

on the bounded domain converges to , the solution of

 ddtu(t)=ϑ(x0)Δu(t),u(0)=K,

on all of . This scaling limit, which seems natural but is nevertheless non-trivial, lies at the heart of the analysis for the covariance function. We will prove it in Proposition A.8 below as well as the following interesting corollary, which for simplicity assumes a zero initial condition:

###### Corollary 3.6.

Grant Assumption 3.2 and let . Then the finite dimensional distributions of , , converge to those of , , solving the stochastic heat equation on with space-time white noise :

 {dY(t)=ϑ(x0)ΔY(t)dt+B0,x0dW(t),t>0,Y(0)=0.

This corollary demonstrates the strength of local measurements that at small scales only the highest order differential operator matters, together with the local coefficient and the local operator in the noise.

## 4 The two estimation methods

### 4.1 Motivation and construction

We give two motivations for deriving the estimators in the parametric case with constant , . As we shall see later, these estimators will then work quite universally for nonparametric specifications of and general and .

#### Least squares approach.

In the deterministic situation of (2.8) without driving noise (i.e. and ) we recover via for all . A standard least-squares ansatz in the noisy situation would therefore lead to an estimator . While this itself is certainly not well defined, the corresponding normal equations yield the feasible estimator

 ^ϑLSδ=∫T0XΔδ,x0(t)dXδ,x0(t)∫T0XΔδ,x0(t)2dt,

compare with the approach by Maslowski and Tudor (2013) for fractional noise.

#### Likelihood approach.

Assume that only is observed. Denote by , the laws of and on the canonical path space equipped with its Borel sigma algebra. Typically, the likelihood of with respect to is determined via Girsanov’s theorem. This is not immediate from (2.8), because cannot be obtained from for fixed . Therefore we employ Liptser and Shiryaev (2001, Theorem 7.17) and write as the diffusion-type process

 dXδ,x0(t)=ϑmϑ(t)dt+∥K∥L2(Rd)d~W(t),t∈[0,T],

with a different scalar Brownian motion , adapted to the filtration generated by , and

 mϑ(t) =Eϑ[XΔδ,x0(t)∣∣(Xδ,x0(s))0≤s≤t].

Girsanov’s theorem in the form of Liptser and Shiryaev (2001, Theorem 7.18) applies and we find that has with respect to the likelihood

 L(ϑ,Xδ,x0) =exp(ϑ∥K∥2L2(Rd)∫T0mϑ(t)dXδ,x0(t)−ϑ22∥K∥2L2(Rd)∫T0mϑ(t)2dt).

Computing the conditional expectation is a non-explicit filtering problem, even in the parametric case . In particular, depends on in a highly nonlinear way. We pursue two different modifications:

##### Augmented MLE.

If we observe additionally, then we can just replace the conditional expectation in the likelihood by its argument , which is in particular independent of . Maximizing this modified likelihood leads to the augmented MLE

 ^ϑAδ=∫T0XΔδ,x0(t)dXδ,x0(t)∫T0XΔδ,x0(t)2dt. (4.1)

We remark that .

##### Proxy MLE.

If we do not dispose of additional observations, we can approximate by the conditional expectation . In our simplified setup with and there exists a stationary solution , , with a two-sided cylindrical Brownian motion , provided the variance remains finite. Then also the processes and are stationary with

 Var(Xδ,x0(t)) =∫t−∞⟨Sϑ(2t−2s)Kδ,x0,Kδ,x0⟩ds =12ϑ⟨(−Δ)−1Kδ,x0,Kδ,x0⟩, (4.2) Cov(XΔδ,x0(t),Xδ,x0(t)) =∫t−∞⟨Sϑ(2t−2s)ΔKδ,x0,Kδ,x0⟩ds=−12ϑ∥K∥2L2(Rd), (4.3)

compare (2.5). In general, may not exist, but if we assume the existence of with and compact support in , then by the scaling in Lemma 3.1 follows. In this situation we therefore obtain

 Eϑ[XΔδ,x0(t)|Xδ,x0(t)]=Cov(XΔδ,x0