# Efficient drift parameter estimation for ergodic solutions of backward SDEs

We derive consistency and asymptotic normality results for quasi-maximum likelihood methods for drift parameters of ergodic stochastic processes observed in discrete time in an underlying continuous-time setting. The special feature of our analysis is that the stochastic integral part is unobserved and non-parametric. Additionally, the drift may depend on the (unknown and unobserved) stochastic integrand. Our results hold for ergodic semi-parametric diffusions and backward SDEs. Simulation studies confirm that the methods proposed yield good convergence results.

## 1 Introduction

The paper analyzes statistical inference for Markovian ergodic forward-backward stochastic differential equations (BSDEs). Ergodic solutions of backward SDEs may be seen as a generalization of an ergodic Markovian diffusion process with an unknown but ergodic diffusion part. Specifically, consider a probability space $(\Omega, \mathcal{F}, P)$ with a filtration generated by a multidimensional Brownian motion $W$. Let $Y$ be a multidimensional Markov diffusion process depending on an unknown parameter $\theta$. In the sequel, $Y$ will also be referred to as a data generating process. In the classical statistical inference problem for stochastic processes, $Y$ satisfies a stochastic differential equation of the form

$$dY_t = \psi(t, Y_t, \theta)\,dt + \sigma(t, Y_t, \theta)\,dW_t, \qquad (1)$$

where $\psi$ and $\sigma$ are known functions and is assumed to be known as well. A classical example for $Y$ is a Brownian motion with drift or, popular in finance, a geometric Brownian motion. Statistical inference for (1) via quasi-maximum likelihood methods is analyzed in Yoshida (1992, 2011), Kessler (1997) and Uchida and Yoshida (2012), and these results have been extended to jump–diffusion processes by Shimizu and Yoshida (2006) and Ogihara and Yoshida (2011). Now assume that the diffusion function in (1) is unknown and that we only know that the integrand of the diffusion part is given by a positive definite matrix-valued ergodic predictable process, say bounded away from zero. This leads to the stochastic differential equation

$$dY_t = \psi(t, Y_t, \theta)\,dt + V_t\,dW_t,$$

where $V$ may be identified with a triangular ergodic stochastic process. Next, suppose that we additionally allow the drift to depend on $V_tV_t^\top$ and, furthermore, on an observed additional Markov process $X$. Then $Y$ satisfies

$$dY_t = \psi(t, X_t, Y_t, V_tV_t^\top, \theta)\,dt + V_t\,dW_t. \qquad (2)$$

This equation is also called a backward stochastic differential equation with solution $(Y, V)$ and driver function $\psi$. The goal of this paper is to give consistency and asymptotic normality results for the estimation of $\theta$ in (2) based on discrete-time observations of the data generating processes $X$ and $Y$.

BSDEs were introduced by Peng and Pardoux (1991) and have since been extended in many directions regarding assumptions on the driver function, connections to PDEs and Hamilton-Jacobi-Bellman equations, applications to stochastic optimal control theory, smoothness of solutions, robustness, numerical approximations and invariance principles. Although BSDEs were originally developed for a finite maturity, in many situations the terminal time is either random or there is no natural terminal time at all, and the decision maker instead faces an infinite time horizon. In the theory of BSDEs, existence and uniqueness of a solution are usually guaranteed by Lipschitz conditions on the driver. For an infinite time horizon, however, the BSDE may be ill-posed; Briand and Hu (1998) addressed this by imposing a monotonicity assumption on the driver. For our statistical analysis, we simply assume that the data generating process satisfies an equation of the form (2) and is ergodic. In this case, we also refer to (2) as an ergodic BSDE.

Ergodic backward SDEs driven by finite- or infinite-dimensional Brownian motion have been considered, for instance, in Buckdahn and Peng (1999), Fuhrmann, Hu and Tessitore (2009), Richou (2009), Debussche, Hu and Tessitore (2011), Hu and Wang (2018), Madec (2015), Hu et al. (2015), Liang and Zariphopoulou (2017), Chong et al. (2019), Hu and Lemonnier (2019), Hu, Liang and Tang (2020) and Guatteri and Tessitore (2020).

The literature on statistical inference for BSDEs is limited. For nonparametric estimation of linear drivers, see Su and Lin (2009), Chen and Lin (2010) and Zhang (2013). Zhang and Lin (2014) propose two terminal-dependent estimation methods for integral forms of backward SDEs. Song (2014) gives results under independence assumptions. These works consider non-ergodic BSDEs and therefore need additional assumptions. In this work, we instead show asymptotic results for an infinite time horizon under ergodicity assumptions on the data generating process. Even when restricted to conventional SDEs, our results enable drift parameter estimation with an unknown volatility process, unlike previous studies (see Example 1 in Section 3).

The paper is structured as follows. Section 2 describes the setting and our assumptions and states the main results. Section 3 gives a number of applications and examples. Section 4 contains numerical studies in the one- and multidimensional case. The proofs can be found in Section 5.

## 2 Main results

Given a probability space $(\Omega, \mathcal{F}, P)$ with a right-continuous filtration $\mathbf{F} = (\mathcal{F}_t)_{t \ge 0}$, let $Y$ be a multidimensional $\mathbf{F}$-adapted process satisfying

$$Y_t = Y_T - \int_t^T \psi(X_s, Y_s, V_sV_s^\top, \theta_0)\,ds - \int_t^T V_s\,dW_s, \qquad 0 \le t \le T < \infty,$$

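where $W$ is a multidimensional standard $\mathbf{F}$-Wiener process, $\theta_0 \in \Theta$ is an unknown parameter, $\Theta$ is a bounded open subset of a Euclidean space, $\psi$ is a vector-valued function, $X$ is a multidimensional continuous $\mathbf{F}$-adapted process, and $V$ is a matrix-valued continuous $\mathbf{F}$-adapted process. The dimension of $X$ is possibly zero; in that case, we ignore $X$. We observe $(X_{jh_n}, Y_{jh_n})_{0 \le j \le n}$ and consider the asymptotics $h_n \to 0$ and $nh_n \to \infty$ as $n \to \infty$.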

We construct a maximum-likelihood-type estimator for the parameter $\theta_0$. For this purpose, we construct a quasi-likelihood function $H_n$. Let for a stochastic process . Let $(c_n)_{n \ge 1}$ be a sequence of positive integers such that

$$c_n n^{-\epsilon} \to \infty \quad\text{and}\quad c_nh_nn^{\epsilon} \to 0, \qquad (3)$$

for some $\epsilon > 0$. Let , , and let

$$\hat Z_l = \frac{1}{c_nh_n}\sum_{m=1}^{c_n}(Y_{t^l_m} - Y_{t^l_{m-1}})(Y_{t^l_m} - Y_{t^l_{m-1}})^\top \qquad (0 \le l \le L_n - 1),$$

where $\top$ denotes transpose. We define a quasi-log-likelihood function by

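$$H_n(\theta) = -\frac{1}{2}\sum_{l=1}^{L_n-1}\Big\{(\Delta_l Y - c_nh_n\hat\psi_l(\theta))^\top\frac{\hat Z_{l-1}^{-1}}{c_nh_n}(\Delta_l Y - c_nh_n\hat\psi_l(\theta))\Big\}1_{\{\det\hat Z_{l-1} > 0\}}, \qquad (4)$$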

where is the closure of and . Let for a stochastic process .

Then we can construct a maximum-likelihood-type estimator $\hat\theta_n$ as a random variable which maximizes $H_n$ on the closure of $\Theta$.
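The construction of $\hat Z_l$, $H_n$ and $\hat\theta_n$ can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' implementation, and it rests on assumptions the text above does not fix: observations $Y_{jh_n}$ on the grid $t_j = jh_n$, block points $t^l_m = (lc_n + m)h_n$, $\Delta_l Y = Y_{t^l_0} - Y_{t^{l-1}_0}$ as in (11), and a hypothetical drift approximation $\hat\psi_l(\theta) = \psi(\hat Z_{l-1}, \theta)$ that ignores any dependence on $X$ and $Y$. A grid search stands in for the maximization over the closure of $\Theta$.

```python
import numpy as np


def block_covariances(Y, c, h):
    """Return (hat Z_l)_{l=0}^{L_n-1} and the block increments Y_{t^{l+1}_0} - Y_{t^l_0}.

    Y holds the observations Y_{j h}, j = 0..n, with shape (n+1,) or (n+1, dim).
    Assumes t^l_m = (l c + m) h, so block l uses the increments j = l c, ..., l c + c - 1.
    """
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]                          # scalar path -> one-dimensional vectors
    dY = np.diff(Y, axis=0)                     # dY[j] = Y_{t_{j+1}} - Y_{t_j}
    L = dY.shape[0] // c                        # number of complete blocks L_n
    blocks = dY[: L * c].reshape(L, c, -1)      # blocks[l, m-1] = Y_{t^l_m} - Y_{t^l_{m-1}}
    Zhat = np.einsum("lmi,lmj->lij", blocks, blocks) / (c * h)
    return Zhat, blocks.sum(axis=1)


def quasi_log_likelihood(theta, Y, c, h, psi):
    """H_n(theta) from (4) with the hypothetical choice psi_hat_l(theta) = psi(hat Z_{l-1}, theta)."""
    Zhat, incr = block_covariances(Y, c, h)
    Z_prev, dlY = Zhat[:-1], incr[:-1]          # (hat Z_{l-1}, Delta_l Y) for l = 1..L_n-1
    keep = np.linalg.det(Z_prev) > 0            # indicator 1{det hat Z_{l-1} > 0}
    Z_prev, dlY = Z_prev[keep], dlY[keep]
    drift = np.array([np.atleast_1d(psi(z, theta)) for z in Z_prev])
    r = dlY - c * h * drift                     # residuals Delta_l Y - c h psi_hat_l(theta)
    quad = np.einsum("li,li->l", r, np.linalg.solve(Z_prev, r[..., None])[..., 0])
    return -0.5 * quad.sum() / (c * h)


def qmle_grid(Y, c, h, psi, grid):
    """Grid-search stand-in for maximising H_n over the closure of Theta."""
    values = [quasi_log_likelihood(t, Y, c, h, psi) for t in grid]
    return float(grid[int(np.argmax(values))])


# Toy usage (hypothetical): Brownian motion with drift theta, i.e. psi(z, theta) = theta.
rng = np.random.default_rng(1)
n, h, c, theta = 100_000, 0.01, 50, 1.0
Y = np.concatenate([[0.0], np.cumsum(theta * h + np.sqrt(h) * rng.standard_normal(n))])
est = qmle_grid(Y, c, h, lambda z, th: th, np.linspace(0.0, 2.0, 101))
```

For a drift that is linear in $\theta$, as in the toy usage, $H_n$ is a concave quadratic in $\theta$, so the grid search could be replaced by a closed-form weighted least-squares solution.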

Let be the space of symmetric, positive definite matrices.

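For a vector $v = (v_1, \ldots, v_k)$ and a $k_1 \times k_2$ matrix $m = (m_{ij})$, we denote

$$\partial^l_v = \Big(\frac{\partial^l}{\partial v_{i_1}\cdots\partial v_{i_l}}\Big)_{i_1,\ldots,i_l=1}^{k} \quad\text{and}\quad \partial^l_m = \Big(\frac{\partial^l}{\partial m_{i_1j_1}\cdots\partial m_{i_lj_l}}\Big)_{1\le i_1,\ldots,i_l\le k_1,\ 1\le j_1,\ldots,j_l\le k_2}.$$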

We assume that $\Theta$ admits Sobolev's inequality, that is, for any $p$ larger than the dimension of $\Theta$, there exists a positive constant $C$ depending only on $p$ and $\Theta$ such that

$$\sup_{x\in\Theta}|u(x)| \le C\sum_{k=0,1}\Big(\int_\Theta|\partial^k_x u(x)|^p\,dx\Big)^{1/p} \qquad (5)$$

for any $u \in C^1(\Theta)$. Sobolev's inequality is satisfied if $\Theta$ has a Lipschitz boundary (see Adams and Fournier (2003)).

Let be the closure of in , and for any , where is the unit matrix. For and , we consider the following assumptions.

Assumption (A1-).

almost surely and there exists a positive constant such that

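$$E[|V_t - V_s|^{2p}]^{1/(2p)} + E[|X_t - X_s|^p]^{1/p} \le C|t-s|^{1/2}, \qquad E\Big[\Big|\frac{E[V_t - V_s \mid \mathcal{F}_s]}{t-s}\Big|^{2p}\Big] \le C, \qquad E[|X_s|^p] \vee E[|V_s|^{2p}] \vee E[|Y_s|^p] \le C,$$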

for .

Assumption (A2-).

exists and is continuous on for , and there exists a constant such that

$$|\partial^l_\theta\psi(x,y,z,\theta)| \le C(1+|x|+|y|+|z|)^r.$$

Moreover, for any , there exists a constant such that

for , , , , , and .

Assumption (A3-).

At least one of the following two conditions holds true.

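1. The function $\psi$ does not depend on $y$, and $(X_t, V_tV_t^\top)$ is ergodic, that is, there exists an invariant distribution $\pi$ such that for any measurable function $f$,

   $$\frac{1}{T}\int_0^T f(X_t, V_tV_t^\top)\,dt \xrightarrow{P} \int f(x,z)\,\pi(dx\,dz),$$

   as $T \to \infty$. Moreover,

   $$\int\Big(1 + |x| + \frac{|z|}{(\det z)\wedge 1}\Big)^p\,\pi(dx\,dz) < \infty.$$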
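2. $(X_t, Y_t, V_tV_t^\top)$ is ergodic, that is, there exists an invariant distribution $\pi$ such that for any measurable function $f$,

   $$\frac{1}{T}\int_0^T f(X_t, Y_t, V_tV_t^\top)\,dt \xrightarrow{P} \int f(x,y,z)\,\pi(dx\,dy\,dz),$$

   as $T \to \infty$. Moreover,

   $$\int\Big(1 + |x| + |y| + \frac{|z|}{(\det z)\wedge 1}\Big)^p\,\pi(dx\,dy\,dz) < \infty.$$
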
Assumption (A4).

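(Identifiability condition) For $\theta$ in the closure of $\Theta$, $\psi(x,y,z,\theta) = \psi(x,y,z,\theta_0)$ for all $(x,y,z)$ on the support of $\pi$ implies $\theta = \theta_0$.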

Most of the above assumptions are standard in the asymptotic theory of maximum-likelihood-type estimation for ergodic diffusion processes, and similar (or stronger) assumptions are required in Kessler (1997) and Uchida and Yoshida (2012). A similar statement applies to Condition (A2-) appearing later. Here, the upper bound of in (A2-) depends on . While this assumption is not a typical one, it ensures that (A2-) is satisfied even in the case that is not smooth at (for example, with ). For sufficient conditions of ergodicity for , we refer readers to Remark 1 of Uchida and Yoshida (2012).
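As a concrete instance of the ergodic averages in (A3), consider a Vasicek process $dX_t = \kappa(\mu - X_t)\,dt + \sigma\,dB_t$ as in Section 4.1 (the symbols $\kappa$, $\mu$, $\sigma$, $B$ are our notation). For $\kappa > 0$ it has the Gaussian invariant distribution $N(\mu, \sigma^2/(2\kappa))$, so the time average of $X_t^2$ should approach $\mu^2 + \sigma^2/(2\kappa)$; in model (9) of Section 4.1, $V_tV_t^\top = |X_t| + 0.1$ is a function of $X_t$, so such averages are exactly what (A3) requires. The sketch below checks this numerically with hypothetical parameter values, using the exact AR(1) update of the Vasicek process.

```python
import math

import numpy as np


def simulate_vasicek(kappa, mu, sigma, x0, h, n, rng):
    """Exact discretisation of dX = kappa (mu - X) dt + sigma dB on the grid j h, j = 0..n."""
    a = math.exp(-kappa * h)
    sd = sigma * math.sqrt((1.0 - a * a) / (2.0 * kappa))   # conditional std over one step
    noise = rng.standard_normal(n)
    x = np.empty(n + 1)
    x[0] = x0
    for j in range(n):
        x[j + 1] = mu + (x[j] - mu) * a + sd * noise[j]
    return x


def time_average(f, x):
    """Left-point Riemann sum for (1/T) int_0^T f(X_t) dt on an equispaced grid."""
    return float(np.mean(f(x[:-1])))


kappa, mu, sigma = 1.0, 0.5, 0.3                    # hypothetical parameter values
rng = np.random.default_rng(0)
x = simulate_vasicek(kappa, mu, sigma, x0=mu, h=0.05, n=40_000, rng=rng)
avg = time_average(lambda v: v ** 2, x)             # (1/T) int X_t^2 dt with T = 2000
target = mu ** 2 + sigma ** 2 / (2 * kappa)         # int x^2 pi(dx) under N(mu, sigma^2/(2 kappa))
```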

Fix satisfying (3). Under the assumptions above, we obtain consistency of our estimator.

###### Theorem 2.1 (consistency).

Let $p$ and $r$ be such that

$$\frac{p}{4r} > d \vee \frac{2}{\epsilon} \vee 4. \qquad (6)$$

Assume (A1-), (A2-), (A3-), and (A4). Then $\hat\theta_n \xrightarrow{P} \theta_0$ as $n \to \infty$.

Under (A2-) and (A3-), we define

$$\Gamma = \int \partial_\theta\psi(x,z,\theta_0)^\top z^{-1}\partial_\theta\psi(x,z,\theta_0)\,\pi(dx\,dz)$$

if the function $\psi$ does not depend on $y$, and otherwise we define

$$\Gamma = \int \partial_\theta\psi(x,y,z,\theta_0)^\top z^{-1}\partial_\theta\psi(x,y,z,\theta_0)\,\pi(dx\,dy\,dz).$$
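When $\psi$ does not depend on $y$, the ergodicity in (A3) suggests approximating $\Gamma$ by a time average of $\partial_\theta\psi^\top z^{-1}\partial_\theta\psi$ along an observed path, with the Jacobian evaluated at $\theta_0$ (in practice at an estimate of it). The helper below is a hypothetical sketch of that plug-in, not part of the paper; `dpsi(x, z)` is a user-supplied Jacobian of $\psi$ with respect to $\theta$.

```python
import numpy as np


def gamma_time_average(dpsi, xs, zs):
    """(1/N) sum_j dpsi(x_j, z_j)^T z_j^{-1} dpsi(x_j, z_j), an ergodic proxy for Gamma.

    dpsi(x, z) returns the Jacobian of psi w.r.t. theta with shape (dim Y, dim theta)
    (scalars are promoted to 1x1); z is V V^T at the same time point.
    """
    total = None
    for x, z in zip(xs, zs):
        J = np.atleast_2d(dpsi(x, z))
        term = J.T @ np.linalg.solve(np.atleast_2d(z), J)   # J^T z^{-1} J
        total = term if total is None else total + term
    return total / len(zs)


zs = np.array([0.1, 0.5, 2.0])                  # hypothetical values of V_t V_t^T along a path
xs = np.zeros_like(zs)                          # x is unused by the two drifts below
g_sqrt = gamma_time_average(lambda x, z: np.sqrt(z), xs, zs)   # psi = theta sqrt(z)
g_lin = gamma_time_average(lambda x, z: z, xs, zs)             # psi = theta z
```

For $\psi = \theta\sqrt{z}$ every summand equals one, so the estimate is exactly $1$ whatever the path; for $\psi = \theta z$ it reduces to the path average of $z$.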

To deduce asymptotic normality of our estimator, we need a further condition. Let be an open set in such that .

Assumption (A2-).

(A2-) is satisfied. exists and is continuous on for and with , and for any , there exists a constant such that

$$|\partial^i_x\partial^j_y\partial^k_z\partial^l_\theta\psi(x,y,z,\theta)| \le C'_\delta(1+|x|+|y|+|z|)^r$$

for , , , and with .

Moreover, there exist a Wiener process $W'$ independent of $W$ and $\mathbf{F}$-progressively measurable processes $a^j$ for $j = 1, 2, 3$ such that

$$X_t = X_0 + \int_0^t a^1_s\,ds + \int_0^t a^2_s\,dW_s + \int_0^t a^3_s\,dW'_s,$$

and for any and .

Suppose that . Then we can choose in the definition of satisfying

$$nh_n^2c_n \to 0 \quad\text{and}\quad \sqrt{nh_n}/c_n \to 0. \qquad (7)$$

For such , fix satisfying (3).
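For concrete tuning sequences, conditions (3) and (7), together with $h_n \to 0$ and $nh_n \to \infty$, reduce to linear inequalities between exponents. The helper below assumes the hypothetical polynomial choice $h_n = n^{-a}$ and $c_n \asymp n^b$ (this parametrization is ours, not necessarily the one used in Section 4) and checks each condition with exact rational arithmetic.

```python
from fractions import Fraction


def rate_conditions(a, b, eps):
    """Check the limits for h_n = n^(-a), c_n ~ n^b and a fixed eps > 0 (exponents as Fractions).

    Each entry is True iff the corresponding power of n has the sign the limit requires.
    """
    return {
        "h_n -> 0": a > 0,
        "n h_n -> oo": a < 1,
        "(3): c_n n^(-eps) -> oo": b - eps > 0,
        "(3): c_n h_n n^eps -> 0": b - a + eps < 0,
        "(7): n h_n^2 c_n -> 0": 1 - 2 * a + b < 0,
        "(7): sqrt(n h_n) / c_n -> 0": (1 - a) / 2 - b < 0,
    }


ok = rate_conditions(Fraction(4, 5), Fraction(1, 4), Fraction(1, 10))
bad = rate_conditions(Fraction(3, 5), Fraction(1, 5), Fraction(1, 10))
```

Under this reading of (7), its two inequalities require $(1-a)/2 < b < 2a - 1$, which is possible only when $a > 3/5$, i.e. $nh_n^{5/3} \to 0$; the second call sits exactly on that boundary and fails both parts of (7).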

###### Theorem 2.2 (Asymptotic normality).

Let $p$ and $r$ be such that (6) is satisfied. Assume (A1-), (A2-), (A3-), (A4), and that as . Assume further that $\Gamma$ is positive definite and that $c_n$ satisfies (7). Then

$$\sqrt{nh_n}(\hat\theta_n - \theta_0) \xrightarrow{d} N(0, \Gamma^{-1}).$$

The condition is stronger than the ones in previous works (for instance in Yoshida (2011), and for in Uchida and Yoshida (2012) and Kessler (1997)). Unlike previous studies, we need to construct an estimator of whose structure is not specified. For this purpose, (7) and consequently is required.
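Theorem 2.2 suggests approximate confidence intervals: if $\Gamma$ is replaced by a consistent estimate (for instance the plug-in time average of its integrand, an extra step the theorem itself does not cover), the $i$-th coordinate of $\hat\theta_n$ has approximate standard error $\sqrt{[\Gamma^{-1}]_{ii}/(nh_n)}$. A minimal sketch using NumPy and the standard library:

```python
from statistics import NormalDist

import numpy as np


def wald_intervals(theta_hat, gamma_hat, n, h, level=0.95):
    """Coordinatewise intervals theta_hat_i +/- q * sqrt([Gamma^{-1}]_ii / (n h)) from Theorem 2.2."""
    theta_hat = np.atleast_1d(np.asarray(theta_hat, dtype=float))
    cov = np.linalg.inv(np.atleast_2d(gamma_hat)) / (n * h)   # approximate covariance of theta_hat
    q = NormalDist().inv_cdf(0.5 + level / 2.0)                # two-sided standard normal quantile
    half = q * np.sqrt(np.diag(cov))
    return theta_hat - half, theta_hat + half


lo, hi = wald_intervals(1.0, 1.0, n=1_000_000, h=0.01)        # scalar example with Gamma = 1
lo2, hi2 = wald_intervals([0.5, -1.0], np.diag([4.0, 1.0]), n=10_000, h=0.01)
```

The width scales like $1/\sqrt{nh_n}$, so it is governed by the observation horizon $nh_n$ rather than by the number $n$ of observations alone.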

###### Remark 2.1.

If is a diffusion process with SDE coefficients not depending on , then $\hat\theta_n$ is asymptotically efficient under the assumptions of Gobet (2002), because $\Gamma^{-1}$ corresponds to the efficient asymptotic variance in Gobet (2002).

## 3 Examples

1. The first example to which our results apply is a data generating process of the form

   $$X_0 = x_0, \qquad dX_t = \psi(t, X_t, \theta)\,dt + V_t\,dW_t,$$

where $V$ is an unknown predictable ergodic process. We remark that previous literature only treated the case with and known.

2. As a further example consider

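   $$dP_s := \begin{pmatrix} dP^1_s \\ dP^2_s \end{pmatrix} = \begin{pmatrix} \big(\mu P^1_s + \sqrt{Z^{1,1}_s}\sqrt{\nu_s}\,\theta_1\big)\,ds + \sqrt{Z^{1,1}_s}\,dW^1_s \\ \big(\mu P^2_s + \sqrt{Z^{2,1}_s}\sqrt{\nu_s}\,\theta_1 + \sqrt{Z^{2,2}_s}\sqrt{\nu_s}\,\theta_2\big)\,ds + \sqrt{Z^{2,1}_s}\,dW^1_s + \sqrt{Z^{2,2}_s}\,dW^2_s \end{pmatrix}. \qquad (8)$$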

This backward SDE is motivated by extending the evolution of a price process in the Heston model to a random and possibly arbitrarily large time horizon.

3. Ergodic BSDEs appear naturally in forward performance processes, which are utility functionals that do not depend on a specific time horizon; see, for instance, Hu, Liang and Tang (2020). Liang and Zariphopoulou (2017), for example, describe a forward performance process of factor form with being the ergodic solution of a BSDE with a quadratic driver function.

## 4 Simulation studies

In the sequel, we will consider different possibilities for our sequences converging to zero or to infinity. In particular, consider

Then we must have


Combining the three cases yields . Below, we try each of these combinations.

### 4.1 Simulation Results for the Vasicek model

Suppose that $X$ evolves according to the Vasicek model, that is, where is a standard Brownian motion, with parameters and . The initial value is set as . Let us estimate $\theta$ in the equation

$$dY_t = \theta\sqrt{|X_t| + 0.1}\,dt + \sqrt{|X_t| + 0.1}\,dW_t, \qquad (9)$$

where .
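For model (9), the quantities of Section 2 can be worked out by hand. Here $V_t = \sqrt{|X_t| + 0.1}$ is scalar, so $Z_t = V_tV_t^\top = |X_t| + 0.1$ and the drift is $\psi(x, z, \theta) = \theta\sqrt{z}$, which does not depend on $y$. The last two lines additionally assume the hypothetical approximation $\hat\psi_l(\theta) = \theta\sqrt{\hat Z_{l-1}}$ and that every indicator in (4) equals one; this is a sketch under those assumptions, not a statement from the paper.

```latex
\begin{aligned}
Z_t &= V_tV_t^\top = |X_t| + 0.1, \qquad
\psi(x, z, \theta) = \theta\sqrt{z}, \qquad
\partial_\theta\psi(x, z, \theta) = \sqrt{z},\\
\Gamma &= \int \sqrt{z}\,z^{-1}\sqrt{z}\;\pi(dx\,dz) = \int \pi(dx\,dz) = 1,\\
\partial_\theta H_n(\theta) &= \sum_{l=1}^{L_n-1}
  \frac{\sqrt{\hat Z_{l-1}}\,\big(\Delta_l Y - c_nh_n\theta\sqrt{\hat Z_{l-1}}\big)}{\hat Z_{l-1}}
  = \sum_{l=1}^{L_n-1}\Big(\frac{\Delta_l Y}{\sqrt{\hat Z_{l-1}}} - c_nh_n\theta\Big),\\
\hat\theta_n &= \frac{1}{c_nh_n(L_n-1)}\sum_{l=1}^{L_n-1}\frac{\Delta_l Y}{\sqrt{\hat Z_{l-1}}}.
\end{aligned}
```

Since $\Gamma = 1$ whatever the invariant distribution of $X$, Theorem 2.2, where its assumptions hold, gives an asymptotic standard deviation of $1/\sqrt{nh_n}$ for $\hat\theta_n$ in this model.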

In the following is set to be and to be . We consider integers and where to satisfy the conditions of Theorem 2.1 and Theorem 2.2 and . To find the pair which best estimates $\theta$, we run simulations for each combination and compute the average error, defined as the mean relative deviation (in percent) of $\hat\theta_n$ from $\theta$ over the values of $n$ simulated, that is,

$$\text{Error} = \frac{1}{|A|}\sum_{n\in A}\frac{|\hat\theta_n - \theta|}{\theta},$$

where $A$ denotes the set of $n$'s simulated. Two sets of $n$'s are considered: and . We let .

The results are summarized in the following tables.

From the tables it can be seen that the choices for and strongly matter. The pairs with give the smallest errors and estimate $\theta$ most accurately under both sets of $n$'s. When the simulations are repeated, any of the three pairs could result in the smallest error. Overall, for the same , the smaller is, the better the estimation of $\theta$.

Below, Figure 1 shows an analysis for the Vasicek model where and are chosen to be 6 and 13, respectively, with . The number of simulation times $n$ is taken from

$$\{1000, 2000, \ldots, 10000, 20000, \ldots, 100000, 200000, \ldots, 500000\}.$$

For each $n$, we repeat the simulation 500 times and calculate the mean error of the estimators $\hat\theta_n$.
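One replication of the experiment in this subsection might look as follows. This is a sketch under stated assumptions, not the authors' code: the Vasicek parameters $(\kappa, \mu, \sigma) = (1, 0.5, 0.3)$, $X_0 = 0.5$, $\theta = 1$, $h_n = 0.01$, $n = 200000$ and $c_n = 50$ are hypothetical (the values used above did not survive), the Brownian motion driving $X$ is taken independent of $W$, $Y$ is generated by an Euler scheme, and $\Delta_lY$ and $\hat Z_{l-1}$ are paired as in (11) with $\hat\psi_l(\theta) = \theta\sqrt{\hat Z_{l-1}}$. For this drift, setting $\partial_\theta H_n = 0$ gives the closed form $\hat\theta_n = (c_nh_n(L_n-1))^{-1}\sum_{l=1}^{L_n-1}\Delta_lY/\sqrt{\hat Z_{l-1}}$.

```python
import math

import numpy as np


def simulate_model_9(theta, kappa, mu, sigma, x0, h, n, rng):
    """X: exact Vasicek scheme; Y: Euler scheme for (9), with X's noise independent of W (assumption)."""
    a = math.exp(-kappa * h)
    sd = sigma * math.sqrt((1.0 - a * a) / (2.0 * kappa))
    noise_x = rng.standard_normal(n)
    x = np.empty(n + 1)
    x[0] = x0
    for j in range(n):
        x[j + 1] = mu + (x[j] - mu) * a + sd * noise_x[j]
    v = np.sqrt(np.abs(x[:-1]) + 0.1)               # V_t = sqrt(|X_t| + 0.1) at left grid points
    dY = theta * v * h + v * math.sqrt(h) * rng.standard_normal(n)
    y = np.concatenate([[0.0], np.cumsum(dY)])
    return x, y


def theta_hat_model_9(y, c, h):
    """Closed-form maximiser of H_n for psi = theta sqrt(z); Delta_l Y and hat Z_{l-1} share block l-1."""
    dY = np.diff(y)
    L = len(dY) // c
    blocks = dY[: L * c].reshape(L, c)
    z_hat = np.sum(blocks ** 2, axis=1) / (c * h)    # hat Z_l, l = 0..L-1
    incr = blocks.sum(axis=1)                        # Y_{t^{l+1}_0} - Y_{t^l_0}
    # pairs (Delta_l Y, hat Z_{l-1}) for l = 1..L-1 are (incr[k], z_hat[k]) for k = 0..L-2
    return float(np.sum(incr[:-1] / np.sqrt(z_hat[:-1])) / (c * h * (L - 1)))


def relative_error(estimates, theta):
    """Error = (1/|A|) sum_{n in A} |theta_hat_n - theta| / theta (multiply by 100 for percent)."""
    return float(np.mean(np.abs(np.asarray(estimates) - theta)) / theta)


rng = np.random.default_rng(0)
x, y = simulate_model_9(1.0, 1.0, 0.5, 0.3, 0.5, 0.01, 200_000, rng)
est = theta_hat_model_9(y, c=50, h=0.01)
```

Repeating the last two lines with fresh random numbers and feeding the estimates to `relative_error` mimics the averaging described above.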

### 4.2 The Heston model

Next, the two-dimensional case is simulated. The process evolves according to the Heston model, that is, , with parameters and , and the initial value is . We want to estimate $\theta_1$ and $\theta_2$ in equation (8), where . and remain 6 and 13, respectively, and .

The number of simulation times is set as

$$\{10000, 30000, \ldots, 90000, 100000, 300000, \ldots, 900000, 1000000, 2000000, \ldots, 5000000\}.$$

For each $n$, we repeat the simulation 500 times and calculate the mean absolute error (MAE) of the estimators. Figure 2 shows the result.

## 5 Proofs

In this section, we prove the results in Section 2. In Section 5.1, we introduce two functions $\check H_{n,\delta}$ and $\tilde H_{n,\delta}$ which are approximations of the quasi-log-likelihood $H_n$. The function $\check H_{n,\delta}$ is introduced to control the event that either $\hat Z_l$ or $Z_t$ is close to degenerate for some $l$ or $t$, and it is equal to $H_n$ except on that event. The function $\tilde H_{n,\delta}$ is obtained by replacing the estimator $\hat Z_{l-1}$ in $\check H_{n,\delta}$ with $Z_{t^{l-1}_0}$. In Section 5.2, we show that the difference of $\check H_{n,\delta}$ and $\tilde H_{n,\delta}$ can be asymptotically ignored, and we consequently obtain consistency of $\hat\theta_n$. To show Theorem 2.2, we need an accurate estimate of the difference of $\check H_{n,\delta}$ and $\tilde H_{n,\delta}$, which is given in Proposition 5.1 of Section 5.3. Together with the asymptotic estimate of $\tilde H_{n,\delta}$ in Lemma 5.3, we then obtain the desired results.

### 5.1 Approximation of $H_n$

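For a vector $v$ and a matrix $M$, $[M]_{ij}$ and $[v]_i$ denote the $(i,j)$-th element of $M$ and the $i$-th element of $v$, respectively. For $q > 0$ and a sequence $(p_n)$ of positive numbers, let us denote by $\bar R_{n,q}(p_n)$ and $\underline{R}_{n,q}(p_n)$ sequences of random variables (which may also depend on $\theta$ and $l$) satisfying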

$$\sup_{\theta,l}E[|p_n^{-1}\bar R_{n,q}(p_n)|^q]^{1/q} < \infty \quad\text{and}\quad \sup_{\theta,l}E[|p_n^{-1}\underline{R}_{n,q}(p_n)|^q]^{1/q} \to 0. \qquad (10)$$

Then (A1-) and (A2-) imply

$$\Delta_l Y = \int_{t^{l-1}_0}^{t^l_0}\psi(X_t, Y_t, V_tV_t^\top, \theta_0)\,dt + \int_{t^{l-1}_0}^{t^l_0}V_t\,dW_t = \bar R_{n,p/r}(\sqrt{c_nh_n}). \qquad (11)$$

Let $Z_t = V_tV_t^\top$. We first introduce a family of stopping times controlling the degeneracy of $\hat Z_l$ and $Z_t$. For any $\delta > 0$, let

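$$T_{n,\delta} = \inf\{t^{l+1}_0;\ 0 \le l \le L_n - 1,\ \hat Z_l \notin \mathcal{P}_\delta \text{ or } Z_t \notin \mathcal{P}_\delta \text{ for some } t \in [0, t^{l+1}_0]\},$$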

where $\inf\emptyset = +\infty$. Under (A1-), implies that and for because $Z$ has a continuous path.

Let , and let

and

When $\delta$ is sufficiently small and $n$ sufficiently large, $\check H_{n,\delta}$ coincides with $H_n$ with high probability (see (16)). $\tilde H_{n,\delta}$ is an approximation of $\check H_{n,\delta}$ which is useful when we deduce the asymptotic behavior.

The Burkholder-Davis-Gundy inequality and Jensen’s inequality yield

which implies that by (A1-). Similarly, (A1-) and (A2-) yield . Then Itô’s formula, the Cauchy-Schwarz inequality, (A1-), and (A2-) yield

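$$\begin{aligned}
\hat Z_l &= \frac{1}{c_nh_n}\sum_{m=1}^{c_n}(Y_{t^l_m} - Y_{t^l_{m-1}})(Y_{t^l_m} - Y_{t^l_{m-1}})^\top && (12)\\
&= \frac{1}{c_nh_n}\sum_{m=1}^{c_n}\Big\{\int_{t^l_{m-1}}^{t^l_m}Z_t\,dt + 2A_{l,m} + \bar R_{n,p/(2r)}(h_n^{3/2})\Big\}\\
&= Z_{t^l_0} + \frac{2}{c_nh_n}\sum_{m=1}^{c_n}A_{l,m} + \bar R_{n,p/(2r)}(\sqrt{c_nh_n})\\
&= Z_{t^l_0} + \bar R_{n,p/(2r)}(c_n^{-1/2} + \sqrt{c_nh_n}), && (13)
\end{aligned}$$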

where

$$[A_{l,m}]_{ij} = \frac{1}{2}\sum_k\int_{t^l_{m-1}}^{t^l_m}\big([Y_t - Y_{t^l_{m-1}}]_i[V_t]_{jk} + [Y_t - Y_{t^l_{m-1}}]_j[V_t]_{ik}\big)\,d[W_t]_k.$$

Therefore, for any and , we obtain

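$$P\big(\max_l|\hat Z_l - Z_{t^l_0}| > \delta\big) \le \delta^{-q}\sum_l E[|\hat Z_l - Z_{t^l_0}|^q] = O\big(L_n(c_n^{-1/2} + \sqrt{c_nh_n})^q\big) \to 0, \qquad (14)$$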

as if .

Then (A1-) yields

$$\lim_{\delta\to0}\liminf_{n\to\infty}P(T_{n,\delta} = +\infty) = 1, \qquad (15)$$

and therefore, we have

$$\lim_{\delta\to0}\liminf_{n\to\infty}P\big(\check H_{n,\delta}(\theta) = H_n(\theta)\text{ for any }\theta\big) = 1. \qquad (16)$$

Equation (16) implies that the asymptotic behavior of $H_n$ is essentially the same as that of the more tractable $\check H_{n,\delta}$ for sufficiently small $\delta$. In Lemma 5.1 of the following section, we further show that $\check H_{n,\delta}$ is asymptotically equivalent to $\tilde H_{n,\delta}$.
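The expansion (12)-(13) says that $\hat Z_l$ differs from $Z_{t^l_0}$ by a martingale term of order $c_n^{-1/2}$ plus a term of order $\sqrt{c_nh_n}$. In the simplest hypothetical case $Y = \sigma W$ (constant $Z = \sigma^2$, no drift), only the martingale term is present and $\hat Z_l/\sigma^2$ is exactly $\chi^2_{c_n}/c_n$-distributed, with standard deviation $\sqrt{2/c_n}$. The sketch below checks this numerically; it illustrates the rate, not the proof.

```python
import numpy as np


def zhat_spread(sigma, c, h, n_blocks, rng):
    """Sample std of hat Z_l / sigma^2 when Y = sigma W (so Z_t = sigma^2 and psi = 0)."""
    incr = sigma * np.sqrt(h) * rng.standard_normal((n_blocks, c))   # Y_{t^l_m} - Y_{t^l_{m-1}}
    z_hat = np.sum(incr ** 2, axis=1) / (c * h)                       # hat Z_l, l = 0..n_blocks-1
    return float(np.std(z_hat / sigma ** 2))


rng = np.random.default_rng(0)
spreads = {c: zhat_spread(0.7, c, 0.01, 4000, rng) for c in (25, 100)}
theory = {c: np.sqrt(2.0 / c) for c in spreads}                       # std of chi^2_c / c
```

Quadrupling $c_n$ halves the spread, in line with the $c_n^{-1/2}$ term; the $\sqrt{c_nh_n}$ term only appears once $Z_t$ varies in time.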

### 5.2 Proof of consistency

###### Lemma 5.1.

Let $p$ and $r$ be such that (6) is satisfied. Assume (A1-) and (A2-). Then

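$$(nh_n)^{-1}\sup_\theta\big|\check H_{n,\delta}(\theta) - \check H_{n,\delta}(\theta_0) - \tilde H_{n,\delta}(\theta) + \tilde H_{n,\delta}(\theta_0)\big| \xrightarrow{P} 0, \qquad (17)$$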

as $n \to \infty$ for any $\delta > 0$.

###### Proof.

By the definitions of $\check H_{n,\delta}$ and $\tilde H_{n,\delta}$, we can decompose the difference as

 ˇHn,δ(θ)−ˇHn,δ(θ0)−~Hn,δ(θ)+~Hn,δ(θ0) =−cnhn2Ln−1∑l=1(~ψl(θ)⊺(^Z−1l−1−Z−1tl−10)~ψl(θ)−~ψl(θ0)⊺(^Z−1l−1−Z−1tl−10)~ψl(θ0))1{tl0