1 Introduction
The paper analyzes statistical inference for Markovian ergodic forward-backward stochastic differential equations (BSDEs). Ergodic solutions of backward SDEs may be seen as a generalization of an ergodic Markovian diffusion process with an unknown but ergodic diffusion part. Specifically, consider a probability space
with a filtration generated by a dimensional Brownian motion . Let be a dimensional Markov diffusion process depending on an unknown parameter . In the sequel, will also be referred to as the data generating process. In the classical statistical inference problem for stochastic processes, satisfies a stochastic differential equation of the form (1)
where and are known functions and is assumed to be known as well. A classical example for is given by a Brownian motion with drift or, rather popular in finance, a geometric Brownian motion. Statistical inference results for (1) are analyzed through quasi-maximum likelihood methods in Yoshida (1992, 2011), Kessler (1997) and Uchida and Yoshida (2012); they have been extended to jump-diffusion processes by Shimizu and Yoshida (2006) and Ogihara and Yoshida (2011). Now assume that the diffusion function in (1) is unknown and that we only know that the integrand of the diffusion part is a positive definite valued ergodic predictable process, say, bounded away from zero. This leads to the stochastic differential equation
where may be identified with a triangular ergodic stochastic process. Next, suppose that we additionally allow the integrand of the drift to depend also on and, furthermore, on an observed additional Markov process . Then satisfies
(2) 
This equation is also called a backward stochastic differential equation with solution and driver function . The goal of this paper is to establish consistency and asymptotic normality results for estimating in (2) with data generating processes and discrete-time observations.
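A data generating process of the type in (1) can be produced numerically with the Euler-Maruyama scheme. The sketch below uses a hypothetical linear drift and constant diffusion as placeholders; it does not reproduce the paper's model, only the general simulation mechanism.

```python
import numpy as np

def euler_maruyama(theta, x0=1.0, T=100.0, n=10_000, sigma=1.0, seed=0):
    """Simulate dX_t = -theta * X_t dt + sigma dW_t on [0, T] with n
    Euler steps (hypothetical coefficients standing in for (1))."""
    rng = np.random.default_rng(seed)
    h = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        x[i + 1] = x[i] - theta * x[i] * h + sigma * np.sqrt(h) * rng.standard_normal()
    return x
```

Discrete observations of such a path at times , as assumed below, are then simply the entries of the returned array.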
BSDEs have been introduced by Pardoux and Peng (1990) and have since been extended in many directions regarding assumptions on the driver function, connections to PDEs and Hamilton-Jacobi-Bellman equations, applications to stochastic optimal control theory, smoothness of , robustness, numerical approximations and invariance principles. Although originally developed for a finite maturity, in many situations the terminal time is either random or there is no natural terminal time at all, and the decision maker instead faces an infinite time horizon. Usually, in the theory of BSDEs, existence and uniqueness of a solution can be guaranteed by Lipschitz conditions on the driver. Now for an infinite time horizon the BSDE may be ill-posed, which has been addressed by Briand and Hu (1998) by imposing a monotonicity assumption on the driver. However, for our statistical analysis we will simply assume that the data generating process satisfies an equation of the form (2) and is ergodic. In this case we also refer to (2) as an ergodic BSDE.
Ergodic backward SDEs for finite- or infinite-dimensional Brownian motion have for instance been considered in Buckdahn and Peng (1999), Fuhrman, Hu and Tessitore (2009), Richou (2009), Debussche, Hu and Tessitore (2011), Hu and Wang (2018), Madec (2015), Hu et al. (2015), Liang and Zariphopoulou (2017), Chong et al. (2019), Hu and Lemonnier (2019), Hu, Liang and Tang (2020) and Guatteri and Tessitore (2020).
For statistical inference on BSDEs there is in general not much literature available. For nonparametric estimation of linear drivers see Su and Lin (2009), Chen and Lin (2010) and Zhang (2013). Zhang and Lin (2014) propose two terminal-dependent estimation methods for integral forms of backward SDEs. Song (2014) gives results under independence assumptions. These works consider BSDEs which are nonergodic and therefore need additional assumptions. In this work we instead show asymptotic results for an infinite time horizon under ergodicity assumptions on . Even when restricted to conventional SDEs, our results enable drift parameter estimation with an unknown volatility process, unlike previous studies (see Example 1 in Section 3).
2 Main results
Given a probability space with a right-continuous filtration , let be a dimensional adapted process satisfying
where is a dimensional standard Wiener process (), is an unknown parameter, is a bounded open subset of , is an valued function, is a dimensional continuous adapted process, and is a matrix-valued continuous adapted process. The dimension of is possibly zero; in that case, we ignore . We observe , and consider the asymptotics and as .
We construct a maximum-likelihood-type estimator for the parameter . For this purpose, we construct a quasi-likelihood function . Let for a stochastic process . Let be a sequence of positive integers such that
(3) 
for some . Let , , and let
where denotes transpose. We define a quasi-log-likelihood function by
(4) 
where is the closure of and . Let for a stochastic process .
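Since the display in (4) is not reproduced above, the following is only a schematic one-dimensional sketch of a Gaussian quasi-log-likelihood in which the unknown spot variance is replaced by a local realized-variance proxy built from blocks of past increments. The blocking scheme, the function name, and the drift argument are our own hypothetical choices, not the paper's exact construction.

```python
import numpy as np

def quasi_log_likelihood(theta, x, h, k, drift):
    """Schematic scalar quasi-log-likelihood: the unknown squared
    volatility is replaced by a realized-variance proxy from the k
    previous increments (a hypothetical stand-in for (4))."""
    dx = np.diff(x)
    ll = 0.0
    for i in range(k, len(dx)):
        v_hat = np.sum(dx[i - k:i] ** 2) / (k * h)  # local variance proxy
        resid = dx[i] - drift(x[i], theta) * h       # drift-adjusted increment
        ll += -0.5 * np.log(v_hat) - resid ** 2 / (2.0 * v_hat * h)
    return ll
```

A maximum-likelihood-type estimator then maximizes such a function over the closure of the parameter set, for instance on a grid.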
Let be the space of symmetric, positive definite matrices.
For a vector and a matrix , we denote
We assume that admits Sobolev's inequality, that is, for any , there exists a positive constant depending only on and such that
(5) 
for any . Sobolev's inequality is satisfied if has a Lipschitz boundary (see Adams and Fournier (2003)).
Let be the closure of in , and for any , where is the unit matrix. For and , we consider the following assumptions.
 Assumption (A1).

almost surely and there exists a positive constant such that
for .
 Assumption (A2).

exists and is continuous on for , and there exists a constant such that
Moreover, for any , there exists a constant such that
for , , , , , and .
 Assumption (A3).

At least one of the following two conditions holds true.

The function does not depend on and is ergodic, that is, there exists an invariant distribution such that for any measurable function ,
as . Moreover,

is ergodic, that is, there exists an invariant distribution such that for any measurable function ,
as . Moreover,

 Assumption (A4).

(Identifiability condition) For , for all on implies .
Most of the above assumptions are standard in the asymptotic theory of maximum-likelihood-type estimation for ergodic diffusion processes, and similar (or stronger) assumptions are required in Kessler (1997) and Uchida and Yoshida (2012). A similar statement applies to Condition (A2′) appearing later. Here, the upper bound of in (A2) depends on . While this assumption is not a typical one, it ensures that (A2) is satisfied even in the case that is not smooth at (for example, with ). For sufficient conditions for ergodicity of , we refer readers to Remark 1 of Uchida and Yoshida (2012).
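The ergodicity required in (A3) can be illustrated numerically. The snippet below uses a hypothetical Ornstein-Uhlenbeck process (not the paper's model), whose invariant law is N(0, 1/2), so time averages of the path and of its square approach 0 and 1/2, respectively.

```python
import numpy as np

# Illustration of the time-average property in (A3) for a hypothetical
# Ornstein-Uhlenbeck process dX_t = -X_t dt + dW_t with invariant law
# N(0, 1/2): ergodic averages converge to invariant-distribution means.
rng = np.random.default_rng(42)
h, n = 0.01, 200_000
x = np.empty(n)
x[0] = 0.0
for i in range(n - 1):
    x[i + 1] = x[i] - x[i] * h + np.sqrt(h) * rng.standard_normal()
time_avg = x.mean()              # converges to the invariant mean 0
second_moment = (x ** 2).mean()  # converges to the invariant second moment 1/2
```

The same time-average convergence, applied to suitable functionals of the data generating process, is what the limit objects in Theorems 2.1 and 2.2 rest on.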
Fix satisfying (3). Under the assumptions above, we obtain consistency of our estimator.
Theorem 2.1 (consistency).
Let such that
(6) 
Assume (A1), (A2), (A3), and (A4). Then as .
Under (A2) and (A3), we define
if the function does not depend on , and otherwise we define
To deduce asymptotic normality of our estimator, we need a further condition. Let be an open set in such that .
 Assumption (A2′).

(A2) is satisfied. exists and is continuous on for and with , and for any , there exists a constant such that
for , , , and with .
Moreover, there exist a Wiener process independent of and progressively measurable processes for such that
and for any and .
Suppose that . Then we can choose in the definition of satisfying
(7) 
For such , fix satisfying (3).
Theorem 2.2 (Asymptotic normality).
The condition is stronger than the ones in previous works (for instance, in Yoshida (2011), and for in Uchida and Yoshida (2012) and Kessler (1997)). Unlike previous studies, we need to construct an estimator of , whose structure is not specified. For this purpose, (7), and consequently , is required.
Remark 2.1.
If is a diffusion process with SDE coefficients not depending on , then is asymptotically efficient under the assumptions of Gobet (2002), because
corresponds to the efficient asymptotic variance in Gobet (2002).
3 Examples

The first example to which our results apply is a data generating process of the form
where is an unknown predictable ergodic process. We remark that the previous literature treated only the case with and known.

As a further example consider
(8) with . This backward SDE is motivated by extending the evolution of a price process in the Heston model to a random and possibly arbitrarily large time horizon.

Ergodic BSDEs appear naturally in forward performance processes, which are utility functionals that do not depend on the specific time horizon; see for instance Hu, Liang and Tang (2020). In Liang and Zariphopoulou (2017), for instance, a forward performance process is described which has the factor form with being the ergodic solution of a BSDE with quadratic driver function.
4 Simulation studies
In the sequel, we will consider different possibilities for our sequences converging to zero or to infinity. In particular, consider
Then we must have

Combining the three cases yields . We will try each of these combinations below.
4.1 Simulation Results for the Vasicek model
Suppose that evolves according to the Vasicek model, that is, where is a standard Brownian motion, with parameters and . The initial value is set as . Let us estimate in the equation
(9) 
where .
In the following, is set to be and to be . We consider integers and , where , to satisfy the conditions of Theorem 2.1 and Theorem 2.2 and . To find the pair of that estimates best, we run simulations for each combination of and calculate the average of the errors, namely the sum of the percentage differences between and over the 's simulated, that is,
where denotes the set of ’s simulated. Two sets of ’s are considered: and . We let .
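One natural reading of this error measure is the mean absolute percentage difference between the estimates and the true value. A minimal sketch (the function name and signature are ours, not the paper's):

```python
import numpy as np

def mean_percentage_error(estimates, true_value):
    """Average of |estimate - true| / |true|, in percent, over all
    simulated estimates (one reading of the tables' error measure)."""
    estimates = np.asarray(estimates, dtype=float)
    return 100.0 * float(np.mean(np.abs(estimates - true_value) / abs(true_value)))
```

For example, estimates 1.1 and 0.9 of a true value 1.0 give an average error of 10 percent.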
The results are summarized in the following tables.
13  14  15  16  17  18  19  

1  4.79986  4.39994  
2  0.55168  0.59204  0.64261  0.63193  
3  0.13564  0.19179  0.17408  0.43068  0.45217  0.82545  
4  0.065  0.16896  0.0839  0.21815  0.36891  0.46106  0.86921  
5  0.11211  0.14296  0.24044  0.29471  0.30672  0.36704  0.72157  
6  0.07487  0.10097  0.21671  0.19126  0.2234  0.44338  0.57126  
7  0.10343  0.16694  0.20898  0.19727  0.48259  0.55946  
8  0.1056  0.22114  0.24371  0.25512  0.63417  0.7991  
9  0.11754  0.19612  0.29589  0.32613  0.51654  
10  0.14666  0.17857  0.24282  0.18316  0.56393  
11  0.31039  0.22011  0.63986  0.71099  
12  0.23643  0.22018  0.31369  0.51456  
13  0.40641  0.50407  0.43586  
14  0.27931  0.50327  0.29167  
15  0.43433  0.38009  
16  0.52718  0.41497  
17  0.65534  
18  0.52093  
13  14  15  16  17  18  19  

1  1.97497  2.87813  
2  0.25113  0.41856  0.53016  1.08368  
3  0.06392  0.16965  0.20284  0.36099  0.47998  0.82778  
4  0.05567  0.06933  0.09849  0.1913  0.21299  0.51363  0.61066  
5  0.08798  0.06048  0.10773  0.19639  0.27578  0.2966  0.65836  
6  0.06242  0.10747  0.10952  0.19988  0.27281  0.32791  0.53102  
7  0.08838  0.12689  0.08608  0.18994  0.23873  0.44328  
8  0.05909  0.16884  0.20834  0.29658  0.44631  0.73857  
9  0.17656  0.15423  0.28707  0.33089  0.70613  
10  0.13615  0.21278  0.19562  0.38462  0.58632  
11  0.09943  0.15424  0.48022  0.71004  
12  0.17643  0.38302  0.32119  0.57695  
13  0.23213  0.22146  0.54199  
14  0.19692  0.47462  0.54148  
15  0.40643  0.38577  
16  0.33587  0.66682  
17  0.51917  
18  0.92505  
From the tables it can be seen that the choices of and matter strongly. The pairs with give the smallest error and estimate most accurately under both sets of 's. When the simulations are repeated, any of the three pairs can result in the smallest error. Overall, for the same , the smaller is, the better the estimation of is.
Below, Figure 1 shows an analysis for the Vasicek model where and are chosen to be 6 and 13, respectively, with . The number of simulations is set as
For each , we repeat the process 500 times and calculate the Mean Error of the estimators 's.
4.2 The Heston model
Next, the two-dimensional case is simulated. The process evolves according to the Heston model, that is, , with parameters and , and the initial value is . We want to estimate and in equation (8), where . and remain 6 and 13, respectively, and .
The number of simulation times is set as
For each , we repeat the process 500 times and calculate the Mean Absolute Error (MAE) of the estimators 's. Figure 2 shows the result.
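Heston paths of the kind used in this experiment can be simulated with an Euler scheme; the sketch below uses full truncation to keep the variance coordinate usable. All parameter values are placeholders, since the experiment's actual values are not reproduced above.

```python
import numpy as np

def heston_paths(mu, kappa, theta_v, xi, rho, s0, v0, T, n, seed=0):
    """Euler scheme with full truncation for the Heston model
       dS_t = mu S_t dt + sqrt(V_t) S_t dW^1_t,
       dV_t = kappa (theta_v - V_t) dt + xi sqrt(V_t) dW^2_t,
    where corr(W^1, W^2) = rho.  All inputs are placeholder values."""
    rng = np.random.default_rng(seed)
    h = T / n
    s = np.empty(n + 1)
    v = np.empty(n + 1)
    s[0], v[0] = s0, v0
    for i in range(n):
        z1 = rng.standard_normal()
        z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal()
        vp = max(v[i], 0.0)  # full truncation: negative variance is floored
        s[i + 1] = s[i] + mu * s[i] * h + np.sqrt(vp * h) * s[i] * z1
        v[i + 1] = v[i] + kappa * (theta_v - vp) * h + xi * np.sqrt(vp * h) * z2
    return s, v
```

Repeating such a simulation and averaging the absolute estimation errors over the replications yields the MAE reported in Figure 2.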
5 Proofs
In this section, we prove the results of Section 2. In Section 5.1, we introduce two functions and , which are approximations of the quasi-log-likelihood . The function is introduced to control the event that either or is close to degenerate for some or , and is equal to except on that event. The function is obtained by replacing the estimator in with . In Section 5.2, we show that the difference between and can be asymptotically ignored, and we consequently obtain consistency of . To show Theorem 2.2, we need an accurate estimate for the difference between and , which is given in Proposition 5.1 of Section 5.3. Together with the asymptotic estimate of in Lemma 5.3, we then obtain the desired results.
5.1 Approximation of
For a vector and a matrix , and denote element of a matrix and th element of , respectively. For and a sequence of positive numbers, let us denote by and sequences of random variables (which may also depend on and ) satisfying
(10) 
Then (A1) and (A2) imply
(11) 
Let . We first introduce a family of stopping times controlling the degeneracy of and . For any , let
where . Under (A1), implies that and for because has a continuous path.
Let , and let
and
When is sufficiently small and sufficiently large, corresponds to with high probability (see (16)). is an approximation of which is useful when we deduce the asymptotic behavior.
The Burkholder-Davis-Gundy inequality and Jensen's inequality yield
which implies that by (A1). Similarly, (A1) and (A2) yield . Then Itô's formula, the Cauchy-Schwarz inequality, (A1), and (A2) yield
(12)  
(13) 
where
Therefore, for any and , we obtain
(14) 
as if .
Then (A1) yields
(15) 
and therefore, we have
(16) 
5.2 Proof of consistency
Lemma 5.1.
Proof.
By the definitions of and , we can decompose the difference as