The paper analyzes statistical inference for Markovian ergodic forward-backward stochastic differential equations (BSDEs). Ergodic solutions of
backward SDEs may be seen as a generalization of an ergodic Markovian diffusion process with an unknown but ergodic diffusion part. Specifically, consider a probability space with filtration generated by a -dimensional Brownian motion . Let be a -dimensional Markov diffusion process depending on an unknown parameter . In the sequel, will also be referred to as the data generating process. In the classical statistical inference problem for stochastic processes, satisfies a stochastic differential equation of the form
where and are known functions, and is assumed to be known as well. A classical example for is a Brownian motion with drift or, popular in finance, a geometric Brownian motion. Statistical inference for (1) is analyzed through quasi-maximum likelihood methods in Yoshida (1992, 2011), Kessler (1997) and Uchida and Yoshida (2012); these results have been extended to jump-diffusion processes by Shimizu and Yoshida (2006) and Ogihara and Yoshida (2011). Now assume that the diffusion function in (1) is unknown and that we only know that the integrand of the diffusion part is a positive definite -valued ergodic predictable process, say , bounded away from zero. This leads to the stochastic differential equation
where may be identified with a triangular ergodic stochastic process. Next, suppose that we additionally allow the integrand of the drift to depend on and, furthermore, on an observed additional Markov process . Then satisfies
This equation is also called a backward stochastic differential equation with solution and driver function . The goal of this paper is to establish consistency and asymptotic normality results for estimating in (2) from discrete-time observations of the data generating processes .
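Since the displayed equations are lost in this extraction, the hierarchy just described can be sketched in standard notation; the symbols below are illustrative choices, not necessarily those of the paper:

```latex
% Illustrative notation only: X is the data generating process, theta the
% unknown parameter, W a Brownian motion, Z the unknown ergodic volatility,
% and f the driver. The tags (1) and (2) follow the surrounding text.
\begin{align}
  \mathrm{d}X_t &= b(X_t,\theta)\,\mathrm{d}t
      + \sigma(X_t)\,\mathrm{d}W_t, \tag{1}\\
  \mathrm{d}X_t &= b(X_t,\theta)\,\mathrm{d}t
      + Z_t\,\mathrm{d}W_t, \qquad Z \text{ ergodic, unknown}, \nonumber\\
  \mathrm{d}Y_t &= -f(X_t,Y_t,\theta)\,\mathrm{d}t
      + Z_t\,\mathrm{d}W_t. \tag{2}
\end{align}
```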
BSDEs were introduced by Pardoux and Peng (1991) and have since been extended in many directions regarding assumptions on the driver function, connections to PDEs and Hamilton-Jacobi-Bellman equations, applications to stochastic optimal control theory, smoothness of , robustness, numerical approximations and invariance principles. Although originally developed for a finite maturity, in many situations the terminal time is either random or there is no natural terminal time at all, and the decision maker instead faces an infinite time horizon. In the theory of BSDEs, existence and uniqueness of a solution can usually be guaranteed by Lipschitz conditions on the driver. For an infinite time horizon, however, the BSDE may be ill-posed, which has been addressed by Briand and Hu (1998) by imposing a monotonicity assumption on the driver. For our statistical analysis we will simply assume that the data generating process satisfies an equation of the form (2) and is ergodic. In this case we also refer to (2) as an ergodic BSDE.
Ergodic backward SDEs for finite or infinite dimensional Brownian motion have for instance been considered in Buckdahn and Peng (1999), Fuhrman, Hu and Tessitore (2009), Richou (2009), Debussche, Hu and Tessitore (2011), Hu and Wang (2018), Madec (2015), Hu et al. (2015), Liang and Zariphopoulou (2017), Chong et al. (2019), Hu and Lemonnier (2019), Hu, Liang and Tang (2020) and Guatteri and Tessitore (2020).
For statistical inference on BSDEs, there is in general not much literature available. For nonparametric estimation of linear drivers, see Su and Lin (2009), Chen and Lin (2010) and Zhang (2013). Zhang and Lin (2014) propose two terminal-dependent estimation methods for integral forms of backward SDEs. Song (2014) gives results under independence assumptions. These works consider BSDEs that are non-ergodic and therefore need additional assumptions. In this work we instead show asymptotic results for an infinite time horizon under ergodicity assumptions on . Even when restricted to conventional SDEs, our results enable drift parameter estimation with an unknown volatility process, unlike previous studies (see Example 1 in Section 3).
2 Main results
Given a probability space with a right-continuous filtration , let be a -dimensional -adapted process satisfying
where is a -dimensional standard -Wiener process (), is an unknown parameter, is a bounded open subset of , is an -valued function, is a -dimensional continuous -adapted process, and is a matrix-valued continuous -adapted process. The dimension of is possibly zero; in that case we ignore . We observe , and consider the asymptotics and as .
We construct a maximum-likelihood-type estimator for the parameter . For this purpose, we define a quasi-likelihood function . Let for a stochastic process . Let be a sequence of positive integers such that
for some . Let , , and let
where denotes transpose. We define a quasi-log-likelihood function by
where is the closure of and . Let for a stochastic process .
Then we can construct a maximum-likelihood-type estimator
as a random variable which maximizes .
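Because the concrete formulas are elided in this extraction, the construction can be illustrated with a one-dimensional sketch: the unknown volatility is estimated blockwise by the realized variance over blocks of length , and the drift parameter is then obtained by maximizing a local-Gaussian quasi-log-likelihood with this blockwise estimate plugged in. All model choices below (an Ornstein-Uhlenbeck-type drift, the particular volatility path, the block length and the parameter grid) are illustrative assumptions, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a 1-d ergodic diffusion dX_t = -theta0 * X_t dt + sigma_t dW_t,
# where the volatility path sigma_t is unknown to the statistician
# (illustrative stand-in for the data generating process).
theta0 = 1.0
n, T = 5000, 50.0              # n observations over [0, T]
h = T / n                      # step size: h -> 0 while n*h -> infinity
sigma = 0.5 + 0.3 * np.abs(np.sin(np.linspace(0.0, T, n)))
X = np.zeros(n + 1)
for i in range(n):
    X[i + 1] = X[i] - theta0 * X[i] * h + sigma[i] * np.sqrt(h) * rng.standard_normal()

# Blockwise volatility estimate: on blocks of k observations, estimate the
# (assumed locally constant) squared volatility by the realized variance.
k = 50                                     # block length, playing the role of k_n
inc = np.diff(X)
nblk = n // k
v_hat = (inc[: nblk * k].reshape(nblk, k) ** 2).mean(axis=1) / h

# Quasi-log-likelihood in theta: local-Gaussian approximation of the
# transition density with the blockwise volatility estimate plugged in.
def quasi_loglik(theta):
    ll = 0.0
    for b in range(nblk):
        sl = slice(b * k, (b + 1) * k)
        resid = inc[sl] - (-theta * X[:n][sl]) * h
        ll += -0.5 * np.sum(resid ** 2 / (v_hat[b] * h) + np.log(v_hat[b] * h))
    return ll

# Maximum-likelihood-type estimator: maximize over a grid of candidate values.
grid = np.linspace(0.1, 5.0, 200)
theta_hat = grid[np.argmax([quasi_loglik(t) for t in grid])]
print(theta_hat)
```

Estimating the volatility on blocks mirrors the role of the sequence above: larger blocks reduce the variance of the volatility estimate, but bias it when the volatility moves within a block.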
Let be the space of symmetric, positive definite matrices.
For a vector
For a vectorand a matrix , we denote
We assume that admits Sobolev's inequality; that is, for any , there exists a positive constant depending only on and such that
for any . Sobolev's inequality is satisfied if has a Lipschitz boundary (see Adams and Fournier (2003)).
Let be the closure of in , and for any , where is the unit matrix. For and , we consider the following assumptions.
- Assumption (A1-).
almost surely and there exists a positive constant such that
- Assumption (A2-).
exists and is continuous on for , and there exists a constant such that
Moreover, for any , there exists a constant such that
for , , , , , and .
- Assumption (A3-).
At least one of the following two conditions holds true.
The function does not depend on and is ergodic, that is, there exists an invariant distribution such that for any measurable function ,
as . Moreover,
is ergodic, that is, there exists an invariant distribution such that for any measurable function ,
as . Moreover,
- Assumption (A4).
(Identifiability condition) For , for all on implies .
Most of the above assumptions are standard in the asymptotic theory of maximum-likelihood-type estimation for ergodic diffusion processes, and similar (or stronger) assumptions are required in Kessler (1997) and Uchida and Yoshida (2012). A similar statement applies to Condition (A2-) appearing later. Here, the upper bound of in (A2-) depends on . While this is not a typical assumption, it ensures that (A2-) is satisfied even in cases where is not smooth at (for example, with ). For sufficient conditions for the ergodicity of , we refer readers to Remark 1 of Uchida and Yoshida (2012).
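The ergodicity required in (A3) can be checked numerically in simple cases: time averages of a function of the path converge to the average under the invariant distribution. The following sketch (the model and all constants are illustrative assumptions) verifies this for an Ornstein-Uhlenbeck process, whose invariant law is Gaussian with mean zero and variance sigma^2 / (2 * kappa).

```python
import numpy as np

rng = np.random.default_rng(0)

# Ornstein-Uhlenbeck process dX = -kappa * X dt + sigma dW (illustrative choice).
kappa, sigma = 1.0, 1.0
T, n = 500.0, 100_000
h = T / n
X = np.empty(n + 1)
X[0] = 0.0
for i in range(n):
    X[i + 1] = X[i] - kappa * X[i] * h + sigma * np.sqrt(h) * rng.standard_normal()

# Ergodic theorem in action: the time average of g(x) = x^2 should be close
# to the invariant-distribution average sigma^2 / (2 * kappa) = 0.5.
time_avg = np.mean(X ** 2)
invariant_avg = sigma ** 2 / (2.0 * kappa)
print(time_avg, invariant_avg)
```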
Fix satisfying (3). Under the assumptions above, we obtain consistency of our estimator.
Theorem 2.1 (consistency).
Let such that
Assume (A1-), (A2-), (A3-), and (A4). Then as .
Under (A2-) and (A3-), we define
if the function does not depend on , and otherwise we define
To deduce asymptotic normality of our estimator, we need a further condition. Let be an open set in such that .
- Assumption (A2-).
(A2-) is satisfied. exists and is continuous on for and with , and for any , there exists a constant such that
for , , , and with .
Moreover, there exist a Wiener process independent of and -progressively measurable processes for such that
and for any and .
Suppose that . Then we can choose in the definition of satisfying
For such , fix satisfying (3).
Theorem 2.2 (Asymptotic normality).
The condition is stronger than the ones in previous works (for instance, in Yoshida (2011), and for in Uchida and Yoshida (2012) and Kessler (1997)). Unlike previous studies, we need to construct an estimator of , whose structure is not specified. For this purpose, (7), and consequently , is required.
If is a diffusion process with SDE coefficients not depending on , then is asymptotically efficient under the assumptions of Gobet (2002), because corresponds to the efficient asymptotic variance in Gobet (2002).
The first example to which our results apply is a data generating process of the form
where is an unknown predictable ergodic process. We remark that the previous literature treated only the case with and known.
As a further example consider
with . This backward SDE is motivated by extending the evolution of a price process in the Heston model to a random and possibly arbitrarily large time horizon.
Ergodic BSDEs appear naturally in connection with forward performance processes, which are utility functionals that do not depend on a specific time horizon; see for instance Hu, Liang and Tang (2020). In Liang and Zariphopoulou (2017), for instance, a forward performance process of factor form is described, with being the ergodic solution of a BSDE with quadratic driver function.
4 Simulation studies
In the sequel, we will consider different possibilities for our sequences converging to zero or to infinity. In particular, consider
Then we must have
Combining the three cases yields
. We will try each of these combinations below.
4.1 Simulation Results for the Vasicek model
Suppose that evolves according to the Vasicek model, that is, , where is a standard Brownian motion, with parameters and . The initial value is set to . Let us estimate in the equation
In the following, is set to and to . We consider integers and , where , satisfying the conditions of Theorem 2.1 and Theorem 2.2 and . To find the pair which best estimates , we run simulations for each combination and compute the average error as the sum of the percentage differences between and over the simulated 's, that is,
where denotes the set of ’s simulated. Two sets of ’s are considered: and . We let .
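The error metric just defined can be reproduced in a small simulation. In the sketch below, the Vasicek parameters, the set of horizons , and the estimated quantity (here the mean-reversion speed) are illustrative assumptions, since the paper's concrete values are elided in this extraction; the model is simulated with an Euler scheme and the percentage errors are averaged over the horizons.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative Vasicek parameters (assumed, not the paper's values).
kappa, mu, sig, x0 = 1.0, 0.5, 0.2, 0.1

def simulate_vasicek(T, n):
    """Euler-Maruyama path of dX = kappa*(mu - X) dt + sig dW on [0, T]."""
    h = T / n
    X = np.empty(n + 1)
    X[0] = x0
    for i in range(n):
        X[i + 1] = X[i] + kappa * (mu - X[i]) * h \
                   + sig * np.sqrt(h) * rng.standard_normal()
    return X, h

def estimate_kappa(X, h, mu=mu):
    # Least-squares / quasi-MLE estimate of the mean-reversion speed,
    # with mu treated as known for simplicity.
    num = np.sum((mu - X[:-1]) * np.diff(X))
    den = np.sum((mu - X[:-1]) ** 2) * h
    return num / den

Ts = [10.0, 20.0, 40.0, 80.0]          # assumed set of horizons T
errs = []
for T in Ts:
    X, h = simulate_vasicek(T, int(T * 100))
    errs.append(abs(estimate_kappa(X, h) - kappa) / kappa * 100.0)

avg_pct_error = np.mean(errs)          # average error in percent over the T's
print(avg_pct_error)
```

As the tables in the text suggest, the error shrinks as the horizon grows, since drift estimation for an ergodic diffusion improves with the observation window.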
The results are summarized in the following tables.
From the tables it can be seen that the choices of and matter strongly. The pairs with give the smallest error and estimate most accurately under both sets of 's. When the simulations are repeated, any of the three pairs can result in the smallest error. Overall, for the same , the smaller is, the better the estimate of .
Below, Figure 1 shows an analysis for the Vasicek model, where and are chosen to be 6 and 13, respectively, with . The number of simulations is set as
For each , we repeat the process 500 times and calculate the Mean Error of the estimators .
4.2 The Heston model
Next, the two-dimensional case is simulated. The process evolves according to the Heston model, that is, , with parameters and , and initial value . We want to estimate and in equation (8), where . and remain 6 and 13, respectively, and .
The number of simulations is set as
For each , we repeat the process 500 times and calculate the Mean Absolute Error (MAE) of the estimators . Figure 2 shows the result.
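For the two-dimensional experiment, a minimal Heston path simulator is sketched below. The parameter values are illustrative assumptions (the paper's values are elided), and the full-truncation Euler scheme used here is one standard way to keep the variance coordinate usable when the discretized variance dips below zero.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative Heston parameters (assumed, not the paper's values):
# dS = mu*S dt + sqrt(V)*S dW1,  dV = kappa*(theta - V) dt + xi*sqrt(V) dW2,
# with corr(dW1, dW2) = rho.
mu, kappa, theta, xi, rho = 0.05, 2.0, 0.09, 0.3, -0.5
S0, V0, T, n = 1.0, 0.09, 10.0, 10_000
h = T / n

S = np.empty(n + 1)
V = np.empty(n + 1)
S[0], V[0] = S0, V0
for i in range(n):
    z1 = rng.standard_normal()
    z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal()
    v_pos = max(V[i], 0.0)  # full truncation: use V^+ in the diffusion terms
    S[i + 1] = S[i] + mu * S[i] * h + np.sqrt(v_pos) * S[i] * np.sqrt(h) * z1
    V[i + 1] = V[i] + kappa * (theta - V[i]) * h + xi * np.sqrt(v_pos) * np.sqrt(h) * z2

print(S[-1], V[-1])
```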
5 Proofs
In this section, we prove the results of Section 2. In Section 5.1, we introduce two functions and , which are approximations of the quasi-log-likelihood . The function is introduced to control the event that either or is close to degenerate for some or ; it is equal to except on that event. The function is obtained by replacing the estimator in with . In Section 5.2, we show that the difference between and can be asymptotically ignored, and we consequently obtain consistency of . To show Theorem 2.2, we need an accurate estimate of the difference between and , which is given in Proposition 5.1 in Section 5.3. Together with the asymptotic estimate of in Lemma 5.3, we then obtain the desired results.
5.1 Approximation of
For a vector and a matrix , and denote the element of and the -th element of , respectively. For and a sequence of positive numbers, let and denote sequences of random variables (which may also depend on and ) satisfying
Then (A1-) and (A2-) imply
Let . We first introduce a family of stopping times controlling the degeneracy of and . For any , let
where . Under (A1-), implies that and for because has a continuous path.
Let , and let
When is sufficiently small and sufficiently large, coincides with with high probability (see (16)). is an approximation of which is useful for deducing the asymptotic behavior.
The Burkholder-Davis-Gundy inequality and Jensen’s inequality yield
which implies that by (A1-). Similarly, (A1-) and (A2-) yield . Then by Itô’s formula and the Cauchy-Schwarz inequality, (A1-), and (A2-) yield
Therefore, for any and , we obtain
as if .
Then (A1-) yields
and therefore, we have
5.2 Proof of consistency
Let such that (6) is satisfied. Assume (A1-) and (A2-). Then
as for any .
By the definitions of and , we can decompose the difference as