1 Introduction
Hypoelliptic diffusions naturally occur in various applications, most notably in neuroscience, molecular physics and mathematical finance. In particular, multidimensional models of a neuron population, such as the stochastic approximation of a Hawkes process (Ditlevsen and Löcherbach, 2017), and exotic models of option pricing (Malliavin and Thalmaier, 2006) are described by hypoelliptic diffusions.
The main difference from the classical (or elliptic) setting is that in the hypoelliptic case the dimensionality of the noise is lower than the dimensionality of the system of stochastic differential equations (SDEs) which describes the process. Hypoellipticity can be intuitively explained in the following way: although the covariance matrix of the noise is singular due to a degenerate diffusion coefficient, a smooth transition density with respect to the Lebesgue measure still exists. This is the case when the noise is propagated to all the coordinates through the drift term.
Properties of hypoelliptic diffusions differ significantly from those of elliptic ones, in which all coordinates are driven by Gaussian noise, and they are therefore more difficult to study. The first problem is that the variance of each coordinate is of a different order. This is the main reason why classical numerical approximation methods do not work well with hypoelliptic diffusions. In particular, it is proven that for hypoelliptic systems the classical Euler-Maruyama scheme does not preserve the ergodic properties of the true process (Mattingly et al., 2002). The second problem is the degenerate diffusion coefficient. As the explicit form of the transition density is often unknown, parametric inference is usually based on its discrete approximation by piecewise Gaussian processes (see, for example, Kessler (1997)). In the hypoelliptic case, however, this approach cannot be applied directly because the covariance matrix of the approximated transition density is not invertible.
Now let us be more specific. Consider a two-dimensional system of stochastic differential equations of the form:
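The non-invertibility mentioned above can be illustrated with a minimal sketch. It assumes a generic two-dimensional system in which only the second coordinate carries noise; the function names and the numerical values are hypothetical and chosen for illustration only.

```python
# One Euler-Maruyama step for a 2D system with noise only in the second coordinate:
#   X_{i+1} = X_i + a1 * dt
#   Y_{i+1} = Y_i + a2 * dt + b * sqrt(dt) * xi,   xi ~ N(0, 1)
# The conditional covariance of one step is B B^T * dt with B = (0, b)^T.

def euler_step_covariance(b: float, dt: float):
    """Conditional covariance of one Euler-Maruyama step (2x2 nested lists)."""
    B = [0.0, b]  # diffusion vector: no noise enters the first coordinate
    return [[B[i] * B[j] * dt for j in range(2)] for i in range(2)]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

cov = euler_step_covariance(b=0.6, dt=0.01)  # illustrative values
print(cov)        # only the (2,2) entry is non-zero
print(det2(cov))  # the determinant vanishes, so the matrix cannot be inverted
```

Since the first row and column are zero, a Gaussian likelihood built on this covariance is not defined, which is why a scheme that propagates noise to the first coordinate is needed.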
(1) 
where , is the drift term, is the diffusion coefficient, is a standard Brownian motion defined on some probability space , is the vector of the unknown parameters, taken from some compact set .
The goal of this paper is to estimate the parameters of (1) from discrete observations of both coordinates and . This is achieved in two steps: first, we consider a discretization scheme in order to approximate the transition density of the continuous process preserving the ergodic property, and then we propose an estimation technique which maximizes the likelihood function of the discrete approximate model. Let us discuss the solutions proposed by other authors for hypoelliptic systems of different types.
Several works treat the parametric inference problem for a particular case of system (1). It is natural to first introduce the class of stochastic damping Hamiltonian systems, also known as Langevin equations (Gardiner and Collett, 1985). These hypoelliptic models arise as a stochastic expansion of 2-dimensional deterministic dynamical systems, for example the Van der Pol oscillator (Van der Pol, 1920) perturbed by noise. They are defined as the solution of the following SDE:
(2) 
The particular case of Hamiltonian systems with and is considered in Ozaki (1989), where the link between the continuous-time solution of (2) and the corresponding discrete model is obtained with the so-called local linearization scheme. The idea of this scheme is the following: for a system of SDEs with a non-constant drift and a constant variance, the solution can be approximated interval-wise by a system with a linear drift, the original covariance matrix being expanded by adding higher-order terms. This makes it possible to construct a quasi-maximum likelihood estimator. Pokern et al. (2007) attempt to solve the problem of the non-invertibility of the covariance matrix for the particular case of system (2) with a constant variance with the help of an Itô-Taylor expansion of the transition density. The parameters are then estimated with a Gibbs sampler based on the discretized model with the noise propagated into the first coordinate with order . This approach makes it possible to estimate the variance coefficient, but it is not efficient for estimating the parameters of the drift term. In Samson and Thieullen (2012) it is shown that a consistent estimator for fully and partially observed data can be constructed using only the discrete approximation of the second equation of system (2). This method works reasonably well in practice even for more general models when it is possible to convert system (1) to the simpler form (2). However, the transformation of the observations sampled from the continuous model (1) requires prior knowledge of the parameters involved in the first equation. The other particular case of (1), when and the drift term is linear and thus the transition density is known explicitly, is treated in LeBreton and Musiela (1985). A consistent maximum likelihood estimator is then constructed in two steps: first, a covariance matrix of the process is estimated from available continuous-time observations, and then it is used for computing the parameters of the drift term.
A few other works are devoted to nonparametric estimation of the drift and the variance terms (Cattiaux et al., 2014, 2016). To the best of our knowledge, for systems (1) the only reference is Ditlevsen and Samson (2017). They construct a consistent estimator using a discretization scheme based on an Itô-Taylor expansion. However, the estimation is conducted separately for each coordinate, so it requires partial knowledge of the parameters of the system.
In this paper we propose a new estimation method, adapting the local linearization scheme described in Ozaki (1989) and developed for models of type (2) to the more general class of SDEs (1). Under the hypoellipticity assumption this scheme propagates the noise to both coordinates of the system and yields an invertible covariance matrix. We start by describing the discretization scheme, approximating the transition density and proposing a contrast estimator based on the discretized log-likelihood. While we attempt to estimate the parameters of the drift and diffusion coefficients simultaneously, we also explain in which cases and how the estimation can be split. Then we study the convergence of the scheme and prove the consistency of the proposed estimator based on the 2-dimensional contrast. We also detail the different speeds of convergence of the diffusion parameters, which are implied by the hypoellipticity. To the best of our knowledge, the proof of this consistency is the first in the literature. We finish with some numerical experiments, testing the proposed approach on the hypoelliptic FitzHugh-Nagumo model and comparing it to other estimators.
This paper is organized as follows: Section 2 presents the model and assumptions. The discrete model is introduced in Section 3. The estimators are studied in Section 4 and illustrated numerically in Section 5. We close with Section 6, devoted to conclusions and discussion. Formal proofs are gathered in the Appendix.
2 Models and assumptions
2.1 Notations
We consider system (1). We assume that both variables are discretely observed at equally spaced periods of time on some finite time interval . The vector of observations at time is denoted by , where is the value of the process at the time . We further assume that it is possible to draw a sufficiently large and accurate sample of data, i.e. that may be arbitrarily large and the partition size arbitrarily small. Let us also introduce the vector notations:
(3) 
where , is a standard two-dimensional Brownian motion defined on the filtered probability space, is a measurable dimensional random vector. Matrices and represent, respectively, the drift and the diffusion coefficient, that is and
(4) 
Throughout the paper we use the following abbreviations: and . We suppress the dependency on the parameter , when its value is clear from context, otherwise additional indices are introduced. True values of the parameters are denoted by . We also adopt the notation of Pokern et al. (2007) and refer to the variable which is directly driven by Gaussian noise as "rough", and to as "smooth".
2.2 Assumptions
In what follows, we work under the following assumptions:
(A1) Functions and have bounded partial derivatives of every order, uniformly in . Furthermore .
(A2) Global Lipschitz and linear growth conditions. :
where is the standard Euclidean norm. Further, denote by the initial value of the process , then .
(A3) Process is ergodic and there exists a unique invariant probability measure with finite moments of any order.
(A4) Both functions and are identifiable, that is .
Assumption (A1) ensures that the weak Hörmander condition is satisfied, thus the system is hypoelliptic in the sense of stochastic calculus of variations (Nualart, 2006; Malliavin and Thalmaier, 2006). In order to prove this, we first write the coefficients of system (3) as two vector fields, converting (3) from the Itô to the Stratonovich form:
Then their Lie bracket is equal to
By (A1) the first element of this vector is not equal to , thus we conclude that and generate . This means that the weak Hörmander condition is satisfied and, as a result, the transition density for system (3) exists, though it does not necessarily have an explicit form.
(A2) implies the existence and uniqueness in law of the weak solution of system (3) (Karatzas and Shreve, 1987), and (A4) is a standard condition needed to prove the consistency of the estimator.
(A3) ensures that we can apply the weak ergodic theorem, that is, for any continuous function with polynomial growth at infinity:
We do not investigate the conditions under which process is ergodic, as this is not the main focus of this work. Ergodicity of the stochastic damping Hamiltonian system (2) is studied in Wu (2001). Conditions for a wider class of hypoelliptic SDEs can be found in Mattingly et al. (2002). It is also important to note that if the process is ergodic, then its sampling is also ergodic (Genon-Catalot et al., 2000).
3 Discrete model
3.1 Derivation
Now we introduce a discretization scheme which approximates the transition density of system (3). It is a generalized version of the local linearization scheme presented in Ozaki (1989) for systems (2). Recall that the process is observed at equally spaced periods of time of size . On each interval we consider a new process described by the homogeneous linear system with the constant diffusion coefficient. That is, we use the approximation on each interval of length , where is the Jacobian of the drift coefficient , computed at the beginning of such an interval. In other words, instead of working with (3), we study systems of the following type:
(5) 
where the initial value is given by the observation of the true process at time . Note that the value of the diffusion matrix is fixed at time . Solution of (5) has an explicit form:
Then the first moment and the covariance matrix of the process on each interval are given, respectively, by:
(6)  
(7) 
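As a rough numerical illustration of the two moments of a locally linearized process, the sketch below uses the standard expressions for a linear SDE dZ = (A0 + J (Z - z0)) dt + B dW started at z0, namely mean = z0 + (integral of exp(Js) ds) A0 and covariance = integral of exp(Js) B B^T exp(Js)^T ds over one step. The helper names, the midpoint quadrature and the toy model dX = Y dt, dY = sigma dW are assumptions made for illustration; they are not taken from the paper.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def expm(m, terms=30):
    """Matrix exponential of a 2x2 matrix via a truncated Taylor series."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def ll_moments(z0, A0, J, B, dt, n=200):
    """Mean and covariance of the linearized process after time dt (midpoint rule)."""
    BBt = matmul(B, [[B[j][i] for j in range(2)] for i in range(2)])
    h = dt / n
    int_exp = [[0.0, 0.0], [0.0, 0.0]]
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(n):
        s = (k + 0.5) * h
        E = expm([[J[i][j] * s for j in range(2)] for i in range(2)])
        Et = [[E[j][i] for j in range(2)] for i in range(2)]
        integrand = matmul(matmul(E, BBt), Et)
        for i in range(2):
            for j in range(2):
                int_exp[i][j] += E[i][j] * h
                cov[i][j] += integrand[i][j] * h
    mean = [z0[i] + sum(int_exp[i][j] * A0[j] for j in range(2)) for i in range(2)]
    return mean, cov

# Toy model (hypothetical): dX = Y dt, dY = sigma dW, started at (0, 1).
sigma, dt = 0.5, 0.1
mean, cov = ll_moments(
    z0=[0.0, 1.0],
    A0=[1.0, 0.0],                  # drift evaluated at z0
    J=[[0.0, 1.0], [0.0, 0.0]],     # Jacobian of the drift at z0
    B=[[0.0, 0.0], [0.0, sigma]],   # degenerate diffusion matrix
    dt=dt,
)
print(mean)
print(cov)
```

For this toy model the covariance is known in closed form, sigma^2 [[dt^3/3, dt^2/2], [dt^2/2, dt]], which gives a simple check of the quadrature and shows that the first coordinate acquires a small but non-zero variance.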
The continuous representation of the drift and the variance terms (6)-(7) is not convenient for numerical implementation, so it has to be discretized. For the drift term the discretization is straightforward and follows directly from the definition of the matrix exponential. For the covariance matrix we use the following proposition, whose proof is postponed to the Appendix:
Proposition 1.
The second-order Taylor approximation of matrix defined in (7) has the following form:
(8) 
where the derivatives are computed at time .
Note that the noise in the first coordinate appears only through . Thus, under assumption (A1) matrix (7) has rank 2, while the original covariance matrix is of rank 1. This fact allows us to use techniques developed for non-degenerate Gaussian diffusions. However, note that this matrix is still highly ill-conditioned, as its determinant is of order .
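The ill-conditioning can be seen on the simplest hypoelliptic example, which is an assumption made here for illustration and not the model of the paper: dX = Y dt, dY = sigma dW. For this toy system the one-step covariance is known exactly, sigma^2 [[dt^3/3, dt^2/2], [dt^2/2, dt]], and the hedged sketch below prints its determinant and condition number for decreasing step sizes.

```python
import math

def toy_covariance(sigma, dt):
    """Exact one-step covariance of (X, Y) for dX = Y dt, dY = sigma dW."""
    return [[sigma**2 * dt**3 / 3, sigma**2 * dt**2 / 2],
            [sigma**2 * dt**2 / 2, sigma**2 * dt]]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def cond2_sym(m):
    """Condition number (ratio of eigenvalues) of a symmetric positive definite 2x2 matrix."""
    tr = m[0][0] + m[1][1]
    det = det2(m)
    lam_max = (tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0
    lam_min = det / lam_max  # computed from the determinant to limit cancellation
    return lam_max / lam_min

for dt in (0.1, 0.01, 0.001):
    S = toy_covariance(1.0, dt)
    print(dt, det2(S), cond2_sym(S))
```

In this toy case the determinant equals sigma^4 dt^4 / 12, so the matrix is invertible for every dt > 0 while its condition number grows quickly as the step shrinks.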
At this point we give up the continuous time setting and move to the discrete process. Let us denote the approximated form (8) of matrix (7) by . Then the approximation for the solution of (5) is given by:
(9) 
where is a standard Gaussian 2-dimensional random vector, is any matrix such that , is an approximation of the conditional expectation of the drift (6), given by:
(10) 
Thus, on each small interval the discretized process can be approximated by the Gaussian process such that .
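A single step of such a Gaussian recursion can be sketched as follows, assuming that the one-step mean and an invertible 2x2 covariance matrix are already available. The Cholesky factor is one possible choice of matrix square root; the function names and the numerical covariance (taken from the toy model dX = Y dt, dY = dW) are hypothetical.

```python
import math
import random

def cholesky2(S):
    """Lower-triangular L with L L^T = S for a symmetric positive definite 2x2 matrix."""
    l11 = math.sqrt(S[0][0])
    l21 = S[1][0] / l11
    l22 = math.sqrt(S[1][1] - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]

def gaussian_step(mean, S, rng):
    """Draw the next state as mean + L xi, with xi a standard Gaussian vector."""
    L = cholesky2(S)
    xi = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    return [mean[0] + L[0][0] * xi[0],
            mean[1] + L[1][0] * xi[0] + L[1][1] * xi[1]]

dt = 0.01
S = [[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]]  # illustrative covariance
L = cholesky2(S)
rng = random.Random(0)                          # fixed seed for reproducibility
z_next = gaussian_step([0.0, 1.0], S, rng)
print(z_next)
```

Any other factor with the same product would give the same distribution; the Cholesky factor is used here only because it is easy to write explicitly in dimension two.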
3.2 Properties of the discrete model
The difference between the true process and its approximation cannot be computed explicitly since the transition density of the process is in general unknown. However, the moments of the process can be approximated by means of the generator of the process. For a sufficiently smooth and integrable function we know that:
(11) 
where is the times iterated generator of model (3), given by
where is a weighted Laplace type operator. Thus, the value of the process is approximated by:
(12) 
which coincides with (10) up to the terms of order . Adding more terms to (10) does not improve the accuracy of the scheme unless the model is linear. Further we assume that . Now let us denote by the first element of the vector , and by the second. We have the following proposition:
Proposition 2 (Weak convergence of the local linearization scheme).
For the following holds:
Proof.
For systems (3) it is suitable to approximate the drift term up to the order . It is not sufficient to use a firstorder approximation, as in that case the variance of the first coordinate would be suppressed by the error in the drift term. However, for elliptic systems it is often enough to approximate a drift only up to .
4 Parameter estimation
Let us introduce a contrast function for system (3). In the elliptic case this function is defined as times the log-likelihood of the discretized model (Florens-Zmirou, 1989; Kessler, 1997). In the hypoelliptic case, however, we must modify this criterion, taking into account the specific structure of the covariance matrix. Most notably, the contrast is obtained by dividing the first part by :
(13) 
Then the estimator is defined as:
(14) 
For system (2) this correction is proposed neither in Ozaki (1989) nor in Pokern et al. (2007). We justify this bias theoretically in Lemma 1. However, the 2-dimensional criterion (14) is tricky to analyze because of the different orders of variance for the first and the second coordinate. As a result, under the scaling which allows us to estimate the parameters included in the rough coordinate, we cannot say anything about the parameters from the smooth coordinate, and vice versa. This does not introduce any additional restrictions on the numerical implementation, but it must be taken into account in the theoretical study. When both equations are driven by the same parameters, the task is simplified, as the parameters can then be estimated from a one-dimensional criterion which involves only one of the two equations.
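To make the idea of a contrast based on a discretized Gaussian model concrete, here is a minimal sketch of a generic quasi-likelihood criterion: the sum over transitions of the quadratic form of the residual plus the log-determinant of the one-step covariance. It does not reproduce the hypoelliptic correction of (13); the toy model, the Euler-type mean, the noise-free data and all names are assumptions used only to show the mechanics.

```python
import math

def inv2(m):
    """Inverse and determinant of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]], det

def gaussian_contrast(obs, theta, mean_fn, cov_fn):
    """Sum over transitions of r^T S^{-1} r + log det S, with r = z_next - mean."""
    total = 0.0
    for z, z_next in zip(obs[:-1], obs[1:]):
        m = mean_fn(z, theta)
        S_inv, det = inv2(cov_fn(z, theta))
        r = [z_next[0] - m[0], z_next[1] - m[1]]
        quad = sum(r[i] * S_inv[i][j] * r[j] for i in range(2) for j in range(2))
        total += quad + math.log(det)
    return total

# Hypothetical toy model: dX = Y dt, dY = -theta * Y dt + dW, observed with step dt.
dt = 0.01

def toy_mean(z, theta):
    x, y = z
    return [x + y * dt, y - theta * y * dt]

def toy_cov(z, theta):
    return [[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]]

# Noise-free data generated with theta = 2, so the criterion is minimal there.
obs = [[0.0, 1.0]]
for _ in range(50):
    obs.append(toy_mean(obs[-1], 2.0))

grid = [1.0, 1.5, 2.0, 2.5, 3.0]
best = min(grid, key=lambda th: gaussian_contrast(obs, th, toy_mean, toy_cov))
print(best)
```

In practice the grid search would be replaced by a numerical optimizer, and the mean and covariance functions by those of the chosen discretization scheme.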
Let us denote the parameters in the first coordinate by , and in the second by . We start with considering the variance term. In order to prove the convergence of the estimator for the , we fix the value of the parameter in the first coordinate in order to focus on the parameters which directly regulate the diffusion term of the original process. It results in the following Lemma:
Lemma 1.
Under assumptions (A1)-(A4), and while the following holds:
(15) 
Then we proceed with the drift term. Note that the first statement of Lemma 2 suffices to obtain the consistency of the estimator, but only in the case when both equations are driven by the same parameters. However, in combination with the second one it allows us to establish the main result of the paper:
Lemma 2.
Under assumptions (A1)-(A4), and the following holds:
Note that consistency for each term is obtained under different scalings. Scaling is standard for the diffusion coefficient, while is standard for the drift parameter (Kessler, 1997). But is very unusual. This means that each parameter converges to its true value at a different rate. This is the main theoretical difficulty encountered while constructing a two-dimensional contrast, compared to the two different one-dimensional contrasts proposed in Ditlevsen and Samson (2017). Finally, we establish the following theorem:
Theorem 1.
Under assumptions (A1)-(A4) and and the following holds:
Proof.
Let us start with the variance term. The result follows from Lemma 1. Denote the integral on the right side of (15) by . We can choose some subsequence such that converges to some . By the definition of the estimator we know that . But we also know that and thus . It proves the consistency of .
Consistency for the drift term essentially follows from Lemma 2 and from the identifiability of the drift functions. We can state that there exists a subsequence which converges to . At the same time, as is identifiable, thus . That means that the estimator is consistent with respect to the parameters of the first coordinate. Proof for the is analogous. ∎
Another way to unify the scaling and simplify the task of parameter estimation is to consider a quadratic variance of the drift term. This approach does not allow us to estimate the parameters of the diffusion term, but it is effective for the parameters of the drift. Its main advantage is that the computation of the contrast function does not require inverting matrix (8), which is ill-conditioned and can cause numerical problems for small values of . Consider
(16) 
with being defined in (10).
Proposition 3.
Under assumptions (A1)-(A4) the following holds:
As for the diffusion term, in the case when , parameter can be computed explicitly with the help of the sample covariance matrix. Good properties of this approach in the elliptic case are proven in Florens-Zmirou (1989). For hypoelliptic systems this approach must be modified, as the discretization step of order does not allow us to compute the terms of order , which represent the propagated noise. However, the value of can still be inferred from the observations of the rough coordinate by computing
(17) 
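One common way to recover a constant diffusion coefficient from the rough coordinate is the realized quadratic variation, i.e. the sum of squared increments divided by the total observation time. The sketch below assumes a constant coefficient and a hypothetical mean-reverting rough coordinate simulated with an Euler step; it illustrates the idea and is not necessarily identical to formula (17).

```python
import math
import random

def sigma_from_quadratic_variation(y, dt):
    """Estimate a constant diffusion coefficient from equally spaced observations."""
    n = len(y) - 1
    qv = sum((y[i + 1] - y[i]) ** 2 for i in range(n))
    return math.sqrt(qv / (n * dt))

# Hypothetical rough coordinate: dY = -Y dt + sigma dW, simulated with an Euler step.
rng = random.Random(0)
sigma_true, dt, n = 0.6, 0.001, 20000
y = [0.0]
for _ in range(n):
    y.append(y[-1] - y[-1] * dt + sigma_true * math.sqrt(dt) * rng.gauss(0.0, 1.0))

sigma_hat = sigma_from_quadratic_variation(y, dt)
print(sigma_hat)
```

The drift contributes terms of order dt^2 to each squared increment, so for a small step its effect on the estimate is negligible compared with the noise term of order dt.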
5 Simulation study
5.1 The model
The two estimators and are evaluated in a simulation study with a hypoelliptic stochastic neuronal model called the FitzHugh-Nagumo model (Fitzhugh, 1961). It is a simplified version of the Hodgkin-Huxley model (Hodgkin and Huxley, 1952), which describes in a detailed manner the activation and deactivation dynamics of a spiking neuron. It was first studied in the deterministic case, and was then extended by adding two sources of noise to both coordinates, which results in an elliptic SDE. However, it is often argued that only ion channels are perturbed by noise, while the membrane potential depends on them in a deterministic way. This idea leads to a 2-dimensional hypoelliptic diffusion. In this paper we consider a hypoelliptic version with noise only in the second coordinate, as studied in Leon and Samson (2017). More precisely, the behaviour of the neuron is defined through the solution of the system
(18) 
where the variable represents the membrane potential of the neuron at time , and is a recovery variable, which could represent channel kinetics. Parameter is the magnitude of the stimulus current and is often known in experiments, is a time scale parameter and is typically significantly smaller than , since moves "faster" than . Parameters to be estimated are .
Hypoellipticity and ergodicity of (18) are proven in Leon and Samson (2017). In Jensen et al. (2012) it is proven that is unidentifiable when only one coordinate is observed. Parametric inference for the elliptic FitzHugh-Nagumo model in both the fully and partially observed cases is investigated in Jensen (2014). The same problem in the hypoelliptic setting is studied in Ditlevsen and Samson (2017).
5.2 Experimental design
In order to make our experiments more representative, we consider two different settings: an excitatory and an oscillatory behaviour. For the first regime, the drift parameters are set to and the diffusion coefficient , and for the second and . The diffusion coefficient does not change the behaviour pattern, only the "noisiness" of the observations.
We organize the trials as follows: first, we generate 100 trajectories using formula (9) for each set of parameters with and . Then we downsample the sequence and work only with every 10th value of the process, so that and . We estimate the parameters using the contrast given by (13). We refer to this method as the linearized contrast. For the "quadratic variance" (QV) estimation we proceed as follows: we estimate the parameter explicitly from the observations of the second variable by (17), and then compute the parameters of the drift by minimizing (16). In addition, we compare both methods to the 1.5 strong order scheme (Ditlevsen and Samson, 2017), based on two separate estimators for each coordinate.
The minimization of the criteria is conducted with the optim function in R with the Conjugate Gradient method. In Table 1 we present the mean value of the estimated parameters and the standard deviation (in brackets). The accompanying figure illustrates the estimation densities. The linearized estimator is depicted in blue, the quadratic variance estimator in red, and the 1.5 scheme in green.

Set 1:          1.5             0.3             0.1             0.6
Lin. contrast   1.477 (1.056)   0.289 (0.428)   0.100 (0.561)   0.672 (0.291)
QV              1.460 (1.059)   0.311 (0.403)   0.100 (0.562)   0.611 (0.287)
1.5 scheme      1.497 (1.055)   0.299 (0.393)   0.099 (0.563)   0.597 (0.288)
Set 2:          1.2             1.3             0.1             0.4
Lin. contrast   1.199 (0.531)   1.315 (0.621)   0.102 (0.683)   0.472 (0.340)
QV              1.170 (0.423)   1.268 (0.598)   0.100 (0.678)   0.400 (0.381)
1.5 scheme      1.221 (0.645)   1.324 (0.777)   0.088 (0.575)   0.398 (0.338)

The first observation is that the estimate of the diffusion coefficient obtained with the 2-dimensional linearized estimator is biased in both sets of data, even though the contrast is corrected with respect to the hypoellipticity of the system. This bias does not appear in the one-dimensional criteria, nor when the value is computed directly from the observations. Thus its origin may be explained by the dimensionality of the system. Parameters of the second coordinate and are estimated efficiently with all three methods, though the 1.5 scheme provides a more accurate estimation. This is expected, since one of the parameters is fixed to its true value. However, in the case of , the one-dimensional criterion does not perform better than the linearized and QV estimators. This parameter seems to be underestimated by the 1.5 scheme and slightly overestimated by the linearization scheme, as is the diffusion coefficient. During the simulation study it is also observed that is the most sensitive to the initial value with which the optim function is initialized, since it directly regulates the amount of noise which is propagated to the first coordinate.
6 Conclusions
The proposed estimator successfully generalizes parametric inference methods developed for models of type (2) to the more general class (1). The numerical study shows that it can be used with no prior knowledge of the parameters. This is the most prominent advantage of our method over analogous works.
From the theoretical point of view, our estimators also have good properties. Both the linearized and the quadratic variance contrasts are consistent. We did not study the asymptotic normality of the estimator, since in the multidimensional case this question is much harder to treat than in the elliptic case because of the different orders of variance for each coordinate.
The most important direction for future work is the adaptation of the estimation method to the case when only observations of the first coordinate are available. Under proper conditions it should be possible to couple the contrast minimization with one of the existing filtering methods and estimate the parameters of the system (at least partially). This would make it possible to handle real experimental data.
Another point is the generalization of the contrast to systems of higher dimension. In practice we deal with an arbitrary number of rough and smooth variables, and the general rule which describes the behaviour of the contrast in that case has not yet been derived.
7 Acknowledgments
The author's work was financially supported by LabEx PERSYVAL-Lab and LABEX MME-DII. Sincere gratitude is expressed to Adeline Leclercq-Samson and Eva Löcherbach for numerous discussions, helpful remarks and multiple re-readings of the paper draft.
References
 Cattiaux et al. (2014) Cattiaux, P., León, J. R., and Prieur, C. (2014). Estimation for stochastic damping hamiltonian systems under partial observation. ii drift term. ALEA, 11(1):p–359.
 Cattiaux et al. (2016) Cattiaux, P., León, J. R., Prieur, C., et al. (2016). Estimation for stochastic damping hamiltonian systems under partial observation. iii. diffusion term. The Annals of Applied Probability, 26(3):1581–1619.
 Ditlevsen and Löcherbach (2017) Ditlevsen, S. and Löcherbach, E. (2017). Multiclass oscillating systems of interacting neurons. SPA, 127:1840–1869.
 Ditlevsen and Samson (2017) Ditlevsen, S. and Samson, A. (2017). Hypoelliptic diffusions: discretization, filtering and inference from complete and partial observations. submitted.
 Fitzhugh (1961) Fitzhugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical Journal, 1(6):445–466.
 Florens-Zmirou (1989) Florens-Zmirou, D. (1989). Approximate discrete-time schemes for statistics of diffusion processes. Statistics: A Journal of Theoretical and Applied Statistics, 20(4):547–557.
 Gardiner and Collett (1985) Gardiner, C. W. and Collett, M. J. (1985). Input and output in damped quantum systems: Quantum stochastic differential equations and the master equation. Phys. Rev. A, 31:3761–3774.
 Genon-Catalot and Jacod (1993) Genon-Catalot, V. and Jacod, J. (1993). On the estimation of the diffusion coefficient for multidimensional diffusion processes. In Annales de l’IHP Probabilités et statistiques, volume 29, pages 119–151.
 Genon-Catalot et al. (2000) Genon-Catalot, V., Jeantheau, T., and Larédo, C. (2000). Stochastic volatility models as hidden Markov models and statistical applications. Bernoulli, 6:1051–1079.
 Hodgkin and Huxley (1952) Hodgkin, A. L. and Huxley, A. F. (1952). A quantitative description of membrane currents and its application to conduction and excitation in nerve. Journal of Physiology-London, 117(4):500–544.
 Jensen (2014) Jensen, A. C. (2014). Statistical Inference for Partially Observed Diffusion Processes. Phd thesis, University of Copenhagen.
 Jensen et al. (2012) Jensen, A. C., Ditlevsen, S., Kessler, M., and Papaspiliopoulos, O. (2012). Markov chain Monte Carlo approach to parameter estimation in the FitzHugh-Nagumo model. Physical Review E, 86(4):041114.
 Karatzas and Shreve (1987) Karatzas, I. and Shreve, S. E. (1987). Brownian Motion and Stochastic Calculus. Graduate Texts in Mathematics. Springer, 1 edition.
 Kessler (1997) Kessler, M. (1997). Estimation of an ergodic diffusion from discrete observations. Scandinavian Journal of Statistics, 24(2):211–229.
 Kloeden et al. (2003) Kloeden, P. E., Platen, E., and Schurz, H. (2003). Numerical solution of SDE through computer experiments. Universitext. Springer.
 LeBreton and Musiela (1985) LeBreton, A. and Musiela, M. (1985). Some parameter estimation problems for hypoelliptic homogeneous gaussian diffusions. Banach Center Publications, 16(1):337–356.
 Leon and Samson (2017) Leon, J. R. and Samson, A. (2017). Hypoelliptic stochastic FitzHugh-Nagumo neuronal model: mixing, up-crossing and estimation of the spike rate. Annals of Applied Probability. to appear.
 Malliavin and Thalmaier (2006) Malliavin, P. and Thalmaier, A. (2006). Stochastic calculus of variations in mathematical finance. Springer finance. Springer, 1 edition.
 Mattingly et al. (2002) Mattingly, J. C., Stuart, A. M., and Higham, D. J. (2002). Ergodicity for SDEs and approximations: locally Lipschitz vector fields and degenerate noise. Stochastic Processes and their Applications, 101(2):185–232.
 Nualart (2006) Nualart, D. (2006). Malliavin Calculus and Related Topics. Springer. New York.
 Ozaki (1989) Ozaki, T. (1989). Statistical identification of nonlinear random vibration systems. Journal of Applied Mechanics, 56:186–191.
 Pokern et al. (2007) Pokern, Y., Stuart, A. M., and Wiberg, P. (2007). Parameter estimation for partially observed hypoelliptic diffusions. J. Roy. Stat. Soc., 71(1):49–73.
 Samson and Thieullen (2012) Samson, A. and Thieullen, M. (2012). Contrast estimator for completely or partially observed hypoelliptic diffusion. Stochastic Processes and their Applications, 122:2521–2552.
 Van der Pol (1920) Van der Pol, B. (1920). A theory of the amplitude of free and forced triode vibrations. Radio Review, 1(1920):701–710.
 Wu (2001) Wu, L. (2001). Large and moderate deviations and exponential convergence for stochastic damping hamiltonian systems. Stochastic processes and their applications, 91(2):205–238.
8 Appendix
8.1 Properties of the scheme
Proof of Proposition 1.
Let us consider each integral of (8) separately. Denote:
where we suppress the dependency of the Jacobian on the starting point of the interval in order to keep the notation simple. Recalling the Jacobian of system (3) and the definition of the matrix exponential, we have:
Then we can calculate :
where entries are given by:
The first entry can be easily calculated by the Itô isometry:
Now consider the product of two stochastic integrals in the remaining terms. Assume for simplicity that . From the properties of the stochastic integrals (Karatzas and Shreve (1987)), it is straightforward to see that:
That gives the proposition. ∎
8.2 Auxiliary results
In this section we introduce an index in the notation for the time step in order to highlight that it depends on the experimental design. Whenever this dependency is not important, the old notations are used. We start with an important Lemma which links the sampling and the probabilistic law of the continuous process:
Lemma 3 (Kessler (1997)).
Let and , let be such that is differentiable with respect to and , with derivatives of polynomial growth in uniformly in . Then:
The lemma is proven in Kessler (1997) for the one-dimensional case. However, as its proof is based only on the ergodicity of the process and on assumptions analogous to ours, and not on the discretization scheme or the dimensionality, we take it for granted without giving a formal generalization to the multidimensional case. Then Proposition 2, in combination with the continuous ergodic theorem and Lemma 3, allows us to establish the following important result:
Lemma 4.
Let be a function with the derivatives of polynomial growth in , uniformly in . Then:

Assume .

Assume .

Assume .
Let us introduce an auxiliary lemma which repeats Lemma 3 in Ditlevsen and Samson (2017). Its proof is based on Lemma 9 from GenonCatalot and Jacod (1993) and Lemma 3:
Lemma 5.
Let be a function with derivatives of polynomial growth in , uniformly in .

Assume and . Then
uniformly in .

Assume and . Then
uniformly in .

Assume and . Then
uniformly in .
Proof.
8.3 Consistency of the estimator
Proof of Lemma 1.
We can split the contrast into the following sum:
where the terms are given as follows: