Inference for ergodic diffusions plus noise

by Shogo H. Nakakita, et al.

We study adaptive maximum likelihood-type estimation for an ergodic diffusion process whose observations are contaminated by noise. This methodology yields asymptotic independence of the estimators for the variance of the observation noise, the diffusion parameter and the drift parameter of the latent diffusion process. Moreover, it lessens the computational burden compared with simultaneous maximum likelihood-type estimation. In addition to adaptive estimation, we propose a test for the presence of noise, and analyse real data as an example in which the data contain observation noise with statistical significance.



1 Introduction

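We consider a $d$-dimensional ergodic diffusion process defined by the following stochastic differential equation

$$dX_t = b(X_t, \beta)\,dt + a(X_t, \alpha)\,dw_t, \quad X_0 = x_0,$$

where $\{w_t\}_{t \ge 0}$ is an $r$-dimensional standard Wiener process, $x_0$ is an $\mathbb{R}^d$-valued random variable independent of $\{w_t\}_{t \ge 0}$, $\alpha \in \Theta_1$, $\beta \in \Theta_2$, with $\Theta_1 \subset \mathbb{R}^{m_1}$ and $\Theta_2 \subset \mathbb{R}^{m_2}$ being compact and convex. Moreover, $b: \mathbb{R}^d \times \Theta_2 \to \mathbb{R}^d$, $a: \mathbb{R}^d \times \Theta_1 \to \mathbb{R}^d \otimes \mathbb{R}^r$ are known functions. We denote $\theta := (\alpha, \beta) \in \Theta_1 \times \Theta_2 =: \Theta$ and $\theta^\star = (\alpha^\star, \beta^\star)$ as the true value of $\theta$ which belongs to $\mathrm{Int}(\Theta)$.

We deal with the problem of parametric inference for $\theta$ with $\{Y_{ih_n}\}_{i=0,\ldots,n}$ defined by the following model

$$Y_{ih_n} = X_{ih_n} + \Lambda^{1/2} \varepsilon_{ih_n}, \quad i = 0, \ldots, n,$$

where $h_n > 0$ is the discretisation step, $\Lambda \in \mathbb{R}^d \otimes \mathbb{R}^d$ is a positive semi-definite matrix and $\{\varepsilon_{ih_n}\}_{i=0,\ldots,n}$ is an i.i.d. sequence of $\mathbb{R}^d$-valued random variables such that $E[\varepsilon_{ih_n}] = 0$, $\mathrm{Var}(\varepsilon_{ih_n}) = I_d$, and each component is independent of other components, $\{w_t\}_{t \ge 0}$ and $x_0$. Hence the term $\Lambda^{1/2} \varepsilon_{ih_n}$ indicates the exogenous noise. Let $\Theta_\varepsilon \subset \mathbb{R}^{d(d+1)/2}$ be the convex and compact parameter space such that $\theta_\varepsilon \in \Theta_\varepsilon$ and $\Lambda^\star$ be the true value of $\Lambda$ such that $\theta_\varepsilon^\star := \mathrm{vech}(\Lambda^\star) \in \mathrm{Int}(\Theta_\varepsilon)$, where $\mathrm{vech}$ is the half-vectorisation operator. We denote $\vartheta := (\theta, \theta_\varepsilon)$ and $\Xi := \Theta \times \Theta_\varepsilon$. With respect to the sampling scheme, we assume that $h_n \to 0$ and $nh_n \to \infty$ as $n \to \infty$.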

Our main concern with these settings is the adaptive maximum likelihood (ML)-type estimation scheme in the form of ,


where for any matrix , indicates the transpose of , and are quasi-likelihood functions, which are defined in Section 3.
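To make the first stage of such an adaptive scheme concrete, the following minimal sketch estimates the noise variance matrix from squared increments of high-frequency observations. It uses a standard moment-type estimator (half the average outer product of the increments), which is an assumption for illustration and not necessarily the estimator defined in the paper. The function names, the Ornstein-Uhlenbeck test model and all numerical values are hypothetical.

```python
import numpy as np


def simulate_ou(n, h, beta, alpha, rng):
    """Exact simulation of independent OU components dX = -beta*X dt + alpha dw.

    Returns an array of shape (n + 1, d), started at zero.
    """
    beta = np.asarray(beta, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    decay = np.exp(-beta * h)
    step_sd = alpha * np.sqrt((1.0 - decay**2) / (2.0 * beta))
    X = np.zeros((n + 1, beta.size))
    for i in range(n):
        X[i + 1] = decay * X[i] + step_sd * rng.standard_normal(beta.size)
    return X


def estimate_noise_variance(Y):
    """Half the average outer product of the increments of Y, shape (n + 1, d)."""
    dY = np.diff(Y, axis=0)                  # increments, shape (n, d)
    return dY.T @ dY / (2.0 * dY.shape[0])   # (d, d) matrix


rng = np.random.default_rng(0)
n, h = 100_000, 1e-3                         # hypothetical sample size and step
X = simulate_ou(n, h, beta=[1.0, 2.0], alpha=[1.0, 0.5], rng=rng)

noise_var = np.diag([0.01, 0.04])            # hypothetical true noise variance
# Element-wise sqrt is a valid matrix square root here because noise_var is diagonal.
Y = X + rng.standard_normal(X.shape) @ np.sqrt(noise_var)

noise_var_hat = estimate_noise_variance(Y)
print(np.round(noise_var_hat, 4))
```

Each squared increment of the observation has expectation of roughly twice the noise variance plus a diffusion contribution of order h, so the bias of this estimator vanishes as the step shrinks. This is why the noise variance can be estimated before, and separately from, the diffusion and drift parameters.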

The composition of the model above is quite analogous to that of discrete-time state space models (e.g., see [19]) in that it expresses the endogenous perturbation in the system of interest and the exogenous noise attributed to observation separately. As the assumption that the discretisation step shrinks to zero indicates, the model we consider is for the situation where high-frequency observation is available, and this requirement enhances the flexibility of modelling, since our setting includes models with non-linearity and with dependence of the innovation on the state itself. In addition, adaptive estimation, which also becomes possible through the high-frequency setting, has the advantage of easing the computational burden in comparison with simultaneous estimation. Fortunately, the number of situations where these requirements are satisfied has grown gradually, and will continue to grow because of the increase in the amount of real-time data and the progress of observation technology.

The idea of modelling with a diffusion process subject to observational noise is not new. For instance, in the context of high-frequency financial data analysis, researchers have addressed the existence of “microstructure noise”, whose variance is large relative to the time increment, questioning the premise that what we observe are pure diffusions. Research on “diffusion + noise” modelling has been active over the past decade: some studies have examined the asymptotics of this model in the framework of a fixed time interval (e.g., [9], [10], [12], [20] and [18]), while [3] and [4] study parametric inference for this model under ergodicity in the asymptotic framework where the observation horizon tends to infinity. For parametric estimation for discretely observed diffusion processes without measurement errors, see [5], [24], [25], [2], [14] and references therein.

Our research focuses on statistical inference for an ergodic diffusion plus noise. In comparison with the simultaneous estimation of [3] and [4], we give an estimation methodology based on adaptive estimation, which relaxes the computational burden and has so far been studied for ergodic diffusions (see [24], [25], [13], [21], [22]). In previous research the simultaneous asymptotic normality of the estimators for the noise variance, the diffusion parameter and the drift parameter has not been shown, but our method allows us to see their asymptotic normality and asymptotic independence with different convergence rates. Our methods also broaden the applicability of modelling with stochastic differential equations, since they are more robust to the presence of noise than the existing results for discretely observed ergodic diffusion processes that do not account for observation noise.

As the real data analysis, we analyse the 2-dimensional wind data [17] and try to model the dynamics with a 2-dimensional Ornstein-Uhlenbeck process. We compare the fit of our diffusion-plus-noise model with that of a pure diffusion model estimated by the local Gaussian approximation method (LGA method), which has been investigated in recent decades (for instance, see [24], [13] and [14]). The results (see Section 5) suggest that there is a considerable difference between these estimates; however, we cannot judge which fit is more trustworthy from these results alone. This is because we cannot distinguish a diffusion from a diffusion-plus-noise: if the noise variance is zero, then the observation is not contaminated by noise and the LGA estimate should be adopted for its asymptotic efficiency; but if the noise variance is nonzero, what we observe is no longer a diffusion process and the LGA method loses its theoretical validity. Therefore, it is necessary to construct a statistical hypothesis test of the null hypothesis that the noise variance is zero against the alternative that it is not. In addition to the estimation methodology, we also study this hypothesis testing problem and propose a test which has the consistency property.
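The loss of validity of a pure-diffusion fit under noise can be seen in a small simulation. The sketch below is illustrative only: it uses a drift-free simplification of an LGA-type estimator of the squared diffusion coefficient (sum of squared increments divided by the observation horizon), applied to a 1-dimensional Ornstein-Uhlenbeck process with and without additive Gaussian noise. The function name and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np


def squared_diffusion_estimate(Z, h):
    """Sum of squared increments divided by n*h (drift contribution ignored)."""
    dZ = np.diff(Z)
    return np.sum(dZ**2) / (dZ.size * h)


rng = np.random.default_rng(1)
n, h = 100_000, 1e-3            # hypothetical sample size and step
beta, alpha = 1.0, 1.0          # hypothetical OU parameters: dX = -beta*X dt + alpha dw
noise_var = 0.01                # hypothetical noise variance

# Exact simulation of the latent OU process, started at zero.
decay = np.exp(-beta * h)
step_sd = alpha * np.sqrt((1.0 - decay**2) / (2.0 * beta))
shocks = step_sd * rng.standard_normal(n)
X = np.zeros(n + 1)
for i in range(n):
    X[i + 1] = decay * X[i] + shocks[i]

# Contaminated observation.
Y = X + np.sqrt(noise_var) * rng.standard_normal(n + 1)

clean = squared_diffusion_estimate(X, h)   # theory: about alpha**2
noisy = squared_diffusion_estimate(Y, h)   # theory: about alpha**2 + 2*noise_var/h
print(clean, noisy)
```

With these values the noise adds about 2 * 0.01 / 0.001 = 20 to the target value of 1, and the inflation grows as the step shrinks. A test for the presence of noise therefore decides which of the two fits can be trusted.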

In Section 2, we gather the assumptions and notation used throughout the paper. Section 3 gives the main results of this paper. Section 4 examines the results of Section 3 by simulation. In Section 5 we analyse the real wind velocity data named MetData with our estimators and the LGA method as discussed above, and test whether noise exists.

2 Local means, notations and assumptions

2.1 Local means

We partition the observation into blocks containing observations and examine the property of the following local means such that


where is an arbitrary sequence of random variables on the mesh as , and ; and . Note that and .

In the same way as [3] and [4], our estimation method is based on these local means of the observations. The idea is straightforward: taking means of the data in each partition should reduce the influence of the noise term because of the law of large numbers, and then we will obtain the information of the latent process.


We show how local means work to extract the information of the latent process. The first plot (Figure 1) is a simulation of a 1-dimensional Ornstein-Uhlenbeck process. Secondly, we contaminate the observation with normally distributed noise and plot the contaminated observation (Figure 2). Finally, we form the sequence of local means and plot it (Figure 3).

These plots suggest that the local means recover the rough states of the latent process; indeed, by Proposition 6 and under the assumptions below, it is possible to construct a quantity that converges to each state on the mesh.
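The experiment described above can be reproduced in outline as follows. This is a minimal sketch under assumed settings: the Ornstein-Uhlenbeck parameters, the noise standard deviation, the step size and the block length are all hypothetical choices, and the helper `local_means` simply averages consecutive non-overlapping blocks.

```python
import numpy as np


def local_means(Z, p):
    """Means over consecutive non-overlapping blocks of p observations."""
    k = Z.size // p
    return Z[: k * p].reshape(k, p).mean(axis=1)


rng = np.random.default_rng(2)
n, h, p = 100_000, 1e-4, 100    # hypothetical: 1000 blocks of 100 observations
beta, alpha = 1.0, 1.0          # hypothetical OU parameters: dX = -beta*X dt + alpha dw
noise_sd = 0.5                  # hypothetical noise standard deviation

# Exact simulation of the latent OU process, started at zero.
decay = np.exp(-beta * h)
step_sd = alpha * np.sqrt((1.0 - decay**2) / (2.0 * beta))
shocks = step_sd * rng.standard_normal(n - 1)
X = np.zeros(n)
for i in range(n - 1):
    X[i + 1] = decay * X[i] + shocks[i]

# Contaminated observation and its local means.
Y = X + noise_sd * rng.standard_normal(n)
Y_bar = local_means(Y, p)
X_bar = local_means(X, p)

# Error of the raw observation vs. errors of the local means.
rmse_raw = np.sqrt(np.mean((Y - X) ** 2))               # about noise_sd
rmse_local = np.sqrt(np.mean((Y_bar - X_bar) ** 2))     # about noise_sd / sqrt(p)
rmse_state = np.sqrt(np.mean((Y_bar - X[::p]) ** 2))    # vs. state at block start
print(rmse_raw, rmse_local, rmse_state)
```

Averaging p independent noise terms divides the noise variance by p, which is the law-of-large-numbers effect the local means rely on. The remaining gap between a local mean and the latent state at the start of its block comes from the movement of the diffusion within the block, and shrinks as the block width tends to zero.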

Figure 1: plot of the latent process
Figure 2: plot of the contaminated observation
Figure 3: plot of the local means

2.2 Notations and assumptions

We set the following notations.

  1. For a matrix , denotes the transpose of and . For same size matrices and , .

  2. For any vector

    , denotes the -th component of . Similarly, , and denote the -th component, the -th row vector and -th column vector of a matrix respectively.

  3. For any vector , , and for any matrix , .

  4. is a positive generic constant independent of all other variables. If it depends on fixed other variables, e.g. an integer , we will express as .

  5. and .

  6. Let us define .

  7. A -valued function on is a polynomial growth function if for all ,

    is a polynomial growth function uniformly in if for all ,

    Similarly we say is a polynomial growth function uniformly in if for all ,

3 Proofs

    We give the proofs of the main theorems discussed above together with some preliminary results. Some of them are also discussed in detail in [15].

    We introduce some notation which appears only in the proof section.

    1. Let us denote some -fields such that , , , .

    2. We define the following -valued random variables which appear in the expansion:

    3. .

    4. We set the following empirical functionals:

    5. Let us define , and for , and .

    6. We denote

      which are sequences of -valued functions and -valued ones such that the components of themselves and their derivatives with respect to are polynomial growth functions for all and .

    7. Let us define

      which is a family of sequences of the functions such that the components of the functions and their derivatives with respect to are polynomial growth functions and there exist a -valued sequence s.t. and such that for all and for the sequence discussed above,

    8. Denote

3.1 Conditional expectation of supremum

    The following two propositions are multidimensional extensions of Proposition 5.1 and Proposition A in [7] respectively.

    Proposition 1.

    Under (A1), for all , there exists a constant such that for all ,

    Proposition 2.

    Under (A1) and for a function whose components are in , assume that there exists such that

    Then for any ,

    Especially for ,

    The next proposition summarises some results useful for computation.

    Proposition 3.

    Under (A1), for all where there exists a constant such that and , we have


    Proof. (i), (ii): Let be the infinitesimal generator of the diffusion process. By the Itô-Taylor expansion, for all ,

    and the second term has the evaluation

    Therefore, we have (i), and an identical evaluation holds for (ii).
    (iii): Using (i) and Hölder’s inequality, we have the result.
    (iv): By Proposition 2 and Hölder’s inequality, we obtain the result.
    (v): By convexity, we have

    Hölder’s inequality, Fubini’s theorem, the Burkholder-Davis-Gundy inequality and Proposition 2 give the result. ∎

3.2 Propositions for ergodicity and evaluations of expectation

    The next result is a multivariate version of [14] or [8] using Proposition 1.

    Lemma 4.

    Assume (A1)-(A3) hold. Let be a function in and assume that , the components of and are polynomial growth functions uniformly in . Then the following convergence holds:

3.3 Characteristics of local means

    The following propositions, lemmas and corollary are multidimensional extensions of those in [7] and [3].

    Lemma 5.

    and are -measurable, independent of and Gaussian. These random variables have the following decomposition:

    In addition, the evaluation of the following conditional expectations holds:

    where , and .

    For the proof, see Lemma 8.2 in [3] and extend it to the multidimensional case.

    Proposition 6.

    Under (A1), (AH), assume the component of the function on , and are polynomial growth functions uniformly in . Then there exists such that for all and ,

    Moreover, for all ,

    The proof is almost identical to that of Corollary 3.3 in [3] except for the dimension, which does not influence the evaluation.

    Proposition 7.

    Under (A1) and (AH),

    where is a -measurable random variable such that there exists and for all satisfying the inequalities

    For the proof, see that of Proposition 3.4 in [3] and extend the discussion to the multidimensional case.

    Corollary 8.

    Under (A1) and (AH),

    where is a -measurable random variable such that there exists and for all satisfying the inequalities


    Proof. It is enough to see that satisfies the evaluation for . Proposition 6 and Proposition 7 give

    With respect to the third evaluation, Hölder’s inequality verifies the result. ∎

    The following lemma summarises some useful evaluations for computation.

    Lemma 9.

    Assume is a function whose components are in and the components of and are polynomial growth functions in . In addition, denotes a function whose components are in and that the components of and are polynomial growth functions. Under (A1), (A3), (A4) and (AH), the following uniform evaluation holds:


    Simple computations and the results above lead to the proof. ∎

3.4 Uniform law of large numbers

    The following propositions and theorems are multidimensional versions of results in [3].

    Proposition 10.

    Assume is a function in and , the components of , and are polynomial growth functions uniformly in . Under (A1)-(A4), (AH),

    The proof is almost the same as that of Proposition 4.1 in [3].

    Theorem 11.

    Assume is a function in and the components of , , and are polynomial growth functions uniformly in . Under (A1)-(A4), (AH),


    We define the following random variables:

    and then

    Hence it is enough to see the uniform convergences in probability of the first term and the second one in the right hand side.

    In the first place, we consider the first term of the right hand side above. We can decompose the sum of as follows:

    To simplify notation, we only consider the first term on the right-hand side; the other terms have identical evaluations. Let us define the following random variables:

    and recall Proposition 7 which states

    Therefore we have

    First of all, we show the pointwise convergence to 0 for all , and we abbreviate as . Since is -measurable and hence -measurable, the sequence of random variables is -adapted, and hence it is enough to see