1. Introduction
We consider the $d$-dimensional ergodic diffusion process defined by the stochastic differential equation
$$dX_t = b(X_t, \beta)\,dt + a(X_t, \alpha)\,dw_t, \quad X_0 = x_0,$$
where $\{w_t\}_{t \ge 0}$ is an $r$-dimensional Wiener process, $x_0$ is a $d$-dimensional random vector independent of $\{w_t\}_{t \ge 0}$, $\alpha \in \Theta_1$ and $\beta \in \Theta_2$ are unknown parameters, $\Theta_i$ is a bounded, open and convex set in $\mathbb{R}^{m_i}$ admitting Sobolev’s inequalities for the embedding $W^{1,p}(\Theta_i) \hookrightarrow C(\overline{\Theta}_i)$ (see Adams and Fournier, 2003; Yoshida, 2011) for $i = 1, 2$, $\theta^{\star} = (\alpha^{\star}, \beta^{\star})$ is the true parameter vector, and $a : \mathbb{R}^d \times \Theta_1 \to \mathbb{R}^d \otimes \mathbb{R}^r$ and $b : \mathbb{R}^d \times \Theta_2 \to \mathbb{R}^d$ are known functions.
Our concern in this paper is the estimation of $\theta^{\star}$ from long-term, discrete and noisy observations, defined as the sequence of $d$-dimensional random vectors $\{Y_{ih_n}\}_{i=0,\ldots,n}$ such that for all $i = 0, \ldots, n$,
$$Y_{ih_n} = X_{ih_n} + \Lambda^{1/2}\varepsilon_{ih_n},$$
where $h_n > 0$ is the discretisation step satisfying $h_n \to 0$ and $T_n := nh_n \to \infty$ as $n \to \infty$, $\{\varepsilon_{ih_n}\}_{i=0,\ldots,n}$ is an i.i.d. sequence of $d$-dimensional random vectors with $\mathbf{E}[\varepsilon_{ih_n}] = 0$ and $\mathrm{Var}(\varepsilon_{ih_n}) = I_d$, the components of $\varepsilon_{ih_n}$ are independent of each other and have distributions symmetric with respect to 0, and $\Lambda$ is a $d \times d$ real positive semidefinite matrix defining the variance of the noise term $\Lambda^{1/2}\varepsilon_{ih_n}$. Let us assume that the half-vectorisation $\theta_{\varepsilon} := \mathrm{vech}\,\Lambda$ lies in a bounded, convex and open parameter space $\Theta_{\varepsilon} \subset \mathbb{R}^{d(d+1)/2}$, and denote $\Xi := \Theta_{\varepsilon} \times \Theta_1 \times \Theta_2$.
Statistical inference for ergodic diffusion processes has been studied for the last few decades; see, for instance, Florens-Zmirou (1989); Yoshida (1992); Bibby and Sørensen (1995); Kessler (1995, 1997); Kutoyants (2004); Iacus (2008); De Gregorio and Iacus (2013, 2018); Iacus and Yoshida (2018) and references therein. Parametric inference for ergodic diffusion processes with discrete and noisy observations has been studied in Favetto (2014, 2016) and Nakakita and Uchida (2017, 2018a, 2018b, 2018c). For parametric estimation for non-ergodic diffusion processes in the presence of market microstructure noise, see Ogihara (2018). Favetto (2014) proposes a simultaneous quasi likelihood function, which necessitates optimisation with respect to both $\alpha$ and $\beta$, and shows that maximum likelihood (ML) type estimators are consistent even if the variance of the noise is unknown; Favetto (2016) discusses asymptotic normality of the estimator proposed in Favetto (2014) when the variance of the noise is known; Nakakita and Uchida (2017, 2018b) suggest two adaptive quasi likelihood functions, which lessen the computational burden in comparison to Favetto (2014, 2016), and prove consistency and asymptotic normality of the adaptive ML type estimators corresponding to these quasi likelihoods; Nakakita and Uchida (2018a) use those quasi likelihood functions for likelihood-ratio-type tests and show the asymptotic behaviour of the test statistics under both null and alternative hypotheses; Nakakita and Uchida (2018c) analyse those quasi likelihood functions within the framework of quasi likelihood analysis (QLA) proposed by Yoshida (2011), and show the polynomial large deviation inequality (PLDI) for the quasi likelihood functions and, consequently, the convergence of moments of the adaptive ML type estimators and the adaptive Bayes type ones. For details of adaptive estimation for diffusion processes, see Yoshida (1992, 2011); Uchida and Yoshida (2012, 2014).
In general, however, the optimisation of quasi likelihood functions for diffusion processes, with or without noise, depends strongly on the initial values, especially when the volatility function or the drift function is nonlinear with respect to the parameters. Hence, Kaino et al. (2017) and Kaino and Uchida (2018a, b) propose a hybrid multi-step estimation procedure for diffusion processes, in which initial values for the optimisation are derived from Bayes type estimation with reduced sample sizes and sequential optimisation is then carried out from these initial values. This procedure inherits the idea of hybrid multi-step estimation for diffusion processes with full sample sizes in Kamatani and Uchida (2015) (see also Kutoyants, 2017). In this research, we also consider hybrid multi-step estimation and apply the idea to the inference problem for discretely and noisily observed ergodic diffusion processes. In particular, we study the PLDI for the quasi likelihood functions and the convergence of moments of the estimators, since the PLDI and the convergence of moments of estimators are key tools for showing the mathematical validity of information criteria for model selection problems (see Uchida, 2010; Fujii and Uchida, 2014; Eguchi and Masuda, 2018).
This paper is organised as follows. Section 2 introduces the notation and assumptions. In Section 3 we define the initial and multi-step estimators and state the main results on the polynomial-type large deviation inequalities, moment estimates of the Bayes type estimators, and convergence of moments. A concrete example and simulation results are given in Section 4, the conclusions of this work are summarised in Section 5, and the proofs of the results are given in Section 6.
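To make the observation scheme of the Introduction concrete, the following minimal sketch simulates a latent diffusion on an equally spaced grid and contaminates it with additive i.i.d. noise. Everything specific in it is an illustrative assumption rather than part of this paper: the one-dimensional Ornstein-Uhlenbeck model, the Euler-Maruyama discretisation, the Gaussian noise, the parameter values, and the hypothetical function name `simulate_noisy_ou`.

```python
import math
import random


def simulate_noisy_ou(n, h, alpha=1.0, beta=1.0, noise_sd=0.1, x0=0.0, seed=0):
    """Simulate a latent path and its noisy observations on the grid i*h.

    Assumed toy model (not taken from the paper):
        dX_t = -beta * X_t dt + alpha dW_t,   Y_i = X_i + noise_sd * eps_i,
    with eps_i i.i.d. standard normal (mean 0, variance 1, symmetric about 0).
    The latent path is approximated by the Euler-Maruyama scheme.
    """
    rng = random.Random(seed)
    x = [x0]
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(h))  # Wiener increment over one step
        x.append(x[-1] - beta * x[-1] * h + alpha * dw)
    # Additive observation noise, drawn after the latent path
    y = [xi + noise_sd * rng.gauss(0.0, 1.0) for xi in x]
    return x, y


if __name__ == "__main__":
    n, h = 10_000, 0.01  # small step h with a long horizon n * h
    x, y = simulate_noisy_ou(n, h)
    print(len(y), n * h)
```

Only the sequence `y` would be available to the statistician; `x` is returned here purely so that the effect of the noise can be inspected.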
2. Notation and Assumption
First of all we give the notation used throughout this paper.

For every matrix $A$, $A^{T}$ is the transpose of $A$, and $A^{\otimes 2} := AA^{T}$.

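For every set of matrices $A$ and $B$ of the same size, $A[B] := \mathrm{tr}(AB^{T})$. Moreover, for any $m \in \mathbb{N}$, $A \in \mathbb{R}^m \otimes \mathbb{R}^m$ and $u, v \in \mathbb{R}^m$, $A[u, v] := v^{T}Au$.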

Let us denote the $\ell$th element of any vector $v$ as $v^{(\ell)}$ and the $(\ell_1, \ell_2)$th element of any matrix $A$ as $A^{(\ell_1, \ell_2)}$.

For any vector $v$ and any matrix $A$, $|v| := \sqrt{\mathrm{tr}(v^{T}v)}$ and $\|A\| := \sqrt{\mathrm{tr}(A^{T}A)}$.

For every $p > 0$, $\|\cdot\|_p$ is the $L^p(P_{\theta^{\star}})$ norm.

, , and .

For and , , , .

With respect to filtration, for all , , , , , , and .
With respect to , we assume the following conditions.


.

For a constant , for all ,

For all , .

There exists a unique invariant measure on and for all and with polynomial growth,

For any polynomial growth function satisfying , there exist , with at most polynomial growth for such that for all ,
where is the infinitesimal generator of .

Remark 1.

There exists such that and have continuous derivatives satisfying
With the invariant measure , we define
where . For these functions, let us assume the following identifiability conditions hold.

There exist and such that for all and , and .

For all , there exist and such that for all and , and .
The next assumption is concerned with the moments of noise.

For any , has th moment and the components of are independent of the other components for all , and
. In addition, for all odd integer
, , , and , , and .

There exist and such that for sufficiently large .
Remark 2.
should be smaller than or equal to such that . should be larger than and smaller than ; otherwise for some , , which must converge to 0 for in Theorem 2, or as , which must diverge in entire discussion. In addition, note that under [A6], for all .
3. Multistep estimator and PLDI
3.1. Setting of the initial and multistep estimators
We define sequences of local means such that , and , where
for and . For the detailed properties of local means, see Favetto (2014, 2016); Nakakita and Uchida (2017, 2018b, 2018c).
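As a rough illustration of why local means help with noisy data, the sketch below assumes the block-average form used in Favetto (2014): each local mean is the plain average of `p` consecutive observations, and the blocks do not overlap. The names `local_means`, `sample_variance` and `p` are hypothetical, and the precise definition used in this paper should be taken from the displayed formula, not from this code.

```python
import random


def local_means(y, p):
    """Average consecutive, non-overlapping blocks of p observations.

    Returns len(y) // p block averages; trailing observations that do not
    fill a complete block are dropped.
    """
    if p <= 0:
        raise ValueError("block size p must be positive")
    k = len(y) // p
    return [sum(y[j * p:(j + 1) * p]) / p for j in range(k)]


def sample_variance(v):
    """Unbiased sample variance of a list of numbers."""
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v) / (len(v) - 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # Pure i.i.d. noise with unit variance, to isolate the averaging effect
    noise = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
    print(sample_variance(noise))
    print(sample_variance(local_means(noise, 25)))
```

For i.i.d. noise, averaging over a block of size `p` divides the noise variance by `p`, so the second printed value should be close to 1/25 of the first. The price is that `n` raw observations are reduced to about `n / p` local means.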
We set for , and such that satisfying (then it holds ), and correspondingly .
Remark 3.
and should be larger than to support the divergence . and actually work to make and , which are the quantities appearing in Remark 6.
Let us set and the following quasi likelihood functions:
Using these quasi likelihood functions, we also define the next two functions such that
Then, the initial estimators are defined as follows:
where and . Note that uses the whole data.
3.2. PLDIs for the quasi likelihood functions
To examine the asymptotic behaviours of and , firstly we will see that the boundedness such that
To show this boundedness, we define some random quantities: random fields such that
score functions such that
the observed information matrices such that
and the limiting information matrices such that