## 1 Introduction

Consider the following problem. We observe a continuous-time trajectory of a diffusion process satisfying the stochastic differential equation

(1) |

where is the standard Wiener process and the drift coefficient has a cusp-type singularity, i.e., in the vicinity of the point we have , where . The parameter is unknown and has to be estimated from the observations . We are interested in the asymptotic properties of estimators of this parameter in the small-noise asymptotics: .

Such stochastic models, sometimes called dynamical systems with small noise or perturbed dynamical systems, attract the attention of probabilists and statisticians (see, for example, Freidlin and Wentzel [7], Kutoyants [12] and references therein). The interest in these stochastic models can be explained as follows. Suppose that we have a dynamical system described by the ordinary differential equation

(2) |

The right-hand part (rhp) of this system depends on some parameter and therefore the state of the dynamical system also depends on the value of this parameter, i.e., . If we know , then we know the trajectory . For many real systems it is natural to suppose that the rhp contains some small noise (perturbations)

(3) |

The most “popular” noise considered in the corresponding literature is the so-called white Gaussian noise (WGN), i.e., is a Gaussian process with the properties . Here is the Dirac delta-function. In this case the observations of the system (3) can be written as the solution of the stochastic differential equation (1); that is, we replace by the derivative of the standard Wiener process. Of course, the Wiener process is not differentiable, and equation (1) is just shorthand for the corresponding integral equation

A wide class of estimation problems (parameter estimation and nonparametric estimation) was considered in [12]. The properties of estimators (maximum likelihood, Bayesian, minimum distance) are well studied in regular (smooth with respect to the unknown parameter) and non-regular (change point, delay estimation) cases. The smooth case corresponds to a trend coefficient that is continuously differentiable w.r.t. and to finite Fisher information. The change-point problem can be described by the following example

i.e., we have a switching diffusion process with unknown threshold . Such models are called threshold diffusion processes, by analogy with threshold autoregressive (TAR) time series [1], and the statistical problems related to them are singular [14]. If we have a cusp-type singularity such as

where , then for close to zero we have cusp-type switching, similar to a change point but without a jump. Usually the characteristics of real systems cannot “make jumps”, and cusp-type switching sometimes fits real systems better.
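The small-noise model discussed above can be explored numerically. The following is a minimal sketch, assuming the observation model has the form dX = S(θ, X) dt + ε dW and using a hypothetical drift S(θ, x) = a|x − θ|^κ + h with illustrative constants a, κ, h (none of these values come from the paper). It uses the Euler–Maruyama scheme, which is a simulation choice made here, and compares noisy paths with the noise-free path as ε decreases.

```python
import math
import random


def drift(theta, x, a=1.0, kappa=0.25, h=1.0):
    # Hypothetical cusp-type drift: continuous in x, not differentiable at x = theta.
    # The constant h > 0 keeps the drift separated from zero.
    return a * abs(x - theta) ** kappa + h


def euler_path(theta, eps, x0=0.0, T=1.0, n=2000, seed=0):
    # Euler-Maruyama discretization of dX = S(theta, X) dt + eps dW on [0, T].
    rng = random.Random(seed)
    dt = T / n
    x = x0
    path = [x]
    for _ in range(n):
        x += drift(theta, x) * dt + eps * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def sup_distance(p, q):
    # Uniform distance between two paths sampled on the same time grid.
    return max(abs(u - v) for u, v in zip(p, q))


# eps = 0 gives the Euler approximation of the unperturbed (deterministic) system.
limit_path = euler_path(theta=0.5, eps=0.0)
dist_small = sup_distance(euler_path(0.5, 0.01), limit_path)
dist_large = sup_distance(euler_path(0.5, 0.1), limit_path)
print(dist_small, dist_large)
```

Both noisy paths are driven by the same simulated noise (same seed), so the printed distances isolate the effect of the noise level ε.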

In the present work we are interested in the properties of these estimators when the trend coefficient has a cusp-type singularity. This case is in some sense intermediate between the regular case and the change-point (discontinuous drift) case. Statistical problems for models with cusp-type singularities have been studied since 1968, when Prakasa Rao [19] described the asymptotic distribution of the MLE in the case of i.i.d. observations with the density function having the representation with in the vicinity of the point . It was shown that

where

is some constant and the random variable

will be described later. Note that in this case the Fisher information does not exist and the study of estimators requires special techniques. An exhaustive treatment of singular estimation problems (including cusp-type singularities) can be found in Chapter VI of the fundamental work by Ibragimov and Khasminskii [9]. That work contains general results on the asymptotic behavior of the MLE and Bayesian estimators in situations that include cusp-type singularities. In particular, the authors described the asymptotic distributions of the MLE and BE and showed that the BE are asymptotically efficient in the minimax sense. For inhomogeneous Poisson processes with intensity functions having a cusp-type singularity, the properties of the MLE and BE were described in [3]. For ergodic diffusion processes with a drift coefficient having a cusp-type singularity, similar results were obtained in [4]. Cusp-type singularities in regression models were treated in [20] and in [6]. For the model of a signal in WGN, where the signal has a cusp-type singularity, such results were obtained in [2]. Note that the case was considered in [8] (ergodic diffusion) and in [10]. A survey of the properties of estimators for different models of stochastic processes with cusp-type singularities can be found in [5].

The method of studying estimators through the properties of the normalized likelihood ratio, developed in [9], is in some sense universal. It has been applied to the study of estimators for a wide class of models of observations and is applied in the present work too. In particular, we check the conditions of two general theorems (Theorem 1.10.1 and Theorem 1.10.2) in [9] concerning the behavior of estimators.

We show that the MLE and Bayesian estimators are consistent, have different limit distributions

with the same constant , the polynomial moments of these estimators converge and that the BE are asymptotically efficient. The random variables and are defined in the next section.

## 2 Main result

We suppose that the following condition is fulfilled:

Condition . The drift coefficient

where and . The function is bounded, has a continuous bounded derivative w.r.t. : and is separated from zero: (for all ). The parameter , where and .

The limit of is – solution of the deterministic equation

(4) |

Note that by this condition we have the estimate

(5) |

with some . Here and in the sequel denotes the true value. Let us denote

where .

The properties of the maximum likelihood and Bayesian estimators are described with the help of the limit likelihood ratio. Recall that the likelihood ratio in this problem is (see Liptser and Shiryaev [15])

The maximum likelihood estimator (MLE) is defined as a solution of the equation

If this equation has more than one solution, then any of them can be taken as the MLE. Note that we cannot use the maximum likelihood equation

where the dot denotes the derivative w.r.t. because the likelihood ratio function is not differentiable.

The Bayesian estimator (BE)

for the quadratic loss function and prior density

(continuous positive function) is defined by the expression

We take the quadratic loss function for simplicity of exposition. The properties of the likelihood ratio established in this work allow us to describe the behavior of the BE for an essentially wider class of loss functions (see Theorem 1.10.2 in [9]).
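The two estimators defined above can be illustrated numerically. The sketch below rests on several assumptions that are not taken from the paper: the model is dX = S(θ, X) dt + ε dW with the hypothetical drift S(θ, x) = a|x − θ|^κ + h, the log-likelihood ratio is the standard Girsanov-type expression (Itô integral of S/ε² against dX minus half the integral of S²/ε²) replaced by Riemann–Itô sums, the maximization is a plain grid search (no derivative of the likelihood is used), and the prior is uniform on the grid.

```python
import math
import random

A, KAPPA, H = 1.0, 0.25, 1.0  # illustrative drift constants (assumptions)


def drift(theta, x):
    # Hypothetical cusp-type drift, separated from zero by H > 0.
    return A * abs(x - theta) ** KAPPA + H


def simulate(theta0, eps, x0=0.0, T=1.0, n=2000, seed=1):
    # Euler-Maruyama path of dX = S(theta0, X) dt + eps dW.
    rng = random.Random(seed)
    dt = T / n
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        xs.append(x + drift(theta0, x) * dt
                  + eps * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return xs, dt


def log_likelihood(theta, xs, dt, eps):
    # Discretized log-likelihood ratio: sum of S dX minus half the sum of S^2 dt,
    # both divided by eps^2.
    total = 0.0
    for x, x_next in zip(xs, xs[1:]):
        s = drift(theta, x)
        total += s * (x_next - x) - 0.5 * s * s * dt
    return total / eps ** 2


def estimate(xs, dt, eps, grid):
    ll = [log_likelihood(th, xs, dt, eps) for th in grid]
    top = max(ll)
    mle = grid[ll.index(top)]                  # grid maximizer of the likelihood
    weights = [math.exp(v - top) for v in ll]  # posterior weights, uniform prior
    be = sum(w * th for w, th in zip(weights, grid)) / sum(weights)
    return mle, be


theta_true, eps = 0.5, 0.01
xs, dt = simulate(theta_true, eps)
grid = [0.1 + 0.005 * k for k in range(161)]  # candidate values in [0.1, 0.9]
mle, be = estimate(xs, dt, eps, grid)
print(mle, be)
```

Subtracting the maximal log-likelihood before exponentiating avoids overflow, since the log-likelihood is of order 1/ε². The posterior mean on the grid corresponds to the Bayesian estimator for the quadratic loss.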

The limit behavior of the MLE and BE is described with the help of two random variables and defined as follows. Let us introduce the random function

(6) |

and put

(7) |

Here is a two-sided fractional Brownian motion with Hurst parameter . The random variable is well defined [18]. We also need the definitions

As usual in such problems, we can introduce the lower minimax bound on the risks of all estimators:

###### Proposition 1

Let the condition be fulfilled, then for all and all estimators we have

(8) |

We discuss the proof of this proposition after the proof of Theorem 1 below.

According to this bound we call an estimator asymptotically efficient if for all we have the equality

The main result of this work is the following theorem.

###### Theorem 1

Let the condition be fulfilled, then the MLE and the BE are consistent uniformly on compacts, have different limit distributions

the moments converge (uniformly on compacts): for any

and the Bayesian estimators are asymptotically efficient.

Proof. Let us introduce the normalized likelihood ratio

It has the representation

We show below that converges in distribution to the random function .

The first result which we are going to prove is the uniform convergence of the random process to the deterministic solution of the ordinary equation (4). To prove it we need the following estimate.

###### Lemma 1

(N.V. Krylov [11]) Let the conditions be fulfilled, then there exists a constant

such that with probability 1

(9) |

Proof. Let us denote by the right hand part of the equation (4). Then we can write

(10) |

If we put , then the equation

can be written as

or

Using the smoothness of and the elementary inequalities

we write two estimates

Hence we have

and (recall that )

The equality (10) allows us to write

As the function is continuous we have

where . Hence

Recall that is bounded and separated from zero by a positive constant which does not depend on . Further, there exists a constant such that

Therefore

where the constant .

###### Lemma 2

Let the condition be fulfilled, then for any there exist the constants and such that

(11) |

for all with some .

Proof. Recall that for any

Hence we can write

The last expression allows us to take such that for all we have the estimate (11) where and .

###### Lemma 3

Let the condition be fulfilled, then the finite-dimensional distributions of the stochastic process converge to the finite-dimensional distributions of and this convergence is uniform on the compacts .

Proof. Consider the stochastic integral

Note that is a Gaussian process. By condition the solution is a strictly increasing function. Therefore we can define by the relation

This gives us the equality ()

Hence if we put

then is a Gaussian process with independent increments

and

Further, let us change the variables . Then

with the corresponding two-sided Wiener process

Here and are two independent standard Wiener processes. We used here the relation

Therefore for any fixed value we have the following representation of the limit process

It has the following properties: and

The process

is known as a representation of the two-sided fractional Brownian motion, because is a Gaussian process with the properties:

Hence using the standard arguments we obtain the convergence of the finite-dimensional distributions

and this convergence is uniform on the compacts .
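For reference, the standard properties that characterize a two-sided fractional Brownian motion can be written out as follows. The notation $W^{H}$ for the process and $H\in(0,1)$ for the Hurst parameter is generic and assumed here; it may differ from the symbols used in this paper.

```latex
% Two-sided fractional Brownian motion W^H(u), u \in \mathbb{R}, Hurst parameter H \in (0,1):
% a centered Gaussian process starting at zero with stationary increments.
\begin{align*}
  W^{H}(0) &= 0, \qquad \mathbf{E}\, W^{H}(u) = 0, \\
  \mathbf{E}\,\bigl|W^{H}(u)-W^{H}(v)\bigr|^{2} &= |u-v|^{2H}, \\
  \mathbf{E}\, W^{H}(u)\, W^{H}(v) &= \tfrac{1}{2}\Bigl(|u|^{2H}+|v|^{2H}-|u-v|^{2H}\Bigr).
\end{align*}
```

The covariance formula in the last line follows from the first two by expanding the square of the increment, so checking the variance of the increments of a centered Gaussian process vanishing at zero is enough to identify it.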

Let us consider the ordinary integral

If we show the convergence in probability

(12) |

then we obtain the convergence

We can write

where we denoted

Let us denote by the normalized local time of the diffusion process and recall that for any function we have the occupation time formula

(13) |

Moreover, according to (9), we know that

(14) |

where we denoted Hence for any continuous function we have the convergence

(see details in [13]). For example, for any small and

We can write

where we put and .

For we have similar relations