
# Optimal convergence rates for the invariant density estimation of jump-diffusion processes

We aim at estimating the invariant density associated to a stochastic differential equation with jumps in low dimension, that is, for d=1 and d=2. We consider a class of jump diffusion processes whose invariant density belongs to some Hölder space. Firstly, in dimension one, we show that the kernel density estimator achieves the convergence rate 1/T, which is the optimal rate in the absence of jumps. This improves the convergence rate obtained in [Amorino, Gloter (2021)], which depends on the Blumenthal-Getoor index for d=1 and is equal to log T/T for d=2. Secondly, we show that it is not possible to find an estimator with faster rates of estimation. Indeed, we get some lower bounds with the same rates {1/T, log T/T} in the mono- and bi-dimensional cases, respectively. Finally, we obtain the asymptotic normality of the estimator in the one-dimensional case.


## 1 Introduction

Solutions to Lévy-driven stochastic differential equations have recently attracted a lot of attention in the literature due to their many applications in various areas such as finance, physics, and neuroscience. Indeed, this class includes some important examples from finance such as the well-known Kou model in [29], the Barndorff-Nielsen-Shephard model ([8]), and the Merton model ([32]), to name just a few. An important example of application of jump processes in neuroscience is the stochastic Morris-Lecar neuron model presented in [25]. As a consequence, statistical inference for jump processes has recently become an active domain of research.

We consider the process solution to the following stochastic differential equation with jumps:

$$X_t = X_0 + \int_0^t b(X_s)\,ds + \int_0^t a(X_s)\,dB_s + \int_0^t \int_{\mathbb{R}^d_0} \gamma(X_{s^-})\,z\,\big(\nu(ds,dz) - F(z)\,dz\,ds\big), \tag{1}$$

where $B$ is a $d$-dimensional Brownian motion and $\nu$ is a Poisson random measure associated to a Lévy process with Lévy density function $F$. We focus on the estimation of the invariant density $\mu$ associated to the jump process $X$ solution to (1) in low dimension, that is, for $d=1$ and $d=2$. In particular, assuming that a continuous record of $(X_t)_{t\in[0,T]}$ is available, our goal is to propose a non-parametric kernel estimator for the estimation of the stationary measure and to discuss its convergence rate for large $T$.
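To make the model (1) concrete, here is a minimal simulation sketch in dimension one. It is not taken from the paper: it assumes a finite jump intensity (a compound Poisson jump part), a centred jump law so that the compensator term $\gamma(x)\int z F(z)\,dz$ vanishes, and an Euler discretisation with step `dt`. The names `simulate_jump_diffusion`, `jump_rate` and `jump_sampler` are hypothetical.

```python
import math
import random


def simulate_jump_diffusion(T, dt, b, a, gamma, jump_rate, jump_sampler,
                            x0=0.0, seed=0):
    """Euler scheme for a 1-d jump diffusion with compound Poisson jumps.

    Assumptions (not from the paper): finite jump intensity `jump_rate`,
    jump sizes drawn by `jump_sampler(rng)` with mean zero (so no
    compensator drift is needed), and jump_rate * dt < 1.
    """
    rng = random.Random(seed)
    n_steps = int(round(T / dt))
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Brownian increment over one step
        dB = rng.gauss(0.0, math.sqrt(dt))
        # At most one jump per step: Bernoulli approximation of the
        # Poisson number of jumps, reasonable when jump_rate * dt is small
        jump = 0.0
        if rng.random() < jump_rate * dt:
            jump = gamma(x) * jump_sampler(rng)
        x = x + b(x) * dt + a(x) * dB + jump
        path.append(x)
    return path


if __name__ == "__main__":
    path = simulate_jump_diffusion(
        T=10.0, dt=0.01,
        b=lambda x: -math.tanh(x),    # bounded, Lipschitz, mean-reverting drift
        a=lambda x: 1.0,              # constant volatility
        gamma=lambda x: 1.0,          # constant jump coefficient
        jump_rate=2.0,
        jump_sampler=lambda rng: rng.gauss(0.0, 1.0),
    )
    print(len(path), path[-1])
```

The discretised path only approximates the continuous record assumed in the paper; the sketch is meant to fix ideas about the three terms of the equation (drift, Brownian part, jump part).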

The same framework has been considered in some recent papers such as [2], [23] (Section 5.2), and [3]. In the first paper, it is shown that, for the pointwise estimation of the invariant density, the kernel estimator achieves the rate $\log T/T$ for $d=2$ and, for $d=1$, a rate that depends on the Blumenthal-Getoor index of the jumps. We recall that, in the absence of jumps, the optimal convergence rate in the one-dimensional case is $1/T$, while the one found in [2] depends on the jumps and is slower.

In this paper, we ask whether such a deterioration of the rate is due to the presence of jumps or to the approach used. Indeed, our purpose is to look for a new approach that recovers a better convergence rate in the one-dimensional case (hopefully the same as in the continuous case) and to discuss the optimality of such a rate. This new approach will also yield the asymptotic normality of the proposed estimator. After that, we will discuss the optimality of the convergence rate in the bi-dimensional case. This will close the circle of the analysis of the convergence rates for the estimation of the invariant density of jump-diffusions, as the convergence rates and their optimality in the case $d \ge 3$ have already been treated in detail in [3].

Beyond these works, to the best of our knowledge, the literature concerning non-parametric estimation of diffusion processes with jumps is not extensive. One of the few examples is given by Funke and Schmisser: in [27] they investigate the non-parametric adaptive estimation of the drift of an integrated jump diffusion process, while in [35], Schmisser deals with the non-parametric adaptive estimation of the coefficients of a jump diffusion process. To name other examples, in [24] the authors estimate in a non-parametric way the drift of a diffusion with jumps driven by a Hawkes process, while in [4] the volatility and the jump coefficients are considered.

On the other hand, the problem of invariant density estimation has been considered by many authors (see e.g. [33], [20], [10], [39], and [5]) in several different frameworks: it is at the same time a long-standing problem and a highly active current topic of research. One of the reasons why the estimation of the invariant density has attracted the attention of many statisticians is the large number of numerical methods to which it is connected, MCMC methods above all. An approximation algorithm for the computation of the invariant density can be found for example in [30] and [34]. Moreover, invariant distributions are essential for the analysis of the stability of stochastic differential systems (see e.g. [28] and [5]).

In [5], [6], and [11] some kernel estimators are used to estimate the marginal density of a continuous time process. When this density belongs to some Hölder class whose smoothness is $\beta$, they prove under some mixing conditions that their pointwise risk achieves the standard rate of convergence and that the rates are minimax in their framework. Castellana and Leadbetter proved in [15] that, under the following condition CL, the density can be estimated with the parametric rate $1/T$ by some non-parametric estimators (the kernel ones among them).

• CL: $u \mapsto \|g_u\|_\infty$ is integrable on $(0,\infty)$ and $g_u(\cdot,\cdot)$ is continuous for each $u > 0$.

In our context, $g_u(x,y) = \mu(x)\,p_u(x,y) - \mu(x)\,\mu(y)$, where $p_u(x,y)$ is the transition density. More precisely, they shed light on the fact that local irregularities of the sample paths provide some additional information. Indeed, if the joint distribution of $(X_0, X_u)$ is not too close to a singular distribution for $u$ small, then it is possible to achieve the superoptimal rate $1/T$ for the pointwise quadratic risk of the kernel estimator. Condition CL can be verified for ergodic continuous diffusion processes (see [38] for sufficient conditions). The paper of Castellana and Leadbetter led to a lot of works regarding the estimation of the common marginal distribution of a continuous time process. In [9], [10], [14], [21], and [7] several related results and examples can be found.

An alternative to the kernel density estimator is given by the local time density estimator, which was proposed by Kutoyants in [22] in the case of diffusion processes and was extended by Bosq and Davydov in [12] to a more general context. The latter proved that, under a condition which is mildly weaker than CL, the mean squared error of the local time estimator reaches the full rate $1/T$. Leblanc built in [31] a wavelet estimator of a density belonging to some general Besov space and proved that, if the process is geometrically strong mixing and a condition like CL is satisfied, then its integrated risk converges at rate $1/T$ as well. In [18] the authors built a projection estimator and showed that its integrated risk achieves the parametric rate $1/T$ under a condition named WCL, which is slightly different from CL.

• There exists a positive integrable function $k$ (defined on $\mathbb{R}$) such that

$$\sup_{y\in\mathbb{R}} \int_0^\infty g_u(x,y)\,du \le k(x), \quad \text{for all } x \in \mathbb{R}.$$

In this paper, we will show that our mono-dimensional jump process satisfies a local irregularity condition WCL1 and an asymptotic independence condition WCL2 (see Proposition 1), two conditions into which the original condition WCL can be decomposed. In this way, it will be possible to show that the risk for the pointwise estimation of the invariant measure achieves the superoptimal rate $1/T$, using our kernel density estimator. Moreover, the same conditions will result in the asymptotic normality of the proposed estimator. Indeed, as we will see in the proof of Theorem 2, the main challenge in this part is to justify the use of the dominated convergence theorem, which will be ensured by conditions WCL1 and WCL2. We will find in particular that, for any collection $(x_i)_{1\le i\le m}$ of real numbers, we have

$$\sqrt{T}\,\big(\hat\mu_{h,T}(x_i) - \mu(x_i),\ 1 \le i \le m\big) \xrightarrow{\mathcal{D}} N^{(m)}\big(0, \Sigma^{(m)}\big) \quad \text{as } T \to \infty,$$

where $\hat\mu_{h,T}$ is the kernel density estimator and

$$\Sigma^{(m)} := \big(\sigma(x_i,x_j)\big)_{1\le i,j\le m}, \qquad \sigma(x_i,x_j) := 2\int_0^\infty g_u(x_i,x_j)\,du.$$

We remark that the precise form of the equation above allows us to construct tests and confidence sets for the density.

We have found the convergence rate for the risk associated to our kernel density estimator for the estimation of the invariant density for $d=1$ and $d=2$. Then, some questions naturally arise: are the convergence rates the best possible or is it possible to improve them by using other estimators? In order to answer, we consider a simpler model where both the volatility and the jump coefficient are constant and the intensity of the jumps is finite. Then, we look for a lower bound for the risk at a point $x$ defined as in equation (9) below. The first idea is to use the two hypotheses method (see Section 2.3 in [37]). To do that, the knowledge of the link between the drift and the invariant density is essential. While in the absence of jumps such a link is explicit, in our context it is more challenging. As shown in [19] and [3], it is possible to find the link knowing that the invariant measure has to satisfy $A^*\mu = 0$, where $A^*$ is the adjoint of the generator of the considered diffusion. This method allows us to show that the superoptimal rate $1/T$ is the best possible for the estimation of the invariant density in $d=1$, but it fails in the bi-dimensional case (see Remark 1 below for details). Finally, we use a finite number of hypotheses to prove a lower bound in the bi-dimensional case. This requires a detailed analysis of the Kullback divergence between the probability laws associated to the different hypotheses. Thanks to that, it is possible to recover the optimal rate $\log T/T$ in the two-dimensional case.

The paper is organised as follows. In Section 2 we give the assumptions on our model and we provide our main results. Section 3 is devoted to stating and proving some preliminary results needed for the proofs of the main results. To conclude, in Section 4 we give the proofs of Theorems 1, 2, 3, and 4, where our main results are gathered.

## 2 Model assumption and main results

We consider the following stochastic differential equation with jumps

$$X_t = X_0 + \int_0^t b(X_s)\,ds + \int_0^t a(X_s)\,dB_s + \int_0^t \int_{\mathbb{R}^d_0} \gamma(X_{s^-})\,z\,\big(\nu(ds,dz) - F(z)\,dz\,ds\big), \tag{2}$$

where $\mathbb{R}^d_0 := \mathbb{R}^d \setminus \{0\}$, the initial condition $X_0$ is an $\mathbb{R}^d$-valued random variable, the coefficients $b$, $a$, and $\gamma$ are measurable functions, $B$ is a $d$-dimensional Brownian motion, and $\nu$ is a Poisson random measure associated to a Lévy process with Lévy density function $F$. All sources of randomness are mutually independent.

We consider the following assumptions on the coefficients and on the Lévy density $F$:

1. The functions , and are globally Lipschitz and bounded. Moreover, Id, for some constant , where Id denotes the identity matrix and .

2. , for all , for some .

3. Supp and for all , , for some .

4. There exist and such that .

5. If , , for any .

Assumption A1 ensures that equation (2) admits a unique càdlàg adapted solution $X$ satisfying the strong Markov property, see e.g. [1]. Moreover, it is shown in [2, Lemma 2] that if we further assume Assumptions A2-A4, then the process $X$ is exponentially ergodic and exponentially $\beta$-mixing. Therefore, it has a unique invariant distribution, which we assume has a density $\mu$ with respect to the Lebesgue measure. Finally, Assumption A5 ensures the existence of the transition density of $X$, denoted by $p_t(x,y)$, which satisfies the following upper bound (see [2, Lemma 1]): for all $T > 0$, there exist $c_0 > 0$ and $\lambda_0 > 0$ such that, for any $t \in (0,T]$ and $x, y \in \mathbb{R}^d$,

$$p_t(x,y) \le c_0\left(t^{-d/2}\, e^{-\lambda_0 \frac{|y-x|^2}{t}} + \frac{t}{(t^{1/2} + |y-x|)^{d+\alpha}}\right). \tag{3}$$

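We assume that the process $X$ is observed continuously on a time interval $[0,T]$ such that $T$ tends to $\infty$. In the paper [2] cited above, the nonparametric estimation of $\mu$ is studied via the kernel estimator, which is defined as follows. We assume that $\mu$ belongs to the Hölder space $\mathcal{H}_d(\beta, L)$, where $\beta = (\beta_1, \dots, \beta_d)$ and $L = (L_1, \dots, L_d)$ with $\beta_i > 0$ and $L_i > 0$, which means that for all $i \in \{1, \dots, d\}$, $k = 0, 1, \dots, \lfloor \beta_i \rfloor$, and $t \in \mathbb{R}$,

$$\big\|D_i^{(k)}\mu\big\|_\infty \le L \quad \text{and} \quad \big\|D_i^{(\lfloor\beta_i\rfloor)}\mu(\cdot + t e_i) - D_i^{(\lfloor\beta_i\rfloor)}\mu(\cdot)\big\|_\infty \le L_i\,|t|^{\beta_i - \lfloor\beta_i\rfloor},$$

where $D_i^{(k)}\mu$ denotes the $k$-th order partial derivative of $\mu$ w.r.t. the $i$-th component, $\lfloor\beta_i\rfloor$ is the integer part of $\beta_i$, and $e_1, \dots, e_d$ is the canonical basis of $\mathbb{R}^d$. We set

$$\hat\mu_{h,T}(x) = \frac{1}{T\prod_{i=1}^d h_i}\int_0^T \prod_{i=1}^d K\!\left(\frac{x_i - X_t^i}{h_i}\right)dt =: \frac{1}{T}\int_0^T K_h(x - X_t)\,dt,$$

where $x = (x_1, \dots, x_d) \in \mathbb{R}^d$, $h = (h_1, \dots, h_d)$ is a bandwidth and $K$ is a kernel function satisfying

$$\int_{\mathbb{R}} K(x)\,dx = 1, \quad \|K\|_\infty < \infty, \quad \mathrm{supp}(K) \subset [-1,1], \quad \int_{\mathbb{R}} K(x)\,x^i\,dx = 0,$$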

for all with .
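As an illustration of the definition above, the following sketch evaluates the kernel estimator in dimension one from a discretised path, replacing the time integral by a left Riemann sum. It is only a sketch under stated assumptions: the continuous record is approximated by samples at step `dt`, and the Epanechnikov kernel is used as an example of a bounded kernel supported on $[-1,1]$ that integrates to one and has a vanishing first moment (higher-order moment conditions would need a different kernel). The function names are hypothetical.

```python
def epanechnikov(u):
    """Bounded kernel on [-1, 1] with integral 1 and zero first moment."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_density_estimate(path, dt, x, h, kernel=epanechnikov):
    """Riemann-sum version of (1 / (T h)) * int_0^T K((x - X_t) / h) dt.

    `path` holds X_0, X_dt, ..., X_T sampled on a regular grid of step `dt`.
    """
    n_steps = len(path) - 1
    T = dt * n_steps
    # Left Riemann sum of the time integral
    integral = sum(kernel((x - xt) / h) for xt in path[:-1]) * dt
    return integral / (T * h)


if __name__ == "__main__":
    # A path frozen at 0: all the mass of the occupation measure sits at 0
    frozen = [0.0] * 101
    print(kernel_density_estimate(frozen, dt=0.1, x=0.0, h=1.0))
    print(kernel_density_estimate(frozen, dt=0.1, x=2.0, h=1.0))
```

With a simulated or observed path in place of `frozen`, calling the function on a grid of points `x` gives a discretised version of $\hat\mu_{h,T}$.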

We first consider equation (2) with $d=1$ and show that the kernel estimator reaches the optimal rate $1/T$, as is the case for the stochastic differential equation (2) without jumps. For this, we need the following additional assumption on $F$.

1. F belongs to and for all , , for some .

###### Theorem 1.

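Let $X$ be the solution to (2) on $[0,T]$ with $d=1$. Suppose that Assumptions A1-A6 hold and $\mu \in \mathcal{H}_1(\beta, L)$. Then there exists a constant $c$ independent of $T$ and $h$ such that, for all $x \in \mathbb{R}$,

$$E\big[|\hat\mu_{h,T}(x) - \mu(x)|^2\big] \le c\left(h^{2\beta} + \frac{1}{T}\right). \tag{4}$$

In particular, choosing the bandwidth $h = h(T)$ such that $h^{2\beta} \le 1/T$, we conclude that

$$E\big[|\hat\mu_{h,T}(x) - \mu(x)|^2\big] \le \frac{c}{T}.$$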

Theorem 1 improves the upper bound obtained in [2]. As in that paper, we will use the bias-variance decomposition (see [17, Proposition 1])

$$E\big[|\hat\mu_{h,T}(x) - \mu(x)|^2\big] \le c\left(h^{2\beta} + T^{-2}\,\mathrm{Var}\Big(\int_0^T K_h(x - X_t)\,dt\Big)\right). \tag{5}$$

In [2], bounds on the transition semigroup and on the transition density (see (3) above) give an upper bound for the variance that depends on the bandwidth. Here, we use the same approach as in [15] and [18] to obtain a bandwidth-free rate for the variance of smoothing density estimators (which include the kernel estimator). For Markov diffusions, the sufficient conditions can be decomposed into a local irregularity condition WCL1 plus an asymptotic independence condition WCL2:

$$\textbf{WCL1:}\ \int_{\mathbb{R}}\int_0^1 \sup_{y\in\mathbb{R}} |g_u(x,y)|\,du\,dx < \infty, \qquad \textbf{WCL2:}\ \int_{\mathbb{R}}\int_1^\infty \sup_{y\in\mathbb{R}} |g_u(x,y)|\,du\,dx < \infty,$$

where $g_u(x,y) = \mu(x)\,p_u(x,y) - \mu(x)\,\mu(y)$. In order to show these conditions, an upper bound on the second derivative of the transition density is obtained (see Lemma 1 below), for which the additional condition A6 is needed.

As shown in [13], conditions WCL1 and WCL2 are also useful to show the asymptotic normality of the kernel density estimator, as proved in the next theorem.

###### Theorem 2.

Let be the solution to (2) on with . Suppose that Assumptions A1-A6 hold and . Then, for any collection of distinct real numbers

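$$\sqrt{T}\,\big(\hat\mu_{h,T}(x_i) - E[\hat\mu_{h,T}(x_i)],\ 1 \le i \le m\big) \xrightarrow{\mathcal{D}} N^{(m)}\big(0, \Sigma^{(m)}\big) \quad \text{as } T \to \infty, \tag{6}$$

where

$$\Sigma^{(m)} := \big(\sigma(x_i,x_j)\big)_{1\le i,j\le m}, \qquad \sigma(x_i,x_j) := 2\int_0^\infty g_u(x_i,x_j)\,du.$$

Furthermore,

$$\sqrt{T}\,\big(\hat\mu_{h,T}(x_i) - \mu(x_i),\ 1 \le i \le m\big) \xrightarrow{\mathcal{D}} N^{(m)}\big(0, \Sigma^{(m)}\big) \quad \text{as } T \to \infty. \tag{7}$$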
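The limit (7) suggests how a pointwise asymptotic confidence interval could be formed at a single point $x$: $\hat\mu_{h,T}(x) \pm z\,\sqrt{\sigma(x,x)/T}$, with $z$ a standard normal quantile. The sketch below assumes that an estimate `sigma2` of $\sigma(x,x) = 2\int_0^\infty g_u(x,x)\,du$ is available; how to obtain such an estimate is not addressed here, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def asymptotic_confidence_interval(mu_hat, sigma2, T, level=0.95):
    """Interval mu_hat +/- z * sqrt(sigma2 / T) suggested by the CLT (7).

    `sigma2` is an assumed, externally supplied estimate of sigma(x, x).
    """
    if sigma2 < 0 or T <= 0:
        raise ValueError("sigma2 must be non-negative and T positive")
    # Two-sided standard normal quantile for the requested coverage level
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(sigma2 / T)
    return mu_hat - half_width, mu_hat + half_width


if __name__ == "__main__":
    lo, hi = asymptotic_confidence_interval(mu_hat=0.3, sigma2=1.0, T=100.0)
    print(lo, hi)
```

The half-width shrinks like $T^{-1/2}$, which is the counterpart of the $1/T$ rate for the quadratic risk in Theorem 1.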

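We are also interested in obtaining lower bounds in dimension $d \in \{1,2\}$. For this, we consider the following particular case of equation (2):

$$X_t = X_0 + \int_0^t b(X_s)\,ds + a\,B_t + \int_0^t\int_{\mathbb{R}^d_0} \gamma\,z\,\big(\nu(ds,dz) - F(z)\,dz\,ds\big), \tag{8}$$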

where and are invertible matrices and is a Lipschitz and bounded function satisfying Assumption A2. We assume that satisfies Assumptions A3-A5 and Then, the unique solution to equation (8) admits a unique invariant measure , which we assume has a density with respect to the Lebesgue measure. We denote by and the law and expectation of the solution .

We say that a bounded and Lipschitz function belongs to if the unique invariant density belongs to for some , , .

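We define the minimax risk at a point $x \in \mathbb{R}^d$ by

$$\mathcal{R}_T^x(\beta, L) := \inf_{\tilde\mu_T} R\big(\tilde\mu_T(x)\big) := \inf_{\tilde\mu_T}\ \sup_{b\in\Sigma(\beta,L)} E_b^{(T)}\Big[\big(\tilde\mu_T(x) - \mu_b(x)\big)^2\Big], \tag{9}$$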

where the infimum is taken on all possible estimators of the invariant density.

The following lower bounds hold true.

###### Theorem 3.

Let $X$ be the solution to (8) on $[0,T]$ with $d=1$. We assume that $a$ and $\gamma$ are non-zero constants. There exist $T_0 > 0$ and $c > 0$ such that, for all $T \ge T_0$,

$$\inf_{x\in\mathbb{R}} \mathcal{R}_T^x(\beta, L) \ge \frac{c}{T}.$$

###### Theorem 4.

Let $X$ be the solution to (8) on $[0,T]$ with $d=2$. Assume that for all $i, j \in \{1,2\}$ with $i \ne j$,

$$\Big|(aa^T)_{ij}\,(aa^T)^{-1}_{jj}\Big| \le \frac{1}{2}. \tag{10}$$

There exists and such that, for ,

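$$\inf_{\tilde\mu_T}\ \sup_{b\in\Sigma(\beta,L)} E_b^{(T)}\Big[\sup_{x\in\mathbb{R}^2}\big(\tilde\mu_T(x) - \mu_b(x)\big)^2\Big] \ge c\,\frac{\log T}{T}.$$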

Comparing these lower bounds with the upper bound of Theorem 1 for the case $d=1$ and Proposition 4 in [2] for the two-dimensional case, we conclude that the convergence rates $\{1/T, \log T/T\}$ are the best possible for the kernel estimator of the invariant density in dimension $d \in \{1,2\}$.

The proof of Theorem 3 follows along the same lines as that of Theorem 2 in [3], where a lower bound for the kernel estimator of the invariant density for the solution to (8) for $d \ge 3$ is obtained. The proof is based on the two hypotheses method, explained for example in Section 2.3 of [37]. However, this method does not work for the two-dimensional case, as explained in Remark 1 below. Instead, we use Kullback's version of the finite number of hypotheses method as stated in Lemma C.1 of [36], see Lemma 2 below. Observe that this method gives a slightly weaker lower bound, as we get a $\sup_x$ inside the expectation, while the method in [3] provides an $\inf_x$ outside the expectation.

## 3 Preliminary results

The proofs of Theorems 1 and 2 will use the following upper bound on the second partial derivative of the transition density.

###### Lemma 1.

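Let $X$ be the solution to (2) on $[0,T]$ with $d=1$. Suppose that Assumptions A1-A6 hold. For all $T > 0$, there exist two constants $c$ and $\lambda_1$ such that, for any $t \in (0,T]$ and $x, y \in \mathbb{R}$,

$$\frac{\partial^2}{\partial x^2}\,p_t(x,y) \le c\left(t^{-3/2}\, e^{-\lambda_1 \frac{|y-x|^2}{t}} + \frac{1}{(t^{1/2} + |x-y|)^{1+\alpha}}\right).$$

###### Proof.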

We apply the estimate in Theorem 3.5(v) of [16]. We remark that, in [16], the authors assumed $d \ge 2$. After inspection of the proof it is possible to see that the result can be extended to the case $d=1$: it was stated for $d \ge 2$ for convenience in describing the Kato class function (for the drift). We also remark that the sufficient conditions of Theorem 3.5(v) of [16] are the same as those needed to obtain the upper bound (3) for the transition density (which holds under Assumptions A1-A5), together with the following additional condition: there exist $\delta \in (0,1)$ and $c > 0$ such that, for all $x, y \in \mathbb{R}$ and all $z$,

$$|b(x) - b(y)| + |k(x,z) - k(y,z)| \le c\,|x-y|^{\delta}, \tag{11}$$

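where $k(x,z) := |z|^{1+\alpha}\,\frac{1}{\gamma(x)}\,F\big(\frac{z}{\gamma(x)}\big)$. Thus, we only need to show (11). As $b$ is bounded and Lipschitz, it satisfies (11). In fact, when $x$ and $y$ are such that $|x-y| \ge 1$, thanks to the boundedness of $b$ we have, for each $\delta \in (0,1)$,

$$|b(x) - b(y)| \le |b(x)| + |b(y)| \le 2c \le 2c\,|x-y|^{\delta}.$$

Instead, when $x$ and $y$ are such that $|x-y| < 1$, the Lipschitz continuity gives

$$|b(x) - b(y)| \le L\,|x-y| = L\,|x-y|^{1-\delta}\,|x-y|^{\delta} \le L\,|x-y|^{\delta}.$$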

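Concerning $k$, we write

$$\begin{aligned} |k(x,z) - k(y,z)| &= |z|^{1+\alpha}\left|\frac{1}{\gamma(x)}F\Big(\frac{z}{\gamma(x)}\Big) - \frac{1}{\gamma(y)}F\Big(\frac{z}{\gamma(y)}\Big)\right| \\ &\le \frac{|z|^{1+\alpha}}{|\gamma(x)|}\left|F\Big(\frac{z}{\gamma(x)}\Big) - F\Big(\frac{z}{\gamma(y)}\Big)\right| + |z|^{1+\alpha}\left|F\Big(\frac{z}{\gamma(y)}\Big)\right|\left|\frac{1}{\gamma(x)} - \frac{1}{\gamma(y)}\right|. \end{aligned} \tag{12}$$

By the mean value theorem, there exists $\tilde z$ between $z/\gamma(x)$ and $z/\gamma(y)$ such that the first term on the r.h.s. above is bounded by

$$\frac{|z|^{1+\alpha}}{|\gamma(x)|}\,|F'(\tilde z)|\left|\frac{z}{\gamma(x)} - \frac{z}{\gamma(y)}\right| \le \frac{|z|^{1+\alpha}}{|\gamma(x)|}\,\frac{c}{|\tilde z|^{2+\alpha}}\,\frac{|z|}{|\gamma(x)\gamma(y)|}\,|\gamma(y) - \gamma(x)| \le c\,\frac{\gamma_{\max}^{2+\alpha}}{\gamma_{\min}^{3}}\,|\gamma(y) - \gamma(x)|,$$

where we have used A6 in the first inequality and, in the second, the bounds $|\tilde z| \ge |z|/\gamma_{\max}$ and $\gamma_{\min} \le |\gamma| \le \gamma_{\max}$. Moreover, by A3, the second term on the r.h.s. of (12) is bounded by

$$|z|^{1+\alpha}\,\frac{c\,|\gamma(y)|^{1+\alpha}}{|z|^{1+\alpha}}\,\frac{1}{|\gamma(x)\gamma(y)|}\,|\gamma(y) - \gamma(x)| \le c\,\frac{\gamma_{\max}^{\alpha}}{\gamma_{\min}}\,|\gamma(y) - \gamma(x)|.$$

Thus, we have shown that

$$|k(x,z) - k(y,z)| \le c\left(\frac{\gamma_{\max}^{2+\alpha}}{\gamma_{\min}^{3}} + \frac{\gamma_{\max}^{\alpha}}{\gamma_{\min}}\right)|\gamma(y) - \gamma(x)|.$$

Finally, as $\gamma$ is Lipschitz and bounded, we conclude that (11) holds. This concludes the proof of the lemma. ∎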
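The two-case argument used for $b$ in the proof above (boundedness when $|x-y| \ge 1$, Lipschitz continuity when $|x-y| < 1$) shows that a bounded Lipschitz function is $\delta$-Hölder with constant at most $\max(2\|b\|_\infty, L)$. The following sketch checks this numerically on a grid; the choice $b = \tanh$ and the function names are illustrative assumptions, not part of the paper.

```python
import math


def holder_constant_bound(sup_norm, lipschitz):
    """Constant from the two-case argument: max(2 * ||b||_inf, L)."""
    return max(2.0 * sup_norm, lipschitz)


def check_holder(b, sup_norm, lipschitz, delta, points, tol=1e-12):
    """Check |b(x) - b(y)| <= c * |x - y|**delta on all pairs of grid points."""
    c = holder_constant_bound(sup_norm, lipschitz)
    for x in points:
        for y in points:
            if abs(b(x) - b(y)) > c * abs(x - y) ** delta + tol:
                return False
    return True


if __name__ == "__main__":
    grid = [i / 10.0 for i in range(-50, 51)]
    # tanh is bounded by 1 and 1-Lipschitz, so the bound holds with c = 2
    print(check_holder(math.tanh, 1.0, 1.0, 0.5, grid))
    # Wrong constants (10 * tanh is neither bounded by 1 nor 1-Lipschitz)
    print(check_holder(lambda x: 10.0 * math.tanh(x), 1.0, 1.0, 0.5, grid))
```

The second call is expected to fail the check, since the supplied sup norm and Lipschitz constant do not match the function.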

The key point of the proof of Theorem 1 consists in showing that conditions WCL1 and WCL2 hold true, which is proved in the next proposition.

###### Proposition 1.

Let $X$ be the solution to (2) on $[0,T]$ with $d=1$. Suppose that Assumptions A1-A6 hold. Then, conditions WCL1 and WCL2 are satisfied.

###### Proof.

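We start by considering WCL1. For $t \in (0,1]$, the density estimate (3) yields

$$p_t(x,y) \le c\,t^{-\frac{1}{2}} + \tilde c\,t^{\frac{1-\alpha}{2}} \le \bar c\,t^{-\frac{1}{2}},$$

which, combined with the definition of $g_u$, gives WCL1. In order to show WCL2, we set $\varphi(\lambda) := \int_{\mathbb{R}} e^{i\lambda y}\mu(y)\,dy$ and $\varphi_x(\lambda,t) := \int_{\mathbb{R}} e^{i\lambda y}p_t(x,y)\,dy$, and we claim that there exists $c_1 > 0$ such that

$$|\varphi(\lambda)| \le c_1\big(1 + |\lambda|^{-2}\big). \tag{14}$$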

Moreover, for all , there exists , such that for all ,

$$|\varphi_x(\lambda,t)| \le c_2\big(1 + |\lambda|^{-2}\big). \tag{15}$$

Recall from Lemma 2 in [2] that the process is exponentially -mixing, which implies that , where is the -mixing coefficient defined in Section 1.3.2 of [26]. It follows that, for any , . Thus, by Proposition 10 of [18], inequalities (14) and (15) and the integrability of the -mixing coefficient imply WCL2. Therefore, we are left to show (14) and (15). We start showing (15). Integrating by parts and using Lemma 1 it yields

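$$\begin{aligned} |\varphi_x(\lambda,t)| &= \left|\int_{\mathbb{R}} e^{i\lambda y}\,p_t(x,y)\,dy\right| = |\lambda|^{-2}\left|\int_{\mathbb{R}} e^{i\lambda y}\,\frac{\partial^2}{\partial y^2}p_t(x,y)\,dy\right| \\ &= |\lambda|^{-2}\left|\int_{\mathbb{R}} e^{i\lambda y}\left(\frac{\partial^2}{\partial y^2}\int_{\mathbb{R}} p_{t-1}(x,z)\,p_1(z,y)\,dz\right)dy\right| \\ &\le |\lambda|^{-2}\int_{\mathbb{R}}\int_{\mathbb{R}} p_{t-1}(x,z)\left|\frac{\partial^2}{\partial y^2}p_1(z,y)\right|dz\,dy \\ &\le c\,|\lambda|^{-2}\int_{\mathbb{R}}\int_{\mathbb{R}} p_{t-1}(x,z)\left(e^{-\lambda_1|z-y|^2} + \frac{1}{(1+|z-y|)^{1+\alpha}}\right)dy\,dz. \end{aligned}$$

As $\alpha > 0$, the integral in $y$ is finite. Since $\int_{\mathbb{R}} p_{t-1}(x,z)\,dz = 1$, we get

$$|\varphi_x(\lambda,t)| \le c\,|\lambda|^{-2} \le c\big(1 + |\lambda|^{-2}\big),$$

which proves (15). A similar argument gives (14). The proof of the proposition is now completed. ∎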

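Theorem 2 is an application of the following central limit theorem for discrete stationary sequences. Let $Y_n = (Y_{n,i})_{i\in\mathbb{Z}}$, $n \ge 1$, be a sequence of strictly stationary discrete time $\mathbb{R}^m$-valued random processes. We define the $\alpha$-mixing coefficient of $Y_n$ by

$$\alpha_{n,k} := \sup_{A\in\sigma(Y_{n,i},\, i\le 0),\ B\in\sigma(Y_{n,i},\, i\ge k)} P(A\cap B) - P(A)P(B)$$

and we set $\alpha_k := \sup_{n\ge 1}\alpha_{n,k}$ (see also Section 1 in [26]). We denote by $Y^{(r)}$ the $r$-th component of an $m$-dimensional random vector $Y$.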

###### Theorem 5 (Theorem 1.1 [13]).

Assume that

1. and for every , and , where is a constant depending only .

2. $\sup_{i\ge 1,\ 1\le r\le m} E\big[(Y^{(r)}_{n,i})^2\big] < \infty$.
3. For every and for every sequence such that for every , we have

$$\lim_{n\to\infty}\frac{1}{b_n}\,E\left[\sum_{i=1}^{b_n} Y^{(r)}_{n,i}\,\sum_{j=1}^{b_n} Y^{(s)}_{n,j}\right] = \sigma_{r,s}.$$
4. There exists such that .

5. For some constant and for every , .

Then,

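$$\frac{\sum_{i=1}^n Y_{n,i}}{\sqrt{n}} \xrightarrow{\mathcal{D}} N(0,\Sigma) \quad \text{as } n\to\infty,$$

where $\Sigma = (\sigma_{r,s})_{1\le r,s\le m}$.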

The proof of Theorem 4 is based on the following Kullback version of the main theorem on lower bounds in [37], see Lemma C.1 of [36]:

###### Lemma 2.

Fix and assume that there exists and a finite set such that one can find satisfying

$$\|f_j - f_k\|_\infty \ge 2\psi > 0 \quad \forall\, j \ne k \in J_T. \tag{16}$$

Moreover, denoting the probability measure associated with , , and

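$$\frac{1}{|J_T|}\sum_{j\in J_T} KL\big(P_j^{(T)}, P_0^{(T)}\big) = \frac{1}{|J_T|}\sum_{j\in J_T} E_j^{(T)}\left[\log\left(\frac{dP_j^{(T)}}{dP_0^{(T)}}(X^T)\right)\right] \le \gamma\,\log(|J_T|) \tag{17}$$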

for some . Then, for , we have

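$$\inf_{\tilde\mu_T}\ \sup_{\mu_b\in\mathcal{H}_2(\beta,L)}\Big(E_b^{(T)}\big[\psi^{-q}\,\|\tilde\mu_T - \mu_b\|_\infty^{q}\big]\Big)^{1/q} \ge c(\gamma) > 0,$$

where the infimum is taken over all the possible estimators of $\mu_b$.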

## 4 Proof of the main results

### 4.1 Proof of Theorem 1

By the symmetry of the covariance operator and the stationarity of the process,

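$$\begin{aligned} T\,\mathrm{Var}\big(\hat\mu_{h,T}(x)\big) &= \frac{1}{T}\int_0^T\int_0^T \mathrm{Cov}\big(K_h(x - X_t), K_h(x - X_s)\big)\,ds\,dt \\ &= \frac{2}{T}\int_0^T (T-u)\,\mathrm{Cov}\big(K_h(x - X_u), K_h(x - X_0)\big)\,du \\ &= 2\int_0^T \Big(1 - \frac{u}{T}\Big)\int_{\mathbb{R}}\int_{\mathbb{R}} K_h(x-y)\,K_h(x-z)\,g_u(y,z)\,dy\,dz\,du \\ &\le 2\int_{\mathbb{R}}\int_{\mathbb{R}} |K_h(x-y)|\,|K_h(x-z)|\int_0^\infty |g_u(y,z)|\,du\,dy\,dz \le c, \end{aligned}$$

where in the last inequality we have used Proposition 1. Then, from the bias-variance decomposition (5) we obtain (4), which concludes the proof.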

### 4.2 Proof of Theorem 2

We aim to apply Theorem 5. First of all we split the interval into small intervals whose length is as follows: , with and and, for any , . By construction, it clearly holds that .

For each and , we consider the sequence defined as

$$Y^{(r)}_{n,i} := \frac{1}{\sqrt{\Delta_n}}\left(\int_{t_{i-1}}^{t_i} K_h(x_r - X_u)\,du - E\left[\int_{t_{i-1}}^{t_i} K_h(x_r - X_u)\,du\right]\right),$$

for . We denote by the valued random vector defined by . By construction,

$$\frac{\sum_{i=1}^n Y_{n,i}}{\sqrt{n}} = \sqrt{T}\,\big(\hat\mu_{h,T}(x) - E[\hat\mu_{h,T}(x)]\big),$$

where is the vector

$$\big(\hat\mu_{h,T}(x_1) - E[\hat\mu_{h,T}(x_1)],\ \dots,\ \hat\mu_{h,T}(x_m) - E[\hat\mu_{h,T}(x_m)]\big).$$

It is clear that for all and . Moreover, for all , and we have

 |Y(r)n,i|≤1√Δn∥Kh∥