# Improved estimation via model selection method for semimartingale regressions based on discrete data

We consider the robust adaptive nonparametric estimation problem for a periodic function observed in the framework of a continuous time regression model with semimartingale noises.


## 1 Introduction

In this paper we consider the following continuous time regression model

$$dy_t=S(t)\,dt+d\xi_t,\qquad 0\le t\le n, \tag{1.1}$$

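where $S(\cdot)$ is an unknown 1-periodic function from $\mathbf{L}_2[0,1]$ and $(\xi_t)_{0\le t\le n}$ is an unobservable noise, which is a square integrable semimartingale with values in the Skorokhod space $\mathcal{D}[0,n]$ such that, for any function $f$ from $\mathbf{L}_2[0,n]$, the stochastic integral

$$I_n(f)=\int_0^n f(s)\,d\xi_s \tag{1.2}$$

has the following properties:

$$\mathbf{E}_Q\,I_n(f)=0\quad\text{and}\quad\mathbf{E}_Q\,I_n^2(f)\le\varkappa_Q\int_0^n f^2(s)\,ds. \tag{1.3}$$

Here $\mathbf{E}_Q$ denotes the expectation with respect to the distribution $Q$ of the noise process $(\xi_t)_{0\le t\le n}$ on the space $\mathcal{D}[0,n]$, and $\varkappa_Q$ is some positive constant depending on the distribution $Q$. As to the noise distribution $Q$, we assume that it is unknown and belongs to some family $\mathcal{Q}_n$ of distributions on $\mathcal{D}[0,n]$.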

The problem is to estimate the unknown function $S$ in the model (1.1) on the basis of the observations

$$(y_{t_j})_{0\le j\le np},\qquad t_j=\frac{j}{p}, \tag{1.4}$$

where the observation frequency $p$ is some fixed integer. For this problem we use the quadratic risk, which for any estimate $\hat S$ is defined as

$$\mathcal{R}_Q(\hat S,S)=\mathbf{E}_{Q,S}\,\|\hat S-S\|^2,\qquad \|f\|^2=\int_0^1 f^2(t)\,dt, \tag{1.5}$$

where $\mathbf{E}_{Q,S}$ stands for the expectation with respect to the distribution of the process (1.1) with a fixed distribution $Q$ of the noise and a given function $S$. Moreover, when the distribution $Q$ is unknown, we also use the robust risk

$$\mathcal{R}^*(\hat S,S)=\sup_{Q\in\mathcal{Q}_n}\mathcal{R}_Q(\hat S,S). \tag{1.6}$$

Note that if $(\xi_t)$ is a Brownian motion, then we obtain the well-known white noise model (see, for example, [10, 30, 20, 21, 22]), which is popular in statistical radio-physics. Later, to take into account the dependence structure, the papers [13, 9, 16] proposed to use Ornstein–Uhlenbeck noise processes, the so-called colored Gaussian noises. Then, to study the estimation problem for non-Gaussian observations (1.1), the papers [12, 1, 25, 19, 17, 18] introduced impulse noises defined through semi-Markov or compound Poisson processes with unknown impulse distributions. However, semi-Markov or compound Poisson processes can describe the impulse influence of only one fixed frequency. It should be noted that in telecommunication systems the noise impulses are not limited in frequency and, therefore, such models are too restrictive for practical applications. To include all possible impulse noises, it was proposed in [14, 15] to use general non-Gaussian semimartingale processes. Later, for semimartingale models, the papers [26, 27, 28, 29, 31] developed improved (shrinkage) nonparametric estimation methods. It should be emphasized that in all these papers the improved estimation problems are studied only for complete observations, i.e. when the whole trajectory is observed.

Our main goal in this paper is to develop improved estimation methods for incomplete observations, i.e. when the process (1.1) can be observed only at the fixed time moments (1.4). As an example, we consider the regression model (1.1) with the noise defined by a non-Gaussian Ornstein–Uhlenbeck process with unknown distribution. To this end we propose model selection methods based on improved weighted least squares estimates. Such an approach was first proposed in [6] for regression models in discrete time and in [16] for Gaussian regression models in continuous time. It should be noted that for non-Gaussian regression models we cannot directly use the well-known improved estimators proposed in [11] for spherically symmetric observations. To apply improved estimation methods to non-Gaussian regression models in continuous time, one needs to use the modifications of the James–Stein shrinkage procedure proposed in [19, 25] for parametric estimation problems and developed in [26, 27, 28, 29, 31] for nonparametric estimation. Moreover, we develop new analytical tools to provide the improvement effect for the non-asymptotic estimation accuracy. It turns out that in this case the accuracy improvement is much more significant than for parametric models since, according to the well-known James–Stein formula, the accuracy improvement increases with the dimension of the parameters. Recall that for parametric models this dimension is always fixed, while for nonparametric models it tends to infinity, that is, it becomes arbitrarily large as the number of observations increases. Therefore, the gain from applying improved methods grows substantially compared with the parametric case. Then, we find constructive conditions on the observation frequency and the noise distributions under which we show sharp non-asymptotic oracle inequalities for the robust risks (1.6). Then, through the established oracle inequalities, we obtain the efficiency property for the developed model selection methods in the adaptive setting. Furthermore, we show that the obtained conditions hold for the non-Gaussian Ornstein–Uhlenbeck processes defined in Section 2.
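The dimension effect mentioned above can be seen on the classical James–Stein estimator for a Gaussian vector $X\sim N(\theta,I_d)$. The sketch below uses the textbook rule $(1-(d-2)/|X|^2)X$ in a Monte Carlo loop of our own; it is not the modified shrinkage procedure used in this paper. At $\theta=0$ its exact risk equals $2$, against $d$ for $X$ itself, so the relative gain $1-2/d$ grows with the dimension.

```python
import random


def james_stein(x, sigma2=1.0):
    """Classical James-Stein estimate (1 - (d - 2) sigma^2 / |x|^2) x, for d >= 3."""
    d = len(x)
    norm2 = sum(v * v for v in x)
    factor = 1.0 - (d - 2) * sigma2 / norm2
    return [factor * v for v in x]


def mc_risks(theta, n_rep=5000, seed=0):
    """Monte Carlo quadratic risks of X ~ N(theta, I_d) and of its James-Stein shrinkage."""
    rng = random.Random(seed)
    risk_x = risk_js = 0.0
    for _ in range(n_rep):
        x = [t + rng.gauss(0.0, 1.0) for t in theta]
        js = james_stein(x)
        risk_x += sum((a - t) ** 2 for a, t in zip(x, theta))
        risk_js += sum((a - t) ** 2 for a, t in zip(js, theta))
    return risk_x / n_rep, risk_js / n_rep


if __name__ == "__main__":
    for d in (3, 10, 50):
        rx, rjs = mc_risks([0.0] * d)
        print(f"d={d:3d}  risk(X)={rx:6.2f}  risk(JS)={rjs:5.2f}  gain={1 - rjs / rx:.2f}")
```

The estimated gain should approach one as $d$ grows, which is the heuristic behind the larger improvement expected in the nonparametric setting.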

The rest of the paper is organized as follows. In Section 3 we construct the shrinkage weighted least squares estimates and study the improvement effect. In Section 4 we construct the model selection procedure on the basis of the improved weighted least squares estimates. In Section 5.1 we state the main results in the form of oracle inequalities for the quadratic risk (1.5) and the robust risk (1.6). In Section 5.2 it is shown that the proposed model selection procedure for estimating $S$ in (1.1) is asymptotically efficient with respect to the robust risk (1.6). In Section 6 we illustrate the performance of the proposed model selection procedure through numerical simulations. Section 7 gives the main properties of stochastic integrals for the non-Gaussian Ornstein–Uhlenbeck processes. Section 8 gives the proofs of the main results. Appendix A contains all auxiliary results.

## 2 Non-Gaussian Ornstein-Uhlenbeck-Lévy process

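Now we consider the noise process in (1.1) defined by a non-Gaussian Ornstein–Uhlenbeck process driven by a Lévy process. Such processes are used in financial Black–Scholes type markets with jumps (see, for example, [2] and the references therein). Let the noise process in (1.1) obey the equation

$$d\xi_t=a\,\xi_t\,dt+du_t,\qquad \xi_0=0, \tag{2.1}$$

where

$$u_t=\varrho_1 w_t+\varrho_2 z_t. \tag{2.2}$$

Here $(w_t)_{t\ge0}$ is a standard Brownian motion and "$*$" denotes the stochastic integral with respect to the compensated jump measure $\mu(ds,dv)$ with deterministic compensator $\tilde\mu(ds\,dv)=ds\,\Pi(dv)$, i.e.

$$z_t=v*(\mu-\tilde\mu)_t=\int_0^t\int_{\mathbb{R}_*}v\,(\mu-\tilde\mu)(ds\,dv)\quad\text{and}\quad\mathbb{R}_*=\mathbb{R}\setminus\{0\},$$

$\Pi(\cdot)$ is the Lévy measure on $\mathbb{R}_*$ (see, for example, [5]) such that

$$\Pi(x^2)=1\quad\text{and}\quad\Pi(x^8)<\infty. \tag{2.3}$$

We use the notation $\Pi(|x|^m)=\int_{\mathbb{R}_*}|z|^m\,\Pi(dz)$. Moreover, we assume that the nuisance parameters $a$, $\varrho_1$ and $\varrho_2$ satisfy the conditions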

 (2.4)

where the bounds in (2.4) are functions of $n$ such that, for any $\epsilon>0$,

 (2.5)

We denote by $\mathcal{Q}_n$ the family of all distributions of the process (1.1)–(2.1) on the Skorokhod space $\mathcal{D}[0,n]$ satisfying the conditions (2.4)–(2.5). It should be noted that, in view of Corollary 7.2 and the last inequality in (2.4), the condition (1.3) for the process (2.1) holds with $\varkappa_Q=2\sigma_Q$. Note also that the process (2.1) is a conditionally Gaussian square integrable semimartingale with respect to the $\sigma$-algebra generated by the jump process $(z_t)_{0\le t\le n}$.

## 3 Improved estimation

For estimating the unknown function $S$ in (1.1) we will use its Fourier expansion with respect to an orthonormal basis $(\phi_j)_{j\ge1}$ in $\mathbf{L}_2[0,1]$. We extend these functions periodically to $\mathbb{R}$, i.e. $\phi_j(t)=\phi_j(t+1)$ for any $t\in\mathbb{R}$. Assume that the basis functions are uniformly bounded, i.e. for some constant $\phi_*$, which may depend on $n$,

$$\sup_{0\le j\le n}\,\sup_{0\le t\le 1}|\phi_j(t)|\le\phi_*<\infty. \tag{3.1}$$

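Moreover, we will use a basis such that the restrictions of the functions $(\phi_j)_{1\le j\le p}$ to the sampling lattice

$$\mathcal{T}_p=\{t_1,\dots,t_p\},\qquad t_j=\frac{j}{p},$$

form an orthonormal basis in the Hilbert space $\mathbb{R}^{\mathcal{T}_p}$ with the inner product

$$(x,y)_p=\frac{1}{p}\sum_{j=1}^p x(t_j)\,y(t_j)\quad\text{for } x,y\in\mathbb{R}^{\mathcal{T}_p}, \tag{3.2}$$

i.e. $(\phi_i,\phi_j)_p=\mathbf{1}_{\{i=j\}}$. We put the norm $\|x\|_p^2=(x,x)_p$.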

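For example, we can take the trigonometric basis defined as $\mathrm{Tr}_1\equiv1$ and, for $j\ge2$,

$$\mathrm{Tr}_j(x)=\sqrt{2}\begin{cases}\cos\big(2\pi[j/2]x\big) & \text{for even } j,\\ \sin\big(2\pi[j/2]x\big) & \text{for odd } j.\end{cases} \tag{3.3}$$

Here $[x]$ denotes the integer part of $x$.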
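The discrete orthonormality required in (3.2) can be checked numerically for the trigonometric basis. This is our own sketch: it builds the Gram matrix of $\mathrm{Tr}_1,\dots,\mathrm{Tr}_p$ under $(\cdot,\cdot)_p$ on the lattice $t_k=k/p$. For odd $p$ the Gram matrix is the identity up to rounding. For even $p$ the last function $\mathrm{Tr}_p$ equals $\pm\sqrt2$ on the lattice, so its squared norm is $2$. In this sketch, therefore, an odd $p$ is the natural choice.

```python
import math


def trig_basis(j, x):
    """Tr_1 = 1; for j >= 2, sqrt(2) cos(2 pi [j/2] x) if j is even, sqrt(2) sin(2 pi [j/2] x) if odd."""
    if j == 1:
        return 1.0
    arg = 2.0 * math.pi * (j // 2) * x
    return math.sqrt(2.0) * (math.cos(arg) if j % 2 == 0 else math.sin(arg))


def lattice_gram(p):
    """Gram matrix of Tr_1..Tr_p for (x, y)_p = (1/p) sum_k x(t_k) y(t_k), t_k = k/p."""
    ts = [k / p for k in range(1, p + 1)]
    return [[sum(trig_basis(i, t) * trig_basis(j, t) for t in ts) / p
             for j in range(1, p + 1)]
            for i in range(1, p + 1)]


def max_deviation_from_identity(g):
    """Largest entrywise distance between a square matrix and the identity."""
    size = len(g)
    return max(abs(g[i][j] - (1.0 if i == j else 0.0))
               for i in range(size) for j in range(size))


if __name__ == "__main__":
    for p in (7, 8, 21):
        print(p, max_deviation_from_identity(lattice_gram(p)))
```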

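We write the discrete Fourier expansion of the unknown function $S$ on the lattice $\mathcal{T}_p$ in the form

$$S(t)=\sum_{j=1}^p\theta_{j,p}\,\phi_j(t),\qquad t\in\mathcal{T}_p,$$

where the corresponding Fourier coefficients

$$\theta_{j,p}=(S,\phi_j)_p=\frac1p\sum_{k=1}^p S(t_k)\,\phi_j(t_k) \tag{3.4}$$

can be estimated from the discrete data (1.4) by the formula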

 (3.5)

We note that the system of functions $(\psi_{j,p})_{1\le j\le p}$ is orthonormal in $\mathbf{L}_2[0,1]$ because $(\psi_{i,p},\psi_{j,p})=(\phi_i,\phi_j)_p$.

Then the Fourier coefficients of the function $S$ with respect to these functions can be written as

$$\bar\theta_{j,p}=(S,\psi_{j,p})=\theta_{j,p}+h_{j,p}, \tag{3.6}$$

where

$$h_{j,p}=h_{j,p}(S)=\sum_{k=1}^p\int_{t_{k-1}}^{t_k}\phi_j(t_k)\,\big(S(t)-S(t_k)\big)\,dt.$$
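The relations (3.4)–(3.6) can be illustrated numerically. The display (3.5) is not legible here, so we *assume* the estimator $\hat\theta_{j,p}=\frac1n\sum_{k=1}^{np}\phi_j(t_k)\,(y_{t_k}-y_{t_{k-1}})$ with $t_k=k/p$. This amounts to integrating the step function $\sum_k\phi_j(t_k)\mathbf 1_{(t_{k-1},t_k]}$ against $dy$. The form is consistent with $h_{j,p}$ above, but it is our reconstruction. The noise is a plain Brownian motion (a hypothetical simplification), the basis is the trigonometric one and $S$ is the test function (6.2). Without noise the estimate equals $\bar\theta_{j,p}=\theta_{j,p}+h_{j,p}$. For a Lipschitz $S$ one checks directly that $|h_{j,p}|\le\phi_*L/(2p)$, and the sketch verifies this bound.

```python
import math
import random


def trig_basis(j, x):
    """Trigonometric basis: Tr_1 = 1, then sqrt(2) cos / sin with frequency [j/2]."""
    if j == 1:
        return 1.0
    arg = 2.0 * math.pi * (j // 2) * x
    return math.sqrt(2.0) * (math.cos(arg) if j % 2 == 0 else math.sin(arg))


def s2(t):
    """Test function (6.2), extended 1-periodically."""
    t = t % 1.0
    return 0.5 - abs(0.5 - t)


def simulate_increments(s, n, p, sigma, rng, substeps=20):
    """Increments y_{t_k} - y_{t_{k-1}}, t_k = k/p, k = 1..np, of dy = S dt + sigma dw.

    Hypothetical simplification: Brownian noise only; the drift integral over each
    sampling interval is computed with a midpoint rule on `substeps` sub-intervals.
    """
    dt = 1.0 / p
    h = dt / substeps
    incs = []
    for k in range(1, n * p + 1):
        left = (k - 1) * dt
        drift = h * sum(s(left + (i + 0.5) * h) for i in range(substeps))
        incs.append(drift + sigma * rng.gauss(0.0, math.sqrt(dt)))
    return incs


def theta_hat(incs, n, p, j):
    """Assumed form of (3.5): (1/n) * sum_k phi_j(t_k) * (y_{t_k} - y_{t_{k-1}})."""
    return sum(trig_basis(j, k / p) * dy for k, dy in enumerate(incs, start=1)) / n


def theta_lattice(s, p, j):
    """Discrete Fourier coefficient (3.4): (1/p) sum_k S(t_k) phi_j(t_k)."""
    return sum(s(k / p) * trig_basis(j, k / p) for k in range(1, p + 1)) / p


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 100, 21
    incs = simulate_increments(s2, n, p, sigma=1.0, rng=rng)
    for j in (1, 2, 3):
        print(j, theta_lattice(s2, p, j), theta_hat(incs, n, p, j))
```

With noise, the estimation error has standard deviation of order $1/\sqrt n$ in this sketch, since the Brownian part contributes variance $1/n$ to each coefficient.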

In view of (1.1), one obtains

 (3.7)

where $I_n(\cdot)$ is given in (1.2). As in [18], we define a class of weighted least squares estimates for $S$ as

$$\hat S_\gamma(t)=\sum_{j=1}^p\gamma(j)\,\hat\theta_{j,n}\,\psi_{j,p}(t), \tag{3.8}$$

where the weights $\gamma=(\gamma(j))_{1\le j\le p}$ belong to some finite set $\Gamma$ from $[0,1]^p$, for which we set

 (3.9)

where $\nu$ is the number of the vectors $\gamma$ in $\Gamma$. In the sequel we assume that all vectors from $\Gamma$ satisfy the following conditions.

) Assume that for any vector $\gamma\in\Gamma$ there exists some fixed integer $d=d(\gamma)$ such that its first $d$ components are equal to one, i.e. $\gamma(j)=1$ for $1\le j\le d$.

) There exists such that for any there exists a $\sigma$-field for which the random vector is conditionally Gaussian in $\mathbb{R}^d$ given this $\sigma$-field, with the covariance matrix

 (3.10)

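and for some nonrandom constant $l_p>0$

$$\inf_{Q\in\mathcal{Q}_n}\big(\operatorname{tr}G_p-\lambda_{\max}(G_p)\big)\ge l_p\quad\text{a.s.}, \tag{3.11}$$

where $\lambda_{\max}(A)$ is the maximal eigenvalue of the matrix $A$.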
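The quantity in (3.11) is the sum of all eigenvalues of $G_p$ except the largest one. For $G_p=\sigma^2I_d$ it equals $(d-1)\sigma^2$, so it can be positive only when $d\ge2$. The dependency-free sketch below computes it. Power iteration is our own choice here, and any symmetric eigensolver would do.

```python
import math


def lambda_max_psd(g, iters=500):
    """Largest eigenvalue of a symmetric positive semidefinite matrix by power iteration.

    Assumes the all-ones start vector is not orthogonal to the top eigenvector.
    """
    size = len(g)
    v = [1.0] * size
    lam = 0.0
    for _ in range(iters):
        w = [sum(g[i][k] * v[k] for k in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        lam = norm  # |G v| for a unit v converges to lambda_max
    return lam


def trace_minus_lambda_max(g):
    """tr G - lambda_max(G): the sum of all eigenvalues except the largest one."""
    return sum(g[i][i] for i in range(len(g))) - lambda_max_psd(g)


if __name__ == "__main__":
    print(trace_minus_lambda_max([[2.0, 1.0], [1.0, 2.0]]))  # about 1.0 (eigenvalues 3 and 1)
```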

As shown in Proposition 7.11 of [27], the condition ) holds for the model (1.1)–(2.1), with the corresponding constants given there.

For the first $d$ Fourier coefficients in (3.7) we will use the improved estimation method proposed for parametric models in [25]. To this end we set $\tilde\theta_n=(\hat\theta_{j,n})_{1\le j\le d}$. In the sequel we will use the norm $|x|_d^2=\sum_{j=1}^d x_j^2$ for any vector $x=(x_j)_{1\le j\le d}$ from $\mathbb{R}^d$. Now we define the shrinkage estimators as

 (3.12)

where ,

The positive parameter $c_n$ may depend on $n$ and is such that

 (3.13)

Now we define the shrinkage estimates for $S$ as

 (3.14)

We compare the estimators (3.8) and (3.14) through the difference

$$\Delta_Q(S):=\mathcal{R}_Q(S^*_\gamma,S)-\mathcal{R}_Q(\hat S_\gamma,S).$$

We now obtain a non-asymptotic bound for this risk difference. To this end, we set

$$p_0=\frac{\sqrt{d}\,\phi_*L}{c_n}+1, \tag{3.15}$$

where $L$ is the Lipschitz constant of $S$, i.e.

$$L=\sup_{0\le s,t\le 1,\ s\ne t}\frac{|S(t)-S(s)|}{|t-s|}.$$
###### Theorem 3.1.

Assume that the conditions hold. Moreover, assume that the function $S$ is Lipschitz continuous. Then for any

$$\sup_{Q\in\mathcal{Q}_n}\,\sup_{\|S\|\le r}\Delta_Q(S)<0. \tag{3.16}$$
###### Remark 3.1.

Inequality (3.16) means that, non-asymptotically, i.e. for every fixed sample size, the estimate (3.14) outperforms the estimate (3.8) in mean square accuracy.
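The mechanism behind Theorem 3.1 can be illustrated in the simplest Gaussian setting. The sketch below is not the estimator (3.12) itself, whose display is not legible here. It applies a shrinkage of the form $(1-c/|x|_d)\,x$ to $X\sim N(\theta,\sigma^2I_d)$ and, as our own assumption, chooses $c=(d-1)\sigma^2/\sqrt{r^2+d\sigma^2}$. Stein's identity gives the exact risk difference $c^2-2c\sigma^2(d-1)\,\mathbf E|X|^{-1}$. Jensen's inequality gives $\mathbf E|X|^{-1}\ge(r^2+d\sigma^2)^{-1/2}$ whenever $|\theta|\le r$. Hence the difference is at most $-c^2<0$ for every such $\theta$, a finite-sample statement in the spirit of (3.16).

```python
import math
import random


def shrink(x, c):
    """Shrinkage (1 - c/|x|) x toward the origin."""
    norm = math.sqrt(sum(v * v for v in x))
    return [(1.0 - c / norm) * v for v in x]


def risk_difference(theta, sigma=1.0, r=1.0, n_rep=20000, seed=0):
    """Monte Carlo estimate of E|shrink(X) - theta|^2 - E|X - theta|^2, X ~ N(theta, sigma^2 I_d).

    The constant c = (d - 1) sigma^2 / sqrt(r^2 + d sigma^2) is our own choice; it makes
    the exact difference at most -c^2 whenever |theta| <= r.
    """
    d = len(theta)
    c = (d - 1) * sigma ** 2 / math.sqrt(r ** 2 + d * sigma ** 2)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        x = [t + sigma * rng.gauss(0.0, 1.0) for t in theta]
        xs = shrink(x, c)
        loss_shrunk = sum((a - t) ** 2 for a, t in zip(xs, theta))
        loss_plain = sum((a - t) ** 2 for a, t in zip(x, theta))
        total += loss_shrunk - loss_plain
    return total / n_rep, c


if __name__ == "__main__":
    theta = [1.0 / math.sqrt(5.0)] * 5  # |theta| = 1 = r
    diff, c = risk_difference(theta)
    print(f"MC risk difference {diff:.3f}, guaranteed bound {-c * c:.3f}")
```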

## 4 Model selection

This section constructs a model selection procedure for estimating the function $S$ in (1.1) on the basis of the improved weighted least squares estimates.

The model selection procedure for the unknown function $S$ in (1.1) will be constructed on the basis of the family of estimates $(S^*_\gamma)_{\gamma\in\Gamma}$.

The performance of any estimate $S^*_\gamma$ will be measured by the empirical squared error $\|S^*_\gamma-S\|^2$.

In order to obtain a good estimate, we need a rule for choosing a weight vector $\gamma\in\Gamma$ in (3.14). Clearly, the best way is to minimize the empirical squared error with respect to $\gamma$. Using the estimate definition (3.14) and the Fourier expansion of $S$, we obtain

 (4.1)

Since the Fourier coefficients $\theta_{j,p}$ are unknown, the weight coefficients cannot be found by minimizing this quantity. To circumvent this difficulty, one needs to replace the terms containing the unknown coefficients by their estimators. We set

 (4.2)

where $\hat\sigma_n$ is the estimate of the limiting variance, which we choose in the following form

$$\hat\sigma_n=\sum_{j=[\sqrt n]+1}^{n}\hat t_{j,n}^2,\qquad \hat t_{j,n}=\int_0^1\mathrm{Tr}_j(t)\,dy_t. \tag{4.3}$$

For this substitution in the empirical squared error one has to pay some penalty. Thus, we arrive at the cost function of the form

 (4.4)

where is some positive constant, is the penalty term defined as

 (4.5)

Substituting the weight vector that minimizes the cost function

 (4.6)

into (3.14) leads to the improved model selection procedure

$$S^*=S^*_{\gamma^*}. \tag{4.7}$$

Note that $\gamma^*$ exists because $\Gamma$ is a finite set. If the minimizer in (4.6) is not unique, one can take any of them.
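Because $\Gamma$ is finite, computing $\gamma^*$ is a plain search over the family. The cost below is a *hypothetical* stand-in for (4.4)–(4.5), whose displays are not legible here. It expands $\|\hat S_\gamma-S\|^2=\sum_j(\gamma(j)\hat\theta_j-\theta_j)^2$ and drops the term $\sum_j\theta_j^2$, which does not depend on $\gamma$. It then replaces the unknown products $\hat\theta_j\theta_j$ by $\hat\theta_j^2-\hat\sigma/n$ and adds the penalty $\rho\,\hat\sigma\,|\gamma|^2/n$. Python's `min` returns the first minimizer, which is one admissible tie-breaking rule.

```python
def cost(gamma, theta_hat, sigma_hat, n, rho):
    """Hypothetical cost: proxy of ||S_gamma - S||^2 (up to a gamma-free term) plus a penalty.

    Uses theta_hat_j^2 - sigma_hat/n as a stand-in for the unknown theta_hat_j * theta_j.
    """
    fit = sum(g * g * t * t for g, t in zip(gamma, theta_hat))
    cross = sum(g * (t * t - sigma_hat / n) for g, t in zip(gamma, theta_hat))
    penalty = sigma_hat * sum(g * g for g in gamma) / n
    return fit - 2.0 * cross + rho * penalty


def select(weights, theta_hat, sigma_hat, n, rho):
    """Index of a minimizer of the cost over the finite family (the first one on ties)."""
    return min(range(len(weights)),
               key=lambda i: cost(weights[i], theta_hat, sigma_hat, n, rho))


if __name__ == "__main__":
    # Projection weights (1,...,1,0,...,0) keeping the first k coefficients, k = 1..5.
    family = [[1.0] * k + [0.0] * (5 - k) for k in range(1, 6)]
    estimates = [1.0, 0.5, 0.01, 0.01, 0.01]
    print(select(family, estimates, sigma_hat=1.0, n=100, rho=0.1))  # 1, i.e. keep k = 2
```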

Now we specify the weight coefficients as proposed in [7, 8] for a heteroscedastic regression model in discrete time. First, we define the normalizing coefficient $v_n$. Consider a numerical grid $\mathcal{A}_n$ of the form

where , and . We assume that the parameters $k^*$ and $\varepsilon$ are functions of $n$, i.e. $k^*=k^*(n)$ and $\varepsilon=\varepsilon(n)$, such that

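$$\lim_{n\to\infty}\left(\frac{1}{k^*(n)}+\frac{k^*(n)}{\ln n}\right)=0\quad\text{and}\quad\lim_{n\to\infty}\left(\varepsilon(n)+\frac{1}{n^{b}\varepsilon(n)}\right)=0 \tag{4.8}$$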

for any $b>0$. One can take, for example,

$$\varepsilon(n)=\frac{1}{\ln(n+1)}\quad\text{and}\quad k^*(n)=k^*_0+\sqrt{\ln(n+1)}, \tag{4.9}$$

where $k^*_0$ is some fixed constant. For each $\alpha\in\mathcal{A}_n$ we introduce the weight sequence $\gamma_\alpha=(\gamma_\alpha(j))_{j\ge1}$ as

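$$\gamma_\alpha(j)=\mathbf{1}_{\{1\le j\le d(\alpha)\}}+\left(1-(j/\omega_\alpha)^{\beta}\right)\mathbf{1}_{\{d(\alpha)<j\le\omega_\alpha\}}, \tag{4.10}$$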

where , and

$$\tau_\beta=\frac{(\beta+1)(2\beta+1)}{\pi^{2\beta}\beta}.$$
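The weights (4.10) for one grid point can be computed as follows. The values $\omega_\alpha$ and $d(\alpha)$ are passed in as plain arguments because their defining formulas are not legible here. The second indicator is read as $d(\alpha)<j\le\omega_\alpha$, and $\tau_\beta$ is coded from the display above. The sequence equals one up to $d(\alpha)$, decreases polynomially up to $\omega_\alpha$ and vanishes afterwards.

```python
import math


def tau(beta):
    """tau_beta = (beta + 1)(2 beta + 1) / (pi^(2 beta) beta)."""
    return (beta + 1) * (2 * beta + 1) / (math.pi ** (2 * beta) * beta)


def weight_sequence(p, beta, omega, d):
    """gamma(j), j = 1..p: 1 for j <= d, 1 - (j/omega)^beta for d < j <= omega, 0 beyond."""
    weights = []
    for j in range(1, p + 1):
        if j <= d:
            weights.append(1.0)
        elif j <= omega:
            weights.append(1.0 - (j / omega) ** beta)
        else:
            weights.append(0.0)
    return weights


if __name__ == "__main__":
    print(weight_sequence(12, beta=1, omega=10.0, d=3))
    print(tau(1))  # equals 6 / pi^2
```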

Finally, we set

$$\Gamma=\{\gamma_\alpha,\ \alpha\in\mathcal{A}_n\}. \tag{4.11}$$

Note that these weight vectors satisfy the first condition imposed on $\Gamma$ in Section 3, with $d=d(\alpha)$.

## 5 Main results

In this section we obtain sharp oracle inequalities for the quadratic risk (1.5) and the robust risk (1.6) of the proposed procedure. Then, on the basis of these inequalities, we establish the robust efficiency property in the adaptive setting.

### 5.1 Oracle inequalities

To prove the sharp oracle inequality, we need the following conditions on the family of noise distributions in (1.1).

We impose the stability conditions, introduced in [26], on the sequence of Fourier transforms of the noise.

There exists a proxy variance $\sigma_Q>0$ such that for any

In the sequel we will use the following notation

 (5.1)

Moreover, we set

 (5.2)

where $|\cdot|$ is the Euclidean norm, i.e. $|x|^2=\sum_j x_j^2$ for any vector $x=(x_j)$.

Assume that the sequence is such that

First, we obtain the oracle inequalities for the risk (1.5).

###### Theorem 5.1.

Assume that the conditions and hold. Then, for any and

 RQ(S∗,S)

where the coefficient $U_n$ is such that for any $\epsilon>0$

$$\lim_{n\to\infty}\frac{U_n}{n^{\epsilon}}=0. \tag{5.3}$$

In the case, when the value of in is known, one can take and

 (5.4)

Now we study the estimate (4.3). To obtain the oracle inequality for the robust risk (1.6), we need an additional condition on the distribution family $\mathcal{Q}_n$. We set

$$\varsigma^*=\varsigma^*_n=\sup_{Q\in\mathcal{Q}_n}\sigma_Q. \tag{5.5}$$

) Assume that the limit equation (5.1) - (5.2) hold uniformly in and as for any .

Now we impose conditions on the set of weight coefficients $\Gamma$.

Assume that the set is such and for any .

As shown in [27], both of these conditions hold for the model (1.1) with the Ornstein–Uhlenbeck noise process (2.1). Using Proposition 4.2 from [27], we obtain the oracle inequalities for the robust risk (1.6).

###### Theorem 5.2.

Assume that the conditions hold and $S$ is continuously differentiable. Then for any and

where the term satisfies the property (5.3).

### 5.2 Asymptotic efficiency

In order to study the asymptotic efficiency we define the following functional Sobolev ball

$$W_{k,r}=\Big\{f\in\mathcal{C}^k_p[0,1]:\ \sum_{j=0}^k\|f^{(j)}\|^2\le r\Big\},$$

where $r>0$ and $k\ge1$ are some unknown parameters and $\mathcal{C}^k_p[0,1]$ is the space of $k$ times differentiable 1-periodic functions $f$ such that $f^{(i)}(0)=f^{(i)}(1)$ for any $0\le i\le k-1$. To study the asymptotic efficiency, we consider the class of all estimators, i.e. all measurable functions of the observations (1.4). In the sequel we denote by the distribution of the process with , i.e. the white noise model with the intensity .

###### Theorem 5.3.

Assume that . The robust risk (1.6) admits the following lower bound

where .

We show that this lower bound is sharp in the following sense.

###### Theorem 5.4.

Assume that and there exists such that . Then the robust risk of the model selection procedure (4.7) with the weight coefficients (4.11) satisfies the following upper bound

It is clear that these theorems imply the following efficiency property.

###### Corollary 5.5.

Assume that . Then the model selection procedure (4.7) with the weight coefficients (4.11) is asymptotically efficient, i.e.

$$\lim_{n\to\infty}v_n^{2k/(2k+1)}\sup_{S\in W_{k,r}}\mathcal{R}^*(S^*,S)=l_k(r).$$

Theorem 5.3 is proved in the same way as Theorem 1 in [15]. Theorem 5.4 follows from Theorems 3.1 and 5.2 and from Theorem 5.2 in [18].

## 6 Monte Carlo simulations

In this section we present the results of numerical simulations to assess the performance and the improvement of the proposed model selection procedure (4.6). We simulate the model (1.1) with the 1-periodic functions

$$S_1(t)=t\sin(2\pi t)+t^2(1-t)\cos(4\pi t), \tag{6.1}$$

$$S_2(t)=0.5-|0.5-t| \tag{6.2}$$

on $[0,1]$, and the Lévy noise process is defined as

where is a homogeneous Poisson process with intensity and is an i.i.d. sequence (see, for example, [18]).
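The noise formula used in the simulations is not legible here, so the sketch below is only a plausible generator under stated assumptions. The Lévy part is $u_t=\varrho_1w_t+\varrho_2z_t$ with a compound Poisson process $z_t=\sum_{j\le N_t}Y_j$. Here $N$ has intensity $\lambda$ and $Y_j\sim N(0,1/\lambda)$, so that $\Pi(x^2)=\lambda\,\mathbf EY_1^2=1$ as in (2.3). Since $\mathbf EY_1=0$, no compensator correction is needed. The Ornstein–Uhlenbeck noise is then obtained by an Euler scheme for $d\xi_t=a\xi_t\,dt+du_t$. All parameter values are hypothetical.

```python
import math
import random


def poisson_knuth(rng, mean):
    """Poisson sample by Knuth's multiplication method (fine for the small means lam*dt)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def levy_increments(n_steps, dt, rho1, rho2, lam, rng):
    """Increments of u = rho1*w + rho2*z, z compound Poisson with N(0, 1/lam) jumps."""
    incs = []
    for _ in range(n_steps):
        n_jumps = poisson_knuth(rng, lam * dt)
        jumps = sum(rng.gauss(0.0, 1.0 / math.sqrt(lam)) for _ in range(n_jumps))
        incs.append(rho1 * rng.gauss(0.0, math.sqrt(dt)) + rho2 * jumps)
    return incs


def ou_path(du, dt, a):
    """Euler scheme for d xi = a * xi dt + du with xi_0 = 0; returns xi at all grid points."""
    xi = [0.0]
    for d in du:
        xi.append(xi[-1] + a * xi[-1] * dt + d)
    return xi


if __name__ == "__main__":
    rng = random.Random(0)
    p, n = 21, 10  # hypothetical sampling frequency and horizon
    du = levy_increments(n * p, 1.0 / p, rho1=1.0, rho2=1.0, lam=5.0, rng=rng)
    noise = ou_path(du, 1.0 / p, a=-1.0)
    print(len(noise), noise[-1])
```

With $\Pi(x^2)=1$, the variance of $u_1$ in this generator is $\varrho_1^2+\varrho_2^2$.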

We use the model selection procedure (4.6) with the weights (4.10) in which , , , and . We define the empirical risk as

where and is the deviation for the -th replication. In this example we take and .

Table 1 gives the sample risks of the improved estimate (4.6) and of the model selection procedure based on the weighted LSE (3.15) from [17] for different observation periods $n$. Table 2 gives the sample risks of the model selection procedure based on the weighted LSE (3.15) from [17] and of its improved version for different observation periods $n$.

###### Remark 6.1.

The figure shows the behavior of the procedures (3.8) and (4.6) depending on the observation period $n$. The bold line is the function (6.2), the continuous line is the model selection procedure based on the least squares estimators, and the dashed line is the improved model selection procedure. From Table 2, comparing the results for various numbers of observations, we conclude that the theoretical result on the improvement effect (3.16) is confirmed by the numerical simulations. Moreover, for the proposed shrinkage procedure, Table 1 and Figures 1–3 show that the benefit is considerable for relatively small $n$. However, we note that for the function we have an improvement in accuracy for .

## 7 Stochastic calculus for the non-Gaussian Ornstein-Uhlenbeck processes

In this section we study the process (2.1).

###### Proposition 7.1.

Let $f$ and $g$ be two nonrandom left continuous functions with finite right limits. Then for any $t\ge0$

 (7.1)

where and

$$\check\varepsilon_t(f)=a\int_0^t e^{a(t-s)}f(s)\,\frac{1+e^{2as}}{2}\,ds.$$

Proof. Taking into account the definitions (3.7) and (2.1) we obtain through the Ito formula that

 (7.2)

where ,

and . Moreover, using the Ito formula we obtain

$$\mathbf{E}I_t^2(1)=\mathbf{E}\xi_t^2=\sigma_Q\,\frac{e^{2at}-1}{2a}. \tag{7.3}$$
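Formula (7.3) can be checked by simulation. In this sketch we assume $\sigma_Q=\varrho_1^2+\varrho_2^2$, the variance rate of $u_t$ when $\Pi(x^2)=1$. We approximate the Poisson jumps by at most one jump per small step and integrate $d\xi_t=a\xi_t\,dt+du_t$ with an Euler scheme. The Monte Carlo second moment should match $\sigma_Q(e^{2at}-1)/(2a)$ up to an $O(\Delta t)$ discretization bias and sampling error.

```python
import math
import random


def ou_second_moment_formula(t, a, sigma_q):
    """Right-hand side of (7.3): sigma_Q (e^{2at} - 1) / (2a), for a != 0."""
    return sigma_q * (math.exp(2.0 * a * t) - 1.0) / (2.0 * a)


def ou_second_moment_mc(t, a, rho1, rho2, lam, n_steps=100, n_rep=5000, seed=0):
    """Monte Carlo E xi_t^2 for an Euler scheme of d xi = a xi dt + du, xi_0 = 0.

    du = rho1 dw + rho2 dz, where dz is approximated by at most one jump per step,
    occurring with probability lam*dt, with jump size ~ N(0, 1/lam) so that Pi(x^2) = 1.
    """
    rng = random.Random(seed)
    dt = t / n_steps
    acc = 0.0
    for _ in range(n_rep):
        xi = 0.0
        for _ in range(n_steps):
            du = rho1 * rng.gauss(0.0, math.sqrt(dt))
            if rng.random() < lam * dt:
                du += rho2 * rng.gauss(0.0, 1.0 / math.sqrt(lam))
            xi += a * xi * dt + du
        acc += xi * xi
    return acc / n_rep


if __name__ == "__main__":
    a, rho1, rho2 = -1.0, 1.0, 1.0
    sigma_q = rho1 ** 2 + rho2 ** 2  # assumption: sigma_Q = rho1^2 + rho2^2
    print(ou_second_moment_formula(1.0, a, sigma_q),
          ou_second_moment_mc(1.0, a, rho1, rho2, lam=2.0))
```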

Note now, that

$$\mathbf{E}I_t^2(f)\le 2a^2\int_0^t f^2(s)\,ds\int_0^t\mathbf{E}\xi_s^2\,ds+2\sigma_Q\int_0^t f^2(s)\,ds.$$

So, from here

$$\sup_{0\le t\le n}\mathbf{E}I_t^2(f)\le 2\sigma_Q(|a|+1)\int_0^n f^2(s)\,ds<\infty. \tag{7.4}$$

This implies immediately that . Using this in (7.2) yields

$$\mathbf{E}I_t(f)I_t(g)=\sigma_Q\int_0^t f(s)g(s)\,ds \tag{7.5}$$

where . Therefore, putting in (7.5), we obtain that

Taking into account here, that , we obtain that

Therefore, using this in (7.5) we obtain (7.1).

###### Corollary 7.2.

$$\mathbf{E}I_n^2(f)\le 2\sigma_Q\int_0^n f^2(s)\,ds. \tag{7.6}$$

Proof. Indeed, putting $g=f$ in (7.1), we get

Moreover, note that

By the Cauchy-Bunyakovsky-Schwarz inequality

This immediately implies the upper bound (7.6). Hence Corollary 7.2.

Now we set

 (7.7)

Using (7.2) with we can obtain that

$$d\tilde I_t(f)=2af(t)V_t(f)\,dt+d\tilde M_t(f), \tag{7.8}$$

where . To study this process we need to introduce the following functions

 (7.9)

and

$$A_t(f)=\int_0^t e^{3a(t-s)}f(s)\,\upsilon(s)\,ds+2\sigma_Q^2\int_0^t e^{3a(t-s)}\check\varepsilon_s(f)\,ds, \tag{7.10}$$

where , and .

###### Proposition 7.3.

For any left continuous functions with finite right limits and

 (7.11)

where .

Proof. Applying again (7.2) with yields

 (7.12)

where and . By the Ito formula we get

 dVt(f)Vt(g)

Now from Lemma A.2 we obtain that

 dEVt(f)Vt(g) (7.13)

where . Note that and

 E[L(f),L(g)]t

To find the function we put in (7.13). Taking into account that we get

Using here that

 (7.14)

we obtain the representation (7.10). Hence Proposition 7.3.

###### Proposition 7.4.

For any left continuous function with finite right limits

$$\mathbf{E}\tilde I_t(f)\tilde I_t(1)=\int_0^t e^{2a(t-s)}\tilde\varkappa_s(f)\,ds, \tag{7.15}$$

where .

Proof. Using the Ito formula and Lemma A.2 we obtain that for any bounded nonrandom functions and

 (7.16)

Putting here and taking into account that , we obtain that

$$\cdots+d\,\mathbf{E}[\tilde M(f),L(1)]_t.$$

By the direct calculation we find

$$\mathbf{E}[\tilde M(f),L(1)]_t=\int_0^t\check A_s(f)\,ds.$$

This yields (7.15), which proves the proposition.

Further, we need the following correlation measures for two integrable functions $f$ and $g$

 (7.17)

For any bounded function we introduce the following uniform norm

###### Proposition 7.5.

Let $f$ and $g$ be two left continuous functions with finite right limits, bounded by $\phi_*$, i.e. $\sup_t|f(t)|\le\phi_*$ and $\sup_t|g(t)|\le\phi_*$. Then for any $t\ge0$

$$\left|a\,\mathbf{E}\tilde I_t(f)V_t(g)\right|\le u^*_1\varpi_t(1,g)+u^*_2\varpi_t(f,g)+u^*_3, \tag{7.18}$$

where , and .

Proof. First, note that from Ito formula we find

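$$\begin{aligned}a\,\mathbf{E}\tilde I_t(f)V_t(g)&=a^2\int_0^t e^{a(t-s)}g(s)\big(\mathbf{E}\tilde I_s(f)\tilde I_s(1)\big)\,ds\\&\quad+2a^2\int_0^t e^{a(t-s)}f(s)\big(\mathbf{E}V_s(g)V_s(f)\big)\,ds\\&\quad+a\int_0^t e^{a(t-s)}\,d\mathbf{E}[\tilde M(f),L(g)]_s.\end{aligned} \tag{7.19}$$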

Using Lemmas A.4 and A.6 here, we obtain

 (7.20)

One can check directly that

$$\cdots=2\sigma_Q\int_0^t g(s)f(s)\big(\mathbf{E}I_s(f)I_s(1)\big)\,ds+2\sigma_Q\int_0^t f(s)\big(\mathbf{E}I_s(f)I_s(g)\big)\,ds+\check\varrho_2\int_0^t f^2(s)g(s)\,ds.$$

From (7.1) we find that

$$\cdots=2\sigma_Q^2\int_0^t g(s)f(s)\,\tau_s(f,1)\,ds+2\sigma_Q^2\int_0^t f(s)\,\tau_s(f,g)\,ds+\check\varrho_2\int_0^t f^2(s)g(s)\,ds.$$

Using the last equality in (7.14) we obtain that

$$a\int_0^t e^{a(t-s)}\,d\mathbf{E}[\tilde M(f),L(g)]_s=2\sigma_Q^2\int_0^t e^{a(t-s)}g(s)f(s)\,\check\varepsilon_s(f)\,ds$$

Note now that

i.e. . Therefore, in view of Lemma A.3 we get

Moreover, by integrating by parts we can obtain directly that

$$\left|\int_0^t g(s)\,\check\varepsilon_s(f)\,ds\right|\le\varpi_t(f,g),$$

and, therefore,

 (7.21)

So, the last term in (7.19) can be estimated as

Using Lemma A.5 in (7.19) we come to the bound (7.18). Hence Proposition 7.5.

###### Proposition 7.6.

Let $f$ and $g$ be two left continuous functions with finite right limits, bounded by $\phi_*$, i.e. $\sup_t|f(t)|\le\phi_*$ and $\sup_t|g(t)|\le\phi_*$. Then for any $t\ge0$

$$\left|\mathbf{E}[\tilde M(f),\tilde M(g)]_t\right|\le\left(12\sigma_Q^2\phi_*^2\,\varpi_t(f,g)+\phi_*^4\,\check\varrho_2\right)t. \tag{7.22}$$

Proof. First of all note that from (7.1) we obtain that

$$\mathbf{E}[\tilde M(f),\tilde M(g)]_t=\cdots+\check\varrho_2\int_0^t f^2(s)g^2(s)\,ds. \tag{7.23}$$

Using here the bound (7.21) we obtain (7.22). Hence Proposition 7.6.

###### Corollary 7.7.

Let $f$ and $g$ be two left continuous functions with finite right limits, bounded by $\phi_*$, i.e. $\sup_t|f(t)|\le\phi_*$ and $\sup_t|g(t)|\le\phi_*$. Then for any $t\ge0$

$$\left|\mathbf{E}\tilde I_t(f)\tilde I_t(g)\right|\le\Big(v^*_1\big(\varpi_t(1,f)+\varpi_t(1,g)\big)+v^*_2\,\varpi_t(f,g)+v^*_3\Big)t, \tag{7.24}$$

where ,