
Asymptotic properties of Bernstein estimators on the simplex

In this paper, we study various asymptotic properties (bias, variance, mean squared error, mean integrated squared error, asymptotic normality, uniform strong consistency) for Bernstein estimators of cumulative distribution functions and density functions on the d-dimensional simplex. Our results generalize the ones in Leblanc (2012) and Babu et al. (2002), which treated the case d = 1, and significantly extend those found in Tenbusch (1994) for the density estimators when d = 2. The density estimator (or smoothed histogram) is closely related to the Dirichlet kernel estimator from Ouimet (2020), and can also be used to analyze compositional data.


1 Introduction

The d-dimensional simplex and its interior are defined by

(1.1)

where . For any cumulative distribution function on , define the Bernstein polynomial of order for by

(1.2)

where the weights are the following probabilities from the Multinomial distribution:

(1.3)

The Bernstein estimator of , denoted , is the Bernstein polynomial of order for the empirical cumulative distribution function , where the random variables are independent and identically distributed. Precisely,

(1.4)

Similarly, if has a density function , we define the Bernstein density estimator of by

(1.5)

where is just a scaling factor, namely the inverse of the volume of the hypercube .
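To make the construction concrete, the following is a minimal numerical sketch for d = 2, assuming the standard multinomial-weight reading of (1.2)-(1.3), the plug-in c.d.f. estimator (1.4) and the smoothed-histogram density estimator (1.5); the helper names, the order m = 20 and the toy Dirichlet sample are ours, not the paper's.

```python
# Minimal sketch (not the paper's code) of the Bernstein c.d.f. and density
# estimators on the 2-dimensional simplex, under the standard Multinomial-weight
# construction.  All function names and the toy sample are our own choices.
import numpy as np
from scipy.stats import multinomial

def simplex_lattice(m):
    """All k = (k1, k2) in N_0^2 with k1 + k2 <= m (hard-coded to d = 2)."""
    return [np.array([i, j]) for i in range(m + 1) for j in range(m + 1 - i)]

def weight(k, m, x):
    """P_{k,m}(x): Multinomial(m, (x_1, x_2, 1 - x_1 - x_2)) probability of k."""
    p = np.append(x, 1.0 - x.sum())
    counts = np.append(k, m - k.sum())
    return multinomial.pmf(counts, n=m, p=p)

def ecdf(data, x):
    """Empirical c.d.f.: fraction of observations <= x componentwise."""
    return np.all(data <= x, axis=1).mean()

def bernstein_cdf(data, m, x):
    """Bernstein c.d.f. estimator: sum_k F_n(k/m) * P_{k,m}(x)."""
    return sum(ecdf(data, k / m) * weight(k, m, x) for k in simplex_lattice(m))

def bernstein_density(data, m, x):
    """Smoothed histogram: m^d * sum_k (empirical mass of (k/m, (k+1)/m]) * P_{k,m-1}(x)."""
    d = x.size
    total = 0.0
    for k in simplex_lattice(m - 1):
        in_box = np.all((data > k / m) & (data <= (k + 1) / m), axis=1)
        total += in_box.mean() * weight(k, m - 1, x)
    return (m ** d) * total

# Toy usage: 500 points uniform on the 2-simplex (first two coordinates of a Dirichlet(1,1,1)).
rng = np.random.default_rng(0)
sample = rng.dirichlet([1.0, 1.0, 1.0], size=500)[:, :2]
x0 = np.array([0.2, 0.3])
print(bernstein_cdf(sample, m=20, x=x0), bernstein_density(sample, m=20, x=x0))
```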

2 Results for the c.d.f. estimator

Except for Theorem 2.7, we assume the following everywhere in this section:

Assumption.
(2.1)
Proposition 2.1.

Under assumption (2.1), we have, uniformly for ,

(2.2)

as , where

(2.3)
Theorem 2.2 (Bias and variance).

Under assumption (2.1), we have, for ,

(2.4)
(2.5)

as , where

(2.6)
Remark 2.3.

In Leblanc (2012a), the function should be equal to instead of . The error is explained in the appendix and the estimates can easily be verified numerically. The same error also appears in the statements of Belalia (2016), since the proofs relied on the same estimates as Leblanc.

Corollary 2.4 (Mean squared error).

Under assumption (2.1), we have, for ,

(2.7)

In particular, if , the asymptotically optimal choice of , with respect to MSE, is

(2.8)

in which case

(2.9)
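Since the optimal order in (2.8) depends on unknown quantities, one simple way to see the trade-off numerically is a brute-force Monte Carlo search over m at a fixed interior point. The sketch below does this for d = 1 with a Beta(2, 3) toy target; the code, the grid of orders and the target are ours, not the paper's.

```python
# Brute-force Monte Carlo view of Corollary 2.4 for d = 1 (our code, toy
# Beta(2,3) target): estimate the MSE of the Bernstein c.d.f. estimator at a
# fixed interior point x0 over a grid of orders m; the minimizer mimics (2.8).
import numpy as np
from scipy.stats import beta, binom

rng = np.random.default_rng(1)
n, x0, reps = 200, 0.4, 300
true_F = beta.cdf(x0, 2, 3)

def bernstein_cdf_1d(sample, m, x):
    """F_{n,m}(x) = sum_{k=0}^m F_n(k/m) * C(m,k) x^k (1-x)^(m-k)."""
    k = np.arange(m + 1)
    ecdf_vals = np.array([(sample <= t).mean() for t in k / m])
    return float(ecdf_vals @ binom.pmf(k, m, x))

for m in (5, 10, 20, 40, 80, 160):
    sq_errors = [(bernstein_cdf_1d(rng.beta(2, 3, size=n), m, x0) - true_F) ** 2
                 for _ in range(reps)]
    print(m, np.mean(sq_errors))   # MSE(m); compare orders to see the trade-off
```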
Theorem 2.5 (Mean integrated squared error).

Under assumption (2.1), we have

(2.10)

In particular, if , the asymptotically optimal choice of , with respect to , is

(2.11)

in which case

(2.12)
Theorem 2.6 (Asymptotic normality).

Assume (2.1). For such that , we have the following convergence in distribution:

(2.13)

In particular, Proposition 2.1 implies

(2.14)
(2.15)

for any constant .
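A quick Monte Carlo sanity check of Theorem 2.6, specialized to d = 1, is sketched below; the choice m = n is ours (made only so that the bias is negligible at the sqrt(n) scale), and the Beta(2, 3) target is a toy example, not something from the paper.

```python
# Monte Carlo sanity check of Theorem 2.6 for d = 1 (our code, toy Beta(2,3)
# target): sqrt(n) * (F_{n,m}(x0) - F(x0)) should be approximately Gaussian.
# Taking m = n is our choice, made only so the O(1/m) bias is negligible here.
import numpy as np
from scipy.stats import beta, binom, shapiro

rng = np.random.default_rng(2)
n, m, x0, reps = 400, 400, 0.4, 500
true_F = beta.cdf(x0, 2, 3)

def bernstein_cdf_1d(sample, m, x):
    k = np.arange(m + 1)
    ecdf_vals = np.array([(sample <= t).mean() for t in k / m])
    return float(ecdf_vals @ binom.pmf(k, m, x))

z = np.array([np.sqrt(n) * (bernstein_cdf_1d(rng.beta(2, 3, size=n), m, x0) - true_F)
              for _ in range(reps)])
w_stat, p_value = shapiro(z)
# Mean should be near 0; heuristically the variance should be close to the
# empirical-c.d.f. leading term F(x0)(1 - F(x0)); p_value should usually be large.
print(z.mean(), z.var(), true_F * (1 - true_F), p_value)
```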

For the next result, we use the notation for any bounded function , and also

(2.16)
Theorem 2.7 (Uniform strong consistency).

Let be continuous on . Then, as ,

(2.17)

Assume further that is differentiable and its partial derivatives are Lipschitz continuous on . Then, for all such that (for example, works), we have, as ,

(2.18)

In particular, for , we have a.s.

3 Results for the density estimator

For each result stated in this section, one of the following two assumptions will be used.

Assumptions.
(3.1)
(3.2)

We denote the expectation of by

(3.3)
Proposition 3.1.

Under assumption (3.2), we have, uniformly for ,

(3.4)

as , where

(3.5)
Theorem 3.2 (Bias and variance).

We have, for ,

(3.6)
(3.7)

as , where

(3.8)
Corollary 3.3 (Mean squared error).

Under assumption (3.2), we have, for ,

(3.9)

In particular, if , the asymptotically optimal choice of , with respect to , is

(3.10)
(3.11)

and, more generally, if for some , then

(3.12)
Theorem 3.4 (Mean integrated squared error).

Under assumption (3.2), we have

(3.13)

In particular, if , the asymptotically optimal choice of , with respect to , is

(3.14)
(3.15)

and, more generally, if for some , then

(3.16)
Theorem 3.5 (Uniform strong consistency).

Assume (3.1). If as , then

(3.17)

In particular, if , then a.s.
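As an illustration of Theorem 3.5, here is a self-contained Monte Carlo sketch specialized to d = 1; the code and the schedule m of order n^(2/5) are our choices, made only so that m grows with n and the sup-norm error can be watched numerically, and nothing here is prescribed by the theorem.

```python
# Self-contained Monte Carlo sketch of Theorem 3.5 for d = 1 (our code): the
# sup-norm error of the smoothed histogram over an interior grid should shrink
# as n grows when m -> infinity slowly; the schedule m ~ n^(2/5) is our choice.
import numpy as np
from scipy.stats import beta, binom

rng = np.random.default_rng(3)
grid = np.linspace(0.05, 0.95, 181)

def bernstein_density_1d(sample, m, x):
    """m * sum_{k<m} p_hat_k * C(m-1,k) x^k (1-x)^(m-1-k), with p_hat_k the
    empirical mass of the bin (k/m, (k+1)/m]."""
    p_hat = np.histogram(sample, bins=np.arange(m + 1) / m)[0] / sample.size
    return m * float(p_hat @ binom.pmf(np.arange(m), m - 1, x))

for n in (200, 800, 3200, 12800):
    m = max(2, int(n ** 0.4))
    sample = rng.beta(2.0, 3.0, size=n)
    sup_err = max(abs(bernstein_density_1d(sample, m, x) - beta.pdf(x, 2, 3))
                  for x in grid)
    print(n, m, round(sup_err, 4))   # the sup-norm error should (slowly) decrease
```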

Theorem 3.6 (Asymptotic normality).

Assume (3.1). Let be such that . If as , then

(3.18)

If we also have as , then Theorem 3.5 implies

(3.19)

Independently of the above rates for and , if we assume (3.2) instead and for some as , then Proposition 3.1 implies

(3.20)
Remark 3.7.

The rate of convergence for the d-dimensional kernel density estimator with i.i.d. data and bandwidth is in Theorem 3.1.15 of Prakasa Rao (1983), whereas our estimator converges at a rate of . Hence, the relation between the scaling factor of and the bandwidth of other multivariate kernel smoothers is .
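Remark 3.7 relates the scaling factor of the Bernstein density estimator to the bandwidth of conventional kernel smoothers. The tiny helper below encodes the rule of thumb that 1/m plays the role of a squared bandwidth; this is our working assumption, in line with the Dirichlet-kernel analogy of Ouimet (2020) mentioned in the abstract, and not a formula quoted from the remark.

```python
# Rule-of-thumb conversion suggested by Remark 3.7 (our assumption, not a
# formula quoted from it): treat 1/m as playing the role of a squared kernel
# bandwidth, i.e. m ~ h**(-2), in line with the Dirichlet-kernel analogy.
def bernstein_order_from_bandwidth(h: float) -> int:
    """Bernstein order m that plays roughly the same role as bandwidth h."""
    return max(1, round(h ** -2))

print(bernstein_order_from_bandwidth(0.1))   # h = 0.1  ->  m of about 100
```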

4 Proof of the results for the c.d.f. estimator

Proof of Proposition 2.1.

We generalize the proof of (Lorentz, 1986, Section 1.6.1), which treated the case d = 1. By assumption (2.1), a second-order mean value theorem yields

(4.1)

for some random vector on the line segment joining and . Using the well-known identities

(4.2)

and

(4.3)

we can multiply (4.1) by and sum over to obtain

(4.4)

To conclude, we need to show that the last term is . By the uniform continuity of the second-order partial derivatives of , we know that for some , and we also know that, for all , there exists such that implies , uniformly for . By considering the two cases and , the last term in (4.4) is

(4.5)

By Cauchy-Schwarz and the identity (4.3), the first term inside the bracket in (4.5) is

(4.6)

By Bernstein’s inequality (see e.g. Lemma A.1), the second term inside the bracket in (4.5) is

(4.7)

If we take a sequence such that , then (4.5) is . ∎

Proof of Theorem 2.2.

The expression for the bias of just follows from Proposition 2.1 and the fact that

(4.8)

To estimate the variance of , note that

(4.9)

where

(4.10)

For every , the random variables are i.i.d. and centered, so that

(4.11)

Using the expansion in (4.1) and Proposition 2.1, the above is

(4.12)

The double sum on the second line inside the braces is estimated in (A.10) of Lemma A.3. By Cauchy-Schwarz, the identity (4.3), and the fact that , the double sum inside the big-O term is

(4.13)

This ends the proof. ∎

Proof of Theorem 2.5.

By (4.12), (4.13) and (2.4), we have

(4.14)

By the assumption (2.1), the partial derivatives are bounded on , so Lemma A.3 and the bounded convergence theorem imply

(4.15)

This ends the proof. ∎

Proof of Theorem 2.6.

Recall from (4.9) that where the ’s are i.i.d. and centered random variables. Therefore, it suffices to show the following Lindeberg condition for double arrays (see, e.g., Section 1.9.3 in Serfling (1980)): for every ,

(4.16)

where and where . But this follows from the fact that for all , and as by Theorem 2.2. ∎

Before proving Theorem 2.7, we need the following lemma (it is an adaptation of Lemma 2.2 in Babu & Chaubey (2006)).

Lemma 4.1.

Let be Lipschitz continuous on , and let (one can think of the quantity defined in (4.17) below as the bulk of the distribution; the contributions coming from outside the bulk are small for appropriate ’s)

(4.17)

Then, for all that satisfy , we have, as ,

(4.18)
Proof.

For all , we have

(4.19)

where the last inequality comes from our assumption that is Lipschitz continuous.

For , , and using the notation , we have

(4.20)

where

(4.21)