
# Asymptotic properties of the normalized discrete associated-kernel estimator for probability mass function

Discrete kernel smoothing is now gaining importance in nonparametric statistics. In this paper, we investigate some asymptotic properties of the normalized discrete associated-kernel estimator of a probability mass function. We show, under some regularity and non-restrictive assumptions on the associated kernel, that the normalizing random variable converges in mean square to 1. We then derive the consistency and the asymptotic normality of the proposed estimator. Various families of discrete kernels already exhibited in the literature satisfy these conditions, including the refined CoM-Poisson kernel, which is underdispersed and of second order. Finally, the first-order binomial kernel is discussed and, surprisingly, its normalized estimator exhibits suitable asymptotic behaviour in simulations.

10/07/2020


## 1 Introduction

The modern notion of a discrete associated kernel for smoothing or estimating discrete functions defined on a discrete set $T$ requires the development of new convergence properties for the corresponding estimator. The set $T$ is not subject to any restrictive condition; it can be bounded or unbounded, finite or infinite. In this sense, Abdous and Kokonendji (2009) presented some asymptotic properties of non-normalized discrete associated-kernel estimators of a probability mass function (pmf). Several authors have pointed out the use of a discrete associated kernel built from Dirac and discrete triangular kernels (Kokonendji et al., 2007; Kokonendji and Zocchi, 2010), and also from the extensions of Dirac kernels proposed by Aitchison and Aitken (1976) for categorical data and by Wang and Van Ryzin (1981). Furthermore, we have count kernels such as the binomial (Kokonendji and Senga Kiessé, 2011) and, recently, the CoM-Poisson (Huang et al., 2021) kernels, which are both underdispersed (i.e., with variance smaller than the mean).

See also Harfouche et al. (2018) and Senga Kiessé (2017) for other properties. Notice that one can use them to estimate, instead of the pmf, discrete regression or weighted functions; see, e.g., Kokonendji and Somé (2021), Senga Kiessé and Cuny (2014) and Senga Kiessé and Ventura (2016).

Let us first fix the refined definition of a discrete associated kernel from Kokonendji and Somé (2018) and state in Theorem 1.2 some important asymptotic properties to be completed in this paper.

###### Definition 1.1.

Let $T$ be the discrete support of the pmf $f$ to be estimated, $x\in T$ a target point and $h>0$ a bandwidth. A parameterized pmf $K_{x,h}(\cdot)$ on the discrete support $S_x$ is called a "discrete associated kernel" if the following conditions are satisfied:

$$x\in S_x,\quad \lim_{h\to 0}\mathbb{E}(Z_{x,h})=x\quad\text{and}\quad \lim_{h\to 0}\mathrm{Var}(Z_{x,h})=\delta\in[0,1),\tag{1}$$

where $Z_{x,h}$ denotes the discrete random variable with pmf $K_{x,h}$.

The choice of a discrete associated kernel referred to as "second-order", i.e., satisfying $\delta=0$ in (1), ensures the convergence of its corresponding estimator; an elementary example is the naive or Dirac kernel for smoothing a very large sample of discrete data. Otherwise, the convergence of the corresponding estimator is not guaranteed; that is, for $\delta\in(0,1)$ in Definition 1.1, the discrete associated kernel is said to be of "first-order", like the well-known binomial kernel.

Let $X_1,\ldots,X_n$ be a sample of independent and identically distributed (i.i.d.) discrete random variables having a pmf $f$ on $T$. In general, the basic estimator $\tilde f_n$ of $f$ is not a pmf. Indeed, for some discrete associated kernels (e.g., binomial, triangular and CoM-Poisson), the total mass of the corresponding estimator is not equal to 1. This limitation is explained by the fact that the normalizing variable $C_n$ (which is equal to the sum, over all the targets belonging to $T$, of the discrete associated-kernel estimator) was assumed to be equal to 1 only to simplify the calculations. More precisely, one can write both estimators as:

$$\hat f_n(x)=\frac{\tilde f_n(x)}{C_n},\quad x\in T,\tag{2}$$

with

$$\tilde f_n(x)=\frac{1}{n}\sum_{i=1}^{n}K_{x,h_n}(X_i)\quad\text{and}\quad C_n=\sum_{x\in T}\tilde f_n(x)>0,\tag{3}$$

where $(h_n)_{n\ge 1}$ is an arbitrary sequence of positive smoothing parameters satisfying $h_n\to 0$ as $n\to\infty$, while $K_{x,h_n}$ is a suitably chosen discrete kernel function. If $C_n=1$, as for the kernels of Dirac, Aitchison and Aitken (1976) and Wang and Van Ryzin (1981), one obviously has $\hat f_n=\tilde f_n$. Hence:

###### Theorem 1.2.

(Abdous and Kokonendji, 2009) For any $x\in T$ and under Assumptions (1) of the second order (i.e., $\delta=0$), one has

$$\tilde f_n(x)\xrightarrow[n\to\infty]{L^{2},\,\mathrm{a.s.}} f(x),$$

where "$L^{2},\,\mathrm{a.s.}$" stands for both "mean square and almost sure convergences". Furthermore, if $n\,\mathrm{Var}\{\tilde f_n(x)\}\to\infty$ then

$$\big\{\tilde f_n(x)-\mathbb{E}\,\tilde f_n(x)\big\}\big\{\mathrm{Var}\,\tilde f_n(x)\big\}^{-1/2}\xrightarrow[n\to\infty]{\mathcal{D}}\mathcal{N}(0,1),$$

where "$\mathcal{D}$" stands for "convergence in distribution" and $\mathcal{N}(0,1)$ denotes the standard normal distribution.

In this paper we mainly extend Theorem 1.2 for the non-normalized estimator (3) to the normalized one (2), introducing new and non-restrictive assumptions with uniformity in the target point in the limits of (1) and, therefore, changing the types of convergence. More importantly, we shall demonstrate the convergence in mean square of the positive normalizing random variable $C_n$ of (3) to 1, which clearly completes the similar result in Kokonendji and Varron (2016, Theorem 2.1). Section 2 states the different assumptions and their corresponding results, with illustrations on the recent CoM-Poisson kernel estimator; the case of the first-order binomial kernel is briefly discussed. Finally, Section 3 is devoted to the detailed proofs.
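As a quick numerical illustration of the pair (2)-(3), the following sketch computes the non-normalized estimator $\tilde f_n$, its total mass $C_n$ and the normalized $\hat f_n$ on a toy count sample, using the binomial kernel recalled in Section 2 (the pmf of a Binomial$(x+1,(x+h)/(x+1))$ distribution); the sample, support and bandwidth are illustrative only.

```python
from math import comb

def binomial_kernel(x, h, z):
    """Binomial kernel K_{x,h}(z): pmf of Binomial(x+1, (x+h)/(x+1)) at z."""
    n_trials, p = x + 1, (x + h) / (x + 1)
    if not 0 <= z <= n_trials:
        return 0.0
    return comb(n_trials, z) * p**z * (1 - p)**(n_trials - z)

def estimators(sample, support, h, kernel=binomial_kernel):
    """Return the non-normalized f_tilde of (3), C_n, and the normalized f_hat of (2)."""
    n = len(sample)
    f_tilde = {x: sum(kernel(x, h, xi) for xi in sample) / n for x in support}
    C_n = sum(f_tilde.values())          # total mass, generally != 1
    f_hat = {x: v / C_n for x, v in f_tilde.items()}
    return f_tilde, C_n, f_hat

sample = [0, 1, 1, 2, 3, 3, 3, 5]        # toy count data
support = range(0, 11)                   # finite window standing in for T
f_tilde, C_n, f_hat = estimators(sample, support, h=0.1)
print(round(C_n, 4), round(sum(f_hat.values()), 4))  # f_hat sums to 1 by construction
```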

## 2 Results and illustrations

In order to obtain some soft convergences of the pointwise normalized estimator $\hat f_n$ at $x\in T$, we need quite strong assumptions instead of the most popular ones in (1). In this way, we do not use concentration inequalities as in Kokonendji and Varron (2016) and Abdous and Kokonendji (2009) through, for instance, an inequality of Hoeffding (1963).

The first set of assumptions is uniform in the target $x\in T$ and is satisfied, to our knowledge, by all discrete kernels of Definition 1.1 with $\delta=0$:

$$(A1):\quad x\in S_x,\quad \lim_{n\to\infty}\sup_{x\in T}\big|\mathbb{E}(Z_{x,h_n})-x\big|=0\quad\text{and}\quad \lim_{n\to\infty}\sup_{x\in T}\mathrm{Var}(Z_{x,h_n})=0.$$

Hence, the following proposition provides a key point to establish the next result on the pointwise convergence in probability of $\hat f_n$ defined in (2).

###### Proposition 2.1.

Under Assumptions (A1), the normalizing random variable $C_n$ converges in mean square to 1.
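The bias part of this mean-square convergence, $\mathbb{E}[C_n]\to 1$, can be checked deterministically: $\mathbb{E}[C_n]=\sum_{x\in T}\sum_{z\in T}K_{x,h}(z)f(z)$ falls short of 1 only through the mass that the kernel puts outside $T$, and this leakage vanishes as $h\to 0$. The sketch below uses a second-order kernel in the spirit of the discrete triangular kernels of Kokonendji et al. (2007); the exact parametrization here (weights $(a+1)^h-|z-x|^h$ on $\{x-a,\ldots,x+a\}$) is a simplified illustration under that assumption, not the authors' definition, and the Poisson target $f$ is illustrative.

```python
from math import exp, factorial

def triangular_kernel(x, h, z, a=2):
    # Sketch of a discrete triangular kernel: weights (a+1)^h - |z-x|^h on
    # {x-a,...,x+a}, normalized to sum to 1 over the full support S_x
    # (including negative integers, which lie outside T = N).
    if abs(z - x) > a:
        return 0.0
    norm = sum((a + 1)**h - abs(u)**h for u in range(-a, a + 1))
    return ((a + 1)**h - abs(z - x)**h) / norm

def poisson_pmf(x, lam=3.0):
    return lam**x * exp(-lam) / factorial(x)

def expected_Cn(h, support=range(0, 40)):
    # E[C_n] = sum_{x in T} sum_{z in T} K_{x,h}(z) f(z); the mass leaking
    # to z < 0 (near the boundary x = 0) makes it strictly less than 1.
    return sum(triangular_kernel(x, h, z) * poisson_pmf(z)
               for x in support for z in support)

for h in (0.5, 0.1, 0.01):
    print(h, round(expected_Cn(h), 6))   # increases towards 1 as h decreases
```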

###### Theorem 2.2 (Consistency).

Under (A1) and for any $x\in T$, we have:

$$\hat f_n(x)\xrightarrow[n\to\infty]{\mathbb{P}} f(x),$$

where "$\mathbb{P}$" stands for convergence in probability.

On a finite discrete set $T$, the pointwise and uniform convergences of a sequence of functions are equivalent; hence, the previous results are guaranteed when the discrete associated kernel satisfies the common set of hypotheses (1) with $\delta=0$ (see Section 3 for further details).

###### Corollary 2.3 (Uniform consistency).

Suppose that the set $T$ is finite. Under Conditions (1) with $\delta=0$, one has

$$\sup_{x\in T}\big|\hat f_n(x)-f(x)\big|\xrightarrow[n\to\infty]{\mathbb{P}} 0.$$

Regarding a refined result on the asymptotic normality of $\hat f_n$, it is necessary to quantify the speed of convergence to 0 of $\sup_{x\in T}|\mathbb{E}(Z_{x,h_n})-x|$ and $\sup_{x\in T}\mathrm{Var}(Z_{x,h_n})$ in (A1). We therefore assume that these two sequences satisfy the following second set of conditions:

$$(A2):\quad x\in S_x,\quad \sup_{x\in T}\big|\mathbb{E}(Z_{x,h_n})-x\big|=O(h_n)\quad\text{and}\quad \sup_{x\in T}\mathrm{Var}(Z_{x,h_n})=O(h_n).$$

The previous Assumptions (A1) and also (A2) are verified by all the usual second-order kernels introduced as examples in Section 1.

###### Theorem 2.4 (Asymptotic normality).

Let (A2) be satisfied. If the sequence $(h_n)_{n\ge1}$ is chosen such that $\sqrt{n}\,h_n\to 0$ as $n\to\infty$, then, for any $x\in T$ such that $f(x)\in(0,1)$, the sequence $\sqrt{n}\,\{\hat f_n(x)-f(x)\}$ has a limiting centered normal distribution with variance $f(x)\{1-f(x)\}$.

To conclude this section, we highlight some of our previous results on the recent CoM-Poisson kernel estimator of Huang et al. (2021) and compare it with the classical binomial one. In fact, we consider the refined version of the CoM-Poisson kernel satisfying (A1) and (A2) as follows: for each $x\in T\subseteq\mathbb{N}$ and any $h>0$,

$$K^{\mathrm{CMP}}_{x,h}(z)=\frac{\{\lambda(x,1/h)\}^{z}}{(z!)^{1/h}}\,\{D(\lambda(x,1/h),1/h)\}^{-1},\quad z\in\mathbb{N},$$

where $D(\lambda(x,1/h),1/h)=\sum_{z=0}^{\infty}\{\lambda(x,1/h)\}^{z}/(z!)^{1/h}$ is the normalizing constant and $\lambda(x,1/h)$ represents a function of $x$ and $h$ given by the solution of

$$\sum_{z=0}^{\infty}\frac{\{\lambda(x,1/h)\}^{z}}{(z!)^{1/h}}\,(z-x)=0.\tag{4}$$

This construction implies that $\mathbb{E}(Z^{\mathrm{CMP}}_{x,h})=x$ and

$$\mathrm{Var}(Z^{\mathrm{CMP}}_{x,h})=h\,\{\lambda(x,1/h)\}^{h}+O\big(\{\lambda(x,1/h)\}^{-h}\big)\quad\text{as } h\to 0.\tag{5}$$

Indeed, Huang (2017) proposed the parametrization via the mean of the original CoM-Poisson (Conway-Maxwell-Poisson or CMP) distribution; see, e.g., Shmueli et al. (2005), Kokonendji et al. (2008, Section 4.2), Gaunt et al. (2019) and Toledo et al. (2022, Section 2.2) for more details on the original form, asymptotic properties and the relative dispersion with respect to the standard Poisson model. Also proved in the Appendix, the following proposition points out the mean and the main key of the variance behaviour (5) of this CoM-Poisson kernel, which is of second order and underdispersed for $\nu=1/h>1$.
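The defining equation (4) has no closed form, but since the CMP mean is increasing in $\lambda$ it can be solved numerically for $\lambda(x,1/h)$. A minimal sketch by bisection, assuming a truncation of the infinite series at $z=200$ and illustrative values of $x$ and $h$:

```python
def cmp_mean(lam, nu, zmax=200):
    # Mean of the CMP weights lam^z/(z!)^nu, z = 0..zmax, normalized by
    # D(lam, nu) = sum_z lam^z/(z!)^nu (series truncated at zmax).
    weights, w = [], 1.0                  # z = 0 term: lam^0/(0!)^nu = 1
    for z in range(zmax + 1):
        if z > 0:
            w *= lam / z**nu              # builds lam^z/(z!)^nu recursively
        weights.append(w)
    D = sum(weights)
    return sum(z * w for z, w in enumerate(weights)) / D

def solve_lambda(x, nu, lo=1e-12, hi=1e6):
    # Bisection for lambda solving equation (4), i.e. E(Z^CMP_{x,h}) = x.
    for _ in range(200):
        mid = (lo + hi) / 2
        if cmp_mean(mid, nu) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

h = 0.2                                   # so nu = 1/h = 5
lam = solve_lambda(x=3, nu=1/h)
print(round(cmp_mean(lam, 1/h), 6))       # ≈ 3, the target point x
```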

###### Proposition 2.5.

Let $Y$ be a count random variable following the mean-parametrized CoM-Poisson distribution with location (or mean) parameter $\mu>0$ and dispersion parameter $\nu>0$, such that its pmf is defined by

$$p(y;\mu,\nu):=K^{\mathrm{CMP}}_{\mu,1/\nu}(y),\quad y\in\mathbb{N}.\tag{6}$$

Then $\mathbb{E}(Y)=\mu$ and, when $\nu\to\infty$, the variance of $Y$ verifies

$$\mathrm{Var}(Y)=\frac{1}{\nu}\,[\lambda(\mu,\nu)]^{1/\nu}+O\big(\{\lambda(\mu,\nu)\}^{-1/\nu}\big).$$

As for the binomial kernel, of first order and underdispersed (Kokonendji and Senga Kiessé, 2011), one has, for each $x\in\mathbb{N}$, $h\in(0,1]$ and $z\in\{0,1,\ldots,x+1\}$,

$$K^{B}_{x,h}(z)=\frac{(x+1)!}{z!\,(x+1-z)!}\left(\frac{x+h}{x+1}\right)^{z}\left(\frac{1-h}{x+1}\right)^{x+1-z}$$

with $\mathbb{E}(Z^{B}_{x,h})=x+h\to x$ as $h\to 0$ and

$$\mathrm{Var}(Z^{B}_{x,h})=\frac{(x+h)(1-h)}{x+1}.\tag{7}$$

From Assumptions (1) and through (7), one here has $\lim_{h\to 0}\mathrm{Var}(Z^{B}_{x,h})=x/(x+1)\in(0,1)$, which clearly does not satisfy the last condition of (A1) nor that of (A2). Notice that $K^{B}_{x,h}$ is the pmf of the standard binomial distribution $\mathcal{B}(x+1,p)$ with $x+1$ trials and success probability $p=(x+h)/(x+1)$. Nevertheless, we always use the binomial kernel for smoothing count data of small and moderate sample sizes.

All numerical studies here are performed using the classical binomial and the recent CoM-Poisson kernel smoothers, with the aim of corroborating the previous theoretical results. Computations have been run on a 2.30 GHz PC using the R software (R Core Team, 2021). The two estimators are fitted using the Ake package of Wansouwé et al. (2016) and the mpcmp package of Fung et al. (2020), respectively. We evaluate the performances of these two discrete associated-kernel estimators with cross-validation choices of the optimal bandwidth parameter. In fact, the optimal bandwidth of $\hat f_n$ using the cross-validation method is obtained through

$$h_{cv}=\arg\min_{h>0}\left[\sum_{x\in T}\{\hat f_n(x)\}^{2}-\frac{2}{n}\sum_{i=1}^{n}\hat f_{n,h,-i}(X_i)\right],$$

where
$$\hat f_{n,h,-i}(X_i)=\frac{1}{n-1}\sum_{\ell=1,\,\ell\neq i}^{n}K_{X_i,h}(X_\ell)$$
is computed as $\hat f_n(X_i)$ without the observation $X_i$.

We here consider four scenarios, denoted by A, B, C and D, to simulate count datasets with respect to the support of both discrete kernels. These scenarios have been chosen to evaluate the ability of both smoothers to deal with zero-inflated, unimodal and multimodal distributions. We shall examine the efficiency of both smoothers via the empirical estimates of the normalizing constant $C_n$ and of the integrated squared errors (ISE):

$$\widehat{\mathrm{ISE}}_n:=\frac{1}{N_{\mathrm{sim}}}\sum_{t=1}^{N_{\mathrm{sim}}}\sum_{x\in T}\{\hat f_n(x)-f(x)\}^{2}\quad\text{and}\quad \widehat{C}_n:=\frac{1}{N_{\mathrm{sim}}}\sum_{t=1}^{N_{\mathrm{sim}}}\sum_{x\in T}\tilde f_n(x),$$

where $N_{\mathrm{sim}}$ is the number of replications and $n$ corresponds to the sample size, which shall be small, medium and large.

• Scenario A is generated by using the Poisson distribution
$$f_A(x)=\frac{8^{x}e^{-8}}{x!},\quad x\in\mathbb{N};$$
• Scenario B comes from the zero-inflated Poisson distribution
$$f_B(x)=\frac{7}{10}\,\mathbb{1}_{\{x=0\}}+\frac{3}{10}\cdot\frac{10^{x}e^{-10}}{x!},\quad x\in\mathbb{N};$$
• Scenario C is from a mixture of two Poisson distributions
$$f_C(x)=\frac{2}{5}\cdot\frac{0.5^{x}e^{-0.5}}{x!}+\frac{3}{5}\cdot\frac{8^{x}e^{-8}}{x!},\quad x\in\mathbb{N};$$
• Scenario D comes from a mixture of three Poisson distributions
$$f_D(x)=\frac{3}{5}\cdot\frac{10^{x}e^{-10}}{x!}+\frac{1}{5}\cdot\frac{22^{x}e^{-22}}{x!}+\frac{1}{5}\cdot\frac{50^{x}e^{-50}}{x!},\quad x\in\mathbb{N}.$$
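A reduced version of this simulation design can be sketched as follows, restricted for brevity to Scenario A, the binomial kernel, a fixed bandwidth in place of cross-validation, and a small number of replications (all of these simplifications are ours, so the printed numbers are only indicative of the $\widehat{\mathrm{ISE}}_n$ and $\widehat{C}_n$ computations, not of Table 1):

```python
import random
from math import exp, factorial, comb

def poisson_pmf(x, lam):
    return lam**x * exp(-lam) / factorial(x)

def sample_poisson(lam, rng):
    # Poisson sampling by pmf inversion (capped for safety)
    u, x, cum = rng.random(), 0, poisson_pmf(0, lam)
    while u > cum and x < 200:
        x += 1
        cum += poisson_pmf(x, lam)
    return x

def binomial_kernel(x, h, z):
    n_trials, p = x + 1, (x + h) / (x + 1)
    if not 0 <= z <= n_trials:
        return 0.0
    return comb(n_trials, z) * p**z * (1 - p)**(n_trials - z)

def estimators(sample, support, h):
    n = len(sample)
    f_tilde = {x: sum(binomial_kernel(x, h, xi) for xi in sample) / n for x in support}
    C_n = sum(f_tilde.values())
    return f_tilde, C_n, {x: v / C_n for x, v in f_tilde.items()}

rng = random.Random(42)
support, h, n, n_sim = range(0, 25), 0.1, 100, 50
ise, cn = 0.0, 0.0
for _ in range(n_sim):
    data = [sample_poisson(8.0, rng) for _ in range(n)]   # Scenario A: Poisson(8)
    _, C_n, f_hat = estimators(data, support, h)
    ise += sum((f_hat[x] - poisson_pmf(x, 8.0))**2 for x in support)
    cn += C_n
print(round(ise / n_sim, 5), round(cn / n_sim, 5))  # empirical ISE and C_n estimates
```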

Table 1 reports empirical mean values of $\widehat{\mathrm{ISE}}_n$ and $\widehat{C}_n$ with their standard deviations, using $N_{\mathrm{sim}}$ replications from Scenarios A, B, C and D for the corresponding sample sizes $n$. For each given subsample and discrete associated kernel (CoM-Poisson or binomial), we have to compute the related bandwidth $h_{cv}$ through the cross-validation method before $\tilde f_n$, $C_n$ and, finally, $\hat f_n$. Hence, we observe the following behaviours. Firstly, when the sample size increases, all standard deviations of Table 1 steadily decrease towards 0. The normalizing constant for the CoM-Poisson kernel estimator also becomes closer and closer to 1, while the binomial one moves further away from 1 in absolute value for medium and large sample sizes, in particular for both zero-inflated Scenarios B and C. Next, and as expected, the consistent CoM-Poisson smoother is increasingly accurate as the sample size increases according to the $\widehat{\mathrm{ISE}}_n$ criterion. It is seemingly better than the binomial one, especially for small and moderate sample sizes. With great surprise and satisfaction, the normalized binomial kernel smoother is also asymptotically consistent in practice, similar to the CoM-Poisson one for all the scenarios used. In fact, the normalization of $\tilde f_n$ by $C_n$ to obtain $\hat f_n$ apparently controls the consistency property of $\hat f_n$, even for a discrete first-order associated kernel not verifying (A1). Finally, we can point out the importance, in practice, of normalizing the discrete associated-kernel estimators of pmfs; see, e.g., Wansouwé et al. (2016) and Kokonendji and Somé (2021) for some illustrations in uni- and multivariate cases.

Figure 1 illustrates the empirical distributions of the normalized CoM-Poisson (left) and binomial (right) kernel estimators over replications of Scenario A, for a given sample size and bandwidth. It is remarkable that the normalized CoM-Poisson kernel estimator is more suitable than the one computed from the binomial kernel, for which the bias increases considerably with the sample size. Once again, these figures confirm the pointwise consistency as well as the pointwise asymptotic normality of the normalized CoM-Poisson kernel estimator, unlike the normalized discrete associated-kernel estimator obtained from the binomial kernel, which does not verify our set of hypotheses.

Concerning an application to real data pointing out the very competitive CoM-Poisson kernel, both discrete kernel estimators are finally used to smooth a count dataset on development days of insect pests on Hura trees with a moderate sample size; see Senga Kiessé (2017) and also Huang et al. (2021) for applications using these two discrete associated-kernel estimators, among others. Practical performances are here examined via the cross-validation method and the empirical criterion $\widehat{\mathrm{ISE}}_0:=\sum_{x}\{\hat f_n(x)-f_0(x)\}^{2}$, where $f_0$ is the empirical or naive estimator. The CoM-Poisson kernel appears to be the best, followed by the binomial smoother; see Figure 2 for graphical representations. Notice that, for the same dataset, Senga Kiessé (2017) produced the corresponding values for the non-normalized binomial estimation, while Huang et al. (2021) presented only the non-normalized CoM-Poisson estimation, with precisions to fit a non-zero probability outside of the observed range and also to preserve the sample mean of the dataset.

## 3 Proofs of results

###### Proof of Proposition 2.1.

Firstly, one easily has the following decomposition:

$$\mathbb{E}\big[|C_n-1|^{2}\big]=\mathrm{Var}(C_n)+\big(\mathbb{E}[C_n]-1\big)^{2}.\tag{8}$$

We use Equation (3) and the fact that the $X_i$'s are i.i.d. to obtain

$$\begin{aligned}
\mathrm{Var}(C_n) &= \sum_{x\in T}\sum_{y\in T}\mathrm{Cov}\big(\tilde f_n(x),\tilde f_n(y)\big)\\
&= \frac{1}{n^{2}}\sum_{x\in T}\sum_{y\in T}\sum_{i=1}^{n}\sum_{j=1}^{n}\mathrm{Cov}\big(K_{x,h_n}(X_i),K_{y,h_n}(X_j)\big)\\
&= \frac{1}{n^{2}}\sum_{x\in T}\sum_{y\in T}\sum_{i=1}^{n}\mathrm{Cov}\big(K_{x,h_n}(X_i),K_{y,h_n}(X_i)\big)\\
&= \frac{1}{n}\sum_{x\in T}\sum_{y\in T}\mathrm{Cov}\big(K_{x,h_n}(X_1),K_{y,h_n}(X_1)\big)\\
&= \frac{1}{n}\sum_{x\in T}\mathrm{Var}\big[K_{x,h_n}(X_1)\big]+\frac{1}{n}\sum_{x\in T}\sum_{y\in T\setminus\{x\}}\mathrm{Cov}\big(K_{x,h_n}(X_1),K_{y,h_n}(X_1)\big).
\end{aligned}\tag{9}$$

The bias term in (8) can be explicitly rewritten as:

$$\begin{aligned}
\mathbb{E}[C_n]-1 &= \sum_{x\in T}\mathbb{E}[\tilde f_n(x)]-\sum_{x\in T}f(x)\\
&= \sum_{x\in T}\sum_{z\in T\cap S_x}K_{x,h_n}(z)f(z)-\sum_{x\in T}\sum_{z\in S_x}K_{x,h_n}(z)f(x)\\
&= E_{1,n}-E_{2,n},
\end{aligned}\tag{10}$$

with

$$E_{1,n}=\sum_{x\in T}\sum_{z\in T\cap S_x}\big(f(z)-f(x)\big)K_{x,h_n}(z)$$

and

$$E_{2,n}=\sum_{x\in T}f(x)\sum_{z\in \overline{T}\cap S_x}K_{x,h_n}(z),$$

where $\overline{T}$ denotes the complement of the set $T$.

Secondly, to make the proof more readable, we divide it into two steps, according to the convergences to 0 of the variance and bias terms in (8).

### ⋄ Step 1: Convergence to 0 of the variance term in (8).

Under (A1), one can prove that the first term on the right-hand side of (9) converges to 0. As a matter of fact, observe first that

$$\begin{aligned}
\mathrm{Var}\big[K_{x,h_n}(X_1)\big]-f(x)\{1-f(x)\} &= \Big\{\sum_{z\in T\cap S_x}\big(K_{x,h_n}(z)\big)^{2}f(z)-f(x)\Big\}\\
&\quad+\Big\{f(x)^{2}-\big(\mathbb{E}[\tilde f_n(x)]\big)^{2}\Big\}\\
&=: F_{1,n}(x)+F_{2,n}(x).
\end{aligned}\tag{11}$$

The sets $T$ and $S_x$ are discrete, so one can find a finite constant $\alpha>0$ (which does not depend on $x$) such that $|z-x|\geq\alpha$ for any $z\neq x$ in $S_x$. Hence, the use of Markov's inequality and Assumptions (A1) leads us to deduce that the first sequence in (11) converges uniformly on $T$ to 0, as follows:

$$\begin{aligned}
\sup_{x\in T}|F_{1,n}(x)| &\leq \sup_{x\in T}\Big\{f(x)\big(1-K_{x,h_n}(x)\big)K_{x,h_n}(x)\\
&\qquad+\sum_{z\in T\cap S_x\setminus\{x\}}\big|f(z)K_{x,h_n}(z)-f(x)\big|\,K_{x,h_n}(z)+f(x)\sum_{z\in\overline{T}\cap S_x}K_{x,h_n}(z)\Big\}\\
&\leq 4\sup_{x\in T}\big\{1-K_{x,h_n}(x)\big\}\\
&\leq 4\sup_{x\in T}\mathbb{P}\big(|Z_{x,h_n}-x|\geq\alpha\big)\\
&\leq \frac{4}{\alpha^{2}}\Big\{\sup_{x\in T}\mathrm{Var}(Z_{x,h_n})+\big(\sup_{x\in T}\big|\mathbb{E}[Z_{x,h_n}]-x\big|\big)^{2}\Big\}\to 0,\quad\text{as } n\to\infty.
\end{aligned}$$

Similarly, we show that the second sequence in (11) also converges uniformly on $T$ to 0. Let us be more precise. Note that

$$\begin{aligned}
|F_{2,n}(x)| &\leq 2\big|\mathbb{E}[\tilde f_n(x)]-f(x)\big|\\
&\leq 2\Big\{\sum_{z\in T\cap S_x\setminus\{x\}}K_{x,h_n}(z)\big|f(z)-f(x)\big|+f(x)\sum_{z\in\overline{T}\cap S_x}K_{x,h_n}(z)\Big\}\\
&\leq 6\,\mathbb{P}\big(Z_{x,h_n}\neq x\big),
\end{aligned}$$

where we have used the fact that if $Z_{x,h_n}\neq x$, then necessarily $|Z_{x,h_n}-x|\geq\alpha$. Arguing as before, we obtain the expected uniform convergence of $F_{2,n}$ to 0.

Consequently, from Equation (11), the sequence $\mathrm{Var}[K_{x,h_n}(X_1)]$ converges uniformly on $T$ to $f(x)\{1-f(x)\}$. Finally, there exists $n_0\in\mathbb{N}$ such that for any $n\geq n_0$, we have

$$\frac{1}{n}\sum_{x\in T}\mathrm{Var}\big[K_{x,h_n}(X_1)\big]\leq \frac{2}{n}\sum_{x\in T}f(x)\{1-f(x)\}\leq\frac{2}{n},$$

and the stated convergence is obtained.

We now deal with the second term on the right-hand side of Equation (9). Using the definition of the non-normalized associated-kernel estimator introduced in Equation (3), one can write, for any $x\in T$ and all $y\in T\setminus\{x\}$,

$$\begin{aligned}
\mathrm{Cov}\big(K_{x,h_n}(X_1),K_{y,h_n}(X_1)\big) &= \sum_{z\in T\cap S_x\cap S_y}K_{x,h_n}(z)K_{y,h_n}(z)f(z)\\
&\quad-\Big(\sum_{z\in T\cap S_x}K_{x,h_n}(z)f(z)\Big)\Big(\sum_{z\in T\cap S_y}K_{y,h_n}(z)f(z)\Big)\\
&= \sum_{z\in T\cap S_x\cap S_y}K_{x,h_n}(z)K_{y,h_n}(z)f(z)-\mathbb{E}[\tilde f_n(x)]\,\mathbb{E}[\tilde f_n(y)].
\end{aligned}$$

It then follows that

$$\begin{aligned}
\mathrm{Cov}\big(K_{x,h_n}(X_1),K_{y,h_n}(X_1)\big)+f(x)f(y) &= \sum_{z\in T\cap S_x\cap S_y}K_{x,h_n}(z)K_{y,h_n}(z)f(z)\\
&\quad-\big(\mathbb{E}[\tilde f_n(x)]-f(x)\big)\mathbb{E}[\tilde f_n(y)]\\
&\quad-\big(\mathbb{E}[\tilde f_n(y)]-f(y)\big)f(x).
\end{aligned}$$

Thus, one has

$$\sup_{\substack{(x,y)\in T^{2}\\ x\neq y}}\big|\mathrm{Cov}\big(K_{x,h_n}(X_1),K_{y,h_n}(X_1)\big)+f(x)f(y)\big|\leq G_{1,n}+2\,G_{2,n},$$

with

$$G_{1,n}=\sup_{\substack{(x,y)\in T^{2}\\ x\neq y}}\sum_{z\in T\cap S_x\cap S_y}K_{x,h_n}(z)K_{y,h_n}(z)f(z)\quad\text{and}\quad G_{2,n}=\sup_{x\in T}\big|\mathbb{E}[\tilde f_n(x)]-f(x)\big|.$$

Following the same arguments used to prove the convergence of $F_{2,n}$ to 0, we show that $G_{2,n}$ converges to 0. As for $G_{1,n}$, observe that

$$\begin{aligned}
\sum_{z\in T\cap S_x\cap S_y}K_{x,h_n}(z)K_{y,h_n}(z)f(z) &= K_{x,h_n}(x)K_{y,h_n}(x)f(x)+\sum_{\substack{z\in T\cap S_x\cap S_y\\ z\neq x}}K_{x,h_n}(z)K_{y,h_n}(z)f(z)\\
&\leq \sum_{z\in S_y\setminus\{y\}}K_{y,h_n}(z)+\sum_{z\in S_x\setminus\{x\}}K_{x,h_n}(z)\\
&= \mathbb{P}(Z_{y,h_n}\neq y)+\mathbb{P}(Z_{x,h_n}\neq x).
\end{aligned}$$

Hence, the same lines of proof given before can be reproduced to show that $G_{1,n}$ converges to 0. The sequence $\mathrm{Cov}(K_{x,h_n}(X_1),K_{y,h_n}(X_1))$ is therefore uniformly convergent on the set $\{(x,y)\in T^{2}:x\neq y\}$ with limit $-f(x)f(y)$. It can thus be easily demonstrated that there exists a positive integer $n_0$ such that for any $n\geq n_0$, one has

$$\frac{1}{n}\sum_{x\in T}\sum_{y\in T\setminus\{x\}}\mathrm{Cov}\big(K_{x,h_n}(X_1),K_{y,h_n}(X_1)\big)\leq \frac{1}{n}\sum_{x\in T}f(x)\sum_{y\in T\setminus\{x\}}f(y)\leq\frac{1}{n}.$$

This completes the proof of the convergence to 0 of the second term on the right-hand side of (9) and, finally, the proof of the convergence to 0 of the variance term in (8).

### ⋄ Step 2: Convergence to 0 of the bias term in (8).

The sequence $E_{2,n}$ introduced in Equation (10) clearly converges to 0 since

$$E_{2,n}\leq \sup_{x\in T}\mathbb{P}\big(Z_{x,h_n}\neq x\big)\to 0,\quad\text{as } n\to\infty.$$

We now use Equation (10) and the same arguments developed in the previous step to obtain

$$\begin{aligned}
|E_{1,n}| &\leq \sum_{x\in T}\Big|\Big(\sum_{z\in T\cap S_x}f(z)K_{x,h_n}(z)\Big)-f(x)+f(x)\{1-K_{x,h_n}(x)\}\Big|\\
&\quad+\sum_{x\in T}f(x)\sum_{z\in T\cap S_x\setminus\{x\}}f(z)K_{x,h_n}(z)\\
&\leq \sum_{x\in T}\big|\mathbb{E}[\tilde f_n(x)-f(x)]\big|+2\sup_{x\in T}\mathbb{P}\big(Z_{x,h_n}\neq x\big).
\end{aligned}$$

Consequently, the bias term in (8) converges to 0. This concludes the proof of the proposition. ∎

###### Proof of Theorem 2.2.

Note that, for any $x\in T$, one can express

$$\hat f_n(x)-f(x)=\frac{1}{C_n}\Big\{\big(\tilde f_n(x)-f(x)\big)+(1-C_n)f(x)\Big\}.$$

Theorem 1.2 of Abdous and Kokonendji (2009) recalls that $\tilde f_n(x)$ converges in mean square to $f(x)$; such a result obviously remains valid in our context. Proposition 2.1 and Slutsky's theorem complete the proof. ∎

###### Proof of Corollary 2.3.

It is enough to observe that

$$\begin{aligned}
\sup_{x\in T}\big|\hat f_n(x)-f(x)\big| &= \frac{1}{C_n}\sup_{x\in T}\big|\tilde f_n(x)-f(x)+f(x)(1-C_n)\big|\\
&\leq \frac{1}{C_n}\Big\{|1-C_n|+\sup_{x\in T}\big|\tilde f_n(x)-f(x)\big|\Big\}.
\end{aligned}$$

Consequently, Proposition 2.1, Theorem 2.2 and the continuous mapping theorem allow us to deduce the corollary. ∎

###### Proof of Theorem 2.4.

From the end of Theorem 1.2, one may first observe that

$$\begin{aligned}
\sqrt{n}\big(\hat f_n(x)-f(x)\big) &= \frac{1}{C_n}\left(\frac{\tilde f_n(x)-\mathbb{E}[\tilde f_n(x)]}{\sqrt{\mathrm{Var}\{\tilde f_n(x)\}}}\right)\sqrt{n\,\mathrm{Var}\{\tilde f_n(x)\}}+\frac{\sqrt{n}}{C_n}\big(\mathbb{E}[\tilde f_n(x)]-C_n f(x)\big)\\
&= \frac{1}{C_n}\left(\frac{\tilde f_n(x)-\mathbb{E}[\tilde f_n(x)]}{\sqrt{\mathrm{Var}\{\tilde f_n(x)\}}}\right)\sqrt{n\,\mathrm{Var}\{\tilde f_n(x)\}}+\frac{\sqrt{n}}{C_n}\big(\mathbb{E}[\tilde f_n(x)]-f(x)\big)+\frac{\sqrt{n}}{C_n}(1-C_n)f(x).
\end{aligned}\tag{12}$$

Let $(Y_{n,i})_{1\leq i\leq n}$ be the rowwise i.i.d. triangular array defined by

$$Y_{n,i}=\frac{K_{x,h_n}(X_i)-\mathbb{E}[K_{x,h_n}(X_i)]}{\sqrt{n\,\mathrm{Var}(K_{x,h_n}(X_i))}},\quad i=1,\ldots,n.$$

It is clear that, for all $n\geq 1$ and any $i\in\{1,\ldots,n\}$,

$$\mathbb{E}[Y_{n,i}]=0\quad\text{and}\quad \sum_{i=1}^{n}\mathbb{E}\big[Y_{n,i}^{2}\big]=1.$$