1 Introduction
The modern notion of a discrete associated kernel for smoothing or estimating a discrete function defined on a discrete set requires the development of new convergence properties for the corresponding estimator. The support of the function is not subject to any restrictive condition; it can be bounded or unbounded, finite or infinite. In this sense, Abdous and Kokonendji (2009) presented some asymptotic properties for non-normalized discrete associated-kernel estimators of a probability mass function (pmf). Several authors have pointed out the use of discrete associated kernels built from Dirac and discrete triangular kernels (Kokonendji et al., 2007; Kokonendji and Zocchi, 2010), and also from the extensions of Dirac kernels proposed by Aitchison and Aitken (1976) for categorical data and by Wang and Van Ryzin (1981). Furthermore, there are count kernels such as the binomial (Kokonendji and Senga Kiessé, 2011) and, recently, the CoM-Poisson (Huang et al., 2021) kernels, which are both underdispersed (i.e., variance less than mean). See also Harfouche et al. (2018) and Senga Kiessé (2017) for other properties. Notice that one can use them to estimate, instead of the pmf, discrete regression or weighted functions; see, e.g., Kokonendji and Somé (2021), Senga Kiessé and Cuny (2014) and Senga Kiessé and Ventura (2016).

Let us first fix the refined definition of a discrete associated kernel from Kokonendji and Somé (2018) and state in Theorem 1.2 some important asymptotic properties to be completed in this paper.
Definition 1.1.
Let $\mathbb{S}$ be the discrete support of the pmf $f$ to be estimated, $x \in \mathbb{S}$ a target point and $h > 0$ a bandwidth. A parameterized pmf $K_{x,h}$ on a discrete support $\mathbb{S}_x$ containing $x$ is called a "discrete associated kernel" if the following conditions are satisfied:

(1) $\lim_{h \to 0} \mathbb{E}\left(\mathcal{K}_{x,h}\right) = x \quad \text{and} \quad \lim_{h \to 0} \operatorname{Var}\left(\mathcal{K}_{x,h}\right) = v(x) \geq 0,$

where $\mathcal{K}_{x,h}$ denotes the discrete random variable with pmf $K_{x,h}$.

The choice of a discrete associated kernel referred to as of "second order", satisfying $v(x) = 0$ in (1), ensures the convergence of its corresponding estimator; an elementary example is the naive or Dirac kernel for smoothing a very large sample of discrete data. Otherwise, the convergence of the corresponding estimator is not guaranteed; that is, for $v(x) > 0$ in Definition 1.1, the discrete associated kernel is said to be of "first order", like the well-known binomial kernel.
Let $X_1, \ldots, X_n$ be a sample of independent and identically distributed (i.i.d.) discrete random variables having a pmf $f$ on $\mathbb{S}$. In general, the basic estimator $\widehat{f}_n$ of $f$ is not a pmf. Indeed, for some discrete associated kernels (e.g., binomial, triangular and CoM-Poisson), the total mass of the corresponding estimator is not equal to 1. This limitation is explained by the fact that the normalizing variable $C_n$ (which is equal to the sum, over all the targets belonging to $\mathbb{S}$, of the discrete associated-kernel estimator) was assumed to be equal to 1 only to simplify the calculations. More precisely, one can write both estimators as:

(2) $\widetilde{f}_n(x) = \dfrac{\widehat{f}_n(x)}{C_n}, \quad x \in \mathbb{S},$

with

(3) $\widehat{f}_n(x) = \dfrac{1}{n} \sum_{i=1}^{n} K_{x,h}(X_i) \quad \text{and} \quad C_n = \sum_{x \in \mathbb{S}} \widehat{f}_n(x),$

where $h = h_n$ is an arbitrary sequence of positive smoothing parameters that satisfies $h_n \to 0$ as $n \to \infty$, while $K_{x,h}$ is a suitably chosen discrete associated kernel. If $C_n = 1$, as for the kernels of Dirac, Aitchison and Aitken (1976) and Wang and Van Ryzin (1981), one obviously has $\widetilde{f}_n = \widehat{f}_n$. Hence:
Theorem 1.2.
(Abdous and Kokonendji, 2009) For any $x$ in the support and under Assumptions (1) of the second order (i.e., with vanishing variance limit), one has

$\widehat{f}_n(x) \longrightarrow f(x) \quad \text{as } n \to \infty,$

where "$\longrightarrow$" stands for both "mean square and almost sure convergences". Furthermore, if $f(x) > 0$ then

$\dfrac{\widehat{f}_n(x) - \mathbb{E}\,\widehat{f}_n(x)}{\sqrt{\operatorname{Var}\,\widehat{f}_n(x)}} \stackrel{d}{\longrightarrow} \mathcal{N}(0,1),$

where "$\stackrel{d}{\longrightarrow}$" stands for "convergence in distribution" and $\mathcal{N}(0,1)$ denotes the standard normal distribution.
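To fix ideas, the pair of estimators (2) and (3) can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: it assumes the binomial associated kernel recalled later in Section 2 (a Binomial(x+1, (x+h)/(x+1)) pmf), a Poisson sample, and an arbitrary truncation of the summation support for computing the normalizing variable.

```python
import numpy as np
from scipy.stats import binom

def binom_kernel(x, y, h):
    """Binomial associated kernel: Binomial(x+1, (x+h)/(x+1)) pmf evaluated at y."""
    return binom.pmf(y, x + 1, (x + h) / (x + 1.0))

def f_hat(x, sample, h):
    """Non-normalized estimator (3): average of the kernel over the sample."""
    return np.mean(binom_kernel(x, sample, h))

def f_tilde(sample, h, support):
    """Normalized estimator (2): f_hat divided by C_n (support truncated for computation)."""
    vals = np.array([f_hat(x, sample, h) for x in support])
    C_n = vals.sum()                       # normalizing random variable
    return vals / C_n, C_n

rng = np.random.default_rng(0)
sample = rng.poisson(4.0, size=200)
support = np.arange(sample.max() + 10)     # truncated stand-in for the full support
pmf_est, C_n = f_tilde(sample, 0.1, support)
```

By construction `pmf_est` sums to one, whereas `C_n` itself is generally different from 1, which is the point of the normalization studied below.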
In this paper we mainly extend Theorem 1.2 from the non-normalized estimator $\widehat{f}_n$ of (3) to the normalized one $\widetilde{f}_n$ of (2), introducing new and non-restrictive assumptions with uniformity in the target point $x$ in the limits of (1) and, therefore, changing the types of convergence. More importantly, we shall demonstrate the convergence in mean square of the positive normalizing random variable $C_n$ of (3) to 1, which clearly completes the similar result in Kokonendji and Varron (2016, Theorem 2.1). The following Section 2 states the different assumptions and their corresponding results, with illustrations on the recent CoM-Poisson kernel estimator; the case of the first-order binomial kernel is briefly discussed. Finally, Section 3 is devoted to the detailed proofs.
2 Results and illustrations
In order to obtain soft convergence results for the pointwise normalized estimator $\widetilde{f}_n$ at $x$, we need somewhat stronger assumptions than the most popular ones (1). In this way, we do not use concentration inequalities, as done in Kokonendji and Varron (2016) as well as Abdous and Kokonendji (2009) through, for instance, an inequality of Hoeffding (1963).

The first set of assumptions, denoted (A1) in the sequel, is uniform in the target $x$ and is satisfied, to our knowledge, by all discrete kernels of Definition 1.1 with vanishing variance limit:

Hence, the following proposition provides a key point to establish the next result on the pointwise probability convergence of $\widetilde{f}_n$ defined in (2).
Proposition 2.1.
Under Assumptions (A1), the normalizing random variable $C_n$ converges in mean square to 1.
Theorem 2.2 (Consistency).
Under (A1) and for any $x$ in the support, we have:

$\widetilde{f}_n(x) \stackrel{\mathbb{P}}{\longrightarrow} f(x) \quad \text{as } n \to \infty,$

where "$\stackrel{\mathbb{P}}{\longrightarrow}$" stands for convergence in probability.
On a finite discrete set, the pointwise and uniform convergences of a sequence of functions are equivalent; hence, the previous results are guaranteed when the discrete associated kernel satisfies the common set of hypotheses (1) with vanishing variance limit (see Section 3 for further details).
Corollary 2.3 (Uniform consistency).
Suppose that the discrete support is finite. Under Conditions (1) with vanishing variance limit, one has

$\sup_{x} \bigl| \widetilde{f}_n(x) - f(x) \bigr| \stackrel{\mathbb{P}}{\longrightarrow} 0 \quad \text{as } n \to \infty.$
Regarding a refined result on the asymptotic normality of $\widetilde{f}_n(x)$, it is necessary to quantify the speed of convergence to 0 of both limit quantities in (1), in terms of $h$ and $n$. We therefore assume that these two sequences satisfy the following second set of conditions (A2):
The previous Assumptions (A1) and also (A2) are verified by all the usual second-order kernels introduced as examples in Section 1.
Theorem 2.4 (Asymptotic normality).
Let (A2) be satisfied. If the bandwidth sequence $(h_n)$ is chosen to vanish fast enough as $n \to \infty$, then, for any $x$ such that $f(x) > 0$, the sequence $\sqrt{n}\,\bigl(\widetilde{f}_n(x) - f(x)\bigr)$ has a limiting centered normal distribution with variance depending on $f(x)$.
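A quick Monte-Carlo illustration of this kind of limit theorem can be given in the special case of the naive (Dirac) kernel, for which the normalized estimator reduces to the empirical pmf and the classical central limit theorem yields a centered normal limit with variance $f(x)(1-f(x))$ in that special case; the Poisson target, the point $x = 3$, and the sample sizes below are arbitrary choices for the sketch.

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(5)
x, lam, n, reps = 3, 4.0, 400, 2000
fx = exp(-lam) * lam ** x / factorial(x)   # true Poisson(4) mass at x = 3

z = np.empty(reps)
for r in range(reps):
    sample = rng.poisson(lam, size=n)
    f_emp = np.mean(sample == x)           # Dirac-kernel (empirical) estimator at x
    z[r] = np.sqrt(n) * (f_emp - fx) / np.sqrt(fx * (1.0 - fx))
# z should now look approximately standard normal
```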
To conclude this section, we highlight some of our previous results on the recent CoM-Poisson kernel estimator of Huang et al. (2021) and compare it with the classical binomial one. In fact, we consider the refined version of the CoM-Poisson kernel satisfying (A1) and (A2) as follows: for each $x \in \mathbb{N}$ and any $h > 0$,

$K_{x,h}(y) = \dfrac{\{\lambda(x,h)\}^{y}}{(y!)^{1/h}\, Z(\lambda(x,h), 1/h)}, \quad y \in \mathbb{N},$

where $Z(\lambda, \nu) = \sum_{j \geq 0} \lambda^{j} / (j!)^{\nu}$ is the normalizing constant and $\lambda(x,h)$ represents a function of $x$ and $h$ given by the solution of

(4) $\sum_{j \geq 0} (j - x)\, \dfrac{\lambda^{j}}{(j!)^{1/h}} = 0.$

This construction implies that $\mathbb{E}(\mathcal{K}_{x,h}) = x$ and

(5) $\operatorname{Var}(\mathcal{K}_{x,h}) \longrightarrow 0 \quad \text{as } h \to 0.$

Indeed, Huang (2017) proposed the parametrization via the mean of the original CoM-Poisson (Conway-Maxwell-Poisson or CMP) distribution; see, e.g., Shmueli et al. (2005), Kokonendji et al. (2008, Section 4.2), Gaunt et al. (2019) and Toledo et al. (2022, Section 2.2) for more details on the original form, asymptotic properties and the relative dispersion with respect to the standard Poisson model. As demonstrated in the Appendix, the following proposition points out the mean and the main key of the variance behaviour (5) of this CoM-Poisson kernel, which is of the second order and underdispersed for $\nu = 1/h > 1$.
Proposition 2.5.
Let $Y$ be a count random variable following the mean-parametrized CoM-Poisson distribution with location (or mean) parameter $\mu > 0$ and dispersion parameter $\nu > 0$, such that its pmf is defined by

(6) $\Pr(Y = y) = \dfrac{\{\lambda(\mu,\nu)\}^{y}}{(y!)^{\nu}\, Z(\lambda(\mu,\nu), \nu)}, \quad y \in \mathbb{N},$

where $Z(\lambda, \nu) = \sum_{j \geq 0} \lambda^{j}/(j!)^{\nu}$ and $\lambda(\mu,\nu)$ solves $\sum_{j \geq 0} (j - \mu)\, \lambda^{j}/(j!)^{\nu} = 0$. Then $\mathbb{E}(Y) = \mu$ and, when $\nu \to \infty$ as $h \to 0$, the variance of $Y$ verifies the second-order behaviour (5).
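Proposition 2.5 can be checked numerically. The sketch below evaluates a mean-parametrized CMP pmf by truncating the series and solves the mean constraint with a root finder; the truncation point `j_max`, the bracket passed to `brentq`, and the example values mu = 5, nu = 2 are illustrative assumptions, not quantities from the paper.

```python
import numpy as np
from scipy.optimize import brentq

def cmp_pmf(lam, nu, j_max=200):
    """Truncated pmf of CMP(lam, nu): p(j) proportional to lam^j / (j!)^nu."""
    log_fact = np.concatenate(([0.0], np.cumsum(np.log(np.arange(1, j_max + 1)))))
    log_w = np.arange(j_max + 1) * np.log(lam) - nu * log_fact
    w = np.exp(log_w - log_w.max())        # stabilized against overflow
    return w / w.sum()

def solve_lambda(mu, nu):
    """Solve the mean constraint: pick lam so that E[CMP(lam, nu)] = mu."""
    def mean_gap(lam):
        p = cmp_pmf(lam, nu)
        return np.arange(len(p)) @ p - mu
    return brentq(mean_gap, 1e-8, 1e4)

mu, nu = 5.0, 2.0
p = cmp_pmf(solve_lambda(mu, nu), nu)
j = np.arange(len(p))
mean, var = j @ p, j ** 2 @ p - (j @ p) ** 2
```

For nu > 1 the fitted distribution is underdispersed: the computed variance falls below the mean, which is the key behaviour behind (5).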
As for the binomial kernel, which is of first order and underdispersed (Kokonendji and Senga Kiessé, 2011), one has, for each $x \in \mathbb{N}$ and $h \in (0, 1]$:

$K_{x,h}(y) = \dbinom{x+1}{y} \left(\dfrac{x+h}{x+1}\right)^{y} \left(\dfrac{1-h}{x+1}\right)^{x+1-y}, \quad y \in \{0, 1, \ldots, x+1\},$

with $\mathbb{E}(\mathcal{K}_{x,h}) = x + h \to x$ as $h \to 0$ and

(7) $\operatorname{Var}(\mathcal{K}_{x,h}) = \dfrac{(x+h)(1-h)}{x+1} \longrightarrow \dfrac{x}{x+1} \quad \text{as } h \to 0.$
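As a numerical sanity check, the binomial kernel's moments can be evaluated directly from the Binomial(x+1, (x+h)/(x+1)) distribution of Kokonendji and Senga Kiessé (2011); the target x = 6 and the tiny bandwidth are arbitrary choices.

```python
from scipy.stats import binom

def binom_kernel_moments(x, h):
    """Mean and variance of the binomial kernel Binomial(x+1, (x+h)/(x+1))."""
    size, p = x + 1, (x + h) / (x + 1.0)
    return binom.mean(size, p), binom.var(size, p)

x, h = 6, 1e-6
m, v = binom_kernel_moments(x, h)
# m = x + h, while v stays near x/(x+1): the variance limit is not 0,
# which is exactly why the binomial kernel is only of first order
```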
From Assumptions (1) and through (7), one here has a nonzero variance limit $x/(x+1)$ for all $x \geq 1$, which does not satisfy the last condition of (A1), nor the conditions (A2). Notice that $K_{x,h}$ is the pmf of the standard binomial distribution with size $x + 1$ and success probability $(x+h)/(x+1)$. Nevertheless, we always use the binomial kernel for smoothing count data of small and moderate sample sizes.

All numerical studies are here performed using the classical binomial and the recent CoM-Poisson kernel smoothers, with the aim to corroborate the previous theoretical results. Computations have been run on a 2.30 GHz PC using the R software (R Core Team, 2021). The two estimators are fitted using the Ake package of Wansouwé et al. (2016) and the mpcmp package of Fung et al. (2020), respectively. We evaluate the performances of these two discrete associated-kernel estimators with cross-validation choices of the optimal bandwidth parameter. In fact, the optimal bandwidth using the cross-validation method is obtained through
$h_{cv} = \arg\min_{h > 0} \left\{ \sum_{x} \widehat{f}_n^{\,2}(x) - \dfrac{2}{n} \sum_{i=1}^{n} \widehat{f}_{n,-i}(X_i) \right\},$

where $\widehat{f}_{n,-i}$ is computed as $\widehat{f}_n$ without the observation $X_i$.
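A hedged sketch of this bandwidth selection is given below, assuming the binomial kernel and the least-squares form of the criterion, CV(h) = sum_x f-hat^2(x) - (2/n) sum_i f-hat_{-i}(X_i), which is a standard choice in this literature; the simulated data, the support truncation and the search grid are illustrative.

```python
import numpy as np
from scipy.stats import binom

def kernel_matrix(support, sample, h):
    """K[j, i]: binomial kernel at target support[j] evaluated at observation sample[i]."""
    S = np.asarray(support)[:, None]
    X = np.asarray(sample)[None, :]
    return binom.pmf(X, S + 1, (S + h) / (S + 1.0))

def cv_score(h, sample, support):
    """Least-squares cross-validation criterion CV(h)."""
    n = len(sample)
    K = kernel_matrix(support, sample, h)
    f_hat = K.mean(axis=1)                  # non-normalized estimator on the support
    # leave-one-out values f_hat_{-i}(X_i); support is arange, so row index = value
    rows = K[np.asarray(sample), :]
    loo = (rows.sum(axis=1) - np.diag(rows)) / (n - 1)
    return np.sum(f_hat ** 2) - 2.0 * loo.mean()

rng = np.random.default_rng(3)
sample = rng.poisson(4.0, size=120)
support = np.arange(sample.max() + 10)
grid = np.linspace(0.05, 0.95, 19)
h_cv = grid[int(np.argmin([cv_score(h, sample, support) for h in grid]))]
```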
We here consider four scenarios, denoted A, B, C and D, to simulate count datasets with respect to the support of both discrete kernels. These scenarios have been chosen to evaluate the abilities of both smoothers to deal with zero-inflated, unimodal and multimodal distributions. We shall examine the efficiency of both smoothers via the empirical mean estimates $\bar{C}_n$ of $C_n$ and $\overline{\mathrm{ISE}}$ of the integrated squared error $\mathrm{ISE} = \sum_{x} \{\widetilde{f}_n(x) - f(x)\}^{2}$, averaged over the replications, where $N_{\mathrm{sim}}$ denotes the number of replications and $n$ corresponds to the sample size, which shall be small, medium and large.

Scenario A is generated by using the Poisson distribution.
Scenario B comes from the zero-inflated Poisson distribution.
Scenario C is from a mixture of two Poisson distributions.
Scenario D comes from a mixture of three Poisson distributions.
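A reduced version of such a simulation study can be sketched as follows, assuming the binomial kernel, a Scenario-A-style Poisson(4) target, a fixed bandwidth instead of cross-validation (for speed), and 100 replications; all of these choices are illustrative, not the paper's exact settings.

```python
import numpy as np
from scipy.stats import binom, poisson

def f_tilde_on_support(sample, h, support):
    """Normalized binomial-kernel estimator on a truncated support; also returns C_n."""
    S = np.asarray(support)[:, None]
    K = binom.pmf(sample[None, :], S + 1, (S + h) / (S + 1.0))
    f_hat = K.mean(axis=1)
    C_n = f_hat.sum()
    return f_hat / C_n, C_n

rng = np.random.default_rng(4)
support = np.arange(30)
truth = poisson.pmf(support, 4.0)           # target pmf, as in Scenario A
C_vals, ise_vals = [], []
for _ in range(100):                        # replications
    sample = rng.poisson(4.0, size=100)
    f_t, C_n = f_tilde_on_support(sample, 0.1, support)
    C_vals.append(C_n)
    ise_vals.append(np.sum((f_t - truth) ** 2))
C_bar, ise_bar = np.mean(C_vals), np.mean(ise_vals)
```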
Scenario   n     C_n (binomial)      C_n (CoM-Poisson)   ISE (binomial)      ISE (CoM-Poisson)
A          10    0.98690 (0.01035)   0.91187 (0.05425)   0.02961 (0.02706)   0.01466 (0.01863)
A          25    0.99321 (0.00452)   0.96705 (0.02471)   0.01004 (0.00599)   0.00861 (0.00999)
A          50    0.99460 (0.00259)   0.98634 (0.01233)   0.00566 (0.00344)   0.00557 (0.00425)
A          100   0.99570 (0.00147)   0.99525 (0.00307)   0.00271 (0.00204)   0.00291 (0.00268)
A          250   0.99685 (0.00076)   0.99973 (0.00104)   0.00076 (0.00057)   0.00131 (0.00105)
A          500   0.99703 (0.00043)   1.00015 (0.00081)   0.00017 (0.00021)   0.00042 (0.00047)
B          10    0.98663 (0.03655)   0.94781 (0.05083)   0.03326 (0.02456)   0.02232 (0.01877)
B          25    0.99788 (0.02049)   0.98053 (0.03101)   0.01392 (0.00871)   0.01054 (0.01033)
B          50    1.00869 (0.00826)   1.00095 (0.00739)   0.00696 (0.00330)   0.00573 (0.00350)
B          100   1.01265 (0.00557)   0.99951 (0.00140)   0.00352 (0.00204)   0.00343 (0.00227)
B          250   1.01272 (0.00480)   0.99921 (0.00111)   0.00055 (0.00034)   0.00107 (0.00076)
B          500   1.01460 (0.00238)   0.99969 (0.00059)   0.00051 (0.00033)   0.00072 (0.00055)
C          10    0.91176 (0.07860)   1.01341 (0.03870)   0.03842 (0.02421)   0.02711 (0.02782)
C          25    0.94838 (0.05030)   1.03786 (0.02800)   0.01175 (0.00786)   0.01021 (0.00874)
C          50    0.98508 (0.02758)   1.03479 (0.01489)   0.00499 (0.00297)   0.00520 (0.00451)
C          100   1.00242 (0.01048)   1.02441 (0.01134)   0.00273 (0.00159)   0.00336 (0.00265)
C          250   1.04017 (0.01055)   1.01232 (0.01244)   0.00053 (0.00047)   0.00078 (0.00061)
C          500   1.03560 (0.00892)   1.00365 (0.00361)   0.00080 (0.00049)   0.00051 (0.00031)
D          10    0.99122 (0.00118)   0.95058 (0.01701)   0.02489 (0.00987)   0.01315 (0.02328)
D          25    0.99556 (0.00171)   0.97465 (0.01179)   0.00955 (0.00376)   0.00533 (0.00627)
D          50    0.99770 (0.00058)   0.99276 (0.00479)   0.00296 (0.00146)   0.00294 (0.00256)
D          100   0.99839 (0.00045)   0.99711 (0.00209)   0.00098 (0.00044)   0.00125 (0.00089)
D          250   0.99889 (0.00018)   0.99919 (0.00080)   0.00022 (0.00012)   0.00043 (0.00038)
D          500   1.01080 (0.00015)   1.00061 (0.00031)   0.00020 (0.00010)   0.00042 (0.00024)
Table 1: Empirical mean values of $\bar{C}_n$ and $\overline{\mathrm{ISE}}$, with their standard deviations in parentheses, over replications and with different sample sizes under the four Scenarios A, B, C and D, by using the CoM-Poisson and binomial kernels with cross-validated bandwidth selection.

Table 1 reports some empirical mean values of $\bar{C}_n$ and $\overline{\mathrm{ISE}}$, with their standard deviations, using replications from Scenarios A, B, C and D for the corresponding sample sizes $n \in \{10, 25, 50, 100, 250, 500\}$. For each given subsample and discrete associated kernel, CoM-Poisson or binomial, we have to compute the related bandwidth through the cross-validation method before $\widehat{f}_n$, $C_n$ and, finally, $\widetilde{f}_n$. Hence, we observe the following behaviours. Firstly, when the sample size increases, all standard deviations in Table 1 steadily decrease towards 0. The normalizing constant for the CoM-Poisson kernel estimator also becomes more and more precise around 1 in absolute value, while the binomial one moves further away from 1 in absolute value for medium and large sample sizes, in particular for both zero-inflated Scenarios B and C. Next and as expected, the consistent CoM-Poisson smoother is increasingly accurate as the sample size increases, according to the $\overline{\mathrm{ISE}}$ criterion. It is seemingly better than the binomial one, especially for small and moderate sample sizes. With great surprise and satisfaction, the normalized binomial kernel smoother is also asymptotically consistent in practice, similarly to the CoM-Poisson one, for all the Scenarios used. In fact, this normalizing process of $\widehat{f}_n$ by $C_n$ for obtaining $\widetilde{f}_n$ apparently controls the consistency property of $\widetilde{f}_n$, even for a discrete first-order associated kernel not verifying (A1). Finally, we can point out the practical importance of the normalization of discrete associated-kernel estimators of pmfs; see, e.g., Wansouwé et al. (2016) and Kokonendji and Somé (2021) for some illustrations in univariate and multivariate cases.
Figure 1 illustrates both empirical distributions of (left) and (right) over replications of Scenario A, with the corresponding sample size and bandwidth. It is remarkable that the normalized CoM-Poisson kernel estimator is more suitable than the one computed from a binomial kernel, for which the bias increases considerably with the sample size. Once again, these figures confirm the pointwise consistency, as well as the pointwise asymptotic normality, of the normalized CoM-Poisson kernel estimator, unlike the normalized estimator obtained from the binomial kernel, which does not verify our set of hypotheses.
Concerning an application to real data pointing out the very competitive CoM-Poisson kernel, both discrete kernel estimators are finally used to smooth a count dataset on development days of insect pests on Hura trees, with a moderate sample size; see Senga Kiessé (2017) and also Huang et al. (2021) for applications using these two discrete associated-kernel estimators, among others. Practical performances are here examined via the cross-validation method and the empirical criterion $\mathrm{ISE}_0 = \sum_{x} \{\widetilde{f}_n(x) - f_0(x)\}^{2}$, where $f_0$ is the empirical or naive estimator. The CoM-Poisson kernel appears to be the best, followed by the binomial smoother; see Figure 2 for graphical representations. Notice that, for the same dataset, Senga Kiessé (2017) produced the corresponding values for the non-normalized binomial estimation, while Huang et al. (2021) only presented the non-normalized CoM-Poisson estimation without this criterion, with the precisions that it fits a nonzero probability outside of the observed range and also preserves the sample mean of the dataset.
3 Proofs of results
Proof of Proposition 2.1.
Firstly, one easily has the following decomposition:

(8) $\mathbb{E}\bigl[(C_n - 1)^{2}\bigr] = \operatorname{Var}(C_n) + \bigl\{\mathbb{E}(C_n) - 1\bigr\}^{2}.$
We use Equation (3) and the fact that the $X_i$'s are i.i.d. to obtain

(9) $\operatorname{Var}(C_n) = \dfrac{1}{n}\, \mathbb{E}\Bigl[\Bigl( \sum_{x \in \mathbb{S}} K_{x,h}(X_1) \Bigr)^{2}\Bigr] - \dfrac{1}{n} \Bigl( \mathbb{E}\Bigl[ \sum_{x \in \mathbb{S}} K_{x,h}(X_1) \Bigr] \Bigr)^{2}.$
The bias term in (8) can be explicitly rewritten as:
(10) 
with
and
where is the set .
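The decomposition (8) of the mean squared error of $C_n$ into a variance term and a squared bias term is an exact algebraic identity, which a small numeric check makes transparent; the replicated values below are synthetic stand-ins, not output of the estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
# synthetic stand-in for replicated values of the normalizing variable C_n
C = 0.99 + 0.05 * rng.standard_normal(1000)

mse = np.mean((C - 1.0) ** 2)              # E[(C_n - 1)^2], empirically
variance = np.var(C)                       # variance term (ddof = 0)
bias_sq = (np.mean(C) - 1.0) ** 2          # squared bias term
# mse equals variance + bias_sq exactly, up to floating-point error
```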
Secondly, to make the proof more readable, we divide it into two steps, according to the convergences to 0 of both the variance and bias terms in (8).
Step 1: Convergence to 0 of the variance term in (8).
Under (A1), one can prove that the first term on the right-hand side of (3) converges to 0. As a matter of fact, observe first that
(11) 
The sets and are discrete. So, one can find a finite constant (which does not depend on ) such that for any in . Hence, the use of Markov's inequality and Assumptions (A1) leads us to deduce that the first sequence in (3) converges uniformly on to 0 as follows:
Similarly, we show that the second sequence in (3) also converges uniformly on to 0. Let us be more precise. Note that
where we have used the fact that if , then necessarily . Arguing as before, we obtain the expected uniform convergence of to 0.
Consequently, from Equation (3) the sequence converges uniformly on to . Finally, there exists such that for any , we have
and the enacted convergence is obtained.
We now deal with the second term on the right-hand side of Equation (3). Using the definition of the non-normalized associated-kernel estimator introduced in Equation (3), one can write that for any and all ,
It then follows that
Thus, one has
with
Following the same arguments used to prove the convergence of to 0, we show that converges to 0. Indeed, observe that
Hence, the same lines of proof given before can be reproduced to show that converges to 0. The sequence is uniformly convergent on the set with limit . It can therefore be easily demonstrated that there exists a positive integer such that for any , one has
This completes the proof of the convergence to 0 of the second term on the right hand side of (3) and, finally, the proof of the convergence to 0 of the variance term in (8).
Step 2: Convergence to 0 of the bias term in (8).
The sequence introduced in Equation (3) clearly converges to 0 since
We now use Equation (3) and the same arguments developed in the previous step to obtain that
Consequently, the bias term in (8) converges to 0. This concludes the proof of the proposition. ∎