
Bivariate density estimation using normal-gamma kernel with application to astronomy

We consider the problem of estimating a bivariate density function with support $\mathbb{R}\times[0,\infty)$, where a classical bivariate kernel estimator suffers from boundary bias due to the non-negative variable. To overcome this problem, we propose four kernel density estimators and compare their performance in terms of the mean integrated squared error. A simulation study shows that the estimator based on our proposed normal-gamma (NG) kernel performs best; its applicability is demonstrated using two astronomical data sets.


1 Introduction

Let $X_1, X_2, \ldots, X_n$ be independent realizations of a $d$-dimensional random variable $X$ having an unknown continuous $d$-variate probability density function $f$. In this chapter, we concentrate on the problem of estimating $f$ by a kernel density estimator: when the support of $f$ is $\mathbb{R}^d$, it can be estimated by a $d$-variate classical kernel estimator (see, for example, Silverman, 1986; Wand and Jones, 1995). But this causes boundary bias in the case of bounded or semi-bounded support. To solve this problem in the univariate set-up, associated kernels have been proposed (see, for example, Chen, 1999, 2000; Libengué, 2013; Igarashi and Kakizawa, 2014), whereas, in the multivariate set-up, the boundary bias can be removed by using a product of univariate associated kernels (see, for example, Bouezmarni and Rombouts, 2010). In the context of multivariate associated kernels, Kokonendji and Somé (2018) propose a bivariate beta kernel with a correlation structure. Now, when the support is a cartesian product of $\mathbb{R}$ and bounded or semi-bounded sets, $f$ can be estimated using a product of univariate classical kernels and univariate associated kernels. Here, in particular, we consider the estimation of a bivariate density function with support $\mathbb{R}\times[0,\infty)$.

In this regard, Section 2 contains the properties of the estimators based on the product of a univariate classical kernel and a univariate gamma kernel. Section 3 provides bivariate density estimators based on normal-gamma (NG) kernels. Section 4 discusses the relative performances of the estimators through simulation, followed by a data study in Section 5. Section 6 contains the discussion, whereas some technical details are deferred to the Appendix.

2 Product of classical and gamma kernels

Consider a bivariate continuous density function $f$ with support $\mathbb{R}\times[0,\infty)$ that, in particular, is twice continuously partially differentiable on its support and satisfies the integrability conditions used below. To estimate $f$, we consider the estimator

\[
\hat f_1(x) = \frac{1}{nh}\sum_{i=1}^{n} K\Big(\frac{x_1 - X_{i1}}{h}\Big)\, K_{x_2/b^2+1,\,b^2}(X_{i2}), \tag{1}
\]

where $K$ is a classical symmetric kernel satisfying (a) $\int K(t)\,dt = 1$, (b) $\int t\,K(t)\,dt = 0$, (c) $\int t^2 K(t)\,dt = k_2 < \infty$ and (d) $\int K^2(t)\,dt = k_3 < \infty$, with bandwidth $h$ satisfying $h \to 0$ and $nh \to \infty$ as $n \to \infty$, and $K_{x_2/b^2+1,\,b^2}$ is the first class of gamma kernels (Chen, 2000) defined as

\[
K_{x_2/b^2+1,\,b^2}(t) = \frac{t^{x_2/b^2}\, e^{-t/b^2}}{(b^2)^{x_2/b^2+1}\,\Gamma(x_2/b^2+1)}, \qquad t \ge 0,
\]

where $\Gamma(\cdot)$ is the gamma function, with bandwidth $b$ satisfying $b \to 0$ and $nhb \to \infty$ as $n \to \infty$. The bandwidths of the two kernels are so chosen that the amount of smoothing is on the same scale for both: the gamma kernel with scale parameter $b^2$ smooths over a neighbourhood of width of order $b\sqrt{x_2}$, matching the width $h$ of the classical kernel. In general, any associated kernel can be used here. However, we choose the gamma kernel due to its flexible properties (see, for example, Chen, 2000).
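For illustration (a sketch, not the paper's code), estimator (1) can be implemented in a few lines of Python; the Gaussian choice of $K$, the sample, and the bandwidth values below are hypothetical.

```python
import numpy as np
from math import lgamma

def gamma_kernel_1(x2, b, t):
    """First-class gamma kernel K_{x2/b^2+1, b^2} at t > 0 (log-space for stability)."""
    shape = x2 / b**2 + 1.0          # shape parameter x2/b^2 + 1
    scale = b**2                     # scale parameter b^2
    t = np.asarray(t, dtype=float)
    logk = (shape - 1.0) * np.log(t) - t / scale - shape * np.log(scale) - lgamma(shape)
    return np.exp(logk)

def f1_hat(x1, x2, X, h, b):
    """Estimator (1): Gaussian kernel in x1, first-class gamma kernel in x2 >= 0.
    X is an (n, 2) array with X[:, 1] >= 0; h and b are the bandwidths."""
    u = (x1 - X[:, 0]) / h
    K_norm = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # classical kernel K
    K_gam = gamma_kernel_1(x2, b, X[:, 1])                # gamma kernel in x2
    return np.mean(K_norm * K_gam) / h

# Hypothetical sample: standard normal first coordinate, Exp(1) second coordinate
rng = np.random.default_rng(0)
X = np.column_stack([rng.standard_normal(2000), rng.exponential(1.0, 2000)])
print(f1_hat(0.0, 1.0, X, h=0.3, b=0.3))
```

For this sample the estimate at $(0,1)$ should lie close to the true value $\phi(0)e^{-1} \approx 0.147$.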

Now, using (1), we get

\[
\begin{aligned}
E\{\hat f_1(x)\} &= \int_{-\infty}^{\infty}\!\!\int_{0}^{\infty} \Big[h^{-1}K\Big(\frac{x_1-y_1}{h}\Big)\Big] K_{x_2/b^2+1,\,b^2}(y_2)\, f(y_1,y_2)\, dy_1\, dy_2 \\
&= \int_{-\infty}^{\infty}\!\!\int_{0}^{\infty} K(t)\, K_{x_2/b^2+1,\,b^2}(y_2)\, f(x_1-ht, y_2)\, dt\, dy_2 \\
&= \int_{-\infty}^{\infty} K(t)\, E_{\xi_{x_2}}\{f(x_1-ht, \xi_{x_2})\}\, dt,
\end{aligned} \tag{2}
\]

where $\xi_{x_2}$ follows a gamma distribution with shape parameter $x_2/b^2+1$ and scale parameter $b^2$.

Again, a Taylor series expansion gives $E_{\xi_{x_2}}\{f(x_1-ht,\xi_{x_2})\}$ as

\[
\begin{aligned}
&f(x_1,x_2) + (-ht)\, f_1(x) + E(\xi_{x_2}-x_2)\, f_2(x) \\
&\quad + \tfrac{1}{2}(ht)^2 f_{11}(x) + \tfrac{1}{2} E\{(\xi_{x_2}-x_2)^2\}\, f_{22}(x) \\
&\quad + \tfrac{1}{2}(-ht)\, E(\xi_{x_2}-x_2)\,\{f_{12}(x)+f_{21}(x)\} + o(h^2+b^2) \\
&= f(x) - ht\, f_1(x) + b^2 f_2(x) + \tfrac{1}{2} h^2 t^2 f_{11}(x) + \tfrac{1}{2}\mathrm{Var}(\xi_{x_2})\, f_{22}(x) + o(h^2+b^2),
\end{aligned}
\]

where $E(\xi_{x_2}-x_2) = b^2$ and $\mathrm{Var}(\xi_{x_2}) = x_2 b^2 + b^4$. Then, substituting the last expression in (2), we get

\[
E\{\hat f_1(x)\} = f(x) + \tfrac{1}{2} k_2 h^2 f_{11}(x) + b^2\big\{f_2(x) + \tfrac{1}{2} x_2 f_{22}(x)\big\} + o(h^2+b^2),
\]

which implies $\mathrm{Bias}\{\hat f_1(x)\}$ is

\[
\tfrac{1}{2} k_2 h^2 f_{11}(x) + b^2\big\{f_2(x) + \tfrac{1}{2} x_2 f_{22}(x)\big\} + o(h^2+b^2) = O(h^2+b^2). \tag{3}
\]

This shows that the estimator $\hat f_1$ is free of boundary bias, and the corresponding integrated squared bias is given by

\[
\begin{aligned}
\int \{\mathrm{Bias}(\hat f_1(x))\}^2\, dx &= \tfrac{1}{4} k_2^2 h^4 \int \{f_{11}(x)\}^2\, dx + b^4 \int \big\{f_2(x)+\tfrac{1}{2}x_2 f_{22}(x)\big\}^2\, dx \\
&\quad + k_2 h^2 b^2 \int f_{11}(x)\big\{f_2(x)+\tfrac{1}{2}x_2 f_{22}(x)\big\}\, dx + o(h^4+b^4).
\end{aligned} \tag{4}
\]

Now,

\[
\begin{aligned}
\mathrm{Var}\{\hat f_1(x)\} &= n^{-1}\,\mathrm{Var}\Big\{h^{-1}K\Big(\frac{x_1-X_{i1}}{h}\Big)\, K_{x_2/b^2+1,\,b^2}(X_{i2})\Big\} \\
&= n^{-1}\, E\Big\{h^{-1}K\Big(\frac{x_1-X_{i1}}{h}\Big)\, K_{x_2/b^2+1,\,b^2}(X_{i2})\Big\}^2 + O(n^{-1}) \\
&= n^{-1} h^{-1} \int_{-\infty}^{\infty}\!\!\int_{0}^{\infty} K^2(t)\, K^2_{x_2/b^2+1,\,b^2}(y_2)\, f(x_1-ht, y_2)\, dt\, dy_2 + O(n^{-1})
\end{aligned}
\]

and

\[
\int_{0}^{\infty} K^2_{x_2/b^2+1,\,b^2}(y_2)\, f(x_1-ht, y_2)\, dy_2 = B_b(x_2)\, E_{\eta_{x_2}}\{f(x_1-ht, \eta_{x_2})\},
\]

where $\eta_{x_2}$ follows a gamma distribution with shape parameter $2x_2/b^2+1$ and scale parameter $b^2/2$, and $B_b(x_2) = b^{-2}\,\Gamma(2x_2/b^2+1)\big/\big\{2^{2x_2/b^2+1}\,\Gamma^2(x_2/b^2+1)\big\}$. A lemma of Brown and Chen (1999) gives

\[
\begin{aligned}
B_b(x_2) &\sim \frac{1}{2\sqrt{\pi}}\, b^{-1} x_2^{-1/2} \quad &&\text{if } x_2/b^2 \to \infty, \\
B_b(x_2) &\sim \frac{\Gamma(2\kappa+1)}{2^{1+2\kappa}\,\Gamma^2(\kappa+1)}\, b^{-2} \quad &&\text{if } x_2/b^2 \to \kappa \ \text{(a non-negative constant)},
\end{aligned}
\]

which implies

\[
\begin{aligned}
\mathrm{Var}\{\hat f_1(x)\} &\sim \frac{1}{2\sqrt{\pi}}\, n^{-1} h^{-1} b^{-1} k_3\, x_2^{-1/2} f(x) \quad &&\text{if } x_2/b^2 \to \infty, \\
\mathrm{Var}\{\hat f_1(x)\} &\sim \frac{\Gamma(2\kappa+1)}{2^{1+2\kappa}\,\Gamma^2(\kappa+1)}\, n^{-1} h^{-1} b^{-2} k_3\, f(x) \quad &&\text{if } x_2/b^2 \to \kappa,
\end{aligned} \tag{5}
\]

where $k_3 = \int K^2(t)\,dt$. Expressions (3) and (5) imply that for $h \to 0$, $b \to 0$ and $nhb \to \infty$ as $n \to \infty$, the nonparametric density estimator $\hat f_1$ is consistent for the true density function at each point $x$. Now, for $\delta = b^{2-\epsilon}$ with $\epsilon \in (0,1)$,

\[
\begin{aligned}
\int_{-\infty}^{\infty}\!\!\int_{0}^{\infty} \mathrm{Var}\{\hat f_1(x)\}\, dx_1\, dx_2 &= \int_{-\infty}^{\infty}\!\!\int_{0}^{\delta} \mathrm{Var}\{\hat f_1(x)\}\, dx_1\, dx_2 + \int_{-\infty}^{\infty}\!\!\int_{\delta}^{\infty} \mathrm{Var}\{\hat f_1(x)\}\, dx_1\, dx_2 \\
&= \frac{1}{2\sqrt{\pi}}\, n^{-1} h^{-1} b^{-1} k_3 \int_{-\infty}^{\infty}\!\!\int_{\delta}^{\infty} x_2^{-1/2} f(x)\, dx_1\, dx_2 + O(n^{-1} h^{-1} b^{-\epsilon}) \\
&= \frac{1}{2\sqrt{\pi}}\, n^{-1} h^{-1} b^{-1} k_3 \int_{-\infty}^{\infty}\!\!\int_{0}^{\infty} x_2^{-1/2} f(x)\, dx_1\, dx_2 + o(n^{-1} h^{-1} b^{-1}),
\end{aligned} \tag{6}
\]

provided $\int x_2^{-1/2} f(x)\, dx$ is finite.

Combining (4) and (6), the mean integrated squared error (MISE) is obtained as

\[
\begin{aligned}
\mathrm{MISE}\{\hat f_1(x)\} &= \int \{\mathrm{Bias}(\hat f_1(x))\}^2\, dx + \int \mathrm{Var}\{\hat f_1(x)\}\, dx \\
&= \tfrac{1}{4} k_2^2 h^4 \int \{f_{11}(x)\}^2\, dx + b^4 \int \big\{f_2(x)+\tfrac{1}{2}x_2 f_{22}(x)\big\}^2\, dx \\
&\quad + k_2 h^2 b^2 \int f_{11}(x)\big\{f_2(x)+\tfrac{1}{2}x_2 f_{22}(x)\big\}\, dx \\
&\quad + \frac{1}{2\sqrt{\pi}}\, n^{-1} h^{-1} b^{-1} k_3 \int x_2^{-1/2} f(x)\, dx + o(n^{-1}h^{-1}b^{-1} + h^4 + b^4)
\end{aligned} \tag{7}
\]

and the leading terms in (7) give the expression of the corresponding asymptotic mean integrated squared error (AMISE). The AMISE is minimized for $h = h_0\, n^{-1/6}$ and $b = b_0\, n^{-1/6}$, where $h_0$ and $b_0$ are constants; i.e. the optimal bandwidths for the kernel density estimator $\hat f_1$ are of order $n^{-1/6}$, which gives the optimal MISE as

\[
\begin{aligned}
\Big[\tfrac{1}{4} k_2^2 h_0^4 \int \{f_{11}(x)\}^2\, dx &+ b_0^4 \int \big\{f_2(x)+\tfrac{1}{2}x_2 f_{22}(x)\big\}^2\, dx \\
&+ k_2 h_0^2 b_0^2 \int f_{11}(x)\big\{f_2(x)+\tfrac{1}{2}x_2 f_{22}(x)\big\}\, dx \\
&+ \frac{1}{2\sqrt{\pi}}\, h_0^{-1} b_0^{-1} k_3 \int x_2^{-1/2} f(x)\, dx\Big]\, n^{-2/3}.
\end{aligned}
\]

For $h = b$, the optimal $b$ is

\[
\left[\frac{\frac{1}{2\sqrt{\pi}}\, k_3 \int x_2^{-1/2} f(x)\, dx}{2 \int \big\{\tfrac{1}{2} k_2 f_{11}(x) + f_2(x) + \tfrac{1}{2} x_2 f_{22}(x)\big\}^2\, dx}\right]^{1/6} n^{-1/6},
\]

which gives the optimal MISE as

\[
\frac{3}{2^{2/3}} \left[\int \big\{\tfrac{1}{2} k_2 f_{11}(x) + f_2(x) + \tfrac{1}{2} x_2 f_{22}(x)\big\}^2\, dx\right]^{1/3} \left[\frac{1}{2\sqrt{\pi}}\, k_3 \int x_2^{-1/2} f(x)\, dx\right]^{2/3} n^{-2/3}.
\]
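The $h = b$ optimal-bandwidth formula is easy to evaluate numerically once a reference density is fixed. The sketch below (my illustration, not from the paper) uses a Gaussian $K$, so $k_2 = 1$ and $k_3 = 1/(2\sqrt{\pi})$, and a hypothetical reference density $f(x_1,x_2) = \phi(x_1)\,e^{-x_2}$, and returns the constant multiplying $n^{-1/6}$.

```python
import numpy as np

# Rule-of-thumb evaluation of the h = b optimal-bandwidth constant for f1_hat.
# Assumptions (illustrative only): Gaussian kernel K (k2 = 1, k3 = 1/(2*sqrt(pi)))
# and reference density f(x1, x2) = phi(x1) * exp(-x2).
k2 = 1.0
k3 = 1.0 / (2.0 * np.sqrt(np.pi))

x1 = np.linspace(-6.0, 6.0, 601)
x2 = np.arange(0.005, 20.0, 0.01)        # midpoints, avoiding the x2 = 0 singularity
X1, X2 = np.meshgrid(x1, x2, indexing="ij")
dA = (x1[1] - x1[0]) * 0.01

f = np.exp(-0.5 * X1**2) / np.sqrt(2.0 * np.pi) * np.exp(-X2)
f11 = (X1**2 - 1.0) * f                  # d^2 f / dx1^2
f2 = -f                                  # d f / dx2
f22 = f                                  # d^2 f / dx2^2

# Leading bias and variance functionals of the MISE expansion (7)
bias_fun = np.sum((0.5 * k2 * f11 + f2 + 0.5 * X2 * f22) ** 2) * dA
var_fun = k3 / (2.0 * np.sqrt(np.pi)) * np.sum(X2 ** (-0.5) * f) * dA

b_const = (var_fun / (2.0 * bias_fun)) ** (1.0 / 6.0)   # b_opt = b_const * n^(-1/6)
print(round(float(b_const), 3))
```

For this reference density the constant comes out near $0.86$, so the rule of thumb would be $b \approx 0.86\, n^{-1/6}$.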

Another estimator of $f$ is considered as

\[
\hat f_2(x) = \frac{1}{nh} \sum_{i=1}^{n} K\Big(\frac{x_1 - X_{i1}}{h}\Big)\, K_{\rho_{b^2}(x_2),\,b^2}(X_{i2}), \tag{8}
\]

where $K_{\rho_{b^2}(x_2),\,b^2}$ is the second class of gamma kernels (Chen, 2000), with shape function defined as

\[
\rho_{b^2}(x_2) =
\begin{cases}
x_2/b^2 & \text{if } x_2 \ge 2b^2, \\[2pt]
\tfrac{1}{4}\,(x_2/b^2)^2 + 1 & \text{if } x_2 \in [0, 2b^2).
\end{cases} \tag{9}
\]
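A direct transcription of (8) and (9) follows (a sketch; the Gaussian choice of $K$, the data, and the bandwidths are hypothetical):

```python
import numpy as np
from math import lgamma

def rho(x2, b):
    """Shape function rho_{b^2}(x2) of the second class of gamma kernels, eq. (9)."""
    r = x2 / b**2
    return r if x2 >= 2.0 * b**2 else 0.25 * r**2 + 1.0

def gamma_kernel_2(x2, b, t):
    """Second-class gamma kernel K_{rho_{b^2}(x2), b^2} evaluated at t > 0."""
    shape, scale = rho(x2, b), b**2
    t = np.asarray(t, dtype=float)
    return np.exp((shape - 1.0) * np.log(t) - t / scale
                  - shape * np.log(scale) - lgamma(shape))

def f2_hat(x1, x2, X, h, b):
    """Estimator (8): Gaussian kernel in x1, boundary-corrected gamma kernel in x2."""
    u = (x1 - X[:, 0]) / h
    K_norm = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return np.mean(K_norm * gamma_kernel_2(x2, b, X[:, 1])) / h

# rho is continuous at x2 = 2*b^2, where the two regimes meet
print(rho(0.18, 0.3), rho(0.17999, 0.3))
```

Note that both branches of (9) give $\rho_{b^2}(2b^2) = 2$, so the kernel changes smoothly between the interior and boundary regions.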

So, $\mathrm{Bias}\{\hat f_2(x)\}$ is given by

\[
\begin{cases}
\tfrac{1}{2} k_2 h^2 f_{11}(x) + \tfrac{1}{2} b^2 x_2 f_{22}(x) + o(h^2+b^2) & \text{if } x_2 \ge 2b^2, \\[2pt]
\tfrac{1}{2} k_2 h^2 f_{11}(x) + b^2\{\rho_{b^2}(x_2) - x_2/b^2\}\, f_2(x) + o(h^2+b^2) & \text{if } x_2 \in [0, 2b^2),
\end{cases} \tag{10}
\]

which shows the boundary unbiasedness of the estimator $\hat f_2$, and, for a non-negative constant $\kappa$,

\[
\begin{aligned}
\mathrm{Var}\{\hat f_2(x)\} &\sim \frac{1}{2\sqrt{\pi}}\, n^{-1} h^{-1} b^{-1} k_3\, x_2^{-1/2} f(x) \quad &&\text{if } x_2/b^2 \to \infty, \\
\mathrm{Var}\{\hat f_2(x)\} &\sim \frac{\Gamma(\kappa^2/2+1)}{2^{1+\kappa^2/2}\,\Gamma^2(\kappa^2/4+1)}\, n^{-1} h^{-1} b^{-2} k_3\, f(x) \quad &&\text{if } x_2/b^2 \to \kappa,
\end{aligned}
\]

imply

\[
\begin{aligned}
\mathrm{MISE}\{\hat f_2(x)\} &= \tfrac{1}{4} k_2^2 h^4 \int \{f_{11}(x)\}^2\, dx + \tfrac{1}{4} b^4 \int \{x_2 f_{22}(x)\}^2\, dx \\
&\quad + \tfrac{1}{2} k_2 h^2 b^2 \int f_{11}(x)\{x_2 f_{22}(x)\}\, dx + o(h^4+b^4) \\
&\quad + \frac{1}{2\sqrt{\pi}}\, n^{-1} h^{-1} b^{-1} k_3 \int x_2^{-1/2} f(x)\, dx + o(n^{-1}h^{-1}b^{-1}).
\end{aligned}
\]

For $h = b$, the optimal $b$ is

\[
\left[\frac{\frac{1}{2\sqrt{\pi}}\, k_3 \int x_2^{-1/2} f(x)\, dx}{\tfrac{1}{2} \int \{k_2 f_{11}(x) + x_2 f_{22}(x)\}^2\, dx}\right]^{1/6} n^{-1/6},
\]

which corresponds to the optimal MISE, given by

\[
\frac{3}{2^{4/3}} \left[\int \{k_2 f_{11}(x) + x_2 f_{22}(x)\}^2\, dx\right]^{1/3} \left[\frac{1}{2\sqrt{\pi}}\, k_3 \int x_2^{-1/2} f(x)\, dx\right]^{2/3} n^{-2/3}.
\]

Observe that comparing the leading constants of the two optimal MISE expressions implies that $\hat f_2$ is expected to have a better asymptotic performance than $\hat f_1$.
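Whether this holds for a given density can be checked numerically. The sketch below (my illustration) compares the two leading optimal-MISE constants under a Gaussian $K$ ($k_2 = 1$, $k_3 = 1/(2\sqrt{\pi})$) for the hypothetical reference density $f(x_1,x_2) = \phi(x_1)\,e^{-x_2}$:

```python
import numpy as np

# Compare the leading optimal-MISE constants of f1_hat and f2_hat for the
# illustrative reference density f(x1, x2) = phi(x1) * exp(-x2), Gaussian K.
k2, k3 = 1.0, 1.0 / (2.0 * np.sqrt(np.pi))

x1 = np.linspace(-6.0, 6.0, 601)
x2 = np.arange(0.005, 20.0, 0.01)        # midpoint grid avoids the x2 = 0 singularity
X1, X2 = np.meshgrid(x1, x2, indexing="ij")
dA = (x1[1] - x1[0]) * 0.01

f = np.exp(-0.5 * X1**2) / np.sqrt(2.0 * np.pi) * np.exp(-X2)
f11, f2_, f22 = (X1**2 - 1.0) * f, -f, f

A1 = np.sum((0.5 * k2 * f11 + f2_ + 0.5 * X2 * f22) ** 2) * dA   # bias functional of f1_hat
A2 = np.sum((k2 * f11 + X2 * f22) ** 2) * dA                     # bias functional of f2_hat
C = k3 / (2.0 * np.sqrt(np.pi)) * np.sum(X2 ** (-0.5) * f) * dA  # shared variance functional

mise1 = 3.0 / 2.0 ** (2.0 / 3.0) * A1 ** (1.0 / 3.0) * C ** (2.0 / 3.0)
mise2 = 3.0 / 2.0 ** (4.0 / 3.0) * A2 ** (1.0 / 3.0) * C ** (2.0 / 3.0)
print(mise2 < mise1)
```

For this particular density the constant of $\hat f_2$ is indeed the smaller of the two.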

3 Bivariate density estimation using NG kernel

Consider the density function $K_\Theta$ of a bivariate normal-gamma distribution defined as (Bernardo and Smith, 1994)

\[
\begin{aligned}
K_\Theta(t_1, t_2) &= NG(t_1, t_2 \mid \Theta = (\mu, \lambda, \alpha, \beta)) = N(t_1 \mid \mu, (\lambda t_2)^{-1})\, Ga(t_2 \mid \alpha, \beta) \\
&= \sqrt{\frac{\lambda t_2}{2\pi}}\, e^{-(t_1-\mu)^2 \lambda t_2/2} \times \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, t_2^{\alpha-1} e^{-\beta t_2}
\end{aligned} \tag{11}
\]

with $t_1 \in \mathbb{R}$ and $t_2 \ge 0$, where $N$ and $Ga$, respectively, stand for the normal and gamma distributions. Using (11), we define the following estimator of $f$ as

\[
\hat f_3(x) = \frac{1}{n} \sum_{i=1}^{n} K_{\Theta_1}(X_i), \tag{12}
\]

where $K_{\Theta_1}$ is the $NG$ kernel (11) with parameter $\Theta_1$ depending on the target point $x$ and on bandwidths $b_1$ and $b_2$ satisfying $b_1 \to 0$ and $b_2 \to 0$ as $n \to \infty$. Then,

\[
E\{\hat f_3(x)\} = \int K_{\Theta_1}(y)\, f(y)\, dy = E\{f(\xi_x)\},
\]

where $\xi_x = (\xi_{x_1}, \xi_{x_2})$ follows $NG(\Theta_1)$, which implies $E(\xi_{x_1}) = x_1$ and $E(\xi_{x_2}) = x_2 + 2b_2$. By Taylor series expansion we get (see Appendix A.1)

\[
\begin{aligned}
E\{f(\xi_x)\} &= f(x) + E(\xi_{x_1}-x_1)\, f_1(x) + E(\xi_{x_2}-x_2)\, f_2(x) \\
&\quad + \tfrac{1}{2} E\{(\xi_{x_1}-x_1)(\xi_{x_2}-x_2)\}\{f_{12}(x)+f_{21}(x)\} \\
&\quad + \tfrac{1}{2} E(\xi_{x_1}-x_1)^2 f_{11}(x) + \tfrac{1}{2} E(\xi_{x_2}-x_2)^2 f_{22}(x) + o(b_1+b_2) \\
&= f(x) + b_1\big\{\tfrac{1}{2}|x_1|\, f_{11}(x)\big\} + b_2\big\{2 f_2(x) + \tfrac{1}{2} x_2 f_{22}(x)\big\} + o(b_1+b_2).
\end{aligned}
\]

Therefore, $\mathrm{Bias}\{\hat f_3(x)\}$ is given by

\[
b_1\big\{\tfrac{1}{2}|x_1|\, f_{11}(x)\big\} + b_2\big\{2 f_2(x) + \tfrac{1}{2} x_2 f_{22}(x)\big\} + o(b_1+b_2) = O(b_1+b_2), \tag{13}
\]

which shows that the estimator $\hat f_3$ is free of boundary bias, and the integrated squared bias is

\[
\begin{aligned}
\int \{\mathrm{Bias}(\hat f_3(x))\}^2\, dx &= \tfrac{1}{4} b_1^2 \int \{|x_1|\, f_{11}(x)\}^2\, dx + b_2^2 \int \big\{2 f_2(x) + \tfrac{1}{2} x_2 f_{22}(x)\big\}^2\, dx \\
&\quad + b_1 b_2 \int \{|x_1|\, f_{11}(x)\}\big\{2 f_2(x) + \tfrac{1}{2} x_2 f_{22}(x)\big\}\, dx + o(b_1^2+b_2^2).
\end{aligned} \tag{14}
\]

The variance of $\hat f_3(x)$ is

\[
\begin{aligned}
\mathrm{Var}\{\hat f_3(x)\} &\sim \frac{1}{4\pi\sqrt{e}}\, n^{-1} b_1^{-1/2} b_2^{-1/2} |x_1|^{-1/2} x_2^{-1/2} f(x) \quad &&\text{if } |x_1|/b_1 \to \infty,\ x_2/b_2 \to \infty, \\
\mathrm{Var}\{\hat f_3(x)\} &\sim \frac{\Gamma(2\kappa_2+7/2)}{\sqrt{\pi}\, 2^{2\kappa_2+9/2}\, \sqrt{(\kappa_1+1)(\kappa_2+1)}\, \Gamma^2(\kappa_2+2)}\, n^{-1} b_1^{-1} b_2^{-1} f(x) \quad &&\text{if } |x_1|/b_1 \to \kappa_1,\ x_2/b_2 \to \kappa_2, \\
\mathrm{Var}\{\hat f_3(x)\} &\sim \frac{\Gamma(2\kappa_2+7/2)}{\sqrt{\pi}\, 2^{2\kappa_2+9/2}\, \sqrt{\kappa_2+1}\, \Gamma^2(\kappa_2+2)}\, n^{-1} b_1^{-1/2} b_2^{-1} |x_1|^{-1/2} f(x) \quad &&\text{if } |x_1|/b_1 \to \infty,\ x_2/b_2 \to \kappa_2, \\
\mathrm{Var}\{\hat f_3(x)\} &\sim \frac{1}{4\pi\sqrt{e}\,\sqrt{\kappa_1+1}}\, n^{-1} b_1^{-1} b_2^{-1/2} x_2^{-1/2} f(x) \quad &&\text{if } |x_1|/b_1 \to \kappa_1,\ x_2/b_2 \to \infty,
\end{aligned}
\]

for non-negative constants $\kappa_1$ and $\kappa_2$ (see Appendix A.2), and

\[
\int \mathrm{Var}\{\hat f_3(x)\}\, dx = \frac{1}{4\pi\sqrt{e}}\, n^{-1} b_1^{-1/2} b_2^{-1/2} \int |x_1|^{-1/2} x_2^{-1/2} f(x)\, dx + o(n^{-1} b_1^{-1/2} b_2^{-1/2}), \tag{15}
\]

assuming $\int |x_1|^{-1/2} x_2^{-1/2} f(x)\, dx$ is finite (see Appendix A.3).
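The NG kernel (11) itself is straightforward to evaluate numerically; the sketch below (my illustration; the parameter values are arbitrary) transcribes it and checks that it integrates to one:

```python
import numpy as np
from math import lgamma

def ng_density(t1, t2, mu, lam, alpha, beta):
    """Normal-gamma density (11): N(t1 | mu, (lam*t2)^(-1)) * Ga(t2 | alpha, beta)."""
    t1 = np.asarray(t1, dtype=float)
    t2 = np.asarray(t2, dtype=float)
    log_n = 0.5 * np.log(lam * t2 / (2.0 * np.pi)) - 0.5 * lam * t2 * (t1 - mu) ** 2
    log_g = alpha * np.log(beta) - lgamma(alpha) + (alpha - 1.0) * np.log(t2) - beta * t2
    return np.exp(log_n + log_g)

# Numerical check that the density integrates to one (arbitrary parameters)
t1 = np.linspace(-15.0, 15.0, 1501)
t2 = np.arange(0.005, 20.0, 0.01)                 # midpoint grid in t2 > 0
T1, T2 = np.meshgrid(t1, t2, indexing="ij")
total = np.sum(ng_density(T1, T2, mu=0.0, lam=1.0, alpha=2.0, beta=1.0)) \
        * (t1[1] - t1[0]) * 0.01
print(round(float(total), 3))
```

The check also illustrates why the conditional variance $(\lambda t_2)^{-1}$ matters near the boundary: small values of $t_2$ spread the normal factor widely in $t_1$, which is the mechanism behind the $|x_1|^{-1/2} x_2^{-1/2}$ factor in the variance above.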

Now, combining (14) and (15), we get the expression of the MISE as follows

\[
\begin{aligned}
\mathrm{MISE}\{\hat f_3(x)\} &= \tfrac{1}{4} b_1^2 \int \{|x_1|\, f_{11}(x)\}^2\, dx + b_2^2 \int \big\{2 f_2(x) + \tfrac{1}{2} x_2 f_{22}(x)\big\}^2\, dx \\
&\quad + b_1 b_2 \int |x_1|\, f_{11}(x)\big\{2 f_2(x) + \tfrac{1}{2} x_2 f_{22}(x)\big\}\, dx \\
&\quad + \frac{1}{4\pi\sqrt{e}}\, n^{-1} b_1^{-1/2} b_2^{-1/2} \int |x_1|^{-1/2} x_2^{-1/2} f(x)\, dx + o\big(b_1^2 + b_2^2 + n^{-1} b_1^{-1/2} b_2^{-1/2}\big).
\end{aligned}
\]