# The Gram-Charlier A Series based Extended Rule-of-Thumb for Bandwidth Selection in Univariate and Multivariate Kernel Density Estimations

The article derives a novel Gram-Charlier A (GCA) Series based Extended Rule-of-Thumb (ExROT) for bandwidth selection in Kernel Density Estimation (KDE). Various existing bandwidth selection rules achieve minimization of the Asymptotic Mean Integrated Square Error (AMISE) between the estimated probability density function (PDF) and the actual PDF. The rules differ in the way they estimate the integral of the squared second derivative of the unknown PDF f(·), identified as the roughness R(f''(·)). The simplest Rule-of-Thumb (ROT) estimates R(f''(·)) under the assumption that the density being estimated is Gaussian. Intuitively, a better estimate of R(f''(·)), and consequently better bandwidth selection rules, can be derived if the unknown PDF is approximated through an infinite series expansion based on a more general density assumption. As a demonstration and verification of this concept, the ExROT derived in this article uses the extended assumption that the density being estimated is near Gaussian, which allows the GCA expansion to be used as an approximation to the unknown near-Gaussian PDF. The ExROT for univariate KDE is then extended to multivariate KDE. The required multivariate AMISE criterion is re-derived using elementary calculus of several variables instead of tensor calculus; the derivation uses the Kronecker product and the vector differential operator to obtain the AMISE expression in vector notation. An ExROT for kernel based density derivative estimators is also derived.

06/03/2018


## 1 Introduction

Continuous probability density function (PDF) estimation using kernel methods is widely used in statistics, machine learning and signal processing (Silverman, 1986). The optimal estimate depends upon the selected kernel function and its spread, decided by the smoothing or bandwidth parameter. The choice of kernel has limited impact on the optimal PDF estimate, although the Epanechnikov kernel is the optimal kernel (Epanechnikov, 1969). On the other hand, the optimal value of the bandwidth parameter avoids either too rough or too smooth an estimate of the unknown PDF.

There exists a variety of rules for bandwidth selection in KDE. The rules vary based on the criterion used to measure the accuracy of density estimation and on the way that criterion is satisfied. Brief surveys of data driven bandwidth selectors are provided by Wand and Jones (1994); Park and Marron (1990); Jones and Sheather (1996); Park and Turlach (1992); Sheather (1986b). The Asymptotic Mean Integrated Square Error (AMISE) between the estimated PDF and the actual PDF is the most widely used performance criterion to derive the rules, though there are many others. The AMISE criterion requires estimating the roughness of the second derivative of the PDF as a prior step to estimating the kernel bandwidth parameter, where the roughness of a function is defined as the integral of the squared function. The rules based on the AMISE performance criterion differ in the way they estimate the roughness $R(f'')$. The simplest Rule-of-Thumb (ROT) satisfying the AMISE, by Silverman (1986), assumes a Gaussian distribution for the unknown density. It is not the most accurate bandwidth selector but is used either as a very fast, reasonably good estimator or as a first estimate in multistage bandwidth selectors. More precise solve-the-equation plug-in rules (Sheather, 1983, 1986a) use estimates of integrated squared density derivative functionals to estimate $R(f'')$. They demand heavy computation to solve a non-linear equation using iterative methods, and they use the ROT as the very first estimate. The fastest $\epsilon$-exact approximation algorithm based solve-the-equation plug-in rule (Raykar and Duraiswami, 2005, 2006) requires $O(N+M)$ computations, where $N$ is the number of samples and $M$ is the selected number of evaluation points. So, deriving a data dependent bandwidth parameter selection rule for KDE that balances accuracy and computation is the goal of this article.

The article achieves this goal by deriving an Extended Rule-of-Thumb (ExROT). The assumption of Gaussianity of the unknown PDF in the ROT is too restrictive. Expressing an unknown PDF in terms of an infinite series of higher order statistics, based on a more general assumption, should result in a better bandwidth selection rule. As a verification and demonstration of this concept, the ExROT extends the Gaussian assumption in the ROT to a near-Gaussian assumption. This facilitates the use of the cumulants based Gram-Charlier A (GCA) Series expansion as an approximation for the unknown PDF satisfying the same AMISE criterion. The empirical simulations show that the ExROT for bandwidth selection is better than the ROT, with respect to the integrated mean square error (IMSE or MISE) performance criterion, for all types of non-Gaussian unimodal distributions, including the skewed, the kurtotic and the distributions with outliers. This is achieved with computation comparable to the ROT and far less than that of the $\epsilon$-exact solve-the-equation plug-in rule.

The ExROT for bandwidth selection in univariate KDE is extended to multivariate KDE and to kernel based multivariate density derivative estimation. The ExROT for multivariate KDE requires the multivariate expression for AMISE, the multivariate Taylor Series expansion, multivariate Hermite polynomials and the multivariate GCA Series expansion. The required multivariate AMISE is conventionally derived using the gradient and Hessian of the PDF of a random vector (Wand and Jones, 1994). Conventionally, the other required multivariate expressions are also derived using tensor calculus, as higher order derivatives of multivariate functions are involved, and the corresponding final expressions often require coordinatewise representations. But recently, the higher order cumulants (Terdik, 2002; S. Rao Jammalamadaka and Terdik, May, 2006) and the multivariate Hermite polynomials (Terdik, 2002; Holmquist, 1996) have been derived using only elementary calculus of several variables. This is achieved by replacing conventional multivariate differentiation with repeated application of the Kronecker product to the vector differential operator. The derived expressions are more elementary, as they use vector notation, and more comprehensible, as they are apparently nearer to their univariate counterparts. The same approach is used here to derive the multivariate AMISE criterion in vector notation. Overall, the multivariate ExROT is derived using the multivariate Taylor Series, the multivariate cumulants and the multivariate Hermite polynomials derived by Terdik (2002); Holmquist (1996), the multivariate GCA derived by Bhaveshkumar (2015a) and the multivariate AMISE obtained in Section 6 of this article. A bandwidth selection rule for kernel based density derivative estimation is also derived.

The next Section 2 derives the univariate AMISE criterion and briefly reviews the existing rules for data driven kernel bandwidth selection. Section 3 derives the ExROT. The performance analysis is done using two separate experiments in Section 4. The preliminary background on multivariate KDE, the Kronecker product, the multivariate Taylor Series and others is briefed in Section 5. A compact derivation of the multivariate GGC Series and GCA Series is provided in Section 5.4. Section 6 then derives the multivariate AMISE in vector form using the Kronecker product. The multivariate AMISE and the multivariate GCA Series are used to derive the ExROT for multivariate KDE in Section 7. Similarly, Section 8 derives the ExROT for density derivative estimation. Finally, the article is concluded in Section 9.

## 2 Bandwidth Selection Criteria and Selectors

Given $N$ realizations $\{x_i\}_{i=1}^{N}$ of an unknown PDF $f(x)$, the kernel density estimate is given by

$$\hat f(x) = \frac{1}{Nh}\sum_{i=1}^{N} K\!\left(\frac{x-x_i}{h}\right) = \frac{1}{N}\sum_{i=1}^{N} K_h(x-x_i) \qquad (1)$$

where $K(\cdot)$ is the kernel function, $h$ is the bandwidth parameter and $K_h(u) = \frac{1}{h}K(u/h)$. Usually, $K(\cdot)$ is a symmetric, positive definite and bounded function, mostly a PDF, satisfying the following properties:

$$K(u)\ge 0, \qquad \int_{-\infty}^{\infty} K(u)\,du = 1, \qquad \int_{-\infty}^{\infty} u\,K(u)\,du = 0, \qquad \int_{-\infty}^{\infty} u^2 K(u)\,du = \mu_2(K) < \infty$$
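As a concrete illustration (my own sketch, not from the article), Equation (1) with a Gaussian kernel can be written in a few lines of Python; the function names are illustrative:

```python
import math

def gaussian_kernel(u):
    # Standard Gaussian kernel: symmetric, integrates to 1, mu_2(K) = 1.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, samples, h):
    # Equation (1): hat f(x) = (1/(N h)) * sum_i K((x - x_i)/h)
    n = len(samples)
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (n * h)

# Example: estimate the density at 0 from a few samples.
samples = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.5]
density_at_zero = kde(0.0, samples, h=0.5)
```

Since the kernel integrates to 1, the resulting estimate $\hat f$ is itself a valid PDF for any $h > 0$.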

The accuracy of a PDF estimate can be quantified by the available distance measures between PDFs, such as the $L_1$ norm based mean integrated absolute error, the $L_2$ norm based mean integrated square error (MISE), the Kullback-Leibler divergence and others. The optimal smoothing parameter (the bandwidth) is obtained by minimizing the selected distance measure. The most widely used criterion, the MISE or IMSE (Integrated Mean Square Error) based bandwidth selection rule, as in (Silverman, 1986; Wand and Jones, 1994), is briefed in Appendix A. It is given as under:

$$\begin{aligned}
\mathrm{MISE}(\hat f(x)) &= E\{\mathrm{ISE}(f(x),\hat f(x))\} = E\left\{\int_{-\infty}^{\infty}\left(\hat f(x)-f(x)\right)^2 dx\right\} \qquad (2)\\
&= \int_{-\infty}^{\infty}\mathrm{Bias}^2(\hat f(x))\,dx + \int_{-\infty}^{\infty}\mathrm{Var}(\hat f(x))\,dx\\
&= \frac{h^4}{4}\left(\mu_2(K)\right)^2 R(f'') + \frac{1}{Nh}R(K) + o(h^4) + o\!\left(\frac{1}{Nh}\right)
\end{aligned}$$

where $R(K)=\int_{-\infty}^{\infty}K^2(u)\,du$, $\mu_2(K)=\int_{-\infty}^{\infty}u^2K(u)\,du$ and $R(f'')=\int_{-\infty}^{\infty}\left(f''(x)\right)^2 dx$. In general, $R(g)=\int g^2(x)\,dx$ is identified as the roughness of the function $g$. An asymptotic large sample approximation, the AMISE, is obtained assuming $h\to 0$ and $Nh\to\infty$ as $N\to\infty$, i.e. $h$ reduces to 0 at a rate slower than $N^{-1}$.

$$\mathrm{AMISE}(\hat f(x)) = \frac{h^4}{4}\left(\mu_2(K)\right)^2 R(f'') + \frac{1}{Nh}R(K) \qquad (3)$$

Equation (3) shows that a small $h$ increases the estimation variance, whereas a larger $h$ increases the estimation bias. The optimal $h$ minimizing the total AMISE is given by:

$$\Rightarrow h_{\mathrm{AMISE}} = \left(\frac{R(K)}{\mu_2(K)^2\, R(f'')\, N}\right)^{1/5} \qquad (4)$$

Thus, the optimal bandwidth parameter depends upon the kernel constants, the number of samples and the roughness of the second derivative of the actual PDF being estimated.
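To make the bias-variance trade-off in Equations (3)-(4) concrete, the following numerical check (my own, not from the article) evaluates the AMISE for a Gaussian kernel and a standard Gaussian density, and confirms that a brute-force grid search recovers the closed-form minimizer of Equation (4):

```python
import math

# AMISE(h) from Equation (3) for a Gaussian kernel and a standard Gaussian
# density: mu_2(K) = 1, R(K) = 1/(2*sqrt(pi)), R(f'') = 3/(8*sqrt(pi)).
N = 10_000
R_K = 1.0 / (2.0 * math.sqrt(math.pi))
mu2_K = 1.0
R_f2 = 3.0 / (8.0 * math.sqrt(math.pi))

def amise(h):
    # Equation (3): bias^2 term grows as h^4, variance term decays as 1/(N h).
    return 0.25 * h**4 * mu2_K**2 * R_f2 + R_K / (N * h)

# Closed-form minimizer from Equation (4).
h_star = (R_K / (mu2_K**2 * R_f2 * N)) ** 0.2

# Brute-force minimizer over a fine grid agrees with the formula.
grid = [0.001 * k for k in range(1, 1000)]
h_grid = min(grid, key=amise)
```

For this Gaussian-on-Gaussian case, $h^\ast$ also equals $1.0592\,N^{-1/5}$, which is exactly the ROT of Equation (5) below with $\sigma = 1$.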

### 2.1 Brief Survey on the bandwidth selectors

As the bandwidth selection rules vary based on the choice of performance criterion for density estimation, they also vary based on the way the performance criterion is optimized. The various rules satisfying the AMISE for bandwidth parameter selection differ in the way $R(f'')$ is estimated. The first group of rules, named scale measures, gives a rough estimate of the bandwidth parameter with little computation. It includes Silverman's Rule-of-Thumb, which estimates $R(f'')$ assuming $f(\cdot)$ to be Gaussian (Silverman, 1986). For a Gaussian PDF, $R(f'') = \frac{3}{8\sqrt{\pi}\,\sigma^5}$, and for a Gaussian kernel, $R(K) = \frac{1}{2\sqrt{\pi}}$ and $\mu_2(K) = 1$. Accordingly,

$$h_{\mathrm{rot}} = 1.0592\,\sigma\, N^{-1/5} \qquad (5)$$

where $\sigma$ is the standard deviation of the samples. There are many other rules based on the assumption of other parametric families. There are also rules based on oversmoothing, difference based methods and others, briefed by Janssen et al. (1995).

Another group of rules consists of the plug-in rules, which are more accurate at higher computational cost. They plug in a kernel based estimate of $R(f'')$. The direct plug-in rules estimate density derivative functionals instead of the actual derivatives. Every $r$th order derivative functional estimate requires an $(r+2)$th order estimate and a pilot bandwidth to start with. Assuming some parametric density for the highest order density functional, the pilot bandwidth is selected, and cumulatively the bandwidth parameter for estimating $f$ is obtained. The solve-the-equation plug-in rules use the same approach but, instead of assuming a pilot bandwidth parameter, optimize it by substituting it directly into the AMISE. This requires solving a non-linear equation iteratively. They achieve better performance at high computational cost. Other than the rules for bandwidth selection, there are also cross-validation methods selecting the best bandwidth from a user-defined list based on some performance criterion. But there is always a compromise between the length of the list of possible bandwidth parameters and the amount of computation.

Overall, a bandwidth selection rule that achieves a precise bandwidth parameter at low computational cost is still open for research.

## 3 Extended Rule-of-Thumb (ExROT) Bandwidth Selector

The Gaussianity assumption for the unknown PDF in estimating $R(f'')$ is too restrictive. Intuitively, better estimates of $R(f'')$ can be derived if the unknown PDF is approximated through an infinite series expansion. As a verification and demonstration of this concept, the ExROT is derived in this section using the cumulants based Gram-Charlier A Series expansion of $f(x)$, based on a near-Gaussianity assumption for the unknown PDF $f(x)$. The series is given as:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[\sum_{r=0}^{\infty}\frac{(k_r-\gamma_r)(-D)^r}{r!}\right]\exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)$$

where $k_r$ is the $r$th cumulant of $f(x)$, $\gamma_r$ is the $r$th cumulant of the Gaussian PDF $G(x)$ and $D$ is the derivative operator with respect to $x$. With the first two cumulants matched to the Gaussian reference ($k_1 = \mu$, $k_2 = \sigma^2$), taking the derivative twice with respect to $x$ yields:

$$f''(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[\sum_{r=0}^{\infty}\frac{(k_r-\gamma_r)(-D)^{(r+2)}}{r!}\right]\exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)$$

Also, the $n$th derivative of the Gaussian is given by

$$\frac{d^{(n)}G(x;\mu,\sigma)}{dx^n} = \frac{1}{\sigma^n}(-1)^n H_n(z)\, G(x;\mu,\sigma)$$

where $z = \frac{x-\mu}{\sigma}$, $G(x;\mu,\sigma)$ is the Gaussian PDF and $H_n(\cdot)$ is the $n$th order Hermite polynomial. Accordingly, approximating up to the fourth order cumulant:

$$f''(x) \approx \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)\left[\frac{1}{\sigma^2}H_2(z) - \frac{k_3}{3!\,\sigma^5}H_5(z) + \frac{k_4}{4!\,\sigma^6}H_6(z)\right]$$

$$R(f'') = T_1 + \left(\frac{k_3}{3!}\right)^2 T_2 + \left(\frac{k_4}{4!}\right)^2 T_3 - \frac{k_3}{3!}\,T_4 - \frac{k_3 k_4}{3!\,4!}\,T_5 + \frac{k_4}{4!}\,T_6 \qquad (6)$$

The integration is obtained using the following rules:

$$\int_{-\infty}^{\infty} e^{-ax^2}\,dx = \sqrt{\frac{\pi}{a}}, \qquad \int_{-\infty}^{\infty} x^{2n}\, e^{-ax^2}\,dx = \frac{(2n-1)!!}{(2a)^n}\sqrt{\frac{\pi}{a}}, \qquad \int_{-\infty}^{\infty} x^{2n+1}\, e^{-ax^2}\,dx = 0 \ \ \text{(odd integrand)}$$

where $(2n-1)!! = 1\cdot 3\cdot 5\cdots(2n-1)$ denotes the double factorial. Accordingly, the quantities in Equation (6) are obtained as under:

$$\begin{aligned}
T_1 &= \int_{-\infty}^{\infty} G^2(x)\left[\frac{1}{\sigma^2}H_2(z)\right]^2 dx = \frac{1}{\sigma^5}\cdot\frac{3}{8\sqrt{\pi}}\\
T_2 &= \int_{-\infty}^{\infty} G^2(x)\left[\frac{1}{\sigma^5}H_5(z)\right]^2 dx = \frac{1}{\sigma^{11}}\cdot\frac{945}{64\sqrt{\pi}}\\
T_3 &= \int_{-\infty}^{\infty} G^2(x)\left[\frac{1}{\sigma^6}H_6(z)\right]^2 dx = \frac{1}{\sigma^{13}}\cdot\frac{10395}{128\sqrt{\pi}}\\
T_4 &= \int_{-\infty}^{\infty} G^2(x)\left[\frac{2}{\sigma^7}H_2(z)H_5(z)\right] dx = 0\\
T_5 &= \int_{-\infty}^{\infty} G^2(x)\left[\frac{2}{\sigma^{11}}H_5(z)H_6(z)\right] dx = 0\\
T_6 &= \int_{-\infty}^{\infty} G^2(x)\left[\frac{2}{\sigma^8}H_2(z)H_6(z)\right] dx = \frac{1}{\sigma^9}\cdot\frac{105}{16\sqrt{\pi}}
\end{aligned}$$

This gives:

$$\begin{aligned}
R(f'') &= \frac{3}{8\sqrt{\pi}\,\sigma^5}\left[1 + \frac{1}{\sigma^6}\cdot\frac{315}{8}\left(\frac{k_3}{6}\right)^2 + \frac{1}{\sigma^8}\cdot\frac{3465}{16}\left(\frac{k_4}{24}\right)^2 + \frac{1}{\sigma^4}\cdot\frac{35}{2}\cdot\frac{k_4}{24}\right]\\
&= \frac{3}{8\sqrt{\pi}\,\sigma^5}\left[1 + 1.0938\,\frac{k_3^2}{\sigma^6} + 0.3760\,\frac{k_4^2}{\sigma^8} + 0.7292\,\frac{k_4}{\sigma^4}\right] \qquad (7)
\end{aligned}$$
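As a numerical sanity check (my own, not part of the article), the Gaussian moment integrals used in this derivation can be verified against the closed form with a simple midpoint rule:

```python
import math

def gauss_moment_integral(n, a, half_width=12.0, steps=200_000):
    # Numerically evaluate  integral of x^n * exp(-a x^2)  over the real
    # line by a midpoint rule on a wide truncated interval.
    dx = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * dx
        total += (x ** n) * math.exp(-a * x * x)
    return total * dx

def double_factorial(m):
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

# Closed form for even order 2n:  (2n-1)!! / (2a)^n * sqrt(pi/a)
a, n = 1.0, 2
closed = double_factorial(2 * n - 1) / (2 * a) ** n * math.sqrt(math.pi / a)
numeric = gauss_moment_integral(2 * n, a)
```

The odd-order integrals vanish by symmetry, which is what makes $T_4 = T_5 = 0$ above.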

Combining Equation (7) above with Equation (4), the Gram-Charlier A Series based Extended Rule-of-Thumb for bandwidth parameter selection, using the near-Gaussian PDF assumption and a Gaussian kernel, is given as under. As shown, Silverman's Rule-of-Thumb is a special case of the extended rule.

$$h_{\mathrm{cum}} = 1.0592\,\sigma\,(CN)^{-1/5} \qquad (8)$$

where

$$C = \begin{cases}
1 & \text{if } k_3 = k_4 = 0, \text{ i.e. a Gaussian PDF}\\[4pt]
1 + 0.3760\,\dfrac{k_4^2}{\sigma^8} + 0.7292\,\dfrac{k_4}{\sigma^4} & \text{if } k_3 = 0, \text{ i.e. a symmetric PDF}\\[4pt]
1 + 1.0938\,k_3^2 + 0.3760\,k_4^2 + 0.7292\,k_4 & \text{if } \sigma = 1\\[4pt]
1 + 1.0938\,\dfrac{k_3^2}{\sigma^6} + 0.3760\,\dfrac{k_4^2}{\sigma^8} + 0.7292\,\dfrac{k_4}{\sigma^4} & \text{otherwise}
\end{cases}$$
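A minimal sketch (my own, with an illustrative function name) of the general case of Equation (8), estimating the cumulants $k_3$ and $k_4$ from the sample central moments:

```python
import math

def exrot_bandwidth(samples):
    # Extended Rule-of-Thumb, Equation (8): h = 1.0592 * sigma * (C*N)^(-1/5),
    # with C built from the sample cumulants k3 and k4 (the general case).
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    sigma = math.sqrt(m2)
    k3 = m3                      # third cumulant = third central moment
    k4 = m4 - 3.0 * m2 * m2      # fourth cumulant
    C = (1.0
         + 1.0938 * k3 ** 2 / sigma ** 6
         + 0.3760 * k4 ** 2 / sigma ** 8
         + 0.7292 * k4 / sigma ** 4)
    return 1.0592 * sigma * (C * n) ** (-0.2)
```

The sketch assumes $C > 0$; like the ROT, it needs only a single pass over the data to accumulate the moments.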

## 4 Performance Analysis of ExROT Bandwidth Selector

Two separate experiments have been performed to test the performance of the bandwidth selectors. In both experiments, the performance is tested on a set of 15 densities selected as a test-bed for density estimators by J. S. Marron (1992). The densities are shown in Figure 1. In both experiments, the performance of ExROT is compared against Silverman's Rule-of-Thumb and the $\epsilon$-exact approximation algorithm based solve-the-equation plug-in rule (Raykar and Duraiswami, 2005, 2006).

### 4.1 Experiment 1 (Performance against varying PDFs)

The first experiment tests the performance against varying PDFs. It is done with 50000 samples and 100 trials; each entry in Table 1 is an average over the 100 trials. The three selected rules are compared on three parameters: the value of the bandwidth estimated, the corresponding IMSE between the estimated PDF and the theoretical PDF, and the time taken in seconds to estimate the bandwidth. The theoretical PDF, required to calculate the IMSE, is obtained from the normal mixture equations; that is why all 15 selected densities are generated as normal mixtures.

The $\epsilon$-exact solve-the-equation plug-in rule is the best, giving the minimum IMSE in all cases except the pure Gaussian density, for which the ROT is slightly better than the remaining two rules. The best IMSE performance of the $\epsilon$-exact solve-the-equation plug-in rule comes at the cost of very high computation time. The mean time to estimate the bandwidth parameter for the ROT is less than one millisecond; for the ExROT it is about 10 to 20 milliseconds; and for the $\epsilon$-exact solve-the-equation plug-in rule it is about 30 to 60 seconds. That means the ExROT has time complexity comparable to that of the ROT, so the IMSE performances of these two need a comparison. The boldface values in the IMSE comparison of Table 1 show the better of the two. In all non-Gaussian unimodal density estimation cases - skewed, kurtotic or with outliers - the ExROT has outperformed the ROT. The worse performance of the ExROT in multimodal density estimation is due to wrong estimation of the skewness and kurtosis, though the ExROT has also outperformed the ROT in some of the multimodal cases - the claw and the asymmetric claw. Thus, the ExROT surely is a better option than the ROT for unimodal density estimation.

### 4.2 Experiment 2 (Performance against varying number of samples)

The second experiment compares the same three bandwidth estimators against a varying number of samples. The results for the estimated bandwidth parameter, the IMSE and the estimation time (in seconds) versus the number of samples, for varying PDFs, are tabulated in Table 2. For better interpretation, the IMSE versus the log of the number of samples is plotted for all 15 PDFs, with the number of samples varying from 100 to 50000, as shown in Figure 2.

The IMSE performance against a varying number of samples for varying PDFs is similar to that discussed in Experiment 1. The ExROT performance is better for unimodal skewed, kurtotic or with-outlier densities; it is even better in some cases of multimodal skewed densities. Also, for a small number of samples the ExROT performance is better compared to the ROT. The convergence of the ExROT matches the other two rules: the IMSE decreases almost inversely with an increase in the number of samples.

## 5 Prerequisites for Multivariate ExROT

The derivation of the multivariate ExROT requires expressions for multivariate KDE, the multivariate Taylor Series, multivariate cumulants, multivariate Hermite polynomials, the multivariate GCA Series and the multivariate AMISE for bandwidth selection in KDE. For the reasons discussed in Section 1, the article uses these expressions in vector notation. The required prerequisites are reported in this section and the multivariate AMISE is derived in the next section.

### 5.1 KDE for Multivariate PDF

The nonparametric estimation of a multivariate PDF requires a multivariate kernel, which can be derived in two ways. The most popular technique is to use a product kernel, i.e. a d-dimensional kernel obtained as the product of one-dimensional kernels in each direction. The product kernel may have unequal spread or bandwidth in each direction. The other way is to select a spherically symmetric multidimensional PDF as the kernel, called a spherical kernel.

Let $\mathbf{x}$ be a d-dimensional random vector and $K(\cdot)$ be the d-dimensional kernel. Then, the estimate of the multivariate PDF of $\mathbf{x}$ from the available $N$ realizations can be given as under:

$$\hat f(\mathbf{x}) = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{\det(\mathbf{H})}K\!\left(\mathbf{H}^{-1}(\mathbf{x}-\mathbf{x}_i)\right) = \frac{1}{N}\sum_{i=1}^{N}K_{\mathbf{H}}(\mathbf{x}-\mathbf{x}_i) \qquad (9)$$

where $\mathbf{H}$ is the bandwidth matrix, $\det(\cdot)$ denotes the determinant of a matrix and $K_{\mathbf{H}}(\mathbf{u}) = \frac{1}{\det(\mathbf{H})}K(\mathbf{H}^{-1}\mathbf{u})$. For correlated data, $\mathbf{H}$ is made proportional to the covariance matrix to compensate for the mutual correlations. Usually, the kernel is a multiplicative kernel given by $K(\mathbf{u}) = \prod_{j=1}^{d}K(u_j)$. Also, it is a symmetric, positive definite, zero mean and bounded function; usually a PDF.

Let $\mathcal{F}$ be the class of all symmetric, positive definite $d\times d$ matrices, so that $\mathbf{H}\in\mathcal{F}$. A simplification can be achieved if $\mathbf{H}\in\mathcal{D}\subset\mathcal{F}$, where $\mathcal{D}$ is the set of diagonal matrices; more precisely, $\mathbf{H} = \mathrm{diag}(h_1, h_2, \ldots, h_d)$. The multivariate PDF estimate from the available $N$ realizations is now given as under:

$$\hat f(\mathbf{x}) = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{h_1 h_2\cdots h_d}K\!\left(\frac{x_1-x_{i1}}{h_1},\ldots,\frac{x_d-x_{id}}{h_d}\right) \qquad (10)$$

$$= \frac{1}{N}\sum_{i=1}^{N}\left\{\prod_{j=1}^{d} h_j^{-1}K\!\left(\frac{x_j-x_{ij}}{h_j}\right)\right\} \qquad (11)$$

Further simplification is achieved if $\mathbf{H}\in\mathcal{S}\subset\mathcal{D}$, where $\mathcal{S}$ is the set of matrices with constant diagonal; more precisely, $\mathbf{H} = h\,\mathbf{I}_{d\times d}$, i.e. equal bandwidth in all directions.
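The equal-bandwidth product-kernel case of Equations (10)-(11) can be sketched as follows (my own illustration, not from the article; names are illustrative):

```python
import math

def gaussian_kernel(u):
    # One-dimensional Gaussian kernel used as the factor kernel.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def product_kde(x, samples, h):
    # Equations (10)-(11) with equal bandwidth h in all directions
    # (H = h * I): product of one-dimensional kernels per coordinate.
    n = len(samples)
    d = len(x)
    total = 0.0
    for xi in samples:
        prod = 1.0
        for j in range(d):
            prod *= gaussian_kernel((x[j] - xi[j]) / h) / h
        total += prod
    return total / n
```

Because each one-dimensional factor integrates to 1, the product kernel integrates to 1 over $\mathbb{R}^d$, so the estimate remains a valid PDF.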

### 5.2 Taylor Series of a multivariate function

Let $\mathbf{x}$ be a d-dimensional column vector and $f(\mathbf{x})$ be a function of several variables, differentiable in each variable. The definition of the Kronecker product and its application to the vector derivative operator are briefed in Appendix B. Using them, the Taylor Series in vector notation for $f(\mathbf{x})$, expanded at the origin, is given as under:

$$f(\mathbf{x}) = \sum_{m=0}^{\infty}\frac{1}{m!}\,\mathbf{k}(m,d)'\,\mathbf{x}^{\otimes m} \qquad (12)$$

where $\mathbf{k}(m,d)$ is a vector of dimension $d^m$, given in terms of the derivative vector as

$$\mathbf{k}(m,d) = \left.\left(D_{\mathbf{x}}^{\otimes m} f(\mathbf{x})\right)\right|_{\mathbf{x}=0}$$

### 5.3 Moments, Cumulants and Hermite Polynomials

Let $\mathbf{x}$ be a d-dimensional random vector and $f(\mathbf{x})$ be its joint PDF, differentiable in each variable. Also, for $\boldsymbol{\lambda}\in\mathbb{R}^d$, let $F_{\mathbf{x}}(\boldsymbol{\lambda})$ be the characteristic function, $M(\boldsymbol{\lambda})$ be the moment generating function (MGF), $C_{\mathbf{x}}(\boldsymbol{\lambda})$ be the second form of the characteristic function obtained through the natural log of $F_{\mathbf{x}}(\boldsymbol{\lambda})$, and $C(\boldsymbol{\lambda})$ be the cumulant generating function (CGF). Then, assuming $M(\boldsymbol{\lambda})$ and $F_{\mathbf{x}}(\boldsymbol{\lambda})$ have been expanded using the Taylor Series, the $k$th order moment vector is

$$\mathbf{m}(k,d) = \int_{-\infty}^{\infty}\mathbf{x}^{\otimes k}f(\mathbf{x})\,d\mathbf{x} = \left.D_{\boldsymbol{\lambda}}^{\otimes k}M(\boldsymbol{\lambda})\right|_{\boldsymbol{\lambda}=0} = \left.(-i)^k D_{\boldsymbol{\lambda}}^{\otimes k}F_{\mathbf{x}}(\boldsymbol{\lambda})\right|_{\boldsymbol{\lambda}=0} \qquad (13)$$

Similarly, assuming $C(\boldsymbol{\lambda})$ and $C_{\mathbf{x}}(\boldsymbol{\lambda})$ have been expanded using the Taylor Series, the $k$th order cumulant vector is

$$\mathbf{c}(k,d) = \left.D_{\boldsymbol{\lambda}}^{\otimes k}C(\boldsymbol{\lambda})\right|_{\boldsymbol{\lambda}=0} = \left.(-i)^k D_{\boldsymbol{\lambda}}^{\otimes k}C_{\mathbf{x}}(\boldsymbol{\lambda})\right|_{\boldsymbol{\lambda}=0} \qquad (14)$$

where $\mathbf{m}(k,d)$ and $\mathbf{c}(k,d)$ are vectors of dimension $d^k$. The details on the moments, cumulants and their relationships can be found in (Bhaveshkumar, 2015a).

The multivariate Hermite polynomials are defined as under in Equation (15).

$$\mathbf{H}_k(\mathbf{x};0,\mathbf{C}_{\mathbf{x}}) = \left[G(\mathbf{x};0,\mathbf{C}_{\mathbf{x}})\right]^{-1}(-1)^k\left(\mathbf{C}_{\mathbf{x}}D_{\mathbf{x}}\right)^{\otimes k}G(\mathbf{x};0,\mathbf{C}_{\mathbf{x}}) \qquad (15)$$

where

$$G(\mathbf{x};0,\mathbf{C}_{\mathbf{x}}) = |\mathbf{C}_{\mathbf{x}}|^{-1/2}(2\pi)^{-d/2}\exp\left(-\tfrac{1}{2}\,\mathbf{x}'\mathbf{C}_{\mathbf{x}}^{-1}\mathbf{x}\right) = |\mathbf{C}_{\mathbf{x}}|^{-1/2}(2\pi)^{-d/2}\exp\left(-\tfrac{1}{2}\left(\mathrm{vec}\,\mathbf{C}_{\mathbf{x}}^{-1}\right)'\mathbf{x}^{\otimes 2}\right) \qquad (16)$$

The d-variate Hermite polynomials, derived by Terdik (2002, Proposition 6), are given by the following recursion rule:

$$\mathbf{H}_0(\mathbf{x}) = 1; \qquad \mathbf{H}_1(\mathbf{x}) = \mathbf{x} \qquad (17)$$

For $n > 1$:

$$\mathbf{H}_n(\mathbf{x}_{(1:n)}) = \mathbf{H}_{n-1}(\mathbf{x}_{(1:n-1)})\otimes\mathbf{x} - \sum_{j=1}^{n-1}\mathbf{K}^{-1}_{P(n,j)\to(1,2)}(d_{(1:n)})\left(\mathbf{c}_{n,j}(2,d)\otimes\mathbf{H}_{n-2}(\mathbf{x}_{(1:j-1,\,j+1:n-1)})\right) \qquad (18)$$

where the permutation $P(n,j)\to(1,2)$ puts the $n$th element in the first place and the $(j+1)$th element in the second place, keeping the other elements unchanged. More precisely,

$$\begin{aligned}
\mathbf{K}^{-1}_{P(n,j)\to(1,2)}(d_{(1:n)}) &= \left(\mathbf{K}_{P_{j+1\to 2}}(d_n, d_{(1:n-1)})\,\mathbf{K}_{P_{n\to 1}}(d_n, d_{(1:n-1)})\right)^{-1}\\
&= \mathbf{K}^{-1}_{P_{n\to 1}}(d_n, d_{(1:n-1)})\,\mathbf{K}^{-1}_{P_{j+1\to 2}}(d_n, d_{(1:n-1)})
\end{aligned}$$

### 5.4 A Compact Derivation for Multivariate GGC Series and GCA Series

A full derivation of the multivariate GGC and the corresponding GCA can be found in Bhaveshkumar (2015a); what follows is a compact derivation of the Generalized Gram-Charlier (GGC) Series, with the detailed proof in that article. Let $F_{\mathbf{x}}(\boldsymbol{\lambda})$ be the characteristic function of the unknown multivariate PDF $f(\mathbf{x})$ and $F_{\psi}(\boldsymbol{\lambda})$ be the characteristic function of a reference multivariate PDF $\psi(\mathbf{x})$. Also, let $\mathbf{c}(k,d)$ be the $k$th cumulant vector of the d-dimensional $f(\mathbf{x})$ and $\mathbf{c}_r(k,d)$ be the $k$th cumulant vector of the d-dimensional $\psi(\mathbf{x})$. The difference of the cumulant vectors is given by $\boldsymbol{\delta}(k,d) = \mathbf{c}(k,d) - \mathbf{c}_r(k,d)$. A compact derivation of the GGC follows.

Let $\otimes$ be the Kronecker product operator and $F(\cdot)$ be the Fourier transform operator. Then,

$$\begin{aligned}
F_{\mathbf{x}}(\boldsymbol{\lambda}) &= \exp\left[\sum_{k=1}^{\infty}\frac{\mathbf{c}(k,d)'(i\boldsymbol{\lambda})^{\otimes k}}{k!}\right] \quad (\because \text{definition}) \qquad (19)\\
&= \exp\left[\sum_{k=1}^{\infty}\frac{\boldsymbol{\delta}(k,d)'(i\boldsymbol{\lambda})^{\otimes k}}{k!}\right]\exp\left[\sum_{k=1}^{\infty}\frac{\mathbf{c}_r(k,d)'(i\boldsymbol{\lambda})^{\otimes k}}{k!}\right] \qquad (20)\\
&= \left[\sum_{k=0}^{\infty}\frac{\boldsymbol{\alpha}(k,d)'(i\boldsymbol{\lambda})^{\otimes k}}{k!}\right]F(\psi(\mathbf{x})) \qquad (21)
\end{aligned}$$

Taking the inverse Fourier transform of the above equation gives

$$f(\mathbf{x}) = \sum_{k=0}^{\infty}\boldsymbol{\alpha}(k,d)'\,\frac{(-1)^k}{k!}\,\delta^{(k)}(\mathbf{x}) * \psi(\mathbf{x}) \qquad (22)$$

$$f(\mathbf{x}) = \sum_{k=0}^{\infty}\boldsymbol{\alpha}(k,d)'\,\frac{(-1)^k}{k!}\,\psi^{(k)}(\mathbf{x}) \qquad (23)$$

where $*$ indicates convolution and $\delta^{(k)}(\mathbf{x})$ the $k$th order derivative of the Dirac delta function. Thus, the GGC is obtained in a compact way.

The coefficients $\boldsymbol{\alpha}(k,d)$ can be derived using their equality in terms of $\boldsymbol{\delta}(k,d)$ as:

$$\sum_{k=0}^{\infty}\frac{\boldsymbol{\alpha}(k,d)'(i\boldsymbol{\lambda})^{\otimes k}}{k!} = \exp\left[\sum_{k=1}^{\infty}\frac{\boldsymbol{\delta}(k,d)'(i\boldsymbol{\lambda})^{\otimes k}}{k!}\right]$$

This results in the following relations:

$$\begin{aligned}
\boldsymbol{\alpha}(0,d) &= 1\\
\boldsymbol{\alpha}(1,d) &= \boldsymbol{\delta}(1,d)\\
\boldsymbol{\alpha}(2,d) &= \boldsymbol{\delta}(2,d) + \boldsymbol{\delta}(1,d)^{\otimes 2}\\
\boldsymbol{\alpha}(3,d) &= \boldsymbol{\delta}(3,d) + 3\,\boldsymbol{\delta}(2,d)\otimes\boldsymbol{\delta}(1,d) + \boldsymbol{\delta}(1,d)^{\otimes 3}\\
\boldsymbol{\alpha}(4,d) &= \boldsymbol{\delta}(4,d) + 4\,\boldsymbol{\delta}(3,d)\otimes\boldsymbol{\delta}(1,d) + 3\,\boldsymbol{\delta}(2,d)^{\otimes 2} + 6\,\boldsymbol{\delta}(2,d)\otimes\boldsymbol{\delta}(1,d)^{\otimes 2} + \boldsymbol{\delta}(1,d)^{\otimes 4}\\
\boldsymbol{\alpha}(5,d) &= \boldsymbol{\delta}(5,d) + 5\,\boldsymbol{\delta}(4,d)\otimes\boldsymbol{\delta}(1,d) + 10\,\boldsymbol{\delta}(3,d)\otimes\boldsymbol{\delta}(2,d) + 10\,\boldsymbol{\delta}(3,d)\otimes\boldsymbol{\delta}(1,d)^{\otimes 2}\\
&\quad + 15\,\boldsymbol{\delta}(2,d)^{\otimes 2}\otimes\boldsymbol{\delta}(1,d) + 10\,\boldsymbol{\delta}(2,d)\otimes\boldsymbol{\delta}(1,d)^{\otimes 3} + \boldsymbol{\delta}(1,d)^{\otimes 5}\\
\boldsymbol{\alpha}(6,d) &= \boldsymbol{\delta}(6,d) + 6\,\boldsymbol{\delta}(5,d)\otimes\boldsymbol{\delta}(1,d) + 15\,\boldsymbol{\delta}(4,d)\otimes\boldsymbol{\delta}(2,d) + 15\,\boldsymbol{\delta}(4,d)\otimes\boldsymbol{\delta}(1,d)^{\otimes 2} + 10\,\boldsymbol{\delta}(3,d)^{\otimes 2}\\
&\quad + 60\,\boldsymbol{\delta}(3,d)\otimes\boldsymbol{\delta}(2,d)\otimes\boldsymbol{\delta}(1,d) + 20\,\boldsymbol{\delta}(3,d)\otimes\boldsymbol{\delta}(1,d)^{\otimes 3} + 15\,\boldsymbol{\delta}(2,d)^{\otimes 3}\\
&\quad + 45\,\boldsymbol{\delta}(2,d)^{\otimes 2}\otimes\boldsymbol{\delta}(1,d)^{\otimes 2} + 15\,\boldsymbol{\delta}(2,d)\otimes\boldsymbol{\delta}(1,d)^{\otimes 4} + \boldsymbol{\delta}(1,d)^{\otimes 6}
\end{aligned}$$
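As a sanity check (my own, not part of the article), these relations can be verified numerically in the univariate case, where the Kronecker powers reduce to ordinary powers and the $\boldsymbol{\alpha}$ coefficients are exactly the power-series coefficients of $\exp\left[\sum_k \delta_k t^k/k!\right]$:

```python
from fractions import Fraction
from math import factorial

def exp_series(a, order):
    # Coefficients b[n] of exp(sum_{k>=1} a[k] t^k) truncated at `order`,
    # via the recurrence b' = A' b  =>  n*b[n] = sum_k k*a[k]*b[n-k].
    b = [Fraction(0)] * (order + 1)
    b[0] = Fraction(1)
    for n in range(1, order + 1):
        b[n] = sum(Fraction(k) * a[k] * b[n - k] for k in range(1, n + 1)) / n
    return b

# Pick arbitrary exact cumulant differences delta_k and read off
# alpha_k = k! * [t^k] exp(sum delta_k t^k / k!).
order = 6
delta = [Fraction(0), Fraction(1, 2), Fraction(-1, 3),
         Fraction(2), Fraction(1), Fraction(-1), Fraction(3)]
a = [Fraction(0)] + [delta[k] / factorial(k) for k in range(1, order + 1)]
b = exp_series(a, order)
alpha = [b[k] * factorial(k) for k in range(order + 1)]

# e.g. alpha_4 = d4 + 4 d3 d1 + 3 d2^2 + 6 d2 d1^2 + d1^4
d1, d2, d3, d4 = delta[1], delta[2], delta[3], delta[4]
alpha4_formula = d4 + 4*d3*d1 + 3*d2**2 + 6*d2*d1**2 + d1**4
```

Exact rational arithmetic makes the comparison free of rounding error, so any coefficient mismatch would show up immediately.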

The multivariate GGC, for the specific Gaussian PDF reference with matching first and second order moments, can be obtained by taking $\psi(\mathbf{x}) = G(\mathbf{x})$, $\boldsymbol{\delta}(1,d) = 0$ and $\boldsymbol{\delta}(2,d) = 0$. The GGC with respect to a Gaussian PDF is identified as the GCA Series, given as under:

$$f(\mathbf{x}) = G(\mathbf{x}) - \frac{\mathbf{c}(3,d)'}{3!}\mathbf{G}^{(3)}(\mathbf{x}) + \frac{\mathbf{c}(4,d)'}{4!}\mathbf{G}^{(4)}(\mathbf{x}) - \frac{\mathbf{c}(5,d)'}{5!}\mathbf{G}^{(5)}(\mathbf{x}) + \frac{\mathbf{c}(6,d)' + 10\,\mathbf{c}(3,d)^{\otimes 2\,\prime}}{6!}\mathbf{G}^{(6)}(\mathbf{x}) + \ldots \qquad (24)$$

## 6 AMISE in vector notations for bandwidth selection in multivariate KDE

The derivation of the bandwidth parameter selection rule satisfying the AMISE between the estimated multivariate PDF and the actual multivariate PDF can be found in (Wand and Jones, 1994). For the reasons described before, it is re-derived here in vector notation using the Kronecker product and the differential operator as a column vector $D_{\mathbf{x}} = \left[\frac{\partial}{\partial x_1},\ldots,\frac{\partial}{\partial x_d}\right]'$.

$$\mathrm{MISE}(f(\mathbf{x}),\hat f(\mathbf{x})) = \int_{-\infty}^{\infty}\mathrm{Bias}^2(\hat f(\mathbf{x}))\,d\mathbf{x} + \int_{-\infty}^{\infty}\mathrm{Var}(\hat f(\mathbf{x}))\,d\mathbf{x} \qquad (25)$$

The bias and variance estimations, using the Taylor Series expansion in Equation (12), are derived as under.

$$\begin{aligned}
E\{\hat f(\mathbf{x})\} &= \frac{1}{N}\sum_{i=1}^{N}E\{K_{\mathbf{H}}(\mathbf{x}-\mathbf{x}_i)\} = \int_{-\infty}^{\infty}K_{\mathbf{H}}(\mathbf{u}-\mathbf{x})f(\mathbf{u})\,d\mathbf{u}\\
&= \int_{-\infty}^{\infty}K(\mathbf{z})\left(f(\mathbf{x}) + \mathbf{k}(1,d)'(\mathbf{H}\mathbf{z}) + \frac{1}{2}\mathbf{k}(2,d)'(\mathbf{H}\mathbf{z})^{\otimes 2} + O\!\left(\mathrm{tr}(\mathbf{H}^{\otimes 2})\right)\right)d\mathbf{z}
\end{aligned}$$

where $\mathbf{z} = \mathbf{H}^{-1}(\mathbf{u}-\mathbf{x})$ and $\mathbf{k}(m,d) = D_{\mathbf{x}}^{\otimes m}f(\mathbf{x})$. Assuming a kernel that is a symmetric and bounded PDF, the following properties are satisfied:

$$\int_{-\infty}^{\infty}\mathbf{u}\,K(\mathbf{u})\,d\mathbf{u} = 0 \qquad (26)$$

$$\int_{-\infty}^{\infty}\mathbf{u}\otimes\mathbf{u}\,K(\mathbf{u})\,d\mathbf{u} = \mathbf{m}_K(2,d) = \mu_2(K)\,\mathrm{vec}(\mathbf{I}_{d\times d}) \qquad (27)$$

$$= \mu_2(K)\,\boldsymbol{\delta}_2, \quad \text{where } \boldsymbol{\delta}_2 = \mathrm{vec}(\mathbf{I}_{d\times d}) \qquad (28)$$

Here, $\mathbf{m}_K(2,d)$ is the second order moment vector of the d-variate kernel, with $d^2$ components, and $\boldsymbol{\delta}_2 = \mathrm{vec}(\mathbf{I}_{d\times d})$ is a vector of size $d^2$ with only the selected $d$ values being 1. The second order moment of each component is a constant with value $\mu_2(K)$ and all cross-moments are zero. Using these properties, the bias can be written as:

$$\Rightarrow \mathrm{Bias}(\hat f(\mathbf{x})) = E\{\hat f(\mathbf{x})\} - f(\mathbf{x}) \approx \frac{1}{2}\,\mathbf{k}(2,d)'\left(\mathbf{H}^{\otimes 2}\,\mathbf{m}_K(2,d)\right) \qquad (29)$$

Similarly, the variance of the estimation can be approximated as under:

$$\mathrm{Var}(\hat f(\mathbf{x})) = \mathrm{Var}\left(\frac{1}{N}\sum_{i=1}^{N}K_{\mathbf{H}}(\mathbf{x}-\mathbf{x}_i)\right)$$