Empirical estimation of entropy functionals with confidence

12/19/2010
by Kumar Sricharan, et al.

This paper introduces a class of k-nearest neighbor (k-NN) estimators called bipartite plug-in (BPI) estimators for estimating integrals of non-linear functions of a probability density, such as Shannon entropy and Rényi entropy. The density is assumed to be smooth, to have bounded support, and to be uniformly bounded from below on this support. Unlike previous k-NN estimators of non-linear density functionals, the proposed estimator uses data splitting and boundary correction to achieve lower mean squared error (MSE). Specifically, we assume that T i.i.d. samples X_i ∈ R^d from the density are split into two parts of cardinality M and N respectively, with the M samples used to compute a k-nearest-neighbor density estimate and the remaining N samples used for empirical estimation of the integral of the density functional. By studying the statistical properties of k-NN balls, explicit rates for the bias and variance of the BPI estimator are derived in terms of the sample size, the dimension of the samples, and the underlying probability distribution. Based on these results, the tuning parameters M/T and k can be chosen to maximize the rate of decrease of the MSE. The resulting optimized BPI estimator converges faster and achieves lower MSE than previous k-NN entropy estimators. In addition, a central limit theorem is established for the BPI estimator, allowing tight asymptotic confidence intervals to be specified.
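To make the data-splitting construction concrete, here is a minimal sketch of a bipartite plug-in estimate of Shannon entropy, assuming the standard k-NN density estimate f_hat(x) = k / (M c_d R_k(x)^d). The boundary correction and the paper's optimized choices of M/T and k are omitted, and the function name and parameter values are illustrative only.

```python
# Minimal sketch of a bipartite plug-in (BPI) Shannon entropy estimate.
# Boundary correction and the paper's optimized M/T and k are omitted.
import numpy as np
from scipy.special import gamma
from scipy.spatial import cKDTree

def bpi_shannon_entropy(X, k=5, split=0.5):
    """Plug-in estimate of H(f) = -E[log f(X)] with data splitting.

    X : (T, d) array of i.i.d. samples. The first M = split*T samples
    build the k-NN density estimate; the remaining N samples form the
    empirical average. Assumes k >= 2.
    """
    T, d = X.shape
    M = int(split * T)
    X_density, X_eval = X[:M], X[M:]

    # k-NN density estimate f_hat(x) = k / (M * c_d * R_k(x)^d), where
    # R_k(x) is the distance from x to its k-th nearest neighbor among
    # the M density samples and c_d is the volume of the unit d-ball.
    r_k = cKDTree(X_density).query(X_eval, k=k)[0][:, -1]
    c_d = np.pi ** (d / 2) / gamma(d / 2 + 1)
    f_hat = k / (M * c_d * r_k ** d)

    # Empirical plug-in average over the held-out N samples. Without
    # the paper's bias and boundary corrections this estimate is biased,
    # especially for small k and densities with bounded support.
    return -np.mean(np.log(f_hat))

# Illustrative usage: 2-D standard normal, whose true Shannon entropy
# is log(2*pi*e) ~ 2.8379 nats. (The normal does not satisfy the
# paper's bounded-support assumption; this is only a smoke test.)
rng = np.random.default_rng(0)
print(bpi_shannon_entropy(rng.standard_normal((4000, 2))))
```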


