Data-driven density derivative estimation, with applications to nonparametric clustering and bump hunting

04/27/2012
by José E. Chacón, et al.

Important information concerning a multivariate data set, such as clusters and modal regions, is contained in the derivatives of the probability density function. Despite this importance, nonparametric estimation of higher-order derivatives of the density function has received relatively scant attention. Kernel estimators of density functions are widely used because they exhibit excellent theoretical and practical properties, but their generalization to density derivatives has progressed more slowly, owing to the mathematical intractabilities encountered in the crucial problem of bandwidth (or smoothing parameter) selection. This paper presents the first fully automatic, data-based bandwidth selectors for multivariate kernel density derivative estimators. This is achieved by synthesizing recent advances in matrix analytic theory that allow mathematically and computationally tractable representations of higher-order derivatives of multivariate vector-valued functions. The theoretical asymptotic properties as well as the finite sample behaviour of the proposed selectors are studied. In addition, we explore in detail the applications of the new data-driven methods to two other statistical problems: clustering and bump hunting. The introduced techniques are combined with the mean shift algorithm to develop novel automatic, nonparametric clustering procedures, which are shown to outperform mixture-model cluster analysis and other recent nonparametric approaches in practice. Furthermore, the advantage of using smoothing parameters designed for density derivative estimation in feature significance analysis for bump hunting is illustrated with a real data example.
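To make the link between density derivative estimation and mean shift clustering concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian kernel and a bandwidth matrix `H` supplied by hand; the data-based selectors proposed in the paper are not reproduced here. The function names `kde_gradient`, `mean_shift` and `label_modes` are hypothetical. With a Gaussian kernel, the mean shift update (a kernel-weighted average of the data) equals a step from the current point along `H` times the estimated gradient divided by the estimated density, so each point climbs towards a mode of the kernel density estimate.

```python
import numpy as np


def kde_gradient(x, data, H):
    """Gaussian-kernel estimate of the density gradient at a single point x.

    data is an (n, d) sample and H a (d, d) symmetric positive-definite
    bandwidth matrix (chosen by the user in this sketch).
    """
    n, d = data.shape
    H_inv = np.linalg.inv(H)
    diff = data - x                                      # rows are X_i - x
    quad = np.einsum("ij,jk,ik->i", diff, H_inv, diff)   # (X_i - x)' H^-1 (X_i - x)
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(H) ** (-0.5)
    weights = norm * np.exp(-0.5 * quad)                 # K_H(x - X_i)
    # Gradient of K_H(x - X_i) with respect to x is K_H(x - X_i) * H^-1 (X_i - x).
    return (weights[:, None] * (diff @ H_inv)).mean(axis=0)


def mean_shift(data, H, tol=1e-6, max_iter=500):
    """Move every data point uphill until the largest shift is below tol."""
    H_inv = np.linalg.inv(H)
    points = data.astype(float).copy()
    for _ in range(max_iter):
        diff = points[:, None, :] - data[None, :, :]     # shape (m, n, d)
        quad = np.einsum("mni,ij,mnj->mn", diff, H_inv, diff)
        w = np.exp(-0.5 * quad)                          # unnormalised kernel weights
        new_points = (w @ data) / w.sum(axis=1, keepdims=True)
        shift = np.linalg.norm(new_points - points, axis=1).max()
        points = new_points
        if shift < tol:
            break
    return points


def label_modes(modes, merge_tol=1e-2):
    """Assign the same label to points whose limits lie within merge_tol."""
    labels = np.empty(len(modes), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < merge_tol:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)
```

A typical call is `labels, centers = label_modes(mean_shift(data, H))`. The number of clusters is not specified in advance: it is the number of distinct modes reached, which is why the choice of `H` matters so much and why a bandwidth tuned for gradient estimation, rather than for the density itself, is the natural target.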


