Automatic topography of high-dimensional data sets by non-parametric Density Peak clustering

02/28/2018
by Maria d'Errico, et al.

Data analysis in high-dimensional spaces aims at obtaining a synthetic description of a data set, revealing its main structure and its salient features. Here we introduce an approach for charting data spaces, providing a topography of the probability distribution from which the data are drawn. This topography includes information on the number and height of the probability peaks, the depth of the "valleys" separating them, the relative location of the peaks, and their hierarchical organization. The topography is reconstructed using an unsupervised variant of Density Peak clustering that exploits a non-parametric density estimator, which automatically measures the density in the manifold containing the data. Importantly, the density estimator also provides an estimate of the error on the estimated density. This is a key feature, since it allows genuine probability peaks to be distinguished from density fluctuations due to finite sampling.
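To make the idea concrete, here is a minimal sketch of a Density Peak style clustering in Python. It is not the method of the paper: it swaps in a plain fixed-k nearest-neighbour density estimator (whose log-density error is roughly 1/sqrt(k)) for the paper's non-parametric estimator, and it asks for the number of clusters instead of finding it automatically. The function names, the parameter z, and the form of the significance rule in peak_is_significant are assumptions made for illustration only.

```python
import math
import numpy as np


def knn_log_density(X, k):
    """Fixed-k nearest-neighbour log density (assumed stand-in estimator).

    Returns the log density, an approximate error on it, and the
    pairwise distance matrix. Assumes no duplicate points.
    """
    n, d = X.shape
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    r_k = np.sort(dist, axis=1)[:, k]  # column 0 is the point itself
    # Log volume of the unit ball in d dimensions.
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    log_rho = math.log(k) - math.log(n) - log_unit_ball - d * np.log(r_k)
    # For a fixed-k estimator the spread of log density is about 1/sqrt(k).
    err = np.full(n, 1.0 / math.sqrt(k))
    return log_rho, err, dist


def density_peak_labels(log_rho, dist, n_clusters):
    """Assign each point to a peak by following its nearest denser point."""
    n = len(log_rho)
    order = np.argsort(-log_rho)  # densest point first
    delta = np.empty(n)           # distance to the nearest denser point
    parent = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i], parent[i] = dist[i, j], j
    # Centres: high density and far from any denser point.
    gamma = np.exp(log_rho - log_rho.max()) * delta
    gamma[order[0]] = np.inf      # the global maximum is always a centre
    centres = np.argsort(-gamma)[:n_clusters]
    labels = np.full(n, -1)
    labels[centres] = np.arange(n_clusters)
    for i in order:               # parents are always processed first
        if labels[i] < 0:
            labels[i] = labels[parent[i]]
    return labels, centres


def peak_is_significant(log_peak, log_saddle, err_peak, err_saddle, z=2.0):
    """Hypothetical rule: a peak counts as genuine only if it rises above
    the saddle by more than z times the combined error."""
    return (log_peak - log_saddle) > z * (err_peak + err_saddle)
```

A typical call would be log_rho, err, dist = knn_log_density(X, k=10) followed by density_peak_labels(log_rho, dist, n_clusters=2). The error array is what a merging step would feed into a rule such as peak_is_significant, so that shallow peaks compatible with sampling noise are not reported as separate clusters.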


research
10/08/2022

An Efficient and Continuous Voronoi Density Estimator

We introduce a non-parametric density estimator deemed Radial Voronoi De...
research
06/15/2020

Occam's Ghost

This article applies the principle of Occam's Razor to non-parametric mo...
research
11/10/2021

A K-function for inhomogeneous random measures with geometric features

This paper introduces a K-function for assessing second-order properties...
research
02/14/2008

FINE: Fisher Information Non-parametric Embedding

We consider the problems of clustering, classification, and visualizatio...
research
01/10/2013

Discovering Multiple Constraints that are Frequently Approximately Satisfied

Some high-dimensional data sets can be modelled by assuming that there a...
research
06/05/2020

Overcoming the Curse of Dimensionality in Density Estimation with Mixed Sobolev GANs

We propose a novel GAN framework for non-parametric density estimation w...
research
05/14/2010

Hierarchical Clustering for Finding Symmetries and Other Patterns in Massive, High Dimensional Datasets

Data analysis and data mining are concerned with unsupervised pattern fi...
