Pruning nearest neighbor cluster trees

05/03/2011
by Samory Kpotufe, et al.

Nearest neighbor (k-NN) graphs are widely used in machine learning and data mining applications, and our aim is to better understand what they reveal about the cluster structure of the unknown underlying distribution of points. In particular, is it possible to identify spurious structures that arise from sampling variability? Our first contribution is a statistical analysis showing that certain subgraphs of a k-NN graph form a consistent estimator of the cluster tree of the underlying distribution of points. Our second and perhaps most important contribution is a finite sample guarantee: we carefully work out the tradeoff between aggressive and conservative pruning, and guarantee the removal of all spurious cluster structures at all levels of the tree while also guaranteeing the recovery of salient clusters. This is the first such finite sample result in the context of clustering.
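To make the idea of reading cluster structure off subgraphs of a k-NN graph more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the construction analyzed in the paper: the abstract does not specify which subgraphs are used, so the sketch assumes one common choice, keeping the points whose distance to their k-th nearest neighbor is at most a threshold `r` and taking connected components among them. The function names `knn_lists` and `level_components` and the parameter `r` are hypothetical, and no pruning step is implemented.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def knn_lists(
    points: Sequence[Sequence[float]], k: int
) -> Tuple[List[List[int]], List[float]]:
    """Brute-force k-NN: neighbor indices and k-NN radius for each point."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    neighbors: List[List[int]] = []
    radii: List[float] = []
    for i in range(n):
        # Sort all other points by distance to point i (ties broken by index).
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
        radii.append(dists[k - 1][0])  # distance to the k-th nearest neighbor
    return neighbors, radii


def level_components(
    points: Sequence[Sequence[float]], k: int, r: float
) -> List[List[int]]:
    """Connected components of the k-NN graph restricted to dense points.

    Assumption of this sketch: a point counts as dense at level r when its
    k-NN radius is at most r (a small radius suggests high local density).
    """
    neighbors, radii = knn_lists(points, k)
    keep: Set[int] = {i for i, radius in enumerate(radii) if radius <= r}

    # Undirected adjacency among the kept points only.
    adj: Dict[int, Set[int]] = {i: set() for i in keep}
    for i in keep:
        for j in neighbors[i]:
            if j in keep:
                adj[i].add(j)
                adj[j].add(i)

    # Depth-first search to collect components, ordered by smallest index.
    components: List[List[int]] = []
    seen: Set[int] = set()
    for start in sorted(keep):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(sorted(comp))
    return components
```

In this sketch, increasing `r` only adds points and edges, so the components at a smaller `r` are contained in those at a larger `r`. Sweeping `r` therefore yields a nested family of components that can be read as a tree. Deciding which of its branches are spurious is the pruning question the abstract addresses, and it is left out here.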


