Clustering Based on Pairwise Distances When the Data is of Mixed Dimensions

09/12/2009
by Ery Arias-Castro, et al.

In the context of clustering, we consider a generative model in a Euclidean ambient space with clusters of different shapes, dimensions, sizes and densities. In an asymptotic setting where the number of points becomes large, we obtain theoretical guarantees for a few emblematic methods based on pairwise distances: a simple algorithm based on the extraction of connected components in a neighborhood graph; the spectral clustering method of Ng, Jordan and Weiss; and hierarchical clustering with single linkage. The methods are shown to enjoy some near-optimal properties in terms of separation between clusters and robustness to outliers. The local scaling method of Zelnik-Manor and Perona is shown to lead to a near-optimal choice for the scale in the first two methods. We also provide a lower bound on the spectral gap to consistently choose the correct number of clusters in the spectral method.
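
As a rough illustration of the first method named in the abstract, the sketch below labels points by the connected components of a neighborhood graph, joining two points whenever their Euclidean distance is at most a scale eps. The function name, the fixed eps, the union-find implementation and the toy data are all assumptions made for this example; the abstract does not specify them, and in the paper the scale is chosen by other means (for instance local scaling). Cutting a single-linkage dendrogram at height eps should yield the same groups.

```python
import math


def neighborhood_graph_clusters(points, eps):
    """Label points by connected component of the eps-neighborhood graph.

    Hypothetical helper for illustration only: two points are joined by an
    edge when their Euclidean distance is at most eps.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force O(n^2) scan over all pairwise distances.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    labels = []
    for i in range(n):
        r = find(i)
        labels.append(roots.setdefault(r, len(roots)))
    return labels


# Toy data (made up): a 1-D segment and a 2-D patch in the plane,
# i.e. two clusters of different intrinsic dimension.
segment = [(0.1 * k, 0.0) for k in range(11)]
patch = [(3.0 + 0.1 * a, 3.0 + 0.1 * b) for a in range(4) for b in range(4)]
points = segment + patch

labels = neighborhood_graph_clusters(points, eps=0.15)
print(labels)
```

With eps between the within-cluster spacing (0.1) and the gap between the two shapes, the segment and the patch come out as separate components. A scale that is too small fragments the clusters and one that is too large merges them, which is why the choice of scale matters for this method.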

research
01/08/2010

Spectral clustering based on local linear approximations

In the context of clustering, we assume a generative model where each cl...
research
01/09/2013

Spectral Clustering Based on Local PCA

We propose a spectral clustering method based on local principal compone...
research
12/17/2017

Path-Based Spectral Clustering: Guarantees, Robustness to Outliers, and Fast Algorithms

We consider the problem of clustering with the longest leg path distance...
research
09/26/2019

An agglomerative hierarchical clustering method by optimizing the average silhouette width

An agglomerative hierarchical clustering (AHC) framework and algorithm n...
research
11/17/2021

On prescribing total preorders and linear orders to pairwise distances of points in Euclidean space

We show that any total preorder on a set with (n choose 2) elements coincides with...
research
10/08/2020

Near-Optimal Comparison Based Clustering

The goal of clustering is to group similar objects into meaningful parti...
research
11/21/2019

Local Spectral Clustering of Density Upper Level Sets

We analyze the Personalized PageRank (PPR) algorithm, a local spectral m...
