Path-Based Spectral Clustering: Guarantees, Robustness to Outliers, and Fast Algorithms

12/17/2017
by Anna Little, et al.

We consider the problem of clustering with the longest leg path distance (LLPD) metric, which is informative for elongated and irregularly shaped clusters. We prove finite-sample guarantees on the performance of clustering with respect to this metric when random samples are drawn from multiple intrinsically low-dimensional clusters in high-dimensional space in the presence of a large number of high-dimensional outliers. By combining these results with spectral clustering with respect to LLPD, we provide conditions under which the eigengap statistic correctly determines the number of clusters for a large class of data sets, and we prove guarantees on the number of points mislabeled by the proposed algorithm. Our methods are general and yield performance guarantees for spectral clustering with any ultrametric. We also introduce an efficient, easy-to-implement approximation algorithm for the LLPD based on a multiscale analysis of adjacency graphs.
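The LLPD between two points is the minimum, over all paths joining them, of the longest edge ("leg") on the path. The Python sketch below computes it exactly on the complete Euclidean graph of a small point set. It relies on the standard fact that minimax path distances are the bottleneck edges of a minimum spanning tree: merging components Kruskal-style in order of increasing edge length assigns each pair its LLPD at the moment the two points first become connected. The function name `llpd_matrix` is hypothetical, and this quadratic-memory approach is meant only as an illustration, not as the authors' implementation.

```python
import math
from itertools import combinations

def llpd_matrix(points):
    """Exact longest-leg path distance on the complete Euclidean graph.

    LLPD(x, y) = min over paths x -> y of the longest edge on the path.
    Edges are scanned in increasing length; the edge that first joins the
    components of x and y is their LLPD (the MST bottleneck edge).
    """
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    root = list(range(n))              # root[i]: component label of point i
    members = [[i] for i in range(n)]  # members[r]: points whose label is r
    edges = sorted(
        (math.dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    for w, i, j in edges:
        ri, rj = root[i], root[j]
        if ri == rj:
            continue  # already connected through shorter legs
        for a in members[ri]:
            for b in members[rj]:
                d[a][b] = d[b][a] = w
        if len(members[ri]) < len(members[rj]):
            ri, rj = rj, ri  # merge the smaller component into the larger one
        for b in members[rj]:
            root[b] = ri
        members[ri].extend(members[rj])
        members[rj] = []
    return d

chain_a = [(float(t), 0.0) for t in range(5)]  # five points spaced 1 apart on y = 0
chain_b = [(float(t), 3.0) for t in range(5)]  # a parallel chain 3 units above
D = llpd_matrix(chain_a + chain_b)
print(D[0][4])  # 1.0: Euclidean distance 4, but every leg along the chain is 1
print(D[0][5])  # 3.0: any path to the other chain needs a leg of length >= 3
```

In Euclidean terms (0, 0) is closer to (0, 3) than to (4, 0); LLPD reverses that ordering, which is the behavior that makes it useful for elongated clusters. The resulting matrix is an ultrametric: d(x, z) <= max(d(x, y), d(y, z)) for all x, y, z.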
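The abstract describes the fast approximation only as a multiscale analysis of adjacency graphs. One simple way to illustrate that idea, assumed here rather than taken from the paper, is to build threshold graphs at a geometric sequence of scales and record the first scale at which two points fall into the same connected component. If the scales grow by a factor q and LLPD(x, y) lies between the smallest and largest scale, the estimate is at least the true LLPD and less than q times it: the path realizing the LLPD is present at every scale at or above that value, and no connecting path exists at any smaller scale. The names `components_at_scale` and `approx_llpd` are hypothetical, and the dense threshold-graph construction is for clarity only; a practical version would work on sparse nearest-neighbor graphs.

```python
import math

def components_at_scale(points, r):
    """Component labels of the graph joining points at distance <= r (union-find)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= r:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]

def approx_llpd(points, scales):
    """Estimate LLPD(x, y) as the smallest scale at which x and y are connected.

    Pairs still disconnected at the largest scale keep the value math.inf.
    """
    n = len(points)
    est = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        est[i][i] = 0.0
    for r in sorted(scales):
        labels = components_at_scale(points, r)
        for i in range(n):
            for j in range(n):
                if i != j and est[i][j] == math.inf and labels[i] == labels[j]:
                    est[i][j] = r
    return est

chains = [(float(t), 0.0) for t in range(5)] + [(float(t), 3.0) for t in range(5)]
est = approx_llpd(chains, scales=[0.5 * 2 ** k for k in range(4)])  # 0.5, 1, 2, 4
print(est[0][4])  # 1.0 (exact LLPD is 1)
print(est[0][5])  # 4.0 (exact LLPD is 3, within the factor q = 2)
```

Only one connected-components pass per scale is needed, so the number of scales, rather than the number of point pairs, controls how many graph computations are performed.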
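The eigengap statistic mentioned in the abstract can be illustrated on the same kind of toy data. The sketch below makes several choices that are assumptions, not the paper's exact construction. It computes LLPD with a minimax variant of the Floyd-Warshall recursion, forms Gaussian weights exp(-(LLPD/sigma)^2), builds the symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}, and estimates the number of clusters as the position of the largest gap among the smallest eigenvalues. Eigenvalues come from a plain cyclic Jacobi solver so that only the standard library is needed. The parameters `sigma` and `k_max` and all function names are hypothetical.

```python
import math

def llpd(points):
    """Exact LLPD via the minimax Floyd-Warshall recursion (O(n^3), small n only)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # a path through k is only as bad as its longer half
                d[i][j] = min(d[i][j], max(d[i][k], d[k][j]))
    return d

def jacobi_eigenvalues(a, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, ascending."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p, q: a <- a J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p, q: a <- J^T a
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def estimate_num_clusters(points, sigma, k_max):
    """Largest-eigengap estimate of K (requires k_max < number of points)."""
    d = llpd(points)
    n = len(points)
    # Gaussian weights on LLPD; the diagonal (self-loops) keeps every degree >= 1
    w = [[math.exp(-(d[i][j] / sigma) ** 2) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    lap = [[(1.0 if i == j else 0.0) - w[i][j] / math.sqrt(deg[i] * deg[j])
            for j in range(n)] for i in range(n)]
    eig = jacobi_eigenvalues(lap)
    gaps = [eig[k] - eig[k - 1] for k in range(1, k_max + 1)]
    return 1 + gaps.index(max(gaps)), eig

chains = [(float(t), 0.0) for t in range(5)] + [(float(t), 3.0) for t in range(5)]
k, eig = estimate_num_clusters(chains, sigma=1.5, k_max=4)
print(k)  # 2
```

Because LLPD collapses each chain to a single within-cluster distance, the two smallest eigenvalues here are near 0 and 0.05 while the remaining eight are near 0.90, so the largest gap falls after the second eigenvalue.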
