Data spectroscopy: Eigenspaces of convolution operators and clustering

07/23/2008
by Tao Shi, et al.

This paper focuses on obtaining clustering information about a distribution from its i.i.d. samples. We develop theoretical results to understand and use the clustering information contained in the eigenvectors of data adjacency matrices based on a radial kernel function with a sufficiently fast tail decay. In particular, we provide population analyses to gain insight into which eigenvectors should be used and when the clustering information for the distribution can be recovered from the sample. We learn that a fixed number of top eigenvectors might simultaneously contain redundant clustering information and miss relevant clustering information. We use this insight to design the data spectroscopic clustering (DaSpec) algorithm, which uses properly selected eigenvectors to determine the number of clusters automatically and to group the data accordingly. Our findings extend the intuitions underlying existing spectral techniques such as spectral clustering and kernel principal component analysis, and provide a new understanding of their usability and modes of failure. Simulation studies and experiments on real-world data are conducted to show the potential of our algorithm. In particular, DaSpec is found to handle unbalanced groups and recover clusters of different shapes better than the competing methods.
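
The abstract does not spell out how the eigenvectors are "properly selected", so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a Gaussian kernel as the radial kernel, assumes that an eigenvector is kept when its entries show no sign change beyond a small relative tolerance, and assumes that each point is assigned to the kept eigenvector with the largest absolute entry at that point. The names `gaussian_kernel_matrix`, `daspec_sketch`, `bandwidth`, and `tol` are hypothetical and chosen for illustration.

```python
import numpy as np


def gaussian_kernel_matrix(X, bandwidth):
    """Pairwise Gaussian (radial) kernel values for the rows of X."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def daspec_sketch(X, bandwidth, tol=1e-3):
    """Illustrative eigenvector selection and grouping (assumed rule).

    Returns (labels, n_clusters). The selection rule used here, "no sign
    change up to a relative tolerance", is an assumption of this sketch.
    """
    n = X.shape[0]
    # Empirical kernel matrix, scaled by 1/n (scaling leaves eigenvectors unchanged).
    K = gaussian_kernel_matrix(X, bandwidth) / n

    # eigh returns eigenvalues in ascending order; reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(K)
    order = np.argsort(eigvals)[::-1]
    eigvecs = eigvecs[:, order]

    selected = []
    for j in range(n):
        v = eigvecs[:, j]
        scale = np.max(np.abs(v))
        has_pos = np.any(v > tol * scale)
        has_neg = np.any(v < -tol * scale)
        # Keep eigenvectors whose non-negligible entries share one sign.
        if has_pos != has_neg:
            selected.append(j)

    if not selected:
        raise ValueError("no eigenvector without sign change; try another bandwidth")

    # Assign each point to the kept eigenvector with the largest magnitude there.
    labels = np.argmax(np.abs(eigvecs[:, selected]), axis=1)
    return labels, len(selected)


if __name__ == "__main__":
    # Two unbalanced, well-separated groups on a line (synthetic toy data).
    X = np.concatenate([np.linspace(0.0, 1.0, 20),
                        np.linspace(10.0, 11.0, 10)]).reshape(-1, 1)
    labels, n_clusters = daspec_sketch(X, bandwidth=1.0)
    print(n_clusters, labels)
```

In this sketch the number of clusters is not an input: it is the count of kept eigenvectors, which is the sense in which the number of groups is determined automatically. The result depends strongly on `bandwidth` and `tol`, both of which are left as free parameters here.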

