The geometry of kernelized spectral clustering

04/29/2014
by Geoffrey Schiebinger, et al.

Clustering of data sets is a standard problem in many areas of science and engineering. The method of spectral clustering is based on embedding the data set using a kernel function, and using the top eigenvectors of the normalized Laplacian to recover the connected components. We study the performance of spectral clustering in recovering the latent labels of i.i.d. samples from a finite mixture of nonparametric distributions. The difficulty of this label recovery problem depends on the overlap between mixture components and how easily a mixture component is divided into two nonoverlapping components. When the overlap is small compared to the indivisibility of the mixture components, the principal eigenspace of the population-level normalized Laplacian operator is approximately spanned by the square-root kernelized component densities. In the finite sample setting, and under the same assumption, embedded samples from different components are approximately orthogonal with high probability when the sample size is large. As a corollary, we control the fraction of samples mislabeled by spectral clustering under finite mixtures with nonparametric components.
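
To make the embedding step concrete, here is a minimal sketch of the finite-sample procedure on a toy one-dimensional data set. Several details are assumptions for illustration and are not taken from the paper: the Gaussian kernel and its bandwidth, the toy points, the helper names, and the use of the common matrix form D^{-1/2} K D^{-1/2} for the normalized Laplacian. To stay dependency-free, plain orthogonal iteration stands in for a library eigensolver. The sketch embeds each sample as a row of the top eigenvector matrix and compares the angles between embedded samples from the same and from different groups.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel on the real line (an illustrative choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def normalized_kernel_matrix(points, bandwidth=1.0):
    """Return D^{-1/2} K D^{-1/2}, with K the kernel matrix and D its row sums."""
    n = len(points)
    K = [[gaussian_kernel(points[i], points[j], bandwidth) for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in K]
    return [[K[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def top_eigenspace(M, k, iters=100):
    """Orthonormal basis of the top-k eigenspace via orthogonal iteration.

    This replaces a library eigensolver; it recovers the subspace, not
    necessarily the individual eigenvectors.
    """
    n = len(M)
    # Deterministic, linearly independent starting vectors.
    basis = [[float((i + 1) ** p) for i in range(n)] for p in range(k)]
    for _ in range(iters):
        # Apply M to every basis vector.
        basis = [[sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
                 for v in basis]
        # Re-orthonormalize with modified Gram-Schmidt.
        ortho = []
        for v in basis:
            for u in ortho:
                c = sum(a * b for a, b in zip(u, v))
                v = [a - c * b for a, b in zip(v, u)]
            norm = math.sqrt(sum(a * a for a in v))
            ortho.append([a / norm for a in v])
        basis = ortho
    return basis


def spectral_embedding(points, k, bandwidth=1.0):
    """Map sample i to row i of the n-by-k matrix of top eigenvectors."""
    basis = top_eigenspace(normalized_kernel_matrix(points, bandwidth), k)
    return [[basis[c][i] for c in range(k)] for i in range(len(points))]


def cosine(u, v):
    """Cosine of the angle between two embedded samples."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


# Toy data: two well-separated groups standing in for two mixture components.
points = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
embedding = spectral_embedding(points, k=2)

within = cosine(embedding[0], embedding[3])   # same group
across = cosine(embedding[0], embedding[4])   # different groups
print(f"within-group cosine: {within:.4f}")
print(f"across-group cosine: {across:.4f}")
```

With groups this far apart relative to the bandwidth, the across-group cosine should be close to zero and the within-group cosine close to one, which is the small-overlap regime the abstract describes. Angles between rows depend only on the eigenspace, not on the particular basis, so the nearly tied top eigenvalues do not affect the comparison.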


