Revisiting k-means: New Algorithms via Bayesian Nonparametrics

11/02/2011
by Brian Kulis, et al.

Bayesian models offer great flexibility for clustering applications: Bayesian nonparametrics can be used for modeling infinite mixtures, and hierarchical Bayesian models can be utilized for sharing clusters across multiple data sets. For the most part, such flexibility is lacking in classical clustering methods such as k-means. In this paper, we revisit the k-means clustering algorithm from a Bayesian nonparametric viewpoint. Inspired by the asymptotic connection between k-means and mixtures of Gaussians, we show that a Gibbs sampling algorithm for the Dirichlet process mixture approaches a hard clustering algorithm in the limit, and further that the resulting algorithm monotonically minimizes an elegant underlying k-means-like clustering objective that includes a penalty for the number of clusters. We generalize this analysis to the case of clustering multiple data sets through a similar asymptotic argument with the hierarchical Dirichlet process. We also discuss further extensions that highlight the benefits of our analysis: (i) a spectral relaxation involving thresholded eigenvectors, and (ii) a normalized cut graph clustering algorithm that does not fix the number of clusters in the graph.
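
To make the "k-means-like objective with a penalty for the number of clusters" concrete, here is a minimal sketch of a hard clustering loop of that kind. The abstract does not spell out the procedure, so the details are assumptions for illustration: the objective is taken to be the sum of squared Euclidean distances to cluster means plus `lam` times the number of clusters, a point opens a new cluster when its squared distance to every existing center exceeds `lam`, and the names `dp_means`, `penalized_objective`, and `lam` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def penalized_objective(points: List[Point], centers: List[Point],
                        labels: List[int], lam: float) -> float:
    """Assumed objective: within-cluster squared error + lam * (number of clusters)."""
    sse = sum(sq_dist(p, centers[z]) for p, z in zip(points, labels))
    return sse + lam * len(centers)


def dp_means(points: List[Point], lam: float,
             max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Hard clustering that grows the number of clusters as needed (a sketch)."""
    centers = [mean(points)]          # start with one cluster at the global mean
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [sq_dist(p, c) for c in centers]
            best = min(range(len(centers)), key=dists.__getitem__)
            if dists[best] > lam:
                # Too far from every center: open a new cluster at this point.
                centers.append(tuple(p))
                best = len(centers) - 1
            if labels[i] != best:
                labels[i] = best
                changed = True
        # Drop clusters that lost all their points, then recompute the means.
        used = sorted(set(labels))
        remap = {old: new for new, old in enumerate(used)}
        labels = [remap[z] for z in labels]
        centers = [mean([p for p, z in zip(points, labels) if z == k])
                   for k in range(len(used))]
        if not changed:
            break
    return centers, labels
```

In this sketch `lam` plays the role that a fixed k plays in ordinary k-means: a larger `lam` makes opening a cluster more expensive and so yields fewer clusters, while a smaller `lam` yields more.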

research
04/25/2020

Unsupervised K-Means Clustering Algorithm

The k-means algorithm is generally the most known and used clustering me...
research
08/25/2015

Clustering With Side Information: From a Probabilistic Model to a Deterministic Algorithm

In this paper, we propose a model-based clustering method (TVClust) that...
research
12/10/2012

MAD-Bayes: MAP-based Asymptotic Derivations from Bayes

The classical mixture of Gaussians model is related to K-means via small...
research
07/26/2017

Dynamic Clustering Algorithms via Small-Variance Analysis of Markov Chain Mixture Models

Bayesian nonparametrics are a class of probabilistic models in which the...
research
05/28/2013

Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture

This paper presents a novel algorithm, based upon the dependent Dirichle...
research
01/29/2015

Bayesian Hierarchical Clustering with Exponential Family: Small-Variance Asymptotics and Reducibility

Bayesian hierarchical clustering (BHC) is an agglomerative clustering me...
research
02/24/2023

Bayesian contiguity constrained clustering, spanning trees and dendrograms

Clustering is a well-known and studied problem, one of its variants, cal...
