Clustering With Side Information: From a Probabilistic Model to a Deterministic Algorithm

08/25/2015
by Daniel Khashabi, et al.

In this paper, we propose a model-based clustering method (TVClust) that robustly incorporates noisy side information as soft constraints and seeks a consensus between the side information and the observed data. Our method is based on a nonparametric Bayesian hierarchical model that combines a probabilistic model for the data instances with one for the side information. An efficient Gibbs sampling algorithm is proposed for posterior inference. Using the small-variance asymptotics of our probabilistic model, we then derive a new deterministic clustering algorithm (RDP-means). It can be viewed as an extension of K-means that allows for the inclusion of side information and has the additional property that the number of clusters does not need to be specified a priori. We compare our work empirically with many constrained clustering algorithms from the literature, on a variety of data sets and under a variety of conditions, such as noisy side information and erroneous k values. In these experiments, both our probabilistic and our deterministic approach perform strongly relative to the other algorithms.
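The abstract does not state the RDP-means objective, so the following is only an illustrative sketch of the general idea it describes: a K-means-style loop that opens a new cluster when a point is too far from every existing center (so the number of clusters is not fixed in advance) and that treats pairwise side information as a soft adjustment to the assignment cost. The function name `dp_means_with_hints`, the parameters `lam` and `weight`, the hint lists `link_hints` and `no_link_hints`, and the additive penalty itself are all assumptions made for illustration, not the authors' formulation.

```python
from math import dist


def dp_means_with_hints(points, lam, link_hints=(), no_link_hints=(),
                        weight=0.0, n_iter=20):
    """Illustrative DP-means-style clustering with soft pairwise hints.

    Hypothetical sketch: `lam` is the cost threshold for opening a new
    cluster, and `weight` scales an assumed reward/penalty for agreeing or
    disagreeing with the side information. This is not the paper's RDP-means.
    """
    n = len(points)
    # Start with a single cluster centered at the global mean.
    centers = [tuple(sum(c) / n for c in zip(*points))]
    labels = [0] * n

    # Neighbor lookups for the (possibly noisy) pairwise hints.
    linked = {i: [] for i in range(n)}
    unlinked = {i: [] for i in range(n)}
    for i, j in link_hints:
        linked[i].append(j)
        linked[j].append(i)
    for i, j in no_link_hints:
        unlinked[i].append(j)
        unlinked[j].append(i)

    for _ in range(n_iter):
        changed = False
        for i, x in enumerate(points):
            costs = []
            for k, c in enumerate(centers):
                cost = dist(x, c) ** 2
                # Soft constraints: reward agreeing with "same cluster" hints,
                # penalize violating "different cluster" hints.
                cost -= weight * sum(labels[j] == k for j in linked[i])
                cost += weight * sum(labels[j] == k for j in unlinked[i])
                costs.append(cost)
            best = min(range(len(centers)), key=costs.__getitem__)
            if costs[best] > lam:
                # No existing cluster is cheap enough: open a new one here.
                centers.append(tuple(x))
                best = len(centers) - 1
            if labels[i] != best:
                labels[i] = best
                changed = True
        # Recompute each non-empty cluster's center as the mean of its members.
        for k in range(len(centers)):
            members = [points[i] for i in range(n) if labels[i] == k]
            if members:
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
        if not changed:
            break

    # Drop clusters that ended up empty and renumber the labels compactly.
    used = sorted(set(labels))
    remap = {k: m for m, k in enumerate(used)}
    return [remap[k] for k in labels], [centers[k] for k in used]


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels, centers = dp_means_with_hints(
    points, lam=4.0, link_hints=[(0, 1)], no_link_hints=[(0, 3)], weight=1.0)
```

In this toy call the two well-separated groups end up in two clusters without a cluster count being supplied. Because the hints only shift the assignment cost rather than acting as hard constraints, a wrong hint can be overridden when the data disagree strongly enough, which is the behavior the abstract attributes to soft constraints.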


