Clustering with a Reject Option: Interactive Clustering as Bayesian Prior Elicitation

06/19/2016
by Akash Srivastava, et al.

A good clustering can help a data analyst explore and understand a data set, but what constitutes a good clustering may depend on domain-specific and application-specific criteria. These criteria can be difficult to formalize, even when an analyst can easily recognize a good clustering on sight. We present a new approach to interactive clustering for data exploration called TINDER, based on a particularly simple feedback mechanism: an analyst can reject a given clustering and request a new one, which is chosen to differ from the previous clustering while still fitting the data well. We formalize this interaction in a Bayesian framework as a method for prior elicitation, in which each new clustering is produced by a prior distribution that is modified to discourage previously rejected clusterings. We show that TINDER produces a set of clusterings of equivalent quality that is much more diverse than the set obtained by randomized restarts.
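The reject-and-resample loop described above can be illustrated with a toy sketch. This is not TINDER's actual model or inference procedure, which the abstract does not specify; every name here (`propose`, `rand_index`, `log_score`, `penalty`) is hypothetical. The sketch assumes 1-D data, uses the negative within-cluster sum of squares as a stand-in for the log-likelihood, and lowers the log prior of each candidate clustering in proportion to its Rand-index similarity to every rejected clustering. A rejected clustering itself is never proposed again. The highest-scoring clustering is found by brute-force enumeration, which is only feasible for a handful of points.

```python
import itertools
from typing import List, Optional, Sequence, Tuple

Clustering = Tuple[int, ...]  # one cluster label per data point


def canonical(labels: Clustering) -> Clustering:
    """Relabel clusters in order of first appearance so that equivalent
    clusterings (same partition, permuted labels) compare equal."""
    mapping: dict = {}
    return tuple(mapping.setdefault(label, len(mapping)) for label in labels)


def rand_index(a: Clustering, b: Clustering) -> float:
    """Fraction of point pairs on which two clusterings agree about
    whether the pair shares a cluster (1.0 = identical partitions)."""
    pairs = list(itertools.combinations(range(len(a)), 2))
    if not pairs:
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def within_ss(points: Sequence[float], labels: Clustering) -> float:
    """Within-cluster sum of squared deviations for 1-D points."""
    total = 0.0
    for k in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == k]
        mean = sum(members) / len(members)
        total += sum((p - mean) ** 2 for p in members)
    return total


def log_score(points: Sequence[float], labels: Clustering,
              rejected: List[Clustering], penalty: float) -> float:
    """Hypothetical unnormalized log posterior: a fit term (stand-in for the
    log-likelihood) plus a log prior lowered for clusterings that resemble
    any previously rejected clustering."""
    log_fit = -within_ss(points, labels)
    log_prior = -penalty * sum(rand_index(labels, r) for r in rejected)
    return log_fit + log_prior


def propose(points: Sequence[float], k: int, rejected: List[Clustering],
            penalty: float = 1.0) -> Optional[Clustering]:
    """Return the highest-scoring clustering into exactly k non-empty clusters,
    excluding rejected ones. Brute force: only feasible for tiny data sets."""
    rejected_set = {canonical(r) for r in rejected}
    best: Optional[Clustering] = None
    best_score = float("-inf")
    for labels in itertools.product(range(k), repeat=len(points)):
        if canonical(labels) != labels:  # skip label permutations
            continue
        if len(set(labels)) < k or labels in rejected_set:
            continue
        score = log_score(points, labels, rejected, penalty)
        if score > best_score:
            best, best_score = labels, score
    return best


points = [0.0, 0.2, 5.0, 5.2, 11.0, 11.2]
rejected: List[Clustering] = []
first = propose(points, 2, rejected)   # analyst sees this clustering...
rejected.append(first)                 # ...and rejects it
second = propose(points, 2, rejected)
print(first, second)  # (0, 0, 0, 0, 1, 1) (0, 0, 1, 1, 1, 1)
```

On these six points with `k=2`, the first proposal groups the four smallest values together; after it is rejected, the next proposal splits off `{0.0, 0.2}` instead. A larger `penalty` can push later proposals further from the rejected ones, at some cost in fit.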

research
07/30/2021

Distribution free optimality intervals for clustering

We address the problem of validating the output of clustering algorithms....
research
01/29/2021

How many data clusters are in the Galaxy data set? Bayesian cluster analysis in action

In model-based clustering, the Galaxy data set is often used as a benchm...
research
04/20/2022

Clustering of football players based on performance data and aggregated clustering validity indexes

We analyse football (soccer) player performance data with mixed type var...
research
06/03/2022

Interactive Exploration of Large Dendrograms with Prototypes

Hierarchical clustering is one of the standard methods taught for identi...
research
06/18/2020

Guarantees for Hierarchical Clustering by the Sublevel Set method

Meila (2018) introduces an optimization based method called the Sublevel...
research
03/14/2021

Pandemonium: a clustering tool to partition parameter space – application to the B anomalies

We introduce the interactive tool pandemonium to cluster model predictio...
research
10/08/2020

Clustering Analysis of Interactive Learning Activities Based on Improved BIRCH Algorithm

Group tendency is a research branch of computer assisted learning. The c...
