How many data clusters are in the Galaxy data set? Bayesian cluster analysis in action

01/29/2021
by Bettina Grün, et al.

In model-based clustering, the Galaxy data set is often used as a benchmark to study the performance of different modeling approaches. Aitkin (2001) compares maximum likelihood and Bayesian analyses of the Galaxy data set and expresses reservations about the Bayesian approach because the prior assumptions remain rather obscure while playing a major role in the results obtained and the conclusions drawn. The aim of this paper is to address Aitkin's concerns by shedding light on how the specified priors affect the estimated number of clusters. We perform a sensitivity analysis of different prior specifications for the mixture of finite mixtures model, i.e., the mixture model that includes a prior on the number of components. We combine an extensive set of prior specifications in a full factorial design and assess their impact on the estimated number of clusters for the Galaxy data set. The results highlight the interaction effects of the prior specifications and provide insights into which prior specifications are recommended to obtain a sparse clustering solution. A clear understanding of the impact of the prior specifications removes a barrier to using Bayesian methods that stems from the complexity of selecting suitable priors. In addition, the regularizing properties of the priors may be exploited intentionally to obtain a clustering solution that meets prior expectations and the needs of the application.
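
To make the idea of a full factorial design over prior specifications concrete, the following minimal sketch enumerates every combination of a few prior settings. The factor names and levels (`prior_factors` and its entries) are hypothetical placeholders chosen for illustration; they are not the settings used in the paper, and no model is fitted here.

```python
from itertools import product

# Hypothetical factors and levels, for illustration only.
# These are NOT the prior specifications used in the paper.
prior_factors = {
    "prior_on_K": ["uniform", "truncated_poisson", "beta_negative_binomial"],
    "weight_prior_type": ["static", "dynamic"],
    "dirichlet_concentration": [0.01, 1.0, 4.0],
    "component_prior_scale": ["tight", "diffuse"],
}


def full_factorial(factors):
    """Return one dict per combination of factor levels (the full crossing)."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]


design = full_factorial(prior_factors)

# Each entry in `design` is one prior configuration; a sensitivity analysis
# would fit the mixture model once per entry and record the estimated
# number of clusters for later comparison across factors.
print(len(design))
```

Because every level of each factor is crossed with every level of the others, a design of this shape allows interaction effects between prior choices to be examined, not only the main effect of each factor.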

research
12/22/2020

Spying on the prior of the number of data clusters and the partition distribution in Bayesian cluster analysis

Mixture models represent the key modelling approach for Bayesian cluster...
research
08/01/2023

Informed Bayesian Finite Mixture Models via Asymmetric Dirichlet Priors

Finite mixture models are flexible methods that are commonly used for mo...
research
10/19/2018

Bayesian Distance Clustering

Model-based clustering is widely-used in a variety of application areas....
research
06/19/2016

Clustering with a Reject Option: Interactive Clustering as Bayesian Prior Elicitation

A good clustering can help a data analyst to explore and understand a da...
research
10/16/2012

Unsupervised Joint Alignment and Clustering using Bayesian Nonparametrics

Joint alignment of a collection of functions is the process of independe...
research
09/30/2019

Data-Driven Model Set Design for Model Averaged Particle Filter

This paper is concerned with sequential state filtering in the presence ...
research
07/29/2022

Bayesian nonparametric mixture inconsistency for the number of components: How worried should we be in practice?

We consider the Bayesian mixture of finite mixtures (MFMs) and Dirichlet...
