Entropy regularization in probabilistic clustering

07/19/2023
by Beatrice Franzolini, et al.
Bayesian nonparametric mixture models are widely used to cluster observations. However, one major drawback of the approach is that the estimated partition often has unbalanced cluster frequencies, with only a few dominating clusters and a large number of sparsely populated ones. As a result, the estimates are often uninterpretable unless we agree to ignore a substantial number of observations and clusters. Interpreting the posterior distribution as a penalized likelihood, we show how the imbalance can be explained as a direct consequence of the cost functions involved in estimating the partition. In light of our findings, we propose a novel Bayesian estimator of the clustering configuration. The proposed estimator is equivalent to a post-processing procedure that reduces the number of sparsely populated clusters and enhances interpretability. The procedure takes the form of entropy regularization of the Bayesian estimate. Besides being computationally convenient compared with alternative strategies, it is also theoretically justified as a correction to the Bayesian loss function used for point estimation and, as such, can be applied to any posterior distribution over clusterings, regardless of the specific model used.
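To make the idea of entropy regularization as a post-processing step concrete, the following is a minimal sketch and not the authors' implementation. It assumes the posterior is available as a set of sampled partitions (label vectors), uses the Binder loss as the base loss, restricts the search to the sampled partitions, and adds a penalty `lam` times the Shannon entropy of the cluster frequencies. The function names, the choice of Binder loss, the search strategy, and the exact form of the penalty are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def partition_entropy(labels):
    """Shannon entropy (natural log) of the cluster frequencies of one partition."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


def binder_loss(a, b):
    """Number of observation pairs on which partitions a and b disagree."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    # Count each unordered pair once (strict upper triangle).
    return int(np.triu(same_a != same_b, k=1).sum())


def entropy_regularized_estimate(samples, lam):
    """Pick the sampled partition minimizing Monte Carlo expected Binder loss
    plus lam * entropy. Hypothetical helper; the penalty form is an assumption."""
    samples = np.asarray(samples)
    best, best_score = None, np.inf
    for cand in samples:
        expected_loss = np.mean([binder_loss(cand, s) for s in samples])
        score = expected_loss + lam * partition_entropy(cand)
        if score < best_score:
            best, best_score = cand, score
    return best


# Toy posterior sample: two draws keep a singleton cluster, one merges it.
samples = [
    [0, 0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 2],
]
plain = entropy_regularized_estimate(samples, lam=0.0)
regularized = entropy_regularized_estimate(samples, lam=5.0)
print(len(np.unique(plain)), len(np.unique(regularized)))
```

In this toy example, the unregularized estimate keeps the singleton cluster, while a sufficiently large `lam` selects the partition in which the singleton is merged. Because the penalty acts only on the point-estimation step, a sketch like this can in principle be run on posterior samples from any clustering model.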


research
03/30/2023

A review on Bayesian model-based clustering

Clustering is an important task in many areas of knowledge: medicine and...
research
05/23/2019

Posterior Distribution for the Number of Clusters in Dirichlet Process Mixture Models

Dirichlet process mixture models (DPMM) play a central role in Bayesian ...
research
09/20/2018

Optimal Bayesian clustering using non-negative matrix factorization

Bayesian model-based clustering is a widely applied procedure for discov...
research
11/09/2019

Estimation of entropy measures for categorical variables with spatial correlation

Entropy is a measure of heterogeneity widely used in applied sciences, o...
research
01/30/2022

Why the Rich Get Richer? On the Balancedness of Random Partition Models

Random partition models are widely used in Bayesian methods for various ...
research
07/12/2021

Cohesion and Repulsion in Bayesian Distance Clustering

Clustering in high-dimensions poses many statistical challenges. While t...
research
01/30/2014

Sparse Bayesian Unsupervised Learning

This paper is about variable selection, clustering and estimation in an ...
