Why the Rich Get Richer? On the Balancedness of Random Partition Models

01/30/2022
by Changwoo J. Lee, et al.

Random partition models are widely used in Bayesian methods for clustering tasks such as mixture modeling, topic modeling, and community detection. While the number of clusters induced by random partition models has been studied extensively, another important property, the balancedness of cluster sizes, has been largely neglected. We formulate a framework to define and theoretically study the balancedness of exchangeable random partition models by analyzing how a model assigns probabilities to partitions with different levels of balancedness. We demonstrate that the "rich-get-richer" characteristic of many popular random partition models is an inevitable consequence of two common assumptions: product-form exchangeability and projectivity. We propose a principled way to compare the balancedness of random partition models, which clarifies which models are better or worse suited to different applications. We also introduce "rich-get-poorer" random partition models and illustrate their application to entity resolution tasks.
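
To make the "rich-get-richer" behavior concrete, here is a minimal sketch that simulates the Chinese restaurant process, a standard random partition model with this characteristic. It is an illustration only and is not taken from the paper: the function name `sample_crp` and the parameters `n`, `alpha`, and `rng` are hypothetical choices made for this example.

```python
import random


def sample_crp(n, alpha, rng):
    """Return cluster sizes for n items drawn from a Chinese restaurant process.

    Illustrative sketch only; the name and signature are assumptions,
    not code from the paper.
    """
    sizes = []
    for i in range(n):
        # With i items already placed, the new item joins existing cluster k
        # with probability sizes[k] / (i + alpha), and opens a new cluster
        # with probability alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        acc = 0.0
        for k, size in enumerate(sizes):
            acc += size
            if u < acc:
                sizes[k] += 1  # larger clusters are more likely to grow
                break
        else:
            sizes.append(1)  # no existing cluster chosen: open a new one
    return sizes


if __name__ == "__main__":
    rng = random.Random(0)
    sizes = sample_crp(200, alpha=1.0, rng=rng)
    print(sorted(sizes, reverse=True))
```

Because the probability of joining a cluster is proportional to its current size, sampled partitions tend to be unbalanced, with a few large clusters and many small ones.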

research
04/04/2020

Random Partition Models for Microclustering Tasks

Traditional Bayesian random partition models assume that the size of eac...
research
03/30/2023

A review on Bayesian model-based clustering

Clustering is an important task in many areas of knowledge: medicine and...
research
06/14/2023

Graph-Aligned Random Partition Model (GARP)

Bayesian nonparametric mixtures and random partition models are powerful...
research
07/19/2023

Entropy regularization in probabilistic clustering

Bayesian nonparametric mixture models are widely used to cluster observa...
research
11/20/2017

Non-exchangeable random partition models for microclustering

Many popular random partition models, such as the Chinese restaurant pro...
research
02/04/2019

A note on the geometry of the MAP partition in some Normal Bayesian Mixture Models

We investigate the geometry of the maximal a posteriori (MAP) partition ...
research
08/27/2018

Creating a surrogate commuter network from Australian Bureau of Statistics census data

Between the 2011 and 2016 national censuses, the Australian Bureau of St...
