Powered Dirichlet Process for Controlling the Importance of "Rich-Get-Richer" Prior Assumptions in Bayesian Clustering

04/26/2021
by Gaël Poux-Médard, et al.

One of the most widely used priors in Bayesian clustering is the Dirichlet prior, which can be expressed as a Chinese Restaurant Process. This process allows nonparametric estimation of the number of clusters when partitioning datasets. Its key feature is the "rich-get-richer" property: the a priori probability of choosing a cluster depends linearly on that cluster's population. In this paper, we show that such a prior is not always the best choice for modeling data. To address this problem, we derive the Powered Chinese Restaurant Process from a modified version of the Dirichlet-Multinomial distribution. We then establish some of its fundamental properties (expected number of clusters, convergence). Unlike state-of-the-art efforts in this direction, this new formulation allows direct control over the importance of the "rich-get-richer" prior.
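
To make the "rich-get-richer" control concrete, here is a minimal sketch of a powered Chinese Restaurant Process prior. It assumes, since the abstract does not state the formula, that an existing cluster with population N_k receives prior weight N_k^r and a new cluster receives weight alpha; the function names and the parameters `alpha` and `r` are illustrative, not taken from the paper's code. Under this assumption, r = 1 gives the standard Chinese Restaurant Process, r = 0 makes all existing clusters equally likely regardless of size, and r > 1 strengthens the preference for large clusters.

```python
import random


def powered_crp_probs(counts, alpha, r):
    """Prior probabilities over existing clusters plus one new cluster.

    Assumed form: weight N_k ** r for each existing cluster, alpha for a new one.
    The last entry of the returned list is the new-cluster probability.
    """
    weights = [n ** r for n in counts] + [alpha]
    total = sum(weights)
    return [w / total for w in weights]


def sample_powered_crp(n_points, alpha, r, seed=0):
    """Sequentially assign n_points to clusters; return the cluster sizes."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_points):
        probs = powered_crp_probs(counts, alpha, r)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
    return counts
```

For example, with cluster sizes [3, 1] and alpha = 1, setting r = 1 yields probabilities 0.6, 0.2, and 0.2 (the last for a new cluster), while r = 0 yields 1/3 for each option.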

research
02/15/2018

Reducing over-clustering via the powered Chinese restaurant process

Dirichlet process mixture (DPM) models tend to produce many small cluste...
research
08/01/2023

Informed Bayesian Finite Mixture Models via Asymmetric Dirichlet Priors

Finite mixture models are flexible methods that are commonly used for mo...
research
11/04/2014

Simple approximate MAP Inference for Dirichlet processes

The Dirichlet process mixture (DPM) is a ubiquitous, flexible Bayesian n...
research
10/15/2018

Evaluating Sensitivity to the Stick Breaking Prior in Bayesian Nonparametrics

A central question in many probabilistic clustering problems is how many...
research
09/15/2021

Powered Hawkes-Dirichlet Process: Challenging Textual Clustering using a Flexible Temporal Prior

The textual content of a document and its publication date are intertwin...
research
09/06/2022

Fast Generation of Exchangeable Sequence of Clusters Data

Recent advances in Bayesian models for random partitions have led to the...
research
08/03/2023

Similarity-based Random Partition Distribution for Clustering Functional Data

Random partitioned distribution is a powerful tool for model-based clust...
