Powered Dirichlet Process for Controlling the Importance of "Rich-Get-Richer" Prior Assumptions in Bayesian Clustering

by Gaël Poux-Médard, et al.

One of the most widely used priors in Bayesian clustering is the Dirichlet prior. It can be expressed as a Chinese Restaurant Process, which allows nonparametric estimation of the number of clusters when partitioning a dataset. Its key feature is the "rich-get-richer" property: the a priori probability that a cluster is chosen depends linearly on its population. In this paper, we show that such a prior is not always the best choice for modeling data. To address this problem, we derive the Powered Chinese Restaurant Process from a modified version of the Dirichlet-Multinomial distribution. We then derive some of its fundamental properties (expected number of clusters, convergence). Unlike state-of-the-art efforts in this direction, this new formulation allows direct control over the importance of the "rich-get-richer" prior.
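
To make the "rich-get-richer" idea concrete, here is a minimal sketch of a powered seating rule. The abstract does not give the formula, so the form below is an assumption: an existing cluster with population n gets prior weight n**r, and a new cluster gets weight alpha. The names `r`, `alpha`, `powered_crp_probs` and `sample_partition` are illustrative only. Under this assumption, r = 1 recovers the usual linear dependence on population, and r = 0 makes all existing clusters equally likely regardless of size.

```python
import random


def powered_crp_probs(counts, alpha, r):
    """Prior probabilities of joining each existing cluster, or a new one (last entry).

    Assumed form (not stated in the abstract): weight n**r for a cluster
    of population n, weight alpha for opening a new cluster.
    """
    weights = [n ** r for n in counts] + [alpha]
    total = sum(weights)
    return [w / total for w in weights]


def sample_partition(n_points, alpha, r, seed=0):
    """Sequentially seat n_points and return the resulting cluster populations."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_points):
        probs = powered_crp_probs(counts, alpha, r)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(counts):
            counts.append(1)   # last index means "open a new cluster"
        else:
            counts[k] += 1     # join existing cluster k
    return counts


if __name__ == "__main__":
    # Compare cluster populations for a few hypothetical values of r.
    for r in (0.0, 0.5, 1.0):
        print(r, sorted(sample_partition(200, alpha=1.0, r=r), reverse=True))
```

Running the script for several values of `r` gives a quick feel for how the exponent changes the balance between large and small clusters in the sampled partitions.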




Reducing over-clustering via the powered Chinese restaurant process

Dirichlet process mixture (DPM) models tend to produce many small cluste...

Powered Hawkes-Dirichlet Process: Challenging Textual Clustering using a Flexible Temporal Prior

The textual content of a document and its publication date are intertwin...

Evaluating Sensitivity to the Stick Breaking Prior in Bayesian Nonparametrics

A central question in many probabilistic clustering problems is how many...

Simple approximate MAP Inference for Dirichlet processes

The Dirichlet process mixture (DPM) is a ubiquitous, flexible Bayesian n...

Flexible clustering via hidden hierarchical Dirichlet priors

The Bayesian approach to inference stands out for naturally allowing bor...

An elementary derivation of the Chinese restaurant process from Sethuraman's stick-breaking process

The Chinese restaurant process and the stick-breaking process are the tw...

Bayesian nonparametric Plackett-Luce models for the analysis of preferences for college degree programmes

In this paper we propose a Bayesian nonparametric model for clustering p...