Quasi-Bernoulli Stick-breaking: Infinite Mixture with Cluster Consistency

08/23/2020
by Cheng Zeng, et al.

In mixture modeling and clustering applications, the number of components is often unknown. The stick-breaking model is an appealing construction that assumes infinitely many components while shrinking most of the redundant weights to near zero. However, it has been discovered that such shrinkage is unsatisfactory: even when the component distribution is correctly specified, small and spurious weights appear and give an inconsistent estimate of the number of clusters. In this article, we propose a simple solution that gains stronger control over the redundant weights: when breaking each stick into two pieces, we adjust the length of the second piece by multiplying it by a quasi-Bernoulli random variable, supported at one and at a positive constant close to zero. This substantially increases the chance of shrinking all the redundant weights to almost zero, leading to a consistent estimator of the number of clusters; at the same time, it avoids the singularity caused by assigning an exactly zero weight and maintains support in the infinite-dimensional space. As a stick-breaking model, its posterior computation can be carried out efficiently via the classic blocked Gibbs sampler, allowing straightforward extension to non-Gaussian components. Compared to existing methods, our model demonstrates much better performance in simulations and a data application, showing a substantial reduction in the number of clusters.
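
The construction described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `quasi_bernoulli_sticks`, the truncation level `K`, the Beta(alpha, 1) law for the leftover fraction (borrowed from the usual Dirichlet-process stick-breaking), and the default values of `p` and `eps` are all hypothetical choices. The abstract itself only states that the second piece of each break is multiplied by a variable supported at one and at a small positive constant.

```python
import numpy as np

def quasi_bernoulli_sticks(K, alpha=1.0, p=0.9, eps=1e-3, rng=None):
    """Draw K truncated stick-breaking weights with a quasi-Bernoulli shrink.

    Assumed setup (not taken from the paper's full text):
      beta_k ~ Beta(alpha, 1)   fraction of the stick left after break k
      b_k    = 1 with prob. p, eps with prob. 1 - p   (quasi-Bernoulli)
    The leftover ("second") piece is scaled by b_k, so a single b_k = eps
    pushes every later weight close to zero without making it exactly zero.
    """
    rng = np.random.default_rng(rng)
    beta = rng.beta(alpha, 1.0, size=K)
    b = np.where(rng.random(K) < p, 1.0, eps)
    remain_frac = b * beta                      # adjusted second piece
    v = 1.0 - remain_frac                       # first piece of each break
    # Stick length available before break k: product of earlier leftovers.
    before = np.concatenate(([1.0], np.cumprod(remain_frac[:-1])))
    w = v * before
    leftover = np.prod(remain_frac)             # mass beyond the truncation
    return w, leftover

w, leftover = quasi_bernoulli_sticks(K=20, rng=0)
print(np.round(w, 4), leftover)
```

The weights and the leftover mass sum to one by construction, and every weight stays strictly positive, which mirrors the abstract's point that the model avoids exactly zero weights while still shrinking redundant ones sharply.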

