Quasi-Bernoulli Stick-breaking: Infinite Mixture with Cluster Consistency

by Cheng Zeng, et al.

In mixture modeling and clustering applications, the number of components is often unknown. The stick-breaking model is an appealing construction that assumes infinitely many components while shrinking most of the redundant weights to near zero. However, it has been discovered that this shrinkage is unsatisfactory: even when the component distribution is correctly specified, small spurious weights appear and yield an inconsistent estimate of the number of clusters. In this article, we propose a simple solution that gains stronger control over the redundant weights: when breaking each stick into two pieces, we adjust the length of the second piece by multiplying it by a quasi-Bernoulli random variable, supported at one and at a positive constant close to zero. This substantially increases the chance of shrinking all the redundant weights to almost zero, leading to a consistent estimator of the number of clusters; at the same time, it avoids the singularity caused by assigning an exactly zero weight, and it maintains support in the infinite-dimensional space. Because it is a stick-breaking model, its posterior computation can be carried out efficiently via the classic blocked Gibbs sampler, which allows a straightforward extension to non-Gaussian components. Compared to existing methods, our model performs markedly better in the simulations and the data application, showing a substantial reduction in the number of clusters.
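
The construction described in the abstract can be illustrated with a minimal sketch that draws one set of truncated mixture weights. Several details here are assumptions made only for illustration and are not stated in the text above: the function name `quasi_bernoulli_stick_breaking`, the Beta(`a`, `b`) distribution for the break proportion, the probability `p` of the quasi-Bernoulli variable taking the value one, the small constant `eps`, and the truncation level `K`. The paper's exact parameterization may differ.

```python
import numpy as np

def quasi_bernoulli_stick_breaking(K=20, p=0.9, eps=1e-3, a=1.0, b=1.0, rng=None):
    """Draw K truncated stick-breaking weights with quasi-Bernoulli shrinkage.

    All parameter names and default values are illustrative assumptions.
    """
    rng = np.random.default_rng(rng)

    # Fraction of the current stick that would be passed on to later
    # components under an ordinary stick-breaking step (assumed Beta(a, b)).
    second_piece = rng.beta(a, b, size=K - 1)

    # Quasi-Bernoulli variable: equals 1 with probability p, otherwise eps,
    # a small positive constant (never exactly zero).
    q = np.where(rng.random(K - 1) < p, 1.0, eps)

    # Adjusted second piece: multiplied by the quasi-Bernoulli variable.
    remaining = second_piece * q

    # The first piece of each break becomes that component's share;
    # the last component takes whatever stick is left (truncation).
    first_piece = np.append(1.0 - remaining, 1.0)

    # Length of stick still available before each break.
    stick_left = np.concatenate(([1.0], np.cumprod(remaining)))

    return first_piece * stick_left

weights = quasi_bernoulli_stick_breaking(K=20, rng=0)
print(weights.round(4), weights.sum())
```

Whenever the quasi-Bernoulli variable takes the value `eps`, every later weight is scaled by that small factor, so the trailing weights become tiny but remain strictly positive, which matches the abstract's point about avoiding exactly zero weights.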


Finite mixture models are typically inconsistent for the number of components

Scientists and engineers are often interested in learning the number of ...

Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation

Nonparametric Bayesian approaches to clustering, information retrieval, ...

MCMC computations for Bayesian mixture models using repulsive point processes

Repulsive mixture models have recently gained popularity for Bayesian cl...

Bayesian clustering of multiple zero-inflated outcomes

Several applications involving counts present a large proportion of zero...

Tree stick-breaking priors for covariate-dependent mixture models

Stick-breaking priors are often adopted in Bayesian nonparametric mixtur...

The Kernel Pitman-Yor Process

In this work, we propose the kernel Pitman-Yor process (KPYP) for nonpar...

Bayesian Eigenvalue Regularization via Cumulative Shrinkage Process

This study proposes a novel hierarchical prior for inferring possibly lo...