Quasi-Bernoulli Stick-breaking: Infinite Mixture with Cluster Consistency

08/23/2020 ∙ by Cheng Zeng, et al.

In mixture modeling and clustering applications, the number of components is often unknown. The stick-breaking model is an appealing construction that assumes infinitely many components while shrinking most of the redundant weights to near zero. However, it has been discovered that this shrinkage is unsatisfactory: even when the component distribution is correctly specified, small, spurious weights appear and yield an inconsistent estimate of the number of clusters. In this article, we propose a simple solution that gains stronger control over the redundant weights: when breaking each stick into two pieces, we adjust the length of the second piece by multiplying it by a quasi-Bernoulli random variable, supported at one and at a positive constant close to zero. This substantially increases the chance of shrinking all the redundant weights to almost zero, leading to a consistent estimator of the number of clusters; at the same time, it avoids the singularity caused by assigning an exactly zero weight and maintains support in the infinite-dimensional space. As a stick-breaking model, its posterior computation can be carried out efficiently via the classic blocked Gibbs sampler, allowing straightforward extension to non-Gaussian components. Compared to existing methods, our model demonstrates markedly better performance in simulations and a data application, showing a substantial reduction in the number of clusters.
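The construction described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a finite truncation at `K` sticks, a `Beta(alpha, 1)` draw for the fraction of each stick left over after a break, and that the "second piece" is that leftover fraction; the function name and the parameters `p`, `eps`, and `alpha` are hypothetical choices made for illustration only.

```python
import numpy as np

def quasi_bernoulli_stick_breaking(K, p=0.9, eps=1e-3, alpha=1.0, rng=None):
    """Draw K mixture weights from a truncated quasi-Bernoulli stick-breaking sketch.

    Assumptions (not taken from the source): truncation level K, a
    Beta(alpha, 1) leftover fraction, and the parameter names p and eps.
    """
    rng = np.random.default_rng(rng)
    # Fraction of the current stick that would be left over after each break.
    beta = rng.beta(alpha, 1.0, size=K)
    # Quasi-Bernoulli variable: equals 1 with probability p, otherwise eps > 0.
    b = np.where(rng.random(K) < p, 1.0, eps)
    # Adjusted second piece: the leftover is shrunk to near zero when b == eps.
    leftover = b * beta
    v = 1.0 - leftover
    v[-1] = 1.0  # truncation: the last component takes all remaining mass
    # Length of stick remaining before each break.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

if __name__ == "__main__":
    w = quasi_bernoulli_stick_breaking(K=20, p=0.8, eps=1e-3, rng=0)
    print(np.round(w, 4), w.sum())
```

Whenever the quasi-Bernoulli variable takes the value `eps`, every later weight is scaled by at most `eps`, so the redundant weights become tiny but remain strictly positive rather than exactly zero.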





