Measuring Cluster Stability for Bayesian Nonparametrics Using the Linear Bootstrap

by Ryan Giordano, et al.

Clustering procedures typically estimate which data points are clustered together, a quantity of primary importance in many analyses. Because clustering is often used as a preliminary step for dimensionality reduction or to facilitate interpretation, finding robust and stable clusters is often crucial for appropriate downstream analysis. In the present work, we consider Bayesian nonparametric (BNP) models, a particularly popular set of Bayesian models for clustering due to their flexibility. Because of its complexity, the Bayesian posterior often cannot be computed exactly, and approximations must be employed. Mean-field variational Bayes (MFVB) forms a posterior approximation by solving an optimization problem and is widely used due to its speed. An exact BNP posterior might vary dramatically when presented with different data, so the stability and robustness of the clustering should be assessed. A popular means of assessing stability is the bootstrap: resample the data and rerun the clustering on each simulated data set. Its time cost is therefore often very high, especially for the sort of exploratory analysis where clustering is typically used. We propose to use a fast and automatic approximation to the full bootstrap called the "linear bootstrap", which can be viewed as a local perturbation of the data. In this work, we demonstrate how to apply this idea to a data analysis pipeline consisting of an MFVB approximation to a BNP clustering posterior of time course gene expression data. We show that, using auto-differentiation tools, the necessary calculations can be done automatically, and that the linear bootstrap is a fast but approximate alternative to the bootstrap.
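The contrast between the full bootstrap and a linear approximation to it can be illustrated with a minimal sketch. It is not the pipeline described in the abstract: the estimator here is a toy (the log of a weighted mean) standing in for an expensive clustering fit, the derivative with respect to the data weights is written out by hand rather than obtained by auto-differentiation, and all function names (`theta`, `sensitivity`, `bootstrap_weights`, `compare`) are hypothetical. The idea shown is only that resampling can be expressed as reweighting the data, and that a first-order Taylor expansion in those weights replaces each refit with a dot product.

```python
import math
import random


def theta(x, w):
    """Toy weighted estimator: log of the weighted mean of x.

    Assumption: this stands in for an expensive refit on reweighted data.
    """
    return math.log(sum(wi * xi for wi, xi in zip(w, x)) / sum(w))


def sensitivity(x):
    """Analytic d theta / d w_i, evaluated at unit weights w = (1, ..., 1)."""
    n = len(x)
    total = sum(x)
    return [xi / total - 1.0 / n for xi in x]


def bootstrap_weights(n, rng):
    """Counts of how often each point appears when n points are resampled."""
    w = [0] * n
    for _ in range(n):
        w[rng.randrange(n)] += 1
    return w


def compare(x, n_boot=200, seed=0):
    """Return (full, linear) bootstrap replicates computed from the same weights."""
    rng = random.Random(seed)
    n = len(x)
    theta_hat = theta(x, [1.0] * n)  # fit once on the original data
    grad = sensitivity(x)            # computed once, reused for every replicate
    full, linear = [], []
    for _ in range(n_boot):
        w = bootstrap_weights(n, rng)
        full.append(theta(x, w))     # full bootstrap: re-evaluate the estimator
        # linear bootstrap: first-order Taylor expansion around w = 1
        linear.append(theta_hat + sum(g * (wi - 1.0) for g, wi in zip(grad, w)))
    return full, linear


# Synthetic positive data, so the log of the weighted mean is always defined.
data_rng = random.Random(1)
x = [data_rng.uniform(1.0, 3.0) for _ in range(200)]
full, linear = compare(x)
max_gap = max(abs(f - l) for f, l in zip(full, linear))
print("largest gap between full and linear replicates:", max_gap)
```

In this toy setting the two sets of replicates agree closely because the estimator is nearly linear in the weights over the range the bootstrap explores. For a real clustering posterior the agreement may be looser, which is the sense in which the linear bootstrap is fast but approximate.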




