ClusterCluster: Parallel Markov Chain Monte Carlo for Dirichlet Process Mixtures

04/08/2013
by Dan Lovell, et al.

The Dirichlet process (DP) is a fundamental mathematical tool for Bayesian nonparametric modeling, and is widely used in tasks such as density estimation, natural language processing, and time series modeling. Although MCMC inference methods for the DP often provide a gold standard in terms of asymptotic accuracy, they can be computationally expensive and are not obviously parallelizable. We propose a reparameterization of the Dirichlet process that induces conditional independencies between the atoms that form the random measure. This conditional independence enables many of the Markov chain transition operators for DP inference to be simulated in parallel across multiple cores. Applied to mixture modeling, our approach enables the Dirichlet process to simultaneously learn clusters that describe the data and superclusters that define the granularity of parallelization. Unlike previous approaches, our technique does not require alteration of the model and leaves the true posterior distribution invariant. It also naturally lends itself to a distributed software implementation in terms of Map-Reduce, which we test in cluster configurations of over 50 machines and 100 cores. We present experiments exploring the parallel efficiency and convergence properties of our approach on both synthetic and real-world data, including runs on one million data vectors in 256 dimensions.
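To make the idea of superclusters more concrete, the sketch below simulates a two-stage urn scheme: each data point first picks a supercluster, then picks a cluster inside it with a local Chinese restaurant process. It relies on the standard aggregation property of the Dirichlet process (a Dirichlet-weighted mixture of independent DPs with concentrations alpha * mu_k is itself a DP with concentration alpha). This is an illustration under that assumption, not the authors' sampler, which the abstract does not spell out. The names `two_stage_crp`, `new_cluster_probability`, `alpha`, and `mu` are hypothetical.

```python
import random


def two_stage_crp(n_points, alpha, mu, seed=0):
    """Draw a partition by choosing a supercluster, then a local cluster.

    alpha: overall DP concentration (assumed > 0).
    mu: supercluster weights, assumed strictly positive and summing to 1.
    Returns (assignments, cluster_sizes), where assignments[i] is a
    (supercluster, local_cluster) pair and cluster_sizes[k][j] counts the
    points in local cluster j of supercluster k.
    """
    rng = random.Random(seed)
    n_super = len(mu)
    cluster_sizes = [[] for _ in range(n_super)]
    assignments = []
    for _ in range(n_points):
        # Stage 1: supercluster k is chosen with weight alpha * mu_k + n_k.
        super_weights = [alpha * mu[k] + sum(cluster_sizes[k])
                         for k in range(n_super)]
        k = rng.choices(range(n_super), weights=super_weights)[0]
        # Stage 2: local CRP with concentration alpha * mu_k; the last
        # weight corresponds to opening a new cluster.
        local_weights = cluster_sizes[k] + [alpha * mu[k]]
        j = rng.choices(range(len(local_weights)), weights=local_weights)[0]
        if j == len(cluster_sizes[k]):
            cluster_sizes[k].append(0)
        cluster_sizes[k][j] += 1
        assignments.append((k, j))
    return assignments, cluster_sizes


def new_cluster_probability(alpha, mu, cluster_sizes):
    """Marginal probability that the next point opens a new cluster."""
    n = sum(sum(sizes) for sizes in cluster_sizes)
    total = 0.0
    for k, sizes in enumerate(cluster_sizes):
        n_k = sum(sizes)
        p_super = (alpha * mu[k] + n_k) / (alpha + n)
        p_new_given_super = (alpha * mu[k]) / (alpha * mu[k] + n_k)
        total += p_super * p_new_given_super
    return total


if __name__ == "__main__":
    assignments, sizes = two_stage_crp(200, alpha=2.0, mu=[0.5, 0.3, 0.2], seed=1)
    print("clusters per supercluster:", [len(s) for s in sizes])
    # Compare with the ordinary CRP value alpha / (alpha + n).
    print(new_cluster_probability(2.0, [0.5, 0.3, 0.2], sizes), 2.0 / 202.0)
```

Summing over superclusters, the probability of opening a new cluster collapses to alpha / (alpha + n), the same value as in an ordinary Chinese restaurant process, which is consistent with the claim that the model itself is left unchanged. Once points are grouped by supercluster, the stage-2 updates in one supercluster touch only that supercluster's counts, which is the kind of independence that would let each group be handled on a separate core.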

