Distributed Bayesian clustering

03/31/2020
by Hanyu Song, et al.

In many modern applications, there is interest in analyzing enormous data sets that cannot be easily moved across computers or loaded into memory on a single computer. In such settings, clustering is often a primary goal. Existing distributed clustering algorithms are mostly distance- or density-based and lack a likelihood specification, which precludes formal statistical inference. We introduce a nearly embarrassingly parallel algorithm using a Bayesian finite mixture of mixtures model for distributed clustering, which we term distributed Bayesian clustering (DIB-C). DIB-C can flexibly accommodate data sets with various shapes (e.g., skewed or multi-modal). With the data randomly partitioned and distributed, we first run Markov chain Monte Carlo in an embarrassingly parallel manner to obtain local clustering draws, and then refine across nodes to obtain a final cluster estimate based on any loss function on the space of partitions. DIB-C can also provide a posterior predictive distribution, estimate cluster densities, and quickly classify new subjects. Both simulation studies and real data applications show superior performance of DIB-C in terms of robustness and computational efficiency.
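
To make "a final cluster estimate based on any loss function on the space of partitions" concrete, here is a minimal sketch of one common choice: picking, among a set of sampled partitions, the one that minimizes Binder's loss with equal costs. It is an illustration under stated assumptions, not the DIB-C refinement step itself: the abstract does not say which loss is used, and the function names (`posterior_similarity`, `binder_loss`, `point_estimate`) and the toy draws are hypothetical.

```python
from itertools import combinations


def posterior_similarity(draws):
    """Estimate P(items i and j share a cluster) as the fraction of draws
    in which they carry the same label. Only entries with i < j are filled."""
    n = len(draws[0])
    psm = [[0.0] * n for _ in range(n)]
    for labels in draws:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                psm[i][j] += 1.0 / len(draws)
    return psm


def binder_loss(labels, psm):
    """Expected Binder loss (equal costs) of a candidate partition,
    computed from the pairwise co-clustering probabilities."""
    return sum(
        abs((labels[i] == labels[j]) - psm[i][j])
        for i, j in combinations(range(len(labels)), 2)
    )


def point_estimate(draws):
    """Return the sampled partition with the smallest expected Binder loss."""
    psm = posterior_similarity(draws)
    return min(draws, key=lambda labels: binder_loss(labels, psm))


# Hypothetical clustering draws for four items (assumed data, for illustration).
draws = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as above with the labels switched
    [0, 0, 0, 1],
]
psm = posterior_similarity(draws)
best = point_estimate(draws)
print(best)
```

Because the loss depends only on which pairs of items are grouped together, it is unaffected by label switching across draws, which is one reason pairwise losses of this kind are convenient for summarizing MCMC clustering output.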


research
03/31/2020

Distributed Bayesian clustering using finite mixture of mixtures

In many modern applications, there is interest in analyzing enormous dat...
research
06/04/2020

Bayesian clustering of high-dimensional data

In many applications, it is of interest to cluster subjects based on ver...
research
12/20/2021

Bayesian nonparametric model based clustering with intractable distributions: an ABC approach

Bayesian nonparametric mixture models offer a rich framework for model b...
research
03/31/2023

Bayesian Clustering via Fusing of Localized Densities

Bayesian clustering typically relies on mixture models, with each compon...
research
02/16/2019

Model fitting in Multiple Systems Analysis for the quantification of Modern Slavery: Classical and Bayesian approaches

Multiple Systems Estimation is a key estimation approach for hidden popu...
research
02/16/2020

Bayesian Spatial Homogeneity Pursuit of Functional Data: an Application to the U.S. Income Distribution

An income distribution describes how an entity's total wealth is distrib...
research
12/02/2019

Clustering via Ant Colonies: Parameter Analysis and Improvement of the Algorithm

An ant colony optimization approach for partitioning a set of objects is...
