
Distributed Bayesian clustering

03/31/2020
by Hanyu Song, et al.
Duke University
SAS

In many modern applications, there is interest in analyzing enormous data sets that cannot be easily moved across computers or loaded into memory on a single computer. In such settings, clustering is a common goal. Existing distributed clustering algorithms are mostly distance- or density-based and lack a likelihood specification, which precludes formal statistical inference. We introduce a nearly embarrassingly parallel algorithm using a Bayesian finite mixture of mixtures model for distributed clustering, which we term distributed Bayesian clustering (DIB-C). DIB-C can flexibly accommodate data sets with various shapes (e.g., skewed or multi-modal). With data randomly partitioned and distributed, we first run Markov chain Monte Carlo in an embarrassingly parallel manner to obtain local clustering draws and then refine across nodes for a final cluster estimate based on any loss function on the space of partitions. DIB-C can also provide a posterior predictive distribution, estimate cluster densities, and quickly classify new subjects. Both simulation studies and real data applications show superior performance of DIB-C in terms of robustness and computational efficiency.
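
The abstract says the final cluster estimate can be based on any loss function on the space of partitions. As a rough illustration of that one step only, the sketch below picks a point estimate from a set of clustering draws by minimizing posterior expected Binder loss. Binder loss with equal misclassification costs is an assumed choice, not one the abstract names; the draws are hard-coded toy data rather than MCMC output; and the function names are hypothetical. It does not reproduce the DIB-C refinement across nodes.

```python
from itertools import combinations


def coclustering_matrix(draws):
    """For each pair (i, j), the fraction of draws that put i and j together."""
    n = len(draws[0])
    return {
        (i, j): sum(d[i] == d[j] for d in draws) / len(draws)
        for i, j in combinations(range(n), 2)
    }


def expected_binder_loss(partition, psm):
    """Expected Binder loss of `partition` (equal costs, an assumption).

    A pair placed together costs 1 - p, a pair placed apart costs p,
    where p is the pair's co-clustering frequency across the draws.
    """
    return sum(abs((partition[i] == partition[j]) - p) for (i, j), p in psm.items())


def binder_estimate(draws):
    """Return the draw with the smallest expected Binder loss."""
    psm = coclustering_matrix(draws)
    return min(draws, key=lambda d: expected_binder_loss(d, psm))


# Toy clustering draws for five subjects (illustrative only).
# The last draw is the first one with labels swapped: the same partition.
draws = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0],
]

estimate = binder_estimate(draws)
print(estimate)
```

Because the loss depends only on whether two subjects share a label, relabeled copies of the same partition score identically, so label switching across draws does not affect the choice. Restricting the search to the sampled draws is a simplification; other search strategies and other loss functions are possible.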

