Efficient hierarchical clustering for continuous data

04/20/2012
by Ricardo Henao, et al.

We present a new sequential Monte Carlo sampler for coalescent-based Bayesian hierarchical clustering. Our model is appropriate for non-i.i.d. data and substantially reduces computational cost relative to the original sampler without resorting to approximations. We also propose an approximation with quadratic complexity that, in practice, shows almost no loss in performance compared with its exact counterpart. As a byproduct of our formulation, we obtain a greedy algorithm that outperforms other greedy algorithms, particularly on small data sets. To exploit the correlation structure of the data, we describe how to incorporate Gaussian process priors into the model as a flexible way to handle non-i.i.d. data. Results on artificial and real data show significant improvements over closely related approaches.
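To make the idea of greedy coalescent-style clustering of continuous data more concrete, here is a minimal sketch, not the authors' sampler or their greedy algorithm. It assumes Brownian-motion (Gaussian) diffusion along each branch and replaces the coalescent times that the paper infers with a single fixed, hypothetical branch-length parameter `delta`. Each node carries a Gaussian "message" (mean vector, scalar variance); at every step the pair whose messages are most likely to share a parent is merged, and the parent message is the precision-weighted combination of its children. The function names are illustrative only.

```python
import math
from itertools import combinations


def log_merge_score(mu_a, v_a, mu_b, v_b, delta):
    """Log density of (mu_a - mu_b) under N(0, v_a + v_b + 2*delta), summed over dimensions.

    This is the marginal likelihood that both children diffused from a common
    parent (flat prior on the parent), each along a branch of length delta.
    """
    var = v_a + v_b + 2.0 * delta
    return sum(-0.5 * (math.log(2.0 * math.pi * var) + (a - b) ** 2 / var)
               for a, b in zip(mu_a, mu_b))


def merge_messages(mu_a, v_a, mu_b, v_b, delta):
    """Precision-weighted parent message after diffusing each child by delta."""
    pa, pb = 1.0 / (v_a + delta), 1.0 / (v_b + delta)
    mu = [(pa * a + pb * b) / (pa + pb) for a, b in zip(mu_a, mu_b)]
    return mu, 1.0 / (pa + pb)


def greedy_cluster(points, delta=1.0):
    """Greedy agglomerative clustering; returns merges as (child, child, parent) ids.

    Leaves get ids 0..n-1 and exact (zero-variance) messages; each new internal
    node gets the next unused id. This naive version scores all active pairs at
    every step, so it is cubic in n overall.
    """
    active = {i: (list(p), 0.0) for i, p in enumerate(points)}
    merges = []
    next_id = len(points)
    while len(active) > 1:
        # Pick the pair of active nodes with the highest merge score.
        i, j = max(combinations(active, 2),
                   key=lambda ij: log_merge_score(*active[ij[0]], *active[ij[1]], delta))
        mu, v = merge_messages(*active.pop(i), *active.pop(j), delta)
        active[next_id] = (mu, v)
        merges.append((i, j, next_id))
        next_id += 1
    return merges


if __name__ == "__main__":
    # Two well-separated 1-D groups: {0.0, 0.2} and {5.0, 5.5}.
    print(greedy_cluster([(0.0,), (0.2,), (5.0,), (5.5,)]))
```

On the toy input the two nearby pairs are merged first and then joined at the root. Unlike the paper's approach, this sketch keeps no sampled tree posterior and no Gaussian process prior across dimensions; it only illustrates how Gaussian messages make greedy merging cheap to score.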

research
07/15/2021

The Taxicab Sampler: MCMC for Discrete Spaces with Application to Tree Models

Motivated by the problem of exploring discrete but very complex state sp...
research
07/13/2018

Sequential sampling of Gaussian process latent variable models

We consider the problem of inferring a latent function in a probabilisti...
research
09/27/2013

Bayesian Inference in Sparse Gaussian Graphical Models

One of the fundamental tasks of science is to find explainable relations...
research
06/18/2015

Hamiltonian Monte Carlo Acceleration Using Surrogate Functions with Random Bases

For big data analysis, high computational cost for Bayesian methods ofte...
research
07/13/2018

Sequential sampling of Gaussian latent variable models

We consider the problem of inferring a latent function in a probabilisti...
research
11/09/2012

Efficient Monte Carlo Methods for Multi-Dimensional Learning with Classifier Chains

Multi-dimensional classification (MDC) is the supervised learning proble...
research
01/28/2023

ClusterFuG: Clustering Fully connected Graphs by Multicut

We propose a graph clustering formulation based on multicut (a.k.a. weig...