Bayesian Distance Clustering

10/19/2018
by Leo L. Duan, et al.

Model-based clustering is widely used in a variety of application areas. However, fundamental concerns remain about robustness. In particular, results can be sensitive to the choice of kernel representing the within-cluster data density. Leveraging properties of pairwise differences between data points, we propose a class of Bayesian distance clustering methods, which model the likelihood of the pairwise distances in place of the original data. Although some information in the data is discarded, we gain substantial robustness to modeling assumptions. The proposed approach represents an appealing middle ground between distance-based and model-based clustering, drawing advantages from each of these canonical approaches. We illustrate dramatic gains in the ability to infer clusters that are not well represented by the usual choices of kernel. A simulation study assesses performance relative to competing methods, and we apply the approach to clustering of brain genome expression data.

Keywords: Distance-based clustering; Mixture model; Model-based clustering; Model misspecification; Pairwise distance matrix; Partial likelihood; Robustness.
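
To make the idea of scoring a partition through its pairwise distances concrete, here is a minimal sketch. It is not the authors' implementation: the Gamma density for within-cluster distances, its `shape` and `rate` values, and the function names `pairwise_distances`, `gamma_logpdf`, and `distance_log_likelihood` are all assumptions chosen for illustration. The sketch computes the Euclidean distance matrix and sums a log-density over within-cluster pairs only, so the raw coordinates never enter the score.

```python
import math
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X (n x p)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def gamma_logpdf(d, shape, rate):
    """Log-density of a Gamma(shape, rate) distribution at distances d > 0."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * np.log(d) - rate * d)


def distance_log_likelihood(D, labels, shape=2.0, rate=1.0):
    """Sum an assumed Gamma log-density over all within-cluster pairs.

    The Gamma form and its parameters are illustrative assumptions,
    not values taken from the paper.
    """
    total = 0.0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if idx.size < 2:
            continue  # a singleton cluster contributes no pairs
        sub = D[np.ix_(idx, idx)]
        upper = np.triu_indices(idx.size, k=1)  # each pair counted once
        total += gamma_logpdf(sub[upper], shape, rate).sum()
    return float(total)


# Two well-separated synthetic clusters (made-up data for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)),
               rng.normal(10.0, 1.0, (20, 2))])
D = pairwise_distances(X)

true_labels = np.repeat([0, 1], 20)   # matches how X was generated
mixed_labels = np.tile([0, 1], 20)    # alternating labels, ignores structure

score_true = distance_log_likelihood(D, true_labels)
score_mixed = distance_log_likelihood(D, mixed_labels)
print(score_true > score_mixed)
```

Under these assumptions, the partition that matches the generating clusters scores higher because its within-cluster distances are small, whereas the alternating partition forces many large cross-cluster distances into the within-cluster term. A full Bayesian treatment would also place priors on the partition and density parameters, which this sketch omits.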

