A parallelizable model-based approach for marginal and multivariate clustering

12/07/2022
by Miguel de Carvalho, et al.

This paper develops a clustering method that takes advantage of the sturdiness of model-based clustering while attempting to mitigate some of its pitfalls. First, we note that standard model-based clustering likely leads to the same number of clusters per margin, which seems a rather artificial assumption for a variety of datasets. We tackle this issue by specifying a finite mixture model per margin, which allows each margin to have a different number of clusters, and then cluster the multivariate data using a strategy-game-inspired algorithm that we call Reign-and-Conquer. Second, since the proposed clustering approach only specifies a model for the margins, leaving the joint unspecified, it has the advantage of being partially parallelizable; hence, the proposed approach is computationally appealing and more tractable in moderate to high dimensions than a 'full' (joint) model-based clustering approach. A battery of numerical experiments on artificial data indicates overall good performance of the proposed methods in a variety of scenarios, and real datasets are used to showcase their application in practice.
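
The idea of fitting a separate finite mixture to each margin, with its own number of clusters, can be illustrated with a minimal sketch. This is not the Reign-and-Conquer algorithm, which the abstract does not spell out. It covers only the per-margin step, and every concrete choice in it is an assumption made for illustration: Gaussian components, fitting by EM, selecting the number of components by BIC, the helper names (`fit_mixture_1d`, `select_mixture`, `marginal_labels`, `fit_margins`), and the toy data. Because the margins share no state, the fits can be dispatched to a pool of workers.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_mixture_1d(xs, k, n_iter=100):
    """Fit a k-component univariate Gaussian mixture by EM (assumed model).

    Returns (log-likelihood, weights, means, standard deviations).
    """
    n = len(xs)
    ordered = sorted(xs)
    centre = sum(xs) / n
    spread = math.sqrt(sum((x - centre) ** 2 for x in xs) / n) or 1.0
    # Deterministic start: means at evenly spaced quantiles, shared spread.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    sds = [spread / k] * k
    weights = [1.0 / k] * k
    sd_floor = 0.05 * spread  # guards against degenerate components

    def weighted_densities(x):
        return [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]

    for _ in range(n_iter):
        # E-step: posterior membership probabilities for every observation.
        resp = []
        for x in xs:
            dens = weighted_densities(x)
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update weights, means and standard deviations in place.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), sd_floor)

    loglik = sum(math.log(sum(weighted_densities(x)) or 1e-300) for x in xs)
    return loglik, weights, means, sds


def select_mixture(xs, k_max=3):
    """Pick the number of components for one margin by minimising BIC."""
    n = len(xs)
    best = None
    for k in range(1, k_max + 1):
        loglik, weights, means, sds = fit_mixture_1d(xs, k)
        # A k-component mixture has 3k - 1 free parameters.
        bic = -2.0 * loglik + (3 * k - 1) * math.log(n)
        if best is None or bic < best["bic"]:
            best = {"k": k, "bic": bic, "weights": weights,
                    "means": means, "sds": sds}
    return best


def marginal_labels(xs, fit):
    """Assign each observation to its most probable marginal component."""
    labels = []
    for x in xs:
        dens = [w * normal_pdf(x, m, s)
                for w, m, s in zip(fit["weights"], fit["means"], fit["sds"])]
        labels.append(dens.index(max(dens)))
    return labels


def fit_margins(rows, k_max=3):
    """Fit every margin independently; the per-margin fits share no state."""
    columns = [list(col) for col in zip(*rows)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda col: select_mixture(col, k_max), columns))


# Hypothetical toy data: margin 0 has two well-separated groups, margin 1 has one.
rng = random.Random(0)
rows = [(rng.gauss(0.0 if i % 2 == 0 else 10.0, 1.0), rng.gauss(0.0, 1.0))
        for i in range(200)]
fits = fit_margins(rows)
labels = [marginal_labels(col, fit) for col, fit in zip(zip(*rows), fits)]
print([fit["k"] for fit in fits])  # selected number of clusters per margin
```

A thread pool is used here only to keep the sketch portable; for CPU-bound fits in pure Python, a process pool would be the more natural way to obtain an actual speed-up. Combining the marginal labels into multivariate clusters is the part handled by the authors' algorithm and is deliberately left out.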
