Hierarchical clustering with discrete latent variable models and the integrated classification likelihood

02/26/2020
by Etienne Côme, et al.

In this paper, we introduce a two-step methodology to extract a hierarchical clustering. The methodology uses the integrated classification likelihood (ICL) criterion as its objective function and applies to any discrete latent variable model (DLVM) for which this quantity is tractable. The first step maximizes the criterion with respect to the states of the discrete latent variables under uninformative priors. To that end, we propose a new hybrid algorithm that combines greedy local searches with a genetic algorithm and allows joint inference of the number K of clusters and of the clusters themselves. The second step uses a bottom-up greedy procedure to extract a hierarchy of clusters from this natural partition. In a Bayesian context, this is achieved by treating the Dirichlet cluster proportion prior parameter α as a regularisation term controlling the granularity of the clustering. This second step allows the clustering to be explored at coarser scales and yields an ordering of the clusters, an important output for visual representations of the clustering results. The clustering results obtained with the proposed approach, on simulated as well as real settings, are compared with existing strategies and are shown to be particularly relevant. This work is implemented in the R package greed.
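The bottom-up step described in the abstract can be illustrated with a minimal Python sketch. It is not the greed implementation and does not reproduce the paper's algorithm: it assumes a simple mixture of multinomials as the observation model, a fixed α instead of the α-driven regularisation path, and hypothetical helper names (`log_dirmult`, `icl`, `greedy_merge`). The only ingredient taken as given is the standard Dirichlet-multinomial marginal, which makes the criterion available in closed form when symmetric Dirichlet priors are placed on the cluster proportions (parameter α) and on the per-cluster category probabilities (parameter β, an assumption of this sketch).

```python
from itertools import combinations
from math import lgamma


def log_dirmult(counts, conc):
    """Log marginal probability of an ordered sequence of categorical draws
    with the given category counts, under a symmetric Dirichlet(conc) prior."""
    k = len(counts)
    return (lgamma(k * conc) - k * lgamma(conc)
            + sum(lgamma(c + conc) for c in counts)
            - lgamma(sum(counts) + k * conc))


def icl(rows, labels, alpha=1.0, beta=1.0):
    """Exact ICL, log p(X, Z | alpha, beta), for a mixture of multinomials,
    up to an additive constant that does not depend on the labels."""
    clusters = sorted(set(labels))
    sizes = [sum(1 for lab in labels if lab == k) for k in clusters]
    score = log_dirmult(sizes, alpha)  # cluster-proportion term, log p(Z | alpha)
    for k in clusters:
        pooled = [0] * len(rows[0])
        for row, lab in zip(rows, labels):
            if lab == k:
                pooled = [p + c for p, c in zip(pooled, row)]
        score += log_dirmult(pooled, beta)  # observation term for cluster k
    return score


def greedy_merge(rows, labels, alpha=1.0, beta=1.0):
    """Repeatedly apply the pairwise merge with the highest ICL until a single
    cluster remains; return the merges as (kept, absorbed, icl_after_merge)."""
    labels = list(labels)
    merges = []
    while len(set(labels)) > 1:
        best = None
        for a, b in combinations(sorted(set(labels)), 2):
            candidate = [a if lab == b else lab for lab in labels]
            score = icl(rows, candidate, alpha, beta)
            if best is None or score > best[0]:
                best = (score, a, b, candidate)
        score, a, b, labels = best
        merges.append((a, b, score))
    return merges


# Toy data: three rows concentrated on category 0, three on category 1,
# with the first group deliberately over-split into clusters 0 and 1.
rows = [[10, 0]] * 3 + [[0, 10]] * 3
labels = [0, 0, 1, 2, 2, 2]
merges = greedy_merge(rows, labels)
for kept, absorbed, score in merges:
    print(f"merge {absorbed} into {kept}: ICL = {score:.2f}")
```

On this toy input the two clusters sharing the same profile are merged first, and the merge of the two genuinely different groups comes last with a much lower criterion value. The recorded merge order is what defines the hierarchy and the ordering of the clusters in this simplified setting.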


research
04/29/2022

greed: An R Package for Model-Based Clustering by Greedy Maximization of the Integrated Classification Likelihood

The greed package implements the general and flexible framework of arXiv...
research
08/18/2020

Clustering of variables for enhanced interpretability of predictive models

A new strategy is proposed for building easy to interpret predictive mod...
research
11/12/2020

VARCLUST: clustering variables using dimensionality reduction

VARCLUST algorithm is proposed for clustering variables under the assump...
research
08/17/2020

Computation for Latent Variable Model Estimation: A Unified Stochastic Proximal Framework

Latent variable models have been playing a central role in psychometrics...
research
05/22/2023

funLOCI: a local clustering algorithm for functional data

Nowadays, more and more problems are dealing with data with one infinite...
research
05/09/2016

Exact ICL maximization in a non-stationary temporal extension of the stochastic block model for dynamic networks

The stochastic block model (SBM) is a flexible probabilistic tool that c...
research
08/03/2016

Improving Quality of Hierarchical Clustering for Large Data Series

Brown clustering is a hard, hierarchical, bottom-up clustering of words ...
