
Hierarchical clustering with discrete latent variable models and the integrated classification likelihood

by   Etienne Côme, et al.

In this paper, we introduce a two-step methodology to extract a hierarchical clustering. This methodology uses the integrated classification likelihood criterion as an objective function, and applies to any discrete latent variable model (DLVM) for which this quantity is tractable. The first step of the methodology involves maximizing the criterion with respect to the states of the discrete latent variables, under uninformative priors. To that end, we propose a new hybrid algorithm based on greedy local searches as well as a genetic algorithm, which allows the joint inference of the number K of clusters and of the clusters themselves. The second step of the methodology is based on a bottom-up greedy procedure to extract a hierarchy of clusters from this natural partition. In a Bayesian context, this is achieved by considering the Dirichlet cluster proportion prior parameter α as a regularisation term controlling the granularity of the clustering. This second step allows the exploration of the clustering at coarser scales and the ordering of the clusters, an important output for visual representations of the clustering results. The clustering results obtained with the proposed approach, on simulated as well as real settings, are compared with existing strategies and are shown to be particularly relevant. This work is implemented in the R package greed.
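To make the two ingredients of the abstract concrete, the following is a minimal Python sketch, not the greed package's implementation. It assumes a mixture of multinomials as the DLVM, with symmetric Dirichlet priors α on the cluster proportions and β on the per-cluster category probabilities, for which the exact integrated classification likelihood has a closed form. The function names `exact_icl` and `greedy_merges` are hypothetical, and the merge step holds α fixed as a simplification, whereas the paper varies α to control the granularity.

```python
from itertools import combinations
from math import lgamma


def exact_icl(counts, labels, alpha=1.0, beta=1.0):
    """log p(X, Z | alpha, beta) for a mixture of multinomials,
    up to a constant that does not depend on the labels Z."""
    clusters = sorted(set(labels))
    k, n, v = len(clusters), len(labels), len(counts[0])
    # Partition term log p(Z | alpha): Dirichlet-multinomial on cluster sizes.
    icl = lgamma(k * alpha) - k * lgamma(alpha) - lgamma(n + k * alpha)
    for c in clusters:
        rows = [row for row, z in zip(counts, labels) if z == c]
        icl += lgamma(len(rows) + alpha)
        # Data term log p(X | Z, beta): Dirichlet-multinomial on pooled counts.
        pooled = [sum(col) for col in zip(*rows)]
        icl += lgamma(v * beta) - v * lgamma(beta)
        icl -= lgamma(sum(pooled) + v * beta)
        icl += sum(lgamma(x + beta) for x in pooled)
    return icl


def greedy_merges(counts, labels, alpha=1.0, beta=1.0):
    """Bottom-up pass: repeatedly apply the pairwise merge with the
    highest ICL until one cluster remains. Returns the visited
    partitions with their scores, finest first."""
    labels = list(labels)
    path = [(list(labels), exact_icl(counts, labels, alpha, beta))]
    while len(set(labels)) > 1:
        best = None
        for a, b in combinations(sorted(set(labels)), 2):
            # Candidate partition with cluster b absorbed into cluster a.
            merged = [a if z == b else z for z in labels]
            score = exact_icl(counts, merged, alpha, beta)
            if best is None or score > best[1]:
                best = (merged, score)
        labels = best[0]
        path.append(best)
    return path
```

Calling `greedy_merges(counts, labels)` on a matrix of row counts and an initial partition returns one `(partition, score)` pair per level of the hierarchy. The sequence of merges gives a dendrogram-like ordering of the clusters, which is the kind of output the second step is meant to provide.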




greed: An R Package for Model-Based Clustering by Greedy Maximization of the Integrated Classification Likelihood

The greed package implements the general and flexible framework of arXiv...

Clustering of variables for enhanced interpretability of predictive models

A new strategy is proposed for building easy to interpret predictive mod...

VARCLUST: clustering variables using dimensionality reduction

VARCLUST algorithm is proposed for clustering variables under the assump...

Computation for Latent Variable Model Estimation: A Unified Stochastic Proximal Framework

Latent variable models have been playing a central role in psychometrics...

funLOCI: a local clustering algorithm for functional data

Nowadays, more and more problems are dealing with data with one infinite...

Exact ICL maximization in a non-stationary temporal extension of the stochastic block model for dynamic networks

The stochastic block model (SBM) is a flexible probabilistic tool that c...

Improving Quality of Hierarchical Clustering for Large Data Series

Brown clustering is a hard, hierarchical, bottom-up clustering of words ...