Flexible Clustering with a Sparse Mixture of Generalized Hyperbolic Distributions

Robust clustering of high-dimensional data is an important topic because, in many practical situations, real data sets are heavy-tailed and/or asymmetric. Moreover, traditional model-based clustering often fails for high-dimensional data because of the large number of free covariance parameters. A parametrization of the component scale matrices for the mixture of generalized hyperbolic distributions is proposed in which a penalty term added to the likelihood constrains the parameters, yielding a model that is flexible for high-dimensional data and has a meaningful interpretation. An analytically feasible EM algorithm is developed by placing a gamma-Lasso penalty on the concentration matrix. The proposed methodology is investigated through simulation studies and two real data sets.
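
The remark about free covariance parameters can be made concrete with a small count. The sketch below is not taken from the paper: it assumes each of G mixture components has an unconstrained symmetric p x p scale matrix, which contributes p(p+1)/2 free parameters, and the function name and the choice G = 3 are purely illustrative.

```python
def free_scale_parameters(p: int, G: int) -> int:
    """Free parameters in G unconstrained symmetric p x p scale matrices.

    Each symmetric p x p matrix has p * (p + 1) / 2 distinct entries.
    """
    return G * p * (p + 1) // 2


# Illustrative dimensions only; G = 3 components is an arbitrary choice.
for p in (5, 50, 500):
    print(f"p = {p:>3}: {free_scale_parameters(p, G=3)} scale parameters")
```

The count grows quadratically in the dimension p, which is the growth that a sparsity-inducing penalty on the concentration matrix is meant to counteract.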

research
11/01/2022

A Bayesian Framework on Asymmetric Mixture of Factor Analyser

Mixture of factor analyzer (MFA) model is an efficient model for the ana...
research
08/26/2022

High-dimensional sparse vine copula regression with application to genomic prediction

High-dimensional data sets are often available in genome-enabled predict...
research
02/18/2019

Going deep in clustering high-dimensional data: deep mixtures of unigrams for uncovering topics in textual data

Mixtures of Unigrams (Nigam et al., 2000) are one of the simplest and mo...
research
07/08/2020

Model-based Clustering using Automatic Differentiation: Confronting Misspecification and High-Dimensional Data

We study two practically important cases of model based clustering using...
research
01/10/2013

Discovering Multiple Constraints that are Frequently Approximately Satisfied

Some high-dimensional data sets can be modelled by assuming that there a...
research
07/20/2023

Sparse model-based clustering of three-way data via lasso-type penalties

Mixtures of matrix Gaussian distributions provide a probabilistic framew...
research
07/07/2021

Bayesian model-based clustering for multiple network data

There is increasing appetite for analysing multiple network data. This i...
