Flexible Variable Selection for Clustering and Classification

05/25/2023
by Mackenzie R. Neal, et al.

The importance of variable selection for clustering has been recognized for some time, and mixture models are well established as a statistical approach to clustering. Yet the literature on variable selection in model-based clustering remains largely rooted in the assumption of Gaussian clusters. Unsurprisingly, variable selection algorithms based on this assumption tend to break down in the presence of cluster skewness. A novel variable selection algorithm is presented that uses the Manly transformation mixture model to select variables according to their ability to separate clusters; the algorithm remains effective even when clusters depart from the Gaussian assumption. The proposed approach, which is implemented within the R package vscc, is compared to existing variable selection methods (including an existing method that can account for cluster skewness) using simulated and real datasets.
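The two ingredients named above can be illustrated with a short Python sketch. It is illustrative only and is not the vscc implementation. The first function is the Manly exponential transformation, y = (exp(λx) - 1)/λ for λ ≠ 0 and y = x for λ = 0; roughly, a Manly mixture model treats each cluster as Gaussian after such a transformation, and a negative λ compresses a right tail while a positive λ compresses a left tail. The second function, `within_cluster_variance`, is a hypothetical helper that scores each variable by its pooled within-cluster variance after standardization, given known cluster labels. It is a simplified stand-in for "ability to separate clusters"; the criterion used in the paper may differ.

```python
import numpy as np


def manly(x, lam):
    """Manly exponential transformation, applied elementwise.

    y = (exp(lam * x) - 1) / lam   for lam != 0
    y = x                          for lam == 0 (the limiting case)
    """
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return x.copy()
    # expm1 keeps precision when lam * x is close to zero
    return np.expm1(lam * x) / lam


def within_cluster_variance(X, labels):
    """Hypothetical helper: pooled within-cluster variance per column of X.

    Each column is standardized to unit variance first, so a score near 0
    means the clusters are tight relative to the variable's overall spread
    (good separation), and a score near 1 means the variable carries little
    cluster information.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    score = np.zeros(X.shape[1])
    for g in np.unique(labels):
        Zg = Z[labels == g]
        # sum of squared deviations from the cluster mean, per column
        score += ((Zg - Zg.mean(axis=0)) ** 2).sum(axis=0)
    return score / X.shape[0]


# Two clusters: column 0 separates them, column 1 is pure noise.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
informative = np.where(labels == 0, 0.0, 4.0) + rng.normal(size=200)
noise = rng.normal(size=200)
X = np.column_stack([informative, noise])

scores = within_cluster_variance(X, labels)
print(np.argsort(scores))  # expected: [0 1], the informative variable ranks first
```

Standardizing before scoring makes the scores comparable across variables measured on different scales. In a real clustering setting the labels would come from fitted mixture-model memberships rather than being known in advance.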
