Sparse GEMINI for Joint Discriminative Clustering and Feature Selection

02/07/2023
by Louis Ohl, et al.

Feature selection in clustering is a hard task that involves the simultaneous discovery of relevant clusters and of the variables relevant to these clusters. While feature selection algorithms are often model-based, relying on optimised model selection or strong assumptions on p(x), we introduce a discriminative clustering model that maximises a geometry-aware generalisation of the mutual information called GEMINI with a simple ℓ_1 penalty: Sparse GEMINI. This algorithm avoids the burden of combinatorial feature subset exploration and scales easily to high-dimensional data and large numbers of samples, while requiring only the design of a clustering model p_θ(y|x). We demonstrate the performance of Sparse GEMINI on synthetic datasets as well as large-scale datasets. Our results show that Sparse GEMINI is a competitive algorithm and can select subsets of variables relevant to the clustering without using relevance criteria or prior hypotheses.
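The abstract does not specify the GEMINI estimator, the model architecture, or the optimiser, so the following is only a minimal NumPy sketch of the idea under stated assumptions, not the authors' implementation. It assumes a hypothetical linear softmax model p_θ(y|x) = softmax(xW + b), a one-vs-all MMD form of GEMINI taken here to be E_y[MMD(p(x|y), p(x))] with a linear kernel, a plain ℓ_1 penalty on W, and soft-thresholding (the standard proximal operator of the ℓ_1 norm) as the mechanism that zeroes out weights. A feature counts as selected when its row of W is nonzero. All function names are illustrative.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    expd = np.exp(shifted)
    return expd / expd.sum(axis=1, keepdims=True)


def linear_kernel(x):
    """Gram matrix of a linear kernel (an illustrative assumption)."""
    return x @ x.T


def mmd_ova_gemini(probs, gram):
    """Assumed one-vs-all MMD GEMINI: sum_k p(y=k) * MMD(p(x|y=k), p(x)).

    probs: (N, K) soft assignments p(y=k | x_i); gram: (N, N) kernel matrix.
    """
    n, n_clusters = probs.shape
    uniform = np.full(n, 1.0 / n)          # empirical p(x) as sample weights
    mass = probs.sum(axis=0)               # N * p(y=k)
    total = 0.0
    for k in range(n_clusters):
        if mass[k] <= 0.0:                 # an empty cluster contributes nothing
            continue
        cond = probs[:, k] / mass[k]       # empirical p(x | y=k) as sample weights
        diff = cond - uniform
        mmd_sq = max(float(diff @ gram @ diff), 0.0)  # clip round-off below zero
        total += (mass[k] / n) * np.sqrt(mmd_sq)
    return total


def sparse_gemini_objective(weights, bias, x, lam):
    """Penalised objective to maximise: GEMINI minus lam * ||W||_1.

    weights: (D, K) of the hypothetical linear model; bias: (K,).
    """
    probs = softmax(x @ weights + bias)
    return mmd_ova_gemini(probs, linear_kernel(x)) - lam * np.abs(weights).sum()


def soft_threshold(weights, step):
    """Proximal operator of step * ||W||_1: shrinks entries, zeroing small ones."""
    return np.sign(weights) * np.maximum(np.abs(weights) - step, 0.0)


def selected_features(weights, tol=0.0):
    """Indices of features used by at least one cluster (nonzero row of W)."""
    return np.flatnonzero(np.abs(weights).max(axis=1) > tol)
```

Soft-thresholding sets weights exactly to zero rather than merely shrinking them, which is why a proximal step pairs naturally with an ℓ_1 penalty. The gradient step on the smooth GEMINI term (for example via automatic differentiation) is deliberately left out of this sketch.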
