Deep Unsupervised Feature Selection by Discarding Nuisance and Correlated Features

by Uri Shaham, et al.

Modern datasets often contain large subsets of correlated features and nuisance features, which are unrelated or only loosely related to the main underlying structures of the data. Nuisance features can be identified using the Laplacian score criterion, which evaluates the importance of a given feature via its consistency with the leading eigenvectors of the graph Laplacian. We demonstrate that, in the presence of large numbers of nuisance features, the Laplacian must be computed on the subset of selected features rather than on the complete feature set. To this end, we propose a fully differentiable approach for unsupervised feature selection that uses the Laplacian score criterion to avoid selecting nuisance features. To cope with correlated features, we employ an autoencoder architecture trained to reconstruct the data from the subset of selected features. We build on the recently proposed concrete layer, which controls the number of selected features via architectural design and thereby simplifies the optimization process. In experiments on several real-world datasets, we demonstrate that our proposed approach outperforms similar approaches designed to avoid only correlated features or only nuisance features, but not both. Several state-of-the-art clustering results are reported.
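To illustrate why the graph should be built from the selected features rather than from all of them, here is a minimal NumPy sketch of the Laplacian score, score_r = (f_rᵀ L f_r) / (f_rᵀ D f_r), where f_r is feature r with its degree-weighted mean removed, D is the degree matrix and L = D - W; lower scores indicate features more consistent with the graph. This is not the authors' implementation: the binary symmetrized k-nearest-neighbor graph, the synthetic data (two clustered informative features plus 500 Gaussian nuisance features), and the helper names `knn_graph` and `laplacian_scores` are assumptions for illustration.

```python
import numpy as np

def knn_graph(G, k=10):
    """Binary, symmetrized k-nearest-neighbor adjacency built from the rows of G (assumed graph type)."""
    sq = np.sum(G ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * G @ G.T   # pairwise squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                      # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]              # indices of the k closest points per row
    W = np.zeros_like(d2)
    W[np.repeat(np.arange(G.shape[0]), k), nbrs.ravel()] = 1.0
    return np.maximum(W, W.T)                         # symmetrize

def laplacian_scores(X, graph_cols, k=10):
    """Laplacian score of every column of X, with the graph built only from graph_cols (hypothetical helper)."""
    W = knn_graph(X[:, graph_cols], k)
    d = W.sum(axis=1)                # node degrees, i.e. the diagonal of D
    L = np.diag(d) - W               # unnormalized graph Laplacian
    scores = []
    for f in X.T:
        f_t = f - (f @ d) / d.sum()  # remove the degree-weighted mean
        scores.append((f_t @ L @ f_t) / (f_t @ (d * f_t)))
    return np.array(scores)

# Synthetic data: 2 informative features forming two clusters, plus many nuisance features.
rng = np.random.default_rng(0)
n, n_nuisance = 200, 500
labels = np.repeat([0, 1], n // 2)
informative = rng.normal(size=(n, 2)) + np.where(labels[:, None] == 1, 2.0, -2.0)
X = np.hstack([informative, rng.normal(size=(n, n_nuisance))])

s_full = laplacian_scores(X, graph_cols=np.arange(X.shape[1]))  # graph on all features
s_sub = laplacian_scores(X, graph_cols=[0, 1])                  # graph on the selected features
print(f"feature 0 score: full graph {s_full[0]:.3f}, selected-subset graph {s_sub[0]:.3f}")
```

The informative feature should score noticeably lower when the graph uses only the informative columns: with the full feature set, the nuisance columns dominate the distances that define each point's neighbors, so the graph no longer reflects the cluster structure.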

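The abstract's reason for using an autoencoder against correlated features is that, when the data must be reconstructed from the selected subset, picking two near-duplicate features wastes part of the budget. The NumPy sketch below makes this concrete under a simplifying assumption: a linear least-squares decoder stands in for the trained neural decoder, and the helper name `reconstruction_error` and the three-column synthetic data are hypothetical.

```python
import numpy as np

def reconstruction_error(X, selected):
    """MSE of reconstructing every column of X from X[:, selected] with a
    linear least-squares decoder (an assumed stand-in for a trained decoder)."""
    S = X[:, selected]
    B, *_ = np.linalg.lstsq(S, X, rcond=None)  # decoder weights mapping S back to X
    return float(np.mean((X - S @ B) ** 2))

rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = rng.normal(size=n)
# Column 1 is a near-copy of column 0; column 2 carries independent information.
X = np.column_stack([a, a + 0.01 * rng.normal(size=n), b])

err_redundant = reconstruction_error(X, [0, 1])  # two correlated features
err_diverse = reconstruction_error(X, [0, 2])    # one feature from each group
print(f"redundant subset: {err_redundant:.4f}, diverse subset: {err_diverse:.6f}")
```

The redundant subset cannot recover the independent column, so its error stays large, while the diverse subset reconstructs all three columns almost exactly; minimizing reconstruction error therefore pushes the selection away from correlated duplicates.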

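The concrete layer mentioned in the abstract replaces a discrete choice of k features with k selector units, each holding logits over all d input features and producing a relaxed one-hot vector via the Gumbel-softmax (concrete) relaxation. Because k equals the number of selector units, the feature budget is fixed by the architecture rather than by a sparsity penalty. The NumPy sketch below samples such a selection matrix; the logits are hand-set stand-ins for learned parameters, and the function name `sample_concrete_selection` is hypothetical.

```python
import numpy as np

def sample_concrete_selection(logits, temperature, rng):
    """Sample a (k, d) relaxed selection matrix: one concrete (Gumbel-softmax)
    sample per selector row. Each row sums to 1."""
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))                 # standard Gumbel noise
    z = (logits + gumbel) / temperature
    z -= z.max(axis=1, keepdims=True)            # numerical stability before exp
    m = np.exp(z)
    return m / m.sum(axis=1, keepdims=True)      # row-wise softmax

rng = np.random.default_rng(0)
d, k = 10, 3
logits = np.zeros((k, d))
logits[[0, 1, 2], [2, 5, 7]] = 20.0  # hypothetical learned preferences for features 2, 5, 7
X = rng.normal(size=(100, d))

M = sample_concrete_selection(logits, temperature=0.01, rng=rng)
X_sel = X @ M.T                      # (n, k): the k "selected" features
print(M.argmax(axis=1))              # feature index chosen by each selector unit
```

At a low temperature each row of M is close to one-hot, so X @ M.T approximately equals the chosen columns; at higher temperatures the rows are smoother, which is what keeps the selection differentiable during training.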


Let the Data Choose its Features: Differentiable Unsupervised Feature Selection

Scientific observations often consist of a large number of variables (fe...

Spectral Simplicial Theory for Feature Selection and Applications to Genomics

The scale and complexity of modern data sets and the limitations associa...

Discovering Support and Affiliated Features from Very High Dimensions

In this paper, a novel learning paradigm is presented to automatically i...

An Evolutionary Correlation-aware Feature Selection Method for Classification Problems

The population-based optimization algorithms have provided promising res...

An Experiment Design Paradigm using Joint Feature Selection and Task Optimization

This paper presents a subsampling-task paradigm for data-driven task-spe...
