Greedy Search Algorithms for Unsupervised Variable Selection: A Comparative Study

by Federico Zocco, et al.

Dimensionality reduction is an important step in the development of scalable and interpretable data-driven models, especially when there is a large number of candidate variables. This paper focuses on dimensionality reduction based on unsupervised variable selection, and in particular on unsupervised greedy selection methods, which have been proposed by various researchers as computationally tractable approximations to optimal subset selection. These methods are largely distinguished from each other by the selection criterion adopted, examples of which include squared correlation, variance explained, mutual information and frame potential. Motivated by the absence in the literature of a systematic comparison of these methods, we present a critical evaluation of seven unsupervised greedy variable selection algorithms on both simulated and real-world case studies. We also review the theoretical results, related to the concept of submodularity, that provide performance guarantees and enable efficient implementations for certain classes of greedy selection function. Furthermore, we introduce and evaluate, for the first time, a lazy implementation of the variance-explained-based forward selection component analysis (FSCA) algorithm. Our experimental results show that: (1) selection methods based on variance explained and mutual information yield smaller approximation errors than frame potential; (2) the lazy FSCA implementation has similar performance to FSCA while being an order of magnitude faster to compute, making it the algorithm of choice for unsupervised variable selection.
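
As a rough illustration of the kind of greedy selection described above, the following is a minimal pure-Python sketch of forward selection by variance explained. It assumes a common formulation in which each step picks the column that explains the most variance of the current residual data and then projects that column out. It is not the authors' implementation and omits the lazy evaluation; the names `dot` and `fsca` and the toy data are hypothetical, and the columns are assumed to be mean-centred.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def fsca(X, k, tol=1e-12):
    """Greedy forward selection by variance explained (illustrative sketch).

    X   : list of rows (samples); columns are assumed mean-centred.
    k   : number of variables to select.
    Returns the indices of the selected columns, in selection order.
    """
    n_rows, n_cols = len(X), len(X[0])
    # Column-major copy of X; this acts as the residual matrix R.
    R = [[X[i][j] for i in range(n_rows)] for j in range(n_cols)]
    selected = []
    for _ in range(k):
        best_j, best_gain = None, tol
        for j in range(n_cols):
            if j in selected:
                continue
            zz = dot(R[j], R[j])
            if zz <= tol:
                continue  # column is already (numerically) fully explained
            # Variance of all residual columns explained by regressing on column j.
            gain = sum(dot(R[j], R[c]) ** 2 for c in range(n_cols)) / zz
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break  # nothing left to explain
        selected.append(best_j)
        # Deflate: remove the component along the chosen column from every column.
        z = R[best_j][:]
        zz = dot(z, z)
        for c in range(n_cols):
            coef = dot(z, R[c]) / zz
            R[c] = [r - coef * zi for r, zi in zip(R[c], z)]
    return selected


# Toy data (hypothetical): column 1 is strongly correlated with column 0,
# column 2 is orthogonal to both.
X = [
    [1.0, 4.0, 0.5],
    [-1.0, -2.0, -0.5],
    [1.0, 2.0, -0.5],
    [-1.0, -4.0, 0.5],
]
print(fsca(X, 2))  # prints [1, 2]
```

In this toy example column 1 is chosen first because it explains most of columns 0 and 1 together; once it is projected out, the orthogonal column 2 carries more residual variance than what remains of column 0.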



A Subspace-based Approach for Dimensionality Reduction and Important Variable Selection

An analysis of high dimensional data can offer a detailed description of...

Recovery of Linear Components: Reduced Complexity Autoencoder Designs

Reducing dimensionality is a key preprocessing step in many data analysi...

Feature selection in functional data classification with recursive maxima hunting

Dimensionality reduction is one of the key issues in the design of effec...

DiscoVars: A New Data Analysis Perspective – Application in Variable Selection for Clustering

We present a new data analysis perspective to determine variable importa...

Pruning variable selection ensembles

In the context of variable selection, ensemble learning has gained incre...

Lazy Greedy Hypervolume Subset Selection from Large Candidate Solution Sets

Subset selection is a popular topic in recent years and a number of subs...

Experts in the Loop: Conditional Variable Selection for Accelerating Post-Silicon Analysis Based on Deep Learning

Post-silicon validation is one of the most critical processes in modern ...
