Unsupervised Feature Selection Based on the Morisita Estimator of Intrinsic Dimension

08/19/2016
by Jean Golay, et al.

This paper presents a new filter algorithm for selecting the smallest subset of features that carries all the information content of a data set (i.e. for removing redundant features). It is an advanced version of the fractal dimension reduction technique and relies on the recently introduced Morisita estimator of Intrinsic Dimension (ID). Here, the ID is used to quantify dependencies between subsets of features, which allows highly non-linear data to be processed effectively. The proposed algorithm is successfully tested on simulated and real-world case studies. Different sample sizes and noise levels are examined, along with the variability of the results. In addition, a comprehensive procedure based on random forests shows that the algorithm significantly reduces the data dimensionality without loss of relevant information. Finally, comparisons with benchmark feature selection techniques demonstrate the promising performance of this new filter.
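To make the idea concrete, the sketch below estimates the ID with a Morisita-type estimator and uses it in a backward-elimination filter. It assumes the usual multipoint Morisita index on a grid of Q cells of edge delta over data rescaled to [0, 1]^E, I_{m,delta} = Q^(m-1) * sum_i n_i(n_i-1)...(n_i-m+1) / (N(N-1)...(N-m+1)), and the estimator M_m = E - S_m/(m-1), where S_m is the slope of log I_{m,delta} against log(1/delta). The grid sizes, m = 2, the tolerance `tol` and all function names (`morisita_index`, `morisita_id`, `backward_filter`) are hypothetical choices for illustration. The elimination loop is a generic variant in the spirit of fractal dimension reduction, not the authors' exact procedure, and the random-forest validation is not reproduced.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def falling_factorial(n: int, m: int) -> int:
    """n * (n - 1) * ... * (n - m + 1); equals 0 whenever 0 <= n < m."""
    out = 1
    for k in range(m):
        out *= n - k
    return out


def rescale(rows: Sequence[Point]) -> List[List[float]]:
    """Min-max rescale every feature to [0, 1] (constant features map to 0)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(v - lo) / s if s > 0 else 0.0 for v, lo, s in zip(r, lows, spans)]
            for r in rows]


def morisita_index(points: Sequence[Point], m: int, cells: int) -> float:
    """Multipoint Morisita index on a grid with `cells` cells per axis
    (delta = 1 / cells), for points already rescaled to [0, 1]^E."""
    n_points, dim = len(points), len(points[0])
    counts: Dict[Tuple[int, ...], int] = {}
    for p in points:
        # Cell index along each axis; x == 1.0 is clamped into the last cell.
        key = tuple(min(int(x * cells), cells - 1) for x in p)
        counts[key] = counts.get(key, 0) + 1
    q = cells ** dim  # total number of cells, empty ones included
    hits = sum(falling_factorial(n, m) for n in counts.values())  # empty cells add 0
    return q ** (m - 1) * hits / falling_factorial(n_points, m)


def morisita_id(points: Sequence[Point], m: int = 2,
                grid: Sequence[int] = (2, 4, 8, 16)) -> float:
    """M_m = E - S_m / (m - 1), with S_m the least-squares slope of
    log I_{m,delta} versus log(1 / delta) over the (assumed) grid sizes."""
    dim = len(points[0])
    xs, ys = [], []
    for cells in grid:
        index = morisita_index(points, m, cells)
        if index > 0:  # skip scales where no cell holds m points
            xs.append(math.log(cells))  # log(1 / delta) with delta = 1 / cells
            ys.append(math.log(index))
    if len(xs) < 2:
        raise ValueError("need at least two usable grid scales")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return dim - slope / (m - 1)


def backward_filter(rows: Sequence[Point], tol: float = 0.25, m: int = 2,
                    grid: Sequence[int] = (2, 4, 8, 16)) -> List[int]:
    """Hypothetical backward elimination: repeatedly drop the feature whose
    removal changes the full-data ID least, while that change stays <= tol."""
    data = rescale(rows)
    kept = list(range(len(data[0])))
    full_id = morisita_id(data, m, grid)

    def id_of(features: List[int]) -> float:
        return morisita_id([[r[j] for j in features] for r in data], m, grid)

    while len(kept) > 1:
        gap, drop = min((abs(full_id - id_of([k for k in kept if k != j])), j)
                        for j in kept)
        if gap > tol:
            break  # removing any remaining feature would lower the ID
        kept.remove(drop)
    return kept


# Toy usage: column 1 is a non-linear copy of column 0 (y = x**2),
# column 2 is independent noise, so one of columns 0/1 is redundant.
random.seed(0)
rows = []
for _ in range(1000):
    x, z = random.random(), random.random()
    rows.append([x, x * x, z])
selected = backward_filter(rows)
print(selected)
```

Because y = x**2 is a deterministic function of x, dropping one of the two leaves the ID near 2, while dropping the independent column would lower it to about 1; this is how an ID-based filter can detect non-linear redundancy that a correlation-based filter might miss.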
