Selecting Features by their Resilience to the Curse of Dimensionality

04/05/2023
by   Maximilian Stubbemann, et al.

Real-world datasets are often high-dimensional and affected by the curse of dimensionality, which hinders their comprehensibility and interpretability. To reduce this complexity, feature selection aims to identify the features that are crucial for learning from the data. While measures of relevance and pairwise similarity are commonly used, the curse of dimensionality is rarely incorporated into the selection process. Here we step in with a novel method that identifies the features that discriminate data subsets of different sizes. By adapting recent work on computing intrinsic dimensionalities, our method selects the features that discriminate the data and thus weaken the curse of dimensionality. Our experiments show that our method is competitive and commonly outperforms established feature selection methods. Furthermore, we propose an approximation that allows our method to scale to datasets consisting of millions of data points. Our findings suggest that features which discriminate data and are connected to a low intrinsic dimensionality are meaningful for learning procedures.
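The abstract does not spell out the algorithm, but the idea of selecting features that keep the intrinsic dimensionality of the data low can be sketched as follows. This is a minimal illustration, not the authors' method: it uses the TwoNN intrinsic-dimension estimator (Facco et al.) as a stand-in for the paper's discriminability measure, and the function names and the greedy forward-selection strategy are assumptions for illustration only.

```python
import numpy as np

def two_nn_id(X):
    """Estimate intrinsic dimension with the TwoNN estimator: the ratio
    mu_i = r2/r1 of each point's second- to first-nearest-neighbor distance
    yields the maximum-likelihood estimate N / sum(log mu_i)."""
    # pairwise Euclidean distances; ignore self-distances on the diagonal
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(D, np.inf)
    r = np.sort(D, axis=1)[:, :2]      # first and second nearest-neighbor distances
    mu = r[:, 1] / r[:, 0]
    return len(X) / np.log(mu).sum()

def greedy_low_id_selection(X, k):
    """Forward-select k features, at each step adding the feature whose
    inclusion keeps the TwoNN estimate of the projection lowest.
    (Illustrative stand-in for intrinsic-dimension-aware selection.)"""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        best = min(remaining, key=lambda j: two_nn_id(X[:, selected + [j]]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Note that the exhaustive pairwise-distance matrix is O(n^2) in memory; the scalable approximation mentioned in the abstract would replace exactly this step for datasets with millions of points.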


Related research

Review of Swarm Intelligence-based Feature Selection Methods (08/07/2020)
Noise-Augmented Boruta: The Neural Network Perturbation Infusion with Boruta Feature Selection (09/18/2023)
Unsupervised Feature Selection Based on the Morisita Estimator of Intrinsic Dimension (08/19/2016)
Scalable Feature Selection for (Multitask) Gradient Boosted Trees (09/05/2021)
MISSION: Ultra Large-Scale Feature Selection using Count-Sketches (06/12/2018)
Classification with Nearest Disjoint Centroids (09/21/2021)
Feature Selection via the Intervened Interpolative Decomposition and its Application in Diversifying Quantitative Strategies (09/29/2022)
