Feature Selection in High-dimensional Space Using Graph-Based Methods

08/28/2021
by Swarnadip Ghosh, et al.

High-dimensional feature selection is a central problem in application domains such as machine learning, image analysis, and genomics. In this paper, we propose graph-based tests as a useful basis for feature selection. We describe an algorithm for selecting informative features in high-dimensional data, where each observation comes from one of K different distributions. The algorithm can be applied in a completely nonparametric setup, without any distributional assumptions on the data, and it aims to output those features that contribute the most to the overall distributional variation. At the heart of our method is the recursive application of distribution-free graph-based tests to subsets of the feature set, located at different depths of a hierarchical clustering tree constructed from the data. Our algorithm recovers all truly contributing features with high probability while ensuring optimal control of false discoveries. Finally, we demonstrate the superior performance of our method over existing ones on synthetic data, and illustrate its utility on a real-life dataset from the domain of climate change.
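The recursive idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the test statistic, the clustering settings, or the error-control procedure, so everything concrete here is an assumption. The sketch swaps in a Friedman-Rafsky style edge-count test on a minimum spanning tree with a permutation p-value, clusters features by average linkage on correlation distance, and applies no multiplicity correction. The names `mst_edge_count_pvalue`, `select_features`, `alpha`, and `n_perm` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_edge_count_pvalue(X, labels, n_perm=200, rng=None):
    """Permutation p-value for an MST edge-count test (assumed stand-in test).

    Few MST edges joining observations with different labels suggest that
    the groups follow different distributions on these features.
    Assumes no duplicate observations (zero distances are read as 'no edge').
    """
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    dist = squareform(pdist(X))              # pairwise Euclidean distances
    mst = minimum_spanning_tree(dist).tocoo()
    rows, cols = mst.row, mst.col

    def between(lab):
        # Number of MST edges whose endpoints carry different labels.
        return np.sum(lab[rows] != lab[cols])

    observed = between(labels)
    permuted = np.array(
        [between(rng.permutation(labels)) for _ in range(n_perm)]
    )
    # Small observed counts are evidence against the null (lower tail).
    return (1 + np.sum(permuted <= observed)) / (1 + n_perm)


def select_features(X, labels, alpha=0.05, n_perm=200, seed=0):
    """Descend a feature-clustering tree, testing each node's feature subset."""
    rng = np.random.default_rng(seed)
    # Cluster the columns (features); the leaves of the tree are feature indices.
    root = to_tree(linkage(X.T, method="average", metric="correlation"))
    selected = []

    def visit(node):
        feats = node.pre_order()             # feature indices under this node
        p = mst_edge_count_pvalue(X[:, feats], labels, n_perm, rng)
        if p > alpha:
            return                           # no detected difference: prune
        if node.is_leaf():
            selected.append(node.id)         # a single feature that differs
        else:
            visit(node.left)
            visit(node.right)

    visit(root)
    return sorted(selected)


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    labels = np.repeat([0, 1], 30)           # two groups of 30 observations
    X = data_rng.normal(size=(60, 8))
    X[labels == 1, :2] += 4.0                # only features 0 and 1 differ
    print(select_features(X, labels))
```

Pruning a subtree as soon as its test fails to reject is what keeps the number of tests small when most features are uninformative. A faithful implementation would replace the fixed `alpha` with the false-discovery control described in the full paper.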

research
07/27/2023

Robust graphs for graph-based methods

Graph-based two-sample tests and graph-based change-point detection that...
research
10/27/2022

Clustering High-dimensional Data via Feature Selection

High-dimensional clustering analysis is a challenging problem in statist...
research
03/30/2021

A General Framework of Nonparametric Feature Selection in High-Dimensional Data

Nonparametric feature selection in high-dimensional data is an important...
research
07/23/2023

A Robust Framework for Graph-based Two-Sample Tests Using Weights

Graph-based tests are a class of non-parametric two-sample tests useful ...
research
05/25/2020

Feature Robust Optimal Transport for High-dimensional Data

Optimal transport is a machine learning technique with applications incl...
research
06/01/2017

Statistical Analysis and Parameter Selection for Mapper

In this article, we study the question of the statistical convergence of...
research
07/29/2023

Multi-view Sparse Laplacian Eigenmaps for nonlinear Spectral Feature Selection

The complexity of high-dimensional datasets presents significant challen...
