An Effective Feature Selection Method Based on Pair-Wise Feature Proximity for High Dimensional Low Sample Size Data

08/08/2017
by   S L Happy, et al.
0

Feature selection has been studied widely in the literature. However, the efficacy of the selection criteria for low sample size applications is neglected in most cases. Most of the existing feature selection criteria are based on the sample similarity. However, the distance measures become insignificant for high dimensional low sample size (HDLSS) data. Moreover, the variance of a feature with a few samples is pointless unless it represents the data distribution efficiently. Instead of looking at the samples in groups, we evaluate their efficiency based on pairwise fashion. In our investigation, we noticed that considering a pair of samples at a time and selecting the features that bring them closer or put them far away is a better choice for feature selection. Experimental results on benchmark data sets demonstrate the effectiveness of the proposed method with low sample size, which outperforms many other state-of-the-art feature selection methods.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/25/2022

Graph Convolutional Network-based Feature Selection for High-dimensional and Low-sample Size Data

Feature selection is a powerful dimension reduction technique which sele...
research
08/27/2020

Feature Selection from High-Dimensional Data with Very Low Sample Size: A Cautionary Tale

In classification problems, the purpose of feature selection is to ident...
research
02/19/2019

An entropic feature selection method in perspective of Turing formula

Health data are generally complex in type and small in sample size. Such...
research
02/26/2023

Data-Centric AI: Deep Generative Differentiable Feature Selection via Discrete Subsetting as Continuous Embedding Space Optimization

Feature Selection (FS), such as filter, wrapper, and embedded methods, a...
research
06/22/2021

Doubly Robust Feature Selection with Mean and Variance Outlier Detection and Oracle Properties

We propose a general approach to handle data contaminations that might d...
research
01/06/2012

The Interaction of Entropy-Based Discretization and Sample Size: An Empirical Study

An empirical investigation of the interaction of sample size and discret...
research
06/21/2023

ProtoGate: Prototype-based Neural Networks with Local Feature Selection for Tabular Biomedical Data

Tabular biomedical data poses challenges in machine learning because it ...

Please sign up or login with your details

Forgot password? Click here to reset