Sparse Centroid-Encoder: A Nonlinear Model for Feature Selection

01/30/2022
by Tomojit Ghosh, et al.

We formulate a sparse optimization problem for determining the full set of features that discriminate between two or more classes. The method, Sparse Centroid-Encoder (SCE), is a sparse implementation of the centroid-encoder, a model for nonlinear data reduction and visualization. We also provide a feature selection framework that first ranks features by their occurrence and then chooses the optimal number of features using a validation set. The algorithm is applied to a wide variety of data sets, including single-cell biological data, high-dimensional infectious disease data, hyperspectral data, image data, and speech data. We compare our method to several state-of-the-art feature selection techniques, including two neural network-based models (DFS and LassoNet), Sparse SVM, and Random Forest. Empirically, SCE features produce better classification accuracy on unseen test data, often with fewer features.
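The two-stage selection framework in the abstract (rank features by occurrence, then pick the feature count on a validation set) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes "occurrence" means how many repeated runs of a sparse model selected a feature, and the names `rank_by_occurrence`, `choose_num_features`, and `toy_score` are hypothetical, as is the toy data.

```python
from collections import Counter
from typing import Callable, Iterable, List, Sequence, Tuple


def rank_by_occurrence(selections: Iterable[Iterable[int]]) -> List[int]:
    """Rank features by how many runs selected them (most frequent first)."""
    counts = Counter()
    for selected in selections:
        counts.update(set(selected))  # count each feature at most once per run
    # Sort by descending count; break ties by feature index for determinism.
    return sorted(counts, key=lambda f: (-counts[f], f))


def choose_num_features(
    ranked: Sequence[int],
    validation_score: Callable[[Sequence[int]], float],
) -> Tuple[int, float]:
    """Return the smallest top-k prefix size with the best validation score."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        score = validation_score(ranked[:k])
        if score > best_score:  # strict '>' keeps the smaller k on ties
            best_k, best_score = k, score
    return best_k, best_score


# Hypothetical data: feature indices kept by a sparse model on four runs.
runs = [[3, 7, 1], [3, 7], [3, 2], [7, 3, 5]]
ranked = rank_by_occurrence(runs)


# Toy stand-in for validation accuracy (assumption): only features 3 and 7 help.
def toy_score(features: Sequence[int]) -> float:
    return len({3, 7} & set(features)) / 2


best_k, best_score = choose_num_features(ranked, toy_score)
```

In practice, `validation_score` would train a classifier on the chosen feature subset and report its accuracy on held-out validation data. The strict comparison favors the smaller feature set when two sizes tie.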

