Analytic Feature Selection for Support Vector Machines

04/20/2013
by   Carly Stambaugh, et al.

Support vector machines (SVMs) rely on the inherent geometry of a data set to classify training data. For this reason, we believe SVMs are an excellent candidate to guide the development of an analytic feature selection algorithm, in contrast to the more commonly used heuristic methods. We propose a filter-based feature selection algorithm based on the inherent geometry of a feature set. Through observation, we identified six geometric properties that differ between optimal and suboptimal feature sets and that have statistically significant correlations with classifier performance. Our algorithm is based on logistic and linear regression models that use these six geometric properties as predictor variables. The proposed algorithm achieves excellent results on high-dimensional text data sets whose features can be organized into a handful of feature types, for example unigrams, bigrams, or semantic structural features. We believe this algorithm is a novel and effective approach to solving the feature selection problem for linear SVMs.
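To make the idea of a regression-based filter concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not name the six geometric properties, so the two descriptors used here (distance between class centroids and mean distance of points to their own class centroid) are illustrative stand-ins, and the toy data, the labels, and all function names are hypothetical. The sketch covers only a logistic model; the linear regression component mentioned in the abstract is omitted.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def geometric_descriptors(pos, neg):
    """Two stand-in geometric descriptors of a two-class feature set.
    ASSUMPTION: these are NOT the paper's six properties, only examples."""
    c_pos, c_neg = centroid(pos), centroid(neg)
    gap = math.dist(c_pos, c_neg)  # distance between class centroids
    spread = (sum(math.dist(p, c_pos) for p in pos)
              + sum(math.dist(p, c_neg) for p in neg)) / (len(pos) + len(neg))
    return [gap, spread]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    """Probability that a feature set with descriptors x is 'good'."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Plain batch gradient descent on the logistic loss; w[0] is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Hypothetical training data: each entry is one candidate feature set,
# given as (positive-class points, negative-class points, label), where
# label 1 means the set was observed to give good classifier performance.
training_sets = [
    ([[0, 0], [0, 1]], [[5, 5], [5, 6]], 1),  # well separated
    ([[0, 0], [1, 1]], [[0, 1], [1, 0]], 0),  # fully overlapping centroids
    ([[0, 0], [1, 0]], [[4, 0], [5, 0]], 1),  # separated along one axis
    ([[0, 0], [2, 0]], [[1, 0], [3, 0]], 0),  # interleaved
]
X = [geometric_descriptors(pos, neg) for pos, neg, _ in training_sets]
y = [label for _, _, label in training_sets]
weights = fit_logistic(X, y)

# Filter step: score candidate feature sets without training an SVM on them.
separated = geometric_descriptors([[0, 0], [0, 1]], [[5, 5], [5, 6]])
overlapping = geometric_descriptors([[0, 0], [1, 1]], [[0, 1], [1, 0]])
score_separated = predict_proba(weights, separated)
score_overlapping = predict_proba(weights, overlapping)
```

In a filter of this kind, candidate feature sets would be ranked by their predicted score and only the top-ranked ones passed to the linear SVM, which is what keeps the selection step independent of classifier training.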


