Parallel feature selection based on the trace ratio criterion

03/03/2022
by Thu Nguyen, et al.

The growth of data today poses challenges for both data management and inference. Feature extraction methods can reduce the size of the data used for inference, but they do not reduce the cost of data storage, because the original features must still be kept. Feature selection, on the other hand, removes redundant features and therefore helps not only with inference but also with reducing management costs. This work presents a novel parallel feature selection approach for classification, namely Parallel Feature Selection using Trace criterion (PFST), which scales to very large datasets. Our method uses the trace criterion, a measure of class separability used in Fisher's Discriminant Analysis, to evaluate feature usefulness. We analyze the criterion's desirable properties theoretically. Based on the criterion, PFST rapidly finds the important features in a large feature set by first performing forward selection in parallel, with early removal of seemingly redundant features. After the most important features have been included in the model, we re-examine the contributions of the removed features for possible interactions that may improve the fit. Lastly, we perform backward selection to remove any redundant features added during the forward steps. We evaluate our method in various experiments, using Linear Discriminant Analysis as the classifier on the selected features. The experiments show that our method can produce a small set of features in a fraction of the time required by the other methods under comparison. In addition, the classifier trained on the features selected by PFST not only achieves better accuracy than classifiers trained on features chosen by the other approaches, but can also achieve better accuracy than classification on all available features.
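To make the selection idea concrete, here is a minimal sketch of a trace-criterion score and a plain greedy forward selection built on it. It is not the PFST algorithm itself: it is sequential, and it omits the early removal, the interaction re-check, and the backward step described above. The abstract does not state the exact form of the criterion; the sketch assumes the common form tr(S_w^{-1} S_b), where S_w and S_b are the within-class and between-class scatter matrices. The function names `trace_criterion` and `forward_select` and the stopping threshold `min_gain` are hypothetical, introduced only for illustration.

```python
import numpy as np

def trace_criterion(X, y):
    """Assumed form of the criterion: tr(S_w^{-1} S_b) on the columns of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for label in np.unique(y):
        Xc = X[y == label]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - overall_mean)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)
    # pinv guards against a singular within-class scatter matrix
    return float(np.trace(np.linalg.pinv(S_w) @ S_b))

def forward_select(X, y, min_gain=1e-3):
    """Greedy forward selection; stops when the best gain drops below min_gain."""
    X = np.asarray(X, dtype=float)
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        # Score each candidate feature together with those already selected.
        scores = [trace_criterion(X[:, selected + [j]], y) for j in remaining]
        k = int(np.argmax(scores))
        if scores[k] - best < min_gain:
            break
        best = scores[k]
        selected.append(remaining.pop(k))
    return selected, best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 100)
    X = rng.normal(size=(200, 4))
    X[:, 0] += 3.0 * y  # only feature 0 separates the two classes
    print(forward_select(X, y, min_gain=0.05))
```

In this sketch the candidate scores inside each round are independent of one another, which is the part a parallel method could distribute across workers; how PFST actually partitions the work is not specified in the abstract.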


