Feature subset selection for Big Data via Chaotic Binary Differential Evolution under Apache Spark

02/08/2022
by   Yelleti Vivek, et al.

Feature subset selection (FSS) using a wrapper approach is essentially a combinatorial optimization problem with two objectives: the cardinality of the selected feature subset, which should be minimized, and the corresponding area under the ROC curve (AUC), which should be maximized. In this study, we propose a novel multiplicative single-objective function that combines cardinality and AUC. The randomness involved in Binary Differential Evolution (BDE) may yield less diverse solutions, so the search can get trapped in local optima. Hence, we embed the Logistic and Tent chaotic maps into BDE and name the result Chaotic Binary Differential Evolution (CBDE). Designing a scalable solution to FSS is critical when dealing with high-dimensional and voluminous datasets. Hence, we propose a scalable island (iS) based parallelization approach in which the data is divided into multiple partitions (islands), the solutions evolve independently on each island, and they are eventually combined through a migration strategy. Logistic Regression (LR) is used as the classifier owing to its simplicity and power. The results show empirically that the proposed parallel Chaotic Binary Differential Evolution (P-CBDE-iS) finds better-quality feature subsets than the Parallel Binary Differential Evolution (P-BDE-iS). The speedup attained by the proposed parallel approach underscores the importance of parallelization.
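As a rough illustration of the ingredients named in the abstract, the following Python sketch implements the standard Logistic and Tent maps and a hypothetical multiplicative score. The map definitions are the textbook ones; the function `multiplicative_fitness`, its formula, and all parameter names are assumptions made for illustration only, since the abstract does not give the exact objective or say where the chaotic values enter the BDE operators.

```python
def logistic_map(x, r=4.0):
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def tent_map(x, mu=2.0):
    """One step of the tent map with slope mu.

    Note: with mu exactly 2.0, long floating-point orbits collapse to 0,
    so a slightly smaller slope is often used in practice.
    """
    return mu * x if x < 0.5 else mu * (1.0 - x)


def chaotic_sequence(step, x0, n):
    """Iterate a chaotic map n times from seed x0 and return the visited values."""
    values = []
    x = x0
    for _ in range(n):
        x = step(x)
        values.append(x)
    return values


def multiplicative_fitness(auc, n_selected, n_total):
    """Hypothetical score (an assumption, not the paper's formula):
    AUC multiplied by the fraction of features left out, so that a higher
    AUC and a smaller subset both increase the score."""
    if n_selected == 0:
        return 0.0  # an empty subset cannot be evaluated by a classifier
    return auc * (1.0 - n_selected / n_total)


if __name__ == "__main__":
    print(chaotic_sequence(logistic_map, 0.3, 5))
    print(chaotic_sequence(tent_map, 0.3, 5))
    print(multiplicative_fitness(auc=0.9, n_selected=5, n_total=10))
```

One plausible use of such sequences is to draw values from them wherever an evolutionary algorithm would otherwise call a uniform pseudo-random generator; whether CBDE does exactly this is not stated in the abstract.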


research
06/26/2021

Scalable Feature Subset Selection for Big Data using Parallel Hybrid Evolutionary Algorithm based Wrapper in Apache Spark

In this paper, we propose a wrapper for feature subset selection (FSS) b...
research
05/19/2022

Parallel bi-objective evolutionary algorithms for scalable feature subset selection via migration strategy under Spark

Feature subset selection (FSS) for classification is inherently a bi-obj...
research
04/25/2011

An inflationary differential evolution algorithm for space trajectory optimization

In this paper we define a discrete dynamical system that governs the evo...
research
05/07/2018

Classifying Big Data over Networks via the Logistic Network Lasso

We apply network Lasso to solve binary classification (clustering) probl...
research
08/21/2022

Scalable mRMR feature selection to handle high dimensional datasets: Vertical partitioning based Iterative MapReduce framework

While building machine learning models, Feature selection (FS) stands ou...
research
01/15/2018

Divide and Recombine for Large and Complex Data: Model Likelihood Functions using MCMC

In Divide & Recombine (D&R), big data are divided into subsets, each ana...
research
09/07/2022

Parallel and Streaming Wavelet Neural Networks for Classification and Regression under Apache Spark

Wavelet neural networks (WNN) have been applied in many fields to solve ...
