Parallel bi-objective evolutionary algorithms for scalable feature subset selection via migration strategy under Spark

05/19/2022
by Yelleti Vivek, et al.

Feature subset selection (FSS) for classification is inherently a bi-objective optimization problem, where the task is to obtain a feature subset that yields the maximum possible area under the receiver operating characteristic curve (AUC) with the minimum cardinality. In today's world, an enormous amount of data is generated across all human activities. Mining such voluminous data, which is often high-dimensional, requires parallel and scalable frameworks. In this first-of-its-kind study, we propose and develop an iterative MapReduce-based framework for wrappers based on bi-objective evolutionary algorithms (EAs) under Apache Spark with a migration strategy. To accomplish this, we parallelized two non-dominated sorting based algorithms, namely the non-dominated sorting genetic algorithm II (NSGA-II) and non-dominated sorting particle swarm optimization (NSPSO), as well as a decomposition-based algorithm, namely the multi-objective evolutionary algorithm based on decomposition (MOEA-D), and named them P-NSGA-II-IS, P-NSPSO-IS, and P-MOEA-D-IS, respectively. We also propose a modified MOEA-D that incorporates the non-dominated sorting principle during parallelization. Throughout the study, AUC is computed by logistic regression (LR). We test the effectiveness of the proposed methodology on various datasets. Notably, P-NSGA-II turns out to be statistically significant, ranking in the top two positions on most datasets. We also report empirical attainment plots, speedup analysis, the mean AUC obtained by the most repeated feature subset and by the least cardinal feature subset with the highest AUC, and a diversity analysis using hypervolume.
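The bi-objective formulation in the abstract (maximize AUC, minimize subset cardinality) can be illustrated with a minimal Python sketch of Pareto dominance, non-dominated filtering, and a two-dimensional hypervolume. This is not the authors' Spark implementation: the function names, the `(auc, cardinality)` tuple representation, the sample values, and the reference point are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: one candidate feature subset is summarised as
# (auc, cardinality). AUC is maximised, cardinality is minimised.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def non_dominated_front(pop: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [p for p in pop if not any(dominates(q, p) for q in pop)]


def hypervolume_2d(front: List[Candidate], ref: Candidate) -> float:
    """Area dominated by a non-dominated front, bounded by a reference point
    ref = (worst_auc, worst_cardinality). Assumes every point in the front
    has auc >= ref[0] and cardinality <= ref[1]."""
    pts = sorted(front, key=lambda p: p[1])  # ascending cardinality
    area = 0.0
    for i, (auc, card) in enumerate(pts):
        # Each point covers the strip up to the next point's cardinality.
        next_card = pts[i + 1][1] if i + 1 < len(pts) else ref[1]
        area += (next_card - card) * (auc - ref[0])
    return area


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    population = [(0.80, 3), (0.85, 5), (0.78, 4), (0.85, 7), (0.90, 10)]
    front = non_dominated_front(population)
    print(front)
    print(hypervolume_2d(front, ref=(0.5, 11)))
```

In this sketch, `(0.78, 4)` is discarded because `(0.80, 3)` has both a higher AUC and fewer features, and `(0.85, 7)` is discarded because `(0.85, 5)` matches its AUC with fewer features. A larger hypervolume indicates a front that is closer to the ideal point and more spread out, which is why it serves as a diversity measure.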

research
06/26/2021

Scalable Feature Subset Selection for Big Data using Parallel Hybrid Evolutionary Algorithm based Wrapper in Apache Spark

In this paper, we propose a wrapper for feature subset selection (FSS) b...
research
02/08/2022

Feature subset selection for Big Data via Chaotic Binary Differential Evolution under Apache Spark

Feature subset selection (FSS) using a wrapper approach is essentially a...
research
03/22/2022

Running Time Analysis of the Non-dominated Sorting Genetic Algorithm II (NSGA-II) using Binary or Stochastic Tournament Selection

Evolutionary algorithms (EAs) have been widely used to solve multi-objec...
research
09/17/2018

Merge Non-Dominated Sorting Algorithm for Many-Objective Optimization

Many Pareto-based multi-objective evolutionary algorithms require to ran...
research
04/14/2018

On Asynchronous Non-Dominated Sorting for Steady-State Multiobjective Evolutionary Algorithms

In parallel and distributed environments, generational evolutionary algo...
research
01/05/2017

Subpopulation Diversity Based Selecting Migration Moment in Distributed Evolutionary Algorithms

In distributed evolutionary algorithms, migration interval is used to de...
research
03/25/2022

Rank-based Non-dominated Sorting

Non-dominated sorting is a computational bottleneck in Pareto-based mult...
