Multi-objective Feature Selection with Missing Data in Classification

04/18/2021
by Yu Xue, et al.

Feature selection (FS) is an important research topic in machine learning. Usually, FS is modelled as a bi-objective optimization problem whose objectives are: 1) classification accuracy; 2) number of features. One of the main issues in real-world applications is missing data. Databases with missing data are likely to be unreliable. Thus, FS performed on a data set with missing values is also unreliable. To address this issue directly, we propose in this study a novel modelling of FS: we include reliability as the third objective of the problem. To solve the modified problem, we apply the non-dominated sorting genetic algorithm-III (NSGA-III). We selected six incomplete data sets from the University of California Irvine (UCI) machine learning repository. We used the mean imputation method to deal with the missing data. In the experiments, k-nearest neighbors (K-NN) is used as the classifier to evaluate the feature subsets. Experimental results show that the proposed three-objective model coupled with NSGA-III efficiently addresses the FS problem for the six data sets included in this study.
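To make the three-objective formulation concrete, the following is a minimal sketch of how one candidate feature subset might be scored. It is not the authors' implementation. The abstract does not say how reliability is measured, so the sketch assumes the fraction of missing entries among the selected features as a stand-in. The function names, the leave-one-out evaluation, and the toy data are all hypothetical. NSGA-III itself is not implemented here; only the objective evaluation and the Pareto dominance test that such an algorithm would rely on are shown.

```python
from collections import Counter


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def knn_loo_error(X, y, mask, k=3):
    """Leave-one-out error rate of k-NN using only features where mask is 1."""
    cols = [j for j, m in enumerate(mask) if m]
    if not cols:
        return 1.0  # an empty subset is treated as the worst case
    errors = 0
    for i, xi in enumerate(X):
        # Squared Euclidean distance to every other sample
        dists = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in cols), idx)
            for idx, xo in enumerate(X) if idx != i
        )
        votes = Counter(y[idx] for _, idx in dists[:k])
        if votes.most_common(1)[0][0] != y[i]:
            errors += 1
    return errors / len(X)


def missing_rate(raw_rows, mask):
    """Assumed reliability proxy: share of missing entries in selected features."""
    cols = [j for j, m in enumerate(mask) if m]
    if not cols:
        return 1.0
    missing = sum(1 for r in raw_rows for j in cols if r[j] is None)
    return missing / (len(raw_rows) * len(cols))


def objectives(raw_rows, y, mask, k=3):
    """Return (error rate, number of features, missing rate), all minimized."""
    X = mean_impute(raw_rows)
    return (knn_loo_error(X, y, mask, k), sum(mask), missing_rate(raw_rows, mask))


def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))


# Toy data (hypothetical): None marks a missing value.
raw = [
    [1.0, None, 5.0],
    [1.2, 0.3, None],
    [0.9, None, 4.0],
    [3.0, 0.1, 5.0],
    [3.1, None, 4.5],
    [2.9, 0.2, None],
]
labels = [0, 0, 0, 1, 1, 1]

single = objectives(raw, labels, [1, 0, 0], k=1)
full = objectives(raw, labels, [1, 1, 1], k=1)
print(single, full, dominates(single, full))
```

In this toy case the subset containing only the first feature has no missing entries and separates the two classes, so it dominates the full feature set on all three objectives. A multi-objective optimizer would search over many such binary masks and return the non-dominated ones.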

research
04/02/2019

UAFS: Uncertainty-Aware Feature Selection for Problems with Missing Data

Missing data are a concern in many real world data sets and imputation m...
research
09/23/2020

Using Undersampling with Ensemble Learning to Identify Factors Contributing to Preterm Birth

In this paper, we propose Ensemble Learning models to identify factors c...
research
12/04/2020

Machine learning with incomplete datasets using multi-objective optimization models

Machine learning techniques have been developed to learn from complete d...
research
03/19/2023

Generative Adversarial Classification Network with Application to Network Traffic Classification

Large datasets in machine learning often contain missing data, which nec...
research
07/05/2022

Data Integrity Error Localization in Networked Systems with Missing Data

Most recent network failure diagnosis systems focused on data center net...
research
04/04/2023

Learning from data with structured missingness

Missing data are an unavoidable complication in many machine learning ta...
research
06/06/2021

DPER: Efficient Parameter Estimation for Randomly Missing Data

The missing data problem has been broadly studied in the last few decade...
