Missing Data Imputation for Supervised Learning

10/28/2016
by Jason Poulos, et al.

This paper compares methods for imputing missing categorical data for supervised learning tasks. The ability of researchers to accurately fit a model and yield unbiased estimates may be compromised by missing data, which are prevalent in survey-based social science research. We experiment on two machine learning benchmark datasets with missing categorical data, comparing classifiers trained on non-imputed (i.e., one-hot encoded) or imputed data with different degrees of missing-data perturbation. The results show imputation methods can increase predictive accuracy in the presence of missing-data perturbation. Additionally, we find that for imputed models, missing-data perturbation can improve prediction accuracy by regularizing the classifier.
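The comparison described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' code: the abstract does not name the imputation methods or the encoding details, so mode imputation stands in for "an imputation method", the one-hot encoder is assumed to give missing values their own indicator column, and the helper names (`perturb`, `impute_mode`, `one_hot`) are hypothetical.

```python
import random
from collections import Counter

MISSING = None  # assumed marker for a missing categorical value


def perturb(column, rate, seed=0):
    """Missing-data perturbation: blank out observed values with probability `rate`."""
    rng = random.Random(seed)
    return [MISSING if v is not MISSING and rng.random() < rate else v
            for v in column]


def impute_mode(column):
    """Stand-in imputation: fill missing entries with the most frequent observed value."""
    observed = [v for v in column if v is not MISSING]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is MISSING else v for v in column]


def one_hot(column):
    """One-hot encode a column; missing values (if any) get their own indicator.

    Treating "missing" as an extra category is an assumption of this sketch.
    """
    categories = sorted({v for v in column if v is not MISSING})
    if any(v is MISSING for v in column):
        categories.append(MISSING)
    return [[1 if v == c else 0 for c in categories] for v in column]


# Toy categorical feature with two missing entries.
feature = ["a", "b", MISSING, "a", MISSING]

non_imputed = one_hot(feature)           # three indicators: a, b, missing
imputed = one_hot(impute_mode(feature))  # two indicators: a, b
noisier = perturb(feature, rate=0.5)     # extra missingness applied to the observed values
```

Either encoded matrix could then be passed to the same classifier, which is the kind of non-imputed versus imputed comparison the abstract reports.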

research
04/06/2020

Establishing strong imputation performance of a denoising autoencoder in a wide range of missing data problems

Dealing with missing data in data analysis is inevitable. Although power...
research
02/08/2023

IRTCI: Item Response Theory for Categorical Imputation

Most datasets suffer from partial or complete missing values, which has ...
research
07/18/2022

Deeply-Learned Generalized Linear Models with Missing Data

Deep Learning (DL) methods have dramatically increased in popularity in ...
research
07/03/2020

Neumann networks: differential programming for supervised learning with missing values

The presence of missing values makes supervised learning much more chall...
research
07/19/2021

A Modulation Layer to Increase Neural Network Robustness Against Data Quality Issues

Data quality is a common problem in machine learning, especially in high...
research
04/04/2023

Learning from data with structured missingness

Missing data are an unavoidable complication in many machine learning ta...
research
09/06/2022

Understanding and Reducing Crater Counting Errors in Citizen Science Data and the Need for Standardisation

Citizen science has become a popular tool for preliminary data processin...
