Classification of datasets with imputed missing values: does imputation quality matter?

06/16/2022
by   Tolou Shadbahr, et al.

Classifying samples in incomplete datasets is a common but non-trivial aim for machine learning practitioners. Missing data is found in most real-world datasets, and the missing values are typically imputed using established methods before the now complete, imputed samples are classified. The machine learning researcher then focuses on optimising the downstream classification performance. In this study, we highlight that it is imperative to consider the quality of the imputation. We demonstrate how the commonly used measures for assessing imputation quality are flawed and propose a new class of discrepancy scores that focus on how well a method recreates the overall distribution of the data. To conclude, we highlight the compromised interpretability of classifier models trained on poorly imputed data.
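The contrast between a sample-wise error measure and a distribution-level discrepancy can be illustrated with a small sketch. This is not the scoring method proposed in the paper. It is a toy comparison under stated assumptions: a single Gaussian feature, RMSE as a stand-in for a commonly used quality measure, and the one-dimensional Wasserstein-1 distance as a stand-in for a distributional discrepancy. The function names rmse and wasserstein_1d, and the two toy imputers, are hypothetical and chosen only for illustration.

```python
import numpy as np


def rmse(true_vals, imputed_vals):
    """Sample-wise error: root mean squared difference, entry by entry."""
    return float(np.sqrt(np.mean((true_vals - imputed_vals) ** 2)))


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D samples.

    For equal-size empirical samples this is the mean absolute
    difference between the sorted values.
    """
    a_sorted, b_sorted = np.sort(a), np.sort(b)
    return float(np.mean(np.abs(a_sorted - b_sorted)))


rng = np.random.default_rng(0)

# Assumption: the held-out true values of the masked entries of one feature.
true_vals = rng.normal(loc=0.0, scale=1.0, size=5000)

# Toy imputer A: fill every missing entry with one constant.
# Simplification: the mean of the true values is used here; in practice
# it would be the mean of the observed entries.
mean_imputed = np.full_like(true_vals, true_vals.mean())

# Toy imputer B: independent draws from the correct distribution,
# unrelated to the true value of each individual sample.
draw_imputed = rng.normal(loc=0.0, scale=1.0, size=true_vals.size)

# Each entry holds (RMSE, Wasserstein-1 distance) for one imputer.
scores = {
    "mean": (rmse(true_vals, mean_imputed), wasserstein_1d(true_vals, mean_imputed)),
    "draw": (rmse(true_vals, draw_imputed), wasserstein_1d(true_vals, draw_imputed)),
}

for name, (err, dist) in scores.items():
    print(f"{name}: RMSE={err:.3f}  W1={dist:.3f}")
```

In this toy setting the constant imputer obtains the lower RMSE even though it collapses the feature to a single value, while the distribution-level distance ranks the two imputers the other way round. That is the kind of disagreement between measures that the abstract points to.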

research
05/22/2019

Generative Imputation and Stochastic Prediction

In many machine learning applications, we are faced with incomplete data...
research
06/03/2022

PROMISSING: Pruning Missing Values in Neural Networks

While data are the primary fuel for machine learning models, they often ...
research
01/19/2021

Goodness (of fit) of Imputation Accuracy: The GoodImpact Analysis

In statistical survey analysis, (partial) non-responders are integral el...
research
10/29/2021

Quality control, data cleaning, imputation

This chapter addresses important steps during the quality assurance and ...
research
03/30/2020

Imputation of missing sub-hourly precipitation data in a large sensor network: a machine learning approach

Precipitation data from rain gauges is fundamental across many lines of ...
research
11/28/2020

Learning from Incomplete Data by Simultaneous Training of Neural Networks and Sparse Coding

Handling correctly incomplete datasets in machine learning is a fundamen...
research
11/19/2020

Preparing Weather Data for Real-Time Building Energy Simulation

This study introduces a framework for quality control of measured weathe...
