Missing Values and Imputation in Healthcare Data: Can Interpretable Machine Learning Help?

04/23/2023
by Zhi Chen, et al.

Missing values are a fundamental problem in data science. Many datasets have missing values that must be handled properly, because the way they are treated can have a large impact on the resulting machine learning model. In medical applications, the consequences may affect healthcare decisions. There are many methods in the literature for dealing with missing values, including state-of-the-art methods that often depend on black-box models for imputation. In this work, we show how recent advances in interpretable machine learning provide a new perspective for understanding and tackling the missing value problem. We propose methods based on high-accuracy glass-box Explainable Boosting Machines (EBMs) that can help users (1) gain new insights into missingness mechanisms and better understand the causes of missingness, and (2) detect, or even alleviate, potential risks introduced by imputation algorithms. Experiments on real-world medical datasets illustrate the effectiveness of the proposed methods.
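To make the imputation risk mentioned in the abstract concrete, here is a minimal sketch. It is not the authors' method and does not use an EBM. It uses only the Python standard library and synthetic data to show one kind of artifact a per-feature view can expose: when missingness is informative, mean imputation piles all missing rows onto a single value, and the outcome rate at that value departs sharply from its neighbors. The feature (a hypothetical lab value), the 20% missingness rate, the outcome probabilities, and all function names are assumptions made for illustration.

```python
import random
import statistics

BIN_WIDTH = 2.0  # assumed bin width for the per-feature summary


def simulate(n=5000, seed=0):
    """Synthetic rows (x, y); x is None when the hypothetical lab value is missing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(50.0, 10.0)
        missing = rng.random() < 0.2      # assumed 20% missingness
        p = 0.6 if missing else 0.2       # assumed: missingness is informative
        y = 1 if rng.random() < p else 0
        rows.append((None if missing else x, y))
    return rows


def mean_impute(rows):
    """Replace missing x with the mean of the observed values."""
    mu = statistics.fmean(x for x, _ in rows if x is not None)
    return [(mu if x is None else x, y) for x, y in rows], mu


def binned_outcome_rate(rows, width=BIN_WIDTH):
    """Map bin index -> (row count, fraction of rows with y == 1)."""
    bins = {}
    for x, y in rows:
        b = int(x // width)
        count, positives = bins.get(b, (0, 0))
        bins[b] = (count + 1, positives + y)
    return {b: (c, p / c) for b, (c, p) in sorted(bins.items())}


imputed, mu = mean_impute(simulate())
rates = binned_outcome_rate(imputed)
spike_bin = int(mu // BIN_WIDTH)  # the bin that receives every imputed row

for b in (spike_bin - 1, spike_bin, spike_bin + 1):
    count, rate = rates[b]
    print(f"bin starting at {b * BIN_WIDTH:.0f}: n={count}, outcome rate={rate:.2f}")
```

In this synthetic setup, the bin containing the imputed mean holds far more rows than its neighbors and shows a noticeably higher outcome rate. A glass-box model that exposes per-feature effects would make this kind of discontinuity visible, whereas a black-box pipeline could absorb it silently.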

research
01/02/2023

Chains of Autoreplicative Random Forests for missing value imputation in high-dimensional datasets

Missing values are a common problem in data science and machine learning...
research
06/25/2020

ELMV: a Ensemble-Learning Approach for Analyzing Electrical Health Records with Significant Missing Values

Many real-world Electronic Health Record (EHR) data contains a large pro...
research
10/15/2022

Handling missing values in healthcare data: A systematic review of deep learning-based imputation techniques

Objective: The proper handling of missing values is critical to deliveri...
research
05/07/2020

Visualisation and knowledge discovery from interpretable models

Increasing number of sectors which affect human lives, are using Machine...
research
06/03/2022

PROMISSING: Pruning Missing Values in Neural Networks

While data are the primary fuel for machine learning models, they often ...
research
12/04/2020

Machine learning with incomplete datasets using multi-objective optimization models

Machine learning techniques have been developed to learn from complete d...
research
06/20/2022

Autoencoder-based Attribute Noise Handling Method for Medical Data

Medical datasets are particularly subject to attribute noise, that is, m...
