Comparison of Missing Data Imputation Methods using the Framingham Heart study dataset

10/06/2022
by Konstantinos Psychogyios, et al.

Cardiovascular disease (CVD) is a class of diseases that involve the heart or blood vessels and, according to the World Health Organization, is the leading cause of death worldwide. Electronic health record (EHR) data on CVD, like medical data in general, very frequently contain missing values. The percentage of missingness varies and is linked to instrument errors, manual data entry procedures, etc. Even though the missing rate is usually significant, in many cases missing value imputation is handled poorly, either with case deletion or with simple statistical approaches such as mode and median imputation. These methods are known to introduce significant bias, since they do not account for the relationships between the dataset's variables. In the medical setting, many datasets consist of lab tests or patient medical tests, where these relationships are present and strong. To address these limitations, in this paper we test and modify state-of-the-art missing value imputation methods based on Generative Adversarial Networks (GANs) and Autoencoders. The evaluation covers both data imputation and post-imputation prediction. Regarding the imputation task, we achieve improvements of 0.20, 7.00 Area Under the Receiver Operating Characteristic Curve (AUROC) respectively. In terms of the post-imputation prediction task, our models outperform the standard approaches by 2.50
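
The abstract's point about simple baselines ignoring relationships between variables can be illustrated with a small sketch. This is not the paper's method (which is based on GANs and Autoencoders); it only contrasts case deletion and median imputation with a plain least-squares fit, used here as a stand-in for any relationship-aware imputer. The toy records, the column meanings, and all function names are hypothetical and chosen for illustration only.

```python
from statistics import median

# Hypothetical toy records: (age, systolic_bp); None marks a missing value.
# The observed rows follow systolic_bp = age + 80 exactly, by construction.
records = [(40, 120.0), (50, 130.0), (60, 140.0), (70, None), (30, 110.0)]


def case_deletion(rows):
    """Drop every row that contains at least one missing value."""
    return [r for r in rows if None not in r]


def median_impute(rows, col):
    """Fill missing entries of column `col` with that column's observed median."""
    observed = [r[col] for r in rows if r[col] is not None]
    fill = median(observed)
    return [
        tuple(fill if (i == col and v is None) else v for i, v in enumerate(r))
        for r in rows
    ]


def regression_impute(rows, x_col, y_col):
    """Fill missing y values from x using a least-squares line fit on complete rows.

    Assumes at least two complete rows with distinct x values.
    Rows where x is also missing are left unchanged.
    """
    complete = [
        (r[x_col], r[y_col])
        for r in rows
        if r[x_col] is not None and r[y_col] is not None
    ]
    n = len(complete)
    mean_x = sum(x for x, _ in complete) / n
    mean_y = sum(y for _, y in complete) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in complete)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in complete)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    filled = []
    for r in rows:
        if r[y_col] is None and r[x_col] is not None:
            r = list(r)
            r[y_col] = intercept + slope * r[x_col]
            r = tuple(r)
        filled.append(r)
    return filled


deleted = case_deletion(records)
median_filled = median_impute(records, col=1)
regression_filled = regression_impute(records, x_col=0, y_col=1)

print("case deletion keeps", len(deleted), "of", len(records), "rows")
print("median imputation for age 70:", median_filled[3][1])
print("regression imputation for age 70:", regression_filled[3][1])
```

In this constructed example, median imputation assigns the oldest patient the middle of the observed blood-pressure values, while the fit that uses age extrapolates along the observed trend. Case deletion discards the incomplete row entirely.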

research
12/23/2020

IFGAN: Missing Value Imputation using Feature-specific Generative Adversarial Networks

Missing value imputation is a challenging and well-researched topic in d...
research
05/01/2023

Predicting blood pressure under circumstances of missing data: An analysis of missing data patterns and imputation methods using NHANES

The World Health Organization defines cardio-vascular disease (CVD) as "...
research
11/04/2020

Learning to Rank with Missing Data via Generative Adversarial Networks

We explore the role of Conditional Generative Adversarial Networks (GAN)...
research
05/04/2022

The Effect of Multiple Imputation of Routine Pathology Variables on Laboratory Diagnosis of Hepatitis C Infection

Pathology tests are central to modern healthcare in terms of diagnosis a...
research
01/28/2019

CollaGAN : Collaborative GAN for Missing Image Data Imputation

In many applications requiring multiple inputs to obtain a desired outpu...
research
10/19/2022

EGG-GAE: scalable graph neural networks for tabular data imputation

Missing data imputation (MDI) is crucial when dealing with tabular datas...
research
06/22/2021

Multiple Organ Failure Prediction with Classifier-Guided Generative Adversarial Imputation Networks

Multiple organ failure (MOF) is a severe syndrome with a high mortality ...
