Conditional expectation for missing data imputation

02/02/2023
by Mai Anh Vu, et al.

Missing data is common in datasets from many areas, such as medicine, sports, and finance. To enable proper and reliable analyses of such data, the missing values are often imputed, and the method used should yield a low root mean square error (RMSE) between the imputed and the true values. In addition, some critical applications require that the logic behind the imputation be explainable, which is especially difficult for complex methods such as those based on deep learning. This motivates us to introduce the conditional Distribution-based Imputation of Missing Values (DIMV) algorithm. DIMV finds the conditional distribution of a feature with missing entries given the fully observed features. As illustrated in the paper, DIMV (i) gives a low RMSE for the imputed values compared with the state-of-the-art methods under comparison; (ii) is explainable; (iii) can provide an approximate confidence region for the missing values in a given sample; (iv) works for both small- and large-scale data; (v) in many scenarios, does not require as many parameters as deep learning approaches and can therefore be used on mobile devices or in web browsers; and (vi) is robust to violations of the normality assumption on which its theoretical grounds rely. In addition to DIMV, we also introduce the DPER* algorithm, which improves the speed of DPER for estimating the mean and covariance matrix from the data, and we confirm the speed-up via experiments.
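
The idea of imputing from a conditional distribution can be illustrated with the textbook multivariate normal case. The sketch below is not the DIMV algorithm itself, whose details are not given in this abstract; it only shows the standard Gaussian conditional mean and covariance, assuming the mean vector `mu` and covariance matrix `sigma` have already been estimated. The function name `conditional_gaussian_impute` and its arguments are hypothetical, chosen for illustration.

```python
import numpy as np

def conditional_gaussian_impute(x, mu, sigma):
    """Fill NaNs in one sample with the Gaussian conditional mean.

    Returns the filled sample and the conditional covariance of the
    missing entries given the observed ones (illustrative sketch only).
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    miss = np.isnan(x)   # entries to impute
    obs = ~miss          # entries we condition on
    filled = x.copy()

    if not miss.any():
        # Nothing to impute.
        return filled, np.zeros((0, 0))
    if not obs.any():
        # Nothing to condition on: fall back to the marginal distribution.
        filled[miss] = mu[miss]
        return filled, sigma[np.ix_(miss, miss)]

    s_oo = sigma[np.ix_(obs, obs)]    # covariance among observed entries
    s_mo = sigma[np.ix_(miss, obs)]   # cross-covariance missing vs observed
    s_mm = sigma[np.ix_(miss, miss)]  # covariance among missing entries

    # w = s_mo @ inv(s_oo), computed with a linear solve instead of an inverse.
    w = np.linalg.solve(s_oo, s_mo.T).T

    # Conditional mean: mu_m + S_mo S_oo^{-1} (x_o - mu_o)
    filled[miss] = mu[miss] + w @ (x[obs] - mu[obs])
    # Conditional covariance: S_mm - S_mo S_oo^{-1} S_om
    cond_cov = s_mm - w @ s_mo.T
    return filled, cond_cov


# Usage: two features with correlation 0.5; the second entry is missing.
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
filled, cond_cov = conditional_gaussian_impute([2.0, np.nan], mu, sigma)
print(filled)    # [2. 1.]
print(cond_cov)  # [[0.75]]
```

The conditional covariance returned alongside the imputed values is what would allow an approximate confidence region to be built around each imputed entry under the normality assumption.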
