Goodness (of fit) of Imputation Accuracy: The GoodImpact Analysis

01/19/2021
by Maria Thurow, et al.

In statistical survey analysis, (partial) non-response is an integral part of data acquisition. Treating missing values during data preparation and data analysis is therefore a non-trivial task. Focusing on different data sets from the Federal Statistical Office of Germany (DESTATIS), we investigate various imputation methods with regard to their imputation accuracy. Since imputation accuracy is not uniquely defined in theory or in practice, we study different measures for assessing it: beyond the most common measures, the normalized root mean squared error (NRMSE) and the proportion of false classification (PFC), we put a special focus on (distribution) distance measures and association measures. The aim is to deliver guidelines for correctly assessing distributional accuracy after imputation. Our empirical findings indicate a discrepancy between the NRMSE and PFC on the one hand and distance measures on the other. While distance measures capture distributional similarity, NRMSE and PFC focus on data reproducibility. We find that a low NRMSE or PFC does not seem to imply lower distributional discrepancies. Although several measures for assessing distributional discrepancies exist, our results indicate that not all of them are suitable for evaluating imputation-induced differences.
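The contrast between data reproducibility and distributional similarity can be illustrated with a minimal Python sketch. The abstract gives no formulas, so the definitions here are assumptions: NRMSE is taken as the root mean squared error over the imputed entries divided by the standard deviation of the true values, PFC as the share of imputed categorical entries that differ from the truth, and the two-sample Kolmogorov-Smirnov statistic stands in as one possible distribution distance. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def nrmse(true_vals, imputed_vals):
    """Assumed definition: sqrt(mean squared error / variance of true values)."""
    true_vals = np.asarray(true_vals, dtype=float)
    imputed_vals = np.asarray(imputed_vals, dtype=float)
    mse = np.mean((true_vals - imputed_vals) ** 2)
    return float(np.sqrt(mse / np.var(true_vals)))


def pfc(true_labels, imputed_labels):
    """Assumed definition: share of imputed categorical entries that are wrong."""
    true_labels = np.asarray(true_labels)
    imputed_labels = np.asarray(imputed_labels)
    return float(np.mean(true_labels != imputed_labels))


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the ECDFs."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    grid = np.concatenate([a, b])  # the ECDFs only change at observed points
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


# Hypothetical toy data: the true values of six entries that went missing.
true_vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Imputation A: fill every gap with the mean (collapses the distribution).
mean_imputed = np.full_like(true_vals, true_vals.mean())

# Imputation B: the right values in the wrong places (distribution preserved).
shuffled_imputed = true_vals[::-1]

for name, imputed in [("mean", mean_imputed), ("shuffled", shuffled_imputed)]:
    print(name, nrmse(true_vals, imputed), ks_statistic(true_vals, imputed))

# Categorical example: one of four imputed labels is wrong.
print(pfc(["a", "b", "a", "c"], ["a", "a", "a", "c"]))
```

In this toy case, mean imputation has the lower NRMSE but the larger Kolmogorov-Smirnov distance. The reversed values reproduce the distribution exactly while scoring a higher NRMSE. This mirrors the kind of discrepancy the abstract describes, under the assumed definitions only.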

research
06/16/2022

Classification of datasets with imputed missing values: does imputation quality matter?

Classifying samples in incomplete datasets is a common aim for machine l...
research
07/06/2020

Does imputation matter? Benchmark for predictive models

Incomplete data are common in practical applications. Most predictive ma...
research
11/22/2019

Bootstrap Inference for Multiple Imputation under Uncongeniality and Misspecification

Multiple imputation has become one of the most popular approaches for ha...
research
09/09/2021

Evaluation of imputation techniques with varying percentage of missing data

Missing data is a common problem which has consistently plagued statisti...
research
09/22/2022

Multistage Large Segment Imputation Framework Based on Deep Learning and Statistic Metrics

Missing value is a very common and unavoidable problem in sensors, and r...
research
04/23/2020

Influence of parallel computing strategies of iterative imputation of missing data: a case study on missForest

Machine learning iterative imputation methods have been well accepted by...
research
05/07/2021

The r-value: evaluating stability with respect to distributional shifts

Common statistical measures of uncertainty like p-values and confidence ...
