Numerical Data Imputation for Multimodal Data Sets: A Probabilistic Nearest-Neighbor Kernel Density Approach

06/29/2023
by Florian Lalande, et al.

Numerical data imputation algorithms replace missing values with estimates so that incomplete data sets can still be used. Current imputation methods seek to minimize the error between the unobserved ground truth and the imputed values. However, this strategy can create artifacts that lead to poor imputation when the data follow multimodal or complex distributions. To tackle this problem, we introduce the kNN×KDE algorithm: a data imputation method combining nearest neighbor estimation (kNN) and density estimation with Gaussian kernels (KDE). We compare our method with previous data imputation methods on artificial and real-world data under different missing-data scenarios and various missing-data rates, and show that our method copes with complex original data structures, yields lower imputation errors, and provides probabilistic estimates with higher likelihood than current methods. We release the code as open source for the community: https://github.com/DeltaFloflo/knnxkde
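The abstract describes kNN×KDE only at a high level. The Python sketch below illustrates one plausible reading of the idea and is not the authors' implementation (that lives in the repository above). It finds the k nearest fully observed rows using the features the incomplete row does observe, then draws each missing entry from a Gaussian KDE centred on those neighbours' values. The Euclidean distance, the equal kernel weights, the fixed `bandwidth`, the `None` encoding for missing entries, and all function names are assumptions made for illustration. On toy bimodal data, mean imputation returns 0.0, a value that never occurs, while the KDE draws land near +1 or -1 and the observed mode receives a higher log-density.

```python
import math
import random
from typing import List, Optional, Sequence

# Hypothetical helpers for illustration; not the knnxkde package API.
# A row is a list of floats; None marks a missing entry (assumed encoding).
Row = List[Optional[float]]


def nearest_complete_rows(data: Sequence[Row], row: Row, k: int) -> List[Row]:
    """Return the k fully observed rows closest to `row`.

    Distance is Euclidean over the features that `row` observes
    (an assumption; the paper may use a different metric).
    """
    observed = [j for j, v in enumerate(row) if v is not None]
    complete = [r for r in data if all(v is not None for v in r)]
    if len(complete) < k:
        raise ValueError("need at least k fully observed rows")

    def dist(r: Row) -> float:
        return math.sqrt(sum((r[j] - row[j]) ** 2 for j in observed))

    return sorted(complete, key=dist)[:k]


def sample_imputations(data: Sequence[Row], row: Row, k: int = 5,
                       bandwidth: float = 0.1, n_samples: int = 1,
                       seed: int = 0) -> List[Row]:
    """Draw n_samples completed copies of `row` from a Gaussian KDE.

    The KDE places one equally weighted Gaussian kernel (std `bandwidth`)
    on each neighbour's values for the missing features. Each draw picks
    a neighbour uniformly, then adds Gaussian noise to every missing entry.
    """
    rng = random.Random(seed)
    missing = [j for j, v in enumerate(row) if v is None]
    neighbours = nearest_complete_rows(data, row, k)
    samples = []
    for _ in range(n_samples):
        # One neighbour per draw keeps the missing features jointly consistent.
        centre = rng.choice(neighbours)
        filled = list(row)
        for j in missing:
            filled[j] = rng.gauss(centre[j], bandwidth)
        samples.append(filled)
    return samples


def kde_log_density(centres: Sequence[float], x: float, bandwidth: float) -> float:
    """Log-density of x under an equally weighted 1-D Gaussian KDE."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    density = sum(norm * math.exp(-0.5 * ((x - c) / bandwidth) ** 2)
                  for c in centres) / len(centres)
    return math.log(density) if density > 0.0 else -math.inf


# Toy bimodal data: for every x, y is either +1 or -1.
data = [[i / 20, s] for i in range(20) for s in (1.0, -1.0)]
query = [0.5, None]

draws = sample_imputations(data, query, k=6, bandwidth=0.05, n_samples=200)
neighbour_ys = [r[1] for r in nearest_complete_rows(data, query, k=6)]
mean_guess = sum(neighbour_ys) / len(neighbour_ys)  # a value never observed

print(mean_guess)
print(kde_log_density(neighbour_ys, 1.0, 0.05) > kde_log_density(neighbour_ys, mean_guess, 0.05))
```

Drawing several samples per missing entry, rather than a single point estimate, is what makes the output probabilistic: the spread of the draws reflects both modes instead of collapsing them to their average.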

research
04/21/2009

Using Association Rules for Better Treatment of Missing Values

The quality of training data for knowledge discovery in databases (KDD) ...
research
01/07/2021

Distances with mixed type variables some modified Gower's coefficients

Nearest neighbor methods have become popular in official statistics, mai...
research
10/03/2022

Generating Synthetic Data with The Nearest Neighbors Algorithm

The k nearest neighbor algorithm (kNN) is one of the most popular nonpar...
research
02/08/2016

Adaptive imputation of missing values for incomplete pattern classification

In classification of incomplete pattern, the missing values can either p...
research
06/27/2023

Assessing small area estimates via artificial populations from KBAABB: a kNN-based approximation to ABB

Comparing and evaluating small area estimation (SAE) models for a given ...
research
03/31/2022

QUIP: Query-driven Missing Value Imputation

Missing values widely exist in real-world data sets, and failure to clea...
research
09/09/2022

Boosting Sensitivity of Large-scale Online Experimentation via Dropout Buyer Imputation

Metrics provide strong evidence to support hypotheses in online experime...
