Missing Data Imputation using Optimal Transport

02/10/2020
by Boris Muzellec, et al.

Missing data is a crucial issue when applying machine learning algorithms to real-world datasets. Starting from the simple assumption that two batches extracted randomly from the same dataset should share the same distribution, we leverage optimal transport (OT) distances to quantify that criterion and turn it into a loss function to impute missing data values. We propose practical methods to minimize these losses using end-to-end learning, which can either exploit or ignore parametric assumptions on the underlying distributions of values. We evaluate our methods on datasets from the UCI repository, in MCAR, MAR and MNAR settings. These experiments show that OT-based methods match or outperform state-of-the-art imputation methods, even for high percentages of missing values.
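The batch criterion described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a squared Euclidean cost, uniform batch weights, a plain Sinkhorn scaling loop, and a debiased entropic transport cost as the discrepancy. The names `entropic_cost`, `batch_loss` and `fill`, the toy data, and the values of `eps` and `n_iters` are all hypothetical choices made for illustration. The sketch only evaluates the loss for two candidate imputations; minimizing it over the imputed entries (for example with automatic differentiation) is left out.

```python
import math
import random


def sq_dist(x, y):
    """Squared Euclidean distance between two points given as lists."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def entropic_cost(X, Y, eps=1.0, n_iters=200):
    """Transport cost <P, C> under an entropy-regularized plan P.

    P is obtained by Sinkhorn scaling between uniformly weighted batches.
    eps and n_iters are illustrative defaults, not tuned values.
    """
    n, m = len(X), len(Y)
    C = [[sq_dist(x, y) for y in Y] for x in X]
    K = [[math.exp(-c / eps) for c in row] for row in C]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return sum(u[i] * K[i][j] * v[j] * C[i][j] for i in range(n) for j in range(m))


def batch_loss(X, Y, eps=1.0):
    """Debiased entropic OT discrepancy between two batches (assumed form)."""
    return (entropic_cost(X, Y, eps)
            - 0.5 * entropic_cost(X, X, eps)
            - 0.5 * entropic_cost(Y, Y, eps))


def fill(batch, mask, values):
    """Copy `batch`, replacing masked entries with the matching `values`."""
    return [[values[i][j] if mask[i][j] else batch[i][j]
             for j in range(len(batch[i]))]
            for i in range(len(batch))]


# Toy data (assumption): 16 points in 2-D whose coordinates are correlated.
random.seed(0)
data = []
for _ in range(16):
    x1 = random.gauss(0.0, 1.0)
    data.append([x1, x1 + 0.1 * random.gauss(0.0, 1.0)])

batch_a, batch_b = data[:8], data[8:]

# Pretend the second coordinate is missing in the first four rows of batch_b.
mask = [[False, i < 4] for i in range(8)]

# Candidate 1: the true values. Candidate 2: an implausible constant.
true_values = batch_b
bad_values = [[10.0, 10.0] for _ in range(8)]

loss_true = batch_loss(batch_a, fill(batch_b, mask, true_values))
loss_bad = batch_loss(batch_a, fill(batch_b, mask, bad_values))

print("loss with true values:   ", loss_true)
print("loss with constant 10.0: ", loss_bad)
```

In this toy setting the implausible fill should push the two batches apart and yield the larger loss, which is the signal an imputation procedure would try to reduce. A practical version would more likely rely on an OT library and a differentiable framework than on hand-written loops.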


research
05/08/2017

Multiple Imputation Using Deep Denoising Autoencoders

Missing data is a well-recognized problem impacting all domains. State-o...
research
10/23/2022

MM-Align: Learning Optimal Transport-based Alignment Dynamics for Fast and Accurate Inference on Missing Modality Sequences

Existing multimodal tasks mostly target at the complete input modality s...
research
02/03/2022

Minimax rate of consistency for linear models with missing values

Missing values arise in most real-world data sets due to the aggregation...
research
07/19/2021

A Modulation Layer to Increase Neural Network Robustness Against Data Quality Issues

Data quality is a common problem in machine learning, especially in high...
research
03/21/2021

Deep Distribution-preserving Incomplete Clustering with Optimal Transport

Clustering is a fundamental task in the computer vision and machine lear...
research
09/23/2020

Using Undersampling with Ensemble Learning to Identify Factors Contributing to Preterm Birth

In this paper, we propose Ensemble Learning models to identify factors c...
research
02/02/2023

Conditional expectation for missing data imputation

Missing data is common in datasets retrieved in various areas, such as m...
