Dealing with missing data using attention and latent space regularization

11/14/2022
by Jahan C. Penny-Dimri, et al.

Most practical data science problems involve missing data. A wide variety of solutions exist, each with strengths and weaknesses that depend on the missingness-generating process. Here we develop a theoretical framework for training and inference using only observed variables, which enables modeling of incomplete datasets without imputation. Using an information- and measure-theoretic argument, we construct models with latent space representations that regularize against the potential bias introduced by missing data. The theoretical properties of this approach are demonstrated empirically on a synthetic dataset. Its performance is tested on 11 benchmarking datasets with missingness and on 18 datasets corrupted across three missingness patterns, and compared against a state-of-the-art model and industry-standard imputation. We show that the proposed method overcomes the weaknesses of imputation methods and outperforms the current state-of-the-art.
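
The abstract does not spell out the architecture, so the following is only an illustrative sketch of the general idea of attending over observed variables without imputing the missing ones. It is not the authors' model. The function name `masked_attention_pool`, the per-feature embedding matrix `embed`, and the `query` vector are all hypothetical, and the latent space regularization described above is not implemented here.

```python
import numpy as np

def masked_attention_pool(values, mask, embed, query):
    """Pool feature tokens with attention restricted to observed features.

    values: (n_features,) array; entries at missing positions are ignored
    mask:   (n_features,) bool array, True where the feature is observed
    embed:  (n_features, d) hypothetical per-feature embedding vectors
    query:  (d,) hypothetical query vector
    Returns the pooled latent vector (d,) and the attention weights.
    """
    if not mask.any():
        raise ValueError("at least one feature must be observed")

    # Zero out missing entries so NaNs never enter the arithmetic.
    filled = np.where(mask, values, 0.0)
    tokens = filled[:, None] * embed               # (n_features, d)

    # Scaled dot-product scores; missing features get -inf so that
    # the softmax assigns them exactly zero weight.
    scores = tokens @ query / np.sqrt(embed.shape[1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores[mask].max()           # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()

    return weights @ tokens, weights

# Example: the second feature is missing and receives no attention.
rng = np.random.default_rng(0)
embed = rng.normal(size=(4, 3))
query = rng.normal(size=3)
values = np.array([1.0, np.nan, -0.5, 2.0])
mask = ~np.isnan(values)
latent, weights = masked_attention_pool(values, mask, embed, query)
```

Because missing positions receive zero weight, the pooled vector depends only on the observed entries, which is the sense in which no imputed value is needed in this sketch.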


research
10/24/2021

Imputation of Missing Data Using Linear Gaussian Cluster-Weighted Modeling

Missing data theory deals with the statistical methods in the occurrence...
research
02/10/2021

MAIN: Multihead-Attention Imputation Networks

The problem of missing data, usually absent in curated and competition-st...
research
06/09/2021

EMFlow: Data Imputation in Latent Space via EM and Deep Flow Models

High dimensional incomplete data can be found in a wide range of systems...
research
06/25/2022

Missing data patterns in runners' careers: do they matter?

Predicting the future performance of young runners is an important resea...
research
11/05/2022

Towards a methodology for addressing missingness in datasets, with an application to demographic health datasets

Missing data is a common concern in health datasets, and its impact on g...
research
10/16/2017

A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning

This paper takes a step towards temporal reasoning in a dynamically chan...
research
11/04/2014

Iterated geometric harmonics for data imputation and reconstruction of missing data

The method of geometric harmonics is adapted to the situation of incompl...
