Missing Value Imputation for Mixed Data Through Gaussian Copula

10/28/2019
by Yuxuan Zhao, et al.
Missing data imputation forms the first critical step of many data analysis pipelines. The challenge is greatest for mixed data sets, which combine real-valued, Boolean, and ordinal variables. On such data, standard imputation techniques fail basic sanity checks: for example, the imputed values may not follow the same distributions as the observed data. This paper proposes a new semiparametric algorithm to impute missing values, with no tuning parameters. The algorithm models mixed data as a Gaussian copula. This model can fit arbitrary marginals for continuous variables and can handle ordinal variables with many levels, including Boolean variables as a special case. We develop an efficient approximate EM algorithm to estimate the copula parameters from incomplete mixed data. The resulting model reveals the statistical associations among variables. Experimental results on several synthetic and real datasets show that the proposed algorithm outperforms state-of-the-art imputation algorithms for mixed data.
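To make the copula idea concrete, the following is a minimal sketch, not the paper's algorithm. It assumes only two continuous columns, with `x` fully observed and `y` partly missing, and it estimates the latent correlation from complete rows instead of running the approximate EM procedure. Ordinal variables are not handled. The function names (`normal_scores`, `empirical_quantile`, `impute_second_column`) and the toy data are hypothetical and chosen for illustration only.

```python
from bisect import bisect_right
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def normal_scores(values, reference):
    """Map values to latent normal scores via the empirical CDF of `reference`."""
    ref = sorted(reference)
    n = len(ref)
    scores = []
    for v in values:
        # Divide ranks by n + 1 and clamp so the CDF stays strictly inside (0, 1).
        u = bisect_right(ref, v) / (n + 1)
        u = min(max(u, 1 / (n + 1)), n / (n + 1))
        scores.append(_STD_NORMAL.inv_cdf(u))
    return scores


def empirical_quantile(ref_sorted, u):
    """Return the observed value nearest to cumulative probability u (no interpolation)."""
    idx = int(round(u * (len(ref_sorted) + 1))) - 1
    idx = min(max(idx, 0), len(ref_sorted) - 1)
    return ref_sorted[idx]


def impute_second_column(x, y):
    """Fill None entries of y with a bivariate Gaussian copula (illustrative only)."""
    pairs = [(a, b) for a, b in zip(x, y) if b is not None]
    y_obs = sorted(b for _, b in pairs)
    zx = normal_scores([a for a, _ in pairs], x)
    zy = normal_scores([b for _, b in pairs], y_obs)

    # Assumption: complete-case correlation of the latent scores stands in
    # for the copula correlation that the paper estimates with EM.
    mx, my = sum(zx) / len(zx), sum(zy) / len(zy)
    cov = sum((a - mx) * (b - my) for a, b in zip(zx, zy))
    var_x = sum((a - mx) ** 2 for a in zx)
    var_y = sum((b - my) ** 2 for b in zy)
    rho = cov / (var_x * var_y) ** 0.5

    filled = []
    for a, b in zip(x, y):
        if b is None:
            # Conditional mean of the latent variable: E[z_y | z_x] = rho * z_x.
            z_hat = rho * normal_scores([a], x)[0]
            # Map back through the empirical marginal of the observed y values.
            b = empirical_quantile(y_obs, _STD_NORMAL.cdf(z_hat))
        filled.append(b)
    return filled, rho


# Toy data (made up for this sketch): y is missing in the third row.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [10.0, 20.0, None, 40.0, 50.0, 60.0]
filled, rho = impute_second_column(x, y)
print(filled, round(rho, 3))
```

Because the imputed entry is read off the empirical quantiles of the observed `y` values, it always lies within the observed marginal. That mirrors the sanity check mentioned in the abstract, where imputed values should follow the same distribution as the data.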


research
10/13/2022

Probabilistic Missing Value Imputation for Mixed Categorical and Ordered Data

Many real-world datasets contain missing entries and mixed data types in...
research
09/25/2020

Online Missing Value Imputation and Correlation Change Detection for Mixed-type Data via Gaussian Copula

Most data science algorithms require complete observations, yet many dat...
research
02/04/2021

Asymptotically Exact and Fast Gaussian Copula Models for Imputation of Mixed Data Types

Missing values with mixed data types is a common problem in a large numb...
research
10/26/2022

Nonparametric Copula Models for Mixed Data with Informative Missingness

Modern datasets commonly feature both substantial missingness and variab...
research
06/18/2020

Matrix Completion with Quantified Uncertainty through Low Rank Gaussian Copula

Modern large scale datasets are often plagued with missing entries; inde...
research
03/31/2022

QUIP: Query-driven Missing Value Imputation

Missing values widely exist in real-world data sets, and failure to clea...
research
04/30/2018

Imputation of mixed data with multilevel singular value decomposition

Statistical analysis of large data sets offers new opportunities to bett...
