Asymptotically Exact and Fast Gaussian Copula Models for Imputation of Mixed Data Types

02/04/2021
by Benjamin Christoffersen, et al.

Missing values in data with mixed types are a common problem in many machine learning applications, such as the processing of surveys and various medical applications. Recently, Gaussian copula models have been suggested as a means of imputing missing values within a probabilistic framework. While existing Gaussian copula models have been shown to yield state-of-the-art performance, they have two limitations: they rely on an approximation that is fast but may be imprecise, and they do not support unordered multinomial variables. We address the first limitation by using randomized quasi-Monte Carlo procedures to obtain direct and arbitrarily precise approximations for both model estimation and imputation. Compared to previously proposed methods, our method yields lower errors for both the estimated model parameters and the imputed values. We also extend the previous Gaussian copula models to include unordered multinomial variables, in addition to the existing support for ordinal, binary, and continuous variables.
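The abstract does not spell out the randomized quasi-Monte Carlo (RQMC) step. One common building block, and a plausible one here, is estimating a multivariate normal probability over a box: in a Gaussian copula, an ordinal or binary observation typically only constrains its latent normal variable to an interval between two thresholds, so the corresponding likelihood terms become box probabilities. The sketch below is not the authors' implementation. It assumes Genz's separation-of-variables transform and uses randomly shifted Sobol' points (a Cranley-Patterson rotation) from `scipy.stats.qmc`; the function name `mvn_rect_prob_rqmc` and its parameters are hypothetical.

```python
import numpy as np
from scipy.stats import norm, qmc


def mvn_rect_prob_rqmc(lower, upper, cov, n_points=2**12, n_rand=16, seed=0):
    """Estimate P(lower < Z < upper) for Z ~ N(0, cov).

    Sketch only (hypothetical helper, not the paper's code): applies Genz's
    separation-of-variables transform and averages the resulting integrand
    over randomly shifted Sobol' points. Returns the mean over `n_rand`
    independent shifts and its standard error.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    chol = np.linalg.cholesky(np.asarray(cov, dtype=float))  # cov = chol @ chol.T
    d = lower.size
    rng = np.random.default_rng(seed)
    # Deterministic Sobol' point set; randomness enters only through the shifts.
    base = qmc.Sobol(d=max(d - 1, 1), scramble=False).random(n_points)
    eps = 1e-12  # keeps norm.ppf finite when a bound is infinite

    estimates = np.empty(n_rand)
    for r in range(n_rand):
        # Cranley-Patterson rotation: shift every point by one uniform vector, mod 1.
        w = (base + rng.random(base.shape[1])) % 1.0
        y = np.zeros((n_points, d))
        lo = np.full(n_points, norm.cdf(lower[0] / chol[0, 0]))
        hi = np.full(n_points, norm.cdf(upper[0] / chol[0, 0]))
        f = hi - lo
        for i in range(1, d):
            # Draw y_{i-1} from the truncated standard normal implied by earlier draws.
            u = np.clip(lo + w[:, i - 1] * (hi - lo), eps, 1 - eps)
            y[:, i - 1] = norm.ppf(u)
            shift = y[:, :i] @ chol[i, :i]
            lo = norm.cdf((lower[i] - shift) / chol[i, i])
            hi = norm.cdf((upper[i] - shift) / chol[i, i])
            f *= hi - lo  # running product of conditional interval probabilities
        estimates[r] = f.mean()

    return estimates.mean(), estimates.std(ddof=1) / np.sqrt(n_rand)
```

The spread across independent random shifts provides a standard error, and the error can be reduced by increasing `n_points` or `n_rand`, which is the sense in which such approximations can be made arbitrarily precise. As a check, with `cov = [[1, 0.5], [0.5, 1]]`, `lower = [-inf, -inf]` and `upper = [0, 0]`, the estimate should be close to the closed-form orthant probability 1/4 + arcsin(0.5)/(2π) = 1/3.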
