Probabilistic Missing Value Imputation for Mixed Categorical and Ordered Data

10/13/2022
by Yuxuan Zhao, et al.

Many real-world datasets contain missing entries and mixed data types, including categorical and ordered (e.g., continuous and ordinal) variables. Imputing the missing entries is necessary because many data analysis pipelines require complete data, but it is challenging, especially for mixed data. This paper proposes a probabilistic imputation method using an extended Gaussian copula model that supports both single and multiple imputation. The method models mixed categorical and ordered data using a latent Gaussian distribution. The unordered characteristics of categorical variables are explicitly modeled using the argmax operator. The method makes no assumptions about the data marginals and requires no hyperparameter tuning. Experimental results on synthetic and real datasets show that imputation with the extended Gaussian copula outperforms the current state of the art for both categorical and ordered variables in mixed data.
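To make the latent-variable idea concrete, here is a minimal Python/NumPy sketch under simplifying assumptions; it is not the authors' implementation. It uses a hypothetical five-dimensional latent Gaussian vector: one coordinate becomes a continuous variable through the increasing map exp, one becomes a three-level ordinal variable through fixed thresholds, and three become a three-level categorical variable through argmax. The helper names (`to_observed`, `conditional_gaussian`, `impute`), the thresholds, and the correlation matrix are all illustrative assumptions. The sketch treats the marginal transforms and the correlation matrix as known, whereas the method above estimates them from data. It also conditions only on a continuous observation, whose latent value is recovered exactly by inverting the transform; ordinal and categorical observations only restrict the latent vector to a region, which this sketch does not handle. The paper's categorical model may include further details, such as per-category offsets, that are omitted here.

```python
import numpy as np

# Hypothetical toy layout of the latent Gaussian vector z (an assumption for this sketch):
#   z[0]   -> continuous variable, observed as exp(z[0]) (any strictly increasing map works)
#   z[1]   -> ordinal variable with levels 0, 1, 2, cut at fixed thresholds
#   z[2:5] -> one latent coordinate per level of a 3-level categorical variable
THRESHOLDS = np.array([-0.5, 0.5])
CAT = slice(2, 5)


def to_observed(z):
    """Map latent rows z (shape (n, 5)) to (continuous, ordinal, categorical) columns."""
    z = np.atleast_2d(z)
    cont = np.exp(z[:, 0])
    ordinal = np.digitize(z[:, 1], THRESHOLDS)  # monotone step function
    cat = np.argmax(z[:, CAT], axis=1)          # unordered: only the largest coordinate matters
    return cont, ordinal, cat


def conditional_gaussian(corr, obs_idx, z_obs):
    """Mean and covariance of the missing coordinates of z ~ N(0, corr) given z[obs_idx] = z_obs."""
    mis_idx = np.setdiff1d(np.arange(corr.shape[0]), obs_idx)
    s_oo = corr[np.ix_(obs_idx, obs_idx)]
    s_mo = corr[np.ix_(mis_idx, obs_idx)]
    s_mm = corr[np.ix_(mis_idx, mis_idx)]
    w = np.linalg.solve(s_oo, s_mo.T).T  # equals s_mo @ inv(s_oo) because s_oo is symmetric
    return mis_idx, w @ z_obs, s_mm - w @ s_mo.T


def impute(corr, obs_idx, z_obs, n_draws=0, rng=None):
    """Single imputation (conditional mean) if n_draws == 0, else multiple imputation (conditional draws)."""
    mis_idx, mean, cov = conditional_gaussian(corr, obs_idx, z_obs)
    if n_draws == 0:
        rows = mean[None, :]
    else:
        rng = np.random.default_rng() if rng is None else rng
        rows = rng.multivariate_normal(mean, cov, size=n_draws)
    z_full = np.zeros((rows.shape[0], corr.shape[0]))
    z_full[:, obs_idx] = z_obs   # observed latent coordinates stay fixed
    z_full[:, mis_idx] = rows    # missing ones come from the conditional Gaussian
    return to_observed(z_full)


# Illustrative correlation matrix: z[0] correlates with z[1], z[2], z[3];
# it is positive definite because 1 - (0.36 + 0.25 + 0.09) > 0.
corr = np.eye(5)
corr[0, 1:4] = [0.6, 0.5, -0.3]
corr[1:4, 0] = [0.6, 0.5, -0.3]

# Observed: continuous value 2.0 (so z[0] = log 2); ordinal and categorical are missing.
obs_idx = np.array([0])
z_obs = np.log(np.array([2.0]))

cont, ordinal, cat = impute(corr, obs_idx, z_obs)
print(ordinal, cat)  # → [1] [0]

# Multiple imputation: five plausible completions of the same row.
_, ords, cats = impute(corr, obs_idx, z_obs, n_draws=5, rng=np.random.default_rng(0))
print(ords, cats)
```

Single imputation here pushes the conditional mean through the transforms, which gives the conditional median for the continuous and ordinal variables; for the categorical variable, the argmax of the conditional mean is only a simple plug-in rule and not necessarily the most probable category. Multiple imputation instead draws several latent vectors from the conditional Gaussian and maps each draw back, so the spread of the imputed values reflects the remaining uncertainty.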
