IRTCI: Item Response Theory for Categorical Imputation

by Adrienne Kline, et al.

Most datasets have values that are partially or completely missing, which limits both the models that can be fit to the data and the statistical inferences that can be drawn from it. Several imputation techniques have been designed to replace missing data with stand-in values. The choice of approach has implications for calculating clinical scores, model building, and model testing. This work presents a novel method for categorical imputation based on item response theory (IRT) and compares it against several methods currently used in machine learning: k-nearest neighbors (kNN), multiple imputation by chained equations (MICE), and Datawig, the Amazon Web Services (AWS) deep learning method. The techniques were compared on three datasets representing ordinal, nominal, and binary categories. The data were modified to vary both the proportion of missing data and the systematization of the missingness. Performance was assessed in two ways: accuracy in reproducing the missing values, and predictive performance using the imputed data. The new method, Item Response Theory for Categorical Imputation (IRTCI), fared quite well compared to currently used methods, outperforming several of them in many conditions. Given its theoretical basis and its unique generation of probabilistic terms for determining the category membership of missing cells, IRTCI offers a viable alternative to current approaches.
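To make the idea of a probabilistic term for category membership concrete, the following is a minimal sketch and not the authors' implementation. It assumes a two-parameter logistic (2PL) IRT model for a binary item, with the respondent's latent trait `theta` and the item's discrimination `a` and difficulty `b` already estimated; the abstract does not specify which IRT models or estimation procedure IRTCI uses. All function names, parameter values, and held-out values are hypothetical and chosen only for illustration.

```python
import math

def p_category_one(theta, a, b):
    """2PL item response function: P(X = 1 | theta) for a binary item.

    theta: respondent's latent trait (assumed already estimated)
    a:     item discrimination (assumed already estimated)
    b:     item difficulty (assumed already estimated)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def impute_binary(theta, a, b, threshold=0.5):
    """Impute a missing binary cell; returns (category, P(X = 1))."""
    p = p_category_one(theta, a, b)
    return (1 if p >= threshold else 0), p

def imputation_accuracy(true_values, imputed_values):
    """Fraction of masked cells whose imputed category matches the truth."""
    matches = sum(t == i for t, i in zip(true_values, imputed_values))
    return matches / len(true_values)

# Hypothetical example: three respondents are missing the same item.
thetas = [-1.5, 0.0, 2.0]   # illustrative latent-trait estimates
a, b = 1.2, 0.5             # illustrative item parameters
imputed = [impute_binary(theta, a, b)[0] for theta in thetas]

# Illustrative held-out true values for the masked cells.
held_out = [0, 1, 1]
accuracy = imputation_accuracy(held_out, imputed)
```

The `imputation_accuracy` helper mirrors the first of the two assessments named in the abstract: mask known values, impute them, and measure how often the imputed category matches the original. Ordinal and nominal items would need a polytomous IRT model in place of the 2PL function used here.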



Missing Data Imputation for Supervised Learning

This paper compares methods for imputing missing categorical data for su...

Remiod: Reference-based Controlled Multiple Imputation of Longitudinal Binary and Ordinal Outcomes with non-ignorable missingness

Missing data on response variables are common in clinical studies. Corre...

Categorical EHR Imputation with Generative Adversarial Nets

Electronic Health Records often suffer from missing data, which poses a ...

Multiple Imputation with Denoising Autoencoder using Metamorphic Truth and Imputation Feedback

Although data may be abundant, complete data is less so, due to missing ...

Evaluation of imputation techniques with varying percentage of missing data

Missing data is a common problem which has consistently plagued statisti...

Probabilistic Missing Value Imputation for Mixed Categorical and Ordered Data

Many real-world datasets contain missing entries and mixed data types in...

IlocA: An algorithm to Cluster Cells and form Imputation Groups from a pair of Classification Variables

We set out the novel bottom up procedure to aggregate or cluster cells w...
