IlocA: An algorithm to Cluster Cells and form Imputation Groups from a pair of Classification Variables

02/23/2023
by Geraard Keogh, et al.

We set out a novel bottom-up procedure to aggregate, or cluster, cells with small frequency counts in a two-way classification while maintaining dependence in the table. The procedure is model-free. It combines cells in a table into clusters based on independent log odds ratios. We use this procedure to build a set of statistically efficient and robust imputation cells for imputing missing values of a continuous variable using a pair of classification variables. A useful feature of the procedure is that it forms aggregation groups that are homogeneous with respect to the cell response mean. Using a series of simulation studies, we show that IlocA groups together only independent cells and does so in a consistent and credible way. When imputing missing data, we show that IlocA generates close to an optimal number of imputation cells. For ignorable non-response, the resulting imputed means are accurate in general. With non-ignorable missingness, results are consistent with those obtained elsewhere. We close with a case study applying our method to imputing missing building energy performance data.
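
As a rough illustration of the quantity the abstract refers to, the sketch below computes local log odds ratios for adjacent 2x2 sub-tables of a two-way frequency table and flags those that are not significantly different from zero. It is not the IlocA algorithm itself: the function name `local_log_odds_ratios`, the 0.5 continuity correction, the Wald-type standard error and the 1.96 cutoff are all assumptions chosen for the example, and the abstract does not say how the procedure tests independence or merges cells.

```python
import math


def local_log_odds_ratios(table, z_crit=1.96, correction=0.5):
    """Illustrative only (not the paper's procedure).

    For each adjacent 2x2 sub-table of `table` (a list of rows of counts),
    return a tuple (i, j, log_or, se, independent), where `independent` is
    True when the log odds ratio is within z_crit standard errors of zero.
    """
    results = []
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows - 1):
        for j in range(n_cols - 1):
            # Assumed continuity correction so that zero counts do not
            # produce log(0) or division by zero.
            a = table[i][j] + correction
            b = table[i][j + 1] + correction
            c = table[i + 1][j] + correction
            d = table[i + 1][j + 1] + correction
            log_or = math.log((a * d) / (b * c))
            # Wald-type standard error of a log odds ratio.
            se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
            independent = abs(log_or) <= z_crit * se
            results.append((i, j, log_or, se, independent))
    return results


# Hypothetical 3x3 table of cell counts, invented for the example.
counts = [
    [10, 20, 5],
    [20, 40, 9],
    [3, 50, 60],
]
for i, j, log_or, se, independent in local_log_odds_ratios(counts):
    print(f"rows {i}-{i + 1}, cols {j}-{j + 1}: "
          f"log OR = {log_or:.3f}, SE = {se:.3f}, independent = {independent}")
```

Sub-tables flagged as independent would be candidates for aggregation in a scheme of this kind; how candidates are actually combined into clusters is described in the full paper, not here.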

