IlocA: An algorithm to Cluster Cells and form Imputation Groups from a pair of Classification Variables

by Gerard Keogh, et al.

We set out a novel bottom-up procedure to aggregate, or cluster, cells with small frequency counts in a two-way classification while maintaining dependence in the table. The procedure is model free. It combines cells in a table into clusters based on independent log odds ratios. We use this procedure to build a set of statistically efficient and robust imputation cells for imputing missing values of a continuous variable using a pair of classification variables. A useful feature of the procedure is that it forms aggregation groups that are homogeneous with respect to the cell response mean. Using a series of simulation studies, we show that IlocA groups together only independent cells and does so in a consistent and credible way. When imputing missing data, we show that IlocA generates close to an optimal number of imputation cells. For ignorable non-response, the resulting imputed means are accurate in general. With non-ignorable missingness, results are consistent with those obtained elsewhere. We close with a case study applying our method to imputing missing building energy performance data.
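As a rough illustration of the quantity the abstract builds on, the sketch below computes the log odds ratio of a single 2x2 subtable of counts and applies a simple Wald check for independence. It is not the IlocA procedure itself, whose clustering steps are not given in the abstract. The function names, the 0.5 continuity correction, and the 1.96 threshold are all assumptions chosen for the example.

```python
import math


def log_odds_ratio(n11, n12, n21, n22, correction=0.5):
    """Return (log odds ratio, asymptotic standard error) for a 2x2 table.

    The correction (an assumption of this sketch) is added to every cell
    so that zero counts do not cause a division by zero.
    """
    a, b, c, d = (x + correction for x in (n11, n12, n21, n22))
    lor = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return lor, se


def looks_independent(n11, n12, n21, n22, z_crit=1.96):
    """Wald check: True when the log odds ratio is not clearly non-zero."""
    lor, se = log_odds_ratio(n11, n12, n21, n22)
    return abs(lor / se) < z_crit


# Rows proportional to each other: no evidence of association.
print(looks_independent(10, 20, 20, 40))
# Counts concentrated on the diagonal: strong association.
print(looks_independent(40, 5, 5, 40))
```

A log odds ratio near zero means the two rows (or columns) of the subtable have similar conditional distributions, which is the sense in which neighbouring cells could be treated as candidates for merging without losing dependence structure.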


Imputation of missing data using multivariate Gaussian Linear Cluster-Weighted Modeling

Missing data arises when certain values are not recorded or observed for...

Solving the "many variables" problem in MICE with principal component regression

Multiple Imputation (MI) is one of the most popular approaches to addres...

Evolving imputation strategies for missing data in classification problems with TPOT

Missing data has a ubiquitous presence in real-life applications of mach...

Regression-based imputation of explanatory discrete missing data

Imputation of missing values is a strategy for handling non-responses in...

IRTCI: Item Response Theory for Categorical Imputation

Most datasets suffer from partial or complete missing values, which has ...

Robust Multivariate Estimation Based On Statistical Data Depth Filters

In the classical contamination models, such as the gross-error (Huber an...
