Blockwise Principal Component Analysis for monotone missing data imputation and dimensionality reduction

05/10/2023
by Tu T. Do, et al.

Monotone missing data is a common problem in data analysis, and combining imputation with dimensionality reduction can be computationally expensive, especially as datasets grow. To address this issue, we propose a Blockwise principal component analysis Imputation (BPI) framework for dimensionality reduction and imputation of monotone missing data. The framework conducts Principal Component Analysis (PCA) on the observed part of each monotone block of the data, merges the resulting principal components, and then imputes the merged components using a chosen imputation technique. BPI can work with various imputation techniques and can significantly reduce imputation time compared to conducting dimensionality reduction after imputation, which makes it a practical and efficient approach for large datasets with monotone missing data. Our experiments validate the improvement in speed. They also show that while applying MICE imputation directly to the missing data may not converge, applying BPI with MICE may lead to convergence.
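
The blockwise procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`pca_scores`, `mean_impute`, `bpi_sketch`), the way the data is split into blocks, the number of components kept per block, and the column-mean imputer are all assumptions made for illustration. The mean imputer is only a placeholder for whichever imputation technique is chosen (MICE, for example).

```python
import numpy as np

def pca_scores(X, n_components):
    """Project fully observed rows onto their top principal components."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def mean_impute(Z):
    """Placeholder imputer (an assumption): fill NaNs with column means."""
    Z = Z.copy()
    col_means = np.nanmean(Z, axis=0)
    rows, cols = np.where(np.isnan(Z))
    Z[rows, cols] = col_means[cols]
    return Z

def bpi_sketch(blocks, n_components, impute=mean_impute):
    """Hypothetical BPI-style pipeline: per-block PCA, merge, then impute.

    blocks: list of (n_rows, p_k) arrays; within a block a row is assumed
            to be either fully observed or fully missing (NaN).
    n_components: number of components to keep for each block.
    """
    n_rows = blocks[0].shape[0]
    reduced = []
    for X, k in zip(blocks, n_components):
        observed = ~np.isnan(X).any(axis=1)
        scores = np.full((n_rows, k), np.nan)
        # PCA uses only the observed part of this block.
        scores[observed] = pca_scores(X[observed], k)
        reduced.append(scores)
    # Merge the reduced blocks, then impute in the lower-dimensional space.
    merged = np.hstack(reduced)
    return impute(merged)

# Toy monotone pattern: block B is missing for the last 40 rows.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 6))
B = rng.normal(size=(100, 4))
B[60:] = np.nan
Z = bpi_sketch([A, B], n_components=[3, 2])
print(Z.shape)  # (100, 5)
```

In this sketch the imputer sees 5 columns instead of the original 10, which is the intuition behind the reported reduction in imputation time; the actual savings depend on the data and on the imputation technique used.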
