Principle Components Analysis based frameworks for efficient missing data imputation algorithms

05/30/2022
by Thu Nguyen, et al.

Missing data is a common problem in practice, and imputation, i.e., filling in the missing entries of the data, is a popular way to deal with it. This has motivated many works on imputation for missing data of various types and dimensions. However, for high-dimensional datasets, these imputation methods can be computationally expensive. Therefore, in this work, we propose Principal Component Analysis Imputation (PCAI), a simple framework based on Principal Component Analysis (PCA) that speeds up the imputation process of many available imputation techniques. Next, based on PCAI, we propose PCA Imputation - Classification (PIC), an imputation, dimension reduction, and classification framework for classification problems with missing data where it is desirable to reduce the dimension before training a classification model. Our experiments show that the proposed frameworks can be used with various imputation algorithms and improve imputation speed significantly. Interestingly, the frameworks aid imputation methods that rely on many parameters: reducing the dimension of the data reduces the number of parameters that need to be estimated. Moreover, they not only achieve comparable mean squared error or higher classification accuracy compared to traditional imputation on the original missing dataset, but often deliver even better results. In addition, by reducing the number of features, the frameworks help to address the memory issues that many imputation approaches have.
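The abstract does not spell out the steps of PCAI, so the following is only a minimal sketch of one plausible reading. It assumes the features are split into fully observed columns and columns with missing entries, that PCA is applied to the fully observed columns only, and that an arbitrary imputer is then run on the reduced columns joined with the incomplete ones. The function names (`pca_reduce`, `regression_impute`, `pcai`), the 95% variance threshold, and the regression-based stand-in imputer are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def pca_reduce(F, var_ratio=0.95):
    """Project fully observed columns onto their leading principal components.

    var_ratio is an assumed threshold on the cumulative explained variance.
    """
    Fc = F - F.mean(axis=0)
    _, s, Vt = np.linalg.svd(Fc, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest number of components whose cumulative variance reaches var_ratio.
    k = min(int(np.searchsorted(explained, var_ratio)) + 1, len(s))
    return Fc @ Vt[:k].T


def regression_impute(X, n_iter=10):
    """Stand-in imputer (an assumption): iterative least-squares regression.

    Missing entries start at the column mean, then each incomplete column is
    repeatedly re-predicted from all other columns.
    """
    X = X.copy()
    mask = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.where(mask)[1])
    for _ in range(n_iter):
        for j in np.where(mask.any(axis=0))[0]:
            miss = mask[:, j]
            others = np.delete(X, j, axis=1)
            A = np.column_stack([np.ones(len(X)), others])
            coef, *_ = np.linalg.lstsq(A[~miss], X[~miss, j], rcond=None)
            X[miss, j] = A[miss] @ coef
    return X


def pcai(X, imputer=regression_impute, var_ratio=0.95):
    """Sketch of a PCA-then-impute pipeline under the assumptions stated above."""
    has_nan = np.isnan(X).any(axis=0)
    F, M = X[:, ~has_nan], X[:, has_nan]   # fully observed / incomplete columns
    R = pca_reduce(F, var_ratio)           # reduced version of F
    imputed = imputer(np.hstack([R, M]))   # impute on the smaller matrix
    out = X.copy()
    out[:, has_nan] = imputed[:, R.shape[1]:]  # keep original F, filled-in M
    return out
```

Any imputer that takes an array with `NaN` entries and returns a completed array of the same shape could be passed as `imputer`; the speed-up described in the abstract would come from that imputer seeing fewer columns.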

research
05/10/2023

Blockwise Principal Component Analysis for monotone missing data imputation and dimensionality reduction

Monotone missing data is a common problem in data analysis. However, imp...
research
02/02/2023

Conditional expectation for missing data imputation

Missing data is common in datasets retrieved in various areas, such as m...
research
02/11/2018

PCA-Based Missing Information Imputation for Real-Time Crash Likelihood Prediction Under Imbalanced Data

The real-time crash likelihood prediction has been an important research...
research
10/11/2022

Combining datasets to increase the number of samples and improve model fitting

For many use cases, combining information from different datasets can be...
research
10/04/2021

Internal Data Imputation in Data Warehouse Dimensions

Missing values occur commonly in the multidimensional data warehouses. T...
research
06/06/2021

DPER: Efficient Parameter Estimation for Randomly Missing Data

The missing data problem has been broadly studied in the last few decade...
research
06/03/2021

Multiple Imputation Through XGBoost

Multiple imputation is increasingly used in dealing with missing data. W...
