Estimation and imputation in Probabilistic Principal Component Analysis with Missing Not At Random data

06/06/2019
by Aude Sportisse, et al.

Missing Not At Random (MNAR) values are considered non-ignorable: handling them requires a model of the missing-data mechanism, which imposes strong prior assumptions on the parametric form of the distribution and complicates inference and imputation. Methodologies for MNAR values also tend to focus on simple settings in which only one variable (such as the outcome) has missing entries. Recent work by Mohan and Pearl, based on graphical models and causality, shows that in specific MNAR settings some aspects of the distribution can be recovered without specifying the MNAR mechanism. We pursue this line of research. Considering a data matrix generated from a probabilistic principal component analysis (PPCA) model and containing several MNAR variables, not necessarily under the same self-masked missing mechanism, we propose estimators for the means, variances and covariances of the variables and study their consistency. These estimators have the great advantage of being computed from observed data only. In addition, we propose a method for imputing the data matrix and an estimator of the PPCA loading matrix. We compare our proposal with results obtained for ignorable missing values using the expectation-maximization algorithm.
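To make the setting concrete, the following is a minimal simulation sketch, not the estimators proposed in the paper. It draws data from a PPCA model and hides entries of two variables with a self-masked mechanism, then shows that the naive mean of the observed entries is biased. The logistic form of the mechanism, the function names (`simulate_ppca`, `self_masked_mnar`) and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np


def simulate_ppca(n, p, r, sigma, rng):
    """Draw n rows from a PPCA model: Y = mu + Z W + noise (illustrative)."""
    mu = rng.normal(size=p)            # variable means
    W = rng.normal(size=(r, p))        # loading matrix (r latent dimensions)
    Z = rng.normal(size=(n, r))        # latent Gaussian factors
    noise = sigma * rng.normal(size=(n, p))
    return mu + Z @ W + noise, mu, W


def self_masked_mnar(Y, cols, slope, rng):
    """Hide entries of the chosen columns with a probability that depends
    on the entry's own value (assumed logistic self-masking)."""
    Y_obs = Y.copy()
    for j in cols:
        centered = (Y[:, j] - Y[:, j].mean()) / Y[:, j].std()
        prob_missing = 1.0 / (1.0 + np.exp(-slope * centered))
        missing = rng.random(Y.shape[0]) < prob_missing
        Y_obs[missing, j] = np.nan     # large values are more often missing
    return Y_obs


rng = np.random.default_rng(0)
Y, mu, W = simulate_ppca(n=5000, p=6, r=2, sigma=0.5, rng=rng)
Y_obs = self_masked_mnar(Y, cols=[0, 1], slope=2.0, rng=rng)

full_mean = Y.mean(axis=0)              # mean of the complete data
naive_mean = np.nanmean(Y_obs, axis=0)  # mean of observed entries only
bias = naive_mean - full_mean           # negative for the masked columns
```

In this sketch, large values are more likely to be hidden, so the observed-entry mean of the masked columns underestimates the complete-data mean, while the fully observed columns are unaffected. This is the kind of bias that motivates estimators designed for the MNAR case.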


research
06/06/2019

Estimation with informative missing data in the low-rank model with random effects

Matrix completion based on low-rank models is very popular and comes wit...
research
05/10/2023

Blockwise Principal Component Analysis for monotone missing data imputation and dimensionality reduction

Monotone missing data is a common problem in data analysis. However, imp...
research
07/21/2022

Missing Values and the Dimensionality of Expected Returns

Combining 100+ cross-sectional predictors requires either dropping 90 da...
research
02/16/2022

Self-Organizing Maps for Exploration of Partially Observed Data and Imputation of Missing Values

The self-organizing map is an unsupervised neural network which is widel...
research
01/23/2021

A Geospatial Functional Model For OCO-2 Data with Application on Imputation and Land Fraction Estimation

Data from NASA's Orbiting Carbon Observatory-2 (OCO-2) satellite is esse...
research
06/18/2020

Matrix Completion with Quantified Uncertainty through Low Rank Gaussian Copula

Modern large scale datasets are often plagued with missing entries; inde...
research
02/28/2019

Learning partially ranked data based on graph regularization

Ranked data appear in many different applications, including voting and ...
