Diagnosing missing always at random in multivariate data

10/18/2017
by Iavor Bojinov, et al.
Models for analyzing multivariate data sets with missing values require strong, often unassessable, assumptions. The most common of these is that the mechanism that created the missing data is ignorable, a twofold assumption that depends on the mode of inference. The first part, which is the focus here, requires under the Bayesian and direct-likelihood paradigms that the missing data are missing at random (MAR); in contrast, the frequentist-likelihood paradigm demands that the missing data mechanism always produces MAR data, a condition known as missing always at random (MAAR). Under certain regularity conditions, assuming MAAR leads to an assumption that can be tested using the observed data alone: namely, that the missing data indicators depend only on fully observed variables. Here, we propose three diagnostic procedures that not only indicate when this assumption is invalid but also suggest which variables are the most likely culprits. Although MAAR is not a necessary condition to ensure validity under the Bayesian and direct-likelihood paradigms, it is sufficient, and evidence for its violation should encourage the statistician to conduct a targeted sensitivity analysis.
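
To make the testable implication concrete, here is a minimal sketch of one way it could be checked. It is not one of the three procedures proposed in the paper, whose details the abstract does not give. It rests on the following reasoning: if the missing data indicators depend only on fully observed variables, then within each level of a fully observed variable, the observed values of one partially observed variable should look the same whether or not another variable is missing. The function name `stratified_perm_test`, the row layout (a list of dicts with `None` marking a missing value), and the choice of a stratified permutation test on a difference in means are all assumptions made for illustration.

```python
import random


def stratified_perm_test(rows, target, other, stratum, n_perm=500, seed=0):
    """Permutation p-value for: 'is `target` missing' unrelated to the
    observed values of `other`, within levels of the fully observed `stratum`.

    Hypothetical helper for illustration; rows is a list of dicts and
    None marks a missing value.
    """
    rng = random.Random(seed)

    # Group rows by stratum, keeping only rows where `other` is observed.
    strata = {}
    for r in rows:
        if r[other] is None:
            continue
        pair = (r[target] is None, r[other])  # (missing flag, observed value)
        strata.setdefault(r[stratum], []).append(pair)
    groups = list(strata.values())

    def stat(gs):
        # Size-weighted sum of absolute mean differences across strata.
        total = 0.0
        for pairs in gs:
            miss = [v for m, v in pairs if m]
            obs = [v for m, v in pairs if not m]
            if miss and obs:
                diff = sum(miss) / len(miss) - sum(obs) / len(obs)
                total += abs(diff) * len(pairs)
        return total

    observed = stat(groups)
    exceed = 0
    for _ in range(n_perm):
        permuted = []
        for pairs in groups:
            flags = [m for m, _ in pairs]
            rng.shuffle(flags)  # shuffle missingness flags within the stratum
            values = [v for _, v in pairs]
            permuted.append(list(zip(flags, values)))
        if stat(permuted) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (1 + exceed) / (1 + n_perm)
```

A small p-value for a given pair of variables is evidence against the testable assumption and points to those variables as candidates for a sensitivity analysis. A large p-value does not establish MAAR, since the check only looks at one summary of one pair at a time.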
