What does it mean for data to be `observed' or `missing'?

11/09/2018

∙

In statistical modelling of incomplete data, missingness is encoded as a relation between datasets Y and response patterns R. We identify two different meanings of `observed' and `missing' implicit in this framework, only one of which is consistent with the definition formally encoded in (Y, R). Notation that has been used in the literature for more than three decades fails to distinguish between these two concepts, rendering the notations `f(y_obs,y_mis)' and `f(y_mis | y_obs)' conceptually contradictory. Additionally, the same notation `f(y_mis | y_obs)' is used to refer to two densities with different domains. These densities can be considered to be equivalent mathematically, but conceptually they are not interchangeable as distributions because of their differing relationships to (Y, R). Only one of these distributions is consistent with (Y, R) and standard conventions for interpretation of mathematical notation leads to the wrong choice conceptually for ignorable multiple imputation. We introduce formal definitions and notational improvements to treat these and other ambiguities, and we demonstrate their use through several example derivations.

READ FULL TEXT

What does it mean for data to be `observed' or `missing'?

Sign in with Google

Consider DeepAI Pro