A Complete Characterisation of Structured Missingness

07/05/2023 ∙ by James Jackson, et al.

Our capacity to process large, complex data sources is ever-increasing, raising new and important applied research questions, such as how to handle missing values in large-scale databases. Mitra et al. (2023) identified the phenomenon of Structured Missingness (SM), in which missingness has an underlying structure. Existing taxonomies for defining missingness mechanisms typically assume that the variables' missingness indicator vectors M_1, M_2, ..., M_p are independent after conditioning on the relevant portion of the data matrix 𝐗. As this assumption is often unsuitable for characterising SM in multivariate settings, we introduce a taxonomy for SM in which each M_j can depend on 𝐌_-j (i.e., all missingness indicator vectors except M_j), in addition to 𝐗. We embed this new framework within the well-established decomposition of mechanisms into MCAR, MAR, and MNAR (Rubin, 1976), which allows us to recast mechanisms in a broader setting where we can consider the combined effect of 𝐗 and 𝐌_-j on M_j. We also demonstrate, via simulations, the impact of SM on inference and prediction, and consider contextual instances of SM arising in a de-identified nationwide (US-based) clinico-genomic database (CGDB). We hope to stimulate interest in SM and to encourage timely research into this phenomenon.
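To make the central idea concrete, the following is a minimal sketch of a mechanism in which one missingness indicator depends on another, i.e. M_2 depends on M_1 rather than only on 𝐗. It is not the simulation design used in the paper: the function name, the two-variable setup, and all probabilities are assumptions chosen purely for illustration.

```python
import numpy as np


def simulate_structured_missingness(
    n=10_000,
    p_m1=0.3,                # assumed P(M_1 = 1): X_1 is missing completely at random
    p_m2_given_m1=0.8,       # assumed P(M_2 = 1 | M_1 = 1)
    p_m2_given_not_m1=0.1,   # assumed P(M_2 = 1 | M_1 = 0)
    seed=0,
):
    """Hypothetical two-variable example where M_2 depends on M_1."""
    rng = np.random.default_rng(seed)

    # Complete data matrix X with two independent standard normal columns.
    X = rng.normal(size=(n, 2))

    # M_1 does not depend on X or on any other indicator.
    m1 = rng.random(n) < p_m1

    # M_2 depends on M_1: its missingness probability changes with m1.
    p_m2 = np.where(m1, p_m2_given_m1, p_m2_given_not_m1)
    m2 = rng.random(n) < p_m2

    # Stack indicators (True = missing) and mask the data accordingly.
    M = np.column_stack([m1, m2])
    X_obs = np.where(M, np.nan, X)
    return X_obs, M


X_obs, M = simulate_structured_missingness()

# Compare the missingness rate of X_2 across the two values of M_1.
rate_m2_when_m1_missing = M[M[:, 0], 1].mean()
rate_m2_when_m1_observed = M[~M[:, 0], 1].mean()
print(f"P(M_2=1 | M_1=1) is roughly {rate_m2_when_m1_missing:.2f}")
print(f"P(M_2=1 | M_1=0) is roughly {rate_m2_when_m1_observed:.2f}")
```

In this sketch neither indicator depends on the values in 𝐗, yet the indicators are clearly not independent of each other, which is the kind of dependence that a taxonomy assuming conditionally independent M_j cannot express.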
