Population-calibrated multiple imputation for a binary/categorical covariate in categorical regression models

05/04/2018
by Tra My Pham, et al.

Multiple imputation (MI) has become popular for analyses with missing data in medical research. The standard implementation of MI is based on the assumption of data being missing at random (MAR). However, for missing data generated by missing not at random (MNAR) mechanisms, MI performed assuming MAR might not be satisfactory. For an incomplete variable in a given dataset, its corresponding population marginal distribution might also be available in an external data source. We show how this information can be readily utilised in the imputation model to calibrate inference to the population, by incorporating an appropriately calculated offset termed the 'calibrated-δ adjustment'. We describe the derivation of this offset from the population distribution of the incomplete variable and show how in applications it can be used to closely (and often exactly) match the post-imputation distribution to the population level. Through analytic and simulation studies, we show that our proposed calibrated-δ adjustment MI method can give the same inference as standard MI when data are MAR, and can produce more accurate inference under two general MNAR missingness mechanisms. The method is used to impute missing ethnicity data in a type 2 diabetes prevalence case study using UK primary care electronic health records, where it results in scientifically relevant changes in inference for non-White ethnic groups compared to standard MI. Calibrated-δ adjustment MI represents a pragmatic approach for utilising available population-level information in a sensitivity analysis to explore potential departure from the MAR assumption.
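
The idea of an offset that matches the post-imputation distribution to the population can be illustrated for a binary covariate. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a logistic imputation model whose fitted linear predictors for the incomplete records are already available, and it finds the offset numerically by bisection rather than by the derivation given in the paper. The function names `required_missing_proportion` and `calibrated_delta`, and all numbers in the example, are hypothetical.

```python
import math


def expit(v):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-v))


def required_missing_proportion(p_pop, p_obs, frac_obs):
    """Proportion with x = 1 needed among the missing records so that the
    completed-data proportion equals the population proportion p_pop.

    Assumes p_pop = frac_obs * p_obs + (1 - frac_obs) * p_mis.
    """
    if not 0.0 < frac_obs < 1.0:
        raise ValueError("frac_obs must lie strictly between 0 and 1")
    p_mis = (p_pop - p_obs * frac_obs) / (1.0 - frac_obs)
    if not 0.0 < p_mis < 1.0:
        raise ValueError("population and observed proportions are incompatible")
    return p_mis


def calibrated_delta(linear_predictors, p_mis, lo=-20.0, hi=20.0, tol=1e-10):
    """Offset delta such that the mean of expit(lp + delta) over the
    incomplete records equals p_mis (hypothetical helper, found by bisection).
    """
    n = len(linear_predictors)

    def gap(delta):
        # Monotonically increasing in delta, so bisection applies.
        return sum(expit(lp + delta) for lp in linear_predictors) / n - p_mis

    for _ in range(200):
        mid = (lo + hi) / 2.0
        if gap(mid) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0


# Illustrative numbers only (not taken from the paper).
p_mis = required_missing_proportion(p_pop=0.30, p_obs=0.20, frac_obs=0.6)
lps = [-2.0, -1.0, 0.5]  # assumed MAR-model linear predictors for missing rows
delta = calibrated_delta(lps, p_mis)
imputation_probs = [expit(lp + delta) for lp in lps]
```

With no other variables in the imputation model, every linear predictor equals the logit of the observed proportion, and the offset in this sketch reduces to the difference between the logit of the required proportion among the missing and the logit of the observed proportion.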
