Multiple imputation for logistic regression models: incorporating an interaction

11/26/2022
by Matthew J. Smith, et al.

Background: Multiple imputation is often used to reduce bias and gain efficiency when data are missing. The most appropriate imputation method depends on the model the analyst intends to fit. Several imputation approaches have been proposed for when this model is a logistic regression with an interaction term that includes a partially observed binary variable; however, it is unclear which approach performs best under which parameter settings.

Methods: Using 1000 simulations, each with 10,000 observations, under six data-generating mechanisms (DGMs), we investigate the performance of four methods: (i) 'passive imputation', (ii) 'just another variable' (JAV), (iii) 'stratify-impute-append' (SIA), and (iv) 'substantive model compatible fully conditional specification' (SMCFCS). The application of each method is shown in an empirical example using England-based cancer registry data.

Results: SMCFCS and SIA gave the least biased estimates of the coefficients for the fully observed variable, the partially observed variable, and the interaction term. SMCFCS and SIA showed good coverage and low relative error for all DGMs. SMCFCS had a large bias when there was a low prevalence of the fully observed variable in the interaction. SIA performed poorly when the fully observed variable in the interaction had a continuous underlying form.

Conclusion: SMCFCS and SIA give consistent estimation for logistic regression models with an interaction term when data are missing at random, and either can be used in most analyses. SMCFCS performed better than SIA when the fully observed variable in the interaction had an underlying continuous form. Researchers should be cautious about using SMCFCS when there is a low prevalence of the fully observed variable in the interaction.
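To make the 'stratify-impute-append' idea concrete, the following is a minimal sketch in plain Python. It is not the authors' simulation code. It assumes a deliberately simple setting that the abstract does not specify: a binary outcome `y`, a fully observed binary variable `z`, a partially observed binary variable `x`, no other covariates, and coefficient values chosen purely for illustration. All function names (`simulate`, `sia_impute`, `fit_saturated`, `pool`) are hypothetical. Because every variable is binary here, the analysis model with an interaction is saturated, so its maximum-likelihood estimates can be computed from cell counts; a general analysis would use a regression routine instead.

```python
import math
import random
from statistics import mean, variance


def expit(v):
    return 1.0 / (1.0 + math.exp(-v))


def simulate(n, beta, rng):
    """Binary z (fully observed), binary x (partially observed), binary y."""
    rows = []
    for _ in range(n):
        z = int(rng.random() < 0.5)
        x = int(rng.random() < expit(-0.5 + 0.5 * z))
        eta = beta[0] + beta[1] * x + beta[2] * z + beta[3] * x * z
        y = int(rng.random() < expit(eta))
        # MAR (assumed mechanism): missingness of x depends only on y and z.
        missing = rng.random() < expit(-1.5 + 1.0 * y + 0.5 * z)
        rows.append((None if missing else x, z, y))
    return rows


def sia_impute(rows, rng):
    """Stratify on z, impute x from y within each stratum, append the strata."""
    completed = []
    for zv in (0, 1):
        stratum = [r for r in rows if r[1] == zv]
        # Saturated imputation model within the stratum: P(x = 1 | y), drawn
        # from a Beta posterior so parameter uncertainty is propagated.
        p = {}
        for yv in (0, 1):
            obs = [r[0] for r in stratum if r[2] == yv and r[0] is not None]
            p[yv] = rng.betavariate(1 + sum(obs), 1 + len(obs) - sum(obs))
        completed += [(int(rng.random() < p[y]) if x is None else x, z, y)
                      for x, z, y in stratum]
    return completed


def fit_saturated(rows):
    """Closed-form logistic MLE for y ~ x + z + x:z with binary x and z."""
    lo, var = {}, {}
    for xv in (0, 1):
        for zv in (0, 1):
            ys = [r[2] for r in rows if r[0] == xv and r[1] == zv]
            n1 = sum(ys)
            n0 = len(ys) - n1
            lo[xv, zv] = math.log(n1 / n0)      # cell log-odds of y = 1
            var[xv, zv] = 1.0 / n1 + 1.0 / n0   # its large-sample variance
    est = [lo[0, 0], lo[1, 0] - lo[0, 0], lo[0, 1] - lo[0, 0],
           lo[1, 1] - lo[1, 0] - lo[0, 1] + lo[0, 0]]
    v = [var[0, 0], var[1, 0] + var[0, 0], var[0, 1] + var[0, 0],
         sum(var.values())]
    return est, v


def pool(fits):
    """Rubin's rules: pooled estimate and standard error per coefficient."""
    m = len(fits)
    out = []
    for j in range(4):
        q = [est[j] for est, _ in fits]
        within = mean([v[j] for _, v in fits])
        total = within + (1 + 1 / m) * variance(q)
        out.append((mean(q), math.sqrt(total)))
    return out


rng = random.Random(2022)
true_beta = [-1.0, 0.5, 0.5, 0.5]  # illustrative values, not from the paper
data = simulate(10_000, true_beta, rng)
pooled = pool([fit_saturated(sia_impute(data, rng)) for _ in range(10)])
for name, (est, se) in zip(["intercept", "x", "z", "x:z"], pooled):
    print(f"{name:>9}: {est:6.3f} (SE {se:.3f})")
```

Imputing separately within each level of `z` lets the relationship between `x` and `y` differ by stratum, which is what keeps the imputation model compatible with an `x:z` interaction in the analysis model. The sketch also hints at why stratification is less natural when the fully observed variable is continuous: there are no obvious strata to split on.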

research
03/02/2021

Multiple imputation with missing data indicators

Multiple imputation is a well-established general technique for analyzin...
research
07/25/2018

Propensity score estimation using classification and regression trees in the presence of missing covariate data

Data mining and machine learning techniques such as classification and r...
research
10/20/2020

A Comparative Study of Imputation Methods for Multivariate Ordinal Data

Missing data remains a very common problem in large datasets, including ...
research
11/25/2014

PLUTO: Penalized Unbiased Logistic Regression Trees

We propose a new algorithm called PLUTO for building logistic regression...
research
05/04/2018

Population-calibrated multiple imputation for a binary/categorical covariate in categorical regression models

Multiple imputation (MI) has become popular for analyses with missing da...
research
03/28/2018

Semi-supervised learning for structured regression on partially observed attributed graphs

Conditional probabilistic graphical models provide a powerful framework ...
research
09/05/2023

A Likelihood Approach to Incorporating Self-Report Data in HIV Recency Classification

Estimating new HIV infections is significant yet challenging due to the ...
