Handling missing data when estimating causal effects with Targeted Maximum Likelihood Estimation

12/10/2021
by S. Ghazaleh Dashti, et al.

Causal inference from longitudinal studies is central to epidemiologic research. Targeted Maximum Likelihood Estimation (TMLE) is an established doubly robust method for estimating causal effects, but it is unclear how missing data should be handled when TMLE is used with data-adaptive approaches. Based on motivating data from the Victorian Adolescent Health Cohort Study, we conducted simulation and case studies to evaluate the performance of methods for handling missing data when using TMLE. The methods were: complete-case analysis; an extended TMLE method incorporating a model for the outcome missingness mechanism; the missing indicator method for missing covariate data; and six multiple imputation (MI) approaches using parametric or machine-learning methods to handle missing outcome, exposure, and covariate data. The simulation study considered a simple scenario (the exposure and outcome generated from main-effects regressions) and two complex scenarios (models that also included interactions), alongside eleven missingness mechanisms defined using causal diagrams. No approach performed well across all scenarios and missingness mechanisms. For non-MI methods, bias depended on the missingness mechanism (it was small when the outcome did not influence missingness in any variable). For parametric MI, bias depended on the missingness mechanism (smaller when the outcome did not directly influence outcome missingness) and on the data generation scenario (larger for the complex scenarios). Including interaction terms in the imputation model improved performance. For MI using machine learning, bias depended on the missingness mechanism (smaller when no variable with missing data directly influenced outcome missingness). We recommend considering the missing data mechanism and, if using MI, opting for a saturated parametric or data-adaptive imputation model when handling missing data in TMLE estimation.
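
For readers unfamiliar with the two simplest approaches named in the abstract, the following is a minimal plain-Python sketch of complete-case analysis and the missing indicator method for a covariate. It is an illustration only, not the authors' implementation: the function names `complete_cases` and `missing_indicator`, the use of `None` to mark a missing value, and the fill value of 0.0 are all assumptions made for this example, and the TMLE step itself is not shown.

```python
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]


def complete_cases(rows: List[Row]) -> List[Row]:
    """Complete-case analysis: keep only records with no missing value."""
    return [row for row in rows if all(v is not None for v in row)]


def missing_indicator(
    values: List[Optional[float]], fill: float = 0.0
) -> Tuple[List[float], List[int]]:
    """Missing indicator method for one covariate.

    Missing entries are replaced by a constant (hypothetical default 0.0),
    and a 0/1 indicator of missingness is returned alongside, so that both
    columns can be passed to the downstream models as covariates.
    """
    filled = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return filled, indicator


if __name__ == "__main__":
    # Toy records of (covariate, outcome); None marks a missing value.
    records = [(1.2, 0.0), (None, 1.0), (0.7, None)]
    print(complete_cases(records))

    covariate = [r[0] for r in records]
    print(missing_indicator(covariate))
```

The abstract reports that bias for such non-MI methods depended on the missingness mechanism, so a sketch like this shows only the mechanics, not when either approach is appropriate.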

research
11/28/2019

A review and evaluation of standard methods to handle missing data on time-varying confounders in marginal structural models

Marginal structural models (MSMs) are commonly used to estimate causal i...
research
07/25/2018

Propensity score estimation using classification and regression trees in the presence of missing covariate data

Data mining and machine learning techniques such as classification and r...
research
01/27/2023

G-formula for causal inference via multiple imputation

G-formula is a popular approach for estimating treatment or exposure eff...
research
01/17/2023

Recoverability and estimation of causal effects under typical multivariable missingness mechanisms

In the context of missing data, the identifiability or "recoverability" ...
research
10/30/2020

Evaluation of approaches for accommodating interactions and non-linear terms in multiple imputation of incomplete three-level data

Three-level data structures arising from repeated measures on individual...
research
10/30/2021

The Missing Covariate Indicator Method is Nearly Valid Almost Always

Background: Although the missing covariate indicator method (MCIM) has b...