Machine learning for causal inference: on the use of cross-fit estimators

04/21/2020
by Paul N Zivich, et al.

Modern causal inference methods allow machine learning to be used to weaken parametric modeling assumptions. However, the use of machine learning may result in bias and incorrect inferences due to overfitting. Cross-fit estimators have been proposed to eliminate this bias and yield better statistical properties. We conducted a simulation study to assess the performance of several different estimators for the average causal effect (ACE). The data generating mechanisms for the simulated treatment and outcome included log-transforms, polynomial terms, and discontinuities. We compared singly-robust estimators (g-formula, inverse probability weights) and doubly-robust estimators (augmented inverse probability weights, targeted maximum likelihood estimation). Nuisance functions were estimated with parametric models and ensemble machine learning, separately. We further assessed cross-fit doubly-robust estimators. With correctly specified parametric models, all of the estimators were unbiased and confidence intervals achieved nominal coverage. When used with machine learning, only the cross-fit estimators were unbiased and had nominal confidence interval coverage. Due to the difficulty of properly specifying parametric models in high-dimensional data, doubly-robust estimators with ensemble learning and cross-fitting may be the preferred approach for estimation of the ACE in most epidemiologic studies. However, these approaches may require larger sample sizes to avoid finite-sample issues.
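To make the idea of cross-fitting concrete, here is a minimal sketch of a cross-fit augmented inverse probability weighting (AIPW) estimator of the ACE. It is an illustration under stated assumptions, not the procedure evaluated in the paper: the toy data generator, the function names (`simulate`, `fit_nuisance`, `aipw_scores`, `cross_fit_aipw`), and the use of stratified sample means in place of ensemble machine learning are all hypothetical choices made to keep the example self-contained. The sketch uses a single random partition into folds; nuisance functions are fit on the other folds and evaluated only on the held-out fold.

```python
import random
from statistics import mean, stdev

def simulate(n, seed=0):
    """Toy data (assumed): binary confounder W, binary treatment A, outcome Y. True ACE = 1.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.7 if w else 0.3) else 0
        y = 1.0 * a + 2.0 * w + rng.gauss(0, 1)
        rows.append((w, a, y))
    return rows

def fit_nuisance(train):
    """Stratified means, standing in for flexible learners of the nuisance functions."""
    overall_y = mean(y for _, _, y in train)
    ps, om = {}, {}
    for w in (0, 1):
        a_w = [a for wi, a, _ in train if wi == w]
        p = mean(a_w) if a_w else 0.5
        ps[w] = min(max(p, 0.01), 0.99)  # bound Pr(A=1 | W) away from 0 and 1
        for a in (0, 1):
            y_aw = [y for wi, ai, y in train if wi == w and ai == a]
            om[(a, w)] = mean(y_aw) if y_aw else overall_y  # E[Y | A=a, W=w]
    return ps, om

def aipw_scores(rows, ps, om):
    """Per-observation AIPW contributions, using nuisance fits from other folds."""
    scores = []
    for w, a, y in rows:
        pi, m1, m0 = ps[w], om[(1, w)], om[(0, w)]
        scores.append(m1 - m0 + a * (y - m1) / pi - (1 - a) * (y - m0) / (1 - pi))
    return scores

def cross_fit_aipw(rows, n_splits=2, seed=0):
    """Fit nuisance functions out-of-fold, then average the held-out AIPW scores."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    folds = [idx[k::n_splits] for k in range(n_splits)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [rows[i] for i in idx if i not in held_out]
        test = [rows[i] for i in fold]
        ps, om = fit_nuisance(train)
        scores.extend(aipw_scores(test, ps, om))
    est = mean(scores)
    se = stdev(scores) / len(scores) ** 0.5  # standard error from the score variance
    return est, (est - 1.96 * se, est + 1.96 * se)

if __name__ == "__main__":
    estimate, ci = cross_fit_aipw(simulate(4000, seed=1))
    print(f"ACE estimate: {estimate:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

In practice the stratified means would be replaced by the ensemble learner of choice, and the partitioning is often repeated over several random splits with the results summarized across splits to reduce dependence on any single partition.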

