Doubly robust, machine learning effect estimation in real-world clinical sciences: A practical evaluation of performance in molecular epidemiology cohort settings

05/27/2021
by Xiang Meng, et al.

Modern efficient estimators such as AIPW and TMLE facilitate the application of flexible, non-parametric machine learning algorithms to improve treatment and outcome model fit, allowing for some model misspecification while still maintaining desired bias and variance properties. Recent simulation work has pointed to essential conditions for effective application, including cross-fitting, the use of a broad library of well-tuned, flexible learners, and sufficiently large sample sizes. In these settings, cross-fit, doubly robust estimators fit with machine learning appear to be clearly superior to conventional alternatives. However, commonly simulated conditions differ in important ways from the settings in which these estimators may be most useful, namely high-dimensional, observational settings where the cost of measurement limits sample size, a large number of covariates may contain only a subset of true confounders, and model misspecification may include the omission of essential biological interactions. In such settings, cross-fit, ensemble learning-based estimators, which are computationally intensive and challenging to optimize, may have less of a practical advantage. We present extensive simulation results drawing data on 331 covariates from 1178 subjects of a multi-omic, longitudinal birth cohort while fixing treatment and outcome effects. We fit models under various conditions, including under- and over-specification (e.g. excess orthogonal covariates) and missing interactions, using both state-of-the-art and less computationally intensive (e.g. singly-fit, parametric) estimators. In real data structures, we find that in nearly every scenario (e.g. model misspecification, single- or cross-fit estimators), efficient estimators fit with parametric learners outperform those that include non-parametric learners on the basis of bias and coverage.
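To make the terms above concrete, the following is a minimal sketch of a cross-fit AIPW estimate of an average treatment effect. It is not the authors' code or simulation design. It assumes toy data with a single binary confounder, and it uses saturated stratum-mean models as a simple stand-in for parametric learners. The function names (`simulate`, `fit_nuisance`, `aipw_crossfit`) and all numeric settings are hypothetical choices made for illustration.

```python
import random
from statistics import mean


def simulate(n, seed=0, true_ate=1.0):
    """Toy data (an assumption): binary confounder W, binary treatment A, outcome Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        p_treat = 0.7 if w else 0.3          # W affects treatment probability
        a = 1 if rng.random() < p_treat else 0
        y = true_ate * a + 2.0 * w + rng.gauss(0.0, 1.0)  # W also affects outcome
        rows.append((w, a, y))
    return rows


def fit_nuisance(train):
    """Stratum-mean nuisance models: g[w] = P(A=1 | W=w), q[(a, w)] = E[Y | A=a, W=w]."""
    g, q = {}, {}
    for w in (0, 1):
        stratum = [r for r in train if r[0] == w]
        g[w] = mean(r[1] for r in stratum)
        for a in (0, 1):
            q[(a, w)] = mean(r[2] for r in stratum if r[1] == a)
    return g, q


def aipw_crossfit(rows, k=2, seed=0):
    """Cross-fit AIPW: nuisance models are fit on the other folds, scored on the held-out fold."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    phi = []  # per-subject AIPW scores (estimated influence-function contributions)
    for fold in folds:
        held_out = set(fold)
        train = [rows[i] for i in idx if i not in held_out]
        g, q = fit_nuisance(train)
        for i in fold:
            w, a, y = rows[i]
            q1, q0 = q[(1, w)], q[(0, w)]
            phi.append(
                q1 - q0
                + a * (y - q1) / g[w]
                - (1 - a) * (y - q0) / (1 - g[w])
            )
    n = len(phi)
    ate = mean(phi)
    # Standard error from the sample variance of the scores
    se = (sum((p - ate) ** 2 for p in phi) / (n - 1) / n) ** 0.5
    return ate, se


ate, se = aipw_crossfit(simulate(2000, seed=1))
print(f"ATE estimate: {ate:.3f}, 95% CI: ({ate - 1.96 * se:.3f}, {ate + 1.96 * se:.3f})")
```

In this sketch the stratum means could be swapped for any other learner, parametric or non-parametric. The held-out scoring loop is what makes the estimator cross-fit, and bias and confidence-interval coverage can be assessed by repeating the call over many simulated datasets.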


