How to select predictive models for causal inference?

02/01/2023
by Doutreligne Matthieu, et al.

Predictive models, as in machine learning, can underpin causal inference to estimate the effects of an intervention at the population or individual level. This opens the door to a plethora of models, useful to match the increasing complexity of health data, but it also opens the Pandora's box of model selection: which of these models yield the most valid causal estimates? Classic machine-learning cross-validation procedures are not directly applicable. Indeed, an appropriate selection procedure for causal inference should weight the errors on both outcomes equally for each individual, treated or not, whereas one of the two outcomes may rarely be observed in a given sub-population. We study how more elaborate risks benefit causal model selection. We show theoretically that simple risks are brittle to weak overlap between treated and non-treated individuals, as well as to heterogeneous errors between populations. Rather, a more elaborate metric, the R-risk, appears as a proxy of the oracle error on causal estimates, observable at the cost of an overlap re-weighting. As the R-risk is defined not only from model predictions but also from the conditional mean outcome and the treatment probability, using it for model selection requires adapting cross-validation. Extensive experiments show that the resulting procedure gives the best causal model selection.
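To make the R-risk concrete, here is a minimal sketch, assuming it takes the usual R-loss form: for a candidate effect model τ̂(x), a conditional mean outcome m(x) = E[Y | X = x] and a treatment probability e(x) = P(A = 1 | X = x), the empirical risk is the mean of ((Y − m(X)) − (A − e(X)) τ̂(X))². The function name `r_risk` and the synthetic data-generating process are illustrative assumptions, not the authors' code. To isolate the selection step, the sketch plugs in the true m and e of the simulation; in practice both must be estimated, which is why the abstract says cross-validation has to be adapted (for instance by fitting the nuisances on folds separate from the ones used to score candidates, an assumption here).

```python
import numpy as np


def r_risk(y, a, tau_hat, m_hat, e_hat):
    """Empirical R-risk of a candidate CATE model (assumed R-loss form).

    y: observed outcomes; a: binary treatment (0/1);
    tau_hat: candidate predictions of the individual effect tau(x);
    m_hat: conditional mean outcome m(x) = E[Y | X = x];
    e_hat: treatment probability e(x) = P(A = 1 | X = x).
    """
    y, a, tau_hat, m_hat, e_hat = map(np.asarray, (y, a, tau_hat, m_hat, e_hat))
    residual = (y - m_hat) - (a - e_hat) * tau_hat
    return float(np.mean(residual ** 2))


# Hypothetical simulation: one covariate, logistic treatment assignment.
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-1.0, 1.0, n)
e = 1.0 / (1.0 + np.exp(-2.0 * x))      # true treatment probability e(x)
a = rng.binomial(1, e)                   # observed treatment
tau = 1.0 + x                            # true individual effect
mu0 = x ** 2                             # untreated outcome surface
y = mu0 + a * tau + rng.normal(0.0, 0.5, n)
m = mu0 + e * tau                        # true E[Y | X] = mu0 + e * tau

# Score a few candidate effect models and keep the one with lowest R-risk.
candidates = {"true": tau, "constant": np.ones(n), "zero": np.zeros(n)}
risks = {name: r_risk(y, a, t, m, e) for name, t in candidates.items()}
best = min(risks, key=risks.get)
print(risks, "->", best)
```

Conditionally on X, E[(A − e(X))² | X] = e(X)(1 − e(X)), so the expected excess R-risk of a candidate over the true effect is E[e(X)(1 − e(X)) (τ(X) − τ̂(X))²]. This is the oracle error on the effect, re-weighted by overlap: errors where e(x) is close to 0 or 1 count for little.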

