Valid inference after prediction

06/23/2023
by Keshav Motwani, et al.

Recent work has focused on the very common practice of prediction-based inference: that is, (i) using a pre-trained machine learning model to predict an unobserved response variable, and then (ii) conducting inference on the association between that predicted response and some covariates. As pointed out by Wang et al. [2020], applying a standard inferential approach in (ii) does not accurately quantify the association between the unobserved (as opposed to the predicted) response and the covariates. Wang et al. [2020] and Angelopoulos et al. [2023] propose corrections to step (ii) to enable valid inference on the association between the unobserved response and the covariates. Here, we show that the method proposed by Angelopoulos et al. [2023] controls the type 1 error rate and provides confidence intervals that attain the nominal coverage, regardless of the quality of the pre-trained machine learning model used to predict the unobserved response. However, the method proposed by Wang et al. [2020] provides valid inference only under very strong conditions that rarely hold in practice: for instance, when the machine learning model perfectly approximates the true regression function in the study population of interest.
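To make the contrast in step (ii) concrete, here is a minimal sketch comparing the naive approach (regressing the predicted response on the covariates) with a prediction-powered correction for ordinary least squares in the spirit of Angelopoulos et al. [2023]. It is an illustration under our own assumptions, not the authors' code: a linear model with one covariate; a small labeled sample in which both the true response and the prediction are observed, alongside a larger sample with predictions only; a hypothetical model f(x) = 1 + x that underestimates the true slope of 2; and a heteroskedasticity-robust (sandwich) variance for each piece. The variance estimator in the paper may differ in its details, and all function names (`ols`, `sandwich_cov`, `naive_ci`, `ppi_ci`, `simulate`) are hypothetical.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical helper names; not taken from the paper or any library.


def ols(X, y):
    """Least-squares coefficients of y on the design matrix X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def sandwich_cov(X, resid):
    """Heteroskedasticity-robust (HC0 sandwich) covariance of an OLS fit."""
    n = X.shape[0]
    bread = np.linalg.inv(X.T @ X / n)
    meat = (X * resid[:, None] ** 2).T @ X / n
    return bread @ meat @ bread / n


def naive_ci(X_unl, f_unl, alpha=0.05):
    """Naive step (ii): treat the predictions as if they were the response."""
    coef = ols(X_unl, f_unl)
    se = np.sqrt(np.diag(sandwich_cov(X_unl, f_unl - X_unl @ coef)))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return coef, coef - z * se, coef + z * se


def ppi_ci(X_lab, y_lab, f_lab, X_unl, f_unl, alpha=0.05):
    """Prediction-powered OLS sketch: fit on predictions, then subtract a
    rectifier estimated where the true response is observed."""
    theta_f = ols(X_unl, f_unl)            # association with the predictions
    delta = ols(X_lab, f_lab - y_lab)      # estimated bias of the predictions
    theta = theta_f - delta
    # The two samples are independent, so their variances add.
    cov = (sandwich_cov(X_unl, f_unl - X_unl @ theta_f)
           + sandwich_cov(X_lab, (f_lab - y_lab) - X_lab @ delta))
    se = np.sqrt(np.diag(cov))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return theta, theta - z * se, theta + z * se


# Assumed simulation: true slope 2, deliberately miscalibrated model f(x) = 1 + x.
rng = np.random.default_rng(0)
n, N = 500, 10_000


def simulate(m):
    x = rng.normal(size=m)
    X = np.column_stack([np.ones(m), x])
    y = 1.0 + 2.0 * x + rng.normal(size=m)
    f = 1.0 + 1.0 * x
    return X, y, f


X_lab, y_lab, f_lab = simulate(n)
X_unl, _, f_unl = simulate(N)  # true response treated as unobserved here

naive = naive_ci(X_unl, f_unl)
ppi = ppi_ci(X_lab, y_lab, f_lab, X_unl, f_unl)
print("naive slope:", naive[0][1], "CI:", (naive[1][1], naive[2][1]))
print("PPI slope:  ", ppi[0][1], "CI:", (ppi[1][1], ppi[2][1]))
```

Because the hypothetical model's predictions contain no noise, the naive interval is essentially zero-width around the wrong slope, whereas the corrected estimate recenters near 2 and its width is driven by the labeled sample. Two sanity checks follow from the construction: with perfect predictions the rectifier vanishes, and with uninformative (all-zero) predictions the estimate reduces to OLS on the labeled data alone.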
