Conformal prediction for the design problem

02/08/2022
by Clara Fannjiang, et al.

In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next. For example, in the protein design problem, we have a regression model that predicts some real-valued property of a protein sequence, which we use to propose new sequences believed to exhibit higher property values than those observed in the training data. Since validating designed sequences in the wet lab is typically costly, it is important to know how much we can trust the model's predictions. In such settings, however, there is a distinct type of distribution shift between the training and test data: one where the training and test data are statistically dependent, as the latter is chosen based on the former. Consequently, the model's error on the test data (that is, the designed sequences) has some non-trivial relationship with its error on the training data. Herein, we introduce a method to quantify predictive uncertainty in such settings. We do so by constructing confidence sets for predictions that account for the dependence between the training and test data. The confidence sets we construct have finite-sample guarantees that hold for any prediction algorithm, even when a trained model chooses the test-time input distribution. As a motivating use case, we demonstrate how our method quantifies uncertainty for the predicted fitness of designed proteins using several real data sets.
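For readers unfamiliar with conformal prediction, the following is a minimal sketch of the standard split conformal interval for regression. It is not the method introduced in this paper: it assumes the calibration and test points are exchangeable, which is exactly the assumption that breaks in the design setting described above. The function names, the toy model `predict`, and the synthetic data are all hypothetical and serve only to illustrate what a confidence set with a finite-sample guarantee looks like in the simplest case.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest score.

    If that rank exceeds n, the interval must be unbounded, so return inf.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


def split_conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Interval around predict(x_new) built from held-out absolute residuals.

    Assumes (hypothetically) that `predict` was fitted on data disjoint from
    the calibration set and that calibration and test points are exchangeable.
    """
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(scores, alpha)
    center = predict(x_new)
    return center - q, center + q


# Toy illustration with a hypothetical pre-fitted model and synthetic data.
rng = random.Random(0)
predict = lambda x: 2.0 * x
calib_x = [rng.uniform(0.0, 1.0) for _ in range(200)]
calib_y = [2.0 * x + rng.gauss(0.0, 0.1) for x in calib_x]

lo, hi = split_conformal_interval(predict, calib_x, calib_y, 0.5, alpha=0.1)
print(f"90% interval at x = 0.5: [{lo:.3f}, {hi:.3f}]")
```

Under exchangeability, an interval built this way covers the true label with probability at least 1 - alpha. When the test input is instead chosen by a model trained on the same data, that argument no longer applies directly, which is the gap the abstract describes.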

research
06/08/2020

Balance-Subsampled Stable Prediction

In machine learning, it is commonly assumed that training and test data ...
research
02/27/2020

To be or not to be? A spatial predictive crime model for Rochester

This project uses a spatial model (Geographically Weighted Regression) t...
research
07/08/2021

Predicting Disease Progress with Imprecise Lab Test Results

In existing deep learning methods, almost all loss functions assume that...
research
01/31/2020

Stable Prediction with Model Misspecification and Agnostic Distribution Shift

For many machine learning algorithms, two main assumptions are required ...
research
11/04/2021

Conformal prediction for text infilling and part-of-speech prediction

Modern machine learning algorithms are capable of providing remarkably a...
research
11/22/2018

Learning in the Absence of Training Data -- a Galactic Application

There are multiple real-world problems in which training data is unavail...
research
07/22/2019

Model Adaptation via Model Interpolation and Boosting for Web Search Ranking

This paper explores two classes of model adaptation methods for Web sear...
