Will My Robot Achieve My Goals? Predicting the Probability that an MDP Policy Reaches a User-Specified Behavior Target

11/29/2022
by Alexander Guyer, et al.

As an autonomous system performs a task, it should maintain a calibrated estimate of the probability that it will achieve the user's goal. If that probability falls below some desired level, it should alert the user so that appropriate interventions can be made. This paper considers settings where the user's goal is specified as a target interval for a real-valued performance summary, such as the cumulative reward, measured at a fixed horizon H. At each time t ∈ {0, …, H-1}, our method produces a calibrated estimate of the probability that the final cumulative reward will fall within a user-specified target interval [y^-, y^+]. Using this estimate, the autonomous system can raise an alarm if the probability drops below a specified threshold. We compute the probability estimates by inverting conformal prediction. Our starting point is the Conformalized Quantile Regression (CQR) method of Romano et al., which applies split-conformal prediction to the results of quantile regression. CQR is not invertible, but by using the conditional cumulative distribution function (CDF) as the non-conformity measure, we show how to obtain an invertible modification that we call Probability-space Conformalized Quantile Regression (PCQR). Like CQR, PCQR produces well-calibrated conditional prediction intervals with finite-sample marginal guarantees. By inverting PCQR, we obtain marginal guarantees for the probability that the cumulative reward of an autonomous system will fall within an arbitrary user-specified target interval. Experiments on two domains confirm that these probabilities are well-calibrated.
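The idea of working in probability space can be illustrated with a small sketch. This is not the paper's PCQR procedure, whose details are not given in the abstract; it is a simplified stand-in that assumes a Gaussian model CDF in place of a learned conditional CDF, and all function names (`gaussian_cdf`, `calibration_pit_values`, `target_probability`, `should_alarm`) are hypothetical. Each calibration outcome is mapped through the model CDF, and the probability of a target interval [y^-, y^+] is estimated as the fraction of those calibration values that fall between the CDF values of the two endpoints.

```python
import math
import random


def gaussian_cdf(y, mean, std):
    """Normal CDF; an assumed stand-in for a learned conditional CDF."""
    return 0.5 * (1.0 + math.erf((y - mean) / (std * math.sqrt(2.0))))


def calibration_pit_values(predictions, outcomes):
    """Map each held-out outcome into probability space via the model CDF."""
    return [gaussian_cdf(y, m, s) for (m, s), y in zip(predictions, outcomes)]


def target_probability(pit_values, mean, std, y_lo, y_hi):
    """Estimate P(y_lo <= Y <= y_hi) from calibration values.

    The interval endpoints are mapped into probability space, and the
    estimate is the share of calibration values between them. Dividing by
    n + 1 (rather than n) is the usual conservative finite-sample choice.
    """
    a = gaussian_cdf(y_lo, mean, std)
    b = gaussian_cdf(y_hi, mean, std)
    inside = sum(1 for u in pit_values if a <= u <= b)
    return inside / (len(pit_values) + 1)


def should_alarm(probability, threshold):
    """Raise an alarm when the estimated probability drops below threshold."""
    return probability < threshold


# Synthetic check: the model claims std 1.0, but outcomes really have std 2.0,
# so the uncorrected model is overconfident about the interval [-1, 1].
rng = random.Random(0)
n = 2000
cal_predictions = [(0.0, 1.0)] * n
cal_outcomes = [rng.gauss(0.0, 2.0) for _ in range(n)]
pit = calibration_pit_values(cal_predictions, cal_outcomes)

raw = gaussian_cdf(1.0, 0.0, 1.0) - gaussian_cdf(-1.0, 0.0, 1.0)
calibrated = target_probability(pit, 0.0, 1.0, -1.0, 1.0)

print(f"raw model probability:  {raw:.3f}")
print(f"calibrated probability: {calibrated:.3f}")
print("alarm at threshold 0.5:", should_alarm(calibrated, threshold=0.5))
```

In this synthetic setup the true probability of landing in [-1, 1] is about 0.38, while the uncorrected model reports about 0.68, so the calibrated estimate should fall well below the raw one. In the setting the abstract describes, the model inputs would be the system's state at time t and the outcome would be the cumulative reward at horizon H.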

research
06/10/2022

Conformal Prediction Intervals for Markov Decision Process Trajectories

Before delegating a task to an autonomous system, a human operator may w...
research
06/01/2021

Improving Conditional Coverage via Orthogonal Quantile Regression

We develop a method to generate prediction intervals that have a user-sp...
research
09/22/2012

An efficient model-free estimation of multiclass conditional probability

Conventional multiclass conditional probability estimation methods, such...
research
10/02/2021

Calibrated Multiple-Output Quantile Regression with Representation Learning

We develop a method to generate predictive regions that cover a multivar...
research
04/13/2020

Quantile regression on inactivity time

The inactivity time, or lost lifespan specifically for mortality data, c...
research
05/26/2022

Censored Quantile Regression Neural Networks

This paper considers doing quantile regression on censored data using ne...
