Statistical Bootstrapping for Uncertainty Estimation in Off-Policy Evaluation

07/27/2020
by Ilya Kostrikov, et al.

In reinforcement learning, it is typical to use the empirically observed transitions and rewards to estimate the value of a policy via either model-based or Q-fitting approaches. Although straightforward, these techniques in general yield biased estimates of the true value of the policy. In this work, we investigate the potential for statistical bootstrapping to be used as a way to take these biased estimates and produce calibrated confidence intervals for the true value of the policy. We identify conditions (specifically, sufficient data size and sufficient coverage) under which statistical bootstrapping in this setting is guaranteed to yield correct confidence intervals. In practical situations these conditions often do not hold, and so we discuss and propose mechanisms that mitigate the effects of violating them. We evaluate our proposed method and show that it can yield accurate confidence intervals in a variety of conditions, including challenging continuous control environments and small data regimes.
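As a rough illustration of the core idea, the sketch below computes a percentile-bootstrap confidence interval around an arbitrary point estimator of policy value: resample the transition dataset with replacement, re-run the (possibly biased) estimator on each resample, and take empirical percentiles of the resulting estimates. This is a minimal sketch, not the paper's implementation; `estimate_policy_value` is a hypothetical placeholder for any model-based or fitted-Q estimator, and the toy usage at the end stands in with a reward-averaging placeholder.

```python
import numpy as np

def bootstrap_policy_value_ci(transitions, estimate_policy_value,
                              n_bootstrap=200, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a policy's value.

    transitions: list of observed (s, a, r, s') tuples.
    estimate_policy_value: any point estimator (e.g. model-based or
        fitted-Q evaluation) mapping a dataset to a scalar value.
    """
    rng = np.random.default_rng(seed)
    n = len(transitions)
    estimates = []
    for _ in range(n_bootstrap):
        # Resample the dataset with replacement, then re-run the
        # (possibly biased) estimator on the resampled data.
        idx = rng.integers(0, n, size=n)
        estimates.append(estimate_policy_value([transitions[i] for i in idx]))
    lower = np.percentile(estimates, 100 * alpha / 2)
    upper = np.percentile(estimates, 100 * (1 - alpha / 2))
    return lower, upper

# Toy usage: a stand-in estimator that just averages observed rewards.
rng = np.random.default_rng(1)
data = [(0, 0, r, 0) for r in rng.normal(1.0, 0.5, size=500)]
avg_reward = lambda batch: float(np.mean([t[2] for t in batch]))
print(bootstrap_policy_value_ci(data, avg_reward))  # roughly (0.96, 1.04)
```

Note that each bootstrap iteration re-fits the full estimator, so for expensive estimators (e.g. fitted-Q evaluation on continuous control tasks) the number of resamples trades off interval accuracy against compute.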


research · 10/22/2020
CoinDICE: Off-Policy Confidence Interval Estimation
We study high-confidence behavior-agnostic off-policy evaluation in rein...

research · 06/09/2023
Conformalizing Machine Translation Evaluation
Several uncertainty estimation methods have been recently proposed for m...

research · 12/07/2019
Tighter Confidence Intervals for Rating Systems
Rating systems are ubiquitous, with applications ranging from product re...

research · 06/19/2019
Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates
We consider the core reinforcement-learning problem of on-policy value f...

research · 08/15/2020
Accountable Off-Policy Evaluation With Kernel Bellman Statistics
We consider off-policy evaluation (OPE), which evaluates the performance...

research · 02/10/2020
Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions
Off-policy evaluation in reinforcement learning offers the chance of usi...

research · 11/19/2016
A Bayesian approach to type-specific conic fitting
A perturbative approach is used to quantify the effect of noise in data ...
