Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation

06/20/2016
by Josiah P. Hanna, et al.

For an autonomous agent, executing a poor policy may be costly or even dangerous. For such agents, it is desirable to determine confidence interval lower bounds on the performance of any given policy without executing that policy. Current methods for exact high-confidence off-policy evaluation that use importance sampling require a substantial amount of data to achieve a tight lower bound, and existing model-based methods only address the problem in discrete state spaces. Since exact bounds are intractable for many domains, we trade off strict guarantees of safety for more data-efficient approximate bounds. In this context, we propose two bootstrapping off-policy evaluation methods that use learned MDP transition models to estimate lower confidence bounds on policy performance with limited data, in both continuous and discrete state spaces. Since direct use of a model may introduce bias, we derive a theoretical upper bound on model bias for the case in which the model's transition function is estimated from i.i.d. trajectories. This bound broadens our understanding of the conditions under which model-based methods have high bias. Finally, we empirically evaluate our proposed methods and analyze the settings in which different bootstrapping off-policy confidence interval methods succeed and fail.
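The core recipe the abstract describes (resample the logged trajectories, refit a transition model on each resample, evaluate the target policy in each fitted model, and take a low quantile of the resulting estimates as the lower bound) can be sketched briefly. Below is a minimal, hypothetical Python sketch for a small tabular MDP; the function names, Laplace smoothing, fixed start state, and Monte-Carlo model evaluation are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative tabular MDP dimensions (assumptions, not from the paper).
N_STATES, N_ACTIONS, HORIZON = 5, 2, 20


def fit_model(trajectories):
    """Count-based estimate of the transition and reward tables.

    Each trajectory is a list of (s, a, r, s_next) tuples. Laplace
    smoothing keeps every transition probability nonzero.
    """
    counts = np.ones((N_STATES, N_ACTIONS, N_STATES))
    reward_sums = np.zeros((N_STATES, N_ACTIONS))
    visits = np.ones((N_STATES, N_ACTIONS))
    for traj in trajectories:
        for s, a, r, s_next in traj:
            counts[s, a, s_next] += 1
            reward_sums[s, a] += r
            visits[s, a] += 1
    P = counts / counts.sum(axis=2, keepdims=True)  # transition model
    R = reward_sums / visits                        # mean-reward model
    return P, R


def evaluate_in_model(P, R, policy, n_rollouts=200):
    """Monte-Carlo estimate of the policy's expected return in the learned
    model, assuming a fixed initial state 0. `policy` is a (state, action)
    array of action probabilities."""
    returns = []
    for _ in range(n_rollouts):
        s, ret = 0, 0.0
        for _ in range(HORIZON):
            a = rng.choice(N_ACTIONS, p=policy[s])
            ret += R[s, a]
            s = rng.choice(N_STATES, p=P[s, a])
        returns.append(ret)
    return float(np.mean(returns))


def bootstrap_lower_bound(trajectories, policy, n_boot=200, delta=0.05):
    """Percentile-bootstrap lower confidence bound on policy performance:
    resample whole trajectories with replacement, refit the model on each
    resample, evaluate the target policy in each fitted model, and return
    the empirical delta-quantile of those estimates."""
    n = len(trajectories)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        P, R = fit_model([trajectories[i] for i in idx])
        estimates.append(evaluate_in_model(P, R, policy))
    return float(np.quantile(estimates, delta))
```

Resampling at the trajectory level, rather than at the level of individual transitions, matches the i.i.d.-trajectory setting under which the paper's model-bias bound is stated.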

Related research

03/02/2023
Hallucinated Adversarial Control for Conservative Offline Policy Evaluation
We study the problem of conservative off-policy evaluation (COPE) where ...

10/22/2020
CoinDICE: Off-Policy Confidence Interval Estimation
We study high-confidence behavior-agnostic off-policy evaluation in rein...

10/29/2020
Off-Policy Interval Estimation with Lipschitz Value Iteration
Off-policy evaluation provides an essential tool for evaluating the effe...

07/03/2017
Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning
In the field of reinforcement learning there has been recent progress to...

08/15/2020
Accountable Off-Policy Evaluation With Kernel Bellman Statistics
We consider off-policy evaluation (OPE), which evaluates the performance...

06/19/2019
Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates
We consider the core reinforcement-learning problem of on-policy value f...

02/06/2020
Minimax Confidence Interval for Off-Policy Evaluation and Policy Optimization
We study minimax methods for off-policy evaluation (OPE) using value-fun...
