Deeply-Debiased Off-Policy Interval Estimation

05/10/2021
by   Chengchun Shi, et al.

Off-policy evaluation (OPE) estimates a target policy's value from a historical dataset generated by a different behavior policy. Beyond a point estimate, many applications benefit significantly from a confidence interval (CI) that quantifies the uncertainty of that estimate. In this paper, we propose a novel procedure for constructing an efficient, robust, and flexible CI on a target policy's value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.
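
For intuition on what an off-policy CI looks like in the simplest setting, the sketch below computes an importance-sampling point estimate with a normal-approximation interval in a one-step (bandit) problem. This is only an illustrative baseline, not the paper's deeply-debiased procedure; the function name ope_confidence_interval and the toy data are hypothetical, and the actual implementation is in the linked repository.

    import numpy as np
    from scipy import stats

    def ope_confidence_interval(rewards, behavior_probs, target_probs, alpha=0.05):
        # Importance ratios pi_target(a|s) / pi_behavior(a|s) for the logged actions.
        weights = target_probs / behavior_probs
        # Per-sample contributions to the target policy's value.
        contributions = weights * rewards
        point = contributions.mean()
        se = contributions.std(ddof=1) / np.sqrt(len(contributions))
        z = stats.norm.ppf(1 - alpha / 2)
        return point, (point - z * se, point + z * se)

    # Toy logged data: binary rewards plus the propensities each policy assigns
    # to the logged actions (all values below are made up for illustration).
    rng = np.random.default_rng(0)
    n = 1000
    behavior_probs = rng.uniform(0.2, 0.8, size=n)
    target_probs = rng.uniform(0.2, 0.8, size=n)
    rewards = rng.binomial(1, 0.5, size=n).astype(float)

    value, (lower, upper) = ope_confidence_interval(rewards, behavior_probs, target_probs)
    print(f"point estimate: {value:.3f}, 95% CI: ({lower:.3f}, {upper:.3f})")

Unlike this naive importance-sampling interval, the paper's procedure is designed to remain valid and efficient when nuisance functions are estimated flexibly (e.g., with machine learning methods).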


Related research

02/22/2022  Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process
11/08/2020  Reliable Off-policy Evaluation for Reinforcement Learning
03/03/2022  Reinforcement Learning in Possibly Nonstationary Environments
06/14/2022  Conformal Off-Policy Prediction
12/12/2020  Offline Policy Selection under Uncertainty
10/29/2020  Off-Policy Interval Estimation with Lipschitz Value Iteration
02/06/2021  Bootstrapping Statistical Inference for Off-Policy Evaluation
