Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation

10/16/2019
by Ziyang Tang, et al.

Infinite-horizon off-policy policy evaluation is a highly challenging task due to the excessively large variance of typical importance sampling (IS) estimators. Recently, Liu et al. (2018a) proposed an approach that significantly reduces the variance of infinite-horizon off-policy evaluation by estimating the stationary density ratio, but at the cost of introducing potentially high bias due to error in the density ratio estimation. In this paper, we develop a bias-reduced augmentation of their method, which can take advantage of a learned value function to obtain higher accuracy. Our method is doubly robust in that the bias vanishes when either the density ratio or the value function estimate is perfect. More generally, the bias is reduced whenever either of them is accurate. Both theoretical and empirical results show that our method yields significant advantages over previous methods.
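The doubly robust construction described in the abstract combines an estimated stationary density ratio with a learned value function so that errors in either estimate can cancel. A minimal sketch of such an estimator is below, assuming precomputed density ratios `w`, value estimates `v` and `v_next` for observed transitions, and an estimate `v0_mean` of the expected value at initial states; the function name and argument layout are illustrative, not the paper's code.

```python
import numpy as np

def doubly_robust_ope(w, v, v_next, rewards, v0_mean, gamma=0.99):
    """Sketch of a doubly robust off-policy value estimate.

    w       : estimated stationary density ratios d_pi(s,a) / d_mu(s,a)
    v       : learned value estimates V(s_t) under the target policy
    v_next  : learned value estimates V(s_{t+1})
    rewards : observed rewards r_t from the behavior policy's data
    v0_mean : estimate of E[V(s_0)] over the initial-state distribution
    gamma   : discount factor
    """
    # Temporal-difference residual: vanishes when V is the true value function.
    correction = rewards + gamma * v_next - v
    # Baseline term from V plus a density-ratio-weighted correction:
    # if w is exact, the correction removes any bias left by V; if V is
    # exact, the correction averages to zero regardless of w.
    return (1 - gamma) * v0_mean + np.mean(w * correction)
```

One way to see the double robustness: if the value estimates happen to be exact, the TD residual is identically zero, so even a badly misspecified density ratio `w` contributes no bias and the estimate reduces to `(1 - gamma) * v0_mean`.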


Related research

10/29/2018
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
We consider the off-policy estimation problem of estimating the expected...

10/15/2019
Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
We establish a connection between the importance sampling estimators typ...

10/10/2019
Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies
We consider off-policy policy evaluation when the trajectory data are ge...

10/20/2019
From Importance Sampling to Doubly Robust Policy Gradient
We show that policy gradient (PG) and its variance reduction variants ca...

03/24/2020
Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning
Off-policy estimation for long-horizon problems is important in many rea...

02/06/2020
Minimax Confidence Interval for Off-Policy Evaluation and Policy Optimization
We study minimax methods for off-policy evaluation (OPE) using value-fun...

12/26/2013
Shape-constrained Estimation of Value Functions
We present a fully nonparametric method to estimate the value function, ...
