Minimax Weight and Q-Function Learning for Off-Policy Evaluation

10/28/2019
by   Masatoshi Uehara, et al.

We provide theoretical investigations into off-policy evaluation in reinforcement learning using function approximators for (marginalized) importance weights and value functions. Our contributions include: (1) A new estimator, MWL, that directly estimates importance ratios over the state-action distributions, removing the reliance on knowledge of the behavior policy as in prior work (Liu et al., 2018). (2) Another new estimator, MQL, obtained by swapping the roles of importance weights and value functions in MWL. MQL has an intuitive interpretation of minimizing average Bellman errors and can be combined with MWL in a doubly robust manner. (3) Several additional results that offer further insights into these methods, including the sample complexity analyses of MWL and MQL, their asymptotic optimality in the tabular setting, how the learned importance weights depend on the choice of the discriminator class, and how our methods provide a unified view of some old and new algorithms in RL.
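The roles of the two learned objects can be illustrated with a minimal population-level sketch in a tiny tabular MDP. Everything here is an assumption made for illustration: the MDP numbers, the names `mwl_loss` and `mql_loss`, and the specific loss forms, which are one common way to write a weight-versus-discriminator objective and an average-Bellman-error objective and are not quoted from the abstract. The sketch computes the exact marginalized importance weight and Q-function by fixed-point iteration (not by the minimax estimators themselves) and checks that both recover the same normalized policy value and that each loss vanishes for an arbitrary discriminator.

```python
# Hypothetical 2-state, 2-action MDP; all numbers are made up for illustration.
GAMMA = 0.9
N_S, N_A = 2, 2
P = [[[0.8, 0.2], [0.1, 0.9]],      # P[s][a][s2]: transition probabilities
     [[0.5, 0.5], [0.3, 0.7]]]
R = [[1.0, 0.0], [0.0, 2.0]]        # R[s][a]: rewards
PI = [[0.9, 0.1], [0.2, 0.8]]       # target policy pi(a|s)
D0 = [0.6, 0.4]                     # initial state distribution
D_B = [[0.25, 0.25], [0.25, 0.25]]  # data distribution over (s, a), full support
PAIRS = [(s, a) for s in range(N_S) for a in range(N_A)]


def expect_next(f, s, a):
    """E[f(s', a')] with s' ~ P(.|s, a) and a' ~ pi(.|s')."""
    return sum(P[s][a][s2] * PI[s2][a2] * f[s2][a2] for s2, a2 in PAIRS)


def q_pi(iters=2000):
    """Q-function of the target policy via repeated Bellman backups."""
    q = [[0.0] * N_A for _ in range(N_S)]
    for _ in range(iters):
        q = [[R[s][a] + GAMMA * expect_next(q, s, a) for a in range(N_A)]
             for s in range(N_S)]
    return q


def occupancy(iters=2000):
    """Normalized discounted state-action occupancy of the target policy."""
    d = [[0.0] * N_A for _ in range(N_S)]
    for _ in range(iters):
        new = [[0.0] * N_A for _ in range(N_S)]
        for s2 in range(N_S):
            inflow = sum(d[s][a] * P[s][a][s2] for s, a in PAIRS)
            for a2 in range(N_A):
                new[s2][a2] = ((1 - GAMMA) * D0[s2] + GAMMA * inflow) * PI[s2][a2]
        d = new
    return d


def init_mean(f):
    """E[f(s0, a0)] with s0 ~ D0 and a0 ~ pi(.|s0)."""
    return sum(D0[s] * PI[s][a] * f[s][a] for s, a in PAIRS)


def mwl_loss(w, f):
    """Assumed weight loss: E_b[w (gamma f(s', pi) - f(s, a))] + (1 - gamma) E_0[f]."""
    return sum(D_B[s][a] * w[s][a] * (GAMMA * expect_next(f, s, a) - f[s][a])
               for s, a in PAIRS) + (1 - GAMMA) * init_mean(f)


def mql_loss(q, g):
    """Assumed Q loss: discriminator-weighted average Bellman error under the data."""
    return sum(D_B[s][a] * g[s][a] * (R[s][a] + GAMMA * expect_next(q, s, a) - q[s][a])
               for s, a in PAIRS)


q = q_pi()
d_pi = occupancy()
# Marginalized importance weight: ratio of target occupancy to data distribution.
w = [[d_pi[s][a] / D_B[s][a] for a in range(N_A)] for s in range(N_S)]

value_from_w = sum(D_B[s][a] * w[s][a] * R[s][a] for s, a in PAIRS)
value_from_q = (1 - GAMMA) * init_mean(q)
```

With the true weight and the true Q-function, `value_from_w` and `value_from_q` agree up to numerical error, and `mwl_loss(w, f)` and `mql_loss(q, g)` are zero for any table `f` or `g`. Note that the weight is defined from `D_B` directly, so no behavior policy appears anywhere in the computation.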


research
11/11/2015

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

We study the problem of off-policy value evaluation in reinforcement lea...
research
02/06/2020

Minimax Confidence Interval for Off-Policy Evaluation and Policy Optimization

We study minimax methods for off-policy evaluation (OPE) using value-fun...
research
06/27/2023

Value-aware Importance Weighting for Off-policy Reinforcement Learning

Importance sampling is a central idea underlying off-policy prediction i...
research
05/07/2019

Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning

In importance sampling (IS)-based reinforcement learning algorithms such...
research
02/05/2021

Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency

We offer a theoretical characterization of off-policy evaluation (OPE) i...
research
06/16/2023

Bootstrapped Representations in Reinforcement Learning

In reinforcement learning (RL), state representations are key to dealing...
research
07/26/2022

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

We study off-policy evaluation (OPE) for partially observable MDPs (POMD...
