Bellman Residual Orthogonalization for Offline Reinforcement Learning

03/24/2022
by Andrea Zanette et al.

We introduce a new reinforcement learning principle that approximates the Bellman equations by enforcing their validity only along a user-defined space of test functions. Focusing on applications to model-free offline RL with function approximation, we exploit this principle to derive confidence intervals for off-policy evaluation, as well as to optimize over policies within a prescribed policy class. We prove an oracle inequality for our policy optimization procedure in terms of a trade-off between the value and uncertainty of an arbitrary comparator policy. Different choices of test function spaces allow us to tackle different problems within a common framework. We characterize the loss of efficiency in moving from on-policy to off-policy data using our procedures, and establish connections to concentrability coefficients studied in past work. We examine in depth the implementation of our methods with linear function approximation, and provide theoretical guarantees with polynomial-time implementations even when Bellman closure does not hold.
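
To make the principle concrete, here is a hedged sketch in notation of my own choosing (the paper's exact formulation may differ). Given a value-function class Q, a target policy pi, discount gamma, and a user-defined test-function class F, Bellman residual orthogonalization asks that the Bellman residual of a candidate q in Q be orthogonal to every test function under the data distribution mu:

\mathbb{E}_{(s,a,r,s') \sim \mu}\left[ f(s,a)\,\big( q(s,a) - r - \gamma\, q(s', \pi(s')) \big) \right] = 0 \qquad \text{for all } f \in \mathcal{F}.

Taking F rich enough recovers the exact Bellman equations on the support of mu; a smaller test class enforces the equations only in the directions F spans.

For the linear setting the abstract highlights, the following is a minimal sketch of the estimator this condition implies, assuming a linear value function q(s,a) = phi(s,a)' theta and a finite test class given by features psi. The function and variable names are mine, not the paper's, and this is an illustration of the orthogonality principle rather than the paper's exact algorithm.

import numpy as np

def bro_linear_estimate(phi, phi_next, rewards, psi, gamma=0.99, ridge=1e-6):
    # phi:      (n, d) array, features phi(s_i, a_i) of the logged state-actions
    # phi_next: (n, d) array, features phi(s'_i, pi(s'_i)) under the target policy
    # rewards:  (n,)   array, observed rewards r_i
    # psi:      (n, m) array, test-function features psi_j(s_i, a_i)
    #
    # Empirical orthogonality conditions:
    #   (1/n) * psi.T @ ((phi - gamma * phi_next) @ theta - rewards) = 0,
    # i.e. the linear system A @ theta = b with:
    A = psi.T @ (phi - gamma * phi_next)   # (m, d) moment matrix
    b = psi.T @ rewards                    # (m,)  moment vector
    # Solve in ridge-regularized least squares, since m may exceed d.
    theta = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ b)
    return theta                           # estimated q(s, a) = phi(s, a) @ theta

With psi = phi this reduces to the familiar LSTD estimator; a richer test class adds moment conditions that constrain the Bellman residual in more directions.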


