Offline Reinforcement Learning Under Value and Density-Ratio Realizability: the Power of Gaps

03/25/2022
by Jinglin Chen, et al.

We consider a challenging theoretical problem in offline reinforcement learning (RL): obtaining sample-efficiency guarantees from a dataset that lacks sufficient coverage, under only realizability-type assumptions on the function approximators. Existing theory has addressed learning under realizability and learning from non-exploratory data separately, but no work has been able to address both simultaneously (except for a concurrent work, with which we compare in detail). Under an additional gap assumption, we provide guarantees for a simple pessimistic algorithm based on a version space formed by marginalized importance sampling. The guarantee requires only that the data cover the optimal policy and that the function classes realize the optimal value and density-ratio functions. While similar gap assumptions have been used in other areas of RL theory, our work is the first to identify the utility and the novel mechanism of gap assumptions in offline RL with weak function approximation.
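
For readers unfamiliar with the terminology, the conditions named in the abstract can be written out roughly as below. This is only a sketch in standard discounted-MDP notation, which is an assumption here: none of the symbols ($Q^\star$, $V^\star$, $\pi^\star$, $d^{\pi^\star}$, $d^{D}$, $\mathcal{Q}$, $\mathcal{W}$, $C$) appear in the abstract, and the paper's own definitions may differ in detail.

```latex
% Assumed notation, not taken from the abstract:
%   Q^\star, V^\star : optimal action-value and state-value functions
%   \pi^\star        : optimal policy, greedy with respect to Q^\star
%   d^{\pi^\star}    : discounted state-action occupancy of \pi^\star
%   d^{D}            : state-action distribution that generated the dataset
%   \mathcal{Q}, \mathcal{W} : function classes given to the learner
\begin{align*}
\text{Value realizability:} \quad
  & Q^\star \in \mathcal{Q} \\
\text{Density-ratio realizability:} \quad
  & w^\star \in \mathcal{W},
    \qquad w^\star(s,a) = \frac{d^{\pi^\star}(s,a)}{d^{D}(s,a)} \\
\text{Coverage of the optimal policy:} \quad
  & \lVert w^\star \rVert_\infty \le C < \infty \\
\text{Gap:} \quad
  & V^\star(s) - Q^\star(s,a) \ge \mathrm{gap}(Q^\star) > 0
    \quad \text{for all } s \text{ and all } a \ne \pi^\star(s)
\end{align*}
```

Read this way, coverage is a statement about the optimal policy alone (its occupancy is dominated by the data distribution), not about every policy, which is the sense in which the dataset may be non-exploratory.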

research
02/09/2022

Offline Reinforcement Learning with Realizability and Single-policy Concentrability

Sample-efficiency guarantees for offline reinforcement learning (RL) oft...
research
02/05/2023

Refined Value-Based Offline RL under Realizability and Partial Coverage

In offline reinforcement learning (RL) we have no opportunity to explore...
research
05/30/2019

On Value Functions and the Agent-Environment Boundary

When function approximation is deployed in reinforcement learning (RL), ...
research
11/01/2022

Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian

Offline reinforcement learning (RL), which refers to decision-making fro...
research
05/22/2023

Offline Reinforcement Learning with Additional Covering Distributions

We study learning optimal policies from a logged dataset, i.e., offline ...
research
05/05/2022

Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning

Dynamic mechanism design has garnered significant attention from both co...
research
05/01/2019

Information-Theoretic Considerations in Batch Reinforcement Learning

Value-function approximation methods that operate in batch mode have fou...
