Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data

06/24/2023
by Sunil Madhow, et al.

Developing theoretical guarantees on the sample complexity of offline RL methods is an important step towards making data-hungry RL algorithms practically viable. Currently, most results hinge on unrealistic assumptions about the data distribution – namely that it comprises a set of i.i.d. trajectories collected by a single logging policy. We consider a more general setting where the dataset may have been gathered adaptively. We develop theory for the TMIS Offline Policy Evaluation (OPE) estimator in this generalized setting for tabular MDPs, deriving high-probability, instance-dependent bounds on its estimation error. We also recover minimax-optimal offline learning in the adaptive setting. Finally, we conduct simulations to empirically analyze the behavior of these estimators under adaptive and non-adaptive regimes.
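To make the estimator concrete, here is a minimal sketch of a plug-in value estimate for a finite-horizon tabular MDP. It assumes TMIS takes the usual tabular form: estimate per-step transitions and mean rewards from visit counts, then roll the target policy's state distribution forward through the estimated model. The function name `tmis_estimate`, the trajectory layout, and the convention that unvisited state-action pairs contribute zero are illustrative assumptions, not details taken from the paper.

```python
def tmis_estimate(trajectories, target_policy, horizon, n_states, n_actions):
    """Plug-in value estimate of a target policy from logged episodes.

    trajectories: list of episodes, each a list of `horizon` tuples
        (state, action, reward, next_state).
    target_policy[h][s][a]: probability that the target policy picks
        action a in state s at step h.
    """
    # Per-step visit counts and reward sums (hypothetical layout).
    n_sa = [dict() for _ in range(horizon)]
    n_sas = [dict() for _ in range(horizon)]
    r_sum = [dict() for _ in range(horizon)]
    init_counts = [0] * n_states

    for episode in trajectories:
        init_counts[episode[0][0]] += 1
        for h, (s, a, r, s_next) in enumerate(episode):
            n_sa[h][(s, a)] = n_sa[h].get((s, a), 0) + 1
            n_sas[h][(s, a, s_next)] = n_sas[h].get((s, a, s_next), 0) + 1
            r_sum[h][(s, a)] = r_sum[h].get((s, a), 0.0) + r

    # Empirical initial state distribution.
    n_episodes = len(trajectories)
    d = [c / n_episodes for c in init_counts]

    value = 0.0
    for h in range(horizon):
        d_next = [0.0] * n_states
        for s in range(n_states):
            for a in range(n_actions):
                weight = d[s] * target_policy[h][s][a]
                count = n_sa[h].get((s, a), 0)
                if weight == 0.0 or count == 0:
                    # Assumption: unvisited pairs contribute nothing.
                    continue
                # Expected reward under the empirical mean reward.
                value += weight * r_sum[h][(s, a)] / count
                # Push probability mass through empirical transitions.
                for s_next in range(n_states):
                    p_hat = n_sas[h].get((s, a, s_next), 0) / count
                    d_next[s_next] += weight * p_hat
        d = d_next
    return value
```

Note that this sketch uses only counts and never the logging policy's action probabilities, so it can be computed the same way whether the episodes came from a single fixed logging policy or from a policy that changed between episodes. The error bounds for the adaptive case are the paper's contribution and are not reproduced here.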

research
06/09/2021

Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning

Recent theoretical work studies sample-efficient reinforcement learning ...
research
03/07/2023

On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples

Offline reinforcement learning (offline RL) considers problems where lea...
research
11/23/2022

On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation

Sample-efficient offline reinforcement learning (RL) with linear functio...
research
11/15/2022

Offline Reinforcement Learning with Adaptive Behavior Regularization

Offline reinforcement learning (RL) defines a sample-efficient learning ...
research
03/08/2022

A Sharp Characterization of Linear Estimators for Offline Policy Evaluation

Offline policy evaluation is a fundamental statistical problem in reinfo...
research
01/21/2022

Instance-Dependent Confidence and Early Stopping for Reinforcement Learning

Various algorithms for reinforcement learning (RL) exhibit dramatic vari...
research
07/05/2022

Offline RL Policies Should be Trained to be Adaptive

Offline RL algorithms must account for the fact that the dataset they ar...
