Robust On-Policy Data Collection for Data-Efficient Policy Evaluation

11/29/2021
by Rujie Zhong, et al.

This paper considers how to complement offline reinforcement learning (RL) data with additional data collection for the task of policy evaluation. In policy evaluation, the task is to estimate the expected return of an evaluation policy in an environment of interest. Prior work on offline policy evaluation typically considers only a static dataset. We consider a setting where we can collect a small amount of additional data to combine with a potentially larger offline RL dataset. We show that simply running the evaluation policy (on-policy data collection) is sub-optimal in this setting. We then introduce two new data collection strategies for policy evaluation, both of which take previously collected data into account when collecting future data, so as to reduce distribution shift (or sampling error) in the entire collected dataset. Our empirical results show that, compared to on-policy sampling, our strategies produce data with lower sampling error and generally lead to lower mean-squared error in policy evaluation for any total dataset size. We also show that these strategies can start from initial off-policy data, collect additional data, and then use both the initial and new data to produce low mean-squared error policy evaluation without using off-policy corrections.
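The idea of letting past data steer future data collection can be illustrated with a toy example. The sketch below is not either of the paper's two strategies; it is a hypothetical one-step (bandit) illustration in which a collector always takes the action whose empirical frequency lags furthest behind its probability under the evaluation policy. All names (`collect_deficit_greedy`, `total_variation`, `monte_carlo_estimate`) and the deterministic rewards are assumptions made for the illustration.

```python
import random
from collections import Counter


def total_variation(counts, policy):
    """Sampling error: total variation distance between the empirical
    action frequencies in `counts` and the evaluation policy."""
    n = sum(counts.values())
    return 0.5 * sum(abs(counts[a] / n - p) for a, p in policy.items())


def collect_on_policy(policy, n, rng):
    """Baseline: draw n actions i.i.d. from the evaluation policy."""
    actions = list(policy)
    weights = [policy[a] for a in actions]
    return Counter(rng.choices(actions, weights=weights, k=n))


def collect_deficit_greedy(policy, n, initial_counts=None):
    """Hypothetical history-aware collector (an assumption, not the paper's
    algorithm): at every step, take the action that is most under-sampled
    relative to the evaluation policy, given all data collected so far."""
    counts = Counter(initial_counts or {})
    total = sum(counts.values())
    for _ in range(n):
        # Deficit = expected count after this step minus observed count.
        action = max(policy, key=lambda a: policy[a] * (total + 1) - counts[a])
        counts[action] += 1
        total += 1
    return counts


def monte_carlo_estimate(counts, rewards):
    """Plain sample-average return with no off-policy correction.
    Rewards are assumed deterministic per action for simplicity."""
    n = sum(counts.values())
    return sum(counts[a] * rewards[a] for a in counts) / n


if __name__ == "__main__":
    policy = {"left": 0.5, "right": 0.5}
    rewards = {"left": 1.0, "right": 0.0}
    rng = random.Random(0)

    on_policy = collect_on_policy(policy, 40, rng)
    # Start from 20 off-policy samples that only ever chose "left".
    combined = collect_deficit_greedy(policy, 20, initial_counts={"left": 20})

    print("on-policy sampling error:", total_variation(on_policy, policy))
    print("history-aware sampling error:", total_variation(combined, policy))
    print("history-aware estimate:", monte_carlo_estimate(combined, rewards))
```

In this toy setting the history-aware collector compensates for the skewed initial data, so the plain sample average over the combined dataset can be used directly. The paper's setting involves sequential decision problems, where matching action frequencies is considerably harder than in this one-step sketch.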


