Offline RL Without Off-Policy Evaluation

06/16/2021
by David Brandfonbrener, et al.

Most prior approaches to offline reinforcement learning (RL) have taken an iterative actor-critic approach involving off-policy evaluation. In this paper we show that simply doing one step of constrained/regularized policy improvement using an on-policy Q estimate of the behavior policy performs surprisingly well. This one-step algorithm beats the previously reported results of iterative algorithms on a large portion of the D4RL benchmark. The simple one-step baseline achieves this strong performance without many of the tricks used by previously proposed iterative algorithms and is more robust to hyperparameters. We argue that the relatively poor performance of iterative approaches results from the high variance inherent in off-policy evaluation, magnified by repeatedly optimizing policies against those high-variance estimates. In addition, we hypothesize that the strong performance of the one-step algorithm is due to a combination of favorable structure in the environment and the behavior policy.
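The one-step recipe described in the abstract can be sketched on a toy tabular problem: first evaluate the behavior policy's Q-function with on-policy (SARSA-style) updates over a fixed dataset, then take a single regularized improvement step against that estimate. This is a minimal illustrative sketch, not the paper's implementation; the environment, the uniform behavior policy `beta`, and the temperature `tau` of the reverse-KL-style softmax update are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9

# Assumed behavior policy beta: uniform over actions.
beta = np.full((n_states, n_actions), 1.0 / n_actions)

# Toy deterministic ring MDP: reward 1 only for action 0 in state 0.
def step(s, a):
    r = 1.0 if (s == 0 and a == 0) else 0.0
    return r, (s + a + 1) % n_states

# Collect a fixed offline dataset of (s, a, r, s', a') tuples under beta.
dataset = []
s = 0
for _ in range(5000):
    a = rng.choice(n_actions, p=beta[s])
    r, s2 = step(s, a)
    a2 = rng.choice(n_actions, p=beta[s2])
    dataset.append((s, a, r, s2, a2))
    s = s2

# Step 1: on-policy evaluation of beta via SARSA updates on the dataset
# (no off-policy max over actions -- the target uses the logged a').
Q = np.zeros((n_states, n_actions))
alpha = 0.1
for _ in range(20):  # a few sweeps over the fixed dataset
    for (s, a, r, s2, a2) in dataset:
        Q[s, a] += alpha * (r + gamma * Q[s2, a2] - Q[s, a])

# Step 2: a single regularized improvement step -- softmax of Q relative
# to beta; tau (an assumed hyperparameter) controls how far pi may move.
tau = 0.1
logits = np.log(beta) + Q / tau
pi = np.exp(logits - logits.max(axis=1, keepdims=True))
pi /= pi.sum(axis=1, keepdims=True)

# The improved policy shifts mass toward the rewarding action in state 0,
# while staying a proper distribution close to beta elsewhere.
print(pi[0, 0] > beta[0, 0])
```

Because the Q estimate is on-policy for beta, it avoids the compounding evaluation error that the abstract attributes to iterative actor-critic methods; the single constrained step then extracts an improved policy without ever querying Q on out-of-distribution actions.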


