In-Sample Policy Iteration for Offline Reinforcement Learning

06/09/2023
by Xiaohan Hu, et al.

Offline reinforcement learning (RL) seeks to derive an effective control policy from previously collected data. To circumvent errors due to inadequate data coverage, behavior-regularized methods optimize the control policy while concurrently minimizing deviation from the data collection policy. Nevertheless, these methods often perform poorly in practice, particularly when the offline dataset is collected by sub-optimal policies. In this paper, we propose a novel algorithm employing in-sample policy iteration that substantially enhances behavior-regularized methods in offline RL. The core insight is that by continuously refining the policy used for behavior regularization, in-sample policy iteration gradually improves the learned policy while implicitly avoiding queries to out-of-sample actions, which averts catastrophic learning failures. Our theoretical analysis verifies its ability to learn the in-sample optimal policy, exclusively utilizing actions well-covered by the dataset. Moreover, we propose competitive policy improvement, a technique that maintains two competing policies, each trained by iteratively improving over whichever of the two currently performs best. We show that this simple yet potent technique significantly enhances learning efficiency when function approximation is applied. Lastly, experimental results on the D4RL benchmark indicate that our algorithm outperforms previous state-of-the-art methods in most tasks.
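The core insight above can be illustrated with a toy tabular sketch. This is not the authors' implementation: it assumes a small MDP with a known model, a KL regularizer with the closed-form update pi(a|s) ∝ pi_ref(a|s) · exp(Q(s,a)/tau), and hypothetical names (`policy_evaluation`, `regularized_improvement`, `in_sample_policy_iteration`). The point it shows is that when the reference policy is replaced by the latest policy at each iteration, actions with zero probability under the behavior policy keep zero probability, while the policy drifts toward the best in-sample action.

```python
import numpy as np

def policy_evaluation(P, R, pi, gamma=0.9, iters=200):
    """Approximate Q^pi for a tabular MDP by fixed-point iteration.
    P: (S, A, S) transition probabilities, R: (S, A) rewards, pi: (S, A)."""
    Q = np.zeros_like(R, dtype=float)
    for _ in range(iters):
        V = (pi * Q).sum(axis=1)   # bootstraps only on actions pi can take
        Q = R + gamma * (P @ V)    # (S, A, S) @ (S,) -> (S, A)
    return Q

def regularized_improvement(Q, pi_ref, tau=1.0):
    """Assumed KL-regularized step: pi(a|s) ∝ pi_ref(a|s) * exp(Q(s,a)/tau)."""
    # Mask actions outside the support of pi_ref so they never gain mass.
    logits = np.where(pi_ref > 0, Q / tau, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    w = pi_ref * np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def in_sample_policy_iteration(P, R, pi_b, gamma=0.9, tau=1.0, n_iters=30):
    """Start from the behavior policy and reuse each new policy as the
    reference for the next regularized improvement step."""
    pi = pi_b.copy()
    for _ in range(n_iters):
        Q = policy_evaluation(P, R, pi, gamma)
        pi = regularized_improvement(Q, pi_ref=pi, tau=tau)
    return pi

# Toy problem (hypothetical): 2 states, 3 actions, uniform transitions.
P = np.full((2, 3, 2), 0.5)
R = np.array([[0.0, 1.0, 5.0],
              [1.0, 0.0, 5.0]])
# The behavior policy never takes action 2, although it has the highest reward.
pi_b = np.array([[0.5, 0.5, 0.0],
                 [0.5, 0.5, 0.0]])

pi = in_sample_policy_iteration(P, R, pi_b)
print(np.round(pi, 3))
```

In this toy setting the out-of-sample action 2 is never selected, and the policy concentrates on the better of the two actions present in the data for each state. With a fixed reference policy (always `pi_b`) the result would instead stay a soft reweighting of the behavior policy, which is one way to read the abstract's remark about sub-optimal data.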

research
08/12/2022

Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

Offline reinforcement learning (RL), which aims to learn an optimal poli...
research
12/19/2022

Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality

This paper studies offline policy learning, which aims at utilizing obse...
research
07/03/2021

Where is the Grass Greener? Revisiting Generalized Policy Iteration for Offline Reinforcement Learning

The performance of state-of-the-art baselines in the offline RL regime v...
research
02/28/2023

The In-Sample Softmax for Offline Reinforcement Learning

Reinforcement learning (RL) agents can leverage batches of previously co...
research
07/25/2023

Offline Reinforcement Learning with On-Policy Q-Function Regularization

The core challenge of offline reinforcement learning (RL) is dealing wit...
research
09/29/2022

Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling

In offline reinforcement learning, weighted regression is a common metho...
research
03/17/2021

Regularized Behavior Value Estimation

Offline reinforcement learning restricts the learning process to rely on...
