A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning

07/24/2023
by Benjamin Eysenbach, et al.

As with any machine learning problem with limited data, effective offline RL algorithms require careful regularization to avoid overfitting. One-step methods perform regularization by doing just a single step of policy improvement, while critic regularization methods do many steps of policy improvement with a regularized objective. These methods appear distinct. One-step methods, such as advantage-weighted regression and conditional behavioral cloning, truncate policy iteration after just one step. This “early stopping” makes one-step RL simple and stable, but can limit its asymptotic performance. Critic regularization typically requires more compute but has appealing lower-bound guarantees. In this paper, we draw a close connection between these methods: applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL. While practical implementations violate our assumptions and critic regularization is typically applied with smaller regularization coefficients, our experiments nevertheless show that our analysis makes accurate, testable predictions about practical offline RL methods (CQL and one-step RL) with commonly-used hyperparameters. Our results do not imply that every problem can be solved with a single step of policy improvement, but rather suggest that one-step RL might be competitive with critic regularization on RL problems that demand strong regularization.
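
The structural difference between the two families is easiest to see in a small tabular setting. The sketch below is an illustration under simplifying assumptions, not the paper's construction: `one_step_rl` evaluates the behavior policy's Q-function and then takes a single advantage-weighted improvement step, while `critic_regularized_rl` alternates many improvement steps with a CQL-style penalty that pushes Q-values down on the learned policy's actions and up on the dataset's actions. The toy MDP, function names, and update rules are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: a toy tabular MDP contrasting one-step RL with a
# CQL-style critic-regularized method. The update rules are simplified
# analogues of the two families, not the paper's exact formulation.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9

beta = rng.dirichlet(np.ones(n_actions), size=n_states)            # behavior policy beta(a|s)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # dynamics P(s'|s,a)
R = rng.normal(size=(n_states, n_actions))                         # rewards r(s,a)


def one_step_rl(n_iters=500):
    """Evaluate Q^beta (no improvement during evaluation), then take a single
    advantage-weighted improvement step, as in one-step / AWR-style methods."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        V_beta = (beta * Q).sum(axis=1)          # V^beta(s) = E_{a~beta}[Q(s,a)]
        Q = R + gamma * P @ V_beta               # SARSA-style policy evaluation
    adv = Q - (beta * Q).sum(axis=1, keepdims=True)
    pi = beta * np.exp(adv)                      # one exponentiated-advantage step
    return pi / pi.sum(axis=1, keepdims=True)


def critic_regularized_rl(reg_coef=1.0, n_iters=500):
    """Alternate many policy-improvement steps with a critic penalized in the
    style of CQL: Q is pushed down on the learned policy's actions and up on
    the behavior policy's actions (here via the penalty's tabular gradient)."""
    Q = np.zeros((n_states, n_actions))
    pi = np.full((n_states, n_actions), 1.0 / n_actions)
    for _ in range(n_iters):
        V = (pi * Q).sum(axis=1)                 # V^pi(s) = E_{a~pi}[Q(s,a)]
        target = R + gamma * P @ V               # Bellman backup under pi
        Q = target - reg_coef * (pi - beta)      # subtract grad of coef*(E_pi[Q] - E_beta[Q])
        logits = Q - Q.max(axis=1, keepdims=True)
        pi = beta * np.exp(logits)               # soft, behavior-weighted improvement
        pi /= pi.sum(axis=1, keepdims=True)
    return pi


print("one-step RL policy:\n", np.round(one_step_rl(), 3))
print("critic-regularized policy (coef=1):\n", np.round(critic_regularized_rl(1.0), 3))
```

This sketch is only meant to make the contrast concrete: one improvement step after pure behavior-policy evaluation versus many improvement steps against a penalized critic. It does not reproduce the assumptions under which the paper proves the two procedures coincide at a regularization coefficient of 1.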

