Offline Reinforcement Learning with Fisher Divergence Critic Regularization

03/14/2021
by   Ilya Kostrikov, et al.

Many modern approaches to offline Reinforcement Learning (RL) utilize behavior regularization, typically augmenting a model-free actor-critic algorithm with a penalty measuring the divergence of the policy from the offline data. In this work, we propose an alternative approach to encouraging the learned policy to stay close to the data: parameterizing the critic as the log of the behavior policy that generated the offline data, plus a state-action value offset term, which can be learned using a neural network. Behavior regularization then corresponds to an appropriate regularizer on the offset term. We propose using a gradient penalty regularizer for the offset term and demonstrate its equivalence to Fisher divergence regularization, suggesting connections to the score matching and generative energy-based model literature. We thus term our resulting algorithm Fisher-BRC (Behavior Regularized Critic). On standard offline RL benchmarks, Fisher-BRC achieves both improved performance and faster convergence over existing state-of-the-art methods.
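The critic parameterization described above can be sketched numerically. This is an illustrative toy, not the authors' implementation: the Gaussian behavior policy, the quadratic offset, and the finite-difference gradient estimate are all assumptions made for the example. It shows the key idea that Q(s, a) = log mu(a|s) + O(s, a), and that the regularizer penalizes the squared action-gradient of the offset O.

```python
import numpy as np

def log_behavior(a, mean=0.0, std=1.0):
    """Log-density of a toy Gaussian behavior policy mu(a|s) (a stand-in
    for a learned behavior model)."""
    return -0.5 * ((a - mean) / std) ** 2 - np.log(std * np.sqrt(2 * np.pi))

def offset(a, w=0.3):
    """A toy learned offset term O(s, a); here simply a quadratic in the
    action (in Fisher-BRC this would be a neural network)."""
    return w * a ** 2

def critic(a):
    """Critic parameterized as Q(s, a) = log mu(a|s) + O(s, a)."""
    return log_behavior(a) + offset(a)

def gradient_penalty(actions, eps=1e-5):
    """Estimate E_a[ ||dO/da||^2 ] with central finite differences; this
    is the gradient-penalty regularizer on the offset term."""
    grads = (offset(actions + eps) - offset(actions - eps)) / (2 * eps)
    return float(np.mean(grads ** 2))

# Actions sampled from the behavior policy; a nonzero penalty signals
# that the offset (and hence the critic) deviates from log mu.
actions = np.random.default_rng(0).normal(size=1000)
penalty = gradient_penalty(actions)
```

In training, this penalty would be added to the Bellman error objective, so that minimizing it keeps the learned Q-values anchored to the log-density of the behavior policy.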


Related research

10/14/2021 - Offline Reinforcement Learning with Soft Behavior Regularization
10/02/2021 - BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning
01/30/2023 - Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning
02/13/2022 - Supported Policy Optimization for Offline Reinforcement Learning
10/04/2021 - Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble
07/24/2023 - A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning
07/04/2020 - Discount Factor as a Regularizer in Reinforcement Learning
