Robust Offline Reinforcement Learning with Gradient Penalty and Constraint Relaxation

10/19/2022
by Chengqian Gao, et al.

A promising paradigm for offline reinforcement learning (RL) is to constrain the learned policy to stay close to the dataset behaviors, an approach known as policy constraint offline RL. However, existing works rely heavily on the purity of the data: they exhibit performance degradation or even catastrophic failure when learning from contaminated datasets that contain impure trajectories of diverse quality levels (e.g., expert level, medium level), even though contaminated offline data logs are common in the real world. To mitigate this, we first introduce a gradient penalty over the learned value function to tackle exploding Q-functions. We then relax the closeness constraint towards non-optimal actions with critic-weighted constraint relaxation. Experimental results on a set of contaminated D4RL MuJoCo and Adroit datasets show that the proposed techniques effectively tame the non-optimal trajectories for policy constraint offline RL methods.
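
The abstract names two techniques without giving their formulas, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes (1) the gradient penalty is taken on the norm of the critic's gradient with respect to the action, ‖∂Q/∂a‖, and (2) the behavior-cloning (closeness) term is down-weighted for dataset actions that the critic rates below the policy's own action, via a sigmoid of their Q-value difference. All function names and the `coef`, `max_norm`, and `temperature` parameters are hypothetical, and central finite differences stand in for automatic differentiation so the example runs with the standard library alone.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
# A critic maps (state, action) to a scalar Q-value.
QFunction = Callable[[Sequence[float], Sequence[float]], float]


def action_gradient(q: QFunction, state: Sequence[float], action: Sequence[float],
                    eps: float = 1e-5) -> Vector:
    """Central finite-difference estimate of dQ/da (stand-in for autograd)."""
    grad = []
    for i in range(len(action)):
        plus = list(action)
        plus[i] += eps
        minus = list(action)
        minus[i] -= eps
        grad.append((q(state, plus) - q(state, minus)) / (2.0 * eps))
    return grad


def gradient_penalty(q: QFunction, states, actions,
                     coef: float = 1.0, max_norm: float = 0.0) -> float:
    """Hypothetical penalty: coef * mean(max(0, ||dQ/da|| - max_norm)^2).

    Large action-gradients signal a sharply varying (possibly exploding)
    critic; penalizing them keeps the learned Q-function smoother.
    """
    total = 0.0
    for s, a in zip(states, actions):
        norm = math.sqrt(sum(g * g for g in action_gradient(q, s, a)))
        total += max(0.0, norm - max_norm) ** 2
    return coef * total / len(states)


def relaxed_bc_loss(q: QFunction, states, data_actions, policy_actions,
                    temperature: float = 1.0) -> float:
    """Hypothetical critic-weighted closeness term.

    Each squared distance ||pi(s) - a_data||^2 is scaled by
    w = sigmoid((Q(s, a_data) - Q(s, pi(s))) / temperature), so dataset
    actions the critic rates below the policy's action constrain it less.
    The weight is treated as a constant (no gradient flows through it).
    """
    total = 0.0
    for s, a_data, a_pi in zip(states, data_actions, policy_actions):
        advantage = q(s, a_data) - q(s, a_pi)
        weight = 1.0 / (1.0 + math.exp(-advantage / temperature))
        dist = sum((x - y) ** 2 for x, y in zip(a_pi, a_data))
        total += weight * dist
    return total / len(states)
```

For a toy critic Q(s, a) = -(a - s)², a dataset action equal to the state (the optimum) keeps a weight above 0.5, while a worse dataset action receives a much smaller weight, so the policy is pulled less toward it. In a deep-learning implementation, ∂Q/∂a would come from automatic differentiation and the weight would be detached from the graph.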
