Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning

05/17/2021
by Yue Wu, et al.

Offline Reinforcement Learning promises to learn effective policies from previously collected, static datasets without the need for exploration. However, existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states. We hypothesize that a key ingredient missing from existing methods is a proper treatment of uncertainty in the offline setting. We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution to the training objectives accordingly. For implementation, we adopt a practical and effective dropout-based uncertainty estimation method that introduces very little overhead over existing RL algorithms. Empirically, we observe that UWAC substantially improves model stability during training. In addition, UWAC outperforms existing offline RL methods on a variety of competitive tasks, and achieves significant performance gains over the state-of-the-art baseline on datasets with sparse demonstrations collected from human experts.
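
To make the idea of dropout-based uncertainty weighting concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state the exact weighting rule, so the sketch assumes a clipped inverse-variance weight. The toy linear Q-function and the names `q_forward`, `mc_dropout_variance`, `uncertainty_weight`, `beta`, and `max_weight` are all hypothetical and chosen only for illustration.

```python
import random
import statistics


def q_forward(features, weights, drop_prob, rng):
    """One stochastic forward pass of a toy linear Q-function.

    Dropout is applied to the input features (inverted-dropout scaling),
    so repeated calls give different Q-value samples.
    """
    keep = 1.0 - drop_prob
    total = 0.0
    for x, w in zip(features, weights):
        if rng.random() < keep:
            total += (x / keep) * w
    return total


def mc_dropout_variance(features, weights, drop_prob=0.1, passes=100, seed=0):
    """Estimate Q-value uncertainty as the variance over several dropout passes."""
    rng = random.Random(seed)
    samples = [q_forward(features, weights, drop_prob, rng) for _ in range(passes)]
    return statistics.pvariance(samples)


def uncertainty_weight(variance, beta=1.0, max_weight=1.5, eps=1e-8):
    """Assumed rule: weight is beta / variance, clipped from above.

    High-variance (likely OOD) pairs receive a small weight.
    """
    return min(beta / (variance + eps), max_weight)


def weighted_td_loss(q_pred, td_target, weight):
    """Squared TD error scaled by the uncertainty weight."""
    return weight * (q_pred - td_target) ** 2


if __name__ == "__main__":
    q_weights = [1.0, 1.0, 1.0]
    in_dist = [0.1, 0.2, 0.1]   # stand-in for a state-action pair near the data
    ood = [5.0, -4.0, 6.0]      # stand-in for a pair far from the data

    for name, feats in [("in-distribution", in_dist), ("OOD", ood)]:
        var = mc_dropout_variance(feats, q_weights)
        w = uncertainty_weight(var)
        print(name, "variance:", var, "weight:", w)
```

In this toy setup the far-from-data input produces a larger variance across dropout passes, so its TD error contributes less to the loss. A real critic would be a neural network with dropout layers kept active at evaluation time, and the weighting constants would need tuning.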
