DeepAI AI Chat
Log In Sign Up

Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning

by   Yue Wu, et al.

Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration. However, existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states. We hypothesize that a key missing ingredient from the existing methods is a proper treatment of uncertainty in the offline setting. We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly. Implementation-wise, we adopt a practical and effective dropout-based uncertainty estimation method that introduces very little overhead over existing RL algorithms. Empirically, we observe that UWAC substantially improves model stability during training. In addition, UWAC out-performs existing offline RL methods on a variety of competitive tasks, and achieves significant performance gains over the state-of-the-art baseline on datasets with sparse demonstrations collected from human experts.


page 5

page 13

page 14

page 17

page 19

page 20

page 21

page 22


Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies

Deep Reinforcement Learning (DRL) algorithms for continuous action space...

Q-Value Weighted Regression: Reinforcement Learning with Limited Data

Sample efficiency and performance in the offline setting have emerged as...

Risk-Averse Offline Reinforcement Learning

Training Reinforcement Learning (RL) agents in high-stakes applications ...

How to Enable Uncertainty Estimation in Proximal Policy Optimization

While deep reinforcement learning (RL) agents have showcased strong resu...

Reinforcement Learning Under Algorithmic Triage

Methods to learn under algorithmic triage have predominantly focused on ...

IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies

Effective offline RL methods require properly handling out-of-distributi...

Offline Reinforcement Learning with Pseudometric Learning

Offline Reinforcement Learning methods seek to learn a policy from logge...