Monte Carlo Augmented Actor-Critic for Sparse Reward Deep Reinforcement Learning from Suboptimal Demonstrations

10/14/2022
by Albert Wilcox, et al.

Providing densely shaped reward functions for RL algorithms is often exceedingly challenging, motivating the development of RL algorithms that can learn from easier-to-specify sparse reward functions. This sparsity poses new exploration challenges. One common way to address this problem is to use demonstrations to provide an initial signal about regions of the state space with high rewards. However, prior RL-from-demonstrations algorithms introduce significant complexity and many hyperparameters, making them hard to implement and tune. We introduce Monte Carlo Augmented Actor-Critic (MCAC), a parameter-free modification to standard actor-critic algorithms that initializes the replay buffer with demonstrations and computes a modified Q-value by taking the maximum of the standard temporal difference (TD) target and a Monte Carlo estimate of the reward-to-go. This encourages exploration in the neighborhood of high-performing trajectories by encouraging high Q-values in the corresponding regions of the state space. Experiments across 5 continuous control domains suggest that MCAC can be used to significantly increase learning efficiency across 6 commonly used RL and RL-from-demonstrations algorithms. See https://sites.google.com/view/mcac-rl for code and supplementary material.
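The target computation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `reward_to_go` and `mcac_target`, the array layout, and the assumption that each stored trajectory is a complete, finite episode are all illustrative choices made here. The sketch shows only the element-wise maximum of a one-step TD target and a discounted Monte Carlo reward-to-go.

```python
import numpy as np


def reward_to_go(rewards, gamma):
    """Discounted Monte Carlo return from each step to the end of one episode.

    Assumption: `rewards` holds a complete, finite episode.
    """
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Accumulate backwards so each step reuses the return of its successor.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def mcac_target(rewards, next_q, dones, mc_returns, gamma):
    """Element-wise max of the one-step TD target and the Monte Carlo estimate.

    `next_q` stands in for the critic's value at the next state-action pair;
    how it is produced (target networks, etc.) is outside this sketch.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    next_q = np.asarray(next_q, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    mc_returns = np.asarray(mc_returns, dtype=np.float64)
    # Standard TD target: bootstrap from next_q unless the episode ended.
    td_target = rewards + gamma * (1.0 - dones) * next_q
    return np.maximum(td_target, mc_returns)


# Hypothetical sparse-reward demonstration: reward only at the final step.
demo_rewards = [0.0, 0.0, 1.0]
demo_dones = [0.0, 0.0, 1.0]
demo_returns = reward_to_go(demo_rewards, gamma=0.5)
# An untrained critic that outputs zeros everywhere.
targets = mcac_target(demo_rewards, [0.0, 0.0, 0.0], demo_dones, demo_returns, gamma=0.5)
```

In this toy example the untrained critic contributes nothing through bootstrapping, so the Monte Carlo term is what raises the targets for the early steps of the demonstration. Once the critic's estimate exceeds the Monte Carlo return, the maximum falls back to the ordinary TD target.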

research
01/31/2018

Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations

Pretraining with expert demonstrations have been found useful in speedin...
research
01/28/2023

Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic

Many existing reinforcement learning (RL) methods employ stochastic grad...
research
08/12/2021

HAC Explore: Accelerating Exploration with Hierarchical Reinforcement Learning

Sparse rewards and long time horizons remain challenging for reinforceme...
research
02/11/2023

UGAE: A Novel Approach to Non-exponential Discounting

The discounting mechanism in Reinforcement Learning determines the relat...
research
10/27/2021

Learning from demonstrations with SACR2: Soft Actor-Critic with Reward Relabeling

During recent years, deep reinforcement learning (DRL) has made successf...
research
08/27/2020

The Advantage Regret-Matching Actor-Critic

Regret minimization has played a key role in online learning, equilibriu...
research
09/09/2019

AC-Teach: A Bayesian Actor-Critic Method for Policy Learning with an Ensemble of Suboptimal Teachers

The exploration mechanism used by a Deep Reinforcement Learning (RL) age...
