Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards

11/04/2019
by   Alexander Trott, et al.
8

While using shaped rewards can be beneficial when solving sparse reward tasks, their successful application often requires careful engineering and is problem specific. For instance, in tasks where the agent must achieve some goal state, simple distance-to-goal reward shaping often fails, as it renders learning vulnerable to local optima. We introduce a simple and effective model-free method to learn from shaped distance-to-goal rewards on tasks where success depends on reaching a goal state. Our method introduces an auxiliary distance-based reward based on pairs of rollouts to encourage diverse exploration. This approach effectively prevents learning dynamics from stabilizing around local optima induced by the naive distance-to-goal reward shaping and enables policies to efficiently solve sparse reward tasks. Our augmented objective does not require any additional reward engineering or domain expertise to implement and converges to the original sparse objective as the agent learns to solve the task. We demonstrate that our method successfully solves a variety of hard-exploration tasks (including maze navigation and 3D construction in a Minecraft environment), where naive distance-based reward shaping otherwise fails, and intrinsic curiosity and reward relabeling strategies exhibit poor performance.

READ FULL TEXT

page 4

page 14

research
11/14/2022

Redeeming Intrinsic Rewards via Constrained Optimization

State-of-the-art reinforcement learning (RL) algorithms typically use ra...
research
10/05/2021

Imaginary Hindsight Experience Replay: Curious Model-based Learning for Sparse Reward Tasks

Model-based reinforcement learning is a promising learning strategy for ...
research
10/05/2020

Action Guidance: Getting the Best of Sparse Rewards and Shaped Rewards for Real-time Strategy Games

Training agents using Reinforcement Learning in games with sparse reward...
research
01/20/2021

Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments

Exploration under sparse reward is a long-standing challenge of model-fr...
research
05/15/2020

Simple Sensor Intentions for Exploration

Modern reinforcement learning algorithms can learn solutions to increasi...
research
09/17/2021

Is Curiosity All You Need? On the Utility of Emergent Behaviours from Curious Exploration

Curiosity-based reward schemes can present powerful exploration mechanis...
research
06/15/2023

Reward-Free Curricula for Training Robust World Models

There has been a recent surge of interest in developing generally-capabl...

Please sign up or login with your details

Forgot password? Click here to reset