Unbiased Methods for Multi-Goal Reinforcement Learning

06/16/2021
by Léonard Blier, et al.

In multi-goal reinforcement learning (RL) settings, the reward for each goal is sparse and located in a small neighborhood of the goal. In high dimensions, the probability of reaching a reward vanishes and the agent receives little learning signal. Methods such as Hindsight Experience Replay (HER) tackle this issue by also learning from realized but unplanned-for goals. But HER is known to introduce bias, and can converge to low-return policies by overestimating chancy outcomes. First, we vindicate HER by proving that it is actually unbiased in deterministic environments, such as many optimal control settings. Next, for stochastic environments in continuous spaces, we tackle sparse rewards by directly taking the infinitely sparse reward limit. We fully formalize the problem of multi-goal RL with infinitely sparse Dirac rewards at each goal. We introduce unbiased deep Q-learning and actor-critic algorithms that can handle such infinitely sparse rewards, and test them in toy environments.
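To make "learning from realized but unplanned-for goals" concrete, here is a minimal sketch of standard HER goal relabeling with the "future" strategy. It is not the unbiased algorithm introduced in the paper. The episode format, the `eps`-ball `sparse_reward`, and the names `her_relabel`, `Transition`, and `k` are illustrative assumptions. Each real transition is stored once with its original goal and `k` more times with the goal replaced by a state the agent actually reached later in the same episode. Those relabeled transitions often carry a nonzero reward even when the original goal was never reached.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

State = Tuple[float, ...]

@dataclass
class Transition:
    state: State
    action: int
    next_state: State
    goal: State
    reward: float

def sparse_reward(next_state: State, goal: State, eps: float = 0.05) -> float:
    """Assumed sparse multi-goal reward: 1 inside an eps-ball around the goal, else 0."""
    dist = sum((a - b) ** 2 for a, b in zip(next_state, goal)) ** 0.5
    return 1.0 if dist <= eps else 0.0

def her_relabel(
    episode: Sequence[Tuple[State, int, State, State]],
    k: int = 4,
    eps: float = 0.05,
    rng: Optional[random.Random] = None,
) -> List[Transition]:
    """Hypothetical helper: HER 'future' relabeling.

    `episode` is a list of (state, action, next_state, goal) tuples. For each
    step t, store the original transition, then k copies whose goal is the
    next_state reached at a uniformly sampled step t' in [t, T-1].
    """
    rng = rng or random.Random(0)
    out: List[Transition] = []
    T = len(episode)
    for t, (s, a, s_next, g) in enumerate(episode):
        # Original transition with the goal the agent was actually pursuing.
        out.append(Transition(s, a, s_next, g, sparse_reward(s_next, g, eps)))
        for _ in range(k):
            future = rng.randint(t, T - 1)  # inclusive on both ends
            achieved = episode[future][2]   # a state that was really reached
            out.append(Transition(s, a, s_next, achieved,
                                  sparse_reward(s_next, achieved, eps)))
    return out
```

For example, in a 1-D episode that drifts from 0.0 toward 0.3 while pursuing the goal 1.0, every original transition gets reward 0. Relabeled copies of the last step always receive reward 1, because their goal is the state just reached. Note that relabeling conditions the goal on the realized outcome. In a stochastic environment, this is the mechanism through which HER can overestimate lucky outcomes, as the abstract describes.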


