
USHER: Unbiased Sampling for Hindsight Experience Replay

by Liam Schramm, et al.

Dealing with sparse rewards is a long-standing challenge in reinforcement learning (RL). Hindsight Experience Replay (HER) addresses this problem by reusing failed trajectories for one goal as successful trajectories for another. This guarantees a minimum density of reward and enables generalization across multiple goals. However, this strategy is known to produce a biased value function: the update rule underestimates the likelihood of bad outcomes in a stochastic environment, because relabeling keeps only the outcomes that actually occurred. We propose an asymptotically unbiased importance-sampling-based algorithm that addresses this problem without sacrificing performance in deterministic environments. We show its effectiveness on a range of robotic systems, including challenging high-dimensional stochastic environments.
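To make the relabeling idea concrete, here is a minimal sketch of standard HER "future" relabeling, the mechanism whose bias USHER corrects. All names (`Transition`, `her_relabel`, the sparse `reward` function, the parameter `k`) are illustrative assumptions, not the paper's implementation; in particular, this sketch shows only the biased baseline, not the importance-sampling correction.

```python
import random
from dataclasses import dataclass

@dataclass
class Transition:
    state: tuple
    action: int
    next_state: tuple
    goal: tuple  # the goal this transition is stored under

def reward(next_state, goal):
    # Sparse reward: success only when the achieved state equals the goal.
    return 1.0 if next_state == goal else 0.0

def her_relabel(trajectory, k=4):
    """HER 'future' strategy (illustrative): for each transition, also store
    up to k copies whose goal is a state achieved later in the same
    trajectory, so a failed episode yields successful training examples.
    Note: because relabeled goals are always states that were actually
    reached, stochastic failure modes are underrepresented -- the bias
    that USHER's importance-sampling estimator removes."""
    relabeled = []
    for i, t in enumerate(trajectory):
        relabeled.append(t)  # keep the original (likely zero-reward) goal
        future = trajectory[i:]
        for _ in range(min(k, len(future))):
            f = random.choice(future)
            relabeled.append(
                Transition(t.state, t.action, t.next_state, f.next_state)
            )
    return relabeled
```

Every relabeled copy has a goal drawn from states the agent actually achieved, which is exactly why naive HER over-weights favorable outcomes in stochastic environments.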

