PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals

06/01/2020
by Henry Charlesworth, et al.

Learning with sparse rewards remains a significant challenge in reinforcement learning (RL), especially when the aim is to train a policy capable of achieving multiple different goals. To date, the most successful approaches for dealing with multi-goal, sparse reward environments have been model-free RL algorithms. In this work we propose PlanGAN, a model-based algorithm specifically designed for solving multi-goal tasks in environments with sparse rewards. Our method builds on the fact that any trajectory of experience collected by an agent contains useful information about how to achieve the goals observed during that trajectory. We use this to train an ensemble of conditional generative models (GANs) to generate plausible trajectories that lead the agent from its current state towards a specified goal. We then combine these imagined trajectories into a novel planning algorithm in order to achieve the desired goal as efficiently as possible. The performance of PlanGAN has been tested on a number of robotic navigation/manipulation tasks in comparison with a range of model-free reinforcement learning baselines, including Hindsight Experience Replay. Our studies indicate that PlanGAN can achieve comparable performance whilst being around 4-8 times more sample efficient.
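
The observation that a trajectory carries information about how to reach the goals seen along it can be illustrated with a small sketch. The code below is not the PlanGAN training procedure. It is a generic hindsight-style relabeling step, written under the assumption that each step records the goal achieved after its action. The names `Step`, `RelabeledSample`, and `relabel_trajectory` are hypothetical and chosen only for illustration.

```python
import random
from typing import List, NamedTuple, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


class Step(NamedTuple):
    """One transition of a collected trajectory (hypothetical layout)."""
    state: Vector
    action: Vector
    achieved_goal: Vector  # goal actually reached after taking `action`


class RelabeledSample(NamedTuple):
    """A (state, action) pair paired with a goal known to be reachable."""
    state: Vector
    action: Vector
    goal: Vector
    steps_to_goal: int


def relabel_trajectory(
    trajectory: Sequence[Step],
    samples_per_step: int = 1,
    rng: Optional[random.Random] = None,
) -> List[RelabeledSample]:
    """Pair each step with goals achieved at the same or a later step.

    Every returned sample is a real example of "starting here and acting
    like this, the agent reached that goal", even when the episode never
    earned a sparse reward for its original goal.
    """
    rng = rng or random.Random()
    samples: List[RelabeledSample] = []
    for t, step in enumerate(trajectory):
        for _ in range(samples_per_step):
            # Pick a step index in [t, len(trajectory) - 1].
            future = rng.randrange(t, len(trajectory))
            samples.append(
                RelabeledSample(
                    state=step.state,
                    action=step.action,
                    goal=trajectory[future].achieved_goal,
                    steps_to_goal=future - t + 1,
                )
            )
    return samples
```

Samples of this kind could serve as conditioning data for a goal-conditioned generative model, since each one shows a state, an action, and a goal that was in fact reached afterwards. How PlanGAN builds and uses its training data is described in the full paper, not in this sketch.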

research
07/01/2021

MHER: Model-based Hindsight Experience Replay

Solving multi-goal reinforcement learning (RL) problems with sparse rewa...
research
10/05/2021

Imaginary Hindsight Experience Replay: Curious Model-based Learning for Sparse Reward Tasks

Model-based reinforcement learning is a promising learning strategy for ...
research
03/22/2019

DQN with model-based exploration: efficient learning on environments with sparse rewards

We propose Deep Q-Networks (DQN) with model-based exploration, an algori...
research
03/13/2019

Trajectory Optimization for Unknown Constrained Systems using Reinforcement Learning

In this paper, we propose a reinforcement learning-based algorithm for t...
research
07/03/2022

USHER: Unbiased Sampling for Hindsight Experience Replay

Dealing with sparse rewards is a long-standing challenge in reinforcemen...
research
08/01/2022

Relay Hindsight Experience Replay: Continual Reinforcement Learning for Robot Manipulation Tasks with Sparse Rewards

Learning with sparse rewards is usually inefficient in Reinforcement Lea...
research
05/13/2021

MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks

In Goal-oriented Reinforcement learning, relabeling the raw goals in pas...
