Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning

07/06/2020
by Silviu Pitis, et al.

What goals should a multi-goal reinforcement learning agent pursue during training in long-horizon tasks? When the desired (test time) goal distribution is too distant to offer a useful learning signal, we argue that the agent should not pursue unobtainable goals. Instead, it should set its own intrinsic goals that maximize the entropy of the historical achieved goal distribution. We propose to optimize this objective by having the agent pursue past achieved goals in sparsely explored areas of the goal space, which focuses exploration on the frontier of the achievable goal set. We show that our strategy achieves an order of magnitude better sample efficiency than the prior state of the art on long-horizon multi-goal tasks including maze navigation and block stacking.
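The goal-selection idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes achieved goals are stored as rows of a NumPy array, estimates their density with a plain Gaussian kernel density estimate, and picks the sampled candidate with the lowest estimated density. The function names, the `bandwidth` and `num_candidates` parameters, and the choice of a Gaussian kernel are all assumptions made for the example.

```python
import numpy as np


def gaussian_kde_density(points, queries, bandwidth=0.1):
    """Gaussian kernel density of `queries` under `points`, up to a constant factor.

    The normalizing constant is dropped because only the ranking of
    densities matters when picking the sparsest candidate.
    """
    diffs = queries[:, None, :] - points[None, :, :]   # shape (Q, N, D)
    sq_dists = np.sum(diffs ** 2, axis=-1)             # shape (Q, N)
    kernel = np.exp(-0.5 * sq_dists / bandwidth ** 2)
    return kernel.mean(axis=1)                         # shape (Q,)


def select_min_density_goal(achieved_goals, num_candidates=100,
                            bandwidth=0.1, rng=None):
    """Pick a past achieved goal from a sparsely explored region (hypothetical helper).

    Samples candidates from the buffer of achieved goals and returns the
    one whose estimated density under the whole buffer is lowest.
    """
    rng = np.random.default_rng() if rng is None else rng
    achieved_goals = np.asarray(achieved_goals, dtype=float)
    n = len(achieved_goals)
    idx = rng.choice(n, size=min(num_candidates, n), replace=False)
    candidates = achieved_goals[idx]
    density = gaussian_kde_density(achieved_goals, candidates, bandwidth)
    return candidates[np.argmin(density)]
```

In a training loop, the returned goal would be handed to the goal-conditioned policy as its behavioural goal for the next episode. Because low-density achieved goals tend to lie at the edge of what the agent has reached so far, repeatedly targeting them pushes exploration outward, which is the intuition the abstract describes.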


research
10/28/2022

Goal Exploration Augmentation via Pre-trained Skills for Sparse-Reward Long-Horizon Goal-Conditioned Reinforcement Learning

Reinforcement learning (RL) often struggles to accomplish a sparse-rewar...
research
11/09/2020

Planning under Uncertainty to Goal Distributions

Goal spaces for planning problems are typically conceived of as subsets ...
research
05/21/2019

Maximum Entropy-Regularized Multi-Goal Reinforcement Learning

In Multi-Goal Reinforcement Learning, an agent learns to achieve multipl...
research
06/14/2022

Stein Variational Goal Generation For Reinforcement Learning in Hard Exploration Problems

Multi-goal Reinforcement Learning has recently attracted a large amount ...
research
10/06/2018

Q-map: a Convolutional Approach for Goal-Oriented Reinforcement Learning

Goal-oriented learning has become a core concept in reinforcement learni...
research
11/24/2017

Identifying Reusable Macros for Efficient Exploration via Policy Compression

Reinforcement Learning agents often need to solve not a single task, but...
research
05/13/2021

MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks

In Goal-oriented Reinforcement learning, relabeling the raw goals in pas...
