Maximum Entropy-Regularized Multi-Goal Reinforcement Learning

05/21/2019
by Rui Zhao, et al.

In Multi-Goal Reinforcement Learning, an agent learns to achieve multiple goals with a goal-conditioned policy. During learning, the agent first collects trajectories into a replay buffer, and these trajectories are later selected at random for replay. However, the achieved goals in the replay buffer are often biased towards the behavior policies. From a Bayesian perspective, when there is no prior knowledge about the target goal distribution, the agent should learn uniformly from diverse achieved goals. Therefore, we first propose a novel multi-goal RL objective based on weighted entropy. This objective encourages the agent to maximize the expected return as well as to achieve more diverse goals. Second, we develop a maximum entropy-based prioritization framework to optimize the proposed objective. To evaluate this framework, we combine it with Deep Deterministic Policy Gradient, both with and without Hindsight Experience Replay. On a set of multi-goal robotic tasks in OpenAI Gym, we compare our method with other baselines and show promising improvements in both performance and sample-efficiency.
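The idea of replaying diverse achieved goals more uniformly can be illustrated with a minimal sketch. This is not the authors' implementation: the histogram density estimate, the scalar goals, the function name `replay_priorities`, and the `bins` parameter are all assumptions made for illustration. Trajectories whose achieved goals are rare in the buffer receive a higher replay probability, which counteracts the bias towards goals the behavior policy reaches often.

```python
import numpy as np

def replay_priorities(achieved_goals, bins=10):
    """Return replay probabilities that favor rarely achieved goals.

    Hypothetical helper: `achieved_goals` holds one scalar goal per stored
    trajectory (a simplification; real goals are usually vectors).
    """
    goals = np.asarray(achieved_goals, dtype=float)
    # Assumed density model: a plain histogram over the achieved goals.
    counts, edges = np.histogram(goals, bins=bins)
    # Map each goal to its bin; interior edges yield indices 0..bins-1.
    idx = np.digitize(goals, edges[1:-1])
    density = counts[idx] / goals.size
    # Inverse-density weights: rare goals get larger replay probability.
    weights = 1.0 / density
    return weights / weights.sum()

# Toy buffer: 8 trajectories reached goal 0.0, only 2 reached goal 1.0.
goals = np.array([0.0] * 8 + [1.0] * 2)
probs = replay_priorities(goals, bins=2)

# Draw a replay batch of trajectory indices with these probabilities.
rng = np.random.default_rng(0)
batch = rng.choice(goals.size, size=4, p=probs)
```

In this toy buffer, each of the two goal bins ends up with the same total replay probability, even though one goal was achieved four times as often as the other.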

research
09/16/2018

Improvements on Hindsight Learning

Sparse reward problems are one of the biggest challenges in Reinforcemen...
research
02/20/2019

Curiosity-Driven Experience Prioritization via Density Estimation

In Reinforcement Learning (RL), an agent explores the environment and co...
research
11/16/2017

Hindsight policy gradients

Goal-conditional policies allow reinforcement learning agents to pursue ...
research
05/14/2019

Bias-Reduced Hindsight Experience Replay with Virtual Goal Prioritization

Hindsight Experience Replay (HER) is a multi-goal reinforcement learning...
research
01/31/2019

Visual Hindsight Experience Replay

Reinforcement Learning algorithms typically require millions of environm...
research
02/10/2023

A Song of Ice and Fire: Analyzing Textual Autotelic Agents in ScienceWorld

Building open-ended agents that can autonomously discover a diversity of...
research
07/06/2020

Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning

What goals should a multi-goal reinforcement learning agent pursue durin...