Understanding Hindsight Goal Relabeling Requires Rethinking Divergence Minimization

09/26/2022
by Lunjun Zhang, et al.

Hindsight goal relabeling has become a foundational technique for multi-goal reinforcement learning (RL). The idea is quite simple: any arbitrary trajectory can be seen as an expert demonstration for reaching the trajectory's end state. Intuitively, this procedure trains a goal-conditioned policy to imitate a sub-optimal expert. However, this connection between imitation and hindsight relabeling is not well understood. Modern imitation learning algorithms are described in the language of divergence minimization, and yet it remains an open problem how to recast hindsight goal relabeling into that framework. In this work, we develop a unified objective for goal-reaching that explains such a connection, from which we can derive goal-conditioned supervised learning (GCSL) and the reward function in hindsight experience replay (HER) from first principles. Experimentally, we find that despite recent advances in goal-conditioned behaviour cloning (BC), multi-goal Q-learning can still outperform BC-like methods; moreover, a vanilla combination of both actually hurts model performance. Under our framework, we study when BC is expected to help, and empirically validate our findings. Our work further bridges goal-reaching and generative modeling, illustrating the nuances and new pathways of extending the success of generative models to RL.
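The relabeling idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`relabel_with_final_state`, `relabel_future`, `sparse_reward`), the tuple-based state representation, the tolerance `eps`, and the "future" goal-sampling strategy with `k` goals per step are all illustrative assumptions. The first function treats any trajectory as a demonstration for reaching its own end state and emits goal-conditioned behaviour-cloning pairs in the style of GCSL. The second emits Q-learning transitions with a sparse reward of the kind commonly used with HER (0 when the goal is reached, -1 otherwise). That reward may differ in detail from the one the paper derives.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

State = Tuple[float, ...]
Action = int


def relabel_with_final_state(
    states: Sequence[State], actions: Sequence[Action]
) -> List[Tuple[State, State, Action]]:
    """GCSL-style relabeling (illustrative): every (s_t, a_t) becomes a
    demonstration for reaching the trajectory's final state s_T."""
    # Expect s_0..s_T and a_0..a_{T-1}, so one more state than actions.
    if len(states) != len(actions) + 1:
        raise ValueError("expected len(states) == len(actions) + 1")
    goal = states[-1]  # the end state is the hindsight goal
    return [(states[t], goal, actions[t]) for t in range(len(actions))]


def sparse_reward(next_state: State, goal: State, eps: float = 1e-6) -> float:
    """Assumed sparse goal-reaching reward: 0 if the goal is reached, else -1."""
    return 0.0 if math.dist(next_state, goal) <= eps else -1.0


def relabel_future(
    states: Sequence[State],
    actions: Sequence[Action],
    k: int = 4,
    rng: Optional[random.Random] = None,
) -> List[Tuple[State, Action, float, State, State]]:
    """HER-style "future" relabeling (assumed strategy): for each step t,
    sample k goals from states reached later in the same trajectory."""
    if len(states) != len(actions) + 1:
        raise ValueError("expected len(states) == len(actions) + 1")
    rng = rng if rng is not None else random.Random(0)
    transitions = []
    for t in range(len(actions)):
        for _ in range(k):
            # Any state strictly after s_t (inclusive upper bound s_T).
            j = rng.randint(t + 1, len(states) - 1)
            goal = states[j]
            reward = sparse_reward(states[t + 1], goal)
            transitions.append((states[t], actions[t], reward, states[t + 1], goal))
    return transitions


if __name__ == "__main__":
    states = [(0.0,), (1.0,), (2.0,), (3.0,)]
    actions = [1, 1, 1]
    print(relabel_with_final_state(states, actions))
    for transition in relabel_future(states, actions, k=2):
        print(transition)
```

Running the script prints three behaviour-cloning pairs, each conditioned on the end state `(3.0,)`, followed by six relabeled transitions. Any transition whose next state equals its relabeled goal receives reward 0.0, so even a trajectory that never reaches an originally intended goal yields successful examples.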

Related research

Learning To Reach Goals Without Reinforcement Learning (12/12/2019)
Imitation learning algorithms provide a simple and straightforward appro...

Imitating Past Successes can be Very Suboptimal (06/07/2022)
Prior work has proposed a simple strategy for reinforcement learning (RL...

Universal Value Density Estimation for Imitation Learning and Goal-Conditioned Reinforcement Learning (02/15/2020)
This work considers two distinct settings: imitation learning and goal-c...

Reward-Conditioned Policies (12/31/2019)
Reinforcement learning offers the promise of automating the acquisition ...

Goal-conditioned Imitation Learning (06/13/2019)
Designing rewards for Reinforcement Learning (RL) is challenging because...

Adversarial Intrinsic Motivation for Reinforcement Learning (05/27/2021)
Learning with an objective to minimize the mismatch with a reference dis...

Sub-Goal Trees -- a Framework for Goal-Directed Trajectory Prediction and Optimization (06/12/2019)
Many AI problems, in robotics and other domains, are goal-directed, esse...
