Imitating Graph-Based Planning with Goal-Conditioned Policies

03/20/2023
by Junsu Kim, et al.

Recently, graph-based planning algorithms have gained much attention for solving goal-conditioned reinforcement learning (RL) tasks: they provide a sequence of subgoals that leads to the target goal, and the agents learn to execute subgoal-conditioned policies. However, the sample efficiency of such RL schemes remains a challenge, particularly for long-horizon tasks. To address this issue, we present a simple yet effective self-imitation scheme that distills a subgoal-conditioned policy into the target-goal-conditioned policy. Our intuition is that to reach a target goal, an agent should pass through a subgoal, so the target-goal-conditioned and subgoal-conditioned policies should be similar to each other. We also propose a novel scheme of stochastically skipping executed subgoals in a planned path, which further improves performance. Unlike prior methods that use graph-based planning only in the execution phase, our method transfers knowledge from the planner, along with its graph, into policy learning. We empirically show that our method can significantly boost the sample efficiency of existing goal-conditioned RL methods across various long-horizon control tasks.
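The two ideas in the abstract can be illustrated with a minimal sketch. It assumes a deterministic policy that maps a (state, goal) pair to an action vector. It measures self-imitation as the squared distance between the action chosen for the target goal and the action chosen for a planned subgoal. It also uses a hypothetical skipping rule: with probability `skip_prob`, the rule advances past the next subgoal in the planned path. The names `choose_subgoal`, `self_imitation_loss`, and `toy_policy`, the loss form, and the skipping rule are all assumptions for illustration, not the paper's implementation.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical interface: a deterministic policy maps (state, goal) to an action.
Policy = Callable[[Vector, Vector], Vector]


def choose_subgoal(path: Sequence[Vector], skip_prob: float, rng: random.Random) -> Vector:
    """Pick the subgoal to condition on from a non-empty planned path.

    Assumed skipping rule (illustrative only): start at the next planned
    subgoal and keep advancing one step with probability `skip_prob`,
    never going past the final entry of the path.
    """
    idx = 0
    while idx < len(path) - 1 and rng.random() < skip_prob:
        idx += 1
    return path[idx]


def self_imitation_loss(policy: Policy, state: Vector, target_goal: Vector, subgoal: Vector) -> float:
    """Squared distance between the target-goal action and the subgoal action.

    Assumed loss form. In a gradient-based learner the subgoal-conditioned
    action would be treated as a fixed target (stop-gradient), so only the
    target-goal-conditioned output is pulled toward it.
    """
    a_target = policy(state, target_goal)
    a_sub = policy(state, subgoal)
    return sum((x - y) ** 2 for x, y in zip(a_target, a_sub))


def toy_policy(state: Vector, goal: Vector) -> Vector:
    """Hypothetical stand-in policy: step straight toward the goal."""
    return [g - s for s, g in zip(state, goal)]


if __name__ == "__main__":
    rng = random.Random(0)
    path = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]  # planned subgoals toward [4, 0]
    sub = choose_subgoal(path, skip_prob=0.5, rng=rng)
    print(sub, self_imitation_loss(toy_policy, [0.0, 0.0], [4.0, 0.0], sub))
```

With `toy_policy`, the state `[0, 0]`, the target goal `[4, 0]`, and the subgoal `[1, 0]` yield actions `[4, 0]` and `[1, 0]`, so the loss is 9.0. A learned policy would shrink this gap only where reaching the subgoal is actually on the way to the target.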
