C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks

10/22/2021
by Tianjun Zhang, et al.
Goal-conditioned reinforcement learning (RL) can solve tasks in a wide range of domains, including navigation and manipulation, but learning to reach distant goals remains a central challenge for the field. Learning to reach such goals is particularly hard without any offline data, expert demonstrations, or reward shaping. In this paper, we propose an algorithm that solves the distant goal-reaching task by using search at training time to automatically generate a curriculum of intermediate states. Our algorithm, Classifier-Planning (C-Planning), frames the learning of goal-conditioned policies as expectation maximization: the E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints. Unlike prior methods that combine goal-conditioned RL with graph search, ours performs search only during training and not at test time, significantly decreasing the compute cost of deploying the learned policy. Empirically, we demonstrate that our method is more sample efficient than prior methods. Moreover, it is able to solve very long-horizon manipulation and navigation tasks that prior goal-conditioned methods and methods based on graph search fail to solve.
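
To make the E-step concrete, the following is a minimal sketch of planning a sequence of waypoints with graph search over previously visited states. It is an illustration under stated assumptions, not the authors' implementation: the function name `plan_waypoints`, the `max_edge` threshold, and the use of Euclidean distance as the edge cost are all hypothetical stand-ins (a learned reachability estimate would presumably take the place of the distance function).

```python
import heapq
import math


def plan_waypoints(states, start, goal, max_edge=1.5):
    """Return the waypoints on a shortest path from start to goal.

    Hypothetical helper: nodes are the start, the candidate states, and
    the goal. Two nodes are connected when their Euclidean distance is at
    most `max_edge` (an assumed stand-in for a learned distance estimate).
    Returns the path without the start node, or [] if the goal is unreachable.
    """
    nodes = [start] + list(states) + [goal]
    n = len(nodes)
    goal_idx = n - 1

    # Standard Dijkstra bookkeeping: best known cost and predecessor per node.
    dist = [math.inf] * n
    prev = [None] * n
    dist[0] = 0.0
    heap = [(0.0, 0)]

    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        if i == goal_idx:
            break
        for j in range(n):
            if j == i:
                continue
            w = math.dist(nodes[i], nodes[j])
            if w > max_edge:
                continue  # too far apart to count as directly reachable
            if d + w < dist[j]:
                dist[j] = d + w
                prev[j] = i
                heapq.heappush(heap, (dist[j], j))

    if math.isinf(dist[goal_idx]):
        return []

    # Walk predecessors back from the goal, then reverse.
    path = []
    i = goal_idx
    while i is not None:
        path.append(nodes[i])
        i = prev[i]
    path.reverse()
    return path[1:]  # drop the start; the last element is the goal


visited = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
waypoints = plan_waypoints(visited, start=(0.0, 0.0), goal=(4.0, 0.0))
```

In this sketch, the M-step would then train the goal-conditioned policy to reach each returned waypoint in turn. Because the search is only used to generate training targets, the deployed policy can be conditioned directly on the final goal without running the search again.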

research
03/20/2023

Imitating Graph-Based Planning with Goal-Conditioned Policies

Recently, graph-based planning algorithms have gained much attention to ...
research
06/12/2019

Search on the Replay Buffer: Bridging Planning and Reinforcement Learning

The history of learning for control has been an exciting back and forth ...
research
07/20/2023

Goal-Conditioned Reinforcement Learning with Disentanglement-based Reachability Planning

Goal-Conditioned Reinforcement Learning (GCRL) can enable agents to spon...
research
11/17/2020

C-Learning: Learning to Achieve Goals via Recursive Classification

We study the problem of predicting and controlling the future state dist...
research
06/17/2020

Automatic Curriculum Learning through Value Disagreement

Continually solving new, unsolved tasks is the key to learning diverse b...
research
07/22/2022

Graph-Structured Policy Learning for Multi-Goal Manipulation Tasks

Multi-goal policy learning for robotic manipulation is challenging. Prio...
research
11/01/2022

Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning

Goal-conditioned reinforcement learning (RL) is a promising direction fo...
