Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning

10/25/2019
by   Abhishek Gupta, et al.

We present relay policy learning, a method for imitation and reinforcement learning that can solve multi-stage, long-horizon robotic tasks. This general, two-phase approach consists of an imitation learning stage that produces goal-conditioned hierarchical policies, and a reinforcement learning phase that fine-tunes these policies for task performance. Our method, while not necessarily perfect at imitation learning, is very amenable to further improvement via environment interaction, allowing it to scale to challenging long-horizon tasks. We simplify the long-horizon policy learning problem by using a novel data-relabeling algorithm for learning goal-conditioned hierarchical policies, where the low-level policy acts only for a fixed number of steps, regardless of the goal achieved. While we rely on demonstration data to bootstrap policy learning, we do not assume access to demonstrations of every specific task being solved. Instead, we leverage unstructured and unsegmented demonstrations of semantically meaningful behaviors, which are not only less burdensome to provide but can also greatly facilitate further improvement using reinforcement learning. We demonstrate the effectiveness of our method on a number of multi-stage, long-horizon manipulation tasks in a challenging kitchen simulation environment. Videos are available at https://relay-policy-learning.github.io/
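
The abstract does not spell out the data-relabeling procedure, so the following is only a minimal sketch of one way goal relabeling over unsegmented demonstrations could look. It assumes that any state reached within a fixed window of a time step can serve as a goal for that step, and that the high-level target is the state reached after at most a fixed number of low-level steps. The function names, the window parameters, and the tuple layout are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]


def relabel_low_level(
    states: Sequence[State], actions: Sequence[Action], window: int
) -> List[Tuple[State, State, Action]]:
    """Build (state, goal, action) tuples for a goal-conditioned low-level policy.

    Assumption: every state reached within `window` steps of time t is
    treated as a valid goal for the action taken at time t.
    """
    data = []
    for t in range(len(actions)):
        for w in range(1, window + 1):
            if t + w >= len(states):
                break  # ran off the end of the demonstration
            data.append((states[t], states[t + w], actions[t]))
    return data


def relabel_high_level(
    states: Sequence[State], low_window: int, high_window: int
) -> List[Tuple[State, State, State]]:
    """Build (state, goal, subgoal) tuples for a high-level policy.

    Assumption: the subgoal is the state reached after at most
    `low_window` steps, reflecting a low level that acts for a fixed
    number of steps regardless of whether the goal was achieved.
    """
    data = []
    for t in range(len(states) - 1):
        for w in range(1, high_window + 1):
            if t + w >= len(states):
                break
            subgoal = states[t + min(w, low_window)]
            data.append((states[t], states[t + w], subgoal))
    return data


if __name__ == "__main__":
    # Toy one-dimensional demonstration: four states, three actions.
    demo_states = [(0.0,), (1.0,), (2.0,), (3.0,)]
    demo_actions = [(1.0,), (1.0,), (1.0,)]
    low = relabel_low_level(demo_states, demo_actions, window=2)
    high = relabel_high_level(demo_states, low_window=1, high_window=3)
    print(len(low), len(high))
```

Under these assumptions, a single unsegmented demonstration yields many goal-conditioned training tuples at both levels without any task labels, which is the property the abstract relies on when it says demonstrations need not cover every specific task.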


