Learning Video-Conditioned Policies for Unseen Manipulation Tasks

05/10/2023
by Elliot Chane-Sane, et al.

The ability of non-expert users to specify robot commands is critical for building generalist agents capable of solving a large variety of tasks. One convenient way to specify the intended robot goal is by a video of a person demonstrating the target task. While prior work typically aims to imitate human demonstrations performed in robot environments, here we focus on a more realistic and challenging setup, with demonstrations recorded in natural and diverse human environments. We propose Video-conditioned Policy learning (ViP), a data-driven approach that maps human demonstrations of previously unseen tasks to robot manipulation skills. To this end, we train our policy to generate appropriate actions given current scene observations and a video of the target task. To encourage generalization to new tasks, we avoid task-specific supervision during training and learn our policy from unlabelled robot trajectories and corresponding robot videos. Both robot and human videos in our framework are represented by video embeddings pre-trained for human action recognition. At test time, we first translate human videos to robot videos in the common video embedding space, and then use the resulting embeddings to condition our policy. Notably, our approach enables robot control by human demonstrations in a zero-shot manner, i.e., without using robot trajectories paired with human instructions during training. We validate our approach on a set of challenging multi-task robot manipulation environments and outperform the state of the art. Our method also demonstrates excellent performance in a new, challenging zero-shot setup where no paired data is used during training.
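The test-time pipeline described above (translate a human video into the robot video embedding space, then condition the policy on the resulting embedding) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the nearest-neighbour translation by cosine similarity, the linear policy, and all function names are assumptions for the sake of the example, and the embeddings would in practice come from a video encoder pre-trained on human action recognition.

```python
import numpy as np

def nearest_robot_embedding(human_emb, robot_embs):
    # Hypothetical translation step: map a human video embedding to the
    # closest robot video embedding (by cosine similarity) from the bank
    # of embeddings of unlabelled training robot videos.
    h = human_emb / np.linalg.norm(human_emb)
    r = robot_embs / np.linalg.norm(robot_embs, axis=1, keepdims=True)
    return robot_embs[np.argmax(r @ h)]

def policy(observation, task_emb, weights):
    # Toy video-conditioned policy: a linear map from the concatenated
    # scene observation and task embedding to an action vector. The real
    # policy would be a learned network trained on robot trajectories.
    return weights @ np.concatenate([observation, task_emb])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    robot_bank = rng.normal(size=(5, 4))      # 5 robot videos, 4-d embeddings
    human_emb = robot_bank[2] + 0.01 * rng.normal(size=4)  # near a bank entry
    task_emb = nearest_robot_embedding(human_emb, robot_bank)
    action = policy(rng.normal(size=3), task_emb, rng.normal(size=(2, 7)))
    print(action.shape)  # 2-d action conditioned on the retrieved embedding
```

The retrieval step is what makes the setup zero-shot: no paired human-robot data is needed, only a shared embedding space in which the two video domains can be compared.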


Related research

- Learning to Play by Imitating Humans (06/11/2020)
- PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining (03/15/2023)
- Learning by Watching: Physical Imitation of Manipulation Skills from Human Videos (01/18/2021)
- Learning Universal Policies via Text-Guided Video Generation (01/31/2023)
- Plan-Space State Embeddings for Improved Reinforcement Learning (04/30/2020)
- Language-Conditioned Imitation Learning for Robot Manipulation Tasks (10/22/2020)
- Error-Aware Policy Learning: Zero-Shot Generalization in Partially Observable Dynamic Environments (03/13/2021)
