Offline Learning from Demonstrations and Unlabeled Experience

11/27/2020
by Konrad Zolna, et al.

Behavior cloning (BC) is often practical for robot learning because it allows a policy to be trained offline without rewards, by supervised learning on expert demonstrations. However, BC does not effectively leverage what we will refer to as unlabeled experience: data of mixed and unknown quality without reward annotations. This unlabeled data can be generated by a variety of sources such as human teleoperation, scripted policies and other agents on the same robot. Towards data-driven offline robot learning that can use this unlabeled experience, we introduce Offline Reinforced Imitation Learning (ORIL). ORIL first learns a reward function by contrasting observations from demonstrator and unlabeled trajectories, then annotates all data with the learned reward, and finally trains an agent via offline reinforcement learning. Across a diverse set of continuous control and simulated robotic manipulation tasks, we show that ORIL consistently outperforms comparable BC agents by effectively leveraging unlabeled experience.
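The first two ORIL stages described above (learning a reward by contrasting demonstrator and unlabeled observations, then annotating all data with it) can be illustrated with a minimal sketch. It rests on several assumptions not taken from the abstract: the reward model is a plain logistic-regression classifier trained with a standard binary cross-entropy objective, the observations are synthetic 3-dimensional vectors, and the names `train_reward_model` and `annotate` are hypothetical. The final stage, offline reinforcement learning on the annotated data, is not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_reward_model(demo_obs, unlabeled_obs, steps=500, lr=0.5):
    """Fit r(s) = sigmoid(w . s + b) to separate demonstrations from unlabeled data.

    Assumption: a simple binary classifier (demonstration = 1, unlabeled = 0)
    stands in for the learned reward function; the paper's loss may differ.
    """
    x = np.concatenate([demo_obs, unlabeled_obs], axis=0)
    y = np.concatenate([np.ones(len(demo_obs)), np.zeros(len(unlabeled_obs))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(x @ w + b)
        grad = p - y  # gradient of cross-entropy w.r.t. the logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def annotate(obs, w, b):
    """Label each observation with the learned reward, a value in (0, 1)."""
    return sigmoid(obs @ w + b)


# Synthetic stand-ins for real data (assumption): demonstrator observations
# cluster around 2.0, unlabeled observations of mixed quality around 0.0.
rng = np.random.default_rng(0)
demo_obs = rng.normal(loc=2.0, scale=0.5, size=(200, 3))
unlabeled_obs = rng.normal(loc=0.0, scale=1.0, size=(800, 3))

w, b = train_reward_model(demo_obs, unlabeled_obs)
demo_rewards = annotate(demo_obs, w, b)
unlabeled_rewards = annotate(unlabeled_obs, w, b)

print("mean reward on demonstrations:", demo_rewards.mean())
print("mean reward on unlabeled data:", unlabeled_rewards.mean())
```

In this sketch the annotated rewards would be attached to every transition in both datasets, and the combined, reward-labeled data would then be handed to an offline reinforcement learning algorithm.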
