Related papers:
- A Framework for Data-Driven Robotics: We present a framework for data-driven robotics that makes use of a larg...
- Semi-supervised reward learning for offline reinforcement learning: In offline reinforcement learning (RL) agents are trained using a logged...
- Interactively shaping robot behaviour with unlabeled human instructions: In this paper, we propose a framework that enables a human teacher to sh...
- Generalizing Skills with Semi-Supervised Reinforcement Learning: Deep reinforcement learning (RL) can acquire complex behaviors from low-...
- Positive-Unlabeled Reward Learning: Learning reward functions from data is a promising path towards achievin...
- Learning Dexterous Manipulation from Suboptimal Experts: Learning dexterous manipulation in high-dimensional state-action spaces ...
- Self-Supervised Sim-to-Real Adaptation for Visual Robotic Manipulation: Collecting and automatically obtaining reward signals from real robotic ...
Offline Learning from Demonstrations and Unlabeled Experience
Behavior cloning (BC) is often practical for robot learning because it allows a policy to be trained offline without rewards, by supervised learning on expert demonstrations. However, BC does not effectively leverage what we will refer to as unlabeled experience: data of mixed and unknown quality without reward annotations. This unlabeled data can be generated by a variety of sources such as human teleoperation, scripted policies and other agents on the same robot. Towards data-driven offline robot learning that can use this unlabeled experience, we introduce Offline Reinforced Imitation Learning (ORIL). ORIL first learns a reward function by contrasting observations from demonstrator and unlabeled trajectories, then annotates all data with the learned reward, and finally trains an agent via offline reinforcement learning. Across a diverse set of continuous control and simulated robotic manipulation tasks, we show that ORIL consistently outperforms comparable BC agents by effectively leveraging unlabeled experience.