Few-Shot Preference Learning for Human-in-the-Loop RL

12/06/2022
by   Joey Hejna, et al.
0

While reinforcement learning (RL) has become a more popular approach for robotics, designing sufficiently informative reward functions for complex tasks has proven to be extremely difficult due their inability to capture human intent and policy exploitation. Preference based RL algorithms seek to overcome these challenges by directly learning reward functions from human feedback. Unfortunately, prior work either requires an unreasonable number of queries implausible for any human to answer or overly restricts the class of reward functions to guarantee the elicitation of the most informative queries, resulting in models that are insufficiently expressive for realistic robotics tasks. Contrary to most works that focus on query selection to minimize the amount of data required for learning reward functions, we take an opposite approach: expanding the pool of available data by viewing human-in-the-loop RL through the more flexible lens of multi-task learning. Motivated by the success of meta-learning, we pre-train preference models on prior task data and quickly adapt them for new tasks using only a handful of queries. Empirically, we reduce the amount of online feedback needed to train manipulation policies in Meta-World by 20×, and demonstrate the effectiveness of our method on a real Franka Panda Robot. Moreover, this reduction in query-complexity allows us to train robot policies from actual human users. Videos of our results and code can be found at https://sites.google.com/view/few-shot-preference-rl/home.

READ FULL TEXT

page 3

page 5

page 7

page 8

page 14

page 19

page 22

page 23

research
06/09/2021

PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training

Conveying complex objectives to reinforcement learning (RL) agents can o...
research
11/20/2022

Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation

Learning new task-specific skills from a few trials is a fundamental cha...
research
02/27/2023

Active Reward Learning from Online Preferences

Robot policies need to adapt to human preferences and/or new environment...
research
05/27/2023

Query-Policy Misalignment in Preference-Based Reinforcement Learning

Preference-based reinforcement learning (PbRL) provides a natural way to...
research
04/10/2023

Learning a Universal Human Prior for Dexterous Manipulation from Human Preference

Generating human-like behavior on robots is a great challenge especially...
research
06/06/2023

Zero-shot Preference Learning for Offline RL via Optimal Transport

Preference-based Reinforcement Learning (PbRL) has demonstrated remarkab...
research
10/24/2018

Sample-Efficient Learning of Nonprehensile Manipulation Policies via Physics-Based Informed State Distributions

This paper proposes a sample-efficient yet simple approach to learning c...

Please sign up or login with your details

Forgot password? Click here to reset