Active Reward Learning from Online Preferences

02/27/2023
by Vivek Myers, et al.

Robot policies need to adapt to human preferences and/or new environments. Human experts often have the domain knowledge required to help robots achieve this adaptation. However, existing methods typically require costly offline re-training on human feedback, and the feedback they need is often too frequent or too complex for humans to provide reliably. To avoid placing an undue burden on human experts and to enable quick adaptation in critical real-world situations, we propose designing and sparingly presenting easy-to-answer pairwise action preference queries in an online fashion. Our approach designs queries and decides when to present them so as to maximize the expected value derived from the queries' information. We demonstrate our approach in simulation experiments, human user studies, and real-robot experiments. Across these settings, our approach outperforms baseline techniques while presenting fewer queries to human experts. Experiment videos, code, and appendices are available at https://sites.google.com/view/onlineactivepreferences.
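
The core idea described in the abstract, choosing a pairwise action query that maximizes the expected value of the human's answer and only asking when that value is high enough, can be illustrated with a minimal sketch. This is not the authors' implementation: the linear reward model, Bradley-Terry preference likelihood, particle posterior, and the gain_threshold cutoff below are all illustrative assumptions.

# Minimal sketch of online active pairwise preference querying (illustrative only).
# Assumes a linear reward r(a) = w . phi(a) over action features phi, a particle
# approximation of the posterior over reward weights w, a Bradley-Terry preference
# likelihood, and a fixed threshold on expected information gain that decides
# whether a query is worth presenting to the human.
import itertools
import numpy as np

def preference_likelihood(w, phi_a, phi_b):
    """P(human prefers action a over b | reward weights w), Bradley-Terry model."""
    return 1.0 / (1.0 + np.exp(-(phi_a - phi_b) @ w))

def expected_info_gain(particles, phi_a, phi_b):
    """Expected reduction in posterior entropy from asking the query (a vs. b)."""
    p_each = np.array([preference_likelihood(w, phi_a, phi_b) for w in particles])
    p_mean = p_each.mean()  # marginal probability the human picks a

    def entropy(p):
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -(p * np.log(p) + (1 - p) * np.log(1 - p))

    # Mutual information between the human's answer and the reward weights.
    return entropy(p_mean) - np.mean(entropy(p_each))

def maybe_query(particles, candidate_actions, features, gain_threshold=0.1):
    """Pick the most informative action pair; return None if it is not worth asking."""
    best_pair, best_gain = None, -np.inf
    for a, b in itertools.combinations(candidate_actions, 2):
        gain = expected_info_gain(particles, features[a], features[b])
        if gain > best_gain:
            best_pair, best_gain = (a, b), gain
    return best_pair if best_gain >= gain_threshold else None

def update_posterior(particles, phi_pref, phi_other):
    """Reweight and resample particles after the human prefers one action."""
    weights = np.array([preference_likelihood(w, phi_pref, phi_other) for w in particles])
    weights /= weights.sum()
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

In an online loop, such a sketch would call maybe_query at each decision point, present the returned pair to the human only when it is not None, and refine the reward posterior with update_posterior before acting, so that queries are posed sparingly and only when their expected information value justifies the interruption.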

