Learning Multimodal Rewards from Rankings

09/27/2021
by Vivek Myers, et al.

Learning from human feedback has been shown to be a useful approach for acquiring robot reward functions. However, expert feedback is often assumed to be drawn from an underlying unimodal reward function. This assumption does not always hold, e.g., in settings where multiple experts provide data, or where a single expert provides data for different tasks. We therefore go beyond learning a unimodal reward and focus on learning a multimodal reward function. We formulate multimodal reward learning as a mixture learning problem and develop a novel ranking-based learning approach, in which experts are only required to rank a given set of trajectories. Furthermore, as access to interaction data is often expensive in robotics, we develop an active querying approach to accelerate the learning process. We conduct experiments and user studies using a multi-task variant of OpenAI's LunarLander and a real Fetch robot, collecting data from multiple users with different preferences. The results suggest that our approach can efficiently learn multimodal reward functions and improves data-efficiency over benchmark methods that we adapt to our learning problem.
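To make the mixture-learning formulation concrete, below is a minimal sketch (not the authors' implementation) of fitting a mixture of linear reward functions from trajectory rankings with an EM-style loop, assuming a Plackett-Luce likelihood over rankings. The feature dimension, number of modes, learning rate, and iteration count are illustrative assumptions.

```python
# Sketch: EM-style fitting of a mixture of linear reward functions from
# rankings, under an assumed Plackett-Luce ranking likelihood. Not the
# paper's implementation; hyperparameters are placeholders.
import numpy as np

def plackett_luce_loglik(w, F):
    # F: (k, d) feature matrix of k trajectories, ordered best-to-worst.
    s = F @ w
    return sum(s[i] - np.logaddexp.reduce(s[i:]) for i in range(len(s) - 1))

def plackett_luce_grad(w, F):
    # Analytic gradient of the Plackett-Luce log-likelihood w.r.t. w.
    s = F @ w
    g = np.zeros_like(w)
    for i in range(len(s) - 1):
        p = np.exp(s[i:] - np.logaddexp.reduce(s[i:]))  # softmax over the tail
        g += F[i] - p @ F[i:]
    return g

def fit_mixture(rankings, n_modes=2, dim=4, iters=100, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_modes, dim))    # one reward vector per mode
    mix = np.full(n_modes, 1.0 / n_modes)  # mixing proportions
    for _ in range(iters):
        # E-step: responsibility of each mode for each observed ranking.
        ll = np.array([[plackett_luce_loglik(W[m], F) for m in range(n_modes)]
                       for F in rankings])
        log_r = np.log(mix) + ll
        log_r -= np.logaddexp.reduce(log_r, axis=1, keepdims=True)
        resp = np.exp(log_r)
        # M-step: responsibility-weighted gradient step for each mode.
        for m in range(n_modes):
            g = sum(resp[n, m] * plackett_luce_grad(W[m], F)
                    for n, F in enumerate(rankings))
            W[m] += lr * g / len(rankings)
        mix = resp.mean(axis=0)
    return W, mix
```

A quick synthetic check under the same assumptions: generate rankings from two ground-truth reward vectors, fit the mixture, and verify that the recovered modes separate the two preference profiles.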
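The active querying step can likewise be sketched with a simple disagreement heuristic as a stand-in for an information-gain criterion (the paper's actual acquisition function may differ): among candidate trajectory sets, prefer the one whose top-ranked trajectory the learned modes disagree about most.

```python
# Sketch of an active-query heuristic: choose the candidate query (a set of
# trajectory feature matrices) maximizing disagreement among learned modes
# about the best trajectory. This entropy proxy is an assumption, not
# necessarily the paper's acquisition function.
import numpy as np

def disagreement_score(W, mix, F):
    # Mixture-weighted distribution over which trajectory each mode ranks first.
    probs = np.zeros(len(F))
    for w, p in zip(W, mix):
        probs[np.argmax(F @ w)] += p
    probs = probs[probs > 0]
    return -(probs * np.log(probs)).sum()  # higher entropy = more disagreement

def select_query(W, mix, candidate_sets):
    # Pick the candidate set the user should rank next.
    return max(candidate_sets, key=lambda F: disagreement_score(W, mix, F))
```

Queries chosen this way tend to be informative about which mode generated a user's preferences, which is what lets active querying reduce the number of rankings needed.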

Related research

05/06/2020 - Active Preference-Based Gaussian Process Regression for Reward Learning
Designing reward functions is a challenging problem in AI and robotics. ...

06/21/2019 - Learning Reward Functions by Integrating Human Demonstrations and Preferences
Our goal is to accurately and efficiently learn reward functions for aut...

04/14/2021 - Reward function shape exploration in adversarial imitation learning: an empirical study
For adversarial imitation learning algorithms (AILs), no true rewards ar...

08/11/2020 - Maximizing BCI Human Feedback using Active Learning
Recent advancements in Learning from Human Feedback present an effective...

07/31/2019 - Inverse Reinforcement Learning with Multiple Ranked Experts
We consider the problem of learning to behave optimally in a Markov Deci...

06/24/2020 - Learning Reward Functions from Diverse Sources of Human Feedback: Optimally Integrating Demonstrations and Preferences
Reward functions are a common way to specify the objective of a robot. A...

05/28/2023 - Reward Collapse in Aligning Large Language Models
The extraordinary capabilities of large language models (LLMs) such as C...
