Learning Reward Functions by Integrating Human Demonstrations and Preferences

06/21/2019
by Malayandi Palan, et al.

Our goal is to accurately and efficiently learn reward functions for autonomous robots. Current approaches to this problem include inverse reinforcement learning (IRL), which uses expert demonstrations, and preference-based learning, which iteratively queries the user for her preferences between trajectories. In robotics, however, IRL often struggles because it is difficult to obtain high-quality demonstrations; conversely, preference-based learning is very inefficient, since it attempts to learn a continuous, high-dimensional function from binary feedback. We propose a new framework for reward learning, DemPref, that uses both demonstrations and preference queries to learn a reward function. Specifically, we (1) use the demonstrations to learn a coarse prior over the space of reward functions, reducing the effective size of the space from which queries are generated; and (2) use the demonstrations to ground the (active) query generation process, improving the quality of the generated queries. Our method alleviates the efficiency issues faced by standard preference-based learning methods and does not depend exclusively on (possibly low-quality) demonstrations. In numerical experiments, we find that DemPref is significantly more efficient than a standard active preference-based learning method. In a user study comparing our method to a standard IRL method, users rated the robot trained with DemPref as more successful at learning their desired behavior and preferred to use the DemPref system (over IRL) to train the robot.
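The abstract describes the two ingredients of DemPref only at a high level. The following is a minimal, hypothetical sketch of how a demonstration-based prior and active preference updates can be combined; it is not the authors' implementation. It assumes a reward that is linear in trajectory features, a Boltzmann-rational model for the demonstration, a logistic (Bradley-Terry) model for preference answers, a particle (sample-based) posterior over reward weights, and a simple "closest to a coin flip" heuristic for choosing queries. The trajectory features, the simulated user, and every name and parameter (`learn_reward`, `beta`, `n_queries`, ...) are made up for illustration, and the step of grounding queries in the demonstrations is not modelled.

```python
# Hypothetical sketch of a demonstration prior + active preference updates.
# Not the DemPref authors' code: the models, names and parameters are assumptions.
import math
import random


def unit_vector(d, rng):
    """Random direction in R^d (normalised Gaussian sample)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def normalise(log_w):
    """Convert unnormalised log-weights into probabilities."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


def learn_reward(seed=0, d=4, n_samples=2000, n_traj=40, n_pairs=60, n_queries=12, beta=5.0):
    rng = random.Random(seed)
    true_w = unit_vector(d, rng)  # hidden reward weights of a simulated user
    # Hypothetical trajectory features phi(xi); the reward is assumed linear: r = w . phi.
    feats = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_traj)]
    W = [unit_vector(d, rng) for _ in range(n_samples)]  # particles over reward weights
    scores = [[beta * dot(w, f) for f in feats] for w in W]  # beta * w . phi, cached

    # (1) Coarse prior from one demonstration, assumed Boltzmann-rational:
    #     P(xi_D | w) = exp(beta w.phi(xi_D)) / sum_t exp(beta w.phi(xi_t)).
    demo = max(range(n_traj), key=lambda t: dot(true_w, feats[t]))
    log_w = []
    for s in scores:
        m = max(s)
        log_z = m + math.log(sum(math.exp(x - m) for x in s))
        log_w.append(s[demo] - log_z)

    # (2) Preference queries, assumed logistic (Bradley-Terry):
    #     P(A preferred over B | w) = sigmoid(beta w.(phi_A - phi_B)).
    pairs = [tuple(rng.sample(range(n_traj), 2)) for _ in range(n_pairs)]

    def p_first_preferred(pair, probs):
        """Posterior-predictive probability that the user prefers pair[0]."""
        a, b = pair
        return sum(p * math.exp(log_sigmoid(s[a] - s[b])) for p, s in zip(probs, scores))

    for _ in range(n_queries):
        probs = normalise(log_w)
        # Active selection heuristic: ask the pair whose answer is closest to a coin flip.
        i, j = min(pairs, key=lambda pair: abs(p_first_preferred(pair, probs) - 0.5))
        prefers_i = dot(true_w, feats[i]) > dot(true_w, feats[j])  # simulated answer
        for k, s in enumerate(scores):
            margin = s[i] - s[j] if prefers_i else s[j] - s[i]
            log_w[k] += log_sigmoid(margin)

    probs = normalise(log_w)
    w_hat = [sum(p * w[c] for p, w in zip(probs, W)) for c in range(d)]
    norm = math.sqrt(dot(w_hat, w_hat))
    return true_w, [x / norm for x in w_hat], probs
```

Calling `learn_reward()` returns the simulated user's true weights, the normalised posterior-mean estimate, and the particle weights; `dot(true_w, w_hat)` is the cosine similarity between the true and estimated reward directions. Conditioning on the demonstration first shrinks the set of plausible weights, so the subsequent binary answers only have to discriminate within that smaller region, which mirrors the efficiency argument made in the abstract.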

Related research
05/06/2020

Active Preference-Based Gaussian Process Regression for Reward Learning

Designing reward functions is a challenging problem in AI and robotics. ...
10/15/2020

Human-guided Robot Behavior Learning: A GAN-assisted Preference-based Reinforcement Learning Approach

Human demonstrations can provide trustful samples to train reinforcement...
10/10/2018

Batch Active Preference-Based Learning of Reward Functions

Data generation and labeling are usually an expensive part of learning f...
03/03/2021

Preference-based Learning of Reward Function Features

Preference-based learning of reward functions, where the reward function...
02/05/2018

Learning from Richer Human Guidance: Augmenting Comparison-Based Learning with Feature Queries

We focus on learning the desired objective function for a robot. Althoug...
08/16/2021

APReL: A Library for Active Preference-based Reward Learning Algorithms

Reward learning is a fundamental problem in robotics to have robots that...
09/27/2021

Learning Multimodal Rewards from Rankings

Learning from human feedback has shown to be a useful approach in acquir...