Preference-based Learning of Reward Function Features

03/03/2021
by   Sydney M. Katz, et al.

Preference-based learning of reward functions, in which the reward function is learned from comparison data, has been well studied for complex robotic tasks such as autonomous driving. Existing algorithms have focused on learning reward functions that are linear in a set of trajectory features. The features are typically hand-coded, and preference-based learning is used to determine a particular user's relative weighting for each feature. Designing a representative set of features to encode reward is challenging, and a poorly chosen set can yield inaccurate models that fail to capture the user's preferences or to perform the task properly. In this paper, we present a method to learn both the relative weighting among features and additional features that help encode a user's reward function. The additional features are modeled as a neural network that is trained on the data from pairwise comparison queries. We apply our method to a driving scenario used in previous work and compare its predictive power to that of a model using only hand-coded features. We perform additional analysis to interpret the learned features and examine the optimal trajectories. Our results show that adding a learned feature to the reward model enhances both its predictive power and expressiveness, producing unique results for each user.
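As a rough illustration of the reward model described in the abstract, the sketch below combines hand-coded features with one neural-network feature in a linear reward and scores a pairwise comparison. It is not the authors' implementation. The logistic (Bradley-Terry) preference likelihood, the one-hidden-layer architecture, the two hand-coded features, and all names (`hand_features`, `learned_feature`, `preference_prob`, `negative_log_likelihood`) are assumptions made for the example; the abstract does not specify them.

```python
import numpy as np

def hand_features(traj):
    # Hypothetical hand-coded features of a flattened trajectory.
    return np.array([traj.mean(), np.abs(traj).max()])

def learned_feature(traj, params):
    # One-hidden-layer network producing a single scalar feature (assumed architecture).
    W1, b1, W2, b2 = params
    hidden = np.tanh(traj @ W1 + b1)
    return hidden @ W2 + b2

def reward(traj, w, params):
    # Reward is linear in [hand-coded features, learned feature].
    phi = np.append(hand_features(traj), learned_feature(traj, params))
    return w @ phi

def preference_prob(traj_a, traj_b, w, params):
    # Assumed logistic (Bradley-Terry) likelihood that the user prefers A to B.
    diff = reward(traj_a, w, params) - reward(traj_b, w, params)
    return 1.0 / (1.0 + np.exp(-diff))

def negative_log_likelihood(queries, answers, w, params):
    # answers[i] is 1 if the user preferred the first trajectory of query i, else 0.
    total = 0.0
    for (traj_a, traj_b), y in zip(queries, answers):
        p = preference_prob(traj_a, traj_b, w, params)
        total -= y * np.log(p) + (1 - y) * np.log(1.0 - p)
    return total

# Toy usage with random trajectories and randomly initialized network weights.
rng = np.random.default_rng(0)
dim, hidden_units = 6, 4
params = (rng.normal(size=(dim, hidden_units)), np.zeros(hidden_units),
          rng.normal(size=hidden_units), 0.0)
w = np.array([0.5, -1.0, 1.0])  # two hand-coded weights, one learned-feature weight
traj_a, traj_b = rng.normal(size=dim), rng.normal(size=dim)
p_ab = preference_prob(traj_a, traj_b, w, params)
loss = negative_log_likelihood([(traj_a, traj_b)], [1], w, params)
```

The sketch covers only the forward pass and the loss. Fitting the weights `w` and the network parameters jointly would mean minimizing this loss over a user's recorded comparison answers, for example with a gradient-based optimizer.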

research
06/21/2019

Learning Reward Functions by Integrating Human Demonstrations and Preferences

Our goal is to accurately and efficiently learn reward functions for aut...
research
05/06/2020

Active Preference-Based Gaussian Process Regression for Reward Learning

Designing reward functions is a challenging problem in AI and robotics. ...
research
12/10/2020

Understanding Learned Reward Functions

In many real-world tasks, it is not possible to procedurally specify an ...
research
01/09/2023

On The Fragility of Learned Reward Functions

Reward functions are notoriously difficult to specify, especially for ta...
research
08/16/2021

APReL: A Library for Active Preference-based Reward Learning Algorithms

Reward learning is a fundamental problem in robotics to have robots that...
research
07/12/2019

Learning an Urban Air Mobility Encounter Model from Expert Preferences

Airspace models have played an important role in the development and eva...
research
12/07/2019

Driving Style Encoder: Situational Reward Adaptation for General-Purpose Planning in Automated Driving

General-purpose planning algorithms for automated driving combine missio...
