Models of human preference for learning reward functions

06/05/2022
by W. Bradley Knox, et al.

The utility of reinforcement learning is limited by the alignment of reward functions with the interests of human stakeholders. One promising method for alignment is to learn the reward function from human-generated preferences between pairs of trajectory segments. These human preferences are typically assumed to be informed solely by partial return, the sum of rewards along each segment. We find this assumption to be flawed and propose modeling preferences instead as arising from a different statistic: each segment's regret, a measure of a segment's deviation from optimal decision-making. Given infinitely many preferences generated according to regret, we prove that we can identify a reward function equivalent to the reward function that generated those preferences. We also prove that the previous partial return model lacks this identifiability property without preference noise that reveals rewards' relative proportions, and we empirically show that our proposed regret preference model outperforms it with finite training data in otherwise the same setting. Additionally, our proposed regret preference model better predicts real human preferences and also learns reward functions from these preferences that lead to policies that are better human-aligned. Overall, this work establishes that the choice of preference model is impactful, and our proposed regret preference model provides an improvement upon a core assumption of recent research.
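As a rough illustration of the distinction the abstract draws, the sketch below contrasts a logistic (Bradley-Terry-style) preference probability driven by each segment's partial return with one driven by a simple regret statistic. This is a minimal sketch, not the paper's exact formulation: the helper names (partial_return, segment_regret, preference_prob), the toy numbers, and the specific regret instantiation (the gap between the optimal value of a segment's start state and the return achieved plus the optimal value of its end state, in an undiscounted setting with a known optimal value function) are assumptions made here for illustration.

import numpy as np

def partial_return(rewards):
    """Partial return of a segment: the sum of its rewards."""
    return float(np.sum(rewards))

def segment_regret(rewards, values):
    """One simple instantiation of a segment's regret (an assumption, not the
    paper's exact definition): the gap between the optimal value of the
    segment's start state and the return achieved plus the optimal value of
    its end state. `values` holds V*(s_0), ..., V*(s_T) for the visited states.
    """
    return float(values[0] - (np.sum(rewards) + values[-1]))

def preference_prob(stat_a, stat_b, higher_is_better=True):
    """Logistic (Bradley-Terry-style) probability that segment A is preferred."""
    sign = 1.0 if higher_is_better else -1.0
    return 1.0 / (1.0 + np.exp(-sign * (stat_a - stat_b)))

# Two toy segments: rewards along each segment and V* at the visited states.
# Segment A follows optimal behavior; segment B collects more reward but
# squanders a much more valuable starting state.
rew_a, val_a = np.array([1.0, 1.0]), np.array([3.0, 2.0, 1.0])
rew_b, val_b = np.array([3.0, 0.0]), np.array([10.0, 2.0, 0.0])

# Partial-return model: preferences driven by summed reward (higher is better).
p_return = preference_prob(partial_return(rew_a), partial_return(rew_b))

# Regret model: preferences driven by deviation from optimality (lower is better).
p_regret = preference_prob(segment_regret(rew_a, val_a),
                           segment_regret(rew_b, val_b),
                           higher_is_better=False)

print(f"P(A preferred) under partial return: {p_return:.3f}")  # B favored
print(f"P(A preferred) under regret:         {p_regret:.3f}")  # A favored

On these toy numbers the two models disagree: partial return favors segment B (higher summed reward), while the regret statistic favors segment A (closer to optimal decision-making), which is the kind of divergence the abstract argues matters when learning reward functions from human preferences.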


