Impossibility of deducing preferences and rationality from human policy

12/15/2017
by Stuart Armstrong, et al.

Inverse reinforcement learning (IRL) attempts to infer human rewards or preferences from observed behaviour. However, human planning systematically deviates from rationality. Though some IRL work assumes humans are noisily rational, there has been little analysis of the general problem of inferring the reward of a human of unknown rationality. Observed behaviour can, in principle, be decomposed into two components: a reward function and a planning algorithm that maps reward functions to policies. Both of these components must be inferred from behaviour. This paper presents a "No Free Lunch" theorem in this area, showing that, without "normative" assumptions beyond the data, nothing about the human reward function can be deduced from human behaviour. Unlike most No Free Lunch theorems, this one cannot be alleviated by regularising with simplicity assumptions, since the simplest hypotheses are generally degenerate. The paper then sketches how one might begin to use normative assumptions to get around the problem; without them, solving the general IRL problem is impossible. The planner-reward formalism can also be used to encode what it means for an agent to manipulate or override human preferences.
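To make the degeneracy behind this theorem concrete, here is a minimal sketch in Python (a toy example, not code from the paper; the one-step decision problem, the planner functions, and the reward values are all hypothetical). It exhibits three mutually incompatible (planner, reward) pairs, a rational planner with reward R, an anti-rational planner with -R, and an indifferent planner with zero reward, that all produce the same observed policy.

# Toy illustration of the planner-reward degeneracy: three incompatible
# (planner, reward) decompositions that yield identical behaviour.
# The environment and values here are hypothetical, not from the paper.

def rational(reward):
    """Picks the action that maximises the given reward function."""
    return max(reward, key=reward.get)

def anti_rational(reward):
    """Picks the action that minimises the given reward function."""
    return min(reward, key=reward.get)

def indifferent(fixed_action):
    """Ignores the reward entirely and always emits a fixed action."""
    return lambda reward: fixed_action

# A one-step decision problem with two actions; the human is observed
# to choose "left".
R = {"left": 1.0, "right": 0.0}          # candidate reward
neg_R = {a: -r for a, r in R.items()}    # its negation
zero = {a: 0.0 for a in R}               # total indifference

pairs = [
    (rational, R),                # rational human who prefers "left"
    (anti_rational, neg_R),       # anti-rational human who prefers "right"
    (indifferent("left"), zero),  # human with no preferences at all
]

# Every decomposition reproduces the observed policy exactly.
for planner, reward in pairs:
    assert planner(reward) == "left"
print("All three (planner, reward) pairs are behaviourally identical.")

All three pairs fit the observation equally well while attributing opposite (or no) preferences to the human, and since the indifferent decomposition is among the simplest hypotheses, a simplicity prior alone cannot rule out the degenerate options; some normative assumption is needed to break the tie.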


Related research

Misspecification in Inverse Reinforcement Learning (12/06/2022)
The aim of Inverse Reinforcement Learning (IRL) is to infer a reward fun...

Deceptive Reinforcement Learning for Privacy-Preserving Planning (02/05/2021)
In this paper, we study the problem of deceptive reinforcement learning ...

Content Masked Loss: Human-Like Brush Stroke Planning in a Reinforcement Learning Painting Agent (12/18/2020)
The objective of most Reinforcement Learning painting agents is to minim...

Accounting for Human Learning when Inferring Human Preferences (11/11/2020)
Inverse reinforcement learning (IRL) is a common technique for inferring...

LESS is More: Rethinking Probabilistic Models of Human Behavior (01/13/2020)
Robots need models of human behavior for both inferring human goals and ...

Where Do You Think You're Going?: Inferring Beliefs about Dynamics from Behavior (05/21/2018)
Inferring intent from observed behavior has been studied extensively wit...

Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased (02/03/2023)
There is a recent trend of applying multi-agent reinforcement learning (...
