Misspecification in Inverse Reinforcement Learning

12/06/2022
by   Joar Skalse, et al.
0

The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function R from a policy π. To do this, we need a model of how π relates to R. In the current literature, the most common models are optimality, Boltzmann rationality, and causal entropy maximisation. One of the primary motivations behind IRL is to infer human preferences from human behaviour. However, the true relationship between human preferences and human behaviour is much more complex than any of the models currently used in IRL. This means that they are misspecified, which raises the worry that they might lead to unsound inferences if applied to real-world data. In this paper, we provide a mathematical analysis of how robust different IRL models are to misspecification, and answer precisely how the demonstrator policy may differ from each of the standard models before that model leads to faulty inferences about the reward function R. We also introduce a framework for reasoning about misspecification in IRL, together with formal tools that can be used to easily derive the misspecification robustness of new IRL models.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/15/2017

Impossibility of deducing preferences and rationality from human policy

Inverse reinforcement learning (IRL) attempts to infer human rewards or ...
research
02/05/2021

Deceptive Reinforcement Learning for Privacy-Preserving Planning

In this paper, we study the problem of deceptive reinforcement learning ...
research
11/11/2020

Accounting for Human Learning when Inferring Human Preferences

Inverse reinforcement learning (IRL) is a common technique for inferring...
research
01/26/2023

Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons

We provide a theoretical framework for Reinforcement Learning with Human...
research
01/02/2020

Joint Goal and Strategy Inference across Heterogeneous Demonstrators via Reward Network Distillation

Reinforcement learning (RL) has achieved tremendous success as a general...
research
11/16/2019

Generalized Maximum Causal Entropy for Inverse Reinforcement Learning

We consider the problem of learning from demonstrated trajectories with ...
research
10/19/2022

Scaling Laws for Reward Model Overoptimization

In reinforcement learning from human feedback, it is common to optimize ...

Please sign up or login with your details

Forgot password? Click here to reset