Invariance in Policy Optimisation and Partial Identifiability in Reward Learning

03/14/2022
by Joar Skalse, et al.

It's challenging to design reward functions for complex, real-world tasks. Reward learning lets one instead infer reward functions from data. However, multiple reward functions often fit the data equally well, even in the infinite-data limit. Prior work often considers reward functions to be uniquely recoverable, by imposing additional assumptions on data sources. By contrast, we formally characterise the partial identifiability of popular data sources, including demonstrations and trajectory preferences, under multiple common sets of assumptions. We analyse the impact of this partial identifiability on downstream tasks such as policy optimisation, including under changes in environment dynamics. We unify our results in a framework for comparing data sources and downstream tasks by their invariances, with implications for the design and selection of data sources for reward learning.
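
As a concrete illustration of how several reward functions can fit the same data, the sketch below builds a toy three-state MDP and compares the optimal policy under a base reward, a potential-shaped copy, and a positively scaled copy. Everything in it is an assumption made for illustration and not taken from the paper: the transition table, the reward values, the potential `PHI`, and all function names. Potential shaping is used only as one well-known transformation that leaves optimal policies unchanged; a learner that observes only optimal behaviour could not tell these three rewards apart.

```python
# Toy deterministic MDP (hypothetical values, chosen only for illustration).
GAMMA = 0.9
NEXT_STATE = [[1, 2], [0, 2], [2, 0]]          # NEXT_STATE[s][a] -> s'
REWARD = [[0.0, 1.0], [2.0, 0.0], [0.5, -1.0]]  # REWARD[s][a]
PHI = [1.0, -2.0, 0.5]                          # arbitrary state potential


def base_reward(s, a, s_next):
    """Base reward; depends only on (s, a) in this toy example."""
    return REWARD[s][a]


def shaped_reward(s, a, s_next):
    """Base reward plus the potential-shaping term gamma*phi(s') - phi(s)."""
    return base_reward(s, a, s_next) + GAMMA * PHI[s_next] - PHI[s]


def scaled_reward(s, a, s_next):
    """Base reward multiplied by a positive constant."""
    return 2.0 * base_reward(s, a, s_next)


def optimal_policy(reward_fn, iterations=1000):
    """Run value iteration, then return the greedy action for each state."""
    n_states = len(NEXT_STATE)
    n_actions = len(NEXT_STATE[0])
    values = [0.0] * n_states

    def q_value(s, a):
        s_next = NEXT_STATE[s][a]
        return reward_fn(s, a, s_next) + GAMMA * values[s_next]

    for _ in range(iterations):
        values = [max(q_value(s, a) for a in range(n_actions))
                  for s in range(n_states)]
    return [max(range(n_actions), key=lambda a: q_value(s, a))
            for s in range(n_states)]


if __name__ == "__main__":
    for name, fn in [("base", base_reward),
                     ("shaped", shaped_reward),
                     ("scaled", scaled_reward)]:
        print(name, optimal_policy(fn))
```

All three rewards yield the same greedy policy in this toy MDP, even though their numerical values differ at every transition.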


