Conservative Agency via Attainable Utility Preservation

02/26/2019
by Alexander Matt Turner, et al.

Reward functions are often misspecified. An agent optimizing an incorrect reward function can change its environment in large, undesirable, and potentially irreversible ways. Work on impact measurement seeks a means of identifying (and thereby avoiding) large changes to the environment. We propose a novel impact measure which induces conservative, effective behavior across a range of situations. The approach attempts to preserve the attainable utility of auxiliary objectives. We evaluate our proposal on an array of benchmark tasks and show that it matches or outperforms relative reachability, the state-of-the-art in impact measurement.
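The abstract describes penalizing changes to the attainable utility of auxiliary objectives. As an illustration only (the function name, penalty weight, and no-op baseline below are assumptions, not the paper's exact formulation), an AUP-style penalty can be sketched as scaling the primary reward down by the average absolute shift in auxiliary Q-values relative to doing nothing:

```python
def aup_reward(primary_reward, q_aux_action, q_aux_noop, lam=0.1):
    """Sketch of an AUP-style penalized reward (hypothetical helper).

    primary_reward -- R(s, a) for the chosen action
    q_aux_action   -- auxiliary Q-values Q_i(s, a), one per auxiliary objective
    q_aux_noop     -- auxiliary Q-values Q_i(s, no-op) for the do-nothing baseline
    lam            -- penalty weight (assumed default)
    """
    # Mean absolute change in attainable auxiliary utility vs. the no-op action.
    penalty = sum(abs(a - n) for a, n in zip(q_aux_action, q_aux_noop)) / len(q_aux_action)
    # Subtracting the penalty discourages actions that shift the agent's
    # ability to pursue the auxiliary objectives, in either direction.
    return primary_reward - lam * penalty
```

For example, an action that leaves all auxiliary Q-values unchanged incurs no penalty, while one that sharply changes them is discounted even if its primary reward is high.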


Related research:

- Consequences of Misaligned AI (02/07/2021): AI systems often rely on two key components: a specified goal or reward ...
- Challenges for Using Impact Regularizers to Avoid Negative Side Effects (01/29/2021): Designing reward functions for reinforcement learning is difficult: besi...
- Avoiding Side Effects in Complex Environments (06/11/2020): Reward function specification can be difficult, even in simple environme...
- Measuring and avoiding side effects using relative reachability (06/04/2018): How can we design reinforcement learning agents that avoid causing unnec...
- Avoiding Side Effects By Considering Future Tasks (10/15/2020): Designing reward functions is difficult: the designer has to specify wha...
