Avoiding Side Effects By Considering Future Tasks

10/15/2020
by Victoria Krakovna et al.

Designing reward functions is difficult: the designer has to specify what to do (what it means to complete the task) as well as what not to do (side effects that should be avoided while completing the task). To alleviate the burden on the reward designer, we propose an algorithm to automatically generate an auxiliary reward function that penalizes side effects. This auxiliary objective rewards the ability to complete possible future tasks, which decreases if the agent causes side effects during the current task. The future task reward can also give the agent an incentive to interfere with events in the environment that make future tasks less achievable, such as irreversible actions by other agents. To avoid this interference incentive, we introduce a baseline policy that represents a default course of action (such as doing nothing), and use it to filter out future tasks that are not achievable by default. We formally define interference incentives and show that the future task approach with a baseline policy avoids these incentives in the deterministic case. Using gridworld environments that test for side effects and interference, we show that our method avoids interference and is more effective for avoiding side effects than the common approach of penalizing irreversible actions.
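
To make the idea concrete, here is a minimal sketch (not the paper's implementation) of an auxiliary reward based on future-task achievability with baseline filtering. The helpers `sample_future_tasks` and `achievability`, and the weight `beta`, are assumptions for illustration: `achievability(s, g)` stands in for an estimate of how well a task with goal `g` could be completed starting from state `s`.

```python
def future_task_auxiliary_reward(
    state,
    baseline_state,
    sample_future_tasks,  # assumed helper: yields candidate goal states
    achievability,        # assumed helper: achievability of goal g from state s, in [0, 1]
    n_samples=100,
):
    """Sketch of a future-task auxiliary reward with baseline filtering.

    Rewards the agent's ability to complete sampled future tasks,
    counting only tasks that remain achievable from the state reached
    by a baseline policy (such as doing nothing). Filtering by the
    baseline removes the incentive to interfere with events that would
    make future tasks unachievable by default.
    """
    total, counted = 0.0, 0
    for goal in sample_future_tasks(n_samples):
        # Skip future tasks the baseline policy could not achieve anyway;
        # the agent gets no credit (or blame) for them.
        if achievability(baseline_state, goal) <= 0.0:
            continue
        total += achievability(state, goal)
        counted += 1
    # Neutral reward when no future task is achievable by default.
    return total / counted if counted else 0.0


# Usage sketch: add the auxiliary term to the current task reward,
# weighted by an assumed hyperparameter beta, e.g.
#   r_total = r_task + beta * future_task_auxiliary_reward(s, s_baseline, ...)
beta = 0.1
```

Per the abstract, filtering by the baseline policy is what removes the interference incentive, and the paper proves this formally for the deterministic case.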

