Avoiding Side Effects By Considering Future Tasks

10/15/2020
by Victoria Krakovna, et al.

Designing reward functions is difficult: the designer has to specify what to do (what it means to complete the task) as well as what not to do (side effects that should be avoided while completing the task). To alleviate the burden on the reward designer, we propose an algorithm to automatically generate an auxiliary reward function that penalizes side effects. This auxiliary objective rewards the ability to complete possible future tasks, which decreases if the agent causes side effects during the current task. However, the future task reward can also give the agent an incentive to interfere with events in the environment that make future tasks less achievable, such as irreversible actions by other agents. To avoid this interference incentive, we introduce a baseline policy that represents a default course of action (such as doing nothing), and use it to filter out future tasks that are not achievable by default. We formally define interference incentives and show that the future task approach with a baseline policy avoids these incentives in the deterministic case. Using gridworld environments that test for side effects and interference, we show that our method avoids interference and is more effective for avoiding side effects than the common approach of penalizing irreversible actions.
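The idea in the abstract can be illustrated with a minimal sketch. This is not the paper's exact formulation: it assumes a small deterministic environment given as a transition graph, measures "ability to complete future tasks" simply as reachability of hypothetical goal states, and uses the baseline (e.g. a do-nothing rollout) to filter out goals that are not achievable by default. The function names (`reachable`, `future_task_reward`) and the toy vase example are invented for illustration.

```python
from collections import deque

def reachable(transitions, start):
    """Set of states reachable from `start` in a deterministic environment.
    `transitions` maps a state to the list of its successor states."""
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in transitions.get(s, ()):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

def future_task_reward(transitions, agent_state, baseline_state, goal_states):
    """Fraction of hypothetical future-task goals the agent can still reach,
    counted only over goals reachable from the baseline state -- filtering
    out tasks that are not achievable by default, which removes the
    incentive to interfere to make extra tasks achievable."""
    from_agent = reachable(transitions, agent_state)
    from_baseline = reachable(transitions, baseline_state)
    achievable_by_default = [g for g in goal_states if g in from_baseline]
    if not achievable_by_default:
        return 1.0  # no default-achievable future tasks to lose
    preserved = sum(1 for g in achievable_by_default if g in from_agent)
    return preserved / len(achievable_by_default)

# Toy example: from A the agent can step to B (goal cell) or to C
# (breaking a vase); C is absorbing, so breaking the vase is irreversible.
transitions = {"A": ["B", "C"], "B": ["A"], "C": []}

# Baseline (doing nothing) stays at A, from which the goal B is reachable.
print(future_task_reward(transitions, "C", "A", ["B"]))  # 0.0: side effect penalized
print(future_task_reward(transitions, "B", "A", ["B"]))  # 1.0: no side effect
```

In this sketch a side effect (ending in the absorbing state C) drops the auxiliary reward to zero, while a goal that the baseline could not reach anyway would be excluded from the count, so the agent gains nothing by interfering to make it reachable.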


