Avoiding Side Effects By Considering Future Tasks

by Victoria Krakovna, et al.

Designing reward functions is difficult: the designer has to specify what to do (what it means to complete the task) as well as what not to do (side effects that should be avoided while completing the task). To alleviate the burden on the reward designer, we propose an algorithm to automatically generate an auxiliary reward function that penalizes side effects. This auxiliary objective rewards the ability to complete possible future tasks, which decreases if the agent causes side effects during the current task. However, the future task reward can also give the agent an incentive to interfere with events in the environment that make future tasks less achievable, such as irreversible actions by other agents. To avoid this interference incentive, we introduce a baseline policy that represents a default course of action (such as doing nothing), and use it to filter out future tasks that are not achievable by default. We formally define interference incentives and show that the future task approach with a baseline policy avoids these incentives in the deterministic case. Using gridworld environments that test for side effects and interference, we show that our method avoids interference and is more effective for avoiding side effects than the common approach of penalizing irreversible actions.
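The core idea can be illustrated with a toy sketch (this is an illustrative simplification, not the authors' algorithm): in a deterministic environment, treat each candidate goal state as a possible future task, filter out goals that the baseline policy (e.g. doing nothing) could not reach anyway, and reward the agent for the fraction of the remaining goals it can still reach. The environment, state names, and `future_task_reward` helper below are all hypothetical.

```python
from collections import deque

def reachable(graph, start):
    """All states reachable from `start` in a deterministic transition graph."""
    seen = {start}
    frontier = deque([start])
    while frontier:
        s = frontier.popleft()
        for t in graph.get(s, []):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen

def future_task_reward(graph, agent_state, baseline_state, candidate_goals):
    """Fraction of future goals achievable under the baseline policy that the
    agent can still reach. Goals unreachable by default are filtered out, so
    the agent gains nothing by interfering with events it didn't cause."""
    default_goals = candidate_goals & reachable(graph, baseline_state)
    if not default_goals:
        return 1.0
    agent_reach = reachable(graph, agent_state)
    return sum(g in agent_reach for g in default_goals) / len(default_goals)

# Toy environment: breaking a vase makes the future task goal_C unreachable.
graph = {
    "start": ["vase_intact", "vase_broken"],
    "vase_intact": ["goal_C", "goal_D"],
    "vase_broken": ["goal_D"],  # goal_C is lost once the vase is broken
}
goals = {"goal_C", "goal_D"}
# Baseline = doing nothing (staying at "start"): both future tasks reachable.
print(future_task_reward(graph, "vase_broken", "start", goals))  # 0.5
print(future_task_reward(graph, "vase_intact", "start", goals))  # 1.0
```

The irreversible action (breaking the vase) halves the auxiliary reward, while the side-effect-free path keeps it at 1.0; the baseline filtering step is what removes the interference incentive the abstract describes.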






