Challenges for Using Impact Regularizers to Avoid Negative Side Effects

01/29/2021
by David Lindner, et al.

Designing reward functions for reinforcement learning is difficult: besides specifying which behavior is rewarded for a task, the reward also has to discourage undesired outcomes. Misspecified reward functions can lead to unintended negative side effects and to unsafe behavior overall. To overcome this problem, recent work has proposed augmenting the specified reward function with an impact regularizer that discourages behavior with a large impact on the environment. Although initial results with impact regularizers seem promising in mitigating some types of side effects, important challenges remain. In this paper, we examine the main current challenges of impact regularizers and relate them to fundamental design decisions. We discuss in detail which challenges recent approaches address and which remain unsolved. Finally, we explore promising directions for overcoming the unsolved challenges in preventing negative side effects with impact regularizers.
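
To make the idea of augmenting a reward with an impact regularizer concrete, here is a minimal sketch. It assumes a simple penalty form, task reward minus a weighted deviation from a baseline state. The function names, the L1 deviation measure, the choice of baseline, and the weight are all illustrative assumptions and are not taken from the paper.

```python
from typing import Callable, Sequence

State = Sequence[float]


def l1_deviation(state: State, baseline: State) -> float:
    """Hypothetical deviation measure: L1 distance between state features."""
    return sum(abs(s - b) for s, b in zip(state, baseline))


def regularized_reward(
    task_reward: float,
    state: State,
    baseline: State,
    weight: float = 1.0,
    deviation: Callable[[State, State], float] = l1_deviation,
) -> float:
    """Task reward minus a weighted impact penalty (assumed form)."""
    return task_reward - weight * deviation(state, baseline)


# The agent earns a task reward of 1.0 but changed one state feature by 1.0
# relative to the baseline, so a weight of 0.5 subtracts 0.5.
penalized = regularized_reward(1.0, [1.0, 0.0], [0.0, 0.0], weight=0.5)

# With no deviation from the baseline, the task reward is unchanged.
unpenalized = regularized_reward(1.0, [0.0, 0.0], [0.0, 0.0], weight=0.5)
```

In this sketch, a larger weight makes the agent more conservative, while a weight of zero recovers the original, possibly misspecified reward.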


research
11/08/2017

Inverse Reward Design

Autonomous agents optimize the reward function we give them. What they d...
research
09/09/2018

Active Inverse Reward Design

Reward design, the problem of selecting an appropriate reward function f...
research
02/26/2019

Conservative Agency via Attainable Utility Preservation

Reward functions are often misspecified. An agent optimizing an incorrec...
research
01/08/2023

Learning Symbolic Representations for Reinforcement Learning of Non-Markovian Behavior

Many real-world reinforcement learning (RL) problems necessitate learnin...
research
06/11/2020

Avoiding Side Effects in Complex Environments

Reward function specification can be difficult, even in simple environme...
research
04/28/2020

Learned Garbage Collection

Several programming languages use garbage collectors (GCs) to automatica...
research
06/18/2018

A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress

Inverse reinforcement learning is the problem of inferring the reward fu...
