Measuring and avoiding side effects using relative reachability

06/04/2018
by Victoria Krakovna, et al.

How can we design reinforcement learning agents that avoid causing unnecessary disruptions to their environment? We argue that current approaches to penalizing side effects can introduce bad incentives in tasks that require irreversible actions, and in environments that contain sources of change other than the agent. For example, some approaches give the agent an incentive to prevent any irreversible changes in the environment, including the actions of other agents. We introduce a general definition of side effects, based on relative reachability of states compared to a default state, that avoids these undesirable incentives. Using a set of gridworld experiments illustrating relevant scenarios, we empirically compare relative reachability to penalties based on existing definitions and show that it is the only penalty among those tested that produces the desired behavior in all the scenarios.
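The idea behind the relative reachability penalty can be illustrated with a toy sketch. The code below is illustrative, not the paper's implementation: it models a deterministic environment as a directed graph of states, measures discounted reachability by BFS shortest-path distance, and penalizes only reductions in reachability relative to a baseline state. The function names, the discount value, and the "vase" example states are all assumptions chosen for illustration.

```python
from collections import deque

def reachability(graph, source, gamma=0.95):
    """Discounted reachability of every state from `source`:
    gamma ** (shortest-path length), or 0.0 if unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        s = queue.popleft()
        for t in graph.get(s, []):
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return {s: gamma ** dist[s] if s in dist else 0.0 for s in graph}

def relative_reachability_penalty(graph, current, baseline, gamma=0.95):
    """Average reduction in discounted reachability of all states,
    compared to the baseline state; only reductions are penalized."""
    r_cur = reachability(graph, current, gamma)
    r_base = reachability(graph, baseline, gamma)
    return sum(max(r_base[s] - r_cur[s], 0.0) for s in graph) / len(graph)

# Toy example: breaking a vase is irreversible, so the intact state
# becomes unreachable; putting the vase down is reversible.
graph = {
    "start": ["vase_intact", "vase_broken"],
    "vase_intact": ["start"],  # reversible action
    "vase_broken": [],         # irreversible: nothing reachable afterwards
}
p_break = relative_reachability_penalty(graph, "vase_broken", "start")
p_keep = relative_reachability_penalty(graph, "vase_intact", "start")
assert p_break > p_keep  # the irreversible action incurs the larger penalty
```

Because the penalty only counts *reductions* in reachability relative to the baseline, an agent is not rewarded for freezing the environment or interfering with other sources of change, which is the failure mode of simpler irreversibility penalties discussed above.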

