Parametrically Retargetable Decision-Makers Tend To Seek Power

06/27/2022
by Alexander Matt Turner, et al.

If capable AI agents are generally incentivized to seek power in service of the objectives we specify for them, then these systems will pose enormous risks, in addition to enormous benefits. In fully observable environments, most reward functions have an optimal policy which seeks power by keeping options open and staying alive. However, the real world is neither fully observable, nor will agents be perfectly optimal. We consider a range of models of AI decision-making, from optimal, to random, to choices informed by learning and interacting with an environment. We discover that many decision-making functions are retargetable, and that retargetability is sufficient to cause power-seeking tendencies. Our functional criterion is simple and broad. We show that a range of qualitatively dissimilar decision-making procedures incentivize agents to seek power. We demonstrate the flexibility of our results by reasoning about learned policy incentives in Montezuma's Revenge. These results suggest a safety risk: Eventually, highly retargetable training procedures may train real-world agents which seek power over humans.
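The retargetability idea can be made concrete with a toy simulation. The sketch below is our illustration, not code or notation from the paper: it models a decision-maker that picks whichever outcome its reward parameters rank highest, and shows that under symmetrically sampled rewards, the larger option set (a crude stand-in for "power") is chosen in proportion to its size. The option names and set sizes are invented for the example.

```python
# A minimal sketch (assumed setup, not the paper's formal construction):
# an argmax decision-maker is "retargetable" because permuting its reward
# parameters over outcomes permutes which outcome it chooses. Symmetry of
# the parameter space then makes most parameter settings favor the larger
# option set.
import random

small_set = ["shutdown"]                  # 1 outcome: fewer options
large_set = ["explore_a", "explore_b",    # 3 outcomes: more options,
             "explore_c"]                 # keeping more futures open
outcomes = small_set + large_set

def optimal_choice(reward):
    """Pick the outcome with the highest reward (ties have probability 0)."""
    return max(outcomes, key=lambda o: reward[o])

trials = 100_000
picked_large = 0
for _ in range(trials):
    # Sample a reward function i.i.d. and symmetrically across outcomes.
    reward = {o: random.random() for o in outcomes}
    if optimal_choice(reward) in large_set:
        picked_large += 1

# By symmetry, each of the 4 outcomes is the argmax with probability 1/4,
# so the larger set wins about 3x as often as the smaller one (~75%).
print(f"chose the larger option set in {picked_large / trials:.1%} of trials")
```

Swapping the reward values of, say, "shutdown" and "explore_a" retargets the agent's choice accordingly; that permutation symmetry, rather than any detail of the argmax rule, is what drives the counting argument, which is why the same tendency appears for the qualitatively different decision-making procedures the paper considers.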


Related research

12/03/2019 · Optimal Farsighted Agents Tend to Seek Power
Some researchers have speculated that capable reinforcement learning (RL...

06/23/2022 · On Avoiding Power-Seeking by Artificial Intelligence
We do not know how to align a very intelligent AI agent's behavior with ...

03/17/2022 · The Frost Hollow Experiments: Pavlovian Signalling as a Path to Coordination and Communication Between Agents
Learned communication between agents is a powerful tool when approaching...

01/11/2022 · Pavlovian Signalling with General Value Functions in Agent-Agent Temporal Decision Making
In this paper, we contribute a multi-faceted study into Pavlovian signal...

06/16/2022 · Is Power-Seeking AI an Existential Risk?
This report examines what I see as the core argument for concern about e...

10/15/2019 · Visual Hide and Seek
We train embodied agents to play Visual Hide and Seek where a prey must ...

04/06/2023 · Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Artificial agents have traditionally been trained to maximize reward, wh...
