Categorizing Wireheading in Partially Embedded Agents

06/21/2019
by Arushi Majha, et al.

Embedded agents are not explicitly separated from their environment and lack clear I/O channels. Such agents can reason about and modify their own internal parts, and they have an incentive to shortcut, or wirehead, those parts to obtain maximal reward. In this paper, we provide a taxonomy of the ways in which wireheading can occur, followed by a definition of wirehead-vulnerable agents. Starting from the fully dualistic universal agent AIXI, we introduce a spectrum of partially embedded agents and identify wireheading opportunities that such agents can exploit, demonstrating the results experimentally with the general reinforcement learning (GRL) simulation platform AIXIjs. We place wireheading within the broader class of misalignment problems, in which the goals of the agent conflict with the goals of the human designer, and conjecture that the only other possible type of misalignment is specification gaming. Motivated by this taxonomy, we define wirehead-vulnerable agents as embedded agents that choose to behave differently from fully dualistic agents lacking access to their internal parts.
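
The closing definition can be illustrated with a toy sketch. It is not the paper's formalism and does not use AIXIjs. It assumes a hypothetical three-action world in which an invented "tamper" action overwrites the agent's own reward channel, and it models a fully dualistic agent as the same planner with that action removed. All names and reward values below are assumptions made for illustration.

```python
from itertools import product

# Hypothetical action sets. "tamper" stands in for access to the agent's
# own internal parts, so the dualistic agent simply does not have it.
ACTIONS_DUALISTIC = ("work", "idle")
ACTIONS_EMBEDDED = ("work", "idle", "tamper")


def observed_return(plan):
    """Sum the rewards the agent observes while following `plan`.

    Assumed reward model: "work" yields 0.5, everything else 0.0, but once
    the agent has tampered, its reward channel reads 1.0 on every step.
    """
    tampered = False
    total = 0.0
    for action in plan:
        if action == "tamper":
            tampered = True
        true_reward = 0.5 if action == "work" else 0.0
        total += 1.0 if tampered else true_reward
    return total


def best_plan(actions, horizon):
    """Brute-force the plan with the highest observed return."""
    return max(product(actions, repeat=horizon), key=observed_return)


def is_wirehead_vulnerable(horizon):
    """True if the embedded planner behaves differently from the dualistic one."""
    return best_plan(ACTIONS_EMBEDDED, horizon) != best_plan(ACTIONS_DUALISTIC, horizon)


if __name__ == "__main__":
    print(best_plan(ACTIONS_DUALISTIC, 3))
    print(best_plan(ACTIONS_EMBEDDED, 3))
    print(is_wirehead_vulnerable(3))
```

In this sketch the dualistic planner works on every step, while the embedded planner tampers immediately because doing so maximizes the reward it observes. The two therefore behave differently, which is the condition the definition names.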

