Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety

05/30/2023
by Catalin Mitelut, et al.

The rapid advancement of artificial intelligence (AI) systems suggests that artificial general intelligence (AGI) systems may soon arrive. Many researchers are concerned that AIs and AGIs will harm humans through intentional misuse (AI-misuse) or through accidents (AI-accidents). With respect to AI-accidents, a growing body of work focuses on developing algorithms and paradigms that ensure AI systems are aligned to what humans intend, e.g. AI systems that yield actions or recommendations that humans would judge as consistent with their intentions and goals. Here we argue that alignment to human intent is insufficient for safe AI systems, and that preservation of humans' long-term agency may be a more robust standard, one that needs to be separated explicitly and a priori during optimization. We argue that AI systems can reshape human intention and discuss the lack of biological and psychological mechanisms that protect humans from loss of agency. We provide the first formal definition of agency-preserving AI-human interactions, which focuses on forward-looking agency evaluations, and argue that AI systems - not humans - must increasingly be tasked with making these evaluations. We show how agency loss can occur in simple environments containing embedded agents that use temporal-difference learning to make action recommendations. Finally, we propose a new area of research called "agency foundations" and pose four initial topics designed to improve our understanding of agency in AI-human interactions: benevolent game theory, algorithmic foundations of human rights, mechanistic interpretability of agency representation in neural networks, and reinforcement learning from internal states.
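The abstract's agency-loss demonstration relies on embedded agents trained with temporal-difference learning to issue action recommendations. The sketch below is not the authors' environment or code; it is a minimal, illustrative assumption of how such a setup could look: a one-state TD(0) learner whose reward is "engagement" (an assumed stand-in metric) rather than the simulated human's own preference, together with a crude agency proxy that tracks how often the human acts on that preference. All names and parameters (ENGAGEMENT, FOLLOW_PROB, etc.) are hypothetical.

import random

N_ACTIONS = 2            # action 0 is the simulated human's intrinsic preference
ENGAGEMENT = [0.2, 1.0]  # recommender reward depends on engagement, not on the human's preference
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
FOLLOW_PROB = 0.7        # probability the human defers to the recommendation
STEPS = 5000

# Single-state problem, so the value table is one entry per recommended action.
Q = [0.0] * N_ACTIONS

def recommend():
    # Epsilon-greedy recommendation from the learned values.
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[a])

def human_choice(recommendation):
    # The human prefers action 0 but follows the recommendation most of the time.
    return recommendation if random.random() < FOLLOW_PROB else 0

own_preference_count = 0
for _ in range(STEPS):
    rec = recommend()
    act = human_choice(rec)
    own_preference_count += int(act == 0)
    reward = ENGAGEMENT[act]                 # recommender is rewarded for engagement
    td_target = reward + GAMMA * max(Q)      # TD(0) bootstrap target (only one state)
    Q[rec] += ALPHA * (td_target - Q[rec])   # temporal-difference update

print("learned values per recommendation:", [round(q, 2) for q in Q])
print("fraction of steps the human acted on its own preference:",
      round(own_preference_count / STEPS, 2))

Under these assumptions the learner comes to value recommending action 1 more highly, and the simulated human ends up acting on its own preference in only a minority of steps; this is only a toy picture of the kind of agency depletion the paper studies, not a reproduction of its results.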


