The Achilles Heel Hypothesis: Pitfalls for AI Systems via Decision Theoretic Adversaries

10/12/2020
by Stephen Casper, et al.

As AI continues to advance rapidly, it is crucial to know how advanced systems will make choices and in what ways they may fail. Machines can already outsmart humans in some domains, and understanding how to safely build systems that may have capabilities at or above the human level is of particular concern. One might suspect that superhumanly intelligent systems should be modeled as something that humans, by definition, cannot outsmart. As a challenge to this assumption, this paper presents the Achilles Heel hypothesis, which states that highly effective goal-oriented systems, even ones that are potentially superintelligent, may nonetheless have stable decision-theoretic delusions that cause them to make obviously irrational decisions in adversarial settings. In a survey of relevant dilemmas and paradoxes from the decision theory literature, a number of these potential Achilles Heels are discussed in the context of this hypothesis. Several novel contributions are made concerning the ways in which these weaknesses could be implanted into a system.
