Absolutist AI

07/19/2023
by Mitchell Barrington, et al.

This paper argues that training AI systems with absolute constraints – which forbid certain acts irrespective of the amount of value they might produce – could, in principle, make considerable progress on many AI safety problems. First, such constraints provide a guardrail against the very worst outcomes of misalignment. Second, they could prevent AIs from causing catastrophes for the sake of very valuable consequences, such as replacing humans with a much larger number of beings living at a higher welfare level. Third, they make systems more corrigible, allowing creators to make corrective interventions, such as altering their objective functions or shutting them down. And fourth, they help systems explore their environment more safely by prohibiting them from exploring especially dangerous acts. I offer a decision-theoretic formalization of an absolute constraint, improving on existing models in the literature, and use this model to prove some results about the training and behavior of absolutist AIs. I conclude by showing that, although absolutist AIs will not maximize expected value, they are not thereby liable to behave irrationally, and they will not (contra coherence arguments) face environmental pressure to become expected-value maximizers.
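
To make the idea concrete, here is a minimal sketch of one way an absolute constraint might be rendered in a decision rule. This is an illustration under an assumed lexicographic formulation, not the paper's actual formalization: violation risk is compared first, and expected value only breaks ties among the least-risky options. All function and variable names here are hypothetical.

    # Hypothetical sketch: a lexicographic decision rule for an absolute
    # constraint. Violation probability dominates; expected value is only
    # consulted among the minimally risky actions.
    def choose_action(actions, p_violation, expected_value):
        """Minimize constraint-violation probability first, then maximize
        expected value among the remaining candidates."""
        min_risk = min(p_violation[a] for a in actions)
        permissible = [a for a in actions if p_violation[a] == min_risk]
        return max(permissible, key=lambda a: expected_value[a])

    # Example: a constraint-violating act is never chosen, no matter how
    # much expected value it promises.
    actions = ["comply", "defect"]
    p_violation = {"comply": 0.0, "defect": 1.0}
    expected_value = {"comply": 1.0, "defect": 10**9}
    assert choose_action(actions, p_violation, expected_value) == "comply"

Note that a rule of this shape is not an expected-value maximizer: no finite increase in the payoff of "defect" changes the choice, which is the sense in which the constraint is absolute rather than a very large penalty term.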

