Human Control: Definitions and Algorithms

05/31/2023
by Ryan Carey, et al.

How can humans stay in control of advanced artificial intelligence systems? One proposal is corrigibility, which requires the agent to follow the instructions of a human overseer, without inappropriately influencing them. In this paper, we formally define a variant of corrigibility called shutdown instructability, and show that it implies appropriate shutdown behavior, retention of human autonomy, and avoidance of user harm. We also analyse the related concepts of non-obstruction and shutdown alignment, three previously proposed algorithms for human control, and one new algorithm.

research
01/31/2013

Comparison between the two definitions of AI

Two different definitions of the Artificial Intelligence concept have be...
research
02/10/2020

Human-Centered Artificial Intelligence: Reliable, Safe Trustworthy

Well-designed technologies that offer high levels of human control and h...
research
09/10/2019

Pluggable Social Artificial Intelligence for Enabling Human-Agent Teaming

As intelligent systems are increasingly capable of performing their task...
research
02/19/2020

BB_Evac: Fast Location-Sensitive Behavior-Based Building Evacuation

Past work on evacuation planning assumes that evacuees will follow instr...
research
02/10/2020

Human-Centered Artificial Intelligence: Trusted, Reliable Safe

Well-designed technologies that offer high levels of human control and h...
research
07/16/2023

Datalism and Data Monopolies in the Era of A.I.: A Research Agenda

The increasing use of data in various parts of the economic and social s...
research
02/27/2023

Epicurus at SemEval-2023 Task 4: Improving Prediction of Human Values behind Arguments by Leveraging Their Definitions

We describe our experiments for SemEval-2023 Task 4 on the identificatio...