The Concept of Criticality in AI Safety

01/12/2022
by Yitzhak Spielberg, et al.

When AI agents do not align their actions with human values, they may cause serious harm. One way to address the value alignment problem is to include a human operator who monitors all of the agent's actions. Although this solution guarantees maximal safety, it is very inefficient, since it requires the human operator to dedicate all of his attention to the agent. In this paper, we propose a much more efficient solution that allows the operator to engage in other activities without neglecting the monitoring task. In our approach, the AI agent requests permission from the operator only for critical actions, that is, potentially harmful actions. We introduce the concept of critical actions with respect to AI safety and discuss how to build a model that measures action criticality. We also discuss how the operator's feedback could be used to make the agent smarter.
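
The permission scheme described in the abstract can be illustrated with a small sketch. This is not the authors' method. It is a minimal illustration under stated assumptions: the criticality model is stubbed as a hypothetical `score` function returning a value in [0, 1], the threshold of 0.5 is arbitrary, and operator feedback is used in the simplest possible way, by remembering each decision so that the same action is not queried twice. The names `CriticalityGate`, `score` and `ask_operator` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class CriticalityGate:
    """Hypothetical gate: interrupt the operator only for critical actions.

    Assumptions (not from the paper): `score` is a stand-in criticality model
    returning a value in [0, 1], and a fixed threshold separates critical
    from non-critical actions.
    """
    score: Callable[[str], float]        # hypothetical criticality model
    ask_operator: Callable[[str], bool]  # True if the operator approves the action
    threshold: float = 0.5               # illustrative value, not from the paper
    feedback: Dict[str, bool] = field(default_factory=dict)  # remembered operator decisions

    def decide(self, action: str) -> bool:
        """Return True if the agent may execute `action`."""
        # Reuse an earlier operator decision for the same action, if one exists.
        if action in self.feedback:
            return self.feedback[action]
        # Non-critical actions run without disturbing the operator.
        if self.score(action) < self.threshold:
            return True
        # Critical action: ask for permission and record the answer.
        approved = self.ask_operator(action)
        self.feedback[action] = approved
        return approved


# Usage with toy scores; unknown actions default to 1.0, i.e. treated as critical.
scores = {"move_left": 0.1, "delete_file": 0.9}
queries = []


def operator(action: str) -> bool:
    queries.append(action)            # log every interruption of the operator
    return action != "delete_file"    # toy operator: rejects deleting files


gate = CriticalityGate(score=lambda a: scores.get(a, 1.0), ask_operator=operator)
print(gate.decide("move_left"))    # True, operator not asked
print(gate.decide("delete_file"))  # False, operator asked
print(gate.decide("delete_file"))  # False, answered from remembered feedback
print(queries)                     # ['delete_file']
```

Treating unknown actions as critical is a conservative choice in this sketch: when the stand-in model has no estimate, the agent falls back to asking the operator rather than acting on its own.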
