Concrete Problems in AI Safety

06/21/2016
by Dario Amodei, et al.

Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper we discuss one such potential impact: the problem of accidents in machine learning systems, defined as unintended and harmful behavior that may emerge from poor design of real-world AI systems. We present a list of five practical research problems related to accident risk, categorized according to whether the problem originates from having the wrong objective function ("avoiding side effects" and "avoiding reward hacking"), an objective function that is too expensive to evaluate frequently ("scalable supervision"), or undesirable behavior during the learning process ("safe exploration" and "distributional shift"). We review previous work in these areas and suggest research directions with a focus on relevance to cutting-edge AI systems. Finally, we consider the high-level question of how to think most productively about the safety of forward-looking applications of AI.

research
02/29/2020

On Safety Assessment of Artificial Intelligence

In this paper we discuss how systems with Artificial Intelligence (AI) c...
research
08/24/2020

Avoiding Negative Side Effects due to Incomplete Knowledge of AI Systems

Autonomous agents acting in the real-world often operate based on models...
research
11/27/2017

AI Safety Gridworlds

We present a suite of reinforcement learning environments illustrating v...
research
01/31/2017

CommAI: Evaluating the first steps towards a useful general AI

With machine learning successfully applied to new daunting problems almo...
research
05/30/2020

AI Research Considerations for Human Existential Safety (ARCHES)

Framed in positive terms, this report examines how technical AI research...
research
12/15/2017

A Berkeley View of Systems Challenges for AI

With the increasing commoditization of computer vision, speech recogniti...
research
06/19/2022

Modeling Transformative AI Risks (MTAIR) Project – Summary Report

This report outlines work by the Modeling Transformative AI Risk (MTAIR)...
