The alignment problem from a deep learning perspective

08/30/2022
by Richard Ngo, et al.

Within the coming decades, artificial general intelligence (AGI) may surpass human capabilities at a wide range of important tasks. This report makes a case for why, without substantial action to prevent it, AGIs will likely use their intelligence to pursue goals which are very undesirable (in other words, misaligned) from a human perspective, with potentially catastrophic consequences. The report aims to cover the key arguments motivating concern about the alignment problem in a way that is as succinct, concrete, and technically grounded as possible. I argue that realistic training processes plausibly lead to the development of misaligned goals in AGIs, in particular because neural networks trained via reinforcement learning will learn to plan towards achieving a range of goals; gain more reward by deceptively pursuing misaligned goals; and generalize in ways which undermine obedience. As in an earlier report from Cotra (2022), I explain my claims with reference to an illustrative AGI training process, then outline possible research directions for addressing different aspects of the problem.


