Alignment of Language Agents

03/26/2021
by Zachary Kenton et al.

For artificial intelligence to be beneficial to humans, the behaviour of AI agents needs to be aligned with what humans want. In this paper we discuss behavioural issues for language agents that arise from accidental misspecification by the system designer. We highlight ways in which misspecification can occur, discuss the behavioural issues that could follow, including deceptive or manipulative language, and review approaches for avoiding them.

Related research

07/02/2022 - The Linguistic Blind Spot of Value-Aligned Agency, Natural and Artificial
The value-alignment problem for artificial intelligence (AI) asks how we...

04/30/2019 - Coevo: a collaborative design platform with artificial agents
We present Coevo, an online platform that allows both humans and artific...

05/30/2023 - Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety
The rapid advancement of artificial intelligence (AI) systems suggests t...

07/19/2020 - Expected Utilitarianism
We want artificial intelligence (AI) to be beneficial. This is the groun...

06/16/2022 - Is Power-Seeking AI an Existential Risk?
This report examines what I see as the core argument for concern about e...

03/28/2023 - Natural Selection Favors AIs over Humans
For billions of years, evolution has been the driving force behind the d...

04/23/2021 - Intensional Artificial Intelligence: From Symbol Emergence to Explainable and Empathetic AI
We argue that an explainable artificial intelligence must possess a rati...
