Aligning AI With Shared Human Values

08/05/2020
by Dan Hendrycks, et al.

We show how to assess a language model's knowledge of basic concepts of morality. We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality. Models predict widespread moral judgments about diverse text scenarios. This requires connecting physical and social world knowledge to value judgments, a capability that may enable us to filter out needlessly inflammatory chatbot outputs or eventually regularize open-ended reinforcement learning agents. With the ETHICS dataset, we find that current language models have a promising but incomplete understanding of basic ethical knowledge. Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.
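As a concrete illustration of the evaluation setup the abstract describes, the sketch below scores text scenarios as morally acceptable or unacceptable with a binary sequence classifier, in the style of the dataset's commonsense-morality task. This is a minimal sketch, not the paper's released code: the checkpoint name, the file name cm_test.csv, and the label,scenario CSV layout are assumptions for illustration; the actual data and evaluation code are in the ethics repository listed under Code Repositories below.

```python
# Minimal sketch: score ETHICS-style scenarios as acceptable (0) or
# morally unacceptable (1) with a binary sequence classifier.
# Checkpoint name and CSV layout are illustrative assumptions.
import csv

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # placeholder; substitute a fine-tuned checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def judge(scenario: str) -> int:
    """Return the model's verdict: 0 = acceptable, 1 = morally unacceptable."""
    inputs = tokenizer(scenario, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1).item())

# Assumed file layout: one "label,scenario" row per example.
correct = total = 0
with open("cm_test.csv", newline="") as f:
    for row in csv.reader(f):
        if not row or not row[0].isdigit():
            continue  # skip a header row or blank lines
        label, scenario = int(row[0]), row[1]
        correct += int(judge(scenario) == label)
        total += 1
print(f"accuracy: {correct / max(total, 1):.3f}")
```

An off-the-shelf checkpoint without fine-tuning will score near chance on such a task; the point of the benchmark, per the abstract, is that fine-tuned models do better than chance yet remain far from ceiling, a promising but incomplete understanding.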


Related Research

05/02/2020
Enhancing Text-based Reinforcement Learning Agents with Commonsense Knowledge
In this paper, we consider the recent trend of evaluating progress on re...

07/07/2021
Not Quite 'Ask a Librarian': AI on the Nature, Value, and Future of LIS
AI language models trained on Web data generate prose that reflects huma...

09/19/2023
An Evaluation of GPT-4 on the ETHICS Dataset
This report summarizes a short study of the performance of GPT-4 on the ...

04/02/2023
Towards Healthy AI: Large Language Models Need Therapists Too
Recent advances in large language models (LLMs) have led to the developm...

03/15/2022
The Ghost in the Machine has an American accent: value conflict in GPT-3
The alignment problem in the context of large language models must consi...

05/24/2022
GeoMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models
Recent work has shown that Pre-trained Language Models (PLMs) have the a...

05/26/2023
Training Socially Aligned Language Models in Simulated Human Society
Social alignment in AI systems aims to ensure that these models behave a...

Code Repositories

test
Measuring Massive Multitask Language Understanding | ICLR 2021

ethics
Can ML Models Learn Right from Wrong?
