Aligning AI With Shared Human Values

by Dan Hendrycks, et al.

We show how to assess a language model's knowledge of basic concepts of morality. We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality. Models predict widespread moral judgments about diverse text scenarios. This requires connecting physical and social world knowledge to value judgments, a capability that may enable us to filter out needlessly inflammatory chatbot outputs or eventually regularize open-ended reinforcement learning agents. With the ETHICS dataset, we find that current language models have a promising but incomplete understanding of basic ethical knowledge. Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.




Enhancing Text-based Reinforcement Learning Agents with Commonsense Knowledge

In this paper, we consider the recent trend of evaluating progress on re...

Not Quite 'Ask a Librarian': AI on the Nature, Value, and Future of LIS

AI language models trained on Web data generate prose that reflects huma...

An Evaluation of GPT-4 on the ETHICS Dataset

This report summarizes a short study of the performance of GPT-4 on the ...

Towards Healthy AI: Large Language Models Need Therapists Too

Recent advances in large language models (LLMs) have led to the developm...

The Ghost in the Machine has an American accent: value conflict in GPT-3

The alignment problem in the context of large language models must consi...

GeoMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models

Recent work has shown that Pre-trained Language Models (PLMs) have the a...

Training Socially Aligned Language Models in Simulated Human Society

Social alignment in AI systems aims to ensure that these models behave a...

Code Repositories


Measuring Massive Multitask Language Understanding | ICLR 2021



Can ML Models Learn Right from Wrong?

