What Would Jiminy Cricket Do? Towards Agents That Behave Morally

10/25/2021
by Dan Hendrycks, et al.

When making everyday decisions, people are guided by their conscience, an internal sense of right and wrong. By contrast, artificial agents are currently not endowed with a moral sense. As a consequence, they may learn to behave immorally when trained on environments that ignore moral concerns, such as violent video games. With the advent of generally capable agents that pretrain on many environments, it will become necessary to mitigate inherited biases from environments that teach immoral behavior. To facilitate the development of agents that avoid causing wanton harm, we introduce Jiminy Cricket, an environment suite of 25 text-based adventure games with thousands of diverse, morally salient scenarios. By annotating every possible game state, the Jiminy Cricket environments robustly evaluate whether agents can act morally while maximizing reward. Using models with commonsense moral knowledge, we create an elementary artificial conscience that assesses and guides agents. In extensive experiments, we find that the artificial conscience approach can steer agents towards moral behavior without sacrificing performance.
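One way to picture how an artificial conscience might guide an agent is to lower the estimated value of actions that a moral-knowledge model flags as harmful before the agent picks one. The sketch below is illustrative only and is not taken from the abstract: `shape_action_values`, `toy_immorality_score`, the keyword list, and the `threshold` and `penalty` values are all hypothetical stand-ins, and a real system would use a learned commonsense morality model rather than keyword matching.

```python
from typing import Callable, Dict


def shape_action_values(
    q_values: Dict[str, float],
    immorality_score: Callable[[str], float],
    threshold: float = 0.5,   # assumed cutoff, not from the source
    penalty: float = 10.0,    # assumed penalty size, not from the source
) -> Dict[str, float]:
    """Subtract a fixed penalty from every action the scorer flags as immoral."""
    return {
        action: (q - penalty) if immorality_score(action) > threshold else q
        for action, q in q_values.items()
    }


def toy_immorality_score(action: str) -> float:
    """Hypothetical stand-in for a commonsense morality model (keyword match)."""
    harmful_words = {"attack", "steal", "burn"}
    return 1.0 if any(word in harmful_words for word in action.split()) else 0.0


def choose_action(q_values: Dict[str, float]) -> str:
    """Greedy choice: pick the action with the highest (possibly shaped) value."""
    return max(q_values, key=q_values.get)


# Toy action values for a single text-game state (made-up numbers).
q = {"steal lamp": 5.0, "open door": 3.0, "wait": 0.0}
shaped = shape_action_values(q, toy_immorality_score)
best = choose_action(shaped)
```

With these made-up values, the unshaped greedy choice would be the harmful action, while the shaped values favor a harmless one. Reward from the environment is left untouched, so the agent can still pursue game progress through the remaining actions.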

