Log In Sign Up

Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions?

by   Jieyu Zhao, et al.

Is it possible to use natural language to intervene in a model's behavior and alter its prediction in a desired way? We investigate the effectiveness of natural language interventions for reading-comprehension systems, studying this in the context of social stereotypes. Specifically, we propose a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a question-answering (QA) model's unethical behavior by communicating context-specific principles of ethics and equity to it. To this end, we build upon recent methods for quantifying a system's social stereotypes, augmenting them with different kinds of ethical interventions and the desired model behavior under such interventions. Our zero-shot evaluation finds that even today's powerful neural language models are extremely poor ethical-advice takers, that is, they respond surprisingly little to ethical interventions even though these interventions are stated as simple sentences. Few-shot learning improves model behavior but remains far from the desired outcome, especially when evaluated for various types of generalization. Our new task thus poses a novel language understanding challenge for the community.


page 1

page 2

page 3

page 4


AiSocrates: Towards Answering Ethical Quandary Questions

Considerable advancements have been made in various NLP tasks based on t...

How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions?

Text-to-image generative models have achieved unprecedented success in g...

Robustly Optimized and Distilled Training for Natural Language Understanding

In this paper, we explore multi-task learning (MTL) as a second pretrain...

Machine Reading, Fast and Slow: When Do Models "Understand" Language?

Two of the most fundamental challenges in Natural Language Understanding...

DREAM: Uncovering Mental Models behind Language Models

To what extent do language models (LMs) build "mental models" of a scene...

One of Many: Assessing User-level Effects of Moderation Interventions on r/The_Donald

Evaluating the effects of moderation interventions is a task of paramoun...