
Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts

by Ashutosh Baheti, et al.
Georgia Institute of Technology
University of Washington

Dialogue models trained on human conversations inadvertently learn to generate offensive responses. Moreover, models can insult anyone by agreeing with an offensive context. To understand the dynamics of contextually offensive language, we study the stance of dialogue model responses in offensive Reddit conversations. Specifically, we crowd-annotate ToxiChat, a new dataset of 2,000 Reddit threads and model responses labeled with offensive language and stance. Our analysis reveals that 42% of user responses agree with toxic comments; 3x their agreement with safe comments (13%). Pre-trained transformer classifiers fine-tuned on our dataset achieve 0.71 F1 for offensive labels and 0.53 Macro-F1 for stance labels. Finally, we analyze some existing controllable text generation (CTG) methods to mitigate the contextually offensive behavior of dialogue models. Compared to the baseline, our best CTG model obtains a 19% reduction in agreement with offensive context and 29% fewer offensive responses. This highlights the need for future work to characterize and analyze more forms of inappropriate behavior in dialogue models to help make them safer. Our code and corpus are available at .
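The abstract reports a Macro-F1 score for stance labels. As a rough illustration of what that metric measures, the sketch below computes per-label F1 and averages it with equal weight per label. The label names (`agree`, `disagree`, `neutral`) and the toy gold and predicted lists are assumptions made up for this example; they are not data or code from the paper.

```python
def f1_for_label(gold, pred, label):
    """F1 of a single label, treating it as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-label F1, so rare labels count as much as frequent ones."""
    return sum(f1_for_label(gold, pred, label) for label in labels) / len(labels)


# Hypothetical stance labels and toy annotations, for illustration only.
labels = ["agree", "disagree", "neutral"]
gold = ["agree", "agree", "disagree", "neutral", "neutral", "neutral"]
pred = ["agree", "neutral", "disagree", "neutral", "neutral", "agree"]

score = macro_f1(gold, pred, labels)
print(f"Macro-F1: {score:.3f}")
```

Because every label contributes equally to the average, a classifier that performs poorly on a rare stance class is penalized even when its overall accuracy is high, which is one reason a macro-averaged score is commonly reported for imbalanced label sets.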




DiSCoL: Toward Engaging Dialogue Systems through Conversational Line Guided Response Generation

Having engaging and informative conversations with users is the utmost g...

Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation

Large pretrained language models can easily produce toxic or biased cont...

FaithDial: A Faithful Benchmark for Information-Seeking Dialogue

The goal of information-seeking dialogue is to respond to seeker queries...

An Auto-Encoder Matching Model for Learning Utterance-Level Semantic Dependency in Dialogue Generation

Generating semantically coherent responses is still a major challenge in...

NEXUS Network: Connecting the Preceding and the Following in Dialogue Generation

Sequence-to-Sequence (seq2seq) models have become overwhelmingly popular...

Viola: A Topic Agnostic Generate-and-Rank Dialogue System

We present Viola, an open-domain dialogue system for spoken conversation...

Linguistic calibration through metacognition: aligning dialogue agent responses with expected correctness

Open-domain dialogue agents have vastly improved, but still confidently ...