Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack

08/17/2019
by Emily Dinan, et al.

The detection of offensive language in the context of a dialogue has become an increasingly important application of natural language processing. The detection of trolls in public forums (Galán-García et al., 2016) and the deployment of chatbots in the public domain (Wolf et al., 2017) are two examples that show the necessity of guarding against adversarially offensive behavior on the part of humans. In this work, we develop a training scheme that makes a model robust to such human attacks through an iterative build it, break it, fix it strategy with humans and models in the loop. In detailed experiments, we show this approach is considerably more robust than previous systems. Further, we show that offensive language used within a conversation critically depends on the dialogue context, and that its detection cannot be treated as a single-sentence task, as in most previous work. Our newly collected tasks and methods will be made open source and publicly available.


