Recipes for Safety in Open-domain Chatbots

10/14/2020
by Jing Xu, et al.

Models trained on large unlabeled corpora of human interactions will learn patterns and mimic behaviors therein, which include offensive or otherwise toxic behavior and unwanted biases. We investigate a variety of methods to mitigate these issues in the context of open-domain generative dialogue models. We introduce a new human-and-model-in-the-loop framework for both training safer models and for evaluating them, as well as a novel method to distill safety considerations inside generative models without the use of an external classifier at deployment time. We conduct experiments comparing these methods and find our new techniques are (i) safer than existing models as measured by automatic and human evaluations while (ii) maintaining usability metrics such as engagingness relative to the state of the art. We then discuss the limitations of this work by analyzing failure cases of our models.

research
07/17/2015

Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models

We investigate the task of building open domain, conversational dialogue...
research
03/17/2022

EVA2.0: Investigating Open-Domain Chinese Dialogue Systems with Large-Scale Pre-Training

Large-scale pre-training has shown remarkable performance in building op...
research
10/16/2021

On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark

Dialogue safety problems severely limit the real-world deployment of neu...
research
12/10/2021

Am I Me or You? State-of-the-Art Dialogue Models Cannot Maintain an Identity

State-of-the-art dialogue models still often stumble with regards to fac...
research
06/09/2023

Safety and Fairness for Content Moderation in Generative Models

With significant advances in generative AI, new technologies are rapidly...
research
04/28/2020

Recipes for building an open-domain chatbot

Building open-domain chatbots is a challenging area for machine learning...
research
07/23/2020

Evaluation metrics for behaviour modeling

A primary difficulty with unsupervised discovery of structure in large d...
