Recipes for Safety in Open-domain Chatbots

by Jing Xu, et al.

Models trained on large unlabeled corpora of human interactions will learn patterns and mimic behaviors therein, which include offensive or otherwise toxic behavior and unwanted biases. We investigate a variety of methods to mitigate these issues in the context of open-domain generative dialogue models. We introduce a new human-and-model-in-the-loop framework for both training safer models and for evaluating them, as well as a novel method to distill safety considerations inside generative models without the use of an external classifier at deployment time. We conduct experiments comparing these methods and find our new techniques are (i) safer than existing models as measured by automatic and human evaluations while (ii) maintaining usability metrics such as engagingness relative to the state of the art. We then discuss the limitations of this work by analyzing failure cases of our models.




Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models

We investigate the task of building open domain, conversational dialogue...

EVA2.0: Investigating Open-Domain Chinese Dialogue Systems with Large-Scale Pre-Training

Large-scale pre-training has shown remarkable performance in building op...

On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark

Dialogue safety problems severely limit the real-world deployment of neu...

Am I Me or You? State-of-the-Art Dialogue Models Cannot Maintain an Identity

State-of-the-art dialogue models still often stumble with regards to fac...

Safety and Fairness for Content Moderation in Generative Models

With significant advances in generative AI, new technologies are rapidly...

Recipes for building an open-domain chatbot

Building open-domain chatbots is a challenging area for machine learning...

Evaluation metrics for behaviour modeling

A primary difficulty with unsupervised discovery of structure in large d...
