Improving Open Language Models by Learning from Organic Interactions

06/07/2023
by Jing Xu, et al.

We present BlenderBot 3x, an update to the conversational model BlenderBot 3, which is now trained on organic conversation and feedback data from participating users of the system in order to improve both its skills and its safety. We are publicly releasing the de-identified interaction data from participating users for use by the research community, in order to spur further progress. Training models with organic data is challenging because interactions with people "in the wild" include high-quality conversations and feedback as well as adversarial and toxic behavior. We study techniques that enable learning from helpful teachers while avoiding learning from people who are trying to trick the model into unhelpful or toxic responses. BlenderBot 3x is preferred in conversation to BlenderBot 3 and is shown to produce safer responses in challenging situations. While our current models are still far from perfect, we believe further improvement can be achieved by continued use of the techniques explored in this work.

