Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback

08/05/2022
by   Jing Xu, et al.

Frozen models trained to mimic static datasets can never improve their performance. Models that can employ internet retrieval for up-to-date information and obtain feedback from humans during deployment hold the promise of both adapting to new information and improving their performance. In this work we study how to improve internet-driven conversational skills in such a learning framework. We collect deployment data of human interactions, which we make publicly available, along with various types of human feedback, including binary quality measurements, free-form text feedback, and fine-grained reasons for failure. We then study various algorithms for improving from such feedback, including standard supervised learning, rejection sampling, model-guiding, and reward-based learning, in order to make recommendations on which types of feedback and algorithms work best. We find that the recently introduced Director model (Arora et al., '22) shows significant improvements over other existing approaches.
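
Of the algorithms listed above, rejection sampling is the simplest to illustrate: generate several candidate replies, score each with a model trained on human feedback, and return the highest-scoring one. The following is a minimal sketch of that general idea, not the paper's implementation. The names rejection_sample, toy_generator, and toy_scorer are hypothetical, and the word-overlap scorer is only a stand-in for a learned reward model.

```python
from typing import Callable, List


def rejection_sample(
    context: str,
    generate_candidates: Callable[[str, int], List[str]],
    score: Callable[[str, str], float],
    num_candidates: int = 4,
) -> str:
    """Return the candidate reply that the scorer rates highest."""
    candidates = generate_candidates(context, num_candidates)
    if not candidates:
        raise ValueError("generator returned no candidates")
    return max(candidates, key=lambda reply: score(context, reply))


# Hypothetical stand-in for a dialogue model: returns canned replies.
def toy_generator(context: str, n: int) -> List[str]:
    pool = [
        "I don't know.",
        "Paris is the capital of France.",
        "Why do you ask?",
        "France is nice.",
    ]
    return pool[:n]


# Hypothetical stand-in for a reward model trained on binary feedback:
# counts the distinct words shared by the context and the reply.
def toy_scorer(context: str, reply: str) -> float:
    context_words = set(context.lower().strip("?").split())
    reply_words = set(reply.lower().strip(".?").split())
    return float(len(context_words & reply_words))


best = rejection_sample("What is the capital of France?", toy_generator, toy_scorer)
print(best)
```

In a real system the generator would be the deployed dialogue model sampling several replies, and the scorer would be a classifier trained on the collected feedback labels.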


research
11/01/2020

Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning

The interaction of conversational systems with users poses an exciting o...
research
10/28/2022

When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels

Deployed dialogue agents have the potential to integrate human feedback ...
research
06/02/2023

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

Language models (LMs) often exhibit undesirable text generation behavior...
research
07/15/2021

Internet-Augmented Dialogue Generation

The largest store of continually updating knowledge on our planet can be...
research
08/05/2022

BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

We present BlenderBot 3, a 175B parameter dialogue model capable of open...
research
04/28/2020

Recipes for building an open-domain chatbot

Building open-domain chatbots is a challenging area for machine learning...
research
08/05/2022

Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls

The promise of interaction between intelligent conversational agents and...
