When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels

10/28/2022
by Weiyan Shi, et al.

Deployed dialogue agents have the potential to integrate human feedback to continuously improve themselves. However, humans may not always provide explicit signals when the chatbot makes mistakes during interactions. In this work, we propose Juicer, a framework to make use of both binary and free-form textual human feedback. It works by: (i) extending sparse binary feedback by training a satisfaction classifier to label the unlabeled data; and (ii) training a reply corrector to map the bad replies to good ones. We find that augmenting training with model-corrected replies improves the final dialogue model, and we can further improve performance by using both positive and negative replies through the recently proposed Director model.
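
The two steps in the abstract can be illustrated with a minimal data-flow sketch. This is not the authors' implementation: the `Turn` record, the `juicer_augment` function, and the two toy stand-ins (`toy_is_satisfied`, `toy_correct_reply`) are hypothetical names introduced here for illustration. In the framework described above, both stand-ins would be trained models, and the returned negatives would be the examples a model such as Director could learn from alongside the positives.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]  # (dialogue context, bot reply)


@dataclass
class Turn:
    """One logged bot turn (hypothetical record layout)."""
    context: str             # dialogue history before the bot reply
    reply: str               # the bot reply shown to the user
    feedback: Optional[str]  # free-form textual feedback, if any
    label: Optional[bool]    # binary feedback: True good, False bad, None unlabeled


def juicer_augment(
    turns: List[Turn],
    is_satisfied: Callable[[Turn], bool],
    correct_reply: Callable[[Turn], str],
) -> Tuple[List[Pair], List[Pair]]:
    """Split logged turns into positive and negative training pairs."""
    positives: List[Pair] = []
    negatives: List[Pair] = []
    for turn in turns:
        # Step (i): keep the human label when present; otherwise let the
        # satisfaction classifier label the unlabeled turn.
        good = turn.label if turn.label is not None else is_satisfied(turn)
        if good:
            positives.append((turn.context, turn.reply))
        else:
            # The bad reply itself is kept as a negative example.
            negatives.append((turn.context, turn.reply))
            # Step (ii): the reply corrector maps the bad reply to a good
            # one, which becomes an extra positive example.
            positives.append((turn.context, correct_reply(turn)))
    return positives, negatives


def toy_is_satisfied(turn: Turn) -> bool:
    """Assumption: a keyword rule standing in for a trained classifier."""
    return turn.feedback is None or "wrong" not in turn.feedback.lower()


def toy_correct_reply(turn: Turn) -> str:
    """Assumption: a placeholder standing in for a trained reply corrector."""
    return f"[corrected using feedback: {turn.feedback}]"


if __name__ == "__main__":
    log = [
        Turn("hi", "hello!", None, True),
        Turn("capital of France?", "Berlin", "That's wrong, it's Paris", None),
        Turn("2+2?", "5", "no, it is 4", False),
    ]
    pos, neg = juicer_augment(log, toy_is_satisfied, toy_correct_reply)
    print(len(pos), "positive pairs;", len(neg), "negative pairs")
```

The sketch keeps the classifier and the corrector as injected callables so that either can be swapped for a real model without changing the augmentation loop.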

research
01/16/2019

Learning from Dialogue after Deployment: Feed Yourself, Chatbot!

The majority of conversations a dialogue agent sees over its lifetime oc...
research
08/05/2022

Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback

Frozen models trained to mimic static datasets can never improve their p...
research
02/06/2023

Languages are Rewards: Chain of Hindsight Finetuning using Human Feedback

Learning from human preferences is important for language models to be h...
research
07/17/2020

Explanation Augmented Feedback in Human-in-the-Loop Reinforcement Learning

Human-in-the-loop Reinforcement Learning (HRL) aims to integrate human g...
research
11/29/2016

Dialogue Learning With Human-In-The-Loop

An important aspect of developing conversational agents is to give a bot...
research
01/23/2020

Facial Feedback for Reinforcement Learning: A Case Study and Offline Analysis Using the TAMER Framework

Interactive reinforcement learning provides a way for agents to learn to...
research
02/01/2020

Dialogue-based simulation for cultural awareness training

Existing simulations designed for cultural and interpersonal skill train...