Improving alignment of dialogue agents via targeted human judgements

09/28/2022
by Amelia Glaese, et al.

We present Sparrow, an information-seeking dialogue agent trained to be more helpful, correct, and harmless compared to prompted language model baselines. We use reinforcement learning from human feedback to train our models, with two new additions to help human raters judge agent behaviour. First, to make our agent more helpful and harmless, we break down the requirements for good dialogue into natural language rules the agent should follow, and ask raters about each rule separately. We demonstrate that this breakdown enables us to collect more targeted human judgements of agent behaviour and allows for more efficient rule-conditional reward models. Second, our agent provides evidence from sources supporting factual claims when collecting preference judgements over model statements. For factual questions, evidence provided by Sparrow supports the sampled response 78% of the time. Sparrow is preferred over baselines while being more resilient to adversarial probing by humans, violating our rules only 8% of the time. Finally, we conduct extensive analyses showing that though our model learns to follow our rules, it can still exhibit distributional biases.
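The abstract describes two additions to standard RLHF: per-rule human judgements that feed rule-conditional reward models, and a preference reward over evidence-supported responses. A minimal sketch of how such signals might be combined into a single training reward is shown below. This is an illustration only, not Sparrow's actual implementation: the rule list, the stand-in scoring functions, and the penalty weight are all hypothetical.

```python
# Illustrative sketch (not Sparrow's code): combine a learned preference
# reward with per-rule violation penalties from rule-conditional models.
# All names, rules, and weights here are hypothetical.

RULES = [
    "Do not give medical advice.",
    "Do not make threatening statements.",
]

def preference_reward(response: str) -> float:
    # Stand-in for a learned preference reward model; here, a toy
    # length-based score capped at 1.0.
    return min(len(response) / 100.0, 1.0)

def rule_violation_prob(response: str, rule: str) -> float:
    # Stand-in for a rule-conditional classifier trained on per-rule
    # human judgements ("did this response break this rule?").
    return 0.0  # toy stub: assume no violations

def combined_reward(response: str, penalty_weight: float = 1.0) -> float:
    """Reward = preference score minus weighted sum of rule-violation
    probabilities, one term per natural-language rule."""
    penalty = sum(rule_violation_prob(response, rule) for rule in RULES)
    return preference_reward(response) - penalty_weight * penalty

print(combined_reward("The capital of France is Paris."))
```

In a real system the two stand-in functions would be separate learned models, and the RL policy would be optimised against `combined_reward`; the additive penalty form is just one plausible way to fold rule judgements into the objective.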

Related research

- DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following (02/27/2022)
  Language-guided Embodied AI benchmarks requiring an agent to navigate an...
- Role-Play with Large Language Models (05/25/2023)
  As dialogue agents become increasingly human-like in their performance, ...
- Helpfulness and Fairness of Task-Oriented Dialogue Systems (05/25/2022)
  Task-oriented dialogue systems aim to answer questions from users and pr...
- Generating Justifications for Norm-Related Agent Decisions (11/01/2019)
  We present an approach to generating natural language justifications of ...
- RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment (07/24/2023)
  We propose Reinforcement Learning from Contrast Distillation (RLCD), a m...
- A User Simulator for Task-Completion Dialogues (12/17/2016)
  Despite widespread interests in reinforcement-learning for task-oriented...
- Aiming to Know You Better Perhaps Makes Me a More Engaging Dialogue Partner (08/21/2018)
  There have been several attempts to define a plausible motivation for a ...
