On the Effectiveness of Offline RL for Dialogue Response Generation

07/23/2023
by Paloma Sodhi, et al.

A common training technique for language models is teacher forcing (TF). TF attempts to match human language exactly, even though the same meaning can be expressed in many different ways. This motivates the use of sequence-level objectives for dialogue response generation. In this paper, we study the efficacy of various offline reinforcement learning (RL) methods for maximizing such objectives. We present a comprehensive evaluation across multiple datasets, models, and metrics. Offline RL shows a clear performance improvement over teacher forcing without inducing training instability or exceeding practical training budgets.

