Reinforced Language Modeling for End-to-End Task Oriented Dialog

11/30/2022
by   Xiao Yu, et al.
In task-oriented dialogs such as MultiWoZ (Budzianowski et al., 2018), an informative and/or successful system response needs to include necessary key information, such as the phone number of a hotel. We therefore hypothesize that helping the model focus on learning key quantities in the dialog lets it generate more informative and helpful responses. In this paper, we propose a new training algorithm, Reinforced Language Modeling (RLM), which uses a fine-grained reward function and reinforcement learning to help the model generate key quantities correctly at test time. Empirical results show that our proposed RLM achieves state-of-the-art performance on the inform rate, success rate, and combined score in MultiWoZ.
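
The abstract does not spell out the reward function, so the following is only a minimal sketch of what a fine-grained (per-token) reward combined with a REINFORCE-style loss could look like. It is not the paper's actual method. The key-token set `KEY_TOKENS`, the reward values, and the function names `token_rewards` and `reinforce_loss` are all hypothetical and chosen for illustration.

```python
import math

# Hypothetical set of "key quantity" tokens. The abstract does not say how
# key quantities are identified; delexicalized slot placeholders are assumed.
KEY_TOKENS = {"[value_phone]", "[value_address]", "[value_name]"}


def token_rewards(generated, reference, key_bonus=2.0, base=1.0, penalty=-1.0):
    """Assign one reward per token position (assumed reward scheme).

    A token that matches the reference earns `base`, or `key_bonus` when the
    reference token is a key token. A mismatch earns `penalty`.
    """
    rewards = []
    for gen, ref in zip(generated, reference):
        if gen == ref:
            rewards.append(key_bonus if ref in KEY_TOKENS else base)
        else:
            rewards.append(penalty)
    return rewards


def reinforce_loss(log_probs, rewards):
    """Negative reward-weighted log-likelihood, averaged over tokens."""
    if len(log_probs) != len(rewards):
        raise ValueError("log_probs and rewards must have the same length")
    return -sum(r * lp for r, lp in zip(rewards, log_probs)) / len(rewards)


if __name__ == "__main__":
    reference = ["the", "phone", "number", "is", "[value_phone]"]
    generated = ["the", "phone", "number", "is", "[value_address]"]
    # Made-up per-token log-probabilities of the generated tokens.
    log_probs = [math.log(0.9)] * len(generated)

    rewards = token_rewards(generated, reference)
    print(rewards)
    print(reinforce_loss(log_probs, rewards))
```

Under these assumptions, a correct key token contributes more to the gradient than an ordinary correct token, and a wrong token is pushed down, which is one way a model could be encouraged to focus on key quantities.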


research
10/17/2022

Mars: Semantic-aware Contrastive Learning for End-to-End Task-Oriented Dialog

Traditional end-to-end task-oriented dialog systems first convert dialog...
research
09/22/2020

SUMBT+LaRL: End-to-end Neural Task-oriented Dialog System with Reinforcement Learning

The recent advent of neural approaches for developing each dialog compon...
research
09/14/2022

SPACE-3: Unified Dialog Model Pre-training for Task-Oriented Dialog Understanding and Generation

Recently, pre-training methods have shown remarkable success in task-ori...
research
10/13/2022

Jointly Reinforced User Simulator and Task-oriented Dialog System with Simplified Generative Architecture

Recently, there has been progress in supervised finetuning pretrained GP...
research
05/31/2020

Variational Reward Estimator Bottleneck: Learning Robust Reward Estimator for Multi-Domain Task-Oriented Dialog

Despite its notable success in adversarial learning approaches to multi-...
research
06/09/2021

Joint System-Wise Optimization for Pipeline Goal-Oriented Dialog System

Recent work (Takanobu et al., 2020) proposed the system-wise evaluation ...
research
12/30/2019

Likelihood Ratios and Generative Classifiers for Unsupervised Out-of-Domain Detection In Task Oriented Dialog

The task of identifying out-of-domain (OOD) input examples directly at t...
