Variational Reward Estimator Bottleneck: Learning Robust Reward Estimator for Multi-Domain Task-Oriented Dialog

05/31/2020
by Jeiyoon Park, et al.

Despite the notable success of adversarial learning approaches to multi-domain task-oriented dialog systems, training the dialog policy via adversarial inverse reinforcement learning often fails to balance the performance of the policy generator and the reward estimator. During optimization, the reward estimator often overwhelms the policy generator and produces excessively uninformative gradients. We propose the Variational Reward estimator Bottleneck (VRB), an effective regularization method that aims to constrain unproductive information flows between inputs and the reward estimator. The VRB focuses on capturing discriminative features by applying an information bottleneck on mutual information. Empirical results on a multi-domain task-oriented dialog dataset demonstrate that the VRB significantly outperforms previous methods.
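The abstract does not spell out the training objective. The following is a minimal sketch, assuming the VRB follows the generic variational bottleneck recipe: a Gaussian encoder maps each dialog state-action input to a latent code, the reward estimator scores that code, and a KL term (which, averaged over the data, upper-bounds the mutual information between inputs and the code) is held under an information budget I_c by a Lagrange multiplier β. Every function name, the budget, the step size, and the toy numbers are hypothetical illustrations, not the paper's implementation.

```python
import math
import random
from typing import List

# Hypothetical sketch of a variational information bottleneck on a reward
# estimator. Names and hyperparameters are illustrative, not from the paper.


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def kl_to_standard_normal(mu: List[float], log_var: List[float]) -> float:
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims.

    Averaged over the data, this KL upper-bounds I(X; Z).
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def reparameterize(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def estimator_bce(expert_scores: List[float], policy_scores: List[float]) -> float:
    """Binary cross-entropy for an estimator that should score expert
    state-action pairs high and policy-generated pairs low."""
    expert_term = sum(softplus(-s) for s in expert_scores) / len(expert_scores)
    policy_term = sum(softplus(s) for s in policy_scores) / len(policy_scores)
    return expert_term + policy_term


def bottleneck_loss(estimator_loss: float, mean_kl: float, beta: float, info_budget: float) -> float:
    """Lagrangian of the constrained problem: loss + beta * (E[KL] - I_c)."""
    return estimator_loss + beta * (mean_kl - info_budget)


def update_beta(beta: float, mean_kl: float, info_budget: float, step_size: float) -> float:
    """Dual ascent on beta: grow it when the KL exceeds the budget, never below zero."""
    return max(0.0, beta + step_size * (mean_kl - info_budget))


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.3, -0.1], [-0.5, -0.2]  # made-up encoder outputs for one expert input
    z = reparameterize(mu, log_var, rng)
    w = [0.8, -0.4]  # hypothetical linear reward head on the latent code
    expert_score = sum(wi * zi for wi, zi in zip(w, z))
    kl = kl_to_standard_normal(mu, log_var)
    loss = bottleneck_loss(estimator_bce([expert_score], [0.0]), kl, beta=0.1, info_budget=0.5)
    beta = update_beta(0.1, kl, info_budget=0.5, step_size=1e-2)
    print(f"KL={kl:.3f} loss={loss:.3f} beta={beta:.4f}")
```

Treating the information limit as a budget enforced by dual ascent on β, rather than as a fixed penalty weight, lets the constraint tighten only when the estimator starts absorbing more input information than allowed; whether the VRB adapts β this way is an assumption here.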
