Guided Dialog Policy Learning: Reward Estimation for Multi-Domain Task-Oriented Dialog

08/28/2019
by Ryuichi Takanobu, et al.

A dialog policy decides what a task-oriented dialog system says and how it says it, and plays a vital role in delivering effective conversations. Many studies apply Reinforcement Learning to learn a dialog policy, but the reward function they rely on requires elaborate design and pre-specified user goals. With the growing need to handle complex goals across multiple domains, such manually designed reward functions cannot keep up with the complexity of real-world tasks. To this end, we propose Guided Dialog Policy Learning, a novel algorithm based on Adversarial Inverse Reinforcement Learning for joint reward estimation and policy optimization in multi-domain task-oriented dialog. The proposed approach estimates the reward signal and infers the user goal in the dialog sessions. The reward estimator evaluates state-action pairs so that it can guide the dialog policy at each dialog turn. Extensive experiments on a multi-domain dialog dataset show that the dialog policy guided by the learned reward function achieves remarkably higher task success than state-of-the-art baselines.
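To make the idea of a turn-level reward estimator concrete, here is a minimal sketch of the generic discriminator form used in Adversarial Inverse Reinforcement Learning, on which the abstract says the method is based. It is not taken from the paper: the function names, the scalar inputs, and the example numbers are all hypothetical, and the paper's actual estimator may differ in structure and inputs. The sketch assumes a learned score `f_value` for a state-action pair and the policy's log-probability `log_pi` for the same action.

```python
import math


def discriminator(f_value: float, log_pi: float) -> float:
    """Generic AIRL-style discriminator D = exp(f) / (exp(f) + pi(a|s)).

    Algebraically this equals sigmoid(f - log_pi), which is computed
    here in a numerically stable way.
    """
    z = f_value - log_pi
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def estimated_reward(f_value: float, log_pi: float) -> float:
    """Reward log D - log(1 - D), which simplifies to f - log_pi."""
    return f_value - log_pi


# Hypothetical values for a single dialog turn (illustration only).
f_value = 1.2            # assumed output of a learned reward network
log_pi = math.log(0.25)  # assumed policy probability of the chosen action

d = discriminator(f_value, log_pi)
reward = estimated_reward(f_value, log_pi)
```

Because the reward is defined per state-action pair, a policy optimizer can receive a signal at every turn rather than only at the end of the session, which is the kind of guidance the abstract describes.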


