Multi-Agent Task-Oriented Dialog Policy Learning with Role-Aware Reward Decomposition

04/08/2020
by   Ryuichi Takanobu, et al.

Many studies have applied reinforcement learning (RL) to train a dialog policy and have shown great promise in recent years. One common approach is to employ a user simulator to obtain a large number of simulated user experiences for the RL algorithm. However, modeling a realistic user simulator is challenging: a rule-based simulator requires heavy domain expertise for complex tasks, while a data-driven simulator requires considerable dialog data, and it is unclear how to evaluate a simulator in the first place. To avoid explicitly building a user simulator beforehand, we propose Multi-Agent Dialog Policy Learning, which regards both the system and the user as dialog agents. The two agents interact with each other and are learned jointly. The method uses the actor-critic framework to facilitate pretraining and improve scalability. We also propose a Hybrid Value Network for role-aware reward decomposition, which integrates the role-specific domain knowledge of each agent in task-oriented dialog. Results show that our method can build a system policy and a user policy simultaneously, and that the two agents achieve a high task success rate through conversational interaction.
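To make the role-aware reward decomposition concrete, here is a minimal sketch of what a Hybrid Value Network critic could look like. It assumes (hypothetically, since the abstract does not give the architecture) a shared state encoder with three value heads: a global head for rewards common to both agents, such as task success, and role-specific heads for the system and the user. All names and dimensions below are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, HIDDEN = 8, 16

# Hypothetical parameters: one shared encoder plus three linear value heads
# (global, system, user), randomly initialized for illustration.
W_enc = rng.normal(scale=0.1, size=(STATE_DIM, HIDDEN))
heads = {role: rng.normal(scale=0.1, size=HIDDEN)
         for role in ("global", "system", "user")}

def value(state, role):
    """Role-aware value estimate: V_role(s) = V_global(s) + V_role_specific(s).

    The global head captures the shared component of the reward (e.g. task
    success), while the role head captures the role-specific component.
    """
    h = np.tanh(state @ W_enc)  # shared encoding of the dialog state
    return float(h @ heads["global"] + h @ heads[role])

# Both agents evaluate the same dialog state, sharing the global component
# but differing in their role-specific component.
s = rng.normal(size=STATE_DIM)
v_sys = value(s, "system")
v_usr = value(s, "user")
```

In an actor-critic setup, each agent's policy gradient would then be driven by its own role-aware value estimate, so that role-specific rewards shape each policy separately while the shared global signal keeps both agents aligned on task success.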
