Adversarial Learning of Task-Oriented Neural Dialog Models

05/30/2018
by Bing Liu, et al.

In this work, we propose an adversarial learning method for reward estimation in reinforcement learning (RL) based task-oriented dialog models. Most current RL-based task-oriented dialog systems require access to a reward signal from user feedback or user ratings. Such ratings, however, may not always be consistent or available in practice. Furthermore, online dialog policy learning with RL typically requires a large number of queries to users and thus suffers from poor sample efficiency. To address these challenges, we propose an adversarial learning method that learns dialog rewards directly from dialog samples. These rewards are then used to optimize the dialog policy with policy-gradient-based RL. In evaluations in a restaurant search domain, we show that the proposed adversarial dialog learning method achieves a higher dialog success rate than strong baseline methods. We further discuss the covariate shift problem in online adversarial dialog learning and show how it can be addressed with partial access to user feedback.
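
The core mechanism described above, using a discriminator's score on a completed dialog as the reward signal for policy-gradient RL, can be illustrated with a short sketch. The code below is a minimal, assumption-laden illustration in PyTorch, not the authors' implementation: the Discriminator and Policy modules, the encode_dialog helper, the dimensions, and the single-dialog REINFORCE update are all hypothetical simplifications.

```python
# Minimal sketch of adversarial reward estimation for dialog policy
# learning. Assumptions (not from the paper): dialogs are encoded as
# fixed-size vectors, the discriminator is a small MLP whose sigmoid
# output serves as the dialog reward, and the policy is updated with
# a plain REINFORCE step on one sampled dialog.
import torch
import torch.nn as nn

DIALOG_DIM, HIDDEN, N_ACTIONS = 64, 128, 10  # illustrative sizes

class Discriminator(nn.Module):
    """Scores a dialog encoding; output in (0, 1) is used as reward."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DIALOG_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, 1), nn.Sigmoid())

    def forward(self, dialog_enc):
        return self.net(dialog_enc)

class Policy(nn.Module):
    """Maps a dialog-state encoding to a distribution over system actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DIALOG_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, N_ACTIONS))

    def forward(self, state):
        return torch.distributions.Categorical(logits=self.net(state))

disc, policy = Discriminator(), Policy()
d_opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
p_opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
bce = nn.BCELoss()

def train_step(human_dialogs, agent_states, encode_dialog):
    """One adversarial update: a discriminator step, then a policy step.

    human_dialogs: (B, DIALOG_DIM) encodings of successful human dialogs.
    agent_states:  per-turn state encodings of one dialog being rolled out.
    encode_dialog: hypothetical helper that encodes the sampled dialog
                   (states plus chosen actions) into one DIALOG_DIM vector.
    """
    # Roll out one dialog with the current policy, keeping log-probs
    # for the REINFORCE update.
    log_probs, actions = [], []
    for s in agent_states:
        dist = policy(s)
        a = dist.sample()
        log_probs.append(dist.log_prob(a))
        actions.append(a)
    agent_dialog = encode_dialog(agent_states, actions)  # (DIALOG_DIM,)

    # 1) Discriminator step: real human dialogs -> label 1,
    #    sampled agent dialogs -> label 0.
    d_loss = (bce(disc(human_dialogs).squeeze(-1),
                  torch.ones(len(human_dialogs)))
              + bce(disc(agent_dialog.detach().unsqueeze(0)).squeeze(-1),
                    torch.zeros(1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # 2) Policy step: the discriminator's score is the dialog-level
    #    reward, applied to every turn of the sampled dialog.
    reward = disc(agent_dialog.unsqueeze(0)).detach().squeeze()
    p_loss = -(torch.stack(log_probs) * reward).mean()
    p_opt.zero_grad()
    p_loss.backward()
    p_opt.step()
```

In a full system, such updates would be alternated while rolling out dialogs against users or a user simulator. The covariate shift problem mentioned in the abstract arises here because the policy's dialog distribution drifts away from the data the discriminator was trained on, which is why the paper proposes correcting the reward estimator with partial access to user feedback.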


Related research

Why Guided Dialog Policy Learning performs well? Understanding the role of adversarial learning and its alternative (07/13/2023)
Dialog policies, which determine a system's action based on the current ...

Guided Dialog Policy Learning without Adversarial Learning in the Loop (04/07/2020)
Reinforcement-based training methods have emerged as the most popular ch...

Imperfect also Deserves Reward: Multi-Level and Sequential Reward Modeling for Better Dialog Management (04/10/2021)
For task-oriented dialog systems, training a Reinforcement Learning (RL)...

End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient (12/07/2017)
Learning a goal-oriented dialog policy is generally performed offline wi...

Multi-Action Dialog Policy Learning from Logged User Feedback (02/27/2023)
Multi-action dialog policy, which generates multiple atomic dialog actio...

A Generative User Simulator with GPT-based Architecture and Goal State Tracking for Reinforced Multi-Domain Dialog Systems (10/17/2022)
Building user simulators (USs) for reinforcement learning (RL) of task-o...

Variational Reward Estimator Bottleneck: Learning Robust Reward Estimator for Multi-Domain Task-Oriented Dialog (05/31/2020)
Despite its notable success in adversarial learning approaches to multi-...
