End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient

12/07/2017
by Li Zhou, et al.

Learning a goal-oriented dialog policy is generally performed offline with supervised learning algorithms or online with reinforcement learning (RL). Additionally, as companies accumulate massive quantities of dialog transcripts between customers and trained human agents, encoder-decoder methods have gained popularity as agent utterances can be directly treated as supervision without the need for utterance-level annotations. However, one potential drawback of such approaches is that they myopically generate the next agent utterance without regard for dialog-level considerations. To resolve this concern, this paper describes an offline RL method for learning from unannotated corpora that can optimize a goal-oriented policy at both the utterance and dialog level. We introduce a novel reward function and use both on-policy and off-policy policy gradient to learn a policy offline without requiring online user interaction or an explicit state space definition.
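The abstract does not spell out the estimator, so the following is only a minimal sketch of the generic idea, assuming a tabular softmax policy over a small set of candidate agent actions and logged dialogs that record the behavior (human-agent) probability of each action. It contrasts a plain REINFORCE-style on-policy gradient with an off-policy variant that reweights each logged step by the importance ratio pi(a|s)/mu(a|s). The names `policy_gradient`, `ascent_step`, and the `(state, action, behavior_prob)` step format are hypothetical. The dialog-level `reward` is a placeholder scalar: the paper's novel reward function is not reproduced here.

```python
import math
from typing import List, Tuple

# Hypothetical logged-step format: (state_id, action_id, behavior_prob mu(a|s)).
Step = Tuple[int, int, float]
# A logged dialog: its steps plus one placeholder dialog-level reward.
Dialog = Tuple[List[Step], float]
Theta = List[List[float]]  # theta[state][action] = logit


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_pi(theta: Theta, state: int, action: int) -> List[float]:
    """Gradient of log pi(action|state) w.r.t. theta[state] for a softmax policy."""
    probs = softmax(theta[state])
    return [(1.0 if b == action else 0.0) - p for b, p in enumerate(probs)]


def policy_gradient(theta: Theta, dialogs: List[Dialog], off_policy: bool) -> Theta:
    """Average policy-gradient estimate over dialogs, same shape as theta.

    off_policy=False: plain REINFORCE; only unbiased if the steps were
    sampled from the current policy.
    off_policy=True: each step is weighted by pi(a|s) / mu(a|s). This
    per-step ratio ignores the mismatch in state distributions, a common
    simplification (an assumption of this sketch, not the paper's method).
    """
    grad = [[0.0] * len(row) for row in theta]
    for steps, reward in dialogs:
        for state, action, mu in steps:
            weight = 1.0
            if off_policy:
                weight = softmax(theta[state])[action] / mu
            for b, g in enumerate(grad_log_pi(theta, state, action)):
                grad[state][b] += weight * reward * g
    n = len(dialogs)
    return [[x / n for x in row] for row in grad]


def ascent_step(theta: Theta, grad: Theta, lr: float = 0.1) -> Theta:
    """One gradient-ascent step on the expected (placeholder) reward."""
    return [[t + lr * g for t, g in zip(t_row, g_row)]
            for t_row, g_row in zip(theta, grad)]
```

For example, with two states, three actions, and all-zero logits (a uniform policy), a single logged dialog in which the human agent chose action 0 in state 0 with probability 0.5 and earned reward 1.0 gives an importance weight of (1/3)/0.5 = 2/3, so the off-policy step raises the probability of action 0 in state 0 and leaves state 1 untouched.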
