Integrating planning for task-completion dialogue policy learning

01/18/2018
by Baolin Peng, et al.

Training a task-completion dialogue agent with real users via reinforcement learning (RL) can be prohibitively expensive, because it requires many interactions with users. One alternative is to resort to a user simulator, but the discrepancy between simulated and real users makes the learned policy unreliable in practice. This paper addresses these challenges by integrating planning into dialogue policy learning based on the Dyna-Q framework, and provides a more sample-efficient approach to learning dialogue policies. The proposed agent consists of two parts: a planner, trained online with limited real user experience, that can generate large amounts of simulated experience to supplement the real user experience; and a policy model trained on this hybrid experience. The effectiveness of our approach is validated on a movie-booking task in both a simulation setting and a human-in-the-loop setting.
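The Dyna-Q loop behind this idea can be illustrated with a minimal sketch. It assumes classic tabular Dyna-Q rather than the paper's neural planner and policy. Each real interaction is used twice: once for a direct Q-learning update, and once to refine a learned world model. That model then replays `planning_steps` simulated transitions per real turn. The toy chain task standing in for a user (`real_step`), along with every parameter name and value, is hypothetical and for illustration only.

```python
import random
from collections import defaultdict

# Hypothetical toy task standing in for a user: states 0..N_STATES-1 on a chain.
# Action 1 moves right, action 0 moves left; reaching the last state "completes
# the task" (+1), and every other turn costs -0.01.
N_STATES = 6
ACTIONS = (0, 1)


def real_step(state, action):
    """One (expensive) turn with the real environment; hypothetical stand-in."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else -0.01), done


def dyna_q(episodes=30, planning_steps=10, max_turns=100,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)   # Q[(state, action)], initialised to 0
    model = {}               # learned world model: (s, a) -> (s', r, done)

    def greedy(s):
        best = max(q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning target; no bootstrap from terminal states.
        target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])

    real_steps = 0
    for _ in range(episodes):
        s = 0
        for _turn in range(max_turns):
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(s)
            s2, r, done = real_step(s, a)
            real_steps += 1
            update(s, a, r, s2, done)       # direct RL from real experience
            model[(s, a)] = (s2, r, done)   # world-model learning
            for _ in range(planning_steps):  # planning with simulated experience
                ps, pa = rng.choice(list(model))
                ps2, pr, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s = s2
            if done:
                break
    return q, real_steps


q, n_real = dyna_q()
print(n_real, [greedy_a for greedy_a in (int(q[(s, 1)] > q[(s, 0)]) for s in range(N_STATES - 1))])
```

Raising `planning_steps` lets the agent perform more value updates per real interaction. That is the sample-efficiency lever the abstract describes. The trade-off is that the policy then relies more heavily on how accurate the learned model is.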
