Budgeted Policy Learning for Task-Oriented Dialogue Systems

06/02/2019
by   Zhirui Zhang, et al.
2

This paper presents a new approach that extends Deep Dyna-Q (DDQ) by incorporating a Budget-Conscious Scheduling (BCS) to best utilize a fixed, small amount of user interactions (budget) for learning task-oriented dialogue agents. BCS consists of (1) a Poisson-based global scheduler to allocate budget over different stages of training; (2) a controller to decide at each training step whether the agent is trained using real or simulated experiences; (3) a user goal sampling module to generate the experiences that are most effective for policy learning. Experiments on a movie-ticket booking task with simulated and real users show that our approach leads to significant improvements in success rate over the state-of-the-art baselines given the fixed budget.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/18/2018

Integrating planning for task-completion dialogue policy learning

Training a task-completion dialogue agent with real users via reinforcem...
research
11/19/2018

Switch-based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning

Training task-completion dialogue agents with reinforcement learning usu...
research
04/20/2018

Subgoal Discovery for Hierarchical Dialogue Policy Learning

Developing conversational agents to engage in complex dialogues is chall...
research
08/28/2018

Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning

This paper presents a Discriminative Deep Dyna-Q (D3Q) approach to impro...
research
04/10/2017

Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning

Building a dialogue agent to fulfill complex tasks, such as travel plann...
research
08/14/2015

Reward Shaping with Recurrent Neural Networks for Speeding up On-Line Policy Learning in Spoken Dialogue Systems

Statistical spoken dialogue systems have the attractive property of bein...
research
11/15/2017

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

We present a new algorithm that significantly improves the efficiency of...

Please sign up or login with your details

Forgot password? Click here to reset