Switch-based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning

11/19/2018
by Yuexin Wu, et al.

Training task-completion dialogue agents with reinforcement learning usually requires a large number of real user experiences. The Dyna-Q algorithm extends Q-learning by integrating a world model, and can thus effectively boost training efficiency using simulated experiences generated by the world model. The effectiveness of Dyna-Q, however, depends on the quality of the world model (or, implicitly, on the pre-specified ratio of real to simulated experiences used for Q-learning). To address this, we extend the recently proposed Deep Dyna-Q (DDQ) framework by integrating a switcher that automatically determines whether to use a real or simulated experience for Q-learning. Furthermore, we explore the use of active learning to improve sample efficiency, by encouraging the world model to generate simulated experiences in the parts of the state-action space that the agent has not (fully) explored. Our results show that, by combining the switcher and active learning, the new framework, named Switch-based Active Deep Dyna-Q (Switch-DDQ), leads to significant improvements over DDQ and Q-learning baselines in both simulation and human evaluations.
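The interplay of direct Q-learning, world-model learning, switched planning, and exploration-driven simulation can be illustrated with a toy tabular sketch. It is not the authors' implementation and makes several loudly hypothetical assumptions: a six-state chain stands in for a dialogue environment, the world model is a lookup table, the "switcher" is a simple heuristic that trusts simulated experience only when the model's recent one-step prediction error on real transitions is low (the paper's switcher is a learned component), and "active learning" is approximated by simulating from the least-visited known state-action pair.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: a deterministic chain with states 0..5,
# actions 0 (left) and 1 (right), and reward 1 on reaching the goal state 5.
ACTIONS, GOAL = (0, 1), 5

def env_step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

class WorldModel:
    """Tabular world model: stores the last observed (s', r, done) for each (s, a)."""
    def __init__(self):
        self.table = {}
    def update(self, s, a, s2, r, done):
        self.table[(s, a)] = (s2, r, done)
    def predict(self, s, a):
        return self.table.get((s, a))

def q_update(Q, s, a, r, s2, done, alpha=0.5, gamma=0.9):
    # Standard one-step Q-learning update.
    target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def switcher_prefers_simulated(model_errors, window=10, threshold=0.1):
    # Hypothetical switcher heuristic: use simulated experience only when the
    # world model's recent prediction error on real transitions is low.
    recent = model_errors[-window:]
    return bool(recent) and sum(recent) / len(recent) < threshold

def active_pick(model, visits):
    # Hypothetical stand-in for active learning: simulate from the known
    # (s, a) pair that has been visited least so far.
    return min(model.table, key=lambda sa: visits[sa])

def train(episodes=50, planning_steps=5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q, visits = defaultdict(float), defaultdict(int)
    model, errors = WorldModel(), []
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(Q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if Q[(s, b)] == best])
            s2, r, done = env_step(s, a)
            # Record the model's error on this real transition, then refit it.
            errors.append(0.0 if model.predict(s, a) == (s2, r, done) else 1.0)
            model.update(s, a, s2, r, done)
            visits[(s, a)] += 1
            q_update(Q, s, a, r, s2, done)  # direct RL from real experience
            # Planning from simulated experience, gated by the switcher.
            if switcher_prefers_simulated(errors):
                for _ in range(planning_steps):
                    ps, pa = active_pick(model, visits)
                    ps2, pr, pdone = model.predict(ps, pa)
                    q_update(Q, ps, pa, pr, ps2, pdone)
                    visits[(ps, pa)] += 1  # counting simulated visits rotates the picks
            s = s2
    return Q
```

After `Q = train()`, the greedy policy `[max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)]` should choose "right" (action 1) in every non-goal state. The error-threshold gate is only one simple way to decide between real and simulated experience; it captures the abstract's point that planning should rely on the world model only when the model is trustworthy, rather than at a fixed real-to-simulated ratio.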
