Subgoal Discovery for Hierarchical Dialogue Policy Learning

04/20/2018
by Da Tang, et al.

Developing conversational agents to engage in complex dialogues is challenging partly because the dialogue policy needs to explore a large state-action space. In this paper, we propose a divide-and-conquer approach that discovers and exploits the hidden structure of the task to enable efficient policy learning. First, given a set of successful dialogue sessions, we present a Subgoal Discovery Network (SDN) to divide a complex goal-oriented task into a set of simpler subgoals in an unsupervised fashion. We then use these subgoals to learn a hierarchical policy which consists of 1) a top-level policy that selects among subgoals, and 2) a low-level policy that selects primitive actions to accomplish the subgoal. We exemplify our method by building a dialogue agent for the composite task of travel planning. Experiments with simulated and real users show that an agent trained with automatically discovered subgoals performs competitively against an agent with human-defined subgoals, and significantly outperforms an agent without subgoals. Moreover, we show that the learned subgoals are human-comprehensible.
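The two-level control flow described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the subgoal names, the primitive actions, and both placeholder policies (a uniform random choice at the top level and a fixed script at the low level) are assumptions made purely for illustration, and no learning or subgoal discovery is performed.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical subgoals and primitive actions for a travel-planning task.
# None of these names come from the paper.
SUBGOALS: List[str] = ["book_flight", "reserve_hotel"]
ACTIONS: Dict[str, List[str]] = {
    "book_flight": ["request_date", "request_city", "confirm_flight"],
    "reserve_hotel": ["request_checkin", "confirm_hotel"],
}


def top_level_policy(remaining: List[str], rng: random.Random) -> str:
    """Select the next subgoal (placeholder: uniform random choice)."""
    return rng.choice(remaining)


def low_level_policy(subgoal: str, step: int) -> str:
    """Select a primitive action for the subgoal (placeholder: fixed script)."""
    return ACTIONS[subgoal][step]


def run_dialogue(seed: int = 0) -> List[Tuple[str, str]]:
    """Run one dialogue and return the (subgoal, action) trace."""
    rng = random.Random(seed)
    remaining = list(SUBGOALS)
    trace: List[Tuple[str, str]] = []
    while remaining:
        subgoal = top_level_policy(remaining, rng)
        # The low-level policy keeps control until the subgoal is finished.
        for step in range(len(ACTIONS[subgoal])):
            trace.append((subgoal, low_level_policy(subgoal, step)))
        remaining.remove(subgoal)
    return trace


if __name__ == "__main__":
    for subgoal, action in run_dialogue():
        print(f"{subgoal}: {action}")
```

In a learned system, both placeholder functions would be replaced by trained policies, and the subgoal set would come from the discovery step rather than being written by hand.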


