What Does The User Want? Information Gain for Hierarchical Dialogue Policy Optimisation

09/15/2021
by   Christian Geishauser, et al.

The dialogue management component of a task-oriented dialogue system is typically optimised via reinforcement learning (RL). Optimisation via RL is highly susceptible to sample inefficiency and instability. The hierarchical approach called Feudal Dialogue Management takes a step towards more efficient learning by decomposing the action space. However, it still suffers from instability because the reward is only provided at the end of the dialogue. We propose an intrinsic reward based on information gain to address this issue. Our proposed reward favours actions that resolve uncertainty or query the user whenever necessary. It enables the policy to learn how to retrieve the user's needs efficiently, which is an integral aspect of every task-oriented conversation. Our algorithm, which we call FeudalGain, achieves state-of-the-art results in most environments of the PyDial framework, outperforming much more complex approaches. We confirm the sample efficiency and stability of our algorithm through experiments in simulation and a human trial.
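The information-gain idea can be sketched as the reduction in entropy of the system's belief over a user-goal slot after an action. This is a minimal illustration of the general concept, not the paper's exact reward definition; the distributions and slot values below are invented for the example.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def information_gain(belief_before, belief_after):
    """Entropy reduction of the belief state after taking an action.

    A positive value means the action (e.g. asking the user a question)
    reduced the system's uncertainty about the user's goal.
    """
    return entropy(belief_before) - entropy(belief_after)

# Example: a query action sharpens a uniform belief over 4 slot values.
before = [0.25, 0.25, 0.25, 0.25]   # maximally uncertain
after = [0.85, 0.05, 0.05, 0.05]    # user's answer resolved most uncertainty
intrinsic_reward = information_gain(before, after)  # positive: uncertainty fell
```

Used as an intrinsic reward, a signal of this kind gives the policy dense feedback during the dialogue, rather than only the task-success reward at the end.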


research
02/28/2022

A Survey on Recent Advances and Challenges in Reinforcement Learning Methods for Task-Oriented Dialogue Policy Learning

Dialogue Policy Learning is a key component in a task-oriented dialogue ...
research
03/08/2018

Feudal Reinforcement Learning for Dialogue Management in Large Domains

Reinforcement learning (RL) is a promising approach to solve dialogue po...
research
03/10/2021

Causal-aware Safe Policy Improvement for Task-oriented dialogue

The recent success of reinforcement learning (RL) in solving complex t...
research
06/19/2017

Sub-domain Modelling for Dialogue Management with Hierarchical Reinforcement Learning

Human conversation is inherently complex, often spanning many different ...
research
08/01/2019

Reinforcement Learning for Personalized Dialogue Management

Language systems have been of great interest to the research community a...
research
09/01/2023

JoTR: A Joint Transformer and Reinforcement Learning Framework for Dialog Policy Learning

Dialogue policy learning (DPL) is a crucial component of dialogue modell...
research
05/24/2016

On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems

The ability to compute an accurate reward function is essential for opti...
