On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems

05/24/2016
by   Pei-Hao Su, et al.
0

The ability to compute an accurate reward function is essential for optimising a dialogue policy via reinforcement learning. In real-world applications, using explicit user feedback as the reward signal is often unreliable and costly to collect. This problem can be mitigated if the user's intent is known in advance or data is available to pre-train a task success predictor off-line. In practice neither of these apply for most real world applications. Here we propose an on-line learning framework whereby the dialogue policy is jointly trained alongside the reward model via active learning with a Gaussian process model. This Gaussian process operates on a continuous space dialogue representation generated in an unsupervised fashion using a recurrent neural network encoder-decoder. The experimental results demonstrate that the proposed framework is able to significantly reduce data annotation costs and mitigate noisy user feedback in dialogue policy learning.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/14/2015

Reward Shaping with Recurrent Neural Networks for Speeding up On-Line Policy Learning in Spoken Dialogue Systems

Statistical spoken dialogue systems have the attractive property of bein...
research
07/19/2017

Reward-Balancing for Statistical Spoken Dialogue Systems using Multi-objective Reinforcement Learning

Reinforcement learning is widely used for dialogue policy optimization w...
research
08/13/2015

Learning from Real Users: Rating Dialogue Success with Neural Networks for Reinforcement Learning in Spoken Dialogue Systems

To train a statistical spoken dialogue system (SDS) it is essential that...
research
01/21/2020

Improving Interaction Quality Estimation with BiLSTMs and the Impact on Dialogue Policy Learning

Learning suitable and well-performing dialogue behaviour in statistical ...
research
09/15/2021

What Does The User Want? Information Gain for Hierarchical Dialogue Policy Optimisation

The dialogue management component of a task-oriented dialogue system is ...
research
11/19/2018

Switch-based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning

Training task-completion dialogue agents with reinforcement learning usu...
research
09/09/2016

Dialogue manager domain adaptation using Gaussian process reinforcement learning

Spoken dialogue systems allow humans to interact with machines using nat...

Please sign up or login with your details

Forgot password? Click here to reset