Sample Efficient Deep Reinforcement Learning for Dialogue Systems with Large Action Spaces

02/11/2018
by Gellért Weisz, et al.

In spoken dialogue systems, we aim to deploy artificial intelligence to build automated dialogue agents that can converse with humans. Part of this effort is the policy optimisation task, which attempts to find a policy describing how to respond to humans, in the form of a function that takes the current state of the dialogue and returns the response of the system. In this paper, we investigate deep reinforcement learning approaches to this problem. Particular attention is given to actor-critic methods, off-policy reinforcement learning with experience replay, and various methods aimed at reducing the bias and variance of estimators. When combined, these methods result in the previously proposed ACER algorithm, which gave competitive results in gaming environments. These environments, however, are fully observable and have a relatively small action set, so in this paper we examine the application of ACER to dialogue policy optimisation. We show that this method beats the current state of the art among deep learning approaches for spoken dialogue systems. This not only leads to a more sample-efficient algorithm that can train faster, but also allows us to apply the algorithm in more difficult environments than before. We thus experiment with learning in a very large action space, which has two orders of magnitude more actions than previously considered. We find that ACER trains significantly faster than the current state of the art.
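The abstract names off-policy learning with experience replay and variance reduction but gives no formulas. As a rough illustration only, the sketch below shows one standard ingredient of that family: an importance weight between the current policy and the behaviour policy that generated a replayed action, truncated at a constant to bound its variance. The function name, the constant `c`, and the probabilities are hypothetical and are not taken from the paper.

```python
def truncated_importance_weight(pi_prob: float, mu_prob: float, c: float = 10.0) -> float:
    """Return min(c, pi_prob / mu_prob).

    pi_prob: probability of the replayed action under the current policy.
    mu_prob: probability of the same action under the behaviour policy
             that was in use when the experience was stored.
    c:       truncation constant (illustrative value, an assumption).
    """
    if mu_prob <= 0.0:
        raise ValueError("behaviour policy probability must be positive")
    return min(c, pi_prob / mu_prob)


# Hypothetical replayed samples: (pi(a|s), mu(a|s)) pairs.
replayed = [(0.5, 0.25), (0.9, 0.01), (0.1, 0.4)]

# The second ratio (90) is clipped to c = 5, limiting its influence on the update.
weights = [truncated_importance_weight(p, m, c=5.0) for p, m in replayed]
```

Truncation alone introduces bias, which is why methods in this family pair it with a correction term; that part is omitted here.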

research
09/22/2020

Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management

The task-oriented spoken dialogue system (SDS) aims to assist a human us...
research
07/01/2017

Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management

Deep reinforcement learning (RL) methods have significant potential for ...
research
06/10/2016

Policy Networks with Two-Stage Training for Dialogue Systems

In this paper, we propose to use deep policy networks which are trained ...
research
11/30/2017

Uncertainty Estimates for Efficient Neural Network-based Dialogue Policy Optimisation

In statistical dialogue management, the dialogue manager learns a policy...
research
03/08/2018

Feudal Reinforcement Learning for Dialogue Management in Large Domains

Reinforcement learning (RL) is a promising approach to solve dialogue po...
research
06/14/2018

Maximum a Posteriori Policy Optimisation

We introduce a new algorithm for reinforcement learning called Maximum a...
research
11/15/2017

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

We present a new algorithm that significantly improves the efficiency of...
