Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management

07/01/2017
by Pei-Hao Su, et al.

Deep reinforcement learning (RL) methods have significant potential for dialogue policy optimisation. However, they suffer from poor performance in the early stages of learning, which is especially problematic for on-line learning with real users. Two approaches are introduced to tackle this problem. Firstly, to speed up the learning process, two sample-efficient neural network algorithms are presented: trust region actor-critic with experience replay (TRACER) and episodic natural actor-critic with experience replay (eNACER). For TRACER, the trust region helps to control the learning step size and avoid catastrophic model changes. For eNACER, the natural gradient identifies the steepest ascent direction in policy space to speed up convergence. Both models employ off-policy learning with experience replay to improve sample efficiency. Secondly, to mitigate the cold start issue, a corpus of demonstration data is utilised to pre-train the models prior to on-line reinforcement learning. Combining these two approaches, we demonstrate a practical way to learn deep RL-based dialogue policies and show their effectiveness in a task-oriented information-seeking domain.
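
To illustrate the experience replay idea that both algorithms rely on, the following is a minimal sketch of a generic uniform replay buffer. It is not the TRACER or eNACER implementation: the names `Transition` and `ReplayBuffer`, the tuple-valued states, and the uniform sampling scheme are all assumptions made for illustration only.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple


class Transition(NamedTuple):
    """One dialogue turn as seen by the learner (hypothetical layout)."""
    state: tuple
    action: int
    reward: float
    next_state: tuple
    done: bool


class ReplayBuffer:
    """Fixed-capacity store of past transitions with uniform sampling."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # deque(maxlen=...) silently drops the oldest item once full.
        self._buffer: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self._buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Sample without replacement; never ask for more than is stored.
        k = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), k)

    def __len__(self) -> int:
        return len(self._buffer)
```

Because stored transitions were generated by earlier versions of the policy, reusing them is a form of off-policy learning: each interaction can contribute to several updates, which is where the gain in sample efficiency comes from.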

research
09/25/2019

Off-Policy Actor-Critic with Shared Experience Replay

We investigate the combination of actor-critic reinforcement learning al...
research
09/01/2022

Actor Prioritized Experience Replay

A widely-studied deep reinforcement learning (RL) technique known as Pri...
research
02/11/2018

Sample Efficient Deep Reinforcement Learning for Dialogue Systems with Large Action Spaces

In spoken dialogue systems, we aim to deploy artificial intelligence to ...
research
05/05/2020

Discrete-to-Deep Supervised Policy Learning

Neural networks are effective function approximators, but hard to train ...
research
06/10/2016

Policy Networks with Two-Stage Training for Dialogue Systems

In this paper, we propose to use deep policy networks which are trained ...
research
11/29/2017

A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management

Dialogue assistants are rapidly becoming an indispensable daily aid. To ...
research
11/30/2022

Efficient Reinforcement Learning (ERL): Targeted Exploration Through Action Saturation

Reinforcement Learning (RL) generally suffers from poor sample complexit...
