Sample-efficient Deep Reinforcement Learning for Dialog Control

12/18/2016
by Kavosh Asadi, et al.

Representing a dialog policy as a recurrent neural network (RNN) is attractive because it handles partial observability, infers a latent representation of state, and can be optimized with supervised learning (SL) or reinforcement learning (RL). For RL, a policy gradient approach is natural but sample-inefficient. In this paper, we present three methods for reducing the number of dialogs required to optimize an RNN-based dialog policy with RL. The key idea is to maintain a second RNN that predicts the value of the current policy, and to apply experience replay to both networks. On two tasks, these methods reduce the number of dialogs (episodes) required by about a third compared with standard policy gradient methods.
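
The abstract names two ingredients: a value estimate for the current policy and experience replay. The sketch below is only an illustration of those two ideas in plain Python, not the paper's implementation. The function and class names, the toy rewards, and the hard-coded `value_estimates` (standing in for the outputs of the second RNN) are all assumptions made for the example. It shows how discounted returns can be reduced by a value baseline to form advantages, and how past dialogs can be stored for re-sampling.

```python
import random
from collections import deque


def discounted_returns(rewards, gamma=0.95):
    """Return G_t = r_t + gamma * G_{t+1} for every turn of one dialog."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, value_estimates):
    """Subtract per-turn value predictions (the baseline) from the returns."""
    return [g - v for g, v in zip(returns, value_estimates)]


class ReplayBuffer:
    """Fixed-size store of past dialogs that can be re-sampled for extra updates."""

    def __init__(self, capacity=1000, seed=0):
        # deque with maxlen silently drops the oldest dialog when full.
        self._dialogs = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, dialog):
        self._dialogs.append(dialog)

    def sample(self, batch_size):
        # Sample without replacement, capped at the number of stored dialogs.
        k = min(batch_size, len(self._dialogs))
        return self._rng.sample(list(self._dialogs), k)

    def __len__(self):
        return len(self._dialogs)


# Hypothetical three-turn dialog with a reward only at the end.
rewards = [0.0, 0.0, 1.0]
value_estimates = [0.5, 0.7, 0.9]  # assumed stand-in for the value RNN's outputs
returns = discounted_returns(rewards, gamma=0.5)
adv = advantages(returns, value_estimates)

buffer = ReplayBuffer(capacity=2)
buffer.add({"rewards": rewards, "returns": returns})
```

In a policy gradient update, each turn's log-probability gradient would be weighted by the corresponding entry of `adv` rather than by the raw return, which typically lowers variance. Reusing stored dialogs for policy updates generally also needs an off-policy correction, which this sketch leaves out.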

research
06/03/2016

End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning

This paper presents a model for end-to-end learning of task-oriented dia...
research
10/02/2018

Efficient Dialog Policy Learning via Positive Memory Retention

This paper is concerned with the training of recurrent neural networks a...
research
02/12/2018

ReinforceWalk: Learning to Walk in Graph with Monte Carlo Tree Search

Learning to walk over a graph towards a target node for a given input qu...
research
04/28/2020

Improving Sample Efficiency and Multi-Agent Communication in RL-based Train Rescheduling

We present preliminary results from our sixth placed entry to the Flatla...
research
02/10/2017

Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning

End-to-end learning of recurrent neural networks (RNNs) is an attractive...
research
04/12/2019

Similarities between policy gradient methods (PGM) in Reinforcement learning (RL) and supervised learning (SL)

Reinforcement learning (RL) is about sequential decision making and is t...
research
06/21/2019

A Study of State Aliasing in Structured Prediction with RNNs

End-to-end reinforcement learning agents learn a state representation an...
