Efficient Dialog Policy Learning via Positive Memory Retention

10/02/2018
by   Rui Zhao, et al.
0

This paper is concerned with the training of recurrent neural networks as goal-oriented dialog agents using reinforcement learning. Training such agents with policy gradients typically requires a large amount of samples. However, the collection of the required data in form of conversations between chat-bots and human agents is time-consuming and expensive. To mitigate this problem, we describe an efficient policy gradient method using positive memory retention, which significantly increases the sample-efficiency. We show that our method is 10 times more sample-efficient than policy gradients in extensive experiments on a new synthetic number guessing game. Moreover, in a real-word visual object discovery game, the proposed method is twice as sample-efficient as policy gradients and shows state-of-the-art performance.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/18/2016

Sample-efficient Deep Reinforcement Learning for Dialog Control

Representing a dialog policy as a recurrent neural network (RNN) is attr...
research
07/02/2018

Improving Goal-Oriented Visual Dialog Agents via Advanced Recurrent Nets with Tempered Policy Gradient

Learning goal-oriented dialogues by means of deep reinforcement learning...
research
06/21/2019

A Study of State Aliasing in Structured Prediction with RNNs

End-to-end reinforcement learning agents learn a state representation an...
research
02/10/2017

Batch Policy Gradient Methods for Improving Neural Conversation Models

We study reinforcement learning of chatbots with recurrent neural networ...
research
05/07/2020

Adaptive Dialog Policy Learning with Hindsight and User Modeling

Reinforcement learning methods have been used to compute dialog policies...
research
12/07/2017

End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient

Learning a goal-oriented dialog policy is generally performed offline wi...
research
07/24/2019

Learning Goal-Oriented Visual Dialog Agents: Imitating and Surpassing Analytic Experts

This paper tackles the problem of learning a questioner in the goal-orie...

Please sign up or login with your details

Forgot password? Click here to reset