
Efficient Dialog Policy Learning via Positive Memory Retention

10/02/2018 ∙ by Rui Zhao, et al.

This paper is concerned with the training of recurrent neural networks as goal-oriented dialog agents using reinforcement learning. Training such agents with policy gradients typically requires a large number of samples. However, collecting the required data in the form of conversations between chat-bots and human agents is time-consuming and expensive. To mitigate this problem, we describe an efficient policy gradient method using positive memory retention, which significantly increases sample efficiency. We show that our method is 10 times more sample-efficient than policy gradients in extensive experiments on a new synthetic number guessing game. Moreover, in a real-world visual object discovery game, the proposed method is twice as sample-efficient as policy gradients and shows state-of-the-art performance.

Rui Zhao
Volker Tresp


research ∙ 12/18/2016

Sample-efficient Deep Reinforcement Learning for Dialog Control

Representing a dialog policy as a recurrent neural network (RNN) is attr...

Kavosh Asadi, et al.

research ∙ 07/02/2018

Improving Goal-Oriented Visual Dialog Agents via Advanced Recurrent Nets with Tempered Policy Gradient

Learning goal-oriented dialogues by means of deep reinforcement learning...

Rui Zhao, et al.

research ∙ 06/21/2019

A Study of State Aliasing in Structured Prediction with RNNs

End-to-end reinforcement learning agents learn a state representation an...

Layla El Asri, et al.

research ∙ 02/10/2017

Batch Policy Gradient Methods for Improving Neural Conversation Models

We study reinforcement learning of chatbots with recurrent neural networ...

Kirthevasan Kandasamy, et al.

research ∙ 05/07/2020

Adaptive Dialog Policy Learning with Hindsight and User Modeling

Reinforcement learning methods have been used to compute dialog policies...

Yan Cao, et al.

research ∙ 12/07/2017

End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient

Learning a goal-oriented dialog policy is generally performed offline wi...

Li Zhou, et al.

research ∙ 07/24/2019

Learning Goal-Oriented Visual Dialog Agents: Imitating and Surpassing Analytic Experts

This paper tackles the problem of learning a questioner in the goal-orie...

Yen-Wei Chang, et al.