Batch Policy Gradient Methods for Improving Neural Conversation Models

02/10/2017
by   Kirthevasan Kandasamy, et al.
0

We study reinforcement learning of chatbots with recurrent neural network architectures when the rewards are noisy and expensive to obtain. For instance, a chatbot used in automated customer service support can be scored by quality assurance agents, but this process can be expensive, time consuming and noisy. Previous reinforcement learning work for natural language processing uses on-policy updates and/or is designed for on-line learning settings. We demonstrate empirically that such strategies are not appropriate for this setting and develop an off-policy batch policy gradient method (BPG). We demonstrate the efficacy of our method via a series of synthetic experiments and an Amazon Mechanical Turk experiment on a restaurant recommendations dataset.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/21/2019

A Study of State Aliasing in Structured Prediction with RNNs

End-to-end reinforcement learning agents learn a state representation an...
research
01/22/2022

Bag of Tricks for Natural Policy Gradient Reinforcement Learning

Natural policy gradient methods are popular reinforcement learning metho...
research
08/24/2018

Proximal Policy Optimization and its Dynamic Version for Sequence Generation

In sequence generation task, many works use policy gradient for model op...
research
10/02/2018

Efficient Dialog Policy Learning via Positive Memory Retention

This paper is concerned with the training of recurrent neural networks a...
research
06/08/2020

A Decentralized Policy Gradient Approach to Multi-task Reinforcement Learning

We develop a mathematical framework for solving multi-task reinforcement...
research
10/28/2021

Bayesian Sequential Optimal Experimental Design for Nonlinear Models Using Policy Gradient Reinforcement Learning

We present a mathematical framework and computational methods to optimal...
research
09/12/2018

Automatic, Personalized, and Flexible Playlist Generation using Reinforcement Learning

Songs can be well arranged by professional music curators to form a rive...

Please sign up or login with your details

Forgot password? Click here to reset