Deep Reinforcement Learning for Chatbots Using Clustered Actions and Human-Likeness Rewards

08/27/2019
by Heriberto Cuayáhuitl, et al.

Training chatbots using the reinforcement learning paradigm is challenging because of high-dimensional states, infinite action spaces and the difficulty of specifying the reward function. We address these problems using clustered actions instead of infinite actions, and a simple but promising reward function based on human-likeness scores derived from human-human dialogue data. We train Deep Reinforcement Learning (DRL) agents using chitchat data in raw text, without any manual annotations. Experimental results using different splits of the training data show the following. First, our agents learn reasonable policies in the environments they are familiarised with, but their performance drops substantially when they are exposed to a test set of unseen dialogues. Second, sentence embedding sizes between 100 and 300 dimensions make no significant difference on test data. Third, our proposed human-likeness rewards are reasonable for training chatbots as long as they use lengthy dialogue histories of 10 or more sentences.
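
The abstract does not spell out how the infinite space of possible responses is reduced to clustered actions. The sketch below is a minimal illustration, assuming k-means over precomputed sentence embeddings; the clustering algorithm, the farthest-first initialisation, and the helper names `kmeans` and `nearest_centroid` are illustrative assumptions, not the authors' code. It shows how each candidate response maps to one of k discrete action IDs that a DRL agent could choose from.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def nearest_centroid(point: Sequence[float], centroids: List[Vector]) -> int:
    """Index of the closest centroid, i.e. the discrete action ID for this sentence."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def init_farthest_first(points: List[Vector], k: int) -> List[Vector]:
    """Deterministic initialisation (assumption): repeatedly pick the point farthest from chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points: List[Vector], k: int, iterations: int = 50) -> List[Vector]:
    """Plain k-means (Lloyd's algorithm) over sentence embeddings; returns k centroids."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = init_farthest_first(points, k)
    for _ in range(iterations):
        # Assignment step: group every embedding under its nearest centroid.
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # assignments are stable, so the clustering has converged
        centroids = new_centroids
    return centroids


if __name__ == "__main__":
    # Toy 2-D vectors standing in for 100- or 300-dimensional sentence embeddings (hypothetical data).
    responses = {
        "hi there": [0.0, 0.1],
        "hello!": [0.1, 0.0],
        "see you later": [5.0, 5.1],
        "bye": [5.1, 4.9],
    }
    centroids = kmeans(list(responses.values()), k=2)
    for text, vec in responses.items():
        print(text, "-> action", nearest_centroid(vec, centroids))
```

Under this assumption the agent's output layer needs only k units rather than one per possible sentence; at run time, a concrete response could then be drawn from the sentences assigned to the selected cluster.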
