Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models

02/23/2019
by Tiancheng Zhao, et al.

Defining action spaces for conversational agents and optimizing their decision-making with reinforcement learning is an enduring challenge. Common practice has been to use either handcrafted dialog acts or the output vocabulary (e.g., in neural encoder-decoders) as the action space, and both have their own limitations. This paper proposes a novel latent action framework that treats the action space of an end-to-end dialog agent as a latent variable and develops unsupervised methods to induce that action space from the data. Comprehensive experiments examine both continuous and discrete action types and two different optimization methods based on stochastic variational inference. Results show that the proposed latent actions achieve superior empirical performance over previous word-level policy gradient methods on both DealOrNoDeal and MultiWoz dialogs. Our detailed analysis also provides insights into various latent variable approaches for policy learning and can serve as a foundation for developing better latent actions in future research.
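To make the idea of applying policy gradient to a latent action rather than to individual words more concrete, here is a minimal toy sketch. It is not the paper's implementation: the function `reinforce_latent`, the single categorical latent variable, the fixed `target` action, and the 0/1 reward are all illustrative assumptions that stand in for a real dialog model and a real task-success signal.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_latent(num_actions=4, target=2, steps=2000, lr=0.1, seed=0):
    """REINFORCE over one discrete latent action z (toy assumption).

    The reward is 1 when the sampled latent action equals `target`,
    a hypothetical stand-in for dialog-level task success.
    """
    rng = random.Random(seed)
    logits = [0.0] * num_actions  # policy parameters over latent actions
    baseline = 0.0                # running-average reward baseline
    for _ in range(steps):
        probs = softmax(logits)
        z = rng.choices(range(num_actions), weights=probs)[0]
        reward = 1.0 if z == target else 0.0
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # d log p(z) / d logits[k] = 1[k == z] - probs[k]
        for k in range(num_actions):
            grad = (1.0 if k == z else 0.0) - probs[k]
            logits[k] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    print(reinforce_latent())
```

In this sketch the gradient touches only the distribution over the latent action. In a full model one would presumably condition a decoder on the sampled action to produce the words, so the reward signal would shape the action choice rather than every token decision.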

research
09/22/2020

SUMBT+LaRL: End-to-end Neural Task-oriented Dialog System with Reinforcement Learning

The recent advent of neural approaches for developing each dialog compon...
research
11/18/2020

LAVA: Latent Action Spaces via Variational Auto-encoding for Dialogue Policy Optimization

Reinforcement learning (RL) can enable task-oriented dialogue systems to...
research
07/26/2020

Data-efficient visuomotor policy training using reinforcement learning and generative models

We present a data-efficient framework for solving deep visuomotor sequen...
research
11/02/2018

Unsupervised Learning of Interpretable Dialog Models

Recently several deep learning based models have been proposed for end-t...
research
04/04/2023

Off-Policy Action Anticipation in Multi-Agent Reinforcement Learning

Learning anticipation in Multi-Agent Reinforcement Learning (MARL) is a ...
research
04/25/2022

"Think Before You Speak": Improving Multi-Action Dialog Policy by Planning Single-Action Dialogs

Multi-action dialog policy (MADP), which generates multiple atomic dialo...
research
07/05/2019

Deep Reinforcement Learning For Modeling Chit-Chat Dialog With Discrete Attributes

Open domain dialog systems face the challenge of being repetitive and pr...