DORA: Toward Policy Optimization for Task-oriented Dialogue System with Efficient Context

07/07/2021
by Hyunmin Jeon, et al.

Recently, reinforcement learning (RL) has been applied to task-oriented dialogue systems through latent actions to address the shortcomings of supervised learning (SL). In this paper, we propose a multi-domain task-oriented dialogue system called Dialogue System with Optimizing a Recurrent Action Policy using Efficient Context (DORA), which is first trained with SL and then optimized with RL using a recurrent dialogue policy. This dialogue policy recurrently generates explicit system actions, acting as both a word-level and a high-level policy. As a result, DORA is clearly optimized during both the SL and RL steps by using an explicit system action policy that considers an efficient context instead of the entire dialogue history. Unlike latent actions, these explicit system actions are both interpretable and controllable. DORA improved the success rate by 6.6 points on MultiWOZ 2.0 and by 10.9 points on MultiWOZ 2.1.
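
To make the SL-then-RL pipeline over explicit system actions concrete, here is a toy sketch written under loudly stated assumptions; it is not DORA's architecture. The action vocabulary, the success reward, the learning rates, and the names `ToyRecurrentActionPolicy`, `to_steps`, and `success` are all hypothetical. The policy is a tabular softmax conditioned only on the previously generated action (a stand-in for a recurrent decoder) and ignores the dialogue context entirely, so it does not model DORA's efficient context. The RL step uses plain REINFORCE with a moving-average baseline as a generic policy-gradient method; the paper's actual objective may differ.

```python
import math
import random

# Hypothetical explicit system-action tokens (domain-act-slot style) plus an end marker.
# Illustrative only; not DORA's actual action vocabulary.
ACTIONS = ["hotel-inform-price", "hotel-request-area", "booking-book-ref", "<eos>"]
START = "<sos>"


class ToyRecurrentActionPolicy:
    """Autoregressive policy p(a_t | a_{t-1}), one softmax row per previous action.

    Stand-in for a recurrent decoder: the only state carried between steps is the
    previously generated action. Dialogue context is ignored to keep the sketch small.
    """

    def __init__(self):
        self.logits = {prev: {a: 0.0 for a in ACTIONS} for prev in [START] + ACTIONS}

    def probs(self, prev):
        row = self.logits[prev]
        m = max(row.values())  # subtract the max for numerical stability
        exps = {a: math.exp(v - m) for a, v in row.items()}
        z = sum(exps.values())
        return {a: e / z for a, e in exps.items()}

    def greedy(self, max_len=4):
        prev, out = START, []
        for _ in range(max_len):
            p = self.probs(prev)
            prev = max(p, key=p.get)
            out.append(prev)
            if prev == "<eos>":
                break
        return out

    def sample(self, rng, max_len=4):
        # Returns (previous action, chosen action) pairs used by the update.
        prev, steps = START, []
        for _ in range(max_len):
            p = self.probs(prev)
            a = rng.choices(list(p), weights=list(p.values()))[0]
            steps.append((prev, a))
            if a == "<eos>":
                break
            prev = a
        return steps

    def update(self, steps, scale, lr):
        # Gradient of log softmax(a) w.r.t. logit(b) is 1[b == a] - p(b).
        # All gradients are computed before any logit is changed.
        grads = []
        for prev, a in steps:
            p = self.probs(prev)
            grads.append((prev, {b: (1.0 if b == a else 0.0) - p[b] for b in ACTIONS}))
        for prev, g in grads:
            for b, gb in g.items():
                self.logits[prev][b] += lr * scale * gb


def to_steps(actions):
    return list(zip([START] + actions[:-1], actions))


def success(steps):
    # Hypothetical task-success signal: the user goal requires a booking reference.
    return 1.0 if any(a == "booking-book-ref" for _, a in steps) else 0.0


policy = ToyRecurrentActionPolicy()
gold = ["hotel-inform-price", "booking-book-ref", "<eos>"]

# SL step: maximum-likelihood training on an annotated action sequence.
for _ in range(100):
    policy.update(to_steps(gold), scale=1.0, lr=0.5)

# RL step: REINFORCE, scaling each update by (reward - moving-average baseline).
rng = random.Random(0)
baseline = 0.0
for _ in range(100):
    steps = policy.sample(rng)
    r = success(steps)
    policy.update(steps, scale=r - baseline, lr=0.1)
    baseline = 0.9 * baseline + 0.1 * r

print(policy.greedy())  # a human-readable sequence of named system actions
```

Because the policy's output is a sequence of named actions rather than a latent vector, it can be read directly and, in principle, filtered or overridden before a response is generated, which is the interpretability and controllability contrast the abstract draws.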

research
02/28/2022

A Survey on Recent Advances and Challenges in Reinforcement Learning Methods for Task-Oriented Dialogue Policy Learning

Dialogue Policy Learning is a key component in a task-oriented dialogue ...
research
09/21/2020

Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems

Dialogue policy learning for task-oriented dialogue systems has enjoyed ...
research
09/22/2022

Learning Interpretable Latent Dialogue Actions With Less Supervision

We present a novel architecture for explainable modeling of task-oriente...
research
03/22/2023

Deep RL with Hierarchical Action Exploration for Dialogue Generation

Conventionally, since the natural language action space is astronomical,...
research
05/16/2022

Taming Continuous Posteriors for Latent Variational Dialogue Policies

Utilizing amortized variational inference for latent-action reinforcemen...
research
11/18/2020

LAVA: Latent Action Spaces via Variational Auto-encoding for Dialogue Policy Optimization

Reinforcement learning (RL) can enable task-oriented dialogue systems to...
research
06/04/2021

Retrieve & Memorize: Dialog Policy Learning with Multi-Action Memory

Dialogue policy learning, a subtask that determines the content of syste...
