An Asynchronous Updating Reinforcement Learning Framework for Task-oriented Dialog System

05/04/2023
by Sai Zhang, et al.

Reinforcement learning has been applied to train dialog systems in many works. Previous approaches divide the dialog system into multiple modules, including DST (dialog state tracking) and DP (dialog policy), and train these modules simultaneously. However, the modules influence each other during training: errors from DST might misguide the dialog policy, and the system actions bring extra difficulties for the DST module. To alleviate this problem, we propose the Asynchronous Updating Reinforcement Learning framework (AURL), which updates the DST module and the DP module asynchronously under a cooperative setting. Furthermore, curriculum learning is implemented to address the problem of unbalanced data distribution during reinforcement learning sampling, and multiple user models are introduced to increase dialog diversity. Results on the public SSD-PHONE dataset show that our method achieves a compelling result with a 31.37 [...]. The code is publicly available via https://github.com/shunjiu/AURL.
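
The idea of updating the DST and DP modules asynchronously can be illustrated with a toy sketch. This is not the AURL implementation: the abstract does not specify the update schedule, so strict alternation between the two modules is an assumption here, and `ToyModule`, `run_episode`, and `train_async` are hypothetical names invented for illustration. Each module is reduced to a single scalar parameter, and the rollout returns one shared reward to mimic the cooperative setting.

```python
import random


class ToyModule:
    """Hypothetical stand-in for a trainable module (DST or DP)."""

    def __init__(self, name, param=0.0):
        self.name = name
        self.param = param   # single scalar "weight" for illustration
        self.updates = 0     # number of updates applied so far

    def update(self, reward, lr=0.1):
        # Toy update rule (assumption): nudge the parameter by the reward.
        self.param += lr * reward
        self.updates += 1


def run_episode(dst, dp, rng):
    """Hypothetical rollout returning one shared (cooperative) reward."""
    return rng.uniform(-1.0, 1.0) + 0.1 * (dst.param + dp.param)


def train_async(dst, dp, num_rounds=4, episodes_per_phase=5, seed=0):
    """Alternate phases: only one module is updated per phase,
    while the other stays frozen (assumed schedule)."""
    rng = random.Random(seed)
    schedule = []
    for round_idx in range(num_rounds):
        # Even rounds train DST with DP frozen; odd rounds do the reverse.
        active = dst if round_idx % 2 == 0 else dp
        for _ in range(episodes_per_phase):
            reward = run_episode(dst, dp, rng)
            active.update(reward)
        schedule.append(active.name)
    return schedule


if __name__ == "__main__":
    dst, dp = ToyModule("DST"), ToyModule("DP")
    print(train_async(dst, dp))
    print(dst.updates, dp.updates)
```

The point of the sketch is only that, within a phase, the frozen module presents a stationary target to the one being trained, which is the intuition the abstract gives for avoiding mutual interference between the two modules.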


