Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems

09/21/2020
by Ziming Li, et al.

Dialogue policy learning for task-oriented dialogue systems has made great progress recently, mostly through the use of reinforcement learning (RL) methods. However, these approaches have become very sophisticated, and it is time to re-evaluate them. Are we really making progress by developing dialogue agents based only on reinforcement learning? We demonstrate how (1) traditional supervised learning together with (2) a simulator-free adversarial learning method can achieve performance comparable to state-of-the-art RL-based methods. First, we introduce a simple dialogue action decoder to predict the appropriate actions. Then, we extend the traditional multi-label classification solution for dialogue policy learning by adding dense layers to improve the dialogue agent's performance. Finally, we employ the Gumbel-Softmax estimator to alternately train the dialogue agent and the dialogue reward model without using reinforcement learning. Based on extensive experiments, we conclude that the proposed methods achieve more stable and higher performance with less effort, avoiding, for example, the domain knowledge required to design a user simulator and the intractable parameter tuning of reinforcement learning. Our main goal is not to beat reinforcement learning with supervised learning, but to demonstrate the value of rethinking the roles of reinforcement learning and supervised learning in optimizing task-oriented dialogue systems.
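
To make the Gumbel-Softmax step more concrete, here is a minimal NumPy sketch of the generic Gumbel-Softmax relaxation. It is not the authors' implementation: the function name `gumbel_softmax_sample`, the `temperature` parameter, and the example logits over three hypothetical dialogue actions are assumptions made for illustration only. The idea is that adding Gumbel noise to the logits and applying a temperature-scaled softmax gives a differentiable approximation of sampling a discrete action, which is what allows a reward model's signal to be passed back to the agent by ordinary gradient descent rather than by an RL estimator.

```python
import numpy as np


def gumbel_softmax_sample(logits, temperature=1.0, rng=None):
    """Draw a relaxed (soft) one-hot sample from categorical logits.

    Generic Gumbel-Softmax sketch; names and defaults are illustrative
    assumptions, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=float)

    # Gumbel(0, 1) noise via the inverse-CDF trick: g = -log(-log(u)).
    # Clipping keeps both logarithms finite.
    u = np.clip(rng.uniform(size=logits.shape), 1e-10, 1.0 - 1e-10)
    gumbel_noise = -np.log(-np.log(u))

    # Perturb the logits, scale by the temperature, then apply a
    # numerically stable softmax over the last axis.
    y = (logits + gumbel_noise) / temperature
    y = y - y.max(axis=-1, keepdims=True)
    exp_y = np.exp(y)
    return exp_y / exp_y.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    # Hypothetical scores for three candidate dialogue actions.
    action_logits = np.array([2.0, 0.5, -1.0])
    sample = gumbel_softmax_sample(
        action_logits, temperature=0.5, rng=np.random.default_rng(0)
    )
    print(sample)  # a probability vector over the three actions
```

Lower temperatures push the output closer to a one-hot vector, while higher temperatures make it smoother. How the temperature is set or annealed in the paper is not stated in this abstract.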

research
07/07/2021

DORA: Toward Policy Optimization for Task-oriented Dialogue System with Efficient Context

Recently, reinforcement learning (RL) has been applied to task-oriented ...
research
07/25/2022

Post-processing Networks: Method for Optimizing Pipeline Task-oriented Dialogue Systems using Reinforcement Learning

Many studies have proposed methods for optimizing the dialogue performan...
research
08/01/2019

Reinforcement Learning for Personalized Dialogue Management

Language systems have been of great interest to the research community a...
research
12/17/2016

A User Simulator for Task-Completion Dialogues

Despite widespread interests in reinforcement-learning for task-oriented...
research
04/21/2020

Learning Goal-oriented Dialogue Policy with Opposite Agent Awareness

Most existing approaches for goal-oriented dialogue policy learning used...
research
07/24/2022

Anti-Overestimation Dialogue Policy Learning for Task-Completion Dialogue System

A dialogue policy module is an essential part of task-completion dialogu...
research
09/01/2023

JoTR: A Joint Transformer and Reinforcement Learning Framework for Dialog Policy Learning

Dialogue policy learning (DPL) is a crucial component of dialogue modell...
