Integrating Pretrained Language Model for Dialogue Policy Learning

11/02/2021
by   Hongru Wang, et al.

Reinforcement Learning (RL) has shown its potential for training a dialogue policy agent to maximize the accumulated rewards given by users. However, the reward can be very sparse, since it is usually provided only at the end of a dialog session, so an acceptable dialog agent requires an unaffordable number of interactions. Many prior efforts alternate between optimizing the policy and recovering the reward, an approach that easily gets stuck in local optima and suffers from model collapse. In contrast, we decompose the adversarial training into two steps: 1) we integrate a pre-trained language model as a discriminator to judge whether the current system action is good enough for the last user action (i.e., next action prediction); 2) the discriminator gives an extra local dense reward to guide the agent's exploration. The experimental results demonstrate that our method significantly improves the complete rate (4.4%) and success rate (8.0%) of the dialogue system.
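The dense-reward idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lookup discriminator, the action names, the `weight` parameter, and the additive way of combining rewards are all assumptions made for illustration. In the paper, the discriminator role is played by a pre-trained language model.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for the discriminator: maps (last user action,
# system action) to a score in [0, 1]. The paper uses a pre-trained
# language model for this; a toy lookup is used here instead.
Discriminator = Callable[[str, str], float]


def toy_discriminator(user_action: str, system_action: str) -> float:
    # Assumed set of "good" (user action, system action) pairs.
    good_pairs = {("request_price", "inform_price"), ("greet", "greet")}
    return 1.0 if (user_action, system_action) in good_pairs else 0.0


def shaped_rewards(
    turns: List[Tuple[str, str]],
    final_reward: float,
    discriminator: Discriminator,
    weight: float = 0.5,
) -> List[float]:
    """Return one reward per turn: a weighted discriminator score for
    every turn, plus the sparse session-level reward on the last turn."""
    rewards = [weight * discriminator(u, s) for u, s in turns]
    if rewards:
        rewards[-1] += final_reward
    return rewards


dialog = [
    ("greet", "greet"),
    ("request_price", "inform_area"),   # poor response, no dense reward
    ("request_price", "inform_price"),
]
print(shaped_rewards(dialog, final_reward=10.0, discriminator=toy_discriminator))
# → [0.5, 0.0, 10.5]
```

With only the sparse signal, the first two turns would both receive zero reward; the per-turn score lets the agent distinguish a suitable response from an unsuitable one before the session ends.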


research
09/05/2023

Dialog Action-Aware Transformer for Dialog Policy Learning

Recent works usually address Dialog policy learning DPL by training a re...
research
04/07/2020

Guided Dialog Policy Learning without Adversarial Learning in the Loop

Reinforcement-based training methods have emerged as the most popular ch...
research
04/10/2021

Imperfect also Deserves Reward: Multi-Level and Sequential Reward Modeling for Better Dialog Management

For task-oriented dialog systems, training a Reinforcement Learning (RL)...
research
07/13/2023

Why Guided Dialog Policy Learning performs well? Understanding the role of adversarial learning and its alternative

Dialog policies, which determine a system's action based on the current ...
research
10/12/2020

Human-centric Dialog Training via Offline Reinforcement Learning

How can we train a dialog model to produce better conversations by learn...
research
03/22/2023

Deep RL with Hierarchical Action Exploration for Dialogue Generation

Conventionally, since the natural language action space is astronomical,...
research
02/19/2019

A novel repetition normalized adversarial reward for headline generation

While reinforcement learning can effectively improve language generation...
