Deep RL with Hierarchical Action Exploration for Dialogue Generation

03/22/2023
by Itsugun Cho, et al.

Because the natural-language action space is astronomically large, approximate dynamic programming applied to dialogue generation conventionally performs policy improvement over sampled actions. However, this practice is inefficient for reinforcement learning (RL) because eligible responses (those with high action values) are very sparse, and the greedy policy sustained by random sampling is weak. This paper shows, both theoretically and experimentally, that the performance of a dialogue policy is positively correlated with the sampling size. We introduce a novel dual-granularity Q-function that alleviates this limitation by exploring the most promising response category to intervene in the sampling. It extracts actions following the grained hierarchy, which can reach the optimum with fewer policy iterations. Our approach learns by offline RL from multiple reward functions designed to recognize human emotional details. Empirical studies demonstrate that our algorithm outperforms the baseline methods. Further verification shows that it generates responses with higher expected rewards and greater controllability.
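The sampling argument in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the action representation, the category means, and every function name below are hypothetical, and the coarse Q-function is simply given rather than learned. The sketch only shows why the best of n sampled actions improves with n, and why restricting sampling to the most promising category helps at a fixed n.

```python
import random

# Hypothetical toy setup: an action is (category, value), where value stands in
# for the action value of a sampled response. Only category 3 is "promising".
CATEGORY_MEANS = [0.0, 0.0, 0.0, 2.0]

def sample_in_category(category, rng):
    """Draw one action from the given category (assumed Gaussian values)."""
    return (category, rng.gauss(CATEGORY_MEANS[category], 1.0))

def sample_flat(rng):
    """Draw one action with the category chosen uniformly at random."""
    return sample_in_category(rng.randrange(len(CATEGORY_MEANS)), rng)

def coarse_q(category):
    """Coarse-grained value of a category (assumed known here, not learned)."""
    return CATEGORY_MEANS[category]

def fine_q(action):
    """Fine-grained value of a single action."""
    return action[1]

def greedy_flat(n, rng):
    """Greedy policy over n actions sampled without any category guidance."""
    return max((sample_flat(rng) for _ in range(n)), key=fine_q)

def greedy_hierarchical(n, rng):
    """Pick the best category first, then act greedily over n samples from it."""
    best = max(range(len(CATEGORY_MEANS)), key=coarse_q)
    return max((sample_in_category(best, rng) for _ in range(n)), key=fine_q)

def mean_value(policy, n, trials=2000, seed=0):
    """Average value of the action chosen by `policy` over repeated trials."""
    rng = random.Random(seed)
    return sum(fine_q(policy(n, rng)) for _ in range(trials)) / trials
```

Comparing `mean_value(greedy_flat, 1)` with `mean_value(greedy_flat, 16)` shows the effect of sampling size, and comparing `mean_value(greedy_flat, 4)` with `mean_value(greedy_hierarchical, 4)` shows the effect of category guidance at the same sampling budget.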


research
07/07/2021

DORA: Toward Policy Optimization for Task-oriented Dialogue System with Efficient Context

Recently, reinforcement learning (RL) has been applied to task-oriented ...
research
09/01/2023

JoTR: A Joint Transformer and Reinforcement Learning Framework for Dialog Policy Learning

Dialogue policy learning (DPL) is a crucial component of dialogue modell...
research
08/04/2023

ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation

Applying Reinforcement Learning (RL) to sequence generation models enabl...
research
07/24/2022

Anti-Overestimation Dialogue Policy Learning for Task-Completion Dialogue System

A dialogue policy module is an essential part of task-completion dialogu...
research
07/23/2023

On the Effectiveness of Offline RL for Dialogue Response Generation

A common training technique for language models is teacher forcing (TF)....
research
03/10/2021

Causal-aware Safe Policy Improvement for Task-oriented dialogue

The recent success of reinforcement learning's (RL) in solving complex t...
research
11/02/2021

Integrating Pretrained Language Model for Dialogue Policy Learning

Reinforcement Learning (RL) has been witnessed its potential for trainin...
