Imperfect also Deserves Reward: Multi-Level and Sequential Reward Modeling for Better Dialog Management

04/10/2021
by Zhengxu Hou, et al.

For task-oriented dialog systems, training a Reinforcement Learning (RL) based Dialog Management module suffers from low sample efficiency and slow convergence speed due to the sparse rewards in RL. To solve this problem, many strategies have been proposed to give proper rewards when training RL, but their rewards lack interpretability and cannot accurately estimate the distribution of state-action pairs in real dialogs. In this paper, we propose a multi-level reward modeling approach that factorizes a reward into a three-level hierarchy: domain, act, and slot. Based on inverse adversarial reinforcement learning, our designed reward model can provide more accurate and explainable reward signals for state-action pairs. Extensive evaluations show that our approach can be applied to a wide range of reinforcement learning-based dialog systems and significantly improves both the performance and the speed of convergence.
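
To make the three-level factorization concrete, the following is a minimal sketch of how a domain/act/slot reward could be computed sequentially, so that a partially correct action still earns partial credit. It is an illustration only, not the paper's reward model: the abstract describes a reward learned with inverse adversarial reinforcement learning, whereas this sketch swaps in exact matching against a reference action. The names `DialogAct`, `multi_level_reward`, and the `weights` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DialogAct:
    """A system action split into the three levels named in the abstract."""
    domain: str
    act: str
    slot: str


def multi_level_reward(predicted: DialogAct, reference: DialogAct,
                       weights=(1.0, 1.0, 1.0)) -> float:
    """Hypothetical sequential reward over domain, act, and slot.

    Assumption: a lower level only earns reward if every level above it
    matched the reference. Exact matching stands in for a learned reward.
    """
    w_domain, w_act, w_slot = weights
    reward = 0.0
    # Level 1: a wrong domain earns nothing, and lower levels are skipped.
    if predicted.domain != reference.domain:
        return reward
    reward += w_domain
    # Level 2: the act is only scored once the domain is right.
    if predicted.act != reference.act:
        return reward
    reward += w_act
    # Level 3: the slot is only scored once domain and act are right.
    if predicted.slot == reference.slot:
        reward += w_slot
    return reward


reference = DialogAct(domain="hotel", act="inform", slot="price")
imperfect = DialogAct(domain="hotel", act="inform", slot="area")
partial = multi_level_reward(imperfect, reference)  # domain and act match, slot does not
```

Under these assumptions, an action with the right domain and act but the wrong slot still receives a nonzero signal, which is one way to read the "imperfect also deserves reward" idea in the title; a sparse task-success reward would give it nothing until the dialog ends.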

