Dialog Action-Aware Transformer for Dialog Policy Learning

09/05/2023
by Huimin Wang, et al.

Recent works usually address dialog policy learning (DPL) by training a reinforcement learning (RL) agent to determine the best dialog action. However, existing deep RL approaches require a large volume of agent-user interactions to achieve acceptable performance. In this paper, we propose to make full use of the plain-text knowledge in a pre-trained language model to accelerate the RL agent's learning. Specifically, we design a dialog action-aware transformer encoder (DaTrans), which integrates a new fine-tuning procedure, the masked last action task, to encourage DaTrans to be dialog-aware and to distill action-specific features. Then, DaTrans is further optimized in an RL setting with ongoing interactions and evolves through exploration in the dialog action space toward maximizing long-term accumulated rewards. The effectiveness and efficiency of the proposed model are demonstrated with both simulator evaluation and human evaluation.
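
The abstract does not spell out how the masked last action task builds its training examples. As a rough illustration only, the sketch below assumes a dialog is represented as a list of action strings and that the final action is replaced by a mask token the encoder must recover. The function name `mask_last_action`, the `[MASK]` token, and the action strings are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical placeholder token; the paper's actual token is not given here.
MASK_TOKEN = "[MASK]"


def mask_last_action(actions: List[str]) -> Tuple[List[str], str]:
    """Hide the final dialog action and return (masked_input, target)."""
    if not actions:
        raise ValueError("need at least one dialog action")
    # Copy every action except the last, then append the mask token.
    masked = actions[:-1] + [MASK_TOKEN]
    # The hidden action becomes the prediction target.
    return masked, actions[-1]


# Illustrative dialog history (made-up action labels).
history = [
    "user:inform(cuisine)",
    "sys:request(area)",
    "user:inform(area)",
    "sys:recommend(name)",
]
masked_input, target = mask_last_action(history)
```

Under this assumption, an encoder fine-tuned to predict `target` from `masked_input` would have to model which action follows from the dialog so far, which matches the stated goal of making the encoder action-aware before RL training begins.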

