Meta Dialogue Policy Learning

by Yumo Xu, et al.

A dialogue policy determines an agent's next action and is therefore central to a dialogue system. However, when migrated to a novel domain with little data, a policy model can fail to adapt because it has too few interactions with the new environment. We propose the Deep Transferable Q-Network (DTQN), which exploits low-level signals that are shareable between domains, such as dialogue acts and slots. We decompose the state and action representation spaces into feature subspaces corresponding to these low-level components to facilitate cross-domain knowledge transfer. Furthermore, we embed DTQN in a meta-learning framework and introduce Meta-DTQN, which uses a dual-replay mechanism to enable effective off-policy training and adaptation. In experiments on the multi-domain dialogue dataset MultiWOZ 2.0, our model outperforms baseline models in both success rate and dialogue efficiency.
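
To make the idea of decomposing states and actions into dialogue-act and slot subspaces more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' architecture: the act and slot vocabularies, the embedding size, the random embeddings, and the dot-product scoring are all assumptions chosen for illustration. It only shows how embedding tables shared across domains let a (dialogue act, slot) action be scored the same way wherever it appears.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabularies of low-level components (not from the paper).
ACTS = ["inform", "request", "confirm"]
SLOTS = ["area", "price", "food", "stars"]
EMB_DIM = 8  # assumed embedding size

# One embedding table per component type, shared by every domain.
act_emb = rng.normal(size=(len(ACTS), EMB_DIM))
slot_emb = rng.normal(size=(len(SLOTS), EMB_DIM))


def encode_action(act, slot):
    """Represent an action as [act embedding ; slot embedding]."""
    return np.concatenate([act_emb[ACTS.index(act)], slot_emb[SLOTS.index(slot)]])


def encode_state(act_counts, slot_filled):
    """Project act-level and slot-level state features into separate subspaces."""
    act_part = np.asarray(act_counts, dtype=float) @ act_emb
    slot_part = np.asarray(slot_filled, dtype=float) @ slot_emb
    return np.concatenate([act_part, slot_part])


def q_values(state_vec, actions):
    """Score each candidate action by a dot product with the state vector.

    The dot product stands in for a learned Q-network (an assumption).
    """
    action_mat = np.stack([encode_action(a, s) for a, s in actions])
    return action_mat @ state_vec


# Each domain exposes its own set of (act, slot) actions.
restaurant_actions = [("request", "food"), ("inform", "price"), ("confirm", "area")]
hotel_actions = [("request", "stars"), ("inform", "price"), ("confirm", "area")]

state = encode_state(act_counts=[1, 0, 2], slot_filled=[1, 0, 0, 1])
q_restaurant = q_values(state, restaurant_actions)
q_hotel = q_values(state, hotel_actions)
```

Because both domains index the same embedding tables, an action they have in common, such as ("inform", "price"), receives the same score in either domain for a given state. In a trained model the embeddings and the scoring function would be learned rather than random.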


Domain Adaptation in Dialogue Systems using Transfer and Meta-Learning

Current generative-based dialogue systems are data-hungry and fail to ad...

Cross-domain Dialogue Policy Transfer via Simultaneous Speech-act and Slot Alignment

Dialogue policy transfer enables us to build dialogue policies in a targ...

Generative Dialog Policy for Task-oriented Dialog Systems

There is an increasing demand for task-oriented dialogue systems which c...

A Student-Teacher Architecture for Dialog Domain Adaptation under the Meta-Learning Setting

Numerous new dialog domains are being created every day while collecting...

MTSS: Learn from Multiple Domain Teachers and Become a Multi-domain Dialogue Expert

How to build a high-quality multi-domain dialogue system is a challengin...

Improving the Generalizability of Collaborative Dialogue Analysis with Multi-Feature Embeddings

Conflict prediction in communication is integral to the design of virtua...

Dynamic Fusion Network for Multi-Domain End-to-end Task-Oriented Dialog

Recent studies have shown remarkable success in end-to-end task-oriented...