Show Us the Way: Learning to Manage Dialog from Demonstrations

04/17/2020
by Gabriel Gordon-Hall, et al.
HUAWEI Technologies Co., Ltd.

We present our submission to the End-to-End Multi-Domain Dialog Challenge Track of the Eighth Dialog System Technology Challenge. Our proposed dialog system adopts a pipeline architecture, with distinct components for Natural Language Understanding, Dialog State Tracking, Dialog Management and Natural Language Generation. At the core of our system is a reinforcement learning algorithm which uses Deep Q-learning from Demonstrations to learn a dialog policy with the help of expert examples. We find that demonstrations are essential to training an accurate dialog policy when both the state and action spaces are large. Evaluation of our Dialog Management component shows that our approach is effective, outperforming both supervised and reinforcement learning baselines.
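
The abstract does not spell out the training objective, so the following is only a minimal sketch of the general idea behind Deep Q-learning from Demonstrations: a standard one-step temporal-difference loss, plus a large-margin supervised term applied to expert transitions that pushes the expert's action above all others. It omits the n-step and regularization terms of the original formulation, and every function name, parameter, and default value here is an illustrative assumption, not the authors' implementation.

```python
from typing import Sequence


def large_margin_loss(q_values: Sequence[float], expert_action: int,
                      margin: float = 0.8) -> float:
    """Supervised term: max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E).

    l(a_E, a) is 0 for the expert action and `margin` otherwise, so the
    loss is zero only when the expert action leads by at least `margin`.
    """
    augmented = [
        q + (0.0 if a == expert_action else margin)
        for a, q in enumerate(q_values)
    ]
    return max(augmented) - q_values[expert_action]


def td_loss(q_sa: float, reward: float, next_q_values: Sequence[float],
            gamma: float = 0.99, done: bool = False) -> float:
    """Squared one-step TD error against a bootstrapped target."""
    target = reward if done else reward + gamma * max(next_q_values)
    return (target - q_sa) ** 2


def dqfd_loss(q_values: Sequence[float], action: int, reward: float,
              next_q_values: Sequence[float], done: bool, is_demo: bool,
              gamma: float = 0.99, margin: float = 0.8,
              lambda_margin: float = 1.0) -> float:
    """Combined loss for one transition (hypothetical weighting).

    The margin term is added only for demonstration transitions, where
    `action` is the expert's action.
    """
    loss = td_loss(q_values[action], reward, next_q_values, gamma, done)
    if is_demo:
        loss += lambda_margin * large_margin_loss(q_values, action, margin)
    return loss
```

In this sketch, transitions collected by the agent itself contribute only the TD term, while expert transitions also contribute the margin term, which is one way demonstrations can guide a policy when the state and action spaces are large.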

04/23/2020

Learning Dialog Policies from Weak Demonstrations

Deep reinforcement learning is a promising approach to training a dialog...
02/08/2021

A Hybrid Task-Oriented Dialog System with Domain and Task Adaptive Pretraining

This paper describes our submission for the End-to-end Multi-domain Task...
09/12/2019

MOSS: End-to-End Dialog System Framework with Modular Supervision

A major bottleneck in training end-to-end task-oriented dialog system is...
11/16/2020

Dialog Simulation with Realistic Variations for Training Goal-Oriented Conversational Systems

Goal-oriented dialog systems enable users to complete specific goals lik...
11/26/2018

Learning Latent Beliefs and Performing Epistemic Reasoning for Efficient and Meaningful Dialog Management

Many dialogue management frameworks allow the system designer to directl...
06/16/2016

Spectral decomposition method of dialog state tracking via collective matrix factorization

The task of dialog management is commonly decomposed into two sequential...
05/07/2020

Adaptive Dialog Policy Learning with Hindsight and User Modeling

Reinforcement learning methods have been used to compute dialog policies...