End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning

06/03/2016
by Jason D. Williams, et al.

This paper presents a model for end-to-end learning of task-oriented dialog systems. The main component of the model is a recurrent neural network (an LSTM), which maps from raw dialog history directly to a distribution over system actions. The LSTM automatically infers a representation of dialog history, which relieves the system developer of much of the manual feature engineering of dialog state. In addition, the developer can provide software that expresses business rules and provides access to programmatic APIs, enabling the LSTM to take actions in the real world on behalf of the user. The LSTM can be optimized using supervised learning (SL), where a domain expert provides example dialogs which the LSTM should imitate; or using reinforcement learning (RL), where the system improves by interacting directly with end users. Experiments show that SL and RL are complementary: SL alone can derive a reasonable initial policy from a small number of training dialogs; and starting RL optimization with a policy trained with SL substantially accelerates the learning rate of RL.
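The abstract describes a network that outputs a distribution over system actions, alongside developer-provided software that expresses business rules. One way those two pieces could fit together is sketched below: the rules mark which actions are currently allowed, and the distribution is renormalized over those actions only. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name `masked_action_distribution`, the boolean `allowed` list, and the example scores are all hypothetical, and the raw scores stand in for whatever the LSTM's output layer would produce.

```python
import math


def masked_action_distribution(scores, allowed):
    """Softmax over raw action scores, restricted to allowed actions.

    scores  -- one raw score per system action (hypothetical network output)
    allowed -- one boolean per action, set by developer-supplied rules
               (an assumption for this sketch)
    """
    if len(scores) != len(allowed):
        raise ValueError("scores and allowed must have the same length")
    if not any(allowed):
        raise ValueError("at least one action must be allowed")

    # Subtract the largest allowed score for numerical stability.
    top = max(s for s, ok in zip(scores, allowed) if ok)

    # Disallowed actions get zero weight, so they can never be selected.
    weights = [math.exp(s - top) if ok else 0.0
               for s, ok in zip(scores, allowed)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: three actions, the second one blocked by a rule
# even though it has the highest raw score.
probs = masked_action_distribution([1.0, 5.0, 1.0], [True, False, True])
print(probs)  # [0.5, 0.0, 0.5]
```

With a mask of this kind, a blocked action receives probability zero no matter how high its raw score is, so the remaining probability mass is shared among the permitted actions.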


research
09/18/2017

Iterative Policy Learning in End-to-End Trainable Task-Oriented Neural Dialog Models

In this paper, we present a deep reinforcement learning (RL) framework f...
research
12/18/2016

Sample-efficient Deep Reinforcement Learning for Dialog Control

Representing a dialog policy as a recurrent neural network (RNN) is attr...
research
04/30/2020

Unsupervised Learning of KB Queries in Task Oriented Dialogs

Task-oriented dialog (TOD) systems converse with users to accomplish a s...
research
02/10/2017

Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning

End-to-end learning of recurrent neural networks (RNNs) is an attractive...
research
12/07/2017

End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient

Learning a goal-oriented dialog policy is generally performed offline wi...
research
01/27/2022

Excavation Reinforcement Learning Using Geometric Representation

Excavation of irregular rigid objects in clutter, such as fragmented roc...
research
04/13/2022

Revisiting Markovian Generative Architectures for Efficient Task-Oriented Dialog Systems

Recently, Transformer based pretrained language models (PLMs), such as G...
