Recent data-driven machine learning approaches treat dialogue as a sequence-to-sequence generation problem and train their models on large datasets, e.g. Wen.etal16a; Wen.etal16b; vinyals. While these systems reproduce patterns found in the training data, they do not exploit any structural knowledge about language encoded in grammars and formal models of dialogue. However, the interpretation of many context-dependent utterances in dialogue depends on the underlying structure and content of prior dialogue turns (see e.g. Purver.Ginzburg04; Eshghi.etal15 on how clarification requests are interpreted). Consequently, models that rely on surface features of the dialogue alone (i.e. words) may be unable to handle such data appropriately (e.g. by providing a relevant response), even if they have observed the relevant sequences often. Furthermore, since these systems do not parse to logical forms (i.e. compositional, interpretable representations), they do not support inference. This further limits their application: such a system has no notion of why or how it acts the way it does, and so cannot explain its actions or reasoning.
For these reasons, we explore how formal grammars and dialogue models can be combined with machine learning methods, using linguistic knowledge to bootstrap new dialogue systems from very small amounts of unannotated data. This also has the important benefit of reducing developer effort. In addition, we learn dialogue policies at the word level, rather than the turn level, producing more natural dialogues that users are known to prefer (e.g. aistincremental; see examples in figure 2).
2 Inducing Dialogue Systems
Our overall method combines incremental dialogue parsing and Reinforcement Learning for system utterance generation in context. We employ a Dynamic Syntax (DS) parser Kempson.etal01 for incremental language understanding and tracking of the dialogue state, using Eshghi et al.'s model of feedback in dialogue Eshghi.etal15; Eshghi15, together with a set $D$ of transcribed successful dialogues in the target application domain.
We construct a Markov Decision Process (MDP) for the domain: the state space $S$ is induced using DS, as described in section 2.1. We define the state encoding function $f : C \to S$, where any $c \in C$ is a DS context and $s \in S$ is a (binary) state vector; for more details see section 2.2. Finally, we define the action set $A$ as the DS lexicon (i.e. MDP actions are words) and the reward function $R$, which is described in section 2.4. We then use Reinforcement Learning to train a policy $\pi : S \to A$, where $A$ is the DS lexicon and $s = f(c)$, with $c$ the (incrementally constructed) dialogue context as output by DS at any point in a dialogue. The system is trained in interaction with a (semantic) simulated user, also automatically built from the dialogue data (see section 2.3).
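To make the word-level MDP concrete, here is a minimal, self-contained sketch of the training setup just described, with every component a stand-in: a toy lexicon, a stub "parser" that simply appends words to the context, and tabular Q-learning as one simple choice of RL algorithm (the paper does not commit to a particular one here). The goal context is reduced to producing a single target utterance.

```python
# Schematic sketch of word-level policy learning: actions are words from
# the lexicon, the "context" is the word sequence built so far, and a
# toy reward signals reaching the goal context. All names/values are
# illustrative assumptions, not the paper's actual components.
import random
from collections import defaultdict

LEXICON = ["what", "would", "you", "like", "?"]
GOAL = tuple(LEXICON)                      # toy goal context: the full question

def step(context, word):
    """Stub environment: extend the context with one word-action."""
    new_ctx = context + (word,)
    if new_ctx == GOAL:
        return new_ctx, 10.0, True         # goal context reached
    if GOAL[:len(new_ctx)] != new_ctx:
        return new_ctx, -10.0, True        # out-of-context word: penalised
    return new_ctx, 0.0, False

Q = defaultdict(float)                     # tabular action values
rng = random.Random(1)
for episode in range(500):
    ctx, done = (), False
    while not done:
        if rng.random() < 0.2:             # epsilon-greedy exploration
            word = rng.choice(LEXICON)
        else:
            word = max(LEXICON, key=lambda w: Q[(ctx, w)])
        nxt, r, done = step(ctx, word)
        best_next = 0.0 if done else max(Q[(nxt, w)] for w in LEXICON)
        Q[(ctx, word)] += 0.5 * (r + best_next - Q[(ctx, word)])
        ctx = nxt

# greedy rollout after training
ctx = ()
for _ in range(len(GOAL)):
    ctx += (max(LEXICON, key=lambda w: Q[(ctx, w)]),)
print(" ".join(ctx))
```

In the real system the context is a DS record type rather than a word list, and the state is the binary encoding of that context described in section 2.2.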
The resulting learned policy forms the combined (incremental) DM and NLG components of a dialogue system for the domain: i.e. a jointly optimised action selection mechanism for DM and NLG, with DS providing the language understanding component. We now describe each of these steps in detail.
2.1 Inducing the MDP state space
We induce an MDP state space from the dialogue data by tracking all and only those semantic features which are relevant in the domain. These constitute the goal contexts reached in the dialogues in $D$, expressed as Record Types (RTs) in Type Theory with Records Cooper05, where each feature is an atomic (i.e. non-decomposable) RT, usually a predicate packaged together with its argument fields (see Fig. 1 for example RT features). Importantly for us here, the standard subtype relation can also be defined for RTs, and is used in the state encoding function (section 2.2).
To induce the MDP state space, we parse all $d \in D$ using DS, generating a set of final success contexts. We take the Maximally Specific Common Supertype (MCS; see Hough.Purver14) and abstract out the domain 'slot values'. This process has been dubbed 'delexicalisation' in recent work Wen.etal16a; Gasic.etal13, but we note that while it has previously been applied at the dialogue surface level, either by hand or via an external domain ontology, here we do it automatically. This results in the goal contexts for the domain, containing the semantic features to be tracked in the MDP state space. Finally, we decompose these into their constituent, atomic semantic features, which will go on to be encoded by the state encoding function.
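The abstraction step can be illustrated with a toy sketch. Here goal contexts are simplified to flat dicts of field-to-value pairs rather than full TTR record types, and the "slot values" are detected as the fields whose concrete values vary across goal contexts; all names are hypothetical.

```python
# Hypothetical sketch of the automatic 'delexicalisation' step: keep the
# structure shared by all goal contexts and abstract out fields whose
# concrete values differ (the domain 'slot values').
def delexicalise(goal_contexts):
    """Return the common structure with varying values abstracted to slots."""
    fields = set(goal_contexts[0])
    for ctx in goal_contexts[1:]:
        fields &= set(ctx)                  # fields present in every context
    abstracted = {}
    for f in sorted(fields):
        values = {ctx[f] for ctx in goal_contexts}
        # a field with varying values is a slot: abstract it to a variable
        abstracted[f] = "?" + f if len(values) > 1 else goal_contexts[0][f]
    return abstracted

contexts = [
    {"event": "like", "object_type": "phone", "brand": "apple"},
    {"event": "like", "object_type": "phone", "brand": "lg"},
]
print(delexicalise(contexts))
# {'brand': '?brand', 'event': 'like', 'object_type': 'phone'}
```

In the actual method this operates over TTR record types via the MCS construction rather than flat dicts, but the effect, abstracting concrete slot values out of shared semantic structure, is the same.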
For example, the semantic RT features tracked in Fig. 1 result from automatically decomposing goal contexts in the consumer electronics domain. From left to right, these correspond roughly to the following: "there is something that's a brand"; "there is a liking (or wanting, or all equivalents) event in the present tense"; "there is something made by that brand"; "the subject of the liking event is the user"; "the object of the liking event is the thing by that brand".
2.2 The state encoding function
As shown in figure 1, the MDP state is a binary vector of size $2n$, i.e. twice the number $n$ of RT features. The first half of the state vector encodes the grounded features (i.e. those agreed by the participants), while the second half encodes the current semantics being incrementally built in the current dialogue utterance.
Formally, the state vector is given by $s = f(c)$, where $s_i = 1$ if $c \sqsubseteq \phi_i$, and $0$ otherwise (recall that $\sqsubseteq$ is the RT subtype relation); here $\phi_i$ is the $i$-th RT feature, and $c$ is the grounded context for the first half of the vector and the current utterance semantics for the second half.
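The encoding can be sketched as follows. Contexts and features are simplified to sets of atomic facts, and the RT subtype check $c \sqsubseteq \phi$ is approximated by set inclusion; all names here are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of the state encoding function f: the state is a
# binary vector of length 2n, first the grounded context's features,
# then the current utterance semantics' features.
def is_subtype(context, feature):
    # set inclusion stands in for the RT subtype relation
    return feature <= context

def encode_state(features, grounded_ctx, current_ctx):
    """Binary vector of length 2n: grounded half, then current half."""
    return ([1 if is_subtype(grounded_ctx, phi) else 0 for phi in features]
            + [1 if is_subtype(current_ctx, phi) else 0 for phi in features])

features = [frozenset({"type:phone"}), frozenset({"brand:?b"})]
state = encode_state(features,
                     grounded_ctx={"type:phone"},              # agreed so far
                     current_ctx={"type:phone", "brand:?b"})   # being built
print(state)  # [1, 0, 1, 1]
```

Here the brand feature is present in the current utterance semantics but not yet grounded, so only the second half of the vector has both bits set.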
2.3 Semantic User Simulation
Unlike most other dialogue systems, ours is not based on dialogue act representations, and is word-by-word incremental. In this setup the notion of a dialogue turn has no clear definition. Turns are here defined by the user simulator, which interrupts system generation according to turn boundaries encountered in the data. The rules for interrupting system generation and outputting a user utterance are extracted automatically from the unlabelled dialogue data ($D$) via incremental parsing using DS: intermediate contexts and the user utterances occurring in them are recorded for use by the simulator. As these rules are semantic, they generalise across different interactional variants by assigning user utterances to matched dialogue contexts. If the simulator cannot match the context, the system output is considered out-of-domain and is penalised. Formally, the extracted rules for the simulator are of the form $c \to \{u_1, \dots, u_k\}$, where $c$ is the current semantics of some prior system turn and the $u_j$ are the utterances (strings) output by the user in that context, as observed in $D$.
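A toy sketch of the simulator's rule matching follows. Contexts are again simplified to sets of facts with set inclusion standing in for subtype matching, and the rule table and utterances are invented for illustration.

```python
# Toy sketch of the semantic user simulator: rules map a (simplified)
# dialogue context to the user utterances observed in that context in D.
# Rule contexts and utterances below are illustrative, not from the paper.
import random

RULES = {
    frozenset({"asked:type"}):  ["a phone", "I would like a phone"],
    frozenset({"asked:brand"}): ["by Apple", "LG."],
}

def simulate_user(current_ctx, rng=random.Random(0)):
    """Return a user utterance for a matched context, or None (out-of-domain)."""
    for rule_ctx, utterances in RULES.items():
        if rule_ctx <= current_ctx:        # context matches the rule
            return rng.choice(utterances)  # any observed variant will do
    return None                            # unmatched: system output penalised

print(simulate_user(frozenset({"asked:brand", "type:phone"})))
```

Because matching is over semantics rather than strings, one rule covers all surface variants of the system utterance that reach the same context, which is what lets the simulator respond to system outputs never seen verbatim in the data.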
2.4 Reinforcement Learning method
The reward function assigns a large negative reward in the case of out-of-context, ungrammatical, or overly long utterances; a large positive reward when the agent reaches the final (goal) context; and zero otherwise.
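This reward structure can be written down directly; the concrete magnitudes below are assumptions for illustration (the section does not fix them), and only the signs and the three-way case split come from the description above.

```python
# Hedged sketch of the reward function: the +/-1000 magnitudes are
# assumed, not taken from the paper; only the case structure is.
GOAL_REWARD = 1000   # assumed magnitude of the goal reward
PENALTY = -1000      # assumed magnitude of the penalty

def reward(state):
    """Three-way reward: penalty, goal reward, or zero."""
    if state.get("out_of_context") or state.get("ungrammatical") \
            or state.get("too_long"):
        return PENALTY
    if state.get("at_goal"):
        return GOAL_REWARD
    return 0

print(reward({"at_goal": True}))        # 1000
print(reward({"ungrammatical": True}))  # -1000
```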
We have trained policies with this method that successfully perform dialogues in the domain of electronics sales; see figure 2.
We have bootstrapped a system using this method from only a single dialogue, showing that incremental dialogue systems can be automatically created from small amounts of transcribed dialogue data. Besides reducing development time and cost (since it requires no annotated data), our system discovers (and can process) many interactional variations not found in the training data. For example, figure 2 shows several structural dialogue variants discovered by the system (via RL policy exploration) when it was given only a single training dialogue. The training dialogue was: "SYS: What would you like?; USR: a phone; SYS: by which brand?; USR: by Apple."
Figure 2: Example dialogue variants discovered by the system (each dialogue reads top to bottom):

  USR: I would like a phone.     USR: I would like a phone      SYS: what would you like?
  SYS: by which brand?           SYS: …by?                      USR: a phone
  USR: Apple.                    USR: LG.                       SYS: …by?
  USR: okay.                     SYS: okay.                     USR: Samsung

  SYS: what would you like?      SYS: you would like…?          SYS: you like…?
  USR: a phone by LG             USR: I would like a computer   USR: a tablet by Google.
  SYS: okay.                     SYS: by which brand?           SYS: okay.

  USR: I would like an LG phone
  SYS: you like…?
  SYS: a phone
The benefits of such incremental dialogue variants have been empirically established in prior work (e.g. aistincremental ). Our work shows the additional benefits of combining linguistic knowledge with machine learning methods: minimising the role of the dialogue engineer, and rapid domain transfer. Ongoing work involves integrating this method with an end-to-end spoken dialogue system framework, and more substantial evaluation with real users. We are also employing this method for the task of learning perceptually grounded language Yu.etal16 .
-  Gregory Aist, James Allen, Ellen Campana, Carlos Gallo, Scott Stoness, Mary Swift, and Michael Tanenhaus. Incremental dialogue system faster than and preferred to its nonincremental counterpart. In Annual Conference of the Cognitive Science Society, 2007.
-  Robin Cooper. Records and record types in semantic theory. Journal of Logic and Computation, 15(2):99–112, 2005.
-  A. Eshghi, C. Howes, E. Gregoromichelaki, J. Hough, and M. Purver. Feedback in conversation as incremental semantic update. In Proceedings of the 11th International Conference on Computational Semantics (IWCS 2015), 2015.
-  Arash Eshghi. DS-TTR: An incremental, semantic, contextual parser for dialogue. In Proceedings of SemDial 2015 (goDIAL), the 19th Workshop on the Semantics and Pragmatics of Dialogue, 2015.
-  Milica Gašić, C. Breslin, M. Henderson, D. Kim, M. Szummer, B. Thompson, P. Tsiakoulis, and S. Young. POMDP-based dialogue manager adaptation to extended domains. In Proc. SIGDIAL, 2013.
-  Julian Hough and Matthew Purver. Probabilistic type theory for incremental dialogue processing. In Proc. EACL 2014 Workshop on Type Theory and Natural Language Semantics (TTNLS), 2014.
-  Ruth Kempson, Wilfried Meyer-Viol, and Dov Gabbay. Dynamic Syntax: The Flow of Language Understanding. Blackwell, 2001.
-  Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. In NIPS Deep Learning Workshop. 2013.
-  Matthew Purver and Jonathan Ginzburg. Clarifying noun phrase semantics. Journal of Semantics, 21(3):283–339, 2004.
-  Oriol Vinyals and Quoc Le. A neural conversational model. arXiv preprint: 1506.05869, 2015.
-  Tsung-Hsien Wen, Milica Gašić, Nikola Mrkšić, Lina M. Rojas-Barahona, Pei-Hao Su, David Vandyke, and Steve Young. Multi-domain neural network language generation for spoken dialogue systems. In Proc. NAACL, 2016.
-  Tsung-Hsien Wen, David Vandyke, Nikola Mrkšić, Milica Gašić, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young. A network-based end-to-end trainable task-oriented dialogue system. arXiv preprint: 1604.04562, April 2016.
-  Yanchao Yu, Arash Eshghi, and Oliver Lemon. Training an adaptive dialogue policy for interactive learning of visually grounded word meanings. In Proceedings of SIGDIAL 2016, 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 339–349, Los Angeles, 2016.