Context-Sensitive Generation Network for Handing Unknown Slot Values in Dialogue State Tracking

05/08/2020
by   Puhai Yang, et al.
Beijing Institute of Technology

Dialogue state tracking is a key component of a dialogue system, and handling unknown slot values is one of its central problems. As far as we know, almost all existing approaches depend on a pointer network to solve the unknown slot value problem. Because of the nature of a pointer network, these methods usually carry a hidden assumption that an unknown slot value contains at most one out-of-vocabulary word. However, an unknown slot value often contains multiple out-of-vocabulary words, which makes the existing methods perform poorly. To tackle this problem, we propose a novel Context-Sensitive Generation network (CSG), which enriches the representation of out-of-vocabulary words when generating the unknown slot value. Extensive experiments show that our proposed method performs better than the state-of-the-art baselines.


1 Introduction

The research and application of dialogue systems currently attract wide attention, especially task-oriented dialogue systems for tasks such as booking tickets and reserving restaurants. Dialog state tracking is a key component of a task-oriented dialogue system. By parsing the dialogue history, dialog state tracking extracts the user's intentional state, such as intention, slot and value, which is passed to the dialogue manager for system decision making. For example, (price, cheap) and (area, centre) are extracted from "I am looking for a cheap restaurant in the centre of the city" as the user's state.

Fig. 1: Two different representations of words: word embedding and contextual representation. The yellow unit is the word in the vocabulary, the red unit is out-of-vocabulary word, the blue unit refers to the vocabulary space representation of the word, namely word embedding, and the green unit refers to the representation of the word in a specific context.

Traditionally, dialog state tracking is solved with a discriminative approach [18][10][21], which is based on the assumption that all slot values are known in advance. In reality, however, it is impossible to know all the slot values: dialog state tracking models often encounter slot values that have never been seen in training, which are known as unknown slot values [17]. Thus, recent research on dialog state tracking mainly concentrates on the generative method [19][20][16][6], which attempts to solve the problem of unknown slot values by generating novel words through a vocabulary-based distribution.

However, the generative method almost always depends on a pointer network [13] to extract unknown slot values, because unknown slot values often contain out-of-vocabulary words. The validity of a pointer network for extracting unknown slot values usually rests on the assumption that the unknown slot value contains no more than one out-of-vocabulary word. In the decoder, the pointer network uses word embedding to represent each word, and all out-of-vocabulary words are represented by a uniform embedding, such as "UNK". As a result, multiple out-of-vocabulary words in an unknown slot value are indistinguishable, as shown in Fig. 1(a), which confuses the decoder that takes the word embedding as input.

In fact, an unknown slot value often contains more than one out-of-vocabulary word. The pointer network cannot distinguish these different out-of-vocabulary words by word embedding alone, and the information they carry cannot be adequately represented by a uniform embedding. Because of the input uncertainty this causes during decoding, the output of the decoder gradually deviates, resulting in an erroneous unknown slot value.
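The indistinguishability described above can be illustrated with a toy vocabulary lookup. This is a minimal sketch and not the authors' code: the vocabulary, the utterance, and the helper `to_ids` are hypothetical, and only the words "da" and "Vinci" and the "UNK" token are taken from the paper.

```python
# Toy vocabulary: every out-of-vocabulary word falls back to one shared "UNK" id.
# The vocabulary contents and the utterance are illustrative assumptions.
VOCAB = {"UNK": 0, "i": 1, "want": 2, "to": 3, "eat": 4, "at": 5}


def to_ids(tokens, vocab=VOCAB):
    """Map tokens to vocabulary ids, using the shared UNK id for unseen words."""
    return [vocab.get(tok.lower(), vocab["UNK"]) for tok in tokens]


utterance = "i want to eat at da Vinci".split()
ids = to_ids(utterance)
print(ids)  # [1, 2, 3, 4, 5, 0, 0]
```

Both "da" and "Vinci" map to the same id, so a decoder that is fed only the embedding of that id receives identical inputs for two different words.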

To tackle this drawback, we argue that the input of the decoder should carry more information than the word embedding alone. In this paper, we propose a novel Context-Sensitive Generation network (CSG) for the unknown slot value problem in dialog state tracking. Our proposed model adds word contextual information, as shown in Fig. 1(b), to the input of the decoder, so that different out-of-vocabulary words can be distinguished by their contextual information. This contextual information also enriches the representation of each word, which used to be represented only by its word embedding.

The main contributions of this paper are as follows:

  • We propose a novel context-sensitive generation network that utilizes word contextual information to overcome the problem of uncertain information caused by out-of-vocabulary words.

  • On the most influential MultiWOZ 2.1 benchmark, our model has obvious advantages over the state-of-the-art baselines in the extraction of unknown slot values.

The rest of the paper is organized as follows. Related work is briefly introduced in Section 2, where its shortcomings are pointed out. In Section 3, the main structure of our proposed model is described in detail. Experiments and analysis are presented in Section 4, followed by conclusions in Section 5.

2 Related Work

As mentioned above, the generative method usually depends on a pointer network [13] to solve the unknown slot value problem in dialog state tracking (DST). The validity of the pointer network implies the assumption that a slot value contains no more than one out-of-vocabulary word, which is the motivation of this paper. At present, research on generative DST falls mainly into two categories: extractive DST and hybrid DST.

The Encoder-Decoder structure based on a pointer network has been used in many studies [14][7][9]. These models are designed to directly copy words from the input text, which differs from the traditional generative method of modeling a vocabulary distribution and is more like a variant of sequence labeling [8]. Since almost all slot values are contained in the dialogue history, pointer network-based extractive DST was proposed for DST modeling [17], and extractive DST can also solve the unknown slot value problem. This method can copy slot values directly from the dialogue history, but it requires subsequent processing, because some slot values need to be inferred rather than being directly contained in the dialogue history.

Pointer-Generator Networks (PGN) [12], a hybrid between a sequence-to-sequence attentional model [11] and a pointer network, have received a lot of attention since they were proposed, and there are many relevant studies in DST [19][16][6]. Unlike the extractive method, PGN-based DST, which we call hybrid DST, can copy words from the input text via the pointer network while retaining the ability to generate novel words from a vocabulary-based distribution. Therefore, hybrid DST does not require subsequent processing modules.
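The copy-or-generate behaviour of a pointer-generator network can be sketched as a mixture of two distributions, following the formulation in [12]: the vocabulary distribution is scaled by a generation probability and the attention mass of each source token is added with the complementary weight. The function name `pgn_distribution` and all numbers below are illustrative assumptions, not values from this paper.

```python
def pgn_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix a vocabulary distribution with copy attention over the source tokens.

    p_gen         -- probability of generating from the vocabulary (0..1)
    vocab_dist    -- dict mapping vocabulary words to probabilities
    attention     -- attention weights over the source tokens (sums to 1)
    source_tokens -- tokens of the input text, aligned with `attention`
    """
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    for weight, token in zip(attention, source_tokens):
        # Source tokens missing from the vocabulary get an entry of their own.
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


# Hypothetical example: "nandos" is not in the vocabulary but can still be copied.
vocab_dist = {"the": 0.6, "cheap": 0.4}
source_tokens = ["the", "nandos", "cheap"]
attention = [0.1, 0.7, 0.2]
final = pgn_distribution(0.5, vocab_dist, attention, source_tokens)
```

In this toy case the out-of-vocabulary token receives probability mass only through the copy term, which is why the quality of the pointer's input representation matters.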

In general, current generative DST methods depend on a pointer network to solve the unknown slot value problem. However, as mentioned in the previous section, a pointer network faces the problem of uncertain input information in decoding. When there are multiple out-of-vocabulary words in an unknown slot value, the unknown slot value produced by the pointer will deviate.

3 Context-Sensitive Generation Network

In this section, we describe (1) the framework of our proposed model and (2) different schemes to leverage context information in our model. The code for this paper is available online at https://github.com/yangpuhai/CSG.

3.1 Framework

Fig. 2: The framework of our proposed model. The green unit is the encoder, the blue unit is the decoder, the yellow unit is the output vector of the encoder for each encoding step, "da" and "Vinci" are out-of-vocabulary words, and the word "UNK" is used as the substitution of out-of-vocabulary words. TAG means a slot, such as "restaurant-name".

In our model (shown in Fig. 2), the encoder generates the vector representation of the dialogue history and the contextual representation of each word. Note that the encoder can be any encoding model, such as a bi-LSTM [5] or a bi-GRU [2]. The input of the encoder is the dialogue history $X \in \mathbb{R}^{n \times d_{emb}}$, which is the concatenation of the embeddings of all words in the dialogue history, where $n$ is the length of the dialogue history and $d_{emb}$ is the size of the word embedding. The output of the encoder consists of two parts: the hidden state $h^{enc}_{n}$ at the last encoding step, which serves as the initial state of the decoder, and the output $H^{enc} \in \mathbb{R}^{n \times d_{hdd}}$, which collects the output of the encoder at each encoding step, where $d_{hdd}$ is the hidden size. In our model, we assume that $H^{enc}$ is not only a representation of the dialogue history but also a contextual representation of each word in it. Therefore, $H^{enc}$ can be used to enhance the representation of out-of-vocabulary words in decoding.

At the initial step of decoding, we use the slot embedding as input to the decoder. The slot embedding does not have to be the input to the decoder; it can also be placed in the output, but this is not the focus of this paper. At decoding step $t$, the output of the decoder is used to generate the attention $a_t$ over each word in the dialogue history.

Then, the position $p_t$ of the slot-value word in the input dialogue history is determined by the maximum attention in $a_t$, that is, $p_t = \arg\max_i a_{t,i}$.

At decoding step $t+1$, the pointer network traditionally takes the embedding of the word selected in step $t$ as the input of the decoder. However, the information in the embedding of an out-of-vocabulary word is incomplete and cannot effectively represent the word. As mentioned above, the output of the encoder can be seen as the contextual representation of each word in the dialogue history. Therefore, our proposed model combines the embedding $e_{p_t}$ of the selected word and its contextual representation $h_{p_t}$ (the encoder output at position $p_t$) as the input $x_{t+1}$ to the decoder: $x_{t+1} = g(e_{p_t}, h_{p_t})$.

Here $g$ is the way $e_{p_t}$ and $h_{p_t}$ are combined; the different combination schemes are discussed in the following.
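One pointer step of the decoding procedure described in this subsection can be sketched as follows: score every encoder output against the decoder output, normalize the scores into an attention distribution, and take the position with the maximum attention. The dot-product scoring function is an assumption made for illustration, since the attention formula is not recoverable from the text here, and `softmax`, `pointer_step`, and all vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_step(decoder_output, encoder_outputs):
    """Return (attention, position) for one decoding step.

    Assumption: attention scores are dot products between the decoder
    output and each encoder output; the paper's exact scoring may differ.
    """
    scores = [
        sum(d * h for d, h in zip(decoder_output, enc_vec))
        for enc_vec in encoder_outputs
    ]
    attention = softmax(scores)
    # The selected word is the one with the maximum attention.
    position = max(range(len(attention)), key=attention.__getitem__)
    return attention, position


# Hypothetical 2-dimensional encoder outputs for a 3-word dialogue history.
encoder_outputs = [[0.1, 0.0], [0.9, 0.2], [0.3, 0.8]]
decoder_output = [1.0, 0.0]
attention, position = pointer_step(decoder_output, encoder_outputs)
```

The encoder output at the selected position is what the model then reuses as the contextual representation of the chosen word for the next decoder input.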

3.2 Context Utilization Schemes

In this paper, we argue that a word should have not only a vocabulary space representation, that is, its word embedding $e$, but also a contextual representation $h$. In the traditional Encoder-Decoder model, usually only the word embedding is considered and the contextual representation is ignored. When a word is out-of-vocabulary, it is common to use a uniform word embedding "UNK" to represent it. In this way, the information of the word cannot be adequately represented, which leads to deviation in the results. Therefore, we propose to combine the embedding and the contextual representation of words, so that the information carried by each word is enhanced and the unknown slot value problem can be effectively addressed.

To make effective use of contextual information, we propose the following schemes for combining the word embedding $e$ with the contextual representation $h$ into the decoder input $g(e, h)$:

Enc: $g(e, h) = h$. The contextual representation is used directly as the representation of the word.

Sum: $g(e, h) = e + h$. The sum of the word embedding and the contextual representation is used as the representation of the word.

Cat: $g(e, h) = [e; h]$. The concatenation of the word embedding and the contextual representation is used as the representation of the word.

Pws: $g(e, h)$ is a weighted sum of $e$ and $h$ according to a proportion $p$. As mentioned in Section 2, the pointer network [13] is used in Pointer-Generator Networks (PGN) [12] to solve the unknown slot value problem, so our model can be generalized to improve PGN. In a traditional PGN, while the attention over the dialogue history is generated, the distribution over the vocabulary space is also calculated, and a weighted sum of the two is then taken according to the proportion $p$, which is calculated in different ways in different models [20][12]. Here, we take the weighted sum of the word embedding and the contextual representation according to the same proportion $p$. Note that this scheme is only suitable for models generalized from PGN.
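The four schemes can be sketched as one small function over plain Python lists. This is an illustrative sketch rather than the released implementation: the function name `combine` and the toy vectors are hypothetical, and for "pws" the choice of weighting the embedding by `p` and the contextual representation by `1 - p` is an assumption, since the text only states that a weighted sum is taken.

```python
def combine(embedding, context, scheme, p=None):
    """Combine a word embedding with its contextual representation.

    scheme -- one of "enc", "sum", "cat", "pws"
    p      -- mixing proportion, required for "pws" only
    """
    if scheme == "enc":
        return list(context)
    if scheme == "sum":
        # Requires the embedding size to equal the encoder output size.
        return [e + h for e, h in zip(embedding, context)]
    if scheme == "cat":
        return list(embedding) + list(context)
    if scheme == "pws":
        if p is None:
            raise ValueError("scheme 'pws' needs a proportion p")
        # Assumption: p weights the embedding and (1 - p) the context.
        return [p * e + (1.0 - p) * h for e, h in zip(embedding, context)]
    raise ValueError(f"unknown scheme: {scheme}")


# Two out-of-vocabulary words share the UNK embedding but differ in context.
unk_embedding = [0.5, 0.5]
context_da = [0.1, 0.9]
context_vinci = [0.7, 0.2]
input_da = combine(unk_embedding, context_da, "sum")
input_vinci = combine(unk_embedding, context_vinci, "sum")
```

With any of the schemes, the two decoder inputs differ even though both words share the same "UNK" embedding, which is the property the model relies on.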

Out-of-vocabulary ratio (%)   0   10   20   30   40   50   60   70   80   90   100
USV_1 in test set (%)         0   10   21   29   32   30   50   49   47   45    43
USV_2 in test set (%)         0    2    3   12   18   26   35   38   45   51    57
TABLE I: The statistics of the modified dataset MultiWOZ 2.1 in different out-of-vocabulary ratios. USV_1 refers to the unknown slot value containing only one out-of-vocabulary word, while USV_2 refers to the unknown slot value containing two or more out-of-vocabulary words.

4 Experiments

4.1 Dataset

Our experiments are conducted on the MultiWOZ 2.1 dataset [3], which is the latest corrected version of the MultiWOZ dataset [1]. Compared with the DSTC2 dataset [4], the traditional standard dialog state tracking benchmark, MultiWOZ 2.1 is larger, containing around 10K dialogues with each dialogue averaging 13.68 turns, more than 30 slots, and over 4500 possible slot values. More importantly, MultiWOZ 2.1 includes slot values that consist of multiple words, which matches the problem studied in this paper, namely extracting unknown slot values that contain multiple out-of-vocabulary words; it is therefore selected as the benchmark. Further, we eliminate the slots in MultiWOZ 2.1 whose slot values contain only one word. The final dataset contains 7 slots ("train-destination", "train-departure", "attraction-name", "restaurant-name", "hotel-name", "taxi-destination", "taxi-departure") in 5 domains ("train", "attraction", "restaurant", "hotel", "taxi"). The modified dataset is split into training, validation, and test sets, which contain 32,233, 5,431, and 5,568 dialogue utterances respectively.

Note that the slot values in the validation and test sets of the original MultiWOZ 2.1 dataset do not include unknown slot values. For experimental investigation, we select some words from the slot values of the validation and test sets as out-of-vocabulary words to simulate the unknown slot value problem. Specifically, we randomly select words from the slot values of the validation and test sets in different proportions, and then discard these words from the training set vocabulary. Any training sample containing such a word has the word replaced with the token "UNK", but the sample is kept for training. To highlight the experimental comparison, we discard the negative samples that do not contain any slot values; this does not change the experimental conclusion. The statistics of the modified dataset are shown in Table I. The out-of-vocabulary ratio mentioned in this paper refers to the ratio of out-of-vocabulary words among all slot values in the validation and test sets.
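The simulation procedure can be sketched as follows. This is a simplified illustration, not the authors' preprocessing script: the function `simulate_oov` and the toy data are hypothetical, and the ratio is applied here to the set of distinct slot-value words, which may differ from how the ratio is computed in the released code.

```python
import random


def simulate_oov(train_samples, eval_slot_values, ratio, seed=0):
    """Pick slot-value words as out-of-vocabulary and mask them in training data.

    Returns the chosen OOV words, the masked training samples (token lists),
    and the resulting training vocabulary.
    """
    rng = random.Random(seed)
    slot_words = sorted({w for value in eval_slot_values for w in value.split()})
    k = round(len(slot_words) * ratio)
    oov_words = set(rng.sample(slot_words, k))
    # Keep every training sample, but replace the selected words with "UNK".
    masked = [
        ["UNK" if w in oov_words else w for w in sample.split()]
        for sample in train_samples
    ]
    vocab = sorted({w for sample in masked for w in sample})
    return oov_words, masked, vocab


# Hypothetical toy data.
train_samples = ["book a table at da vinci", "train to cambridge"]
eval_slot_values = ["da vinci", "cambridge"]
oov_words, masked, vocab = simulate_oov(train_samples, eval_slot_values, ratio=1.0)
```

With a ratio of 1.0 every slot-value word is removed from the training vocabulary, which corresponds to the 100% column in Table I.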

4.2 Baselines

As mentioned in Section 2, existing generative dialog state tracking (DST) models are mainly divided into two types: pointer network-based extractive DST and pointer-generator networks (PGN)-based hybrid DST. Therefore, our baselines include both types: the extractive models SpanPtr [17] and SeqPtr, and the hybrid models HD [20] and TRADE [16]. We give a brief introduction to these models:

SpanPtr: This model uses pointer network to generate the start and end positions of slot values in a dialogue, and then extracts the slot values by copying.

SeqPtr: This is our modified version of the SpanPtr, this model generates the position of each word in the slot value in the dialogue instead of just the start and end positions.

HD: Hierarchical structure is considered in this model, where multiple classifiers are used to predict the existence of each slot, and then the slot information is used to generate the slot value with PGN.

TRADE: This is the current state-of-the-art model on the MultiWOZ dataset. It uses a slot gate to predict whether slot values need to be generated, and there is a PGN-based state generator in the model to generate slot values.
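The difference between the two extractive baselines can be illustrated with a small sketch: SpanPtr copies a contiguous span given its start and end positions, while SeqPtr copies one word per predicted position. The helper names and the example utterance are hypothetical, and the positions are given by hand instead of being predicted by a model.

```python
def extract_span(tokens, start, end):
    """SpanPtr-style extraction: copy the contiguous span [start, end]."""
    return " ".join(tokens[start:end + 1])


def extract_sequence(tokens, positions):
    """SeqPtr-style extraction: copy one token per predicted position."""
    return " ".join(tokens[i] for i in positions)


# Hypothetical dialogue history with a two-word slot value.
tokens = "i want to eat at da vinci tonight".split()
span_value = extract_span(tokens, start=5, end=6)
seq_value = extract_sequence(tokens, positions=[5, 6])
```

Both calls return the same value here; the two models differ in how many pointer decisions are needed and in what is fed back to the decoder between decisions.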

All baselines and our models are set with the same parameters. A bi-GRU and a GRU are used as encoder and decoder respectively. The dimensions of the word embedding and the hidden state are both 400, and the dropout ratio is set to 0.2. All models are trained using the Adam optimizer with a batch size of 32, and all training runs for 50 epochs with early stopping on the validation set. In addition, word dropout is used on all models to improve generalization. More importantly, teacher forcing [15] with a ratio of 0.5 is adopted by all models in decoding, except for the word contextual representation on PGN-based DST, in order to be consistent with the baselines.
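Teacher forcing with a ratio can be sketched as a per-step random choice between the gold target and the model's own prediction as the next decoder input. This is a generic illustration of the technique under that reading of "ratio of 0.5"; the function name and arguments are hypothetical and the released code may schedule it differently.

```python
import random


def next_input_position(gold_position, predicted_position, ratio=0.5, rng=random):
    """Choose the next decoder input during training.

    With probability `ratio` the gold position is fed (teacher forcing);
    otherwise the model's own predicted position is fed back.
    """
    if rng.random() < ratio:
        return gold_position
    return predicted_position


# Deterministic corner cases: ratio 1.0 always forces, ratio 0.0 never does.
always_gold = next_input_position(3, 7, ratio=1.0)
never_gold = next_input_position(3, 7, ratio=0.0)
```

A ratio of 0.5 exposes the decoder to its own mistakes about half of the time during training, which is closer to the conditions it faces at inference.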

4.3 Results

Model               Out-of-vocabulary ratio (%)
                    0     10    20    30    40    50    60    70    80    90    100
SpanPtr             63.2  62.8  61.8  60.8  59.5  60.0  59.7  56.7  54.8  54.1  47.5
SpanPtr_CSG(Enc)    63.4  64.1  62.8  62.4  61.2  59.4  58.3  57.9  55.8  54.5  49.9
SpanPtr_CSG(Sum)    62.6  61.9  62.9  60.3  61.0  59.1  58.2  57.9  55.8  53.8  51.2
SpanPtr_CSG(Cat)    63.2  62.4  63.2  62.1  60.5  60.0  59.2  57.2  56.1  55.5  52.4
SeqPtr              64.8  64.1  64.2  61.5  61.2  60.4  59.9  57.0  56.7  55.9  50.4
SeqPtr_CSG(Enc)     63.5  62.9  62.9  61.5  62.1  61.1  56.8  58.0  57.0  55.4  51.2
SeqPtr_CSG(Sum)     64.8  63.4  63.0  60.9  61.8  60.0  58.9  58.3  56.9  56.4  52.3
SeqPtr_CSG(Cat)     64.0  63.6  63.8  60.8  62.2  60.7  58.8  59.6  56.5  54.9  48.3
HD                  66.3  66.7  64.6  65.0  61.6  61.2  57.3  59.5  57.4  55.1  48.4
HD_CSG(Enc)         65.9  64.8  65.1  62.9  61.3  59.1  58.2  57.3  56.0  54.2  47.5
HD_CSG(Sum)         65.6  66.7  65.8  64.2  63.2  61.9  57.1  59.6  57.8  54.6  48.9
HD_CSG(Cat)         65.8  65.6  64.4  63.0  62.8  59.6  59.4  58.5  58.0  56.5  49.7
HD_CSG(Pws)         65.9  65.9  64.7  64.7  62.6  60.1  54.8  56.9  56.2  54.4  48.7
TRADE               65.8  66.0  66.6  65.8  62.6  60.4  62.8  59.7  59.1  58.1  51.0
TRADE_CSG(Enc)      65.7  64.9  65.4  64.1  62.4  58.3  60.2  59.1  56.9  56.9  50.0
TRADE_CSG(Sum)      67.4  67.4  66.6  65.1  63.7  62.2  61.3  61.9  59.6  57.3  52.7
TRADE_CSG(Cat)      65.3  66.1  65.8  65.1  63.7  60.0  61.5  59.8  59.6  57.1  52.1
TRADE_CSG(Pws)      66.8  65.9  65.9  64.7  64.0  60.9  58.7  59.9  58.7  54.2  51.1
TABLE II: Joint accuracy of dialog state tracking in multiple domains with different out-of-vocabulary ratios on MultiWOZ 2.1 dataset.

The joint accuracy of dialog state tracking (DST) on the modified MultiWOZ 2.1 dataset in different out-of-vocabulary ratios is shown in Table II and Fig. 3. Regarding model names, SpanPtr_CSG(Enc), for example, refers to the improved DST model obtained by adding our proposed context-sensitive generation network to SpanPtr, where the context utilization scheme is "Enc".
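Joint accuracy is not defined explicitly in this section. Under the definition commonly used in DST, a turn counts as correct only if every slot value in the predicted state matches the reference state. The sketch below assumes that definition; the function name and the toy states are hypothetical.

```python
def joint_accuracy(predictions, references):
    """Fraction of turns whose full predicted state equals the reference state.

    Each state is a dict mapping slot names to slot values.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(1 for pred, ref in zip(predictions, references) if pred == ref)
    return correct / len(references)


# Hypothetical two-turn example: the second turn has one wrong slot value.
predictions = [
    {"restaurant-name": "da vinci"},
    {"restaurant-name": "nandos", "taxi-destination": "cambridge"},
]
references = [
    {"restaurant-name": "da vinci"},
    {"restaurant-name": "nandos", "taxi-destination": "ely"},
]
score = joint_accuracy(predictions, references)
```

Because a single wrong word in a multi-word slot value makes the whole turn incorrect, this metric is sensitive to exactly the kind of error that multiple out-of-vocabulary words cause.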

It should be emphasized that the proposed model mainly targets the handling of unknown slot values containing multiple out-of-vocabulary words. In addition, since the MultiWOZ 2.1 dataset targets complex DST in multiple domains, the final joint accuracy is related not only to the extraction of unknown slot values but also to other factors, such as cross-domain learning. Under this premise, according to Table II, our model in general performs comparably to the baselines when unknown slot values containing multiple out-of-vocabulary words account for less than 38% (the out-of-vocabulary ratio is less than 70%). When unknown slot values with multiple out-of-vocabulary words account for more than 38% (the out-of-vocabulary ratio is greater than 70%), our model almost always outperforms all baselines. It can also be observed that, compared with pointer network-based DST, the context utilization scheme "Enc" does not perform very well on pointer-generator networks-based DST; this is probably related to the fact that no teacher forcing is applied in training for the word contextual representation.

Fig. 3: Joint accuracy of dialog state tracking in domain restaurant with different out-of-vocabulary ratios on MultiWOZ 2.1 dataset.
Fig. 4: Comparison of the model's correct predictions when slot values contain different numbers of words in domain restaurant on MultiWOZ 2.1 dataset. OOV0.0 refers to the out-of-vocabulary ratio of 0%, where the unknown slot value is not included. OOV1.0 refers to an out-of-vocabulary ratio of 100%, in which case all slot values are unknown slot values.

The experiments in the individual restaurant domain better highlight the superiority of our model over all baselines, as shown in Fig. 3, because the performance of the DST models is then unaffected by knowledge sharing across domains. Here, we can observe more clearly that when the out-of-vocabulary ratio exceeds 70% (the proportion of unknown slot values containing multiple out-of-vocabulary words exceeds 38%), all baselines are deficient in extracting unknown slot values compared with our models. Besides, we can observe that "Sum" and "Cat" perform better among the four proposed context utilization schemes, which can also be observed in Table II. Therefore, we believe that these two schemes make better use of the word contextual information to enhance the representation of words.

The influence of the number of out-of-vocabulary words on the extraction of unknown slot values is shown in Fig. 4. The "Sum" scheme on the pointer network-based DST model and the "Cat" scheme on the pointer-generator networks-based DST model are compared with the baselines. It can be observed that as the number of out-of-vocabulary words in an unknown slot value increases, the difficulty of extracting the unknown slot value also increases gradually, and the superiority of our model becomes more and more obvious. In conclusion, our model can extract unknown slot values more effectively, especially unknown slot values that contain multiple out-of-vocabulary words, while retaining the ability to extract slot values that do not contain out-of-vocabulary words.

5 Conclusion

In this paper, we point out the defects of current pointer network-based dialogue state tracking models in extracting unknown slot values, and propose a novel model, the context-sensitive generation network (CSG), which extracts unknown slot values more effectively by enhancing the representation of each word with its contextual information. The method can also be generalized to improve pointer-generator networks-based dialogue state tracking models. We also propose different context utilization schemes for the CSG, among which the "Sum" and "Cat" schemes prove to perform very well and exceed the state-of-the-art models on the MultiWOZ 2.1 dataset. In addition, our model performs better than all baselines in extracting unknown slot values containing multiple out-of-vocabulary words.

Acknowledgments

The authors would like to thank…

References

  • [1] P. Budzianowski, T. Wen, B. Tseng, I. Casanueva, S. Ultes, O. Ramadan, and M. Gasic (2018) MultiWOZ-a large-scale multi-domain wizard-of-oz dataset for task-oriented dialogue modelling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 5016–5026.
  • [2] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio (2014) Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724–1734.
  • [3] M. Eric, R. Goel, S. Paul, A. Sethi, S. Agarwal, S. Gao, and D. Hakkani-Tur (2019) MultiWOZ 2.1: multi-domain dialogue state corrections and state tracking baselines. arXiv preprint arXiv:1907.01669.
  • [4] M. Henderson, B. Thomson, and J. D. Williams (2014) The second dialog state tracking challenge. In Proceedings of the 15th annual meeting of the special interest group on discourse and dialogue (SIGDIAL), pp. 263–272.
  • [5] S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural computation 9 (8), pp. 1735–1780.
  • [6] H. Huang, X. Mao, and P. Yang (2019) Streamlined decoder for chinese spoken language understanding. In 2019 International Conference on Multimodal Interaction, pp. 516–520.
  • [7] A. Jadhav and V. Rajan (2018) Extractive summarization with swap-net: sentences and words from alternating pointer networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 142–151.
  • [8] S. Kim and R. E. Banchs (2014) Sequential labeling for tracking dynamic dialog states. In 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 332.
  • [9] J. Li, D. Ye, and S. Shang (2019) Adversarial transfer for named entity boundary detection with pointer networks. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 5053–5059.
  • [10] N. Mrkšić, D. Ó. Séaghdha, T. Wen, B. Thomson, and S. Young (2017) Neural belief tracker: data-driven dialogue state tracking. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1777–1788.
  • [11] R. Nallapati, B. Zhou, C. dos Santos, Ç. Gulçehre, and B. Xiang (2016) Abstractive text summarization using sequence-to-sequence RNNs and beyond. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pp. 280–290.
  • [12] A. See, P. J. Liu, and C. D. Manning (2017) Get to the point: summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1073–1083.
  • [13] O. Vinyals, M. Fortunato, and N. Jaitly (2015) Pointer networks. In Advances in neural information processing systems, pp. 2692–2700.
  • [14] S. Wang and J. Jiang (2016) Machine comprehension using match-lstm and answer pointer. arXiv preprint arXiv:1608.07905.
  • [15] R. J. Williams and D. Zipser (1989) A learning algorithm for continually running fully recurrent neural networks. Neural computation 1 (2), pp. 270–280.
  • [16] C. Wu, A. Madotto, E. Hosseini-Asl, C. Xiong, R. Socher, and P. Fung (2019) Transferable multi-domain state generator for task-oriented dialogue systems. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 808–819.
  • [17] P. Xu and Q. Hu (2018) An end-to-end approach for handling unknown slot values in dialogue state tracking. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1448–1457.
  • [18] M. Yazdani and J. Henderson (2015) A model of zero-shot learning of spoken language understanding. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 244–249.
  • [19] L. Zhao and Z. Feng (2018) Improving slot filling in spoken language understanding with joint pointer and attention. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 426–431.
  • [20] Z. Zhao, S. Zhu, and K. Yu (2019) A hierarchical decoding model for spoken language understanding from unaligned data. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7305–7309.
  • [21] V. Zhong, C. Xiong, and R. Socher (2018) Global-locally self-attentive encoder for dialogue state tracking. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1458–1467.