Real-world conversations often involve more than two speakers. In the Ubuntu Internet Relay Chat channel (IRC), for example, one user can initiate a discussion about an Ubuntu-related technical issue, and many other users can work together to solve the problem. Dialogs can have complex speaker interactions: at each turn, users play one of three roles (sender, addressee, observer), and those roles vary across turns.
In this paper, we study the problem of addressee and response selection in multi-party conversations: given a responding speaker and a dialog context, the task is to select an addressee and a response from a set of candidates for the responding speaker. The task requires modeling multi-party conversations and can be directly used to build retrieval-based dialog systems [Lu and Li2013, Hu et al.2014, Ji, Lu, and Li2014, Wang et al.2015].
The previous state-of-the-art Dynamic-RNN model of Ouchi and Tsuboi (2016) maintains speaker embeddings to track each speaker's status, which changes dynamically across time steps. It then produces the context embedding from the speaker embeddings and selects the addressee and response based on embedding similarity. However, this model updates only the sender embedding, not the embeddings of the addressee or observers, with the corresponding utterance, and it selects the addressee and response separately. In this way, it only models who says what and fails to capture addressee information. Experimental results show that the separate selection process often produces inconsistent addressee-response pairs.
To solve these issues, we introduce the Speaker Interaction Recurrent Neural Network (SI-RNN). SI-RNN redesigns the dialog encoder by updating speaker embeddings in a role-sensitive way. Speaker embeddings are updated in different GRU-based units depending on their roles (sender, addressee, observer). Furthermore, we note that the addressee and response are mutually dependent and view the task as a joint prediction problem. Therefore, SI-RNN models the conditional probability (of addressee given the response and vice versa) and selects the addressee and response pair by maximizing the joint probability.
On a public standard benchmark data set, SI-RNN significantly improves the addressee and response selection accuracy, particularly in complex conversations with many speakers and responses to distant messages many turns in the past. Our code and data set are available online (released code: https://github.com/ryanzhumich/sirnn).
2 Related Work
We follow a data-driven approach to dialog systems. Singh et al. (1999), Henderson, Lemon, and Georgila (2008), and Young et al. (2013) optimize the dialog policy using Reinforcement Learning or the Partially Observable Markov Decision Process framework. In addition, Henderson, Thomson, and Williams (2014) propose to use a predefined ontology as a logical representation for the information exchanged in the conversation. The dialog system can be divided into different modules, such as Natural Language Understanding [Yao et al.2014, Mesnil et al.2015], Dialog State Tracking [Henderson, Thomson, and Young2014, Williams, Raux, and Henderson2016], and Natural Language Generation [Wen et al.2015]. Furthermore, Wen et al. (2016) and Bordes and Weston (2017) propose end-to-end trainable goal-oriented dialog systems.
Recently, short text conversation has been popular. The system receives a short dialog context and generates a response using statistical machine translation or sequence-to-sequence networks [Ritter, Cherry, and Dolan2011, Vinyals and Le2015, Shang, Lu, and Li2015, Serban et al.2016, Li et al.2016, Mei, Bansal, and Walter2017]. In contrast to response generation, the retrieval-based approach uses a ranking model to select the highest scoring response from candidates [Lu and Li2013, Hu et al.2014, Ji, Lu, and Li2014, Wang et al.2015]. However, these models are single-turn responding machines and are thus still limited to short contexts with only two speakers. As for larger contexts, Lowe et al. (2015) propose the Next Utterance Classification (NUC) task for multi-turn two-party dialogs. Ouchi and Tsuboi (2016) extend NUC to multi-party conversations by integrating the addressee detection problem. Since the data is text based, they use only textual information to predict addressees, as opposed to relying on acoustic signals or gaze information in multimodal dialog systems [Jovanović, Akker, and Nijholt2006, op den Akker and Traum2009].
Furthermore, several recent papers focus on modeling role-specific information given the dialogue contexts [Meng, Mou, and Jin2017, Chi et al.2017, Chen et al.2017]. For example, Meng, Mou, and Jin (2017) combine content and temporal information to predict the utterance speaker. By contrast, our SI-RNN explicitly utilizes the speaker interaction to maintain speaker embeddings and predicts the addressee and response by joint selection.
3.1 Addressee and Response Selection
Ouchi and Tsuboi (2016) propose the addressee and response selection task for multi-party conversation. Given a responding speaker $a_{res}$ and a dialog context $\mathcal{C}$, the task is to select a response and an addressee. $\mathcal{C}$ is a list ordered by time step:
$$\mathcal{C} = \left[\left(a_{sdr}^{(t)},\, a_{adr}^{(t)},\, u^{(t)}\right)\right]_{t=1}^{T}$$
where $a_{sdr}^{(t)}$ says $u^{(t)}$ to $a_{adr}^{(t)}$ at time step $t$, and $T$ is the total number of time steps before the response and addressee selection. The set of speakers appearing in $\mathcal{C}$ is denoted $\mathcal{A}(\mathcal{C})$. As for the output, the addressee is selected from $\mathcal{A}(\mathcal{C})$, and the response is selected from a set of candidates $\mathcal{R}$. Here, $\mathcal{R}$ contains the ground-truth response and one or more false responses. We provide some examples in Table 4 (Section 6).
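As a concrete reading of this task definition, the following minimal Python sketch represents a context as a time-ordered list of (sender, addressee, utterance) turns and derives the speaker set. The names `Turn`, `Sample`, and `speakers`, the made-up speaker IDs and utterances, and the decision to count mentioned addressees as members of the speaker set are illustrative assumptions, not part of the original formulation.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Turn:
    """One time step of the context: `sender` says `utterance` to `addressee`."""
    sender: str
    addressee: Optional[str]  # None if no addressee was explicitly mentioned
    utterance: str


@dataclass(frozen=True)
class Sample:
    """Task input: the responding speaker, the context, and candidate responses."""
    responding_speaker: str
    context: List[Turn]
    candidate_responses: List[str]


def speakers(context: List[Turn]) -> Set[str]:
    """Collect every speaker ID in the context (assumption: addressees count too)."""
    ids: Set[str] = set()
    for turn in context:
        ids.add(turn.sender)
        if turn.addressee is not None:
            ids.add(turn.addressee)
    return ids


# Hypothetical three-turn context; all IDs and utterances are invented.
example = Sample(
    responding_speaker="carol",
    context=[
        Turn("alice", "bob", "my wifi driver fails after the upgrade"),
        Turn("bob", "alice", "which kernel are you on?"),
        Turn("carol", None, "hi all"),
    ],
    candidate_responses=["try reinstalling the driver", "thanks, that worked"],
)
candidate_addressees = sorted(speakers(example.context))
```

The model's output for such a sample would be one entry of `candidate_addressees` together with one entry of `candidate_responses`.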
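|Sender ID at time $t$||$a_{sdr}^{(t)}$|
|Addressee ID at time $t$||$a_{adr}^{(t)}$|
|Utterance at time $t$||$u^{(t)}$|
|Utterance embedding at time $t$||$\mathbf{u}^{(t)}$|
|Speaker embedding of $a_i$ at time $t$||$\mathbf{a}_i^{(t)}$|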
Figure 1: Dialog encoders in Dynamic-RNN (Left) and SI-RNN (Right) for an example context at the top. Speaker embeddings are initialized as zero vectors and updated recurrently as hidden states along the time step. In SI-RNN, the same speaker embedding is updated in different units depending on the role ($\mathrm{IGRU}^S$ for sender, $\mathrm{IGRU}^A$ for addressee, $\mathrm{GRU}^O$ for observer).
3.2 Dynamic-RNN Model
In this section, we briefly review the state-of-the-art Dynamic-RNN model [Ouchi and Tsuboi2016], on which our proposed model is based. Dynamic-RNN solves the task in two phases: 1) the dialog encoder maintains a set of speaker embeddings to track each speaker's status, which changes dynamically with time step $t$; 2) Dynamic-RNN then produces the context embedding from the speaker embeddings and selects the addressee and response based on embedding similarity among context, speaker, and utterance.
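Figure 1 (Left) illustrates the dialog encoder in Dynamic-RNN on an example context. In this example, $a_2$ says $u^{(1)}$ to $a_1$, then $a_1$ says $u^{(2)}$ to $a_3$, and finally $a_3$ says $u^{(3)}$ to $a_2$. The context $\mathcal{C}$ will be:
$$\mathcal{C} = \left[\left(a_2, a_1, u^{(1)}\right),\ \left(a_1, a_3, u^{(2)}\right),\ \left(a_3, a_2, u^{(3)}\right)\right] \tag{1}$$
with the set of speakers $\mathcal{A}(\mathcal{C}) = \{a_1, a_2, a_3\}$.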
For a speaker $a_i$, the bold letter $\mathbf{a}_i^{(t)}$ denotes its embedding at time step $t$. Speaker embeddings are initialized as zero vectors and updated recurrently as hidden states of GRUs [Cho et al.2014, Chung et al.2014]. Specifically, for each time step $t$ with the sender $a_{sdr}$ and the utterance $u^{(t)}$, the sender embedding $\mathbf{a}_{sdr}$ is updated recurrently from the utterance:
$$\mathbf{a}_{sdr}^{(t)} = \mathrm{GRU}\left(\mathbf{a}_{sdr}^{(t-1)},\ \mathbf{u}^{(t)}\right)$$
where $\mathbf{u}^{(t)}$ is the embedding for utterance $u^{(t)}$. Other speaker embeddings are updated from $\mathbf{u}^{(t)} = \mathbf{0}$. The speaker embeddings are updated until time step $T$.
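The sender-only update can be sketched in NumPy as follows. This is a minimal illustration rather than the released implementation: the GRU is bias-free, the interpolation convention (update gate multiplying the previous state) is one of the two common ones and is assumed here, and the utterance embeddings are random stand-ins for the output of an utterance encoder. The names `init_gru`, `gru_step`, and `dynamic_rnn_encode` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(d_in, d_hid, rng):
    """Bias-free GRU parameters drawn from uniform(-0.01, 0.01)."""
    shapes = {"Wz": (d_hid, d_in), "Uz": (d_hid, d_hid),
              "Wr": (d_hid, d_in), "Ur": (d_hid, d_hid),
              "Wh": (d_hid, d_in), "Uh": (d_hid, d_hid)}
    return {name: rng.uniform(-0.01, 0.01, size=shape)
            for name, shape in shapes.items()}


def gru_step(p, h_prev, x):
    """One GRU step: gates from (x, h_prev), then interpolate with the proposal."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev)
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev)
    proposal = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev))
    return z * h_prev + (1.0 - z) * proposal


def dynamic_rnn_encode(context, speaker_ids, utt_embs, params, d_s):
    """Only the sender sees the utterance; everyone else gets a zero input."""
    emb = {a: np.zeros(d_s) for a in speaker_ids}
    for (sender, _addressee), u in zip(context, utt_embs):
        zero = np.zeros_like(u)
        emb = {a: gru_step(params, h, u if a == sender else zero)
               for a, h in emb.items()}
    return emb


rng = np.random.default_rng(0)
d_u, d_s = 4, 3
params = init_gru(d_u, d_s, rng)
context = [("a2", "a1"), ("a1", "a2")]   # (sender, addressee); a3 never speaks
utt_embs = [rng.normal(size=d_u) for _ in context]
final = dynamic_rnn_encode(context, ["a1", "a2", "a3"], utt_embs, params, d_s)
```

In this bias-free sketch, a speaker who never sends an utterance (`a3`) keeps an all-zero embedding no matter what is said to or around them, which is the limitation that role-sensitive updates are meant to address.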
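To summarize the whole dialog context $\mathcal{C}$, the model applies element-wise max pooling over all the speaker embeddings to get the context embedding $\mathbf{h}_{\mathcal{C}}$:
$$\mathbf{h}_{\mathcal{C}} = \max_{a_i \in \mathcal{A}(\mathcal{C})} \mathbf{a}_i^{(T)} \tag{2}$$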
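The probability of an addressee and a response being the ground truth is calculated based on embedding similarity. To be specific, for addressee selection, the model compares the candidate speaker $a_p$, the dialog context $\mathcal{C}$, and the responding speaker $a_{res}$:
$$P(a_p \mid \mathcal{C}) = \sigma\left(\left[\mathbf{a}_{res};\ \mathbf{h}_{\mathcal{C}}\right]^{\top} \mathbf{W}_a\, \mathbf{a}_p\right) \tag{3}$$
where $\mathbf{a}_{res}$ is the final speaker embedding for the responding speaker $a_{res}$, $\mathbf{a}_p$ is the final speaker embedding for the candidate addressee $a_p$, $\mathbf{h}_{\mathcal{C}}$ is the context embedding, $\sigma$ is the logistic sigmoid function, $[\,;\,]$ is the row-wise concatenation operator, and $\mathbf{W}_a$ is a learnable parameter. Similarly, for response selection,
$$P(r_q \mid \mathcal{C}) = \sigma\left(\left[\mathbf{a}_{res};\ \mathbf{h}_{\mathcal{C}}\right]^{\top} \mathbf{W}_r\, \mathbf{r}_q\right) \tag{4}$$
where $\mathbf{r}_q$ is the embedding for the candidate response $r_q$, and $\mathbf{W}_r$ is a learnable parameter.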
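The pooling and similarity scoring described in this subsection can be sketched as below. It is an illustrative NumPy version under stated assumptions: the weight shapes ($2d_s \times d_s$ for addressees and $2d_s \times d_u$ for responses) are inferred from dimensional consistency, the weights are random rather than trained, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def context_embedding(final_embs):
    """Element-wise max pooling over the final embeddings of all speakers."""
    return np.max(np.stack(list(final_embs.values())), axis=0)


def addressee_prob(a_res, h_c, W_a, a_p):
    """Bilinear similarity between [a_res; h_c] and a candidate addressee."""
    return float(sigmoid(np.concatenate([a_res, h_c]) @ W_a @ a_p))


def response_prob(a_res, h_c, W_r, r_q):
    """Bilinear similarity between [a_res; h_c] and a candidate response."""
    return float(sigmoid(np.concatenate([a_res, h_c]) @ W_r @ r_q))


# Toy final speaker embeddings (d_s = 2) and a random response embedding (d_u = 3).
final_embs = {"a1": np.array([1.0, -2.0]), "a2": np.array([0.0, 3.0])}
h_c = context_embedding(final_embs)

rng = np.random.default_rng(0)
d_s, d_u = 2, 3
W_a = rng.uniform(-0.01, 0.01, size=(2 * d_s, d_s))
W_r = rng.uniform(-0.01, 0.01, size=(2 * d_s, d_u))

p_adr = addressee_prob(final_embs["a2"], h_c, W_a, final_embs["a1"])
p_res = response_prob(final_embs["a2"], h_c, W_r, rng.normal(size=d_u))
```

Note that the two scores are computed independently of each other here, which is exactly the separate selection that the joint approach later replaces.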
4 Speaker Interaction RNN
While Dynamic-RNN can track the speaker status by capturing who says what in multi-party conversation, there are still some issues. First, at each time step, only the sender embedding is updated from the utterance. Therefore, other speakers are blind to what is being said, and the model fails to capture addressee information. Second, while the addressee and response are mutually dependent, Dynamic-RNN selects them independently. Consider a case where the responding speaker is talking to two other speakers in separate conversation threads. The choice of addressee is likely to be either of the two speakers, but the choice is much less ambiguous if the correct response is given, and vice versa. Dynamic-RNN often produces inconsistent addressee-response pairs due to the separate selection. See Table 4 for examples.
In contrast to Dynamic-RNN, the dialog encoder in SI-RNN updates the embeddings of all speakers, not just the sender, at each time step. Speaker embeddings are updated depending on their roles: the update for the sender differs from that for the addressee, which in turn differs from that for the observers. Furthermore, a speaker embedding is updated not only from the utterance but also from other speakers. This is achieved by designing variations of GRUs for different roles. Finally, SI-RNN selects the addressee and response jointly by maximizing the joint probability.
4.1 Utterance Encoder
4.2 Dialog Encoder
Figure 1 (Right) shows how SI-RNN encodes the example in Eq 1. Unlike Dynamic-RNN, SI-RNN updates all speaker embeddings in a role-sensitive manner. For example, at the first time step when $a_2$ says $u^{(1)}$ to $a_1$, Dynamic-RNN only updates $\mathbf{a}_2$ using $\mathbf{u}^{(1)}$, while other speakers are updated using $\mathbf{0}$. In contrast, SI-RNN updates each speaker's status with different units: $\mathrm{IGRU}^S$ updates the sender embedding $\mathbf{a}_2$ from the utterance embedding $\mathbf{u}^{(1)}$ and the addressee embedding $\mathbf{a}_1$; $\mathrm{IGRU}^A$ updates the addressee embedding $\mathbf{a}_1$ from $\mathbf{u}^{(1)}$ and $\mathbf{a}_2$; $\mathrm{GRU}^O$ updates the observer embedding $\mathbf{a}_3$ from $\mathbf{u}^{(1)}$.
Algorithm 1 gives a formal definition of the dialog encoder in SI-RNN. The dialog encoder is a function that takes as input a dialog context $\mathcal{C}$ (lines 1-5) and returns speaker embeddings at the final time step (lines 28-30). Speaker embeddings are initialized as $d_s$-dimensional zero vectors, where $d_s$ is the speaker embedding size (lines 6-9). Speaker embeddings are updated by iterating over each line in the context (lines 10-27).
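The control flow just described can be sketched in Python as follows. This is not a transcription of Algorithm 1: the function names are hypothetical, the three update units are passed in as plain callables, and the sketch assumes every context line has an addressee who is a known speaker. All updates at a time step read from a snapshot of the previous step, so the sender and addressee updates each see the other's old embedding.

```python
def assign_roles(speaker_ids, sender, addressee):
    """Map every speaker to its role at the current time step."""
    roles = {}
    for a in speaker_ids:
        if a == sender:
            roles[a] = "sender"
        elif a == addressee:
            roles[a] = "addressee"
        else:
            roles[a] = "observer"
    return roles


def encode_dialog(context, speaker_ids, init,
                  update_sender, update_addressee, update_observer):
    """Role-sensitive recurrent update of all speaker embeddings."""
    emb = {a: init for a in speaker_ids}
    for sender, addressee, utterance in context:
        prev = dict(emb)  # snapshot of the embeddings at the previous time step
        for a, role in assign_roles(speaker_ids, sender, addressee).items():
            if role == "sender":
                emb[a] = update_sender(prev[a], prev[addressee], utterance)
            elif role == "addressee":
                emb[a] = update_addressee(prev[a], prev[sender], utterance)
            else:
                emb[a] = update_observer(prev[a], utterance)
    return emb


# Toy integer "embeddings" that only record which unit touched each speaker.
final = encode_dialog(
    context=[("a2", "a1", "u1"), ("a1", "a3", "u2")],
    speaker_ids=["a1", "a2", "a3"],
    init=0,
    update_sender=lambda own, other, u: own + 100,
    update_addressee=lambda own, other, u: own + 10,
    update_observer=lambda own, u: own + 1,
)
```

With these toy units, `final` records that `a1` was addressee once and sender once (110), `a2` was sender once and observer once (101), and `a3` was observer once and addressee once (11). In a real model the three callables would be the GRU-based units with vector-valued embeddings.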
4.3 Role-Sensitive Update
In this subsection, we explain in detail how $\mathrm{IGRU}^S$/$\mathrm{IGRU}^A$/$\mathrm{GRU}^O$ update speaker embeddings according to their roles at each time step (Algorithm 1 lines 19-26).
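As shown in Figure 2, $\mathrm{IGRU}^S$/$\mathrm{IGRU}^A$/$\mathrm{GRU}^O$ are all GRU-based units. $\mathrm{IGRU}^S$ updates the sender embedding from the previous sender embedding $\mathbf{a}_{sdr}^{(t-1)}$, the previous addressee embedding $\mathbf{a}_{adr}^{(t-1)}$, and the utterance embedding $\mathbf{u}^{(t)}$:
$$\mathbf{a}_{sdr}^{(t)} \leftarrow \mathrm{IGRU}^S\left(\mathbf{a}_{sdr}^{(t-1)},\ \mathbf{a}_{adr}^{(t-1)},\ \mathbf{u}^{(t)}\right)$$
The update, as illustrated in the upper part of Figure 2, is controlled by three gates. The $\mathbf{r}_S^{(t)}$ gate controls the previous sender embedding $\mathbf{a}_{sdr}^{(t-1)}$, and $\mathbf{p}_S^{(t)}$ controls the previous addressee embedding $\mathbf{a}_{adr}^{(t-1)}$. Those two gated interactions together produce the sender embedding proposal $\tilde{\mathbf{a}}_{sdr}^{(t)}$. Finally, the update gate $\mathbf{z}_S^{(t)}$ combines the proposal $\tilde{\mathbf{a}}_{sdr}^{(t)}$ and the previous sender embedding $\mathbf{a}_{sdr}^{(t-1)}$ to update the sender embedding $\mathbf{a}_{sdr}^{(t)}$. The computations in $\mathrm{IGRU}^S$ (including gates $\mathbf{r}_S^{(t)}$, $\mathbf{p}_S^{(t)}$, $\mathbf{z}_S^{(t)}$, the proposal embedding $\tilde{\mathbf{a}}_{sdr}^{(t)}$, and the final updated embedding $\mathbf{a}_{sdr}^{(t)}$) are formulated as:
$$\begin{aligned}
\mathbf{r}_S^{(t)} &= \sigma\left(\mathbf{W}_S^{r}\mathbf{u}^{(t)} + \mathbf{U}_S^{r}\mathbf{a}_{sdr}^{(t-1)} + \mathbf{V}_S^{r}\mathbf{a}_{adr}^{(t-1)}\right) \\
\mathbf{p}_S^{(t)} &= \sigma\left(\mathbf{W}_S^{p}\mathbf{u}^{(t)} + \mathbf{U}_S^{p}\mathbf{a}_{sdr}^{(t-1)} + \mathbf{V}_S^{p}\mathbf{a}_{adr}^{(t-1)}\right) \\
\mathbf{z}_S^{(t)} &= \sigma\left(\mathbf{W}_S^{z}\mathbf{u}^{(t)} + \mathbf{U}_S^{z}\mathbf{a}_{sdr}^{(t-1)} + \mathbf{V}_S^{z}\mathbf{a}_{adr}^{(t-1)}\right) \\
\tilde{\mathbf{a}}_{sdr}^{(t)} &= \tanh\left(\mathbf{W}_S\mathbf{u}^{(t)} + \mathbf{U}_S\left(\mathbf{r}_S^{(t)} \odot \mathbf{a}_{sdr}^{(t-1)}\right) + \mathbf{V}_S\left(\mathbf{p}_S^{(t)} \odot \mathbf{a}_{adr}^{(t-1)}\right)\right) \\
\mathbf{a}_{sdr}^{(t)} &= \mathbf{z}_S^{(t)} \odot \mathbf{a}_{sdr}^{(t-1)} + \left(1 - \mathbf{z}_S^{(t)}\right) \odot \tilde{\mathbf{a}}_{sdr}^{(t)}
\end{aligned}$$
where $\{\mathbf{W}_S^{r}, \mathbf{W}_S^{p}, \mathbf{W}_S^{z}, \mathbf{U}_S^{r}, \mathbf{U}_S^{p}, \mathbf{U}_S^{z}, \mathbf{V}_S^{r}, \mathbf{V}_S^{p}, \mathbf{V}_S^{z}, \mathbf{W}_S, \mathbf{U}_S, \mathbf{V}_S\}$ are learnable parameters. $\mathrm{IGRU}^A$ uses the same formulation with a different set of parameters, as illustrated in the middle of Figure 2. In addition, we update the observer embeddings from the utterance. $\mathrm{GRU}^O$ is implemented as the traditional GRU unit in the lower part of Figure 2. Note that the parameters in $\mathrm{IGRU}^S$/$\mathrm{IGRU}^A$/$\mathrm{GRU}^O$ are not shared. This allows SI-RNN to learn role-dependent features to control speaker embedding updates. The formulations of $\mathrm{IGRU}^A$ and $\mathrm{GRU}^O$ are similar.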
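The three-gate interaction unit described above can be sketched in NumPy as follows. This is a hedged illustration, not the released code: biases are omitted, the final interpolation convention (update gate weighting the previous embedding) is assumed, and `init_igru` and `igru_step` are hypothetical names. The same function serves the sender and the addressee roles; only the parameter set and the argument order differ.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_igru(d_s, d_u, rng):
    """One unshared parameter set: W* act on the utterance, U* on the unit's
    own speaker embedding, V* on the other speaker's embedding."""
    params = {}
    for name in ("r", "p", "z", "h"):
        params["W" + name] = rng.uniform(-0.01, 0.01, size=(d_s, d_u))
        params["U" + name] = rng.uniform(-0.01, 0.01, size=(d_s, d_s))
        params["V" + name] = rng.uniform(-0.01, 0.01, size=(d_s, d_s))
    return params


def igru_step(params, own_prev, other_prev, u):
    """Update one speaker embedding from itself, the other party, and the utterance."""
    def gate(name):
        return sigmoid(params["W" + name] @ u
                       + params["U" + name] @ own_prev
                       + params["V" + name] @ other_prev)

    r, p, z = gate("r"), gate("p"), gate("z")
    proposal = np.tanh(params["Wh"] @ u
                       + params["Uh"] @ (r * own_prev)      # gated own history
                       + params["Vh"] @ (p * other_prev))   # gated other speaker
    return z * own_prev + (1.0 - z) * proposal


rng = np.random.default_rng(0)
d_s, d_u = 3, 4
sender_params = init_igru(d_s, d_u, rng)      # parameters of the sender unit
addressee_params = init_igru(d_s, d_u, rng)   # separate, unshared parameters

a_sdr = rng.normal(size=d_s)   # previous sender embedding (random stand-in)
a_adr = rng.normal(size=d_s)   # previous addressee embedding (random stand-in)
u = rng.normal(size=d_u)       # utterance embedding (random stand-in)

new_sdr = igru_step(sender_params, a_sdr, a_adr, u)
new_adr = igru_step(addressee_params, a_adr, a_sdr, u)
```

Both calls read the previous embeddings, so the order in which sender and addressee are updated within a time step does not matter.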
4.4 Joint Selection
The dialog encoder takes the dialog context $\mathcal{C}$ as input and returns speaker embeddings at the final time step, $\mathbf{a}_i^{(T)}$. Recall from Section 3.2 that Dynamic-RNN produces the context embedding $\mathbf{h}_{\mathcal{C}}$ using Eq 2 and then selects the addressee and response separately using Eq 3 and Eq 4.
In contrast, SI-RNN performs addressee and response selection jointly: the response is dependent on the addressee and vice versa. Therefore, we view the task as a sequence prediction process: given the context and responding speaker, we first predict the addressee, and then predict the response given the addressee. (We also use the reversed prediction order as in Eq 7.)
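In addition to Eq 3 and Eq 4, SI-RNN is also trained to model the conditional probability as follows. To predict the addressee, we calculate the probability of the candidate speaker $a_p$ to be the ground-truth given the ground-truth response $r$ (available during training time):
$$P(a_p \mid \mathcal{C}, r) = \sigma\left(\left[\mathbf{a}_{res};\ \mathbf{h}_{\mathcal{C}};\ \mathbf{r}\right]^{\top} \mathbf{W}_{ar}\, \mathbf{a}_p\right) \tag{5}$$
The key difference from Eq 3 is that Eq 5 is conditioned on the correct response $r$ with embedding $\mathbf{r}$. Similarly, for response selection, we calculate the probability of a candidate response $r_q$ given the ground-truth addressee $a$ with embedding $\mathbf{a}$:
$$P(r_q \mid \mathcal{C}, a) = \sigma\left(\left[\mathbf{a}_{res};\ \mathbf{h}_{\mathcal{C}};\ \mathbf{a}\right]^{\top} \mathbf{W}_{ra}\, \mathbf{r}_q\right) \tag{6}$$
Here $\mathbf{a}_{res}$ is the final embedding of the responding speaker, $\mathbf{h}_{\mathcal{C}}$ is the context embedding, and $\mathbf{W}_{ar}$ and $\mathbf{W}_{ra}$ are learnable parameters.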
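At test time, SI-RNN selects the addressee-response pair $(\hat{a}, \hat{r})$ from $\mathcal{A}(\mathcal{C}) \times \mathcal{R}$ to maximize the joint probability $P(r_q, a_p \mid \mathcal{C})$:
$$\begin{aligned}
\hat{a}, \hat{r} &= \operatorname*{arg\,max}_{(a_p, r_q) \in \mathcal{A}(\mathcal{C}) \times \mathcal{R}} P(r_q, a_p \mid \mathcal{C}) \\
&= \operatorname*{arg\,max}_{(a_p, r_q) \in \mathcal{A}(\mathcal{C}) \times \mathcal{R}} P(r_q \mid \mathcal{C}) \cdot P(a_p \mid \mathcal{C}, r_q) + P(a_p \mid \mathcal{C}) \cdot P(r_q \mid \mathcal{C}, a_p)
\end{aligned} \tag{7}$$
In Eq 7, we decompose the joint probability into two terms: the first term selects the response given the context, and then selects the addressee given the context and the selected response; the second term selects the addressee and response in the reversed order. (We also considered an alternative decomposition of the joint probability, but the performance was similar to Eq 7.)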
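The joint selection rule can be sketched with a small NumPy example. The four probability tables below are made-up numbers chosen only to show how separate and joint selection can disagree; in the model they would come from the trained scoring functions. The function name `joint_select` and the array layout (rows index candidate addressees, columns index candidate responses) are assumptions of this sketch.

```python
import numpy as np


def joint_select(p_r, p_a, p_a_given_r, p_r_given_a):
    """Pick the (addressee, response) pair maximizing the two-term joint score.

    p_r[j]            ~ P(r_j | C)
    p_a[i]            ~ P(a_i | C)
    p_a_given_r[i, j] ~ P(a_i | C, r_j)
    p_r_given_a[i, j] ~ P(r_j | C, a_i)
    """
    score = p_r[None, :] * p_a_given_r + p_a[:, None] * p_r_given_a
    i, j = np.unravel_index(np.argmax(score), score.shape)
    return int(i), int(j), score


# Illustrative values for 2 candidate addressees and 2 candidate responses.
p_a = np.array([0.6, 0.4])
p_r = np.array([0.45, 0.55])
p_a_given_r = np.array([[0.9, 0.2],
                        [0.1, 0.8]])
p_r_given_a = np.array([[0.9, 0.1],
                        [0.2, 0.8]])

separate = (int(np.argmax(p_a)), int(np.argmax(p_r)))
adr, res, score = joint_select(p_r, p_a, p_a_given_r, p_r_given_a)
joint = (adr, res)
```

In this toy case, separate selection pairs addressee 0 with response 1 even though the conditional tables say that pair is unlikely, while the joint score prefers the consistent pair (addressee 0, response 0).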
Table 2 (the first four accuracy columns are for RES-CAND = 2, the last four for RES-CAND = 10):
|Model||T||DEV ADR-RES||TEST ADR-RES||TEST ADR||TEST RES||DEV ADR-RES||TEST ADR-RES||TEST ADR||TEST RES|
|[Ouchi and Tsuboi2016]||10||48.52||48.67||60.97||77.75||22.78||23.31||60.66||35.91|
|[Zhou et al.2016]||10||51.37||51.76||64.61||78.28||25.46||25.83||64.86||36.94|
|[Serban et al.2016]||15||52.78||53.04||65.84||79.08||26.31||26.62||65.89||37.85|
|[Ouchi and Tsuboi2016]||10||52.76||53.85||66.94||78.16||25.44||25.95||66.70||36.14|
|SI-RNN w/ shared IGRUs||15||59.50||59.47||74.20||78.08||28.31||28.45||73.35||36.00|
|SI-RNN w/o joint selection||15||63.13||63.40||77.56||80.38||32.24||32.53||77.61||39.73|
Table 3:
|Statistic||Total||Train||Dev||Test|
|Adr Mention Freq||-||0.32||0.34||0.34|
|# Speakers / Doc||26.8||26.3||30.7||32.1|
|# Utters / Doc||326.3||317.9||360.8||396.1|
|# Words / Utter||11.1||11.1||11.2||11.3|
5 Experimental Setup
We use the Ubuntu Multiparty Conversation Corpus [Ouchi and Tsuboi2016] and summarize the data statistics in Table 3.
The whole data set (including the Train/Dev/Test split and the false response candidates) is publicly available (https://github.com/hiroki13/response-ranking/tree/master/data/input).
The data set is built from the Ubuntu IRC chat room where a number of users discuss Ubuntu-related technical issues.
The log is organized as one file per day, each corresponding to a document.
Each document consists of (Time, SenderID, Utterance) lines.
If users explicitly mention addressees at the beginning of the utterance, the addresseeID is extracted.
Then a sample, namely a unit of input (the dialog context and the current sender) and output (the addressee and response prediction) for the task, is created to predict the ground-truth addressee and response of this line.
Note that samples are created only when the addressee is explicitly mentioned for clear, unambiguous ground-truth labels.
False response candidates are randomly chosen from all other utterances within the same document.
Therefore, distractors are likely from the same sub-conversation or even from the same sender but at different time steps.
This makes the task harder than that of Lowe et al. (2015), where distractors are randomly chosen from all documents.
If no addressee is explicitly mentioned, the addressee is left blank and the line is marked as a part of the context.
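The sample construction described above can be sketched as follows. The exact mention pattern used to build the corpus is not given here, so the `name:` or `name,` prefix rule, the requirement that the name be a known user, and the function names are all assumptions made for illustration; the example utterances are invented.

```python
import random
import re


def extract_addressee(utterance, known_users):
    """Return (addressee, remaining text) if the line starts with a known user
    followed by ':' or ','; otherwise (None, original utterance)."""
    match = re.match(r"\s*([^\s:,]+)[:,]\s+(.*)", utterance)
    if match and match.group(1) in known_users:
        return match.group(1), match.group(2)
    return None, utterance


def sample_distractors(doc_utterances, truth_index, k, rng):
    """Draw k false responses from all other utterances of the same document."""
    pool = [u for i, u in enumerate(doc_utterances) if i != truth_index]
    return rng.sample(pool, k)


users = {"alice", "bob"}
doc = [
    "anyone know how to fix grub?",
    "alice: try sudo update-grub",
    "note: this needs a reboot",
    "bob, that worked, thanks",
]

parsed = [extract_addressee(line, users) for line in doc]
distractors = sample_distractors(doc, truth_index=1, k=2, rng=random.Random(0))
```

Lines without a recognized mention keep a blank addressee and would serve only as context, while lines such as the second and fourth would each yield a sample with a ground-truth addressee and response.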
Baselines. Apart from Dynamic-RNN, we also include several other baselines. Recent+TF-IDF always selects the most recent speaker (except the responding speaker
For a fair comparison, we follow the hyperparameters from Ouchi and Tsuboi (2016), which are chosen based on the validation data set. We take a maximum of 20 words for each utterance. We use 300-dimensional GloVe word vectors (http://nlp.stanford.edu/projects/glove/), which are fixed during training. SI-RNN uses 50-dimensional vectors for both speaker embeddings and hidden states. Model parameters are initialized with a uniform distribution between -0.01 and 0.01. We set the mini-batch size to 128. The joint cross-entropy loss function with 0.001 L2 weight decay is minimized by Adam [Kingma and Ba2015]. Training is stopped early if the validation accuracy does not improve for 5 consecutive epochs. All experiments are performed on a single GTX Titan X GPU. The maximum number of epochs is 30, and most models converge within 10 epochs.
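For reference, the settings listed in this paragraph can be gathered into a configuration fragment, together with a small sketch of the early-stopping rule. The dictionary keys and the `early_stopping` helper are hypothetical names, the accuracy history is invented, and treating a tie as "no improvement" is an assumption of this sketch.

```python
CONFIG = {
    "max_words_per_utterance": 20,
    "word_embedding_dim": 300,      # GloVe vectors, kept fixed during training
    "speaker_embedding_dim": 50,
    "hidden_dim": 50,
    "init_range": (-0.01, 0.01),    # uniform initialization
    "batch_size": 128,
    "l2_weight_decay": 0.001,
    "optimizer": "adam",
    "patience": 5,                  # epochs without validation improvement
    "max_epochs": 30,
}


def early_stopping(dev_accuracies, patience, max_epochs):
    """Return (best_epoch, last_epoch_run), with epochs counted from 1."""
    best_acc = float("-inf")
    best_epoch = 0
    last_epoch = 0
    stale = 0
    for epoch, acc in enumerate(dev_accuracies[:max_epochs], start=1):
        last_epoch = epoch
        if acc > best_acc:
            best_acc, best_epoch, stale = acc, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, last_epoch


# Made-up validation accuracies, one value per epoch.
history = [0.50, 0.55, 0.60, 0.59, 0.60, 0.58, 0.57, 0.59, 0.61]
best_epoch, stopped_at = early_stopping(
    history, CONFIG["patience"], CONFIG["max_epochs"])
```

With this invented history, the best epoch is the third and training stops after the eighth, so the later value of 0.61 is never reached.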
6 Results and Discussion
For fair and meaningful quantitative comparisons, we follow the evaluation protocols of Ouchi and Tsuboi (2016).
SI-RNN improves the overall accuracy on the addressee and response selection task.
Two ablation experiments further analyze the contribution of role-sensitive units and joint selection respectively.
We then confirm the robustness of SI-RNN with the number of speakers and distant responses.
Finally, in a case study we discuss how SI-RNN handles complex conversations by either engaging in a new sub-conversation or responding to a distant message.
Overall Result. As shown in Table 2, SI-RNN significantly improves upon the previous state-of-the-art. In particular, addressee selection (ADR) benefits most across different numbers of candidate responses (denoted as RES-CAND): around 12% in RES-CAND $= 2$ and more than 10% in RES-CAND $= 10$. Response selection (RES) is also improved, suggesting that role-sensitive GRUs and joint selection are helpful for response selection as well. The improvement is more obvious with more candidate responses (2% in RES-CAND $= 2$ and 4% in RES-CAND $= 10$). Together, these result in significantly better accuracy on the ADR-RES metric as well.
Ablation Study. We show an ablation study in the last rows of Table 2. First, we share the parameters of $\mathrm{IGRU}^S$/$\mathrm{IGRU}^A$/$\mathrm{GRU}^O$. The accuracy decreases significantly, indicating that it is crucial to learn role-sensitive units to update speaker embeddings. Second, to examine our joint selection, we fall back to selecting the addressee and response separately, as in Dynamic-RNN. We find that joint selection improves ADR and RES individually, and it is particularly helpful for pair selection ADR-RES.
Number of Speakers.
Numerous speakers create complex dialogs and more candidate addressees, so the task becomes more challenging.
In Figure 3 (Upper), we investigate how ADR accuracy changes with the number of speakers in the context of length 15, corresponding to the rows with T=15 in Table 2.
Recent+TF-IDF always chooses the most recent speaker and the accuracy drops dramatically as the number of speakers increases.
Direct-Recent+TF-IDF shows better performance, and Dynamic-RNN is marginally better.
SI-RNN is much more robust and remains above 70% accuracy across all bins.
The advantage is more obvious for bins with more speakers.
Addressing Distance. Addressing distance is the time difference from the responding speaker to the ground-truth addressee. As the histogram in Figure 3 (Lower) shows, while the majority of responses target the most recent speaker, many responses go back five or more time steps. It is important to note that for those distant responses, Dynamic-RNN sees a clear performance decrease, even worse than Direct-Recent+TF-IDF. In contrast, SI-RNN handles distant responses much more accurately.
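One way to operationalize addressing distance is sketched below. The definition used here, the number of time steps between the response and the ground-truth addressee's most recent utterance, is an assumption of this sketch, as are the function name and the toy sender sequence.

```python
def addressing_distance(senders, addressee):
    """Steps back from the response to the addressee's most recent utterance.

    `senders` lists the sender IDs of the context in time order. A distance of
    1 means the addressee is the most recent speaker; None means the addressee
    never spoke in the context.
    """
    for distance, sender in enumerate(reversed(senders), start=1):
        if sender == addressee:
            return distance
    return None


# Toy context: four utterances by three hypothetical speakers.
senders = ["a", "b", "c", "b"]
distances = {who: addressing_distance(senders, who) for who in ["a", "b", "c", "z"]}
```

Under this reading, replying to the most recent speaker has distance 1, and the distant responses discussed here correspond to larger values.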
Case Study. Examples in Table 4 show how SI-RNN can handle complex multi-party conversations by selecting from 10 candidate responses. In both examples, the responding speakers participate in two or more concurrent sub-conversations with other speakers.
Example (a) demonstrates the ability of SI-RNN to engage in a new sub-conversation. The responding speaker “wafflejock” is originally involved in two sub-conversations: sub-conversation 1 with “codepython”, and the Ubuntu installation issue with “theoletom”. While it is reasonable to address “codepython” and “theoletom”, the responses from other baselines do not help solve the corresponding issues. TF-IDF prefers the response with the “install” keyword, yet the response is repetitive and not helpful. Dynamic-RNN selects an irrelevant response to “codepython”. SI-RNN chooses to engage in a new sub-conversation by suggesting a solution to “releaf” about Ubuntu dedicated laptops.
Example (b) shows the advantage of SI-RNN in responding to a distant message. The responding speaker “nicomachus” is actively engaged with “VeryBewitching” in sub-conversation 1 and is also loosely involved in sub-conversation 2: “chingao” mentions “nicomachus” in the most recent utterance. SI-RNN remembers the distant sub-conversation 1 and responds to “VeryBewitching” with a detailed answer. Direct-Recent+TF-IDF selects the ground-truth addressee because “VeryBewitching” talks to “nicomachus”, but the response is not helpful. Dynamic-RNN is biased toward the recent speaker “chingao”, yet the response is not relevant.
SI-RNN jointly models who says what to whom by updating speaker embeddings in a role-sensitive way. It provides state-of-the-art addressee and response selection, which can directly benefit retrieval-based dialog systems. In future work, we plan to use SI-RNN to extract sub-conversations from unlabeled conversation corpora and to provide a large-scale disentangled multi-party conversation data set.
We thank the members of the UMichigan-IBM Sapphire Project and all the reviewers for their helpful feedback. This material is based in part upon work supported by IBM under contract 4915012629. Any opinions, findings, conclusions or recommendations expressed above are those of the authors and do not necessarily reflect the views of IBM.
- [Bordes and Weston2017] Bordes, A., and Weston, J. 2017. Learning end-to-end goal-oriented dialog. In ICLR.
- [Chen et al.2017] Chen, P.-C.; Chi, T.-C.; Su, S.-Y.; and Chen, Y.-N. 2017. Dynamic time-aware attention to speaker roles and contexts for spoken language understanding. In ASRU.
- [Chi et al.2017] Chi, T.-C.; Chen, P.-C.; Su, S.-Y.; and Chen, Y.-N. 2017. Speaker role contextual modeling for language understanding and dialogue policy learning. In IJCNLP.
- [Cho et al.2014] Cho, K.; van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In EMNLP.
- [Chung et al.2014] Chung, J.; Gulcehre, C.; Cho, K.; and Bengio, Y. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. NIPS 2014 Deep Learning and Representation Learning Workshop.
- [Henderson, Lemon, and Georgila2008] Henderson, J.; Lemon, O.; and Georgila, K. 2008. Hybrid reinforcement/supervised learning of dialogue policies from fixed data sets. Computational Linguistics 34(4):487–511.
- [Henderson, Thomson, and Williams2014] Henderson, M.; Thomson, B.; and Williams, J. 2014. The second dialog state tracking challenge. In SIGDIAL.
- [Henderson, Thomson, and Young2014] Henderson, M.; Thomson, B.; and Young, S. 2014. Word-based dialog state tracking with recurrent neural networks. In SIGDIAL.
- [Hu et al.2014] Hu, B.; Lu, Z.; Li, H.; and Chen, Q. 2014. Convolutional neural network architectures for matching natural language sentences. In NIPS.
- [Ji, Lu, and Li2014] Ji, Z.; Lu, Z.; and Li, H. 2014. An information retrieval approach to short text conversation. arXiv preprint arXiv:1408.6988.
- [Jovanović, Akker, and Nijholt2006] Jovanović, N.; Akker, R. o. d.; and Nijholt, A. 2006. Addressee identification in face-to-face meetings. In EACL.
- [Kingma and Ba2015] Kingma, D., and Ba, J. 2015. Adam: A method for stochastic optimization. International Conference for Learning Representations (ICLR).
- [Li et al.2016] Li, J.; Galley, M.; Brockett, C.; Spithourakis, G.; Gao, J.; and Dolan, B. 2016. A persona-based neural conversation model. In ACL.
- [Lowe et al.2015] Lowe, R.; Pow, N.; Serban, I.; and Pineau, J. 2015. The Ubuntu Dialogue Corpus: A large dataset for research in unstructured multi-turn dialogue systems. In SIGDIAL.
- [Lu and Li2013] Lu, Z., and Li, H. 2013. A deep architecture for matching short texts. In NIPS.
- [Mei, Bansal, and Walter2017] Mei, H.; Bansal, M.; and Walter, M. R. 2017. Coherent dialogue with attention-based language models. In AAAI.
- [Meng, Mou, and Jin2017] Meng, Z.; Mou, L.; and Jin, Z. 2017. Towards neural speaker modeling in multi-party conversation: The task, dataset, and models. arXiv preprint arXiv:1708.03152.
- [Mesnil et al.2015] Mesnil, G.; Dauphin, Y.; Yao, K.; Bengio, Y.; Deng, L.; Hakkani-Tur, D.; He, X.; Heck, L.; Tur, G.; Yu, D.; et al. 2015. Using recurrent neural networks for slot filling in spoken language understanding. Audio, Speech, and Language Processing, IEEE/ACM Transactions on 23(3):530–539.
- [op den Akker and Traum2009] op den Akker, R., and Traum, D. 2009. A comparison of addressee detection methods for multiparty conversations. Workshop on the Semantics and Pragmatics of Dialogue.
- [Ouchi and Tsuboi2016] Ouchi, H., and Tsuboi, Y. 2016. Addressee and response selection for multi-party conversation. In EMNLP.
- [Ritter, Cherry, and Dolan2011] Ritter, A.; Cherry, C.; and Dolan, W. B. 2011. Data-driven response generation in social media. In EMNLP.
- [Serban et al.2016] Serban, I. V.; Sordoni, A.; Bengio, Y.; Courville, A.; and Pineau, J. 2016. Building end-to-end dialogue systems using generative hierarchical neural network models. In AAAI.
- [Shang, Lu, and Li2015] Shang, L.; Lu, Z.; and Li, H. 2015. Neural responding machine for short-text conversation. In ACL.
- [Singh et al.1999] Singh, S. P.; Kearns, M. J.; Litman, D. J.; and Walker, M. A. 1999. Reinforcement learning for spoken dialogue systems. In NIPS.
- [Vinyals and Le2015] Vinyals, O., and Le, Q. 2015. A neural conversational model. ICML Deep Learning Workshop.
- [Wang et al.2015] Wang, M.; Lu, Z.; Li, H.; and Liu, Q. 2015. Syntax-based deep matching of short texts. In IJCAI.
- [Wen et al.2015] Wen, T.-H.; Gašić, M.; Mrkšić, N.; Su, P.-H.; Vandyke, D.; and Young, S. 2015. Semantically conditioned LSTM-based natural language generation for spoken dialogue systems. In EMNLP.
- [Wen et al.2016] Wen, T.-H.; Vandyke, D.; Mrksic, N.; Gasic, M.; Rojas-Barahona, L. M.; Su, P.-H.; Ultes, S.; and Young, S. 2016. A network-based end-to-end trainable task-oriented dialogue system. arXiv preprint arXiv:1604.04562.
- [Williams, Raux, and Henderson2016] Williams, J.; Raux, A.; and Henderson, M. 2016. The dialog state tracking challenge series: A review. Dialogue & Discourse 7(3):4–33.
- [Yao et al.2014] Yao, K.; Peng, B.; Zhang, Y.; Yu, D.; Zweig, G.; and Shi, Y. 2014. Spoken language understanding using long short-term memory neural networks. In Spoken Language Technology Workshop (SLT), 2014 IEEE, 189–194. IEEE.
- [Young et al.2013] Young, S.; Gasic, M.; Thomson, B.; and Williams, J. D. 2013. POMDP-based statistical spoken dialog systems: A review. Proceedings of the IEEE 101(5):1160–1179.
- [Zhou et al.2016] Zhou, X.; Dong, D.; Wu, H.; Zhao, S.; Yu, D.; Tian, H.; Liu, X.; and Yan, R. 2016. Multi-view response selection for human-computer conversation. In EMNLP.