Beyond "How may I help you?": Assisting Customer Service Agents with Proactive Responses

We study the problem of providing recommended responses to customer service agents in live-chat dialogue systems. Smart-reply systems have been widely applied in real-world applications (e.g. Gmail, LinkedIn Messaging), and most of them successfully recommend reactive responses. However, we observe a major limitation of current methods: due to the lack of long-term context information, they generally have difficulty suggesting proactive investigative questions (e.g. "Do you perhaps have another account with us?"), which are critical steps for customer service agents to collect information and resolve customers' issues. In this work, we therefore propose an end-to-end method with a special focus on suggesting proactive investigative questions to agents in Airbnb's customer service live-chat system. The effectiveness of our proposed method is validated through qualitative and quantitative results.

Introduction

Figure 1: An example of dialogue assistant for Airbnb’s live-chat customer service agents. The recommendations carry the context from previous rounds: customer’s intent to change payment method. (Placeholders such as customer, agent, host name, and 1234 are auto-filled in real applications.)

Smart-reply systems have been adopted in many real-world applications (e.g. Gmail, LinkedIn Messaging) and have helped compose numerous textual responses in people's day-to-day email and messaging activities. These systems are particularly effective in reducing users' cognitive load of processing text content and the inconvenience of typing responses.

At Airbnb, we serve the community of hosts and guests around the world, and providing high-quality customer service at scale is an essential part of Airbnb's business. Live-chat is one of the primary customer support channels, where customers work with our customer service agents to resolve their issues through live-chat conversations. We observed that it was time-consuming for our live-chat agents to repeatedly type or copy-paste commonly used messages in their conversations, which could become an overhead preventing the agents from engaging deeply with the customers. These observations initially motivated us to introduce 'smart replies' into our customer support live-chat conversations, where we wish to suggest short responses for agents to use with one click.

Most existing methods of 'smart replies' seek to build a sequence-to-sequence or a classification model on top of the preceding message and generate the corresponding response [Kannan et al.2016, Henderson et al.2017, Pasternack et al.2017]. We first deployed a variant of such systems in Airbnb's live-chat customer service and evaluated its performance. This first version of smart replies resulted in significant positive impact on agent efficiency metrics and received positive feedback from our agents. However, we also observed a unique challenge in our application scenario. Unlike existing smart-reply systems where messages are mostly asynchronous and reactive, a live-chat conversation involves many rounds of back and forth between the customer and the agent (see Table 1). Apart from reacting to customers' preceding inquiries, agents may ask questions to investigate customers' problems and provide information to resolve customer issues. Thus, in order to improve the efficiency of the problem-solving process as well as to provide consistent investigation guidance to our agents, we extend our work to focus on suggesting proactive investigative questions to Airbnb's live-chat customer service agents.

                      Median   Mean
Number of Messages     17.4    14
Number of Turns         7.9     7.0
Number of Rounds        4.7     4

Table 1: Median and mean number of messages, turns, and rounds in live-chat conversations. One round contains two consecutive turns from the agent and the customer, with exceptions at the start and end of a conversation.

Related Work

Modeling conversation dialogues and generating responses have been extensively studied recently. Most relevant to our work are smart-reply systems, where the preceding message is taken as the input and the corresponding response is generated accordingly [Oriol Vinyals2015, Kannan et al.2016, Henderson et al.2017, Pasternack et al.2017]. Specifically, the tokens of the most recent message are regarded as the input sequence and the response tokens as the output sequence, and text generation [Kannan et al.2016, Oriol Vinyals2015] or classification [Henderson et al.2017, Pasternack et al.2017] models can be used to approximate the conditional probability of the response given the input. At inference time, response candidates with high probabilities are suggested.

We wish to extend existing smart-reply systems to customer service live-chat dialogues. Notice that in the systems described above, only the short-term memory (i.e., the preceding message) is involved as the input sequence. We observe that such a method is particularly suitable for suggesting reactive responses and answering customers' questions.

  • Reactive Responses. The intention of this type of message is to respond promptly to the current situation rather than to lead the conversation flow, which relies largely on short-term memory alone. For example, it is relatively straightforward to predict "You're welcome!" or "It's my pleasure." by looking at the most recent message "Thank you!".

  • Question and Answer. Another type of task these short-term methods handle well is answering customers' simple questions (i.e., questions that do not require further investigation and can be answered in a single round). For example, answering "Your payout will be released in 24 hours." to the question "When will my payout be released?". Again, this use case is reactive in nature and relies mostly on short-term memory.

However, we find that existing methods have difficulties in suggesting common responses that carry investigation intents (e.g. "Can you provide the reservation code?", "Do you perhaps have a duplicated account with us?"). Different from the reactive responses above, these messages are proactive (they require agents to lead conversations and investigation flows), dynamic (solving a problem could require a sequence of investigative questions), and issue-dependent (different types of problems may lead to different resolution processes). Thus, in order to suggest these investigative questions, our system needs to capture the short-term context from the preceding message as well as understand the long-term problem context throughout the conversation.

On the other hand, chatbot systems have attracted much attention in the customer service domain [Xu et al.2017, Cui et al.2017]. Our work differs from these systems in two aspects: 1) unlike most chatbot systems, which utilize language generation models to predict full responses, we develop a pipeline to efficiently generate and carefully curate response candidates, and regard the ultimate response suggestion task as a candidate ranking problem; 2) different from the chatbot system of [Cui et al.2017], where the problem-solving task is regarded as a single-round question-answering problem, our smart-reply system is designed to solve more sophisticated user issues where long-term problem contexts must be considered.

Many other studies exist on modeling dialogues, such as [Serban et al.2016, Sordoni et al.2015, Shang, Lu, and Li2015, Li et al.2016b, Li et al.2016a]. Although most of these models have not been deployed in production settings, some of them inspired our model architecture design. Specifically, our model is largely inspired by the HRED architecture [Serban et al.2016], where hierarchical recurrent encoder-decoders are applied to capture both short-term and long-term conversational contexts.

In this work, we focus on suggesting investigative questions to agents. In the subsequent sections, we describe 1) how we define and uncover investigative question candidates from Airbnb's agent-customer conversation data; and 2) how we build a machine learning system to suggest these response candidates to agents by leveraging both short-term conversation contexts and long-term problem-solving contexts.

Method

We describe a typical live-chat workflow as follows. A user first describes their problem, which initializes a customer ticket that is routed to one of our live-chat agents. The live-chat agent then picks up the ticket and determines the associated ticket issue category (e.g. "cancel reservation", "change payment method"). This ticket issue category can be selected by the customer when filing the ticket, re-assigned by an agent when picking up the ticket, or remain blank (i.e., missing) throughout the conversation. The agent-customer live-chat conversation then starts with the customer's original problem description message and continues for several conversation rounds in real time until the ticket is resolved or escalated to another channel.

We introduce our system from the following two perspectives: investigative question candidate generation, and investigative question recommendation.

Investigative Question Candidate Generation

Figure 2: Investigative Question Candidate Generation Pipeline.

An important observation from our agent-customer conversation data is that agents’ investigation intents can be carried out by asking questions to customers. This motivates us to start inspecting the questions within agents’ messages and generating investigative query candidates from there. We describe the complete question candidate generation pipeline as follows.

  • Preprocessing and Tokenization. We begin preprocessing by anonymizing personal information in the original conversation messages, including customers' names, agents' names, phone numbers, emails, credit card numbers, URLs, dollar amounts, timestamps, and other sensitive information. Note that this information is replaced by the associated placeholders during training but is automatically filled back in at inference time according to the ticket information. After data anonymization, we split agents' messages into sentences and process each sentence into unigram tokens.

  • Vectorization. The purpose of this step is to transform the original textual information into numeric vector representations so that downstream machine learning techniques can be applied. We start with all tokenized sentences from agents' messages in our live-chat conversation data and apply word2vec [Mikolov et al.2013b, Mikolov et al.2013a] to obtain word representations (the dimensionality of the word embeddings is set to 300). For each word token, we calculate its term frequency-inverse document frequency (Tf-Idf) over the complete corpus of agents' sentences. We then generate sentence embeddings by aggregating word embeddings weighted by their associated Tf-Idf values (see the sketch after this list).

  • Question Extraction. We then narrow the candidate generation scope down to sentences ending with question marks. Note that although the initial candidates are generated from these questions, they will later be expanded to cover sentences without question marks through the re-matching process.

  • Investigative Question Candidate Generation. By analyzing the extracted agents’ questions, we observe three different types of questions in our live-chat conversation data.

    • Courtesy Questions. These questions usually appear as greetings (e.g. “How are you doing today?”) or closure signals (e.g. “Is there anything else I can help you with?”) in agent-customer live-chat conversations.

    • Status-Checking Questions. In order to ensure that customers are always engaged in our live-chat problem-solving conversations, agents may need to ask status-checking questions occasionally. A typical example of such questions is "Are you still there with me?".

    • Investigative Questions. Apart from courtesy questions and status-checking questions, we find that most of the remaining questions are relevant to investigating and resolving customers' issues. We thus refer to these questions as investigative questions.

    Figure 3: Illustration of the response recommendation model architecture for a live-chat conversation with a pre-assigned ticket issue.

    We notice that variants of agents' courtesy and status-checking questions are relatively limited and can be easily identified by exact matching or simple rules. On the other hand, we observe that more than 60% of agents' questions are investigative questions, each of which has numerous variations and cannot be naively retrieved by exact matching or simple rules. We thus apply a pipeline to cluster these questions and uncover the different question variations.

    • We first identify the most common courtesy and status-checking questions. We then remove these questions and their variations from our question candidate pool, according to the cosine similarities between the precomputed sentence embeddings and the courtesy/status-checking question embeddings.

    • In order to cover investigative question candidates for different ticket issues, for each of the top ticket issues we extract the associated questions, apply a scalable mini-batch K-means algorithm, and assign them into 100 clusters. From each mini-cluster, we extract the 3 most frequent variations as candidates. After de-duplication, we obtain coarse-grained investigative question candidates.

    • Then we apply a hierarchical clustering algorithm on the extracted coarse-grained candidates. We refine the top candidates in each cluster and then generate a fine-grained question candidate list.

    • In order to ensure the high quality of suggested responses, it is important to have our content experts carefully review these response candidates. Reviewing candidates is a much lighter and more scalable project for the content team than writing all potential responses from scratch, which could produce responses that are not applicable to particular contexts and would be cumbersome to maintain and keep up to date. In this process, we ensure that each candidate cluster represents a distinct semantic intent (e.g. "Can you provide the reservation code?" versus "Do you perhaps have a duplicated account with us?"), that different variants can be generated within each cluster (e.g. "Can you provide the reservation code?" versus "Would you mind providing the reservation code?"), and that all the response variants are in appropriate tones and styles. After this content control, we finalize 71 distinct investigative response candidates with around 400 variants in total.
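To make the vectorization and clustering steps above concrete, below is a minimal sketch assuming gensim's Word2Vec and scikit-learn's TfidfVectorizer and MiniBatchKMeans. The 300-dimensional embeddings and 100 clusters follow the numbers above; the function names, remaining hyperparameters, and overall structure are illustrative rather than the production implementation.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def embed_sentences(tokenized_sentences, dim=300):
    """Tf-Idf-weighted word2vec sentence embeddings (illustrative only)."""
    # Train word2vec on the anonymized, tokenized agent sentences
    # (300-dimensional embeddings, as stated above).
    w2v = Word2Vec(sentences=tokenized_sentences, vector_size=dim,
                   min_count=2, workers=4)

    # Idf weight per token over the same corpus; term frequency is captured
    # implicitly because repeated tokens contribute repeatedly below.
    tfidf = TfidfVectorizer(analyzer=lambda tokens: tokens)
    tfidf.fit(tokenized_sentences)
    idf = dict(zip(tfidf.get_feature_names_out(), tfidf.idf_))

    embeddings = []
    for tokens in tokenized_sentences:
        vecs = [w2v.wv[t] for t in tokens if t in w2v.wv and t in idf]
        weights = [idf[t] for t in tokens if t in w2v.wv and t in idf]
        if vecs:
            embeddings.append(np.average(vecs, axis=0, weights=weights))
        else:
            embeddings.append(np.zeros(dim))
    return np.vstack(embeddings)


def cluster_issue_questions(question_embeddings, n_clusters=100):
    """Mini-batch K-means over the questions of one ticket issue; the 3 most
    frequent variations per cluster are then taken as coarse-grained
    candidates (frequency counting not shown)."""
    km = MiniBatchKMeans(n_clusters=n_clusters, batch_size=1024, random_state=0)
    return km.fit_predict(question_embeddings)
```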

Once the investigative question candidates are obtained, we apply a candidate re-matching process to construct a structured training dataset. Specifically, we inspect the complete corpus of agents' sentences, calculate the cosine similarity between each sentence and each question candidate, and assign sentences whose embedding-based cosine similarity is above a given threshold to the closest candidate cluster. Note that all sentences in agents' messages are considered in this re-matching process regardless of question marks; the original investigative question candidates can thus be expanded to include their non-question variations (e.g. from "Can you provide the reservation code?" to "Please let me know the reservation code if you have it.").
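A hedged sketch of this re-matching step follows: it assigns each agent sentence to its nearest candidate cluster when the embedding cosine similarity clears a threshold, and builds the multi-hot labels later used as training targets. The threshold value and helper names are placeholders, not the production settings.

```python
import numpy as np


def rematch_sentences(sentence_embs, candidate_embs, threshold=0.8):
    """Assign each agent sentence to the closest candidate cluster when the
    cosine similarity clears `threshold`; returns -1 for unmatched sentences."""
    # L2-normalize so dot products equal cosine similarities (epsilon avoids /0).
    s = sentence_embs / (np.linalg.norm(sentence_embs, axis=1, keepdims=True) + 1e-9)
    c = candidate_embs / (np.linalg.norm(candidate_embs, axis=1, keepdims=True) + 1e-9)
    sims = s @ c.T                                   # (n_sentences, n_candidates)
    best = sims.argmax(axis=1)
    best_sim = sims[np.arange(len(s)), best]
    return np.where(best_sim >= threshold, best, -1)


def round_labels(matched_candidates, n_candidates=71):
    """Multi-hot label vector for one round, given the candidate indices
    matched in the agent's next-round messages."""
    y = np.zeros(n_candidates)
    y[[c for c in matched_candidates if c >= 0]] = 1.0
    return y
```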

Investigative Question Recommendation

After the candidate generation and re-matching process, we are able to approach the investigative response recommendation task.

Round 1
  Customer: The payment link I received by email does not guide me to a payment page. I like to complete this payment. Please help.
  Top 3 recommendations: NA
  Evaluation: NA

Round 2
  Agent: Hi {user name}. Is this for your reservation with {host name}?
  Customer: Yes.
  Top 3 recommendations: 1. Could you provide the reservation code or the host name? 2. Which payment method would you like to use? 3. Have you added the payment method?
  Evaluation: Correct

Round 3
  Agent: Are you not able to submit the payment through your account as well?
  Customer: Not sure how to do that.
  Top 3 recommendations: 1. Have you added the payment method? 2. Which payment method would you like to use? 3. Did you want to charge your card ending in {4 digits}?
  Evaluation: Missed

Round 4
  Agent: If you go to your trips tab and click on this reservation, there should be a button on that page to complete the payment.
  Customer: Thats empty.
  Top 3 recommendations: 1. Have you added the payment method? 2. Which payment method would you like to use? 3. Are you on the app or website?
  Evaluation: The 3rd recommendation reflects the correct context

Round 5
  Agent: Are you on the app or website?
  Customer: App.
  Top 3 recommendations: 1. Are you on the app or website? 2. Have you added the payment method? 3. Which payment method would you like to use?
  Evaluation: Correct

Table 2: An example agent-customer live-chat dialogue from our test dataset, with the model's top-3 recommendations and our evaluation for each round.

Problem Formulation

Suppose that we have $R$ rounds in the conversation of a customer service ticket $t$, and each conversation round $r$ consists of two consecutive dialogue turns, from the agent and the customer respectively. Within each dialogue turn, there can be more than one message sent by the same interlocutor. As shown in Figure 3, an inbound customer service conversation always starts with the customer turn, together with a pre-assigned ticket issue; in the first round of a conversation we therefore have messages from the customer only. In the case that the associated ticket issue is missing, we introduce an additional binary variable to encode this status. The investigative response recommendation task can therefore be summarized as follows:

Goal: For a live-chat conversation, given the ticket issue and historical messages from the customer and the agent, we wish to suggest the most likely investigative queries for the agent to ask in the next round of the conversation.

Embedding Conversation Turns

Similar to the previous candidate generation process, we apply word2vec on the entire conversation corpus to generate word representations. For each conversation turn, we merge all the messages and aggregate word embeddings based on their associated Tf-Idf weights and obtain turn embeddings.

Encoding Structured Outputs

On the other hand, by re-matching the question candidates to the original conversation messages, we are able to construct a structured output $\mathbf{y}_r$ (i.e., the true investigative questions that agents asked afterwards) for each conversation round $r$, where $y_{r,k} = 1$ indicates that the $k$-th candidate is hit by the agent's messages in the next round (see Figure 3). Notice that agents may ask multiple investigative queries in the same conversation turn, so there can be more than one non-zero element in the output vector $\mathbf{y}_r$. Ultimately, we aim to predict the following conditional probability for the $r$-th conversation round of a ticket (all notations are ticket-dependent; for brevity we omit the ticket subscript in what follows):

$P\left(\mathbf{y}_r \mid \mathbf{e}, \mathbf{x}^{(c)}_1, \mathbf{x}^{(a)}_1, \ldots, \mathbf{x}^{(c)}_r, \mathbf{x}^{(a)}_r\right)$   (1)

where $\mathbf{e}$ is a one-hot vector representing the pre-assigned ticket issue, and $\mathbf{x}^{(c)}_j$, $\mathbf{x}^{(a)}_j$ are the precomputed turn embeddings for the customer and agent dialogue turns in round $j$, respectively.

Model Architecture

As shown in Figure 3, we apply a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) units to model the above probability. We consider a base version of our long-term context model, where the concatenated dialogue turn embeddings of each round are regarded as the input sequence, and the labels reflected in the next round are regarded as the output sequence. The dialogue turn embeddings are read in round by round to represent short-term conversation contexts, while the long-term problem contexts are encoded as hidden states that are carried and adapted throughout the entire conversation. A softmax function is applied in the output layer to approximate the probability in Eq. 1.

To extend this base version, the pre-assigned ticket issue can be regarded as an additional feature of each ticket, which can easily be concatenated with the dialogue turn embeddings in the input layer and/or with the output hidden states in the output layer.
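The following is a minimal sketch of this architecture in PyTorch (our choice of framework for illustration; the paper does not specify one). The 300-dimensional turn embeddings and 71 candidates follow the numbers reported earlier, while the hidden size, issue-vector size, and other details are assumptions.

```python
import torch
import torch.nn as nn


class InvestigativeQuestionRecommender(nn.Module):
    """LSTM over per-round turn embeddings; dimensions are illustrative."""

    def __init__(self, turn_dim=300, issue_dim=50, hidden_dim=256,
                 n_candidates=71, issue_in_input=True, issue_in_output=False):
        super().__init__()
        self.issue_in_input = issue_in_input
        self.issue_in_output = issue_in_output
        in_dim = 2 * turn_dim + (issue_dim if issue_in_input else 0)
        self.lstm = nn.LSTM(in_dim, hidden_dim, batch_first=True)
        out_dim = hidden_dim + (issue_dim if issue_in_output else 0)
        self.out = nn.Linear(out_dim, n_candidates)

    def forward(self, customer_turns, agent_turns, issue):
        # customer_turns, agent_turns: (batch, rounds, turn_dim) Tf-Idf-weighted
        # turn embeddings (a missing turn can be a zero vector).
        # issue: (batch, issue_dim) one-hot pre-assigned ticket issue.
        x = torch.cat([customer_turns, agent_turns], dim=-1)
        if self.issue_in_input:
            x = torch.cat([x, issue.unsqueeze(1).expand(-1, x.size(1), -1)], dim=-1)
        h, _ = self.lstm(x)             # long-term context lives in the hidden states
        if self.issue_in_output:
            h = torch.cat([h, issue.unsqueeze(1).expand(-1, h.size(1), -1)], dim=-1)
        # Per-round log-distribution over the candidate set.
        return self.out(h).log_softmax(dim=-1)
```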

Training

Our training objective is to maximize the following cross-entropy function over all conversation rounds of the training tickets:

$\sum_{r=1}^{R} \sum_{k=1}^{K} y_{r,k} \log \hat{y}_{r,k}$   (2)

where $K$ is the number of investigative question candidates and $\hat{y}_{r,k}$ denotes the predicted probability of the $k$-th candidate at round $r$.

We apply a standard regularizer and a stochastic gradient-based method, Adam [Kingma and Ba2014], for optimization. The hyperparameter for the regularizer is selected by grid search based on validation performance.
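Below is a hedged sketch of one training step under the objective in Eq. 2, reusing the model sketch above. The `weight_decay` value stands in for the grid-searched regularization hyperparameter, and the `mask` (skipping rounds without any matched candidate) and learning rate are illustrative assumptions.

```python
import torch

model = InvestigativeQuestionRecommender()
# weight_decay stands in for the grid-searched regularization hyperparameter.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)


def training_step(customer_turns, agent_turns, issue, targets, mask):
    # targets: (batch, rounds, n_candidates) multi-hot labels from re-matching.
    # mask:    (batch, rounds) 1.0 for rounds with at least one matched candidate
    #          (an illustrative detail, not specified in the paper).
    log_probs = model(customer_turns, agent_turns, issue)
    # Minimizing this loss maximizes sum_{r,k} y_{r,k} * log y_hat_{r,k} (Eq. 2).
    loss = -((targets * log_probs).sum(dim=-1) * mask).sum() / mask.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```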

Inference

At inference time, we feed the conversation message turns as the input sequence and obtain the output probability distribution over investigative query candidates at each time step. We then rank candidates by their associated probabilities and suggest the top three candidates to agents in the live-chat conversation.
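At serving time, the ranking step reduces to taking the three highest-probability candidates at the current round; a small sketch with illustrative names:

```python
import torch


@torch.no_grad()
def recommend_top3(model, customer_turns, agent_turns, issue):
    log_probs = model(customer_turns, agent_turns, issue)  # (batch, rounds, n_candidates)
    latest = log_probs[:, -1, :]                           # distribution at the current round
    top_logp, top_idx = torch.topk(latest, k=3, dim=-1)
    return top_idx, top_logp.exp()                         # candidate ids and probabilities
```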

Experiments

Data

We conduct offline experiments on around 20 thousand live-chat ticket conversations and 1 million messages from Airbnb’s customer service system. Training, validation and test sets are created using an 80/10/10 random split of these tickets.

Baselines

We consider the following different methods for the investigative query recommendation task.

  • Issue-Wise Frequency. Query candidates can simply be ranked based on their overall frequencies in the training tickets with the same pre-assigned ticket issue. This straightforward approach requires no additional training effort, and the recommendation outcomes remain static throughout the conversation.

  • Short-Term Context Models. We remove the RNN architecture from the proposed model and consider only the messages in the preceding conversation round as short-term contexts. Specifically, we consider the following two variations.

    • Base Version. We concatenate the agent’s and the customer’s dialogue turn embeddings and apply a linear model on top of them.

    • Include ticket issue. We consider the pre-assigned ticket issue as an additional feature and concatenate it with short-term dialogue turn embeddings. Similarly, a linear model is applied on these features.

  • Long-Term Context Models. We now consider different variations of the proposed RNN architecture:

    • Base Version. Only dialogue turn embeddings are included in the model.

    • Include Issue in the Input Layer. The pre-assigned ticket issue is concatenated with dialogue turn embeddings in the input layer.

    • Include Issue in the Output Layer. The pre-assigned ticket issue is concatenated with hidden states in the output layer.

    • Include Issue in the Input and Output Layers. The ticket issue is included in both the input layer (concatenated with the dialogue turn embeddings) and the output layer (concatenated with the hidden states).

By comparing short-term context models and long-term context models, we evaluate if capturing long-term problem contexts helps with recommending investigative queries. By comparing different variations of each method, we evaluate if incorporating pre-assigned ticket issues is useful in terms of query recommendation performance.

Evaluation Metrics

We consider Recall@Top3, the coverage of the agent's real queries by the suggested top three queries, as the quantitative evaluation metric. Suppose $\hat{Y}_r$ denotes the set of top three predicted candidates in round $r$ and $Y_r$ denotes the set of candidates covered by the agent's messages in that round. Then our evaluation metric can be defined as

$\text{Recall@Top3}(r) = \frac{|\hat{Y}_r \cap Y_r|}{|Y_r|}.$

We evaluate the recommendation performance on the conversation rounds where at least one query candidate is included in the agents' messages. Two averages of this metric are reported for each method: 1) the average of Recall@Top3 across all valid conversation rounds (Recall-r); and 2) for each ticket conversation, we first calculate the average across all valid rounds, and then average across all tickets (Recall-t).
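For concreteness, here is a small sketch of how Recall@Top3 and its two averages could be computed, assuming per-round pairs of predicted and true candidate ids grouped by ticket (names and data layout are assumptions):

```python
def recall_at_top3(predicted_top3, true_candidates):
    """Fraction of the agent's real candidate questions covered by the top-3 suggestions."""
    if not true_candidates:
        return None                              # round excluded from evaluation
    hits = len(set(predicted_top3) & set(true_candidates))
    return hits / len(true_candidates)


def aggregate(tickets):
    """tickets: list of conversations; each is a list of (top3_ids, true_ids) per round."""
    round_scores, ticket_scores = [], []
    for rounds in tickets:
        scores = [s for s in (recall_at_top3(p, t) for p, t in rounds) if s is not None]
        if scores:
            round_scores.extend(scores)                      # pooled for Recall-r
            ticket_scores.append(sum(scores) / len(scores))  # per-ticket mean for Recall-t
    return (sum(round_scores) / len(round_scores),
            sum(ticket_scores) / len(ticket_scores))
```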

Quantitative Results

Method                          Recall-r        Recall-t
Issue-Wise Frequency            0.627           0.631
Short-Term Linear
  base                          0.646 (0.001)   0.652 (0.002)
  + issue                       0.648 (0.001)   0.654 (0.001)
Long-Term LSTM
  base                          0.689 (0.001)   0.690 (0.002)
  + issue (in. layer)           0.688 (0.001)   0.689 (0.001)
  + issue (out. layer)          0.688 (0.002)   0.690 (0.002)
  + issue (in. + out. layers)   0.686 (0.002)   0.687 (0.002)

Table 3: Investigative question recommendation results from different methods. Associated standard errors are included in parentheses. The best results in each column are achieved by the long-term LSTM variants.

We run each method 10 times with randomized initializations and report performance of the above models on the investigative question recommendation task. Results are provided in Table 3.

From this table we observe that long-term context models significantly outperform short-term context models, which indicates that incorporating long-term problem contexts does benefit the performance of recommending investigative responses to agents. In addition, we find that incorporating the pre-assigned ticket issue as an additional feature provides a performance boost for the short-term context model. However, such improvements are not significant for the long-term LSTM-based model. A possible reason is that the ticket issue information is already implicitly captured in the long-term conversation contexts, so explicitly modeling these signals as additional information yields no significant gain. This is actually a desirable property in real-world applications, as pre-assigned ticket issues can be noisy (customers and agents may choose the wrong issue when initializing tickets) or unavailable at the beginning of a conversation. Fortunately, the long-term context model provides an option to bypass these noisy signals and utilize the conversation messages directly without sacrificing recommendation performance.

Qualitative Results

In Table 2, the first five rounds of one conversation are shown. The customer was having trouble figuring out how to make a payment. From the recommendations for round 3, we can see that the model mistakenly recommended investigative questions related to changing the payment method. As the conversation progressed, the model was able to capture the correct context and started to recommend debugging-related investigative questions (e.g., "Are you on the app or website?"). This demonstrates the model's effectiveness in capturing both short-term and long-term contexts as the conversation progresses, and in correcting earlier mistakes as more information becomes available.

Conclusion and Future Work

In this work, we studied an end-to-end investigative question suggestion system for live-chat customer service agents. We described the pipeline for investigative question candidate generation and proposed a proactive response recommendation model with long-term context memory.

We leave several online serving challenges as open questions. For example, given that most machine learning infrastructures assume models are stateless, where and how to cache the hidden states of the RNN model could be challenging. Notice also that in our problem setting, multiple messages in the same dialogue turn are concatenated during offline training; however, end-of-turn signals are not always available in real time, so how to address this mismatch during online serving is also worth investigating.

Acknowledgement

The authors would like to thank Yashar Mehdad and Negin Nejati for helpful discussions and insights, as well as Lisa Qian for her guidance and support.

The authors would also like to thank members of the Airbnb Agent Platform Design, Engineering, Product, and Content teams for making this work possible: Tyler Townley, Colleen Purdy, Jen Wardwell, Adrien Cahen, Ted Hadjisavas, Virginia Vickery, and many others.

References

  • [Cui et al.2017] Cui, L.; Huang, S.; Wei, F.; Tan, C.; Duan, C.; and Zhou, M. 2017. Superagent: a customer service chatbot for e-commerce websites. Proceedings of ACL 2017, System Demonstrations 97–102.
  • [Henderson et al.2017] Henderson, M.; Al-Rfou, R.; Strope, B.; Sung, Y.-h.; Lukacs, L.; Guo, R.; Kumar, S.; Miklos, B.; and Kurzweil, R. 2017. Efficient natural language response suggestion for smart reply. arXiv preprint arXiv:1705.00652.
  • [Kannan et al.2016] Kannan, A.; Kurach, K.; Ravi, S.; Kaufmann, T.; Tomkins, A.; Miklos, B.; Corrado, G.; Lukacs, L.; Ganea, M.; Young, P.; et al. 2016. Smart reply: Automated response suggestion for email. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 955–964. ACM.
  • [Kingma and Ba2014] Kingma, D. P., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • [Li et al.2016a] Li, J.; Galley, M.; Brockett, C.; Gao, J.; and Dolan, B. 2016a. A diversity-promoting objective function for neural conversation models. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 110–119.
  • [Li et al.2016b] Li, J.; Monroe, W.; Ritter, A.; Jurafsky, D.; Galley, M.; and Gao, J. 2016b. Deep reinforcement learning for dialogue generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 1192–1202.
  • [Mikolov et al.2013a] Mikolov, T.; Chen, K.; Corrado, G.; and Dean, J. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
  • [Mikolov et al.2013b] Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; and Dean, J. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, 3111–3119.
  • [Oriol Vinyals2015] Vinyals, O., and Le, Q. 2015. A neural conversation model. ICML Deep Learning Workshop 2015.
  • [Pasternack et al.2017] Pasternack, J.; Chakravarthi, N.; Leon, A.; Rajashekar, N.; Tiwana, B.; and Zhao, B. 2017. Building smart replies for member messages. https://engineering.linkedin.com/blog/2017/10/building-smart-replies-for-member-messages. Accessed: 2018-10-30.
  • [Serban et al.2016] Serban, I. V.; Sordoni, A.; Bengio, Y.; Courville, A. C.; and Pineau, J. 2016. Building end-to-end dialogue systems using generative hierarchical neural network models. In AAAI, volume 16, 3776–3784.
  • [Shang, Lu, and Li2015] Shang, L.; Lu, Z.; and Li, H. 2015. Neural responding machine for short-text conversation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), volume 1, 1577–1586.
  • [Sordoni et al.2015] Sordoni, A.; Galley, M.; Auli, M.; Brockett, C.; Ji, Y.; Mitchell, M.; Nie, J.-Y.; Gao, J.; and Dolan, B. 2015. A neural network approach to context-sensitive generation of conversational responses. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 196–205.
  • [Xu et al.2017] Xu, A.; Liu, Z.; Guo, Y.; Sinha, V.; and Akkiraju, R. 2017. A new chatbot for customer service on social media. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 3506–3510. ACM.