You Impress Me: Dialogue Generation via Mutual Persona Perception

04/11/2020 ∙ by Qian Liu, et al. ∙ Beihang University Microsoft UCL 6

Despite the continuing efforts to improve the engagingness and consistency of chit-chat dialogue systems, the majority of current work simply focuses on mimicking human-like responses, leaving understudied the aspects of modeling understanding between interlocutors. The research in cognitive science, instead, suggests that understanding is an essential signal for a high-quality chit-chat conversation. Motivated by this, we propose P^2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding. Specifically, P^2 Bot incorporates mutual persona perception to enhance the quality of personalized dialogue generation. Experiments on a large public dataset, Persona-Chat, demonstrate the effectiveness of our approach, with a considerable boost over the state-of-the-art baselines across both automatic metrics and human evaluations.




1 Introduction

Thanks to advances in neural models and the accessibility of massive datasets, open-domain dialogue (i.e. chit-chat) systems have made great progress towards mimicking human-like responses. Nevertheless, there still exist some serious challenges in building personalized chatbots that can deliver engaging conversations and gain user trust Song et al. (2019). For example, current chit-chat systems tend to generate uninformative responses Li et al. (2016). Moreover, they usually lack coherent personality traits because the training dialogues come from a diverse set of speakers Zhang et al. (2018).

Several attempts have been made to alleviate the above issues. Methods like special reward shaping to reduce generic responses Li et al. (2016) and representing the speakers with latent variables Li et al. (2016) were introduced to improve the engagingness of chit-chat systems. A more straightforward approach, which equips chit-chat systems with predefined personas, was proposed accompanied by a novel dataset, Persona-Chat Zhang et al. (2018). Figure 1 shows a clipped dialogue from Persona-Chat. Two interlocutors meet for the first time and are having a conversation in order to get to know each other. What makes Persona-Chat unique is that personas of both interlocutors are explicitly described using several profile sentences, facilitating the training of chatbots with configurable and persistent personalities.

Figure 1: A clipped dialogue from Persona-Chat.
Figure 2: The overview of P^2 Bot (see text).

Persona-Chat has fueled a growing interest in developing methods for personalized dialogue generation. Mazaré et al. (2018) incorporated additional data from Reddit to train the model. Wolf et al. (2019b) fine-tuned a pretrained language model Radford et al. (2018) to improve dialogue generation. Although both works demonstrate promising results, they focus more on mimicking the style of human-like responses, leaving understudied the aspects of explicitly modeling understanding between interlocutors. Our work, instead, takes the perspective of understanding modeling.

According to the research in cognitive science, effective communication creates similar activation maps in the brains of both interlocutors Hasson et al. (2012), suggesting that understanding between interlocutors is an essential signal for a high-quality chit-chat conversation. For instance, in the conversation shown in Figure 1, the two interlocutors foster understanding either by raising persona-related topics, “Seen any good movies lately?”, or by revealing their own personas through answering questions, “I don’t watch movies more of a writer.”. The efforts to build understanding keep the conversation flowing.

Taking into account the above, we propose Persona Perception Bot (P^2 Bot), explicitly modeling the understanding between interlocutors with a transmitter-receiver framework. Distinguished from traditional methods, P^2 Bot highlights a novel concept, mutual persona perception, which is better suited to describe the information exchange process that empowers the interlocutors to get to know each other. In order to train P^2 Bot for personalized dialogue generation, we employ supervised training and self-play fine-tuning piloted by reward signals characterizing mutual persona perception. Experiments on the Persona-Chat dataset demonstrate the superiority of our approach over the baselines in both automatic metrics and human evaluations.[1]

[1] Our code is available at

2 Methodology Overview

The central idea of P^2 Bot is to explicitly model understanding between interlocutors and enhance dialogue generation via mutual persona perception. It comprises two components, Transmitter and Receiver, respectively responsible for dialogue generation and mutual persona perception. Figure 2 gives an overview of P^2 Bot: interlocutor $A$ has a persona $w^A$, described with $L$ profile sentences $\{w^A_1, \dots, w^A_L\}$. When she first meets the other interlocutor $B$, they get to know each other through an $N$-turn dialogue $(x^A_1, x^B_1, \dots, x^A_N, x^B_N)$, where $x^A_n$ denotes the utterance that $A$ says in the $n$-th turn and $N$ denotes the total number of turns. Given the entire dialogue history up to the $n$-th turn, $h^A_n = (x^A_1, \dots, x^B_{n-1})$, Transmitter generates $x^A_n$ according to the distribution $p(x^A_n \mid w^A, h^A_n)$, and transmits it to $B$. The same process applies to $B$, keeping the conversation flowing.

As the conversation goes on, impressions are gradually built via utterances. For example, when $A$ says “I don’t watch movies more of a writer.”, the impression that “$A$ is a writer.” is left on $B$’s mind. As mentioned above, a successful conversation helps interlocutors know each other, which means $B$’s impression of $A$ should correspond to $A$’s persona and vice versa. Receiver aims to measure the proximity between the built impressions and the actual personas. Specifically, as demonstrated by the dashed black lines in Figure 2, Receiver first projects impressions and personas into a latent space, and then measures the relevance between them based on the impression encoding (e.g. $H^A$, $B$’s impression on $A$, projected from $A$’s utterances $x^A$), and persona encoding (e.g. $W^A$, projected from $A$’s persona $w^A$).[2] The relevance scores serve as mutual persona perception rewards, and are further incorporated into the training of Transmitter. Details of the two components are presented in Sections 3 and 4.

[2] We take $A$ as an example, and all are similar to $B$.

3 Transmitter

Following previous work Li et al. (2016); Zhang et al. (2018), we treat dialogue generation as a sequence generation problem. Concretely, we employ the pretrained transformer language model introduced in Radford et al. (2018) (i.e. GPT) to initialize Transmitter. The entire training procedure consists of two steps: (1) Supervised Dialogue Generation. We optimize Transmitter via maximum likelihood estimation (MLE) on the supervised dialogue generation task. (2) Self-play Model Fine-tuning. We simulate dialogues between two randomly paired interlocutors, encouraging Transmitter to learn a policy that maximizes reward signals via reinforcement learning (RL) Sutton et al. (1999). The design of the reward function considers both language modeling and our proposed mutual persona perception.

3.1 Supervised Dialogue Generation

Figure 3: The overall architecture of Transmitter. “Block” is short for “Transformer Block”. Arrows bridge the current block to subsequent blocks of its following layer. Position encoding incorporates position information into each block by assigning an embedding to each absolute position in the sequence. Here we omit the architecture inside the block, and refer readers to Vaswani et al. (2017) for more details. [MASK] tokens are ignored in the training objective.

As illustrated in Figure 3, Transmitter follows the overall architecture of 12 stacked transformer layers to encode context and generate response. Here, the context contains the persona $w^A$, the dialogue history $h^A_n$, and several special tokens (e.g. [PS] which indicates the start of persona). Given a training instance $(w^A, h^A_n, x^A_n)$, the training objective of MLE is to maximize the conditional log-likelihood as:

$$\mathcal{L}_{mle} = \sum_{t} \log p_\theta(x^A_{n,t} \mid w^A, h^A_n, x^A_{n,<t}) \quad (1)$$

where $\theta$ is the parameter of Transmitter. $x^A_{n,t}$ means the $t$-th token in $x^A_n$, and $x^A_{n,<t}$ indicates the token sequence before the $t$-th token. Equation 1, hereafter simplified as $\log p_\theta(x^A_n \mid w^A, h^A_n)$, applies to both $A$ and $B$, and we mention $A$ for the sake of brevity (the same as below).

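During inference, beam search is applied to store top-ranked response candidates $\{\hat{x}^A_n\}$, and Transmitter subsequently chooses as prediction the one that maximizes the length-normalized score:

$$x^{A*}_n = \arg\max_{\hat{x}^A_n} \frac{\log p_\theta(\hat{x}^A_n \mid w^A, h^A_n)}{|\hat{x}^A_n|} \quad (2)$$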

Besides the sequence generation task, inspired by Wolf et al. (2019b), we set up an auxiliary task, Next Utterance Prediction. Apart from training Transmitter to generate responses, we also train it to discriminate whether the response is the next utterance of the given context. Concretely, we append a special token [CLS] to the tail of the generated tokens. A classifier is built on top of the token’s hidden state in the last transformer layer, as indicated by the red rounded rectangle in Figure 3. In training, for each response, we randomly sample a distractor and train the classifier to give a higher score on the response than the distractor. In inference, the classifier is used to rank response candidates together with Equation 2. Denoting $y_n = 1$ as the signal indicating the generated response $\hat{x}^A_n$ is predicted as the next utterance, Equation 2 is extended as:

$$x^{A*}_n = \arg\max_{\hat{x}^A_n} \left( \alpha \cdot \frac{\log p_\theta(\hat{x}^A_n \mid w^A, h^A_n)}{|\hat{x}^A_n|} + (1 - \alpha) \cdot \log p_\theta(y_n = 1 \mid w^A, h^A_n, \hat{x}^A_n) \right) \quad (3)$$

where $\alpha$ is a hyper-parameter.
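The candidate ranking step can be illustrated with a minimal Python sketch. It assumes each beam candidate already comes with its per-token log-probabilities and a next-utterance log-probability from the classifier, and that the two scores are mixed as a convex combination with weight `alpha`. The function names and the tuple layout are hypothetical, chosen only for this illustration.

```python
from typing import List, Tuple

# (text, per-token log-probs, log-prob of being the next utterance)
Candidate = Tuple[str, List[float], float]


def length_normalized_score(token_log_probs: List[float]) -> float:
    """Sum of token log-probabilities divided by the number of tokens."""
    return sum(token_log_probs) / len(token_log_probs)


def rerank(candidates: List[Candidate], alpha: float) -> str:
    """Return the candidate text with the highest combined score.

    Assumption of this sketch: the language-model score and the
    next-utterance score are mixed as alpha * lm + (1 - alpha) * next.
    """
    def combined(candidate: Candidate) -> float:
        _, token_log_probs, next_log_prob = candidate
        lm_score = length_normalized_score(token_log_probs)
        return alpha * lm_score + (1.0 - alpha) * next_log_prob

    return max(candidates, key=combined)[0]
```

With `alpha` close to 1 the ranking falls back to the pure length-normalized likelihood, while smaller values let the next-utterance classifier dominate.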

3.2 Self-play Model Fine-tuning

Although supervised dialogue generation alone can be used to mimic human-like responses, it does not inherently target understanding. Therefore, we further fine-tune Transmitter using reinforcement learning with the goal of maximizing mutual persona perception. Analogous to Lewis et al. (2017), we apply self-play to simulate the communication between two Transmitters, both of which have been trained as described in Section 3.1.

Specifically, we have the two Transmitters communicate with each other for several turns. One Transmitter serves as a user with the parameters frozen, while the other is a learnable agent. The parameter of the learnable agent, $\theta$, is fine-tuned during the self-play. Without loss of generality, in our experiments, we let interlocutor $A$, who starts a conversation, be the user, and correspondingly $B$ be the learnable agent.

Figure 4: The illustration of the self-play procedure. Arrows represent the process of dialogue generation driven by Transmitter. Note that $x^A_1$ is directly taken from the dataset as it is difficult to generate high-quality utterances without any dialogue history.

Here we introduce some necessary formulations for modeling our problem with reinforcement learning. A state contains the persona and the dialogue history. For example, the state for $B$ at turn $n$ is defined as $s^B_n = \{w^B, h^B_n\}$. An action $a^B_n$ is the response to be generated. The action space is infinitely large as the response can be arbitrarily long. Taking $s^B_n$ as input, the parameter $\theta$ defines a policy $\pi_\theta(a^B_n \mid s^B_n)$, through which the learnable agent generates its response.

As illustrated in Figure 4, when it is $B$’s turn to speak, $B$ receives $s^B_n$ and picks $a^B_n$ according to the policy $\pi_\theta$. As for $A$, it receives $s^A_n$ and generates the response $x^A_n$ to simulate a user. $A$ and $B$ alternately produce responses until the number of turns exceeds the given limit. Once a complete dialogue is generated, the reward is collected to optimize $\theta$ using policy gradient Sutton et al. (1999). Denoting $R(a^B_n)$ as the reward $B$ gets at turn $n$ (more details are provided later), we can optimize it by maximizing the following objective:

$$\mathcal{L}_{rl} = \mathbb{E}_{a^B_n \sim \pi_\theta(a^B_n \mid s^B_n)} \left[ R(a^B_n) \right] \quad (4)$$

Applying the likelihood ratio trick, $\theta$ is updated by ascending the following gradient:

$$\nabla_\theta \mathcal{L}_{rl} = \mathbb{E}_{a^B_n \sim \pi_\theta(a^B_n \mid s^B_n)} \left[ \nabla_\theta \log \pi_\theta(a^B_n \mid s^B_n) \, R(a^B_n) \right] \quad (5)$$

As aforementioned, the space of action is infinite. In practice, the REINFORCE algorithm Williams (1992) is leveraged to approximate Equation 5 by sampling $a^B_n$ from policy $\pi_\theta$. Furthermore, subtracting a baseline Weaver and Tao (2001), here the mean reward of a mini-batch, is applied on $R(a^B_n)$ to reduce variance. The agent samples tokens one by one through multinomial sampling over the output distribution of $B$, until the special token [EOS] is sampled or the maximum allowed decoding step (e.g. 32) is exceeded. Compared to beam search sampling, multinomial sampling provides more diversity.
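The sampling and variance-reduction steps above can be sketched in plain Python. This is an illustrative outline only: `step_probs_fn` is a hypothetical stand-in for the policy's next-token distribution, and `surrogate_loss` is the usual scalar whose gradient (under an autodiff framework) gives the REINFORCE estimate with a mean-reward baseline. Neither name comes from the paper.

```python
import random
from typing import Callable, List, Optional


def sample_response(step_probs_fn: Callable[[List[int]], List[float]],
                    eos_id: int,
                    max_steps: int = 32,
                    rng: Optional[random.Random] = None) -> List[int]:
    """Multinomial sampling, one token at a time, until [EOS] or max_steps."""
    rng = rng or random.Random(0)
    tokens: List[int] = []
    for _ in range(max_steps):
        probs = step_probs_fn(tokens)  # distribution over the vocabulary
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        tokens.append(token)
        if token == eos_id:
            break
    return tokens


def baseline_adjusted(rewards: List[float]) -> List[float]:
    """Subtract the mini-batch mean reward from every reward."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


def surrogate_loss(log_probs: List[float], rewards: List[float]) -> float:
    """Negative mean of (R - baseline) * log pi(a | s) over the mini-batch."""
    advantages = baseline_adjusted(rewards)
    weighted = sum(a * lp for a, lp in zip(advantages, log_probs))
    return -weighted / len(rewards)
```

Minimizing `surrogate_loss` with respect to the policy parameters is equivalent to ascending the sampled policy gradient, with the baseline leaving the expected gradient unchanged.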

3.3 Reward Shaping (RS)

As described in Section 1, we believe that a high-quality chit-chat conversation should highlight both human language modeling and mutual persona perception. Bearing this in mind, we design three rewards to address language style, discourse coherence and mutual persona perception respectively.

RS.1 Language Style

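The generated responses should conform to human language styles, which we believe can be evaluated by a pretrained language model (i.e. GPT). After length normalization, the score for $a^B_n$ is given as:

$$R_1(a^B_n) = \frac{1}{|a^B_n|} \sum_{t} \log p_{lm}(a^B_{n,t} \mid a^B_{n,<t}) \quad (6)$$

where $a^B_{n,t}$ and $a^B_{n,<t}$ have similar denotation as the previously mentioned $x^A_{n,t}$ and $x^A_{n,<t}$.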

RS.2 Discourse Coherence

The language score is evaluated individually, without considering the discourse coherence. However, a reasonable response should establish links in meaning with context, which is also an important aspect of human-like responses. To take into account the discourse coherence, we employ the well-trained Next Utterance Predictor (mentioned in Section 3.1). The reward is given by the log probability of $a^B_n$ being the next utterance of $h^B_n$:

$$R_2(a^B_n) = \log p_\theta(y_n = 1 \mid a^B_n, h^B_n) \quad (7)$$

RS.3 Mutual Persona Perception

RS.1 and RS.2 only steer the agent training process towards human-like responding. They do not explicitly encourage understanding between interlocutors. Therefore, we carefully design the reward to characterize mutual persona perception. In contrast to RS.1 and RS.2, mutual persona perception is a long-term goal throughout the whole dialogue, meaning that the effect of the current action might only play out some time later. For instance, receiving “what are your hobbies?” from $B$, it is highly likely that $A$’s response is relevant to $A$’s hobbies. This suggests that not only $A$’s response but also $B$’s initial question contributes to mutual persona perception. Denoting $\gamma$ as the discount factor indicating how far ahead $B$ looks, the reward of mutual persona perception for $a^B_n$ is defined as:

$$R_3(a^B_n) = r(a^B_n) + \sum_{k=n+1}^{N} \left( \gamma^{2(k-n)-1} r(a^A_k) + \gamma^{2(k-n)} r(a^B_k) \right) \quad (8)$$

where $r(a^B_n)$ is the persona perception score that $B$ obtains in the $n$-th turn, and $r(a^A_k)$ is defined likewise. $r(a^B_n)$ can be computed using a score function:

$$r(a^B_n) = \mathrm{score}(a^B_n, w^B) \quad (9)$$

In P^2 Bot, the score function comes from Receiver, which will be elaborated in Section 4. The final reward $R(a^B_n)$ for $a^B_n$ is a weighted sum of the rewards listed above:

$$R = \lambda_1 R_1 + \lambda_2 R_2 + \lambda_3 R_3 \quad (10)$$

where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are hyper-parameters.
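To make the long-term nature of the mutual persona perception reward concrete, the following Python sketch accumulates discounted persona perception scores over the rest of a dialogue and then forms the weighted sum of the three rewards. The discounting convention (each later utterance is discounted by one more power of the discount factor, with the user speaking before the agent within a turn) is an assumption of this sketch, as are the function names and zero-based turn indices.

```python
from typing import List, Sequence


def persona_perception_return(scores_agent: List[float],
                              scores_user: List[float],
                              turn: int,
                              gamma: float) -> float:
    """Discounted persona perception reward for the agent's action at `turn`.

    scores_agent[k] and scores_user[k] are the per-utterance perception
    scores of the agent and the user at turn k (0-indexed). Assumption:
    the user's utterance at a later turn k is 2*(k - turn) - 1 steps
    ahead of the action, and the agent's utterance is 2*(k - turn) steps.
    """
    total = scores_agent[turn]
    for k in range(turn + 1, len(scores_agent)):
        total += gamma ** (2 * (k - turn) - 1) * scores_user[k]
        total += gamma ** (2 * (k - turn)) * scores_agent[k]
    return total


def final_reward(r_style: float, r_coherence: float, r_perception: float,
                 weights: Sequence[float]) -> float:
    """Weighted sum of the language style, coherence and perception rewards."""
    w1, w2, w3 = weights
    return w1 * r_style + w2 * r_coherence + w3 * r_perception
```

For the last turn of a dialogue the return reduces to the immediate score, since no later utterances remain to be discounted.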

4 Receiver

Receiver is devised to measure the proximity between the built impressions and the actual personas, implemented by negative sampling. Specifically, in training, we randomly sample a persona distractor $w^Z$. Receiver is trained to identify the real persona $w^A$ from $\{w^A, w^Z\}$. In inference, for each utterance, Receiver is responsible for providing a reasonable relevance score, to model our proposed mutual persona perception. The score subsequently joins the self-play fine-tuning on Transmitter as part of the rewards, as in Equation 8.

Figure 5: The overall architecture of Receiver (see text).

4.1 Training

As illustrated in Figure 5, Receiver contains two different encoders for impression and persona respectively. Initialized by BERT Devlin et al. (2019), both encoders provide deep contextualized representations for each token. Then we average all the representations, yielding a fixed $d$-dimensional vector for one sentence. In this way, feeding the utterances $x^A_n$ into the impression encoder consecutively, we obtain the impression encoding $H^A \in \mathbb{R}^{N \times d}$. The persona encoding $W^A \in \mathbb{R}^{L \times d}$ is produced likewise. The relevance score matrix $U^A$ is computed via the scaled dot product Vaswani et al. (2017):

$$U^A = \frac{H^A (W^A)^\top}{\sqrt{d}} \quad (11)$$
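The mean pooling and scaled dot product steps can be sketched with plain Python lists. In this sketch the token vectors are assumed to be given (in the paper they come from BERT-initialized encoders); `mean_pool` and `relevance_matrix` are hypothetical helper names used only for illustration.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(token_vectors: List[Vector]) -> Vector:
    """Average token representations into one fixed-size sentence vector."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def relevance_matrix(impressions: List[Vector],
                     personas: List[Vector]) -> List[List[float]]:
    """Scaled dot product between every utterance and every profile sentence.

    Row i, column j holds dot(impressions[i], personas[j]) / sqrt(d).
    """
    dim = len(impressions[0])
    scale = math.sqrt(dim)
    return [
        [sum(h * w for h, w in zip(h_row, w_row)) / scale for w_row in personas]
        for h_row in impressions
    ]
```

Each row of the resulting matrix describes how strongly one utterance relates to each profile sentence, which is the quantity visualized later in Figure 6.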

Category Model Original Revised
Hits@1(%)    ppl    F1(%)  Hits@1(%)    ppl    F1(%) 
Retrieval KV Profile Memory     -     -
Dually Interactive Matching     -     -     -     -
Generative Generative Profile Memory    
Language Model     -     -
Lost In Conversation     -     -
Transfertransfo     -     -     -
(Our) [0.1] [0.16] [0.08] [0.2] [0.11] [0.07]
Table 1: Automatic evaluation results of different methods on the Persona-Chat dataset. The standard deviation of P^2 Bot (shown in square brackets, across 5 runs) is also reported. All the results were evaluated on the dev set since the test set was not publicly available.

In essence, Receiver is expected to capture fine-grained correlations between the persona and the dialogue. However, we do not have access to the golden fine-grained correlations. The only thing we know is that, compared with $w^Z$, $w^A$ is more correlated to $x^A$. Since the comparison is at a coarse granularity, we gather $U^A$ into the cumulative score $S^A$ through an aggregate function $\mathrm{Agg}$, as shown in Figure 5. To encourage $S^A$ while at the same time depressing $S^Z$, we design a marginal loss $\mathcal{L}_{mgn}$, which makes $S^A$ larger than $S^Z$ by a margin $m$. Moreover, considering that an utterance generally relates to zero or one profile, $L_1$ regularization is enforced to make $U^A$ sparse. Combining all of these, the training loss for Receiver is:

$$\mathcal{L}_{rec} = \underbrace{\max(0,\; m + S^Z - S^A)}_{\mathcal{L}_{mgn}} + \beta \cdot |U^A|_1 \quad (12)$$

where $\beta$ is a hyper-parameter for penalty.
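A minimal sketch of a margin loss with a sparsity penalty is shown below. It assumes the cumulative scores for the real persona and the distractor have already been computed, and it uses a standard hinge form for the margin term; the function name and argument names are illustrative, not taken from the released code.

```python
from typing import List


def receiver_loss(score_real: float,
                  score_distractor: float,
                  relevance_real: List[List[float]],
                  margin: float,
                  beta: float) -> float:
    """Hinge-style margin loss plus an L1 penalty on the relevance matrix.

    The margin term is zero once the real persona's cumulative score
    exceeds the distractor's score by at least `margin`. The L1 term
    pushes individual relevance entries towards zero (sparsity).
    """
    margin_term = max(0.0, margin + score_distractor - score_real)
    l1_penalty = sum(abs(u) for row in relevance_real for u in row)
    return margin_term + beta * l1_penalty
```

When the real persona already wins by more than the margin, only the sparsity penalty contributes to the loss.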

As for $\mathrm{Agg}$, one straightforward way is to average over all positions of $U^A$. However, it maximizes every entry in $U^A$, including all those that should not be activated (e.g. relevance scores between unrelated profile sentences and utterances), introducing unnecessary noise into the training of Transmitter. To alleviate the problem, we choose to implement $\mathrm{Agg}$ as a controllable weighted function, which summarizes $U^A$ as:


where temperature $\tau$ is a tunable parameter Hinton et al. (2015) controlling the evolution of $\mathrm{Agg}$. In the beginning, $\mathrm{Agg}$ behaves close to average pooling. As $\tau$ anneals, $\mathrm{Agg}$ gradually focuses more on the highest relevance score. In this way, noise reduces as training goes on. Finally, $S^A$ is given by:


4.2 Inference

Given $x^A_n$ and $w^A$, Receiver employs the following function to obtain $x^A_n$’s persona perception score, further modeling mutual persona perception as in Equation 9:


where $H^A_n$ and $W^A$ are the impression encoding and persona encoding for $x^A_n$ and $w^A$ respectively.
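The temperature-controlled aggregation described in Section 4.1 can be illustrated with one possible realization: a softmax-weighted sum over each row of the relevance matrix, averaged over utterances. This particular form is an assumption chosen because it matches the stated behaviour (close to average pooling at high temperature, close to the maximum at low temperature); the exact function used in the paper may differ.

```python
import math
from typing import List


def softmax_weighted_row(row: List[float], tau: float) -> float:
    """Softmax(row / tau)-weighted sum of one row of relevance scores."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp((u - peak) / tau) for u in row]
    normalizer = sum(exps)
    return sum((e / normalizer) * u for e, u in zip(exps, row))


def aggregate(relevance: List[List[float]], tau: float) -> float:
    """Average the softmax-weighted row summaries over all utterances."""
    return sum(softmax_weighted_row(row, tau) for row in relevance) / len(relevance)
```

With a large `tau` the weights are nearly uniform, so each row is summarized by roughly its mean; with a small `tau` nearly all weight goes to the largest entry.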

5 Experiment

We conducted experiments on the Persona-Chat dataset, assessing P^2 Bot using both automatic metrics and human evaluations. To verify the effectiveness of our proposed mutual persona perception, we perform a thorough model analysis in Section 5.3. Finally, we probe Receiver’s capability of perceiving persona in Section 5.4.

5.1 Implementation Details

The Persona-Chat dataset contains 8,939 / 1,000 multi-turn dialogues conditioned on 1,155 / 100 personas for train / dev. Each persona is described with at least 5 profile sentences. To make it more challenging, Persona-Chat also provides revised personas by rephrasing, generalizing or specializing the original ones. For example, “I am overweight.” is revised from “I weight 300 pounds.”.

Our implementation was based on PyTorch Paszke et al. (2019), ParlAI Miller et al. (2017), and HuggingFace’s transformers library Wolf et al. (2019a). We used the Adam Kingma and Ba (2015) optimizer with a learning rate of 6.25e-5 for both Receiver and Transmitter in supervised learning. In the training of Receiver, $\tau$ was reduced linearly from 10 to 0.5. In the self-play phase of Transmitter, the learning rate was set as 1e-6. The hyper-parameters $m$, $\alpha$, $\beta$, $\gamma$, $\lambda_1$, $\lambda_2$ and $\lambda_3$ were set as 0.4, 0.1, 1e-4, 0.5, 0.4, 0.1 and 0.5 respectively. The supervised training of Transmitter lasted for 2 epochs, and the self-play fine-tuning comprised 2000 dialogues, where the number of turns was 3. The beam search size was set as 2.
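The linear reduction of the Receiver temperature from 10 to 0.5 can be sketched as a simple schedule. The step granularity (per update step versus per epoch) is not specified in the text, so `step` and `total_steps` below are assumptions of this illustration.

```python
def linear_temperature(step: int, total_steps: int,
                       start: float = 10.0, end: float = 0.5) -> float:
    """Linearly interpolate the temperature from `start` to `end`.

    `step` runs from 0 to total_steps - 1; values outside that range
    are clamped so the temperature never leaves [end, start].
    """
    if total_steps <= 1:
        return end
    clamped = min(max(step, 0), total_steps - 1)
    fraction = clamped / (total_steps - 1)
    return start + (end - start) * fraction
```

For example, with 11 steps the schedule starts at 10.0, passes 5.25 at the midpoint and ends at 0.5.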

5.2 Methods Comparison

Our baselines fall into three categories: retrieval-based, generative-based and pretrain-finetune-based models. Among the retrieval-based baselines, KV Profile Memory Zhang et al. (2018) was the official baseline which employed the memory network along with profile information, and Dually Interactive Matching Network Gu et al. (2019) proposed a dual matching architecture to match between the responses and their corresponding contexts. Language Model, Generative Profile Memory Zhang et al. (2018) and Seq2Seq with attention mechanism Bahdanau et al. (2015) were implemented as generative baselines for dialogue generation. The remaining methods were all pretrain-finetune-based. Transfertransfo Wolf et al. (2019b) achieved the state-of-the-art performance on automatic metrics, while Lost In Conversation topped the human evaluations Dinan et al. (2019). Analogous to our approach, they employed the pretrained language model GPT to initialize their models, and then fine-tuned it on the dataset.

Model 1 (%) 2 (%) 3 (%) 4 (%) Avg
Lost In Conversation   
Table 2: Human evaluation results.

Table 1 shows the experimental results on automatic metrics. Following Zhang et al. (2018), we reported the official automatic metrics to evaluate the methods: Hits@1, Perplexity (ppl) and F1. Given 20 response candidates, Hits@1 is the probability that the real response ranks the highest according to the model. Perplexity measures the negative log likelihood of the correct sequence output by the model, lower values indicating better performance. F1 is the harmonic mean of word-level precision and recall. As observed, our approach outperforms almost all baselines and achieves new state-of-the-art performance on ppl and F1, with highly competitive performance on Hits@1. In the revised mode, our approach still achieves the best performance, obtaining a relative improvement on F1 against the strongest baseline. It is worth noting that we also tried to employ F1 as the reward, but the result is far from satisfactory.
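The three automatic metrics can be sketched in a few lines of Python. These are simplified illustrations rather than the official evaluation scripts: tokenization is plain whitespace splitting, and perplexity is computed in the standard way as the exponential of the mean negative log-likelihood per token.

```python
import math
from collections import Counter
from typing import List


def hits_at_1(candidate_scores: List[float], gold_index: int) -> float:
    """1.0 if the gold response has the highest model score, else 0.0."""
    best = max(range(len(candidate_scores)), key=candidate_scores.__getitem__)
    return 1.0 if best == gold_index else 0.0


def word_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of word-level precision and recall."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def perplexity(token_log_probs: List[float]) -> float:
    """exp of the mean negative log-likelihood of the gold tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

Averaging `hits_at_1` and `word_f1` over all evaluation instances gives corpus-level figures of the kind reported in Table 1.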

As mentioned in Dinan et al. (2019), no automatic metric is perfect for evaluating such an open-domain task. Hence, we also performed crowd-sourced human evaluations on the state-of-the-art baselines (i.e. Transfertransfo & Lost In Conversation) and our proposed P^2 Bot. Concretely, on the original dev set, we randomly sampled 200 responses generated by these methods and asked each worker to rate them. The rating ranges from 1 to 4. 1 means the response is good only in terms of grammar and sentence structure; 2 means that, in addition to valid grammar, the response is also coherent with the context; 3 means the coherent response is also interesting and informative, instead of just a simple response like “Yes”; and 4 means the response is consistent with the persona of the interlocutor, which is of extreme importance for the task because it reflects whether the model can effectively utilize the persona information. As shown in Table 2, the results are consistent with the automatic evaluation results, demonstrating the superiority of P^2 Bot against the baselines. We also conducted Wilcoxon signed-rank tests between our method and the baselines, and the results show that the improvements are significant.

Variant Hits@1(%)  F1(%)  BLEU(%) 
 -  Persona  (- %)  (+   %)
 -  Next  (- %)  (-    %)
+ RS.1  (+%)  (+   %)
+ RS.2  (+%)  (+   %)
       + RS.3  (+%)  (+%)
Table 3: Variant analysis results on Persona-Chat revised mode, along with relative improvements (shown inside brackets) compared with P^2 Bot-S. BLEU refers to the cumulative 4-gram BLEU score. “- Persona” means dialogue generation without personas; “- Next” ablates the auxiliary task mentioned in Section 3.1; “+ RS.1” means only using Language Style score as the reward in the self-play fine-tuning phase; “+ RS.2” means adding Discourse Coherence to the reward on the basis of RS.1; “+ RS.3” is equivalent to our proposed P^2 Bot.

5.3 Model Analysis

Table 4: Sampled responses(*) by Human, P^2 Bot and the state-of-the-art baselines.

Variant Analysis

We conducted variant analysis on P^2 Bot to investigate the influence of RS.1, RS.2 and RS.3. Another metric, BLEU Papineni et al. (2002), which evaluates the quality of response, was introduced to make the analysis more comprehensive. We show the variant analysis results in Table 3, where P^2 Bot-S is the variant of P^2 Bot which is trained only in the supervised setting. As expected, the results on Hits@1 validate the important role of the auxiliary task. Across all the variants, the gains in BLEU and F1 are very small, revealing the difficulty in improving them. Nevertheless, solely by adding RS.3, we obtained a relative improvement on BLEU, indicating the effectiveness of our proposed mutual persona perception. Similar conclusions can be drawn from the trend of F1.

Case Study

For a more comprehensive comparison, we show in Table 4 some randomly sampled responses of different methods. The results suggest the responses generated by our approach are more human-like. As observed, benefiting from our proposed mutual persona perception, the responses of P^2 Bot are more consistent, engaging and informative. For instance, in the last example in Table 4, the response “I’m busy with my robot project” explains why the speaker does not exercise, while also revealing that he is working on the robot, as depicted in his persona.

Error Analysis

Though our approach works well in most cases, we observed that the self-play simulation might fall into repeated cycles after rounds of training, a challenge also noted by Li et al. (2016). Another issue is that the bots sometimes ask redundant questions in our approach, which might be due to inappropriate hyper-parameters in reward shaping.

5.4 Persona Perception Probing

Model Original Revised
Hits@1  MRR  Hits@1  MRR 
Receiver 93.8 37.5 78.2 16.6
Table 5: Experimental results on Persona Perception.
Figure 6: Visualization of the relevance scores between a sampled dialogue and its corresponding revised persona. Deeper color means higher score. We omit some context due to space limitation.

Receiver plays an important role in our approach, and we are interested in its capability of perceiving persona. Therefore, we conducted experiments on a synthesized dataset. We constructed the dataset by sampling 31 persona distractors for each dialogue in Persona-Chat. Two widely used ranking metrics were used to evaluate the performance: Hits@1 and Mean Reciprocal Rank (MRR). Hits@1 is the same metric as the one mentioned in Section 5.2, except that the candidate size is 32. Given a dialogue and the complete set of profile sentences, MRR is the average reciprocal rank of the dialogue-relevant profile sentences. Two simple baselines, Random and IR Sordoni et al. (2015), were chosen for comparison. Table 5 shows the experimental results of different methods on the synthesized dataset. As observed, our approach achieved excellent results on both original and revised modes. For example, compared with the IR baseline, our approach achieved an absolute improvement on Hits@1 in the original mode. In addition, the surprising results in the revised mode further demonstrate Receiver’s capability to perceive rephrased persona.
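The MRR computation described above can be sketched as follows. The function assumes one relevance score per profile sentence and a list of indices marking the dialogue-relevant sentences; both the name and the input layout are hypothetical and only serve this illustration.

```python
from typing import List


def mean_reciprocal_rank(scores: List[float], relevant: List[int]) -> float:
    """Average reciprocal rank of the relevant profile sentences.

    `scores[i]` is the model's relevance score for profile sentence i,
    and ranks start at 1 for the highest-scoring sentence.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    rank_of = {index: position + 1 for position, index in enumerate(order)}
    return sum(1.0 / rank_of[i] for i in relevant) / len(relevant)
```

A relevant sentence ranked first contributes 1, one ranked second contributes 0.5, and so on, so the metric rewards placing all relevant profile sentences near the top.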

To further understand the trained Receiver, we visualize the relevance scores between a sampled dialogue and its corresponding revised persona in Figure 6. As illustrated, the relevance scores between related profile sentences and dialogue utterances are significantly higher. For example, the utterance “I volunteer at the local pool” from the interlocutor implies the profile “I love being in the water”, and our Receiver successfully captures the relevance between them.

6 Related Work

Methods to build open-domain dialogue systems generally fall into two major categories: retrieval-based and generative-based. Retrieval-based methods retrieve response candidates and rank them based on the matching scores with the dialogue Sordoni et al. (2015); Wu et al. (2017); Gu et al. (2019). Generative-based methods typically use a Seq2Seq model as the backbone Sutskever et al. (2014); Bahdanau et al. (2015); Serban et al. (2017); Wolf et al. (2019b), where the encoder extracts the information in an utterance and the decoder generates the response. Our work adopts a similar architecture. Besides supervised learning, researchers also explore reinforcement learning based methods. Lewis et al. (2017) applied reinforcement learning for negotiation dialogues and showed it outperforms supervised learning when negotiating with humans. Yang et al. (2018) proposed to generate dialogue responses by dual learning based domain adaptation. Zhang et al. (2018) built a coherence model to provide the reward signal for penalizing dull responses. Liu et al. (2019) employed reinforcement learning to learn an intermediate structure span. Our approach differs from this line of work in that we focus on improving personalized dialogues via mutual persona perception, which has not been explored before.

More recently, under the topic of dialogue personalizing, Zemlyanskiy and Sha (2018) proposed a post-processing method to re-rank candidates generated by beam search, while Olabiyi et al. (2019) employed adversarial approaches to solve the consistency problem on interlocutors’ names. Madotto et al. (2019) applied meta-learning to quickly adapt to new speakers, and Tigunova et al. (2019) extracted user attributes from daily dialogues. Compared with them, our work enhances persona based dialogue generation from a novel perspective.

Furthermore, researchers have explored generating diverse responses conditioned on persona Song et al. (2019, 2020). Personalization in goal-oriented dialogue systems has also received some attention Joshi et al. (2017); Luo et al. (2019). This research focuses more on making goal-oriented bots adjust the response according to different user profiles, while we aim to endow bots with persistent personalities.

7 Conclusion & Future Work

We propose P^2 Bot, a transmitter-receiver framework which explicitly models understanding between interlocutors. Under this framework, mutual persona perception is incorporated as a reward signal to achieve personalized dialogue generation. Experiments on a large public dataset, Persona-Chat, demonstrate the effectiveness of our approach. For future work, we would like to extend Receiver to conversational recommender systems. After several turns of chatting, the agent should be able to infer the user’s persona, based on which personalized contents can be recommended.


We thank all the anonymous reviewers for their valuable comments. This work was supported in part by National Natural Science Foundation of China (U1736217 and 61932003), and National Key R&D Program of China (2019YFF0302902).


  • D. Bahdanau, K. Cho, and Y. Bengio (2015) Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, May 7-9, 2015, San Diego, CA, USA, Y. Bengio and Y. LeCun (Eds.), External Links: Link Cited by: §5.2, §6.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2019, Minneapolis, Minnesota. External Links: Link, Document Cited by: §4.1.
  • E. Dinan, V. Logacheva, V. Malykh, A. H. Miller, K. Shuster, J. Urbanek, D. Kiela, A. Szlam, I. Serban, R. Lowe, S. Prabhumoye, A. W. Black, A. I. Rudnicky, J. Williams, J. Pineau, M. Burtsev, and J. Weston (2019) The second conversational intelligence challenge (convai2). CoRR abs/1902.00098. External Links: Link, 1902.00098 Cited by: §5.2, §5.2.
  • J. Gu, Z. Ling, X. Zhu, and Q. Liu (2019) Dually interactive matching network for personalized response selection in retrieval-based chatbots. In

    Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019

    Hong Kong, China. External Links: Link, Document Cited by: §5.2, §6.
  • U. Hasson, A. A. Ghazanfar, B. Galantucci, S. Garrod, and C. Keysers (2012) Brain-to-brain coupling: a mechanism for creating and sharing a social world. Trends in cognitive sciences. External Links: Link Cited by: §1.
  • G. E. Hinton, O. Vinyals, and J. Dean (2015) Distilling the knowledge in a neural network. CoRR abs/1503.02531. External Links: Link, 1503.02531 Cited by: §4.1.
  • C. K. Joshi, F. Mi, and B. Faltings (2017) Personalization in goal-oriented dialog. CoRR abs/1706.07503. External Links: Link, 1706.07503 Cited by: §6.
  • D. P. Kingma and J. Ba (2015) Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, May 7-9, 2015, San Diego, CA, USA, Y. Bengio and Y. LeCun (Eds.), External Links: Link Cited by: §5.1.
  • M. Lewis, D. Yarats, Y. Dauphin, D. Parikh, and D. Batra (2017) Deal or no deal? end-to-end learning of negotiation dialogues. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark. External Links: Link, Document Cited by: §3.2, §6.
  • J. Li, M. Galley, C. Brockett, G. Spithourakis, J. Gao, and B. Dolan (2016) A persona-based neural conversation model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, Berlin, Germany. External Links: Link, Document Cited by: §1.
  • J. Li, W. Monroe, A. Ritter, D. Jurafsky, M. Galley, and J. Gao (2016) Deep reinforcement learning for dialogue generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas. External Links: Link, Document Cited by: §1, §1, §3, §5.3.
  • Q. Liu, B. Chen, H. Liu, J. Lou, L. Fang, B. Zhou, and D. Zhang (2019) A split-and-recombine approach for follow-up query analysis. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China. External Links: Link, Document Cited by: §6.
  • L. Luo, W. Huang, Q. Zeng, Z. Nie, and X. Sun (2019) Learning personalized end-to-end goal-oriented dialog. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, The 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, January 27 - February 1, 2019, Honolulu, Hawaii, USA, External Links: Link, Document Cited by: §6.
  • A. Madotto, Z. Lin, C. Wu, and P. Fung (2019) Personalizing dialogue agents via meta-learning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019, Florence, Italy. External Links: Link, Document Cited by: §6.
  • P. Mazaré, S. Humeau, M. Raison, and A. Bordes (2018) Training millions of personalized dialogue agents. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018, Brussels, Belgium. External Links: Link, Document Cited by: §1.
  • A. Miller, W. Feng, D. Batra, A. Bordes, A. Fisch, J. Lu, D. Parikh, and J. Weston (2017) ParlAI: a dialog research software platform. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, EMNLP 2017, Copenhagen, Denmark. External Links: Link, Document Cited by: §5.1.
  • O. Olabiyi, A. Khazane, A. Salimov, and E. T. Mueller (2019) An adversarial learning framework for a persona-based multi-turn dialogue model. CoRR abs/1905.01992. External Links: Link, 1905.01992 Cited by: §6.
  • K. Papineni, S. Roukos, T. Ward, and W. Zhu (2002) BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, ACL 2002, Philadelphia, Pennsylvania, USA. External Links: Link, Document Cited by: §5.3.
  • A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala (2019) PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.), External Links: Link Cited by: §5.1.
  • A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever (2018) Improving language understanding by generative pre-training. External Links: Link Cited by: §1, §3.
  • I. V. Serban, A. Sordoni, R. Lowe, L. Charlin, J. Pineau, A. C. Courville, and Y. Bengio (2017) A hierarchical latent variable encoder-decoder model for generating dialogues. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, AAAI 2017, February 4-9, 2017, San Francisco, California, USA, S. P. Singh and S. Markovitch (Eds.), External Links: Link Cited by: §6.
  • H. Song, W. Zhang, J. Hu, and T. Liu (2020) Generating persona consistent dialogues by exploiting natural language inference. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, AAAI 2020, February 7-12, 2020, New York City, New York, USA, External Links: Link Cited by: §6.
  • H. Song, W. Zhang, Y. Cui, D. Wang, and T. Liu (2019) Exploiting persona information for diverse generation of conversational responses. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI 2019, August 10-16, 2019, Macao, China, S. Kraus (Ed.), External Links: Link, Document Cited by: §1, §6.
  • A. Sordoni, M. Galley, M. Auli, C. Brockett, Y. Ji, M. Mitchell, J. Nie, J. Gao, and B. Dolan (2015) A neural network approach to context-sensitive generation of conversational responses. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2015, Denver, Colorado. External Links: Link, Document Cited by: §5.4, §6.
  • I. Sutskever, O. Vinyals, and Q. V. Le (2014) Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, NIPS 2014, December 8-13 2014, Montreal, Quebec, Canada, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.), External Links: Link Cited by: §6.
  • R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour (1999) Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12: Annual Conference on Neural Information Processing Systems 1999, NIPS 1999, November 29 - December 4, 1999, Denver, Colorado, USA, S. A. Solla, T. K. Leen, and K. Müller (Eds.), External Links: Link Cited by: §3.2, §3.
  • A. Tigunova, A. Yates, P. Mirza, and G. Weikum (2019) Listening between the lines: learning personal attributes from conversations. In Proceedings of the World Wide Web Conference, WWW 2019, May 13-17, 2019, San Francisco, CA, USA, L. Liu, R. W. White, A. Mantrach, F. Silvestri, J. J. McAuley, R. Baeza-Yates, and L. Zia (Eds.), External Links: Link, Document Cited by: §6.
  • A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, NIPS 2017, December 4-9, 2017, Long Beach, CA, USA, I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and R. Garnett (Eds.), External Links: Link Cited by: Figure 3, §4.1.
  • L. Weaver and N. Tao (2001) The optimal reward baseline for gradient-based reinforcement learning. In Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, UAI 2001, August 2-5, 2001, University of Washington, Seattle, Washington, USA, J. S. Breese and D. Koller (Eds.), External Links: Link Cited by: §3.2.
  • R. J. Williams (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning. External Links: Link, Document Cited by: §3.2.
  • T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, and J. Brew (2019a) HuggingFace’s transformers: state-of-the-art natural language processing. CoRR abs/1910.03771. External Links: Link, 1910.03771 Cited by: §5.1.
  • T. Wolf, V. Sanh, J. Chaumond, and C. Delangue (2019b) TransferTransfo: a transfer learning approach for neural network based conversational agents. CoRR abs/1901.08149. External Links: Link, 1901.08149 Cited by: §1, §3.1, §5.2, §6.
  • Y. Wu, W. Wu, C. Xing, M. Zhou, and Z. Li (2017) Sequential matching network: a new architecture for multi-turn response selection in retrieval-based chatbots. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada. External Links: Link, Document Cited by: §6.
  • M. Yang, W. Tu, Q. Qu, Z. Zhao, X. Chen, and J. Zhu (2018) Personalized response generation by dual-learning based domain adaptation. Neural Networks. External Links: Link, Document Cited by: §6.
  • Y. Zemlyanskiy and F. Sha (2018) Aiming to know you better perhaps makes me a more engaging dialogue partner. In Proceedings of the 22nd Conference on Computational Natural Language Learning, CoNLL 2018, Brussels, Belgium. External Links: Link, Document Cited by: §6.
  • H. Zhang, Y. Lan, J. Guo, J. Xu, and X. Cheng (2018) Reinforcing coherence for sequence to sequence model in dialogue generation. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, J. Lang (Ed.), External Links: Link, Document Cited by: §6.
  • S. Zhang, E. Dinan, J. Urbanek, A. Szlam, D. Kiela, and J. Weston (2018) Personalizing dialogue agents: I have a dog, do you have pets too? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia. External Links: Link, Document Cited by: §1, §1, §3, §5.2, §5.2.