Personalizing Dialogue Agents via Meta-Learning
Existing personalized dialogue models use human-designed persona descriptions to improve dialogue consistency. Collecting such descriptions from existing dialogues is expensive and requires hand-crafted feature designs. In this paper, we propose to extend Model-Agnostic Meta-Learning (MAML) (Finn et al., 2017) to personalized dialogue learning without using any persona descriptions. Our model learns to quickly adapt to new personas by leveraging only a few dialogue samples collected from the same user, which is fundamentally different from conditioning the response on the persona descriptions. Empirical results on the Persona-chat dataset (Zhang et al., 2018) indicate that our solution outperforms non-meta-learning baselines using automatic evaluation metrics, and in terms of human-evaluated fluency and consistency.
There is a growing interest in learning personalized chit-chat dialogue agents to make chat-bots more consistent. Recently, a multi-turn conversational dataset called Persona-chat Zhang et al. (2018) has been released, where two speakers are paired and a persona description (4-5 sentences) is randomly assigned to each of them. For example, “I am an old man” and “I like to play football” are possible persona description sentences provided to a speaker. By conditioning the response generation on the persona descriptions, a chit-chat model is able to produce a more persona-consistent dialogue Zhang et al. (2018).
However, it is difficult to capture a persona using just a few sentences, and collecting a non-synthetic set of persona descriptions from real human-human conversations, e.g., Reddit, is challenging as well, since it requires hand-crafted feature designs Mazare et al. (2018). In light of this, we propose to directly leverage a set of dialogues produced by the same persona, instead of its persona description, to generate more consistent responses.
We treat learning different personas as different tasks in a meta-learning setting, which is fundamentally different from optimizing a single model to represent all the personas. A high-level intuition of the difference between these two approaches is shown in Figure 1. We aim to learn a persona-independent model that is able to quickly adapt to a new persona given its dialogues. We formulate this task as a few-shot learning problem, where a few dialogues of a persona are used for training and the remaining ones for testing. Hence, we expect to learn initial parameters of a dialogue model that can quickly adapt to the response style of a given persona using just a few dialogues.
The main contribution of this paper is to cast personalized dialogue learning as a meta-learning problem, which allows our model to generate personalized responses by efficiently leveraging only a few dialogue samples instead of human-designed persona descriptions. Empirical results show that our solution outperforms joint training in terms of human-evaluated fluency and consistency.
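In the Persona-chat dataset Zhang et al. (2018), a dialogue is defined as a set of utterances $X = \{x_1, \dots, x_n\}$ and a persona description is defined as a set of sentences $P = \{P_1, \dots, P_m\}$. A personalized dialogue model $f_\theta$ is trained to produce a response $x_t$ conditioned on the previous utterances $x_{1:t-1}$ and the persona sentences $P$:
$$f_\theta(x_t \mid x_{1:t-1}, P) = p(x_t \mid x_{1:t-1}, P; \theta) \tag{1}$$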
Instead of conditioning our response on the persona sentences, we first adapt to the set of dialogues produced by a persona and then use only the dialogue history to condition our response. Eq. (1) becomes:
$$f_\theta(x_t \mid x_{1:t-1}) = p(x_t \mid x_{1:t-1}; \theta) \tag{2}$$
Therefore, we define the set of dialogues of a persona $p$ as $\mathcal{D}_p$. Conceptually, a model is expected to generate personalized responses after being trained with a few dialogue examples from $\mathcal{D}_p$. The main idea of our work is to use Model-Agnostic Meta-Learning (MAML) Finn et al. (2017) to learn an initial set of parameters that can quickly learn a persona from a few dialogue samples. We refer to the proposed meta-learning method for persona dialogues as Persona-Agnostic Meta-Learning (PAML).
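We define the persona meta-dataset as $\mathcal{D} = \{\mathcal{D}_{p_1}, \dots, \mathcal{D}_{p_z}\}$, where $z$ is the number of personas. Before training, $\mathcal{D}$ is split into $\mathcal{D}_{train}$, $\mathcal{D}_{valid}$ and $\mathcal{D}_{test}$.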
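For each training epoch, we uniformly sample a batch of personas $\mathcal{D}_{p_i}$ from $\mathcal{D}_{train}$; then, from each persona in the batch, we sample a set of dialogues as the training set $\mathcal{D}^{train}_{p_i}$ and another set of dialogues as the validation set $\mathcal{D}^{valid}_{p_i}$. Through the inner training iterations on $\mathcal{D}^{train}_{p_i}$, the dialogue model $f_\theta$, parameterized by $\theta$, is updated to $\theta'_{p_i}$ by standard gradient descent; a single iteration reads:
$$\theta'_{p_i} = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{D}^{train}_{p_i}}(f_\theta) \tag{3}$$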
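where $\alpha$ is the learning rate of the inner optimization and $\mathcal{L}_{\mathcal{D}^{train}_{p_i}}$ is the training loss. Specifically, the cross-entropy loss is used for training the response generation:
$$\mathcal{L}_{\mathcal{D}_{p_i}}(f_\theta) = -\sum_{\mathcal{D}_{p_i}} \log p(x_t \mid x_{1:t-1}; \theta) \tag{4}$$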
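As a concrete illustration of the cross-entropy loss described above, the following Python sketch computes the negative log-likelihood of gold responses from per-step token distributions. It is a minimal sketch, assuming teacher forcing and that the model's output distributions are already available as plain lists; `response_nll` and `persona_loss` are hypothetical helper names, not part of the PAML code release.

```python
import math
from typing import List, Sequence, Tuple

def response_nll(step_probs: Sequence[Sequence[float]], target_ids: Sequence[int]) -> float:
    """-log p(x_t | x_{1:t-1}) for one gold response x_t.

    step_probs[k] is an (assumed) model distribution over the vocabulary at
    decoding step k, conditioned on the dialogue history and the gold prefix
    (teacher forcing); target_ids[k] is the gold token at step k.
    """
    if len(step_probs) != len(target_ids):
        raise ValueError("need one distribution per target token")
    # The log-probability of the whole response factorizes over its tokens.
    return -sum(math.log(dist[tok]) for dist, tok in zip(step_probs, target_ids))

def persona_loss(examples: List[Tuple[Sequence[Sequence[float]], Sequence[int]]]) -> float:
    """Sum of response NLLs over all (distributions, targets) pairs of one persona set."""
    return sum(response_nll(probs, ids) for probs, ids in examples)

if __name__ == "__main__":
    # Two decoding steps over a toy 2-word vocabulary.
    print(response_nll([[0.5, 0.5], [0.25, 0.75]], [0, 1]))
```

The toy call evaluates $-(\log 0.5 + \log 0.75) \approx 0.98$. In a real model the same quantity is obtained from the decoder's softmax outputs, usually via a library cross-entropy function.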
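The meta-learning model is then trained to maximize the performance of the adapted model on the unseen dialogues in $\mathcal{D}^{valid}_{p_i}$. Following Finn et al. (2017), we define the meta-objective as:
$$\min_\theta \sum_{\mathcal{D}_{p_i} \sim \mathcal{D}_{train}} \mathcal{L}_{\mathcal{D}^{valid}_{p_i}}\big(f_{\theta'_{p_i}}\big), \qquad \theta'_{p_i} = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{D}^{train}_{p_i}}(f_\theta) \tag{5}$$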
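where $\mathcal{L}_{\mathcal{D}^{valid}_{p_i}}(f_{\theta'_{p_i}})$ is the loss evaluated on $\mathcal{D}^{valid}_{p_i}$. To optimize Eq. (5), we again apply stochastic gradient descent to the meta-model parameters $\theta$ by computing the gradient of the meta-objective:
$$\theta \leftarrow \theta - \beta \nabla_\theta \sum_{\mathcal{D}_{p_i} \sim \mathcal{D}_{train}} \mathcal{L}_{\mathcal{D}^{valid}_{p_i}}\big(f_{\theta'_{p_i}}\big) \tag{6}$$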
where $\beta$ is the meta-learning rate. This process requires second-order partial derivatives, because the adapted parameters $\theta'_{p_i}$ themselves depend on $\theta$ through a gradient; these can be computed by any automatic differentiation library (e.g., PyTorch, TensorFlow). A summary of the training procedure is shown in Algorithm 1.
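To make the second-order term concrete, here is a minimal, self-contained sketch of the inner update and meta-update described above on a toy problem. Assumptions that are not from the paper: each "persona" is replaced by a one-dimensional quadratic loss so that the gradient and Hessian can be written by hand, the inner loop takes a single gradient step, and `QuadraticLoss`, `adapt`, `meta_gradient` and `meta_train` are hypothetical names. For a real dialogue model, an autodiff library would compute the same quantities.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class QuadraticLoss:
    """Toy stand-in for a persona's dialogue loss: c * (theta - m) ** 2."""
    c: float  # curvature
    m: float  # minimizer

    def value(self, theta: float) -> float:
        return self.c * (theta - self.m) ** 2

    def grad(self, theta: float) -> float:
        return 2.0 * self.c * (theta - self.m)

    def hess(self, theta: float) -> float:
        return 2.0 * self.c

# (loss on D^train_{p_i}, loss on D^valid_{p_i}) for one sampled persona.
Task = Tuple[QuadraticLoss, QuadraticLoss]

def adapt(theta: float, train_loss: QuadraticLoss, alpha: float) -> float:
    """One inner gradient step: theta' = theta - alpha * dL_train/dtheta."""
    return theta - alpha * train_loss.grad(theta)

def meta_gradient(theta: float, tasks: List[Task], alpha: float) -> float:
    """Exact gradient of sum_i L_valid_i(theta'_i) w.r.t. theta."""
    total = 0.0
    for train_loss, valid_loss in tasks:
        theta_prime = adapt(theta, train_loss, alpha)
        # Chain rule: dL_valid(theta')/dtheta = L_valid'(theta') * dtheta'/dtheta,
        # where dtheta'/dtheta = 1 - alpha * L_train''(theta) is the second-order part.
        total += valid_loss.grad(theta_prime) * (1.0 - alpha * train_loss.hess(theta))
    return total

def meta_train(theta: float, task_pool: List[Task], alpha: float = 0.1, beta: float = 0.05,
               meta_batch: int = 4, steps: int = 200, seed: int = 0) -> float:
    rng = random.Random(seed)
    for _ in range(steps):
        batch = rng.sample(task_pool, meta_batch)  # sample a batch of "personas"
        theta -= beta * meta_gradient(theta, batch, alpha)  # outer (meta) SGD step
    return theta

if __name__ == "__main__":
    pool = [(QuadraticLoss(1.0, m), QuadraticLoss(1.0, m)) for m in (-1.0, 0.0, 1.0, 2.0)]
    print(meta_train(0.0, pool))
```

With four tasks of equal curvature, the meta-learned initialization converges to the mean of their minimizers (0.5), which is the point minimizing the summed squared error after one inner step. The factor `1 - alpha * hess` is the second-order term; dropping it yields the cheaper first-order approximation of MAML, which is a different, approximate algorithm.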
The experiments are conducted using Persona-chat Zhang et al. (2018). To create the meta-sets $\mathcal{D}$, we group the dialogues by their persona description, separately for train, validation and test, following the same persona split as in Zhang et al. (2018). On average, each persona description has 8.3 unique dialogues. In the Appendix, we report the distribution of the number of dialogues per persona.
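The grouping step above can be sketched as follows, applied once per split. This is a minimal sketch under a stated assumption: two dialogues are taken to share a persona exactly when their descriptions contain the same sentences (order ignored). `build_meta_set` and `mean_dialogues_per_persona` are hypothetical helpers, not functions from the released scripts.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

PersonaKey = Tuple[str, ...]

def build_meta_set(
    records: Iterable[Tuple[Sequence[str], List[str]]],
) -> Dict[PersonaKey, List[List[str]]]:
    """Group one split's dialogues by persona description.

    Each record is (persona_sentences, dialogue_utterances). The key is the
    sorted, whitespace-stripped persona sentences (assumed identity rule).
    """
    meta: Dict[PersonaKey, List[List[str]]] = defaultdict(list)
    for persona_sentences, dialogue in records:
        key = tuple(sorted(s.strip() for s in persona_sentences))
        meta[key].append(dialogue)
    return dict(meta)

def mean_dialogues_per_persona(meta: Dict[PersonaKey, List[List[str]]]) -> float:
    """Average size of D_p over the personas of one split."""
    return sum(len(dialogues) for dialogues in meta.values()) / len(meta)
```

Keeping the train, validation and test records separate before calling `build_meta_set` preserves the original persona split, so no persona leaks across splits.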
In our experiments, we compare the following training settings:
(Dialogue) a model trained using only the dialogue history, as in Eq. (2);
(PAML) a meta-trained model as in Eq. (5), where we test each set $\mathcal{D}_{p_i} \in \mathcal{D}_{test}$ by selecting one dialogue and fine-tuning on all the others. To elaborate, suppose we are testing $X_j \in \mathcal{D}_{p_i}$: we first fine-tune using all the dialogues in $\mathcal{D}_{p_i} \setminus \{X_j\}$, and then test on $X_j$. This process is repeated for all the dialogues in $\mathcal{D}_{p_i}$.
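The leave-one-out testing protocol above can be sketched as a small loop. This is a minimal sketch: `fine_tune` and `evaluate` are hypothetical callables standing in for the actual fine-tuning routine and metric, and the model parameters are treated as opaque values. The same loop applies to any initialization, meta-learned or not.

```python
from typing import Any, Callable, List, Sequence

def leave_one_out_scores(
    init_params: Any,
    persona_dialogues: Sequence[Any],
    fine_tune: Callable[[Any, List[Any]], Any],
    evaluate: Callable[[Any, Any], float],
) -> List[float]:
    """Score every dialogue of one persona after fine-tuning on the others."""
    scores = []
    for i, held_out in enumerate(persona_dialogues):
        # Support set: every dialogue of this persona except the held-out one.
        support = [d for j, d in enumerate(persona_dialogues) if j != i]
        # Each fold restarts from the same initialization, not from the previous fold.
        adapted = fine_tune(init_params, support)
        scores.append(evaluate(adapted, held_out))
    return scores
```

Restarting every fold from `init_params` matters: reusing the previously adapted parameters would let the held-out dialogue of an earlier fold leak into later folds.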
(Dialogue+Fine-tuning) a model trained as in Dialogue and tested with the same fine-tuning procedure as PAML.
We also report a model trained under the assumption that the persona description is available, which we refer to as (Dialogue+Persona).
We implemented our model using a standard Transformer architecture Vaswani et al. (2017) with pre-trained GloVe embeddings Pennington et al. (2014). The model and the pre-processing scripts are available at https://github.com/HLTCHKUST/PAML. For standard training, we used the Adam optimizer Kingma and Ba (2014) with a warm-up learning rate strategy and a batch size of 32. In meta-training, instead, we used SGD for the inner loop and Adam for the outer loop, with learning rates $\alpha$ and $\beta$, respectively, and a batch size of 16 for both. In all the models, we used beam search with a beam size of 5.
The objective of the evaluation is to verify whether PAML can produce responses that are more consistent with the given dialogue and persona description (even though PAML never sees the persona description). To do so, we employ both automatic and human evaluation.
We report perplexity and BLEU score Papineni et al. (2002) of the generated sentences against the human-generated responses. Aside from these standard evaluation metrics, we also train a Natural Language Inference (NLI) model using the Dialog NLI Sean et al. (2018) dataset, a recently proposed corpus based on the Persona-chat dataset, with NLI annotations between persona description sentences and dialogue utterances. We fine-tune a pre-trained BERT model Devlin et al. (2018) on the DNLI corpus and achieve a test set accuracy of 88.43%, which is in line with the best-reported model in Sean et al. (2018), ESIM Chen et al. (2017) (88.20% accuracy). We then define a new evaluation metric for dialogue consistency as follows:
$$C(U) = \sum_{j=1}^{m} \mathrm{NLI}(U, P_j), \qquad \mathrm{NLI}(U, P_j) = \begin{cases} 1 & \text{if } U \text{ entails } P_j \\ 0 & \text{if } U \text{ is neutral with respect to } P_j \\ -1 & \text{if } U \text{ contradicts } P_j \end{cases} \tag{7}$$
where $U$ is a generated utterance and $P_j$ is one sentence in the persona description. Hence, a higher consistency score C means a more persona-consistent dialogue response.
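The C score defined above is a signed count of NLI labels over the persona sentences. The sketch below assumes a hypothetical `nli_predict(utterance, sentence)` callable that returns one of the strings "entailment", "neutral" or "contradiction"; in the paper's setup this role is played by the fine-tuned BERT classifier, whose actual interface is not shown here.

```python
from typing import Callable, Sequence

# Label-to-score mapping of the C metric: entailment +1, neutral 0, contradiction -1.
NLI_SCORE = {"entailment": 1, "neutral": 0, "contradiction": -1}

def consistency_score(
    utterance: str,
    persona: Sequence[str],
    nli_predict: Callable[[str, str], str],
) -> int:
    """C(U): sum of NLI scores between one generated utterance and each persona sentence."""
    return sum(NLI_SCORE[nli_predict(utterance, sentence)] for sentence in persona)
```

For example, an utterance that entails one persona sentence, contradicts another and is neutral to a third scores 1 - 1 + 0 = 0, so a single contradiction cancels a single entailment.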
Since automatic evaluation performs poorly in this task Liu et al. (2016), we perform a human evaluation using crowd-sourced workers. We randomly selected 300 generated responses from 10 unique personas and asked each worker to evaluate the fluency (1 to 5) and the consistency of the generated response with respect to the dialogue history and the respective persona description. We asked the workers to assign a score of 1, 0 or -1 for consistent, neutral and contradicting responses, respectively; the full instruction set is available in the Appendix.
Table 1 shows both automatic and human evaluation results. PAML achieves consistently better results in terms of dialogue consistency in both automatic and human evaluation. The latter also shows that all the experimental settings have comparable fluency scores, whereas perplexity and BLEU score are lower for PAML. This is consistent with the finding that these measures are not correlated with human judgment Liu et al. (2016). For completeness, we also show examples of responses generated by PAML and the baseline models in the Appendix.
On the other hand, the human-evaluated consistency is aligned with the C score, which supports the meaningfulness of the defined measure. This agrees with the results of Sean et al. (2018), where the authors showed that by re-ranking the beam search hypotheses using the DNLI score (i.e., the C score), they achieved a substantial improvement in dialogue consistency.
We analyze the ability of our model to quickly adapt to a given persona in terms of shots. We define a shot as the number of dialogues in $\mathcal{D}^{train}_{p_i}$ used for fine-tuning on a given persona, e.g., 1-shot means one dialogue, 3-shot means three dialogues, and so on. Figure 2 compares the $k$-shot consistency C results for $k$ equal to 0, 1, 3, 5 and 10, for both PAML and Dialogue+Fine-tuning. PAML achieves a high consistency score using just 3 dialogues, which is better than Dialogue+Persona. On the other hand, Dialogue+Fine-tuning cannot properly leverage the dialogues in $\mathcal{D}_{p_i}$, which shows the effectiveness of training with meta-learning.
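The k-shot analysis above can be sketched as a loop over shot counts. This is a minimal sketch under stated assumptions: the k support dialogues are simply the first k of a persona's available dialogues (the selection rule is not specified here), 0-shot means evaluating the initialization without adaptation, and `fine_tune` and `score` are hypothetical callables, e.g. a fine-tuning routine and the C score averaged over a dialogue's responses.

```python
from typing import Any, Callable, Dict, Iterable, Sequence

def k_shot_curve(
    init_params: Any,
    support_pool: Sequence[Any],
    test_dialogues: Sequence[Any],
    shots: Iterable[int],
    fine_tune: Callable[[Any, list], Any],
    score: Callable[[Any, Any], float],
) -> Dict[int, float]:
    """Average test score after adapting on k support dialogues, for each k."""
    curve: Dict[int, float] = {}
    for k in shots:
        if k > len(support_pool):
            raise ValueError(f"only {len(support_pool)} dialogues available for {k}-shot")
        # Assumption: take the first k dialogues as the support set.
        support = list(support_pool[:k])
        # 0-shot: evaluate the initialization as-is, with no adaptation.
        params = fine_tune(init_params, support) if k > 0 else init_params
        curve[k] = sum(score(params, d) for d in test_dialogues) / len(test_dialogues)
    return curve
```

Running this once from the meta-learned initialization and once from the jointly trained one gives the two curves compared in the shot analysis.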
Meta-learning is a sub-field of machine learning that aims to learn the learning algorithm itself. Recently, several meta-learning models have been proposed for solving few-shot image classification Ravi and Larochelle (2016); Vinyals et al. (2016); Finn et al. (2017); Mishra et al. (2017); Santoro et al. (2016), optimization Andrychowicz et al. (2016), and … (2017). Meta-learning for NLP applications is less common; it has been applied to semantic parsing Huang et al. (2018), machine translation for low-resource languages Gu et al. (2018), and text classification Yu et al. (2018). To the best of our knowledge, this is the first attempt at adapting meta-learning to personalized dialogue learning.
Li et al. (2016) were the first to propose persona-based dialogue models for improving response consistency. Zhang et al. (2018) introduced Persona-chat, which was further extended in ConvAI2 (2019). Several works improved on the initial baselines with various methodologies Kulikov et al. (2018); Yavuz et al.; Hancock et al. (2019); Lucas et al. (2009); Joshi et al. (2017); Zemlyanskiy and Sha (2018); Gao et al. (2018). However, all of these previous works conditioned their responses on the persona description, instead of using the dialogues produced by the persona.
In this paper, we present a novel meta-learning setting for personalizing dialogue agents without conditioning the model response on the persona description. This is especially useful since obtaining such persona descriptions requires human effort. Moreover, we show that a dialogue agent trained with meta-learning achieves more consistent dialogue under both automatic measures and human evaluation. In future work, we plan to apply meta-learning to comment generation Lin et al. (2019) and task-oriented dialogue systems Madotto et al. (2018); Wu et al. (2019, 2017, 2018); Reddy et al. (2018).
This work has been funded by MRP/055/18 of the Innovation Technology Commission, of the Hong Kong University of Science and Technology.
Preprints Conf. Optimality in Artificial and Biological Neural Networks, pages 6–8. Univ. of Texas.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3622–3631. Association for Computational Linguistics.
GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
| Usr: | hey there . how are you ? |
| Sys: | hey there . i am fine . what about you ? |
| Usr: | been better . stressful day at work . |
| Sys: | do you work on sundays ? |
| Usr: | yep . i work fast food . always working on a sunday . |
| Sys: | how old are you ? if do not mind to share it . |
| Usr: | i am 20 and i have been doing this for 4 years . |
| Sys: | you are too young . do you study ? |
| Sys: | i feel so sorry . you should get it soon . |
| Sys: | god help you and make you rich one day . |
| Usr: | haha thanks . enough about me . what do you like to do ? |
| Human: | i finished college now looking for a job |
| Dialogue: | i like to play video games . |
| Persona+Dialogue: | i like to play ultimate frisbee . |
| Multi-task+Fine-tune: | i like to go to the movies . |
| PAML: | i like to play frisbee with my friends . |

| Usr: | hi how are you today |
| Sys: | i am well , studying for an exam |
| Usr: | for college or for what ? |
| Usr: | nice i love fish and that stuff |
| Usr: | straight edge is the way to go minor threat says |
| Sys: | i guess . what do you do |
| Usr: | i just go to the gym all day really |
| Sys: | you must be very strong then |
| Usr: | yeah but i am single . |
| Sys: | trying to find a mate at the gym ? |
| Usr: | no i actually enjoy being alone |
| Dialogue: | what do you do for work ? |
| Persona+Dialogue: | what do you do for work ? |
| Multi-task+Fine-tune: | what do you do for work ? |
| PAML: | do you like to read ? i like to read . |

| Usr: | hi , tell me about yourself . |
| Sys: | i have a pet turtle that i love ! his name is timothy |
| Usr: | that is cool . i like fantasy videogames . do you play any ? |
| Sys: | not really . i like hanging out with people in person |
| Sys: | i understand . i love autumn , how about you ? |
| Sys: | definitely . my parents went away to bora bora for a while . |
| Usr: | oh cool ! my mom is a doctor so my parents travel a lot . |
| Sys: | what do you do for a living ? |
| Sys: | i am currently in school |
| Usr: | do you have any hobbies ? i build model spaceships . |
| Human: | i guess i like to cook |
| Dialogue: | that sounds like a lot of fun . |
| Persona+Dialogue: | i love to build people . |
| Multi-task+Fine-tune: | that is a good idea . i love to play the guitar . |
| PAML: | that is awesome . i like to play video games . |

To each crowd worker we show a dialogue history, a persona description and the generated response from one of the evaluated settings. We then ask them to evaluate fluency and consistency. The former is a fairly straightforward measure, whereas we defined consistency as follows:
An answer is considered consistent if and only if it:
does not contradict either the dialogue history or the persona description; and
is relevant to at least one of the given persona description sentences.
Usually, generic answers like “I am not sure” or “I am sorry to hear that” are considered neutral. For example, if the persona description says that User 2 likes basketball, talking about basketball makes the answer consistent, while an answer like “I hate basketball” is considered a contradiction. However, the answer is considered neutral when both of the following hold:
The answer contradicts neither the dialogue history nor the persona description;
The answer is not relevant to any of the given persona description sentences.
For example, if the persona description says that User 2 likes basketball, talking about swimming is considered neutral, as it is not relevant to basketball but does not contradict anything.
Therefore, we ask you to score only the consistency as follows:
The answer is contradictory: -1
The answer is neutral: 0
The answer is consistent: 1
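The scoring rubric above reduces to a small decision rule over two yes/no judgments. The Python sketch below encodes it; `human_consistency_label` is a hypothetical helper written for illustration, and the two judgments themselves remain the annotator's.

```python
def human_consistency_label(contradicts: bool, relevant_to_persona: bool) -> int:
    """Map the two rubric judgments to the -1 / 0 / 1 consistency score."""
    if contradicts:  # contradicts the dialogue history or the persona description
        return -1
    # Not contradicting: consistent only if it relates to some persona sentence.
    return 1 if relevant_to_persona else 0
```

With the basketball persona: "I hate basketball" gives `human_consistency_label(True, True) == -1`, talking about basketball gives 1, and talking about swimming gives 0. Contradiction is checked first, so a relevant but contradicting answer never scores as consistent.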