Call For Customized Conversation Jang et al. (2021) is an open-domain chat dataset that grounds human dialogue in both Persona and Knowledge. The dataset provides Knowledge-grounded multi-turn dialogues that are aligned with the user's Persona. In particular, it explores how the variety of people's individual preferences affects the Knowledge selection required to generate an answer while travelling around the world (history, design, structure, tourism, etc.). Thus, the dataset is composed of dialogues annotated with Wiki passages associated with individual landmarks and simple sentences inferring the user's preferences. This results in a more realistic dialogue environment for the evaluation of open-domain dialogue agents.
One important aspect of this configuration is that Persona and Knowledge pairs should be retrieved from the given dialogue. Following the grounding prediction tasks in Jang et al. (2021), we define Persona and Knowledge Dual Context Identification as the task of identifying Persona and Knowledge jointly for a given dialogue. We hypothesize that there are specific interactions between Persona, Knowledge, and Dialogue, so they cannot be predicted separately from partial contexts. We utilize neural retrieval tools such as Sentence-BERT Reimers and Gurevych (2019) to jointly predict Persona and Knowledge. To the best of our knowledge, this is the first paper to outline joint retrieval techniques for multi-context grounded dialogue.
In addition, decoding techniques are crucial since conversation models share the encoder-decoder architecture utilized for other text generation tasks. Roller et al. (2020) introduce recipes for retrieval and generation models, emphasizing decoding choices for grounded open-domain dialogue. Wu et al. (2016) propose a variety of normalization techniques for machine translation in a production system (Google Translate). Meister et al. (2020) investigate the importance of beam configurations in reaching optimal performance. Following these studies, we aim to tackle the known brevity problem, in which generative models favor shorter, less informative text than is optimal. We extensively experiment with various decoding strategies, length constraints, and normalization techniques.
Our contributions are as follows:
1. A Persona-Knowledge dual context retrieval methodology that utilizes neural retrieval tools to jointly retrieve Persona and Knowledge given the Dialogue. We achieve SOTA performance for both Persona and Knowledge retrieval. Notably, no model fine-tuning is required for the top-1 Knowledge retrieval method.
2. An enhanced decoding strategy that targets optimal performance with specific emphasis on brevity enhancement. Notably, our approach obtains a significant performance gain without additional data or training.
2 Related Works
Integrating Persona with dialogue agents has been actively studied. Various datasets and systems exist for this purpose, including Persona Chat Zhang et al. (2018) and many others Majumder et al. (2020); Joshi et al. (2017); Shuster et al. (2018); Xu et al. (2020); Rashkin et al. (2019). Access to Persona assists the dialogue agent in responding correctly to the user; however, the lack of Knowledge context prevents the agent from elaborating with specific, detailed information.
On the other hand, integrating knowledge bases with dialogue is another engaging topic of dialogue studies. Datasets for this purpose include Dinan et al. (2018); Zhou et al. (2018). Knowledge relevant to the dialogue is retrieved from the knowledge base and utilized in response generation. The shortcoming of this Knowledge-only approach is that the relevant Knowledge itself might depend on the Persona of the user. We specifically address this shortcoming in our method by studying the interactions between all components of dialogue.
In dialogue generation, Wu et al. (2016) propose a variety of beam normalization techniques for machine translation. Roller et al. (2020) emphasize decoding strategies for open-domain chatbots, including beam size, beam length, and sampling methods. Meister et al. (2020) introduce regularization strategies for beam search.
3.1 Knowledge Retrieval
We introduce a novel formulation of Persona, Knowledge, and Dialogue as a Q & A input (Figure 1). This form is specifically selected to infer relations between all inputs of the grounded dialogue during answer likelihood calculation, and to replicate the short question and descriptive answer pairs often found in the Q & A setting. (q, a) notates a pair for inference with the retrieval model, (q_i, a_j) notates a specific Q & A candidate pair, p_i and k_j notate a specific Persona and Knowledge respectively, and d notates the dialogue corresponding to the pairs.
We then perform permutative Persona-Knowledge evaluation (Figure 2) on all pairs of augmented Persona and Knowledge. We find the best Knowledge by scoring all pairs and recording the Knowledge of the most aligned pair. This ensures we find the best Knowledge that aligns with both the Dialogue and the Persona of the human. M notates the Q & A retrieval model that returns a relevancy score, and j* notates the index of the predicted true Knowledge.
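The permutative evaluation can be sketched as follows. The scoring function is pluggable (in practice, a Q & A cross-encoder such as the MS MARCO MiniLM model from Sentence-BERT), and the question format concatenating Persona and Dialogue is an assumption for illustration:

```python
from itertools import product

def select_knowledge(personas, knowledges, dialogue, score_fn):
    """Score every (Persona, Knowledge) pair as a Q & A input and return
    the index of the Knowledge from the most aligned pair."""
    best_score, best_j = float("-inf"), -1
    for (i, p), (j, k) in product(enumerate(personas), enumerate(knowledges)):
        question = f"{p} {dialogue}"   # assumed Persona+Dialogue augmentation
        score = score_fn(question, k)  # relevancy score, e.g. from a cross-encoder
        if score > best_score:
            best_score, best_j = score, j
    return best_j
```

With a real retrieval model, `score_fn` would wrap the cross-encoder's prediction on the (question, passage) pair; any relevancy scorer with the same signature works.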
3.2 Persona Retrieval
Continuing from Section 3.1, we fine-tune the Q & A retrieval model using pairs of augmented Persona and the predicted true Knowledge only, without incorrect Knowledge pairs. This fine-tuning step increases the performance of the model and yields correctly normalized scores for Persona; otherwise we would obtain inflated scores due to the alignment of the Dialogue with the true Knowledge in the Q & A configuration. M' notates the fine-tuned model. Its input is formed like the candidate pairs of Section 3.1, the only difference being the fixed true Knowledge. We notate the fine-tuning data separately because it comes from a separate training set, formulated in the same manner but with labeled true Knowledge.
Finally, we infer the data pairs with the fine-tuned model to obtain Persona likelihood scores. We utilize a threshold to avoid retrieving unrelated Persona: certain Dialogues have no Persona assigned to them, which we can replicate with the threshold.
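A minimal sketch of this thresholded Persona selection, under the same assumed Q & A formatting (the pluggable scorer stands in for the fine-tuned model):

```python
def select_personas(personas, true_knowledge, dialogue, score_fn, threshold):
    """Score each candidate Persona against the fixed predicted-true Knowledge
    and keep only those above the threshold. Some dialogues ground no Persona,
    which the threshold replicates by returning an empty selection."""
    selected = []
    for i, p in enumerate(personas):
        question = f"{p} {dialogue}"  # assumed Persona+Dialogue augmentation
        if score_fn(question, true_knowledge) > threshold:
            selected.append(i)
    return selected
```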
The retrieved Persona set and Knowledge for a given Dialogue together form the final grounding output of our method.
3.3 Decoding Techniques
We formulate grounded conversation response generation as a downstream task of Persona-Knowledge retrieval:
y = f_θ(d, p, k; l_min, l_max, α, b)

where y indicates a response, d represents a dialogue, p denotes the persona, k indicates the knowledge, l_min represents the minimum response length, l_max denotes the maximum response length, α indicates the coefficient of the length normalization, b denotes the beam size, and f_θ represents the dialogue generation model. Note that we utilize our own implementation of beam search instead of the nucleus sampling Holtzman et al. (2019) baseline from Jang et al. (2021).
For the length normalization technique, we apply the following formula proposed by Wu et al. (2016) to our decoder with various alpha values, and report experimental results in Appendix A:

lp(Y) = (5 + |Y|)^α / (5 + 1)^α

where |Y| denotes the current target length and α indicates the length normalization coefficient.
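The Wu et al. (2016) length penalty and its use for re-ranking beam hypotheses can be written directly:

```python
def length_penalty(length: int, alpha: float) -> float:
    # lp(Y) = (5 + |Y|)**alpha / (5 + 1)**alpha, per Wu et al. (2016)
    return ((5 + length) / 6) ** alpha

def normalized_score(log_prob: float, length: int, alpha: float) -> float:
    # Beam hypotheses are ranked by log-probability divided by lp(Y);
    # alpha > 0 counteracts the generative bias toward short responses.
    return log_prob / length_penalty(length, alpha)
```

With alpha = 0 the penalty is 1 and scoring is unnormalized; alpha = 1.0 (our best configuration) divides by a quantity that grows roughly with the hypothesis length.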
4 Experiment Setup
We utilize the Call For Customized Conversation Jang et al. (2021) dataset for evaluation and fine-tuning, which has 10 Knowledge and 5 Persona candidates respectively for each dialogue. We integrate a neural Question Answering retrieval model from Sentence-BERT Reimers and Gurevych (2019) as the starting model. Specifically, we utilize a 12-layer MiniLM Wang et al. (2020) (33M params) based cross-encoder trained on MS MARCO Nguyen et al. (2016) (MRR@10 on the MS MARCO dev set: 39.02). This model fits our formulation very well since its purpose is semantic search, evaluating short questions together with long passages. For Persona search (eq. 4, 6), we fine-tune for 2 epochs and apply a Persona threshold in our best configuration.
In addition, to evaluate the generation task, we extensively experiment with the baseline generation model trained via the configuration in Jang et al. (2021), combined with several decoding hyperparameters. We train the baseline model for 5 epochs and use the default decoding settings of minimum length 1, maximum length 20, and nucleus sampling. Our model is trained for an additional 25 epochs and uses minimum length 5, maximum length 80, beam size 10, and alpha 1.0. Exact hyperparameters are listed in Table 9.
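The two decoding configurations can be summarized as keyword sets. The argument names follow the Hugging Face `generate` convention, which may differ from the authors' own beam-search implementation, and the nucleus `top_p` value is an assumption:

```python
# Baseline decoding from Jang et al. (2021): nucleus sampling, short outputs.
baseline_decoding = dict(min_length=1, max_length=20, do_sample=True, top_p=0.9)  # top_p assumed

# Our decoding (Table 9): beam search with length normalization.
ours_decoding = dict(min_length=5, max_length=80, num_beams=10, length_penalty=1.0)
```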
5.1 Knowledge Retrieval
We experiment with various ablations of Dialogue / Persona / Knowledge interactions and find that permutative evaluation of the eq. 1 form yields the best performance for selecting top-1 Knowledge. The 15-point increase confirms that considering all components of dialogue is important. We report the results on the test set.
| Input formulation | Knowledge accuracy (%) |
| --- | --- |
| D & K | 79.26 |
| P & K (pairwise) | 84.62 |
| P + D & K (pairwise) | 94.69 (+15.41) |
5.2 Persona Retrieval
For the Persona retrieval experiments, we start with the grounding Knowledge selected in Section 5.1. Then, we perform ablations of Dialogue augmentation and fine-tuning. Fine-tuning the model yields an 8-point performance increase.
| Input formulation | Persona accuracy (%) |
| --- | --- |
| P + D & K* | 83.83 |
| P + D & K* (fine-tuned) | 91.57 (+7.74) |

(K* denotes the predicted true Knowledge.)
We observe lower performance for the non-fine-tuned model in comparison to the fine-tuned one. We suspect that this is due to the lack of score normalization, in that the Q & A relationship of the Dialogue to the true Knowledge may inflate the likelihood score. We argue that fine-tuning normalizes the score in addition to the raw performance increase. We perform the threshold ablations shown in Table 3 to verify this hypothesis.
Persona accuracy (%) at ascending Persona thresholds:

| P + D & K* | 79.30 | 83.83 | 84.02 | 84.26 |
We find that the fine-tuned model has increased performance across all thresholds, including where the output has top-1 characteristics. We also find that, for the non-fine-tuned case, the score increases in tandem with the Persona threshold.
5.3 Generation Results
We experiment with various decoding methods and perform ablations. In these experiments, we use the ours (5 epoch) model described in Table 9. We report the results on the dev set.
Q1. What is the optimal performance we can reach with decoding method improvements?
Q2. How does the decoding strategy affect performance?
Q3. How do the length constraints affect performance?
Q1. What is the optimal performance we can reach with decoding method improvements?
We obtain increases of roughly 11 BLEU and 10 ROUGE-L points respectively, as described in Table 4.
| Model | BLEU | ROUGE-L |
| --- | --- | --- |
| Ours (5 epoch) | 38.50 (+7.71) | 19.31 (+8.15) |
| Ours (30 epoch) | 41.54 (+10.75) | 21.42 (+10.26) |
Q2. How does the decoding strategy affect performance?
We select a beam size of 10, informed by Meister et al. (2020). Table 5 demonstrates the effectiveness of beam search compared to the baseline nucleus sampling.
| | Baseline | Ours |
| --- | --- | --- |
| Beam size | N/A (nucleus) | 10 |
Q3. How do the length constraints affect performance?
Table 6 demonstrates that the longer the maximum response length, the higher the performance gain. We also experiment with minimum length constraints in Appendix A.
6 Conclusion

We introduce a Persona-Knowledge dual context retrieval method in this paper. We achieve SOTA grounding retrieval performance via Q & A informed data augmentations and the application of novel fine-tuning techniques. We achieve SOTA dialogue generation performance by utilizing beam search with brevity-informed constraints. We perform minimal fine-tuning for both high-performing methods. We place first across all metrics (Persona / Knowledge accuracy, SacreBLEU, chrF++, ROUGE-L) on the official leaderboard, achieving significant increases over the baseline for both the Grounding and Generation tasks.
References

- Dinan et al. (2018). Wizard of Wikipedia: Knowledge-powered conversational agents. arXiv:1811.01241.
- Holtzman et al. (2019). The curious case of neural text degeneration. arXiv.
- Jang et al. (2021). Call for customized conversation: Customized conversation grounding persona and knowledge. AAAI-22.
- Joshi et al. (2017). Personalization in goal-oriented dialog. arXiv.
- Majumder et al. (2020). Like hiking? You probably enjoy nature: Persona-grounded dialog with commonsense expansions. In EMNLP 2020, pp. 9194–9206.
- Meister et al. (2020). If beam search is the answer, what was the question? arXiv:2010.02650.
- Nguyen et al. (2016). MS MARCO: A human generated machine reading comprehension dataset.
- Rashkin et al. (2019). Towards empathetic open-domain conversation models: A new benchmark and dataset. In ACL 2019, pp. 5370–5381.
- Reimers and Gurevych (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. EMNLP 2019.
- Roller et al. (2020). Recipes for building an open-domain chatbot. arXiv:2004.13637.
- Shuster et al. (2018). Image-Chat: Engaging grounded conversations. arXiv.
- Wang et al. (2020). MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained Transformers.
- Wu et al. (2016). Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144.
- Xu et al. (2020). A neural topical expansion framework for unstructured persona-oriented dialogue generation. arXiv.
- Zhang et al. (2018). Personalizing dialogue agents: I have a dog, do you have pets too? In ACL 2018, pp. 2204–2213.
- Zhou et al. (2018). A dataset for document grounded conversations. arXiv:1809.07358.
Appendix A Appendix
While we achieve a strong performance increase without any training of the generative model, we find that our experimental results do not fully agree with the existing methods introduced in Roller et al. (2020) and Wu et al. (2016). A robust decoding method applicable across multiple open-domain dialogue domains has yet to be found; we leave this question to future studies.
A.1 The effect of the minimum length
Performance with different decoding minimum lengths. Other parameters are the same as ours (5 epoch).
A.2 The effect of the length normalization
Performance with different alpha coefficients of the length normalization. Other parameters are the same as ours (5 epoch).
| Hyperparameter | Baseline | Ours |
| --- | --- | --- |
| Training epochs | 5 | 5 or 30 |
| Training batch size | 2 | 2 |
| Alpha (length normalization) | 0.0 | 1.0 |
| Beam size | N/A (nucleus) | 10 |