Partner Personas Generation for Diverse Dialogue Generation

11/27/2021
by   Hongyuan Lu, et al.
The Chinese University of Hong Kong

Incorporating persona information enables diverse and engaging responses in dialogue response generation. Unfortunately, prior work has primarily focused on self personas and has overlooked the value of partner personas. Moreover, in practical applications, ground truth partner personas are often unavailable. This paper tackles these issues with a novel framework that leverages automatic partner personas generation to enhance the succeeding dialogue generation. We incorporate reinforcement learning with a dedicatedly designed critic network for reward judgement. Experimental results from both automatic and human evaluation demonstrate that a) our framework generates relevant, informative and coherent partner personas, even compared to the ground truth partner personas; b) the generated partner personas enhance the succeeding response generation, surpassing our baselines and comparison model when partner personas are missing during the inference stage; c) our framework generates responses that are more informative and engaging than a baseline conditioned on the ground truth partner personas during inference; and d) our dedicatedly designed critic network reinforces our framework effectively. Finally, our framework gives better explainability and reduces the demands for external databases of partner personas.


1 Introduction

Building informative and engaging dialogue agents (Zhang et al., 2019; Roller et al., 2020; Bao et al., 2021) is a popular research direction within natural language processing. For the sake of engagement, diverse and consistent responses (Song et al., 2020, 2021) are important factors, and persona information (Zhang et al., 2018) contributes to both. There are two types of persona information, namely self persona and partner persona. The former refers to a self profile, consisting of several sentences, that represents the dialogue agent. Such a persona allows the agent to produce consistent responses rather than relying solely on persona information that is randomly learned and embedded in the model parameters (Kim et al., 2020). The latter also refers to a profile, but one representing the user. Leveraging such partner personas is helpful for dialogue generation (Gu et al., 2021). Therefore, we exploit partner personas for diverse dialogue generation.

Unfortunately, the user profile is commonly missing due to cold-start (Li et al., 2020) when deploying online dialogue agents or for newly registered users. Most prior works, if not all (Mazaré et al., 2018; Song et al., 2019; Gu et al., 2019; Zhao et al., 2019), either overlook the value of partner personas or focus on the impractical situation where partner personas are guaranteed to exist in both the training and inference stages. Our work demonstrates the importance of diverse partner personas generation, and we particularly investigate the practical situation where partner personas are missing in the inference stage. Such an investigation is essential, as there is no guarantee that the ground truth partner personas always exist. Ultimately, our proposed framework produces even more diverse and engaging responses than our baseline that conditions on the ground truth partner personas. We present a case study in Section 5.2 that illustrates this advantage.

To the best of our knowledge, this is the first attempt to formulate partner personas prediction in a generative manner that boosts the performance of the succeeding dialogue generation. Our work is motivated by three underlying hypotheses: i) Partner personas generation is plausible given the self personas and dialogue context. ii) Generated personas are more diverse and interesting than the retrieved ground truth. iii) Such diverse generated personas help to produce diverse succeeding dialogue responses. Our automatic and human evaluation results support these hypotheses, and this paper paves the way to exploiting generative partner personas for diverse dialogue generation.

We develop a novel framework composed of three major components, namely a partner personas generator, a dialogue response generator and a critic network. We use a partner personas generator to generate partner personas, which the dialogue response generator uses for succeeding dialogue response generation. We employ reinforcement learning with a dedicatedly designed critic network that propagates the reward back to the generators.

Prior works have investigated retrieval-based partner personas (Zhang et al., 2018; Song et al., 2019). The human-constructed ground truth personas serve as the upper bound for such retrieval-based systems, and we argue that the ground truth personas are not diverse enough. We observe that the generative counterpart produces relevant, informative and coherent partner personas, which further diversify the succeeding dialogue response generation. A further advantage is that our framework does not need an external database to retrieve from (Madotto et al., 2020; Xu et al., 2021).

One work close to ours is a multi-task framework for meta-learning (Lee et al., 2021) that uses personas reconstruction as an auxiliary task to improve response consistency. The differences are that their model does not differentiate between self personas and partner personas, while ours does; their model focuses on meta-learning, while ours sets no such constraint; and theirs reports an improvement in personality consistency only, while ours focuses on diverse dialogue generation. We conduct an empirical comparison with their model by reconstructing the partner personas. Experimental results indicate that such a multi-task model does not work well in our problem setting.

We compare our proposed framework with competitive baselines. The automatic and human evaluation results indicate that our framework can generate even more diverse and engaging responses than the baseline conditioned on ground truth partner personas. This leads to the conclusions that i) partner personas generation is plausible; ii) the generated partner personas are more diverse than the ground truth partner personas; and iii) our framework produces even more diverse and engaging responses than our competitive baselines that condition on the ground truth partner personas.

2 Related Work

2.1 Personalized Dialogue Generation

Conditioning on personas helps produce informative and engaging responses. The largest multi-turn dialogue dataset conditioned on personal profiles is PersonaChat (Zhang et al., 2018), in which two crowdworkers converse and find out more about each other. To better utilise self personas for generating consistent responses, a number of methods have been proposed. Mazaré et al. (2018) employs a pre-training stage based on dedicatedly extracted large-scale persona-based dialogues and fine-tunes the model on PersonaChat. Zhao et al. (2019) fuses information from personas and dialogue context into individual contextualised representations by attending to different parts of both. Gu et al. (2019) exploits the interaction between personas, dialogue context and response to improve retrieval-based dialogue agents. Lee et al. (2021) utilises multi-task learning for improved personality consistency in the meta-learning scenario. Gu et al. (2021) employs four different strategies for persona fusing, learning to use self personas and partner personas more effectively. There have also been several works based on GPT (Wolf et al., 2019). However, most of these prior works focus on exploiting self personas rather than partner personas, and they assume that the ground truth partner personas are guaranteed to exist.

2.2 Reinforcement Learning

Reinforcement learning (RL), or more specifically policy gradient methods (Williams, 1992), has been frequently adopted for both task-oriented dialogue agents (Roman Roman et al., 2020; Deng et al., 2021) and open-domain chitchat agents (Li et al., 2016; Saleh et al., 2019). It can either propagate a non-differentiable loss (Cai et al., 2019) or optimize an expert-defined reward such as ease of answering (Li et al., 2016). RL has also been applied in a scenario where a user simulator and a dialogue agent interact, with an expert reward function defined to score each generated response (Roman Roman et al., 2020).

Figure 1: An example of the inference flow that shows the generated partner personas and the incorporation of partner personas generation into response generation.

Figure 2: The reinforcement learning strategy, which directly backpropagates the response-related rewards from the critic network to the partner personas generator and the dialogue response generator.

3 Proposed Framework

We develop a novel framework composed of three major components, namely a partner personas generator, a dialogue response generator and a critic network used for reinforcement learning. Figure 1 depicts the inference flow of our setting. The input dialogue context with self personas is first fed into the partner personas generator. The generated partner personas are then concatenated with the dialogue context and the self personas as the input to the dialogue response generator. In the first stage, we train the partner personas generator and the dialogue response generator separately under supervised learning. During this training stage, we use the ground truth partner personas to train the dialogue response generator, and we replace them with generated partner personas in the inference stage. After the supervised learning stage, a second, reinforcement learning stage jointly optimizes both the partner personas generator and the dialogue response generator, as depicted in Figure 2. Such a framework has two advantages: i) the partner personas generator can be trained directly with a reward signal that is relevant to dialogue response generation, and ii) the dialogue response generator trained on ground truth partner personas can be further fine-tuned on the generated partner personas (Section 5.4 presents an ablation study on reinforcement learning that supports this claim). In particular, we employ a dedicatedly designed critic network that receives the generated partner personas and the generated dialogue response as input and outputs a reward measuring the relevance between them, which is propagated back to the generators.
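As a concrete illustration of this inference flow, here is a minimal sketch assuming both generators are GPT-2 checkpoints fine-tuned with the HuggingFace Transformers library; the checkpoint paths, separator strings, and prompt layout are illustrative assumptions rather than the paper's exact format.

```python
# Sketch of the inference flow in Figure 1 (checkpoint paths and separators are assumed).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
ppg = GPT2LMHeadModel.from_pretrained("./ppg-checkpoint")  # partner personas generator (hypothetical path)
drg = GPT2LMHeadModel.from_pretrained("./drg-checkpoint")  # dialogue response generator (hypothetical path)

def generate(model, prompt, max_new_tokens=60):
    """Greedy decoding of a continuation; sampling or beam search would also work."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens,
                                pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(output_ids[0, input_ids.shape[1]:], skip_special_tokens=True)

def respond(self_personas, dialogue_context):
    # Step 1: generate partner personas from self personas and dialogue context.
    partner_personas = generate(ppg, f"{self_personas} <sep> {dialogue_context} <partner>")
    # Step 2: condition the response generator on the generated partner personas.
    response = generate(drg, f"{self_personas} <sep> {partner_personas} <sep> {dialogue_context} <response>")
    return response, partner_personas
```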

3.1 Partner Personas Generation

A Seq2Seq neural network (Sutskever et al., 2014) is adopted as our partner personas generator for the task of partner personas generation (PPG). The concatenation of the dialogue context c and the self personas s is fed as input into the partner personas generator, which outputs approximated partner personas conditioned on the input by maximising the conditional likelihood

p̂ = argmax_p P_θ(p | c, s).

For training, the ground truth partner personas p* are used, and we train our generator with maximum likelihood estimation:

L_PPG(θ) = -log P_θ(p* | c, s).
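A minimal sketch of this supervised MLE step, assuming a GPT-2 causal language model where the prompt tokens are masked out of the loss; the learning rate and prompt format are illustrative assumptions.

```python
# Sketch of one supervised MLE step for the partner personas generator (PPG).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
ppg = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.Adam(ppg.parameters(), lr=1e-5)  # illustrative learning rate

def ppg_mle_step(self_personas, dialogue_context, gold_partner_personas):
    prompt_ids = tokenizer(f"{self_personas} <sep> {dialogue_context} <partner> ",
                           return_tensors="pt").input_ids
    target_ids = tokenizer(gold_partner_personas + tokenizer.eos_token,
                           return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, target_ids], dim=1)
    labels = input_ids.clone()
    labels[:, :prompt_ids.shape[1]] = -100     # ignore prompt positions in the loss
    loss = ppg(input_ids, labels=labels).loss  # negative log-likelihood of p* given (c, s)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```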

3.2 Dialogue Response Generation

We also adopt a Seq2Seq neural network for the task of dialogue response generation (DRG). The concatenation of the dialogue context c, the self personas s, and the partner personas p is fed as input into the dialogue response generator, which outputs an approximated dialogue response conditioned on the input by maximising the conditional likelihood

ŷ = argmax_y P_φ(y | c, s, p).

For training, the ground truth partner personas p* and the ground truth dialogue response y* are used, and we train our generator with maximum likelihood estimation:

L_DRG(φ) = -log P_φ(y* | c, s, p*).

We use the ground truth partner personas for training and the generated partner personas p̂ for inference.

3.3 Reinforcement Learning

We employ a critic network, the core of our reinforcement learning (RL) algorithm, to reward our reinforcement agents. We train a binary classifier as our critic network by extracting sub-training-instances (p_i, y_i, 1) of partner personas and responses from the training set, where the label 1 marks positive training samples. We then randomly sample two distinct positive sub-instances (p_i, y_i, 1) and (p_j, y_j, 1). Two negative samples can then be derived by mismatching them:

(p_i, y_j, 0) and (p_j, y_i, 0).
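A minimal sketch of this mismatch-based construction; the helper name and data layout are ours.

```python
import random

def build_critic_instances(pairs, num_negative_pairs):
    """pairs: list of (partner_personas, response) tuples from the training split.
    Matched pairs are labelled 1; crossed (mismatched) pairs are labelled 0."""
    instances = [(p, y, 1) for p, y in pairs]                # positive sub-instances
    for _ in range(num_negative_pairs):
        (p_i, y_i), (p_j, y_j) = random.sample(pairs, 2)     # two distinct positives
        instances.append((p_i, y_j, 0))                      # derived negatives
        instances.append((p_j, y_i, 0))
    random.shuffle(instances)
    return instances
```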

Thereafter, we fine-tune a binary classifier D as our critic on this training set by minimizing the following binary cross-entropy loss:

L_critic = -[ z log D(p, y) + (1 - z) log(1 - D(p, y)) ],

where the binary label z indicates whether the given response y is relevant to the given personas p. During the reinforcement learning stage, this classifier acts as a critic network that outputs a predicted label ẑ conditioned on the generated partner personas p̂ and the generated response ŷ. The predicted binary label is then converted to a reward r: r is positive when ẑ = 1 and negative when ẑ = 0. We then update our RL agents with the following policy gradients:

r ∇_θ log P_θ(p̂ | c, s)

for the partner personas generator (PPG), and

r ∇_φ log P_φ(ŷ | c, s, p̂)

for the dialogue response generator (DRG).
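The update below is a REINFORCE-style sketch of these gradients: it samples a persona and a response, asks the critic for a label, converts it to a reward of +1 or -1 (illustrative values), and scales the differentiable log-likelihoods of both generators by that reward. Prompt formats, separator tokens, and the helper functions are assumptions, not the paper's exact implementation.

```python
import torch

def sequence_logprob(model, tokenizer, prompt, continuation):
    """Differentiable sum of token log-probabilities of `continuation` given `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, cont_ids], dim=1)
    logits = model(input_ids).logits[:, :-1, :]                   # position t predicts token t+1
    logprobs = torch.log_softmax(logits, dim=-1)
    positions = torch.arange(prompt_ids.shape[1] - 1, input_ids.shape[1] - 1)
    token_lp = logprobs[0, positions, :].gather(1, cont_ids[0].unsqueeze(-1)).squeeze(-1)
    return token_lp.sum()

def sample(model, tokenizer, prompt, max_new_tokens=60):
    """Sample a continuation (no gradients needed here; it is re-scored above)."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(input_ids, do_sample=True, max_new_tokens=max_new_tokens,
                         pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(out[0, input_ids.shape[1]:], skip_special_tokens=True)

def rl_step(ppg, drg, critic, gen_tok, critic_tok, self_personas, context):
    """One REINFORCE-style update shared by both generators (optimizer steps omitted)."""
    ppg_prompt = f"{self_personas} <sep> {context} <partner> "
    personas = sample(ppg, gen_tok, ppg_prompt)
    drg_prompt = f"{self_personas} <sep> {personas} <sep> {context} <response> "
    response = sample(drg, gen_tok, drg_prompt)
    # The critic judges whether the generated personas and response are relevant to each other.
    enc = critic_tok(personas, response, return_tensors="pt", truncation=True)
    z_hat = critic(**enc).logits.argmax(dim=-1).item()
    reward = 1.0 if z_hat == 1 else -1.0                          # illustrative reward values
    # Surrogate loss whose gradient is r * grad log P(p_hat | c, s) + r * grad log P(y_hat | c, s, p_hat).
    loss = -reward * (sequence_logprob(ppg, gen_tok, ppg_prompt, personas)
                      + sequence_logprob(drg, gen_tok, drg_prompt, response))
    loss.backward()
    return reward
```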

We particularly want to give positive rewards to our RL agents when they give high-quality responses that differ from the ground truth. Since it is not straightforward to understand the underlying motivation for such a critic network, we divide it into two cases and conquer each of them:

  • The critic network outputs ẑ = 0: the generated personas and response are judged irrelevant to each other, and we assign a negative reward.

  • The critic network outputs ẑ = 1: the generated personas and response are judged relevant to each other, and we assign a positive reward.

The first case is trivial: it is reasonable to assign a negative reward when at least one of our RL agents generates an output far from the ground truth. The second case covers not only the trivial situation in which both agents produce ground-truth-like generations, but also the situation in which the partner personas generator and the dialogue response generator both generate relevant outputs that are not the exact ground truth. Maximum likelihood estimation might fail to capture this, as such outputs can still be far from the ground truth. In contrast, our critic network captures it by outputting ẑ = 1 and assigning both of our RL agents a positive reward. We design such a dedicated reward mechanism to encourage the generator to produce diverse and engaging responses from the diverse partner personas generated. We present a case study in Section 5.2 that illustrates this advantage.

Previous work (Cai et al., 2019) employed a critic network for backpropagating a reinforcement loss. At first glance, our usage of the critic network shares some resemblance with theirs, but the underlying motivation differs substantially. The major difference is that their critic is trained in an adversarial manner (Li et al., 2018) to pick out the ground truth response among negative candidates. Moreover, their critic network conditions only on the dialogue response and not on the generated skeleton. In contrast, we further diversify response generation with a classifier that conditions on both the generated personas and the generated response.

Model PPL ROUGE METEOR Distinct-1 Distinct-2
E2E w/o Partner Personas
E2E w/ Training Partner Personas
Multi-task Learning (Lee et al., 2021)
Our Framework w/ RL PPG
Our Framework w/ RL DRG
Our Framework w/o RL
Our Framework w/ RL PPG&DRG
Table 1: Test results on the PersonaChat dataset. Lower is better for perplexity (PPL); higher is better for the remaining metrics. PPG denotes partner personas generation and DRG denotes dialogue response generation. The corresponding validation performance is given in Appendix A.

4 Experimental Setup

4.1 Dataset

We conduct all our experiments on PersonaChat (Zhang et al., 2018), the largest multi-turn dialogue dataset conditioned on persona profiles. We follow the training/validation/testing split from the ParlAI platform (Miller et al., 2017), which contains about 65,000 training instances, about 7,800 validation instances and about 7,500 testing instances. For the reinforcement learning described in Section 3.3, we collect about 130,000 training instances from the training partition, with equally distributed positive and negative samples, to train our critic network.

Figure 3: Change of the validation perplexity for our proposed framework during RL.

4.2 Baselines and Comparison Models

End-to-end Baseline without Partner Personas

Our first baseline is an end-to-end response generator without any partner persona information throughout the experiment. We note that it could be unfair to compare our proposed framework directly with this baseline, as our framework uses ground truth partner personas in the training stage. Since this baseline does not use the same amount of training information as our proposed framework, we offer a second baseline trained with the ground truth partner personas.

End-to-end Baseline with Training Partner Personas

Our second baseline is an end-to-end response generator that is trained using ground truth partner personas. In the inference stage, we feed the concatenation of self personas and dialogue context as the input. For the sake of fairness, it uses the same amount of training and inference information as our proposed framework.

Multi-task Learning Comparison Model

Following prior work (Lee et al., 2021), we build a multi-task learning comparison model with partner personas generation as an auxiliary task. The model is trained to maximise the sum of the log-likelihood of the partner personas generation labels, log P(p* | c, s), and the log-likelihood of the dialogue response generation labels, log P(y* | c, s). Both tasks are conditioned on the dialogue context and self personas, sharing the same model parameters. We maximise the weighted objective

J = log P(y* | c, s) + λ log P(p* | c, s),

where λ is a loss weighting parameter that we tune on the validation set.
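A minimal sketch of this weighted multi-task objective; the batch field names and the choice of which term λ scales are assumptions.

```python
def multitask_loss(model, batch, lam):
    """Single shared model trained on both tasks; `lam` is the loss weighting
    parameter tuned on the validation set (which term it scales is an assumption)."""
    ppg_loss = model(batch["ppg_input_ids"], labels=batch["ppg_labels"]).loss  # -log P(p* | c, s)
    drg_loss = model(batch["drg_input_ids"], labels=batch["drg_labels"]).loss  # -log P(y* | c, s)
    return drg_loss + lam * ppg_loss  # minimising this maximises the weighted log-likelihoods
```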

4.3 Implementation Details

For all of our baselines, the comparison model, the partner personas generator and the dialogue response generator, we use pre-trained GPT-2 (Radford et al., 2019) to initialize the model parameters. For the supervised phase, we use Adam (Kingma and Ba, 2014) as our optimizer. We fine-tune for 2 epochs on all the baselines, comparison models and our proposed framework modules and select the models with the lowest validation perplexity. For the RL phase, we again use Adam as our optimizer, update the model parameters after a fixed number of training instances and validate the model performance at regular intervals of updates. For our critic network for reward judgement in the RL phase, we use DistilBERT (Sanh et al., 2019) to initialize the model parameters and Adam as the optimizer, and we fine-tune the critic for 1 epoch on the original training split of PersonaChat. We conduct all our experiments with the Transformers library from Huggingface (Wolf et al., 2020).
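For concreteness, a minimal sketch of initialising and fine-tuning the critic as a two-class DistilBERT classifier with the Transformers library; the learning rate shown is an illustrative value, not the paper's reported setting.

```python
import torch
from transformers import DistilBertForSequenceClassification, DistilBertTokenizerFast

critic_tok = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
critic = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)                 # relevant vs. irrelevant
optimizer = torch.optim.Adam(critic.parameters(), lr=2e-5)    # illustrative learning rate

def critic_step(partner_personas, response, label):
    """One supervised step on a (personas, response, label) critic instance."""
    enc = critic_tok(partner_personas, response, return_tensors="pt", truncation=True)
    loss = critic(**enc, labels=torch.tensor([label])).loss    # cross-entropy over the two classes
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```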

4.4 Evaluation Metrics

We report perplexity as an intrinsic evaluation against the ground truth response (Roller et al., 2020). We report distinct-1 and distinct-2 (Li et al., 2015) to evaluate diversity; they calculate the ratio of distinct unigrams/bigrams to the total unigrams/bigrams generated. We report ROUGE (Lin, 2004) and METEOR (Banerjee and Lavie, 2005) as extrinsic evaluations.
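Distinct-n is straightforward to compute; a minimal reference sketch (whitespace tokenisation is an assumption):

```python
def distinct_n(generations, n):
    """Ratio of unique n-grams to total n-grams over a list of generated texts."""
    total, unique = 0, set()
    for text in generations:
        tokens = text.split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / max(total, 1)
```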

5 Results

Recall our four claims: a) Our framework generates more diverse partner personas than the ground truth partner personas. b) Partner personas generation benefits the succeeding response generation, thus surpassing our baselines. c) Our framework generates more diverse and engaging responses than our competitive baseline that uses ground truth partner personas for both training and inference. d) We employ reinforcement learning with a dedicatedly designed critic network that effectively boosts our framework. Also, recall the three hypotheses that motivate our work: i) Partner personas generation is plausible. ii) Generated personas are more diverse and interesting than the retrieved ground truth. iii) Generated personas boost the succeeding dialogue generation. In the remainder of this section, we verify these claims and hypotheses.

5.1 Main Results

The main results are presented in Table 1.

Our Framework w/o RL

Even when trained without the reinforcement learning algorithm, our framework surpasses all our baselines and the comparison model. This serves as evidence for our claim b) and hypothesis iii). It suggests that our framework parameterises the training partner personas more efficiently than our E2E baseline trained with ground truth partner personas. Moreover, our framework relaxes the constraint that the main task and the auxiliary task must share a similar nature, which is unrealistic in our case.

Model Distinct-1 Distinct-2
Validation Set
Ground Truth Label
Our Generator
Test Set
Ground Truth Label 0.003267 0.03856
Our Generator
Table 2: Diversity evaluation for the generated partner personas against the ground truth partner personas.
Model Distinct-1 Distinct-2
E2E w/ Full
Our Framework
Table 3: Test-set diversity evaluation for dialogue generation, comparing our framework against the competitive E2E baseline with ground truth partner personas. The corresponding validation performance is given in Appendix A.

Our Framework w/ RL PPG&DRG

Our proposed framework performs best when trained with reinforcement learning. Again, this strengthens our claim b) and hypothesis iii), and it serves as direct evidence for our claim d). It also implicitly verifies our hypothesis i), because we train our critic network using ground truth partner personas: since the reinforcement learning algorithm is effective, our critic network produces valid rewards, which indicates that the generated partner personas are reasonable. Furthermore, we present the change of validation performance for dialogue generation and partner personas generation in Figure 3 (the result is scaled for the sake of space and clarity). We observe that both performances improve during the RL stage. Therefore, the improvement in partner personas generation is closely related to the improvement in the succeeding dialogue generation; see claim b) and hypothesis iii).

Dialogue Context Generated Personas Ours E2E w/ Full
Hi, want to go shopping with me? I love to shop. I drive BMW. Sure! I love shopping. I am a bit envious. I am a student at a small community college.
I am just happy that the apartment is quiet for once. I live in an apartment. I live in an apartment. I like to watch TV. I live in California.
I just went hiking also tired. I love the outdoors. I am sorry to hear that. Do you like to travel? I do not know. I am not very good at it.
Table 4: A case study that compares our framework against the baseline that conditions on the ground truth partner personas. We denote the latter one as E2E w/ Full. For the sake of clarity and space, we present the preceding partner utterance as dialogue context, and we give the most salient generated partner personas.
Criteria E2E w/ Full Ours
Appropriateness
Informativeness 31
Engagingness
Human-likeness
Table 5: Human evaluation results in winning percentages. Marked results pass a two-tailed binomial significance test.

End-to-end (E2E) Baseline Models

Our E2E baseline with training partner personas has a better perplexity and worse extrinsic scores than the E2E baseline without partner personas. This might be due to the training-inference discrepancy, which could significantly impact the extrinsic evaluations.

Multi-task Learning Comparison Model

Our multi-task learning comparison model produces inferior results. This is, however, unsurprising. First, the prior work (Lee et al., 2021) constrained itself to a meta-learning framework. More importantly, the nature of partner personas generation and dialogue response generation differs considerably: partner personas always begin with first-person sentence starters, while dialogue responses are more general, ranging from greetings to goodbyes.

5.2 Case Study on Dialogue Response Generation

Table 4 depicts the case study for response generation. In the first case, our partner personas generator plausibly imagines that a person who likes shopping could be wealthy and drive a luxury car, which is not in the ground truth personas. This is followed by a surprising response, 'I am a bit envious'. In the second case, our personas generator correctly identifies that the partner lives in an apartment. The response generator subsequently gives a relevant response and reduces the undesired hallucination produced by the baseline model ('I like to watch TV' is in the self personas, but 'I live in California' is not; the latter is thus a hallucination). In the third case, our partner personas generator generates the partner persona 'I love the outdoors', and the response generator then produces a relevant response that also expresses empathy. These observations support our underlying hypotheses i), ii), and iii). Furthermore, as in Table 3, we compare our proposed framework with an end-to-end dialogue agent given both training and inference ground truth partner personas. For these cherry-picked examples, our framework generates more informative and engaging responses than this competitive baseline, which supports our claim c) and hypothesis iii).

Generated Partner Personas Ground Truth Partner Personas
I am working on a biology degree. I love book. 1984 is my favourite book. I am in college. I am a student. I attend university and study biology. I am very studious and do not like to party or drink. I want to be marine biologist.
I have been married for 6 years. I am a financial analyst for a brewery. I like to go to the casino on weekends. I am a carpenter. I like playing poker. I do not have many friends. I have a wife and three kids.
I like to watch football. My friends like watching it too. Its great fun. We drink beer and eat food. I love watching football on Sunday. I have three dogs. My favroutie food is cheese piazza. I am a hair stylist.
Table 6: A case study to show that our generated personas are relevant, informative and coherent. For the sake of space, we present more cases in Appendix B.

5.3 Human Evaluation

We hired experienced annotators holding degrees relevant to English Linguistics or Applied Linguistics. We present a questionnaire composed of 280 questions over 70 randomly sampled testing instances. Three annotators compare model outputs in an A/B setting. As in previous work (Zou et al., 2021) and ACUTE-Eval (Li et al., 2019), annotators follow the criteria:

  • (Appropriateness): "Who is more appropriate given the previous dialogue context?"

  • (Informativeness): "Who is more diverse instead of null answers such as I do not know?"

  • (Engagingness): "Who would you prefer to talk with for a long conversation?"

  • (Human-likeness): "Which speaker do you think sounds more like a real person?"

Table 5 presents the human evaluation results. Our framework trained under reinforcement learning significantly surpasses the end-to-end model that leverages both training and inference ground truth partner personas on all four aspects. This supports our claims c) and d).

5.4 Ablation Study on Reinforcement Learning

Table 1 presents an ablation study on the framework when only one of the modules, namely the partner personas generator (PPG) or the dialogue response generator (DRG), is trained under reinforcement learning. Our full framework exceeds these two variants on all metrics except perplexity, which aligns with prior work (Roller et al., 2020) observing that PPL does not always correlate well with other metrics.

5.5 Case Study on Partner Personas Generation

As shown in Table 2, our partner personas generator generates more diverse partner personas than the ground truth partner personas labels, which are essentially the upper bound for a retrieval-based partner personas predictor. This verifies our claim a) and hypothesis i), indicating that our generator produces even more informative and interesting partner personas than the ground truth.

As shown in Table 6, our partner personas generator generates plausible partner personas that are relevant to the ground truth partner personas. It can also produce fascinating yet reasonable speculation that appears in neither the dialogue context nor the ground truth partner personas. In the first case, the generator correctly identifies the partner as a student studying biology. In the second case, the generator recognises the partner as being married, which is not even mentioned in the dialogue context; this is likely because personas that frequently co-occur in the training set become semantically close to each other. In the third case, the generator produces diverse personas, saying that the partner would drink beer and eat food while watching football, which is in neither the dialogue context nor the ground truth partner personas. This verifies claim a) and hypotheses i) and ii). We present more case studies in Appendix B to show that our personas generator produces informative and coherent partner personas. (Our partner personas generator is even capable of producing unseen personas; an offensiveness check is therefore necessary for actual deployment, as in prior work (Baheti et al., 2021).)

6 Conclusion

Our novel framework incorporates partner personas generation into the succeeding dialogue response generation. First of all, our proposed framework mitigates the cold-start problem in practical applications where ground truth partner personas may be missing during inference. The experimental results with both automatic and human evaluation demonstrate that our framework generates partner personas that are informative and coherent, yet still reasonable and relevant, even compared to the ground truth partner personas. This enhances the succeeding response generation, surpassing our baselines and producing responses that are more diverse and engaging than those of our baseline conditioned on the ground truth partner personas. We employ reinforcement learning with a dedicatedly designed critic network that boosts our framework. Extensive case studies demonstrate that our framework can generate satisfying dialogue responses and partner personas. Finally, our framework gives better explainability and reduces the demands for external databases of partner personas.

References

Appendix A Validation Performance

Table 8 presents the corresponding validation performance to Table 1. Table 7 presents the corresponding validation performance to Table 3.

Model Distinct-1 Distinct-2
E2E w/ Full
Our Framework
Table 7: Validation diversity evaluation for dialogue generation that compares our framework against the competitive E2E baseline with ground truth partner personas, as promised in Table 3.

Appendix B More Case Studies

Table 9 presents extensive case studies for partner personas generation. These examples indicate that our framework can generate informative and coherent partner personas. We highlight in pink for informativeness and in yellow for coherence.

Model PPL ROUGE METEOR Distinct-1 Distinct-2
E2E w/o Partner Personas
E2E w/ Training Partner Personas
Multi-task Learning (Lee et al., 2021)
Our Framework w/ RL PPG
Our Framework w/ RL DRG
Our Framework w/o RL
Our Framework w/ RL PPG&DRG
Table 8: Corresponding validation performance, as promised in Table 1.
Generated Partner Personas
Personas A: I am a shy person, but I love to sing. Until recently, I ve never been able to sing in front of anyone. Anyways, I decided to give it a try and participaed in an audition for a talent show. My shyness made me panick and I didn t show up.
Personas B: I play the violin. I am married with 5 kids. I am nurse. I met my husband when I was a freshman in college.
Personas C: I am a soccer player. I am a goalie. My number is 42. Nike cleats are my favorite. I joined a new team last month.
Personas D: I have two kids, ages 2 and 6. I am from sterling heights, michigan. My favorite movie is titanic. I work part time at aldis. My husband owns a small auto repair shop.
Personas E: I am a retired computer programmer. I have one grandson and one daughter. I just turned 77. I love animals. I like watching british tv shows and movies.
Personas F: I like to go hunting. I like to remodel homes. I like to shoot a bow. My favorite holiday is halloween. I like to go shopping with my daughters.
Personas G: I have a large cd collection. I collect stamps. Favorite band is the beetles. I play the bass. I like vintage furniture.
Personas H: I have a daughter. I am a yoga instructor. I enjoy shopping. I have two adopted kids.
Personas I: I like to drink wine. I enjoy reading history books. I am a teacher. I love to write stories while sitting in the grass in my back yard. I grew up in new hampshire.
Personas J: I am retired. I stay active. I have eight grandchildren. I have good health.
Personas K: I m a student. I like to go out to eat. I like listening to other rap music too. One of my favorite artists is drake. A hobby of mine is the drums. I also enjoy cooking .
Personas L: I have two children. I like to go on walks. I am from mexico. I used to be a chef, but I am a teacher now. I like to bake.
Personas M: I drive a prius. My mom stays at home. I was adopted when I was a baby. My adopted dad works at hp.
Personas N: I love youtube. My father works in advertising agency. I have my own channel. I enjoy making let s plays.
Personas O: I like to cook. I am a foodie. I love to chat with my friends. I love kids and dogs. I like to go shopping with my daughters.
Personas P: My family hates my fiance. We will be traveling to niagra falls for our honeymoon. We are getting married in a park. My dog is the ring bearer.
Table 9: Case studies as promised in Table 6. We highlight in pink for informativeness and in yellow for coherence.