Know Deeper: Knowledge-Conversation Cyclic Utilization Mechanism for Open-domain Dialogue Generation

07/16/2021 ∙ by Yajing Sun, et al. ∙ 6

End-to-end neural dialogue systems suffer from generating inconsistent and repetitive responses. Existing dialogue models focus on unilaterally incorporating personal knowledge into the dialogue, while ignoring the fact that incorporating personality-related conversation information back into the personal knowledge, as a bilateral information flow, boosts the quality of the subsequent conversation. Besides, it is indispensable to control personal knowledge utilization at the conversation level. In this paper, we propose a conversation-adaption multi-view persona-aware response generation model that aims at enhancing conversation consistency and alleviating repetition in two ways. First, we consider conversation consistency from multiple views. From the view of the persona profile, we design a novel interaction module that not only iteratively incorporates personalized knowledge into each turn of the conversation but also captures personality-related information from the conversation to enhance the semantic representation of the personalized knowledge. From the view of speaking style, we introduce a speaking style vector and feed it into the decoder to keep the speaking style consistent. Second, to avoid conversation repetition, we devise a coverage mechanism that keeps track of the activation of personal knowledge utilization. Both automatic and human evaluations verify the superiority of our model over previous models.





Generating human-level conversation by machine has been a long-standing goal of artificial intelligence. Because a large amount of human conversational data is available, sequence-to-sequence models and their extensions (sutskever2014sequence; bahdanau2014neural; DBLP:journals/corr/VinyalsL15; shang2015neural) have been widely adopted to learn generative conversational models. These models incorporate rich knowledge into the dialogue context and generate diverse responses. Although great progress has been made, common issues still exist in dialogue systems. First, the dialogue system is incapable of adapting to users with different personalities, which leads to inconsistent dialogue. Second, the model tends to generate repetitive but meaningless content (DBLP:journals/corr/SerbanLCP16) because the same external knowledge is incorporated and utilized multiple times (bao2019know).

Presenting a consistent persona helps a dialogue system gain the trust of its users and keep them engaged. In recent years, several approaches have been developed to generate consistent responses. There are two ways to construct personalized neural conversation models. The first is implicit: li2016deep represents user persona knowledge as a vector that implicitly captures the speaking style of the speaker and feeds it into the decoder. Such models are expensive to train because they need large quantities of conversational data labeled with user identifiers. The second is explicit: some models generate personality-coherent responses from explicit personal profiles, given in either structured (DBLP:journals/corr/abs-1902-04911) or textual (DBLP:journals/corr/abs-1801-07243) form. Persona information, which covers two aspects, persona-profile consistency and speaking style consistency, can greatly improve the consistency and interactivity of the dialogue.

However, most existing methods use only a unilateral information flow, in which personal knowledge flows into the conversation so that the model can select and incorporate personal knowledge into the dialogue. In fact, leveraging the bilateral information flow between conversation and personal knowledge is of great importance for conversation quality. Figure 1 illustrates the reasons with an example. First, at the second turn, since the model incorporates the personal knowledge "I have four children" into the dialogue, the response is personality-consistent. Second, the personality-related information "from 10 to 21" in the conversation is of great importance for broadening the personal knowledge, which benefits subsequent conversation. Moreover, we can see that for the -th turn of the dialogue, the conversation history is closely related to the topic of children. If the model lacks effective control over knowledge utilization at the level of the whole conversation, it can easily select "I have four children" again, which leads to repetition and fails to open up richer topics that would make the conversation more interactive. Intuitively, we can conclude that: (1) The quality of dialogue systems can be improved if the model considers the information interaction between personal knowledge and conversation. (2) It is also indispensable to keep track of the activation of each piece of personal information and to balance semantic relevance against repetition. (3) The partner's persona profile also influences the model's response generation.

As a result, we propose a serialized persona-aware response generation model to address the personality inconsistency and repetition problems. We address conversation consistency from two views. On the one hand, we design a novel interaction module that not only fuses personalized knowledge into the conversation but also captures personality-related information from the conversation to enhance the semantic representation of the personalized knowledge, which benefits the subsequent conversation. Moreover, we keep track of the activation of personal information to avoid repetition. On the other hand, we model the speaking style based on both speakers' persona profiles and feed it into the decoder to generate responses that are consistent in both profile and speaking style.

The contributions of this work are summarized as follows:

  • To keep the dialogue consistent, we divide conversation consistency into profile consistency and speaking style consistency. We model the bilateral semantic information flow between personal knowledge and conversation to keep profile consistency, and we devise a speaking style vector and incorporate it into the decoder to maintain speaking style consistency.

  • To avoid conversation repetition, we introduce a coverage mechanism that keeps track of the activation of knowledge utilization, so that semantic relevance and repetition are balanced when we incorporate personal knowledge into every turn of the conversation.

  • Extensive experiments have been carried out on the ConvAI2 and CMUDoG datasets. The experiments demonstrate that our model outperforms the existing methods in keeping the dialogue consistent and alleviating repetition.

Figure 2: Model Overview. There are three parts: the encoder module, the interaction module, and the decoder module. The network takes the context C and the two speakers' knowledge sentences as inputs and generates appropriate responses. In each step of the interaction module, the module updates the persona A information and generates a coverage vector, which is used to avoid repetition.

Related Work

Sequence-to-sequence models and their extensions (sutskever2014sequence; bahdanau2014neural) have been widely adopted to learn generative conversational models from large-scale social conversation data. In recent years, modeling personality-consistent dialogue systems has drawn increasing attention. The first attempt to model persona is li2016deep, which uses learned persona embeddings to capture the users' background information and speaking style in order to keep responses consistent. DBLP:journals/corr/abs-1710-07388 integrates participant role and context information into an LSTM. These works depend crucially on the availability of large amounts of speaker-specific conversational data, which are expensive and cannot be obtained in many domains. Besides, some researchers have attempted to use multi-task or transfer learning to model personalized dialogue systems. DBLP:journals/corr/abs-1710-07388 proposed a multi-task learning approach that utilizes both conversation data across speakers and other types of data pertaining to the speaker and the speaker roles to be modeled.

Other methods, namely explicit personalization approaches, endow dialogue models with a persona described by natural language sentences or triples. DBLP:journals/corr/QianHZXZ17 constructed structured personality knowledge and assigned a desired identity to a chatbot. DBLP:journals/corr/abs-1801-07243 contributed the Persona-Chat dataset, which provides text-described personas, and further proposed both ranking and generative models. DBLP:journals/corr/abs-1902-04911 focuses on selecting appropriate profile knowledge.

DBLP:journals/corr/SerbanLCP16 described the repetition problem in dialogue systems. li2016deep used reinforcement learning to model future rewards that capture the conversational property of non-repetitive turns. Another work proposed that a good conversation needs to avoid repetition, make sense, and stay fluent, and that these properties must be balanced well; it defines five n-gram based decoding features to identify repeated bigrams and content words. bao2019know proposed a Generation-Evaluation framework to control knowledge selection via reinforcement learning.

However, these methods only incorporate personality information into the conversation to keep it consistent. None of the existing models incorporates the personality-related information of the conversation back into the personal knowledge, which benefits the subsequent conversation. Our work therefore focuses on building a multi-turn dialogue system that models the bilateral information flow between conversation and personal knowledge to keep the conversation consistent.


Our sentence encoder module is based on a Bi-GRU with an attention mechanism. Specifically, given a sentence represented as a sequence of word embeddings, we first run a Bi-GRU (cho-etal-2014-learning) to capture the contextual information. At each time step, the GRU recurrently computes the current hidden vector from the word embedding at that step and the hidden vector of the previous step, in both the forward and the backward direction.

Then we apply an attention mechanism to calculate a sentence-level contextual representation, which focuses on the important semantic information in the sentence. We refer to the whole module as the sentence encoder.
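The attention step of the sentence encoder can be illustrated with a small sketch. The exact scoring function is not recoverable from the text above, so this example assumes dot-product scoring against a query vector; the names `attention_pool` and `query` are hypothetical, and plain Python lists stand in for the Bi-GRU hidden states.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Pool per-word hidden states into one sentence vector.

    Assumption: each hidden state is scored by its dot product with
    `query`; the paper's own scoring function may differ.
    """
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return pooled, weights
```

With a zero query every word receives the same weight, so the pooled vector reduces to the mean of the hidden states; a learned query shifts the weight toward the more informative words.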


Before presenting the model, we first provide the problem formulation. Each example in the dataset contains the personal knowledge of the two speakers and a conversation context. The personal knowledge of a speaker is a set of knowledge sentences, each of which is a sequence of words. The conversation context is a sequence of utterances, and the current question is located in the last turn. Our goal is to generate a consistent and diverse answer.

Model Overview

The framework is illustrated in Figure 2. It consists of three components, summarized as follows:

  • The encoder module encodes the conversation context and the personal knowledge into semantic representations, with the aim of capturing their important contextual information. The module also calculates the speaking style vector, which is fed into the decoder.

  • The interaction module is responsible for incorporating personal knowledge into the conversation context, updating the knowledge state, and enhancing the semantic representation of the knowledge. The module defines a coverage vector to keep track of the activation of personal knowledge utilization, which gives the model the capability to avoid repetition. The persona-aware history representation supports consistency, and the updated personal representation extracts personality-related semantic information from the conversation, which benefits the subsequent conversation.

  • The decoder module generates consistent and diverse responses based on the persona-aware context vector and the speaking style user vector.

Encoder Module

Understanding the context and scene of the conversation is crucial for a dialogue system. The encoder module is responsible for extracting important contextual information from the sentences to enhance dialogue understanding. It consists of two parts: the first encodes the context into a dense semantic representation using the sentence encoder, and the second calculates a speaking style vector based on the personal knowledge of both parties, since speaking style consistency is also important.

The sentence encoder consists of a Bi-GRU component and a self-attention component. Specifically, we encode the personal knowledge of both speakers and the conversational context using the sentence encoder.

Considering the importance of speaking style, we implement a speaking style encoder that calculates a user style vector based on both speakers' personal knowledge.

The speaking style vector is then linearly incorporated into the decoder at each step.

Interaction Module

It is indispensable to leverage the bilateral information flow between conversation and personal knowledge. Incorporating personal knowledge into the dialogue promotes dialogue consistency, and keeping track of the state of knowledge usage at the conversational level can also alleviate repetition. Moreover, it is important to fuse persona-related conversation information into the persona semantic representation, which leads to diverse and consistent subsequent conversation.

Based on this, we design a serialized persona-conversation interaction module that recurrently updates the personal knowledge at the utterance level and progressively incorporates it into the history, step by step. For the sake of semantic relevance, we first use an attention mechanism to calculate a persona-aware history representation and a history-aware persona representation at the turn level, based on the history and the personal knowledge. To avoid repeated use of the same personal knowledge, we propose a coverage vector that keeps track of the utilization of persona knowledge, and we design a gate mechanism to obtain a persona-aware history at the conversational level, which is sensitive to the knowledge utilization state. We then fuse the two history representations of different granularity to balance semantic relevance and repetition. Finally, we consider the sequential information between different history turns and use a hierarchical recurrent mechanism to calculate the final history representation.

The module concentrates on the interaction between the personal knowledge and the conversation in every turn. The semantic representation and the state representation of the personal knowledge are recurrently updated based on the history information.

We represent the personal knowledge from two aspects: the semantic representation and the utilization state representation, each with its own initial value. The initial state representation is a zero vector, which means that no knowledge has been used at the beginning.

First, since dialogue understanding is closely related to the personal knowledge, we compute the persona-aware history representation for the sake of semantic relevance.


To address repetition, we additionally incorporate the coverage vector, in contrast to the persona-aware history representation above, which is based on semantic relevance alone. Because the coverage vector records the history of knowledge utilization, it discourages attention to knowledge that has been heavily attended in past turns while implicitly pushing attention toward the less attended personal knowledge.


Because the coverage-aware representation accounts for knowledge utilization in the past conversation, it can be viewed as the conversation-level persona-aware history representation, whereas the representation computed from semantic relevance alone is the turn-level persona-aware history representation.

Finally, we take both the consistency and the repetition of the dialogue into consideration and fuse the two representations of different granularity.


At every turn, we iteratively update the coverage vector by accumulating the attention weights generated by Equation (13), which is straightforward but effective.
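The coverage idea described above can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's formulation: the coverage value of each knowledge item is subtracted from its relevance score before the softmax, scaled by a hypothetical `penalty` factor, and the coverage vector is then updated by adding the attention weights of the current turn. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def coverage_attention(relevance_scores, coverage, penalty=1.0):
    """Attention over knowledge items that discounts already-used items.

    Assumption: coverage enters as a subtractive penalty on the scores.
    """
    adjusted = [s - penalty * c for s, c in zip(relevance_scores, coverage)]
    return softmax(adjusted)


def update_coverage(coverage, attention_weights):
    """Accumulate this turn's attention weights into the coverage vector."""
    return [c + a for c, a in zip(coverage, attention_weights)]


def run_turns(per_turn_scores, penalty=1.0):
    """Run several turns, starting from a zero coverage vector."""
    coverage = [0.0] * len(per_turn_scores[0])
    history = []
    for scores in per_turn_scores:
        weights = coverage_attention(scores, coverage, penalty)
        history.append(weights)
        coverage = update_coverage(coverage, weights)
    return history, coverage
```

If the same relevance scores are presented at every turn, the weight on the most relevant knowledge item drops from one turn to the next, because its accumulated coverage grows fastest. This is the behavior that keeps a persona sentence such as "I have four children" from being selected repeatedly.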


Then we incorporate the persona-related history information into the personal information to enhance its semantic representation.


Similar to DBLP:conf/aaai/XingWWHZ18, we adopt a hierarchical recurrent network to capture sequential contextual information at the conversational level. Specifically, the sequence of per-turn history representations is fed to a GRU with an attention mechanism, which gathers the important information from the history into a single vector. This vector is the final history representation.

Dataset             Model      BLEU-1/2       Distinct-1/2   Knowledge R/P/F1
ConvAI2 (original)  Seq2Seq    0.3844/0.3046  0.0052/0.0191  0.007/0.0401/0.0115
ConvAI2 (original)  KG-Net     0.4264/0.3342  0.0055/0.0241  0.0103/0.0544/0.0167
ConvAI2 (original)  Our model  0.4118/0.3268  0.0165/0.0708  0.0066/0.0349/0.0107
ConvAI2 (revised)   Seq2Seq    0.3572/0.2828  0.0051/0.0173  0.0025/0.0139/0.0041
ConvAI2 (revised)   KG-Net     0.4534/0.3569  0.0041/0.0177  0.0035/0.0176/0.0057
ConvAI2 (revised)   Our model  0.4163/0.3300  0.0081/0.0320  0.0041/0.0196/0.0066
CMUDoG              Seq2Seq    0.2275/0.1822  0.0066/0.0193  0.0006/0.0165/0.0010
CMUDoG              KG-Net     0.2632/0.2102  0.0145/0.0462  0.0020/0.0467/0.0038
CMUDoG              Our model  0.2322/0.1884  0.0465/0.2112  0.0067/0.1155/0.0122
Table 1: Automatic evaluation results of the different models on the ConvAI2 and CMUDoG data. There are two settings for the ConvAI2 data: conditioned on the speaker's original persona, or on a revised persona that has no word overlap with it.

Decoder Module

The decoder generates the response based on the persona-aware history and the speaking style user representation. We adopt the hierarchical gated fusion unit (HGFU) decoder (DBLP:journals/corr/abs-1902-04911) to incorporate the speaking style into response generation. It consists of three components: the standard GRU calculates the hidden state for the last generated word, the persona-style GRU encodes the hidden representation for the speaking style, and the fusion unit uses a gate mechanism to fuse the two and produce the hidden state of the decoder at the current time step.
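The gating idea behind the fusion unit can be sketched as follows. This is a simplified illustration and not the HGFU definition from the cited work: it assumes one sigmoid gate per hidden dimension, computed from the concatenation of the two candidate states, and the names `gated_fusion`, `gate_weights` and `gate_bias` are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(h_word, h_style, gate_weights, gate_bias):
    """Fuse two hidden states with a learned per-dimension gate.

    h_word:       state of the GRU that reads the last generated word
    h_style:      state of the GRU that reads the speaking style vector
    gate_weights: one weight row per output dimension, each of length
                  len(h_word) + len(h_style)  (assumed parameterization)
    gate_bias:    one bias per output dimension
    """
    concat = list(h_word) + list(h_style)
    fused = []
    for d in range(len(h_word)):
        z = sum(w * x for w, x in zip(gate_weights[d], concat)) + gate_bias[d]
        gate = sigmoid(z)
        # gate near 1 keeps the word state, gate near 0 keeps the style state
        fused.append(gate * h_word[d] + (1.0 - gate) * h_style[d])
    return fused
```

With all gate parameters at zero the gate is 0.5 and the fused state is the average of the two inputs; training moves the gate so that each dimension leans on whichever source is more useful at that decoding step.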

Then we generate the next word according to the hidden state of the decoder, using a nonlinear function that outputs the probability of each candidate word.


Finally, we use the negative log-likelihood (NLL) loss to quantify the difference between the true response and the response generated by the model, and training minimizes this loss.
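As a small worked example of this objective, the sketch below computes the NLL of one reference response from per-step output distributions. The function name and the toy two-word vocabulary are illustrative assumptions; a real implementation would operate on batched tensors.

```python
import math


def sequence_nll(step_distributions, target_ids):
    """Negative log-likelihood of a reference response.

    step_distributions[t] is the decoder's probability distribution over
    the vocabulary at step t, and target_ids[t] is the index of the
    reference word at that step.
    """
    return -sum(
        math.log(dist[target])
        for dist, target in zip(step_distributions, target_ids)
    )


# Two decoding steps over a toy vocabulary of two words.
loss = sequence_nll([[0.5, 0.5], [0.25, 0.75]], [0, 1])
```

Here the loss is -(log 0.5 + log 0.75): the more probability the decoder assigns to each reference word, the smaller the loss.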

Dataset             Model               BLEU-1/2       Distinct-1/2   Knowledge R/P/F1
ConvAI2 (original)  Our model           0.4118/0.3268  0.0165/0.0708  0.0066/0.0349/0.0112
ConvAI2 (original)  w/o speaking style  0.3944/0.3117  0.0079/0.0296  0.0058/0.0325/0.0057
ConvAI2 (original)  w/o knowledge       0.4091/0.3243  0.0068/0.0253  0.0054/0.0287/0.0089
ConvAI2 (revised)   Our model           0.4163/0.3300  0.0081/0.0320  0.0041/0.0196/0.0066
ConvAI2 (revised)   w/o speaking style  0.4102/0.3249  0.0074/0.0282  0.0035/0.0176/0.0057
ConvAI2 (revised)   w/o knowledge       0.4065/0.3216  0.0064/0.0232  0.0018/0.0109/0.0057
Table 2: Ablation experiments with the different models on the persona-chat data.



We conduct our experiments on two publicly available datasets: CMUDoG (zhou2018dataset) and ConvAI2, which is an extended version of the PersonaChat dataset (DBLP:journals/corr/abs-1801-07243). The ConvAI2 dataset has separate training and validation sets, each with original and revised persona profiles.

Besides the ConvAI2 data, we also experiment with the CMUDoG dataset published in zhou2018dataset. Its conversations have an average of 21.43 turns, and the dataset has been divided into training, validation and test sets. Unlike PersonaChat, CMUDoG contains more complex and informative semantic information. Besides, the knowledge in this dataset is about movies, so the knowledge sentences are more relevant to each other, which is helpful for training our model.

We compare our model with the following methods:

  • The sequence-to-sequence model with attention (DBLP:journals/corr/VinyalsL15) concatenates the persona profiles with the history information as its input.

  • KG-Net, proposed by DBLP:journals/corr/abs-1902-04911, makes use of both prior and posterior distributions over knowledge to facilitate knowledge selection. The model achieves state-of-the-art results on PersonaChat.

Experiments Settings

Following DBLP:journals/corr/abs-1902-04911, we train our model with the following settings. For the word embeddings, we use GloVe (pennington2014glove) with an embedding size of 300. The encoder is a one-layer bidirectional GRU, and the decoder uses two different unidirectional GRUs. The hidden size of the GRUs is 800 for ConvAI2 and 500 for CMUDoG. For optimization, we use the Adam optimizer (kingma2014adam) and clip the gradient when its norm exceeds a threshold. To avoid overfitting, we set the dropout rate to 0.3. The number of training epochs is set separately for ConvAI2 and CMUDoG.

Repetition
Model      3    2    1    0
Seq2Seq    30%  8%   8%   54%
KG-Net     44%  16%  16%  24%
Our model  54%  9%   20%  17%
Human      44%  35%  6%   15%

Consistency
Model      3    2    1
Seq2Seq    13%  70%  17%
KG-Net     22%  60%  18%
Our model  23%  65%  12%
Human      45%  52%  3%
Table 3: Human evaluation for benchmarks, along with a comparison to human performance.


Automatic Evaluation

We use BLEU-1/2 (papineni2002bleu), Distinct-1/2 (li2016diversity) and Knowledge R/P/F1 (dinan2018wizard) to evaluate the quality of the generated responses. BLEU-1/2 calculates the average n-gram precision between the generated response and the ground truth. However, because of the one-to-many nature of dialogue generation, BLEU alone is a poor indicator of dialogue quality. We therefore also use Distinct-1/2 to measure the diversity of the generated responses, computed as the ratio of distinct unigrams and bigrams to the total number of generated unigrams and bigrams. In addition, we adopt Knowledge R/P/F1, proposed in dinan2018wizard, to evaluate how well the personal knowledge is expressed.
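The two less familiar metrics can be made concrete with a short sketch. It assumes whitespace-tokenized responses and, for the knowledge metric, a plain set overlap between response words and knowledge words with no stop-word filtering; the cited definitions may differ in such details, and the function names are hypothetical.

```python
def distinct_n(responses, n):
    """Ratio of distinct n-grams to all n-grams over a list of token lists."""
    total = 0
    unique = set()
    for tokens in responses:
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


def knowledge_rpf(response_tokens, knowledge_tokens):
    """Recall, precision and F1 of word overlap with the knowledge.

    Assumption: both sides are compared as sets of words.
    """
    response = set(response_tokens)
    knowledge = set(knowledge_tokens)
    overlap = len(response & knowledge)
    precision = overlap / len(response) if response else 0.0
    recall = overlap / len(knowledge) if knowledge else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, precision, f1


responses = [
    "i am a student".split(),
    "i am a doctor".split(),
]
d1 = distinct_n(responses, 1)  # 5 distinct unigrams out of 8
d2 = distinct_n(responses, 2)  # 4 distinct bigrams out of 6
```

Because the two responses share most of their words, Distinct-1 is 5/8 and Distinct-2 is 4/6; a system that keeps producing the same generic phrases drives both ratios toward zero, which is why the metric is used as a proxy for diversity.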

Table 1 reports the evaluation on the ConvAI2 and CMUDoG datasets. Our model obtains the best Distinct-1/2 scores in every setting, which means that the diversity of the generated responses is greatly improved compared to the other knowledge-grounded baselines. This suggests that our model can not only utilize the personal information to enhance conversation understanding but also use the conversation to improve its understanding of the persona, which in turn promotes more diverse responses. Besides, our model's BLEU-1/2 scores are higher than those of the Seq2Seq model in every setting, although KG-Net obtains the highest BLEU, which demonstrates that our model can generate high-quality responses.

Besides, compared to the results on the original persona profile setting, our results on the revised persona profile setting do not decrease markedly, while the other baselines perform worse on most automatic metrics. This suggests that the other baselines are less able to understand the complex semantic information of the persona profiles and incorporate it into the dialogue.

Moreover, on CMUDoG, which has more complex knowledge information and longer histories, our model clearly outperforms the other baselines on Distinct-1/2 and on Knowledge R/P/F1. There are two reasons. First, our coverage mechanism helps the model select proper and informative knowledge. Second, the interaction module helps the model incorporate personality-related conversation information into the knowledge, which results in more diverse responses in the subsequent conversation.

Ablation Study

Our model takes both persona profile consistency and speaking style consistency into consideration, with a speaking style module and an interaction module designed to enhance dialogue consistency. Different components play different roles. To show the necessity and contribution of each component, we conduct ablation experiments on the ConvAI2 dataset with original and revised profiles. From the results shown in Table 2, we can see that modeling the speaking style is very helpful for improving dialogue quality and diversity. It is also necessary to consider the bilateral information flow between conversation and persona profiles, which contributes more to the improvement in performance.

Human Evaluation

Since automatic metrics are poor at evaluating the repetition and consistency of our model, we also adopt two human evaluation metrics, as suggested by DBLP:journals/corr/abs-1801-07243, to evaluate the quality of the generated responses. Specifically, we randomly selected 100 examples for each model on ConvAI2 with original persona profiles, resulting in 400 examples in total for human evaluation, and recruited annotators to rate consistency and repetition. Each example is rated on each metric by two participants. The participants are required to score the answers according to the following standards.

  • Repetition: this metric measures whether the generated response tends to use the same personal knowledge multiple times. Score 0 means the response does not contain personal knowledge. Score 1 indicates that the response reuses the same, unnecessary information. Score 2 means the response uses new personal knowledge. Score 3 means the response contains new content that differs from the personal knowledge. We include score 3 because the model may introduce other new topics, which can also be regarded as alleviating the repetition problem.

  • Consistency: this metric measures the model's ability to generate consistent responses. Specifically, 1 is inconsistent, 2 is neutral and 3 is consistent.

The human evaluation results are shown in Table 3. We can observe that: (1) Our model avoids repetition by generating more informative responses with richer topics than all the baselines, although the low ratio for label 2 in repetition indicates that our model may find it slightly hard to move between the given persona profiles. On inspecting the dataset, we find that the given profiles are not closely related to each other. (2) Our model generates more consistent responses than the other baselines. In summary, the human judgment results again demonstrate the effectiveness of our model.

I’m a gold medalist olympian.
gymnastics is my favorite sport.
my favorite color is yellow.
i love italian food and garlic bread.
i workout four hours a day.

A: hello. i love to read and my favorite books are the hunger games series. you?
B: hi ! i am an olympic gymnast , i don’t read much .
A: that is exciting and must allow you to travel a lot. ever been to spain ?
B: yes , i’ve . i picked up a lovely yellow <unk> there .
A: i hope to go back when i retire in a few years . how long before you retire ?
B: i’ve another two years before i can retire from gymnastics .
A: what are your plans after retirement ?
i’d like to go to culinary school , i am still young.

Seq2Seq    i am in the city . i am a student.
KG-Net     i am a student . i am a bit of a reader.
Our model  i am going to be a doctor . you ?
Table 4: Sampled generation on ConvAI2

Case Study

We also evaluate the models by generating responses given the contextual information and the personal profiles. Some sampled results are shown in Table 4. We observe that our model is capable of generating more semantically related responses than the other baselines. Beyond being context-aware, its response also extends to other topics, which makes the whole conversation more interactive.


In this paper, we propose a conversation-adaption multi-view persona-aware response generation model to keep the dialogue consistent and avoid repetition. First, we divide consistency into profile consistency and speaking style consistency. We model the bilateral information flow between conversation and personal knowledge and design a coverage mechanism to avoid repetition. We also introduce a speaking style vector and feed it into the decoder to keep the speaking style consistent. The experimental evaluation shows that our model improves response generation quality. The ablation study verifies the effectiveness of modeling profile consistency and speaking style consistency. The human evaluation shows that our model avoids repetition and stays consistent more often than the baselines.


We would like to thank all of the anonymous reviewers for their invaluable suggestions and helpful comments. This work was supported by the National Natural Science Foundation of China (Grant No 62006222).