The rapid advancement of language modeling and natural language generation (NLG) techniques has enabled fully data-driven conversation models, which take user inputs (utterances) and directly generate natural language responses (Shang et al., 2015; Vinyals and Le, 2015; Li et al., 2016). On the other hand, current generation models may still degenerate into dull and repetitive content (Holtzman et al., 2019; Welleck et al., 2019), which, in conversation assistants, leads to irrelevant, off-topic, and unhelpful responses that damage the user experience (Tang et al., 2019; Zhang et al., 2018; Gao et al., 2019).
A promising way to address this degeneration challenge is to model conversations with the help of knowledge, for example, an open-domain knowledge graph (Ghazvininejad et al., 2018), a commonsense knowledge base (Zhou et al., 2018a), or background documents (Zhou et al., 2018b). Recent research leverages such prior knowledge by grounding the conversation utterances to the external knowledge and integrating it as additional semantic representations; responses can then be generated by conditioning on both the text inputs and the grounded semantics (Ghazvininejad et al., 2018; Zhou et al., 2018a, b).
Integrating external knowledge as a semantic representation of the utterance and an additional input to the conversation model effectively improves the quality of generated responses (Ghazvininejad et al., 2018; Logan et al., 2019; Zhou et al., 2018a). On the other hand, human conversations do not stand still on the same set of grounded semantics; instead, a dialog dynamically flows in the semantic space: we shift discussions from one concept to another, chat about a group of related entities, and may switch dialog topics entirely (Fang et al., 2018). Limiting the use of knowledge to the grounded concepts, effective as it is, does not exploit the full potential of semantics in modeling human conversations.
This work presents ConceptFlow (Conversation generation with latent Concept Flow), which leverages a commonsense knowledge graph to model the conversation flow in the latent concept space. Given a conversation utterance, ConceptFlow starts from the grounded knowledge, which in our case is the set of commonsense concepts appearing in the utterance, and extends to multi-hop concepts along commonsense relations. The conversation flow is then modeled in the extended concept graph using a new fine-grained graph attention mechanism, which learns to encode concepts in the central and outer graphs. Mimicking the development of the conversation topic, the graph attention guides the concept flow by attending to different directions in the concept graph.
The encoded latent concept flow is integrated into response generation with standard conditional language models: during decoding, each token, word or concept, is sampled from ConceptFlow's context vector, which combines the encodings of the utterance text and the latent concept flow. This enables ConceptFlow to explicitly model the conversation structure when generating responses.
Our experiments on a Reddit conversation dataset (Zhou et al., 2018a) and a commonsense knowledge graph, ConceptNet (Speer and Havasi, 2012), demonstrate the advantage of ConceptFlow. In both automatic and human evaluations, ConceptFlow performs significantly better than various seq2seq-based generation models (Sutskever et al., 2014), as well as previous methods that also leverage a commonsense knowledge graph but treat it as static memory (Zhou et al., 2018a; Ghazvininejad et al., 2018; Zhu et al., 2017). Notably, ConceptFlow also outperforms two fine-tuned GPT-2 systems (Radford et al., 2019) despite using far fewer parameters: effective modeling of conversation structure reduces the need for a large parameter space.
We also provide extensive analyses and case studies to investigate the advantage of modeling conversation flow in the latent concept graph. Our analyses show that many Reddit discussions naturally align with paths in the commonsense knowledge graph; expanding the latent concept graph multiple hops away from the initial grounded concepts significantly improves coverage of the ground-truth response. Our ablation study further confirms the effectiveness of the graph attention mechanism in selecting useful latent concepts, including concepts that appear in the golden responses, which helps generate more relevant, informative, and less repetitive responses.
This section presents our Conversation generation with latent Concept Flow (ConceptFlow). As shown in Figure 1, ConceptFlow models the conversation flow along commonsense relations between concepts to generate meaningful responses.
2.1 Preliminary on Grounded Conversation Models
Given a user utterance $X = \{x_1, \dots, x_m\}$ with $m$ words, conversation generation models often use an encoder-decoder architecture to generate a response $Y = \{y_1, \dots, y_n\}$.

Typically, the encoder represents the user utterance as a representation set $H = \{h_1, \dots, h_m\}$. This is often done with a Gated Recurrent Unit (GRU):

$$h_t = \text{GRU}(h_{t-1}, \vec{x}_t),$$

where $\vec{x}_t$ is the embedding of word $x_t$.

The decoder generates the $t$-th response word $y_t$ according to the previously generated words $y_{<t}$ and the user utterance $X$:

$$P(Y \mid X) = \prod_{t=1}^{n} P(y_t \mid y_{<t}, X).$$

The $t$-th token is generated according to the $t$-th step decoder context representation $s_t$:

$$P(y_t \mid y_{<t}, X) = \text{softmax}(W s_t), \quad s_t = \text{GRU}(s_{t-1}, [c_{t-1}; \vec{y}_{t-1}]),$$

where $c_{t-1}$ is the context embedding at the $(t-1)$-th time step, $\vec{y}_{t-1}$ is the $(t-1)$-th generated word's embedding, and $s_t$ is the decoder output representation at the $t$-th time step.
2.2 Conversation Generation with Latent Concept Flow
This part introduces the construction of flow concept candidates, the latent concept flow encoding, and the conditional conversation decoder that generates the response.
2.2.1 Constructing Flow Concept Candidates
ConceptFlow constructs a latent concept graph for knowledge-grounded conversation generation. The latent concept graph starts from the grounded concepts (zero-hop concepts $V^0$), which appear in the conversation utterance and are obtained by entity linking. From the grounded concepts, ConceptFlow grows the graph with one-hop concepts $V^1$ and two-hop concepts $V^2$. $V^0$ and $V^1$ form the central concept graph $G^{\text{central}}$, which is closely related to the current conversation topic. $V^1$ and $V^2$ construct the outer concept graph $G^{\text{outer}}$, which models the outer conversation flow.

The latent concept flow consists of related concepts that help understand the conversation. Next, we model the conversation flow from zero-hop concepts to one-hop concepts and then to two-hop concepts.
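The hop-by-hop growth of the concept graph can be sketched as a breadth-first expansion over knowledge-graph triples. This is a minimal illustration, assuming ConceptNet-style (head, relation, tail) triples; the function name and undirected treatment are our simplifying choices, not the paper's exact procedure.

```python
def expand_hops(triples, zero_hop, hops=2):
    """Grow the concept graph hop by hop along commonsense relations.

    triples: iterable of (head, relation, tail); zero_hop: grounded concepts.
    Returns a list [V0, V1, V2, ...] of disjoint concept sets per hop.
    """
    neighbors = {}
    for h, _, t in triples:
        neighbors.setdefault(h, set()).add(t)
        neighbors.setdefault(t, set()).add(h)  # treat the graph as undirected
    layers, seen = [set(zero_hop)], set(zero_hop)
    for _ in range(hops):
        frontier = set()
        for c in layers[-1]:
            frontier |= neighbors.get(c, set())
        frontier -= seen                        # keep hop sets disjoint
        layers.append(frontier)
        seen |= frontier
    return layers
```

For example, grounding on "game" with triples game-team-fan yields $V^0 = \{$game$\}$, $V^1 = \{$team$\}$, $V^2 = \{$fan$\}$.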
2.2.2 Encoding Latent Concept Flow
This part describes the latent concept flow encoding of central flow concepts and outer flow concepts.
Central Flow Encoding.
The central flow encoder models the concept flow from zero-hop concepts to one-hop concepts using the interactions between them. A multi-layer Graph Neural Network (GNN) (Sun et al., 2018) is used to encode each concept $e$ in the central concept graph $G^{\text{central}}$. The $l$-th layer representation of concept $e$ is calculated by a single-layer feed-forward network (FFN) over three states:

$$g_e^{(l)} = \text{FFN}\Big(\big[g_e^{(l-1)}; q^{(l-1)}; \textstyle\sum_r \sum_{e' \in N_r(e)} \alpha_{e'}^{r} \, \tilde{g}_{e'}^{(l-1)}\big]\Big),$$

where $[\cdot;\cdot]$ is the concatenation operator, $g_e^{(l-1)}$ is concept $e$'s representation at the $(l-1)$-th layer, and $q^{(l-1)}$ is the user utterance representation at the $(l-1)$-th layer.

The $l$-th layer user utterance representation is updated with the grounded concepts $V^0$:

$$q^{(l)} = \text{FFN}\Big(\textstyle\sum_{e \in V^0} g_e^{(l-1)}\Big).$$

The third state aggregates the concept semantics of the relation-specific neighbor concepts $e' \in N_r(e)$. It uses the attention weight $\alpha_{e'}^{r}$ to control the concept flow from $e'$ to $e$:

$$\tilde{g}_{e'}^{(l-1)} = \text{FFN}\big([\vec{r}; g_{e'}^{(l-1)}]\big),$$

where $[\cdot;\cdot]$ is the concatenation operator and $\vec{r}$ is the relation embedding of relation $r$. The attention weight $\alpha_{e'}^{r}$ is computed over all of concept $e$'s neighbor concepts according to the relation weight score and the PageRank score (Sun et al., 2018):

$$\alpha_{e'}^{r} = \text{softmax}_{(r, e') \in N(e)}\big(\text{pr}_{e'} \cdot \vec{r}^{\,\top} q^{(l-1)}\big),$$

where $\text{pr}_{e'}$ is the PageRank score that controls the propagation of embeddings along paths starting from the grounded concepts (Sun et al., 2018) and $q^{(l-1)}$ is the $(l-1)$-th layer user utterance representation.

The $0$-th layer representation $g_e^{(0)}$ of concept $e$ is initialized with the pre-trained concept embedding, and the $0$-th layer user utterance representation $q^{(0)}$ is initialized with the last hidden state $h_m$ from the user utterance representation set $H$. The resulting GNN encodings establish the central concept flow between zero-hop and one-hop concepts using attention.
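The neighbor-aggregation step above can be sketched in a few lines: each neighbor's contribution is weighted by a softmax over PageRank-scaled relation/query affinities. This is a toy version under our own simplifications (raw dot products, no FFN, no layers); the names are illustrative.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def aggregate_neighbors(neighbors, query):
    """Weight relation-specific neighbors by a PageRank-scaled relation/query
    affinity, then aggregate their vectors (toy version of the GNN's third state).

    neighbors: list of (pagerank, relation_emb, concept_vec) tuples."""
    scores = [pr * sum(r * q for r, q in zip(rel, query))
              for pr, rel, _ in neighbors]
    alphas = softmax(scores)                       # attention weights
    dim = len(neighbors[0][2])
    return [sum(a * vec[d] for a, (_, _, vec) in zip(alphas, neighbors))
            for d in range(dim)]
```

A neighbor with both a high PageRank score and a relation embedding aligned with the utterance representation dominates the aggregate, which is the intended "flow control" behavior.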
Outer Flow Encoding. The outer flow models the concept flow from one-hop concepts to two-hop concepts. Given a concept $e_i$ from the one-hop concepts $V^1$, all its neighbor concepts $e_k$ are weighted to form the subflow $f(e_i)$'s representation:

$$f(e_i) = \sum_{e_k \in N(e_i)} \beta_{k} \, [g_{e_i}; g_{e_k}],$$

where $[\cdot;\cdot]$ is the concatenation operator, and $g_{e_i}$ and $g_{e_k}$ are the embeddings of $e_i$ and $e_k$, respectively. The attention score $\beta_k$ is calculated to weight and aggregate the concept triple $(e_i, r_k, e_k)$:

$$\beta_k = \text{softmax}_k\big(w^{\top} \tanh(W_r \vec{r}_k + W_e \, g_{e_k})\big),$$

where $r_k$ is the relation between the concept $e_i$ and its neighbor concept $e_k$. $w$, $W_r$ and $W_e$ are learnable parameters. The outer concept flow models more diverse developments of the conversation and, with the subgraph encoding, guides the flow toward more possible directions.
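The subflow encoding can be sketched as follows: score each (relation, neighbor) triple with an additive attention of the form $w^\top \tanh(W_r \vec{r} + W_e g)$, softmax the scores, and aggregate the concatenated center/neighbor vectors. A toy sketch with illustrative names; the matrices here stand in for the paper's learnable parameters.

```python
import math

def _softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def _matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def subflow_repr(center_vec, triples, w, Wr, We):
    """Score each (relation_emb, neighbor_emb) triple with
    w^T tanh(Wr r + We g), softmax into betas, then aggregate the
    concatenated [center; neighbor] vectors."""
    scores = []
    for rel, nb in triples:
        hidden = [math.tanh(a + b)
                  for a, b in zip(_matvec(Wr, rel), _matvec(We, nb))]
        scores.append(sum(wi * h for wi, h in zip(w, hidden)))
    betas = _softmax(scores)
    concat = [center_vec + nb for _, nb in triples]  # [g_{e_i}; g_{e_k}]
    dim = len(concat[0])
    return [sum(b * c[d] for b, c in zip(betas, concat)) for d in range(dim)]
```

With symmetric triples the weights split evenly, so the center half of the output stays intact while the neighbor half is averaged.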
2.2.3 Decoding from Latent Concept Flow
This part presents how to generate the response using the latent concept flow.
Context Representation with ConceptFlow. The $t$-th time output representation $s_t$ of the decoder is calculated by updating the $(t-1)$-th step output representation $s_{t-1}$ with context representations:

$$s_t = \text{GRU}(s_{t-1}, [c_{t-1}; \vec{y}_{t-1}]),$$

where $\vec{y}_{t-1}$ is the embedding of the token $y_{t-1}$ generated at the $(t-1)$-th step. The context representation $c_t$ is the concatenation of the text-based representation $c_t^{\text{text}}$ and the concept-based context representation $c_t^{\text{concept}}$:

$$c_t = [c_t^{\text{text}}; c_t^{\text{concept}}].$$

$c_t^{\text{text}}$ reads the user utterance representations $H$ with the attention score $\alpha_i^t$ (Bahdanau et al., 2014):

$$c_t^{\text{text}} = \sum_i \alpha_i^t \, h_i,$$

where the attention $\alpha_i^t$ weights over the user utterance representations:

$$\alpha_i^t = \text{softmax}_i\big(s_t^{\top} h_i\big).$$

The concept-based representation $c_t^{\text{concept}}$ is a combination of the central concept flow encodings and the outer flow encodings:

$$c_t^{\text{concept}} = \Big[\sum_{e} \beta_e^t \, g_e \; ; \; \sum_{e_i} \gamma_{e_i}^t \, f(e_i)\Big],$$

where the attention $\beta_e^t$ weights over the central concept representations:

$$\beta_e^t = \text{softmax}_e\big(s_t^{\top} g_e\big),$$

and the attention $\gamma_{e_i}^t$ weights over the outer flow representations:

$$\gamma_{e_i}^t = \text{softmax}_{e_i}\big(s_t^{\top} f(e_i)\big).$$
Generating Words and Concepts. The conversation generator uses the $t$-th time output representation $s_t$ to decode the $t$-th response token from the word vocabulary and the concept vocabulary:

$$P(y_t \mid y_{<t}, X, G) = \sigma_t^0 \, p^{\text{word}}(y_t) + \sigma_t^1 \, p^{\text{central}}(y_t) + \sigma_t^2 \, p^{\text{outer}}(y_t),$$

where the distributions are computed with the word embedding $\vec{w}$ for word $w$, the central concept representation $g_e$ for concept $e$, and the two-hop concept $e_k$'s embedding $\vec{e}_k$.

The generation probability $p^{\text{word}}$ of a word is calculated over the word vocabulary. The generation probability of a concept is separated into two parts: a central concept $e$'s probability $p^{\text{central}}$ over $G^{\text{central}}$ and an outer concept $e_k$'s probability $p^{\text{outer}}$ over $G^{\text{outer}}$. The gate

$$\sigma_t = \text{softmax}\big(\text{FFN}(s_t)\big) \in \mathbb{R}^3$$

is used to control the token generation from these three probability distributions: it chooses words ($\sigma_t^0$), central concepts ($\sigma_t^1$) and outer concepts ($\sigma_t^2$) when generating the response.
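The gated mixture over the three vocabularies can be sketched directly: scale each distribution by its gate value and sum over the joint vocabulary. A minimal sketch assuming the gate already sums to one; dictionary-based distributions are our illustrative choice.

```python
def mix_distributions(gate, p_word, p_central, p_outer):
    """Combine word / central-concept / outer-concept distributions into one
    distribution over the joint vocabulary, scaled by the 3-way gate."""
    assert abs(sum(gate) - 1.0) < 1e-9   # gate is a softmax output
    mixed = {tok: gate[0] * p for tok, p in p_word.items()}
    for tok, p in p_central.items():
        mixed[tok] = mixed.get(tok, 0.0) + gate[1] * p
    for tok, p in p_outer.items():
        mixed[tok] = mixed.get(tok, 0.0) + gate[2] * p
    return mixed
```

Because each component distribution sums to one and the gate is itself a probability vector, the mixed output is again a valid distribution.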
Then we minimize the cross-entropy loss and optimize all parameters end-to-end:

$$\mathcal{L} = -\sum_{t} \log P(y_t^* \mid y_{<t}, X, G),$$

where $y_t^*$ is the ground-truth token (a word or a concept) at the $t$-th step.
3 Experiment Settings
Dataset. All experiments use the Commonsense Conversation Dataset (Zhou et al., 2018a), which collects single-round dialogs from Reddit. The dataset contains 3,384,185 training pairs, 10,000 validation pairs and 20,000 test pairs. ConceptNet is used as our commonsense graph; it contains 120,850 triples, 21,471 concepts and 44 relations. For each example in the Commonsense Conversation Dataset, the average numbers of central concepts and two-hop concepts are 98.6 and 782.2, respectively.
A wide range of evaluation metrics covering three aspects is included: relevance, diversity and novelty. PPL (Serban et al., 2016), BLEU (Papineni et al., 2002), NIST (Doddington, 2002), ROUGE (Lin, 2004) and METEOR (Lavie and Agarwal, 2007) are used for relevance and repetitiveness; Dist-1, Dist-2 and Ent-4 are used for diversity, following previous work (Li et al., 2015; Zhang et al., 2018). Zhou et al. (2018a)'s concept PPL favors concept-grounded models and is reported in Appendix A.1. The Precision, Recall and F1 score of generating golden concepts (those that appear in the ground-truth response) are used to evaluate the quality of the learned latent concept flow.
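The Dist-n diversity metric referenced above has a standard, simple definition: the number of distinct n-grams divided by the total number of n-grams across all generated responses. A minimal sketch using whitespace tokenization (the original evaluation may tokenize differently):

```python
def dist_n(responses, n):
    """Dist-n: distinct n-grams / total n-grams over all generated
    responses (higher means more diverse output)."""
    total, distinct = 0, set()
    for resp in responses:
        toks = resp.split()
        grams = [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
        total += len(grams)
        distinct.update(grams)
    return len(distinct) / total if total else 0.0
```

For example, the two responses "i am good" and "i am fine" share the unigrams "i" and "am", giving Dist-1 = 4/6.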
Baselines. Six baselines are compared in our experiments. Seq2Seq (Sutskever et al., 2014) is the basic encoder-decoder for language generation. MemNet (Ghazvininejad et al., 2018) and CopyNet (Zhu et al., 2017) utilize extra knowledge in two different ways: maintaining a memory to store and read concepts, and copying concepts into the generated response. The Commonsense Knowledge Aware Conversation Generation Model (CCM) (Zhou et al., 2018a) leverages a graph attention mechanism over local graphs, further exploiting the graph structure. These three knowledge-aware models treat the grounded graph as static knowledge and do not explicitly model the conversation flow.
GPT-2 (Radford et al., 2019), the pre-trained model that achieves state-of-the-art results on many language generation tasks, is also compared. We fine-tune the 124M-parameter GPT-2 in two ways: concatenating all conversations together and training it as a language model (GPT-2 (lang)); and extending GPT-2 with an encoder-decoder architecture supervised with response data (GPT-2 (conv)).
Implementation Details. The zero-hop concepts are initialized by matching keywords in the post to concepts in ConceptNet, the same as in CCM (Zhou et al., 2018a). The zero-hop concepts are then extended to their neighbors to form the central concept graph. Because of the large number of two-hop concepts, the outer concept flow usually contains considerable noise. To select concepts more related to the conversation and reduce the computation cost, ConceptFlow first randomly selects 10% of the training data to train an initial version of the model. We then use the initial version's learned graph attention to select the top-100 two-hop concepts on all the remaining data, and conduct the standard train, development, and test process with the pruned graph. More details of this concept selection step can be found in Appendix C. TransE (Bordes et al., 2013) embeddings and GloVe (Pennington et al., 2014) embeddings are used to initialize the representations of concepts and words, respectively. The Adam optimizer with a learning rate of 0.0001 is used to train the model.
This section evaluates the quality of responses generated by ConceptFlow, presents an ablation study on the roles of different modules, and provides case studies.
4.1 Conversation Generation Quality Estimation
In Table 1, all the evaluation metrics compare the relevance between the generated response and the golden response. Our model outperforms all previous models by large margins; responses generated by our model are on-topic and cover more necessary information. In Table 2, Dist-1, Dist-2, and Ent-4 measure the word diversity of the generated responses, whereas the remaining metrics measure repetitiveness with respect to the user utterance, to penalize dully copying the input. Our model strikes a convincing balance between novelty and diversity. GPT-2 (lang) generates more diverse responses, but ConceptFlow is more novel and more on-topic than both GPT-2 versions, perhaps due to its different decoding strategy.
The human evaluation focuses on two testing scenarios that are important for conversation systems: appropriateness and informativeness. Appropriateness indicates whether the response is on-topic for the given utterance. Informativeness indicates the ability to provide new information instead of copying from the utterance (Zhou et al., 2018a). For 100 sampled cases, responses are collected from the four best methods: CCM, GPT-2 (conv), ConceptFlow and the Golden Response. The responses are scored from 1 to 4 by five judges.
The model performance is listed in Table 3. The human evaluation is divided into two parts: Average Score and Best@1 ratio, where Best@1 ratio is the fraction of judges who consider the corresponding response the best. ConceptFlow outperforms all baseline models in all scenarios. This convincing result demonstrates the advantage of explicitly modeling conversation flow with semantics: ConceptFlow outperforms GPT-2 with one-third of the parameters. More details of the human evaluation are presented in Appendix D.
4.2 Ablation Study
This part studies the effectiveness of the learned latent concept flow. Figure 2 shows the golden concept coverage, the effectiveness of golden concept selection, and the perplexity of response generation for four strategies of selecting latent concepts. Base only considers the central concept graph. Random, Gold, and Full add two-hop concepts in three different ways: Random selects concepts randomly, Gold selects all golden concepts with random negatives, and Full is our method that selects by learned graph attention.
As shown in Figure 2(a), Random has almost the same coverage as Base, while ConceptFlow (Full) outperforms Random by a large margin. This confirms that the concept selection in ConceptFlow effectively picks more meaningful outer concepts for conversation generation. The effectiveness of the two-hop concept selection strategies is presented in Figure 2(b): Full outperforms all other strategies in Precision, Recall and F1. ConceptFlow filters unrelated concepts and chooses relevant latent concepts that enhance the understanding of the central graph.
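The Precision/Recall/F1 comparison above reduces to straightforward set arithmetic between the selected latent concepts and the concepts appearing in the ground-truth response. A minimal sketch:

```python
def golden_concept_prf(selected, golden):
    """Precision, recall and F1 of selected latent concepts against the
    concepts appearing in the ground-truth response."""
    selected, golden = set(selected), set(golden)
    tp = len(selected & golden)                    # correctly selected
    p = tp / len(selected) if selected else 0.0
    r = tp / len(golden) if golden else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For instance, selecting {fan, team, win} against golden concepts {fan, team, coach} gives P = R = F1 = 2/3.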
The high-quality latent concept flow leads to ConceptFlow's strong performance in Figure 2(c). Interestingly, ConceptFlow outperforms Gold in perplexity, even though Gold includes all two-hop concepts from the golden response. This shows that the "negatives" selected by ConceptFlow, even those that do not directly appear in the target response, are still on-topic and related, and thus provide more meaningful information than Gold's random negatives. More results are presented in Appendix A.2.
4.3 Case Study
Figure 3 presents a case study to demonstrate the effectiveness of ConceptFlow. The attention scores on central concepts and two-hop concepts are illustrated. The zero-hop concept "championship", the one-hop concept "fan" and the two-hop concept "team" receive more attention than others and are used by ConceptFlow to generate the response. On the other hand, some concepts, such as "win" and "pretty", are filtered out by the gate. More examples are listed in Appendix B.
5 Related Work
Natural language generation (NLG) has achieved promising results with the sequence-to-sequence model (Sutskever et al., 2014) and helped build end-to-end conversation systems (Shang et al., 2015; Vinyals and Le, 2015; Li et al., 2016; Wu et al., 2019). Recently, pre-trained language models, such as ELMo (Peters et al., 2018), BERT (Devlin et al., 2019) and GPT-2 (Radford et al., 2019), have further improved NLG performance using large-scale unlabeled data. Nevertheless, degeneration into irrelevant, off-topic, and unhelpful responses remains one of the main challenges in conversation generation (Tang et al., 2019; Zhang et al., 2018; Gao et al., 2019).
Knowledge-grounded generation models incorporate external knowledge in different ways: some extract knowledge with a Convolutional Neural Network (CNN) (Long et al., 2017), while others store knowledge in a memory network (Ghazvininejad et al., 2018) to generate better conversation responses.
Structured knowledge graphs contain rich semantics about concepts and relations. Many previous studies focus on domain-targeted dialog systems based on domain-specific knowledge bases (Xu et al., 2017; Zhu et al., 2017; Gu et al., 2016). To generate responses with a large-scale commonsense knowledge base, Zhou et al. (2018a) and Liu et al. (2018) utilize graph attention and knowledge diffusion to select knowledge semantics for better user post understanding and response generation. Different from previous research, ConceptFlow models the conversation flow explicitly over the commonsense graph and presents a novel attention mechanism based on a Graph Neural Network to guide the conversation flow in the latent concept space.
In this paper, we present ConceptFlow, which models the conversation flow explicitly as transitions in the latent concept space in order to generate more meaningful responses. Our experiments on the Reddit conversation dataset illustrate the advantages of ConceptFlow over previous conversational systems that also use prior knowledge, as well as over our fine-tuned GPT-2 systems, even though the latter use far more parameters. Our studies confirm that this advantage mainly derives from the high-quality and high-coverage latent concept flow, which is effectively captured by ConceptFlow's graph attention. Our human evaluation demonstrates that ConceptFlow generates more appropriate and informative responses by explicitly modeling the latent conversation structure.
- Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
- Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems.
- BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186.
- Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In Proceedings of the Second International Conference on Human Language Technology Research, pp. 138–145.
- Sounding Board: a user-centric and content-driven social chatbot. arXiv preprint arXiv:1804.10202.
- Jointly optimizing diversity and relevance in neural response generation. arXiv preprint arXiv:1902.11205.
- A knowledge-grounded neural conversation model. In Thirty-Second AAAI Conference on Artificial Intelligence.
- Incorporating copying mechanism in sequence-to-sequence learning. arXiv preprint arXiv:1603.06393.
- The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751.
- METEOR: an automatic metric for MT evaluation with high levels of correlation with human judgments. In Proceedings of the Second Workshop on Statistical Machine Translation, pp. 228–231.
- A diversity-promoting objective function for neural conversation models. arXiv preprint arXiv:1510.03055.
- Deep reinforcement learning for dialogue generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1192–1202.
- ROUGE: a package for automatic evaluation of summaries. In Text Summarization Branches Out, pp. 74–81.
- Knowledge diffusion for neural dialogue generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1489–1498.
- Barack's wife Hillary: using knowledge graphs for fact-aware language modeling. In Proceedings of the 57th Conference of the Association for Computational Linguistics, pp. 5962–5971.
- A knowledge enhanced generative conversational service agent. In DSTC6 Workshop.
- BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318.
- GloVe: global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543.
- Deep contextualized word representations. In Proceedings of NAACL-HLT, pp. 2227–2237.
- Improving language understanding by generative pre-training. 2018. URL: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf.
- Language models are unsupervised multitask learners. OpenAI Blog 1(8).
- Building end-to-end dialogue systems using generative hierarchical neural network models. In Thirtieth AAAI Conference on Artificial Intelligence.
- Neural responding machine for short-text conversation. arXiv preprint arXiv:1503.02364.
- Representing general relational knowledge in ConceptNet 5. In LREC, pp. 3679–3686.
- Open domain question answering using early fusion of knowledge bases and text. arXiv preprint arXiv:1809.00782.
- Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pp. 3104–3112.
- Target-guided open-domain conversation. arXiv preprint arXiv:1905.11553.
- A neural conversational model. arXiv preprint arXiv:1506.05869.
- A neural network approach for knowledge-driven response generation. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 3370–3380.
- Neural text generation with unlikelihood training. arXiv preprint arXiv:1908.04319.
- Transferable multi-domain state generator for task-oriented dialogue systems.
- Incorporating loose-structured knowledge into LSTM with recall gate for conversation modeling. arXiv preprint arXiv:1605.05110.
- Incorporating loose-structured knowledge into conversation modeling via recall-gate LSTM. In 2017 International Joint Conference on Neural Networks (IJCNN), pp. 3506–3513.
- Generating informative and diverse conversational responses via adversarial information maximization. In Advances in Neural Information Processing Systems, pp. 1810–1820.
- Commonsense knowledge aware conversation generation with graph attention. In IJCAI, pp. 4623–4629.
- A dataset for document grounded conversations. arXiv preprint arXiv:1809.07358.
- Flexible end-to-end dialogue system for knowledge grounded conversation. arXiv preprint arXiv:1709.04264.
Appendix A Supplementary Results
A.1 Supplementary Results for Overall Experiments
A.2 Supplementary Results for Ablation Study
Appendix B Cases
|ConceptFlow||good luck , i 'm sure there are some good things in the house .|
|GPT-2 (conv)||i 'm sure they do n't care about their kids .|
|GPT-2 (conv)||i think he 'll be fine . he 's a great player , but he 's not a great player .|
|ConceptFlow||he 's not going to give up the money , but the media will just ignore him .|
|Post||there 's an offline mode for steam , and it works fine for me .|
|GPT-2 (conv)||i 'm not sure if it 's offline or not , but i 've never had a problem with it .|
|ConceptFlow||you 're right . it 's not offline . you 're going to have to wait for the next update .|
|CCM||i think he 's just going to be a great player .|
|GPT-2 (conv)||i 'm sure he 'll be fine . he 's a great player , but he 's not a top 10 player .|
|Post||no i 've never felt like i had friends whoops|
|CCM||i 'm sorry , i feel you .|
|ConceptFlow||you 're gonna have to call your friend !|
|GPT-2 (conv)||i 've been playing with a lot of different music and it 's always been the same .|
|ConceptFlow||i did n't even notice the natural sound . i 'll check it out !|
|GPT-2 (conv)||i think they are . they are a club that has been in the top 4 for a long time .|
Appendix C Concept Selection
As the concept graph grows, more concepts are considered, and a concept selector is needed to filter them. The concept selector selects related two-hop concepts based on the attention scores summed over all decoding steps for the entire set of two-hop concepts:

$$\text{score}(e_k) = \sum_{t} \text{softmax}_{e_k}\big(s_t^{\top} \vec{e}_k\big),$$

where $s_t$ is the $t$-th time decoder output representation and $\vec{e}_k$ denotes the concept $e_k$'s embedding. The top-$K$ concepts are then retained to construct the two-hop concept graph together with the central concept graph. The central concepts are all retained because of their high correlation to the conversation topic and the acceptable computation complexity.
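The selection rule above can be sketched as follows: at each decoding step, softmax the dot products between the decoder state and every two-hop concept embedding, accumulate the resulting attention mass per concept, and keep the top-$K$. A toy sketch; the function name and dictionary interface are illustrative.

```python
import math

def select_top_k(decoder_states, concept_embs, k):
    """Rank two-hop concepts by attention mass summed over decoding steps
    (softmax of s_t . g_e at each step) and keep the top-k."""
    names = list(concept_embs)
    totals = dict.fromkeys(names, 0.0)
    for s in decoder_states:
        logits = [sum(a * b for a, b in zip(s, concept_embs[c]))
                  for c in names]
        m = max(logits)                       # stabilize the softmax
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        for c, e in zip(names, exps):
            totals[c] += e / z
    return sorted(names, key=lambda c: -totals[c])[:k]
```

Concepts whose embeddings align with the decoder states across many steps accumulate the most mass and survive the pruning.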
Appendix D Agreement of Human Evaluation
For the human evaluation, 100 cases with four responses each, from CCM, GPT-2 (conv), ConceptFlow and the Golden Response, are sampled and listed in a spreadsheet in random order. A group of human judges scores each response from 1 to 4 for appropriateness and informativeness, without knowing the source of the response, which guarantees the impartiality and objectivity of the evaluation.
To further demonstrate the consistency among human judges, the agreement of the human evaluation for CCM, GPT-2 (conv) and ConceptFlow is presented in Table 9. For each case, the scores of the two baseline models are compared with ConceptFlow and categorized as win, tie or loss. Human evaluation agreement is then measured by Fleiss' Kappa. All agreement values fall into the fair level of agreement, which confirms the quality of the human evaluation.
Appendix E Data Study
|Concept||Concept Number||Coverage Ratio||Coverage Number|
To determine how deep the concept graph should grow for conversation generation, some statistics are presented in Table 10. The two-hop concept graph covers more than 61% of the golden concepts appearing in the responses with acceptable computational cost. When growing to three hops, the number of concepts increases dramatically while adding only about one extra golden concept per case; the outer graph therefore stops at two-hop concepts, which balances close connection to the topic with tolerable computation complexity.