In the past few years, the field of text generation has witnessed many significant advances, including but not limited to neural machine translation Vaswani et al. (2017); Gehring et al. (2017), dialogue systems Liu et al. (2018); Zhang et al. (2019), and text generation Clark et al. (2018); Guo et al. (2018). By utilizing the power of the sequence-to-sequence (S2S) framework Sutskever et al. (2014), generation models can predict the next token based on previously generated outputs and contexts. However, S2S models are not perfect. One obvious drawback is that S2S models tend to be short-sighted on long context and are unaware of global knowledge. Therefore, how to incorporate global or local knowledge into S2S models has been a long-standing research problem.
There are two main directions for incorporating knowledge into S2S models. On the one hand, many efforts Zhang et al. (2018); Guan et al. (2019); Li et al. (2019b) have been made to address the short-sightedness of text generation by explicitly modeling unstructured context. Nevertheless, these approaches rely heavily on the quality and scale of the unstructured context, and become intractable when applied to scenarios where the context length increases drastically (e.g., commenting on a full-length novel). On the other hand, researchers (Beck et al., 2018; Marcheggiani and Perez-Beltrachini, 2018; Li et al., 2019a) have combined knowledge with S2S models by employing pre-processed structured data (e.g., knowledge graphs), which naturally avoids the difficulty of context length. However, those models are oriented to static knowledge, and hence can hardly model events where temporal knowledge evolution occurs.
Dynamic knowledge evolution is very common in full-length novels. In a novel, a knowledge graph can be constructed by using entities (characters, organizations, locations, etc.) as vertices and entity relations as edges. Obviously, a single static knowledge graph can hardly represent a dynamic storyline full of dramatic changes. For example, a naughty boy can grow up into a hero, friends may become lovers, etc. In this paper, we propose the Evolutionary Knowledge Graph (EKG), which contains a series of sub-graphs, one for each time step. Figure 1 illustrates the EKG for the novel “Fights Break Sphere”. In three different scenes, “Yanxiao”, the leading role of the novel, is characterized as a “proud boy”, a “weak fighter”, and a “magic master”, respectively. At the same time, the relation between “Yanxiao” and “Xuner” evolves from friends into lovers, and finally marriage, and the relation between “Yanxiao” and “Nalanyanran” changes over time from engagement to divorce to friendship.
The EKG is important for commenting on passages of novels, since it is the dramatic evolution and comparison in the storyline, rather than the static facts, that resonate with readers most. As illustrated in Figure 1, when commenting on a passage sampled from the -th chapter of a novel, user-A refers to a historical fact that “Nalanyanran” has abandoned “Yanxiao”, while user-B refers to the future relation between “Yanxiao” and a related entity “Xuner”: they will go through difficulties together. In this paper, the EKG is trained within a multi-task framework to represent the latent dynamic context, and then a novel graph-to-sequence model is designed to select the relevant context from the EKG for comment generation.
1.1 Related Work
Song et al. (2018) used graph neural networks to solve the AMR-to-text problem. Bastings et al. (2017) and Zhao et al. (2018) utilized graph convolutional networks to incorporate syntactic structure into neural attention-based encoder-decoder models for machine translation. In comment generation, Graph2Seq (Li et al., 2019a) was proposed to generate comments by modeling the input news as a topic graph. These methods use static graphs and do not involve dynamic knowledge evolution.
Recently, more research attention has been focused on dynamic knowledge modeling. Taheri et al. (2019) used gated graph neural networks to learn the temporal dynamics of an evolving graph for dynamic graph classification. Trivedi et al. (2017, 2019) and Kumar et al. (2019) learned evolving entity representations over time for dynamic link prediction. Unlike the EKG in this paper, they did not model the embeddings of the relations between dynamic entities. Iyyer et al. (2016) proposed an unsupervised deep learning algorithm to model the dynamic relationship between characters in a novel without considering entity embeddings. Unlike these methods, our EKG-based model represents the temporal evolution of entities and relations simultaneously by learning their temporal embeddings, and hence has an advantage in supporting text generation tasks.
To our knowledge, few studies make use of an evolutionary knowledge graph for text generation. This may be due to the lack of datasets involving dynamic temporal evolution. We observed that novel commenting requires understanding long context full of dramatic changes, and hence built such a dataset by collecting full-length novels and real user comments. The dataset with its EKG will be made publicly available, and more details can be found in Section 2.
The main contributions of our work are three-fold:
We build a new dataset to facilitate the research of evolutionary knowledge based text generation.
We propose a multi-task framework for the learning of evolutionary knowledge graph to model the long and dynamic context.
We propose a novel graph-to-sequence model to incorporate evolutionary knowledge graph for text generation.
2 Dataset Development and Evolutionary Knowledge Graph Building
To facilitate the research of modeling knowledge evolution for text generation, we build a dataset called GraphNovel by collecting full-length novels and real user comments. Together with the corresponding EKG embeddings, we will make the dataset publicly available soon. We detail the collection of the dataset below.
2.1 Data collection
The data is collected from well-known Chinese novel websites. To increase the diversity of the data, the top-1000 hottest novels are crawled, covering different genres including science fiction, fantasy, action, romance, historical, and so on. We then filter out novels based on the following three criteria: 1) the number of chapters is less than 10; 2) few entities are mentioned; 3) user comments are lacking. Each remaining novel includes chapters in chronological order, a set of user-highlighted passages, and user comments on those passages.
Then, we use a lexical analysis tool (Jiao et al., 2018) to recognize three types of entities (persons, organizations, locations) in each novel. Because novels contain many nicknames, the entities identified by the tool contain considerable noise. To improve the knowledge quality, human annotators are asked to verify the entities and add missing ones. Then all the paragraphs containing mentions of two entities are identified; these will later serve as a representation of the entity relations at that specific time step.
As for the highlighted novel passages and user comments, three criteria are used to select high-quality and informative data: 1) a selected passage must contain at least one entity; 2) a selected passage must be commented on by at least three users; 3) comments related to the same passage are ranked according to upvotes, and the bottom 20% are dropped. Notably, the highlighted passages have a degree of redundancy because users tend to highlight and comment on passages at very similar positions. Thus we merge passages that have more than a 50% overlap rate. This operation reduces the number of passages by 30%.
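The merging step can be sketched as follows; the 50% threshold comes from the text above, while the (start, end) span representation and the greedy left-to-right strategy are our assumptions:

```python
def overlap_ratio(a, b):
    """Overlap between two (start, end) character spans, relative to the shorter span."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    if inter <= 0:
        return 0.0
    return inter / min(a[1] - a[0], b[1] - b[0])

def merge_passages(spans, threshold=0.5):
    """Greedily merge passages whose spans overlap more than `threshold`.

    Sketch under assumptions: spans are (start, end) character offsets,
    and a merged span is the union of the two originals.
    """
    merged = []
    for span in sorted(spans):
        if merged and overlap_ratio(merged[-1], span) > threshold:
            last = merged.pop()
            span = (min(last[0], span[0]), max(last[1], span[1]))
        merged.append(span)
    return merged
```

For example, two highlights at offsets (0, 10) and (4, 12) overlap by 75% of the shorter span and collapse into a single passage (0, 12).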
2.2 Core Statistics
The dataset contains 203 novels, 349,695 highlighted passages, and 3,136,210 comments in total. Due to the diverse genres of the novels in our dataset, the number of entities and relations per novel varies widely. The number of comments per passage also varies greatly, because it depends on how interesting the corresponding passage is.
We partition the dataset into non-overlapping training, validation, and test portions by novel (see Table 1 for detailed statistics). The five most relevant comments for each passage in the validation and test sets are selected by human annotators, while the comments in the training set are all preserved to ensure flexible use.
|Avg. # entities||383.4||720.6||281.7|
|Avg. # relations||9,013||19,919||7,084|
2.3 Building the Evolutionary Knowledge Graph
For each novel, we build a knowledge graph consisting of a sequence of sub-graphs. It is sensible to build a sub-graph for each important scene of the novel, and to build more sub-graphs around the critical transitions in the storyline. In this paper, we assume each chapter usually represents an integral scene, and hence build a sub-graph for each chapter of the novel.
For each chapter, the entities mentioned in the chapter become the vertices of the corresponding sub-graph, and if a paragraph contains two of the entities, an edge is created between them. In this way, a sequence of sub-graphs is constructed, forming our EKG. In the next section, we formulate the embedding computation of the EKG and its application to comment generation.
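The construction described above can be sketched as follows; for illustration the sketch matches entities by simple substring search, whereas the paper uses a lexical analysis tool plus human verification:

```python
from collections import defaultdict

def build_ekg(chapters, entities):
    """Build one sub-graph per chapter from entity co-occurrence.

    chapters: list of chapters, each a list of paragraph strings.
    entities: iterable of entity-name strings (assumed already verified).
    Returns, per chapter, the mentioned vertices and the edges (entity
    pairs) whose members co-occur in some paragraph, mapped to the
    supporting paragraphs that serve as textual evidence of the relation.
    """
    ekg = []
    for paragraphs in chapters:
        vertices = set()
        edges = defaultdict(list)
        for para in paragraphs:
            mentioned = [e for e in entities if e in para]
            vertices.update(mentioned)
            # an edge is created for every entity pair in the same paragraph
            for i in range(len(mentioned)):
                for j in range(i + 1, len(mentioned)):
                    edges[frozenset((mentioned[i], mentioned[j]))].append(para)
        ekg.append({"vertices": vertices, "edges": dict(edges)})
    return ekg
```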
3 Model Formalism
In this section, we formulate our approach in detail. First, the training of the EKG embeddings is presented. Then a graph-to-sequence model is shown to utilize the EKG for comment generation. The architecture of the model is shown in Figure 2.
For a novel with a set of entities and relations, we define a global evolutionary knowledge graph as a sequence of temporal knowledge graphs, one per chapter (or time period; we cluster successive chapters into a longer one if they are too short). The temporal knowledge graph of a chapter consists of the set of vertices and the set of edges between vertices for that chapter.
Given a passage from a chapter, with its entities and relations, the local EKG related to it is a sub-graph of the global EKG: its vertices are a subset of the global vertices and its edges a subset of the global edges. The local EKG of the passage is thus a sequence of local temporal knowledge sub-graphs with vertex embeddings and edge embeddings.
3.2 EKG Embedding Training
Inspired by its consistent state-of-the-art performance on language understanding tasks, we use the off-the-shelf Chinese BERT model (Devlin et al., 2019) to calculate the initial semantic representations of sentences. Considering that entities are either out-of-vocabulary or associated with special semantics within the novel context, we propose the following algorithm to jointly learn entity and relation embeddings in the EKG:
Vertex embedding learning.
The passages containing mentions of an entity contribute to learning its embedding. Specifically, each such passage is tokenized while the entity mention is masked with the token “[MASK]”. The resulting tokens are fed into the pre-trained Chinese BERT model, and the output feature corresponding to the mask token is obtained. Within a chapter, there exist sentences containing the vertices of the corresponding sub-graph. The embedding of each vertex is learned by optimizing a summation of softmax losses, which models the probabilities of predicting the masked entities.
The classification weights of this softmax are learnable parameters, and each entity has a separate embedding per chapter. Usually the semantic representations of entities change smoothly over time, so we propose a temporally smoothed softmax loss to retain the similarity of entity embeddings from successive time periods: the loss of each time period is mixed with the losses of its neighboring time periods, weighted by smooth factors, and neighbor terms are included only when the neighboring period exists. The overall loss is then summed over all time periods and all vertices.
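As a concrete illustration of the temporal smoothing (the paper's exact formula is not reproduced here), the sketch below mixes each period's cross-entropy with its valid neighbors' terms; the smooth factor `alpha` and the symmetric neighbor weighting are assumptions:

```python
import numpy as np

def softmax_xent(logits, target):
    # cross-entropy of a softmax over the entity vocabulary
    z = logits - logits.max()
    logp = z - np.log(np.exp(z).sum())
    return -logp[target]

def smoothed_vertex_loss(logits_t, target, alpha=0.1):
    """Temporally smoothed softmax loss (illustrative sketch).

    logits_t: one logit vector per time period for the same masked vertex.
    Each period's loss is mixed with its neighbors' losses; neighbor terms
    are included only when the neighboring period exists. `alpha` is an
    assumed smooth factor.
    """
    T = len(logits_t)
    total = 0.0
    for t in range(T):
        loss = softmax_xent(logits_t[t], target)
        if t - 1 >= 0:                       # previous period, if valid
            loss += alpha * softmax_xent(logits_t[t - 1], target)
        if t + 1 < T:                        # next period, if valid
            loss += alpha * softmax_xent(logits_t[t + 1], target)
        total += loss
    return total / T
```

With `alpha = 0` this reduces to the plain per-period softmax loss averaged over time.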
Edge embedding learning.
Since the number of relations equals the number of co-occurrences of any two entities, it is infeasible to employ an embedding matrix to model the relations. Therefore, a Relation Network (RN) is proposed to learn the edge embeddings in the TKGs, as shown in Figure 3. Specifically, the RN takes two vertex embeddings as input and feeds them into the first hidden layer to obtain the embedding of the edge. Then the embeddings of the two vertices and the edge are concatenated and fed into the second hidden layer to reconstruct the sentence. We also use the pre-trained BERT (Devlin et al., 2019) to obtain the representation of the whole sentence, taken from the final hidden state corresponding to the “[CLS]” token because it aggregates the sequence representation.
A reconstruction loss applied to the network is optimized to jointly learn the RN and the edge embeddings. The loss is computed over pairs of vertices: a positive pair has both vertices related to the edge, while a negative pair has one vertex related to the edge and the other unrelated. The overall loss for learning edge embeddings is summed over all edges.
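The RN and its reconstruction objective can be sketched as below; the layer sizes, the `tanh` nonlinearity, and the squared-error reconstruction loss are assumptions, with random vectors standing in for the BERT features:

```python
import numpy as np

rng = np.random.default_rng(0)

def relation_network(h_u, h_v, W1, W2):
    """Sketch of the Relation Network (exact layer shapes assumed).

    The first layer maps the two vertex embeddings to an edge embedding;
    the second maps [h_u; edge; h_v] toward the sentence representation
    it should reconstruct.
    """
    edge = np.tanh(W1 @ np.concatenate([h_u, h_v]))          # edge embedding
    recon = np.tanh(W2 @ np.concatenate([h_u, edge, h_v]))   # reconstructed sentence vector
    return edge, recon

def reconstruction_loss(recon, sent_vec):
    # squared error between the reconstruction and the sentence representation
    return float(((recon - sent_vec) ** 2).sum())

d, d_e = 8, 4
W1 = 0.1 * rng.standard_normal((d_e, 2 * d))
W2 = 0.1 * rng.standard_normal((d, 2 * d + d_e))
h_u, h_v = rng.standard_normal(d), rng.standard_normal(d)
sent_vec = rng.standard_normal(d)   # stand-in for the BERT "[CLS]" vector
edge, recon = relation_network(h_u, h_v, W1, W2)
loss = reconstruction_loss(recon, sent_vec)
```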
Combining the vertex and edge losses above, our final multi-task loss is a weighted sum of the two, where the weight is a hyperparameter to be tuned.
3.3 Graph-to-sequence modeling
After EKG embedding learning, we propose a graph encoder to utilize the embeddings of the EKG for comment generation. From the learned EKG, we obtain a vector sequence over time for each vertex and for each edge. All these sequences are fed into a Bi-LSTM to integrate information from all time periods. The final representations of the vertices and edges are taken from the final hidden state of the Bi-LSTM at the corresponding time step.
Further, our graph encoder employs graph convolutional networks (Zhou et al., 2018) to aggregate the structured knowledge from the EKG, and is then combined into a widely used encoder-decoder framework (Vaswani et al., 2017) for generation.
Our graph encoder is based on the implementation of GAT (Veličković et al., 2018). The input to a single GAT layer is a set of vertex features and a set of edge features, where the numbers of vertices and edges are determined by the passage. The layer produces a new set of vertex features as its output.
In order to aggregate the structured knowledge from the input features and transform them into higher-level features, we perform self-attention on both the vertices and the edges to compute attention coefficients. The attention weights are learnable parameters, mapping functions transform the vertex and edge features, and the resulting coefficients indicate the importance of neighbor features to each vertex; they are normalized using softmax.
Once obtained, the normalized attention coefficients are used to compute a linear combination of the features corresponding to them, which serves as the final output features for every vertex.
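A minimal sketch of such a GAT layer with edge features is shown below; the linear maps, LeakyReLU scoring, and equal vertex/edge feature dimensions are assumptions, and each vertex is assumed to have at least one neighbor (e.g., a self-loop):

```python
import numpy as np

def gat_layer(H, E, adj, W, a, leaky=0.2):
    """One GAT-style layer attending over vertex and edge features.

    H:   (N, d) vertex features      E: (N, N, d) edge features
    adj: (N, N) 0/1 adjacency        W: (d, d') shared projection
    a:   (3*d',) attention vector over [h_i, h_j, e_ij]
    """
    N = H.shape[0]
    Hp = H @ W                      # projected vertex features
    Ep = E @ W                      # projected edge features
    scores = np.full((N, N), -np.inf)
    for i in range(N):
        for j in range(N):
            if adj[i, j]:
                z = np.concatenate([Hp[i], Hp[j], Ep[i, j]]) @ a
                scores[i, j] = z if z > 0 else leaky * z    # LeakyReLU
    # softmax over each vertex's neighbors
    alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
    alpha[~adj.astype(bool)] = 0.0
    alpha /= alpha.sum(axis=1, keepdims=True)
    # output: attention-weighted combination of neighbor features
    return alpha @ Hp
```

With zero attention weights the coefficients become uniform over neighbors, so each output vertex is simply the mean of its neighbors' projected features.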
Then the graph encoder is combined into the encoder-decoder framework (Vaswani et al., 2017), in which a self-attention based encoder is used to encode the passage. To aggregate structured knowledge, the encodings of all vertices from the graph encoder are concatenated with the output of the passage encoder, and fed into a Transformer decoder for text generation. The whole graph-to-sequence model is trained to minimize the negative log-likelihood of the related comment.
4 Experiments
In this section, we first introduce the experimental details, and then present results from automatic and human evaluations.
We train all the models on the training set and tune hyperparameters on the validation set. The automatic and human evaluations are carried out on the test set. During the training of the EKG, we first learn the vertex embeddings and fix them during the subsequent training of edge embeddings. Our GAT-based graph encoder is based on the entities from the passage. We keep N entities for each passage: if the number of entities (it cannot be zero, because every passage in our dataset contains at least one entity) is smaller than N, we use breadth-first search on the global graph to fill the gap; otherwise we filter out the low-order entities according to entity frequency. We set N to 5, selected by validation. Label smoothing is used in the smoothed softmax loss. We denote our full model as “EKG+GAT(V+E)”, its variant which only uses the first term in Eq. 10 as “EKG+GAT(V)”, and the other variant, which does not use the GAT-based graph encoder and feeds the encoding of vertices into the Transformer decoder directly, as “EKG”.
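The breadth-first filling step can be sketched as follows; the adjacency-dict representation and the visiting order are our assumptions:

```python
from collections import deque

def fill_entities(seed, graph, n):
    """Fill the entity set up to n via breadth-first search on the global graph.

    seed:  entities mentioned in the passage
    graph: adjacency dict {entity: [neighbor entities]}
    A sketch of the BFS filling step; tie-breaking order is assumed.
    """
    kept = list(seed)
    seen = set(kept)
    queue = deque(kept)
    while queue and len(kept) < n:
        cur = queue.popleft()
        for nb in graph.get(cur, []):
            if nb not in seen:
                seen.add(nb)
                kept.append(nb)
                queue.append(nb)
                if len(kept) == n:
                    break
    return kept
```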
In our model, we set the smooth factors for the smoothed softmax loss, the reconstruction-loss hyperparameter (0.0), and the multi-task loss weight (1.0); the number of self-attention layers in our passage encoder is 6; the number of Bi-LSTM layers is 2 and the length of its hidden state is 768; and the GAT-based graph encoder has two layers. To stabilize training, we use the Adam optimizer (Kingma and Ba, 2014) and follow the learning rate strategy of Klein et al. (2017), increasing the learning rate linearly during the warm-up steps and then decaying it exponentially. For inference, the maximum decoding length is 50, and beam search with beam size 4 is used for all the models.
4.3 Evaluation metrics
We use both automatic metrics and human evaluations to assess the quality of the generated novel comments.
Automatic metrics: 1) BLEU is commonly employed in evaluating translation systems, and has also been introduced into the comment generation task (Qin et al., 2018; Yang et al., 2019). We use the multi-bleu.perl script (https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/multi-bleu.perl) to calculate the BLEU score. 2) ROUGE-L (Lin, 2004) uses the longest common subsequence to calculate the similarity score between candidates and references. For its calculation, we use the pyrouge Python package (https://pypi.org/project/pyrouge/). Both metrics support the multi-reference evaluation on our dataset.
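For reference, ROUGE-L's LCS-based score can be sketched as follows (the `beta` weighting follows Lin, 2004; in practice the pyrouge package is used):

```python
def lcs_len(a, b):
    """Longest common subsequence length via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate, reference, beta=1.2):
    """ROUGE-L F-score from LCS-based recall and precision over token lists."""
    lcs = lcs_len(candidate, reference)
    if lcs == 0:
        return 0.0
    r = lcs / len(reference)   # recall
    p = lcs / len(candidate)   # precision
    return (1 + beta ** 2) * r * p / (r + beta ** 2 * p)
```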
Human evaluations: 1) Relevance: this metric evaluates how relevant the comment is to the passage; it measures the degree to which the comment is about the main storyline of the novel. 2) Fluency: this metric evaluates whether the sentence is fluent, follows the grammar, and has clear logic. 3) Informativeness: this metric evaluates how much structured knowledge the comment contains; it measures whether the comment reflects the evolution of entities and relations, or is just a general description that could apply to many passages. All these metrics have three levels, and the final scores are projected to a 0 to 3 scale.
4.4 Baseline Models
We describe three kinds of models used as baselines. All the baselines are implemented according to the related works and tuned on the validation set.
|Model||BLEU||ROUGE-L||Relevance||Fluency||Informativeness|
|Seq2Seq (Qin et al., 2018)||2.59||14.71||0.12||1.71||0.09|
|Attn (Qin et al., 2018)||3.71||16.44||0.34||1.70||0.33|
|Trans. (Vaswani et al., 2017)||6.11||19.21||0.57||1.62||0.58|
|Trans.+CTX (Zhang et al., 2018)||6.52||19.11||0.68||1.68||0.67|
|Graph2Seq (Li et al., 2019a)||4.93||16.91||0.35||1.69||0.31|
|Graph2Seq++ (Li et al., 2019a)||5.56||17.51||0.85||1.67||0.60|
Seq2Seq models (Qin et al., 2018): these models generate comments for news either from the title or from the entire article. Considering there are no titles in our dataset, we compare two kinds of models from their work: 1) Seq2Seq, a basic sequence-to-sequence model (Sutskever et al., 2014) that generates comments from the passage; 2) Attn, a sequence-to-sequence model with an attention mechanism (Bahdanau et al., 2014). For the input of the attention model, we append the related entities to the end of the passage.
Self-attention models: our model includes a graph encoder to encode knowledge from the graph, and a passage encoder that uses multiple self-attention layers. To show the power of the graph encoder, we use the encoder-decoder framework (Trans.) (Vaswani et al., 2017) for a passage-based comparison. We also introduce an improved Transformer (Zhang et al., 2018) with a context encoder to represent document-level context, denoted Trans.+CTX. For the context input, we use up to 512 tokens before the passage as context.
Graph2Seq (Li et al., 2019a): this is a graph-to-sequence model that builds a static topic graph from the input and generates comments based on representations of entities only. A two-layer Transformer encoder is used in their work. For a fair comparison, we replace it with a 6-layer Transformer encoder and denote the new model as Graph2Seq++.
4.5 Evaluation results
Table 2 shows the results of both automatic metrics and human evaluations.
In automatic metrics, our proposed model has the best BLEU and ROUGE-L scores. For BLEU, our full model EKG+GAT(V+E) achieves a score of 7.01, which is 0.59 higher than that of the best baseline, Trans.+CTX. Graph2Seq++ has a BLEU score of 5.56, obviously lower than that of EKG+GAT(V+E); the main reason is that Graph2Seq++ is based on a static graph and cannot make use of the dynamic knowledge. For ROUGE-L, all of our models score higher than 20%, which is 0.79% better than Trans., the best among all baselines; the ROUGE-L score of Trans. is higher than that of Trans.+CTX, which is the opposite of their ordering in BLEU.
In human evaluations, we randomly select 100 passages from the test set and run all the models in Table 2 to generate respective comments. We also provide one user comment for each passage to obtain evaluations of human performance. All these passage-comment pairs are labeled by human annotators. In the relevance metric, our full model EKG+GAT(V+E) has a better relevance score than all the baselines, which means our model can generate more relevant comments that better reflect the main storyline of the novel. Even so, there still exist significant gaps compared to human performance. In the fluency and informativeness metrics, our EKG+GAT(V+E) model also achieves higher scores than all baselines, illustrating that the comments generated by our model are more fluent and contain more attractive information.
Table 3 example (excerpt): “Facing the enemy, Zeng made up his mind to commit suicide as soon as the city broke, following Ta Qibu and Luo Zenan.”
4.6 Analysis and Discussion
We compare the results of EKG, EKG+GAT(V), and EKG+GAT(V+E). EKG, which does not use the graph encoder, achieves a 6.59 BLEU score, 1.03 higher than that of Graph2Seq++. The BLEU score is further improved to 6.72 by the vertex-only variant EKG+GAT(V). Comparing EKG+GAT(V) and EKG+GAT(V+E) to EKG, the BLEU scores increase by 0.13 and 0.42 respectively, which indicates the usefulness of the graph encoder and shows that the evolutionary knowledge from edges is a good supplement to that from vertices. In human evaluations, EKG+GAT(V+E) and EKG+GAT(V) have higher relevance and informativeness scores than EKG, which also indicates that the graph encoder can effectively utilize the evolutionary knowledge from vertices and edges, making the generated comments more relevant and informative.
Analysis of the number of entities:
The corresponding local EKG is constructed based on the entities from the passage. To explore the influence of the number N of entities, we report the BLEU scores of our full model for different values of N (we do not report the score when N = 1 because no edges are included then) in Figure 4(a). The best BLEU score is achieved at N = 5. The BLEU score without entities corresponds to the Transformer (Trans.). Our full model is robust to the number of entities, because the BLEU scores are stable over a range of N.
Analysis of the number of time periods:
We also report the BLEU scores for different numbers of time periods in Figure 4(b). Our full model achieves the best BLEU score of 7.01, which is 0.41 higher than that of the static graph with a single time period. This illustrates that the dynamic knowledge is useful for improving performance.
We provide a case study here. Four passages that need to be commented on are extracted from a novel chronologically and shown in Table 3. For comparison, we use Trans.+CTX and Graph2Seq++, which have the best relevance and informativeness scores among the baselines, respectively. First, within each case, we find that the generated comments from our model are more informative, while the outputs from the other models tend to be general or common replies, which demonstrates the effectiveness of our knowledge usage.
From another perspective, we observe that our model can make use of the dynamics of knowledge. Take P3 for example: our generated comment describes that Zeng Guofan was born in the south of the country, which is in accordance with the passage described in P1. Similar interactions can be found in all four cases, supporting the claims above.
In this paper, we propose to encode evolutionary knowledge for automatically commenting on long novels. We learn an Evolutionary Knowledge Graph under a multi-task framework and then design a graph-to-sequence model that utilizes the EKG for generating comments. In addition, we collect a new generation dataset called GraphNovel to advance the corresponding research. Experimental results show that our EKG-based model is superior to several strong baselines on both automatic metrics and human evaluations. In the future, we plan to develop new graph-based encoders to generate personalized comments with this dataset.
- Bahdanau et al. (2014) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
- Bastings et al. (2017) Joost Bastings, Ivan Titov, Wilker Aziz, Diego Marcheggiani, and Khalil Sima'an. 2017. Graph convolutional encoders for syntax-aware neural machine translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1957–1967, Copenhagen, Denmark. Association for Computational Linguistics.
- Beck et al. (2018) Daniel Beck, Gholamreza Haffari, and Trevor Cohn. 2018. Graph-to-sequence learning using gated graph neural networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 273–283, Melbourne, Australia. Association for Computational Linguistics.
- Clark et al. (2018) Elizabeth Clark, Yangfeng Ji, and Noah A. Smith. 2018. Neural text generation in stories using entity representations as context. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2250–2260, New Orleans, Louisiana. Association for Computational Linguistics.
- Devlin et al. (2019) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
- Gehring et al. (2017) Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. 2017. Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 1243–1252. JMLR.org.
- Guan et al. (2019) Jian Guan, Yansen Wang, and Minlie Huang. 2019. Story ending generation with incremental encoding and commonsense knowledge. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 6473–6480.
- Guo et al. (2018) Jiaxian Guo, Sidi Lu, Han Cai, Weinan Zhang, Yong Yu, and Jun Wang. 2018. Long text generation via adversarial training with leaked information. In Thirty-Second AAAI Conference on Artificial Intelligence.
- Guo et al. (2019) Zhijiang Guo, Yan Zhang, Zhiyang Teng, and Wei Lu. 2019. Densely connected graph convolutional networks for graph-to-sequence learning. Transactions of the Association for Computational Linguistics, 7:297–312.
- Iyyer et al. (2016) Mohit Iyyer, Anupam Guha, Snigdha Chaturvedi, Jordan Boyd-Graber, and Hal Daumé III. 2016. Feuding families and former Friends: Unsupervised learning for dynamic fictional relationships. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1534–1544, San Diego, California. Association for Computational Linguistics.
- Jiao et al. (2018) Zhenyu Jiao, Shuqi Sun, and Ke Sun. 2018. Chinese lexical analysis with deep bi-gru-crf network. arXiv preprint arXiv:1807.01882.
- Kingma and Ba (2014) Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. International Conference on Learning Representations.
- Klein et al. (2017) Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, and Alexander Rush. 2017. OpenNMT: Open-source toolkit for neural machine translation. In Proceedings of ACL 2017, System Demonstrations, pages 67–72, Vancouver, Canada. Association for Computational Linguistics.
- Kumar et al. (2019) Srijan Kumar, Xikun Zhang, and Jure Leskovec. 2019. Predicting dynamic embedding trajectory in temporal interaction networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’19, pages 1269–1278, New York, NY, USA. ACM.
- Li et al. (2019a) Wei Li, Jingjing Xu, Yancheng He, ShengLi Yan, Yunfang Wu, and Xu Sun. 2019a. Coherent comments generation for Chinese articles with a graph-to-sequence model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4843–4852, Florence, Italy. Association for Computational Linguistics.
- Li et al. (2019b) Zekang Li, Cheng Niu, Fandong Meng, Yang Feng, Qian Li, and Jie Zhou. 2019b. Incremental transformer with deliberation decoder for document grounded conversations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 12–21, Florence, Italy. Association for Computational Linguistics.
- Lin (2004) Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.
- Liu et al. (2018) Shuman Liu, Hongshen Chen, Zhaochun Ren, Yang Feng, Qun Liu, and Dawei Yin. 2018. Knowledge diffusion for neural dialogue generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1489–1498, Melbourne, Australia. Association for Computational Linguistics.
- Marcheggiani and Perez-Beltrachini (2018) Diego Marcheggiani and Laura Perez-Beltrachini. 2018. Deep graph convolutional encoders for structured data to text generation. In Proceedings of the 11th International Conference on Natural Language Generation, pages 1–9, Tilburg University, The Netherlands. Association for Computational Linguistics.
- Qin et al. (2018) Lianhui Qin, Lemao Liu, Wei Bi, Yan Wang, Xiaojiang Liu, Zhiting Hu, Hai Zhao, and Shuming Shi. 2018. Automatic article commenting: the task and dataset. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 151–156, Melbourne, Australia. Association for Computational Linguistics.
- Song et al. (2018) Linfeng Song, Yue Zhang, Zhiguo Wang, and Daniel Gildea. 2018. A graph-to-sequence model for AMR-to-text generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1616–1626, Melbourne, Australia. Association for Computational Linguistics.
- Sutskever et al. (2014) Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 3104–3112. Curran Associates, Inc.
- Taheri et al. (2019) Aynaz Taheri, Kevin Gimpel, and Tanya Berger-Wolf. 2019. Learning to represent the evolution of dynamic graphs with recurrent models. In Companion Proceedings of The 2019 World Wide Web Conference, WWW ’19, pages 301–307, New York, NY, USA. ACM.
- Trivedi et al. (2017) Rakshit Trivedi, Hanjun Dai, Yichen Wang, and Le Song. 2017. Know-evolve: Deep temporal reasoning for dynamic knowledge graphs. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17, pages 3462–3471. JMLR.org.
- Trivedi et al. (2019) Rakshit Trivedi, Mehrdad Farajtabar, Prasenjeet Biswal, and Hongyuan Zha. 2019. Dyrep: Learning representations over dynamic graphs. In International Conference on Learning Representations.
- Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 5998–6008. Curran Associates, Inc.
- Veličković et al. (2018) Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph attention networks. In International Conference on Learning Representations.
- Yang et al. (2019) Ze Yang, Can Xu, Wei Wu, and Zhoujun Li. 2019. Read, attend and comment: A deep architecture for automatic news comment generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5076–5088, Hong Kong, China. Association for Computational Linguistics.
- Zhang et al. (2018) Jiacheng Zhang, Huanbo Luan, Maosong Sun, Feifei Zhai, Jingfang Xu, Min Zhang, and Yang Liu. 2018. Improving the transformer translation model with document-level context. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 533–542, Brussels, Belgium. Association for Computational Linguistics.
- Zhang et al. (2019) Zheng Zhang, Minlie Huang, Zhongzhou Zhao, Feng Ji, Haiqing Chen, and Xiaoyan Zhu. 2019. Memory-augmented dialogue management for task-oriented dialogue systems. ACM Transactions on Information Systems (TOIS), 37(3):34.
- Zhao et al. (2018) Guoshuai Zhao, Jun Yu Li, Lu Wang, Xueming Qian, and Yun Fu. 2018. Graphseq2seq: Graph-sequence-to-sequence for neural machine translation.
- Zhou et al. (2018) Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, and Maosong Sun. 2018. Graph neural networks: A review of methods and applications. CoRR, abs/1812.08434.