Learning to Encode Evolutionary Knowledge for Automatic Commenting Long Novels

04/21/2020 ∙ by Canxiang Yan, et al. ∙ Tencent

Static knowledge graphs have been incorporated extensively into sequence-to-sequence frameworks for text generation. While effectively representing structured context, a static knowledge graph fails to represent knowledge evolution, which is required for modeling dynamic events. In this paper, an automatic commenting task is proposed for long novels, which involves understanding a context of tens of thousands of words or more. To model the dynamic storyline, especially the transitions of the characters and their relations, an Evolutionary Knowledge Graph (EKG) is proposed and learned within a multi-task framework. Given a specific passage to comment on, sequential modeling is used to incorporate historical and future embeddings for context representation. Further, a graph-to-sequence model is designed to utilize the EKG for comment generation. Extensive experimental results show that our EKG-based method is superior to several strong baselines on both automatic and human evaluations.


1 Introduction

In the past few years, the field of text generation has witnessed many significant advances, including but not limited to neural machine translation Vaswani et al. (2017); Gehring et al. (2017), dialogue systems Liu et al. (2018); Zhang et al. (2019), and text generation Clark et al. (2018); Guo et al. (2018). By utilizing the power of the sequence-to-sequence (S2S) framework Sutskever et al. (2014), generation models can predict the next token based on the previously generated outputs and the context. However, S2S models are not perfect. One obvious drawback is that S2S models tend to be short-sighted on long context and are unaware of global knowledge. Therefore, how to incorporate global or local knowledge into S2S models has been a long-standing research problem.

Figure 1: An example from a novel called “Fights Break Sphere”. The relations between Yanxiao, Xuner, and Nalanyanran are evolutionary, and the characterization of Yanxiao changes over time.

There are two different directions for including knowledge in S2S models. On the one hand, many efforts Zhang et al. (2018); Guan et al. (2019); Li et al. (2019b) have been made to address the short-sightedness problem in text generation by explicitly modeling unstructured context. Nevertheless, these approaches rely heavily on the quality and scale of the unstructured context, and become intractable in scenarios where the context length increases drastically (e.g., commenting on a full-length novel). On the other hand, researchers (Beck et al., 2018; Marcheggiani and Perez-Beltrachini, 2018; Li et al., 2019a) combine knowledge with S2S models by employing pre-processed structured data (e.g., knowledge graphs), which naturally avoids the difficulty of context length. However, those models are oriented to static knowledge, and hence can hardly model events where temporal knowledge evolution occurs.

Dynamic knowledge evolution is very common in full-length novels. In a novel, a knowledge graph can be constructed by using entities (characters, organizations, locations, etc.) as vertices and entity relations as edges. Obviously, a single static knowledge graph can hardly represent a dynamic storyline full of dramatic changes. For example, a naughty boy can grow up into a hero, friends may become lovers, etc. In this paper, we propose the Evolutionary Knowledge Graph (EKG), which contains a sub-graph for each time step. Figure 1 illustrates the EKG for the novel “Fights Break Sphere”. In three different scenes, “Yanxiao”, the leading role of the novel, is characterized as a “proud boy”, a “weak fighter”, and a “magic master”, respectively. At the same time, the relation between “Yanxiao” and “Xuner” evolves from friends into lovers, and finally into a married couple, and the relation between “Yanxiao” and “Nalanyanran” changes over time from engagement to divorce to friendship.

The EKG is important for commenting on passages of novels, since it is the dramatic evolution and contrast in the storyline, rather than static facts, that resonate with readers most. As illustrated in Figure 1, when commenting on a passage sampled from a specific chapter of the novel, user-A refers to a historical fact that “Nalanyanran” has abandoned “Yanxiao”, while user-B refers to the future relation between “Yanxiao” and a related entity “Xuner”, namely that they will go through difficulties together. In this paper, the EKG is trained within a multi-task framework to represent the latent dynamic context, and then a novel graph-to-sequence model is designed to select the relevant context from the EKG for comment generation.

1.1 Related Work

Graph-to-sequence models have been proposed for text generation. Song et al. (2018), Beck et al. (2018), and Guo et al. (2019) used graph neural networks to solve the AMR-to-text problem. Bastings et al. (2017) and Zhao et al. (2018) utilized graph convolutional networks to incorporate syntactic structure into neural attention-based encoder-decoder models for machine translation. In comment generation, Graph2Seq (Li et al., 2019a) was proposed to generate comments by modeling the input news as a topic graph. These methods use static graphs and do not involve dynamic knowledge evolution.

Recently, more research attention has been focused on dynamic knowledge modeling. Taheri et al. (2019) used gated graph neural networks to learn the temporal dynamics of an evolving graph for dynamic graph classification. Trivedi et al. (2017, 2019) and Kumar et al. (2019) learned evolving entity representations over time for dynamic link prediction. Unlike the EKG in this paper, they did not model embeddings of the relations between dynamic entities. Iyyer et al. (2016) proposed an unsupervised deep learning algorithm to model the dynamic relationship between characters in a novel without considering entity embeddings. Unlike these methods, our EKG-based model represents the temporal evolution of entities and relations simultaneously by learning their temporal embeddings, and hence has an advantage in supporting text generation tasks.

To our knowledge, few studies make use of an evolutionary knowledge graph for text generation. This may be due to the lack of datasets involving dynamic temporal evolution. We observe that novel commenting requires understanding long context full of dramatic changes, and hence build such a dataset by collecting full-length novels and real user comments. The dataset with its EKG will be made publicly available, and more details can be found in Section 2.

The main contributions of our work are three-fold:

  • We build a new dataset to facilitate research on evolutionary-knowledge-based text generation.

  • We propose a multi-task framework for learning an evolutionary knowledge graph to model the long and dynamic context.

  • We propose a novel graph-to-sequence model to incorporate the evolutionary knowledge graph into text generation.

2 Dataset Development and Evolutionary Knowledge Graph Building

To facilitate research on modeling knowledge evolution for text generation, we build a dataset called GraphNovel by collecting full-length novels and real user comments. Together with the corresponding EKG embeddings, we will make the dataset publicly available soon. We detail the collection of the dataset below.

2.1 Data collection

The data is collected from well-known Chinese novel websites. To increase the diversity of the data, the top-1000 hottest novels are crawled, covering different genres including science fiction, fantasy, action, romance, historical, and so on. We then filter out novels based on the following three considerations: 1) the number of chapters is less than 10, 2) few entities are mentioned, and 3) user comments are lacking. Each remaining novel includes chapters in chronological order, a set of user-highlighted passages, and user comments on those passages.

Then, we use a lexical analysis tool (Jiao et al., 2018) to recognize three types of entities (persons, organizations, locations) in each novel. Because nicknames are common in novels, the entities identified by the tool contain considerable noise. To improve the knowledge quality, human annotators are asked to verify the entities and add missing ones. Then all paragraphs containing mentions of two entities are identified; each such paragraph will later serve as a representation of the relation between those entities at that specific time step.

For the highlighted novel passages and user comments, three criteria are used to select high-quality and informative data: 1) a selected passage must contain at least one entity; 2) a selected passage must be commented on by at least three users; 3) comments on the same passage are ranked by upvotes, and the bottom 20% are dropped. Notably, the highlighted passages have a degree of redundancy because users tend to highlight and comment on passages at very similar positions. We therefore merge passages that overlap by more than 50%, which reduces the number of passages by about 30%.
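To make the merging step concrete, here is a minimal Python sketch, assuming each highlighted passage is represented by its character span in the novel and that the 50% threshold is measured against the shorter of the two spans; the helper name merge_passages and the greedy single-pass strategy are our own illustrative choices, not the paper's code.

```python
def merge_passages(spans, min_overlap=0.5):
    """Greedily merge highlighted passages whose character spans overlap by
    more than `min_overlap` of the shorter span (illustrative heuristic)."""
    merged = []
    for start, end in sorted(spans):
        if merged:
            prev_start, prev_end = merged[-1]
            overlap = min(prev_end, end) - max(prev_start, start)
            shorter = min(prev_end - prev_start, end - start)
            if shorter > 0 and overlap / shorter > min_overlap:
                # Replace the previous span with the union of the two.
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    return merged

# Two highlights that share most of their text collapse into one.
print(merge_passages([(100, 200), (150, 230), (500, 600)]))
# -> [(100, 230), (500, 600)]
```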

2.2 Core Statistics

The dataset contains 203 novels, 349,695 highlighted passages, and 3,136,210 comments in total. Because of the diverse genres of the novels in our dataset, the number of entities and relations per novel varies widely. The number of comments per passage also varies considerably, because it depends on how interesting the corresponding passage is.

Figure 2: Architecture of our model. First, the EKG is trained under a multi-task learning framework. Then a graph-to-sequence model is trained to utilize the EKG for text generation.

We partition the dataset into non-overlapping training, validation, and test portions by novel (see Table 1 for detailed statistics). For each passage in the validation and test sets, the five most relevant comments are selected by human annotators, while all comments in the training set are preserved to allow flexible use.

                               train      valid     test
# novels                         173         10       20
# passages                   324,803      7,976   16,916
# comments                 3,011,750     39,880   84,580
Avg. length of context       520,571    305,492  277,847
Avg. # entities                383.4      720.6    281.7
Avg. # relations               9,013     19,919    7,084
Avg. # comments per passage      9.3        5.0      5.0
Avg. # entities per passage      2.6        3.1      3.4
Avg. # relations per passage     4.4        9.2     11.0
Table 1: Statistics of the dataset.

2.3 Building the Evolutionary Knowledge Graph

For each novel, we build a knowledge graph which consists of a sequence of sub-graphs. Ideally, one would build a sub-graph for each important scene of the novel, with more sub-graphs around critical transitions in the storyline. In this paper, we assume that each chapter usually represents an integral scene, and hence build a sub-graph for each chapter of the novel.

For each chapter, the entities mentioned in that chapter become the vertices of the corresponding sub-graph, and if a paragraph contains two of the entities, an edge is created between them. In this way, a sequence of sub-graphs is constructed, forming our EKG. In the next section, we formulate the embedding computation of the EKG and its application to comment generation.
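This construction can be summarized with a short sketch, assuming entities are matched by simple string containment and each chapter is given as a list of paragraph strings; the function names are illustrative only.

```python
from collections import defaultdict
from itertools import combinations

def build_chapter_subgraph(paragraphs, entities):
    """One temporal sub-graph per chapter: vertices are the entities mentioned
    in the chapter; an edge links two entities whenever some paragraph mentions
    both, and that paragraph is kept as evidence of the relation."""
    vertices = set()
    edges = defaultdict(list)  # (entity_a, entity_b) -> supporting paragraphs
    for para in paragraphs:
        mentioned = sorted({e for e in entities if e in para})
        vertices.update(mentioned)
        for a, b in combinations(mentioned, 2):
            edges[(a, b)].append(para)
    return {"vertices": vertices, "edges": dict(edges)}

def build_ekg(chapters, entities):
    """The EKG is simply the chapter-indexed sequence of these sub-graphs."""
    return [build_chapter_subgraph(ch, entities) for ch in chapters]
```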

3 Model Formalism

In this section, we formulate our approach in detail. First, the training of the EKG embeddings is presented. Then the graph-to-sequence model that utilizes the EKG for comment generation is described. The architecture of the model is shown in Figure 2.

3.1 Definition

For a novel with N entities and M relations, we define a global evolutionary knowledge graph G = {g_1, g_2, ..., g_T}, where T is the number of chapters (or time periods; successive chapters that are too short are clustered into a longer one). Each g_t = (V_t, E_t) is the temporal knowledge graph (TKG) of chapter t, where V_t is the set of vertices and E_t is the set of edges between vertices.

Given a passage from chapter t with n entities and m relations, a local EKG related to it is a sub-graph of the global EKG: G_p = {g'_1, g'_2, ..., g'_T}, where g'_k = (V'_k, E'_k); V'_k is a subset of V_k of size n and E'_k is a subset of E_k of size m. The local EKG of the passage is thus a sequence of local temporal knowledge sub-graphs with vertex embeddings and edge embeddings.

3.2 EKG Embedding Training

Inspired by its consistent state-of-the-art performance on language understanding tasks, we use the off-the-shelf Chinese BERT model (Devlin et al., 2019) to calculate the initial semantic representation of sentences. Considering that entities are either out-of-vocabulary or associated with special semantics within the novel context, we propose the following algorithm to jointly learn entity and relation embeddings in the EKG:

Vertex embedding learning.

The passages containing mentions of entity v_i contribute to learning its embedding. Specifically, the j-th such passage is tokenized while the entity mention is masked with the token “[MASK]”. The resulting tokens are fed into the pre-trained Chinese BERT model, and the output feature f_{i,j}^t at the masked position is obtained. Within chapter t, such sentences exist for every vertex v_i in V_t. The embedding e_i^t of v_i is learned by optimizing the following softmax loss, which models the probability of predicting the masked entity as v_i:

p(v_i | f_{i,j}^t) = exp((W f_{i,j}^t) · e_i^t) / Σ_{v_k ∈ V_t} exp((W f_{i,j}^t) · e_k^t),   (1)

L_t(v_i) = − Σ_j log p(v_i | f_{i,j}^t),   (2)

where W is a learnable parameter and e_i^t denotes the embedding of v_i in chapter t. Usually the semantic representations of entities change smoothly over time, so we propose a temporally smoothed softmax loss to retain the similarity of entity embeddings across successive time periods:

L̃_t(v_i) = α L_{t−1}(v_i) + β L_t(v_i) + γ L_{t+1}(v_i),   (3)

where α, β, and γ are smooth factors, and the terms for the neighboring chapters are only included when t > 1 or t < T, respectively. The overall loss over all time periods and all vertices is then:

L_V = Σ_{t=1}^{T} Σ_{v_i ∈ V_t} L̃_t(v_i).   (4)
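Because the exact form of Eqs. (1)-(4) could not be fully recovered from the source, the following PyTorch sketch should be read only as one plausible rendering of the described idea: predict the masked entity from its BERT mask feature with a softmax over the chapter's entity embeddings, and smooth the loss toward neighboring chapters. The class name, dimensions, and the single smooth factor alpha are assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

class VertexEmbeddingLoss(torch.nn.Module):
    """Predict the masked entity from its BERT mask feature with a softmax over
    the per-chapter entity embeddings, smoothed toward neighboring chapters."""

    def __init__(self, num_entities, num_chapters, bert_dim=768, ent_dim=256,
                 alpha=0.1):
        super().__init__()
        # One embedding per (chapter, entity): the evolving vertex embeddings.
        self.entity_emb = torch.nn.Parameter(
            torch.randn(num_chapters, num_entities, ent_dim) * 0.02)
        self.proj = torch.nn.Linear(bert_dim, ent_dim)  # learnable W
        self.alpha = alpha  # assumed smooth factor for adjacent chapters

    def chapter_loss(self, mask_feats, entity_ids, t):
        # Softmax over the entity embeddings of chapter t.
        logits = self.proj(mask_feats) @ self.entity_emb[t].T
        return F.cross_entropy(logits, entity_ids)

    def forward(self, mask_feats, entity_ids, t):
        # mask_feats: [batch, bert_dim] BERT features at the "[MASK]" position.
        loss = self.chapter_loss(mask_feats, entity_ids, t)
        # Temporal smoothing: also score against neighboring chapters.
        if t > 0:
            loss = loss + self.alpha * self.chapter_loss(mask_feats, entity_ids, t - 1)
        if t < self.entity_emb.shape[0] - 1:
            loss = loss + self.alpha * self.chapter_loss(mask_feats, entity_ids, t + 1)
        return loss
```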

Edge embedding learning.

Since the number of relations equals the number of co-occurrences of any two entities, it is infeasible to employ an embedding matrix to model the relations. Therefore, a Relation Network (RN) is proposed to learn the edge embeddings in the TKGs, as shown in Figure 3. Specifically, the RN takes two vertex embeddings as input and feeds them into a first hidden layer to obtain the embedding of the edge between them. Then the embeddings of the two vertices and the edge are concatenated and fed into a second hidden layer to reconstruct the representation of the co-occurrence sentence. We also use the pre-trained BERT (Devlin et al., 2019) to obtain the representation s of the whole sentence; s is taken from the final hidden state corresponding to the “[CLS]” token because it aggregates the sequence representation.

A reconstruction loss applied to the network is optimized to jointly learn the RN and the edge embeddings:

L_t(i, j) = max(0, δ + d⁺ − d⁻),   (5)

where (i, j) stands for a pair of vertices; d⁺ = ‖RN(e_i^t, e_j^t) − s‖² is the reconstruction error of a positive pair, whose two vertices are both related to the edge (they co-occur in the sentence represented by s); d⁻ is the reconstruction error of a negative pair, with one vertex related to the edge and the other unrelated; and δ is a margin. The overall loss for learning the edge embeddings is:

L_E = Σ_{t=1}^{T} Σ_{(i,j) ∈ E_t} L_t(i, j).   (6)
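Equation (5) was likewise damaged in extraction, so the sketch below implements only one plausible reading of the description: a two-layer Relation Network whose output must reconstruct the sentence's [CLS] representation better for a positive vertex pair than for a negative one. Layer sizes, the MSE distance, and the margin-based form are assumptions.

```python
import torch
import torch.nn.functional as F

class RelationNetwork(torch.nn.Module):
    """Two vertex embeddings -> edge embedding -> reconstruction of the BERT
    [CLS] representation of the co-occurrence sentence (Figure 3)."""

    def __init__(self, ent_dim=256, edge_dim=256, sent_dim=768):
        super().__init__()
        self.edge_layer = torch.nn.Linear(2 * ent_dim, edge_dim)
        self.recon_layer = torch.nn.Linear(2 * ent_dim + edge_dim, sent_dim)

    def forward(self, e_i, e_j):
        edge = torch.relu(self.edge_layer(torch.cat([e_i, e_j], dim=-1)))
        recon = self.recon_layer(torch.cat([e_i, e_j, edge], dim=-1))
        return edge, recon

def reconstruction_loss(rn, pos_pair, neg_pair, cls_feat, margin=0.0):
    """Margin-style contrastive loss: the positive pair should reconstruct the
    sentence representation better than the negative pair."""
    _, recon_pos = rn(*pos_pair)
    _, recon_neg = rn(*neg_pair)
    d_pos = F.mse_loss(recon_pos, cls_feat)
    d_neg = F.mse_loss(recon_neg, cls_feat)
    return torch.clamp(margin + d_pos - d_neg, min=0.0)
```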

Combining the vertex and edge losses above, our final multi-task loss is:

L = L_V + λ L_E,   (7)

where λ is a hyperparameter to be tuned.

Figure 3: Relation Network with a reconstruction loss. The edge embedding is shown in red.

3.3 Graph-to-sequence modeling

After EKG embedding learning, we propose a graph encoder to utilize the embeddings of the EKG for comment generation. From the learned EKG, we obtain a vector sequence (e_i^1, ..., e_i^T) for each vertex and (r_{ij}^1, ..., r_{ij}^T) for each edge. These sequences are fed into a Bi-LSTM to integrate information from all time periods. The final representations of the vertices and edges are taken from the hidden states of the Bi-LSTM at the time step of the chapter containing the passage.
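A minimal sketch of this sequential step, assuming the per-chapter embeddings of one vertex (or edge) are stacked into a single tensor and that the representation is read out at the chapter containing the passage; the read-out position and the layer sizes (two layers, hidden size 768, as in Section 4.2) are assumptions of this sketch.

```python
import torch

class TemporalEncoder(torch.nn.Module):
    """Run a Bi-LSTM over the chapter-indexed embedding sequence of a vertex
    (or edge) so that both historical and future knowledge are integrated."""

    def __init__(self, emb_dim=256, hidden_dim=768):
        super().__init__()
        self.bilstm = torch.nn.LSTM(emb_dim, hidden_dim // 2, num_layers=2,
                                    batch_first=True, bidirectional=True)

    def forward(self, temporal_embs, t):
        # temporal_embs: [batch, num_chapters, emb_dim]
        outputs, _ = self.bilstm(temporal_embs)  # [batch, num_chapters, hidden_dim]
        return outputs[:, t, :]                  # representation at chapter t
```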

Further, our graph encoder employs graph convolutional networks (Zhou et al., 2018) to aggregate the structured knowledge from the EKG, and is then combined with a widely used encoder-decoder framework (Vaswani et al., 2017) for generation.

Our graph encoder is based on the implementation of GAT (Veličković et al., 2018). The input to a single GAT layer is a set of vertex features {h_1, ..., h_n} and edge features {r_1, ..., r_m}, where n is the number of vertices and m is the number of edges from the passage. The layer produces a new set of vertex features {h'_1, ..., h'_n} as its output.

To aggregate the structured knowledge from the input features and transform it into higher-level features, we perform self-attention over both the vertices and the edges to compute attention coefficients:

α_{ij} = softmax_j( f(W h_i, W h_j) ),   (8)

β_{ij} = softmax_j( g(W h_i, W r_{ij}) ),   (9)

where W is a learnable parameter; f and g are mapping functions; α_{ij} and β_{ij} indicate the importance of the neighboring vertex and edge features to vertex i, and both are normalized using a softmax over the neighbors of i.

Once obtained, the normalized attention coefficients are used to compute a linear combination of the features they correspond to, which serves as the final output feature of every vertex:

h'_i = σ( Σ_{j ∈ N_i} α_{ij} W h_j + Σ_{j ∈ N_i} β_{ij} W r_{ij} ).   (10)
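Since Eqs. (8)-(10) are reconstructions, the layer below should be read the same way: a hedged sketch of a GAT-style layer that attends over both neighboring vertices and the connecting edges, not the authors' implementation. It assumes dense [n, n] edge-feature and adjacency tensors, with self-loops included in the adjacency so that every vertex has at least one neighbor.

```python
import torch
import torch.nn.functional as F

class VertexEdgeGATLayer(torch.nn.Module):
    """One graph-attention layer combining neighbor vertex features and edge
    features, in the spirit of Eqs. (8)-(10)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = torch.nn.Linear(in_dim, out_dim, bias=False)
        self.att_v = torch.nn.Linear(2 * out_dim, 1)  # mapping function f
        self.att_e = torch.nn.Linear(2 * out_dim, 1)  # mapping function g

    def forward(self, h, r, adj):
        # h: [n, in_dim] vertex features; r: [n, n, in_dim] edge features;
        # adj: [n, n] adjacency mask (assumed to include self-loops).
        n = h.size(0)
        Wh = self.W(h)                         # [n, out_dim]
        Wr = self.W(r)                         # [n, n, out_dim]
        hi = Wh.unsqueeze(1).expand(n, n, -1)  # features of vertex i
        hj = Wh.unsqueeze(0).expand(n, n, -1)  # features of neighbor j
        e_v = F.leaky_relu(self.att_v(torch.cat([hi, hj], dim=-1))).squeeze(-1)
        e_e = F.leaky_relu(self.att_e(torch.cat([hi, Wr], dim=-1))).squeeze(-1)
        mask = adj == 0
        alpha = torch.softmax(e_v.masked_fill(mask, float("-inf")), dim=1)
        beta = torch.softmax(e_e.masked_fill(mask, float("-inf")), dim=1)
        # Eq. (10): combine neighbor vertex features and edge features;
        # the "V-only" variant keeps just the first term.
        h_new = alpha @ Wh + (beta.unsqueeze(-1) * Wr).sum(dim=1)
        return torch.relu(h_new)
```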

The graph encoder is then combined with the encoder-decoder framework (Vaswani et al., 2017), in which a self-attention based encoder is used to encode the passage. To aggregate the structured knowledge, the encodings of all vertices from the graph encoder are concatenated with the output of the passage encoder and fed into a Transformer decoder for text generation. The whole graph-to-sequence model is trained to minimize the negative log-likelihood of the reference comment.

4 Experiment

In this section, we first introduce the experimental details, and then present results from automatic and human evaluations.

4.1 Details

We train all models on the training set and tune hyperparameters on the validation set. Automatic and human evaluations are carried out on the test set. During the training of the EKG, we first learn the vertex embeddings and fix them during the subsequent training of the edge embeddings. Our GAT-based graph encoder operates on the entities from the passage. We keep N entities for each passage: if the number of entities is smaller than N (it is never zero, because every passage in our dataset contains at least one entity), we use breadth-first search on the global graph to fill the gap; otherwise we filter out the low-order entities according to entity frequency. We set N to 5, selected on the validation set. Label smoothing is used in the smoothed softmax loss. We denote our full model as “EKG+GAT(V+E)”, the variant that only uses the first term in Eq. 10 as “EKG+GAT(V)”, and the variant that does not use the GAT-based graph encoder and feeds the vertex encodings into the Transformer decoder directly as “EKG”.
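The entity-selection step can be illustrated with a short sketch, assuming the global graph is stored as an adjacency dictionary and entity frequencies as a plain counter; the function name and data layout are illustrative, not taken from the paper.

```python
from collections import deque

def select_entities(passage_entities, global_graph, entity_freq, n=5):
    """Keep exactly n entities per passage: breadth-first expand over the global
    graph when the passage mentions fewer than n, otherwise keep the n most
    frequent entities."""
    if len(passage_entities) >= n:
        return sorted(passage_entities, key=lambda e: -entity_freq.get(e, 0))[:n]
    selected = list(passage_entities)
    queue = deque(passage_entities)
    while queue and len(selected) < n:
        current = queue.popleft()
        for neighbor in global_graph.get(current, []):
            if neighbor not in selected:
                selected.append(neighbor)
                queue.append(neighbor)
                if len(selected) == n:
                    break
    return selected
```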

4.2 Hyperparameters

In our model, the smooth factors of the smoothed softmax loss are chosen on the validation set; the margin of the reconstruction loss is set to 0.0 and the weight λ of the multi-task loss to 1.0. The number of self-attention layers in our passage encoder is 6; the number of Bi-LSTMs is 2, with a hidden state of size 768; and the GAT-based graph encoder has two layers. To stabilize training, we use the Adam optimizer (Kingma and Ba, 2014) and follow the learning rate strategy of Klein et al. (2017), increasing the learning rate linearly during the warm-up steps and then decaying it exponentially. For inference, the maximum decoding length is 50, and beam search with beam size 4 is used for all models.

4.3 Evaluation metrics

We use both automatic metrics and human evaluations to assess the quality of the generated novel comments.

Automatic metrics: 1) BLEU is commonly employed for evaluating translation systems and has also been introduced into the comment generation task (Qin et al., 2018; Yang et al., 2019). We use the multi-bleu.perl script (https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/multi-bleu.perl) to calculate the BLEU score. 2) ROUGE-L (Lin, 2004) uses the longest common subsequence to calculate the similarity score between candidates and references. For its calculation, we use the Python package pyrouge (https://pypi.org/project/pyrouge/). Both metrics support the multi-reference evaluation on our dataset.

Human evaluations: 1) Relevance: this metric evaluates how relevant the comment is to the passage; it measures the degree to which the comment is about the main storyline of the novel. 2) Fluency: this metric evaluates whether the sentence is fluent, follows the grammar, and has clear logic. 3) Informativeness: this metric evaluates how much structured knowledge the comment contains; it measures whether the comment reflects the evolution of entities and relations, or is just a general description that could be used for many passages. All these metrics have three gears, and the final scores are projected to the range 0 to 3.

4.4 Baseline Models

We describe three kinds of models used as baselines. All baselines are implemented following the corresponding related work and tuned on the validation set.

Model BLEU ROUGE-L Relevance Fluency Informativeness
Seq2Seq(Qin et al., 2018) 2.59 14.71 0.12 1.71 0.09
Attn(Qin et al., 2018) 3.71 16.44 0.34 1.70 0.33
Trans(Vaswani et al., 2017) 6.11 19.21 0.57 1.62 0.58
Trans.+CTX(Zhang et al., 2018) 6.52 19.11 0.68 1.68 0.67
Graph2Seq(Li et al., 2019a) 4.93 16.91 0.35 1.69 0.31
Graph2Seq++(Li et al., 2019a) 5.56 17.51 0.85 1.67 0.60
EKG 6.59 20.00 0.81 1.83 0.64
EKG+GAT(V) 6.72 20.09 0.88 1.77 0.70
EKG+GAT(V+E) 7.01 20.10 0.89 1.74 0.75
Human Performance 100 100 1.09 1.85 1.04
Table 2: Automatic metrics and human evaluations.
  • Seq2Seq models (Qin et al., 2018): these models generate comments for news either from the title or from the entire article. Since there are no titles in our dataset, we compare two kinds of models from their work: 1) Seq2Seq, a basic sequence-to-sequence model (Sutskever et al., 2014) that generates comments from the passage; 2) Attn, a sequence-to-sequence model with an attention mechanism (Bahdanau et al., 2014). For the input of the attention model, we append the related entities to the end of the passage.

  • Self-attention models: our model includes a graph encoder to encode knowledge from the graph and a passage encoder that uses multiple self-attention layers. To show the power of the graph encoder, we use the encoder-decoder framework (Trans., Vaswani et al., 2017) for a passage-based comparison. We also introduce an improved Transformer (Zhang et al., 2018) with a context encoder that represents document-level context, denoted Trans.+CTX. For the context input, we use up to 512 tokens before the passage.

  • Graph2Seq (Li et al., 2019a): this is a graph-to-sequence model that builds a static topic graph for the input and generates comments based on representations of entities only. A two-layer Transformer encoder is used in their work. For a fair comparison, we replace it with a 6-layer Transformer encoder and denote the new model Graph2Seq++.

4.5 Evaluation results

Table 2 shows the results of both automatic metrics and human evaluations.

On automatic metrics, our proposed models achieve the best BLEU and ROUGE-L scores. For BLEU, our full model EKG+GAT(V+E) achieves 7.01, which is 0.49 higher than that of the best baseline, Trans.+CTX (6.52). Graph2Seq++ has a BLEU score of 5.56, clearly lower than that of EKG+GAT(V+E); the main reason is that Graph2Seq++ is based on a static graph and cannot make use of the dynamic knowledge. For ROUGE-L, all of our models score above 20, at least 0.79 higher than Trans., the best baseline on this metric; note that the ROUGE-L score of Trans. is higher than that of Trans.+CTX, the opposite of their ordering on BLEU.

For human evaluations, we randomly select 100 passages from the test set and run all the models in Table 2 to generate their respective comments. We also provide one user comment for each passage to obtain an estimate of human performance. All these passage-comment pairs are labeled by human annotators. On the relevance metric, our full model EKG+GAT(V+E) scores higher than all baselines, meaning that our model generates more relevant comments and better reflects the main storyline of the novel; even so, a significant gap to human performance remains. On the fluency and informativeness metrics, EKG+GAT(V+E) also achieves higher scores than all baselines, indicating that the comments generated by our model are more fluent and contain more attractive information.

P1: 这人家姓曾,住在县城以南一百三十里外的荷叶塘都。
(This family, surnamed Zeng, lives in Heyetangdu, 130 miles south of the county.)
T1: 原来是这么来的。 (That’s how it turned out.)
G1: 这个地方呀! (This is the place !)
E1: 他一直思念着家里人。 (He has been missing his family.)
 [P2]
P2: 国藩今日乃戴孝之身,老母并未安葬妥帖,怎忍离家出山?
(Today Guofan is wearing mourning. My mother hasn’t been buried yet. How can I leave home ?)
T2: 真是一个聪明人 (He is so clever.)
G2: 老太太也是个好人 (The old lady is also a good person.)
E2: 这个时候的国家已经有了变化 (The country is changing at this time.)
 [P3]
P3: 面临大敌,曾暗自下定决心,一旦城破,立即自刎,追随塔齐布、罗泽南于地下。
(Facing the enemy, Zeng made up his mind to commit suicide as soon as the city broke, following Ta Qibu and Luo Zenan.)
T3: 这就是原来战争的样子
(This is what the war looks like.)
G3: 一个人的命运总是如此
(One’s destiny is always like this.)
E3: 自立于南城,自破而立 (He established himself in the south of the country through constant breakthroughs.)
 [P1]
P4: 曾国藩的脸上露出一丝浅浅的笑意,头一歪,倒在太师椅上.
(Zeng Guofan smiled slightly. His head tilted and fell on the chair.)
T4: 这一段描写真的很有画面感 (This description is really picturesque.)
G4: 这个人的心思缜密 (This man is very thoughtful.)
E4: 他一生忠君为国,就这样走了 (He was loyal to his country all his life; he is gone.)
 [P3]
Table 3: Comments generated by Trans.+CTX (T), Graph2Seq++ (G), and our EKG+GAT(V+E) (E). The passages (i.e., P1, P2, P3, P4) are extracted from the same novel, called Zeng Guofan. We highlight the passage corresponding to the comment generated by our model E (indicated by the bracketed marker after each example), and the relevant fragments are marked with the same color.
Figure 4: Ablation results for the number of entities (a) and the number of time periods (b).

4.6 Analysis and Discussion

Ablation study:

We compare the results of EKG, EKG+GAT(V), and EKG+GAT(V+E). EKG, which does not use the graph encoder, achieves a BLEU score of 6.59, which is 1.03 higher than that of Graph2Seq++. The BLEU score is further improved to 6.72 by the vertex-only variant EKG+GAT(V). Comparing EKG+GAT(V) and EKG+GAT(V+E) to EKG, the BLEU scores increase by 0.13 and 0.42 respectively, which indicates the usefulness of the graph encoder and shows that the evolutionary knowledge from edges is a good supplement to that from vertices. In human evaluations, EKG+GAT(V+E) and EKG+GAT(V) have higher relevance and informativeness scores than EKG, which again indicates that the graph encoder can effectively utilize the evolutionary knowledge from vertices and edges and make the generated comments more relevant and informative.

Analysis of the number of entities:

The local EKG is constructed based on the entities from the passage. To explore the influence of the number N of entities, we report the BLEU scores of our full model for different values of N in Figure 4(a) (the score for N = 1 is not reported because no edges are included in that case). The best BLEU score is achieved at N = 5, and the score at N = 0 corresponds to the Transformer (Trans.). Our full model is also robust to the number of entities, as the BLEU scores remain stable over a range of N.

Analysis of the number of time periods:

We also report the BLEU scores for different numbers of time periods in Figure 4(b). Our full model achieves the best BLEU score of 7.01, which is 0.41 higher than that of the static graph with a single time period. This illustrates that the dynamic knowledge is useful for improving performance.

Case study:

We provide a case study here. Four passages to be commented on are extracted from a single novel in chronological order and shown in Table 3. For comparison, we use Trans.+CTX and Graph2Seq++, which achieve the best informativeness and relevance scores among the baselines, respectively. Within each case, we find that the comments generated by our model are more informative, while the outputs of the other models tend to be generic replies, which demonstrates the effectiveness of our knowledge usage.

From another perspective, we observe that our model can make use of the dynamics of knowledge. Taking P3 as an example, our generated comment describes that Zeng Guofan was born in the south of the country, which is in accordance with the passage P1. Similar interactions can be found in all four cases, which supports the claims above.

5 Conclusion

In this paper, we propose to encode evolutionary knowledge for automatically commenting on long novels. We learn an Evolutionary Knowledge Graph within a multi-task framework and then design a graph-to-sequence model that utilizes the EKG to generate comments. In addition, we collect a new generation dataset called GraphNovel to advance the corresponding research. Experimental results show that our EKG-based model is superior to several strong baselines on both automatic metrics and human evaluations. In the future, we plan to develop new graph-based encoders to generate personalized comments with this dataset.

References

  • Bahdanau et al. (2014) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
  • Bastings et al. (2017) Joost Bastings, Ivan Titov, Wilker Aziz, Diego Marcheggiani, and Khalil Sima’an. 2017. Graph convolutional encoders for syntax-aware neural machine translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1957–1967, Copenhagen, Denmark. Association for Computational Linguistics.
  • Beck et al. (2018) Daniel Beck, Gholamreza Haffari, and Trevor Cohn. 2018. Graph-to-sequence learning using gated graph neural networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 273–283, Melbourne, Australia. Association for Computational Linguistics.
  • Clark et al. (2018) Elizabeth Clark, Yangfeng Ji, and Noah A. Smith. 2018. Neural text generation in stories using entity representations as context. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2250–2260, New Orleans, Louisiana. Association for Computational Linguistics.
  • Devlin et al. (2019) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Gehring et al. (2017) Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. 2017. Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 1243–1252. JMLR.org.
  • Guan et al. (2019) Jian Guan, Yansen Wang, and Minlie Huang. 2019. Story ending generation with incremental encoding and commonsense knowledge. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 6473–6480.
  • Guo et al. (2018) Jiaxian Guo, Sidi Lu, Han Cai, Weinan Zhang, Yong Yu, and Jun Wang. 2018. Long text generation via adversarial training with leaked information. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Guo et al. (2019) Zhijiang Guo, Yan Zhang, Zhiyang Teng, and Wei Lu. 2019. Densely connected graph convolutional networks for graph-to-sequence learning. Transactions of the Association for Computational Linguistics, 7:297–312.
  • Iyyer et al. (2016) Mohit Iyyer, Anupam Guha, Snigdha Chaturvedi, Jordan Boyd-Graber, and Hal Daumé III. 2016. Feuding families and former Friends: Unsupervised learning for dynamic fictional relationships. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1534–1544, San Diego, California. Association for Computational Linguistics.
  • Jiao et al. (2018) Zhenyu Jiao, Shuqi Sun, and Ke Sun. 2018. Chinese lexical analysis with deep bi-gru-crf network. arXiv preprint arXiv:1807.01882.
  • Kingma and Ba (2014) Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. International Conference on Learning Representations.
  • Klein et al. (2017) Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, and Alexander Rush. 2017. OpenNMT: Open-source toolkit for neural machine translation. In Proceedings of ACL 2017, System Demonstrations, pages 67–72, Vancouver, Canada. Association for Computational Linguistics.
  • Kumar et al. (2019) Srijan Kumar, Xikun Zhang, and Jure Leskovec. 2019. Predicting dynamic embedding trajectory in temporal interaction networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’19, pages 1269–1278, New York, NY, USA. ACM.
  • Li et al. (2019a) Wei Li, Jingjing Xu, Yancheng He, ShengLi Yan, Yunfang Wu, and Xu Sun. 2019a. Coherent comments generation for Chinese articles with a graph-to-sequence model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4843–4852, Florence, Italy. Association for Computational Linguistics.
  • Li et al. (2019b) Zekang Li, Cheng Niu, Fandong Meng, Yang Feng, Qian Li, and Jie Zhou. 2019b. Incremental transformer with deliberation decoder for document grounded conversations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 12–21, Florence, Italy. Association for Computational Linguistics.
  • Lin (2004) Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.
  • Liu et al. (2018) Shuman Liu, Hongshen Chen, Zhaochun Ren, Yang Feng, Qun Liu, and Dawei Yin. 2018. Knowledge diffusion for neural dialogue generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1489–1498, Melbourne, Australia. Association for Computational Linguistics.
  • Marcheggiani and Perez-Beltrachini (2018) Diego Marcheggiani and Laura Perez-Beltrachini. 2018. Deep graph convolutional encoders for structured data to text generation. In Proceedings of the 11th International Conference on Natural Language Generation, pages 1–9, Tilburg University, The Netherlands. Association for Computational Linguistics.
  • Qin et al. (2018) Lianhui Qin, Lemao Liu, Wei Bi, Yan Wang, Xiaojiang Liu, Zhiting Hu, Hai Zhao, and Shuming Shi. 2018. Automatic article commenting: the task and dataset. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 151–156, Melbourne, Australia. Association for Computational Linguistics.
  • Song et al. (2018) Linfeng Song, Yue Zhang, Zhiguo Wang, and Daniel Gildea. 2018. A graph-to-sequence model for AMR-to-text generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1616–1626, Melbourne, Australia. Association for Computational Linguistics.
  • Sutskever et al. (2014) Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 3104–3112. Curran Associates, Inc.
  • Taheri et al. (2019) Aynaz Taheri, Kevin Gimpel, and Tanya Berger-Wolf. 2019. Learning to represent the evolution of dynamic graphs with recurrent models. In Companion Proceedings of The 2019 World Wide Web Conference, WWW ’19, pages 301–307, New York, NY, USA. ACM.
  • Trivedi et al. (2017) Rakshit Trivedi, Hanjun Dai, Yichen Wang, and Le Song. 2017. Know-evolve: Deep temporal reasoning for dynamic knowledge graphs. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17, pages 3462–3471. JMLR.org.
  • Trivedi et al. (2019) Rakshit Trivedi, Mehrdad Farajtabar, Prasenjeet Biswal, and Hongyuan Zha. 2019. Dyrep: Learning representations over dynamic graphs. In International Conference on Learning Representations.
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Ł ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 5998–6008. Curran Associates, Inc.
  • Veličković et al. (2018) Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph attention networks. In International Conference on Learning Representations.
  • Yang et al. (2019) Ze Yang, Can Xu, Wei Wu, and Zhoujun Li. 2019. Read, attend and comment: A deep architecture for automatic news comment generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5076–5088, Hong Kong, China. Association for Computational Linguistics.
  • Zhang et al. (2018) Jiacheng Zhang, Huanbo Luan, Maosong Sun, Feifei Zhai, Jingfang Xu, Min Zhang, and Yang Liu. 2018. Improving the transformer translation model with document-level context. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 533–542, Brussels, Belgium. Association for Computational Linguistics.
  • Zhang et al. (2019) Zheng Zhang, Minlie Huang, Zhongzhou Zhao, Feng Ji, Haiqing Chen, and Xiaoyan Zhu. 2019. Memory-augmented dialogue management for task-oriented dialogue systems. ACM Transactions on Information Systems (TOIS), 37(3):34.
  • Zhao et al. (2018) Guoshuai Zhao, Jun Yu Li, Lu Wang, Xueming Qian, and Yun Fu. 2018. Graphseq2seq: Graph-sequence-to-sequence for neural machine translation.
  • Zhou et al. (2018) Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, and Maosong Sun. 2018. Graph neural networks: A review of methods and applications. CoRR, abs/1812.08434.