Stories exhibit structure at multiple levels. While existing language models can generate stories with good local coherence, they struggle to coalesce individual phrases into coherent plots or even maintain consistency of the characters throughout the story. One reason for this failure is that classical language models generate the whole story at the word level, which makes it difficult to capture the high-level interactions between the plot points.
To address this, we investigate novel decompositions of the story generation process that break down the problem into a series of easier coarse-to-fine generation problems. These decompositions can offer three advantages:
They allow more abstract representations to be generated first, in which challenging long-range dependencies may be more apparent.
They allow specialized modelling techniques for the different stages, which exploit the structure of the specific sub-problem.
They are applicable to any textual dataset and require no manual labelling.
However, it is not well understood which properties characterize a good decomposition. We therefore implement and evaluate several representative approaches based on keyword extraction, sentence compression, and summarization.
We build on this understanding to devise the proposed decomposition (Figure 1). Our approach breaks down the generation process into three steps: modelling the action sequence, then the narrative, and finally entities (such as story characters). To model action sequences, we first generate the predicate-argument structure of the story by generating a sequence of verbs and arguments. This representation is more structured than free text, making it easier for the model to learn dependencies across events. To model entities, we initially generate a version of the story where different mentions of the same entity are replaced with placeholder tokens. Finally, we re-write these tokens into different references for the entity, based on both its previous mentions and the global story context.
The models are trained on a large dataset of 300k stories, and we evaluate quality both in terms of human judgments and using automatic metrics. We find that our novel approach leads to significantly better story generation. Specifically, we show that generating the action sequence first makes the model less prone to generating generic events, leading to a much greater diversity of verbs. We also find that by using sub-word modelling for the entities, our model can produce novel names for locations and characters that are appropriate given the story context.
The crucial challenge of long story generation lies in maintaining coherence across a large number of generated sentences—in terms of both the logical flow of the story, and the characters and entities. While there has been much recent progress in left-to-right text generation, particularly using self-attentive architectures Dai et al. (2018); Liu et al. (2018), we find that models still struggle to maintain coherence to produce interesting stories on par with human writing. We therefore introduce strategies to decompose neural story generation into coarse-to-fine steps to make modelling high-level dependencies easier.
2.1 Tractable Decompositions
In general, we can decompose the generation process by converting a story x into a more abstract representation z. The negative log-likelihood of the decomposed problem is given by

ℓ = −log Σ_z p(x | z) p(z)
We can generate from this model by first sampling z from p(z) and then sampling x from p(x | z). However, the marginalization over z is in general intractable, except in special cases where every x can only be generated by a single z (for example, if the transformation removed all occurrences of certain tokens). Instead, we minimize a variational upper bound of the loss by constructing a deterministic posterior q(z | x) = 𝟙[z = z*(x)], where z*(x) can be given by running a semantic role labeller or coreference resolution system on x. Put together, we optimize the following loss:

ℓ ≤ −log p(x | z*) − log p(z*)
This approach allows the models p(z) and p(x | z) to be trained tractably and separately.
2.2 Model Architectures
We build upon the convolutional sequence-to-sequence architecture Gehring et al. (2017). Deep convolutional networks are used as the encoder and decoder, connected with an attention module Bahdanau et al. (2015) that performs a weighted sum of the encoder output. The decoder uses a gated multi-head self-attention mechanism Vaswani et al. (2017); Fan et al. (2018) to allow the model to refer to previously generated words and improve the ability to model long-range context.
2.3 Modelling Action Sequences
To decompose a story into a structured form that emphasizes logical sequences of actions, we use Semantic Role Labeling (SRL). SRL identifies predicates and arguments in sentences, and assigns each argument a semantic role. This representation abstracts over different ways of expressing the same semantic content. For example, John ate the cake and the cake that John ate would receive identical semantic representations.
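As an illustration, the flattening of an SRL parse into this representation might look like the following sketch; the frame format, role labels, and delimiter tokens are simplified stand-ins for real SRL output, not the paper's exact format:

```python
# Sketch: flatten an SRL parse into a predicate-first action representation.
# The parse below is hard-coded for illustration; in practice it comes from
# a pretrained SRL model.

def flatten_srl(frames, sent_delim="</s>"):
    """Concatenate each frame as: verb first, then core args in order."""
    parts = []
    for frame in frames:
        parts.append("<V> " + frame["verb"])
        for role, text in frame["args"]:  # core arguments, canonical order
            parts.append("<" + role + "> " + text)
    parts.append(sent_delim)
    return " ".join(parts)

# "John ate the cake" and "the cake that John ate" yield the same frames:
frames = [{"verb": "ate", "args": [("ARG0", "John"), ("ARG1", "the cake")]}]
flat = flatten_srl(frames)
# flat == "<V> ate <ARG0> John <ARG1> the cake </s>"
```

Both surface orderings of the sentence map to this one flattened string, which is what makes the representation useful for abstracting over paraphrases.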
Conditioned upon the prompt, we generate an SRL decomposition of the story by concatenating the predicates and arguments identified by pretrained models He et al. (2017); Tan et al. (2018) (for predicate identification, we use https://github.com/luheng/deep_srl; for SRL given predicates, https://github.com/XMUNLP/Tagger) and separating sentences with delimiter tokens. We place the predicate verb first, followed by its arguments in canonical order. To focus on the main narrative, we retain only core arguments.
Verb Attention Mechanism
SRL parses are more structured than free text, allowing scope for more structured models. To encourage the model to consider sequences of verbs, we designate one of the heads of the decoder's multi-head self-attention to be a verb-attention head (see Figure 2). By masking the self-attention appropriately, this verb-attention head can only attend to previously generated verbs. When the text does not yet have a verb, the model attends to a padding token. We show that focusing on verbs with a specific attention head generates a more diverse array of verbs and reduces repetition in generation.
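A minimal sketch of the verb-attention masking, assuming boolean verb flags per position and a padding token at position 0 (both assumptions for illustration; the real head operates on learned attention logits, not a precomputed boolean mask):

```python
import numpy as np

def verb_attention_mask(is_verb, pad_idx=0):
    """Boolean mask (True = may attend) for a verb-attention head.

    Position i may attend only to verb positions j < i; if no verb has
    been generated yet, it attends to the padding token at pad_idx.
    """
    n = len(is_verb)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        prior_verbs = [j for j in range(i) if is_verb[j]]
        if prior_verbs:
            mask[i, prior_verbs] = True
        else:
            mask[i, pad_idx] = True  # fall back to the padding token
    return mask

# toy sequence: <pad> the knight rode north at dawn (verb at position 3)
is_verb = [False, False, False, True, False, False]
m = verb_attention_mask(is_verb)
```

Before any verb exists (e.g. position 2), the head attends only to the padding token; once a verb has been generated, later positions attend only to it and to any subsequent verbs.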
2.4 Modelling Entities
The challenge of modelling characters throughout a story is twofold: first, entities such as character names are rare tokens, which makes them hard for neural language models to model; human stories often feature novel character or location names. Second, maintaining the consistency of a specific set of characters is difficult, as the same entity may be referenced by many different strings throughout a story—for example Bilbo Baggins, he, and the hobbit may refer to the same entity. It is challenging for existing language models to track which words refer to which entity purely from a language modelling objective.
We address both problems by first generating a form of the story with different mentions of the same entity replaced by a placeholder token (e.g. ent0), similar to Hermann et al. (2015). We then use a sub-word seq2seq model trained to replace each mention with a reference, based on its context. The sub-word model is better equipped to model rare words, and the placeholder tokens make maintaining consistency easier.
2.4.1 Generating Entity Anonymized Stories
We explore two approaches to identifying and clustering entities:
NER Entity Anonymization: We use a named entity recognition (NER) model (specifically, Spacy's en_core_web_lg: https://spacy.io/api/entityrecognizer) to identify all people, organizations, and locations. We replace these spans with placeholder tokens (e.g. ent0). If any two entity mentions have an identical string, we replace them with the same placeholder. For example, all mentions of Bilbo Baggins will be abstracted to the same entity token, but Bilbo would be a separate abstract entity.
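The string-matching anonymization step can be sketched as follows; mention detection is assumed to have already been done by the NER model, so mentions are passed in here as plain strings rather than character spans:

```python
def anonymize_entities(text, mentions):
    """Replace entity mentions with placeholder tokens (ent0, ent1, ...).

    `mentions` is a list of surface strings flagged by a NER model, a
    simplified stand-in for span output from a real NER system. Identical
    strings share a placeholder; 'Bilbo Baggins' and 'Bilbo' do not.
    """
    placeholder = {}
    for m in mentions:
        if m not in placeholder:
            placeholder[m] = "ent%d" % len(placeholder)
    # Replace longer mentions first so 'Bilbo Baggins' is not split
    # by the shorter mention 'Bilbo'.
    for m in sorted(placeholder, key=len, reverse=True):
        text = text.replace(m, placeholder[m])
    return text, placeholder

story = "Bilbo Baggins left home. Bilbo was nervous. Bilbo Baggins walked on."
anon, mapping = anonymize_entities(story, ["Bilbo Baggins", "Bilbo"])
# anon == "ent0 left home. ent1 was nervous. ent0 walked on."
```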
Coreference-based Entity Anonymization: The above approach cannot detect different mentions of an entity that use different strings. Instead, we use the Coreference Resolution model from Lee et al. (2018) (https://github.com/kentonl/e2e-coref) to identify clusters of mentions. All spans in the same cluster are then replaced with the same entity placeholder string. Coreference models do not detect singleton mentions, so we also replace non-coreferent named entities with unique placeholders.
2.4.2 Generating Entity References in a Story
We train models to replace placeholder entity mentions with the correct surface form, for both NER-based and coreference-based entity anonymized stories. Both models use a seq2seq architecture that generates an entity reference based on its placeholder and the story. To better model the specific challenges of entity generation, we also make use of a pointer mechanism and sub-word modelling.
Generating multiple consistent mentions of rare entity names is challenging. To make it easier for the model to re-use previous names for an entity, we augment the standard seq2seq decoder with a pointer-copy mechanism Vinyals et al. (2015). To generate an entity reference, the decoder can either generate a new entity string or choose to copy an already generated entity reference, which encourages the model to use consistent naming for the entities.
To train the pointer mechanism, the final hidden state h_t of the model is used as input to a binary classifier, p_copy = σ(w · h_t), where w is a fixed-dimension parameter vector. When the classifier predicts copy, the previously decoded entity token with the maximum attention value is copied. One head of the decoder multi-head self-attention mechanism is used as the pointer attention head, to allow the heads to specialize.
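A toy sketch of this copy/generate decision, with hand-picked values standing in for the trained parameter vector w, the hidden state h_t, and the pointer-head attention weights:

```python
import numpy as np

def copy_or_generate(h_t, w, attention, prev_entity_tokens, threshold=0.5):
    """Sketch of the pointer-copy decision described above.

    h_t: decoder final hidden state; w: learned parameter vector;
    attention: pointer-head attention weights over previously decoded
    entity tokens. All values here are toy stand-ins, not trained weights.
    """
    p_copy = 1.0 / (1.0 + np.exp(-np.dot(w, h_t)))  # sigmoid(w . h_t)
    if p_copy > threshold and prev_entity_tokens:
        # Copy the previously decoded entity token with maximum attention.
        return "copy", prev_entity_tokens[int(np.argmax(attention))]
    return "generate", None

h_t = np.array([0.5, -0.2, 1.0])
w = np.array([1.0, 1.0, 1.0])  # sigmoid(1.3) ~ 0.79, so the model copies
action, token = copy_or_generate(h_t, w, np.array([0.1, 0.7, 0.2]),
                                 ["Bilbo", "Gandalf", "Frodo"])
# action == "copy", token == "Gandalf"
```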
Entities are often rare or novel words, so word-based vocabularies can be inadequate. We compare entity generation using word-based, byte-pair encoding (BPE) Sennrich et al. (2015), and character-level models.
NER-based Entity Reference Generation
Here, each placeholder string should map onto one (possibly multiword) surface form: e.g. all occurrences of the placeholder ent0 should map onto a single string, such as Bilbo Baggins. We train a simple model that maps a combination of placeholder token and story (with anonymized entities) to the surface form of the placeholder. While the placeholder can appear multiple times, we make only one prediction per placeholder, as all occurrences correspond to the same string.
Coreference-based Entity Reference Generation
Generating entities based on coreference clusters is more challenging than for our NER entity clusters, because different mentions of the same entity may use different surface forms. We generate a separate reference for each mention by adding the following inputs to the above model:
A bag-of-words context window around the specific entity mention, which allows local context to determine if an entity should be a name, pronoun or nominal reference.
Previously generated references for the same entity placeholder. For example, if the model is filling in the third instance of ent0, it receives that the previous two generations for ent0 were Bilbo, him. Providing the previous entities allows the model to maintain greater consistency between generations.
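Putting the two extra inputs together, a sketch of the per-mention input construction follows; the window size, field names, and token format are illustrative choices rather than the paper's exact configuration:

```python
def mention_inputs(tokens, mention_idx, placeholder, prev_refs, window=3):
    """Build the extra inputs for coreference-based reference generation.

    tokens: entity-anonymized story tokens; mention_idx: position of the
    placeholder being filled; prev_refs: surface forms already generated
    for this placeholder (e.g. ["Bilbo", "him"]).
    """
    lo = max(0, mention_idx - window)
    hi = min(len(tokens), mention_idx + window + 1)
    context = [t for i, t in enumerate(tokens[lo:hi], lo) if i != mention_idx]
    return {
        "placeholder": placeholder,
        "context_bow": sorted(set(context)),  # bag-of-words context window
        "previous_references": prev_refs,     # encourages consistent naming
    }

tokens = "then ent0 quietly slipped out of the door".split()
inp = mention_inputs(tokens, 1, "ent0", ["Bilbo", "him"])
```

The local window lets the model choose between a name, pronoun, or nominal reference, while the previous references let it stay consistent with earlier choices.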
3 Experimental Setup
We use the WritingPrompts dataset of 300k story premises paired with long stories. Stories are on average 734 words, making generation significantly longer than in related work on storyline generation. In this work, we focus on the prompt-to-story aspect of this task. We follow the previous preprocessing of limiting stories to 1000 words and fixing the vocabulary size to 19,025 for prompts and 104,960 for stories.
We compare our results to the Fusion model from Fan et al. (2018) which generates the full story directly from the prompt. We also implement various decomposition strategies as baselines:
Summarization: We propose a new baseline that generates a summary conditioned upon the prompt and then a story conditioned upon the summary. Story summaries are obtained with a multi-sentence summarization model Wu et al. (2019) trained on full-text CNN-DailyMail and applied to stories.
Keyword Extraction: We generate a series of keywords conditioned upon the prompt and then a story conditioned upon the keywords, based on Yao et al. (2019). Following Yao et al., we extract keywords with the RAKE algorithm Rose et al. (2010) (https://pypi.org/project/rake-nltk/). Yao et al. extract one word per sentence, but we find that extracting keyword phrases per story works well, as our stories are much longer.
Sentence Compression: Inspired by Xu et al. (2018), we generate a story with compressed sentences conditioned upon the prompt and then a story conditioned upon the compressed, shorter story. We use the same deletion-based compression data as Xu et al., from Filippova and Altun (2013). We train a seq2seq model to compress all non-dialogue story sentences. The compressed sentences are concatenated to form the compressed story.
Table 1 (excerpt): negative log-likelihood of the generation stages for each decomposition.

| Decomposition | Stage 1 NLL | Stage 2 NLL |
| --- | --- | --- |
| SRL Action Plan | 2.72 | 3.95 |
| NER Entity Anonymization | 3.32 | 4.75 |
We suppress the generation of unknown tokens. We require stories to be at least 150 words and cut off the story at the nearest sentence for stories longer than 250 words (to ease human evaluation). We generate stories with temperature 0.8 and top-k random sampling, where next words are sampled from the top k candidates rather than the entire vocabulary distribution.
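The decoding scheme above (temperature scaling plus top-k random sampling) can be sketched as follows; the k value and the use of logits rather than a full decoder are illustrative simplifications:

```python
import numpy as np

def sample_top_k(logits, k, temperature=0.8, rng=None):
    """Sample a token index using temperature + top-k random sampling."""
    rng = rng or np.random.default_rng(0)
    scaled = np.asarray(logits, dtype=float) / temperature
    top = np.argsort(scaled)[-k:]              # indices of the k best tokens
    probs = np.exp(scaled[top] - scaled[top].max())
    probs /= probs.sum()                       # renormalize over top-k only
    return int(rng.choice(top, p=probs))

# toy vocabulary of 5 tokens; only the two highest-scoring can be sampled
logits = [2.0, 0.5, 1.5, -1.0, 0.1]
token = sample_top_k(logits, k=2)
```

Restricting sampling to the top k candidates cuts off the long tail of the distribution, which in practice reduces incoherent word choices while retaining some randomness.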
4.1 Comparing Decomposition Strategies
We compare the relative difficulty of modelling under each decomposition strategy by measuring the log loss of the different stages in Table 1. We observe that generating the SRL structure has a lower negative log-likelihood and so is much easier than generating summaries, keywords, or compressed sentences, a benefit of its more structured form. We find keyword generation especially difficult, as the identified keywords are often the more salient, rare words in the story, which are challenging for neural seq2seq models. This suggests that rare words should appear mostly at the last levels of the decomposition. Finally, we compare models with entity-anonymized stories as an intermediate representation, using either NER-based or coreference-based entity anonymization. Entity references are then filled using a word-based model (to make likelihoods comparable across models). The entity fill is the more difficult stage.
To compare overall story quality under the various decomposition strategies, we conduct human evaluation. Judges marked which of two stories they preferred. 100 stories are evaluated for each model by 3 judges.
Figure 5 shows that human evaluators prefer our novel decompositions over a carefully tuned Fusion model from Fan et al. (2018) by about 60% in a blind comparison. We see additive gains from modelling actions and entities.
In a second study, evaluators compared baselines against stories generated by our strongest model, which uses SRL-based action plans and coreference-based entity anonymization. In all cases, our full decomposition is preferred.
Qualitatively, we find that many poor generations result from mistakes in the early stages. Subsequent models were not exposed to such errors during training, so they are unable to recover.
4.2 Effect of SRL Decomposition
Human-written stories feature a wide variety of events, while neural models are plagued by generic generations and repetition.
Table 2 quantifies model performance on two metrics that assess action diversity: (1) the number of unique verbs generated, averaged across all stories; and (2) the percentage of diverse verbs, measured as the percentage of all verbs generated in the test set that are not among the 5 most frequent verbs (we identify verbs using Spacy: https://spacy.io/). A higher percentage indicates more diverse events.
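The two diversity metrics above can be computed as in this sketch; the verb list is a toy stand-in for spaCy-tagged verbs from generated stories, and top_n=2 (rather than 5) is used only so the tiny example is informative:

```python
from collections import Counter

def percent_diverse_verbs(verbs, top_n=5):
    """Percent of verb tokens not among the top_n most frequent verbs."""
    counts = Counter(verbs)
    top = {v for v, _ in counts.most_common(top_n)}
    diverse = sum(1 for v in verbs if v not in top)
    return 100.0 * diverse / len(verbs)

# toy stand-in for spaCy-identified verbs across a set of stories
verbs = ["said"] * 4 + ["was"] * 3 + ["ran", "whispered", "vanished"]
n_unique = len(set(verbs))                     # metric (1): unique verbs
pct = percent_diverse_verbs(verbs, top_n=2)    # metric (2): % diverse verbs
# n_unique == 5, pct == 30.0
```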
Our decomposition using the SRL predicate-argument structure improves the model’s ability to generate diverse verbs. Adding verb attention leads to further improvement. Qualitatively, the model can often outline clear action sequences, as shown in Figure 6. However, all models remain far from matching the diversity of human stories.
[Table 2: # unique verbs and % diverse verbs by model.]
Table 3 (excerpt): accuracy (%) at ranking the true entity reference, for first and subsequent mentions.

| Model | First Mentions (Rank 10) | First Mentions (Rank 50) | First Mentions (Rank 100) | Subsequent Mentions (Rank 10) | Subsequent Mentions (Rank 50) | Subsequent Mentions (Rank 100) |
| --- | --- | --- | --- | --- | --- | --- |
| Left story context | 59.1 | 49.6 | 33.3 | 62.9 | 53.2 | 49.4 |
Table 4 (excerpt): unique entity names per story.

| Model | # Unique Entities |
| --- | --- |
| SRL + NER Entity Anonymization | 2.16 |
| SRL + Coreference Anonymization | 1.59 |
4.3 Comparing Entity Reference Models
We explored a variety of different ways to generate the full text of abstracted entities, using different amounts of context and different granularities of sub-word generation. To compare these models, we calculate their accuracy at predicting the correct reference in Table 3. Each model is evaluated on entities from the test set: for each case, the model scores the real mention together with randomly sampled distractors, and must give the true mention the highest likelihood. We analyze accuracy on the first mention of an entity, an assessment of novelty, and on subsequent references, which measures coherence.
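The ranking evaluation described above amounts to checking whether the true mention outscores its distractors; a minimal sketch, where the scores are toy stand-ins for model log-likelihoods:

```python
def ranking_accuracy(scored_examples):
    """Fraction of examples where the true mention gets the highest score.

    Each example is (true_score, [distractor_scores]); the values below
    are toy numbers, not real model likelihoods.
    """
    correct = sum(1 for true, distractors in scored_examples
                  if all(true > d for d in distractors))
    return correct / len(scored_examples)

examples = [(-1.2, [-3.0, -2.5]),   # true mention ranked first
            (-2.0, [-1.5, -4.0])]   # a distractor outranks the truth
acc = ranking_accuracy(examples)
# acc == 0.5
```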
Effect of Sub-word Modelling
Table 3 shows that modelling a character-level vocabulary for entity generation significantly outperforms BPE and word-based models, because of the diversity of entity names. This result highlights a key advantage of multi-stage modelling: it allows the use of specialized modelling techniques for each sub-task.
Effect of Additional Context
Entity references should be contextual. Firstly, names must be appropriate for the story setting—Bilbo Baggins is more appropriate for a fantasy novel than one set in the present day. Subsequent references to the character may be briefer, depending on context—for example, he is more likely to be referred to as he or Bilbo than his full name in the next sentence.
We compare three models' ability to name entities based on context (using coreference-based anonymization): a model that does not receive the story, a model that uses only leftward context (as in Clark et al. (2018)), and a model that uses the full story. Table 3 shows that having access to the full story provides the best performance. Having no access to the story significantly decreases ranking accuracy, even though the model still receives the local context window of the entity as input. The left-story-context model performs significantly better, but looking at the complete story provides additional gains. We note that full-story context can only be provided in a multi-stage generation approach.
Figure 7 shows examples of entity naming in three stories of different genres. The models adapt to the context—for example generating The princess and The Queen when the context includes monarchy.
Table 5: coreference properties of generated stories.

| Model | # Coref Chains | Unique Names per Chain |
| --- | --- | --- |
| SRL + NER Entity Anonymization | 4.09 | 2.49 |
| SRL + Coreference Anonymization | 4.27 | 3.15 |
4.4 Effect of Entity Anonymization
To understand the effectiveness of the entity generation models, we examine their performance by analyzing generation diversity.
Diversity of Entity Names
Human-written stories often contain many diverse, novel names for people and places. However, these tokens are rare and subsequently difficult for standard neural models to generate. Table 4 shows that the fusion model and baseline decomposition strategies generate very few unique entities in each story. Generated entities are often generic names such as John.
In contrast, our proposed decompositions generate substantially more unique entities. We found that using coreference resolution for entity anonymization led to fewer unique entity names than generating the names independently. This result can be explained by the coreference-based model re-using previous names more frequently and using more pronouns.
Coherence of Entity Clusters
Well-structured stories refer back to previously mentioned characters and events in a consistent manner. To evaluate whether the generated stories have these characteristics, we examine their coreference properties in Table 5. We quantify the average number of coreference clusters and the diversity of entities within each cluster (e.g. the cluster Bilbo, he, the hobbit is more diverse than the cluster he, he, he).
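These two cluster statistics can be computed as in this minimal sketch; the input format (stories as lists of mention-string chains) and the single toy story are illustrative:

```python
def cluster_stats(stories):
    """Average coreference chains per story and unique names per chain.

    `stories` is a list of stories, each a list of coreference chains,
    each chain a list of mention strings.
    """
    n_chains = [len(story) for story in stories]
    uniques = [len(set(chain)) for story in stories for chain in story]
    avg_chains = sum(n_chains) / len(stories)
    avg_unique = sum(uniques) / len(uniques)
    return avg_chains, avg_unique

# one toy story with a diverse chain and a repetitive one
stories = [[["Bilbo", "he", "the hobbit"], ["he", "he", "he"]]]
avg_chains, avg_unique = cluster_stats(stories)
# avg_chains == 2.0, avg_unique == 2.0
```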
Our full model produces more non-singleton coreference chains, suggesting greater coherence, and also gives different mentions of the same entity more diverse names. However, both numbers are still lower than for human generated stories, indicating potential for future work.
Figure 8 displays a sentence constructed to require the generation of an entity as the final word. The fusion model does not perform any implicit coreference to associate the allergy with his dog. In contrast, coreference entity fill produces a high quality completion.
5 Related Work
5.1 Story Generation with Planning
Story generation by first composing a plan has been explored using many different techniques. Traditional approaches organized sequences of character actions using hand-crafted models Riedl and Young (2010); Porteous and Cavazza (2009). Recent work has extended this to modelling story events Martin et al. (2017); Mostafazadeh et al. (2016) and plot graphs Li et al. (2013), or to generating stories by conditioning upon sequences of images Huang et al. (2016) or descriptions Jain et al. (2017).
We build on previous work that decomposes generation into separate steps. For example, Xu et al. (2018) learn a story skeleton extraction model and a generative model conditioned upon the skeleton, using reinforcement learning to train the two together. Zhou et al. (2018) train a storyline extraction model for news article generation, but require additional supervision from manually annotated storylines. Yao et al. (2019) use the RAKE algorithm Rose et al. (2010) to extract storylines, and condition upon the storyline to write the full story using dynamic and static schemas that govern whether the storyline is allowed to change during the writing process.
5.2 Entity Language Models
An outstanding challenge in text generation is modelling and tracking entities through a document. Centering (Grosz et al., 1995) gives a theoretical account of how referring expressions for entities are chosen in discourse context. Named entity recognition has been incorporated into language models since at least Gotoh et al. (1999), and has been shown to improve domain adaptation Liu and Liu (2007). Language models have been extended to model entities based on additional information, such as entity type Parvez et al. (2018). Recent work has incorporated learning representations of entities and other unknown words Kobayashi et al. (2017), as well as explicitly model entities by dynamically updating these representations Ji et al. (2017). Dynamic updates to entity representations are used in other story generation models Clark et al. (2018).
6 Conclusion

We proposed an effective method for writing short stories by separating the generation of actions and entities. We show through human evaluation and automatic metrics that our novel decomposition significantly improves story quality.
- Bahdanau et al. (2015) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proc. of ICLR.
- Clark et al. (2018) Elizabeth Clark, Yangfeng Ji, and Noah A Smith. 2018. Neural text generation in stories using entity representations as context. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), volume 1, pages 2250–2260.
- Dai et al. (2018) Zihang Dai, Zhilin Yang, Yiming Yang, William W Cohen, Jaime Carbonell, Quoc V Le, and Ruslan Salakhutdinov. 2018. Transformer-XL: Language modeling with longer-term dependency.
- Fan et al. (2018) Angela Fan, Mike Lewis, and Yann Dauphin. 2018. Hierarchical neural story generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics.
- Filippova and Altun (2013) Katja Filippova and Yasemin Altun. 2013. Overcoming the lack of parallel data in sentence compression. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing.
- Gehring et al. (2017) Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N Dauphin. 2017. Convolutional Sequence to Sequence Learning. In Proc. of ICML.
- Gotoh et al. (1999) Yoshihiko Gotoh, Steve Renals, and Gethin Williams. 1999. Named entity tagged language models. In Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on, volume 1, pages 513–516. IEEE.
- Grosz et al. (1995) Barbara J Grosz, Scott Weinstein, and Aravind K Joshi. 1995. Centering: A framework for modeling the local coherence of discourse. Computational linguistics, 21(2):203–225.
- He et al. (2017) Luheng He, Kenton Lee, Mike Lewis, and Luke Zettlemoyer. 2017. Deep semantic role labeling: What works and what’s next. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
- Hermann et al. (2015) Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems, pages 1693–1701.
- Huang et al. (2016) Ting-Hao Kenneth Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, et al. 2016. Visual storytelling. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1233–1239.
- Jain et al. (2017) Parag Jain, Priyanka Agrawal, Abhijit Mishra, Mohak Sukhwani, Anirban Laha, and Karthik Sankaranarayanan. 2017. Story generation from sequence of independent short descriptions. arXiv preprint arXiv:1707.05501.
- Ji et al. (2017) Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, and Noah A Smith. 2017. Dynamic entity representations in neural language models. arXiv preprint arXiv:1708.00781.
- Kobayashi et al. (2017) Sosuke Kobayashi, Naoaki Okazaki, and Kentaro Inui. 2017. A neural language model for dynamically representing the meanings of unknown words and entities in a discourse. arXiv preprint arXiv:1709.01679.
- Lee et al. (2018) Kenton Lee, Luheng He, and Luke Zettlemoyer. 2018. Higher-order coreference resolution with coarse-to-fine inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), volume 2, pages 687–692.
- Li et al. (2013) Boyang Li, Stephen Lee-Urban, George Johnston, and Mark Riedl. 2013. Story generation with crowdsourced plot graphs.
- Liu and Liu (2007) Feifan Liu and Yang Liu. 2007. Unsupervised language model adaptation incorporating named entity information. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 672–679.
- Liu et al. (2018) Peter J Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, and Noam Shazeer. 2018. Generating wikipedia by summarizing long sequences. arXiv preprint arXiv:1801.10198.
- Martin et al. (2017) Lara J Martin, Prithviraj Ammanabrolu, William Hancock, Shruti Singh, Brent Harrison, and Mark O Riedl. 2017. Event representations for automated story generation with deep neural nets. arXiv preprint arXiv:1706.01331.
- Mostafazadeh et al. (2016) Nasrin Mostafazadeh, Alyson Grealish, Nathanael Chambers, James Allen, and Lucy Vanderwende. 2016. Caters: Causal and temporal relation scheme for semantic annotation of event structures. In Proceedings of the Fourth Workshop on Events, pages 51–61.
- Parvez et al. (2018) Md Rizwan Parvez, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2018. Building language models for text with named entities. arXiv preprint arXiv:1805.04836.
- Porteous and Cavazza (2009) Julie Porteous and Marc Cavazza. 2009. Controlling narrative generation with planning trajectories: the role of constraints. In Joint International Conference on Interactive Digital Storytelling, pages 234–245. Springer.
- Riedl and Young (2010) Mark O Riedl and Robert Michael Young. 2010. Narrative planning: Balancing plot and character. Journal of Artificial Intelligence Research, 39:217–268.
- Rose et al. (2010) Stuart Rose, Dave Engel, Nick Cramer, and Wendy Cowley. 2010. Automatic keyword extraction from individual documents. Text Mining: Applications and Theory.
- Sennrich et al. (2015) Rico Sennrich, Barry Haddow, and Alexandra Birch. 2015. Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909.
- Tan et al. (2018) Zhixing Tan, Mingxuan Wang, Jun Xie, Yidong Chen, and Xiaodong Shi. 2018. Deep semantic role labeling with self-attention. In AAAI Conference on Artificial Intelligence.
- Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. In Proc. of NIPS.
- Vinyals et al. (2015) Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. In Advances in Neural Information Processing Systems, pages 2692–2700.
- Wu et al. (2019) Felix Wu, Angela Fan, Alexei Baevski, Yann Dauphin, and Michael Auli. 2019. Pay less attention with lightweight and dynamic convolutions. In International Conference on Learning Representations.
- Xu et al. (2018) Jingjing Xu, Yi Zhang, Qi Zeng, Xuancheng Ren, Xiaoyan Cai, and Xu Sun. 2018. A skeleton-based model for promoting coherence among sentences in narrative story generation. arXiv preprint arXiv:1808.06945.
- Yao et al. (2019) Lili Yao, Nanyun Peng, Ralph Weischedel, Kevin Knight, Dongyan Zhao, and Rui Yan. 2019. Plan-and-write: Towards better automatic storytelling. In Association for the Advancement of Artificial Intelligence.
- Zhou et al. (2018) Deyu Zhou, Linsen Guo, and Yulan He. 2018. Neural storyline extraction model for storyline generation from news articles. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), volume 1, pages 1727–1736.