Future is not One-dimensional: Graph Modeling based Complex Event Schema Induction for Event Prediction

04/13/2021 ∙ by Manling Li, et al. ∙ University of Illinois at Urbana-Champaign ∙ Virginia Polytechnic Institute and State University

Event schemas encode knowledge of stereotypical structures of events and their connections, and act as a scaffolding as events unfold. Previous work on event schema induction focuses either on atomic events or on linear temporal event sequences, ignoring the interplay between events via arguments and argument relations. We introduce the concept of a Temporal Complex Event Schema: a graph-based schema representation that encompasses events, arguments, temporal connections and argument relations. Additionally, we propose a Temporal Event Graph Model that models the emergence of event instances following the temporal complex event schema. To build and evaluate such schemas, we release a new schema learning corpus containing 6,399 documents accompanied by event graphs and manually constructed gold schemas. Intrinsic evaluations based on schema matching and instance graph perplexity demonstrate the superior quality of our probabilistic graph schema library compared to linear representations. An extrinsic evaluation on schema-guided event prediction further demonstrates the predictive power of our event graph model, which significantly surpasses human schemas and baselines by more than 17.8 points in HITS@1.




1 Introduction

Current automated event understanding has been overly simplified to be local, sequential and flat. Real-world events, such as disease outbreaks and drone bombings, have multiple actors, complex timelines, intertwined relations and multiple possible outcomes. Understanding them requires knowledge in the form of a repository of abstracted event schemas, capturing the progress of time, and performing global inference for event prediction. For example, regarding Ukrainian President Viktor Yanukovych’s refusal to sign a European Union (EU) association agreement in 2013, a typical question from analysts would be “Can you anticipate Russians’ reactions to Ukraine’s decision not to join the EU?” This requires an event understanding system to match events to schema representations and reason about what might happen next. The international conflict schema would be triggered by “Ukraine declining to join the EU”, and evidence of Russian influence would suggest a Ukrainian revolution or pro-Russian unrest, each with respective probabilities.

Comprehending such a news story requires following a timeline, identifying key events and tracking characters. We refer to such a “story” as a complex event. Its complexity comes from the inclusion of multiple events (and their arguments), relations and temporal order. We propose a new task, Probabilistic Temporal Complex Event Schema Induction, to address this challenge. In a Temporal Complex Event Schema, events are partially ordered by temporal relations, and connected through co-referential entities and their relations. Figure 1 shows an example schema with multiple dependencies between events. One event may depend on several preceding events and their graph structures: e.g., the Assemble event happens only after buying both the bomb materials and the vehicle. Conversely, multiple events may follow a single event, such as the Attack event in Figure 1. Such an automatically induced probabilistic complex event schema can be used to forecast event abstractions over the long term, and thus provide a comprehensive understanding of evolving situations, events, and trends.

Figure 1: An example schema for the complex event type Car-bombing Improvised Explosive Device (IED). A person learns to make bombs and buys bomb materials as well as a car (vehicle). The bomb is then attached to the vehicle, and the attacker drives it to attack people. People can be hurt by the vehicle, by the explosion of the bomb, or by the crash of the car.

For each type of complex event, we aim to induce a schema repository that is probabilistic, temporally organized and semantically coherent. Low-level primitive components of event schemas are abundant, and can be part of multiple, sparsely occurring, higher-level schemas. We propose a Temporal Event Graph Model, an auto-regressive graph generation model, to achieve this goal. Given a currently extracted event graph, we generate the next event type node with its potential arguments, and then propagate edge-aware information following temporal order. After that, we employ a copy mechanism to generate coreferential arguments, and build relation edges between them. Finally, temporal dependencies are determined with argument connections taken into account.

Our generative model serves as both a schema library and a predictive model. Specifically, we can probe the model to generate event graphs unconditionally to obtain a set of schemas. We can also pass partially instantiated graphs to the model and “grow” the graph forward or backward in time to predict missing events, arguments or relations, both from the past and in the future. We propose a set of schema matching metrics to evaluate the induced schemas by comparing them with human-created schemas, and show the power of the probabilistic schema in event type prediction as an extrinsic evaluation. Specifically, in the test phase, we apply our framework to instantiate high-level event schemas, and predict event types that are likely to happen next, or causal or conditional event types that may have happened in the past.

We make the following novel contributions:


  • This is the first work to induce probabilistic temporal graph schemas for complex events across documents, which capture temporal dynamics and connections among individual events through their coreferential or related arguments.

  • This is the first application of graph generation methods to induce event schemas.

  • This is the first work to use complex event schemas for event type prediction, while also producing multiple hypotheses with probabilities.

  • We have proposed a comprehensive set of metrics for both intrinsic and extrinsic evaluations.

  • We release a new data set of 6,399 documents with manually constructed gold schemas.

2 Problem Formulation

Symbol    Meaning
G         Instance graph of a complex event
S         Schema graph of a complex event type
e         Event node in an instance graph
v         Entity node in an instance graph
e_i → e_j Temporal ordering edge between events e_i and e_j, indicating e_i is before e_j
(e, a, v) Argument edge, indicating entity v plays argument role a in the event e
(v_i, r, v_j) Relation edge between entities v_i and v_j, where r is the relation type
A_e       Argument role set of event e, defined by the IE ontology
Φ_E       The type set of events
Φ_V       The type set of entities
φ(·)      A mapping function from a node to its type
G_{<e}    Subgraph of G containing events before e and their arguments
Table 1: List of symbols

From a set of documents describing a complex event, we construct an instance graph G which contains event nodes e and entity (argument) nodes v. There are three types of edges in this graph: (1) event-event edges connecting events that have direct temporal relations; (2) event-entity edges connecting events to their arguments; and (3) entity-entity edges indicating relations between entities. We extract instance graphs using an Information Extraction (IE) system or from IE annotations. In these graphs, relation edges are undirected, while temporal edges between events are directed.

For each complex event, given a set of instance graphs {G}, the goal of schema induction is to generate a schema library of graphs S. In each schema graph S, the nodes are abstracted to event and entity types. Figure 1 shows an example schema for the complex event type Car-bombing IED. Schema graphs can be regarded as summary abstractions of instance graphs, capturing their recurring structures.

Figure 2: The generation process of the temporal event graph model.

3 Our Approach

3.1 Instance Graph Construction

We use OneIE, a state-of-the-art Information Extraction system Lin et al. (2020), to extract entities, relations and events, and then perform cross-document entity and event coreference resolution Pan et al. (2017); Joshi et al. (2019) over the document cluster of each complex event. We further conduct event-event temporal relation extraction Ning et al. (2019) to determine the order of event pairs.

We construct one event instance graph for each complex event, where coreferential events or entities are merged. We include those events that are involved in temporal relations, or whose arguments are connected through entity coreference links or entity-entity relations. Isolated events are considered irrelevant to schema induction, so they are excluded from the instance graphs during graph construction. Since schema graphs focus on type-level abstraction, we represent each node by its type label and node index.

3.2 Temporal Event Graph Model Overview

Given the instance graphs about the same complex event, we regard the schema as the hidden knowledge that guides the generation of these graphs. To this end, we propose a temporal event graph model, parametrized by θ, that maximizes the probability p_θ(G) of each instance graph G. At each step i, based on the previous graph G_{<i}, we predict one event node e_i and fill its argument roles to generate the next graph G_{<i+1}:

p_θ(G) = ∏_i p_θ(G_{<i+1} | G_{<i}).

We factorize the probability of generating the new nodes and edges as:

p_θ(G_{<i+1} | G_{<i}) = p(e_i | G_{<i}) · p(V_i | e_i, G_{<i}) · p(R_i | V_i, e_i, G_{<i}) · p(T_i | R_i, V_i, e_i, G_{<i}),

where V_i denotes the argument nodes of e_i, R_i the relation edges, and T_i the temporal edges. An event node e_i is generated first according to the probability p(e_i | G_{<i}). We then add argument nodes that are connected to this event. Since we have access to the IE ontology, which defines all possible argument roles, we simply instantiate a node for each argument role a ∈ A_{e_i}. We also predict relations between the newly generated nodes and the existing nodes. After identifying the shared and related arguments, we add a final step to predict the temporal relations between the new event e_i and the existing events.

In traditional graph generation settings, the order of node generation can be arbitrary. In our instance graphs, however, event nodes are connected through temporal relations, which order them as a directed acyclic graph (DAG). Since each event may have multiple events both “before” and “after” it, we obtain the generation order by traversing the graph using Breadth-First Search.

We also add dummy Start/End event nodes to indicate the start and end of graph generation. At the beginning of the generation process, the graph contains a single Start event node; generating the End node signals the end of the graph. The complexity of our algorithm is linear in the number of events in the graph: at each generation step, we add one event node e_i together with its argument nodes into the current graph.
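The BFS ordering step over the temporal DAG can be sketched as follows. This is a hypothetical illustration (a Kahn-style BFS over "before" edges, with the function name and input format invented here), not the authors' released code:

```python
from collections import deque

def generation_order(num_events, before_edges):
    """Return a BFS generation order over the temporal DAG.

    `before_edges` is a list of (i, j) pairs meaning event i is BEFORE
    event j.  Source events (no incoming temporal edge) are expanded
    first, and an event is only visited after all of its predecessors,
    so the order respects every temporal constraint.
    """
    succs = {i: [] for i in range(num_events)}
    indeg = [0] * num_events
    for i, j in before_edges:
        succs[i].append(j)
        indeg[j] += 1
    # Kahn-style BFS: start from all source events (in-degree 0).
    queue = deque(i for i in range(num_events) if indeg[i] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in succs[node]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)
    return order
```

For the diamond-shaped example in Figure 1 (buy materials and buy vehicle both before Assemble), this yields an order in which Assemble only appears after both purchases.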

3.3 Event Generation

To determine the event type of the newly generated event node e_i, we apply a graph pooling layer over all generated events to obtain the current graph representation h_{G_{<i}}:

h_{G_{<i}} = Pool({h_e : e ∈ G_{<i}}).

We use bold symbols to denote the latent representations of nodes and edges, which are initialized as zeros and updated at each generation step via message passing in § 3.4. We adopt mean-pooling in this paper. The event type is then predicted through a fully connected layer:

p(e_i | G_{<i}) = softmax(W h_{G_{<i}} + b).

Once we know the event type of e_i, we add all of its argument roles in A_{e_i}, as defined in the IE ontology, as new entity nodes. For example, in Figure 2, the new event is an Arrest event, so we add three argument nodes for Detainee, Jailor, and Place respectively. The edges between these arguments and the event are also added to the graph.
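The event-typing step above (mean-pooling plus a fully connected softmax layer) can be sketched as follows; shapes and parameter names (`W`, `b`) are illustrative, not taken from the paper:

```python
import numpy as np

def predict_event_type(node_embs, W, b):
    """Mean-pool the embeddings of all generated event nodes into a
    graph-level representation, then score candidate event types with a
    fully connected layer followed by a softmax."""
    h_graph = node_embs.mean(axis=0)          # graph representation
    logits = W @ h_graph + b                  # one score per event type
    exp = np.exp(logits - logits.max())       # numerically stable softmax
    return exp / exp.sum()                    # distribution over types
```

The argmax of the returned distribution gives the type of the new event node, after which its argument placeholders are instantiated from the ontology.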

3.4 Edge-Aware Graph Neural Network

We use a Graph Neural Network (GNN) Kipf and Welling (2017) to update node embeddings following the graph structure. Before running the GNN on the graph, we first add virtual edges between the newly generated event and all previous events, and between new entities and previous entities, shown as dashed lines in Figure 2. The virtual edges enable the representations of new nodes to aggregate messages from previous nodes, which has proven effective in Liao et al. (2019).

To capture the rich semantics of edge types, we pass edge-aware messages during graph propagation. An intuitive way is to encode different edge types with different convolutional filters, similar to RGCN Schlichtkrull et al. (2018). However, the number of RGCN parameters grows rapidly with the number of edge types and easily becomes unmanageable given the large number of relation types and argument roles in the IE ontology. (There are 131 edge types according to the fine-grained LDC Schema Learning Ontology.)

Instead, we learn a vector representation r for each relation type and a for each argument role. The message passed through an argument edge between event e and argument v with role a is:

m_{e→v} = W_arg [h_e ; a],

where [· ; ·] denotes concatenation. Similarly, the message between two entities v_i and v_j connected by relation r is computed as:

m_{v_i→v_j} = W_rel [h_{v_i} ; r].

Considering that the direction of a temporal edge is important, we parametrize the message over a temporal edge from e_i to e_j by assigning two separate weight matrices to the outgoing and incoming vertices:

m_{e_i→e_j} = W_out h_{e_i},  m_{e_j→e_i} = W_in h_{e_j}.

Messages on virtual edges do not consider edge labels, and virtual edges between entity nodes are handled in the same way:

m_{u→w} = W_virt h_u.

We aggregate the messages using edge-aware attention following Liao et al. (2019) (unlike Liao et al. (2019), we do not use the positional embedding mask, because the newly generated nodes have distinct roles):

a_{u→v} = σ(MLP([h_u ; h_v])),

where σ is the sigmoid function, and the MLP contains two hidden layers with ReLU nonlinearities.

The event node representation h_e is then updated using the attention-weighted messages from its local neighbors N(e); entity node representations are updated in the same way.
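The edge-aware message and its attention-weighted aggregation can be sketched as below. The function names, shapes, and the residual update are illustrative simplifications of the GRAN-style propagation described above, not the authors' implementation:

```python
import numpy as np

def argument_edge_message(h_event, role_emb, W):
    """Edge-aware message over an argument edge: concatenate the sender
    embedding with the learned role embedding, then apply one shared
    weight matrix -- in contrast to RGCN, which would need a separate
    matrix per edge type."""
    return W @ np.concatenate([h_event, role_emb])

def aggregate(h_node, messages, att_weights):
    """Attention-weighted aggregation of incoming messages followed by a
    residual update of the node embedding."""
    agg = sum(w * m for w, m in zip(att_weights, messages))
    return h_node + agg
```

Because the weight matrix is shared across edge types (only the role/relation embedding varies), the parameter count stays constant as the ontology grows, which is the motivation given above for avoiding RGCN.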

3.5 Coreferential Argument Generation

After updating the node representations, we detect the entity type of each argument, and predict whether the argument is coreferential with an existing entity. Inspired by the copy mechanism Gu et al. (2016), we classify each argument node v either as a new entity with entity type φ(v), or as an existing entity node in the previous graph G_{<i}:

p(v) = p_gen(v) + p_copy(v),

where p_gen is the generation probability, classifying the new node into its entity type φ(v):

p_gen(v) ∝ exp(w_{φ(v)}ᵀ h_v).

The copy probability p_copy selects the coreferential entity from the entities v′ in the existing graph G_{<i}:

p_copy(v′ | v) ∝ exp(h_{v′}ᵀ h_v).

Here, the two probabilities share a normalization term Z, which sums over both the entity types and the existing entities.

If the model chooses to copy, we merge the coreferential entities in the graph.
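The generate-vs-copy decision with a shared normalization term can be sketched as follows; the score inputs are placeholders for the model's unnormalized logits, and the function name is invented here:

```python
import numpy as np

def generate_or_copy(type_scores, copy_scores):
    """Copy-mechanism sketch: unnormalized scores for (a) typing the
    argument as a NEW entity of each type and (b) copying each EXISTING
    entity are normalized jointly, so generation and copying compete in
    a single distribution."""
    scores = np.concatenate([type_scores, copy_scores])
    exp = np.exp(scores - scores.max())
    probs = exp / exp.sum()                 # shared normalization term
    n = len(type_scores)
    return probs[:n], probs[n:]             # (p_generate, p_copy)
```

If the highest-probability outcome falls in the copy portion, the argument node is merged with the corresponding existing entity, as described above.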

3.6 Entity Relational Edge Generation

In this phase, we determine which virtual edges to keep and assign relation types to them. We model the relation edge generation probability as a categorical distribution over relation types, adding an Other type to the type set to represent the absence of a relation edge:

p(r_{v_i, v_j}) = softmax(MLP([h_{v_i} ; h_{v_j}])).

We use two hidden layers with ReLU activation functions to implement the MLP.

3.7 Event Temporal Ordering Prediction

To predict the temporal dependencies between the new event and existing events, we connect them through temporal edges. These edges are critical for message passing when predicting the next event. We build temporal edges in the last phase of generation, since this step relies on the shared and related arguments. Considering that temporal edges are interdependent, we model the temporal edge generation probability as a mixture of Bernoulli distributions following Liao et al. (2019):

p(T_i | ·) = ∑_{k=1}^{K} α_k ∏_j Bernoulli(t_{ij}; θ_{k,j}),

where K is the number of mixture components. When K = 1, the distribution degenerates to a factorized Bernoulli, which assumes each potential temporal edge is independent conditioned on the existing graph.
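A minimal sketch of how mixture components combine into per-edge probabilities, assuming illustrative input shapes (per-component edge probabilities and mixture weights):

```python
import numpy as np

def temporal_edge_probs(component_probs, mixture_weights):
    """Mixture-of-Bernoulli sketch for temporal edges: each of K
    components assigns every candidate edge an independent Bernoulli
    probability, and the mixture weights combine them.  With K > 1 the
    marginals can capture dependencies between edges that a single
    factorized Bernoulli (K = 1) cannot.

    component_probs: (K, num_candidate_edges) per-component edge probs
    mixture_weights: (K,) weights summing to 1
    """
    return mixture_weights @ component_probs   # (num_candidate_edges,)
```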

3.8 Training and Schema Generation

We train the model by optimizing the negative log-likelihood loss:

L(θ) = −∑_G log p_θ(G).

To compose the schema library for each complex event scenario, we construct instance graphs from related documents, learn a graph model, and then obtain the schema via greedy generation.

4 Evaluation Benchmark

4.1 Dataset

We conduct experiments on two datasets, covering a general scenario and a more specific one. We adopt a newly defined fine-grained ontology for schema learning (released in LDC2020E25), which consists of 24 entity types, 46 relation types, 67 event types, and 85 argument roles.

General Schema Learning Corpus: The Schema Learning Corpus, released by LDC (LDC2020E25), includes 82 types of complex events, such as Disease Outbreak, Presentations and Shop Online. Each complex event is associated with a set of source documents. This data set also includes ground-truth schemas created by LDC annotators, which will be used for our intrinsic evaluation.

Dataset  Split  #Doc   #Event  #Argument  #Relation
General  Train  383    6,040   10,720     6,858
General  Dev    72     1,044   1,762      1,112
General  Test   83     1,211   2,112      1,363
IED      Train  5,247  41,672  136,894    122,846
IED      Dev    575    4,661   15,404     13,320
IED      Test   577    5,089   16,721     14,054
Table 2: Data statistics of the General and IED Schema Learning Corpora and the extracted instance graphs.

IED Schema Learning Corpus: The same type of complex event may have many variants, depending on the types of conditions and participants. To evaluate our model’s capability to capture uncertainty and multiple hypotheses, we dive deeper into one scenario and choose improvised explosive device (IED) attacks as our case study.

We first collect Wikipedia articles that describe 6 types of complex events, i.e., Car-bombing IED, Backpack IED, Drone Strikes IED, Roadside IED, Suicide IED and General IED, and then exploit the external links to collect the corresponding news documents for each complex event type. (We have manually annotated events for 246 documents, and the annotations will be released as part of our data set.)

To create ground-truth schemas for this data set, we present reference instance graphs and ranked event sequences to annotators to create human schemas. The event sequences are generated by traversing the instance graphs, and are then sorted by frequency and the number of arguments. Human curation focuses on merging and trimming steps by validating them against the reference instance graphs. Temporal dependencies between steps are further refined, and coreferential entities and their relations are added during the curation process.

We construct instance graphs following § 3.1 for both data sets, and randomly split them into training, validation and testing sets for each complex event type. Table 2 shows the statistics.

4.2 Schema Matching Evaluation

We compare the generated schemas with the ground-truth schemas based on the overlap between them, and propose the following evaluation metrics. (We cannot use graph matching to compare baselines with our approach due to the difference in the graph structures being modeled.)

Event Match: A good schema must contain the events crucial to the complex event scenario. F-score is used to compute the overlap of event nodes.

Event Temporal Sequence Match: A good schema is able to track events along a timeline. We therefore extract event sequences following temporal order, and evaluate the F-score on the overlapping sequences.

Event Argument Connection Match: Our complex event graph schema includes entities and their relations, and captures how events are connected through arguments in addition to their temporal order. We categorize these connections into three categories: (1) two events are connected by shared arguments; (2) two events have related arguments, i.e., their arguments are connected through entity relations; (3) there is no direct connection between two events. For every pair of overlapping events, we calculate the F-score based on whether these connections are predicted correctly.
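The Event Match metric can be sketched as below. Treating schemas as sets of event types is one reasonable reading of the metric; the exact matching procedure (e.g., multiset vs. set semantics) is an assumption here:

```python
def event_match_f1(predicted_types, gold_types):
    """F-score over the overlap of event types between an induced
    schema and a gold schema."""
    pred, gold = set(predicted_types), set(gold_types)
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)
```

The sequence and connection match metrics follow the same F-score template, but compare ordered event-type subsequences and per-event-pair connection labels, respectively.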

4.3 Instance Graph Perplexity Evaluation

To evaluate our temporal event graph model, we compute instance graph perplexity by predicting the instance graphs in the test set:

PP(G) = exp(−(1/|G|) log p_θ(G)).

We calculate the full perplexity over the entire graph using Equation (1), and the event perplexity using only the subgraph containing event nodes. The latter emphasizes the importance of correctly predicting events.
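As a small illustration of the perplexity computation (assuming the model exposes a log-probability for each generation decision; the list-based interface is invented here):

```python
import math

def graph_perplexity(step_log_probs):
    """Perplexity sketch: the exponentiated negative mean log-likelihood
    over generation decisions.  Restricting the list to event-node
    decisions gives the event perplexity; including argument, relation,
    and temporal decisions as well gives the full perplexity."""
    n = len(step_log_probs)
    return math.exp(-sum(step_log_probs) / n)
```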

4.4 Schema-Guided Event Prediction

To explore schema-guided probabilistic reasoning and prediction, we perform an extrinsic evaluation on event prediction. Unlike traditional event prediction tasks, our temporal event graphs contain arguments with relations, and type labels are assigned to nodes and edges. We create a graph-based event prediction dataset from our test graphs. The task is to predict the ending events of each graph, i.e., events with no future events after them. An event is predicted correctly if its event type matches one of the ending events in the graph. Since one instance graph can have multiple ending events, we rank event type prediction scores and adopt MRR (Mean Reciprocal Rank) and HITS@1 as evaluation metrics.
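The two ranking metrics can be sketched per graph as follows (in the actual evaluation the scores are averaged over all test graphs; the function signature is illustrative):

```python
def mrr_hits1(ranked_types, gold_types):
    """Given event types ranked by predicted score and the set of gold
    ending-event types: MRR is the reciprocal rank of the first correct
    type, and HITS@1 checks whether the top prediction is correct."""
    gold = set(gold_types)
    mrr = 0.0
    for rank, t in enumerate(ranked_types, start=1):
        if t in gold:
            mrr = 1.0 / rank
            break
    hits1 = 1.0 if ranked_types and ranked_types[0] in gold else 0.0
    return mrr, hits1
```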

Dataset  Model                      Event Match  Seq. Match (l=2)  Seq. Match (l=3)  Connection Match
General  Event Language Model       26.62        19.31             7.21              -
General  Sequential Pattern Mining  19.93        18.81             6.07              -
General  Event Graph Model          30.12        24.79             9.18              -
General  w/o Argument Generation    29.35        22.47             8.21              -
IED      Event Language Model       49.15        17.77             5.32              -
IED      Sequential Pattern Mining  47.91        18.39             4.79              5.41
IED      Event Graph Model          59.73        21.51             7.81              10.67
IED      w/o Argument Generation    55.01        18.24             6.67              -
Table 3: Schema matching scores (%), computed over the intersection of induced schemas and annotated schemas.

5 Experiments

5.1 Experiment Setting

Baseline 1: Event Language Model Rudinger et al. (2015); Pichotta and Mooney (2016) is the state-of-the-art event schema induction method. It learns the probability of event temporal sequences, and the event sequences generated from event language model are considered as schemas.

Baseline 2: Sequential Pattern Mining Pei et al. (2001) discovers frequent sequential patterns and encodes arguments and their relations as properties of each pattern. Since the event language model baseline cannot handle multiple arguments and relations, we add sequential pattern mining for comparison. The mined frequent patterns are considered as schemas.

Reference: Human Schema Since human-created schemas are highly accurate but not probabilistic, we want to evaluate their limits at predicting events in the extrinsic task. We match schemas to instances and fill in the matched type.

Ablation Study: Event Graph Model w/o Argument Generation is included as a variant of our temporal event graph model in which we remove argument generation (§3.5 and §3.6). It learns to generate a graph containing only event nodes with their temporal relations.

5.2 Implementation Details

Training Details For our event graph model, the representation dimension is 512, and we use a 2-layer GNN. The number of mixture components K is 10. For the event language model baseline, instead of the LSTM-based architecture of Pichotta and Mooney (2016), we adopt the state-of-the-art auto-regressive language model XLNet (Yang et al., 2019) for fair comparison (implementation from https://github.com/huggingface). We use the default XLNet parameter settings for training, and select the best model on the validation set. For sequential pattern mining, we perform random walks, starting from every node in the instance graphs and ending at sink nodes, to obtain event type sequences, and then apply PrefixSpan (https://github.com/chuanconggao/PrefixSpan-py) to obtain the ranked list of sequential patterns.

Evaluation Details To compose the schema library, we use the top-ranked sequence as the schema for the event language model and sequential pattern mining baselines. To perform event prediction with the baselines, we traverse the input graph to obtain event type sequences, and average the prediction scores over all sequences. For human schemas, we first linearize both the schemas and the input graphs, and find the longest common subsequence between them. We fill in the future event type using the best match.
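The alignment step for human schemas is a longest-common-subsequence computation over linearized event-type sequences; a standard dynamic-programming sketch (not the authors' exact matching code) is:

```python
def longest_common_subsequence(a, b):
    """Length of the longest common subsequence of two event-type
    sequences, via the classic O(len(a) * len(b)) DP table."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]
```

The best-matching schema (the one with the longest common subsequence against the linearized input graph) then supplies the predicted future event type.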

Dataset  Model                    Event Perplexity  Full Perplexity
General  Event Graph Model        24.25             137.18
General  w/o Argument Generation  68.59             -
IED      Event Graph Model        39.39             168.89
IED      w/o Argument Generation  51.98             -
Table 4: Instance graph perplexity.
Dataset  Model                      MRR    HITS@1
General  Event Language Model       0.367  0.497
General  Sequential Pattern Mining  0.330  0.478
General  Human Schema               0.173  0.205
General  Event Graph Model          0.401  0.520
General  w/o Argument Generation    0.392  0.509
IED      Event Language Model       0.169  0.513
IED      Sequential Pattern Mining  0.138  0.378
IED      Human Schema               0.072  0.222
IED      Event Graph Model          0.223  0.691
IED      w/o Argument Generation    0.204  0.674
Table 5: Schema-guided ending event prediction performance.

5.3 Results and Analysis

In Table 3, the significant gains on event match and sequence match demonstrate our graph model’s ability to retain salient events and order them. On sequence match, our approach achieves a larger performance gain over the baselines when the path length is longer, implying that the proposed model captures longer and wider temporal dependencies. For connection match, sequential pattern mining is the only baseline that can predict connections between events; our generation model outperforms it because it considers the inter-dependency of arguments and encodes them with graph structures.

Removing argument generation (“w/o Argument Generation”) lowers performance on all evaluation tasks, since the ablated model ignores coreferential arguments and their relations and relies solely on the overly simplistic temporal order to connect events. This is especially apparent in the instance graph perplexity in Table 4.

Figure 3: An example of event prediction on the IED corpus.

On the extrinsic task of schema-guided event prediction, our graph model obtains significant improvements (see Table 5). The low performance of the human schema demonstrates the importance of modeling schemas probabilistically to support downstream tasks. Take Figure 3 as an example: the human schema fails to predict Injure, because it relies on exact matching of event sequences and cannot handle sequence variants. Our probabilistic schema solves this problem by modeling the event prediction probability given the existing graph.

6 Related Work

The definition of a complex event schema separates our work from related lines of research, namely schema induction and script learning. Previous work on schema induction aims to characterize event triggers and participants of individual atomic events Chambers (2013); Cheung et al. (2013); Nguyen et al. (2015); Sha et al. (2016); Yuan et al. (2018), ignoring inter-event relations. Work on script learning, on the other hand, originally limited its attention to event chains with a single protagonist Chambers and Jurafsky (2008, 2009); Rudinger et al. (2015); Jans et al. (2012); Granroth-Wilding and Clark (2016) and was later extended to multiple participants Pichotta and Mooney (2014, 2016); Weber et al. (2018). Recent efforts rely on distributed representations encoded from the compositional nature of events Modi (2016); Granroth-Wilding and Clark (2016); Weber et al. (2018, 2020); Lyu et al. (2020), and on language modeling Rudinger et al. (2015); Pichotta and Mooney (2016); Peng and Roth (2016). All of these methods still assume that events follow a linear order in a single chain, and they overlook the relations between participants, which are critical for understanding complex events. Recent work on event graph schema induction Li et al. (2020) only considers connections between pairs of events. Similarly, existing event prediction tasks aim to generate a missing event (e.g., a word sequence) given a single prerequisite event or a sequence of them Nguyen et al. (2017); Hu et al. (2017); Li et al. (2018b); Kiyomaru et al. (2019); Lv et al. (2019), or to predict a pre-condition event given the current events Kwon et al. (2020). In contrast, we leverage the automatically discovered temporal event schema as guidance to forecast future events.

Our work is also related to recent advances in the modeling and generation of graphs Li et al. (2018a); Jin et al. (2018); Grover et al. (2019); Simonovsky and Komodakis (2018); Liu et al. (2019); Fu et al. (2020); Dai et al. (2020). Autoregressive methods, such as GraphRNN You et al. (2018), GRAN Liao et al. (2019), GRAT Yoo et al. (2020) and GraphAF Shi et al. (2019), model graph generation as a sequence of additions of new nodes and edges conditioned on the graph substructure generated so far. To induce event schemas, we construct instance graphs and learn a graph generation model in a similar manner. Our approach is designed for complex event graphs and encodes both temporal and argument semantics: it sequentially generates a new event node by predicting the most likely type of the subsequent event over a target event ontology, and then adds coreferential argument edges via a copy mechanism and event temporal orders by exploiting the dependencies between the new event and all existing events.

7 Conclusions and Future Work

We propose a new task to induce temporal complex event schemas, which are capable of representing multiple temporal dependencies between events and their connected arguments. We induce such schemas by learning an event graph model, a deep auto-regressive model, from automatically extracted instance graphs. Experiments demonstrate the model’s effectiveness on both intrinsic evaluation and the downstream task of schema-guided event prediction. These schemas can guide our understanding of, and ability to predict, what might happen next, along with background knowledge including location- and participant-specific, temporally ordered event information. In the future, we plan to extend our framework to hierarchical event schema induction, as well as to event and argument instance prediction.


  • Chambers (2013) Nathanael Chambers. 2013. Event schema induction with a probabilistic entity-driven model. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP2013), volume 13, pages 1797–1807.
  • Chambers and Jurafsky (2008) Nathanael Chambers and Dan Jurafsky. 2008. Unsupervised learning of narrative event chains. In Proceedings of the 2008 Annual Meeting of the Association for Computational Linguistics (ACL2008), pages 789–797.
  • Chambers and Jurafsky (2009) Nathanael Chambers and Dan Jurafsky. 2009. Unsupervised learning of narrative schemas and their participants. In Proceedings of the Joint conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing (ACL-IJCNLP2009).
  • Cheung et al. (2013) Jackie Chi Kit Cheung, Hoifung Poon, and Lucy Vanderwende. 2013. Probabilistic frame induction. Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 837–846.
  • Dai et al. (2020) Hanjun Dai, Azade Nazi, Yujia Li, Bo Dai, and Dale Schuurmans. 2020. Scalable deep generative modeling for sparse graphs. In Proceedings of the 37th International Conference on Machine Learning, PMLR 119.
  • Fu et al. (2020) Dongqi Fu, Dawei Zhou, and Jingrui He. 2020. Local motif clustering on time-evolving graphs. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 390–400.
  • Granroth-Wilding and Clark (2016) Mark Granroth-Wilding and Stephen Clark. 2016. What happens next? Event prediction using a compositional neural network model. In Thirtieth AAAI Conference on Artificial Intelligence.
  • Grover et al. (2019) Aditya Grover, Aaron Zweig, and Stefano Ermon. 2019. Graphite: Iterative generative modeling of graphs. In International conference on machine learning, pages 2434–2444. PMLR.
  • Gu et al. (2016) Jiatao Gu, Zhengdong Lu, Hang Li, and Victor OK Li. 2016. Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1631–1640.
  • Hu et al. (2017) Linmei Hu, Juanzi Li, Liqiang Nie, Xiaoli Li, and Chao Shao. 2017. What happens next? Future subevent prediction using contextual hierarchical LSTM. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 3450–3456.
  • Jans et al. (2012) Bram Jans, Steven Bethard, Ivan Vulić, and Marie Francine Moens. 2012. Skip n-grams and ranking functions for predicting script events. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 336–344. Association for Computational Linguistics.
  • Jin et al. (2018) Wengong Jin, Regina Barzilay, and Tommi Jaakkola. 2018. Junction tree variational autoencoder for molecular graph generation. In International Conference on Machine Learning, pages 2323–2332.
  • Joshi et al. (2019) Mandar Joshi, Omer Levy, Daniel S Weld, and Luke Zettlemoyer. 2019. BERT for coreference resolution: Baselines and analysis. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5807–5812.
  • Kipf and Welling (2017) Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings.
  • Kiyomaru et al. (2019) Hirokazu Kiyomaru, Kazumasa Omura, Yugo Murawaki, Daisuke Kawahara, and Sadao Kurohashi. 2019. Diversity-aware event prediction based on a conditional variational autoencoder with reconstruction. In Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing, pages 113–122.
  • Kwon et al. (2020) Heeyoung Kwon, Mahnaz Koupaee, Pratyush Singh, Gargi Sawhney, Anmol Shukla, Keerthi Kumar Kallur, Nathanael Chambers, and Niranjan Balasubramanian. 2020. Modeling preconditions in text with a crowd-sourced dataset. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3818–3828, Online. Association for Computational Linguistics.
  • Li et al. (2020) Manling Li, Qi Zeng, Ying Lin, Kyunghyun Cho, Heng Ji, Jonathan May, Nathanael Chambers, and Clare Voss. 2020. Connecting the dots: Event graph schema induction with path language modeling. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP2020).
  • Li et al. (2018a) Yujia Li, Oriol Vinyals, Chris Dyer, Razvan Pascanu, and Peter Battaglia. 2018a. Learning deep generative models of graphs. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80.
  • Li et al. (2018b) Zhongyang Li, Xiao Ding, and Ting Liu. 2018b. Constructing narrative event evolutionary graph for script event prediction. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 4201–4207.
  • Liao et al. (2019) Renjie Liao, Yujia Li, Yang Song, Shenlong Wang, Will Hamilton, David K Duvenaud, Raquel Urtasun, and Richard Zemel. 2019. Efficient graph generation with graph recurrent attention networks. In Advances in Neural Information Processing Systems, pages 4255–4265.
  • Lin et al. (2020) Ying Lin, Heng Ji, Fei Huang, and Lingfei Wu. 2020. A joint neural model for information extraction with global features. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7999–8009.
  • Liu et al. (2019) Jenny Liu, Aviral Kumar, Jimmy Ba, Jamie Kiros, and Kevin Swersky. 2019. Graph normalizing flows. In Advances in Neural Information Processing Systems, pages 13578–13588.
  • Lv et al. (2019) Shangwen Lv, Wanhui Qian, Longtao Huang, Jizhong Han, and Songlin Hu. 2019. SAM-Net: Integrating event-level and chain-level attentions to predict what happens next. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 6802–6809.
  • Lyu et al. (2020) Qing Lyu, Li Zhang, and Chris Callison-Burch. 2020. Reasoning about goals, steps, and temporal ordering with WikiHow. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP).
  • Modi (2016) Ashutosh Modi. 2016. Event embeddings for semantic script modeling. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pages 75–83.
  • Nguyen et al. (2017) Dat Quoc Nguyen, Cuong Xuan Chu, Stefan Thater, Manfred Pinkal, et al. 2017. Sequence to sequence learning for event prediction. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 37–42.
  • Nguyen et al. (2015) Kiem-Hieu Nguyen, Xavier Tannier, Olivier Ferret, and Romaric Besançon. 2015. Generative event schema induction with entity disambiguation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 188–197.
  • Ning et al. (2019) Qiang Ning, Sanjay Subramanian, and Dan Roth. 2019. An improved neural baseline for temporal relation extraction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6204–6210.
  • Pan et al. (2017) Xiaoman Pan, Boliang Zhang, Jonathan May, Joel Nothman, Kevin Knight, and Heng Ji. 2017. Cross-lingual name tagging and linking for 282 languages. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1946–1958.
  • Pei et al. (2001) Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Helen Pinto, Qiming Chen, Umeshwar Dayal, and Meichun Hsu. 2001. PrefixSpan: Mining sequential patterns by prefix-projected growth. In Proceedings of the 17th International Conference on Data Engineering, pages 215–224.
  • Peng and Roth (2016) Haoruo Peng and Dan Roth. 2016. Two discourse driven language models for semantics. In 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, pages 290–300. Association for Computational Linguistics (ACL).
  • Pichotta and Mooney (2014) Karl Pichotta and Raymond Mooney. 2014. Statistical script learning with multi-argument events. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 220–229.
  • Pichotta and Mooney (2016) Karl Pichotta and Raymond J Mooney. 2016. Learning statistical scripts with LSTM recurrent neural networks. In Thirtieth AAAI Conference on Artificial Intelligence.
  • Rudinger et al. (2015) Rachel Rudinger, Pushpendre Rastogi, Francis Ferraro, and Benjamin Van Durme. 2015. Script induction as language modeling. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1681–1686.
  • Schlichtkrull et al. (2018) Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pages 593–607. Springer.
  • Sha et al. (2016) Lei Sha, Sujian Li, Baobao Chang, and Zhifang Sui. 2016. Joint learning templates and slots for event schema induction. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 428–434.
  • Shi et al. (2019) Chence Shi, Minkai Xu, Zhaocheng Zhu, Weinan Zhang, Ming Zhang, and Jian Tang. 2019. GraphAF: A flow-based autoregressive model for molecular graph generation. In International Conference on Learning Representations.
  • Simonovsky and Komodakis (2018) Martin Simonovsky and Nikos Komodakis. 2018. GraphVAE: Towards generation of small graphs using variational autoencoders. In International Conference on Artificial Neural Networks, pages 412–422. Springer.
  • Weber et al. (2018) Noah Weber, Niranjan Balasubramanian, and Nathanael Chambers. 2018. Event representations with tensor-based compositions. In Proceedings of the AAAI Conference on Artificial Intelligence.
  • Weber et al. (2020) Noah Weber, Rachel Rudinger, and Benjamin Van Durme. 2020. Causal inference of script knowledge. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7583–7596.
  • Yang et al. (2019) Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. In Advances in Neural Information Processing Systems, pages 5753–5763.
  • Yoo et al. (2020) Sanghyun Yoo, Young-Seok Kim, Kang Hyun Lee, Kuhwan Jeong, Junhwi Choi, Hoshik Lee, and Young Sang Choi. 2020. Graph-aware transformer: Is attention all graphs need? arXiv preprint arXiv:2006.05213.
  • You et al. (2018) Jiaxuan You, Rex Ying, Xiang Ren, William Hamilton, and Jure Leskovec. 2018. GraphRNN: Generating realistic graphs with deep auto-regressive models. In International Conference on Machine Learning, pages 5708–5717.
  • Yuan et al. (2018) Quan Yuan, Xiang Ren, Wenqi He, Chao Zhang, Xinhe Geng, Lifu Huang, Heng Ji, Chin-Yew Lin, and Jiawei Han. 2018. Open-schema event profiling for massive news corpora. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 587–596.