Data and code for the paper "Future is not One-dimensional: Complex Event Schema Induction via Graph Modeling".
Event schemas encode knowledge of stereotypical event structures and their connections. As events unfold, schemas act as crucial scaffolding. Previous work on event schema induction focuses either on atomic events or on linear temporal event sequences, ignoring the interplay between events via arguments and argument relations. We introduce the concept of a Temporal Complex Event Schema: a graph-based schema representation that encompasses events, arguments, temporal connections and argument relations. Additionally, we propose a Temporal Event Graph Model that models the emergence of event instances following the temporal complex event schema. To build and evaluate such schemas, we release a new schema learning corpus containing 6,399 documents accompanied by event graphs and manually constructed gold schemas. Intrinsic evaluation by schema matching and instance graph perplexity proves the superior quality of our probabilistic graph schema library compared to linear representations. Extrinsic evaluation on schema-guided event prediction further demonstrates the predictive power of our event graph model, significantly surpassing human schemas and baselines by more than 17.8 HITS@1.
The current automated event understanding task has been overly simplified to be local, sequential and flat. Real-world events, such as disease outbreaks and drone bombings, have multiple actors, complex timelines, intertwined relations and multiple possible outcomes. Understanding them requires knowledge in the form of a repository of abstracted event schemas, capturing the progress of time, and performing global inference for event prediction. For example, regarding Ukrainian President Viktor Yanukovych’s refusal to sign a European Union (EU) association agreement in 2013, a typical question from analysts would be “Can you anticipate Russians’ reactions to Ukraine’s decision not to join the EU?” This requires an event understanding system to match events to schema representations and reason about what might happen next. The international conflict
schema would be triggered by “Ukraine declining to join the EU”, and evidence of Russian influence would suggest a Ukrainian revolution, or pro-Russian unrest with respective probabilities.
Comprehending such a news story requires following a timeline, identifying key events and tracking characters. We refer to such a “story” as a complex event. Its complexity comes from the inclusion of multiple events (and their arguments), relations and temporal order. We propose a new task, Probabilistic Temporal Complex Event Schema Induction, to address this challenge. In a Temporal Complex Event Schema, events are partially ordered by temporal relations, and connected through co-referential entities and their relations. Figure 1 shows an example schema with multiple dependencies between events. A single event may depend on multiple events and their graph structures: e.g., the Assemble event happens only after both the bomb materials and the vehicle are bought. Conversely, multiple events may follow one event, such as the Attack event in Figure 1. This automatically induced probabilistic complex event schema can be used to forecast event abstractions over the long term, and thus provides a comprehensive understanding of evolving situations, events, and trends.
For each type of complex event, we aim to induce a schema repository that is probabilistic, temporally organized and semantically coherent. Low-level primitive components of event schemas are abundant, and can be part of multiple, sparsely occurring, higher-level schemas. We propose a Temporal Event Graph Model, an auto-regressive graph generation model, to reach this goal. Given the currently extracted event graph, we generate the next event type node with its potential arguments, and then propagate edge-aware information following temporal orders. After that, we employ a copy mechanism to generate coreferential arguments, and build relation edges for them. Finally, temporal dependencies are determined with argument connections taken into account.
Our generative model serves as both a schema library and a predictive model. Specifically, we can probe the model to generate event graphs unconditionally to obtain a set of schemas. We can also pass partially instantiated graphs to the model and “grow” the graph either forward or backward in time to predict missing events, arguments or relations, both from the past and in the future. We propose a set of schema matching metrics to evaluate the induced schemas against human-created schemas, and show the power of the probabilistic schema in event type prediction as an extrinsic evaluation. Specifically, in the test phase, we apply our framework to instantiate high-level event schemas, and predict event types that are likely to happen next, or causal or conditional event types that may have happened in the past.
We make the following novel contributions:
This is the first work to induce probabilistic temporal graph schemas for complex events across documents, which capture temporal dynamics and connections among individual events through their coreferential or related arguments.
This is the first application of graph generation methods to induce event schemas.
This is the first work to use complex event schemas for event type prediction, and also produce multiple hypotheses with probabilities.
We propose a comprehensive set of metrics for both intrinsic and extrinsic evaluations.
We release a new data set of 6,399 documents with manually constructed gold schemas.
| Symbol | Description |
|---|---|
| G | Instance graph of a complex event |
| S | Schema graph of a complex event type |
| e | Event node in an instance graph |
| v | Entity node in an instance graph |
| e_i → e_j | Temporal ordering edge between events e_i and e_j, indicating e_i is before e_j |
| (e, r, v) | Argument edge, indicating entity v plays argument role r in the event e |
| (v_i, r, v_j) | Relation edge between entities v_i and v_j, where r is the relation type |
| R_e | Argument role set of event e, defined by the IE ontology |
| T_E | The type set of events |
| T_V | The type set of entities |
| τ(·) | A mapping function from a node to its type |
| G_{<e} | Subgraph of G containing events before e and their arguments |
From a set of documents describing a complex event, we construct an instance graph containing event nodes and entity nodes (argument nodes). There are three types of edges in this graph: (1) event-event edges connecting events that have direct temporal relations; (2) event-entity edges connecting arguments to their events; and (3) entity-entity edges indicating relations between entities. We extract instance graphs using an Information Extraction (IE) system or from IE annotations. In these graphs, relation edges are undirected, while temporal edges between events are directional.
For each complex event, given a set of instance graphs, the goal of schema induction is to generate a schema library. In each schema graph, nodes are abstracted to the types of events and entities. Figure 1 is an example schema for the complex event type car-bombing. Schema graphs can be regarded as a summary abstraction of instance graphs, capturing recurring structures.
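The instance graph described above can be sketched as a small data structure. This is an illustrative sketch, not the paper's released code; the node identifiers, type labels, and toy ontology names are all assumptions.

```python
# Hypothetical sketch of an instance graph: typed event/entity nodes plus the
# three edge kinds from the paper (temporal, argument, relation).
class InstanceGraph:
    def __init__(self):
        self.events = {}      # event id -> event type
        self.entities = {}    # entity id -> entity type
        self.temporal = []    # (earlier_event, later_event), directed
        self.arguments = []   # (event, role, entity)
        self.relations = []   # (entity, relation_type, entity), undirected

    def add_event(self, eid, etype):
        self.events[eid] = etype

    def add_entity(self, nid, ntype):
        self.entities[nid] = ntype

g = InstanceGraph()
g.add_event("ev1", "Transaction.Buy")
g.add_event("ev2", "Conflict.Attack")
g.add_entity("ent1", "PER")
g.temporal.append(("ev1", "ev2"))                 # Buy happens before Attack
g.arguments.append(("ev1", "Buyer", "ent1"))
g.arguments.append(("ev2", "Attacker", "ent1"))   # coreferential argument links the two events
```

Note how the shared entity `ent1` connects the two events beyond their temporal order, which is exactly the kind of structure linear representations discard.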
We use OneIE, a state-of-the-art Information Extraction system Lin et al. (2020), to extract entities, relations and events, and then perform cross-document entity and event coreference resolution Pan et al. (2017); Joshi et al. (2019) over the document cluster of each complex event. We further conduct event-event temporal relation extraction Ning et al. (2019) to determine the order of event pairs.
We construct one event instance graph for each complex event, where coreferential events or entities are merged. We include events that are involved in temporal relations, or whose arguments are connected through entity coreference links or entity-entity relations. We treat isolated events as irrelevant to schema induction, so they are excluded from the instance graphs during graph construction. Since schema graphs focus on type-level abstraction, we use a type label and a node index to represent each node.
Given a set of instance graphs about the same complex event, we regard the schema as the hidden knowledge guiding the generation of these graphs. To this end, we propose a temporal event graph model that maximizes the probability of each instance graph. At each step, based on the previously generated graph, we predict one event node and fill its argument roles to generate the next graph,
We factorize the probability of generating new nodes and edges as:
An event node is generated first according to the event type distribution. We then add argument nodes connected to this event: since we have access to the IE ontology, which defines all possible argument roles, we simply instantiate a node for each argument role. We also predict relations between the newly generated nodes and the existing nodes. Once the shared and related arguments are known, we add a final step to predict the temporal relations between the new event and the existing events.
In the traditional graph generation setting, the order of node generation can be arbitrary. However, in our instance graphs, event nodes are connected through temporal relations. We order events as a directed acyclic graph (DAG). Considering each event may have multiple events both “before” and “after”, we obtain the generation order by traversing the graph using Breadth-First Search.
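One way to realize this ordering is Kahn-style BFS over the temporal DAG, starting from events with no incoming "before" edge. This is a sketch under our own assumptions, not the authors' implementation; event names are invented for illustration.

```python
from collections import deque, defaultdict

def bfs_order(events, temporal_edges):
    """Return a BFS generation order over the temporal DAG of events."""
    succ = defaultdict(list)
    indeg = {e: 0 for e in events}
    for a, b in temporal_edges:          # a happens before b
        succ[a].append(b)
        indeg[b] += 1
    # start from events with no predecessor; sort for a deterministic order
    queue = deque(sorted(e for e in events if indeg[e] == 0))
    order = []
    while queue:
        e = queue.popleft()
        order.append(e)
        for nxt in succ[e]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:          # all "before" events already emitted
                queue.append(nxt)
    return order

# Diamond-shaped DAG from Figure 1: both purchases precede Assemble.
print(bfs_order(["assemble", "buy_mat", "buy_veh"],
                [("buy_mat", "assemble"), ("buy_veh", "assemble")]))
# -> ['buy_mat', 'buy_veh', 'assemble']
```

The indegree counter guarantees an event is only emitted after all of its "before" events, which matches the multi-dependency structure the paper emphasizes.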
We also add dummy Start/End event nodes to indicate the start and end of graph generation. At the beginning of the generation process, the graph contains a single start event node, and we generate the end node to signal the end of the graph. The complexity of our algorithm scales with the number of events in the graph: at each generation step, we add one event node and its argument nodes into the current graph.
To determine the event type of the newly generated event node, we apply a graph pooling layer over all generated events to get the current graph representation,
We use bold symbols to denote the latent representations of nodes and edges, which are initialized as zeros and updated at each generation step via message passing in § 3.4. We adopt a mean-pooling operation in this paper. The event type is then predicted through a fully connected layer,
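The pooling-plus-classification head above can be sketched in a few lines. All dimensions and weights below are illustrative toy values, not the paper's trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_event_types = 8, 5
node_embs = rng.normal(size=(3, d))   # embeddings of 3 previously generated events

h_graph = node_embs.mean(axis=0)      # mean-pooled graph-level representation

# fully connected layer + softmax over event types
W = rng.normal(size=(n_event_types, d))
b = np.zeros(n_event_types)
logits = W @ h_graph + b
probs = np.exp(logits - logits.max())
probs /= probs.sum()                  # probability of each candidate event type
next_type = int(probs.argmax())       # greedy choice of the next event type
```

Sampling from `probs` instead of taking the argmax is what lets the model produce multiple hypotheses with probabilities.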
Once we know the event type of the new node, we add all of its argument roles defined in the IE ontology as new entity nodes. For example, in Figure 2, the new event is an Arrest event, so we add three argument nodes for Detainee, Jailor, and Place. The edges between these arguments and the event are also added to the graph.
We use a Graph Neural Network (GNN) Kipf and Welling (2017) to update node embeddings following the graph structure. Before running the GNN, we first add virtual edges between the newly generated event and all previous events, and between new entities and previous entities, shown as dashed lines in Figure 2. The virtual edges enable the representations of new nodes to aggregate messages from previous nodes, which has proven effective in Liao et al. (2019).
To capture the rich semantics of edge types, we pass edge-aware messages during graph propagation. An intuitive approach is to encode different edge types with different convolutional filters, similar to RGCN Schlichtkrull et al. (2018). However, the number of RGCN parameters grows rapidly with the number of edge types and easily becomes unmanageable given the large number of relation types and argument roles in the IE ontology (there are 131 edge types according to the fine-grained LDC Schema Learning Ontology).
Instead, we learn a vector representation for each relation type and argument role. The message passed through each argument edge is:
where ‖ denotes the concatenation operation. Similarly, the message between two entities is computed as:
Considering that the direction of the temporal edge is important, we parametrize the message over this edge by assigning two separate weight matrices to the outgoing and incoming vertices:
Messages on virtual edges between events do not consider edge labels, and virtual edges between entity nodes are handled in the same way:
We aggregate the messages using edge-aware attention following Liao et al. (2019) (unlike Liao et al. (2019), we do not use the positional embedding mask, because the newly generated nodes have distinct roles):
Here, σ is the sigmoid function, and the MLP contains two hidden layers with ReLU nonlinearities.
The event node representation is then updated using the messages from its local neighbors; entity node representations are updated in the same way:
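A minimal sketch of one round of this edge-aware message passing: each message concatenates the two endpoint embeddings with a learned edge-type vector, and a node is updated with the mean of its incoming messages. The shapes, the single linear map `W_msg`, and the residual-style update are illustrative assumptions, not the paper's exact parametrization.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
nodes = {"ev": rng.normal(size=d), "arg": rng.normal(size=d)}
edge_type_vec = {"Detainee": rng.normal(size=d)}   # learned argument-role embedding
W_msg = rng.normal(size=(d, 3 * d))

def message(src, dst, etype):
    # concatenate [source node ; destination node ; edge-type vector]
    x = np.concatenate([nodes[src], nodes[dst], edge_type_vec[etype]])
    return np.tanh(W_msg @ x)

msgs = [message("arg", "ev", "Detainee")]
# update the event node with the aggregated messages from its neighbors
nodes["ev"] = nodes["ev"] + np.mean(msgs, axis=0)
```

Because the edge type enters as a concatenated vector rather than a per-type weight matrix, the parameter count stays flat as the ontology grows, which is the motivation given above for avoiding RGCN-style filters.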
After updating the node representations, we detect the entity type of each argument, and also predict whether the argument is coreferential to existing entities. Inspired by the copy mechanism Gu et al. (2016), we classify each argument node as either a new entity with a predicted entity type, or a copy of an existing entity node in the previous graph.
where the generation probability classifies the new node into its entity type:
The copy probability selects the coreferential entity from the entities in the existing graph,
where the two probabilities share a normalization term,
If the model decides to copy, we merge the coreferential entities in the graph.
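The generate-vs-copy decision can be sketched as a single softmax spanning both the entity-type vocabulary (generate a new entity) and the existing entity nodes (copy, i.e. mark as coreferential). The scores below are made-up logits, not model outputs, and the type/entity names are illustrative.

```python
import numpy as np

entity_types = ["PER", "ORG", "VEH"]       # generate targets: new entity types
existing_entities = ["ent1", "ent2"]       # copy targets: nodes already in the graph

gen_scores = np.array([0.2, 0.1, 1.5])     # toy logits for generating each type
copy_scores = np.array([2.0, 0.3])         # toy logits for copying each entity

# one softmax over both candidate sets -> the shared normalization term
all_scores = np.concatenate([gen_scores, copy_scores])
probs = np.exp(all_scores - all_scores.max())
probs /= probs.sum()

best = int(probs.argmax())
if best < len(entity_types):
    decision = ("generate", entity_types[best])
else:
    decision = ("copy", existing_entities[best - len(entity_types)])
```

Here the highest score belongs to `ent1`, so the new argument node would be merged with that existing entity.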
In this phase, we determine which virtual edges to keep and assign relation types to them. We model the relation edge generation probability as a categorical distribution over relation types, and add an Other type to the type set to represent the absence of a relation edge:
We use two hidden layers with ReLU activation functions to implement the MLP.
To predict the temporal dependencies between the new event and existing events, we connect them through temporal edges. These edges are critical for message passing when predicting the next event. We build temporal edges in the last phase of generation, since this step relies on the shared and related arguments. Considering that temporal edges are interdependent, we model the temporal edge generation probability as a mixture of Bernoulli distributions following Liao et al. (2019):
where K is the number of mixture components. When K = 1, the distribution degenerates to a factorized Bernoulli, which assumes each potential temporal edge is independent conditioned on the existing graph.
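The mixture-of-Bernoulli likelihood over the candidate temporal edges can be sketched as follows. The per-component edge probabilities `theta` would come from MLPs over node embeddings in the model; here they are fixed toy numbers, and the shapes are assumptions.

```python
import numpy as np

def edge_log_likelihood(edges, theta, pi):
    """edges: 0/1 vector over m candidate temporal edges;
    theta: (K, m) per-component edge probabilities; pi: (K,) mixture weights."""
    comp_ll = (edges * np.log(theta) + (1 - edges) * np.log(1 - theta)).sum(axis=1)
    return float(np.log(np.exp(comp_ll) @ pi))   # log sum_k pi_k * prod_j Bernoulli

edges = np.array([1.0, 0.0, 1.0])                # new event follows events 1 and 3
theta = np.array([[0.9, 0.1, 0.8],
                  [0.5, 0.5, 0.5]])
pi = np.array([0.7, 0.3])
ll = edge_log_likelihood(edges, theta, pi)
```

Because the edges are scored jointly inside each component before mixing, the mixture can capture correlated edge patterns that a single factorized Bernoulli (K = 1) cannot.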
We train the model by optimizing the negative log-likelihood loss,
To compose the schema library for each complex event scenario, we construct instance graphs from related documents to learn a graph model, and then get the schema using greedy generation.
We conduct experiments on two datasets covering both a general scenario and a more specific scenario. We adopt a newly defined fine-grained ontology for Schema Learning (released in LDC2020E25), which consists of 24 entity types, 46 relation types, 67 event types, and 85 argument roles.
General Schema Learning Corpus: The Schema Learning Corpus, released by LDC (LDC2020E25), includes 82 types of complex events, such as Disease Outbreak, Presentations and Shop Online. Each complex event is associated with a set of source documents. This data set also includes ground-truth schemas created by LDC annotators, which will be used for our intrinsic evaluation.
IED Schema Learning Corpus: The same type of complex event may have many variants, depending on different conditions and participants. To evaluate our model's capability at capturing uncertainty and multiple hypotheses, we dive deeper into one scenario and choose improvised explosive devices (IED) as our case study.
We first collect Wikipedia articles that describe 6 types of complex events, i.e., Car-bombing IED, Backpack IED, Drone Strikes IED, Roadside IED, Suicide IED and General IED, and then exploit the external links to collect the corresponding news documents for each complex event type. We have manually annotated events for 246 documents, and the annotations will be released as part of our data set.
To create ground-truth schemas for this data set, we present reference instance graphs and ranked event sequences to annotators to create human schemas. The event sequences are generated by traversing the instance graphs, and are then sorted by frequency and number of arguments. Human curation focuses on merging and trimming steps by validating them against the reference instance graphs. Temporal dependencies between steps are further refined, and coreferential entities and their relations are added during curation.
We compare the generated schemas with the ground-truth schemas based on the overlap between them, and propose the following evaluation metrics (we cannot use graph matching to compare baselines against our approach due to the difference in the graph structures being modeled):
Event Match: A good schema must contain the events crucial to the complex event scenario. F-score is used to compute the overlap of event nodes.
Event Temporal Sequence Match: A good schema is able to track events through a timeline. We therefore obtain event sequences following temporal order, and evaluate F-score on the overlapping sequences.
Event Argument Connection Match: Our complex event graph schema includes entities and their relations and captures how events are connected through arguments, in addition to their temporal order. We categorize these connections into three categories: (1) two events are connected by shared arguments; (2) two events have related arguments, i.e., their arguments are connected through entity relations; (3) there are no direct connections between two events. For every pair of overlapped events, we calculate F-score based on whether these connections are predicted correctly.
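The Event Match metric above can be sketched as an F-score over the overlap between the event-type sets of the induced and gold schemas. The event types in the example are invented for illustration.

```python
def event_match_f1(predicted, gold):
    """F-score over the overlap of event types in two schemas."""
    predicted, gold = set(predicted), set(gold)
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

score = event_match_f1(
    ["Buy", "Assemble", "Attack", "Arrest"],
    ["Buy", "Assemble", "Attack", "Injure", "Die"],
)
# precision = 3/4, recall = 3/5, so F1 = 2/3
```

The sequence-match metric generalizes this from single event types to temporally ordered subsequences of a given length.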
To evaluate our temporal event graph model, we compute the instance graph perplexity by predicting the instance graphs in the test set,
We calculate the full perplexity for the entire graph using Equation (1), and the event perplexity using the subgraph containing only event nodes:
The latter emphasizes the importance of correctly predicting events.
To explore schema-guided probabilistic reasoning and prediction, we perform an extrinsic evaluation on event prediction. Different from traditional event prediction tasks, the temporal event graphs contain arguments with relations, and type labels are assigned to nodes and edges. We create a graph-based event prediction dataset using our test graphs. The task aims to predict the ending events of each graph, i.e., events that have no future events after them. An event is predicted correctly if its event type matches one of the ending events in the graph. Considering that there can be multiple ending events in one instance graph, we rank event type prediction scores and adopt MRR (Mean Reciprocal Rank) and HITS@1 as evaluation metrics.
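These two metrics can be sketched for a single test graph as follows, treating any gold ending event type as a hit; the event types are invented for illustration.

```python
def mrr_hits1(ranked_types, gold_types):
    """Reciprocal rank of the best-ranked gold ending event, and HITS@1."""
    gold = set(gold_types)
    for rank, t in enumerate(ranked_types, start=1):
        if t in gold:
            return 1.0 / rank, 1.0 if rank == 1 else 0.0
    return 0.0, 0.0

mrr, hits1 = mrr_hits1(["Attack", "Injure", "Arrest"], ["Injure", "Die"])
# best-ranked gold type sits at rank 2 -> reciprocal rank 0.5, HITS@1 0.0
```

Corpus-level scores are then averages of these per-graph values.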
| Dataset | Method | Event Match | Sequence Match (l=2) | Sequence Match (l=3) | Connection Match |
|---|---|---|---|---|---|
| General | Event Language Model | 26.62 | 19.31 | 7.21 | - |
| General | Sequential Pattern Mining | 19.93 | 18.81 | 6.07 | - |
| General | Event Graph Model | 30.12 | 24.79 | 9.18 | - |
| IED | Event Language Model | 49.15 | 17.77 | 5.32 | - |
| IED | Sequential Pattern Mining | 47.91 | 18.39 | 4.79 | 5.41 |
| IED | Event Graph Model | 59.73 | 21.51 | 7.81 | 10.67 |
Baseline 1: Event Language Model Rudinger et al. (2015); Pichotta and Mooney (2016) is a state-of-the-art event schema induction method. It learns the probability of temporal event sequences, and the sequences generated by the event language model are treated as schemas.
Baseline 2: Sequential Pattern Mining Pei et al. (2001) discovers frequent sequential patterns and encodes arguments and their relations as properties of each pattern. Since the event language model baseline cannot handle multiple arguments and relations, we add sequential pattern mining for comparison. The mined frequent patterns are treated as schemas.
Reference: Human Schema Human-created schemas are highly accurate but not probabilistic, so we evaluate their limits at predicting events in the extrinsic task. We match schemas to instances and fill in the matched type.
Training Details For our event graph model, the representation dimension is 512, and we use a 2-layer GNN. The number of mixture components is 10. For the event language model baseline, instead of using an LSTM-based architecture following Pichotta and Mooney (2016), we adopt the state-of-the-art auto-regressive language model XLNet (Yang et al., 2019) (https://github.com/huggingface) for a fair comparison. We use the default parameter settings of XLNet to train the model, and select the best model on the validation set. For sequential pattern mining, we perform random walks, starting from every node in the instance graphs and ending at sink nodes, to obtain event type sequences, and then apply PrefixSpan (https://github.com/chuanconggao/PrefixSpan-py) to get a ranked list of sequential patterns.
Evaluation Details To compose the schema library, we use the top-ranked sequence as the schema for the event language model and sequential pattern mining baselines. To perform event prediction with the baselines, we traverse the input graph to get event type sequences, and conduct prediction on all sequences to produce an averaged score. For human schemas, we first linearize them and the input graphs, and find the longest common subsequence between them. We fill in the future event type using the best match.
| Dataset | Model | Event Perplexity | Full Perplexity |
|---|---|---|---|
| General | Event Graph Model | 24.25 | 137.18 |
| IED | Event Graph Model | 39.39 | 168.89 |
| Dataset | Method | MRR | HITS@1 |
|---|---|---|---|
| General | Event Language Model | 0.367 | 0.497 |
| General | Sequential Pattern Mining | 0.330 | 0.478 |
| General | Event Graph Model | 0.401 | 0.520 |
| IED | Event Language Model | 0.169 | 0.513 |
| IED | Sequential Pattern Mining | 0.138 | 0.378 |
| IED | Event Graph Model | 0.223 | 0.691 |
In Table 3, the significant gains on event match and ordering match demonstrate the ability of our graph model to keep salient events and order them. On sequence match, our approach achieves a larger performance gain over the baselines as the path length grows, which implies that the proposed model captures longer and wider temporal dependencies. On connection match, only sequential pattern mining among the baselines can predict connections between events; compared with it, our generation model performs better since it considers the inter-dependency of arguments and encodes them with graph structures.
Removing argument generation (“w/o ArgumentGeneration”) generally lowers performance on all evaluation tasks, since the model then ignores coreferential arguments and their relations, relying solely on the overly simplistic temporal order to connect events. This is especially apparent from the instance graph perplexity in Table 4.
On the extrinsic task of schema-guided event prediction, our graph model obtains a significant improvement (see Table 5). The low performance of human schemas demonstrates the importance of probabilistic schema modeling for downstream tasks. Take Figure 3 as an example: the human schema fails to predict Injure because it relies on exact matching of event sequences and cannot handle sequence variants. Our probabilistic schema solves this problem by modeling the event prediction probability given the existing graph.
The definition of a complex event schema separates us from related lines of work, namely schema induction and script learning. Previous work on schema induction aims to characterize event triggers and participants of individual atomic events Chambers (2013); Cheung et al. (2013); Nguyen et al. (2015); Sha et al. (2016); Yuan et al. (2018), ignoring inter-event relations. Work on script learning, on the other hand, originally limited its attention to event chains with a single protagonist Chambers and Jurafsky (2008, 2009); Rudinger et al. (2015); Jans et al. (2012); Granroth-Wilding and Clark (2016) and was later extended to multiple participants Pichotta and Mooney (2014, 2016); Weber et al. (2018). Recent efforts rely on distributed representations encoding the compositional nature of events Modi (2016); Granroth-Wilding and Clark (2016); Weber et al. (2018, 2020); Lyu et al. (2020), and on language modeling Rudinger et al. (2015); Pichotta and Mooney (2016); Peng and Roth (2016). All of these methods still assume that events follow a linear order in a single chain, and they overlook the relations between participants, which are critical for understanding the complex event. Recent work on event graph schema induction Li et al. (2020) only considers the connections between pairs of events. Similarly, the event prediction task is designed to automatically generate a missing event (e.g., a word sequence) given a single prerequisite event or a sequence of them Nguyen et al. (2017); Hu et al. (2017); Li et al. (2018b); Kiyomaru et al. (2019); Lv et al. (2019), or to predict a pre-condition event given the current events Kwon et al. (2020). In contrast, we leverage the automatically discovered temporal event schema as guidance to forecast future events.
Our work is also related to recent advances in the modeling and generation of graphs Li et al. (2018a); Jin et al. (2018); Grover et al. (2019); Simonovsky and Komodakis (2018); Liu et al. (2019); Fu et al. (2020); Dai et al. (2020). Autoregressive methods, such as GraphRNN You et al. (2018), GRAN Liao et al. (2019), GRAT Yoo et al. (2020) and GraphAF Shi et al. (2019), model graph generation as a sequence of additions of new nodes and edges conditioned on the graph substructure generated so far. To induce event schemas, we construct instance graphs and learn a graph generation model in a similar manner. Our approach is designed for complex event graphs and encodes both temporal and argument semantics: it sequentially generates a new event node by predicting the most likely type of the subsequent event over a target event ontology, and then adds coreferential argument edges with a copy mechanism and event temporal orders by exploiting the dependencies between the new event and all existing events.
We propose a new task to induce temporal complex event schemas, which are capable of representing multiple temporal dependencies between events and their connected arguments. We induce such schemas by learning an event graph model, a deep auto-regressive model, from automatically extracted instance graphs. Experiments demonstrate the model’s effectiveness on both intrinsic evaluation and the downstream task of schema-guided event prediction. These schemas can guide our understanding and ability to predict what might happen next, along with background knowledge including location- and participant-specific, temporally ordered event information. In the future, we plan to extend our framework to hierarchical event schema induction, as well as event and argument instance prediction.
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), volume 13, pages 1797–1807.
Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119.
Thirtieth AAAI Conference on Artificial Intelligence.
Skip n-grams and ranking functions for predicting script events. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 336–344.
Junction tree variational autoencoder for molecular graph generation. In International Conference on Machine Learning, pages 2323–2332.
Two discourse driven language models for semantics. In 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), pages 290–300.
Learning statistical scripts with LSTM recurrent neural networks. In Thirtieth AAAI Conference on Artificial Intelligence.
GraphAF: a flow-based autoregressive model for molecular graph generation. In International Conference on Learning Representations.
Event representations with tensor-based compositions. In Proceedings of the AAAI Conference on Artificial Intelligence.