A Heterogeneous Graph with Factual, Temporal and Logical Knowledge for Question Answering Over Dynamic Contexts

04/25/2020
by   Wanjun Zhong, et al.
Microsoft

We study question answering over a dynamic textual environment. Although neural network models achieve impressive accuracy by learning from input-output examples, they rarely leverage various types of knowledge and are generally not interpretable. In this work, we propose a graph-based approach, where a heterogeneous graph is automatically built with factual knowledge of the context, temporal knowledge of past states, and logical knowledge that combines human-curated knowledge bases and rule bases. We develop a graph neural network over the constructed graph and train the model in an end-to-end manner. Experimental results on a benchmark dataset show that the injection of various types of knowledge improves a strong neural network baseline. An additional benefit of our approach is that the graph itself naturally serves as a rationale behind the decision making.



1 Introduction

In this work, we study the problem of question answering over a dynamic textual environment, where the states of participants and their relationships in the environment evolve through time. The problem is a good testbed for measuring the ability of natural language understanding systems to reason about causal effects implicitly expressed in text, and is also important for tasks like effect prediction and procedure execution and evaluation. Reasoning over a dynamic world is challenging because it requires a model not only to understand the past states of participants in the world and the effect of new evidence, but also to avoid state changes that conflict with the universal laws of the world (for example, an entity cannot be located somewhere before it is created).

Existing approaches are dominated by neural network based methods [9, 17, 2, 4, 21, 5], where sequential and attentional architectures are used to model the interaction between the question and the context. Neural models can be conveniently trained in an end-to-end manner and generalize well empirically, achieving remarkably high accuracy after being trained on input-output examples. However, they are commonly criticized for their inability to leverage human knowledge and for their lack of interpretability. On the other hand, expert systems with knowledge bases and logical rules are interpretable and do not rely on training data [14]; however, they cannot generalize beyond what is manually defined in their rules. This motivates us to take the best of both worlds.

In this paper, we present an approach that injects knowledge and rules into a neural model for question answering. The key idea is to use a graph neural network as the pivot: the graph is automatically constructed using various types of knowledge, and representation learning over the graph is implemented with a neural network. Specifically, the graph conveys three kinds of knowledge: factual knowledge about the context, obtained by information extraction; temporal knowledge about the past states of participants, obtained through transformation in semantic vector space; and logical knowledge, obtained as the inference outcomes of a generic rule base grounded on an external knowledge base such as VerbNet [16]. Representations of nodes on the graph are initialized with BERT [7], and message passing and aggregation are implemented with a graph neural network [15].

We conduct experiments on PROPARA [4], a benchmark dataset for reasoning about the states of entities in procedural text. We develop our system based on ProGlobal [4], a strong, open-sourced neural network method. We show that our approach, which leverages graph networks and various types of knowledge, achieves a 61.0% F1 score, an 11.6% absolute improvement over ProGlobal. We also conduct an ablation study and observe that the integration of the various types of knowledge brings significant improvements.

2 Task Formulation & Dataset

We study the PROPARA [4] dataset. The task is to infer the states of a specific participant given procedural text. An example is given in Figure 1. Each cell in a row describes the location and existence of a given participant before and after each time step (“-” means “does not exist” and “?” means “unknown location”). For instance, sunlight is located at the greenhouse after time step 1 (the first sentence).

Figure 1: An example from PROPARA dataset.
Total annotations: 81,345
Total paragraphs: 488
Paragraphs (Train/Dev/Test): 391/43/54
Domains: 183
Total sentences: 3,300
Avg. sentences per paragraph: 6.7
Avg. entities per paragraph: 4.17
Table 1: Statistics of PROPARA dataset.

PROPARA [4] is built from natural procedural text. Crowd workers annotated the location of given participants at each time step (sentence) in the paragraph. Table 1 summarizes the statistics of the PROPARA dataset. The evaluation metrics proposed by Mishra et al. [12] and Tandon et al. [21] are equivalent to predicting the state and location of participants by answering the following seven questions.

Q1: Which participants are the inputs of the overall process?

Q2: Which participants are the outputs of the overall process?

Q3: When and where does the conversion occur?

Q4: When and where does the movement occur?

Q5: Is the participant ever created (destroyed, moved) in the overall process?

Q6: In which time step is the participant created (moved, destroyed)?

Q7: Where is the participant created (destroyed at, moved from/to)?
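To make the annotation format concrete, the per-step location grid can be post-processed into answers for questions like Q5 and Q6. The following minimal Python sketch (our illustration, not the official PROPARA evaluation script) derives CREATE/DESTROY/MOVE/NONE labels from a participant's location sequence, using the dataset's “-” convention for non-existence.

```python
# Hypothetical sketch: deriving state changes from a participant's
# per-step location grid, using PROPARA's conventions
# ("-" = does not exist, "?" = unknown location).

def state_changes(locations):
    """Infer per-step state changes from before/after locations.

    `locations` has length T+1: the state before step 1, then the
    state after each of the T steps.
    """
    changes = []
    for before, after in zip(locations, locations[1:]):
        if before == "-" and after != "-":
            changes.append("CREATE")
        elif before != "-" and after == "-":
            changes.append("DESTROY")
        elif before != after:
            changes.append("MOVE")
        else:
            changes.append("NONE")
    return changes

# Example: sunlight starts at "sun", reaches "greenhouse" after
# step 1, and is consumed (destroyed) at step 2.
grid = ["sun", "greenhouse", "-"]
print(state_changes(grid))                       # ['MOVE', 'DESTROY']
# Q6-style answer: first step at which the participant is destroyed.
print(state_changes(grid).index("DESTROY") + 1)  # 2
```

A Q5-style answer ("is the participant ever created?") is then just a membership test on the resulting change sequence.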

3 Related Work

Approaches in the literature are dominated by neural models. Before PROPARA was constructed, representative neural models updated the representation of a participant's state with the action, and also modeled historical information of entities with recurrent attention. We describe the following representative works. EntNet [9] maintains and updates a representation of the state of the entity with a gating mechanism as the model reads the text. QRN [17] updates the representation of the question as the model reads a passage. NPN [2] learns explicit action representations as functional operators.

Along with releasing PROPARA, Dalvi et al. [4] introduce two neural models, ProLocal and ProGlobal. ProLocal takes a sentence and a participant as input, and predicts the state change type and location spans; its outputs are post-processed with persistence/inertia rules. Compared to ProLocal, ProGlobal further encodes the entire paragraph and the position of each word, and predicts the state (“not exist”, “unknown location”, “known location”). Tandon et al. [21] develop ProStruct, which predicts the state change (“MOVE”, “CREATE”, “DESTROY”, or “NONE”) and detects the locations of beginning and ending words with commonsense-based constraints. Gupta and Durrett track the state of each participant and use a neural CRF (Conditional Random Field) to explicitly model the global information of participant changes. KG-MRC [5] learns the state of each participant by querying a machine reading comprehension model and keeping track of entities over a knowledge graph. Our work differs from KG-MRC in that our graph has richer types of knowledge, including explicit factual edges, temporal edges, and logical edges.

A representative logic-based model on PROPARA is ProComp [3], which is largely neglected in the literature. ProComp processes the paragraph with OpenIE and SRL, and derives rule bases from VerbNet with manual checking and correction. It performs reasoning with four commonsense laws and three pragmatics of discourse. Our graph construction process is largely inspired by their work.

4 Approach

Following the dominant workflow in the literature, we solve the task by predicting the state change and location span of a participant. The former consists of four types of state change (CREATE, DESTROY, MOVE, NONE), which indicate how the states of participants are influenced by the events that happen at each time step (sentence). The latter indicates the locations of participants before and after each time step.

At a high level, our approach contains three main components: graph construction, representation learning over the graph, and prediction, as shown in Figure 2. Given a procedural paragraph and a participant as input, the graph construction component builds a participant-specific graph. After constructing the graph, we represent each node with contextual word representations and calculate the representations of participants with a graph neural network. With the learned representations of participants, two prediction models predict their state and location at each time step.

Figure 2: An overview of the pipeline of our approach, with an example graph constructed from the given paragraph. The orange node indicates the specific participant. Orange lines indicate temporal edges, blue lines indicate verb edges, and grey lines indicate attribute edges. Each participant is given as a prior. Participant nodes are related to several entity nodes extracted from the procedural text at each time step, and to attribute nodes deduced from the symbolic system.

4.1 Graph Construction

The upper right part of Figure 2 shows an example of a constructed graph for the participant “sunlight”, given the procedural text on the left as context. The constructed graph not only relates the participant to other entities involved in events at each time step, but also preserves temporal consistency between participant nodes at different time steps. The graph is constructed in a participant-oriented manner with three types of edges, namely factual edges, temporal edges, and logical edges, which we detail below.

Notation

A graph G contains six main components G = (P, t, E, Ef, A, El), where P denotes the participant nodes at all time steps, t denotes the current time step (sentence), E denotes a set of entity nodes (nouns or phrases) that are related to participants and connected by factual edges Ef, A denotes a set of attribute nodes, which relate to participants through logical edges El, and Et denotes the subset of E that contains all entities related to the participant node pt. We further define temporal edges Etmp as a set of edges that preserve temporal continuity between pt and pt+1.

Factual Edges

For each sentence that mentions the target participant, entity nodes and the factual edges connecting them to participants are automatically extracted via OpenIE [20] or SRL (Semantic Role Labeling) [18] toolkits. To increase the coverage of the graph, we additionally use a POS tagger and dependency relations [11] to construct related tuples when they cannot be found by OpenIE or SRL. To filter tuples related to the current participant, we apply a soft-match mechanism to align the participant with the extracted entities. In this way, we obtain a set of entity nodes (nouns or phrases) related to the current participant through factual edges.
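The filtering step above can be sketched as follows. The token-overlap criterion used here for soft matching is our assumption for illustration; the paper does not specify the exact matcher.

```python
# Illustrative sketch of the soft-match step that keeps only tuples
# about the target participant. The token-overlap threshold is an
# assumption, not the paper's exact matching rule.

def soft_match(participant, mention, threshold=0.5):
    p = set(participant.lower().split())
    m = set(mention.lower().split())
    overlap = len(p & m) / max(len(p), 1)
    return overlap >= threshold

def filter_tuples(participant, tuples):
    """Keep (subject, verb, object) tuples whose subject or object
    soft-matches the participant."""
    return [t for t in tuples
            if soft_match(participant, t[0]) or soft_match(participant, t[2])]

tuples = [("sunlight", "enters", "greenhouse"),
          ("water", "evaporates", "air")]
print(filter_tuples("the sunlight", tuples))
# [('sunlight', 'enters', 'greenhouse')]
```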

Temporal Edges

To model the temporal relationship between the same participant at different time steps, we define temporal edges as a set of edges that connect the participant node at time step t to the participant node at time step t+1.

Logical Edges

We largely follow Clark et al. [3] and use the Semantic Lexicon to infer attribute nodes, which encode the states of the participants at different time steps. The Semantic Lexicon is an expert-curated rule base built on VerbNet, describing how the states of participants are changed by different actions. In the Semantic Lexicon, each query consists of a verb and a syntax pattern, describing the core action and the syntactic elements in a sentence. For a matched query, the Semantic Lexicon states the before and after states of an entity. Figure 3 shows an example of a query and statement from the Semantic Lexicon. With the verb and pattern described by each tuple extracted via SRL or OpenIE, our system queries the Semantic Lexicon for the state and state change of the participant. In this way, we construct a set of attribute nodes with the inferred state changes.

Figure 3: Example of the Semantic Lexicon. The example describes an event that happens to the participant “sunlight”. The syntax of this sentence is extracted via SRL toolkits, and the semantics of this syntax is defined in the Semantic Lexicon. With the extracted syntax and corresponding semantics, we can infer that the state change of the participant “sunlight” is “MOVE”.
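The lexicon lookup can be pictured with a small mock. The entries and pattern strings below are illustrative stand-ins, not VerbNet's actual schema.

```python
# Minimal mock of querying a Semantic Lexicon: each entry keys a verb
# plus a syntax pattern to a before/after state statement. Entries and
# pattern format are illustrative assumptions, not VerbNet's schema.

LEXICON = {
    ("enter", "AGENT V DESTINATION"): {
        "before": "not_at(destination)",
        "after": "at(destination)",
        "change": "MOVE"},
    ("form", "PATIENT V"): {
        "before": "not_exists",
        "after": "exists",
        "change": "CREATE"},
}

def query(verb, pattern):
    """Return the lexicon statement for a (verb, pattern) key, or None."""
    return LEXICON.get((verb, pattern))

# SRL on "Sunlight enters the greenhouse" yields the verb "enter" with
# an AGENT and a DESTINATION, which matches the first entry.
result = query("enter", "AGENT V DESTINATION")
print(result["change"])   # MOVE
```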

Graph Complement

Sometimes the state change is not explicitly mentioned, but the location can be inferred from the text. In order to complete the graph, we define several commonsense rules that allow the system to deduce states from locations in reverse. The commonsense rules are as follows:

R1: The inferred state change is NONE if the before and after locations are the same.

R2: The inferred state change is MOVE if the before and after locations exist and are different.

R3: The inferred state change is CREATE if the participant does not exist before but has an after-location inferred from the current step.
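The three rules above can be sketched directly. Using `None` to mark a location the text does not provide is our simplification; it conflates "does not exist" with "not stated", which the hedge in the comment notes.

```python
# Sketch of the graph-complement rules R1-R3: infer a state change from
# before/after locations when the lexicon gives no explicit change.
# `None` here marks a location the text does not provide; in the real
# system "does not exist" and "unknown" are distinct states.

def infer_change(before_loc, after_loc):
    if before_loc is None and after_loc is not None:
        return "CREATE"                                        # R3
    if before_loc is not None and after_loc is not None:
        return "NONE" if before_loc == after_loc else "MOVE"   # R1 / R2
    return None   # not enough information to apply any rule

print(infer_change("soil", "soil"))    # NONE  (R1)
print(infer_change("soil", "plant"))   # MOVE  (R2)
print(infer_change(None, "leaf"))      # CREATE (R3)
```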

4.2 Graph-Based Representation Learning

We describe how we learn the representation of participants based on the constructed graph.

Background: Graph Network

We first introduce the common notations in graph network [8, 15, 10, 1, 19, 6], which will be applied in the following components.

The graph network framework performs reasoning over a relational graph structure. We partly follow the notation of Battaglia et al. [1] and take the “graph” to be a directed and attributed multi-graph. The graph is denoted as G = (V, E), where V represents a set of nodes and E denotes a set of edges, and each node or edge may have multi-dimensional attributes. N(v) denotes the set of neighbors that have incoming or outgoing edges to node v, and E(v) denotes all the incoming and outgoing edges of v. As described in Battaglia et al. [1], the node representation is learned following the recurrence equation below for time step t, where lv and lN(v) denote the labels for node v and its neighbors N(v), respectively.

hv(t) = f( lv, lN(v), hN(v)(t−1) )     (1)

where f denotes the update function that aggregates information.

Node Representations

There are three types of nodes in our graph, namely “participant nodes”, “entity nodes”, and “attribute nodes”. The first two types are text spans occurring in the paragraph, where the former is given as a prior and the latter is extracted by OpenIE or SRL toolkits. We leverage contextual word representations to initialize their representations. Since attributes come from a fixed vocabulary, we adopt an additional word embedding matrix to encode them.

Specifically, in order to obtain the representation of a word span, we first derive the contextual representation of the paragraph. We represent each word wi as a word embedding ei, the concatenation of its GloVe embedding and its contextual word embedding from BERT. Similar to Mishra et al. [12], we further concatenate this representation with a position embedding di and a sentence indicator embedding si, where di describes the relative distance of each word to the participant and si indicates the position of the current sentence in the paragraph. Afterwards, at time step t, the contextual word embeddings of all words in the procedural text are obtained via a BiLSTM:

[h1, ..., hn] = BiLSTM([x1, ..., xn])     (2)

where n denotes the number of words in the document and xi is the concatenated input embedding of word wi.
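The input-feature construction described above can be illustrated as follows. All dimensions and the random embedding tables are made-up placeholders for the sketch; the BiLSTM that consumes these features is omitted.

```python
# Toy illustration of the input features: each word vector concatenates
# a GloVe embedding, a BERT contextual embedding, a position embedding
# (relative distance to the participant), and a sentence-indicator
# embedding. All dimensions here are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
glove = rng.normal(size=(5, 50))     # 5 words, 50-d GloVe vectors
bert = rng.normal(size=(5, 768))     # 5 words, 768-d BERT layer
pos_emb = rng.normal(size=(21, 8))   # distances -10..10 -> 8-d
sent_emb = rng.normal(size=(4, 8))   # sentence index 0..3 -> 8-d

def word_features(i, participant_idx, sent_idx):
    # Clip the relative distance and shift it into the table range 0..20.
    d = int(np.clip(i - participant_idx, -10, 10)) + 10
    return np.concatenate(
        [glove[i], bert[i], pos_emb[d], sent_emb[sent_idx]])

x = word_features(3, participant_idx=0, sent_idx=1)
print(x.shape)   # (834,) = 50 + 768 + 8 + 8
```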

Representation Learning over Graph

In this part, we introduce how the representation of a given participant is learned via graph network.

We denote pt as the participant at the t-th time step. The neighbors of participant pt are defined below, following the definitions in the previous subsection.

N(pt) = { pt−1 } ∪ Et ∪ At     (3)

where pt−1 denotes the participant node at time step t−1, and Et and At denote the neighboring entity nodes and attribute nodes of pt, respectively.

E(pt) = Ev ∪ Etmp ∪ Ea     (4)

where Ev, Etmp and Ea denote the verb edges, the temporal edges and the attribute edges, respectively.

The representations of participant nodes are learned by aggregating information from neighbors at each time step. We develop two mechanisms for this representation learning: (1) a recurrent unit that models the information aggregated through temporal edges and verb edges; (2) a relational attention mechanism that acts as a fusion block to integrate attribute information and the participant information at different time steps.

Firstly, the participant representation at time step t is calculated by accumulating the information from the previous time step (i.e., the representation of pt−1) and the information propagated through the verb edge at the current time step. The latter is the concatenation of three components: the contextual representations of the participant, the entities, and the verb. Each of these is calculated by summing the contextual representations at their corresponding positions in the text. Afterwards, the calculation of the recurrent unit is given below, where the W, U and b terms are model parameters.

(5)
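The exact form of the recurrent unit in equation (5) is not recoverable from this copy, so the following sketches a generic gated update as a stand-in: the participant state at step t mixes the previous state and the verb-edge message through a learned gate. All parameter shapes are our assumptions.

```python
# Generic gated update as a stand-in for the paper's recurrent unit:
# h_t mixes the previous participant state h_prev with the verb-edge
# message v_t via a learned update gate. Shapes are illustrative.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_update(h_prev, v_t, W_z, U_z, W_h, b_z, b_h):
    z = sigmoid(W_z @ h_prev + U_z @ v_t + b_z)              # update gate
    h_cand = np.tanh(W_h @ np.concatenate([h_prev, v_t]) + b_h)
    return (1 - z) * h_prev + z * h_cand                     # gated mix

d = 4
rng = np.random.default_rng(1)
h = gated_update(rng.normal(size=d), rng.normal(size=d),
                 rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                 rng.normal(size=(d, 2 * d)),
                 np.zeros(d), np.zeros(d))
print(h.shape)   # (4,)
```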

Secondly, to integrate the attributes inferred during graph construction into the representation of participants, we apply a graph relational attention mechanism [22]. In our graph, an attribute node represents the state change of a participant, so we regard an attribute as an operation for modifying the meaning of the participant at previous time steps [13]. Specifically, we take the attribute embedding as the query, and the representations of the participant at previous time steps as the memory. We calculate the attended representation as a weighted sum over the memory, where the weight of the i-th memory cell is calculated by a dot-product function with the key, followed by a linear layer and an activation function. The final representation of the participant at time step t is the concatenation of its recurrent-unit representation and this attended representation.
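The attention step can be sketched as below. The scaled dot-product scoring is our assumption for the unspecified scoring function, and the trailing linear layer and activation are omitted for brevity.

```python
# Sketch of the relational attention: the attribute embedding acts as
# the query over a memory of the participant's past representations.
# Scaled dot-product scoring is an assumption; the paper's linear
# layer and activation after the weighted sum are omitted here.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, memory):
    """query: (d,) attribute embedding; memory: (T, d) past states."""
    scores = memory @ query / np.sqrt(query.size)
    weights = softmax(scores)
    return weights @ memory        # weighted sum over the memory

rng = np.random.default_rng(2)
q = rng.normal(size=8)             # attribute embedding
M = rng.normal(size=(3, 8))        # three past time steps
out = attend(q, M)
print(out.shape)   # (8,)
```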

4.3 Prediction Model

We have two prediction models, designed for the prediction of state change and state (location and existence), respectively. The prediction models take the contextual representation and the representation of the participant learned from the graph network as input, and output the state change and location span of the participant at the current time step.

State Change Predictor

We take advantage of ProGlobal [12] and ProStruct [21]. We adopt a multi-task learning objective consisting of two main predictors: (1) the state change predictor classifies the state change into one of four classes: MOVE, CREATE, DESTROY, NONE; (2) the location state predictor predicts the state of the location as one of three classes: not exist, location unknown, location known.

These two predictors take the participant representation and the predicted category probability vector (for state change or location state) from the last time step as inputs. We apply node-wise classification over the inputs, producing the category probability vectors for the state change and the state of the location separately. These probabilities are calculated as follows, where the W terms are model parameters and the b terms are bias vectors.

(6)
(7)
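The two heads can be sketched as linear-plus-softmax classifiers over the participant representation concatenated with the previous step's predicted probabilities. All shapes and parameters below are illustrative assumptions.

```python
# Sketch of a multi-task prediction head: a linear layer plus softmax
# over the participant representation concatenated with the previous
# step's predicted probability vector. Shapes are illustrative.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict(h_t, prev_probs, W, b):
    x = np.concatenate([h_t, prev_probs])
    return softmax(W @ x + b)

d, n_change = 8, 4                       # 4 classes: MOVE/CREATE/DESTROY/NONE
rng = np.random.default_rng(3)
h_t = rng.normal(size=d)
c_prev = np.full(n_change, 1 / n_change) # uniform prior at the first step
W_c = rng.normal(size=(n_change, d + n_change))
c_t = predict(h_t, c_prev, W_c, np.zeros(n_change))
print(c_t.shape, round(float(c_t.sum()), 6))   # (4,) 1.0
```

The location state head works the same way with three classes instead of four.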

Location Span Predictor

To predict the location span of a given participant, the model calculates the probability distributions of the start and end words of the location span.

The span predictor takes the participant representation, the contextual representation of each candidate word in the paragraph, and the probability distribution of the start word predicted from the last time step as input. To better utilize the symbolic model for location prediction, we use a location mask to filter potential location spans via two predefined rules: (1) the location span is most likely relevant to an entity extracted in the graph; (2) most location spans contain only nouns and adjectives. We observe that the large majority of instances accord with these rules.

We reuse some of the operations from ProGlobal, and calculate the probability distribution at the t-th time step (i.e., st) by the following formulas, where st−1 represents the start probability distribution of the location at time step t−1.

(8)
(9)
(10)

We use a similar method to predict the probability of the end word of the location span.
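The effect of the location mask can be illustrated as follows: candidate start-word probabilities are renormalized over the positions the mask allows, so disallowed words receive probability zero. The raw scores here are made up for the sketch.

```python
# Sketch of the location mask: start-word probabilities are computed
# only over positions the mask allows (graph entities, nouns and
# adjectives); disallowed positions get probability exactly zero.

import numpy as np

def masked_start_probs(scores, mask):
    """scores: (n,) raw word scores; mask: (n,) 1 = allowed, 0 = not."""
    masked = np.where(mask == 1, scores, -np.inf)
    e = np.exp(masked - masked[mask == 1].max())
    return e / e.sum()

scores = np.array([2.0, 0.5, 1.0, 3.0])
mask = np.array([1, 0, 1, 0])      # only words 0 and 2 are candidates
p = masked_start_probs(scores, mask)
print(p[1], p[3])                  # 0.0 0.0
print(round(float(p.sum()), 6))    # 1.0
```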

4.4 Training and Inference

The contextual representation model, graph network and prediction model are trained in an end-to-end manner. Model parameters are trained by minimizing the sum of the negative log likelihood calculated for the state change classification, location state classification and location span prediction.

To better model the consistency between the state predictor and the location predictor, the model only infers the location span when the location state is classified as “known location”. Otherwise, if the location state is “does not exist” or “location unknown”, the location is assigned “null” or “unk”, respectively.
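The inference-time consistency rule above amounts to a simple gate, sketched here; the class and output strings follow the paper's conventions, while the callable span predictor is a placeholder.

```python
# Sketch of the inference-time consistency rule: the span predictor is
# consulted only when the location-state head says "known location";
# otherwise the location is fixed to "null" or "unk".

def final_location(location_state, span_predictor):
    if location_state == "known location":
        return span_predictor()       # run the span model only here
    if location_state == "does not exist":
        return "null"
    return "unk"                      # "location unknown"

print(final_location("does not exist", lambda: "soil"))     # null
print(final_location("known location", lambda: "soil"))     # soil
print(final_location("location unknown", lambda: "soil"))   # unk
```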

5 Experiment

In this section, we describe experimental settings, model comparison, ablation study and quantitative analysis. We abbreviate our approach as ProGraph.

5.1 Task 1: Document Level Evaluation

We first evaluate the performance of our model on the document-level task, which is to answer the first four questions as mentioned in Section 2.

Model       Precision  Recall  F1
QRN         55.50      31.30   40.00
EntNet      50.20      33.50   40.20
Pro-Global  46.70      52.40   49.40
Pro-Struct  74.20      42.10   53.75
KG-MRC      64.52      50.68   56.77
ProComp     64.80      38.10   48.00
ProGraph    67.30      55.80   61.00
Table 2: Results on document-level task. Our approach is abbreviated as ProGraph. ProComp is the symbolic-based baseline and other systems are strong neural baselines.
Model                  P      R      F1
ProGraph               67.30  55.80  61.00
  - w/o location mask  66.50  53.80  59.50
  - w/o attribute      63.10  55.90  59.30
  - w/o entire graph   62.10  46.90  53.40
Table 3: Ablation experiments on PROPARA. We eliminate the components of graph-based reasoning model and then we eliminate the whole graph-based learning model (entire graph).

Table 2 reports the results on the document-level task. We also report the performance of our re-implemented ProComp system, with some modifications as described previously, and compare ProGraph against this purely symbolic system. As shown in the table, our system achieves a 17.70% absolute improvement in recall and a 13.00% absolute improvement in F1 compared with ProComp. Moreover, our system also outperforms previous strong neural baselines, with a 61.00% F1 score. In particular, it outperforms KG-MRC, the work most related to ProGraph, by 4.23% absolute F1. These observations indicate that integrating a graph neural network with the symbolic system not only preserves the in-domain reasoning ability of the symbolic system but also alleviates its shortcomings in learning the fluidity of concepts.

To further analyze the effect of different components (i.e., variants of attributes, nodes or relations), we conduct ablation experiments. Table 3 reports the results after eliminating different components of our model. First, we eliminate the graph-based representation learning model: instead of predicting the state of the participant from its representation learned by the graph network, we use a paragraph-based representation calculated by max-pooling over the contextual representations. As shown in the table, eliminating the graph-based representation learning component causes substantial performance drops (61.0% to 53.4% F1). This result verifies that incorporating the graph network enhances the system’s ability to perform entity state reasoning.

Moreover, we eliminate the location mask, which is designed to shrink the number of candidate location words by using the extracted graph. This operation leads to 0.8%, 2.0% and 1.5% performance drops on precision, recall and F1, respectively. This result indicates that the symbolic model can provide a strong prior knowledge of the important information to the learning of the neural model.

We further remove the attribute node from the graph, and find that this operation causes 4.2% and 1.7% performance drops on precision and F1 respectively. This observation confirms that attributes inferred by the symbolic system are meaningful in guiding the learning of graph network.

Case Study

We provide a case study for qualitative analysis of our model. As shown in Figure 4, given the two sentences “Water covers streets. Water goes into houses.”, our system builds a graph that captures the critical information from the sentences. Afterwards, ProGraph learns representations over the constructed graph and makes a correct prediction with the prediction model.

Figure 4: An example of model output. Given the procedural text, the model first constructs a graph for the participant “water”. ProGraph then performs reasoning over the graph and outputs the state change and the position of “water”.

5.2 Task 2: Sentence Level Evaluation

The fine-grained sentence-level task was proposed by Mishra et al. [12], and requires the system to answer the last three questions mentioned in Section 2.

As shown in Table 4, our model outperforms previous strong baselines on all types of questions (4.9%, 4.62% and 3.55% gains in Cat-1, Cat-2 and Cat-3, respectively). This indicates that ProGraph is better at both location tracking and state change prediction because it better models the correlation between participants and other entities. ProGraph also significantly outperforms ProComp on all types of questions, which shows that integrating symbolic and neural models enhances the reasoning ability.

Model              Cat-1  Cat-2  Cat-3  Macro-Avg  Micro-Avg
Human upper bound  91.67  87.66  62.96  80.76      79.69
EntNet [9]         51.62  18.83  7.77   26.07      25.96
QRN [17]           52.37  15.51  10.92  26.26      26.49
Pro-Local [12]     62.65  30.50  10.35  34.50      33.96
Pro-Global [12]    62.95  36.39  35.90  45.08      45.37
KG-MRC [5]         62.86  40.00  38.23  47.03      46.62
ProComp            55.93  26.59  11.08  31.20      30.84
ProGraph
Table 4: Results on sentence-level task. Our approach is abbreviated as ProGraph. ProComp is the symbolic-based baseline and other systems are strong neural baselines.

5.3 Error Analysis and Discussion

We analyze 500 randomly selected incorrectly predicted instances and summarize the major types of errors.

The dominant error type occurs when the model fails to distinguish whether the initial state of a participant is “does not exist” or “location unknown” when the participant is not explicitly mentioned at the beginning of the process.

The second type of error is caused by a failure to model the consistency between a participant and its carrier. The state of the participant can be affected by the state of its carrier, but it is hard for the model to learn this consistency. For example, one procedural text mentions “Soft tissues quickly decompose leaving behind hard bones or shells. Over time sediment builds over the top and hardens into rock.”. After this process, the bones should be located in rock because the sediment is the carrier of the bones. However, the model fails to locate the bones because they are not mentioned in the second sentence.

The third type of error is caused by the fact that the external knowledge resource (i.e., the Semantic Lexicon) lacks fine-grained knowledge about the meaning of verbs when associated with different entities. For instance, when the input is “Rain clouds are stopped or slowed by mountains or wind.”, the model states that the clouds move to the mountains, while the answer is that they stay in the sky. The rule “subject (stopped by) object” infers the state change “subject (MOVE) to object” without considering different types of subjects. Therefore, the state of “clouds” is misclassified as “MOVE” because that corresponds to the most common situation in which the action “stopped by” happens.

Moreover, the model may fail to identify the real location through a pronoun. For example, when the paragraph mentions “The dead plants sink to the bottom of the swamps. Many more dead plants sink in the same area.”, the model outputs that the location of the plants is “same area”, while the gold answer is “bottom of the swamps”. These two locations refer to the same place, but the model fails to identify this. One way to address this type of error is to employ co-reference resolution toolkits.

6 Conclusion

We present an approach, named ProGraph, to improve entity state reasoning by integrating various types of knowledge into a neural network. We contribute a graph-based reasoning framework in which the graph construction process leverages factual, temporal and logical knowledge, and the representations of nodes and the compositionality over the graph are modeled with neural models. Results show that integrating the neural model and the symbolic model through the graph network significantly improves performance, and our model outperforms strong baselines on the PROPARA dataset.

We suggest the following directions for further research:

  1. Identifying ways to improve the construction of the participant-specific structure in a dynamic environment.

  2. Developing a better knowledge-enhanced model that automatically retrieves external/world knowledge and integrates it into prediction.

  3. Handling errors at the initial state with prior/commonsense knowledge.

References

  • [1] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. F. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, Ç. Gülçehre, F. Song, A. J. Ballard, J. Gilmer, G. E. Dahl, A. Vaswani, K. Allen, C. Nash, V. Langston, C. Dyer, N. Heess, D. Wierstra, P. Kohli, M. Botvinick, O. Vinyals, Y. Li, and R. Pascanu (2018) Relational inductive biases, deep learning, and graph networks. CoRR abs/1806.01261. External Links: Link, 1806.01261 Cited by: §4.2, §4.2.
  • [2] A. Bosselut, O. Levy, A. Holtzman, C. Ennis, D. Fox, and Y. Choi (2017) Simulating action dynamics with neural process networks. CoRR abs/1711.05313. External Links: Link, 1711.05313 Cited by: §1, §3.
  • [3] P. Clark, B. Dalvi, and N. Tandon (2018) What happened? leveraging verbnet to predict the effects of actions in procedural text. CoRR abs/1804.05435. External Links: Link, 1804.05435 Cited by: §3.
  • [4] B. Dalvi, L. Huang, N. Tandon, W. Yih, and P. Clark (2018) Tracking state changes in procedural text: a challenge dataset and models for process paragraph comprehension. See DBLP:conf/naacl/2018-1, pp. 1595–1604. External Links: Link Cited by: §1, §1, §2, §2.
  • [5] R. Das, T. Munkhdalai, X. Yuan, A. Trischler, and A. McCallum (2019) Building dynamic knowledge graphs from text using machine reading comprehension. ICLR. External Links: Link Cited by: §1, §3, Table 4.
  • [6] N. De Cao, W. Aziz, and I. Titov (2018) Question answering by reasoning across documents with graph convolutional networks. arXiv preprint arXiv:1808.09920. Cited by: §4.2.
  • [7] J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: §1.
  • [8] M. Gori, G. Monfardini, and F. Scarselli (2005) A new model for learning in graph domains. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., Vol. 2, pp. 729–734. Cited by: §4.2.
  • [9] M. Henaff, J. Weston, A. Szlam, A. Bordes, and Y. LeCun (2016) Tracking the world state with recurrent entity networks. CoRR abs/1612.03969. External Links: Link, 1612.03969 Cited by: §1, §3, Table 4.
  • [10] Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel (2015) Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493. Cited by: §4.2.
  • [11] C. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. Bethard, and D. McClosky (2014) The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60. Cited by: §4.1.
  • [12] B. D. Mishra, L. Huang, N. Tandon, W. Yih, and P. Clark (2018) Tracking state changes in procedural text: a challenge dataset and models for process paragraph comprehension. arXiv preprint arXiv:1805.06975. Cited by: §4.2, §4.3, §5.2, Table 4.
  • [13] J. Mitchell and M. Lapata (2010) Composition in distributional models of semantics. Cognitive science 34 (8), pp. 1388–1429. Cited by: §4.2.
  • [14] S. Russell and P. Norvig (2002) Artificial intelligence: a modern approach. Cited by: §1.
  • [15] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini (2009) The graph neural network model. IEEE Trans. Neural Networks 20 (1), pp. 61–80. External Links: Link, Document Cited by: §1, §4.2.
  • [16] K. K. Schuler (2005) VerbNet: a broad-coverage, comprehensive verb lexicon. Cited by: §1.
  • [17] M. Seo, S. Min, A. Farhadi, and H. Hajishirzi (2016) Query-reduction networks for question answering. arXiv preprint arXiv:1606.04582. Cited by: §1, §3, Table 4.
  • [18] P. Shi and J. Lin (2019) Simple bert models for relation extraction and semantic role labeling. arXiv preprint arXiv:1904.05255. Cited by: §4.1.
  • [19] L. Song, Z. Wang, M. Yu, Y. Zhang, R. Florian, and D. Gildea (2018) Exploring graph-structured passage representation for multi-hop reading comprehension with graph neural networks. arXiv preprint arXiv:1809.02040. Cited by: §4.2.
  • [20] G. Stanovsky, J. Michael, L. S. Zettlemoyer, and I. Dagan (2018) Supervised open information extraction. In NAACL-HLT, Cited by: §4.1.
  • [21] N. Tandon, B. Dalvi, J. Grus, W. Yih, A. Bosselut, and P. Clark (2018) Reasoning about actions and state changes by injecting commonsense knowledge. See DBLP:conf/emnlp/2018, pp. 57–66. External Links: Link Cited by: §1, §4.3.
  • [22] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio (2017) Graph attention networks. arXiv preprint arXiv:1710.10903. Cited by: §4.2.