Automatically building knowledge graphs (KGs) from text is a long-standing goal in artificial intelligence research. KGs organize raw information in a structured form, capturing relationships (labeled edges) between entities (nodes). They enable automated reasoning, e.g., the ability to infer unobserved facts from observed evidence and to make logical “hops,” and render data amenable to decades of work in graph analysis.
There exists a profusion of text that describes complex, dynamic worlds in which entities’ relationships evolve through time. This includes news articles, scientific manuals, and procedural text (e.g., recipes, how-to guides, and so on). Building KGs from this data would not only help us to study the changing relations among participant entities, but also to make implicit information more explicit. For example, the graphs at each step in Figure 1 help us to infer that the new entity mixture is created in the leaf, since the previous location of its participant entities (light, CO2, water) was leaf, even though this is never stated in the text.
This paper introduces a neural machine-reading model, Kg-Mrc, that (i) explicitly constructs dynamic knowledge graphs to track state changes in procedural text and (ii) conditions on its own constructed knowledge graphs to improve downstream question answering on the text. Our dynamic graph model is recurrent, that is, the graph at each time step depends on the state of the graph at the previous time step. The constructed graphs are parameterized by real-valued embeddings for each node that change through time.
In text, entities and their states (e.g., their locations) are given by spans of words. Because of the variety of natural language, the same entity/state may be described with several surface forms. To address the challenge of entity/state recognition, our model uses a machine reading comprehension (MRC) mechanism (Seo et al., 2017a; Xiong et al., 2017; Chen et al., 2017; Yu et al., 2018, inter alia), which queries for entities and their states at each time step. We leverage MRC mechanisms because they have proven adept at extracting text spans that answer entity-centric questions (Levy et al., 2017). However, such models are static by design, returning the same answer for the same query and context. Since we expect answers about entity states to change over the course of the text, our model’s MRC component conditions on the evolving graph at the current time step (this graph captures the instantaneous states of entities).
To address the challenge of aliased text mentions, our model performs soft co-reference as it updates the graph. Instead of adding an alias node, like the leaf or leaves as aliases for leaf, the graph update procedure soft-attends (Bahdanau et al., 2014) over all nodes at the previous time step and performs a gated update (Cho et al., 2014; Chung et al., 2014) of the current embeddings with the previous ones. This ensures that state information is preserved and propagated across time steps. Soft co-reference can also handle the case that entity states do not change across time steps, by applying a near-null update to the existing state node rather than duplicating it.
At each time step, after the graph has been updated with the (possibly) new states of all entities, our model updates each entity representation with information about its state. The updated information about each individual entity is further propagated to all other entities (§ 4.4). This enables the model to recognize, for example, that entities are present in the same location (e.g., light, CO2 and water in Figure 1). Thus, our model can use the information encoded in its internal knowledge graphs for a more comprehensive understanding of the text. We demonstrate this experimentally by tackling comprehension tasks from the recently released ProPara and Recipes datasets.
Our complete machine reading model, which both builds and leverages dynamic knowledge graphs, can be trained end-to-end using only the loss from its MRC component; i.e., the negative log-likelihood that the MRC component assigns to the span that correctly describes each entity’s queried state. We evaluate our model (Kg-Mrc) on the above two ProPara tasks and find that the same model significantly outperforms the previous state of the art. For example, Kg-Mrc obtains a 9.92% relative improvement on the hard task of predicting at which time-step an entity moves. Similarly on the latter task, Kg-Mrc obtains a 5.7% relative improvement over ProStruct and 41% relative improvement over other entity-centric models such as EntNet (Henaff et al., 2017). On the Recipes dataset, the same model obtains competitive performance.
2 Related Work
There are few datasets that address the challenging problem of tracking entity state changes. The bAbI dataset (Weston et al., 2015) includes questions about movement of entities; however, its language is generated synthetically over a small lexicon, and hence models trained on bAbI often do not generalize well when tested on real-world data. For example, state-of-the-art models like EntNet (Henaff et al., 2017) and Query Reduction Networks (Seo et al., 2017b) fail to perform well on ProPara.
ProRead (Berant et al., 2014) introduced the ProcessBank dataset, which contains paragraphs of procedural text as in ProPara. However, this earlier task involves mining arguments and relations from events, not tracking the dynamic state changes of entities. The model that Berant et al. (2014) propose builds small knowledge graphs from the text, but they are not dynamic in nature. The model also relies on densely annotated process structure for training, demanding curation by domain experts. On the other hand, our model, Kg-Mrc, learns to build dynamic KGs just from annotations of text spans, which are much easier to collect.
For the sentence-level ProPara task they propose, Dalvi et al. (2018) introduce two models: ProLocal and ProGlobal. ProLocal makes local predictions about entities by considering just the current sentence. This is followed by some heuristic/rule-based answer propagation. ProGlobal considers a broader context (previous sentences) and also includes the previous state of entities by considering the probability distribution over paragraph tokens in the previous step. Tandon et al. (2018) recently proposed a neural structured-prediction model, ProStruct, in which hard and soft common-sense constraints are injected to steer the model away from globally incoherent predictions. We evaluate Kg-Mrc on the two ProPara tasks proposed by Dalvi et al. (2018) and Tandon et al. (2018), respectively, and find that our single model outperforms each of the above models on their respective tasks of focus.
EntNet (Henaff et al., 2017) and query reduction networks (QRN) (Seo et al., 2017b) are two state-of-the-art entity-centric models for the bAbI dataset. EntNet maintains a dynamic memory of hidden states with a gated update to the memory slots at each step. Memory slots can be tied to specific entities, but unlike our model, EntNet does not maintain separate embeddings of individual states (e.g., current locations); it also does not perform explicit co-reference updates. QRN refines the query vector as it processes each subsequent sentence until the query points to the answer, but does not maintain explicit representations of entity states. Neural Process Networks (NPN) (Bosselut et al., 2018) learn to understand procedural text by explicitly parameterizing actions and composing them with entities. These three models return an answer by predicting a vocabulary item in a multi-class classification setup, while in our work we predict spans of text directly from the paragraph.
MRC models have been used previously for extracting the argument of knowledge base (KB) relations, by associating one or more natural language questions with each relation (querification). These models have been shown to perform well in a zero-shot setting, i.e., for a previously unseen relation type (Levy et al., 2017), and for extracting entities that belong to non-standard types (Roth et al., 2018). These recent positive results motivate our use of an MRC component in Kg-Mrc.
3 Data & Tasks
We evaluate Kg-Mrc on the recently released ProPara dataset (Dalvi et al., 2018), which comprises procedural text about scientific processes. The location states of participant entities at each time step (sentence) in these processes are labeled by human annotators, and the names of participant entities are given. As an example, for a process describing photosynthesis, the participant entities provided are: light, CO2, water, mixture and glucose. Although participant entities are thus known a priori, the location of an entity could be any arbitrary span in the process text. This makes the task of determining and tracking an entity’s changing location quite challenging.
|avg. # entities||4.17|
|avg. # sentences||6.7|
Table 1: Statistics of the ProPara dataset.
It should also be noted that the dataset does not provide information on whether a particular entity is an input to or output of a process. Not all entities exist from the beginning of the process (e.g. glucose) and not all exist at the end (e.g. water). Table 1 shows statistics of ProPara. As can be seen, the training set is small, which makes learning challenging.
Along with the dataset, Dalvi et al. (2018) introduce the task of tracking state changes at a fine-grained sentence level. To solve this task, a model must answer three categories of questions (10 questions in total) about an entity $e$: (1) Is $e$ created (destroyed, moved) in the process? (2) When (step #) is $e$ created (destroyed, moved)? (3) Where is $e$ created (destroyed, moved from/to)? Cat. 1 asks boolean questions about the existence and movement of entities. Cat. 2 and 3 are harder tasks, as the model must correctly predict the step number at which a state changes as well as the correct locations (text spans) of entities at each step.
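To make the three question categories concrete, the following minimal sketch derives the answers for one entity from its sequence of per-step locations. It assumes that a special location `nowhere` marks an entity that does not exist; the function name `state_changes`, the tuple layout, and the example grid are illustrative and are not part of the dataset tooling.

```python
def state_changes(locations):
    """Derive (event, step, from_loc, to_loc) tuples for one entity.

    locations[0] is the state before the first sentence and
    locations[t] is the state after sentence t. The string "nowhere"
    marks a non-existent entity (an assumption of this sketch).
    """
    events = []
    for step in range(1, len(locations)):
        before, after = locations[step - 1], locations[step]
        if before == after:
            continue  # no state change at this step
        if before == "nowhere":
            events.append(("created", step, None, after))
        elif after == "nowhere":
            events.append(("destroyed", step, before, None))
        else:
            events.append(("moved", step, before, after))
    return events


# Hypothetical grid: the entity appears, moves, then disappears.
grid = ["nowhere", "leaf", "leaf", "stem", "nowhere"]
events = state_changes(grid)

# Cat. 1: was it created?  Cat. 2: at which step?  Cat. 3: where?
created = [e for e in events if e[0] == "created"]
print(bool(created), created[0][1], created[0][3])
```

The same event list answers the destroyed and moved variants of each category by filtering on the first tuple field.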
Tandon et al. (2018) introduce a second task on the ProPara dataset that measures state changes at a coarser process level. To solve this task, a model must correctly answer the following four types of questions: (1) What are the inputs to the process? (2) What are the outputs of the process? (3) What conversions occur, when and where? (4) What movements occur, when and where? Inputs to a process are defined as entities that exist at the start of the process but not at the end, and outputs are entities that exist at the end of the process and were created during it. A conversion is when some entities are created and others destroyed, while movements refer to changes in location. Dalvi et al. (2018) and Tandon et al. (2018) propose different models to solve each of these tasks separately, whereas we evaluate the same model, Kg-Mrc, on both tasks.
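The definitions of inputs and outputs can be read off a table of per-step locations. The sketch below applies them literally; it again assumes `nowhere` marks non-existence, and the function name and the simplified photosynthesis grid are hypothetical.

```python
def process_inputs_outputs(grids):
    """Split entities into process inputs and outputs.

    grids maps an entity name to its list of locations, where index 0
    is the state before the first sentence. "nowhere" means the entity
    does not exist (an assumption of this sketch).
    """
    inputs, outputs = [], []
    for entity, locs in grids.items():
        exists_at_start = locs[0] != "nowhere"
        exists_at_end = locs[-1] != "nowhere"
        if exists_at_start and not exists_at_end:
            inputs.append(entity)    # present at the start, gone at the end
        elif exists_at_end and not exists_at_start:
            outputs.append(entity)   # created during the process
    return sorted(inputs), sorted(outputs)


# Hypothetical, simplified grid for illustration only.
grids = {
    "water":   ["soil", "leaf", "nowhere"],
    "light":   ["somewhere", "leaf", "nowhere"],
    "glucose": ["nowhere", "nowhere", "leaf"],
}
inputs, outputs = process_inputs_outputs(grids)
print(inputs, outputs)
```

Entities that exist both at the start and at the end fall into neither list, which matches the definitions given above.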
Bosselut et al. (2018) recently released the Recipes dataset, which has various annotated states (e.g. shape, composition, location, etc.) for ingredients in cooking recipes. We further test Kg-Mrc on the location task to align with our ProPara experiments. This is arguably the dataset’s hardest task, since it requires classification over more than 260 classes while the others have a much smaller label space (maximum of 4). Note that rather than treating this problem as classification over a fixed lexicon as in previous models, our model aims to find the location-describing span of text in the recipe paragraph.
4 Model
Kg-Mrc tracks the temporal state change of entities in procedural text. Naturally, the model is entity-centric (Henaff et al., 2017; Bansal et al., 2017): it associates each participant entity of the procedural text with a unique node and embedding in its internal graph. Kg-Mrc is also equipped with a neural machine reading comprehension model which is queried about the current location of each entity.
At a high level, our model operates as follows. We summarize some important notation in Table 2. Kg-Mrc processes a paragraph $p = \{w_j\}_{j=1}^{P}$, of $P$ words, by incrementally reading prefixes of the paragraph up to and including sentence $s_t$ at each time step $t$. This continues until it has seen all $T$ sentences of the paragraph. At each time step (sentence) $t$, we engage the MRC module to query for the state of each participant entity (participants are known in ProPara a priori). The query process conditions on both the input text and the target entity’s node embedding, $e_{i,t-1}$, where the latter comes from the graph $G_{t-1}$ at the previous time step. The MRC module returns an answer span describing the entity’s current location at $t$; we encode this text span as the vector $\psi_{i,t}$. Conditioning on the span vectors $\psi_{i,t}$, the model constructs the graph $G_t$ by updating $G_{t-1}$ from the previous time step.
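The reading loop described above can be sketched in a few lines. This is only an outline of the control flow under stated assumptions: `answer_fn` is a placeholder for the MRC module, the per-entity state is a plain string rather than a node embedding, and the initial state `somewhere` is an assumed default.

```python
def track_entities(sentences, entities, answer_fn, initial="somewhere"):
    """Query every entity after each sentence, reading growing prefixes.

    answer_fn(prefix, entity, previous) is a placeholder for the MRC
    module: it receives sentences 1..t, the entity name, and the state
    from step t-1, and returns the location at step t.
    """
    state = {entity: initial for entity in entities}
    history = []
    for t in range(1, len(sentences) + 1):
        prefix = sentences[:t]
        # Every query at step t conditions on the state from step t-1.
        state = {e: answer_fn(prefix, e, state[e]) for e in entities}
        history.append(state)
    return history


def toy_answer(prefix, entity, previous):
    """Toy stand-in: last word of the newest sentence if it mentions the entity."""
    last = prefix[-1].lower()
    if entity in last:
        return last.rstrip(".").split()[-1]
    return previous  # state is unchanged when the entity is not mentioned


sentences = ["Blood enters the heart.", "Blood travels to the lungs.", "Oxygen is added."]
history = track_entities(sentences, ["blood"], toy_answer)
print(history)
```

Building a fresh dictionary at each step keeps the previous step's state intact while all entities are queried, mirroring the fact that queries at a given step see the graph from the step before.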
The model’s knowledge graphs are bipartite, having two sets of nodes with implied connections between them: $G_t = \{E_t, \Lambda_t\}$. Each node denotes either an entity ($e_{i,t} \in E_t$) or that entity’s corresponding location ($\lambda_{i,t} \in \Lambda_t$), and is associated with a real-valued vector. We use $e_{i,t}$ and $\lambda_{i,t}$ to denote nodes in the graph and their vector representations interchangeably. The bipartite graphs have only one (implicit) relation type, the current location, though we plan to extend this in future work. To derive $G_t$ from its previous iterate $G_{t-1}$, we combine both hard and soft graph updates. The update to an entity’s node representation with new location information arises from a hard decision made by the MRC model, whereas co-reference between entities across time steps is resolved with soft attention. We now describe all components of the model in detail.
4.1 Entity and Span Representations
In the ProPara dataset, entities appear in the paragraph text (we compute the positions of entity occurrences by simple string matching). Therefore, we derive the initial entity representations from contextualized hidden vectors by encoding the paragraph with a bi-directional LSTM (Hochreiter & Schmidhuber, 1997). This choice has the added advantage that initial entity representations share information through context, unlike in previous models (Henaff et al., 2017; Das et al., 2017; Bansal et al., 2017). Entities in the dataset can be multi-word expressions (e.g., electric oven). To obtain a single representation, we concatenate the contextualized hidden vectors corresponding to the start and end span tokens and take a linear projection; i.e., if the mention of entity $i$ occurs between the $j$-th and $(j+k)$-th position, then the initial entity representation is computed as $e_{i,0} = W_e [c_j ; c_{j+k}]$, where $W_e$ is the projection matrix. We use $i$ to index an entity and its corresponding location, while $c_j$ represents the contextualized hidden vector for token $j$ and $[\cdot\,;\cdot]$ represents the concatenation operation. An entity may occur multiple times within a paragraph. We give equal importance to all occurrences by summing the representations for each.
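The steps in this paragraph (string matching, concatenating the start and end vectors, projecting, and summing over mentions) can be sketched as follows. The toy hidden vectors and the projection matrix `weights` are made-up values for illustration; in the model these would be bi-LSTM outputs and learned parameters.

```python
def find_mentions(tokens, entity_tokens):
    """Return (start, end) token indices of every exact match of the entity."""
    n = len(entity_tokens)
    return [(j, j + n - 1) for j in range(len(tokens) - n + 1)
            if tokens[j:j + n] == entity_tokens]


def project(weights, vector):
    """Linear projection: one output value per row of weights."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def entity_representation(hidden, mentions, weights):
    """Sum, over all mentions, the projection of [start vector ; end vector]."""
    out = [0.0] * len(weights)
    for start, end in mentions:
        concat = hidden[start] + hidden[end]  # list concatenation
        out = [a + b for a, b in zip(out, project(weights, concat))]
    return out


tokens = "put the electric oven on then open the electric oven".split()
mentions = find_mentions(tokens, ["electric", "oven"])
# Toy 2-d "contextualized" vectors: (token index, 1.0).
hidden = [[float(j), 1.0] for j in range(len(tokens))]
# Hypothetical 2x4 projection matrix that picks out the two index features.
weights = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0]]
representation = entity_representation(hidden, mentions, weights)
print(mentions, representation)
```

Summing rather than averaging means an entity mentioned more often gets a larger-norm vector; that follows the text's description of giving each occurrence equal weight.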
When queried about the current location of an entity, the MRC module (§ 4.2) returns a span of text as the answer, whose representation is later used to update the appropriate node vector in the graph. We obtain this answer-span representation in the same way as the entity representation, and denote it with $\psi_{i,t}$.
4.2 Machine Reading Comprehension Model
Rather than design a specialized MRC architecture, we make simple extensions to a widely used model, DrQA (Chen et al., 2017), to adapt it to query about the evolving states of entities. In summary, our modified DrQA implementation operates on prefixes of sentences rather than the full paragraph (like ProGlobal), and at each sentence (time step) it conditions on both the current sentence representation and the dynamic entity representations in $G_{t-1}$.
For complete details of the DrQA model, we refer readers to the original publication (Chen et al., 2017). Broadly, it uses a multi-layer recurrent neural network (RNN) architecture for encoding both the passage and question text and uses self-attention to match these two encodings. For each token in the text, it outputs a score indicating its likelihood of being the start or end of the span that answers the question. We reuse all of these operations in our model, modified as described below.
|Symbol||Meaning|
|$N$||Number of participant entities in the process.|
|$e_{i,0}$||Initial entity representation, derived from the text, for the $i$-th entity at time $t = 0$ (§ 4.1)|
|$e_{i,t}$||Entity node representation for the $i$-th entity at time $t$, in the graph $G_t$ (§ 4.4)|
|$\psi_{i,t}$||Location representation derived from the text for the $i$-th entity at time $t$ (§ 4.1)|
|$\lambda_{i,t}$||Location node representation for the $i$-th entity at time $t$, in the graph $G_t$ (§ 4.3, 4.4)|
|$\Lambda_t$||Matrix of all location node representations at time $t$|
|$U_t$||Soft co-reference matrix at time step $t$ (§ 4.3)|
Table 2: Summary of notation. The text-based representations of entities and locations are derived from the hidden representations of the context-RNN (§ 4.1). The node representations are added to the graph at the end of time step $t$ (§ 4.4).
We query the DrQA model about the state of each participant entity at each time step $t$. This involves reading the paragraph up to and including sentence $s_t$. To query, we generate simple natural language questions for an entity, $e$, such as “Where is $e$ located?” This is motivated by the work of Levy et al. (2017). Our DrQA component also conditions on entities. Recall that vector $e_{i,t-1}$ denotes the entity’s representation in the knowledge graph $G_{t-1}$. The module conditions on $e_{i,t-1}$ in its output layer, in much the same way as the question representation is used in the output alignment step in Chen et al. (2017). However, instead of taking a bi-linear map between the question and passage representations as in that work, we first concatenate the question representation with $e_{i,t-1}$ and pass the concatenation through a 2-layer MLP. This yields an entity-dependent question representation. We use this to compute the output start and end scores for each token position, taking the $\arg\max$ to obtain the most likely span. As mentioned, we encode this span as vector $\psi_{i,t}$ (§ 4.1).
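The final span-selection step can be illustrated with a small search over start and end positions. This sketch assumes the scores are non-negative (for example, probabilities) and scores a span by the product of its start and end scores, a common choice in DrQA-style decoders; the length cap `max_len` and the toy score values are assumptions, not settings taken from the paper.

```python
def best_span(start_scores, end_scores, max_len=15):
    """Return (start, end) maximizing start_scores[i] * end_scores[j]
    subject to i <= j < i + max_len."""
    best, best_score = None, float("-inf")
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s * end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


tokens = ["blood", "enters", "the", "right", "side", "of", "your", "heart"]
# Hypothetical per-token start and end scores.
start = [0.05, 0.05, 0.10, 0.60, 0.05, 0.05, 0.05, 0.05]
end = [0.02, 0.02, 0.02, 0.04, 0.10, 0.05, 0.05, 0.70]
i, j = best_span(start, end)
print(" ".join(tokens[i:j + 1]))  # right side of your heart
```

Constraining the end to follow the start rules out ill-formed spans that independent arg-max choices over the two score vectors could otherwise produce.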
The ProPara dataset includes two special locations that don’t appear as text spans: nowhere and somewhere. The current location of an entity is nowhere when the entity does not exist yet or has been destroyed, whereas it is somewhere when the entity exists but its location is unknown from the text. Since these locations don’t appear as tokens in the text, the span-predictive MRC module cannot extract them. Following Dalvi et al. (2018), we address this with a separate classifier that predicts, given a graph entity node and the text, whether the entity represented by the node is nowhere, somewhere, or its location is stated. We learn the location-node representations for nowhere and somewhere during training.
4.3 Soft Co-reference
To handle cases when entity states do not change and when states are referred to with different surface forms (either of which could lead to undesired node duplication), our model uses soft co-reference mechanisms (Figure 2) both across and within time steps. Disambiguation across time steps is accomplished by attention and a gated update, using the incoming location vector and the location node representations from the previous time step:
where $\Lambda_{t-1}$ is a matrix of location node representations from the previous time step (stacked row-wise) and $\psi_{i,t}$ is the location span vector output by the MRC module. The result vector $\lambda'_{i,t}$ is a disambiguated intermediate node representation.
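As an illustration of attention followed by a gated update, consider the sketch below. The exact parameterization is not recoverable from the surrounding text, so the dot-product attention, the scalar sigmoid gate, and the parameters `gate_w` and `gate_b` are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_coref_update(prev_nodes, span_vec, gate_w, gate_b):
    """Attend over previous location nodes, then gate between the
    attended summary and the incoming span vector."""
    # Dot-product attention over the rows of the previous node matrix.
    attn = softmax([sum(a * b for a, b in zip(row, span_vec)) for row in prev_nodes])
    attended = [sum(attn[k] * prev_nodes[k][d] for k in range(len(prev_nodes)))
                for d in range(len(span_vec))]
    # Scalar gate computed from the concatenation [span_vec ; attended].
    z = sum(w * x for w, x in zip(gate_w, span_vec + attended)) + gate_b
    gate = 1.0 / (1.0 + math.exp(-z))
    node = [gate * s + (1.0 - gate) * a for s, a in zip(span_vec, attended)]
    return node, attn


prev_nodes = [[1.0, 0.0], [0.0, 1.0]]   # two location nodes from the previous step
span_vec = [4.0, 0.0]                   # new span vector, close to the first node
node, attn = soft_coref_update(prev_nodes, span_vec, [0.0] * 4, 0.0)
print(attn, node)
```

With the gate parameters set to zero the gate equals 0.5, so the result is an even blend of the new span vector and the attended previous node; a learned gate would decide how much of the old node to keep.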
This process only partially addresses node de-duplication. Since different instances of the same location can be predicted for multiple entities, we also perform a co-reference disambiguation within each time step using a self-attention mechanism:
where $\Lambda'_t$ is a matrix of intermediate node representations (stacked row-wise) and $U_t$ is a co-reference adjacency matrix. We calculate this adjacency matrix at the beginning of each time step to track related nodes within $t$, and re-use it in the graph update step.
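A self-attention adjacency matrix of this kind can be sketched as a row-normalized similarity between all pairs of intermediate nodes. The use of plain dot products with a row-wise softmax is an assumption for illustration; the model may use a learned bilinear form instead.

```python
import math


def coref_matrix(nodes):
    """Row-normalized similarity between every pair of intermediate nodes."""
    matrix = []
    for a in nodes:
        scores = [sum(x * y for x, y in zip(a, b)) for b in nodes]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        matrix.append([e / total for e in exps])
    return matrix


# Nodes 0 and 1 are near-duplicates; node 2 is a different location.
nodes = [[3.0, 0.0], [3.0, 0.1], [0.0, 3.0]]
U = coref_matrix(nodes)
for row in U:
    print([round(v, 3) for v in row])
```

Rows for the two near-duplicate nodes place almost all of their weight on each other, which is what allows a later pooling step to tie them together.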
4.4 Graph Update
The graph update proceeds according to the following set of equations for each update layer $l$:
We first compose all connected entity and location nodes with their history summary, $h_{i,t}$, using an LSTM unit. Next, the updated node information is attached to the entity and location representations through two residual updates (He et al., 2016). These propagate information between the entity and location representations; i.e., if two entities are at the same location, then the corresponding entity representations will receive a similar update. Likewise, location representations are updated with pertinent entity information. Last, we perform a co-reference pooling operation for the location node representations. This uses the adjacency matrix $U_t$, where $\Lambda'_t$ is a row-wise stacked matrix of the $\lambda'_{i,t}$, to tie co-referent location nodes together.
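The three stages described here (composition with a history summary, residual updates, and co-reference pooling) can be sketched for a single layer as follows. A `tanh` of the element-wise sum stands in for the LSTM unit, so this is an assumption-laden illustration of the data flow rather than the actual update equations.

```python
import math


def graph_layer(entities, locations, history, coref):
    """One update layer over N (entity, location) node pairs.

    A tanh of the element-wise sum stands in for the LSTM unit.
    Returns updated entity vectors, pooled location vectors, and the
    new history summaries.
    """
    n, dim = len(entities), len(entities[0])
    new_hist = [[math.tanh(e + l + h)
                 for e, l, h in zip(entities[i], locations[i], history[i])]
                for i in range(n)]
    # Residual updates of both node types with the composed summary.
    new_ent = [[e + h for e, h in zip(entities[i], new_hist[i])] for i in range(n)]
    resid_loc = [[l + h for l, h in zip(locations[i], new_hist[i])] for i in range(n)]
    # Co-reference pooling: each location becomes a weighted mix of all locations.
    pooled = [[sum(coref[i][k] * resid_loc[k][d] for k in range(n)) for d in range(dim)]
              for i in range(n)]
    return new_ent, pooled, new_hist


# Two entities that share the same (1-d) location vector.
entities = [[0.0], [0.0]]
locations = [[1.0], [1.0]]
history = [[0.0], [0.0]]
coref = [[0.5, 0.5], [0.5, 0.5]]
new_ent, pooled, new_hist = graph_layer(entities, locations, history, coref)
print(new_ent, pooled)
```

Because both entities sit at the same location, they receive the same residual update, which is the behavior the paragraph above describes.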
The recurrent graph module stacks $L$ such layers to propagate node information along the graph’s edges. The resulting node representations are $e_{i,t}$ and $\lambda_{i,t}$ for each participant entity and its location. We use $e_{i,t}$ to condition the MRC model at the next time step, as described in § 4.2. We make use of this particular graph module structure, rather than adopting an existing model like GraphCNNs (Edwards & Xie, 2016; Kipf & Welling, 2017), because recurrent networks are designed to propagate information through time.
The full Kg-Mrc model is trained end-to-end by minimizing the negative log-likelihood of the correct span tokens under the MRC module’s output distribution and the textual entailment model. This is a fairly soft supervision signal, since we do not train the graph construction modules directly. We teacher-force the model at training time by updating the location-node representations with the encoding of the correct span. We do not pretrain the MRC module, but we represent paragraph tokens with pretrained FastText embeddings (Joulin et al., 2016). See Appendix A for full implementation and training details.
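The span loss and the teacher-forcing rule can be written down compactly. The sketch below assumes the span likelihood factorizes into independent start and end distributions, as is standard for DrQA-style readers; the function names and the probability values are illustrative.

```python
import math


def span_nll(start_probs, end_probs, gold_start, gold_end):
    """Negative log-likelihood of the gold span under independent
    start and end distributions."""
    return -(math.log(start_probs[gold_start]) + math.log(end_probs[gold_end]))


def span_for_graph_update(predicted_span, gold_span, training):
    """Teacher forcing: use the gold span to update the graph while training."""
    return gold_span if training else predicted_span


# Hypothetical distributions over a two-token passage.
loss = span_nll([0.25, 0.75], [0.5, 0.5], gold_start=1, gold_end=1)
print(round(loss, 4))
print(span_for_graph_update((0, 0), (1, 1), training=True))
```

Feeding the gold span into the graph during training keeps early prediction errors from corrupting the node representations that later time steps depend on.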
5 Experiments and Discussion
We evaluate our model on three different tasks. We also provide an ablation study along with quantitative and qualitative analyses to highlight the performance contributions of each module.
5.1 Results on Procedural Text
We benchmarked our model on two ProPara comprehension tasks introduced respectively in Dalvi et al. (2018) and Tandon et al. (2018). Refer to Section 3 for a detailed description of the data and tasks. Dalvi et al. (2018) and Tandon et al. (2018) respectively introduce a specific model for each task, whereas we test Kg-Mrc on both tasks. A primary motivation for building KGs is that they can be queried for salient knowledge in downstream applications. We evaluate Kg-Mrc on the above two tasks by querying the KGs it builds at each time step; we use the official evaluation pipeline (https://github.com/allenai/propara/tree/master/propara/eval) for each task. In the results below, we report an average score of three runs of our model with different hyperparameter settings.
5.1.1 Task 1: Sentence-level Evaluation
Table 3 shows our main results on the first task. Following the original task evaluation, we report model accuracy on each subtask category and macro and micro averages over the subtasks.
Human performance is 79.69%, micro-average. A state-of-the-art memory augmented network, EntNet (Henaff et al., 2017), which is built to track entities but lacks an explicit graph structure, achieves 25.96%. The previous best performing model is ProGlobal, which achieves 45.37%. Our Kg-Mrc improves over this result by 1.25% absolute score in terms of micro-averaged accuracy. Comparing various models for each subtask category, ProGlobal leads in Category 1 by a small margin of around 0.1%. For the more challenging Categories 2 and 3, Kg-Mrc outperforms ProGlobal by a large margin. These questions require fine-grained predictions of state changes.
|Model||Cat 1||Cat 2||Cat 3||Macro-avg||Micro-avg|
|Human upper bound||91.67||87.66||62.96||80.76||79.69|
|EntNet (Henaff et al., 2017)||51.62||18.83||7.77||26.07||25.96|
|ProLocal (Dalvi et al., 2018)||62.65||30.50||10.35||34.50||33.96|
|ProGlobal (Dalvi et al., 2018)||62.95||36.39||35.90||45.08||45.37|
Table 3: Sentence-level evaluation results on ProPara (Task 1), accuracy in %.
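As a quick check on how the macro-average column relates to the per-category scores, the snippet below recomputes it as the unweighted mean of the three category accuracies listed in the table. The micro-average cannot be recomputed this way, since it depends on the number of questions in each category, which the table does not report.

```python
# Per-category accuracies (Cat 1, Cat 2, Cat 3) copied from the table above.
rows = {
    "Human upper bound": (91.67, 87.66, 62.96),
    "EntNet": (51.62, 18.83, 7.77),
    "ProLocal": (62.65, 30.50, 10.35),
    "ProGlobal": (62.95, 36.39, 35.90),
}


def macro_average(scores):
    """Unweighted mean over categories."""
    return sum(scores) / len(scores)


for name, cats in rows.items():
    print(f"{name}: {macro_average(cats):.2f}")
```

The recomputed values agree with the reported macro-averages to two decimal places.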
5.1.2 Task 2: Document-level Evaluation
We report the performance of our model on the document-level task, along with previously published results, in Table 4. The same Kg-Mrc model achieves a 3.02% absolute improvement in $F_1$ over the previous best result of ProStruct. ProStruct incorporates a set of commonsense constraints for globally consistent predictions. We analyzed Kg-Mrc’s outputs and were surprised to discover that our model learns these commonsense constraints from the data in an end-to-end fashion, as we show quantitatively in § 5.4.
|Model||Precision||Recall||F1|
|ProLocal (Dalvi et al., 2018)||77.4||22.9||35.3|
|QRN (Seo et al., 2017b)||55.5||31.3||40.0|
|EntNet (Henaff et al., 2017)||50.2||33.5||40.2|
|ProGlobal (Dalvi et al., 2018)||46.7||52.4||49.4|
|ProStruct (Tandon et al., 2018)||74.2||42.1||53.75|
Table 4: Document-level evaluation results on ProPara (Task 2).
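Assuming the three numeric columns of the table above are precision, recall, and F1 (in that order), the last column can be checked against the harmonic mean of the first two. The snippet below does this for the rows listed; small deviations are expected because the reported precision and recall are themselves rounded.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
rows = {
    "ProLocal": (77.4, 22.9, 35.3),
    "QRN": (55.5, 31.3, 40.0),
    "EntNet": (50.2, 33.5, 40.2),
    "ProGlobal": (46.7, 52.4, 49.4),
    "ProStruct": (74.2, 42.1, 53.75),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed {f1(p, r):.2f}, reported {reported}")
```

All rows agree with the reported value to within 0.05, which supports reading the columns in this order.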
5.2 Recipe Description Experiments
We also evaluate our model on the Recipes dataset, where we predict the locations of cooking ingredients. Bosselut et al. (2018) originally treat this problem as classification over a fixed lexicon of locations, whereas Kg-Mrc searches for the correct location span in the text. Our model slightly outperforms the baseline NPN model on this task even when trained on just 10K examples (the full training set is around 60K examples): NPN achieves 51.28% when trained on all the data, while Kg-Mrc achieves 51.64% after 10K examples.
5.3 Ablation Study
We performed an ablation study to evaluate different model variations on ProPara Task 1. The main results are reported in Table 5. Removing the soft co-reference disambiguation within time steps (Equation 2) from Kg-Mrc resulted in a performance drop of around 1%. The drop is more significant when the co-reference disambiguation across time steps (Equation 1) is removed.
We also replaced the recurrent graph module with the standard LSTM unit and used the LSTM hidden state for the entity representation. As this model variation lacks the information propagation across graph nodes, we observed a large performance decrease.
For the last two variations, we simply train the MRC model in isolation and predict location spans from the current sentence or paragraph prefix text (i.e., the current and all previous sentences). These models construct no internal knowledge graphs. We can see that training the MRC model on paragraph prefixes already provides a good starting performance of 40.83% micro-average, which is significantly boosted by the recurrent graph module and graph conditioning up to 47.64%.
|Model||Cat 1||Cat 2||Cat 3||Macro-avg||Micro-avg|
|- Coref across time steps||61.07||37.38||35.58||44.68||46.32|
|- Coref within time step||57.88||38.09||40.19||45.39||46.63|
|Standard LSTM as graph unit||56.84||13.15||10.95||26.98||29.97|
|MRC on entire paragraph||58.85||21.82||26.52||35.73||35.98|
|MRC on prefix||61.28||32.58||29.48||41.11||40.83|
Table 5: Ablation study on ProPara Task 1.
5.4 Commonsense Constraints
For accurate, globally consistent predictions for the second task, Tandon et al. (2018) introduced a set of commonsense constraints on their model. Stated in natural language, these constraints are: 1) An entity must exist before it can be moved or destroyed; 2) An entity cannot be created if it already exists; 3) An entity cannot change until it is mentioned in the paragraph.
To analyze whether our model can learn the above constraints from data, we count the number of predictions that violate any constraints on the test set. In Table 6 we compare the behavior of different models by computing the number of violations made by Tandon et al. (2018)’s model and several variants of our model. Note that we only count instances where a model predicts an entity state change.
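A violation counter of the kind described can be sketched as below. It assumes predictions are available as one action per step for each entity (`create`, `destroy`, `move`, or `None`), together with whether the entity exists initially and the step at which it is first mentioned; this input format and the function name are assumptions for illustration, not the evaluation code used for the reported numbers.

```python
def count_violations(actions, exists_initially, first_mention_step):
    """Count predicted state changes that break a commonsense constraint.

    actions[t-1] is the predicted change at step t: "create",
    "destroy", "move", or None. Returns (num_changes, num_violations).
    """
    exists = exists_initially
    changes = violations = 0
    for step, action in enumerate(actions, start=1):
        if action is None:
            continue  # only predicted state changes are counted
        changes += 1
        violated = (
            step < first_mention_step                           # constraint 3
            or (action in ("move", "destroy") and not exists)   # constraint 1
            or (action == "create" and exists)                  # constraint 2
        )
        violations += int(violated)
        # Track existence as predicted, even after a violation.
        if action == "create":
            exists = True
        elif action == "destroy":
            exists = False
    return changes, violations


# Hypothetical prediction sequence for one entity.
actions = ["move", "create", "create", None, "destroy", "move"]
changes, violations = count_violations(actions, exists_initially=False, first_mention_step=1)
print(changes, violations, f"{100 * violations / changes:.2f}%")
```

Dividing violations by the number of predicted changes gives a proportion of the same form as the one reported per model.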
|Model||State Change Predictions||Violations||Violation Proportion (%)|
|ProStruct (Tandon et al., 2018)||270||17||6.30|
|MRC on entire paragraph||381||104||27.30|
|MRC on prefix||703||154||21.93|
|Standard LSTM as graph unit||447||20||4.47|
Table 6: Commonsense constraint violations on the test set.
To our surprise, Kg-Mrc learns to violate fewer constraints (proportionally) than ProStruct even without explicitly training it to do so. As the table shows, MRC models without recurrent graph modules perform worse in terms of constraint violations than both Kg-Mrc and a model with a standard LSTM as its graph unit. This suggests that recurrence and graph representations play an important role in helping the model to learn commonsense constraints.
5.5 Qualitative Analysis
We picked an example from the test data and took a closer look at the model outputs to investigate how Kg-Mrc dynamically adjusts its decisions via the dynamic graph module and finds accurate spans with the conditional MRC model. The step-by-step output of both ProGlobal (Dalvi et al., 2018) and Kg-Mrc is shown in Table 7, where we track the state of entity blood across six sentences. Kg-Mrc outputs smoother and more accurate predictions.
|Sentences||ProGlobal||Kg-Mrc|
|(Before first sentence)||somewhere||somewhere|
|Blood enters the right side of your heart.||heart||right side of your heart|
|Blood travels to the lungs.||lung||lungs|
|Carbon dioxide is removed from the blood.||blood||lungs|
|Oxygen is added to your blood.||lung||lungs|
|Blood returns to left side of your heart.||blood||heart|
|The blood travels through the body.||body||body|
Table 7: Predicted location of the entity blood after each sentence.
6 Conclusion
We proposed a neural machine-reading model that constructs dynamic knowledge graphs from text to track locations of participant entities in procedural text. It further uses these graphical representations to improve its downstream comprehension of text. Our model, Kg-Mrc, achieves state-of-the-art results on two question-answering tasks from the ProPara dataset and one from the Recipes dataset. In future work, we will extend the model to construct more general knowledge graphs with multiple relation types.
References
- Bahdanau et al. (2014) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
- Bansal et al. (2017) Trapit Bansal, Arvind Neelakantan, and Andrew McCallum. Relnet: End-to-end modeling of entities & relations. In AKBC, NIPS, 2017.
- Berant et al. (2014) Jonathan Berant, Vivek Srikumar, Pei-Chun Chen, Abby Vander Linden, Brittany Harding, Brad Huang, Peter Clark, and Christopher D Manning. Modeling biological processes for reading comprehension. In EMNLP, 2014.
- Bosselut et al. (2018) Antoine Bosselut, Omer Levy, Ari Holtzman, Corin Ennis, Dieter Fox, and Yejin Choi. Simulating action dynamics with neural process networks. In ICLR, 2018.
- Chen et al. (2017) Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. Reading wikipedia to answer open-domain questions. In ACL, 2017.
- Cho et al. (2014) Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
- Chung et al. (2014) Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
- Dalvi et al. (2018) Bhavana Dalvi, Lifu Huang, Niket Tandon, Wen-tau Yih, and Peter Clark. Tracking state changes in procedural text: a challenge dataset and models for process paragraph comprehension. In NAACL, 2018.
- Das et al. (2017) Rajarshi Das, Manzil Zaheer, Siva Reddy, and Andrew McCallum. Question answering on knowledge bases and text using universal schema and memory networks. In ACL, 2017.
- Edwards & Xie (2016) Michael Edwards and Xianghua Xie. Graph based convolutional neural network. arXiv preprint arXiv:1609.08965, 2016.
- He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
- Henaff et al. (2017) Mikael Henaff, Jason Weston, Arthur Szlam, Antoine Bordes, and Yann LeCun. Tracking the world state with recurrent entity networks. In ICLR, 2017.
- Hochreiter & Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 1997.
- Joulin et al. (2016) Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hervé Jégou, and Tomas Mikolov. Fasttext.zip: Compressing text classification models. arXiv preprint arXiv:1612.03651, 2016.
- Kiddon et al. (2015) Chloé Kiddon, Ganesa Thandavam Ponnuraj, Luke Zettlemoyer, and Yejin Choi. Mise en place: Unsupervised interpretation of instructional recipes. In EMNLP, 2015.
- Kingma & Ba (2014) Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Kipf & Welling (2017) Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
- Levy et al. (2017) Omer Levy, Minjoon Seo, Eunsol Choi, and Luke S. Zettlemoyer. Zero-shot relation extraction via reading comprehension. In CoNLL, 2017.
- Paszke et al. (2017) Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. In NIPS-W, 2017.
- Roth et al. (2018) Benjamin Roth, Costanza Conforti, Nina Poerner, Sanjeev Karn, and Hinrich Schütze. Neural architectures for open-type relation argument extraction. arXiv preprint arXiv:1803.01707, 2018.
- Seo et al. (2017a) Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. Bidirectional attention flow for machine comprehension. In ICLR, 2017a.
- Seo et al. (2017b) Minjoon Seo, Sewon Min, Ali Farhadi, and Hannaneh Hajishirzi. Query-reduction networks for question answering. In ICLR, 2017b.
- Tandon et al. (2018) Niket Tandon, Bhavana Dalvi Mishra, Joel Grus, Wen-tau Yih, Antoine Bosselut, and Peter Clark. Reasoning about actions and state changes by injecting commonsense knowledge. In EMNLP, 2018.
- Weston et al. (2015) Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M Rush, Bart van Merriënboer, Armand Joulin, and Tomas Mikolov. Towards ai-complete question answering: A set of prerequisite toy tasks. arXiv preprint arXiv:1502.05698, 2015.
- Xiong et al. (2017) Caiming Xiong, Victor Zhong, and Richard Socher. Dynamic coattention networks for question answering. In ICLR, 2017.
- Yu et al. (2018) Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, and Quoc V Le. Qanet: Combining local convolution with global self-attention for reading comprehension. In ICLR, 2018.
Appendix A Implementation Details
Implementation details of Kg-Mrc are as follows.
In all experiments, the word embeddings are initialized with FastText embeddings (Joulin et al., 2016). We use a two-layer document LSTM with 64 hidden units in each layer. We apply a dropout rate of 0.4 in all recurrent layers and 0.3 in all other layers. The number of recurrent graph layers was set to (). The hidden unit size for the recurrent graph component was set to 64.