RelNet: End-to-End Modeling of Entities & Relations

We introduce RelNet: a new model for relational reasoning. RelNet is a memory-augmented neural network which models entities as abstract memory slots and is equipped with an additional relational memory which models relations between all memory pairs. The model thus builds an abstract knowledge graph on the entities and relations present in a document, which can then be used to answer questions about the document. It is trained end-to-end: the only supervision to the model is in the form of correct answers to the questions. We test the model on the 20 bAbI question-answering tasks with 10k examples per task and find that it solves all the tasks with a mean error of 0.3%, achieving 0% error on 11 of the 20 tasks.





1 Introduction

Reasoning about entities and their relations is an important problem for achieving general artificial intelligence. Often such problems are formulated as reasoning over graph-structured representations of knowledge. Knowledge graphs, for example, consist of entities and relations between them (Hoffart et al., 2013; Bollacker et al., 2008; Carlson et al., 2010; Auer et al., 2007). Representation learning (Riedel et al., 2013; Bordes et al., 2013; Wang et al., 2014; Lin et al., 2015) and reasoning (Das et al., 2016a; Neelakantan et al., 2015, 2016; Miller et al., 2016) with such structured representations is an important and active area of research.

Most previous work on knowledge representation and reasoning relies on a pipeline of natural language processing systems, often consisting of named entity extraction (McCallum and Li, 2003), entity resolution and coreference (Dredze et al., 2010), relationship extraction (Riedel et al., 2013), and knowledge graph inference (Das et al., 2016b). While this cascaded approach of using NLP systems can be effective at reasoning with knowledge bases at scale, it also compounds the errors from each component sub-system. The importance of each of these sub-components for a particular downstream application is also not clear.

For the task of question-answering, we instead make an attempt at an end-to-end approach which directly models the entities and relations in the text as memory slots. While incorporating existing knowledge (from curated knowledge bases) for the purpose of question-answering (Miller et al., 2016; Das et al., 2016a; Fader et al., 2014) is an important area of research, we consider the simpler setting where all the information is contained within the text itself – which is the approach taken by many recent memory based neural network models (Sukhbaatar et al., 2015; Henaff et al., 2017; Weston et al., 2015; Munkhdalai and Yu, 2016).

Recently, Henaff et al. (2017) proposed a dynamic memory based neural network for implicitly modeling the state of entities present in the text for question answering. However, this model lacks any module for relational reasoning. In response, we propose RelNet, which extends memory-augmented neural networks with a relational memory to reason about relationships between multiple entities present within the text. Our end-to-end method reads text, and writes to both memory slots and edges between them. Intuitively, the memory slots correspond to entities and the edges correspond to relationships between entities, each represented as a vector. The only supervision signal for our method comes from answering questions on the text.

We demonstrate the utility of the model through experiments on the bAbI tasks (Weston et al., 2015) and find that the model achieves smaller mean error across the tasks than the best previously published result (Henaff et al., 2017) in the 10k examples regime and achieves 0% error on 11 of the 20 tasks.

2 RelNet Model

Figure 1: RelNet Model: The model represents the state of the world as a Neural Turing Machine with relational memory. At each time step, the model reads the sentence into an encoding vector and updates both entity memories and all edges between them representing the relations.

We describe the RelNet model in this section. Figure 1 provides a high-level view of the model. The model is sequential in nature, consisting of the following steps: read the text, process it into a dynamic relational memory, and then generate the answer with attention conditioned on the question. We model the dynamic memory in a fashion similar to Recurrent Entity Networks (Henaff et al., 2017) and then equip it with an additional relational memory.

There are three main components to the model: 1) input encoder, 2) dynamic memory, and 3) output module. We describe these three modules in detail. The input encoder and output module implementations are similar to the Entity Network (Henaff et al., 2017), and the main novelty lies in the dynamic memory. We describe the operations executed by the network for a single example consisting of a document given as a sequence of sentences, where each sentence consists of a sequence of words represented with fixed-dimensional word embeddings, a question on the document represented as another sequence of words, and an answer to the question.

Input Encoder:

The input at each time point is a sentence from the document, which can be encoded into a fixed vector representation using some encoding mechanism, such as a recurrent neural network. We use a simple encoder with a learned multiplicative mask (Henaff et al., 2017; Sukhbaatar et al., 2015).
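A masked-sum encoder of this kind can be sketched as follows. This is a minimal illustration, assuming the Recurrent Entity Network style of encoder in which each word embedding is multiplied elementwise by a learned per-position mask vector and the results are summed. The function name `encode_sentence` and the toy numbers are hypothetical, and in a real model the mask would be learned rather than fixed.

```python
def encode_sentence(word_embeddings, mask):
    """Sum the word embeddings after elementwise multiplication by a per-position mask.

    word_embeddings: list of word vectors, one per position in the sentence
    mask: list of mask vectors, one per position (learned in a real model; fixed here)
    """
    dim = len(word_embeddings[0])
    sentence = [0.0] * dim
    for embedding, position_mask in zip(word_embeddings, mask):
        for d in range(dim):
            sentence[d] += position_mask[d] * embedding[d]
    return sentence


# Toy example: a three-word sentence with 2-dimensional embeddings.
words = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
mask = [[1.0, 1.0], [2.0, 0.5], [0.0, 1.0]]
s_t = encode_sentence(words, mask)
print(s_t)  # [2.0, 1.5]
```

The mask lets the encoder weight word positions differently, which a plain bag-of-words sum cannot do.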

Dynamic Relational Memory

This is the main component of an end-to-end reasoning pipeline, where we need to process the information contained in the text such that it can be used to reason about the entities, their properties and the relationships among them. The memory consists of two parts: entity memory and relational memory. The entity memory is organized as a key-value memory network (Miller et al., 2016), where the keys are global embeddings updated during training but not during inference, and the value memory slots form a dynamic memory for each example (document, question) whose values are updated while reading the document. The memory thus consists of a set of memory slots (each a vector of fixed dimension) and associated keys (vectors of the same dimension). At each time step, after reading the sentence into a vector representation, a gating mechanism based on inner products decides the set of memories to be updated:


Intuitively, the memory slots can be thought of as entities. Indeed, Henaff et al. (2017) found that if they tie the key vectors to entities in the text, then the memories contain information about the state of those entities. The update in (1) essentially does a soft selection of memory slots based on cosine distance in the embedding space. Note that there can be multiple entities in a sentence, hence a sigmoid operation is more suitable, and it is also more scalable (Henaff et al., 2017). After selecting the set of memories, there is an update step which stores information in the corresponding memory slots:


where PReLU is a parametric rectified linear unit (He et al., 2015), and the three transformation matrices are learned parameters.
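The entity gate and update described above can be sketched in plain Python. This is a minimal sketch that assumes the Recurrent Entity Network form of the equations: the gate is a sigmoid of the summed inner products of the sentence vector with the memory and with its key, and the candidate is a PReLU of a sum of three linear maps. The function names, the matrix names `U`, `V`, `W`, and the fixed PReLU slope of 0.25 are illustrative assumptions, not details taken from the text.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def prelu(vector, slope=0.25):
    # The slope is a learned parameter in PReLU; it is fixed here for illustration.
    return [x if x > 0 else slope * x for x in vector]


def update_entity_memory(memories, keys, sentence, U, V, W):
    """Gate each slot against the sentence, then add the gated candidate to it."""
    updated, gates = [], []
    for h, k in zip(memories, keys):
        gate = sigmoid(dot(sentence, h) + dot(sentence, k))  # assumed gate form
        pre_activation = [a + b + c for a, b, c in
                          zip(matvec(U, h), matvec(V, k), matvec(W, sentence))]
        candidate = prelu(pre_activation)
        updated.append([hi + gate * ci for hi, ci in zip(h, candidate)])
        gates.append(gate)
    return updated, gates


identity = [[1.0, 0.0], [0.0, 1.0]]
memories = [[0.0, 0.0], [0.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
sentence = [2.0, 0.0]  # aligned with the first key only
new_memories, entity_gates = update_entity_memory(
    memories, keys, sentence, identity, identity, identity)
```

In this toy run the first slot receives a larger gate than the second because the sentence vector aligns with its key. Since each gate is an independent sigmoid, several slots can be active for the same sentence.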

Now we augment the model with additional relational memory cells. Intuitively, the entity memory allows modeling of entities and information about the entities in isolation. This can be insufficient in scenarios where a particular entity participates in many relations with other entities across the document. Thus, in order to succeed at relational reasoning, the model needs to be able to compare each pair of the entity memories. The relational memories allow modeling of these relations and provide an inherent inductive bias towards a more structured representation of the participating entities in the text, in the form of a latent knowledge graph. The relational memories are memory slots indexed by pairs of entity memory slots.

The relational memories are updated as follows. First, a gating mechanism decides the set of active relational memories:


where the entity gates select the relational memory slot based on the active entity slots, and the last sigmoid gate decides whether the corresponding relational memory needs to be updated based on the current input sentence. After selecting the set of active relational memories, we update the contents of the relational memory:


where the transformation matrices are again learned parameters. Note that for updates (3)–(4) we use a different encoding mask to obtain the sentence representation for relations.
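The relational gate and update can be sketched in the same style. This is a hedged reconstruction from the prose: the gate for a pair is assumed to be the product of the two entity gates and a sigmoid of the inner product between the sentence vector and the current relation vector, and the candidate is assumed to be a PReLU of two linear maps. The names `A`, `B`, the dictionary keyed by slot pairs, and the fixed PReLU slope are assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def prelu(vector, slope=0.25):
    # The slope is a learned parameter in PReLU; it is fixed here for illustration.
    return [x if x > 0 else slope * x for x in vector]


def update_relational_memory(relations, entity_gates, sentence, A, B):
    """Update every relation vector r_ij, gated by the gates of entities i and j."""
    updated = {}
    for (i, j), r_ij in relations.items():
        gate = entity_gates[i] * entity_gates[j] * sigmoid(dot(sentence, r_ij))
        pre_activation = [a + b for a, b in zip(matvec(A, r_ij), matvec(B, sentence))]
        candidate = prelu(pre_activation)
        updated[(i, j)] = [r + gate * c for r, c in zip(r_ij, candidate)]
    return updated


identity = [[1.0, 0.0], [0.0, 1.0]]
relations = {(0, 1): [0.0, 0.0], (1, 0): [0.0, 0.0]}
entity_gates = [1.0, 0.5]
relation_sentence = [2.0, -4.0]  # encoded with the separate relation mask
new_relations = update_relational_memory(
    relations, entity_gates, relation_sentence, identity, identity)
print(new_relations[(0, 1)])  # [0.5, -0.25]
```

Because the pair gate multiplies the two entity gates, a relation slot changes noticeably only when both of its entities are active for the current sentence.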

Similar to Henaff et al. (2017), we normalize the memories after each update step (that is, after reading each sentence). This acts as a forget step and keeps the memory from exploding.
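As a small illustration of the normalization step, the sketch below rescales a memory vector to unit length. It assumes the normalization is an L2 normalization, as in Recurrent Entity Networks; the function name `normalize` and the handling of the all-zero vector are illustrative choices.

```python
import math


def normalize(memory):
    """Rescale a memory vector to unit L2 norm (an all-zero vector is left as is)."""
    norm = math.sqrt(sum(x * x for x in memory))
    if norm == 0.0:
        return list(memory)
    return [x / norm for x in memory]


unit_memory = normalize([3.0, 4.0])  # approximately [0.6, 0.8]
```

Since every memory is rescaled to the same length after each sentence, repeated additive updates cannot grow without bound, and older content shrinks relative to newly written content.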

The full memory consists of the entity memory slots and the relational memory slots.

Output Module

This is a standard attention module used in memory networks (Sukhbaatar et al., 2015; Henaff et al., 2017). The question is encoded as a fixed-dimensional vector using the same encoding mechanism as the sentences (though with a separate learned mask). We first concatenate the relational memory vectors with the corresponding entity vectors, and project the resulting memory vector to the same dimension as the question vector. Then attention on these projected memories, conditioned on the question vector, yields the final answer:

Here the output is the predicted answer, and the projections involved are parameter matrices.
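The output step can be sketched as follows. This is only one plausible reading of the description: each relation vector is concatenated with its two entity vectors and projected by a matrix `C`, a softmax over inner products with the question gives attention weights, and the answer scores follow the Recurrent Entity Network output form (a PReLU of the question plus a linear map of the attended memory, followed by an output matrix). The names `C`, `H`, `R`, the concatenation order and the fixed PReLU slope are all assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def prelu(vector, slope=0.25):
    return [x if x > 0 else slope * x for x in vector]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def answer_scores(entity_memories, relations, question, C, H, R):
    """Attend over projected [h_i; h_j; r_ij] memories and score candidate answers."""
    projected = []
    for (i, j), r_ij in relations.items():
        concatenated = entity_memories[i] + entity_memories[j] + r_ij
        projected.append(matvec(C, concatenated))
    weights = softmax([dot(question, m) for m in projected])
    attended = [sum(w * m[d] for w, m in zip(weights, projected))
                for d in range(len(question))]
    hidden = prelu([q + x for q, x in zip(question, matvec(H, attended))])
    return matvec(R, hidden), weights


entity_memories = [[1.0, 0.0], [0.0, 1.0]]
relations = {(0, 1): [1.0, 1.0], (1, 0): [0.0, 0.0]}
question = [1.0, 0.0]
C = [[1.0, 0.0, 0.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 0.0, 0.0, 1.0]]      # projects 6 dimensions down to 2
H = [[1.0, 0.0], [0.0, 1.0]]
R = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three candidate answers
scores, weights = answer_scores(entity_memories, relations, question, C, H, R)
predicted = max(range(len(scores)), key=scores.__getitem__)
```

In training, the scores would be passed through a softmax and compared with the correct answer, which is the only supervision the model receives.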

3 Related Work

There is a long line of work in textual question-answering systems (Kwiatkowski et al., 2010; Berant et al., 2013). Recent successful approaches use memory based neural networks for question answering, for example Weston et al. (2014, 2015); Xiong et al. (2016); Munkhdalai and Yu (2016); Henaff et al. (2017). Our model is also a memory network based model and is also related to the Neural Turing Machine (Graves et al., 2014). As described previously, the model is closely related to the Recurrent Entity Networks model (Henaff et al., 2017), which describes an end-to-end approach to model entities in text but does not directly model relations. Other approaches to question answering use external knowledge, for instance external knowledge bases (Bordes et al., 2015; Miller et al., 2016; Das et al., 2017; Andreas et al., 2016; Neelakantan et al., 2015) or external text like Wikipedia (Yang et al., 2015; Chen et al., 2017).

Very recently, and in parallel to this work, a method for relational reasoning called relation networks (Santoro et al., 2017) was proposed. The authors demonstrated that simple neural network modules are not as effective at relational reasoning, and their proposed module is similar to our model. However, the relation network is not a memory-based model and has no mechanism to read and write relevant information for each pair. Moreover, while their approach scales as the square of the number of sentences, our approach scales as the square of the number of memory slots used per QA pair. The output module in our model can be seen as a type of relation network.

Representation learning and reasoning over graph-structured data is also relevant to this work. Graph based neural network models (Li et al., 2015; Scarselli et al., 2009; Kipf and Welling, 2016) have been proposed which take graph data as an input. The relational memory, however, does not rely on a specified graph structure, and such models can potentially be used for multi-hop reasoning over the relational memory. Johnson (2016) proposed a method for learning a graphical representation of the text data for question answering; however, that model requires explicit supervision for the graph at every step, whereas RelNet does not.

Task                        EntNet (Henaff et al., 2017)   RelNet
1: 1 supporting fact        0                              0
2: 2 supporting facts       0.1                            0.7
3: 3 supporting facts       4.1                            3.4
4: 2 argument relations     0                              0
5: 3 argument relations     0.3                            0.6
6: yes/no questions         0.2                            0
7: counting                 0                              0.1
8: lists/sets               0.5                            0
9: simple negation          0.1                            0
10: indefinite knowledge    0.6                            0.1
11: basic coreference       0.3                            0.1
12: conjunction             0                              0
13: compound coreference    1.3                            0
14: time reasoning          0                              0.2
15: basic deduction         0                              0
16: basic induction         0.2                            0.1
17: positional reasoning    0.5                            0
18: size reasoning          0.3                            0.4
19: path finding            2.3                            0
20: agents motivation       0                              0
Tasks with 0% error         7                              11
Mean % error                0.5                            0.3

Table 1: Mean % error on the 20 bAbI tasks.

4 Experiments

We evaluate the model’s performance on the bAbI tasks (Weston et al., 2015), a collection of 20 question answering tasks which have become a benchmark for evaluating memory-augmented neural networks. We compare the performance with the Recurrent Entity Networks model (EntNet) (Henaff et al., 2017). Performance is measured in terms of mean percentage error on the tasks.

Training Details: We used Adam and did a grid search for the learning rate in {0.01, 0.005, 0.001}, chose a fixed learning rate of 0.005 based on performance on the validation set, and clipped the gradient norm at 2. We keep all other details similar to Henaff et al. (2017) for a fair comparison. Embedding dimensions were fixed to 100, and models were trained for a maximum of 250 epochs with a mini-batch size of 32 for all tasks except task 3, for which the batch size was 16. The document sizes were limited to the most recent 70 sentences for all tasks except task 3, for which the limit was 130. The RelNet models were run 5 times with random seeds on each task, and the model with the best validation performance was chosen as the final model. The baseline EntNet model was run 10 times for each task (Henaff et al., 2017).
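For reference, the training settings listed above can be collected into a small configuration object. This is only a convenience sketch: the class name `TrainConfig`, its field names and the helper `config_for_task` are hypothetical, and only the numeric values come from the text.

```python
from dataclasses import dataclass

# Learning rates explored in the grid search; 0.005 was the one selected.
LEARNING_RATE_GRID = (0.01, 0.005, 0.001)


@dataclass(frozen=True)
class TrainConfig:
    optimizer: str = "adam"
    learning_rate: float = 0.005
    grad_clip_norm: float = 2.0
    embedding_dim: int = 100
    max_epochs: int = 250
    batch_size: int = 32
    max_sentences: int = 70  # only the most recent sentences of a document are kept
    num_runs: int = 5        # RelNet runs per task with different random seeds


def config_for_task(task_id):
    """Return the settings for a bAbI task; task 3 uses smaller batches and longer documents."""
    if task_id == 3:
        return TrainConfig(batch_size=16, max_sentences=130)
    return TrainConfig()


task3_config = config_for_task(3)
default_config = config_for_task(1)
```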

The results are shown in Table 1. The RelNet model achieves a mean error of 0.285% across tasks which is better than the results of the EntNet model (Henaff et al., 2017). The RelNet model is able to achieve 0% test error on 11 of the tasks, whereas the EntNet model achieves 0% error on 7 of the tasks.
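The summary figures can be checked directly against the per-task errors in Table 1. The short script below recomputes the mean error and the number of tasks with 0% error for both models; the variable and function names are illustrative.

```python
# Per-task % error from Table 1, in task order 1..20.
entnet_errors = [0, 0.1, 4.1, 0, 0.3, 0.2, 0, 0.5, 0.1, 0.6,
                 0.3, 0, 1.3, 0, 0, 0.2, 0.5, 0.3, 2.3, 0]
relnet_errors = [0, 0.7, 3.4, 0, 0.6, 0, 0.1, 0, 0, 0.1,
                 0.1, 0, 0, 0.2, 0, 0.1, 0, 0.4, 0, 0]


def summarize(errors):
    """Return (mean error, number of tasks with exactly 0% error)."""
    mean_error = sum(errors) / len(errors)
    zero_error_tasks = sum(1 for e in errors if e == 0)
    return mean_error, zero_error_tasks


entnet_mean, entnet_zero = summarize(entnet_errors)
relnet_mean, relnet_zero = summarize(relnet_errors)
print(round(entnet_mean, 3), entnet_zero)  # 0.54 7
print(round(relnet_mean, 3), relnet_zero)  # 0.285 11
```

The RelNet mean of 0.285% rounds to the 0.3% reported in Table 1, and the EntNet mean of 0.54% rounds to 0.5%.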

5 Conclusion

We demonstrated an end-to-end trained neural network augmented with a structured memory representation which can reason about entities and relations for question answering. Future work will investigate the performance of these models on more real-world datasets, interpret what the models learn, and scale these models to answer questions about entities and relations from reading massive text corpora.