Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs

The recent proliferation of knowledge graphs (KGs) coupled with incomplete or partial information, in the form of missing relations (links) between entities, has fueled a lot of research on knowledge base completion (also known as relation prediction). Several recent works suggest that convolutional neural network (CNN) based models generate richer and more expressive feature embeddings and hence also perform well on relation prediction. However, we observe that these KG embeddings treat triples independently and thus fail to cover the complex and hidden information that is inherently implicit in the local neighborhood surrounding a triple. To this effect, our paper proposes a novel attention based feature embedding that captures both entity and relation features in any given entity's neighborhood. Additionally, we also encapsulate relation clusters and multihop relations in our model. Our empirical study offers insights into the efficacy of our attention based model and we show marked performance gains in comparison to state of the art methods on all datasets.



ACL 2019: Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs


1 Introduction

Knowledge graphs (KGs) represent knowledge bases (KBs) as a directed graph whose nodes and edges represent entities and relations between entities, respectively. For example, in Figure  1, a triple (London, capital_of, United Kingdom) is represented as two entities: London and United Kingdom along with a relation (capital_of) linking them. KGs find uses in a wide variety of applications such as semantic search Berant et al. (2013); Berant and Liang (2014), dialogue generation He et al. (2017); Keizer et al. (2017), and question answering Zhang et al. (2016); Diefenbach et al. (2018), to name a few. However, KGs typically suffer from missing relations Socher et al. (2013a); West et al. (2014). This problem gives rise to the task of knowledge base completion (also referred to as relation prediction), which entails predicting whether a given triple is valid or not.

State-of-the-art relation prediction methods are known to be primarily knowledge embedding based models. They are broadly classified as translational models Bordes et al. (2013); Yang et al. (2015); Trouillon et al. (2016) and convolutional neural network (CNN) based models Nguyen et al. (2018); Dettmers et al. (2018). While translational models learn embeddings using simple operations and limited parameters, they produce low-quality embeddings. In contrast, CNN based models learn more expressive embeddings due to their parameter efficiency and consideration of complex relations. However, both translational and CNN based models process each triple independently and hence fail to encapsulate the semantically rich and latent relations that are inherently present in the vicinity of a given entity in a KG.

Motivated by the aforementioned observations, we propose a generalized attention-based graph embedding for relation prediction. For node classification, graph attention networks (GATs) Veličković et al. (2018) have been shown to focus on the most relevant portions of the graph, namely the node features in a 1-hop neighborhood. Given a KG and the task of relation prediction, our model generalizes and extends the attention mechanism by guiding attention to both entity (node) and relation (edge) features in a multi-hop neighborhood of a given entity / node.

Our idea is: 1) to capture multi-hop relations Lin et al. (2015) surrounding a given node, 2) to encapsulate the diversity of roles played by an entity in various relations, and 3) to consolidate the existing knowledge present in semantically similar relation clusters Valverde-Rebaza and de Andrade Lopes (2012). Our model achieves these objectives by assigning different weight mass (attention) to nodes in a neighborhood and by propagating attention via layers in an iterative fashion. However, as the model depth increases, the contribution of distant entities decreases exponentially. To resolve this issue, we use relation composition as proposed by Lin et al. (2015) to introduce an auxiliary edge between $n$-hop neighbors, which then readily allows the flow of knowledge between entities. Our architecture is an encoder-decoder model in which our generalized graph attention model and ConvKB Nguyen et al. (2018) play the roles of encoder and decoder, respectively. Moreover, this method can be extended to learn effective embeddings for Textual Entailment Graphs Kotlerman et al. (2015), where global learning has proven effective in the past, as shown by Berant et al. (2015) and Berant et al. (2010).

Our contributions are as follows. First, to the best of our knowledge, we are the first to learn new graph attention based embeddings that specifically target relation prediction on KGs. Second, we generalize and extend graph attention mechanisms to capture both entity and relation features in a multi-hop neighborhood of a given entity. Finally, we evaluate our model on challenging relation prediction tasks for a wide variety of real-world datasets. Our experimental results indicate a clear and substantial improvement over state-of-the-art relation prediction methods. For instance, on the popular Freebase (FB15K-237) dataset, our attention-based embedding roughly doubles Hits@1 over the strongest baseline (46% versus 22.5% for ConvE, Table 2).

The rest of the paper is structured as follows. We first provide a review of related work in Section 2 and then our detailed approach in Section 3. Experimental results and dataset descriptions are reported in Section 4 followed by our conclusion and future research directions in Section 5.

Figure 1: Subgraph of a knowledge graph containing actual relations between entities (solid lines) and inferred relations that are initially hidden (dashed lines).

2 Related Work

Recently, several variants of KG embeddings have been proposed for relation prediction. These methods can be broadly classified as: (i) compositional, (ii) translational, (iii) CNN based, and (iv) graph based models.

RESCAL Nickel et al. (2011), NTN Socher et al. (2013b), and the Holographic embedding model (HOLE) Nickel et al. (2016) are examples of compositional models. Both RESCAL and NTN use tensor products, which capture rich interactions but require a large number of parameters to model relations and are thus cumbersome to compute. To combat these drawbacks, HOLE creates more efficient and scalable compositional representations using the circular correlation of entity embeddings.

In comparison, translational models like TransE Bordes et al. (2013), DISTMULT Yang et al. (2015) and ComplEx Trouillon et al. (2016) propose arguably simpler models. TransE models a relation as a translation from the head entity to the tail entity. DISTMULT Yang et al. (2015) learns embeddings using a bilinear diagonal model, which is a special case of the bilinear objective used in NTN and TransE, and uses weighted element-wise dot products to model entity relations. ComplEx Trouillon et al. (2016) generalizes DISTMULT by using complex embeddings and Hermitian dot products instead. These translational models are faster, require fewer parameters, and are relatively easier to train, but result in less expressive KG embeddings.
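The scoring functions summarized in the two paragraphs above can be written compactly. The following is a minimal sketch using plain Python lists as embedding vectors; the function names are hypothetical, and the sign convention (a higher score means a more plausible triple, so TransE returns a negated distance) is an assumption made for uniformity rather than something fixed by the cited papers.

```python
def transe_score(h, r, t):
    # TransE: negated L1 distance -||h + r - t||_1, so a perfect translation scores 0
    return -sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))

def distmult_score(h, r, t):
    # DistMult: trilinear product sum_i h_i * r_i * t_i (symmetric in h and t)
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

def complex_score(h, r, t):
    # ComplEx: Re(sum_i h_i * r_i * conj(t_i)) with complex-valued embeddings
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real

def hole_score(h, r, t):
    # HolE: r . (h circularly correlated with t), where
    # (h corr t)_k = sum_i h_i * t_{(i + k) mod d}; the logistic link is omitted
    d = len(h)
    corr = [sum(h[i] * t[(i + k) % d] for i in range(d)) for k in range(d)]
    return sum(rk * ck for rk, ck in zip(r, corr))
```

DistMult gives (h, r, t) and (t, r, h) the same score, whereas the complex conjugate in ComplEx lets the two directions differ, which is the motivation for its complex embeddings.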

Recently, two CNN based models have been proposed for relation prediction, namely ConvE Dettmers et al. (2018) and ConvKB Nguyen et al. (2018). ConvE uses 2-D convolution over embeddings to predict links. It comprises a convolutional layer, a fully connected projection layer, and an inner product layer for the final predictions. In ConvKB, different feature maps are generated using multiple filters to extract global relationships, and the concatenation of these feature maps represents an input triple. These models are parameter efficient but consider each triple independently without taking into account the relationships between triples.

R-GCN Schlichtkrull et al. (2018) is a graph based neural network model that extends graph convolutional networks (GCNs) Kipf and Welling (2017) to relational data. It applies a convolution operation to the neighborhood of each entity and assigns all neighbors equal weights. This graph based model does not outperform the CNN based models.

Existing methods learn KG embeddings either by focusing solely on entity features or by taking into account the features of entities and relations in a disjoint manner. Instead, our proposed graph attention model holistically captures multi-hop and semantically similar relations in the $n$-hop neighborhood of any given entity in the KG.

3 Our Approach

We begin this section by introducing the notations and definitions used in the rest of the paper, followed by a brief background on graph attention networks (GATs) Veličković et al. (2018). Finally, we describe our proposed attention architecture for knowledge graphs followed by our decoder network.

3.1 Background

A knowledge graph is denoted by $\mathcal{G} = (\mathcal{E}, R)$, where $\mathcal{E}$ and $R$ represent the set of entities (nodes) and relations (edges), respectively. A triple $(e_s, r, e_o)$ is represented as an edge $r$ between nodes $e_s$ and $e_o$ in $\mathcal{G}$. (From here onwards, the pairs "node / entity" and "edge / relation" will be used interchangeably.) Embedding models try to learn an effective representation of entities, relations, and a scoring function $f$, such that for a given input triple $t = (e_s, r, e_o)$, $f(t)$ gives the likelihood of $t$ being a valid triple. For example, Figure 1 shows a subgraph of a KG in which missing links (dashed lines) are inferred from existing triples such as (London, capital_of, United Kingdom).

3.2 Graph Attention Networks (GATs)

Graph convolutional networks (GCNs) Kipf and Welling (2017) gather information from the entity’s neighborhood and all neighbors contribute equally in the information passing. To address the shortcomings of GCNs, Veličković et al. (2018) introduced graph attention networks (GATs). GATs learn to assign varying levels of importance to nodes in every node’s neighborhood, rather than treating all neighboring nodes with equal importance, as is done in GCN.

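The input feature set of nodes to a layer is $\mathbf{x} = \{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_N\}$. A layer produces a transformed set of node feature vectors $\mathbf{x}' = \{\vec{x}'_1, \vec{x}'_2, \ldots, \vec{x}'_N\}$, where $\vec{x}_i$ and $\vec{x}'_i$ are the input and output embeddings of the entity $e_i$, and $N$ is the number of entities (nodes). A single GAT layer can be described as

$$e_{ij} = a\left(\mathbf{W}\vec{x}_i, \mathbf{W}\vec{x}_j\right) \tag{1}$$

where $e_{ij}$ is the attention value of the edge $(e_i, e_j)$ in $\mathcal{G}$, $\mathbf{W}$ is a parametrized linear transformation matrix mapping the input features to a higher dimensional output feature space, and $a$ is any attention function of our choosing.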

Attention values for each edge are the importance of the edge $(e_i, e_j)$'s features for a source node $e_i$. Here, the relative attention $\alpha_{ij}$ is computed using a softmax function over all the values in the neighborhood $\mathcal{N}_i$ of $e_i$. Equation 2 shows the output of a layer. Following Vaswani et al. (2017), GAT employs multi-head attention to stabilize the learning process.

$$\vec{x}'_i = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij} \mathbf{W}\vec{x}_j\right) \tag{2}$$

The multi-head attention process of concatenating $K$ attention heads is shown in Equation 3.

$$\vec{x}'_i = \big\Vert_{k=1}^{K} \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha^k_{ij} \mathbf{W}^k \vec{x}_j\right) \tag{3}$$

where $\Vert$ represents concatenation, $\sigma$ represents any non-linear function, $\alpha^k_{ij}$ are the normalized attention coefficients of edge $(e_i, e_j)$ calculated by the $k$-th attention mechanism, and $\mathbf{W}^k$ represents the corresponding linear transformation matrix of the $k$-th attention mechanism. In the final layer, the output embedding is computed by averaging the heads instead of concatenating them, as shown in Equation 4.

$$\vec{x}'_i = \sigma\left(\frac{1}{K}\sum_{k=1}^{K} \sum_{j \in \mathcal{N}_i} \alpha^k_{ij} \mathbf{W}^k \vec{x}_j\right) \tag{4}$$
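To make Equations 1 to 4 concrete, here is a minimal single-layer GAT sketch in plain Python. It assumes the attention function used by Veličković et al. (2018), $a(\mathbf{W}\vec{x}_i, \mathbf{W}\vec{x}_j) = \mathrm{LeakyReLU}(\vec{a}^\top[\mathbf{W}\vec{x}_i \Vert \mathbf{W}\vec{x}_j])$ with negative slope 0.2, and uses tanh only as a placeholder for $\sigma$; the graph layout (a dict of neighbor lists) and all function names are illustrative.

```python
import math

def matvec(W, x):
    # Multiply a matrix W (list of rows) by a vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z

def softmax(scores):
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gat_head(x, neighbors, W, a, sigma=math.tanh):
    """One attention head (Eqs. 1-2) applied to every node.

    x: dict node -> input feature vector; neighbors: dict node -> list of neighbors;
    W: F' x F weight matrix; a: attention vector of length 2F'.
    """
    Wx = {node: matvec(W, v) for node, v in x.items()}
    out = {}
    for i, nbrs in neighbors.items():
        # Eq. 1: e_ij = LeakyReLU(a . [W x_i || W x_j]); list "+" concatenates
        e = [leaky_relu(sum(ak * zk for ak, zk in zip(a, Wx[i] + Wx[j]))) for j in nbrs]
        alpha = softmax(e)  # relative attention over the neighborhood of i
        agg = [sum(alpha[idx] * Wx[j][d] for idx, j in enumerate(nbrs)) for d in range(len(W))]
        out[i] = [sigma(v) for v in agg]  # Eq. 2
    return out

def gat_layer(x, neighbors, heads, final=False, sigma=math.tanh):
    """heads: list of (W, a) pairs; concatenate heads (Eq. 3) or average them (Eq. 4)."""
    if not final:
        outs = [gat_head(x, neighbors, W, a, sigma) for W, a in heads]
        return {i: [v for o in outs for v in o[i]] for i in neighbors}
    # Eq. 4: average the pre-activation head outputs, then apply sigma once
    outs = [gat_head(x, neighbors, W, a, lambda v: v) for W, a in heads]
    return {i: [sigma(sum(col) / len(outs)) for col in zip(*(o[i] for o in outs))]
            for i in neighbors}
```

With an all-zero attention vector every neighbor receives the same weight, so the head reduces to GCN-style mean aggregation; a non-zero attention vector is what lets the layer prefer some neighbors over others.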
3.3 Relations are Important

Despite the success of GATs, they are unsuitable for KGs as they ignore relation (edge) features, which are an integral part of KGs. In KGs, entities play different roles depending on the relation they are associated with. For example, in Figure 1, entity Christopher Nolan appears in two different triples assuming the roles of a brother and a director. To this end, we propose a novel embedding approach to incorporate relation and neighboring node features in the attention mechanism.

Figure 2: This figure shows the aggregation process of our graph attentional layer. The $\alpha$ values represent relative attention values of the edges. The dashed lines represent an auxiliary edge from $n$-hop neighbors, in this case $n = 2$.

We define a single attentional layer, which is the building block of our model. Similar to GAT, our framework is agnostic to the particular choice of attention mechanism.

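Each layer in our model takes two embedding matrices as input. Entity embeddings are represented by a matrix $\mathbf{H} \in \mathbb{R}^{N_e \times T}$, where the $i$-th row is the embedding of entity $e_i$, $N_e$ is the total number of entities, and $T$ is the feature dimension of each entity embedding. With a similar construction, the relation embeddings are represented by a matrix $\mathbf{G} \in \mathbb{R}^{N_r \times P}$, where $N_r$ is the number of relations and $P$ is the feature dimension of each relation embedding. The layer then outputs the corresponding embedding matrices, $\mathbf{H}' \in \mathbb{R}^{N_e \times T'}$ and $\mathbf{G}' \in \mathbb{R}^{N_r \times P'}$.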

In order to obtain the new embedding for an entity $e_i$, a representation of each triple associated with $e_i$ is learned. We learn these embeddings by performing a linear transformation over the concatenation of entity and relation feature vectors corresponding to a particular triple $t_{ij}^k = (e_i, r_k, e_j)$, as shown in Equation 5. This operation is also illustrated in the initial block of Figure 4.

$$\vec{c}_{ijk} = \mathbf{W}_1 \left[\vec{h}_i \,\Vert\, \vec{h}_j \,\Vert\, \vec{g}_k\right] \tag{5}$$

where $\vec{c}_{ijk}$ is the vector representation of a triple $t_{ij}^k$. Vectors $\vec{h}_i$, $\vec{h}_j$, and $\vec{g}_k$ denote embeddings of entities $e_i$, $e_j$ and relation $r_k$, respectively. Additionally, $\mathbf{W}_1$ denotes the linear transformation matrix.

Figure 3: Attention Mechanism
Figure 4: This figure shows the end-to-end architecture of our model. Dashed arrows in the figure represent the concatenation operation. Green circles represent initial entity embedding vectors and yellow circles represent initial relation embedding vectors.

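Similar to Veličković et al. (2018), we learn the importance of each triple $t_{ij}^k$, denoted by $b_{ijk}$. We perform a linear transformation parameterized by a weight matrix $\mathbf{W}_2$ followed by application of the LeakyReLU non-linearity to get the absolute attention value of the triple (Equation 6).

$$b_{ijk} = \mathrm{LeakyReLU}\left(\mathbf{W}_2\, \vec{c}_{ijk}\right) \tag{6}$$

To get the relative attention values, softmax is applied over $b_{ijk}$ as shown in Equation 7. Figure 3 shows the computation of relative attention values for a single triple.

$$\alpha_{ijk} = \mathrm{softmax}_{jk}(b_{ijk}) = \frac{\exp(b_{ijk})}{\sum_{n \in \mathcal{N}_i} \sum_{r \in \mathcal{R}_{in}} \exp(b_{inr})} \tag{7}$$

where $\mathcal{N}_i$ denotes the neighborhood of entity $e_i$ and $\mathcal{R}_{ij}$ denotes the set of relations connecting entities $e_i$ and $e_j$. The new embedding of the entity $e_i$ is the sum of each triple representation weighted by its attention value, as shown in Equation 8.

$$\vec{h}'_i = \sigma\left(\sum_{j \in \mathcal{N}_i} \sum_{k \in \mathcal{R}_{ij}} \alpha_{ijk}\, \vec{c}_{ijk}\right) \tag{8}$$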
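Equations 5 to 8 differ from plain GAT mainly in that attention is computed over triple representations rather than over node pairs. A minimal sketch for a single entity follows. It assumes $\mathbf{W}_2$ is a single row (so each $b_{ijk}$ is a scalar), uses tanh as a placeholder for $\sigma$, stores the neighborhood as hypothetical `(i, k, j)` index tuples, and assumes the entity has at least one triple; it is an illustration, not the authors' implementation.

```python
import math

def matvec(W, x):
    # Multiply a matrix W (list of rows) by a vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z

def kbgat_entity_update(i, triples, h, g, W1, w2, sigma=math.tanh):
    """New embedding of entity e_i from the triples in its neighborhood (Eqs. 5-8).

    triples: list of (i, k, j) index tuples, i.e. entity j linked to entity i by relation k;
    h, g: entity / relation embeddings; W1: matrix applied to [h_i || h_j || g_k];
    w2: W_2 given as a single row, so each b_ijk is a scalar.
    """
    neigh = [(k, j) for (s, k, j) in triples if s == i]
    # Eq. 5: c_ijk = W1 [h_i || h_j || g_k]  (list "+" concatenates)
    c = [matvec(W1, h[i] + h[j] + g[k]) for k, j in neigh]
    # Eq. 6: absolute attention b_ijk = LeakyReLU(W2 c_ijk)
    b = [leaky_relu(sum(w * v for w, v in zip(w2, c_ijk))) for c_ijk in c]
    # Eq. 7: softmax jointly over all neighbors j and relations k
    m = max(b)
    exps = [math.exp(v - m) for v in b]
    total = sum(exps)
    alpha = [e / total for e in exps]
    # Eq. 8: attention-weighted sum of triple vectors, then the non-linearity
    return [sigma(sum(a * c_ijk[d] for a, c_ijk in zip(alpha, c))) for d in range(len(W1))]
```

Because the softmax in Equation 7 runs over every (neighbor, relation) pair, two different relations to the same neighbor receive separate attention weights, which is how the layer captures the different roles an entity plays.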
As suggested by Veličković et al. (2018), multi-head attention, which was first introduced by Vaswani et al. (2017), is used to stabilize the learning process and encapsulate more information about the neighborhood. Essentially, $M$ independent attention mechanisms calculate the embeddings, which are then concatenated, resulting in the following representation:

$$\vec{h}'_i = \big\Vert_{m=1}^{M} \sigma\left(\sum_{j \in \mathcal{N}_i} \sum_{k \in \mathcal{R}_{ij}} \alpha^m_{ijk}\, \vec{c}^{\,m}_{ijk}\right) \tag{9}$$
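This is the graph attention layer shown in Figure 4. We perform a linear transformation on the input relation embedding matrix $\mathbf{G}$, parameterized by a weight matrix $\mathbf{W}^R \in \mathbb{R}^{P \times P'}$, where $P'$ is the dimensionality of the output relation embeddings (Equation 10).

$$\mathbf{G}' = \mathbf{G}\,\mathbf{W}^R \tag{10}$$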
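In the final layer of our model, instead of concatenating the embeddings from multiple heads, we employ averaging to get the final embedding vectors for entities, as shown in Equation 11.

$$\vec{h}'_i = \sigma\left(\frac{1}{M}\sum_{m=1}^{M} \sum_{j \in \mathcal{N}_i} \sum_{k \in \mathcal{R}_{ij}} \alpha^m_{ijk}\, \vec{c}^{\,m}_{ijk}\right) \tag{11}$$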
However, while learning new embeddings, entities lose their initial embedding information. To resolve this issue, we linearly transform $\mathbf{H}^i$ to obtain $\mathbf{H}^t = \mathbf{H}^i \mathbf{W}^E$ using a weight matrix $\mathbf{W}^E \in \mathbb{R}^{T^i \times T^f}$, where $\mathbf{H}^i$ represents the input entity embeddings to our model, $\mathbf{H}^t$ represents the transformed entity embeddings, $T^i$ denotes the dimension of an initial entity embedding, and $T^f$ denotes the dimension of the final entity embedding. We add this initial entity embedding information to the entity embeddings $\mathbf{H}^f \in \mathbb{R}^{N_e \times T^f}$ obtained from the final attentional layer, as shown in Equation 12.

$$\mathbf{H}'' = \mathbf{H}^t + \mathbf{H}^f \tag{12}$$
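Equations 9 to 12 only combine quantities that were already computed. The sketch below uses hypothetical helper names and plain nested lists as matrices to show head concatenation, head averaging (the head outputs passed in are assumed to be pre-activation sums), the relation projection $\mathbf{G}' = \mathbf{G}\mathbf{W}^R$, and the addition $\mathbf{H}'' = \mathbf{H}^t + \mathbf{H}^f$ with $\mathbf{H}^t = \mathbf{H}^i\mathbf{W}^E$.

```python
def matmul(A, B):
    """Plain-Python matrix product of A (n x p) and B (p x q)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def concat_heads(head_outputs):
    """Eq. 9: concatenate the M per-head embeddings of one entity."""
    return [v for head in head_outputs for v in head]

def average_heads(head_outputs, sigma=lambda v: v):
    """Eq. 11: average the M pre-activation head embeddings, then apply sigma."""
    M = len(head_outputs)
    return [sigma(sum(col) / M) for col in zip(*head_outputs)]

def final_embeddings(H_init, H_final, G, W_E, W_R):
    """Eqs. 10 and 12: G' = G W^R and H'' = H^i W^E + H^f."""
    G_out = matmul(G, W_R)
    H_t = matmul(H_init, W_E)  # transformed initial entity embeddings H^t
    H_out = [[t + f for t, f in zip(row_t, row_f)] for row_t, row_f in zip(H_t, H_final)]
    return H_out, G_out
```

The addition in Equation 12 works like a residual connection: even if the attention layers wash out an entity's original features, the projected initial embedding is still present in the output.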
In our architecture, we extend the notion of an edge to a directed path by introducing an auxiliary relation for $n$-hop neighbors between two entities. The embedding of this auxiliary relation is the summation of the embeddings of all the relations in the path. Our model iteratively accumulates knowledge from distant neighbors of an entity. As illustrated in Figure 2, in the first layer of our model, all entities capture information from their direct in-flowing neighbors. In the second layer, U.S. gathers information from entities Barack Obama, Ethan Horvath, Chevrolet, and Washington D.C., which already possess information about their neighbors Michelle Obama and Samuel L. Jackson from the previous layer. In general, for an $n$-layer model, the incoming information is accumulated over an $n$-hop neighborhood. The aggregation process to learn new entity embeddings and the introduction of an auxiliary edge between $n$-hop neighbors are also shown in Figure 2. In every main iteration, we normalize the entity embeddings prior to the first layer and after every generalized GAT layer.
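The auxiliary relations can be generated by enumerating directed paths of length $n$. A minimal sketch follows, assuming triples are `(head, relation, tail)` tuples and skipping paths that revisit an entity (a choice the text does not specify); the auxiliary embedding is the element-wise sum of the relation embeddings along the path, as described above. The function names and the relation names used in the tests are hypothetical.

```python
from collections import defaultdict

def auxiliary_edges(triples, n=2):
    """Turn every directed n-hop path into an auxiliary edge.

    triples: iterable of (head, relation, tail). Returns a list of
    (source, relation_path, target) with relation_path a tuple of n relations.
    Paths that revisit an entity are skipped (an assumption of this sketch).
    """
    out_edges = defaultdict(list)
    for h, r, t in triples:
        out_edges[h].append((r, t))
    aux = []

    def walk(start, node, rels, visited):
        if len(rels) == n:
            aux.append((start, tuple(rels), node))
            return
        for r, nxt in out_edges[node]:
            if nxt not in visited:
                walk(start, nxt, rels + [r], visited | {nxt})

    for start in list(out_edges):  # copy: walk() may add empty keys
        walk(start, start, [], {start})
    return aux

def auxiliary_embedding(relation_path, g):
    """Embedding of an auxiliary relation: sum of the relation embeddings on the path."""
    return [sum(vals) for vals in zip(*(g[r] for r in relation_path))]
```

For instance, the two triples (Michelle Obama, spouse_of, Barack Obama) and (Barack Obama, president_of, U.S.) yield one auxiliary edge from Michelle Obama to U.S. whose relation path is (spouse_of, president_of), so U.S. can receive information from Michelle Obama in a single hop.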

3.4 Training Objective

Our model borrows the idea of a translational scoring function from Bordes et al. (2013), which learns embeddings such that for a given valid triple $t_{ij}^k = (e_i, r_k, e_j)$, the condition $\vec{h}_i + \vec{g}_k \approx \vec{h}_j$ holds, i.e., $e_j$ is the nearest neighbor of $e_i$ connected via relation $r_k$. Specifically, we try to learn entity and relation embeddings to minimize the L1-norm dissimilarity measure given by $d_{t_{ij}} = \lVert \vec{h}_i + \vec{g}_k - \vec{h}_j \rVert_1$.

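We train our model using the hinge loss, which is given by the following expression:

$$\mathcal{L} = \sum_{t_{ij} \in S} \sum_{t'_{ij} \in S'} \max\left\{ d_{t_{ij}} - d_{t'_{ij}} + \gamma,\ 0 \right\} \tag{13}$$

where $\gamma > 0$ is a margin hyper-parameter, $S$ is the set of valid triples, and $S'$ denotes the set of invalid triples, given formally as

$$S' = \{ t_{i'j}^k \mid e'_i \in \mathcal{E} \setminus e_i \} \cup \{ t_{ij'}^k \mid e'_j \in \mathcal{E} \setminus e_j \} \tag{14}$$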
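The distance $d_{t_{ij}}$ and the hinge loss of Equation 13 can be sketched as follows. Triples are hypothetical `(i, k, j)` index tuples into entity and relation embedding tables, the margin value is an arbitrary placeholder, and the loss sums over every (valid, invalid) pair exactly as the double sum is written; practical implementations usually pair each valid triple with a few sampled negatives instead.

```python
def l1_distance(h_i, g_k, h_j):
    # d_t = ||h_i + g_k - h_j||_1, small when h_i + g_k lands near h_j
    return sum(abs(a + b - c) for a, b, c in zip(h_i, g_k, h_j))

def hinge_loss(valid, invalid, H, G, gamma=1.0):
    """Eq. 13: sum over all (t, t') pairs of max(d_t - d_t' + gamma, 0).

    valid, invalid: lists of (i, k, j) index tuples; H, G: entity / relation embeddings;
    gamma=1.0 is an arbitrary placeholder margin.
    """
    def d(t):
        i, k, j = t
        return l1_distance(H[i], G[k], H[j])

    return sum(max(d(t) - d(t_neg) + gamma, 0.0) for t in valid for t_neg in invalid)
```

A pair contributes zero loss once the invalid triple is at least $\gamma$ farther away than the valid one, so training effort concentrates on negatives that are still too close.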
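| Dataset | # Entities | # Relations | # Edges (Training) | # Edges (Validation) | # Edges (Test) | # Edges (Total) | Mean in-degree | Median in-degree |
|---|---|---|---|---|---|---|---|---|
| WN18RR | 40,943 | 11 | 86,835 | 3034 | 3134 | 93,003 | 2.12 | 1 |
| FB15k-237 | 14,541 | 237 | 272,115 | 17,535 | 20,466 | 310,116 | 18.71 | 8 |
| NELL-995 | 75,492 | 200 | 149,678 | 543 | 3992 | 154,213 | 1.98 | 0 |
| Kinship | 104 | 25 | 8544 | 1068 | 1074 | 10,686 | 82.15 | 82.5 |
| UMLS | 135 | 46 | 5216 | 652 | 661 | 6529 | 38.63 | 20 |

Table 1: Dataset statistics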

3.5 Decoder

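Our model uses ConvKB Nguyen et al. (2018) as a decoder. The aim of the convolutional layer is to analyze the global embedding properties of a triple $t_{ij}^k$ across each dimension and to generalize the translational characteristics in our model. The score function with multiple feature maps can be written formally as:

$$f(t_{ij}^k) = \left( \big\Vert_{m=1}^{\Omega} \mathrm{ReLU}\left(\left[\vec{h}_i, \vec{g}_k, \vec{h}_j\right] * \omega^m\right) \right) \cdot \mathbf{W} \tag{15}$$

where $\omega^m$ represents the $m$-th convolutional filter, $\Omega$ is a hyper-parameter denoting the number of filters used, $*$ is a convolution operator, and $\mathbf{W}$ represents a linear transformation matrix used to compute the final score of the triple. The model is trained using the soft-margin loss

$$\mathcal{L} = \sum_{t_{ij}^k \in \{S \cup S'\}} \log\left(1 + \exp\left(l_{t_{ij}^k} \cdot f(t_{ij}^k)\right)\right) + \frac{\lambda}{2} \lVert \mathbf{W} \rVert_2^2 \tag{16}$$

where $l_{t_{ij}^k} = 1$ for $t_{ij}^k \in S$, $l_{t_{ij}^k} = -1$ for $t_{ij}^k \in S'$, and $\lambda$ weights the $L_2$ regularization term.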
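A minimal sketch of the ConvKB score (Equation 15) and soft-margin loss (Equation 16) follows. Each filter is assumed to be a $1 \times 3$ window sliding over the rows of the $d \times 3$ matrix $[\vec{h}_i, \vec{g}_k, \vec{h}_j]$, as in ConvKB; the convolution bias is omitted for brevity, and the function names are illustrative. With the labels of Equation 16, valid triples are pushed toward low scores, which matches the ascending sort used at evaluation time.

```python
import math

def convkb_score(h, g, t, filters, W):
    """Eq. 15: f = concat_m ReLU([h, g, t] * omega^m) . W (bias omitted).

    h, g, t: embeddings of equal length d, stacked column-wise into a d x 3 matrix;
    filters: Omega filters, each a length-3 weight list; W: weight vector of length Omega * d.
    """
    rows = list(zip(h, g, t))  # row r of the d x 3 input matrix
    features = []
    for omega in filters:
        # the 1 x 3 filter slides down the rows, giving a d-dimensional feature map
        features.extend(max(0.0, sum(w * v for w, v in zip(omega, row))) for row in rows)
    return sum(f * w for f, w in zip(features, W))

def soft_margin_loss(scores, labels, W, lam=0.0):
    """Eq. 16: sum log(1 + exp(l_t * f(t))) + lam / 2 * ||W||^2, l_t = +1 valid, -1 invalid."""
    data_term = sum(math.log1p(math.exp(l * s)) for s, l in zip(scores, labels))
    return data_term + lam / 2 * sum(w * w for w in W)
```

The filter `[1, 1, -1]` computes $\vec{h}_i + \vec{g}_k - \vec{h}_j$ row by row, so for embeddings that satisfy the translational condition exactly it produces an all-zero feature map; this is one way to see how ConvKB can generalize the translational characteristics mentioned above.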
4 Experiments and Results

| Model | WN18RR MR | WN18RR MRR | WN18RR Hits@1 | WN18RR Hits@3 | WN18RR Hits@10 | FB15K-237 MR | FB15K-237 MRR | FB15K-237 Hits@1 | FB15K-237 Hits@3 | FB15K-237 Hits@10 |
|---|---|---|---|---|---|---|---|---|---|---|
| DistMult Yang et al. (2015) | 7000 | 0.444 | 41.2 | 47 | 50.4 | 512 | 0.281 | 19.9 | 30.1 | 44.6 |
| ComplEx Trouillon et al. (2016) | 7882 | 0.449 | 40.9 | 46.9 | 53 | 546 | 0.278 | 19.4 | 29.7 | 45 |
| ConvE Dettmers et al. (2018) | 4464 | 0.456 | 41.9 | 47 | 53.1 | 245 | 0.312 | 22.5 | 34.1 | 49.7 |
| TransE Bordes et al. (2013) | 2300 | 0.243 | 4.27 | 44.1 | 53.2 | 323 | 0.279 | 19.8 | 37.6 | 44.1 |
| ConvKB Nguyen et al. (2018) | 1295 | 0.265 | 5.82 | 44.5 | 55.8 | 216 | 0.289 | 19.8 | 32.4 | 47.1 |
| R-GCN Schlichtkrull et al. (2018) | 6700 | 0.123 | 20.7 | 13.7 | 8 | 600 | 0.164 | 10 | 18.1 | 30 |
| Our work | 1940 | 0.440 | 36.1 | 48.3 | 58.1 | 210 | 0.518 | 46 | 54 | 62.6 |

Table 2: Experimental results on WN18RR and FB15K-237 test sets. Hits@N values are in percentage. The best score is in bold and second best score is underlined.
| Model | NELL-995 MR | NELL-995 MRR | NELL-995 Hits@1 | NELL-995 Hits@3 | NELL-995 Hits@10 | Kinship MR | Kinship MRR | Kinship Hits@1 | Kinship Hits@3 | Kinship Hits@10 |
|---|---|---|---|---|---|---|---|---|---|---|
| DistMult Yang et al. (2015) | 4213 | 0.485 | 40.1 | 52.4 | 61 | 5.26 | 0.516 | 36.7 | 58.1 | 86.7 |
| ComplEx Trouillon et al. (2016) | 4600 | 0.482 | 39.9 | 52.8 | 60.6 | 2.48 | 0.823 | 73.3 | 89.9 | 97.11 |
| ConvE Dettmers et al. (2018) | 3560 | 0.491 | 40.3 | 53.1 | 61.3 | 2.03 | 0.833 | 73.8 | 91.7 | 98.14 |
| TransE Bordes et al. (2013) | 2100 | 0.401 | 34.4 | 47.2 | 50.1 | 6.8 | 0.309 | 0.9 | 64.3 | 84.1 |
| ConvKB Nguyen et al. (2018) | 600 | 0.43 | 37.0 | 47 | 54.5 | 3.3 | 0.614 | 43.62 | 75.5 | 95.3 |
| R-GCN Schlichtkrull et al. (2018) | 7600 | 0.12 | 8.2 | 12.6 | 18.8 | 25.92 | 0.109 | 3 | 8.8 | 23.9 |
| Our work | 965 | 0.530 | 44.7 | 56.4 | 69.5 | 1.94 | 0.904 | 85.9 | 94.1 | 98 |

Table 3: Experimental results on NELL-995 and Kinship test sets. Hits@N values are in percentage. The best score is in bold and second best score is underlined.

4.1 Datasets

To evaluate our proposed method, we use five benchmark datasets: WN18RR Dettmers et al. (2018), FB15k-237 Toutanova et al. (2015), NELL-995 Xiong et al. (2017), Unified Medical Language Systems (UMLS) Kok and Domingos (2007) and Alyawarra Kinship Lin et al. (2018). Previous works Toutanova et al. (2015); Dettmers et al. (2018) suggest that the task of relation prediction in WN18 and FB15K suffers from the problem of inverse relations, whereby one can achieve state-of-the-art results using a simple reversal rule based model, as shown by Dettmers et al. (2018). Therefore, corresponding subset datasets WN18RR and FB15k-237 were created to resolve the reversible relation problem in WN18 and FB15K. We used the data splits provided by Nguyen et al. (2018). Table 1 provides statistics of all datasets used.

4.2 Training Protocol

We create two sets of invalid triples, each time replacing either the head or the tail entity in a triple with an invalid entity. We randomly sample an equal number of invalid triples from both sets to ensure robust performance in detecting both head and tail entities. Entity and relation embeddings produced by TransE Bordes et al. (2013); Nguyen et al. (2018) are used to initialize our embeddings.
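The negative sampling described above can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, corruptions that are already known valid triples are rejected and redrawn (so the loop assumes at least one valid corruption of each kind exists), and the first half of the samples replace the head while the rest replace the tail.

```python
import random

def sample_invalid_triples(triple, entities, known, num, rng=random):
    """Corrupt a valid (head, relation, tail) triple num times.

    The first num // 2 samples replace the head and the rest replace the tail;
    candidates found in `known` (valid triples) are rejected and redrawn.
    """
    h, r, t = triple
    negatives = []
    for replace_head in [True] * (num // 2) + [False] * (num - num // 2):
        while True:
            e = rng.choice(entities)
            cand = (e, r, t) if replace_head else (h, r, e)
            if cand not in known:
                negatives.append(cand)
                break
    return negatives
```

Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible across runs.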

We follow a two-step training procedure, i.e., we first train our generalized GAT to encode information about the graph entities and relations and then train a decoder model like ConvKB Nguyen et al. (2018) to perform the relation prediction task. The original GAT update (Equation 3) only aggregates information passed from the 1-hop neighborhood, while our generalized GAT uses information from the $n$-hop neighborhood. We use auxiliary relations to aggregate more information about the neighborhood in sparse graphs. We use Adam to optimize all the parameters with initial learning rate set at . Both the entity and relation embeddings of the final layer are set to . The optimal hyper-parameters for each dataset are listed in our supplementary section.

4.3 Evaluation Protocol

In the relation prediction task, the aim is to predict a triple $(e_i, r_k, e_j)$ with $e_i$ or $e_j$ missing, i.e., predict $e_i$ given $(r_k, e_j)$ or predict $e_j$ given $(e_i, r_k)$. We generate a set of $(N_e - 1)$ corrupt triples for each entity $e_i$ by replacing it with every other entity $e'_i \in \mathcal{E} \setminus e_i$, and then assign a score to each such triple. Subsequently, we sort these scores in ascending order and get the rank of the correct triple $(e_i, r_k, e_j)$. Similar to previous work (Bordes et al. (2013), Nguyen et al. (2018), Dettmers et al. (2018)), we evaluate all the models in a filtered setting, i.e., during ranking we remove corrupt triples that are already present in one of the training, validation, or test sets. This whole process is repeated by replacing the tail entity $e_j$, and averaged metrics are reported. We report mean reciprocal rank (MRR), mean rank (MR), and the proportion of correct entities in the top $N$ ranks (Hits@N) for $N = 1, 3$, and $10$.
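The filtered ranking protocol and the reported metrics can be sketched as below. The sketch assumes lower scores mean more plausible triples (the ascending sort described above) and breaks ties optimistically by counting only strictly better candidates; tie handling is not specified in the text, and the function names are hypothetical.

```python
def filtered_rank(score_fn, triple, entities, known, corrupt_tail=True):
    """Rank of the correct triple among all corruptions of one side (1 = best).

    Corrupted triples already in the training, validation or test sets (known)
    are filtered out before ranking.
    """
    h, r, t = triple
    target = score_fn(*triple)
    rank = 1
    for e in entities:
        cand = (h, r, e) if corrupt_tail else (e, r, t)
        if cand == triple or cand in known:
            continue
        if score_fn(*cand) < target:  # strictly better candidate
            rank += 1
    return rank

def metrics(ranks, ns=(1, 3, 10)):
    """Mean rank, mean reciprocal rank, and Hits@N as percentages."""
    mr = sum(ranks) / len(ranks)
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = {n: 100.0 * sum(r <= n for r in ranks) / len(ranks) for n in ns}
    return mr, mrr, hits
```

Filtering matters because a corruption that happens to be another true fact would otherwise be counted against the model even though ranking it highly is correct.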

4.4 Results and Analysis

Tables 2 and 3 present the prediction results on the test sets of all the datasets. The results clearly demonstrate that our proposed method significantly outperforms state-of-the-art results on five metrics for FB15k-237 and on two metrics for WN18RR. We downloaded the publicly available source code of TransE, DistMult, ComplEx, R-GCN, ConvE, and ConvKB to reproduce the results of these state-of-the-art methods on all the datasets.

Figure 5: Learning process of our model on the FB15K-237 dataset, with panels at (a) epoch 0, (b) epoch 1200, and (c) epoch 2400. The y-axis represents attention values.
Figure 6: Learning process of our model on the WN18RR dataset, with panels at (a) epoch 0, (b) epoch 1800, and (c) epoch 3600. The y-axis represents attention values.

Attention Values vs Epochs: We study the distribution of attention with increasing epochs for a particular node. Figure 5 shows this distribution on FB15k-237. In the initial stages of the learning process, the attention is distributed randomly. As training progresses and our model gathers more information from the neighborhood, it assigns more attention to direct neighbors and takes less information from more distant neighbors. Once the model converges, it learns to gather multi-hop and clustered relation information from the $n$-hop neighborhood of the node.

PageRank Analysis: We hypothesize that complex and hidden multi-hop relations among entities are captured more succinctly in dense graphs than in sparse graphs. To test this hypothesis, we perform an analysis similar to ConvE, which studies the correlation between mean PageRank and the increase in MRR relative to DistMult. We observe a strong positive correlation between the two. Table 4 indicates that an increase in mean PageRank is accompanied by a corresponding increase in MRR. We observe an exception to this correlation for NELL-995 versus WN18RR and attribute it to the highly sparse and hierarchical structure of WN18RR, which poses a challenge to our method because it does not capture information in a top-down recursive fashion.
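Since the text does not state which correlation coefficient is used, the following sketch assumes Pearson's $r$ and computes it from the five rows of Table 4. On these values it comes out at roughly 0.81.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Mean PageRank and relative MRR increase w.r.t. DistMult, copied from Table 4
pagerank = {"NELL-995": 1.32, "WN18RR": 2.44, "FB15k-237": 6.87, "UMLS": 740, "Kinship": 961}
mrr_gain = {"NELL-995": 0.025, "WN18RR": -0.01, "FB15k-237": 0.237, "UMLS": 0.247, "Kinship": 0.388}
names = list(pagerank)
r = pearson([pagerank[d] for d in names], [mrr_gain[d] for d in names])
```

Pearson's $r$ is dominated here by the two dense datasets (UMLS and Kinship), whose PageRank values are two orders of magnitude larger than the rest; a rank-based coefficient would weigh the three sparse datasets more evenly.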

4.5 Ablation Study

We carry out an ablation study on our model, in which we analyze the behavior of mean rank on a test set when we omit path generalization (PG), i.e., remove $n$-hop information, and when we omit relation information (Relations) from our model. Figure 7 shows that our model performs better than the two ablated models and that results drop significantly with the ablated models on NELL-995. Removing the relations from the proposed model has a huge impact on the results, which suggests that relation embeddings play a pivotal role in relation prediction.

| Dataset | Mean PageRank | Relative increase in MRR |
|---|---|---|
| NELL-995 | 1.32 | 0.025 |
| WN18RR | 2.44 | -0.01 |
| FB15k-237 | 6.87 | 0.237 |
| UMLS | 740 | 0.247 |
| Kinship | 961 | 0.388 |

Table 4: Mean PageRank vs. relative increase in MRR w.r.t. DistMult.
Figure 7: Epochs vs. mean rank for our model and two ablated models on NELL-995. PG (green) represents the model after removing $n$-hop auxiliary relations (path generalization), Relations (blue) represents the model without relations, and Our model (red) represents the entire model.

5 Conclusion and Future Work

In this paper, we propose a novel approach for relation prediction. Our approach improves over the state-of-the-art models by significant margins. Our proposed model learns new graph attention-based embeddings that specifically cater to relation prediction on KGs. Additionally, we generalize and extend graph attention mechanisms to capture both entity and relation features in a multi-hop neighborhood of a given entity. Our detailed and exhaustive empirical analysis gives more insight into our method’s superiority for relation prediction on KGs. The proposed model can be extended to learn embeddings for various tasks using KGs such as dialogue generation He et al. (2017); Keizer et al. (2017), and question answering Zhang et al. (2016); Diefenbach et al. (2018).

In the future, we intend to extend our method to better perform on hierarchical graphs and capture higher-order relations between entities (like motifs) in our graph attention model.


We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.


  • Berant et al. (2015) Jonathan Berant, Noga Alon, Ido Dagan, and Jacob Goldberger. 2015. Efficient global learning of entailment graphs. Computational Linguistics, 41(2):221–263.
  • Berant et al. (2013) Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2013, pages 1533–1544.
  • Berant et al. (2010) Jonathan Berant, Ido Dagan, and Jacob Goldberger. 2010. Global learning of focused entailment graphs. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1220–1229, Uppsala, Sweden. Association for Computational Linguistics.
  • Berant and Liang (2014) Jonathan Berant and Percy Liang. 2014. Semantic parsing via paraphrasing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), 2014.
  • Bordes et al. (2013) Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems (NIPS), 2013, pages 2787–2795.
  • Dettmers et al. (2018) Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. 2018. Convolutional 2D knowledge graph embeddings. In Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), 2018.

  • Diefenbach et al. (2018) Dennis Diefenbach, Kamal Singh, and Pierre Maret. 2018. WDAqua-core1: a question answering service for RDF knowledge bases. In Companion Proceedings of The Web Conference 2018 (WWW), 2018, pages 1087–1091. International World Wide Web Conferences Steering Committee.
  • He et al. (2017) He He, Anusha Balakrishnan, Mihail Eric, and Percy Liang. 2017. Learning symmetric collaborative dialogue agents with dynamic knowledge graph embeddings. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL), 2017.
  • Keizer et al. (2017) Simon Keizer, Markus Guhe, Heriberto Cuayahuitl, Ioannis Efstathiou, Klaus-Peter Engelbrecht, Mihai Dobre, Alex Lascarides, and Oliver Lemon. 2017. Evaluating persuasion strategies and deep reinforcement learning methods for negotiation dialogue agents. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2017.
  • Kipf and Welling (2017) Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), 2017.
  • Kok and Domingos (2007) Stanley Kok and Pedro Domingos. 2007. Statistical predicate invention. In Proceedings of the 24th International Conference on Machine Learning (ICML), 2007.
  • Kotlerman et al. (2015) Lili Kotlerman, Ido Dagan, Bernardo Magnini, and Luisa Bentivogli. 2015. Textual entailment graphs. Natural Language Engineering, 21(5):699–724.
  • Lin et al. (2018) Xi Victoria Lin, Richard Socher, and Caiming Xiong. 2018. Multi-hop knowledge graph reasoning with reward shaping. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018.
  • Lin et al. (2015) Yankai Lin, Zhiyuan Liu, Huanbo Luan, Maosong Sun, Siwei Rao, and Song Liu. 2015. Modeling relation paths for representation learning of knowledge bases. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015.
  • Nguyen et al. (2018) Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, and Dinh Phung. 2018. A novel embedding model for knowledge base completion based on convolutional neural network. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2018, volume 2, pages 327–333.
  • Nickel et al. (2016) Maximilian Nickel, Lorenzo Rosasco, and Tomaso Poggio. 2016. Holographic embeddings of knowledge graphs. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI), 2016.
  • Nickel et al. (2011) Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2011. A three-way model for collective learning on multi-relational data. In Proceedings of the 28th International Conference on Machine Learning (ICML), 2011.
  • Schlichtkrull et al. (2018) Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In European Semantic Web Conference (ESWC), 2018, pages 593–607.
  • Socher et al. (2013a) Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. 2013a. Reasoning with neural tensor networks for knowledge base completion. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26 (NIPS), 2013, pages 926–934.
  • Socher et al. (2013b) Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. 2013b. Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems (NIPS), 2013, pages 926–934.
  • Toutanova et al. (2015) Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury, and Michael Gamon. 2015. Representing text for joint embedding of text and knowledge bases. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015, pages 1499–1509.
  • Trouillon et al. (2016) Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex embeddings for simple link prediction. In International Conference on Machine Learning (ICML), 2016, pages 2071–2080.
  • Valverde-Rebaza and de Andrade Lopes (2012) Jorge Carlos Valverde-Rebaza and Alneu de Andrade Lopes. 2012. Link prediction in complex networks based on cluster information. In Proceedings of the 21st Brazilian Conference on Advances in Artificial Intelligence (SBIA), 2012, pages 92–101.
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems (NIPS), 2017, pages 5998–6008.
  • Veličković et al. (2018) Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph attention networks. In International Conference on Learning Representations (ICLR), 2018.
  • West et al. (2014) Robert West, Evgeniy Gabrilovich, Kevin Murphy, Shaohua Sun, Rahul Gupta, and Dekang Lin. 2014. Knowledge base completion via search-based question answering. In Proceedings of the 23rd International Conference on World Wide Web (WWW), 2014, pages 515–526.
  • Xiong et al. (2017) Wenhan Xiong, Thien Hoang, and William Yang Wang. 2017. DeepPath: A reinforcement learning method for knowledge graph reasoning. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.
  • Yang et al. (2015) Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2015. Embedding entities and relations for learning and inference in knowledge bases. In International Conference on Learning Representations (ICLR), 2015.
  • Zhang et al. (2016) Yuanzhe Zhang, Kang Liu, Shizhu He, Guoliang Ji, Zhanyi Liu, Hua Wu, and Jun Zhao. 2016. Question answering over knowledge base with neural attention combining global knowledge information. arXiv preprint arXiv:1606.00979, 2016.