Graphs are data structures suitable for a wide variety of applications, including knowledge representation. They are useful for encoding information from different domains by representing discrete entities and relations of different types between them, forming a Knowledge Graph (KG). A common way to answer a question using a KG is to pose it as a structured query (for example, using the SPARQL query language [9]). The query is then answered via logical inference, using the information present in the graph. However, knowledge graphs are usually incomplete, either due to the construction process or due to their dynamic nature. This means that there will be cases where these systems return no answer for a query. To circumvent this problem, one could use query relaxation techniques, which analyze the query and modify it to reduce the constraints that an entity must meet to be considered an answer [4, 6].
We address this problem by answering a query without being limited by an incomplete KG, while avoiding any direct modification of the query. We follow recent works that propose to map the query and all entities in the KG to an embedding space [7, 17, 11]. There, we can compute similarity scores to produce a ranked list of answers, even if the graph is missing some of the information needed to answer the original query.
In this work, we propose Message Passing Query Embedding (MPQE), motivated by the observation that queries over a KG can be represented by small graphs, where nodes correspond to entities (constants) and variables in the query. We employ a Graph Neural Network (GNN) to perform message passing on the query graph, and an aggregation function to combine all the messages into a single vector, which acts as a representation of the query in the embedding space. Our architecture is illustrated in Fig. 1. By training on the task of query answering, our method jointly learns embeddings of entities and variables. Our contributions can be summarized as follows:
We propose a novel method to embed queries over knowledge graphs that addresses limitations of previous works in terms of computational complexity and of the diversity of query structures they admit.
We introduce three datasets for the evaluation of complex query answering over knowledge graphs from thousands to millions of entities and edges.
We carry out multiple experiments to evaluate the performance of methods on query answering. Our results show that our architecture is competitive with the state-of-the-art method for query embedding, when training on multiple query structures. We demonstrate the superior generalization properties of our method by training for link prediction only. The results show that MPQE generalizes to much more complex queries not seen during training.
We conduct a qualitative analysis of the entity embeddings produced by query embedding methods, and show that MPQE learns a more structured embedding space.
2 Problem Definition
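We define a Knowledge Graph (KG) as a tuple $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is a set of nodes representing entities, and $\mathcal{E}$ a set of typed edges between the nodes. A function $\tau: \mathcal{V} \to \mathcal{T}$ assigns a type to every node, where $\mathcal{T}$ is a set of node types. Each edge corresponds to a relation between two nodes $e_i$ and $e_j$, which we denote by $r(e_i, e_j)$, where $r \in \mathcal{R}$ is a relation type.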
Given a KG, we can pose queries that seek an entity satisfying certain conditions. One way to define these conditions is to use a conjunctive form, which consists of a conjunction of binary predicates whose arguments are entities or query variables. The condition specifies constraints on the relations between entities and variables.
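To illustrate this, consider a KG of an academic institution where researchers work on topics, and topics are related to projects. We can formulate the following query: “select all projects $P_?$, such that topic $T$ is related to $P_?$, and both alice and bob work on $T$.” This query asks for entities that satisfy the following condition:

$$q = P_? \,.\, \exists T : \mathrm{related}(T, P_?) \wedge \mathrm{works\_on}(\mathrm{alice}, T) \wedge \mathrm{works\_on}(\mathrm{bob}, T) \tag{1}$$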
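In general, a query is defined by a condition on a target variable $V_?$ as follows:

$$q = V_? \,.\, \exists V_1, \ldots, V_m : c_1 \wedge c_2 \wedge \cdots \wedge c_n \tag{2}$$

where $c_\ell = r(a, b)$ with $r \in \mathcal{R}$, and $a$ and $b$ are either entities in $\mathcal{V}$ or query variables in $\{V_?, V_1, \ldots, V_m\}$. An entity is therefore considered an answer if it satisfies the condition defined by the query.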
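We address the problem of returning a list of entities that satisfy the query, even when the binary predicates would require edges missing in the KG. To do so, we assign an embedding $\mathbf{e}_v \in \mathbb{R}^d$ to every entity $v \in \mathcal{V}$. We additionally define an embedding method for the query that maps the complete query $q$ to a vector $\mathbf{q} \in \mathbb{R}^d$. We can then score each entity in the KG as an answer to the query, using the cosine similarity between $\mathbf{q}$ and the entity embedding $\mathbf{e}_v$:

$$\operatorname{score}(\mathbf{q}, \mathbf{e}_v) = \frac{\mathbf{q}^\top \mathbf{e}_v}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{e}_v \rVert} \tag{3}$$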
The problem thus requires a specification of a query embedding mechanism that captures the properties of the entities relevant to the query.
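The scoring step of eq. 3 can be sketched in a few lines of plain Python. This is a minimal illustration rather than the MPQE implementation: the entity names and 2-dimensional embeddings are hypothetical toy values, and `cosine_score` and `rank_entities` are names introduced only for this sketch.

```python
import math


def cosine_score(q, e):
    """Cosine similarity between a query embedding q and an entity embedding e."""
    dot = sum(qi * ei for qi, ei in zip(q, e))
    norm = math.sqrt(sum(x * x for x in q)) * math.sqrt(sum(x * x for x in e))
    return dot / norm


def rank_entities(q, entity_embeddings):
    """Return entity names sorted from best to worst answer for the query."""
    scores = {name: cosine_score(q, e) for name, e in entity_embeddings.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy embeddings, for illustration only.
entities = {"project_a": [1.0, 0.0], "project_b": [0.6, 0.8], "topic_x": [-1.0, 0.0]}
query = [0.8, 0.6]
print(rank_entities(query, entities))  # → ['project_b', 'project_a', 'topic_x']
```

Because the ranking needs only one score per entity, it returns an ordering even when no entity satisfies the query exactly in the observed graph.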
3 Message Passing Query Embedding
As noted in previous work [7, 17], some queries in conjunctive form can be represented as a Directed Acyclic Graph (DAG). In this graph, the leaf nodes correspond to entities in the query, the root to the variable to be retrieved, and any intermediate nodes to other variables in the query. In the SPARQL query language, these graphs are Basic Graph Patterns (BGP) [9], which, in contrast with the DAGs above, are not constrained to having entities only at the leaves. In this work, we are concerned with this latter, more general case.
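Given a query of the form in eq. 2, we define the query graph as the tuple $G_q = (\mathcal{V}_q, \mathcal{E}_q)$. Here, $\mathcal{V}_q = \mathcal{V}_e \cup \mathcal{V}_t$ is the union of the entity nodes $\mathcal{V}_e$ and the type nodes $\mathcal{V}_t$, where $\mathcal{V}_e$ contains a node for each entity present in the binary predicates, and $\mathcal{V}_t$ a node for each variable. To construct $\mathcal{E}_q$, we add one edge for each binary predicate in the query.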
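As an illustration, the construction of the query graph can be sketched for the academic example of Section 2. The sketch assumes predicates are given as `(relation, head, tail)` triples and that variables are identified by name; the relation names `related` and `works_on`, the variable names `P?` and `T`, and the helper `build_query_graph` are hypothetical.

```python
from collections import namedtuple

# Hypothetical container for G_q: entity nodes, variable (type) nodes, and edges.
QueryGraph = namedtuple("QueryGraph", ["entity_nodes", "variable_nodes", "edges"])


def build_query_graph(predicates, variables):
    """Build a query graph from binary predicates given as (relation, head, tail).

    Arguments listed in `variables` become variable nodes; every other
    argument is treated as an entity node.
    """
    entity_nodes, variable_nodes, edges = set(), set(), []
    for relation, head, tail in predicates:
        for arg in (head, tail):
            (variable_nodes if arg in variables else entity_nodes).add(arg)
        edges.append((head, relation, tail))  # one edge per binary predicate
    return QueryGraph(entity_nodes, variable_nodes, edges)


# Projects P? related to a topic T on which both alice and bob work.
predicates = [
    ("related", "T", "P?"),
    ("works_on", "alice", "T"),
    ("works_on", "bob", "T"),
]
g = build_query_graph(predicates, variables={"P?", "T"})
print(sorted(g.entity_nodes), sorted(g.variable_nodes), len(g.edges))
# → ['alice', 'bob'] ['P?', 'T'] 3
```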
The representation of the query as a graph allows us to combine the use of entity embeddings with recent advances in neural networks for graph-structured data [19]. Our method, which we call Message Passing Query Embedding (MPQE), has three steps: initialization of the nodes, message passing, and aggregation of the states of the nodes into one embedding for the query.
3.1 Model Definition
The first step is initializing the nodes of the query graph. We do this by assigning an initial feature vector $\mathbf{x}_v$ to every node $v$ in the query graph, given by a one-hot representation with $|\mathcal{V}|$ elements if $v$ is an entity node, or $|\mathcal{T}|$ elements (indexed by its type $\tau(v)$) if it is a variable node. This representation is used to index embedding matrices that project the nodes into a low-dimensional space. We define a matrix of entity embeddings $\mathbf{E} \in \mathbb{R}^{d \times |\mathcal{V}|}$, where $d$ is the dimension of the embedding space, and type embeddings with a matrix $\mathbf{T} \in \mathbb{R}^{d \times |\mathcal{T}|}$. The node embedding function is defined as follows:

$$\mathbf{h}_v^{(0)} = \begin{cases} \mathbf{E}\,\mathbf{x}_v & \text{if } v \in \mathcal{V}_e \\ \mathbf{T}\,\mathbf{x}_v & \text{if } v \in \mathcal{V}_t \end{cases} \tag{4}$$

In words, each entity has its own embedding, and each variable is initialized with the embedding of its type. Note that we overload the definition of $\tau$ to also provide types for the variable nodes of queries.
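The lookup in eq. 4 can be sketched as follows, assuming the embeddings are stored as one Python list per column of $\mathbf{E}$ and $\mathbf{T}$. The dimension, entity and type names, and the random initialization are hypothetical; `one_hot_product` only checks that multiplying a matrix by a one-hot vector is the same as selecting one of its columns.

```python
import random


def init_node_features(query_nodes, variables, node_type,
                       entity_index, type_index, entity_emb, type_emb):
    """Initial representation h_v^(0) of every query node (eq. 4).

    entity_emb[i] and type_emb[j] hold the i-th column of E and the j-th column of T.
    A matrix times a one-hot vector selects one column, so E x_v and T x_v
    reduce to the lookups below.
    """
    h0 = {}
    for v in query_nodes:
        if v in variables:
            h0[v] = type_emb[type_index[node_type[v]]]  # variable: embedding of its type
        else:
            h0[v] = entity_emb[entity_index[v]]         # entity: its own embedding
    return h0


def one_hot_product(columns, index):
    """Explicit M x for a one-hot x, with M stored as a list of columns."""
    x = [1.0 if j == index else 0.0 for j in range(len(columns))]
    dim = len(columns[0])
    return [sum(x[j] * columns[j][i] for j in range(len(columns))) for i in range(dim)]


rng = random.Random(0)
d = 4
entities = ["alice", "bob", "project_a"]
types = ["person", "topic", "project"]
entity_index = {e: i for i, e in enumerate(entities)}
type_index = {t: i for i, t in enumerate(types)}
entity_emb = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in entities]
type_emb = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in types]

# The variables T and P? of the example query are initialized from their types.
node_type = {"T": "topic", "P?": "project"}
h0 = init_node_features(["alice", "bob", "T", "P?"], {"T", "P?"}, node_type,
                        entity_index, type_index, entity_emb, type_emb)
```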
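Having defined features for every node in the query graph, we proceed to apply $L$ steps of message passing with a GNN. In particular, we employ a Relational Graph Convolutional Network (R-GCN) [16], which updates the features of a node taking into account its neighbors and the type of the relations involved. The representation for node $v$ at step $k$ of the R-GCN is defined as follows:

$$\mathbf{h}_v^{(k)} = \sigma\Big( \sum_{r \in \mathcal{R}} \sum_{u \in \mathcal{N}_r(v)} \frac{1}{|\mathcal{N}_r(v)|} \mathbf{W}_r^{(k)} \mathbf{h}_u^{(k-1)} + \mathbf{W}_0^{(k)} \mathbf{h}_v^{(k-1)} \Big) \tag{5}$$

where $\sigma$ is a non-linearity, $\mathcal{N}_r(v)$ is the set of neighbors of node $v$ through relation type $r$, and $\mathbf{W}_r^{(k)}$ and $\mathbf{W}_0^{(k)}$ are parameters of the model.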
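A single R-GCN step can be sketched in plain Python as below. It is a simplified sketch under stated assumptions: messages flow only from the head to the tail of each edge (no inverse relations are added), ReLU is used as the non-linearity, and weight matrices are plain nested lists; the function names are hypothetical. With identity weights, the topic node T receives the normalized sum (here, the average) of the features of alice and bob.

```python
def matvec(W, x):
    """Multiply a matrix W (given as a list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, v) for v in x]


def rgcn_layer(h, edges, W_rel, W_self, activation=relu):
    """One R-GCN message passing step over a query graph.

    h:      dict node -> representation from the previous step
    edges:  list of (source, relation, target) triples; in this sketch messages
            only flow from source to target (inverse relations are not added)
    W_rel:  dict relation -> weight matrix W_r
    W_self: self-connection weight matrix W_0
    """
    # N_r(v): the neighbours that send messages to v through relation r.
    neighbours = {}
    for src, rel, dst in edges:
        neighbours.setdefault((dst, rel), []).append(src)

    new_h = {}
    for v, h_v in h.items():
        total = matvec(W_self, h_v)                    # self-connection W_0 h_v
        for (dst, rel), srcs in neighbours.items():
            if dst != v:
                continue
            for u in srcs:                             # message scaled by 1 / |N_r(v)|
                msg = matvec(W_rel[rel], h[u])
                total = [t + m / len(srcs) for t, m in zip(total, msg)]
        new_h[v] = activation(total)
    return new_h


identity = [[1.0, 0.0], [0.0, 1.0]]
h0 = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "T": [0.0, 0.0]}
edges = [("alice", "works_on", "T"), ("bob", "works_on", "T")]
h1 = rgcn_layer(h0, edges, {"works_on": identity}, identity)
print(h1["T"])  # → [0.5, 0.5]
```

Stacking $L$ calls to `rgcn_layer`, each with its own weights, gives the $L$ message passing steps described above.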
After $L$ applications of an R-GCN layer, the representations of all nodes in the query graph can be combined into a single vector that acts as the embedding of the query, by means of an aggregation function $\Phi$:

$$\mathbf{q} = \Phi\big(\{\mathbf{h}_v^{(L)} \mid v \in \mathcal{V}_q\}\big) \tag{6}$$

We continue by defining several options for this function.
Let $D$ denote the diameter of the query graph (the longest shortest path between two nodes in the graph). We propose an adaptive query embedding method by noting that at most $D$ message passing steps are required to propagate messages from all nodes to the target node. Given a query graph, the method performs $D$ steps of message passing, and then employs a Target Message (TM) aggregation function, which simply selects the representation of the target node:

$$\Phi_{\mathrm{TM}}\big(\{\mathbf{h}_v^{(D)} \mid v \in \mathcal{V}_q\}\big) = \mathbf{h}_{V_?}^{(D)} \tag{7}$$
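The number of steps used with the TM aggregation can be computed with a breadth-first search from every node. This sketch assumes the query graph is connected and ignores edge direction when measuring distances; the helper names are hypothetical. For the academic example query the diameter is 2, and for a 3-chain query it is 3.

```python
from collections import deque


def undirected_adjacency(edges):
    """Neighbour sets of a query graph given as (source, relation, target) triples."""
    adj = {}
    for src, _, dst in edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    return adj


def bfs_distances(adj, source):
    """Hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def query_diameter(edges):
    """Longest shortest path between two nodes of a connected query graph."""
    adj = undirected_adjacency(edges)
    return max(max(bfs_distances(adj, v).values()) for v in adj)


def target_message(h, target):
    """TM aggregation: the query embedding is the final state of the target node."""
    return h[target]


example = [("T", "related", "P?"), ("alice", "works_on", "T"), ("bob", "works_on", "T")]
chain3 = [("a", "r1", "V1"), ("V1", "r2", "V2"), ("V2", "r3", "V?")]
print(query_diameter(example), query_diameter(chain3))  # → 2 3
```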
Alternative aggregation functions can leverage the representations of other nodes in the query graph. Simple permutation-invariant functions include the sum and maximum, but we also consider functions with additional parameters [8, 22]. We first consider an aggregation function that passes all representations through a Multi-Layer Perceptron (MLP) and then sums the results:

$$\Phi_{\mathrm{MLP}}\big(\{\mathbf{h}_v^{(L)} \mid v \in \mathcal{V}_q\}\big) = \sum_{v \in \mathcal{V}_q} \mathrm{MLP}\big(\mathbf{h}_v^{(L)}\big) \tag{8}$$
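Previous works have highlighted the importance of leveraging features from different layers of a neural network [8, 20], which motivates an aggregation function that concatenates node representations from the hidden layers of the R-GCN. We denote this function as CMLP:

$$\Phi_{\mathrm{CMLP}}\big(\{\mathbf{h}_v^{(1)}, \ldots, \mathbf{h}_v^{(L)} \mid v \in \mathcal{V}_q\}\big) = \sum_{v \in \mathcal{V}_q} \mathrm{MLP}\big([\mathbf{h}_v^{(1)}; \ldots; \mathbf{h}_v^{(L)}]\big) \tag{9}$$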
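The aggregation functions above can be sketched in plain Python as follows. The sketch assumes that MLP and CMLP use two fully-connected layers with a ReLU between them (the experimental setup mentions two layers and ReLU nonlinearities), and that CMLP sums the MLP outputs over nodes in the same way as the MLP variant; the function names and toy weights are hypothetical.

```python
def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mlp(x, layers):
    """Fully-connected layers given as (W, b) pairs, with ReLU between layers."""
    for i, (W, b) in enumerate(layers):
        x = [v + bi for v, bi in zip(matvec(W, x), b)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def agg_sum(hs):
    """Element-wise sum over node representations."""
    return [sum(col) for col in zip(*hs)]


def agg_max(hs):
    """Element-wise maximum over node representations."""
    return [max(col) for col in zip(*hs)]


def agg_mlp(hs, layers):
    """MLP aggregation: apply the MLP to every node representation, then sum."""
    return agg_sum([mlp(h, layers) for h in hs])


def agg_cmlp(hs_per_layer, layers):
    """CMLP aggregation: concatenate each node's states from all R-GCN layers,
    then aggregate as in agg_mlp. hs_per_layer[k][i] is node i after layer k + 1."""
    num_nodes = len(hs_per_layer[0])
    concatenated = [sum((layer[i] for layer in hs_per_layer), []) for i in range(num_nodes)]
    return agg_mlp(concatenated, layers)


identity = [[1.0, 0.0], [0.0, 1.0]]
zero = [0.0, 0.0]
hs = [[1.0, 2.0], [3.0, 0.5]]
print(agg_sum(hs), agg_max(hs))                            # → [4.0, 2.5] [3.0, 2.0]
print(agg_mlp(hs, [(identity, zero), (identity, zero)]))  # → [4.0, 2.5]
```

All four functions are permutation invariant: reordering the nodes in `hs` does not change the result, which is what allows them to embed query graphs without a canonical node order.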
The parameters of MPQE consist of entity and type embeddings, together with the parameters of the R-GCN used during the message passing procedure and any additional parameters included in the aggregation function. Following previous work on query embedding [7, 11], we optimize MPQE using gradient descent on a contrastive loss function: given a query $q$ and its embedding $\mathbf{q}$, a positive sample $v^+$ corresponds to an entity in the knowledge graph that answers the query, and a negative sample $v^-$ is an entity sampled at random that is not an answer to the query but has the correct type. We minimize the following margin loss function:

$$\mathcal{L} = \max\big(0,\; 1 - \operatorname{score}(\mathbf{q}, \mathbf{e}_{v^+}) + \operatorname{score}(\mathbf{q}, \mathbf{e}_{v^-})\big) \tag{10}$$

where $\mathbf{e}_v$ is the embedding of entity $v$ (according to eq. 4). The optimization of this loss function encourages higher scores for positive samples than for negative ones, as it penalizes the model whenever the margin between the two scores is lower than 1.
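The margin loss of eq. 10 for a single (query, positive, negative) triple can be sketched as below; gradients would come from an automatic differentiation framework and are not shown. The vectors are hypothetical toy values and the function names are introduced only for this sketch. Since cosine scores lie in $[-1, 1]$, the loss vanishes only when the positive outscores the negative by at least 1.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def margin_loss(q, e_pos, e_neg, margin=1.0):
    """Hinge loss: zero once score(q, e_pos) exceeds score(q, e_neg) by the margin."""
    return max(0.0, margin - cosine(q, e_pos) + cosine(q, e_neg))


def batch_loss(triples):
    """Mean loss over (query, positive, negative) embedding triples."""
    return sum(margin_loss(q, p, n) for q, p, n in triples) / len(triples)


print(margin_loss([1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]))  # → 0.0
print(margin_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]))   # → 2.0
```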
4 Related Work
Multiple approaches for machine learning on graphs consider embedding the graph into a vector space [2, 18, 21]. The applicability of these methods for answering complex queries is limited: for each link that needs to be predicted to answer a query, link prediction methods need to consider all possible entities, so the number of candidates grows exponentially with the number of variables in the query. In comparison, our method is based on an architecture that directly encodes the query into an embedding, which is optimized to be similar to the embeddings of correct entities. Answering a query then requires scoring each entity once, which gives our method a linear complexity in the number of entities in the graph.
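A back-of-the-envelope comparison makes the difference concrete. The sizes are hypothetical and chosen only for illustration: a query with a target variable $V_?$ and $m = 2$ further variables, over a KG with $|\mathcal{V}| = 10^4$ entities.

$$
\underbrace{|\mathcal{V}|^{\,m+1}}_{\text{assignments of } V_?, V_1, \ldots, V_m} = (10^{4})^{3} = 10^{12}
\qquad \text{vs.} \qquad
\underbrace{|\mathcal{V}|}_{\text{one score per entity}} = 10^{4}
$$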
To avoid the link prediction problem at each step when traversing a KG, other works seek to perform prediction across longer paths [3, 12], which restricts the use of these methods to chain-like queries. This contrasts with our work, where we are interested in answering queries of a more arbitrary shape.
More recent works have also addressed the problem of obtaining a vector representation of a query, which is then used to obtain approximate answers, while still leveraging the properties of embedding methods such as TransE [2]. This is achieved by partitioning the query graph into different subgraphs, so that candidate answers can be provided for each of them. In one of these works, the authors pre-train embeddings using an algorithm inspired by TransE. In order to combine the embeddings in a meaningful way for the task of query answering, the authors propose a set of rules to aggregate the embeddings by following the structure of the query graph. Since their method is related to probabilistic models of entities in context (such as node2vec), the embeddings depend on how the context is selected, and the effect of this aspect on query answering is not clear. Our method differs significantly from this approach: instead of relying on a separate pre-training step, we learn with an objective that optimizes entity embeddings for the task of query answering directly.
The most related approaches to our work consist of recently proposed methods for encoding queries directly in the embedding space [7, 11], which work by applying a sequence of projection and intersection operators that follow the structure of the query graph. These methods are constrained to queries that in graph form correspond to Directed Acyclic Graphs (DAG) where entities can only be present at the leaves. Furthermore, the use of projection and intersection mechanisms requires these models to be trained with multiple query structures that comprise both chains and intersections. Our method has a more general formulation that enables it to i) encode a general set of query graphs, without constraints on the location of entities in the query, and ii) learn from link prediction training alone.
We evaluate the performance of MPQE in query answering over knowledge graphs by considering 7 different query structures (see Fig. 2). All the code to reproduce our experiments is available online at https://github.com/dfdazac/mpqe.
AIFB: a KG of an academic institution, where entities are persons, organizations, projects, publications, and topics.
MUTAG: a KG of carcinogenic molecules, where entities are atoms, bonds, compounds, and structures.
AM: this KG contains the relations between different artifacts in the Amsterdam Museum, including locations, makers, documentation, and agents, among others.
Bio: a dataset of a biological interaction network containing entities of type drug, disease, protein, side effect, and biological processes.
A list of their statistics can be found in table 1.
To obtain query graphs, we sample subgraphs from the KG following the structures shown in Figure 2 (these are chosen so that our work is comparable to related work [7, 11]). Each sampled subgraph specifies the entities and the types of variables in the query (including the target node), together with the correct answer to the query, which is used as a positive sample. For each query we also obtain a negative sample and, in the case of query graphs with intersections, a hard negative sample. Hard negative samples are entities that would be a correct answer to the query if the conjunction represented by the intersection were relaxed to a disjunction.
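The distinction between answers, hard negatives, and regular negatives for an intersection query can be sketched as follows. The toy KG, its entity names, and the helper functions are hypothetical; the sketch covers an intersection query whose branches are single edges pointing to the target.

```python
import random


def tails(kg, relation, anchor):
    """Entities t such that relation(anchor, t) is an edge of the KG."""
    return {t for (h, r, t) in kg if h == anchor and r == relation}


def answers_and_hard_negatives(kg, branches):
    """Intersection query V? : r_1(a_1, V?) AND r_2(a_2, V?) AND ...,
    given as branches [(r_1, a_1), (r_2, a_2), ...]."""
    per_branch = [tails(kg, rel, anchor) for rel, anchor in branches]
    answers = set.intersection(*per_branch)
    # Hard negatives satisfy the disjunction of the branches but not the conjunction.
    hard_negatives = set.union(*per_branch) - answers
    return answers, hard_negatives


def random_negative(candidates_of_type, answers, rng):
    """Regular negative: a random entity of the correct type that is not an answer."""
    pool = sorted(set(candidates_of_type) - answers)
    return rng.choice(pool)


kg = {
    ("alice", "works_on", "ml"), ("alice", "works_on", "semweb"),
    ("bob", "works_on", "ml"), ("bob", "works_on", "databases"),
}
answers, hard = answers_and_hard_negatives(kg, [("works_on", "alice"), ("works_on", "bob")])
topics = ["ml", "semweb", "databases", "vision"]
neg = random_negative(topics, answers, random.Random(0))
print(answers, sorted(hard), neg)
```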
We evaluate the effectiveness of our method when answering queries that require information in the graph not observed during training. In particular, given a KG, we start by removing 10% of its edges. Using this incomplete graph, we extract 1 million subgraphs, containing all the query structures outlined previously. These queries form the training set, which we use to optimize eq. 10.
We then restore the removed edges and extract 11,000 additional subgraphs of all structures, ensuring that each of them relies on at least one of the edges that were removed to create the training set. We split this set of query graphs into two disjoint sets, containing 1,000 queries for validation and 10,000 for testing. We use the validation set to perform early stopping during training, and we report results on the test set.
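The edge split and the filtering of evaluation queries can be sketched as below. This is an assumed procedure consistent with the description above, not the released code: a query is represented by the edges of the grounded subgraph it was sampled from, and the function names are hypothetical.

```python
import random


def split_edges(edges, removed_fraction=0.1, seed=0):
    """Randomly hold out a fraction of the edges; return (kept, removed) sets."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_removed = int(round(removed_fraction * len(shuffled)))
    return set(shuffled[n_removed:]), set(shuffled[:n_removed])


def uses_removed_edge(query_edges, removed):
    """Keep an evaluation query only if it depends on at least one held-out edge."""
    return any(edge in removed for edge in query_edges)


def split_validation_test(queries, num_validation=1000):
    """Disjoint validation and test sets of evaluation queries."""
    return queries[:num_validation], queries[num_validation:]


edges = [(i, "r", i + 1) for i in range(20)]
train_edges, removed = split_edges(edges)
print(len(train_edges), len(removed))  # → 18 2
```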
For evaluation, we use the embedding of a query to compute a score against its correct answer and a negative sample, using eq. 3. The scores obtained for a set of queries are used to calculate the area under the ROC curve (AUC). Furthermore, we compute scores against at most 1,000 negative samples, which we use to compute the Average Percentile Rank (APR), so that in the ideal case, a correct entity has a percentile rank of 100. These metrics thus serve as a proxy for the retrieval quality of our method for query answering.
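Both metrics can be sketched in plain Python. The sketch assumes that AUC is computed as the probability that a positive outscores a negative (counting ties as one half) and that the percentile rank is the percentage of negatives scored strictly below the correct answer; the tie handling and function names are assumptions, and the scores are toy values.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def percentile_rank(pos_score, neg_scores):
    """Percentage of negative samples scored strictly below the correct answer."""
    return 100.0 * sum(n < pos_score for n in neg_scores) / len(neg_scores)


def average_percentile_rank(queries):
    """queries: list of (positive score, list of negative scores) pairs."""
    return sum(percentile_rank(p, negs) for p, negs in queries) / len(queries)


print(auc([0.9, 0.4], [0.5, 0.1]))  # → 0.75
print(average_percentile_rank([(0.9, [0.1, 0.2, 0.3, 0.95]), (0.5, [0.1, 0.2])]))  # → 87.5
```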
We evaluate the performance of MPQE under different aggregation functions. We initialize all embedding matrices randomly, although they could be obtained from a pretraining step with methods such as TransE [2]. With the exception of the TM aggregation function (where the number of message passing steps is given by the query diameter), we use 2 R-GCN layers. For aggregation functions with MLPs, we use two fully-connected layers, and in all cases we use ReLU for the nonlinearities.
As a baseline we include the Graph Query Embedding (GQE) method by Hamilton et al. (2018) [7], with the default settings reported by the authors in their implementation (https://github.com/williamleif/graphqembed). We test the three variants that they propose, namely TransE, DistMult, and Bilinear.
We minimize eq. 10 using the Adam optimizer with a learning rate of 0.01, and use an embedding dimension of 128. We first train the models on 1-chain queries until convergence, and then on the full set of query structures until convergence, as we found that this two-stage procedure speeds up training. For our implementation we use PyTorch and the PyTorch Geometric library [5].
The results for the query answering task are shown in table 2. We show results for two cases: in the Base case, we evaluate the performance across all query structures, with regular positive and negative samples. In the All case, we include hard negative samples. We observe that MPQE obtains competitive performance in comparison with GQE across different datasets. We also note that the performance of our method is consistent when considering a rank against multiple negative samples, as shown by the APR results.
As expected, performance decreases when considering hard negative samples, with both methods exhibiting a similar reduction. The largest difference occurs in the MUTAG dataset, which we identified as the dataset with the least diverse set of relations, where MPQE results in lower performance. Despite this discrepancy, the difference between the averages of GQE-DistMult and MPQE-TM (the best variants) on MUTAG is not statistically significant according to a Wilcoxon signed-rank test: while GQE-DistMult handles hard negative samples well (which occur only in queries with intersections), MPQE-TM performs better on regular samples across all query structures.
While the previous experiments show that our method is competitive on the task in comparison with GQE, we argue that our method has a more general formulation. To examine the generalization properties of MPQE, we evaluate the methods when training on 1-chain queries only. In this scenario, the models are optimized to perform link prediction, but we carry out the evaluation using the complete set of query structures.
GQE is designed to work with an intersection operator that can only be optimized if there are queries with intersections in the training set. Therefore, when trained on 1-chain queries only, GQE cannot provide answers better than random for queries with intersections. Our method has no such limitation. We thus consider two evaluation modes when training on 1-chain queries: evaluating on queries with chain structures only, and evaluating on the complete set of query structures (where GQE is not applicable). These modes are denoted as “ch” and “all”, respectively, in table 3. The results of MPQE are competitive when evaluating on queries with chains only and, crucially, MPQE also generalizes well to six query structures not seen during training. This surprising result shows that message passing is an effective mechanism that, unlike GQE, does not necessarily require training on many diverse query structures to generalize well.
Message passing performance
An interesting observation from our experiments is that the message passing mechanism alone is sufficient to provide good performance for query answering, as we can see from the results for the MPQE-TM architecture. In this model, we perform a number of steps of message passing equal to the diameter of the query, and take as query embedding the resulting feature vector at the target node. Intuitively, this allows MPQE-TM to adapt to the structure of a query so that after message passing, all information from the entity and variable nodes has reached the target node. To confirm this intuition, we evaluate the performance of MPQE as a function of the number of message passing steps, ranging from 1 to 4. The results are shown in Figure 3, for all the query structures that we have considered. We highlight the points that correspond to the diameter of the query, and we note that the results align with our intuition about the message passing mechanism. When the number of steps matches the diameter, there is a significant increase in performance, and further steps have little effect. This supports the superior generalization observed in previous experiments, in comparison with GQE, and other MPQE architectures where the number of R-GCN layers was fixed.
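The intuition can be checked by counting which nodes can influence the target after $k$ steps. This sketch assumes messages can travel along edges in both directions (for example, via inverse relations), so a node's features reach the target after as many steps as its hop distance; the helper names are hypothetical. For a 3-chain query, coverage reaches 100% exactly at $k = 3$, its diameter, and further steps add nothing.

```python
from collections import deque


def hop_distances(edges, target):
    """Undirected hop distance from every query node to the target node."""
    adj = {}
    for src, _, dst in edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    dist, queue = {target: 0}, deque([target])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def coverage_after_steps(edges, target, steps):
    """Fraction of query nodes whose initial features reach the target in `steps` rounds."""
    dist = hop_distances(edges, target)
    return sum(d <= steps for d in dist.values()) / len(dist)


chain3 = [("a", "r1", "V1"), ("V1", "r2", "V2"), ("V2", "r3", "V?")]
for k in range(1, 5):
    print(k, coverage_after_steps(chain3, "V?", k))
# → 1 0.5 / 2 0.75 / 3 1.0 / 4 1.0 (one pair per line)
```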
In order to assess the properties of the embedding space induced by MPQE, we sample 200 entities for each type in the Amsterdam Museum dataset and visualize them using t-SNE [10]. The results are shown in fig. 4 for GQE-Bilinear and MPQE-TM. We observe that the embedding space for MPQE is clearly structured: the embeddings form clusters of entities of the same type. This is in stark contrast with the embeddings of GQE, where we do not observe a clear structure, apart from a single cluster that is not completely concentrated. These results complement the generalization experiments, where training for link prediction resulted in useful embeddings for more complex queries. With such a structured space, MPQE can compose messages for paths of length 1 to move across the space and obtain an embedding for queries that require more message passing steps.
We have presented MPQE, a neural architecture to encode complex queries on knowledge graphs, that jointly learns entity and type embeddings and a straightforward message passing architecture to obtain a query embedding. Our experiments show that message passing across the query graph is a powerful mechanism for query answering, and that it generalizes to multiple query structures even when only trained for single hop link prediction.
The qualitative results show that MPQE learns a well-structured embedding space. This result motivates future research on the application of the learned embeddings to other tasks related to KGs, such as clustering, and node and graph classification. In this light, MPQE can be seen as an unsupervised representation learning method for KGs, since all the training data it requires is generated from the graph.
We showed that the general formulation of our model allowed it to exhibit greater generalization, but we also note that there are applications of this architecture that constitute interesting directions for future work. By being able to encode queries independent of the position of entities and variables, we could encode queries with additional information, that could be used to condition the answers on a given context. Such an application would be useful in information retrieval and recommender systems.
Our method presents limitations when evaluating on hard negative samples. Our experiments showed a slight increase in performance when increasing the number of message passing steps, but the end effect was not significant. Further modifications could include improving the message passing procedure, by including attention or gating functions that would enable a better conditioning of the query embedding given the structure of the query graph.
- [1] Y. Bengio and Y. LeCun (Eds.) (2015) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
- [2] (2013) Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pp. 2787–2795.
- [3] Chains of reasoning over entities, relations, and text using recurrent neural networks. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, Valencia, Spain.
- [4] (2011) Query relaxation for entity-relationship search. In Extended Semantic Web Conference, pp. 62–76.
- [5] (2019) Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds.
- [6] (2017) Handling failing RDF queries: from diagnosis to relaxation. Knowledge and Information Systems 50 (1), pp. 167–195.
- [7] (2018) Embedding logical queries on knowledge graphs. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pp. 2030–2041.
- [8] (2017) Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 1025–1035.
- [9] (2013) SPARQL 1.1 query language. W3C Recommendation 21 (10), pp. 778.
- [10] (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9 (Nov), pp. 2579–2605.
- [11] (2019) Contextual graph attention for answering logical queries over incomplete knowledge graphs. In Proceedings of K-CAP 2019, Nov. 19-21, 2019, Marina del Rey, CA, USA.
- [12] Compositional vector space models for knowledge base completion. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, pp. 156–166.
- [13] (2014) DeepWalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710.
- [14] (2016) A collection of benchmark datasets for systematic evaluations of machine learning on the semantic web. In International Semantic Web Conference, pp. 186–194.
- [15] (2016) RDF2Vec: RDF graph embeddings for data mining. In International Semantic Web Conference, pp. 498–514.
- [16] (2018) Modeling relational data with graph convolutional networks. In The Semantic Web - 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3-7, 2018, Proceedings, pp. 593–607.
- [17] (2018) Towards empty answers in SPARQL: approximating querying with RDF embedding. In International Semantic Web Conference, pp. 513–529.
- [18] Knowledge graph embedding by translating on hyperplanes. In Twenty-Eighth AAAI Conference on Artificial Intelligence.
- [19] (2019) A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596.
- [20] (2018) Representation learning on graphs with jumping knowledge networks. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pp. 5449–5458.
- [21] (2015) Embedding entities and relations for learning and inference in knowledge bases. In 3rd International Conference on Learning Representations, ICLR 2015 (see [1]).
- [22] (2017) Deep sets. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 3394–3404.
- [23] (2018) TrQuery: an embedding-based framework for recommending SPARQL queries. In 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI).