A Non-commutative Bilinear Model for Answering Path Queries in Knowledge Graphs

09/04/2019
by   Katsuhiko Hayashi, et al.
Osaka University

Bilinear diagonal models for knowledge graph embedding (KGE), such as DistMult and ComplEx, balance expressiveness and computational efficiency by representing relations as diagonal matrices. Although they perform well in predicting atomic relations, composite relations (relation paths) cannot be modeled naturally by the product of relation matrices, as the product of diagonal matrices is commutative and hence invariant to the order of relations. In this paper, we propose a new bilinear KGE model, called BlockHolE, based on block circulant matrices. In BlockHolE, relation matrices can be non-commutative, allowing composite relations to be modeled by matrix product. The model is parameterized in a way that covers a spectrum ranging from diagonal to full relation matrices. A fast computation technique is developed on the basis of the duality of the Fourier transform of circulant matrices.


1 Introduction

Large-scale knowledge graphs Nickel et al. (2016a) are indispensable resources for knowledge-intensive applications such as question answering, dialog systems, and distantly supervised relation extraction. A knowledge graph is a collection of triplets $(s, r, o)$, each representing the fact that (binary) relation $r$ holds between subject entity $s$ and object entity $o$. Although efforts continue to enrich existing knowledge graphs with more facts, many facts are still missing Nickel et al. (2016a). Knowledge graph completion (KGC) aims to automatically detect missing facts in an incomplete knowledge graph, and has become an active field of research in recent years.

Figure 1: (a) A knowledge graph and (b,c) two relation paths starting from William.

Knowledge graph embedding (KGE) is a promising approach to KGC. It embeds entities and relations in vector space, and defines a scoring function to evaluate the degree of factuality of a given triplet in terms of vector operations.

Bilinear KGE models are a popular choice for a scoring function, along with those based on translation and neural networks. RESCAL Nickel et al. (2011) adopts a generic bilinear form as the scoring function, given by $\mathbf{e}_s^{\mathrm{T}} \mathbf{W}_r \mathbf{e}_o$. In this formula, $\mathbf{e}_s, \mathbf{e}_o \in \mathbb{R}^n$ are the $n$-dimensional vector embeddings of entities $s$ and $o$, respectively, and $\mathbf{W}_r \in \mathbb{R}^{n \times n}$ is the matrix embedding of relation $r$. Some of the more recent models have constrained the relation matrices to be diagonal. DistMult Yang et al. (2015) and ComplEx Trouillon et al. (2016) are two such diagonal models. HolE Nickel et al. (2016b) does not use diagonal relation matrices, but has been shown Hayashi and Shimbo (2017) to be isomorphic to ComplEx. These models have fewer parameters than RESCAL, making them less prone to overfitting, and their performance is usually better.

While all these models were designed with a specific task of KGC in mind, i.e., computing the factuality of triplets, another important task on knowledge graphs was pursued by Guu et al. (2015) and Lin et al. (2015). This latter task, called path query answering (path QA), is to answer composite queries that consist of a cascade of relations, as opposed to an atomic relation. See Figure 1 for instance. A query “Is Beatrice a child of a paternal uncle of William?” can be answered by predicting the truth value of the triplet (William, fatherOf/brotherOf/fatherOf, Beatrice) where fatherOf/brotherOf/fatherOf is a binary relation not present in the knowledge graph as a relation (edge) label but is composed of a cascade of three atomic relations (we regard inverse relations also as atomic relations). Composite queries are also called path queries, as they can be represented as paths in a knowledge graph; see, e.g., the blue line in Figure 1(a). Notice however that some of the edges in the path may be missing due to the incompleteness of the knowledge graph; even in such circumstances, the model should ideally be able to answer path queries correctly.

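Guu et al. (2015) extended the existing KGE approaches to path QA. For example, to answer a general path query $(s, r_1/\cdots/r_k, o)$ with RESCAL, a composite relation $r_1/\cdots/r_k$ is modeled by matrix product $\mathbf{W}_{r_1} \cdots \mathbf{W}_{r_k}$, and the score for the given query is modeled by $\mathbf{e}_s^{\mathrm{T}} \mathbf{W}_{r_1} \cdots \mathbf{W}_{r_k} \mathbf{e}_o$. This formulation is also applicable to DistMult and ComplEx, which use diagonal relation matrices. In diagonalized models, however, relation matrices are commutative, in the sense that $\mathbf{W}_{r_1} \mathbf{W}_{r_2} = \mathbf{W}_{r_2} \mathbf{W}_{r_1}$ for any pair of relations $r_1, r_2$.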

Commutativity of relation matrices was not recognized as an issue in the past research because the main focus was on predicting the truth value of atomic triplets. However, when path queries are concerned, commutativity poses a problem. Consider, for example, a relation sequence

fatherOf/brotherOf/fatherOf

and its permutation

brotherOf/fatherOf/fatherOf.

Although these are two distinct paths (cf. Figure 1(b, c)), in bilinear models with commutative relation matrices, they are represented by the same product of relation matrices, which makes the truth values of these permuted queries indistinguishable by their scores.

Drawing on the observation above, this paper proposes a new KGE model called BlockHolE, wherein relations are represented by block circulant matrices. This makes relation matrices non-commutative, and thus it does not suffer from the issues arising from commutativity, yet in general manages to reduce the number of parameters compared with RESCAL. It can be interpreted as a generalization of HolE and ComplEx, and also subsumes RESCAL as an extreme case. We report experimental results in both path and atomic QA tasks.

2 Notation and preliminaries

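Symbol                                              Description
$\mathbb{R}$, $\mathbb{C}$                          sets of real/complex numbers
$[\mathbf{v}]_i$                                    $i$th component of vector $\mathbf{v}$
$[\mathbf{M}]_{ij}$                                 $(i,j)$-component of matrix $\mathbf{M}$
$\mathbf{M}^{\mathrm{T}}$                           transpose of $\mathbf{M}$
$\overline{\mathbf{M}}$                             conjugate of $\mathbf{M}$
$\odot$                                             componentwise (Hadamard) product
$*$                                                 circular convolution
$\star$                                             circular correlation
$\mathrm{Re}(z)$                                    real part of complex number $z$
$\mathrm{diag}(\mathbf{v})$                         diagonal matrix with main diagonal $\mathbf{v}$
$\mathrm{circ}(\mathbf{v})$                         circulant matrix determined by $\mathbf{v}$
$\langle \mathbf{x}, \mathbf{y}, \mathbf{z} \rangle$  sum of the componentwise products of $\mathbf{x}$, $\mathbf{y}$, $\mathbf{z}$
$\mathbf{F}$                                        discrete Fourier matrix
$\mathcal{E}$                                       set of entities
$\mathcal{R}$                                       set of relations
$\mathcal{F}$                                       set of observed facts (triplets)
$\mathcal{F}^*$                                     set of ground truth facts
$\mathcal{G}(\mathcal{F})$                          knowledge graph induced by facts $\mathcal{F}$
Table 1: List of symbols. See Secs. 2 and 3 for detail.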

We first introduce symbols and notation used in this paper, followed by some preliminaries on circulant matrices, circular convolution, correlation, and Fourier transform. The summary of symbols can be found in Table 1.

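Let $\mathbb{R}$ be the set of reals, and $\mathbb{C}$ be the set of complex numbers. Let $[\mathbf{v}]_i$ denote the $i$th component of vector $\mathbf{v}$, and let $[\mathbf{M}]_{ij}$ denote the $(i,j)$ element of matrix $\mathbf{M}$. For a complex number $z$, vector $\mathbf{v}$, and matrix $\mathbf{M}$, let $\overline{z}$, $\overline{\mathbf{v}}$, and $\overline{\mathbf{M}}$ denote their complex conjugates, respectively.

Let $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ be $n$-dimensional (real or complex) vectors. Let $\mathrm{diag}(\mathbf{x})$ denote an $n \times n$ diagonal matrix with the main diagonal components given by $\mathbf{x}$. We write $\mathbf{x} \odot \mathbf{y}$ to denote the componentwise product of $\mathbf{x}$ and $\mathbf{y}$; i.e., $\mathbf{x} \odot \mathbf{y} = \mathrm{diag}(\mathbf{x}) \mathbf{y}$, or $[\mathbf{x} \odot \mathbf{y}]_i = [\mathbf{x}]_i [\mathbf{y}]_i$, $i = 0, \ldots, n-1$. We also write $\langle \mathbf{x}, \mathbf{y}, \mathbf{z} \rangle = \sum_{i=0}^{n-1} [\mathbf{x}]_i [\mathbf{y}]_i [\mathbf{z}]_i$.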

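For $n$-dimensional real vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, $\mathbf{x} * \mathbf{y}$ and $\mathbf{x} \star \mathbf{y}$ denote circular convolution and circular correlation, respectively defined by

$$[\mathbf{x} * \mathbf{y}]_j = \sum_{i=0}^{n-1} [\mathbf{x}]_i [\mathbf{y}]_{j-i}, \qquad [\mathbf{x} \star \mathbf{y}]_j = \sum_{i=0}^{n-1} [\mathbf{x}]_i [\mathbf{y}]_{j+i},$$

where vector indices that do not fall in the range $0, \ldots, n-1$ must be interpreted modulo $n$. (Generally, circular convolution, circular correlation, and circulant matrices are defined over $\mathbb{C}^n$. However, in this paper, it suffices to define them over $\mathbb{R}^n$.)

For $n$-dimensional real vector $\mathbf{x}$ with components $x_i = [\mathbf{x}]_i$, let

$$\mathrm{circ}(\mathbf{x}) = \begin{bmatrix} x_0 & x_{n-1} & \cdots & x_1 \\ x_1 & x_0 & \cdots & x_2 \\ \vdots & \vdots & \ddots & \vdots \\ x_{n-1} & x_{n-2} & \cdots & x_0 \end{bmatrix}$$

be an operation that converts a vector to a circulant matrix of size $n \times n$.

A circulant matrix can be diagonalized as $\mathrm{circ}(\mathbf{x}) = \mathbf{F}^{-1} \mathrm{diag}(\mathbf{F}\mathbf{x}) \mathbf{F}$, where $\mathbf{F}$ is the discrete Fourier matrix of order $n$. Also, circular convolution and correlation can be written in terms of $\mathrm{circ}$: $\mathbf{x} * \mathbf{y} = \mathrm{circ}(\mathbf{x}) \mathbf{y}$, and $\mathbf{x} \star \mathbf{y} = \mathrm{circ}(\mathbf{x})^{\mathrm{T}} \mathbf{y}$. It follows that

$$\mathbf{x} * \mathbf{y} = \mathbf{F}^{-1} (\mathbf{F}\mathbf{x} \odot \mathbf{F}\mathbf{y}) \tag{1}$$
$$\mathbf{x} \star \mathbf{y} = \mathbf{F}^{-1} (\overline{\mathbf{F}\mathbf{x}} \odot \mathbf{F}\mathbf{y}) \tag{2}$$

These equations imply that circular convolution and correlation can be computed in $O(n \log n)$ time using the fast Fourier transform (FFT).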
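The FFT identities in this section can be checked numerically. The sketch below is illustrative only: it assumes zero-based indices, the convention that the circulant matrix has its defining vector as first column, and NumPy's unnormalized forward FFT. All function names are hypothetical and not taken from the paper.

```python
import numpy as np

def circular_convolution(x, y):
    """Direct O(n^2) definition: [x * y]_j = sum_i x_i * y_{(j - i) mod n}."""
    n = len(x)
    return np.array([sum(x[i] * y[(j - i) % n] for i in range(n)) for j in range(n)])

def circular_correlation(x, y):
    """Direct O(n^2) definition: [x star y]_j = sum_i x_i * y_{(j + i) mod n}."""
    n = len(x)
    return np.array([sum(x[i] * y[(j + i) % n] for i in range(n)) for j in range(n)])

def circulant(x):
    """Circulant matrix with first column x, i.e. entry (i, j) equals x[(i - j) mod n]."""
    return np.stack([np.roll(x, k) for k in range(len(x))], axis=1)

def fft_convolution(x, y):
    """Circular convolution of real vectors in O(n log n) via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

def fft_correlation(x, y):
    """Circular correlation of real vectors in O(n log n) via the FFT."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(y)))

rng = np.random.default_rng(0)
x, y = rng.standard_normal((2, 8))

conv_matches_fft = np.allclose(circular_convolution(x, y), fft_convolution(x, y))
corr_matches_fft = np.allclose(circular_correlation(x, y), fft_correlation(x, y))
conv_matches_circ = np.allclose(circulant(x) @ y, circular_convolution(x, y))
corr_matches_circ = np.allclose(circulant(x).T @ y, circular_correlation(x, y))
```

The direct loops cost quadratic time and serve only as a reference; the FFT versions are the ones that scale.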

3 Knowledge graph embedding using bilinear maps

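A knowledge graph $\mathcal{G}$ is a labeled multigraph $(\mathcal{E}, \mathcal{R}, \mathcal{F})$, where $\mathcal{E}$ is the set of entities (or vertices), $\mathcal{R}$ is the set of relation labels (or edge labels), and $\mathcal{F} \subseteq \mathcal{E} \times \mathcal{R} \times \mathcal{E}$ defines the observed instances of binary relations over entities (or labeled edges). An item $(s, r, o) \in \mathcal{F}$ is called a triplet, with $s$ and $o$ called its subject and object, respectively. For every entity $e$ in $\mathcal{E}$, it is assumed that $\mathcal{F}$ contains at least one triplet $(s, r, o)$ with $s = e$ or $o = e$; likewise, for every relation $r$ in $\mathcal{R}$, $\mathcal{F}$ is assumed to contain at least one triplet $(s, r, o)$. Because $\mathcal{F}$ determines the sets $\mathcal{E}$ and $\mathcal{R}$ of entities and relations, we write $\mathcal{G}(\mathcal{F})$ to denote the knowledge graph determined by $\mathcal{F}$.

Aside from observed triplets $\mathcal{F}$, we also assume the presence of a set $\mathcal{F}^*$ of (ground truth) facts, which is a strict superset of $\mathcal{F}$, i.e., $\mathcal{F} \subsetneq \mathcal{F}^*$. Thus, $\mathcal{F}^*$ is not fully observable.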

3.1 Knowledge graph completion

Knowledge graph completion (KGC) is the task of identifying the set $\mathcal{F}^*$ of ground truth facts from observed facts $\mathcal{F}$ (or equivalently, from $\mathcal{G}(\mathcal{F})$).

A popular approach to KGC is to design a scoring function $\phi(s, r, o)$ quantifying how likely a triplet $(s, r, o)$ is true. This scoring function is learned from the observed triplets $\mathcal{F}$, in a way that it generalizes well to unobserved triplets $\mathcal{F}^* \setminus \mathcal{F}$; i.e., the score must be high for both observed and unobserved facts, and it must be low for nonfactual triplets.

In knowledge graph embedding (KGE)–based approaches to KGC, the scoring function is defined in terms of the embeddings of entities and relations; i.e., $s$, $r$, and $o$ are embedded as objects in a vector space, and $\phi(s, r, o)$ is defined in terms of some operations over these objects.

3.2 Bilinear models for knowledge graph embedding

Below, we describe some of the popular KGE models that use bilinear maps to define scoring functions.

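3.2.1 RESCAL

RESCAL Nickel et al. (2011) provides the most general form of bilinear scoring function:

$$\phi_{\mathrm{RESCAL}}(s, r, o) = \mathbf{e}_s^{\mathrm{T}} \mathbf{W}_r \mathbf{e}_o \tag{3}$$

where $\mathbf{e}_s, \mathbf{e}_o \in \mathbb{R}^n$ are the vector embeddings of entities $s$ and $o$, respectively, and $\mathbf{W}_r \in \mathbb{R}^{n \times n}$ is the matrix representing relation $r$. Thus, $n^2$ parameters are required per relation, which is not only a computational burden but also a cause of overfitting during training Kazemi and Poole (2018).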

3.2.2 DistMult

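DistMult Yang et al. (2015) is a model obtained by restricting the relation matrices of RESCAL to diagonal; i.e., $\mathbf{W}_r = \mathrm{diag}(\mathbf{w}_r)$, $\mathbf{w}_r \in \mathbb{R}^n$. The scoring function is thus

$$\phi_{\mathrm{DistMult}}(s, r, o) = \mathbf{e}_s^{\mathrm{T}} \mathrm{diag}(\mathbf{w}_r) \mathbf{e}_o = \langle \mathbf{w}_r, \mathbf{e}_s, \mathbf{e}_o \rangle \tag{4}$$

Although the number of parameters is reduced considerably, the scoring function (4) is symmetric with respect to the entities, i.e., $\phi_{\mathrm{DistMult}}(s, r, o) = \phi_{\mathrm{DistMult}}(o, r, s)$. This is a severe limitation because most real-world relations are non-symmetric.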

3.2.3 ComplEx: Complex embedding

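The complex embedding (ComplEx) Trouillon et al. (2016) represents entities and relations as $n$-dimensional vectors as in DistMult, but their components are complex-valued.

The scoring function of ComplEx is given by

$$\phi_{\mathrm{ComplEx}}(s, r, o) = \mathrm{Re}\left( \mathbf{e}_s^{\mathrm{T}} \mathrm{diag}(\mathbf{w}_r) \overline{\mathbf{e}_o} \right) = \mathrm{Re}\left( \langle \mathbf{w}_r, \mathbf{e}_s, \overline{\mathbf{e}_o} \rangle \right)$$

where $\mathbf{e}_s, \mathbf{e}_o, \mathbf{w}_r \in \mathbb{C}^n$ are the embeddings of $s$, $o$, and $r$, respectively. Each relation in ComplEx has $2n$ real-valued parameters (the real and imaginary parts of $\mathbf{w}_r$), and the score is computable in time linear in the dimension of vector space. Unlike DistMult, ComplEx can model non-symmetric relations, since $\phi_{\mathrm{ComplEx}}(s, r, o) \neq \phi_{\mathrm{ComplEx}}(o, r, s)$ in general.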

3.2.4 HolE: Holographic embedding

The holographic embedding (HolE) Nickel et al. (2016b) uses circular correlation to define a scoring function

$$\phi_{\mathrm{HolE}}(s, r, o) = \mathbf{w}_r^{\mathrm{T}} (\mathbf{e}_s \star \mathbf{e}_o) \tag{5}$$

where $\mathbf{w}_r, \mathbf{e}_s, \mathbf{e}_o \in \mathbb{R}^n$ are $n$-dimensional real vectors representing relation $r$, and entities $s$ and $o$, respectively. HolE has only $n$ parameters per relation, and it can model non-symmetric relations since $\mathbf{e}_s \star \mathbf{e}_o \neq \mathbf{e}_o \star \mathbf{e}_s$ in general. Computing circular correlation requires $O(n \log n)$ time if FFT is employed. Eq. (5) is not a bilinear form, but it has been shown Hayashi and Shimbo (2017) that HolE is isomorphic to ComplEx, and thus any model in HolE can be converted to an equivalent model in ComplEx, and vice versa.
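The four scoring functions described in this section can be compared side by side in a few lines. The following is a minimal sketch, not the authors' implementation: the function names and the random toy embeddings are hypothetical, and the last check illustrates the HolE/ComplEx correspondence by moving real embeddings into the Fourier domain with NumPy's unnormalized FFT (hence the division by n).

```python
import numpy as np

def rescal_score(e_s, W_r, e_o):
    """General bilinear form e_s^T W_r e_o with a full n x n relation matrix."""
    return e_s @ W_r @ e_o

def distmult_score(e_s, w_r, e_o):
    """Bilinear form with a diagonal relation matrix diag(w_r)."""
    return np.sum(w_r * e_s * e_o)

def complex_score(e_s, w_r, e_o):
    """Real part of the trilinear product with the object embedding conjugated."""
    return np.real(np.sum(w_r * e_s * np.conj(e_o)))

def hole_score(e_s, w_r, e_o):
    """w_r^T (e_s star e_o), with the circular correlation computed by FFT."""
    corr = np.real(np.fft.ifft(np.conj(np.fft.fft(e_s)) * np.fft.fft(e_o)))
    return w_r @ corr

rng = np.random.default_rng(0)
n = 8
e_s, e_o, w_r = rng.standard_normal((3, n))

# DistMult is RESCAL restricted to a diagonal relation matrix.
distmult_is_diagonal_rescal = np.isclose(
    rescal_score(e_s, np.diag(w_r), e_o), distmult_score(e_s, w_r, e_o))

# DistMult cannot tell (s, r, o) from (o, r, s); HolE generally can.
distmult_symmetric = np.isclose(
    distmult_score(e_s, w_r, e_o), distmult_score(e_o, w_r, e_s))
hole_symmetric = np.isclose(
    hole_score(e_s, w_r, e_o), hole_score(e_o, w_r, e_s))

# A HolE score equals a ComplEx score on Fourier-transformed embeddings, up to 1/n.
hole_matches_complex = np.isclose(
    hole_score(e_s, w_r, e_o),
    complex_score(np.fft.fft(e_s), np.fft.fft(w_r), np.fft.fft(e_o)) / n)
```

With random embeddings, `distmult_symmetric` comes out true and `hole_symmetric` false, which mirrors the symmetry limitation discussed for DistMult.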

4 Path question answering over a knowledge graph

4.1 Path query answering

Let $\mathcal{F}^*$ be the set of ground truth facts, and let $\mathcal{G}(\mathcal{F}^*)$ be its induced knowledge graph. For relations $r_1, \ldots, r_k \in \mathcal{R}$, we call $r_1/\cdots/r_k$ a relation path of length $k$. When $k = 1$, the relation path is atomic; otherwise, it is composite. Let $s, o \in \mathcal{E}$. We say a path query $(s, r_1/\cdots/r_k, o)$ holds (or “is true”) in $\mathcal{G}(\mathcal{F}^*)$ (or with respect to $\mathcal{F}^*$) if

$$\exists\, e_1, \ldots, e_{k-1} \in \mathcal{E}: \quad (e_{i-1}, r_i, e_i) \in \mathcal{F}^* \ \text{ for all } i = 1, \ldots, k,$$

where $e_0 = s$ and $e_k = o$. Path query answering (path QA) is the task of predicting the truth value of path queries with respect to the unobserved set $\mathcal{F}^*$ of ground truth facts, when only its incomplete subset $\mathcal{F}$ is available. In other words, we want to predict that $(s, r_1/\cdots/r_k, o)$ is true if a path from $s$ to $o$ exists in $\mathcal{G}(\mathcal{F}^*)$, although some of the edges that constitute the path may be missing in the observed graph $\mathcal{G}(\mathcal{F})$.

For atomic path queries (i.e., those with length $k = 1$), path QA reduces to knowledge graph completion as introduced in Section 3.1. Thus, it is natural to address general path QA by extending the scoring function of KGC methods so that composite relation $r_1/\cdots/r_k$ is allowed in place of atomic relation $r$; i.e., by defining $\phi(s, r_1/\cdots/r_k, o)$. Previous work Guu et al. (2015) explored this direction, which is also pursued in the rest of this paper.
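The definition of when a path query holds amounts to following the relations one at a time through the set of facts. The sketch below illustrates this on a tiny, made-up set of triplets; the function name and the toy entities and relations are hypothetical.

```python
def path_query_holds(facts, subject, relation_path, obj):
    """Return True if obj is reachable from subject by following relation_path.

    facts is a set of (subject, relation, object) triplets, and relation_path
    is a list of relation labels r_1, ..., r_k applied in that order.
    """
    frontier = {subject}
    for relation in relation_path:
        # Entities reachable from the current frontier through one edge labeled `relation`.
        frontier = {o for (s, r, o) in facts if r == relation and s in frontier}
        if not frontier:
            return False
    return obj in frontier

# Toy facts: a --r1--> b --r2--> c
toy_facts = {("a", "r1", "b"), ("b", "r2", "c")}

forward = path_query_holds(toy_facts, "a", ["r1", "r2"], "c")   # True
permuted = path_query_holds(toy_facts, "a", ["r2", "r1"], "c")  # False
```

The permuted path fails here, which is exactly the distinction a path QA model has to preserve. Note that this check only works on a fully known fact set; the point of path QA is to predict the same answer when some of the edges are unobserved.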

4.2 Issues in existing KGE models applied to path QA

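We now discuss the extension of existing bilinear KGE models to path QA. We begin with RESCAL, which is the most general among existing bilinear models. In RESCAL, if we assume $\mathbf{e}_s^{\mathrm{T}} \mathbf{W}_r \approx \mathbf{e}_o^{\mathrm{T}}$ for true triplets $(s, r, o)$, we can model path QA as computing

$$\phi_{\mathrm{RESCAL}}(s, r_1/\cdots/r_k, o) = \mathbf{e}_s^{\mathrm{T}} \mathbf{W}_{r_1} \cdots \mathbf{W}_{r_k} \mathbf{e}_o \tag{6}$$

As seen in this formula, a composite relation is represented by the product of the matrices for atomic relations Guu et al. (2015).

Likewise, DistMult and ComplEx can also be used for path QA, by computing

$$\phi_{\mathrm{DistMult}}(s, r_1/\cdots/r_k, o) = \mathbf{e}_s^{\mathrm{T}} \mathrm{diag}(\mathbf{w}_{r_1}) \cdots \mathrm{diag}(\mathbf{w}_{r_k}) \mathbf{e}_o$$

and

$$\phi_{\mathrm{ComplEx}}(s, r_1/\cdots/r_k, o) = \mathrm{Re}\left( \mathbf{e}_s^{\mathrm{T}} \mathrm{diag}(\mathbf{w}_{r_1}) \cdots \mathrm{diag}(\mathbf{w}_{r_k}) \overline{\mathbf{e}_o} \right)$$

respectively. However, because diagonal matrices are commutative, the score of $(s, r_1/r_2/\cdots/r_k, o)$ is equal to that of any path query in which $r_1, \ldots, r_k$ are permuted, such as $(s, r_2/r_1/\cdots/r_k, o)$. That is, because $\phi(s, r_1/r_2/\cdots/r_k, o) = \phi(s, r_2/r_1/\cdots/r_k, o)$, their truth values cannot be distinguished by the magnitude of scores. More recent bilinear models such as ANALOGY Liu et al. (2017) and SimplE Kazemi and Poole (2018) also represent relations by diagonal matrices, and thus they can only model commutative relation paths. (We categorize ANALOGY as a diagonal model because each block diagonal element of its relation matrices can be substituted by a single equivalent complex-valued component.) Moreover, for SimplE, which represents subject and object entities in different vector spaces, it is not clear how it can be applied to path QA.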

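In the translation-based model TransE Bordes et al. (2013), the scoring function is given by

$$\phi_{\mathrm{TransE}}(s, r, o) = -\lVert \mathbf{e}_s + \mathbf{w}_r - \mathbf{e}_o \rVert_2^2 \tag{7}$$

(The original TransE defines a penalty function, which gives a smaller value if a triplet is more likely to be true. We thus changed the sign to make it a scoring function in Eq. (7).)

Guu et al. (2015) extended this function for a path query $(s, r_1/\cdots/r_k, o)$ by

$$\phi_{\mathrm{TransE}}(s, r_1/\cdots/r_k, o) = -\lVert \mathbf{e}_s + \mathbf{w}_{r_1} + \cdots + \mathbf{w}_{r_k} - \mathbf{e}_o \rVert_2^2 \tag{8}$$

Thus, a composite relation is represented as the sum of the embedding vectors for its constituent atomic relations. Unfortunately, Eq. (8) is also invariant to the permutation of relations $r_1, \ldots, r_k$, and their order is not respected.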
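The order-invariance problem discussed in this section is easy to reproduce with random parameters. The sketch below is only an illustration: the helper names are hypothetical, and the squared Euclidean distance used for the translation-based score is an assumption made for the example.

```python
import numpy as np

def bilinear_path_score(e_s, relation_matrices, e_o):
    """e_s^T W_{r_1} ... W_{r_k} e_o, multiplying the relation matrices in path order."""
    v = e_s
    for W in relation_matrices:
        v = v @ W
    return v @ e_o

def translation_path_score(e_s, relation_vectors, e_o):
    """Negative squared distance after adding all relation vectors to e_s (assumed norm)."""
    return -np.sum((e_s + np.sum(relation_vectors, axis=0) - e_o) ** 2)

rng = np.random.default_rng(0)
n = 4
e_s, e_o = rng.standard_normal((2, n))

D1, D2 = np.diag(rng.standard_normal(n)), np.diag(rng.standard_normal(n))  # diagonal relations
W1, W2 = rng.standard_normal((2, n, n))                                    # full relations
t1, t2 = rng.standard_normal((2, n))                                       # translation vectors

diagonal_order_invariant = np.isclose(
    bilinear_path_score(e_s, [D1, D2], e_o), bilinear_path_score(e_s, [D2, D1], e_o))
full_order_invariant = np.isclose(
    bilinear_path_score(e_s, [W1, W2], e_o), bilinear_path_score(e_s, [W2, W1], e_o))
translation_order_invariant = np.isclose(
    translation_path_score(e_s, [t1, t2], e_o), translation_path_score(e_s, [t2, t1], e_o))
```

Diagonal matrices and summed translation vectors give identical scores for both orders, while generic full matrices do not.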

5 Knowledge graph embedding with block circulant matrices

5.1 BlockHolE

In this section, we propose a bilinear KGE model suitable for path QA. In this model, the relation matrices are non-commutative. It thus respects the order of relations in a path query. Further, it has a smaller number of parameters than RESCAL in general. To be specific, our model constrains the relation matrices to be block circulant.

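A matrix $\mathbf{W}$ is block circulant if it can be written in the form

$$\mathbf{W} = \begin{bmatrix} \mathbf{W}^{(11)} & \cdots & \mathbf{W}^{(1b)} \\ \vdots & \ddots & \vdots \\ \mathbf{W}^{(b1)} & \cdots & \mathbf{W}^{(bb)} \end{bmatrix} \tag{9}$$

where each $\mathbf{W}^{(ij)} = \mathrm{circ}(\mathbf{w}^{(ij)})$, $i, j = 1, \ldots, b$, is a circulant matrix determined by $\mathbf{w}^{(ij)} \in \mathbb{R}^m$. Thus, if the dimension of the matrix in Eq. (9) is $n \times n$, we have $n = bm$. A block circulant matrix is non-commutative when $b \geq 2$; i.e., for two block circulant matrices $\mathbf{W}$, $\mathbf{W}'$, $\mathbf{W}\mathbf{W}' \neq \mathbf{W}'\mathbf{W}$ in general.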
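A small numerical experiment makes the non-commutativity claim concrete. In the sketch below, a block circulant matrix is assembled from a parameter array of shape (b, b, m), where b is the number of blocks per row and m the block size; these letters, the function names, and the first-column convention for circulant blocks are assumptions of the example rather than notation confirmed by the text.

```python
import numpy as np

def circulant(c):
    """Circulant matrix with first column c, i.e. entry (i, j) equals c[(i - j) mod m]."""
    return np.stack([np.roll(c, k) for k in range(len(c))], axis=1)

def block_circulant(w):
    """Assemble a (b*m) x (b*m) matrix whose (i, j) block is circulant(w[i, j]).

    w has shape (b, b, m), so the matrix is described by b*b*m numbers.
    """
    b = w.shape[0]
    return np.block([[circulant(w[i, j]) for j in range(b)] for i in range(b)])

rng = np.random.default_rng(0)
m = 4

# One block (b = 1): plain circulant matrices, which always commute.
A1, B1 = (block_circulant(rng.standard_normal((1, 1, m))) for _ in range(2))
single_block_commutes = np.allclose(A1 @ B1, B1 @ A1)

# Two blocks per row (b = 2): products depend on the order in general.
A2, B2 = (block_circulant(rng.standard_normal((2, 2, m))) for _ in range(2))
multi_block_commutes = np.allclose(A2 @ B2, B2 @ A2)
```

With b = 2 and m = 4 each matrix is 8 x 8 but is described by only 16 parameters, compared with 64 for an unconstrained matrix of the same size.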

Substituting a block circulant matrix of Eq. (9) for matrix in the bilinear scoring function (Eq. (3)) yields

(10)

where , and , . Recall that , and thus , . Using equalities Nickel et al. (2016b) and to rewrite Eq. (10), we have

(11)

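We call this model BlockHolE, after the fact that it reduces to HolE when the relation matrix consists of a single circulant block ($b = 1$); cf. Eq. (5). Also, BlockHolE is identical to $n$-dimensional RESCAL when every block is $1 \times 1$ ($m = 1$, or equivalently $b = n$).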

The number of parameters in BlockHolE is (or ), and naive computation of Eq. (11) takes time using FFT. However, we can make this computation faster by exploiting the duality of the Fourier transform, as shown below.

5.2 Fast computation in complex space

Using a technique similar to that used by Hayashi and Shimbo (2017) to show the equivalence of ComplEx and HolE, we can eliminate the Fourier transform to speed up the computation of BlockHolE scores. We first rewrite Eq. (11) as follows:

where is the discrete Fourier matrix. Here we used Eq. (1) to derive the second equation, and to derive the third. Defining complex vectors , , and yields

(12)

On the basis of Eq. (12), we train directly in complex space (i.e., the Fourier domain) instead of and use it as the vector embedding of entity , for all ; similarly, is directly trained in complex space to represent relation . The number of parameters in this model is , and Eq. (12) can be computed in time. Typically, we set . For instance, in the experiment of Section 6, we set and , and thus . In this case, factor is negligible and the computational complexity is linear in .

5.3 Modeling path QA

BlockHolE can be used in path QA as follows. First, for any and , let

Then, Eq. (12) can be rewritten as

and we can compute the score of relation paths by

Since for , this scoring function respects the order of relations in .
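The idea of scoring paths in the Fourier domain can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact parameterization: relation parameters are stored as an array of shape (b, b, m), entity embeddings as (b, m), circulant blocks use the first-column convention, and all names are hypothetical. After an FFT along the last axis, each frequency carries a small b x b complex matrix, and composing relations reduces to multiplying these small matrices frequency by frequency.

```python
import numpy as np

def circulant(c):
    """Circulant matrix with first column c."""
    return np.stack([np.roll(c, k) for k in range(len(c))], axis=1)

def block_circulant(w):
    """Dense (b*m) x (b*m) matrix whose (i, j) block is circulant(w[i, j])."""
    b = w.shape[0]
    return np.block([[circulant(w[i, j]) for j in range(b)] for i in range(b)])

def dense_path_score(e_s, relations, e_o):
    """Reference computation: e_s^T W_1 ... W_k e_o with dense block circulant matrices."""
    v = e_s.reshape(-1)
    for w in relations:
        v = v @ block_circulant(w)
    return v @ e_o.reshape(-1)

def fourier_path_score(e_s, relations, e_o):
    """Same score computed per frequency with b x b complex matrices."""
    m = e_s.shape[1]
    v = np.conj(np.fft.fft(e_s, axis=1))            # shape (b, m)
    for w in relations:
        w_hat = np.fft.fft(w, axis=2)                # shape (b, b, m)
        v = np.einsum('ik,ijk->jk', v, w_hat)        # row vector times matrix, per frequency
    return np.real(np.sum(v * np.fft.fft(e_o, axis=1))) / m

rng = np.random.default_rng(0)
b, m = 2, 8
e_s, e_o = rng.standard_normal((2, b, m))
w1, w2 = rng.standard_normal((2, b, b, m))

scores_agree = np.isclose(dense_path_score(e_s, [w1, w2], e_o),
                          fourier_path_score(e_s, [w1, w2], e_o))
order_matters = not np.isclose(fourier_path_score(e_s, [w1, w2], e_o),
                               fourier_path_score(e_s, [w2, w1], e_o))
```

In this sketch the FFTs are applied to real embeddings only to compare against the dense reference. If the Fourier-domain vectors were stored and trained directly, as the text describes, the transforms would disappear and each relation step would cost on the order of b*b*m operations.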

6 Experiments

In this section, we report the results of empirical evaluation investigating the commutativity property of bilinear KGE models on the path QA task. As expected, the proposed BlockHolE model, which uses non-commutative relation matrices, outperformed commutative bilinear KGE models.

6.1 Dataset and evaluation protocol

                         WN11        FB13
Base   Train             112,581     316,232
       Valid             2,609       5,908
       Test              10,544      23,733
Path   Train             2,129,539   6,266,058
       Valid             11,277      27,163
       Test-Deduction    24,749      77,883
       Test-Induction    21,828      31,674
Table 2: Dataset provided by Guu et al. (2015).
                           WN11                                      FB13
                           Base        Deduction   Induction   Base        Deduction   Induction
                           P@10  MQ    P@10  MQ    P@10  MQ    P@10  MQ    P@10  MQ    P@10  MQ
DistMult                   45.6  83.0  33.5  97.7  29.6  79.8  62.7  91.6  63.6  86.4  59.3  86.5
ComplEx                    60.9  83.1  68.7  99.2  46.1  79.7  76.8  93.0  71.5  90.0  70.5  88.9
RESCAL                     51.8  74.2  43.2  97.9  51.2  76.8  65.2  91.1  66.9  88.4  69.8  89.0
BlockHolE (1st setting)    80.9  83.4  70.2  99.5  54.9  81.0  79.2  93.2  75.0  91.5  71.3  90.0
BlockHolE (2nd setting)    80.5  75.6  69.3  99.2  54.5  77.4  76.2  92.1  72.1  90.5  70.9  89.5
Table 3: Path QA ranking result: Comparing BlockHolE ( and ) to other bilinear models. The dimension of the embedding space for DistMult, ComplEx and RESCAL was set to as the result of grid search.
Figure 2: CPU run time per epoch of BlockHolE on WN11 Base with a single CPU thread.

Figure 3: Path QA classification result comparing BlockHolE ( and ) to DistMult, ComplEx and RESCAL models (all with as the result of grid search).

The comparison of KGE models was performed in two path QA tasks: (i) ranking and (ii) binary classification tasks.

6.1.1 Path QA ranking

For the path QA ranking task, we adopted the same protocol and dataset used by Guu et al. (2015). Table 2 shows the statistics of their dataset. The dataset consists of two parts, “Base” and “Path”.

The Base part only contains facts (i.e., path queries with $k = 1$), and thus it is essentially for evaluating KGC performance. Its training samples constitute the observed facts $\mathcal{F}$, and the facts in the entire Base part (training/validation/test sets) make up the ground truth facts $\mathcal{F}^*$.

The Path part contains path queries sampled from the same $\mathcal{F}$ and $\mathcal{F}^*$ as the Base part. The test samples in the Path part are divided into “deduction” and “induction” sets. In the “deduction” set, test samples were sampled from the Base training graph $\mathcal{G}(\mathcal{F})$. By contrast, in the “induction” set, the test samples were chosen from the ground truth graph $\mathcal{G}(\mathcal{F}^*)$ such that none of them has a corresponding path in $\mathcal{G}(\mathcal{F})$. Thus, the “induction” set is intended to measure how well a model generalizes to unobserved paths, whereas the “deduction” set tests its ability to faithfully encode the observed training graph.

At the time of evaluation, for each test sample $(s, r_1/\cdots/r_k, o)$, a candidate set of entities was first computed. The candidates are the entities that $r_k$ (i.e., the last relation in the test query) takes as its object at least once in the knowledge graph. Then, for each compared model, we ranked the candidate entities $e$ by the score $\phi(s, r_1/\cdots/r_k, e)$, where $\phi$ is learned by the model from the training set.

The quality of the ranking was measured by two evaluation metrics: averaged mean quantile (MQ) and P@10 (percentage of correct answers ranked in the top 10). For

where , the correct answer set is the set of all entities that can be reached from by traversing over . Formally, let , and the answer set can be recursively defined: . With these definitions, MQ is computed by the following formula:

(13)

where is the set of incorrect answers. Eq. (13) cannot be computed for queries for which this set is empty, and these queries were excluded from evaluation. For further details, see the original paper by Guu et al. (2015).
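As a rough illustration of a quantile-based ranking metric of this kind, the sketch below scores one query by the fraction of incorrect candidates that rank below each correct answer, averaged over the correct answers. This reading of the metric, the strict handling of ties, and all names are assumptions of the example; the precise definition is the one given in Eq. (13) and in Guu et al. (2015).

```python
import numpy as np

def mean_quantile(scores, correct, incorrect):
    """Average, over correct answers, of the fraction of incorrect answers scored strictly lower.

    scores maps each candidate entity to its model score for one query.
    Returns None when either set is empty, since the ratio is then undefined.
    """
    if not correct or not incorrect:
        return None
    incorrect_scores = np.array([scores[e] for e in incorrect])
    quantiles = [np.mean(incorrect_scores < scores[t]) for t in correct]
    return float(np.mean(quantiles))

# Hypothetical scores for one query with two correct and two incorrect candidates.
toy_scores = {"a": 0.9, "b": 0.1, "c": 0.2, "d": 0.3}
toy_mq = mean_quantile(toy_scores, correct=["a", "c"], incorrect=["b", "d"])  # 0.75
```

Here "a" outranks both incorrect candidates (quantile 1.0) and "c" outranks one of two (quantile 0.5), giving 0.75 for the query. A dataset-level figure would average this value over all queries for which it is defined.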

6.1.2 Path QA classification

In the path QA classification task, we simply report classification accuracy. After the scoring function

was trained with logistic regression, a path query

was classified as true if

, or false otherwise.

Since the test and validation sets of Path in Table 2 contain only correct queries, we sampled negative ones by the following procedure: For a correct query (), we generated its reverse relation path query . If does not exist in , we used it as a negative.

6.2 Experiment setup

We compared BlockHolE with state-of-the-art bilinear KGE models: DistMult, RESCAL and ComplEx. We have implemented BlockHolE in Java. BlockHolE reduces to ComplEx when , and with the imaginary parts of parameters set to , it reduces to RESCAL when and to DistMult when . For a fair run time comparison, however, we separately implemented RESCAL using jblas-1.2.4 for matrix computation. Through all experiments, we optimized the logistic loss with L2 regularization on the parameters :

where denotes the truth value of a query in a training data . Given a correct query , we generated negative samples by replacing with an entity randomly sampled from .

We selected the hyperparameters via grid search such that on the validation set they maximize classification accuracy in the path QA classification task and MQ in the path QA ranking task. For all models except BlockHolE, all combinations of

, learning rate , and the embedding size were tried during grid search. For BlockHolE, all combinations of , and were tried. The maximum number of training epochs was set to 500. The number of negatives generated per positive sample was 5 during training.

6.3 Results

6.3.1 Path QA ranking

Table 3 shows the results on the path QA ranking data. BlockHolE outperforms other bilinear KGE models considerably both on deductive and inductive test settings. These results strongly suggest that BlockHolE is more expressive in modeling path QA than DistMult and ComplEx, while effectively reducing redundant parameters in RESCAL which can cause model overfitting. Figure 2 shows the empirical scalability of BlockHolE. When is small, BlockHolE scales linearly in the dimension of the embedding space.

6.3.2 Path QA classification

Figure 3 shows the accuracy of path QA classification. DistMult and ComplEx were considerably worse than BlockHolE and RESCAL for both WN11 and FB13. This result confirms our claim: The non-commutativity of relation matrices plays a critical role in modeling path QA. The performance of BlockHolE () was comparable to that of RESCAL but the former was 12 times faster.

6.4 Analysis

Label Relation Path ComplEx BlockHolE
+ */parents/religion/* 96.7 100.0
- */religion/parents/* 3.3 100.0
Table 4: Classification accuracy on selected queries.

The accuracies of BlockHolE and RESCAL on the path QA classification task were markedly better than those of DistMult and ComplEx. We analyzed the results further. We extracted all queries from of FB13 that consist of an interpretable relation path */parents/religion/* where * denotes “can match any relation path”. For such queries, we also generated meaningless queries with the two relations swapped, */religion/parents/*, as negatives. Table 4 shows the classification accuracies of ComplEx and BlockHolE (). The results clearly show that ComplEx answers almost none of the negative queries correctly (3.3% accuracy), because its relation matrices are commutative.

7 Summary

In this paper, we have pointed out the problems of existing bilinear KGE models in path QA, and proposed a new model that overcomes these problems. This model, called BlockHolE, represents relations as block circulant matrices. As a result, it respects the order of relations in path queries, while enjoying linear-time computation of scoring functions when the number of blocks is sufficiently small. It generalizes HolE/ComplEx, and it can also be interpreted as an interpolation between RESCAL and HolE/ComplEx. Its effectiveness was shown empirically in path QA.

Our proposal can be useful in not only path QA but also many tasks such as associative rule mining Yang et al. (2015), path regularization Lin et al. (2015), and more complex QA Hamilton et al. (2018), in which composite relations need to be embedded as a vector. Other future directions include reducing the increased parameters in the proposed block circulant matrices, such as by using multiplicative L1 regularization for ComplEx Manabe et al. (2018).

Acknowledgments

We thank anonymous reviewers for helpful comments. This work was partially supported by JSPS Kakenhi Grant Numbers 19H04173, 18K11457, and 18H03288.

References