CORE: A Knowledge Graph Entity Type Prediction Method via Complex Space Regression and Embedding

12/19/2021
by Xiou Ge, et al.
University of Southern California

Entity type prediction is an important problem in knowledge graph (KG) research. A new KG entity type prediction method, named CORE (COmplex space Regression and Embedding), is proposed in this work. The proposed CORE method leverages the expressive power of two complex space embedding models; namely, RotatE and ComplEx models. It embeds entities and types in two different complex spaces using either RotatE or ComplEx. Then, we derive a complex regression model to link these two spaces. Finally, a mechanism to optimize embedding and regression parameters jointly is introduced. Experiments show that CORE outperforms benchmarking methods on representative KG entity type inference datasets. Strengths and weaknesses of various entity type prediction methods are analyzed.



1 Introduction

Research on knowledge graph (KG) construction, completion, inference, and applications has grown rapidly in recent years, since KGs offer a powerful tool for modeling human knowledge in graph form. Nodes in KGs denote entities and links represent relations between entities. The basic building blocks of a KG are entity-relation triples in the form (subject, predicate, object) introduced by the Resource Description Framework (RDF). Learning representations for entities and relations in low-dimensional vector spaces is one of the most active research topics in the field.

Entity type offers a valuable piece of information for KG learning tasks, and better results in KG-related tasks have been achieved with the help of entity types. For example, TKRL [xie2016representation] uses a hierarchical type encoder for KG completion by incorporating entity type information. AutoETER [Niu:AutoETER] adopts a similar approach but encodes the type information with projection matrices. Building on DistMult [yang2014embedding] and ComplEx [trouillon2016complex] embedding, [jain2018type] proposes an improved factorization model without explicit type supervision. JOIE [hao2019universal] embeds entities and types in two separate spaces by learning an instance-view embedding and an ontology-view embedding. Similar to JOIE, TaRP [DBLP:conf/aaai/CuiKTGJ21] leverages the hierarchical type ontology structure for relation prediction. Instead of learning an embedding for every type, TaRP develops a heuristic weighting mechanism to rank the prior probability of a relation given its head and tail type information. A similar idea was examined in TransT [ma2017transt]. Entity type is also important for information extraction tasks, including entity linking [gupta2017entity] and relation extraction [zhou2005exploring, culotta2004dependency].

Figure 1: A KG with the entity type information.
Figure 2: Illustration of the statistical approach.

On the other hand, entity type prediction is challenging for several reasons. First, collecting additional type labels for entities is expensive. Second, type information is often incomplete, especially for large-scale datasets. Fig. 1 shows a snapshot of a KG, where missing entity types need to be inferred. Third, KGs are ever-evolving and type information is often corrupted by noisy facts. Thus, there is a need to design algorithms that predict missing type labels. Quite a few approaches have been proposed to predict missing entity types in KGs. They can be classified into three categories: statistical-based, classifier-based, and embedding-based methods. A brief review is given in Sec. 2.

The contributions of our work are summarized as follows:

  • We present a new method for entity type prediction named CORE (COmplex space Regression and Embedding). CORE leverages the expressive power of complex space embedding models, namely RotatE [sun2018rotate] and ComplEx [trouillon2016complex], to represent entities and types. To capture the relatedness of entities and types, a complex regression model is built between the entity space and the type space.

  • We conduct experiments on three major KG datasets and compare the performance of CORE with state-of-the-art entity type prediction methods. CORE outperforms extant methods on most evaluation metrics.

  • We study and compare statistical-based, classifier-based, and embedding-based methods for entity type prediction, and discuss the strengths and weaknesses of each approach. We also introduce a stronger statistical baseline named SDType-Cond.

2 Related Work

Work on entity type prediction can be categorized into three types as elaborated below.

2.1 Statistical Approach.

Before machine learning was introduced to KG entity type prediction, type inference was often performed with RDF rules and graph pattern matching [gangemi2012automatic]. Although such handcrafted rules make type predictions with high precision, they tend to miss many possible cases since it is almost impossible to manually create all patterns. Hence, this approach does not scale to large datasets. As a result, researchers have applied basic statistics to the KG entity type prediction problem. An early method, called SDType [paulheim2013type], predicts missing entity types by estimating the empirical conditional probability distribution $P(\textit{type} \mid \textit{relation})$ and aggregating all such conditional probabilities generated by the neighboring relations of the target entity. Although this approach is robust to noise, it cannot predict unseen combinations of types and relations. Another shortcoming of the statistical approach is that its performance deteriorates as the number of entity types grows large (say, to thousands of entity types). Furthermore, it does not exploit the type information of neighboring entities. In this paper, we show that type prediction performance can be further improved by conditioning on the type information of neighboring entities. Fig. 2 illustrates the statistical approach for entity type prediction: to predict the type of the target node, given each neighboring relation and neighbor node's type, we can estimate the type distribution of the target node.
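To make the aggregation concrete, the following Python sketch (ours, not the original SDType implementation) estimates $P(\textit{type} \mid \textit{relation})$ from entities with known types and averages the resulting distributions over a target entity's outgoing relations; incoming relations are omitted for brevity, and all names are illustrative.

```python
from collections import defaultdict

def estimate_type_given_relation(triples, entity_types):
    """Estimate P(type | relation) from entities whose types are known."""
    counts = defaultdict(lambda: defaultdict(float))
    totals = defaultdict(float)
    for s, r, o in triples:
        # An outgoing relation r is evidence for the subject's type.
        for t in entity_types.get(s, []):
            counts[r][t] += 1.0
            totals[r] += 1.0
    return {r: {t: c / totals[r] for t, c in ts.items()}
            for r, ts in counts.items()}

def predict_types(entity, triples, cond_prob, top_k=3):
    """Aggregate P(type | r) over the target entity's neighboring relations."""
    scores = defaultdict(float)
    relations = [r for s, r, o in triples if s == entity]
    for r in relations:
        for t, p in cond_prob.get(r, {}).items():
            scores[t] += p / max(len(relations), 1)  # simple averaging
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]
```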

2.2 Classification Approach.

Node classification is a common task in graph learning. One solution is to train a classifier that takes node features as input and predicts the corresponding labels. This idea applies to entity type prediction as well. Researchers [yaghoobzadeh-schutze-2015-corpus, xin2018improving, sofronova2020entity] have conducted experiments with entity textual descriptions, in the form of word embeddings, as features and neural networks such as MLPs, LSTMs, and CNNs as classifiers. An end-to-end architecture that learns entity embedding and type prediction jointly was proposed in [jin2018attributed], where the correlation between entity attributes and the KG link structure was taken into account in learning distributed entity representations. In this work, with pretrained KG embeddings as features, we test classifiers such as SVM and XGBoost [chen2016xgboost]. Again, we find that this approach does not perform well on large datasets. In fact, a comparative study of statistical and classifier approaches was conducted in [jain2021embeddings]. By analyzing results from selected entity type classes, the authors concluded that KG embeddings fail to capture the semantics of entities and that statistical approaches are often superior in type prediction. While we concur with their experimental finding that the combination of entity embedding features and classifiers tends to yield poor results, we do not agree that KG embedding models are not useful for entity type prediction, as elaborated below.

2.3 Embedding Approach.

Before introducing the embedding approach to entity type prediction, it is worthwhile to briefly review KG embedding models. According to [ji2021survey], a KG embedding model can be categorized by its representation space and scoring function. Among the several representation spaces, real and complex vector spaces are the two most common ones. TransE models triples in the form (subject, relation, object), or $(s, r, o)$, in a $d$-dimensional real vector space with the translational principle $\mathbf{s} + \mathbf{r} \approx \mathbf{o}$. Yet, TransE is not suitable for modeling symmetric and many-to-one relations. To overcome this weakness, researchers venture into the complex vector space $\mathbb{C}^d$ and design models with greater expressive power. ComplEx and RotatE are two prominent examples. ComplEx is motivated by low-rank matrix factorization in the complex space and models both symmetric and asymmetric relations effectively. Inspired by Euler's identity, RotatE models relations as rotations in the complex vector space to remedy ComplEx's inability to model composition patterns in KGs. Both TransE and RotatE adopt distance-based scoring functions, while ComplEx uses a semantic matching score.
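As a quick numerical illustration of this gap (our example, not the paper's), the snippet below sets a RotatE relation to a 180-degree rotation and checks that a triple and its reverse both score perfectly, i.e., the symmetric pattern that a nonzero TransE translation cannot satisfy:

```python
import numpy as np

d = 4
rng = np.random.default_rng(0)
s = rng.normal(size=d) + 1j * rng.normal(size=d)
r = np.full(d, np.exp(1j * np.pi))     # RotatE relation: rotation by pi
o = s * r                              # o is s rotated by 180 degrees

def rotate_score(subj, rel, obj):
    """RotatE distance-based score: higher (closer to 0) is better."""
    return -np.linalg.norm(subj * rel - obj)

print(rotate_score(s, r, o))   # ~0: (s, r, o) holds
print(rotate_score(o, r, s))   # ~0: (o, r, s) holds too, so symmetry is modeled
```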

Besides KG embedding, embedding-based entity type prediction approaches learn a distributed representation of entity types. For example, a distance-based scoring function can be used to measure the relevance of a particular type to the target entity. The ETE model [moon2017learning] embeds entities and entity types in the same space. ConnectE [zhao-etal-2020-connecting] embeds entities and types in two different spaces and learns a mapping from the entity space to the type space; it further leverages neighbor label information to boost performance. Based on a similar idea, JOIE [hao2019universal] adds an intra-view component to model the hierarchical structure of the type ontology. ETE, ConnectE, and JOIE all adopt TransE embedding to represent KG entities. JOIE targets better link prediction results, whereas ETE and ConnectE focus on improving entity type prediction. TransE is known to suffer from a few problems, e.g., an inability to model symmetric relations. Poor relation representation leads to poor entity representation. Since the quality of entity representation affects entity type prediction performance, we exploit the expressive power of complex-space KG embedding to achieve better results.

3 Proposed CORE Methods

To leverage the expressive power of complex-space KG embedding, we propose to learn a set of embeddings for both the entity space and the type space, as shown in Fig. 3. Specifically, we experiment with the ComplEx and RotatE embedding models. On top of the entity embedding and the type embedding, we learn a regression between the two spaces. Finally, we make type predictions using a distance-based scoring function based on the embeddings and regression parameters. The high-level concept of the proposed CORE model is given in Fig. 4.

3.1 Complex Space KG Embedding.

Let $(s, r, o)$ be a KG triple and $\mathbf{s}, \mathbf{r}, \mathbf{o} \in \mathbb{C}^{d_e}$ denote the complex space representations of the triple's subject, relation, and object. For ComplEx, the score function is

$$f(s, r, o) = \mathrm{Re}(\langle \mathbf{s}, \mathbf{r}, \bar{\mathbf{o}} \rangle),$$

where $\langle \cdot, \cdot, \cdot \rangle$ denotes an element-wise multi-linear dot product and $\bar{\mathbf{o}}$ denotes the conjugate of the complex vector $\mathbf{o}$. For RotatE embedding, the score function is

$$f(s, r, o) = -\| \mathbf{s} \circ \mathbf{r} - \mathbf{o} \|,$$

where $\circ$ denotes the element-wise product.

We follow RotatE’s negative sampling loss and self-adversarial training strategy to train the embedding. The loss function for KG embedding models can be written as

where

is the sigmoid function,

is a fixed margin hyperparameter for training KG embedding,

is the th negative triple and is the probability of drawing negative triple . Given a positive triple, , the negative sampling distribution is

where is the temperature of sampling.
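The score functions and loss above translate directly into numpy. The sketch below is our illustration under the notation just introduced; it treats the margin $\gamma_1$ and the temperature $\alpha$ as plain floats and is not the paper's training code.

```python
import numpy as np

def complex_score(s, r, o):
    """ComplEx: Re(<s, r, conj(o)>), an element-wise multi-linear dot product."""
    return np.real(np.sum(s * r * np.conj(o)))

def rotate_score(s, r, o):
    """RotatE: -||s o r - o||, with r constrained to unit modulus."""
    return -np.linalg.norm(s * r - o)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_adversarial_loss(pos, negs, score_fn, gamma=24.0, alpha=1.0):
    """Negative sampling loss with self-adversarially weighted negatives."""
    s, r, o = pos
    pos_term = -np.log(sigmoid(gamma + score_fn(s, r, o)))
    neg_scores = np.array([score_fn(sn, rn, on) for sn, rn, on in negs])
    weights = np.exp(alpha * neg_scores)
    weights = weights / weights.sum()            # p(negative triple)
    neg_term = -np.sum(weights * np.log(sigmoid(-neg_scores - gamma)))
    return pos_term + neg_term
```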

3.2 Complex Space Type Embedding.

Similar to the definitions in the KG embedding space, we use $(t_s, r, t_o)$ to denote a type triple and $\mathbf{t}_s, \mathbf{r}_t, \mathbf{t}_o \in \mathbb{C}^{d_t}$ to denote the representations of the subject type, the relation, and the object type in the type embedding space. For ComplEx embedding, the score function is

$$f(t_s, r, t_o) = \mathrm{Re}(\langle \mathbf{t}_s, \mathbf{r}_t, \bar{\mathbf{t}}_o \rangle).$$

For RotatE embedding, the score function is

$$f(t_s, r, t_o) = -\| \mathbf{t}_s \circ \mathbf{r}_t - \mathbf{t}_o \|.$$

To train the type embedding, we use the self-adversarial negative sampling loss in the form of

$$L_{T} = -\log \sigma(\gamma_2 + f(t_s, r, t_o)) - \sum_{i=1}^{n} p(t'_{s,i}, r, t'_{o,i}) \log \sigma(-f(t'_{s,i}, r, t'_{o,i}) - \gamma_2),$$

where $\gamma_2$ is a fixed margin hyperparameter for training the type embedding, $(t'_{s,i}, r, t'_{o,i})$ is the $i$th negative triple, and $p(t'_{s,i}, r, t'_{o,i})$ is the probability of drawing that negative triple. Given a positive triple $(t_{s,i}, r, t_{o,i})$, the negative sampling distribution is

$$p(t'_{s,j}, r, t'_{o,j} \mid \{(t_{s,i}, r, t_{o,i})\}) = \frac{\exp\left(\alpha f(t'_{s,j}, r, t'_{o,j})\right)}{\sum_i \exp\left(\alpha f(t'_{s,i}, r, t'_{o,i})\right)},$$

where $\alpha$ is the temperature of sampling.

Figure 3: Illustration of RotatE entity space and RotatE type space and the regression linking two spaces.
Figure 4: Illustration of the CORE Model, where the blue and red dots denote entities and types in their complex embedding spaces, respectively.

3.3 Solving Complex Space Regression.

To propagate information from the entity space to the type space, we learn a regression between the two complex spaces. A feasible and logical way of solving the complex regression is to cast the problem as a multivariate regression problem in a real vector space. Formally, let $\mathbf{x} \in \mathbb{C}^{d_e}$ and $\mathbf{y} \in \mathbb{C}^{d_t}$ denote the representations of an entity and its type, respectively. We split every complex entity vector into two real vectors, namely its real part $\mathrm{Re}(\mathbf{x})$ and its imaginary part $\mathrm{Im}(\mathbf{x})$. We do the same to split the complex type vector into two real vectors, $\mathrm{Re}(\mathbf{y})$ and $\mathrm{Im}(\mathbf{y})$.

As shown in Fig. 3, the regression consists of four real block matrices: $\mathbf{A}_{rr}$, $\mathbf{A}_{ri}$, $\mathbf{A}_{ir}$, and $\mathbf{A}_{ii}$. The real part of the output vector depends on both the real and imaginary parts of the input vector, and so does the imaginary part of the output vector. The regression problem can thus be written as

$$\begin{bmatrix} \mathrm{Re}(\mathbf{y}) \\ \mathrm{Im}(\mathbf{y}) \end{bmatrix} = \begin{bmatrix} \mathbf{A}_{rr} & \mathbf{A}_{ri} \\ \mathbf{A}_{ir} & \mathbf{A}_{ii} \end{bmatrix} \begin{bmatrix} \mathrm{Re}(\mathbf{x}) \\ \mathrm{Im}(\mathbf{x}) \end{bmatrix} + \boldsymbol{\epsilon},$$

where $\boldsymbol{\epsilon}$ denotes the error vector. To minimize $\boldsymbol{\epsilon}$, we use the following score function:

$$f(\mathbf{x}, \mathbf{y}) = -\left\| \begin{bmatrix} \mathbf{A}_{rr} & \mathbf{A}_{ri} \\ \mathbf{A}_{ir} & \mathbf{A}_{ii} \end{bmatrix} \begin{bmatrix} \mathrm{Re}(\mathbf{x}) \\ \mathrm{Im}(\mathbf{x}) \end{bmatrix} - \begin{bmatrix} \mathrm{Re}(\mathbf{y}) \\ \mathrm{Im}(\mathbf{y}) \end{bmatrix} \right\|.$$

We find that the self-adversarial negative sampling strategy is also useful in optimizing the regression coefficients. The loss function for learning these coefficients is set to

$$L_{R} = -\log \sigma(\gamma_3 + f(\mathbf{x}, \mathbf{y})) - \sum_{i=1}^{n} p(\mathbf{x}, \mathbf{y}'_i) \log \sigma(-f(\mathbf{x}, \mathbf{y}'_i) - \gamma_3),$$

where $\gamma_3$ is a fixed margin hyperparameter for the regression, $(\mathbf{x}, \mathbf{y}'_i)$ is the $i$th negative pair, and $p(\mathbf{x}, \mathbf{y}'_i)$ is the probability of drawing that negative pair. Given a positive pair $(\mathbf{x}, \mathbf{y})$, the negative sampling distribution equals

$$p(\mathbf{x}, \mathbf{y}'_j \mid \{(\mathbf{x}_i, \mathbf{y}_i)\}) = \frac{\exp\left(\alpha f(\mathbf{x}, \mathbf{y}'_j)\right)}{\sum_i \exp\left(\alpha f(\mathbf{x}, \mathbf{y}'_i)\right)},$$

where $\alpha$ is the temperature of sampling.
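The block-matrix map and its residual score have a direct implementation. The sketch below (ours) uses the $\mathbf{A}_{rr}$, $\mathbf{A}_{ri}$, $\mathbf{A}_{ir}$, $\mathbf{A}_{ii}$ notation from above, with each block of shape $d_t \times d_e$:

```python
import numpy as np

def project_entity(x, A_rr, A_ri, A_ir, A_ii):
    """Map a complex entity vector x in C^{d_e} into the type space C^{d_t}."""
    xr, xi = np.real(x), np.imag(x)
    yr = A_rr @ xr + A_ri @ xi    # real output depends on both parts of x
    yi = A_ir @ xr + A_ii @ xi    # and so does the imaginary output
    return yr + 1j * yi

def regression_score(x, y, blocks):
    """Distance-based score: negative norm of the regression residual."""
    y_hat = project_entity(x, *blocks)
    residual = np.concatenate([np.real(y_hat - y), np.imag(y_hat - y)])
    return -np.linalg.norm(residual)
```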

3.4 Type Prediction.

We use the distance-based scoring function of the regression to predict entity types. The type prediction function can be written as

$$\hat{t} = \operatorname*{arg\,max}_{\mathbf{y} \in T} f(\mathbf{x}, \mathbf{y}),$$

where $T$ denotes the set of all types.
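Prediction then reduces to scoring every candidate type and keeping the top-ranked ones. A small sketch, reusing `regression_score` from above and assuming a dictionary `type_embeddings` that maps each type to its complex vector:

```python
def predict_type(x, type_embeddings, blocks, top_k=3):
    """Rank all candidate types by the regression score (smallest distance first)."""
    scored = [(t, regression_score(x, y, blocks))
              for t, y in type_embeddings.items()]
    scored.sort(key=lambda kv: -kv[1])
    return scored[:top_k]
```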

3.5 Optimization.

We first initialize the embedding and regression parameters by sampling from the standard uniform distribution. The three parts of our model are optimized sequentially. First, we optimize the KG embedding using KG triples and negative triples. Next, we train the regression and type space embedding parameters; we freeze the KG embedding in this stage to ensure that the regression learns the important information. Last, we further optimize the type space embedding using type triples. To avoid overfitting of the regression model in the early training stage, we alternate the optimization of each part of the model every 1000 iterations.
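Schematically, the alternating schedule can be organized as below. The `model` interface (`step_kg`, `step_regression`, `step_types`, `freeze_kg`, `unfreeze_kg`) is hypothetical; only the rotation among the three parameter groups every 1000 iterations reflects the procedure described above.

```python
def train(model, kg_batches, type_batches, pair_batches, total_iters=30000):
    """Alternate among the three parameter groups every 1000 iterations."""
    phases = ["kg_embedding", "regression_and_types", "type_embedding"]
    for it in range(total_iters):
        phase = phases[(it // 1000) % len(phases)]
        if phase == "kg_embedding":
            model.step_kg(next(kg_batches))            # entity/relation parameters
        elif phase == "regression_and_types":
            model.freeze_kg()                          # keep KG embedding fixed
            model.step_regression(next(pair_batches))  # regression + type parameters
            model.unfreeze_kg()
        else:
            model.step_types(next(type_batches))       # type triples only
```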

3.6 Complexity.

The memory and space complexity for CORE are both $O(n_e d_e + n_r (d_e + d_t) + n_t d_t + d_e d_t)$, where $n$ denotes the number of objects, $d$ denotes the embedding dimension, and the subscripts $e$, $r$, and $t$ denote entity, relation, and type, respectively.

Dataset #Ent #Rel #Type #KG Triples (Train / Valid / Test) #Entity Type Pairs (Train / Valid / Test)
FB15k-ET 14,951 1,345 3,851 483,142 / 50,000 / 59,071 136,618 / 15,749 / 15,780
YAGO43k-ET 42,335 37 45,182 331,687 / 29,599 / 29,593 375,853 / 42,739 / 42,750
DB111K-174 111,762 305 242 527,654 / 65,000 / 65,851 57,969 / 1,000 / 39,371
Table 1: Statistics of the three KG datasets used in our experiments.

4 Experiments

4.1 Datasets

We evaluate the proposed CORE model by conducting experiments on several well-known KG datasets with entity type information: FB15k-ET, YAGO43k-ET [moon2017learning], and DB111K-174 [hao2019universal], which are subsets of the Freebase [bollacker2008freebase], YAGO [suchanek2007yago], and DBpedia [auer2007dbpedia] KGs, respectively. [zhao-etal-2020-connecting] further cleaned the FB15k-ET and YAGO43k-ET datasets by removing triples in the training sets from the validation and test sets. They also created a script that generates type triples by enumerating all possible combinations of relations and the types of their subjects and objects. We use the same script to generate type triples for training the type embedding. The statistics of these datasets are shown in Table 1.

4.2 Hyperparameter Setting

We list the hyperparameter settings for each benchmarking dataset in Table 2. In this table, $d_e$ and $d_t$ denote the dimensions of the entity embedding and the type embedding, respectively; $b_e$, $b_t$, and $n$ denote the entity batch size, the type batch size, and the negative sample size, respectively; and $\alpha$, $\gamma$, and $lr$ denote the sampling temperature, the margin parameter, and the learning rate, respectively. In addition, we show the MRR and Hits@k results for RotatE and ComplEx with different type dimensions on FB15k-ET in Fig. 5 and Fig. 6, respectively.

Dataset Model $d_e$ $d_t$ $b_e$ $b_t$ $n$ $\alpha$ $\gamma$ $lr$
FB15k-ET CORE-RotatE 1000 700 1024 4096 256 1 24 0.0001
FB15k-ET CORE-ComplEx 500 550 1024 4096 400 1 24 0.0002
YAGO43k-ET CORE-RotatE 500 350 1024 4096 400 1 24 0.0002
YAGO43k-ET CORE-ComplEx 500 350 1024 4096 400 1 24 0.0002
DB111K-174 CORE-RotatE 1000 250 1024 4096 256 1 24 0.0005
DB111K-174 CORE-ComplEx 1000 250 1024 4096 400 1 24 0.0002
Table 2: Hyperparameter settings.
Figure 5: Comparison of the MRR performance for FB15k-ET as a function of the type dimension with RotatE embedding.
Figure 6: Comparison of the MRR performance for FB15k-ET as a function of the type dimension with ComplEx embedding.
Datasets FB15k-ET YAGO43k-ET
Metrics MRR H@1 H@3 H@10 MRR H@1 H@3 H@10
RESCAL [nickel2011three] 0.19 9.71 19.58 37.58 0.08 4.24 8.31 15.31
RES.-ET [moon2017learning] 0.24 12.17 27.92 50.72 0.09 4.32 9.62 19.40
HOLE [nickel2016holographic] 0.22 13.29 23.35 38.16 0.16 9.02 17.28 29.25
HOLE-ET [moon2017learning] 0.42 29.40 48.04 66.73 0.18 10.28 20.13 34.90
TransE [bordes2013translating] 0.45 31.51 51.45 73.93 0.21 12.63 23.24 38.93
TransE-ET [moon2017learning] 0.46 33.56 52.96 71.16 0.18 9.19 19.41 35.58
ETE [moon2017learning] 0.50 38.51 55.33 71.93 0.23 13.73 26.28 42.18
ConnectE-E2T [zhao-etal-2020-connecting] 0.57 45.53 62.31 78.12 0.24 13.54 26.20 44.51
ConnectE-E2T-TRT [zhao-etal-2020-connecting] 0.59 49.55 64.32 79.92 0.28 16.01 30.85 47.92
ConnectE-E2T-TRT (Actual) 0.58 47.45 64.33 77.55 0.14 8.04 17.59 24.62
SDType-Cond 0.42 27.56 50.09 71.23 - - - -
CORE-RotatE 0.60 49.32 65.25 81.09 0.32 22.96 36.55 51.00
CORE-ComplEx 0.60 48.91 66.30 81.60 0.35 24.17 39.18 54.95
Table 3: Performance comparison of various entity type prediction methods in terms of filtered ranking for FB15k-ET and YAGO43k-ET, where the best and the second best performance numbers are shown in bold face and with an underscore, respectively.
Datasets DB111K-174
Metrics MRR H@1 H@3
JOIE-HATransE-CT 0.857 75.55 95.91
SDType 0.861 78.53 92.67
ConnectE-E2T 0.88 81.63 94.19
ConnectE-E2T-TRT 0.90 82.96 96.07
SDType-Cond 0.879 80.99 94.05
CORE-RotatE 0.889 82.02 95.36
CORE-ComplEx 0.900 84.25 95.42
TransE+XGBoost 0.878 81.38 94.07
TransE+SVM 0.917 86.77 96.33
Table 4: Performance comparison of entity type prediction for DB111K-174, where the best and the second best performance numbers are shown in bold face and with an underscore, respectively.
Entity Model Top 3 Type Predictions
Albert Einstein ConnectE Islands of Sicily, Swiss singers, Heads of state of Canada
Albert Einstein CORE Nobel laureates in Physics, Fellows of the Royal Society, 20th-century mathematicians
Warsaw ConnectE Defunct political parties in Poland, Political parties in Poland, Universities in Poland
Warsaw CORE Administrative district, Port cities, Cities in Europe
George Michael ConnectE United Soccer League players, People from Stourbridge, Fortuna Düsseldorf managers
George Michael CORE British singers, English musicians, Rock singers
Table 5: An illustrative example of type prediction on the YAGO43k-ET dataset.

4.3 Benchmarking Methods

To the best of our knowledge, ours is the first work to compare embedding-based methods with statistical-based and classifier-based methods; we do so to understand the strengths and weaknesses of the different models.

4.3.1 Statistical-based Method.

We compare the performance of the proposed CORE model with that of a statistical-based method named SDType-Cond on the FB15k-ET dataset. SDType-Cond is a variant of SDType. Neighbor type information is readily available for many entities, yet SDType ignores this important piece of information. SDType-Cond estimates the type distribution more precisely by leveraging the known neighbor type information. Instead of estimating the type distribution of an entity $e$ given a particular relation alone, i.e., $P(t \mid (e, r, \cdot))$ or $P(t \mid (\cdot, r, e))$, we estimate $P(t \mid (e, r, t_n))$ or $P(t \mid (t_n, r, e))$ for $r \in R$ and $t_n \in T$, where $R$ and $T$ denote the set of all relations and the set of all entity types, respectively, and $t_n$ is a known type of the neighboring entity. The two probabilities correspond to the cases where the target entity serves as a subject and as an object, respectively. By aggregating the probabilities generated by all (relation, neighbor type) combinations in the neighborhood of the target entity, we can rank the type candidates using the following function:

$$\mathrm{score}(t) = \frac{1}{|N|} \sum_{(r, t_n) \in N} P(t \mid r, t_n),$$

where $N$ denotes the set of all (relation, neighbor type) combinations in the target entity's neighborhood.
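A sketch of the SDType-Cond ranking (ours), assuming the conditional tables $P(t \mid r, t_n)$ have been estimated from the training KG separately for the subject and object cases:

```python
from collections import defaultdict

def sdtype_cond_rank(entity, triples, entity_types, p_subj, p_obj, top_k=3):
    """p_subj[(r, t_n)] and p_obj[(r, t_n)] are dicts mapping type -> probability."""
    scores = defaultdict(float)
    contexts = []
    for s, r, o in triples:
        if s == entity:   # target is the subject; condition on the object's types
            contexts += [p_subj.get((r, t), {}) for t in entity_types.get(o, [])]
        if o == entity:   # target is the object; condition on the subject's types
            contexts += [p_obj.get((r, t), {}) for t in entity_types.get(s, [])]
    for dist in contexts:             # aggregate over all (r, t_n) combinations
        for t, p in dist.items():
            scores[t] += p / max(len(contexts), 1)
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]
```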

4.3.2 Classifier-based Method.

We also explore the node classification approach as a benchmarking method for the type prediction problem. Specifically, we use pretrained TransE embeddings as entity features and SVM and XGBoost as classifiers. To obtain the pretrained TransE embedding for FB15k, we set the batch size to 1000, the negative sample size to 256, and the hidden dimension to 1000, together with a fixed sampling temperature, margin parameter, and learning rate, and train for 150,000 epochs. For the SVM classifier, we adopt the Radial Basis Function (RBF) kernel with a fixed regularization parameter and kernel coefficient. For the XGBoost classifier, we fix the learning rate, the number of estimators, and the maximum depth of each estimator, set the minimum child weight to 1, and use the softmax objective function for training.
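For reference, the classifier baseline amounts to a standard supervised pipeline over pretrained embeddings. The sketch below uses random stand-in features and scikit-learn defaults; it shows the shape of the pipeline, not the paper's actual settings.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))      # stand-in for pretrained TransE embeddings
y = rng.integers(0, 10, size=1000)   # stand-in for entity type labels

clf = SVC(kernel="rbf")              # RBF kernel, as described above
clf.fit(X[:800], y[:800])            # train on entity-embedding features
print("accuracy:", clf.score(X[800:], y[800:]))
```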

4.4 Experimental Results

To evaluate the performance of the CORE method and several benchmarking methods, we use the Mean Reciprocal Rank (MRR) and Hits@k as the performance metrics. Since models are trained to favor observed types as top choices, we filter the observed types from all possible type candidates when computing the ranking based on a scoring function.
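Concretely, the filtered ranking metrics can be computed as follows. This is our sketch rather than the paper's evaluation code; `score(entity, t)` is an assumed model interface and `known_types` holds the types already observed for each entity.

```python
import numpy as np

def filtered_metrics(test_pairs, all_types, known_types, score, ks=(1, 3, 10)):
    """Compute MRR and Hits@k, filtering other observed types from the ranking."""
    ranks = []
    for e, true_t in test_pairs:
        candidates = [t for t in all_types
                      if t == true_t or t not in known_types.get(e, set())]
        order = sorted(candidates, key=lambda t: -score(e, t))
        ranks.append(order.index(true_t) + 1)
    ranks = np.asarray(ranks, dtype=float)
    return {"MRR": float(np.mean(1.0 / ranks)),
            **{f"Hits@{k}": float(np.mean(ranks <= k)) for k in ks}}
```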

First, we show the MRR and Hits@k results for RotatE and ComplEx with different type dimensions on FB15k-ET in Fig. 5 and Fig. 6, respectively. The optimal type dimensions for the RotatE and ComplEx embeddings are 700 and 550, respectively. We adopt this setting in the following experiments.

Table 3 shows the results for FB15k-ET and YAGO43k-ET. We see from the table that our proposed CORE models offer state-of-the-art performance on both datasets. CORE-ComplEx achieves the best performance in all categories except Hits@1 for FB15k-ET. It outperforms the previous best method, ConnectE-E2T-TRT, by a significant margin on the YAGO43k-ET dataset.

We also experiment with classifier-based methods and observe that they do not scale well to large datasets such as FB15k-ET and YAGO43k-ET. The training time of a classifier grows significantly with the size of the label set and the feature dimension. In addition, the performance of classifier-based methods is much lower than that of the statistical-based and embedding-based methods.

On the other hand, for datasets with a small number of distinct types, classifier-based methods do have an advantage. Table 4 compares the performance of several type prediction methods on DB111K-174. The SVM classifier with pretrained TransE embedding features outperforms all other methods. JOIE's idea of incorporating the hierarchical structural information of types does not seem effective for type prediction, since even a simple statistical method such as SDType can outperform the JOIE baseline on the MRR and Hits@1 metrics. By conditioning on neighbor type label information, SDType-Cond further boosts the performance. Overall, the performance gap among the benchmarking methods is smaller on DB111K-174.

To gain insights into entity type prediction, we provide illustrative examples of type prediction on the YAGO43k-ET dataset. In Table 5, we compare the top three type predictions of ConnectE and CORE for some well-known people and places. Although there are some lapses in CORE's predictions, the model makes correct decisions for most queries, and the majority of the top three candidates are in fact valid type labels for the corresponding target entity. Given the enormous number of unique type labels, these results demonstrate the prediction power of the proposed CORE model.

5 Conclusion and Future Work

A complex space regression and embedding method, called CORE, has been proposed in this work to solve the entity type prediction problem by exploiting the expressive power of the RotatE and ComplEx models. It embeds entities and types in two different complex spaces and uses a complex regression model to measure the relatedness of entities and types. Finally, it optimizes the embedding and regression parameters jointly to reach optimal performance under this framework. Experimental results demonstrate that CORE offers great performance on representative KG entity type inference datasets and outperforms the state-of-the-art model by a significant margin on the YAGO43k-ET dataset.

There are several research directions worth exploring in the future. First, the use of textual descriptions and transformer models to extract features for entity type prediction can be investigated. Second, we can examine the multilabel classification framework for entity type prediction since it shares a similar problem formulation. Although both try to predict multiple target labels, there are differences. In multilabel classification, the objects in the training set and the test set are disjoint; that is, we train the classifier on one set and test it on a different set. In entity type prediction, the two sets are not disjoint: a model is trained with a set of entity feature vectors and their corresponding labels, and at inference time it often infers missing type labels for the same set of entities. Third, a binary classifier can also be used for entity type prediction. Yet, there exist far more negative samples than positive ones, so a good selection of negative examples is required to handle the data imbalance problem.

References