1 Introduction
Research on knowledge graph (KG) construction, completion, inference, and applications has grown rapidly in recent years because KGs offer a powerful tool for modeling human knowledge in graph form. Nodes in KGs denote entities, and links represent relations between entities. The basic building blocks of a KG are entity-relation triples in the form of (subject, predicate, object), introduced by the Resource Description Framework (RDF). Learning representations for entities and relations in low-dimensional vector spaces is one of the most active research topics in the field.
Entity type offers a valuable piece of information to KG learning tasks. Better results in KG-related tasks have been achieved with the help of entity type. For example, TKRL [xie2016representation] uses a hierarchical type encoder for KG completion by incorporating entity type information. AutoETER [Niu:AutoETER] adopts a similar approach but encodes the type information with projection matrices. Based on DistMult [yang2014embedding] and ComplEx [trouillon2016complex] embedding, [jain2018type] propose an improved factorization model without explicit type supervision. JOIE [hao2019universal] attempts to embed entities and types in two separate spaces by learning an instance-view embedding and an ontology-view embedding. Similar to JOIE, TaRP [DBLP:conf/aaai/CuiKTGJ21] leverages the hierarchical type ontology structure for relation prediction. Instead of learning embeddings for all types, TaRP develops a heuristic weighting mechanism to rank the prior probability of a relation given its head and tail type information. A similar idea was examined in TransT [ma2017transt]. Besides, entity type is also important for Information Extraction tasks including entity linking [gupta2017entity] and relation extraction [zhou2005exploring, culotta2004dependency].
On the other hand, entity type prediction is challenging for several reasons. First, collecting additional type labels for entities is expensive. Second, type information is often incomplete, especially for large-scale datasets. Fig. 2 shows a snapshot of a KG, where missing entity types need to be inferred. Third, KGs are ever-evolving, and type information is often corrupted by noisy facts. Thus, there is a need to design algorithms to predict missing type labels. Quite a few approaches have been proposed to predict missing entity types in KGs. They can be classified into three categories: statistical-based, classifier-based, and embedding-based methods. A brief review is given in Sec. 2.
The contributions of our work are summarized as follows:

We present a new method for entity type prediction named CORE (COmplex space Regression and Embedding). CORE leverages the expressive power of complex space embedding models including RotatE [sun2018rotate] and ComplEx [trouillon2016complex] to represent entities and types. To capture the relatedness of entities and types, a complex regression model is built between entity space and type space.

We conduct experiments on three major KG datasets and compare the performance of CORE with that of state-of-the-art entity type prediction methods. CORE outperforms existing methods on most evaluation metrics.

We study and compare statistical-based, classifier-based, and embedding-based methods for entity type prediction. The strengths and weaknesses of the different approaches are discussed. We also introduce a stronger statistical baseline named SDTypeCond.
2 Related Work
Work on entity type prediction can be categorized into three types as elaborated below.
2.1 Statistical Approach.
Before machine learning was introduced to KG entity type prediction, type inference was often performed using RDF rules and graph pattern matching [gangemi2012automatic]. Although these handcrafted rules can make type predictions with high precision, they tend to miss many cases since it is almost impossible to manually create all patterns. Hence, this approach does not scale to large datasets. As such, researchers applied basic statistics to solve the KG entity type prediction problem. An early method, called SDType [paulheim2013type], predicts missing entity types by estimating the empirical probability distribution of entity types given a relation, $P(\text{type} \mid \text{relation})$, and aggregating all such conditional probabilities generated by neighboring relations of the target entity. Although this approach is robust to noise, it cannot predict unseen combinations of types and relations. Another shortcoming of the statistical approach is that its performance deteriorates as the number of entity types becomes larger (say, with thousands of entity types). Furthermore, it does not exploit the type information of neighboring entities. In this paper, we will show that the type prediction performance can be further improved by conditioning on the type information of neighboring entities. Fig. 2 illustrates the statistical approach for entity type prediction. Suppose we want to predict the type of the target node. Given each neighboring relation and the type of each neighboring node, we can estimate the type distribution for the target node.
2.2 Classification Approach.
Node classification is a common task in graph learning. One solution is to train a classifier with node features as the input and corresponding labels as the output. This idea can be applied to entity type prediction as well. Researchers [yaghoobzadehschutze2015corpus, xin2018improving, sofronova2020entity] conducted experiments with entity textual descriptions in the form of word embeddings as features and neural networks such as MLP, LSTM, and CNN as classifiers. An end-to-end architecture that learns entity embedding and type prediction jointly was proposed in [jin2018attributed]. The correlation between entity attributes and KG link structures was taken into account in the learning of distributed entity representations. In this work, with pretrained KG embeddings as features, we test a couple of classifiers such as SVM and XGBoost [chen2016xgboost]. Again, we find that this approach does not perform well for large datasets. In fact, a comparative study of statistical and classifier approaches was conducted in [jain2021embeddings]. By analyzing results from selected entity type classes, they concluded that KG embeddings fail to capture the semantics of entities and that statistical approaches are often superior in type prediction. While we concur with their experimental finding that the combination of entity embedding features and classifiers tends to yield poor results, we do not agree that KG embedding models cannot be useful for entity type prediction, as elaborated below.
2.3 Embedding Approach.
Before introducing the embedding approach for entity type prediction, it is worthwhile to review KG embedding models briefly. According to [ji2021survey], a KG embedding model can be categorized based on its representation space and scoring function. Among several representation spaces, real and complex vector spaces are the two most common ones. TransE models triples in the form of (subject, relation, object), or $(s, r, o)$, in a $d$-dimensional real vector space with the translational principle $\mathbf{s} + \mathbf{r} \approx \mathbf{o}$. Yet, TransE is not suitable for modeling symmetric and many-to-one relations. To overcome this weakness, researchers have turned to the complex vector space and designed models with greater expressive power. ComplEx and RotatE are two prominent examples. ComplEx is motivated by low-rank matrix factorization in the complex space and models both symmetric and asymmetric relations effectively. Inspired by Euler's identity, RotatE models relations as rotations in the complex vector space to remedy ComplEx's inability to model composition patterns in KGs. Both TransE and RotatE adopt a distance-based scoring function, while ComplEx has a semantic matching score.
Besides KG embedding, embedding-based entity type prediction approaches learn a distributed representation of entity types. For example, a distance-based scoring function can be used to measure the relevance of a particular type to the target entity. The ETE model [moon2017learning] embeds entities and entity types in the same space. ConnectE [zhaoetal2020connecting] embeds entities and types in two different spaces and learns a mapping from the entity space to the type space. It leverages neighbor label information to boost the performance further. Based on a similar idea, JOIE [hao2019universal] adds an intra-view component to model the hierarchical structure of the type ontology. ETE, ConnectE, and JOIE all adopt TransE embedding to represent KG entities. JOIE targets better link prediction results, whereas ETE and ConnectE focus on improving entity type prediction. TransE is known to suffer from a few problems, e.g., it cannot model symmetric relations. Poor relation representation leads to poor entity representation. Since the quality of the entity representation affects entity type prediction performance, we exploit the expressive power of complex-space KG embedding to achieve better results.
3 Proposed CORE Methods
To leverage the expressive power of complex-space KG embedding, we propose to learn a set of embeddings for both the entity space and the type space as shown in Fig. 4. Specifically, we experiment with the ComplEx and RotatE embedding models. On top of the entity embedding and the type embedding, we learn a regression between these two spaces. Finally, we make type predictions using a distance-based scoring function based on the embeddings and regression parameters. The high-level concept of the proposed CORE model is given in Fig. 4.
3.1 Complex Space KG Embedding.
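Let $(s, r, o)$ be a KG triple and $\mathbf{e}_s$, $\mathbf{w}_r$, and $\mathbf{e}_o$ denote the complex space representations of the triple's subject, relation, and object. For ComplEx, the score function is
$$f_r(\mathbf{e}_s, \mathbf{e}_o) = \mathrm{Re}\left(\langle \mathbf{e}_s, \mathbf{w}_r, \overline{\mathbf{e}}_o \rangle\right),$$
where $\langle \cdot \rangle$ denotes an elementwise multilinear dot product, and $\overline{\,\cdot\,}$ denotes the conjugate for complex vectors. For RotatE embedding, the score function is
$$f_r(\mathbf{e}_s, \mathbf{e}_o) = -\lVert \mathbf{e}_s \circ \mathbf{w}_r - \mathbf{e}_o \rVert,$$
where $\circ$ denotes the elementwise product.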
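As a minimal sketch of the two score functions described above, the following pure-Python code evaluates a ComplEx-style and a RotatE-style score on small complex vectors. The function names, the parameterization of the RotatE relation by phase angles, and the use of the sum of elementwise moduli as the distance are illustrative assumptions, not details confirmed by this paper.

```python
import cmath
import math

def complex_score(e_s, w_r, e_o):
    """ComplEx-style score: real part of sum_k e_s[k] * w_r[k] * conj(e_o[k])."""
    return sum(s * r * o.conjugate() for s, r, o in zip(e_s, w_r, e_o)).real

def rotate_score(e_s, phases, e_o):
    """RotatE-style score: negative distance between the rotated subject and the object.

    Each relation coordinate is a unit-modulus complex number exp(i * phase).
    The distance used here (sum of elementwise moduli) is an assumption.
    """
    w_r = [cmath.exp(1j * p) for p in phases]
    return -sum(abs(s * r - o) for s, r, o in zip(e_s, w_r, e_o))

# Hypothetical toy vectors: swapping subject and object changes the ComplEx score,
# which is what allows asymmetric relations to be modeled.
forward = complex_score([1 + 1j], [1j], [-1 + 1j])
backward = complex_score([-1 + 1j], [1j], [1 + 1j])

# A subject rotated by 90 degrees matches the object exactly, so the score is ~0.
exact = rotate_score([1 + 1j], [math.pi / 2], [-1 + 1j])
```

With these conventions, a larger score means a more plausible triple for both models, which lets the same loss be applied to either of them.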
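We follow RotatE's negative sampling loss and self-adversarial training strategy to train the embedding. The loss function for KG embedding models can be written as
$$L_{KG} = -\log \sigma\left(\gamma_1 + f_r(\mathbf{e}_s, \mathbf{e}_o)\right) - \sum_{i=1}^{n} p(s'_i, r, o'_i) \log \sigma\left(-f_r(\mathbf{e}_{s'_i}, \mathbf{e}_{o'_i}) - \gamma_1\right),$$
where $f_r$ is the score function defined above, $\sigma$ is the sigmoid function, $\gamma_1$ is a fixed margin hyperparameter for training KG embedding, $(s'_i, r, o'_i)$ is the $i$th negative triple, and $p(s'_i, r, o'_i)$ is the probability of drawing negative triple $(s'_i, r, o'_i)$. Given a positive triple, $(s, r, o)$, the negative sampling distribution is
$$p(s'_j, r, o'_j \mid (s, r, o)) = \frac{\exp\left(\alpha f_r(\mathbf{e}_{s'_j}, \mathbf{e}_{o'_j})\right)}{\sum_{i} \exp\left(\alpha f_r(\mathbf{e}_{s'_i}, \mathbf{e}_{o'_i})\right)},$$
where $\alpha$ is the temperature of sampling.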
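The self-adversarial negative sampling loss described above can be sketched as follows. This is an illustration under stated assumptions: scores follow the "higher is more plausible" convention (for example, a negative distance), the function and argument names are hypothetical, and the exact placement of the margin follows the RotatE formulation rather than anything confirmed by this paper.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def self_adversarial_weights(neg_scores, alpha):
    """Softmax over negative-sample scores with sampling temperature alpha."""
    top = max(neg_scores)
    exps = [math.exp(alpha * (s - top)) for s in neg_scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_adversarial_loss(pos_score, neg_scores, gamma, alpha):
    """Margin-based loss for one positive sample and its negative samples."""
    weights = self_adversarial_weights(neg_scores, alpha)
    pos_term = -log_sigmoid(gamma + pos_score)
    neg_term = -sum(w * log_sigmoid(-(gamma + s)) for w, s in zip(weights, neg_scores))
    return pos_term + neg_term

# Hypothetical scores: a well-separated case and a badly-separated case.
good = self_adversarial_loss(pos_score=-1.0, neg_scores=[-30.0, -40.0], gamma=24.0, alpha=1.0)
bad = self_adversarial_loss(pos_score=-30.0, neg_scores=[-1.0, -2.0], gamma=24.0, alpha=1.0)
```

The weights concentrate on negatives that currently score high, so hard negatives contribute more to the gradient than easy ones.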
3.2 Complex Space Type Embedding.
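Similar to the definitions in the KG embedding space, we use $(t_s, r, t_o)$ to denote a type triple and $\mathbf{t}_s$, $\mathbf{v}_r$, and $\mathbf{t}_o$ to denote the representations of the subject type, the relation, and the object type in the type embedding space. For ComplEx embedding, the score function is
$$g_r(\mathbf{t}_s, \mathbf{t}_o) = \mathrm{Re}\left(\langle \mathbf{t}_s, \mathbf{v}_r, \overline{\mathbf{t}}_o \rangle\right).$$
For RotatE embedding, the score function is
$$g_r(\mathbf{t}_s, \mathbf{t}_o) = -\lVert \mathbf{t}_s \circ \mathbf{v}_r - \mathbf{t}_o \rVert.$$
To train type embedding, we use the self-adversarial negative sampling loss in the form of
$$L_{T} = -\log \sigma\left(\gamma_2 + g_r(\mathbf{t}_s, \mathbf{t}_o)\right) - \sum_{i=1}^{n} p(t'_{s_i}, r, t'_{o_i}) \log \sigma\left(-g_r(\mathbf{t}'_{s_i}, \mathbf{t}'_{o_i}) - \gamma_2\right),$$
where $\sigma$ is the sigmoid function, $\gamma_2$ is a fixed margin hyperparameter for training type embedding, $(t'_{s_i}, r, t'_{o_i})$ is the $i$th negative triple, and $p(t'_{s_i}, r, t'_{o_i})$ is the probability of drawing the negative triple $(t'_{s_i}, r, t'_{o_i})$. Given a positive triple, $(t_s, r, t_o)$, the negative sampling distribution is
$$p(t'_{s_j}, r, t'_{o_j} \mid (t_s, r, t_o)) = \frac{\exp\left(\alpha\, g_r(\mathbf{t}'_{s_j}, \mathbf{t}'_{o_j})\right)}{\sum_{i} \exp\left(\alpha\, g_r(\mathbf{t}'_{s_i}, \mathbf{t}'_{o_i})\right)},$$
where $\alpha$ is the temperature of sampling.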
3.3 Solving Complex Space Regression.
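To propagate information from the entity space to the type space, we learn a regression between the two complex spaces. A feasible and logical way of solving the complex regression is to cast the problem into a multivariate regression problem in real vector space. Formally, let $\mathbf{e}$ and $\mathbf{t}$ denote the representations of the entity and its type. We divide the real and the imaginary parts of every complex entity vector into two real vectors, namely $\mathbf{e}^{re}$ and $\mathbf{e}^{im}$. We do the same to divide the complex type vector into two real vectors, $\mathbf{t}^{re}$ and $\mathbf{t}^{im}$.
As shown in Fig. 4, the regression process consists of four different real block matrices: $\mathbf{W}_{11}$, $\mathbf{W}_{12}$, $\mathbf{W}_{21}$, and $\mathbf{W}_{22}$. The real part of the output vector depends on both the real and imaginary parts of the input vector. Similarly, the imaginary part of the output vector also depends on both the real and imaginary parts of the input vector. The regression problem can be rewritten as
$$\begin{bmatrix} \mathbf{t}^{re} \\ \mathbf{t}^{im} \end{bmatrix} = \begin{bmatrix} \mathbf{W}_{11} & \mathbf{W}_{12} \\ \mathbf{W}_{21} & \mathbf{W}_{22} \end{bmatrix} \begin{bmatrix} \mathbf{e}^{re} \\ \mathbf{e}^{im} \end{bmatrix} + \boldsymbol{\epsilon},$$
where $\boldsymbol{\epsilon}$ denotes the error vector. To minimize $\boldsymbol{\epsilon}$, we use the following score function
$$h(\mathbf{e}, \mathbf{t}) = -\left\lVert \begin{bmatrix} \mathbf{W}_{11} & \mathbf{W}_{12} \\ \mathbf{W}_{21} & \mathbf{W}_{22} \end{bmatrix} \begin{bmatrix} \mathbf{e}^{re} \\ \mathbf{e}^{im} \end{bmatrix} - \begin{bmatrix} \mathbf{t}^{re} \\ \mathbf{t}^{im} \end{bmatrix} \right\rVert.$$
We find that the self-adversarial negative sampling strategy is useful in optimizing the regression coefficients. The loss function for learning these coefficients is set to
$$L_{R} = -\log \sigma\left(\gamma_3 + h(\mathbf{e}, \mathbf{t})\right) - \sum_{i=1}^{n} p(\mathbf{e}'_i, \mathbf{t}'_i) \log \sigma\left(-h(\mathbf{e}'_i, \mathbf{t}'_i) - \gamma_3\right),$$
where $\sigma$ is the sigmoid function, $\gamma_3$ is a fixed margin hyperparameter in the regression, $(\mathbf{e}'_i, \mathbf{t}'_i)$ is the $i$th negative pair, and $p(\mathbf{e}'_i, \mathbf{t}'_i)$ is the probability of drawing negative pair $(\mathbf{e}'_i, \mathbf{t}'_i)$. Given a positive pair $(\mathbf{e}, \mathbf{t})$, the negative sampling distribution is equal to
$$p(\mathbf{e}'_j, \mathbf{t}'_j \mid (\mathbf{e}, \mathbf{t})) = \frac{\exp\left(\alpha\, h(\mathbf{e}'_j, \mathbf{t}'_j)\right)}{\sum_{i} \exp\left(\alpha\, h(\mathbf{e}'_i, \mathbf{t}'_i)\right)},$$
where $\alpha$ is the temperature of sampling.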
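To make the casting of a complex regression into a real block regression concrete, the sketch below applies four real blocks to the real and imaginary parts of an entity vector. The names `w_rr`, `w_ri`, `w_ir`, and `w_ii`, the absence of a bias term, and the distance used in the score are assumptions made for illustration. The helper `complex_to_blocks` shows the special case in which the four blocks come from a single complex matrix; the general four-block form has twice as many free parameters.

```python
def split(z):
    """Split a complex vector into its real and imaginary parts."""
    return [c.real for c in z], [c.imag for c in z]

def matvec(m, v):
    """Real matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def block_regress(w_rr, w_ri, w_ir, w_ii, e):
    """Map a complex entity vector to the type space with four real blocks."""
    e_re, e_im = split(e)
    t_re = [a + b for a, b in zip(matvec(w_rr, e_re), matvec(w_ri, e_im))]
    t_im = [a + b for a, b in zip(matvec(w_ir, e_re), matvec(w_ii, e_im))]
    return [complex(a, b) for a, b in zip(t_re, t_im)]

def regression_score(blocks, e, t):
    """Negative distance between the regressed entity vector and a type vector."""
    pred = block_regress(*blocks, e)
    return -sum(abs(p - q) for p, q in zip(pred, t))

def complex_to_blocks(w):
    """Blocks (A, -B, B, A) that reproduce a complex matrix W = A + iB."""
    a = [[c.real for c in row] for row in w]
    b = [[c.imag for c in row] for row in w]
    neg_b = [[-x for x in row] for row in b]
    return a, neg_b, b, a

# Hypothetical 1 x 2 complex matrix mapping a 2-dimensional entity vector
# to a 1-dimensional type vector.
w = [[1 + 2j, 0.5 - 1j]]
e = [3 + 4j, -1 + 2j]
blocks = complex_to_blocks(w)
predicted_type = block_regress(*blocks, e)
```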
3.4 Type Prediction.
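We use the distance-based scoring function in the regression to predict entity types. The type prediction function can be written as
$$\hat{\mathbf{t}} = \arg\max_{\mathbf{t} \in \mathcal{T}} h(\mathbf{e}, \mathbf{t}),$$
where $h(\mathbf{e}, \mathbf{t})$ is the regression score of entity $\mathbf{e}$ and candidate type $\mathbf{t}$, and $\mathcal{T}$ denotes the set of all types.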
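A minimal sketch of the prediction step, assuming the entity has already been mapped into the type space: candidate types are ranked by a distance-based score, and the top-ranked candidate is the prediction. The type names, vectors, and the particular distance are hypothetical.

```python
def rank_types(predicted, type_embeddings):
    """Rank candidate types by negative distance to the regressed entity vector."""
    def score(name):
        candidate = type_embeddings[name]
        return -sum(abs(p - q) for p, q in zip(predicted, candidate))
    return sorted(type_embeddings, key=score, reverse=True)

# Hypothetical 1-dimensional complex type embeddings.
type_embeddings = {
    "person": [1 + 1j],
    "city": [5 + 0j],
    "band": [1 + 2j],
}
ranking = rank_types([1 + 1j], type_embeddings)
best_type = ranking[0]
```

Returning the full ranking rather than only the best candidate is convenient because rank-based metrics need the position of each correct type.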
3.5 Optimization.
We first initialize the embedding and regression parameters by sampling from the standard uniform distribution. The three parts of our model are optimized sequentially. First, we optimize the KG embeddings using KG triples and negative triples. Next, we train the regression and type space embedding parameters; we freeze the KG embedding in this stage to ensure that the regression learns important information. Last, we further optimize the type space embeddings using type triples. To avoid overfitting of the regression model in the early training stage, we alternate the optimization among the parts of the model every 1000 iterations.
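One possible reading of this alternating schedule is sketched below: training cycles through the three stages, switching every 1000 iterations, and only the parameter groups of the active stage are updated. The stage names, the parameter-group names, and the round-robin order are assumptions; the text does not specify these details.

```python
# Hypothetical stage names, in the order described in the text.
STAGES = ("kg_embedding", "regression_and_type", "type_embedding")

# Parameter groups updated in each stage; the KG embedding is frozen in stage two.
TRAINABLE = {
    "kg_embedding": {"entity", "relation"},
    "regression_and_type": {"regression", "type"},
    "type_embedding": {"type"},
}

def active_stage(step, period=1000):
    """Return the stage that is optimized at a given training iteration."""
    return STAGES[(step // period) % len(STAGES)]

def is_trainable(group, step, period=1000):
    """Tell whether a parameter group receives updates at this iteration."""
    return group in TRAINABLE[active_stage(step, period)]
```

For example, `active_stage(1500)` falls in the second stage, where `is_trainable("entity", 1500)` is false because the KG embedding is frozen.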
3.6 Complexity.
The memory and space complexity for CORE are both $O(n_e d_e + n_r d_r + n_t d_t)$, where $n$ denotes the number of objects, $d$ denotes the dimension, and the subscripts $e$, $r$, and $t$ denote entity, relation, and type, respectively.
Table 1: Statistics of the datasets.
Dataset     #Ent     #Rel   #Type   #KG Triples                 #Entity Type Pairs
                                    #Train   #Valid  #Test      #Train   #Valid  #Test
FB15kET     14,951   1,345  3,851   483,142  50,000  59,071     136,618  15,749  15,780
YAGO43kET   42,335   37     45,182  331,687  29,599  29,593     375,853  42,739  42,750
DB111K174   111,762  305    242     527,654  65,000  65,851     57,969   1,000   39,371
4 Experiments
4.1 Datasets
We evaluate the proposed CORE model by conducting experiments on several well-known KG datasets with entity type information. They include FB15kET, YAGO43kET [moon2017learning], and DB111K174 [hao2019universal], which are subsets of the Freebase [bollacker2008freebase], YAGO [suchanek2007yago], and DBpedia [auer2007dbpedia] KGs, respectively. [zhaoetal2020connecting] further clean the FB15kET and YAGO43kET datasets by removing triples in the training sets from the validation and test sets. They also create a script to generate type triples by enumerating all possible combinations of relations and the types of their subjects and objects. We use the same script to generate type triples for training type embedding. The statistics of these datasets are shown in Table 1.
4.2 Hyperparameter Setting
Table 2 lists the hyperparameter settings for each benchmark dataset. Its columns give, in order, the dimensions of the entity embedding and the type embedding; the entity batch size, type batch size, and negative sample size; and the sampling temperature, margin parameter, and learning rate. In addition, we show the MRR and Hits@k results for RotatE and ComplEx with different type dimensions for FB15kET in Fig. 5 and Fig. 6, respectively.
Table 2: Hyperparameter settings.
Dataset    Model        Entity dim  Type dim  Entity batch  Type batch  Neg. samples  Temperature  Margin  Learning rate
FB15kET    CORERotatE   1000        700       1024          4096        256           1            24      0.0001
FB15kET    COREComplEx  500         550       1024          4096        400           1            24      0.0002
YAGO43kET  CORERotatE   500         350       1024          4096        400           1            24      0.0002
YAGO43kET  COREComplEx  500         350       1024          4096        400           1            24      0.0002
DB111K174  CORERotatE   1000        250       1024          4096        256           1            24      0.0005
DB111K174  COREComplEx  1000        250       1024          4096        400           1            24      0.0002

Table 3: Entity type prediction results on FB15kET and YAGO43kET.
Datasets                                 FB15kET                     YAGO43kET
Metrics                                  MRR    H@1    H@3    H@10   MRR    H@1    H@3    H@10
RESCAL [nickel2011three]                 0.19   9.71   19.58  37.58  0.08   4.24   8.31   15.31
RES.ET [moon2017learning]                0.24   12.17  27.92  50.72  0.09   4.32   9.62   19.40
HOLE [nickel2016holographic]             0.22   13.29  23.35  38.16  0.16   9.02   17.28  29.25
HOLEET [moon2017learning]                0.42   29.40  48.04  66.73  0.18   10.28  20.13  34.90
TransE [bordes2013translating]           0.45   31.51  51.45  73.93  0.21   12.63  23.24  38.93
TransEET [moon2017learning]              0.46   33.56  52.96  71.16  0.18   9.19   19.41  35.58
ETE [moon2017learning]                   0.50   38.51  55.33  71.93  0.23   13.73  26.28  42.18
ConnectEE2T [zhaoetal2020connecting]     0.57   45.53  62.31  78.12  0.24   13.54  26.20  44.51
ConnectEE2TTRT [zhaoetal2020connecting]  0.59   49.55  64.32  79.92  0.28   16.01  30.85  47.92
ConnectEE2TTRT (Actual)                  0.58   47.45  64.33  77.55  0.14   8.04   17.59  24.62
SDTypeCond                               0.42   27.56  50.09  71.23  -      -      -      -
CORERotatE                               0.60   49.32  65.25  81.09  0.32   22.96  36.55  51.00
COREComplEx                              0.60   48.91  66.30  81.60  0.35   24.17  39.18  54.95

Table 4: Entity type prediction results on DB111K174.
Metrics          MRR    H@1    H@3
JOIEHATransECT   0.857  75.55  95.91
SDType           0.861  78.53  92.67
ConnectEE2T      0.88   81.63  94.19
ConnectEE2TTRT   0.90   82.96  96.07
SDTypeCond       0.879  80.99  94.05
CORERotatE       0.889  82.02  95.36
COREComplEx      0.900  84.25  95.42
TransE+XGBoost   0.878  81.38  94.07
TransE+SVM       0.917  86.77  96.33

Table 5: Top three type predictions by ConnectE and CORE for entities in YAGO43kET.
Entity           Model     Top 3 Type Predictions
Albert Einstein  ConnectE  Islands of Sicily, Swiss singers, Heads of state of Canada
Albert Einstein  CORE      Nobel laureates in Physics, Fellows of the Royal Society, 20th-century mathematicians
Warsaw           ConnectE  Defunct political parties in Poland, Political parties in Poland, Universities in Poland
Warsaw           CORE      Administrative district, Port cities, Cities in Europe
George Michael   ConnectE  United Soccer League players, People from Stourbridge, Fortuna Düsseldorf managers
George Michael   CORE      British singers, English musicians, Rock singers
4.3 Benchmarking Methods
To the best of our knowledge, this is the first work to compare embedding-based methods with statistical-based and classifier-based methods. Our goal is to understand the strengths and weaknesses of the different models.
4.3.1 Statistical-based Method.
We compare the performance of the proposed CORE model with that of a statistical-based method named SDTypeCond on the FB15kET dataset. SDTypeCond is a variant of SDType. The neighbor type is readily available for many entities, yet SDType ignores this important piece of information. SDTypeCond estimates the type distribution more precisely by leveraging the known neighbor type information. Instead of estimating the type distribution given a particular relation, $P(t_s \mid r)$ or $P(t_o \mid r)$, we estimate $P(t_s \mid r, t_o)$ or $P(t_o \mid r, t_s)$, with $r \in \mathcal{R}$ and $t_s, t_o \in \mathcal{T}$, where $\mathcal{R}$ and $\mathcal{T}$ denote the set of all relations and the set of all entity types, respectively. The two probabilities represent the two cases where the target entity serves as a subject and an object, respectively. By aggregating the probabilities generated by all possible $(r, t')$ combinations in the neighborhood of the target entity, we can rank the type candidates using the following function:
$$S(t) = \sum_{(r, t') \in \mathcal{N}} P(t \mid r, t'),$$
where $\mathcal{N}$ denotes the set of all $(r, t')$ combinations in the target entity's neighborhood.
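The idea of conditioning on both the relation and the neighbor type can be sketched with simple counting. In the code below, the observation format, the `role` field that separates the subject and object cases, the toy relation and type names, and the unweighted sum used for aggregation are all assumptions for illustration; this is not the authors' implementation.

```python
from collections import Counter, defaultdict

def fit_conditional(observations):
    """Estimate P(target_type | role, relation, neighbor_type) from counts.

    observations: iterable of (role, relation, neighbor_type, target_type),
    where role is "subject" or "object" (the role of the target entity).
    """
    counts = defaultdict(Counter)
    for role, relation, neighbor_type, target_type in observations:
        counts[(role, relation, neighbor_type)][target_type] += 1
    probs = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        probs[key] = {t: c / total for t, c in counter.items()}
    return probs

def rank_candidates(probs, neighborhood):
    """Sum the conditional probabilities over the neighborhood and rank types."""
    scores = Counter()
    for key in neighborhood:
        for type_name, p in probs.get(key, {}).items():
            scores[type_name] += p
    return [t for t, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]

# Hypothetical training observations.
observations = [
    ("subject", "born_in", "city", "person"),
    ("subject", "born_in", "city", "person"),
    ("subject", "born_in", "city", "person"),
    ("subject", "born_in", "city", "band"),
    ("subject", "plays", "instrument", "person"),
]
probs = fit_conditional(observations)
ranking = rank_candidates(
    probs, [("subject", "born_in", "city"), ("subject", "plays", "instrument")]
)
```

Combinations that never occur in the training data contribute nothing to the sum, which reflects the limitation that a purely statistical method cannot score unseen combinations.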
4.3.2 Classifier-based Method.
We also explore the node classification approach to the type prediction problem as a benchmarking method. Specifically, we use pretrained TransE embeddings as entity features and SVM and XGBoost as classifiers. To get the pretrained TransE embedding for FB15k, we set the batch size to 1000, negative sample size to 256, hidden dimension to 1000, , the margin parameter , and the learning rate , and train for 150000 epochs. For the SVM classifier, we set the regularization parameter and adopt the Radial Basis Function (RBF) kernel. The kernel coefficient for RBF is . For the XGBoost classifier, we set the learning rate to , the number of estimators to , the maximum depth of each estimator to , the minimum child weight to 1, and use the softmax objective function for training.
4.4 Experimental Results
To evaluate the performance of the CORE method and several benchmarking methods, we use the Mean Reciprocal Rank (MRR) and Hits@k as the performance metrics. Since models are trained to favor observed types as top choices, we filter the observed types from all possible type candidates when computing the ranking based on a scoring function.
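The filtered ranking and the two metrics can be sketched as follows. The function names and the toy scores are hypothetical, and ties are resolved optimistically here, which is an assumption since the tie-breaking rule is not stated.

```python
def filtered_rank(scores, target, known_types):
    """Rank of the target type after removing other already-observed types.

    scores: dict mapping each candidate type to its score (higher is better).
    known_types: types already observed for the entity, excluding the target.
    """
    target_score = scores[target]
    better = sum(
        1
        for type_name, s in scores.items()
        if type_name != target and type_name not in known_types and s > target_score
    )
    return better + 1

def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """Mean Reciprocal Rank and Hits@k over a list of ranks."""
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n
    hits = {k: sum(1 for r in ranks if r <= k) / n for k in ks}
    return mrr, hits

# Hypothetical scores for one entity whose type "a" is already observed.
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
rank_filtered = filtered_rank(scores, "c", known_types={"a"})
rank_raw = filtered_rank(scores, "c", known_types=set())
mrr, hits = mrr_and_hits([1, 2, 4])
```

In this toy case, filtering moves the target from rank 3 to rank 2 because the observed type "a" no longer competes with it.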
First, we show the MRR and Hits@k results for RotatE and ComplEx with different type dimensions for FB15kET in Fig. 5 and Fig. 6, respectively. The optimal type dimensions for the RotatE and the ComplEx embeddings are 700 and 550, respectively. We adopt this setting in the following experiments.
Table 3 shows the results for FB15kET and YAGO43kET. We see from the table that our proposed CORE models offer state-of-the-art performance on both datasets. COREComplEx achieves the best performance in all categories except Hits@1 for FB15kET. It outperforms the previous best method, ConnectEE2TTRT, by a significant margin on the YAGO43kET dataset.
We also experiment with classifier-based methods and observe that they do not scale well to large datasets such as FB15kET and YAGO43kET. The training time of a classifier grows significantly with the size of the label set and the feature dimension. In addition, the performance of the classifier-based methods is much lower than that of the statistical-based and embedding-based methods.
On the other hand, for datasets with a small number of distinct types, classifier-based methods do have some advantage. Table 4 compares the performance of several type prediction methods on DB111K174. The SVM classifier with pretrained TransE embedding features outperforms all other methods. The idea of incorporating the hierarchical structural information of types, as done in JOIE, does not seem effective for type prediction, since even a simple statistical-based method such as SDType outperforms the JOIE baseline on the MRR and Hits@1 metrics. By conditioning on neighbor type label information, SDTypeCond can further boost the performance. The performance gap among different benchmarking methods is smaller on DB111K174.
To gain insights into entity type prediction, we provide an illustrative example of type prediction on the YAGO43kET dataset. In Table 5, we compare the top three type predictions by ConnectE and CORE for some well-known people and places. Although there are some lapses in CORE's predictions, the model makes the right decisions for most queries, and the majority of the top three candidates are in fact valid type labels for the corresponding target entity. These results demonstrate the prediction power of our proposed CORE model, given the enormous number of unique type labels.
5 Conclusion and Future Work
A complex regression and embedding method, called CORE, has been proposed in this work to solve the entity type prediction problem by exploiting the expressive power of the RotatE and ComplEx models. It embeds entities and types in two different complex spaces and uses a complex regression model to measure the relatedness of entities and types. Finally, it optimizes embedding and regression parameters jointly to reach the optimal performance under this framework. Experimental results demonstrate that CORE offers strong performance on representative KG entity type inference datasets and outperforms the state-of-the-art model by a significant margin on the YAGO43kET dataset.
There are several research directions worth exploring in the future. First, the use of textual descriptions and transformer models to extract features for entity type prediction can be investigated. Second, we can examine the multilabel classification framework for entity type prediction since it shares a similar problem formulation. Although both try to predict multiple target labels, there are differences. For multilabel classification, objects in the training set and the test set are disjoint. That is, we train the classifier using the training set and test it on a different set. For entity type prediction, the two sets are not disjoint. In training, a model is trained with a set of entity feature vectors and their corresponding labels. In inference, the task is often to infer missing type labels for the same set of entities. Third, a binary classifier can also be used for entity type prediction. Yet, there exist far more negative samples than positive ones, and a good selection of negative examples is required to handle the data imbalance problem.