1 Introduction
Knowledge graphs (KGs), i.e., graph-based knowledge bases, have proven to be sources of valuable information that have become important for various applications like web search or question answering. Whereas KGs were initially driven by academic efforts, which resulted in KGs like Freebase [4], DBpedia [3], Nell [6] or YAGO [9], more recently commercial applications have evolved; significant commercial applications are the Freebase-powered Google Knowledge Graph, which supports Google’s web search and the smart assistant Google Now, and Microsoft’s Satori, which supports Bing and Cortana. A related activity is the Linked Open Data initiative, which interlinks data sources using the W3C Resource Description Framework (RDF) [13] and thus also generates a huge KG accessible via querying [2].
Even though these graphs have reached an impressive size, containing billions of facts about the world, they are not error-free and far from complete. In Freebase and DBpedia, for example, a vast number of persons (71% in Freebase [8] and 66% in DBpedia) are missing a place of birth. In DBpedia, 58% of the scientists do not have a fact that describes what they are known for. Supporting KG cleaning, completion and construction via machine learning is one of the core challenges. In this context, representation learning in the form of latent variable methods has successfully been applied to KG data
[19, 20, 5, 10, 7]. These models learn latent embeddings for entities and relation-types from the data, which can then be used as representations of their semantics. It is highly desirable that these embeddings are meaningful in low-dimensional latent spaces, because a higher dimensionality leads to higher model complexity, which can cause unacceptable runtimes and high memory loads. Latent variable models have recently been exploited for generating priors for facts in the context of automatic graph-based knowledge base construction [8]. It has also been shown that these models can be interpreted as a compressed probabilistic knowledge representation, which allows complex querying over all possible triples and their uncertainties, resulting in a probabilistically ranked list of query answers [11]. In addition to the stored facts, schema-based KGs also provide rich descriptions of the semantics of entities and relation-types, such as class hierarchies of entities and type-constraints for relation-types, which define the semantic role of relations. This curated prior knowledge on relation-types provides valuable information to machines, e.g. that the marriedTo relation-type should relate only instances of the class Person. In recent work [10, 7], it has been shown that RESCAL, a much-studied latent variable approach, benefits greatly from prior knowledge about the semantics of relation-types. In this work we will study the impact of prior knowledge about the semantics of relation-types in the state-of-the-art representative latent variable models TransE [5], RESCAL [18]
and the multiway neural network approach used in the Google Knowledge Vault project
[8]. These models are very different in the way they model KGs, and therefore they are especially well suited for drawing conclusions on the general value of prior knowledge about relation-types for the statistical modeling of KGs with latent variable models. Additionally, we address the issue that type-constraints can also suffer from incompleteness, e.g. rdfs:domain or rdfs:range concepts are absent in the schema, or entities miss proper typing even after materialization. Here, we study the local closed-world assumption as proposed in prior work [10], which approximates the semantics of relation-types based on observed triples. We provide empirical proof that this prior assumption on relation-types generally improves link-prediction quality in case proper type-constraints are absent.
This paper is structured as follows: In the next section we motivate our model selection and briefly review RESCAL, TransE and the multiway neural network approach of [8]. The integration of type-constraints and local closed-world assumptions into these models will be covered in Section 3. In Section 4, we motivate and describe our experimental setup before we discuss our results in Section 5. We provide related work in Section 6 and conclude in Section 7.
2 Latent Variable Models for Knowledge Graph Modeling
In this work, we want to study the general value of prior knowledge about the semantics of relation-types for the statistical modeling of KGs with latent variable models. For this reason, we have to consider a representative set of latent variable models that covers the currently most promising research activities in this field. We selected RESCAL [18], TransE [5] and the multiway neural network approach pursued in Google’s Knowledge Vault project [8] (denoted as mwNN) for a number of reasons:

- All of these models have been published at well-respected conferences and are the basis for the most recent research activities in the field of statistical modeling of KGs (see Section 6).

- These models are very diverse, meaning they are very different in the way they model KGs, thereby covering a wide range of possible ways a KG can be statistically modeled: the RESCAL tensor factorization is a bilinear model, the distance-based TransE models triples as linear translations, and the mwNN exploits non-linear interactions of latent embeddings in its neural network layers.
2.1 Notation
In this work, $\underline{\mathbf{X}}$ will denote a three-way tensor, where $\mathbf{X}_k$ represents the $k$-th frontal slice of the tensor $\underline{\mathbf{X}}$. Further, $\mathbf{X}_k^{[\widehat{domain}_k,\widehat{range}_k]}$ will denote the frontal slice where only subject entities (rows) and object entities (columns) are included that agree with the domain and range constraints of relation-type $k$. $\mathbf{A}$ or $\mathbf{W}$ denote matrices, and $\mathbf{a}_i$ is the $i$-th column vector of $\mathbf{A}$. A single entry of $\underline{\mathbf{X}}$ will be denoted as $x_{i,j,k}$. Additionally, we use $\mathbf{A}_{[\mathbf{z}],:}$ to illustrate the indexing of multiple rows from the matrix $\mathbf{A}$, where $\mathbf{z}$ is a vector of indices and “:” the colon operator, generally used when indexing arrays. Further, $(s,p,o)$ will denote a triple with subject entity $s$, object entity $o$ and predicate relation-type $p$, where the entities $s$ and $o$ represent nodes in the KG that are linked by the predicate relation-type $p$. The entities $s, o \in E$ belong to the set $E$ of all observed entities in the data.

2.2 RESCAL
RESCAL [18] is a three-way tensor factorization method that has been shown to lead to very good results in various canonical relational learning tasks like link-prediction, entity resolution and collective classification [19]. In RESCAL, triples are represented in an adjacency tensor $\underline{\mathbf{X}}$ of shape $n \times n \times m$, where $n$ is the number of observed entities in the data and $m$ is the number of relation-types. Each of the $m$ frontal slices $\mathbf{X}_k$ of $\underline{\mathbf{X}}$ represents an adjacency matrix for all entities in the dataset with respect to the $k$-th relation-type. Given an adjacency tensor $\underline{\mathbf{X}}$, RESCAL computes a rank-$d$ factorization, where each entity is represented via a $d$-dimensional vector that is stored in the factor matrix $\mathbf{A} \in \mathbb{R}^{n \times d}$ and each relation-type is represented via a frontal slice $\mathbf{R}_k \in \mathbb{R}^{d \times d}$ of the core tensor $\underline{\mathbf{R}}$, which encodes the asymmetric interactions between subject and object entities. The embeddings are learned by minimizing the regularized least-squares function
$$\min_{\mathbf{A},\,\mathbf{R}_k} \sum_k \left\|\mathbf{X}_k - \mathbf{A}\mathbf{R}_k\mathbf{A}^T\right\|_F^2 + \lambda_A \|\mathbf{A}\|_F^2 + \lambda_R \sum_k \|\mathbf{R}_k\|_F^2 \qquad (1)$$
where $\lambda_A \geq 0$ and $\lambda_R \geq 0$ are hyper-parameters and $\|\cdot\|_F$ is the Frobenius norm. The cost function can be minimized via a very efficient Alternating Least-Squares (ALS) scheme that effectively exploits data sparsity [18] and closed-form solutions. During factorization, RESCAL finds a unique latent representation for each entity that is shared between all relation-types in the dataset.
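To make the closed-form character of the ALS updates concrete, the following is a minimal, unregularized sketch ($\lambda_A = \lambda_R = 0$) of the relation-slice update with the entity factor held fixed; the function name and setup are illustrative, not the authors' implementation:

```python
import numpy as np

def update_relation_slice(A, X_k):
    """Unregularized least-squares update for one RESCAL core slice:
    argmin_{R_k} ||X_k - A R_k A^T||_F^2  =>  R_k = A^+ X_k (A^+)^T."""
    A_pinv = np.linalg.pinv(A)          # Moore-Penrose pseudo-inverse of A
    return A_pinv @ X_k @ A_pinv.T

# Sanity check: if X_k was generated exactly from a full-column-rank A,
# the closed-form update recovers the generating slice.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))             # 6 entities, rank-3 embeddings
R_true = rng.normal(size=(3, 3))
X_k = A @ R_true @ A.T
R_est = update_relation_slice(A, X_k)
```

In the regularized case the update additionally involves the hyper-parameter $\lambda_R$, which this sketch omits for brevity.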
RESCAL’s confidence $\theta_{s,p,o}$ for a triple $(s,p,o)$ is computed through reconstruction by the vector-matrix-vector product

$$\theta_{s,p,o} = \mathbf{a}_s^T \mathbf{R}_p \mathbf{a}_o \qquad (2)$$

from the latent representations $\mathbf{a}_s$ and $\mathbf{a}_o$ of the subject and object entities, respectively, and the latent representation $\mathbf{R}_p$ of the predicate relation-type $p$.
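A minimal sketch of this bilinear scoring function in Python with NumPy; the array layout (entities as rows of A, one d×d slice per relation-type) is an assumption for illustration:

```python
import numpy as np

def rescal_score(A, R, s, p, o):
    """RESCAL confidence theta_{s,p,o} = a_s^T R_p a_o."""
    return float(A[s] @ R[p] @ A[o])

rng = np.random.default_rng(1)
n_entities, n_relations, d = 5, 2, 4
A = rng.normal(size=(n_entities, d))      # one embedding row per entity
R = rng.normal(size=(n_relations, d, d))  # one (generally asymmetric) d x d slice per relation-type

theta = rescal_score(A, R, s=0, p=1, o=3)
```

Because the slice R_p is generally asymmetric, swapping subject and object changes the score, which is what allows RESCAL to model directed relations.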
2.3 Translational Embeddings Model
TransE [5] is a distance-based model that models relationships of entities as translations in the embedding space. The approach assumes for a true fact that a relation-type-specific translation function exists that is able to map (or translate) the latent vector representation of the subject entity to the latent representation of the object entity. The fact confidence is expressed by the similarity of the translation of the subject embedding to the object embedding.
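The translation idea can be sketched as follows: the score of a triple is the negative L1 or L2 distance between the translated subject embedding and the object embedding (a simplified illustration, not the authors' code):

```python
import numpy as np

def transe_score(a_s, r_p, a_o, norm=1):
    """Confidence theta = -||a_s + r_p - a_o|| (L1 or L2):
    the closer the translated subject lands to the object, the higher the score."""
    return -float(np.linalg.norm(a_s + r_p - a_o, ord=norm))

a_s = np.array([0.2, 0.1])
r_p = np.array([0.5, 0.3])
a_o = np.array([0.7, 0.4])   # approximately a_s + r_p: a near-perfect translation

good = transe_score(a_s, r_p, a_o)                      # ~0, the maximum possible score
bad = transe_score(a_s, r_p, np.array([0.0, 1.0]))      # a poorly matching object
```

The maximum attainable score is 0, reached exactly when the translation lands on the object embedding.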
In the case of TransE, the translation function is defined by a simple addition of the latent vector representations of the subject entity $\mathbf{a}_s$ and the predicate relation-type $\mathbf{r}_p$. The similarity of the translation and the object embedding is measured by the $L_1$ or $L_2$ distance. TransE’s confidence $\theta_{s,p,o}$ in a triple $(s,p,o)$ is derived by

$$\theta_{s,p,o} = -\delta(\mathbf{a}_s + \mathbf{r}_p, \mathbf{a}_o) \qquad (3)$$
where $\delta$ is the $L_1$ or the $L_2$ distance and $\mathbf{a}_o$ the latent embedding for the object entity. The embeddings are learned by minimizing the max-margin-based ranking cost function

$$\min \sum_{(s,p,o)\in T}\;\sum_{(s',p,o')\in T'_{(s,p,o)}} \left[\gamma + \delta(\mathbf{a}_{s'} + \mathbf{r}_p, \mathbf{a}_{o'}) - \delta(\mathbf{a}_s + \mathbf{r}_p, \mathbf{a}_o)\right]_+ \quad \text{with } [x]_+ = \max(0, x) \qquad (4)$$

on a set of observed training triples $T$ through Stochastic Gradient Descent (SGD), where

$$T'_{(s,p,o)} = \{(s',p,o)\,|\,s' \in E\} \cup \{(s,p,o')\,|\,o' \in E\}.$$

The “corrupted” entities $s'$ and $o'$ are drawn from the set $E$ of all observed entities, and the ranking loss function enforces that the confidence in the corrupted triples ($(s',p,o)$ or $(s,p,o')$) is lower than in the true triple by a certain margin $\gamma$. During training, it is enforced that the latent embeddings of entities have an $L_2$ norm of one after each SGD iteration.

2.4 Knowledge Vault Neural Network
In the Google Knowledge Vault project [8]
a multiway neural network (mwNN) for predicting prior probabilities for triples from existing KG data was proposed to support triple extraction from unstructured web documents. The confidence value $\theta_{s,p,o}$ for a target triple $(s,p,o)$ is predicted by

$$\theta_{s,p,o} = \sigma\!\left(\boldsymbol{\beta}^T \phi\!\left(\mathbf{W}\,[\mathbf{a}_s; \mathbf{r}_p; \mathbf{a}_o]\right)\right) \qquad (5)$$

where $\phi(\cdot)$ is a non-linear function like e.g. tanh, $\mathbf{a}_s$ and $\mathbf{a}_o$ describe the latent embeddings for the subject and object entities, and $\mathbf{r}_p$ is the latent embedding vector for the predicate relation-type. $[\mathbf{a}_s; \mathbf{r}_p; \mathbf{a}_o]$ is a column vector that stacks the three embeddings on top of each other. $\mathbf{W}$ and $\boldsymbol{\beta}$ are neural network weights and $\sigma(\cdot)$ denotes the logistic function. The model is trained by minimizing the Bernoulli cost function

$$\min\; -\sum_{(s,p,o)\in T}\left[\log \theta_{s,p,o} + \sum_{i=1}^{c} \log\left(1 - \theta_{s,p,o'_i}\right)\right] \qquad (6)$$

through SGD, where $c$ denotes the number of object-corrupted triples $(s,p,o'_i)$ sampled under a local closed-world assumption as defined by [8]. Note that corrupted triples are treated as negative evidence in this model.
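A minimal forward-pass sketch of this scoring network, assuming tanh as the non-linearity and illustrative dimensions; this is not the Knowledge Vault implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mwnn_score(a_s, r_p, a_o, W, beta):
    """theta = sigmoid(beta^T tanh(W [a_s; r_p; a_o]))."""
    stacked = np.concatenate([a_s, r_p, a_o])  # stack the three embeddings
    hidden = np.tanh(W @ stacked)              # non-linearity phi(.)
    return float(sigmoid(beta @ hidden))       # logistic output in (0, 1)

rng = np.random.default_rng(2)
d, n_hidden = 3, 8
W = rng.normal(size=(n_hidden, 3 * d))         # first-layer weights
beta = rng.normal(size=n_hidden)               # output weights
theta = mwnn_score(rng.normal(size=d), rng.normal(size=d), rng.normal(size=d), W, beta)
```

The logistic output makes the score directly interpretable as a prior probability for the triple.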
3 Prior Knowledge on Relation-Type Semantics
Generally, entities in KGs like DBpedia, Freebase or YAGO are assigned to one or multiple predefined classes (or types) that are organized in an often hierarchical ontology. These assignments represent, for example, the knowledge that the entity Albert Einstein is a person and therefore allow a semantic description of the entities contained in the KG. This organization of entities in semantically meaningful classes permits a semantic definition of relation-types. RDF Schema, which provides schema information for RDF, offers among others the concepts rdfs:domain and rdfs:range for this purpose. These concepts are used to represent type-constraints on relation-types by defining the classes or types of entities which they should relate, where the domain covers the subject entity classes and the range the object entity classes in an RDF triple. This can be interpreted as an explicit definition of the semantics of a relation, for example by defining that the relation-type marriedTo should only relate instances of the class Person with each other. Recently, [7] and [10] showed independently that including knowledge about these domain and range constraints in RESCAL’s ALS optimization scheme resulted in better latent representations of entities and relation-types, which led to a significantly improved link-prediction quality at a much lower model complexity (lower rank) when applied to KGs like DBpedia or Nell. The need for a less complex model significantly decreases model training time, especially for larger datasets.
In the following, we denote as $\widehat{domain}_k$ the ordered indices of all entities that agree with the domain constraints of relation-type $k$. Accordingly, $\widehat{range}_k$ denotes these indices for the range constraints of relation-type $k$.
3.1 Type-Constrained Alternating Least-Squares
In RESCAL, the integration of typed relations in the ALS optimization procedure is achieved by indexing only those latent embeddings of entities for each relation-type that agree with the rdfs:domain and rdfs:range constraints. In addition, only the subgraph (encoded by the sparse adjacency matrix $\mathbf{X}_k^{[\widehat{domain}_k,\widehat{range}_k]}$) that is defined with respect to the constraints is considered in the equation

$$\min_{\mathbf{A},\,\mathbf{R}_k} \sum_k \left\|\mathbf{X}_k^{[\widehat{domain}_k,\widehat{range}_k]} - \mathbf{A}_{[\widehat{domain}_k],:}\,\mathbf{R}_k\,\mathbf{A}_{[\widehat{range}_k],:}^T\right\|_F^2 + \lambda_A \|\mathbf{A}\|_F^2 + \lambda_R \sum_k \|\mathbf{R}_k\|_F^2 \qquad (7)$$

where $\mathbf{A}$ contains the latent embeddings for the entities and $\mathbf{R}_k$ the embedding for the relation-type $k$. For each relation-type $k$ the latent embedding matrix $\mathbf{A}$ is indexed by the corresponding domain and range constraints, thereby excluding all entities that disagree with the type-constraints. Note that if the adjacency matrix $\mathbf{X}_k^{[\widehat{domain}_k,\widehat{range}_k]}$ of the subgraph defined by relation-type $k$ and its type-constraints has the shape $n_k \times m_k$, then $\mathbf{A}_{[\widehat{domain}_k],:}$ is of shape $n_k \times d$ and $\mathbf{A}_{[\widehat{range}_k],:}$ of shape $m_k \times d$, where $d$ is the dimension of the latent embeddings (or rank of the factorization).
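The effect of this indexing can be sketched as follows: only the constrained sub-matrix of the adjacency slice and the corresponding rows of the factor matrix enter the reconstruction error. Names and index sets are illustrative:

```python
import numpy as np

def constrained_residual(X_k, A, R_k, domain_idx, range_idx):
    """Squared reconstruction error for one relation-type where only
    rows/columns of X_k (and rows of A) that satisfy the
    rdfs:domain / rdfs:range constraints enter the loss."""
    X_sub = X_k[np.ix_(domain_idx, range_idx)]   # constrained adjacency sub-matrix
    A_dom = A[domain_idx, :]                     # shape n_k x d
    A_rng = A[range_idx, :]                      # shape m_k x d
    return float(np.sum((X_sub - A_dom @ R_k @ A_rng.T) ** 2))

rng = np.random.default_rng(3)
n, d = 8, 3
A = rng.normal(size=(n, d))
R_k = rng.normal(size=(d, d))
X_k = np.zeros((n, n))
X_k[0, 5] = 1.0                                  # one observed triple

err = constrained_residual(X_k, A, R_k, domain_idx=[0, 1, 2], range_idx=[5, 6, 7])
# Without constraints, every entity pair contributes to the loss:
full = constrained_residual(X_k, A, R_k, list(range(n)), list(range(n)))
```

Since every term of the constrained loss also appears in the unconstrained one, restricting the index sets can only shrink the residual and, more importantly, the amount of data touched per update.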
3.2 Type-Constrained Stochastic Gradient Descent
In contrast to RESCAL, TransE and mwNN are both optimized through mini-batch Stochastic Gradient Descent (SGD), where a small batch of randomly sampled triples is used in each iteration of the optimization to drive the model parameters to a local minimum. Generally, KG data does not explicitly contain negative evidence, i.e. false triples (there are of course undetected false triples included in the graph which are assumed to be true), and it is generated in these algorithms through corruption of observed triples (see Sections 2.3 and 2.4). In the original algorithms of TransE and mwNN the corruption of triples is not restricted and can therefore lead to the generation of triples that violate the semantics of relation-types. For integrating knowledge about type-constraints into the SGD optimization scheme of these models, we have to make sure that none of the corrupted triples violates the type-constraints of the corresponding relation-types. For TransE we update Equation (4) and get
$$\min \sum_{(s,p,o)\in T}\;\sum_{(s',p,o')\in T'_{(s,p,o)}} \left[\gamma + \delta(\mathbf{a}_{s'} + \mathbf{r}_p, \mathbf{a}_{o'}) - \delta(\mathbf{a}_s + \mathbf{r}_p, \mathbf{a}_o)\right]_+ \quad \text{with} \quad T'_{(s,p,o)} = \{(s',p,o)\,|\,s' \in \widehat{domain}_p\} \cup \{(s,p,o')\,|\,o' \in \widehat{range}_p\} \qquad (8)$$
where, in contrast to Equation (4), we enforce by $s' \in \widehat{domain}_p$ that the subject entities are only corrupted through the subset of entities that belong to the domain, and by $o' \in \widehat{range}_p$ that the corrupted object entities are sampled from the subset of entities that belong to the range of the predicate relation-type $p$. For mwNN we corrupt only the object entities through sampling from the subset of entities that belong to the range of the predicate relation-type $p$ and get accordingly
$$\min\; -\sum_{(s,p,o)\in T}\left[\log \theta_{s,p,o} + \sum_{i=1}^{c} \log\left(1 - \theta_{s,p,o'_i}\right)\right] \quad \text{with} \quad o'_i \in \widehat{range}_p \qquad (9)$$
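A type-constraint-respecting corruption step for such SGD training might be sketched as follows; entity names, dictionary layout and the helper function are hypothetical, for illustration only:

```python
import random

def corrupt_triple(triple, domain_of, range_of, n=5, seed=0):
    """Sample corrupted triples for SGD training: corrupted subjects come
    only from the relation-type's domain, corrupted objects only from its
    range, so no corrupted triple violates the type-constraints."""
    s, p, o = triple
    sampler = random.Random(seed)
    dom_pool = [e for e in domain_of[p] if e != s]
    rng_pool = [e for e in range_of[p] if e != o]
    subj_corrupted = [(sampler.choice(dom_pool), p, o) for _ in range(n)]
    obj_corrupted = [(s, p, sampler.choice(rng_pool)) for _ in range(n)]
    return subj_corrupted, obj_corrupted

domain_of = {"bornIn": ["einstein", "curie", "bohr"]}   # persons
range_of = {"bornIn": ["ulm", "warsaw"]}                # places
subj, obj = corrupt_triple(("einstein", "bornIn", "ulm"), domain_of, range_of)
```

For mwNN only the object-corrupted list would be used, matching the object-only corruption described above.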
3.3 Local Closed-World Assumptions
Type-constraints as given by KGs tremendously reduce the possible worlds of the statistically modeled KGs, but like the rest of the data represented by the KG, they can also suffer from incompleteness and inconsistency. Even after materialization, entities and relation-types might miss complete typing, leading to fuzzy type-constraints. Increased fuzziness of proper typing can in turn lead to disagreements between true facts and the type-constraints present in the KG. For relation-types where these kinds of inconsistencies are quite frequent, we cannot simply apply the given type-constraints without the risk of losing true triples. On the other hand, if the domain and range constraints themselves are missing (e.g. in schema-less KGs), we might consider many triples that do not have any semantic meaning.
We argue that in these cases a local closed-world assumption (LCWA) can be applied which approximates the domain and range constraints of the targeted relation-type not on class level, but on instance level based solely on observed triples. Given all observed triples, under this LCWA the domain of a relation-type $p$ consists of all entities that are related by $p$ as subject. The range is defined accordingly, but contains all the entities related as object by $p$. Of course, this approach can exclude entities from the domain or range constraints that agree with the type-constraints given by the RDF Schema concepts rdfs:domain and rdfs:range, thereby ignoring them during model training when exploiting the LCWA (only for the target relation-type). On the other hand, nothing is known about these entities (in object or subject role) with respect to the target relation-type, and therefore treating them as missing can be a valid assumption. In the case of the ALS-optimized RESCAL, we reduce the size and sparsity of the data by this approach, which has a positive effect on model training compared to the alternative, a closed-world assumption that considers all entities to be part of the domain and range of the target relation-type [10]. For the SGD-optimized TransE and mwNN models, a positive effect on the learned factors is also expected, since the corruption of triples will be based on entities that can be expected not to disagree with the semantics of the corresponding relation-type.
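Deriving the LCWA domain and range on instance level from observed triples is straightforward; a small sketch with hypothetical triples:

```python
def lcwa_constraints(triples):
    """Approximate domain/range of each relation-type on instance level
    (local closed-world assumption): the domain is the set of entities
    observed as subject of the relation-type, the range the set of
    entities observed as object."""
    domain_of, range_of = {}, {}
    for s, p, o in triples:
        domain_of.setdefault(p, set()).add(s)
        range_of.setdefault(p, set()).add(o)
    return domain_of, range_of

triples = [
    ("einstein", "bornIn", "ulm"),
    ("curie", "bornIn", "warsaw"),
    ("curie", "marriedTo", "pierre"),
]
domain_of, range_of = lcwa_constraints(triples)
```

Note that entities never observed with a relation-type (e.g. "pierre" for bornIn) are simply absent from its approximated domain and range, which is exactly the "treat as missing" behavior described above.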
4 Experimental Setup
As stated before, we explore in our experiments the importance of prior knowledge about the semantics of relation-types for latent variable models (code and datasets will be available from http://www.dbs.ifi.lmu.de/krompass/). We consider two settings. In the first setting, we assume that curated type-constraints extracted from the KG’s schema are available. In the second setting, we explore the local closed-world assumption (see Section 3.3). Our experimental setup covers the following important aspects, which will enable us to make generalizing conclusions about the importance of such prior knowledge when applying latent variable models to KGs:

- We test various representative latent variable models that cover the diversity of these models in the domain. As motivated in the introduction of Section 2, we believe that RESCAL, TransE and mwNN are especially well suited for this task.

- We extracted diverse datasets from instances of the Linked Open Data Cloud, namely Freebase, YAGO and DBpedia, because it is expected that the value of prior knowledge about relation-type semantics also depends on the particular dataset the models are applied to. From these KGs we constructed datasets that serve as representatives for general-purpose KGs covering a wide range of relation-types from a diverse set of domains, for domain-focused KGs with a small number of entity classes and relation-types, and for high-quality KGs.
In the remainder of this section we will give details on the extracted datasets and the evaluation, implementation and training of RESCAL, TransE and mwNN.
4.1 Datasets
Table 1: Extracted datasets.

| Dataset | Source | Entities | Relation-Types | Triples |
|---|---|---|---|---|
| DBpediaMusic | DBpedia 2014 | 321,950 | 15 | 981,383 |
| Freebase150k | Freebase RDF-Dump | 151,146 | 285 | 1,047,844 |
| YAGOc195k | YAGO2-Core | 195,639 | 32 | 1,343,684 |
Below, we describe how we extracted the different datasets from Freebase, DBpedia and YAGO. In Table 1 some details about the size of these datasets are given. In our experiments, the Freebase150k dataset will simulate a general purpose KG, the DBpediaMusic dataset a domain specific KG and the YAGOc195k dataset a high quality KG.
4.1.1 Freebase150k
The Freebase KG includes triples extracted from Wikipedia infoboxes, MusicBrainz [21], WordNet [15] and many more. From the current materialized Freebase RDF dump (https://developers.google.com/freebase/data), we extracted entity types, type-constraints and all triples that involved entities (topics) with more than 100 relations to other topics. Subsequently, we discarded the triples of relation-types with incomplete type-constraints or which occurred in fewer than 100 triples. Additionally, we discarded all triples that involved entities that are not an instance of any class covered by the remaining type-constraints. The entities involved in type-constraint-violating triples were added to the subset of entities that agree with the type-constraints, since we assumed that they only miss proper typing.
4.1.2 DBpediaMusic
For the DBpediaMusic dataset, we extracted triples and types for 15 preselected object-properties regarding the music domain of DBpedia (DBpedia 2014, http://wiki.dbpedia.org/Downloads2014; canonicalized datasets: mapping-based properties (cleaned), mapping-based types and heuristics): musicalBand, musicalArtist, musicBy, musicSubgenre, derivative, stylisticOrigin, associatedBand, associatedMusicalArtist, recordedIn, musicFusionGenre, musicComposer, artist, bandMember, formerBandMember and genre, where genre has been extracted to include only those entities that were covered by the other object-properties, to restrict it to musical genres. We extracted the type-constraints from the DBpedia OWL ontology, and for entities that occurred fewer than two times we discarded all triples. In case types for entities or type-constraints were absent, we assigned them to owl#Thing. Remaining disagreements between triples and type-constraints were resolved as in the case of the Freebase150k dataset.

4.1.3 YAGOc195k
YAGO (Yet Another Great Ontology) is an automatically generated high-quality KG that combines the information richness of Wikipedia infoboxes and its category system with the clean taxonomy of WordNet. From the YAGO2-Core dataset (http://www.mpiinf.mpg.de/departments/databasesandinformationsystems/research/yagonaga/yago/downloads/) we extracted entity types, type-constraints (from yagoSchema and yagoTransitiveType) and all triples that involved entities with more than 5 relations and relation-types that were involved in more than 100 relations. We only included entities that share the types used in the rdfs:domain and rdfs:range triples.
4.2 Evaluation Procedure
We evaluate RESCAL, TransE and mwNN on link-prediction tasks, where we delete triples from the datasets and try to re-predict them without considering them during model training. For model training and evaluation we split the triples of the datasets into three sets, where 20% of the triples were taken as holdout set, 10% as validation set for hyper-parameter tuning and the remaining 70% served as training set (an additional 5% of the training set was used for early stopping). For the validation and holdout sets, we sampled 10 times as many negative triples for evaluation, where the negative triples were drawn such that they did not violate the given domain and range constraints of the KG. Also, the negative evidence of the holdout and validation sets is non-overlapping. In KG data, we are generally dealing with a strongly skewed ratio of observed and unobserved triples; through this sampling we try to mimic this effect to some extent, since it is intractable to sample all unobserved triples. In the case of the LCWA, the domain and range constraints are always derived from the training set. After deriving the best hyper-parameter settings for all models, we trained all models with these settings using both the training and the validation set to predict the holdout set (20% of triples). We report the Area Under Precision-Recall Curve (AUPRC) for all models. In addition, we provide the Area Under Receiver Operating Characteristic Curve (AUROC), because it is widely used for this problem, even though it is not well suited for evaluation in these tasks due to the imbalance of (assumed) false and true triples: AUROC considers the false-positive rate, which relies on the number of true negatives, which is generally high in these kinds of datasets, resulting in misleadingly high scores. The discussions and conclusions will therefore be primarily based on the AUPRC results.

4.3 Implementation and Model Training Details
All models were implemented in Python, using Theano [1] in part. For TransE we exploited the code provided by the authors (https://github.com/glorotxa/SME) as a basis to implement a type-constraints-supporting version of TransE, but we replaced large parts of the original code to allow significantly faster training (mainly by changing the ranking function used for calculating the validation error, but also by considering trivial zero gradients during the SGD updates). We made sure that our implementation achieved very similar results to the original model on a smaller dataset (http://alchemy.cs.washington.edu/data/cora/; results not shown). The mwNN was also implemented in Theano. Since there are not many details on model training in the corresponding work [8], we added elastic-net regularization combined with DropConnect [22] on the network weights and optimized the cost function using mini-batch adaptive gradient descent. We randomly initialized the weights by drawing from a zero-mean normal distribution, where we treat the standard deviation as an additional hyper-parameter. The corrupted triples were sampled with respect to the local closed-world assumption discussed in [8]. We fixed the number of corrupted triples per training example to five (we tried different numbers of corrupted triples, and five seemed to give the most stable results across all datasets). For RESCAL, we used the ALS implementation provided by the author (https://github.com/mnick/scikittensor) and our own implementation used in [10], but modified them such that they support a more scalable early-stopping criterion based on a small validation set.
For hyper-parameter tuning, all models were trained for a maximum of 50 epochs, and for the final evaluation on the holdout set for a maximum of 200 epochs. For all models, we sampled 5% of the training data and used the change in AUPRC on this subsample as early-stopping criterion.
5 Experimental Results
In Tables 2, 3 and 4, our experimental results for RESCAL, TransE and mwNN are shown. All of these tables have the same structure and compare different versions of exactly one of these methods on all three datasets; Table 2 for example shows the results for RESCAL and Table 4 the results of mwNN. The first column in these tables indicates the dataset the model was applied to (Freebase150k, DBpediaMusic or YAGOc195k), and the second column which kind of prior knowledge about the semantics of relation-types was exploited by the model: None denotes the original model that does not consider any prior knowledge on relation-types, Type-Constraints denotes that the model exploited the curated domain and range constraints extracted from the KG’s schema, and LCWA that the model exploited the local closed-world assumption (Section 3.3) during model training. The remaining columns show the AUPRC and AUROC scores for the various model versions at different enforced latent embedding lengths: d = 10, 50 or 100.
Table 2: Link-prediction results for RESCAL.

| Dataset | Prior Knowledge on Semantics | AUPRC d=10 | AUPRC d=50 | AUPRC d=100 | AUROC d=10 | AUROC d=50 | AUROC d=100 |
|---|---|---|---|---|---|---|---|
| Freebase150k | None | 0.327 | 0.453 | 0.514 | 0.616 | 0.700 | 0.753 |
| Freebase150k | Type-Constraints | 0.521 | 0.630 | 0.654 | 0.804 | 0.863 | 0.877 |
| Freebase150k | LCWA | 0.579 | 0.675 | 0.699 | 0.849 | 0.886 | 0.896 |
| DBpediaMusic | None | 0.307 | 0.362 | 0.416 | 0.583 | 0.617 | 0.653 |
| DBpediaMusic | Type-Constraints | 0.413 | 0.490 | 0.545 | 0.656 | 0.732 | 0.755 |
| DBpediaMusic | LCWA | 0.453 | 0.505 | 0.571 | 0.701 | 0.776 | 0.800 |
| YAGOc195k | None | 0.507 | 0.694 | 0.721 | 0.621 | 0.787 | 0.800 |
| YAGOc195k | Type-Constraints | 0.626 | 0.721 | 0.739 | 0.785 | 0.820 | 0.833 |
| YAGOc195k | LCWA | 0.567 | 0.672 | 0.680 | 0.814 | 0.839 | 0.849 |
Table 3: Link-prediction results for TransE.

| Dataset | Prior Knowledge on Semantics | AUPRC d=10 | AUPRC d=50 | AUPRC d=100 | AUROC d=10 | AUROC d=50 | AUROC d=100 |
|---|---|---|---|---|---|---|---|
| Freebase150k | None | 0.548 | 0.715 | 0.743 | 0.886 | 0.890 | 0.892 |
| Freebase150k | Type-Constraints | 0.699 | 0.797 | 0.808 | 0.897 | 0.918 | 0.907 |
| Freebase150k | LCWA | 0.671 | 0.806 | 0.831 | 0.894 | 0.932 | 0.931 |
| DBpediaMusic | None | 0.701 | 0.748 | 0.745 | 0.902 | 0.911 | 0.903 |
| DBpediaMusic | Type-Constraints | 0.734 | 0.783 | 0.826 | 0.927 | 0.937 | 0.942 |
| DBpediaMusic | LCWA | 0.719 | 0.839 | 0.848 | 0.910 | 0.943 | 0.953 |
| YAGOc195k | None | 0.793 | 0.849 | 0.816 | 0.904 | 0.960 | 0.910 |
| YAGOc195k | Type-Constraints | 0.843 | 0.896 | 0.896 | 0.962 | 0.972 | 0.974 |
| YAGOc195k | LCWA | 0.790 | 0.861 | 0.872 | 0.942 | 0.962 | 0.962 |
Table 4: Link-prediction results for mwNN.

| Dataset | Prior Knowledge on Semantics | AUPRC d=10 | AUPRC d=50 | AUPRC d=100 | AUROC d=10 | AUROC d=50 | AUROC d=100 |
|---|---|---|---|---|---|---|---|
| Freebase150k | None | 0.437 | 0.471 | 0.512 | 0.852 | 0.868 | 0.879 |
| Freebase150k | Type-Constraints | 0.775 | 0.815 | 0.837 | 0.956 | 0.962 | 0.967 |
| Freebase150k | LCWA | 0.610 | 0.765 | 0.776 | 0.918 | 0.954 | 0.956 |
| DBpediaMusic | None | 0.436 | 0.509 | 0.538 | 0.836 | 0.864 | 0.865 |
| DBpediaMusic | Type-Constraints | 0.509 | 0.745 | 0.754 | 0.858 | 0.908 | 0.913 |
| DBpediaMusic | LCWA | 0.673 | 0.707 | 0.723 | 0.876 | 0.900 | 0.884 |
| YAGOc195k | None | 0.600 | 0.684 | 0.655 | 0.949 | 0.949 | 0.957 |
| YAGOc195k | Type-Constraints | 0.836 | 0.840 | 0.837 | 0.953 | 0.954 | 0.960 |
| YAGOc195k | LCWA | 0.714 | 0.836 | 0.833 | 0.926 | 0.935 | 0.943 |
5.1 Type-Constraints are Essential
The experimental results shown in Tables 2, 3 and 4 give strong evidence that type-constraints as provided by the KG’s schema are generally of great value for the statistical modeling of KGs with latent variable models. For all datasets, this prior information led to significant improvements in link-prediction quality for all models and settings, in both AUPRC and AUROC. For example, RESCAL’s AUPRC score on the Freebase150k dataset improves from 0.327 to 0.521 at the lowest model complexity (d=10) (Table 2). With higher model complexities the relative improvements decrease but stay significant (27% at d=100, from 0.514 to 0.654). The benefit for RESCAL in considering type-constraints was expected due to prior works [7, 10], but the other models also improve significantly when considering type-constraints.
For TransE, large improvements on the Freebase150k and DBpediaMusic datasets can be observed (Table 3), where the AUPRC score increases e.g. for d=10 from 0.548 to 0.699 on Freebase150k and for d=100 from 0.745 to 0.826 on DBpediaMusic. Also on the YAGOc195k dataset, the link-prediction quality improves from 0.793 to 0.843 with d=10. Especially the multiway neural network approach (mwNN) seems to improve the most by considering type-constraints during model training (Table 4). In the case of the Freebase150k dataset, it improves by up to 77% in AUPRC for d=10, from 0.437 to 0.775, and on the DBpediaMusic dataset from 0.436 to 0.509 with d=10 and from 0.538 to 0.754 with d=100 in AUPRC. In the case of the YAGOc195k dataset, the link-prediction quality of mwNN also benefits to a large extent from the type-constraints.
Besides observing that the latent variable models are superior when exploiting type-constraints at a fixed latent embedding length d, it is also worth noticing that the biggest improvements are most often achieved at a very low model complexity (d=10), which is especially interesting for the application of these models to large datasets. At this low complexity level the type-constraints-supported models even outperform more complex counterparts that ignore type-constraints; e.g. on Freebase150k, mwNN reaches 0.512 AUPRC with an embedding length of 100, but by considering type-constraints this model achieves 0.775 AUPRC with an embedding length of only 10.
In accordance with the AUPRC scores, the improvements of the less meaningful and generally high AUROC scores support the conclusion that type-constraints add value to the prediction quality of the models. It can be inferred from the corresponding scores that the improvements are of smaller scale, but still significant.
5.2 Local Closed-World Assumption – Simple but Powerful
From Tables 2, 3 and 4, it can be observed that the LCWA leads to similarly large improvements in link-prediction quality as the real type-constraints, especially at the lowest model complexity (d=10). For example, by exploiting the LCWA, TransE improves from 0.715 to 0.806 with d=50 on the Freebase150k dataset, mwNN improves its initial AUPRC score of 0.600 (d=10) on the YAGOc195k dataset to 0.714, and RESCAL’s AUPRC score jumps from 0.327 to 0.579 (d=10). The only exception to this observation is RESCAL when applied to the YAGOc195k dataset: for d=50, the RESCAL AUPRC score decreases from 0.694 to 0.672, and for d=100 from 0.721 to 0.680, when considering the LCWA in the model. The type-constraints of the YAGOc195k relation-types are defined over a large set of entities, covering 22% of all possible triples; it seems that a closed-world assumption is more beneficial for RESCAL in this case. As in the case of the type-constraints, the AUROC scores also support the trend observed through the AUPRC scores.
Even though the LCWA has a similarly beneficial impact on the link-prediction quality as the type-constraints, there is no evidence in our experiments that the LCWA can generally replace the extracted type-constraints provided by the KG’s schema. For the YAGOc195k dataset, the type-constraint-supported models are clearly superior to those that exploit the LCWA, but in the case of the Freebase150k and DBpediaMusic datasets the message is not as clear: RESCAL achieves its best results on these two datasets when exploiting the LCWA, whereas mwNN achieves its best results when exploiting the type-constraints. For TransE, it seems to depend on the chosen embedding length, where longer embedding lengths favor the LCWA.
6 Related Work
A number of other latent variable models have been proposed for the statistical modeling of KGs. [20] recently proposed a neural tensor network, which we did not consider in our study since it has been observed that it does not scale to larger datasets [7, 8]. Instead we exploit a less complex and more scalable neural network model proposed in [8], which achieves results comparable to the neural tensor network of [20]. TransE [5] has been the target of other recent research activities. [24] proposed a framework for relationship modeling that combines aspects of TransE and the neural tensor network proposed in [20]. [23] proposed TransH, which improves TransE's capability to model reflexive, one-to-many, many-to-one and many-to-many relation-types by introducing a relation-type-specific hyperplane on which the translation is performed. This work has been further extended in [14] by introducing TransR, which separates the representations of entities and relation-types into different spaces, where the translation is performed in the relation space. An extensive review on representation learning with KGs can be found in [17]. Domain and range constraints as given by the KG's schema or via a local closed-world assumption have been exploited very recently in RESCAL [7, 10], but to the best of our knowledge they have neither been integrated into other latent variable methods nor has their general value for these models been recognized.
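For orientation, the scoring functions of the translation-based models discussed above can be summarized as follows (notation simplified; norms, normalization constraints and the exact parameterization vary across the cited papers):

```latex
% TransE [5]: translation directly in the entity embedding space
f_r(h,t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert

% TransH [23]: entities are first projected onto a relation-specific
% hyperplane with normal vector w_r, then translated by d_r
\mathbf{h}_\perp = \mathbf{h} - \mathbf{w}_r^{\top}\mathbf{h}\,\mathbf{w}_r,
\qquad
f_r(h,t) = \lVert \mathbf{h}_\perp + \mathbf{d}_r - \mathbf{t}_\perp \rVert

% TransR [14]: entities are mapped into a separate relation space
% by a relation-specific matrix M_r before the translation
f_r(h,t) = \lVert \mathbf{M}_r\mathbf{h} + \mathbf{r} - \mathbf{M}_r\mathbf{t} \rVert
```

In all three cases a low score indicates a plausible triple; TransH and TransR generalize TransE by making the space in which the translation acts depend on the relation-type.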
7 Conclusions and Future Work
In this work we have studied the general value of prior knowledge about the semantics of relation-types, extracted from the schema of the knowledge graph (type-constraints) or approximated through a local closed-world assumption, for the statistical modeling of KGs with latent variable models. Our experiments give clear empirical proof that the curated semantic information of type-constraints significantly improves the link-prediction quality of TransE, RESCAL and mwNN (by up to 77%) and can therefore be considered essential for latent variable models when applied to KGs. The value of type-constraints becomes especially prominent when the model complexity, i.e. the dimensionality of the embeddings, has to be very low, an essential requirement when applying these models to very large datasets.
Since type-constraints can be absent or fuzzy (due to, e.g., insufficient typing of entities), we further showed that an alternative, a local closed-world assumption (LCWA), can be applied in these cases. It approximates domain and range constraints for relation-types on the instance level rather than on the class level, based solely on observed triples. The LCWA also leads to large improvements in the link-prediction tasks, but especially at very low model complexity the integration of type-constraints seemed superior. In our experiments we used models that exploited either the type-constraints or the LCWA, but in a real setting we would combine both: we would use the type-constraints whenever possible and the LCWA for the relation-types where type-constraints are absent or fuzzy.
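The combination strategy suggested in the last sentence can be expressed as a simple fallback rule (a hypothetical helper, not the authors' implementation): prefer the curated schema constraint for a relation-type and fall back to the LCWA-derived entity sets where the schema constraint is absent.

```python
# Illustrative sketch: choose curated type-constraints when present,
# otherwise fall back to LCWA-derived entity sets. "schema_domains"
# and "lcwa_domains" map relation-types to admissible subject entities.
def constraints_for(relation, schema_domains, lcwa_domains):
    """Return the admissible subject set for a relation-type."""
    dom = schema_domains.get(relation)
    if dom:  # curated type-constraint available and non-empty
        return dom
    # absent (or empty) schema constraint: use the LCWA sets instead
    return lcwa_domains.get(relation, set())
```

The analogous rule would be applied to the range (object) side; a fuzzy schema constraint could additionally be detected, e.g., by checking how many observed triples it actually covers.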
In future work we will further investigate additional extensions for latent variable models that can be combined with the type-constraints or the LCWA. In the related work we gave some examples where the integration of graph-feature models (e.g. the path ranking algorithm [12]) was shown to improve these models. In addition we will look at the many aspects in which RESCAL, TransE and mwNN differ. Identifying the aspects of these models that have the most beneficial impact on link-prediction quality could give rise to a new generation of latent variable approaches that further drive knowledge graph modeling.
References
 [1] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. Theano: A CPU and GPU math compiler in Python. In Proceedings of the 9th Python in Science Conference, pages 3–10, 2010.
 [2] C. Bizer, T. Heath, and T. Berners-Lee. Linked data - the story so far. Int. J. Semantic Web Inf. Syst., 5(3):1–22, 2009.
 [3] C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak, and S. Hellmann. DBpedia - a crystallization point for the web of data. Web Semant., 7(3):154–165, 2009.
 [4] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor. Freebase: A collaboratively created graph database for structuring human knowledge. In SIGMOD, pages 1247–1250. ACM, 2008.
 [5] A. Bordes, N. Usunier, A. García-Durán, J. Weston, and O. Yakhnenko. Translating embeddings for modeling multi-relational data. In NIPS, pages 2787–2795, 2013.
 [6] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr., and T. M. Mitchell. Toward an architecture for never-ending language learning. In AAAI. AAAI Press, 2010.
 [7] K. Chang, W. Yih, B. Yang, and C. Meek. Typed tensor decomposition of knowledge bases for relation extraction. In EMNLP, pages 1568–1579, 2014.
 [8] X. Dong, E. Gabrilovich, G. Heitz, W. Horn, N. Lao, K. Murphy, T. Strohmann, S. Sun, and W. Zhang. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In SIGKDD, pages 601–610. ACM, 2014.
 [9] J. Hoffart, F. M. Suchanek, K. Berberich, E. Lewis-Kelham, G. de Melo, and G. Weikum. YAGO2: Exploring and querying world knowledge in time, space, context, and many languages. In WWW, pages 229–232. ACM, 2011.
 [10] D. Krompaß, M. Nickel, and V. Tresp. Large-scale factorization of type-constrained multi-relational data. In DSAA, pages 18–24. IEEE, 2014.
 [11] D. Krompaß, M. Nickel, and V. Tresp. Querying factorized probabilistic triple databases. In ISWC, pages 114–129, 2014.
 [12] N. Lao and W. W. Cohen. Relational retrieval using a combination of path-constrained random walks. Mach. Learn., 81(1):53–67, 2010.
 [13] O. Lassila and R. R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. W3c recommendation, W3C, 1999.
 [14] Y. Lin, Z. Liu, M. Sun, Y. Liu, and X. Zhu. Learning entity and relation embeddings for knowledge graph completion. In AAAI, pages 2181–2187, 2015.
 [15] G. A. Miller. WordNet: A lexical database for English. Commun. ACM, 38(11):39–41, 1995.
 [16] M. Nickel, X. Jiang, and V. Tresp. Reducing the rank in relational factorization models by including observable patterns. In NIPS, pages 1179–1187, 2014.
 [17] M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich. A review of relational machine learning for knowledge graphs: From multi-relational link prediction to automated knowledge graph construction. CoRR, abs/1503.00759, 2015.
 [18] M. Nickel, V. Tresp, and H. Kriegel. A three-way model for collective learning on multi-relational data. In ICML, pages 809–816. ACM, 2011.
 [19] M. Nickel, V. Tresp, and H. Kriegel. Factorizing YAGO: Scalable machine learning for linked data. In WWW, pages 271–280. ACM, 2012.
 [20] R. Socher, D. Chen, C. D. Manning, and A. Y. Ng. Reasoning with neural tensor networks for knowledge base completion. In NIPS, 2013.
 [21] A. Swartz. MusicBrainz: A semantic web service. IEEE Intelligent Systems, 17(1):76–77, 2002.
 [22] L. Wan, M. D. Zeiler, S. Zhang, Y. LeCun, and R. Fergus. Regularization of neural networks using DropConnect. In ICML, volume 28, pages 1058–1066. JMLR.org, 2013.
 [23] Z. Wang, J. Zhang, J. Feng, and Z. Chen. Knowledge graph embedding by translating on hyperplanes. In AAAI, pages 1112–1119, 2014.
 [24] B. Yang, W. Yih, X. He, J. Gao, and L. Deng. Embedding entities and relations for learning and inference in knowledge bases. CoRR, abs/1412.6575, 2014.