A Physical Embedding Model for Knowledge Graphs

01/21/2020 ∙ by Caglar Demir, et al. ∙ Universität Paderborn

Knowledge graph embedding methods learn continuous vector representations for entities in knowledge graphs and have been used successfully in a large number of applications. We present a novel and scalable paradigm for the computation of knowledge graph embeddings, which we dub PYKE. Our approach combines a physical model based on Hooke's law and its inverse with ideas from simulated annealing to compute embeddings for knowledge graphs efficiently. We prove that PYKE achieves a linear space complexity. While the time complexity for the initialization of our approach is quadratic, the time complexity of each of its iterations is linear in the size of the input knowledge graph. Hence, PYKE's overall runtime is close to linear. Consequently, our approach easily scales up to knowledge graphs containing millions of triples. We evaluate our approach against six state-of-the-art embedding approaches on the DrugBank and DBpedia datasets in two series of experiments. The first series shows that the cluster purity achieved by PYKE is up to 26% (absolute) better than that of the state of the art. In addition, PYKE is more than 22 times faster than existing embedding solutions in the best case. The results of our second series of experiments show that PYKE is up to 23% (absolute) better than the state of the art on the task of type prediction while maintaining its superior scalability. Our implementation and results are open-source and are available at http://github.com/dice-group/PYKE.

1 Introduction

The number and size of knowledge graphs (KGs) available on the Web and in companies grow steadily (see https://lod-cloud.net/). For example, more than 150 billion facts describing more than 3 billion things are available in the more than 10,000 knowledge graphs published on the Web as Linked Data (see lodstats.aksw.org). Knowledge graph embedding (KGE) approaches aim to map the entities contained in knowledge graphs to $n$-dimensional vectors [19, 13, 22]. Accordingly, they parallel word embeddings from the field of natural language processing [11, 14] and the improvement they brought about in various tasks (e.g., word analogy, question answering, named entity recognition and relation extraction). Applications of KGEs include collective machine learning, type prediction, link prediction, entity resolution, knowledge graph completion and question answering [13, 2, 12, 19, 22, 15]. In this work, we focus on type prediction. We present a novel approach for KGE based on a physical model, which goes beyond the state of the art (see [19] for a survey) w.r.t. both efficiency and effectiveness. Our approach, dubbed Pyke, combines a physical model (based on Hooke’s law) with an optimization technique inspired by simulated annealing. Pyke scales to large KGs by achieving a linear space complexity while being close to linear in its time complexity on large KGs. We compare the performance of Pyke with that of six state-of-the-art approaches (Word2Vec [11], ComplEx [18], RESCAL [13], TransE [2], DistMult [22] and Canonical Polyadic (CP) decomposition [6]) on two tasks, i.e., clustering and type prediction, w.r.t. both runtime and prediction accuracy. Our results corroborate our formal analysis of Pyke and suggest that our approach scales close to linearly with the size of the input graph w.r.t. its runtime. In addition to outperforming the state of the art w.r.t. runtime, Pyke also achieves better cluster purity and type prediction scores.

The rest of this paper is structured as follows: after providing a brief overview of related work in Section 2, we present the mathematical framework underlying Pyke in Section 3. Thereafter, we present Pyke in Section 4. Section 5 presents the space and time complexity of Pyke. We report on the results of our experimental evaluation in Section 6. Finally, we conclude with a discussion and an outlook on future work in Section 7.

2 Related Work

A large number of KGE approaches have been developed in the recent past to address tasks such as link prediction, graph completion and question answering [7, 8, 12, 13, 18]. In the following, we give a brief overview of some of these approaches. More details can be found in the survey at [19]. RESCAL [13] is based on computing a three-way factorization of an adjacency tensor representing the input KG. The adjacency tensor is decomposed into a product of a core tensor and embedding matrices. RESCAL captures rich interactions in the input KG but is limited in its scalability. HolE [12] uses circular correlation as its compositional operator. Holographic embeddings of knowledge graphs yield state-of-the-art results on the link prediction task while keeping the memory complexity lower than RESCAL and TransR [8]. ComplEx [18] is a KGE model based on latent factorization, wherein complex-valued embeddings are utilized to handle a large variety of binary relations including symmetric and antisymmetric relations.

Energy-based KGE models [1, 2, 3] yield competitive performances on link prediction, graph completion and entity resolution. SE [3] proposes to learn one low-dimensional vector ($\vec{e}$) for each entity and two matrices ($R^{lhs}_r$, $R^{rhs}_r$) for each relation. Hence, for a given triple ($h, r, t$), SE aims to minimize the $L_1$ distance, i.e., $\|R^{lhs}_r \vec{h} - R^{rhs}_r \vec{t}\|_1$. The approach in [1] embeds entities and relations into the same embedding space and suggests capturing correlations between entities and relations by using multiple matrix products. TransE [2] is a scalable energy-based KGE model wherein a relation $r$ between entities $h$ and $t$ corresponds to a translation of their embeddings, i.e., $\vec{h} + \vec{r} \approx \vec{t}$ provided that $(h, r, t)$ exists in the KG. TransE outperforms state-of-the-art models in the link prediction task on several benchmark KG datasets while being able to deal with KGs containing up to 17 million facts. DistMult [22] proposes to generalize neural-embedding models under a unified learning framework, wherein relations are bi-linear or linear mapping functions between embeddings of entities.

With Pyke, we propose a different take on generating embeddings by combining a physical model with simulated annealing. Our evaluation suggests that this simulation-based approach to generating embeddings scales well (i.e., linearly in the size of the KG) while outperforming the state of the art in the type prediction and clustering quality tasks [21, 20].

3 Preliminaries and Notation

In this section, we present the core notation and terminology used throughout this paper. The symbols we use and their meaning are summarized in Table 1.

3.1 Knowledge Graph

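In this work, we compute embeddings for RDF KGs. Let $\mathcal{R}$ be the set of all RDF resources, $\mathcal{B}$ be the set of all RDF blank nodes, $\mathcal{P}$ be the set of all properties and $\mathcal{L}$ denote the set of all RDF literals. An RDF KG $\mathcal{K}$ is a set of RDF triples $(s, p, o)$ where $s \in \mathcal{R} \cup \mathcal{B}$, $p \in \mathcal{P}$ and $o \in \mathcal{R} \cup \mathcal{B} \cup \mathcal{L}$. We aim to compute embeddings for resources and blank nodes. Hence, we define the vocabulary of an RDF knowledge graph $\mathcal{K}$ as $V = \{x : x \in \mathcal{R} \cup \mathcal{P} \cup \mathcal{B} \wedge \exists (s, p, o) \in \mathcal{K} : x \in \{s, p, o\}\}$. Essentially, $V$ stands for all the URIs and blank nodes found in $\mathcal{K}$. Finally, we define the subjects with type information of $\mathcal{K}$ as $S = \{x : x \in V \setminus \mathcal{P} \wedge (x, \text{rdf:type}, o) \in \mathcal{K}\}$, where rdf:type stands for the instantiation relation in RDF.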

Notation    Description
$\mathcal{K}$    An RDF knowledge graph
$\mathcal{R}, \mathcal{P}, \mathcal{B}, \mathcal{L}$    Set of all RDF resources, predicates, blank nodes and literals respectively
$S$    Set of all RDF subjects with type information
$V$    Vocabulary of $\mathcal{K}$
$\sigma$    Similarity function on $V$
$\vec{x}_t$    Embedding of $x$ at time $t$
$F_a, F_r$    Attractive and repulsive forces, respectively
$K$    Threshold for positive and negative examples
$P$    Function mapping each $x \in V$ to a set of attracting elements of $V$
$N$    Function mapping each $x \in V$ to a set of repulsive elements of $V$
$\Pr$    Probability
$\omega$    Repulsive constant
$E$    System energy
$\epsilon$    Upper bound on alteration of locations of $x \in V$ across two iterations
$\Delta e$    Energy release
Table 1: Overview of our notation

3.2 Hooke’s Law

Hooke’s law describes the relation between a deforming force on a spring and the magnitude of the deformation within the elastic regime of said spring. The increase of a deforming force on the spring is linearly related to the increase of the magnitude of the corresponding deformation. In equation form, Hooke’s law can be expressed as follows:

$$F = -k \, \Delta \qquad (1)$$

where $F$ is the deforming force, $\Delta$ is the magnitude of deformation and $k$ is the spring constant. Let us assume two points of unit mass located at $\vec{x}$ and $\vec{y}$ respectively. We assume that the two points are connected by an ideal spring with a spring constant $k$, an infinite elastic regime and an initial length of 0. Then, the force they are subjected to has a magnitude of $k \, \|\vec{x} - \vec{y}\|$. Note that the magnitude of this force grows with the distance between the two mass points.

The inverse of Hooke’s law, where

$$F = -\frac{k}{\Delta} \qquad (2)$$

has the opposite behavior. It becomes weaker with the distance between the two mass points it connects.
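The contrast between the two laws can be illustrated with a minimal sketch. The function names `hooke_force` and `inverse_hooke_force` are hypothetical and only the force magnitudes are computed; the spring constant and distances are arbitrary illustrative values.

```python
def hooke_force(k: float, distance: float) -> float:
    """Magnitude of the spring force: grows linearly with the distance."""
    return k * distance


def inverse_hooke_force(k: float, distance: float) -> float:
    """Magnitude under the inverse law: shrinks as the distance grows."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return k / distance


# Illustrative values only: doubling the distance doubles the first
# magnitude and halves the second one.
for dist in (0.5, 1.0, 2.0, 4.0):
    print(dist, hooke_force(2.0, dist), inverse_hooke_force(2.0, dist))
```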

3.3 Positive Pointwise Mutual Information

The Positive Pointwise Mutual Information (PPMI) is a means to capture the strength of the association between two events (e.g., appearing in a triple of a KG). Let $a$ and $b$ be two events. Let $\Pr(a, b)$ stand for the joint probability of $a$ and $b$, $\Pr(a)$ for the probability of $a$ and $\Pr(b)$ for the probability of $b$. Then, $PPMI(a, b)$ is defined as

$$PPMI(a, b) = \max\left(0, \log \frac{\Pr(a, b)}{\Pr(a) \, \Pr(b)}\right) \qquad (3)$$

The equation truncates all negative values to 0 as measuring the strength of dissociation between events accurately demands very large sample sizes, which are empirically seldom available.
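As a small illustration of the truncation, the following sketch evaluates the PPMI from three given probabilities. The function name `ppmi` and the choice of the natural logarithm are assumptions made for this example; the text does not fix the base of the logarithm.

```python
import math


def ppmi(p_ab: float, p_a: float, p_b: float) -> float:
    """Positive pointwise mutual information of two events.

    p_ab is the joint probability, p_a and p_b are the marginals.
    Negative PMI values (and the undefined case p_ab == 0) map to 0.
    """
    if p_ab == 0.0:
        return 0.0
    return max(0.0, math.log(p_ab / (p_a * p_b)))


print(ppmi(0.5, 0.5, 0.5))  # events always co-occur: positive association
print(ppmi(0.1, 0.5, 0.5))  # co-occur less often than chance: truncated to 0
```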

4 Pyke

In this section, we introduce our novel KGE approach dubbed Pyke (a physical model for knowledge graph embeddings). Section 4.1 presents the intuition behind our model. In Section 4.2, we give an overview of the Pyke framework, starting from processing the input KG to learning embeddings for the input in a vector space with a predefined number of dimensions. The workflow of our model is further elucidated using the running example shown in Figure 1.

4.1 Intuition

Pyke is an iterative approach that aims to represent each element $x$ of the vocabulary $V$ of an input KG $\mathcal{K}$ as an embedding (i.e., a vector) in the $n$-dimensional space $\mathbb{R}^n$. Our approach begins by assuming that each element of $V$ is mapped to a single point (i.e., its embedding) of unit mass whose location can be expressed via an $n$-dimensional vector in $\mathbb{R}^n$ according to an initial (e.g., random) distribution at iteration $t = 0$. In the following, we will use $\vec{x}_t$ to denote the embedding of $x \in V$ at iteration $t$. We also assume a similarity function $\sigma: V \times V \rightarrow [0, \infty)$ (e.g., a PPMI-based similarity) over $V$ to be given. Simply put, our goal is to improve this initial distribution iteratively over a predefined maximal number of iterations (denoted $T$) by ensuring that

  1. the embeddings of similar elements of $V$ are close to each other while

  2. the embeddings of dissimilar elements of $V$ are distant from each other.

Let $d: \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}^+$ be the distance (e.g., the Euclidean distance) between two embeddings in $\mathbb{R}^n$. According to our goal definition, a good iterative embedding approach should have the following characteristics:

  1. If $\sigma(x, y) > 0$, then $d(\vec{x}_t, \vec{y}_t) \leq d(\vec{x}_{t-1}, \vec{y}_{t-1})$. This means that the embeddings of similar terms should become more similar with the number of iterations. The same holds the other way around:

  2. If $\sigma(x, y) = 0$, then $d(\vec{x}_t, \vec{y}_t) \geq d(\vec{x}_{t-1}, \vec{y}_{t-1})$.

We translate the first characteristic into our model as follows: If $x$ and $y$ are similar (i.e., if $\sigma(x, y) > 0$), then a force $F_a(\vec{x}_t, \vec{y}_t)$ of attraction must exist between the masses which stand for $x$ and $y$ at any time $t$. $F_a(\vec{x}_t, \vec{y}_t)$ must be proportional to $d(\vec{x}_t, \vec{y}_t)$, i.e., the attraction between the two masses must grow with the distance between $\vec{x}_t$ and $\vec{y}_t$. These conditions are fulfilled by setting the following force of attraction between the two masses:

$$\|F_a(\vec{x}_t, \vec{y}_t)\| = \sigma(x, y) \times d(\vec{x}_t, \vec{y}_t) \qquad (4)$$

From the perspective of a physical model, this is equivalent to placing a spring with a spring constant of $\sigma(x, y)$ between the unit masses which stand for $x$ and $y$. At time $t$, these masses are hence accelerated towards each other with a total acceleration proportional to $\|F_a(\vec{x}_t, \vec{y}_t)\|$.

The translation of the second characteristic into a physical model is as follows: If $x$ and $y$ are not similar (i.e., if $\sigma(x, y) = 0$), we assume that they are dissimilar. Correspondingly, their embeddings should diverge with time. The magnitude of the repulsive force between the two masses representing $x$ and $y$ should be strong if the masses are close to each other and should diminish with the distance between the two masses. We can fulfill this condition by setting the following repulsive force between the two masses:

$$\|F_r(\vec{x}_t, \vec{y}_t)\| = \frac{\omega}{d(\vec{x}_t, \vec{y}_t)} \qquad (5)$$

where $\omega$ denotes a constant, which we dub the repulsive constant. At iteration $t$, the embeddings of dissimilar terms are hence accelerated away from each other with a total acceleration proportional to $\|F_r(\vec{x}_t, \vec{y}_t)\|$. This is the inverse of Hooke’s law, where the magnitude of the repulsive force between the mass points which stand for two dissimilar terms decreases with the distance between the two mass points.

Based on these intuitions, we can now formulate the goal of Pyke formally: We aim to find embeddings for all elements of $V$ which minimize the total distance between similar elements and maximize the total distance between dissimilar elements. Let $P: V \rightarrow 2^V$ be a function which maps each element of $V$ to the subset of $V$ it is similar to. Analogously, let $N: V \rightarrow 2^V$ map each element of $V$ to the subset of $V$ it is dissimilar to. Pyke aims to optimize the following objective function:

$$J(V) = \sum_{x \in V} \left( \sum_{y \in P(x)} d(\vec{x}, \vec{y}) - \sum_{y \in N(x)} d(\vec{x}, \vec{y}) \right) \qquad (6)$$

4.2 Approach

Pyke implements the intuition described above as follows: Given an input KG $\mathcal{K}$, Pyke first constructs a symmetric similarity matrix $\mathcal{S}$ of dimensions $|V| \times |V|$. We will use $s_{x,y}$ to denote the similarity coefficient between $x \in V$ and $y \in V$ stored in $\mathcal{S}$. Pyke truncates this matrix to (1) reduce the effect of oversampling and (2) accelerate subsequent computations. The initial embeddings of all $x \in V$ in $\mathbb{R}^n$ are then determined. Subsequently, Pyke uses the physical model described above to improve the embeddings iteratively. The iteration is run at most $T$ times or until the objective function stops decreasing. In the following, we explain each of the steps of the approach in detail. We use the RDF graph shown in Figure 1 as a running example. (This graph is provided as an example in the DL-Learner framework at http://dl-learner.org.)

Figure 1: Example RDF graph
Figure 2: PPMI similarity matrix of resources in the RDF graph shown in Figure 1

4.2.1 Building the similarity matrix.

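For any two elements $x, y \in V$, we set $s_{x,y} = \sigma(x, y) = PPMI(x, y)$ in our current implementation. We compute the probabilities $\Pr(x)$, $\Pr(y)$ and $\Pr(x, y)$ as follows:

$$\Pr(x) = \frac{|\{(s, p, o) \in \mathcal{K} : x \in \{s, p, o\}\}|}{|\{(s, p, o) \in \mathcal{K}\}|} \qquad (7)$$

Similarly,

$$\Pr(y) = \frac{|\{(s, p, o) \in \mathcal{K} : y \in \{s, p, o\}\}|}{|\{(s, p, o) \in \mathcal{K}\}|} \qquad (8)$$

Finally,

$$\Pr(x, y) = \frac{|\{(s, p, o) \in \mathcal{K} : \{x, y\} \subseteq \{s, p, o\}\}|}{|\{(s, p, o) \in \mathcal{K}\}|} \qquad (9)$$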

For our running example (see Figure 1), Pyke constructs the similarity matrix shown in Figure 2. Note that our framework can be combined with any similarity function $\sigma$. Exploring other similarity functions is out of the scope of this paper but will be at the center of future work.
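A minimal sketch of this step is given below. It assumes that two terms co-occur when they appear in the same triple and that probabilities are relative triple frequencies; the function name `ppmi_similarities`, the dictionary-based sparse storage and the toy triples are hypothetical and not taken from the Pyke implementation.

```python
import math
from collections import Counter
from itertools import combinations


def ppmi_similarities(triples):
    """Return {(x, y): PPMI} for all term pairs with a positive PPMI.

    Keys are alphabetically sorted pairs, so each symmetric entry is
    stored once. Pairs with a PPMI of 0 are omitted (sparse storage).
    """
    n_triples = len(triples)
    single = Counter()  # number of triples each term appears in
    joint = Counter()   # number of triples each pair of terms shares
    for triple in triples:
        terms = sorted(set(triple))
        single.update(terms)
        joint.update(combinations(terms, 2))
    sims = {}
    for (x, y), n_xy in joint.items():
        p_xy = n_xy / n_triples
        p_x = single[x] / n_triples
        p_y = single[y] / n_triples
        value = math.log(p_xy / (p_x * p_y))
        if value > 0:
            sims[(x, y)] = value
    return sims


# Hypothetical toy graph, not the running example of Figure 1.
toy_triples = [
    ("alice", "knows", "bob"),
    ("alice", "type", "Person"),
    ("bob", "type", "Person"),
    ("carol", "likes", "tea"),
]
for pair, value in sorted(ppmi_similarities(toy_triples).items()):
    print(pair, round(value, 3))
```

Storing only the positive entries already avoids materializing the full $|V| \times |V|$ matrix in this sketch.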

4.2.2 Computing $P$ and $N$.

To avoid oversampling positive or negative examples, we only use a portion of $\mathcal{S}$ for the subsequent optimization of our objective function. For each $x \in V$, we begin by computing $P(x)$ by selecting the $K$ resources which are most similar to $x$. Note that if fewer than $K$ resources have a non-zero similarity to $x$, then $P(x)$ contains exactly the set of resources with a non-zero similarity to $x$. Thereafter, we randomly sample $K$ elements $y$ of $V$ with $s_{x,y} = 0$. We call this set $N(x)$. For all $y \in N(x)$, we set $s_{x,y}$ to $-\omega$, where $\omega$ is our repulsive constant. The values of $s_{x,y}$ for $y \in P(x)$ are preserved. All other values are set to 0. After carrying out this process for all $x \in V$, each row of $\mathcal{S}$ now contains exactly $2K$ non-zero entries provided that each $x \in V$ has at least $K$ resources with non-zero similarity. Given that $K \ll |V|$, $\mathcal{S}$ is now sparse and can be stored accordingly. (We use $\mathcal{S}$ for the sake of explanation. For practical applications, this step can be implemented using priority queues, hence making the quadratic space complexity for storing $\mathcal{S}$ unnecessary.) The PPMI similarity matrix for our example graph is shown in Figure 2.
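The sampling step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `sample_neighbours`, the callable `sim` and the toy similarity table are hypothetical, and the top-$K$ selection uses the standard-library `heapq.nlargest`.

```python
import heapq
import random


def sample_neighbours(x, vocabulary, sim, k, rng):
    """Return (positives, negatives) for the term x.

    positives: up to k terms with the highest non-zero similarity to x.
    negatives: up to k terms drawn at random among those with similarity 0.
    """
    scored = [(sim(x, y), y) for y in vocabulary if y != x]
    top = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    positives = [y for score, y in top if score > 0]
    candidates = [y for score, y in scored if score == 0]
    negatives = rng.sample(candidates, min(k, len(candidates)))
    return positives, negatives


# Hypothetical symmetric similarity table for a five-term vocabulary.
table = {("a", "b"): 2.0, ("a", "c"): 1.0}


def toy_sim(x, y):
    return table.get((x, y), table.get((y, x), 0.0))


pos, neg = sample_neighbours("a", ["a", "b", "c", "d", "e"], toy_sim, 3,
                             random.Random(0))
print(pos, sorted(neg))
```

In the example, only two terms have a non-zero similarity to `"a"`, so the positive set holds fewer than $K = 3$ elements, matching the special case described above.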

4.2.3 Initializing the embeddings.

Each $x \in V$ is mapped to a single point $\vec{x}_0$ of unit mass in $\mathbb{R}^n$ at iteration $t = 0$. As exploring sophisticated initialization techniques is out of the scope of this paper, the initial vector $\vec{x}_0$ is set randomly. (Preliminary experiments suggest that applying a singular value decomposition on $\mathcal{S}$ and initializing the embeddings with the latent representation of the elements of the vocabulary along the $n$ most salient eigenvectors has the potential of accelerating the convergence of our approach.) Figure 3 shows a 3D projection of the initial embeddings for our running example (with $n = 50$).

4.2.4 Iteration.

This is the crux of our approach. In each iteration $t$, our approach assumes that the elements of $P(x)$ attract $x$ with a total force

$$F_a(\vec{x}_t) = \sum_{y \in P(x)} \sigma(x, y) \times (\vec{y}_t - \vec{x}_t). \qquad (10)$$

On the other hand, the elements of $N(x)$ repulse $x$ with a total force

$$F_r(\vec{x}_t) = -\sum_{y \in N(x)} \frac{\omega}{d(\vec{x}_t, \vec{y}_t)} \cdot \frac{\vec{y}_t - \vec{x}_t}{d(\vec{x}_t, \vec{y}_t)}. \qquad (11)$$

We assume that exactly one unit of time elapses between two iterations. The embedding of $x$ at iteration $t + 1$ can now be calculated by displacing $\vec{x}_t$ proportionally to $F_a(\vec{x}_t) + F_r(\vec{x}_t)$. However, implementing this model directly leads to a chaotic (i.e., non-converging) behavior in most cases. We enforce the convergence using an approach borrowed from simulated annealing, i.e., we reduce the total energy $E$ of the system by a constant factor $\Delta e$ after each iteration. By these means, we can ensure that our approach always terminates, i.e., we can iterate until the objective function does not decrease significantly or until a maximal number of iterations $T$ is reached.

Figure 3: PCA projection of 50-dimensional embeddings for our running example. Left are the randomly initialized embeddings. The figure on the right shows the 50-dimensional Pyke embedding vectors for our running example after convergence. Pyke was configured with , , and .

4.2.5 Implementation.

Algorithm 1 shows the pseudocode of our approach. Pyke updates the embeddings of vocabulary terms iteratively until one of the following two stopping criteria is satisfied: Either the upper bound $T$ on the iterations is met or a lower bound on the total change in the embeddings (i.e., $\sum_{x \in V} \|\vec{x}_t - \vec{x}_{t-1}\|$) is reached. A gradual reduction in the system energy $E$ inherently guarantees the termination of the process of learning embeddings. A 3D projection of the resulting embedding for our running example is shown in Figure 3.

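Require: $T$, $V$, $K$, $\epsilon$, $\Delta e$, $\omega$, $n$
  // initialize embeddings
  for each $x$ in $V$ do
      $\vec{x}_0$ = random vector in $\mathbb{R}^n$;
  end for
  // initialize similarity matrix
  $\mathcal{S}$ = new Matrix[$|V|$][$|V|$];
  for each $x$ in $V$ do
     for each $y$ in $V$ do
         $s_{x,y}$ = $PPMI(x, y)$;
     end for
  end for
  // perform positive and negative sampling
  for each $x$ in $V$ do
      $P(x)$ = getPositives($\mathcal{S}$, $x$, $K$);
      $N(x)$ = getNegatives($\mathcal{S}$, $x$, $K$);
  end for
  // iteration
  $t = 1$;
  $E = 1$;
  while $t < T$ do
     for each $x$ in $V$ do
        $F_a = \sum_{y \in P(x)} \sigma(x, y) \times (\vec{y}_{t-1} - \vec{x}_{t-1})$;
        $F_r = -\sum_{y \in N(x)} \omega \times (\vec{y}_{t-1} - \vec{x}_{t-1}) / d(\vec{x}_{t-1}, \vec{y}_{t-1})^2$;
        $\vec{x}_t = \vec{x}_{t-1} + E \times (F_a + F_r)$;
     end for
     $E = E - \Delta e$;
     if $\sum_{x \in V} \|\vec{x}_t - \vec{x}_{t-1}\| < \epsilon$ then
        break
     end if
     $t = t + 1$;
  end while
  return Embeddings $\vec{x}_t$
Algorithm 1 Pyke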
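The iteration phase of Algorithm 1 can be sketched in plain Python as follows. This is a simplified illustration, not the reference implementation: the function name `pyke_sketch`, all default parameter values and the exact form of the repulsive term (magnitude `omega / distance`, directed away from the dissimilar point) are assumptions made for this example. The positive and negative sets are expected to be precomputed.

```python
import math
import random


def pyke_sketch(vocab, positives, negatives, sim, omega, n_dims=2,
                max_iter=100, energy=1.0, energy_release=0.05,
                epsilon=1e-4, seed=0):
    """Iteratively displace random embeddings under attraction/repulsion."""
    rng = random.Random(seed)
    emb = {x: [rng.uniform(-1, 1) for _ in range(n_dims)] for x in vocab}
    for _ in range(max_iter):
        new_emb = {}
        total_change = 0.0
        for x in vocab:
            force = [0.0] * n_dims
            # Attraction: spring with constant sim(x, y) towards each positive.
            for y in positives[x]:
                for i in range(n_dims):
                    force[i] += sim(x, y) * (emb[y][i] - emb[x][i])
            # Repulsion: magnitude omega / distance, pointing away from y.
            for y in negatives[x]:
                dist = math.dist(emb[x], emb[y]) or 1e-9
                for i in range(n_dims):
                    force[i] -= omega * (emb[y][i] - emb[x][i]) / dist ** 2
            new_emb[x] = [emb[x][i] + energy * force[i] for i in range(n_dims)]
            total_change += math.dist(emb[x], new_emb[x])
        emb = new_emb
        # Annealing step: release a constant amount of energy per iteration.
        energy = max(0.0, energy - energy_release)
        if total_change < epsilon or energy == 0.0:
            break
    return emb


# Two mutually similar terms and one term that is dissimilar to both.
vocab = ["a", "b", "c"]
pos = {"a": ["b"], "b": ["a"], "c": []}
neg = {"a": ["c"], "b": ["c"], "c": ["a", "b"]}
print(pyke_sketch(vocab, pos, neg, lambda x, y: 0.25, omega=0.01))
```

All positions of an iteration are computed from the previous iteration's embeddings, which mirrors the use of $\vec{x}_{t-1}$ and $\vec{x}_t$ in the pseudocode.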

5 Complexity Analysis

5.1 Space complexity

Let $m = |V|$. We would need at most $\frac{m(m-1)}{2}$ entries to store $\mathcal{S}$, as the matrix is symmetric and we do not need to store its diagonal. However, there is actually no need to store $\mathcal{S}$. We can implement $P(x)$ as a priority queue of size $K$ in which the indexes of the $K$ elements of $V$ most similar to $x$ as well as their similarity to $x$ are stored. $N(x)$ can be implemented as a buffer of size $K$ which contains only indexes. Once $N(x)$ reaches its maximal size $K$, then new entries (i.e., $y$ with $PPMI(x, y) = 0$) are added randomly. Hence, we need $O(Km)$ space to store both $P$ and $N$. Note that $K \ll m$. The embeddings require exactly $2mn$ space as we store $\vec{x}_t$ and $\vec{x}_{t-1}$ for each $x \in V$. The force vectors $F_a$ and $F_r$ each require a space of $n$. Hence, the space complexity of Pyke lies clearly in $O(mn + Km)$ and is hence linear w.r.t. the size of the input knowledge graph when the number $n$ of dimensions of the embeddings and the number $K$ of positive and negative examples are fixed.

5.2 Time complexity

Initializing the embeddings requires $mn$ operations. The initialization of $P$ and $N$ can also be carried out in linear time. Adding an element to $P(x)$ and $N(x)$ is carried out at most $m$ times. For each $x$, the addition of an element to $P(x)$ has a runtime of at most $O(K)$. Adding elements to $N(x)$ is carried out in constant time, given that the addition is random. Hence the computation of $P(x)$ and $N(x)$ can be carried out in linear time w.r.t. $m$. This computation is carried out $m$ times, i.e., once for each $x$. Hence, the overall runtime of the initialization for Pyke lies in $O(m^2)$. Importantly, the update of the position of each $x$ can be carried out in $O(K)$, leading to each iteration having a time complexity of $O(mK)$. The total runtime complexity for the iterations is hence $O(mKT)$, which is linear in $m$. This result is of central importance for our subsequent empirical results, as the iterations make up the bulk of Pyke’s runtime. Hence, Pyke’s runtime should be close to linear in real settings.

6 Evaluation

6.1 Experimental Setup

The goal of our evaluation was to compare the quality of the embeddings generated by Pyke with the state of the art. Given that there is no intrinsic measure for the quality of embeddings, we used two extrinsic evaluation scenarios. In the first scenario, we measured the type homogeneity of the embeddings generated by the KGE approaches we considered. We achieved this goal by using a scalable approximation of DBSCAN dubbed HDBSCAN [4]. In our second evaluation scenario, we compared the performance of Pyke on the type prediction task against that of 6 state-of-the-art algorithms. In both scenarios, we only considered embeddings of the subset $S$ of $V$ as done in previous works [10, 17]. We kept the hyperparameters of Pyke fixed throughout our experiments. Their values were computed using a Sobol Sequence optimizer [16]. All experiments were carried out on a single core of a server running Ubuntu 18.04 with GB RAM with 16 Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz processors.

We used six datasets (2 real, 4 synthetic) throughout our experiments. An overview of the datasets used in our experiments is shown in Table 2. Drugbank (download.bio2rdf.org/#/release/4/drugbank) is a small-scale KG, whilst the DBpedia (version 2016-10) dataset is a large cross-domain dataset. Note that we compiled the DBpedia dataset by merging the dumps of mapping-based objects, skos categories and instance types provided in the DBpedia download folder for version 2016-10 at downloads.dbpedia.org/2016-10. The four synthetic datasets were generated using the LUBM generator [5] with 100, 200, 500 and 1000 universities.

Dataset     Triples      Vocabulary   Typed subjects   Classes
Drugbank    3,146,309    521,428      421,121          102
DBpedia     27,744,412   7,631,777    6,401,519        423
LUBM100     9,425,190    2,179,793    2,179,766        14
LUBM200     18,770,356   4,341,336    4,341,309        14
LUBM500     46,922,188   10,847,210   10,847,183       14
LUBM1000    93,927,191   21,715,108   21,715,081       14
Table 2: Overview of RDF datasets used in our experiments

We evaluated the homogeneity of embeddings by measuring the purity [9] of the clusters generated by HDBSCAN [4]. The original cluster purity equation assumes that each element of a cluster is mapped to exactly one class [9]. Given that a single resource can have several types in a knowledge graph (e.g., BarackObama is a person, a politician, an author and a president in DBpedia), we extended the cluster purity equation as follows: Let $C$ be the set of all classes found in $\mathcal{K}$. Each $x \in S$ was mapped to a binary type vector $t(x)$ of length $|C|$. The $i$th entry of $t(x)$ was 1 iff $x$ was of type $c_i \in C$. In all other cases, it was set to 0. Based on these premises, we computed the purity of a clustering as follows:

(12)

where the clusters are the ones computed by HDBSCAN. A high purity means that resources with similar type vectors (e.g., presidents who are also authors) are located close to each other in the embedding space, which is a desirable characteristic of a KGE.

In our second evaluation, we performed a type prediction experiment in a manner akin to [10, 17]. For each resource $x \in S$, we used the $\mu$ closest embeddings of $x$ to predict $x$’s type vector. We then compared the average of the types predicted with $x$’s known type vector using the cosine similarity:

$$\text{prediction score} = \frac{1}{|S|} \sum_{x \in S} \cos\left(t(x), \frac{1}{\mu} \sum_{y \in \mu nn(x)} t(y)\right) \qquad (13)$$

where $\mu nn(x)$ stands for the $\mu$ nearest neighbors of $x$. We employed $\mu \in \{1, 3, 5, 10, 15, 30, 50, 100\}$ in our experiments.
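The type prediction score described above can be sketched as follows. The function names `cosine` and `type_prediction_score`, the brute-force neighbour search and the toy data are assumptions made for illustration; the evaluation code used for the experiments may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if one of them is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def type_prediction_score(embeddings, type_vectors, n_neighbours):
    """Mean cosine between known and neighbour-averaged type vectors.

    Requires at least two resources so that every resource has a neighbour.
    """
    scores = []
    for x, vec in embeddings.items():
        others = sorted((y for y in embeddings if y != x),
                        key=lambda y: math.dist(vec, embeddings[y]))
        nearest = others[:n_neighbours]
        dims = len(type_vectors[x])
        predicted = [sum(type_vectors[y][i] for y in nearest) / len(nearest)
                     for i in range(dims)]
        scores.append(cosine(type_vectors[x], predicted))
    return sum(scores) / len(scores)


# Hypothetical data: two tight groups, each with its own type.
toy_embeddings = {"a": (0, 0), "b": (0, 1), "c": (10, 10), "d": (10, 11)}
toy_types = {"a": [1, 0], "b": [1, 0], "c": [0, 1], "d": [0, 1]}
print(type_prediction_score(toy_embeddings, toy_types, 1))
print(type_prediction_score(toy_embeddings, toy_types, 3))
```

With a single neighbour, every resource in the toy data is predicted from a resource of the same type. With three neighbours, resources of the other type enter the average and the score drops.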

Preliminary experiments showed that performing the cluster purity and type prediction evaluations on embeddings of large knowledge graphs is prohibited by the long runtimes of the clustering algorithm. For instance, HDBSCAN did not terminate in 20 hours of computation when . Consequently, we had to apply HDBSCAN on embeddings on the subset of on DBpedia which contained resources of type Person or Settlement. The resulting subset of on DBpedia consists of RDF resources. For the type prediction task, we sampled resources from according to a random distribution and fixed them across the type prediction experiments for all KGE models.

6.2 Results

6.2.1 Cluster Purity Results.

Table 3 displays the cluster purity results for all competing approaches. Pyke achieves a cluster purity of 0.75 on Drugbank and clearly outperforms all other approaches. DBpedia turned out to be a more difficult dataset. Still, Pyke was able to outperform all state-of-the-art approaches by between 11% and 26% (absolute) on Drugbank and between 9% and 23% (absolute) on DBpedia. Note that in 3 cases, the implementations available were unable to complete the computation of embeddings within 24 hours.

Approach Drugbank DBpedia
Pyke 0.75 0.57
Word2Vec 0.43 0.37
ComplEx 0.64 *
RESCAL * *
TransE 0.60 0.48
CP 0.49 0.41
DistMult 0.49 0.34
Table 3: Cluster purity results. The best results are marked in bold. Experiments marked with * did not terminate after 24 hours of computation.

6.2.2 Type Prediction Results.

Figure 4 and Figure 5 show our type prediction results on the Drugbank and DBpedia datasets. Pyke outperforms all state-of-the-art approaches across all experiments. In particular, it achieves a margin of up to 22% (absolute) on Drugbank and 23% (absolute) on DBpedia. Like in the previous experiment, all KGE approaches perform worse on DBpedia, with prediction scores varying between and .

Figure 4: Mean results on type prediction scores on randomly sampled entities of DBpedia
Figure 5: Mean of type prediction scores on all entities of Drugbank

6.2.3 Runtime Results.

Table 5 shows the runtime performances of all models on the two real benchmark datasets, while Figure 6 displays the runtime of Pyke on the synthetic LUBM datasets. Our results support our original hypothesis. The low space and time complexities of Pyke mean that it runs efficiently: Our approach achieves runtimes of only 25 minutes on Drugbank and 309 minutes on DBpedia, while outperforming all other approaches by up to 14 hours in runtime.

In addition to evaluating the runtime of Pyke on synthetic data, we were interested in determining its behaviour on datasets of growing sizes. We used LUBM datasets and computed a linear regression of the runtime using ordinary least squares (OLS). The runtime results for this experiment are shown in Figure 6. The linear fit shown in Table 4 achieves $R^2$ values beyond 0.99, which points to a clear linear fit between Pyke’s runtime and the size of the input dataset.
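For readers who wish to reproduce this kind of analysis, a simple OLS fit with its coefficient of determination can be sketched as follows. The function name `ols_fit` is hypothetical, and the runtimes in the example are placeholder values chosen to lie on a straight line. They are not measurements from our experiments.

```python
def ols_fit(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# x: number of LUBM universities; y: placeholder runtimes (not measured).
universities = [100, 200, 500, 1000]
placeholder_runtimes = [60.0, 110.0, 260.0, 510.0]
print(ols_fit(universities, placeholder_runtimes))
```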

Figure 6: Runtime performances of Pyke on synthetic KGs. Colored lines represent fitted linear regressions with fixed $K$ values of Pyke.

K     Coefficient   Intercept   $R^2$
5     4.52          10.74       0.997
10    4.65          13.64       0.996
20    5.23          19.59       0.997
Table 4: Results of fitting OLS on runtimes.
Approach    Drugbank    DBpedia
Pyke        25 ± 1      309 ± 1
Word2Vec    41          420
ComplEx     705 ± 1     *
RESCAL      *           *
TransE      68 ± 1      685 ± 1
CP          230 ± 1     1154 ± 1
DistMult    210 ± 1     1030 ± 1
Table 5: Runtime performances (in minutes) of all competing approaches. All approaches were executed three times on each dataset. The reported results are the mean and standard deviation of the last two runs. The best results are marked in bold. Experiments marked with * did not terminate after 24 hours of computation.

We believe that the good performance of Pyke stems from (1) its sampling procedure and (2) its being akin to a physical simulation. Employing PPMI to quantify the similarity between resources seems to yield better sampling results than generating negative examples using the local closed world assumption that underlies the sampling procedures of all competing state-of-the-art KG models. More importantly, positive and negative sampling occur in our approach per resource rather than per RDF triple. Therefore, Pyke is able to gain more from negative and positive sampling. By virtue of being akin to a physical simulation, Pyke is able to run efficiently even when each resource is mapped to 45 attractive and 45 repulsive resources (see Table 5), whilst all state-of-the-art KGE approaches required more computation time.

7 Conclusion

We presented Pyke, a novel approach for the computation of embeddings on knowledge graphs. By virtue of being akin to a physical simulation, Pyke retains a linear space complexity. This was proven through a complexity analysis of our approach. While the time complexity of the approach is quadratic due to the computation of $P$ and $N$, all other steps are linear in their runtime complexity. Hence, we expected our approach to behave close to linearly. Our evaluation on LUBM datasets suggests that this is indeed the case and the runtime of our approach grows close to linearly. This is an important result, as it means that our approach can be used on very large knowledge graphs and return results faster than popular algorithms such as Word2Vec and TransE. However, time efficiency is not all. Our results suggest that Pyke outperforms state-of-the-art approaches in the two tasks of type prediction and clustering. Still, there is clearly a lack of normalized evaluation scenarios for knowledge graph embedding approaches. We shall hence develop such benchmarks in future work. Our results open a plethora of other research avenues. First, the current approach to computing the similarity between entities/relations on KGs is based on the local similarity. Exploring other similarity measures will be at the center of future work. In addition, using a better initialization for the embeddings should lead to faster convergence. Finally, one could use a stochastic approach (in the same vein as stochastic gradient descent) to further improve the runtime of Pyke.

References

  • [1] A. Bordes, X. Glorot, J. Weston, and Y. Bengio (2014) A semantic matching energy function for learning with multi-relational data. Machine Learning.
  • [2] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko (2013) Translating embeddings for modeling multi-relational data.
  • [3] A. Bordes, J. Weston, R. Collobert, and Y. Bengio (2011) Learning structured embeddings of knowledge bases. In Twenty-Fifth AAAI Conference on Artificial Intelligence.
  • [4] R. J. Campello, D. Moulavi, and J. Sander (2013) Density-based clustering based on hierarchical density estimates. In Pacific-Asia conference on knowledge discovery and data mining.
  • [5] Y. Guo, Z. Pan, and J. Heflin (2005) LUBM: a benchmark for owl knowledge base systems. Web Semantics: Science, Services and Agents on the World Wide Web 3 (2-3), pp. 158–182.
  • [6] F. L. Hitchcock (1927) The expression of a tensor or a polyadic as a sum of products. Journal of Mathematics and Physics 6 (1-4), pp. 164–189.
  • [7] X. Huang, J. Zhang, D. Li, and P. Li (2019) Knowledge graph embedding based question answering. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining.
  • [8] Y. Lin, Z. Liu, M. Sun, Y. Liu, and X. Zhu (2015) Learning entity and relation embeddings for knowledge graph completion. In Twenty-ninth AAAI conference on artificial intelligence.
  • [9] C. Manning, P. Raghavan, and H. Schütze (2010) Introduction to information retrieval. Natural Language Engineering.
  • [10] A. Melo, H. Paulheim, and J. Völker (2016) Type prediction in rdf knowledge bases using hierarchical multilabel classification. In Proceedings of the 6th International Conference on Web Intelligence, Mining and Semantics, pp. 14.
  • [11] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013) Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems.
  • [12] M. Nickel, L. Rosasco, and T. Poggio (2016) Holographic embeddings of knowledge graphs. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16, pp. 1955–1961.
  • [13] M. Nickel, V. Tresp, and H. Kriegel (2011) A three-way model for collective learning on multi-relational data. In ICML, Vol. 11.
  • [14] J. Pennington, R. Socher, and C. Manning (2014) Glove: global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP).
  • [15] P. Ristoski and H. Paulheim (2016) RDF2Vec: rdf graph embeddings for data mining. In International Semantic Web Conference.
  • [16] A. Saltelli, P. Annoni, I. Azzini, F. Campolongo, M. Ratto, and S. Tarantola (2010) Variance based sensitivity analysis of model output. design and estimator for the total sensitivity index. Computer Physics Communications 181 (2), pp. 259–270.
  • [17] S. Thoma, A. Rettinger, and F. Both (2017) Towards holistic concept representations: embedding relational knowledge, visual attributes, and distributional word semantics. In International Semantic Web Conference.
  • [18] T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, and G. Bouchard (2016) Complex embeddings for simple link prediction. In International Conference on Machine Learning.
  • [19] Q. Wang, Z. Mao, B. Wang, and L. Guo (2017) Knowledge graph embedding: a survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering.
  • [20] X. Wang, P. Cui, J. Wang, J. Pei, W. Zhu, and S. Yang (2017) Community preserving network embedding. In AAAI.
  • [21] R. Xie, Z. Liu, J. Jia, H. Luan, and M. Sun (2016) Representation learning of knowledge graphs with entity descriptions. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16, pp. 2659–2665.
  • [22] B. Yang, W. Yih, X. He, J. Gao, and L. Deng (2014) Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575.