1 Introduction
Link prediction is, in general, the problem of finding missing or unknown links among interconnected entities. It assumes that entities and links can be represented as a graph, where entities are nodes and links are edges (for symmetric relationships) or arcs (for asymmetric ones). The problem was most probably first defined in the social network analysis community Liben-Nowell and Kleinberg (2003); however, it soon became important in other domains, in particular for large-scale knowledge bases Nickel et al. (2016), where it is used to add missing data and discover new facts. When we deal with link prediction for knowledge bases, the semantic information they contain is usually encoded as a knowledge graph (KG) Sri Nurdiati and Hoede (2008). For the purpose of this manuscript, we treat a knowledge graph as a graph whose links may have different types, and we conform to the closed-world assumption. This means that all existing (asserted) links are considered positive, and all links that are unknown, and obtained via knowledge graph completion, are considered negative (Figure 1).
This separation into positive and negative links (examples) naturally allows us to treat link prediction as a supervised classification problem with binary predictors. However, while this separation makes a wealth of available and well-studied machine learning algorithms applicable to link prediction, the main challenge is how to find the best representations for links. This is the core subject of the recent research trend in learning suitable representations for knowledge graphs, largely dominated by so-called neural embeddings (initially introduced for language modelling Mikolov et al. (2013)). Neural embeddings are numeric representations of the nodes and/or relations of a knowledge graph in a continuous, dense vector space. These embeddings are learned with neural networks by optimizing a specific objective function. Usually, the objective function models the constraints that neighboring nodes are embedded close to each other, while nodes that are not directly connected, or are separated by long paths in the graph, are embedded far apart. A link in a knowledge graph is then represented as a combination of node and/or relation type embeddings Nickel et al. (2016); Goyal and Ferrara (2018).
1.1 Benchmarking accuracy and robustness of neural embeddings
There are two major ways of measuring the accuracy of entity embeddings in a knowledge graph for link prediction tasks, inspired by two different fields: information retrieval Bordes et al. (2013); Yang et al. (2015); Trouillon et al. (2016); Nickel et al. (2016); Kadlec et al. (2017); Wu et al. (2017); Nickel and Kiela (2017) and graph-based data mining Perozzi et al. (2014); Grover and Leskovec (2016); Garcia-Gasulla et al. (2016); Chamberlain et al. (2017); Alshahrani et al. (2017); Agibetov and Samwald (2018). Approaches inspired by information retrieval seem to favor node-centric evaluations, which measure how well the embeddings reconstruct the immediate neighbourhood of each node in the graph; these evaluations are based on the mean rank and its variants (mean average precision, top-k results, mean reciprocal rank). Graph-based data mining approaches tend to measure link equality by resorting to evaluation measures such as ROC AUC and F-measure. See Crichton et al. (2018) for more discussion of node- and link-equality measures.
Besides evaluating accuracy, some works have focused on issues that might hinder the appropriate evaluation of embeddings. For instance, there is the issue of imbalanced classes (many more negatives than positives) when link prediction in graphs is treated as a classification problem Garcia-Gasulla et al. (2016). In the bioinformatics community, the problem of imbalanced classes can be circumvented by considering only negative links that have a biological meaning, thus discarding many potential negative links that are highly improbable biologically Alshahrani et al. (2017). Other works have demonstrated that, if no care is taken while splitting the datasets, we might end up producing biased train and test examples, such that implicit information from the test set may leak into the train set Toutanova et al. (2016); Dettmers et al. (2017). Kadlec et al. (2017) have pointed out that fair optimization of hyperparameters for competing approaches should be considered, as some of the reported KG completion results are significantly lower than what they potentially could be. In the life sciences domain, time-sliced graphs have been proposed as generators of train and test examples, offering a more realistic evaluation benchmark Crichton et al. (2018) than randomly generated slices of graphs.
In addition to reference implementations accompanying scientific papers that propose novel embedding methodologies, there is a wealth of open-source initiatives that provide a one-stop solution for efficient training of knowledge graph embeddings. For instance, pykg2vec Yu et al. (2019) and PyKEEN Ali et al. (2019) implement many of the state-of-the-art KG embedding techniques, with a focus on reproducibility and efficiency. While the community has many options for efficient KG embedding implementations, we believe that less attention has been paid to evaluating neural embeddings when knowledge graphs exhibit structural changes. In this work we aim to narrow this gap. Our work is closest in spirit to Goyal et al. (2019), an evaluation benchmark for graph embeddings that tries to explain which intrinsic properties of graphs make certain embedding models perform better. Unlike us, the authors consider knowledge graphs with only one type of relation.
The rest of this manuscript is organized as follows: in the Methods section (Section 2) we define the notation, present the knowledge graphs we used to evaluate our approach, and formalize the evaluation pipeline. Then, in Section 3, we introduce connectivity descriptors that allow us to capture structural change in knowledge graphs. Sections 4 and 5 report our experiments and analysis. Finally, we conclude in Section 6.
1.2 Contributions of this work
In our work we define semantic similarity descriptors for knowledge graphs, and correlate the performance of neural embeddings with these descriptors. The main takeaway is that the overall accuracy of knowledge graph embeddings can be improved by adding more instances of semantically related relations. This means that access to partially redundant information (triples for an inferred relation, or for a semantically related relation) may improve overall accuracy. Moreover, by using our benchmark, we can perform a more fine-tuned error analysis by examining which specific types of links pose the most problems for overall link prediction. Finally, by examining the correlation of accuracy scores with the semantic similarity descriptors, we can explain the performance of neural embeddings, and predict their performance by simulating modifications to the knowledge graphs.
2 Methods
2.1 Notation and terminology
Throughout this manuscript we employ the tripleoriented treatment of knowledge graphs. As such, a knowledge graph is simply a set of triples , where are some entities, and are its relation types, or simply relations. We assume that entities and relation types are disjoint sets, i.e., . Let denote all the existing triples in the , i.e., triples , and let denote the nonexisting triples via the knowledge graph completion (, ). Similarly, and denote the existing and nonexisting triples involving a relation , respectively. Obviously, in every triple of or the relation type is fixed to . For each relation type , indicate the entities that belong to the domain and range of a relation . To describe the process of sampling some triples, we use the notation , where is any set of triples. For instance, is a sampled set of triples involving , and consisting of of triples from . Occasionally, when we write we refer to a set of triples where are fixed, and the relation type is free.
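The triple-oriented view and the closed-world treatment of negatives can be illustrated with a minimal Python sketch. The toy entities and relations, and the helper names `entities`, `positives`, `domain_and_range` and `negatives`, are assumptions made for illustration only and are not taken from the original pipeline.

```python
from itertools import product

# Toy knowledge graph as a set of (subject, relation, object) triples.
# All names below are made up for illustration.
kg = {
    ("aspirin", "treats", "headache"),
    ("aspirin", "interacts_with", "warfarin"),
    ("ibuprofen", "treats", "headache"),
}

def entities(triples):
    """All entities appearing as subject or object."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}

def positives(triples, relation):
    """Existing (asserted) triples for one relation type."""
    return {t for t in triples if t[1] == relation}

def domain_and_range(triples, relation):
    """Entities seen as subjects (domain) and objects (range) of a relation."""
    pos = positives(triples, relation)
    return {s for s, _, _ in pos}, {o for _, _, o in pos}

def negatives(triples, relation, restrict_to_domain_range=True):
    """Closed-world negatives: candidate pairs not asserted for the relation."""
    if restrict_to_domain_range:
        dom, rng = domain_and_range(triples, relation)
    else:
        dom = rng = entities(triples)
    candidates = {(s, relation, o) for s, o in product(dom, rng)}
    return candidates - positives(triples, relation)
```

In this toy graph every domain/range pair of `treats` is already asserted, so the restricted negative set is empty, while the unrestricted variant draws negatives from all entity pairs.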
2.2 Knowledge graphs
We run our experiments on four different knowledge graphs: WN11 Toutanova et al. (2016) (a subset of the original WordNet dataset Miller (1995), brought down to 11 types of relations and without inverse relation assertions), FB15k-237 Toutanova et al. (2016) (a subset of the Freebase knowledge graph Bollacker et al. (2008) where inverse relations have been removed), UMLS (a subset of the Unified Medical Language System Bodenreider (2004) semantic network) and BIOKG (a comprehensive biological knowledge graph Alshahrani et al. (2017)). WN11, FB15k-237 and UMLS were downloaded (December 2017) from the ConvE Dettmers et al. (2017) GitHub repository (https://github.com/TimDettmers/ConvE), and BIOKG was downloaded (September 2017) from the official link indicated in the supplementary material for Alshahrani et al. (2017) (http://aberowl.net/aberowl/bio2vec/bioknowledgegraph.n3). Details on the derivation of the subsets of the WordNet and Freebase knowledge graphs can be found in Toutanova et al. (2016); Dettmers et al. (2017).
2.3 Training and evaluating neural embeddings
Our work builds upon the earlier proposed framework Alshahrani et al. (2017) to both learn and evaluate neural embeddings for the knowledge graphs, which we extend to make it more scalable. Throughout this manuscript we refer to the original framework as specialized embeddings approach. In a nutshell this approach learns and evaluates specialized embeddings for each relation type of entities of as follows: a) we generate the retained graph where we delete some of the triples involving , then b) we compute the embeddings of the entities on this resulting retained graph, finally, c) we assess the quality of these specialized embeddings on relation by training and testing binary predictors on positive and negative triples involving . These three steps are detailed in Figure 2 in the specialized embeddings box. The arrows labelled with “a” in Figure 2 symbolize the generation of retained graphs for relations and
, those marked with “b” computation of the entity embeddings, and “c” represents the training and testing binary classifiers for each relation type.
The drawback of the specialized embeddings approach is that we need to compute entity embeddings for each relation type separately, which is a serious scalability issue when the number of relation types in the knowledge graph grows large. To circumvent this issue, we propose to train generalized neural embeddings for all relation types at once, as opposed to training specialized embeddings for a specific relation (Figure 2, generalized embeddings box). Specifically, we generate only one retained graph, where we delete a fraction of the triples of each relation type (arrow marked with “a” at the bottom of Figure 2). This retained graph is then used as a corpus for the computation of entity embeddings (“b”), which are then assessed with binary predictors for each relation type as in the specialized case (arrows marked with “c” at the bottom of Figure 2). This approach is more scalable and economical, since we only compute and keep one set of entity embeddings per knowledge graph.
In what follows we formalize the pipeline for link prediction with specialized and generalized neural embeddings, and we give a thorough description of steps “a”, “b” and “c” (Figure 2).
2.3.1 Generation of retained graphs (step a)
By treating the problem of evaluation of the quality of the embeddings in a settheoretic approach, we can define the following datasets:

a specialized retained graph on a given relation: the training corpus for unsupervised learning of entity embeddings local to that relation (in Figure 3 this set is demarcated with a bold contour in the upper left corner),
a generalized retained graph on all relations: the training corpus for unsupervised learning of global entity embeddings,
train examples for the binary classifier for a given relation,
test examples for the binary classifier for that relation.
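The difference between the specialized and generalized retained graphs can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the `keep_fraction` parameter and the uniform random deletion are hypothetical, and how the held-out triples are further divided into classifier train and test examples is deliberately left out.

```python
import random

def split_relation(triples, relation, keep_fraction, rng):
    """Randomly split the triples of one relation into kept and held-out parts."""
    rel = sorted(t for t in triples if t[1] == relation)
    rng.shuffle(rel)
    n_keep = int(round(keep_fraction * len(rel)))
    return set(rel[:n_keep]), set(rel[n_keep:])

def specialized_retained_graph(triples, relation, keep_fraction, rng):
    """Delete triples of a single relation; keep all other relations intact."""
    kept, held_out = split_relation(triples, relation, keep_fraction, rng)
    others = {t for t in triples if t[1] != relation}
    return others | kept, held_out

def generalized_retained_graph(triples, keep_fraction, rng):
    """Delete the same fraction of triples from every relation at once."""
    retained, held_out = set(), {}
    for relation in sorted({r for _, r, _ in triples}):
        kept, dropped = split_relation(triples, relation, keep_fraction, rng)
        retained |= kept
        held_out[relation] = dropped
    return retained, held_out

# Toy graph (made up): 10 triples of relation "r" and 5 of relation "q".
toy = {(f"e{i}", "r", f"e{i + 1}") for i in range(10)}
toy |= {(f"e{i}", "q", f"e{i + 2}") for i in range(5)}

spec_graph, spec_held = specialized_retained_graph(toy, "r", 0.8, random.Random(0))
gen_graph, gen_held = generalized_retained_graph(toy, 0.8, random.Random(0))
```

The specialized variant has to be rebuilt (and embedded) once per relation, whereas the generalized variant yields a single corpus for all relations.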
2.3.2 Neural embedding model (step b)
In this work we employ a shallow unsupervised neural embedding model Agibetov and Samwald (2018), which aims at learning entity embeddings in a dense vector space of fixed dimension. The model is simple and fast: it embeds the entities that appear in positive triples close to each other, and places the entities that appear in negative triples farther apart. As in many neural embedding approaches, the weight matrix of the hidden layer of the neural network serves as the lookup matrix (the matrix of embeddings, or latent vectors). The neural network is trained by minimizing a loss function over each positive triple in the specialized or generalized retained graph.
For each positive triple, we embed its two entities close to each other, while keeping them as far as possible from the negative entities. The similarity function sim is task-dependent and should operate on the vector representations of the entities (e.g., the standard Euclidean dot product). The loss function is a softmax that compares the positive example to all the negative examples.
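A softmax loss of this kind can be sketched in a few lines of Python. This is a generic softmax-over-negatives formulation with a dot-product similarity, written under the assumption that one positive object is contrasted with a list of sampled negative entities; it is not claimed to be the exact formula used by the authors or by any particular toolkit, and the function names are hypothetical.

```python
import math

def dot(u, v):
    """Euclidean dot product, used here as the similarity function."""
    return sum(a * b for a, b in zip(u, v))

def softmax_loss(subj, pos_obj, neg_objs, sim=dot):
    """Negative log-probability of the positive object among all candidates."""
    scores = [sim(subj, pos_obj)] + [sim(subj, n) for n in neg_objs]
    m = max(scores)  # subtract the maximum for numerical stability
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_norm - scores[0]

# Toy 2-dimensional embeddings (made up for illustration).
loss_far_negative = softmax_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_near_negative = softmax_loss([1.0, 0.0], [1.0, 0.0], [[0.9, 0.0]])
```

The loss is small when the positive pair is more similar than every negative, and grows as negatives approach the subject embedding, which is the behavior the surrounding text describes.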
2.3.3 Link prediction evaluation with binary classifiers (step c)
To quantify confidence in the trained embeddings, we perform repeated random subsampling validation for each classifier. That is, for each relation we repeatedly generate a retained graph (the corpus for unsupervised learning of entity embeddings) together with train and test splits of positive and negative examples. Link prediction is then treated as a binary classification task, with a classifier applied to the output of a binary operator that combines two entity embeddings into a single representation of the link. The performance of the classifier is measured with standard performance measures (e.g., F-measure, ROC AUC).
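The evaluation loop can be sketched as follows, assuming concatenation as the binary operator and a pluggable classifier. The names `concat`, `f1_score` and `repeated_subsampling`, the `fit` callback, and the 20% test fraction are hypothetical choices for this sketch; in practice the classifier would come from a machine learning library rather than the trivial stand-in shown here.

```python
import random
import statistics

def concat(u, v):
    """Binary operator: represent a link by concatenating endpoint embeddings."""
    return list(u) + list(v)

def f1_score(y_true, y_pred):
    """F-measure for binary labels (1 = positive link, 0 = negative link)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def repeated_subsampling(examples, fit, n_runs=10, test_fraction=0.2, seed=0):
    """Repeated random subsampling validation.

    examples: list of (feature_vector, label) pairs.
    fit: function taking the train examples and returning a predict function.
    Returns the mean and standard deviation of F1 over the runs (n_runs >= 2).
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        shuffled = examples[:]
        rng.shuffle(shuffled)
        n_test = max(1, int(test_fraction * len(shuffled)))
        test, train = shuffled[:n_test], shuffled[n_test:]
        predict = fit(train)
        y_true = [y for _, y in test]
        y_pred = [predict(x) for x, _ in test]
        scores.append(f1_score(y_true, y_pred))
    return statistics.mean(scores), statistics.stdev(scores)
```

Reporting both the mean and the standard deviation over the runs mirrors how results are summarized later in the manuscript.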
2.4 Evaluation benchmark summary and implementation
The whole evaluation pipeline is summarized in Algorithm 1. In our experiments the specialized and generalized neural embeddings are trained with the StarSpace toolkit Wu et al. (2017) in train mode 1 (see the StarSpace specification) with fixed hyperparameters: a fixed embedding size, 10 epochs, and all other parameters set to default. Classification results are obtained with the scikit-learn Python library Pedregosa et al. (2011); statistical analysis is performed with Pandas McKinney (2010). Our experiments were performed on a high-performance cluster, with modern PCs controlling multiple NVIDIA GPUs (GTX1080 and GTX1080TI). To demonstrate the high flexibility of our pipeline, we also consider knowledge graph embeddings provided by the state-of-the-art DistMult Yang et al. (2015) and ComplEx Trouillon et al. (2016) models. For our experiments, both of these models were implemented in PyTorch (v1.2).
3 Structure of knowledge graphs and their change
In this section we introduce a few descriptors that are necessary to capture the variability and change in the structure of knowledge graphs. In addition to standard descriptors that describe the structure of knowledge graphs syntactically (number of entities, relations and triples), we define descriptors that measure the positive to negative ratio for each relation, and the semantic similarity of relations in the knowledge graph. These descriptors will be then used to evaluate and explain the performance of neural embeddings.
The variation in syntactic structure of the four graphs is summarized in Table 1 under the label Global. By analyzing these global descriptors (numbers of entities, relations and triples), we see that we have one small knowledge graph (UMLS), two medium-sized graphs (WN11, FB15k-237) and one very large biological graph (BIOKG). In what follows we define the descriptors that are used in the rest of Table 1.
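Graph | Semantic: shared entities | Semantic: shared instances | Pos/Neg ratio: mean % (domain/range pairs) | Pos/Neg ratio: mean % (all entity pairs) | Global: entities | Global: relations | Global: triples
WN11 | 0.81 | <0.01 | 0.68 | 5e-4 | 40,943 | 11 | 93,003
FB15k-237 | 19.09 | 3.84 | 17.33 | 6e-4 | 14,541 | 237 | 310,116
BIOKG | 1.31 | 0 | 0.73 | 1e-4 | 346,225 | 9 | 1,619,239
UMLS | 9.46 | 2.31 | 60 | 7e-1 | 137 | 46 | 6,527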
3.1 Positive to negative ratio descriptors
To measure the ratio of positive to negative examples in a knowledge graph, for a fixed relation , we use the descriptors and , defined as follows:
(1)  
(2) 
For an induced graph, consisting of all the triples of a knowledge graph that involve a given relation, both descriptors measure how close that graph is to being complete (fully connected). Intuitively, if a graph is only half complete (Figure 4, left), then we could potentially generate as many negatives as positives. However, if the graph is complete (all entities are connected, Figure 4, right), then no negative links will be generated. In the first descriptor we restrict the space of negative examples to semantically plausible links, i.e., we only consider unconnected pairs from the domain and range of the relation. The second descriptor relaxes this restriction, i.e., the negatives can be generated from all possible pairs of entities in the knowledge graph. We hypothesize that the performance of a binary link predictor for a relation should be positively correlated with both descriptors, i.e., the more training examples of that relation there are (the more connected its induced graph is), the better the performance of the binary predictor.
Focusing on the positive to negative ratio descriptors (label Pos/Neg ratio in Table 1), we see that we have two dense graphs (UMLS and FB15k-237) and two very sparse graphs (WN11 and BIOKG). When we restrict negative sample generation to semantically plausible examples only, UMLS has on average 60% positive triples per relation, and FB15k-237 17.33%. On the other hand, the two sparse graphs, BIOKG and WN11, both have less than 1% positive triples. This suggests that, for these sparse graphs, the binary prediction task for any relation is extremely imbalanced, which may hinder the performance of neural embedding models. If we consider negative sample generation without any semantic restriction, then all binary tasks are highly imbalanced.
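One plausible reading of these descriptors is the share of candidate entity pairs that are asserted as positives for a relation, either among domain/range pairs or among all entity pairs. The sketch below implements that reading; it is an assumption, since the defining equations are not legible in this text, and the name `positive_share` and the toy triples are hypothetical.

```python
def positive_share(triples, relation, restricted=True):
    """Fraction of candidate (subject, object) pairs asserted for a relation.

    restricted=True: candidates are pairs from the relation's domain x range.
    restricted=False: candidates are all ordered pairs of entities in the graph.
    """
    pos = {(s, o) for s, r, o in triples if r == relation}
    if restricted:
        domain = {s for s, _ in pos}
        rng = {o for _, o in pos}
        n_candidates = len(domain) * len(rng)
    else:
        ents = {e for s, _, o in triples for e in (s, o)}
        n_candidates = len(ents) ** 2
    return len(pos) / n_candidates if n_candidates else 0.0

# Toy graph (made up): relation "r" links 3 of its 4 domain/range pairs.
toy = {("a", "r", "b"), ("a", "r", "c"), ("d", "r", "b"), ("e", "q", "a")}
restricted_share = positive_share(toy, "r")           # 3 of 4 candidate pairs
unrestricted_share = positive_share(toy, "r", False)  # 3 of 25 candidate pairs
```

Under this reading, a value of 0.5 corresponds to as many potential negatives as positives, and a value of 1 to a complete induced graph with no negatives, matching the intuition given for Figure 4.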
3.2 Descriptors to measure semantic similarity of relations
We introduce two descriptors that capture the extent to which the relations in a knowledge graph are similar to one another. The shared-instances descriptor measures how many instances (entity pairs) two relations have in common, and the shared-entities descriptor measures the proportion of shared entities in either the domain or the range of the two relations. Both are based on the Jaccard similarity index, where the sets are defined as in Equations 3 and 4. Notice that the shared-instances descriptor can be seen as the degree of role equivalence in the description logic sense: the higher it is, the more semantically similar the two relations are (they contain the same pairs of entities). The shared-entities descriptor is higher when the two relations interconnect the same entities. Note that in the former the elements of the sets are tuples of entities, and in the latter the elements are single entities.
(3)  
(4) 
When we consider semantic similarity among the relations in our four knowledge graphs, we see a similar pattern as for the descriptors that measure the positive to negative ratio. In Figure 5 we show heatmaps of the semantic similarity of relations for the four graphs.
We can observe that UMLS and FB15k-237 have many similar relations, very few WN11 relations share instances, and the relations of BIOKG do not share any instances at all. If we consider semantic similarity in terms of shared entities, we see that, although BIOKG and WN11 have dissimilar relations, they can still share information through the shared entity embeddings. To see this, consider two relations that do not share instances but do share entities that they interconnect. In this situation, the training examples for the two relations may share information during the learning phase and improve the quality of the resulting embeddings. Using these two similarity matrices, we can define measures of semantic similarity among relations in the whole knowledge graph by taking the Frobenius norm of each matrix (Table 1, label Semantic).
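The two similarity matrices and their Frobenius norms can be sketched as follows. The function names and the toy triples are hypothetical. One further assumption is labeled in the code: the norm is taken over off-diagonal entries only, because Table 1 reports a value of 0 for BIOKG, which would be impossible if each relation's unit self-similarity were included.

```python
import math

def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_matrices(triples):
    """Pairwise Jaccard similarities of relations by instances and by entities."""
    relations = sorted({r for _, r, _ in triples})
    pairs = {r: {(s, o) for s, rr, o in triples if rr == r} for r in relations}
    ents = {r: {e for p in pairs[r] for e in p} for r in relations}
    inst = [[jaccard(pairs[a], pairs[b]) for b in relations] for a in relations]
    ent = [[jaccard(ents[a], ents[b]) for b in relations] for a in relations]
    return relations, inst, ent

def frobenius_norm(matrix, include_diagonal=False):
    """Frobenius norm; the diagonal is skipped by default (assumption, see text)."""
    return math.sqrt(sum(
        x * x
        for i, row in enumerate(matrix)
        for j, x in enumerate(row)
        if include_diagonal or i != j
    ))

# Toy graph (made up): r1 and r2 share one instance; r3 only shares entities.
toy = {("a", "r1", "b"), ("a", "r2", "b"), ("c", "r2", "d"), ("a", "r3", "c")}
relations, shared_instances, shared_entities = similarity_matrices(toy)
```

In the toy graph, `r1` and `r3` share no instances but do share the entity `a`, which is the situation described above where relations can still exchange information through shared entity embeddings.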
Overall, the proposed connectivity descriptors capture well the semantic and structural variability of knowledge graphs, and will allow us to make more nuanced evaluation of neural embeddings.
4 Benchmarking specialized and generalized embeddings under structural change
The goal of our experiments is to empirically investigate if, and when, generalized neural embeddings attain performance similar to that of specialized embeddings on the four considered datasets. To do so, we first generate the retained graphs, and the train and test datasets. The retained graphs are generated for each relation type in the case of specialized embeddings, and only once for the generalized embeddings. We always keep a 1:1 ratio of positive to negative examples. When we sample the negatives for a relation, we only consider triples whose entities come from the domain and the range of that relation. Embeddings are computed from the retained graphs, and then evaluated on the train and test datasets. Note that we only provide results for generalized embeddings for FB15k-237, since the computation of specialized embeddings for the 237 relations of FB15k-237 would take months (on our machine) to finish (the computation of specialized embeddings grows linearly with the number of relations, and exponentially in the number of repeated subsample validation runs). The evaluation of embeddings for one relation type is performed with a logistic regression classifier applied to the concatenation of the entity embeddings. To test the robustness of embeddings we perform evaluations with limited information, i.e., with retained graphs of varying size, and we analyze the amount of missed embeddings in all experiments. All of our results are presented as averages of 10 repeated random subsampling validations. We thus report mean F-measure scores and their standard deviations.
4.1 Comparing accuracy
In Figure 6 we present distributions of averaged F1 scores, which measure the accuracy of embeddings, and of the ratios of missed examples at training and testing of the binary classifiers. As such, the overall performance of specialized or generalized embeddings on one knowledge graph is characterized by these three distributions over all relations in the given knowledge graph. The performance of embeddings is compared for varying amounts of information present when the neural embeddings are trained. All distributions in Figure 6 are estimated and normalized with kernel density interpolation from actual histograms.
In three knowledge graphs (BIOKG, UMLS and WN11), the distributions converge as we increase the amount of available information, which supports the hypothesis of this manuscript that generalized embeddings may yield similar (if not the same) performance as specialized embeddings. For BIOKG, the F1 and missing-example distributions for specialized and generalized neural embeddings converge to almost identical distributions, even when the overall amount of information is low (e.g., only 20% of available triples). This may be explained by the relatively large number of available positive examples per relation type (hundreds of thousands of triples per relation). Although several connectivity descriptors (Table 1) are very similar for BIOKG and WN11, the differences between specialized and generalized embeddings are much more pronounced for WN11 than for BIOKG. In particular, neural embeddings for WN11 are very sensitive to the amount of retained information: the less information there is, the larger the discrepancy between the specialized and generalized distributions of the same scores (F1 and the ratio of missed examples). The amount of missed examples is very high in both the specialized and generalized cases when little information is retained, and the distributions converge as more is retained. The most regular behavior is demonstrated by neural embeddings trained on UMLS corpora, where missing-example rates are almost zero even with little retained information. The shapes of the F1 distributions are very similar across all settings, and the discrepancies are very low. These observations allow us to hypothesize that similar trends might exist for the FB15k-237 knowledge graph, since UMLS and FB15k-237 have similar descriptor values.
To summarize, as we increase the amount of information available during the computation of neural embeddings, the discrepancies between specialized and generalized embeddings become negligible. This is good news, since training generalized embeddings requires a single training run instead of one run per relation, and there is strong evidence that, with enough information, we can achieve the same performance.
4.2 Comparing average performance
We recall that each distribution’s sampled point is obtained by averaging results of repeated experiments for one relation . To directly compare distributions, we compare their means and standard deviations, and, as such, we are comparing the average performance of binary classifiers for specialized and generalized neural embeddings, with the varying parameter . Figure 7 depicts the average performance of all binary classifiers and its standard deviation for the four knowledge graphs.
As expected, the performance of specialized embeddings is better than that of generalized embeddings; however, the differences are very slim. BIOKG and UMLS demonstrate that, as we retain more information, the average F1 score increases in both cases, but so does the standard deviation. WN11, on the other hand, demonstrates a counterintuitive trend where the best performance of specialized embeddings occurs when less information is available, and the F1 score decreases slightly when we include more information during the computation of neural embeddings. This may be explained by an increased amount of missing examples, both during training and testing of the binary classifier. Due to the very sparse connectivity of the induced graphs of WN11, when we only consider 20% of the available triples (that is, we exclude 80% of the available links), many entities are likely to become disconnected. This means that no embeddings are learned for them and, as a result, the binary classifier is both trained and tested on fewer examples.
5 DistMult vs. ComplEx
In this experiment, our goal is to compare two of the most popular knowledge graph embedding models, DistMult Yang et al. (2015) and ComplEx Trouillon et al. (2016), by using our relation-centric ablation benchmark. In particular, we aim to explain which intrinsic properties of graphs directly impact the accuracy of neural models. In contrast to our previous experiment, we perform random ablations on each relation type. For each knowledge graph we train 10 DistMult and 10 ComplEx models. We fix the embedding dimension to 200, and we use the Adam optimizer. Each model is trained for 50 epochs. Accuracy is assessed with the mean rank (MR) and mean reciprocal rank (MRR) metrics. In Table 2 we report the mean (with standard deviation) MR and MRR performance of the two models on the four datasets, as well as the averaged performance of the models over all graphs. Overall, ComplEx is slightly better than DistMult (MRR 0.76 (0.2) vs. 0.75 (0.26), mean (SD)); however, their performances stay within the confidence intervals for all graphs. If we look at the performance of the models on specific graphs, the differences are more apparent. In Figure 8 we present point estimates and confidence intervals of the MRR metric for each graph, with horizontal lines accentuating the difference in accuracy between the two models. ComplEx is better on UMLS and WN11, DistMult on the other two.
Model | Graph | MRR (mean (SD)) | MR (mean (SD))
ComplEx | BIOKG | 0.67 (0.06) | 30,810.75 (6330.92)
ComplEx | FB15k-237 | 0.92 (0.02) | 63.56 (20.61)
ComplEx | WN11 | 0.5 (0.15) | 6,033.56 (2360.53)
ComplEx | UMLS | 0.89 (0.02) | 2.87 (0.36)
ComplEx | all graphs | 0.76 (0.2) | 6144.39 (10,846.1)
DistMult | BIOKG | 0.92 (0.04) | 189.26 (42.09)
DistMult | FB15k-237 | 0.95 (0.01) | 67.96 (14.48)
DistMult | WN11 | 0.38 (0.21) | 8,347.39 (4016.02)
DistMult | UMLS | 0.83 (0.03) | 4.27 (0.78)
DistMult | all graphs | 0.75 (0.26) | 2,432.64 (4321.82)
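For reference, the two ranking metrics reported in Table 2 can be computed as in the following minimal sketch. The helper names and the tie-handling convention (ties are resolved in favor of the true candidate) are assumptions made for illustration, not a description of the evaluation code behind the table.

```python
def rank_of_true(scores, true_index):
    """1-based rank of the true candidate when sorting by descending score."""
    true_score = scores[true_index]
    return 1 + sum(1 for s in scores if s > true_score)

def mean_rank(ranks):
    """MR: average rank of the true candidates (lower is better)."""
    return sum(ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    """MRR: average of 1/rank of the true candidates (higher is better)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Toy example (made up): three test triples whose true entities rank 1, 2 and 4.
ranks = [1, 2, 4]
mr = mean_rank(ranks)
mrr = mean_reciprocal_rank(ranks)
```

MR is dominated by the worst-ranked examples, while MRR is bounded in (0, 1] and dominated by the best-ranked ones, which is one reason the two columns of Table 2 can order the models differently.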
To explain such a disparity in performance, we analyze correlations of each model's performance with intrinsic properties of the (test) graphs. Figure 9 summarizes Spearman correlations, and Figure 10 shows regression plots to emphasize the correlations.
In the following we use the abbreviation corr to refer to the Spearman correlation (chosen since the distributions are potentially not normal) of the MRR metric with intrinsic properties of graphs (Figure 9). We see that ComplEx depends on properties that emphasize semantic similarity among the relations (both the shared-instances and the shared-entities descriptors): it performs better whenever the graph has semantically related relations (be it dense or sparse), and it depends less on the number of triples (samples, corr 0.11), suggesting that this model better learns semantic relationships within the graph. On the other hand, DistMult leverages the role equivalence similarity less, concentrates more on similar entities, and is highly sensitive to the number of triples (corr 0.64). This may explain why ComplEx outperforms DistMult on the small and extremely dense UMLS graph, and on the relatively big and sparse WN11 graph. DistMult, on the other hand, better leverages the abundance of triples in the big and sparse BIOKG graph, and in the dense FB15k-237 graph. By looking at the regression plots (Figure 10), we can see that both models have high variability (low confidence) on graphs that exhibit low semantic similarity among the relations and contain very few samples. Overall, the ComplEx model is better at extracting semantic relationships, while DistMult is better at leveraging big sample sizes.
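Spearman correlation is the Pearson correlation of the ranks of the two variables, which is why it does not require normally distributed data. The following self-contained sketch shows the computation with average ranks for ties; the function names and toy values are hypothetical, and a statistics library would normally be used instead.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values receiving the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data (made up): a monotone but non-linear relationship.
rho = spearman([1, 2, 3, 4], [1, 4, 9, 16])
```

Because only the ordering matters, any monotone relationship between a descriptor and MRR yields a correlation of magnitude 1, regardless of its shape.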
6 Conclusions
The lessons learned from our experiments lead us to conclude that the performance of neural embeddings depends on how tightly the relations within the knowledge graph interconnect entities. The presence of multiple relations (edges) that make the overall web of entities more entangled affects the accuracy. Therefore, to increase the accuracy of neural embeddings in knowledge bases, we identify two main ingredients: a) increase the sample size, b) add similar relations. Obviously, by introducing novel relation types we also increase the overall sample size. The addition of semantically similar relations can be achieved by using logical reasoners, or by resorting to external data sources. For instance, language models could be used to augment knowledge bases Petroni et al. (2019).
Herein, we proposed an open-source evaluation benchmark for knowledge graph embeddings that better captures structural variability, and its change, in real-world knowledge graphs.
Acknowledgements
The computational results presented have been achieved in part using the Vienna Scientific Cluster (VSC).
References
Agibetov and Samwald (2018). Global and local evaluation of link prediction tasks with neural embeddings. arXiv.
Ali et al. (2019). Predicting missing links using PyKEEN. In Proceedings of the ISWC 2019 Satellite Tracks (Posters & Demonstrations, Industry, and Outrageous Ideas), CEUR Workshop Proceedings, Vol. 2456, pp. 245–248.
Alshahrani et al. (2017). Neuro-symbolic representation learning on biological knowledge graphs. Bioinformatics 33 (17), pp. 2723–2730.
Bodenreider (2004). The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Res 32 (Database issue), pp. D267–70.
Bollacker et al. (2008). Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data (SIGMOD ’08), New York, New York, USA, pp. 1247. ISBN 9781605581026.
Bordes et al. (2013). Translating embeddings for modeling multi-relational data. In Proc. NIPS 2013.
Chamberlain et al. (2017). Neural embeddings of graphs in hyperbolic space. arXiv:1705.10359 [cs, stat].
Crichton et al. (2018). Neural networks for link prediction in realistic biomedical graphs: a multi-dimensional evaluation of graph embedding-based approaches. BMC Bioinformatics 19 (1), pp. 176.
Dettmers et al. (2017). Convolutional 2D knowledge graph embeddings.
Garcia-Gasulla et al. (2016). Limitations and alternatives for the evaluation of large-scale link prediction. arXiv.
Goyal and Ferrara (2018). Graph embedding techniques, applications, and performance: a survey. Knowledge-Based Systems 151, pp. 78–94. ISSN 0950-7051.
Goyal et al. (2019). Benchmarks for graph embedding evaluation. arXiv abs/1908.06543.
Grover and Leskovec (2016). Node2vec: scalable feature learning for networks. KDD 2016, pp. 855–864.
Kadlec et al. (2017). Knowledge base completion: baselines strike back. arXiv:1705.10744 [cs].
Liben-Nowell and Kleinberg (2003). The link prediction problem for social networks. In Proceedings of the twelfth international conference on Information and knowledge management (CIKM ’03), New York, New York, USA, pp. 556. ISBN 1581137230.
McKinney (2010). Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference, S. van der Walt and J. Millman (Eds.), pp. 51–56.
Mikolov et al. (2013). Distributed representations of words and phrases and their compositionality. arXiv.
Miller (1995). WordNet: a lexical database for English. Commun ACM 38 (11), pp. 39–41. ISSN 0001-0782.
Nickel and Kiela (2017). Poincaré embeddings for learning hierarchical representations. arXiv:1705.08039 [cs, stat].
Nickel et al. (2016). A review of relational machine learning for knowledge graphs. Proc. IEEE 104 (1), pp. 11–33. ISSN 0018-9219.
Pedregosa et al. (2011). Scikit-learn: machine learning in Python. Journal of Machine Learning Research.
Perozzi et al. (2014). DeepWalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ’14), New York, New York, USA, pp. 701–710. ISBN 9781450329569.
Petroni et al. (2019). Language models as knowledge bases?. In Proc. EMNLP-IJCNLP Conference, pp. 2463–2473.
Sri Nurdiati and Hoede (2008). 25 years development of knowledge graph theory: the results and the challenge. Memorandum, Discrete Mathematics and Mathematical Programming (DMMP).
Toutanova et al. (2016). Compositional learning of embeddings for relation paths in knowledge base and text. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Stroudsburg, PA, USA, pp. 1434–1444.
Trouillon et al. (2016). Complex embeddings for simple link prediction. In Proc. of ICML, pp. 2071–2080.
Wu et al. (2017). StarSpace: embed all the things!. arXiv.
Yang et al. (2015). Embedding entities and relations for learning and inference in knowledge bases. In Proc. of ICLR.
Yu et al. (2019). Pykg2vec: a Python library for knowledge graph embedding. arXiv preprint arXiv:1906.04239.