
Putting RDF2vec in Order

The RDF2vec method for creating node embeddings on knowledge graphs is based on word2vec, which, in turn, is agnostic towards the position of context words. In this paper, we argue that this might be a shortcoming when training RDF2vec, and show that using a word2vec variant which respects order yields considerable performance gains especially on tasks where entities of different classes are involved.



1 Introduction

Figure 1: Classic word2vec vs. Structured word2vec

RDF2vec [14] is a representation learning approach for entities in a knowledge graph. The basic idea is to first create sequences from a knowledge graph by starting random walks from each node. These sequences are then fed into the word2vec algorithm [7] for creating word embeddings, with each entity or property in the graph being treated as a “word”. As a result, a fixed-size feature vector is obtained for each entity.
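The walk extraction step can be illustrated with a minimal sketch. The triples, the function names, and the uniform edge selection below are assumptions made for illustration only; they are not taken from any particular RDF2vec implementation.

```python
import random

# Toy triples (subject, predicate, object); hypothetical data for illustration.
TRIPLES = [
    ("Hamburg", "country", "Germany"),
    ("Germany", "leader", "Angela_Merkel"),
    ("Angela_Merkel", "birthPlace", "Hamburg"),
    ("Hamburg", "leader", "Peter_Tschentscher"),
    ("Peter_Tschentscher", "residence", "Hamburg"),
]


def build_adjacency(triples):
    """Map each node to its outgoing (predicate, object) edges."""
    adjacency = {}
    for s, p, o in triples:
        adjacency.setdefault(s, []).append((p, o))
        adjacency.setdefault(o, [])
    return adjacency


def random_walks(triples, walks_per_node, depth, seed=0):
    """Generate `walks_per_node` walks of at most `depth` hops from every node."""
    rng = random.Random(seed)
    adjacency = build_adjacency(triples)
    walks = []
    for start in sorted(adjacency):
        for _ in range(walks_per_node):
            walk, node = [start], start
            for _ in range(depth):
                edges = adjacency[node]
                if not edges:  # dead end: stop this walk early
                    break
                predicate, node = rng.choice(edges)  # uniform choice (assumption)
                walk.extend([predicate, node])
            walks.append(walk)
    return walks


walks = random_walks(TRIPLES, walks_per_node=3, depth=2)
for walk in walks[:3]:
    print(" -> ".join(walk))
```

Each walk alternates entities and properties, so it can be passed to a word2vec-style trainer as one "sentence" of tokens.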

Word2vec is a well-known neural language model to train latent representations (i.e., fixed-size vectors) of words based on a text corpus. Its objective is either to predict a word given its context words (known as continuous bag of words or CBOW), or vice versa (known as skip-gram or SG).

Given the context $C$ of a word $w$, where $C$ is a set of preceding and succeeding words of $w$, the learning objective of word2vec is to predict $w$. This is known as the continuous bag of words (CBOW) model. The skip-gram (SG) model is trained the other way around: given $w$, $C$ has to be predicted. Within this training process, the size of $C$ is also known as the window or window size.
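The difference between the two objectives can be made concrete by listing the training examples each one sees. The following is a minimal sketch; the function names are hypothetical, and `window` is taken here to mean the number of tokens considered on each side of the target word, which is an assumption.

```python
def training_examples(sequence, window):
    """Yield (target, context) for every position; the context holds up to
    `window` tokens on each side of the target."""
    for i, target in enumerate(sequence):
        left = sequence[max(0, i - window):i]
        right = sequence[i + 1:i + 1 + window]
        yield target, left + right


def cbow_examples(sequence, window):
    """CBOW: the input is the whole context, the output is the target."""
    return [(context, target) for target, context in training_examples(sequence, window)]


def skip_gram_examples(sequence, window):
    """SG: the input is the target, with one example per context token."""
    return [
        (target, c)
        for target, context in training_examples(sequence, window)
        for c in context
    ]


walk = ["Hamburg", "country", "Germany", "leader", "Angela_Merkel"]
print(cbow_examples(walk, window=2)[2])
print(skip_gram_examples(walk, window=2)[:2])
```

Note that neither list records where in the window a context token occurred, which is the property discussed next.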

One shortfall of the original word2vec approach is its insensitivity to the relative positions of words. It is, for instance, irrelevant whether a context word precedes or succeeds the target word, and the actual distance to the target word is not considered. This property of word2vec is ideal to cope with the fact that in many languages, the same sentence can be expressed with different word orderings (cf. Yesterday morning, Tom ate bread vs. Tom ate bread yesterday morning). In contrast, in walks extracted from knowledge graphs, the semantics of the underlying nodes differ depending on the position of an entity in the walk, as the following example illustrates.

Figure 2: Example knowledge graph

Fig. 2 depicts a small excerpt of a knowledge graph. Among others, the following walks could be extracted from the graph:

Hamburg -> country -> Germany            -> leader     -> Angela_Merkel
Germany -> leader  -> Angela_Merkel      -> birthPlace -> Hamburg
Hamburg -> leader  -> Peter_Tschentscher -> residence  -> Hamburg

If an RDF2vec model is trained for the entities in the center (i.e., Germany, Angela_Merkel, and Peter_Tschentscher), all of the sequences share exactly two entities in their context (Hamburg and leader), i.e., they will be projected equally close in the vector space. However, a model respecting positions would particularly differentiate the different meanings of leader (i.e., whether someone/thing has or is a leader), and the different roles of involved entities (i.e., Hamburg as a place of birth or a residence of a person, or being located in a country). Therefore, it would map the two politicians closer to each other than to Germany.
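The argument can be checked directly on the three walks above. The sketch below compares the unordered context of each center entity with a position-tagged context; counting shared context tokens is only a rough stand-in for what a trained model would learn, and the helper names are hypothetical.

```python
WALKS = [
    ["Hamburg", "country", "Germany", "leader", "Angela_Merkel"],
    ["Germany", "leader", "Angela_Merkel", "birthPlace", "Hamburg"],
    ["Hamburg", "leader", "Peter_Tschentscher", "residence", "Hamburg"],
]


def contexts(walk, center_index=2):
    """Return the center token with its unordered and position-tagged context."""
    center = walk[center_index]
    others = [(i - center_index, tok) for i, tok in enumerate(walk) if i != center_index]
    bag = {tok for _, tok in others}   # classic view: positions are discarded
    positional = set(others)           # order-aware view: (offset, token) pairs
    return center, bag, positional


by_entity = {}
for walk in WALKS:
    center, bag, positional = contexts(walk)
    by_entity[center] = (bag, positional)

# Tokens that all three center entities share when order is ignored.
shared_bag = set.intersection(*(bag for bag, _ in by_entity.values()))
print("shared unordered context:", sorted(shared_bag))


def positional_overlap(a, b):
    """Number of (offset, token) pairs two center entities have in common."""
    return len(by_entity[a][1] & by_entity[b][1])


print(positional_overlap("Angela_Merkel", "Peter_Tschentscher"))
print(positional_overlap("Germany", "Angela_Merkel"))
print(positional_overlap("Germany", "Peter_Tschentscher"))
```

Ignoring order, all three entities share Hamburg and leader. With offsets attached, the two politicians share two context tokens (leader directly before them and Hamburg two steps after), whereas Germany shares none with Angela_Merkel and only one with Peter_Tschentscher.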

Ling et al. [6] present an extension to the word2vec algorithm, known as structured word2vec, which incorporates the positional information of words. This is achieved by using multiple encoders (CBOW) or decoders (SG), depending on the position of the context words. An illustration for SG can be found in Figure 1: the classic component uses only one output matrix, which maps the embeddings to the output, while the structured approach uses one output matrix per position in the window (e.g., a dedicated matrix for the word directly following the target word).
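The structural difference between the two SG variants can be sketched in a few lines. This is a simplified illustration under stated assumptions: the vocabulary, dimensions, and random initialization are made up, only the raw dot-product score is computed (no softmax or negative sampling), and the code is not taken from Ling et al.'s implementation.

```python
import random

VOCAB = ["Hamburg", "country", "Germany", "leader", "Angela_Merkel"]
DIM, WINDOW = 4, 2
rng = random.Random(0)


def random_matrix(rows, cols):
    """Small random matrix as a list of row vectors."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


index = {tok: i for i, tok in enumerate(VOCAB)}
embeddings = random_matrix(len(VOCAB), DIM)  # input vectors, used by both variants

# Classic SG: a single output matrix for all context positions.
classic_output = random_matrix(len(VOCAB), DIM)

# Structured SG: one output matrix per relative position -WINDOW..-1, +1..+WINDOW.
positions = [p for p in range(-WINDOW, WINDOW + 1) if p != 0]
structured_output = {p: random_matrix(len(VOCAB), DIM) for p in positions}


def classic_score(target, context, position):
    # `position` is accepted for symmetry but deliberately ignored.
    return dot(embeddings[index[target]], classic_output[index[context]])


def structured_score(target, context, position):
    # The output matrix is selected by the context word's relative position.
    return dot(embeddings[index[target]], structured_output[position][index[context]])


# Does the score depend on whether "leader" precedes or follows "Germany"?
print(classic_score("Germany", "leader", -1) == classic_score("Germany", "leader", +1))
print(structured_score("Germany", "leader", -1) == structured_score("Germany", "leader", +1))
```

The structured variant has `2 * WINDOW` output matrices instead of one, so its output layer grows linearly with the window size while the input embeddings stay the same size.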

In this paper, we present an order-aware variant of RDF2vec, obtained by changing the training component from word2vec to structured word2vec, and show promising preliminary results.

2 Related Work

RDF2vec was one of the first approaches to adapt statistical language modeling techniques to knowledge graphs. Similar approaches, such as node2vec [4] and DeepWalk [11], were proposed for unlabeled graphs, while knowledge graphs are labeled by nature, i.e., they contain different types of edges.

Other language modeling techniques that have been adapted for knowledge graphs include GloVe [10], which yielded KGlove [2], and BERT [3], which yielded KG-BERT [16].

Variants of RDF2vec include the use of different heuristics for biasing the walks [1]. Vandewiele et al. [15] evaluate multiple heuristics for biasing the walks as well as alternative walk strategies. Very few authors tried to change the training objective of RDF2vec. Besides word2vec, the GloVe [9] algorithm has also been used [2].

3 Experiments and Preliminary Results

We use jRDF2vec [12] to generate random walks and Ling et al.’s structured word2vec implementation to train an embedding based on the walks.

For the embeddings, we use the DBpedia 2016-04 dataset. We generated 500 random walks for each node in the graph with a depth of 4 (node hops). Word2vec and structured word2vec were trained using the same set of walks and the same training parameters.

We evaluate both the classic and the position-aware RDF2vec approach on a variety of tasks and datasets. For our evaluation, we use the GEval framework [8] and follow the setup proposed in [13] and [8]. Those works use data mining tasks with an external ground truth. Different feature extraction methods, which include the generation of embedding vectors, can then be compared using a fixed set of learning methods. Overall, we evaluate our new embedding approach on six downstream tasks using 20 datasets altogether: classification, regression, clustering, determining semantic analogies, computing entity relatedness, and document similarity, the latter based on entities mentioned in the documents.
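To give an impression of how one of these tasks can be scored, the following sketch evaluates semantic analogies with the common vector-offset formulation (a is to b as c is to ?). This is an assumption about the general technique, not a description of GEval's exact procedure; the two-dimensional vectors and all names are made up for illustration.

```python
import math

# Hypothetical 2-d embeddings; real RDF2vec vectors have 100 or 200 dimensions.
VECTORS = {
    "Germany": (1.0, 0.0), "Berlin": (1.0, 1.0),
    "France": (2.0, 0.0), "Paris": (2.0, 1.0),
    "Italy": (3.0, 0.0), "Rome": (3.0, 1.0),
    "Hamburg": (0.5, 1.0),
}


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm


def solve_analogy(a, b, c):
    """Return the entity closest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    query = tuple(vb - va + vc for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c]))
    candidates = [e for e in VECTORS if e not in (a, b, c)]
    return max(candidates, key=lambda e: cosine(query, VECTORS[e]))


# Each entry: (a, b, c, expected d), read as "a is to b as c is to d".
ANALOGIES = [
    ("Germany", "Berlin", "France", "Paris"),
    ("Germany", "Berlin", "Italy", "Rome"),
]

correct = sum(solve_analogy(a, b, c) == d for a, b, c, d in ANALOGIES)
accuracy = correct / len(ANALOGIES)
print(f"analogy accuracy: {accuracy:.3f}")
```

An analogy such as capital to country spans entities of two different classes, which is why this task is sensitive to how well an embedding separates the roles of entities.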

The results are presented in Table 1. When comparing the classic to the order-aware embeddings, the performance is very similar on most tasks, such as classification. A first observation is that there are no significant performance drops on any of the tasks when switching from classic to order-aware RDF2vec embeddings. However, significant performance increases can be observed on clustering and semantic analogy tasks, which are the tasks where entities of different classes are involved (whereas the classification and regression tasks deal with entities of the same class, e.g., cities or countries). The order-aware RDF2vec configuration with 100 dimensions achieves the overall best result on 7 datasets and outperforms the classic configuration with the same dimension on 10 datasets, in part by a wide margin. On the other hand, in most cases where the classic variant performs better, it does so by a smaller margin. Thus, in general, the order-aware variant can be used safely without performance drops, and in some cases with significant performance gains.

Task                 Metric         Dataset                           c-100   oa-100  c-200   oa-200
Classification       ACC            AAUP                              0.693   0.679   0.692   0.683
Classification       ACC            Cities                            0.793   0.793   0.798   0.807
Classification       ACC            Forbes                            0.629   0.607   0.635   0.630
Classification       ACC            Metacritic Albums                 0.783   0.799   0.788   0.792
Classification       ACC            Metacritic Movies                 0.757   0.736   0.763   0.748
Clustering           ACC            Cities/Countries (2k)             0.755   0.939   0.758   0.946
Clustering           ACC            Cities/Countries                  0.786   0.785   0.7624  0.766
Clustering           ACC            Cities/Albums/Movies/AAUP/Forbes  0.932   0.931   0.861   0.929
Clustering           ACC            Teams                             0.969   0.971   0.892   0.945
Regression           RMSE           AAUP                              65.151  62.624  66.301  65.077
Regression           RMSE           Cities                            12.726  11.220  14.855  13.484
Regression           RMSE           Forbes                            34.290  34.340  36.460  35.967
Regression           RMSE           Metacritic Albums                 11.366  11.215  11.528  11.651
Regression           RMSE           Metacritic Movies                 19.091  19.530  19.078  19.432
Semantic Analogies   ACC            Capital-Countries                 0.852   0.990   0.872   0.949
Semantic Analogies   ACC            Capital-Countries (all)           0.832   0.933   0.901   0.896
Semantic Analogies   ACC            Currency-Country                  0.417   0.520   0.537   0.441
Semantic Analogies   ACC            City-State                        0.5577  0.607   0.555   0.627
Entity Relatedness   Harmonic Mean  -                                 0.726   0.716   0.747   0.747
Document Similarity  Kendall Tau    -                                 0.405   0.373   0.350   0.325
Table 1: Results of classic RDF2vec (c-100, c-200) and order-aware RDF2vec (oa-100, oa-200), trained with 100 and 200 dimensions, respectively. The best value in each dimension group is printed in bold, the overall best value is additionally underlined.
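The dataset counts quoted in the discussion can be re-derived from Table 1. The sketch below copies the table values and assumes that lower is better for RMSE and higher is better for all other metrics; the variable and function names are hypothetical.

```python
# (task, dataset, lower_is_better, c-100, oa-100, c-200, oa-200), copied from Table 1.
ROWS = [
    ("Classification", "AAUP", False, 0.693, 0.679, 0.692, 0.683),
    ("Classification", "Cities", False, 0.793, 0.793, 0.798, 0.807),
    ("Classification", "Forbes", False, 0.629, 0.607, 0.635, 0.630),
    ("Classification", "Metacritic Albums", False, 0.783, 0.799, 0.788, 0.792),
    ("Classification", "Metacritic Movies", False, 0.757, 0.736, 0.763, 0.748),
    ("Clustering", "Cities/Countries (2k)", False, 0.755, 0.939, 0.758, 0.946),
    ("Clustering", "Cities/Countries", False, 0.786, 0.785, 0.7624, 0.766),
    ("Clustering", "Cities/Albums/Movies/AAUP/Forbes", False, 0.932, 0.931, 0.861, 0.929),
    ("Clustering", "Teams", False, 0.969, 0.971, 0.892, 0.945),
    ("Regression", "AAUP", True, 65.151, 62.624, 66.301, 65.077),
    ("Regression", "Cities", True, 12.726, 11.220, 14.855, 13.484),
    ("Regression", "Forbes", True, 34.290, 34.340, 36.460, 35.967),
    ("Regression", "Metacritic Albums", True, 11.366, 11.215, 11.528, 11.651),
    ("Regression", "Metacritic Movies", True, 19.091, 19.530, 19.078, 19.432),
    ("Semantic Analogies", "Capital-Countries", False, 0.852, 0.990, 0.872, 0.949),
    ("Semantic Analogies", "Capital-Countries (all)", False, 0.832, 0.933, 0.901, 0.896),
    ("Semantic Analogies", "Currency-Country", False, 0.417, 0.520, 0.537, 0.441),
    ("Semantic Analogies", "City-State", False, 0.5577, 0.607, 0.555, 0.627),
    ("Entity Relatedness", "-", False, 0.726, 0.716, 0.747, 0.747),
    ("Document Similarity", "-", False, 0.405, 0.373, 0.350, 0.325),
]


def better(a, b, lower_is_better):
    """True if score `a` is strictly better than score `b`."""
    return a < b if lower_is_better else a > b


# Head-to-head comparison of the two 100-dimensional configurations.
oa100_wins = sum(better(oa100, c100, low) for _, _, low, c100, oa100, _, _ in ROWS)
c100_wins = sum(better(c100, oa100, low) for _, _, low, c100, oa100, _, _ in ROWS)
ties = len(ROWS) - oa100_wins - c100_wins

# Datasets on which oa-100 reaches the best value among all four configurations.
oa100_overall_best = sum(
    oa100 == (min if low else max)((c100, oa100, c200, oa200))
    for _, _, low, c100, oa100, c200, oa200 in ROWS
)

print(f"oa-100 beats c-100 on {oa100_wins} datasets, loses on {c100_wins}, ties on {ties}")
print(f"oa-100 is overall best on {oa100_overall_best} datasets")
```

Under these assumptions, the recomputed counts agree with the figures given in the text: 10 datasets on which oa-100 outperforms c-100 and 7 datasets on which it is the overall best configuration.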

4 Summary and Future Work

In this paper, we presented a position-aware variant of RDF2vec together with first, very promising evaluation results. In the future, we plan to conduct a more thorough analysis of which knowledge graph characteristics and downstream tasks benefit most from the ordered variant, and which do not. For example, we believe that graphs with a small set of predicates, or graphs which have all symmetric, inverse, and transitive relations materialized [5], can benefit more from using the ordered variant.

Furthermore, we plan to analyze how the ordered variant can be integrated into other RDF2vec configurations and flavours, such as different biased walks [2], or RDF2vec Light [12].


References

  • [1] M. Cochez, P. Ristoski, S. P. Ponzetto, and H. Paulheim (2017) Biased graph walks for RDF graph embeddings. In WIMS 2017, pp. 21:1–21:12.
  • [2] M. Cochez, P. Ristoski, S. P. Ponzetto, and H. Paulheim (2017) Global RDF vector space embeddings. In ISWC 2017, LNCS, Vol. 10587, pp. 190–207.
  • [3] J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • [4] A. Grover and J. Leskovec (2016) node2vec: scalable feature learning for networks. In ACM SIGKDD 2016, pp. 855–864.
  • [5] A. Iana and H. Paulheim (2020) More is not always better: the negative impact of A-box materialization on RDF2vec knowledge graph embeddings. In Proceedings of the CIKM 2020 Workshops.
  • [6] W. Ling, C. Dyer, A. W. Black, and I. Trancoso (2015) Two/too simple adaptations of word2vec for syntax problems. In NAACL HLT 2015, pp. 1299–1304.
  • [7] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013) Distributed representations of words and phrases and their compositionality. In NIPS.
  • [8] M. A. Pellegrino, A. Altabba, M. Garofalo, P. Ristoski, and M. Cochez (2020) GEval: a modular and extensible evaluation framework for graph embedding techniques. In ESWC.
  • [9] J. Pennington, R. Socher, and C. D. Manning (2014) GloVe: global vectors for word representation. In EMNLP 2014, pp. 1532–1543.
  • [10] J. Pennington, R. Socher, and C. D. Manning (2014) GloVe: global vectors for word representation. In EMNLP 2014, pp. 1532–1543.
  • [11] B. Perozzi, R. Al-Rfou, and S. Skiena (2014) DeepWalk: online learning of social representations. In ACM SIGKDD 2014, pp. 701–710.
  • [12] J. Portisch, M. Hladik, and H. Paulheim (2020) RDF2Vec Light: a lightweight approach for knowledge graph embeddings. In ISWC Posters and Demos.
  • [13] P. Ristoski, G. K. D. de Vries, and H. Paulheim (2016) A collection of benchmark datasets for systematic evaluations of machine learning on the semantic web. In ISWC.
  • [14] P. Ristoski, J. Rosati, T. D. Noia, R. D. Leone, and H. Paulheim (2019) RDF2Vec: RDF graph embeddings and their applications. Semantic Web 10 (4), pp. 721–752.
  • [15] G. Vandewiele, B. Steenwinckel, P. Bonte, M. Weyns, H. Paulheim, P. Ristoski, F. D. Turck, and F. Ongenae (2020) Walk extraction strategies for node embeddings with RDF2vec in knowledge graphs. CoRR abs/2009.04404.
  • [16] L. Yao, C. Mao, and Y. Luo (2019) KG-BERT: BERT for knowledge graph completion. arXiv preprint arXiv:1909.03193.