ChronoR: Rotation Based Temporal Knowledge Graph Embedding

Despite the importance and abundance of temporal knowledge graphs, most current research has focused on reasoning over static graphs. In this paper, we study the challenging problem of inference over temporal knowledge graphs, in particular the task of temporal link prediction. In general, this is a difficult task due to data non-stationarity, data heterogeneity, and complex temporal dependencies. We propose Chronological Rotation embedding (ChronoR), a novel model for learning representations for entities, relations, and time. Learning dense representations is frequently used as an efficient and versatile method to perform reasoning on knowledge graphs. The proposed model learns a k-dimensional rotation transformation parametrized by relation and time, such that after each fact's head entity is transformed using the rotation, it falls near its corresponding tail entity. By using high dimensional rotation as its transformation operator, ChronoR captures rich interaction between the temporal and multi-relational characteristics of a Temporal Knowledge Graph. Experimentally, we show that ChronoR is able to outperform many of the state-of-the-art methods on the benchmark datasets for temporal knowledge graph link prediction.


1 Introduction

Figure 1: A timeline of events extracted from the ICEWS14 knowledge graph. The events are chronologically sorted from top to bottom, demonstrating interactions between President Obama, NATO, Afghanistan, and Russia.

Knowledge Graphs (KGs) organize information around entities (people, countries, organizations, movies, etc.) in the form of factual triplets, where each triplet represents how two entities are related to each other, for example (Washington DC, capitalOf, USA).

There exists an ever-growing number of publicly available KGs, for example DBPedia Auer et al. (2007), Freebase Bollacker et al. (2008), Google Knowledge Graph Blog (2012), NELL Carlson et al. (2010), OpenIE Yates et al. (2007); Etzioni et al. (2011), YAGO Biega et al. (2013); Hoffart et al. (2013); Mahdisoltani et al. (2013), and UMLS Burgun and Bodenreider (2001). This structured way of representing knowledge makes it easy for computers to digest and utilize in various applications. For example, KGs are used in recommender systems Cao et al. (2019); Zhang et al. (2016), the medical domain Sang et al. (2018); Abdelaziz et al. (2017), question-answering Hao et al. (2017), information retrieval Xiong et al. (2017), and natural language processing Yang and Mitchell (2017).

Though these graphs are continuously growing, they remain particularly incomplete. Knowledge Base Completion (KBC) focuses on finding/predicting missing relations between entities. A wide range of work explores various methods of finding extraction errors or completing KGs Bordes et al. (2013); Yang et al. (2014); Trouillon et al. (2016); Sadeghian et al. (2019a); Zhou et al. (2019); Sun et al. (2019); Zhang et al. (2019).

Different events and actions cause relations and entities to evolve over time. For example, Figure 1 illustrates a timeline of relations and events happening in 2014 involving entities like Obama (president of the USA at the time), NATO, Afghanistan, and Russia. Here, an agreement between Obama and Afghanistan happens concurrently with NATO increasing armed forces in Afghanistan, after Obama made a visit to NATO. A few months later, after NATO makes some pessimistic comments towards Russia and Obama wants to de-escalate the conflict, it appears that he provides military aid to NATO, and in turn, NATO provides military aid in Afghanistan.

Temporally-aware KGs are graphs that add a fourth dimension, namely time t, giving the fact a temporal context. Temporal KGs are designed to capture this temporal information and the dynamic nature of real-world facts. While many of the previous works study knowledge graph completion on static KGs, little attention has been given to temporally-aware KGs. Though recent work has begun to solve the temporal link prediction task, these models often utilize a large number of parameters, making them difficult to train Garcia-Duran et al. (2018); Dasgupta et al. (2018); Leblay and Chekol (2018). Furthermore, many use inadequate datasets such as YAGO2 Hoffart et al. (2013), which are sparse in the time domain, or a time augmented version of FreeBase Bollacker et al. (2008), where time is appended to some existing facts.

One of the most popular models used to solve the link prediction task involves embedding the KG, that is, mapping entities and relations in the KG to high dimensional vectors, in which each entity and relation mapping considers the structure of the graph as constraints Bordes et al. (2013); Yang et al. (2014); Lin et al. (2015). These techniques have proven to be the state-of-the-art in modeling static knowledge graphs and inferring new facts from the KG based on the existing ones. Similarly, temporal knowledge graph embedding methods learn an additional mapping for time. Different methods differ based on how they map the elements in the knowledge graph and their scoring function.

In this paper, we propose ChronoR, a novel temporal link prediction model based on k-dimensional rotation. We formulate the link prediction problem as learning representations for entities in the knowledge graph and a rotation operator based on each fact’s relation and temporal elements. Further, we show that the proposed scoring function is a generalization of previously used scoring functions for the static KG link prediction task. We also provide insights into the regularization used in other similar models and propose a new regularization method inspired by tensor nuclear norms. Empirical experiments also confirm its advantages. Our experiments on available benchmarks show that ChronoR outperforms previous state-of-the-art methods.

The rest of the paper is organized as follows: Section 2 covers the relevant previous work, Section 3 formally defines the temporal knowledge graph completion problem, Section 4 goes over the proposed model’s details and learning procedure, Section 5 details the loss and regularization functions, Section 6 discusses the experimental setup, and Section 7 concludes our work.

2 Related Work

2.1 Static KG Embeddings

There has been a substantial amount of research in KG embedding in the non-temporal domain. One of the earliest models, TransE Bordes et al. (2013), is a translational distance-based model, which embeds the entities h and t, along with relation r, and relates them through the function: h + r = t. There have been several extensions to TransE, including TransH Wang et al. (2014), which models relations as hyperplanes; TransR Lin et al. (2015), which embeds entities and relations in separate spaces; and TransD Ji et al. (2015), in which two vectors represent each element in a triple in order to represent the elements and construct mapping matrices. Other works, such as DistMult Yang et al. (2014), represent relations as bilinear functions. ComplEx Trouillon et al. (2016) extends DistMult to the complex space. RotatE Sun et al. (2019) also embeds the entities in the complex space, and treats relations as planar rotations. QuatE Zhang et al. (2019) embeds each element using quaternions.
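To make the differences between these static models concrete, the following is a minimal sketch of four of the scoring functions in their commonly cited forms. It uses plain Python lists rather than a tensor library, the function names are illustrative, and margins and other training details are left out.

```python
import cmath
import math

def transe_score(h, r, t):
    """TransE: negative Euclidean distance between h + r and t (real vectors)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def distmult_score(h, r, t):
    """DistMult: bilinear score with a diagonal relation matrix."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

def complex_score(h, r, t):
    """ComplEx: real part of the trilinear product with the conjugated tail."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real

def rotate_score(h, phases, t):
    """RotatE: each relation coordinate is a unit complex number exp(i * phase)."""
    return -math.sqrt(
        sum(abs(hi * cmath.exp(1j * p) - ti) ** 2 for hi, p, ti in zip(h, phases, t))
    )
```

With purely real inputs, `complex_score` reduces to `distmult_score`, which mirrors the statement that ComplEx extends DistMult to the complex space.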

2.2 Temporal Embeddings

There has been a wide range of approaches to the problem of temporal link prediction Kazemi and Goel (2020). A straightforward technique is to ignore the timestamps and make a static KB by aggregating links across different times Liben-Nowell and Kleinberg (2007), then learn a static embedding for each entity. There have been several attempts to improve along this direction by giving more weights to the links that are more recent Sharan and Neville (2008); Ibrahim and Chen (2015); Ahmed and Chen (2016); Ahmed et al. (2016). In contrast to these methods, Yao et al. (2016) first learn embeddings for each snapshot of the KB, then aggregate the embeddings by using a weighted average of them. Several techniques have been proposed for the choice of weights of the embedding aggregation, for instance, based on ARIMA Güneş et al. (2016) or reinforcement learning Moradabadi and Meybodi (2017).

Some other works extend sequence models to TKGs. Sarkar et al. (2007) employ a Kalman filter to learn dynamic node embeddings. Garcia-Duran et al. (2018) use recurrent neural networks (RNNs) to accommodate temporal data and extend DistMult and TransE to TKGs. For each relation, a temporal embedding is learned by feeding the time characters and the static relation embedding to an LSTM. This method only learns dynamic embeddings for relations, not entities. Furthermore, Han et al. (2020) utilize a temporal point process parameterized by a deep neural architecture.

In one of the earliest works to employ representation learning techniques for reasoning over TKGs, Sadeghian et al. (2016) proposed both an embedding method and rule mining methods. Another related work, t-TransE Jiang et al. (2016), learns time-based embeddings indirectly by learning the order of time-sensitive relations, such as wasBornIn followed by diedIn. Esteban et al. (2016) also impose temporal order constraints on their data by adding an element to their quadruple (s,p,o,t:Bool), where Bool indicates if the fact vanishes or continues after time t. However, the model is only demonstrated on medical and sensory data.

Inspired by the success of diachronic word embeddings, some methods have tried to extend them to the TKG problem Garcia-Duran et al. (2018); Dasgupta et al. (2018). Diachronic methods map every (node, timestamp) or (relation, timestamp) pair to a hidden representation. Goel et al. (2020) learn dynamic embeddings by masking a fraction of the embedding weights with an activation function of frequencies, and Xu et al. (2019) embed the vectors as a direct function of time. Two concurrent temporal reasoning methods, TeRo Xu et al. (2020) and TeMP Wu et al. (2020), are also included in the empirical comparison table in Section 6.

Other methods, like Ma et al. (2019); Sadeghian et al. (2019b); Jain et al. (2020); Lacroix et al. (2020), do not evolve the embedding of entities over time. Instead, they learn the temporal behavior by using a representation for time. For instance, Ma et al. (2019) change the scoring function based on the time embedding and Lacroix et al. (2020) perform tensor decomposition based on the time representation.

3 Problem Definition

In this section, we formally define the problem of temporal knowledge graph completion and specify the notations used throughout the rest of the paper.

We represent scalars with lower case letters $a$, vectors and matrices with bold lower case letters $\mathbf{a}$, higher order tensors with bold upper case letters $\mathbf{A}$, and the $i$th element of a vector $\mathbf{a}$ as $a_i$. We use $\circ$ to denote the element wise product of two vectors and $[\mathbf{a} | \mathbf{b}]$ to denote matrix or vector concatenation. We denote the complex norm as $|\cdot|$ and $\|\cdot\|_p$ denotes the vector p-norm; we drop $p$ when $p = 2$.

A Temporal Knowledge Graph (TKG) refers to a set of quadruples $\mathcal{K} = \{(h, r, t, \tau) \mid h, t \in \mathcal{E},\ r \in \mathcal{R},\ \tau \in \mathcal{T}\}$. Each quadruple represents a temporal fact that is true in a world. $\mathcal{E}$ is the set of all entities and $\mathcal{R}$ is the set of all relations in the ontology. The fourth element in each quadruple represents time, which is often discretized. $\mathcal{T}$ represents the set of all possible time stamps.

Temporal Knowledge Graph Completion (temporal link prediction) refers to the problem of completing a TKG by inferring facts from a given subset of its facts. In this work, we focus on predicting temporal facts within the observed set $\mathcal{T}$, as opposed to the more general problem which also involves forecasting future facts.
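As a small illustration of this definition, the sketch below stores a TKG as a list of quadruples and derives the entity, relation, and timestamp sets from it. The three facts are the ICEWS14 examples quoted in Section 6.1; the `Quadruple` type and the `vocabularies` helper are hypothetical names introduced only for this example.

```python
from typing import NamedTuple

class Quadruple(NamedTuple):
    head: str
    relation: str
    tail: str
    time: str  # discretized timestamp, here an ISO date string

facts = [
    Quadruple("John Kerry", "Praise or endorse", "Lawmaker (Iraq)", "2014-10-18"),
    Quadruple("Iraq", "Receive deployment of peacekeepers", "Iran", "2014-07-05"),
    Quadruple("Japan", "Engage in negotiation", "South Korea", "2014-02-18"),
]

def vocabularies(quadruples):
    """Collect the sets of entities, relations, and timestamps seen in the facts."""
    entities = sorted({e for q in quadruples for e in (q.head, q.tail)})
    relations = sorted({q.relation for q in quadruples})
    timestamps = sorted({q.time for q in quadruples})
    return entities, relations, timestamps

entities, relations, timestamps = vocabularies(facts)
```

A link prediction query then corresponds to a quadruple with the head or the tail left open, to be filled with one of the entries of `entities`.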

4 Temporal KG Representation Learning

In this section, we present a framework for temporal knowledge graph representation learning. Given a TKG, we want to learn representations for entities, relations, and timestamps (e.g., $\mathbf{h}, \mathbf{r}, \mathbf{t}, \boldsymbol{\tau} \in \mathbb{R}^{n \times k}$) and a scoring function $g(h, r, t, \tau) \in \mathbb{R}$, such that true quadruples receive high scores. Thus, given $g$, the embeddings can be learned by optimizing an appropriate cost function.

Many of the previous works use a variety of different objectives and linear/non-linear operators to adapt static KG completion’s scoring functions to the scoring function in the temporal case (see Section 2).

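One example, RotatE Sun et al. (2019), learns embeddings by requiring each triplet’s head to fall close to its tail once transformed by an (element-wise) rotation parametrized by the relation vector: $\mathbf{h} \circ \mathbf{r} = \mathbf{t}$, where $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{C}^n$ and $|r_i| = 1$. Thus, RotatE defines $g(h, r, t) = \lambda - \|\mathbf{h} \circ \mathbf{r} - \mathbf{t}\|$, for some $\lambda \in \mathbb{R}$.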

Our model is inspired by the success of rotation-based models in static KG completion Sun et al. (2019); Zhang et al. (2019). For example, to carry out a rotation by an angle $\theta$ in the two dimensional space, one can use the well known Euler’s formula $e^{i\theta} = \cos(\theta) + i\sin(\theta)$, and the fact that if $(x, y) \in \mathbb{R}^2$ is represented by its dual complex form $z = x + iy$, then $R_\theta(x, y) \Leftrightarrow e^{i\theta} z$. We represent rotation by angle $\theta$ in the 2-dimensional space with $R_\theta$.
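The correspondence between complex multiplication and planar rotation can be checked numerically. The sketch below rotates a point once through Euler's formula and once through the usual 2x2 rotation matrix; both function names are illustrative.

```python
import cmath
import math

def rotate_complex(x, y, theta):
    """Rotate (x, y) by theta by multiplying z = x + iy with exp(i * theta)."""
    z = complex(x, y) * cmath.exp(1j * theta)
    return z.real, z.imag

def rotate_matrix(x, y, theta):
    """Rotate (x, y) by theta using the standard 2x2 rotation matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y
```

Both functions return the same point up to floating-point error, for example for `(1, 0)` rotated by a quarter turn.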

4.1 ChronoR

In this paper, we consider a subset of the group of general linear transformations $GL(k, \mathbb{R})$ over the k-dimensional real space, consisting of rotation and scaling, and parametrize the transformation by both time and relation. Intuitively, we expect that for true facts:

$$Q_{r,\tau}(\mathbf{h}) = \mathbf{t} \qquad (1)$$

where $\mathbf{h}, \mathbf{t} \in \mathbb{R}^{n \times k}$ and $Q_{r,\tau}$ represents the (row-wise) linear operator in k-dimensional space, parametrized by $r$ and $\tau$. Note that any orthogonal matrix $Q \in \mathbb{R}^{k \times k}$ (i.e., $Q^\top Q = Q Q^\top = I$) is equivalent to a k-dimensional rotation (possibly composed with a reflection). However, we relax the unit norm constraint, thus extending rotations to a subset of linear operators which also includes scaling.

As previous work has noted, for 2 or 3 dimensional rotations, one can represent $Q$ using complex numbers $\mathbb{C}$ and quaternions ($\mathbb{H}$), respectively. Higher dimensional transformations can also be constructed using the Aguilera-Perez Algorithm Aguilera and Pérez-Aguila (2004) followed by a scalar multiplication.
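For the planar case, relaxing the unit norm constraint simply means multiplying by a complex number whose modulus is not fixed to 1: its phase rotates and its modulus scales. The following is a minimal sketch of such a row-wise operator, assuming each row of the embedding is a pair of reals read as one complex number; `from_angle_scale` and `apply_rowwise` are hypothetical helper names.

```python
import math

def from_angle_scale(theta, scale):
    """Operator row (c, d) = scale * (cos theta, sin theta)."""
    return scale * math.cos(theta), scale * math.sin(theta)

def apply_rowwise(q_rows, h_rows):
    """Apply a 2D rotation-and-scaling to every row of h.

    Row (c, d) of q acts on row (a, b) of h like the complex product
    (a + bi)(c + di), i.e. a rotation by atan2(d, c) and a scaling by |c + di|.
    """
    return [
        (a * c - b * d, a * d + b * c)
        for (a, b), (c, d) in zip(h_rows, q_rows)
    ]
```

For instance, the operator row `(0.0, 2.0)` turns the head row `(1.0, 0.0)` into `(0.0, 2.0)`: a quarter-turn rotation combined with a doubling of the length.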

4.2 Scoring Function

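Unlike RotatE, which uses a scoring function based on the Euclidean distance of the transformed head $Q_{r,\tau}(\mathbf{h})$ and the tail $\mathbf{t}$, we propose to use the angle between the two vectors.

Our motivation comes from observations in a variety of previous works Aggarwal et al. (2001); Zimek et al. (2012) showing that in higher dimensions, the Euclidean norm suffers from the curse of high dimensionality and is not a good measure of the concept of proximity or similarity between vectors. We use the following well-known definition of inner product to define the angle between $Q_{r,\tau}(\mathbf{h})$ and $\mathbf{t}$.

Definition 1.

If $A$ and $B$ are matrices in $\mathbb{R}^{n \times k}$ we define:

$$\langle A, B \rangle := \mathrm{tr}(A B^\top) \qquad (2)$$

$$\cos(\theta) := \frac{\langle A, B \rangle}{\sqrt{\langle A, A \rangle \langle B, B \rangle}} \qquad (3)$$

Based on the above definition, the cosine of the angle between two matrices is proportional to their inner product. Hence, we define our scoring function as:

$$g(h, r, t, \tau) := \langle Q_{r,\tau}(\mathbf{h}), \mathbf{t} \rangle \qquad (4)$$

That is, we expect the $\cos$ of the relative angle between $Q_{r,\tau}(\mathbf{h})$ and $\mathbf{t}$ to be higher (i.e., their angle close to $0$) when $(h, r, t, \tau)$ is a true fact in the TKG.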
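The inner product referred to here is the standard trace (Frobenius) inner product, which equals the sum of entrywise products of the two matrices. A minimal sketch of it and of the induced cosine, with matrices given as lists of rows and with illustrative function names:

```python
import math

def inner(A, B):
    """Trace inner product tr(A B^T), computed as the sum of entrywise products."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def cosine(A, B):
    """Cosine of the angle between two matrices under the trace inner product."""
    return inner(A, B) / math.sqrt(inner(A, A) * inner(B, B))
```

Two identical matrices give a cosine of 1, orthogonal ones give 0, and rescaling either argument by a positive factor leaves the cosine unchanged while rescaling the raw inner product.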

Some of the state-of-the-art work in static KG completion, such as quaternion-based rotations QuatE Zhang et al. (2019), complex domain tensor factorization Trouillon et al. (2016), and also recent works on temporal KG completion like TNTComplEx Lacroix et al. (2020), use a scoring function similar to

$$g(h, r, t) = \mathrm{Re}\Big(\sum_{i=1}^{n} h_i\, r_i\, \bar{t}_i\Big) \qquad (5)$$

and motivate it by the fact that the optimisation function requires the scores to be purely real Trouillon et al. (2016).

However, it is interesting to note that this scoring method is in fact a special case of Equation 4. The following theorem proves the equivalence of the scoring function used in ComplEx to Equation 4 when $k = 2$. (Footnote 1: A similar theorem holds for quaternions when k=4; see Appendix.)

Theorem 1.

If $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{C}^n$ and $A, R, B \in \mathbb{R}^{n \times 2}$ are their equivalent matrix forms, then $\mathrm{Re}\big(\sum_{i} h_i r_i \bar{t}_i\big) = \langle A \circ R, B \rangle$, where $\circ$ acts row-wise as the 2-dimensional rotation (proof in Appendix).
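The equivalence stated in Theorem 1 can be checked numerically for the planar case. The sketch below compares the real part of the complex trilinear product with the trace inner product of the row-wise rotated head and the tail, where each complex coordinate a + bi is written as the real row (a, b). Helper names and the random test vectors are assumptions made for the example.

```python
import random

def complex_form_score(h, r, t):
    """Real part of sum_i h_i * r_i * conj(t_i) for complex vectors."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real

def to_rows(z):
    """Matrix form of a complex vector: one (real, imag) row per coordinate."""
    return [(c.real, c.imag) for c in z]

def rotate_rows(A, R):
    """Row-wise rotation and scaling of A by R (complex product per row)."""
    return [(a * c - b * d, a * d + b * c) for (a, b), (c, d) in zip(A, R)]

def inner(A, B):
    """Trace inner product of two matrices given as lists of rows."""
    return sum(x * y for row_a, row_b in zip(A, B) for x, y in zip(row_a, row_b))

rng = random.Random(0)

def rand_complex(n):
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]

h, r, t = rand_complex(5), rand_complex(5), rand_complex(5)
lhs = complex_form_score(h, r, t)
rhs = inner(rotate_rows(to_rows(h), to_rows(r)), to_rows(t))
```

The two quantities `lhs` and `rhs` agree up to floating-point error, since for one coordinate both sides expand to (ac - bd)e + (ad + bc)f with h = a + bi, r = c + di, and t = e + fi.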

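To fully define $g$ we also need to specify how the linear operator $Q$ is parameterized by $r$ and $\tau$. In the rest of the paper and experiments, we simply concatenate the relation and time embeddings to get $Q_{r,\tau} = [\mathbf{r} | \boldsymbol{\tau}]$, where $\mathbf{r}$ and $\boldsymbol{\tau}$ are the representations of the fact’s relation and time elements and $[\cdot | \cdot]$ denotes concatenation. Since in many real world TKGs there is a combination of static and dynamic facts, we also allow an extra rotation operator parametrized only by $r$, i.e., an extra term $\mathbf{r}_2$ in $Q$ to better represent static facts.

To summarize, our scoring function is defined as:

$$g(h, r, t, \tau) := \langle \mathbf{h} \circ [\mathbf{r} | \boldsymbol{\tau}] \circ \mathbf{r}_2,\ \mathbf{t} \rangle \qquad (6)$$

where $\mathbf{r} \in \mathbb{R}^{n_r \times k}$, $\boldsymbol{\tau} \in \mathbb{R}^{n_\tau \times k}$, $\mathbf{r}_2 \in \mathbb{R}^{n \times k}$, and $n_r + n_\tau = n$.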
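A minimal sketch of this scoring function for planar rotations (k = 2) is given below. It assumes, as the text describes, that the relation and time embeddings are concatenated into one operator and that a second, relation-only operator is applied as well; it further assumes that each row is stored as one Python complex number, so that the row-wise operators become complex multiplications. The name `chronor_score_k2` and the argument layout are illustrative, not the authors' implementation.

```python
def chronor_score_k2(h, r, tau, r2, t):
    """Score a quadruple with planar rotation-and-scaling operators.

    h, t : entity embeddings, lists of n complex numbers
    r    : time-dependent part of the relation embedding, n_r complex numbers
    tau  : timestamp embedding, n_tau complex numbers (n_r + n_tau == n)
    r2   : relation-only operator for static behaviour, n complex numbers
    """
    q = list(r) + list(tau)  # concatenation [r | tau]
    if not (len(q) == len(h) == len(r2) == len(t)):
        raise ValueError("embedding sizes must satisfy n_r + n_tau == n")
    # Re(x * conj(t)) equals the 2D dot product of x and t viewed as real rows.
    return sum(
        (hi * qi * si * ti.conjugate()).real
        for hi, qi, si, ti in zip(h, q, r2, t)
    )
```

With identity operators the score reduces to the squared norm of the head when head and tail coincide, and a tail that matches the rotated head receives a high score.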

5 Optimization

| Model | ICEWS14 MRR | ICEWS14 Hits@1 | ICEWS14 Hits@3 | ICEWS14 Hits@10 | ICEWS05-15 MRR | ICEWS05-15 Hits@1 | ICEWS05-15 Hits@3 | ICEWS05-15 Hits@10 | YAGO15K MRR | YAGO15K Hits@1 | YAGO15K Hits@3 | YAGO15K Hits@10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TransE (2013) | 28.0 | 9.4 | - | 63.70 | 29.4 | 8.4 | - | 66.30 | 29.6 | 22.8 | - | 46.8 |
| DistMult (2014) | 43.9 | 32.3 | - | 67.2 | 45.6 | 33.7 | - | 69.1 | 27.5 | 21.5 | - | 43.8 |
| SimplE (2018) | 45.8 | 34.1 | 51.6 | 68.7 | 47.8 | 35.9 | 53.9 | 70.8 | - | - | - | - |
| ComplEx (2016) | 47.0 | 35.0 | 54.0 | 71.0 | 49.0 | 37.0 | 55.0 | 73.0 | 36.0 | 29.0 | 36.0 | 54.0 |
| ConT (2018) | 18.5 | 11.7 | 20.5 | 31.50 | 16.4 | 10.5 | 18.9 | 27.20 | - | - | - | - |
| TTransE (2016) | 25.5 | 7.4 | - | 60.1 | 27.1 | 8.4 | - | 61.6 | 32.1 | 23.0 | - | 51.0 |
| TA-TransE (2018) | 27.5 | 9.5 | - | 62.5 | 29.9 | 9.6 | - | 66.8 | 32.1 | 23.1 | - | 51.2 |
| HyTE (2018) | 29.7 | 10.8 | 41.6 | 65.5 | 31.6 | 11.6 | 44.5 | 68.1 | - | - | - | - |
| TA-DistMult (2018) | 47.7 | - | 36.3 | 68.6 | 47.4 | 34.6 | - | 72.8 | 29.1 | 21.6 | - | 47.6 |
| DE-SimplE (2020) | 52.6 | 41.8 | 59.2 | 72.5 | 51.3 | 39.2 | 57.8 | 74.8 | - | - | - | - |
| TIMEPLEX (2020) | 60.40 | 51.50 | - | 77.11 | 63.99 | 54.51 | - | 81.81 | - | - | - | - |
| TNTComplEx (2020) | 60.72 | 51.91 | 65.92 | 77.17 | 66.64 | 58.34 | 71.82 | 81.67 | 35.94 | 28.49 | 36.84 | 53.75 |
| TeRo (concurrent work) | 56.2 | 46.8 | 62.1 | 73.2 | 58.6 | 46.9 | 66.8 | 79.5 | - | - | - | - |
| TeMP-SA (concurrent work) | 60.7 | 48.4 | 68.4 | 84.0 | 68.0 | 55.3 | 76.9 | 91.3 | - | - | - | - |
| ChronoR (k=3) | 59.39 | 49.64 | 65.40 | 77.30 | 68.41 | 61.06 | 73.01 | 82.13 | 36.50 | 29.16 | 37.63 | 53.53 |
| ChronoR (k=2) | 62.53 | 54.67 | 66.88 | 77.31 | 67.50 | 59.63 | 72.29 | 82.03 | 36.62 | 29.18 | 37.92 | 53.79 |

Table 1: Evaluation on the ICEWS14, ICEWS05-15, and YAGO15K datasets. Results reported for previous related works are the best numbers reported in their respective papers. (Footnote 2: The original results for TNTComplEx were reported on the validation set; we use the code and hyper-parameters from the official repository to re-run the model and report test set values.)

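Having an appropriate scoring function, one can model the likelihood of any $t' \in \mathcal{E}$ correctly answering the query $(h, r, ?, \tau)$ as:

$$P(t = t' \mid h, r, \tau) = \frac{\exp\big(g(h, r, t', \tau)\big)}{\sum_{t'' \in \mathcal{E}} \exp\big(g(h, r, t'', \tau)\big)} \qquad (7)$$

and similarly for $P(h = h' \mid r, t, \tau)$.

To learn appropriate model parameters, one can minimize, for each quadruple in the training set, the negative log-likelihood of correct prediction:

$$\mathcal{L}(\mathcal{K}; \theta) = \sum_{(h, r, t, \tau) \in \mathcal{K}} \Big( -\log P(t \mid h, r, \tau) - \log P(h \mid r, t, \tau) \Big) \qquad (8)$$

where $\theta$ represents all the model parameters.

Formulating the loss function following Equation 8 requires computing the denominator of Equation 7 for every fact in the training set of the temporal KG; however, it does not require generating negative samples Armandpour et al. (2019) and has been shown in practice to perform better when computationally feasible, as it is at the scale of our experiments.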
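The likelihood and loss described here amount to a softmax over the scores of all candidate entities followed by a negative log-likelihood of the correct one. A minimal sketch, assuming the scores for one query have already been computed for every entity and using the log-sum-exp trick for numerical stability (function names are illustrative):

```python
import math

def log_softmax(scores):
    """Log-probabilities of each candidate entity given its raw score."""
    m = max(scores)  # subtract the maximum before exponentiating for stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def nll(scores, target_index):
    """Negative log-likelihood of the correct entity for one query."""
    return -log_softmax(scores)[target_index]
```

Because the normalizer sums over every entity, no negative samples have to be drawn, at the cost of scoring all candidates for each training fact.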

5.1 Regularization

Various embedding methods use some form of regularization to improve the model’s generalizability to unseen facts and prevent overfitting to the training data. TNTComplEx Lacroix et al. (2020) treats the TKG as an order 3 tensor by unfolding the temporal and predicate mode together and adopts the regularization derived for static KGs in ComplEx Lacroix et al. (2018). Other methods, for example TIMEPLEX Jain et al. (2020), use a sample-weighted L2 regularization penalty to prevent overfitting.

We also use the tensor nuclear norm due to its connection to tensor nuclear rank Friedland and Lim (2018). However, we directly consider the TKG as an order 4 tensor, with the entity, relation, and time embeddings containing the rank-one coefficients of its decomposition. Based on this connection, we propose the following regularization:

$$\Omega_4(\theta) = \sum_{(h, r, t, \tau) \in \mathcal{K}} \Big( \|\mathbf{h}\|_4^4 + \|\mathbf{r}_2\|_4^4 + \|[\mathbf{r} | \boldsymbol{\tau}]\|_4^4 + \|\mathbf{t}\|_4^4 \Big) \qquad (9)$$

We empirically compare different regularizations in Section 6 and show that $\Omega_4$ outperforms other methods. We provide the theorems required to derive Equation 9 in the Appendix.
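As a rough illustration of a 4-norm penalty of this kind, the sketch below sums the fourth powers of the entries of each factor involved in one training fact. It assumes an entrywise 4-norm over real-valued matrices given as lists of rows; the exact norm used by the authors (for example, whether it is taken over row moduli) is not recoverable from the text, so `norm4_pow4` and `omega4` are hypothetical.

```python
def norm4_pow4(matrix):
    """Fourth power of the entrywise 4-norm of a matrix given as a list of rows."""
    return sum(abs(x) ** 4 for row in matrix for x in row)

def omega4(h, rel_time, r2, t):
    """Penalty for one fact: head, concatenated relation-time operator,
    relation-only operator, and tail each contribute their 4-norm to the 4th power."""
    return norm4_pow4(h) + norm4_pow4(rel_time) + norm4_pow4(r2) + norm4_pow4(t)
```

Compared with a squared L2 penalty, the fourth power penalizes large entries much more strongly while leaving small entries almost untouched.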

5.2 Temporal Regularization

In addition, one would like the model to take advantage of the fact that most entities behave smoothly over time. We can capture this smoothness property of real datasets by encouraging the model to learn similar transformations for closer timestamps. Hence, following Lacroix et al. (2020) and Sarkar and Moore (2006), we add a temporal smoothness objective to our loss function:

$$\Lambda_4(\theta) = \frac{1}{|\mathcal{T}| - 1} \sum_{i=1}^{|\mathcal{T}| - 1} \|\boldsymbol{\tau}_{i+1} - \boldsymbol{\tau}_i\|_4^4 \qquad (10)$$

Tuning the hyper-parameter $\lambda_2$, the weight of the temporal smoothness term, is related to the scale of $\Lambda_4$ and the other components of the loss function, and finding an appropriate $\lambda_2$ can become difficult in practice if each component follows a different scale. Since we use the 4-norm in the regularizer $\Omega_4$, we also use the 4-norm for $\Lambda_4$. Similar phenomena have been previously explored in other domains. For example, in Belloni et al. (2014) the authors propose the square-root Lasso, where they use the square root of the MSE component to match the scale of the 1-norm used in the sparsity regularization component of Lasso Tibshirani (1996), and show improvements in handling the unknown scale in badly behaved systems.

We saw that in practice using the same order makes it easier to tune the hyperparameters $\lambda_1$ and $\lambda_2$ in the overall loss.
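A temporal smoothness term of this kind can be sketched as the average 4-norm (to the fourth power) of the difference between embeddings of consecutive timestamps. The sketch assumes the time embeddings are given as flat real vectors ordered by timestamp; the function name and the averaging over the number of consecutive pairs are assumptions of this example.

```python
def temporal_smoothness(time_embeddings):
    """Average of ||tau_{i+1} - tau_i||_4^4 over consecutive timestamps.

    time_embeddings: list of equally sized real vectors, ordered by time.
    """
    if len(time_embeddings) < 2:
        return 0.0
    total = 0.0
    for prev, nxt in zip(time_embeddings, time_embeddings[1:]):
        total += sum(abs(b - a) ** 4 for a, b in zip(prev, nxt))
    return total / (len(time_embeddings) - 1)
```

Embeddings that stay constant over time incur no penalty, while a single abrupt jump between neighbouring timestamps dominates the sum.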

5.3 Loss Function

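To learn the representations for any TKG $\mathcal{K}$, the final training objective is to minimize:

$$\mathcal{L}(\mathcal{K}; \theta) + \lambda_1 \Omega_4(\theta) + \lambda_2 \Lambda_4(\theta) \qquad (11)$$

where $\mathcal{L}$ is the negative log-likelihood of Equation 8, $\Omega_4$ is the regularizer of Equation 9, and $\Lambda_4$ is the temporal smoothness objective of Equation 10. The first and the second terms encourage an accurate estimation of the edges in the TKG and the third term incorporates the temporal smoothness behaviour.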

In the next section, we provide empirical experiments and compare ChronoR to various baselines.

6 Experiments

We evaluate our proposed model for temporal link prediction on temporal knowledge graphs. We tune all the hyper-parameters using a grid search and each dataset’s provided validation set, including the regularization weights $\lambda_1$ and $\lambda_2$ and the ratio $n_r / n_\tau$ of the relation and time embedding sizes. For a fair comparison, we do not tune the embedding dimension; instead, in each experiment we choose $n$ such that our models have an equal number of parameters to those used in Lacroix et al. (2020). Table 3, in the Appendix, shows the dimensions used by each model for each of the datasets.

Training was done using mini-batch stochastic gradient descent with AdaGrad and a learning rate of 0.1 with a batch size of 1000 quadruples. We implemented all our models in PyTorch and trained on a single GeForce RTX 2080 GPU. The source code to reproduce the full experimental results will be made public on GitHub.
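For readers unfamiliar with AdaGrad, the update it performs can be sketched in a few lines: each parameter keeps a running sum of its squared gradients, and the step size is divided by the square root of that sum. The sketch below is a plain-Python illustration with the learning rate of 0.1 mentioned above; the function name and the `eps` constant are assumptions of this example, and in PyTorch the same role is played by `torch.optim.Adagrad`.

```python
import math

def adagrad_step(params, grads, accum, lr=0.1, eps=1e-10):
    """One in-place AdaGrad update on flat lists of parameters and gradients.

    accum holds the running sum of squared gradients per parameter.
    """
    for i, g in enumerate(grads):
        accum[i] += g * g
        params[i] -= lr * g / (math.sqrt(accum[i]) + eps)
    return params, accum
```

Parameters that receive large or frequent gradients therefore get smaller effective step sizes over time, which is convenient for embeddings of entities that occur with very different frequencies.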

6.1 Datasets

We evaluate our model on three popular benchmarks for temporal knowledge graph completion, namely ICEWS14, ICEWS05-15, and YAGO15K. All datasets contain only positive triples. The first two datasets are subsets of the Integrated Crisis Early Warning System (ICEWS), which is a very popular knowledge graph used by the community. ICEWS14 is collected from 01/01/2014 to 12/31/2014, while ICEWS05-15 is the subset occurring between 01/01/2005 and 12/31/2015. Both datasets have timestamps for every fact with a temporal granularity of 24 hours. It is worth mentioning that these datasets are selected such that they only include the most frequently occurring entities (in both head and tail). Below are examples from ICEWS14:


(John Kerry, Praise or endorse, Lawmaker (Iraq), 2014-10-18)
(Iraq, Receive deployment of peacekeepers, Iran, 2014-07-05)
(Japan, Engage in negotiation, South Korea, 2014-02-18)

To create YAGO15K, Garcia-Duran et al. (2018) aligned the entities in FB15K Bordes et al. (2013) with those from YAGO, which contains temporal information. The final dataset is the result of all facts with successful alignment. It is worth noting that since YAGO does not have temporal information for all facts, this dataset is also temporally incomplete and more challenging. Below are examples from this dataset (Footnote 3: Some strings shortened due to space.):


(David_Beckham, isAffiliatedTo, Man_U)
(David_B, isAffiliatedTo, Paris_SG)
(David_B, isMarriedTo, Victoria_Beckham, occursSince, "1999-##-##")

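| | ICEWS14 | ICEWS05-15 | YAGO15K |
|---|---|---|---|
| Entities | 7,128 | 10,488 | 15,403 |
| Relations | 230 | 251 | 34 |
| Timestamps | 365 | 4,017 | 198 |
| Facts | 90,730 | 479,329 | 138,056 |
| Time Span | 2014 | 2005 - 2015 | 1513 - 2017 |

Table 2: Statistics for the various experimental datasets.

| Model | ICEWS14 | ICEWS05-15 | YAGO15K |
|---|---|---|---|
| ChronoR k=2 | 1600 | 1350 | 1900 |
| ChronoR k=3 | 800 | 700 | 950 |

Table 3: The embedding dimension (n) for each dataset used in our experiments.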

To adapt YAGO15K to our model, following Lacroix et al. (2020), for each fact we group the relations occursSince/occursUntil together, in turn doubling our relation size. Note that this does not affect the evaluation protocol. Table 2 summarizes the statistics of the temporal KG benchmarks used.

Figure 2: Comparison of various regularizers with different weights on a ChronoR(k=2) trained on ICEWS14.

6.2 Evaluation Metrics and Baselines

We follow the experimental set-up described in Garcia-Duran et al. (2018) and Goel et al. (2020). For each quadruple $(h, r, t, \tau)$ in the test set, we fill $(h, r, ?, \tau)$ and $(?, r, t, \tau)$ by scoring and sorting all possible entities in $\mathcal{E}$. We report Hits@k for $k = 1, 3, 10$ and filtered Mean Reciprocal Rank (MRR) for all datasets. Please see Nickel et al. (2016) for more details about filtered MRR.
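The filtered ranking metrics can be sketched as follows. In the filtered setting, other entities that are known to be correct answers for the same query are removed from the candidate list before the rank of the target is computed. The function names and the dictionary-based score format are assumptions of this example, and ties are broken optimistically here.

```python
def filtered_rank(scores, target, known_true):
    """Rank of the target entity after removing other known-correct answers.

    scores     : dict mapping each candidate entity to its score
    target     : the entity that completes the test quadruple
    known_true : other entities that also form true facts for this query
    """
    target_score = scores[target]
    better = sum(
        1
        for entity, s in scores.items()
        if entity != target and entity not in known_true and s > target_score
    )
    return 1 + better

def mean_reciprocal_rank(ranks):
    """Average of 1 / rank over all test queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at(ranks, k):
    """Fraction of test queries whose target is ranked within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)
```

For temporal KGs the set `known_true` would be built per query from the facts sharing the same head (or tail), relation, and timestamp.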

We use baselines from both static and temporal KG embedding models. From the static KG embedding models, we use TransE, DistMult, SimplE, and ComplEx. These models ignore the timing information. It is worth noting that when evaluating these models on temporal KGs in the filtered setting, for each test quadruple, one must filter previously seen entities according to the fact and its time stamp, for a fair comparison.

To the best of our knowledge, we compare against every previously published temporal KG embedding model that has been evaluated on these datasets, the details of which we discussed in Section 2.

6.3 Results

In this section we analyze and perform a quantitative comparison of our model and previous state-of-the-art ones. We also experimentally verify the advantage of using Equation 9 for learning temporal embeddings.

Table 1 compares link prediction performance on all datasets. ChronoR consistently outperforms all competitors in terms of link prediction MRR and is greater than or equal to the previous work in terms of the Hits@10 metric.

Our experiments with rotations in 3 dimensions show an improvement on ICEWS05-15, but lower performance compared to planar rotations on the other two datasets. We believe this is due to the more complex nature of this dataset (the higher number of relations and timestamps) compared to YAGO15K and ICEWS14. We do not see any significant gain on these three datasets using higher dimensional rotations. Similar to the observations in some static KG benchmarks Toutanova and Chen (2015), this might suggest the need for more sophisticated datasets. However, we leave further study of these datasets for future work.

In Figure 2, we plot a detailed comparison of our proposed regularizer $\Omega_4$ to $\Omega_3$, the regularizer used in TNTComplEx Lacroix et al. (2020). $\Omega_3$ is a variational form of the nuclear 3-norm and is based on folding the TKG (as a 4-tensor) on its relation and temporal axis to get an order 3 tensor.

We derive $\Omega_4$ by directly linking the scoring function to the 4-tensor factorization and show that it is the natural regularizer to use when penalizing by the tensor nuclear norm. Note that $\Omega_4$ increases MRR by 2 points, and carefully selecting the regularization weight can increase MRR by up to 7 points.

7 Conclusion

We propose a novel k-dimensional rotation based embedding model to learn useful representations from temporal knowledge graphs. Our method takes into account the change in entities and relations with respect to time. The temporal dynamics of both subject and object entities are captured by transforming them in the embedding space through rotation and scaling operations. Our work generalizes and adapts prior rotation based models in static KGs to the temporal domain. Moreover, we highlight and establish previously unexplored connections between prior scoring and regularization functions. Experimentally, we showed that ChronoR provides state-of-the-art performance on a variety of benchmark temporal KGs and that it can model the dynamics of temporal and relational patterns while being very economical in the number of its parameters. In future work, we will investigate combining other geometrical transformations and rotations, explore other regularization techniques, and closely examine the current temporal datasets as discussed in the experiment section.


This work is partially supported by NSF under IIS Award #1526753 and DARPA under Award #FA8750-18-2-0014 (AIDA/GAIA).


  • I. Abdelaziz, A. Fokoue, O. Hassanzadeh, P. Zhang, and M. Sadoghi (2017) Large-scale structural and textual similarity-based mining of knowledge graph to predict drug–drug interactions. Journal of Web Semantics 44, pp. 104–117. Cited by: §1.
  • C. C. Aggarwal, A. Hinneburg, and D. A. Keim (2001) On the surprising behavior of distance metrics in high dimensional space. In International conference on database theory, pp. 420–434. Cited by: §4.2.
  • A. Aguilera and R. Pérez-Aguila (2004) General n-dimensional rotations. Cited by: §4.1.
  • N. M. Ahmed, L. Chen, Y. Wang, B. Li, Y. Li, and W. Liu (2016) Sampling-based algorithm for link prediction in temporal networks. Information Sciences 374, pp. 1–14. Cited by: §2.2.
  • N. M. Ahmed and L. Chen (2016) An efficient algorithm for link prediction in temporal uncertain social networks. Information Sciences 331, pp. 120–136. Cited by: §2.2.
  • M. Armandpour, P. Ding, J. Huang, and X. Hu (2019) Robust negative sampling for network embedding. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 3191–3198. Cited by: §5.
  • S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives (2007) Dbpedia: a nucleus for a web of open data. In The semantic web, Cited by: §1.
  • A. Belloni, V. Chernozhukov, L. Wang, et al. (2014) Pivotal estimation via square-root lasso in nonparametric regression. The Annals of Statistics 42 (2), pp. 757–788. Cited by: §5.2.
  • J. Biega, E. Kuzey, and F. M. Suchanek (2013) Inside yago2s: a transparent information extraction architecture. In Proceedings of the 22nd International Conference on World Wide Web, Cited by: §1.
  • G. Blog (2012) Introducing the Knowledge Graph: things, not strings. Cited by: §1.
  • K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor (2008) Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, Cited by: §1, §1.
  • A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko (2013) Translating embeddings for modeling multi-relational data. In Advances in neural information processing systems, Cited by: §1, §1, §2.1, §6.1.
  • A. Burgun and O. Bodenreider (2001) Comparing terms, concepts and semantic classes in WordNet and the Unified Medical Language System. In Proceedings of the NAACL 2001 Workshop "WordNet and Other Lexical Resources: Applications, Extensions and Customizations", Cited by: §1.
  • Y. Cao, X. Wang, X. He, Z. Hu, and T. Chua (2019) Unifying knowledge graph learning and recommendation: towards a better understanding of user preferences. In The world wide web conference, Cited by: §1.
  • A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr, and T. M. Mitchell (2010) Toward an architecture for never-ending language learning.. In Aaai, Cited by: §1.
  • S. S. Dasgupta, S. N. Ray, and P. Talukdar (2018) HyTE: hyperplane-based temporally aware knowledge graph embedding. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Cited by: §1, §2.2.
  • C. Esteban, V. Tresp, Y. Yang, S. Baier, and D. Krompaß (2016) Predicting the co-evolution of event and knowledge graphs. In 2016 19th International Conference on Information Fusion (FUSION), pp. 98–105. Cited by: §2.2.
  • O. Etzioni, A. Fader, J. Christensen, S. Soderland, and M. Mausam (2011) Open information extraction: the second generation. In IJCAI, Cited by: §1.
  • S. Friedland and L. Lim (2018) Nuclear norm of higher-order tensors. Mathematics of Computation. Cited by: §5.1.
  • A. Garcia-Duran, S. Dumančić, and M. Niepert (2018) Learning sequence encoders for temporal knowledge graph completion. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 4816–4821. Cited by: §1, §2.2, §2.2, §6.1, §6.2.
  • R. Goel, S. Kazemi, M. A. Brubaker, and P. Poupart (2020) Diachronic embedding for temporal knowledge graph completion. In Thirty-Fourth AAAI Conference on Artificial Intelligence, Cited by: §2.2, §6.2.
  • İ. Güneş, Ş. Gündüz-Öğüdücü, and Z. Çataltepe (2016) Link prediction using time series of neighborhood-based node similarity scores. Data Mining and Knowledge Discovery 30 (1), pp. 147–180. Cited by: §2.2.
  • Z. Han, Y. Wang, Y. Ma, S. Günnemann, and V. Tresp (2020) The graph Hawkes network for reasoning on temporal knowledge graphs. arXiv preprint arXiv:2003.13432. Cited by: §2.2.
  • Y. Hao, Y. Zhang, K. Liu, S. He, Z. Liu, H. Wu, and J. Zhao (2017) An end-to-end model for question answering over knowledge base with cross-attention combining global knowledge. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 221–231. Cited by: §1.
  • J. Hoffart, F. M. Suchanek, K. Berberich, and G. Weikum (2013) YAGO2: a spatially and temporally enhanced knowledge base from Wikipedia. Artificial Intelligence. Cited by: §1, §1.
  • N. M. A. Ibrahim and L. Chen (2015) Link prediction in dynamic social networks by integrating different types of information. Applied Intelligence 42 (4), pp. 738–750. Cited by: §2.2.
  • P. Jain, S. Rathi, S. Chakrabarti, et al. (2020) Temporal knowledge base completion: new algorithms and evaluation protocols. arXiv preprint arXiv:2005.05035. Cited by: §2.2, §5.1.
  • G. Ji, S. He, L. Xu, K. Liu, and J. Zhao (2015) Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Vol. 1, pp. 687–696. Cited by: §2.1.
  • T. Jiang, T. Liu, T. Ge, L. Sha, S. Li, B. Chang, and Z. Sui (2016) Encoding temporal information for time-aware link prediction. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2350–2354. Cited by: §2.2.
  • S. M. Kazemi and R. Goel (2020) Representation learning for dynamic graphs: a survey. Journal of Machine Learning Research 21, pp. 1–73. Cited by: §2.2.
  • T. Lacroix, G. Obozinski, and N. Usunier (2020) Tensor decompositions for temporal knowledge base completion. ICLR. Cited by: §2.2, §4.2, §5.1, §5.2, §6.1, §6.3, §6.
  • T. Lacroix, N. Usunier, and G. Obozinski (2018) Canonical tensor decomposition for knowledge base completion. arXiv preprint arXiv:1806.07297. Cited by: §5.1.
  • J. Leblay and M. W. Chekol (2018) Deriving validity time in knowledge graph. In Companion of The Web Conference 2018, Cited by: §1.
  • D. Liben-Nowell and J. Kleinberg (2007) The link-prediction problem for social networks. Journal of the American society for information science and technology 58 (7), pp. 1019–1031. Cited by: §2.2.
  • Y. Lin, Z. Liu, M. Sun, Y. Liu, and X. Zhu (2015) Learning entity and relation embeddings for knowledge graph completion. In Twenty-ninth AAAI conference on artificial intelligence, Cited by: §1, §2.1.
  • Y. Ma, V. Tresp, and E. A. Daxberger (2019) Embedding models for episodic knowledge graphs. Journal of Web Semantics 59, pp. 100490. Cited by: §2.2.
  • F. Mahdisoltani, J. Biega, and F. M. Suchanek (2013) YAGO3: a knowledge base from multilingual Wikipedias. In CIDR, Cited by: §1.
  • B. Moradabadi and M. R. Meybodi (2017) A novel time series link prediction method: learning automata approach. Physica A: Statistical Mechanics and its Applications 482, pp. 422–432. Cited by: §2.2.
  • M. Nickel, L. Rosasco, and T. Poggio (2016) Holographic embeddings of knowledge graphs. In Thirtieth AAAI conference on artificial intelligence, Cited by: §6.2.
  • A. Sadeghian, M. Armandpour, P. Ding, and D. Z. Wang (2019a) DRUM: end-to-end differentiable rule mining on knowledge graphs. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Cited by: §1.
  • A. Sadeghian, S. Minaee, I. Partalas, X. Li, D. Z. Wang, and B. Cowan (2019b) Hotel2vec: learning attribute-aware hotel embeddings with self-supervision. arXiv preprint arXiv:1910.03943. Cited by: §2.2.
  • A. Sadeghian, M. Rodriguez, D. Z. Wang, and A. Colas (2016) Temporal reasoning over event knowledge graphs. Cited by: §2.2.
  • S. Sang, Z. Yang, L. Wang, X. Liu, H. Lin, and J. Wang (2018) SemaTyP: a knowledge graph based literature mining method for drug discovery. BMC bioinformatics 19 (1), pp. 193. Cited by: §1.
  • P. Sarkar and A. W. Moore (2006) Dynamic social network analysis using latent space models. In Advances in Neural Information Processing Systems, pp. 1145–1152. Cited by: §5.2.
  • P. Sarkar, S. M. Siddiqi, and G. J. Gordon (2007) A latent space approach to dynamic embedding of co-occurrence data. In Artificial Intelligence and Statistics, pp. 420–427. Cited by: §2.2.
  • U. Sharan and J. Neville (2008) Temporal-relational classifiers for prediction in evolving domains. In 2008 Eighth IEEE International Conference on Data Mining, pp. 540–549. Cited by: §2.2.
  • Z. Sun, Z. Deng, J. Nie, and J. Tang (2019) RotatE: knowledge graph embedding by relational rotation in complex space. arXiv preprint arXiv:1902.10197. Cited by: §1, §2.1, §4, §4.
  • R. Tibshirani (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58 (1), pp. 267–288. Cited by: §5.2.
  • K. Toutanova and D. Chen (2015) Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, pp. 57–66. Cited by: §6.3.
  • T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, and G. Bouchard (2016) Complex embeddings for simple link prediction. Cited by: §1, §2.1, §4.2, §4.2.
  • Z. Wang, J. Zhang, J. Feng, and Z. Chen (2014) Knowledge graph embedding by translating on hyperplanes. In Twenty-Eighth AAAI conference on artificial intelligence, Cited by: §2.1.
  • J. Wu, M. Cao, J. C. K. Cheung, and W. L. Hamilton (2020) TeMP: temporal message passing for temporal knowledge graph completion. arXiv preprint arXiv:2010.03526. Cited by: §2.2.
  • C. Xiong, R. Power, and J. Callan (2017) Explicit semantic ranking for academic search via knowledge graph embedding. In Proceedings of the 26th international conference on world wide web, pp. 1271–1279. Cited by: §1.
  • C. Xu, M. Nayyeri, F. Alkhoury, J. Lehmann, and H. S. Yazdi (2019) Temporal knowledge graph embedding model based on additive time series decomposition. arXiv preprint arXiv:1911.07893. Cited by: §2.2.
  • C. Xu, M. Nayyeri, F. Alkhoury, H. S. Yazdi, and J. Lehmann (2020) TeRo: a time-aware knowledge graph embedding via temporal rotation. arXiv preprint arXiv:2010.01029. Cited by: §2.2.
  • B. Yang and T. Mitchell (2017) Leveraging knowledge bases in LSTMs for improving machine reading. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Cited by: §1.
  • B. Yang, W. Yih, X. He, J. Gao, and L. Deng (2014) Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575. Cited by: §1, §1, §2.1.
  • L. Yao, L. Wang, L. Pan, and K. Yao (2016) Link prediction based on common-neighbors for dynamic social network. Procedia Computer Science 83, pp. 82–89. Cited by: §2.2.
  • A. Yates, M. Banko, M. Broadhead, M. J. Cafarella, O. Etzioni, and S. Soderland (2007) TextRunner: open information extraction on the web. In NAACL-HLT, Cited by: §1.
  • F. Zhang, N. J. Yuan, D. Lian, X. Xie, and W. Ma (2016) Collaborative knowledge base embedding for recommender systems. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 353–362. Cited by: §1.
  • S. Zhang, Y. Tay, L. Yao, and Q. Liu (2019) Quaternion knowledge graph embeddings. In Advances in Neural Information Processing Systems, pp. 2735–2745. Cited by: §1, §2.1, §4.2, §4.
  • X. Zhou, A. Sadeghian, and D. Z. Wang (2019) Mining rules incrementally over large knowledge bases. In Proceedings of the 2019 SIAM International Conference on Data Mining, pp. 154–162. Cited by: §1.
  • A. Zimek, E. Schubert, and H. Kriegel (2012) A survey on unsupervised outlier detection in high-dimensional numerical data. Statistical Analysis and Data Mining: The ASA Data Science Journal 5 (5), pp. 363–387. Cited by: §4.2.