1 Introduction
Knowledge Graphs (KGs) organize information around entities (people, countries, organizations, movies, etc.) in the form of factual triplets, where each triplet represents how two entities are related to each other, for example (Washington DC, capitalOf, USA).
There exists an ever-growing number of publicly available KGs, for example DBPedia Auer et al. (2007), Freebase Bollacker et al. (2008), Google Knowledge Graph Blog (2012), NELL Carlson et al. (2010), OpenIE Yates et al. (2007); Etzioni et al. (2011), YAGO Biega et al. (2013); Hoffart et al. (2013); Mahdisoltani et al. (2013), and UMLS Burgun and Bodenreider (2001). This structured way of representing knowledge makes it easy for computers to digest and utilize in various applications. For example, KGs are used in recommender systems Cao et al. (2019); Zhang et al. (2016), the medical domain Sang et al. (2018); Abdelaziz et al. (2017), question answering Hao et al. (2017), information retrieval Xiong et al. (2017), and natural language processing Yang and Mitchell (2017).
Though these graphs are continuously growing, they remain particularly incomplete. Knowledge Base Completion (KBC) focuses on finding/predicting missing relations between entities. A wide range of work explores various methods of finding extraction errors or completing KGs Bordes et al. (2013); Yang et al. (2014); Trouillon et al. (2016); Sadeghian et al. (2019a); Zhou et al. (2019); Sun et al. (2019); Zhang et al. (2019).
Different events and actions cause relations and entities to evolve over time. For example, Figure 1 illustrates a timeline of relations and events happening in 2014 involving entities like Obama (president of the USA at the time), NATO, Afghanistan, and Russia. Here, an agreement between Obama and Afghanistan is happening concurrently with NATO increasing armed forces in Afghanistan, after Obama made a visit to NATO. A few months later, after NATO makes some pessimistic comments towards Russia and Obama wants to de-escalate the conflict, it appears that he provides military aid to NATO, and in turn, NATO provides military aid in Afghanistan.
Temporally-aware KGs are graphs that add a fourth dimension, namely time t, giving the fact a temporal context. Temporal KGs are designed to capture this temporal information and the dynamic nature of real-world facts. While many of the previous works study knowledge graph completion on static KGs, little attention has been given to temporally-aware KGs. Though recent work has begun to solve the temporal link prediction task, these models often utilize a large number of parameters, making them difficult to train Garcia-Duran et al. (2018); Dasgupta et al. (2018); Leblay and Chekol (2018). Furthermore, many use inadequate datasets such as YAGO2 Hoffart et al. (2013), which is sparse in the time domain, or a time-augmented version of Freebase Bollacker et al. (2008), where time is appended to some existing facts.
One of the most popular approaches to the link prediction task involves embedding the KG, that is, mapping entities and relations in the KG to high dimensional vectors, in which each entity and relation mapping considers the structure of the graph as constraints Bordes et al. (2013); Yang et al. (2014); Lin et al. (2015). These techniques have proven to be the state of the art in modeling static knowledge graphs and inferring new facts from the KG based on the existing ones. Similarly, temporal knowledge graph embedding methods learn an additional mapping for time. Methods differ in how they map the elements of the knowledge graph and in their scoring function.
In this paper, we propose ChronoR, a novel temporal link prediction model based on k-dimensional rotation. We formulate the link prediction problem as learning representations for entities in the knowledge graph and a rotation operator based on each fact’s relation and temporal elements. Further, we show that the proposed scoring function is a generalization of previously used scoring functions for the static KG link prediction task. We also provide insights into the regularization used in other similar models and propose a new regularization method inspired by tensor nuclear norms. Our experiments on available benchmarks confirm its advantages and show that ChronoR outperforms previous state-of-the-art methods.
The rest of the paper is organized as follows: Section 2 covers the relevant previous work, Section 3 formally defines the temporal knowledge graph completion problem, Section 4 presents the proposed model’s details and learning procedure, Section 5 details the loss and regularization functions, Section 6 discusses the experimental setup and results, and Section 7 concludes our work.
2 Related Work
2.1 Static KG Embeddings
There has been a substantial amount of research in KG embedding in the non-temporal domain. One of the earliest models, TransE Bordes et al. (2013), is a translational distance-based model, which embeds the entities h and t, along with relation r, and maps them through the function: h + r = t. There have been several extensions to TransE, including TransH Wang et al. (2014), which models relations as hyperplanes; TransR Lin et al. (2015), which embeds entities and relations in separate spaces; and TransD Ji et al. (2015), in which two vectors represent each element in a triple in order to represent the elements and construct mapping matrices. Other works, such as DistMult Yang et al. (2014), represent relations as bilinear functions. ComplEx Trouillon et al. (2016) extends DistMult to the complex space. RotatE Sun et al. (2019) also embeds the entities in the complex space, and treats relations as planar rotations. QuatE Zhang et al. (2019) embeds each element using quaternions.
2.2 Temporal Embeddings
There has been a wide range of approaches to the problem of temporal link prediction Kazemi and Goel (2020). A straightforward technique is to ignore the timestamps and make a static KB by aggregating links across different times Liben-Nowell and Kleinberg (2007), then learn a static embedding for each entity. There have been several attempts to improve along this direction by giving more weight to the links that are more recent Sharan and Neville (2008); Ibrahim and Chen (2015); Ahmed and Chen (2016); Ahmed et al. (2016). In contrast to these methods, Yao et al. (2016) first learn embeddings for each snapshot of the KB, then aggregate the embeddings by using a weighted average of them. Several techniques have been proposed for the choice of weights of the embedding aggregation, for instance, based on ARIMA Güneş et al. (2016); Moradabadi and Meybodi (2017).
Some other works extend sequence models to TKGs. Sarkar et al. (2007) employ a Kalman filter to learn dynamic node embeddings. Garcia-Duran et al. (2018) use recurrent neural nets (RNNs) to accommodate temporal data and extend DistMult and TransE to TKGs. For each relation, a temporal embedding is learned by feeding time characters and the static relation embedding to an LSTM. This method only learns dynamic embeddings for relations, not entities. Furthermore, Han et al. (2020) utilize a temporal point process parameterized by a deep neural architecture.
In one of the earliest works to employ representation learning techniques for reasoning over TKGs, Sadeghian et al. (2016) proposed both an embedding method and rule mining methods. Another related work, tTransE Jiang et al. (2016), learns time-based embeddings indirectly by learning the order of time-sensitive relations, such as wasBornIn followed by diedIn. Esteban et al. (2016) also impose temporal order constraints on their data by adding an element to their quadruple (s,p,o,t:Bool), where Bool indicates if the fact vanishes or continues after time t. However, the model is only demonstrated on medical and sensory data.
Inspired by the success of diachronic word embeddings, some methods have tried to extend them to the TKG problem Garcia-Duran et al. (2018); Dasgupta et al. (2018). Diachronic methods map every (node, timestamp) or (relation, timestamp) pair to a hidden representation. Goel et al. (2020) learn dynamic embeddings by masking a fraction of the embedding weights with an activation function of frequencies, and Xu et al. (2019) embed the vectors as a direct function of time. Two concurrent temporal reasoning methods, TeRo Xu et al. (2020) and TeMP Wu et al. (2020), are also included in the empirical comparison table in Section 6.
Other methods, like Ma et al. (2019); Sadeghian et al. (2019b); Jain et al. (2020); Lacroix et al. (2020), do not evolve the embedding of entities over time. Instead, they learn the temporal behavior by using a representation for time. For instance, Ma et al. (2019) change the scoring function based on the time embedding and Lacroix et al. (2020) perform tensor decomposition based on the time representation.
3 Problem Definition
In this section, we formally define the problem of temporal knowledge graph completion and specify the notations used throughout the rest of the paper.
We represent scalars with lower case letters $a$, vectors and matrices with bold lower case letters $\mathbf{a}$, higher order tensors with bold upper case letters $\mathbf{A}$, and the $i^{th}$ element of a vector $\mathbf{a}$ as $a_i$. We use $\circ$ to denote the element-wise product of two vectors and $[\mathbf{A} \,|\, \mathbf{B}]$ to denote matrix or vector concatenation. We denote the complex norm as $|\cdot|$ and $\|\cdot\|_p$ denotes the vector p-norm; we drop $p$ when $p = 2$.
A Temporal Knowledge Graph (TKG) is a set of quadruples $\mathcal{K} = \{(h, r, t, \tau) \mid h, t \in \mathcal{E},\ r \in \mathcal{R},\ \tau \in \mathcal{T}\}$. Each quadruple represents a temporal fact that is true in a world. $\mathcal{E}$ is the set of all entities and $\mathcal{R}$ is the set of all relations in the ontology. The fourth element in each quadruple represents time, which is often discretized. $\mathcal{T}$ represents the set of all possible time stamps.
Temporal Knowledge Graph Completion (temporal link prediction) refers to the problem of completing a TKG by inferring facts from a given subset of its facts. In this work, we focus on predicting temporal facts within the observed set $\mathcal{T}$, as opposed to the more general problem which also involves forecasting future facts.
4 Temporal KG Representation Learning
In this section, we present a framework for temporal knowledge graph representation learning. Given a TKG, we want to learn representations for entities, relations, and timestamps (e.g., ) and a scoring function , such that true quadruples receive high scores. Thus, given , the embeddings can be learned by optimizing an appropriate cost function.
Many of the previous works use a variety of different objectives and linear/nonlinear operators to adapt static KG completion’s scoring functions to the scoring function in the temporal case (see Section 2).
One example, RotatE Sun et al. (2019) learns embeddings by requiring each triplet’s head to fall close to its tail once transformed by an (elementwise) rotation parametrized by the relation vector: , where . Thus, RotatE defines , for some .
Our model is inspired by the success of rotationbased models in static KG completion Sun et al. (2019); Zhang et al. (2019). For example, to carry out a rotation by an angle in the two dimensional space, one can use the well known Euler’s formula , and the fact that if is represented by its dual complex form , then . We represent rotation by angle in the 2dimensional space with .
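As a small numerical illustration of the rotation argument above, the following Python sketch rotates a point with Euler's formula and checks it against the ordinary 2×2 rotation matrix. It also includes a RotatE-style distance for reference. The function names are hypothetical and chosen only for this example; nothing here is taken from the authors' code.

```python
import cmath
import math


def rotate_complex(x, y, theta):
    """Rotate the point (x, y) by theta radians via complex multiplication."""
    # Euler's formula: e^{i*theta} = cos(theta) + i*sin(theta)
    z = complex(x, y) * cmath.exp(1j * theta)
    return z.real, z.imag


def rotate_matrix(x, y, theta):
    """The same rotation written with the 2x2 rotation matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y


def rotate_distance(head, relation_phases, tail):
    """RotatE-style distance: rotate each head coordinate, then compare to the tail."""
    rotated = [h * cmath.exp(1j * p) for h, p in zip(head, relation_phases)]
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(rotated, tail)))
```

Rotating (1, 0) by π/2 with either function gives approximately (0, 1), and the distance is zero exactly when the tail equals the rotated head.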
4.1 ChronoR
In this paper, we consider a subset of the group of general linear transformations over the k-dimensional real space, consisting of rotation and scaling, and parametrize the transformation by both time and relation. Intuitively, we expect that for true facts:
(1) $\mathbf{Q}_{r,\tau}(\mathbf{h}) = \mathbf{t}$
where $\mathbf{h}, \mathbf{t} \in \mathbb{R}^{n \times k}$ and $\mathbf{Q}_{r,\tau}$ represents the (row-wise) linear operator in k-dimensional space, parametrized by $r$ and $\tau$. Note that any orthogonal matrix $\mathbf{Q} \in \mathbb{R}^{k \times k}$ (i.e., $\mathbf{Q}^{\top}\mathbf{Q} = \mathbf{Q}\mathbf{Q}^{\top} = \mathbf{I}$) is equivalent to a k-dimensional rotation. However, we relax the unit norm constraint, thus extending rotations to a subset of linear operators which also includes scaling.
As previous work has noted, for 2 or 3 dimensional rotations, one can represent $\mathbf{Q}$ using complex numbers and quaternions ($\mathbb{H}$), respectively. Higher dimensional transformations can also be constructed using the Aguilera-Perez Algorithm Aguilera and Pérez-Aguila (2004) followed by a scalar multiplication.
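To make the idea of rotation plus scaling concrete for the three-dimensional case, here is a minimal sketch using quaternions. It assumes the standard construction q v q* for rotating a 3-vector v: with a unit quaternion this is a pure rotation, and dropping the unit-norm constraint scales the result by |q|². The helper names are illustrative, and this is not claimed to be the parametrization used in the experiments.

```python
import math


def qmul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def qconj(q):
    """Quaternion conjugate."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def transform(q, v):
    """Apply q * v * conj(q) to a 3-vector v (rotation, plus scaling if |q| != 1)."""
    _, x, y, z = qmul(qmul(q, (0.0, *v)), qconj(q))
    return (x, y, z)


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(c * c for c in v))
```

For example, the unit quaternion (cos π/4, 0, 0, sin π/4) rotates (1, 0, 0) by a quarter turn about the z-axis; multiplying that quaternion by 2 applies the same rotation and stretches the vector by a factor of 4.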
4.2 Scoring Function
Unlike RotatE, which uses a scoring function based on the Euclidean distance between the transformed head and the tail embeddings, we propose to use the angle between the two vectors.
Our motivation comes from observations in a variety of previous works Aggarwal et al. (2001); Zimek et al. (2012) showing that in higher dimensions, the Euclidean norm suffers from the curse of dimensionality and is not a good measure of proximity or similarity between vectors. We use the following well-known definition of inner product to define the angle between them.
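Definition 1.
If $\mathbf{A}$ and $\mathbf{B}$ are matrices in $\mathbb{R}^{n \times k}$ we define:
(2) $\langle \mathbf{A}, \mathbf{B} \rangle := \mathrm{tr}(\mathbf{A}\mathbf{B}^{\top})$
(3) $\cos(\theta) := \dfrac{\langle \mathbf{A}, \mathbf{B} \rangle}{\sqrt{\langle \mathbf{A}, \mathbf{A} \rangle \, \langle \mathbf{B}, \mathbf{B} \rangle}}$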
Based on the above definition, the angle between two matrices is proportional to their inner product. Hence, we define our scoring function as:
(4) 
That is, we expect the of the relative angle between and to be higher (i.e., their angle close to ) when is a true fact in the TKG.
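The inner product and angle used above can be illustrated numerically. The sketch below assumes the standard Frobenius inner product, tr(ABᵀ), and the usual cosine derived from it; the function names are hypothetical.

```python
import math


def inner(a, b):
    """Frobenius inner product tr(A B^T): the sum of element-wise products."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def cosine(a, b):
    """Cosine of the angle between two matrices of the same shape."""
    return inner(a, b) / math.sqrt(inner(a, a) * inner(b, b))
```

Two identical matrices give a cosine of 1, orthogonal ones give 0, and a matrix and its negation give -1, so a higher value means the transformed head points in nearly the same direction as the tail.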
Some of the state of the art work in static KG completion, such as quaternionbased rotations QuatE Zhang et al. (2019), complex domain tensor factorization Trouillon et al. (2016), and also recent works on temporal KG completion like TNTComplEx Lacroix et al. (2020), use a scoring function similar to
(5)  
and motivate it by the fact that the optimisation function requires the scores to be purely real Trouillon et al. (2016).
However, it is interesting to note that this scoring method is in fact a special case of Equation 4. The following theorem proves the equivalence of the scoring function used in ComplEx to Equation 4 when k = 2. (A similar theorem holds for quaternions when k = 4; see the Appendix.)
Theorem 1.
If and are their equivalent matrix forms, then (proof in Appendix)
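The equivalence stated in the theorem can be checked numerically for the planar case. The sketch below assumes the usual ComplEx-style score, Re(Σ h_i r_i conj(t_i)), and compares it with an inner product over real rows in which every complex number a + bi becomes the row (a, b) and each relation entry acts as a 2×2 rotation-and-scaling matrix. Function names are illustrative.

```python
def complex_score(h, r, t):
    """ComplEx-style score: real part of sum_i h_i * r_i * conj(t_i)."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


def matrix_score(h, r, t):
    """The same score with each complex number written as a row (re, im) in R^2."""
    total = 0.0
    for hi, ri, ti in zip(h, r, t):
        # Row-wise rotation and scaling of (hi.real, hi.imag) by the matrix of ri:
        # [[ri.real, -ri.imag], [ri.imag, ri.real]]
        x = ri.real * hi.real - ri.imag * hi.imag
        y = ri.imag * hi.real + ri.real * hi.imag
        # Inner product of the transformed head row with the tail row.
        total += x * ti.real + y * ti.imag
    return total
```

Both functions return the same value up to floating-point error for any inputs of equal length, which is the k = 2 instance of the statement.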
To fully define the scoring function we also need to specify how the linear operator is parameterized by the relation and time. In the rest of the paper and experiments, we simply concatenate the relation and time embeddings, i.e., the representations of the fact’s relation and time elements, to obtain the operator. Since many real-world TKGs contain a combination of static and dynamic facts, we also allow an extra rotation operator parametrized only by the relation, i.e., an extra term in the operator to better represent static facts.
To summarize, our scoring function is defined as:
(6) 
where , and .
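One plausible reading of the scoring function summarized above, for planar rotations (k = 2), is sketched below. It assumes that the relation and time embeddings are concatenated into a single operator, that a second relation-only factor models static facts, and that the operators act by element-wise complex multiplication. These are assumptions made for illustration; the exact form of Equation 6 should be taken from the paper itself, and `chronor_like_score` is a hypothetical name.

```python
def chronor_like_score(h, r, tau, r_static, t):
    """Score Re(sum_i h_i * [r | tau]_i * r_static_i * conj(t_i)) for complex embeddings.

    h, t and r_static have length n; r and tau are concatenated to length n.
    """
    op = list(r) + list(tau)  # concatenation of relation and time embeddings
    if not (len(h) == len(t) == len(op) == len(r_static)):
        raise ValueError("embedding sizes do not match")
    return sum(
        hi * oi * si * ti.conjugate()
        for hi, oi, si, ti in zip(h, op, r_static, t)
    ).real
```

Setting every entry of `r_static` to 1 removes the static term, which leaves a score that depends only on the concatenated relation and time operator.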
5 Optimization
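Table 2: Link prediction performance (MRR and Hits@k) on ICEWS14, ICEWS05-15, and YAGO15K. A dash marks a value that is not reported.

| Model | ICEWS14 MRR | ICEWS14 Hits@1 | ICEWS14 Hits@3 | ICEWS14 Hits@10 | ICEWS05-15 MRR | ICEWS05-15 Hits@1 | ICEWS05-15 Hits@3 | ICEWS05-15 Hits@10 | YAGO15K MRR | YAGO15K Hits@1 | YAGO15K Hits@3 | YAGO15K Hits@10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TransE (2013) | 28.0 | 9.4 | - | 63.70 | 29.4 | 8.4 | - | 66.30 | 29.6 | 22.8 | - | 46.8 |
| DistMult (2014) | 43.9 | 32.3 | - | 67.2 | 45.6 | 33.7 | - | 69.1 | 27.5 | 21.5 | - | 43.8 |
| SimplE (2018) | 45.8 | 34.1 | 51.6 | 68.7 | 47.8 | 35.9 | 53.9 | 70.8 | - | - | - | - |
| ComplEx (2016) | 47.0 | 35.0 | 54.0 | 71.0 | 49.0 | 37.0 | 55.0 | 73.0 | 36.0 | 29.0 | 36.0 | 54.0 |
| ConT (2018) | 18.5 | 11.7 | 20.5 | 31.50 | 16.4 | 10.5 | 18.9 | 27.20 | - | - | - | - |
| TTransE (2016) | 25.5 | 7.4 | - | 60.1 | 27.1 | 8.4 | - | 61.6 | 32.1 | 23.0 | - | 51.0 |
| TA-TransE (2018) | 27.5 | 9.5 | - | 62.5 | 29.9 | 9.6 | - | 66.8 | 32.1 | 23.1 | - | 51.2 |
| HyTE (2018) | 29.7 | 10.8 | 41.6 | 65.5 | 31.6 | 11.6 | 44.5 | 68.1 | - | - | - | - |
| TA-DistMult (2018) | 47.7 | - | 36.3 | 68.6 | 47.4 | 34.6 | - | 72.8 | 29.1 | 21.6 | - | 47.6 |
| DE-SimplE (2020) | 52.6 | 41.8 | 59.2 | 72.5 | 51.3 | 39.2 | 57.8 | 74.8 | - | - | - | - |
| TIMEPLEX (2020) | 60.40 | 51.50 | - | 77.11 | 63.99 | 54.51 | - | 81.81 | - | - | - | - |
| TNTComplEx (2020) | 60.72 | 51.91 | 65.92 | 77.17 | 66.64 | 58.34 | 71.82 | 81.67 | 35.94 | 28.49 | 36.84 | 53.75 |
| TeRo (concurrent work) | 56.2 | 46.8 | 62.1 | 73.2 | 58.6 | 46.9 | 66.8 | 79.5 | - | - | - | - |
| TeMP-SA (concurrent work) | 60.7 | 48.4 | 68.4 | 84.0 | 68.0 | 55.3 | 76.9 | 91.3 | - | - | - | - |
| ChronoR (k=3) | 59.39 | 49.64 | 65.40 | 77.30 | 68.41 | 61.06 | 73.01 | 82.13 | 36.50 | 29.16 | 37.63 | 53.53 |
| ChronoR (k=2) | 62.53 | 54.67 | 66.88 | 77.31 | 67.50 | 59.63 | 72.29 | 82.03 | 36.62 | 29.18 | 37.92 | 53.79 |
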
Having an appropriate scoring function, one can model the likelihood of any , correctly answering the query as:
(7) 
and similarly for .
To learn appropriate model parameters, one can minimize, for each quadruple in the training set, the negative loglikelihood of correct prediction:
(8) 
where represents all the model parameters.
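A minimal sketch of this likelihood and loss, assuming the likelihood is a softmax over the scores of all candidate entities (an assumption consistent with the reference to the denominator of Equation 7, but not spelled out in the surviving text). The function names are hypothetical.

```python
import math


def log_softmax(scores):
    """Log of the softmax of a list of scores, computed stably."""
    m = max(scores)  # subtract the maximum to avoid overflow in exp
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def nll(scores, target):
    """Negative log-likelihood of the correct entity at index `target`."""
    return -log_softmax(scores)[target]
```

Here `scores` would hold the score of every candidate entity for one query, so no negative sampling is involved; the same function is applied once for the tail query and once for the head query of each training quadruple.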
Formulating the loss function following Equation 8 requires computing the denominator of Equation 7 for every fact in the training set of the temporal KG; however, it does not require generating negative samples Armandpour et al. (2019) and has been shown in practice to perform better when it is computationally feasible, as it is at the scale of our experiments.
5.1 Regularization
Various embedding methods use some form of regularization to improve the model’s generalizability to unseen facts and prevent overfitting to the training data. TNTComplEx Lacroix et al. (2020) treats the TKG as an order 3 tensor by unfolding the temporal and predicate modes together and adopts the regularization derived for static KGs in ComplEx Lacroix et al. (2018). Other methods, for example TIMEPLEX Jain et al. (2020), use a sample-weighted L2 regularization penalty to prevent overfitting.
We also use the tensor nuclear norm due to its connection to tensor nuclear rank Friedland and Lim (2018). However, we directly consider the TKG as an order 4 tensor with containing the rankone coefficients of its decomposition, and where are the tensors containing entity, relation and time embeddings. Based on this connection, we propose the following regularization:
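As an illustration only, a 4-norm penalty of this general kind can be sketched as below. The sketch assumes the penalty is the sum of the fourth powers of the entries' magnitudes over the embeddings that participate in a fact; the precise grouping of factors in Equation 9 is not reproduced here, and the function names are hypothetical.

```python
def norm4_pow4(vec):
    """||v||_4^4 = sum_i |v_i|^4, valid for real or complex entries."""
    return sum(abs(v) ** 4 for v in vec)


def embedding_penalty(*factors):
    """Sum of 4-norm penalties over the embeddings that take part in one fact."""
    return sum(norm4_pow4(f) for f in factors)
```

In training, such a penalty would typically be averaged over the facts in a batch and multiplied by a regularization weight before being added to the loss.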
5.2 Temporal Regularization
In addition, one would like the model to take advantage of the fact that most entities behave smoothly over time. We can capture this smoothness property of real datasets by encouraging the model to learn similar transformations for closer timestamps. Hence, following Lacroix et al. (2020) and Sarkar and Moore (2006), we add a temporal smoothness objective to our loss function:
(10) 
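A temporal smoothness penalty of this type can be sketched as follows, assuming it averages the p-th power of the p-norm of the difference between embeddings of consecutive timestamps (p = 4 to match the 4-norm discussed in this section). The exact normalization in Equation 10 may differ; `temporal_smoothness` is a hypothetical name.

```python
def temporal_smoothness(time_embeddings, p=4):
    """Average of ||tau_{i+1} - tau_i||_p^p over consecutive timestamps."""
    pairs = list(zip(time_embeddings, time_embeddings[1:]))
    if not pairs:
        return 0.0  # fewer than two timestamps: nothing to smooth
    total = sum(abs(a - b) ** p for cur, nxt in pairs for a, b in zip(nxt, cur))
    return total / len(pairs)
```

The penalty is zero when all timestamps share the same embedding and grows quickly when neighbouring timestamps are mapped far apart.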
Tuning the weight of the temporal smoothness term depends on its scale relative to the other components of the loss function, and finding an appropriate value can become difficult in practice if each component follows a different scale. Since we use the 4-norm in the embedding regularizer, we also use the 4-norm for the temporal smoothness term. Similar phenomena have been previously explored in other domains. For example, Belloni et al. (2014) propose the square-root Lasso, where they use the square root of the MSE component to match the scale of the 1-norm used in the sparsity regularization component of Lasso Tibshirani (1996), and show improvements in handling the unknown scale in badly behaved systems.
We saw that in practice using the same order makes it easier to tune the two regularization hyperparameters in the training objective.
5.3 Loss Function
To learn the representations for any TKG , the final training objective is to minimize:
(11) 
where the first and the second terms encourage an accurate estimation of the edges in the TKG and the third term incorporates the temporal smoothness behaviour.
In the next section, we provide empirical experiments and compare ChronoR to various other benchmarks.
6 Experiments
We evaluate our proposed model for temporal link prediction on temporal knowledge graphs. We tune all the hyperparameters using a grid search and each dataset’s provided validation set. We tune and from and the ratio of from with increments. For a fair comparison, we do not tune the embedding dimension; instead, in each experiment we choose such that our models have an equal number of parameters to those used in Lacroix et al. (2020). Table 3, in the Appendix, shows the dimensions used by each model for each of the datasets.
Training was done using mini-batch stochastic gradient descent with AdaGrad and a learning rate of 0.1 with a batch size of 1000 quadruples. We implemented all our models in PyTorch and trained on a single GeForce RTX 2080 GPU. The source code to reproduce the full experimental results will be made public on GitHub.
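For readers unfamiliar with AdaGrad, the update it applies to each parameter can be sketched in a few lines of plain Python. This is the textbook rule with the learning rate of 0.1 quoted above, not the authors' training loop; `adagrad_step` is a hypothetical helper.

```python
import math


def adagrad_step(params, grads, accum, lr=0.1, eps=1e-10):
    """One AdaGrad update; `accum` holds the running sum of squared gradients."""
    for i, g in enumerate(grads):
        accum[i] += g * g
        # Each parameter gets its own step size, shrinking as gradients accumulate.
        params[i] -= lr * g / (math.sqrt(accum[i]) + eps)
    return params, accum
```

For instance, minimizing f(x) = x² from x = 1 (gradient 2x), the first step moves x to about 0.9, and later steps become smaller as the accumulated squared gradient grows.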
6.1 Datasets
We evaluate our model on three popular benchmarks for temporal knowledge graph completion, namely ICEWS14, ICEWS05-15, and YAGO15K. All datasets contain only positive triples. The first two datasets are subsets of the Integrated Crisis Early Warning System (ICEWS), which is a very popular knowledge graph used by the community. ICEWS14 is collected from 01/01/2014 to 12/31/2014, while ICEWS05-15 is the subset occurring between 01/01/2005 and 12/31/2015. Both datasets have timestamps for every fact with a temporal granularity of 24 hours. It is worth mentioning that these datasets are selected such that they only include the most frequently occurring entities (in both head and tail). Below are examples from ICEWS14:
(John Kerry, Praise or endorse, Lawmaker (Iraq), 2014-10-18)
(Iraq, Receive deployment of peacekeepers, Iran, 2014-07-05)
(Japan, Engage in negotiation, South Korea, 2014-02-18)
To create YAGO15K, Garcia-Duran et al. (2018) aligned the entities in FB15K Bordes et al. (2013) with those from YAGO, which contains temporal information. The final dataset is the result of all facts with successful alignment. It is worth noting that since YAGO does not have temporal information for all facts, this dataset is also temporally incomplete and more challenging. Below are examples from this dataset (some strings shortened due to space):
(David_Beckham, isAffiliatedTo, Man_U)
(David_B, isAffiliatedTo, Paris_SG)
(David_B, isMarriedTo, Victoria_Beckham, occursSince, "1999-##-##")
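Table 1: Statistics of the datasets.

| | ICEWS14 | ICEWS05-15 | YAGO15K |
|---|---|---|---|
| Entities | 7,128 | 10,488 | 15,403 |
| Relations | 230 | 251 | 34 |
| Timestamps | 365 | 4,017 | 198 |
| Facts | 90,730 | 479,329 | 138,056 |
| Time Span | 2014 | 2005-2015 | 1513-2017 |
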
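Table 3: Embedding dimensions used by each model for each dataset.

| | ICEWS14 | ICEWS05-15 | YAGO15K |
|---|---|---|---|
| ChronoR k=2 | 1600 | 1350 | 1900 |
| ChronoR k=3 | 800 | 700 | 950 |
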
6.2 Evaluation Metrics and Baselines
We follow the experimental setup described in Garcia-Duran et al. (2018) and Goel et al. (2020). For each quadruple in the test set, we fill in the missing tail and the missing head in turn by scoring and sorting all possible entities. We report Hits@k for k = 1, 3, 10 and filtered Mean Reciprocal Rank (MRR) for all datasets. Please see Nickel et al. (2016) for more details about filtered MRR.
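The filtered ranking metrics can be sketched as follows. The sketch assumes the common convention that, when ranking the correct entity, all other entities known to complete the query correctly are removed from the candidate list, and that ties are broken optimistically. Names are illustrative.

```python
def filtered_rank(scores, target, known_true):
    """Rank of `target` after removing other entities known to be correct.

    `scores` maps every candidate entity to its score for one query.
    """
    target_score = scores[target]
    better = sum(
        1
        for entity, score in scores.items()
        if entity != target and entity not in known_true and score > target_score
    )
    return better + 1


def mrr(ranks):
    """Mean reciprocal rank over a list of ranks."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at(ranks, k):
    """Fraction of ranks that are at most k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

For a temporal KG, `known_true` would contain the entities that complete the same query at the same timestamp in the training, validation, or test data.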
We use baselines from both static and temporal KG embedding models. From the static KG embedding models, we use TransE, DistMult, SimplE, and ComplEx. These models ignore the timing information. It is worth noting that when evaluating these models on temporal KGs in the filtered setting, for each test quadruple, one must filter previously seen entities according to the fact and its time stamp, for a fair comparison.
To the best of our knowledge, we compare against every previously published temporal KG embedding model that has been evaluated on these datasets; the details of these models are discussed in Section 2.
6.3 Results
In this section, we perform a quantitative comparison of our model with previous state-of-the-art models. We also experimentally verify the advantage of using Equation 9 for learning temporal embeddings.
Table 2 demonstrates link prediction performance comparison on all datasets. ChronoR consistently outperforms all competitors in terms of link prediction MRR and is greater than or equal to the previous work in terms of Hits@10 metric.
Our experiments with rotations in 3 dimensions show an improvement on ICEWS05-15, but lower performance compared to planar rotations on the other two datasets. We believe this is due to the more complex nature of this dataset (the higher number of relations and timestamps) compared to YAGO15K and ICEWS14. We do not see any significant gain on these three datasets using higher dimensional rotations. Similar to the observations in some static KG benchmarks Toutanova and Chen (2015), this might suggest the need for more sophisticated datasets. However, we leave further study of these datasets for future work.
In Figure 2, we plot a detailed comparison of our proposed regularizer to the regularizer used in TNTComplEx Lacroix et al. (2020). The latter is a variational form of the nuclear 3-norm and is based on folding the TKG (as a 4-tensor) on its relation and temporal axes to get an order 3 tensor.
We derive our regularizer by directly linking the scoring function to the 4-tensor factorization and show that it is the natural regularizer to use when penalizing by tensor nuclear norm. Note that the proposed regularizer increases MRR by 2 points and carefully selecting the regularization weight can increase MRR by up to 7 points.
7 Conclusion
We propose a novel k-dimensional rotation-based embedding model to learn useful representations from temporal knowledge graphs. Our method takes into account the change in entities and relations with respect to time. The temporal dynamics of both subject and object entities are captured by transforming them in the embedding space through rotation and scaling operations. Our work generalizes and adapts prior rotation-based models in static KGs to the temporal domain. Moreover, we highlight and establish previously unexplored connections between prior scoring and regularization functions. Experimentally, we showed that ChronoR provides state-of-the-art performance on a variety of benchmark temporal KGs and that it can model the dynamics of temporal and relational patterns while being very economical in the number of its parameters. In future work, we will investigate combining other geometrical transformations and rotations, explore other regularization techniques, and closely examine the current temporal datasets as discussed in the experiment section.
Acknowledgments
This work is partially supported by NSF under IIS Award #1526753 and DARPA under Award #FA87501820014 (AIDA/GAIA).
References
Abdelaziz et al. (2017). Large-scale structural and textual similarity-based mining of knowledge graph to predict drug–drug interactions. Journal of Web Semantics 44, pp. 104–117.
Aggarwal et al. (2001). On the surprising behavior of distance metrics in high dimensional space. In International conference on database theory, pp. 420–434.
Aguilera and Pérez-Aguila (2004). General n-dimensional rotations.
Ahmed et al. (2016). Sampling-based algorithm for link prediction in temporal networks. Information Sciences 374, pp. 1–14.
Ahmed and Chen (2016). An efficient algorithm for link prediction in temporal uncertain social networks. Information Sciences 331, pp. 120–136.
Armandpour et al. (2019). Robust negative sampling for network embedding. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 3191–3198.
Auer et al. (2007). DBpedia: a nucleus for a web of open data. In The semantic web.
Belloni et al. (2014). Pivotal estimation via square-root lasso in nonparametric regression. The Annals of Statistics 42 (2), pp. 757–788.
Biega et al. (2013). Inside YAGO2s: a transparent information extraction architecture. In Proceedings of the 22nd International Conference on World Wide Web.
Blog (2012). Introducing the Knowledge Graph: things, not strings.
Bollacker et al. (2008). Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data.
Bordes et al. (2013). Translating embeddings for modeling multi-relational data. In Advances in neural information processing systems.
Burgun and Bodenreider (2001). Comparing terms, concepts and semantic classes in WordNet and the Unified Medical Language System. In Proceedings of the NAACL’2001 Workshop, “WordNet and Other Lexical Resources: Applications, Extensions and Customizations”.
Cao et al. (2019). Unifying knowledge graph learning and recommendation: towards a better understanding of user preferences. In The world wide web conference.
Carlson et al. (2010). Toward an architecture for never-ending language learning. In AAAI.
Dasgupta et al. (2018). HyTE: hyperplane-based temporally aware knowledge graph embedding. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.
Esteban et al. (2016). Predicting the co-evolution of event and knowledge graphs. In 2016 19th International Conference on Information Fusion (FUSION), pp. 98–105.
Etzioni et al. (2011). Open information extraction: the second generation. In IJCAI.
Friedland and Lim (2018). Nuclear norm of higher-order tensors. Mathematics of Computation.
Garcia-Duran et al. (2018). Learning sequence encoders for temporal knowledge graph completion. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 4816–4821.
Goel et al. (2020). Diachronic embedding for temporal knowledge graph completion. In Thirty-Fourth AAAI Conference on Artificial Intelligence.
Güneş et al. (2016). Link prediction using time series of neighborhood-based node similarity scores. Data Mining and Knowledge Discovery 30 (1), pp. 147–180.
Han et al. (2020). The graph Hawkes network for reasoning on temporal knowledge graphs. arXiv preprint arXiv:2003.13432.
Hao et al. (2017). An end-to-end model for question answering over knowledge base with cross-attention combining global knowledge. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 221–231.
Hoffart et al. (2013). YAGO2: a spatially and temporally enhanced knowledge base from Wikipedia. Artificial Intelligence.
Ibrahim and Chen (2015). Link prediction in dynamic social networks by integrating different types of information. Applied Intelligence 42 (4), pp. 738–750.
Jain et al. (2020). Temporal knowledge base completion: new algorithms and evaluation protocols. arXiv preprint arXiv:2005.05035.
Ji et al. (2015). Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Vol. 1, pp. 687–696.
Jiang et al. (2016). Encoding temporal information for time-aware link prediction. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2350–2354.
Kazemi and Goel (2020). Representation learning for dynamic graphs: a survey. Journal of Machine Learning Research 21, pp. 1–73.
Lacroix et al. (2020). Tensor decompositions for temporal knowledge base completion. ICLR.
Lacroix et al. (2018). Canonical tensor decomposition for knowledge base completion. arXiv preprint arXiv:1806.07297.
Leblay and Chekol (2018). Deriving validity time in knowledge graph. In Companion of The Web Conference 2018.
Liben-Nowell and Kleinberg (2007). The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology 58 (7), pp. 1019–1031.
Lin et al. (2015). Learning entity and relation embeddings for knowledge graph completion. In Twenty-ninth AAAI conference on artificial intelligence.
Ma et al. (2019). Embedding models for episodic knowledge graphs. Journal of Web Semantics 59, pp. 100490.
Mahdisoltani et al. (2013). YAGO3: a knowledge base from multilingual Wikipedias. In CIDR.
Moradabadi and Meybodi (2017). A novel time series link prediction method: learning automata approach. Physica A: Statistical Mechanics and its Applications 482, pp. 422–432.
Nickel et al. (2016). Holographic embeddings of knowledge graphs. In Thirtieth AAAI conference on artificial intelligence.
Sadeghian et al. (2019a). DRUM: end-to-end differentiable rule mining on knowledge graphs. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32.
Sadeghian et al. (2019b). Hotel2vec: learning attribute-aware hotel embeddings with self-supervision. arXiv preprint arXiv:1910.03943.
Sadeghian et al. (2016). Temporal reasoning over event knowledge graphs.
Sang et al. (2018). SemaTyP: a knowledge graph based literature mining method for drug discovery. BMC Bioinformatics 19 (1), pp. 193.
Sarkar and Moore (2006). Dynamic social network analysis using latent space models. In Advances in Neural Information Processing Systems, pp. 1145–1152.
Sarkar et al. (2007). A latent space approach to dynamic embedding of co-occurrence data. In Artificial Intelligence and Statistics, pp. 420–427.
Sharan and Neville (2008). Temporal-relational classifiers for prediction in evolving domains. In 2008 Eighth IEEE International Conference on Data Mining, pp. 540–549.
Sun et al. (2019). RotatE: knowledge graph embedding by relational rotation in complex space. arXiv preprint arXiv:1902.10197.
Tibshirani (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58 (1), pp. 267–288.
Toutanova and Chen (2015). Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, pp. 57–66.
Trouillon et al. (2016). Complex embeddings for simple link prediction.
Wang et al. (2014). Knowledge graph embedding by translating on hyperplanes. In Twenty-Eighth AAAI conference on artificial intelligence.
Wu et al. (2020). TeMP: temporal message passing for temporal knowledge graph completion. arXiv preprint arXiv:2010.03526.
Xiong et al. (2017). Explicit semantic ranking for academic search via knowledge graph embedding. In Proceedings of the 26th international conference on world wide web, pp. 1271–1279.
Xu et al. (2019). Temporal knowledge graph embedding model based on additive time series decomposition. arXiv preprint arXiv:1911.07893.
Xu et al. (2020). TeRo: a time-aware knowledge graph embedding via temporal rotation. arXiv preprint arXiv:2010.01029.
Yang and Mitchell (2017). Leveraging knowledge bases in LSTMs for improving machine reading. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics.
Yang et al. (2014). Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575.
Yao et al. (2016). Link prediction based on common-neighbors for dynamic social network. Procedia Computer Science 83, pp. 82–89.
Yates et al. (2007). TextRunner: open information extraction on the web. In NAACL-HLT.
Zhang et al. (2016). Collaborative knowledge base embedding for recommender systems. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 353–362.
Zhang et al. (2019). Quaternion knowledge graph embeddings. In Advances in Neural Information Processing Systems, pp. 2735–2745.
Zhou et al. (2019). Mining rules incrementally over large knowledge bases. In Proceedings of the 2019 SIAM International Conference on Data Mining, pp. 154–162.
Zimek et al. (2012). A survey on unsupervised outlier detection in high-dimensional numerical data. Statistical Analysis and Data Mining: The ASA Data Science Journal 5 (5), pp. 363–387.