Knowledge graphs (KGs), consisting of triples in the form of (head entity, relationship, tail entity), are effective data structures for representing factual knowledge and lie at the core of many downstream tasks, e.g., question answering (Zhang et al., 2018; Lukovnikov et al., 2017; Huang et al., 2019) and web search (Paulheim, 2017). Although KGs enable powerful relational reasoning, they are usually incomplete. As such, inferring new facts from existing ones in a KG, known as KG completion, is one of the most important tasks in KG research. Typical KGs represent facts without temporal information, which is sufficient under some circumstances (Bordes et al., 2013; Yang et al., 2014; Trouillon et al., 2016). By additionally associating each triple with a timestamp, such as (Obama, visit, China, 2014), temporal knowledge graphs (TKGs) are able to capture temporal dynamics. TKGs are usually assumed to consist of discrete timestamps (Jiang et al., 2016b) and can be represented as a sequence of static KG snapshots. The task of inferring missing facts across these snapshots is referred to as temporal knowledge graph completion (TKGC). To tackle TKGC, two avenues of work have been explored. The first line of models induces time-dependent representations with time-agnostic decoding functions, extending static KGC methods to capture temporal dynamics (Dasgupta et al., 2018; Goel et al., 2020). The second category of methods adopts spatial-temporal models, which leverage graph neural networks (GNNs) to capture intra-graph structural information and inter-graph temporal dependencies (Wu et al., 2020). We argue that there are still several areas for improvement. First, previous methods do not explicitly formulate the incremental learning problem, in which changes (additions and deletions) of historical information become available incrementally, and the model is expected to adapt to the changes while maintaining its knowledge of historical facts. Naively, one might fine-tune the TKGC model with all available data at each new time step using gradient descent optimization. This, however, causes the model's performance on the historical task to degrade quickly, a phenomenon known as catastrophic forgetting (McCloskey and Cohen, 1989; Xu et al., 2020), which usually occurs because the model loses track of the key static features derived from earlier data. Second, previous methods usually only assess overall link prediction metrics such as Hits@10 and Mean Reciprocal Rank (MRR) while omitting the dynamic aspects of TKG performance. There is an absence of metrics that evaluate how well a model forgets deleted facts. For example, the quadruple (Trump, presidentOf, US, 2020) is no longer true in 2021; hence we would like the model to rank Biden higher than Trump given the query (?, presidentOf, US, 2021). We argue that this is an essential measure of a model's effectiveness in modeling the temporal dynamics of TKGs. Third, as discussed in Section 3, previous TKGC methods (Dasgupta et al., 2018; Goel et al., 2020) conduct training and evaluation once across all time steps. This does not satisfy the scalability and training-efficiency requirements of real-world KG applications, where millions of entities and relations are frequently updated (Vashishth et al., 2020; Ahrabian et al., 2020).
We introduce a new task, incremental TKGC, and propose TIE, a training and evaluation framework that integrates incremental learning with TKGC. TIE combines TKG representation learning, experience replay, and temporal regularization to improve model performance and alleviate catastrophic forgetting. To measure TKGC models' ability to discern facts that were true in the past but are false at present, we propose new evaluation metrics dubbed Deleted Facts Hits@10 (DF) and Reciprocal Rank Difference (RRD). To this end, we explicitly associate deleted quadruples with negative labels and integrate them into the training process, which improves both metrics compared to baseline methods. Finally, we show that training using only the added facts significantly improves training speed and reduces the dataset size by around ten times while maintaining a ranking performance similar to that of vanilla fine-tuning methods. We adapt HyTE (Dasgupta et al., 2018) and DE (Goel et al., 2020), two existing TKGC models, to the incremental learning task on the wikidata12k and YAGO11k datasets. Experimental results demonstrate that the proposed TIE framework reduces training time by about ten times and improves some of the proposed metrics compared to full-batch training, without a significant loss in any traditional measure. Extensive ablation studies reveal the performance trade-offs among different evaluation metrics, providing insights for choosing among model variations.
2. Related Work
2.1. Temporal KG Completion
Existing TKGC methods can be broadly categorized into three lines of work. The first line uses shallow encoders with time-sensitive decoding functions to extend static KGC methods (Jiang et al., 2016a; Dasgupta et al., 2018; Goel et al., 2020; Xu et al., 2019). For example, HyTE (Dasgupta et al., 2018) constrains entity and relation embeddings so that triples are scored in a separate hyperplane for each timestamp. The second line of methods uses spatiotemporal models, which leverage graph neural networks (GNNs) to capture intra-graph neighborhood information and temporal recurrence or attention mechanisms to capture temporal information (Wu et al., 2020; Jin et al., 2020; Sankar et al., 2020). The third line of methods leverages temporal point processes to handle continuous-time prediction in TKGs (Trivedi et al., 2017, 2019; Han et al., 2020). However, this line of work is orthogonal to ours, as its focus is the extrapolation task in TKGs, which aims at predicting future interactions among entities and relations. In our work, we aim to provide an efficient incremental learning framework for TKGC; hence we focus on the shallow embedding methods.
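The hyperplane projection at the heart of HyTE can be sketched as follows; the embedding dimensionality and the L1 distance score below are illustrative choices, not the paper's exact configuration:

```python
import numpy as np

def project(v: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Project embedding v onto the hyperplane with unit normal w: v - (w . v) w."""
    return v - np.dot(w, v) * w

def hyte_score(h, r, t, w_tau) -> float:
    """TransE-style distance on timestamp tau's hyperplane (lower = more plausible)."""
    hp, rp, tp = project(h, w_tau), project(r, w_tau), project(t, w_tau)
    return float(np.linalg.norm(hp + rp - tp, ord=1))

rng = np.random.default_rng(0)
d = 8  # toy embedding dimension
h, r, t = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
w = rng.normal(size=d)
w /= np.linalg.norm(w)  # unit normal for one timestamp's hyperplane
print(hyte_score(h, r, t, w))
```

Each timestamp learns its own normal vector, so the same entity embedding is scored differently on different snapshots.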
2.2. Incremental Learning
As knowledge graphs evolve, more graph snapshots become available. However, deep learning models suffer from catastrophic forgetting when existing models are incrementally fine-tuned on the newly available data (Kirkpatrick et al., 2017; Castro et al., 2018). Various incremental learning techniques have been introduced to combat this issue for deep learning models. Our work is closely related to experience replay and regularization-based methods. Experience replay, also referred to as reservoir sampling, retains an additional set of the most representative historical data. Rehearsal methods (Rebuffi et al., 2017; Chaudhry et al., 2019b; Isele and Cosgun, 2018; Prabhu et al., 2020) explicitly maintain a pool of historical data when training the model on new tasks. One of the earliest methods, iCaRL (Rebuffi et al., 2017), sets a fixed number of samples for each task and selects the samples that best approximate the feature mean of each class. Constrained optimization methods also belong to this category. Previous work (Lopez-Paz and Ranzato, 2017; Chaudhry et al., 2019a) exploits the stored samples to project the gradient of the current task's loss onto a desired region. The objective is to ensure that the loss on the historical samples does not increase after training on the current task. This is equivalent to projecting the gradients of the current data onto a direction that aligns with the gradients of the previous data. Regularization-based approaches consolidate previous knowledge by introducing regularization terms in the loss when learning on new data (Kirkpatrick et al., 2017; Castro et al., 2018; Yang et al., 2019; Zenke et al., 2017). More recent work has explored applying incremental learning techniques to training deep graph neural networks. GraphSAIL (Xu et al., 2020) tackles the forgetting issue of GNN-based recommendation systems using knowledge distillation at both the node and graph levels.
ER-GNN (Zhou et al., 2020) proposes node importance metrics and selects the most influential nodes in the graph as reservoir data. The model is fine-tuned on the new data as well as the selected nodes during training. A more relevant work (Song and Park, 2018) applies a regularization-based method to enrich embeddings in knowledge graphs. However, the method in (Song and Park, 2018) focuses on data synthesized by subdividing a static knowledge graph into multiple snapshots. In our work, we propose an end-to-end framework combining experience replay and regularization-based methods specifically tailored to incrementally training TKGC models.
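As an illustration of the experience-replay idea discussed above, the following is a minimal uniform reservoir buffer over quadruples. It is a simplification: the frequency-based selection used in TIE, and the importance-based selection of iCaRL or ER-GNN, would replace the uniform random choice.

```python
import random

class ReplayBuffer:
    """Fixed-size reservoir of historical quadruples (classic uniform reservoir
    sampling: every item seen so far ends up stored with equal probability)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = []   # stored quadruples
        self.seen = 0    # total quadruples observed

    def add(self, quad):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(quad)
        else:
            # Keep the new item with probability capacity / seen.
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = quad

    def sample(self, n: int):
        """Draw a replay mini-batch to mix with the current time step's facts."""
        return random.sample(self.data, min(n, len(self.data)))

buf = ReplayBuffer(capacity=100)
for i in range(1000):
    buf.add(("e%d" % i, "r", "e%d" % (i + 1), i))  # toy (s, r, o, t) quadruples
batch = buf.sample(32)
print(len(buf.data))  # 100
```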
3. Problem Setup and Formulation
In this section, we introduce notations, specify assumptions, and describe the encoder-decoder framework for the standard TKGC (Wu et al., 2020). This is the foundation of our TIE framework for incremental TKGC.
We start by introducing commonly used evaluation metrics in standard TKGC, followed by the notions of current, historical average, and intransigence measures in the context of TKGC to quantify the different aspects of model capacity.
4.1. Standard TKGC Metrics
For each quadruple $(s, r, o, t)$, we evaluate an object query $(s, r, ?, t)$ and a subject query $(?, r, o, t)$. For the object query, we calculate the scores for all known entities, i.e., $\{\phi(s, r, o', t) \mid o' \in \mathcal{E}\}$. The ranks are obtained by sorting the scores in descending order, and the rank of the ground-truth entity is then used to compute commonly used metrics such as Mean Reciprocal Rank (MRR) and Hits@k ($k$ is usually 1, 3, or 10). Hits@k is the percentage of test facts for which the correct entity's rank is at most $k$. For $k = 10$, the Hits@10 metric for object queries is defined as:
$$\text{Hits@10} = \frac{1}{|\mathcal{D}^{test}|} \sum_{(s, r, o, t) \in \mathcal{D}^{test}} \mathbb{1}\left[\text{rank}(o \mid s, r, t) \le 10\right],$$
where $\mathbb{1}[\cdot]$ is the indicator function.
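These ranking metrics can be computed directly from model scores; here is a minimal sketch (the entity indices and score values are toy examples):

```python
import numpy as np

def rank_of(scores: np.ndarray, true_idx: int) -> int:
    """1-based rank of the true entity when scores are sorted in descending order."""
    return int(np.sum(scores > scores[true_idx]) + 1)

def hits_at_k(ranks, k: int = 10) -> float:
    """Fraction of queries whose true entity is ranked within the top k."""
    return float(np.mean(np.asarray(ranks) <= k))

def mrr(ranks) -> float:
    """Mean Reciprocal Rank over a list of 1-based ranks."""
    return float(np.mean(1.0 / np.asarray(ranks, dtype=float)))

# Toy scores over 5 candidate entities for two object queries.
scores_q1 = np.array([0.1, 0.9, 0.3, 0.2, 0.5])  # true entity at index 1 -> rank 1
scores_q2 = np.array([0.8, 0.1, 0.4, 0.6, 0.2])  # true entity at index 2 -> rank 3
ranks = [rank_of(scores_q1, 1), rank_of(scores_q2, 2)]
print(hits_at_k(ranks, k=1))  # 0.5
print(mrr(ranks))             # (1/1 + 1/3) / 2 = 0.666...
```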
4.2. Incremental TKGC Metrics
Since the objective of incremental TKGC is to incorporate facts from new time steps while preserving knowledge derived from previous ones, an incremental learning approach should be evaluated on both the current and historical quadruples. Additionally, we would like the metrics to measure a model's ability to discern changes in the validity of facts at different points in time, e.g., a change of political affiliation or the end of a marriage.
Current and Historical Average Measure
Let $H_{t,\tau}$ be the Hits@10 value specified in Equation (1) evaluated on $\mathcal{D}^{test}_{\tau}$ ($\tau \le t$), using the model incrementally trained after time step $t$. The current performance measure ($C_t$) is written as $C_t = H_{t,t}$. We adapt the Average Accuracy Measure proposed in (Chaudhry et al., 2018) to the TKGC setting, replacing accuracy with the Hits@10 measure. The Average Hits@10 ($A_t$) at time step $t$ is defined as $A_t = \frac{1}{t} \sum_{\tau=1}^{t} H_{t,\tau}$. The higher the value of $A_t$, the better the model in terms of historical average performance, which is an important aspect of TKGC evaluation. This, to some degree, also measures whether a model is prone to catastrophic forgetting. A model that cannot retain past knowledge would yield a much lower $A_t$ than a model trained using all the historical data.
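The current and average measures reduce to simple bookkeeping over a table of per-snapshot Hits@10 values; a sketch with hypothetical numbers:

```python
def current_measure(hits: dict, t: int) -> float:
    """C_t = H_{t,t}: Hits@10 on the snapshot at time t, using the model trained
    up to t. `hits[(t, tau)]` holds the Hits@10 of the model trained after step t,
    evaluated on the test quadruples of step tau (tau <= t)."""
    return hits[(t, t)]

def average_measure(hits: dict, t: int) -> float:
    """A_t: mean of H_{t,tau} over all snapshots tau = 1..t."""
    return sum(hits[(t, tau)] for tau in range(1, t + 1)) / t

# Hypothetical evaluation results for a model trained after t = 3.
hits = {(3, 1): 0.40, (3, 2): 0.45, (3, 3): 0.55}
print(current_measure(hits, 3))  # 0.55
print(average_measure(hits, 3))  # (0.40 + 0.45 + 0.55) / 3 = 0.466...
```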
In the context of TKGC, we define intransigence as the inability of an algorithm to identify knowledge that was true in the past but is false at present. For example, after graduating from a college, a student is no longer associated with that college. We decompose this measure into the model's ability to 1) rank the deleted facts low and 2) rank the currently valid facts above the deleted facts. We propose Deleted Facts Hits@10 (DF) and Reciprocal Rank Difference (RRD) to measure these two aspects. DF is analogous to the false positive rate in the classification setting, measuring the ranks of the deleted triples with the current time step as their time attribute. A lower DF value suggests that a model is better at excluding deleted facts from the top 10 results. RRD is defined as the pairwise difference of reciprocal ranks between each positive quadruple in the test set and each deleted fact in the previous data. RRD implicitly focuses on cases where either the positive object or the negative object is ranked near the top (i.e., has a small rank value), while discounting cases where both are ranked far down. We define a time window ranging from $t - w$ to $t - 1$ (with window size $w$) to limit the scope of evaluation. For every quadruple $(s, r, o, t) \in \mathcal{D}^{test}_t$, we collect the related deleted facts from this time window and evaluate against them. We define the DF and RRD metrics for object queries at time step $t$ as:
$$\text{DF}_t = \frac{1}{|\mathcal{N}_t|} \sum_{(s, r, o', t) \in \mathcal{N}_t} \mathbb{1}\left[\text{rank}(o' \mid s, r, t) \le 10\right],$$
$$\text{RRD}_t = \frac{1}{\eta} \sum_{(s, r, o, t) \in \mathcal{D}^{test}_t} \sum_{o' \in \mathcal{N}_t(s, r)} \left( \frac{1}{\text{rank}(o)} - \frac{1}{\text{rank}(o')} \right),$$
where $\mathcal{N}_t$ is the collection of negative (deleted) objects and $\eta$ is the normalizing constant, the total number of (positive, deleted) pairs:
$$\eta = \sum_{(s, r, o, t) \in \mathcal{D}^{test}_t} |\mathcal{N}_t(s, r)|.$$
In practice, the RRD values are very close to zero. Hence we multiply the RRD by a factor of 100 for better presentation. The intransigence metrics for subject queries can be defined analogously.
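Given precomputed ranks, DF and RRD reduce to a few lines. In this sketch, the normalizer η is taken as the number of (positive, deleted) pairs, and the sign convention (positive minus deleted reciprocal rank, so larger RRD means valid facts outrank deleted ones) is an assumption about details not fully spelled out above:

```python
def deleted_facts_hits10(deleted_ranks, k: int = 10) -> float:
    """DF: fraction of deleted facts still ranked in the top k (lower is better)."""
    return sum(r <= k for r in deleted_ranks) / len(deleted_ranks)

def rrd(positive_ranks, deleted_ranks) -> float:
    """RRD: mean pairwise difference of reciprocal ranks between positive and
    deleted facts, scaled by 100 as in the text. eta = number of pairs."""
    eta = len(positive_ranks) * len(deleted_ranks)
    total = sum(1.0 / rp - 1.0 / rd
                for rp in positive_ranks for rd in deleted_ranks)
    return 100.0 * total / eta

print(deleted_facts_hits10([2, 15, 30]))  # 1 of 3 deleted facts in top 10
print(rrd([1, 2], [5, 50]))               # 64.0
```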
5. Proposed Framework: TIE
We provide an overview of TIE before describing the proposed methods in detail in the following sections.
We establish the TIE framework, which augments the TKGC encoder-decoder framework (Section 3) with incremental learning techniques, a method to overcome intransigence, and an efficient training strategy. The overall architecture of TIE is depicted in Figure 1, and Algorithm 1 outlines its representation learning procedure. A key insight of our framework is that we adapt experience replay and temporal regularization techniques (detailed in the following subsections) to address the catastrophic forgetting issues of fine-tuning methods for TKG representation learning models. Additionally, we propose to use the deleted facts from recent time steps as a subset of the negative training examples to address the intransigence issue of state-of-the-art TKGC methods. Finally, we propose to fine-tune using only the newly added facts at each time step. This is based on the finding that the particular type of TKGs of most interest is composed primarily of persistent facts, i.e., the average duration of facts is typically long enough that no drastic changes occur between adjacent time steps.
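The per-time-step data construction described above (added facts as positives, recently deleted facts as explicit negatives, plus replayed history) might be sketched as follows; all function and variable names are illustrative, not TIE's actual implementation:

```python
import random

def build_training_set(added, deleted_window, reservoir, n_replay: int = 64):
    """Assemble one incremental-training set at a time step (hypothetical helper):
    newly added facts are labeled positive, recently deleted facts are labeled
    negative, and a few replayed historical facts guard against forgetting."""
    positives = [(q, 1) for q in added]
    negatives = [(q, 0) for q in deleted_window]
    replay = [(q, 1) for q in random.sample(reservoir,
                                            min(n_replay, len(reservoir)))]
    return positives + negatives + replay

# Toy example reusing the facts from the introduction.
added = [("Biden", "presidentOf", "US", 2021)]
deleted = [("Trump", "presidentOf", "US", 2021)]
history = [("Obama", "visit", "China", 2014)]
batch = build_training_set(added, deleted, history, n_replay=1)
print(len(batch))  # 3
```

Because only the added and deleted facts of the current step enter training, the per-step dataset stays small even as the full TKG grows.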
We evaluate the performance of our models on two standard TKGC benchmark datasets using our proposed evaluation protocol. We also conduct various ablation studies investigating the effectiveness of individual and combined components of the proposed methods.
We present a novel incremental learning framework named TIE for TKGC tasks. TIE combines TKG representation learning, frequency-based experience replay, and temporal regularization to improve the model's performance on both current and past time steps. TIE leverages pattern frequencies to select reservoir samples and uses only the deleted and added facts at the current time step for training, which significantly reduces training time and the size of the training data. Moreover, we propose the DF and RRD metrics to measure the intransigence of a model. Extensive ablation studies show each proposed component's effectiveness; they also provide insights for deciding among model variations by revealing performance trade-offs among the evaluation metrics. This work serves as a first attempt to apply incremental learning to TKGC tasks. Future work might explore other incremental learning techniques, such as constrained optimization, to achieve more robust performance across datasets and metrics.
This research was supported in part by Noah's Ark Lab (Montreal Research Centre), the CIFAR Canada AI Chair program, FRQNT (Fonds de recherche du Québec – Nature et technologies), and Samsung Electronics. The authors would like to thank Noah's Ark Lab for providing the computational resources.
- Software engineering event modeling using relative time in temporal knowledge graphs. arXiv preprint arXiv:2007.01231. Cited by: §1.
- Translating embeddings for modeling multi-relational data. In Advances in neural information processing systems, pp. 2787–2795. Cited by: §1.
- End-to-end incremental learning. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 233–248. Cited by: §2.2.
- Riemannian walk for incremental learning: understanding forgetting and intransigence. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 532–547. Cited by: §4.2.
- Efficient lifelong learning with A-GEM. In International Conference on Learning Representations. Cited by: §2.2.
- Continual learning with tiny episodic memories. Cited by: §2.2.
- HyTE: hyperplane-based temporally aware knowledge graph embedding. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2001–2011. Cited by: §1, §2.1.
- Diachronic embedding for temporal knowledge graph completion. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, pp. 3988–3995. Cited by: §1, §2.1.
- The graph hawkes network for reasoning on temporal knowledge graphs. arXiv preprint arXiv:2003.13432. Cited by: §2.1.
- Knowledge graph embedding based question answering. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 105–113. Cited by: §1.
- Selective experience replay for lifelong learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32. Cited by: §2.2.
- Towards time-aware knowledge graph completion. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 1715–1724. Cited by: §2.1.
- Towards time-aware knowledge graph completion. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 1715–1724. Cited by: §1.
- Recurrent event network: autoregressive structure inference over temporal knowledge graphs. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 6669–6683. Cited by: §2.1.
- Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences. Cited by: §2.2.
- Gradient episodic memory for continual learning. In Advances in neural information processing systems, pp. 6467–6476. Cited by: §2.2.
- Neural network-based question answering over knowledge graphs on word and character level. In Proceedings of the 26th international conference on World Wide Web, pp. 1211–1220. Cited by: §1.
- Catastrophic interference in connectionist networks: the sequential learning problem. In Psychology of learning and motivation, Vol. 24, pp. 109–165. Cited by: §1.
- Knowledge graph refinement: a survey of approaches and evaluation methods. Semantic web 8 (3), pp. 489–508. Cited by: §1.
- GDumb: a simple approach that questions our progress in continual learning. In European Conference on Computer Vision, pp. 524–540. Cited by: §2.2.
- iCaRL: incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Cited by: §2.2.
- DySAT: deep neural representation learning on dynamic graphs via self-attention networks. In Proceedings of the 13th International Conference on Web Search and Data Mining, pp. 519–527. Cited by: §2.1.
- Enriching translation-based knowledge graph embeddings through continual learning. IEEE Access 6, pp. 60489–60497. Cited by: §2.2.
- Know-Evolve: deep temporal reasoning for dynamic knowledge graphs. In Proceedings of the 34th International Conference on Machine Learning, Vol. 70, pp. 3462–3471. Cited by: §2.1.
- DyRep: learning representations over dynamic graphs. In International Conference on Learning Representations. Cited by: §2.1.
- Complex embeddings for simple link prediction. In International Conference on Machine Learning, pp. 2071–2080. Cited by: §1.
- Composition-based multi-relational graph convolutional networks. In International Conference on Learning Representations. Cited by: §1.
- TeMP: temporal message passing for temporal knowledge graph completion. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 5730–5746. Cited by: §1, §2.1, §3.
- Temporal knowledge graph embedding model based on additive time series decomposition. arXiv preprint arXiv:1911.07893. Cited by: §2.1.
- GraphSAIL: graph structure aware incremental learning for recommender systems. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 2861–2868. Cited by: §1, §2.2.
- Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575. Cited by: §1.
- Adaptive deep models for incremental learning: considering capacity scalability and sustainability. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining. Cited by: §2.2.
- Continual learning through synaptic intelligence. Proceedings of Machine Learning Research 70, pp. 3987–3995. Cited by: §2.2.
- Variational reasoning for question answering with knowledge graph. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32. Cited by: §1.
- Continual graph learning. arXiv:2003.09908. Cited by: §2.2.