1 Introduction
In recent years a number of sizable Knowledge Graphs (KGs) have been developed, the largest ones containing more than 100 billion facts. Well-known examples are DBpedia auer2007dbpedia, YAGO suchanek2007yago, Freebase bollacker2008freebase, Wikidata vrandevcic2014wikidata, and the Google KG singhal2012introducing. Practical issues with completeness, quality, and maintenance have been solved to a degree that some of these knowledge graphs support search, text understanding, and question answering in large-scale commercial systems singhal2012introducing. In addition, statistical embedding models have been developed that can be used to compress a knowledge graph, to derive implicit facts, to detect errors, and to support the aforementioned applications. A recent survey on KG models can be found in nickel2015.
Most knowledge graphs are static and reflect the world at its current state. In reality, of course, the state of the world is changing: a healthy person becomes diagnosed with a disease and a new president is inaugurated. In this paper, we extend semantic knowledge graph embedding models to episodic/temporal knowledge graphs as an efficient way to store episodic data and to be able to generalize to new facts (inductive learning). In particular, we generalize leading approaches for static knowledge graphs (i.e., constrained Tucker, DistMult, RESCAL, HolE, ComplEx) to temporal knowledge graphs. We test these models using two temporal KGs. The first one is derived from the Integrated Conflict Early Warning System (ICEWS) data set which describes interactions between nations over several years. The second one is derived from the Global Database of Events, Language and Tone (GDELT) that, for more than 30 years, monitors news media from all over the world. In the experiments, we analyze the generalization abilities to new facts that might be missing in the temporal KGs and also analyze to what degree a factorized KG can serve as an explicit memory.
We propose that our technical models might be related to the brain’s explicit memory systems, i.e., its episodic and its semantic memory. Both are considered long-term memories and store information potentially over the lifetime of an individual ebbinghaus1885gedachtnis ; atkinson1968human ; squire1987memory. The semantic memory stores general factual knowledge, i.e., information we know, independent of the context in which this knowledge was acquired, and would be related to a static KG. Episodic memory concerns information we remember, includes the spatiotemporal context of events tulving1986episodic, and would correspond to a temporal KG.
An interesting question is how episodic and semantic memories are related. There is evidence that these main cognitive categories are partially dissociated from one another in the brain, as expressed in their differential sensitivity to brain damage. However, there is also evidence indicating that the different memory functions are not mutually independent and support one another greenberg2010interdependence. We propose that semantic memory can be derived from episodic memory by marginalization. In doing so, we also account for the fact that many episodes describe the start and end points of state changes. For example, an individual might become sick with a disease, which eventually is cured. Similarly, a president’s tenure eventually ends. We study our hypothesis on the Integrated Conflict Early Warning System (ICEWS) dataset, which contains many events with start and end dates. Figure 1 compares semantic and episodic knowledge graphs. Furthermore, Figure 2 illustrates the main ideas of building and modeling semantic and episodic knowledge graphs.
The paper is organized as follows. Section 2 introduces knowledge graphs, the mapping of a knowledge graph to an adjacency tensor, and the statistical embedding models for knowledge graphs. We also describe how popular embedding models for KGs can be extended to episodic KGs. Section 3 shows experimental results on modelling episodic KGs. Finally, we present experiments on the possible relationships between episodic and semantic memory in Section 4.
2 Model Descriptions
A static or semantic knowledge graph (KG) is a triple-oriented knowledge representation. Here we consider a slight extension of the subject-predicate-object triple form $(s, p, o)$ by adding the value, i.e., $(s, p, o; \mathrm{Value})$, where Value is a function of $s, p, o$ and, e.g., can be a Boolean variable (True for 1, False for 0) or a real number. Thus (Jack, likes, Mary; True) states that Jack (the subject or head entity) likes Mary (the object or tail entity). Note that $e_s$ and $e_o$ represent the entities for subject index $s$ and object index $o$. To simplify notation, we also consider $e_p$ to be a generalized entity associated with the predicate type with index $p$. For the episodic KGs we introduce $e_t$, which is a generalized entity for time $t$.
To model a static KG, we introduce the three-way semantic adjacency tensor $\underline{X}$, where the tensor element $x_{s,p,o}$ is the associated Value of the triple $(e_s, e_p, e_o)$. One can also define a companion tensor $\underline{\Theta}$ with the same dimensions as $\underline{X}$ and with entries $\theta_{s,p,o}$. Thus, the probabilistic model for the semantic tensor is defined as $P(x_{s,p,o} = 1) = \sigma(\theta_{s,p,o})$, where $\sigma(x) = 1/(1 + e^{-x})$ is the sigmoid function. Similarly, the four-way temporal or episodic tensor $\underline{Z}$ has elements $z_{s,p,o,t}$, which are the associated values of the quadruples $(e_s, e_p, e_o, e_t)$, with $t = 1, \dots, T$. Therefore, the probabilistic model for the episodic tensor is defined with the corresponding companion tensor as
$P(z_{s,p,o,t} = 1) = \sigma(\theta_{s,p,o,t}).$  (1)
We assume that each entity $e$ has a unique latent representation $\mathbf{a}_e \in \mathbb{R}^{\tilde r}$. In particular, the embedding approach used for modeling semantic and episodic knowledge graphs assumes that $\theta_{s,p,o} = f^{sem}(\mathbf{a}_{e_s}, \mathbf{a}_{e_p}, \mathbf{a}_{e_o})$ and $\theta_{s,p,o,t} = f^{epi}(\mathbf{a}_{e_s}, \mathbf{a}_{e_p}, \mathbf{a}_{e_o}, \mathbf{a}_{e_t})$, respectively. Here, the indicator function $f$ is a function to be learned.
Given a labeled dataset $\mathcal{D} = \{(\xi_i, y_i)\}_{i=1}^{N}$, where $\xi_i$ is a triple or quadruple and $y_i \in \{-1, +1\}$ its label, the latent representations and other parameters (jointly denoted as $\mathcal{P}$) are learned by minimizing the regularized logistic loss
$\min_{\mathcal{P}} \; \sum_{i=1}^{N} \log\big(1 + \exp(-y_i\, \theta_{\xi_i})\big) + \lambda \|\mathcal{P}\|_2^2.$  (2)
In general, most KGs contain only positive triples; nonexisting triples are normally used as negative examples, sampled under the local closed-world assumption. Alternatively, we can minimize a margin-based ranking loss over the dataset, such as
$\mathcal{L} = \sum_{\xi^+ \in \mathcal{D}^+} \sum_{\xi^- \in \mathcal{D}^-} \max\big(0,\; \gamma + \theta_{\xi^-} - \theta_{\xi^+}\big),$  (3)
where $\gamma > 0$ is the margin parameter, and $\mathcal{D}^+$ and $\mathcal{D}^-$ denote the sets of positive and negative samples, respectively.
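The two training objectives can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: `theta` denotes the indicator values of sampled triples/quadruples, `params` the embedding matrices, and all names and default values are our own choices.

```python
import numpy as np

def logistic_loss(theta, y, params, lam=0.01):
    """Regularized logistic loss of Eq. (2); labels y are in {-1, +1}."""
    data_term = np.sum(np.log1p(np.exp(-y * theta)))
    reg_term = lam * sum(np.sum(p ** 2) for p in params)
    return data_term + reg_term

def margin_ranking_loss(theta_pos, theta_neg, gamma=1.0):
    """Margin-based ranking loss of Eq. (3) over all (positive, negative) pairs."""
    return np.sum(np.maximum(0.0, gamma + theta_neg[None, :] - theta_pos[:, None]))
```

In practice both losses are minimized with stochastic gradients over mini-batches of positive samples and their sampled negatives.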
There are different ways of modeling the indicator functions $f^{sem}$ and $f^{epi}$. In this paper, we only investigate multilinear models derived from tensor decompositions and compositional operations. We now describe the models in detail. Graphical illustrations of the described models are shown in Figure 3.
Figure 3: Graphical illustrations of the described models, panels (a)–(e).
Table 1 and Table 2 summarize the notation used throughout this paper for easy reference, while Table 3 summarizes the number of parameters required for each model. (For DistMult, ComplEx, and HolE it is required that the ranks of the entity, predicate, and time representations coincide.) In our experiments (see Sections 3 and 4), in order to enable a fair comparison between the different models, we assume that the latent representations of entities, predicates, and time indices all have the same rank/dimensionality.
Table 1: General notation.

| Symbol | Meaning |
| $e_s$ | Entity for subject index $s$ |
| $e_o$ | Entity for object index $o$ |
| $e_p$ | Generalized entity for predicate index $p$ |
| $e_t$ | Generalized entity for time index $t$ |
| $\mathbf{a}_e$ | Latent representation of entity $e$ |
| $\mathbf{a}_t$ | Latent representation of starting timestamp $t$ |
| $a_{e,r}$ | $r$-th element of $\mathbf{a}_e$ |
| $\tilde r$ | Rank/dimensionality of $\mathbf{a}_e$ for entities |
| $\tilde r_p$ | Rank/dimensionality of $\mathbf{a}_p$ |
| $N_e / N_p / N_t$ | Number of entities / predicates / timestamps |
Table 2: Notation for semantic and episodic knowledge graphs.

| Symbol | Meaning (semantic) | Symbol | Meaning (episodic) |
| $\underline{X}$ | Sem. adjacency tensor | $\underline{Z}$ | Epi. adjacency tensor |
| $\underline{\Theta}^{sem}$ | Companion tensor of $\underline{X}$ | $\underline{\Theta}^{epi}$ | Companion tensor of $\underline{Z}$ |
| $x_{s,p,o}$ | Value of $(e_s, e_p, e_o)$ | $z_{s,p,o,t}$ | Value of $(e_s, e_p, e_o, e_t)$ |
| $\theta_{s,p,o}$ | Logit of $(e_s, e_p, e_o)$ | $\theta_{s,p,o,t}$ | Logit of $(e_s, e_p, e_o, e_t)$ |
| $f^{sem}$ | Sem. indicator function | $f^{epi}$ | Epi. indicator function |
| $\underline{G}^{sem}$ | Sem. core tensor | $\underline{G}^{epi}$ | Epi. core tensor |
| $g_{r_1 r_2 r_3}$ | Element of $\underline{G}^{sem}$ | $g_{r_1 r_2 r_3 r_4}$ | Element of $\underline{G}^{epi}$ |
Tucker. First, we consider the Tucker model for semantic tensor decomposition, $\theta_{s,p,o} = \sum_{r_1, r_2, r_3} g_{r_1 r_2 r_3}\, a_{s,r_1}\, a_{p,r_2}\, a_{o,r_3}$. Here, $g_{r_1 r_2 r_3}$ are elements of the core tensor $\underline{G}^{sem}$. Similarly, the indicator function of a four-way Tucker model for episodic tensor decomposition is of the form
$\theta_{s,p,o,t} = \sum_{r_1, r_2, r_3, r_4} g_{r_1 r_2 r_3 r_4}\, a_{s,r_1}\, a_{p,r_2}\, a_{o,r_3}\, a_{t,r_4}$  (4)
with a four-way core tensor $\underline{G}^{epi}$. Note that this is a constrained Tucker model, since, as in RESCAL, entities have unique representations, independent of their roles as subject or object.
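As an illustration, the contraction in Eq. (4) can be written in a few lines of NumPy. This is a sketch with an arbitrary rank and random embeddings; all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
r = 4                                    # illustrative rank (all modes equal)
G = rng.standard_normal((r, r, r, r))    # four-way episodic core tensor
a_s, a_p, a_o, a_t = (rng.standard_normal(r) for _ in range(4))

# Eq. (4): contract the core tensor with the four embedding vectors
theta = np.einsum('ijkl,i,j,k,l->', G, a_s, a_p, a_o, a_t)
```

The full contraction touches every core element, which is why the cost grows as $O(\tilde r^4)$ per evaluated quadruple.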
RESCAL. Another model closely related to the semantic Tucker tensor decomposition is the RESCAL model, which has shown excellent performance in modelling KGs nickel2011three. In RESCAL, subjects and objects have vector latent representations, while predicates have matrix latent representations. The indicator function of RESCAL for modeling semantic KGs takes the form $\theta_{s,p,o} = \mathbf{a}_s^\top R_p\, \mathbf{a}_o$, where $R_p \in \mathbb{R}^{\tilde r \times \tilde r}$ is the matrix latent representation of the predicate $p$. The next two models, Tree and ConT, are novel generalizations of RESCAL to episodic tensors.

Tree. From a practical perspective, training an episodic Tucker tensor model is very expensive, since its computational complexity is approximately $O(\tilde r^4)$. Tensor networks provide a general and flexible framework to design nonstandard tensor decompositions cichocki2014era ; cichocki2014tensor. Therefore, we propose a tree tensor decomposition (Tree) of the episodic indicator function, which is illustrated in Figure 3. The tree is partitioned into two subtrees $\mathcal{T}_1$ and $\mathcal{T}_2$, wherein subject and time reside in $\mathcal{T}_1$, while object and an auxiliary time reside in $\mathcal{T}_2$. $\mathcal{T}_1$ and $\mathcal{T}_2$ are connected with the predicate through two core tensors $\underline{G}^1$ and $\underline{G}^2$. Thus, the indicator function can be written as
$\theta_{s,p,o,t} = \sum_{r_1} a_{p,r_1} \Big( \sum_{r_2, r_3} g^1_{r_1 r_2 r_3}\, a_{s,r_2}\, a_{t,r_3} \Big) \Big( \sum_{r_4, r_5} g^2_{r_1 r_4 r_5}\, a_{o,r_4}\, a_{t,r_5} \Big).$  (5)
Within Tree, we reduce the four-way core tensor of Tucker into two three-way tensors $\underline{G}^1$ and $\underline{G}^2$, so that the computational complexity of the model is approximately $O(\tilde r^3)$.
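Under one plausible reading of Eq. (5) (the exact wiring of the two cores is our assumption), each subtree first contracts its core with its leaf embeddings, and the two resulting messages are combined at the predicate node:

```python
import numpy as np

rng = np.random.default_rng(1)
r = 4
G1 = rng.standard_normal((r, r, r))   # core of subtree T1 (predicate, subject, time)
G2 = rng.standard_normal((r, r, r))   # core of subtree T2 (predicate, object, aux. time)
a_s, a_p, a_o, a_t = (rng.standard_normal(r) for _ in range(4))

u = np.einsum('pst,s,t->p', G1, a_s, a_t)   # message from T1, cost O(r^3)
v = np.einsum('pot,o,t->p', G2, a_o, a_t)   # message from T2, cost O(r^3)
theta = float(a_p @ (u * v))                # combine at the predicate node
```

Because each core is only three-way, no $O(\tilde r^4)$ contraction is ever formed.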
ConT. ConT is another generalization of the RESCAL model to episodic tensors, with a reduced computational complexity of approximately $O(\tilde r^3)$. The idea is that another way of reducing the complexity is by contracting indices of the core tensor. Therefore, we contract the core tensor $\underline{G}^{epi}$ from Tucker with the time index, giving a three-way core tensor $\underline{G}_t$ for each time instance. The indicator function takes the form
$\theta_{s,p,o,t} = \sum_{r_1, r_2, r_3} g_{t, r_1 r_2 r_3}\, a_{s,r_1}\, a_{p,r_2}\, a_{o,r_3}.$  (6)
In this model, the tensor $\underline{G}_t$ resembles the relation-specific matrix $R_p$ from RESCAL. Later, we will see that ConT is a superior model for modeling episodic knowledge graphs, due to the representational flexibility of its high-dimensional tensor for the time index.
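A sketch of the ConT score of Eq. (6) in NumPy (random data, illustrative rank and time-index names chosen by us):

```python
import numpy as np

rng = np.random.default_rng(2)
r, T = 4, 5
G = rng.standard_normal((T, r, r, r))   # one three-way core G_t per time step
a_s, a_p, a_o = (rng.standard_normal(r) for _ in range(3))

t = 3                                                    # query time index
theta = np.einsum('ijk,i,j,k->', G[t], a_s, a_p, a_o)    # Eq. (6)
```

Note that the "time representation" here is the whole core $\underline{G}_t$, i.e., a vector of length $\tilde r^3$ after flattening.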
Even though the complexity of Tree and ConT is reduced compared to episodic Tucker, the three-way core tensors might cause rapid overfitting during training. Therefore, we next propose episodic generalizations of compositional models, such as DistMult yang2014embedding, HolE nickel2016holographic, and ComplEx trouillon2016complex. For those models, the number of parameters increases only linearly with the rank.
DistMult. DistMult yang2014embedding is a simple generalization of the CP model, obtained by enforcing the constraint that entities have unique representations. Episodic DistMult takes the form $\theta_{s,p,o,t} = \sum_{r} a_{s,r}\, a_{p,r}\, a_{o,r}\, a_{t,r}$. Here, we require that the vector latent representations of entities, predicates, and timestamps have the same rank. DistMult is a special case of Tucker with a core tensor that has nonzero entries only on its superdiagonal.
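The episodic DistMult score is just an elementwise product followed by a sum (a minimal sketch with random embeddings; names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
r = 4
a_s, a_p, a_o, a_t = (rng.standard_normal(r) for _ in range(4))

# Episodic DistMult: elementwise product of all four embeddings, then sum
theta = float(np.sum(a_s * a_p * a_o * a_t))
```

The score is invariant under swapping subject and object, which is why DistMult can only model symmetric relations.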
HolE. Holographic embedding (HolE) nickel2016holographic is a state-of-the-art link prediction and knowledge graph completion method, which is inspired by holographic models of associative memory.
HolE uses circular correlation to generate a compositional representation from the inputs $\mathbf{a}_s$ and $\mathbf{a}_o$. The indicator function of HolE reads $\theta_{s,p,o} = \mathbf{a}_p^\top (\mathbf{a}_s \star \mathbf{a}_o)$, where $\star$ denotes the circular correlation $[\mathbf{a} \star \mathbf{b}]_k = \sum_{i=0}^{\tilde r - 1} a_i\, b_{(i + k) \bmod \tilde r}$. We define the episodic extension of HolE as
$\theta_{s,p,o,t} = \mathbf{a}_t^\top \big( \mathbf{a}_p \star (\mathbf{a}_s \star \mathbf{a}_o) \big).$  (7)
As argued by nickel2016holographic, HolE employs a holographic reduced representation plate1995holographic to store and retrieve the predicates from $\mathbf{a}_s$ and $\mathbf{a}_o$. Analogously, episodic HolE should be able to retrieve the stored timestamps from $\mathbf{a}_s$, $\mathbf{a}_p$, and $\mathbf{a}_o$. In the semantic case, $\mathbf{a}_p$ can be retrieved if existing triple relations are stored via circular convolution $\mathbf{a}_s * \mathbf{a}_o$ and superposition in the representation $\mathbf{a}_p = \sum_{(s,o) \in S_p} \mathbf{a}_s * \mathbf{a}_o$, where $S_p$ is the set of all true triples given $p$. This is based on the fact that $\mathbf{a} \star (\mathbf{a} * \mathbf{b}) \approx \mathbf{b}$ nickel2016holographic. Analogously, the stored timestamp for an event can be retrieved if all existing episodic events are stored via $\mathbf{a}_p * (\mathbf{a}_s * \mathbf{a}_o)$ and superposition in the representation of $e_t$, $\mathbf{a}_t = \sum_{(s,p,o) \in S_t} \mathbf{a}_p * (\mathbf{a}_s * \mathbf{a}_o)$, where $S_t$ is the set of all true quadruples given $t$. However, higher-order circular correlation/convolution increases the inaccuracy of retrieval. Another motivation for our episodic extension (7) is that a compositional operator of the form $\mathbf{a}_t^\top \mathbf{h}(\mathbf{a}_s, \mathbf{a}_p, \mathbf{a}_o)$ allows a projection from episodic memory to semantic memory, to be detailed later.
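Circular correlation is usually computed in the frequency domain. The sketch below (our own illustration; names are ours) implements the standard FFT identity and uses it to evaluate the episodic HolE score of Eq. (7):

```python
import numpy as np

def ccorr(a, b):
    """Circular correlation: [a ⋆ b]_k = sum_i a_i * b_{(i+k) mod n}, via FFT."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

rng = np.random.default_rng(4)
r = 8
a_s, a_p, a_o, a_t = (rng.standard_normal(r) for _ in range(4))

# Episodic HolE, Eq. (7): correlate twice, then project onto the time embedding
theta = float(a_t @ ccorr(a_p, ccorr(a_s, a_o)))
```

The FFT route brings the cost per score down to $O(\tilde r \log \tilde r)$.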
ComplEx. Complex embedding (ComplEx) trouillon2016complex is another state-of-the-art method closely related to HolE. It can accurately describe both symmetric and antisymmetric relations. HolE is a special case of ComplEx with imposed conjugate symmetry on the embeddings DBLP:journals/corr/HayashiS17. Thus, ComplEx has more degrees of freedom compared to HolE. For the semantic complex embedding, the indicator function is $\theta_{s,p,o} = \mathrm{Re}\big(\sum_{r} a_{s,r}\, a_{p,r}\, \bar{a}_{o,r}\big)$, with complex-valued embeddings $\mathbf{a} \in \mathbb{C}^{\tilde r}$, where the bar indicates the complex conjugate. To be consistent with episodic HolE, the episodic complex embedding is defined as (one can show that Eq. (7) is equivalent to Eq. (8) by converting it to the frequency domain DBLP:journals/corr/HayashiS17, using the discrete Fourier transforms $\hat{\mathbf{a}} = \mathcal{F}(\mathbf{a})$ of the embeddings $\mathbf{a}$ and the fact that $\hat{\mathbf{a}}$ is conjugate symmetric for a real vector $\mathbf{a}$)

$\theta_{s,p,o,t} = \mathrm{Re}\big(\sum_{r} a_{s,r}\, a_{p,r}\, \bar{a}_{o,r}\, a_{t,r}\big).$  (8)
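A short sketch of the episodic ComplEx score in the form of Eq. (8), with complex random embeddings (our own illustration; only the object embedding is conjugated):

```python
import numpy as np

rng = np.random.default_rng(5)
r = 4

def cvec():
    # random complex embedding vector
    return rng.standard_normal(r) + 1j * rng.standard_normal(r)

a_s, a_p, a_o, a_t = cvec(), cvec(), cvec(), cvec()

# Episodic ComplEx, Eq. (8): Hadamard product with conjugated object, real part
theta = float(np.real(np.sum(a_s * a_p * np.conj(a_o) * a_t)))
```

Taking the real part makes the score real-valued while the asymmetric conjugation lets the model distinguish subject from object.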
3 Experiments on Episodic Models
We investigate the proposed tensor and compositional models with experiments which are evaluated on two datasets:
ICEWS. The Integrated Conflict Early Warning System (ICEWS) dataset ward2013comparing is a natural episodic dataset recording dyadic events between different countries. An example entry could be (Turkey, Syria, Fight, 12/25/2014). These dyadic events are aggregated into a four-way tensor over entities, relation types, and timestamps. (Note that for an episodic event lasting over several timestamps, the dataset contains the corresponding quadruples for every timestamp within that period.) This dataset was first created and used in schein2015bayesian. From this ICEWS dataset, a semantic tensor is generated by extracting consecutive events that last until the last timestamp, constituting the current semantic facts of the world. ("Current" always indicates the last timestamp(s) of the applied episodic KGs.)
GDELT. The Global Database of Events, Language and Tone (GDELT) ward2013comparing monitors the world’s news media in broadcast, print, and web formats from all over the world, daily since January 1, 1979 (https://www.gdeltproject.org/about.html). We use GDELT as a large episodic dataset. For our experiments, GDELT data is collected from January 1, 2012 to December 31, 2012 (with a temporal granularity of 24 hrs). These events are aggregated into an episodic tensor over entities, relation types, and daily timestamps.
We assess the quality of episodic information retrieval on both datasets for the proposed tensor and compositional models. Since both episodic datasets consist only of positive quadruples, we generated negative episodic instances following the protocol for corrupting semantic triples given by Bordes et al. bordes2013translating: negative instances of an episodic quadruple $(s, p, o, t)$ are drawn by corrupting the object to $o'$ or the timestamp to $t'$, meaning that $(s, p, o', t)$ serves as negative evidence of the episodic event at time instance $t$, and $(s, p, o)$ is a true fact which cannot be correctly recalled at time instance $t'$. During training, for each positive sample in a batch we assigned two negative samples, one with a corrupted object and one with a corrupted timestamp.
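The sampling protocol above can be sketched as follows (a minimal illustration, not the authors' code; the function name and the 50/50 split between the two corruption types are our assumptions):

```python
import random

def corrupt(quad, num_entities, num_times, positives):
    """Draw one negative for (s, p, o, t) by corrupting the object or the timestamp."""
    s, p, o, t = quad
    while True:
        if random.random() < 0.5:
            cand = (s, p, random.randrange(num_entities), t)   # corrupt object
        else:
            cand = (s, p, o, random.randrange(num_times))      # corrupt timestamp
        # reject candidates that are actually true (local closed-world assumption)
        if cand != quad and cand not in positives:
            return cand
```

Rejecting candidates that appear in the positive set avoids training on false negatives.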
Table 3: Number of parameters for the different models and the runtime of one training epoch on the GDELT dataset.

| Model | # Parameters (semantic) | # Parameters (episodic) | Complexity | Runtime at three ranks |
| DistMult | $(N_e + N_p)\tilde r$ | $(N_e + N_p + N_t)\tilde r$ | $O(\tilde r)$ | |
| HolE | $(N_e + N_p)\tilde r$ | $(N_e + N_p + N_t)\tilde r$ | $O(\tilde r \log \tilde r)$ | |
| ComplEx | $2(N_e + N_p)\tilde r$ | $2(N_e + N_p + N_t)\tilde r$ | $O(\tilde r)$ | |
| Tree | — | $(N_e + N_p + N_t)\tilde r + 2\tilde r^3$ | $O(\tilde r^3)$ | |
| ConT | — | $(N_e + N_p)\tilde r + N_t \tilde r^3$ | $O(\tilde r^3)$ | |
| Tucker | $(N_e + N_p)\tilde r + \tilde r^3$ | $(N_e + N_p + N_t)\tilde r + \tilde r^4$ | $O(\tilde r^4)$ | |
The model performance is evaluated using the following scores. To retrieve the occurrence time, for each true quadruple $(s, p, o, t)$ we replace the time index $t$ with every other possible time index $t'$, compute the values of the indicator function $\theta_{s,p,o,t'}$, and rank them in decreasing order. We filter the ranking as in bordes2013translating by removing all quadruples $(s, p, o, t')$ with $z_{s,p,o,t'} = 1$ and $t' \neq t$, in order to eliminate ambiguity during episodic information retrieval. Similarly, we evaluated the retrieval of the predicate between a given subject and object at a certain time instance by computing and ranking the indicators $\theta_{s,p',o,t}$. We also evaluated the retrieval of entities by ranking and averaging the filtered indicators $\theta_{s',p,o,t}$ and $\theta_{s,p,o',t}$. To measure the generalization ability of the models, we report different measures of the ranking on the test dataset: mean reciprocal rank (MRR) and Hits@n.
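The filtered ranking protocol for time retrieval can be sketched as follows (our own illustration; `score` stands for the learned indicator function, and all names are ours):

```python
def filtered_time_rank(score, quad, num_times, positives):
    """Filtered rank of the true timestamp among all candidate timestamps."""
    s, p, o, t = quad
    true_score = score(s, p, o, t)
    rank = 1
    for t2 in range(num_times):
        # skip the true quadruple itself and all other known-true quadruples
        if t2 == t or (s, p, o, t2) in positives:
            continue
        if score(s, p, o, t2) > true_score:
            rank += 1
    return rank
```

The filtered MRR is then the mean of 1/rank over the test quadruples, and Hits@n the fraction with rank ≤ n.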
The datasets were split into train, validation, and test sets that contain the most frequently appearing entities in the episodic knowledge graphs. Training was performed by minimizing the logistic loss (2) and was terminated using early stopping on the validation dataset, by monitoring the filtered MRR scores at regular intervals depending on the model, up to a maximum number of training epochs. This ensures that the generalization ability of the unique latent representations of entities does not suffer from overfitting. Before training, all model parameters are initialized using Xavier initialization glorot2010understanding. We also apply an $\ell_2$ norm penalty on all parameters for regularization purposes (see Eq. (2)).
In Table 3 we summarize the runtime of one training epoch on the GDELT dataset for the different models at several ranks. All experiments were performed on a single Tesla K GPU. In the following experiments, for compositional models we search the rank over a grid of larger values, while for tensor models we search over smaller ranks, since larger ranks can quickly lead to overfitting. The loss function is minimized with the Adam method kingma2014adam, with the learning rate selected on the validation set.

Table 4: Filtered scores for entity and predicate prediction on the GDELT test dataset.

|          | Entity |       |       |       | Predicate |       |       |       |
| Method   | MRR    | @1    | @3    | @10   | MRR       | @1    | @3    | @10   |
| DistMult | 0.182  | 6.55  | 19.77 | 43.70 | 0.269     | 12.65 | 30.29 | 59.40 |
| HolE     | 0.177  | 6.67  | 18.95 | 41.84 | 0.256     | 11.81 | 28.35 | 57.73 |
| ComplEx  | 0.172  | 6.54  | 17.52 | 41.56 | 0.255     | 12.05 | 27.75 | 56.60 |
| Tree     | 0.196  | 8.17  | 21.00 | 44.65 | 0.274     | 13.30 | 30.66 | 60.05 |
| Tucker   | 0.204  | 8.93  | 21.85 | 46.35 | 0.275     | 12.69 | 31.35 | 60.70 |
| ConT     | 0.233  | 13.85 | 24.65 | 42.96 | 0.263     | 12.83 | 29.27 | 57.30 |
We first assess the filtered MRR, Hits@1, Hits@3, and Hits@10 scores of inferring missing entities and predicates on the GDELT test dataset. Table 4 summarizes the results. Generalization to the test dataset indicates the inductive reasoning capability of the proposed models. This generalization can be useful for the completion of evolving KGs with missing records, such as clinical datasets. It can be seen that tensor models consistently outperform compositional models on both entity and predicate prediction tasks. ConT achieves the best inference results on the entity-related tasks, while Tucker performs better on the predicate-related tasks. The superior Hits@1 result of ConT on entity prediction indicates that there are entities in the GDELT dataset that are easy to fit along the timestamps. In fact, the GDELT dataset is unbalanced, and episodic quadruples related to certain entities, such as USA or UN, dominate the episodic knowledge graph. Experimental results on a balanced and extremely sparse episodic dataset are reported below.
Table 5: Filtered scores for entity and predicate prediction on the ICEWS test dataset.

|          | Entity |       |       |       | Predicate |       |       |       |
| Method   | MRR    | @1    | @3    | @10   | MRR       | @1    | @3    | @10   |
| DistMult | 0.222  | 9.72  | 22.48 | 52.32 | 0.520     | 33.73 | 62.25 | 91.13 |
| HolE     | 0.229  | 9.85  | 23.49 | 54.21 | 0.517     | 31.55 | 65.47 | 93.59 |
| ComplEx  | 0.229  | 8.94  | 23.53 | 57.72 | 0.506     | 30.99 | 61.46 | 93.44 |
| Tree     | 0.205  | 10.48 | 19.84 | 42.81 | 0.554     | 36.62 | 67.25 | 94.70 |
| Tucker   | 0.257  | 12.88 | 27.10 | 54.43 | 0.563     | 36.96 | 69.55 | 95.43 |
| ConT     | 0.264  | 15.71 | 29.60 | 46.67 | 0.557     | 38.12 | 67.76 | 87.71 |
Next, Table 5 shows the MRR, Hits@1, Hits@3, and Hits@10 scores of inferring missing entities and predicates on the ICEWS test dataset. Similarly, we observe that tensor models outperform compositional models on both missing-entity and missing-predicate inference tasks. The superior Hits@1 result of ConT for missing-entity prediction indicates again that the ICEWS dataset is unbalanced, and that episodic quadruples related to certain entities dominate.
Table 6: Timestamp and entity recollection on the episodic ICEWS (rare) training dataset.

|          |      | Timestamp |      | Entity |      |
| Method   | Rank | MRR       | @3   | MRR    | @3   |
| DistMult | 200  | 0.257     | 27.0 | 0.211  | 21.9 |
| HolE     | 200  | 0.216     | 20.8 | 0.179  | 16.3 |
| ComplEx  | 200  | 0.354     | 40.3 | 0.301  | 33.2 |
| Tree     | 40   | 0.421     | 55.3 | 0.314  | 35.7 |
| Tucker   | 40   | 0.923     | 98.9 | 0.893  | 97.1 |
| ConT     | 40   | 0.982     | 99.7 | 0.950  | 97.9 |
The recollection of the exact occurrence time of a significant past event (e.g., an unusual or novel event, or one attached with emotion) is also an important capability of the episodic cognitive memory function. In order to demonstrate this aspect of the proposed models, Table 6 shows the filtered MRR and Hits@3 scores for timestamp and entity recollection on the episodic ICEWS (rare) training dataset, where the rank column records the smallest rank achieving outstanding recall scores. Figure 4 further displays the filtered MRR score as a function of the rank. Unlike the original ICEWS, which contains many consecutive events that last from the first to the last timestamp, leading to unreasonably high filtered timestamp recall scores, this ICEWS (rare) dataset consists of rare temporal events that happen fewer than three times throughout the whole time span, together with the starting points of events.
The outstanding performance of ConT compared with the compositional models indicates the importance of a large dimensionality of the time latent representation for episodic tensor reconstruction / episodic memory recollection. Recall that for ConT the real dimension of the latent representation of time is actually $\tilde r^3$ after flattening $\underline{G}_t$. This flexible latent representation for time could compress almost all the semantic triples that occur at a certain instance. (This observation has a biological counterpart. In fact, the dentate gyrus, which plays an important role in the formation of episodic memory, is the main part of the adult hippocampus that shows neurogenesis deng2010new. In an adult human, approximately 700 new neurons are added per day through hippocampal neurogenesis; these are believed to perform sensory and spatial information encoding, as well as temporal separation of events lazarov2016hippocampal ; spalding2013dynamics.)

4 Semantic Memory from Episodic Memory with Marginalization
We already discussed that a semantic KG might be related to human semantic memory and that an episodic KG might be related to human episodic memory. It has been speculated that episodic and semantic memory must be closely related, and that semantic memory is generated from episodic memory by some training process mcclelland1995there ; nadel2000multiple. As a very simple implementation of that idea, we propose that a semantic memory could be generated from episodic memory by marginalizing time. Thus, both types of memories would rely on identical representations, and the marginalization step can be easily performed: since probabilistic tensor models belong to the class of sum-product networks, marginalization simply means an integration over all time representations.
Thus, in the second set of experiments, we test the hypothesis that semantic memory can be derived from episodic memory by projection. In other words, a semantic knowledge graph containing current semantic facts can be approximately constructed after modeling a corresponding episodic knowledge graph via marginalization. A marginalization can be performed by activating all time index neurons, i.e., summing over all $\mathbf{a}_t$, since, e.g., Tucker decompositions are an instance of a so-called sum-product network poon2011sum. However, events having start as well as end timestamps cannot simply be integrated into our current semantic knowledge describing what we know now. For example, (Ban Ki-moon, SecretaryOf, UN) is not consistent with what we know currently. To resolve this problem, we introduce two types of time indices, $t^{begin}$ and $t^{end}$, having the latent representations $\mathbf{a}^{begin}_t$ and $\mathbf{a}^{end}_t$, respectively. These time indices can be used to construct the episodic tensor $\underline{Z}^{begin}$ aggregating the start timestamps of consecutive events, as well as the episodic tensor $\underline{Z}^{end}$ aggregating the end timestamps. (For example, if the duration of a triple event lasts from $t_1$ to $t_2$, the corresponding quadruple is stored in $\underline{Z}^{begin}$ at $t_1$, while it is stored in $\underline{Z}^{end}$ at $t_2$ only if $t_2 < T$, where $T$ is the last timestamp. In other words, events that last until the last timestamp do not possess an end timestamp.)
For the projection, instead of only summing over the start-time representations, we also subtract the sum over the end-time representations. In this way, we achieve the effect that events that have already terminated (i.e., have an end time index smaller than the current time index) are not integrated into the current semantic facts. Now, to test our hypothesis that this extended projection allows us to derive semantic memory from episodic memory, we trained HolE, DistMult, ComplEx, ConT, and Tucker on the episodic tensors $\underline{Z}^{begin}$ and $\underline{Z}^{end}$, as well as on the semantic tensor derived from ICEWS. Note that only these models allow projection, since their indicator functions can be written in the form $\theta_{s,p,o,t} = \mathbf{a}_t^\top \mathbf{h}(\mathbf{a}_s, \mathbf{a}_p, \mathbf{a}_o)$, where $\mathbf{h}$ can be an arbitrary function of $\mathbf{a}_s$, $\mathbf{a}_p$, and $\mathbf{a}_o$, depending on the model choice. (For ConT, $\mathbf{h} = \mathrm{vec}(\mathbf{a}_s \otimes \mathbf{a}_p \otimes \mathbf{a}_o)$, where $\otimes$ denotes the outer product. For ComplEx, $\mathbf{h} = \mathbf{a}_s \odot \mathbf{a}_p \odot \bar{\mathbf{a}}_o$, where $\odot$ denotes the Hadamard product. The Tree model cannot be written in this form, since time resides in both subtrees $\mathcal{T}_1$ and $\mathcal{T}_2$.) The model parameters are optimized using the margin-based ranking loss (3). (For the projection experiment, we omit the sigmoid function in Eq. (3), and train and interpret the multilinear indicator directly as the probability of an episodic quadruple. Only with this way of training is a projection mathematically legitimate.) Training was first performed on the episodic tensor $\underline{Z}^{begin}$, and then on $\underline{Z}^{end}$ with fixed $\mathbf{a}_s$, $\mathbf{a}_p$, and $\mathbf{a}_o$ obtained from the training on $\underline{Z}^{begin}$, since we assume that the latent representations for subject, object, and predicate of a consecutive event do not change during the event. Note that after training in this way, we can recall the starting and terminal points of a consecutive event (see the episodic tensor reconstruction experiments in Section 3), or infer a current semantic fact solely from the latent representations, instead of using rule-based reasoning.
To evaluate the projection, we compute the recall and area under the precision-recall curve (AUPRC) scores for the projection at different ranks on the ICEWS training dataset, and compare them with the scores obtained from training on the semantic tensor separately. The semantic dataset contains positive triples, which are episodic events that continue until the last (current) timestamp, e.g. (António Guterres, SecretaryOf, UN, True), along with negative triples extracted from already terminated episodic events, e.g. (Ban Ki-moon, SecretaryOf, UN, False). During the test phase of the projection, a triple from the semantic dataset is given with a non-specified time index. Then, for the first method, considering only the starting point of an episodic event, the projection to the semantic space is computed as
$\theta_{s,p,o} = \Big( \sum_{t=1}^{T} \mathbf{a}^{begin}_t \Big)^\top \mathbf{h}(\mathbf{a}_s, \mathbf{a}_p, \mathbf{a}_o),$  (9)
while for the second method considering both starting and terminal points, the projection is computed as
$\theta_{s,p,o} = \Big( \sum_{t=1}^{T} \mathbf{a}^{begin}_t - \sum_{t=1}^{T} \mathbf{a}^{end}_t \Big)^\top \mathbf{h}(\mathbf{a}_s, \mathbf{a}_p, \mathbf{a}_o).$  (10)
Then, the scores are evaluated by taking the label of the given semantic triple as the target, and taking $\sigma(\theta_{s,p,o})$ as the prediction. The goal of this test is to check how well the algorithms can project a given consecutive event into the semantic knowledge space using only the marginalized latent representation of time. All other experimental settings are similar to those in Section 3, and the experiments were repeated four times on differently sampled training datasets.
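Because the indicator factorizes as $\mathbf{a}_t^\top \mathbf{h}$, both projections reduce to summing time embeddings before a single inner product. A minimal NumPy sketch (random data; matrix and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(6)
r, T = 4, 6
A_begin = rng.standard_normal((T, r))   # start-time embeddings a_t^begin, one row per t
A_end = rng.standard_normal((T, r))     # end-time embeddings a_t^end
h = rng.standard_normal(r)              # composed representation h(a_s, a_p, a_o)

theta_start = float(A_begin.sum(axis=0) @ h)                           # Eq. (9)
theta_startend = float((A_begin.sum(axis=0) - A_end.sum(axis=0)) @ h)  # Eq. (10)
```

Subtracting the summed end-time embeddings is what cancels the contribution of events that have already terminated.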
Figure 5 shows the recall scores for the two different projection methods on the training dataset, in comparison to the separately trained semantic dataset. Due to limited space, we only show four models: ConT, Tucker, ComplEx, and HolE. As we can see, only the marginalization considering both starting and terminal time indices allows a reasonable projection from episodic memory to the current semantic memory. Again, ConT exhibits the best performance, with its recall score saturating already at small ranks. (Note that since ConT does not have a direct semantic counterpart, we instead use the semantic results obtained using RESCAL. This is reasonable, since ConT can be viewed as a high-dimensional, i.e., episodic, generalization of RESCAL.) In contrast, HolE shows insufficient projection quality with sizable errors, especially at small ranks, which is due to its higher-order encoding noise. To show that the two types of latent representations of time do not simply eliminate each other for a correct episodic projection, Figure 6 shows the AUPRC scores evaluated on the training dataset. Overall, this experiment supports the idea that semantic memory is a long-term storage for episodic memory, where the exact timing information is lost.
Table 7: Filtered and raw Hits@10 scores for the different projection methods on the genuine and false semantic datasets, compared with directly trained semantic models.

|          | Start  |      | StartEnd |      | Start (false) |      | StartEnd (false) |      | Semantic |      |
| Method   | Filter | Raw  | Filter   | Raw  | Filter        | Raw  | Filter           | Raw  | Filter   | Raw  |
| DistMult | 3.8    | 3.6  | 5.6      | 5.0  | 4.0           | 3.8  | 3.8              | 3.6  | 59.3     | 32.4 |
| HolE     | 5.8    | 5.4  | 5.5      | 5.1  | 4.7           | 4.5  | 5.6              | 5.2  | 56.1     | 31.3 |
| ComplEx  | 4.1    | 3.7  | 4.9      | 4.4  | 3.9           | 3.7  | 3.8              | 3.6  | 60.1     | 29.4 |
| Tucker   | 14.8   | 13.1 | 15.1     | 13.4 | 11.3          | 10.3 | 11.8             | 10.9 | 46.5     | 23.7 |
| ConT     | 30.9   | 24.6 | 40.8     | 30.3 | 23.0          | 19.9 | 22.6             | 19.3 | 43.8     | 20.4 |
For a fair comparison, in the last experiment we report the recall scores of the semantic models obtained by projecting the episodic models with respect to the temporal dimension. We compare two projection methods: the Start projection, which only considers the starting point of episodic events (see Eq. 9), and the StartEnd projection, which takes both the starting and terminal points of episodic events into consideration (see Eq. 10). In addition, we report the recall scores on two semantic datasets. The first one contains genuine semantic facts, while the second one contains false semantic triples, which should already be ruled out through the projection.
Both projections are performed on both semantic datasets, the genuine one and the false one. Theoretically, the recall scores on the genuine semantic dataset should be higher than those on the false dataset. Thus, the model hyperparameters are chosen by monitoring the difference between the Hits@10 recall scores on the genuine and the false semantic dataset.
Table 7 reports the filtered and raw Hits@10 metrics for the different models, projection methods, and datasets. Moreover, we also compare the projection with the recall scores obtained by directly modeling the genuine semantic dataset using the corresponding semantic models. (Note that we use the RESCAL model as the corresponding semantic model for ConT.) The ConT model has the best projection performance, since its projected recall scores on the genuine dataset are much higher than those obtained on the false semantic dataset. Moreover, the StartEnd projection based on the ConT model is the only combination which achieves results similar to the corresponding semantic model. One can also notice that all the projected compositional models are only able to tell whether a semantic triple was already ruled out before the last timestamp; they cannot provide good inference results on the genuine semantic dataset.
5 Conclusion
This paper described the first mathematical models for the declarative memories, i.e., the semantic and episodic memory functions. To model these cognitive functions, we generalized leading approaches for static knowledge graphs (i.e., Tucker, RESCAL, HolE, ComplEx, DistMult) to four-dimensional temporal/episodic knowledge graphs. In addition, we developed two novel generalizations of RESCAL to episodic tensors, i.e., Tree and ConT. In particular, ConT shows superior performance overall, which indicates the importance of the introduced high-dimensional latent representation of time for both sparse episodic tensor reconstruction and generalization.
Our hypothesis is that perception includes an active semantic decoding process, which relies on latent representations of entities and predicates, and that episodic and semantic memories depend on the same decoding process. We argue that temporal knowledge graph embeddings might be models for human cognitive episodic memory, and that semantic memory (facts we know) can be generated from episodic memory by a marginalization operation. We also test this hypothesis on the ICEWS dataset; the experiments show that current semantic facts can only be derived from the episodic tensor by a proper projection that considers both starting and terminal points of consecutive events.
Acknowledgements. This work is funded by the Cognitive Deep Learning research project at Siemens AG.

References
 (1) S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. Ives, DBpedia: A nucleus for a web of open data, The Semantic Web (2007) 722–735.
 (2) F. M. Suchanek, G. Kasneci, G. Weikum, Yago: a core of semantic knowledge, in: Proceedings of the 16th international conference on World Wide Web, ACM, 2007, pp. 697–706.
 (3) K. Bollacker, C. Evans, P. Paritosh, T. Sturge, J. Taylor, Freebase: a collaboratively created graph database for structuring human knowledge, in: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, ACM, 2008, pp. 1247–1250.
 (4) D. Vrandečić, M. Krötzsch, Wikidata: a free collaborative knowledgebase, Communications of the ACM 57 (10) (2014) 78–85.
 (5) A. Singhal, Introducing the knowledge graph: things, not strings, Official Google blog.
 (6) M. Nickel, K. Murphy, V. Tresp, E. Gabrilovich, A review of relational machine learning for knowledge graphs, Proceedings of the IEEE 104 (1) (2016) 11–33.
 (7) H. Ebbinghaus, Über das Gedächtnis: Untersuchungen zur experimentellen Psychologie, Duncker & Humblot, 1885.
 (8) R. C. Atkinson, R. M. Shiffrin, Human memory: A proposed system and its control processes, Psychology of learning and motivation 2 (1968) 89–195.
 (9) L. R. Squire, Memory and Brain, Oxford University Press, 1987.
 (10) E. Tulving, Episodic and semantic memory: Where should we go from here?, Behavioral and Brain Sciences 9 (03) (1986) 573–577.
 (11) D. L. Greenberg, M. Verfaellie, Interdependence of episodic and semantic memory: evidence from neuropsychology, Journal of the International Neuropsychological society 16 (05) (2010) 748–753.
 (12) M. Nickel, V. Tresp, H.-P. Kriegel, A three-way model for collective learning on multi-relational data, in: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 809–816.
 (13) A. Cichocki, Era of big data processing: A new approach via tensor networks and tensor decompositions, in: International Workshop on Smart Info-Media Systems in Asia (SISA2013), 2013.
 (14) A. Cichocki, Tensor networks for big data analytics and large-scale optimization problems, in: Second Int. Conference on Engineering and Computational Mathematics (ECM2013), 2013.
 (15) B. Yang, W.-t. Yih, X. He, J. Gao, L. Deng, Embedding entities and relations for learning and inference in knowledge bases, in: International Conference on Learning Representations (ICLR), 2015.
 (16) M. Nickel, L. Rosasco, T. Poggio, Holographic embeddings of knowledge graphs, in: Thirtieth AAAI Conference on Artificial Intelligence, 2016.
 (17) T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, G. Bouchard, Complex embeddings for simple link prediction, in: International Conference on Machine Learning, 2016, pp. 2071–2080.
 (18) T. A. Plate, Holographic reduced representations, IEEE Transactions on Neural Networks 6 (3) (1995) 623–641.
 (19) K. Hayashi, M. Shimbo, On the equivalence of holographic and complex embeddings for link prediction, CoRR abs/1702.05563. URL http://arxiv.org/abs/1702.05563
 (20) M. D. Ward, A. Beger, J. Cutler, M. Dickenson, C. Dorff, B. Radford, Comparing GDELT and ICEWS event data, Analysis 21 (2013) 267–297.
 (21) A. Schein, J. Paisley, D. M. Blei, H. Wallach, Bayesian Poisson tensor factorization for inferring multilateral relations from sparse dyadic event counts, in: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2015, pp. 1045–1054.
 (22) A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, O. Yakhnenko, Translating embeddings for modeling multi-relational data, in: Advances in Neural Information Processing Systems, 2013, pp. 2787–2795.
 (23) X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in: AISTATS, Vol. 9, 2010, pp. 249–256.
 (24) D. Kingma, J. Ba, Adam: A method for stochastic optimization, Proceedings of the 3rd International Conference on Learning Representations (ICLR).
 (25) W. Deng, J. B. Aimone, F. H. Gage, New neurons and new memories: how does adult hippocampal neurogenesis affect learning and memory?, Nature reviews. Neuroscience 11 (5) (2010) 339.
 (26) O. Lazarov, C. Hollands, Hippocampal neurogenesis: learning to remember, Progress in neurobiology 138 (2016) 1–18.
 (27) K. L. Spalding, O. Bergmann, K. Alkass, S. Bernard, M. Salehpour, H. B. Huttner, E. Boström, I. Westerlund, C. Vial, B. A. Buchholz, et al., Dynamics of hippocampal neurogenesis in adult humans, Cell 153 (6) (2013) 1219–1227.
 (28) J. L. McClelland, B. L. McNaughton, R. C. O'Reilly, Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory, Psychological Review 102 (3) (1995) 419.
 (29) L. Nadel, A. Samsonovich, L. Ryan, M. Moscovitch, Multiple trace theory of human memory: computational, neuroimaging, and neuropsychological results, Hippocampus 10 (4) (2000) 352–368.
 (30) H. Poon, P. Domingos, Sum-product networks: A new deep architecture, in: Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, IEEE, 2011, pp. 689–690.