1 Introduction
Table 1: Properties of SICTF and other related methods.
Method  Target task  Interpretable latent factors?  Can induce relation schema?  Can use NP side info?  Can use relation side info?
Typed RESCAL [Chang et al.2014a]  Embedding  No  No  Yes  No
Universal Schema [Singh et al.2015]  Link Prediction  No  No  No  No
KBLDA [MovshovitzAttias and Cohen2015]  Ontology Induction  Yes  Yes  Yes  No
SICTF (this paper)  Schema Induction  Yes  Yes  Yes  Yes
Over the last few years, several techniques for building Knowledge Graphs (KGs) from large unstructured text corpora have been proposed; examples include NELL [Mitchell et al.2015] and Google Knowledge Vault [Dong et al.2014]. Such KGs consist of millions of entities (e.g., Oslo, Norway), their types (e.g., isA(Oslo, City), isA(Norway, Country)), and relationships among them (e.g., cityLocatedInCountry(Oslo, Norway)). These KG construction techniques are called ontology-guided because they require as input a list of relations, their schemas (i.e., their type signatures, e.g., cityLocatedInCountry(City, Country)), and seed instances of each such relation. Such lists of relations and their schemas are usually prepared by human domain experts.
The reliance on domain expertise poses significant challenges when such ontology-guided KG construction techniques are applied to domains where domain experts are either not available or are too expensive to employ. Even when such a domain expert is available for a limited time, she may be able to provide only a partial listing of relations and their schemas relevant to that particular domain. Moreover, this expert-mediated model does not scale when new data in the domain becomes available, bringing with it potential new relations of interest. To overcome these challenges, we need automatic techniques which can discover relations and their schemas from the unstructured text data itself, without requiring extensive human input. We refer to this problem as Relation Schema Induction (RSI).
In contrast to the ontology-guided KG construction techniques mentioned above, Open Information Extraction (OpenIE) techniques [Etzioni et al.2011] aim to extract surface-level triples from unstructured text. Such OpenIE triples may provide a suitable starting point for the RSI problem. In fact, KBLDA, a topic modeling-based method for inducing an ontology from SVO (Subject-Verb-Object) triples, was recently proposed in [MovshovitzAttias and Cohen2015]. We note that ontology induction [Velardi et al.2013] is a more general problem than RSI, as we are primarily interested in identifying categories and relations from a domain corpus, and not necessarily any hierarchy over them. Nonetheless, KBLDA may be used for the RSI problem, and we use it as a representative of the state of the art in this area.
Instead of a topic modeling approach, we take a tensor factorization-based approach to RSI in this paper. Tensors are a higher-order generalization of matrices, and they provide a natural way to represent OpenIE triples. Applying tensor factorization methods over OpenIE triples to identify relation schemas is a natural approach, but one that has not been explored so far. A tensor factorization-based approach also presents a flexible and principled way to incorporate various types of side information. Moreover, as we shall see in Section 4, compared to state-of-the-art baselines such as KBLDA, the tensor factorization-based approach yields a better and faster solution to the RSI problem. In this paper, we make the following contributions:

We present Schema Induction using Coupled Tensor Factorization (SICTF), a novel and principled tensor factorization method which jointly factorizes a tensor constructed out of OpenIE triples extracted from a domain corpus, along with various types of additional side information for relation schema induction.

We compare SICTF against a state-of-the-art baseline on real-world datasets from diverse domains. We observe that SICTF is not only significantly more accurate than this baseline, but also much faster. For example, SICTF achieves a 14x speedup over KBLDA [MovshovitzAttias and Cohen2015].

We have made the data and code available at https://github.com/malllabiisc/sictf.
2 Related Work
Schema Induction:
Properties of SICTF and other related methods are summarized in Table 1. Note that not all methods mentioned in the table are directly comparable with SICTF; the table only illustrates the differences, and KBLDA is the only method which is directly comparable. A method for inducing (binary) relations and the categories they connect was proposed by [Mohamed et al.2011]. However, in that work, categories and their instances were known a priori. In contrast, in the case of SICTF, both categories and relations are to be induced.
A method for event schema induction, the task of learning high-level representations of complex events and their entity roles from unlabeled text, was proposed in [Chambers2013]. This gives the schemas of slots per event, but our goal is to find schemas of relations.
[Chen et al.2013] and [Chen et al.2015] deal with the problem of finding semantic slots for unsupervised spoken language understanding, but we are interested in finding schemas of relations relevant for a given domain.
Methods for link prediction in the Universal Schema setting using matrix factorization and a combination of matrix and tensor factorization are proposed in [Riedel et al.2013] and [Singh et al.2015], respectively. Instead of link prediction, where relation schemas are assumed to be given, SICTF focuses on discovering such relation schemas. Moreover, in contrast to such methods, which assume access to existing KGs, the setting in this paper is unsupervised.
Tensor Factorization: Due to their flexibility of representation and effectiveness, tensor factorization methods have seen increased application in Knowledge Graph (KG) related problems over the last few years. Methods for decomposing ontological KGs such as YAGO [Suchanek et al.2007] were proposed in [Nickel et al.2012, Chang et al.2014b, Chang et al.2014a]. In these cases, relation schemas are known in advance, while we are interested in inducing such relation schemas from unstructured text. A PARAFAC [Harshman1970] based method for jointly factorizing a matrix and a tensor for data fusion was proposed in [Acar et al.2013]. In such cases, the matrix is used to provide auxiliary information [Narita et al.2012, Erdos and Miettinen2013]. Similar PARAFAC-based ideas are explored in Rubik [Wang et al.2015] to factorize structured electronic health records. In contrast to such structured data sources, SICTF aims at inducing relation schemas from unstructured text data. Propstore, a tensor-based model for distributional semantics, a problem different from RSI, was presented in [Goyal et al.2013]. Even though coupled factorization of tensors and matrices constructed from an unstructured text corpus provides a natural and plausible approach to the RSI problem, it has not yet been explored; we fill this gap in this paper.
Ontology Induction: Relation Schema Induction can be considered a sub-problem of Ontology Induction [Velardi et al.2013]. Instead of building a full-fledged hierarchy over categories and relations as in ontology induction, we are particularly interested in finding relations and their schemas from an unstructured text corpus. We consider KBLDA [MovshovitzAttias and Cohen2015], a topic-modeling based approach for ontology induction, as a representative of this area (in this paper, whenever we refer to KBLDA, we only refer to the part of it that learns relations from unstructured data). Among all prior work, KBLDA is most related to SICTF. While both KBLDA and SICTF make use of noun phrase side information, SICTF is also able to exploit relational side information in a principled manner. In Section 4, through experiments on multiple real-world datasets, we observe that SICTF is not only more accurate than KBLDA but also significantly faster, with a speedup of 14x.
A method for canonicalizing noun and relation phrases in OpenIE triples was recently proposed in [Galárraga et al.2014]. The main focus of this approach is to cluster lexical variants of a single entity or relation. This is not directly relevant for RSI, as we are interested in grouping multiple entities of the same type into one cluster, and use that to induce relation schema.
3 Our Approach: Schema Induction using Coupled Tensor Factorization (SICTF)
3.1 Overview
SICTF poses the relation schema induction problem as a coupled factorization of a tensor along with matrices containing relevant side information. The overall architecture of the SICTF system is presented in Figure 1. First, a tensor $X \in \mathbb{R}_+^{n \times n \times m}$ is constructed to store OpenIE triples and their scores extracted from the text corpus ($\mathbb{R}_+$ is the set of non-negative reals). Here, $n$ and $m$ represent the number of NPs and relation phrases, respectively. Following [MovshovitzAttias and Cohen2015], SICTF makes use of noun phrase (NP) side information in the form of (noun phrase, hypernym) pairs. Additionally, SICTF exploits relation-relation similarity side information. These two types of side information are stored in matrices $W \in \{0,1\}^{n \times h}$ and $S \in \{0,1\}^{m \times m}$, where $h$ is the number of hypernyms extracted from the corpus. SICTF then performs collective non-negative factorization over $X$, $W$, and $S$ to output the matrix $A \in \mathbb{R}_+^{n \times c}$ and the core tensor $R \in \mathbb{R}_+^{c \times c \times m}$, where $c$ is the number of induced categories. Each row of $A$ corresponds to an NP, while each column corresponds to an induced category (latent factor). For brevity, we refer to the induced category corresponding to the $q$-th column of $A$ as $A_q$. Each entry $A_{pq}$ of the output matrix provides a membership score for NP $p$ in induced category $A_q$. Note that each induced category is represented by the NPs participating in it, ranked by their membership scores. Figure 1 shows an example of such an induced category.
Each slice $R_{:,:,k}$ of the core tensor $R$ is a $c \times c$ matrix which corresponds to a specific relation; e.g., the matrix highlighted in Figure 1 corresponds to the relation undergo. Each cell in this matrix corresponds to an induced schema connecting two induced categories (two columns of the matrix $A$), with the cell value representing the model’s score for the induced schema. For example, Figure 1 shows an induced relation schema for the relation undergo, together with its score and the two induced categories it connects.
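The data layout described in this overview can be illustrated with a minimal NumPy sketch. The triples, scores, and the choice of two induced categories below are invented for illustration, and the factors are randomly initialised rather than learned; the names `X`, `A`, and `R` stand for the input triple tensor, the NP-to-category matrix, and the core tensor.

```python
import numpy as np

# Toy OpenIE triples: (subject NP, relation phrase, object NP, score).
# The scores stand in for corpus frequencies; all values are invented.
triples = [
    ("patient", "undergo", "biopsy", 3.0),
    ("patient", "undergo", "surgery", 2.0),
    ("patient", "suffer_from", "renal disease", 4.0),
]

nps = sorted({t[0] for t in triples} | {t[2] for t in triples})
rels = sorted({t[1] for t in triples})
np_index = {name: i for i, name in enumerate(nps)}
rel_index = {name: k for k, name in enumerate(rels)}

n, m = len(nps), len(rels)
X = np.zeros((n, n, m))  # X[i, j, k]: score of (NP_i, relation_k, NP_j)
for subj, rel, obj, score in triples:
    X[np_index[subj], np_index[obj], rel_index[rel]] += score

# Shapes of the factors for c induced categories (random, not learned).
c = 2
rng = np.random.default_rng(0)
A = rng.random((n, c))     # A[p, q]: membership score of NP p in category q
R = rng.random((c, c, m))  # R[:, :, k]: c x c schema scores for relation k

# Each relation slice of X is modelled as A @ R[:, :, k] @ A.T.
k = rel_index["undergo"]
X_hat_k = A @ R[:, :, k] @ A.T
```

Every relation contributes one $n \times n$ slice, so the tensor is typically very sparse; a real implementation would likely use sparse slices rather than a dense array.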
3.2 Side Information
Table 2: Examples of (NP, hypernym) pairs extracted from the two datasets.
Dataset  (NP, Hypernym) pairs
MEDLINE  (hypertension, disease), (hypertension, state), (hypertension, disorder), (neutrophil, blood element), (neutrophil, effector cell), (neutrophil, cell type)
StackOverflow  (image, resource), (image, content), (image, file), (perl, language), (perl, script), (perl, programs)

Table 3: Examples of similar relation pairs in the two datasets.
MEDLINE  StackOverflow
(evaluate, analyze), (evaluate, examine), (indicate, confirm), (indicate, suggest)  (provides, confirms), (provides, offers), (allows, lets), (allows, enables)

Noun Phrase Side Information: Through this type of side information, we would like to capture type information for as many noun phrases (NPs) as possible. We apply Hearst patterns [Hearst1992], e.g., "Hypernym such as NP", over the corpus to extract such (NP, Hypernym) pairs. Note that neither hypernyms nor NPs are pre-specified; they are all extracted from the data by the patterns. Examples of a few such pairs extracted from two different datasets are shown in Table 2. These extracted tuples are stored in a matrix $W$ whose rows correspond to NPs and whose columns correspond to extracted hypernyms. We define
$$W_{ij} = \begin{cases} 1 & \text{if } \mathrm{NP}_i \text{ belongs to } \mathrm{Hypernym}_j \\ 0 & \text{otherwise} \end{cases}$$
Note that we do not expect $W$ to be a fully specified matrix, i.e., we do not assume that we know all possible hypernyms for a given NP.
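As a rough illustration of how (NP, hypernym) pairs could populate such a matrix, the sketch below applies a single, heavily simplified "hypernym such as NP" pattern. The regular expression, the one-word restriction on both sides, the lack of lemmatization, and the example sentences are all assumptions made for brevity; the patterns of [Hearst1992] operate on proper noun phrases.

```python
import re
import numpy as np

# Simplified pattern: "<hypernym> such as <NP>", one lowercase word each.
# This is far cruder than real NP chunking and is for illustration only.
PATTERN = re.compile(r"\b([a-z]+) such as ([a-z]+)\b")

def extract_pairs(sentences):
    """Return the set of (NP, hypernym) pairs matched by PATTERN."""
    pairs = set()
    for sentence in sentences:
        for hypernym, noun_phrase in PATTERN.findall(sentence.lower()):
            pairs.add((noun_phrase, hypernym))
    return pairs

def build_w(pairs):
    """Build the binary NP-by-hypernym matrix from extracted pairs."""
    nps = sorted({p[0] for p in pairs})
    hypernyms = sorted({p[1] for p in pairs})
    W = np.zeros((len(nps), len(hypernyms)))
    for noun_phrase, hypernym in pairs:
        W[nps.index(noun_phrase), hypernyms.index(hypernym)] = 1.0
    return W, nps, hypernyms

sentences = [
    "We support languages such as Perl.",
    "Diseases such as hypertension were studied.",
    "Disorders such as hypertension are common.",
]
pairs = extract_pairs(sentences)
W, nps, hypernyms = build_w(pairs)
```

An NP that never matches a pattern simply has an all-zero row, which reflects the point that the matrix is not expected to be fully specified.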

Relation Side Information: In addition to the side information involving NPs, we would also like to take prior knowledge about textual relations into account during factorization. For example, if we know two relations to be similar to one another, then we also expect their induced schemas to be similar. Consider the sentences "Mary purchased a stuffed animal toy." and "Janet bought a toy car for her son." From these we can say that both relations purchase and buy have the schema (Person, Item). Even if one of these relations is more abundant than the other in the corpus, we still want to learn similar schemas for both relations. As mentioned before, $S \in \{0,1\}^{m \times m}$ is the relation similarity matrix, where $m$ is the number of textual relations. We define
$$S_{ij} = \begin{cases} 1 & \text{if } \mathrm{Similarity}(\mathrm{Rel}_i, \mathrm{Rel}_j) \geq \gamma \\ 0 & \text{otherwise} \end{cases}$$
where $\gamma$ is a threshold (for the experiments in this paper, we set $\gamma$ to a relatively high value in order to focus on highly similar relations, thereby justifying the binary matrix). For the experiments in this paper, we use cosine similarity over word2vec [Mikolov et al.2013] vector representations of the relational phrases. Examples of a few similar relation pairs are shown in Table 3.
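The thresholding step can be sketched as follows. The two-dimensional vectors are hand-made stand-ins for word2vec embeddings, the threshold of 0.8 is a hypothetical value chosen only for this example, and zeroing the diagonal is an assumption (a relation's similarity to itself contributes nothing when comparing a schema with itself).

```python
import numpy as np

def relation_similarity_matrix(vectors, gamma):
    """Binary matrix with S[i, j] = 1 when cosine(v_i, v_j) >= gamma."""
    V = np.asarray(vectors, dtype=float)
    unit = V / np.linalg.norm(V, axis=1, keepdims=True)
    cosine = unit @ unit.T
    S = (cosine >= gamma).astype(float)
    np.fill_diagonal(S, 0.0)  # assumption: ignore self-similarity
    return S

relations = ["purchase", "buy", "indicate"]
# Hand-made stand-ins for embeddings of the relation phrases.
vectors = [[1.0, 0.1], [0.9, 0.2], [0.0, 1.0]]
S = relation_similarity_matrix(vectors, gamma=0.8)  # hypothetical threshold
```

Because cosine similarity is symmetric, the resulting matrix is symmetric as well; here only the (purchase, buy) pair exceeds the threshold.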
3.3 SICTF Model Details
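SICTF performs coupled non-negative factorization of the input triple tensor $X$ along with the two side information matrices $W$ and $S$ by solving the following optimization problem.
$$\min_{A, V, R} \; \sum_{k=1}^{m} f(X_{:,:,k}, A, R_{:,:,k}) + f_{np}(W, A, V) + f_{rel}(S, R) \qquad (1)$$
where,
$$f(X_{:,:,k}, A, R_{:,:,k}) = \| X_{:,:,k} - A R_{:,:,k} A^T \|_F^2 + \lambda_R \| R_{:,:,k} \|_F^2$$
$$f_{np}(W, A, V) = \lambda_{np} \| W - A V \|_F^2 + \lambda_A \| A \|_F^2 + \lambda_V \| V \|_F^2$$
$$f_{rel}(S, R) = \lambda_{rel} \sum_{i=1}^{m} \sum_{j=1}^{m} S_{ij} \| R_{:,:,i} - R_{:,:,j} \|_F^2$$
$$A_{i,j} \geq 0, \quad V_{j,r} \geq 0, \quad R_{p,q,k} \geq 0 \quad \text{(non-negative)}$$
$$\forall \; 1 \leq i \leq n, \; 1 \leq j \leq c, \; 1 \leq r \leq h, \; 1 \leq p, q \leq c, \; 1 \leq k \leq m$$
In the objective above, the first term $f(X_{:,:,k}, A, R_{:,:,k})$ minimizes the reconstruction error for the $k$-th relation, with additional regularization on the $R_{:,:,k}$ matrix (for brevity, we also refer to $R_{:,:,k}$ as $R_k$, and similarly to $X_{:,:,k}$ as $X_k$). The second term, $f_{np}(W, A, V)$, factorizes the NP side information matrix $W$ into two matrices $A$ and $V$, where $c$ is the number of induced categories. We also enforce $V$ to be non-negative. Typically, we require $c \ll h$, where $h$ is the number of hypernyms, to get a lower-dimensional embedding of each NP (rows of $A$). Finally, the third term $f_{rel}(S, R)$ enforces the requirement that two similar relations, as given by the matrix $S$, should have similar signatures (given by the corresponding $R_k$ matrices). Additionally, we require $A$ and $R_k$ to be non-negative, as marked by the (non-negative) constraints. In this objective, $\lambda_R$, $\lambda_{np}$, $\lambda_A$, $\lambda_V$, and $\lambda_{rel}$ are all hyperparameters.
We derive non-negative multiplicative updates for $A$, $R_k$, and $V$ following the rules proposed in [Lee and Seung2000], which have the following general form:
$$\theta_i \leftarrow \theta_i \left( \frac{\nabla_i C(\theta)^-}{\nabla_i C(\theta)^+} \right)^{\alpha}$$
Here $C(\theta)$ represents the cost function of the non-negative variables $\theta$, and $\nabla_i C(\theta)^-$ and $\nabla_i C(\theta)^+$ are the negative and positive parts of the derivative of $C(\theta)$ [Mørup et al.2008]. [Lee and Seung2000] proved that for $\alpha = 1$, the cost function monotonically decreases with the multiplicative updates (we also use $\alpha = 1$). $C(\theta)$ for SICTF is given in equation (1). The above procedure gives the following updates:
$$A \leftarrow A * \frac{\sum_{k} \left( X_k A R_k^T + X_k^T A R_k \right) + \lambda_{np} W V^T}{A \left( \sum_{k} \left( R_k A^T A R_k^T + R_k^T A^T A R_k \right) + \lambda_{np} V V^T + \lambda_A I \right)}$$
$$R_k \leftarrow R_k * \frac{A^T X_k A + 2 \lambda_{rel} \sum_{j=1}^{m} S_{kj} R_j}{A^T A R_k A^T A + 2 \lambda_{rel} R_k \sum_{j=1}^{m} S_{kj} + \lambda_R R_k}$$
$$V \leftarrow V * \frac{\lambda_{np} A^T W}{\lambda_{np} A^T A V + \lambda_V V}$$
In the equations above, $*$ is the Hadamard or element-wise product, $(A * B)_{i,j} = A_{i,j} \times B_{i,j}$, and the fractions denote element-wise division. In all our experiments, we find the iterative updates above to converge in about 10-20 iterations.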
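To make the update procedure concrete, here is a minimal NumPy sketch of one round of multiplicative updates. It assumes the objective is the sum of squared reconstruction errors $\|X_k - A R_k A^T\|_F^2$, a term $\lambda_{np}\|W - AV\|_F^2$, Frobenius regularizers on $A$, $V$, and each $R_k$, and a penalty $\lambda_{rel} \sum_{i,j} S_{ij} \|R_i - R_j\|_F^2$ with symmetric $S$. The function names, the dictionary of hyperparameters, the small constant `eps`, and the random toy data are all illustrative assumptions and not the released implementation.

```python
import numpy as np

def objective(X, W, S, A, R, V, lam):
    """Value of the assumed coupled objective, useful for monitoring."""
    m = X.shape[2]
    cost = lam["np"] * np.linalg.norm(W - A @ V) ** 2
    cost += lam["A"] * np.linalg.norm(A) ** 2 + lam["V"] * np.linalg.norm(V) ** 2
    for k in range(m):
        Rk = R[:, :, k]
        cost += np.linalg.norm(X[:, :, k] - A @ Rk @ A.T) ** 2
        cost += lam["R"] * np.linalg.norm(Rk) ** 2
        for j in range(m):
            cost += lam["rel"] * S[k, j] * np.linalg.norm(Rk - R[:, :, j]) ** 2
    return cost

def sictf_step(X, W, S, A, R, V, lam, eps=1e-9):
    """One multiplicative update of A, every R_k, and V (S symmetric)."""
    m = X.shape[2]
    # A: ratio of negative to positive gradient parts, element-wise.
    num = lam["np"] * (W @ V.T)
    inner = lam["np"] * (V @ V.T) + lam["A"] * np.eye(A.shape[1])
    for k in range(m):
        Xk, Rk = X[:, :, k], R[:, :, k]
        num = num + Xk @ A @ Rk.T + Xk.T @ A @ Rk
        inner = inner + Rk @ A.T @ A @ Rk.T + Rk.T @ A.T @ A @ Rk
    A = A * num / (A @ inner + eps)
    # R_k: similar relations pull each other's slices together.
    AtA = A.T @ A
    R_new = np.empty_like(R)
    for k in range(m):
        Rk = R[:, :, k]
        neighbours = np.tensordot(R, S[k], axes=([2], [0]))  # sum_j S[k, j] R_j
        num = A.T @ X[:, :, k] @ A + 2 * lam["rel"] * neighbours
        den = AtA @ Rk @ AtA + 2 * lam["rel"] * S[k].sum() * Rk + lam["R"] * Rk
        R_new[:, :, k] = Rk * num / (den + eps)
    # V: standard NMF-style update for the side-information factor.
    V = V * (lam["np"] * (A.T @ W)) / (lam["np"] * (AtA @ V) + lam["V"] * V + eps)
    return A, R_new, V

# Toy run on random non-negative data (all sizes and weights invented).
rng = np.random.default_rng(0)
n, m, h, c = 6, 3, 4, 2
X = rng.random((n, n, m))
W = (rng.random((n, h)) > 0.5).astype(float)
S = np.zeros((m, m))
S[0, 1] = S[1, 0] = 1.0
lam = {"R": 0.1, "np": 1.0, "A": 0.1, "V": 0.1, "rel": 1.0}
A, R, V = rng.random((n, c)), rng.random((c, c, m)), rng.random((c, h))
for _ in range(20):
    A, R, V = sictf_step(X, W, S, A, R, V, lam)
final_cost = objective(X, W, S, A, R, V, lam)
```

Since every update multiplies a non-negative factor by a ratio of non-negative quantities, non-negativity is preserved without any projection step. The sketch makes no claim of monotone decrease for this coupled objective; tracking `objective` across iterations is the simplest way to check convergence in practice.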
4 Experiments
Table 4: Datasets used in the experiments.
Dataset  # Docs  # Triples
MEDLINE  50,216  2,499
StackOverflow  5.5m  37,439

Table 5: Examples of relation schemas induced by SICTF, with annotator judgments.
Dataset  Relation Schema  Top 3 NPs in Induced Categories which were presented to annotators  Annotator Judgment
StackOverflow  clicks  Category 1: users, client, person; Category 2: link, image, item  valid
StackOverflow  refreshes  Category 1: browser, window, tab; Category 2: page, activity, app  valid
StackOverflow  can_parse  Category 1: access, permission, ability; Category 2: image file, header file, zip file  invalid
MEDLINE  suffer_from  Category 1: patient, first patient, anesthetized patient; Category 2: viral disease, renal disease, von recklinghausen’s disease  valid
MEDLINE  have_undergo  Category 1: fifth patient, third patient, sixth patient; Category 2: initial liver biopsy, gun biopsy, lymph node biopsy  valid
MEDLINE  have_discontinue  Category 1: patient, group, no patient; Category 2: endemic area, this area, fiber area  invalid
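In this section, we evaluate the performance of different methods on the Relation Schema Induction (RSI) task. Specifically, we examine the effectiveness of SICTF (Section 4.3.1), the importance of side information (Section 4.3.2), and the importance of non-negativity (Section 4.3.3).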
4.1 Experimental Setup
Datasets: We used two datasets for the experiments in this paper; they are summarized in Table 4. For the MEDLINE dataset, we used Stanford CoreNLP [Manning et al.2014] for coreference resolution and Open IE v4.0 (http://knowitall.github.io/openie/) for triple extraction. Only triples whose noun phrases have hypernym information were retained. We obtained the StackOverflow triples directly from the authors of [MovshovitzAttias and Cohen2015]; these were prepared using a very similar process. In both datasets, we use the corpus frequency of triples for constructing the tensor.
Side Information: Seven Hearst patterns given in [Hearst1992], such as "hypernym such as NP" and "NP or other hypernym", were used to extract NP side information from the MEDLINE documents. NP side information for the StackOverflow dataset was obtained from the authors of [MovshovitzAttias and Cohen2015].
As described in Section 3, word2vec embeddings of the relation phrases were used to extract relation-similarity based side information. This was done for both datasets, using the cosine similarity threshold $\gamma$ described in Section 3.2.
Samples of the side information used in the experiments are shown in Table 2 and Table 3. A total of 2067 unique NP-hypernym pairs were extracted from the MEDLINE data and 16,639 from the StackOverflow data. 25 unique pairs of relation phrases out of 1172 were found to be similar in the MEDLINE data, whereas 280 unique pairs of relation phrases out of approximately 3200 were found to be similar in the StackOverflow data.
Hyperparameters were tuned using grid search, and the setting which gives the minimum reconstruction error for both $X$ and $W$ was chosen. Hyperparameter values were selected separately for StackOverflow and MEDLINE. Note that our setting is unsupervised, and hence there are no separate train, dev, and test sets.
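Table 6: RSI accuracy of SICTF and its ablated versions, as judged by two annotators (A1, A2) and averaged (Avg). Setting $\lambda_{rel} = 0$ removes relation side information, and setting $\lambda_{np} = 0$ removes NP side information.
Ablation  MEDLINE A1  MEDLINE A2  MEDLINE Avg  StackOverflow A1  StackOverflow A2  StackOverflow Avg
SICTF  0.64  0.64  0.64  0.96  0.92  0.94
SICTF ($\lambda_{rel} = 0$)  0.60  0.56  0.58  0.83  0.70  0.77
SICTF ($\lambda_{np} = 0$)  0.46  0.40  0.43  0.89  0.90  0.90
SICTF ($\lambda_{rel} = 0$, $\lambda_{np} = 0$)  0.46  0.50  0.48  0.84  0.33  0.59
SICTF ($\lambda_{rel} = 0$, $\lambda_{np} = 0$, and no non-negativity constraints)  0.14  0.10  0.12  0.20  0.14  0.17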
4.2 Evaluation Protocol
In this section, we describe how the induced schemas are presented to human annotators and how the final accuracies are calculated. In the factorizations produced by SICTF and its ablated versions, we first select a few top relations with the best reconstruction scores. The schemas induced for each selected relation $k$ are represented by the matrix slice $R_k$ of the core tensor obtained after factorization (see Section 3). From each such matrix, we identify the indices $(i, j)$ with the highest values. The indices $i$ and $j$ select columns of the matrix $A$. A few top-ranking NPs from the columns $A_i$ and $A_j$, along with the relation $k$, are presented to the human annotator, who then evaluates whether the tuple $\mathrm{Relation}_k(A_i, A_j)$ constitutes a valid schema for relation $k$. Examples of a few relation schemas induced by SICTF are presented in Table 5. A human annotator would see the first and second columns of this table and then offer a judgment as indicated in the third column. All such judgments across all top-reconstructed relations are aggregated to get the final accuracy score. This evaluation protocol was also used in [MovshovitzAttias and Cohen2015] to measure learned relation accuracy.
All evaluations were blind, i.e., the annotators were not aware of the method that generated the output they were evaluating. Moreover, the annotators are experts in the software domain and have high-school level knowledge of the medical domain. Though recall is a desirable statistic to measure, it is very challenging to calculate in our setting due to the non-availability of relation-schema-annotated text on a large scale.
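The selection step of this protocol can be sketched as follows: for one relation slice of the core tensor, pick the highest-scoring cell and list the top-ranked NPs of the two induced categories it connects. The function name, its parameters, and the toy matrices are hypothetical and serve only to illustrate the indexing.

```python
import numpy as np

def top_schemas(A, R, nps, k, num_schemas=1, num_nps=3):
    """Return (i, j, score, top NPs of column i, top NPs of column j)
    for the highest-scoring cells of the relation slice R[:, :, k]."""
    Rk = R[:, :, k]
    order = np.argsort(Rk, axis=None)[::-1][:num_schemas]
    schemas = []
    for flat in order:
        i, j = np.unravel_index(flat, Rk.shape)
        top_i = [nps[p] for p in np.argsort(A[:, i])[::-1][:num_nps]]
        top_j = [nps[p] for p in np.argsort(A[:, j])[::-1][:num_nps]]
        schemas.append((int(i), int(j), float(Rk[i, j]), top_i, top_j))
    return schemas

# Toy factors: 4 NPs, 2 induced categories, 1 relation (values invented).
nps = ["patient", "first patient", "biopsy", "lymph node biopsy"]
A = np.array([[0.9, 0.0], [0.7, 0.1], [0.0, 0.8], [0.1, 0.6]])
R = np.zeros((2, 2, 1))
R[:, :, 0] = [[0.1, 0.9], [0.0, 0.2]]

schemas = top_schemas(A, R, nps, k=0, num_schemas=1, num_nps=2)
```

In this toy case the single returned schema connects category 0 (patient, first patient) to category 1 (biopsy, lymph node biopsy), which is the kind of tuple an annotator would then judge as valid or invalid.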
4.3 Results
4.3.1 Effectiveness of SICTF
Experimental results comparing the performance of various methods on the RSI task in the two datasets are presented in Figure 2(a). RSI accuracy is calculated based on the evaluation protocol described in Section 4.2. The performance number of KBLDA for the StackOverflow dataset is taken directly from [MovshovitzAttias and Cohen2015]; we used our own implementation of KBLDA for the MEDLINE dataset. Annotation accuracies from two annotators were averaged to get the final accuracy. From Figure 2(a), we observe that SICTF outperforms KBLDA on the RSI task. Note that the inter-annotator agreement for SICTF is 88% and 97% for the MEDLINE and StackOverflow datasets, respectively. This is the main result of the paper.
In addition to KBLDA, we also compared SICTF with PARAFAC, a standard tensor factorization method. PARAFAC induced only a small number of relation schemas, and these were of extremely poor quality; hence we did not consider it any further.
Runtime comparison: Runtimes of SICTF and KBLDA over both datasets are compared in Figure 2(b). From this figure, we find that SICTF achieves a 14x speedup on average over KBLDA (the runtime of KBLDA over the StackOverflow dataset was obtained from the authors of [MovshovitzAttias and Cohen2015] through personal communication; our own implementation also resulted in a similar runtime over this dataset). In other words, SICTF not only induces better relation schemas, but also does so significantly faster.
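The aggregation of annotator judgments can be illustrated with a short sketch. The judgments below are invented, and computing agreement as the fraction of items on which the two annotators give the same label is an assumption; the text does not state which agreement measure was used.

```python
def accuracy(judgments):
    """Fraction of presented schemas judged valid by one annotator."""
    return sum(j == "valid" for j in judgments) / len(judgments)

def raw_agreement(first, second):
    """Fraction of schemas on which two annotators give the same label
    (an assumed agreement measure, used here only for illustration)."""
    return sum(a == b for a, b in zip(first, second)) / len(first)

# Invented judgments for four induced schemas.
a1 = ["valid", "valid", "invalid", "valid"]
a2 = ["valid", "invalid", "invalid", "valid"]

final_accuracy = (accuracy(a1) + accuracy(a2)) / 2
agreement = raw_agreement(a1, a2)
```

The averaged value corresponds to the Avg columns of Table 6, which are the means of the A1 and A2 columns.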
4.3.2 Importance of Side Information
One of the central hypotheses of our approach is that coupled factorization through additional side information should result in better relation schema induction. In order to evaluate this hypothesis further, we compare the performance of SICTF with its ablated versions: (1) SICTF ($\lambda_{rel} = 0$), which corresponds to the setting where no relation side information is used, (2) SICTF ($\lambda_{np} = 0$), which corresponds to the setting where no noun phrase side information is used, and (3) SICTF ($\lambda_{rel} = 0$, $\lambda_{np} = 0$), which corresponds to the setting where no side information of any kind is used. Hyperparameters are tuned separately for each variant of SICTF. Results are presented in the first four rows of Table 6. From this, we observe that additional coupling through the side information significantly improves SICTF performance. This further validates the central thesis of our paper.
4.3.3 Importance of Non-Negativity on Relation Schema Induction
In the last row of Table 6, we also present an ablated version of SICTF in which neither side information nor non-negativity constraints are used. Comparing the last two rows of this table, we observe that non-negativity constraints over the matrix $A$ and the core tensor $R$ result in a significant improvement in performance. We note that the last row in Table 6 is equivalent to RESCAL [Nickel et al.2011] and the fourth row is equivalent to Non-Negative RESCAL [Krompaß et al.2013], two tensor factorization techniques. We also note that neither of these tensor factorization techniques has previously been used for the relation schema induction problem.
The reason for this improved performance may be that the absence of non-negativity constraints results in an under-constrained factorization problem, where the model often overgenerates incorrect triples and then compensates for this overgeneration by using negative latent factor weights. In contrast, imposing non-negativity constraints restricts the model further, forcing it to commit to specific semantics of the latent factors in $A$. This improved interpretability also results in better RSI accuracy, as we have seen above. Similar benefits of non-negativity on interpretability have also been observed in matrix factorization [Murphy et al.2012].
5 Conclusion
Relation Schema Induction (RSI) is an important first step towards building a Knowledge Graph (KG) out of a text corpus from a given domain. While human domain experts have traditionally prepared listings of relations and their schemas, this expert-mediated model poses significant challenges in terms of scalability and coverage. In order to overcome these challenges, in this paper, we present SICTF, a novel non-negative coupled tensor-matrix factorization method for relation schema induction. SICTF is flexible enough to incorporate various types of side information during factorization. Through extensive experiments on real-world datasets, we find that SICTF is not only more accurate but also significantly faster (about 14x speedup) compared to state-of-the-art baselines. As part of future work, we hope to analyze SICTF and its optimization further, assign labels to induced categories, and also apply the model to more domains. The code and data used in the paper are publicly available (see Section 1).
Acknowledgement
We thank the members of MALL Lab, IISc, who read our drafts and gave valuable feedback, and the reviewers for their constructive reviews. This research has been supported in part by Bosch Engineering and Business Solutions and Google.
References
 [Acar et al.2013] Evrim Acar, Morten Arendt Rasmussen, Francesco Savorani, Tormod Næs, and Rasmus Bro. 2013. Understanding data fusion within the framework of coupled matrix and tensor factorizations. Chemometrics and Intelligent Laboratory Systems, 129(Complete):53–63.
 [Chambers2013] Nathanael Chambers. 2013. Event schema induction with a probabilistic entity-driven model. In EMNLP, pages 1797–1807. ACL.

 [Chang et al.2014a] Kai-Wei Chang, Wen-tau Yih, Bishan Yang, and Christopher Meek. 2014a. Typed tensor decomposition of knowledge bases for relation extraction. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. ACL – Association for Computational Linguistics, October.
 [Chang et al.2014b] Kai-Wei Chang, Wen-tau Yih, Bishan Yang, and Christopher Meek. 2014b. Typed tensor decomposition of knowledge bases for relation extraction. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1568–1579.
 [Chen et al.2013] Yun-Nung Chen, William Y. Wang, and Alexander I. Rudnicky. 2013. Unsupervised induction and filling of semantic slots for spoken dialogue systems using frame-semantic parsing. In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pages 120–125. IEEE.
 [Chen et al.2015] Yun-Nung Chen, William Yang Wang, Anatole Gershman, and Alexander I. Rudnicky. 2015. Matrix factorization with knowledge graph propagation for unsupervised spoken language understanding. In ACL (1), pages 483–494. The Association for Computer Linguistics.
 [Dong et al.2014] Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 601–610. ACM.
 [Erdos and Miettinen2013] Dora Erdos and Pauli Miettinen. 2013. Discovering facts with boolean tensor tucker decomposition. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, CIKM ’13, pages 1569–1572, New York, NY, USA. ACM.
 [Etzioni et al.2011] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam Mausam. 2011. Open information extraction: The second generation. In IJCAI, volume 11, pages 3–10.
 [Galárraga et al.2014] Luis Galárraga, Geremy Heitz, Kevin Murphy, and Fabian Suchanek. 2014. Canonicalizing Open Knowledge Bases. CIKM.
 [Goyal et al.2013] Kartik Goyal, Sujay Kumar Jauhar, Huiying Li, Mrinmaya Sachan, Shashank Srivastava, and Eduard Hovy. 2013. A structured distributional semantic model: Integrating structure with semantics.
 [Harshman1970] R. A. Harshman. 1970. Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multimodal factor analysis. UCLA Working Papers in Phonetics, 16(1):84.
 [Hearst1992] Marti A. Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th International Conference on Computational Linguistics, pages 539–545.

 [Krompaß et al.2013] Denis Krompaß, Maximilian Nickel, Xueyan Jiang, and Volker Tresp. 2013. Non-negative tensor factorization with RESCAL. Tensor Methods for Machine Learning, ECML workshop.
 [Lee and Seung2000] Daniel D. Lee and H. Sebastian Seung. 2000. Algorithms for non-negative matrix factorization. In NIPS, pages 556–562. MIT Press.
 [Manning et al.2014] Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55–60.
 [Mikolov et al.2013] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 3111–3119. Curran Associates, Inc.
 [Mitchell et al.2015] T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, and J. Welling. 2015. Never-ending learning. In Proceedings of AAAI.
 [Mohamed et al.2011] Thahir P. Mohamed, Estevam R. Hruschka, Jr., and Tom M. Mitchell. 2011. Discovering relations between noun categories. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’11, pages 1447–1455, Stroudsburg, PA, USA. Association for Computational Linguistics.
 [Mørup et al.2008] M. Mørup, L. K. Hansen, and S. M. Arnfred. 2008. Algorithms for sparse nonnegative TUCKER. Neural Computation, 20(8):2112–2131, aug.
 [MovshovitzAttias and Cohen2015] Dana MovshovitzAttias and William W. Cohen. 2015. Kblda: Jointly learning a knowledge base of hierarchy, relations, and facts. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.
 [Murphy et al.2012] Brian Murphy, Partha Pratim Talukdar, and Tom M Mitchell. 2012. Learning effective and interpretable semantic models using nonnegative sparse embedding. In COLING, pages 1933–1950.
 [Narita et al.2012] Atsuhiro Narita, Kohei Hayashi, Ryota Tomioka, and Hisashi Kashima. 2012. Tensor factorization using auxiliary information. Data Mining and Knowledge Discovery, 25(2):298–324.
 [Nickel et al.2011] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2011. A three-way model for collective learning on multi-relational data. In Lise Getoor and Tobias Scheffer, editors, Proceedings of the 28th International Conference on Machine Learning (ICML-11), ICML ’11, pages 809–816, New York, NY, USA, June. ACM.
 [Nickel et al.2012] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2012. Factorizing yago: Scalable machine learning for linked data. In Proceedings of the 21st International Conference on World Wide Web, WWW ’12, pages 271–280, New York, NY, USA. ACM.
 [Riedel et al.2013] Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. 2013. Relation extraction with matrix factorization and universal schemas. In Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, June 9-14, 2013, Westin Peachtree Plaza Hotel, Atlanta, Georgia, USA, pages 74–84.
 [Singh et al.2015] Sameer Singh, Tim Rocktäschel, and Sebastian Riedel. 2015. Towards Combined Matrix and Tensor Factorization for Universal Schema Relation Extraction. In NAACL Workshop on Vector Space Modeling for NLP (VSM).
 [Suchanek et al.2007] Fabian M Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowledge. In Proceedings of WWW.
 [Velardi et al.2013] Paola Velardi, Stefano Faralli, and Roberto Navigli. 2013. Ontolearn reloaded: A graph-based algorithm for taxonomy induction. Computational Linguistics, 39(3):665–707.
 [Wang et al.2015] Yichen Wang, Robert Chen, Joydeep Ghosh, Joshua C. Denny, Abel N. Kho, You Chen, Bradley A. Malin, and Jimeng Sun. 2015. Rubik: Knowledge guided tensor factorization and completion for health data analytics. In Longbing Cao, Chengqi Zhang, Thorsten Joachims, Geoffrey I. Webb, Dragos D. Margineantu, and Graham Williams, editors, KDD, pages 1265–1274. ACM.