A Concept-Centered Hypertext Approach to Case-Based Retrieval

11/27/2018 ∙ by Stefano Marchesin, et al. ∙ Università di Padova

The goal of case-based retrieval is to assist physicians in the clinical decision making process by finding relevant medical literature in large archives. We propose research that aims to improve the effectiveness of case-based retrieval systems through the use of automatically created document-level semantic networks. The proposed research tackles different aspects of information systems and leverages recent advances in information extraction and relational learning to revisit and advance the core ideas of concept-centered hypertext models. We propose a two-step methodology: the first step addresses the automatic creation of document-level semantic networks, and the second step designs methods that exploit such document representations to retrieve relevant cases from medical literature. For the automatic creation of documents' semantic networks, we design a combination of information extraction techniques and relational learning models. Because they mine concepts and relations from text, information extraction techniques are the core of the process that builds the document-level semantic networks. Relational learning models, in turn, have the task of enriching the graph with additional connections that have not been detected by information extraction algorithms and of strengthening the confidence score of extracted relations. For the retrieval of relevant medical literature, we investigate methods that are capable of comparing the documents' semantic networks in terms of structure and semantics. The automatic extraction of semantic relations from documents, and their centrality in the creation of the documents' semantic networks, represent our attempt to go one step further than previous graph-based approaches.




1. Motivation for the proposed research

The volume of medical literature published every year keeps growing at a very fast pace. In terms of performance, this poses a real challenge to clinicians, who need to examine such a huge volume of literature. (Clinician is a general term that encompasses every medical position that involves patients; the responsibilities of clinicians therefore vary depending on the job title. Clinicians can be medical assistants, physicians, surgeons, counselors, psychiatrists, and so on. In a nutshell, clinicians’ responsibilities are diagnosing illnesses and administering treatment to patients through either medicine or procedures.) Indeed, large databases of published medical research can support clinicians in the clinical decision making process by providing them with relevant information for a given case. However, the time required to retrieve relevant information from these databases using standard systems is often prohibitive, turning them into cumbersome resources for clinicians. Therefore, interest has grown in systems that help clinicians make clinical decisions. Such systems are known as Clinical Decision Support (CDS) systems.

A CDS system is designed to assist clinicians in providing patient care by producing effective and timely knowledge that can help in the decision making process. One of the tasks of CDS systems is to retrieve, given a medical case of interest, highly related medical literature that could aid clinicians in formulating diagnoses or deciding treatments for the case at hand. This side of CDS systems is known as case-based retrieval. The goal of case-based retrieval is thus to assist clinicians in the clinical decision making process by finding, in large collections, the medical literature that most resembles a given case. Cases are described by reports, usually composed of a textual description of the medical findings and the related medical images.

Case-based retrieval presents some unique challenges and peculiarities. Since queries are medical reports, this search task differs radically from other more common search domains, like Web search. In fact, clinical reports are substantially longer than Web search queries. Furthermore, since cases often have a narrative structure, case-based retrieval also differs from those search domains where queries are long but keyword based, like legal or patent search. Last but not least, due to the limited time that clinicians can afford to spend reading literature while practicing, case-based retrieval strongly favors precision over recall (Burke et al., 2004).

Clinical reports are often created with minimal consideration for any subsequent computational analysis of the underlying concepts. Therefore, case-based retrieval systems have to process a variety of literature styles written with a wide domain-specific vocabulary, comprising many synonyms and context-specific expressions. This fact makes the comparison of clinical reports challenging, as reports lack a representation in a space where their similarity can be measured quantitatively.

Ontologies and thesauri have often been exploited by retrieval systems as a means to overcome such issues; concept-based Information Retrieval (IR) is the most notable example of this. However, the semantic relations within these knowledge sources have been only partially leveraged, so their power to semantically represent relations between documents’ concepts has not been fully exploited. The semantic relations expressed within documents carry high informative power that can be the turning point in creating more semantic-aware concept-based models.

The combination of recent developments in Information Extraction (IE) and the availability of unparalleled medical resources offers us the opportunity to develop new techniques that better capture the semantics within medical reports. Along with IE techniques, relational learning models can also be employed. Relational learning studies methods for the statistical analysis of graph-structured data. The main objectives of relational learning include prediction of missing edges, prediction of properties of nodes, and clustering of nodes based on their connectivity patterns.

Bearing in mind the centrality of providing a higher semantic understanding of the clinical reports’ contents, the remainder of the paper is organized as follows: Section 2 presents the background and related work, Section 3 describes the purposes and aims of the proposed research, Section 4 discusses the proposed methodology and Section 5 provides an outlook on future challenges.

2. Background and related work

The type of queries submitted to case-based retrieval systems, and the need for results that are similar to the issued query, relate case-based retrieval to several research directions in the literature, including: query-by-example (typically used in image retrieval (Ballerini et al., 2009)), item-to-item recommendation (Sarwar et al., 2001) and search-by-ideal-candidate (Ha-Thuc et al., 2016) (used by LinkedIn in job search).

Another related research direction is concept-based IR. Concept-based IR aims at making use of external knowledge sources (like thesauri and ontologies) to provide additional knowledge and context that may not be explicit in a document collection and users’ queries. Concept-based IR represents both documents and queries using semantic concepts, instead of (or in addition to) keywords, and performs retrieval in that concept space. Concept-based methods can be categorized into two types: (i) methods that use concepts throughout the entire process, in both indexing and retrieval stages (Egozi et al., 2011), and (ii) methods that apply concept analysis in one stage only, such as concept-based query expansion (Grootjen and Van Der Weide, 2006), which is a simpler but less accurate solution. The approach we adopt can be considered as an extension of (i) to semantic relations between concepts.

In the biomedical domain, where authoritative and curated ontologies can provide a valuable source of knowledge, concept-based approaches demonstrate consistent improvements over classic keyword-based systems. For instance, in (Koopman et al., 2012b) queries and documents are transformed from their term-based originals into medical concepts as defined by the SNOMED CT ontology (https://www.snomed.org/). In addition, parent-child ’is-a’ relationships between concepts are used to weight documents that contain concepts subsumed by the query’s concepts. In (Limsopatham et al., 2013b), the authors proposed a method to represent medical records and queries by focusing only on medical concepts essential for the information need of a medical search task. In (Limsopatham et al., 2013a), queries are expanded by inferring additional conceptual relationships from domain-specific resources as well as by extracting informative concepts from the top-ranked medical records.

The field of Biomedical Information Extraction (BioIE) has been active for many years and is highly relevant for clinical decision support. In (Liu et al., 2016) the authors focused on reviewing the recent advances in learning-based approaches for BioIE tasks. BioIE tasks comprise entity linking (Zheng et al., 2015), event identification (Ananiadou et al., 2010) and relation extraction (Uzuner et al., 2011; Wang and Fan, 2014). The i2b2 tranSMART Foundation (https://www.i2b2.org/index.html) provided different challenges to evaluate BioIE systems on these tasks.

Regarding relational learning, (Nickel et al., 2016) discusses two different kinds of relational models, both of which scale to large data sets. The former are based on latent feature models such as tensor factorization and multi-way neural networks (Nickel et al., 2011; Bordes et al., 2013). The latter are based on mining observable patterns in the graph (Lao and Cohen, 2010). Combining these latent and observable models can improve the modeling power at decreased computational cost. In Google’s Knowledge Vault project (Dong et al., 2014), relational learning models and IE methods are combined to construct a knowledge graph from Web sources.

Graph-based models have been applied in IR since the early work of Minsky on semantic IR (Minsky, 1969), which was followed by several variants of conceptual IR and knowledge-based IR. With the advent of the Web, and in particular thanks to the seminal works of (Page et al., 1999; Kleinberg, 1999), graph models flourished again. More recent works relate graph theoretic approaches to IR in the context of social or collaborative networks and recommender systems (Craswell and Szummer, 2007; Konstas et al., 2009).

The approaches described above usually build the graph out of the main components of an IR process (e.g. documents/queries/users). Instead, in (Blanco and Lioma, 2012) the authors build the graph out of the individual terms contained in a document. Hence, the object of their representation is not the IR process per se. The same approach has been followed by (Koopman et al., 2012a), where Blanco’s model (Blanco and Lioma, 2012) has been adapted to a concept representation of documents, in order to capture the dependencies between concepts found in medical free-text. Finally, in (Koopman et al., 2016) a graph inference retrieval model is presented that integrates structured knowledge resources, statistical IR methods and inference in a unified framework.

3. Description of proposed research

The proposed research has the objective of improving the effectiveness of case-based retrieval systems for clinical decision support. The research is twofold, comprising a document graph creation phase and a case-based retrieval phase. Due to this dual nature, two main questions drive the research.

  • (i) How can clinical cases and medical literature be represented in such a way that the semantic, and authoritative, information that lies within them can be connected and leveraged as fully as possible?

  • (ii) How can such semantic representations of clinical cases and medical literature be leveraged so that, given a query case, we can effectively return related clinical cases or medical literature?

The first question, (i), concerns how to represent documents using structures that allow the semantic and authoritative information to be explicitly brought out. Thus, our research presents, as a first step, the study and design of methods for extracting semantic information contained within documents. The principal constituents of the semantic information are authoritative concepts and relations between them. In order to extract authoritative concepts and semantic relations from unstructured medical reports it is necessary to leverage external knowledge sources. Therefore, the research is directed towards the use of these external sources, like knowledge bases, ontologies and thesauri.

Once we have extracted concepts and relations, the second step is to provide a document representation that is well suited to express the semantic information contained within the document. In this way, the document representation can achieve a finer semantic level, by means of authoritative concepts and semantic relations describing the underlying document. Because the information extracted by IE techniques consists of concepts and relations, the document representation is naturally conceived as a graph structure or, more precisely, a semantic network (Sowa, 2014). A semantic network is a graph structure for representing knowledge in patterns of interconnected nodes (concepts) and edges (relations). The main feature of semantic networks is the declarative graphic representation that can be used either to represent knowledge or to support automated systems for reasoning about knowledge.
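As an illustration of the data structure described above, the following is a minimal sketch of a document-level semantic network, not the representation actually used in this research. The class and method names, the concept identifiers (`C1`, `C2`, `C3`) and the relation labels (`has_finding`, `treated_by`) are all hypothetical placeholders.

```python
from collections import defaultdict
from typing import NamedTuple


class Relation(NamedTuple):
    """A directed, labeled edge between two concepts, with an extractor confidence."""
    source: str
    label: str
    target: str
    confidence: float


class SemanticNetwork:
    """Document-level semantic network: concepts are nodes, relations are labeled edges."""

    def __init__(self):
        self.concepts = {}              # concept id -> preferred name
        self._out = defaultdict(list)   # concept id -> outgoing relations

    def add_concept(self, concept_id, name):
        self.concepts[concept_id] = name

    def add_relation(self, source, label, target, confidence=1.0):
        # Only connect concepts that were actually found in the document.
        if source not in self.concepts or target not in self.concepts:
            raise KeyError("both endpoints must be concepts of this document")
        self._out[source].append(Relation(source, label, target, confidence))

    def relations(self):
        return [r for edges in self._out.values() for r in edges]

    def neighbors(self, concept_id):
        return {r.target for r in self._out.get(concept_id, [])}


# Toy document with hypothetical concept ids and relation labels.
network = SemanticNetwork()
network.add_concept("C1", "pneumonia")
network.add_concept("C2", "fever")
network.add_concept("C3", "amoxicillin")
network.add_relation("C1", "has_finding", "C2", confidence=0.9)
network.add_relation("C1", "treated_by", "C3", confidence=0.7)
```

Storing a confidence value on each edge leaves room for a later step to revise it, for example when link prediction is used to strengthen or weaken extracted relations.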

The research proposed for (i) leads the research for (ii) to focus on leveraging the documents’ semantic networks for case-based retrieval. Leveraging documents’ semantic networks in both indexing and retrieval stages is the key element to bring case-based retrieval to a higher semantic level. Therefore, semantic similarity measures that are capable of comparing and relating such semantic networks with one another will be studied and designed. The analysis of documents’ semantic networks can be divided into two approaches: an explicit, or graph-based, approach and an implicit, or latent-based, approach. In short, the former makes use of the topological structure of the networks, along with semantic paths between the nodes (i.e. concepts), in order to define similarity measures that take into consideration the semantic interconnectivity between the elements of the network (Lao and Cohen, 2010; Spanier et al., 2017). The latter leverages graph kernels (Shervashidze et al., 2011, 2009) and embedding models (Narayanan et al., 2017, 2016), which can learn latent (low-dimensional) representations of graphs. Such latent representations can then be compared to find the highest correspondences between the query case and the medical literature.
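To make the idea of comparing two semantic networks concrete, here is a small sketch in the spirit of the Weisfeiler-Lehman subtree kernel cited above. It is a simplification under stated assumptions: graphs are treated as undirected, edge (relation) labels are ignored, and expanded label strings are used instead of compressed ones. The function names and the toy graphs are illustrative only.

```python
from collections import Counter


def wl_features(labels, adjacency, iterations=2):
    """Count Weisfeiler-Lehman subtree labels for one graph.

    labels: dict node -> initial label (here, a concept name)
    adjacency: dict node -> list of neighbor nodes (undirected)
    """
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        updated = {}
        for node, label in current.items():
            neighbor_labels = sorted(current[n] for n in adjacency.get(node, ()))
            # New label = own label plus the sorted multiset of neighbor labels.
            updated[node] = label + "(" + ",".join(neighbor_labels) + ")"
        current = updated
        features.update(current.values())
    return features


def wl_kernel(f1, f2):
    """Dot product of two WL feature count vectors."""
    return sum(count * f2[label] for label, count in f1.items())


def wl_similarity(g1, g2, iterations=2):
    """Normalized kernel value in [0, 1]; g1 and g2 are (labels, adjacency) pairs."""
    f1 = wl_features(g1[0], g1[1], iterations)
    f2 = wl_features(g2[0], g2[1], iterations)
    denom = (wl_kernel(f1, f1) * wl_kernel(f2, f2)) ** 0.5
    return wl_kernel(f1, f2) / denom if denom else 0.0


# Two toy networks that share two of their three concepts.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
g1 = ({"a": "fever", "b": "pneumonia", "c": "amoxicillin"}, chain)
g2 = ({"a": "fever", "b": "pneumonia", "c": "ibuprofen"}, chain)
g3 = ({"a": "fracture", "b": "cast", "c": "x-ray"}, chain)
```

With these toy graphs, `wl_similarity(g1, g1)` is 1, `wl_similarity(g1, g3)` is 0 because no labels are shared, and `wl_similarity(g1, g2)` falls strictly between the two.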

We believe that our research can improve case-based retrieval systems and also, as a side effect, tackle the semantic gap. In fact, the medical language amplifies some specific challenges of the semantic gap. Vocabulary mismatch is more accentuated and the interdependence between terms is greater than in other domains. Therefore, providing a finer semantic representation of medical literature, by building documents’ semantic networks out of authoritative concepts and semantic relations, can reduce vocabulary mismatch and can better represent the interdependence between medical concepts within text.

4. Research methodology and proposed experiments

The starting point for the proposed research is the body of work on concept-centered hypertext models by Agosti (Agosti and Crestani, 1993; Agosti and Marchetti, 1992; Agosti et al., 1992). The core ideas of these works are here revisited and leveraged to address the task of case-based retrieval. The description of the methodology is divided into three parts: a first document graph construction part, which describes the methodology for the automatic construction of documents’ semantic networks; a second case-based retrieval part, which describes the methodology proposed to leverage such semantic representations for the retrieval of medical literature; and a third evaluation part, which describes the evaluation of the methods proposed in the first and second parts.

4.1. Document Graph Construction

The research focused on a preliminary study of state-of-the-art methods and techniques related to entity linking, relation extraction and link prediction. An investigation of knowledge bases, ontologies and thesauri has also been performed.

The method we propose for the automatic creation of documents’ semantic networks is a combination of three different techniques: entity linking, relation extraction and link prediction. The first two techniques represent the core of the document graph creation, while link prediction comes into play in a second phase, where it is used to enrich the created graph with additional unknown relations. In fact, with link prediction, indirect linkages between source nodes and target nodes can be analyzed, and previously unknown relationships can be discovered.

By extracting semantic relations from text, and by inferring additional connections only between extracted concepts — as opposed to (Koopman et al., 2012a, 2016) — the graph representation is more adherent to the contents that are explicitly stated in the document itself.

For the link prediction part, we decided to exploit TransE (Bordes et al., 2013) as it is one of the most solid and easy-to-train algorithms for link prediction. In our work we are interested in predicting the most likely relations that exist between two concepts (nodes). Thus, the role of link prediction can be twofold. Its primary use is to enrich the document graph with previously unknown relationships between nodes (concepts), as already mentioned. However, as a secondary task, link prediction techniques can also be used to strengthen the confidence score of extracted relations, following the approach presented in (Dong et al., 2014).
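The sketch below illustrates how a TransE-style score can rank candidate relations between two concepts. TransE models a relation as a translation in embedding space, so a triple (head, relation, tail) is plausible when head + relation lies close to tail. The L2 distance is used here; the embeddings are hand-picked toy values rather than trained ones, and the relation names are hypothetical.

```python
import math


def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between head + relation and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_relations(head, tail, relation_embeddings):
    """Return relation names sorted from most to least plausible for (head, ?, tail)."""
    return sorted(
        relation_embeddings,
        key=lambda name: transe_score(head, relation_embeddings[name], tail),
        reverse=True,
    )


# Toy 2-dimensional embeddings (illustrative values, not learned from data).
drug = [0.0, 0.0]
disease = [1.0, 0.0]
relations = {"treats": [1.0, 0.0], "causes": [0.0, 1.0]}

ranking = rank_relations(drug, disease, relations)
```

In this toy setting `treats` translates `drug` exactly onto `disease`, so it ranks first. The same score could in principle be compared against a threshold to adjust the confidence of a relation that an extractor has already proposed.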

For the entity linking part we tested different open-source tools. We decided to adopt MetaMap (https://metamap.nlm.nih.gov/), the most authoritative tool to detect medical entity mentions in free-text. MetaMap analyses biomedical free-text and identifies concepts belonging to the Unified Medical Language System (UMLS), associating each mention with a number of concepts from the UMLS Metathesaurus (https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/), which comprises more than 3 million distinct concepts.

Currently, we are working on the relation extraction part. Before trying to develop our own relation extraction algorithm, we tested different open-source natural language processing tools. We started with the tool most related to our needs: SemRep (https://semrep.nlm.nih.gov/). SemRep can detect some relations between UMLS concepts using hand-crafted rules. However, SemRep is based on relations belonging to the UMLS Semantic Network, which relates UMLS Semantic Types, not UMLS concepts. Therefore, SemRep operates at a coarser level of granularity with respect to the possible relations that can occur between concepts, and it turned out to be too generic for our task. Besides, SemRep was specifically made for extracting hypernymic propositions. We also tested tools developed for the general-knowledge domain. However, these systems, not having been developed expressly for the biomedical domain, were not satisfactory for our task.

Because it targets CDS, i.e. it is devoted to the extraction of key relations that can facilitate clinical decision making, our problem setup is fundamentally different from the conventional biomedical setups. Most state-of-the-art biomedical relation extraction techniques are developed for specific relations, like protein-protein interactions, gene-disease interactions and so on. Therefore, such techniques are targeted to specific areas of the biomedical domain, covering only fractions of it. An example is the i2b2 relation extraction task (Uzuner et al., 2011). In the i2b2 relation extraction task, entity mentions are manually labeled, and each mention refers to one of three classes: ‘problem’, ‘treatment’ and ‘test’. The extraction task focused on assigning relation types that hold between the three classes mentioned.

To resemble real-world CDS tasks, where perfect mentions do not exist and the set of relations must have sufficient coverage of the medical domain, our setup requires the entity mentions to be automatically detected and the set of relations to be as representative as possible of the medical domain. Furthermore, due to the lack of annotated documents available for the task of general medical relation extraction, it is also necessary to collect training data with minimal to no labeling effort.

To overcome the lack of labeled data, we are evaluating various distant supervision methods. Distant supervision has become a popular choice for training relation extraction algorithms without using manually labeled data. Regarding the set of relations to be considered, we opted for the relations contained in the UMLS Metathesaurus. Within UMLS, a substantial understanding of the medical domain is included, comprising medical concepts, relations, definitions and so on. Therefore, UMLS relations are the most suited for CDS tasks.
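The core heuristic of distant supervision can be sketched as follows: any sentence that mentions both concepts of a known knowledge-base triple is taken as a (noisy) training example for that triple's relation. This is a generic sketch of the heuristic, not one of the specific methods under evaluation; the function name, the concept identifiers and the relation name in the toy data are illustrative assumptions.

```python
def distant_labels(sentences, kb_triples):
    """Label sentences with KB relations whose two concepts are both mentioned.

    sentences: list of (text, set of concept ids mentioned in the text)
    kb_triples: iterable of (head_concept, relation, tail_concept)
    """
    # Index the knowledge base by concept pair for constant-time lookup.
    index = {}
    for head, relation, tail in kb_triples:
        index.setdefault((head, tail), set()).add(relation)

    examples = []
    for text, concepts in sentences:
        for head in sorted(concepts):
            for tail in sorted(concepts):
                if head == tail:
                    continue
                for relation in sorted(index.get((head, tail), ())):
                    examples.append((text, head, relation, tail))
    return examples


# Toy data with hypothetical concept ids and a hypothetical relation name.
kb = [("C_aspirin", "may_treat", "C_headache")]
corpus = [
    ("Aspirin relieved the patient's headache.", {"C_aspirin", "C_headache"}),
    ("The patient reported a headache.", {"C_headache"}),
]
training_examples = distant_labels(corpus, kb)
```

The labels produced this way are noisy, since a sentence can mention both concepts without expressing the relation, which is one reason several distant supervision variants are worth comparing.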

To build the relation extractor, we are considering using bidirectional Recurrent Neural Networks (RNNs). The idea behind bidirectional RNNs is that the output at time t may depend not only on the previous elements in the sequence, but also on future elements. Intuitively, this type of RNN is a natural fit for natural language tasks like relation extraction. In the biomedical domain, the use of RNNs for relation extraction tasks is still in its early stages. Therefore, there is considerable room for improvement and a good margin of novelty that can be introduced, especially considering the problem setup defined.
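The dependence on both past and future elements can be shown with a deliberately tiny sketch: a scalar, Elman-style recurrent cell run once left to right and once right to left. The weights are arbitrary illustrative values and are shared between the two directions for brevity, whereas practical models learn separate vector-valued parameters for each direction.

```python
import math


def run_direction(inputs, w_in, w_rec):
    """Simple scalar RNN: h[t] = tanh(w_in * x[t] + w_rec * h[t-1])."""
    states, h = [], 0.0
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


def bidirectional_states(inputs, w_in=0.5, w_rec=0.8):
    """Pair each position's forward state (past context) with its backward state (future context)."""
    forward = run_direction(inputs, w_in, w_rec)
    backward = run_direction(inputs[::-1], w_in, w_rec)[::-1]
    return list(zip(forward, backward))


# Two sequences that differ only in their last element.
states_a = bidirectional_states([1.0, 0.0, 2.0])
states_b = bidirectional_states([1.0, 0.0, -2.0])
```

At the first position the forward states of the two sequences are identical, because they have seen the same past, while the backward states differ, because they summarize different futures.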

When the creation of documents’ semantic networks is performed, the document collection can be considered as a connected graph, where documents are connected by means of semantic relationships determined through the similarity analysis of their semantic networks. Fig 1 shows this two-level data representation.

Figure 1. Two-level data representation.

4.2. Case-Based Retrieval

The second part of the research focuses on leveraging the documents’ semantic networks to perform the retrieval of medical literature, given a query case. Due to the nature of the query, graph-based models are well suited to case-based retrieval. In fact, the retrieval task can be seen as the insertion of a new node, the query case, in the graph representation of the document collection by finding its closest medical cases, in terms of semantic similarity. Fig 2 shows the concept.

Figure 2. Case-based retrieval as a node insertion task.

Since case-based retrieval strongly favors precision over recall, and the most semantically similar nodes to the query case are its nearest neighbors in the graph, the proposed retrieval model is highly suited to address this need.
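Retrieval as node insertion can be sketched as a nearest-neighbor search over the documents' networks. In the sketch below each network is reduced to its set of (concept, relation, concept) triples and compared with Jaccard overlap. Jaccard is only a stand-in for the graph-based and latent-based similarity measures under investigation, and the document ids and triples are toy values.

```python
def jaccard(a, b):
    """Overlap between two sets of triples, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_cases(query_triples, collection, k=3):
    """Rank documents by similarity of their triple sets to the query's triple set.

    collection: dict doc id -> set of (concept, relation, concept) triples
    """
    scored = [(jaccard(query_triples, triples), doc_id)
              for doc_id, triples in collection.items()]
    # Highest similarity first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Toy collection with illustrative triples.
collection = {
    "d1": {("fever", "finding_of", "pneumonia"), ("amoxicillin", "treats", "pneumonia")},
    "d2": {("fever", "finding_of", "influenza")},
    "d3": {("amoxicillin", "treats", "pneumonia")},
}
query = {("fever", "finding_of", "pneumonia"), ("amoxicillin", "treats", "pneumonia")}
top = nearest_cases(query, collection, k=2)
```

Keeping `k` small reflects the precision-oriented nature of the task: only the closest neighbors of the inserted query node are returned.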

Thus, both graph-based and latent-based models will be investigated for the analysis of documents’ semantic networks. We will evaluate which techniques, among the two categories, prove to be the most appropriate for the documents’ semantic networks, hence returning the most related medical literature for the given query case. Furthermore, approaches that combine graph-based models with latent-based models will be investigated too, in order to test if considering explicit and implicit information together can enhance semantic similarity techniques and thus, the overall retrieval effectiveness.

4.3. Evaluation

In 2014, NIST’s TREC introduced a CDS search track (http://www.trec-cds.org/). The 2017 track focused on an important use case in clinical decision support: providing useful precision medicine-related information to clinicians treating cancer patients. Therefore, to evaluate the methodology described, we are considering using the NIST TREC CDS search track datasets.

5. Specific research issues

In this section we highlight some of the possible issues that could affect the research results and push us to consider alternative approaches to those presented above.

  • The quality of the documents’ semantic networks depends strongly on the quality of the IE systems. Therefore, the overall effectiveness of the retrieval system is bound by the quality of these semantic representations of documents. The minimum level of network quality needed to obtain competitive results has to be investigated.

  • While representing documents through semantic networks can have the advantage of removing the high amount of noise contained within free-text, it might also have the downside of removing useful information, that is, information that could have helped retrieve relevant documents.

  • In terms of efficiency, the construction of documents’ semantic networks can be time consuming. Depending on the size of documents within the collection, the process of creating and then analyzing semantic networks for each document might become prohibitive.

  • If the structure of relational data turns out not to be significantly discriminative for computing the similarity between documents’ semantic networks, then considering it in addition to semantics might prove unnecessary.


References
  • Agosti and Crestani (1993) M. Agosti and F. Crestani. 1993. A Methodology for the Automatic Construction of a Hypertext for Information Retrieval. In Proc. 1993 ACM/SIGAPP Symposium on Applied Computing: States of the Art and Practice, Indianapolis, IN, USA, February 14-16. 745–753.
  • Agosti et al. (1992) M. Agosti, G. Gradenigo, and P. G. Marchetti. 1992. A hypertext environment for interacting with large textual databases. Information Processing & Management 28, 3 (1992), 371–387.
  • Agosti and Marchetti (1992) M. Agosti and P. G. Marchetti. 1992. User Navigation in the IRS Conceptual Structure through a Semantic Association Function. Comput. J. 35, 3 (1992), 194–199.
  • Ananiadou et al. (2010) S. Ananiadou, S. Pyysalo, J. Tsujii, and D. B Kell. 2010. Event extraction for systems biology by text mining the literature. Trends in biotechnology 28, 7 (2010), 381–390.
  • Ballerini et al. (2009) L. Ballerini, X. Li, R. B Fisher, and J. Rees. 2009. A query-by-example content-based image retrieval system of non-melanoma skin lesions. In MICCAI International Workshop on Medical Content-Based Retrieval for Clinical Decision Support. 31–38.
  • Blanco and Lioma (2012) R. Blanco and C. Lioma. 2012. Graph-based term weighting for information retrieval. Information Retrieval Journal 15, 1 (2012), 54–92.
  • Bordes et al. (2013) A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in neural information processing systems. 2787–2795.
  • Burke et al. (2004) D. T. Burke, M. C. DeVito, J. C. Schneider, S. Julien, and A. L. Judelson. 2004. Reading habits of physical medicine and rehabilitation resident physicians. American journal of physical medicine & rehabilitation 83, 7 (2004), 551–559.
  • Craswell and Szummer (2007) N. Craswell and M. Szummer. 2007. Random walks on the click graph. In Proc. 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 239–246.
  • Dong et al. (2014) X. Dong, E. Gabrilovich, G. Heitz, W. Horn, N. Lao, K. Murphy, T. Strohmann, S. Sun, and W. Zhang. 2014. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In Proc. 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 601–610.
  • Egozi et al. (2011) O. Egozi, S. Markovitch, and E. Gabrilovich. 2011. Concept-Based Information Retrieval Using Explicit Semantic Analysis. ACM Trans. Inf. Syst. 29, 2, Article 8 (April 2011), 34 pages.
  • Grootjen and Van Der Weide (2006) F. A. Grootjen and T. P. Van Der Weide. 2006. Conceptual query expansion. Data & Knowledge Engineering 56, 2 (2006), 174–193.
  • Ha-Thuc et al. (2016) V. Ha-Thuc, Y. Xu, S. P. Kanduri, X. Wu, V. Dialani, Y. Yan, A. Gupta, and S. Sinha. 2016. Search by ideal candidates: Next generation of talent search at LinkedIn. In Proceedings of the 25th International Conference Companion on World Wide Web. 195–198.
  • Kleinberg (1999) J. M. Kleinberg. 1999. Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM) 46, 5 (1999), 604–632.
  • Konstas et al. (2009) I. Konstas, V. Stathopoulos, and J. M. Jose. 2009. On social networks and collaborative recommendation. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval. ACM, 195–202.
  • Koopman et al. (2012a) B. Koopman, G. Zuccon, P. Bruza, L. Sitbon, and M. Lawley. 2012a. Graph-based concept weighting for medical information retrieval. In Proceedings of the Seventeenth Australasian Document Computing Symposium. ACM, 80–87.
  • Koopman et al. (2016) B. Koopman, G. Zuccon, P. Bruza, L. Sitbon, and M. Lawley. 2016. Information retrieval as semantic inference: A graph inference model applied to medical search. Information Retrieval Journal 19, 1-2 (2016), 6–37.
  • Koopman et al. (2012b) B. Koopman, G. Zuccon, A. Nguyen, D. Vickers, L. Butt, and P. D. Bruza. 2012b. Exploiting SNOMED CT concepts and relationships for clinical information retrieval: Australian e-Health Research Centre and Queensland University of Technology at the TREC 2012 Medical Track. In The Twenty-First Text REtrieval Conference Proceedings (TREC 2012)[NIST Special Publication: SP 500-298]. 1–8.
  • Lao and Cohen (2010) N. Lao and W. W. Cohen. 2010. Relational retrieval using a combination of path-constrained random walks. Machine learning 81, 1 (2010), 53–67.
  • Limsopatham et al. (2013a) N. Limsopatham, C. Macdonald, and I. Ounis. 2013a. Inferring conceptual relationships to improve medical records search. In Proceedings of the 10th Conference on Open Research Areas in Information Retrieval. 1–8.
  • Limsopatham et al. (2013b) N. Limsopatham, C. Macdonald, and I. Ounis. 2013b. A task-specific query and document representation for medical records search. In European Conference on Information Retrieval. Springer, 747–751.
  • Liu et al. (2016) F. Liu, J. Chen, A. Jagannatha, and H. Yu. 2016. Learning for Biomedical Information Extraction: Methodological Review of Recent Advances. arXiv preprint arXiv:1606.07993 (2016).
  • Marchesin (2018) S. Marchesin. 2018. Case-Based Retrieval Using Document-Level Semantic Networks. In 41st ACM SIGIR (SIGIR ’18). ACM, New York, NY, USA, 1451.
  • Minsky (1969) M. L. Minsky. 1969. Semantic Information Processing. The MIT Press.
  • Narayanan et al. (2016) A. Narayanan, M. Chandramohan, L. Chen, Y. Liu, and S. Saminathan. 2016. subgraph2vec: Learning distributed representations of rooted sub-graphs from large graphs. arXiv preprint arXiv:1606.08928 (2016).
  • Narayanan et al. (2017) A. Narayanan, M. Chandramohan, R. Venkatesan, L. Chen, Y. Liu, and S. Jaiswal. 2017. graph2vec: Learning Distributed Representations of Graphs. arXiv preprint arXiv:1707.05005 (2017).
  • Nickel et al. (2016) M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich. 2016. A review of relational machine learning for knowledge graphs. Proc. IEEE 104, 1 (2016), 11–33.
  • Nickel et al. (2011) M. Nickel, V. Tresp, and H. Kriegel. 2011. A Three-Way Model for Collective Learning on Multi-Relational Data. In ICML, Vol. 11. 809–816.
  • Page et al. (1999) L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank citation ranking: Bringing order to the web. Technical Report. Stanford InfoLab.
  • Sarwar et al. (2001) B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web. ACM, 285–295.
  • Shervashidze et al. (2011) N. Shervashidze, P. Schweitzer, E. Leeuwen, K. Mehlhorn, and K. Borgwardt. 2011. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research 12, Sep (2011), 2539–2561.
  • Shervashidze et al. (2009) N. Shervashidze, S. Vishwanathan, T. Petri, K. Mehlhorn, and K. Borgwardt. 2009. Efficient graphlet kernels for large graph comparison. In Artificial Intelligence and Statistics. 488–495.
  • Sowa (2014) J. F. Sowa. 2014. Principles of semantic networks: Explorations in the representation of knowledge. Morgan Kaufmann.
  • Spanier et al. (2017) A. Spanier, D. Cohen, and L. Joskowicz. 2017. A new method for the automatic retrieval of medical cases based on the RadLex ontology. International journal of computer assisted radiology and surgery 12, 3 (2017), 471–484.
  • Uzuner et al. (2011) Ö. Uzuner, B. South, S. Shen, and S. L. DuVall. 2011. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. Journal of the American Medical Informatics Association 18, 5 (2011), 552–556.
  • Wang and Fan (2014) C. Wang and J. Fan. 2014. Medical relation extraction with manifold models. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vol. 1. 828–838.
  • Zheng et al. (2015) J. Zheng, D. Howsmon, B. Zhang, J. Hahn, D. McGuinness, J. Hendler, and H. Ji. 2015. Entity linking for biomedical literature. BMC medical informatics and decision making 15, 1 (2015), S4.