Gextext: Disease Network Extraction from Biomedical Literature

11/06/2019
by Robert O'Shea, et al.

PURPOSE: We propose a fully unsupervised method to learn latent disease networks directly from unstructured biomedical text corpora. This method addresses current challenges in unsupervised knowledge extraction, such as the detection of long-range dependencies and the requirement for large training corpora. METHODS: Let C be a corpus of n text chunks. Let V be a set of p disease terms occurring in the corpus. Let X indicate the occurrence of each term of V in each chunk of C. Gextext identifies disease similarities from positively correlated occurrence patterns. This information is combined to generate a graph on which geodesic distance describes dissimilarity. Diseasomes were learned by Gextext and GloVe on corpora of 100-1000 PubMed abstracts. Similarity matrix estimates were validated against biomedical semantic similarity metrics and gene profile similarity. RESULTS: Geodesic distance on Gextext-inferred diseasomes correlated inversely with external measures of semantic similarity. Gene profile similarity also correlated significantly with proximity on the inferred graph. Gextext outperformed GloVe in our experiments. The information contained in the Gextext graph exceeded the explicit information content of the text. CONCLUSIONS: Gextext extracts latent relationships from unstructured text, enabling fully unsupervised modelling of diseasome graphs from PubMed abstracts.
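
The METHODS outline above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the correlation measure, the edge weighting, or the shortest-path routine, so the choices below (Pearson correlation between term columns, edge weight 1 - r for positively correlated pairs, Floyd-Warshall for geodesic distance) are assumptions. The function names `occurrence_matrix` and `geodesic_distances` and the toy corpus are hypothetical.

```python
import numpy as np

def occurrence_matrix(chunks, terms):
    """Binary n x p matrix: X[i, j] = 1 if term j appears in chunk i.
    Assumption: plain case-insensitive substring matching."""
    return np.array(
        [[float(t.lower() in c.lower()) for t in terms] for c in chunks]
    )

def geodesic_distances(X):
    """Shortest-path distances on a graph whose edges join positively
    correlated term pairs. Edge weight 1 - r is an assumed choice."""
    with np.errstate(invalid="ignore", divide="ignore"):
        R = np.atleast_2d(np.corrcoef(X, rowvar=False))
    R = np.nan_to_num(R, nan=0.0)          # constant columns give NaN
    p = R.shape[0]
    D = np.full((p, p), np.inf)            # inf means "no edge"
    mask = R > 0.0                         # keep positive correlations only
    D[mask] = np.clip(1.0 - R[mask], 0.0, None)
    np.fill_diagonal(D, 0.0)
    for k in range(p):                     # Floyd-Warshall relaxation
        D = np.minimum(D, D[:, k:k + 1] + D[k:k + 1, :])
    return D

# Toy corpus (hypothetical): asthma and psoriasis never share a chunk,
# but both co-occur with eczema.
chunks = [
    "asthma and eczema often co-occur",
    "eczema and psoriasis share features",
    "asthma with eczema in children",
    "psoriasis and eczema treatment",
    "diabetes and obesity",
    "obesity increases diabetes risk",
]
terms = ["asthma", "eczema", "psoriasis", "diabetes", "obesity"]

X = occurrence_matrix(chunks, terms)
D = geodesic_distances(X)
print(np.round(D, 2))
```

In this toy corpus asthma and psoriasis are negatively correlated and so share no direct edge, yet the graph still assigns them a finite distance through eczema. This is the sense in which a path on the graph can carry information that is not stated explicitly in any single chunk. Terms in disconnected components (for example asthma and diabetes here) remain at infinite distance.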


