Gextext: Unsupervised Knowledge Modelling in Biomedical Literature

11/06/2019
by Robert O'Shea, et al.

PURPOSE: Literature review is a complex task that requires expert analysis of unstructured data. Computational automation of this process presents a valuable opportunity for high-throughput knowledge extraction and meta-analysis. Currently available methods are limited to detecting explicit, short-context relationships. We address this limitation with Gextext, which extracts a knowledge graph of latent relationships directly from unstructured text.

METHODS: Let C be a corpus of n text chunks. Let V_target be a set of query terms and V_random a random selection of terms in C. Let X indicate the occurrence of V_target and V_random in C. Gextext learns a graph G(V,E) by correlation thresholding on the covariance matrix of X, where thresholds are estimated from the correlations with randomly selected terms. Gextext was benchmarked against GloVe in tasks where embedding distance matrices were correlated against real-world similarity matrices. A general corpus was generated from 5,000 randomly selected Wikipedia articles and a biomedical corpus from 961 research papers on stroke.

RESULTS: Embeddings generated by Gextext preserved relative geographical distances between countries (Gextext: rho = 0.255, p < 2.22e-16; GloVe: rho = 0.086, p = 1.859e-09) and capital cities (Gextext: rho = 0.282, p < 2.22e-16; GloVe: rho = 0.093, p = 8.0805e-11). Gextext embeddings organised drug names by shared target (Gextext: rho = 0.456, p < 2.22e-16; GloVe: rho = 0.091, p = 0.00087) and stroke phenotypes by body system (Gextext: rho = 0.446, p < 2.22e-16; GloVe: rho = 0.129, p = 1.7464e-11).

CONCLUSIONS: Gextext extracts latent relationships from unstructured text, enabling fully unsupervised automation of the literature review process.
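
The graph-learning step described under METHODS can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (occurrence_matrix, learn_graph), the whitespace tokenisation, the per-term quantile rule (quantile=0.95) and the requirement that an edge exceed both endpoints' thresholds are all assumptions made for illustration. The abstract states only that thresholds are estimated from correlations with randomly selected terms.

```python
import numpy as np


def occurrence_matrix(chunks, terms):
    """Binary matrix X with X[i, j] = 1 if terms[j] occurs in chunks[i].

    Assumption: single-word terms, matched after lowercasing and
    whitespace splitting.
    """
    X = np.zeros((len(chunks), len(terms)), dtype=float)
    for i, chunk in enumerate(chunks):
        tokens = set(chunk.lower().split())
        for j, term in enumerate(terms):
            if term.lower() in tokens:
                X[i, j] = 1.0
    return X


def learn_graph(chunks, target_terms, random_terms, quantile=0.95):
    """Return edges (term_a, term_b, correlation) between target terms.

    Assumption: each target term's threshold is the given quantile of its
    correlations with the random terms, and an edge is kept only if the
    correlation exceeds the thresholds of both endpoints.
    """
    targets = list(target_terms)
    terms = targets + list(random_terms)
    X = occurrence_matrix(chunks, terms)

    # Columns are variables (terms); rows are observations (chunks).
    with np.errstate(invalid="ignore", divide="ignore"):
        R = np.corrcoef(X, rowvar=False)
    R = np.nan_to_num(R)  # terms with constant occurrence get correlation 0

    k = len(targets)
    # Null distribution per target term: correlations with random terms.
    thresholds = [np.quantile(R[a, k:], quantile) for a in range(k)]

    edges = []
    for a in range(k):
        for b in range(a + 1, k):
            if R[a, b] > max(thresholds[a], thresholds[b]):
                edges.append((targets[a], targets[b], float(R[a, b])))
    return edges


# Toy corpus, invented purely for illustration.
chunks = [
    "aspirin stroke chair",
    "aspirin stroke",
    "aspirin stroke",
    "aspirin stroke window",
    "table chair",
    "table window",
    "door",
    "door chair window",
]
edges = learn_graph(
    chunks,
    target_terms=["aspirin", "stroke", "table"],
    random_terms=["chair", "window", "door"],
)
print(edges)
```

In this toy corpus "aspirin" and "stroke" always co-occur, so their correlation clears the thresholds derived from the random terms, whereas "table" never co-occurs with either and receives no edge. On a real corpus, the choice of quantile and the number of random terms would govern how sparse the resulting graph is.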
