Predicting Disease-Gene Associations using Cross-Document Graph-based Features

09/26/2017
by Hendrik ter Horst, et al.

In the context of personalized medicine, text mining methods offer an interesting option for identifying disease-gene associations, as they can be used to generate novel links between diseases and genes that may complement knowledge from structured databases. The most straightforward approach to extracting such links from text relies on a simple assumption: every gene and disease that co-occur within the same document are associated. However, this approach (i) tends to yield a number of spurious associations, (ii) does not capture different relevant types of associations, and (iii) is incapable of aggregating knowledge that is spread across documents. Thus, we propose an approach in which disease-gene co-occurrences and gene-gene interactions are represented in an RDF graph. A machine learning-based classifier is trained that incorporates features extracted from the graph to separate disease-gene pairs into valid disease-gene associations and spurious ones. On the manually curated Genetic Testing Registry, our approach yields a 30-point increase in F1 score over a plain co-occurrence baseline.
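To make the contrast between the plain co-occurrence baseline and cross-document graph features concrete, here is a minimal sketch in Python. It is not the authors' implementation: the toy documents, the interaction pairs, the function names, and the two features are all hypothetical, and plain dictionaries stand in for the RDF graph the abstract describes.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy corpus: entities mentioned in each document.
documents = [
    {"diseases": {"D1"}, "genes": {"G1"}},
    {"diseases": {"D1"}, "genes": {"G2"}},
    {"diseases": {"D2"}, "genes": {"G3"}},
]

# Hypothetical gene-gene interactions (undirected pairs).
interactions = {("G1", "G2"), ("G2", "G3")}


def cooccurrence_counts(docs):
    """Baseline: count documents in which a disease and a gene co-occur."""
    counts = defaultdict(int)
    for doc in docs:
        for disease, gene in product(doc["diseases"], doc["genes"]):
            counts[(disease, gene)] += 1
    return dict(counts)


def build_adjacency(pairs):
    """Turn undirected interaction pairs into a neighbor lookup."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def graph_features(disease, gene, counts, adjacency):
    """Two illustrative features for one candidate disease-gene pair."""
    direct = counts.get((disease, gene), 0)
    # Cross-document evidence: interaction partners of `gene` that
    # co-occur with `disease` in any document.
    linked_partners = sum(
        1 for partner in adjacency.get(gene, ())
        if counts.get((disease, partner), 0) > 0
    )
    return {
        "direct_cooccurrence": direct,
        "partners_linked_to_disease": linked_partners,
    }


counts = cooccurrence_counts(documents)
adjacency = build_adjacency(interactions)
features_d1_g3 = graph_features("D1", "G3", counts, adjacency)
```

In this toy data, D1 and G3 never share a document, so the baseline finds no link, while the graph feature registers that G3 interacts with a gene that does co-occur with D1. Feature dictionaries of this kind could then be fed to any standard classifier; the abstract does not specify which features or which model the authors use.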

research
05/09/2017

Increasing the Discovery Power and Confidence Levels of Disease Association Studies: A Survey

The majority of common diseases are influenced by multiple genetic and e...
research
02/18/2011

Inferring Disease and Gene Set Associations with Rank Coherence in Networks

A computational challenge to validate the candidate disease genes identi...
research
05/25/2021

Graph Based Link Prediction between Human Phenotypes and Genes

Background: The learning of genotype-phenotype associations and history ...
research
01/07/2023

Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology

Recent genome-wide association studies (GWAS) have been successful in id...
research
08/10/2017

Jumping across biomedical contexts using compressive data fusion

Motivation: The rapid growth of diverse biological data allows us to con...
research
04/27/2015

On a Possible Similarity between Gene and Semantic Networks

In several domains such as linguistics, molecular biology or social scie...
