Jumping across biomedical contexts using compressive data fusion

08/10/2017
by Marinka Zitnik et al.

Motivation: The rapid growth of diverse biological data allows us to consider interactions between a variety of objects, such as genes, chemicals, molecular signatures, diseases, pathways and environmental exposures. Often, any pair of objects (such as a gene and a disease) can be related in different ways, for example, directly via gene-disease associations or indirectly via functional annotations, chemicals and pathways. Different ways of relating these objects carry different semantic meanings. However, traditional methods disregard these semantics and thus cannot fully exploit their value in data modeling.

Results: We present Medusa, an approach to detect size-k modules of objects that, taken together, appear most significant to another set of objects. Medusa operates on large-scale collections of heterogeneous data sets and explicitly distinguishes between diverse data semantics. It advances research along two dimensions: it builds on collective matrix factorization to derive different semantics, and it formulates the growing of the modules as a submodular optimization program. Medusa is flexible in choosing or combining semantic meanings and provides theoretical guarantees about detection quality. In a systematic study on 310 complex diseases, we show the effectiveness of Medusa in associating genes with diseases and detecting disease modules. We demonstrate that in predicting gene-disease associations Medusa compares favorably to methods that ignore diverse semantic meanings. We find that the utility of different semantics depends on disease categories and that, overall, Medusa recovers disease modules more accurately when combining different semantics.
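To illustrate what "growing a size-k module as a submodular optimization program" can look like, here is a minimal sketch, not Medusa's actual objective or code. It assumes a hypothetical non-negative relevance matrix `sim` (rows are candidate objects such as genes, columns are the pivot objects the module should be significant to) and a hypothetical facility-location objective `facility_location`. The helper `greedy_module` is also hypothetical. It uses the standard greedy algorithm, which for any monotone submodular objective under a cardinality constraint of k is guaranteed to reach at least (1 - 1/e) of the optimal value.

```python
from typing import List, Sequence


def facility_location(selected: Sequence[int], sim: List[List[float]]) -> float:
    """Hypothetical objective: each pivot j is 'covered' by its best selected object.

    F(S) = sum_j max_{i in S} sim[i][j], with F(empty set) = 0.
    For non-negative sim this F is monotone and submodular.
    """
    if not selected:
        return 0.0
    n_pivots = len(sim[0])
    return sum(max(sim[i][j] for i in selected) for j in range(n_pivots))


def greedy_module(sim: List[List[float]], k: int) -> List[int]:
    """Grow a module of up to k objects, adding the largest marginal gain each step."""
    module: List[int] = []
    candidates = set(range(len(sim)))
    current = 0.0  # F(module) so far
    for _ in range(min(k, len(sim))):
        best, best_gain = -1, -1.0
        # Iterate in sorted order so ties are broken by the smallest index.
        for c in sorted(candidates):
            gain = facility_location(module + [c], sim) - current
            if gain > best_gain:
                best, best_gain = c, gain
        module.append(best)
        candidates.remove(best)
        current += best_gain
    return module


if __name__ == "__main__":
    # Made-up scores: 4 candidate objects x 3 pivot objects.
    sim = [
        [0.9, 0.1, 0.0],
        [0.8, 0.2, 0.1],
        [0.0, 0.7, 0.6],
        [0.1, 0.1, 0.9],
    ]
    print(greedy_module(sim, 2))  # [2, 0]
```

On the made-up matrix, object 2 is picked first because it covers two pivots at once. Object 0 is picked next rather than object 1 because, once pivots 1 and 2 are already covered, it adds the most for pivot 0. That diminishing-returns behavior is exactly what submodularity captures.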

Related research

01/11/2019
Determining Multifunctional Genes and Diseases in Human Using Gene Ontology
The study of human genes and diseases is very rewarding and can lead to ...

07/02/2013
Data Fusion by Matrix Factorization
For most problems in science and engineering we can obtain data sets tha...

09/26/2017
Predicting Disease-Gene Associations using Cross-Document Graph-based Features
In the context of personalized medicine, text mining methods pose an int...

02/18/2011
Inferring Disease and Gene Set Associations with Rank Coherence in Networks
A computational challenge to validate the candidate disease genes identi...

11/13/2019
Predicting microRNA-disease associations from knowledge graph using tensor decomposition with relational constraints
Motivation: MiRNAs are a kind of small non-coding RNAs that are not tran...

08/18/2018
Bayesian Hidden Markov Tree Models for Clustering Genes with Shared Evolutionary History
Determination of functions for poorly characterized genes is crucial for...

04/27/2015
On a Possible Similarity between Gene and Semantic Networks
In several domains such as linguistics, molecular biology or social scie...