Graph Based Link Prediction between Human Phenotypes and Genes

05/25/2021
by Rushabh Patel, et al.

Background: Deep phenotyping is the detailed and precise analysis of phenotypic abnormalities in order to learn genotype-phenotype associations and the history of human disease. Understanding and detecting the interaction between phenotype and genotype is a fundamental step in translating precision medicine to clinical practice. Recent advances in machine learning make it possible to predict these interactions between abnormal human phenotypes and genes efficiently.

Methods: In this study, we developed a framework to predict links between Human Phenotype Ontology (HPO) terms and genes. Annotation data from heterogeneous knowledge resources (i.e., Orphanet) were parsed to obtain human phenotype-gene associations. To generate embeddings for the nodes (HPO terms and genes), we used the node2vec algorithm, which samples nodes from the graph using random walks and then learns features over the sampled nodes. These embeddings were then used in the downstream task of predicting whether a link exists between two nodes, using 5 different supervised machine learning algorithms.

Results: On the downstream link prediction task, the Gradient Boosting Decision Tree based model (LightGBM) achieved the best AUROC (0.904) and AUCPR (0.784). In addition, LightGBM achieved the best weighted F1 score (0.87). Compared to the other 4 methods, LightGBM identifies interactions/links between human phenotype-gene pairs more accurately.
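The random-walk sampling step described in the Methods can be illustrated with a minimal sketch. This is not the authors' code: it is a plain-Python approximation of node2vec-style second-order biased walks, assuming an unweighted, undirected phenotype-gene graph. The node labels (`HP_1`, `GENE_A`, ...), the helper names, and the default values of `p` and `q` are hypothetical and chosen only for illustration. Learning embeddings from the walks (e.g., with a skip-gram model) and training a classifier are left out.

```python
import random
from collections import defaultdict


def build_graph(edges):
    """Build undirected adjacency sets from (phenotype, gene) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def biased_walk(adj, start, length, p, q, rng):
    """One node2vec-style walk: the return parameter p and the in-out
    parameter q reweight the next step based on the previous node."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:            # step back to the previous node
                weights.append(1.0 / p)
            elif nxt in adj[prev]:     # stay at distance 1 from prev
                weights.append(1.0)
            else:                      # move outward, distance 2 from prev
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def sample_walks(adj, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start num_walks walks from every node, in shuffled order."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(adj, node, length, p, q, rng))
    return walks


# Toy phenotype-gene associations (made-up labels, not real annotations).
toy_edges = [
    ("HP_1", "GENE_A"),
    ("HP_1", "GENE_B"),
    ("HP_2", "GENE_B"),
    ("HP_3", "GENE_B"),
    ("HP_3", "GENE_C"),
]
graph = build_graph(toy_edges)
walks = sample_walks(graph, num_walks=2, length=5, p=1.0, q=0.5)
```

Each walk is a sequence of node labels that a skip-gram model could treat as a "sentence". In a pipeline like the one the abstract describes, the resulting node embeddings would then be combined per candidate phenotype-gene pair (for example by element-wise product, which is an assumption here) and passed to a supervised classifier.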


research
09/26/2017

Predicting Disease-Gene Associations using Cross-Document Graph-based Features

In the context of personalized medicine, text mining methods pose an int...
research
08/23/2020

MultiVERSE: a multiplex and multiplex-heterogeneous network embedding approach

Network embedding approaches are gaining momentum to analyse a large var...
research
09/15/2020

Does Link Prediction Help Detect Feature Interactions in Software Product Lines (SPLs)?

An ongoing challenge for the requirements engineering of software produc...
research
07/25/2021

Graph Representation Learning on Tissue-Specific Multi-Omics

Combining different modalities of data from human tissues has been criti...
research
02/10/2021

Memory-Associated Differential Learning

Conventional Supervised Learning approaches focus on the mapping from in...
research
10/31/2022

Learning to Navigate Wikipedia by Taking Random Walks

A fundamental ability of an intelligent web-based agent is seeking out a...
research
02/15/2021

A Hidden Challenge of Link Prediction: Which Pairs to Check?

The traditional setup of link prediction in networks assumes that a test...
