Implications of Topological Imbalance for Representation Learning on Biomedical Knowledge Graphs

12/13/2021
by   Stephen Bonner, et al.

Improving on the standard of care for diseases is predicated on better treatments, which in turn rely on finding and developing new drugs. However, drug discovery is a complex and costly process. The adoption of machine learning methods has given rise to drug discovery knowledge graphs, which exploit the inherently interconnected nature of the domain. Graph-based data modeling, combined with knowledge graph embeddings, provides a more intuitive representation of the domain and is suitable for inference tasks such as predicting missing links. One example is producing ranked lists of genes likely to be associated with a given disease, often referred to as target discovery. It is thus critical that these predictions are not only pertinent but also biologically meaningful. However, knowledge graphs can be biased, either directly by the underlying data sources that are integrated or by modeling choices made during graph construction. One consequence is that certain entities can become topologically overrepresented. We show how knowledge graph embedding models can be affected by this structural imbalance, resulting in densely connected entities being ranked highly regardless of context. We provide support for this observation across different datasets, models and predictive tasks. Further, we show how the graph topology can be perturbed to artificially alter the rank of a gene using random, biologically meaningless information. This suggests that such models can be influenced more by the frequency of entities than by the biological information encoded in the relations, which creates issues when entity frequency is not a true reflection of the underlying data. Our results highlight the importance of data modeling choices and emphasize the need for practitioners to be mindful of these issues when interpreting model outputs and during knowledge graph composition.
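The frequency effect described above can be illustrated with a toy sketch. This is not a knowledge graph embedding model and not the authors' experimental setup: it is a degree-only baseline that ignores the query disease entirely, used here only to show what "highly ranked no matter the context" looks like and how meaningless edges can shift a rank. The triples, entity names, the `random_link` relation and all function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy triples (head, relation, tail); not data from the paper.
TRIPLES = [
    ("GeneA", "associated_with", "Disease1"),
    ("GeneA", "associated_with", "Disease2"),
    ("GeneA", "interacts_with", "GeneB"),
    ("GeneA", "interacts_with", "GeneC"),
    ("GeneB", "associated_with", "Disease1"),
    ("GeneC", "associated_with", "Disease3"),
    ("GeneD", "associated_with", "Disease3"),
]


def entity_degree(triples):
    """Count how many triples each entity appears in, as head or tail."""
    degree = Counter()
    for head, _relation, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    return degree


def rank_genes_by_degree(triples, disease):
    """Degree-only baseline: `disease` is deliberately ignored."""
    degree = entity_degree(triples)
    genes = {e for h, _, t in triples for e in (h, t) if e.startswith("Gene")}
    # Highest degree first; ties broken alphabetically for determinism.
    return sorted(genes, key=lambda g: (-degree[g], g))


def add_random_edges(triples, gene, n_edges, seed=0):
    """Attach `gene` to randomly chosen diseases via a meaningless relation."""
    rng = random.Random(seed)
    diseases = sorted({t for _, _, t in triples if t.startswith("Disease")})
    noise = [(gene, "random_link", rng.choice(diseases)) for _ in range(n_edges)]
    return triples + noise


if __name__ == "__main__":
    print(rank_genes_by_degree(TRIPLES, "Disease1"))
    print(rank_genes_by_degree(TRIPLES, "Disease3"))
    perturbed = add_random_edges(TRIPLES, "GeneD", n_edges=5)
    print(rank_genes_by_degree(perturbed, "Disease1"))
```

In this sketch the ranking is identical for every disease, and adding five random edges moves the least connected gene to the top. Comparing a trained model's rankings against a baseline of this kind is one possible way to gauge how much of its output tracks entity frequency, under the assumptions stated above.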

