BioConceptVec: creating and evaluating literature-based biomedical concept embeddings on a large scale

12/23/2019
by Qingyu Chen, et al.

Capturing the semantics of related biological concepts, such as genes and mutations, is important to many research tasks in computational biology, including protein-protein interaction detection, gene-drug association prediction, and biomedical literature-based discovery. Here, we propose to leverage state-of-the-art text mining tools and machine learning models to learn the semantics, via vector representations (a.k.a. embeddings), of over 400,000 biological concepts mentioned in all PubMed abstracts. Our learned embeddings, named BioConceptVec, capture related concepts based on their surrounding contextual information in the literature, going beyond exact term matching and co-occurrence-based methods. BioConceptVec has been thoroughly evaluated on multiple bioinformatics tasks comprising over 25 million instances from nine different biological datasets. The evaluation results demonstrate that BioConceptVec outperforms existing methods in all tasks. Finally, BioConceptVec is freely available to the research community and the general public at https://github.com/ncbi-nlp/BioConceptVec.
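As a rough illustration of how concept embeddings can surface related concepts, the sketch below ranks concepts by cosine similarity to a query concept. It is a minimal example under stated assumptions, not the authors' evaluation code: the concept identifiers (`Gene_A`, `Chemical_X`, and so on), the three-dimensional vectors, and the helper names `cosine_similarity` and `nearest_concepts` are all hypothetical and chosen only for demonstration. The real embeddings and their file format are described in the repository linked above.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def nearest_concepts(query_id, embeddings, top_k=3):
    """Return the top_k (concept_id, similarity) pairs closest to query_id."""
    query = embeddings[query_id]
    scored = [
        (concept_id, cosine_similarity(query, vector))
        for concept_id, vector in embeddings.items()
        if concept_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data: made-up vectors with placeholder IDs, NOT real BioConceptVec values.
toy_embeddings = {
    "Gene_A": [0.9, 0.1, 0.0],
    "Gene_B": [0.8, 0.2, 0.1],
    "Chemical_X": [0.0, 0.1, 0.9],
    "Disease_Y": [0.1, 0.9, 0.2],
}

if __name__ == "__main__":
    for concept_id, score in nearest_concepts("Gene_A", toy_embeddings):
        print(f"{concept_id}\t{score:.3f}")
```

With real embeddings, the same ranking would be run over several hundred thousand vectors, where a vectorized library or an approximate nearest-neighbor index would likely be a better fit than this pure-Python loop.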
