Into the Single Cell Multiverse: an End-to-End Dataset for Procedural Knowledge Extraction in Biomedical Texts

09/04/2023
by Ruth Dannenfelser, et al.

Many of the most commonly explored natural language processing (NLP) information extraction tasks can be thought of as evaluations of declarative knowledge, or fact-based information extraction. Procedural knowledge extraction, i.e., breaking down a described process into a series of steps, has received much less attention, perhaps in part because of the lack of structured datasets that capture the knowledge extraction process from end to end. To address this unmet need, we present FlaMBé (Flow annotations for Multiverse Biological entities), a collection of expert-curated datasets across a series of complementary tasks that capture procedural knowledge in biomedical texts. The dataset is inspired by the observation that one ubiquitous source of procedural knowledge expressed as unstructured text is the methodology descriptions in academic papers. The workflows annotated in FlaMBé come from texts in the burgeoning field of single-cell research, an area that has become notorious for the number of software tools and the complexity of the workflows used. Additionally, FlaMBé provides, to our knowledge, the largest manually curated named entity recognition (NER) and named entity disambiguation (NED) datasets for tissue/cell type, a fundamental biological entity that is critical for knowledge extraction in the biomedical research domain. FlaMBé provides a valuable dataset for further development of NLP models for procedural knowledge extraction; automating workflow mining in turn has important implications for advancing reproducibility in biomedical research.
