Aligning Biomedical Metadata with Ontologies Using Clustering and Embeddings

03/19/2019
by Rafael S. Gonçalves, et al.

The metadata about scientific experiments published in online repositories have been shown to suffer from a high degree of representational heterogeneity: there are often many ways to represent the same type of information, such as a geographical location via its latitude and longitude. To harness the potential that metadata have for discovering scientific data, it is crucial that they be represented in a uniform way that can be queried effectively. One step toward uniformly represented metadata is to normalize the multiple, distinct field names that are used in metadata to describe the same type of value (e.g., lat lon, lat and long). To that end, we present a new method based on clustering and embeddings (i.e., vector representations of words) to align metadata field names with ontology terms. We apply our method to biomedical metadata by generating embeddings for terms in biomedical ontologies from the BioPortal repository. We carried out a comparative study between our method and the NCBO Annotator, which revealed that our method yields more, and substantially better, alignments between metadata and ontology terms.
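
To make the idea of embedding-based alignment concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy "embedding" built from character trigrams as a stand-in for learned word embeddings, and it matches each field name to the ontology label with the highest cosine similarity; the clustering step described above is omitted. The function names (embed, cosine, align), the field names, and the ontology labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def embed(text, n=3):
    """Toy embedding (assumption): a bag of character n-grams.

    A real system would use learned word embeddings instead.
    """
    padded = f" {text.lower().replace('_', ' ')} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def align(field_names, ontology_labels):
    """Map each field name to its most similar ontology label and the score."""
    label_vectors = {label: embed(label) for label in ontology_labels}
    alignments = {}
    for name in field_names:
        vector = embed(name)
        best = max(label_vectors, key=lambda label: cosine(vector, label_vectors[label]))
        alignments[name] = (best, cosine(vector, label_vectors[best]))
    return alignments


if __name__ == "__main__":
    # Hypothetical field names and ontology labels, for illustration only.
    fields = ["lat", "long", "latitude_deg"]
    labels = ["latitude", "longitude", "organism"]
    for field, (label, score) in align(fields, labels).items():
        print(f"{field} -> {label} ({score:.2f})")
```

Character trigrams only capture surface similarity, so this sketch cannot relate field names that are lexically different but semantically equivalent; that gap is what learned embeddings are intended to address.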

research
08/17/2018

The Variable Quality of Metadata About Biological Samples Used in Biomedical Experiments

We present an analytical study of the quality of metadata about samples ...
research
08/03/2017

Metadata in the BioSample Online Repository are Impaired by Numerous Anomalies

The metadata about scientific experiments are crucial for finding, repro...
research
04/30/2021

Content-based subject classification at article level in biomedical context

Subject classification is an important task to analyze scholarly publica...
research
09/20/2018

Specimens as research objects: reconciliation across distributed repositories to enable metadata propagation

Botanical specimens are shared as long-term consultable research objects...
research
03/21/2019

Using association rule mining and ontologies to generate metadata recommendations from multiple biomedical databases

Metadata, the machine-readable descriptions of the data, are increasingly ...
research
04/29/2018

OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction

Motivation: Ontologies are widely used in biology for data annotation, i...
research
07/19/2018

Indexing Execution Patterns in Workflow Provenance Graphs through Generalized Trie Structures

Over the last years, scientific workflows have become mature enough to b...
