Metadata in the BioSample Online Repository are Impaired by Numerous Anomalies

08/03/2017
by Rafael S. Gonçalves, et al.

Metadata about scientific experiments are crucial for finding, reproducing, and reusing the data that the metadata describe. We present a study of the quality of the metadata stored in BioSample, a repository of metadata about samples used in biomedical experiments, managed by the U.S. National Center for Biotechnology Information (NCBI). We tested whether 6.6 million BioSample metadata records are populated with values that fulfill the stated requirements for such values. Our study revealed multiple anomalies in the analyzed metadata. The BioSample metadata field names and their values are not standardized or controlled: 15% of the metadata fields use field names not specified in the BioSample data dictionary. Only 9 out of 452 BioSample-specified fields ordinarily require ontology terms as values, and the quality of these controlled fields is better than that of uncontrolled ones, as even simple binary or numeric fields are often populated with inadequate values of different data types (e.g., only 27% of Boolean values are valid). Overall, the metadata in BioSample reveal that there is a lack of principled mechanisms to enforce and validate metadata requirements. The aberrancies in the metadata are likely to impede search and secondary use of the associated datasets.
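
The kind of check described above, whether a field name appears in a data dictionary and whether its value matches the expected data type, can be sketched in Python. This is only a minimal illustration under stated assumptions: the field names, the type table, and the accepted Boolean spellings below are hypothetical and are not taken from the BioSample data dictionary or from the authors' method.

```python
import math
from collections import Counter

# Hypothetical field specification for illustration only;
# these names and types are NOT BioSample's actual data dictionary.
FIELD_TYPES = {
    "is_tumor": "boolean",
    "age_years": "numeric",
}

# Assumption: only these spellings count as valid Boolean values.
BOOLEAN_VALUES = {"true", "false"}


def is_valid(value, expected_type):
    """Return True if the string value matches the expected data type."""
    text = value.strip().lower()
    if expected_type == "boolean":
        return text in BOOLEAN_VALUES
    if expected_type == "numeric":
        try:
            number = float(text)
        except ValueError:
            return False
        # Reject "nan" and "inf", which float() would otherwise accept.
        return math.isfinite(number)
    raise ValueError(f"unsupported type: {expected_type}")


def audit(records):
    """Count valid values per known field and tally unknown field names.

    Returns (summary, unknown), where summary maps each known field name
    to a (valid_count, total_count) pair and unknown counts field names
    that are missing from FIELD_TYPES.
    """
    valid = Counter()
    total = Counter()
    unknown = Counter()
    for record in records:
        for name, value in record.items():
            expected = FIELD_TYPES.get(name)
            if expected is None:
                unknown[name] += 1
                continue
            total[name] += 1
            if is_valid(value, expected):
                valid[name] += 1
    summary = {name: (valid[name], total[name]) for name in total}
    return summary, unknown


if __name__ == "__main__":
    # Made-up records, not real BioSample data.
    sample = [
        {"is_tumor": "yes", "age_years": "42"},
        {"is_tumor": "true", "age_years": "unknown", "Age": "7"},
    ]
    print(audit(sample))
```

Keeping the specification in a plain dictionary means the same audit can be rerun against a different set of field definitions without changing the checking logic.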
