A Decision Tree to Shepherd Scientists through Data Retrievability

04/12/2023
by Andrea Bianchi, et al.

Reproducibility is a crucial aspect of scientific research: it is the ability to independently replicate experimental results by analysing the same data or repeating the same experiment. Over the years, many works have proposed ways to make experimental results reproducible. However, very few address data reproducibility, defined as the ability of independent researchers to obtain the same dataset used as input for experimentation. Addressing data reproducibility properly is crucial because providing a link to the data is often not enough to make the results reproducible. Proper metadata (e.g., preprocessing instructions) must also be provided to make a dataset fully reproducible. In this work, we aim to fill this gap by proposing a decision tree to shepherd researchers through the reproducibility of their datasets. In particular, the decision tree guides researchers in identifying whether a dataset is actually reproducible and whether additional metadata (i.e., additional resources needed to reproduce the data) must also be provided. This decision tree will be the foundation of a future application that will automate the data reproduction process by automatically providing the necessary metadata based on the particular context (e.g., data availability, data preprocessing, and so on). In this paper, we detail the steps to make a dataset retrievable; we will cover other crucial aspects of reproducibility (e.g., dataset documentation) in future works.
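
As an illustration only, the idea of a decision tree that maps a dataset's context to the metadata that should accompany it can be sketched in a few lines of Python. This is not the tree proposed in the paper, whose questions and branches are not given in the abstract. The two questions (data availability and data preprocessing) are taken from the examples in the abstract; the class name `DatasetContext`, the function `required_metadata`, and the wording of each outcome are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetContext:
    """Hypothetical description of a dataset's situation (not from the paper)."""
    publicly_available: bool  # can others fetch the raw data from a link?
    preprocessed: bool        # was the raw data transformed before the experiment?


def required_metadata(ctx: DatasetContext) -> List[str]:
    """Walk a two-question tree and list what to publish with the results.

    The outcome strings are illustrative assumptions, not the paper's output.
    """
    needed = []
    # Question 1: data availability.
    if ctx.publicly_available:
        needed.append("link to the data")
    else:
        needed.append("access instructions or a deposited copy of the data")
    # Question 2: data preprocessing.
    if ctx.preprocessed:
        needed.append("preprocessing instructions")
    return needed


if __name__ == "__main__":
    ctx = DatasetContext(publicly_available=True, preprocessed=True)
    for item in required_metadata(ctx):
        print(item)
```

In this sketch, a public dataset that was preprocessed yields two items, which mirrors the abstract's point that a link alone is often not enough.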

research
08/09/2021

Collapsing the Decision Tree: the Concurrent Data Predictor

A family of concurrent data predictors is derived from the decision tree...
research
02/09/2018

Terminologies for Reproducible Research

Reproducible research---by its many names---has come to be regarded as a...
research
07/18/2022

ir_metadata: An Extensible Metadata Schema for IR Experiments

The information retrieval (IR) community has a strong tradition of makin...
research
07/04/2022

Building a Relation Extraction Baseline for Gene-Disease Associations: A Reproducibility Study

Reproducibility is an important task in scientific research. It is cruci...
research
02/22/2020

A Novel Decision Tree for Depression Recognition in Speech

Depression is a common mental disorder worldwide which causes a range of...
research
10/08/2019

Simulation Reproducibility of a Chaotic Circuit

An evergreen scientific feature is the ability for scientific works to b...
research
09/17/2020

Building Containerized Environments for Reproducibility and Traceability of Scientific Workflows

Scientists rely on simulations to study natural phenomena. Trusting the ...
