Reproducible data citations for computational research

08/22/2018
by Christian Schulz, et al.

The general purpose of a scientific publication is the exchange and spread of knowledge. A publication usually reports a scientific result and tries to convince the reader that it is valid. As a growing number of papers rely on computational methods that use large quantities of data and sophisticated statistical modeling techniques, a textual description of the result is often not enough for a publication to be transparent and reproducible. While there are efforts to encourage the sharing of code and data, we currently lack conventions for linking data sources to a computational result that is stated in the main publication text or used to generate a figure or table. I therefore propose a data citation format that allows all computations to be reproduced automatically. A data citation consists of a descriptor that refers to the functional program code and the input that generated the result. The input itself may be a set of other data citations, so that all data transformations, from the original data sources to the final result, are transparently expressed by a directed graph. Because data sources are expected to be stored in open and standardized text-based file formats, functions can be implemented in a variety of programming languages. A publication is then an online file repository consisting of a Hypertext Markup Language (HTML) document and additional data and code source files, together with a summary of all data sources, similar to the list of references in a bibliography.
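
The abstract describes a data citation as a descriptor naming a function and its input, where the input may itself contain further data citations. The Python sketch below is one possible reading of that idea, not the format proposed in the paper: the class `DataCitation`, the helpers `reproduce` and `list_sources`, the function names `drop_outliers` and `mean`, and the file name `measurements.csv` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical descriptor: a function name plus the inputs it was applied to."""
    function: str
    # Each input is either another citation or the identifier of a raw data source.
    inputs: Tuple[Union["DataCitation", str], ...]


def reproduce(citation: DataCitation,
              functions: Dict[str, Callable],
              sources: Dict[str, object]) -> object:
    """Recompute a cited result by walking the citation graph depth-first."""
    args = []
    for item in citation.inputs:
        if isinstance(item, DataCitation):
            # Nested citation: reproduce the intermediate result first.
            args.append(reproduce(item, functions, sources))
        else:
            # Leaf: look up the original data source.
            args.append(sources[item])
    return functions[citation.function](*args)


def list_sources(citation: DataCitation) -> List[str]:
    """Collect the raw data sources a result depends on, like a reference list."""
    found: List[str] = []
    for item in citation.inputs:
        names = list_sources(item) if isinstance(item, DataCitation) else [item]
        for name in names:
            if name not in found:
                found.append(name)
    return found


# Assumed example data and functions (not taken from the paper).
SOURCES = {"measurements.csv": [1.0, 2.0, 3.0, 6.0]}
FUNCTIONS = {
    "drop_outliers": lambda xs: [x for x in xs if x < 5.0],
    "mean": lambda xs: sum(xs) / len(xs),
}

cleaned = DataCitation("drop_outliers", ("measurements.csv",))
result = DataCitation("mean", (cleaned,))

print(reproduce(result, FUNCTIONS, SOURCES))  # mean of [1.0, 2.0, 3.0]
print(list_sources(result))
```

Here the value cited in the text is the root of a small graph, and reproducing it re-runs every transformation back to the original source. In this sketch the sources are in-memory lists for brevity; the abstract instead expects them to be stored in open, text-based file formats so that functions written in different languages can read them.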
