Towards Long-term and Archivable Reproducibility

06/04/2020
by Mohammad Akhlaghi, et al.

Analysis pipelines commonly use high-level technologies that are popular when created, but are unlikely to be readable, executable, or sustainable in the long term. A set of criteria is introduced to address this problem: completeness (no execution requirement beyond a minimal Unix-like operating system, no administrator privileges, no network connection, and storage primarily in plain text); modular design; minimal complexity; scalability; verifiable inputs and outputs; version control; linking analysis with narrative; and free software. As a proof of concept, we introduce "Maneage" (Managing data lineage), which enables cheap archiving, provenance extraction, and peer verification, and which has been tested in several research publications. We show that longevity is a realistic requirement that does not sacrifice immediate or short-term reproducibility. The caveats (with proposed solutions) are then discussed, and we conclude with the benefits for the various stakeholders. This paper is itself written with Maneage (project commit eeff5de).


